Design and Implementation of a Fast Inter Domain Communication Mechanism
Amitabha Roy
[email protected]
July 6, 2006

Abstract— This research proposal addresses the problem of applications running in different VM domains that need to communicate efficiently with each other. It proposes the construction of a new device class for high performance inter domain communication while preserving isolation between virtual machines.

I. PROBLEM STATEMENT

Virtualization is an old concept [1] that has recently made a comeback as a solution to the resource utilization problem in servers. All of the current virtual machine implementations, such as Xen [2], VMware [3], Denali [4] and others, support running more than one virtual machine on the same physical node. Most virtual machine hypervisors also provide one or more methods of inter domain communication (IDC). This is useful in cases where virtual machines explicitly move data and events among each other to form a data pipeline. For example, for a web interface to a transaction processing system it is natural to use virtual machines to isolate the webserver and the database management system for fault containment, especially when both run on the same physical node. Another area where explicit IDC is becoming important is grids, where virtual machines are used to instantiate preconfigured environments [5] that are then used to run parallel algorithms built on application level communication abstractions such as MPI [6].

This research proposal focuses on Xen as the hypervisor platform, with the aim of designing and implementing a high performance IDC mechanism. Xen supports running multiple virtual machines (called domains) on a physical node and provides virtual networks, shared pages and events as mechanisms for IDC. These mechanisms have both performance and flexibility problems. The performance problems have been pointed out in a number of studies: [7] shows that MPI applications using IDC on the same physical node perform poorly in comparison to Inter Process Communication (IPC), and [8] shows that for remote domain communication the number of data copies has a significant impact on performance. The available options for IDC also tend to severely limit the scheduling of domains through middleware such as Xenoservers [9]. Using shared memory pages for high performance IDC automatically constrains VMs to be scheduled on the same physical node. On the other hand, using virtual networks restores flexibility in scheduling but results in suboptimal performance when the VMs are running on the same node.

Again, using InfiniBand based RDMA solutions such as [8] for better network performance can be difficult in heterogeneous environments, since the actual hardware configuration of the target node is not known in advance and configuring the VM image for it becomes cumbersome. An interesting dimension to this problem is that the physical proximity of virtual machines sharing data and events can change dynamically due to virtual machine migration [10]. Thus, if IDC between VMs on a physical node is done with shared pages, those VMs must always be migrated together to the same physical node, constraining the available scheduling options. Conversely, it may be possible, based on observed data sharing patterns, to move VMs onto the same physical node for better IDC performance; however, if the VMs have already chosen VMnets for IDC then the entire purpose is defeated.

II. PROPOSED WORK

The goal of this research is to build an IDC mechanism for Xen that provides the best possible performance while retaining isolation and flexibility. The platform for experimentation will be virtual machines running on the Xen hypervisor.

The most efficient mechanism for IDC on the same node is clearly physical memory. Zero copy communication from the producer to the consumer domain can be achieved by switching pages from the former to the latter using a technique like device channels [11], already present in Xen. To maintain buffer availability, the consumer domain will synchronously return to the producer domain as many pages as it consumes. This implementation will allow IDC on the same node to approach IPC in terms of data bandwidth and transfer latency. Alternatively, the same effect can be obtained with a shared memory region mapped into both VMs; this will also be investigated. Memory is not an option for inter domain communication spanning physical nodes. However, as shown in [8], much of the inherent latency of moving data in this case can be removed by using RDMA techniques from userspace.

It is possible to implement maximally efficient IDC while retaining flexibility by separating the actual data copying decisions from the virtual machines themselves. This can be done by exposing a standard "VM communication device" to each of the VMs. A frontend driver will support read and write operations on this device. The target VM for these operations is chosen using an appropriately chosen VM naming and identification scheme for a distributed system of physical nodes running virtual machines.
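As a rough illustration, the frontend of such a device might expose an interface along the following lines. This is only a sketch under assumed names: the vmcomm_* identifiers, the vm_id_t naming scheme and the operation signatures are hypothetical and are not part of any existing Xen interface.

/*
 * Hypothetical frontend interface for the proposed "VM communication
 * device".  All names are illustrative only.
 */
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Cluster-wide VM identifier: (physical node, domain) at a point in time.
 * A real naming scheme would also have to survive live migration. */
typedef struct {
    uint32_t node_id;    /* physical node in the cluster             */
    uint32_t domain_id;  /* Xen domain id on that node (may change)  */
} vm_id_t;

/* Operations the frontend driver exposes to guest software.  The open
 * call lets a guest hint that a persistent channel is worthwhile. */
struct vmcomm_frontend_ops {
    int     (*open) (vm_id_t target, int flags);
    ssize_t (*write)(int chan, const void *buf, size_t len); /* send to target VM   */
    ssize_t (*read) (int chan, void *buf, size_t len);       /* receive from target */
    int     (*close)(int chan);
};

The read and write calls deliberately mirror ordinary file I/O, so that userspace communication libraries such as sockets or MPI would need only small changes to use the device.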

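In the same spirit, the following minimal user-space sketch models the zero-copy local path described above, with the consumer synchronously returning one free page for every page it consumes so that the producer's buffer pool never shrinks. Page "transfer" is modelled here as plain pointer hand-off inside one process; a real implementation would instead switch machine frames between domains using device channels [11].

/* Simplified model of the page-switching local path (illustrative only). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE  4096
#define POOL_PAGES 8

typedef struct { unsigned char data[PAGE_SIZE]; } page_t;

typedef struct {
    page_t *free_pages[POOL_PAGES];  /* pages currently owned by the producer */
    int     nfree;
} page_pool_t;

/* Producer: take a free page, fill it, hand it over ("flip" it) to the consumer. */
static page_t *produce(page_pool_t *pool, const void *payload, size_t len)
{
    if (pool->nfree == 0 || len > PAGE_SIZE)
        return NULL;
    page_t *pg = pool->free_pages[--pool->nfree];
    memcpy(pg->data, payload, len);   /* only copy: application buffer -> page */
    return pg;                        /* ownership passes to the consumer      */
}

/* Consumer: use the page, then return it synchronously to keep the pool full. */
static void consume(page_pool_t *pool, page_t *pg)
{
    printf("consumed: %s\n", (const char *)pg->data);
    pool->free_pages[pool->nfree++] = pg;
}

int main(void)
{
    page_pool_t pool = { .nfree = 0 };
    for (int i = 0; i < POOL_PAGES; i++)
        pool.free_pages[pool.nfree++] = malloc(sizeof(page_t));

    const char *msg = "hello from the producer domain";
    page_t *pg = produce(&pool, msg, strlen(msg) + 1);
    if (pg != NULL)
        consume(&pool, pg);

    for (int i = 0; i < pool.nfree; i++)
        free(pool.free_pages[i]);
    return 0;
}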
The backend driver chooses the most efficient way of transferring the data. If the target VM is on the same node, it may simply switch pages, resulting in zero copy transfer of data between the frontend drivers in the source and target VMs. If the frontend driver in the producer VM wishes to keep the pages mapped (perhaps to avoid copying from the application buffers in the send call), the backend driver can instead choose the most efficient form of memory to memory copy on the same system. It may be possible to do this using hardware assists such as memory to memory DMA (if the platform supports it, for example using [12]) or, in the worst case, plain old fashioned copying. If the transfer involves communication between different physical nodes, the backend driver can decide whether to use RDMA type solutions for quick transfer, depending on the platform configuration (such as whether it supports InfiniBand), or fall back to setting up a network connection to the remote node. Even in the latter case it may decide to open and maintain a persistent connection, based on the frequency of data transfer or on an explicit request from the source VM (perhaps through an open call on the device). It may even be possible to multiplex data from different VMs on the same network connection to avoid the overhead of opening and maintaining multiple connections to the same node. This solution also covers the problem of VM migration, because the backend can change its data transfer mechanism independently of the frontend. At the same time, flexibility is maintained because the backend driver, which runs in the control domain or in an isolated domain, can be statically configured on a per physical node basis.

The proposed solution is analogous to virtual channel processors [13], which aim to increase flexibility in implementing IO stacks by decoupling them into a separate virtual machine. It can be thought of as a virtual channel processor dedicated to inter domain communication.

The proposed research will design and implement this flexible high performance IDC mechanism using the Xen hypervisor. A select set of applications will be used to measure the performance gains from this technique. It is anticipated that these will consist of transaction processing systems, such as a DBMS and webserver responding to TPC-C style queries, to reflect server workloads, and scientific applications communicating using MPI, to reflect grid workloads. Any required changes to userspace communication libraries such as sockets or MPI will be made to support this new device (a move towards extreme paravirtualization as in [14]). To level the playing field in terms of flexibility, the implementation will be benchmarked against an equivalent implementation using only VMnets. The benchmarking will also subject the virtual machines to live migration.

It is expected that the typical deployment scenario for VMs using IDC is in server farms running middleware such as Xenoserver [9] with a central scheduling entity. It is thus possible to use feedback from the backend drivers described above about IDC patterns and costs.
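As a rough illustration of both the transport choice made by the backend driver and the kind of per VM pair accounting it could expose to such a scheduling entity, consider the sketch below. The names, the selection policy and the statistics format are hypothetical; they do not correspond to an existing Xen interface.

/* Illustrative backend transport selection and IDC accounting (sketch only). */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum transport {
    XFER_PAGE_FLIP,   /* same node, pages can be switched: zero copy          */
    XFER_LOCAL_COPY,  /* same node, memory to memory copy (DMA engine or CPU) */
    XFER_RDMA,        /* remote node with RDMA capable hardware, as in [8]    */
    XFER_TCP          /* remote node, plain network connection as fallback    */
};

/* Traffic counters for one (source VM, target VM) pair; a cluster scheduler
 * could poll these to decide which VMs to co-locate on one node. */
struct idc_stats {
    uint64_t bytes;
    uint64_t transfers;
};

static enum transport pick_transport(bool same_node, bool can_flip_pages,
                                     bool has_rdma)
{
    if (same_node)
        return can_flip_pages ? XFER_PAGE_FLIP : XFER_LOCAL_COPY;
    return has_rdma ? XFER_RDMA : XFER_TCP;
}

static void account(struct idc_stats *s, uint64_t nbytes)
{
    s->bytes += nbytes;
    s->transfers++;
}

int main(void)
{
    struct idc_stats stats = { 0, 0 };
    enum transport t = pick_transport(true, true, false);
    account(&stats, 4096);
    printf("transport=%d bytes=%llu transfers=%llu\n",
           t, (unsigned long long)stats.bytes,
           (unsigned long long)stats.transfers);
    return 0;
}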

Such feedback can in turn be used for better scheduling policies that dynamically move communicating VMs onto the same node as far as possible. This research will target an implementation that provides such feedback. If possible, such a scheduler will also be investigated.

Filesystem based communication is currently a non-goal for this research. Techniques for improving the efficiency of VMs that share file systems are already being investigated [14] [15] [16]. It is not anticipated that files will be a popular mode of IDC, primarily due to the difficulty of maintaining coherent views among VMs. However, the IDC device described above can easily coexist with these mechanisms. In addition, if the VMs require persistent copies of transferred data, it may be possible to interface the backend driver with the unified buffer cache of [14] or with an on disk FS cache as in [16].

III. BENEFITS OF RESEARCH

The current trend in computer architecture towards a larger number of smaller, simpler and more power efficient cores per microprocessor [17] means that the number of hardware threads available in server platforms is set to grow phenomenally, an early example being Sun's Niagara [18]. Coupled with server consolidation, this means that the number of virtual machines running on a single physical node will also grow. In such an environment it is inevitable that virtual machines will share data. This research will improve the performance of communication between such virtual machines while retaining scheduling flexibility and tolerance to failures.

REFERENCES

[1] Gerald J. Popek and Robert P. Goldberg. Formal requirements for virtualizable third generation architectures. Commun. ACM, 17(7):412–421, 1974.
[2] Paul T. Barham, Boris Dragovic, Keir Fraser, Steven Hand, Timothy L. Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. In SOSP, pages 164–177, 2003.
[3] S. Devine, E. Bugnion, and M. Rosenblum. Virtualization system including a virtual machine monitor for a computer with a segmented architecture.
[4] A. Whitaker, M. Shaw, and S. Gribble. Scale and performance in the Denali isolation kernel, 2002.
[5] Xuehai Zhang, Katarzyna Keahey, Ian Foster, and Timothy Freeman. Virtual cluster workspaces for grid applications.
[6] Message Passing Interface Forum. MPI: A message-passing interface standard. Technical Report UT-CS-94-230, 1994.
[7] Anirban Saha and Gang Peng. How good is Xen for simulating distributed applications?
[8] Jiuxing Liu, Wei Huang, Bulent Abali, and Dhabaleswar K. Panda. High performance VMM-bypass I/O in virtual machines. In USENIX Annual Technical Conference, 2006.
[9] Evangelos Kotsovinos. Global Public Computing. Technical Report UCAM-CL-TR-615, Computer Laboratory, University of Cambridge, January 2005.
[10] C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield. Live migration of virtual machines, 2005.
[11] K. Fraser, S. Hand, R. Neugebauer, I. Pratt, A. Warfield, and M. Williamson. Safe hardware access with the Xen virtual machine monitor. In 1st Workshop on Operating System and Architectural Support for the on-demand IT InfraStructure (OASIS), 2004.
[12] Intel I/O Acceleration Technology. www.intel.com/go/ioat.
[13] D. McAuley and R. Neugebauer. A case for virtual channel processors, 2003.
[14] Mark Williamson. Extreme paravirtualisation: beyond arch/xen.
[15] Ben Pfaff, Tal Garfinkel, and Mendel Rosenblum. Virtualization aware file systems: Getting beyond the limitations of virtual disks. In 3rd Symposium on Networked Systems Design and Implementation (NSDI), May 2006.

[16] Ming Zhao, Jian Zhang, and Renato Figueiredo. Distributed file system support for virtual machines in grid computing. In HPDC-13, 2004.
[17] John D. Davis, James Laudon, and Kunle Olukotun. Maximizing CMP throughput with mediocre cores. In PACT '05: Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, pages 51–62, Washington, DC, USA, 2005. IEEE Computer Society.
[18] Poonacha Kongetira, Kathirgamar Aingaran, and Kunle Olukotun. Niagara: A 32-way multithreaded SPARC processor. IEEE Micro, 25(2):21–29, 2005.
