Bin Repacking Scheduling in Virtualized Datacenters - Back to Work

Fabien Hermenier, OASIS Team, INRIA - CNRS - I3S, Univ. Nice Sophia-Antipolis, [email protected]
Sophie Demassey, TASC project, Mines Nantes-INRIA, LINA CNRS UMR 6241, [email protected]
Xavier Lorca, TASC project, Mines Nantes-INRIA, LINA CNRS UMR 6241, [email protected]

September 26, 2011

Because the acceptance of a paper does not necessarily mean that its objectives were achieved, we have made some progress in the model and the implementation of the VRSP since the acceptance of Bin Repacking Scheduling in Virtualized Datacenters [3] in June 2011. In this report, we present our modifications and the new results, obtained using the same evaluation protocol.

1 Modifications

Profiling and testing our code in various situations revealed a few flaws that reduced the performance of the solving process and the quality of the computed reconfiguration plans. In this section, we describe the modifications that were made.

In the VRSP. The implementation of the bin-packing constraint in Choco partially relied on set variables. The constraint has been reimplemented to maintain the candidate items of a bin directly as internal data, using bitsets, instead of going through the external set abstraction. This reduces the propagation time of the constraint by an order of magnitude.

In the side constraints. Profiling the solving process of instances having numerous side constraints revealed bottlenecks in lonely and capacity that were responsible for a high computation time. The implementation of lonely has been modified to substitute the set-based disjoint constraint with an equivalent that directly manages two lists of integer variables. As a result, the set variables and the channeling constraints that linked the sets to the integer variables have been removed. The capacity constraint has also been remodeled to avoid the use of set variables to store the number of VMs running on the servers: capacity is now modeled with one among [1] constraint that directly counts the VMs assigned to any server of the designated group. For any subset R of the servers and any value n ∈ ℕ, n > 0, capacity(R, n) is modeled by:

    among([0, n], ⟨F_j | j ∈ J⟩, R)

where variable F_j models the server finally assigned to VM j ∈ J.
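To make this semantics concrete, here is a minimal, standalone Java sketch of the check that among([0, n], ⟨F_j | j ∈ J⟩, R) performs once every F_j is fixed. It is only an illustration with hypothetical names, not the Choco propagator: it verifies that at most n VMs are assigned to a server of the restricted group R.

import java.util.Set;

// Standalone illustration of the check enforced by among([0, n], <F_j | j in J>, R):
// at most n of the VM-to-server assignments may take their value in the group R.
// Hypothetical names; this is not the Choco implementation.
final class CapacityCheck {

    // F[j] = server finally assigned to VM j; R = restricted group of servers.
    static boolean satisfies(int[] F, Set<Integer> R, int n) {
        int inGroup = 0;
        for (int server : F) {
            if (R.contains(server) && ++inGroup > n) {
                return false; // more than n VMs end up in the restricted group
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] F = {0, 2, 2, 5};                  // four VMs and their final servers
        Set<Integer> R = Set.of(2, 3);           // restricted group of servers
        System.out.println(satisfies(F, R, 2));  // true: exactly 2 VMs are in R
        System.out.println(satisfies(F, R, 1));  // false: the among count exceeds 1
    }
}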

In the specific case n = 0, constraint capacity(R, n) is instead modeled directly with domain constraints: F_j ∉ R, ∀j ∈ J.

In the search heuristics. A bug was found in the search heuristic that prevented it from using a worst-fit approach to place the VMs on the servers. Fixing this bug improves its ability to guide the solver to a solution when the consolidation ratio is high. A second optimization was performed to reduce the scheduling delay of the actions and the number of migrations in the reconfiguration plan. In the repair mode, the branching heuristic now tries first to place VMs on servers that only host candidate VMs and VMs that will be neither suspended nor stopped. No resource will be freed on these servers during the reconfiguration process, so any action that places a VM on them is guaranteed to be scheduled without any delay. This also avoids creating additional migrations to free resources on servers when too many VMs are temporarily assigned to them during the reconfiguration.
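The following Java sketch illustrates this selection policy under simplifying assumptions (a single resource dimension, hypothetical Server fields); it is not Entropy's actual heuristic. Among the servers with enough free capacity, it first prefers those whose hosted VMs all stay in place, since no action targeting them can be delayed, then breaks ties with a worst-fit criterion, i.e., the largest remaining capacity.

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative repair-mode placement: among the servers with enough free room,
// prefer those whose hosted VMs all stay (no resource is freed on them during
// the reconfiguration, so the placement action needs no delay), then apply a
// worst-fit tie-break on the remaining capacity. Fields are hypothetical.
final class WorstFitPlacement {

    record Server(int id, int freeCapacity, boolean hostsOnlyStayingVMs) {}

    static Optional<Server> choose(List<Server> servers, int vmDemand) {
        return servers.stream()
                .filter(s -> s.freeCapacity() >= vmDemand)        // feasible hosts only
                .max(Comparator
                        .comparing(Server::hostsOnlyStayingVMs)   // delay-free servers first
                        .thenComparingInt(Server::freeCapacity)); // then worst fit
    }

    public static void main(String[] args) {
        List<Server> servers = List.of(
                new Server(0, 4, false),
                new Server(1, 3, true),
                new Server(2, 6, false));
        // Server 1 wins despite less free room: no action targeting it is delayed.
        System.out.println(choose(servers, 2).map(Server::id).orElse(-1));
    }
}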

2 Evaluations

Our modifications reduce the running time of the solving process and improve the quality of the computed solutions. For an accurate comparison of the gains with respect to the published results, we have rerun our experiments using the same set of instances and the same computing servers in the Grid’5000 testbed [2]. Details about the instances and the experimental environment are available in the original paper [3].

2.1 Impact of the consolidation ratio

                     Rebuild Mode                              Repair Mode
Ratio   solved    obj   nodes   fails   time     solved    obj   nodes   fails   time
2:1        100    387    1972       0    2.3        100    380     162       0    0.3
3:1        100    766    2952       0    3.9        100    742     393       0    0.8
4:1        100   1393    3932       1    6.5        100   1309     830       0    1.5
5:1        100   2644    4921       9   10.3        100   2117    1574       4    3.2
6:1        100   4873    5958      71   15.3        100   3271    2691       1    7.0

Table 1: Impact of the consolidation ratio on the solving process (times in seconds).

Table 1 shows the impact of the consolidation ratio on the solving process in the rebuild and repair modes using the new implementation. Figure 1 shows the performance of our new implementation with respect to the original one: Figure 1(a) shows the average solving time to compute the first solution, and Figure 1(b) shows the average cost of these solutions.

Figure 1: Gains on the solving process for instances having a variable consolidation ratio. (a) Computation time for the first solution; (b) cost of the solution.


We first observe in Table 1 that all the instances are now solved. This is mostly explained by the bug fix in the search heuristic. Figure 1(a) shows the gain from the optimization of the pack constraint. In the rebuild mode, instances with a consolidation ratio of 5:1 are now solved in 17 seconds, 14 times faster. In the repair mode, instances with a consolidation ratio of 6:1 are now solved in 7.6 seconds, 9.5 times faster. This gain is slightly smaller than in the rebuild mode, as the number of managed VMs, and thus the number of items handled by the pack constraint, is reduced. Finally, Figure 1(b) shows the improvements in the computed reconfiguration plans: their cost was reduced by a factor of 2.8 in the rebuild mode and by a factor of 1.15 in the repair mode. Our optimizations reduce the original gap between the rebuild and the repair modes on these instances. The gain remains appreciable, however, as the repair mode still provides better solutions in less time.

2.2 Impact of the datacenter size

                                    Rebuild Mode                         Repair Mode
Set   #servers    #VMs    solved    obj   nodes   fails   time    solved    obj   nodes   fails   time
x1         500   2,500       100   1326    2457       0    2.0       100   1058     798       0    0.8
x2       1,000   5,000       100   2653    4914       0   10.3       100   2130    1578       0    3.3
x3       1,500   7,500       100   3950    7370       0   36.5       100   3179    2346       0   10.1
x4       2,000  10,000       100   5317    9828       0   88.9       100   4269    3143       0   21.9

Table 2: Impact of the datacenter size on the solving process (times in seconds).

Table 2 shows the impact of the datacenter size on the computation in the rebuild and the repair modes. Contrary to the old implementation, which failed to solve one instance for sets x2 and x3, every instance is now solved using the new implementation. The performance gap between the rebuild mode and the repair mode is more visible in this experiment, where the instances get bigger. Figure 2 shows the performance of our new implementation with respect to the original one: Figure 2(a) shows the average solving time to compute the first solution, while Figure 2(b) shows the average cost of these solutions. We first observe that the gap between the old and the new implementation increases with the size of the problems. The speedup factor of the new implementation in the repair mode stays, however, over 9. Figure 2(b) shows a constant 10% improvement in the quality of the computed solutions.

Figure 2: Gains on the solving process for variable datacenter sizes. (a) Computation time for the first solution; (b) cost of the solution.

2.3 Impact of the side constraints

Table 3 shows the impact of the side constraints on the instances with a variable consolidation ratio (left) and with a variable datacenter size (right) using the new implementation. Contrary to the previous implementation, where the additional constraints made some of the biggest instances unsolvable, all the instances can now be solved.


  variable consolidation ratios                   variable datacenter sizes
Set   solved    obj   nodes   fails   time     Set   solved    obj   nodes   fails   time
2:1      100    381     163       0    0.4     x1       100   1059     798       0    0.9
3:1      100    742     393       0    0.9     x2       100   2131    1578       0    4.2
4:1      100   1310     830       0    1.9     x3       100   3179    2346       0   13.8
5:1      100   2117    1570       0    4.1     x4       100   4269    3143       0   30.7
6:1      100   3298    3172       0   10.4

Table 3: Impact of the side constraints on the solving process (repair mode, times in seconds).

Figures 3 and 4 show the impact of the side constraints on the solving process for instances having a variable consolidation ratio and a variable datacenter size, respectively. As in the previous experiments, we observe that the optimizations performed on our model and our implementation improve the performance by an order of magnitude.

Figure 3: Impact of the side constraints with variable consolidation ratios. (a) Solving duration; (b) cost of the solutions.

Figure 4: Impact of the side constraints with variable datacenter sizes. (a) Solving duration; (b) cost of the solutions.

2.4 Practical quality of our solutions

The objective value denotes the quality of the computed reconfiguration plan. In practice, this quality is mostly characterized by the number of actions to execute and the estimated application duration of the plan. Such a representation of the solutions was not included in the original paper due to space constraints. Figures 5 and 6 show, for the new implementation, the practical cost of the computed plans for the instances with a variable consolidation ratio and a variable datacenter size, respectively.

Figure 5: Practical cost of the reconfiguration plans with variable consolidation ratios. (a) Number of actions; (b) estimated application duration.

Figure 6: Practical cost of the reconfiguration plans with variable datacenter sizes. (a) Number of actions; (b) estimated application duration.

We first observe that the gain of the repair mode over the rebuild mode is substantial in practice. Indeed, for the biggest problems, the estimated application duration is divided by up to 2. With faster reconfiguration plans, Entropy will be more reactive: it will quickly fix performance issues while being able to manage VMs with frequent resource-requirement variations. We also observe a significant reduction in the number of actions to execute: in practice, fewer migrations are then executed in the repair mode. This improvement reduces the temporary performance degradation for the VMs involved in migrations while saving network bandwidth. Finally, we observe that the practical quality of the reconfiguration plans is mostly equivalent when the side constraints are considered. For the instance set x4, only 6 instances have a reconfiguration plan made up of more actions when the side constraints are considered. This is explained by the VM placement heuristic in the repair mode: the search heuristic computes that about half of the servers are candidates to host VMs with a guarantee that their associated actions are scheduled without any delay.

3 Conclusions

The modifications we performed on Entropy lead to major improvements in our results. With respect to the published results, Entropy is now capable of solving, up to ten times faster, instances that can be harder to solve, while providing better reconfiguration plans. Note that our optimizations do not change the theoretical complexity of the problem: the solving duration is still exponential in the size of the instances and the consolidation ratio. However, we have shifted the solving capability of Entropy to a point where the current instances are no longer complex and big enough to reveal the limits of our implementation.

References

[1] C. Bessiere, E. Hebrard, B. Hnich, and Z. Kiziltan. Among, common and disjoint constraints. In CSCLP: Recent Advances in Constraints, volume 3978 of LNCS, pages 29–43. Springer, 2006.

[2] R. Bolze, F. Cappello, E. Caron, M. Daydé, F. Desprez, E. Jeannot, Y. Jégou, S. Lanteri, J. Leduc, N. Melab, G. Mornet, R. Namyst, P. Primet, B. Quetier, O. Richard, E.-G. Talbi, and I. Touche. Grid’5000: A large scale and highly reconfigurable experimental grid testbed. Int. J. High Perform. Comput. Appl., 20:481–494, November 2006.

[3] F. Hermenier, S. Demassey, and X. Lorca. Bin repacking scheduling in virtualized datacenters. In J. Lee, editor, Principles and Practice of Constraint Programming – CP 2011, volume 6876 of Lecture Notes in Computer Science, pages 27–41. Springer Berlin / Heidelberg, 2011.
