Reinforcement Learning for Capacity Tuning of Multi-core Servers

Liat Ein-Dor (Intel Research Israel Labs)
Yossi Ittach (Intel Research Israel Labs)
Aharon Bar-Hillel (Intel Research Israel Labs)
Amir Di-Nur (Intel IT)
Ran Gilad-Bachrach (Intel Research Israel Labs)

Abstract

The computing power of mainstream computers keeps growing at an exponential rate. However, progress in serial computing power has slowed down, and much of the overall progress is now achieved by making parallel computing mainstream. Indeed, dual-core and quad-core computers are already widespread, and 80-core CPUs have already been demonstrated. At the same time, more and more software products exploit this parallel computing power through multi-tasking, threading, and other techniques. In this setting, the question of how many jobs a server should run concurrently becomes more involved, and the optimal number depends in a non-trivial way on the nature of the jobs and on the exact optimality criterion. We claim here that reinforcement learning techniques can be used to continuously tune the number of running jobs, which we term the server's capacity, leading to significant performance improvements.

A server's performance can be measured in several dimensions; the most important quantities are throughput, the number of successful job executions per time unit, and average job slowdown, the time it takes a job to complete compared to the time it would have taken running alone on a dedicated machine. Increasing throughput and reducing slowdown are often conflicting aims [3], since running more jobs in parallel increases throughput but also increases the time it takes to finish each job. The default manually-tuned policy used in our organization ignores this trade-off and simply sets the number of concurrently running jobs equal to the number of cores the server has. We suggest replacing this policy with a dynamic learning agent that optimizes both quantities, with the trade-off between them determined by the system administrator.

An important observation for the capacity-tuning problem is that a static policy, which does not respond to changes in job types or machine loads, can only reach sub-optimal resource utilization. For example, two heavy jobs with high memory usage are often run in parallel, causing memory swaps that reduce throughput dramatically. Similarly, running multi-core-aware jobs together, each of which fully utilizes the machine's resources when run alone, increases job slowdown compared with sequential runs, without any gain in throughput. The opposite situation occurs for light jobs with low CPU and memory requirements: many such jobs can be packed together with almost no impact on slowdown and a significant improvement in throughput.

We pose the problem as an MDP with three possible actions for the agent: accept an additional job, resubmit a running job (for later execution), and keep the current situation.

The agent is activated at fixed time intervals, and the state space it monitors is a continuous vector in R^7, including measurements such as free memory, machine load, system/idle time, and the number of currently running jobs. To compute the state's reward, we measure the CPU-utilization fractions of all running jobs over the last interval.
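For concreteness, the following is a minimal sketch of how the agent's action set and 7-dimensional state could be represented. Only four of the seven state features are named in the text, so the remaining field names here are hypothetical placeholders, not features taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    """The three MDP actions available to the capacity-tuning agent."""
    ACCEPT = 0     # accept an additional job
    RESUBMIT = 1   # resubmit a running job for later execution
    KEEP = 2       # keep the current situation

@dataclass
class ServerState:
    """Continuous state vector in R^7, sampled at each activation interval.

    The first four fields are measurements named in the text; the last
    three are hypothetical examples standing in for the remaining features.
    """
    free_memory: float
    machine_load: float
    system_idle_time: float
    num_running_jobs: float
    swap_activity: float         # assumed feature
    mean_cpu_utilization: float  # assumed feature
    queue_length: float          # assumed feature

    def as_vector(self) -> list:
        return [self.free_memory, self.machine_load, self.system_idle_time,
                self.num_running_jobs, self.swap_activity,
                self.mean_cpu_utilization, self.queue_length]
```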

Figure 1 (left): The advantage of ML-tuned servers: statistics of the throughput improvement obtained by fitted Q-iteration in 10-day experiments with real workloads. Improvement duration measures the fraction of time in which the ML-tuned machines were superior to the manually-tuned machines.

    Improvement   2-core   4-core
    Mean          6.7%     8.6%
    Median        2.4%     6.3%
    Duration      80%      99%

Figure 1 (right): The role of α: the trade-off between slowdown (X-axis) and workload duration (1/throughput, Y-axis) as α varies. Results were obtained using a simulated workload with the fitted Q-iteration algorithm.

The reward is defined based on the α-Minkowski norm of the resulting utilization vector. For α = 1 this reward is just the total fraction of CPU time used, but for α > 1 states with a high concentration of CPU resources on a single job are preferred. Changing α thus shifts the emphasis between throughput enhancement (requiring high overall CPU utilization, and hence low α values) and slowdown reduction (requiring CPU concentration on a single job).
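Since the text states that for α = 1 the reward reduces to the total fraction of CPU time used, a natural reading is the Minkowski norm ||u||_α = (Σ_i u_i^α)^(1/α) of the per-job utilization vector u. Below is a minimal sketch under that reading; the handling of an empty job list is our own assumption.

```python
def capacity_reward(cpu_fractions, alpha=1.0):
    """alpha-Minkowski norm of the per-job CPU-utilization vector.

    cpu_fractions: CPU-utilization fraction of each running job over the
        last agent interval (e.g. 0.5 = the job used half the machine's CPU).
    alpha == 1 rewards total CPU utilization (throughput-oriented);
    alpha > 1  rewards concentrating CPU on few jobs (slowdown-oriented).
    """
    if not cpu_fractions:  # no running jobs: assumed zero reward
        return 0.0
    return sum(u ** alpha for u in cpu_fractions) ** (1.0 / alpha)
```

For example, the utilization vectors [0.5, 0.5] and [1.0] both give reward 1.0 at α = 1, while at α = 2 the concentrated case scores 1.0 against roughly 0.71 for the split case, matching the intended preference for CPU concentration at higher α.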

We learned control policies for capacity tuning using three reinforcement learning algorithms, representing different approaches: Hidden Markov Model (HMM) + Linear Programming (LP) [1], online learning with TD-λ [4], and fitted Q-iteration with Parzen-window regression [2]. Experiments on a small grid using a simulated workload showed that all three algorithms were able to learn policies which outperform the manual policy, both in terms of reward and of server throughput.
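As an illustration of the third approach, here is a compact sketch of batch fitted Q-iteration in the spirit of [2], with a Gaussian Parzen-window (Nadaraya-Watson) regressor standing in for the paper's regression model. The kernel choice, bandwidth, discount factor, and per-action regression scheme are our assumptions, not details taken from the abstract.

```python
import numpy as np

def parzen_regress(X_train, y_train, X_query, bandwidth=1.0):
    """Nadaraya-Watson estimate: Gaussian-kernel weighted average of y_train."""
    # Squared distances between every query point and every training point.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return (w @ y_train) / np.maximum(w.sum(axis=1), 1e-12)

def fitted_q_iteration(transitions, n_actions=3, n_iters=50, gamma=0.95, bandwidth=1.0):
    """Batch fitted Q-iteration over logged (state, action, reward, next_state) tuples.

    transitions: list of (s, a, r, s_next), with s and s_next 1-D feature arrays
    (e.g. the 7-dimensional server state). Returns q(state) -> per-action values.
    """
    S = np.array([t[0] for t in transitions], dtype=float)
    A = np.array([t[1] for t in transitions])
    R = np.array([t[2] for t in transitions], dtype=float)
    S_next = np.array([t[3] for t in transitions], dtype=float)

    targets = R.copy()  # Q_0 regression targets are the immediate rewards
    for _ in range(n_iters):
        # Evaluate the current Q estimate at every next state, for every action,
        # by regressing the targets of the samples that took that action.
        q_next = np.stack([
            parzen_regress(S[A == a], targets[A == a], S_next, bandwidth)
            if np.any(A == a) else np.zeros(len(S_next))
            for a in range(n_actions)
        ], axis=1)
        targets = R + gamma * q_next.max(axis=1)  # Bellman backup

    def q(state):
        s = np.atleast_2d(np.asarray(state, dtype=float))
        return np.array([
            parzen_regress(S[A == a], targets[A == a], s, bandwidth)[0]
            if np.any(A == a) else 0.0
            for a in range(n_actions)
        ])
    return q
```

A greedy capacity-tuning policy would then pick int(np.argmax(q(state))) at every activation interval.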

The impact of the reward parameter α was tested empirically, and varying it enabled us to produce a trade-off curve between throughput and average job slowdown (see Figure 1, right).

The fitted Q-iteration algorithm performed slightly better than its competitors, and was hence used in further experiments conducted on a live grid with real jobs. In these experiments the learning agent was tested on 2-core and 4-core machines, and achieved stable and significant throughput enhancements of 6-9% over manually-tuned servers (see Figure 1, left). Such an enhancement implies a possible annual saving of dozens of millions of dollars for organizations employing large grids or batch systems.

References

[1] L. Chrisman. Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In National Conference on Artificial Intelligence, pages 183-188, 1992.

[2] D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research (JMLR), 6:503-556, 2005.

[3] A. Snavely and J. Kepner. Is 99% utilization of a supercomputer a good thing? In ACM/IEEE Conference on Supercomputing, page 37, New York, NY, USA, 2006. ACM Press.

[4] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
