Using Task Load Tracking to Improve Kernel Scheduler Load Balancing Linux Foundation Collaboration Summit 2013 Morten Rasmussen


Task Load Tracking - Introduction

- What is it and why is it necessary?
  - Implements load-tracking on a per-sched_entity basis. Introduced by PJT to enable bottom-up load-computation which improves fair group scheduling.
  - Not currently used for load-balancing, but it has good potential.
  - Included from Linux 3.8.
- Power-aware scheduling
  - Largely non-existing. SCHED_MC is long gone and didn't do the job.
  - Load-balancing is based on task load-weight which is currently a static value from a look-up table indexed by the task priority.
    - No distinction between tasks with different behaviour.
    - The scheduling policy is to spread tasks for best performance.

Agenda

- Task Load Tracking (TLT) overview
- Proposed scheduler improvements:
  - Packing Small Tasks
  - Heterogeneous Systems: obtaining consistent power and performance.
- TLT observations and open issues:
  - Frequency scaling implications.
  - Interactions with middleware.
  - More aggressive task packing.

Task Load Tracking – Maths 1

[Figure: three panels plotted against Time [ms], covering the last 64 ms up to "Now". Runnable History (runnable time per ms [us]) multiplied by the Weight Series (weight, 0 to 1) gives the Weighted History (weighted contribution). sum() of the weighted history: Runnable avg. sum = 3856]
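The weighting shown in the figure can be sketched in a few lines of C. This is a minimal floating-point illustration, not the kernel's fixed-point code. The decay factor is an assumption taken from the kernel's per-entity load tracking (y chosen so that y^32 = 0.5); it is not stated on the slide. The function name and the array layout are hypothetical.

```c
#include <stddef.h>

/* Assumption: y = 2^(-1/32), so a contribution halves every 32 periods. */
#define TLT_DECAY_Y 0.97857206

/*
 * history[0] is the most recent 1 ms period ("Now"), history[n - 1] the
 * oldest. Each entry is the runnable time in that period, in microseconds.
 */
double runnable_avg_sum(const unsigned int *history, size_t n)
{
    double weight = 1.0;   /* weight series: 1, y, y^2, ... */
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        sum += history[i] * weight;   /* weighted contribution */
        weight *= TLT_DECAY_Y;
    }
    return sum;
}
```

Recent periods dominate the sum, while a period 32 ms in the past counts for about half as much as the current one.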

Task Load Tracking – Maths 2

[Figure: the same three panels (Runnable History, Weight Series, Weighted History) against Time [ms], with the runnable history marked "Before" and shifted to "Now*", 30 ms later. sum() of the weighted history: Runnable avg. sum = 2010]
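The drop from 3856 to 2010 can be approximated by decaying the old sum once per elapsed period. The sketch below assumes that the task was not runnable during the 30 ms and that the decay factor is the kernel's y with y^32 = 0.5; neither is stated on the slide. The function name is hypothetical.

```c
/* Assumption: y = 2^(-1/32), the same half-life of 32 periods. */
#define IDLE_DECAY_Y 0.97857206

/* Decay an accumulated sum over `periods` idle 1 ms periods. */
double decay_runnable_sum(double sum, unsigned int periods)
{
    while (periods--)
        sum *= IDLE_DECAY_Y;
    return sum;
}
```

Under these assumptions, decaying 3856 over 30 periods gives a value of about 2013, which is in the same range as the 2010 shown on the slide.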

Task Load Tracking – load_avg_contrib

- Task load contribution (load_avg_contrib) is based on:
  - Runnable avg. sum (runnable history)
  - Runnable avg. period (task life time)
    - Only has an effect early in the task's life.
  - Task priority (niceness)
    - load_contrib is scaled by load.weight, which is determined using a table in kernel/sched/sched.h:

    static const int prio_to_weight[40] = {
     /* -20 */     88761,     71755,     56483,     46273,     36291,
     /* -15 */     29154,     23254,     18705,     14949,     11916,
     /* -10 */      9548,      7620,      6100,      4904,      3906,
     /*  -5 */      3121,      2501,      1991,      1586,      1277,
     /*   0 */      1024,       820,       655,       526,       423,
     /*   5 */       335,       272,       215,       172,       137,
     /*  10 */       110,        87,        70,        56,        45,
     /*  15 */        36,        29,        23,        18,        15,
    };

- Result: 0 <= load_avg_contrib <= load.weight
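A minimal sketch of how the three inputs could combine into load_avg_contrib. The formula (runnable sum times weight, divided by the runnable period plus one) is an assumption modelled on the Linux 3.8 implementation, not something spelled out on the slide. The function signature is hypothetical.

```c
#include <stdint.h>

/*
 * runnable_avg_sum:    decayed runnable time of the task
 * runnable_avg_period: decayed time since the task was created
 * weight:              prio_to_weight[] entry for the task's nice level
 *
 * The "+ 1" avoids a division by zero for a brand-new task.
 * Since sum <= period, the result never exceeds weight.
 */
uint64_t load_avg_contrib(uint32_t runnable_avg_sum,
                          uint32_t runnable_avg_period,
                          uint32_t weight)
{
    uint64_t contrib = (uint64_t)runnable_avg_sum * weight;

    return contrib / ((uint64_t)runnable_avg_period + 1);
}
```

For a nice = 0 task (weight 1024), a task that is runnable about half of the time ends up with a contribution of roughly 512.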

Task Load Tracking – Task Profiles

- Examples of nice = 0 tasks:

[Figure: two example profiles, "Long-running task" and "Short periodic task". Each shows Task Runnable History (Runnable Time per ms [us]) and Task Load Contrib. (Load Contrib., 0 to 1200) against Time [ms].]

Saving Power: Packing Small Tasks

- Small task
  - A periodic task with short execution time.
  - Disturbs cpus in deep sleep which is bad for power.
  - Can easily be identified using task load_avg_contrib.
- Proposed solution
  - Pack these tasks on as few cpus in as few power domains as possible:
    - Remaining cpus sleep longer.
    - Deeper sleep states can be reached by remaining cpus.
    - Performance impact should be none or negligible.
  - Patch set by Vincent Guittot, Linaro:
    - [RFC PATCH v3 0/6] sched: packing small tasks

Packing Small Tasks implementation

- The basics:
  - Introduces a SD_SHARE_POWERDOMAIN sched_domain flag.
  - Selects a packing buddy for each cpu which is the migration target for small tasks.
  - Checks waking task's tracked load against 20% (of load.weight) threshold.
    - If below and packing buddy is not busy, migrate the task.
    - Else, do normal wake-up balancing.
- Buddy selection algorithm

[Figure: buddy selection example listing the power domain, cpu and packing buddy for each cpu.]
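The wake-up decision described above can be sketched as follows. This is an illustration of the rule on the slide, not code from the patch set. All function and parameter names are hypothetical; only the 20% threshold comes from the slide.

```c
#include <stdbool.h>

#define SMALL_TASK_PCT 20   /* threshold quoted on the slide */

/* A task is "small" if its tracked load is below 20% of its load.weight. */
bool is_small_task(unsigned long load_avg_contrib, unsigned long load_weight)
{
    return load_avg_contrib * 100 < load_weight * SMALL_TASK_PCT;
}

/*
 * Pick a cpu for a waking task. Returns buddy_cpu when the task should be
 * packed, or -1 to tell the caller to fall back to normal wake-up balancing.
 */
int select_packing_cpu(unsigned long load_avg_contrib,
                       unsigned long load_weight,
                       int buddy_cpu, bool buddy_busy)
{
    if (is_small_task(load_avg_contrib, load_weight) && !buddy_busy)
        return buddy_cpu;
    return -1;
}
```

For a nice = 0 task (load.weight 1024) the threshold works out to a tracked load of about 204.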

Packing Small Tasks results

- Evaluation platform: ARM TC2 (2xCortex-A15 + 3xCortex-A7)
- Results from Vincent's patch set.
- MP3 playback
  - MP3 playback on Ubuntu
  - 36% less energy
- Hackbench
  - No performance regressions

    Hackbench   Avg.    St.dev.
    3.9-rc2     2.048   0.047
    +patches    2.015   0.068

[Figure: MP3 playback bar chart of normalized energy (0 to 200) for CA7 and CA15, comparing the "default" and "pack" schedulers.]

Saving Power: Heterogeneous MPs

- Heterogeneous Multi-processors:
  - Contains cpus with different power/performance characteristics:
    - Power-efficient
    - High performance
  - Example: ARM big.LITTLE
- Informed decisions about task placement are crucial to exploit the full potential of heterogeneous systems.
  - Don't use high performance cpus unnecessarily.
  - Ensure that demanding tasks are always running on high performance cpus.
- Proposed solution
  - Use TLT to select an appropriate cpu for each task.

Scheduling on Heterogeneous MPs

- Observation:
  - Long-running tasks
    - load_avg_contrib: High
    - Significant performance benefit from executing on high performance cpus.
  - Small tasks
    - load_avg_contrib: Low
    - Short execution time means limited performance impact by executing on power efficient cpus.
  - Most real-world tasks fall into these two categories.
  - TLT handles tasks with changing behaviour.
- Goal:
  - Good default scheduler behaviour on heterogeneous MPs.
  - It will never be perfect.

[Figure: Task Load Contrib. (Load Contrib., 0 to 1200) against Time [ms], one plot for a long-running task and one for a small task.]

Improving Scheduling on Heterogeneous MPs

- Small tasks
  - Already identified and handled in the pack small tasks patch set.
  - Change packing buddy to be a power-efficient cpu.
- Long-running tasks
  - CFS load-balancing must use TLT (load_avg_contrib and cfs.runnable_load_avg) instead of load.weight in order to correctly identify these tasks.
    - Alex Shi, Intel, and Preeti U Murthy, IBM, have experimented with this already.
  - If one/few long-running tasks, they must be actively migrated to high-performance cpus.
  - If many long-running tasks, spread them across all cpus to get highest throughput.

Proposed Solution for Heterogeneous MPs

- Use cpu_power to represent compute capacity.
  - Assume low cpu_power cpus to be more power-efficient.
  - ARM big.LITTLE example: Cortex-A7 = 606, Cortex-A15 = 1442
- Compare cpu load to cpu_power to find overloaded low cpu_power cpus during periodic load-balance.
  - Offload tasks to high cpu_power cpus.
- Maximize throughput
  - Let idle low cpu_power cpus take long-running tasks from high cpu_power cpus when these are overloaded.
- RFC Patch set:
  - Vincent Guittot, Linaro, and Morten Rasmussen, ARM: [RFC PATCH 0/2] sched: Task placement on mixed cpu_power systems
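A rough sketch of the load versus cpu_power comparison. The exact criterion used in the RFC patch set is not given on the slides, so the simple "load greater than cpu_power" test below is an assumption. It also assumes that cpu load and cpu_power share the same scale, where a nice = 0 task that is always runnable contributes 1024. Function names are hypothetical; the two cpu_power values are the ones from the slide.

```c
#include <stdbool.h>

#define CPU_POWER_A7   606   /* Cortex-A7, from the slide */
#define CPU_POWER_A15 1442   /* Cortex-A15, from the slide */

/* Assumed criterion: a cpu is overloaded when its tracked load exceeds its cpu_power. */
bool cpu_overloaded(unsigned long cpu_load, unsigned long cpu_power)
{
    return cpu_load > cpu_power;
}

/* Offload from src to dst only if src is overloaded and dst has more capacity. */
bool should_offload(unsigned long src_load, unsigned long src_power,
                    unsigned long dst_power)
{
    return cpu_overloaded(src_load, src_power) && dst_power > src_power;
}
```

With these numbers, a single always-running nice = 0 task (load 1024) overloads a Cortex-A7 but not a Cortex-A15, so it would be a candidate for moving to the high cpu_power cpu.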

Mixed cpu_power patch set results

- Evaluation system: ARM TC2 big.LITTLE

[Figure: two bar charts of benchmark score (hackbench, sysbench_1t, sysbench_2t, sysbench_5t, cyclictest) for 3.9-rc2, +shi and +shi+patches. One chart is "Heterogeneous MP: ARM 2xCA15 + 3xCA7", the other "SMP: ARM 2xCA15". Bars are annotated with differences of -1%, 34%, 15%, 11% and 4%.]

TLT and Frequency Scaling

- Observation
  - There is no link between cpufreq and the scheduler.
  - TLT is based solely on runqueue residency and is therefore relative to the cpu compute capacity at the current frequency, not the potential compute capacity at higher frequencies.
  - Tasks appear to cause more load at lower frequencies, so the load of the cpu is overestimated.
- Proposed solution:
  - Make TLT frequency invariant by scaling load contribution by freq/max_freq.
    - Requires cpufreq-scheduler callback to pass frequency information.
    - RFC patch set in development.
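The proposed scaling can be illustrated with a small helper that is applied to the runnable time of each period before it is accumulated. This is a sketch of the idea only. The function name, the kHz units and the fallback for missing frequency information are assumptions, not part of the RFC.

```c
#include <stdint.h>

/*
 * Scale the runnable time accrued in one period by cur_freq / max_freq,
 * so that time spent running at half speed counts as half the load.
 */
uint32_t scale_runnable_delta(uint32_t delta_us,
                              uint32_t cur_freq_khz,
                              uint32_t max_freq_khz)
{
    if (max_freq_khz == 0)
        return delta_us;   /* no frequency information: leave unscaled */

    return (uint32_t)(((uint64_t)delta_us * cur_freq_khz) / max_freq_khz);
}
```

A task that is runnable for 1000 us while the cpu runs at half of its maximum frequency would then be accounted as 500 us, the same as if it had run at full speed for half the time.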

Interaction with middleware

- TLT cannot provide all the information needed to make good scheduling decisions for all applications.
- Information about task importance is missing
  - Some long-running tasks may not require high performance.
- Task dependency information is missing
  - A long-running task may depend on small tasks.
- Should the scheduler care about this or leave it to middleware?

More aggressive task packing

- Some tasks don't fit into the long-running task and small task categories.
  - These can potentially be packed too.

[Figure: Task Load Contrib. (Load Contrib., 0 to 1200) against Time [ms].]

- Hard to distinguish medium tasks from tasks transitioning from small to long-running or vice versa.
  - Packing a transitioning task is an unnecessary migration.
- Further investigation required.

Conclusion

- TLT can easily be extended to improve power and performance.
- More work is needed on:
  - Frequency scaling
  - Interaction with middleware
  - Better task packing

Questions?

- Thanks for listening.
