Adaptive Correction of Sampling Bias in Dynamic Call Graphs
Byeongcheol Lee
Gwangju Institute of Science and Technology

January 19, 2016

This talk is based on ACM Transactions on Architecture and Code Optimization, Vol. 12, No. 4, Article 45 (December 2015).


Profiling dynamic call graphs

[Figure: example DCG in which main calls foo 12 times and bar 12 times.]

- DCG g = (N, E, freq)
  - N: a set of procedures
  - E: a set of caller-callee relationships
  - freq: a function mapping caller-callee pairs to frequencies
  - Concise frequency statistics of the call events in a program run
- Clients
  - Manual offline analysis
    - Examine performance bottlenecks
    - Collect exact offline DCGs
    - gprof [Graham et al. ’82]
  - Automatic online analysis and optimization
    - Java virtual machines [Arnold et al. ’00, Nakaike et al. ’14]
    - Collect approximate online DCGs
    - Aggressive adaptive inlining
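The triple g = (N, E, freq) can be represented, as a minimal sketch, by a table of caller-callee edges that each carry a frequency. The integer procedure IDs, the fixed capacity MAX_EDGES, and the names Edge, Dcg, dcg_find, and dcg_add are hypothetical choices for illustration, not taken from gprof or any JVM.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_EDGES 64   /* hypothetical fixed capacity */

/* One element of E together with its freq value. */
typedef struct {
    int caller;        /* procedure ID in N */
    int callee;        /* procedure ID in N */
    double freq;       /* freq(caller, callee) */
} Edge;

typedef struct {
    Edge edges[MAX_EDGES];
    size_t n;          /* number of edges currently in E */
} Dcg;

/* Return the edge caller->callee, or NULL if it is not in E. */
Edge *dcg_find(Dcg *g, int caller, int callee) {
    for (size_t i = 0; i < g->n; i++)
        if (g->edges[i].caller == caller && g->edges[i].callee == callee)
            return &g->edges[i];
    return NULL;
}

/* Add w to freq(caller, callee), inserting the edge into E if needed.
   Returns false only when the edge table is full. */
bool dcg_add(Dcg *g, int caller, int callee, double w) {
    Edge *e = dcg_find(g, caller, callee);
    if (e == NULL) {
        if (g->n == MAX_EDGES) return false;
        e = &g->edges[g->n++];
        e->caller = caller;
        e->callee = callee;
        e->freq = 0.0;
    }
    e->freq += w;
    return true;
}
```

A double-valued freq lets the same table hold weighted, non-integer frequencies such as those a sampling profiler accumulates; a linear search is adequate for a handful of edges, while a production profiler would more likely use a hash table.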


Accuracy-overhead tradeoffs in profiling DCGs

[Figure: overhead (%, 0 to 25) versus accuracy (%, 40 to 100) for full instrumentation [Graham et al. ’82], timer-based sampling [Arnold et al. ’00] [Arnold & Grove ’05], and adaptive error correction (this talk).]


Outline

- Introduction
- Background on profiling dynamic call graphs
  - Full instrumentation
  - Timer-based sampling
- Sampling bias
- Adaptive correction
- Evaluation
- Conclusion


Profiling exact dynamic call graphs [Graham et al. ’82]

    void foo(void);
    void bar(void);

    int main(void) {
        int i;
        for (i = 0; i < 12; i++)
    A:      foo();              /* call site A: main -> foo */
        for (i = 0; i < 12; i++)
    B:      bar();              /* call site B: main -> bar */
        return 0;
    }

    void foo(void) {}
    void bar(void) {}


Generating instrumented programs

Instrumentation adds a call to update() at the entry of foo and bar, and a call to report() at the end of main:

    void foo(void);
    void bar(void);
    void update(void);   /* profiler runtime: record the current call event */
    void report(void);   /* profiler runtime: write the DCG to a file */

    int main(void) {
        int i;
        for (i = 0; i < 12; i++)
    A:      foo();
        for (i = 0; i < 12; i++)
    B:      bar();
        report();
        return 0;
    }

    void foo(void) { update(); }
    void bar(void) { update(); }


Running instrumented programs

An activation tree and its call events:

    main()
      A: foo()
           update()     // records the call event from main to foo (A)
      A: foo()
           update()     // records the call event from main to foo (A)
      ...
      B: bar()
           update()     // records the call event from main to bar (B)
      B: bar()
           update()     // records the call event from main to bar (B)
      ...
      report()          // stores the DCG into a file ("gmon.out")

Resulting DCG: main -> foo (A: 12), main -> bar (B: 12)

A sequence of recorded call events:
A A A A A A A A A A A A B B B B B B B B B B B B
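The exact counts above can be reproduced with a small simulation of the instrumented program. This is a sketch under a simplifying assumption: the hypothetical update(site) receives the call-site label explicitly, whereas gprof-style instrumentation identifies caller and callee at run time.

```c
#include <stdio.h>

enum { SITE_A, SITE_B, NUM_SITES };   /* call sites A (main->foo), B (main->bar) */

static long freq[NUM_SITES];           /* exact DCG: one counter per call edge */

/* Hypothetical instrumentation hook: record one call event at a call site. */
static void update(int site) { freq[site]++; }

/* Instrumented callees: every call is recorded. */
static void foo(void) { update(SITE_A); }
static void bar(void) { update(SITE_B); }

/* The example program: 12 calls through A, then 12 through B. */
void run_program(void) {
    for (int i = 0; i < 12; i++) foo();
    for (int i = 0; i < 12; i++) bar();
}

/* Print the DCG (a stand-in for writing gmon.out). */
void report(void) {
    printf("main -> foo (A): %ld\n", freq[SITE_A]);
    printf("main -> bar (B): %ld\n", freq[SITE_B]);
}
```

Calling run_program() and then report() prints 12 for both edges, because every one of the 24 call events passes through update().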


Timer-based sampling [Arnold et al. ’00]

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <unistd.h>                      /* sleep() */

    void foo(void);
    void bar(void);
    void report(void);
    void start_thread(void (*fn)(void));     /* runs fn in a new thread */

    atomic_bool takeSample = false;          /* set by the timer, cleared by update() */

    void timer_ticks(void) {
        while (true) {
            sleep(INTERVAL);                 /* INTERVAL: the sampling period */
            takeSample = true;
        }
    }

    void update(void) {
        /* ... update DCG ... */
        takeSample = false;
    }

    int main(void) {
        int i;
        start_thread(timer_ticks);
        for (i = 0; i < 12; i++)
    A:      foo();
        for (i = 0; i < 12; i++)
    B:      bar();
        report();
        return 0;
    }

    void foo(void) { if (takeSample) update(); }
    void bar(void) { if (takeSample) update(); }

Sampling approximated dynamic call graphs

An activation tree and its call events:

    main()
      foo()
        update()        // records the call event from main to foo (A)
      foo()
      foo()
        update()        // records the call event from main to foo (A)
      ...
      foo()
      bar()
        update()        // records the call event from main to bar (B)
      bar()
      bar()
        update()        // records the call event from main to bar (B)
      ...

Resulting DCG: main -> foo (6), main -> bar (6)

A sequence of recorded call events:
A A A A A A A A A A A A B B B B B B B B B B B B
6 samples of A and 6 samples of B
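The takeSample handshake between the timer thread and update() can be made deterministic for illustration by replacing the timer thread with a simulated clock. The assumptions here, which are not from the talk, are that each call costs one time unit and that the timer fires every INTERVAL = 2 units; under them, the sketch records 6 samples of A and 6 of B, as on the slide.

```c
#include <stdbool.h>

#define INTERVAL 2                 /* hypothetical timer period, in time units */

enum { SITE_A, SITE_B, NUM_SITES };

static bool takeSample = false;    /* set by the timer, cleared by update() */
static long sampled[NUM_SITES];    /* approximate DCG: samples per call site */
static long clock_now = 0;         /* simulated time */

/* Simulated timer: set the flag whenever a full period has elapsed. */
static void advance_clock(void) {
    clock_now++;
    if (clock_now % INTERVAL == 0)
        takeSample = true;
}

/* Record the sampled call event and clear the flag. */
static void update(int site) {
    sampled[site]++;
    takeSample = false;
}

/* Each call costs one time unit, then checks the flag on entry. */
static void call(int site) {
    advance_clock();
    if (takeSample) update(site);
}

void run_sampled_program(void) {
    for (int i = 0; i < 12; i++) call(SITE_A);   /* A: foo() */
    for (int i = 0; i < 12; i++) call(SITE_B);   /* B: bar() */
}
```

Because the events and the ticks are both equally spaced here, every second call is sampled and the two sites receive equal counts.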

Ideal completely fair sampling

Equally spaced call events:
A A A A A A A A A A A A B B B B B B B B B B B B
Equally spaced sampling activities

freq(A) = 1 + 1 + 1 + 1 + 1 + 1 = 6
freq(B) = 1 + 1 + 1 + 1 + 1 + 1 = 6
freq(A) = freq(B)


Sampling errors from unequally spaced events

Unequally spaced call events:
  Dense events:  AAAAAAAAAAAA
  Sparse events: B B B B B B B B B B B B
Equally spaced sampling activities

freq(A) = 1 + 1 + 1 + 1 = 4
freq(B) = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 8
freq(A) ≠ freq(B)


Sampling errors from unequally spaced sampling activities

Equally spaced call events:
A A A A A A A A A A A A B B B B B B B B B B B B
Unequally spaced sampling activities: sparse sampling activities, then dense sampling activities

freq(A) = 1 + 1 + 1 + 1 = 4
freq(B) = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 8
freq(A) ≠ freq(B)
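Both kinds of sampling error can be reproduced on a timeline. In this sketch a timer tick sets the flag and the next call event at or after the tick consumes it, as in the takeSample code. The concrete times are assumptions chosen to match the pictures: for unequally spaced events, A events 1 unit apart and B events 2 units apart with a tick every 3 units; for unequally spaced sampling activities, events 1 unit apart with ticks every 3 units during A and every 1.5 units during B. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum { SITE_A, SITE_B, NUM_SITES };   /* call sites A (main->foo), B (main->bar) */

/* Fill out[0..n) with start, start + step, start + 2*step, ... */
void fill_spaced(double *out, size_t n, double start, double step) {
    for (size_t i = 0; i < n; i++)
        out[i] = start + (double)i * step;
}

/* Simulate timer-based sampling on a timeline.
   event_t[] (sorted) holds call-event times and site[] their call sites;
   tick_t[] (sorted) holds timer-tick times. A tick sets the flag, and the
   first event at or after it consumes the flag and is sampled. Several
   ticks before the same event collapse into one sample, as with a single
   boolean flag. counts[] receives the number of samples per call site. */
void sample_timeline(const double *event_t, const int *site, size_t n_events,
                     const double *tick_t, size_t n_ticks,
                     long counts[NUM_SITES]) {
    size_t k = 0;
    bool take_sample = false;
    for (int s = 0; s < NUM_SITES; s++)
        counts[s] = 0;
    for (size_t e = 0; e < n_events; e++) {
        while (k < n_ticks && tick_t[k] <= event_t[e]) {   /* timer fired */
            take_sample = true;
            k++;
        }
        if (take_sample) {                                  /* update() */
            counts[site[e]]++;
            take_sample = false;
        }
    }
}
```

With those inputs, sample_timeline yields freq(A) = 4 and freq(B) = 8 in both error cases, although each site has 12 calls; with equally spaced events and a tick every 2 units it yields 6 and 6, the fair case.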


Unequally weighting samples from irregularly spaced events

Unequally spaced call events:
  Dense events:  AAAAAAAAAAAA
  Sparse events: B B B B B B B B B B B B
Equally spaced sampling activities

The A events are twice as dense as the B events, so each A sample is weighted 2 and each B sample 1:
freq(A) = 2 + 2 + 2 + 2 = 8
freq(B) = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 8
freq(A) = freq(B)


Unequally weighting samples from irregular sampling activities

Equally spaced call events:
A A A A A A A A A A A A B B B B B B B B B B B B
Unequally spaced sampling activities: sparse sampling activities, then dense sampling activities

The first four sampling activities are half as dense as (twice as widely spaced as) the next eight, so each of the first four samples is weighted 2 and each of the next eight 1:
freq(A) = 2 + 2 + 2 + 2 = 8
freq(B) = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 8
freq(A) = freq(B)
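The two corrections can be expressed as a single weighting rule. As an assumption for this sketch, each sample's weight is the call-event density divided by the sampling density where the sample was taken, which equals the expected number of call events per sampling interval; the slides use the relative weights 2 and 1 instead, which give the same A:B ratio. The struct and function names are hypothetical.

```c
#include <stddef.h>

enum { SITE_A, SITE_B, NUM_SITES };

typedef struct {
    int site;                 /* call site the sample was attributed to */
    double event_density;     /* call events per time unit around the sample */
    double sample_density;    /* sampling activities per time unit around the sample */
} Sample;

/* Anti-bias weight: proportional to the event density and
   inversely proportional to the sampling density. */
double anti_bias_weight(const Sample *s) {
    return s->event_density / s->sample_density;
}

/* Accumulate weighted frequencies per call site. */
void weighted_freq(const Sample *samples, size_t n, double freq[NUM_SITES]) {
    for (int i = 0; i < NUM_SITES; i++)
        freq[i] = 0.0;
    for (size_t i = 0; i < n; i++)
        freq[samples[i].site] += anti_bias_weight(&samples[i]);
}
```

With the densities of the two cases (A events at 1 per unit and B events at 0.5 per unit with a tick every 3 units, or equally spaced events with ticks every 3 units during A and every 1.5 units during B), the four A samples weigh 3 each and the eight B samples weigh 1.5 each, so both weighted frequencies come to 12, the true number of calls at each site.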


Adaptive correction of sampling bias

- Compute anti-bias weights at each sampling activity
  - Proportional to the density of call events
  - Inversely proportional to the density of sampling activities
  - Use hardware performance counters (e.g., IA-32 HPM)
    - BR_INST_RETIRED.NEAR_CALL for counting call events
    - rdtsc for timing sampling activities
- Increment the DCG frequency of a sample by its anti-bias weight
- Straightforward implementation in JVMs (e.g., Jikes RVM)


Experimental setup

- Environment
  - Intel Xeon E5-2665 2.4 GHz
  - 16 GB DDR3-1500 main memory
  - 32-bit Ubuntu 12.04 LTS distribution
  - Linux 3.2.0-48 kernel
- Benchmarks
  - 2 microbenchmarks
  - 7 benchmarks from SPECjvm98
  - 11 benchmarks from DaCapo 2006-MR2
- Dynamic optimization system
  - Jikes RVM 3.1.3
  - Implementation of adaptive correction


Measuring overhead, accuracy, and performance

- Reducing nondeterministic results
  - Disable the adaptive optimization in Jikes RVM
  - Take the median of the measured values over 40 trials
- Overhead and accuracy
  - Opt0 methodology
  - O0 optimizing compiler at the first invocation
  - Profile DCGs that influence adaptive inlining
- Performance
  - Replay methodology of iterating applications twice
  - Use offline profiles and optimization advice
  - 1st iteration: compilation and application run
  - 2nd iteration: application run
  - Report the 2nd iteration
  - Estimate the steady-state performance
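Taking the median over 40 trials can be sketched as below. The slides do not say how the median of an even number of trials is defined; averaging the two middle values is an assumption here, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles in ascending order. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n > 0 measurements. Sorts values in place. For even n
   (such as 40 trials) it returns the mean of the two middle values. */
double median_inplace(double *values, size_t n) {
    qsort(values, n, sizeof *values, cmp_double);
    if (n % 2 == 1)
        return values[n / 2];
    return (values[n / 2 - 1] + values[n / 2]) / 2.0;
}
```

The median is preferred over the mean here because a few outlier runs, for example runs disturbed by background activity, do not shift it.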


Overhead

[Figure: normalized execution time (0.90 to 1.00) of sampling and adaptive correction for each benchmark, with the geometric mean.]


Accuracy

[Figure: overlap accuracy (%) of sampling and adaptive correction for each benchmark, with the average.]


Performance

[Figure: normalized execution time (0.80 to 1.00) of sampling and adaptive correction for each benchmark, with the geometric mean.]


Summary

- Profiling dynamic call graphs
  - Full instrumentation for exact profiles
  - Timer-based sampling for approximated profiles
- Inaccurate profiles from timer-based sampling
  - Unequal spacing of call events
  - Unequal spacing of sampling activities
- Adaptive correction
  - Measure the unequal spacing of events and sampling activities
  - Compute adjusted weight values at each timer tick
  - Weight each sample unequally
- Results
  - No measurable overhead
  - Significant accuracy improvement
  - Modest speedup in adaptive inlining


Backup slides


Computing anti-bias weights

- Measuring unequal spacing of events and sampling activities
  - t_1, t_2, ..., t_i, ... are the timer ticks in a sampling system
  - density(t_i) is the number of call events per CPU cycle at t_i
  - latency(t_i) is the sampling latency in CPU cycles at t_i
  - Use the hardware performance monitoring unit (PMU) to count events
  - Use CPU time-stamp counters to count CPU cycles
- Adaptive correction of sampling bias
  - Compute the weight at t_i:
    $$\mathit{weight}(t_i) = \frac{\mathit{density}(t_i)}{1 + \gamma \times \mathit{latency}(t_i)} \times 1000$$
  - Choose the constant γ such that weight(t_i) ranges from 0 to 1000
  - Weight each sample at t_i by weight(t_i)
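The weight formula can be sketched directly in C. The interval over which density(t_i) is measured is an assumption here: the sketch divides the change in a call-event counter by the change in a cycle counter between two readings, standing in for the PMU and TSC values. The function names are hypothetical.

```c
#include <stdint.h>

/* density(t_i): call events per CPU cycle between two counter readings.
   calls_* stand in for a PMU call counter, cycles_* for the TSC.
   Returns 0 when no cycles have elapsed. */
double call_density(uint64_t calls_prev, uint64_t calls_now,
                    uint64_t cycles_prev, uint64_t cycles_now) {
    uint64_t cycles = cycles_now - cycles_prev;
    if (cycles == 0)
        return 0.0;
    return (double)(calls_now - calls_prev) / (double)cycles;
}

/* weight(t_i) = density(t_i) / (1 + gamma * latency(t_i)) * 1000 */
double anti_bias_weight(double density, double latency_cycles, double gamma) {
    return density / (1.0 + gamma * latency_cycles) * 1000.0;
}
```

For example, 50 call events over 100 cycles give a density of 0.5; with γ = 0.01 (an arbitrary illustrative value) and zero latency the weight is 500, and a latency of 100 cycles halves it to about 250.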


Implementation in Jikes RVM 3.1.3

- Timer thread
  - At each timer tick, record the TSC for each application thread
- Application thread
  - Thread startup
    - Configure the PMU to count call instructions
  - Yield points
    - Compute the latency since the most recent timer tick
    - Compute the call event density and the weight
    - Enqueue the sample and its weight into the sampling buffer
- DCG construction thread
  - Dequeue call event samples and their weights
  - Increment the frequency of the call edges by the weights
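The hand-off between application threads and the DCG construction thread can be sketched as a bounded FIFO of (call edge, weight) samples. This single-threaded sketch omits the synchronization (locks or atomic operations) that a concurrent implementation would need, and the capacity constants, integer procedure IDs, adjacency-matrix DCG, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define BUF_CAP   64   /* hypothetical sampling-buffer capacity */
#define MAX_PROCS 8    /* hypothetical number of procedures */

typedef struct {
    int caller, callee;   /* sampled call edge */
    double weight;        /* anti-bias weight computed at the yield point */
} Sample;

typedef struct {
    Sample items[BUF_CAP];
    size_t head, tail, count;
} SampleBuffer;

/* Application-thread side: enqueue a sample and its weight.
   Returns false (dropping the sample) when the buffer is full. */
bool buffer_enqueue(SampleBuffer *b, Sample s) {
    if (b->count == BUF_CAP) return false;
    b->items[b->tail] = s;
    b->tail = (b->tail + 1) % BUF_CAP;
    b->count++;
    return true;
}

/* DCG-construction side: dequeue the oldest sample, if any. */
bool buffer_dequeue(SampleBuffer *b, Sample *out) {
    if (b->count == 0) return false;
    *out = b->items[b->head];
    b->head = (b->head + 1) % BUF_CAP;
    b->count--;
    return true;
}

/* Drain the buffer, incrementing each call edge's frequency by its weight.
   freq[caller][callee] is the DCG as an adjacency matrix. */
size_t build_dcg(SampleBuffer *b, double freq[MAX_PROCS][MAX_PROCS]) {
    Sample s;
    size_t n = 0;
    while (buffer_dequeue(b, &s)) {
        freq[s.caller][s.callee] += s.weight;
        n++;
    }
    return n;
}
```

Dropping samples when the buffer is full is only one possible policy; blocking the application thread or growing the buffer are alternatives with different overhead trade-offs.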


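Accuracy metric

Consider the exact DCG g_exact = (V_exact, E_exact, f_exact) and an approximate DCG g_sample = (V_sample, E_sample, f_sample). First, we normalize the frequency values:

$$w_{\text{sample}}(e) = \frac{f_{\text{sample}}(e)}{\sum_{e_i \in E_{\text{sample}}} f_{\text{sample}}(e_i)}, \quad e \in E_{\text{sample}}$$

$$w_{\text{exact}}(e) = \frac{f_{\text{exact}}(e)}{\sum_{e_i \in E_{\text{exact}}} f_{\text{exact}}(e_i)}, \quad e \in E_{\text{exact}}$$

Then, the accuracy is the sum of the minima of the normalized frequency values over the common call edges:

$$\mathit{accuracy}(g_{\text{sample}}) = \sum_{e \in E_{\text{sample}}} \min\big(w_{\text{sample}}(e), w_{\text{exact}}(e)\big)$$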
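The metric can be computed as follows. Matching call edges by their (caller, callee) pair is an assumption about how edges are identified, and a sampled edge absent from the exact DCG is treated as having w_exact = 0. The names are hypothetical.

```c
#include <stddef.h>

typedef struct { int caller, callee; double freq; } Edge;

/* Sum of all edge frequencies in a DCG (the normalizing denominator). */
static double total_freq(const Edge *edges, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += edges[i].freq;
    return sum;
}

/* Overlap accuracy: sum over sampled edges of min(w_sample(e), w_exact(e)),
   where w is an edge frequency divided by its DCG's total frequency. */
double dcg_accuracy(const Edge *sample, size_t ns, const Edge *exact, size_t ne) {
    double ts = total_freq(sample, ns), te = total_freq(exact, ne);
    if (ts <= 0.0 || te <= 0.0)
        return 0.0;
    double acc = 0.0;
    for (size_t i = 0; i < ns; i++) {
        double ws = sample[i].freq / ts, we = 0.0;
        for (size_t j = 0; j < ne; j++) {
            if (exact[j].caller == sample[i].caller &&
                exact[j].callee == sample[i].callee) {
                we = exact[j].freq / te;
                break;
            }
        }
        acc += (ws < we) ? ws : we;
    }
    return acc;
}
```

Applied to a five-edge example with f_exact = 300, 100, 100, 180, 20 and f_sample = 3, 1, 1 on the first three edges (encoded here as edges from procedure 0 to procedures 1 to 5), the function returns 500/700 ≈ 0.714. The metric is 1 exactly when the two normalized profiles coincide.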


Accuracy metric example

    Call edge   f_exact   w_exact   f_sample   w_sample
    e1            300      0.43        3         0.60
    e2            100      0.14        1         0.20
    e3            100      0.14        1         0.20
    e4            180      0.26        -          -
    e5             20      0.03        -          -
    total         700      1.00        5         1.00

$$\begin{aligned}
\mathit{accuracy}(g_{\text{sample}}) &= \sum_{e \in E_{\text{sample}}} \min\big(w_{\text{sample}}(e), w_{\text{exact}}(e)\big) \\
&= \min(0.43, 0.60) + \min(0.14, 0.20) + \min(0.14, 0.20) \\
&= 0.43 + 0.14 + 0.14 \\
&= 0.71
\end{aligned}$$
