Reducing OLTP Instruction Misses with Thread Migration
Islam Atta, Pınar Tözün, Anastasia Ailamaki, Andreas Moshovos
University of Toronto, École Polytechnique Fédérale de Lausanne

OLTP on an Intel Xeon 5660
Shore‐MT, Hyper‐threading disabled

[Figure: two bar charts for TPC‐C and TPC‐E. One shows instructions per cycle (axis 0 to 0.9); the other shows the breakdown of core stalls (0% to 100%) into resource stalls (includes data) and instruction stalls.]

IPC < 1 on a 4‐issue machine
70‐80% of stalls are instruction stalls

OLTP L1 Instruction Cache Misses

[Figure: L1‐I misses per k‐instruction (lower is better) versus cache size from 16 KB to 1024 KB, for TPC‐C and TPC‐E. Trace simulation, 4‐way L1‐I cache, Shore‐MT. One cache size is annotated "Most common today!".]

~512KB is enough for OLTP instruction footprint

Reducing Instruction Stalls at the hardware level

• Larger L1‐I cache size
  – Higher access latency
• Different replacement policies
  – Does not really affect OLTP workloads
• Advanced prefetching
  – Has too much space overhead (40KB per core)
• Simultaneous multi‐threading
  – Increases IPC per hardware context
  – Cache polluting

Alternative: Thread Migration

• Enables usage of aggregate L1‐I capacity
  – Large cache size without increased latency
• Can exploit instruction commonality
  – Localizes common transaction instructions
• Dynamic hardware solution
  – More general purpose
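The aggregate-capacity argument can be illustrated with a deliberately small toy model. This is only a sketch under strong assumptions that are not taken from the slides: a transaction's code is split into equal segments, one L1-I holds exactly one segment, misses are counted at segment granularity, and the function names `pinned_misses` and `migrating_misses` are hypothetical.

```python
def pinned_misses(num_threads, num_segments):
    """Threads run back to back on one core whose L1-I holds one segment.

    Every change of segment evicts the previous one, so each thread
    reloads the whole transaction code (toy assumption).
    """
    cached = None  # segment currently held by the single L1-I
    misses = 0
    for _thread in range(num_threads):
        for segment in range(num_segments):
            if cached != segment:
                misses += 1
                cached = segment
    return misses


def migrating_misses(num_threads, num_segments):
    """One core per segment; a thread moves to the core holding its next segment.

    Only the first thread to reach a segment loads it; later threads
    find it already cached on that core (toy assumption).
    """
    cached = [None] * num_segments  # cached[i] is what core i's L1-I holds
    misses = 0
    for _thread in range(num_threads):
        for segment in range(num_segments):
            if cached[segment] != segment:
                misses += 1
                cached[segment] = segment
    return misses


if __name__ == "__main__":
    print(pinned_misses(3, 3), migrating_misses(3, 3))
```

In this model the pinned schedule pays for every segment once per thread, while the migrating schedule pays for each segment once in total. Real caches, partial overlap between threads, and migration cost are all ignored.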

Transactions Running in Parallel

[Figure: a transaction's instructions divided into parts that can each fit into L1‐I, executed by threads T1, T2, and T3.]

Common instructions among concurrent threads

Scheduling Threads

[Figure: timeline of threads T1, T2, and T3 scheduled on cores 1 and 2 under Traditional scheduling and under TMi, showing the L1‐I contents over time and the running total of misses for each scheme.]

TMi

[Figure: Transaction A (threads T1, T2) and Transaction B (threads T3, T4) scheduled over time on cores 0 and 1, each core with its own L1‐I.]

• Group threads
• Wait till L1‐I is almost full
  – Count misses
  – Record last N misses
  – Misses > threshold => Migrate
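The trigger steps above can be sketched as a small per-core monitor. This is an illustration, not the authors' hardware design: the class name `MigrationMonitor`, the 64-byte line size behind the default of 512 lines, the `fill_fraction` used to approximate "almost full", and the `filled_empty_line` flag are all assumptions. The defaults of 256 for the threshold and 6 for the history length are the values listed in the experimental setup.

```python
from collections import deque


class MigrationMonitor:
    """Decides when the thread running on a core should migrate."""

    def __init__(self, cache_lines=512, fill_fraction=0.9,
                 miss_threshold=256, history=6):
        # "Almost full" is approximated as a fraction of the cache lines
        # having been filled (assumption, not specified in the slides).
        self.fill_target = int(cache_lines * fill_fraction)
        self.miss_threshold = miss_threshold
        self.filled_lines = 0
        self.miss_count = 0
        self.recent_misses = deque(maxlen=history)  # last N missed blocks

    def on_miss(self, block_addr, filled_empty_line):
        """Record one L1-I miss; return True when migration should start."""
        self.recent_misses.append(block_addr)
        if self.filled_lines < self.fill_target:
            # Warm-up phase: the cache is still filling, do not count yet.
            if filled_empty_line:
                self.filled_lines += 1
            return False
        self.miss_count += 1
        return self.miss_count > self.miss_threshold

    def reset(self):
        """Clear state, e.g. after a new thread arrives on this core."""
        self.filled_lines = 0
        self.miss_count = 0
        self.recent_misses.clear()
```

Waiting for the cache to fill before counting keeps cold-start misses from triggering a migration; only misses that would start evicting useful instructions count toward the threshold.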

TMi
Where to migrate?

[Figure: Transaction A (threads T1, T2) scheduled over time on cores 0 and 1.]

• Check the last N misses recorded in other caches
  1) No matching cache => Move to an idle core if exists
  2) Matching cache => Move to that core
  3) None of above => Do not move
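The three placement rules can be sketched as a single function. The slides do not define what counts as a "matching" cache, so this sketch assumes a cache matches when it holds at least `min_matches` of the recorded miss addresses and prefers the cache holding the most. The function name `choose_target`, its parameters, and the dictionary representation of cache contents are hypothetical.

```python
def choose_target(recent_misses, current_core, l1i_contents, idle_cores,
                  min_matches=1):
    """Return the core to migrate to, or None to stay put.

    recent_misses: block addresses of the last N misses.
    l1i_contents:  dict mapping core id -> set of blocks in that core's L1-I.
    idle_cores:    set of core ids with no thread running.
    """
    best_core, best_hits = None, 0
    for core, blocks in l1i_contents.items():
        if core == current_core:
            continue
        hits = sum(1 for addr in recent_misses if addr in blocks)
        if hits >= min_matches and hits > best_hits:
            best_core, best_hits = core, hits

    if best_core is not None:
        return best_core              # matching cache: move to that core
    candidates = set(idle_cores) - {current_core}
    if candidates:
        return min(candidates)        # no match: move to an idle core
    return None                       # none of the above: do not move


if __name__ == "__main__":
    contents = {0: {1, 2, 3}, 1: {10, 11}, 2: set()}
    print(choose_target([10, 11, 12], 0, contents, idle_cores={2}))
```

Picking the lowest-numbered idle core is an arbitrary tie-break chosen to keep the sketch deterministic.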

Experimental Setup

• Trace Simulation
  – PIN to extract instructions & data accesses per transaction
  – 16 core system
  – 32KB 8‐way set‐associative L1 caches
  – Miss‐threshold is 256
  – Last 6 misses are kept
• Shore‐MT as the storage manager
  – Workloads: TPC‐C, TPC‐E
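A trace-driven L1-I model with the listed geometry might look like the following sketch. The 64-byte line size, LRU replacement, and one fetch address per instruction are assumptions; the slides only give the capacity (32KB) and associativity (8-way). The names `SetAssociativeCache` and `misses_per_k_instructions` are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Set-associative cache with LRU replacement (LRU is an assumption)."""

    def __init__(self, size_bytes=32 * 1024, ways=8, line_bytes=64):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (ways * line_bytes)
        # One ordered dict per set; the oldest entry is the LRU block.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr):
        """Look up a byte address; return True on a hit, False on a miss."""
        block = addr // self.line_bytes
        lines = self.sets[block % self.num_sets]
        if block in lines:
            lines.move_to_end(block)       # mark as most recently used
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)      # evict the LRU block
        lines[block] = True
        return False


def misses_per_k_instructions(cache, fetch_addrs):
    """Replay one fetch address per instruction and return misses per 1000."""
    count = 0
    for addr in fetch_addrs:
        cache.access(addr)
        count += 1
    return 1000.0 * cache.misses / count if count else 0.0
```

Replaying a per-transaction instruction trace through one such cache per core yields the misses per k-instruction metric used in the result charts.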

Impact on L1‐I Misses

[Figure: instruction misses per k‐instruction (lower is better) for No Migration, TMi, and TMi Blind on TPC‐C and TPC‐E.]

Instruction misses reduced by half

Impact on L1‐D Misses

[Figure: misses per k‐instruction (lower is better), broken down into write data, read data, and instruction misses, for No Migration, TMi, and TMi Blind on TPC‐C and TPC‐E.]

Cannot ignore increased data misses

TMi’s Challenges

• Dealing with the data left behind
  – Prefetching
• Depends on thread identification
  – Software assisted
  – Hardware detection
• OS support needed
  – Disabling OS control over thread scheduling

Conclusion

• ~50% of the time OLTP stalls on instructions
• Spread computation through thread migration
• TMi
  – Halves L1‐I misses
  – Time‐wise ~30% expected improvement
  – Data misses should be handled

Thank you!
