RISC-V Memory Consistency Model Status Update
Dan Lustig and the Memory Model TG
Nov. 28, 2017

WHAT WERE OUR GOALS?
• Define the RISC-V memory consistency model
  • It specifies the values that can be returned by loads
• Support a wide range of HW implementations
• Support Linux, C/C++, and lots of other critical SW

MONTHS OF DEBATE
Strong Models (e.g., x86-TSO)
• Stricter ordering rules
• Simpler for programmers and for architects
Weak Models (e.g., ARM, IBM Power)
• More relaxed ordering rules
• Better perf/power/area
• More microarchitectural freedom
Linux/C/C++/Java/… work either way! That’s not a deciding factor.

FINDING A COMPROMISE
• First step: narrow the choice to two specific models
  • RVTSO: as used by SPARC and x86 (strong)
  • RVWMO: “RISC-V Weak Memory Ordering”, similar to ARMv8 (weak)
• Both are “multi-copy atomic” (once a store is visible to any other hart, it is visible to all of them), which makes both simpler than the IBM Power and ARMv7 memory models
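Multi-copy atomicity can be illustrated with the classic IRIW ("independent reads of independent writes") litmus test. The Python sketch below is a simplified model, not the TG's Alloy/herd/operational models: it assumes a single global order of all accesses, which is what multi-copy atomicity provides here because no hart reads its own store, and it assumes `fence r,r` on the two reading harts so that each reader's loads stay in program order. The litmus layout and the name `iriw_outcomes` are illustrative choices.

```python
from itertools import permutations

# IRIW litmus test, all locations initially 0:
#   hart 0: st x=1
#   hart 1: st y=1
#   hart 2: ld x -> a; fence r,r; ld y -> b
#   hart 3: ld y -> c; fence r,r; ld x -> d
OPS = [
    ("st", "x", 1),    # 0: hart 0
    ("st", "y", 1),    # 1: hart 1
    ("ld", "x", "a"),  # 2: hart 2, first load
    ("ld", "y", "b"),  # 3: hart 2, second load
    ("ld", "y", "c"),  # 4: hart 3, first load
    ("ld", "x", "d"),  # 5: hart 3, second load
]


def iriw_outcomes():
    """Enumerate (a, b, c, d) over every single global order of the six accesses."""
    outcomes = set()
    for order in permutations(range(len(OPS))):
        pos = {op: i for i, op in enumerate(order)}
        # The fences keep each reader's two loads in program order.
        if pos[2] > pos[3] or pos[4] > pos[5]:
            continue
        # One shared memory: a store becomes visible to every hart at the same instant.
        mem, regs = {}, {}
        for op in order:
            kind, addr, x = OPS[op]
            if kind == "st":
                mem[addr] = x
            else:
                regs[x] = mem.get(addr, 0)
        outcomes.add((regs["a"], regs["b"], regs["c"], regs["d"]))
    return outcomes


results = iriw_outcomes()
# Hart 2 seeing x before y while hart 3 sees y before x would require two different orders.
print((1, 0, 1, 0) in results)
```

The outcome (1, 0, 1, 0) never appears, because it would need hart 2 and hart 3 to disagree about the order of the two stores. In a non-multi-copy-atomic model such as IBM Power's, where a store can become visible to some harts before others, that outcome can be observable unless heavyweight fences are used.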

THE RISC-V MEMORY MODEL PLAN
• Base RISC-V memory model: RVWMO
  • HW can still be as conservative as it wants to be
  • Portable SW must nevertheless assume RVWMO
• New ISA extension “Ztso”: HW that implements RVTSO can (optionally) choose to expose this to SW
  • E.g., a RISC-V core can implement “IMAFD+Ztso”

ARCHITECTURAL INTUITION (FOR BOTH)
[Figure: four pipelines (in-order or OoO; F D X M W) with caches ($) and a last-level cache. Layers: hart-private buffering (values can be forwarded only to loads from the same hart that issued that store); atomic memory (all values globally visible to all harts, possibly via a coherence protocol).]

• RVWMO and RVTSO differ in the degree of memory access reordering they permit at the point of global visibility
  • RVTSO: only store→load reordering can be observed
  • RVWMO: unless otherwise synchronized (e.g., via .aq, .rl, and/or fences), most memory accesses can be reordered freely
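The layered picture of hart-private buffering in front of a globally visible atomic memory can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the TG's operational model: the `Hart` class and its methods are hypothetical, and the private buffer is assumed to be a FIFO store buffer. A FIFO buffer only lets a later load overtake an earlier store, which matches the RVTSO bullet above; RVWMO hardware is allowed to reorder far more.

```python
from collections import deque


class Hart:
    """Toy hart with a hart-private FIFO store buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory   # shared "atomic memory": dict addr -> value, default 0
        self.buffer = deque()  # this hart's pending (addr, value) stores, oldest first

    def store(self, addr, value):
        # The store is buffered privately; other harts cannot see it yet.
        self.buffer.append((addr, value))

    def load(self, addr):
        # Forward from this hart's youngest buffered store to the same address...
        for a, v in reversed(self.buffer):
            if a == addr:
                return v
        # ...otherwise read the globally visible value.
        return self.memory.get(addr, 0)

    def drain_one(self):
        # Move the oldest buffered store into shared memory: its point of global visibility.
        addr, value = self.buffer.popleft()
        self.memory[addr] = value


memory = {}
h0, h1 = Hart(memory), Hart(memory)
h0.store("x", 1)
own_view = h0.load("x")           # 1: forwarded from h0's own buffer
other_view_before = h1.load("x")  # 0: the store is not yet globally visible
h0.drain_one()
other_view_after = h1.load("x")   # 1: the store is now in shared memory
print(own_view, other_view_before, other_view_after)
```

The same store is visible to its own hart before any other hart can see it, which is exactly the "forwarded only to loads from the same hart" property; once drained, every hart sees it at once.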

RVWMO RULES IN A NUTSHELL
• A is guaranteed to happen before B (A precedes B in program order) only if:
  • A and B access the same address, and B is a store
  • A and B are separated by a fence, with pr/pw/sr/sw set appropriately
  • A has .aq
  • B has .rl
  • A and B both have .aqrl
  • B is an AMO with .aqrl
  • A is an AMO with .aqrl
  • B has an address, control, or data dependency on A, except control dependencies where B is a load
• Each load returns the value from the most recent store to the same address
• No store from another hart can interrupt an AMO or LR/SC
• Available offline: Alloy/herd/operational models and lots of docs (some last details are still being finalized…)
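The fence rule above can be illustrated with the message-passing (MP) litmus test: hart 0 writes data and then a flag; hart 1 reads the flag and then the data. The Python sketch enumerates every global memory order of the four accesses, keeps only those consistent with the preserved program-order edges, and has each load read the latest earlier store to its address. It is a deliberately simplified model under stated assumptions: only fence edges are modeled (no dependencies, .aq/.rl, or same-address rules, none of which arise in this test), the fences are assumed to be `fence w,w` on hart 0 and `fence r,r` on hart 1, and `mp_outcomes` is a hypothetical name.

```python
from itertools import permutations


def mp_outcomes(fenced):
    """Return the set of (r1, r2) results allowed for the MP litmus test."""
    # Each access: (hart, kind, address, stored value or destination register).
    ops = [
        (0, "st", "data", 1),
        (0, "st", "flag", 1),
        (1, "ld", "flag", "r1"),
        (1, "ld", "data", "r2"),
    ]
    outcomes = set()
    for order in permutations(ops):
        pos = {op: i for i, op in enumerate(order)}
        if fenced:
            # fence w,w keeps hart 0's stores in order; fence r,r keeps hart 1's loads in order.
            if pos[ops[0]] > pos[ops[1]] or pos[ops[2]] > pos[ops[3]]:
                continue
        # Without fences, accesses to different addresses may appear in any global order.
        mem, regs = {}, {}
        for _, kind, addr, x in order:
            if kind == "st":
                mem[addr] = x
            else:
                regs[x] = mem.get(addr, 0)
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes


print(sorted(mp_outcomes(fenced=False)))
print(sorted(mp_outcomes(fenced=True)))
```

Without fences, (1, 0) appears: hart 1 sees the flag set but reads stale data. With both fences it disappears. A single global order is a fair simplification here because RVWMO is multi-copy atomic and the test has no same-hart store→load pair, so store forwarding never comes into play.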

RVTSO RULES IN A NUTSHELL
Strictly stronger than RVWMO
• A is guaranteed to happen before B only if:
  • A is a load
  • B is a store
  • A is an AMO
  • B is an AMO
  • A and B are separated by a fence with pw and sr set
• Each load returns the value from the most recent store to the same address
• No store from another hart can interrupt an AMO or LR/SC
• To be made available: Alloy/herd/operational models, docs
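Under these rules the only ordering a program must request explicitly is store→load, which the store-buffering (SB) litmus test exposes. The Python sketch below explores every interleaving of a small operational model under stated assumptions: each hart has a FIFO store buffer with store-to-load forwarding, and a fence with pw and sr (for example `fence w,r`) is assumed to retire only once its hart's buffer has drained. The names `forward` and `sb_outcomes` are hypothetical, and this is not the TG's operational model.

```python
# Store-buffering (SB) litmus test, x = y = 0 initially:
#   hart 0: st x=1; [fence w,r]; ld y -> r0
#   hart 1: st y=1; [fence w,r]; ld x -> r1


def forward(buf, addr, mem):
    """Load value: youngest same-address store in this hart's buffer, else memory."""
    for a, v in reversed(buf):
        if a == addr:
            return v
    return mem.get(addr, 0)


def sb_outcomes(with_fence):
    fence = [("fence",)] if with_fence else []
    progs = (
        [("st", "x", 1)] + fence + [("ld", "y", "r0")],
        [("st", "y", 1)] + fence + [("ld", "x", "r1")],
    )
    outcomes = set()

    def explore(pcs, bufs, mem, regs):
        finished = True
        for h in range(2):
            if bufs[h]:
                # Drain the oldest buffered store of hart h into shared memory.
                finished = False
                (a, v), rest = bufs[h][0], bufs[h][1:]
                explore(pcs, bufs[:h] + (rest,) + bufs[h + 1:], {**mem, a: v}, regs)
            if pcs[h] < len(progs[h]):
                finished = False
                ins = progs[h][pcs[h]]
                nxt = pcs[:h] + (pcs[h] + 1,) + pcs[h + 1:]
                if ins[0] == "st":
                    # Stores enter the FIFO store buffer, not shared memory.
                    new_buf = bufs[h] + ((ins[1], ins[2]),)
                    explore(nxt, bufs[:h] + (new_buf,) + bufs[h + 1:], mem, regs)
                elif ins[0] == "ld":
                    val = forward(bufs[h], ins[1], mem)
                    explore(nxt, bufs, mem, {**regs, ins[2]: val})
                elif not bufs[h]:
                    # The fence may retire only once this hart's buffer is empty.
                    explore(nxt, bufs, mem, regs)
        if finished:
            outcomes.add((regs["r0"], regs["r1"]))

    explore((0, 0), ((), ()), {}, {})
    return outcomes


print(sorted(sb_outcomes(with_fence=False)))
print(sorted(sb_outcomes(with_fence=True)))
```

Without the fence, (0, 0) is reachable: both loads run while both stores are still buffered, which is the store→load reordering RVTSO permits. With the fence, each load waits until its own hart's store is globally visible, and (0, 0) is no longer reachable.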

THE RISC-V MEMORY MODEL: SOFTWARE
• Portable SW must assume RVWMO, but it will run on both RVWMO HW and Ztso HW
  • Linux, gcc, binutils, … will target RVWMO by default
• RVTSO-only SW can be written, but it will only run on HW implementing Ztso
  • Object files will use a different magic number
Compatibility of SW with HW:
• Standard SW (RVWMO): OK on base HW (RVWMO); OK on HW using the Ztso ISA extension
• RVTSO-only SW: WILL NOT RUN on base HW (RVWMO); OK on HW using the Ztso ISA extension

EVERYONE GETS WHAT THEY WANT
• If you don’t want to think about memory models: just use the standard OSes and toolchains
• If you care about PPA or flexibility: use RVWMO
• If you have lots of legacy x86 code: use HW implementing Ztso, so that any SW will work
• If you believe TSO is the future: use HW with Ztso, and emit code with the “TSO-only” magic number

FRAGMENTATION
• Risk: SW will fragment into an “RVWMO version” and an “RVTSO version”, doubling the maintainer burden
• Solution: discourage the 2x model, and encourage RVWMO SW, since it’s portable across all HW
  • Current plan for Linux, gcc, binutils, etc.
  • On Ztso HW, redundant fences simply become no-ops
• However: we can’t stop people from customizing; RISC-V’s openness is a feature, not a bug
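Why fences in RVWMO code become redundant on Ztso hardware follows from the RVTSO rules: loads are already ordered before everything after them, and everything is already ordered before later stores, so the only edge a fence can add is store→load. The helper below is a hypothetical sketch of that reasoning, assuming fences are written as predecessor/successor sets such as `rw,rw`; it considers only the memory bits r and w and ignores the I/O bits (i, o) and AMOs.

```python
def fence_needed_under_tso(pred: str, succ: str) -> bool:
    """True if `fence pred,succ` orders something RVTSO does not already order.

    Simplification: only the r (load) and w (store) bits are considered.
    RVTSO already preserves load->load, load->store and store->store order,
    so a fence matters only when w is in the predecessor set and r is in the
    successor set (the store->load case).
    """
    return "w" in pred and "r" in succ


# An illustrative list of (predecessor, successor) fences an RVWMO compiler might emit.
rvwmo_fences = [("r", "rw"), ("rw", "w"), ("w", "w"), ("rw", "rw"), ("w", "r")]
no_ops_on_ztso = [f for f in rvwmo_fences if not fence_needed_under_tso(*f)]
print(no_ops_on_ztso)
```

Only `fence rw,rw` and `fence w,r` still do real work under RVTSO; the others order nothing new, so Ztso hardware can treat them as no-ops while the same binary stays correct on RVWMO hardware.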

OTHER ISA CHANGES
• ld.rl and sd.aq are deprecated
• ld.aqrl and sd.aqrl mean RCsc (release consistency with sequentially consistent synchronization operations), not fully fenced
• Clarified other subtleties of atomicity and LR/SCs
• Possibly in a future extension:
  • Fences may take an address restriction parameter
  • Opcodes for “l{b|h|w|d}.aq” and “s{b|h|w|d}.rl”

I/O FENCES, FENCE.I, SFENCE.VM
• Informal descriptions given in the memory model spec
• No change to I/O channel ordering behavior
• Some clarification to FENCE .pi/.po/.si/.so
• More detail available offline
• Future: V/T/J compatibility as well

DOCUMENTATION & TOOLS
• Definitions of RVWMO/RVTSO and Ztso
• A dozen pages explaining the details in plain English
• Two axiomatic models (Alloy + herd)
• One formal operational model + app
• Lots of litmus tests (also to be used to test compliance)
• …and the memory model TG as a resource for people who have questions
• Come find me or email me!

RISC-V MEMORY MODEL ROADMAP
• We’ll discuss this further on Thursday at the members-only meeting
  • Or come find me in the meantime
• We’ll aim to release the complete documentation publicly (on isa-dev) soon after that
• If all goes well, we’ll work with the Technical Committee towards ratification
