Enabling GPGPU Low-Level Hardware Explorations with MIAOW: An Open Source RTL Implementation
Raghu Balasubramaniam, Vinay Gangadhar, Mario Paulo Drumond, Ziliang Guo, Jai Menon, Cherin Joseph, Robin Paul Prakash, Sharath Prasad, Pradip Harindran Vallathol, Karu Sankaralingam

Canonical GPGPU Organization

Flexibility and Realism

Our implementation

What’s new?

• First synthesizable open source RTL implementation of a GPGPU, named MIAOW (Many-core Integrated Accelerator Of Wisconsin)
• OpenCL compatibility
• Based on a subset of the AMD Southern Islands (SI) ISA
• Detailed performance, area and power characterizations
• Case studies demonstrate the versatility of the implementation
• Now available for download at: www.miaowgpu.org

MIAOW Compute Unit Design

• A typical GPU works as a slave to the host processor
• The Compute Unit (CU) is the computational core of the GPU, capable of executing vector and scalar instructions and of initiating memory accesses for multiple work-items (threads)
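The split between scalar and vector instructions can be illustrated with a small functional model. This is a sketch under stated assumptions, not MIAOW's RTL: it assumes SI-style 64-wide wavefronts and an execute mask selecting active work-items, while `Wavefront`, `s_add`, `v_add` and the 8-entry register files are hypothetical names and sizes chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List

WAVEFRONT_SIZE = 64  # SI groups 64 work-items into one wavefront
NUM_REGS = 8         # hypothetical, tiny register files for illustration


@dataclass
class Wavefront:
    # Scalar registers: one copy shared by all work-items of the wavefront.
    sgpr: List[int] = field(default_factory=lambda: [0] * NUM_REGS)
    # Vector registers: vgpr[r][lane] is register r of work-item `lane`.
    vgpr: List[List[int]] = field(
        default_factory=lambda: [[0] * WAVEFRONT_SIZE for _ in range(NUM_REGS)])
    # Execute mask: bit `lane` set means that work-item is active.
    exec_mask: int = (1 << WAVEFRONT_SIZE) - 1


def s_add(wf: Wavefront, dst: int, a: int, b: int) -> None:
    """Hypothetical scalar instruction: runs once for the whole wavefront."""
    wf.sgpr[dst] = wf.sgpr[a] + wf.sgpr[b]


def v_add(wf: Wavefront, dst: int, a: int, b: int) -> None:
    """Hypothetical vector instruction: runs for every active work-item."""
    for lane in range(WAVEFRONT_SIZE):
        if (wf.exec_mask >> lane) & 1:
            wf.vgpr[dst][lane] = wf.vgpr[a][lane] + wf.vgpr[b][lane]


if __name__ == "__main__":
    wf = Wavefront()
    wf.vgpr[0] = list(range(WAVEFRONT_SIZE))
    wf.vgpr[1] = [10] * WAVEFRONT_SIZE
    # Only even-numbered work-items are active, e.g. after a divergent branch.
    wf.exec_mask = sum(1 << lane for lane in range(0, WAVEFRONT_SIZE, 2))
    v_add(wf, 2, 0, 1)
    print(wf.vgpr[2][:4])  # [10, 0, 12, 0]
```

A scalar instruction touches one shared register once, while a vector instruction performs the same operation for up to 64 work-items, skipping lanes whose mask bit is clear.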

Comparison with Real GPU Implementations
MIAOW GPGPU Compute Unit

Case Studies and Results
Timing Speculation: Fig. (a)
• 6% error rate at a 115 mV reduction in supply voltage, consistent with established trends
Transient faults: Fig. (b)
• Injected faults into flip-flops to study GPU susceptibility to transient faults and the resulting Silent Data Corruption (SDC)
• Fault propagation to program output follows trends similar to those observed on CPUs
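The transient-fault methodology (flip one flip-flop, run to completion, compare against a fault-free "golden" run) can be sketched in software. This is a toy model, not the RTL-level injection used in the study: the four-register circuit, `run`, and `campaign` are hypothetical, and outcomes are classified only as masked or SDC (crashes and hangs are not modelled).

```python
import random
from collections import Counter
from typing import List, Optional, Tuple

NUM_REGS = 4
REG_BITS = 32
MASK = (1 << REG_BITS) - 1

Fault = Tuple[int, int, int]  # (cycle, register index, bit position)


def run(inputs: List[int], fault: Optional[Fault] = None) -> int:
    """Toy sequential circuit standing in for the RTL; returns r0 as the output.

    If `fault` is given, one bit of one register (a flip-flop) is flipped
    just before that cycle executes, modelling a single-event upset.
    """
    regs = [0] * NUM_REGS
    for cycle, x in enumerate(inputs):
        if fault is not None and fault[0] == cycle:
            _, reg, bit = fault
            regs[reg] ^= 1 << bit
        regs[1] = x & MASK                    # r1: input latch, overwritten every cycle
        regs[0] = (regs[0] + regs[1]) & MASK  # r0: running sum, the program output
        regs[2] = (regs[2] + 1) & MASK        # r2: cycle counter, never reaches output
        regs[3] = regs[1] & 0xFF              # r3: scratch, never reaches output
    return regs[0]


def campaign(inputs: List[int], trials: int, seed: int = 0) -> Counter:
    """Inject one random bit flip per trial and classify the outcome."""
    rng = random.Random(seed)
    golden = run(inputs)  # fault-free reference output
    outcomes: Counter = Counter()
    for _ in range(trials):
        fault = (rng.randrange(len(inputs)), rng.randrange(NUM_REGS),
                 rng.randrange(REG_BITS))
        outcomes["masked" if run(inputs, fault) == golden else "SDC"] += 1
    return outcomes


if __name__ == "__main__":
    print(campaign(list(range(100)), trials=1000))
```

In this toy, only flips in r0 reach the output, so roughly three quarters of injections are expected to be masked; in real hardware the masking rate depends on which flip-flops hold live state.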

KAVERI APU Compute Unit

Fig. (a)

Fig. (b)

Sampling DMR (dual modular redundancy) on GPUs: Fig. (c)
• Performed at workgroup (thread-block) granularity with a 1% sampling rate
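Sampling DMR at workgroup granularity means only a small fraction of workgroups is executed twice and compared. The sketch below assumes random selection of workgroups and models the redundant copy as a second function call; in hardware the shadow copy would run on separate resources. `sampled_dmr`, `primary`, and `shadow` are hypothetical names.

```python
import random
from typing import Callable, Dict, List, Tuple

Kernel = Callable[[int], List[int]]  # workgroup id -> that workgroup's outputs


def sampled_dmr(primary: Kernel, shadow: Kernel, num_workgroups: int,
                sample_rate: float = 0.01,
                seed: int = 0) -> Tuple[Dict[int, List[int]], List[int], List[int]]:
    """Run every workgroup once; re-run a random sample and compare results.

    Returns (results, checked workgroup ids, mismatching workgroup ids).
    """
    rng = random.Random(seed)
    results: Dict[int, List[int]] = {}
    checked: List[int] = []
    mismatches: List[int] = []
    for wg in range(num_workgroups):
        results[wg] = primary(wg)
        if rng.random() < sample_rate:     # sample about 1% of workgroups
            checked.append(wg)
            if shadow(wg) != results[wg]:  # redundant copy disagrees: error detected
                mismatches.append(wg)
    return results, checked, mismatches


def vec_add_wg(wg: int, local_size: int = 64) -> List[int]:
    """Hypothetical kernel: out[i] = a[i] + b[i] with a[i] = b[i] = i."""
    base = wg * local_size
    return [2 * (base + i) for i in range(local_size)]


if __name__ == "__main__":
    _, checked, bad = sampled_dmr(vec_add_wg, vec_add_wg, num_workgroups=10_000)
    print(len(checked), "workgroups checked,", len(bad), "mismatches")
```

The trade-off is coverage for overhead: at a 1% rate the redundant work is about 1% of the kernel, but only errors striking a sampled workgroup can be detected.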

• Implements 95 instructions of the SI ISA: a mix of vector, scalar, control, synchronization and memory instructions
• Fetch, decode and issue bandwidth of 1
• Each CU has 1 scalar ALU, 8 vector ALUs and 1 load-store unit
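How a single-issue front end interacts with the functional-unit mix can be sketched as a tiny in-order timing model. The unit counts and issue width come from the list above; the instruction latencies, the first-free-unit policy, and the names `CUConfig` and `simulate` are hypothetical, and data dependences between instructions are ignored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CUConfig:
    issue_width: int = 1  # fetch/decode/issue bandwidth of 1
    scalar_alus: int = 1
    vector_alus: int = 8
    lsus: int = 1         # load-store units


def simulate(cfg: CUConfig, program: List[Tuple[str, int]]) -> int:
    """In-order issue of (kind, latency) instructions; returns the finish cycle.

    kind is "scalar", "vector" or "mem". Each unit is busy for `latency`
    cycles after issue (assumed unpipelined); dependences are not modelled.
    """
    # busy[kind][i] is the cycle at which unit i of that kind becomes free.
    busy = {"scalar": [0] * cfg.scalar_alus,
            "vector": [0] * cfg.vector_alus,
            "mem": [0] * cfg.lsus}
    cycle, pc, finish = 0, 0, 0
    while pc < len(program):
        issued = 0
        while issued < cfg.issue_width and pc < len(program):
            kind, latency = program[pc]
            units = busy[kind]
            free = next((i for i, t in enumerate(units) if t <= cycle), None)
            if free is None:
                break  # in-order: stall until a unit of this kind frees up
            units[free] = cycle + latency
            finish = max(finish, cycle + latency)
            pc += 1
            issued += 1
        cycle += 1
    return finish


if __name__ == "__main__":
    cfg = CUConfig()
    print(simulate(cfg, [("vector", 4)] * 8))  # 11
    print(simulate(cfg, [("scalar", 4)] * 2))  # 8
```

With one issue slot, eight vector ALUs absorb back-to-back vector instructions without stalling, whereas two consecutive scalar instructions serialize on the single scalar ALU.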

Summary

Fig. (c)

Source: AMD’s presentation at Hot Chips 26, 2014

FPGA Implementation (Neko)
• One CU's 1 SIMD and 1 SIMF mapped to a Xilinx Virtex-7 FPGA
• A MicroBlaze soft processor implements the Ultra-thread dispatcher in software
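A software dispatcher's core job can be sketched as splitting an OpenCL NDRange into workgroups and handing them to CUs. This is a hypothetical illustration of that bookkeeping, not the Neko MicroBlaze code: `dispatch`, the 1-D NDRange, and round-robin placement are assumptions; only the 64-work-item SI wavefront size is taken as given.

```python
import math
from typing import List, Tuple

WAVEFRONT_SIZE = 64  # work-items per SI wavefront


def dispatch(global_size: int, local_size: int,
             num_cus: int = 1) -> List[Tuple[int, int, int]]:
    """Split a 1-D NDRange into workgroups and assign them to CUs.

    Returns (cu_id, workgroup_id, wavefronts_in_workgroup) in dispatch order.
    """
    if local_size <= 0 or global_size % local_size != 0:
        # OpenCL 1.x requires the global size to be a multiple of the local size.
        raise ValueError("global_size must be a multiple of a positive local_size")
    num_workgroups = global_size // local_size
    waves = math.ceil(local_size / WAVEFRONT_SIZE)
    # Round-robin placement is an assumption; a real dispatcher would also
    # track free wavefront slots and register-file space on each CU.
    return [(wg % num_cus, wg, waves) for wg in range(num_workgroups)]


if __name__ == "__main__":
    for entry in dispatch(global_size=1024, local_size=256):
        print(entry)
```

With the single CU mapped to the FPGA, every workgroup lands on CU 0, and each 256-work-item workgroup occupies 4 wavefronts.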

• Reasonable approximation of commercial GPGPUs
• Flexible and realistic design, shown to be useful by the case studies
• A research tool for exploring ideas that require an RTL or gate-level representation of a GPGPU Compute Unit
