Enabling GPGPU Low-Level Hardware Explorations with MIAOW: An Open Source RTL Implementation
Raghu Balasubramaniam, Vinay Gangadhar, Mario Paulo Drumond, Ziliang Guo, Jai Menon, Cherin Joseph, Robin Paul Prakash, Sharath Prasad, Pradip Harindran Vallathol, Karu Sankaralingam
Canonical GPGPU Organization
Flexibility and Realism
Our implementation
What’s new?
• First synthesizable open source RTL implementation of a GPGPU, named MIAOW (Many-core Integrated Accelerator Of Wisconsin)
• OpenCL compatibility
• Based on a subset of the AMD Southern Islands (SI) ISA
• Detailed performance, area and power characterizations
• Case studies prove the versatility of the implementation
• Now available to download at: www.miaowgpu.org
MIAOW Compute Unit Design
• A typical GPU works as a slave to the processor
• The Compute Unit (CU) is the computational core of the GPU, capable of executing vector instructions and scalar instructions and of initiating memory accesses for multiple work-items (threads)
Comparison with Real GPU Implementations
MIAOW GPGPU Compute Unit
Case Studies and Results
Timing Speculation: Fig. (a)
• 6% error rate at a 115 mV reduction of supply voltage, consistent with established trends
Transient faults: Fig. (b)
• Injected faults in flip-flops to study GPU susceptibility to transient faults or Silent Data Corruption (SDC)
• Fault propagation to program output is similar to the trends seen on CPUs
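The transient-fault study above can be illustrated with a minimal software sketch. This is not MIAOW's RTL fault-injection flow: the names `flip_bit` and `inject_and_classify`, the dictionary model of register state, and the two outcome labels are all assumptions made for illustration. The sketch flips one bit in a modelled register and checks whether the corrupted value reaches the program output.

```python
import random

def flip_bit(value, bit, width=32):
    """Flip a single bit of a width-bit register value (models one flip-flop upset)."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return (value ^ (1 << bit)) & ((1 << width) - 1)

def inject_and_classify(program, state, rng, width=32):
    """Run `program` once fault-free and once with one random bit flip.

    `program` is a hypothetical stand-in for a kernel: it maps a dict of
    register values to an output. The outcome is "masked" when the output
    is unchanged and "sdc" (silent data corruption) when it differs.
    """
    golden = program(dict(state))            # fault-free reference run
    reg = rng.choice(sorted(state))          # pick a register to corrupt
    bit = rng.randrange(width)               # pick a bit position to flip
    faulty_state = dict(state)
    faulty_state[reg] = flip_bit(state[reg], bit, width)
    faulty = program(faulty_state)           # run with the injected fault
    outcome = "masked" if faulty == golden else "sdc"
    return reg, bit, outcome
```

Repeating `inject_and_classify` over many random injections and counting the "sdc" outcomes gives a rough propagation rate for the modelled program. A real study would inject into the flip-flops of the design itself.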
KAVERI APU Compute Unit
Fig. (a)
Fig. (b)
Sampling DMR on GPUs: Fig. (c)
• Done at workgroup (thread-block) granularity at a 1% sampling rate
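Sampling DMR at workgroup granularity can be sketched in software as follows. This is an illustrative model only, not the hardware mechanism evaluated on MIAOW: the function names, the random selection policy and the `kernel` callable are assumptions. A small fraction of workgroups is executed twice, and the two results are compared to detect a fault.

```python
import random

def select_sampled_workgroups(num_workgroups, rate=0.01, seed=0):
    """Choose the workgroup IDs that receive a redundant (DMR) execution."""
    rng = random.Random(seed)
    return {wg for wg in range(num_workgroups) if rng.random() < rate}

def run_with_sampling_dmr(workgroups, kernel, rate=0.01, seed=0):
    """Run `kernel` on every workgroup; re-run and compare only the sampled ones.

    Returns the primary results, the IDs of sampled workgroups whose two
    executions disagreed, and the set of sampled IDs.
    """
    sampled = select_sampled_workgroups(len(workgroups), rate, seed)
    results, mismatches = [], []
    for wg_id, wg_input in enumerate(workgroups):
        out = kernel(wg_input)                  # primary execution
        if wg_id in sampled:
            redundant = kernel(wg_input)        # redundant execution
            if redundant != out:
                mismatches.append(wg_id)        # disagreement signals a fault
        results.append(out)
    return results, mismatches, sampled
```

With `rate=0.01`, roughly one workgroup in a hundred is executed twice, so the redundant work stays small, but a fault is caught only if it disturbs a sampled workgroup.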
• 95 instructions of the SI ISA: a mixture of vector, scalar, control, synchronization and memory instructions
• Fetch, decode and issue bandwidth of 1
• Each CU has 1 scalar ALU, 8 vector ALUs and 1 Load-Store unit
Summary
Fig. (c)
Source: AMD’s presentation at Hot Chips 26, 2014
FPGA Implementation (Neko)
• One CU with 1 SIMD and 1 SIMF unit mapped to a Xilinx Virtex-7 FPGA
• A MicroBlaze implements the Ultra-thread dispatcher in software
• Reasonable approximation of commercial GPGPUs
• Flexible and realistic design, shown to be useful by the case studies
• A research tool for exploring ideas that require an RTL or gate-level representation of a GPGPU Compute Unit