GPU Computing

Sam Sartor
March 9, 2017
Mines Linux Users Group

The GPU

What is a GPU?

A Graphics Processing Unit (GPU) is a specialized chip designed primarily to accelerate graphical calculations. GPUs generally derive their performance from their ability to perform large numbers of identical arithmetic operations in parallel.

GPUs for Graphics

Screens have a lot of pixels that need to be computed very quickly. All of the required calculations are identical, just with different input numbers, and because pixels are independent, the calculations are also trivial to parallelize. Using the unnecessarily clever CPU for this would be wasteful and slow; a separate pixel-optimized chip can do the work instead, leaving the CPU free for the important stuff.

GPU Computing

Coloring pixels is not the only problem that involves a large number of similar, repetitive calculations. General-purpose GPUs can be used for countless other problems including machine learning, computer vision, signal processing, statistics, linear algebra, finance, and cryptography.

History

1970s - Highly specialized, used only for buffering video and drawing simple 2D rasters (sprites)
1980s - Common bitmap operations such as filling simple 2D shapes
1990s - 3D triangle graphics; common interfaces (OpenGL, Direct3D) developed
2000s - General-purpose GPUs, capable of executing arbitrary instructions
2010s - Highly general, used as much for supercomputing as for graphics

How do GPUs Work?

Architecture

GPUs excel at repetition. Instead of performing the same calculation many times in sequence, they step through a sequence of instructions on many cores at once. Each core executes the same operation at the same time, but with different inputs.
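As a rough analogy, NumPy's vectorized operations work the same way on the CPU: one expression is applied to every element of an array, as if each element had its own core. This sketch is only an illustration of the idea, not actual GPU code:

```python
import numpy as np

# Each "core" applies the same instruction to a different input.
# NumPy's vectorized operations mimic this: one expression, many elements.
inputs = np.array([1.0, 2.0, 3.0, 4.0])

# The same multiply-add is applied to every element "at once".
outputs = inputs * 2.0 + 1.0

print(outputs)  # [3. 5. 7. 9.]
```

On a real GPU, each element of `inputs` would be handled by a separate hardware thread executing the identical instruction stream.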

Branching

Unlike CPUs, which jump back and forth through a program as conditions are met, a GPU runs every possible branch in sequence, turning individual cores on and off as the branches diverge. In effect, GPUs are useful for parallel computation but not for multitasking.
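This masked-execution behavior can be simulated with NumPy on the CPU. The sketch below evaluates both branches for every element and then uses a mask to select results, which is roughly what GPU hardware does with its cores:

```python
import numpy as np

x = np.array([-2.0, -1.0, 1.0, 2.0])

# On a CPU you might write: y = x*x if x < 0 else x + 10
# A GPU instead evaluates BOTH branches on every core...
branch_a = x * x      # computed for all elements
branch_b = x + 10.0   # also computed for all elements

# ...then a mask decides which result each "core" keeps.
mask = x < 0
y = np.where(mask, branch_a, branch_b)

print(y)  # [ 4.  1. 11. 12.]
```

The wasted work on the masked-off cores is why heavily branching code tends to run poorly on GPUs.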

Computing At Home

OpenGL Shaders

Although shaders are designed for pixel work, the arithmetic they perform is fundamentally general purpose. Vertex attributes, uniforms, and textures serve as input; the framebuffer serves as output. OpenGL bindings exist for every language under the sun.

OpenGL Shaders - Pros & Cons

Pros
• Shaders have been around since like 2004
• Universally supported
• OpenGL allows for minimal setup

Cons
• Low level
• Not very general
• All data has to be stored in textures/images

CUDA

CUDA is a computing platform and API that provides truly general-purpose GPU computing. C/C++/Fortran code can be compiled ahead of time or at runtime and sent to the GPU along with arbitrary chunks of memory. Libraries for controlling and communicating with CUDA programs exist for many languages, including C/C++ (through the CUDA SDK) and Python (the PyCUDA library).
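A minimal CUDA kernel looks like ordinary C with a `__global__` qualifier. The sketch below stores the kernel as a source string, the way PyCUDA compiles it at runtime; the host-side calls are left as comments because they require an Nvidia card and the `pycuda` package. The names follow PyCUDA's documented API, but treat the whole thing as an illustrative sketch rather than a tested program:

```python
# An illustrative CUDA kernel: each GPU thread handles one array element.
kernel_source = """
__global__ void double_it(float *data)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    data[i] = data[i] * 2.0f;
}
"""

# With a CUDA-capable card and PyCUDA installed, the host side would be
# roughly the following (uncomment to run on real hardware):
#
# import numpy as np
# import pycuda.autoinit
# import pycuda.driver as drv
# from pycuda.compiler import SourceModule
#
# mod = SourceModule(kernel_source)        # runtime-compile the kernel
# double_it = mod.get_function("double_it")
# data = np.ones(256, dtype=np.float32)
# double_it(drv.InOut(data), block=(256, 1, 1), grid=(1, 1))
# # data is now all 2.0
```

Note how the kernel itself contains no loop: the loop is replaced by launching 256 threads, each of which computes its own index `i`.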

CUDA - Pros & Cons

Pros
• Get to use real C/C++
• Pointers, recursion, etc.
• Copy arbitrary data between CPU and GPU
• Fast

Cons
• Only available on high-end Nvidia cards
• Low level
• Annoying to set up

OpenCL

OpenCL is a cross-platform alternative to CUDA. It is similar in structure to OpenGL, but intended for general-purpose computation rather than 3D graphics. Bindings exist for all languages; I even found a Brainfuck API.
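An OpenCL kernel is the same idea as a CUDA kernel with slightly different spelling. As before, the kernel is kept as a source string and the host side is shown as comments, since it needs an OpenCL driver and the `pyopencl` package; the names follow pyopencl's documented API, but this is a sketch, not a tested program:

```python
# An illustrative OpenCL kernel: each work-item handles one array element.
kernel_source = """
__kernel void double_it(__global float *data)
{
    int i = get_global_id(0);
    data[i] = data[i] * 2.0f;
}
"""

# With any OpenCL driver (GPU or CPU) and pyopencl installed, the host side
# would look roughly like this (uncomment where a driver is available):
#
# import numpy as np
# import pyopencl as cl
#
# ctx = cl.create_some_context()           # pick a device (GPU or CPU)
# queue = cl.CommandQueue(ctx)
# data = np.ones(256, dtype=np.float32)
# buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
#                 hostbuf=data)
# prog = cl.Program(ctx, kernel_source).build()
# prog.double_it(queue, data.shape, None, buf)
# cl.enqueue_copy(queue, data, buf)        # read the result back
# # data is now all 2.0
```

The fact that `create_some_context` can pick a CPU device is what makes OpenCL "work anywhere", as noted in the pros below.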

OpenCL - Pros & Cons

Pros
• Cross platform
• Nice API
• Will use the CPU instead of the GPU if needed (works anywhere)

Cons
• Must use the C-like OpenCL language
• No recursion, pointers, etc.
• Slightly slower than CUDA

ArrayFire

ArrayFire is an easy-to-use library of high-level functions with built-in implementations for CUDA, OpenCL, and the CPU. It is useful for linear algebra, statistics, trigonometry, signal processing, image processing, and more. ArrayFire has first-party support for C++, Python, Go, Rust, Ruby, Lisp, Java, Fortran, D, R, C#, JavaScript, and Lua.

ArrayFire - Pros & Cons

Pros
• Trivial to use
• Cross platform
• Just pass arrays to functions

Cons
• Limited library of functions
• No way of defining your own

Torch

Torch is a popular Lua library for machine learning. It has CPU, CUDA, and OpenCL backends available.

Torch - Pros & Cons

Pros
• Large community
• High-level API
• Fast

Cons
• Lua

TensorFlow

TensorFlow is Google’s library for moving big lists of numbers around, generally with machine learning in mind. As a result, Torch and TensorFlow are currently at war. It has a CPU implementation and a CUDA-based GPU implementation. TensorFlow is primarily used from Python, with C++ behind the scenes.

TensorFlow - Pros & Cons

Pros
• Python
• Good visualization tools
• Cool abstraction
• Best library for recurrent neural networks

Cons
• Slightly slower than Torch (for now)
• Tricky to set up (CUDA)
• Needs a high-end Nvidia card to use the GPU

Copyright Notice

This presentation was from the Mines Linux Users Group. A mostly-complete archive of our presentations can be found online at https://lug.mines.edu. Individual authors may have certain copyright or licensing restrictions on their presentations. Please be certain to contact the original author to obtain permission to reuse or distribute these slides.
