OpenCUDA+MPI: A Framework for Heterogeneous GP-GPU Cluster Computing

Kenny Ballou

July 20, 2013

Ballou OpenCUDA+MPI

1 Project Overview
2 Progress

1 Project Overview

Terms and Definitions

2 Progress

Node Power
Node Configuration
Sample/Test Problem Development
Results: N-Body Simulation

Introduction: Parallel and Distributed Computing

What is Distributed General-Purpose GPU (GP-GPU) Computing?

Parallel: Processing concurrently

Distributed: Processing over many computers

GPU Computing: Highly parallel computing

(Highly) Parallel + Distributed = Awesome “High-Performance Computing”
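To make "distributed" concrete, here is a minimal sketch of block decomposition, a common way to split work such as the vector-summation test program across nodes. The names `block_range` and `distributed_add` are hypothetical, and the "ranks" run one after another in a single process, standing in for MPI processes on separate machines.

```python
def block_range(n, size, rank):
    """Return the half-open [start, stop) slice of n items owned by `rank`.

    The first n % size ranks get one extra item, so chunk sizes differ
    by at most one.
    """
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def distributed_add(a, b, size):
    """Element-wise a + b, computed chunk by chunk as `size` workers would.

    Simulated sequentially here; on a cluster each chunk would live on a
    different node and the results would be gathered afterwards.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    out = []
    for rank in range(size):
        start, stop = block_range(len(a), size, rank)
        # Each worker touches only its own slice: no communication needed.
        out.extend(x + y for x, y in zip(a[start:stop], b[start:stop]))
    return out


if __name__ == "__main__":
    print([block_range(10, 3, r) for r in range(3)])
    print(distributed_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], 2))
```

Element-wise summation is "embarrassingly parallel": once each rank knows its slice, no data has to move between nodes until the final gather.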

Node Power Requirements and Distribution

Now able to run all nodes

300 watts per node (peak)

~120 volts at 20 amps (single circuit)

~48 kilowatts
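The per-node and per-circuit figures can be combined into a quick capacity check: one 120 V, 20 A circuit supplies 120 × 20 = 2400 W, enough for 8 nodes at their 300 W peak. A minimal sketch follows; the `derate` parameter is an assumption added for illustration, modeling the common practice of keeping continuous loads at or below 80% of a breaker's rating.

```python
def nodes_per_circuit(volts, amps, watts_per_node, derate=1.0):
    """How many nodes one circuit can carry at their peak draw.

    `derate` scales the circuit's capacity (e.g. 0.8 for the usual
    80% continuous-load guideline); 1.0 uses the full rating.
    """
    capacity_watts = volts * amps * derate
    return int(capacity_watts // watts_per_node)


if __name__ == "__main__":
    print(nodes_per_circuit(120, 20, 300))              # full 2400 W rating
    print(nodes_per_circuit(120, 20, 300, derate=0.8))  # 1920 W usable
```

With the 80% guideline applied, the same circuit would carry 6 nodes rather than 8.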

Salt Node Configuration

Provisioning
Software configurations
Daemons/services

Arbitrary command execution
Bring nodes up and down
Query load

Complete for all nodes except the “master” (head) node
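Load queries through Salt come back as a dictionary keyed by minion ID, which makes it easy to act on the results, for example by choosing where to place the next job. A minimal sketch, assuming each value has the shape returned by Salt's `status.loadavg` execution function (keys `"1-min"`, `"5-min"`, `"15-min"`); the helper `least_loaded` and the minion names in the example are hypothetical.

```python
def least_loaded(loadavg_by_minion, k=1, window="1-min"):
    """Return the k minion IDs with the lowest load average.

    `loadavg_by_minion` is assumed to map minion ID -> the dict returned by
    Salt's `status.loadavg`. Entries that are not such dicts (e.g. an error
    value) are skipped defensively.
    """
    answered = {
        minion: loads[window]
        for minion, loads in loadavg_by_minion.items()
        if isinstance(loads, dict) and window in loads
    }
    # Ascending by load; ties keep their original order (sorted is stable).
    return sorted(answered, key=answered.get)[:k]


if __name__ == "__main__":
    sample = {
        "node01": {"1-min": 0.9, "5-min": 0.5, "15-min": 0.4},
        "node02": {"1-min": 0.1, "5-min": 0.2, "15-min": 0.3},
        "node03": False,
    }
    print(least_loaded(sample, k=2))
```

On the master, the input could be gathered with `salt.client.LocalClient().cmd('*', 'status.loadavg')`, which returns one entry per responding minion.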

Sample/Test Programs

10^9-element vector element-wise summation

N-body simulation using the particle-particle/particle-mesh (P3M) algorithm with an adaptive mesh
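In P3M the force on each particle is split into a short-range part, summed directly over nearby pairs (particle-particle), and a long-range part computed on a mesh (particle-mesh). Below is a minimal 2D sketch of only the direct particle-particle sum, assuming unit `G`, a Plummer-style softening length `eps`, and a hard `cutoff`; 2D is assumed because the grid code in this deck uses two coordinates. A real P3M code subtracts the mesh's contribution smoothly instead of switching at a cutoff, and uses cell lists rather than testing every pair.

```python
def pp_accelerations(xs, ys, masses, cutoff, G=1.0, eps=1e-3):
    """Short-range particle-particle accelerations in 2D (direct O(n^2) sum).

    Only pairs closer than `cutoff` contribute; in a P3M code the
    long-range remainder would come from the mesh step (not shown).
    """
    n = len(xs)
    ax = [0.0] * n
    ay = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[j] - xs[i]
            dy = ys[j] - ys[i]
            r2 = dx * dx + dy * dy
            if r2 >= cutoff * cutoff:
                continue  # far pair: left to the mesh part
            # Softened inverse-cube distance avoids blow-up for close pairs.
            inv_r3 = 1.0 / (r2 + eps * eps) ** 1.5
            # Newton's third law: equal and opposite pull on i and j.
            ax[i] += G * masses[j] * dx * inv_r3
            ay[i] += G * masses[j] * dy * inv_r3
            ax[j] -= G * masses[i] * dx * inv_r3
            ay[j] -= G * masses[i] * dy * inv_r3
    return ax, ay


if __name__ == "__main__":
    print(pp_accelerations([0.0, 1.0], [0.0, 0.0], [1.0, 1.0], cutoff=2.0))
```

Visiting each pair once and applying the force to both particles halves the work of the naive double loop.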

Refactoring

def get_particles_in_grid(r, grid):
    # r = (xs, ys) particle coordinates; grid = (x0, y0, side) of a square cell.
    def contains(p, d):
        # Half-open bounds: a point on a shared edge belongs to one cell only.
        return d[0] <= p[0] < d[0] + d[2] and \
            d[1] <= p[1] < d[1] + d[2]
    # Lazily yield the index of every particle inside the cell.
    return (i for i in range(len(r[0]))
            if contains((r[0][i], r[1][i]), grid))

Refactoring: Result

import numpy as np

def get_particles_in_grid(r, grid):
    # Indices of particles with grid[0] <= x < grid[0] + grid[2].
    x = np.intersect1d(np.where(grid[0] <= r[0])[0],
                       np.where(r[0] < grid[0] + grid[2])[0],
                       assume_unique=True)
    # Indices of particles with grid[1] <= y < grid[1] + grid[2].
    y = np.intersect1d(np.where(grid[1] <= r[1])[0],
                       np.where(r[1] < grid[1] + grid[2])[0],
                       assume_unique=True)
    # Particles inside the cell satisfy both; returns a sorted index array.
    return np.intersect1d(x, y, assume_unique=True)
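A further variation on the same vectorized idea, offered as a sketch rather than the project's code: combine all four comparisons into one boolean mask and read the indices off with `np.flatnonzero`. `np.intersect1d` sorts its inputs, so the three calls cost roughly O(n log n), while a single mask is one O(n) pass. The function name `get_particles_in_grid_mask` is hypothetical; it assumes the same `(xs, ys)` and `(x0, y0, side)` layout as the code above and returns the same sorted indices.

```python
import numpy as np


def get_particles_in_grid_mask(r, grid):
    """Indices of particles inside the half-open square cell `grid`."""
    x0, y0, side = grid
    xs = np.asarray(r[0])
    ys = np.asarray(r[1])
    # One elementwise pass: True where the particle lies in the cell.
    inside = (x0 <= xs) & (xs < x0 + side) & (y0 <= ys) & (ys < y0 + side)
    # flatnonzero returns the True positions in ascending order.
    return np.flatnonzero(inside)


if __name__ == "__main__":
    r = ([0.5, 1.5, 0.2, 1.0], [0.5, 0.5, 0.9, 0.5])
    print(get_particles_in_grid_mask(r, (0.0, 0.0, 1.0)))
```

The particle at x = 1.0 is excluded because the upper bound is strict, matching the original `contains` test.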

Timing Results: N-Body Simulation (2k)

Method    User (seconds)  Sys (seconds)  Real (seconds)
CPU       28.62           0.01           29.81
GPU       0.45            0.56           2.31
CUDA+MPI  N/A             N/A            N/A

Table: N-Body 2k Time Comparisons
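The User, Sys, and Real columns match what the shell's `time` reports. Assuming that is how these numbers were collected, the sketch below reproduces the three measurements for a child process using Python's Unix-only `resource` module; `time_command` is a hypothetical helper. User and sys times from `getrusage(RUSAGE_CHILDREN)` only cover processes that ran and were waited for on the local machine, so for an MPI launch spanning several nodes, real (wall-clock) time is likely the fairest column to compare.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion; return (user, sys, real) seconds like `time`.

    user/sys come from getrusage(RUSAGE_CHILDREN), which only counts child
    processes that have finished and been waited for on this machine.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # blocks until the child exits
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return user, system, real


if __name__ == "__main__":
    # Time a short Python child process as a smoke test.
    u, s, r = time_command([sys.executable, "-c", "sum(range(10**6))"])
    print(f"user {u:.2f}s  sys {s:.2f}s  real {r:.2f}s")
```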

Timing Results: N-Body Simulation (20k)

Method    User (seconds)  Sys (seconds)  Real (seconds)
CPU       2368.39         1.30           2377.88
GPU       18.92           2.25           22.95
CUDA+MPI  1.19            1.01           2.94

Table: N-Body 20k Time Comparisons

Timing Results: N-Body Simulation (200k)

Method    User (seconds)  Sys (seconds)  Real (seconds)
CPU       ...             ...            ...
GPU       39.14           4.68           46.57
CUDA+MPI  6.43            5.01           13.65

Table: N-Body 200k Time Comparisons

Timing Results: N-Body Simulation (2m)

Method    User (seconds)  Sys (seconds)  Real (seconds)
CPU       ......          ......         ......
GPU       158.23          17.88          184.64
CUDA+MPI  68.50           44.93          127.04

Table: N-Body 2m Time Comparisons

Timing Results: N-Body Simulation (20m)

Method    User (seconds)  Sys (seconds)  Real (seconds)
CPU       Nope            Nope           Nope
GPU       1159.89         147.24         1359.77
CUDA+MPI  623.41          156.82         901.62

Table: N-Body 20m Time Comparisons
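Because not every method was run at every size, one way to read the timing tables together is as ratios of real (wall-clock) time. The sketch below hard-codes the Real column from the tables, with `None` for the empty cells ("N/A", "...", "Nope"); `REAL_SECONDS` and `speedup` are hypothetical names.

```python
# Real (wall-clock) seconds copied from the N-Body timing tables.
# None marks cells the tables leave empty ("N/A", "...", "Nope").
REAL_SECONDS = {
    "2k":   {"CPU": 29.81,   "GPU": 2.31,    "CUDA+MPI": None},
    "20k":  {"CPU": 2377.88, "GPU": 22.95,   "CUDA+MPI": 2.94},
    "200k": {"CPU": None,    "GPU": 46.57,   "CUDA+MPI": 13.65},
    "2m":   {"CPU": None,    "GPU": 184.64,  "CUDA+MPI": 127.04},
    "20m":  {"CPU": None,    "GPU": 1359.77, "CUDA+MPI": 901.62},
}


def speedup(size, baseline, candidate):
    """Return baseline_time / candidate_time, or None if either is missing."""
    row = REAL_SECONDS[size]
    base, cand = row[baseline], row[candidate]
    if base is None or cand is None:
        return None
    return base / cand


if __name__ == "__main__":
    for size in REAL_SECONDS:
        s = speedup(size, "GPU", "CUDA+MPI")
        print(size, "GPU -> CUDA+MPI:", "n/a" if s is None else f"{s:.2f}x")
```

By this arithmetic, the single GPU is about 104x faster than the CPU at 20k, and CUDA+MPI is about 7.8x faster than the single GPU at 20k, 3.4x at 200k, and about 1.5x at 2m and 20m.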
