OpenCUDA+MPI: A Framework for Heterogeneous GP-GPU Cluster Computing

Kenny Ballou

June 29, 2013

Ballou OpenCUDA+MPI

1 Project Overview
    Terms and Definitions
    Problems
    Plans and Goals

2 Progress
    Node Configuration
    Sample/Test Problem Development
    Results: Vector Summation
    Results: N-Body Simulation

Introduction: Parallel and Distributed Computing

What is General-Purpose Graphics Processing Unit (GP-GPU) distributed computing?

Parallel: processing concurrently
Distributed: processing over many computers
GPU computing: highly parallel computing
(Highly) parallel + distributed: awesome “High-Performance Computing”

Current Problems

“Distributed Programming” is expensive
Specificity of Hardware
Data Distribution
    Volume
    Network File System (NFS)
Fault Tolerance
Optimizing Resources and Utilization

Goals

OpenCUDA+MPI: A Framework . . .

1 Ease Programming Expense
2 Enable / Allow Diversity in Computing Environment
3 Release as Free and Open-Source Software

Plan

Develop several different programs / solutions for each program
Profile (analyze) solutions
Develop framework; rework solutions to use new approach; profile again
Add cluster / node configuration and scheduling options
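As one possible way to carry out the profiling step, the sketch below records user, system, and wall-clock time around a single call, the same three quantities reported in the N-body timing tables. The helper name `time_call` and the toy workload are hypothetical, and the sketch assumes a POSIX system where `os.times()` reports the CPU time of the current process.

```python
import os
import time


def time_call(func, *args):
    """Run func(*args) and return (result, user_s, sys_s, real_s)."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    result = func(*args)
    wall_after = time.perf_counter()
    cpu_after = os.times()
    # CPU time spent by this process only (child processes are not counted).
    user_s = cpu_after.user - cpu_before.user
    sys_s = cpu_after.system - cpu_before.system
    return result, user_s, sys_s, wall_after - wall_before


def toy_workload(n):
    """Hypothetical stand-in for a solution under test."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    result, user_s, sys_s, real_s = time_call(toy_workload, 1_000_000)
    print(f"user={user_s:.2f}s sys={sys_s:.2f}s real={real_s:.2f}s")
```

Running the same helper before and after reworking a solution gives a like-for-like comparison, as long as the input size is kept fixed.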

2 Progress

Salt Node Configuration

Provisioning
    Software
    Configurations
    Daemons/Services
Arbitrary Command Execution
    Bring up and down nodes
    Query load
Complete for all nodes except for the “master” or head node
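The command-execution items above correspond to Salt's `salt` command-line tool, which is run from the master. The sketch below only builds the argument lists and does not run them, so it works without Salt installed. The target glob `node*` and the helper names are hypothetical; `test.ping`, `cmd.run`, and `status.loadavg` are standard Salt execution functions, but the slides do not say which commands the project actually uses.

```python
import shlex

# Hypothetical minion target: a glob matching the compute node names.
DEFAULT_TARGET = "node*"


def salt_command(function, *args, target=DEFAULT_TARGET):
    """Build the argv list for `salt <target> <function> [args...]`."""
    return ["salt", target, function, *args]


def ping_nodes(target=DEFAULT_TARGET):
    """Check which nodes respond."""
    return salt_command("test.ping", target=target)


def run_on_nodes(shell_command, target=DEFAULT_TARGET):
    """Run an arbitrary shell command on the targeted nodes."""
    return salt_command("cmd.run", shell_command, target=target)


def query_load(target=DEFAULT_TARGET):
    """Ask the targeted nodes for their load averages."""
    return salt_command("status.loadavg", target=target)


if __name__ == "__main__":
    for argv in (ping_nodes(), run_on_nodes("uptime"), query_load()):
        # shlex.join quotes the glob so the shell does not expand it.
        print(shlex.join(argv))
```

On a configured master, each list could be passed to `subprocess.run` to execute the command.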

Sample/Test Programs

10^9-element vector element-wise summation
N-Body Simulation using the particle-particle, particle (adaptive) mesh (P3M) algorithm
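To make the first test problem concrete, the sketch below splits an element-wise vector sum into contiguous blocks, one per worker, which is the kind of decomposition an MPI scatter and gather would perform. It is plain Python with no MPI or CUDA calls and uses a tiny vector instead of 10^9 elements. The function names and the choice of 7 workers (mirroring the 7-node runs) are illustrative assumptions, not the project's code.

```python
def block_bounds(n_items, n_workers):
    """Return (start, stop) index pairs that split n_items as evenly as possible."""
    base, extra = divmod(n_items, n_workers)
    bounds = []
    start = 0
    for rank in range(n_workers):
        # The first `extra` workers take one additional element.
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def add_block(a, b, start, stop):
    """Work done by one worker: element-wise sum of its own block."""
    return [a[i] + b[i] for i in range(start, stop)]


def distributed_vector_sum(a, b, n_workers):
    """Emulate scatter, local compute, and gather in a single process."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    result = []
    for start, stop in block_bounds(len(a), n_workers):
        result.extend(add_block(a, b, start, stop))
    return result


if __name__ == "__main__":
    a = list(range(20))
    b = [10 * x for x in a]
    print(block_bounds(len(a), 7))
    print(distributed_vector_sum(a, b, 7))
```

In a real run, each block would live on a different node, and `add_block` is the part a CUDA kernel could replace.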

Timing Results: Vector Summation

Method                  Time (s)    Total Time (s)
CPU Only                13.7        254.13
CUDA (Single Node)      13.83       4172
MPI + CUDA (7 nodes)    10.51       (average) 3177
MPI (7 nodes)                       (average) 226

Table: Computational timing comparison of 10^9 element-wise vector summation

LOTS of I/O
Bad Example
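A rough count of bytes moved versus arithmetic done suggests why this problem is dominated by I/O. The sketch assumes 4-byte (single-precision) elements, which the slides do not state; change `BYTES_PER_ELEMENT` if the actual element type differs.

```python
N_ELEMENTS = 10**9       # vector length from the test problem
BYTES_PER_ELEMENT = 4    # assumption: single-precision floats


def data_movement(n_elements, bytes_per_element):
    """Return (bytes_moved, additions) for c[i] = a[i] + b[i]."""
    # Two input vectors are read and one output vector is written.
    bytes_moved = 3 * n_elements * bytes_per_element
    additions = n_elements
    return bytes_moved, additions


if __name__ == "__main__":
    bytes_moved, additions = data_movement(N_ELEMENTS, BYTES_PER_ELEMENT)
    print(f"bytes moved: {bytes_moved / 1e9:.0f} GB")
    print(f"bytes per addition: {bytes_moved / additions:.0f}")
```

Under that assumption every addition needs 12 bytes of traffic, so there is very little computation over which to amortize disk, network, and host-to-device transfers.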

Timing Results: N-Body Simulation
CPU Solutions

Size         User (seconds)    Sys (seconds)    Real (seconds)
2001         10.65             0.00             12.03
20000        861.46            0.00             861.72
200000       109306            18.05            109364
2 million    ...               ...              ...
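For orientation, the sketch below shows only the direct particle-particle part of an N-body step: every particle accumulates gravitational acceleration from every other particle, which costs O(N^2) work per step. The CPU timings above grow by roughly 70 to 130 times for each tenfold increase in size, which is in the vicinity of that quadratic pattern. This is not the project's P3M implementation (the mesh part is omitted), and the gravitational constant of 1, the softening length, and all names are illustrative assumptions.

```python
import math

G = 1.0            # assumption: gravitational constant in simulation units
SOFTENING = 1e-3   # assumption: avoids division by zero for close pairs


def accelerations(positions, masses):
    """Direct-sum accelerations; positions are (x, y, z) tuples."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from particle i to particle j.
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            dist_sq = sum(d * d for d in dx) + SOFTENING ** 2
            inv_dist_cubed = 1.0 / (dist_sq * math.sqrt(dist_sq))
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_dist_cubed
    return acc


if __name__ == "__main__":
    positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    masses = [1.0, 1.0]
    for a in accelerations(positions, masses):
        print(a)
```

The doubly nested loop is what a GPU kernel would typically parallelize, with one thread per particle `i`.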

Timing Results: N-Body Simulation
CUDA Solutions

Size          User (seconds)    Sys (seconds)    Real (seconds)
2001          0.68              0.48             1.25
20000         3.41              0.55             4.06
200000        31.06             1.11             32.28
2 million     347               11.93            361
20 million    115.47            120.65           13927
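The two N-body tables can be compared directly. The sketch below divides the CPU wall-clock (real) time by the CUDA wall-clock time for the three sizes measured on both; the numbers are copied from the tables and the names are illustrative.

```python
# Real (wall-clock) seconds copied from the two N-body tables.
CPU_REAL = {2001: 12.03, 20000: 861.72, 200000: 109364.0}
CUDA_REAL = {2001: 1.25, 20000: 4.06, 200000: 32.28}


def speedups(cpu_times, gpu_times):
    """Return {size: cpu_time / gpu_time} for sizes present in both tables."""
    common = sorted(set(cpu_times) & set(gpu_times))
    return {size: cpu_times[size] / gpu_times[size] for size in common}


if __name__ == "__main__":
    for size, factor in speedups(CPU_REAL, CUDA_REAL).items():
        print(f"{size:>7} bodies: {factor:8.1f}x")
```

With these figures the CUDA version is roughly 10 times faster at 2001 bodies and more than 3000 times faster at 200000 bodies.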
