OpenCUDA+MPI: A Framework for Heterogeneous GP-GPU Cluster Computing

Kenny Ballou

May 3, 2013

Ballou OpenCUDA+MPI

1 Project Introduction
    Project
2 Project Goals / Objectives
    A Framework
    Metrics of Goals
3 Project Results
    Configuration
    pyCUDA + MPI4PY
    Initial Timing Results
4 Future Work
    Develop Framework
    Profiling and Testing
    Cluster Configuration


Parallel and Distributed Computing

What is GP-GPU distributed computing?
- Parallel: processing concurrently (CUDA)
- Distributed: processing over many computers, not necessarily in parallel (MPI)
- Combined: CUDA+MPI


Framework

- Ease and abstract difficulties in parallel / distributed programming
- Allow for diversity in computing environment (“Jungle” computing)
- Add process “scheduler” to best utilize available computing resources
- Add cluster configuration and management
- Release as FOSS


Goal Metrics

- Develop several test programs
    - Vascular extraction from CT angiography scans
    - N-body simulation (possible)
    - Prime number searching (possible)
- Profile / compare CPU-only, CUDA-only, and MPI+CUDA solutions


Salt Node Configuration

- Provision software / settings / daemons
- Bring nodes up and down
- Complete for all nodes except “master”


pyCUDA + MPI4PY

pyCUDA
- Host to device memory copies
- Device to host memory copies
- Block, thread indexing complexities
- gpuarray

MPI
- mpirun is a bit messy

MPI4PY
- No Python 3.x support
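The block and thread indexing mentioned above usually comes down to two pieces of arithmetic: a ceiling division to choose the number of blocks, and a global index built from the block and thread coordinates. The following is a CPU-only Python sketch of that arithmetic for a 1-D launch. It emulates the indexing rather than calling pyCUDA, and the helper names `grid_size` and `global_indices` are illustrative assumptions, not part of pyCUDA or of this project.

```python
def grid_size(n, block_size):
    """Blocks needed so that grid_size * block_size >= n (ceiling division)."""
    return (n + block_size - 1) // block_size


def global_indices(n, block_size):
    """Emulate on the CPU which global index each (block, thread) pair handles."""
    indices = []
    for block_idx in range(grid_size(n, block_size)):
        for thread_idx in range(block_size):
            # Same arithmetic as blockIdx.x * blockDim.x + threadIdx.x in a kernel.
            i = block_idx * block_size + thread_idx
            if i < n:  # bounds guard: the last block may be only partly used
                indices.append(i)
    return indices


if __name__ == "__main__":
    print(grid_size(1000, 256))
    print(global_indices(10, 4))
```

The bounds guard matters whenever the element count is not a multiple of the block size; without it, the surplus threads in the last block would index past the end of the array.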


Timing Results / Comparison

Method                  Time (s)    Total Time (s)
CPU Only                13.7        254.13
CUDA (Single Node)      13.83       4172
MPI + CUDA (7 nodes)    10.51       (average) 3177

Table: Computational timing comparison of 10^9 element-wise vector summation

- LOTS of IO
- Bad example
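To make the comparison concrete, the ratios implied by the table can be worked out directly. The sketch below copies the three rows from the table and computes, relative to the CPU-only run, the speedup in the Time column and the slowdown in the Total Time column. The slides do not define what the two columns measure, so reading them as compute time versus end-to-end time is an assumption; the function name `ratios` is illustrative.

```python
# (method, time_s, total_time_s) copied from the timing table
RESULTS = [
    ("CPU Only", 13.7, 254.13),
    ("CUDA (Single Node)", 13.83, 4172.0),
    ("MPI + CUDA (7 nodes)", 10.51, 3177.0),
]


def ratios(rows, baseline="CPU Only"):
    """Return {method: (time_speedup, total_slowdown)} relative to the baseline row."""
    by_name = {name: (time_s, total_s) for name, time_s, total_s in rows}
    base_time, base_total = by_name[baseline]
    return {
        name: (base_time / time_s, total_s / base_total)
        for name, (time_s, total_s) in by_name.items()
    }


if __name__ == "__main__":
    for name, (speedup, slowdown) in ratios(RESULTS).items():
        print(f"{name}: {speedup:.2f}x on Time, {slowdown:.2f}x longer in Total Time")
```

On these numbers the 7-node run is roughly 1.3 times faster than the CPU in the Time column while taking more than 12 times longer in total, which is consistent with the note that this I/O-heavy workload is a bad example.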


Begin Framework Development

- Create abstraction for CUDA and MPI
    - Array slicing calculations
- Create custom MPI runner
    - Will help with scheduling
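One plausible form of the array slicing calculation is a balanced block partition: each worker gets a contiguous range, and any remainder is spread one element at a time over the lowest ranks. This is a sketch under that assumption, not the framework's actual scheme, and `slice_bounds` is a hypothetical helper name. In an mpi4py program, `rank` and `size` would typically come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`; here they are plain arguments so the arithmetic can be checked without MPI.

```python
def slice_bounds(n, size, rank):
    """Half-open [start, stop) range of the n elements owned by `rank` of `size` workers."""
    base, extra = divmod(n, size)
    # The first `extra` ranks each take one additional element.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    n, size = 10, 3
    for rank in range(size):
        print(rank, slice_bounds(n, size, rank))
```

Because the ranges are contiguous and differ in length by at most one element, each worker can slice its share out of a host array before copying it to its device.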


Add Profiling, Unit Testing, and Integration Testing

- Use existing Python and CUDA profiling tools
- Interface profiling tools into framework
- Tests for sense of “correctness”
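As an illustration of what this could look like on the Python side, the sketch below profiles a CPU reference implementation with the standard-library `cProfile` and `pstats` modules and checks a candidate result against it within a tolerance. It is a minimal sketch, not the framework's interface: `vector_sum`, `profile_call`, and `results_match` are hypothetical names, and no CUDA profiling tool is involved.

```python
import cProfile
import io
import math
import pstats


def vector_sum(a, b):
    """CPU reference: element-wise sum of two equal-length sequences."""
    return [x + y for x, y in zip(a, b)]


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # show at most five entries
    return result, stream.getvalue()


def results_match(candidate, reference, rel_tol=1e-6):
    """Correctness check: same length and element-wise agreement within rel_tol."""
    return len(candidate) == len(reference) and all(
        math.isclose(c, r, rel_tol=rel_tol) for c, r in zip(candidate, reference)
    )


if __name__ == "__main__":
    result, report = profile_call(vector_sum, [1.0, 2.0], [3.0, 4.0])
    print(report)
    print(results_match(result, [4.0, 6.0]))
```

Comparing with a tolerance rather than exact equality is a deliberate choice here, since floating-point results computed on a GPU may differ slightly from a CPU reference.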


Cluster Configuration / Management

- Add Salt configuration for master node
- Research and implement a distributed filesystem
