High Performance Computing For senior undergraduate students

Lecture 8: Analytical Modeling of Parallel Systems 29.11.2016

Dr. Mohammed Abdel-Megeed Salem Scientific Computing Department Faculty of Computer and Information Sciences Ain Shams University

Outlook
• Sources of Overhead in Parallel Programs
  – Interprocess Interaction
  – Idling
  – Excess Computation
• Performance Metrics for Parallel Systems
  – Execution Time
  – Total Parallel Overhead
  – Speedup
  – Efficiency

High Performance Computing 2016/ 2017

Analytical Modeling - Basics

• A sequential algorithm is evaluated by its runtime (in general, its asymptotic runtime as a function of input size).
• The asymptotic runtime of a sequential program is identical on any serial platform.
• The parallel runtime of a program depends on the input size, the number of processors, and the communication parameters of the machine.
• A parallel system is a combination of a parallel algorithm and an underlying platform.

Terms

• Interprocess interactions: Processing elements need to communicate with each other, and this communication takes time.
• Idling: Processes may idle because of load imbalance, synchronization, or serial components.
• Excess computation: Computation performed by the parallel program that the serial version does not perform.

Execution Time
• The serial runtime of a program is the time elapsed between the beginning and the end of its execution on a sequential computer.
• The parallel runtime is the time that elapses from the moment the first processor starts to the moment the last processor finishes execution.
• We denote the serial runtime by $T_S$ and the parallel runtime by $T_P$.

Ts + Tp = 100%

Performance Metrics for Parallel Systems: Total Parallel Overhead
• Let $T_{all}$ be the total time collectively spent by all the processing elements: $T_{all} = p T_P$, where $p$ is the number of processors.
• $T_S$ is the serial time.
• The total overhead is $T_o = p T_P - T_S$.
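
As a minimal sketch of the overhead formula, the Python function below evaluates the total overhead as p times the parallel runtime minus the serial runtime. The function name `total_overhead` and the timings in the usage line are hypothetical values chosen for illustration, not measurements from the lecture.

```python
def total_overhead(p, t_parallel, t_serial):
    """Return the total parallel overhead T_o = p * T_P - T_S."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return p * t_parallel - t_serial


# Hypothetical timings: 4 processing elements, T_P = 30 s, T_S = 100 s.
print(total_overhead(4, 30.0, 100.0))  # 20.0
```

With these assumed numbers, the four processing elements collectively spend 120 s, of which 20 s is overhead.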

Performance Metrics for Parallel Systems: Speedup
• Speedup ($S$) is the ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with $p$ identical processing elements.
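
Using the serial and parallel runtimes defined on the Execution Time slide, the definition of speedup can be written as a formula. This is a restatement of the sentence above, not an additional result.

```latex
S = \frac{T_S}{T_P}
```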

Performance Metrics: Example
• Consider the problem of adding $n$ numbers by using $n$ processing elements.
• If $n$ is a power of two, we can perform this operation in $\log n$ steps by propagating partial sums up a logical binary tree of processing elements.

• Initially, each processing element is assigned one of the numbers to be added and, at the end of the computation, one of the processing elements stores the sum of all the numbers.

Performance Metrics: Example

Figure: Computing the global sum of 16 partial sums using 16 processing elements. $\Sigma_i^j$ denotes the sum of numbers with consecutive labels from $i$ to $j$.

Performance Metrics: Example
• Each step consists of one addition and the communication of a single word.
• If an addition takes constant time, say $t_c$, and communication of a single word takes time $t_s + t_w$, then each step takes a constant amount of time.
• The parallel time is therefore $T_P = \Theta(\log n)$.
• We know that $T_S = \Theta(n)$.
• The speedup $S$ is given by $S = \Theta(n / \log n)$.
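
The tree-based addition can be sketched as a sequential Python simulation that counts the parallel steps. This is an illustrative sketch, not code from the lecture: the function name `tree_sum` is hypothetical, and the inner loop stands in for additions that the processing elements would perform simultaneously.

```python
def tree_sum(values):
    """Simulate a binary-tree reduction; return (total, number_of_parallel_steps)."""
    partial = list(values)
    n = len(partial)
    if n == 0 or n & (n - 1):
        raise ValueError("the number of values must be a power of two")
    steps = 0
    stride = 1
    while stride < n:
        # One parallel step: each receiving element i adds the partial sum
        # held by element i + stride. On a real machine all of these
        # additions (and the one-word transfers) happen at the same time.
        for i in range(0, n, 2 * stride):
            partial[i] += partial[i + stride]
        stride *= 2
        steps += 1
    return partial[0], steps


total, steps = tree_sum(range(16))
print(total, steps)  # 120 4
```

For 16 values the simulation takes 4 steps, matching log n with the logarithm taken to base 2.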

Performance Metrics: Speedup
• For a given problem, there might be many serial algorithms available. These algorithms may have different asymptotic runtimes and may be parallelizable to different degrees.
• For the purpose of computing speedup, we always consider the best sequential program as the baseline.

Performance Metrics: Speedup
• Consider the problem of parallel bubble sort.
• The serial time for bubble sort is 150 seconds for 100 000 records.
• The parallel time for odd-even sort (an efficient parallelization of bubble sort) is 40 seconds.
• The speedup would appear to be 150/40 = 3.75.
• But is this really a fair assessment of the system?
• What if serial quicksort took only 30 seconds? In this case, the speedup is 30/40 = 0.75. This is a more realistic assessment of the system.
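
The effect of the baseline choice can be checked with a few lines of Python. The timings are the ones quoted in the example; the helper `speedup` and the dictionary layout are illustrative assumptions.

```python
def speedup(t_serial, t_parallel):
    """Return the speedup S = T_S / T_P."""
    return t_serial / t_parallel


# Serial runtimes from the example, in seconds.
serial_times = {"bubble sort": 150.0, "quicksort": 30.0}
t_parallel = 40.0  # parallel odd-even sort

naive = speedup(serial_times["bubble sort"], t_parallel)
fair = speedup(min(serial_times.values()), t_parallel)
print(naive, fair)  # 3.75 0.75
```

Taking the minimum over the known serial runtimes is one simple way to enforce the rule that the best sequential program is the baseline.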

Performance Metrics: Speedup Bounds
• Speedup can be as low as 0 (the parallel program never terminates).
• Speedup, in theory, should be upper bounded by $p$: after all, we can only expect a $p$-fold speedup if we use $p$ times as many resources.
• A speedup greater than $p$ is possible only if each processing element spends less than time $T_S / p$ solving the problem.
• In this case, a single processor could be time-sliced to emulate the $p$ processing elements and achieve a faster serial program, which contradicts our assumption that the fastest serial program is the basis for speedup.
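
The argument behind the upper bound can be written out as a short derivation. It assumes, as the slide does, that one processor can emulate the p processing elements by time-slicing at no extra cost, so that the emulation takes time at most p times the parallel runtime.

```latex
\begin{align*}
S > p \;&\Longrightarrow\; T_P = \frac{T_S}{S} < \frac{T_S}{p} \\
        &\Longrightarrow\; p\,T_P < T_S .
\end{align*}
```

Under that assumption the emulation would be a serial program that finishes in less than the fastest serial runtime, which is the contradiction.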

Performance Metrics: Efficiency
• Efficiency is a measure of the fraction of time for which a processing element is usefully employed.
• Mathematically, it is given by $E = S / p$.
• Following the bounds on speedup, efficiency can be as low as 0 and as high as 1.
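
Efficiency can also be related to the total overhead defined earlier in the lecture. The rearrangement below is not on the slide; it is a worked step that uses only the definitions of speedup and total overhead.

```latex
\begin{align*}
E = \frac{S}{p} = \frac{T_S}{p\,T_P} = \frac{T_S}{T_S + T_o} = \frac{1}{1 + T_o / T_S}
\end{align*}
```

In this form, efficiency equals 1 only when the overhead is zero, and it falls as the overhead grows relative to the serial time.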

Performance Metrics: Efficiency
• The speedup of adding $n$ numbers on $n$ processors is given by $S = \Theta(n / \log n)$.
• Efficiency is given by $E = \Theta(n / \log n) / n = \Theta(1 / \log n)$.
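
To see how quickly efficiency drops in this example, the sketch below tabulates speedup and efficiency for a few problem sizes. It assumes a serial time of n steps and a parallel time of log n steps (base 2), so every constant hidden by the asymptotic notation is set to 1; the function name `add_n_numbers_metrics` is hypothetical.

```python
import math


def add_n_numbers_metrics(n):
    """Return (speedup, efficiency) for adding n numbers on n processing elements.

    Assumes T_S = n and T_P = log2(n), measured in units of one step.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    t_serial = n
    t_parallel = math.log2(n)
    speedup = t_serial / t_parallel
    efficiency = speedup / n  # p = n processing elements
    return speedup, efficiency


for n in (4, 16, 256, 1024):
    s, e = add_n_numbers_metrics(n)
    print(f"n={n:5d}  S={s:8.2f}  E={e:.3f}")
```

Under these assumptions the speedup keeps growing with n, while the efficiency shrinks as 1 / log n.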

Parallel Time, Speedup, and Efficiency
Consider the problem of edge detection in images. The problem requires us to apply a 3 x 3 template to each pixel. If each multiply-add operation takes time $t_c$, the serial time for an n x n image is given by $T_S = 9 t_c n^2$.

Parallel Time, Speedup, and Efficiency
• One possible parallelization partitions the image equally into vertical segments, each with $n^2 / p$ pixels.
• The algorithm executes in two steps: (i) exchange a layer of $n$ pixels with each of the two adjoining processing elements; and (ii) apply the template to the local subimage.
• The first step involves two $n$-word messages (assuming each pixel takes a word to communicate RGB data). This takes time $2(t_s + t_w n)$.
• In the second step, the template is applied to all $n^2 / p$ local pixels, which takes time $9 t_c n^2 / p$.

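Parallel Time, Speedup, and Efficiency
• The total time for the algorithm is therefore given by:

$$T_P = 9 t_c \frac{n^2}{p} + 2(t_s + t_w n)$$

• The corresponding values of speedup and efficiency are given by:

$$S = \frac{9 t_c n^2}{9 t_c \frac{n^2}{p} + 2(t_s + t_w n)}$$

and

$$E = \frac{1}{1 + \frac{2p(t_s + t_w n)}{9 t_c n^2}}$$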
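
The edge-detection model can be evaluated numerically with the sketch below. The function name `edge_detection_model` and all parameter values in the usage loop are hypothetical; they are chosen only to show how the communication term lowers efficiency as the number of processing elements grows.

```python
def edge_detection_model(n, p, t_c, t_s, t_w):
    """Return (T_P, S, E) for the vertically partitioned edge-detection model.

    n: image side length, p: number of processing elements,
    t_c: time per multiply-add, t_s: message startup time,
    t_w: per-word transfer time.
    """
    t_serial = 9 * t_c * n**2
    # Local template application plus two n-word boundary exchanges.
    t_parallel = 9 * t_c * n**2 / p + 2 * (t_s + t_w * n)
    speedup = t_serial / t_parallel
    efficiency = speedup / p
    return t_parallel, speedup, efficiency


# Hypothetical machine parameters, in seconds.
for p in (2, 8, 32):
    t_p, s, e = edge_detection_model(n=1024, p=p, t_c=1e-9, t_s=1e-5, t_w=1e-8)
    print(f"p={p:3d}  T_P={t_p:.6f}  S={s:6.2f}  E={e:.3f}")
```

Setting both communication parameters to zero recovers the ideal case in which the speedup equals p and the efficiency equals 1.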

Contacts
High Performance Computing, 2016/2017
Dr. Mohammed Abdel-Megeed M. Salem
Faculty of Computer and Information Sciences, Ain Shams University
Abbassia, Cairo, Egypt
Tel.: +2 011 1727 1050
Email: [email protected]
Web: https://sites.google.com/a/fcis.asu.edu.eg/salem
