Automatic Generation of Efficient Codes from Mathematical Descriptions of Stencil Computation

Takayuki Muranushi (1), Seiya Nishizawa (1), Hirofumi Tomita (1), Keigo Nitadori (1), Masaki Iwasawa (1), Yutaka Maruyama (1), Hisashi Yashiro (1), Yoshifumi Nakamura (1), Hideyuki Hotta (2), Junichiro Makino (3), Natsuki Hosono (4), Hikaru Inoue (5)

(1) RIKEN Advanced Institute for Computational Science
(2) Chiba University
(3) Kobe University
(4) Kyoto University
(5) Fujitsu Ltd.

Sep 22, 2016 for FHPC 2016 workshop / ICFP’16 Nara, Japan T. Muranushi et al. (RIKEN AICS)

Formura

Sep 22, 2016

1 / 37

Programming language Formura

A domain-specific language for stencil computation


Good news of Formura 1/2

1.184 Petaflops (11.62% of the peak) on 663,552 cores


ACM Gordon Bell Prize Finalist


Good news of Formura 2/2

In mathematical notation:

$$\frac{\partial \rho}{\partial t} = -\sum_{i=1}^{3} \frac{\partial}{\partial x_i} (\rho v_i)$$

In Formura:

ddt_ρ = - Σ fun(i) ∂ i (ρ * v i)


Formura

is a functional programming language

is implemented in a functional programming language (Haskell)


Backend: How we generate efficient codes


Stencil Computation


The Byte/Flop (B/F) ratio of hardware is decreasing


Naive implementation of stencil computation

The optimal

$$\frac{B}{F} = \frac{2 H_e}{C_e}$$


Temporal Blocking

The optimal

$$\frac{B}{F} = \frac{2 H_e}{C_e} \left( \frac{1}{N_T} + \frac{2 d N_s}{N_F} \right)$$
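The idea behind temporal blocking can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 1D three-point averaging stencil, no boundary handling, and hypothetical names (`step`, `naive`, `temporally_blocked`, `block`) that are not taken from Formura or its generated code. The naive version sweeps the whole array once per time step, while the blocked version loads each block together with a halo of `n_t` cells per side once and advances it `n_t` steps before storing it.

```python
def step(u):
    """One sweep of a 3-point averaging stencil; the result is 2 cells shorter."""
    return [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]


def naive(u, n_t):
    """Apply n_t sweeps over the whole array: one pass over memory per step."""
    for _ in range(n_t):
        u = step(u)
    return u


def temporally_blocked(u, n_t, block):
    """Advance each output block by n_t steps from a single load.

    Each tile holds the block plus a halo of n_t cells on both sides
    (assumption: stencil radius 1), so the halo is consumed as time advances.
    """
    out_len = len(u) - 2 * n_t
    out = []
    for start in range(0, out_len, block):
        stop = min(start + block, out_len)
        tile = u[start:stop + 2 * n_t]   # block + halo, loaded once
        for _ in range(n_t):
            tile = step(tile)            # shrinks by one cell per side
        out.extend(tile)                 # exactly stop - start cells remain
    return out


u0 = [float((i * i) % 7) for i in range(40)]
reference = naive(u0, 3)
blocked = temporally_blocked(u0, 3, 8)
```

Both functions perform the same arithmetic per output cell, so `blocked` equals `reference`; the blocked version trades redundant halo computation for fewer passes over main memory.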


Decompose & fuse array computations in space-time

manifest :: a[i]
b[i] = a[i-1] + a[i] + a[i+1]
manifest :: c[i] = b[i-1] * b[i] * b[i+1]
d[i] = c[i-1] + c[i] + c[i+1]
manifest :: e[i] = d[i-1] * d[i] * d[i+1]
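A small Python sketch of what fusing these array computations could look like. It assumes that `manifest` marks arrays that must be stored in memory, while unmarked arrays (`b` and `d` here) may be recomputed inline; that reading, and the function names `materialized` and `fused`, are assumptions made for illustration rather than a description of Formura's actual code generator.

```python
def materialized(a):
    """Store every intermediate array (b, c, d, e) in memory."""
    n = len(a)
    b = {i: a[i - 1] + a[i] + a[i + 1] for i in range(1, n - 1)}
    c = {i: b[i - 1] * b[i] * b[i + 1] for i in range(2, n - 2)}
    d = {i: c[i - 1] + c[i] + c[i + 1] for i in range(3, n - 3)}
    e = {i: d[i - 1] * d[i] * d[i + 1] for i in range(4, n - 4)}
    return e


def fused(a):
    """Store only c and e; b and d are fused into their consumers."""
    n = len(a)

    def b(i):  # not stored: recomputed wherever c needs it
        return a[i - 1] + a[i] + a[i + 1]

    c = {i: b(i - 1) * b(i) * b(i + 1) for i in range(2, n - 2)}

    def d(i):  # not stored: recomputed wherever e needs it
        return c[i - 1] + c[i] + c[i + 1]

    e = {i: d(i - 1) * d(i) * d(i + 1) for i in range(4, n - 4)}
    return e


a = list(range(1, 13))
```

`materialized(a)` and `fused(a)` return the same values for `e`; the fused version writes two arrays instead of four at the cost of recomputing `b` and `d`.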


In which language shall we code?


Paraiso: a DSL embedded in Haskell (Muranushi, 2012), among Nikola (Mainland & Morrisett, 2010), Obsidian (Svensson, 2011), Accelerate (Chakravarty et al., 2011), SPOC (Bourgoin et al., 2012), NOVA (Collins et al., 2014), and the LMS series (Rompf, 2012).


Paraiso: a bad sell


Our team


Formura : a standalone DSL


Design principle of Formura

Simple enough

Rich enough


Syntax of Formura

# dimension declaration
dimension :: 3
# array declaration
double [] :: vx, vy, vz
# array computation
A2[i,j,k] = A[i-1] + A[i+1]
# Tuple
v = (vx, vy, vz)
# Lambda expression
triple = fun(x) 3 * x


Tuples are functions

(a, b) 1 = b
(f, (h, p, c)) 1 2 = c
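The rule that a tuple applied to an integer selects a component can be mimicked in Python. This is a sketch only: the helper name `apply` is hypothetical, and it models Formura's juxtaposition-style application with an explicit function call.

```python
def apply(f, *args):
    """Apply f to the arguments one at a time.

    A tuple applied to an integer k yields its k-th component;
    anything else is called as an ordinary function.
    """
    for x in args:
        f = f[x] if isinstance(f, tuple) else f(x)
    return f


pair = ("a", "b")
nested = ("f", ("h", "p", "c"))
```

With these definitions, `apply(pair, 1)` gives `"b"` and `apply(nested, 1, 2)` gives `"c"`, matching the two equations on the slide.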


Inferred promotion to tuples and functions

x + (a, b) = (x+a, x+b)
(x, y) + (a, b) = (x+a, y+b)
(x, y) + (a, b, c) = ?
(f + g) x = f x + g x
(f + g + 1) x = f x + g x + 1

rk4 = fun(ddt) \
  fun(sys_0) let \
    sys_q4 = sys_0 + dt/4 * ddt(sys_0)
    sys_q3 = sys_0 + dt/3 * ddt(sys_q4)
    sys_q2 = sys_0 + dt/2 * ddt(sys_q3)
    sys_next = sys_0 + dt * ddt(sys_q2)
  in sys_next
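The promotion rules and the `rk4` definition can be imitated in Python to see why they make the integrator independent of the shape of the system state. This is a hedged sketch: `add` and `scale` are hypothetical helpers, `dt` is passed explicitly instead of being a global, and raising `TypeError` for mismatched tuple lengths is an assumption (the slide leaves that case as "?").

```python
import math


def add(x, y):
    """'+' with promotion to tuples and functions (illustrative only)."""
    if callable(x) or callable(y):
        # (f + g) x = f x + g x; a non-function operand acts as a constant
        fx = x if callable(x) else (lambda _: x)
        fy = y if callable(y) else (lambda _: y)
        return lambda t: add(fx(t), fy(t))
    if isinstance(x, tuple) and isinstance(y, tuple):
        if len(x) != len(y):
            raise TypeError("tuple lengths differ")  # assumption, see lead-in
        return tuple(add(p, q) for p, q in zip(x, y))
    if isinstance(x, tuple):
        return tuple(add(p, y) for p in x)
    if isinstance(y, tuple):
        return tuple(add(x, q) for q in y)
    return x + y


def scale(s, x):
    """Scalar * value, promoted component-wise over tuples."""
    return tuple(scale(s, p) for p in x) if isinstance(x, tuple) else s * x


def rk4(ddt, dt):
    """Same four stages as the slide's rk4, with dt as an explicit argument."""
    def advance(sys_0):
        sys_q4 = add(sys_0, scale(dt / 4, ddt(sys_0)))
        sys_q3 = add(sys_0, scale(dt / 3, ddt(sys_q4)))
        sys_q2 = add(sys_0, scale(dt / 2, ddt(sys_q3)))
        return add(sys_0, scale(dt, ddt(sys_q2)))
    return advance


# Harmonic oscillator: the state is a tuple (x, v) with dx/dt = v, dv/dt = -x.
oscillator = lambda s: (s[1], -s[0])
x1, v1 = rk4(oscillator, 0.1)((1.0, 0.0))
exact = (math.cos(0.1), -math.sin(0.1))
```

The same `rk4` works unchanged for a scalar state, a tuple of scalars, or nested tuples, because the arithmetic is promoted component-wise.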


Differentiation Operators

ddx = fun(a) (a[i+1/2,j,k] - a[i-1/2,j,k])/dx
ddy = fun(a) (a[i,j+1/2,k] - a[i,j-1/2,k])/dy
ddz = fun(a) (a[i,j,k+1/2] - a[i,j,k-1/2])/dz
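One consequence of the half-integer offsets is that applying a difference operator twice lands back on integer grid points. The Python sketch below illustrates this in 1D; the representation (a dictionary from grid offset to coefficient, in units of 1/dx per application) and the names `shift`, `sub`, and `a` are assumptions chosen for the illustration, not Formura internals.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def shift(expr, s):
    """Shift every grid offset by s; models indexing with [i + s]."""
    return {off + s: coeff for off, coeff in expr.items()}


def sub(p, q):
    """Term-wise p - q, dropping terms whose coefficient cancels to zero."""
    out = dict(p)
    for off, coeff in q.items():
        out[off] = out.get(off, 0) - coeff
    return {off: coeff for off, coeff in out.items() if coeff != 0}


def ddx(expr):
    """a[i+1/2] - a[i-1/2], with the division by dx left implicit."""
    return sub(shift(expr, HALF), shift(expr, -HALF))


a = {Fraction(0): 1}          # the array a sampled at offset 0, i.e. a[i]
first = ddx(a)                # offsets +1/2 and -1/2
second = ddx(ddx(a))          # offsets +1, 0, -1
```

Here `second` corresponds to a[i+1] - 2 a[i] + a[i-1], the familiar second-difference stencil, with an implicit factor of 1/dx².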


Nabla and Summation

∂ = (ddx, ddy, ddz)
Σ = fun(e) e 0 + e 1 + e 2


Evaluation of a Formura expression

Σ fun(i) ∂ i (ρ * v i)

Σ = fun(e) e 0 + e 1 + e 2

→ (fun(i) ∂ i (ρ * v i)) 0 + (fun(i) ∂ i (ρ * v i)) 1 + (fun(i) ∂ i (ρ * v i)) 2


(fun(i) ∂ i (ρ * v i)) 0

→ ∂ 0 (ρ * v 0)

→ ddx (ρ * vx)

∂ = (ddx, ddy, ddz)
v = (vx, vy, vz)
(a, b, c) 0 = a


ddx (ρ * vx)

ddx = fun(a) (a[i+1/2,j,k] - a[i-1/2,j,k])/dx

→ ((ρ * vx)[i+1/2,j,k] - (ρ * vx)[i-1/2,j,k])/dx

→ (ρ[i+1/2,j,k] * vx[i+1/2,j,k] - ρ[i-1/2,j,k] * vx[i-1/2,j,k])/dx


$$\sum_{i=1}^{3} \frac{\partial}{\partial x_i} (\rho v_i)$$

Σ fun(i) ∂ i (ρ * v i)

→ (ρ[i+1/2,j,k] * vx[i+1/2,j,k] - ρ[i-1/2,j,k] * vx[i-1/2,j,k])/dx
+ (ρ[i,j+1/2,k] * vy[i,j+1/2,k] - ρ[i,j-1/2,k] * vy[i,j-1/2,k])/dy
+ (ρ[i,j,k+1/2] * vz[i,j,k+1/2] - ρ[i,j,k-1/2] * vz[i,j,k-1/2])/dz
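The evaluation steps above (apply the summation to a function of the axis index, select tuple components, then expand each difference operator) can be reproduced with a toy Python expander that builds the resulting stencil expression as a string. It is a sketch under assumptions: the scalar field is named `rho` here, and `index`, `dd`, `partial`, and `sigma` are hypothetical helpers that only handle products of named arrays; Formura's real evaluator is not shown in the slides.

```python
AXES = ("x", "y", "z")


def index(name, axis, sign):
    """Render name[i,j,k] with a half-cell shift along one axis."""
    idx = ["i", "j", "k"]
    idx[axis] += f"{sign}1/2"
    return f"{name}[{','.join(idx)}]"


def dd(axis):
    """ddx, ddy or ddz, restricted to a product of named arrays."""
    def op(factors):
        plus = " * ".join(index(f, axis, "+") for f in factors)
        minus = " * ".join(index(f, axis, "-") for f in factors)
        return f"({plus} - {minus})/d{AXES[axis]}"
    return op


partial = (dd(0), dd(1), dd(2))   # plays the role of (ddx, ddy, ddz)
v = ("vx", "vy", "vz")            # plays the role of v = (vx, vy, vz)


def sigma(e):
    """Plays the role of: fun(e) e 0 + e 1 + e 2."""
    return " + ".join(e(i) for i in range(3))


# Corresponds to: sum over i of partial_i applied to (rho * v_i)
divergence = sigma(lambda i: partial[i](["rho", v[i]]))
terms = divergence.split(" + ")
```

Printing `divergence` gives the three-term expression from the slide, one term per axis, with `rho` standing in for the scalar field.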


More to talk about

Modular Reifiable Matching (MRM) (Oliveira et al., 2015) + pattern synonyms solve the “expression problem”

Details of code transformation paths

Varieties of temporal blocking methods

How we have given proofs for certain types of temporal blocking methods


Conclusion

Functional programming

is a good choice for user interface → weather scientists and astronomers can use it

is crucial in implementing all the program transformations → achieves high performance


1.184 Pflops

Formura


Bibliography

Bourgoin, M., Chailloux, E., & Lamotte, J.-L. 2012, Parallel Processing Letters, 22, 1240007

Chakravarty, M. M., Keller, G., Lee, S., McDonell, T. L., & Grover, V. 2011, in Proceedings of the sixth workshop on Declarative aspects of multicore programming, ACM, 3–14

Collins, A., Grewe, D., Grover, V., Lee, S., & Susnea, A. 2014, in Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, ACM, 8

Mainland, G., & Morrisett, G. 2010, in , ACM, 67–78

Oliveira, B. C. d. S., Mu, S.-C., & You, S.-H. 2015, in Proceedings of the 8th ACM SIGPLAN Symposium on Haskell, ACM, 82–93

Rompf, T. 2012, PhD thesis, École Polytechnique Fédérale de Lausanne

Svensson, J. 2011, PhD thesis, Chalmers University of Technology
