Automatic Generation of Efficient Codes from Mathematical Descriptions of Stencil Computation
Takayuki Muranushi (1), Seiya Nishizawa (1), Hirofumi Tomita (1), Keigo Nitadori (1), Masaki Iwasawa (1), Yutaka Maruyama (1), Hisashi Yashiro (1), Yoshifumi Nakamura (1), Hideyuki Hotta (2), Junichiro Makino (3), Natsuki Hosono (4), Hikaru Inoue (5)
(1) RIKEN Advanced Institute for Computational Science, (2) Chiba University, (3) Kobe University, (4) Kyoto University, (5) Fujitsu Ltd.
Sep 22, 2016 for FHPC 2016 workshop / ICFP’16 Nara, Japan T. Muranushi et al. (RIKEN AICS)
Formura
Sep 22, 2016
1 / 37
Programming language Formura
Domain-specific language for stencil computation
Good news of Formura 1/2
1.184 Petaflops (11.62% of the peak) on 663,552 cores
ACM Gordon Bell Prize Finalist
Good news of Formura 2/2
$\frac{\partial \rho}{\partial t} = -\sum_{i=1}^{3} \frac{\partial}{\partial x_i} (\rho v_i)$
ddt_ρ = - Σ fun(i) ∂ i (ρ * v i)
Formura
is a functional programming language
is implemented in a functional programming language (Haskell)
Backend: How we generate efficient codes
Stencil Computation
The bytes-per-flop ratio (B/F) of hardware is decreasing
Naive implementation of stencil computation
The optimal $\frac{B}{F} = \frac{2 H_e}{C_e}$
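As a rough illustration of the naive case, the sketch below sweeps a 1D three-point stencil once per time step, so every element is loaded from and stored to main memory at each step. It assumes that H_e means bytes per grid element and C_e means flops per element update (the slide does not define the symbols). The stencil, the function names `naive_step` and `naive_bytes_per_flop`, and the numbers are illustrative and are not Formura output.

```python
def naive_step(a):
    """One sweep of a 3-point averaging stencil with periodic boundaries."""
    n = len(a)
    return [(a[(i - 1) % n] + a[i] + a[(i + 1) % n]) / 3.0 for i in range(n)]


def naive_bytes_per_flop(h_e, c_e):
    """B/F when each element is loaded once and stored once per update."""
    return 2.0 * h_e / c_e


field = [0.0, 3.0, 0.0, 0.0]
after_one_step = naive_step(field)

# Assumed numbers: 8-byte doubles, 3 flops (two adds, one divide) per update.
required_bf = naive_bytes_per_flop(h_e=8, c_e=3)
```

With these assumed numbers the stencil needs more than 5 bytes per flop, far above what the hardware trend on the previous slide can supply.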
Temporal Blocking
The optimal $\frac{B}{F} = \frac{2 H_e}{C_e} \left( \frac{1}{N_T} + \frac{2 d N_s}{N_F} \right)$
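A minimal sketch of one temporal blocking variant (overlapped tiling with halos) on a 1D three-point stencil. Each block is loaded once together with a halo, advanced several time steps locally, and stored once. This is an assumption-laden illustration: the names `step`, `naive`, and `temporally_blocked` are hypothetical, and the variant shown is not necessarily the one Formura generates.

```python
def step(a):
    """One 3-point stencil sweep on a list with periodic boundaries."""
    n = len(a)
    return [(a[(i - 1) % n] + a[i] + a[(i + 1) % n]) / 3.0 for i in range(n)]


def naive(a, n_t):
    """Advance n_t steps, sweeping the whole array at every step."""
    for _ in range(n_t):
        a = step(a)
    return a


def temporally_blocked(a, n_t, block):
    """Advance n_t steps block by block using overlapped (halo) tiles."""
    n = len(a)
    out = [0.0] * n
    for start in range(0, n, block):
        # Load the block plus a halo of n_t cells per side (stencil radius 1).
        lo = start - n_t
        hi = min(start + block, n) + n_t
        tile = [a[i % n] for i in range(lo, hi)]
        # Each local step consumes one cell at each edge of the tile.
        for _ in range(n_t):
            tile = [(tile[j - 1] + tile[j] + tile[j + 1]) / 3.0
                    for j in range(1, len(tile) - 1)]
        # What remains is exactly the block, advanced by n_t steps.
        out[start:start + len(tile)] = tile
    return out
```

The blocked version touches main memory once per block per `n_t` steps, at the cost of redundant computation in the halo.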
Decompose & fuse array computations in space-time
manifest :: a[i]
            b[i] = a[i-1] + a[i] + a[i+1]
manifest :: c[i] = b[i-1] * b[i] * b[i+1]
            d[i] = c[i-1] + c[i] + c[i+1]
manifest :: e[i] = d[i-1] * d[i] * d[i+1]
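One way to read the manifest annotation is that only manifest arrays are stored in memory, while the others are fused into their consumers. The sketch below compares the two strategies for the first two equations, assuming periodic boundaries. The function names are hypothetical and the code only illustrates the idea, not the Formura backend.

```python
def manifest_all(a):
    """Store every intermediate: b is written to memory, then read back."""
    n = len(a)
    b = [a[(i - 1) % n] + a[i] + a[(i + 1) % n] for i in range(n)]
    c = [b[(i - 1) % n] * b[i] * b[(i + 1) % n] for i in range(n)]
    return c


def fused(a):
    """Keep only a and c in memory; b is recomputed where it is needed."""
    n = len(a)

    def b(i):
        return a[(i - 1) % n] + a[i % n] + a[(i + 1) % n]

    return [b(i - 1) * b(i) * b(i + 1) for i in range(n)]
```

Fusion trades extra arithmetic for fewer memory accesses, which pays off when the bytes-per-flop budget is the bottleneck.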
In which language shall we code?
Paraiso: a DSL embedded in Haskell (Muranushi, 2012), among Nikola (Mainland & Morrisett, 2010), Obsidian (Svensson, 2011), Accelerate (Chakravarty et al., 2011), SPOC (Bourgoin et al., 2012), NOVA (Collins et al., 2014), and the LMS series (Rompf, 2012).
Paraiso: a bad sell
Our team
Formura : a standalone DSL
Design principle of Formura
Simple enough
Rich enough
Syntax of Formura
# dimension declaration
dimension :: 3
# array declaration
double [] :: vx, vy, vz
# array computation
A2[i,j,k] = A[i-1] + A[i+1]
# Tuple
v = (vx, vy, vz)
# Lambda expression
triple = fun(x) 3 * x
Tuples are functions
(a, b) 1 = b
(f, (h, p, c)) 1 2 = c
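The rule can be mimicked in Python by treating application of a tuple to an integer as indexing, applied one argument at a time. The helper name `apply` is hypothetical and the strings stand in for arbitrary Formura values. This is only a sketch of the semantics shown on the slide.

```python
def apply(value, *indices):
    """Apply a (possibly nested) tuple to indices, one index at a time."""
    for index in indices:
        value = value[index]
    return value


pair = ("a", "b")
nested = ("f", ("h", "p", "c"))

second = apply(pair, 1)          # (a, b) 1
innermost = apply(nested, 1, 2)  # (f, (h, p, c)) 1 2
```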
Inferred promotion to tuples and functions
x + (a, b) = (x+a, x+b)
(x, y) + (a, b) = (x+a, y+b)
(x, y) + (a, b, c) = ?
(f + g) x = f x + g x
(f + g + 1) x = f x + g x + 1
rk4 = fun(ddt) \
  fun(sys_0) let \
    sys_q4   = sys_0 + dt/4 * ddt(sys_0)
    sys_q3   = sys_0 + dt/3 * ddt(sys_q4)
    sys_q2   = sys_0 + dt/2 * ddt(sys_q3)
    sys_next = sys_0 + dt   * ddt(sys_q2)
  in sys_next
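The promotion rules for addition can be sketched in Python as a recursive function. The name `add` is hypothetical, and raising an error for tuples of different lengths is an assumption of this sketch, since the slide leaves that case as an open question.

```python
def add(x, y):
    """Add two values, promoting scalars over tuples and functions."""
    if callable(x) or callable(y):
        # (f + g) x = f x + g x; a non-function operand acts as a constant.
        fx = x if callable(x) else (lambda _arg: x)
        fy = y if callable(y) else (lambda _arg: y)
        return lambda arg: add(fx(arg), fy(arg))
    if isinstance(x, tuple) and isinstance(y, tuple):
        if len(x) != len(y):
            # Assumption: mismatched lengths are rejected.
            raise TypeError("tuple lengths differ")
        return tuple(add(p, q) for p, q in zip(x, y))
    if isinstance(x, tuple):
        return tuple(add(p, y) for p in x)
    if isinstance(y, tuple):
        return tuple(add(x, q) for q in y)
    return x + y


def f(x):
    return 2 * x


def g(x):
    return x * x


h = add(add(f, g), 1)  # (f + g + 1)
```

Because the same `add` works on scalars, tuples, and functions, an integrator such as rk4 can be written once and reused for any system state.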
Differentiation Operators
ddx = fun(a) (a[i+1/2,j,k] - a[i-1/2,j,k])/dx
ddy = fun(a) (a[i,j+1/2,k] - a[i,j-1/2,k])/dy
ddz = fun(a) (a[i,j,k+1/2] - a[i,j,k-1/2])/dz
Nabla and Summation
∂ = (ddx, ddy, ddz)
Σ = fun(e) e 0 + e 1 + e 2
Evaluation of formura expression
Σ fun(i) ∂ i (ρ * v i)
Σ = fun(e) e 0 + e 1 + e 2
→ (fun(i) ∂ i (ρ * v i)) 0 + (fun(i) ∂ i (ρ * v i)) 1 + (fun(i) ∂ i (ρ * v i)) 2
(fun(i) ∂ i (ρ * v i)) 0
→ ∂ 0 (ρ * v 0)
∂ = (ddx, ddy, ddz)
v = (vx, vy, vz)
(a, b, c) 0 = a
→ ddx (ρ * vx)
ddx = fun(a) (a[i+1/2,j,k] - a[i-1/2,j,k])/dx
→ ((ρ * vx)[i+1/2,j,k] - (ρ * vx)[i-1/2,j,k])/dx
→ (ρ[i+1/2,j,k] * vx[i+1/2,j,k] - ρ[i-1/2,j,k] * vx[i-1/2,j,k])/dx
$\sum_{i=1}^{3} \frac{\partial}{\partial x_i} (\rho v_i)$
Σ fun(i) ∂ i (ρ * v i)
→ (ρ[i+1/2,j,k] * vx[i+1/2,j,k] - ρ[i-1/2,j,k] * vx[i-1/2,j,k])/dx
+ (ρ[i,j+1/2,k] * vy[i,j+1/2,k] - ρ[i,j-1/2,k] * vy[i,j-1/2,k])/dy
+ (ρ[i,j,k+1/2] * vz[i,j,k+1/2] - ρ[i,j,k-1/2] * vz[i,j,k-1/2])/dz
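The evaluation steps above can be imitated in Python by modelling grid fields as functions of (possibly half-integer) indices. This is a sketch under that assumption, not how Formura represents arrays. The names `centered_difference`, `sigma`, `times`, and `divergence` are hypothetical, and the grid spacings are set to 1 for illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
dx, dy, dz = Fraction(1), Fraction(1), Fraction(1)  # assumed grid spacings


def centered_difference(axis, h):
    """Build ddx, ddy or ddz: (a[.. + 1/2 ..] - a[.. - 1/2 ..]) / h."""
    def diff(a):
        def at(*idx):
            plus = list(idx)
            minus = list(idx)
            plus[axis] += HALF
            minus[axis] -= HALF
            return (a(*plus) - a(*minus)) / h
        return at
    return diff


ddx = centered_difference(0, dx)
ddy = centered_difference(1, dy)
ddz = centered_difference(2, dz)
partial = (ddx, ddy, ddz)  # corresponds to: ∂ = (ddx, ddy, ddz)


def sigma(e):
    """Corresponds to: Σ = fun(e) e 0 + e 1 + e 2 (summed pointwise)."""
    terms = [e(0), e(1), e(2)]
    return lambda *idx: sum(t(*idx) for t in terms)


def times(a, b):
    """Pointwise product of two grid fields."""
    return lambda *idx: a(*idx) * b(*idx)


def divergence(rho, v):
    """Corresponds to: Σ fun(i) ∂ i (ρ * v i)."""
    return sigma(lambda i: partial[i](times(rho, v[i])))


# Example fields: constant density and a simple polynomial velocity.
rho = lambda i, j, k: 2
v = (lambda i, j, k: i, lambda i, j, k: 3 * j, lambda i, j, k: k * k)
value = divergence(rho, v)(1, 2, 3)
```

For these fields the three terms are 2, 6, and 12, so `value` equals 20.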
More to talk about
Modular Reifiable Matching (MRM) (Oliveira et al., 2015) + pattern synonyms solve the “expression problem”
Details of code transformation paths
Varieties of temporal blocking methods
How we gave proofs for certain types of temporal blocking methods
Conclusion
Functional programming
is a good choice for the user interface → weather scientists and astronomers can use it
is crucial in implementing all the program transformations → achieves high performance
1.184 Pflops Formura
Bibliography
Bourgoin, M., Chailloux, E., & Lamotte, J.-L. 2012, Parallel Processing Letters, 22, 1240007
Chakravarty, M. M., Keller, G., Lee, S., McDonell, T. L., & Grover, V. 2011, in Proceedings of the sixth workshop on Declarative aspects of multicore programming, ACM, 3–14
Collins, A., Grewe, D., Grover, V., Lee, S., & Susnea, A. 2014, in Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, ACM, 8
Mainland, G., & Morrisett, G. 2010, in , ACM, 67–78
Oliveira, B. C. d. S., Mu, S.-C., & You, S.-H. 2015, in Proceedings of the 8th ACM SIGPLAN Symposium on Haskell, ACM, 82–93
Rompf, T. 2012, PhD thesis, École Polytechnique Fédérale de Lausanne
Svensson, J. 2011, PhD thesis, Chalmers University of Technology