CP2K Open Source Molecular Dynamics

This document is intended to be a recipe for building and running the Intel branch of CP2K, which uses the Intel Development Tools and the Intel runtime environment. Differences compared to CP2K/trunk may be incorporated into the mainline version of CP2K at any time (and subsequently released). For example, starting with CP2K 3.0, a LIBXSMM integration is available which can optionally substitute CP2K’s “libsmm” library. Some additional reference can be found under https://groups.google.com/d/msg/cp2k/xgkJc59NKGw/U5v5FtzTBwAJ.

Getting the Source Code

The source code is hosted at GitHub and is supposed to represent the master version of CP2K in a timely fashion. CP2K’s main repository is hosted at SourceForge but automatically mirrored at GitHub. The LIBXSMM library can be found under https://github.com/hfp/libxsmm.

Build Instructions

In order to build CP2K/intel from source, one may rely on Intel Compiler 16 or 17 series (the 2018 version may be supported at a later point in time). For the Intel Compiler 2017 prior to Update 4, one should source the compiler followed by sourcing a specific version of Intel MKL (to avoid an issue in Intel MKL):

    source /opt/intel/compilers_and_libraries_2017.3.191/linux/bin/compilervars.sh intel64
    source /opt/intel/compilers_and_libraries_2017.0.098/linux/mkl/bin/mklvars.sh intel64

Since Update 4 of the Compiler and Libraries 2017 suite, one can source the compiler and libraries as shown below:

    source /opt/intel/compilers_and_libraries_2017.4.196/linux/bin/compilervars.sh intel64

When CP2K/intel is used, LIBXSMM is built as part of the CP2K build in an out-of-tree fashion (the LIBXSMMROOT path needs to be detected or supplied). A recipe targeting “Haswell” (HSW) may look as follows:

    git clone https://github.com/hfp/libxsmm.git
    git clone --branch intel https://github.com/cp2k/cp2k.git cp2k.git
    ln -s cp2k.git/cp2k cp2k
    cd cp2k/makefiles
    make ARCH=Linux-x86-64-intel VERSION=psmp AVX=2

To target for instance “Knights Landing” (KNL), use “AVX=3 MIC=1” instead of “AVX=2”. Since CP2K 3.0, the mainline version (non-Intel branch) supports LIBXSMM as well. If a custom ARCH file is used or prepared, the LIBXSMM library needs to be built separately, and one may follow the official guide. Building LIBXSMM is rather simple (instead of the master revision, an official release can be used as well):

    git clone https://github.com/hfp/libxsmm.git
    cd libxsmm; make
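The target selection described above (AVX=2 for Haswell, AVX=3 MIC=1 for Knights Landing) could be wrapped in a small helper. The following is only a sketch: the function names cp2k_target_keys and cp2k_make_cmd, as well as the short target names hsw and knl, are assumptions made for illustration and are not part of CP2K or its ARCH files. The helper prints the make command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of CP2K): map a short target name to the
# make keys quoted in the text above.
cp2k_target_keys() {
  case "$1" in
    hsw) echo "AVX=2" ;;         # "Haswell"
    knl) echo "AVX=3 MIC=1" ;;   # "Knights Landing"
    *)   echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Print (rather than execute) the resulting build command for inspection.
cp2k_make_cmd() {
  keys=$(cp2k_target_keys "$1") || return 1
  echo "make ARCH=Linux-x86-64-intel VERSION=psmp $keys"
}

cp2k_make_cmd hsw
cp2k_make_cmd knl
```

Printing the command first makes it easy to review the keys before starting a potentially long build from cp2k/makefiles.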

To download and build an official CP2K release, one can still use the ARCH files that are part of the CP2K/intel branch. In this case, LIBXSMM is also built implicitly.

    git clone https://github.com/hfp/libxsmm.git
    wget http://downloads.sourceforge.net/project/cp2k/cp2k-4.1.tar.bz2
    tar xvf cp2k-4.1.tar.bz2
    cd cp2k-4.1/arch
    wget https://github.com/cp2k/cp2k/raw/intel/cp2k/arch/Linux-x86-64-intel.x
    wget https://github.com/cp2k/cp2k/raw/intel/cp2k/arch/Linux-x86-64-intel.popt
    wget https://github.com/cp2k/cp2k/raw/intel/cp2k/arch/Linux-x86-64-intel.psmp
    wget https://github.com/cp2k/cp2k/raw/intel/cp2k/arch/Linux-x86-64-intel.sopt
    wget https://github.com/cp2k/cp2k/raw/intel/cp2k/arch/Linux-x86-64-intel.ssmp
    cd ../makefiles
    source /opt/intel/compilers_and_libraries_2017.4.196/linux/bin/compilervars.sh intel64
    make ARCH=Linux-x86-64-intel VERSION=psmp AVX=2

For Intel MPI, usually any version is fine. For product suites, the compiler and the MPI library are sourced in one step. To work around known issues, one may combine components from different suites. To further improve performance and versatility, one may supply LIBINTROOT, LIBXCROOT, and ELPAROOT when relying on CP2K/intel’s ARCH files (see later sections about these libraries). To further adjust CP2K at build time of the application, additional key-value pairs can be passed at make’s command line (like ARCH=Linux-x86-64-intel or VERSION=psmp).

• SYM: set SYM=1 to include debug symbols into the executable, e.g., helpful with performance profiling.
• DBG: set DBG=1 to include debug symbols, and to generate non-optimized code.
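As an illustration of passing such key-value pairs, the sketch below assembles a make command line and appends LIBINTROOT, LIBXCROOT, and SYM=1 only when the corresponding shell variables are set. The function name cp2k_build_cmd is hypothetical, and the paths in the usage line are placeholders; only the keys themselves are taken from the text above.

```shell
#!/bin/sh
# Hypothetical helper: build the make command line from optional settings.
# Only keys mentioned in the text are used; paths are supplied by the caller.
cp2k_build_cmd() {
  cmd="make ARCH=Linux-x86-64-intel VERSION=psmp"
  for key in LIBINTROOT LIBXCROOT; do
    eval "val=\${$key:-}"          # read the variable named by $key
    if [ -n "$val" ]; then
      cmd="$cmd $key=$val"
    fi
  done
  if [ "${SYM:-0}" = "1" ]; then   # debug symbols, e.g., for profiling
    cmd="$cmd SYM=1"
  fi
  echo "$cmd"
}

# Usage (placeholder paths): prints the command instead of running it.
LIBINTROOT=/path/to/libint LIBXCROOT=/path/to/libxc SYM=1 cp2k_build_cmd
```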

Running the Application

Running the application may go beyond a single node; however, for a first example, the pinning scheme and thread affinization are introduced. As a rule of thumb, a high rank-count for single-node computation (perhaps according to the number of physical CPU cores) may be preferred. In contrast, multi-node computations tend to be communication bound, so a lower rank-count may be desired. In general, CP2K prefers the total rank-count to be a square number (two-dimensional communication pattern) rather than a Power-of-Two (POT) number. When running an MPI/OpenMP-hybrid application, an MPI rank-count that is half the number of cores might be a good starting point (the command below could be for an HT-enabled dual-socket system with 16 cores per processor and 64 hardware threads).

    mpirun -np 16 \
      -genv I_MPI_PIN_DOMAIN=auto \
      -genv KMP_AFFINITY=scatter,granularity=fine,1 \
      -genv OMP_NUM_THREADS=4 \
      cp2k/exe/Linux-x86-64-intel/cp2k.psmp workload.inp
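The rule of thumb above (half as many ranks as cores, with the remaining hardware threads given to OpenMP, and a preference for square rank-counts) can be written down as simple arithmetic. This is a minimal sketch under those stated assumptions; the function names cp2k_hybrid_layout and largest_square_le are hypothetical, and the result is a starting point for experiments rather than a recommendation from CP2K.

```shell
#!/bin/sh
# Hypothetical helper: ranks = cores / 2, threads per rank chosen so that
# ranks * threads covers all hardware threads of the node.
cp2k_hybrid_layout() {
  cores=$1              # physical cores in the node
  threads_per_core=$2   # 2 with Hyper-Threading enabled, otherwise 1
  ranks=$((cores / 2))
  if [ "$ranks" -lt 1 ]; then ranks=1; fi
  omp=$((cores * threads_per_core / ranks))
  echo "$ranks $omp"
}

# Largest square number not exceeding n (expects n >= 1), useful when a
# square total rank-count is preferred.
largest_square_le() {
  n=$1
  r=1
  while [ $(( (r + 1) * (r + 1) )) -le "$n" ]; do
    r=$((r + 1))
  done
  echo $((r * r))
}

cp2k_hybrid_layout 32 2
largest_square_le 20
```

For the dual-socket system described above (32 cores, two hardware threads per core), the helper yields 16 ranks with 4 threads each, which matches the mpirun command, and 16 is already a square number.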

For an actual workload, one may try cp2k/tests/QS/benchmark/H2O-32.inp, or for example the workloads under cp2k/tests/QS/benchmark_single_node, which are supposed to fit into a single node (in fact, to fit into 16 GB of memory). For the latter set of workloads (and many others), LIBINT and LIBXC may be required. The CP2K/intel branch carries several “reconfigurations” and environment variables, which allow adjusting important runtime options. Most of these options are also accessible via the input file format (input reference e.g., http://manual.cp2k.org/trunk/CP2K_INPUT/GLOBAL/DBCSR.html).

• CP2K_RECONFIGURE: environment variable for reconfiguring CP2K (the default depends on whether the ACCeleration layer is enabled or not). With the ACCeleration layer enabled, CP2K is reconfigured (as if CP2K_RECONFIGURE=1 is set), e.g., an increased number of entries per matrix stack is populated; otherwise CP2K is not reconfigured. Further, setting CP2K_RECONFIGURE=0 disables the code specific to the Intel branch of CP2K, and relies on the (optional) LIBXSMM integration into CP2K 3.0 (and later).
• CP2K_STACKSIZE: environment variable which denotes the number of matrix multiplications collected into a single stack. Usually the internal default performs best across a variety of workloads; however, depending on the workload, a different value can be better. This variable is relatively impactful since the work distribution and balance are affected.
• CP2K_HUGEPAGES: environment variable for disabling (0) huge page based memory allocation, which is enabled by default (if TBBROOT was present at build-time of the application).
• CP2K_RMA: enables (1) an experimental Remote Memory Access (RMA) based multiplication algorithm (requires MPI3).
• CP2K_SORT: enables (1) an indirect sorting of each multiplication stack according to the C-index (experimental).
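Since CP2K_STACKSIZE is described above as workload dependent, one way to explore it is to launch the same input with a few candidate values. The sketch below only prints one launch line per value (a dry run). The function name cp2k_stacksize_sweep is hypothetical, and the candidate values in the usage line are arbitrary placeholders, not recommended settings or the internal default.

```shell
#!/bin/sh
# Hypothetical dry run: print one mpirun line per candidate stack size.
# The launch options mirror the earlier example command.
cp2k_stacksize_sweep() {
  for size in "$@"; do
    echo "mpirun -np 16 -genv CP2K_STACKSIZE=$size -genv OMP_NUM_THREADS=4 cp2k/exe/Linux-x86-64-intel/cp2k.psmp workload.inp"
  done
}

# Placeholder values chosen only for illustration.
cp2k_stacksize_sweep 1000 2000 4000
```

The printed lines can be reviewed and then pasted into a job script, so that timings for each value can be compared on the actual workload.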

LIBINT and LIBXC Dependencies

Please refer to the XCONFIGURE project (https://github.com/hfp/xconfigure), which helps to configure common HPC software for Intel software development tools. The XCONFIGURE project provides recipes for LIBINT, LIBXC, and ELPA. To configure, build, and install LIBINT (versions 1.1.5 and 1.1.6 have been tested), one may proceed as shown below (please note there is no easy way to cross-build the library for an instruction set extension which is not supported by the compiler host). To incorporate LIBINT, the key LIBINTROOT=/path/to/libint needs to be supplied when using CP2K/intel’s ARCH files (make).

    env \
      AR=xiar CC=icc CXX=icpc \
      ./configure \
      --with-cxx-optflags="-O2 -xCORE-AVX2" \
      --with-cc-optflags="-O2 -xCORE-AVX2" \
      --with-libderiv-max-am1=4 \
      --with-libint-max-am=5 \
      --prefix=$HOME/libint/default-hsw
    make
    make install
    make realclean

To configure, build, and install LIBXC (version 3.0.0 has been tested), one may proceed as shown below. To make use of LIBXC, the key LIBXCROOT=/path/to/libxc needs to be supplied when using CP2K/intel’s ARCH files (make).

    env \
      AR=xiar F77=ifort F90=ifort FC=ifort CC=icc \
      FCFLAGS="-O2 -xCORE-AVX2" \
      CFLAGS="-O2 -xCORE-AVX2" \
      ./configure \
      --prefix=$HOME/libxc/default-hsw
    make
    make install
    make clean

If the library needs to be cross-compiled, one may add --host=x86_64-unknown-linux-gnu to the command line arguments of the configure script.

Tuning

Eigenvalue SoLvers for Petaflop-Applications (ELPA)

Please refer to the XCONFIGURE project (https://github.com/hfp/xconfigure), which helps to configure common HPC software (and ELPA in particular) for Intel software development tools. To incorporate ELPA, the key ELPAROOT=/path/to/elpa needs to be supplied when using CP2K/intel’s ARCH files (make). For the Intel-branch, ELPA-2017.05.001 is already supported:

    make ARCH=Linux-x86-64-intel VERSION=psmp ELPA=201705 ELPAROOT=/path/to/elpa/default-arch

At runtime, a build of the Intel-branch supports an environment variable CP2K_ELPA:

• CP2K_ELPA=-1: requests ELPA to be enabled; the actual kernel type depends on the ELPA configuration.
• CP2K_ELPA=0: ELPA is not enabled by default (only on request via input file); same as non-Intel branch.
• CP2K_ELPA=: requests ELPA-kernel according to CPUID (default with CP2K/Intel-branch).

Memory Allocation Wrapper

Dynamic allocation of heap memory usually requires global bookkeeping, which eventually incurs overhead in shared-memory parallel regions of an application. For this case, specialized allocation strategies are available. To use the malloc-proxy of Intel Threading Building Blocks (Intel TBB), use the TBBMALLOC=1 key-value pair at build time of CP2K. Usually, Intel TBB is already available after sourcing the Intel development tools (see the TBBROOT environment variable). To use TCMALLOC as an alternative, set TCMALLOCROOT at build time of CP2K by pointing to TCMALLOC’s installation path (configured per ./configure --enable-minimal --prefix=).
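The choice between the two allocation wrappers could be scripted along the following lines. This is a sketch only: the function name cp2k_malloc_key is hypothetical, and giving TCMALLOCROOT precedence over TBBROOT when both are set is an assumption made for this example, not documented behavior of CP2K/intel’s ARCH files.

```shell
#!/bin/sh
# Hypothetical helper: pick the allocator-related make key from the
# environment. Precedence (TCMALLOC first) is an assumption of this sketch.
cp2k_malloc_key() {
  if [ -n "${TCMALLOCROOT:-}" ]; then
    echo "TCMALLOCROOT=$TCMALLOCROOT"   # TCMALLOC installation path
  elif [ -n "${TBBROOT:-}" ]; then
    echo "TBBMALLOC=1"                  # malloc-proxy of Intel TBB
  fi
}

# Print the resulting build command (empty key if neither is available).
echo "make ARCH=Linux-x86-64-intel VERSION=psmp $(cp2k_malloc_key)"
```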
