Reproducible, relocatable, customizable builds and packaging with HashDist
Aron Ahmadia and the HashDist Developers
SciPy, 8 July 2014

Don’t fail at packaging! From: http://www.cartridgesave.co.uk/news/10-absolutely-pathetic-packaging-fails/

Why Reproducible Software?

“Generally, data and code [are] not made available at the time of publication, insufficient information [is] captured in the publication for verification, replication of results.”

Professor Victoria Stodden, Columbia University

Computational science is at a crisis of credibility.

Is Code Sufficient?

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.” David Donoho, 1998.

Thirty years ago

“Here’s my .f77 file! I hope you’ve got a computer with 1 MB of RAM…”

Ten years ago

“Here’s a bunch of source code and a Makefile, don’t worry I just included all of the dependencies in my code…”

Today

“These results rely on SciPy, matplotlib, IPython, pandas, sympy, NumPy version 1.7.1, a patched version of PETSc, the system’s Accelerate framework, and the latest commit on this branch on my GitHub repository.”

Why Relocatable Builds?

Binary installers are fast!

From: Simpsons “Saturdays of Thunder”

What do binaries have to do with relocatability?
• Dynamic libraries and scripts hard-code paths to their dependencies at build-time
• This doesn’t matter for system builds, since everything goes into the same place; e.g. /usr/lib
• But it does matter if you want to install to arbitrary locations, such as a user’s home directory
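
One common way around hard-coded absolute paths on Linux is to record library search paths relative to the binary itself, using the dynamic linker's `$ORIGIN` token (macOS has a similar `@loader_path`). The sketch below only computes such an entry; the function name `relative_rpath` and the example install prefixes are hypothetical, and nothing here is taken from HashDist's own implementation.

```python
import posixpath


def relative_rpath(binary_path, lib_dir):
    """Return an $ORIGIN-relative search path from a binary to a library directory.

    Hypothetical helper for illustration only. Both arguments are expected
    to be absolute POSIX paths.
    """
    binary_dir = posixpath.dirname(binary_path)
    rel = posixpath.relpath(lib_dir, start=binary_dir)
    return "$ORIGIN" if rel == "." else "$ORIGIN/" + rel


# The same layout under two different install prefixes yields the same entry,
# so the build can be moved without rewriting it.
system_entry = relative_rpath("/usr/local/stack/bin/python", "/usr/local/stack/lib")
home_entry = relative_rpath("/home/alice/stack/bin/python", "/home/alice/stack/lib")
print(system_entry, home_entry)
```

Because the entry depends only on the layout inside the install tree, it stays valid when the whole tree is copied to another prefix, which an absolute path such as /usr/local/stack/lib would not.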

Why Customizable Builds?

I just want to pip install scipy
• Do you want a released version of SciPy, or a commit from somebody’s branch?
• Are you using a system LAPACK? Would you like to install one? What about BLAS? Where’s your Fortran compiler?

The situation is even more complicated for scientific Python packages that leverage C and Fortran extension modules and libraries

Modern Open Computational Science Relies on Complex Software Stacks

“We are building the car, not reinventing the wheel” William Stein, Sage Project

I’ll just pay somebody to do it…
• First-Class Commercial Software (e.g. MATLAB)
  • Regular annual releases, almost never removes or changes functionality
• Worst-Class Commercial Software
  • Random updates
  • Features removed or changed drastically

Software wants to be free!
• Free tools are terrible for reproducibility
  • Inconsistent builds
  • Versions change drastically
  • Features come and go with the whims of developers
• Packaging source-based scientific software is a full-time job at many organizations

Well, I’ll use a package manager
• Door A: System package managers (Homebrew)
  • Only support one platform, usually require root
• Door B: Language package managers (PyPI)
  • Only support one language, don’t handle binaries well
• Door C: Conda
  • Great solution based on binary distributions, but it does not integrate with the system or support parametrized builds as well

Well, I better build a packaging system…
• VisIt - 15.2K line Bash Script
• PETSc BuildSystem - 31.6K Python Package
• Dorsal - 7.3K Bash Script
• HashDist - 13.8K Python Package

What is HashDist?
• HashDist - A user-local tool for developing, managing, versioning, and deploying software builds
• HashStack - A collection of package specifications for building software using HashDist

Portability

There are a number of paths to achieve portability of complex scientific software, but there are no magic bullets. Our approach:
• Relocatable, user-local builds
• Platform- and parameter-driven customization
• Cygwin for portability on Windows
• Don't rely on the system stack!

Sharing and Reuse

HashDist promotes the sharing of scientific software packages with others, and the reuse of other software packages.
• Source and build caches are automatic
• Reuse builds from other stacks and profiles on your machine
• Share the recipes, source, and builds of your complete software environment with others

Reproducibility

When your software stack is built by HashDist, HashDist versions your builds like a revision control system.
• Unique hashes track all components and subcomponents of each stack
• Switching between build versions is a single command
• Profile and package specifications can be tracked in any version control system

Be confident that you can reproduce your code, and that others can reproduce it as well.
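
The idea that unique hashes track a component and all of its subcomponents can be illustrated with a small sketch: a package's hash is computed from its own specification plus the hashes of its dependencies, so a change anywhere below it changes its hash too. This is an assumed, simplified scheme for illustration; the function `build_hash` and the spec fields are hypothetical and are not HashDist's actual hashing algorithm or file format.

```python
import hashlib
import json


def build_hash(name, spec, dep_hashes):
    """Hash a package spec together with the hashes of its dependencies.

    Hypothetical helper: `spec` is any JSON-serializable dict of build
    parameters, `dep_hashes` is a list of hashes returned by earlier calls.
    """
    payload = json.dumps(
        {"name": name, "spec": spec, "deps": sorted(dep_hashes)},
        sort_keys=True,  # stable key order so equal specs give equal hashes
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two stacks that differ only in how a dependency was built
blas = build_hash("blas", {"version": "1.0"}, [])
blas_patched = build_hash("blas", {"version": "1.0", "patched": True}, [])

numpy_a = build_hash("numpy", {"version": "1.7.1"}, [blas])
numpy_b = build_hash("numpy", {"version": "1.7.1"}, [blas_patched])

print(numpy_a != numpy_b)  # the change in the dependency propagates upward
```

Under this scheme an identical spec with identical dependencies always maps to the same hash, which is what would let a build be looked up in a cache and reused instead of rebuilt.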

Flexibility and Extensibility

By default, HashDist does its best to work everywhere. As a consequence, it will build packages instead of reusing them from the system. However, HashDist is not dogmatic about dependencies. It is very easy to satisfy package dependencies using the system or a tool like Homebrew. Additionally, HashDist encourages the extension and customization of builds with a flexible parametrization system.

Automation

Installing scientific software shouldn't be hard. Build it Once. Use it Everywhere.

Contributors
• Aron Ahmadia
• Fernando Perez
• Volker Braun
• Johannes Ring
• Ondřej Čertík
• Dag Sverre Seljebotn
• Ilya
• Jimmy Tang
• Lea Jenkins
• Andy Terrel
• Chris Kees
• Lane Wittgen

Check it out
Website: https://hashdist.github.io/
Repository: github.com/hashdist/hashdist
Mailing List: https://groups.google.com/d/forum/hashdist
