Variant Pipeline Tools reborn as Script of Scripts

Bo Peng, Ph.D.
Department of Bioinformatics and Computational Biology
The University of Texas MD Anderson Cancer Center
Mar. 30th, 2016. Updated on Apr. 7th for SoS release 0.5.7

Why workflow management?
Bioinformatics analyses usually involve the application of various tools and scripts. Workflow management systems handle
•  Creation and management of workflows (viewer, editor, GUI, repository, …)
•  Execution of workflows under different environments (cluster, docker, cloud, …)
•  Execution management (resource management, suspension, resume, …)
•  Logging and data provenance (monitor, logging, reproducibility, …)

There are already big players in the field

Why did we start VPT?
•  Embed into Variant Tools (VT) to provide VT pipelines through the VT repository
•  Reasonably powerful and flexible
•  But easy to read, write, and share
•  No suitable tool for this task
   –  Snakemake: Make-style system does not work for VT, where commands change existing files
   –  CWL, Galaxy, etc.: too bulky for our purposes

History of Variant Pipeline Tools
•  2013: VPT was created as a small Variant Tools add-on to execute internal Variant Tools commands
•  2014: VPT was expanded to execute Variant Tools related pipelines
•  2015: VPT was expanded to execute more bioinformatics pipelines, and Variant Simulation Tools
•  2016: VPT was redesigned and rewritten as Script of Scripts

Lessons learned
•  A repository is good; being tied to VT is bad
•  A simple format is good; too simple can be limiting
•  Flexibility is good; too much flexibility can be dangerous

Script of Scripts
Script of Scripts (SoS) is a lightweight workflow system that helps you organize your commands and scripts in different languages into readable workflows.
•  Input-driven workflow system
•  Based on Python
•  Script and user friendly

Basic SoS
•  Basic format
•  Command line argument
•  Input and output files
•  Parallelization

A bioinformatics workflow
•  A shell script to align reads to the reference genome
•  An R script to analyze the results

Script of Scripts (annotated example script)
•  Header
•  Workflow description
•  Steps with logical order of execution
•  Shell script
•  R script

Execute the script

Command line arguments
•  Parameters
•  String interpolation

Specify input and output files
•  Output of step 1 (no input)
•  Input, output and dependent files of step 2
•  Output of step 3 (input is the output of step 2)
•  Use of SoS variables input and output

Ignore executed steps
•  Reuse existing results
•  Re-execute with different input files
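The idea of reusing existing results and re-executing when the input files change can be sketched in plain Python. This is only an illustration under stated assumptions, not SoS's actual mechanism: the helper names (`step_signature`, `run_step`, `concatenate`), the JSON signature file, and the use of SHA-256 digests are all hypothetical choices made for this sketch.

```python
import hashlib
import json
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of a file's content."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def step_signature(script, input_files):
    """Summarize a step by its script text and the content of its inputs."""
    return {
        "script": hashlib.sha256(script.encode()).hexdigest(),
        "inputs": {path: file_digest(path) for path in input_files},
    }


def run_step(script, input_files, output_file, action, sig_file):
    """Run `action` unless a matching signature and the output already exist.

    Returns True if the step was executed and False if it was skipped.
    """
    signature = step_signature(script, input_files)
    if os.path.exists(sig_file) and os.path.exists(output_file):
        with open(sig_file) as handle:
            if json.load(handle) == signature:
                return False  # reuse existing results
    action(input_files, output_file)
    with open(sig_file, "w") as handle:
        json.dump(signature, handle)
    return True


def concatenate(input_files, output_file):
    """Toy action: concatenate all input files into the output file."""
    with open(output_file, "w") as out:
        for path in input_files:
            with open(path) as handle:
                out.write(handle.read())


workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, "sample.txt")
result = os.path.join(workdir, "result.txt")
sig = os.path.join(workdir, "step.sig")
with open(sample, "w") as handle:
    handle.write("first version\n")

first_run = run_step("cat", [sample], result, concatenate, sig)   # executed
second_run = run_step("cat", [sample], result, concatenate, sig)  # skipped

with open(sample, "w") as handle:
    handle.write("second version\n")
third_run = run_step("cat", [sample], result, concatenate, sig)   # re-executed
```

Comparing content digests rather than timestamps is a design choice of this sketch; it avoids re-running a step when a file was merely touched.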

Execute Workflow in Parallel
•  Input files are sent one by one to _input
•  Input files are paired with sample_type
•  Each _input has a corresponding _sample_type
•  Execute in parallel
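The pairing of each input file with its own sample type, followed by concurrent execution, might look roughly like this in plain Python. The file names, the labels, and the `process` function are made up for illustration, and a thread pool stands in for whatever SoS uses internally.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inputs and their paired labels (same length, same order)
input_files = ["a.fastq", "b.fastq", "c.fastq"]
sample_type = ["tumor", "normal", "tumor"]


def process(_input, _sample_type):
    """Toy step body: name an output after one input file and its paired label."""
    stem = _input.rsplit(".", 1)[0]
    return f"{stem}.{_sample_type}.bam"


with ThreadPoolExecutor(max_workers=3) as pool:
    # map() keeps results in input order even if the jobs finish out of order
    output = list(pool.map(process, input_files, sample_type))
```

Here each call sees a single `_input` and its matching `_sample_type`, and `output` collects the per-input results.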

Summary
•  SoS scripts consist of (meaningful) comments, scripts and commands, and SoS-specific syntax
•  Command sos with subcommands show, dryrun and run
•  Workflows are defined in logical order; all steps will be executed
•  Scripts in different languages can be included verbatim
•  Scripts can be customized using user- and SoS-defined variables and command line arguments
•  Specification of input and output files is not required, but is helpful
•  Workflows can be executed in parallel and can be resumed while skipping executed steps

Intermediate SoS
•  String interpolation
•  Shared variables
•  Step process
•  Execution

String Interpolation
•  Arbitrary Python expressions and statements are allowed
•  : for format specification
•  !r for object representation
•  !q for shell quote
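To make the three modifiers concrete, here is a toy interpolator in plain Python. It is a sketch under assumptions, not SoS's implementation: the `${...}` delimiters and the `interpolate` function are chosen for this example, expressions containing `!`, `:` or `}` are not supported, and `eval` is used only because the inputs are trusted.

```python
import re
import shlex

# ${expr}, optionally followed by !r or !q, optionally followed by :format
_FIELD = re.compile(
    r"\$\{(?P<expr>[^}!:]+)(?:!(?P<conv>[rq]))?(?::(?P<fmt>[^}]*))?\}"
)


def interpolate(text, namespace):
    """Replace each ${...} field in text with the value of its expression."""

    def replace(match):
        # Evaluate the expression with the given variables (trusted input only)
        value = eval(match.group("expr"), {}, namespace)
        conversion = match.group("conv")
        if conversion == "r":
            value = repr(value)              # object representation
        elif conversion == "q":
            value = shlex.quote(str(value))  # safe for use in a shell command
        return format(value, match.group("fmt") or "")

    return _FIELD.sub(replace, text)


formatted = interpolate("depth=${depth:.2f}", {"depth": 3.14159})
represented = interpolate("name=${name!r}", {"name": "chr1"})
quoted = interpolate("cat ${path!q}", {"path": "my file.txt"})
computed = interpolate("n=${len(files)}", {"files": ["a.bam", "b.bam"]})
```

The shell-quote conversion matters most when file names contain spaces or other characters the shell would otherwise interpret.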

Step input, output, depends
•  By default a step gets its input from the output of its previous step
•  Strings, variables, expressions are allowed
•  Wildcard characters are expanded
•  Step input defines variable input
•  Step output defines variable output
•  Step depends defines variable depends
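The default-input rule and wildcard expansion can be pictured with a small Python helper. This is a hedged sketch: `resolve_input` is a hypothetical function, and keeping a literal name when no file matches is an assumption of this example rather than documented SoS behavior.

```python
import glob
import os
import tempfile


def resolve_input(spec, previous_output):
    """Return the list of input files for a step.

    spec is None (use the previous step's output), a string, or a list of
    strings; wildcard characters in each string are expanded with glob.
    """
    if spec is None:
        return list(previous_output)
    patterns = [spec] if isinstance(spec, str) else list(spec)
    files = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        # Assumption: keep the literal name when nothing matches
        files.extend(matches or [pattern])
    return files


# Create a few empty files to expand a wildcard against
workdir = tempfile.mkdtemp()
for name in ("s1.bam", "s2.bam", "notes.txt"):
    open(os.path.join(workdir, name), "w").close()

from_previous = resolve_input(None, ["aligned.bam"])
from_wildcard = resolve_input(os.path.join(workdir, "*.bam"), [])
```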

Variables, expressions and statements
•  sorted_bam is a parameter or global variable
•  output can be derived from input
•  Option alias exposes step variables to later steps (read-only)
•  Arbitrary Python statements can be used in a step

SoS actions and step process
•  check_command and run are both SoS actions
•  check_command is executed in both dryrun and run modes
•  run is executed only in run mode
•  process starts the step process, which accepts runtime options (e.g. concurrent=True)
•  The form
      action: options
         script
   is a shortcut for
      process: options
      action('script')
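The difference between an action that runs in both modes and one that runs only in run mode can be mimicked with two plain Python functions. These are stand-ins written for this sketch; they borrow the action names from the slide but are not SoS's implementation, and the explicit `mode` argument is an assumption.

```python
import shutil
import subprocess
import sys


def check_command(cmd, mode):
    """Verify that a command is available; performed in dryrun and run modes."""
    if shutil.which(cmd) is None:
        raise RuntimeError(f"command not found: {cmd}")
    return True


def run(script, mode):
    """Execute a shell script only in run mode; do nothing in dryrun mode."""
    if mode != "run":
        return None
    completed = subprocess.run(
        script, shell=True, check=True, capture_output=True, text=True
    )
    return completed.stdout


# The availability check happens even in a dry run ...
checked = check_command(sys.executable, mode="dryrun")
# ... but the script itself is only executed in run mode
dry = run("echo hello", mode="dryrun")
real = run("echo hello", mode="run")
```

Checking for required commands during a dry run lets a missing tool surface before any long-running step starts.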

Execution of simple workflows
Workflow with no defined input and output files: Step 1, Step 2, Step 3, Step 4
•  Workflow steps are executed sequentially
•  Steps are executed in separate processes
•  Step loops can be executed in parallel

Execution of complex workflows
Workflow with defined input and output
[Figure: Logical View and Processing View of a workflow with Steps 1 to 8, connected by their input and output files (a1, a2, a3, a5, a6, a7, a8; None where a step has no input)]
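One way to picture how a processing order can differ from the logical order is to derive dependencies from the declared input and output files. The sketch below does this with Python's standard `graphlib` module (Python 3.9 or later). It is not a description of SoS internals, and the four steps and their files are hypothetical rather than taken from the figure.

```python
from graphlib import TopologicalSorter

# Hypothetical steps with declared input and output files
steps = {
    "step_1": {"input": [], "output": ["a1"]},
    "step_2": {"input": ["a1"], "output": ["a2"]},
    "step_3": {"input": ["a1"], "output": ["a3"]},
    "step_4": {"input": ["a2", "a3"], "output": ["a4"]},
}


def dependency_graph(steps):
    """Map each step to the set of steps that produce its input files."""
    producer = {
        filename: name
        for name, step in steps.items()
        for filename in step["output"]
    }
    return {
        name: {producer[f] for f in step["input"] if f in producer}
        for name, step in steps.items()
    }


def execution_batches(steps):
    """Group steps into batches; steps within a batch do not depend on each other."""
    sorter = TopologicalSorter(dependency_graph(steps))
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(steps)
```

In this example `step_2` and `step_3` land in the same batch because both need only `a1`, so they could run side by side.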

Advanced SoS
•  Step options
•  Multiple workflows
•  Sub- and combined-workflows
•  Nested workflows

group_by, for_each and paired_with
Without group_by and for_each, paired_with='var' (previous output, then step input, then step output):
•  input is sent all at once as _input (input = _input, var = _var)
•  _output becomes output (_output = output)
With group_by and/or for_each, paired_with='var':
•  Subsets of input become _input with matching subsets of var (_var)
•  Each (_input, _var) pair produces its own _output
•  output is the collection of _output
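The relationship between input, _input, _var, _output and output can be mirrored in a few lines of Python. This is an illustrative sketch only: grouping by a fixed group size is just one possible grouping rule, and the function names and file names are hypothetical.

```python
def grouped_inputs(input_files, var, group_size):
    """Yield (_input, _var) pairs: subsets of input with matching subsets of var."""
    if len(input_files) != len(var):
        raise ValueError("paired variable must have one value per input file")
    for start in range(0, len(input_files), group_size):
        stop = start + group_size
        yield input_files[start:stop], var[start:stop]


def run_grouped_step(input_files, var, group_size, body):
    """Run body once per group and collect every _output into output."""
    output = []
    for _input, _var in grouped_inputs(input_files, var, group_size):
        _output = body(_input, _var)
        output.extend(_output)
    return output


def merge_pair(_input, _var):
    """Toy step body: one output file per group, named after its label."""
    return [f"{_var[0]}.bam"]


files = ["a_1.fq", "a_2.fq", "b_1.fq", "b_2.fq"]
labels = ["A", "A", "B", "B"]

groups = list(grouped_inputs(files, labels, 2))
output = run_grouped_step(files, labels, 2, merge_pair)
# Without grouping, everything is sent at once as a single _input
all_at_once = list(grouped_inputs(files, labels, len(files)))
```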

Input option group_by  

Input option for_each  

Input and output option pattern
•  Input option pattern matches a pattern against input file names
•  It creates variables name and par that are paired with variable input
•  Output option pattern creates output filenames from list variables
•  Loop variables _name, _par and _output can also be used in case of input loops
•  Specifying partial output (_output) allows SoS to have finer control of the execution
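A rough Python sketch of pattern matching in both directions follows: extracting list variables from input file names and building output file names from them. The `{name}_{par}.txt` placeholder syntax, the helper functions, and the file names are assumptions made for this example; the actual SoS pattern syntax is not shown in the slides.

```python
import re


def pattern_to_regex(pattern):
    """Turn a pattern such as '{name}_{par}.txt' into a compiled regex."""
    # re.split with a capturing group alternates literal text and field names
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2:
            regex += f"(?P<{part}>.+?)"  # a named field
        else:
            regex += re.escape(part)     # literal text
    return re.compile(regex)


def match_pattern(pattern, filenames):
    """Return one list variable per field, with one value per file name."""
    regex = pattern_to_regex(pattern)
    fields = {}
    for filename in filenames:
        match = regex.fullmatch(filename)
        if match is None:
            raise ValueError(f"{filename} does not match {pattern}")
        for key, value in match.groupdict().items():
            fields.setdefault(key, []).append(value)
    return fields


def expand_pattern(pattern, fields):
    """Create one output file name per set of field values."""
    count = len(next(iter(fields.values())))
    return [
        pattern.format(**{key: values[i] for key, values in fields.items()})
        for i in range(count)
    ]


inputs = ["liver_10.txt", "brain_20.txt"]
fields = match_pattern("{name}_{par}.txt", inputs)
outputs = expand_pattern("{name}_{par}.result", fields)
```

Because the extracted lists keep the order of the input files, the i-th value of each variable stays paired with the i-th input.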

Multiple workflows
•  Single default workflow
•  Single named workflow
•  Multiple named workflows
•  Default and named workflow
•  Shared workflow steps
•  Wildcard workflow steps

Shared workflow step

•  step_name can be either mouse_20 or human_20
•  The statement determines which reference genome to use from step_name
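The kind of statement described here might look like the following in Python. The reference file names and the `reference_for` helper are hypothetical; only the two step names come from the slide.

```python
# Hypothetical reference genome files, keyed by workflow name
REFERENCE_BY_WORKFLOW = {
    "mouse": "mouse_reference.fa",
    "human": "human_reference.fa",
}


def reference_for(step_name):
    """Pick the reference genome from a step name such as 'mouse_20'."""
    # Split off the trailing step index to recover the workflow name
    workflow, _index = step_name.rsplit("_", 1)
    return REFERENCE_BY_WORKFLOW[workflow]


mouse_reference = reference_for("mouse_20")
human_reference = reference_for("human_20")
```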

Sub-workflow and Combined workflow
•  A sub-workflow consists of one or more consecutive steps of a workflow
•  A combined workflow consists of one or more sub-workflows
•  sos commands accept regular, sub-, and combined workflows

Nested workflow
A nested workflow is a workflow executed within an SoS step by action sos_run
•  Customized input files from output of two previous steps
•  Customized output file generator
•  A nested workflow from three sub-workflows
•  One or more sub-workflows defined in another script
•  Execute complete workflows 100 times with different random seeds
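The last use case, repeating a complete workflow with different random seeds, can be pictured with an ordinary Python function standing in for the nested workflow. This sketch does not use `sos_run`; the `simulation_workflow` function and its toy computation are assumptions for illustration.

```python
import random
import statistics


def simulation_workflow(seed, n=1000):
    """Stand-in for a complete workflow: a seeded, reproducible simulation."""
    rng = random.Random(seed)  # private generator, so runs do not interfere
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.mean(draws)


# Execute the whole "workflow" 100 times, once per seed
seeds = range(1, 101)
results = {seed: simulation_workflow(seed) for seed in seeds}
```

Giving every run its own seed keeps each replicate reproducible on its own, which helps when a single run has to be repeated later.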

Status of SoS
•  Hosted at https://github.com/BoPeng/SOS/
•  Requires Python 3.3 or higher
•  Install using command pip3 install sos
•  Most features have been implemented and well tested
•  Pending (but not in a hurry) features:
   –  Native Docker support (build and run docker containers)
   –  Auxiliary step (snakemake-like rules)
   –  Dynamic DAG (directed acyclic graphs)
   –  Celery task management (cluster systems)
   –  Resource management (CPU, RAM, etc)
   –  Execution monitor (monitor status of jobs)
   –  Post-execution analysis (command sos summary)
   –  …
•  Version 0.5.9, expect changes and bugs, but you know how to reach me for help

Acknowledgements
•  Dr. Gao Wang
•  Dr. Paul Scheet
•  Dr. Suzanne Leal
•  Dr. John Weinstein
•  and others
•  Grant 1R01HG005859 (Dr. Paul Scheet)
•  The Michael and Susan Dell Foundation
•  The Chapman Foundation
