Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles.

Lampa S, Alvarsson J, Spjuth O

J Cheminform 8 (-) 67 [2016-11-24; online 2016-11-24]

Predictive modelling in drug discovery is challenging to automate as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can aid in many of these challenges, but the currently available systems are lacking in the functionality needed to enable agile and flexible predictive modelling. We here present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system which we name SciLuigi. We also discuss the experiences from using the approach when modelling a large set of biochemical interactions using a shared computer cluster.

Bioinformatics Support and Infrastructure [Collaborative]

PubMed 27942268

DOI 10.1186/s13321-016-0179-6

Crossref 10.1186/s13321-016-0179-6

179

pmc PMC5123367