High-performance computing for big omics data analysis

High-throughput biotechnologies generate ever-larger omics datasets, which pose a challenge for predictive, data-driven computational models.

In this context, we developed parSMURF, a high-performance computing (HPC) tool that handles big genomic data and, more generally, big omics data to predict deleterious or pathogenic variants associated with genetic diseases (Petrini et al. 2020). parSMURF was originally designed to automatically tune the learning hyper-parameters of hyperSMURF for the prediction of pathogenic variants that cause, or are associated with, Mendelian or complex genetic diseases (Petrini et al. 2017). By exploiting high computational parallelism and memory distributed across multiple nodes of an HPC cluster, it can also deal with highly imbalanced Genomic Medicine problems and scales well with big omics data, even when only a very small set of labelled examples is available.

Publications

A. Petrini, M. Schubach, M. Re, M. Frasca, M. Mesiti, G. Grossi, T. Castrignanò, P. Robinson and G. Valentini. Parameters tuning boosts HyperSMURF predictions of rare deleterious non-coding genetic variants. PeerJ Preprints, PeerJ Inc. San Francisco, USA 5, 2017.