Scalable methods for the analysis of big biomolecular networks

Analyzing big data in large biomolecular networks is a significant problem in computational biology. In this context, we developed vertex-centric algorithms that rely on technologies such as GraphChi, which make efficient use of secondary memory, to process large graphs that cannot be loaded directly into primary memory. The goal is to analyze large networks built from "omics" data, with relevant applications in Molecular Biology and Medicine, using simple stand-alone workstations (Mesiti et al. 2014). Promising experimental results have been obtained in multi-species protein function prediction (Mesiti et al. 2013, Lin et al. 2016).
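As a rough illustration of the vertex-centric, secondary-memory idea, the Python sketch below keeps only per-vertex values in primary memory and re-reads the edge list from disk on every iteration, so memory use grows with the number of vertices rather than the number of edges. It is a simplified stand-in, not GraphChi's API: GraphChi partitions edges into shards and processes them with a parallel sliding windows scheme, whereas here each iteration is a single sequential pass over one plain-text file. The propagation rule (random walk with restart from a set of seed proteins, restart probability `1 - alpha`) is an assumed example of a vertex-centric score update and is not necessarily the random walk kernel used in the cited works; the file format and function names are hypothetical.

```python
import os
import tempfile


def write_edge_list(path, edges):
    """Store a weighted undirected edge list on disk, one 'u v w' line per edge."""
    with open(path, "w") as f:
        for u, v, w in edges:
            f.write(f"{u} {v} {w}\n")


def stream_edges(path):
    """Yield edges one at a time, so the edge set never has to fit in memory."""
    with open(path) as f:
        for line in f:
            u, v, w = line.split()
            yield int(u), int(v), float(w)


def propagate(path, n_vertices, seeds, alpha=0.8, iterations=20):
    """Random walk with restart computed by streaming edges from disk.

    Only O(|V|) vertex values are held in memory; every iteration is one
    sequential pass over the edge file.
    """
    prior = [1.0 / len(seeds) if v in seeds else 0.0 for v in range(n_vertices)]

    # First pass: weighted degree of each vertex, used to normalize transitions.
    degree = [0.0] * n_vertices
    for u, v, w in stream_edges(path):
        degree[u] += w
        degree[v] += w

    score = prior[:]
    for _ in range(iterations):
        incoming = [0.0] * n_vertices
        for u, v, w in stream_edges(path):
            # Gather step: each endpoint collects the share of its neighbor's
            # score carried by this edge (the graph is undirected).
            incoming[v] += score[u] * w / degree[u]
            incoming[u] += score[v] * w / degree[v]
        # Restart: with probability 1 - alpha the walk jumps back to a seed.
        score = [(1 - alpha) * p + alpha * x for p, x in zip(prior, incoming)]
    return score


# Toy path graph 0-1-2-3-4 with protein 0 as the only labeled seed (hypothetical data).
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edges.txt")
    write_edge_list(path, edges)
    scores = propagate(path, n_vertices=5, seeds={0})
print([round(s, 3) for s in scores])
```

In a function-prediction setting the seeds would be the proteins already annotated with a given class, and the final scores would rank the remaining proteins. The design trades repeated disk reads for a memory footprint that stays proportional to the number of vertices.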

Another line of research concerns the use of GPU technology for massively parallel implementations of node label prediction algorithms, such as COSNet, to process large graphs efficiently (Frasca et al. 2018). Applications include the prediction of Gene Ontology (GO) classes using the integrated multi-species network of the STRING database, which includes millions of proteins from different species.
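COSNet evolves a Hopfield-style network whose neurons are the nodes of the graph, and such dynamics are naturally sequential. One common way to expose parallelism, assumed here for illustration, is to color the graph so that adjacent nodes never share a color: nodes of the same color do not read each other's states, so a whole color class can be updated at once (on a GPU, roughly one thread per node and one kernel launch per class) with the same result as updating its nodes one at a time. The Python sketch below shows this on a plain binary threshold network with states in {-1, +1}; it omits COSNet's cost-sensitive parameter learning, and the greedy coloring, the thresholds `theta` and all helper names are hypothetical.

```python
def greedy_coloring(adj):
    """Give each node the smallest color not already used by a colored neighbor."""
    color = {}
    # Visiting high-degree nodes first tends to need fewer colors.
    for node in sorted(adj, key=lambda n: -len(adj[n])):
        used = {color[nb] for nb in adj[node] if nb in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    return color


def color_classes(color):
    """Group nodes by color: each group is an independent set of the graph."""
    classes = {}
    for node, c in color.items():
        classes.setdefault(c, []).append(node)
    return [classes[c] for c in sorted(classes)]


def run_dynamics(adj, weights, state, theta, max_sweeps=100):
    """Threshold (Hopfield-style) dynamics, updated one color class at a time."""
    state = dict(state)  # do not modify the caller's initial state
    classes = color_classes(greedy_coloring(adj))
    for _ in range(max_sweeps):
        changed = False
        for nodes in classes:
            # Nodes in this class are pairwise non-adjacent, so all new states
            # can be computed from the current states "in parallel"...
            new = {}
            for i in nodes:
                field = sum(weights[(i, j)] * state[j] for j in adj[i])
                new[i] = 1 if field - theta[i] > 0 else -1
            # ...and then written back together.
            for i, s in new.items():
                if s != state[i]:
                    state[i] = s
                    changed = True
        if not changed:  # a full sweep without flips: fixed point reached
            break
    return state


# Toy 4-node cycle with symmetric weights (hypothetical values).
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): -1.0, (3, 0): 0.5}
adj = {i: [] for i in range(4)}
weights = {}
for (i, j), w in edges.items():
    adj[i].append(j)
    adj[j].append(i)
    weights[(i, j)] = weights[(j, i)] = w

theta = {i: 0.0 for i in adj}
initial = {0: 1, 1: -1, 2: 1, 3: -1}
final = run_dynamics(adj, weights, initial, theta)
print(final)
```

Greedy coloring does not guarantee the minimum number of colors, but what matters for parallelism is having few, large color classes, since each class becomes one synchronous update step.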

Publications

M. Frasca, G. Grossi, J. Gliozzo, M. Mesiti, M. Notaro, P. Perlasca, A. Petrini and G. Valentini. A GPU-based algorithm for fast node label learning in large and unbalanced biomolecular networks. BMC Bioinformatics, BioMed Central 19(10), 2018.

J. Lin, M. Mesiti, M. Re and G. Valentini. Within network learning on big graphs using secondary memory-based random walk kernels. International Workshop on Complex Networks and their Applications, 2016.

M. Mesiti, M. Re and G. Valentini. Think globally and solve locally: secondary memory-based network learning for automated multi-species function prediction. GigaScience, BioMed Central 3(1), 2014.

M. Mesiti, M. Re and G. Valentini. Scalable network-based learning methods for automated function prediction based on the neo4j graph-database. Proceedings of the Automated Function Prediction SIG 2013, ISMB Special Interest Group Meeting, 2013.