Section: Computer Science and Engineering
Tutor: TANCA LETIZIA
Advisor: MASSEROLI MARCO
Major Research topic: Computational methods for data-driven predictions and understanding of biological interactions
Abstract:
Human cells are complex biological systems in which many phenomena take place, including molecular interactions. Mechanisms of cell regulation, differentiation and development derive from such complexity, and they can be explored through systems biology, i.e., a common approach for studying the molecular interactions involved in specific functions within a cell. Complex networks provide a generalizable method to represent associations between objects and to understand the overall structure of complex systems. In more detail, biological networks are computational models typically employed in systems biology to represent the functions and structure of molecular relationships. They are valued for their power of abstraction, since they represent the system's components as nodes and the connections between them as links. Network-based approaches offer a global view of each node's contribution, providing insights that methods based on single-node analyses cannot give. They can thus improve our knowledge of biological systems and shed light on pathological disruptions occurring in the cell. Moreover, one of their main features is the ability to easily integrate data from various sources. During the last decade, massive efforts were made to build public databases of biological data; among them, next-generation sequencing (NGS) and drug-related data are the ones used for this project. Complex networks are a well-suited paradigm for answering several different biological questions using heterogeneous data.
This Thesis focuses on three main steps of network biology: network inference, link prediction and network feature extraction. Each of these topics is developed in a dedicated application that answers a major biological question using innovative computational methods and shows significant results. My work offers a broad picture of what complex networks may accomplish in biology and contributes to delivering advances in this field. From the computational perspective, I developed novel approaches to build, predict and analyze complex networks, whereas, from a biological standpoint, the achieved results have a significant impact on each application.
During my first year of PhD, I focused my work on identifying functional interaction networks among transcription factors (TFs), i.e., proteins that control gene transcription by acting in complexes. Performing network inference to identify TF interaction networks is an important step towards understanding the genome regulation framework and its changes when subjected to external stimuli. We developed an approach based on the computation of association rules and the definition of a novel Importance Index, which leads to the creation of TF interaction networks in user-selected genomic regions. The Importance Index provides a relevance measure of TF interactions; thus, inferred networks have TFs as nodes, and the links between them are weighted according to the Importance Index of the corresponding TF pair.
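As an illustration of how association rules can yield a weighted TF network, the following minimal Python sketch counts TF co-occurrences across genomic regions and computes the standard support, confidence and lift of each TF pair. The Importance Index is not defined in this abstract, so lift is used here purely as a stand-in edge weight; the function name `tf_pair_rules`, the `min_support` threshold and the toy regions are assumptions made for the example.

```python
from itertools import combinations


def tf_pair_rules(regions, min_support=0.3):
    """Compute association-rule statistics for TF pairs.

    regions: list of sets, each holding the TFs bound in one genomic region.
    Returns {(tf_a, tf_b): {"support", "conf_a_to_b", "conf_b_to_a", "lift"}}.
    NOTE: lift is a placeholder weight, not the thesis's Importance Index.
    """
    n = len(regions)
    single = {}  # number of regions containing each TF
    pair = {}    # number of regions containing both TFs of a pair
    for tfs in regions:
        for tf in tfs:
            single[tf] = single.get(tf, 0) + 1
        for a, b in combinations(sorted(tfs), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1

    edges = {}
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue  # drop rarely co-occurring pairs
        edges[(a, b)] = {
            "support": support,
            "conf_a_to_b": count / single[a],
            "conf_b_to_a": count / single[b],
            "lift": support / ((single[a] / n) * (single[b] / n)),
        }
    return edges


# Toy input (hypothetical): TFs observed in five genomic regions.
regions = [
    {"CTCF", "RAD21", "SMC3"},
    {"CTCF", "RAD21"},
    {"MYC", "MAX"},
    {"CTCF", "SMC3", "RAD21"},
    {"MYC", "MAX", "CTCF"},
]
edges = tf_pair_rules(regions)
for (a, b), stats in sorted(edges.items()):
    print(a, b, round(stats["support"], 2), round(stats["lift"], 2))
```

The resulting dictionary can be read as a weighted edge list: each key is a TF pair (a link), and any of the stored statistics can serve as the link weight.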
To explore the link prediction problem of network biology, I focused on an original drug repurposing approach by designing a drug-centred network and by leveraging the Non-negative Matrix Tri-Factorization (NMTF) method to obtain drug-centric predictions. Computational drug repurposing proposes alternative indications for drugs already in use, bypassing the highly expensive and lengthy drug discovery process. Thus, we addressed a crucial biological question by modelling drugs, their protein targets, diseases and biological pathways as nodes of a multilayered network in which we predicted links between drugs and other nodes. We innovatively combined the NMTF method with a shortest-path evaluation of drug-protein pairs using the protein-to-protein interaction network, increasing the correctness of our link predictions and the pool of possible protein targets.
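To make the two ingredients concrete, the sketch below factorizes a small drug-protein association matrix R as G1 S G2^T using the standard multiplicative update rules for the unconstrained Frobenius objective, and computes shortest-path lengths on a toy protein interaction graph with breadth-first search. This is a minimal sketch under stated assumptions: the exact NMTF variant, the way the thesis combines factorization scores with path lengths, and all names and data (`nmtf`, `shortest_path_length`, `R`, `ppi`) are illustrative and not taken from the original work.

```python
from collections import deque

import numpy as np


def nmtf(R, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix R (n x m) as G1 @ S @ G2.T.

    Uses multiplicative updates, which keep all factors non-negative.
    Returns the factors and the reconstruction error after each iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    G1 = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G2 = rng.random((m, k2)) + eps
    errors = []
    for _ in range(n_iter):
        G1 *= (R @ G2 @ S.T) / (G1 @ S @ G2.T @ G2 @ S.T + eps)
        G2 *= (R.T @ G1 @ S) / (G2 @ S.T @ G1.T @ G1 @ S + eps)
        S *= (G1.T @ R @ G2) / (G1.T @ G1 @ S @ G2.T @ G2 + eps)
        errors.append(float(np.linalg.norm(R - G1 @ S @ G2.T)))
    return G1, S, G2, errors


def shortest_path_length(adj, source, target):
    """Breadth-first search on an unweighted graph; None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in adj.get(node, ()):
            if nb == target:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None


# Toy data (hypothetical): 4 drugs x 4 proteins, 1 = known drug-target link.
R = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 0]], dtype=float)
G1, S, G2, errors = nmtf(R, k1=2, k2=2)
scores = G1 @ S @ G2.T  # high scores on zero entries suggest candidate links

# Toy protein interaction graph as an adjacency list.
ppi = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2"], "P4": []}
print(shortest_path_length(ppi, "P1", "P3"))
```

One plausible use of the second function is to check how far a predicted protein target lies from the known targets of the same drug in the interaction network; whether and how such distances filter or rank predictions in the thesis is not specified here.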
During the last period of my PhD, I worked on network feature extraction by providing a general framework to infer relevant genes from multiple gene co-expression networks. Thus, I investigated a well-known network biology problem by focusing on a specific application: identifying cancer-related genes to be experimentally validated. The hallmark of this work is the combination of gene co-expression networks based on different similarity measures, built separately for the normal and the cancer condition, and the subsequent fusion of the two condition networks. Fused networks are disease-specific; thus, the extracted gene communities represent important features of the disease-specific networks.
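The general idea can be sketched in a few lines of Python. The example below builds one co-expression network per condition by averaging two similarity measures (absolute Pearson and absolute Spearman correlation), fuses the two condition networks by taking the absolute difference of their edge weights, and extracts gene groups as connected components above a threshold. All of these choices are assumptions made for illustration: the abstract does not state which similarity measures, fusion rule or community detection method the thesis uses, and the expression values are invented toy data.

```python
import numpy as np


def coexpression(X):
    """Co-expression weights for a genes x samples matrix X.

    Averages |Pearson| and |Spearman| correlation. Ranks are computed with a
    double argsort, which assumes there are no tied values within a gene.
    """
    pearson = np.abs(np.corrcoef(X))
    ranks = X.argsort(axis=1).argsort(axis=1)
    spearman = np.abs(np.corrcoef(ranks))
    W = (pearson + spearman) / 2
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def fuse(W_normal, W_cancer):
    """Assumed fusion rule: keep how much each edge changes between conditions."""
    return np.abs(W_cancer - W_normal)


def communities(W, genes, threshold):
    """Connected components of the graph with edges W[i, j] >= threshold."""
    unseen = set(range(len(genes)))
    groups = []
    while unseen:
        stack = [unseen.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            for j in list(unseen):
                if W[i, j] >= threshold:
                    unseen.remove(j)
                    comp.add(j)
                    stack.append(j)
        groups.append(sorted(genes[i] for i in comp))
    return sorted(groups)


genes = ["g0", "g1", "g2"]
# Toy expression matrices (hypothetical): rows are genes, columns are samples.
normal = np.array([[1, 2, 3, 4, 5, 6],
                   [6, 1, 4, 2, 5, 3],
                   [2, 4, 1, 6, 3, 5]], dtype=float)
cancer = np.array([[1, 2, 3, 4, 5, 6],
                   [2, 4, 6, 8, 10, 13],
                   [2, 4, 1, 6, 3, 5]], dtype=float)

fused = fuse(coexpression(normal), coexpression(cancer))
groups = communities(fused, genes, threshold=0.5)
print(groups)
```

In this toy case only the relationship between g0 and g1 changes strongly between conditions, so these two genes end up in the same group while g2 stays on its own. A dedicated community detection algorithm would normally replace the connected-components step on real data.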
This manuscript is organized as follows:
Chapter 1 provides the background needed to understand the biological applications considered; Chapter 2 presents motivations and goals for applying complex network techniques in biology; Chapter 3 describes the data and general methods adopted in this work; Chapter 4, Chapter 5 and Chapter 6 provide real-world applications of network inference, link prediction and network feature extraction, respectively; Chapter 7 concludes the Thesis with discussions and future developments.