STAMOULAKATOU EIRINI
Cycle: XXXII
Section: Computer Science and Engineering
Tutor: PERNICI BARBARA
Advisor: CERI STEFANO
Major Research topic: Computational tools for data-driven genomics
Abstract:
The aim of my research project is to develop innovative applications and methodologies for managing genomic problems. Within this scope, I studied an important problem: understanding whether cancer can be explained by alterations of the tridimensional structure of the genome. We concentrate on the junctions between topological domains, i.e. specific genomic positions that are critical in the tridimensional structure of the genome; at these positions, we consider the role of CTCF, a protein that creates this tridimensional structure in connection with other protein complexes. By integrating public datasets from different databases (TCGA, ICGC and others), we are building models of the three different ways that can lead to deregulation of the junctions: enrichment of somatic mutations, dysregulation of methylation, and copy number alterations. For different cancer types, we checked the enrichment of mutations in CTCF binding sites falling inside the junctions compared with CTCF binding sites falling outside the junctions, and we analyzed the different types of mutations that fall inside the mutated boundaries as well as the nearby oncogenes. We also analyzed differences in copy number variations and methylation between normal and cancer samples inside the CTCF binding sites on the junctions. The computational challenge of this research was to develop and apply the most suitable statistical methods for the different kinds of datasets and aspects of the problem, given that the underlying model of the data is unknown.
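As an illustration of the inside-versus-outside comparison described above, the following is a minimal sketch of one possible enrichment test. It assumes the counts can be arranged in a 2x2 contingency table (CTCF binding sites inside or outside junctions, mutated or not mutated) and uses a one-sided Fisher exact test. The abstract does not state which statistical test was actually applied, so both the choice of test and the function names `fisher_exact_greater` and `odds_ratio` are assumptions made for this example.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration only):
        rows    = CTCF binding sites inside / outside junctions
        columns = mutated / not mutated
    Returns the probability of observing at least `a` mutated sites
    inside junctions if mutation status were independent of location.
    """
    row1 = a + b          # binding sites inside junctions
    col1 = a + c          # mutated binding sites overall
    n = a + b + c + d     # all binding sites
    total = comb(n, col1)
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / total


def odds_ratio(a, b, c, d):
    """Sample odds ratio; returns infinity when the denominator is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


if __name__ == "__main__":
    # Toy counts, not taken from any real dataset.
    inside_mut, inside_not, outside_mut, outside_not = 3, 1, 1, 3
    print(odds_ratio(inside_mut, inside_not, outside_mut, outside_not))
    print(fisher_exact_greater(inside_mut, inside_not, outside_mut, outside_not))
```

The exact test makes no distributional assumption beyond fixed table margins, which is one reason it could suit data with an unknown underlying model. For large tables, a library routine such as `scipy.stats.fisher_exact` with `alternative="greater"` would be a more practical choice than this hand-written version.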
In addition, I studied the problem of analyzing gene regulatory networks inferred from ChIP-seq data. Computational network biology aims to understand cell behavior through complex network analysis. The Chromatin Immuno-Precipitation sequencing (ChIP-seq) technique allows interrogating the physical binding interactions between proteins and DNA using Next-Generation Sequencing. Taking advantage of this technique, in this study we propose a computational framework to analyze gene regulatory networks built from ChIP-seq data. We focus on two different cell lines: GM12878, a normal lymphoblastoid cell line, and K562, an immortalized myelogenous leukemia cell line. In the proposed framework, we preprocess the data, derive network relationships from the data, analyze their network properties, and identify differences between the two cell lines through network comparison analysis. Through our analysis, we identified known cancer genes and other genes that may play important roles in chronic myelogenous leukemia.
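The network comparison step can be sketched as follows. This is only an illustrative example under stated assumptions: each network is represented as a list of directed (regulator, target) edges, and the two networks are compared by their shared and cell-line-specific edges, their edge-set Jaccard similarity, and the change in regulator out-degree. The function names and the placeholder node labels (`TF_A`, `g1`, and so on) are hypothetical and do not come from the actual framework or its results.

```python
from collections import Counter


def out_degree(edges):
    """Count how many targets each regulator has in a list of (regulator, target) edges."""
    return Counter(regulator for regulator, _ in set(edges))


def compare_networks(edges_a, edges_b):
    """Split edges into shared and network-specific sets and compute Jaccard similarity."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


def degree_shift(edges_a, edges_b):
    """Out-degree in network B minus out-degree in network A, per regulator."""
    deg_a, deg_b = out_degree(edges_a), out_degree(edges_b)
    return {tf: deg_b[tf] - deg_a[tf] for tf in set(deg_a) | set(deg_b)}


if __name__ == "__main__":
    # Placeholder edges standing in for the two cell-line networks.
    normal_edges = [("TF_A", "g1"), ("TF_A", "g2"), ("TF_B", "g1")]
    cancer_edges = [("TF_A", "g1"), ("TF_B", "g1"), ("TF_B", "g3"), ("TF_C", "g2")]
    result = compare_networks(normal_edges, cancer_edges)
    print(result["jaccard"])
    print(sorted(result["only_b"]))
    print(degree_shift(normal_edges, cancer_edges))
```

Regulators with a large degree shift, or edges present in only one network, would be natural candidates for closer inspection, although a real analysis would likely also consider further network properties.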
Interests: Bioinformatics; statistics; data analysis; graph theory; next-generation sequencing data