Thesis abstract:
One main challenge in next-generation sequencing (NGS) is handling, analysing and integrating huge amounts of heterogeneous data, which calls for an efficient and scalable data model. The development of such a model will enable the identification of patterns among molecular and structural genomic regions, with the aim of retrieving, fetching and predicting all functional genomic features. Only a few of these features are currently available in biomolecular databases, where they are expressed through controlled vocabularies and ontologies. Many genomic features are instead extracted through the analysis of different types of NGS data, so in this context the creation of pre-processing and analysis pipelines is a critical step. The new data model will support easy and efficient data storage and retrieval, allowing full tracking of the original NGS data and of the methods used to analyse it. Finally, such a data model should support both quantitative and visual evaluation of features, enabling searches through complex multi-feature queries.