BERNASCONI ANNA
Cycle: XXXII
Section: Computer Science and Engineering
Tutor: PERNICI BARBARA
Advisor: CAMPI ALESSANDRO
Major Research topic: Metadata Integration Framework for Genomic Datasets
Abstract:
The integration of genomic metadata is at the same time an important, well-defined and difficult challenge. It is important because a wealth of public data repositories is available to drive biological and clinical research, and combining information from heterogeneous and widely dispersed sources is paramount for a number of biological discoveries. It is well-defined because most of the important sources of genomic datasets have dedicated efforts to defining their metadata locally. It is difficult because the domain is complex and there is no agreement among the various metadata definitions, which refer to different vocabularies and ontologies. In this paper, we provide a systematic approach to the development of a metadata management framework that integrates metadata from a variety of genomic datasets. Our approach is based on a structured transformation process, with well-defined use of abstractions and algorithms. Along the way, we adopt a variety of classical techniques of data and knowledge engineering, including several rule-based methods. The result is a framework that already integrates several important sources, together with a general, open and extensible process that can easily incorporate new data sources.