DisMaNET: A network-based tool to cross map disease vocabularies
Eduardo P. García del Valle; Gerardo Lagunes García; Lucía Prieto Santamaría; Massimiliano Zanin; Ernestina Menasalvas Ruiz; Alejandro Rodríguez-González
Computer Methods and Programs in Biomedicine 207, 106233 (2021)
Background and Objectives
The growing integration of healthcare sources is improving our understanding of diseases. Cross-mapping resources such as UMLS play an important role in this area, but their coverage is still incomplete. To facilitate the integration and interoperability of biological, clinical and literature sources in disease studies, we built DisMaNET, a system that cross-maps terms from disease vocabularies by leveraging the power and interpretability of network analysis.
Methods
First, we collected and normalized data from 8 disease vocabularies and mapping sources to generate our datasets. Next, we built DisMaNET by integrating these datasets into a Neo4j graph database. We then used the query mechanisms of Neo4j to cross-map disease terms from different vocabularies with a relevance score metric, and compared the results with state-of-the-art solutions. Finally, we made the system publicly available for use and evaluation through both a graphical user interface and REST APIs.
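As a rough illustration of cross-mapping by graph traversal, the sketch below walks mapping edges outward from a source term and collects every reachable term that belongs to a target vocabulary. It is not the DisMaNET implementation: a plain in-memory adjacency map stands in for the Neo4j database, the vocabulary names and codes are placeholders, and the inverse-hop-count score is an assumed stand-in for the relevance score metric, which this abstract does not define.

```python
from collections import deque

# Placeholder mapping edges between "VOCAB:code" terms (not real identifiers).
EDGES = [
    ("VOCAB_A:001", "VOCAB_B:X1"),
    ("VOCAB_B:X1", "VOCAB_C:77"),
    ("VOCAB_A:001", "VOCAB_C:78"),
    ("VOCAB_C:78", "VOCAB_D:9"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from (term, term) mapping pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def cross_map(adj, source, target_vocab, max_hops=3):
    """Return [(term, score)] for target_vocab terms within max_hops of source.

    The score is 1 / hops: an assumed toy metric, so direct mappings rank
    above mappings found through intermediate vocabularies.
    """
    seen = {source}
    queue = deque([(source, 0)])
    hits = []
    while queue:
        node, hops = queue.popleft()
        if hops > 0 and node.split(":", 1)[0] == target_vocab:
            hits.append((node, 1.0 / hops))
        if hops < max_hops:
            for nxt in sorted(adj.get(node, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    for term, score in cross_map(graph, "VOCAB_A:001", "VOCAB_C"):
        print(term, score)
```

In this toy graph the source term reaches one target term directly and another through an intermediate vocabulary, which is the kind of indirect connection a single mapping source would miss. Because the search is breadth-first, each term is scored by its shortest path from the source.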
Results
DisMaNET contains almost half a million nodes and nearly nine hundred thousand edges, including hierarchical and mapping relationships. Its query capabilities enabled the detection of connections between disease vocabularies that are not present in major mapping sources such as UMLS and the Disease Ontology, even for rare diseases. Furthermore, DisMaNET recovered more than 80% of the UMLS mappings reported in MonDO and DisGeNET, and it was successfully used to resolve the missing mappings in the DISNET project.
Conclusions
DisMaNET is a powerful, intuitive and publicly available system to cross-map terms from different disease vocabularies. Our study shows that it is a competitive alternative to existing mapping systems, with the potential of network analysis and the interpretability of results through a visual interface as its main advantages. Expansion with new sources, versioning, and improvement of the search and scoring algorithms are envisioned as future lines of work.