Integrating Embedding and Network Community Detection to Unveil Terrestrial Virome Domains


Viruses account for only a small portion of global biomass, yet they are the most numerous biological entities on Earth. They play critical roles in regulating host populations, transmitting diseases, and facilitating horizontal gene transfer. This study classifies samples from various environments based on their viromes, the collective viral populations within a given environment or organism. The data, obtained from the KMAP platform, comprise more than 2,000 samples across 19 biomes.



The virus taxa found in each sample were analyzed using several grouping methods. Embedding techniques such as principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) were combined with hierarchical density-based spatial clustering of applications with noise (HDBSCAN) to form clusters. In addition, a network-based approach used a similarity measure to link samples and then applied community detection algorithms to the resulting network. Because each clustering method defines groups differently, the results vary from one method to another.
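The network-based route can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the community detection algorithm, so the choices here are assumptions made purely for illustration: Jaccard similarity on sets of taxa, a fixed similarity threshold for drawing links, and a simple weighted label propagation. The sample names and taxa are made-up placeholders, not KMAP data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of taxa."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(samples, threshold):
    """Link two samples when their similarity reaches the threshold (assumed rule)."""
    graph = {name: {} for name in samples}
    for u, v in combinations(sorted(samples), 2):
        weight = jaccard(samples[u], samples[v])
        if weight >= threshold:
            graph[u][v] = weight
            graph[v][u] = weight
    return graph


def label_propagation(graph, max_rounds=100):
    """Weighted label propagation with a fixed node order and deterministic ties."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated samples keep their own label
            votes = {}
            for neighbour, weight in graph[node].items():
                label = labels[neighbour]
                votes[label] = votes.get(label, 0.0) + weight
            best = max(votes.values())
            candidates = sorted(l for l, score in votes.items() if score == best)
            # Keep the current label if it is among the best, else take the smallest.
            new = labels[node] if labels[node] in candidates else candidates[0]
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels


# Toy input: placeholder samples and taxa, invented for this example.
toy_samples = {
    "s1": {"taxon_a", "taxon_b"},
    "s2": {"taxon_a", "taxon_b", "taxon_c"},
    "s3": {"taxon_a", "taxon_b"},
    "s4": {"taxon_c", "taxon_d"},
    "s5": {"taxon_c", "taxon_d", "taxon_e"},
}
toy_labels = label_propagation(build_network(toy_samples, threshold=0.5))
print(toy_labels)
```

Samples that end up with the same label form one community. In practice the threshold and the algorithm both shape the result, which is one reason different methods yield different groupings.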



To reconcile these differences, consensus clustering was used, producing a single set of clusters that agrees well with each of the individual methods and highlights the most relevant relationships. Five distinct virome domains were identified, organized primarily around the presence of three virus families: Phycodnaviridae (phylum Nucleocytoviricota), Microviridae (phylum Phixviricota), and Kyanoviridae (phylum Uroviricota).
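One common way to build a consensus from several clusterings is a co-association (evidence accumulation) scheme. The abstract does not say which consensus method was used, so the sketch below is an assumption: it counts how often each pair of samples shares a cluster across the input partitions and joins pairs that do so in a majority of them. The three input partitions are invented placeholders, and the label -1 is treated as noise, following the HDBSCAN convention.

```python
from itertools import combinations


def co_association(partitions, items, noise=-1):
    """Fraction of partitions in which each pair of items shares a (non-noise) cluster."""
    n = len(partitions)
    return {
        (u, v): sum(p[u] == p[v] and p[u] != noise for p in partitions) / n
        for u, v in combinations(items, 2)
    }


def consensus_clusters(partitions, threshold=0.5):
    """Join items that co-cluster in more than `threshold` of the partitions."""
    items = sorted(partitions[0])
    parent = {item: item for item in items}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), fraction in co_association(partitions, items).items():
        if fraction > threshold:
            parent[find(u)] = find(v)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


# Toy partitions from three hypothetical methods; labels need not match across methods.
toy_partitions = [
    {"s1": 0, "s2": 0, "s3": 0, "s4": 1, "s5": 1},
    {"s1": 0, "s2": 0, "s3": 1, "s4": 1, "s5": 1},
    {"s1": "a", "s2": "a", "s3": "a", "s4": "b", "s5": "b"},
]
print(consensus_clusters(toy_partitions))
```

Because only co-membership is compared, the cluster labels of the individual methods do not have to be aligned beforehand. Merging through connected components is transitive, so a stricter threshold may be preferable when the input partitions disagree strongly.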



Contact details:

Juan Fernández Gracia


