Project acronym: EDEN
Project full title: Ecological Diversity and Evolutionary Networks
Type of contract:
SPECIFIC TARGETED RESEARCH OR INNOVATION PROJECT (STREP)
Contract number: 043251
Start date: January 1st, 2007
End date: December 31st, 2009
Project Coordinator: Emilio Hernández-García, IFISC, Instituto de Física Interdisciplinar y Sistemas Complejos (CSIC-UIB), Palma de Mallorca, Spain.
The last Administrative Project Officer is: Christine Wichern
Previous Scientific Project Officers: Alejandro Martín-Hobdey, Iliana Nikolova, Pilar López
Motivation and background
Although it is difficult to define with precision, the concept of a complex system is always associated with the presence of a large number of interacting units, such that emergent properties arise that are fundamentally different from the properties of a single element in isolation. The global behaviour depends more on the set of interactions than on the nature of the interacting units themselves. It is therefore not surprising that the study of complex networks, which represent the interactions and relations among components, has established itself as one of the central paradigms in the science of complex systems.
Biological systems, with interacting entities organized in fascinating sequences of hierarchically nested structures (ecosystems, communities, food webs, populations, genetic networks, ...), have always been cited as the archetype of complexity in the sense defined above. Although specialists have used the network picture for many years to describe some biological systems, only now is it becoming apparent that there is a need to progress from a static depiction of these networks to the formal exploration of the dynamics that has shaped them and of their subsequent evolution. The paradigm of a large number of interacting units linked to each other in dynamically evolving networks is required to comprehend many biological systems as a whole (Proulx et al., 2005). In particular, there have been detailed analyses of protein interaction networks (Jeong et al., 2001), genetic regulation networks (Davidson et al., 2002), metabolic networks (Wagner and Fell, 2001), etc., and topological characteristics have been related to functionality (Wagner and Fell, 2001; Jeong et al., 2000). At a more macroscopic level of description, that of populations, interaction topology has been shown to be nontrivial and relevant in cases such as communication among social insects (Fewell, 2003) and mammals (Lusseau and Newman, 2004), and food webs (Williams et al., 2002; Dunne et al., 2002). Many aspects of these studies have been made possible by the intense development that the science of networks has experienced in recent years, with contributions from areas such as Statistical Physics and Applied Mathematics (Albert and Barabási, 2002; Dorogovtsev and Mendes, 2002; Newman, 2003). A real transfer of concepts and tools has occurred here between the physical and the biological sciences.
Amongst the processes through which organisms interact, the most dynamic and fundamental is that of genetic interactions, which form the basis for evolutionary processes. The advent and rapid development of technologies to efficiently sequence genomes in the "genomic" era has greatly expanded the available empirical basis on the genetic structure of organisms. In fact, the accretion of sequence data far exceeds the capacity of researchers to assimilate these data and derive insights into gene flow and the associated evolutionary processes in populations. Efforts to assimilate the large throughput of genomic data being produced at present have largely involved the development of databases, through bioinformatics, but little progress has been made in developing new modelling tools to extract relevant information on gene flow and associated evolutionary processes in populations from these data. The complexity resulting from the fact that (1) this problem involves the interaction between multiple agents exchanging genes in the populations and (2) these exchanges lead to genetic innovation and, therefore, to evolutionary processes, renders the examination of gene flow in populations from molecular data a most appropriate field for rapid progress through the use of Network theory. Surprisingly, however, such opportunities have not yet been explored, and the bulk of analyses of gene flow in populations are based on tenets of classical theories whose premises, such as random mixing or equilibrium (Wright, 1939; Hey and Machado, 2003), are often violated by the data.
The present STREP proposal aims to address this gap and research opportunity by considering in depth, for the first time, the representation of ecological and evolutionary relationships among biological entities of different kinds (organisms, populations, taxa) in terms of networks, leading to the development of methods to identify such structures from genetic data and to explore evolutionary processes and the underlying gene flow among them. The approach is to apply the most advanced methods available for examining complex systems to gene flow and population genetics, thereby expanding the range and power of the limited toolbox available at present for the analysis of these processes.
Genetic information (be it in the form of nucleotide or amino acid sequences, restriction site or satellite data, etc.) has become a major tool to analyze the structure of populations, their interchanges, and their evolution. At the interspecific level, the evolutionary relationship between species is usually represented in phylogenies, i.e. evolutionary trees. In fact, the whole diversity of life on Earth is usually depicted as the Tree of Life. A tree is a particular type of network in which there are no cycles. This particular structure is dictated by the assumption that each species descends from a unique ancestor, although a common ancestor may have evolved into several newer species. This topology, although relatively simple, is already rich enough to allow the consideration of community structure, split imbalance, scaling issues, and other properties. Departures from this simple paradigm, however, arise in a variety of situations. The strongest one appears when building intraspecific genealogies: in species that reproduce sexually, each individual has two immediate ancestors, and recombination mixes gene lines of different individuals. At the interspecific level, lateral gene transfer is known to occur among bacteria, and it is very likely that it played a major role in connecting lineages throughout the evolution and history of life on our planet. Replacing the concept of the "Tree of Life" by that of the "Ring of Life" has been proposed on the basis of evidence of genome fusion (Rivera and Lake, 2004). Hybridization between lineages gives rise to cycles in gene genealogies. Clearly, the emergence of a new species by allopolyploidy (combination of the genomes of two species in a single, consequently new, one) cannot be represented in the usual phylogenetic trees, and homoplasy (the development of similar genes by independent evolution of separated lineages) can be better represented in phylogeny topologies richer than simple trees.
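The distinction drawn above between a tree (a network without cycles) and a richer topology can be checked mechanically. The following is a minimal sketch, not a method of the project: the toy phylogeny, the node numbering, and the function name has_cycle are hypothetical and chosen only for illustration. It uses a standard union-find structure to detect whether an undirected graph contains a cycle, and shows that adding a single hybrid species descending from two existing lineages is enough to break the tree property.

```python
def has_cycle(n_nodes, edges):
    """Return True if the undirected graph on n_nodes nodes contains a cycle.

    Uses union-find: an edge joining two nodes that are already in the
    same connected component necessarily closes a cycle.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent pointers to the component representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True
        parent[root_a] = root_b
    return False


# Hypothetical toy phylogeny: node 0 is the root, nodes 1 and 2 are
# internal ancestors, nodes 3 to 6 are extant species.
tree_edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]

# Hypothetical hybridization event: a new species (node 7) descends
# from both species 4 and species 5.
reticulate_edges = tree_edges + [(4, 7), (5, 7)]

tree_is_acyclic = not has_cycle(7, tree_edges)
reticulate_has_cycle = has_cycle(8, reticulate_edges)
```

In this sketch the plain phylogeny passes the acyclicity check, while the version with the hybrid species does not, which is the structural signature of the departures from the tree paradigm listed above.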
"Reticulate evolution" is the term coined for the evolutionary events that can not be adequately represented within simple trees. Thus, complex networks arise in the context of intraspecific and interspecific gene flow and evolution.
After this background, we are in a position to enumerate the project objectives:
- Identification and analysis of genetic diversity networks
This objective will be followed at two levels of description, intraspecific and interspecific:
- Construction and analysis of networks of population genetic diversity.
A large sample of genetic data (mainly DNA microsatellite repetitions) from endangered marine plants of great ecological value in coastal environments (Posidonia oceanica, Cymodocea nodosa, Zostera noltii and Zostera marina) will be collected and used for this end. The clonal character of the chosen species, by allowing the production of genetically identical organisms which may in addition reproduce within themselves or even experience mutations, generates types of genetic diversity qualitatively beyond the spectrum encountered in non-clonal organisms. Innovative network methods will be developed to address questions on population structure, gene flow, evolution, biogeography, and conservation of threatened ecosystems.
- Construction and analysis of phylogenies with rich structure beyond trees.
Starting from DNA sequence data of major prokaryote and eukaryote groups, the tools of reticulate evolution (Posada and Crandall, 2001; Legendre and Makarenkov, 2002) and novel approaches based on network theory will be used to build and analyze phylogenies with rich structure beyond trees. The aim is to gain insight into the evolution of life, from specific lineages to the entire spectrum of life forms, by evolving from the present concept of the "Tree of Life" to a more flexible concept of "Network of Life".
- Computer modelling of ecological and evolutionary networks
In addition to the in-depth analysis of experimental data, computer simulation has been a major tool to develop understanding of complex systems. Again, the computer modelling will be pursued at the two previous scales:
- Modelling at the population scale.
Building on previous experience in the quantitative modelling of growth processes of marine plants, including genomic characteristics in the modelled organisms will provide understanding and predictive power from spatiotemporal simulations of space occupation, genetic diversity and fragmentation, gene flow, and related issues relevant for conservation policies. Including phenomena such as sexual reproduction or recombination clearly adds cycles to any graph-theoretic representation of the population dynamics.
- Modelling at the evolutionary scale.
At a higher level, most of the current methods to infer phylogenies from genetic data are based on evolutionary models. They connect the quantities relevant to estimating distances in evolutionary trees or networks, such as the time since the divergence of two genes, with observables such as base-pair differences between the corresponding sequences. Stochastic models clarify the relation between statistical properties of trees, such as split imbalance or branch size distribution, and macroevolutionary processes (Aldous, 2001). Additionally, evolution models involve networks when they include interactions between genotypes that influence their fitness or reproduction ability. Models have been developed (Christensen et al., 2002) in which the interaction network coevolves with the genotype. Further development of these kinds of models will be necessary to assess the biological significance and calibrate parameters when building non-standard phylogenies beyond the tree structure.
- General concepts and tools for Complexity Science
Beyond the specific systems to which EDEN is devoted, ideas with clear applications in many areas of Complexity research will be developed. We consider this advance in conceptual frameworks, and its implementation in software tools, as an objective in itself. To complement the traditional tree-building and more recent network-building methodologies, novel methods of network construction will also be introduced and evaluated. In particular, new metrics describing distances between individual organisms, as opposed to distances between species or haplotypes, will be developed. The use of correlation measures and the identification of nested families of networks defined by the introduction of a variable correlation threshold will be formalized, analyzed, and compared with other methodologies. Specifically, the hierarchical relationships unveiled by modern community detection methods, and by exploiting the variable threshold of the correlation methods, will be compared with phylogenies constructed by classical tools. Characterization of weighted networks is another general subject in which EDEN will advance.
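The idea of a nested family of networks defined by a variable threshold can be illustrated with a small sketch. It is only an assumption-laden example and not the project's software: the similarity matrix, the threshold values, and the function names threshold_network and count_components are hypothetical. An edge is kept whenever the similarity between two individuals is at least the threshold, so raising the threshold can only remove edges, and the resulting networks are nested within one another while the population fragments into more components.

```python
def threshold_network(sim, threshold):
    """Return the set of edges (i, j), i < j, whose similarity is >= threshold."""
    n = len(sim)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if sim[i][j] >= threshold
    }


def count_components(n_nodes, edges):
    """Count connected components of an undirected graph with union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(i) for i in range(n_nodes)})


# Hypothetical symmetric similarity matrix between four sampled individuals
# (for example, a correlation or shared-allele score; values are invented).
sim = [
    [1.0, 0.9, 0.8, 0.2],
    [0.9, 1.0, 0.7, 0.3],
    [0.8, 0.7, 1.0, 0.4],
    [0.2, 0.3, 0.4, 1.0],
]

thresholds = [0.25, 0.5, 0.85]  # increasing thresholds (assumed values)
networks = [threshold_network(sim, t) for t in thresholds]
components = [count_components(len(sim), edges) for edges in networks]

# Each network is contained in the one built with the previous, lower threshold.
is_nested = all(networks[k + 1] <= networks[k] for k in range(len(networks) - 1))
```

Scanning the threshold in this way yields a hierarchy of groupings (one component, then two, then three in this toy case), which is the kind of hierarchical structure that could be compared with a classically built phylogeny.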
- WP1: Data collection
- To produce and/or compile the intraspecific and interspecific genetic databases for the organisms used in the project. These will be: 1) microsatellite data for populations of the seagrasses Posidonia oceanica, Cymodocea nodosa, Zostera noltii, and the macroalgae Fucus spp. and Caulerpa spp.; 2) trait-associated gene (TAG) marker data derived from EST (Expressed Sequence Tag) databases and consisting of EST-microsatellites and/or single nucleotide polymorphisms (SNPs) for Fucus spp. and Zostera noltii; 3) sequence data compiled from public databases (e.g., GenBank) for different domains of the Tree of Life.
- To organize these data in a form accessible and usable by the rest of WPs.
- To compile and post in usable form the different networks built in the project.
- WP2: Network theory Toolbox
- Provide tools for network analysis to be used by the rest of the Project.
- Develop novel conceptual frameworks: weighted networks, networks embedded in physical space, thresholded correlation networks and their community structure and percolation properties.
- Analyze the potential of classical and modern phylogeny construction methods to understand community structure in more general complex networks.
- WP3: Dynamical and spatiotemporal modelling
- Understanding the genetic structure of clonal plant meadows in terms of the biological and spatial processes occurring in them.
- To obtain evolution models allowing the interpretation of reticulate phylogenies.
- WP4: Population ecology networks
- Obtaining relevant information on the ecological processes influencing the extent of clonality, and on the evolutionary processes (mating system, migration) shaping the genetic structure of meadows of clonal plants with the tools developed in the project.
- Extending the above to the whole biogeographical range of the studied plants, and to the general analysis of metapopulations dynamics.
- WP5: Phylogenies and the Tree of Life
- Gain insight into the way evolution shapes phylogenetic trees.
- Advance in the consideration of the whole Tree of Life and its different parts as complex objects arising from interactions that can give them a topology richer than that of a simple tree.
- WP6: Management and assessment
- To provide the Scientific Coordination that gives coherence to the project as a whole and excellence to its results.
- To provide the Administrative Coordination of the financial and formal aspects of project development.
- To ensure efficient communication among partners, with the Commission, and with the public.