To structure our response to challenges and to help define specific questions and answers, the attendees were divided into groups over the last two days. Each group was given a topic and tasked with writing up a response. A total of eight topics were assigned, namely:
What is the project we are describing?
What are the current limitations of microbial community analysis?
Why will this project be novel, and why will it succeed?
Who are the main beneficiaries of the project and its deliverables?
How will the project enable predictive modeling of microbial communities?
How much sampling and sequencing will be needed to test specific hypotheses?
How much will the project cost?
What are the potential risks of the project?
To start, the project was given a name; the group decided to pay homage to the Human Microbiome Project and to the grandiose nature of the proposed research. In this way, the Earth Microbiome Project was born.
The Earth Microbiome Project presents a revolution in how we tackle the challenge of understanding the interactions among microbes and their environments, and defines both questions and a potential suite of tools to provide answers. We wish to sequence microbes and microbial communities from a broad range of biomes (a biome being an environment with unique environmental parameters, e.g. a hydrothermal vent on an abyssal plain) to achieve three main goals. The first is to define microbial community structure and to explore the factors that affect community structure at different scales. The second is to explore the protein universe and attempt to produce a complete inventory of protein family diversity. The third is to curate this information to create a global database of samples, genes and proteins that can be used to answer fundamental questions about the ecology of life on and off the earth.
As envisioned, the Earth Microbiome Project would be a massively multidisciplinary effort to analyze microbial communities across the globe. The general premise is to examine microbial communities from their own perspective, which is formed by their immediate environments. From the perspective of a microbe, the world is a very different place: it is sensed only through the availability of nutrients and favorable environmental conditions. This is therefore the way in which we should structure our attempt to explore their world. Hence we propose to characterize the Earth by the environmental parameter space relevant to microbes, and then to explore these different biomes using samples currently available from researchers across the globe. We will analyze these communities using metagenomics, metatranscriptomics and amplicon sequencing to produce a global Gene Atlas describing protein space, environmental metabolic models for each biome, approximately 500,000 microbial genomes, and a data-analysis portal for visualization of all information.
To achieve these general aims, the EMP will focus on ten core questions which can be grouped into different sections:
Section 1 - Community Structure:
Are microbial communities structured primarily by environmental conditions or trophic/metabolic interactions?
If microbes are structured by environmental conditions, how do we define the Environmental Parameter Space (EPS) to characterize microbiomes?
What are the primary mechanisms of cross-kingdom interaction, metabolic or genetic?
Section 2 - Defining Physiology and Metabolic Capability:
Is ecosystem function defined by community taxonomy or by the trophic/metabolic dynamics in that ecosystem, i.e. who is doing what, how fast and by what mechanisms?
What is the role of rare microorganisms in an ecosystem, e.g. functional plasticity or specific biochemical function?
Section 3 - Practical Considerations:
How do we sample microbiomes to best explore global structure, e.g. temporal studies, experimentally controlled perturbations, biogeographic studies, and at what density?
How do we best use metagenomic data to re-assemble genomes, and what can we learn from this study to improve the yield of novel microbial genomes from metagenomic studies?
Section 4 - Models and Visualization:
What aspects/metrics of microbial community structure is it necessary to measure to enable parameterization of predictive ecological models?
At what taxonomic level does the pan-genome operate, and what controls this?
How do we most accurately visualize global microbial space, and what can this tell us about extraterrestrial microbial communities and fundamental ecology?
To date, virtually all studies have leveraged modest sequencing efforts against small numbers of environments, yet they have still yielded impressive returns of novel proteins and taxa. The Environmental Parameter Space (EPS) will cover many diverse environments, including marine (water, sediment, and host associated), freshwater (rivers, lakes, etc.), terrestrial (surface, subsurface, rock, etc.), air (particle associated, rain water associated, etc.), extreme environments (extremes of temperature, salinity, pH, UV exposure, desiccation, pressure, etc.) and manmade locations (human interactive environments, cities, agricultural practice areas, transportation, etc.). However, environmental samples will not be the sole focus. We will also explore lab-based mesocosm and microcosm studies (e.g. Winogradsky columns) in which environmental manipulation will enable us to identify microbial community dynamics. We propose a systematic re-evaluation and characterization of microbial studies relevant to the global-scale analysis of protein space and microbial diversity, with the following tasks:
Defining Environmental Parameter Space (EPS). Microbes live in EPS and tend to show similar functional and taxonomic properties when they are isolated from similar environments. We will define microbial communities by their position in EPS.
Defining the Ideal Sampling Strategy. EPS-defined biomes will be used to describe an “ideal” sampling strategy, i.e. one that provides enough breadth and depth to best determine the full extent of the protein universe, to explore trophic level interactions in microbial communities, and to facilitate generalization of results across biomes and across spatial and temporal scales (to allow for predictive modeling). We recognize that even 200,000 samples will merely scratch the surface of microbial diversity: the goal of this project is not to characterize all microbial life on the planet, but to provide a framework for assisting in interpreting and integrating vast numbers of other studies as sequencing increasingly becomes commoditized.
Defining the Realistic Sampling Strategy. We intend to catalogue the Global Environmental Sample Inventory (GESI) derived from thousands of researchers. We will only use samples that adhere to the Minimum Information about a Metagenome Sequence (MIMS) and Minimum Information about an Environmental Sequence (MIENS) standards, and will determine the extent to which the currently available GESI fulfills the “ideal” sampling strategy. This topic was covered in a closed meeting of the EMP advisory committee in October 2010.
EMP Sequencing Strategy. The GESI will be sequenced strategically based on priority samples that best fit the “ideal” sampling strategy. This task will involve ~200,000 sampling locations over 3 years; from each we will sequence ~30 million reads (~6 billion base pairs) divided unequally between total community DNA (metagenomics), mRNA (metatranscriptomics) and amplified marker genes (e.g. 16S rRNA).
EMP Deep Sequencing Strategy of Selected Environments. We will pick 100 samples against which to leverage 60 billion reads (12 trillion base pairs) per sample, spread across 4 replicates, using metagenomics, metatranscriptomics and amplified marker genes.
Application of a Standard Data Analysis Pipeline. Analysis of the data, including metrics of diversity and richness, statistical similarity, and non-parametric analyses, is essential to provide a resource for the community. We will compute alpha (within-sample) and beta (between-sample, over time or space) diversity indices, and will perform protein annotation and conserved domain prediction, 3-D protein structure prediction, genome assembly from metagenomic data, functional metabolic modeling (modelSEED/Relative Metabolic Flux analysis), and non-parametric statistical analysis (canonical correlations, network mapping, ANOSIM, dendrogram clustering from dissimilarity matrices, etc.).
Genome Analysis. We estimate that from the ~2.4 quadrillion base pairs of sequencing data we could assemble over 500,000 microbial genomes, helping to answer many questions regarding microbial evolution, the pan-genome concept, and horizontal gene transfer.
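The sequencing volumes quoted in the tasks above can be cross-checked with a short calculation. The Python sketch below is only a back-of-the-envelope illustration: it treats the approximate figures in the text as exact, and the variable and function names are hypothetical. The read length it prints is not stated in the text; it is simply what the quoted read and base-pair counts imply.

```python
# Back-of-the-envelope check of the sequencing totals quoted in the text.
# All inputs are the approximate ("~") figures from the text, taken as exact.

# Broad survey: ~200,000 samples, ~30 million reads (~6 billion bp) each.
SURVEY_SAMPLES = 200_000
SURVEY_READS_PER_SAMPLE = 30_000_000
SURVEY_BP_PER_SAMPLE = 6_000_000_000

# Deep sequencing: 100 samples, 60 billion reads (12 trillion bp) each.
DEEP_SAMPLES = 100
DEEP_READS_PER_SAMPLE = 60_000_000_000
DEEP_BP_PER_SAMPLE = 12_000_000_000_000


def implied_read_length(bp_per_sample: int, reads_per_sample: int) -> int:
    """Average read length (bp) implied by the quoted figures (derived, not stated)."""
    return bp_per_sample // reads_per_sample


def total_bp(samples: int, bp_per_sample: int) -> int:
    """Total base pairs across all samples of one sequencing tier."""
    return samples * bp_per_sample


survey_total = total_bp(SURVEY_SAMPLES, SURVEY_BP_PER_SAMPLE)
deep_total = total_bp(DEEP_SAMPLES, DEEP_BP_PER_SAMPLE)
grand_total = survey_total + deep_total

survey_read_len = implied_read_length(SURVEY_BP_PER_SAMPLE, SURVEY_READS_PER_SAMPLE)
deep_read_len = implied_read_length(DEEP_BP_PER_SAMPLE, DEEP_READS_PER_SAMPLE)

print(f"Implied read length (survey): {survey_read_len} bp")
print(f"Implied read length (deep):   {deep_read_len} bp")
print(f"Survey total: {survey_total:.2e} bp")
print(f"Deep total:   {deep_total:.2e} bp")
print(f"Grand total:  {grand_total:.2e} bp")
```

Under these assumptions the two tiers each contribute about 1.2 quadrillion base pairs, which is consistent with the ~2.4 quadrillion base pairs cited for genome assembly.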
The Earth Microbiome Project will have many deliverables. We believe that, like previous mega-sequencing projects, the data provided here will produce spin-out research that will make significant contributions to our understanding of microbial ecology. We cannot know or implement the requirements of every researcher; hence, we aim to provide the data, and analysis of the data, in a format that is widely accessible. The following key deliverables will be of considerable benefit to a wide range of communities:
Gene Atlas (GA) – a centralized repository and database for all sequencing and metadata information acquired during this study.
Earth Microbiome Assembled Genomes (EM-AG) – all metagenome-derived assembled microbial genomes will be deposited in public repositories.
Earth Microbiome VIsualisation Portal (EM-VIP) – we want to view the Earth from the perspective of microbes, describing environmental parameter space and genomic functional space.
Earth Microbiome Metabolic Reconstruction (EMMR) – based on metagenomic metabolome description and prediction (e.g. modelSEED and Relative Metabolic Flux), we will describe changes in metabolite profiles between all samples. We will deliver descriptions of metabolite production in specific biomes, providing another metric against which to refine biome descriptions.
The Earth Microbiome Project (EMP) must be, of necessity, a cross-discipline effort, involving microbial ecologists, genomicists, microbiologists, physicists, computer scientists, mathematicians, and ecosystem modelers, to provide the most comprehensive global assessment of microbial life ever seen. Additionally, similar to the Human Genome Project, which has revolutionized biomedicine, the proposed Earth Microbiome Project will revolutionize the way we can assess and model the health of our changing planet. This has implications for global change science, and for understanding the intimate connections between the machinery of our planet (the microbes) and their ecosystems.