Genome sequence of Ensifer medicae strain WSM1369; an effective microsymbiont of the annual legume Medicago sphaerocarpos

Ensifer medicae WSM1369 is an aerobic, motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as a legume microsymbiont of Medicago. WSM1369 was isolated in 1993 from a nodule recovered from the roots of Medicago sphaerocarpos growing at San Pietro di Rudas, near Aggius in Sardinia (Italy). WSM1369 is an effective microsymbiont of the annual forage legumes M. polymorpha and M. sphaerocarpos. Here we describe the features of E. medicae WSM1369, together with genome sequence information and its annotation. The 6,402,557 bp standard draft genome is arranged into 307 scaffolds of 307 contigs containing 6,656 protein-coding genes and 79 RNA-only encoding genes. This rhizobial genome is one of 100 sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.


Introduction
One of the key nutritional constraints to plant growth and development is the availability of nitrogen (N) in nutrient-deprived soils [1]. Although the atmosphere consists of approximately 80% N, the overwhelming proportion of this is present in the form of dinitrogen (N2), which is biologically inaccessible to most plants and other higher organisms. Before the development of the Haber-Bosch process, the primary mechanism for converting atmospheric N2 into a bioaccessible form was biological nitrogen fixation (BNF) [2]. In BNF, N2 is made available by specialized microbes that possess the necessary molecular machinery to reduce N2 to NH3. Some plants, most of which are legumes, have harnessed BNF by evolving symbiotic relationships with specific N2-fixing microbes (termed rhizobia), whereby the host plant houses the bacteria in root nodules, supplies the microsymbiont with carbon and in return receives essential reduced N-containing products [3]. When BNF is exploited in agriculture, some of the N fixed into plant tissues is ultimately released into the soil following harvest or senescence, where it can then be assimilated by subsequent crops. Compared to industrially synthesized N-based fertilizers, BNF is a low energy, low cost and low greenhouse-gas producing alternative and hence its application is crucial to increasing the environmental and economic sustainability of farming systems [4].
Forage and fodder legumes play vital roles in sustainable farming practice, with approximately 110 million ha under production worldwide [5], a significant proportion of which is made up of members of the genus Medicago. Ensifer meliloti and E. medicae are known to nodulate and fix N2 with Medicago spp. [6], although they differ in host specificity. While E. meliloti strains do not nodulate M. murex, nodulate but do not fix N2 with M. polymorpha and nodulate but fix very poorly with M. arabica [7,8], they are able to nodulate and fix N2 with Medicago species originating from alkaline soils, including the perennial M. sativa and the annuals M. littoralis and M. tornata [9,10]. In contrast, E. medicae strains can nodulate and fix N2 with annuals well adapted to acidic soils, such as M. murex, M. arabica and M. polymorpha [7,8].
The E. medicae strain WSM1369 was isolated from a nodule collected from M. sphaerocarpos growing at San Pietro di Rudas, near Aggius in Sardinia (Italy). This strain nodulates and fixes N2 effectively with M. polymorpha and M. sphaerocarpos [8]. Like M. murex and M. polymorpha, M. sphaerocarpos is an annual species which is tolerant of low pH soils [11], with studies suggesting that it only establishes N2-fixing associations with E. medicae strains [8,9]. However, owing to a paucity of symbiotic information, it is not yet clear whether M. sphaerocarpos fixes N2 with a wide range of E. medicae strains or if this ability is restricted to a smaller set of E. medicae accessions. Therefore, genome sequences of E. medicae strains effective with M. sphaerocarpos will provide a valuable genetic resource to further investigate the symbiotaxonomy of Medicago-nodulating rhizobia and will further enhance the existing available genome data for Ensifer microsymbionts [12][13][14][15]. Here we present a summary classification and a set of general features for this microsymbiont together with a description of its genome sequence and annotation.

Classification and features
E. medicae WSM1369 is a motile, non-sporulating, non-encapsulated, Gram-negative rod in the order Rhizobiales of the class Alphaproteobacteria. The rod-shaped form varies in size with dimensions of approximately 0.25-0.5 μm in width and 1.0-1.5 μm in length (Figure 1 Left and Figure 1 Center). It is fast growing, forming colonies within 3-4 days when grown on TY agar [16] or half strength Lupin Agar (½LA) [17] at 28°C. Colonies on ½LA are opaque, slightly domed and moderately mucoid with smooth margins (Figure 1 Right). Minimum Information about the Genome Sequence (MIGS) is provided in Table 1. Figure 2 shows the phylogenetic neighborhood of E. medicae WSM1369 in a 16S rRNA sequence based tree. This strain shares 100% sequence identity (over 1290 bp) with the 16S rRNA of E. medicae A321T (type strain) and E. medicae WSM419 [13] and 99% sequence identity (1362/1366 bp) with the 16S rRNA of E. meliloti Sm1021 [12].
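The identity figures above are simple ratios of matching positions to aligned positions. The sketch below shows that arithmetic in Python. The function names and the gap-handling convention are assumptions made for illustration and do not come from the original analysis, which may have treated gaps differently.

```python
from typing import Tuple


def percent_identity(matches: int, aligned_length: int) -> float:
    """Return percent identity for `matches` identical positions out of `aligned_length`."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= matches <= aligned_length:
        raise ValueError("matches must lie between 0 and aligned_length")
    return 100.0 * matches / aligned_length


def count_matches(seq_a: str, seq_b: str, gap: str = "-") -> Tuple[int, int]:
    """Count identical columns in two pre-aligned, equal-length sequences.

    Columns where either sequence has a gap are skipped. This is one common
    convention, assumed here; other tools count gaps as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        matches += a == b  # bool counts as 0 or 1
    return matches, compared


if __name__ == "__main__":
    # Figures reported in the text for WSM1369 versus E. meliloti Sm1021.
    print(f"{percent_identity(1362, 1366):.2f}%")
```

With the reported 1362 matches over 1366 bp, the ratio is about 99.7%, which the text reports as 99%.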

Symbiotaxonomy
E. medicae strain WSM1369 was isolated in 1993 from a nodule collected from the annual M. sphaerocarpos growing at San Pietro di Rudas, near Aggius, Sardinia in Italy (J. G. Howieson, pers. comm.). The site of collection was undulating grassland, with a soil derived from granite materials that had a depth of 20-40 cm and a pH of 6.0. The soil was a loamy-sand and Lathyrus and Trifolium spp. grew in association with M. sphaerocarpos. WSM1369 forms nodules (Nod+) and fixes N2 (Fix+) with M. polymorpha and M. sphaerocarpos [8].
Table 1 (excerpt). Classification: Phylum Proteobacteria TAS [20]; Class Alphaproteobacteria TAS [21,22]; Order Rhizobiales TAS [21,23]; Family Rhizobiaceae TAS [24,25]; Genus Ensifer TAS [26][27][28]; Species [...]
Table 1 note: [...] not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence. These evidence codes are from the Gene Ontology project [30].
Figure 2. [...] All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5 [31]. The tree was built using the Maximum-Likelihood method with the General Time Reversible model [32]. Bootstrap analysis [33] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Brackets after the strain name contain a DNA database accession number and/or a GOLD ID (beginning with the prefix G) for a sequencing project registered in GOLD [34]. Published genomes are indicated with an asterisk.
Standards in Genomic Sciences

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [34] and a standard draft genome sequence in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
E. medicae WSM1369 was cultured to mid logarithmic phase in 60 ml of TY rich medium on a gyratory shaker at 28°C [35]. DNA was isolated from the cells using a CTAB (Cetyl trimethyl ammonium bromide) bacterial genomic DNA isolation method [36].

Genome sequencing and assembly
The genome of Ensifer medicae WSM1369 was sequenced at the Joint Genome Institute (JGI) using Illumina technology [37]. An Illumina standard shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform which generated 13,712,318 reads totaling 2,057 Mbp.
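The read count and yield above imply a mean read length and a raw sequencing depth. The following Python sketch works through that arithmetic using only figures stated in this report. It assumes that "2,057 Mbp" means 2,057 × 10^6 bases and that depth is taken relative to the 6,402,557 bp draft assembly; the function names are hypothetical, and the result is a raw estimate before any read filtering.

```python
# Figures taken from the text of this report.
N_READS = 13_712_318          # Illumina reads generated
TOTAL_BASES = 2_057_000_000   # 2,057 Mbp, assuming 1 Mbp = 1e6 bases
GENOME_SIZE = 6_402_557       # draft assembly size in bp


def mean_read_length(total_bases: int, n_reads: int) -> float:
    """Average number of bases per read."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return total_bases / n_reads


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Raw sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


if __name__ == "__main__":
    print(f"mean read length: {mean_read_length(TOTAL_BASES, N_READS):.0f} bp")
    print(f"raw coverage: {fold_coverage(TOTAL_BASES, GENOME_SIZE):.0f}x")
```

Under these assumptions the reads average about 150 bp and the raw depth is roughly 321-fold.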

Genome annotation
Genes were identified using Prodigal [41] as part of the DOE-JGI annotation pipeline [42]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. The tRNAScanSE tool [43] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [44]. Other non-coding RNAs, such as the RNA components of the protein secretion complex and RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [45]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [46].

Genome properties
The genome is 6,402,557 nucleotides long with 61.13% GC content (Table 3) and comprises 307 scaffolds (Figure 3) of 307 contigs. Of a total of 6,735 genes, 6,656 were protein encoding and 79 were RNA-only encoding. The majority of genes (74.14%) were assigned a putative function while the remaining genes were annotated as hypothetical. The distribution of genes into COG functional categories is presented in Table 4.
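The summary statistics above can be cross-checked with a few lines of arithmetic. The Python sketch below tallies the gene counts reported here and shows how a GC percentage is computed from a nucleotide string. It assumes the 74.14% figure is a fraction of all 6,735 genes, as the wording suggests; the helper name `gc_content` is hypothetical and is not part of the JGI pipeline.

```python
# Gene counts reported in the text.
PROTEIN_CODING = 6_656
RNA_ONLY = 79
TOTAL_GENES = PROTEIN_CODING + RNA_ONLY

# Assumption: 74.14% refers to the fraction of all genes with a putative function.
FRACTION_WITH_FUNCTION = 0.7414


def gc_content(sequence: str) -> float:
    """Percent of G and C among the A, C, G and T bases of `sequence`.

    Ambiguous symbols such as N are ignored in both numerator and denominator.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    print(f"total genes: {TOTAL_GENES}")
    print(f"protein-coding share: {100.0 * PROTEIN_CODING / TOTAL_GENES:.2f}%")
    print(f"genes with putative function: ~{round(FRACTION_WITH_FUNCTION * TOTAL_GENES)}")
    print(f"GC of toy sequence: {gc_content('GCGCATAT'):.1f}%")
```

The two gene classes sum to the reported total of 6,735, and 74.14% of that total corresponds to roughly 4,993 genes with a putative function.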