Complete genome sequence of the Medicago microsymbiont Ensifer (Sinorhizobium) medicae strain WSM419

Ensifer (Sinorhizobium) medicae is an effective nitrogen-fixing microsymbiont of a diverse range of annual Medicago (medic) species. Strain WSM419 is an aerobic, motile, non-spore-forming, Gram-negative rod isolated from an M. murex root nodule collected in Sardinia, Italy, in 1981. WSM419 was manufactured commercially in Australia as an inoculant for annual medics from 1985 to 1993 because of its nitrogen fixation, saprophytic competence and acid tolerance properties. Here we describe the basic features of this organism, together with the complete genome sequence and annotation. This is the first report of a complete genome sequence for a microsymbiont of the group of annual medic species adapted to acid soils. The genome is 6,817,576 bp long and encodes 6,518 protein-coding genes and 81 RNA-only genes. It comprises a chromosome of 3,781,904 bp and three plasmids of 1,570,951 bp, 1,245,408 bp and 219,313 bp. The smallest plasmid is a feature unique to this medic microsymbiont.


Introduction
Agricultural systems are nearly always nitrogen deficient, a factor that severely limits their productivity. Each year some 50 Tg of nitrogen is harvested globally in food crops [3] and must be replaced. External inputs of nitrogen to agriculture may come from mineral fertilizers, the production of which is heavily dependent on fossil fuels. Alternatively, nitrogen can be obtained through symbiotic nitrogen fixation (SNF) by root nodule bacteria (rhizobia) in nodulated legumes [4]. SNF is therefore a key biological process on the planet. The commonly accepted figure for global SNF in agriculture is 50-70 million metric tons annually, worth in excess of US $10 billion [5]. Rhizobia associated with forage legumes contribute a substantial proportion of this fixed nitrogen across 400 million ha [5]. The nitrogen fixed annually by the Ensifer (Sinorhizobium)-Medicago symbiosis is estimated to be worth $250 million. A particular constraint on the formation of this symbiosis is acidity, due mainly to the acid-sensitive nature of the microsymbionts [6]. In laboratory culture, the medic microsymbionts fail to grow below pH 5.6 and are considered the most acid-sensitive of all the commercial root nodule bacteria [7]. Many agricultural regions have moderately acidic soils (typically pH 4.0 to 6.0), which has prevented the Ensifer-Medicago symbiosis from reaching its full potential [8]. Consequently, an effort was initiated in the 1980s to discover more acid-tolerant medic microsymbionts from world regions with acidic soils on which annual medics had evolved. A particular suite of strains isolated from acidic soils on the Italian island of Sardinia proved to be acid soil tolerant [9], an attribute now known to be related to the presence of a unique set of genes required for acid adaptation [10]. Characterization of these acid-tolerant isolates revealed that they belonged to the species E. medicae and could be symbiotically distinguished from the related species E. meliloti by their unique capacity to fix nitrogen in association with annual, acid-soil-adapted Medicago hosts of worldwide agronomic value [11], as well as with the perennial forage legume M. sativa (alfalfa) [12].

One of the acid-tolerant isolates, E. medicae strain WSM419, was isolated in 1981 from a nodule recovered from the roots of an annual medic (M. murex) growing south of Tempio in Sardinia. WSM419 is of particular interest because it is saprophytically competent in the acidic, infertile soils of southern Australia [9,13], and it is also a highly effective nitrogen-fixing microsymbiont of a broad range of annual medics of Mediterranean origin [11,12]. These attributes contributed to the commercialization of the strain in Australia as an inoculant for acid soil medics between 1985 and 1993 [14,15]. Here we present a summary classification and a set of features (Table 1) for E. medicae strain WSM419, together with the description of a complete genome sequence and annotation.

Classification and features
E. medicae strain WSM419 forms mucoid colonies that may appear donut-shaped (Figure 1, left) on specific media such as YMA [13]. It is a Gram-negative, non-spore-forming rod (Figure 1, center) with peritrichous flagella (Figure 1, right). In minimal medium, E. medicae WSM419 has a mean generation time of 4.1 h when grown at 28°C [33]. Phylogenetic analysis places it in the family Rhizobiaceae of the class Alphaproteobacteria. Figure 2 shows the phylogenetic neighborhood of E. medicae strain WSM419 inferred from a 16S rRNA-based phylogenetic tree. An intragenic fragment of 1,440 bp was chosen because the 16S rRNA gene has not been completely sequenced in many type strains. A comparison of the entire 16S rRNA gene of WSM419 with completely sequenced 16S rRNA genes of other sinorhizobia revealed 4 and 18 bp mismatches to the reported sequences of E. meliloti (Sm1021) and E. fredii (YcS2, 15067 and SjzZ4), respectively.

Figure 2 legend: All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 3.1 [34]. Kimura two-parameter distances were derived from the aligned sequences [35], and a bootstrap analysis [36] was performed with 500 replicates to construct a consensus unrooted tree using the neighbor-joining method [37] for each gene alignment separately. Genera in this tree include Bradyrhizobium (B), Mesorhizobium (M), Rhizobium (R) and Ensifer (Sinorhizobium) (S). Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [31] are in bold red print. Published genomes are designated with an asterisk.
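The Kimura two-parameter (K2P) correction used for the tree treats transitions (A↔G, C↔T) and transversions (purine↔pyrimidine) separately. With P and Q the proportions of compared sites differing by a transition or a transversion, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The Python sketch below computes this distance for one pair of aligned sequences. It illustrates the formula only; it is not the MEGA 3.1 code that was actually used, and the function name `kimura_2p_distance` is hypothetical. Gaps and ambiguity codes are skipped, which makes no difference for the gap-free 16S alignment described in the legend.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def kimura_2p_distance(seq1: str, seq2: str) -> float:
    """K2P distance between two aligned nucleotide sequences.

    d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the
    proportions of compared sites that differ by a transition or a
    transversion, respectively.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# Toy example: 10 aligned sites differing by one C<->T transition.
print(round(kimura_2p_distance("ACGTACGTAC", "ACGTACGTAT"), 4))
```

For the toy pair, P = 0.1 and Q = 0, so d = -(1/2) ln 0.8 ≈ 0.1116. In a full analysis, a matrix of such pairwise distances would be the input to neighbor-joining.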

Symbiotaxonomy
E. medicae and E. meliloti are traditionally distinguished by the ability of E. medicae to nodulate M. polymorpha effectively (Nod+, Fix+) [38]. Specific symbiotic characteristics that further distinguish E. medicae WSM419 from E. meliloti include its ability to nodulate and fix nitrogen effectively with a wide range of annual Mediterranean medics, including M. polymorpha, M. arabica, M. murex and M. sphaerocarpos. WSM419 is symbiotically competent with these species when they are grown in acidic soils [39]. In contrast, WSM419 is Fix− with annual medic species of alkaline soils such as M. littoralis, M. tornata and M. littoralis/M. truncatula hybrids [11,40]. WSM419 is also Nod+, Fix+ with the perennial forage legume M. sativa [11,12] but is less effective with this species than some E. meliloti isolates. However, WSM419 fixes nitrogen more effectively with M. truncatula than the previously sequenced E. meliloti Sm1021, making it an ideal candidate for inoculation of this model legume [12].

Evidence codes: IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [32]. If the evidence code is IDA, the property was directly observed for a living isolate by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing and annotation
Genome project history
E. medicae WSM419 was selected for sequencing on the basis of its importance as a symbiotic nitrogen-fixing bacterium in agriculture and its tolerance of acidic soils [9,14]. It was sequenced as part of the Community Sequencing Program of the Joint Genome Institute (JGI) in 2005. The genome project is deposited in the Genomes OnLine Database [31] and the complete genome sequence in GenBank. A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
E. medicae strain WSM419 was grown to mid-logarithmic phase in TY medium (a rich medium) [41] on a gyratory shaker at 28°C. DNA was isolated from 60 ml of cells using a CTAB (cetyltrimethylammonium bromide) bacterial genomic DNA isolation method (JGI general information).

Genome sequencing and assembly
The genome was sequenced using a Sanger platform. General details of library construction and sequencing at the JGI are available on the JGI website. Sequence data statistics from the trace archive for this project are presented in Table 3.
All reads were assembled using the phrap assembler. Possible mis-assemblies were corrected, and gaps between contigs were closed by custom primer walks from subclones or PCR products. Processing of sequence traces, base calling, and assessment of data quality and assembly were performed with the PHRED/PHRAP/CONSED package [42-44]. The initial draft assembly was produced from 84,192 high-quality reads and consisted of 30 contigs (each with at least 20 reads). Gaps in the sequence were primarily identified by mate-pair sequences and then closed by primer walking on gap-spanning library clones or on PCR products amplified from genomic DNA. True physical gaps were closed by combinatorial and multiplex PCR. All repeated sequences were addressed using mate-pair sequences and PCR data. Sequence finishing and polishing added 638 reads. The final assembly of the main chromosome and three plasmids from 84,830 reads produced approximately 13-fold coverage across the genome. Assessment of final assembly quality was completed as described previously [45].
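The read counts and coverage above are related by simple bookkeeping: the 84,192 draft reads plus the 638 finishing reads give the 84,830 reads of the final assembly, and average fold coverage is c = N × L / G for N reads of mean length L over a genome of G bp. The mean read length is not stated in this section, so the minimal Python sketch below takes it as a parameter rather than assuming a value. It also shows the idealized Lander-Waterman (Poisson) estimate e^(-c) of the fraction of bases covered by no read. The helper names are hypothetical, and real clone libraries depart from this idealization (for example through cloning bias), so gaps can remain even at high coverage.

```python
import math

DRAFT_READS = 84_192      # high-quality reads in the initial draft assembly
FINISHING_READS = 638     # reads added during finishing and polishing
GENOME_SIZE_BP = 6_817_576


def fold_coverage(n_reads: int, mean_read_length_bp: float, genome_size_bp: int) -> float:
    """Average sequencing depth c = N * L / G."""
    return n_reads * mean_read_length_bp / genome_size_bp


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson (Lander-Waterman) probability that a given base is covered by no read."""
    return math.exp(-coverage)


total_reads = DRAFT_READS + FINISHING_READS
print(f"reads in final assembly: {total_reads:,}")
# At the reported ~13-fold depth, the idealized model predicts very few uncovered bases.
print(f"expected uncovered fraction at 13x: {expected_uncovered_fraction(13):.2e}")
```

Multiplying the expected uncovered fraction by the genome size gives the idealized number of unsequenced bases, which is one way to judge whether a given depth is adequate before finishing begins.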

Genome annotation
Automated gene prediction was performed by assessing the congruence of gene calls from three independent programs, the Critica [46], Generation and Glimmer [47] modeling packages, and by comparing the translations with the GenBank nonredundant database using the basic local alignment search tool for proteins (BLASTP). Product description annotations were obtained from searches against the KEGG, InterPro, TIGRFams, PROSITE and Clusters of Orthologous Groups of proteins (COGs) databases. The tRNAscan-SE tool [48] was used to find tRNA genes, whereas ribosomal RNAs were identified using BLASTN against the 16S and 23S ribosomal RNA databases. Initial comparative analyses of bacterial genomes and gene neighborhoods were completed using the JGI Integrated Microbial Genomes web-based interface. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [49].
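This section does not specify how congruence among the Critica, Generation and Glimmer calls was scored. As a minimal sketch, assuming a simple majority rule, the Python code below keeps a gene call only if at least two of the three programs predicted identical coordinates and strand. The `GeneCall` type, the `consensus_calls` function, the two-of-three threshold and the toy coordinates are all hypothetical; a fuller treatment would also reconcile calls that share a stop codon but differ in start site.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class GeneCall(NamedTuple):
    """A predicted gene: replicon name, 1-based inclusive coordinates and strand."""
    replicon: str
    start: int
    end: int
    strand: str  # "+" or "-"


def consensus_calls(
    predictions: Dict[str, Iterable[GeneCall]], min_support: int = 2
) -> List[GeneCall]:
    """Return calls made with identical coordinates by at least min_support programs."""
    support: Counter = Counter()
    for calls in predictions.values():
        # set() ensures each program votes at most once for a given call
        for call in set(calls):
            support[call] += 1
    return sorted(call for call, votes in support.items() if votes >= min_support)


# Hypothetical toy input: one gene found by all three callers,
# one found only by the first and one found only by the third.
shared = GeneCall("chromosome", 100, 1_200, "+")
only_a = GeneCall("chromosome", 2_000, 2_900, "-")
only_c = GeneCall("chromosome", 5_000, 5_600, "+")
toy_predictions = {
    "critica": [shared, only_a],
    "generation": [shared],
    "glimmer": [shared, only_c],
}
print(consensus_calls(toy_predictions))
```

Raising `min_support` to 3 would demand unanimous agreement, trading sensitivity for specificity; calls rejected by the vote could still be rescued by the BLASTP comparison described above.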

Genome properties
The genome is 6,817,576 bp long with 61.15% GC content and comprises four replicons (Table 4): one circular chromosome of 3,781,904 bp (Figure 3) and three plasmids of 1,570,951 bp, 1,245,408 bp and 219,313 bp (Figure 4). Of the 6,599 genes predicted, 6,518 were protein-coding genes and 81 were RNA-only genes. In addition, 305 pseudogenes were identified. The majority of the genes (70.4%) were assigned a putative function, while the remainder were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 5.
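The replicon sizes and gene counts reported here are internally consistent, which a few lines of Python can confirm. The sketch below sums the four replicon sizes and the two gene classes. The labels `plasmid_1` to `plasmid_3` are placeholders, not the GenBank replicon names. It also includes a hypothetical `gc_content` helper of the kind that would reproduce the GC figure if applied to the full sequence retrieved from GenBank; no sequence is bundled here.

```python
# Replicon sizes (bp) as reported in the text; labels are placeholders.
REPLICON_SIZES_BP = {
    "chromosome": 3_781_904,
    "plasmid_1": 1_570_951,
    "plasmid_2": 1_245_408,
    "plasmid_3": 219_313,
}
PROTEIN_CODING_GENES = 6_518
RNA_GENES = 81


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases (case-insensitive)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (s.count("G") + s.count("C")) / acgt


genome_size = sum(REPLICON_SIZES_BP.values())
total_genes = PROTEIN_CODING_GENES + RNA_GENES

for name, size in REPLICON_SIZES_BP.items():
    print(f"{name}: {size:,} bp ({size / genome_size:.1%} of genome)")
print(f"total: {genome_size:,} bp; genes: {total_genes:,}")
```

The replicon sizes sum to the stated 6,817,576 bp, the gene classes sum to the stated 6,599 genes, and the chromosome accounts for roughly 55% of the genome.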