Complete genome sequence of Rhizobium leguminosarum bv. trifolii strain WSM1325, an effective microsymbiont of annual Mediterranean clovers.

Rhizobium leguminosarum bv. trifolii is a soil-inhabiting bacterium that has the capacity to be an effective nitrogen-fixing microsymbiont of a diverse range of annual Trifolium (clover) species. Strain WSM1325 is an aerobic, motile, non-spore-forming, Gram-negative rod isolated from root nodules collected in 1993 from the Greek island of Serifos. WSM1325 is produced commercially in Australia as an inoculant for a broad range of annual clovers of Mediterranean origin due to its superior attributes of saprophytic competence, nitrogen fixation and acid tolerance. Here we describe the basic features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence for a microsymbiont of annual clovers. The genome is 7,418,122 bp in size and encodes 7,232 protein-coding genes and 61 RNA-only encoding genes. This multipartite genome contains six distinct replicons: a chromosome of 4,767,043 bp and five plasmids of 828,924 bp, 660,973 bp, 516,088 bp, 350,312 bp and 294,782 bp.


Introduction
The productivity of agricultural systems is heavily dependent on nitrogen (N) [1]. The requirement for N-input can be met by the application of exogenous N-fertilizer manufactured through the Haber-Bosch process, but as the cost of fossil fuel-derived energy increases, so does the cost to manufacture and apply such fertilizer. Furthermore, there are inherent issues with the synthesis and application of N-fertilizer, including greenhouse gas emissions and run-off causing eutrophication. Alternatively, N can be obtained from symbiotic nitrogen fixation (SNF) by root nodule bacteria (rhizobia) on nodulated legumes [2]; this is a key biological process in natural and agricultural environments driven by solar radiation and utilizing atmospheric CO2. The commonly accepted figure for global SNF in agriculture is 50-70 million metric tons annually, worth in excess of U.S. $10 billion [3]. Rhizobia are applied across 400 million ha of agricultural land per annum to improve legume forage and crop production through symbiotic N-fixation [3].
The clover (Trifolium)-nodulating bacterium Rhizobium leguminosarum bv. trifolii is amongst the most exploited species of root-nodule bacteria in world agriculture. Clovers are widely grown pasture legumes and include both annual species (e.g. T. subterraneum) and perennial species (e.g. T. pratense, T. repens and T. polymorphum). Clovers are adapted to a wide range of environments, from sub-tropical to moist Mediterranean systems, and thus are important nitrogen-fixing legumes in many natural and agricultural regions of North and South America, Europe, Africa and Australasia [4]. Rhizobium leguminosarum bv. trifolii strain WSM1325 was isolated from a nodule recovered from the roots of an annual clover plant growing near Livadi beach on the Greek Cyclades island of Serifos in 1993 [5]. Strain WSM1325 is of particular interest because it is a highly effective nitrogen-fixing microsymbiont of a broad range of annual clovers of Mediterranean origin [5] and is also saprophytically competent in acid, infertile soils of both Uruguay and southern Australia [6]. Strain WSM1325 is an effective microsymbiont under competitive conditions for nodulation in what appears to be a host-mediated selection process [7].
As well as being a highly effective inoculant strain for annual Trifolium spp., strain WSM1325 is compatible with key perennial clovers of Mediterranean origin used in farming, such as T. repens and T. fragiferum, and is therefore one of the most important clover inoculants used in agriculture. However, WSM1325 is incompatible with American and African clovers, sometimes nodulating but never fixing N [5]. This is in contrast to other Rhizobium leguminosarum bv. trifolii strains, such as WSM2304, which are effective at N-fixation with some perennial American clovers, but ineffective with the Mediterranean clovers [5][6][7].
Here we present a summary classification and a set of features for R. leguminosarum bv. trifolii strain WSM1325 (Table 1), together with the description of a complete genome sequence and annotation.

Classification and features
R. leguminosarum bv. trifolii WSM1325 is a motile, Gram-negative, non-spore-forming rod (Figure 1A,B) in the Rhizobiaceae family of the class Alphaproteobacteria that forms mucoid colonies (Figure 1C) on solid media [24]. It has a mean generation time of 3.9 h in rich medium at the optimal growth temperature of 28°C [7]. Figure 2 shows the phylogenetic neighborhood of R. leguminosarum bv. trifolii strain WSM1325 in a 16S rRNA-based tree. An intragenic fragment of 1,440 bp was chosen since the 16S rRNA gene has not been completely sequenced in many type strains. A comparison of the entire 16S rRNA gene of WSM1325 to completely sequenced 16S rRNA genes of other rhizobia revealed 100% sequence identity to the same gene of R. leguminosarum bv. trifolii strain WSM2304 but a 1 bp difference to the same gene of R. leguminosarum bv. viciae strain 3841.
Figure 2 legend: All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 3.1 [25]. Kimura two-parameter distances were derived from the aligned sequences [26] and a bootstrap analysis [27] was performed with 500 replicates in order to construct a consensus unrooted tree using the neighbor-joining method [28] for each gene alignment separately. Abbreviations: B., Bradyrhizobium; M., Mesorhizobium; R., Rhizobium; S., Ensifer (Sinorhizobium). Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [22] are in bold red print. Published genomes are designated with an asterisk.
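The Kimura two-parameter distance mentioned for the 16S rRNA tree can be illustrated with a minimal Python sketch. This is not the MEGA implementation used by the authors; it only applies the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The function name `k2p_distance` and the toy sequences are assumptions made for illustration, not data from this study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID_BASES = PURINES | PYRIMIDINES


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; sites containing gaps or
    ambiguous characters in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition;
        # anything else is a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy alignment: one transition across 1,440 sites (illustrative only).
    ref = "A" * 1440
    alt = "G" + "A" * 1439
    print(f"K2P distance: {k2p_distance(ref, alt):.6f}")
```

For a single substitution across 1,440 aligned sites the corrected distance stays very close to the raw proportion of differing sites (1/1440), which is why closely related rhizobial strains sit on very short branches in a 16S rRNA tree.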

Symbiotaxonomy
R. leguminosarum bv. trifolii WSM1325 nodulates (Nod+) and fixes nitrogen effectively (Fix+) with a wide range of annual clovers of Mediterranean origin that are used in commercial agriculture globally. Examples of these clover species include T. subterraneum, T. vesiculosum, T. purpureum, T. glanduliferum, T. resupinatum, T. michelianum and T. incarnatum. The ability of WSM1325 to fix nitrogen effectively across a range of annual clover species is illustrated in Figure 3. Additionally, WSM1325 is Fix+ with some Mediterranean perennial clovers such as T. repens and T. fragiferum, but is inconsistently Nod+ and consistently Fix- with clovers of African and American origin [5,30]. Under conditions of competitive nodulation, WSM1325 may preferentially nodulate T. purpureum even when outnumbered 100:1 by WSM2304 [7].

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the US Department of Energy Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [22] and the complete genome sequence in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
R. leguminosarum bv. trifolii WSM1325 was grown to mid-logarithmic phase in TY medium (a rich medium) [31] on a gyratory shaker at 28°C. DNA was isolated from 60 mL of cells using a CTAB (cetyltrimethylammonium bromide) bacterial genomic DNA isolation method.

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov/. 454 pyrosequencing reads were assembled using the Newbler assembler, version 1.1.02.15 (Roche). Large Newbler contigs were broken into 6,084 overlapping fragments of 1,000 bp and entered into assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the parallel phrap assembler (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher or transposon bombing of bridging clones [32]. Gaps between contigs were closed by editing in Consed, custom primer walk or PCR amplification. A total of 2,155 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. Together, all sequence types provided 36× coverage of the genome. The error rate of the completed genome sequence is less than 1 in 100,000.
Evidence codes: TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from http://www.geneontology.org/GO.evidence.shtml of the Gene Ontology project [23]. If the evidence code is IDA, then the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.
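The pseudo-read step described in the assembly paragraph, in which large Newbler contigs are broken into overlapping 1,000 bp fragments, can be sketched as follows. This is an illustrative sketch only and not the JGI pipeline: the function name `shred_contig` is hypothetical, and the 100 bp overlap is an assumed value, since the text gives the fragment length but not the overlap.

```python
from typing import List


def shred_contig(contig: str, fragment_len: int = 1000, overlap: int = 100) -> List[str]:
    """Split a contig into overlapping fixed-length pseudo-reads.

    fragment_len follows the 1,000 bp stated in the text; the overlap
    value is an assumption chosen for illustration.
    """
    if not 0 <= overlap < fragment_len:
        raise ValueError("overlap must be >= 0 and smaller than fragment_len")

    # A contig no longer than one fragment is returned unchanged.
    if len(contig) <= fragment_len:
        return [contig]

    step = fragment_len - overlap
    fragments = []
    start = 0
    while start + fragment_len < len(contig):
        fragments.append(contig[start:start + fragment_len])
        start += step

    # Anchor the final fragment at the contig end so every base is
    # covered and every fragment has the full length.
    fragments.append(contig[-fragment_len:])
    return fragments


if __name__ == "__main__":
    toy_contig = "ACGT" * 700  # 2,800 bp toy sequence, not real data
    pseudo_reads = shred_contig(toy_contig)
    print(len(pseudo_reads), [len(r) for r in pseudo_reads])
```

Anchoring the last fragment at the contig end keeps all pseudo-reads at the same length, at the cost of a larger overlap between the final two fragments. Other shredding schemes are equally plausible.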

Genome annotation
Genes were identified using Prodigal [33] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePrimp pipeline [34]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analyses and functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform (http://img.jgi.doe.gov/er) [35].

Genome properties
The genome is 7,418,122 bp long with a 60.77% GC content (Figure 4). Of the 7,293 genes predicted, 7,232 were protein-coding genes and 61 were RNA-only encoding genes. Two hundred and thirty-one pseudogenes were also identified. The majority of genes (74.21%) were assigned a putative function, whilst the remaining genes were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
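As a quick consistency check on the figures reported here and in the abstract, the short Python sketch below sums the six replicon sizes and the two gene classes. The plasmid labels (`plasmid_1` to `plasmid_5`) are placeholders chosen for illustration and are not the replicon names used in the GenBank records.

```python
# Replicon sizes in bp, as reported in the abstract.
# Plasmid labels are hypothetical placeholders.
REPLICON_SIZES_BP = {
    "chromosome": 4_767_043,
    "plasmid_1": 828_924,
    "plasmid_2": 660_973,
    "plasmid_3": 516_088,
    "plasmid_4": 350_312,
    "plasmid_5": 294_782,
}

PROTEIN_CODING_GENES = 7_232
RNA_ONLY_GENES = 61

# Total genome size is the sum of all replicons.
total_bp = sum(REPLICON_SIZES_BP.values())

# Total predicted genes is the sum of both gene classes.
total_genes = PROTEIN_CODING_GENES + RNA_ONLY_GENES

# Fraction of the genome carried on the chromosome.
chromosome_share = REPLICON_SIZES_BP["chromosome"] / total_bp

print(f"Total genome size: {total_bp:,} bp")
print(f"Total predicted genes: {total_genes:,}")
print(f"Chromosome share of genome: {chromosome_share:.1%}")
```

The replicon sizes sum to the stated 7,418,122 bp and the gene classes to 7,293, with the chromosome carrying roughly 64% of the genome and the five plasmids the remainder.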