Complete genome sequence of Rhizobium leguminosarum bv trifolii strain WSM2304, an effective microsymbiont of the South American clover Trifolium polymorphum.

Rhizobium leguminosarum bv trifolii is the effective nitrogen-fixing microsymbiont of a diverse range of annual and perennial Trifolium (clover) species. Strain WSM2304 is an aerobic, motile, non-spore-forming, Gram-negative rod isolated from Trifolium polymorphum in Uruguay in 1998. This microsymbiont predominated in the perennial grasslands of Glencoe Research Station, Uruguay, where it competitively nodulated its host and fixed atmospheric nitrogen. Here we describe the basic features of WSM2304, together with its complete genome sequence and annotation. This is the first completed genome sequence of a nitrogen-fixing microsymbiont of a clover species from the American center of origin. The 6,872,702 bp genome encodes 6,643 genes, of which 6,581 are protein-coding and 62 encode RNA only. This multipartite genome contains five distinct replicons: a chromosome of 4,537,948 bp and four circular plasmids of 1,266,105 bp, 501,946 bp, 308,747 bp and 257,956 bp.


Introduction
Since ancient times, crop fields have been regularly rotated with legumes, a practice that continues in the modern world because the productivity of agricultural systems is recognized to be nitrogen dependent [1]. Legumes can redress nitrogen deficiency through the fixation of atmospheric nitrogen by rhizobia in root nodules [2]. Today, despite the ready availability of nitrogen fertilizer manufactured through the Haber-Bosch process, more than 400 million ha of agricultural land worldwide are sustained by nitrogen derived from forage legumes [3]. These forages are grown for animal feed, for rotation with cereal crops, as disease breaks, or as cover crops for tree plantations. Amongst the forage legumes, the genus Trifolium (clovers) is acknowledged as one of the most important, with 237 species distributed across the temperate and sub-tropical regions of North and South America, Europe, Africa and Australasia [4]. These clovers are nodulated by R. leguminosarum bv trifolii, one of the most exploited root-nodule bacteria in world agriculture. However, because clovers are geographically widespread and phenologically variable (they may be annual [e.g. T. subterraneum] or perennial [e.g. T. pratense, T. repens and T. polymorphum]), a single strain of R. leguminosarum bv trifolii rarely fixes nitrogen effectively across a wide diversity of clovers, especially those from different geographical and phenological backgrounds [5].

Rhizobium leguminosarum bv trifolii strain WSM2304 was isolated in December 1998 from a nodule recovered from the roots of the perennial clover Trifolium polymorphum growing at Glencoe Research Station near Tacuarembó, Uruguay. WSM2304 is of particular interest because it is a highly effective microsymbiont of a perennial clover of South American origin, has a narrow, specialized host range for nitrogen fixation [5], and is highly competitive for nodulation of T. polymorphum in the acid, infertile soils of Uruguay [6]. WSM2304 has also been implicated in host-mediated selection of an effective microsymbiont under competitive conditions for nodulation [7]. Here we present a summary classification and a set of features for R. leguminosarum bv trifolii strain WSM2304 (Table 1), together with a description of the complete genome sequence and its annotation.

Classification and features
R. leguminosarum bv trifolii strain WSM2304 is a motile, Gram-negative, non-spore-forming rod (Figure 1A and B) in the family Rhizobiaceae of the class Alphaproteobacteria that forms mildly mucoid colonies (Figure 1C) on solid media [24]. It has a mean generation time of 3.5 h in rich medium at its optimal growth temperature of 28°C [7]. Figure 2 shows the phylogenetic neighborhood of R. leguminosarum bv trifolii strain WSM2304 in a 16S rRNA-based tree. The tree is based on an intragenic fragment of 1,440 bp, because the 16S rRNA gene has not been completely sequenced in many type strains. A comparison of the entire 16S rRNA gene of WSM2304 with the completely sequenced 16S rRNA genes of other rhizobia revealed 100% sequence identity with R. leguminosarum bv trifolii strain WSM1325 and a 1 bp difference from the 16S rRNA gene of R. leguminosarum bv viciae strain 3841.

Symbiotaxonomy
R. leguminosarum bv trifolii WSM2304 nodulates (Nod+) and fixes nitrogen effectively (Fix+) with the South American perennial clover T. polymorphum [5]. In contrast to R. leguminosarum bv trifolii WSM1325, WSM2304 is Nod+ but Fix- with the Mediterranean annual clovers T. subterraneum and T. glanduliferum [5,29]. When inoculated onto perennial clovers of either North American or Mediterranean origin, WSM2304 is variably Nod+ but always Fix- [5,6,30]. Under conditions of competitive nodulation, WSM2304 may preferentially nodulate T. polymorphum even when outnumbered 100:1 by WSM1325 [7].

Figure 2. Phylogenetic neighborhood of R. leguminosarum bv trifolii strain WSM2304 in a 16S rRNA-based tree. All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 3.1 [25]. Kimura two-parameter distances were derived from the aligned sequences [26], and a bootstrap analysis [27] was performed with 500 replicates to construct a consensus unrooted tree using the neighbor-joining method [28] for each gene alignment separately. The genera in this tree include Bradyrhizobium (B.), Mesorhizobium (M.), Rhizobium (R.) and Ensifer (Sinorhizobium) (S.). Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [22] are in bold red print. Published genomes are designated with an asterisk.

Table 1 footnote. Evidence codes: IDA, Inferred from Direct Assay (first time in publication); TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [23]. If the evidence code is IDA, the property was directly observed for a living isolate by one of the authors or by an expert mentioned in the acknowledgements.
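The Kimura two-parameter model used for the Figure 2 tree treats transitions (A↔G, C↔T) and transversions separately. With P and Q the proportions of aligned sites that differ by a transition and by a transversion respectively, the distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The following is a minimal Python sketch of that formula, assuming two pre-aligned, gap-free sequences containing only A, C, G and T; the function name is hypothetical and this is not MEGA's own implementation.

```python
import math

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other substitution between A, C, G and T is a transversion.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned, gap-free sequences.

    Assumes only A, C, G and T are present (no gaps or ambiguity codes).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    p = transitions / n    # P: proportion of sites differing by a transition
    q = transversions / n  # Q: proportion of sites differing by a transversion
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

In a neighbor-joining analysis, this pairwise distance would be computed for every pair of sequences to fill the distance matrix from which the tree is built.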

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to global carbon cycling, alternative energy production and biogeochemistry, and is part of the Community Sequencing Program at the Department of Energy Joint Genome Institute (JGI) for projects of relevance to DOE missions. The genome project is deposited in the Genomes OnLine Database [22] and the complete genome sequence in GenBank. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 2, and sequence data statistics from the trace archive for this project are presented in Table 3.

Standards in Genomic Sciences

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger and 454 sequencing platforms. Details of library construction and sequencing at the JGI can be found at the JGI website (http://www.jgi.doe.gov/). 454 pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 5,676 fragments of 1,500 bp with 100 bp overlap and entered into the assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores, with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the phrap assembler. Possible mis-assemblies were corrected, and gaps between contigs were closed by custom primer walks from sub-clones or PCR products. A total of 1,826 Sanger finishing reads were produced. Illumina reads were used to improve the final consensus quality using an in-house developed tool (the Polisher). The final assembly consists of 168,617 Sanger reads plus 5,663 454 pseudo-reads. The error rate of the completed genome sequence is less than 1 in 100,000. Together, all sequence types provided about 31.4× coverage of the genome.
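The shredding of large Newbler contigs into pseudo-reads can be illustrated with a minimal Python sketch, assuming a contig held as a plain string: fragments of 1,500 bp are started every 1,400 bp, so consecutive pseudo-reads share 100 bp. The function name is hypothetical, and the source does not state how the JGI pipeline handled the final, shorter piece of each contig; here it is simply kept as a shorter fragment.

```python
def shred_contig(contig: str, fragment_len: int = 1500, overlap: int = 100) -> list[str]:
    """Split a contig into fixed-length, overlapping pseudo-reads (hypothetical helper)."""
    if overlap >= fragment_len:
        raise ValueError("overlap must be shorter than the fragment length")
    if not contig:
        return []
    step = fragment_len - overlap  # 1,400 bp between fragment starts by default
    fragments = []
    start = 0
    while True:
        fragments.append(contig[start:start + fragment_len])
        if start + fragment_len >= len(contig):
            break  # this fragment already reaches the end of the contig
        start += step
    return fragments
```

For a 3,000 bp contig this yields fragments of 1,500, 1,500 and 200 bp, each sharing 100 bp with its neighbor, which keeps the pseudo-reads joinable by an overlap-based assembler such as phrap.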

Genome annotation
Genes were identified using Prodigal [32] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [33]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analyses and functional annotation were performed within the Integrated Microbial Genomes platform (http://img.jgi.doe.gov/er) [34].
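Translating the predicted CDSs before the database searches uses the bacterial genetic code (NCBI translation table 11), whose amino-acid assignments are identical to the standard code; it differs only in which codons may serve as starts. A minimal Python sketch, assuming an in-frame CDS string that begins at its start codon; the names are hypothetical and this is not code from the ORNL or JGI pipelines.

```python
BASES = "TCAG"
# Amino acids for all 64 codons, ordered by first, second, then third base in
# TCAG order; '*' marks stop codons (TAA, TAG, TGA).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame CDS, dropping the terminal stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    # Codons containing non-ACGT characters become 'X'.
    protein = [CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds), 3)]
    # The common bacterial start codons GTG and TTG are read as Met when initiating.
    if protein and cds[:3] in {"ATG", "GTG", "TTG"}:
        protein[0] = "M"
    return "".join(protein).rstrip("*")
```

For example, translate_cds("GTGAAATAA") returns "MK": the GTG start is read as methionine and the TAA stop is removed.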

Genome properties
The genome is 6,872,702 bp long with a GC content of 61.18% (Table 4) and comprises five replicons: one circular chromosome of 4,537,948 bp (Figure 3) and four circular plasmids of 1,266,105, 501,946, 308,747 and 257,956 bp (Figure 4). Of the 6,643 genes predicted, 6,581 were protein-coding genes and 62 were RNA-only encoding genes. In addition, 166 pseudogenes were identified. The majority of the genes (72.44%) were assigned a putative function, whilst the remainder were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 5.
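The replicon sizes and gene counts reported above are internally consistent, which simple arithmetic confirms. A minimal Python sketch; the constant names are illustrative, and the hypothetical gc_content helper is applied only to a toy sequence, since reproducing the 61.18% figure would require the full assembled genome.

```python
# Figures as reported for WSM2304 in the text above.
CHROMOSOME_BP = 4_537_948
PLASMIDS_BP = [1_266_105, 501_946, 308_747, 257_956]
GENOME_BP = 6_872_702
PROTEIN_CODING_GENES = 6_581
RNA_GENES = 62
TOTAL_GENES = 6_643


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# The chromosome and four plasmids account for the whole genome.
assert CHROMOSOME_BP + sum(PLASMIDS_BP) == GENOME_BP
# Protein-coding plus RNA-only genes give the total number of predicted genes.
assert PROTEIN_CODING_GENES + RNA_GENES == TOTAL_GENES

print(f"chromosome share of genome: {100 * CHROMOSOME_BP / GENOME_BP:.2f}%")
print(f"toy GC content: {gc_content('GGCCAT'):.2f}%")
```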