Genome sequence of the South American clover-nodulating Rhizobium leguminosarum bv. trifolii strain WSM597

Rhizobium leguminosarum bv. trifolii strain WSM597 is an aerobic, motile, Gram-negative, non-spore-forming rod isolated from a root nodule of the annual clover Trifolium pallidum L. growing at Glencoe Research Station near Tacuarembó, Uruguay. This strain is generally ineffective for nitrogen (N2) fixation with clovers of Mediterranean, North American and African origin, but is effective on the South American perennial clover T. polymorphum Poir. Here we describe the features of R. leguminosarum bv. trifolii strain WSM597, together with genome sequence information and annotation. The 7,634,384 bp high-quality-draft genome is arranged in 2 scaffolds of 53 contigs, contains 7,394 protein-coding genes and 87 RNA-only encoding genes, and is one of 20 rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Community Sequencing Program.


Introduction
A key factor limiting the productivity of agricultural systems is the availability of soil nitrogen (N). Legumes can overcome soil N limitations by forming symbiotic relationships with root nodule bacteria (rhizobia). Through their interaction with legumes, rhizobia reduce atmospheric dinitrogen (N2) to ammonia, which supplies the plant with the N it needs for growth. In addition, much of this fixed N is subsequently released into the soil following plant senescence and decay, grazing by livestock or human harvest [1], thereby increasing soil N content and fertility for subsequent crops. Biological N2 fixation is therefore a vital component of sustainable agriculture, as it provides a means of ameliorating N-deficient soils without industrially synthesized N-based fertilizers, whose production and application carry significant environmental and economic costs [2]. Forage and fodder legumes play an integral role in sustainable farming practice, providing feed for stock while also enriching soil with available N. Worldwide, approximately 110 million ha of forage and fodder legumes are under production [3], of which Trifolium spp. (clover) are of key importance [4]. The bacterial microsymbionts that nodulate clovers are Rhizobium leguminosarum bv. trifolii. Because Trifolium spp. are geographically widely distributed and phenologically variable (i.e. they may be either annual [e.g. T. subterraneum, T. pallidum and T. scutatum] or perennial [e.g. T. pratense, T. repens and T. polymorphum]), a single strain of R. leguminosarum bv. trifolii rarely fixes N2 effectively across a wide diversity of clovers [5].
Rhizobium leguminosarum bv. trifolii strain WSM597 was isolated from nodules of Trifolium pallidum collected at the INIA Glencoe Research Station, Uruguay, in 1999. WSM597 is able to nodulate (Nod+) and fix (Fix+) N2 effectively on the South American perennial clover Trifolium polymorphum. However, although WSM597 nodulates T. pallidum and other annual and perennial Trifolium spp. of Mediterranean, African and North American origin, it is not effective for N2 fixation on any of these hosts (Yates et al., unpublished data). WSM597 is therefore highly specific in its requirements for an effective symbiosis, as is also evident for the recently sequenced South American clover microsymbiont R. leguminosarum bv. trifolii WSM2304 [6]. Together, these two microsymbionts demonstrate that phenological and geographic barriers exist for effective nodulation in clover symbioses. As this phenotype represents a common challenge to managing the legume-rhizobial symbiosis in agriculture, the genome of WSM597 is a valuable comparator for genetic studies of nodulation and N2 fixation.
Here we present a summary classification and a set of general features for R. leguminosarum bv. trifolii strain WSM597 together with a description of the genome sequence and annotation.

The classification and general features of R. leguminosarum bv. trifolii strain WSM597 are summarized in Table 1. Figure 2 shows the phylogenetic neighborhood of R. leguminosarum bv. trifolii strain WSM597 in a 16S rRNA sequence-based tree. This strain clusters closest to Rhizobium leguminosarum bv. trifolii T24 and Rhizobium leguminosarum bv. phaseoli RRE6, with 99.9% and 99.8% sequence identity, respectively.
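The identity values quoted above are percentages of matching positions between aligned 16S rRNA sequences. As a minimal sketch, assuming identity is counted over the gap-free columns of a pairwise alignment (the exact convention behind the reported values is not stated here), the calculation is shown below; the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumes seq_a and seq_b are already aligned: equal length, '-' marks a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap-containing columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment (not real 16S data): 8 matches over 9 gap-free columns.
print(round(percent_identity("ACGT-ACGTA", "ACGTTACGAA"), 1))  # 88.9
```

If identity were computed over the 1,307 bp region used for the tree, a single mismatch would give about 99.92% (reported as 99.9%), while two or three mismatches would both round to 99.8%.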

Genome sequencing and annotation information
Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [25], and an improved-high-quality-draft genome sequence is deposited in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 2.

Table 1. Classification and general features of R. leguminosarum bv. trifolii strain WSM597. Phylum: Proteobacteria (TAS [10]); Class: Alphaproteobacteria (TAS [11,12]); Order: Rhizobiales (TAS [12,13]); Family: Rhizobiaceae (TAS [14,15]); Genus: …
Evidence codes: TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [22].

Figure 2. Phylogenetic tree showing the relationships of R. leguminosarum bv. trifolii strain WSM597 (shown in blue print) with some of the root nodule bacteria in the order Rhizobiales, based on aligned sequences of the 16S rRNA gene (1,307 bp internal region). All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5.05 [23]. The tree was built using the maximum likelihood method with the General Time Reversible model. Bootstrap analysis [24] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [25] are in bold print and the GOLD ID is given after the accession number. Published genomes are designated with an asterisk.
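The bootstrap support values in the Figure 2 legend come from resampling alignment columns. The Python sketch below is a hypothetical illustration of Felsenstein's nonparametric bootstrap, not MEGA's implementation (MEGA performed the actual analysis, and `bootstrap_replicates` and `clade_support` are not MEGA functions). Each pseudo-replicate alignment draws as many columns as the original, with replacement; a tree would then be inferred from every replicate (that step is not shown), and a cluster's bootstrap value is the percentage of replicate trees that contain it.

```python
import random
from typing import Dict, FrozenSet, List, Set


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # Draw n_sites column indices with replacement; the same indices are used
    # for every taxon so that each resampled column stays intact.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n_replicates: int = 500,
                         seed: int = 1) -> List[Dict[str, str]]:
    """Generate pseudo-replicate alignments (500 by default, matching Figure 2)."""
    rng = random.Random(seed)  # seeded so the replicates are reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


def clade_support(replicate_clades: List[Set[FrozenSet[str]]], clade: FrozenSet[str]) -> float:
    """Percentage of replicate trees (each given as a set of clades) that contain `clade`."""
    if not replicate_clades:
        raise ValueError("need at least one replicate tree")
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)
```

Representing each replicate tree as a set of taxon sets keeps the support calculation independent of whichever tree-building method is applied to the replicates.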

Growth conditions and DNA isolation
Rhizobium leguminosarum bv. trifolii strain WSM597 was grown to mid-logarithmic phase in TY rich medium [26] on a gyratory shaker at 28°C. DNA was isolated from 60 mL of cells using a CTAB (cetyltrimethylammonium bromide) bacterial genomic DNA isolation method [27].

Genome sequencing and assembly
The genome of Rhizobium leguminosarum bv. trifolii strain WSM597 was sequenced at the Joint Genome Institute (JGI) using a combination of Illumina [28] and 454 technologies [29].

Genome annotation
Genes were identified using Prodigal [32] as part of the DOE-JGI Annotation pipeline [33], followed by a round of manual curation using the JGI GenePRIMP pipeline [34]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Noncoding genes and miscellaneous features were predicted using tRNAscan-SE [35], RNAMMer [36], Rfam [37], TMHMM [38], and SignalP [39]. Additional gene prediction analyses and functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [40].
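Gene callers such as Prodigal score candidate open reading frames (ORFs) using statistics trained on the input genome, such as coding potential and ribosome-binding-site motifs. The much simpler six-frame scan below is not Prodigal's algorithm; it is a hypothetical illustration of the underlying idea of locating ATG-initiated, stop-terminated ORFs on both strands. The names `find_orfs` and `min_codons`, and the default threshold of 100 codons, are assumptions chosen for the example.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
# Complement table; N (ambiguous base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """Naive six-frame ORF scan (ATG starts only; GTG/TTG starts are ignored).

    Returns (strand, start, end) with 0-based, end-exclusive coordinates on the
    forward strand; `end` includes the stop codon. `min_codons` counts codons
    from the ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    orfs: List[Tuple[str, int, int]] = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        end = i + 3
                        if strand == "+":
                            orfs.append((strand, start, end))
                        else:
                            # Map reverse-strand coordinates back to the forward strand.
                            orfs.append((strand, n - end, n - start))
                    start = None  # close the ORF and look for the next ATG
    return sorted(orfs, key=lambda orf: (orf[1], orf[2], orf[0]))
```

A real pipeline would go on to choose among overlapping candidates and alternative start codons; this sketch simply reports the longest ORF from the first in-frame ATG.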

Genome properties
The genome is 7,634,384 nucleotides with 61.01% GC content (Table 3) and is arranged in 2 scaffolds containing 53 contigs. Of a total of 7,481 genes, 7,394 were protein-encoding and 87 were RNA-only encoding genes. The majority of genes (79.24%) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COG functional categories is presented in Table 4 and Figure 3.
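The GC percentage and gene counts above are simple ratios over the assembled sequence and the annotation. The minimal Python sketch below assumes that GC content is computed over unambiguous A/C/G/T bases only (whether scaffold gap characters were excluded from the reported 61.01% is not stated) and that the 79.24% refers to all 7,481 genes; `gc_content` and `gene_summary` are hypothetical helper names.

```python
from typing import Dict, Iterable


def gc_content(contigs: Iterable[str]) -> float:
    """Percent G+C over all contigs, ignoring ambiguous bases (e.g. N) and gaps."""
    gc = 0
    acgt = 0
    for contig in contigs:
        s = contig.upper()
        gc += s.count("G") + s.count("C")
        acgt += s.count("A") + s.count("C") + s.count("G") + s.count("T")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / acgt


def gene_summary(protein_coding: int, rna_genes: int,
                 percent_with_function: float) -> Dict[str, int]:
    """Total gene count and the number of genes implied by a 'with function' percentage."""
    total = protein_coding + rna_genes
    return {"total": total,
            "with_function": round(total * percent_with_function / 100.0)}


print(gene_summary(7394, 87, 79.24))  # {'total': 7481, 'with_function': 5928}
```

Under these assumptions, 79.24% of 7,481 genes corresponds to roughly 5,928 genes with a putative function.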