Genome sequence of the clover-nodulating Rhizobium leguminosarum bv. trifolii strain TA1

Rhizobium leguminosarum bv. trifolii strain TA1 is an aerobic, motile, Gram-negative, non-spore-forming rod that is an effective nitrogen-fixing microsymbiont on the perennial clovers originating from Europe and the Mediterranean basin. However, TA1 is ineffective with many annual and perennial clovers originating from Africa and America. Here we describe the features of R. leguminosarum bv. trifolii strain TA1, together with genome sequence information and annotation. The 8,618,824 bp high-quality-draft genome is arranged in 6 scaffolds of 32 contigs, contains 8,493 protein-coding genes and 83 RNA-only encoding genes, and is one of 20 rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Community Sequencing Program.


Introduction
Biological fixation of inert atmospheric dinitrogen gas is a process that can only be performed by certain prokaryotes in the domains Archaea and Bacteria. By far the greatest amounts of nitrogen (N) are fixed by specialized soil bacteria (root nodule bacteria or rhizobia) that form proto-cooperative, non-obligatory symbiotic relationships with legumes [1]. Indeed, these symbioses contribute ~40 million tonnes of N annually to support global food production [2].
Species of the legume genus Trifolium (clovers) are amongst the most widely cultivated pasture legumes. In nature, this genus occupies three distinct centers of diversity, with approximately 28% of species in the Americas, 57% in Eurasia and 15% in Sub-Saharan Africa [3]. A smaller subset of about 30 species, almost all of Eurasian origin, are widely grown as annual and perennial species in pasture systems in Mediterranean and temperate regions [3]. Globally important perennial species of clover include T. repens (white clover), T. pratense (red clover), T. fragiferum (strawberry clover) and T. hybridum (alsike clover). Clovers usually form N2-fixing symbioses with the common soil bacterium Rhizobium leguminosarum bv. trifolii, and different combinations of Trifolium hosts and strains of R. leguminosarum bv. trifolii can vary markedly in symbiotic compatibility [4], resulting in symbiotic developmental outcomes that range from ineffective (non-nitrogen-fixing) nodulation to fully effective N2-fixing partnerships [5].
In Australia, Rhizobium leguminosarum bv. trifolii strain TA1 (initially designated BA-Tas) has a long history of use as a commercial inoculant for Trifolium spp. [6]. TA1 was originally isolated from a root nodule on the annual species T. subterraneum in Bridport, Tasmania, in the early 1950s [6]. This isolate is likely to be a naturalized strain of European origin that arrived by chance in Tasmania in the 1800s. Although widely used as a microsymbiont of European clovers, this soil saprophyte proved not to be acid tolerant [7] and survives poorly when coated onto clover seed with a peat-based carrier [8][9][10]. Nevertheless, TA1 remains the commercial inoculant in Australia for perennial (T. repens, T. pratense, T. fragiferum, T. hybridum, T. tumens (talish clover)) and annual (T. alexandrinum (berseem clover), T. glomeratum (cluster clover) and T. dubium (suckling clover)) clovers of European origin [11]. Furthermore, this R. leguminosarum bv. trifolii strain has been adopted by the international community as a model organism to investigate the biology of the Trifolium-Rhizobium symbiosis [12]. Here we present a summary classification and a set of general features for R. leguminosarum bv. trifolii strain TA1, together with the description of its high-quality-draft genome sequence and annotation.

Classification and general features
R. leguminosarum bv. trifolii strain TA1 is a motile, Gram-negative, non-spore-forming rod (Figure 1 Left and Center) in the order Rhizobiales of the class Alphaproteobacteria. It is slow growing, forming colonies 1-4 mm in diameter within 3-5 days when grown on half Lupin Agar (½LA) [13] at 28°C. Colonies on ½LA are white-opaque, slightly domed and moderately mucoid with smooth margins (Figure 1 Right). Minimum Information about the Genome Sequence (MIGS) is provided in Table 1. Figure 2 shows the phylogenetic neighborhood of R. leguminosarum bv. trifolii strain TA1 in a 16S rRNA sequence-based tree. This strain clusters closest to R. leguminosarum bv. trifolii T24 and R. leguminosarum bv. phaseoli RRE6, with 99.9% and 99.8% sequence identity, respectively.
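Sequence identity values such as those above are computed from a pairwise alignment. As a minimal sketch, assuming the two 16S rRNA sequences are already aligned to equal length and that columns containing a gap are simply excluded (one convention among several; the exact scoring behind the reported values is not stated here), percent identity can be calculated as follows. The function name `percent_identity` is illustrative only.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence carries a gap ('-') are skipped; this is
    an assumed convention, not necessarily the one used for the paper.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gap-containing columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

For example, two aligned 10-base sequences that differ at a single position give 90% identity; over a longer alignment, each mismatch lowers the value proportionally less.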

Symbiotaxonomy
Rhizobium leguminosarum bv. trifolii strain TA1 is currently the commercial inoculant for white (Trifolium repens), red (Trifolium pratense) and strawberry (Trifolium fragiferum) clovers in Australia. In general, TA1 is less effective at fixing nitrogen on annual clovers than other strains such as WSM1325 [34,35]. However, TA1 is of particular interest because it displays a broad host range for nodulation and nitrogen fixation across annual and perennial clovers originating from the European and Mediterranean centre of origin of clovers [1]. TA1 is generally able to nodulate, but unable to fix nitrogen with, many annual and perennial clovers originating from Africa and America [34].

Genome sequencing and annotation information

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [33], and an improved-high-quality-draft genome sequence is deposited in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 2 [14].

Current classification
Domain Bacteria TAS [15]
Phylum Proteobacteria TAS [16]
Class Alphaproteobacteria TAS [17,18]
Order Rhizobiales TAS [17,19]
Family Rhizobiaceae TAS [20,21]
Genus Rhizobium TAS [20,22-
… in the literature). These evidence codes are from the Gene Ontology project [30].

Figure 2. Phylogenetic tree showing the relationship of R. leguminosarum bv. trifolii strain TA1 (shown in blue print) with some of the root nodule bacteria in the order Rhizobiales, based on aligned sequences of the 16S rRNA gene (1,307 bp internal region). All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5.05 [31]. The tree was built using the maximum likelihood method with the General Time Reversible model. Bootstrap analysis [32] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [33] are in bold print and the GOLD ID is mentioned after the accession number. Published genomes are designated with an asterisk.
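The bootstrap support described in the Figure 2 caption rests on resampling alignment columns. The sketch below, assuming the alignment is held as a plain dictionary of equal-length aligned sequences, generates nonparametric bootstrap pseudo-alignments by drawing sites with replacement. It does not reproduce MEGA's maximum likelihood search under the General Time Reversible model, which would be run separately on each replicate; `bootstrap_replicates` and its parameters are hypothetical names.

```python
import random
from typing import Dict, Iterator, Optional


def bootstrap_replicates(
    alignment: Dict[str, str],
    n_replicates: int = 500,
    seed: Optional[int] = None,
) -> Iterator[Dict[str, str]]:
    """Yield nonparametric bootstrap pseudo-alignments.

    Each replicate draws alignment columns (sites) uniformly with
    replacement until it has as many columns as the original, applying
    the same column choice to every sequence so rows stay aligned.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must share one aligned length")
    n_sites = lengths.pop()
    rng = random.Random(seed)  # seeded generator for reproducible replicates
    for _ in range(n_replicates):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        yield {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}
```

Cluster support is then the percentage of replicate trees that recover a given cluster; with 500 replicates, a cluster found in 450 replicate trees would receive 90% support.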

Growth conditions and DNA isolation
Rhizobium leguminosarum bv. trifolii strain TA1 was grown to mid-logarithmic phase in TY rich medium [36] on a gyratory shaker at 28°C. DNA was isolated from 60 ml of cells using a CTAB (cetyltrimethylammonium bromide) bacterial genomic DNA isolation method [37].

Genome sequencing and assembly
The genome of Rhizobium leguminosarum bv. trifolii strain TA1 was sequenced at the Joint Genome Institute (JGI) using a combination of Illumina [38] and 454 technologies [39]. Two libraries were generated for this genome: an Illumina GAII shotgun library, which yielded 66,421,308 reads totaling 5,048 Mb, and a 454 paired-end library with an average insert size of 13 kb, which yielded 393,147 reads totaling 100.1 Mb. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI user homepage [40]. The initial draft assembly contained 199 contigs in 5 scaffolds. The 454 paired-end data were assembled with Newbler, version 2.3, and the Newbler consensus sequences were computationally shredded into 2 kb overlapping fake reads (shreds). Illumina sequencing data were assembled with VELVET, version 1.0.13 [41], and the consensus sequences were computationally shredded into 1.5 kb overlapping fake reads (shreds). We integrated the 454 Newbler consensus shreds, the Illumina VELVET consensus shreds and the read pairs in the 454 paired-end library using parallel phrap, version SPS-4.24 (High Performance Software, LLC). The software Consed [42][43][44] was used in the subsequent finishing process. Illumina data were used to correct potential base errors and increase consensus quality using the software Polisher developed at JGI (Alla Lapidus, unpublished). Possible mis-assemblies were corrected using gapResolution (Cliff Han, un-
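The read totals and the shredding step lend themselves to a short worked example. The sketch below assumes that Mb means 10^6 bases and that shreds overlap by a fixed, user-chosen number of bases (the overlap used at JGI is not given in the text); `shred`, `fold_coverage` and `mean_read_length` are hypothetical helpers, not JGI software. From the reported totals, nominal coverage of the 8,618,824 bp genome works out to roughly 586× for Illumina and 11.6× for 454, with mean read lengths of about 76 bp and 255 bp, respectively.

```python
from typing import List, Tuple

GENOME_BP = 8_618_824  # final assembly size reported for TA1


def fold_coverage(total_bases: float, genome_len: int) -> float:
    """Nominal (average) depth: total sequenced bases / genome length."""
    return total_bases / genome_len


def mean_read_length(total_bases: float, n_reads: int) -> float:
    return total_bases / n_reads


def shred(consensus: str, shred_len: int, overlap: int) -> List[Tuple[int, str]]:
    """Cut a consensus sequence into overlapping fixed-length 'fake reads'.

    Returns (start, subsequence) pairs. Consecutive shreds share `overlap`
    bases; the final shred is anchored to the sequence end so that the
    tail is always covered.
    """
    if not 0 <= overlap < shred_len:
        raise ValueError("need 0 <= overlap < shred_len")
    if len(consensus) <= shred_len:
        return [(0, consensus)]
    step = shred_len - overlap
    starts = list(range(0, len(consensus) - shred_len + 1, step))
    if starts[-1] + shred_len < len(consensus):
        starts.append(len(consensus) - shred_len)  # cover the tail
    return [(s, consensus[s:s + shred_len]) for s in starts]


# Values taken from the text; Mb assumed to mean 10**6 bases.
illumina_cov = fold_coverage(5_048e6, GENOME_BP)
roche454_cov = fold_coverage(100.1e6, GENOME_BP)
illumina_len = mean_read_length(5_048e6, 66_421_308)
roche454_len = mean_read_length(100.1e6, 393_147)

# Toy usage: 2 kb shreds of a 4 kb sequence with an assumed 500 bp overlap.
demo_shreds = shred("ACGT" * 1000, shred_len=2000, overlap=500)
```

Anchoring the last shred to the sequence end trades a slightly larger overlap at the tail for guaranteed coverage of every base.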

Genome annotation
Genes were identified using Prodigal [45] as part of the DOE-JGI Annotation pipeline [46], followed by a round of manual curation using the JGI GenePRIMP pipeline [47]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Noncoding genes and miscellaneous features were predicted using tRNAscan-SE [48], RNAMMer [49], Rfam [50], TMHMM [51], and SignalP [52]. Additional gene prediction analyses and functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [37,53].
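Gene callers such as Prodigal score candidate genes with statistics trained on the genome itself, which this sketch makes no attempt to reproduce. To illustrate only the underlying idea of locating protein-coding candidates, the toy scan below reports ATG-to-stop open reading frames on both strands above a minimum length. The function names, the 100-codon default and the first-ATG start rule are assumptions for illustration; it also ignores the alternative start codons (GTG, TTG) that bacterial genes can use.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def naive_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """Toy ATG-to-stop ORF scan on both strands (not Prodigal's algorithm).

    Returns (strand, start, end) tuples with 0-based, end-exclusive
    coordinates on the scanned strand ('+' = input, '-' = reverse
    complement). Each ORF begins at the first in-frame ATG not already
    inside an open ORF; min_codons counts both the start and stop codons.
    """
    seq = seq.upper()
    orfs: List[Tuple[str, int, int]] = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i  # open a candidate ORF
                elif codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # close it and look for the next ATG
    return orfs
```

A real pipeline would additionally resolve overlapping candidates and choose among possible start sites, which is where trained models such as Prodigal's differ from this scan.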

Genome properties
The genome is 8,618,824 nucleotides long with 60.74% GC content (Table 3) and comprises 32 contigs in 6 scaffolds (Figure 3). Of a total of 8,576 genes, 8,493 were protein-encoding and 83 were RNA-only encoding genes. The majority of genes (77.85%) were assigned a putative function, while the remaining genes were annotated as hypothetical. The distribution of genes into COG functional categories is presented in Table 4.
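The summary figures above can be cross-checked with simple arithmetic. This sketch, assuming GC content is calculated over unambiguous A/C/G/T bases only (how ambiguous bases were treated for the reported 60.74% is not stated), confirms that the protein-coding and RNA gene counts sum to the reported total and derives the protein-coding fraction (about 99.0%) and gene density (about one gene per kb); `gc_content` is an illustrative helper.

```python
TOTAL_GENES = 8_576
PROTEIN_CODING = 8_493
RNA_GENES = 83
GENOME_BP = 8_618_824


def gc_content(seq: str) -> float:
    """Percent G+C over unambiguous bases only; N and other IUPAC codes are skipped."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


# Consistency checks on the reported gene counts.
assert PROTEIN_CODING + RNA_GENES == TOTAL_GENES
coding_fraction = 100.0 * PROTEIN_CODING / TOTAL_GENES  # percent of all genes
genes_per_kb = TOTAL_GENES / (GENOME_BP / 1_000)        # gene density
```

Applied to the assembled scaffold sequences, `gc_content` would return a value close to the reported figure if the same treatment of ambiguous bases was used.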