Genome sequence of Microvirga lupini strain LUT6T, a novel Lupinus alphaproteobacterial microsymbiont from Texas

Microvirga lupini LUT6T is an aerobic, non-motile, Gram-negative, non-spore-forming rod that can exist as a soil saprophyte or as a legume microsymbiont of Lupinus texensis. LUT6T was isolated in 2006 from a nodule recovered from the roots of the annual L. texensis growing in Travis Co., Texas. LUT6T forms a highly specific nitrogen-fixing symbiosis with the endemic L. texensis; no other Lupinus species tested so far forms an effective nitrogen-fixing symbiosis with this isolate. Here we describe the features of M. lupini LUT6T, together with genome sequence information and its annotation. The 9,633,614 bp improved high-quality draft genome is arranged into 160 scaffolds of 1,366 contigs containing 10,864 protein-coding genes and 87 RNA-only encoding genes, and is one of 20 rhizobial genomes sequenced as part of a DOE Joint Genome Institute 2010 Community Sequencing Project.


Introduction
Microvirga is one of the most recently discovered genera of Proteobacteria known to engage in symbiotic nitrogen fixation with legume plants, and joins a diverse set of at least twelve other lineages of Proteobacteria that share this ecological niche [1][2][3][4]. Several genera of legume root-nodule symbionts have a worldwide distribution and interact with many legume taxa. By contrast, symbiotic strains of Microvirga are currently known from only two distant locations and two legume host genera [5,6]. The limited geographic and host distribution of Microvirga symbionts, together with the fact that root-nodule symbiosis is not characteristic of the genus Microvirga as a whole [7], suggests a relatively recent evolutionary transition to legume symbiosis in this group. M. lupini is a specialized nodule symbiont of the legume Lupinus texensis, an annual plant endemic to a relatively small geographic area in central Texas and northeastern Mexico [5]. The genus Lupinus comprises about 270 annual and perennial species concentrated in western North America and the Andean regions of South America, with a much smaller number of species in the Mediterranean region of Europe and northern Africa [8]. Basal lineages of Lupinus all occur in the Mediterranean and are associated with bacterial symbionts in the genus Bradyrhizobium [9,10]. Bradyrhizobium is also the main symbiont lineage for most Lupinus species in North and South America, although a few Lupinus species use nodule bacteria in the genus Mesorhizobium [10][11][12][13]. Thus, the acquisition of Microvirga symbionts by L. texensis appears to be an unusual, derived condition for this legume genus. L. texensis occurs in grassland and open shrub communities with an annual precipitation of 50-100 cm, on diverse soil types [14]. L. texensis appears to have a specialized symbiotic relationship with M. lupini, in that existing surveys have failed to detect nodule symbionts of any other bacterial genus associated with this plant [5]. Moreover, inoculation experiments with other North American species of Lupinus, as well as other legume genera, have so far failed to identify any plant besides L. texensis that can form an effective, nitrogen-fixing symbiosis with M. lupini [5]. M. lupini strain LUT6T was isolated from a nodule collected from an L. texensis plant in Travis Co., Texas in 2006. Here we provide an analysis of the improved high-quality draft genome sequence of M. lupini strain LUT6T, one of the three described symbiotic species of Microvirga [15].

Classification and general features
M. lupini LUT6T is a non-motile, Gram-negative rod in the order Rhizobiales of the class Alphaproteobacteria. Cells are approximately 1.0 μm wide and 1.5-2.0 μm long (Figure 1, left and center). It is fast-growing, forming colonies within 3-4 days at 28°C on half-strength Lupin Agar (½LA) [16], tryptone-yeast extract agar (TY) [17] or a modified yeast-mannitol agar (YMA) [18]. Colonies on ½LA are white-opaque, slightly domed and moderately mucoid with smooth margins (Figure 1, right).
Minimum Information about the Genome Sequence (MIGS) data for M. lupini LUT6T are provided in Table 1. Figure 2 shows the phylogenetic neighborhood of M. lupini LUT6T in a 16S rRNA sequence-based tree. This strain shares 100% (1,358/1,358 bases) and 98% (1,344/1,367 bases) sequence identity with the 16S rRNA genes of Microvirga sp. Lut5 and Microvirga lotononidis WSM3557T, respectively.

Table 1 note: some evidence codes mark properties not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or on anecdotal evidence. These evidence codes are from the Gene Ontology project [30].

Figure 2. 16S rRNA gene phylogeny of M. lupini LUT6T. All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5 [31]. The tree was built using the Maximum-Likelihood method with the General Time Reversible model [32]. Bootstrap analysis [33] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Brackets after the strain name contain a DNA database accession number and/or a GOLD ID (beginning with the prefix G) for a sequencing project registered in GOLD [34]. Published genomes are indicated with an asterisk.
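The identity values and the bootstrap support described above can be illustrated with a short Python sketch. It assumes the 16S rRNA sequences are already aligned, with gaps written as '-'. Identity is computed only over columns where neither sequence has a gap. The sketch also shows the standard nonparametric bootstrap resampling step, in which alignment columns are drawn with replacement; each of the 500 replicates would then be passed to tree inference. This is not MEGA's implementation, the ML/GTR tree search itself is not shown, and the function names and toy sequences are hypothetical.

```python
import random


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """One nonparametric bootstrap pseudo-alignment: resample columns with replacement."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in columns) for seq in alignment]


if __name__ == "__main__":
    # Identity from the base counts reported in the text.
    print(f"Lut5:     {100.0 * 1358 / 1358:.1f}%")  # 100.0%
    print(f"WSM3557T: {100.0 * 1344 / 1367:.1f}%")  # 98.3%

    # Hypothetical toy alignment, not real 16S rRNA data.
    toy = ["ACGTTGCA", "ACGTAGCA", "ACCTAGCA"]
    rng = random.Random(1)
    replicates = [bootstrap_replicate(toy, rng) for _ in range(500)]
    print(len(replicates), percent_identity(toy[0], toy[1]))  # 500 87.5
```

Columns are resampled as whole units so that every replicate keeps the same number of sites as the original alignment.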

Symbiotaxonomy
M. lupini strain LUT6T was isolated in 2006 from a nodule collected from Lupinus texensis growing in Travis Co., Texas. The symbiotic characteristics of this isolate on a range of selected hosts are provided in Table 2 [5]. In Table 2, '+' and '-' denote the presence or absence, respectively, of nodulation (Nod) or N2 fixation (Fix).

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to global carbon cycling, alternative energy production and biogeochemistry. It is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI), which supports projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database (GOLD) [34], and the improved high-quality draft genome sequence is deposited in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 3.

Growth conditions and DNA isolation
M. lupini LUT6T was cultured to mid-logarithmic phase in 60 ml of TY rich medium [35] on a gyratory shaker at 28°C. DNA was isolated from the cells using a CTAB (cetyltrimethylammonium bromide) bacterial genomic DNA isolation method [36].

Genome sequencing and assembly
The genome of M. lupini LUT6T was sequenced at the Joint Genome Institute (JGI) using a combination of Illumina [37] and 454 technologies [38]. Two libraries were constructed for this genome [36]. An Illumina GAii shotgun library generated 77,090,752 reads totaling 5,858.9 Mbp. A paired-end 454 library with an average insert size of 8 kbp generated 238,026 reads totaling 81.4 Mbp. General aspects of library construction and sequencing at the JGI are described in [36]. The initial draft assembly contained 1,719 contigs in 6 scaffolds. The 454 paired-end data were assembled with Newbler, version 2.3-PreRelease-6/30/2009. The Newbler consensus sequences were computationally shredded into 2 kbp overlapping fake reads (shreds). Illumina sequencing data were assembled with VELVET, version 1.0.13 [39], and the consensus sequence was computationally shredded into 1.5 kbp overlapping fake reads (shreds). The 454 Newbler consensus shreds, the Illumina VELVET consensus shreds and the read pairs in the 454 paired-end library were integrated using parallel phrap, version SPS-4.24 (High Performance Software, LLC). Consed [40][41][42] was used in the subsequent finishing process. Illumina data were used to correct potential base errors and increase consensus quality using the software Polisher developed at JGI [43]. Possible mis-assemblies were corrected using gapResolution (Cliff Han, unpublished) or Dupfinisher [44]. Some gaps between contigs were closed by editing in Consed. The estimated genome size is 10.3 Mb. The final assembly is based on 36.2 Mb of 454 draft data, which provides an average 3.5x coverage of the genome, and 3,090 Mbp of Illumina draft data, which provides an average 300x coverage of the genome.
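The shredding step and the coverage arithmetic above can be sketched in Python. This is an illustration under stated assumptions, not JGI's tooling. The text gives the shred lengths (2 kbp for Newbler, 1.5 kbp for VELVET) but not the overlap between consecutive shreds, so the `step` parameter is a hypothetical choice. Mean coverage is simply the sequenced bases divided by the estimated genome size. The mean read lengths are derived only from the read counts and yields reported above.

```python
def shred(consensus: str, shred_len: int, step: int) -> list[str]:
    """Cut a consensus sequence into overlapping fixed-length 'fake reads'.

    Consecutive shreds overlap by shred_len - step bases. The step value is an
    illustrative assumption; the text does not give the overlap used at JGI.
    """
    if shred_len <= 0 or not 0 < step <= shred_len:
        raise ValueError("require shred_len > 0 and 0 < step <= shred_len")
    if len(consensus) <= shred_len:
        return [consensus] if consensus else []
    shreds = []
    start = 0
    while start + shred_len < len(consensus):
        shreds.append(consensus[start:start + shred_len])
        start += step
    # Anchor the last shred at the sequence end so the tail is always covered.
    shreds.append(consensus[-shred_len:])
    return shreds


def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    genome = 10.3e6  # estimated genome size from the text, in bp
    print(f"454:      {mean_coverage(36.2e6, genome):.1f}x")  # 3.5x
    print(f"Illumina: {mean_coverage(3090e6, genome):.0f}x")  # 300x
    # Mean read lengths implied by the reported read counts and yields.
    print(f"Illumina mean read: {5858.9e6 / 77_090_752:.1f} bp")  # 76.0 bp
    print(f"454 mean read:      {81.4e6 / 238_026:.0f} bp")  # 342 bp
    print(shred("ACGTACGTAC", shred_len=4, step=2))  # ['ACGT', 'GTAC', 'ACGT', 'GTAC']
```

The last shred is anchored at the end of the sequence so that no tail bases are lost when the sequence length is not a multiple of the step.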

Genome annotation
Genes were identified using Prodigal [45] as part of the DOE-JGI annotation pipeline [46]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG and InterPro databases. The tRNAscan-SE tool [47] was used to find tRNA genes. Ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [48]. Other non-coding RNAs, such as the RNA components of the protein secretion complex and of RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [49]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [50].
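The statement that the predicted CDSs were translated can be illustrated with a minimal Python sketch. It assumes NCBI translation table 11 (the bacterial code) and a CDS supplied 5' to 3' on its coding strand. It is not code from the DOE-JGI pipeline or from Prodigal, and the constant and function names are hypothetical. By convention, the initiator codon is read as Met even when it is an alternative start such as GTG or TTG.

```python
# Standard codon assignments in TCAG order. NCBI table 11 (bacterial) uses the
# same amino-acid assignments and differs only in which codons may act as starts.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
# Start codons permitted by table 11.
TABLE11_STARTS = {"ATG", "GTG", "TTG", "CTG", "ATT", "ATC", "ATA"}


def translate_cds(cds: str) -> str:
    """Translate a CDS (coding strand, 5'->3') into a protein string.

    The initiator codon is read as Met and a single terminal stop codon is
    dropped. Internal stops or ambiguous codons raise ValueError.
    """
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if not codons or codons[0] not in TABLE11_STARTS:
        raise ValueError("CDS does not begin with a table 11 start codon")
    protein = ["M"]
    for codon in codons[1:]:
        aa = CODON_TABLE.get(codon)
        if aa is None:
            raise ValueError(f"ambiguous codon {codon!r}")
        protein.append(aa)
    if protein[-1] == "*":
        protein.pop()
    if "*" in protein:
        raise ValueError("internal stop codon")
    return "".join(protein)


if __name__ == "__main__":
    print(translate_cds("ATGGCTTAA"))     # MA
    print(translate_cds("GTGAAAGCTTAA"))  # MKA (GTG start read as Met)
```

The sketch raises an error on internal stop codons instead of translating past them, because an internal stop in a predicted CDS usually points to a gene-calling or sequencing problem that needs attention.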

Genome properties
The genome is 9,633,614 nucleotides long with 60.26% GC content (Table 4) and comprises 160 scaffolds (Figure 3) of 1,366 contigs. Of a total of 10,951 genes, 10,864 are protein-coding and 87 are RNA-only encoding genes. The majority of genes (63.25%) were assigned a putative function, while the remaining genes were annotated as hypothetical. The distribution of genes into COG functional categories is presented in Table 5.
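The genome statistics above can be illustrated with a short Python sketch, given the assembled scaffolds as strings and the gene counts. Applying `gc_content` to all scaffold sequence would yield the genome-wide GC figure. However, this sketch excludes ambiguous bases (for example, N characters in gaps between contigs within a scaffold) from the denominator by assumption, and the text does not say whether the reported 60.26% does the same. The function names are hypothetical and are not IMG's code.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / (gc + at)


def gene_summary(protein_coding: int, rna_genes: int) -> dict[str, float]:
    """Total gene count and the share of protein-coding and RNA-only genes."""
    total = protein_coding + rna_genes
    return {
        "total": total,
        "protein_coding_pct": 100.0 * protein_coding / total,
        "rna_pct": 100.0 * rna_genes / total,
    }


if __name__ == "__main__":
    s = gene_summary(10864, 87)  # counts reported in the text
    print(f"{s['total']} genes, {s['protein_coding_pct']:.2f}% protein-coding")
    # 10951 genes, 99.21% protein-coding
    print(f"{gc_content('GGCCATNN'):.2f}% GC")  # 66.67% GC (toy sequence)
```

Counting bases per scaffold and summing the counts, rather than concatenating all scaffolds into one string, would give the same result with less memory for a genome of this size.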