Genome sequence of the Lebeckia ambigua-nodulating “Burkholderia sprentiae” strain WSM5005T

“Burkholderia sprentiae” strain WSM5005T is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated in Australia from an effective N2-fixing root nodule of Lebeckia ambigua collected in Klawer, Western Cape of South Africa, in October 2007. Here we describe the features of “Burkholderia sprentiae” strain WSM5005T, together with the genome sequence and its annotation. The 7,761,063 bp high-quality-draft genome is arranged in 8 scaffolds of 236 contigs, contains 7,147 protein-coding genes and 76 RNA-only encoding genes, and is one of 20 rhizobial genomes sequenced as part of the DOE Joint Genome Institute 2010 Community Sequencing Program.


Introduction
Legumes of the Fabaceae family of flowering plants have the unique capacity to form an N2-fixing symbiosis with soil-inhabiting root nodule bacteria (RNB). This symbiosis supplies leguminous species with essential bioavailable nitrogen that could not otherwise be obtained from inherently infertile soils. The agricultural region of south-west Western Australia contains such impoverished soils, and the successful establishment of effective legume-RNB symbioses has been exploited to drive plant and animal productivity in this landscape without reliance on nitrogenous fertilizer [1,2]. Rainfall patterns in this landscape appear to be changing from a dry Mediterranean-type distribution to a generally reduced annual rainfall with a less predictable distribution [3]. These changes challenge the reproduction of the commercially used annual legume species. Perennial species may be better able to adapt to climate change, but few commercial perennial forage legumes are adapted to the acid and infertile soils of the region [2]. Therefore, deep-rooted herbaceous perennial legumes adapted to acid and infertile soils, including Rhynchosia and Lebeckia species, have been investigated for use in this Australian agricultural setting [2,4,5]. The genus Lebeckia Thunb. belongs to the tribe Crotalarieae and comprises 33 species of papilionoid legumes endemic to the southern and western parts of South Africa, a region whose soil and climate conditions are similar to those of Western Australia [6,7]. The genus has recently been revised and is now subdivided into several sections, including Lebeckia s.s., Calobota and Wiborgiella [7]. Species of the section Lebeckia s.s., which includes L. ambigua, are easily distinguished from other species by their acicular leaves and 5+5 anther arrangement [7-9].
On four expeditions to the Western Cape of South Africa between 2002 and 2007, nodules and seeds of Lebeckia ambigua were collected and stored [5]. Isolation of RNB from these nodules yielded a collection of 23 microsymbionts that clustered into five groups within the genus Burkholderia [5]. Unlike most previously studied rhizobial Burkholderia strains, this South African group appears to be associated with papilionoid forage legumes rather than Mimosa spp. One of these strains, WSM5005T, has now been designated the type strain of the new species "Burkholderia sprentiae" [10]. This isolate effectively nodulates Lebeckia ambigua and L. sepiaria [5]. Here we present a summary classification and a set of general features for "Burkholderia sprentiae" strain WSM5005T, together with a description of its high-quality draft genome sequence and annotation.

Classification and general features
"Burkholderia sprentiae" strain WSM5005T is a motile, Gram-negative, non-spore-forming rod (Figure 1, left and center panels) in the order Burkholderiales of the class Betaproteobacteria [10]. It is fast-growing, forming colonies 2-4 mm in diameter within 2-3 days on half-strength Lupin Agar (½LA) [11] at 28°C. Colonies on ½LA are white-opaque, slightly domed and moderately mucoid, with smooth margins (Figure 1, right panel). Minimum Information about the Genome Sequence (MIGS) is provided in Table 1. Figure 2 shows the phylogenetic relationship of "Burkholderia sprentiae" strain WSM5005T in a tree based on 16S rRNA gene sequences. This strain clusters closest to Burkholderia tuberum STM678T (CIP 108238T) and Burkholderia kururiensis KP23T, with 98.2% and 96.9% sequence identity, respectively.
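Pairwise identity values such as the 98.2% and 96.9% quoted above are, in essence, the share of matching positions between two aligned sequences. The minimal sketch below assumes the sequences are already aligned and skips gap-containing columns, which is only one of several conventions in use; it illustrates the arithmetic and is not the procedure used to build Figure 2. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Columns where either sequence carries a gap are skipped, and identity
    is matches / compared columns * 100. This gap convention is an
    assumption; alignment tools differ in how they count gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gap-containing columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments, not real 16S rRNA data.
    a = "ACGT-ACGTACGTA"
    b = "ACGTTACGAACGTA"
    print(f"{percent_identity(a, b):.1f}% identity")
```

Read the other way round, 98.2% identity means that roughly 18 of every 1,000 compared positions differ.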

Symbiotaxonomy
"Burkholderia sprentiae" strain WSM5005T is one of a set of Burkholderia strains that were assessed for nodulation and nitrogen fixation on three separate L. ambigua genotypes (CRSLAM-37, CRSLAM-39 and CRSLAM-41) and on L. sepiaria [5]. Representatives of this group of nodule bacteria are generally Nod+ and Fix- on Macroptilium atropurpureum and appear to have a very narrow host range for symbiosis. They belong to a group of Burkholderia strains that nodulate papilionoid forage legumes rather than the classical Burkholderia hosts, Mimosa spp. (Mimosoideae) [28].

Genome sequencing and annotation information

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database (GOLD) [27], and an improved high-quality draft genome sequence is deposited in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 2.

Table 1. Classification and general features of "Burkholderia sprentiae" strain WSM5005T (MIGS)
Property | Term | Evidence code
Phylum | Proteobacteria | TAS [14]
Class | Betaproteobacteria | TAS [15,16]
Order | Burkholderiales | TAS [15,17]
Family | Burkholderiaceae | TAS [15,18]
Genus | Burkholderia | TAS [19-21]
Species | "Burkholderia sprentiae" | TAS [10]
Gram stain | Negative | IDA [22]
Cell shape | Rod | IDA
Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [24].

Standards in Genomic Sciences

Figure 2 legend (continued): All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5.05 [25]. The tree was built using the maximum likelihood method with the General Time Reversible model. Bootstrap analysis [26] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Strains with a genome sequencing project registered in GOLD [27] are in bold print, and the GOLD ID is given after the accession number. Published genomes are designated with an asterisk.
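The bootstrap analysis described for the tree resamples alignment columns with replacement to build pseudo-replicate alignments, then reports how often a grouping recurs across replicates. The minimal sketch below shows only that resampling logic. As a deliberately simple stand-in for re-estimating a maximum-likelihood GTR tree in MEGA for each replicate, it scores a replicate by whether a chosen taxon's nearest neighbour under uncorrected p-distance is the expected partner. All function names, taxa and sequences are hypothetical.

```python
import random


def p_distance(x: str, y: str) -> float:
    """Uncorrected distance: proportion of sites at which two sequences differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def nearest_neighbour(alignment: dict[str, str], query: str) -> str:
    """Taxon closest to `query` by p-distance (ties go to the alphabetically first)."""
    others = sorted(t for t in alignment if t != query)
    return min(others, key=lambda t: p_distance(alignment[query], alignment[t]))


def bootstrap_support(alignment: dict[str, str], query: str, partner: str,
                      replicates: int = 500, seed: int = 1) -> float:
    """Fraction of column-resampled replicates in which `partner` is nearest to `query`.

    Each replicate draws as many alignment columns as the original, with
    replacement (nonparametric bootstrap). The nearest-neighbour test is a
    toy substitute for rebuilding a maximum-likelihood tree per replicate.
    """
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    hits = 0
    for _ in range(replicates):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        pseudo = {taxon: "".join(seq[i] for i in cols)
                  for taxon, seq in alignment.items()}
        if nearest_neighbour(pseudo, query) == partner:
            hits += 1
    return hits / replicates


if __name__ == "__main__":
    # Hypothetical toy alignment; taxa and sequences are invented for illustration.
    toy = {
        "taxon_A": "ACGTACGTACGTACGTACGT",
        "taxon_B": "ACGTACGTACGTACGTACGA",
        "taxon_C": "ACGTTCGAACCTACGAACTT",
    }
    print(bootstrap_support(toy, "taxon_A", "taxon_B", replicates=500))
```

A real analysis would replace `nearest_neighbour` with full tree inference and count how often each clade appears; the resampling loop itself stays the same.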

Growth conditions and DNA isolation
"Burkholderia sprentiae" strain WSM5005T was grown to mid-logarithmic phase in TY rich medium [29] on a gyratory shaker at 28°C. DNA was isolated from 60 mL of cells using a cetyltrimethylammonium bromide (CTAB) bacterial genomic DNA isolation method [30].

Genome sequencing and assembly
The genome of "Burkholderia sprentiae" strain WSM5005T was sequenced at the Joint Genome Institute (JGI) using a combination of Illumina [31] and 454 [32] technologies. An Illumina GAii shotgun library generated 76,247,610 reads totaling 5,794.8 Mb, and a paired-end 454 library with an average insert size of 13 kb generated 612,483 reads totaling 112.9 Mb. All general aspects of library construction and sequencing performed at the JGI can be found at [30]. The initial draft assembly contained 420 contigs in 8 scaffolds. The 454 paired-end data were assembled with Newbler, version 2.3, and the Newbler consensus sequences were computationally shredded into 2 kb overlapping fake reads (shreds). Illumina sequencing data were assembled with VELVET, version 1.0.13 [33], and the consensus sequences were computationally shredded into 1.5 kb overlapping fake reads (shreds). We integrated the 454 Newbler consensus shreds, the Illumina VELVET consensus shreds and the read pairs in the 454 paired-end library using parallel phrap, version SPS-4.24 (High Performance Software, LLC). Consed [34-36] was used in the subsequent finishing process. Illumina data were used to correct potential base errors and increase consensus quality with the software Polisher developed at JGI (Alla Lapidus, unpublished). Possible mis-assemblies were corrected using gapResolution (Cliff Han, unpublished), Dupfinisher [37], or by sequencing cloned bridging PCR fragments with subcloning. Gaps
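Two quantities implied by this paragraph can be sketched in a few lines. Mean read length and raw fold coverage follow directly from the read counts, base totals and genome size given in the text, and shredding a consensus sequence into overlapping fake reads is a sliding window. The text does not state the overlap between consecutive shreds, so the `step` parameter below is a hypothetical choice, and the function names are illustrative rather than the JGI's actual tools.

```python
# Figures taken from the sequencing paragraph and the genome size reported for WSM5005T.
GENOME_SIZE_BP = 7_761_063
ILLUMINA_READS, ILLUMINA_BASES = 76_247_610, 5794.8e6
ROCHE454_READS, ROCHE454_BASES = 612_483, 112.9e6


def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average read length in bp."""
    return total_bases / n_reads


def fold_coverage(total_bases: float, genome_size: int) -> float:
    """Raw sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def shred(consensus: str, shred_len: int, step: int) -> list[str]:
    """Cut a consensus sequence into overlapping fixed-length fake reads.

    Consecutive shreds start `step` bases apart, so neighbours overlap by
    shred_len - step bases. If the last regular window stops short of the
    end, one extra shred is anchored at the sequence end so every base is
    covered. `step` is hypothetical; the text does not give the overlap used.
    """
    if shred_len <= 0 or not 0 < step <= shred_len:
        raise ValueError("need shred_len > 0 and 0 < step <= shred_len")
    if len(consensus) <= shred_len:
        return [consensus]
    starts = list(range(0, len(consensus) - shred_len + 1, step))
    if starts[-1] + shred_len < len(consensus):
        starts.append(len(consensus) - shred_len)
    return [consensus[s:s + shred_len] for s in starts]


if __name__ == "__main__":
    print(f"Illumina mean read length: {mean_read_length(ILLUMINA_BASES, ILLUMINA_READS):.1f} bp")
    print(f"454 mean read length: {mean_read_length(ROCHE454_BASES, ROCHE454_READS):.1f} bp")
    print(f"Illumina raw coverage: {fold_coverage(ILLUMINA_BASES, GENOME_SIZE_BP):.1f}x")
    # Hypothetical: 2 kb shreds of a toy 5 kb contig, one starting every 1 kb.
    toy_contig = "ACGT" * 1250
    pieces = shred(toy_contig, shred_len=2000, step=1000)
    print(len(pieces), "shreds of", len(pieces[0]), "bp")
```

With the totals reported above, this arithmetic gives a mean Illumina read length of about 76 bp and roughly 747-fold raw Illumina coverage of the 7,761,063 bp genome, before any read filtering.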

Genome annotation
Genes were identified using Prodigal [38] as part of the DOE-JGI annotation pipeline [39], followed by a round of manual curation using the JGI GenePRIMP pipeline [40]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Noncoding genes and miscellaneous features were predicted using tRNAscan-SE [41], RNAmmer [42], Rfam [43], TMHMM [44], and SignalP [45]. Additional gene prediction analyses and functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [46].
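Prodigal's gene model is considerably more sophisticated than a simple scan (it trains on the genome itself to learn start-codon usage, ribosome-binding motifs and coding statistics), but the basic object it predicts, an open reading frame running from a start codon to an in-frame stop codon on either strand, can be illustrated with a naive six-frame scan. The sketch below is that simple scan, not Prodigal: it considers only ATG starts and the three standard stop codons, and the `min_codons` cutoff and function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (other symbols kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Naive six-frame scan for ATG...stop open reading frames.

    Returns (strand, start, end) tuples with 0-based, end-exclusive
    coordinates; for the '-' strand they refer to the reverse-complement
    string. In each frame the first ATG (at the frame start or after a
    stop) opens an ORF, so the longest ORF is reported; ORFs with no
    in-frame stop before the sequence end are dropped. min_codons counts
    codons including the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    # Hypothetical toy sequence with one short ORF in forward frame 2.
    toy = "CCATGGCTGCTGCTGCTTAAGG"
    print(find_orfs(toy, min_codons=3))
```

Real gene finders also weigh alternative starts (GTG, TTG), overlapping genes and coding likelihood, which is why the tools listed above are used instead of a length cutoff.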

Genome properties
The genome is 7,761,063 nucleotides long, with 63.18% GC content (Table 3), and comprises 8 scaffolds of 236 contigs. Of the 7,223 genes predicted, 7,147 are protein-coding genes and 76 are RNA-only encoding genes. In addition, 377 pseudogenes were identified. The majority of genes (76.16%) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COG functional categories is presented in Table 4, Figure 3 and Figure 4.
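The headline genome statistics are simple ratios and sums. As a minimal sketch, GC content is the share of G and C among called bases; here ambiguity symbols such as N are excluded, which is an assumption, since pipelines may treat them differently. The gene total is the sum of protein-coding and RNA genes. The function name `gc_content` is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among A, C, G and T; ambiguity codes such as N are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


# Gene counts reported for WSM5005T in the paragraph above.
PROTEIN_CODING_GENES = 7_147
RNA_GENES = 76
TOTAL_GENES = PROTEIN_CODING_GENES + RNA_GENES
assert TOTAL_GENES == 7_223  # consistent with the stated total

if __name__ == "__main__":
    print(f"{gc_content('GGCCAATTNN'):.1f}% GC")  # toy sequence, not genome data
    print(f"protein-coding share: {100 * PROTEIN_CODING_GENES / TOTAL_GENES:.2f}%")
```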