Genome sequence of the acid-tolerant Burkholderia sp. strain WSM2232 from Karijini National Park, Australia

Burkholderia sp. strain WSM2232 is an aerobic, motile, Gram-negative, non-spore-forming acid-tolerant rod that was trapped in 2001 from acidic soil collected from Karijini National Park (Australia) using Gastrolobium capitatum as a host. WSM2232 was effective in nitrogen fixation with G. capitatum but subsequently lost symbiotic competence during long-term storage. Here we describe the features of Burkholderia sp. strain WSM2232, together with genome sequence information and its annotation. The 7,208,311 bp standard-draft genome is arranged into 72 scaffolds of 72 contigs containing 6,322 protein-coding genes and 61 RNA-only encoding genes. The loss of symbiotic capability can now be attributed to the loss of nodulation and nitrogen fixation genes from the genome. This rhizobial genome is one of 100 sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.


Introduction
Burkholderia spp. are a diverse group of organisms capable of thriving in a wide range of environments, with many forming mutualistic associations with organisms such as fungi and plants [1]. The development in the 1960s and 1970s of a rational classification system for Pseudomonas species resulted in proposals to give different generic names to taxonomically distinct groups. The organisms previously classified within Pseudomonas rRNA similarity Group II were transferred into the new genus Burkholderia [2]. All Burkholderia species described at that time were phytopathogens or opportunistic mammalian pathogens, with the type species B. cepacia becoming a growing community health concern in immunocompromised and cystic fibrosis patients [3][4][5]. As more Burkholderia spp. have been isolated, it has become apparent that the genus is far more complex, including numerous soil-inhabiting species capable of degrading heavy metals and environmental contaminants [6,7]. Further reports identified plant growth-promoting (PGP) species and legume microsymbionts. This led to a paradigm shift in rhizobiology and resulted in numerous descriptions of novel Burkholderia spp. [8][9][10]. Most PGP or legume microsymbiont species of Burkholderia have been isolated in South America from Mimosa spp. or in South Africa from Papilionoideae legumes and, until recently, B. graminis was the only described PGP bacterial species isolated from Australia, from the maize rhizosphere [11]. Australian Burkholderia have been isolated as nodule occupants from some Acacia spp. [12]; however, none have been authenticated or tested for the nodulation of other legumes. Few data are available regarding the symbiosis between Burkholderia and legumes in Australia compared with South Africa and South America. Burkholderia sp. WSM2232 was trapped from acidic soil (pHCaCl2 4.8) collected from Karijini National Park (Western Australia) using Gastrolobium capitatum as a host.
Sites where the soil pH was higher (pHCaCl2 >7) did not contain any Burkholderia symbionts but did contain numerous Bradyrhizobium and Rhizobium spp. (Watkin, unpublished). Soil pH is an edaphic variable that controls microbial biogeography [13] and the acid tolerance of Burkholderia has been shown to account for the biogeographical distribution of this genus [14]. The symbiotic capacity of WSM2232 was authenticated in axenic glasshouse trials by inoculating G. capitatum grown under nitrogen-free conditions. Inoculated plants nodulated by WSM2232 produced significantly greater mass than uninoculated controls. WSM2232 was subcultured and placed in long-term storage in frozen laboratory glycerol stocks. After revival, inoculation of the isolate onto endemic Australian legumes failed to elicit a symbiotic response. The reason for the loss of the symbiotic phenotype has, until now, not been identified. The genome of Burkholderia sp. strain WSM2232 is one of two Australian Burkholderia genomes (the other being that of WSM2230 (GOLD ID Gi08831)) that have now been sequenced through the Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) program. Here we present a preliminary description of the general features of Burkholderia sp. WSM2232 together with its genome sequence and annotation. The absence of nodulation genes within this genome explains the nodulation-minus symbiotic phenotype of the laboratory-cultured strain. The genomes of WSM2232 and WSM2230 will be an important resource to identify the processes enabling such isolates to adapt to the infertile, highly acidic soils that dominate the Australian landscape.

Classification and features
Burkholderia sp. strain WSM2232 is a motile, nonsporulating, non-encapsulated, Gram-negative rod in the order Burkholderiales of the class Betaproteobacteria. The rod-shaped form varies in size with dimensions of 0.25-0.5 μm for width and 0.5-2.0 μm for length (Figure 1A and 1B). It is fast growing, forming colonies within 1-2 days when grown on LB agar [15] devoid of NaCl and within 3-4 days when grown on half strength Lupin Agar (½LA) [16], tryptone-yeast extract agar (TY) [17] or a modified yeast-mannitol agar (YMA) [18] at 28°C. Colonies on ½LA are opaque, slightly domed and moderately mucoid with smooth margins. Burkholderia sp. WSM2232 falls into a large clade containing PGP, bioremediation and legume microsymbiont species. WSM2232 demonstrates PGP phenotypes, including phosphate solubilization and hydroxamate-like siderophore production, and is acid tolerant, growing in the pH range of 4.5-9.0 (Walker, unpublished). Minimum Information about the Genome Sequence (MIGS) is provided in Table 1. Figure 2 shows the phylogenetic neighborhood of Burkholderia sp. strain WSM2232 in a 16S rRNA sequence based tree. This strain shares 99% (1352/1364 bp) sequence identity to the 16S rRNA gene of the sequenced strain Burkholderia sp. WSM2230 (Gi08831).
Table 1 footnote: Evidence codes are from the Gene Ontology project [28]; they include a code for properties not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence.
Figure 2. Phylogenetic tree based on 16S rRNA gene sequences showing the relationship of Burkholderia sp. strain WSM2232 to related strains. All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA [29], version 5. The tree was built using the Maximum-Likelihood method with the General Time Reversible model [30]. Bootstrap analysis [31] with 500 replicates was performed to assess the support for the clusters. Type strains are indicated with a superscript T. Brackets after the strain name contain a DNA database accession number and/or a GOLD ID (beginning with the prefix G) for a sequencing project registered in GOLD [32]. Published genomes are indicated with an asterisk.
After long-term storage and subsequent culture, WSM2232 failed to effectively nodulate G. capitatum.

Phenotype Microarray
Strain WSM2232 was assayed using Biolog Phenotype Microarray® plates PM1 to PM3, which test 190 carbon and 95 nitrogen compounds. Plates were purchased from Biolog and tests were carried out per the manufacturer's instructions. The irreversible reduction of tetrazolium dye to formazan is used in this system to report on active metabolism [33]. The results obtained from the colorimetric assay are shown in Table 3.

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [32] and a standard-draft genome sequence in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 4.

Growth conditions and DNA isolation
Burkholderia sp. strain WSM2232 was cultured to mid-logarithmic phase in 60 ml of TY rich medium on a gyratory shaker at 28°C [34]. DNA was isolated from the cells using a CTAB (cetyl trimethyl ammonium bromide) bacterial genomic DNA isolation method (http://my.jgi.doe.gov/general/index.html).

Genome sequencing and assembly
The genome of Burkholderia sp. strain WSM2232 was sequenced at the Joint Genome Institute (JGI) using Illumina technology [35]. An Illumina standard shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform, which generated 12,244,888 reads totaling 1,837 Mbp.
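As a rough cross-check of the sequencing figures above, the mean read length and the nominal sequencing depth can be estimated from the read count, the total yield and the genome size. The following Python sketch is illustrative only: it assumes the reported 1,837 Mbp is the total raw yield before filtering, and the function names are hypothetical rather than part of any JGI tool. The depth actually used in the assembly is not stated here and may differ.

```python
# Back-of-the-envelope check of the reported sequencing yield.
# All input values are taken from the text; function names are hypothetical.

READ_COUNT = 12_244_888          # Illumina reads generated
TOTAL_BASES = 1_837_000_000      # 1,837 Mbp, assumed to be raw yield
GENOME_SIZE = 7_208_311          # standard-draft genome size in bp


def mean_read_length(total_bases: int, read_count: int) -> float:
    """Average number of bases per read."""
    return total_bases / read_count


def nominal_depth(total_bases: int, genome_size: int) -> float:
    """Raw sequencing depth: total bases divided by genome size."""
    return total_bases / genome_size


read_len = mean_read_length(TOTAL_BASES, READ_COUNT)
depth = nominal_depth(TOTAL_BASES, GENOME_SIZE)

print(f"Mean read length: {read_len:.1f} bp")
print(f"Nominal raw depth: {depth:.0f}x")
```

Under these assumptions the mean read length comes out close to 150 bp, and the raw depth is on the order of 250-fold.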
All general aspects of library construction and sequencing performed at the JGI can be found at http://my.jgi.doe.gov/general/index.html. All raw Illumina sequence data was passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artifacts (Mingkun, L., Copeland, A. and Han, J., unpublished). The following steps were then performed for assembly:

Genome annotation
Genes were identified using Prodigal [38] as part of the DOE-JGI annotation pipeline [39], followed by a round of manual curation using the JGI GenePRIMP pipeline [40]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [41] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [42]. Other non-coding RNAs such as the RNA components of the protein secretion complex and the RNase P were identified by searching the genome for the corresponding Rfam profiles using INFERNAL (http://infernal.janelia.org). Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [43].

Genome properties
The genome is 7,208,311 nucleotides with 63.11% GC content (Table 5) and comprises 72 scaffolds (Figure 3) of 72 contigs. From a total of 6,383 genes, 6,322 were protein encoding and 61 were RNA-only encoding genes. The majority of genes (80.90%) were assigned a putative function whilst the remaining genes were annotated as hypothetical. The distribution of genes into COGs functional categories is presented in Table 6.
Standards in Genomic Sciences
Table 5 footnotes: a Total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome. b 4 copies of 5S, 2 copies of 16S and 1 copy of 23S rRNA.
Table 6 footnote: a The total is based on the total number of protein coding genes in the annotated genome.
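The gene tallies above can be checked for internal consistency with a few lines of arithmetic. The Python sketch below is a minimal illustration, not part of the annotation pipeline. It assumes that the 80.90% figure is relative to all 6,383 genes (the text does not say whether the denominator is total genes or protein-coding genes), and the helper name is hypothetical.

```python
# Consistency check of the genome statistics quoted in the text.
# The helper name and the denominator chosen for the 80.90% figure are assumptions.

GENOME_SIZE = 7_208_311      # bp
TOTAL_GENES = 6_383
PROTEIN_CODING = 6_322
RNA_GENES = 61
WITH_FUNCTION_PCT = 80.90    # percentage of genes with a putative function


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Protein-coding and RNA genes should add up to the reported total.
counts_agree = PROTEIN_CODING + RNA_GENES == TOTAL_GENES

protein_pct = percent(PROTEIN_CODING, TOTAL_GENES)
rna_pct = percent(RNA_GENES, TOTAL_GENES)

# Estimated number of genes with a putative function, assuming the
# percentage refers to all genes rather than protein-coding genes only.
est_with_function = round(WITH_FUNCTION_PCT / 100.0 * TOTAL_GENES)

# Gene density expressed as genes per kilobase of sequence.
genes_per_kb = TOTAL_GENES / (GENOME_SIZE / 1000)

print(f"Counts agree: {counts_agree}")
print(f"Protein-coding: {protein_pct:.2f}%  RNA: {rna_pct:.2f}%")
print(f"Genes with putative function (estimate): {est_with_function}")
print(f"Gene density: {genes_per_kb:.3f} genes/kb")
```

If the percentage instead refers to protein-coding genes only, the estimated count would be slightly lower; Table 5 remains the authoritative source for the exact value.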