Genome sequence of the acid-tolerant Burkholderia sp. strain WSM2230 from Karijini National Park, Australia

Burkholderia sp. strain WSM2230 is an aerobic, motile, Gram-negative, non-spore-forming, acid-tolerant rod isolated from acidic soil collected in 2001 from Karijini National Park, Western Australia, using Kennedia coccinea (Coral Vine) as a host. WSM2230 was initially effective in nitrogen fixation with K. coccinea, but subsequently lost symbiotic competence. Here we describe the features of Burkholderia sp. strain WSM2230, together with genome sequence information and its annotation. The 6,309,801 bp high-quality-draft genome is arranged into 33 scaffolds of 33 contigs containing 5,590 protein-coding genes and 63 RNA-only encoding genes. No nodulation genes were identified in the genome sequence of WSM2230, which provides an explanation for the observed failure of the laboratory-grown strain to nodulate. The genome of this strain is one of 100 sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.


Introduction
Burkholderia spp. are ubiquitous in the environment and are found in nearly all terrestrial and some marine ecosystems. They have adapted to occupy numerous niches and may have saprophytic, parasitic, pathogenic or symbiotic lifestyles [1]. Emerging evidence suggests an ancient and stable symbiosis between Burkholderia and Mimosa genera within South America [2] and between Burkholderia and legumes from the Papilionoideae subfamily in South Africa [3,4]. Despite this, there is very little data regarding the symbiosis between Burkholderia and endemic legumes outside of South America and South Africa.
In Australia, legumes are predominantly nodulated by species from the genera Bradyrhizobium, Ensifer, and Rhizobium [5,6]. There are no published genomes or species descriptions of symbiotic Burkholderia spp. isolated in Australia, and there is a paucity of information on the interaction between Burkholderia and endemic Australian legumes. Burkholderia sp. WSM2230 was isolated from an effective nitrogen-fixing nodule on Kennedia coccinea grown in an acidic soil (pH(CaCl2) 4.8) collected from Karijini National Park, Western Australia. Its symbiotic phenotype was authenticated in glasshouse experiments (Watkin, unpublished). Recently this isolate was revived from long-term storage as frozen glycerol stocks but failed to form nodules on K. coccinea in axenic glasshouse trials (Walker, unpublished). In this regard, it is interesting that the South African microsymbiont B. tuberum STM678T only infrequently forms effective nodules on Macroptilium atropurpureum (Siratro). A recent study [7] revealed that B. tuberum forms effective nodules on Siratro when water levels are reduced and temperature is increased. Unlike that of B. tuberum STM678T, the annotated genome sequence of the laboratory-cultured strain of WSM2230 contained no identifiable nodulation genes, which offers an explanation for the lack of a nodulation phenotype. Establishing the genomic sequences of Australian Burkholderia will help in understanding the mutualistic interactions occurring between plants and rhizosphere organisms in low-pH soil. WSM2230 was isolated only from the acidic Karijini National Park soil (pH(CaCl2) 4.8); other sites where the soil pH was higher (pH(CaCl2) >7) did not yield any Burkholderia symbionts. In these more alkaline soils, numerous Bradyrhizobium and Rhizobium spp. were trapped instead (Watkin, unpublished).
Soil pH is an edaphic variable that controls microbial biogeography [8], and the acid tolerance of Burkholderia has been shown to account for the biogeographical distribution of this genus [9]. The genome of WSM2230 is one of two Australian Burkholderia genomes (the other being that of WSM2232 (GOLD ID Gi08832)) that have now been sequenced through the Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) program. Here we present a preliminary description of the general features of Burkholderia sp. WSM2230 together with its genome sequence and annotation. The genomes of WSM2232 and WSM2230 will be an important resource for identifying the processes that enable such isolates to adapt to the infertile, highly acidic soils that dominate the Australian landscape.

Classification and features
Burkholderia sp. strain WSM2230 is a motile, nonsporulating, non-encapsulated, Gram-negative rod in the order Burkholderiales of the class Betaproteobacteria. The rod-shaped form varies in size, with dimensions of 0.5 μm in width and 1.0-2.0 μm in length (Figure 1 Left and Center). It is fast growing, forming colonies within 1-2 days when grown on LB agar [10] devoid of NaCl and within 2-3 days when grown on half strength Lupin Agar (½LA) [11], tryptone-yeast extract agar (TY) [12] or a modified yeast-mannitol agar (YMA) [13] at 28°C. Colonies on ½LA are opaque, slightly domed and moderately mucoid with smooth margins (Figure 1 Right). Burkholderia sp. WSM2230 can solubilize inorganic phosphate, produces hydroxamate-like siderophores, and can tolerate a pH range of 4.5-9.0 (Walker, unpublished). Minimum Information about the Genome Sequence (MIGS) is provided in Table 1. Figure 2 shows the phylogenetic neighborhood of Burkholderia sp. strain WSM2230 in a 16S rRNA sequence based tree. This strain shares 99% (1352/1364 bp) sequence identity with the 16S rRNA gene of the sequenced strain Burkholderia sp. WSM2232 (Gi08831).
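The 99% figure follows directly from the reported match counts. A minimal sketch of the arithmetic, assuming identity is simply the number of matching positions divided by the aligned length (the function name percent_identity is illustrative and is not part of the original analysis):

```python
def percent_identity(matches: int, aligned_length: int) -> float:
    """Return percent identity as matching positions over aligned length."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * matches / aligned_length


# Values reported in the text for the WSM2230 vs. WSM2232 16S rRNA genes.
identity = percent_identity(1352, 1364)
print(f"{identity:.2f}% identity")  # prints "99.12% identity"
```

Twelve mismatched or gapped positions over 1364 aligned bases therefore correspond to the rounded 99% quoted above.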

Symbiotaxonomy
Burkholderia sp. WSM2230 formed nodules (Nod+) on, and fixed N2 (Fix+) with, K. coccinea when first isolated. However, after long-term storage and subsequent culture, it failed to nodulate Australian legume hosts (Table 2).
Table footnote: Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [24].
Figure 2 (caption, partial): Standards in Genomic Sciences [26]. Bootstrap analysis [27] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Brackets after the strain name contain a DNA database accession number and/or a GOLD ID (beginning with the prefix G) for a sequencing project registered in GOLD [28]. Published genomes are indicated with an asterisk.

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [28] and an improved-high-quality-draft genome sequence in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 3.

Growth conditions and DNA isolation
Burkholderia sp. strain WSM2230 was cultured to mid logarithmic phase in 60 ml of TY rich medium on a gyratory shaker at 28°C [29]. DNA was isolated from the cells using a CTAB (Cetyl trimethyl ammonium bromide) bacterial genomic DNA isolation method [30].

Genome sequencing and assembly
The genome of Burkholderia sp. strain WSM2230 was sequenced at the Joint Genome Institute (JGI) using Illumina technology [31]. An Illumina standard shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform which generated 15,498,652 reads totaling 2,324 Mbp.
All general aspects of library construction and sequencing performed at the JGI can be found at the JGI user home [30]. All raw Illumina sequence data was passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artifacts (Mingkun, L., Copeland, A. and Han, J., unpublished).
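The text does not describe how DUK works internally. Purely as an illustration of the general idea of artifact filtering, the sketch below drops any read that shares at least one k-mer with a set of known artifact sequences. The k-mer matching approach, the function names, the value of k, and the toy sequences are all assumptions made for this example; they are not taken from DUK or from the JGI pipeline.

```python
from typing import Iterable, Iterator, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_reads(reads: Iterable[str], artifacts: Iterable[str], k: int = 8) -> Iterator[str]:
    """Yield only the reads that share no k-mer with any artifact sequence."""
    artifact_kmers: Set[str] = set()
    for artifact in artifacts:
        artifact_kmers |= kmers(artifact, k)
    for read in reads:
        # Reads shorter than k have no k-mers and are therefore kept.
        if kmers(read, k).isdisjoint(artifact_kmers):
            yield read


# Toy data (hypothetical sequences, not real adapters or reads).
toy_artifact = "ACGTTGCAAGGCTTAACC"
toy_reads = [
    "TTTTGCAAGGCTTTTT",  # contains the artifact 8-mer GCAAGGCT, so it is dropped
    "ATATATATATATATAT",  # shares no 8-mer with the artifact, so it is kept
]
kept = list(filter_reads(toy_reads, [toy_artifact], k=8))
print(kept)
```

An exact k-mer match is a deliberately simple criterion; a production filter would also need to handle reverse complements and sequencing errors, which this sketch ignores.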
The final draft assembly contained 33 contigs in 33 scaffolds. The total size of the genome is 6.3 Mbp and the final assembly is based on 2,324 Mbp of Illumina data, which provides an average 368× coverage of the genome.
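The coverage and read-length figures can be checked from the numbers given above. The following is a small sketch of that arithmetic, assuming average coverage is total sequenced bases divided by genome size and that "2,324 Mbp" means 2,324 million bases (a rounded value); the constant and function names are illustrative.

```python
GENOME_SIZE_BP = 6_309_801        # assembled genome size reported in the text
TOTAL_READS = 15_498_652          # Illumina reads reported in the text
TOTAL_BASES_BP = 2_324_000_000    # "2,324 Mbp", already rounded in the source


def mean_read_length(total_bases: int, total_reads: int) -> float:
    """Average number of bases per read."""
    return total_bases / total_reads


def average_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth: sequenced bases per base of genome."""
    return total_bases / genome_size


read_len = mean_read_length(TOTAL_BASES_BP, TOTAL_READS)
coverage = average_coverage(TOTAL_BASES_BP, GENOME_SIZE_BP)
print(f"mean read length ~{read_len:.0f} bp")
print(f"average coverage ~{coverage:.0f}x")
```

Under these assumptions the mean read length comes out at about 150 bp and the depth at about 368×, in agreement with the coverage stated in the text.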

Genome annotation
Genes were identified using Prodigal [34] as part of the DOE-JGI annotation pipeline [35], followed by a round of manual curation using the JGI GenePRIMP pipeline [36]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [37] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [38]. Other noncoding RNAs, such as the RNA components of the protein secretion complex and RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [39].
Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [40,41].

Genome properties
The genome is 6,309,801 nucleotides long with 63.07% GC content (Table 4) and comprises 33 scaffolds (Figures 3a, 3b, 3c and 3d) of 33 contigs. Of a total of 5,653 genes, 5,590 were protein encoding and 63 were RNA-only encoding genes. The majority of genes (83.42%) were assigned a putative function, whilst the remaining genes were annotated as hypothetical. The distribution of genes into COGs functional categories is presented in Table 5.
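The gene tallies above are internally consistent, and GC content is a simple base count. The sketch below checks the tally and shows one common way to compute GC content, assuming it is defined as (G + C) over all unambiguous bases (A, C, G, T). The function name gc_content and the toy sequence are illustrative; the genome sequence itself is not reproduced here.

```python
TOTAL_GENES = 5_653      # total genes reported in the text
PROTEIN_CODING = 5_590   # protein-encoding genes reported in the text
RNA_GENES = 63           # RNA-only encoding genes reported in the text


def gc_content(sequence: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T) in sequence."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# The two gene classes should add up to the reported total.
tally_ok = PROTEIN_CODING + RNA_GENES == TOTAL_GENES
protein_coding_pct = 100.0 * PROTEIN_CODING / TOTAL_GENES
print(f"tally consistent: {tally_ok}")
print(f"protein-coding share: {protein_coding_pct:.2f}%")

# Toy sequence only, to show the function in use.
print(f"toy GC content: {gc_content('GGCCAATT'):.1f}%")
```

Applied to the full set of scaffold sequences, the same function would be expected to reproduce a value close to the 63.07% reported in Table 4, although the exact handling of ambiguous bases in the original pipeline is not stated.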