Complete genome sequence of ‘Thermobaculum terrenum’ type strain (YNP1T)

‘Thermobaculum terrenum’ Botero et al. 2004 is the sole species within the proposed genus ‘Thermobaculum’. Strain YNP1T is the only cultivated member of an acid tolerant, extremely thermophilic species belonging to a phylogenetically isolated environmental clone group within the phylum Chloroflexi. At present, the name ‘Thermobaculum terrenum’ is not yet validly published as it contravenes Rule 30 (3a) of the Bacteriological Code. The bacterium was isolated from a slightly acidic extreme thermal soil in Yellowstone National Park, Wyoming (USA). Depending on its final taxonomic allocation, this is likely to be the third completed genome sequence of a member of the class Thermomicrobia and the seventh type strain genome from the phylum Chloroflexi. The 3,101,581 bp long genome with its 2,872 protein-coding and 58 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain YNP1T (= ATCC BAA-798 = CCMEE 7001) is the proposed type strain of the not yet validly published species 'Thermobaculum terrenum', which represents the type species of the not yet validly published genus 'Thermobaculum' [1]. The strain was cultivated from a moderately acidic (pH 3.9) extreme thermal soil in Yellowstone National Park (YNP), Wyoming (USA), and a thorough chemotaxonomic characterization was published by Botero et al. in 2004 [1]. Although the biological characteristics of the novel strain fulfill all criteria required for the type strain of a novel genus, the proposed name 'Thermobaculum terrenum' (= hot small rod belonging to earth/soil) has not yet been validly published (= included in one of the updates of the Validation List that is regularly published in Int J Syst Evol Microbiol). The obstacle is Rule 30 (3a) of the Bacteriological Code (1990 Revision), which requires that as of 1st January 2001 the description of a new species [...] must include the designation of a type strain, and a viable culture of that strain must be deposited in at least two publicly accessible service collections in different countries from which subcultures must be available [2]. Strain YNP1T is currently deposited only in two US culture collections. Here we present a summary classification and a set of features for 'T. terrenum' strain YNP1T, together with the description of the complete genomic sequencing and annotation.

Classification and features
Based on analyses of 16S rRNA gene sequences, strain YNP1T is the sole cultured representative of the genus 'Thermobaculum'. It has no close relatives among the validly described species within the Chloroflexi. The type strain of Sphaerobacter thermophilus [3] shares the highest pairwise similarity (84.9%), followed by Thermoleophilum album and Thermoleophilum minutum [4][5][6], the two sole members of the actinobacterial order Thermoleophilales [7], with 83.6% sequence identity, and three type strains from the clostridial genus Thermaerobacter (83.2-83.5%) [8], which are currently not placed within a named family. Only four uncultured bacterial clones in GenBank share a higher degree of sequence similarity with strain YNP1T than the type strain of the 'closest' related species, S. thermophilus. These are clone DRV-SSB031 from rock varnish in the Whipple Mountains, California (92.1%) [9], and clones AY6_14 (FJ891044), AY6_27 (FJ891057) and AY6_18 (FJ891048) from quartz substrates in the hyperarid core of the Atacama Desert (86.9-87.9%). No phylotypes from environmental screening or metagenomic surveys could be linked to 'T. terrenum', indicating a rather rare occurrence in the habitats screened thus far (as of September 2010). A representative genomic 16S rRNA sequence of 'T. terrenum' YNP1T was compared using BLAST with the most recent release of the Greengenes database [10], and the relative frequencies of taxa and keywords, weighted by BLAST scores, were determined. The three most frequent genera were Thermobaculum (81.2%), Sphaerobacter (10.3%) and Conexibacter (8.4%). The five most frequent keywords within the labels of environmental samples which yielded hits were 'microbial' (3.6%), 'waste' (3.3%), 'soil' (3.3%), 'simulated' (3.2%) and 'level' (3.1%).
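The score weighting described above can be illustrated with a minimal sketch. It assumes that each BLAST hit contributes its score to the taxon (or keyword) it is labeled with and that the reported percentage is that label's share of the summed scores; the exact procedure used for the Greengenes comparison is not spelled out in the text. The function name `weighted_frequencies` and the example hits are hypothetical and are not the data behind the percentages reported here.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return {label: percent share of the summed BLAST scores}.

    `hits` is an iterable of (label, score) pairs, one per BLAST hit.
    Assumption: a label's weight is simply the sum of its hit scores.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * s / grand_total for label, s in totals.items()}


# Hypothetical (genus, score) pairs, for illustration only.
example_hits = [
    ("Thermobaculum", 1000.0),
    ("Thermobaculum", 900.0),
    ("Sphaerobacter", 300.0),
    ("Conexibacter", 200.0),
]

shares = weighted_frequencies(example_hits)
for genus, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{genus}: {pct:.1f}%")
```

Weighting by score rather than counting hits means that a few strong matches can outweigh many weak ones, which is one plausible reason to report frequencies this way.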
The five most frequent keywords within the labels of environmental samples which yielded hits of a higher score than the highest scoring species were 'soil' (4.5%), 'structure' (3.3%), 'simulated' (3.2%), 'level/site/waste' (2.9%) and 'core' (2.1%). Figure 1 shows the phylogenetic neighborhood of 'T. terrenum' strain YNP1T in a 16S rRNA based tree. The sequences of the two identical 16S rRNA gene copies in the genome do not differ from the previously published 1,333 nt long partial sequence generated from ATCC BAA-798 (AF391972).
Figure 1. Phylogenetic tree highlighting the position of 'T. terrenum' strain YNP1T relative to the type strains of the other species within the phylum Chloroflexi. The tree was inferred from 1,316 aligned characters [11,12] of the 16S rRNA gene sequence under the maximum likelihood criterion [13] and rooted in accordance with the current taxonomy. The branches are scaled in terms of the expected number of substitutions per site. Numbers above the branches are support values from 1,000 bootstrap replicates [14] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [15] are shown in blue, published genomes [16] and GenBank records [CP000804, CP000875, CP000909, CP001337] in bold, e.g. the GEBA genome S. thermophilus [17].
The cells of strain YNP1T are non-motile rods measuring 1-1.5 × 2-3 μm (Figure 2 and Table 1), enveloped by a thick cell wall external to a cytoplasmic membrane [1]. YNP1T cells occur singly or in pairs, stain Gram-positive in the exponential growth phase, are obligately aerobic, and are non-sporeforming [1]. Colonies are pink-colored, and growth occurs best at pH 6-8 (pHopt 7) and 67°C, with a possible temperature range of 41-75°C [1]. Culture doubling time at Topt was 4 hours and increased sharply above 70°C, whereas growth at the temperature extremes was relatively poor [1]. Cells grow best in complex media containing 0.5% NaCl and yeast extract (for growth factors) [1], but also grow on sucrose, fructose, glucose, ribose, xylose, sorbitol, and xylitol [1]. Strain YNP1T was positive for catalase, urease, and nitrate reduction, but tested negative for oxidases, and was also negative for fermentation of glucose or lactose [1]. No anaerobic growth was observed in the presence of sulfate, nitrate, ferric iron, or arsenate as possible electron acceptors [1]. No chemolithoautotrophic growth was observed in an experimental matrix that included the electron donors H2, H2S, or S0 with oxygen as the electron acceptor. Surprisingly, the in vitro pH optimum of strain YNP1T (pH 7) is much higher than the pH of the soil from which it was isolated (pH 4-5) [1]. In pure culture, strain YNP1T failed to grow at such low pH values, suggesting that the thermal soil habitat is not optimal for the strain [1].

Chemotaxonomy
Murein is present in large amounts, which is consistent with the observed thick (approximately 34 nm) cell walls [1]. The muramic acid content of strain YNP1T was roughly one quarter of that measured for Bacillus subtilis, but almost 40-fold greater than in E. coli [1]. Lipopolysaccharide (LPS) was not detected [1]. Major fatty acids were dominated by straight and branched chain saturated acids: C18:0 (27.0%); iso-C17:0 (11.6%); iso-C19:0 (12.9%); anteiso-C18:0 (12.5%); C20:0 (16.5%) and C19:0 (6.6%). The pink pigment associated with strain YNP1T exhibited significant absorption at wavelengths of 267, 326, 399, 483, 511, and 549 nm [1].
Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [25]. If the evidence code is IDA, then the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing and annotation
Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position [26], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [27]. The genome project is deposited in the Genome OnLine Database [15] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
'T. terrenum' strain YNP1T, ATCC BAA-798, was grown in ATCC medium 1981 (M-R2A medium) [28] at 60°C. The culture used to prepare genomic DNA (gDNA) for sequencing was only two transfers from the original deposit. The purity of the culture was determined by growth on general maintenance media under both aerobic and anaerobic conditions. Cells were harvested after 24 hours by centrifugation, and gDNA was extracted from lysozyme-treated cells using CTAB and phenol-chloroform. The purity, quality and size of the bulk gDNA preparation were assessed according to DOE-JGI guidelines. Amplification and partial sequencing of the 16S rRNA gene confirmed the isolate as 'T. terrenum'. The quantity of the DNA was determined on a 1% agarose gel using mass markers of known concentration supplied by JGI. The average fragment size of the purified gDNA was determined to be ~43 kb by pulsed-field gel electrophoresis.

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website. Pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 3,926 overlapping fragments of 1,000 bp and entered into assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores with modifications to account for overlap redundancy and adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the parallel phrap assembler (High Performance Software, LLC). Possible misassemblies were corrected with Dupfinisher or transposon bombing of bridging clones [29]. A total of 432 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. Illumina reads were used to improve the final consensus quality using an in-house developed tool (the Polisher [30]). The error rate of the completed genome sequence is less than 1 in 100,000. Together, the combination of the Sanger and 454 sequencing platforms provided 10.0× coverage of the genome. The final assembly contains 32,920 Sanger reads.
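The shredding of large Newbler contigs into overlapping 1,000 bp pseudo-reads can be sketched as follows. This is only an illustration of the idea, not the JGI implementation: the text gives the fragment length but not the amount of overlap, so the 500 bp step used here is an assumption, and the function name `shred_contig` is hypothetical.

```python
def shred_contig(sequence, fragment_len=1000, step=500):
    """Split a contig into overlapping fragments ("pseudo-reads").

    fragment_len: fragment size in bases (1,000 bp in the text).
    step: distance between fragment starts; ASSUMED value, since the
          overlap is not stated. step < fragment_len gives overlap.
    """
    if not 0 < step < fragment_len:
        raise ValueError("step must be positive and smaller than fragment_len")
    if len(sequence) <= fragment_len:
        # Contig is too short to shred; keep it as a single pseudo-read.
        return [sequence]
    last_start = len(sequence) - fragment_len
    starts = list(range(0, last_start + 1, step))
    # Add a final fragment so the end of the contig is always covered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [sequence[s:s + fragment_len] for s in starts]


# Toy contig of 2,300 bases (real contigs would be much longer).
toy_contig = "ACGT" * 575
fragments = shred_contig(toy_contig)
print(len(fragments), "pseudo-reads of", len(fragments[0]), "bp")
```

Feeding such fragments into a Sanger-oriented assembler lets the pyrosequencing consensus take part in the hybrid assembly as if it were a set of ordinary reads.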

Genome annotation
Genes were identified using Prodigal [31] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [32]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [33].
Standards in Genomic Sciences

Genome properties
The genome consists of two chromosomes: the low G+C (48%) chromosome 1, which is 2,026,947 bp long, and the high G+C (64%) chromosome 2, which is 1,074,634 bp long (Table 3, Figure 3, Figure 4). Of the 2,930 genes predicted (1,935 on chromosome 1 and 995 on chromosome 2), 2,872 were protein-coding genes and 58 were RNA genes; 41 pseudogenes were also identified. The majority of the protein-coding genes (73.4%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
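As a quick consistency check, the per-chromosome figures above can be combined into genome-wide totals. The sketch below uses only numbers stated in this section. The overall G+C content is a length-weighted mean of the two rounded per-chromosome percentages, so it is an approximation rather than a value reported in the text.

```python
# Values taken from the genome properties paragraph above.
chromosomes = {
    "chromosome 1": {"length_bp": 2_026_947, "gc_percent": 48.0, "genes": 1_935},
    "chromosome 2": {"length_bp": 1_074_634, "gc_percent": 64.0, "genes": 995},
}
protein_coding = 2_872
rna_genes = 58

total_bp = sum(c["length_bp"] for c in chromosomes.values())
total_genes = sum(c["genes"] for c in chromosomes.values())

# Length-weighted mean G+C; approximate because the inputs are rounded.
overall_gc = (
    sum(c["length_bp"] * c["gc_percent"] for c in chromosomes.values()) / total_bp
)

print(f"total length: {total_bp:,} bp")
print(f"total genes: {total_genes:,} "
      f"(protein-coding + RNA = {protein_coding + rna_genes:,})")
print(f"approximate overall G+C: {overall_gc:.1f}%")
```

The two chromosome lengths sum to the 3,101,581 bp genome size given in the abstract, and the per-chromosome gene counts sum to the 2,930 predicted genes.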