Complete genome sequence of Thermocrinis albus type strain (HI 11/12T)

Thermocrinis albus Eder and Huber 2002 is one of three species in the genus Thermocrinis in the family Aquificaceae. Members of this family have attracted significant interest because of their involvement in global biogeochemical cycles in high-temperature ecosystems, and this interest had already spurred several genome sequencing projects for members of the family. We here report the first completed genome sequence of a member of the genus Thermocrinis and the first type strain genome from a member of the family Aquificaceae. The 1,500,577 bp long genome with its 1,603 protein-coding and 47 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain HI 11/12T (= DSM 14484 = JCM 11386) is the type strain of the species Thermocrinis albus [1]. Officially, the genus Thermocrinis currently contains three species [2]. It should be noted, however, that at the time of writing the 16S rDNA sequence of the type strain of Thermocrinis ruber held in the DSMZ open collection as DSM 12173 does not correspond with that published under AJ005640. The generic name derives from the Greek word 'therme', heat, and the Latin word 'crinis', hair, thus meaning 'hot hair', and refers to the long, hair-like filamentous cell structures found in high-temperature environments such as hot-spring outlets [3]. These long filaments form under conditions of continuous medium flow. The species name is derived from the Latin word 'albus', white, referring to the cell color [1]. Strain HI 11/12T was isolated from whitish streamers in Hveragerdi, Iceland [3]. Other strains of the species have been isolated from further high-temperature habitats in Iceland, but also in Kamchatka, Russia [1]. Members of the genus Thermocrinis appear to play a major ecological role in global biogeochemical cycles in such high-temperature habitats [4][5][6][7]. As currently defined, the genus does not appear to form a monophyletic group, suggesting that further taxonomic work is necessary. The strong interest in the involvement of members of the family Aquificaceae in global biogeochemical cycles in high-temperature ecosystems made them attractive targets for early genome sequencing, e.g. 'Aquifex aeolicus' [8], whose genome was decoded as early as 1998 as only the third genome of a hyperthermophile [9]. However, 'A. aeolicus' strain VF5 (a name that was never validly published) [10], Hydrogenobaculum sp. Y04AAS1 (CP001130, JGI unpublished) and Hydrogenivirga sp. 128-5-R1-1 (draft, Moore Foundation) do not represent type strains. Here we present a summary classification and a set of features for T. albus HI 11/12T, together with the description of the complete genomic sequencing and annotation.

Classification and features
Only four cultivated strains other than HI 11/12T have been reported for the species T. albus. Three of them originate from Iceland and share 98.9-99.7% 16S rRNA sequence identity with HI 11/12T: strains H7L1B and G3L1B, isolated by the same team that isolated HI 11/12T [1], and strain SRI-48 (AF255599) from hot spring microbial mats [11]. The only non-Icelandic isolate, UZ23L3A (99.2%), originates from Kamchatka (Russia) [1]. Almost all uncultured clones also originate from Iceland: clones KF6 and HV-7 (GU233821 and GU233840, >99%) from water-saturated sediment in the Krafla and Hveragerdi geothermal systems, respectively; clones GY1-1 and GY1-2 (GU233809, GU233812, >99%) from water-saturated sediment of the Geysir hot springs; clone SUBT-1 (AF361217, 99.2%) from subterranean hot springs [12]; and clone PIce1 (AF301907, 99.3%), the dominant clone of a blue filament community from a thermal spring. Only clone PNG_TB_4A2.5H2_B11 (EF100635, 95.9%) originated from a non-Icelandic source: a heated, arsenic-rich sediment of a shallow submarine hydrothermal system on Ambitle Island, Papua New Guinea. According to the original publication, the 16S rRNA of the type strain of the closest related species within the genus, T. ruber [3], shares 95.2% sequence identity, whereas the type strains of the closest related genus, Hydrogenobacter, share 94.7-95.0% sequence identity, as determined with EzTaxon [13]. However, as noted above, the 16S rDNA sequence of the T. ruber strain held in the DSMZ (DSM 12173) does not correspond with the deposited sequence (AJ005640). Environmental samples and metagenomic surveys in the NCBI database do not contain a single sequence with >88% sequence identity (as of February 2010), indicating that the species T. albus might play a rather limited and regional role in the environment. Figure 1 shows the phylogenetic neighborhood of T. albus HI 11/12T in a 16S rRNA based tree.
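The identity values quoted above are pairwise percentages over aligned 16S rRNA sequences. As an illustration only, the sketch below computes such a value for two pre-aligned sequences. It assumes '-' as the gap symbol and skips every column in which either sequence has a gap or an ambiguous (non-ACGT) base call; this is one common convention and not necessarily the scoring used by EzTaxon [13]. The function name is hypothetical.

```python
# Hypothetical helper: pairwise percent identity of two pre-aligned 16S rRNA
# sequences. Assumption: '-' marks a gap, and any column in which either
# sequence has a gap or an ambiguous (non-ACGT) call is skipped.
UNAMBIGUOUS = frozenset("ACGT")


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of comparable aligned columns at which the two bases match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS:  # skip gaps and N, R, Y, ...
            compared += 1
            if a == b:
                matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0 (9 of 10 columns match)
```

Excluding ambiguous calls rather than counting them as mismatches matters for older database entries such as AJ278895, which contains ambiguous positions.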
The sequence of the single 16S rRNA gene copy in the genome differs by seven nucleotides from the previously published 16S rRNA sequence generated from DSM 14484 (AJ278895), which contains two ambiguous base calls.
Figure 1. Phylogenetic tree highlighting the position of T. albus HI 11/12T relative to the other type strains within the family Aquificaceae. The tree was inferred from 1,439 aligned characters [14,15] of the 16S rRNA gene sequence under the maximum likelihood criterion [16] and rooted in accordance with the current taxonomy. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 250 bootstrap replicates [17] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [18] are shown in blue, published genomes in bold. Note that the sequence AJ005640 does not correspond with that of the type strain of T. ruber deposited as DSM 12173.
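Bootstrap support values such as those in Figure 1 are obtained by re-inferring the tree on pseudo-replicate alignments whose columns are resampled with replacement. The sketch below shows only that resampling step and the ">60%" display rule; the maximum likelihood inference itself [16] is not reproduced, and the function names and toy alignment are hypothetical.

```python
import random


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap pseudo-replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_cols = lengths.pop()
    # Draw n_cols column indices with replacement; every taxon uses the same draw,
    # so each resampled column is an intact column of the original alignment.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def displayed_support(hits: int, replicates: int = 250, threshold: float = 60.0):
    """Bootstrap support in percent, or None if not above the display threshold."""
    support = 100.0 * hits / replicates
    return support if support > threshold else None


rng = random.Random(42)
toy = {"taxon_a": "ACGTAC", "taxon_b": "ACGAAC", "taxon_c": "TCGAAC"}  # hypothetical
replicates = [bootstrap_replicate(toy, rng) for _ in range(250)]
print(displayed_support(200))  # branch recovered in 200 of 250 replicates: 80.0
```

For the tree in Figure 1, each of the 250 replicates would be drawn from the 1,439-column alignment, a tree inferred from each, and the support of a branch taken as the percentage of replicate trees containing it.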
When grown in the laboratory under a continuous flow of medium, for example in a glass chamber [1], strain HI 11/12T forms filaments 10-60 µm long [1]. In static culture, the cells occur singly or in pairs [1]. The cells are short rods, 0.5-0.6 µm wide and 1-3 µm long, and are motile by means of a monopolar monotrichous flagellum [1] (Figure 2 and Table 1); however, no flagella are visible in Figure 2. A regularly arrayed surface layer protein was not observed [1]. Strain HI 11/12T is microaerophilic, using oxygen as electron acceptor, and appears to be strictly chemolithoautotrophic [1]. This distinguishes T. albus from its two sister species T. ruber and T. minervae, which can both also grow chemoorganoheterotrophically [3,24]. Strain HI 11/12T grows optimally under microaerophilic conditions when hydrogen and sulfur are present simultaneously as electron donors [1]; no growth is observed on nitrate. Physiological characteristics such as the wide temperature range for growth are reported in Table 1.

Chemotaxonomy
The cell wall of strain HI 11/12T contains meso-diaminopimelic acid [1]. There are no reports on the presence of a lipopolysaccharide (LPS) in its typical Gram-negative cell wall, although an LPS has been reported in Aquifex pyrophilus [27,28]. A major function of cellular polyamines is to stabilize the structure of cellular nucleic acids, and in thermophilic eubacteria they may serve as important chemotaxonomic markers [29]. In the genus Thermocrinis, the major polyamines are spermidine and a quaternary branched penta-amine, N4-bis(aminopropyl)norspermidine [29].
T. albus belongs to a group of organisms in which characteristic sulfur-containing naphthoquinones, the menathioquinones (2-methylthio-1,4-naphthoquinones), are present [31][32][33][34][35]. The polar lipids reported in members of the genera Aquifex, Hydrogenobaculum, Hydrogenothermus and Thermocrinis are also characteristic, with an unusual aminopentanetetrol phospholipid derivative present in all strains examined [36,37]. Where detailed analyses have been carried out, phosphatidylinositol has also been reported [37]. Stöhr et al. labeled these lipids PNL and PL1, respectively [31].
Note to Table 1: the evidence code NAS (Non-traceable Author Statement) denotes a property not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or on anecdotal evidence. These evidence codes are from the Gene Ontology project [26]. If the evidence code is IDA, the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing and annotation
Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position and is part of the Genomic Encyclopedia of Bacteria and Archaea project [38]. The genome project is deposited in the Genome OnLine Database [18], and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.
… with a modified protocol for cell lysis, using an additional 5 µl of mutanolysin in the standard lysis solution and a one-hour incubation on ice after the MPC step.

Genome sequencing and assembly
The genome of strain HI 11/12T was sequenced using a combination of Illumina [40] and 454 technologies. An Illumina GAii shotgun library yielding 447 Mb of reads, a 454 Titanium draft library with an average read length of 287 bases, and a paired-end 454 library with an average insert size of 17 kb were generated for this genome. All general aspects of library construction and sequencing can be found at http://www.jgi.doe.gov/. Illumina sequencing data were assembled with VELVET [41], and the consensus sequences were shredded into 1.5 kb overlapping fake reads and assembled together with the 454 data. Draft assemblies were based on 79 Mb of 454 draft data. Newbler parameters were -consed -a 50 -l 350 -g -m -ml 20. The initial assembly contained six contigs in one scaffold. We converted the initial 454 assembly into a phrap assembly by making fake reads from the consensus and collecting the read pairs in the 454 paired-end library. The Phred/Phrap/Consed software package was used for sequence assembly and quality assessment in the subsequent finishing process. After the shotgun stage, reads were assembled with parallel phrap. Possible mis-assemblies were corrected with gapResolution (unpublished), Dupfinisher, or by sequencing cloned bridging PCR fragments with sub-cloning or transposon bombing [42]. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks (J-F. Chan, unpublished). A total of 68 additional reactions were necessary to close gaps and to raise the quality of the finished sequence. The completed genome sequence has an error rate of less than 1 in 100,000 bp.
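Two of the steps above can be made concrete: shredding a consensus sequence into overlapping 1.5 kb fake reads, and converting the reported data yields into approximate fold coverage of the 1,500,577 bp genome. This is a minimal sketch under stated assumptions. The 500 bp step between fake reads (1,000 bp overlap) is an assumed value, since the overlap used at JGI is not given. The coverage is a naive total-bases/genome-size estimate that treats the 447 Mb and 79 Mb figures as total sequenced bases.

```python
def shred(consensus: str, read_len: int = 1500, step: int = 500) -> list[str]:
    """Cut a consensus into overlapping fake reads of read_len bases.

    step=500 (1,000 bp overlap) is an assumed value, not taken from the paper.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    if len(consensus) <= read_len:
        return [consensus]
    starts = list(range(0, len(consensus) - read_len + 1, step))
    if starts[-1] + read_len < len(consensus):
        starts.append(len(consensus) - read_len)  # make the last read reach the 3' end
    return [consensus[s:s + read_len] for s in starts]


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Naive coverage estimate: total sequenced bases divided by genome length."""
    return total_bases / genome_size


GENOME_SIZE = 1_500_577
print(round(fold_coverage(447_000_000, GENOME_SIZE), 1))  # Illumina GAii: 297.9
print(round(fold_coverage(79_000_000, GENOME_SIZE), 1))   # 454 draft data: 52.6
```

Under these assumptions, the Illumina data correspond to roughly 298-fold and the 454 draft data to roughly 53-fold coverage of the finished chromosome.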

Genome annotation
Genes were identified using Prodigal [43] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [44]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [45].
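Translating the predicted CDSs before the database searches amounts to applying a genetic code codon by codon. The sketch below assumes NCBI translation table 11 (the bacterial, archaeal and plant plastid code). It reads a table-11 start codon such as GTG or TTG in first position as methionine and marks codons containing ambiguous bases as X. It is an illustration only, not the code used by Prodigal or the ORNL pipeline.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Initiation codons of NCBI translation table 11
TABLE11_STARTS = frozenset({"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"})


def translate_cds(cds: str) -> str:
    """Translate a CDS with table 11; a start codon in first position becomes Met."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    protein = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if i == 0 and codon in TABLE11_STARTS:
            protein.append("M")  # alternative start codons still encode Met
        else:
            protein.append(CODON_TABLE.get(codon, "X"))  # X: codon with N or other ambiguity
    if protein and protein[-1] == "*":
        protein.pop()  # drop the terminal stop codon
    return "".join(protein)


print(translate_cds("GTGAAATTTTAA"))  # MKF
```

Treating alternative start codons as Met matters for bacterial genomes, where GTG and TTG starts are common and a naive codon lookup would otherwise report Val or Leu as the first residue.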

Genome properties
The genome consists of a 1,500,577 bp long chromosome with a G+C content of 46.9% (Table 3 and Figure 3). Of the 1,650 genes predicted, 1,603 were protein-coding genes and 47 were RNA genes; 10 pseudogenes were also identified. The majority of the protein-coding genes (75.2%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
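The gene counts and base composition reported here can be checked with simple arithmetic. In the hypothetical sketch below, `gene_summary` verifies that protein-coding plus RNA genes add up to the total (as the abstract's 1,603 + 47 = 1,650 does) and reports their shares. `gc_content` computes G+C as a percentage of unambiguous A/C/G/T positions, one common convention. Applied to the chromosome sequence from GenBank, it would be expected to reproduce the reported 46.9% if the same convention was used.

```python
def gc_content(seq: str) -> float:
    """G+C as a percentage of unambiguous A/C/G/T positions."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gene_summary(total: int, protein_coding: int, rna: int) -> dict[str, float]:
    """Check that gene categories add up and return their shares in percent."""
    if protein_coding + rna != total:
        raise ValueError(f"{protein_coding} + {rna} != {total} genes")
    return {
        "protein_coding_pct": round(100 * protein_coding / total, 1),
        "rna_pct": round(100 * rna / total, 1),
    }


print(gene_summary(1650, 1603, 47))  # {'protein_coding_pct': 97.2, 'rna_pct': 2.8}
print(gc_content("GGCCAATT"))        # 50.0
```

A mismatch, for example a protein-coding count that does not sum with the RNA genes to the total, raises an error instead of silently producing percentages.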

Insights from the genome sequence
With only very few papers published on the organism [1,2], and only one gene sequence (16S rRNA) from strain HI 11/12T available in GenBank, there is little previously known sequence data for T. albus against which the novel genomic data presented here can be compared. As shown in Figure 1, no other type strain genomes from the Aquificaceae are presently available either, which precludes a meaningful comparative genomics analysis. This might change in the near future, when the genomes of other type or neotype strains of species within the genus Thermocrinis, which are also part of the Genomic Encyclopedia of Bacteria and Archaea project [38], become available.