Complete genome sequence of Hydrogenobacter thermophilus type strain (TK-6T)

Hydrogenobacter thermophilus Kawasumi et al. 1984 is the type species of the genus Hydrogenobacter. H. thermophilus was the first obligate autotrophic organism reported among aerobic hydrogen-oxidizing bacteria. Strain TK-6T is of interest because of the unusually efficient hydrogen-oxidizing ability of this strain, which results in a faster generation time compared to other autotrophs. It is also able to grow anaerobically using nitrate as an electron acceptor when molecular hydrogen is used as the energy source, and able to aerobically fix CO2 via the reductive tricarboxylic acid cycle. This is the fifth completed genome sequence in the family Aquificaceae, and the second genome sequence determined from a strain derived from the original isolate. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 1,742,932 bp long genome with its 1,899 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain TK-6T (= DSM 6534 = JCM 7687 = NBRC 102181) is the type strain of Hydrogenobacter thermophilus, which in turn is the type species of the genus Hydrogenobacter [1]. Currently, there are four species in the genus Hydrogenobacter, one of which has subsequently been reclassified as Hydrogenobaculum acidophilum. Strain TK-6T was isolated by Kawasumi in 1980 [2]. The genus name Calderobacterium Kryukov et al. 1984 is, based on page priority, a later heterotypic synonym of Hydrogenobacter Kawasumi et al. 1984 [3], because the type strains of H. thermophilus and Calderobacterium hydrogenophilum have similar genetic, phenotypic and biochemical properties. Despite the relatively high degree of 16S rRNA gene sequence similarity between the two species, DNA-DNA hybridization [4] indicates that they may be considered to be different species within the genus.

Classification and features
The 16S rRNA gene sequence of strain TK-6T (Z30214) shows the highest degree of sequence identity, 97%, to the type strain of H. hydrogenophilus [6]. Further analysis shows 96% 16S rRNA gene sequence identity with an uncultured Aquificales bacterium clone pKA (AF453505) from a near-neutral thermal spring in Kamchatka, Russia. The single genomic 16S rRNA sequence of H. thermophilus was compared with the most recent release of the Greengenes database [13] using NCBI BLAST under default values, and the relative frequencies of taxa and keywords, weighted by BLAST scores, were determined. The five most frequent genera were Hydrogenobacter (52.4%), Thermocrinis (18.8%), Aquifex (10.3%), Sulfurihydrogenibium (6.2%) and Hydrogenivirga (5.7%). Regarding hits to sequences from other members of the genus, the average identity within HSPs (high-scoring segment pairs) was 96.1%, whereas the average coverage by HSPs was 93.5%. The species yielding the highest score was H. hydrogenophilus. The five most frequent keywords within the labels of environmental samples which yielded hits were 'hot' (6.5%), 'yellowstone' (5.8%), 'spring' (5.6%), 'national/park' (5.4%) and 'microbial' (3.9%). These keywords are consistent with what is known of the ecology and physiology of strain TK-6T [1,2]. The two most frequent keywords within the labels of environmental samples which yielded hits of a higher score than the highest scoring species were 'aquificales' (34.1%) and 'hot/spring' (32.9%). Figure 1 shows the phylogenetic neighborhood of H. thermophilus TK-6T in a 16S rRNA based tree. The sequence of the single 16S rRNA gene in the genome differs by one nucleotide from the previously published 16S rRNA sequence (Z30214), which contains 31 ambiguous base calls. Cells of strain TK-6T are Gram-negative, nonmotile straight rods of 0.3 to 0.5 µm by 2.0 to 3.0 µm, occurring singly or in pairs [1] (Figure 2 and Table 1).
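The score-weighted tally of taxa described above can be illustrated with a minimal sketch. The text does not specify the exact weighting scheme, so the sketch assumes that each BLAST hit contributes its score to its taxon label and that totals are normalized by the sum of all scores. The function name and the example hits are hypothetical and are not the data behind the percentages reported here.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return {label: percent of total score} for an iterable of (label, score) hits.

    Assumption: a hit's weight is simply its BLAST score; the original
    analysis may have used a different weighting.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * s / grand_total for label, s in totals.items()}


# Hypothetical (genus, score) pairs, for illustration only.
example_hits = [
    ("Hydrogenobacter", 600.0),
    ("Hydrogenobacter", 400.0),
    ("Thermocrinis", 500.0),
    ("Aquifex", 500.0),
]

frequencies = weighted_frequencies(example_hits)
for genus, pct in sorted(frequencies.items(), key=lambda kv: -kv[1]):
    print(f"{genus}: {pct:.1f}%")
```

The same function could be applied to keywords extracted from the labels of environmental samples instead of genus names.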
Molecular oxygen is used as an electron acceptor for respiratory metabolism [1]. However, strain TK-6T can grow anaerobically on nitrate as an electron acceptor when molecular hydrogen is used as an energy source [33]. Strain TK-6T does not form colonies on agar plates, but does form colonies on plates solidified with GELRITE, a polysaccharide produced by Pseudomonas species [34]. The optimal temperature for autotrophic growth on H2-O2-CO2 was between 70°C and 75°C, with no growth observed at 37°C or 80°C [1]. A neutral pH of 7.2 was suitable for growth of strain TK-6T [1]. One important feature of strain TK-6T is a generation time that is about 1 h shorter than that of other autotrophs, suggesting that this strain has an efficient hydrogen-oxidizing ability [35]. No spore formation was observed [1]. Strain TK-6T assimilates carbon dioxide via the reductive tricarboxylic acid cycle [10,36,37]. This is also true when strain TK-6T grows anaerobically on nitrate [10]. Cytochromes b and c were found in strain TK-6T [1]. Interestingly, cytochrome c552 of H. thermophilus TK-6T is extremely thermostable and can restore its conformation even after being autoclaved for 10 minutes at 121°C [30]. One of the denitrification enzymes of strain TK-6T, cytochrome cd1 nitrite reductase, has been isolated and analyzed [38]. The optimum temperature for the activity of this enzyme was found to range from 70°C to 75°C [38]. Moreover, this enzyme was found to be of the heme cd1-type [33]. Ammonium and nitrate were utilized as nitrogen sources [1,33], but urea and N2 were not. Growth was inhibited by nitrite [1]. Nitrate reduction and peroxidase were positive, while urease was negative [1].
Strain TK-6T could not utilize any of the following as sole sources of energy or carbon: glucose, fructose, galactose, maltose, sucrose, xylose, raffinose, L-rhamnose, D-mannose, D-trehalose, mannitol, starch, formate, acetate, propionate, pyruvate, succinate, malate, citrate, fumarate, maleate, glycolate, gluconate, DL-lactate, α-ketoglutarate, p-hydroxybenzoate, DL-polyhydroxybutyrate, betaine, methanol, ethanol, methylamine, dimethylamine, trimethylamine, glycine, L-glutamate, L-aspartate, L-serine, L-leucine, L-valine, L-tryptophan, L-histidine, L-alanine, L-lysine, L-proline, L-arginine, nutrient broth, yeast extract-malt extract medium, and brain heart infusion [1]. Strain TK-6T showed no growth under an atmosphere containing 90% CO, 5% CO2, and 5% O2 [1]. No heterotrophic growth was observed in the presence of glucose, fructose, pyruvate, citrate, α-ketoglutarate, succinate, fumarate, malate, acetate, and ethanol, with or without yeast extract or carbon dioxide, at different concentrations (0.02, 0.05, and 0.1% wt/vol) [1]. H. thermophilus TK-6T was recently reported to grow on formate and formamide [39]. Malate dehydrogenase, isocitrate dehydrogenase and glucose-6-phosphate isomerase were also detected in strain TK-6T [1]. Enzymes of the reductive tricarboxylic acid cycle and some related enzymes were detected in cell-free extracts of strain TK-6T, and their specific activities were found to increase with temperature, the enzymes being more active at 70°C than at lower temperatures (50°C and 30°C) [10]. In H. thermophilus, ATP-dependent citrate cleavage is catalyzed by two enzymes, citryl-CoA synthetase and citryl-CoA lyase, which catalyze the ATP-dependent formation of citryl-CoA from citrate and CoA and the subsequent cleavage of citryl-CoA into acetyl-CoA and oxaloacetate, respectively [40,41].
The biochemistry of key enzymes of the reductive tricarboxylic acid cycle, such as fumarate reductase, ATP citrate lyase, pyruvate:ferredoxin oxidoreductase and 2-oxoglutarate:ferredoxin oxidoreductase, has been studied in some detail in strain TK-6T [10,37,42]. Strain TK-6T lacks some important enzyme activities in the central carbon metabolic pathways [43]. For example, activities of phosphofructokinase, pyruvate kinase and 6-phosphogluconate aldolase, which are key enzymes of the Embden-Meyerhof and the Entner-Doudoroff pathways, and activity of α-ketoglutarate dehydrogenase of the tricarboxylic acid cycle could not be detected in cell-free extracts of strain TK-6T [43]. This is in accord with the findings from the genome sequencing, where none of these genes were found in the genome. These metabolic deficits were considered to be partially responsible for the obligate autotrophy of strain TK-6T [44]. Activities of phosphoenolpyruvate synthetase and pyruvate carboxylase were also detected [10]. The reverse reactions (dehydrogenase reactions) of α-ketoglutarate synthase and pyruvate synthase could be detected by using methyl viologen as an electron acceptor [10]. Cloning experiments of the hydrogenase genes from strain TK-6T revealed that this strain has at least four clusters of hydrogenase genes [35]. Strain TK-6T assimilates ammonium using glutamine synthetase (GS type I) [45]. Anisomycin, cycloheximide and emetine (100 µg/ml each) do not inhibit protein biosynthesis, and therefore growth, of strain TK-6T [46]. However, the protein biosynthesis inhibitors streptomycin, kanamycin, chloramphenicol, erythromycin, oleandomycin and virginiamycin were found to suppress growth of strain TK-6T at concentrations below 20 µg/ml [46]. No growth was observed when cell wall synthesis inhibitors (D-cycloserine, fosfomycin, cephalosporin C, penicillin G, oxacillin and ampicillin) were used, even at concentrations below 20 µg/ml [46].
Strain TK-6T could grow in the presence of monensin, lasalocid, valinomycin, nonactin and polymyxin B [46].
Investigations of the polar lipids have shown that they comprise phosphatidylglycerol, phosphatidylinositol, phosphatidylaminopentantetrol and a small amount of an unidentified phospholipid. The sum of these chemotaxonomic features appears to be characteristic of members of the genus Hydrogenobacter, with features such as the presence of methionaquinone, a polar lipid pattern containing phosphatidylglycerol, phosphatidylinositol and phosphatidylaminopentantetrol, and the presence of C18:0 and C20:1 fatty acids being taxonomic and evolutionary markers for at least members of the genera Hydrogenobacter, Hydrogenobaculum, Aquifex and Thermocrinis. This has been discussed in a previous SIGS paper [50].
Figure 1. ... [14,15] of the 16S rRNA gene sequence under the maximum likelihood criterion [16] and rooted in accordance with the current taxonomy [17]. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1,000 bootstrap replicates [18] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [19] are shown in blue, published genomes in bold [12,20,21].
Table 1 (footnote). Altitude not reported. Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [32]. If the evidence code is IDA, then the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position [51], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [52]. The genome project is deposited in the Genome On Line Database [19] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
H. thermophilus TK-6T, DSM 6534, was grown in DSMZ medium 533 (Thermophilic hydrogen bacteria medium) [53] with 5% oxygen at 72°C. DNA was isolated from 0.5-1 g of cell paste using the Qiagen Genomic 500 DNA Kit (Qiagen, Hilden, Germany) following the standard protocol as recommended by the manufacturer. DNA is available through the DNA Bank Network [54].

Genome annotation
Genes were identified using Prodigal [60] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [61]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [62].

Genome properties
The genome consists of a 1,742,932 bp long chromosome with a 44.0% GC content (Table 3 and Figure 3). Of the 1,948 genes predicted, 1,899 were protein-coding genes and 49 were RNA genes; thirty pseudogenes were also identified. The majority of the protein-coding genes (97.5%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.

Insights into the genome
While the sequencing of the genome described in this paper was underway, Arai et al. from the University of Tokyo published the first version of the H. thermophilus TK-6T genome [19, AP011112]. We take the opportunity to compare the two completed genome sequences, because the history of the two strains designated TK-6T might differ since the original isolation of the strain by Kawasumi et al. [1], more than 25 years ago. The first of the two genomes was published by a team of researchers located at the same place where the strain was originally analyzed, with Yasuo Igarashi participating in both the original description of the strain and the genome analysis. According to a personal communication from Dr. Arai Hiroyuki (lead author of [19]), the genome was sequenced from clone and fosmid libraries generated from a strain subcultured in the lab since the time of the initial isolation. A fresh culture of the strain from JCM was used for final gap filling and error checking. The DSM 6534 version of the genome was generated from cryopreserved material, which DSMZ received in 1991 from Tohru Kodama of the University of Tokyo, and the strain has been preserved by storage in liquid nitrogen since its accession. A comparison of the two TK-6T genomes using the genome-to-genome-distance calculation [63-65] in conjunction with NCBI-BLASTN yielded a distance of 0.0001 with formula 1, 0.0100 with formula 2 and 0.0101 with formula 3. That is, 99.99% of the total genome length was covered by HSPs, 99.0% of the positions within the HSPs held identical bases, and 98.99% of the total genome length corresponded to such identical base pairs within HSPs. The synteny of the two TK-6T genome sequences based on a DNA blot was confirmed (data not shown), whereas Table 5 provides a comparison of the basic genome statistics.
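The percentages quoted for the three formulas follow from the reported distances by taking 100 × (1 − distance). The short sketch below reproduces that arithmetic from the values given in the text; the function name is illustrative, and the final consistency check assumes the three quantities are defined exactly as the text describes them (coverage by HSPs, identity within HSPs, and identities over total genome length).

```python
# Distances reported in the text for the two TK-6T genome sequences.
distances = {"formula 1": 0.0001, "formula 2": 0.0100, "formula 3": 0.0101}


def distance_to_percent(distance):
    """Convert a distance in [0, 1] to a percentage, 100 * (1 - distance)."""
    if not 0.0 <= distance <= 1.0:
        raise ValueError("distance must lie between 0 and 1")
    return round(100.0 * (1.0 - distance), 2)


percentages = {name: distance_to_percent(d) for name, d in distances.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")

# Consistency check under the stated assumption: the fraction of the genome
# made up of identical bases within HSPs should be close to
# (coverage by HSPs) * (identity within HSPs).
coverage = 1.0 - distances["formula 1"]
identity = 1.0 - distances["formula 2"]
product_percent = round(100.0 * coverage * identity, 2)
print(f"coverage x identity: {product_percent:.2f}%")
```

Under that assumption the product of the first two values agrees with the third to two decimal places, which is what one would expect if the three formulas measure coverage, identity and their combination.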
The Japanese strain has 1,868 (out of 1,893) protein-coding genes identical to those of the DSMZ strain, which is 98.7% of its protein-coding genes. This means there are 25 genes in the Japanese strain that are not in the DSMZ strain; all except L34P are hypothetical genes. L34P is, however, present in the version of the genome presented in this paper, but was missed during ORF calling/annotation. We also identified 24 genes in the genome sequenced from the DSMZ strain that were missing in the Arai et al. strain. Most of these were again hypothetical genes. The abundance profiles for both genomes were almost identical, with glycosyltransferase (COG0438) being the most frequent gene in both versions (eleven copies), followed by an outer membrane protein (COG1538) with seven copies in each. The DSM 6534 genome contains seven copies of transposase IS605 OrfB (COG0675), whereas the Tokyo genome contains five copies.
Standards in Genomic Sciences