Draft genome sequence of Gluconobacter thailandicus NBRC 3257

Gluconobacter thailandicus strain NBRC 3257, isolated from downy cherry (Prunus tomentosa), is a strictly aerobic, rod-shaped, Gram-negative bacterium. Here, we report the features of this organism, together with the draft genome sequence and annotation. The draft genome sequence is composed of 107 contigs totaling 3,446,046 bp with 56.17% G+C content and contains 3,360 protein-coding genes and 54 RNA genes.


Introduction
Acetic acid bacteria (AAB) are strictly aerobic Alphaproteobacteria. AAB are well known for their potential to incompletely oxidize a wide variety of sugars and alcohols. The genus Gluconobacter oxidizes a wide range of sugars, sugar alcohols, and sugar acids, and can accumulate a large amount of the corresponding oxidized products in the culture medium [1]. Thus, Gluconobacter strains are widely used for the industrial production of pharmaceutical intermediates, such as L-sorbose (vitamin C synthesis), 6-amino-L-sorbose (synthesis of the antidiabetic drug miglitol), and dihydroxyacetone (cosmetics for sunless tanning) [1]. Furthermore, the genera Acetobacter and Gluconacetobacter are widely used for the industrial production of vinegar because of their high ethanol oxidation ability [2]. To date, six genome sequences of Gluconobacter strains (Gluconobacter oxydans 621H, Gluconobacter oxydans H24, Gluconobacter oxydans WSH-003, Gluconobacter thailandicus NBRC 3255, Gluconobacter frateurii NBRC 101659, and Gluconobacter frateurii NBRC 103465) are available in the public databases [3][4][5][6][7][8]. These genomic data are useful for the experimental identification of unique proteins or estimation of the phylogenetic relationship among the related strains [9][10][11]. Gluconobacter thailandicus NBRC 3257 was isolated from downy cherry (Prunus tomentosa) in Japan [12], and identified based on its 16S rRNA sequence [13]. Here, we present a summary of the classification and a set of features of G. thailandicus NBRC 3257, together with a description of the draft genome sequencing and annotation.

Classification and features
A representative genomic 16S rRNA sequence of G. thailandicus NBRC 3257 was compared to the 16S rRNA sequences of the type strains of all known Gluconobacter species. The 16S rRNA gene sequence identities between G. thailandicus NBRC 3257 and the type strains of the other Gluconobacter species were 97.58-99.85%. The type strains exhibiting the highest sequence identities to NBRC 3257 were Gluconobacter frateurii NBRC 3264T and Gluconobacter japonicus NBRC 3271T. Figure 1 shows the phylogenetic relationships of G. thailandicus NBRC 3257 to other Gluconobacter species in a 16S rRNA-based tree. All the type strains and ten strains of G. thailandicus, including NBRC 3257, were used for the analysis [13,17]. Based on this tree, the genus Gluconobacter is divided into two sub-groups. Gluconobacter wancherniae, Gluconobacter cerinus, G. frateurii, G. japonicus, Gluconobacter nephelii, and G. thailandicus are classified as clade 1. On the other hand, Gluconobacter kondonii, Gluconobacter sphaericus, Gluconobacter albidus, Gluconobacter kanchanaburiensis, Gluconobacter uchimurae, Gluconobacter roseus, and Gluconobacter oxydans belong to clade 2. All ten G. thailandicus strains are closely related to each other, and their 16S rRNA sequences are 100% identical.
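The identity values quoted above are pairwise percentages over aligned 16S rRNA positions. The following is a minimal sketch of such a calculation, not the procedure used in this study: the function name `percent_identity` is hypothetical, the inputs are assumed to be two rows of an existing alignment (for example, from CLUSTALW), and columns containing a gap in either sequence are assumed to be excluded, which is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a pairwise or multiple alignment.

    Assumption: columns where either sequence has a gap ('-') are skipped,
    so the denominator is the number of gap-free aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns under the stated convention
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy example (not real 16S data): 8 gap-free columns, 7 of them identical.
toy_identity = percent_identity("ACGT-ACGT", "ACGTTACGA")  # 87.5
```

Different tools treat gaps and terminal overhangs differently, so values computed this way may deviate slightly from those reported by a given alignment package.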
Although ethanol oxidation ability is a typical feature of AAB, NBRC 3257 notably lacks the ability to oxidize ethanol because it is missing the cytochrome subunit of the alcohol dehydrogenase complex, which functions as the primary dehydrogenase in the ethanol oxidase respiratory chain [18]. Despite its inability to oxidize ethanol, NBRC 3257 can efficiently oxidize many unique sugars and sugar alcohols, such as pentitols, D-sorbitol, D-mannitol, glycerol, meso-erythritol, and 2,3-butanediol [19]. Thus, G. thailandicus NBRC 3257 has unique characteristic features and the potential for the industrial production of many different oxidized products useful as drug intermediates or commodity chemicals.
Figure 1. To construct the phylogenetic tree, these sequences were collected and nucleotide sequence alignment was carried out using CLUSTALW [14]. We used the MEGA version 5.05 package to generate phylogenetic trees based on 16S rRNA genes with the neighbor-joining (NJ) approach and 1,000 bootstrap replicates [15,16]. Acetobacter aceti NBRC 14818 (X74066) was used as the outgroup.
G. thailandicus NBRC 3257 is a strictly aerobic, mesophilic (temperature optimum ≈ 30°C) organism. A differential interference contrast image of G. thailandicus NBRC 3257 cells grown on mannitol medium (25 g of D-mannitol, 5 g of yeast extract, and 3 g of polypeptone per liter) is shown in Figure 2 (A). The cells are short rods, 2.6 ± 0.6 µm in length and 1.2 ± 0.1 µm in width (mean ± SD, n = 10). The flagella stained by the modified Ryu method are shown in Figure 2 (B) and Figure 2 (C) [20]. Singly and multiply flagellated cells were observed frequently. The characteristic features are shown in Table 1.
Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [33]. If the evidence code is IDA, then the property should have been directly observed, for the purpose of this specific publication, for a live isolate by one of the authors, or an expert or reputable institution mentioned in the acknowledgements.

Genome sequencing information

Genome project history
This genome was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the Gluconobacter genus. This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession BASM00000000. The version described in this paper is the first version, BASM01000000, and the sequence consists of 107 contigs. Table 2 presents the project information and its association with MIGS version 2.0 compliance [34].

Growth conditions and DNA isolation
The culture of strain NBRC 3257 used to prepare genomic DNA for sequencing was a laboratory stock grown on ∆P medium [35] at 30°C with vigorous shaking. The genomic DNA was isolated as described in [36] with some modifications [35]. Three ml of culture broth was used to isolate DNA, and the final DNA preparation was dissolved in 10 mM Tris-HCl (pH 8.0) and 1 mM ethylenediaminetetraacetic acid solution. The purity, quality, and size of the genomic DNA preparation were analyzed by Hokkaido System Science Co., Ltd. (Japan) using a spectrophotometer, agarose gel electrophoresis, and Qubit (Invitrogen, Carlsbad, CA) according to their guidelines.

Genome sequencing and assembly
The genome of G. thailandicus NBRC 3257 was sequenced on the Illumina HiSeq 2000 platform using the paired-end strategy (2×100 bp). Paired-end genome fragments were annealed to the flow-cell surface in a cluster station (Illumina). A total of 100 cycles of sequencing-by-synthesis were performed, and high-quality sequences were retained for further analysis. The final coverage reached 358-fold for an estimated genome size of 3.44 Mb. The sequence data from the Illumina HiSeq 2000 were assembled with Velvet ver. 1.2.07 [37]. The final assembly yielded 107 contigs with a total size of 3.44 Mb. The contigs were ordered against the complete genome of G. oxydans 621H [3] using Mauve [38][39][40].
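Fold coverage relates the amount of retained sequence to the genome size: coverage = total sequenced bases / genome size. As a rough illustration only, the sketch below uses the 358-fold coverage, 3.44 Mb estimated genome size, and 100 bp read length given above to back-calculate an approximate read count. The function names are hypothetical, and the resulting read count is an estimate implied by those figures, not a number reported for this project.

```python
import math


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def reads_needed(coverage: float, genome_size: int, read_length: int) -> int:
    """Approximate number of reads of a fixed length giving the stated coverage."""
    return math.ceil(coverage * genome_size / read_length)


GENOME_SIZE = 3_440_000  # estimated genome size from the text (3.44 Mb)
READ_LENGTH = 100        # bp per read (2x100 bp paired-end)
COVERAGE = 358           # fold coverage stated in the text

# Back-of-envelope estimate: about 12.3 million 100 bp reads.
estimated_reads = reads_needed(COVERAGE, GENOME_SIZE, READ_LENGTH)
```

Because quality filtering and trimming change the number and length of retained reads, this figure should be read as an order-of-magnitude check rather than an exact count.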

Genome annotation
Protein-coding genes (ORFs) in the draft genome assembly were predicted using Glimmer version 3.02 with a self-training dataset [41,42]. tRNAs and rRNAs were predicted using ARAGORN and RNAmmer, respectively [43,44]. Functional assignments of the predicted ORFs were based on a BLASTP homology search against two genome sequences, G. thailandicus NBRC 3255 and G. oxydans 621H, and the NCBI nonredundant (NR) database [45]. Functional assignment was also performed with a BLASTP homology search against the Clusters of Orthologous Groups (COG) database [46].

Genome properties
The genome of G. thailandicus NBRC 3257 is 3,446,046 bp long (107 contigs) with a 56.17% G+C content (Table 3). Of the 3,414 predicted genes, 3,360 were protein-coding genes and 54 were RNA genes (3 rRNA genes and 51 tRNA genes). A total of 2,249 genes (66.93%) were assigned a putative function. The remaining genes were annotated as hypothetical genes. The properties and statistics of the genome are summarized in Table 3. The distribution of genes into COG functional categories is presented in Table 4. Of the 3,360 proteins, 2,669 (79%) were assigned to COG functional categories. Of these, 245 proteins were assigned to multiple COG categories. The most abundant COG category was "General function prediction only" (342 proteins), followed by "Amino acid transport and metabolism" (247 proteins), "Function unknown" (232 proteins), "Cell wall/membrane/envelope biogenesis" (220 proteins), "Inorganic ion transport and metabolism" (210 proteins), and "Replication, recombination and repair" (201 proteins). The genome map of G. thailandicus NBRC 3257 is illustrated in Figure 3, which shows that, with some exceptions, the GC skew shifts from negative to positive along the ordered set of contigs. This pattern suggests that the contigs of the draft genome were ordered almost correctly.
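G+C content and GC skew are simple base-composition statistics. The sketch below shows one common way to compute them, assuming GC skew is defined as (G − C)/(G + C) over non-overlapping windows. The function names and window scheme are illustrative assumptions and do not reproduce the exact settings used to draw Figure 3.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    a, t = seq.count("A"), seq.count("T")
    total = a + c + g + t
    return 100.0 * (g + c) / total if total else 0.0


def gc_skew(seq: str, window: int) -> List[float]:
    """GC skew, (G - C) / (G + C), in consecutive non-overlapping windows.

    A trailing partial window is ignored; windows without G or C give 0.0.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


# Toy sequence (not genome data): the skew goes from positive to negative.
toy_skew = gc_skew("GGGCAAAACCCG", 4)  # [0.5, 0.0, -0.5]
```

In a correctly ordered bacterial chromosome, the sign of the skew typically changes near the replication origin and terminus, which is why a largely consistent skew pattern across ordered contigs supports the contig order.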

Gene repertoire of G. thailandicus NBRC 3257 genome
Annotation of the genome indicated that NBRC 3257 has the membrane-bound PQQ-dependent alcohol dehydrogenase adhAB operon (locus_tag NBRC3257_1377 and NBRC3257_1378) and adh subunit III (NBRC3257_1024). A unique orphan gene of adh subunit I was also identified (NBRC3257_3117). The gene repertoires of other membrane-bound PQQ-dependent proteins were investigated. Homologous proteins of membrane-bound PQQ-dependent dehydrogenase (NBRC3257_0292), membrane-bound glucose dehydrogenase (PQQ) (NBRC3257_0371), PQQ-dependent dehydrogenase 4 (NBRC3257_0662), and PQQ-dependent dehydrogenase 3 (NBRC3257_1743) were identified. In addition, two paralogous copies of the PQQ-glycerol dehydrogenase sldAB operon (NBRC3257_0924 to NBRC3257_0925 and NBRC3257_1134 to NBRC3257_1135) were identified. It has been thought that the respiratory chains of Gluconobacter species play key roles in respiratory energy metabolism [48][49][50][51]. Therefore, the gene repertoires of the respiratory chains of NBRC 3257 were also investigated. Besides two type II NADH dehydrogenase homologs (NBRC3257_1995 and NBRC3257_2785) [51], a proton-pumping NADH:ubiquinone oxidoreductase operon (type I NADH dehydrogenase complex) (NBRC3257_2617 to NBRC3257_2629), a cytochrome o ubiquinol oxidase cyoBACD operon (NBRC3257_2304 to NBRC3257_2307), and a cyanide-insensitive terminal oxidase cioAB operon (NBRC3257_0388 to NBRC3257_0389) [48,49] were identified.
a) The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
Figure 3. Graphical circular map of a simulated draft Gluconobacter thailandicus NBRC 3257 genome. The simulated genome is a set of contigs ordered against the complete genome of G. oxydans 621H [3] using Mauve [38][39][40]. The circular map was generated using CGview [47]. From the outside to the center: genes on forward strand, genes on reverse strand, GC content, GC skew.
a) The total is based on the total number of protein-coding genes in the annotated genome.

Acknowledgement
This work was financially supported by the Advanced Low Carbon Technology Research and Development Program (ALCA).