Complete genome sequence of Enterobacter sp. IIT-BT 08: A potential microbial strain for high rate hydrogen production

Enterobacter sp. IIT-BT 08 belongs to Phylum: Proteobacteria, Class: Gammaproteobacteria, Order: Enterobacteriales, Family: Enterobacteriaceae. The organism was isolated from the leaves of a local plant near the Kharagpur railway station, Kharagpur, West Bengal, India. It has been extensively studied for fermentative hydrogen production because of its high hydrogen yield. To support further enhancement of hydrogen production through strain development, the complete genome sequence was analyzed. Sequence analysis revealed that the genome was linear, 4.67 Mbp long and had a GC content of 56.01%. The genome encodes 4,393 protein-coding and 179 RNA genes. Additionally, a putative pathway of hydrogen production was suggested based on the presence of the formate hydrogen lyase complex and other related genes identified in the genome. Thus, in the present study we describe the specific properties of the organism and the generation, annotation and analysis of its genome sequence, and discuss the putative pathway of hydrogen production by this organism.


Introduction
Hydrogen has great promise in contributing substantially to the renewable energy demands of the future. It is considered a dream fuel because it is renewable, does not evolve greenhouse gases, has the highest energy content per unit mass of any known fuel (143 GJ t⁻¹), is easily converted to electricity by fuel cells and, upon combustion, gives water as the only byproduct [1]. Moreover, hydrogen is the third most abundant element on Earth. However, finding simple, inexpensive ways to extract hydrogen and produce it in a pure gaseous form is a crucial step toward making the "hydrogen economy" a reality. Considering this, hydrogen production using microbes is thought to be a promising technique to produce economical, abundant hydrogen without utilizing fossil fuels. Many microbial species have been reported for hydrogen production [2]. Among them, Enterobacter sp. IIT-BT 08 (MTCC 5373, DSM 24603) was reported as a high rate hydrogen producer [3]. It is a Gram-negative, facultative anaerobe that can grow and produce hydrogen from a wide range of simple sugars and complex polysaccharides [4]. In the past decade, the group at the Bioprocess Engineering Laboratory at IIT Kharagpur, India, has worked extensively on this organism using various fermentative approaches and established it as one of the highest yielding hydrogen producers [5]. The novelty of the organism lies in the amount of hydrogen (2.2 mol H2 mol⁻¹ glucose) it can produce at ambient temperature (37 °C) and atmospheric pressure compared to other closely related species reported in the literature. In addition, a high rate of continuous hydrogen production has been reported using immobilized Enterobacter sp. IIT-BT 08 with waste as substrate in 20 L and 800 L reactors [5]. Therefore, whole genome sequencing of this potential strain was undertaken to determine the genes responsible for the high rate of hydrogen production.
In this report we present a summary of the properties and features of Enterobacter sp. IIT-BT 08 genome and also suggest a putative pathway for hydrogen production.

Classification and features
Enterobacter sp. IIT-BT 08 was isolated from the leaves of a local plant near the Kharagpur railway station, Kharagpur, West Bengal, India [4]. The bacterium is a Gram-negative, small, motile, catalase-positive rod [4,6,7] belonging to the family Enterobacteriaceae (Table 1). To characterize the strain, a set of standard tests was carried out according to Bergey's Manual, and the results showed that the strain belongs to the genus Enterobacter. 16S rRNA gene sequencing by the Microbial Type Culture Collection (MTCC), Chandigarh, further confirmed the strain identity. The genetic complexity of the organism is illustrated in the phylogenetic tree of the 16S rRNA region (Figure 1). Initially the strain was classified as Enterobacter sp. IIT-BT 08; however, whole genome sequencing revealed sequence variation among the six 16S rRNA copies of the strain. We presume that this variation may have been the source of the initial mis-identification of the strain. Currently, without a complete set of type strain genome sequences available for a more detailed taxonomic identification, the name of the strain has been changed to Enterobacter sp. IIT-BT 08.

Genome sequencing information
Genome project history
Enterobacter sp. IIT-BT 08 is a promising hydrogen producer and can utilize waste as substrate for hydrogen production [4]. It was therefore considered essential to sequence the whole genome of the organism to determine the genes that contribute to hydrogen production. Complete genome information was also critical to facilitate genetic engineering of the organism for further enhancement of its hydrogen production potential. Accordingly, the group applied to the Community Sequencing Program-2010 (CSP-2010) offered by DOE-JGI.
One of the DOE missions is to address the critical question of depleting energy reserves by creating a new generation of biological research enabled by the genome revolution. The organism appeared relevant to this mission and was therefore selected for sequencing. The genome sequence was completed on May 21, 2012. Quality assurance was done by the DSMZ (Braunschweig, DE); finishing and annotation were completed at the Joint Genome Institute. A summary of the project information and its association with MIGS version 2.0 compliance [8] is shown in Table 2.

Growth conditions and DNA isolation
For genomic DNA isolation, Enterobacter sp. IIT-BT 08 was cultivated overnight in nutrient broth at 37 °C and 200 rpm in a gyratory incubator shaker. DNA isolation was carried out by the Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ). For DNA isolation, the strain was grown in DSMZ medium 381 (Luria-Bertani Medium) at 37 °C. DNA was isolated from 1-1.5 g of cell paste using the Jetflex Genomic DNA Purification Kit (Genomed_600100) following the manufacturer's recommendations for Gram-positive bacteria (which were more efficient than the conditions recommended for Gram-negative cells). The identity of the DNA was confirmed via 16S rRNA gene sequencing and the quality was analyzed following the recommendations of the sequencing center (JGI), including pulsed-field gel electrophoresis.

Genome sequencing and assembly
The draft genome of Enterobacter sp. IIT-BT 08 was generated at the DOE Joint Genome Institute (JGI) using Illumina data [22]. For this genome, JGI constructed and sequenced an Illumina short-insert paired-end library with an average insert size of 231 ± 59 bp, which generated 24,130,984 reads, and an Illumina long-insert paired-end library with an average insert size of 8,267 ± 2,204 bp, which generated 13,553,468 reads, totaling 5,653 Mbp of Illumina data (unpublished, Feng Chen). All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website. The initial draft assembly contained 21 contigs in 3 scaffolds. The initial draft data was assembled with Allpaths, version 39750, and the consensus was computationally shredded into 10 Kbp overlapping fake reads (shreds). The Illumina draft data was also assembled with Velvet, version 1.1.05 [23], and the consensus sequences were computationally shredded into 1.5 Kbp overlapping fake reads (shreds). The Illumina draft data was assembled again with Velvet using the shreds from the first Velvet assembly to guide the next assembly. The consensus from the second Velvet assembly was shredded into 1.5 Kbp overlapping fake reads. The fake reads from the Allpaths assembly and both Velvet assemblies and a subset of the Illumina CLIP paired-end reads were assembled using parallel phrap, version 4.24 (High Performance Software, LLC). Possible mis-assemblies were corrected with manual editing in Consed [24][25][26]. Gap closure was accomplished using repeat resolution software (Wei Gu, unpublished), and sequencing of bridging PCR fragments with Sanger and/or PacBio (unpublished, Cliff Han) technologies. For improved high quality draft and noncontiguous finished projects, one round of manual/wet lab finishing may have been completed. Primer walks, shatter libraries, and/or subsequent PCR reads may also be included for a finished project.
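The shredding step described above, in which an assembly consensus is cut into overlapping fake reads, can be illustrated with a minimal sketch. This is not the JGI implementation: the function name `shred` and the `step` parameter are hypothetical, and the overlap between neighbouring shreds is an assumption because the text gives only the shred lengths (10 Kbp and 1.5 Kbp).

```python
def shred(consensus: str, shred_len: int, step: int) -> list[str]:
    """Cut a consensus sequence into overlapping pseudo-reads ("shreds").

    shred_len is the length of each shred; step is the distance between
    consecutive shred starts. step < shred_len guarantees overlap.
    Both values are illustrative assumptions, not JGI settings.
    """
    if not 0 < step < shred_len:
        raise ValueError("step must be positive and smaller than shred_len")
    if len(consensus) <= shred_len:
        return [consensus]
    last_start = len(consensus) - shred_len
    starts = list(range(0, last_start + 1, step))
    # Add a final shred so the end of the consensus is always covered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [consensus[s:s + shred_len] for s in starts]


if __name__ == "__main__":
    # Toy example: 4-base shreds starting every 2 bases.
    for piece in shred("ACGTACGTACG", shred_len=4, step=2):
        print(piece)
```

With real data the same call would use, for example, `shred_len=10_000` for the Allpaths consensus and `shred_len=1_500` for the Velvet consensus, with a `step` chosen by the analyst.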
No additional sequencing reactions or shatter libraries were required; 6 PCR PacBio consensus sequences were completed to close gaps and to raise the quality of the final sequence. The total estimated size of the genome is 4.7 Mb and the final assembly is based on 5,653 Mbp of Illumina draft data, which provides an average 1,203× coverage of the genome.
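The reported coverage can be reproduced from the figures given in this section. The sketch below assumes the usual definition of average coverage (total sequenced bases divided by genome size); the variable and function names are illustrative only.

```python
# Figures taken from the text of this section.
SHORT_INSERT_READS = 24_130_984
LONG_INSERT_READS = 13_553_468
TOTAL_ILLUMINA_BP = 5_653_000_000   # 5,653 Mbp of Illumina data
ESTIMATED_GENOME_BP = 4_700_000     # estimated genome size, 4.7 Mb


def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth of coverage = sequenced bases / genome size."""
    return total_bases / genome_size


coverage = mean_coverage(TOTAL_ILLUMINA_BP, ESTIMATED_GENOME_BP)
mean_read_len = TOTAL_ILLUMINA_BP / (SHORT_INSERT_READS + LONG_INSERT_READS)

if __name__ == "__main__":
    print(f"average coverage: {coverage:.0f}x")              # about 1,203x
    print(f"implied mean read length: {mean_read_len:.0f} bp")
```

The same calculation also implies a mean read length of roughly 150 bp across the two libraries, which is consistent with the read counts and total yield stated above.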

Genome annotation
Genes were identified using Prodigal [27] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [28]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [29].

Genome properties
The genome of Enterobacter sp. IIT-BT 08 consists of one linear chromosome of 4,672,040 bp (Figure 2). The average G+C content for the genome is 56.01% (Table 3). There are 78 tRNA genes and 6 rRNA operons, each consisting of a 16S, 23S, and 5S rRNA gene. There are 4,393 predicted protein-coding regions and 43 pseudogenes in the genome. A total of 3,881 protein-coding genes (85.64%) have been assigned a predicted function while the rest have been designated as hypothetical proteins (Table 4). The numbers of genes assigned to each COG functional category are listed in Table 4. About 2% of the annotated genes were not assigned to COGs and have an unknown function.
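The text does not state how the G+C content was computed. As an illustration only, a conventional calculation counts G and C bases as a percentage of all unambiguous bases; the function name `gc_content` and the choice to ignore ambiguous symbols such as N are assumptions of this sketch.

```python
def gc_content(seq: str) -> float:
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Toy sequence; a real run would read the chromosome from a FASTA file.
    print(f"{gc_content('ATGCGCGCTA'):.2f}% G+C")
```

Applied to the 4,672,040 bp chromosome, a G+C content of 56.01% corresponds to roughly 2.62 Mbp of G and C bases.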

Biohydrogen production pathway
The complete genome sequence of the organism provides a preliminary view of the genes involved in the hydrogen production pathway. The genome revealed the presence of formate hydrogen lyase (EntIIITBT8_2511) and its maturation operons HycH (EntIIITBT8_2678), the NiFe hydrogenase III small and large subunits (EntIIITBT8_2679, EntIIITBT8_2681), their maturation operons and the FeS cluster-containing hydrogenase components 1 and 2 (EntIIITBT8_0331, EntIIITBT8_2684). A complete list of the genes predicted to be involved in the hydrogen production pathway is given in Table 5. The whole genome information suggests that hydrogen production in Enterobacter sp. IIT-BT 08 is carried out through the formate hydrogen lyase (FHL) complex, which consists of formate dehydrogenase (FDH-H), hydrogenase (Hyd-3) and the electron transfer mediators [30]. However, this hypothetical pathway must still be verified with wet lab experiments.
a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
b) Pseudogenes may also be counted as protein coding or RNA genes, so is not additive under total gene count.
The total is based on the total number of protein coding genes in the entire annotated genome.
Previously reported literature suggests that formate dehydrogenase and hydrogenase 3 together form a membrane protein complex that is responsible for hydrogen production in facultative anaerobes [30][31][32]. Rossmann et al. suggested that in facultative anaerobes hydrogen production was determined by the concentration of formate in the cell, which in turn determined the formation of the FHL complex [32]. A putative model (Figure 3) has been suggested based on the biochemistry of the reactions involved in the pathway [34]. Formate dehydrogenase is suggested to catalyze the oxidation of formate into carbon dioxide. The electrons released in the process are transferred to Hyd-3, encoded by hycABCDEFGH, to generate molecular hydrogen under anaerobic conditions [33]. The model suggests a plausible scheme of electron transfer from FdhF to the catalytic subunit, hycE, via the hycBCFG subunits. Among these, hycB and hycF have been determined to be [4Fe-4S] ferredoxin type electron transfer proteins [35]. On the other hand, hycE and hycG share homology with NADH ubiquinone oxidoreductase (NUO) subunits of the mitochondria and chloroplast [35].
In the model, hycC and hycD have been suggested to act as transmembrane proteins. Electron acceptors, like oxygen or nitrate, generally inhibit the expression of the FHL complex, whereas its biosynthesis is controlled by the concentration of formate in the cell [32]. Further, it has been suggested that the micro elements selenium and molybdenum are involved at the active site of FDH-H, while nickel is a component of the Hyd-3 active site [30,36]. Accordingly, it has been suggested that the FHL complex can be induced by regulating the presence of formate and metal ions at slightly acidic pH under anaerobic conditions.
Transcription of the FHL complex is under the control of several genes, including fhlA, which codes for the FHL activator protein FHLA, a tetramer that binds to the upstream region of the DNA encoding the FHL complex and promotes the transcription of the FHL complex [34,37]. Moreover, hycA codes for the FHL repressor protein that binds to FHLA or to the FHLA-formate complex. Since fhlA and hycA control the transcription of the FHL complex, it is theoretically possible to control the specific FHL activity and the specific hydrogen production rate by manipulating these genes or their genetic controls [38].
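The electron flow described in this section, with formate oxidized by FDH-H and the released electrons used by Hyd-3 to reduce protons, can be summarized as two half-reactions and their sum. These are the standard textbook reactions for the FHL complex, written here as an aid to the reader; they are not taken from Figure 3.

```latex
\begin{align}
\mathrm{HCOO^-} &\longrightarrow \mathrm{CO_2} + \mathrm{H^+} + 2e^- && \text{(FDH-H)}\\
2\,\mathrm{H^+} + 2e^- &\longrightarrow \mathrm{H_2} && \text{(Hyd-3)}\\
\mathrm{HCOO^-} + \mathrm{H^+} &\longrightarrow \mathrm{CO_2} + \mathrm{H_2} && \text{(net, FHL complex)}
\end{align}
```

Each mole of formate routed through the FHL complex therefore yields at most one mole of hydrogen.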

Conclusion
The genome of Enterobacter sp. IIT-BT 08 was sequenced and annotated by the DOE Joint Genome Institute. The genomic properties of the organism were analyzed using various IMG tools, and, based on the genome sequence, a putative pathway of hydrogen production based on formate hydrogen lyase complex was discussed.

Acknowledgement
The work was conducted by the U.S. Department of Energy Joint Genome Institute and is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Authors (DD and NK) are also thankful to MNRE for their financial assistance. NK also gratefully acknowledges the Department of Biotechnology (DBT), Government of India, for a senior research fellowship. The authors from IITKgp, India submitted the JGI-CSP project, analyzed the data and wrote the manuscript. The authors from DSMZ confirmed the strain identity and extracted high quality genomic DNA for sequencing. The authors from DoE-JGI, WC, USA, and LLNL, Livermore CA USA carried out the sequencing and annotation of the genome.