The complete genome sequence of Clostridium indolis DSM 755T

Clostridium indolis DSM 755T is a bacterium commonly found in soils and the feces of birds and mammals. Despite its prevalence, little is known about the ecology or physiology of this species. However, its close relatives C. saccharolyticum and C. hathewayi have demonstrated interesting metabolic potential related to plant degradation and human health. The genome of C. indolis DSM 755T reveals an abundance of genes in functional groups associated with the transport and utilization of carbohydrates, as well as citrate, lactate, and aromatics. Ecologically relevant gene clusters related to nitrogen fixation and a unique type of bacterial microcompartment, the CoAT BMC, are also detected. Our genome analysis suggests hypotheses to be tested in future culture-based work to better understand the physiology of this poorly described species.


Introduction
The C. saccharolyticum species group is a poorly described and taxonomically confusing clade in the Lachnospiraceae, a family within the Clostridiales that includes members of clostridial cluster XIVa [1]. This group includes C. indolis, C. sphenoides, C. methoxybenzovorans, C. celerecrescens, and Desulfotomaculum guttoideum, none of which are well studied (Figure 1). C. saccharolyticum has gained attention because its saccharolytic capacity was shown to be syntrophic with the cellulolytic activity of Bacteroides cellulosolvens in co-culture, enabling the conversion of cellulose to ethanol in a single step [6,7]. Some members of this group, such as C. celerecrescens, are themselves cellulolytic [8], and others are known to degrade unusual substrates such as methylated aromatic compounds (C. methoxybenzovorans) [9] and the insecticide lindane (C. sphenoides) [10]. C. indolis was targeted for whole-genome sequencing to provide insight into the genetic potential of this taxon, which could then direct experimental efforts to understand its physiology and ecology.
Standards in Genomic Sciences

Classification and features
The general features of Clostridium indolis DSM 755T are listed in Table 1. C. indolis DSM 755T was originally named for its ability to hydrolyze tryptophan to indole, pyruvate, and ammonia [23] in the classic indole test used to distinguish bacterial species. It has been isolated from soil [24], feces [25], and clinical samples from infections [27]. Despite its prevalence, C. indolis is not well characterized, and there are conflicting reports about its physiology. It is described as a sulfate reducer with the ability to ferment some simple sugars, pectin, pectate, mannitol, and galacturonate, and to convert pyruvate to acetate, formate, ethanol, and butyrate [28]. According to this source, neither lactate nor citrate is utilized; however, other studies demonstrate that fecal isolates closely related to C. indolis may utilize lactate [29], and that the type strain DSM 755T utilizes citrate [30]. It is unclear whether C. indolis is able to make use of a wider range of sugars or break down complex carbohydrates, although growth is reported to be stimulated by fermentable carbohydrates [28].
Figure 1. [...] jejuense HY-35-12T, AY494606; C. xylanovorans HESP1T, AF116920; C. phytofermentans ISDgT, CP000885: 15754-17276. The tree uses sequences aligned by MUSCLE and was inferred using the Neighbor-Joining method [2]. The optimal tree with the sum of branch lengths = 0.50791241 is shown. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) is shown next to the branches [3]. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Maximum Composite Likelihood method [4] and are in units of the number of base substitutions per site. Evolutionary analyses were conducted in MEGA 5 [5]. C. stercorarium ATCC 35414T, CP003992: 856992-858513, was used as an outgroup.
Table 1. General features of Clostridium indolis DSM 755T.
Property | Term | Evidence code
Current classification | Phylum Firmicutes | TAS [12-14]
 | Class Clostridia | TAS [15,16]
 | Order Clostridiales | TAS [17,18]
 | Family Lachnospiraceae | TAS [15,19]
 | Genus Clostridium | TAS [17,20,21]
 | Species Clostridium indolis | TAS [17,22]
 | Type strain DSM 755 |
Gram stain | Negative | TAS [23,24]
Cell shape | Rod | TAS [23,24]
Motility | Motile | TAS [23,24]
Geographic location | Soil, feces | TAS [24,25]
Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [26].

Genome project history
The genome was selected based on the relatedness of C. indolis DSM 755T to C. saccharolyticum, an organism with interesting saccharolytic and syntrophic properties. The genome sequence was completed on May 2, 2013, and presented for public access on June 3, 2013. Quality assurance and annotation were done by the DOE Joint Genome Institute (JGI) as described below. Table 2 presents a summary of the project information and its association with MIGS version 2.0 compliance [31].

Growth conditions and DNA isolation
C. indolis DSM 755T was cultivated anaerobically on GS2 medium as described elsewhere [32]. DNA for sequencing was extracted using the DNA Isolation Bacterial Protocol available through the JGI (http://www.jgi.doe.gov). The quality of the extracted DNA was assessed by gel electrophoresis and NanoDrop (Thermo Scientific, Wilmington, DE) according to the JGI recommendations, and the quantity was measured using the Quant-iT PicoGreen assay kit (Invitrogen, Carlsbad, CA) as directed.

Genome annotation
Genes were identified using Prodigal [36], followed by a round of manual curation using GenePRIMP [9] for finished genomes and draft genomes in fewer than 10 scaffolds. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [37] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [38]. Other noncoding RNAs, such as the RNA components of the protein secretion complex and RNase P, were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [39]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform [40] developed by the Joint Genome Institute, Walnut Creek, CA, USA [41]. Information in the tables below reflects the gene information in the JGI annotation on the IMG website [40].

Genome properties
The genome of C. indolis DSM 755T consists of a 6,383,701 bp circular chromosome with a GC content of 44.93% (Table 3). Of the 5,903 genes predicted, 5,802 were protein-coding genes and 101 were RNA genes; 170 pseudogenes were also identified. A putative function was assigned to 81.21% of genes, with the remainder annotated as hypothetical proteins. The genome summary and the distribution of genes into COG functional categories are listed in Tables 3 and 4.
Notes to Tables 3 and 4: a) The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome. b) Also includes 170 pseudogenes. In the COG table, the total is based on the total number of protein-coding genes in the annotated genome.
The genomes of C. indolis and its near relatives (C. saccharolyticum, C. hathewayi, and C. phytofermentans) have similar numbers of genes in each of the 25 broad COG categories (not shown); however, differences exist in the type and distribution of genes in specific functional groups (Table 5), particularly those related to COG categories (G) Carbohydrate transport and metabolism, (C) Energy production and conversion, and (Q) Secondary metabolites biosynthesis, transport and catabolism.
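The gene counts quoted in this section can be cross-checked with simple arithmetic. The following is a minimal sketch in Python using only the figures stated in the text; the variable names and the `percent` helper are illustrative and not part of the JGI or IMG tooling.

```python
# Figures quoted in the "Genome properties" text.
GENOME_SIZE_BP = 6_383_701
TOTAL_GENES = 5_903
PROTEIN_CODING = 5_802
RNA_GENES = 101

def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)

# Protein-coding and RNA genes should add up to the predicted total.
counts_consistent = PROTEIN_CODING + RNA_GENES == TOTAL_GENES

# Share of each gene class among all predicted genes.
protein_coding_pct = percent(PROTEIN_CODING, TOTAL_GENES)
rna_pct = percent(RNA_GENES, TOTAL_GENES)

# Gene density, expressed as predicted genes per kilobase of chromosome.
genes_per_kb = round(TOTAL_GENES / (GENOME_SIZE_BP / 1000), 3)

print(counts_consistent, protein_coding_pct, rna_pct, genes_per_kb)
```

The same pattern could be applied to the per-category COG counts, provided the denominator matches the one stated in the table footnotes.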

Carbohydrate transport and metabolism
Plant biomass is a complex composite of fibrils and sheets of cellulose, hemicellulose, waxes, pectin, proteins, and lignin. Bacteria from soil and the gut generally possess a variety of genes to degrade and transport the diversity of substrates encountered in these plant-rich environments. The genome of C. indolis includes 910 genes (17.65% of total protein coding genes) in this COG group, including glycoside hydrolases with the potential to degrade complex carbohydrates such as starch, cellulose, and chitin (Table 6), as well as an abundance of carbohydrate transporters (Figure 2). Almost 8% of the protein-coding genes in the genome of C. indolis were found to be associated with carbohydrate transport, represented by two main strategies. ABC (ATP binding cassette) transporters tend to carry oligosaccharides and have less affinity for hexoses [43,44], while PTS (phosphotransferase system) transporters carry many different mono- and disaccharides, especially hexoses [45]. PTS systems provide a means of regulation via catabolite repression [46], and are thought to enable bacteria living in carbohydrate-limited environments to utilize and compete for substrates more efficiently [46]. Both C. indolis and its near relatives are more highly enriched in ABC than PTS transporters (Figure 2); however, nearly a third of C. indolis and C. saccharolyticum transporters are PTS genes, suggesting a preference for hexoses as well as an adaptation to more marginal environments. C. indolis also possesses ten genes associated with all three components of the TRAP-type C4-dicarboxylate transport system, which transports C4-dicarboxylates such as fumarate, succinate, and malate [47], as well as six putative malate dehydrogenases and two putative succinate dehydrogenases, suggesting that C. indolis may have the potential to utilize both malate and succinate.
Table footnotes: [...] [42]. b) Enzyme Commission (EC) numbers assigned by the Integrated Microbial Genomes (IMG) database [41].

Energy production and conversion
The genome of C. indolis contains 261 genes in COG category (C) Energy production and conversion, 28 of which are not found in the near relatives analyzed, including genes for citrate utilization (Table 7) and nitrogen fixation (Table 8).

Citrate utilization
Citrate is a metabolic intermediate found in all living cells. In aerobic bacteria, citrate is utilized as part of the tricarboxylic acid (TCA) cycle. In anaerobes, citrate is fermented to acetate, formate, and/or succinate. The first step is the conversion of citrate to acetate and oxaloacetate in a reaction catalyzed by citrate lyase (EC:4.1.3.6) [48]. C. sphenoides, a close relative of C. indolis that does not yet have a sequenced genome, has been shown to utilize citrate [49], but there is conflicting evidence as to whether this phenotype is present in C. indolis [28,30]. The genome of C. indolis reveals a group of seven citrate genes organized in a cluster similar to operons found in other bacterial species [48,50] (Figure 3). The cluster includes CitD, CitE, and CitF, the three subunits of citrate lyase [48]; CitG and CitX, which have been shown to be necessary for citrate lyase function [50]; CitMHS, a citrate transporter; and a putative two-component system similar to citrate regulatory mechanisms in other bacteria [51].

Nitrogen Fixation
Nitrogen fixation has been observed in other clostridia [52,53] but has not been demonstrated in the C. saccharolyticum species group. It has been suggested that the capacity to fix nitrogen confers a selective advantage to cellulolytic microbes that live in nitrogen-limited environments such as many soils [52]. The functional summary suggests that C. indolis can fix nitrogen. The C. indolis genome reveals 22 nitrogenase-related genes in four gene clusters (Table 8), none of which are found in the near relatives analyzed in this study. A minimum set of six genes encoding the structural and biosynthetic components of a functional nitrogenase complex has been hypothesized [54]. Genes needed for the nitrogenase structural component proteins (nifH, nifD, and nifK) are present in C. indolis, but one of the three genes required to synthesize the nitrogenase iron-molybdenum cofactor (nifN) is not identified. Follow-up experiments are needed to determine whether C. indolis can fix nitrogen as predicted by the genome analysis.
Table 8 note: Nitrogenase genes have a common gene identifier (EC:1.18.6.1); therefore, the pfam numbers are given to distinguish between subunits. Gene product names and pfam numbers were assigned by the Integrated Microbial Genomes (IMG) database [41].
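The presence/absence reasoning applied to the minimum nitrogenase gene set can be expressed as a simple set comparison. The sketch below is illustrative only: the text names nifH, nifD, nifK, and nifN, while nifE and nifB are assumed here to be the other two cofactor biosynthesis genes, based on the general nitrogenase literature rather than on this report. The function name and the example gene list are hypothetical.

```python
# Structural genes named in the text.
STRUCTURAL = {"nifH", "nifD", "nifK"}
# Iron-molybdenum cofactor biosynthesis genes. Only nifN is named in the
# text; nifE and nifB are an assumption taken from the general literature.
COFACTOR = {"nifE", "nifN", "nifB"}

def missing_nif_genes(annotated):
    """Report which genes of the minimum set are absent from an annotation list."""
    found = set(annotated)
    return {
        "structural": sorted(STRUCTURAL - found),
        "cofactor": sorted(COFACTOR - found),
    }

# Hypothetical gene list mirroring the result described in the text:
# all structural genes present, nifN not identified.
example_annotation = ["nifH", "nifD", "nifK", "nifE", "nifB"]
report = missing_nif_genes(example_annotation)
print(report)
```

A missing gene in such a report indicates only that no homolog was annotated, not that the function is absent, which is why culture-based follow-up is still required.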

Lactate utilization
The genome of C. indolis includes both D- and L-lactate dehydrogenases, which convert lactate to pyruvate. Additionally, there is a lactate transporter, suggesting that C. indolis is able to utilize exogenous lactate (Table 9).

Bacterial microcompartments (BMC)
The C. indolis genome contains genes associated with bacterial microcompartment shell proteins. Bacterial microcompartments (BMCs) are proteinaceous organelles involved in the metabolism of ethanolamine, 1,2-propanediol, and possibly other metabolites (reviewed in [55-57]). BMCs are often encoded by a single operon or contiguous stretch of DNA. The different metabolic types of BMCs can be distinguished by a key enzyme (e.g., ethanolamine lyase or propanediol dehydratase) related to their metabolic function. While the other associated genes in the operon can vary, they frequently include an alcohol dehydrogenase, an aldehyde dehydrogenase, an aldolase, and an oxidoreductase. In C. indolis there are two separate genetic loci that code for BMCs (Tables 10 and 11 and Figure 4). One C. indolis locus (Table 10) contains a gene (K401DRAFT_2189) with sequence similarity to a B12-independent propanediol dehydratase found in Roseburia inulinivorans and Clostridium phytofermentans [58,59] (both members of the Lachnospiraceae). This enzyme has been shown to be involved in the metabolism of fucose and rhamnose [58,59], and the associated BMC was subsequently categorized as the glycyl radical prosthetic group-based (grp) BMC [60]. The glycyl radical family of enzymes was recently expanded to include a choline trimethylamine lyase activity that is part of a microcompartment locus in Desulfovibrio desulfuricans [61]. The corresponding C. indolis enzymes (K401DRAFT_2189 and K401DRAFT_2190) are more similar to the D. desulfuricans protein, but there are differences in the gene content of the microcompartment loci. Further work is needed to determine the physiological role of this microcompartment.
The second C. indolis BMC locus (Table 11 and Figure 4) is even more enigmatic. This locus contains the shell proteins, alcohol dehydrogenase, aldehyde dehydrogenase, aldolase, and oxidoreductase commonly found in microcompartments, but it lacks a known key enzyme. Homologs of this operon were found in four other bacterial species (Figure 4). All of them are missing a known key enzyme and contain two genes annotated as CoA-transferase. We propose that the C. indolis genome and these other bacteria contain a novel type of microcompartment, designated the CoAT BMC. It is not clear that the two annotated CoA-transferase genes function as predicted, and further research is needed to demonstrate the physiological role of this BMC. Annotations assigned by the Integrated Microbial Genomes (IMG) database [41].
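The logic used to distinguish BMC types by their key enzyme can be sketched as a small rule-based classifier. This is a hedged illustration rather than the authors' method: the function name, the matching on free-text annotation strings, and the example loci are all hypothetical, and real annotations would need more careful matching (for example by pfam identifier).

```python
# Key enzymes named in the text, mapped to the BMC type they indicate.
KEY_ENZYMES = {
    "ethanolamine lyase": "ethanolamine utilization BMC",
    "propanediol dehydratase": "propanediol utilization BMC",
}

def classify_bmc_locus(annotations):
    """Return a coarse BMC label for a list of gene annotation strings."""
    lowered = [a.lower() for a in annotations]
    # A locus without shell proteins is not treated as a BMC locus at all.
    if not any("shell protein" in a for a in lowered):
        return "no BMC shell proteins"
    # A known key enzyme determines the metabolic type.
    for enzyme, label in KEY_ENZYMES.items():
        if any(enzyme in a for a in lowered):
            return label
    # No key enzyme but two CoA-transferase genes: the pattern proposed
    # in the text as the CoAT BMC.
    if sum("coa-transferase" in a for a in lowered) >= 2:
        return "candidate CoAT BMC"
    return "BMC of unknown type"

# Hypothetical locus resembling the second C. indolis locus described above.
example_locus = [
    "BMC shell protein",
    "alcohol dehydrogenase",
    "aldehyde dehydrogenase",
    "CoA-transferase",
    "CoA-transferase",
]
print(classify_bmc_locus(example_locus))
```

Such a rule only flags candidates; as the text notes, the function of the annotated CoA-transferase genes still has to be demonstrated experimentally.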
Figure 4. CoAT BMC operon found in C. indolis, Caldalkalibacillus thermarum, C. sticklandii, C. saccharolyticum, and Bacillus selenitireducens. Gene details are found in Table 11. Fe-ADH, alcohol dehydrogenase (pfam00465). Annotations assigned by the Integrated Microbial Genomes (IMG) database [41].

Secondary metabolites biosynthesis, transport and catabolism
Protocatechuate and other aromatics are intermediates in the degradation of lignin in plant-rich environments [62]. The genome of C. indolis contains two protocatechuate dioxygenases and an aromatic hydrolase, revealing the potential for utilizing aromatic compounds (Table 12).

Conclusion
The genomic sequence of C. indolis reported here reveals the metabolic potential of this organism to utilize a wide assortment of fermentable carbohydrates and intermediates including citrate, lactate, malate, succinate, and aromatics, and points to potential ecological roles in nitrogen fixation and ethanolamine utilization. Further culture-based characterization is necessary to confirm the metabolic activity suggested by this genomic analysis, and to expand the description of C. indolis.