Complete Genome Sequence of Clostridium clariflavum DSM 19732

Clostridium clariflavum is a Cluster III Clostridium within the family Clostridiaceae isolated from thermophilic anaerobic sludge (Shiratori et al., 2009). This species is of interest because of its similarity to the model cellulolytic organism Clostridium thermocellum and for the ability of environmental isolates to break down cellulose and hemicellulose. Here we describe features of the 4,897,678 bp long genome and its annotation, consisting of 4,131 protein-coding and 98 RNA genes, for the type strain DSM 19732.


Introduction
Cellulolytic clostridia are prominently represented among bacterial species. These organisms are able to solubilize lignocellulose, and their high rates of cellulose utilization make them candidates for consolidated bioprocessing applications [1]. In particular, anaerobic cellulolytic clostridia that grow at thermophilic temperatures are known to break down lignocellulose very efficiently. Clostridium clariflavum DSM 19732 is a cellulolytic thermophilic anaerobe isolated from anaerobic sludge [2], and closely related to the widely studied thermophile C. thermocellum. Environmental isolates of C. clariflavum have been found to dominate cellulolytic enrichment cultures from thermophilic compost and some have been found to utilize both hemicellulose and cellulose [3,4]. These organisms therefore represent a potentially important opportunity for the discovery of novel enzymes and mechanisms for efficient lignocellulose solubilization at thermophilic temperatures. Here we describe the complete annotated genomic sequence of the type strain Clostridium clariflavum DSM 19732.

Classification and Features
The phylogenetic relationship of the 16S rRNA gene of C. clariflavum DSM 19732 with other cellulolytic clostridia from Cluster III is shown in Figure 1. The sequences shown represent mostly cellulolytic and xylanolytic clostridia sharing over 84.5% sequence identity. The branch comprising C. clariflavum, C. straminisolvens and C. thermocellum is of particular interest since it includes cellulolytic organisms that share at least 96.6% sequence identity and are able to grow at thermophilic temperatures. A few environmental samples have provided sequences closely matching the C. clariflavum 16S rRNA gene (>99.0% sequence similarity); these were found in thermophilic methanogenic bioreactors [7], enrichment cultures from bioreactors (accession numbers AB231801 and AM408567), and enrichments from thermophilic compost [3]. Two pure cultures with >99.7% sequence similarity to C. clariflavum and able to utilize xylan have been isolated from compost enrichments [4]. However, no evidence of this organism has been reported in metagenomic studies from similar environments. C. clariflavum DSM 19732 is anaerobic and chemoorganotrophic, and grows as straight or slightly curved rods (Figure 2). This organism can ferment cellulose and cellobiose as sole carbon sources, but cannot utilize glucose, xylose or arabinose [2]. Aesculin hydrolysis is positive, but no starch, casein or gelatin hydrolysis has been observed [2]. Nitrate is not reduced to nitrite, and catalase production was negative [2].

Genome project history
The genome was selected based on the ability of Clostridium clariflavum DSM 19732 to grow on cellulose at thermophilic temperatures like its close relative C. thermocellum and the ability of environmental strains identified as C. clariflavum to utilize hemicellulose. A summary of the project information is presented in Table 2. The complete genome sequence was finished in July 2011. The GenBank accession number for the project is CP003065. The genome project is listed in the Genome OnLine Database (GOLD) [21] as project Gi10738. Sequencing was carried out at the DOE Joint Genome Institute (JGI). Finishing was performed by JGI-Los Alamos National Laboratory (LANL). Annotation and annotation quality assurance were carried out by the JGI.

Growth conditions and DNA isolation
Clostridium clariflavum DSM 19732 was obtained from the DSMZ culture collection and grown on medium DSM 520 at 55 °C. Genomic DNA was obtained by using a phenol-chloroform extraction protocol with CTAB, a JGI standard operating procedure [22].

Genome sequencing and assembly
The draft genome of Clostridium clariflavum DSM 19732 was generated at the DOE Joint Genome Institute (JGI) using a combination of Illumina [23] and 454 technologies [24]. For this genome, we constructed and sequenced an Illumina GAii shotgun library, which generated 44,772,666 reads totaling 3,402.7 Mb; a 454 Titanium standard library, which generated 434,166 reads; and one paired-end 454 library with an average insert size of 9 kb, which generated 392,711 reads, for a total of 223.9 Mb of 454 data. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website [25]. The initial draft assembly contained 239 contigs in 5 scaffolds. The 454 Titanium standard data and the 454 paired-end data were assembled together with Newbler, version 2.3-PreRelease-6/30/2009. The Newbler consensus sequences were computationally shredded into 2 kb overlapping fake reads (shreds). Illumina sequencing data were assembled with VELVET, version 1.0.13 [26], and the consensus sequences were computationally shredded into 1.5 kb overlapping fake reads (shreds). We integrated the 454 Newbler consensus shreds, the Illumina VELVET consensus shreds and the read pairs in the 454 paired-end library using parallel phrap, version SPS-4.24 (High Performance Software, LLC). The software Consed [27,28,29] was used in the subsequent finishing process. Illumina data were used to correct potential base errors and increase consensus quality using the software Polisher developed at JGI (Alla Lapidus, unpublished).
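The read totals and the shredding step above can be illustrated with a short sketch. It is not the JGI pipeline: the function names are hypothetical, "Mb" is assumed to mean 10^6 bases, and the 500 bp overlap between shreds is an assumption, since the text gives the shred lengths (2 kb and 1.5 kb) but not the overlap.

```python
def fold_coverage(total_bases: float, genome_size: int) -> float:
    """Average sequencing depth: sequenced bases divided by genome length."""
    return total_bases / genome_size


def shred(consensus: str, size: int, overlap: int) -> list[str]:
    """Cut a consensus sequence into overlapping pseudo-reads ("shreds")."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    shreds = []
    for start in range(0, len(consensus), step):
        shreds.append(consensus[start:start + size])
        if start + size >= len(consensus):
            break  # the last shred already reaches the end of the consensus
    return shreds


GENOME_SIZE = 4_897_678      # bp, finished chromosome (from the text)
ILLUMINA_BASES = 3_402.7e6   # 3,402.7 Mb of Illumina data (from the text)
PYRO_BASES = 223.9e6         # 223.9 Mb of 454 data (from the text)

illumina_depth = fold_coverage(ILLUMINA_BASES, GENOME_SIZE)
pyro_depth = fold_coverage(PYRO_BASES, GENOME_SIZE)
print(f"Illumina: ~{illumina_depth:.0f}x, 454: ~{pyro_depth:.0f}x")

# Toy 5 kb consensus; 2 kb shreds as in the text, hypothetical 500 bp overlap.
toy_shreds = shred("ACGT" * 1250, size=2000, overlap=500)
print(len(toy_shreds), [len(s) for s in toy_shreds])
```

By this arithmetic, the raw data correspond to roughly 695-fold Illumina and 46-fold 454 depth over the finished chromosome. Shredding the consensus sequences lets two assemblies built from different read types be merged by a single overlap-based assembler, which is the role parallel phrap plays in the text.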

Genome annotation
Genes were identified using Prodigal [31] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [32]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [33].

Genome properties
The genome of Clostridium clariflavum DSM 19732 consists of one circular chromosome of 4,897,678 bp in length with 35.6% GC content (Table 3 and Figure 3). The sequences of the 16S rRNA gene (accession number AB186359) and a family 48 glycosyl hydrolase gene (accession number GQ487569) have been previously reported [2,3], and each contains 3 mismatches relative to the genome sequence.
Standards in Genomic Sciences
The genome of C. clariflavum is much larger than that of the cellulolytic thermophile and close relative Clostridium thermocellum ATCC 27405 (3.8 Mb, 38.9% GC). The chromosome of C. clariflavum was predicted to contain 4,242 coding gene sequences with 6 rRNA operons and 60 tRNA genes (Table 3). The properties and the statistics of the genome are summarized in Tables 3 and 4.
Table footnotes: a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome. b) Also includes 239 pseudogenes.
While cellulose-active GH families seem to be similarly distributed between both organisms, C. clariflavum has a higher proportion and diversity of xylanolytic enzymes than C. thermocellum. Within the glycosyl hydrolase inventory of C. clariflavum, a subset of bifunctional cellulases was observed (Table 5). Three of these are associated with Type I dockerins (cellulosomal) and carry varying arrangements of xylanases from families GH10 and GH11 (Clocl_1480, Clocl_2083, Clocl_2441). In addition, an untethered bifunctional cellulase (Clocl_3038) combines a previously reported GH48 [3], most closely related to C. thermocellum CelY (Cthe_0071, 72% sequence similarity), with a GH9 most closely related to C. thermocellum CelG (Cthe_0040, 69% sequence similarity) and two family 3 carbohydrate binding modules (CBM3) in a GH48-GH9-CBM3-CBM3 arrangement. A similar arrangement has been discovered in hyperthermophiles such as Caldicellulosiruptor bescii [34] and Caldicellulosiruptor saccharolyticus [35], although these enzymes differ in that the CBMs are located between the two cellulases. The lack of a dockerin domain suggests that these multi-domain GHs are secreted, as is the case for Clostridium thermocellum CelY. It also suggests that the synergy between secreted GH48 and GH9 enzymes reported for C. thermocellum [36] may be facilitated by this arrangement in C. clariflavum. It should also be noted that in our previous survey of GH48 enzymes from thermophilic cellulolytic clostridia, we reported that C. clariflavum only had a CelY-like GH48 [3]. However, the genome sequence of C. clariflavum revealed an additional cellulosomal GH48 enzyme (Clocl_4007) with a dockerin domain and a high degree of similarity to C. thermocellum CelS, the most abundant enzymatic subunit of the C. thermocellum cellulosome [37,38]. This makes C. clariflavum the only organism with two distinctly different GH48 enzymes, one of which is involved in a bifunctional association.

Lignocellulose sensing system
A novel system of carbohydrate sensory domains has recently been proposed for C. thermocellum [39]. We have identified a very similar set of genes in C. clariflavum, consisting of 8 sigma I-like factors associated with adjacent carbohydrate active domains (Table 6). Based on the specificity of the CBM modules associated with these gene pairs, there seem to be three potential cellulose-specific (CBM3) pairs: Clocl_1053-54, Clocl_2843-44 and Clocl_4008-09, the latter located directly upstream of the GH48 closely related to endoglucanase CelS. We have also identified one xylan-specific pair (CBM42) in Clocl_2098-99, one pectin-specific pair (PA12) in Clocl_2747-48, and three additional pairs (Clocl_2044-45, Clocl_2797-98, Clocl_4136-37) which seem to have no catalytic function or CBM domains, but retain high sequence similarity to the proposed unspecific pairs in C. thermocellum.
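The shorthand used for the gene pairs above (for example Clocl_4008-09) can be expanded and tallied with a small sketch. The helper names are hypothetical, and treating consecutive locus-tag numbers as neighbouring genes is an assumption about the annotation's numbering, not something the text states.

```python
from collections import Counter

# Gene pairs as written in the text, keyed by the specificity proposed there.
PAIRS = {
    "cellulose (CBM3)": ["Clocl_1053-54", "Clocl_2843-44", "Clocl_4008-09"],
    "xylan (CBM42)": ["Clocl_2098-99"],
    "pectin (PA12)": ["Clocl_2747-48"],
    "unassigned": ["Clocl_2044-45", "Clocl_2797-98", "Clocl_4136-37"],
}


def expand_pair(shorthand: str) -> tuple[str, str]:
    """Expand 'Clocl_1053-54' into ('Clocl_1053', 'Clocl_1054')."""
    first, suffix = shorthand.split("-")
    prefix, number = first.split("_")
    # The suffix replaces the last len(suffix) digits of the first locus number.
    second_number = number[: len(number) - len(suffix)] + suffix
    return first, f"{prefix}_{second_number}"


def is_adjacent(pair: tuple[str, str]) -> bool:
    """True when the two locus numbers differ by exactly one (assumed adjacency)."""
    a, b = (int(tag.split("_")[1]) for tag in pair)
    return b - a == 1


counts = Counter()
for specificity, shorthands in PAIRS.items():
    for shorthand in shorthands:
        pair = expand_pair(shorthand)
        if not is_adjacent(pair):
            raise ValueError(f"unexpected non-consecutive pair: {pair}")
        counts[specificity] += 1

print(dict(counts), "total:", sum(counts.values()))
```

The tally (three cellulose-specific, one xylan-specific, one pectin-specific and three unassigned pairs) adds up to the 8 sigma I-like factors mentioned in the text.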

Cellulosome assembly
Most cellulolytic clostridia are known to organize glycosyl hydrolases and other catalytic subunits outside of the cell by means of a multiprotein complex known as the cellulosome [40,41]. , ii) an OlpA-like protein with a Type I cohesin (Clocl_3334), iii) a similar arrangement with four Type I cohesins (Clocl_3304), and iv) an arrangement consisting of one Type I and two Type II cohesins (Clocl_3303) similar to a novel anchoring system found in Acetivibrio cellulolyticus [42]. In addition, there seems to be a variety of untethered multi-cohesin complexes: three complexes containing multiple Type I cohesins associated with CBM2 modules (Clocl_4158, Clocl_4211, Clocl_4212), and one untethered Type II cohesin complex (Clocl_1799). The diversity of cellulosomal structural proteins is very similar to what is found in Clostridium thermocellum and other cellulosomal microorganisms. However, CBM2 modules are not very common in cellulolytic clostridia, with C. phytofermentans and C. cellulovorans each having one such domain. C. clariflavum has four of these domains, and they are associated with three separate multi-cohesin (Type I) domains with no anchoring mechanism. It may also be noted that the organization of the scaffoldin and anchoring proteins resembles the cellulosomal complexes found in the mesophile Acetivibrio cellulolyticus [42,43] more than it does the C. thermocellum cellulosome.

Pyruvate metabolism
The genome sequence of C. clariflavum revealed that this organism possesses a standard glycolytic pathway. However, the pyruvate node differs slightly from that of other Cluster III clostridia in that C. clariflavum possesses genes for both pyruvate kinase (Clocl_1090) and pyruvate dikinase (PPDK, Clocl_2755). This may be relevant to pyruvate metabolism because the genomes of cellulolytic clostridia from Cluster III indicate that the pathway from phosphoenolpyruvate (PEP) to pyruvate in these organisms uses either PPDK (Clostridium thermocellum ATCC 27405 and DSM 1313) or pyruvate kinase (C. cellulolyticum, C. papyrosolvens). There are nevertheless cellulolytic clostridia outside of Cluster III that also possess both, as is the case for Clostridium cellulovorans.

Hemicellulose sugars metabolism
C. clariflavum possesses a variety of xylanolytic enzymes that allow it to break down xylan completely to xylose, unlike C. thermocellum, which is only able to break xylan down to xylooligomers. One of the key enzymes in xylose utilization, xylose isomerase, is found in mesophilic xylanolytic/cellulolytic clostridia such as C. cellulolyticum, C. phytofermentans, C. papyrosolvens and C. cellulovorans, as well as in hyperthermophiles like Caldicellulosiruptor bescii. However, the genome of C. clariflavum does not seem to possess a xylose isomerase. On the other hand, a putative xylulose kinase has been identified in C. clariflavum (Clocl_2440), which is a key difference from C. thermocellum, where this enzyme is absent. Xylulose kinase is usually adjacent to or in the same operon as xylose isomerase. A xylose epimerase (EC 5.1.3.4) that leads to the production of L-ribulose-5P is immediately adjacent (Clocl_2439) to the putative xylulose kinase. In C. clariflavum, these genes are also surrounded by a variety of hemicellulose-active enzymes in an operon from Clocl_2435 to Clocl_2447 that includes three family 10 glycosyl hydrolases. Considering that none of these enzymes is present in C. thermocellum, there should be great interest in further exploring this operon in C. clariflavum and in environmental isolates. An alternative xylose epimerase (EC 5.1.3.1) that produces D-ribulose-5P used in the pentose phosphate pathway is present elsewhere in the genome (Clocl_2564). It therefore seems that C. clariflavum DSM 19732 has many of the capabilities needed to grow on xylan and xylose, but seems to have lost that ability owing to the absence of a xylose isomerase.
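The reasoning above amounts to a presence/absence check along a pathway, which can be sketched as follows. The two-step route (xylose to xylulose via xylose isomerase, then to xylulose-5P via xylulose kinase) is a deliberately simplified assumption, the names are hypothetical, and the gene calls are only those stated in the text.

```python
# Presence/absence as stated in the text for C. clariflavum DSM 19732.
# Values are locus tags from the text; None marks an enzyme reported as not found.
GENOME_CALLS = {
    "xylose isomerase": None,
    "xylulose kinase": "Clocl_2440",
}

# Simplified, assumed route: xylose -> xylulose -> xylulose-5P.
XYLOSE_ROUTE = ["xylose isomerase", "xylulose kinase"]


def missing_steps(route: list[str], calls: dict) -> list[str]:
    """Return the enzymes on the route that have no gene call in the genome."""
    return [enzyme for enzyme in route if calls.get(enzyme) is None]


gaps = missing_steps(XYLOSE_ROUTE, GENOME_CALLS)
print("route complete" if not gaps else f"missing: {', '.join(gaps)}")
```

Under these assumptions the only gap is the isomerase step, consistent with the suggestion that its absence explains why the type strain does not grow on xylose.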

Conclusion
In summary, the genome of C. clariflavum strain DSM 19732 contains several features that differentiate this organism from other close relatives within the Cluster III cellulolytic clostridia, and C. thermocellum in particular, providing the first indications of the mechanisms by which C. clariflavum strains utilize lignocellulosic biomass. Seventy-two new glycosyl hydrolases were identified in C. clariflavum, with prominently represented structural families including GH9, GH10, GH11 and GH43. Bifunctional arrangements of key GHs are observed involving both cellulosomal (e.g. xylanases GH10, GH11) and non-cellulosomal (e.g. GH9 and GH48) components, and are more prevalent than in C. thermocellum. Xylanases are also more numerous in C. clariflavum than in C. thermocellum. Unique among cellulolytic clostridia of Cluster III, the C. clariflavum genome includes putative sequences for both pyruvate kinase, which is not found in C. thermocellum, and pyruvate dikinase.