Non-contiguous finished genome sequence and description of Clostridium jeddahense sp. nov.

Clostridium jeddahense strain JCD T (= CSUR P693 = DSM 27834) is the type strain of C. jeddahense sp. nov. This strain, whose genome is described here, was isolated from the fecal flora of an obese 24-year-old Saudi male (BMI = 52 kg/m2). Clostridium jeddahense strain JCD T is an obligately anaerobic, Gram-positive bacillus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,613,503 bp long genome (1 chromosome, no plasmid) exhibits a G+C content of 51.95% and contains 3,462 protein-coding and 53 RNA genes, including 4 rRNA genes.


Introduction
Clostridium jeddahense strain JCD T (= CSUR P693 = DSM 27834) is the type strain of Clostridium jeddahense sp. nov. This bacterium is a Gram-positive, anaerobic, spore-forming, indole-positive bacillus that was isolated from the stool of an obese 24-year-old Saudi individual as part of a culturomics study, as previously reported. The usual parameters used to delineate a bacterial species include 16S rDNA sequence identity and phylogeny [1,2], genomic G+C content diversity, and DNA-DNA hybridization (DDH) [3,4]. Nevertheless, these parameters have limitations, notably because the cutoff values vary dramatically between species and genera [5]. The introduction of high-throughput sequencing techniques made genomic data available for many bacterial species [6]. We recently proposed a new method (taxonogenomics), which includes genomic data in a polyphasic approach to describe new bacterial species [6]. This strategy combines phenotypic characteristics, including the MALDI-TOF MS spectrum, and genomic analysis. Here, we present a summary classification and a set of features for C. jeddahense sp. nov. strain JCD T (= CSUR P693 = DSM 27834), together with the description of the complete genome sequencing and annotation. These characteristics support the circumscription of the species C. jeddahense.
The genus Clostridium was created in 1880 [38] and consists of obligate anaerobic rod-shaped bacilli able to produce endospores [38]. More than 200 species have been described to date (http://www.bacterio.cict.fr/c/clostridium.html). Members of the genus Clostridium are mostly environmental bacteria or are associated with the commensal digestive flora of mammals. However, several are major human pathogens, including C. botulinum, C. difficile and C. tetani [38].
Standards in Genomic Sciences

Classification and features
A stool sample was collected from an obese 24-year-old male Saudi volunteer patient living in Jeddah. The patient gave informed, signed consent, and the agreement of the Ethical Committee of the King Abdulaziz University, King Fahd Medical Research Centre, Saudi Arabia, and of the local ethics committee of the IFR48 (Marseille, France) was obtained under agreement numbers 014-CEGMR-2-ETH-P and 09-022, respectively. The fecal specimen was preserved at -80°C after collection and sent to Marseille. Strain JCD T (Table 1) was isolated in July 2013 by anaerobic cultivation on 5% sheep blood-enriched Columbia agar (BioMerieux, Marcy l'Etoile, France) after a 5-day preincubation in a blood culture bottle with rumen fluid. This strain exhibited a 97.3% 16S rRNA gene nucleotide sequence similarity with Clostridium sporosphaeroides strain DSM 1294 (Figure 1). This value was lower than the 98.7% 16S rRNA gene sequence similarity threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [2] and was in the 78.4 to 98.9% range of 16S rRNA identity values observed among 41 Clostridium species with validly published names [52].
Table 1 (excerpt): Altitude, 0 m above sea level (evidence code IDA).
Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [51]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Figure 1.
A consensus phylogenetic tree highlighting the position of Clostridium jeddahense strain JCD T relative to other type strains within the genus Clostridium. GenBank accession numbers are indicated in parentheses. Sequences were aligned using CLUSTALW, and phylogenetic inferences were obtained using the maximum-likelihood method in the MEGA software package. Numbers at the nodes are the percentages of bootstrap values from 500 replicates that support the node. Clostridium ramosum was used as the outgroup. The scale bar represents a 2% nucleotide sequence divergence.
Four growth temperatures (25, 30, 37 and 45°C) were tested; growth occurred between 25 and 37°C, with optimal growth observed at 37°C, 24 hours after inoculation. No growth occurred at 45°C. Colonies were translucent and approximately 0.2 to 0.3 mm in diameter on 5% sheep blood-enriched Columbia agar (BioMerieux). Growth of the strain was tested on the same agar under anaerobic and microaerophilic conditions using GENbag anaer and GENbag microaer systems, respectively (BioMerieux), and in aerobic conditions, with or without 5% CO2. Growth was observed only anaerobically; no growth occurred in aerobic or microaerophilic conditions. Gram staining showed Gram-positive rods able to form spores (Figure 2). A motility test was positive. Cells grown on agar exhibited a mean diameter of 1 µm and a mean length of 1.22 µm by electron microscopy (Figure 3).
Strain JCD T exhibited neither catalase nor oxidase activity (Table 2). Using an API Rapid ID 32A strip (BioMerieux), positive reactions were obtained for indole production, alkaline phosphatase, arginine arylamidase, proline arylamidase, alanine arylamidase, glycine arylamidase, histidine arylamidase, glutamyl glutamic acid arylamidase and serine arylamidase. Negative reactions were obtained for arginine dihydrolase, α-galactosidase, β-galactosidase, α-glucosidase, β-glucosidase, α-arabinosidase, N-acetyl-β-glucosaminidase, glutamic acid decarboxylase, α-fucosidase, nitrate reduction, leucyl glycine arylamidase, fermentation of mannose and raffinose, urease, β-galactosidase-6-phosphatase, β-glucuronidase, phenylalanine arylamidase, leucine arylamidase, pyroglutamic acid arylamidase and tyrosine arylamidase. Using an API 50CH strip (BioMerieux), strain JCD T was asaccharolytic. C. jeddahense is susceptible to amoxicillin, amoxicillin-clavulanate, imipenem, metronidazole, doxycycline, rifampicin and vancomycin, but resistant to ceftriaxone, ciprofloxacin and trimethoprim-sulfamethoxazole. The comparisons with other Clostridium species are summarized in Table 2.
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [54]. Briefly, a pipette tip was used to pick one isolated bacterial colony from a culture agar plate and spread it as a thin film on an MTP 384 MALDI-TOF target plate (Bruker Daltonics, Leipzig, Germany). Twelve distinct deposits were made from twelve isolated colonies of strain JCD T. Each smear was overlaid with 2 µL of matrix solution (saturated solution of alpha-cyano-4-hydroxycinnamic acid) in 50% acetonitrile, 2.5% trifluoroacetic acid, and allowed to dry for 5 minutes. Measurements were performed with a Microflex spectrometer (Bruker).
Spectra were recorded in the positive linear mode for the mass range of 2,000 to 20,000 Da (parameter settings: ion source 1 (IS1), 20 kV; IS2, 18.5 kV; lens, 7 kV). A spectrum was obtained after 675 shots with variable laser power. The time of acquisition was between 30 seconds and 1 minute per spot. The twelve JCD T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 3,769 bacteria, including 228 spectra from 96 Clostridium species. The method of identification included the m/z from 3,000 to 15,000 Da. For every spectrum, a maximum of 100 peaks were compared with the spectra in the database. The resulting score determined the level of identification: a score ≥ 2 with a validly published species enabled identification at the species level, a score ≥ 1.7 but < 2 enabled identification at the genus level, and a score < 1.7 did not enable any identification. No significant MALDI-TOF score was obtained for strain JCD T against the Bruker database, suggesting that our isolate was not a member of a known species. We added the spectrum from strain JCD T to our database (Figure 4). Finally, the gel view showed the spectral differences with other members of the genus Clostridium (Figure 5).
In the gel view (Figure 5), the left y-axis displays the running spectrum number originating from subsequent spectra loading. The peak intensity is expressed by a grey scale scheme code. The grey scale bar on the right y-axis indicates the relation between the shade of grey with which a peak is displayed and the peak intensity in arbitrary units. Species names are shown on the left.
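The score interpretation used for MALDI-TOF identification can be sketched as a small decision rule. This is a minimal illustration, not the BioTyper software itself: the function names are hypothetical, the 2.0 species-level cutoff is the conventional BioTyper threshold (only the 1.7 cutoff is fully legible in the text above), and the replicate scores in the usage note are invented for illustration rather than measured values.

```python
def classify_biotyper_score(score: float) -> str:
    """Map a MALDI BioTyper score to an identification level.

    Assumed cutoffs: >= 2.0 species level, >= 1.7 genus level,
    below 1.7 no reliable identification.
    """
    if score >= 2.0:
        return "species"
    if score >= 1.7:
        return "genus"
    return "none"


def best_identification(scores):
    """Return (best score, level) over the replicate spots of one isolate.

    `scores` must be a non-empty sequence of floats.
    """
    best = max(scores)
    return best, classify_biotyper_score(best)
```

For example, `best_identification([1.2, 1.45, 1.3])` reports the level `"none"`, which corresponds to the situation described for strain JCD T, where no replicate reached a significant score against the database.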

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rDNA similarity to members of the genus Clostridium, and is part of a study of the human digestive flora aiming at isolating all bacterial species in human feces [55]. It was the 101st genome of a Clostridium species and the first genome of C. jeddahense sp. nov. The GenBank accession number is CBYL00000000. The assembly consists of 104 contigs. Table 3 shows the project information and its association with MIGS version 2.0 compliance [39].

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [56] with default parameters. However, the predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank [57] and Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAs and rRNAs were predicted using the tRNAScan-SE [58] and RNAmmer [59] tools, respectively. Signal peptides and numbers of transmembrane helices were predicted using SignalP [60] and TMHMM [61], respectively. Mobile genetic elements were predicted using PHAST [62] and RAST [63].
ORFans were identified when BLASTP returned no hit with an E-value lower than 1e-03 for alignment lengths greater than 80 amino acids. For alignment lengths smaller than 80 amino acids, an E-value threshold of 1e-05 was used. Such parameter thresholds have already been used in previous work to define ORFans. Artemis [64] and DNA Plotter [65] were used for data management and visualization of genomic features, respectively. The Mauve alignment tool (version 2.3.1) was used for multiple genomic sequence alignment [66].
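The ORFan thresholds above can be expressed as a short filter. The sketch below is illustrative only and rests on stated assumptions: an ORF is treated as an ORFan when none of its BLASTP hits passes the length-dependent E-value cutoff, an alignment of exactly 80 amino acids (a case the text does not specify) falls under the stricter 1e-05 cutoff, and the function names and input layout are hypothetical.

```python
def is_significant_hit(evalue: float, aln_len: int) -> bool:
    """Apply the length-dependent E-value cutoff to one BLASTP hit.

    Alignments longer than 80 amino acids use 1e-03; shorter ones use 1e-05.
    Treating a length of exactly 80 as "short" is an assumption.
    """
    cutoff = 1e-03 if aln_len > 80 else 1e-05
    return evalue < cutoff


def find_orfans(hits_by_orf):
    """Return the sorted IDs of ORFs with no significant BLASTP hit.

    `hits_by_orf` maps an ORF ID to a list of (evalue, alignment_length)
    tuples; an empty list means BLASTP returned nothing for that ORF.
    """
    return sorted(
        orf_id
        for orf_id, hits in hits_by_orf.items()
        if not any(is_significant_hit(e, n) for e, n in hits)
    )
```

With the made-up input `{"orf1": [(1e-10, 120)], "orf2": [(1e-4, 60)], "orf3": []}`, only `orf2` and `orf3` are reported: the short alignment of `orf2` fails the 1e-05 cutoff, and `orf3` has no hit at all.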
To estimate the mean level of nucleotide sequence similarity at the genome level between C. jeddahense and 7 other members of the genus Clostridium, we used the Average Genomic Identity Of gene Sequences (AGIOS) in-house software [6]. Briefly, this software uses Proteinortho [67] to detect orthologous proteins between pairs of genomes, then retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity among orthologous ORFs using the Needleman-Wunsch global alignment algorithm. C. jeddahense strain JCD T was compared to C. senegalense strain JC122, C. dakarense strain FF1, Clostridium beijerinckii strain NCIMB 8052, C. difficile strain B1, Clostridium cellulolyticum strain H10, Clostridium leptum strain DSM 753, and Clostridium sporosphaeroides strain DSM 1294 (see Table 6B).
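The final step of the AGIOS calculation, averaging the global-alignment identity over orthologous gene pairs, can be sketched as follows. This is not the AGIOS software: ortholog detection (done with Proteinortho in the text) is assumed to have happened upstream, the scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption chosen for brevity, and identity is defined here as matching columns divided by alignment length. Production aligners typically use substitution matrices and affine gap penalties, so their values would differ.

```python
def needleman_wunsch_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity of one optimal global alignment of sequences a and b."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back one optimal path, counting identical columns.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                if a[i - 1] == b[j - 1]:
                    matches += 1
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return 100.0 * matches / columns if columns else 0.0


def mean_ortholog_identity(ortholog_pairs):
    """Mean percent identity over (gene_a, gene_b) nucleotide sequence pairs."""
    identities = [needleman_wunsch_identity(a, b) for a, b in ortholog_pairs]
    return sum(identities) / len(identities)
```

The quadratic-memory table keeps the example readable; for real gene-length sequences a dedicated alignment library would be the more practical choice.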

Genome properties
The genome is 3,613,503 bp long (1 chromosome, no plasmid) with a 51.95% G+C content (Figure 6 and Table 4). Of the 3,515 predicted genes, 3,462 were protein-coding genes and 53 were RNAs, including 4 rRNAs. A total of 2,193 genes (62.38%) were assigned a putative function and 81 genes were identified as ORFans (2.3%). The properties and statistics of the genome are summarized in Tables 4 and 5. The distribution of genes into COG functional categories is presented in Table 5. When we compared C. jeddahense with other species, AGIOS values ranged from 57.52% with C. senegalense to 91.97% with C. sporosphaeroides. Although the AGIOS value between C. jeddahense and C. sporosphaeroides was high, we believe that the marked phenotypic differences, including motility, indole production (Table 2), and protein profile (Figure 7), justify the classification of C. jeddahense as a new species.
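Two of the summary figures above follow from simple arithmetic, which the short sketch below reproduces. The gene counts are taken from the text; the helper names are hypothetical, and `gc_content` is a generic illustration of how a G+C percentage is computed from a sequence, not the procedure used to obtain the reported 51.95%.

```python
PROTEIN_CODING_GENES = 3462  # from the text
RNA_GENES = 53               # from the text
ORFAN_GENES = 81             # from the text


def percent(part, whole, digits=2):
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


def gc_content(sequence):
    """Percentage of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Total predicted genes is the sum of protein-coding and RNA genes.
total_genes = PROTEIN_CODING_GENES + RNA_GENES
# Share of ORFans among all predicted genes.
orfan_percent = percent(ORFAN_GENES, total_genes)
```

Here `total_genes` matches the 3,515 predicted genes and `orfan_percent` matches the 2.3% reported for ORFans.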

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses (taxonogenomics), we formally propose the creation of Clostridium jeddahense sp. nov., which contains strain JCD T. This strain was isolated from the fecal flora of an obese 24-year-old Saudi individual living in Jeddah.