Non-contiguous finished genome sequence and description of Corynebacterium jeddahense sp. nov.

Corynebacterium jeddahense sp. nov., strain JCB T, is the type strain of Corynebacterium jeddahense sp. nov., a new species within the genus Corynebacterium. This strain, whose genome is described here, was isolated from the fecal flora of a 24-year-old Saudi male suffering from morbid obesity. Corynebacterium jeddahense is a Gram-positive, facultative anaerobic, nonsporulating bacillus. Here, we describe the features of this bacterium, together with the complete genome sequencing and annotation, and compare it to other members of the genus Corynebacterium. The 2,472,125 bp-long genome (1 chromosome but no plasmid) contains 2,359 protein-coding and 53 RNA genes, including 1 rRNA operon.


Introduction
Corynebacterium jeddahense strain JCB T (= CSUR P778 = DSM 45997) is the type strain of C. jeddahense sp. nov. This bacterium is a Gram-positive, non-spore-forming, strictly aerobic and non-motile bacillus that was isolated from the feces of a 24-year-old man living in Jeddah, Saudi Arabia, who suffered from morbid obesity. This isolation was part of a "culturomics" study aiming at cultivating the maximum number of bacterial species from human feces [1,2]. The current classification of bacteria remains a matter of debate and relies on a combination of phenotypic and genomic characteristics [3]. Currently, more than 12,000 bacterial genomes have been sequenced [4], and we recently proposed an innovative concept for the taxonomic description of new bacterial species that integrates their genomic characteristics as well as proteomic information obtained by MALDI-TOF-MS analysis [36]. In the present study, we present a summary classification and a set of features for Corynebacterium jeddahense sp. nov., strain JCB T (= CSUR P778 = DSM 45997), including the description of its complete genome sequence and annotation. These characteristics support the circumscription of the species Corynebacterium jeddahense. The genus Corynebacterium was created in 1896 by Lehmann and Neumann and currently consists of mainly Gram-positive, non-spore-forming, rod-shaped bacteria with a high DNA G+C content [37]. This genus belongs to the phylum Actinobacteria and currently includes more than 100 species with standing in nomenclature [38]. Members of the genus Corynebacterium are found in various environments including water, soil, sewage, and plants, as well as in normal human skin flora and in human or animal clinical samples. Some Corynebacterium species are well-established human pathogens while others are only considered opportunistic pathogens. Corynebacterium diphtheriae, the agent of diphtheria, is the most significant pathogen in this genus [39].
However, many Corynebacterium species including, among others, C. jeikeium, C. urealyticum, C. striatum, C. ulcerans and C. pseudotuberculosis, are recognized agents of bacteremias, endocarditis, urinary tract infections, and respiratory or wound infections [40].

Classification and features
A stool sample was collected from a 24-year-old man living in Jeddah, Saudi Arabia, who suffered from morbid obesity (BMI=52). The patient gave signed informed consent. The study and the assent procedure were approved by the Ethics Committees of the King Abdulaziz University, King Fahd Medical Research Center, Saudi Arabia, under agreement number 014-CEGMR-2-ETH-P, and of the Institut Fédératif de Recherche 48, Faculty of Medicine, Marseille, France, under agreement number 09-022. The patient was not taking any antibiotics at the time of stool sample collection and the fecal sample was kept at -80°C after collection. Strain JCB T (Table 1) was first isolated in July 2013 by cultivation on 5% sheep blood-enriched Columbia agar (BioMerieux, Marcy l'Etoile, France) in aerobic atmosphere with 5% CO2 at 37°C after a 14-day preincubation of the stool sample in an aerobic blood culture bottle that also contained sterile sheep rumen fluid. Several other new bacterial species were isolated from this stool specimen using various culture conditions.
Table 1 (last row and footnote). Altitude: 0 m above sea level (evidence code IDA). a Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [51]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
This strain exhibited a 96.8% nucleotide sequence similarity with C. coyleae, the phylogenetically most closely related Corynebacterium species with a validly published name (Figure 1).
The similarity value was lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [52], and was in the 82.9 to 99.60% range observed among members of the genus Corynebacterium with standing in the nomenclature [53]. Four growth temperatures (25, 30, 37, 45°C) were tested. Growth occurred between 30 and 45°C on blood-enriched Columbia agar (BioMerieux), with optimal growth obtained at 37°C after 48 hours of incubation. Growth of the strain was tested under anaerobic and microaerophilic conditions using GENbag Anaer and GENbag microaer systems, respectively (BioMerieux), and under aerobic conditions, with or without 5% CO2. Optimal growth was achieved aerobically. Weak cell growth was observed under microaerophilic and anaerobic conditions. The motility test was negative and the cells did not sporulate. Colonies were translucent and 1 mm in diameter on blood-enriched Columbia agar. Cells were Gram-positive rods (Figure 2). In electron microscopy, the bacteria grown on agar had a mean diameter and length of 0.63 and 1.22 μm, respectively (Figure 3). Strain JCB T was catalase positive and oxidase negative. Using an API CORYNE strip, a positive reaction was observed only for alkaline phosphatase and for catalase. Negative reactions were observed for reduction of nitrates, pyrrolidonyl arylamidase, pyrazinamidase, β-glucuronidase, β-galactosidase, α-glucosidase, N-acetyl-β-glucosaminidase, β-glucosidase, urease, gelatin hydrolysis and fermentation of glucose, ribose, xylose, mannitol, maltose, lactose, saccharose and glycogen.
Using the API ZYM system (BioMerieux), alkaline and acid phosphatases and naphthol-AS-BI-phosphohydrolase activities were positive, but esterase (C4), esterase lipase (C8), lipase (C14), trypsin, α-chymotrypsin, α-galactosidase, β-galactosidase, β-glucuronidase, α-glucosidase, N-acetyl-β-glucosaminidase, leucine arylamidase, valine arylamidase, cystine arylamidase, α-mannosidase and α-fucosidase activities were negative.
Substrate oxidation and assimilation were examined with an API 50CH strip (BioMerieux) at 37°C. All reactions were negative, including fermenta- C. jeddahense is susceptible to amoxicillin, ceftriaxone, imipenem, rifampin, gentamicin, doxycycline and vancomycin, but resistant to ciprofloxacin, trimethoprim/sulfamethoxazole, erythromycin and metronidazole. When compared with representative species from the genus Corynebacterium, C. jeddahense strain JCB T exhibited the phenotypic differences detailed in Table 2.
Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [36] using a Microflex spectrometer (Bruker Daltonics, Leipzig, Germany). Twelve individual colonies were deposited on an MTP 384 MALDI-TOF target plate (Bruker). The twelve spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 4,706 bacteria, including 169 spectra from 69 validly named Corynebacterium species used as reference data in the BioTyper database. The resulting score enabled the presumptive identification and discrimination of the tested species from those in the database: a score > 2 with a validated species enabled identification at the species level, and a score < 1.7 did not enable any identification. For strain JCB T, no significant score was obtained, suggesting that our isolate was not a member of any known species (Figures 4 and 5).
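The score interpretation described above can be written down as a small decision rule. The following is only an illustrative sketch: the function name `interpret_biotyper_score` is hypothetical and is not part of the BioTyper software, and only the two cut-offs stated in the text (> 2 and < 1.7) are encoded. The text does not say how scores between 1.7 and 2 are read, so that case is returned as unspecified rather than guessed.

```python
def interpret_biotyper_score(score: float) -> str:
    """Map a pattern-matching score to the interpretation given in the text.

    Hypothetical helper for illustration; only the cut-offs stated in the
    text are used.
    """
    if score > 2.0:
        # Stated in the text: identification at the species level
        # (when the best match is a validated species).
        return "species-level identification"
    if score < 1.7:
        # Stated in the text: no identification possible.
        return "no identification"
    # Scores from 1.7 to 2.0 are not described in the text.
    return "not specified in the text"


if __name__ == "__main__":
    for s in (2.3, 1.85, 1.2):
        print(s, "->", interpret_biotyper_score(s))
```

Under this rule, a strain whose twelve spectra all score below 1.7 would be reported as unidentified, which corresponds to the "no significant score" outcome described for strain JCB T.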

Genome sequencing information

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position, 16S rDNA similarity and phenotypic differences with members of the genus Corynebacterium, and is part of a culturomics study of the human digestive flora aiming at isolating all bacterial species within human feces [2]. It was the 96th genome from a Corynebacterium species. The EMBL accession number is CBYN00000000 and consists of 244 contigs. Table 3 shows the project information and its association with MIGS version 2.0 compliance [41].

Genome sequencing and assembly
Genomic DNA of C. jeddahense was sequenced on a MiSeq sequencer (Illumina Inc, San Diego, CA, USA) using both paired-end and mate-pair sequencing with the Nextera XT DNA sample and Nextera Mate Pair sample prep kits, respectively (Illumina).
To prepare the paired-end library, genomic DNA was diluted 1:3 to obtain a 1 ng/µl concentration. The "tagmentation" step fragmented and tagged the DNA, yielding fragments with a mean size of 1.4 kb. Then, a limited PCR amplification (12 cycles) completed the tag adapters and introduced dual-index barcodes. After purification on AMPure XP beads (Beckman Coulter Inc, Fullerton, CA, USA), the library was normalized on specific beads according to the Nextera XT protocol (Illumina). The pooled single-strand library was loaded onto the reagent cartridge and then onto the instrument along with the flow cell. Automated cluster generation and paired-end sequencing with dual index reads were performed in a single 39-hour run in 2x250-bp mode. A total of 5.3 Gb of information was obtained from a cluster density of 574 K/mm², with 95.4% of clusters passing quality control filters (11,188,000 clusters). Within this run, the index representation for Corynebacterium jeddahense was determined to be 6.2%. The 641,099 reads were filtered according to the read qualities.
The mate-pair library was prepared with 1 µg of genomic DNA following the Nextera Mate Pair guide (Illumina). The genomic DNA sample was simultaneously fragmented and tagged with a mate-pair junction adapter. The fragmentation profile was validated on an Agilent 2100 BioAnalyzer (Agilent Technologies Inc, Santa Clara, CA, USA) with a DNA 7500 labchip. The DNA fragments ranged in size from 1 kb to 10 kb, with a mean size of 2.6 kb. No size selection was performed and 105 ng of tagmented fragments were circularized. The circularized DNA was mechanically sheared to small fragments with an optimum at 409 bp on the Covaris S2 device in microtubes (Covaris, Woburn, MA, USA). The library profile was visualized on a High Sensitivity Bioanalyzer LabChip (Agilent Technologies Inc, Santa Clara, CA, USA). After a denaturation step and dilution to 10 pM, the library was loaded onto the reagent cartridge and then onto the instrument along with the flow cell. Automated cluster generation and sequencing were performed in a single 42-hour run in 2x250-bp mode. A total of 3.9 Gb of information was obtained from a cluster density of 399 K/mm², with 97.9% of clusters passing quality control filters (7,840,000 clusters). Within this run, the index representation for Corynebacterium jeddahense was determined to be 8.17%. The 626,585 reads were filtered according to the read qualities. Genome assembly was performed using Newbler (Roche).

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [60] with default parameters. However, the predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank [61] and Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAs and rRNAs were predicted using the tRNAScan-SE [62] and RNAmmer [63] tools, respectively. Lipoprotein signal peptides and numbers of transmembrane helices were predicted using SignalP [64] and TMHMM [65], respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have already been used in previous works to define ORFans. Artemis [66] and DNAPlotter [67] were used for data management and visualization of genomic features, respectively. The Mauve alignment tool (version 2.3.1) was used for multiple genomic sequence alignments [68]. To estimate the mean level of nucleotide sequence similarity at the genome level between C. jeddahense and four other members of the genus Corynebacterium (Tables 6 and 7), we used the Average Genomic Identity Of gene Sequences (AGIOS) home-made software [35]. Briefly, this software uses Proteinortho [69] to detect orthologous proteins between genomes compared two by two, then retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity among orthologous ORFs using the Needleman-Wunsch global alignment algorithm.
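As an illustration of the last step only, the sketch below computes a mean percentage of nucleotide identity over pairs of orthologous gene sequences using a plain Needleman-Wunsch global alignment. It is not the AGIOS software itself: the function names, the scoring scheme (match +1, mismatch -1, gap -1) and the definition of identity as matches divided by alignment length are assumptions made for the example, and the orthologous pairs are taken as given rather than detected with Proteinortho.

```python
from statistics import mean


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; scoring values are illustrative assumptions."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identity = identical aligned positions / alignment length (an assumption)."""
    aln_a, aln_b = needleman_wunsch(a, b)
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


def mean_ortholog_identity(ortholog_pairs):
    """Mean identity over (gene_a, gene_b) nucleotide pairs assumed orthologous."""
    return mean(percent_identity(a, b) for a, b in ortholog_pairs)


if __name__ == "__main__":
    toy_pairs = [("ATGC", "ATGC"), ("ATGC", "ATGA")]  # toy data, not real genes
    print(mean_ortholog_identity(toy_pairs))
```

The quadratic dynamic-programming table is adequate for gene-length sequences in a sketch like this; a production tool would more likely rely on an optimized alignment library and a biologically motivated scoring scheme.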

Genome properties
The genome of C. jeddahense strain JCB T is 2,472,125 bp long (one chromosome, no plasmid) with a G+C content of 67.2% (Figure 6, Table 4). Of the 2,412 predicted chromosomal genes, 2,359 were protein-coding genes and 53 were RNAs. A total of 1,462 genes (60.61%) were assigned a putative function. Sixty-seven genes were identified as ORFans (2.77%) and the remaining genes were annotated as hypothetical proteins. The properties and statistics of the genome are summarized in Table 4. The distribution of genes into COGs functional categories is presented in Table 5.
Table 4 footnote. a The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
Table 5 (last row and footnote). Not in COGs. a The total is based on the total number of protein-coding genes in the annotated genome.
Standards in Genomic Sciences

Genome comparison of C. jeddahense with other Corynebacterium genomes
We compared the genome of C. jeddahense strain JCB T with those of C. efficiens strain YS-314 T, C. lipophiloflavum strain DSM 44291 T, C. glutamicum strain ATCC 13032 T and C. pseudotuberculosis strain CIP 102968 T (Tables 6 and 7). The draft genome sequence of C. jeddahense strain JCB T is larger than those of C. efficiens, C. lipophiloflavum and C. glutamicum (2.47, 2.26, 2.43 and 2.11 Mb, respectively), but smaller than that of C. pseudotuberculosis (2.48 Mb). The G+C content of C. jeddahense is higher than those of C. efficiens, C. lipophiloflavum, C. glutamicum and C. pseudotuberculosis (67.2, 62.9, 64.8, 53.8, and 52.1%, respectively). The gene content of C. jeddahense (2,359) is smaller than those of C. efficiens, C. lipophiloflavum and C. glutamicum (2,398, 2,371 and 2,993, respectively) but larger than that of C. pseudotuberculosis (2,060). The distribution of genes into COG categories was similar but not identical in all four compared genomes (Figure 7). In addition, C. jeddahense shared 1,369, 1,345, 1,385 and 1,230 orthologous genes with C. efficiens, C. lipophiloflavum, C. glutamicum and C. pseudotuberculosis, respectively. The AGIOS value ranged from 66.7 to 75.04% among the compared Corynebacterium species other than C. jeddahense. When C. jeddahense was compared to the other species, the AGIOS value ranged from 66.44% with C. pseudotuberculosis to 77.26% with C. lipophiloflavum, thus confirming its new species status (Table 7).

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Corynebacterium jeddahense sp. nov., which contains the strain JCB T. The strain was isolated from the fecal flora of a Saudi man suffering from morbid obesity. Several other as yet undescribed bacterial species were also cultivated from different fecal samples through diversification of culture conditions, thus suggesting that the human fecal flora remains partially unknown.