Genome sequence and description of Corynebacterium ihumii sp. nov.

Corynebacterium ihumii strain GD7T sp. nov. is proposed as the type strain of a new species, which belongs to the family Corynebacteriaceae of the class Actinobacteria. This strain was isolated from the fecal flora of a 62-year-old male patient as part of a culturomics study. Corynebacterium ihumii is a Gram-positive, facultatively anaerobic, nonsporulating bacillus. Here, we describe the features of this organism, together with its high-quality draft genome sequence and annotation, and compare it with other members of the genus Corynebacterium. The C. ihumii genome is 2,232,265 bp long (one chromosome, no plasmid) and contains 2,125 protein-coding genes and 57 RNA genes, including 4 rRNA genes. The whole-genome shotgun sequence of Corynebacterium ihumii strain GD7T sp. nov. has been deposited in EMBL under accession number GCA_000403725.


Introduction
Corynebacterium ihumii strain GD7T (= CSUR P902 = DSM 45751) is the type strain of Corynebacterium ihumii sp. nov. This bacterium is a Gram-positive, facultatively anaerobic, non-spore-forming, non-motile bacillus that was isolated from the stool of a 62-year-old French male admitted to the intensive care unit of the Timone Hospital, Marseille, France, for respiratory distress. The strain was isolated as part of a "culturomics" project that aims to cultivate all species present in human feces [1,2]. The current classification of prokaryotes is based on a combination of phenotypic and genotypic characteristics [3,4], including 16S rRNA gene phylogeny and nucleotide sequence similarity, G+C content and DNA-DNA hybridization (DDH). Although these genotypic tools are considered a "gold standard", they have several drawbacks that newer sequencing methods can overcome [5,6]. Because the cost of sequencing has declined rapidly, the number of sequenced bacterial genomes has increased quickly (almost 7,000 to date [7]). Hence, we recently proposed incorporating genomic information among the criteria used to describe new bacterial species. Corynebacteria are Gram-positive bacteria with a high G+C content that belong to the phylum Actinobacteria. They are found in diverse ecological niches such as soil, clinical specimens, cheese smear, vegetables and sewage. The genus Corynebacterium was created by Lehmann and Neumann in 1896 [30] and currently comprises 112 distinct species and 11 subspecies [31]. Many Corynebacterium species are involved in human and animal diseases, including C. diphtheriae [32], C. jeikeium, C. urealyticum, C. striatum, C. pseudotuberculosis and C. ulcerans [33]. Others, such as C. glutamicum, have industrial applications in amino acid production [34]. Here, we present a summary classification and a set of features for Corynebacterium ihumii strain GD7T sp. nov. (= CSUR P902 = DSM 45751), together with the description of its genome sequencing and annotation.

Classification and Features
A stool sample was collected from a 62-year-old male admitted to the intensive care unit of the Timone Hospital in Marseille, France. The patient gave written informed consent for the study. The study was approved by the Ethics Committee of the Institut Fédératif de Recherche IFR48, Faculty of Medicine, Marseille, France, under agreement number 09-022. The fecal specimen was preserved at -80°C after collection. Strain GD7T (Table 1) was isolated in January 2012 by cultivation on PVX agar (BioMerieux, Marcy l'Etoile, France) under aerobic conditions with 5% CO2 at 37°C, after 21 days of incubation.

Table 1 evidence codes: NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [45]. If the evidence is IDA (Inferred from Direct Assay), then the property was observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing information

Genome project history
As part of a 'culturomics' study of the human digestive flora, this organism was isolated and selected for sequencing on the basis of its phenotypic differences, phylogenetic position, and 16S rRNA and rpoB sequence similarity to other members of the genus Corynebacterium [1,2]. It is the first sequenced genome of C. ihumii sp. nov. The GenBank BioProject number is PRJEB646, and the assembly consists of 41 large contigs in 5 scaffolds. Table 3 shows the project information and its compliance with MIGS version 2.0 [47].


Genome sequencing and assembly
The 454 GS-FLX Titanium paired-end protocol (Roche, Meylan, France) was used to construct the library of C. ihumii strain GD7T, which was then pyrosequenced. Briefly, 3.7 µg of purified chromosomal DNA was mechanically fragmented on the Covaris device (KBioScience-LGC Genomics, Middlesex, UK) using a miniTUBE-Red with a target enrichment size of 5 kb. DNA fragmentation was visualized with the Agilent 2100 BioAnalyzer on a DNA LabChip 7500, showing an optimal size of 2.5 kb. Circularization and nebulization were performed on 100 ng of the fragmented DNA and generated an optimal pattern of 443 bp. This was followed by 17 cycles of PCR amplification and a double size selection. The single-stranded paired-end library was then quantified with the Quant-iT RiboGreen kit (Invitrogen) on the Genios Tecan fluorometer at 207 pg/µL, which corresponds to a library concentration of 8.57 × 10^8 molecules/µL. The library was stored at -20°C until further use. The shotgun library was clonally amplified at 0.5 and 1 cpb (copies per bead) in two emPCR reactions per condition, using the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The yields of the paired-end emPCR reactions were 5.27% and 7.56%, respectively, within the 5 to 20% range expected by the Roche procedure. The library was loaded onto a 1/4 region of a GS Titanium PicoTiterPlate (PTP Kit 70x75, Roche) and pyrosequenced with the GS Titanium Sequencing Kit XLR70 on the GS FLX Titanium sequencer (Roche). The run was performed overnight and analyzed on the cluster with gsRunBrowser and the Newbler assembler (Roche). A total of 186,723 passed-filter wells were obtained, generating 69.4 Mb with an average read length of 371 bp. The passed-filter sequences were assembled with Newbler using a 90% identity threshold and a 40 bp minimum overlap.
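The reported library concentration of 8.57E+08 molecules/µL can be reproduced from the 207 pg/µL fluorometric reading and the 443 bp fragment pattern with the usual mass-to-molecule conversion. The sketch below assumes an average molar mass of 328.3 g/mol per single-stranded nucleotide (a constant not stated in the text, chosen here because it reproduces the reported figure); the function name is hypothetical and this is not the Roche worksheet itself.

```python
# Avogadro's number, in molecules per mole.
AVOGADRO = 6.022e23

# ASSUMPTION: average molar mass of one single-stranded DNA nucleotide
# (g/mol). This constant is not given in the text.
SS_NT_MASS_G_PER_MOL = 328.3


def molecules_per_ul(conc_pg_per_ul: float, mean_length_nt: float,
                     nt_mass: float = SS_NT_MASS_G_PER_MOL) -> float:
    """Convert a single-stranded DNA mass concentration into molecules/µL."""
    grams_per_ul = conc_pg_per_ul * 1e-12            # pg -> g
    fragment_molar_mass = mean_length_nt * nt_mass   # g/mol for one fragment
    moles_per_ul = grams_per_ul / fragment_molar_mass
    return moles_per_ul * AVOGADRO                   # mol -> molecules


if __name__ == "__main__":
    # Quantification reading and mean fragment size reported in the text.
    print(f"{molecules_per_ul(207, 443):.2e}")
```

With these inputs the function returns about 8.57 × 10^8 molecules/µL. A double-stranded library would instead need a per-base-pair mass, roughly twice the single-stranded value.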
The assembly led to 5 scaffolds and 41 large contigs (>1,500 bp) and generated a genome size of 2,232,265 bp, which corresponds to a coverage of 30.84× genome equivalents.
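The assembly parameters and the reported depth can be illustrated with two small helpers. The overlap check below is a simplified, ungapped stand-in for the 90% identity / 40 bp criterion, not Newbler's actual overlap-detection algorithm, and the helper names are hypothetical. The coverage helper is the usual ratio of sequenced bases to assembled genome size.

```python
def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length strings."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def best_overlap(left: str, right: str, min_overlap: int = 40,
                 min_identity: float = 0.90) -> int:
    """Length of the longest suffix of `left` that matches a prefix of `right`.

    SIMPLIFICATION: ungapped comparison only. Returns 0 when no overlap of
    at least `min_overlap` bases reaches `min_identity`.
    """
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first and stop at the first that passes.
    for length in range(max_len, min_overlap - 1, -1):
        if ungapped_identity(left[-length:], right[:length]) >= min_identity:
            return length
    return 0


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    # 69.4 Mb of passed-filter sequence over the 2,232,265 bp assembly.
    print(round(fold_coverage(69.4e6, 2_232_265), 2))
```

Dividing the 69.4 Mb of passed-filter sequence by the 2,232,265 bp assembly gives about 31.09×, slightly above the reported 30.84×; the text does not state exactly which bases entered its calculation, so the small difference is left unexplained here.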

Genome annotation
Open reading frame (ORF) prediction was performed using Prodigal [48] with default parameters. Predicted ORFs were excluded if they spanned a sequencing gap region. Functional assessment of protein sequences was carried out by comparing them with sequences in the GenBank [49] and Clusters of Orthologous Groups (COG) databases using BLASTP. tRNAs, rRNAs, signal peptides and transmembrane helices were identified using tRNAscan-SE 1.21 [50], RNAmmer [51], SignalP [52] and TMHMM [53], respectively. ORFans were defined as ORFs with no BLASTP hit below an E-value of 1e-3 for alignment lengths greater than 80 amino acids, or below an E-value of 1e-5 for alignment lengths smaller than 80 amino acids [54]. PHAST was used to identify, annotate and graphically display prophage sequences within bacterial genomes or plasmids [55]. Artemis [56] was used for data management, and DNA Plotter [57] was used to visualize genomic features. In-house Perl and bash scripts were used to automate these routine tasks. To estimate the mean level of nucleotide sequence similarity at the genome level between C. ihumii and 42 other members of the genus Corynebacterium, we used the in-house Average Genomic Identity of Orthologous gene Sequences (AGIOS) pipeline. Briefly, this pipeline uses the Proteinortho software [58] (with the following parameters: E-value 1e-5, 30% identity, 50% coverage and 50% algebraic connectivity) to detect orthologous proteins between each pair of genomes, then retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity among orthologous ORFs using the Needleman-Wunsch global alignment algorithm.
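Two of the filtering rules above lend themselves to a short sketch: discarding predicted ORFs that span a sequencing gap, and applying the length-dependent BLASTP E-value cut-offs used for ORFans. The sketch assumes that gaps are encoded as runs of N in the scaffold sequence, that ORF coordinates are 0-based half-open intervals, and that an ORF counts as an ORFan when none of its BLASTP hits passes the cut-off. All function names are hypothetical; this is not the authors' in-house script.

```python
import re
from typing import List, Tuple

# 0-based, half-open [start, end) coordinates on a scaffold.
Interval = Tuple[int, int]


def gap_intervals(scaffold: str) -> List[Interval]:
    """ASSUMPTION: sequencing gaps appear as runs of 'N' in the scaffold."""
    return [(m.start(), m.end()) for m in re.finditer(r"N+", scaffold.upper())]


def drop_gap_spanning(orfs: List[Interval], gaps: List[Interval]) -> List[Interval]:
    """Keep only ORFs that do not overlap any gap interval."""
    return [(start, end) for start, end in orfs
            if not any(start < g_end and g_start < end for g_start, g_end in gaps)]


def is_orfan(hits: List[Tuple[float, int]]) -> bool:
    """True if no BLASTP hit passes the length-dependent E-value cut-off.

    `hits` holds (e_value, alignment_length_aa) pairs. Alignments longer than
    80 aa use 1e-3 and shorter ones use 1e-5; treating exactly 80 aa like the
    short case is an ASSUMPTION, since the text leaves it unspecified.
    """
    for e_value, aln_len in hits:
        cutoff = 1e-3 if aln_len > 80 else 1e-5
        if e_value < cutoff:
            return False  # a significant homolog exists
    return True


if __name__ == "__main__":
    scaffold = "ATGAAA" + "NNNN" + "ATGCCC"
    print(drop_gap_spanning([(0, 6), (4, 8), (10, 16)], gap_intervals(scaffold)))
    print(is_orfan([(1e-4, 50)]), is_orfan([(1e-4, 120)]))
```

The stricter threshold for short alignments reflects that short matches reach a given E-value more easily by chance.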

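The core of the AGIOS computation, a global alignment of each orthologous gene pair followed by averaging of the percent identities, can be sketched as follows. The scoring scheme (match +1, mismatch -1, linear gap -2) and the identity definition (identical columns divided by alignment length, gaps included) are assumptions, since the text does not give them. The ortholog pairs are assumed to have been produced upstream (by Proteinortho in the authors' pipeline), which this sketch does not reproduce; function names are hypothetical.

```python
from statistics import mean
from typing import Iterable, List, Tuple

# HYPOTHETICAL scoring scheme; the text does not state AGIOS's parameters.
MATCH, MISMATCH, GAP = 1, -1, -2


def _pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: str, b: str) -> Tuple[str, str]:
    """Global (Needleman-Wunsch) alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned columns divided by alignment length (gaps included)."""
    aln_a, aln_b = needleman_wunsch(a, b)
    if not aln_a:
        raise ValueError("cannot compute identity of two empty sequences")
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


def mean_ortholog_identity(pairs: Iterable[Tuple[str, str]]) -> float:
    """AGIOS-like value: mean percent identity over orthologous gene pairs."""
    return mean(percent_identity(a, b) for a, b in pairs)
```

Because each pair is aligned end to end, a global method such as Needleman-Wunsch penalizes length differences between orthologs, which a local aligner would ignore.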
Genome properties
The genome of C. ihumii sp. nov. strain GD7T is 2,232,265 bp long (1 chromosome in 5 scaffolds, no plasmid) with a GC content of 65.1% (Table 4, Figure 6). Of the 2,182 predicted genes, 2,125 were protein-coding genes and 57 were RNA genes (53 tRNA and 4 rRNA genes). A total of 1,562 genes (71.58%) were assigned a putative function. Four hundred and twenty-two genes (19.8%) were annotated as hypothetical proteins, and 126 genes (5.9%) were identified as ORFans. The properties and statistics of the genome are summarized in Tables 4 and 5, and the distribution of genes into COG functional categories is presented in Table 5. A PHAST search revealed that C. ihumii harbors an incomplete prophage. In Tables 4 and 5, percentages are based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
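The gene counts and percentages in this paragraph can be cross-checked with a few lines of arithmetic. The check suggests that the percentages use different denominators: 71.58% is consistent with 1,562 of all 2,182 predicted genes, whereas 19.8% and 5.9% are consistent with 422 and 126 of the 2,125 protein-coding genes (19.86% and 5.93%), with the reported figures appearing to be truncated rather than rounded. The GC helper, which ignores N and other ambiguity symbols (an assumption about how gaps are handled), only shows how a GC content such as 65.1% is typically computed; the genome sequence itself is not reproduced here.

```python
# Gene counts reported for C. ihumii strain GD7T.
TOTAL_GENES = 2_182
PROTEIN_CODING = 2_125


def pct(part: int, whole: int) -> float:
    """Percentage of `part` relative to `whole`."""
    return 100.0 * part / whole


def gc_percent(seq: str) -> float:
    """GC content in percent, counting only unambiguous A, C, G and T bases."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


if __name__ == "__main__":
    print(TOTAL_GENES - PROTEIN_CODING)        # RNA genes (tRNA + rRNA)
    print(round(pct(1_562, TOTAL_GENES), 2))   # putative function, all genes
    print(round(pct(422, PROTEIN_CODING), 2))  # hypothetical proteins
    print(round(pct(126, PROTEIN_CODING), 2))  # ORFans
```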

Comparative genomics
Presently, more than 75 genome sequences (finished or draft) of Corynebacterium species are available in GenBank. Here, we compared C. ihumii sp. nov. strain GD7T with 41 finished or draft genome sequences from 25 Corynebacterium species. Table 6 compares the genome size, GC content, coding density and number of proteins of the compared Corynebacterium genomes. C. ihumii had a smaller genome than all other compared genomes except that of C. urealyticum strain DSM 7111. AGIOS values ranged from 65.23 to 80.59% between different Corynebacterium species, and from 97.97 to 99.99% between strains of the same Corynebacterium species (Supplementary Table). C. ihumii exhibited AGIOS values with other species ranging from 67.15% with C. pseudotuberculosis to 76.30% with C. lipophiloflavum. These values fall within the range observed between distinct species and well below the range observed within species, thus supporting its status as a new species. Figure 7 shows the comparison of gene distribution into COG categories of C. ihumii with C.

Standards in Genomic Sciences

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Corynebacterium ihumii sp. nov., which contains strain GD7T (= CSUR P902 = DSM 45751). This bacterium was isolated from the fecal flora of a 62-year-old male admitted to an intensive care unit for respiratory distress.