Non contiguous-finished genome sequence and description of Enorma timonensis sp. nov.

Enorma timonensis strain GD5T sp. nov. is the type strain of E. timonensis sp. nov., a new member of the genus Enorma within the family Coriobacteriaceae. This strain, whose genome is described here, was isolated from the fecal flora of a 53-year-old woman hospitalized for 3 months in an intensive care unit. E. timonensis is an obligately anaerobic rod. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,365,123 bp long genome (1 chromosome but no plasmid) contains 2,060 protein-coding and 52 RNA genes, including 4 rRNA genes.


Figure 1.
Phylogenetic tree highlighting the position of Enorma timonensis strain GD5T relative to other type strains within the Coriobacteriaceae family. GenBank accession numbers are indicated in parentheses. Sequences were aligned using CLUSTALW, and phylogenetic inferences were obtained using the maximum-likelihood method within the MEGA software. Numbers at the nodes are percentages of 500 bootstrap replicates supporting that node. The tree is a majority consensus tree. Bifidobacterium bifidum was used as outgroup. The scale bar represents a 2% nucleotide sequence divergence.

[Figure 1 tree labels: Enorma timonensis (JX424767), Enorma massiliensis (JN837493), Collinsella aerofaciens (AB011816)]

[...] and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 4,706 bacteria, which were used as reference data, in the BioTyper database. For strain GD5T, no significant score was obtained, suggesting that our isolate was not a member of a known species. We added the spectrum from strain GD5T to our database (Figure 4, Figure 5).

Genome sequencing information
Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rDNA similarity to E. massiliensis and other members of the family Coriobacteriaceae. It is part of a study of the human digestive flora aiming at isolating all bacterial species within human feces [1][2][3]. It was the second genome of an Enorma species and the first genome of E. timonensis sp. nov. The genome is deposited in GenBank under accession number CAPF00000000 and consists of 105 contigs. Table 3 shows the project information and its association with MIGS version 2.0 compliance [48].

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [56] with default parameters. However, predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank [57] and Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAs and rRNAs were predicted using the tRNAScan-SE [58] and RNAmmer [59] tools, respectively. Lipoprotein signal peptides and numbers of transmembrane helices were predicted using SignalP [60] and TMHMM [61], respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have already been used in previous works to define ORFans. Artemis [62] and DNA Plotter [63] were used for data management and visualization of genomic features, respectively. The Mauve alignment tool (version 2.3.1) was used for multiple genomic sequence alignment [64].
To estimate the mean level of nucleotide sequence similarity at the genome level between E. timonensis and four other members of the family Coriobacteriaceae (Table 6), we used the Average Genomic Identity Of gene Sequences (AGIOS) home-made software. Briefly, this software uses Proteinortho [65] to detect orthologous proteins between genomes compared two by two, then retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity among orthologous ORFs using the Needleman-Wunsch global alignment algorithm. Enorma timonensis strain GD5T was compared to E. massiliensis strain phIT (GenBank accession number CAGZ00000000), C. aerofaciens strain ATCC 25986 (AAVN00000000), C. tanakaei strain YIT 12063 (ADLS00000000) and C. glomerans strain PW2 (NC_015389).
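The final AGIOS step described above (global alignment of orthologous gene pairs, then averaging the percentage of identity) can be illustrated with a minimal sketch. This is not the authors' home-made software: the function names, the scoring values (match = 1, mismatch = -1, gap = -2) and the choice to compute identity over all alignment columns are assumptions made for illustration, and the orthologous pairs are assumed to have been identified beforehand (for example with Proteinortho).

```python
def needleman_wunsch_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b (Needleman-Wunsch) and return the
    percentage of identical positions over all alignment columns.
    Scoring values are illustrative assumptions, not those of AGIOS."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, counting columns and matches.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                if a[i - 1] == b[j - 1]:
                    matches += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return 100.0 * matches / columns if columns else 0.0


def mean_identity(ortholog_pairs):
    """Average the pairwise identity over (gene_a, gene_b) nucleotide pairs."""
    values = [needleman_wunsch_identity(x, y) for x, y in ortholog_pairs]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy sequences only; real input would be orthologous ORFs from two genomes.
    toy_pairs = [("ACGT", "ACGT"), ("ACGT", "ACGA")]
    print(mean_identity(toy_pairs))
```

For real genes, a quadratic-memory pure-Python alignment is slow; an established alignment library would be the practical choice, but the averaging logic stays the same.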

Genome properties
The genome is 2,365,123 bp long (1 chromosome, no plasmid) with a 65.8% G+C content (Figure 6 and Table 4). Of the 2,060 predicted chromosomal genes, 2,006 were protein-coding genes and 52 were RNAs, including a complete rRNA operon, an additional 5S rRNA and 48 tRNAs. A total of 1,384 genes (67.18%) were assigned a putative function. Fifty-five genes were identified as ORFans (2.74%) and the remaining genes were annotated as hypothetical proteins. The properties and statistics of the genome are summarized in Tables 3 and 4. The distribution of genes into COGs functional categories is presented in Table 5.
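The percentages and RNA counts reported above can be checked with a few lines of arithmetic. In this sketch the denominators are assumptions: the text does not state which total each percentage is based on, so the values below (2,060 for genes with a putative function, 2,006 for ORFans) were chosen because they reproduce the reported figures. The rRNA operon is also assumed to contain three genes (5S, 16S and 23S).

```python
def percent(part, total):
    """Return part/total as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)


# RNA genes: one complete rRNA operon (assumed to hold 3 genes),
# one additional 5S rRNA, and 48 tRNAs.
rrna_genes = 3 + 1
trna_genes = 48
rna_genes = rrna_genes + trna_genes

# Denominators are assumptions inferred from the reported percentages.
putative_function_pct = percent(1384, 2060)
orfan_pct = percent(55, 2006)

if __name__ == "__main__":
    print("rRNA genes:", rrna_genes)
    print("RNA genes:", rna_genes)
    print("putative function (%):", putative_function_pct)
    print("ORFans (%):", orfan_pct)
```

Under these assumptions the script gives 4 rRNA genes, 52 RNA genes, 67.18% and 2.74%, matching the values in the text.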

Genome comparison of E. timonensis with other members of the Coriobacteriaceae family
We compared the genome of E. timonensis strain GD5T with those of E. massiliensis strain phIT, Collinsella aerofaciens strain ATCC 25986, Collinsella tanakaei strain YIT 12063 and Coriobacterium glomerans strain PW2 (Table 6).
The draft genome sequence of E. timonensis strain GD5T is smaller than those of C. aerofaciens and C. tanakaei (2.36, 2.43 and 2.48 Mb, respectively), but larger than those of E. massiliensis and C. glomerans (2.26 and 2.11 Mb, respectively). The G+C content of E. timonensis is higher than those of E. massiliensis, C. aerofaciens, C. tanakaei and C. glomerans (65.80, 62.0, 60.54, 60.23 and 60.40%, respectively). The gene content of E. timonensis is smaller than those of C. aerofaciens and C. tanakaei (2,006, 2,159 and 2,195, respectively) but larger than those of E. massiliensis and C. glomerans (1,901 and 1,768, respectively). The distribution of genes into COG categories was not entirely similar in all compared genomes (Figure 7).
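The size and G+C comparisons in this paragraph can be tabulated in a short sketch. The genome sizes and G+C percentages are transcribed from the text; the dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Genome size (Mb) and G+C content (%) as reported in the text.
GENOMES = {
    "E. timonensis":   {"size_mb": 2.36, "gc_percent": 65.80},
    "E. massiliensis": {"size_mb": 2.26, "gc_percent": 62.00},
    "C. aerofaciens":  {"size_mb": 2.43, "gc_percent": 60.54},
    "C. tanakaei":     {"size_mb": 2.48, "gc_percent": 60.23},
    "C. glomerans":    {"size_mb": 2.11, "gc_percent": 60.40},
}
REFERENCE = "E. timonensis"


def rank(metric):
    """Return species names sorted from highest to lowest `metric`."""
    return sorted(GENOMES, key=lambda name: GENOMES[name][metric], reverse=True)


def split_around_reference(metric):
    """Return (species above the reference, species below it) for `metric`,
    each list sorted alphabetically."""
    ref = GENOMES[REFERENCE][metric]
    above = sorted(n for n, v in GENOMES.items() if v[metric] > ref)
    below = sorted(n for n, v in GENOMES.items() if v[metric] < ref)
    return above, below


if __name__ == "__main__":
    for metric in ("size_mb", "gc_percent"):
        above, below = split_around_reference(metric)
        print(metric, "| above:", above, "| below:", below)
```

With these values, E. timonensis sits in the middle of the size range and has the highest G+C content of the five genomes.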
In addition, E. timonensis shared 1,109, 1,026, 880 and 1,077 orthologous genes with E. massiliensis, C. aerofaciens, C. glomerans and C. tanakaei, respectively. The average genomic nucleotide sequence identity ranged from 66.37 to 79.44% among Coriobacteriaceae family members, and from 66.01 to 79.44% between E. timonensis and other species (Table 6 and Table 7).

On the basis of phenotypic, phylogenetic and genomic analyses (taxono-genomics), we formally propose the creation of Enorma timonensis sp. nov. that contains strain GD5T. This bacterium has been found in France.

Description of Enorma timonensis sp. nov.
Enorma timonensis (ti.mo.nen'sis. L. gen. fem. timonensis, of Timone, the name of the hospital where strain GD5T was cultivated). Colonies are translucent grey and 0.4 mm in diameter on blood-enriched Columbia agar. Cells are rod-shaped with a mean diameter of 0.58 µm and a mean length of 1.32 µm. Optimal growth is achieved in anaerobic conditions. No growth is observed in aerobic or microaerophilic conditions. Growth occurs between 37 and 45°C, with optimal growth being observed at 37°C on blood-enriched Columbia agar. Cells are Gram-positive, non-endospore-forming, and non-motile. Cells are negative for catalase and oxidase. Using an API ZYM strip, positive reactions are observed for leucine arylamidase, valine arylamidase, cystine arylamidase, naphthol-AS-BI-phosphohydrolase, β-galactosidase, β-glucuronidase, α-glucosidase and β-glucosidase. Negative reactions are observed for acid phosphatase, nitrate reduction, urease, alkaline phosphatase, esterase (C4), esterase lipase (C8), lipase (C14), trypsin, α-chymotrypsin, α-galactosidase, N-acetyl-β-glucosaminidase, α-mannosidase and α-fucosidase. Using an API Rapid ID 32A strip, positive reactions are observed for proline arylamidase, phenylalanine arylamidase, histidine arylamidase and serine arylamidase. Negative reactions are observed for urease, arginine dihydrolase, tyrosine arylamidase, leucyl-glycyl arylamidase, alanine arylamidase, glycine arylamidase and arginine arylamidase. Using an API 50 CH strip, fermentation or assimilation was not observed. Cells are susceptible to amoxicillin-clavulanic acid, metronidazole, imipenem, vancomycin, rifampicin and gentamicin, and resistant to penicillin G, amoxicillin, ceftriaxone, erythromycin and trimethoprim/sulfamethoxazole. The 16S rDNA and genome sequences are deposited in GenBank under accession numbers JX424767 and CAPF00000000, respectively. The G+C content of the genome is 65.8%. The habitat of the organism is the human digestive tract.
The type strain GD5T (= CSUR P900 = DSM 26111) was isolated from the fecal flora of a 53-year-old French patient hospitalized in an intensive care unit. This strain has been found in Marseille, France.