Non-contiguous finished genome sequence and description of Dielma fastidiosa gen. nov., sp. nov., a new member of the family Erysipelotrichaceae

Dielma fastidiosa strain JC13T is the type strain of D. fastidiosa gen. nov., sp. nov., the type species of a new genus within the family Erysipelotrichaceae. This strain, whose draft genome is described here, was isolated from the fecal flora of a healthy 16-year-old male Senegalese volunteer. D. fastidiosa is a Gram-negative, anaerobic rod. Here we describe the features of this organism, together with its draft genome sequence and annotation. The 3,574,031-bp genome comprises a 3,556,241-bp chromosome and a 17,790-bp plasmid. The chromosome contains 3,441 protein-coding genes and 50 RNA genes, including 3 rRNA genes, whereas the plasmid contains 17 protein-coding genes.


Introduction
Dielma fastidiosa strain JC13T (= CSUR P149 = DSM 26099) is the type strain of D. fastidiosa gen. nov., sp. nov., the type species of Dielma gen. nov. This bacterium is a Gram-negative, anaerobic, catalase-negative and indole-negative bacillus that was isolated from the stool of a healthy Senegalese patient as part of a study aiming to culture individually all species present in human feces [1,2]. The conventional genotypic methods used in bacterial taxonomy include 16S rRNA gene-based phylogeny and nucleotide similarity [3,4], determination of the G+C content, and DNA-DNA hybridization (DDH) [5,6]. Although DDH and 16S rRNA gene similarity cut-offs are considered gold standards in bacterial taxonomy, they have limitations because they do not apply equally well to all species or genera [3]. Hence, alternative methods are needed. High-throughput genome sequencing and proteomic analyses [7] now provide comprehensive information about bacterial isolates, and such data may be included among the criteria used for taxonomic identification. We recently proposed a polyphasic approach to describing new bacterial taxa that is based on their genome sequence, MALDI-TOF spectrum and main phenotypic characteristics [8-26].

Classification and features
A stool sample was collected from a healthy 16-year-old male Senegalese volunteer living in Dielmo (a rural village in the Guinean-Sudanian zone of Senegal), who was included in a research protocol. Written assent was obtained from this individual. No written consent from his guardians was needed because he was older than 15 years, in accordance with the previous project approved by the Ministry of Health of Senegal and the assembled village population, as published elsewhere [41]. Both this study and the assent procedure were approved by the National Ethics Committee of Senegal (CNERS) and the Eth- [9-20,23]. The fecal specimen was stored at -80°C after collection. Strain JC13T (Table 1) was isolated in January 2011 by cultivation on Brain Heart Infusion (BHI) agar (Becton Dickinson, Pont de Claix, France) after a 10-day preincubation in an anaerobic blood culture bottle. The 16S rRNA gene sequence (GenBank accession number JF824807) of D. fastidiosa strain JC13T was compared with sequences in GenBank using BLAST [50]; the highest similarity, 89.71%, was with Clostridium innocuum (Figure 1). Compared with the type species of genera within the family Erysipelotrichaceae, D. fastidiosa exhibited 16S rRNA sequence similarities ranging from 69.90% to 89.71%. Because these values are below the 95% threshold recommended by Stackebrandt and Ebers for delineating new genera without performing DDH [3], we propose to classify strain JC13T within a novel genus. Strain JC13T did not exhibit catalase or oxidase activity. Using API Rapid ID 32A, positive reactions were obtained for α-fucosidase and pyroglutamic acid arylamidase.
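The genus-level argument above reduces to a threshold comparison: if even the best 16S rRNA similarity to a reference falls below the cut-off, a new genus is suggested. The Python sketch below assumes the 95% value cited in the text [3]; the second dictionary key is a hypothetical placeholder for the lowest-scoring type species, which this section does not name, while the two percentages are those reported above.

```python
# Sketch: does the best 16S rRNA similarity fall below a genus-level cut-off?
GENUS_THRESHOLD = 95.0  # percent identity, the value cited in the text [3]


def best_hit(similarities: dict[str, float]) -> tuple[str, float]:
    """Return the (reference, percent similarity) pair with the highest value."""
    name = max(similarities, key=similarities.get)
    return name, similarities[name]


def suggests_novel_genus(similarities: dict[str, float],
                         threshold: float = GENUS_THRESHOLD) -> bool:
    """True when no reference reaches the genus-level threshold."""
    _, top = best_hit(similarities)
    return top < threshold


# Percentages from the text; the second key is a hypothetical placeholder.
example = {
    "Clostridium innocuum": 89.71,
    "reference_with_lowest_similarity": 69.90,
}
print(best_hit(example))              # ('Clostridium innocuum', 89.71)
print(suggests_novel_genus(example))  # True
```

A single maximum is used because the criterion concerns the closest relative: one reference at or above the threshold would be enough to argue against a new genus.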
Negative reactions were observed for indole production, nitrate reduction, urease, arginine dihydrolase, α-galactosidase, β-galactosidase-6-phosphate, α-glucosidase, β-glucosidase, α-arabinosidase, β-glucuronidase, N-acetyl-β-glucosaminidase, mannose and raffinose fermentation, glutamic acid decarboxylase, alkaline phosphatase, arginine arylamidase, proline arylamidase, leucyl glycine arylamidase, phenylalanine arylamidase, leucine arylamidase, pyroglutamic acid arylamidase, tyrosine arylamidase, alanine arylamidase, glycine arylamidase, histidine arylamidase, glutamyl glutamic acid arylamidase and serine arylamidase. Using an API 20NE strip, a positive reaction was observed for esculin hydrolysis. No sugar fermentation was observed using API 50CH (bioMérieux). D. fastidiosa is susceptible to amoxicillin, imipenem, metronidazole and ciprofloxacin, but resistant to trimethoprim/sulfamethoxazole, rifampin, doxycycline and gentamicin. Differential phenotypic characteristics relative to other species are summarized in Table 2. Growth was tested at 25, 30, 37 and 45°C; growth occurred from 25°C to 45°C, with optimal growth at 30°C. Colonies were 0.5 to 1 mm in diameter on blood-enriched Columbia agar and BHI agar. Growth was tested under anaerobic and microaerophilic conditions using the GENbag anaer and GENbag microaer systems, respectively (bioMérieux), and in air with or without 5% CO2. Growth occurred only under anaerobic conditions. Gram staining showed a rod-shaped, Gram-negative bacterium (Figure 2). The motility test was positive. Under electron microscopy, cells grown on agar had a mean diameter of 0.60 µm and a mean length of 2.2 µm (Figure 3). Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described using a Microflex spectrometer (Bruker Daltonics, Leipzig, Germany) [7,51].
Briefly, a pipette tip was used to pick one isolated bacterial colony from a cultured agar plate and spread it as a thin film on an MTP 384 MALDI-TOF target plate (Bruker Daltonics). Twelve distinct deposits were made for strain JC13T from twelve isolated colonies. Each smear was overlaid with 2 µL of matrix solution (a saturated solution of α-cyano-4-hydroxycinnamic acid in 50% acetonitrile and 2.5% trifluoroacetic acid) and allowed to dry for five minutes. Measurements were performed with a Microflex spectrometer (Bruker). Spectra were recorded in the positive linear mode for the mass range of 2,000 to 20,000 Da (parameter settings: ion source 1 (IS1), 20 kV; IS2, 18.5 kV; lens, 7 kV). Each spectrum was obtained after 675 shots at a variable laser power. The acquisition time was between 30 seconds and 1 minute per spot. The twelve JC13T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the reference spectra of 4,334 bacteria in the BioTyper database (as updated on August 29, 2012), including spectra from 17 species within the Erysipelotrichaceae. Identification was based on m/z values from 3,000 to 15,000 Da. For every spectrum, a maximum of 100 peaks were taken into account and compared with spectra in the database. The resulting score determined whether, and at which level, the tested isolate could be identified: a score > 2 with a validly published species enabled identification at the species level, a score > 1.7 but < 2 enabled identification only at the genus level, and a score < 1.7 did not enable any identification. For strain JC13T, no significant score was obtained, suggesting that our isolate was not a member of any known species or genus in the BioTyper database. We added the spectrum from strain JC13T to our database (Figure 4).
Standards in Genomic Sciences
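The score rule above can be written as a small mapping. This Python sketch is illustrative and is not the BioTyper implementation. It assumes that boundary values (exactly 2.0 or 1.7) fall in the higher category, which the text does not state; it ignores the requirement that species-level matches be to validly published species; and it assumes that the best of the replicate spot scores is the one interpreted. The spot scores in the usage line are hypothetical.

```python
# Sketch of the score interpretation described in the text (not Bruker code).
SPECIES_CUTOFF = 2.0
GENUS_CUTOFF = 1.7


def identification_level(score: float) -> str:
    """Map a single BioTyper-style score to an identification level."""
    if score >= SPECIES_CUTOFF:  # assumption: 2.0 itself counts as species level
        return "species"
    if score >= GENUS_CUTOFF:    # assumption: 1.7 itself counts as genus level
        return "genus"
    return "no identification"


def best_level(spot_scores: list[float]) -> str:
    """Interpret the highest score among replicate deposits (assumed rule)."""
    return identification_level(max(spot_scores))


# Hypothetical spot scores, all below 1.7 as reported for strain JC13T.
print(best_level([1.2, 1.4, 1.3]))  # no identification
```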
The gel view highlights the spectral differences between strain JC13T and other members of the family Erysipelotrichaceae (Figure 5).
Evidence codes are from the Gene Ontology project [49]. If the evidence is IDA, the property was directly observed for a live isolate by one of the authors or by an expert mentioned in the acknowledgements; another code denotes a property that was not directly observed for the living, isolated sample but is based on a generally accepted property for the species, or on anecdotal evidence.

Genome sequencing information
Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to members of the family Erysipelotrichaceae, as part of a study of the human digestive flora aiming to isolate all bacterial species within human feces [1]. It was the seventh genome sequenced from the family Erysipelotrichaceae and the first genome of Dielma fastidiosa gen. nov., sp. nov. The genome has been deposited in GenBank under accession number CAEN00000000 and consists of 82 contigs. Table 3 shows the project information and its compliance with MIGS version 2.0 [52].

Growth conditions and DNA isolation
Dielma fastidiosa gen. nov., sp. nov. strain JC13T (= CSUR P149 = DSM 26099) was grown on 5% sheep blood-enriched Columbia agar at 30°C in an anaerobic atmosphere. Seven Petri dishes were inoculated, and the resulting bacterial growth was resuspended in 3 × 100 µL of G2 buffer (EZ1 DNA Tissue kit, Qiagen). A first mechanical lysis was performed with glass powder on a FastPrep-24 device (MP Biomedicals, USA) using two 20-second cycles. DNA was then treated with 2.5 µg/µL lysozyme for 30 minutes at 37°C and extracted on a BioRobot EZ1 Advanced XL (Qiagen). The DNA was then concentrated and purified with a QIAamp kit (Qiagen). DNA concentration, measured with a Quant-iT PicoGreen kit (Invitrogen) on a Genios Tecan fluorometer, was 46.6 ng/µL.

Genome sequencing and assembly
DNA (5 µg) was mechanically fragmented with a HydroShear device (Digilab, Holliston, MA, USA) targeting a fragment size of 3-4 kb. Fragmentation was checked on an Agilent 2100 BioAnalyzer using a DNA LabChip 7500, with an optimal fragment size of 3.4 kb. The library was constructed according to the 454 GS FLX Titanium paired-end protocol (Roche); circularization and nebulization were performed. After 15 cycles of PCR amplification followed by double size selection, the single-stranded paired-end library profile was visualized on an Agilent 2100 RNA Pico 6000 LabChip, with an optimal length of 522 bp. The library was then quantified with the Quant-iT RiboGreen kit (Invitrogen) on a Genios Tecan fluorometer at 133 pg/µL, corresponding to a calculated library concentration of 4.67 × 10^8 molecules/µL. The library was stored at -20°C until further use. The shotgun library was clonally amplified with 0.5 cpb (copies per bead) in 4 emPCR reactions and with 1 cpb in 4 emPCR reactions using the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The emPCR yields were 5.10% and 10.73%, respectively, within the 5 to 20% range recommended by the Roche procedure. Approximately 790,000 beads were loaded twice onto a GS Titanium PicoTiterPlate (PTP Kit 70×75) and sequenced with the GS Titanium Sequencing Kit XLR70 (Roche). The runs were performed overnight and analyzed on the cluster using gsRunBrowser and the Newbler assembler (Roche). A total of 428,372 passed-filter wells were obtained, generating 101.3 Mb of sequence with an average read length of 219 bp. The passed-filter sequences were assembled with Newbler (Roche) using a 90% identity threshold and a 40-bp minimum overlap. The final assembly comprised 22 scaffolds and 82 large contigs (>1,500 bp), giving a genome size of 3.57 Mb, which corresponds to 28.92× genome coverage.
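The conversion from library mass concentration (133 pg/µL) to molecules per microliter is a standard mass-to-moles calculation. The Python sketch below assumes an average mass of 328.3 g/mol per single-stranded nucleotide, a commonly used approximation that the text does not state; with the reported 522-nt mean length, it gives a value consistent with the reported 4.67 × 10^8 molecules/µL.

```python
# Sketch: convert a single-stranded DNA library concentration from pg/uL to
# molecules/uL. NT_MASS is an assumed average nucleotide mass, not from the text.
AVOGADRO = 6.022e23  # molecules per mole
NT_MASS = 328.3      # g/mol per single-stranded nucleotide (assumption)


def ssdna_molecules_per_ul(conc_pg_per_ul: float, mean_length_nt: float) -> float:
    """Number of library molecules per microliter."""
    grams_per_ul = conc_pg_per_ul * 1e-12   # pg -> g
    molar_mass = mean_length_nt * NT_MASS   # g/mol for one fragment
    moles_per_ul = grams_per_ul / molar_mass
    return moles_per_ul * AVOGADRO


# Values reported in the text: 133 pg/uL, mean length 522 nt.
print(f"{ssdna_molecules_per_ul(133, 522):.2e}")  # 4.67e+08
```

The same function can be used to estimate how much library to dilute before emPCR, since the copies-per-bead ratios (0.5 and 1 cpb) are set from this molecule count.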

Genome annotation
Open reading frames (ORFs) were predicted using Prodigal [53] with default parameters; predicted ORFs spanning a sequencing gap region were excluded. The predicted protein sequences were searched against the GenBank [54] and Clusters of Orthologous Groups (COG) databases using BLASTP. tRNAs and rRNAs were predicted using the tRNAscan-SE [55] and RNAmmer [56] tools, respectively. Lipoprotein signal peptides and numbers of transmembrane helices were predicted using SignalP [57] and TMHMM [58], respectively. ORFans were defined using BLASTP E-value cut-offs of 1e-03 for alignments longer than 80 amino acids and 1e-05 for shorter alignments; such thresholds have been used in previous work to define ORFans. Artemis [59] and DNA Plotter [60] were used for data management and visualization of genomic features, respectively. The Mauve alignment tool (version 2.3.1) was used for multiple genomic sequence alignment [61]. To estimate the mean level of nucleotide sequence similarity at the genome level between D. fastidiosa JC13T and six other genomes from members of the family Erysipelotrichaceae (Table 4), orthologous proteins were detected using Proteinortho [62]; genomes were then compared pairwise, and the mean percentage of nucleotide sequence identity among orthologous ORFs was determined using BLASTn.
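The length-dependent ORFan cut-offs and the pairwise comparison described above can be sketched in Python as follows. Two points are assumptions rather than statements from the text: that a BLASTP hit below the applicable cut-off counts as significant homology, so that an ORFan is an ORF with no such hit, and that BLASTn identities for orthologous ORFs have already been computed for each genome pair. The `Hit` class and the `identities` callback are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import mean
from typing import Callable


@dataclass
class Hit:
    evalue: float  # BLASTP E-value
    aln_len: int   # alignment length in amino acids


def is_significant(hit: Hit) -> bool:
    """Apply the length-dependent E-value cut-offs given in the text."""
    cutoff = 1e-3 if hit.aln_len > 80 else 1e-5
    return hit.evalue < cutoff


def is_orfan(hits: list[Hit]) -> bool:
    """Assumed rule: an ORF is an ORFan when none of its hits is significant."""
    return not any(is_significant(h) for h in hits)


def mean_pairwise_identity(
    genomes: list[str],
    identities: Callable[[str, str], list[float]],
) -> dict[tuple[str, str], float]:
    """Mean % nucleotide identity of orthologous ORFs for every genome pair."""
    return {(a, b): mean(identities(a, b)) for a, b in combinations(genomes, 2)}


# Usage with hypothetical hits.
print(is_orfan([Hit(evalue=1e-4, aln_len=60)]))   # True: 1e-4 is not below 1e-5
print(is_orfan([Hit(evalue=1e-4, aln_len=120)]))  # False: 1e-4 is below 1e-3
```

`combinations` visits each unordered genome pair once, matching a two-by-two comparison in which the identity between A and B is taken to be the same as between B and A.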

Genome properties
The genome is 3,574,031 bp long (one chromosome of 3,556,241 bp and one plasmid of 17,790 bp) and has a G+C content of 40.00% (Figure 6 and Table 5). Of the 3,491 predicted chromosomal genes, 3,441 were protein-coding genes and 50 were RNA genes. A total of 2,534 genes (72.58%) were assigned a putative function. ORFans accounted for 269 genes (7.81%), and the remaining genes were annotated as hypothetical proteins. The properties and statistics of the genome are summarized in Tables 5 and 6, and the distribution of genes into COG functional categories is presented in Table 6. The 17,790-bp plasmid contains 17 protein-coding genes. A BLASTN search identified its closest match as a plasmid of Enterococcus faecium DO (GenBank accession number NC_017961).
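G+C content is the proportion of G and C among sequenced bases. The Python sketch below is a generic way to compute it and to check that the reported replicon sizes and gene counts add up; it assumes that ambiguous symbols such as N (for example at assembly gaps) are excluded from the denominator, which the text does not specify.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * gc / (gc + at)


# Reported replicon sizes and chromosomal gene counts should sum to the totals.
replicons = {"chromosome": 3_556_241, "plasmid": 17_790}
chromosomal_genes = {"protein_coding": 3_441, "rna": 50}
print(sum(replicons.values()))          # 3574031
print(sum(chromosomal_genes.values()))  # 3491
print(gc_content("ATGCGCNNAT"))         # 50.0
```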

Genome comparison of D. fastidiosa with other genomes of the family Erysipelotrichaceae
Here, we compared the genome of D. fastidiosa JC13T with six other genomes from the family Erysipelotrichaceae (Table 3
Table footnote a: the total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.

Description of Dielma fastidiosa gen. nov., sp. nov.
Dielma fastidiosa (fas.ti.di.o'sa. N.L. fem. adj. fastidiosa, from the Latin adjective fastidiosus, excessively sensitive, referring to the difficulty of isolating this microorganism). It was isolated from the feces of an asymptomatic Senegalese patient.
The G+C content of the genome is 40%. The 16S rRNA gene and genome sequences have been deposited in GenBank and EMBL under accession numbers JF824807 and CAEN00000000, respectively. The type strain, JC13T (= CSUR P149 = DSM 26099), was isolated from the fecal flora of a healthy Senegalese patient.