Non contiguous-finished genome sequence and description of Senegalemassilia anaerobia gen. nov., sp. nov.

Senegalemassilia anaerobia strain JC110T sp. nov. is the type strain of Senegalemassilia anaerobia gen. nov., sp. nov., the type species of a new genus within the family Coriobacteriaceae, Senegalemassilia gen. nov. This strain, whose genome is described here, was isolated from the fecal flora of a healthy Senegalese patient. S. anaerobia is a Gram-positive anaerobic coccobacillus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,383,131 bp long genome contains 1,932 protein-coding and 58 RNA genes.


Introduction
Senegalemassilia anaerobia strain JC110 T (= CSUR P147 = DSMZ 25959) is the type strain of S. anaerobia gen. nov., sp. nov. This bacterium was isolated from the feces of a healthy Senegalese patient. It is a Gram-positive, anaerobic, indole-negative coccobacillus. Classically, polyphasic taxonomy is used to classify prokaryotes by combining phenotypic and genotypic characteristics [1]. Culturomics is a new subfield of genomics aimed at studying the microbial repertoire of the gut, and has already led to the isolation of many new bacterial species [2]. In parallel, as more than 3,000 bacterial genomes have been sequenced so far, we proposed to integrate genomic data into the descriptions of new bacterial species [3][4][5][6][7][8][9][10][11][12][13][14][15]. The family Coriobacteriaceae was created in 1997, in the class Actinobacteria, and currently contains 13 genera of anaerobic Gram-positive members of the normal intestinal microbiota of humans and animals [16][17][18][19][20][21][22][23][24][25][26][27][28]. Among them, Gordonibacter and Paraeggerthella have occasionally been isolated from Crohn's disease specimens [26]. Here we present a summary classification and a set of features for S. anaerobia gen. nov., sp. nov. strain JC110 T, together with the description of the complete genomic sequencing and annotation. These characteristics support the circumscription of the genus Senegalemassilia and the species S. anaerobia.

Classification and features
A stool sample was collected from a healthy 16-year-old male Senegalese volunteer patient living in Dielmo (a rural village in the Guinean-Sudanian zone in Senegal), who was included in a research protocol. Written assent was obtained from this individual. No written consent was needed from his guardians for this study because he was older than 15 years (in accordance with the previous project approved by the Ministry of Health of Senegal and the assembled village population, and as published elsewhere [28]). Both this study and the assent procedure were approved by the National Ethics Committee of Senegal (CNERS) and the Ethics Committee of the Institut Fédératif de Recherche IFR48, Faculty of Medicine, Marseille, France (agreement numbers 09-022 and 11-017). Several other new bacterial species were isolated from this specimen using various culture conditions, including the recently described Anaerococcus senegalensis, Alistipes senegalensis, Alistipes timonensis, Peptoniphilus timonensis, Clostridium senegalense, Paenibacillus senegalensis, Bacillus timonensis, Herbaspirillum massiliense, Kurthia massiliensis, Brevibacterium senegalense, Aeromicrobium massiliense and Cellulomonas massiliensis [3][4][5][6][7][8][9][10][11][12][13][14][15].
The fecal specimen was preserved at -80°C after collection and sent to Marseille. Strain JC110 T (Table 1) was isolated in February 2011.

Standards in Genomic Sciences
The stool was preincubated for 5 days in a blood culture bottle, then inoculated onto 5% sheep blood agar and incubated in an anaerobic atmosphere at 37°C. The strain exhibited a 16S rRNA nucleotide sequence similarity with members of the Coriobacteriaceae ranging from 85.3% with Atopobium parvulum to 92.4% with Enterorhabdus mucosicola (Figure 1). The highest value was lower than the 95% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new genus [33]. By comparison with the NR database, strain JC110 T also exhibited nucleotide sequence similarities greater than 99% with uncultured bacterial clones detected in metagenomic studies of the human gut flora. These bacteria most likely belong to the same species as strain JC110 T (Figure 1). Different growth temperatures (25, 30, 37, 45°C) were tested; no growth occurred at 25°C or 45°C, weak growth occurred at 30°C, and optimal growth was observed at 37°C. Colonies were transparent and smooth, 0.5 mm in diameter, on blood-enriched Columbia agar and Brain Heart Infusion (BHI) agar.
Growth of the strain was tested under anaerobic and microaerophilic conditions using GENbag anaer and GENbag microaer systems, respectively (BioMérieux), in the presence of air with 5% CO2, and in aerobic conditions. Growth only occurred under anaerobic conditions. A motility test was positive. Cells grown on agar appear as Gram-positive coccobacilli (Figure 2) and have a diameter ranging from 0.62 to 0.76 µm (mean of 0.70 µm) and a length ranging from 1.36 to 1.73 µm (mean of 1.56 µm) (Figure 3).

Table 1 footnote (beginning not recovered): [...], not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [33]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Strain JC110 T exhibited neither catalase nor oxidase activity. In the API Rapid ID 32A system, positive reactions were obtained for arginine dihydrolase and nitrate reduction. A weak reaction was obtained for alkaline phosphatase. In the API ZYM system, a positive reaction was observed for Naphthol-AS-BI-phosphohydrolase and weak reactions were observed for alkaline phosphatase and acid phosphatase. Negative reactions were observed for esterase, esterase lipase, lipase, leucine arylamidase, valine arylamidase, cystine arylamidase, trypsin, α-chymotrypsin, α-galactosidase, β-galactosidase, β-glucuronidase, α-glucosidase, β-glucosidase, N-acetyl-β-glucosaminidase, α-mannosidase and α-fucosidase. In the API 50CH system, all reactions were negative. S. anaerobia is susceptible to amoxicillin, imipenem, metronidazole and gentamicin but resistant to trimethoprim/sulfamethoxazole. The comparisons with genera of the Coriobacteriaceae family are summarized in Table 2.
Senegalemassilia anaerobia JC110 T shares motility with Gordonibacter pamelaeae, in contrast with Adlercreutzia equolifaciens, Enterorhabdus mucosicola, Eggerthella sinensis and Collinsella aerofaciens. In contrast with Collinsella aerofaciens, Senegalemassilia anaerobia was asaccharolytic. Among these species, JC110 T revealed a positive reaction for nitrate reductase. Lastly, we observed within the members of the Coriobacteriaceae family a large heterogeneity of DNA G+C content, ranging from 60% to 66.5% (Table 2).

Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described using a Microflex spectrometer (Bruker Daltonics, Germany) [34]. Briefly, a pipette tip was used to pick one isolated bacterial colony from an agar plate and to spread it as a thin film on an MTP 384 MALDI-TOF target plate (Bruker Daltonics, Leipzig, Germany). Twelve distinct deposits were made for strain JC110 T from twelve isolated colonies. Each smear was overlaid with 2 µL of matrix solution (saturated solution of α-cyano-4-hydroxycinnamic acid) in 50% acetonitrile and 2.5% trifluoroacetic acid, and allowed to dry for five minutes. Measurements were performed with a Microflex spectrometer (Bruker). Spectra were recorded in the positive linear mode for the mass range of 2,000 to 20,000 Da (parameter settings: ion source 1 (IS1), 20 kV; IS2, 18.5 kV; lens, 7 kV). A spectrum was obtained after 675 shots at a variable laser power. The time of acquisition was between 30 seconds and 1 minute per spot. The twelve JC110 T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 3,769 bacteria, which were used as reference data, in the BioTyper database. The method of identification included the m/z range from 3,000 to 15,000 Da.
For every spectrum, a maximum of 100 peaks were taken into account and compared with spectra in the database. A score enabled the identification of the tested species: a score > 2 with a validly published species enabled identification at the species level; a score > 1.7 but < 2 enabled identification at the genus level; and a score < 1.7 did not enable any identification. For strain JC110 T, no significant score was obtained, suggesting that JC110 T was not a member of a known species or genus. We incremented our database with the spectrum from strain JC110 T (Figure 4). The gel view allowed us to highlight the spectral differences with other members of the Coriobacteriaceae family (Figure 5).
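The score interpretation described above can be sketched as a small function. This is an illustrative sketch only, not the BioTyper software's own logic: the function name `identification_level` is hypothetical, and because the text does not say how scores of exactly 2 or exactly 1.7 are handled, the sketch assumes they fall into the lower category.

```python
def identification_level(score: float) -> str:
    """Map a MALDI-TOF matching score to an identification level.

    Thresholds follow the text: > 2 species, > 1.7 but < 2 genus,
    < 1.7 no identification. Boundary values (exactly 2 or 1.7) are
    not defined in the text; here they are ASSUMED to fall into the
    lower category.
    """
    if score > 2.0:
        return "species"
    if score > 1.7:
        return "genus"
    return "none"


# Hypothetical example scores, not measurements from this study.
for example_score in (2.3, 1.85, 1.2):
    print(example_score, identification_level(example_score))
```

A strain such as JC110 T, for which no significant score was obtained, would fall into the "none" category under this scheme.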

Genome sequencing information

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the Coriobacteriaceae, and is part of a "culturomics" study of the human digestive flora aiming at isolating all bacterial species in human feces. It was the sixth genome of a species within the Coriobacteriaceae and the first genome of Senegalemassilia anaerobia gen. nov., sp. nov. The EMBL accession number is CAEM00000000, and the deposited sequence consists of 8 scaffolds. Table 3 shows the project information and its association with MIGS version 2.0 compliance.

The Gel View displays the raw spectra of all loaded spectrum files arranged in a pseudo-gel-like look. The x-axis records the m/z value. The left y-axis displays the running spectrum number originating from subsequent spectra loading. The peak intensity is expressed by a gray scale scheme code. The color bar and the right y-axis indicate the relation between the color a peak is displayed with and the peak intensity in arbitrary units.

Genome sequencing and assembly
Sequencing was performed using the 3-kb paired-end strategy on a Roche 454 Titanium pyrosequencer. This project was loaded twice onto a 1/8 region of a PTP PicoTiterPlate (Roche, Meylan, France). DNA (5 µg) was mechanically fragmented on a Hydroshear device (Digilab, Holliston, MA, USA) with an enrichment size of 3-4 kb. DNA fragmentation was visualized using the Agilent 2100 BioAnalyzer on a DNA labchip 7500, with an optimal size of 3.215 kb. The library was constructed according to the 454 Titanium paired-end protocol. Circularization and nebulization were performed and generated a pattern with an optimum at 363 bp. After PCR amplification through 15 cycles, followed by double size selection, the single-stranded paired-end library was quantified using a Quant-it Ribogreen kit (Invitrogen) on the Genios Tecan fluorometer at 152 pg/µL. The library concentration equivalence was calculated to be 7.68E+08 molecules/µL. The library was stored at -20°C until further use. The library was clonally amplified with 1 cpb in 3 SV-emPCR reactions with the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The yield of the emPCR was 12.87%, within the 5 to 20% range recommended by the Roche procedure. Approximately 340,000 beads were loaded onto each of the two 1/8 regions of GS Titanium PicoTiterPlates. Sequencing was performed using the GS Titanium Sequencing Kit XLR70. The runs were performed overnight and then analyzed on the cluster through the gsRunBrowser and Newbler assembler (Roche). A total of 256,934 passed filter wells were obtained and generated 74 Mb of DNA sequence with an average length of 289 bp. The passed filter sequences were assembled using Newbler with 90% identity and 40 bp as overlap. The final assembly yielded 8 scaffolds and 62 large contigs (>1,500 bp).
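As a rough consistency check on the figures above, the sketch below converts the measured library concentration into molecules per microliter and multiplies the number of passed filter wells by the average read length. The text does not give the conversion formula, so two things are assumptions here: that the 363 bp optimum is the relevant fragment length, and that one single-stranded nucleotide weighs on average 328.3 g/mol. The function names are illustrative.

```python
AVOGADRO = 6.022e23          # molecules per mole
NT_MASS_G_PER_MOL = 328.3    # ASSUMED average mass of one ssDNA nucleotide


def molecules_per_ul(conc_pg_per_ul: float, fragment_len_nt: int) -> float:
    """Convert a mass concentration (pg/µL) of single-stranded DNA
    into a number of molecules per microliter."""
    grams_per_ul = conc_pg_per_ul * 1e-12
    grams_per_mol = fragment_len_nt * NT_MASS_G_PER_MOL
    return grams_per_ul / grams_per_mol * AVOGADRO


def total_yield_bp(n_reads: int, mean_read_len_bp: int) -> int:
    """Total sequenced bases from read count and mean read length."""
    return n_reads * mean_read_len_bp


# Values taken from the text: 152 pg/µL, 363 bp optimum,
# 256,934 passed filter wells, 289 bp average read length.
library = molecules_per_ul(152, 363)
yield_bp = total_yield_bp(256_934, 289)

print(f"{library:.2e} molecules per microliter")
print(f"{yield_bp / 1e6:.1f} Mb of sequence")
```

Under these assumptions the computed values land close to the reported 7.68E+08 molecules/µL and 74 Mb.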

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [35] with default parameters. The predicted bacterial protein sequences were searched against the GenBank database and the Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAScanSE tool [36] was used to find tRNA genes, whereas ribosomal RNAs were found by using RNAmmer [37] and BLASTn against GenBank. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. To estimate the mean level of nucleotide sequence similarity at the genome level between S. anaerobia and other members of the Coriobacteriaceae, and among members of this family, we compared genomes two by two and determined the mean percentage of nucleotide sequence identity among orthologous ORFs using BLASTn. Orthologous genes were detected using the Proteinortho software [38].
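The length-dependent E-value cutoff described above can be illustrated as follows. This is a sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, an alignment of exactly 80 amino acids is assumed to use the 1e-03 cutoff, and an ORF is assumed to be called an ORFan when none of its BLASTP hits passes the cutoff (the usual meaning of the term, which the text does not spell out).

```python
from typing import Iterable, Tuple


def hit_passes_cutoff(evalue: float, aln_len_aa: int) -> bool:
    """Apply the length-dependent E-value cutoff from the text.

    ASSUMPTION: alignments of exactly 80 aa use the 1e-03 cutoff.
    """
    cutoff = 1e-3 if aln_len_aa >= 80 else 1e-5
    return evalue < cutoff


def is_orfan(hits: Iterable[Tuple[float, int]]) -> bool:
    """ASSUMPTION: an ORF is an ORFan when no (evalue, alignment length)
    hit passes the cutoff, including the case of no hits at all."""
    return not any(hit_passes_cutoff(e, length) for e, length in hits)


# Hypothetical hits, given as (E-value, alignment length in aa).
print(is_orfan([(1e-4, 120)]))   # long alignment, passes 1e-03
print(is_orfan([(1e-4, 50)]))    # short alignment, fails 1e-05
print(is_orfan([]))              # no hits at all
```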

Genome properties
The genome is 2,383,131 bp long (one chromosome, no plasmid) with a 60.9% G+C content (Table 4). Of the 1,990 predicted genes, 1,932 were protein-coding genes and 58 were RNAs (1 rRNA operon and 55 tRNA genes). A total of 1,430 genes (68.12%) were assigned a putative function. Fifty-six genes were identified as ORFans (2.90%).
The remaining genes were annotated as hypothetical proteins (330 genes = 17.08%). The distribution of genes into COG functional categories is presented in Table 5 and Figure 6. The properties and the statistics of the genome are summarized in Tables 4 and 5.

Table footnote: The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
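The ORFan and hypothetical-protein percentages quoted above can be recomputed from the gene counts. The sketch assumes the denominator is the number of protein-coding genes (1,932), as the table footnote suggests; the helper name `percent` is illustrative.

```python
def percent(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to two decimals."""
    return round(100 * part / total, 2)


protein_coding = 1932   # from the text
rna_genes = 58          # from the text
orfans = 56             # from the text
hypothetical = 330      # from the text

# Protein-coding plus RNA genes should give the total predicted genes.
total_genes = protein_coding + rna_genes

print(total_genes)
print(percent(orfans, protein_coding))
print(percent(hypothetical, protein_coding))
```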

Conclusion
On the basis of phenotypic (Table 2), phylogenetic and genomic analyses (

Description of Senegalemassilia anaerobia sp. nov.
Senegalemassilia anaerobia (an.a.e.ro'bi.a. N.L. f. adj.; Gr. pref. an, not; Gr. n. aer, air; Gr. n. bios, life; N.L. adj. anaerobia, anaerobe, that can live in the absence of oxygen; referring to the respiratory metabolism of the organism). It has been isolated from the feces of an asymptomatic Senegalese patient.
Gram-positive coccobacilli, 0.7 µm in diameter and 1.56 µm in length. Strictly anaerobic. Mesophilic. Motile and non-sporulating. Colonies are transparent and smooth, 0.5 mm in diameter, on blood-enriched Columbia agar. Catalase, oxidase and indole negative. Arginine dihydrolase, nitrate reduction, alkaline phosphatase, acid phosphatase and Naphthol-AS-BI-phosphohydrolase positive. Asaccharolytic. Cells are susceptible to amoxicillin, imipenem, metronidazole and gentamicin but resistant to trimethoprim/sulfamethoxazole. The 16S rRNA and genome sequences are deposited in GenBank and EMBL under accession numbers JF824809 and CAEM00000000, respectively. The G+C content of the genome is 60.9%. Habitat: human digestive tract. The type strain JC110 T (= CSUR P147 = DSMZ 25959) was isolated from the fecal flora of a healthy patient in Senegal.