Non contiguous-finished genome sequence and description of Bacillus massiliosenegalensis sp. nov.

Bacillus massiliosenegalensis strain JC6T sp. nov. is the type strain of Bacillus massiliosenegalensis sp. nov., a new species within the genus Bacillus. This strain was isolated from the fecal flora of a healthy Senegalese patient. B. massiliosenegalensis is an aerobic Gram-positive rod-shaped bacterium. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,981,278-bp long genome comprises a 4,957,301-bp chromosome and a 23,977-bp plasmid. The chromosome contains 4,925 protein-coding and 72 RNA genes, including 4 rRNA genes. The plasmid contains 29 protein-coding genes.


Introduction
Bacillus massiliosenegalensis strain JC6T (= CSUR P151 = DSM 25957) is the type strain of B. massiliosenegalensis sp. nov., a new species within the genus Bacillus. This bacterium is a Gram-positive, aerobic, catalase-positive and indole-negative bacillus that was isolated from the stool of a healthy Senegalese patient as part of a study aimed at individually cultivating all human enteric bacterial species [1,2]. Currently, bacterial taxonomy relies on a combination of various genetic and phenotypic criteria. However, the three main genetic criteria, namely 16S rRNA gene-based phylogeny and nucleotide similarity [3,4], DNA-DNA hybridization [5] and G+C content, suffer from significant drawbacks, and their cutoffs are not applicable to all genera and species. Over recent years, the introduction of high-throughput genome sequencing and proteomic analyses [6] has provided a source of exhaustive information about characterized bacterial isolates. Such data may now be included among the criteria used for taxonomic identification. We recently proposed a polyphasic approach to describing new bacterial taxa that is based on their genome sequence, MALDI-TOF spectrum and main phenotypic characteristics [7-25].
Here we present a summary classification and a set of features for B. massiliosenegalensis sp. nov. strain JC6T, together with a description of the genome sequencing and annotation. These characteristics support the creation of the species B. massiliosenegalensis. The genus Bacillus (Cohn 1872) was created in 1872 [26] and currently consists mainly of Gram-positive, motile, spore-forming bacilli. Currently, 173 Bacillus species and 4 subspecies are validly published [27]. Members of the genus Bacillus are ubiquitous bacteria, mostly isolated from environmental sources. However, several species are associated with humans, either as pathogens or as commensals [28].

Classification and features
A stool sample was collected from a healthy 16-year-old male Senegalese volunteer living in Dielmo (a rural village in the Guinean-Sudanian zone of Senegal), who was included in a research protocol. Written assent was obtained from this individual. No written consent was needed from his guardians for this study because he was older than 15 years (in accordance with the previous project approved by the Ministry of Health of Senegal and the assembled village population, and as published elsewhere [29]). Both this study and the assent procedure were approved by the National Ethics Committee of  [7-25]. The fecal specimen was stored at -80°C after collection. Strain JC6T (Table 1) was isolated in January 2011 by cultivation on 5% sheep blood-enriched Brain Heart Infusion (BHI) agar (Becton Dickinson, Heidelberg, Germany). The strain exhibited a 97.3% 16S rRNA nucleotide sequence similarity with B. siralis (Pettersson et al. 2000), the phylogenetically closest Bacillus species (Figure 1).
Table 1 footnote (fragment): ... not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [42]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
Standards in Genomic Sciences
Different growth temperatures (25, 30, 37 and 45°C) were tested. Growth occurred between 25°C and 45°C, and optimal growth was observed at 30°C. Colonies were translucent and 2 mm in diameter on blood-enriched Columbia agar. Growth of the strain was tested under anaerobic and microaerophilic conditions using the GENbag anaer and GENbag microaer systems, respectively (BioMérieux), and in the presence of air with or without 5% CO2.
Growth was achieved under aerobic conditions (with or without CO2), and weak growth was observed under microaerophilic and anaerobic conditions. Gram staining showed a rod-shaped Gram-positive bacterium (Figure 2). The strain was motile by means of peritrichous flagella. Cells have a mean diameter of 0.65 µm and a mean length of 3.076 µm as measured by electron microscopy (Figure 3).
Strain JC6T exhibited catalase activity but not oxidase activity. Using the API 50CH system, we observed positive reactions for aesculin, D-cellobiose, D-glucose, D-maltose, N-acetyl-glucosamine and D-trehalose. Using the API ZYM system, a positive reaction was observed for α-glucosidase and weak reactions were observed for alkaline phosphatase, esterase lipase, valine arylamidase and trypsin. Using the API 20E system, a positive reaction was observed for nitrate reduction and negative reactions were observed for indole production and urease. B. massiliosenegalensis is susceptible to amoxicillin, ceftriaxone, imipenem, trimethoprim/sulfamethoxazole, gentamicin, ciprofloxacin, rifampicin and vancomycin, but resistant to metronidazole and erythromycin. The phenotypic characteristics that differentiate it from other Bacillus species are summarized in Table 2.
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [8,43] using a Microflex spectrometer (Bruker Daltonics, Germany). Spectra were compared with the Bruker database, which contained the main spectra from 3,769 bacteria, including 129 spectra from 98 validly named Bacillus species. No significant score was obtained, suggesting that our isolate was not a member of a known species. We added the spectrum from strain JC6T to our database (Figure 4). Finally, the gel view highlights the spectral differences from other members of the genus Bacillus (Figure 5).

Genome sequencing information

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the genus Bacillus, and is part of a study of the human digestive flora aiming at isolating all bacterial species within human feces [1,2]. It was the 268th genome of a Bacillus species, and the first genome of B. massiliosenegalensis sp. nov. The GenBank accession number is CAHJ00000000, and the deposited sequence consists of 102 contigs. Table 3 shows the project information and its association with MIGS version 2.0 compliance [44].

Genome sequencing and assembly
A shotgun library and a 3-kb paired-end library were constructed. The shotgun library was constructed with 500 ng of DNA, as described by the manufacturer, using the GS Rapid Library Prep kit (Roche). For the paired-end library, 5 µg of DNA was mechanically fragmented on the Hydroshear device (Digilab, Holliston, MA, USA) with an enrichment size of 3-4 kb. The DNA fragmentation was visualized using an Agilent 2100 BioAnalyzer on a DNA LabChip 7500, with an optimal size of 3.2 kb. The library was constructed according to the 454 GS FLX Titanium paired-end protocol (Roche). Circularization and nebulization were performed and generated a pattern with an optimum at 555 bp. After PCR amplification through 17 cycles followed by double size selection, the single-stranded paired-end library was quantified using the Quant-it Ribogreen kit (Invitrogen) on the Genios Tecan fluorometer at 21 pg/µL. The library concentration equivalence was calculated as 6.94E+07 molecules/µL. Libraries were stored at -20°C until further use. The shotgun library was clonally amplified with 2 copies per bead (cpb) in 4 emPCR reactions, whereas the 3-kb paired-end library was amplified with 1 cpb in 9 emPCR reactions and 0.5 cpb in 2 emPCR reactions, using the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The yield of the shotgun emPCR reactions was 8.12%, and the yields of the two kinds of paired-end emPCR reactions were 7.8% and 11.2%, respectively. These results were within the 5-20% quality range expected from the Roche procedure. For sequencing, the shotgun and paired-end libraries were loaded onto one 1/2 region and four 1/4 regions of a PTP PicoTiterPlate 70×75 (Roche), respectively. The sequencing reactions were performed using a GS FLX Titanium sequencing kit XLR70 (Roche). The run was performed overnight and then analyzed on the cluster through the gsRunBrowser and Newbler assembler (Roche).
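The conversion from a mass concentration to a molecule count for the paired-end library can be illustrated with a short calculation. This is only a sketch under stated assumptions, not the authors' documented procedure: it assumes an average molar mass of 328.3 g/mol per nucleotide of single-stranded DNA and takes the 555-bp optimum of the fragment pattern as the mean library length. The function name is hypothetical.

```python
AVOGADRO = 6.022e23  # molecules per mole

# Assumption: average molar mass of one nucleotide in single-stranded DNA (g/mol).
SSDNA_G_PER_MOL_PER_NT = 328.3


def molecules_per_microliter(conc_pg_per_ul, mean_length_nt):
    """Convert a ssDNA mass concentration (pg/µL) into molecules per µL."""
    grams_per_ul = conc_pg_per_ul * 1e-12
    grams_per_mole = mean_length_nt * SSDNA_G_PER_MOL_PER_NT
    return grams_per_ul / grams_per_mole * AVOGADRO


# Values from the text: 21 pg/µL; 555 nt is assumed to be the mean fragment length.
estimate = molecules_per_microliter(21, 555)
print(f"{estimate:.2e} molecules/µL")
```

Under these assumptions the estimate falls close to the reported 6.94E+07 molecules/µL; a different per-nucleotide mass or mean fragment length would shift the result proportionally.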
A total of 969,014 passed-filter wells were obtained and generated 274 Mb of sequence with an average read length of 286 bp. These sequences were assembled using Newbler with 90% identity and 40 bp as overlap. The final assembly identified 31 scaffolds and 129 large contigs (>1,500 bp), generating a genome size of 5.05 Mb, which corresponds to a 54.25-fold coverage.
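The coverage figure follows from dividing the sequence yield by the assembled genome size. The snippet below is a minimal arithmetic sketch using only the numbers quoted above; the variable and function names are illustrative.

```python
def fold_coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


passed_filter_reads = 969_014
mean_read_length_bp = 286
reported_yield_bp = 274e6   # 274 Mb, as reported
assembly_size_bp = 5.05e6   # 5.05 Mb, as reported

# Yield implied by read count and mean length; close to, but not exactly, 274 Mb.
implied_yield_bp = passed_filter_reads * mean_read_length_bp

coverage = fold_coverage(reported_yield_bp, assembly_size_bp)
print(f"implied yield: {implied_yield_bp / 1e6:.1f} Mb")
print(f"coverage: {coverage:.2f}-fold")
```

The small gap between the implied and reported yields is consistent with the rounding of the mean read length.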

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [45] with default parameters. However, predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank [46] and Clusters of Orthologous Groups (COG) databases using BLASTP. tRNAs and ribosomal RNAs were predicted using the tRNAScan-SE [47] and RNAmmer [48] tools, respectively. Lipoprotein signal peptides and numbers of transmembrane helices were predicted using SignalP [49] and TMHMM [50], respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have already been used in previous works to define ORFans. Artemis [51] was used for data management and DNA Plotter [52] was used for visualization of genomic features. The alignment tool Mauve was used for multiple genomic sequence alignment [53]. To estimate the mean level of nucleotide sequence similarity at the genome level between B. massiliosenegalensis and 5 other Bacillus genomes (Table 4), orthologous proteins were detected using Proteinortho [54]; genomes were then compared pairwise, and the mean percentage of nucleotide sequence identity among orthologous ORFs was determined using BLASTn.
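The length-dependent E-value cutoff described above can be written as a small helper. This is a hedged sketch, not the authors' pipeline: the function names and the example hits are hypothetical, and because the text does not say how an alignment of exactly 80 amino acids is handled, the code assumes it is grouped with the shorter alignments.

```python
def evalue_cutoff(alignment_length_aa):
    """Return the E-value cutoff quoted in the text for a given alignment length.

    Assumption: an alignment of exactly 80 aa uses the stricter 1e-05 cutoff,
    since the text only covers lengths greater than or smaller than 80.
    """
    return 1e-03 if alignment_length_aa > 80 else 1e-05


def below_cutoff(evalue, alignment_length_aa):
    """True if a BLASTP hit's E-value is lower than the applicable cutoff."""
    return evalue < evalue_cutoff(alignment_length_aa)


# Hypothetical hits: (ORF identifier, E-value, alignment length in aa).
example_hits = [
    ("orf_0001", 1e-04, 120),
    ("orf_0002", 1e-04, 60),
    ("orf_0003", 1e-08, 60),
]

for orf_id, evalue, length in example_hits:
    print(orf_id, below_cutoff(evalue, length))
```

Keeping the cutoff in its own function makes the boundary assumption easy to change if the treatment of 80-residue alignments is known.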

Genome properties
The genome is 4,981,278 bp long (chromosome: 4,957,301 bp; plasmid: 23,977 bp) with a G+C content of 37.60% (Figure 6 and Table 5). Of the 4,997 predicted chromosomal genes, 4,925 were protein-coding genes and 72 were RNAs. A total of 3,554 genes (72.16%) were assigned a putative function. ORFans accounted for 338 (6.86%) of the genes identified. The remaining genes were annotated as hypothetical proteins. The properties and statistics of the genome are summarized in Tables 5 and 6. The distribution of genes into COG functional categories is presented in Table 6. The 23,977-bp plasmid contains 29 protein-coding genes.
Figure 6. Graphical circular map of the chromosome. From the outside to the center: genes on the forward strand (colored by COG categories), genes on the reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs red), GC content, and GC skew.
a The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
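The counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only figures from the text; it assumes that the quoted percentages are relative to the 4,925 chromosomal protein-coding genes, which is the denominator that reproduces them.

```python
chromosome_bp = 4_957_301
plasmid_bp = 23_977
protein_coding_genes = 4_925
rna_genes = 72
genes_with_function = 3_554
orfan_genes = 338

# Totals implied by the individual counts.
genome_bp = chromosome_bp + plasmid_bp
chromosomal_genes = protein_coding_genes + rna_genes

# Assumption: percentages use the protein-coding gene count as denominator.
pct_with_function = 100 * genes_with_function / protein_coding_genes
pct_orfans = 100 * orfan_genes / protein_coding_genes

print(f"genome size: {genome_bp:,} bp")
print(f"chromosomal genes: {chromosomal_genes:,}")
print(f"putative function: {pct_with_function:.2f}%")
print(f"ORFans: {pct_orfans:.2f}%")
```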

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Bacillus massiliosenegalensis sp. nov., which contains the strain JC6T. This strain was isolated in Senegal.

Description of Bacillus massiliosenegalensis sp. nov.
Bacillus massiliosenegalensis (mas.si.li.o.se.ne.gal.en′sis. L. gen. masc. n. massiliosenegalensis, contraction of the Latin names of Marseille and Senegal, where strain JC6T was cultivated and collected, respectively). B. massiliosenegalensis is an aerobic Gram-positive bacterium. Optimal growth is achieved aerobically and weak growth is observed under microaerophilic or anaerobic conditions. Growth occurs on axenic media between 25 and 45°C, with optimal growth observed at 37°C. Cells stain Gram-positive, are rod-shaped, endospore-forming and motile, and have a mean diameter of 0.65 µm and a mean length of 3.076 µm. Peritrichous flagella were observed. Colonies are translucent and 2 mm in diameter on blood-enriched BHI agar.
Catalase activity is present; oxidase activity and indole production are absent. Using the API 50CH system, positive reactions are observed for aesculin, D-cellobiose, D-glucose, D-maltose, N-acetyl-glucosamine and D-trehalose. Using the API ZYM system, a positive reaction is observed for α-glucosidase and weak reactions are observed for alkaline phosphatase, esterase lipase, valine arylamidase and trypsin. Using the API 20E system, a positive reaction is observed for nitrate reduction and a negative reaction is observed for urease. B. massiliosenegalensis is susceptible to amoxicillin, ceftriaxone, imipenem, trimethoprim/sulfamethoxazole, gentamicin, ciprofloxacin, rifampicin and vancomycin, but resistant to metronidazole and erythromycin. The G+C content of the genome is 37.60%. The 16S rRNA and genome sequences are deposited in GenBank and EMBL under accession numbers JF824800 and CAHJ00000000, respectively. The type strain JC6T (= CSUR P151 = DSM 25957) was isolated from the fecal flora of a healthy Senegalese patient.