Non-contiguous finished genome sequence and description of Bacillus massiliogorillae sp. nov.

Strain G2T is the type strain of B. massiliogorillae sp. nov., a proposed new species within the genus Bacillus. This strain, whose genome is described here, was isolated in France from the fecal sample of a wild western lowland gorilla from Cameroon. B. massiliogorillae is a facultatively anaerobic, Gram-variable, rod-shaped bacterium. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5,431,633 bp long genome (1 chromosome but no plasmid) contains 5,179 protein-coding and 98 RNA genes, including 91 tRNA genes.


Introduction
Strain G2T (= CSUR P206 = DSM 26159) is the type strain of B. massiliogorillae sp. nov. This bacterium is a Gram-variable, facultatively anaerobic, indole-negative bacillus with rounded ends. It was isolated from the stool sample of Gorilla gorilla gorilla as part of a "culturomics" study aiming at cultivating bacterial species within gorilla feces. The genus Bacillus was created about 140 years ago [1]. To date, this genus, composed mostly of Gram-positive, motile, and spore-forming bacteria, includes 276 species with validly published names [2]. Members of the genus Bacillus are ubiquitous bacteria isolated from various environments including soil, fresh and sea water, and food, and occasionally from humans and animals, in which they are either pathogens, such as B. anthracis (the causative agent of anthrax) [3] and B. cereus (associated mainly with food poisoning) [4], or saprophytes [5]. Bacillus species may also, rarely, be involved in a variety of human infections, including pneumonia, bacteremia, meningitis, endocarditis, endophthalmitis, osteomyelitis and skin/soft tissue infection [5]. However, few data are available on the presence of the genus Bacillus in great apes. Recent reports have described the isolation of atypical B. anthracis (B. anthracis-like bacteria) from wild chimpanzees and gorillas in Africa [6][7][8].
Here we present a summary classification and a set of features for B. massiliogorillae sp. nov. strain G2T together with the description of the complete genome sequence and annotation. These characteristics support the circumscription of the species B. massiliogorillae [9].

Classification and features
In July 2011, a fecal sample was collected from a wild western lowland gorilla near Messok, a village in the south-eastern part of the Dja Faunal Park (Cameroon). The collection of the stool sample was approved by the Ministry of Scientific Research and Innovation of Cameroon. No experimentation was conducted on this gorilla. The fecal specimen was preserved at -80°C after collection and sent to Marseille. Strain G2T (Table 1) was isolated in January 2012 by cultivation on Brucella agar medium (Oxoid, Dardilly, France). This strain exhibited 97.3% 16S rRNA nucleotide sequence similarity with Bacillus simplex, the phylogenetically closest validly published Bacillus species (Figure 1). This value was lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [23].

Table 1 evidence codes. IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [21]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Figure 1. Phylogenetic tree highlighting the position of Bacillus massiliogorillae strain G2T relative to other type strains within the Bacillus genus. GenBank accession numbers are indicated in parentheses. Sequences were aligned using CLUSTAL X (V2), and phylogenetic inferences were obtained using the maximum-likelihood method within the MEGA 5 software [22]. Numbers at the nodes are percentages of bootstrap values obtained by repeating the analysis 1,000 times to generate a majority consensus tree. Clostridium botulinum was used as the outgroup. The scale bar represents a 2% nucleotide sequence divergence.
Different growth temperatures (25, 30, 37, 45°C) were tested. Growth occurred at all tested temperatures, and optimal growth was observed at 37°C. Colonies were 2-5 mm in diameter on Columbia agar, grey and opaque. Growth of the strain was tested under anaerobic and microaerophilic conditions using GENbag anaer and GENbag microaer systems, respectively (BioMérieux), and in aerobic conditions, with or without 5% CO2. Growth was achieved under aerobic (with and without CO2), microaerophilic and anaerobic conditions. Gram staining showed Gram-variable bacilli (Figure 2). A motility test was positive. Cells grown on agar sporulate, and the rods have a length ranging from 3.2 to 7.5 µm (mean 5.4 µm) and a diameter ranging from 0.8 to 1.2 µm (mean 1 µm) as determined by negative staining transmission electron microscopy (Figure 3).
Strain G2T exhibited catalase activity but not oxidase activity. Using the API 50CH system (BioMérieux), positive reactions were observed for D-glucose, D-fructose, D-ribose, N-acetylglucosamine, amygdalin, arbutin, aesculin, salicin, cellobiose, maltose, D-lactose, D-trehalose, D-saccharose, and hydrolysis of starch. Using the API ZYM system, positive reactions were observed for esterase (C4), esterase lipase (C8), acid phosphatase, α-glucosidase and N-acetyl-β-glucosaminidase. The urease reaction was also positive, but nitrate reduction and indole production were negative. B. massiliogorillae is susceptible to amoxicillin, nitrofurantoin, erythromycin, doxycycline, rifampin, vancomycin, gentamicin and imipenem but resistant to trimethoprim-sulfamethoxazole, ciprofloxacin, ceftriaxone and amoxicillin-clavulanic acid.

Standards in Genomic Sciences

When compared to other Bacillus species, B. massiliogorillae differed from B. simplex [24] in the utilization of amygdalin, cellobiose, lactose and glucose (Table 2). It also differed from B. psychrosaccharolyticus [25] in nitrate reductase and β-galactosidase production, and in the utilization of L-arabinose, mannitol, xylose and glycerol (Table 2). Differences were also observed with B. circulans [26] in β-galactosidase production and the utilization of D-mannose, L-arabinose, D-xylose, mannitol, arabinose, xylose, glycerol and D-galactose (Table 2).

Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [27,28]. Deposits were made for strain G2T from 12 isolated colonies. Each smear was overlaid with 2 µL of matrix solution (a saturated solution of α-cyano-4-hydroxycinnamic acid in 50% acetonitrile and 2.5% trifluoroacetic acid) and allowed to dry for five minutes. Measurements were performed with a Microflex spectrometer (Bruker Daltonics, Leipzig, Germany). Spectra were recorded in the positive linear mode for the mass range of 2,000 to 20,000 Da (parameter settings: ion source 1 (IS1), 20 kV; IS2, 18.5 kV; lens, 7 kV). A spectrum was obtained after 675 shots at a variable laser power. The time of acquisition was between 30 seconds and 1 minute per spot. The 12 G2T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against 6,252 bacterial spectra, including 199 spectra from 104 Bacillus species, used as reference data in the BioTyper database. The identification method used m/z values from 3,000 to 15,000 Da. For every spectrum, at most 100 peaks were taken into account and compared with spectra in the database. A score determined whether the tested isolate could be identified: a score > 2 with a validated species enabled identification at the species level; a score > 1.7 but < 2 enabled identification at the genus level; and a score < 1.7 did not enable any identification. For strain G2T, the scores obtained ranged from 1.177 to 1.343, suggesting that our isolate was not a member of a known species. We added the spectrum from strain G2T to our database (Figure 4). Spectrum differences with other Bacillus species are shown in Figure 5.
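The score interpretation described above can be sketched as a small Python function. This is only an illustration of the thresholds stated in the text, not the BioTyper software itself; the function name is hypothetical, and the handling of scores exactly equal to 2 or 1.7 (treated here as reaching the higher level) is an assumption, since the text does not specify those boundary cases.

```python
def interpret_biotyper_score(score: float) -> str:
    """Map a MALDI-TOF identification score to the level described in the text.

    Assumption: boundary values (exactly 2.0 or 1.7) are assigned to the
    higher identification level; the text leaves these cases unspecified.
    """
    if score >= 2.0:
        return "species"  # identification at the species level
    if score >= 1.7:
        return "genus"    # identification at the genus level only
    return "none"         # no reliable identification


# Lowest and highest scores reported for the 12 spectra of strain G2T.
g2_score_range = [1.177, 1.343]
g2_levels = [interpret_biotyper_score(s) for s in g2_score_range]
print(g2_levels)
```

Both ends of the reported score range fall below 1.7, which is why the isolate could not be assigned to any species or genus entry in the reference database.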

Genome sequencing information

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the genus Bacillus, and is part of a "culturomics" study of the gorilla flora aiming at isolating all bacterial species within gorilla feces. It was the 61st genome of a Bacillus species and the first genome of Bacillus massiliogorillae sp. nov.
The GenBank accession number is CAVL00000000, and the deposited sequence consists of 66 large contigs. Table 3 summarizes the project information and its association with MIGS version 2.0 compliance [29].

Growth conditions and DNA isolation
B. massiliogorillae sp. nov. strain G2T (CSUR P206, DSM 26159) was grown aerobically on 5% sheep blood-enriched Columbia agar at 37°C. Four Petri dishes were spread, and the growth was resuspended in 3 × 500 µL of TE buffer and stored at -80°C. Then, 500 µL of this suspension was thawed, centrifuged for 3 minutes at 10,000 rpm and resuspended in 3 × 100 µL of G2 buffer (EZ1 DNA Tissue kit, Qiagen). A first mechanical lysis was performed with glass powder on the Fastprep-24 device (Sample Preparation system, MP Biomedicals, USA) using 2 × 20-second cycles. DNA was then treated with 2.5 µg/µL lysozyme (30 minutes at 37°C) and extracted using the BioRobot EZ1 Advanced XL (Qiagen). The DNA was then concentrated and purified using the QIAamp kit (Qiagen). The yield and concentration were measured with the Quant-it Picogreen kit (Invitrogen) on the Genios Tecan fluorometer; the concentration was 50 ng/µL.

Genome sequencing and assembly
The paired-end library was prepared from 5 µg of bacterial DNA by DNA fragmentation on the Covaris S-Series (S1, S2) instrument (Woburn, Massachusetts, USA) with an enrichment size of 3-5 kb. The DNA fragmentation was visualized with the Agilent 2100 BioAnalyzer on a DNA labchip 7500. The library was constructed according to the 454 GS FLX Titanium paired-end protocol (Roche). Circularization and nebulization were performed and generated a pattern with an optimum at 500 bp. After PCR amplification through 15 cycles followed by double size selection, the single-stranded paired-end library was quantified using the Quant-it Ribogreen kit (Invitrogen) on the Genios Tecan fluorometer at 339 pg/µL. The library concentration equivalence was calculated as 1.00E+08 molecules/µL. The library was stored at -20°C until further use. The paired-end library was clonally amplified with 0.5 cpb and 1 cpb in 2 emPCR reactions with the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [30] with default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank database [31] and the Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAScan-SE tool [32] was used to find tRNA genes, whereas ribosomal RNAs were found by using RNAmmer [33] and BLASTn against the GenBank database. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05.
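The length-dependent E-value rule above is stated tersely, so the following Python sketch shows one possible reading of it, not the authors' actual pipeline. It assumes that the two cutoffs decide which BLASTP hits count as significant, and that an ORF with no significant hit is treated as an ORFan. The `Hit` class and function names are hypothetical, and hits with an alignment length of exactly 80 amino acids are assigned to the stricter 1e-05 cutoff, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Hit:
    """A single BLASTP hit (hypothetical minimal record)."""
    evalue: float
    alignment_length: int  # in amino acids


def is_significant(hit: Hit) -> bool:
    """Apply the length-dependent E-value cutoff from the text.

    Assumption: alignments of exactly 80 aa use the stricter cutoff.
    """
    cutoff = 1e-3 if hit.alignment_length > 80 else 1e-5
    return hit.evalue < cutoff


def is_orfan(hits: Iterable[Hit]) -> bool:
    """Assumed reading: an ORF is an ORFan when none of its hits is significant."""
    return not any(is_significant(h) for h in hits)
```

Under this reading, a hit with an E-value of 1e-04 would rule out ORFan status only if its alignment is longer than 80 amino acids; the same E-value on a shorter alignment would not be considered sufficient evidence of homology.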

To estimate the mean level of nucleotide sequence similarity at the genome level between B. massiliogorillae sp. nov. strain G2T and three other Bacillus species (Table 6), we compared genomes pairwise and determined the mean percentage of nucleotide sequence identity among orthologous ORFs using BLASTn. Orthologous genes were detected using the Proteinortho software [34].

Figure 5. Gel view comparing Bacillus massiliogorillae G2T spectra with other members of the Bacillus genus (B. thuringiensis, B. smithii, B. simplex, B. psychrosaccharolyticus, B. nealsonii, B. megaterium, B. lentus, B. flexus, B. firmus, B. circulans and B. benzoevorans). The gel view displays the raw spectra of all loaded spectrum files arranged in a pseudo-gel-like look. The x-axis records the m/z value. The left y-axis displays the running spectrum number originating from subsequent spectra loading. The peak intensity is expressed by a gray-scale scheme code. The color bar and the right y-axis indicate the relation between the color a peak is displayed with and the peak intensity in arbitrary units.
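The pairwise genome comparison described in this section (orthologous ORFs detected first, then BLASTn identities averaged over them) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the ortholog pairs and BLASTn identities are assumed to have already been parsed into plain Python structures, the function name is hypothetical, and the gene identifiers and identity values in the example are invented for demonstration and are not data from this study.

```python
from statistics import mean


def mean_orthologous_identity(ortholog_pairs, identity_by_pair):
    """Average BLASTn percent identity over orthologous ORF pairs.

    ortholog_pairs: iterable of (gene_in_genome_A, gene_in_genome_B) tuples,
        e.g. derived from an orthology detection tool.
    identity_by_pair: dict mapping such tuples to BLASTn percent identity.
    Pairs without a BLASTn alignment are skipped (an assumption of this sketch).
    """
    values = [identity_by_pair[p] for p in ortholog_pairs if p in identity_by_pair]
    if not values:
        raise ValueError("no orthologous pair has a BLASTn alignment")
    return mean(values)


# Invented toy data, for illustration only.
toy_orthologs = [("gA1", "gB1"), ("gA2", "gB2"), ("gA3", "gB3")]
toy_identities = {
    ("gA1", "gB1"): 72.0,
    ("gA2", "gB2"): 68.0,
    ("gA9", "gB9"): 99.0,  # aligned but not orthologous, so ignored
}
toy_mean = mean_orthologous_identity(toy_orthologs, toy_identities)
print(toy_mean)  # 70.0
```

Restricting the average to orthologous pairs keeps paralogs and spurious local matches from inflating or deflating the genome-level similarity estimate.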

Genome properties
The genome is 5,431,633 bp long (1 chromosome, but no plasmid) with a 34.95% G+C content (Figure 6 and Table 5). It is composed of 66 large contigs. Of the 5,277 predicted genes, 5,179 were protein-coding genes and 98 were RNAs (one 16S rRNA gene, one 23S rRNA gene, five 5S rRNA genes and 91 tRNA genes). A total of 3,801 genes (73.39%) were assigned a putative function (by COGs or by NR BLAST) and 368 genes were identified as ORFans (7.11%). The remaining genes were annotated as hypothetical proteins (666 genes, 12.86%). The distribution of genes into COG functional categories is presented in Table 6. The properties and statistics of the genome are summarized in Tables 4 and 5.

Table footnote a: The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.

(Table 6), thus confirming its new species status. Table 6 summarizes the number of orthologous genes and the average percentage of nucleotide sequence identity between the different genomes studied.
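The gene counts and percentages reported in this section can be cross-checked with a few lines of arithmetic. The sketch below assumes that the percentages are expressed relative to the 5,179 protein-coding genes, which is consistent with the table footnote; the variable and function names are hypothetical.

```python
# Counts taken from the text of this section.
protein_coding = 5179
rna_breakdown = {"16S rRNA": 1, "23S rRNA": 1, "5S rRNA": 5, "tRNA": 91}


def percent_of_protein_coding(count: int, total: int = protein_coding) -> float:
    """Percentage of protein-coding genes, rounded to two decimals.

    Assumption: the denominator is the number of protein-coding genes.
    """
    return round(100 * count / total, 2)


rna_total = sum(rna_breakdown.values())
putative_function_pct = percent_of_protein_coding(3801)
orfan_pct = percent_of_protein_coding(368)
hypothetical_pct = percent_of_protein_coding(666)

print(rna_total, putative_function_pct, orfan_pct, hypothetical_pct)
```

Under this assumption the computed values reproduce the 98 RNA genes and the 73.39%, 7.11% and 12.86% figures given in the text.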

Conclusion
On the basis of phenotypic (Table 2), phylogenetic and genomic analyses (taxonogenomics) (Table 6), we formally propose the creation of Bacillus massiliogorillae sp. nov., which contains the strain G2T. This strain was found in a stool sample collected from a gorilla in Cameroon.

Description of Bacillus massiliogorillae sp. nov.
Bacillus massiliogorillae (ma.sil.io.go.ril'ae. L. gen. masc. n. massiliogorillae, combination of Massilia, the Latin name of Marseille, where strain G2T was isolated, and of Gorilla, the Latin name of the gorilla, from which the stool sample was obtained). B. massiliogorillae is a facultatively anaerobic, Gram-variable bacterium. Optimal growth is achieved aerobically. Growth is also observed in microaerophilic and anaerobic conditions. Growth occurs on axenic media between 25 and 45°C, with optimal growth observed at 37°C. Cells stain Gram-positive or Gram-negative, are rod-shaped, endospore-forming and motile, and have a mean diameter of 1 µm (range 0.8 to 1.2 µm) and a mean length of 5.4 µm (range 3.2 to 7.5 µm). Colonies are grey, opaque and 2-5 mm in diameter on blood-enriched BHI agar.
The G+C content of the genome is 34.95%. The 16S rRNA and genome sequences are deposited in GenBank under accession numbers JX650055 and CAVL00000000, respectively. The type strain G2T (= CSUR P206 = DSM 26159) was isolated from the fecal flora of a Gorilla gorilla gorilla from Cameroon.