Non-contiguous finished genome sequence and description of Paenibacillus gorillae sp. nov.

Strain G1T is the type strain of Paenibacillus gorillae sp. nov., a newly proposed species within the genus Paenibacillus. This strain, whose genome is described here, was isolated in France from the fecal sample of a wild western lowland gorilla from Cameroon. P. gorillae is a facultatively anaerobic, Gram-negative, rod-shaped bacterium. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 6,257,967 bp long genome (one chromosome but no plasmid) contains 5,856 protein-coding and 62 RNA genes, including 60 tRNA genes.

Here we present a summary classification and a set of features for P. gorillae sp. nov. strain G1T together with the description of the complete genome sequence and annotation. These characteristics support the circumscription of the species P. gorillae [26].

Classification and features
In July 2011, a fecal sample was collected from a wild western lowland gorilla near Minton, a village in the south-central part of the Dja Faunal Park (Cameroon). The collection of the stool sample was approved by the Ministry of Scientific Research and Innovation of Cameroon. No experimentation was conducted on this gorilla. The fecal specimen was preserved at -80°C after collection and sent to Marseille. Strain G1T (Table 1) was isolated in January 2012 by cultivation on Columbia agar with 5% sheep blood (BioMérieux, France). This strain exhibited a 98.28% 16S rRNA nucleotide sequence similarity with Paenibacillus xinjiangensis, the phylogenetically closest validly published Paenibacillus species (Figure 1). This value was lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [42].
Table 1 evidence codes: [...] (not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [40]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Figure 1.
Phylogenetic tree highlighting the position of Paenibacillus gorillae strain G1T relative to other type strains within the Paenibacillus genus. GenBank accession numbers are indicated in parentheses. Sequences were aligned using CLUSTAL X (V2), and phylogenetic inferences obtained using the maximum-likelihood method within the MEGA 5 software [41]. Numbers at the nodes are percentages of bootstrap values obtained by repeating the analysis 1,000 times to generate a majority consensus tree. Brevibacillus brevis was used as outgroup. The scale bar represents a 2% nucleotide sequence divergence.
Growth was tested at 25, 30, 37 and 45°C. It occurred between 25 and 37°C, with optimal growth at 25°C. Colonies were 2-8 mm in diameter on Columbia agar, appeared whitish at 25°C and produced a clear liquid. Growth of the strain was tested under anaerobic and microaerophilic conditions using the GENbag anaer and GENbag microaer systems, respectively (BioMérieux), and under aerobic conditions, with or without 5% CO2.
Growth was achieved under aerobic (with and without CO2), microaerophilic and anaerobic conditions. Gram staining showed Gram-negative bacilli (Figure 2). A motility test was positive. Cells grown on agar sporulate, and the rods have a length ranging from 2.5 to 3.97 µm (mean 3.2 µm) and a diameter ranging from 0.76 to 0.83 µm (mean 0.79 µm), as determined by negative staining transmission electron microscopy (Figure 3).
Strain G1T exhibited oxidase activity but not catalase activity. Using the API 50CH system (BioMérieux), a positive reaction was observed for D-mannose, amygdalin, L-arabinose, cellobiose, lactose, D-xylose, D-glucose, mannitol, arabinose, xylose, glycerol, D-galactose, N-acetylglucosamine, arbutin, aesculin, D-sorbitol, D-maltose, D-saccharose, D-trehalose, D-tagatose, L-rhamnose, salicin, adonitol, D-melibiose, D-raffinose, D-ribose, D-fructose and hydrolysis of starch. Negative reactions were observed for potassium gluconate, potassium 2-ketogluconate, inulin, D-melezitose, glycogen, β-gentiobiose, D-turanose, methyl-α-D-mannopyranoside and methyl-α-D-glucopyranoside. Using the API ZYM system, negative reactions were observed for lipase (C14), α-chymotrypsin, esterase (C4), esterase lipase (C8), naphthol-AS-BI-phosphohydrolase, phenylalanine arylamidase, leucine arylamidase, cystine arylamidase, valine arylamidase, glycine arylamidase, arginine arylamidase and β-glucosidase. Using the API Coryne system, positive reactions were observed for β-glucuronidase, alkaline phosphatase, α-glucosidase, α-galactosidase and N-acetyl-β-glucosaminidase activities. The urease reaction, nitrate reduction and indole production were negative. P. gorillae is susceptible to imipenem, rifampicin, gentamicin, nitrofurantoin and vancomycin, but resistant to metronidazole, trimethoprim/sulfamethoxazole, ceftriaxone, ciprofloxacin and amoxicillin. When compared to other Paenibacillus species [43][44][45][46] and Brevibacillus brevis [47], P. gorillae sp. nov. strain G1T exhibited the phenotypic differences detailed in Table 2.
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [14] using a Microflex spectrometer (Bruker Daltonics, Leipzig, Germany). Twelve distinct deposits were made for strain G1T from 12 isolated colonies. The 12 G1T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against 6,252 bacterial spectra, including 123 spectra from 67 Paenibacillus species, used as reference data in the BioTyper database. Scores were interpreted as follows: a score > 2 with a validly published species enabled identification at the species level; a score > 1.7 but < 2 enabled identification at the genus level; and a score < 1.7 did not enable any identification. For strain G1T, the obtained scores ranged from 1.177 to 1.343, suggesting that our isolate was not a member of a known species. We added the spectrum from strain G1T to our database (Figure 4). Spectrum differences with other Paenibacillus species are shown in Figure 5.
Figure 5.
Gel view comparing Paenibacillus gorillae G1T spectra with other members of the Paenibacillus genus (P. massiliensis, P. kobensis, and P. alvei) and with Brevibacillus brevis and "Gorillibacterium massiliense". The Gel View displays the raw spectra of all loaded spectrum files arranged in a pseudo-gel like look. The x-axis records the m/z value. The left y-axis displays the running spectrum number originating from subsequent spectra loading. The peak intensity is expressed by a gray scale scheme code. The color bar and the right y-axis indicate the relation between the color a peak is displayed with and the peak intensity in arbitrary units.
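The score interpretation used for the MALDI-TOF analysis can be sketched as a small decision function. This is only an illustration: the function name is hypothetical, and the handling of scores exactly equal to 2 or 1.7 is an assumption, since the text leaves these boundaries open.

```python
def interpret_biotyper_score(score: float) -> str:
    """Map a BioTyper score to the identification level described in the text."""
    if score > 2.0:
        return "species"
    if score > 1.7:
        # Assumption: a score of exactly 2.0 is counted at the genus level.
        return "genus"
    # Assumption: a score of exactly 1.7 is counted as no identification.
    return "none"


# Lowest and highest scores reported for the 12 spectra of strain G1T.
reported_range = [1.177, 1.343]
levels = [interpret_biotyper_score(s) for s in reported_range]
print(levels)
```

Both ends of the reported range fall below 1.7, which is why no identification was obtained against the reference database.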

Genome sequencing information
Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the genus Paenibacillus, and is part of a "culturomics" study of the gorilla flora which aims to isolate all bacterial species within gorilla feces. It is the 44th genome of a Paenibacillus species to be sequenced and the first genome of Paenibacillus gorillae sp. nov. The genome sequence is deposited in GenBank under accession number CBVJ000000000 and consists of 167 contigs (150 large contigs). Table 3 shows the project information and its association with MIGS version 2.0 compliance [48].

Genome sequencing and assembly
A shotgun library and a 3 kb paired-end library were pyrosequenced on the 454 Roche Titanium platform. This project was loaded on a 1/4 region for each application on PTP PicoTiterPlates. The shotgun library was constructed with 500 ng of DNA as described by the manufacturer (Roche) with the Rapid Library Preparation kit for XL+. The concentration of the shotgun library was measured with a TBS fluorometer and determined to be 2.89E+09 molecules/µL. The paired-end library was prepared with 5 µg of bacterial DNA using DNA fragmentation on a Covaris S-Series (S2) instrument (Woburn, Massachusetts, USA) with an enrichment size of 3.2 kb. The DNA fragmentation was visualized with an Agilent 2100 BioAnalyzer on a DNA labchip 7500. The library was constructed according to the 454 GS FLX Titanium paired-end protocol (Roche). Circularization and nebulization were performed and generated a pattern with an optimum at 591 bp. After PCR amplification through 17 cycles followed by double size selection, the single-stranded paired-end library was quantified using the Quant-it Ribogreen kit (Invitrogen) on a Genios Tecan fluorometer at 691 pg/µL. The library concentration equivalence was calculated as 1.07E+10 molecules/µL. The library was stored at -20°C until further use. The shotgun XL+ library was clonally amplified with 6 cpb in 2 emPCR reactions. The paired-end library was clonally amplified with 0.5 cpb in 3 emPCR reactions with the GS Titanium SV emPCR

Genome annotation
Open reading frames (ORFs) were predicted using Prodigal [49] with default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank database [50] and the Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAscan-SE tool [51] was used to find tRNA genes, whereas ribosomal RNAs were found by using RNAmmer [52] and BLASTN against the GenBank database. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05.
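The length-dependent E-value cutoffs can be written as a short filter. This is a minimal sketch under one possible reading of the criterion, namely that a BLASTP hit counts as significant when its E-value is below the cutoff and that an ORF with no significant hit is an ORFan. That reading, the function names, and the treatment of alignments of exactly 80 amino acids are assumptions, not the authors' pipeline.

```python
def is_significant_hit(evalue: float, alignment_length: int) -> bool:
    """Apply the cutoffs from the text: 1e-03 for alignments longer than
    80 amino acids, 1e-05 otherwise."""
    # Assumption: an alignment of exactly 80 aa uses the stricter cutoff.
    cutoff = 1e-3 if alignment_length > 80 else 1e-5
    return evalue < cutoff


def is_orfan(hits) -> bool:
    """hits: iterable of (evalue, alignment_length) pairs for one ORF.

    Assumption: an ORF is an ORFan when none of its hits is significant.
    """
    return not any(is_significant_hit(e, length) for e, length in hits)


print(is_orfan([(1e-4, 120)]))  # long alignment, E-value below 1e-03
print(is_orfan([(1e-4, 60)]))   # short alignment, E-value above 1e-05
```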
To estimate the mean level of nucleotide sequence similarity at the genome level between P. gorillae sp. nov. strain G1T and other Paenibacillaceae species, we used the Average Genomic Identity of orthologous gene Sequences (AGIOS) program. Briefly, this software uses Proteinortho [53] to detect orthologous proteins between genomes compared on a pairwise basis, then retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity among orthologous ORFs using the Needleman-Wunsch global alignment algorithm.
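The last step of this procedure, a global alignment of each orthologous gene pair followed by averaging of the percent identities, can be illustrated as follows. This is a sketch, not the AGIOS implementation: the scoring values (match +1, mismatch -1, gap -1), the use of alignment length as the identity denominator, and the function names are assumptions, and ortholog detection is replaced by a hand-written list of toy sequence pairs.

```python
from statistics import mean


def needleman_wunsch(a: str, b: str, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty; returns both aligned strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical columns divided by alignment length, in percent."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def mean_identity(ortholog_pairs) -> float:
    """Mean percent identity over (gene_a, gene_b) nucleotide sequence pairs."""
    return mean(percent_identity(a, b) for a, b in ortholog_pairs)


# Toy sequences standing in for orthologous genes (hypothetical data).
toy_pairs = [("ACGT", "ACGT"), ("ACGT", "AGT")]
print(mean_identity(toy_pairs))
```

The quadratic-time dynamic programming above is adequate for illustration; gene-scale comparisons across whole genomes would normally rely on an optimized aligner.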

Genome properties
The genome is 6,257,967 bp long (1 chromosome, but no plasmid) with a 48.80% G+C content (Figure 6 and Table 4). It is composed of 167 contigs (150 large contigs, 11 scaffolds). Of the 5,918 predicted genes, 5,856 were protein-coding genes and 62 were RNAs (one 16S rRNA gene, one 23S rRNA gene and 60 tRNA genes). A total of 4,296 genes (73.36%) were assigned a putative function (by COGs or by NR BLAST) and 304 genes were identified as ORFans (5.19%). The remaining genes were annotated as hypothetical proteins (917 genes, 15.66%). The distribution of genes into COGs functional categories is presented in Table 5. The properties and statistics of the genome are summarized in Tables 4 and 5.
Table 4 footnote. a The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
Table 5 footnote. a The total is based on the total number of protein-coding genes in the annotated genome.
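The gene counts and percentages given in the genome properties paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the helper name pct is hypothetical. The reported percentages are reproduced when the 5,856 protein-coding genes are taken as the denominator.

```python
protein_coding = 5856
rna_genes = 62          # 1 16S rRNA + 1 23S rRNA + 60 tRNA
total_genes = protein_coding + rna_genes


def pct(count: int, total: int) -> float:
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * count / total, 2)


with_function = pct(4296, protein_coding)
orfans = pct(304, protein_coding)
hypothetical = pct(917, protein_coding)

print(total_genes, with_function, orfans, hypothetical)
```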

Genomic comparison of P. gorillae and other members of the family Paenibacillaceae
Here, we compared the genome of P. gorillae strain G1T with those of "G. massiliense" strain G5T, P. elgii strain B69, P. alvei strain DSM 29, P. massiliensis strain DSM 16942 and B. brevis strain NBRC 100599 (Table 6). The draft genome of P. gorillae is larger than that of "G. massiliense" (6.25 vs 5.54 Mb) and smaller than those of P. elgii, P. alvei, P. massiliensis and B. brevis (6.25 vs 7.96, 6.83, 6.39 and 6.3 Mb, respectively). P. gorillae has a lower G+C content than "G. massiliense" and P. elgii (48.8% vs 50.39% and 52.6%, respectively) but a higher G+C content than P. alvei and B. brevis (48.8% vs 45.9% and 47.3%, respectively) and a slightly higher one than P. massiliensis (48.8% vs 48.5%). The protein content of P. gorillae is lower than that of P. elgii, P. alvei and B. brevis (5,856 vs 7,597, 6,823 and 5,946, respectively) but higher than that of "G. massiliense" and P. massiliensis (5,856 vs 5,146 and 5,496, respectively) (Table 6). In addition, P. gorillae shares 1,987, 2,380, 2,055, 2,121 and 1,935 orthologous genes with "G. massiliense", P. elgii, P. alvei, P. massiliensis and B. brevis, respectively (Table 7). The nucleotide sequence identity of orthologous genes ranges from 66.3 to 68.7% among previously published genomes, and from 65.7 to 68.6% between P. gorillae and the other studied genomes (Table 7), thus confirming its status as a new species. Table 7 summarizes the number of orthologous genes and the average percentage of nucleotide sequence identity between the different genomes studied.
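The comparisons in this paragraph can be tabulated and checked programmatically. The sketch below uses only the values quoted in the text. The dictionary layout and variable names are hypothetical, and the share of P. gorillae protein-coding genes with an ortholog in each genome is a derived illustration that is not reported in the paper.

```python
P_GORILLAE = {"size_mb": 6.25, "gc": 48.8, "proteins": 5856}

# Values quoted in the text for the compared genomes.
OTHERS = {
    "G. massiliense":  {"size_mb": 5.54, "gc": 50.39, "proteins": 5146, "orthologs": 1987},
    "P. elgii":        {"size_mb": 7.96, "gc": 52.6,  "proteins": 7597, "orthologs": 2380},
    "P. alvei":        {"size_mb": 6.83, "gc": 45.9,  "proteins": 6823, "orthologs": 2055},
    "P. massiliensis": {"size_mb": 6.39, "gc": 48.5,  "proteins": 5496, "orthologs": 2121},
    "B. brevis":       {"size_mb": 6.3,  "gc": 47.3,  "proteins": 5946, "orthologs": 1935},
}

# Genomes larger than, or with a higher G+C content than, P. gorillae.
larger = {n for n, g in OTHERS.items() if g["size_mb"] > P_GORILLAE["size_mb"]}
higher_gc = {n for n, g in OTHERS.items() if g["gc"] > P_GORILLAE["gc"]}
more_proteins = {n for n, g in OTHERS.items() if g["proteins"] > P_GORILLAE["proteins"]}

# Derived illustration: percentage of P. gorillae protein-coding genes
# that have an ortholog in each compared genome.
ortholog_share = {
    n: round(100 * g["orthologs"] / P_GORILLAE["proteins"], 1)
    for n, g in OTHERS.items()
}

for name in OTHERS:
    print(name, ortholog_share[name])
```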

Conclusion
On the basis of phenotypic (Table 2), phylogenetic and genomic analyses (taxonogenomics) (Table 7), we formally propose the creation of Paenibacillus gorillae sp. nov., which contains strain G1T. This strain was isolated from a gorilla stool sample collected in Cameroon.