Non-contiguous finished genome sequence and description of Bacteroides neonati sp. nov., a new species of anaerobic bacterium

Bacteroides neonati strain MS4T is the type strain of Bacteroides neonati sp. nov., a new species within the genus Bacteroides. This strain, whose genome is described here, was isolated from a stool sample of a premature neonate. B. neonati strain MS4T is an obligately anaerobic Gram-negative bacillus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5.03 Mbp genome exhibits a G+C content of 43.53% and contains 4,415 protein-coding and 91 RNA genes, including 9 rRNA genes.


Introduction
Bacteroides neonati strain MS4T (= CSUR P 1500 = DSM 26805) is the type strain of Bacteroides neonati sp. nov. and a new member of the genus Bacteroides. This bacterium is a Gram-negative, anaerobic, non-spore-forming, indole-positive bacillus that was isolated from a preterm neonate stool sample during a study of stool samples from patients with necrotizing enterocolitis and from controls [unpublished]. The "gold standard" for defining a new bacterial species or genus is DNA-DNA hybridization together with G+C content determination [1]. However, these methods are expensive and poorly reproducible. The development of PCR and sequencing methods led to new ways of classifying bacterial species, in particular using 16S rDNA sequences with an internationally validated cutoff value [2]. More recently, new bacterial genera and species have been described using high-throughput genome sequencing and mass spectrometric analyses, which give access to a wealth of genetic and proteomic information [3,4]. We propose the description of a new bacterial species using genome sequences, MALDI-TOF spectra and the main phenotypic characteristics, as previously done [5-22]. Here we present a summary classification and a set of features for B. neonati sp. nov. strain MS4T (= CSUR P 1500 = DSM 26805), together with a description of the complete genomic sequencing and annotation. These characteristics support the circumscription of a novel species, B. neonati sp. nov., within the genus Bacteroides.
The family Bacteroidaceae currently comprises three genera: Acetomicrobium, Anaerorhabdus and Bacteroides. It is a heterogeneous family that groups anaerobic and morphologically variable bacteria, and it is defined mainly on the basis of phylogenetic analyses of 16S rDNA sequences. The species most closely related to Bacteroides neonati sp. nov. is Bacteroides graminisolvens [23], followed by Bacteroides intestinalis [24].
Bacteroides neonati is a strictly anaerobic, Gram-negative, non-spore-forming bacterium.

Classification and features
A stool sample was collected from a patient during a case-control study analyzing the fecal microbiota of premature neonates with necrotizing enterocolitis, using MALDI-TOF and 16S rRNA gene sequencing [unpublished]. After collection in Marseille, the specimen was preserved at -80°C. Strain MS4T (Table 1) was isolated in October 2012 by anaerobic cultivation on 5% sheep blood-enriched Columbia agar (BioMérieux, Marcy l'Etoile, France). This strain exhibited 94% nucleotide sequence similarity with Bacteroides graminisolvens [23] and 94% nucleotide sequence similarity with Bacteroides intestinalis [24]. These similarity values are lower than the threshold recommended to delineate a new species without carrying out DNA-DNA hybridization [38]. In the inferred phylogenetic tree, strain MS4T forms a distinct lineage close to Bacteroides graminisolvens (Figure 1).
Table 1 footnote: [...] not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence. These evidence codes are from the Gene Ontology project [37]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgments.
Figure 1. Sequences were aligned using CLUSTALW, and phylogenetic inferences were obtained using the maximum-likelihood method within the MEGA 4 software [39]. Numbers at the nodes are bootstrap values obtained by repeating the analysis 500 times to generate a majority consensus tree. The scale bar represents a 2% nucleotide sequence divergence.
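The similarity comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes two sequences that have already been aligned (for example with CLUSTALW), ignores columns containing a gap, and takes the delineation threshold as a parameter because the text cites it only by reference [38]. The function names, the toy sequences and the 98.7% value used in the example call are all assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def below_species_threshold(identity_pct: float, threshold_pct: float) -> bool:
    """True when the similarity is lower than the chosen delineation threshold."""
    return identity_pct < threshold_pct


# Toy, hypothetical aligned fragments (not real 16S sequences).
toy_identity = percent_identity("ACGT-ACGTA", "ACGTTACGTC")

# 94% is the similarity reported in the text; 98.7 is an assumed threshold
# value used only for this example.
is_candidate_new_species = below_species_threshold(94.0, 98.7)
```

With the reported 94% similarity, any threshold above 94% gives the same outcome, so the conclusion does not depend on the exact value assumed here.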
Seven growth temperatures (23°C, 25°C, 28°C, 32°C, 35°C, 37°C and 50°C) were tested; growth occurred between 23°C and 37°C, with optimal growth at 37°C, and no growth occurred at 50°C. Colonies are punctiform, medium-sized, grey, shiny and round on blood-enriched Columbia agar under anaerobic conditions using GENbag anaer (BioMérieux). Bacteria were grown on blood-enriched Columbia agar (BioMérieux) and in Trypticase-soy (TS) broth medium under anaerobic conditions using GENbag anaer (BioMérieux). They were also grown under anaerobic conditions on BHI agar and on BHI agar supplemented with 1% NaCl. Growth was achieved only anaerobically, on blood-enriched Columbia agar and weakly on BHI agar and on BHI agar supplemented with 1% NaCl, after 72 h of incubation. Gram staining showed plump, non-spore-forming Gram-negative bacilli (Figure 2). The motility test was negative. Cells grown anaerobically in TS broth medium have a mean width of 0.681 µm (min = 0.323 µm; max = 0.878 µm) and a mean length of 2.165 µm (min = 1.402 µm; max = 2.951 µm), as determined by electron microscopy after negative staining (Figure 3).
Strain MS4T exhibited catalase activity but no oxidase activity. Using API 20A, a weakly positive reaction was observed only for gelatinase. Using API ZYM, a positive reaction was observed for alkaline phosphatase (40 nmol of hydrolyzed substrate), acid phosphatase (40 nmol), naphthol-AS-BI-phosphohydrolase (20 nmol), esterase (20 nmol), esterase lipase (5 nmol), alpha-galactosidase (5 nmol), beta-galactosidase (20 nmol), beta-glucuronidase (30 nmol), beta-glucosidase (5 nmol), N-acetyl-beta-glucosaminidase (40 nmol) and alpha-fucosidase (5 nmol). Using Rapid ID 32A, a positive reaction was observed for alpha-galactosidase, alpha-glucosidase, N-acetyl-beta-glucosaminidase and alpha-fucosidase. Regarding antibiotic susceptibility, Bacteroides neonati was susceptible to amoxicillin-clavulanate, imipenem and metronidazole.
When compared to the representative species within the genus Bacteroides, B. neonati exhibits the phenotypic characteristics detailed in Table 2.
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [41]. A pipette tip was used to pick one isolated bacterial colony from a culture agar plate and to spread it as a thin film on an MTP 384 MALDI-TOF target plate (Bruker Daltonics, Germany). Ten distinct deposits were made for strain MS4T from ten isolated colonies. Each smear was overlaid with 2 µL of matrix solution (saturated solution of alpha-cyano-4-hydroxycinnamic acid in 50% acetonitrile and 2.5% trifluoroacetic acid) and allowed to dry for five minutes. Measurements were performed with a Microflex spectrometer (Bruker). Spectra were recorded in the positive linear mode for the mass range of 2,000 to 20,000 Da (parameter settings: ion source 1 (IS1), 20 kV; IS2, 18.5 kV; lens, 7 kV). A spectrum was obtained after 675 shots at a variable laser power. The time of acquisition was between 30 seconds and 1 minute per spot. The ten MS4T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 6,335 bacteria in the BioTyper database. The identification method uses the m/z range from 3,000 to 15,000 Da. For every spectrum, at most 100 peaks were taken into account and compared with the spectra in the database. A score determined whether the tested isolate could be identified: a score > 2 with a validated species enabled identification at the species level; a score > 1.7 but < 2 enabled identification at the genus level; and a score < 1.7 did not enable any identification. For strain MS4T, the best score obtained was 1.345, which is not significant, suggesting that our isolate did not belong to any species represented in the database.
The reference spectrum from strain MS4T (Figure 4) was added to our database. A dendrogram was constructed with the MALDI BioTyper software (version 2.0, Bruker), comparing the reference spectrum of strain MS4T with the reference spectra of 26 bacterial species, all belonging to the order of Bacteroidetes. In this dendrogram, strain MS4T appears as a separate branch within the genus Bacteroides (Figure 5).
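The score interpretation rules stated above can be written as a small decision function. This is a sketch of the thresholds as given in the text, not of the BioTyper software itself. The text does not say how a score of exactly 2 or exactly 1.7 is handled, nor what happens when a score above 2 matches a species that is not validated; the choices made for those cases below are assumptions, as are the function and parameter names.

```python
def interpret_score(score: float, species_validated: bool = True) -> str:
    """Map a pattern-matching score to an identification level.

    Rules from the text: > 2 with a validated species gives species-level
    identification, > 1.7 but < 2 gives genus-level identification, and
    < 1.7 gives no identification.
    Assumptions: a score of exactly 2, or a score above 2 for a
    non-validated species, is treated as genus level; a score of exactly
    1.7 is treated as no identification.
    """
    if score > 2.0 and species_validated:
        return "species"
    if score > 1.7:
        return "genus"
    return "none"


# Best score reported for strain MS4T in the text.
ms4_result = interpret_score(1.345)
print(ms4_result)  # prints "none"
```

Under these rules the reported best score of 1.345 falls in the "no identification" range, which is consistent with the isolate having no close match in the database.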

Genome sequencing and annotation

Genome project history
The organism was selected for sequencing because it was isolated from a premature neonate stool sample as part of a study of stool samples from patients with necrotizing enterocolitis. The GenBank accession numbers are HG726019-HG726036; the assembly consists of 18 scaffolds comprising a total of 35 contigs. Table 3 shows the project information and its compliance with MIGS version 2.0 standards.

Growth conditions and DNA isolation
Bacteroides neonati strain MS4T (= CSUR P 1500 = DSM 26805) was grown on blood agar medium at 37°C under anaerobic conditions. Eight Petri dishes were inoculated, and the growth was resuspended in 5 × 100 µL of G2 buffer. A first mechanical lysis was performed using glass powder in the FastPrep-24 Sample Preparation system (MP Biomedicals, USA) with 2 × 20 second bursts. The lysate was then incubated with lysozyme (30 minutes at 37°C), and DNA was extracted on a BioRobot EZ1 Advanced XL (Qiagen). The DNA was then concentrated and purified with a QIAamp kit (Qiagen). The DNA concentration, measured with the Quant-iT PicoGreen kit (Invitrogen) on the Genios Tecan fluorometer, was 15.7 ng/µL.

Genome sequencing and assembly
A 3 kb paired-end library was pyrosequenced on the 454 Roche Titanium platform. This project was loaded on a 1/4 region of a PicoTiterPlate (PTP). A total of 5 µg of DNA was mechanically fragmented with a Hydroshear device (Digilab, Holliston, MA, USA) with an enrichment size of 3-4 kb. The DNA fragmentation was visualized with an Agilent 2100 BioAnalyzer on a DNA LabChip 7500, showing an average size of 3.2 kb. The library was constructed according to the 454 Titanium paired-end protocol supplied by the manufacturer. Circularization and nebulization were performed and generated a pattern with an optimum at 604 bp. After PCR amplification through 15 cycles followed by double size selection, the single-stranded paired-end library was quantified with the Agilent 2100 BioAnalyzer on an RNA Pico 6000 LabChip at 91 pg/µL. The library concentration equivalence was calculated at 2.76 × 10^8 molecules/µL. The library was stored at -20°C until used.
The library was clonally amplified with 0.5 and 1 cpb in 2 emPCR reactions per condition with the GS Titanium SV emPCR Kit (Lib-L) v2. The yields of the emPCR were 10.46% and 11.53%, respectively, within the 5 to 20% range expected from the Roche procedure. A total of 790,000 beads were loaded on a GS Titanium PicoTiterPlate (PTP Kit 70×75) and sequenced with the GS Titanium Sequencing Kit XLR70. The 454 sequencing generated 811,269 reads (180 Mb, coverage of 27.0), which were assembled into contigs and scaffolds using Newbler version 2.8 (Roche, 454 Life Sciences) and the Mira assembler v3.2 [42]. The contigs obtained were combined using the Opera software v1.2 [43] in tandem with GapFiller v1.10 [44] to reduce the set. Finally, some manual refinements were made using CLC Genomics software v4.7.2 (CLC bio, Aarhus, Denmark). The genome consists of 35 contigs in 18 scaffolds.
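The conversion from a mass concentration to a molecule count reported above can be checked with a short calculation. The sketch below is an illustration under stated assumptions, not the manufacturer's documented procedure: it assumes an average mass of 328.3 g/mol per nucleotide of single-stranded DNA and takes the 604 bp optimum as the mean library fragment length. Neither value is stated in the text as an input to the calculation, and the function and constant names are hypothetical.

```python
AVOGADRO = 6.022e23          # molecules per mole
NT_MASS_G_PER_MOL = 328.3    # ASSUMED average mass of one ssDNA nucleotide


def library_molecules_per_ul(conc_pg_per_ul: float, mean_length_nt: float) -> float:
    """Convert a single-stranded library concentration to molecules per µL."""
    grams_per_ul = conc_pg_per_ul * 1e-12
    grams_per_mole = mean_length_nt * NT_MASS_G_PER_MOL
    return grams_per_ul / grams_per_mole * AVOGADRO


# 91 pg/µL is the measured concentration from the text; 604 nt is the
# fragment-size optimum, ASSUMED here to be the mean library length.
molecules = library_molecules_per_ul(91.0, 604.0)
print(f"{molecules:.2e} molecules/µL")  # prints "2.76e+08 molecules/µL"
```

Under these assumptions the result agrees with the 2.76 × 10^8 molecules/µL given in the text, which suggests the reported figure was derived from the 91 pg/µL and 604 bp values in this way.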

Genome properties
The genome of B. neonati strain MS4T is estimated to be 5.03 Mb long with a G+C content of 43.53% (Figure 6 and Table 4). A total of 4,415 protein-coding and 91 RNA genes, including 9 rRNA genes, 65 tRNA, 1 tmRNA and 39 miscellaneous other RNA, were found. The majority of the protein-coding genes were assigned a putative function (69.26%), while the remaining ones were annotated as hypothetical proteins. The properties and the statistics of the genome are summarized in Table 4. The distribution of genes into COG functional categories is presented in Table 5.
Table 4 footnote: a The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.

Insights into the genome sequence
We briefly compared the genome with that of Bacteroides intestinalis DSM 17393 (ABJL00000000), which is currently the closest available sequenced genome. That genome is composed of 8 contigs (ABJL02000001-ABJL02000008).
The draft genome sequence of Bacteroides neonati is smaller than that of Bacteroides intestinalis (5.03 Mb versus 6.05 Mb, respectively). Its G+C content is very close to that of Bacteroides intestinalis (43.53% versus 42.8%, respectively). Bacteroides neonati has slightly fewer genes (4,506 genes versus 4,984 genes) and a higher ratio of genes per Mb (895.82 genes/Mb versus 823.8 genes/Mb). The total is based on the total number of protein-coding genes in the annotated genome. Table 6 presents the difference in gene number (in percentage) for each COG category between Bacteroides neonati and Bacteroides intestinalis. The distribution of genes across COG categories is highly similar between the two species. The largest difference concerns the COG category "Carbohydrate transport and metabolism" and does not exceed 2.28%.
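The gene-density comparison above is simple arithmetic and can be reproduced directly from the figures in the text. The sketch below assumes that the 4,506 total for B. neonati is the sum of the 4,415 protein-coding and 91 RNA genes reported earlier; the function name is hypothetical.

```python
def genes_per_mb(total_genes: int, genome_size_mb: float) -> float:
    """Gene density expressed as genes per megabase."""
    return total_genes / genome_size_mb


# Figures taken from the text.
neonati_genes = 4415 + 91        # protein-coding + RNA genes = 4,506
neonati_density = genes_per_mb(neonati_genes, 5.03)
intestinalis_density = genes_per_mb(4984, 6.05)

print(f"B. neonati:      {neonati_density:.1f} genes/Mb")
print(f"B. intestinalis: {intestinalis_density:.1f} genes/Mb")
```

Both values match the densities quoted in the text to within rounding, so the smaller genome does carry more genes per megabase despite having fewer genes overall.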

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Bacteroides neonati sp. nov., which contains the strain MS4T. This bacterium was isolated in Marseille, France.

Description of Bacteroides neonati sp. nov.
Bacteroides neonati (neo.na'ti L. gen. masc. n. neonati, because this new species was first isolated from a preterm neonate stool sample) is a Gram-negative bacillus; obligately anaerobic; non-spore-forming; grows on axenic medium at 37°C in an anaerobic atmosphere; negative for indole; non-motile. The G+C content of the genome is 43.53%. The type strain is MS4T (= CSUR P 1500 = DSM 26805).