Genome sequence and description of Bacteroides timonensis sp. nov.

Bacteroides timonensis strain AP1T (= CSUR P194 = DSM 26083) is the type strain of B. timonensis sp. nov. This strain, whose genome is described here, was isolated from the fecal flora of a 21-year-old French Caucasoid female who suffered from severe anorexia nervosa. Bacteroides timonensis is a Gram-negative, obligate anaerobic bacillus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 7,130,768 bp long genome (1 chromosome, no plasmid) exhibits a G+C content of 43.3% and contains 5,786 protein-coding and 59 RNA genes, including 2 rRNA genes.


Introduction
Bacteroides timonensis strain AP1T (= CSUR P194 = DSM 26083) is the type strain of B. timonensis sp. nov. This bacterium was isolated from the stool sample of a 21-year-old French Caucasoid female as part of an effort to cultivate individually all bacterial species within human feces [1]. It is a Gram-negative, anaerobic, indole-positive, rod-shaped bacterium. The conventional genetic parameters used in the delineation of bacterial species include 16S rRNA sequence identity and phylogeny [2,3], genomic G+C content diversity and DNA-DNA hybridization (DDH) [4,5]. These tools have limitations, notably because their cutoff values vary across species or genera [6]. With the introduction of high-throughput sequencing techniques [7], a wealth of genomic data was made available for many bacterial species. We recently proposed to include genomic data in a polyphasic approach to describe new bacterial taxa (taxono-genomics) [8]. This strategy combines phenotypic characteristics, notably the MALDI-TOF MS spectrum, and genomic analysis. Here, we present a summary classification and a set of features for B. timonensis sp. nov. strain AP1T (= CSUR P194 = DSM 26083) together with the description of the complete genome sequencing and annotation. These characteristics support the circumscription of the species B. timonensis. The genus Bacteroides (Castellani and Chalmers 1919) was created in 1919 [38]. Currently, it is one of the largest genera among the human gut microbiota [39], and consists of 91 species and 5 subspecies with validly published names [40]. Bacteroides species are Gram-negative, non-spore-forming, non-motile and anaerobic rods that are generally isolated from the gastrointestinal tract of mammals [41]. They have symbiotic relationships with humans and play many beneficial roles in normal intestinal physiology and function. Several Bacteroides species are identified as opportunistic pathogens when isolated from anaerobic infections [42].

Classification and features
A stool sample was collected from a 21-year-old French Caucasoid female who had suffered from severe restrictive anorexia nervosa from the age of 12 years. At the time of sample collection, she had been hospitalized for recent aggravation of her medical condition (BMI: 10.4 kg/m2). The patient's written consent and the agreement of the local ethics committee of the IFR48 (Marseille, France) were obtained under agreement number 09-022. The feces sample of this patient was stored at -80°C immediately after collection. Strain AP1T (Table 1) was isolated in November 2011 after 1 month of incubation on Columbia agar (BioMerieux, Marcy l'Etoile, France). Several other new bacterial species were isolated from this stool specimen using various culture conditions. When compared to sequences available in GenBank, the 16S rRNA gene sequence of B. timonensis strain AP1T (GenBank accession number JX041639) exhibited an identity of 97.00% with Bacteroides cellulosilyticus (Figure 1). This value was the highest similarity observed, but was lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers (2006) to delineate a new species without carrying out DNA-DNA hybridization [3], and was in the 74. 8 [...]
[...] not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [55]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
Four different growth temperatures (25, 30, 37, 45°C) were tested; growth occurred between 25 and 37°C, but optimal growth was observed at 37°C, 24 hours after inoculation. No growth occurred at 45°C. Colonies were translucent and approximately 0.3 mm in diameter on 5% sheep blood-enriched Columbia agar (BioMerieux).
Growth of the strain was tested on the same agar under anaerobic and microaerophilic conditions using GENbag anaer and GENbag microaer systems, respectively (BioMerieux), and under aerobic conditions, with or without 5% CO2. Growth was observed under anaerobic and microaerophilic conditions, and only weakly with 5% CO2. No growth occurred under aerobic conditions without CO2. Gram staining showed short Gram-negative rods unable to form spores (Figure 1). A motility test was negative. Cells grown on agar are translucent and exhibit a mean diameter of 0.88 µm in electron microscopy (Figure 2, Figure 3). Strain AP1T exhibited catalase but no oxidase activity (Table 2). Using an API Rapid ID 32A strip (BioMerieux), positive reactions were obtained for arginine dihydrolase, α-galactosidase, β-galactosidase, α-glucosidase, β-glucosidase, α-arabinosidase, N-acetyl-β-glucosaminidase, glutamic acid decarboxylase, α-fucosidase, nitrate reduction, indole production, alkaline phosphatase, proline arylamidase, leucyl glycine arylamidase, alanine arylamidase, glutamyl glutamic acid arylamidase, and fermentation of mannose and raffinose. Weak activities were observed for glycine arylamidase and serine arylamidase. Negative reactions were obtained for urease, β-galactosidase-6-phosphate, β-glucuronidase, arginine arylamidase, phenylalanine arylamidase, leucine arylamidase, pyroglutamic acid arylamidase, tyrosine arylamidase and histidine arylamidase. Using an API 50CH strip (BioMerieux), strain AP1T was asaccharolytic. B. timonensis is susceptible to amoxicillin-clavulanate, ceftriaxone, imipenem, trimethoprim-sulfamethoxazole, metronidazole and doxycycline but resistant to amoxicillin, vancomycin and gentamicin. By comparison with other Bacteroides species, B. timonensis differed in production of indole, nitrate reductase, β-galactosidase and acidification of sugars.
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [68]. Briefly, a pipette tip was used to pick one isolated bacterial colony from a culture agar plate and spread it as a thin film on an MTP 384 MALDI-TOF target plate (Bruker Daltonics, Leipzig, Germany). Twelve distinct deposits from twelve isolated colonies were made for strain AP1T. Each smear was overlaid with 2 µL of matrix solution (saturated solution of alpha-cyano-4-hydroxycinnamic acid) in 50% acetonitrile, 2.5% trifluoroacetic acid, and allowed to dry for 5 minutes. Measurements were performed with a Microflex spectrometer (Bruker). Spectra were recorded in the positive linear mode for the mass range of 2,000 to 20,000 Da (parameter settings: ion source 1 (IS1), 20 kV; IS2, 18.5 kV; lens, 7 kV). A spectrum was obtained after 675 shots with variable laser power. The time of acquisition was between 30 seconds and 1 minute per spot. The twelve AP1T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 3,769 bacteria, including 129 spectra from 98 Bacteroides species. The method of identification included the m/z from 3,000 to 15,000 Da. For every spectrum, a maximum of 100 peaks were compared with spectra in the database. The resulting score determined whether the tested species could be identified: a score ≥ 2 with a validly published species enabled identification at the species level, a score ≥ 1.7 but < 2 enabled identification at the genus level, and a score < 1.7 did not enable any identification. No significant MALDI-TOF score was obtained for strain AP1T against the Bruker database, suggesting that our isolate was not a member of a known species. We added the spectrum from strain AP1T to our database (Figure 4).
Finally, the gel view showed the spectral differences with other members of the genus Bacteroides (Figure 5).
Figure 5. Gel view comparing strain AP1T with other members of the genus Bacteroides. The gel view displays the raw spectra of loaded spectrum files as a pseudo-electrophoretic gel. The x-axis records the m/z value. The left y-axis displays the running spectrum number originating from subsequent spectra loading. The peak intensity is expressed by a grey scale scheme code. The grey scale bar on the right y-axis indicates the relation between the shade of grey a peak is displayed with and the peak intensity in arbitrary units. Displayed species are detailed in the left column.
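The score interpretation used for the BioTyper pattern matching can be written as a small decision rule. The sketch below is illustrative only: the function name and the example scores are hypothetical, and it assumes that the cutoffs of 2 and 1.7 are inclusive lower bounds.

```python
def interpret_biotyper_score(score: float) -> str:
    """Map a MALDI BioTyper score to an identification level.

    Assumption: the cutoffs 2.0 and 1.7 are inclusive lower bounds.
    """
    if score >= 2.0:
        return "species"   # identification at the species level
    if score >= 1.7:
        return "genus"     # identification at the genus level only
    return "none"          # no reliable identification


# Hypothetical scores for a few spots (not data from this study).
example_scores = [1.32, 1.85, 2.10]
levels = [interpret_biotyper_score(s) for s in example_scores]
print(levels)
```

Under this rule, a strain whose spots all score below 1.7, as reported for strain AP1T, would return "none" for every deposit.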

Genome sequencing information

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA gene sequence similarity to members of the genus Bacteroides, and is part of a study of the human digestive flora aiming at isolating all bacterial species within human feces [1]. It was the ninety-ninth genome of a Bacteroides species and the first genome of B. timonensis sp. nov. The genome sequence, deposited in GenBank under accession number CBVI000000000, consists of 211 contigs. Table 3 shows the project information and its association with MIGS version 2.0 compliance [43].

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [69] with default parameters. However, the predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank [70] and Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAs and rRNAs were predicted using the tRNAScan-SE [71] and RNAmmer [72] tools, respectively. Signal peptides and numbers of transmembrane helices were predicted using SignalP [73] and TMHMM [74], respectively. Mobile genetic elements were predicted using PHAST [75] and RAST [76]. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment length greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have already been used in previous works to define ORFans. Artemis [77] and DNA Plotter [78] were used for data management and visualization of genomic features, respectively. The Mauve alignment tool (version 2.3.1) was used for multiple genomic sequence alignment [79].
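The length-dependent E-value cutoffs quoted above can be expressed as a simple predicate. This is a minimal sketch, not the pipeline used for the annotation: the function name is hypothetical, the input is assumed to be the E-value and alignment length of a single BLASTP hit, and an alignment of exactly 80 amino acids (which the text does not cover) is assigned here to the stricter 1e-05 cutoff. The function only reports whether a hit meets the stated cutoff; how that outcome feeds into the final ORFan call is left to the surrounding pipeline.

```python
def hit_meets_evalue_cutoff(evalue: float, alignment_length: int) -> bool:
    """Apply the length-dependent BLASTP E-value cutoffs described in the text.

    Assumption: alignments of exactly 80 amino acids use the stricter cutoff.
    """
    if alignment_length > 80:
        return evalue < 1e-3   # longer alignments: E-value below 1e-03
    return evalue < 1e-5       # shorter alignments: E-value below 1e-05


# Hypothetical hits as (E-value, alignment length in amino acids).
hits = [(1e-4, 120), (1e-4, 60), (1e-8, 60)]
print([hit_meets_evalue_cutoff(e, n) for e, n in hits])
```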
To estimate the mean level of nucleotide sequence similarity at the genome level between B. timonensis and 9 other members of the genus Bacteroides (Table 6), we used the Average Genomic Identity Of gene Sequences (AGIOS) in-house software [8]. Briefly, this software uses the Proteinortho software [80] for the pairwise detection of orthologous proteins between genomes, then retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity among orthologous ORFs using the Needleman-Wunsch global alignment algorithm. B. timonensis strain AP1T was com-
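The final step of the AGIOS calculation, averaging the percent identity of globally aligned orthologous genes, can be sketched as follows. This is not the in-house software: the ortholog detection step is replaced by a hypothetical list of gene pairs supplied by the caller, and the scoring scheme (match +1, mismatch -1, linear gap -1) and the definition of identity as matches divided by alignment columns are assumptions made for illustration.

```python
def needleman_wunsch_identity(a: str, b: str,
                              match: int = 1, mismatch: int = -1,
                              gap: int = -1) -> float:
    """Percent identity of one optimal global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1          # gap in b
        else:
            j -= 1          # gap in a
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


def mean_identity(ortholog_pairs):
    """Mean percent identity over (gene_a, gene_b) nucleotide pairs."""
    values = [needleman_wunsch_identity(x, y) for x, y in ortholog_pairs]
    return sum(values) / len(values)


# Hypothetical toy sequences standing in for orthologous genes.
pairs = [("ACGT", "ACGT"), ("ACGT", "ACGA")]
print(mean_identity(pairs))
```

The quadratic dynamic programming table is adequate for a toy example; aligning thousands of full-length genes would call for an optimized aligner.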

Genome properties
The genome is 7,130,768 bp long (1 chromosome, but no plasmid) with a 43.3% G+C content (Figure 6 and Table 4). Of the 5,845 predicted genes, 5,786 were protein-coding genes and 59 were RNAs, including 1 complete rRNA operon. A total of 3,111 genes (53.22%) were assigned a putative function and 3,283 genes were identified as ORFans (56.16%). Strain AP1T possesses a variety of mobile genetic elements. These include 6 prophages (of 13.70, 14.60, 10.51, 8.18, 9.91 and 12.79 kb, respectively) and 91 transposable elements belonging to 18 transposon families that include the putative mobilization protein BF0133, the putative conjugative transposon mobilization protein BF0132, the hypothetical protein clustered with conjugative transposons BF0131, TraA-CTn, TraB-CTn, TraD-CTn, TraE-CTn, TraF-CTn, TraG-CTn, TraH-CTn, TraI-CTn, TraJ-CTn, TraK-CTn, TraL-CTn, TraM-CTn, TraN-CTn, TraO-CTn and TraQ-CTn. The properties and statistics of the genome are summarized in Tables 4 and 5. The distribution of genes into COGs functional categories is presented in Table 5.
The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
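The summary figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this section; the variable names are illustrative, and it assumes that the reported percentage of genes with a putative function is expressed relative to all 5,845 predicted genes, which is the denominator that reproduces the 53.22% value.

```python
# Gene counts reported for strain AP1T.
protein_coding = 5786
rna_genes = 59
with_function = 3111

total_genes = protein_coding + rna_genes

# Assumption: the percentage is relative to all predicted genes.
pct_with_function = 100.0 * with_function / total_genes

# Prophage sizes in kb, as listed in the text.
prophage_kb = [13.70, 14.60, 10.51, 8.18, 9.91, 12.79]
total_prophage_kb = sum(prophage_kb)

print(total_genes)
print(round(pct_with_function, 2))
print(round(total_prophage_kb, 2))
```

The six prophage regions therefore sum to roughly 70 kb, a small fraction of the 7,130,768 bp genome.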

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses (taxono-genomics), we formally propose the creation of Bacteroides timonensis sp. nov. that contains strain AP1T. This strain was isolated from the fecal flora of a 21-year-old woman who suffered from severe anorexia nervosa.

Description of B. timonensis sp. nov.
Bacteroides timonensis (tim.o.nen'sis. L. masc. adj. timonensis, of Timone, the name of the hospital where strain AP1T was first cultivated). Colonies are translucent and 0.3 mm in diameter on blood-enriched Columbia agar. Cells are rod-shaped with a mean diameter of 0.88 µm. Optimal growth is achieved anaerobically, although the strain is able to grow under microaerophilic conditions, and weakly with 5% CO2. Growth occurs between 25°C and 37°C, with optimal growth at 37°C. Cells stain Gram-negative and are not motile. Positive reactions are obtained for catalase, arginine dihydrolase, α-galactosidase, β-galactosidase, α-glucosidase, β-glucosidase, α-arabinosidase, N-acetyl-β-glucosaminidase, glutamic acid decarboxylase, α-fucosidase, nitrate reduction, indole production, alkaline phosphatase, proline arylamidase, leucyl glycine arylamidase, alanine arylamidase, glutamyl glutamic acid arylamidase, and fermentation of mannose and raffinose. Weak activities are observed for glycine arylamidase and serine arylamidase. Negative reactions are obtained for urease, β-galactosidase-6-phosphate, β-glucuronidase, arginine arylamidase, phenylalanine arylamidase, leucine arylamidase, pyroglutamic acid arylamidase, tyrosine arylamidase and histidine arylamidase. Using an API 50CH strip (BioMerieux), strain AP1T is asaccharolytic. Cells are susceptible to amoxicillin-clavulanate, ceftriaxone, imipenem, trimethoprim-sulfamethoxazole, metronidazole and doxycycline but resistant to amoxicillin, vancomycin and gentamicin. The 16S rRNA and genome sequences are deposited in GenBank under accession numbers JX041639 and CBVI000000000, respectively. The G+C content of the genome is 43.3%. The habitat of the organism is the digestive tract. The type strain AP1T (= CSUR P194 = DSM 26083) was isolated from the fecal flora of a French Caucasoid female who suffered from a severe restrictive form of anorexia nervosa. This strain has been found in Marseille, France.