Non-contiguous finished genome sequence and description of Alistipes ihumii sp. nov.

Alistipes ihumii strain AP11T sp. nov. is the type strain of A. ihumii sp. nov., a new species within the genus Alistipes. This strain, whose genome is described here, was isolated from the fecal flora of a 21-year-old French Caucasian female, suffering from a severe restrictive form of anorexia nervosa since the age of 12 years. A. ihumii is a Gram-negative anaerobic bacillus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,753,264 bp long genome (one chromosome but no plasmid) contains 2,254 protein-coding and 47 RNA genes, including 3 rRNA genes.


Introduction
Alistipes ihumii strain AP11T (= CSUR P204 = DSM 26107) is the type strain of A. ihumii sp. nov. This bacterium is a Gram-negative, non-spore-forming, anaerobic and non-motile bacillus that was isolated from the stool of a 21-year-old French female suffering from anorexia nervosa, as part of a "culturomics" study aiming to cultivate individually all species within human feces [1][2][3]. Prokaryotic taxonomy is periodically challenged by methodological and conceptual innovations. The current classification methodology for prokaryotes, known as polyphasic taxonomy, relies on a combination of phenotypic and genotypic characteristics [4]. The number of completely sequenced genomes is growing geometrically as the cost of sequencing decreases. To date, more than 6,000 bacterial genomes have been published and approximately 25,000 genome sequencing projects have been announced [5]. We recently proposed integrating genomic information into the taxonomic framework for the description of new bacterial species. The genus Alistipes (Rautio et al. 2003) was created in 2003 [28] and is composed of strictly anaerobic Gram-negative rods that resemble the Bacteroides fragilis group in that most species are bile-resistant and indole-positive [29]. The genus currently comprises five species with validly published names, A. finegoldii, A. putredinis [28], A. indistinctus [30], A. onderdonkii and A. shahii [31], to which we added three proposed new species, A. senegalensis [8], A. timonensis [9] and A. obesi [22]. Here we present a summary classification and a set of features for a new Alistipes species, A. ihumii sp. nov. strain AP11T (= CSUR P204 = DSM 26107), together with the description of the complete genomic sequence and its annotation.

Classification and features
A stool sample was collected from a 21-year-old French Caucasian female who had suffered from a severe restrictive form of anorexia nervosa since the age of 12 years. At the time of sample collection, she was hospitalized in our hospital for a recent aggravation of her medical condition (BMI: 10.4 kg/m²). The patient gave informed and signed consent. Both this study and the assent procedure were approved by the Ethics Committee of the Institut Fédératif de Recherche IFR48, Faculty of Medicine, Marseille, France under reference 09-022. Ten other potentially new bacterial species were isolated from this patient's stool, all of which are currently being described. Microbial culturomics also enabled the isolation of several other new bacterial species from other stool specimens. The fecal specimen was stored at -80°C immediately after collection. Strain AP11T was isolated in November 2011 after 2 days of inoculation in an anaerobic blood culture bottle supplemented with 5 mL of thioglycolate, followed by inoculation on Columbia agar (BioMérieux, Marcy l'Etoile, France). This strain exhibited 95% 16S rRNA sequence similarity with A. indistinctus [30], the phylogenetically closest Alistipes species with a validly published name (Table 1, Figure 1), and 92% with A. onderdonkii [31] and A. putredinis [28]. These values fall within the 90 to 95% range of 16S rRNA sequence identities observed among species of the genus Alistipes, and are lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [41].
Table footnote (evidence codes): [...] not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [40]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
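The comparison of the 16S rRNA similarities with the 98.7% threshold can be written as a short check. This is a minimal sketch: the function name and data layout are illustrative, and only the similarity values and the threshold are taken from the text.

```python
# Threshold recommended by Stackebrandt and Ebers, as cited in the text.
SPECIES_THRESHOLD = 98.7  # percent 16S rRNA gene sequence similarity

# Similarities of strain AP11T to named Alistipes species, as reported.
similarities = {
    "A. indistinctus": 95.0,
    "A. onderdonkii": 92.0,
    "A. putredinis": 92.0,
}

def below_species_threshold(sims, threshold=SPECIES_THRESHOLD):
    """Return True if every similarity is below the species threshold."""
    return all(value < threshold for value in sims.values())

# Closest named relative by 16S rRNA similarity.
closest = max(similarities, key=similarities.get)
print(closest, similarities[closest], below_species_threshold(similarities))
```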
Different growth temperatures (25, 30, 37, 45°C) were tested. Growth was observed between 25 and 45°C, with optimal growth at 37°C after 24 hours of inoculation. Colonies were about 0.2 mm in diameter, transparent, and exhibited β-hemolytic activity on blood-enriched Columbia agar. Growth of the strain was tested on 5% sheep blood agar, under anaerobic and microaerophilic conditions using the GENbag anaer and GENbag microaer systems, respectively (BioMérieux), and under aerobic conditions with or without 5% CO2. Optimal growth of this strain was obtained anaerobically, weak growth was observed under microaerophilic conditions, and no growth was observed under aerobic atmosphere. The motility test was negative. Cells grown on agar are Gram-negative rods (Figure 2) and have a mean diameter and length of 0.72 and 1.69 µm, respectively, as determined using electron microscopy (Figure 3). Strain AP11T exhibited oxidase but no catalase activity. Using API 50CH (BioMérieux), we observed that strain AP11T was asaccharolytic. Using API 32A (BioMérieux), positive reactions were obtained for α-glucosidase, β-glucosidase, N-acetyl-β-glucosaminidase, mannose and raffinose fermentation, alkaline phosphatase, leucyl glycine arylamidase, alanine arylamidase, and glutamyl glutamic acid arylamidase. Weak reactions were observed for α-galactosidase and glutamic acid decarboxylase. Negative reactions were obtained for urease, arginine dihydrolase, β-galactosidase, 6-phospho-β-galactosidase, α-arabinosidase, β-glucuronidase, α-fucosidase, nitrate reduction, indole production, arginine arylamidase, proline arylamidase, phenylalanine arylamidase, leucine arylamidase, pyroglutamic acid arylamidase, tyrosine arylamidase, glycine arylamidase, histidine arylamidase, and serine arylamidase. A. ihumii is susceptible to amoxicillin, imipenem, and clindamycin, but resistant to vancomycin.
When compared with representative species from the genus Alistipes, strain AP11T exhibited the phenotypic differences detailed in Table 2.

Standards in Genomic Sciences

[...] A. senegalensis, A. obesi and A. timonensis, used as reference data in the BioTyper database. The output score enabled the presumptive identification and discrimination of the tested species from those in the database: a score > 2 with a validated species identifies a strain at the species level, and a score < 1.7 indicates that no species-level match was made. For strain AP11T, no significant score was obtained, suggesting that our isolate was not a member of any known species (Figures 4 and 5).
We added the spectrum from strain AP11T to our database.
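The score interpretation described for the BioTyper comparison can be sketched as a small function. This is an illustration only: the function name and the example scores are hypothetical, and the text does not say how scores between 1.7 and 2 are read, so that band is reported as unspecified here.

```python
def interpret_biotyper_score(score):
    """Map a BioTyper score to the interpretation given in the text.

    Scores from 1.7 to 2 are not characterized in the text, so this
    sketch labels them as unspecified (an assumption, not a rule).
    """
    if score > 2.0:
        return "species-level identification"
    if score < 1.7:
        return "no species-level match"
    return "unspecified in the text"

# Hypothetical scores, for illustration only.
for s in (2.3, 1.9, 1.2):
    print(s, "->", interpret_biotyper_score(s))
```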

Genome sequencing information

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the genus Alistipes, and is part of a "culturomics" study of the human digestive flora aiming at isolating all bacterial species within human feces. It was the eighth sequenced genome from an Alistipes species and the first from Alistipes ihumii sp. nov. The genome is deposited in GenBank under accession number CAPH00000000 and consists of 60 contigs. Table 3 summarizes the project information and its association with MIGS version 2.0 compliance [43].
Gel view figure legend: The Gel View displays the raw spectra of all loaded spectrum files arranged in a pseudo-gel-like look, with each peak displayed as a band or bar. The peak intensity is reflected by the intensity of the gray color. The right y-axis shows the relationship between the shades of gray and the peak intensity in arbitrary units. The x-axis records the m/z value. The left y-axis displays the running spectrum number originating from subsequent spectra loading.

Genome sequencing and assembly
A 3-kb paired-end sequencing strategy (Roche, Meylan, France) was used. DNA (5 µg) was mechanically fragmented for the paired-end sequencing, using a Covaris device (Covaris Inc., Woburn, MA, USA) with an enrichment size of 3-4 kb. The DNA fragmentation was visualized through an Agilent 2100 BioAnalyzer on a DNA Labchip 7500, which yielded an optimal size of 2.3 kb. The library was constructed using the 454 GS FLX Titanium paired-end rapid library protocol. Circularization and nebulization were performed and generated a pattern with an optimal size of 457 bp. PCR amplification was performed for 17 cycles followed by double size selection. The single-stranded paired-end library was quantified using a Quant-it Ribogreen Kit (Invitrogen) and the Genios Tecan fluorometer. The library concentration equivalence was calculated as 1.94 × 10^10 molecules/µL. The library was stored at -20°C until further use. The paired-end library was clonally amplified with 0.5 and 1 cpb in 2 emPCR reactions with the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The yields of the two paired-end emPCR reactions were 6.24% and 16.24%, respectively, within the quality range of 5 to 20% expected from the Roche procedure. Two libraries were loaded on the GS Titanium PicoTiterPlates (PTP Kit 70x75, Roche) and pyrosequenced with the GS Titanium Sequencing Kit XLR70 and the GS FLX Titanium sequencer (Roche). The run was performed overnight and then analyzed on the cluster through the gsRunBrowser and Newbler assembler (Roche). A total of 260,838 passed filter wells were obtained and generated 96.3 Mb with an average length of 369 bp. The passed filter sequences were assembled using Newbler with 90% identity and 40 bp as overlap. The final assembly identified 9 scaffolds and 60 contigs (> 1,500 bp) and generated a genome size of 2.75 Mb, which corresponds to a coverage of 35× genome equivalent.
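The reported 35× coverage can be checked with simple arithmetic from the read count, the average read length and the genome size. The sketch below uses only figures given in the paper; the variable names are illustrative. Because the average read length is itself rounded, the product of reads and length comes out slightly below the reported 96.3 Mb.

```python
reads = 260_838          # passed-filter wells reported in the text
mean_read_length = 369   # bp, average read length reported in the text
genome_size = 2_753_264  # bp, genome size reported in the paper

# Total sequenced bases, estimated from read count and mean length.
total_bases = reads * mean_read_length

# Coverage is total sequenced bases divided by genome size.
coverage = total_bases / genome_size

print(f"{total_bases / 1e6:.1f} Mb sequenced, about {coverage:.0f}x coverage")
```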

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [44] with default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank database [45] and the Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAScan-SE tool [46] was used to find tRNA genes, whereas ribosomal RNAs were found by using RNAmmer [47] and BLASTn against the GenBank database. Lipoprotein signal peptides and numbers of transmembrane helices were predicted using SignalP [48] and TMHMM [49], respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. [...] [51] was used for data management and DNA Plotter [52] was used for visualization of genomic features. The Mauve alignment tool was used for multiple genomic sequence alignment and visualization [53].
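The length-dependent E-value cutoffs used for ORFan detection can be sketched as follows. This is one possible reading, not the authors' pipeline: it assumes that a BLASTP hit counts as significant when it passes the cutoff for its alignment length, and that a protein with no significant hit is called an ORFan. The `BlastHit` class and function names are hypothetical, and alignments of exactly 80 amino acids, which the text does not cover, are grouped with the longer ones.

```python
from dataclasses import dataclass

@dataclass
class BlastHit:
    evalue: float
    alignment_length: int  # in amino acids

def is_significant(hit):
    """Apply the length-dependent E-value cutoffs described in the text."""
    # Assumption: alignments of exactly 80 aa use the 1e-03 cutoff.
    cutoff = 1e-03 if hit.alignment_length >= 80 else 1e-05
    return hit.evalue < cutoff

def is_orfan(hits):
    """Assumed reading: a protein with no significant hit is an ORFan."""
    return not any(is_significant(h) for h in hits)

# Hypothetical hits, for illustration only.
print(is_orfan([BlastHit(1e-10, 120)]))  # long alignment, strong hit
print(is_orfan([BlastHit(1e-04, 50)]))   # short alignment, weak hit
```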

Genome properties
The genome of A. ihumii strain AP11T is 2,753,264 bp long (1 chromosome, but no plasmid) with a 57.90% G+C content (Figure 6 and Table 4). Of the 2,301 predicted genes, 2,254 were protein-coding genes, and 47 were RNAs. One rRNA operon (one 16S rRNA, one 23S rRNA and one 5S rRNA) and 44 predicted tRNA genes were identified in the genome. A total of 1,465 genes (63.66%) were assigned a putative function. Two hundred thirty-seven genes were identified as ORFans (10.29%). The remaining genes were annotated as hypothetical proteins. The properties and the statistics of the genome are summarized in Tables 4 and 5. The distribution of genes into COGs functional categories is presented in Table 5.
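The gene tallies above can be cross-checked with a few lines of arithmetic. The sketch assumes that the reported percentages use all 2,301 predicted genes as the denominator and that the functional, ORFan and hypothetical categories do not overlap; neither point is stated in the text. Under those assumptions, the computed percentages agree with the reported 63.66% and 10.29% to within about 0.01 percentage point.

```python
# Gene counts reported in the text.
protein_coding = 2254
rna_genes = 47
rrna_genes = 3    # one 16S, one 23S, one 5S
trna_genes = 44
with_function = 1465
orfans = 237

total_genes = protein_coding + rna_genes

def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

# Assumption: percentages are relative to all predicted genes.
pct_function = percent(with_function, total_genes)
pct_orfans = percent(orfans, total_genes)

# Assumption: the three protein categories are mutually exclusive.
hypothetical = protein_coding - with_function - orfans

print(total_genes, rrna_genes + trna_genes)
print(f"{pct_function:.2f}% with putative function, {pct_orfans:.2f}% ORFans")
print(hypothetical, "hypothetical proteins (derived, not stated in the text)")
```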

Genome comparison with other Alistipes species
Here, we compared the genome of A. ihumii strain AP11T to those of A. obesi strain ph8T (GenBank accession number CAHA00000000), [...] (Table 6). However, the distribution of genes into COG categories was not entirely similar in all eight compared genomes (Figure 7).
Table footnote: The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Alistipes ihumii sp. nov. that contains strain AP11T. This bacterial strain was isolated from the fecal flora of a patient suffering from anorexia nervosa living in Marseille, France. Several other new bacterial species were also cultivated from this patient, as well as from fecal samples of other patients, using microbial culturomics, thus suggesting that the human fecal flora remains partially unknown.
Colonies are 0.2 mm in diameter and are translucent on blood-enriched Columbia agar. Cells are rod-shaped with a mean diameter of 0.72 µm and a mean length of 1.69 µm. Optimal growth is achieved anaerobically. No growth is obtained aerobically but weak growth is observed in microaerophilic conditions. Growth occurs between 25°C and 45°C, with optimal growth observed at 37°C. Cells stain Gram-negative, are non-motile and are asaccharolytic. Activities present are α-glucosidase, β-glucosidase, N-acetyl-β-glucosaminidase, mannose and raffinose fermentation, alkaline phosphatase, leucyl glycine arylamidase, alanine arylamidase, and glutamyl glutamic acid arylamidase. Cells are negative for urease, arginine dihydrolase, β-galactosidase, 6-phospho-β-galactosidase, α-arabinosidase, β-glucuronidase, α-fucosidase, nitrate reduction, indole production, arginine arylamidase, proline arylamidase, phenylalanine arylamidase, leucine arylamidase, pyroglutamic acid arylamidase, tyrosine arylamidase, glycine arylamidase, histidine arylamidase, and serine arylamidase. Cells are susceptible to amoxicillin, imipenem, and clindamycin, but resistant to vancomycin. The G+C content of the genome is 57.90%. The 16S rRNA and genome sequences are deposited in GenBank under accession numbers JX101692 and CAPH00000000, respectively.
The type strain AP11T (= CSUR P204 = DSM 26107) was isolated from the fecal flora of a 21-year-old French Caucasian female suffering from severe anorexia nervosa.