Non-contiguous finished genome sequence and description of Kallipyga massiliensis gen. nov., sp. nov., a new member of the family Clostridiales Incertae Sedis XI

Kallipyga massiliensis strain ph2T is the type strain of Kallipyga massiliensis gen. nov., sp. nov., the type species of the new genus Kallipyga within the family Clostridiales Incertae Sedis XI. This strain, whose genome is described here, was isolated from the fecal flora of a 26-year-old woman suffering from morbid obesity. K. massiliensis is an obligate anaerobic coccus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 1,770,679-bp genome (one chromosome, no plasmid) contains 1,575 protein-coding and 50 RNA genes, including 4 rRNA genes.


Introduction
Kallipyga massiliensis strain ph2T (CSUR=P241, DSM=26229) is the type strain of K. massiliensis gen. nov., sp. nov. This bacterium was isolated from a stool sample of an obese French patient as part of a study aiming to individually cultivate all species occurring within human feces [1][2][3]. It is a Gram-positive, anaerobic, indole-negative coccus. Defining the taxonomic status of bacterial isolates remains challenging. The molecular taxonomic tools currently available, including 16S rRNA sequence similarity, G+C content and DNA-DNA hybridization (DDH) [4,5], although considered gold standards, have limitations [6,7]. The 16S rRNA sequence similarity and G+C content thresholds do not apply uniformly to all species or genera, and the DDH method lacks intra- and inter-laboratory reproducibility [5]. The advent of high-throughput genome sequencing and proteomic analysis [8] has provided unprecedented access to exhaustive genetic and protein information for bacterial isolates. We recently proposed a polyphasic approach to describe new bacterial species in which genome sequences and MALDI-TOF spectra are used along with phenotypic characteristics.
The family Clostridiales Incertae Sedis XI was created in 2001 [31] and currently includes the following 11 genera: Anaerococcus (Ezaki et al. 2001) [32], Dethiosulfatibacter (Takii et al. 2007) [33], Finegoldia (Murdoch and Shah 2000) [34], Gallicola (Ezaki et al. 2001) [32], Helcococcus (Collins et al. 1993) [35], Parvimonas (Tindall and Euzéby 2006) [36], Peptoniphilus (Ezaki et al. 2001) [32], Sedimentibacter (Breitenstein et al. 2002) [37], Soehngenia (Parshina et al. 2003) [38], Sporanaerobacter (Hernandez-Eugenio et al. 2002) [39] and Tissierella (Collins and Shah 1986) [40]. Currently, 31 species with validly published names are reported in this family [41]. Species of the Clostridiales Incertae Sedis XI consist mostly of Gram-positive, obligate anaerobic cocci.
Members of this family have been identified as pathogens in both humans and animals. In humans, they have often been isolated from patients with septic arthritis, necrotizing pneumonia, prosthetic joint infection and other clinical conditions such as vaginal discharge and ovarian, peritoneal and sacral abscesses [42][43][44][45][46]. Here we present a summary classification and a set of features for K. massiliensis gen. nov., sp. nov., strain ph2T (CSUR=P241, DSM=26229), together with the description of the complete genome sequence and its annotation. These characteristics support the circumscription of the genus Kallipyga and its type species, K. massiliensis, within the family Clostridiales Incertae Sedis XI.

Classification and features
A stool sample was collected from a 26-year-old woman living in Marseille (France). She suffered from morbid obesity, with a body mass index of 48.2 kg/m² (118.8 kg, 1.57 m). At the time of stool collection, she was not taking any medication and was not on a diet. The patient gave informed and signed consent, and the agreement of the ethics committee of the Institut Fédératif de Recherche (IFR48, Faculty of Medicine, Marseille, France) was obtained under reference 09-022. Four other new bacterial species, Alistipes obesi, Peptoniphilus grossensis, P. obesi and Enorma massiliensis [25][26][27][33], were also isolated from this specimen using various culture conditions. The fecal specimen was preserved at -80°C after collection. Strain ph2T (Table 1) was isolated in 2011 by culture on 5% sheep blood-enriched agar under anaerobic conditions at 37°C, after 26 days of preincubation in a blood culture bottle containing rumen fluid and sheep blood. The 16S rRNA nucleotide sequence (GenBank accession number JN837487) of K. massiliensis strain ph2T was 86.09% similar to that of Helcococcus sueciensis, the phylogenetically closest species (Figure 1). This value was lower than the 95.0% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers (2006) to delineate a new genus without carrying out DNA-DNA hybridization [5].
Table 1 evidence codes: IDA, Inferred from Direct Assay; NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [56]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
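The body mass index quoted above follows from the reported weight and height (BMI = weight in kg divided by the square of height in m). The short Python sketch below only reproduces that arithmetic; the function name is hypothetical, and the 40 kg/m² cut-off for class III (morbid) obesity is the WHO convention, added here as an assumption rather than taken from this study.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI in kg/m^2: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Values reported for the stool donor: 118.8 kg and 1.57 m.
bmi = body_mass_index(118.8, 1.57)
print(f"BMI = {bmi:.1f} kg/m^2")  # BMI = 48.2 kg/m^2

# Assumption: WHO class III (morbid) obesity starts at a BMI of 40 kg/m^2.
print("class III obesity" if bmi >= 40 else "below class III threshold")
```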

Figure 1.
Phylogenetic tree highlighting the position of Kallipyga massiliensis strain ph2T relative to other type strains within the family Clostridiales Incertae Sedis XI. GenBank accession numbers are indicated in parentheses. Sequences were aligned using CLUSTALW, and phylogenetic inferences were obtained using the maximum-likelihood method in MEGA software. Numbers at the nodes are bootstrap support values (percentages) obtained by repeating the analysis 500 times to generate a majority-rule consensus tree. Eubacterium cylindroides was used as the outgroup. The scale bar represents a 2% nucleotide sequence divergence.
By comparison with the GenBank database [57], strain ph2T also exhibited a nucleotide sequence similarity greater than 98.7% with 16 sequences from uncultured bacteria of the human skin microbiome [58]. These bacteria most likely belong to the same species as strain ph2T.
Different growth temperatures (25, 30, 37 and 45°C) were tested: no growth occurred at 25°C or 30°C, growth occurred between 37°C and 45°C, and optimal growth was observed at 37°C. Colonies were bright grey, with a diameter of 1.0 mm on 5% blood-enriched Columbia agar. Growth of the strain was tested under anaerobic and microaerophilic conditions using the GENbag anaer and GENbag microaer systems, respectively (bioMérieux), and in the presence of air, with or without 5% CO2. Optimal growth was obtained anaerobically; no growth was observed under aerobic or microaerophilic conditions. Gram staining showed Gram-positive cocci (Figure 2). A motility test was negative. Cells grown on agar are Gram-positive, have a diameter by electron microscopy ranging from 0.57 µm to 0.78 µm (mean, 0.67 µm; Figure 3) and are mostly grouped in pairs, short chains or small clumps.
Strain ph2T exhibited neither catalase nor oxidase activity. Using API 32A (bioMérieux), nitrate reduction, indole formation and urease production were negative. Positive reactions were obtained for α-galactosidase, arginine dihydrolase, arginine arylamidase, α-glucosidase and β-glucosidase. Strain ph2T did not ferment mannose or raffinose. Negative reactions were observed for β-galactosidase, β-galactosidase-6-phosphate, α-arabinosidase, β-glucuronidase, N-acetyl-β-glucosaminidase, glutamic acid decarboxylase, proline arylamidase, leucyl glycine arylamidase, phenylalanine arylamidase, pyroglutamic acid arylamidase, tyrosine arylamidase, alanine arylamidase, glycine arylamidase, histidine arylamidase, glutamyl glutamic acid arylamidase and serine arylamidase.
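The two 16S rRNA similarity thresholds used so far (greater than 98.7% suggesting the same species, lower than 95.0% supporting a new genus) can be written as a simple decision rule. The Python sketch below is illustrative only: the function name is hypothetical, and values between the two thresholds are returned as inconclusive because the text does not assign them a rank.

```python
SPECIES_THRESHOLD = 98.7  # % identity above which the same species is likely
GENUS_THRESHOLD = 95.0    # % identity below which a new genus is supported [5]


def interpret_16s_similarity(percent_identity: float) -> str:
    """Map a 16S rRNA percent identity to the rank suggested by the thresholds."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    if percent_identity > SPECIES_THRESHOLD:
        return "same species likely"
    if percent_identity < GENUS_THRESHOLD:
        return "new genus supported"
    # Between the two thresholds no rank is assigned; further data are needed.
    return "inconclusive"


print(interpret_16s_similarity(86.09))  # strain ph2T vs. Helcococcus sueciensis
print(interpret_16s_similarity(99.0))   # hypothetical value above 98.7%
```

Note that the thresholds are strict inequalities, mirroring the wording "greater than" and "lower than" in the text.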
Using API ZYM (bioMérieux), positive reactions were observed for esterase lipase, leucine arylamidase, α-glucosidase, β-glucosidase and acid phosphatase. Negative reactions were obtained for esterase, lipase, valine and cysteine arylamidase, trypsin, α-chymotrypsin, naphthol-AS-BI-phosphohydrolase, α-galactosidase, β-galactosidase, β-glucuronidase, N-acetyl-β-glucosaminidase, α-mannosidase and α-fucosidase. Using API 50CH (bioMérieux), K. massiliensis weakly fermented D-ribose, D-glucose, D-fructose and esculin. In comparison with its closest phylogenetic neighbors, K. massiliensis differed from Finegoldia magna in α-galactosidase and α-glucosidase production and in D-ribose and D-fructose fermentation. It also differed from Helcococcus kunzii in oxygen requirement, α-galactosidase and leucine arylamidase production, and D-ribose, D-glucose and esculin utilization. It differed from Parvimonas micra in alkaline phosphatase, glutamyl glutamic acid arylamidase, β-glucosidase, phenylalanine arylamidase and histidine arylamidase production and in D-glucose fermentation. It differed from Peptoniphilus indolicus in α-galactosidase, indole, α-glucosidase, β-glucosidase, dihydrolase phenylalanine, phenylalanine arylamidase and histidine arylamidase production, and in D-glucose fermentation (Table 2). K. massiliensis is susceptible to amoxicillin, amoxicillin-clavulanic acid, gentamicin 500, penicillin, imipenem, vancomycin, rifampicin and nitrofurantoin, but resistant to ciprofloxacin, metronidazole, gentamicin 10, trimethoprim/sulfamethoxazole, ceftriaxone, erythromycin and doxycycline.
Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [59] using a Microflex spectrometer (Bruker Daltonics, Germany). Twelve distinct deposits were made for strain ph2T from 12 isolated colonies.
The twelve ph2T spectra were imported into our database and compared with spectra from 3,769 bacteria using the MALDI BioTyper software (version 2.0, Bruker). The resulting score enabled the presumptive identification and discrimination of the tested species from those in the database: a score > 2 with a validly published species enabled identification at the species level; a score > 1.7 but < 2 enabled identification at the genus level; and a score < 1.7 did not enable any identification. For strain ph2T, no significant score was obtained, suggesting that our isolate was not a member of any known species or genus (Figures 4 and 5). A broader study incorporating MALDI-TOF, 16S rRNA gene and genomic DNA identity data could be conducted to define taxonomic criteria at the family level.
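The score ranges described above amount to a three-way decision rule. The Python sketch below is a hypothetical illustration, not part of the BioTyper software. Because the text does not say how a score of exactly 2.0 or 1.7 is classified, the sketch assumes that boundary values fall into the higher category, and it ignores the additional requirement that a species-level match be to a validly published species. The scores in the usage lines are invented.

```python
from collections import Counter
from typing import Iterable

SPECIES_CUTOFF = 2.0  # identification at the species level
GENUS_CUTOFF = 1.7    # identification at the genus level


def interpret_score(score: float) -> str:
    """Return the identification level implied by a single best-match score.

    Assumption: a score exactly equal to a cut-off goes to the higher level.
    """
    if score >= SPECIES_CUTOFF:
        return "species"
    if score >= GENUS_CUTOFF:
        return "genus"
    return "no identification"


def summarize(best_scores: Iterable[float]) -> Counter:
    """Count identification levels over the best score obtained for each spectrum."""
    return Counter(interpret_score(s) for s in best_scores)


# Twelve invented best-match scores, one per deposit (not data from this study).
scores = [1.42, 1.38, 1.51, 1.29, 1.44, 1.36, 1.47, 1.33, 1.40, 1.25, 1.49, 1.31]
print(summarize(scores))
```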

Genome sequencing information
Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to members of the family Clostridiales Incertae Sedis XI, and is part of a study of the human digestive flora aiming to isolate all bacterial species within human feces [1][2][3]. It was the thirty-sixth genome from the family Clostridiales Incertae Sedis XI to be sequenced and the first genome of K. massiliensis gen. nov., sp. nov. The genome sequence is available in GenBank under accession number CAHC00000000 and consists of 22 contigs. Table 3 shows the project information and its association with MIGS version 2.0 compliance [60].

Growth conditions and DNA isolation
Kallipyga massiliensis gen. nov., sp. nov., strain ph2T (CSUR=P241, DSM=26229) was grown anaerobically on 5% sheep blood-enriched Columbia agar at 37°C. Three Petri dishes were inoculated, and the resulting bacteria were resuspended in 3 × 100 µl of G2 buffer (EZ1 DNA Tissue kit, Qiagen). A first mechanical lysis was performed with glass powder on the FastPrep-24 device (MP Biomedicals, USA) using two 20-second cycles. DNA was then treated with 2.5 µg/µl lysozyme for 30 minutes at 37°C and extracted using the BioRobot EZ1 Advanced XL (Qiagen). The DNA was then concentrated and purified on a QIAamp kit (Qiagen). The DNA concentration, measured with the Quant-iT PicoGreen kit (Invitrogen) on a Genios Tecan fluorometer, was 78.2 ng/µl.

Figure 4.
Reference mass spectrum from K. massiliensis strain ph2T. Spectra from 12 individual colonies were compared and a reference spectrum was generated.

Figure 5.
Gel view comparing Kallipyga massiliensis gen. nov., sp. nov. strain ph2T with other phylogenetically close species. The gel view displays the raw spectra of the loaded spectrum files arranged in a pseudo-gel format. The x-axis records the m/z value. The left y-axis displays the sequential spectrum number, in the order in which the spectra were loaded. Peak intensity is expressed on a grayscale. The color bar and the right y-axis indicate the relationship between the color in which a peak is displayed and the peak intensity in arbitrary units. Displayed species are indicated on the left.
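A pseudo-gel view of this kind can be thought of as one image row per spectrum, with the m/z axis divided into bins and each bin shaded according to peak intensity. The Python sketch below assumes peak lists given as (m/z, intensity) pairs, an arbitrary bin count and per-spectrum normalization; the function names are hypothetical, and this is a simplified illustration of the idea, not the implementation behind Figure 5.

```python
from typing import Iterable, List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity in arbitrary units)


def spectrum_to_gel_row(
    peaks: Iterable[Peak], mz_min: float, mz_max: float, n_bins: int
) -> List[int]:
    """Convert one peak list into gray levels (0 = no signal, 255 = strongest peak)."""
    if mz_max <= mz_min or n_bins <= 0:
        raise ValueError("need mz_max > mz_min and a positive number of bins")
    width = (mz_max - mz_min) / n_bins
    bins = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            # Clamp guards against floating-point rounding at the upper edge.
            idx = min(int((mz - mz_min) / width), n_bins - 1)
            bins[idx] = max(bins[idx], intensity)  # keep the tallest peak per bin
    top = max(bins)
    if top == 0:
        return [0] * n_bins
    # Assumption: each spectrum is normalized to its own strongest peak.
    return [round(255 * value / top) for value in bins]


def gel_view(
    spectra: Sequence[Iterable[Peak]], mz_min: float, mz_max: float, n_bins: int
) -> List[List[int]]:
    """Stack rows so that row i corresponds to the i-th loaded spectrum."""
    return [spectrum_to_gel_row(p, mz_min, mz_max, n_bins) for p in spectra]
```

Whether high gray levels are drawn dark or light is a display choice left to the plotting step.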

Genome annotation
Open reading frames (ORFs) were predicted using Prodigal [61] with default parameters. Predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank [57] and Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAs and rRNAs were predicted using the tRNAScan-SE [62] and RNAmmer [63] tools, respectively. Lipoprotein signal peptides and numbers of transmembrane helices were predicted using SignalP [64] and TMHMM [65], respectively. ORFans were defined using BLASTP E-value cut-offs of 1e-03 for alignments longer than 80 amino acids and 1e-05 for alignments shorter than 80 amino acids. Such thresholds have already been used in previous works to define ORFans. Artemis [66] and DNA Plotter [67] were used for data management and visualization of genomic features, respectively. The Mauve alignment tool (version 2.3.1) was used for multiple genomic sequence alignment [68]. To estimate the mean level of nucleotide sequence similarity at the genome level between K. massiliensis and four other members of the family Clostridiales Incertae Sedis XI (Table 6), orthologous proteins were detected using the Proteinortho software [69] and genomes were compared pairwise. For each pair of genomes, we determined the mean percentage of nucleotide sequence identity among orthologous ORFs using BLASTn.
Standards in Genomic Sciences
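Two of the annotation steps above reduce to small rules that can be sketched in Python: the length-dependent E-value cut-off used to define ORFans, and the per-pair mean identity of orthologous ORFs. This is a hypothetical sketch, not the authors' pipeline. It assumes, as the term usually implies, that an ORF counts as an ORFan when none of its BLASTP hits passes the cut-off; that an alignment of exactly 80 amino acids uses the stricter 1e-05 cut-off (the text leaves this case open); and that BLASTP and BLASTn results have already been parsed into plain Python values.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[float, int]  # (BLASTP E-value, alignment length in amino acids)


def is_significant_hit(evalue: float, aln_len: int) -> bool:
    """Apply the length-dependent E-value cut-off (exactly 80 aa uses 1e-05)."""
    cutoff = 1e-3 if aln_len > 80 else 1e-5
    return evalue < cutoff


def is_orfan(hits: Sequence[Hit]) -> bool:
    """Assumption: an ORF is an ORFan when none of its hits is significant."""
    return not any(is_significant_hit(e, n) for e, n in hits)


def mean_pairwise_identity(
    genomes: Sequence[str],
    identities: Dict[Tuple[str, str], List[float]],
) -> Dict[Tuple[str, str], float]:
    """Mean % nucleotide identity of orthologous ORFs for each genome pair with data.

    `identities` maps a genome pair (in either order) to the BLASTn percent
    identities of its orthologous ORFs.
    """
    result = {}
    for a, b in combinations(genomes, 2):
        values = identities.get((a, b)) or identities.get((b, a))
        if values:  # pairs without orthologs are skipped
            result[(a, b)] = mean(values)
    return result


# Hypothetical genome labels and identities, for illustration only.
demo = mean_pairwise_identity(["A", "B", "C"], {("A", "B"): [80.0, 90.0], ("C", "A"): [70.0]})
print(demo)
```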

Genome properties
The genome is 1,770,679 bp long (one chromosome, no plasmid) with a G+C content of 51.40% (Figure 6 and Table 4). Of the 1,625 predicted chromosomal genes, 1,575 were protein-coding genes and 50 were RNAs. A total of 1,238 genes (76.18%) were assigned a putative function. Forty-two genes were identified as ORFans (2.66%), and the remaining genes were annotated as hypothetical proteins. The properties and statistics of the genome are summarized in Tables 4 and 5.
The distribution of genes into COG functional categories is presented in Table 5.
a The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome (Table 6B).
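The gene counts and the percentage of genes with a putative function reported above can be checked with simple arithmetic. In the Python sketch below, `gc_content` is a hypothetical helper that uses only unambiguous A, C, G and T bases in the denominator (an assumption; ambiguous symbols such as N are ignored), and it is applied to a toy sequence rather than to the genome; the gene counts are those given in this section.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage."""
    return 100.0 * part / total


protein_coding, rna_genes = 1575, 50
total_genes = protein_coding + rna_genes
print(total_genes)                           # 1625
print(f"{percent(1238, total_genes):.2f}%")  # 76.18%
print(f"{gc_content('GGCCATNN'):.2f}%")      # 66.67% (toy sequence)
```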

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Kallipyga massiliensis gen. nov., sp. nov., with strain ph2T as the type strain. This bacterium was isolated in France.

Description of Kallipyga massiliensis gen. nov., sp. nov.
Kallipyga massiliensis (mas.il'ien'sis. L. gen. fem. n. massiliensis, of Massilia, the Latin name of Marseille, where strain ph2T was cultivated). It was isolated from the feces of an obese French patient. Gram-positive cocci. Strictly anaerobic. Mesophilic. Optimal growth at 37°C. Non-motile and non-sporulating. Colonies are bright grey and 1.0 mm in diameter on blood-enriched Columbia agar. Cells are cocci with a diameter ranging from 0.57 µm to 0.78 µm, with a mean diameter of 0.67 µm.