Non-contiguous finished genome sequence and description of Megasphaera massiliensis sp. nov.

Megasphaera massiliensis strain NP3T sp. nov. is the type strain of Megasphaera massiliensis sp. nov., a new species within the genus Megasphaera. This strain, whose genome is described here, was isolated from the fecal flora of an HIV-infected patient. M. massiliensis is a Gram-negative, obligate anaerobic coccobacillus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,661,757 bp long genome (1 chromosome but no plasmid) contains 2,577 protein-coding and 61 RNA genes, including 5 rRNA genes.


Introduction
Megasphaera massiliensis sp. nov. strain NP3T (= CSUR P245 = DSM 26228) is the type strain of M. massiliensis sp. nov. This bacterium is a Gram-negative, non-sporulating, anaerobic and non-motile coccobacillus that was isolated from the stool of an HIV-infected patient as part of a culturomics study designed to cultivate individually all bacterial species within human feces [1,2]. The current classification of prokaryotes is based on a combination of phenotypic and genotypic characteristics [3,4], including 16S rRNA gene phylogeny, G + C content and DNA-DNA hybridization (DDH). Although considered a "gold standard", these tools have several drawbacks [5,6]. To date, almost 4,000 bacterial genomes have been sequenced [7], and the cost of genomic sequencing is constantly decreasing. We therefore recently proposed adding genomic information to phenotypic criteria, including the protein profile, for the description of new bacterial species.
The genus Megasphaera (Rogosa 1971), created in 1971 [30], currently contains 5 species: M. cerevisiae (Engelmann and Weiss 1986) [31], M. elsdenii (Gutierrez et al. 1959) [30], M. micronuciformis (Marchandin et al. 2003) [32], M. paucivorans (Juvonen and Suihko 2006) [33] and M. sueciensis (Juvonen and Suihko 2006) [33]. The type species, M. elsdenii (Gutierrez et al. 1959) [30], originally classified in the genus Peptostreptococcus (Gutierrez et al. 1959), was later reclassified within a new genus, Megasphaera (Rogosa 1971), in the family Veillonellaceae (Rogosa 1971) [30]. It is an obligately anaerobic, lactate-fermenting gastrointestinal microbe of ruminant and non-ruminant mammals, including humans. It has also been isolated from a case of human endocarditis [34]. The genome of M. elsdenii strain DSM 20460, isolated from the rumen of sheep, was recently sequenced [35]. M. cerevisiae [31], M. micronuciformis [32], M. paucivorans and M. sueciensis [33] are brewery-associated species.
Here we present a summary classification and a set of features for M. massiliensis sp. nov. strain NP3T (= CSUR P245 = DSM 26228), together with a description of the complete genome sequencing and annotation. These characteristics support the circumscription of the species M. massiliensis.

Classification and features
A stool sample was collected from a 32-year-old HIV-infected patient living in Marseille, France. The patient gave written informed consent for the study. The study was approved by the Ethics Committee of the Institut Fédératif de Recherche IFR48, Faculty of Medicine, Marseille, France, under agreement number 09-022.
Standards in Genomic Sciences
The fecal specimen was preserved at -80°C after collection. Strain NP3T (Table 1) was isolated in January 2012 by cultivation on 5% sheep blood agar under anaerobic conditions at 37°C, following a 7-day preincubation of the stool specimen in an anaerobic blood culture bottle enriched with sterile 5% sheep rumen fluid and 5% sheep blood. The strain exhibited a 16S rRNA gene nucleotide sequence similarity with other members of the genus Megasphaera ranging from 91.5% with M. cerevisiae strain PAT1T to 95.8% with M. elsdenii strain ATCC 25940T, its closest validated phylogenetic neighbor (Figure 1). These values were lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [4].
Table 1 note (beginning missing): ... not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [45]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
Different growth temperatures (25, 30, 37, 45°C) were tested. Growth occurred between 30 and 45°C, and optimal growth was observed at 37°C. Colonies were transparent and smooth with a diameter of 0.5 to 1 mm on blood-enriched Columbia agar (BioMérieux). Growth of the strain was tested on 5% sheep blood agar (BioMérieux) under anaerobic and microaerophilic conditions using GENbag anaer and GENbag microaer systems, respectively (BioMérieux), and in the presence of air, with or without 5% CO2.
Growth occurred only in an anaerobic atmosphere. No growth was observed under aerobic or microaerophilic conditions. A motility test was negative. Cells grown on agar are Gram-negative coccobacilli (Figure 2) with a mean diameter of 0.87 µm, and phages were observed by electron microscopy (Figure 3). Strain NP3T exhibited oxidase but no catalase activity. Using RAPID 32A identification strips (BioMérieux), positive reactions were observed for α-glucosidase and β-glucosidase. Negative reactions were observed for urease, arginine dihydrolase, α- and β-galactosidase, β-galactosidase-6-phosphate, α-arabinosidase, β-glucuronidase, N-acetyl-β-glucosaminidase, mannose and raffinose fermentation, α-fucosidase, alkaline phosphatase, arginine arylamidase, proline arylamidase, leucyl glycine arylamidase, phenylalanine arylamidase, leucine arylamidase, pyroglutamic acid arylamidase, tyrosine arylamidase, alanine arylamidase, glycine arylamidase, histidine arylamidase, glutamyl glutamic acid arylamidase and serine arylamidase. Carbohydrate metabolism was examined using an API 50CH strip (BioMérieux). Positive reactions were observed for potassium gluconate, potassium 5-ketogluconate, aesculin, salicin, N-acetylglucosamine and arbutin production, and for L-arabinose, D-ribose, D-xylose, D-galactose, D-glucose, D-fructose, D-mannose, L-rhamnose, D-mannitol, D-sorbitol, D-cellobiose, D-maltose, D-lactose, D-trehalose, gentiobiose, L-fucose and D-arabitol fermentation. Weak reactions were observed for amygdalin and potassium 2-ketogluconate production, and for glycerol and D-arabinose fermentation. Table 2 summarizes the differential phenotypic characteristics of M. massiliensis, M. elsdenii and M. micronuciformis. M. massiliensis strain NP3T was susceptible to amoxicillin, amoxicillin-clavulanic acid, ceftriaxone, imipenem and doxycycline, but resistant to vancomycin, erythromycin, rifampicin, trimethoprim-sulfamethoxazole, metronidazole and ciprofloxacin.
Table 2 (fragment; remaining rows missing). Oxygen requirement: anaerobic, anaerobic, anaerobic. Pigment production: +, +, -. Acid production from: ...
... (Figures 4 and 5). The method of identification included m/z values from 3,000 to 15,000 Da. For every spectrum, at most 100 peaks were taken into account and compared with the spectra in the database. The MALDI-TOF score enabled the predictive identification and discrimination of the tested species from those in the database: a score > 2 with a validated species enabled identification at the species level, and a score < 1.7 did not enable any identification. No significant score was obtained for strain NP3T against the Bruker database, suggesting that our isolate was not a member of a known species. We added the spectrum from strain NP3T to our database for future reference (Figure 4). Figure 5 shows the MALDI-TOF MS spectrum differences between M. massiliensis and other Megasphaera and Veillonella species.
Figure caption (fragment): The left y-axis displays the running spectrum number originating from subsequent spectra loading. The peak intensity is expressed by a gray scale scheme code. The color bar and the right y-axis indicate the relation between the color in which a peak is displayed and the peak intensity in arbitrary units.
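The score interpretation described above can be sketched in a few lines of Python. This is only an illustration of the two cut-offs stated in the text (> 2 and < 1.7). The function name and the returned labels are hypothetical, and because the text does not say how scores between 1.7 and 2 are handled, the sketch reports that range as unspecified rather than guessing.

```python
def interpret_maldi_score(score: float) -> str:
    """Map a MALDI-TOF score to the interpretation given in the text.

    Only the two cut-offs stated in the text are encoded here;
    the 1.7 to 2 range is deliberately left unspecified.
    """
    if score > 2.0:
        # Stated in the text: identification at the species level
        # (when the match is to a validated species).
        return "species-level identification"
    if score < 1.7:
        # Stated in the text: no identification possible.
        return "no identification"
    # Not covered by the text; flagged instead of assumed.
    return "not specified in the text"


if __name__ == "__main__":
    # Illustrative scores only, not measured values.
    for s in (2.3, 1.9, 1.2):
        print(s, "->", interpret_maldi_score(s))
```

A strain such as NP3T, for which no significant score was obtained, would fall into the "no identification" branch under this reading.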

Genome sequencing information
Genome project history
The organism was selected for sequencing on the basis of its phenotypic differences, phylogenetic position and 16S rRNA similarity to other members of the genus Megasphaera, and is part of a study of the human digestive flora that aims to isolate all bacterial species within human feces [1,2]. It was the third genome of a Megasphaera species and the first genome of M. massiliensis sp. nov. to be sequenced. The genome is deposited in GenBank under accession number CAVO00000000 and consists of 106 large contigs. Table 3 shows the project information and its association with MIGS version 2.0 compliance [47].

Genome annotation
Prodigal [48] with default parameters was used to predict the Open Reading Frames (ORFs). Predicted ORFs were excluded if they spanned a sequencing gap region. Protein functional assessment was obtained by comparison with sequences in the GenBank [49] and Clusters of Orthologous Groups (COG) databases using BLASTP. The rRNAs and tRNAs were identified using RNAmmer [50] and tRNAscan-SE 1.21 [51], respectively. SignalP [52] and TMHMM [53] were used to predict signal peptides and transmembrane helices, respectively. ORFans were identified using a BLASTP E-value threshold of 1e-03 for alignment lengths greater than 80 amino acids and 1e-05 for alignment lengths smaller than 80 amino acids. Such parameter thresholds have already been used in previous works to define ORFans. Artemis [54] was used for data management and DNA Plotter [55] was used for visualization of genomic features. PHAST was used to identify, annotate and graphically display prophage sequences within bacterial genomes or plasmids [56]. To estimate the mean level of nucleotide sequence similarity at the genome level between M. massiliensis and five other members of the family Veillonellaceae, orthologous proteins were detected using the Proteinortho software with the following parameters: E-value 1e-5, 30% identity, 50% coverage and algebraic connectivity of 50% [57], and genomes were compared two by two. For each pair of genomes, we determined the mean percentage of nucleotide sequence identity among orthologous ORFs using BLASTn.
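The length-dependent E-value thresholds used for ORFans can be illustrated with a minimal Python sketch. It rests on two assumptions that the text does not state: that a BLASTP hit counts as significant when its E-value is below the threshold, and that an ORF is treated as an ORFan when it has no significant hit. The text does not say how an alignment of exactly 80 amino acids is handled, so the sketch arbitrarily groups it with the shorter alignments. All function and constant names are hypothetical.

```python
from typing import Iterable, Tuple

LENGTH_CUTOFF_AA = 80   # alignment length boundary from the text
EVALUE_LONG = 1e-3      # threshold for alignments longer than 80 aa
EVALUE_SHORT = 1e-5     # threshold for shorter alignments


def evalue_threshold(alignment_length: int) -> float:
    """Return the E-value threshold for a given alignment length.

    Assumption: a length of exactly 80 aa uses the stricter threshold.
    """
    if alignment_length > LENGTH_CUTOFF_AA:
        return EVALUE_LONG
    return EVALUE_SHORT


def is_significant_hit(evalue: float, alignment_length: int) -> bool:
    """Assumption: a hit is significant when its E-value is below the threshold."""
    return evalue < evalue_threshold(alignment_length)


def is_orfan(hits: Iterable[Tuple[float, int]]) -> bool:
    """Assumption: an ORF is an ORFan when none of its hits is significant.

    Each hit is an (evalue, alignment_length) pair.
    """
    return not any(is_significant_hit(e, n) for e, n in hits)


if __name__ == "__main__":
    # Illustrative hits only, not data from the genome.
    print(is_orfan([(1e-4, 60)]))    # short alignment, above the 1e-05 threshold
    print(is_orfan([(1e-10, 200)]))  # long alignment, well below 1e-03
```

Keeping the threshold lookup separate from the ORFan decision makes it easy to swap in a different reading of the rule if the intended definition differs.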

Genome properties
The genome of M. massiliensis strain NP3T is 2,661,757 bp long (in 28 scaffolds, 1 chromosome, and no plasmid) with a 50.2% GC content (Table 3 and Figure 6). Of the 2,577 predicted genes, 2,516 were protein-coding genes and 61 were RNA genes. A total of 1,697 genes (65.8%) were assigned a putative function. A total of 248 genes (9.6%) were annotated as hypothetical proteins. The properties and the statistics of the genome are summarized in Tables 4 and 5. The distribution of genes into COG functional categories is presented in Table 5.
Table note: The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
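As a quick arithmetic check of the gene counts in this section, the short Python sketch below recomputes the total and the two percentages. It assumes that the percentages are taken relative to the 2,577 predicted genes. That denominator is inferred from the numbers themselves and is not stated in the text. Variable names are illustrative.

```python
# Gene counts as reported in this section.
total_predicted_genes = 2577
protein_coding_genes = 2516
rna_genes = 61
genes_with_putative_function = 1697
hypothetical_protein_genes = 248


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Protein-coding and RNA genes should add up to the predicted total.
counts_are_consistent = (
    protein_coding_genes + rna_genes == total_predicted_genes
)

# Assumption: percentages use the total predicted genes as denominator.
pct_function = percent(genes_with_putative_function, total_predicted_genes)
pct_hypothetical = percent(hypothetical_protein_genes, total_predicted_genes)

if __name__ == "__main__":
    print("counts consistent:", counts_are_consistent)
    print(f"putative function: {pct_function:.2f}%")
    print(f"hypothetical proteins: {pct_hypothetical:.2f}%")
```

Under this assumption both computed values fall within 0.1 percentage points of the reported 65.8% and 9.6%.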

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Megasphaera massiliensis sp. nov., which contains the strain NP3T. This bacterial strain has been found in Marseille, France.