Non-contiguous finished genome sequence and description of Nosocomiicoccus massiliensis sp. nov.

Nosocomiicoccus massiliensis strain NP2T sp. nov. is the type strain of a new species within the genus Nosocomiicoccus. This strain, whose genome is described here, was isolated from the fecal flora of an AIDS-infected patient living in Marseille, France. N. massiliensis is a Gram-positive aerobic coccus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 1,645,244 bp long genome (one chromosome but no plasmid) contains 1,738 protein-coding and 45 RNA genes, including 3 rRNA genes.


Introduction
Nosocomiicoccus massiliensis strain NP2T (= CSUR P246 = DSM 26222) is the type strain of N. massiliensis sp. nov. This bacterium is a Gram-positive, non-spore-forming, indole-negative, aerobic and motile coccus. It was isolated from the stool of an AIDS-infected patient living in Marseille (France) as part of a "culturomics" study aiming at cultivating all species within human feces [1,2]. The current prokaryote species classification, known as polyphasic taxonomy, is based on a combination of genomic and phenotypic properties [3]. With each passing year, the number of completely sequenced genomes increases geometrically while the cost of sequencing decreases. More than 4,000 bacterial genomes have been published and approximately 15,000 genome projects are anticipated to be completed in the near future [4]. We recently proposed to integrate genomic information in the taxonomic framework and description of new bacterial species [5-22]. Here we present a summary classification and a set of features for N. massiliensis sp. nov. strain NP2T (= CSUR P246 = DSM 26222), together with the description of the complete genomic sequence and its annotation. These characteristics support the circumscription of the species N. massiliensis.
The genus Nosocomiicoccus Alves et al. 2008 was created within the family Staphylococcaceae on the basis of 16S rRNA gene sequence and phenotypic analyses [23]. To date, this genus comprises a single species, N. ampullae, which was isolated from the surface of saline bottles used for washing wounds in hospital wards [23].

Classification and features
A stool sample was collected from an HIV-infected patient living in Marseille (France). The patient gave informed and signed consent. This study and the assent procedure were approved by the ethics committee of the IFR48 (Marseille, France) under reference 09-022. The fecal specimen was preserved at -80°C after collection. Strain NP2T (Table 1) was isolated in January 2012 by aerobic cultivation on 5% sheep blood agar (BioMerieux, Marcy l'Etoile, France) at 37°C, after 14 days of preincubation of the stool sample in a blood culture bottle supplemented with 5 ml of sterile ovine rumen fluid. This strain exhibited 97% 16S rRNA gene sequence similarity with N. ampullae [23] and 92-94% similarity with the most closely related members of the genus Jeotgalicoccus [34] (Figure 1). These values were lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [35].
Table 1. Current classification, with evidence codes
Phylum: Firmicutes (TAS [25-27])
Class: Bacilli (TAS [28,29])
Order: Bacillales (TAS [30,31])
Family: Staphylococcaceae (TAS [28,32])
Genus: Nosocomiicoccus (TAS [23])
Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [33]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
Different growth temperatures (25, 30, 37, 45°C) were tested. Growth was observed between 25 and 45°C, with optimal growth at 37°C after 24 hours of incubation. Colonies were 1 mm in diameter on blood-enriched Columbia agar.
Growth of the strain was tested on 5% sheep blood agar under anaerobic and microaerophilic conditions using GENbag anaer and GENbag microaer systems, respectively (BioMerieux), and under aerobic conditions, with or without 5% CO2. Optimal growth was obtained aerobically, weak growth was observed under microaerophilic conditions, and no growth was observed under anaerobic conditions. The motility test was positive. Cells grown on agar are Gram-positive cocci (Figure 2) and have a mean diameter of 0.72 µm as determined by electron microscopy (Figure 3).
Strain NP2T exhibited catalase activity but no oxidase activity. Using an API 20NE strip (BioMerieux, Marcy l'Etoile), negative reactions were obtained for nitrate reduction, urease, indole production, glucose fermentation, arginine dihydrolase, β-galactosidase, glucose, arabinose, mannose, mannitol, N-acetyl-glucosamine, maltose, gluconate, caprate, adipate, malate, citrate, phenyl-acetate and cytochrome oxidase. Substrate oxidation and assimilation were examined with an API 50CH strip (BioMerieux) at the optimal growth temperature, but neither sugar fermentation nor assimilation was observed. N. massiliensis strain NP2T was susceptible to amoxicillin, imipenem, rifampicin, vancomycin, doxycycline and gentamicin but resistant to trimethoprim/sulfamethoxazole, metronidazole and ciprofloxacin. When compared with representative species from the family Staphylococcaceae, N. massiliensis strain NP2T exhibited the phenotypic differences detailed in Table 2.
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [36] using a Microflex spectrometer (Bruker Daltonics, Leipzig, Germany). Twelve individual colonies were deposited on an MTP 384 MALDI-TOF target plate (Bruker). The twelve NP2T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 4,706 bacteria used as reference data in the BioTyper database, including spectra from one validly published species of Nosocomiicoccus. A score enabled the presumptive identification and discrimination of the tested species from those in the database: a score > 2 with a validly published species enabled identification at the species level, and a score < 1.7 did not enable any identification. For strain NP2T, no significant score was obtained, suggesting that our isolate was not a member of any known species (Figures 4 and 5).
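The score interpretation described above can be written as a small decision rule. The following is only a minimal sketch: the function name and the example scores are hypothetical (the individual scores of the twelve NP2T spectra are not reported here), and the text does not state how scores between 1.7 and 2 are read, so that band is returned as unspecified.

```python
from typing import List

SPECIES_LEVEL_MIN = 2.0  # text: score > 2 with a validly published species
NO_ID_MAX = 1.7          # text: score < 1.7 gives no identification


def interpret_score(score: float) -> str:
    """Map one pattern-matching score to the categories stated in the text."""
    if score > SPECIES_LEVEL_MIN:
        return "species-level identification"
    if score < NO_ID_MAX:
        return "no identification"
    # Scores from 1.7 to 2 are not described in the text; left unspecified.
    return "not specified in the text"


def any_species_level(scores: List[float]) -> bool:
    """Hypothetical helper: True if at least one spectrum scores above 2."""
    return any(s > SPECIES_LEVEL_MIN for s in scores)


if __name__ == "__main__":
    # Hypothetical scores for illustration only, not measured values.
    example_scores = [1.21, 1.35, 1.10, 1.42]
    for s in example_scores:
        print(s, "->", interpret_score(s))
    print("species-level match found:", any_species_level(example_scores))
```

Under this rule, an isolate whose spectra all score below 1.7 is reported as unidentified, which corresponds to the outcome described for strain NP2T.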

Genome sequencing information

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the genus Nosocomiicoccus, and is part of a "culturomics" study of the human digestive flora aiming at isolating all bacterial species within human feces. It was the first genome of a Nosocomiicoccus species and the first genome of Nosocomiicoccus massiliensis sp. nov. The genome is deposited in GenBank under accession number CAVG00000000 and consists of 154 contigs. Table 3 summarizes the project information and its association with MIGS version 2.0 compliance [37].

Genome sequencing and assembly
For the paired-end sequencing, DNA (5 µg) was mechanically fragmented using a Covaris device (Covaris Inc., Woburn, MA, USA) with an enrichment size of 3-4 kb. The DNA fragmentation was visualized through an Agilent 2100 BioAnalyzer on a DNA Labchip 7500, which yielded an optimal size of 3.4 kb. The library was constructed using a 454 GS FLX Titanium paired-end rapid library protocol. Circularization and nebulization were performed and generated a pattern with an optimal size of 589 bp. PCR amplification was performed for 17 cycles followed by double size selection. The single-stranded paired-end library was quantified using a Quant-it Ribogreen Kit (Invitrogen) on a Genios Tecan fluorometer. The library concentration equivalence was calculated as 1.42 × 10^10 molecules/µL. The library was stored at -20°C until further use.

For the shotgun sequencing, DNA (500 ng) was mechanically fragmented using a Covaris device (Covaris Inc.) as described by the manufacturer. The DNA fragmentation was visualized using an Agilent 2100 BioAnalyzer on a DNA Labchip 7500, which yielded an optimal size of 1.7 kb. The library was constructed using the GS Rapid library Prep kit (Roche) and quantified using a TBS 380 mini fluorometer (Turner Biosystems, Sunnyvale, CA, USA). The library concentration equivalence was calculated as 2.8 × 10^9 molecules/µL. The library was stored at -20°C until further use. The shotgun library was clonally amplified with 1 and 2 cpb in two emPCR reactions each, and the paired-end library was amplified with 0.5 cpb in three emPCR reactions using the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The yields of the emPCR were 6.8 and 9.8%, respectively, for the shotgun library, and 11.29% for the paired-end library. These yields fall within the expected 5 to 20% range according to the Roche protocol.
Figure caption: The left y-axis displays the running spectrum number originating from subsequent spectra loading. The peak intensity is expressed by a gray scale scheme code. The color bar and the right y-axis indicate the relation between the color a peak is displayed with and the peak intensity in arbitrary units. Displayed species are indicated on the left.

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [38] with default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank database [39] and the Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAScanSE tool [40] was used to find tRNA genes, whereas ribosomal RNAs were found by using RNAmmer [41] and BLASTn against the GenBank database. Lipoprotein signal peptides and numbers of transmembrane helices were predicted using SignalP [42] and TMHMM [43], respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have already been used in previous works to define ORFans.
To estimate the mean level of nucleotide sequence similarity at the genome level between N. massiliensis and three other members of the family Staphylococcaceae (Table 6), we used the in-house Average Genomic Identity of Orthologous gene Sequences (AGIOS) software. Briefly, this software uses Proteinortho (version 1.4) [44] to detect orthologous proteins between genomes compared two by two, then retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity among orthologous ORFs using the Needleman-Wunsch global alignment algorithm. Nosocomiicoccus massiliensis strain NP2T was compared to Macrococcus caseolyticus strain JCSC5402 (GenBank accession number NC_011999), Staphylococcus pseudintermedius strain ED 99 (NC_017568), and Salinicoccus albus strain DSM 19776 (ARQJ00000000).
Artemis [45] was used for data management and DNA Plotter [46] was used for visualization of genomic features. The Mauve alignment tool was used for multiple genomic sequence alignment and visualization [47].
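The identity step of the AGIOS procedure described above can be illustrated with a short sketch. This is not the AGIOS software itself. It assumes that orthologous gene pairs have already been determined (for example by Proteinortho) and are passed in as nucleotide strings. The scoring values and the definition of identity (identical columns divided by alignment length) are assumptions, since the text does not specify them, and all function names are hypothetical.

```python
from typing import List, Tuple

# Assumed scoring scheme; the source does not report the values used.
MATCH, MISMATCH, GAP = 1, -1, -1


def global_identity(a: str, b: str) -> float:
    """Percent identity of a Needleman-Wunsch global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + GAP,
                              score[i][j - 1] + GAP)
    # Trace back one optimal alignment, counting identical columns.
    i, j, matches, length = n, m, 0, 0
    while i > 0 or j > 0:
        sub = MATCH if i > 0 and j > 0 and a[i - 1] == b[j - 1] else MISMATCH
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += int(a[i - 1] == b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return 100.0 * matches / length if length else 0.0


def mean_identity(ortholog_pairs: List[Tuple[str, str]]) -> float:
    """Mean percent identity over precomputed orthologous gene pairs."""
    if not ortholog_pairs:
        raise ValueError("no orthologous pairs supplied")
    return sum(global_identity(a, b) for a, b in ortholog_pairs) / len(ortholog_pairs)


if __name__ == "__main__":
    # Toy sequences for illustration only, not real genes.
    pairs = [("ACGT", "ACGT"), ("ACGT", "ACGA")]
    print(mean_identity(pairs))
```

The quadratic dynamic-programming table is adequate for gene-length sequences in a sketch like this; a production tool would normally rely on an optimized alignment library.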

Genome properties
The genome of N. massiliensis strain NP2T is 1,645,244 bp long (one chromosome, no plasmid) with a G+C content of 36.40% (Figure 6 and Table 4). Of the 1,783 predicted genes, 1,738 were protein-coding genes and 45 were RNAs. Three rRNA genes (one 16S rRNA, one 23S rRNA and one 5S rRNA) and 42 predicted tRNA genes were identified in the genome. A total of 1,350 genes (75.71%) were assigned a putative function. Two hundred forty-six genes (13.79%) were identified as ORFans. The remaining genes were annotated as hypothetical proteins. The properties and statistics of the genome are summarized in Tables 4 and 5. The distribution of genes into COG functional categories is presented in Table 5.
Table note: The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
[...] (Figure 7). The nucleotide sequence identity ranged from 64.75 to 69.80% among the genera. Table 6 summarizes the numbers of orthologous genes and the average percentage of nucleotide sequence identity between the different genomes studied.
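The gene counts and percentages reported in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text. The helper name is hypothetical, and the observation that the reported percentages match values truncated (rather than rounded) to two decimals, with the 1,783 predicted genes as denominator, is an inference from the arithmetic and is not stated in the source.

```python
import math

# Values reported in the text.
total_genes = 1783
protein_coding = 1738
rna_genes = 45
rrna_genes = 3
trna_genes = 42
with_function = 1350
orfans = 246


def truncated_percent(part: int, whole: int) -> float:
    """Percentage of part in whole, truncated (not rounded) to two decimals."""
    return math.floor(part / whole * 10_000) / 100


# Protein-coding and RNA genes add up to the total predicted genes.
assert protein_coding + rna_genes == total_genes
# rRNA and tRNA genes add up to the RNA gene count.
assert rrna_genes + trna_genes == rna_genes

if __name__ == "__main__":
    # Assumption: percentages are relative to all 1,783 predicted genes.
    print("putative function:", truncated_percent(with_function, total_genes))
    print("ORFans:", truncated_percent(orfans, total_genes))
```

With ordinary rounding the same ratios would be reported one hundredth of a percentage point higher, which is why truncation is assumed here.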

Description of Nosocomiicoccus massiliensis sp. nov.
Nosocomiicoccus massiliensis (mas.si.li.en′sis. L. masc. adj. massiliensis of Massilia, the Roman name of Marseille, France, where the type strain was isolated). Colonies are 1 mm in diameter on blood-enriched Columbia agar. Cells are cocci with a mean diameter of 0.72 µm. Optimal growth is achieved aerobically and weak growth is observed under microaerophilic conditions. No growth is observed under anaerobic conditions. Growth occurs between 25 and 45°C, with optimal growth observed at 37°C. Cells stain Gram-positive, are non-endospore-forming and are motile. Cells are negative for nitrate reduction, urease, indole production, glucose fermentation, arginine dihydrolase, β-galactosidase, glucose, arabinose, mannose, mannitol, N-acetyl-glucosamine, maltose, gluconate, caprate, adipate, malate, citrate, phenyl-acetate and cytochrome oxidase. Cells are susceptible to amoxicillin, imipenem, rifampicin, vancomycin, doxycycline and gentamicin but resistant to trimethoprim/sulfamethoxazole, metronidazole and ciprofloxacin. The G+C content of the genome is 36.40%. The 16S rRNA and genome sequences are deposited in GenBank under accession numbers JX424771 and CAVG00000000, respectively. The type strain NP2T (= CSUR P246 = DSM 26222) was isolated from the fecal flora of an AIDS-infected patient living in Marseille, France.