Non-contiguous finished genome sequence and description of Anaerococcus provenciensis sp. nov.

Anaerococcus provenciensis strain 9402080T is the type strain of A. provenciensis sp. nov., a new species within the genus Anaerococcus. This strain was isolated from a cervical abscess sample. A. provenciensis is a Gram-positive anaerobic coccus. Here, we describe the features of this organism, together with the complete genome sequence and annotation. The 2.26 Mbp long genome contains 2,099 protein-coding and 57 RNA genes, including 8 rRNA genes, and exhibits a G+C content of 33.48%.


Introduction
Anaerococcus provenciensis strain 9402080T (= CSUR P121 = DSM 26345) is the type strain of A. provenciensis sp. nov. This bacterium is a Gram-positive, non-spore-forming, indole-negative, anaerobic and non-motile coccus that was isolated from a cervical abscess sample during a study prospecting for anaerobic isolates from deep samples [1]. Currently, a polyphasic approach combining phenotypic and genotypic characteristics is preferred to classify prokaryotes and describe new isolates [2]. It was recently proposed to integrate genomic features into the description of new bacterial species because, as a result of decreasing sequencing costs, more than 3,000 bacterial genomes have been sequenced to date [3], providing a wealth of information [4][5][6][7][8][9][10][11][12][13][14][15]. The genus Anaerococcus belongs to the order Clostridiales and the family Clostridiales Family XI Incertae Sedis [16]. This heterogeneous family groups anaerobic cocci and rods and is mainly defined on the basis of phylogenetic analyses of 16S rRNA gene sequences. At present, 11 genera are found in Clostridiales Family XI Incertae Sedis, including the genera Anaerococcus and Peptoniphilus. The genus Anaerococcus was first described in 2001 [17] and contains 7 species: A. prevotii, A. hydrogenalis, A. lactolyticus, A. murdochii, A. octavius, A. tetradius and A. vaginalis.
The type species is A. prevotii (type strain ATCC 9321), first described in 1948 by Foubert and Douglas [18]. Members of the genus Anaerococcus are anaerobic, Gram-positive, non-motile cocci that formerly belonged to the genus Peptostreptococcus but were reclassified in 2001 by Ezaki et al. on the basis of phylogenetic and metabolic features [17]. They are mostly found in the human vagina but can also be found in the nasal cavity or on the skin. They have also been implicated in human disease and were isolated from several infection sites, such as ovarian, peritoneal, sacral, digital and cervical abscesses, vaginoses, bacteremias, foot ulcers, a sternal wound, and an arthritic knee [17,19-22]. Moreover, uncultured Anaerococcus sp. can be detected in metagenomes from the human skin flora [23]. Based on a comparison of their 16S rRNA gene sequences, the two species most closely related to Anaerococcus provenciensis sp. nov. are Anaerococcus prevotii and Anaerococcus tetradius.
Here we present a summary classification and a set of features for A. provenciensis sp. nov. strain 9402080T (= CSUR P121 = DSM 26345), together with a description of its complete genome sequence and annotation. These characteristics support the circumscription of the species A. provenciensis.

Classification and features
A cervical abscess sample was collected from a patient in Marseille during a study designed to prospect for emerging anaerobes using MALDI-TOF and 16S rRNA gene sequencing [1]. The specimen was preserved at -80°C after sampling. Strain 9402080T (Table 1) (Figure 1). These values are lower than the 98.7% 16S rRNA gene sequence similarity threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [35].

Table 1 (fragment): Altitude, 0 m above sea level (evidence code IDA).
a Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [33]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Standards in Genomic Sciences

Figure 1 (caption): Sequences were aligned using CLUSTALW, and phylogenetic inferences were obtained using the maximum-likelihood method within the MEGA 4 software [34]. Numbers at the nodes are bootstrap values obtained from 500 replicates used to generate a majority consensus tree. Clostridium butyricum was used as the outgroup. The scale bar represents 2% nucleotide sequence divergence.
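The species-delineation argument rests on comparing a pairwise 16S rRNA gene similarity against the 98.7% threshold. The sketch below is a minimal illustration, not the authors' pipeline (they aligned with CLUSTALW and analyzed in MEGA 4). It assumes the two sequences are already aligned and that gap columns are excluded from the count, which is one common convention among several. The function names and the toy fragments are hypothetical.

```python
# Minimal sketch: percent identity between two pre-aligned 16S rRNA gene
# sequences, compared with the 98.7% species-delineation threshold.
# Assumption: columns containing a gap are excluded from the comparison.

SPECIES_THRESHOLD = 98.7  # % 16S rRNA gene similarity threshold cited in the text


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return % identical bases over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def suggests_new_species(identity: float, threshold: float = SPECIES_THRESHOLD) -> bool:
    """True when similarity is below the threshold, the case in which the text
    treats a new species as delineable without DNA-DNA hybridization."""
    return identity < threshold


if __name__ == "__main__":
    # Hypothetical toy fragments, not real 16S rRNA gene sequences.
    identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
    print(identity, suggests_new_species(identity))  # 90.0 True
```

Real 16S comparisons span roughly 1,500 aligned positions, so the choice of gap treatment can shift the result by a few tenths of a percent, which matters near a 98.7% cut-off.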
Seven growth temperatures (23°C, 25°C, 28°C, 32°C, 35°C, 37°C and 50°C) were tested. No growth occurred at 50°C; growth occurred after 3 days between 23°C and 37°C, and optimal growth was observed after 2 days at 35°C and 37°C. On blood-enriched Columbia agar under anaerobic conditions (GENbag anaer, BioMérieux), colonies are small (1 mm in diameter), light grey, smooth and round. Bacteria were grown on blood-enriched Columbia agar (BioMérieux), on BHI agar, on BHI agar supplemented with 1% NaCl, in BHI broth and in trypticase soy (TS) broth. Agar plates were incubated under anaerobic conditions using GENbag anaer (BioMérieux), under microaerophilic conditions using GENbag microaer (BioMérieux), and in air with or without 5% CO2. On blood-enriched Columbia agar and in TS broth, growth was achieved under anaerobic conditions and was weak after 3 days under microaerophilic conditions. Growth on BHI agar and on BHI agar supplemented with 1% NaCl was also weak and occurred after 72 h. Gram staining showed non-spore-forming Gram-positive cocci (Figure 2). The motility test was negative. Cells grown anaerobically in TS broth have a mean diameter of 1.12 µm (min = 0.98 µm; max = 1.33 µm), as determined by electron microscopy after negative staining with a 3% ammonium molybdate solution (Figure 3). Strain 9402080T exhibited catalase activity but no oxidase activity. Using an API 20A strip (BioMérieux, Marcy l'Etoile), positive reactions were observed for D-glucose, D-lactose, D-saccharose, D-maltose, salicin, D-xylose, gelatinase, esculin, D-mannose and D-trehalose. Using an API ZYM strip, positive reactions were obtained for alkaline phosphatase (5 nmol of hydrolyzed substrate), esterase (5 nmol), esterase lipase (5 nmol), leucine arylamidase (40 nmol), acid phosphatase (5 nmol), naphthol-AS-BI-phosphohydrolase (20 nmol) and hyaluronidase (30 nmol).
Using a rapid ID 32A strip (BioMérieux), positive reactions were observed for arginine dihydrolase, β-galactosidase, β-glucosidase, β-glucuronidase, N-acetyl-β-glucosaminidase, α-fucosidase, mannose fermentation, alkaline phosphatase, arginine arylamidase, leucine arylamidase, pyroglutamate arylamidase and histidine arylamidase. Regarding antibiotic susceptibility, A. provenciensis was susceptible to penicillin G, amoxicillin, cefotetan, imipenem, metronidazole and vancomycin. The phenotypic characteristics of A. provenciensis compared with those of representative species of the genus Anaerococcus are detailed in Table 2. Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [36]. Briefly, a pipette tip was used to pick an isolated bacterial colony from a culture agar plate and spread it as a thin film on an MTP 384 MALDI-TOF target plate (Bruker Daltonics, Germany). Ten distinct deposits were made for A. provenciensis strain 9402080T from ten isolated colonies. Each smear was overlaid with 2 µL of matrix solution (a saturated solution of α-cyano-4-hydroxycinnamic acid in 50% acetonitrile and 2.5% trifluoroacetic acid) and allowed to dry for five minutes. Measurements were performed with a Microflex spectrometer (Bruker). Spectra were recorded in the positive linear mode over the mass range of 2,000 to 20,000 Da (parameter settings: ion source 1 (IS1), 20 kV; IS2, 18.5 kV; lens, 7 kV). Each spectrum was obtained after 675 shots at variable laser power, with an acquisition time of 30 seconds to 1 minute per spot. The ten 9402080T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 5,697 bacteria used as reference data in the BioTyper database. The identification method considers m/z values from 3,000 to 15,000 Da.
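The identification step restricts each spectrum to m/z 3,000 to 15,000 Da, and the text states that at most 100 peaks per spectrum are compared with the database. The sketch below illustrates that filtering on a list of (m/z, intensity) pairs. It is a hypothetical helper, not BioTyper's internal code: in particular, keeping the most intense peaks is an assumption, since the text does not say how the retained peaks are chosen.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z in Da, intensity)


def select_peaks(
    peaks: List[Peak],
    mz_min: float = 3000.0,
    mz_max: float = 15000.0,
    max_peaks: int = 100,
) -> List[Peak]:
    """Keep peaks inside [mz_min, mz_max], then at most max_peaks of them.

    Assumption: the retained peaks are the most intense ones; the text does
    not state how the software chooses them.
    """
    in_window = [p for p in peaks if mz_min <= p[0] <= mz_max]
    # Rank by intensity (highest first) and truncate to max_peaks.
    strongest = sorted(in_window, key=lambda p: p[1], reverse=True)[:max_peaks]
    # Return the retained peaks in ascending m/z order.
    return sorted(strongest, key=lambda p: p[0])


if __name__ == "__main__":
    # Hypothetical peak list, not measured data.
    demo = [(2500.0, 50.0), (4000.0, 10.0), (5000.0, 30.0), (16000.0, 99.0), (7000.0, 20.0)]
    print(select_peaks(demo, max_peaks=2))  # [(5000.0, 30.0), (7000.0, 20.0)]
```

Note that the recording window (2,000 to 20,000 Da) is wider than the identification window, so peaks such as the 2,500 and 16,000 Da entries above are acquired but ignored during matching.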
For every spectrum, at most 100 peaks were taken into account and compared with the spectra in the database. The resulting score enabled the presumptive identification and discrimination of the tested species from those in the database: a score > 2 with a validated species enabled identification at the species level; a score > 1.7 but < 2 enabled identification at the genus level; and a score < 1.7 did not enable any identification. For strain 9402080T, no significant score was obtained, suggesting that our isolate was not a member of a known species. We added the spectrum of strain 9402080T (Figure 4) to our database. A dendrogram was constructed with the MALDI BioTyper software (version 2.0, Bruker), comparing the reference spectrum of strain 9402080T with the reference spectra of 24 bacterial species, all belonging to the order Clostridiales. In this dendrogram, strain 9402080T appears on a separate branch within the genus Anaerococcus (Figure 5).
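The score cut-offs above amount to a simple decision rule. The function below is a minimal sketch of that rule with a hypothetical name; it is not part of the BioTyper software. Two behaviours are assumptions because the text leaves them open: scores of exactly 2.0 or 1.7 fall into the lower category, and a score above 2 against a non-validated species is treated as a genus-level identification.

```python
def interpret_biotyper_score(score: float, validated_species: bool = True) -> str:
    """Map a BioTyper score to an identification level using the cut-offs in the text.

    Assumptions: scores of exactly 2.0 or 1.7 (unspecified in the text) fall into
    the lower category, and a score above 2 against a non-validated species is
    treated as a genus-level identification.
    """
    if score > 2.0 and validated_species:
        return "species"
    if score > 1.7:
        return "genus"
    return "no identification"


if __name__ == "__main__":
    for s in (2.3, 1.85, 1.2):
        print(s, interpret_biotyper_score(s))
```

Under this rule, the absence of any score above 1.7 for strain 9402080T corresponds to the "no identification" branch, which is what prompted the addition of its spectrum to the database.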

Genome sequencing information

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rDNA similarity to other members of the genus Anaerococcus, as part of a study recovering and analyzing anaerobic bacteria from deep samples. It was the 8th genome sequenced from an Anaerococcus species and the first genome of Anaerococcus provenciensis sp. nov. The GenBank accession number is CAJU020000000 (CAJU020000001-CAJU020000026), and the assembly consists of 26 contigs. Table 3 shows the project information and its association with MIGS version 2.0 compliance [24].
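The contig accession range CAJU020000001-CAJU020000026 encodes the contig count directly. The sketch below expands such a range so the 26 contigs can be listed and checked. It is a hypothetical helper and a simplification: it treats everything after the letter prefix as a single zero-padded counter rather than parsing the separate version and contig fields of a WGS accession.

```python
import re
from typing import List

_ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand_accession_range(first: str, last: str) -> List[str]:
    """Expand 'first'..'last' into every accession in between.

    Simplification: everything after the letter prefix is treated as one
    zero-padded counter.
    """
    m_first = _ACCESSION.fullmatch(first)
    m_last = _ACCESSION.fullmatch(last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("accessions must share the same letter prefix")
    prefix = m_first.group(1)
    width = len(m_first.group(2))  # preserve the leading zeros
    start, end = int(m_first.group(2)), int(m_last.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


if __name__ == "__main__":
    contigs = expand_accession_range("CAJU020000001", "CAJU020000026")
    print(len(contigs), contigs[0], contigs[-1])  # 26 CAJU020000001 CAJU020000026
```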

Genome properties
The genome of Anaerococcus provenciensis strain 9402080T is estimated to be 2.26 Mb long, with a G+C content of 33.48% (Figure 5 and Table 4). A total of 2,099 protein-coding and 96 RNA genes were found, including 8 rRNA genes, 48 tRNA genes, 1 tmRNA gene and 39 miscellaneous RNA genes. The majority of the protein-coding genes (74.8%) were assigned a putative function; the remainder were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 5 and Figure 6. The properties and statistics of the genome are summarized in Tables 4 and 5.

a The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.

Table 6 presents the differences in gene numbers (as percentages) in each COG category between Anaerococcus provenciensis and Anaerococcus prevotii DSM 20548. The distributions are highly similar in the two species: the largest difference, in the COG category "Carbohydrate transport and metabolism", does not exceed 1.42%.
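Two of the genome statistics above are simple ratios: the G+C content of the assembled sequence, and each COG category count expressed as a percentage of all protein-coding genes (the basis stated in the table footnote), which Table 6 then compares between the two genomes. The helpers below are a minimal sketch of those calculations with hypothetical names. Ignoring ambiguous bases such as N in the G+C computation is an assumption, and the COG counts in the example are invented for illustration, not the values of Table 6.

```python
from typing import Dict


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


def cog_percentages(counts: Dict[str, int], total_protein_genes: int) -> Dict[str, float]:
    """Express each COG category count as % of all protein-coding genes."""
    return {cat: 100.0 * n / total_protein_genes for cat, n in counts.items()}


def cog_differences(
    counts_a: Dict[str, int], total_a: int, counts_b: Dict[str, int], total_b: int
) -> Dict[str, float]:
    """Per-category difference in percentage points (genome A minus genome B)."""
    pct_a = cog_percentages(counts_a, total_a)
    pct_b = cog_percentages(counts_b, total_b)
    return {c: pct_a.get(c, 0.0) - pct_b.get(c, 0.0) for c in sorted(set(pct_a) | set(pct_b))}


if __name__ == "__main__":
    print(gc_content("GGCCAATTNN"))  # 50.0
    # Hypothetical counts, not the values of Table 6.
    diffs = cog_differences({"G": 100, "J": 150}, 1000, {"G": 80, "J": 150}, 1000)
    print(max(diffs.items(), key=lambda kv: abs(kv[1])))  # ('G', 2.0)
```

Normalizing by each genome's own protein-coding gene total is what makes the two genomes comparable despite different gene counts; comparing raw counts would conflate genome size with functional composition.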

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Anaerococcus provenciensis sp. nov., which contains the strain 9402080T. This bacterium was isolated in Marseille, France.

Description of Anaerococcus provenciensis sp. nov.
Anaerococcus provenciensis (pro.ven.ci.en'sis; L. gen. masc. n. provenciensis, pertaining to Provence, the name of the area in south-east France where the type strain was isolated). Isolated from a cervical abscess sample from a patient in Marseille. A. provenciensis is a Gram-positive, obligately anaerobic, non-spore-forming coccus. Grows at 37°C in an anaerobic atmosphere. Negative for indole. Non-motile. The G+C content of the genome is 33.48%. The type strain is 9402080T (= CSUR P121 = DSM 26345).