Non-contiguous finished genome sequence and description of Anaerococcus pacaensis sp. nov., a new species of anaerobic bacterium

Anaerococcus pacaensis strain 9403502T is the type strain of Anaerococcus pacaensis sp. nov., a new species within the genus Anaerococcus. This strain, whose genome is described here, was isolated from a blood sample. A. pacaensis strain 9403502T is an obligately anaerobic Gram-positive coccus. Here we describe the features of this organism, together with its non-contiguous finished genome sequence and annotation. The 2.36-Mbp genome has a G+C content of 35.05% and contains 2,186 protein-coding and 72 RNA genes, including 3 rRNA genes.


Introduction
Anaerococcus pacaensis strain 9403502T (= CSUR P122 = DSM 26346) is the type strain of Anaerococcus pacaensis sp. nov. and a member of the genus Anaerococcus. This bacterium is a Gram-positive, anaerobic, non-spore-forming, indole-negative coccus that was isolated from a blood sample during a study prospecting for anaerobic isolates in deep samples [1]. The "gold standard" methods for defining a new bacterial species or genus are DNA-DNA hybridization and G+C content determination [2]. These methods are expensive and poorly reproducible, and bacterial species can now be classified with PCR and sequencing methods, particularly using 16S rRNA gene sequences with internationally validated cutoffs [3]. More recently, an increasing number of new bacterial genera and species have been described using high-throughput genome sequencing and mass spectrometric analyses, which give access to a wealth of genetic and proteomic information [4,5]. Previous studies have described new bacterial species and genera using genome sequencing, MALDI-TOF spectra and main phenotypic characteristics [6-23], and we propose here to describe a new species within the genus Anaerococcus in the same way. Here we present a summary classification and a set of features for A. pacaensis sp. nov. strain 9403502T (= CSUR P122 = DSM 26346), together with the description of its genome sequence and annotation. These characteristics support the circumscription of a novel species, Anaerococcus pacaensis sp. nov., within the genus Anaerococcus and the Clostridiales Family XI Incertae sedis.
The genus Anaerococcus was first described in 2001 [24] and belongs to the Clostridiales Family XI Incertae sedis. This family is defined mainly on the basis of phylogenetic analyses of 16S rRNA gene sequences, and all members of the genus Anaerococcus are anaerobic Gram-positive cocci. Based on 16S rRNA gene sequence comparison, the species most closely related to Anaerococcus pacaensis sp. nov. is Anaerococcus prevotii, which was first described in 1948 by Foubert and Douglas [25] and later reclassified in the genus Anaerococcus [24]. The second most closely related species is A. octavius, which was first described as Peptostreptococcus octavius, isolated from a human sample in 1998 by Murdoch et al. [26], and later reclassified in the genus Anaerococcus [24].

Classification and features
A blood sample was collected from a patient during a study analyzing emerging anaerobes using MALDI-TOF and 16S rRNA gene sequencing [1]. The specimen was collected in Marseille and stored at -80°C. Strain 9403502T (Table 1) was isolated in July 2009 by anaerobic cultivation on 5% sheep blood-enriched Columbia agar (bioMérieux, Marcy l'Etoile, France). This strain exhibited 95% 16S rRNA nucleotide sequence similarity with Anaerococcus prevotii [24,25]. This value is lower than the threshold recommended to delineate a new species without carrying out DNA-DNA hybridization [38]. In the inferred phylogenetic tree, it forms a distinct lineage close to A. octavius (Figure 1).

Table 1 evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [37]. If the evidence is IDA, the property was directly observed for a live isolate by one of the authors or by an expert mentioned in the acknowledgements.
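The 16S rRNA comparison above comes down to a percent-identity calculation over an alignment, followed by a comparison with a cutoff. The sketch below is illustrative only: it assumes the two sequences are already aligned (for example with CLUSTALW, as for Figure 1), ignores gapped columns, and uses a hypothetical cutoff of 98.7%, a value often cited for species delineation; the threshold actually applied in [38] may differ. The helper names are hypothetical.

```python
# Illustrative sketch: percent identity between two pre-aligned 16S rRNA
# sequences, compared with an ASSUMED species-delineation cutoff.
SPECIES_CUTOFF = 98.7  # assumption, in percent; not taken from this study


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return 100.0 * matches / compared


def below_species_cutoff(identity: float, cutoff: float = SPECIES_CUTOFF) -> bool:
    """True if the identity is below the (assumed) species cutoff."""
    return identity < cutoff
```

With a real alignment, a similarity of about 95%, as reported for A. prevotii, would fall below this assumed cutoff.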
Different growth temperatures (23°C, 25°C, 28°C, 32°C, 35°C, 37°C and 50°C) were tested. No growth occurred at 23°C, 25°C, 28°C or 50°C; growth occurred between 32°C and 37°C, with optimal growth at 37°C. Colonies on blood-enriched Columbia agar under anaerobic conditions (GENbag anaer, bioMérieux) are punctiform, very small, grey, dry and round. Bacteria were grown on blood-enriched Columbia agar (bioMérieux), in BHI broth and in trypticase soy (TS) broth under anaerobic conditions using GENbag anaer (bioMérieux), under microaerophilic conditions using GENbag microaer (bioMérieux), and in air with 5% CO2. They were also grown under anaerobic conditions on BHI agar and on BHI agar supplemented with 1% NaCl. Growth occurred only under anaerobic conditions: on blood-enriched Columbia agar and, weakly, on BHI agar and BHI agar supplemented with 1% NaCl after 72 h of incubation. Gram staining showed round, non-spore-forming Gram-positive cocci (Figure 2). The motility test was negative. Cells grown anaerobically in TS broth have a mean diameter of 1.140 µm (min = 0.955 µm; max = 1.404 µm), as determined by electron microscopy after negative staining (Figure 3).

Figure 1.
Phylogenetic tree highlighting the position of Anaerococcus pacaensis strain 9403502T relative to other type strains within the genus Anaerococcus. GenBank accession numbers are indicated in parentheses. Sequences were aligned using CLUSTALW, and phylogenetic inferences were obtained using the maximum-likelihood method within the MEGA 4 software [39]. Numbers at the nodes are bootstrap values obtained by repeating the analysis 500 times to generate a majority consensus tree. Clostridium butyricum was used as the outgroup. The scale bar represents 2% nucleotide sequence divergence.

Strain 9403502T exhibited catalase activity but no oxidase activity. Using API 20A, only a weak positive reaction was observed, for gelatinase. Using API ZYM, positive reactions were observed for alkaline phosphatase (5 nmol of hydrolyzed substrate), acid phosphatase (5 nmol), naphthol-AS-BI-phosphohydrolase (5 nmol) and hyaluronidase (40 nmol). Using Rapid ID 32A, positive reactions were observed only for β-glucuronidase and pyroglutamic acid arylamidase. Regarding antibiotic susceptibility, A. pacaensis was susceptible to penicillin G, amoxicillin, cefotetan, imipenem, metronidazole and vancomycin. The phenotypic characteristics of A. pacaensis compared with representative species of the genus Anaerococcus are detailed in Table 2 [40].

Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [41]. A pipette tip was used to pick one isolated bacterial colony from a culture agar plate and to spread it as a thin film on an MTP 384 MALDI-TOF target plate (Bruker Daltonics, Germany). Ten distinct deposits were made for strain 9403502T from ten isolated colonies. Each smear was overlaid with 2 µL of matrix solution (saturated solution of α-cyano-4-hydroxycinnamic acid in 50% acetonitrile and 2.5% trifluoroacetic acid) and allowed to dry for five minutes.
Measurements were performed with a Microflex spectrometer (Bruker). Spectra were recorded in positive linear mode over the mass range 2,000 to 20,000 Da (parameter settings: ion source 1 (IS1), 20 kV; IS2, 18.5 kV; lens, 7 kV). Each spectrum was obtained after 675 shots at variable laser power. Acquisition time was between 30 seconds and 1 minute per spot. The ten 9403502T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 5,697 bacteria in the BioTyper database. The identification method considers m/z values from 3,000 to 15,000 Da. For each spectrum, at most 100 peaks were taken into account and compared with the spectra in the database. The resulting score determined whether the tested strain could be identified: a score ≥ 2 with a validated species enabled identification at the species level, a score ≥ 1.7 but < 2 enabled identification at the genus level, and a score < 1.7 did not enable any identification. For strain 9403502T, the best score obtained was 1.265, which is not significant, suggesting that our isolate did not belong to any species in the database. The reference spectrum of strain 9403502T was added to our database (Figure 4). A dendrogram was constructed with the MALDI BioTyper software (version 2.0, Bruker), comparing the reference spectrum of strain 9403502T with the reference spectra of 26 bacterial species, all belonging to the order Clostridiales. In this dendrogram, strain 9403502T appears as a separate branch within the genus Anaerococcus (Figure 5).

Standards in Genomic Sciences
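The score thresholds described above can be written as a small decision rule. In the sketch below, `interpret_score` and `best_identification` are hypothetical helpers, not part of the MALDI BioTyper software; the sketch assumes that the best-matching database entry is a validated species and that the best score across the ten deposits is the one reported.

```python
# Illustrative decision rule for the identification score thresholds
# described in the text (hypothetical helpers, not a Bruker API).
from typing import Iterable, Tuple


def interpret_score(score: float) -> str:
    """Map a best-match score to an identification level.

    Assumes the matched database entry is a validated species.
    """
    if score >= 2.0:
        return "species"
    if score >= 1.7:
        return "genus"
    return "no identification"


def best_identification(scores: Iterable[float]) -> Tuple[float, str]:
    """Take the highest score over all deposits and interpret it."""
    best = max(scores)
    return best, interpret_score(best)
```

For example, a best score of 1.265, as obtained for strain 9403502T, falls in the "no identification" band.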

Genome sequencing and annotation

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the genus Anaerococcus, as part of a study prospecting for anaerobic bacteria in several clinical deep samples. It is the first genome of Anaerococcus pacaensis sp. nov. and the seventh genome of the genus Anaerococcus. The GenBank accession number is CAJJ020000000 (CAJJ020000001-CAJJ020000053); the assembly consists of 14 scaffolds comprising a total of 53 contigs. Table 2 shows the project information and its compliance with MIGS version 2.0.
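The contig range given with the accession number can be expanded programmatically, for example to check that it covers the 53 contigs reported. The helper `expand_accession_range` below is hypothetical and assumes only that both endpoints share an alphabetic prefix followed by a zero-padded number of fixed width, as CAJJ020000001 and CAJJ020000053 do; it does not interpret the internal structure of GenBank accessions.

```python
import re
from typing import List

# Letter prefix followed by a fixed-width, zero-padded number (assumption).
_ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def expand_accession_range(first: str, last: str) -> List[str]:
    """Hypothetical helper: list every accession from `first` to `last`."""
    m1, m2 = _ACCESSION.match(first), _ACCESSION.match(last)
    if not (m1 and m2):
        raise ValueError("expected a letter prefix followed by digits")
    prefix1, digits1 = m1.groups()
    prefix2, digits2 = m2.groups()
    if prefix1 != prefix2 or len(digits1) != len(digits2):
        raise ValueError("range endpoints must share prefix and digit width")
    width = len(digits1)
    start, end = int(digits1), int(digits2)
    if end < start:
        raise ValueError("range end precedes range start")
    # Re-pad each number to the original width so leading zeros are kept.
    return [f"{prefix1}{n:0{width}d}" for n in range(start, end + 1)]
```

Applied to CAJJ020000001 and CAJJ020000053, the helper yields 53 accessions, one per contig.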

Genome annotation
Non-coding genes and miscellaneous features were predicted using RNAmmer [44], ARAGORN [45], Rfam [46] and Pfam [47]. Open reading frames (ORFs) were predicted using Prodigal [48] with default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region. Functional annotation was performed using BLASTP [49] against the GenBank database [50] and the Clusters of Orthologous Groups (COG) database [51,52].
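The gap-filtering step can be sketched as an interval-overlap test. This is an illustrative sketch, not the pipeline used here: it assumes that sequencing gaps are represented as runs of N in the scaffold sequence and that ORF coordinates are 0-based and half-open, and it excludes any ORF that overlaps a gap, which is a strict reading of "spanning". The functions `gap_intervals` and `filter_orfs` are hypothetical names.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)


def gap_intervals(sequence: str) -> List[Interval]:
    """Locate sequencing gaps, assumed to be runs of 'N' (or 'n')."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def filter_orfs(orfs: List[Interval], gaps: List[Interval]) -> List[Interval]:
    """Keep only ORFs that do not overlap any gap interval."""
    kept = []
    for start, end in orfs:
        # Two half-open intervals overlap if each starts before the other ends.
        if any(start < g_end and g_start < end for g_start, g_end in gaps):
            continue  # ORF touches a gap: exclude it
        kept.append((start, end))
    return kept
```

A linear scan over gaps is adequate for a few dozen contigs; a sorted interval search would only matter for much larger gap sets.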

Genome properties
The genome of Anaerococcus pacaensis strain 9403502T is estimated to be 2.36 Mb long, with a G+C content of 35.05% (Figure 6 and Table 3). A total of 2,186 protein-coding and 72 RNA genes were found, the latter including 3 rRNA genes, 42 tRNA genes, 1 tmRNA gene and 26 miscellaneous RNA genes. The majority of the protein-coding genes (74.1%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The properties and statistics of the genome are summarized in Tables 3 and 4. Table 5 presents the difference in gene number (as a percentage) for each COG category between Anaerococcus pacaensis and Anaerococcus prevotii DSM 20548. The COG proportions are highly similar between the two species: the largest difference, 1.94%, concerns the COG category "Carbohydrate transport and metabolism". The distribution of genes into COG functional categories is presented in Table 6.

Figure 6.
Graphical circular map of the genome. From the outside to the center: scaffolds (grey, unordered), genes on the forward strand (colored by COG category), genes on the reverse strand (colored by COG category), RNA genes (tRNAs green, rRNAs red, tmRNAs black, misc_RNA pink), GC content (black/grey) and GC skew (purple/olive).

*The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
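The genome-level percentages above are simple ratios: G+C content over the sequence, and COG category counts divided by the total number of protein-coding genes (2,186 for A. pacaensis), as stated in the table footnote. The sketch below uses hypothetical helper names and toy counts, not the values from Tables 5 and 6; ignoring ambiguous bases when computing G+C content is an assumption of the sketch.

```python
from typing import Dict, Tuple


def gc_content(sequence: str) -> float:
    """G+C percentage over unambiguous bases only (assumption: N etc. ignored)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / (gc + at)


def cog_percentages(counts: Dict[str, int], total_genes: int) -> Dict[str, float]:
    """Percentage of all protein-coding genes assigned to each COG category."""
    return {cat: 100.0 * n / total_genes for cat, n in counts.items()}


def max_cog_difference(pct_a: Dict[str, float],
                       pct_b: Dict[str, float]) -> Tuple[str, float]:
    """COG category with the largest absolute percentage difference.

    Categories missing from one genome count as 0%; ties resolve to the
    alphabetically first category.
    """
    categories = sorted(set(pct_a) | set(pct_b))
    diffs = {c: abs(pct_a.get(c, 0.0) - pct_b.get(c, 0.0)) for c in categories}
    best = max(categories, key=lambda c: diffs[c])
    return best, diffs[best]


# Toy example with made-up counts (NOT data from this genome):
genome_a = cog_percentages({"G": 10, "J": 20}, total_genes=100)
genome_b = cog_percentages({"G": 30, "J": 40}, total_genes=200)
print(max_cog_difference(genome_a, genome_b))  # ('G', 5.0)
```

Normalizing by each genome's own gene total is what makes percentages comparable between genomes of different sizes.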

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Anaerococcus pacaensis sp. nov., which contains strain 9403502T. This bacterium was isolated in Marseille, France.