Complete genome sequence of Atopobium parvulum type strain (IPP 1246T)

Atopobium parvulum (Weinberg et al. 1937) Collins and Wallbanks 1993 comb. nov. is the type strain of the species and belongs to the genomically as yet unstudied Atopobium/Olsenella branch of the family Coriobacteriaceae. The species A. parvulum is of interest because its members are frequently isolated from the human oral cavity and are associated with halitosis (oral malodor) but not with periodontitis. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the genus Atopobium, and the 1,543,805 bp long single replicon genome with its 1369 protein-coding and 49 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain IPP 1246 T (= DSM 20469 = ATCC 33793 = JCM 10300) is the type strain of the species Atopobium parvulum and was first described by Weinberg et al. 1937 as 'Streptococcus parvulus' (basonym) [1]. In 1992 it was reclassified as A. parvulum [2]. A. parvulum is of high interest because it has frequently been isolated from the human oral cavity, especially from the tongue dorsum, where it has been associated with patients suffering from halitosis (oral malodor) [3,4]. In general, the malodorous compounds are volatile sulfur compounds, most frequently hydrogen sulfide, methyl mercaptan, and dimethyl sulfide, which are produced by bacterial metabolism of the sulfur-containing amino acids cysteine and methionine [3,4]. For A. parvulum itself, however, the production of these substances has not yet been studied. A. parvulum has not been found to be significantly associated with chronic periodontitis, though its participation in periodontitis cannot be fully excluded [5]. Nevertheless, A. parvulum has been associated with odontogenic infections, e.g., dental implants, but also with the saliva of healthy subjects [6]. Here we present a summary classification and a set of features for A. parvulum IPP 1246 T, together with the description of the complete genomic sequencing and annotation.

Classification and features
Phylotypes with significant 16S rRNA gene sequence similarity to strain IPP 1246 T were observed in intubated patients (EF510777) and in metagenomic human skin surveys (GQ081350) [7]. No significant similarities were found in human gut metagenomes (highest similarity 92%, BABE01011651) [8] or in marine metagenomes (87%, AACY020304192) [9] (status June 2009). Figure 1 shows the phylogenetic neighborhood of A. parvulum strain IPP 1246 T in a 16S rRNA based tree. The sequence of the sole copy of the 16S rRNA gene in the genome is identical to the previously published sequence generated from ATCC 33793 (AF292372), but differs by four nucleotides from the sequence used for the last taxonomic emendation (X67150) [2].
Figure 1. Phylogenetic tree showing the position of A. parvulum strain IPP 1246 T, inferred from aligned characters [10,11] of the 16S rRNA gene sequence under the maximum likelihood criterion [12]. The tree was rooted with the type strains of the genera within the subclass Rubrobacteridae. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1000 bootstrap replicates if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [13] are shown in blue, published genomes in bold, including two that are reported in this issue of SIGS [14,15].
The cells are cocci (approximately 0.3 to 0.6 µm in diameter) that occur singly, in pairs, in clumps, and in short chains, occasionally with central swelling [16,17] (Table 1 and Figure 2). The strains are non-motile and obligately anaerobic. Growth is substantially stimulated by 0.02% (vol/vol) Tween 80 and by 10% (vol/vol) rabbit serum added to culture media [16]. Strain IPP 1246 T is susceptible to chloramphenicol (12 µg/ml), clindamycin (1.6 µg/ml), erythromycin (3 µg/ml), penicillin G (2 U/ml), and tetracycline (6 µg/ml) [17].
Strain IPP 1246 T produces acid (final pH < 4.7) from cellobiose, esculin, fructose, galactose, glucose, inulin, lactose, maltose, mannose, salicin, sucrose, and trehalose; erythritol and xylose were weakly fermented; no acid was produced from amygdalin, arabinose, glycerol, glycogen, inositol, mannitol, melezitose, melibiose, pectin, raffinose, rhamnose, ribose, sorbitol, or starch. Esculin was hydrolyzed; neither starch nor hippurate was hydrolyzed. Nitrate was not reduced. Indole was not formed. A solid acid curd formed in milk; neither milk, gelatin, nor meat was digested. Neither catalase, urease, deoxyribonuclease, lecithinase, nor lipase was detected [17]. As determined using the API system, the strain is positive for acid phosphatase, alanine arylamidase, arginine arylamidase, β-galactosidase, leucine arylamidase, pyroglutamic acid arylamidase, glycine arylamidase, and tyrosine arylamidase, but negative for arginine dihydrolase, histidine arylamidase, proline arylamidase, and serine arylamidase [24].

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position, and is part of the Genomic Encyclopedia of Bacteria and Archaea project. The genome project is deposited in the Genome OnLine Database [13] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
A. parvulum strain IPP 1246 T, DSM 20469, was grown anaerobically in DSMZ medium 104 (modified PYG medium) [26] at 37°C. DNA was isolated from 0.5-1 g of cell paste using the JGI CTAB procedure with a modified protocol for cell lysis as described in Wu et al. [27].

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing performed at the JGI can be found on the JGI website. 454 pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 1,716 overlapping fragments of 1,000 bp and entered into the assembly as pseudoreads. The sequences were assigned quality scores based on Newbler consensus q-scores, with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the parallel phrap assembler (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher [28] or transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI). Gaps between contigs were closed by editing in Consed, custom primer walks, or PCR amplification. A total of 125 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. The error rate of the completed genome sequence is less than 1 in 100,000. Together, all sequence types provided 51.2× coverage of the genome. The final assembly contains 12,842 Sanger and 359,479 pyrosequence reads.
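The step in which large Newbler contigs are broken into overlapping 1,000 bp pseudoreads can be illustrated with a minimal Python sketch. This is not the JGI implementation: the function name `shred_contig` and the `step` parameter (and therefore the 100 bp overlap) are assumptions made purely for illustration, since the text states only the fragment length.

```python
from typing import Iterator, Tuple


def shred_contig(
    contig: str, fragment_len: int = 1000, step: int = 900
) -> Iterator[Tuple[int, str]]:
    """Yield (start, fragment) pairs of overlapping pseudoreads from one contig.

    fragment_len follows the 1,000 bp given in the text; step (and thus the
    100 bp overlap between neighbours) is a hypothetical value.
    """
    if not 0 < step <= fragment_len:
        raise ValueError("step must be positive and no larger than fragment_len")
    if not contig:
        return
    if len(contig) <= fragment_len:
        # A contig no longer than one fragment is passed through unchanged.
        yield 0, contig
        return
    start = 0
    while start + fragment_len < len(contig):
        yield start, contig[start:start + fragment_len]
        start += step
    # Anchor the final fragment at the contig end so that no bases are dropped.
    last_start = len(contig) - fragment_len
    yield last_start, contig[last_start:]


if __name__ == "__main__":
    toy_contig = "ACGT" * 625  # 2,500 bp toy sequence, not real data
    for start, fragment in shred_contig(toy_contig):
        print(start, len(fragment))
```

Anchoring the last fragment at the contig end keeps every pseudoread at full length, at the cost of a larger overlap for the final pair; other tiling choices would be equally valid under the description given above.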

Genome annotation
Genes were identified using Prodigal [29] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [30]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [31].

Genome properties
The genome is 1,543,805 bp long and comprises one main circular chromosome with a 45.7% GC content (Table 3 and Figure 3). Of the 1418 genes predicted, 1369 were protein-coding genes and 49 were RNA genes. Sixteen pseudogenes were also identified. The majority of the genes (74.5%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
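For readers who wish to recompute a summary statistic such as the GC content from the deposited sequence, the following Python sketch shows one common way to do so. The function name `gc_content` and the decision to ignore ambiguous bases (e.g., N) are assumptions for illustration; the source does not describe how the reported value was calculated.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among the unambiguous bases A, C, G, T."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Toy sequence for demonstration only, not taken from the genome.
    print(gc_content("GGCCAATT"))
```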