Complete genome sequence of Veillonella parvula type strain (Te3T)

Veillonella parvula (Veillon and Zuber 1898) Prévot 1933 is the type species of the genus Veillonella in the family Veillonellaceae within the order Clostridiales. The species V. parvula is of interest because it is frequently isolated from dental plaque in the human oral cavity and can cause opportunistic infections. The species is strictly anaerobic and grows as small cocci which usually occur in pairs. Veillonellae are characterized by their unusual metabolism, which is centered on the activity of the enzyme methylmalonyl-CoA decarboxylase. Strain Te3T, the type strain of the species, was isolated from the human intestinal tract. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the large clostridial family Veillonellaceae, and the 2,132,142 bp long single-replicon genome with its 1,859 protein-coding and 61 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain Prévot Te3T (= DSM 2008 = ATCC 10790 = JCM 12972) is the type strain of the species Veillonella parvula and was first described in 1898 by Veillon and Zuber [1] as "Staphylococcus parvulus" before it was renamed Veillonella parvula by Prévot in 1933 [2]. Although it is a Gram-negative organism harboring lipopolysaccharide [3], it is more closely related to Gram-positive genera such as Sporomusa, Megasphaera or Selenomonas. Together, they share the unusual presence of cadaverine and putrescine in their cell walls [4]. The genus Veillonella comprises 11 species (status July 2009), all of which are known to inhabit the oral cavity and the gastrointestinal tract of homeothermic vertebrates. Six of the species, among them V. parvula, have been isolated from humans; the others are typical for rodents [5]. In general, veillonellae are harmless inhabitants of most body cavities; occasionally, however, they participate in multispecies infections at diverse body sites and in rare cases also cause severe infections as pure cultures [6]. Here we present a summary classification and a set of features for V. parvula Te3T, together with the description of the complete genomic sequencing and annotation.

Classification and features
The natural habitat is human dental plaque, and V. parvula can amount to up to 98% of the cultivable veillonellae in healthy subgingival sites [7]. Additionally, veillonellae are common inhabitants of the gastrointestinal tract. Although the other species of the genus Veillonella are found in large numbers throughout the oral cavity, V. parvula is the only species of the genus involved in oral diseases such as gingivitis. It has also been isolated in rare cases of endocarditis, meningitis, discitis [8] or bacteremia as pure culture, but more often V. parvula is involved in multispecies infections (reviewed in [6]). Medline indexes few cultivated strains with a high degree of 16S rRNA gene sequence similarity to Te3T, e.g. DJF_B315 from porcine intestine (EU728725, Hojberg and Jensen, unpublished, 99.9% identity). The 16S rRNA gene sequence similarity of the other type strains of the genus Veillonella varies from 94.1% (V. ratti) to 99.2% (V. dispar). A vast number of phylotypes with significant 16S rRNA sequence similarity to V. parvula were observed from intubated patients [9], carious dentine from advanced caries (AY995757; 99.7% identity), and the human skin microbiome [10]. Curiously, only one sample from a human gut metagenome analysis [11] scored above 96% sequence similarity in screenings of environmental samples (status September 2009). Figure 1 shows the phylogenetic neighborhood of V. parvula strain Te3T in a 16S rRNA-based tree. The sequences of the four copies of the 16S rRNA gene in the genome differ by up to seven nucleotides, and differ by up to four nucleotides from the previously published sequence generated from ATCC 10790 (AY995767).

Figure 1. Phylogenetic tree highlighting the position of V. parvula strain Te3T relative to all other type strains within the genus Veillonella. The tree was inferred from 1,378 aligned characters [12,13] of the 16S rRNA gene sequence under the maximum likelihood criterion [14]. The tree was rooted with the type strains of other genera within the family Veillonellaceae. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1,000 bootstrap replicates if greater than 60%. Lineages with type strain genome sequencing projects registered in GOLD [15] are shown in blue, published genomes in bold.
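The nucleotide differences and percent identities quoted above are simple column-by-column comparisons of aligned sequences. The following is a minimal sketch of that calculation, not the procedure used by the authors: the function names and the toy sequences are hypothetical, the input is assumed to be already aligned to equal length, and a gap character is counted as a mismatch.

```python
from itertools import combinations


def count_differences(a: str, b: str) -> int:
    """Count mismatching columns between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


def percent_identity(a: str, b: str) -> float:
    """Percentage of alignment columns that are identical."""
    return 100.0 * (len(a) - count_differences(a, b)) / len(a)


def max_pairwise_differences(seqs: dict[str, str]) -> int:
    """Largest number of differences between any two sequences in the set."""
    return max(count_differences(a, b) for a, b in combinations(seqs.values(), 2))


# Toy aligned fragments standing in for four 16S rRNA gene copies
# (illustrative only; these are not the real V. parvula sequences).
copies = {
    "copy_1": "ACGTACGTAC",
    "copy_2": "ACGTACGTAC",
    "copy_3": "ACGAACGTAC",
    "copy_4": "ACGAACGTTC",
}

print(max_pairwise_differences(copies))
print(percent_identity(copies["copy_1"], copies["copy_3"]))
```

Applied to the four real gene copies, `max_pairwise_differences` would correspond to the "up to seven nucleotides" figure, provided the copies are aligned in the same way.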
V. parvula is a Gram-negative, non-motile, non-spore-forming, anaerobic coccus (approximately 0.3 to 0.5 µm in diameter) that grows in pairs or short chains (Table 1 and Figure 2). Veillonellae are characterized by an unusual metabolism using methylmalonyl-CoA decarboxylase to convert the free energy derived from decarboxylation reactions into an electrochemical gradient of sodium ions [29]. They utilize the metabolic end products of co-existing carbohydrate-fermenting bacteria, i.e. lactic acid bacteria in the gastrointestinal tract, and thereby play an important role in a natural microbial food chain [6]. Another characteristic trait of veillonellae is their ability to form intergeneric coaggregates with other bacteria which occur in the same ecological niche [30]. Although Veillonella cannot adhere to surfaces itself, the bacterium is able to attach to specific surface structures present on other cells, often mediated by lectin-carbohydrate interactions [31]. The coaggregation creates a functional community providing nutrients and protection for all participants. Strain Te3T produces propionic and acetic acid, carbon dioxide and hydrogen from lactate and other organic acids such as pyruvate, malate or fumarate. V. parvula cannot grow on succinate as a sole carbon source but can decarboxylate succinate during fermentation of lactate or malate [25]. Veillonellae are unable to use glucose or other carbohydrates for fermentation [26] and they do not possess a functional hexokinase [24]. Nitrate is reduced and arginine dihydrolase is produced. Veillonellae show resistance to tetracycline (>25 µg/ml), erythromycin (>25 µg/ml), gentamicin (>25 µg/ml) and kanamycin (>25 µg/ml), and they are susceptible to penicillin G (0.4 µg/ml), cephalotin (1.6 µg/ml) and clindamycin (0.1 µg/ml). Their resistance is intermediate for chloramphenicol (3.1 µg/ml) and lincomycin (6.2 µg/ml) [32].

Chemotaxonomy
The cell wall of V. parvula comprises an outer membrane, clearly demonstrating the presence of lipopolysaccharide [33]. The peptidoglycan of veillonellae is of the A1γ-type with glutamic acid in D configuration, diaminopimelic acid in meso configuration and covalently bound cadaverine or putrescine attached in α-linkage to glutamic acid [34]. The major fatty acids synthesized are straight-chain saturated C13:0 (24%), C15:0 (12%) and C16:0 (7%) and unsaturated C16:1 (5%), C17:1 (22%) and C18:1 (6%) [35]. Another characteristic feature of V. parvula is the presence of plasmalogens such as plasmenylethanolamine and plasmenylserine as major constituents of the cytoplasmic membrane. These ether lipids replace phospholipids and play an important role in the regulation of membrane fluidity [36].

Altitude not reported. Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [28]. If the evidence code is IDA, then the property was observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position, and is part of the Genomic Encyclopedia of Bacteria and Archaea project. The genome project is deposited in the Genome OnLine Database [15] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
V. parvula strain Te3T, DSM 2008, was grown anaerobically in DSMZ medium 104 (modified PYG-Medium, with the addition of lactate and putrescine) at 37°C [37]. DNA was isolated from 1.5-2 g of cell paste using the Qiagen Genomic 500 DNA Kit (Qiagen, Hilden, Germany) following the manufacturer's protocol, with cell lysis protocol L as described in Wu et al. [38]. Standards in Genomic Sciences

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov/. The 454 pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 1,716 overlapping fragments of 1,000 bp and entered into the assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores, with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the parallel phrap assembler (High Performance Software, LLC). Possible misassemblies were corrected with Dupfinisher [39] or transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI). Gaps between contigs were closed by editing in Consed, custom primer walks or PCR amplification. A total of 1,082 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. The error rate of the completed genome sequence is less than 1 in 100,000. Together, all sequence types provided 51.2× coverage of the genome. The final assembly contains 16,169 Sanger and 445,271 pyrosequence reads.
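The step of breaking large contigs into overlapping 1,000 bp pseudo-reads can be illustrated with a short sketch. This is not the JGI implementation: the function name `shred_contig` is hypothetical, and the 500 bp step between fragment starts is an assumption, because the text gives the fragment length but not the amount of overlap.

```python
def shred_contig(contig: str, fragment_len: int = 1000, step: int = 500) -> list[str]:
    """Split a contig into overlapping fixed-length fragments (pseudo-reads).

    fragment_len follows the 1,000 bp stated in the text; step (the distance
    between fragment starts) is an assumed value, not taken from the source.
    """
    if fragment_len <= 0 or not 0 < step <= fragment_len:
        raise ValueError("need fragment_len > 0 and 0 < step <= fragment_len")
    if len(contig) <= fragment_len:
        return [contig]  # short contigs are kept whole
    last_start = len(contig) - fragment_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # extra fragment so the contig end is covered
    return [contig[s:s + fragment_len] for s in starts]


# Toy 2,300 bp contig (illustrative sequence only).
contig = "ACGT" * 575
pseudo_reads = shred_contig(contig)
print(len(pseudo_reads), {len(r) for r in pseudo_reads})
```

With a step smaller than the fragment length, neighbouring pseudo-reads share sequence, which is the redundancy the adjusted quality scores described above have to account for.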

Genome annotation
Genes were identified using Prodigal [40] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline (http://geneprimp.jgi-psf.org) [41]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform (http://img.jgi.doe.gov/er) [42].
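Translating predicted CDSs before the database searches is a codon-by-codon lookup. The sketch below shows the idea using the standard codon assignments, which the bacterial genetic code shares. It is an illustration only, not the pipeline's code: `translate_cds` is a hypothetical name, alternative start codons are not converted to methionine, and codons containing ambiguous bases are reported as `X`.

```python
from itertools import product

# Codon table built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    protein = []
    for i in range(0, len(cds), 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCAAATAA"))
```

The resulting protein strings are what would be submitted to the similarity searches listed above.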

Genome properties
The genome is 2,132,142 bp long and comprises one main circular chromosome with a 38.6% GC content (Table 3 and Figure 3). Of the 1,920 genes predicted, 1,859 were protein-coding genes and 61 were RNA genes; 15 pseudogenes were also identified. The majority (73.6%) of the genes were assigned a putative function, while those remaining were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
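The summary figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text. The helper `gc_content` is a hypothetical illustration of how a GC percentage is computed from a sequence and is run here on a toy string, not on the genome itself.

```python
GENOME_LENGTH_BP = 2_132_142
TOTAL_GENES = 1_920
PROTEIN_CODING_GENES = 1_859
RNA_GENES = 61


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


# The two gene classes should add up to the predicted total.
assert PROTEIN_CODING_GENES + RNA_GENES == TOTAL_GENES

protein_coding_pct = 100.0 * PROTEIN_CODING_GENES / TOTAL_GENES
genes_per_kb = TOTAL_GENES / (GENOME_LENGTH_BP / 1000)

print(f"protein-coding share of genes: {protein_coding_pct:.1f}%")
print(f"gene density: {genes_per_kb:.2f} genes per kb")
print(f"GC content of toy sequence: {gc_content('GGCCAATT'):.1f}%")
```

The gene counts are internally consistent (1,859 + 61 = 1,920), and the density works out to roughly 0.9 genes per kb.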