Permanent draft genome sequence of Dethiosulfovibrio peptidovorans type strain (SEBR 4207T)

Dethiosulfovibrio peptidovorans Magot et al. 1997 is the type species of the genus Dethiosulfovibrio of the family Synergistaceae in the recently created phylum Synergistetes. The strictly anaerobic, vibrioid, thiosulfate-reducing bacterium utilizes peptides and amino acids, but neither sugars nor fatty acids. It was isolated from an offshore oil well, where it has been reported to be involved in pitting corrosion of mild steel. Initially, this bacterium was described as a distant relative of the genus Thermoanaerobacter but was not assigned to a genus; it was subsequently placed into the novel phylum Synergistetes. A large number of repeats in the genome sequence prevented an economically justifiable closure of the last gaps. This is only the third published genome from a member of the phylum Synergistetes. The 2,576,359 bp long genome consists of three contigs with 2,458 protein-coding and 59 RNA genes and is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain SEBR 4207T (= DSM 11002 = JCM 15826) is the type strain of the species Dethiosulfovibrio peptidovorans ('curved rod-shaped [vibrio] bacterium that reduces thiosulfate devouring peptides'), which represents the type species of the genus Dethiosulfovibrio [1]. D. peptidovorans strain SEBR 4207T was isolated in 1989 from an offshore oil well in the Congo (Brazzaville) and initially described by Magot et al. in 1997 [1]. The strain provided the first experimental evidence for the involvement of microbial thiosulfate reduction in the corrosion of steel (pitting corrosion). Strain SEBR 4207T utilizes only peptides and amino acids, but no sugars or fatty acids. For the first few years neither the strain nor the genus Dethiosulfovibrio could be assigned to an established higher taxon, except that a distant relationship to the genus Thermanaerovibrio was reported [1]. The taxonomic situation of the species was clarified only recently, when Jumas-Bilak et al. combined several genera of anaerobic, rod-shaped, amino acid-degrading, Gram-negative bacteria into the novel phylum Synergistetes [2]. The phylum Synergistetes contains organisms isolated from humans, animals, terrestrial and oceanic habitats: Thermanaerovibrio, Dethiosulfovibrio, Aminiphilus, Aminobacterium, Aminomonas, Anaerobaculum, Jonquetella, Synergistes and Thermovirga. Given the novelty of the phylum, it is not surprising that many of the type strains from these genera are already subject to genome sequencing projects. Here we present a summary classification and a set of features for D. peptidovorans strain SEBR 4207T, together with the description of the genomic sequencing and annotation.

Classification and features
The 16S rRNA genes of the four other type strains in the genus Dethiosulfovibrio share between 94.2% (D. salsuginis [3]) and 99.2% (D. marinus [4]) sequence identity with strain SEBR 4207T, whereas the other type strains from the family Synergistaceae share 83.6 to 86.6% sequence identity [5]. There are no other cultivated strains that are closely related. Uncultured clones with high sequence similarity to strain SEBR 4207T were identified in a copper-polluted sediment in Chile (clones LC6 and LC23, FJ024724 and FJ024721, 99.1%). Metagenomic surveys and environmental samples based on 16S rRNA gene sequences provide no indication for organisms with sequence similarity values above 88% to D. peptidovorans SEBR 4207T, indicating that members of this species are not abundant in habitats screened thus far. The majority of these 16S rRNA gene sequences with similarity between 88% and 93% originate from marine metagenomes (status July 2010).

Figure 1. Phylogenetic tree showing the position of D. peptidovorans SEBR 4207T relative to the other type strains within the phylum Synergistetes. The tree was inferred from 1,328 aligned characters [6,7] of the 16S rRNA gene sequence under the maximum likelihood criterion [8] and rooted in accordance with the current taxonomy [9]. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1,000 bootstrap replicates if greater than 60%. Lineages with type strain genome sequencing projects registered in GOLD [10] are shown in blue, published genomes in bold [11,12].
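The pairwise 16S rRNA identity values quoted in this section depend on how aligned positions and gaps are counted. The following is a minimal sketch of one common convention (identity over columns where neither sequence has a gap). It is an illustration only: the function name `percent_identity`, the gap handling, and the toy sequences are assumptions, and the procedure actually used for the values above [5] may differ.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise or multiple alignment
    and therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns without a gap in either sequence
    matches = 0   # compared columns with identical residues
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real 16S rRNA data).
toy_a = "ACGT-ACGTA"
toy_b = "ACGTTACGAA"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity over ungapped columns")
```

Other conventions (for example, counting gap columns as mismatches) give lower values for the same alignment, so identity figures from different tools are not always directly comparable.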
Cells of D. peptidovorans SEBR 4207T stain Gram-negative [1]. Cells are vibrioid with pointed or round ends and lateral flagella (Figure 2, flagella not visible) and a size of 3-5 by 1 µm [1] (Table 1). Spores were not detected [1]. Optimal growth was observed at 42°C and pH 7.0 in 3% NaCl [1]. D. peptidovorans is capable of utilizing peptides and amino acids as sole carbon and energy source and can ferment serine and histidine. In the presence of thiosulfate, strain SEBR 4207T is capable of utilizing alanine, arginine, asparagine, glutamate, isoleucine, leucine, methionine and valine as electron donors. The strain is capable of producing acetate, isobutyrate, isovalerate, 2-methylbutyrate, CO2 and H2 from peptides. The strain uses elemental sulfur and thiosulfate, but not sulfate, as electron acceptors. H2S is produced with a decrease in H2. Cells do not have cytochromes or desulfoviridin [1]. When yeast extract was added as sole carbon and energy source together with trypticase, thiosulfate was used as sole electron acceptor. Strain SEBR 4207T was not able to utilize gelatine, casein, arabinose, fructose, galactose, glucose, lactose, maltose, mannose, rhamnose, ribose, sucrose, sorbose, trehalose, xylose, acetate, propionate, butyrate, citrate or lactate.

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position [17], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [18]. The genome project is deposited in the Genome OnLine Database [10] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Altitude: about sea level (NAS).

Evidence codes: IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [16]. If the evidence code is IDA, then the property was observed by one of the authors or an expert mentioned in the acknowledgements.

Chemotaxonomy
None of the classical chemotaxonomic features (peptidoglycan structure, cell wall sugars, cellular fatty acid profile, menaquinones, or polar lipids) are known for D. peptidovorans SEBR 4207T or any of the other members of the genus Dethiosulfovibrio.

Growth conditions and DNA isolation
D. peptidovorans SEBR 4207T, DSM 11002, was grown anaerobically in DSMZ medium 786 (Dethiosulfovibrio peptidovorans Medium) [19] at 42°C. DNA was isolated from 0.5-1 g of cell paste using the Qiagen Genomic 500 DNA Kit (Qiagen, Hilden, Germany) following the protocol recommended by the manufacturer, with modification st/FT for cell lysis as described in Wu et al. [18].

Standards in Genomic Sciences

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website. Pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into overlapping fragments of 1,000 bp and entered into the assembly as pseudoreads. The sequences were assigned quality scores based on Newbler consensus q-scores with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the Arachne assembler. Possible mis-assemblies were corrected and gaps between contigs were closed by primer walks off Sanger clones and bridging PCR fragments and by editing in Consed. A total of 392 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. Illumina reads were used to improve the final consensus quality using an in-house developed tool (the Polisher [20]). The error rate of the final genome sequence is less than 1 in 100,000. Together, the combination of the Sanger and 454 sequencing platforms provided 63.0× coverage of the genome. The final assembly contains 35,314 Sanger reads and 626,193 pyrosequencing reads.
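The step in which large contigs are broken into overlapping 1,000 bp fragments can be pictured with the short sketch below. It is not the JGI implementation: the function name `shred_contig` and, in particular, the overlap of 100 bp are assumptions chosen for illustration, since the text states only the fragment length. Quality-score assignment is not modeled.

```python
def shred_contig(sequence: str, fragment_len: int = 1000, overlap: int = 100) -> list[str]:
    """Split a contig into overlapping fragments ("pseudoreads").

    fragment_len: fragment size in bp (1,000 bp as stated in the text).
    overlap: bases shared by consecutive fragments (hypothetical value;
             the source does not give the overlap used).
    The last fragment may be shorter than fragment_len.
    """
    if fragment_len <= 0:
        raise ValueError("fragment_len must be positive")
    if not 0 <= overlap < fragment_len:
        raise ValueError("overlap must be in [0, fragment_len)")
    if not sequence:
        return []

    step = fragment_len - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(sequence[start:start + fragment_len])
        if start + fragment_len >= len(sequence):
            break  # this fragment reached the end of the contig
        start += step
    return fragments


# Toy contig of 2,500 bp (hypothetical sequence, for illustration only).
toy_contig = "ACGT" * 625
pseudoreads = shred_contig(toy_contig)
print([len(read) for read in pseudoreads])
```

Because consecutive fragments share bases, positions in the overlaps are represented more than once, which is why the text mentions adjusting quality scores for overlap redundancy.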

Genome annotation
Genes were identified using Prodigal [21] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [22]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation was performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [23].

Genome properties
The genome is 2,576,359 bp long and assembled in one large contig and two small contigs (7,415 bp and 1,508 bp) with a 54.0% G+C content (Figure 3 and Table 3). Of the 2,517 genes predicted, 2,458 were protein-coding genes and 59 were RNA genes; no pseudogenes were identified. The majority of the protein-coding genes (75.0%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
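The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above; the derived quantities (size of the large contig, approximate number of genes with a putative function, gene density) are computed here for illustration and are not values reported in the text. Because 75.0% is a rounded percentage, the functional-gene count is only approximate.

```python
# Values stated in the text.
TOTAL_BP = 2_576_359
SMALL_CONTIGS_BP = (7_415, 1_508)
PROTEIN_CODING_GENES = 2_458
RNA_GENES = 59
TOTAL_GENES = 2_517
FRACTION_WITH_FUNCTION = 0.75  # rounded percentage from the text

# Derived quantities (not reported in the text).
large_contig_bp = TOTAL_BP - sum(SMALL_CONTIGS_BP)
genes_sum = PROTEIN_CODING_GENES + RNA_GENES
approx_with_function = round(PROTEIN_CODING_GENES * FRACTION_WITH_FUNCTION)
genes_per_kb = TOTAL_GENES / (TOTAL_BP / 1000)

print(f"large contig: {large_contig_bp:,} bp")
print(f"gene counts consistent: {genes_sum == TOTAL_GENES}")
print(f"genes with putative function (approx.): {approx_with_function:,}")
print(f"gene density: {genes_per_kb:.2f} genes per kb")
```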