Complete genome sequence of Desulfarculus baarsii type strain (2st14T)

Desulfarculus baarsii (Widdel 1981) Kuever et al. 2006 is the type and only species of the genus Desulfarculus, which represents the family Desulfarculaceae and the order Desulfarculales. This species is a mesophilic sulfate-reducing bacterium with the capability to oxidize acetate and fatty acids of up to 18 carbon atoms completely to CO2. The acetyl-CoA/CODH (Wood-Ljungdahl) pathway is used by this species for the complete oxidation of carbon sources and autotrophic growth on formate. The type strain 2st14T was isolated from a ditch sediment collected near the University of Konstanz, Germany. This is the first completed genome sequence of a member of the order Desulfarculales. The 3,655,731 bp long single replicon genome with its 3,303 protein-coding and 52 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Most sulfate-reducing bacteria available in pure culture oxidize organic electron donors incompletely to acetate, whereas species that oxidize acetate and other carbon compounds completely to CO2, using sulfate as an electron acceptor, are less frequently isolated. Sulfate reducers with the latter type of metabolism are of special interest, because they are assumed to be dominant in anoxic marine sediments [1]. Sulfate-reducing prokaryotes with the ability to mineralize organic compounds to CO2 are phylogenetically dispersed and can be found within the Proteobacteria, Firmicutes and Euryarchaeota. At the time of writing, representatives of this type of metabolism for which a completely sequenced genome exists include Desulfobacterium autotrophicum [2], Desulfotomaculum acetoxidans [3] and Archaeoglobus fulgidus [4]. In the present work, the complete genome sequence of Desulfarculus baarsii, a completely oxidizing sulfate-reducing bacterium representing the order Desulfarculales within the Deltaproteobacteria, was determined. The original description of D. baarsii was based on strain 1st1 (= "Göttingen") [5], which was probably subsequently lost and replaced by the designated type strain 2st14T (= "Konstanz") [6]. Strain 2st14T (= DSM 2075 = ATCC 33931 = LMG 7858) was enriched from anoxic mud from a ditch near the University of Konstanz, Germany, in a medium supplemented with stearate and sulfate and subsequently isolated in an anaerobic agar dilution series with formate plus sulfate [7,8]. D. baarsii strain 2st14T is the first member of the family Desulfarculaceae within the order Desulfarculales with a sequenced genome. The presented sequence data will enable genome comparisons with other sulfate-reducing bacteria of the class Deltaproteobacteria.

Classification and features
The species D. baarsii represents a separate lineage within the Deltaproteobacteria which is only distantly related to most other members of this class. The closest relatives based on 16S rRNA gene sequence similarity values are the type strains of Desulfomonile tiedjei (87.6% sequence identity) and Desulfomonile liminaris (87.2%), both belonging to the family Syntrophaceae within the order Syntrophobacterales [9]. The most similar cloned 16S rRNA gene, EUB-42 [10], shared only 95.5% sequence similarity with D. baarsii and was retrieved from anaerobic sludge. Strain 2st14T represents the only strain of this species available from a culture collection thus far. Currently available data from cultivation-independent studies (environmental screening and genomic surveys) did not surpass 86% sequence similarity, indicating that members of this species are restricted to distinct habitats which are currently undersampled in most environments or are in low abundance (status October 2010). The single genomic 16S rRNA sequence of strain 2st14T was compared using BLAST with the most recent release of the Greengenes database [11] and the relative frequencies of taxa and keywords, weighted by BLAST scores, were determined. The five most frequent genera were Desulfovibrio (43.3%), Syntrophobacter (14.4%), Desulfomonile (11.8%), Desulfarculus (9.6%) and Desulfatibacillum (7.5%). The species yielding the highest score was D. baarsii. The five most frequent keywords within the labels of environmental samples which yielded hits were 'sediment' (4.5%), 'microbial' (4.5%), 'lake' (1.7%), 'depth' (1.7%) and 'sea' (1.6%). Environmental samples which yielded hits of a higher score than the highest scoring species were not found. Figure 1 shows the phylogenetic neighborhood of D. baarsii 2st14T in a 16S rRNA based tree.
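The score-weighted frequencies described above can be illustrated with a minimal sketch. It assumes that each BLAST hit contributes its score to the label (genus or keyword) it carries and that frequencies are reported as a percentage of the summed scores; the exact weighting used in the study is not specified beyond "weighted by BLAST scores". The function name and the example hits are hypothetical and are not data from this work.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return {label: percent of total score} for (label, score) pairs.

    Assumption: each hit adds its BLAST score to the running total of
    its label, and the totals are normalized to percentages.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * s / grand_total for label, s in totals.items()}


# Hypothetical (genus, score) pairs, for illustration only.
example_hits = [
    ("Desulfovibrio", 900.0),
    ("Desulfovibrio", 600.0),
    ("Syntrophobacter", 500.0),
]
freqs = weighted_frequencies(example_hits)
for genus, percent in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{genus}: {percent:.1f}%")
```

Under this scheme a genus with a few high-scoring hits can outrank a genus with many low-scoring hits, which is why the most frequent genus in the list above need not contain the best-scoring species.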
The sequence of the single 16S rRNA gene in the genome differs by one nucleotide from the previously published 16S rRNA gene sequence generated from DSM 2075 (AF418174), which contains five ambiguous base calls. GenBank entry M34403 from 1989 is also annotated as the 16S rRNA sequence of strain 2st14T, but differs in 45 positions (3.2%) from the actual sequence. This difference probably reflects progress in sequencing technology rather than biological differences.
Figure 1. Phylogenetic tree showing the position of D. baarsii 2st14T. The tree was inferred from [...] [12,13] of the 16S rRNA gene sequence under the maximum likelihood criterion [14] and rooted in accordance with the current taxonomy. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1,000 bootstrap replicates [15] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [16] are shown in blue, published genomes [17] and INSDC accession CP000478 for Syntrophobacter fumaroxidans in bold.
Table 1 (footnote). Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [26]. If the evidence code is IDA, then the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.
The cells of D. baarsii 2st14T are vibrioid, Gram-negative and 0.5-0.7 by 1.5-4 µm in size (Figure 2, Table 1). Motility is conferred by a single polar flagellum (not visible in Figure 2) [5]. The temperature range for growth is 20-39°C with an optimum around 35°C. The pH range for growth is 6.5-8.2, with an optimum at 7.3. The strain grows optimally in the presence of 7-20 g/l NaCl and 1.2-3 g/l MgCl2 × 6 H2O, but growth is nearly as rapid at lower concentrations [7]. D. baarsii strain 2st14T is a strictly anaerobic, nonfermentative, chemoorganotrophic sulfate reducer that oxidizes organic substrates completely to CO2. Sulfate, sulfite and thiosulfate serve as terminal electron acceptors and are reduced to H2S, but sulfur, fumarate and nitrate cannot be utilized. The following compounds are utilized as electron donors and carbon sources: formate, acetate, propionate, butyrate, iso-butyrate, 2-methylbutyrate, valerate, iso-valerate, and higher fatty acids up to 18 carbon atoms. Growth on formate does not require an additional organic carbon source [5,7]. A high activity of carbon monoxide dehydrogenase is observed in D. baarsii, indicating the operation of the anaerobic C1-pathway (Wood-Ljungdahl pathway) for formate assimilation and CO2 fixation or complete oxidation of acetyl-CoA [27]. The oxygen detoxification system of D. baarsii was analyzed in some detail. It was shown that a genomic region encoding a putative rubredoxin oxidoreductase (rbo) and rubredoxin (rub) of D. baarsii is able to suppress deleterious effects of reactive oxygen species (ROS) in Escherichia coli mutants lacking superoxide dismutase [28]. The cloned genes were identified in the whole genome sequence as Deba_2049 (rub) and Deba_2050 (rbo) and found in close proximity to a gene encoding rubrerythrin (Deba_2051), which is thought to play an important role in the oxygen tolerance of anaerobic bacteria [29]. The product of the recombinant rbo gene of D. baarsii was later further characterized and designated as desulfoferrodoxin (Dfx), because no evidence for a rubredoxin oxidoreductase activity could be demonstrated. Instead, a function as superoxide reductase was proposed [30].

Genome sequencing and annotation
Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position [32], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [33]. The genome project is deposited in the Genome OnLine Database [16] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.
DNA was isolated from 0.5-1 g of cell paste using the Jetflex Genomic DNA Purification Kit (GENOMED 600100) following the manufacturer's instructions, but with 30 min incubation at 58°C with an additional 10 µl proteinase K for cell lysis.

Genome sequencing and assembly
The genome was sequenced using a combination of Illumina and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website [35]. Pyrosequencing reads were assembled using the Newbler assembler version 2.1-PreRelease-4-28-2009-gcc-3.4.6-threads (Roche). The initial Newbler assembly consisted of 42 contigs in two scaffolds and was converted into a phrap assembly by making fake reads from the consensus and collecting the read pairs in the 454 paired end library. Illumina GAii sequencing data (267.7 Mb) were assembled with Velvet [36] and the consensus sequences were shredded into 1.5 kb overlapped fake reads and assembled together with the 454 data. The 454 draft assembly was based on 157.7 Mb 454 draft data and all of the 454 paired end data. Newbler parameters were -consed -a 50 -l 350 -g -m -ml 20. The Phred/Phrap/Consed software package [37] was used for sequence assembly and quality assessment in the following finishing process. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible misassemblies were corrected with gapResolution [35], Dupfinisher, or sequencing cloned bridging PCR fragments with subcloning or transposon bombing (Epicentre Biotechnologies, Madison, WI) [38]. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks (J.-F. Chang, unpublished). A total of 344 additional reactions were necessary to close gaps and to raise the quality of the finished sequence. Illumina reads were also used to correct potential base errors and increase consensus quality using the Polisher software developed at JGI [39]. The error rate of the completed genome sequence is less than 1 in 100,000.
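The shredding of consensus sequences into 1.5 kb overlapping fake reads can be sketched as follows. This is an illustration only, not the JGI implementation: the text gives the read length (1.5 kb) but not the overlap, so the 500 bp overlap used here is an assumption, as are the function name and the toy input sequence.

```python
def shred(consensus, read_len=1500, overlap=500):
    """Cut a consensus sequence into overlapping fake reads.

    read_len follows the 1.5 kb stated in the text; overlap is a
    hypothetical value chosen for illustration.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be >= 0 and smaller than read_len")
    if not consensus:
        return []
    step = read_len - overlap
    reads = []
    start = 0
    while True:
        reads.append(consensus[start:start + read_len])
        # Stop once the current read reaches the end of the consensus.
        if start + read_len >= len(consensus):
            break
        start += step
    return reads


# Toy 4,000 bp consensus; real input would be a Velvet contig.
toy_consensus = "ACGT" * 1000
fake_reads = shred(toy_consensus)
print(len(fake_reads), [len(r) for r in fake_reads])
```

Because consecutive reads share their ends, every base of the consensus is covered and adjacent fake reads can be joined again by an overlap-based assembler such as the one used for the 454 data.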

Genome annotation
Genes were identified using Prodigal [40] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [41]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation was performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [42].

Genome properties
The genome is 3,655,731 bp long and comprises one main circular chromosome with an overall GC content of 65.7% (Table 3 and Figure 3). Of the 3,355 genes predicted, 3,303 were protein-coding genes and 52 were RNA genes; 26 pseudogenes were also identified. The majority of the protein-coding genes (73.4%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
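The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the count of genes with a putative function is back-calculated from the rounded 73.4% and is therefore approximate and may differ slightly from the value in Table 3. The helper gc_content and the short test sequence are hypothetical and only illustrate how a GC percentage is defined.

```python
# Values quoted in the text.
genome_bp = 3_655_731
total_genes = 3355
protein_coding = 3303
rna_genes = 52
function_fraction = 0.734  # rounded percentage from the text

# The gene classes should add up to the total.
assert protein_coding + rna_genes == total_genes

# Approximate counts derived from the rounded percentage.
with_function = round(protein_coding * function_fraction)
hypothetical = protein_coding - with_function

# Gene density in genes per kilobase.
gene_density = total_genes / (genome_bp / 1000)


def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


print(with_function, hypothetical, round(gene_density, 3))
print(round(gc_content("GGCCAT"), 1))  # toy sequence, not genome data
```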