Permanent draft genome sequence of Comamonas testosteroni KF-1

Comamonas testosteroni KF-1 is a model organism for the elucidation of novel biochemical degradation pathways for xenobiotic 4-sulfophenylcarboxylates (SPC) formed during the biodegradation of synthetic 4-sulfophenylalkane surfactants (linear alkylbenzenesulfonates, LAS) by bacterial communities. Here we describe the features of this organism, together with its draft genome sequence and annotation. The 6,026,527-bp chromosome (one sequencing gap) has an average G+C content of 61.79% and is predicted to encode 5,492 protein-coding genes and 114 RNA genes.
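The G+C content quoted above is the proportion of G and C among the bases of the chromosome. The following is a minimal sketch of that calculation. Ignoring ambiguous bases such as N (for example, bases in an unresolved gap) is an assumption of this sketch; the published value may have been computed with a different convention.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage.

    Assumption of this sketch: only unambiguous bases A, C, G, T are
    counted; ambiguity codes such as N are skipped entirely.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Toy sequence, not taken from the KF-1 genome
    print(round(gc_content("ATGCGCGGCCNNAT"), 2))  # prints 66.67
```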


Introduction
Comamonas testosteroni strain KF-1 (DSM14576) was isolated for its ability to degrade xenobiotic sulfophenylcarboxylates (SPC), which are degradation intermediates of the synthetic laundry surfactants linear alkylbenzenesulfonates (LAS) [1]. LAS is used worldwide (approximately 3 million tons per year [2]) and consists of a complex mixture of linear alkanes (C10-C13) sub-terminally substituted by 4-sulfophenyl rings (i.e., 38 different compounds) [2]. As has been known for more than 50 years [3], commercial LAS is completely biodegradable, e.g., in sewage treatment plants, and its degradation is catalyzed by heterotrophic aerobic bacterial communities in two steps. First, bacteria such as Parvibaculum lavamentivorans DS-1T [4] catalyze an initial degradation step by activating and shortening the alkyl chains of LAS, and these organisms excrete many short-chain degradation intermediates, namely approximately 50 different SPCs and related compounds [1,5-8]. Second, the ultimate degradation step, i.e., mineralization of all SPCs, is catalyzed by other bacteria in the community; one representative of these is Comamonas testosteroni KF-1. Specifically, strain KF-1 was isolated from a laboratory trickling filter that had been used to enrich a bacterial community from sewage sludge that completely degraded commercial LAS and SPCs [1,6]. Strain KF-1 is able to utilize four individual SPCs (both enantiomers), namely R/S-3-(4-sulfophenyl)butyrate (3-C4-SPC), enoyl-3-C4-SPC, R/S-3-(4-sulfophenyl)pentanoate (3-C5-SPC), and enoyl-3-C5-SPC (see also below), as novel carbon and energy sources for heterotrophic aerobic growth [1,9,10]. The first Comamonas testosteroni strain (formerly Pseudomonas testosteroni [11]), the type strain ATCC 11996, was enriched from soil and isolated in 1952 for its ability to degrade testosterone [12,13].
Since then, the physiology, biochemistry, genetics, and regulation of steroid degradation in this and in other C. testosteroni strains have been elucidated in great detail [e.g., 14-21]. Most recently, the genome of C. testosteroni ATCC 11996T has been sequenced to further improve understanding of the molecular basis of steroid degradation [22].

Standards in Genomic Sciences

In the environment, members of the genus Comamonas may also be important degraders of aromatic compounds other than steroids, especially of xenobiotic pollutants, since they have frequently been enriched and isolated for their ability to utilize (xenobiotic) aromatic compounds. For example, Comamonas sp. strain JS46 is able to grow with 3-nitrobenzoate [23], Comamonas sp. strain CNB-1 with 4-chloronitrobenzene [24], C. testosteroni T-2 with 4-toluenesulfonate and 4-sulfobenzoate [25], C. testosteroni WDL7 with chloroaniline [26], Comamonas sp. strain JS765 with nitrobenzene [27], Comamonas sp. strain B-9 with lignin-polymer fragments [28], C. testosteroni B-356 with biphenyl and 4-chlorobiphenyl [29], Comamonas sp. strain KD-7 with dibenzofuran [30], Comamonas sp. strain 4BC with naphthalene-2-sulfonate [31], and C. testosteroni SPB-2 (as well as strain KF-1) with 4-sulfophenylcarboxylates [1]. In several C. testosteroni strains, the physiology, biochemistry, genetics, and/or regulation of the utilization of aromatic compounds have been elucidated [e.g., 10,23,25,27,29,32-48]. Furthermore, the genome sequence of the plasmid-cured C. testosteroni CNB-2 [24] and the sequence of plasmid pCNB1 of C. testosteroni CNB-1 [49] have been published, to further improve understanding of the molecular basis for the ability of C. testosteroni to degrade such a large array of aromatic compounds.
Members of the genus Comamonas are able to cope with harsh environmental conditions such as high concentrations of arsenate [50,51], zinc [52], cobalt and nickel [53], or phenol [54], and can exhibit increased resistance to oxidative stress [55] or antibiotics [56]. Another C. testosteroni genome, that of strain S44, has recently been sequenced to improve understanding of the molecular basis of its resistance to elevated zinc concentrations [52]. Notably, increased antibiotic resistance (and enhanced insecticide catabolism) as a consequence of induction of the steroid degradation pathway has been shown for C. testosteroni ATCC 11996T [56]. Here, we present a summary classification and a set of features for another C. testosteroni strain, KF-1, together with the description of its draft genome sequence and annotation. Strain KF-1 was sequenced to improve understanding of the molecular basis of its ability to degrade xenobiotic compounds, particularly the chiral xenobiotic 3-C4-SPC, and of how this novel degradation pathway has been assembled in this organism. The genome sequence and its annotation were established as part of the Microbial Genomics Program 2006 of the DOE Joint Genome Institute and are accessible via the IMG platform [57].

Phylogeny
Based on its 16S rRNA gene sequence, strain KF-1 is a member of the genus Comamonas, which is placed in the family Comamonadaceae within the order Burkholderiales of the class Betaproteobacteria, as illustrated by the phylogenetic tree shown in Figure 2. As of May 2013, 686 genome sequences of members of the order Burkholderiales, 147 of them within the family Comamonadaceae, had been completed, were in progress, or were targeted for sequencing (GOLD database).

Table 1 note [70]: If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Figure 2. Phylogenetic tree based on 16S rRNA gene sequences, showing the position of C. testosteroni KF-1. The 16S rRNA gene alignment included the three other C. testosteroni strains whose genome sequences have been published (strain S44 [52], strain CNB-2 [24], and type strain ATCC 11996 [22]) and some other genome-sequenced representatives of the family Comamonadaceae or of other families within the order Burkholderiales. The corresponding genome-project accession numbers, or 16S rRNA gene accession numbers, are indicated. "T" indicates a type strain. The sequences were aligned using the RDP tree builder [76] and displayed using MEGA4 [77]. Bootstrap values are indicated; bar, 0.02 substitutions per nucleotide position.
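The scale bar of the tree is expressed in substitutions per nucleotide position. To illustrate what such a branch length means, the sketch below computes the uncorrected p-distance between two aligned sequences and the textbook Jukes-Cantor (JC69) correction for multiple substitutions at the same site. The text does not state which distance model or tree algorithm the RDP tree builder applied, so this is a generic illustration rather than the method behind Figure 2; skipping gapped columns is an assumption of this sketch.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: alignment columns with a gap ('-') in either sequence
    are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(a: str, b: str) -> float:
    """JC69 estimate of substitutions per site: d = -3/4 ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Two toy 100-site sequences differing at 2 sites (p = 0.02)
    seq1 = "ACGT" * 25
    seq2 = "TCGT" + "ACGT" * 23 + "ACGA"
    print(p_distance(seq1, seq2))
    print(round(jukes_cantor(seq1, seq2), 4))  # prints 0.0203
```

Because JC69 accounts for repeated substitutions at a site, the corrected distance is always slightly larger than the raw proportion of differences; for closely related 16S rRNA genes the two values are nearly equal.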

Genome sequencing information
Genome project history
The genome was selected for sequencing as part of the U.S. Department of Energy Microbial Genomics Program 2006. The DNA sample was submitted in February 2006, and the initial sequencing phase was completed in July 2006. After the finishing and assembly phase, the genome was made publicly available in January 2009; a modified version was released in IMG in August 2011. Table 2 presents the project information and its association with MIGS version 2.0 compliance [78].

Growth conditions and DNA isolation
Comamonas testosteroni KF-1 (DSM14576), obtained from the Leibniz-Institut DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen, was grown on LB agar plates and transferred into selective medium (mineral salts medium with 6 mM 4-sulfophenol) at 3-ml scale. This culture was then subcultured at larger scale, and cell pellets were stored frozen until DNA preparation. DNA was prepared following the JGI's DNA Isolation Bacterial CTAB Protocol.

Genome sequencing and assembly
The genome of Comamonas testosteroni KF-1 was sequenced at the Joint Genome Institute (JGI) using a combination of 3.5-kb, 9-kb, and 37-kb DNA libraries. Details of library construction and sequencing at the JGI can be found on the JGI website [79]. In total, 66.91 Mbp of Sanger sequence data were generated from all three libraries for the assembly, providing 12.8-fold coverage of the genome. The Phred/Phrap/Consed software package was used for sequence assembly and quality assessment [80-82]. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher [83], PCR amplification, or transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI, USA). Gaps between contigs were closed by editing in Consed, custom primer walks, or PCR amplification (Roche Applied Science, Indianapolis, IN, USA). The genome could not be closed because of clone viability issues. Although several clones circularized the contig and a PCR product spanning the contig ends was obtained, all attempts at primer walking and at transforming the amplicon were unsuccessful. No additional work is planned for this project, which is therefore labeled as a Permanent Draft (one linear contig).
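Fold coverage is conventionally defined as the total number of sequenced bases divided by the genome length. The sketch below shows that definition, together with the idealized Poisson (Lander-Waterman) probability that a given base is covered by no read at all, e^(-c) for coverage c. Both formulas assume reads are placed uniformly at random; the function names are illustrative only.

```python
import math


def fold_coverage(total_bases: float, genome_length: int) -> float:
    """Average sequencing depth: total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return total_bases / genome_length


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson (Lander-Waterman) probability that a given base is not
    covered by any read, assuming uniformly random read placement."""
    return math.exp(-coverage)


if __name__ == "__main__":
    genome_bp = 6_026_527   # chromosome length stated in the abstract
    sanger_bases = 66.91e6  # total Sanger data stated in the text
    c = fold_coverage(sanger_bases, genome_bp)
    print(f"naive coverage: {c:.1f}-fold")
    # Expected number of unsampled bases at the reported 12.8-fold coverage
    uncovered = genome_bp * expected_uncovered_fraction(12.8)
    print(f"expected uncovered bp at 12.8-fold: {uncovered:.0f}")
```

Applied to the figures in the text, the naive ratio is about 11.1-fold rather than the reported 12.8-fold. The text does not say which reads or genome-size estimate the reported value is based on, so this sketch illustrates only the definition. Under the idealized model, 12.8-fold coverage would leave only about 17 bases unsampled in expectation. Clone libraries are not sampled uniformly, however, and unclonable regions violate this assumption, which is consistent with the remaining gap being attributed to clone viability rather than to insufficient depth.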

Genome annotation
Genes were identified using Prodigal [84] as part of the genome annotation pipeline at Oak Ridge National Laboratory (ORNL), Oak Ridge, TN, USA, followed by a round of manual curation using the JGI GenePRIMP pipeline [85]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [86], RNAMMer [87], Rfam [88], TMHMM [89], and signalP [90]. Additional gene prediction analysis and manual functional annotation was performed within the Integrated Microbial Genomes (IMG) platform [91] developed by the Joint Genome Institute, Walnut Creek, CA, USA [92].