Complete genome sequence of Parvibaculum lavamentivorans type strain (DS-1T)

Parvibaculum lavamentivorans DS-1T is the type species of the novel genus Parvibaculum in the novel family Rhodobiaceae (formerly Phyllobacteriaceae) of the order Rhizobiales of Alphaproteobacteria. Strain DS-1T is a non-pigmented, aerobic, heterotrophic bacterium and represents the first tier member of environmentally important bacterial communities that catalyze the complete degradation of synthetic laundry surfactants. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,914,745 bp long genome with its predicted 3,654 protein coding genes is the first completed genome sequence of the genus Parvibaculum, and the first genome sequence of a representative of the family Rhodobiaceae.


Introduction
Parvibaculum lavamentivorans strain DS-1T (DSM13023 = NCIMB13966) was isolated for its ability to degrade linear alkylbenzenesulfonate (LAS), a major laundry surfactant with a worldwide use of 2.5 million tons per annum [1]. Strain DS-1T was difficult to isolate, is difficult to cultivate, and represents a novel genus in the Alphaproteobacteria [2,3]. Strain DS-1T catalyzes the degradation not only of LAS but also of 16 other commercially important anionic and non-ionic surfactants (hence the species name lavamentivorans = consuming [chemicals] used for washing [3]). The initial degradation catalyzed by strain DS-1T involves the activation and shortening of the alkyl chain of the surfactant molecules, and the excretion of short-chain degradation intermediates. These intermediates are then completely utilized by other bacteria in the community [4,5]. P. lavamentivorans DS-1T is therefore an example of a first tier member of a two-step process that mineralizes environmentally important surfactants. Other representatives of the novel genus Parvibaculum have recently been isolated. Parvibaculum sp. strain JP-57 was isolated from seawater [6] and is also difficult to cultivate [3]. Parvibaculum indicum sp. nov. was also isolated from seawater, via an enrichment culture that degraded polycyclic aromatic hydrocarbons (PAH) and crude oil [7]. Another Parvibaculum sp. strain was isolated from a PAH-degrading enrichment culture, using river sediment as inoculum [8]. Parvibaculum species were also reported in a study on marine alkane-degrading bacteria [9]. Parvibaculum species are frequently detected by cultivation-independent methods, predominantly in habitats or settings with hydrocarbon degradation.
These include a bacterial community on marine rocks polluted with diesel oil [10], a bacterial community from diesel-contaminated soil [11], a petroleum-degrading bacterial community from seawater [12], an oil-degrading cyanobacterial community [13] and biofilm communities in pipes of a district heating system [14]. Parvibaculum species have also been detected in denitrifying, linear-nonylphenol (NP) degrading enrichment cultures from NP-polluted river sediment [15] and in groundwater that had been contaminated by linear alkyl benzenes (LABs; non-sulfonated LAS) [16]. Additionally, Parvibaculum species were detected in biofilms that degraded polychlorinated biphenyls (PCBs) using pristine soil as inoculum [17], and in a PAH-degrading bacterial community from deep-sea sediment of the West Pacific [18]. Finally, Parvibaculum species were detected in an autotrophic Fe(II)-oxidizing, nitrate-reducing enrichment culture [19], as well as in Tunisian geothermal springs [20]. The widespread occurrence of Parvibaculum species in habitats or settings related to hydrocarbon degradation implies an important function and role of these organisms in environmental biodegradation, even though they are difficult to cultivate in the laboratory. Here we present a summary classification and a set of features for P. lavamentivorans DS-1T, together with the description of a complete genome sequence and annotation. The genome sequencing and analysis was part of the Microbial Genome Program of the DOE Joint Genome Institute.

Classification and features
P. lavamentivorans DS-1T is a Gram-negative, non-pigmented, very small (approx. 1.0 × 0.2 µm), slightly curved rod-shaped bacterium that can be motile by means of a polar flagellum (Figure 1, Table 1). Strain DS-1T grows very slowly on complex medium (e.g. on LB- or peptone-agar plates) and forms pinpoint colonies only after more than two weeks of incubation. The organism can be quickly overgrown by other organisms. Larger colonies are obtained when the complex medium is supplemented with a surfactant, e.g. Tween 20 (see DSM-medium 884 [29]) or LAS [3]. When cultivated in liquid culture with mineral-salts medium, strain DS-1T grows within one week with the single carbon sources acetate, ethanol, or succinate, or alkanes, alkanols and alkanoates (C8-C16); no sugars tested were utilized [3]. To allow for growth in liquid culture with most of the 16 different surfactants at high concentrations (e.g. for LAS, >1 mM; see [3]), the culture fluid needs to be supplemented with a solid surface, e.g. polyester fleece or glass fibers [2,3]. The additional solid surface is believed to support biofilm formation, especially in the early growth phase when the surfactant concentration is high, whereas the organism grows as single, suspended, non-motile cells during the later growth phase. Growth with a substrate that is not membrane-toxic (e.g. acetate) is independent of a solid surface and yields suspended, single, motile cells. We presume that the biofilm formation by strain DS-1T is a protective response to the exposure to membrane-solubilizing agents (cf. [30]). Based on the 16S rRNA gene sequence, strain DS-1T was described as the novel genus Parvibaculum, which was originally placed in the family Phyllobacteriaceae within the order Rhizobiales of Alphaproteobacteria [3,31]. The nearest well-described organism to strain DS-1T is Afifella marina (formerly Rhodobium marinum) (92% 16S rRNA gene sequence identity), a photosynthetic purple non-sulfur bacterium.
The genus Rhodobium was later re-classified as a member of the novel family Rhodobiaceae [26,32], together with two novel genera of other photosynthetic purple non-sulfur bacteria (Afifella and Roseospirillum), as well as with two novel genera of heterotrophic aerobic bacteria, represented by the red-pigmented Anderseniella baltica (gen. nov., sp. nov.) [33,34] and non-pigmented Tepidamorphus gemmatus (gen. nov., sp. nov.) [35,36]. A phylogenetic tree (Figure 2) was constructed with the 16S rRNA gene sequence of P. lavamentivorans DS-1T and that of (i) other isolated Parvibaculum strains, (ii) representatives of other genera within the family Rhodobiaceae, (iii) representatives of the genera in the family Phyllobacteriaceae, as well as (iv) representatives of other families within the order Rhizobiales. The phylogenetic tree now places Parvibaculum species within the family Rhodobiaceae and shows that the Parvibaculum sequences cluster as a distinct evolutionary lineage within this family (Figure 2). This classification of Parvibaculum has been adopted in the Ribosomal Database Project (RDP) and SILVA rRNA Database Project, but not in the GreenGenes database. The family Rhodobiaceae has also not been included in the NCBI-taxonomy, IMG-taxonomy, and GOLD databases. Currently, 360 genome sequences of members of the order Rhizobiales of Alphaproteobacteria have been made available (GOLD database; August 2011), and within the family Phyllobacteriaceae there are 21 genome sequences available (Chelativorans sp. BNC1, Hoeflea phototrophica DFL-43, and 18 Mesorhizobium strains). No genome sequences currently exist for a representative of the novel family Rhodobiaceae, except for that of P. lavamentivorans DS-1T.
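The 16S rRNA gene sequence identity quoted above (92% to Afifella marina) is a pairwise measure taken from an alignment. As a minimal sketch of how such a figure can be computed, the Python function below counts matching positions between two already-aligned sequences. The function name, the toy sequences, and the convention of skipping any column that contains a gap are illustrative assumptions; the text does not state which identity convention was used for the reported value.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: columns with a gap ('-') in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: not counted under this convention
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (gap-free) columns")
    return 100.0 * matches / compared


# Toy alignment (made up for illustration, not real 16S rRNA data).
toy_a = "ACGT-ACGTA"
toy_b = "ACGTTACGTT"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Other conventions (for example, counting gap columns as mismatches) give slightly different percentages, so the convention should be stated whenever identity values are compared.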

Chemotaxonomy
Examination of the respiratory lipoquinone composition of strain DS-1T showed that ubiquinones are the sole respiratory quinones present, and the major lipoquinone is ubiquinone 11 (Q11) [3]. The fatty acids of P. lavamentivorans are straight chain saturated and unsaturated, as well as ester- and amide-linked hydroxy-fatty acids, in membrane fractions [3]. The major polar lipids are phosphatidyl glycerol, diphosphatidyl glycerol, phosphatidyl ethanolamine, phosphatidyl choline, and two unidentified aminolipids; the presence of the two additional aminolipids appears to be distinctive of the organism [3]. The G+C content of the DNA was determined to be 64% [3], which corresponds well to the G+C content observed for the complete genome sequence (see below).
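For a sequenced genome, the G+C content is simply the percentage of G and C bases among all bases. The following Python sketch shows the calculation on a toy sequence; the function name and the example sequence are illustrative assumptions and are not taken from the genome of strain DS-1T.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence (made up): 6 of 10 bases are G or C.
toy_sequence = "GGCCGCATTA"
print(f"{gc_content(toy_sequence):.1f}% G+C")  # 60.0% G+C
```

Applied to a complete chromosome sequence, the same calculation gives the sequence-derived value that can be compared with the experimentally determined one.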

Genome sequencing information
Genome project history
Table 2 presents the project information and its association with MIGS version 2.0 compliance [39].

Figure 2 caption (recovered fragment): The corresponding 16S rRNA gene accession numbers (or draft genome sequence identifiers) are indicated. The sequences were aligned using the GreenGenes NAST alignment tool [37]; neighbor-joining tree building and visualization involved the CLUSTAL and DENDROSCOPE software [38]. Caulobacterales sequences were used as outgroup. Bootstrap values >30% are indicated; bar, 0.01 substitutions per nucleotide position.

Table 1 footnote: a) Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [28].

Genome sequencing and assembly
The genome of P. lavamentivorans DS-1T was sequenced at the Joint Genome Institute (JGI) using a combination of 3.5 kb, 9 kb and 37 kb DNA libraries. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website [41].

Genome annotation
Genes were identified using a combination of Critica [47] and Glimmer [48] as part of the genome annotation pipeline at Oak Ridge National Laboratory (ORNL), Oak Ridge, TN, USA, followed by a round of manual curation. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases; miscellaneous features were predicted using TMHMM [49] and SignalP [50]. These data sources were combined to assert a product description for each predicted protein. The tRNAscan-SE tool [51] was used to find tRNA genes, whereas ribosomal RNAs were found by using BLASTn against the ribosomal RNA databases. The RNA components of the protein secretion complex and the RNaseP were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [52]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform [41] developed by the Joint Genome Institute, Walnut Creek, CA, USA [53].

Genome properties
The genome of P. lavamentivorans DS-1T comprises one circular chromosome of 3,914,745 bp (62.33% GC content) (Figure 3), for which a total of 3,714 genes were predicted. Of these predicted genes, 3,654 are protein-coding genes; 2,723 of the protein-coding genes were assigned a putative function and the remainder were annotated as hypothetical proteins. In addition, 18 pseudogenes were identified. A total of 60 RNA genes and one rRNA operon are predicted; the latter reflects the slow growth of P. lavamentivorans DS-1T [54,55]. Furthermore, one Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) element, including associated protein genes, was predicted. The properties and the statistics of the genome are summarized in Table 3, and the distribution of genes into COGs functional categories is presented in Table 4.
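The gene counts given above can be cross-checked with simple arithmetic. The Python sketch below uses only the numbers stated in this section; the variable names and the choice of denominators for the percentages are illustrative assumptions and are not necessarily those used in Table 3.

```python
# Counts as stated in the text above.
total_genes = 3714
protein_coding = 3654
rna_genes = 60
with_function = 2723

# Protein-coding and RNA genes together account for all predicted genes.
assert protein_coding + rna_genes == total_genes

# Protein-coding genes without an assigned function (annotated as hypothetical).
hypothetical = protein_coding - with_function


def pct(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


print(f"hypothetical proteins: {hypothetical}")  # 931
print(f"with putative function: {pct(with_function, protein_coding):.1f}% of protein-coding genes")
print(f"protein-coding: {pct(protein_coding, total_genes):.1f}% of all predicted genes")
```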

Metabolic features
The genome of P. lavamentivorans encodes complete pathways for synthesis of all proteinogenic amino acids and essential co-factors, and the central metabolism is represented by a complete pathway for the citrate cycle, glycolysis/gluconeogenesis, and the non-oxidative branch of the pentose-phosphate pathway; no candidate genes for the oxidative branch of the pentose-phosphate pathway or for the Entner-Doudoroff pathway are predicted. P. lavamentivorans DS-1T does not grow on D-glucose, D-fructose, maltose, D-mannitol, D-mannose, or N-acetylglucosamine [3,7], and there are no valid candidate genes predicted in the genome for ATP-dependent sugar uptake systems or for D-glucose uptake via a phosphotransferase system. Similarly, no valid candidate genes were predicted for ATP-dependent amino-acid and di/oligo-peptide transport systems or for other amino-acid/peptide transporters, which is consistent with the poor growth of strain DS-1T in complex medium (LB-medium).

Table 3 footnote: a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.

For the assimilation of acetyl-CoA from the degradation of alkanes and surfactants [2,3,5], or during growth with acetate, the genome of P. lavamentivorans encodes the glyoxylate cycle (isocitrate lyase, Plav_0592; malate synthase, Plav_0593) to generate succinate for the synthesis of carbohydrates. The genome also encodes the complete ethyl-malonyl-CoA pathway to assimilate acetate [56]. The presence of both the glyoxylate cycle and the ethyl-malonyl-CoA pathway in the same organism has been observed before [57], and these two pathways in P. lavamentivorans DS-1T might be differentially expressed under varying environmental conditions.
For the degradation of alkanes and surfactants through abstraction of acetyl-CoA [54], the genome contains a wealth of candidate genes for the entry into alkyl-chain degradation (omega-oxygenation to activate the chain), supplemented by a variety of genes predicted for omega-oxidations (to generate the corresponding fatty acids) and fatty-acid beta-oxidations (to excise acetyl-CoA units). Other predicted oxygenase genes comprise three putative Baeyer-Villiger-type FAD-binding monooxygenase genes (COG2072). Cyclohexanone and hydroxyacetophenone, which are putative substrates for such oxygenases (e.g. [58,59]), as well as cycloalkanes (C6, C8, C12), were tested as carbon sources for growth of strain DS-1T; however, none supported growth. The terpenoids camphor (for the involvement of a cytochrome-P450 oxygenase in the degradation pathway [60]) and geraniol, citronellol, linalool, menthol and eucalyptol (for the involvement of acyl-CoA interconversion enzymes in the degradation pathways) were also tested as growth substrates, with negative results. In contrast to the high abundance of genes for aliphatic-hydrocarbon degradation, the genome contains few genes for aromatic-hydrocarbon degradation. One gene set for an aromatic-ring dioxygenase component (Plav_1761 and 1762; BenAB-type), three aromatic-ring monooxygenase component genes (Plav_1541 and 0131, MhpA-type; Plav_1785, HpaB-type), and three valid candidate genes for extradiol ring-cleavage dioxygenase (Plav_1539 [61] and 1787, BphC-type; Plav_0983, LigB-type) were predicted in the genome. Strain DS-1T did not grow with benzoate, protocatechuate, phenylacetate, phenylpropionate, or phenylalanine and tyrosine as carbon source when tested. Finally, P. lavamentivorans DS-1T is predicted to store carbon in the form of intracellular polyhydroxyalkanoate/butyrate (PHB), as its genome encodes a PHB-synthase (PhbC) gene (Plav_1129), a PHB-depolymerase (PhaZ) gene (Plav_0012), and a PHB-synthesis repressor (PhaR) gene (Plav_1572).