Genome sequence of Phaeobacter inhibens type strain (T5T), a secondary metabolite producing representative of the marine Roseobacter clade, and emendation of the species description of Phaeobacter inhibens

Strain T5T is the type strain of the species Phaeobacter inhibens Martens et al. 2006, a secondary metabolite producing bacterium affiliated with the Roseobacter clade. Strain T5T was isolated from a water sample taken at the German Wadden Sea, southern North Sea. Here we describe the complete genome sequence and annotation of this bacterium with a special focus on the secondary metabolism and compare it with the genomes of the Phaeobacter inhibens strains DSM 17395 and DSM 24588 (2.10), selected because of the close phylogenetic relationship based on the 16S rRNA gene sequences of these three strains. The genome of strain T5T comprises 4,130,897 bp with 3,923 protein-coding genes and shows high similarities in genetic and genomic characteristics compared to P. inhibens DSM 17395 and DSM 24588 (2.10). Besides the chromosome, strain T5T possesses four plasmids, three of which show a high similarity to the plasmids of the strains DSM 17395 and DSM 24588 (2.10). Analysis of the fourth plasmid suggested horizontal gene transfer. Most of the genes on this plasmid are not present in the strains DSM 17395 and DSM 24588 (2.10), including a nitrous oxide reductase, which allows strain T5T a facultatively anaerobic lifestyle. The G+C content was calculated from the genome sequence and differs significantly from the previously published value, thus warranting an emendation of the species description.


Introduction
Strain T5T was isolated from a water sample taken on 25 October 1999 above an intertidal mud flat of the German Wadden Sea (53°42'20''N, 07°43'11''E) and found to be closely related to the type strain of Roseobacter gallaeciensis [1]. Two years later Martens et al. (2006) reclassified Roseobacter gallaeciensis as Phaeobacter gallaeciensis and described strain T5T as type strain of the species Phaeobacter inhibens. As found for various Phaeobacter strains [2-7], P. inhibens strain T5T (= DSM 16374T = LMG 22475T = CIP 109289T) is able to produce the antibiotic tropodithietic acid (TDA) [8]. Furthermore, strains of P. gallaeciensis and P. inhibens, including strain T5T, are able to produce a brownish pigment, which is the basis of the genus name (phaeos = dark, brown) [1]. The epithet of the species name points to the strong inhibitory activity of P. inhibens against different taxa of marine bacteria and algae [1]. The genus Phaeobacter is known to have a high potential for secondary metabolite production, as indicated by biosynthesis of TDA and N-acyl homoserine lactones (AHLs), as well as the presence of genes coding for polyketide synthases (PKS) and nonribosomal peptide synthetases (NRPS) [2,7-10]. Biosynthesis of many different bioactive natural products, including antibiotics, toxins and siderophores, is mediated by PKSs or NRPSs. Moreover, production of volatile compounds is widespread within the Roseobacter clade. The clade displays a particularly high proportion of volatile sulfur-containing compounds and thus seems to play an important role in the sulfur cycle of the ocean [11]. The sulfur-containing TDA, for which the sulfur precursor has not yet been determined, plays an important role in the mutualistic symbioses of P. inhibens and marine algae [12].
p-Coumaric acid causes the organism to switch from a state of mutualistic symbiosis to a pathogenic lifestyle in which toxicity is mediated via the production of the algicidal roseobacticides, which, like TDA, are also sulfur-containing metabolites [13,14].
Here we present the genome of P. inhibens strain T5T with particular emphasis on the genes involved in secondary metabolism and comparison with the recently published genomes of the P. inhibens strains DSM 17395 and DSM 24588 (2.10) [3]. DSM 17395 and DSM 24588, originally deposited as P. gallaeciensis strains, were recently reclassified as P. inhibens [15].

Classification and features
16S rRNA gene analysis
Figure 1 shows the phylogenetic neighborhood of P. inhibens DSM 16374T in a tree based on 16S rRNA genes. The sequences of the three identical 16S rRNA gene copies differ by one nucleotide from the previously published 16S rRNA sequence (NCBI Accession No. AY177712).
A representative genomic 16S rRNA gene sequence of P. inhibens DSM 16374T was compared using NCBI BLAST [16,17] under default settings (e.g., considering only the high-scoring segment pairs (HSPs) from the best 250 hits) with the most recent release of the Greengenes database [18], and the relative frequencies of taxa and keywords (reduced to their stem [19]) were determined, weighted by BLAST scores. The most frequently occurring genera were Ruegeria (32.5%), Phaeobacter (28.8%), Silicibacter (13.6%), Roseobacter (13.3%) and Nautella (3.5%) (141 hits in total). Regarding the single hit to sequences from the species, the average identity within HSPs was 99.8%, whereas the average coverage by HSPs was 99.3%. Regarding the nine hits to sequences from other species of the genus, the average identity within HSPs was 99.0%, whereas the average coverage by HSPs was 99.2%. Among all other species, the one yielding the highest score was P. gallaeciensis (NZ_ABIF01000004), which corresponded to an identity of 100.0% and an HSP coverage of 100.0%. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was AJ296158 (Greengenes short name 'Spain:Galicia isolate str. PP-154'), which showed an identity of 99.8% and an HSP coverage of 100.0%. The most frequently occurring keywords within the labels of all environmental samples which yielded hits were 'microbi' (3.1%), 'marine' (2.6%), 'coral' (2.3%), 'biofilm' (2.1%) and 'membrane, structure, swro' (1.8%) (100 hits in total). Environmental samples which yielded hits of a higher score than the highest scoring species were not found.
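The score-weighted relative frequencies described above can be illustrated with a minimal Python sketch. The function name `weighted_frequencies` and the example hits are hypothetical, and the sketch assumes that each hit contributes its BLAST score to the total of its genus; the exact weighting used in the analysis is not specified in the text.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return {label: percentage of the summed BLAST score} for (label, score) pairs.

    Assumption: every hit contributes its full score to the label it carries.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * s / grand_total for label, s in totals.items()}


# Hypothetical hits (genus, BLAST score); not data from the study.
example_hits = [
    ("Ruegeria", 2500.0),
    ("Phaeobacter", 2600.0),
    ("Ruegeria", 2400.0),
    ("Nautella", 2500.0),
]
freqs = weighted_frequencies(example_hits)
for genus, percent in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{genus}: {percent:.1f}%")
```

Weighting by score rather than by hit count means that a genus with a few strong hits can outrank a genus with many weak ones, which may explain why Ruegeria ranks above Phaeobacter in the frequencies reported above.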

Morphology and physiology
Cells of T5T are ovoid rods, 1.4-1.9 x 0.6-0.8 µm (Figure 2). Furthermore, T5T cells show the typical multicellular star-shaped structure described previously for P. gallaeciensis and other Roseobacter-clade organisms [2,4,47] (Figure 2). Cells of T5T are motile by means of a polar flagellum. T5T is a Gram-negative, marine, facultatively anaerobic, mesophilic bacterium with an optimal growth temperature between 27 and 29 °C and an optimal salinity between 0.51 and 0.68 M. The pH range for growth is 6.0-9.5, with an optimum at 7.5. On marine agar T5T forms smooth and convex colonies with regular edges, and it shows brownish pigmentation on ferric citrate containing media. T5T utilizes pentoses, hexoses, disaccharides and most amino acids as carbon and energy sources. No vitamin requirements were observed [1].

Genome sequencing and annotation
Genome project history
This organism was selected for sequencing on the basis of the DOE Joint Genome Institute Community Sequencing Program (CSP) 2010, CSP 441 "Whole genome type strain sequences of the genera Phaeobacter and Leisingera - a monophyletic group of physiological highly diverse organisms". The genome project is deposited in the Genomes On Line Database [40] and the complete genome sequence is deposited in GenBank and the Integrated Microbial Genomes database (IMG) [56]. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI) using state-of-the-art sequencing technology [57]. A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
A culture of DSM 16374T was grown aerobically in DSMZ medium 514 [58] at 25 °C. Genomic DNA was isolated using the Jetflex Genomic DNA Purification Kit (GENOMED 600100) following the standard protocol provided by the manufacturer, with the following modifications: an incubation time of 40 min, incubation on ice overnight on a shaker, the use of an additional 10 µl proteinase K, and the addition of 100 µl protein precipitation buffer. DNA is available from DSMZ through the DNA Bank Network [59].

Genome sequencing and assembly
For this genome, we constructed and sequenced an Illumina short-insert paired-end library with an average insert size of 225 bp, and an Illumina long-insert paired-end library with an average insert size of 9602 bp, which generated 18,471,132 reads and 11,906,846 reads, respectively, totaling 4,557 Mbp of Illumina data. All general aspects of library construction and sequencing performed can be found at the JGI website [60]. The initial draft assembly contained 13 contigs in 10 scaffolds. The initial draft data were assembled with Allpaths [61] and the consensus was computationally shredded into 10 kbp overlapping fake reads (shreds). The Illumina draft data were also assembled with Velvet [62], and the consensus sequences were computationally shredded into 1.5 kbp overlapping fake reads (shreds). The Illumina draft data were assembled again with Velvet using the shreds from the first Velvet assembly to guide the next assembly. The consensus from the second Velvet assembly was shredded into 1.5 kbp overlapping fake reads.
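The computational shredding step can be sketched as follows. This is a minimal illustration, not the JGI implementation: the function name `shred`, the `step` parameter and the handling of the final shred are assumptions, since the text gives only the shred lengths (10 kbp and 1.5 kbp) and not the overlap.

```python
def shred(consensus, shred_len, step):
    """Cut a consensus sequence into overlapping fake reads (shreds).

    Consecutive shreds start `step` bases apart, so neighbours overlap by
    shred_len - step bases. The overlap is an assumption; the text states
    only the shred length.
    """
    if shred_len <= 0 or step <= 0:
        raise ValueError("shred_len and step must be positive")
    if step >= shred_len:
        raise ValueError("step must be smaller than shred_len to obtain overlap")
    if len(consensus) <= shred_len:
        return [consensus] if consensus else []
    shreds = []
    start = 0
    while start + shred_len < len(consensus):
        shreds.append(consensus[start:start + shred_len])
        start += step
    # Anchor the last shred at the sequence end so the tail is always covered.
    shreds.append(consensus[-shred_len:])
    return shreds


# Toy example with a 10-base consensus, 4-base shreds and a 2-base step.
toy_shreds = shred("ACGTACGTAC", shred_len=4, step=2)
print(toy_shreds)
```

For a real consensus the call would look like `shred(sequence, shred_len=1500, step=750)`, where the step of 750 is purely illustrative.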

Genome annotation
Genes were identified using Prodigal [64] as part of the DOE-JGI genome annotation pipeline [65], followed by a round of manual curation using the JGI GenePRIMP pipeline [66]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGR-Fam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [56].

Genome properties
The genome statistics are provided in Table 3 and Figure 3. The genome consists of six scaffolds with a total length of 4,130,897 bp and a G+C content of 60.0%. The scaffolds correspond to a chromosome 3,669,861 bp in length and four extrachromosomal elements as identified by their replication systems (see below). Of the 3,986 genes predicted, 3,923 were protein-coding genes and 63 were RNA genes; 39 pseudogenes were also identified. The majority of the protein-coding genes (81.0%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
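As an illustration of how a G+C content such as the 60.0% reported above can be derived directly from sequence data, the following Python sketch counts G and C among the unambiguous bases. The function name `gc_content` and the toy sequence are hypothetical, and ignoring ambiguous bases such as N is an assumption rather than the documented behavior of the annotation pipeline. The sketch also checks that the gene counts given above add up.

```python
def gc_content(sequence):
    """Return the G+C content in percent, ignoring ambiguous bases such as N."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Toy sequence, not genome data: 6 of 10 unambiguous bases are G or C.
toy_gc = gc_content("GGGCCCATATNN")
print(f"G+C content: {toy_gc:.1f}%")

# Consistency check on the gene counts reported in the text.
TOTAL_GENES = 3986
PROTEIN_CODING = 3923
RNA_GENES = 63
counts_add_up = PROTEIN_CODING + RNA_GENES == TOTAL_GENES
print("gene counts consistent:", counts_add_up)
```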

Insights into the genome
Genome sequencing of P. inhibens DSM 16374T revealed the presence of four extrachromosomal elements with sizes of 227 kb, 88 kb, 78 kb, and 69 kb (Figure 3; Table 5) and DnaA-like I, RepABC-8, RepB-I and RepA-I as replication systems, respectively [68]. The different replicases that mediate the initiation of replication are designated according to the established plasmid classification scheme [69]. With the exception of the 88 kb replicon, these extrachromosomal elements are highly syntenic to specific replicons in the genomes of P. inhibens strains DSM 17395 and DSM 24588 (Figure 3). The locus tags of all replicases, plasmid stability modules and the large virB4 gene of a type IV secretion system are presented in Table 6. The plasmids pInhi_A227 and pInhi_B88 contain postsegregational killing systems (PSK) consisting of a typical operon with two small genes encoding a stable toxin and an unstable antitoxin [70]. Moreover, plasmid pInhi_B88 also contains a complete virB gene cluster of a type IV secretion system, required for the formation of a transmembrane channel. However, the absence of the relaxase VirD2, which is necessary for the strand-specific DNA nicking at the origin of transfer (oriT), and of the coupling protein VirD4 indicates that this plasmid is non-conjugative [71,72]. The RepA-I type replicon pInhi_D69 contains a complete rhamnose operon [73] and is dominated by genes required for polysaccharide biosynthesis.
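The replicon sizes quoted above can be cross-checked against the total genome length (4,130,897 bp) and the chromosome length (3,669,861 bp) given under Genome properties. The short Python sketch below does this arithmetic; it assumes that each plasmid size was rounded to the nearest kilobase, which the text does not state explicitly, and all variable names are illustrative.

```python
TOTAL_GENOME_BP = 4_130_897       # total length from Genome properties
CHROMOSOME_BP = 3_669_861         # chromosome length from Genome properties
PLASMID_SIZES_KB = [227, 88, 78, 69]  # rounded replicon sizes from this section

# Sequence that must reside on extrachromosomal elements.
extrachromosomal_bp = TOTAL_GENOME_BP - CHROMOSOME_BP

# Sum of the rounded plasmid sizes, converted to bp.
plasmid_sum_bp = sum(PLASMID_SIZES_KB) * 1000

# Assumption: each size is rounded to the nearest kb, so each value may be
# off by up to 500 bp.
max_rounding_error_bp = 500 * len(PLASMID_SIZES_KB)
discrepancy_bp = abs(plasmid_sum_bp - extrachromosomal_bp)
is_consistent = discrepancy_bp <= max_rounding_error_bp

print(f"extrachromosomal sequence: {extrachromosomal_bp} bp")
print(f"sum of rounded plasmid sizes: {plasmid_sum_bp} bp")
print(f"discrepancy: {discrepancy_bp} bp, consistent: {is_consistent}")
```

Under this assumption the 964 bp difference lies within the rounding margin, so the reported replicon sizes agree with the genome totals.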
As already indicated by the strong inhibitory activity of P. inhibens T5T [8], all 26 described genes involved in the production of TDA are present in the genome of this strain. As found for the P. inhibens strains DSM 17395 and DSM 24588, the key genes for TDA production tdaABCDEF (Inhi_3684 - _3688, Inhi_3701), paaZ2 (Inhi_3702) and a gene coding for a putative Na-dependent transporter (Inhi_3697) [3,74] are located on the 227 kb plasmid of T5T (Figure 3). The remaining 19 genes, which include genes of the phenylacetyl-CoA and assimilatory sulfate reduction pathways, are scattered over the chromosome as in the strains DSM 17395 and DSM 24588 [3]. Besides the tdaA gene, present on the 227 kb plasmid, we also found other genes involved in the regulation of TDA synthesis located on the chromosome, which is in agreement with Thole et al. (2012) and Berger et al. (2012) [3,75]. These include the genes encoding transcriptional activator proteins (Inhi_2121; _2059; _0396) comparable with pgaR, iorR and a transcriptional regulator (PGA1_c20730), a putative serine-protein kinase (Inhi_2265) and a putative signal peptide peptidase (Inhi_2227).
Two complete prophages and an additional cluster coding for the production of gene transfer agents (GTA) were found in the genome of strain T5T. The GTA gene cluster is equal in length and comprises the same genes (Inhi_0654 - Inhi_0670) as the GTA clusters of the strains DSM 17395 and DSM 24588. The two prophages of strain T5T consist of 52 ORFs (prophage 1; ~37 kb) and 63 ORFs (prophage 2; ~48 kb), respectively. Strain DSM 17395 possesses two prophages, but for DSM 24588 no prophages were detected [3]. Prophage 1 of strain T5T is similar to prophage 1 of strain DSM 17395, with the exception that a few ORFs are different (PGA1_c18280 - _c18310, PGA1_c18480 - _c18530 and PGA1_c18570 - _c18680; Inhi_1777, Inhi_1785 - _1788, Inhi_1803 - _1812 and Inhi_1816 - _1829). Prophage 2 of strain T5T is a Mu-like bacteriophage, not present in strain DSM 17395.
It was previously shown that strain T5T produces two different AHLs, i.e. C18-en-HSL and N-3-hydroxydecanoyl-homoserine lactone (3OHC10-HSL) [76]. In P. inhibens strain DSM 17395, TDA and pigment production are regulated via a pgaR-pgaI QS system [47]. The AHL synthase encoding gene pgaI in DSM 17395 is responsible for the production of 3OHC10-HSL. In the genome of strain T5T we found a homologous system probably coding for the 3OHC10-HSL producing AHL synthase (Inhi_2120, homologous to pgaI) and the respective regulator (Inhi_2121, homologous to pgaR) (Figure 3, QS system I). Thus TDA production in strain T5T might also be regulated by a QS system. In addition, two further QS systems (QS system II and III; Figure 3) were found on the chromosome of T5T. System II is formed by the genes Inhi_0506 and _0507 and is located in the prophage 2 region. Orthologs of these QS system genes are also present in P. inhibens strain DSM 24588 (PGA2_c18960 and PGA2_c18970) but absent in strain DSM 17395. QS system III consists of the genes Inhi_1819 and _1820 and is unique to strain T5T compared to P. inhibens DSM 17395 and DSM 24588. It is located in the potential prophage 1 region (Figure 3). A homologous system was found in the genome of Phaeobacter caeruleus DSM 24564T and the neighboring genes show a high synteny. The location in the prophage region and the high synteny to the system of P. caeruleus suggest a possible gene transfer of this QS system via a bacteriophage. The functions of QS systems II and III are currently unknown, but it is likely that the compound C18-en-HSL is produced by one of those systems.
Two functions were suggested that can possibly be used as unique chemotaxonomic markers for the species P. inhibens within the Roseobacter clade [3]. The genes coding for the first of these functions are located on the chromosome and are involved in cell wall development and surface attachment [dltA encoding a D-alanine-poly(phosphoribitol) ligase involved in biosynthesis of D-alanyl-lipoteichoic acid]. The second unique function is the biosynthesis and transport of iron-chelating siderophores, and the encoding genes are located on the plasmids pPGA1_78 and pPGA2_95, respectively. These two clusters are also present in the genome of strain T5T. The siderophore gene cluster (Inhi_3924 - Inhi_3928) is located on the 78 kb plasmid (Figure 3) and the dltA gene cluster (Inhi_1065 - Inhi_1086) is located on the chromosome (Figure 3). Screenings in the newly available Roseobacter genomes showed that Leisingera methylohalidivorans DSM 14336 [42] and Leisingera aquimarina DSM 24565 [41] also harbor the genes for siderophore synthesis. The dltA gene cluster, however, remains unique to the species P. inhibens and can be used as a chemotaxonomic marker.
The existence of genes coding for a polyketide synthase (Inhi_1972) and three non-ribosomal peptide synthetases (Inhi_1072, _1974 and _3983) confirms the results of Martens et al. (2007) [7]. These genes are also present in the genomes of strains DSM 17395 and DSM 24588 (PGA1_c04930 and PGA1_c05350, _c13760, _c28490; PGA2_c05370 and PGA2_c04910, _c13660, _71p110). The genes Inhi_3983 of P. inhibens strain T5T and PGA2_71p110 of P. inhibens strain DSM 24588 are located on the 69 kb plasmid (Figure 3) and the 71 kb plasmid, respectively. In contrast, the homologous gene (PGA1_c28490) of P. inhibens strain DSM 17395 is located on the chromosome.
For the P. inhibens strains DSM 17395 and DSM 24588 a surface-attached lifestyle was inferred from the genome analysis [3]. Even though strain T5T was isolated from a water sample, it exhibits the same genes associated with the biosynthesis and transport of polysaccharides as strains DSM 17395 and DSM 24588. This includes genes described as unique for the strains DSM 17395 and DSM 24588, i.e. a gene coding for a glycosyltransferase-like protein (Inhi_3961) and two ORFs (Inhi_3954 and Inhi_3955) related to a type I secretion system and used for the transport of exopolysaccharides. Production of extracellular polysaccharides is a major factor contributing to surface attachment [77,78]. Thus it appears likely that T5T is also well adapted to a surface-attached lifestyle.
P. inhibens was described as a strictly aerobic bacterium [1]. However, we found genes involved in the dissimilatory nitrate reduction pathway to nitrogen, including the gene coding for a copper-containing nitrite reductase (Inhi_3645) and a nitric oxide reductase cluster (Inhi_3648 - Inhi_3654), both located on the replicon pInhi_A227. These genes are also present and located on the largest plasmids of P. inhibens DSM 17395 (PGA1_262p) and P. inhibens DSM 24588 (PGA2_239p) (Figure 3). In addition, P. inhibens strain T5T possesses a gene cluster coding for a nitrous oxide reductase (Inhi_3786 - Inhi_3792) located on the replicon pInhi_B88, which is absent in the strains DSM 17395 and DSM 24588 (Figure 3). Neither strain T5T nor DSM 17395 and DSM 24588 have genes coding for a nitrate reductase. The findings suggest that P. inhibens T5T has a complete dissimilatory nitrite reduction pathway, but is not able to reduce nitrate, as previously described by Martens et al. (2006) [1]. To confirm the results we tested strain T5T for its capability to grow anaerobically with nitrite.
Anaerobic marine basal medium was prepared according to Cypionka and Pfennig (1986) [79] and supplemented with nitrite and glucose, both at a final concentration of 5 mM. After two weeks a decrease of nitrite was determined by photometric analysis at 545 nm using the Griess reaction [80], and an increase of the turbidity was detected (results not shown). Thus P. inhibens T5T is able to grow anaerobically with nitrite, which supports an emended description of this organism as a facultatively anaerobic bacterium.
Phylogenetic analysis shows that P. inhibens and P. gallaeciensis form a cluster together with Phaeobacter arcticus (Figure 1). The cluster is set apart from the cluster comprising Leisingera aquimarina, Leisingera nanhaiensis, Leisingera methylohalidivorans, Phaeobacter caeruleus and Phaeobacter daeponensis, but the backbone of the 16S rRNA gene tree shown in Figure 1 is rather unresolved. Using the online analysis tool "Genome-to-Genome Distance Calculator" 2.0 (GGDC) [81,82], we performed a preliminary phylogenetic analysis of the draft genomes of the type strains of the genera Leisingera and Phaeobacter and the finished genomes of P. inhibens strains DSM 17395 and DSM 24588. Table 7 shows the results of the in silico calculated DNA-DNA hybridization (DDH) similarities of P. inhibens to other Phaeobacter and Leisingera species. In the following analysis, we will refer only to the results of formula 2, as this formula is robust against the use of draft genomes such as AOQA01000000 (CIP 105210T) [83]. The use of GGDC revealed a high similarity of T5T (78%) to the strains P. inhibens DSM 17395 and DSM 24588, but a low similarity to P. gallaeciensis strain CIP 105210T (36%). DSM 17395 and CIP 105210T were previously supposed to be type-strain deposits for P. gallaeciensis [33] and we cross-compared them using GGDC. Formula 2 yielded a similarity of only 38.30% ± 2.50 between these two strains, thus indicating not only that they are not the same strain, but also that they do not even belong to the same species. The results are in agreement with the study [15] showing that strain DSM 17395 is the false deposit and belongs together with DSM 24588 to P. inhibens, whereas CIP 105210T is the correct type-strain deposit for P. gallaeciensis.
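The species assignments drawn from the DDH estimates above can be expressed as a simple threshold test. The Python sketch below uses the values quoted in the text; the 70% cutoff is the conventional DDH boundary for species delineation and is an assumption here, since the text does not state the threshold it applied. The names `SPECIES_THRESHOLD` and `same_species` are illustrative.

```python
# Conventional DDH cutoff for species delineation; an assumption, because the
# text does not state the threshold explicitly.
SPECIES_THRESHOLD = 70.0

# Formula 2 DDH estimates (percent) as quoted in the text.
ddh_estimates = {
    ("T5T", "DSM 17395"): 78.0,
    ("T5T", "DSM 24588"): 78.0,
    ("T5T", "CIP 105210T"): 36.0,
    ("DSM 17395", "CIP 105210T"): 38.30,
}


def same_species(ddh_percent, threshold=SPECIES_THRESHOLD):
    """Return True if the DDH estimate reaches the species threshold."""
    return ddh_percent >= threshold


calls = {pair: same_species(value) for pair, value in ddh_estimates.items()}
for (strain_a, strain_b), verdict in calls.items():
    label = "same species" if verdict else "different species"
    print(f"{strain_a} vs {strain_b}: {label}")
```

A fuller treatment would also take the reported standard deviation (for example ± 2.50) into account, but none of the values quoted here lies close enough to the cutoff for that to change the outcome.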
The difference between the G+C content published earlier (55.7%) [1] and the value calculated directly from the genome (Table 3) warrants an update of the taxonomic description of P. inhibens [84].
Moreover, genomic and experimental evidence indicates that P. inhibens is not strictly aerobic but facultatively anaerobic.
† The standard deviations indicate the inherent uncertainty in estimating DDH values from intergenomic distances based on models derived from empirical test data sets (which are always limited in size); see [83] for details. The distance formulas are explained in [82]; formula 2 is recommended, particularly for draft genomes (such as AOQA01000000). The numbers in parentheses are GenBank accession numbers identifying the underlying genome sequences.

Conclusion
Emended description of the species Phaeobacter inhibens Martens et al. 2006
The description of the species Phaeobacter inhibens is the one given by Martens et al. 2006 [1], with the following modifications. The G+C content, rounded to zero decimal places, is 60%. Phaeobacter inhibens is a facultatively anaerobic bacterium that can grow anaerobically by nitrite reduction.