Draft genome sequence of marine alphaproteobacterial strain HIMB11, the first cultivated representative of a unique lineage within the Roseobacter clade possessing an unusually small genome

Strain HIMB11 is a planktonic marine bacterium, isolated from coastal seawater in Kaneohe Bay, Oahu, Hawaii, that belongs to the ubiquitous and versatile Roseobacter clade of the alphaproteobacterial family Rhodobacteraceae. Here we describe the preliminary characteristics of strain HIMB11, including annotation of the draft genome sequence and comparative genomic analysis with other members of the Roseobacter lineage. The 3,098,747 bp draft genome is arranged in 34 contigs and contains 3,183 protein-coding genes and 54 RNA genes. Phylogenomic and 16S rRNA gene analyses indicate that HIMB11 represents a unique sublineage within the Roseobacter clade. Comparison with other publicly available genome sequences from members of the Roseobacter lineage reveals that strain HIMB11 has the genomic potential to utilize a wide variety of energy sources (e.g. organic matter, reduced inorganic sulfur, light, carbon monoxide), while possessing a reduced number of substrate transporters.


Introduction
Bacteria belonging to the Roseobacter lineage of marine Alphaproteobacteria account for a substantial fraction (~10-25%) of bacterioplankton cells in surface ocean seawater [1][2][3][4], making them one of a relatively small number of suitable targets for scientists investigating the ecology of abundant marine bacterial groups. Focused genome sequencing efforts have provided significant insights into the functional and ecological roles for this group [5][6][7]. In 2004, the first member of this group to have its genome sequenced, Ruegeria pomeroyi (basonym Silicibacter pomeroyi) strain DSS-3 [8], revealed strategies used by the Roseobacter group for nutrient acquisition in the marine environment. To date, over 40 genomes have been sequenced from members of the Roseobacter lineage. Comparative analysis among 32 of these genomes indicates that members of this group are ecological generalists, having relatively plastic requirements for carbon and energy metabolism, which may allow them to respond to a diverse range of environmental conditions [9]. For example, members of the Roseobacter lineage have the genomic potential to obtain energy via oxidation of organic substrates, oxidation of inorganic compounds, and/or sunlight-driven electron transfer via bacteriochlorophyll a, proteorhodopsin, or xanthorhodopsin phototrophic systems. Genome analyses as well as culture experiments have also revealed a variety of mechanisms by which roseobacters may associate and interact with phytoplankton and other eukaryotes. These include genes involved in uptake of compounds produced by algae such as peptides, amino acids, putrescine, spermidine, and DMSP [10,11], as well as genes for chemotaxis, attachment, and secretion [12].
Strain HIMB11 was isolated from surface seawater collected from Kaneohe Bay off the coast of Oahu, Hawaii, USA in May, 2005. Subsequent 16S rRNA gene sequence comparisons revealed it to be a member of the Roseobacter clade of marine bacterioplankton [13] that was highly abundant after a storm-induced phytoplankton bloom in the bay [14]. Here, we present a preliminary set of features for strain HIMB11, a description of the draft genome sequence and annotation, and a comparative analysis with 35 other genome sequences from members of the Roseobacter lineage.
Genome annotation revealed strain HIMB11 to have the genetic potential for bacteriochlorophyll-based aerobic anoxygenic phototrophic (AAnP) metabolism, degradation of the algal-derived compound DMSP along with production of the climate-relevant gas dimethylsulfide (DMS), and oxidation of the greenhouse gas carbon monoxide (CO). Collectively, these features indicate the potential for strain HIMB11 to participate in the biogeochemical cycling of sulfur and carbon, and concomitantly affect global climate processes.

Classification and features
Strain HIMB11 was isolated by a high-throughput, dilution-to-extinction approach [15] from surface seawater collected near the coast of Oahu, Hawaii, USA, in the tropical North Pacific Ocean. The strain was isolated in seawater sterilized by tangential flow filtration and amended with low concentrations of inorganic nitrogen and phosphorus (1.0 µM NH4Cl, 1.0 µM NaNO3, and 0.1 µM KH2PO4).
Comparative analysis of the HIMB11 16S rRNA gene sequence to those from cultured, sequenced roseobacters indicates that HIMB11 occupies a unique lineage that is divergent from the 16S rRNA gene sequences of Roseobacter strains already in culture (Figure 1). Based on the National Center for Biotechnology Information (NCBI) non-redundant database, the HIMB11 16S rRNA gene sequence is most similar (~99% nucleotide identity) to a large number of environmental gene clones obtained from various marine environments that exclusively fall in the Roseobacter lineage of Alphaproteobacteria.
Because of the significant sequence variation in 16S rRNA genes (up to 11%) and the prevalence of horizontal gene transfer within the clade, establishing a taxonomic framework for roseobacters remains a challenge [9]. When genome sequence data are available, a phylogenomic analysis based on shared orthologs is often more informative than 16S rRNA phylogenetic analysis alone [9,21]. A maximum likelihood tree constructed using 719 shared orthologous protein sequences supported the 16S rRNA gene-based analysis by revealing that HIMB11 formed a unique sublineage of the Roseobacter clade (Figure 2).
Figure 1. Phylogenetic relationships between HIMB11 and bacterial strains belonging to the Roseobacter clade. SSU rRNA gene sequences were aligned with version 111 of the 'All-Species Living Tree' project SSU rRNA gene database [16] using the ARB software package [17]. The phylogeny was constructed from nearly full-length gene sequences using the RAxML maximum likelihood method [18] within ARB, filtered to exclude alignment positions that contained gaps or ambiguous nucleotides in any of the sequences included in the tree. Bootstrap support was determined with RAxML [19] via the raxmlGUI graphical front end [20]. The scale bar corresponds to 0.05 substitutions per nucleotide position. Open circles indicate nodes with bootstrap support between 50-80%, while closed circles indicate bootstrap support >80%, from 500 replicates. A variety of Archaea were used as outgroups.
HIMB11 cells are short, irregular rods (0.3-0.5 × 0.8 µm) that are generally smaller than previously reported for other cultured Roseobacter strains (e.g. described taxa in Bergey's Manual range from 0.5-1.6 × 1.0-4.0 µm) [24] (Figure 3). HIMB11 is likely motile, as the genes necessary to build flagella are present (e.g. fli, flg). Based on the ability of HIMB11 to grow in dark or light on a medium consisting solely of sterile seawater amended with inorganic nitrogen and phosphorus, and the absence of any of the known pathways for inorganic carbon fixation, the strain is presumed to acquire carbon and energy via the oxidation of components of the dissolved organic carbon pool in natural seawater. Based on the presence of carbon-monoxide-oxidizing genes (i.e. coxL, forms I and II) [25,26] as well as bacteriochlorophyll-based phototrophy genes (e.g. puf, puh, bch) [27,28], HIMB11 is hypothesized to oxidize both organic and inorganic compounds as well as obtain energy from light [5]. A summary of these and other features is shown in Table 1.
Table 1, note a) Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [37].

Genome sequencing information
Genome project history
The genome of strain HIMB11 was selected for sequencing based on its phylogenetic affiliation with the widespread and ecologically important Roseobacter clade of marine bacterioplankton and its periodically high abundance in coastal Hawaii seawater [14]. The genome sequence was completed on May 25, 2011, and presented for public access on September 15, 2013. The genome project is deposited in the Genomes OnLine Database (GOLD) as project Gi09592. The Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession number AVDB00000000. The version described in this paper is version AVDB01000000. Table 2 presents the main project information and its association with MIGS version 2.0 compliance [29].

Growth conditions and DNA isolation
Strain HIMB11 was grown at 27 °C in 60 L of coastal Hawaii seawater sterilized by tangential flow filtration and supplemented with 10 µM NH4Cl, 1.0 µM KH2PO4, and 1.0 µM NaNO3 (final concentrations). Cells from the liquid culture were collected on a membrane filter, and DNA was isolated using a standard phenol/chloroform/isoamyl alcohol extraction protocol. A total of 50 μg of DNA was obtained.

Genome annotation
Genes were identified using Prodigal 1.4 [38] as part of the genome annotation pipeline in the Integrated Microbial Genomes Expert Review (IMG-ER) system [39,40] developed by the Joint Genome Institute (Walnut Creek, CA, USA). Predicted coding sequences were translated and used as queries against the NCBI non-redundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [41] was used to identify tRNA genes, and ribosomal RNAs were identified using RNAmmer [42]. Other non-coding RNAs were found by searching the genome for corresponding Rfam profiles using INFERNAL [43]. Additional gene prediction analysis and manual functional annotation were performed within the IMG-ER platform.

Genome properties
The HIMB11 draft genome is 3,098,747 bp long and comprises 34 contigs ranging in size from 454 to 442,822 bp, with an overall GC content of 49.73% (Table 3). Of the 3,237 predicted genes, 3,183 (98.33%) were protein-coding genes, and 54 were RNAs. Most (78%) protein-coding genes were assigned putative functions, while the remaining genes were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
Table 3, note a) The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
Table 4, note a) The total is based on the total number of protein-coding genes in the annotated genome.
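The gene tallies reported in this section can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the counts given in the text; the variable names are illustrative, and the genes-per-kilobase density is a derived value that the text does not report.

```python
# Counts reported for the HIMB11 draft genome in this section.
total_genes = 3237
protein_coding = 3183
rna_genes = 54
genome_bp = 3_098_747

# Protein-coding and RNA genes should account for every predicted gene.
assert protein_coding + rna_genes == total_genes

# Percentage of predicted genes that are protein-coding.
pct_coding = round(100 * protein_coding / total_genes, 2)

# Derived value (not reported in the text): predicted genes per kilobase.
genes_per_kb = round(total_genes / (genome_bp / 1000), 2)

print(pct_coding, genes_per_kb)
```

The computed percentage agrees with the 98.33% stated above.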

Metabolism of HIMB11
Major pathways of carbon, nitrogen, phosphorus, and sulfur acquisition, as well as alternative metabolisms and means of energy acquisition (e.g. light, CO), were annotated based on the presence and absence of key genes involved in these processes. A summary is provided in Figure 4. HIMB11 appears to possess an incomplete glycolysis pathway (pfkC and pgm are absent), yet it possesses the genes necessary for gluconeogenesis. HIMB11 harbors genes for the Entner-Doudoroff and pentose phosphate pathways, as well as pyruvate carboxylase to perform anaplerotic CO2 fixation. HIMB11 does not appear to use inorganic forms of nitrogen other than ammonium, as there are no genes present that are involved in nitrogen fixation, nitrate or nitrite reduction, nitric oxide reduction, nitrous oxide reduction, hydroxylamine oxidation, or nitroalkane denitrification. Instead, HIMB11 is hypothesized to rely solely on reduced and organic nitrogen sources; there are transporters for ammonium (amtB) and a variety of other nitrogen-containing substrates (e.g. amino acids, polyamines, glycine betaine, taurine), as well as genes for urease (ureABC). Strain HIMB11 possesses a high-affinity phosphate transporter accompanied by regulatory genes (pstSCAB, phoUBR) and alkaline phosphatase (phoA), suggesting that it can utilize both inorganic and organic forms of phosphorus; it does not harbor low-affinity phosphate transport (pitA) or the genes for phosphonate utilization (phnGHIJKLM). HIMB11 possesses genes for assimilatory sulfate reduction (cys) and for the metabolism of reduced, organic sulfur compounds (e.g. amino acids, DMSP). DMSP is an osmolyte produced by certain phytoplankton, including dinoflagellates and coccolithophores [44,45], and acts as a major source of both carbon and sulfur for marine bacterioplankton in ocean surface waters [46][47][48][49]. 
Roseobacters are frequently abundant during DMSP-producing algal blooms [1], and members of this group have become models for the study of bacterial transformations of DMSP [50]. There are two competing pathways for DMSP degradation: the demethylation pathway that leads to assimilation of sulfur (dmdA, -B, -C, -D), and the cleavage pathway that leads to the release of DMS (dddD, -L, -P, -Q, -W, -Y) [51]. DMS is a climate-active gas that has been implicated in the formation of atmospheric-cooling aerosols and clouds. The genome of HIMB11 harbors genes from both pathways (dmdA, -B, -C, -D' for demethylation and dddP, -D for cleavage).
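The presence/absence reasoning used for the DMSP pathways can be illustrated with a small Python sketch. The gene lists are taken from the text, but the decision rules are assumptions made for illustration only (demethylation requires all four dmd genes; cleavage requires at least one ddd gene), and the function name `dmsp_pathways` is hypothetical. The dmdD' variant is recorded simply as dmdD.

```python
# Marker genes for the two DMSP degradation routes, as listed in the text.
DEMETHYLATION = {"dmdA", "dmdB", "dmdC", "dmdD"}
CLEAVAGE = {"dddD", "dddL", "dddP", "dddQ", "dddW", "dddY"}


def dmsp_pathways(genes):
    """Report which DMSP routes a gene inventory could support.

    Assumed rules (illustrative only): demethylation needs all four dmd
    genes; cleavage needs at least one ddd gene.
    """
    found = set(genes)
    return {
        "demethylation": DEMETHYLATION <= found,  # subset test
        "cleavage": bool(CLEAVAGE & found),       # any overlap
    }


# HIMB11 inventory from the text; the dmdD' variant is recorded as dmdD.
himb11 = ["dmdA", "dmdB", "dmdC", "dmdD", "dddP", "dddD"]
result = dmsp_pathways(himb11)
print(result)
```

Under these assumed rules, the HIMB11 inventory satisfies both routes, which matches the annotation described above. A stricter or looser rule set would need to be justified against biochemical evidence.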
The HIMB11 genome contains genes that encode a diverse array of energy acquisition strategies. The presence of the sox gene cluster indicates that HIMB11 is putatively capable of oxidizing reduced inorganic sulfur compounds [8] as a mechanism for lithoheterotrophic growth. Additional modes of energy acquisition encoded by the HIMB11 genome include pathways for CO oxidation to CO2 via carbon monoxide dehydrogenase (i.e. cox operons, including coxL, forms I and II) [25,26], degradation of aromatics (i.e. gentisate, benzoate, phenylacetic acid), and bacteriochlorophyll-based anoxygenic photosynthesis. Photosynthetic genes are organized in a photosynthesis gene cluster (PGC) and include genes for the photosynthetic reaction center (puf and puh), light-harvesting complexes, biosynthesis of bacteriochlorophyll a and carotenoids (bch and crt), and regulatory factors. Two conserved regions within the PGC, bchFNBHLM-LhaA-puhABC and crtF-bchCXYZ, which were identified in a recent study of the structure and arrangement of PGCs in ten AAnP bacterial genomes of different phylogenies [28], were also found to be conserved in the HIMB11 genome. The arrangement of the puf genes (pufQBALMC) as well as the puh genes (puhABC-hyp-ascF-puhE) in HIMB11 is very similar to what has been described for other Roseobacter strains [28]. Putative genes containing the sensor domain BLUF (blue-light-utilizing flavin adenine dinucleotide) were also found in HIMB11. BLUF sensor domains have been hypothesized to be involved in light-dependent regulation of the photosynthesis operon and may enable light sensing for phototrophy [5,52].

Genome comparisons with other members of the Roseobacter clade
A regression model was used to estimate the genome size of HIMB11 based on the genomes of 40 Roseobacter strains (Figure 5). The model relates the number of nucleotides sequenced to the ratio of the number of conserved single-copy genes universally present in Roseobacter genomes to the number of predicted protein-encoding genes. These data were fit to an exponential regression model (R² = 0.94), which estimates the genome coverage of the draft HIMB11 genome to be 90.6% and the full genome size to be 3.42 Mb. This is relatively small compared to most cultured Roseobacter genomes (median 4.35 Mb), with only one notable exception (Roseobacter member HTCC2255, 2.21 Mb). At the time of this analysis, 35 other Roseobacter genomes were publicly available in the IMG-ER database (Table 5). The effect of a reduced genome size is readily apparent in the transporter content of the HIMB11 genome: it possesses a highly reduced number of genes devoted to ATP-binding cassette (ABC) transporters and tripartite ATP-independent periplasmic (TRAP) transporters. ABC transporters use energy from ATP hydrolysis to transport a wide range of substrates across the membrane (e.g. ions, amino acids, peptides, sugars). While the 35 public Roseobacter genomes contain on average 279 genes involved in ABC transport (171 to 443 per genome), HIMB11 has only 169 genes for ABC transport systems. TRAP transporters are also underrepresented in the HIMB11 genome. These are a large prokaryotic family of solute transporters that contain a substrate-binding protein (DctP) and two membrane proteins (DctQ and DctM). By relying on electrochemical ion gradients rather than ATP for transport [53], they mediate the uptake of a number of different substrates (e.g. succinate, malate, fumarate, pyruvate, taurine, ectoine, DMSP).
Roseobacter genomes contain on average 60 genes devoted to TRAP transporter systems (23 to 135 genes per genome), while the HIMB11 genome harbors 26 TRAP transporter genes. In contrast, drug/metabolite transporters (DMTs), which are another abundant group of transporters found in roseobacters [5], are abundant in HIMB11. DMTs are a ubiquitous superfamily (containing 14 families, six of which are prokaryotic) of drug and metabolite transporters, of which few are functionally characterized [54]. In prokaryotes, most act as pumps for the efflux of drugs and metabolites. On average, individual Roseobacter genomes harbor 27 genes for DMTs (19 to 37 per genome). HIMB11 has 33 DMT genes, which is further elevated when normalized to its small genome size. Thus, the reductive trend for ABC and TRAP transporters is reversed in the DMT family of transporters, potentially a result of selective pressure for the efflux of toxins/metabolites.
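The size estimate and the size-normalized transporter comparison can be reproduced approximately from the figures quoted in this section. The Python sketch below is a rough illustration, not the regression model itself: it derives the full genome size from the draft length and the estimated coverage, then compares transporter gene density per megabase. Pairing the clade's mean gene counts with its median genome size is an assumption made here because per-genome densities are not given in the text; the names `genes_per_mb` and `density_ratio` are illustrative.

```python
# Values reported in the text.
DRAFT_BP = 3_098_747      # draft assembly length in base pairs
COVERAGE = 0.906          # estimated fraction of the genome recovered
CLADE_MEDIAN_MB = 4.35    # median size of cultured Roseobacter genomes

# Full genome size implied by the draft length and estimated coverage.
estimated_mb = DRAFT_BP / COVERAGE / 1e6

himb11_counts = {"ABC": 169, "TRAP": 26, "DMT": 33}
clade_mean_counts = {"ABC": 279, "TRAP": 60, "DMT": 27}


def genes_per_mb(n_genes, genome_mb):
    """Transporter genes per megabase of genome."""
    return n_genes / genome_mb


# A ratio above 1 means the family is denser in HIMB11 than in the baseline.
# Assumption: clade mean counts divided by the clade median genome size is
# only a rough baseline; the text does not report per-genome densities.
density_ratio = {
    family: genes_per_mb(himb11_counts[family], estimated_mb)
    / genes_per_mb(clade_mean_counts[family], CLADE_MEDIAN_MB)
    for family in himb11_counts
}

print(f"estimated genome size: {estimated_mb:.2f} Mb")
for family, ratio in density_ratio.items():
    print(f"{family}: {ratio:.2f}")
```

Under these assumptions, ABC and TRAP transporter genes come out less dense in HIMB11 than the baseline, while DMT genes come out denser, in line with the trend described above.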

Conclusion
HIMB11 represents a member of the Roseobacter lineage that is phylogenomically distinct from other cultured, sequenced members of the Roseobacter clade. This uniqueness is further supported by its small genome and cell size relative to other members of this group that have been similarly investigated. These characteristics, taken together with the atypical transporter inventories, the presence of many alternative methods of energy acquisition (e.g. CO, light), and the periodic abundance of HIMB11 in Kaneohe Bay, suggest that strain HIMB11 is an opportunist in the environment, persisting on relatively few reduced substrates and alternative energy metabolism until conditions arise that are favorable for rapid growth (e.g. a phytoplankton bloom). Consistent with other members of this lineage, HIMB11 has the potential to play an important role in the cycling of the climatically important gases DMS, CO, and CO2, warranting further study in both the laboratory and the field.