Draft genome sequence of strain HIMB100, a cultured representative of the SAR116 clade of marine Alphaproteobacteria

Strain HIMB100 is a planktonic marine bacterium in the class Alphaproteobacteria. This strain is of interest because it is one of the first known isolates from a globally ubiquitous clade of marine bacteria known as SAR116 within the family Rhodospirillaceae. Here we describe preliminary features of the organism, together with the draft genome sequence and annotation. This is the second genome sequence of a member of the SAR116 clade. The 2,458,945 bp genome contains 2,334 protein-coding and 42 RNA genes.


Introduction
HIMB100 is a taxonomically uncharacterized marine bacterial strain isolated from surface seawater collected off the coast of Oahu, Hawaii, in the subtropical Pacific Ocean [1]. It is of significant interest because it belongs to a 16S rRNA gene clade of marine Alphaproteobacteria known as SAR116, which was first described by Mullins et al. in 1995 [2] and has since been found to be widespread in the global surface ocean based on cultivation-independent surveys of marine bacterioplankton [3][4][5][6][7][8][9]. The first cultured strain of this clade was isolated from surface waters of the Pacific Ocean off the coast of Oregon, USA, in 2007 [10]. In 2010, the genome sequence of Candidatus Puniceispirillum marinum IMCC1322, a cultivated member of the SAR116 clade isolated from the East Sea (Sea of Japan) in the western Pacific Ocean, was reported [11]. Here we present a preliminary set of features for strain HIMB100 (Table 1), together with a description of the draft genome sequence and its annotation, as well as a preliminary comparative analysis with the complete genome of Candidatus P. marinum IMCC1322.

Classification and features
Strain HIMB100 was isolated by a high-throughput, dilution-to-extinction approach [20] from seawater collected off the coast of Hawaii, USA, in the subtropical North Pacific Ocean. Its 16S rRNA gene sequence was identical to those of three other isolates obtained in the same study [1]. All four strains were isolated in seawater sterilized by tangential flow filtration and amended with low concentrations of inorganic nitrogen and phosphorus (1.0 µM NH4Cl, 1.0 µM NaNO3, and 0.1 µM KH2PO4). Repeated attempts to cultivate the isolates on solidified culture media or in artificial seawater media failed. In addition, preliminary attempts have failed to identify amendments to the seawater-based culture medium that would increase the abundance of cells in culture above ca. 1 × 10⁶ cells ml⁻¹.
Phylogenetic analyses based on 16S rRNA gene sequence comparisons revealed strain HIMB100 to be closely related to a large number of environmental gene clones obtained almost exclusively from seawater. For example, alignment of HIMB100 against the Silva release 104 reference database (512,037 high-quality bacterial 16S rRNA sequences >1200 base pairs in length, released October 2010) revealed 554 entries that belong to the same phylogenetic lineage within the Alphaproteobacteria. Of these, only one originated from a cultivated isolate (Candidatus P. marinum IMCC1322), and all 554 entries derived from either seawater or the marine environment. The 16S rRNA gene sequence from Oregon coast strain HTCC8037 was 98.0% similar to that of strain HIMB100, but it does not appear in the Silva reference database because it is a partial sequence of only 884 nucleotides [10]. In phylogenetic analyses with taxonomically described members of the Alphaproteobacteria, strain HIMB100 and Candidatus P. marinum IMCC1322 (94.1% similar) formed a monophyletic lineage within the family Rhodospirillaceae (Figure 1). The 16S rRNA gene of strain HIMB100 was most similar to those of the type strains of Nisaea denitrificans (90.3%), N. nitritireducens (89.9%), and Thalassobaculum salexigens (89.3%), which were all isolated from surface seawater of the northwestern Mediterranean Sea [25,26]; T. litoreum (89.5%), isolated from coastal seawater off Korea [27]; and Oceanibaculum indicum (89.4%), isolated from a polycyclic aromatic hydrocarbon-degrading consortium that was enriched from a deep-seawater sample collected from the Indian Ocean [28].
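The similarity values quoted above are pairwise comparisons of aligned 16S rRNA gene sequences. The text does not state the exact formula or gap treatment used, so the following is only a minimal sketch of one common convention: percent identity over alignment columns in which neither sequence has a gap. The function name `percent_identity` and the toy sequences are hypothetical and are not taken from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same alignment (equal length) and
    gaps are written as '-'. Gapped columns are ignored rather than penalized.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example (not real 16S data): 9 ungapped columns, 8 of them identical.
toy_a = "ACGTACGT-A"
toy_b = "ACGTTCGTAA"
print(round(percent_identity(toy_a, toy_b), 1))
```

Other conventions (for example, counting gaps as mismatches, or restricting the comparison to a fixed set of alignment positions) give slightly different values, which matters when a partial sequence such as the 884-nucleotide HTCC8037 fragment is compared with near-full-length genes.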
Cells of strain HIMB100 are long, thin spiral-shaped rods (0.3 × 1-5 μm) when in exponential growth (Figure 2). Because it is able to grow in media consisting solely of sterile seawater with added inorganic nitrogen and phosphorus in the light or dark, HIMB100 is presumed to grow chemoheterotrophically by oxidizing compounds in the dissolved organic carbon pool of natural seawater. A summary of other known preliminary features is shown in Table 1.

Chemotaxonomy
No cellular fatty acid profiles are currently available for strain HIMB100, nor have any been reported for other cultivated members of the SAR116 clade.

Genome sequencing and annotation

Genome project history
Strain HIMB100 was selected for sequencing because of its phylogenetic affiliation with a widespread lineage of marine bacteria that is significantly underrepresented in culture collections. The genome project is deposited in the Genomes OnLine Database (GOLD) as project Gi06671, and the draft genome sequence is deposited in GenBank under accession number AFXB00000000. A summary of the main project information is shown in Table 2.

Table 1 (excerpt). Classification of strain HIMB100, with evidence codes.
Phylum: Proteobacteria, TAS [14]
Class: Alphaproteobacteria, TAS [15,16]
Order: Rhodospirillales, TAS [17,18]
Family: Rhodospirillaceae, TAS [17,18]
Genus: not assigned
Evidence codes. IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [19]. If the evidence code is IDA, the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.

Growth conditions and DNA isolation
Strain HIMB100 was grown at 27 °C in 50 L of coastal Hawaii seawater sterilized by tangential flow filtration [1] and supplemented with (final concentration) 10 µM NH4Cl, 1.0 µM KH2PO4, 1.0 µM L-serine, 1.0 µM L-methionine, 10 mM FeCl3, 0.1 µM betaine, 0.001% (wt/vol) of D-ribose, D-glucose, succinic acid, pyruvic acid, glycerol, and N-acetyl-D-glucosamine, 0.002% (vol/vol) ethanol, and Va vitamin mix at a 10⁻³ dilution [20]. Cells from the liquid culture were collected on a membrane filter, and DNA was isolated from the microbial biomass using a standard phenol/chloroform/isoamyl alcohol extraction protocol. A total of ca. 12 μg of DNA was obtained.

Figure 1. Phylogenetic tree based on comparisons between 16S rRNA gene sequences from strain HIMB100, Candidatus Puniceispirillum marinum IMCC1322, and type strains of related species within the family Rhodospirillaceae. Sequence selection and alignment improvements were carried out using the 'All-Species Living Tree' project database [21] and the ARB software package [22]. The tree was inferred from 1,206 alignment positions using the RAxML maximum likelihood method [23]. Support values from 100 bootstrap replicates, determined by RAxML [24], are displayed above branches if larger than 60%. The scale bar indicates substitutions per site.

Genome annotation
Genes were identified using Prodigal [29] as part of the genome annotation pipeline in the Integrated Microbial Genomes Expert Review (IMG-ER) system [30]. The predicted coding sequences were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [31] was used to find tRNA genes, whereas ribosomal RNA genes were found using RNAmmer [32]. Other non-coding RNAs were identified by searching the genome for the Rfam profiles using INFERNAL (v0.81) [33]. Additional gene prediction analysis and manual functional annotation were performed within IMG-ER.

Genome properties
The genome is 2,458,945 bp long and comprises 10 contigs ranging in size from 30,717 to 1,167,490 bp, with an overall GC content of 50.48% (Table 3 and Figure 3). Of the 2,376 genes predicted, 2,334 were protein-coding genes and 42 were RNA genes. Most protein-coding genes (82.0%) were assigned putative functions, while the remaining genes were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
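Summary statistics of this kind (number of contigs, total length, contig size range, and GC content) can be recomputed from a multi-FASTA file of the assembly. The sketch below is illustrative only and is not the IMG-ER procedure: the function names `parse_fasta` and `assembly_stats` and the toy input are hypothetical, and GC content is computed over all bases, with ambiguous bases counted in the denominator.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def assembly_stats(text):
    """Return contig count, total/min/max length (bp), and GC percentage."""
    lengths = []
    gc = 0
    for _, seq in parse_fasta(text):
        lengths.append(len(seq))
        gc += seq.count("G") + seq.count("C")
    if not lengths or sum(lengths) == 0:
        raise ValueError("no sequence data found")
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "min_bp": min(lengths),
        "max_bp": max(lengths),
        "gc_percent": 100.0 * gc / total,
    }


# Toy two-contig assembly (not HIMB100 data).
toy = ">contig1\nACGT\nGGCC\n>contig2\nATAT\n"
print(assembly_stats(toy))
```

To apply this to a real assembly, the contents of the FASTA file would be read into a string and passed to `assembly_stats`. The gene totals in the paragraph above can be checked the same way by simple addition: 2,334 protein-coding genes plus 42 RNA genes gives the 2,376 predicted genes.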

Genome comparisons with Candidatus Puniceispirillum marinum IMCC1322
The genome of one other member of the SAR116 clade, Candidatus P. marinum IMCC1322, was recently sequenced [11]. This genome is 2,753,527 bp in length (295 kbp longer than that of HIMB100), is arranged in a single chromosome, and has a G + C content similar to that of HIMB100 (48.85% vs. 50.48%). Although the genome of Candidatus P. marinum IMCC1322 is annotated with over 200 more genes than that of HIMB100 (2,582 genes vs. 2,376), it encodes only 51 additional protein-coding genes with predicted function. The predicted metabolic potentials encoded by the two genomes have many features in common. The genomes of both strains possess a lesion in the Embden-Meyerhof-Parnas pathway in that they lack the enzyme 6-phosphofructokinase. However, the genomes of both strains possess two key enzymes of the Entner-Doudoroff pathway, phosphogluconate dehydratase and 2-keto-3-deoxy-phosphogluconate aldolase. The oxidative portion of the pentose phosphate pathway is incomplete in both strains; the genome of HIMB100 lacks a recognizable 6-phosphogluconolactonase, while the genomes of both strains lack a recognizable 6-phosphogluconate dehydrogenase. In addition, several genes of predicted biogeochemical importance are present in both strains, including proteorhodopsin and carotenoid biosynthesis genes, carbon monoxide dehydrogenase, dimethylsulfoniopropionate (DMSP) demethylase, and dimethylsulfoxide (DMSO) reductase. Genes for assimilatory sulfate reduction are incomplete in both genomes, and so it is hypothesized that exogenous reduced sulfur compounds, such as DMSP and DMSO, are likely to fill the requirement of sulfur for cellular growth. The genomes of both strains possess a high-affinity inorganic phosphate transport system (pstSCAB), and encode a phosphate regulon sensor (phoU), a phosphate starvation-inducible protein (phoH), and the phosphate regulon consisting of the phoB-phoR two-component system.
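The size, gene-count, and G + C differences quoted above follow directly from the reported totals. As a quick arithmetic check, the sketch below recomputes them; the dictionary layout and variable names are hypothetical, and every number is taken from the text.

```python
# Values as reported in the text for the two SAR116 genomes.
himb100 = {"size_bp": 2_458_945, "genes": 2_376, "gc_percent": 50.48}
imcc1322 = {"size_bp": 2_753_527, "genes": 2_582, "gc_percent": 48.85}

# Difference in genome size, in base pairs and rounded to kbp.
size_diff_bp = imcc1322["size_bp"] - himb100["size_bp"]
size_diff_kbp = round(size_diff_bp / 1000)

# Difference in the number of annotated genes.
gene_diff = imcc1322["genes"] - himb100["genes"]

# Difference in G + C content, in percentage points.
gc_diff = round(himb100["gc_percent"] - imcc1322["gc_percent"], 2)

print(size_diff_bp, size_diff_kbp, gene_diff, gc_diff)
```

The results (a 294,582 bp difference, i.e. about 295 kbp; 206 more annotated genes in IMCC1322; and a G + C difference of 1.63 percentage points) are consistent with the figures given in the paragraph.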
Both genomes encode ABC transporters for iron, glycine betaine/proline, zinc, sorbitol/mannitol, amino acids (branched-chain and general L-amino acids), and sulfonate/nitrate/taurine, as well as a heme exporter. Thiamine and alpha-glucoside transport systems were identified only within the genome of strain HIMB100, while ribose and putrescine transport systems were identified only within the genome of Candidatus P. marinum IMCC1322. Finally, two operons of potential ecological relevance show different distributions within the two genomes: the genome of strain HIMB100 possesses a seven-gene operon encoding all of the subunits and accessory proteins for urease that is completely lacking in the genome of Candidatus P. marinum IMCC1322, while the genome of Candidatus P. marinum IMCC1322 possesses 21 genes for cobalamin biosynthesis that are absent from the genome of strain HIMB100.

Table 3, footnote a) The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
Table 4, footnote a) The total is based on the total number of protein-coding genes in the annotated genome.