Genome sequence of strain HIMB624, a cultured representative from the OM43 clade of marine Betaproteobacteria

Strain HIMB624 is a planktonic marine bacterium within the family Methylophilaceae of the class Betaproteobacteria, isolated from coastal seawater of Oahu, Hawaii. This strain is of interest because it is one of the few known isolates from OM43, an abundant clade of Betaproteobacteria found in cultivation-independent studies of coastal seawater and freshwater environments around the globe. Here we describe some preliminary features of the organism, together with the draft genome sequence and annotation, and a comparative genomic analysis with one other sequenced member of this clade (strain HTCC2181). The 1,333,209 bp genome of strain HIMB624 is arranged in a single scaffold containing four contigs, and contains 1,381 protein-encoding genes and 39 RNA genes.


Introduction
Strain HIMB624 was isolated from surface seawater of Kaneohe Bay, a subtropical bay on the northeastern shore of Oahu, Hawaii, via dilution-to-extinction culturing methods [1,2]. This strain is of interest because it belongs to a globally ubiquitous clade of aquatic bacterioplankton known as OM43, within the obligately methylotrophic family Methylophilaceae of the class Betaproteobacteria. The OM43 lineage was first described in 1997 from a 16S rRNA gene survey of coastal bacterioplankton from the Atlantic coast of the United States [3], and the first report describing the isolation of OM43 strains via modified dilution-to-extinction culturing methods was published in 2002 [1]. Recently, the genome sequence of a member of the OM43 lineage was reported for a strain isolated from the Pacific coast of the United States (HTCC2181) [4]. Here we present a preliminary set of features for strain HIMB624 (Table 1), together with a description of the genomic sequencing and annotation, as well as a preliminary comparative analysis with the genome of strain HTCC2181.

Classification and features
Strain HIMB624 was isolated from seawater collected off the coast of Hawaii, USA, in the subtropical North Pacific Ocean by a high-throughput, dilution-to-extinction approach [1,2]. The strain was re-grown in seawater that was sterilized by tangential flow filtration and by autoclaving. Attempts to cultivate cells on solidified seawater media or artificial seawater media (liquid or solidified) failed. However, amendment of sterile seawater with either methanol or formaldehyde increased the maximum cell density from ca. 1×10⁶ cells ml⁻¹ to ca. 1×10⁷ cells ml⁻¹. Phylogenetic analyses based on 16S rRNA gene sequence comparisons revealed strain HIMB624 to be closely related to a large number of environmental gene clones obtained predominantly from seawater. Alignment of the HIMB624 16S rRNA gene sequence against the Silva release 104 reference database (released in October 2010; n=512,037 entries), which contains only high-quality, aligned 16S rRNA sequences with a minimum length of 1,200 bases for Bacteria [13], revealed 350 entries that belong to the same phylogenetic lineage within the Betaproteobacteria. Of these, only the entries from HTCC2181, HIMB624 and one other strain (AB022337) originated from cultivated isolates, and all entries in the lineage were derived from seawater, freshwater, or other marine environments. In phylogenetic analyses with taxonomically described members of the Betaproteobacteria, strains HIMB624 and HTCC2181 formed a monophyletic lineage within the family Methylophilaceae (Figure 1; 96.5% sequence similarity).
Standards in Genomic Sciences
The 16S rRNA gene of strain HIMB624 was most similar to those of the type strains of Methylophilus luteus strain Mim (94.4%) and Methylophilus flavus strain Ship (94.3%), both isolated from plants [18]; Methylophilus methylotrophus strain NCIMB 10515 (93.7%), isolated from activated sludge [19]; Methylotenera mobilis strain JLW8 (93.7%), isolated from freshwater sediment [20]; Methylobacillus flagellatus strain KT (93.5%), isolated from sewage [21]; Methylovorus mays strain C (92.5%), isolated from maize phyllosphere [22]; and Methylobacillus pratensis strain F31 (91.8%), isolated from meadow grass [23].

Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [12]. If the evidence code is IDA, then the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.

Sequence selection and alignment improvements for the phylogenetic analysis were carried out using the 'All-Species Living Tree' project database [14] and the ARB software package [15]. The tree (Figure 1) was inferred from 1,223 alignment positions using the RAxML maximum likelihood method [16]. Bootstrap support values from 1,000 replicates, determined by RAxML [17], are displayed above branches when larger than 60%. The scale bar indicates substitutions per site.
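The pairwise 16S rRNA similarity values quoted above can be illustrated with a minimal sketch. It assumes that similarity is the percentage of identical bases over alignment columns in which neither sequence has a gap; the source does not state the exact definition used, so this definition, the helper name `percent_identity`, and the toy sequences are all assumptions for illustration only.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumed definition for illustration; not necessarily the one used in the paper.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real 16S rRNA data).
seq_1 = "ACGU-ACGUA"
seq_2 = "ACGUUACGCA"
print(round(percent_identity(seq_1, seq_2), 1))
```

In this toy case nine columns are compared and eight match. Real comparisons would use full-length aligned 16S rRNA gene sequences, such as those exported from ARB.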
In actively growing cultures, cells of strain HIMB624 are long, thin, slightly curved rods, 0.1-0.3 μm wide and 0.6-1.8 μm long (Figure 2). Cells in stationary phase are spherical and approximately 0.2 μm in diameter. Strain HIMB624 can replicate in sterile unamended seawater, reaching cell densities of approximately 1×10⁶ cells ml⁻¹. However, in the presence of either methanol or formaldehyde, HIMB624 can achieve a significantly higher growth rate and cellular abundance, similar to the phylogenetically related strain HTCC2181 [4].
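To put the tenfold increase in maximum cell density into perspective, the short sketch below converts a ratio of cell densities into an equivalent number of population doublings. It assumes growth by binary fission and compares only the two approximate yields reported in the text; `extra_doublings` is a hypothetical helper name.

```python
import math


def extra_doublings(density_low: float, density_high: float) -> float:
    """Number of population doublings separating two cell densities."""
    if density_low <= 0 or density_high <= 0:
        raise ValueError("cell densities must be positive")
    return math.log2(density_high / density_low)


unamended = 1e6  # approx. cells per ml in unamended seawater (from the text)
amended = 1e7    # approx. cells per ml with methanol or formaldehyde (from the text)
print(f"{extra_doublings(unamended, amended):.2f} additional doublings")
```

Under these assumptions, the amendment corresponds to a little over three additional doublings before the culture reaches its maximum density.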

Chemotaxonomy
The fatty acid profile of strain HIMB624 was dominated by anteiso-C17:1, C14:0 and C16:0. This is similar to known obligate and restricted facultative methylotrophs within the Betaproteobacteria, which are typically dominated by anteiso-C17:1 and C16:0 [20]. All of the fatty acids detected in strain HIMB624 are found either in closely related strains or in strains isolated from marine environments. C13:0 2-OH was detected in HIMB624 but not in HTCC2181, and C15:1 iso G was found only in strain HTCC2181.

Genome sequencing and annotation
Genome project history
Strain HIMB624 was selected for whole genome sequencing because of its phylogenetic affiliation with a lineage (OM43) of coastal marine bacterioplankton that is common in 16S rRNA gene surveys of coastal and estuarine systems [24], but is underrepresented in culture collections [1,4]. In addition, a sister lineage is common in freshwater systems [24]. The genome project is deposited in the Genomes OnLine Database (GOLD) as project Gi02451, and in GenBank under the accession number ABXG00000000. A summary of the project is given in Table 2.

Growth conditions and DNA isolation
Strain HIMB624 was grown at 27°C in 100 L of coastal Hawaii seawater sterilized by tangential flow filtration and autoclaving. Cells from liquid culture were collected on a 0.1 µm pore-sized polyethersulfone membrane filter, and DNA was isolated from the microbial biomass using a standard phenol/chloroform/isoamyl alcohol extraction protocol. A total of 74 µg of DNA was obtained.

Genome annotation
The whole genome sequence was automatically annotated using the genome annotation pipeline in the Integrated Microbial Genomes Expert Review (IMG-ER) system [26]. Genes were identified using Glimmer [27]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [28] was used to find tRNA genes, whereas ribosomal RNA genes were found using RNAmmer [29]. Other non-coding RNAs were identified by searching the genome for the Rfam profiles using INFERNAL (v0.81) [30]. Additional gene prediction analysis and manual functional annotation were performed within IMG-ER.

Genome properties
The genome is 1,333,209 bp long and comprises four contigs in a single scaffold, with an overall GC content of 35.37% (Table 3 and Figure 3). Of the 1,420 genes predicted, 1,381 were protein-coding genes and 39 were RNA genes. The majority (83.59%) of the protein-coding genes were assigned a putative function, while the remaining genes were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
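The gene tally above is simple arithmetic, and GC content is a direct count over the assembled sequence. The sketch below is illustrative only: `gc_content` is a hypothetical helper, the toy sequence is invented, and ignoring ambiguous bases such as N is an assumption rather than a documented property of the IMG-ER pipeline.

```python
def gc_content(sequence: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T) of a DNA sequence."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")  # ambiguous bases are ignored
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Gene counts reported in the text.
protein_coding_genes = 1381
rna_genes = 39
total_genes = protein_coding_genes + rna_genes
print(total_genes)  # matches the 1,420 predicted genes

# Toy sequence (hypothetical); a real run would read the four contigs from a FASTA file.
print(f"{gc_content('ATGCNNATAT'):.2f}% GC")
```

For a multi-contig draft such as this one, the overall value would be computed over the concatenated contigs, not by averaging per-contig percentages.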

Insights from the Genome
Of the 1,381 protein-encoding genes in the genome of HIMB624, 1,135 are shared with HTCC2181, representing 82-84% of the genes in the two genomes (Figure 4). Pathways for the synthesis of all twenty amino acids are present in both strains, as are pathways for the synthesis of all major vitamins except B12. The family Methylophilaceae consists of obligate methylotrophs and, while HIMB624 and HTCC2181 lack genes coding for either the large (mxaF) or small (mxaI) subunit of a confirmed methanol dehydrogenase, both organisms appear to have genes coding for a related analog of mxaF, known as xoxF. Methanol dehydrogenase activity of this paralog has been questioned for some time (see [4] and references therein), but current evidence suggests that the xoxF genes in these organisms code for a large subunit having methanol dehydrogenase activity [4].

a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.

The genomes of HIMB624 and HTCC2181 were compared to those of two closely related species within the family Methylophilaceae whose whole genomes are publicly available: Methylotenera mobilis (NC_012968) and Methylovorus glucosotrophus SIP3-4 (NC_012969, NC_012970, NC_012972). For this comparison only, the four strains were automatically annotated using the RAST annotation server [31], and protein sequences were compared using the sequence-based analysis tool in order to identify all shared and unique gene combinations (Figure 4). In addition to a single large chromosome, Methylovorus glucosotrophus SIP3-4 has two plasmids, whereas the remaining three genomes each consist of a single chromosome. Strain HIMB624 contains one gene for a Type 4 fimbrial assembly/ATPase PilB that shares 43.44% protein identity with a gene located on one of the plasmids of Methylovorus glucosotrophus SIP3-4, and strain HTCC2181 contains a single DNA methylase gene that shares 31.1% protein identity with a gene on the same plasmid.
Other than these, all genes located on the plasmids are exclusive to Methylovorus glucosotrophus SIP3-4, and the large majority of the genes on the plasmids encode hypothetical proteins. The genomes of Methylotenera mobilis and Methylovorus glucosotrophus SIP3-4 share over 100 genes associated with motility (twitching, flagella-related, pili), along with 13 genes for chemotaxis and 13 genes for secretion, that are absent from the genomes of HIMB624 and HTCC2181. Conversely, the two smaller genomes have a higher percentage of their genomes (9.13% and 9.19%) dedicated to amino acid transport and metabolism than Methylovorus glucosotrophus SIP3-4 (6.76%) and Methylotenera mobilis (5.81%), and a higher percentage dedicated to translation, ribosomal structure and biogenesis (11.08% and 11.47%) than Methylovorus glucosotrophus SIP3-4 (6.12%) and Methylotenera mobilis (7.16%). Because the two OM43 lineage genomes are small, these higher percentages translate into a similar total number of genes across all four genomes in these categories: approximately 120 genes for amino acid transport and metabolism and approximately 140 genes for translation, ribosomal structure and biogenesis. The general distribution of genes in all other predicted COG categories is comparable among the four strains, which results in smaller total numbers of genes in each COG category for the two members of the OM43 lineage because of their comparatively smaller genome sizes.

a) The total is based on the total number of protein coding genes in the annotated genome.
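The shared and unique gene combinations described in this section (Figure 4) can be thought of as a partition of gene families by the exact set of genomes in which each occurs. The sketch below is a minimal illustration under stated assumptions: it takes precomputed gene-family assignments as input, it is not the RAST sequence-based comparison itself, and `venn_regions`, the genome labels and the family identifiers are hypothetical. The final lines recompute the shared fraction for HIMB624 from the two counts given in the text.

```python
def venn_regions(genomes: dict[str, set[str]]) -> dict[frozenset, set[str]]:
    """Group gene families by the exact combination of genomes that contain them."""
    regions: dict[frozenset, set[str]] = {}
    all_families = set().union(*genomes.values())
    for family in all_families:
        members = frozenset(name for name, fams in genomes.items() if family in fams)
        regions.setdefault(members, set()).add(family)
    return regions


# Toy input (hypothetical genome labels and family IDs, not real annotation data).
toy = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2", "f4"},
    "C": {"f1", "f5"},
}
regions = venn_regions(toy)
for members in sorted(regions, key=lambda m: (-len(m), sorted(m))):
    print(sorted(members), sorted(regions[members]))

# Shared fraction for HIMB624, using the counts reported in the text.
shared_genes = 1135
himb624_protein_genes = 1381
print(f"{100 * shared_genes / himb624_protein_genes:.1f}% of HIMB624 genes shared")
```

With four genomes there are up to fifteen non-empty combinations, which is why the comparison is usually presented as a Venn diagram. How genes are grouped into families (for example, by an identity threshold) is a separate decision that this sketch leaves to the caller.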