Genome sequence of the Leisingera aquimarina type strain (DSM 24565T), a member of the marine Roseobacter clade rich in extrachromosomal elements

Leisingera aquimarina Vandecandelaere et al. 2008 is a member of the genomically well characterized Roseobacter clade within the family Rhodobacteraceae. Representatives of the marine Roseobacter clade are metabolically versatile and involved in carbon fixation and biogeochemical processes. They form a physiologically heterogeneous group, found predominantly in coastal or polar waters, especially in symbiosis with algae, in microbial mats, in sediments or associated with invertebrates. Here we describe the features of L. aquimarina DSM 24565T together with the permanent-draft genome sequence and annotation. The 5,344,253 bp long genome consists of one chromosome and an unusually high number of seven extrachromosomal elements and contains 5,129 protein-coding and 89 RNA genes. It was sequenced as part of the DOE Joint Genome Institute Community Sequencing Program 2010 and of the activities of the Transregional Collaborative Research Centre 51 funded by the German Research Foundation (DFG).


Introduction
Strain R-26159T (= DSM 24565T = LMG 24366T = CCUG 55860T) is the type strain of the species Leisingera aquimarina [1], one of the three species with a validly published name currently in the genus Leisingera; the other two are the type species L. methylohalidivorans [1,2] and L. nanhaiensis [3]. The genus Leisingera is a member of the widespread Roseobacter clade, which is present in various marine habitats [4]. Strain R-26159T was isolated from a marine electroactive biofilm grown on a stainless-steel cathode that was exposed to natural seawater at the ISMAR-CNR Marine Station within the harbor of Genova (Italy) [1]. The genus Leisingera was named after Thomas Leisinger for his work on bacterial methyl halide metabolism [2]; the species epithet aquimarina combines the Latin aqua (water) and the adjective marinus (of the sea), meaning from seawater. PubMed records do not currently indicate any follow-up research with strain R-26159T after the initial description of L. aquimarina [1]. Here we present a summary classification and a set of features for L. aquimarina DSM 24565T, together with the description of the genomic sequencing and annotation.
Standards in Genomic Sciences
(HSPs) from the best 250 hits) with the most recent release of the Greengenes database [7], and the relative frequencies of taxa and keywords (reduced to their stem [8]) were determined, weighted by BLAST scores. The most frequently occurring genera were Phaeobacter (31.4%), Ruegeria (25.9%), Silicibacter (16.1%), Roseobacter (14.4%) and Nautella (3.9%) (127 hits in total). Regarding the four hits to sequences from other members of the genus, the average identity within HSPs was 99.4%, whereas the average coverage by HSPs was 99.3%. Among all other species, the one yielding the highest score was Leisingera methylohalidivorans (NR_025637), which corresponded to an identity of 99.2% and an HSP coverage of 100.0%.
(Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was FJ202534 (Greengenes short name 'and White Plague Disease-Induced Changes Caribbean Coral Montastraea faveolata kept aquarium 23 days clone SGUS1024'), which showed an identity of 97.8% and an HSP coverage of 100.0%. The most frequently occurring keywords within the labels of all environmental samples that yielded hits were 'coral' (4.7%), 'caribbean' (3.8%), 'faveolata' (3.5%), 'chang' (3.4%) and 'white' (3.3%) (117 hits in total). No environmental sample yielded a hit with a higher score than the highest-scoring species, which might indicate that the species is rarely found in the environment. Figure 1 shows the phylogenetic neighborhood of L. aquimarina in a 16S rRNA gene-based tree. The sequences of the four identical 16S rRNA gene copies in the genome do not differ from the previously published 16S rRNA gene sequence AM900415.
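The score-weighted taxon frequencies reported above can be illustrated with a short sketch. It assumes that each hit is a hypothetical (genus, score) pair and that a taxon's frequency is its share of the summed BLAST scores; the exact weighting used in the original analysis may differ in detail, and the example hits are invented.

```python
from collections import defaultdict


def weighted_frequencies(hits: list[tuple[str, float]]) -> dict[str, float]:
    """Percentage of the total BLAST score contributed by each label.

    Assumption: hits are (label, score) pairs, e.g. genus names with bit scores.
    """
    totals: dict[str, float] = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("scores must sum to a positive value")
    # Rank labels from most to least frequent, as reported in the text.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {label: 100 * score / grand_total for label, score in ranked}


# Hypothetical hits, not the data behind the percentages above.
example_hits = [("Phaeobacter", 300.0), ("Ruegeria", 100.0), ("Phaeobacter", 100.0)]
print(weighted_frequencies(example_hits))  # {'Phaeobacter': 80.0, 'Ruegeria': 20.0}
```

Weighting by score rather than counting hits lets a few strong matches outweigh many weak ones, which is why the percentages need not track the raw number of hits per genus.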

Figure 1.
Phylogenetic tree highlighting the position of L. aquimarina relative to the type strains of the other species within the genus Leisingera and the neighboring genera Phaeobacter and Ruegeria. The tree was inferred from 1,383 aligned characters [9,10] of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion [11]. Rooting was done initially using the midpoint method [12] and then checked for its agreement with the current classification (Table 1). The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches are support values from 1,000 ML bootstrap replicates [13] (left) and from 1,000 maximum-parsimony bootstrap replicates [14] (right) if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [15] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks [16,17]. The genomes of P. caeruleus [18] and P. arcticus [19] are reported in this issue.
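Midpoint rooting places the root halfway along the longest path between any two leaves of the unrooted tree. The plain-Python sketch below illustrates the idea on a toy tree stored as a hypothetical dict of branch lengths; it is not the software used to root Figure 1, and its brute-force search over all leaf pairs is only suitable for small trees.

```python
from itertools import combinations

# Hypothetical representation: node -> {neighbor: branch length}.
Tree = dict[str, dict[str, float]]


def add_edge(tree: Tree, u: str, v: str, length: float) -> None:
    """Store an undirected branch of the given length."""
    tree.setdefault(u, {})[v] = length
    tree.setdefault(v, {})[u] = length


def path_between(tree: Tree, start: str, goal: str) -> list[str]:
    """Return the unique node path between two nodes of a tree (depth-first search)."""
    stack = [(start, [start])]
    seen = {start}
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nbr in tree[node]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append((nbr, path + [nbr]))
    raise ValueError(f"{goal!r} is not reachable from {start!r}")


def path_length(tree: Tree, path: list[str]) -> float:
    return sum(tree[u][v] for u, v in zip(path, path[1:]))


def midpoint_root(tree: Tree) -> tuple[str, str, float]:
    """Return (u, v, d): the root lies on branch u-v at distance d from u."""
    leaves = [node for node, nbrs in tree.items() if len(nbrs) == 1]
    # Longest leaf-to-leaf path; ties are resolved by the first pair found.
    a, b = max(combinations(leaves, 2),
               key=lambda pair: path_length(tree, path_between(tree, *pair)))
    path = path_between(tree, a, b)
    half = path_length(tree, path) / 2
    travelled = 0.0
    for u, v in zip(path, path[1:]):
        if travelled + tree[u][v] >= half:
            return u, v, half - travelled
        travelled += tree[u][v]
    raise AssertionError("unreachable for a tree with at least two leaves")


toy: Tree = {}
for u, v, length in [("A", "X", 1.0), ("B", "X", 1.0), ("X", "Y", 0.5),
                     ("Y", "C", 3.0), ("Y", "D", 0.5)]:
    add_edge(toy, u, v, length)
print(midpoint_root(toy))  # ('Y', 'C', 0.75)
```

In the toy tree the longest path is A-X-Y-C (length 4.5), so the root falls 2.25 units from A, i.e. 0.75 units along the long branch leading to C.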

Morphology and physiology
Cells of strain R-26159T are Gram-negative, ovoid (1 × 1.4 µm) and possess a single polar flagellum (not visible in Figure 2), which is used for motility. Poly-β-hydroxybutyrate is present in inclusion bodies. Colonies are dark beige-pink, round and 1-2 mm in diameter after 3 days of incubation on marine agar (MA). Growth occurs after 2 days of incubation at 20°C on MA, but not on Reasoner's 2A agar (R2A), Nutrient agar (NA), Trypticase-Soy agar (TSA) or Peptone-Yeast Extract-Glucose agar (PYG). The temperature range for growth is 4-37°C; no growth occurs at 40°C or higher. The salinity range for growth is 1-7% NaCl. The pH range for growth is 5.5-9.0, with an optimum between 6.5 and 8. Growth occurs on betaine (1 mM) as a sole carbon source, but not on L-methionine (10 mM). Cells are catalase- and oxidase-positive. Degradation of gelatin is weakly positive, but cells do not degrade tyrosine, DNA, starch, casein, chitin, aesculin or Tween 80. The strain shows leucine arylamidase activity and weak alkaline phosphatase, esterase lipase (C8) and naphthol-AS-BI-phosphohydrolase activities. No activity is detected for esterase (C4), valine arylamidase, acid phosphatase, α-galactosidase, β-glucuronidase, α-glucosidase, β-glucosidase, N-acetyl-β-glucosaminidase, α-mannosidase, lipase (C14), cystine arylamidase, trypsin, α-chymotrypsin, arginine dihydrolase, urease or α-fucosidase. Nitrate is not reduced to nitrite or nitrogen. Indole is not produced and glucose is not fermented. Cells do not assimilate D-glucose, L-arabinose, D-mannose, D-mannitol, N-acetylglucosamine, maltose, potassium gluconate, capric acid, adipic acid, malate, citrate or phenylacetic acid. Cells are susceptible to cefoxitin (30 µg), erythromycin (15 µg), tetracycline (30 µg) and streptomycin (25 µg), but resistant to vancomycin (30 µg), trimethoprim (1.25 µg), clindamycin (2 µg) and gentamicin (30 µg) (all data from [1]). The utilization of carbon compounds by L. aquimarina DSM 24565T grown at 20°C was also determined for this study using Generation-III microplates in an OmniLog phenotyping device (BIOLOG Inc., Hayward, CA, USA). The microplates were inoculated at 28°C with a cell suspension at a cell density of 95-96% turbidity and dye IF-A. Further additives were vitamin, micronutrient and sea-salt solutions. The exported measurement data were further analyzed with the opm package for R [31,32], using its functionality for statistically estimating parameters from the respiration curves and translating them into negative, ambiguous and positive reactions. The strain was studied in two independent biological replicates, and reactions that differed between the two replicates were regarded as ambiguous. At 28°C the strain reacted poorly, with positive reactions only for 1% NaCl, 4% NaCl and lithium chloride. This is in accordance with the comparatively low median of the temperature range for growth of the strain [1].
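The replicate rule described above (a reaction that differs between replicates is ambiguous) can be written down compactly. This sketch assumes that each well has already been reduced to one curve parameter per replicate and discretizes it with two hypothetical cut-offs; those cut-offs are illustrative and are not the procedure implemented in opm, so only the replicate-combination step mirrors the text.

```python
from enum import Enum


class Call(Enum):
    NEGATIVE = "-"
    AMBIGUOUS = "?"
    POSITIVE = "+"


def discretize(value: float, low: float, high: float) -> Call:
    """Map one curve parameter to a call using two hypothetical cut-offs."""
    if low >= high:
        raise ValueError("low must be smaller than high")
    if value < low:
        return Call.NEGATIVE
    if value > high:
        return Call.POSITIVE
    return Call.AMBIGUOUS


def combine_replicates(calls: list[Call]) -> Call:
    """A reaction counts only if all replicates agree; otherwise it is ambiguous."""
    if not calls:
        raise ValueError("at least one replicate is required")
    first = calls[0]
    return first if all(call is first for call in calls) else Call.AMBIGUOUS


# Hypothetical curve parameters for two replicates of two hypothetical wells.
wells = {"A01": (210.0, 230.0), "B02": (40.0, 150.0)}
for well, values in wells.items():
    calls = [discretize(v, low=100.0, high=200.0) for v in values]
    print(well, combine_replicates(calls).value)
```

Well A01 is positive in both replicates and stays positive; B02 is negative in one replicate and ambiguous in the other, so it is reported as ambiguous.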

Genome sequencing and annotation
Genome project history
This organism was selected for sequencing on the basis of the DOE Joint Genome Institute Community Sequencing Program 2010, CSP 441: "Whole genome type strain sequences of the genera Phaeobacter and Leisingera - a monophyletic group of highly physiologically diverse organisms". The genome project is deposited in the Genomes OnLine Database [15] and the complete genome sequence was submitted to GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
A culture of L. aquimarina DSM 24565T was grown in DSMZ medium 514 (BACTO Marine Broth) [34] at 20°C. Genomic DNA was isolated from 0.5-1 g of cell paste using the Jetflex Genomic DNA Purification Kit (GENOMED 600100) following the standard protocol provided by the manufacturer, modified by the addition of 20 µl proteinase K and a 40-minute incubation. DNA is available through the DNA Bank Network [35].

Genome sequencing and assembly
The draft genome was generated using Illumina data [36]. For this genome, we constructed and sequenced an Illumina short-insert paired-end library with an average insert size of 270 bp, which generated 13,668,574 reads, and an Illumina long-insert paired-end library with an average insert size of 8,047.58 ± 2,682.23 bp, which generated 11,512,166 reads, totaling 3,777 Mbp of Illumina data (Feng Chen, unpublished). All general aspects of library construction and sequencing can be found at the JGI web site [37]. The initial draft assembly contained 64 contigs in 18 scaffolds. The initial draft data were assembled with Allpaths [38], and the consensus was computationally shredded into 10 kbp overlapping fake reads (shreds). The Illumina draft data were also assembled with Velvet [39], and the consensus sequences were computationally shredded into 1.5 kbp overlapping fake reads (shreds). The Illumina draft data were assembled again with Velvet, using the shreds from the first Velvet assembly to guide the next assembly.
The consensus from the second Velvet assembly was shredded into 1.5 kbp overlapping fake reads.
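Two steps of this workflow lend themselves to a short sketch: the arithmetic relating read counts, yield and genome size, and the shredding of a consensus into overlapping fake reads. The sketch uses the rounded 3,777 Mbp yield and the 5,344,253 bp genome length from the abstract; the resulting mean read length of about 150 bp and roughly 700-fold raw coverage are derived here and are not values reported by the sequencing center. The overlap between shreds is not stated in the text, so the `overlap` parameter is hypothetical.

```python
def mean_read_length(total_bases: int, read_counts: list[int]) -> float:
    """Average read length implied by a sequencing yield and the number of reads."""
    return total_bases / sum(read_counts)


def fold_coverage(total_bases: int, genome_length: int) -> float:
    """Raw sequencing depth: bases sequenced per base of the genome."""
    return total_bases / genome_length


def shred(consensus: str, shred_len: int, overlap: int) -> list[str]:
    """Cut a consensus sequence into overlapping fixed-length fake reads."""
    if not 0 <= overlap < shred_len:
        raise ValueError("overlap must be in [0, shred_len)")
    if len(consensus) <= shred_len:
        return [consensus]
    step = shred_len - overlap
    shreds = []
    start = 0
    while start + shred_len < len(consensus):
        shreds.append(consensus[start:start + shred_len])
        start += step
    # Anchor the final shred at the sequence end so that no bases are dropped.
    shreds.append(consensus[-shred_len:])
    return shreds


yield_bp = 3_777_000_000  # 3,777 Mbp as reported (rounded)
reads = [13_668_574, 11_512_166]  # short-insert and long-insert libraries
print(round(mean_read_length(yield_bp, reads), 1))  # 150.0
print(round(fold_coverage(yield_bp, 5_344_253)))  # 707
print(shred("ACGTACGTAC", shred_len=4, overlap=2))  # ['ACGT', 'GTAC', 'ACGT', 'GTAC']
```

For real data the same function would be called with `shred_len=10_000` or `shred_len=1_500`, matching the shred sizes given in the text.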

Genome annotation
Genes were identified using Prodigal [41] as part of the JGI genome annotation pipeline [42], followed by a round of manual curation using the JGI GenePRIMP pipeline [43]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFAM, Pfam, PRIAM, KEGG, COG and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [44].
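The CDS translation step can be illustrated with a compact codon-table lookup. Bacterial genes use NCBI translation table 11, which encodes the same amino acids as the standard code but permits alternative start codons. Treating only ATG, GTG and TTG as starts is a simplification, since table 11 allows a few more. This is an illustrative sketch, not code from the JGI pipeline.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (NCBI standard code). Translation
# table 11, used for bacteria, encodes the same amino acids.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Simplification: table 11 allows further start codons (e.g. CTG, ATT).
START_CODONS = {"ATG", "GTG", "TTG"}


def translate_cds(cds: str) -> str:
    """Translate a CDS up to the first stop codon; an initial start codon reads as M."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length is not a multiple of three")
    protein = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if i == 0 and codon in START_CODONS:
            protein.append("M")  # initiator codons are read as methionine
            continue
        amino_acid = CODON_TABLE[codon]  # KeyError for ambiguous bases such as N
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("GTGAAATTTTAA"))  # MKF
```

The start-codon rule matters for bacteria: an internal GTG is read as valine, but as the first codon of a CDS it initiates translation with methionine.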

Genome properties
The genome statistics are provided in Table 3 and Figure 3. The genome consists of a 4.25 Mbp chromosome and seven extrachromosomal elements of 6.2 to 248.9 kbp in length and has a G+C content of 61.4%. Of the 5,218 genes predicted, 5,129 were protein-coding genes and 89 were RNA genes. The majority of the protein-coding genes (80.4%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
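The G+C content is the proportion of G and C among all A, C, G and T bases. For a genome made of several replicons it should be pooled over all sequences, which weights each replicon by its length, rather than averaged across per-replicon values. The sketch below assumes that ambiguous symbols such as N are simply ignored.

```python
def base_counts(seq: str) -> tuple[int, int]:
    """Return (G+C count, A+T count); other symbols are ignored."""
    s = seq.upper()
    return s.count("G") + s.count("C"), s.count("A") + s.count("T")


def gc_content(*replicons: str) -> float:
    """G+C content in percent, pooled over all given sequences."""
    gc = at = 0
    for seq in replicons:
        g, a = base_counts(seq)
        gc += g
        at += a
    if gc + at == 0:
        raise ValueError("no A, C, G or T bases found")
    return 100 * gc / (gc + at)


print(round(gc_content("GGCCAT"), 2))  # 66.67
# Pooling differs from averaging per-replicon values:
print(round(gc_content("GGGG", "AT"), 2))  # 66.67
print((gc_content("GGGG") + gc_content("AT")) / 2)  # 50.0
```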

Insights into the genome
Genome sequencing of Leisingera aquimarina DSM 24565T revealed seven plasmids with sizes between 6 kb and 249 kb (Table 5). The circular conformation of the chromosome and the two smallest extrachromosomal elements has been experimentally validated. The six larger plasmids contain characteristic replication modules [45] of the RepABC-, DnaA-like, RepA- and RepB-type, comprising a replicase and the parAB partitioning operon [46]. The respective replicases that mediate the initiation of replication are designated according to the established plasmid classification scheme [47]. The different numbering of, e.g., the replicases RepC-8, RepC-13 and RepC-14 from RepABC-type plasmids corresponds to specific plasmid compatibility groups, which are required for the stable coexistence of the replicons within the same cell [48]. The cryptic 6 kb plasmid pAqui_G6 contains a solitary RepA-II type replicase without a partitioning module, but its maintenance in the daughter cells is probably ensured by a postsegregational killing (PSK) system consisting of a typical operon with two small genes encoding a stable toxin and an unstable antitoxin [49]. PSK systems are also located on pAqui_C182 and pAqui_F126 (Table 6).
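As a consistency check, the replicon sizes can be summed and compared with the genome length. The sketch takes plasmid sizes from the kb values embedded in the plasmid designations (an assumption: these are rounded sizes, and Table 5 holds the exact values) and the 4.25 Mbp chromosome from the genome properties. Replication types are filled in only where this section states them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Replicon:
    name: str
    size_kb: int  # rounded size in kb, read from the designation (assumption)
    replication_type: Optional[str] = None  # None where the text does not state it


REPLICONS = [
    Replicon("chromosome", 4250),
    Replicon("pAqui_A249", 249),
    Replicon("pAqui_B243", 243),
    Replicon("pAqui_C182", 182),
    Replicon("pAqui_D148", 148, "RepB-I"),
    Replicon("pAqui_E140", 140, "RepA-I"),
    Replicon("pAqui_F126", 126, "DnaA-like I"),
    Replicon("pAqui_G6", 6, "RepA-II"),
]


def plasmids(replicons: list[Replicon]) -> list[Replicon]:
    return [r for r in replicons if r.name != "chromosome"]


def total_size_kb(replicons: list[Replicon]) -> int:
    return sum(r.size_kb for r in replicons)


print(len(plasmids(REPLICONS)), total_size_kb(plasmids(REPLICONS)))  # 7 1094
print(total_size_kb(REPLICONS))  # 5344, i.e. about 5.34 Mbp
```

The rounded total of 5,344 kb agrees with the 5,344,253 bp genome length given in the abstract.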
The locus tags of all replicases, plasmid stability modules and the large virB4 and virD4 genes of type IV secretion systems (T4SS) are presented in Table 6. A characteristic T4SS comprising the relaxase VirD2 and the coupling protein VirD4 as well as the complete virB gene cluster for the transmembrane channel is located on the chromosome [50]. Its functional role is unclear, but very closely related T4SS are found on plasmids of, e.g., Dinoroseobacter shibae DSM 16493T [51], Leisingera nanhaiensis DSM 24252T and Phaeobacter caeruleus DSM 24564T [52]. Furthermore, the largest plasmid, pAqui_A249, contains the complete F factor conjugative transfer (tra) region with 20 genes (Aqui_4678 to Aqui_4697). It exhibits only weak homology to the typical type IV secretion system of the Roseobacter clade, which is represented by the chromosomal counterpart, but it resembles the F sex factor of Escherichia coli, the paradigm for bacterial conjugation [53]. [55]. However, the putative functionality of the Embden-Meyerhof-Parnas pathway (glycolysis) has to be validated, e.g., via pulse-chase experiments with 13C-labeled glucose [56]. Finally, the plasmid pAqui_B243 contains the phosphoenolpyruvate synthase (Aqui_4951; EC 2.7.1.11) that is required together with the chromosomal phosphoenolpyruvate carboxylase (Aqui_0364; EC 4.1.1.31) for prokaryotic CO2 fixation and the formation of oxaloacetate from pyruvate. The 148 kb RepB-I type plasmid pAqui_D148 contains a complete rhamnose operon [50] and many genes that are required for polysaccharide biosynthesis. This extrachromosomal replicon also harbors two siderophore synthetase genes (Aqui_4320, Aqui_4321), two outer membrane receptors for Fe transport (Aqui_4319, Aqui_4360) and genes of a putative ABC-type Fe3+-siderophore transport system (Aqui_4361 to Aqui_4364).
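Gene clusters in this section are given as locus-tag ranges, sometimes in descending order (e.g. Aqui_5058 to Aqui_5055). A small hypothetical helper can expand such a range. It assumes that tags consist of a letter prefix, an underscore and a zero-padded number, and that consecutive numbers denote consecutive genes; the gene counts stated in the text are consistent with this, but it is not guaranteed for every annotation.

```python
import re

# Assumed locus-tag shape: letters, underscore, zero-padded number (e.g. Aqui_0074).
LOCUS_TAG = re.compile(r"([A-Za-z]+)_(\d+)")


def expand_locus_range(first: str, last: str) -> list[str]:
    """List all locus tags from first to last inclusive, in either direction."""
    m_first, m_last = LOCUS_TAG.fullmatch(first), LOCUS_TAG.fullmatch(last)
    if not (m_first and m_last) or m_first.group(1) != m_last.group(1):
        raise ValueError(f"incompatible locus tags: {first!r}, {last!r}")
    prefix, width = m_first.group(1), len(m_first.group(2))
    start, stop = int(m_first.group(2)), int(m_last.group(2))
    step = 1 if stop >= start else -1
    return [f"{prefix}_{n:0{width}d}" for n in range(start, stop + step, step)]


print(len(expand_locus_range("Aqui_4678", "Aqui_4697")))  # 20
print(expand_locus_range("Aqui_4361", "Aqui_4364"))
```

The first call reproduces the 20 tra genes of pAqui_A249; the second lists the four genes of the putative Fe3+-siderophore transport system.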
The 140 kb RepA-I type plasmid pAqui_E140 is dominated by genes for glycosyltransferases, polysaccharide biosynthesis and cell-wall biogenesis, and it contains an operon for GDP-mannose metabolism (Aqui_5058 to Aqui_5055). The 126 kb DnaA-like I replicon pAqui_F126 contains a large type VI secretion system (T6SS) of about 30 kb. This export system was first described in the context of bacterial pathogenesis, but recent findings indicate a more general physiological role in defense against eukaryotic cells and other bacteria in the environment [57]. Homologous T6S systems are present on the DnaA-like I plasmids of Leisingera methylohalidivorans DSM 14336T (pMeth_A285) and Phaeobacter caeruleus DSM 24564T (pCaer_C109) as well as on the RepC-8 type plasmid of Phaeobacter daeponensis DSM 23529T (pDaep_A276). Genome analysis of L. aquimarina DSM 24565T further revealed genes encoding LuxI as well as LuxR homologues, which are involved in quorum sensing (QS), an already known feature of several members of the Roseobacter clade [58]. QS is a communication system used by many bacterial species to coordinate specific behaviors according to population density [58]. Two genes encode N-acyl-L-homoserine lactone synthases (LuxI; Aqui_0074, Aqui_4264), and several genes were identified as encoding LuxR homologues (response and transcriptional regulators, e.g., Aqui_0075 and Aqui_3114). Furthermore, several genes forming a putative operon (e.g., Aqui_3422 to Aqui_3426) are involved in sulfur oxidation, indicating that thiosulfate may be oxidized to sulfate to produce energy. Additionally, genes for carbon monoxide utilization (Aqui_2391, Aqui_2392, Aqui_2518, Aqui_2520, Aqui_3522, Aqui_5216 and Aqui_5217) were observed. Interestingly, a gene encoding a blue-light sensor using FAD (BLUF, Aqui_2375) was also detected, indicating possible blue-light-dependent signal transduction.
As indicated by the 16S rRNA gene sequence analysis (Figure 1), the classification of some Leisingera and Phaeobacter species might need to be reconsidered. We conducted a preliminary phylogenomic analysis with GGDC [59-61] applied to the genome of L. aquimarina DSM 24565T and the draft genomes of the type strains of the other Leisingera and Phaeobacter species. The results shown in Table 7 indicate that the DNA-DNA hybridization (DDH) similarities calculated in silico between L. aquimarina and P. caeruleus or P. daeponensis are higher than those between L. aquimarina and L. nanhaiensis, in agreement with the 16S rRNA gene sequence analysis. Thus, a taxonomic revision of L. aquimarina might be warranted.

Table 7 (columns): Reference species; formula 1; formula 2; formula 3†
† DDH similarities were calculated in silico with the GGDC server version 2.0 [57]. The standard deviations indicate the inherent uncertainty in estimating DDH values from intergenomic distances based on models derived from empirical test data sets (which are always limited in size); see [56] for details. The distance formulas are explained in [56]. The numbers in parentheses are IMG object IDs (GenBank accession number in the case of P. gallaeciensis) identifying the underlying genome sequences.
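GGDC derives its DDH estimates from intergenomic distances computed from high-scoring segment pairs (HSPs) found in both search directions. As commonly described for GGDC, formula 1 relates the total HSP length to the total genome length, formula 2 relates the identities within HSPs to the HSP length, and formula 3 relates the identities to the total genome length. The sketch computes these as distances (1 minus each ratio, summed over both directions) under those stated definitions. The conversion of distances into DDH values relies on fitted models and is deliberately omitted, and all input numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HspSummary:
    """Matches of one genome (query) against another (subject)."""
    hsp_length: int  # total length covered by high-scoring segment pairs (H)
    identities: int  # identical base pairs within those HSPs (I)


def ggdc_distances(xy: HspSummary, yx: HspSummary, len_x: int, len_y: int) -> dict[str, float]:
    """Intergenomic distances for the three GGDC formulas (both directions pooled)."""
    h = xy.hsp_length + yx.hsp_length
    i = xy.identities + yx.identities
    if h == 0:
        raise ValueError("no HSPs found")
    genomes = len_x + len_y
    return {
        "formula 1": 1 - h / genomes,  # HSP length / total genome length
        "formula 2": 1 - i / h,  # identities / HSP length
        "formula 3": 1 - i / genomes,  # identities / total genome length
    }


# Hypothetical toy numbers, not the values behind Table 7.
d = ggdc_distances(HspSummary(800, 720), HspSummary(800, 720), 1000, 1000)
print({name: round(value, 3) for name, value in d.items()})
# {'formula 1': 0.2, 'formula 2': 0.1, 'formula 3': 0.28}
```

Formula 2 ignores how much of the genomes is covered by HSPs, which makes it less sensitive to missing sequence and is why it is often preferred when draft genomes are compared.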