Genome of the R-body producing marine alphaproteobacterium Labrenzia alexandrii type strain (DFL-11T)

Labrenzia alexandrii Biebl et al. 2007 is a marine member of the family Rhodobacteraceae in the order Rhodobacterales, which has thus far only partially been characterized at the genome level. The bacterium is of interest because it lives in close association with the toxic dinoflagellate Alexandrium lusitanicum. Ultrastructural analysis reveals R-bodies within the bacterial cells, which are primarily known from obligate endosymbionts that trigger “killing traits” in ciliates (Paramecium spp.). Genomic traits of L. alexandrii DFL-11T are in accordance with these findings, as they include the reb genes putatively involved in R-body synthesis. Analysis of the two extrachromosomal elements suggests a role in heavy-metal resistance and exopolysaccharide formation, respectively. The 5,461,856 bp long genome with its 5,071 protein-coding and 73 RNA genes consists of one chromosome and two plasmids, and has been sequenced in the context of the Marine Microbial Initiative.


Introduction
Strain DFL-11T (= DSM 17067 = NCIMB 14079) is the type strain of Labrenzia alexandrii, a marine member of the Rhodobacteraceae (Rhodobacterales, Alphaproteobacteria) [1]. Strain DFL-11T was isolated from single cells of a culture of the toxic dinoflagellate Alexandrium lusitanicum maintained at the Biological Research Institute of Helgoland, Germany [1]. L. alexandrii is the type species of the genus Labrenzia, which currently also harbors three species (L. aggregata, L. alba and L. marina) that were previously classified in the genus Stappia [1]. Biebl et al. 2007 [1] did not provide a formal assignment of the genus Labrenzia to a family, but their phylogenetic analysis placed Labrenzia with high support within a clade also comprising Nesiotobacter, Pannonibacter, Pseudovibrio, Roseibium and Stappia, genera which at that time were either not formally assigned to a family or assigned to Rhodobacteraceae [2]. Other analyses [3] indicate that the entire clade should not be placed within Rhodobacteraceae, but an alternative taxonomic arrangement has, to the best of our knowledge, not yet been published. Here we present a summary classification and a set of features for L. alexandrii DFL-11T, including previously undescribed aspects of its ultrastructure and physiology, together with the description of the high-quality permanent draft genome sequence and annotation.
This work is part of the Marine Microbial Initiative (MMI) which enabled the J. Craig Venter Institute (JCVI) to sequence the genomes of approximately 165 marine microbes with funding from the Gordon and Betty Moore Foundation. These microbes were contributed by collaborators worldwide, and represent an array of physiological diversity, including carbon fixation, photoautotrophy, photoheterotrophy, nitrification, and methanotrophy. The MMI was designed to complement other ongoing research at JCVI and elsewhere to characterize the microbial biodiversity of marine and terrestrial environments through metagenomic profiling of environmental samples. Standards in Genomic Sciences

Classification and features
16S rRNA analysis
A representative genomic 16S rRNA sequence of strain DFL-11T was compared using NCBI BLAST [4,5] under default settings (e.g., considering only the high-scoring segment pairs (HSPs) from the best 250 hits) with the most recent release of the Greengenes database [6], and the relative frequencies of taxa and keywords (reduced to their stem [7]) were determined, weighted by BLAST scores. The most frequently occurring genera were Stappia (36.9%), Pannonibacter (19.6%), Pseudovibrio (18.8%), Labrenzia (10.8%) and Achromobacter (5.0%) (98 hits in total). Regarding the seven hits to sequences from other members of the genus, the average identity within HSPs was 97.3%, whereas the average coverage by HSPs was 96.4%. Among all other species, the one yielding the highest score was Stappia alba (AJ889010) (since 2007 reclassified as L. alba [1]), which corresponded to an identity of 98.2% and an HSP coverage of 99.9%. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was AY701471 (Greengenes short name 'dinoflagellate symbiont clone GCDE08 W'), which showed an identity of 99.8% and an HSP coverage of 99.6%. The most frequently occurring keywords within the labels of all environmental samples which yielded hits were 'coral' (5.4%), 'microbi' (3.2%), 'marin' (3.0%), 'diseas' (2.8%) and 'healthi' (2.8%) (150 hits in total). The most frequently occurring keywords within the labels of those environmental samples which yielded hits of a higher score than the highest scoring species were 'coral' (11.1%), 'dinoflagel, symbiont' (5.7%), 'aquarium, caribbean, chang, dai, diseaseinduc, faveolata, kept, montastraea, plagu, white' (5.6%) and 'habitat, microbi, provid, threaten' (5.5%) (4 hits in total). These terms partially correspond with the known ecology of L. alexandrii. Figure 1 shows the phylogenetic neighborhood of L. alexandrii in a 16S rRNA based tree. The sequences of the three identical 16S rRNA gene copies in the genome do not differ from the previously published 16S rRNA sequence (AJ582083).
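The score-weighted relative frequencies described above can be illustrated with a minimal Python sketch. It assumes that each hit contributes its BLAST score to the total of its genus (or keyword stem) and that the totals are then normalized to percentages; the exact weighting used in the original analysis is not specified beyond "weighted by BLAST scores". The function name and the example hits are hypothetical and are not data from this study.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return {label: percent}, each hit weighted by its BLAST score.

    `hits` is an iterable of (label, score) pairs, where the label is a
    genus name or a stemmed keyword. This is an assumed weighting scheme.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * s / grand_total for label, s in totals.items()}


# Hypothetical hits (genus, bit score); invented for illustration only.
example_hits = [
    ("Stappia", 2400.0),
    ("Stappia", 2300.0),
    ("Pannonibacter", 2350.0),
    ("Labrenzia", 2450.0),
]
freqs = weighted_frequencies(example_hits)
for genus, pct in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{genus}: {pct:.1f}%")
```

Under this scheme a genus with many moderately scoring hits can outrank the genus of the query itself, which is one plausible reason why Stappia ranks above Labrenzia in the list above.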

Morphology and physiology
The rod-shaped cells of strain DFL-11T are 0.5 to 0.7 μm in width and 0.9 to 3.0 μm long with often unequal ends (Table 1 and Figure 2A), suggesting a polar mode of cell division which is increasingly being discovered in Alphaproteobacteria and thought to be ancient [23]. Motility is present by means of a single subpolar flagellum [1]. Star-shaped aggregated clusters occur [1]. The colonies exhibit a beige to slightly pink color [1]. Strain DFL-11T has a chemotrophic lifestyle; no fermentation occurs under aerobic or anaerobic conditions [1]. Optimal growth occurs in the presence of 1-10% NaCl and pH 7.0-8.5 at 26°C, whereas no growth occurs in the absence of NaCl or of biotin and thiamine as growth factors [1]. Several organic acids like acetate, butyrate, malate and citrate as well as glucose and fructose are metabolized, but methanol, ethanol and glycerol are not used for growth [1]. Whereas gelatin is hydrolyzed by the cells, starch is not; nitrate is not reduced [1]. The strain shows a weak resistance to potassium tellurite [1].
The utilization of carbon compounds by L. alexandrii DSM 17067T was also determined for this study using PM01 microplates in an OmniLog phenotyping device (BIOLOG Inc., Hayward, CA, USA). The microplates were inoculated at 28°C with a cell suspension at a cell density of approximately 85% turbidity and with dye D. Further additives were artificial sea salts, vitamins, trace elements and NaHCO3. The exported measurement data were further analyzed with the opm package for R [24], using its functionality for statistically estimating parameters from the respiration curves, such as the maximum height, and automatically translating these values into negative, ambiguous, and positive reactions. The strain was studied in six independent biological replicates, and reactions that behaved differently between the replicates were regarded as ambiguous and are not listed below.
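The replicate rule described above (a reaction is reported only if all replicates agree) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the opm workflow: the fixed cut-offs `neg_below` and `pos_above` are hypothetical stand-ins for whatever discretization the package applies, and the function names are invented here.

```python
def discretize(max_height, neg_below, pos_above):
    """Translate one estimated curve maximum into a three-state call.

    The two cut-offs are hypothetical placeholders for illustration.
    """
    if max_height < neg_below:
        return "negative"
    if max_height > pos_above:
        return "positive"
    return "ambiguous"


def consensus(replicate_heights, neg_below=100.0, pos_above=150.0):
    """Return the shared call if all replicates agree, else 'ambiguous'."""
    calls = {discretize(h, neg_below, pos_above) for h in replicate_heights}
    return calls.pop() if len(calls) == 1 else "ambiguous"


# Six hypothetical replicate maxima per substrate (arbitrary units).
print(consensus([200, 210, 190, 220, 205, 198]))  # all above the upper cut-off
print(consensus([200, 50, 190, 220, 205, 198]))   # one replicate disagrees
```

With this rule a single deviating replicate is enough to exclude a substrate from the reported lists, which makes the reported positive and negative reactions conservative.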
TEM analysis showed that individual cells of strain DFL-11T, assembled in clusters, contained refractile inclusion bodies, known as R-bodies [26,27], when plate-grown bacteria were embedded as microcolonies of different growth states. R-bodies are highly insoluble protein ribbons coiled to form a hollow cylinder within the cytoplasm of the bacterial cells [26,27]. In strain DFL-11T these unusual structures were generally observed in cell remnants, which contained only small amounts of cytoplasmic material (Figure 2A). They were built mainly as five- to six-layered spirals and often had a loose electron-dense, amorphous matrix. In concentric cross- or longitudinal sections the individual layers appeared to be composed of an electron-dense dark and an electron-translucent bright layer; each doublet was found to have an average thickness of 10.1 nm (standard deviation: 0.7 nm; N = 16), ranging from a minimum of 8.7 nm to a maximum of 11.9 nm. The overall diameter of the R-bodies ranged from 183 nm to 242 nm, which is in good accordance with the dimensions of furled R-body ribbons reviewed in [27].
To date only a few bacterial species are known to produce R-bodies [26,27]. They were first described in members of the genus 'Caedibacter'. These bacteria live as obligate endosymbionts in Paramecium species and confer the so-called "killer trait" to their hosts: "killer-phenotype" paramecia release 'Caedibacter' cells via their cytopyge into the environment and these kill sensitive paramecia (i.e. 'Caedibacter'-free ciliates) after being ingested. The toxic effect of 'Caedibacter' is strictly correlated with R-body synthesis. Once incorporated into sensitive paramecia, the R-body extrudes in a telescopic fashion, thereby disrupting the bacterial cell. Cellular components are subsequently released into the cytoplasm of Paramecium, finally causing the ciliate's death. It has been proposed that a lethal toxin is involved in this process, but it has not been identified so far [28]. Interestingly, a phylogenetic study based on comparative 16S rRNA gene sequencing revealed that 'Caedibacter' is a polyphyletic assemblage, comprising Gammaproteobacteria related to Francisella tularensis as well as Alphaproteobacteria affiliated with Rickettsiales (including the obligate Paramecium endosymbiont 'Holospora') [29]. In addition to the obligate endosymbionts, some free-living bacteria, i.e. Hydrogenophaga taeniospiralis, Acidovorax avenae subsp.

Genome sequencing and annotation
Genome project history
The genome was sequenced within the MMI supported by the Gordon and Betty Moore Foundation. Initial sequencing was performed by the JCVI (Rockville, MD, USA) and a high-quality draft sequence was deposited at INSDC. The number of scaffolds and contigs was reduced and the assembly improved by a subsequent round of manual gap closure at HZI/DSMZ. A summary of the project information is shown in Table 2.

Chemotaxonomy
Ubiquinone 10 was found as the single respiratory lipoquinone, which is a common feature in most Alphaproteobacteria. The spectrum of polar lipids consists of phosphatidylglycerol, diphosphatidylglycerol, phosphatidylethanolamine, phosphatidylcholine, phosphatidylmonomethylethanolamine, sulphoquinovosyldiacylglyceride, as well as an unidentified aminolipid [1]. The fatty acid spectrum is dominated by C18:1ω7 (71%) and complemented by C20:1ω7 (9.1%), C18:0 (6.5%), 11-methyl C18:1ω6t (3.7%) and the hydroxy fatty acids C14:0 3-OH (3.4%) and C16:0 3-OH (1.5%), as well as traces of C18:1ω9 and cyclo C21:0 [1]. The presence of photosynthetic pigments was tested in [1], and the absorption spectrum of the acetone/methanol extract showed that bacteriochlorophyll a was present at low concentrations. Additional peaks at 420 and 550 nm indicated the presence of a further photosynthetic pigment, most probably an as yet unidentified carotenoid.

Growth conditions and DNA extractions
A culture of DSM 17067 was grown for two to three days on an LB & sea-salt agar plate, containing (per liter) 10 g tryptone, 5 g yeast extract, 10 g NaCl, 17 g sea salt (Sigma-Aldrich S9883) and 15 g agar. A single colony was used to inoculate LB & sea-salt liquid medium and the culture was incubated at 28°C on a shaking platform. The genomic DNA was isolated using the Qiagen Genomic 500 DNA Kit (Qiagen 10262) as indicated by the manufacturer. DNA quality and quantity were in accordance with the instructions of the genome sequencing center.

Figure 1. Phylogenetic tree highlighting the position of L. alexandrii relative to the type strains of the species of selected genera (see [1,3] and the results of the Greengenes database search described above) within the family Rhodobacteraceae. These genera form a clade [1,3], but it might be better not to place them in this family [3]. The tree was inferred from 1,366 aligned characters [8,9] of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion [10] and rooted with Pseudovibrio. The branches are scaled in terms of the expected number of substitutions per site (see size bar). Numbers adjacent to the branches are support values from 1,000 ML bootstrap replicates [11] (left) and from 1,000 maximum-parsimony bootstrap replicates [12] (right) if larger than 60%. Lineages with type-strain genome sequencing projects registered in GOLD [13] are labeled with one asterisk.

Evidence codes: TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). Evidence codes are from the Gene Ontology project [22].

Genome sequencing and assembly
The genome was sequenced with the Sanger technology using a combination of two libraries. All general aspects of library construction and sequencing performed at the JCVI can be found on the JCVI website. Base calling of the sequences was performed with the phredPhrap script using default settings. The reads were assembled using the phred/phrap/consed pipeline [31]. The last gaps were closed by adding new reads produced by recombinant PCR and PCR primer walks. In total, 21 reads were required for gap closure and improvement of low-quality regions. The final consensus sequence was built from 60,668 Sanger reads (9.1 × coverage).
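The relation between the read count and the stated coverage can be made explicit with a short calculation. The sketch below assumes the usual definition of coverage (total sequenced bases divided by genome length) and uses the total genome size reported in this paper; the mean read length it returns is an implied estimate only, since the read length is not stated in the text and the coverage figure is rounded.

```python
GENOME_SIZE_BP = 5_461_856  # total genome size reported in this paper
N_READS = 60_668            # Sanger reads in the final consensus
COVERAGE = 9.1              # reported fold coverage (rounded)


def implied_mean_read_length(genome_bp, n_reads, coverage):
    """Solve coverage = n_reads * mean_length / genome_bp for mean_length."""
    return coverage * genome_bp / n_reads


mean_len = implied_mean_read_length(GENOME_SIZE_BP, N_READS, COVERAGE)
print(f"Implied mean read length: {mean_len:.0f} bp")
```

The result, roughly 800 bp per read, is in the range expected for Sanger sequencing, so the two reported figures are mutually consistent.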

Genome annotation
Gene prediction was carried out using GeneMark as part of the genome annotation pipeline in the Integrated Microbial Genomes Expert Review (IMG-ER) system [32]. To identify coding genes, Prodigal [33] was used, while ribosomal RNA genes within the genome were identified using the tool RNAmmer [34]. Other non-coding genes were predicted using Infernal [35]. Manual functional annotation was performed within the IMG platform [32] and the Artemis Genome Browser [36].

Genome properties
The genome statistics are provided in Table 3 and Figures 3a, 3b and 3c. The genome consists of a 5,299,280 bp long chromosome and two plasmids of 68,647 bp and 93,929 bp, respectively, with a G+C content of 56.4%. Of the 5,144 genes predicted, 5,071 were protein-coding genes and 73 were RNA genes; pseudogenes were not identified. The majority of the protein-coding genes (81.0%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
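The figures above can be cross-checked with simple arithmetic. The following sketch only recombines numbers stated in the text (replicon sizes, gene counts and the 81.0% share of genes with a putative function); the function and key names are illustrative, and the derived values are estimates rather than figures taken from Table 3.

```python
CHROMOSOME_BP = 5_299_280
PLASMIDS_BP = [68_647, 93_929]
PROTEIN_CODING = 5_071
RNA_GENES = 73
FRACTION_WITH_FUNCTION = 0.810  # share of protein-coding genes, from the text


def genome_summary():
    """Recombine the reported numbers into totals and derived shares."""
    total_bp = CHROMOSOME_BP + sum(PLASMIDS_BP)
    return {
        "total_bp": total_bp,
        "total_genes": PROTEIN_CODING + RNA_GENES,
        "plasmid_share_percent": 100.0 * sum(PLASMIDS_BP) / total_bp,
        "genes_with_function": round(FRACTION_WITH_FUNCTION * PROTEIN_CODING),
    }


summary = genome_summary()
for key, value in summary.items():
    print(key, value)
```

The replicon sizes sum to the 5,461,856 bp given in the abstract, and the two gene classes sum to the 5,144 predicted genes, so the reported statistics are internally consistent.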

Insights into the genome
R-body genes
In 'Caedibacter taeniospiralis', three genes (rebA, rebB and rebC) were identified as determinants of R-body production. They are clustered on large plasmids, ranging from 41 to 49 kb, and encompass 345 bp, 318 bp and 171 bp (accession number U04524), respectively. The corresponding proteins RebA (114 aa, 18 kDa), RebB (105 aa, 13 kDa) and RebC (56 aa, 10 kDa) are necessary to assemble R-bodies through polymerization processes [37]. Furthermore, a putative fourth gene, rebD (249 bp; RebD, 82 aa), is located between rebB and rebC and might be involved in R-body formation. Based on high sequence similarities to the C. taeniospiralis R-body protein RebB, three homologues (ladfl_00000850, ladfl_00000900 and ladfl_00000910) were detected on the chromosome of strain DFL-11T. Their amino acid sequence lengths are 122 aa, 109 aa and 76 aa, respectively, which is in accordance with R-body proteins found in C. taeniospiralis 47, and they were all assigned to the Pfam family RebB (PF11747). The chromosomal arrangement of R-body genes in strain DFL-11T is not contiguous; ladfl_0000085 is separated from ladfl_0000090 and ladfl_0000091 by four hypothetical genes (ladfl_0000086 - ladfl_0000089). Interestingly, a putative alternative sigma-factor of the ECF subfamily (ladfl_0000084, upstream of ladfl_0000085) flanks the R-body gene cluster, indicating that reb gene expression in strain DFL-11T is regulated by extracytoplasmic stimuli. Gene arrangements orthologous to the L. alexandrii DFL-11T reb gene cluster were found in the alphaproteobacteria Roseibium sp. TrichSKD4 (NZ_GL47637) and Polymorphum gilvum (NC_015259), organisms which are closely related to L. alexandrii [38].

[39]. Pulsed-field gel electrophoresis (PFGE) showed faint bands with estimated sizes of 88 kb and 65 kb, and their circular conformation has been documented by comparative analyses with distinct PFGE parameters.
An additional linear fragment of about 35 kb, which has not been recovered by genome sequencing, may represent a prophage (see below) whose excision from the genome depends on the cultivation conditions. Both plasmids represent RepABC-type replicons with the partitioning genes repA and repB as well as the replicase repC, which are located in a typical operon [40]. Phylogenetic analyses of the replicases provide the basis for the classification of alphaproteobacterial plasmids [41]. The respective phylogeny of both RepC sequences from L. alexandrii DSM 17067T (ladfl_05027, ladfl_05140) documents a close affiliation with rhizobial genes to the exclusion of sequences from Rhodobacterales, which are located in distinct subtrees (data not shown [42]). Both plasmids seem to be equipped with characteristic post-segregational killing systems consisting of a toxin/antitoxin operon that prevents plasmid loss (ladfl_05100/ladfl_05101, ladfl_05128/ladfl_05129 [43]). Plasmid LADFL_5 contains several genes that are related to heavy-metal resistance [44], and eight of them are related to the COG category "Inorganic ion transport and metabolism" (see also Table 4). This set includes the mer operon composed of merR, merT, merF and the mercuric reductase gene merA, which are part of the mercury-resistance system of Gram-negative bacteria [45]. This plasmid also harbors a predicted P-type ATPase translocating heavy-metal ions and components of a Cd2+, Zn2+ or Co2+ efflux system. The resistance to a wide range of heavy-metal ions may enable the strain to dwell in polluted environments [44]. The second conspicuous trait of LADFL_5 is the presence of a complete type-IV secretion system (T4SS [46]). The virB operon (ladfl_05033 to ladfl_05043) is required for the formation of a functional transmembrane channel and pilus formation. Moreover, the virD gene cluster including the characteristic DNA relaxase (ladfl_05091) and the coupling protein VirD4 (ladfl_05093) indicates that the T4SS machinery represents a functional conjugation system. The lysozyme TraH_2 (ladfl_05088), which is required for the degradation of the peptidoglycan cell wall and transmembrane channel formation, is annotated as a protein specific to Rhizobiales, an affiliation that is in agreement with the outcome of the phylogenetic RepC analysis [42].

Plasmid LADFL_6 is dominated by more than a dozen genes that are involved in sugar metabolism. It contains the complete operon for the conversion of glucose-1-phosphate into dTDP-L-rhamnose (rmlC, rmlD, rmlA, rmlB), which is a common component of the cell wall and capsule of many pathogenic bacteria [47]. Three glycosyltransferases, some components of an ABC-type polysaccharide transport system, as well as a sugar transferase for lipopolysaccharide synthesis and a lipid A core O-antigen ligase (ladfl_05144, ladfl_05145) are indicative of a functional role of the plasmid in exopolysaccharide formation. Extracellular polysaccharides of the Sym plasmid are required for root hair attachment in Rhizobium leguminosarum [48], and the plasmid LADFL_6 may also be required for biofilm generation. This prediction is compatible with the origin of strain DFL-11T, which has been isolated from the dinoflagellate A. lusitanicum [1].