Permanent draft genome sequence of Vibrio tubiashii strain NCIMB 1337 (ATCC 19106)

Vibrio tubiashii NCIMB 1337 is a major and increasingly prevalent pathogen of bivalve mollusks, and shares a close phylogenetic relationship with both V. orientalis and V. coralliilyticus. It is a Gram-negative, curved rod-shaped bacterium, originally isolated from a moribund juvenile oyster, and is both oxidase and catalase positive. It is capable of growth under both aerobic and anaerobic conditions. Here we describe the features of this organism, together with the draft genome and annotation. The genome is 5,353,266 bp long, consisting of two chromosomes, and contains 4,864 protein-coding and 86 RNA genes.


Introduction
The genus Vibrio is species-rich and ubiquitous in marine environments, and Vibrio species are associated with many diverse marine organisms, including mollusks, shrimps, fishes, cephalopods and corals [1]. Comparative genome analysis has revealed extensive genetic diversity within the genus, driven by mutation, chromosomal rearrangement, gene loss through decay or deletion, and gene acquisition through duplication or horizontal transfer (e.g. acquisition of bacteriophages, pathogenicity islands and superintegrons). Together these processes presumably generate the genetic and functional diversity that allows this group to colonize a wide variety of ecological niches and hosts [1,2]. Vibrio tubiashii was first described by Tubiash et al. [3] in 1965 as three strains of Vibrio anguillarum, isolated from bivalve mollusks during an outbreak of bacillary necrosis in Milford, Connecticut, and deposited in the American Type Culture Collection as ATCC 19105, 19106 and 19109. These three strains were further characterized and formally named V. tubiashii by Hada et al. [4] in 1984. Since then, several virulence factors have been identified [5,6], and the organism is increasingly implicated in major disease outbreaks among bivalve mollusks [1].
V. tubiashii is closely related to the proposed coral pathogen V. coralliilyticus, as well as V. orientalis, a bacterium associated with penaeid shrimps [7]. Indeed, V. coralliilyticus was initially designated as a V. tubiashii strain [8,9] due to their close similarity.

Classification and features
Vibrio tubiashii NCIMB 1337 belongs to the class Gammaproteobacteria and the family Vibrionaceae (Table 1). Cells of V. tubiashii are Gram-negative curved rods of approximately 0.5 × 1.5 µm that are motile in liquid media by means of a single sheathed polar flagellum [3,4]. The cells are facultative anaerobes [3,4,22]. The organism is catalase and oxidase positive, produces indole from tryptophan, and can use glucose, xylose, mannitol, rhamnose, sucrose, arabinose and acetate as sole carbon sources. It has β-galactosidase activity despite an apparent inability to ferment lactose. V. tubiashii is capable of dissimilatory nitrate and nitrite reduction under anaerobic conditions, can use organic phosphorus during phosphate limitation, and can use 2-aminoethylphosphonate as its sole phosphorus source.
V. tubiashii has an absolute requirement for sodium and chloride ions and does not grow on media containing less than 0.5% (w/v) NaCl. The optimum growth temperature is 25 °C, and growth occurs between 12 and 30 °C; the organism is killed at 37 °C. V. tubiashii shows a biphasic pH response, growing optimally at both pH 6.5 and pH 8.0 but more weakly at pH 7.0 and 7.5. The bacterium grows rapidly in marine broth and produces buff-colored, opaque, irregular, slightly convex colonies on marine agar, and yellow (sucrose-fermenting) colonies on thiosulfate-citrate-bile salts-sucrose (TCBS) agar.

Growth conditions and DNA isolation
Vibrio tubiashii NCIMB 1337 (ATCC 19106) was grown in marine broth (seawater with 1 g l⁻¹ yeast extract and 0.5 g l⁻¹ tryptone) at 25 °C for 24 h. DNA was extracted using the Qiagen DNeasy Blood & Tissue Kit, following the manufacturer's protocol without modification.

Genome sequencing
The genome was sequenced on the Illumina platform. General details of library construction and sequencing at the NERC Biomolecular Analysis Facility (NBAF) are available on the NBAF website [23]. Illumina (Solexa) reads were assembled using Velvet. Large Newbler contigs were broken into 4,074 overlapping fragments of 1,000 bp and entered into the assembly as pseudo-reads. The sequences were assigned quality scores based on consensus q-scores, with modifications to account for overlap redundancy and to adjust inflated q-scores. The estimated error rate of the assembled sequence is less than 1 in 100,000. Overall, sequencing provided 131× coverage of the genome.
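The numbers in this paragraph are linked by simple arithmetic. Assuming coverage is defined as total sequenced bases divided by assembly size, 131× coverage of 5,353,266 bp implies roughly 701 Mb of sequence. Assuming the standard Phred definition, an error rate below 1 in 100,000 corresponds to a quality above Q50, and to roughly 54 or fewer erroneous bases across the whole assembly. The sketch below reproduces these figures and also shows one way to shred a contig into overlapping 1,000 bp pseudo-reads; the function names and the 500 bp stride are hypothetical, since the overlap actually used is not reported.

```python
from __future__ import annotations

import math


def shred_contig(seq: str, fragment_len: int = 1000, step: int = 500) -> list[str]:
    """Split a contig into overlapping fixed-length fragments (pseudo-reads).

    `step` (the stride between fragment starts) is a hypothetical value;
    the text does not state how much adjacent fragments overlapped.
    Contigs no longer than `fragment_len` are returned whole.
    """
    if len(seq) <= fragment_len:
        return [seq]
    starts = list(range(0, len(seq) - fragment_len + 1, step))
    # Add a final fragment flush with the contig end if the stride missed it.
    if starts[-1] + fragment_len < len(seq):
        starts.append(len(seq) - fragment_len)
    return [seq[s:s + fragment_len] for s in starts]


def phred_from_error_rate(p: float) -> float:
    """Phred-scaled quality, Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def expected_errors(genome_len: int, error_rate: float) -> float:
    """Expected number of base errors for a given per-base error rate."""
    return genome_len * error_rate


GENOME_BP = 5_353_266  # assembly size reported in the text
COVERAGE = 131         # reported fold coverage
MAX_ERROR_RATE = 1e-5  # "less than 1 in 100,000"

print(f"Implied sequence yield: {GENOME_BP * COVERAGE:,} bp")
print(f"Phred Q at 1e-5: {phred_from_error_rate(MAX_ERROR_RATE):.0f}")
print(f"Upper bound on errors: {expected_errors(GENOME_BP, MAX_ERROR_RATE):.1f}")
```

For example, `shred_contig("A" * 2300)` returns four 1,000 bp fragments starting at positions 0, 500, 1,000 and 1,300; the extra end-anchored fragment ensures the last bases of each contig are still represented.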

Genome annotation
Genes were identified using the RAST server. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFAM, Pfam, PRIAM, KEGG, COG and InterPro databases. tRNA genes were identified with tRNAscan-SE [24], and ribosomal RNAs with BLASTn searches against ribosomal RNA databases. The RNA components of the protein secretion complex and RNase P were identified by searching the genome with the corresponding Rfam profiles using INFERNAL [25]. Additional gene prediction and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform developed by the Joint Genome Institute, Walnut Creek, CA, USA [26,27].

Table 1 evidence codes: IDA, Inferred from Direct Assay (first time in publication); TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [21]. If the evidence code is IDA, the property was directly observed for a live isolate by one of the authors or by an expert or reputable institution mentioned in the acknowledgements.

Genome project information
This organism was selected for sequencing because of its increasing impact as a bivalve pathogen, and sequencing was funded by i-G Peninsula. The genome project is deposited in the IMG database and the draft genome sequence in GenBank (CP001643). Sequencing, finishing and annotation were performed by the GenePool Team at the NERC Biomolecular Analysis Facility (NBAF), Edinburgh.
A summary of the project information is shown in Table 2.

Genomic properties
The draft genome was assembled into 335 contigs and comprises two circular chromosomes with a combined size of 5,353,266 bp (44.84% GC content). A total of 4,950 genes were predicted, of which 4,864 are protein-coding genes. Of the protein-coding genes, 74.22% were assigned a putative function; the remainder were annotated as hypothetical proteins. A total of 658 protein-coding genes belong to paralogous families, corresponding to a gene content redundancy of 13.29%. The properties and statistics of the genome are summarized in Tables 3-5.

Table 3 notes: a) The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome. b) Also includes 54 pseudogenes and 5 other genes.
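Several of these figures can be cross-checked. The 86 RNA genes reported in the abstract equal 4,950 minus 4,864, and the 13.29% redundancy is reproduced when the 658 paralogous genes are divided by all 4,950 predicted genes (dividing by the 4,864 protein-coding genes would instead give 13.53%). The sketch below does this arithmetic and adds a minimal GC-content function for illustration; the function names are hypothetical, and excluding ambiguous bases such as N from the denominator is an assumption, not necessarily how the 44.84% figure was obtained.

```python
def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def percent(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as in the genome statistics."""
    return round(100.0 * part / whole, 2)


TOTAL_GENES = 4_950
PROTEIN_CODING = 4_864
PARALOGOUS = 658

rna_genes = TOTAL_GENES - PROTEIN_CODING       # 86, as stated in the abstract
redundancy = percent(PARALOGOUS, TOTAL_GENES)  # 13.29, as stated in the text
print(rna_genes, redundancy, percent(PARALOGOUS, PROTEIN_CODING))
```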

Genomic comparison
Based on COG assignments, the Vibrio tubiashii genome is most similar to the genomes of V. coralliilyticus (R² = 0.96) and V. orientalis (R² = 0.94), and less similar to that of V. shilonii (R² = 0.86) (Table 6). This contrasts with the 16S rRNA-based analysis shown in Figure 1. However, 16S rRNA analysis often discriminates poorly among vibrios because of the low sequence heterogeneity of the 16S rRNA gene [28].
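The text does not say exactly how R² was calculated. One common approach, assumed here, is to count the genes assigned to each COG functional category in each genome and take the squared Pearson correlation between the two count vectors. The sketch below implements that calculation; the function name and the count vectors are hypothetical and are not the values behind Table 6.

```python
from __future__ import annotations


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two equal-length count vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical gene counts per COG functional category for two genomes.
# Illustrative numbers only, NOT data from this genome project.
genome_a = [180, 390, 200, 45, 260, 300]
genome_b = [175, 410, 190, 50, 240, 320]
print(f"R^2 = {r_squared(genome_a, genome_b):.2f}")
```

Because Pearson correlation is invariant to scaling, two genomes with the same relative COG profile score R² = 1 even if one has many more genes; this makes the measure compare profile shape rather than genome size.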

Regulatory systems
The Vibrio tubiashii NCIMB 1337 genome contains multiple quorum-sensing systems, most notably a luxM/luxN system with two adjacent copies of the luxN gene. There is also a luxS/luxPQ system, in which luxP and luxQ are adjacent, and a cqsA/cqsS system. These three systems probably converge on the phosphorelay encoded by luxU and luxO and its downstream target, hapR. Two additional lux genes, luxT and luxZ, are also present. The genome also contains rpoN, encoding the sigma-54 factor, which may indicate the presence of the two-component phosphorylation-dephosphorylation cascade described in V. harveyi [29] (Vibrio harveyi is also known as Lucibacterium harveyi and Beneckea harveyi).
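The proposed convergence can be pictured with a simplified Boolean model of the V. harveyi-type circuit. The receptor/synthase pairs (LuxN/LuxM, LuxPQ/LuxS, CqsS/CqsA) are taken from the text. Everything else is an assumption of this sketch rather than a finding for this strain: that unbound receptors act as kinases feeding phosphate through LuxU to LuxO, that phospho-LuxO with sigma-54 activates small RNAs repressing hapR, and that any single unbound receptor keeps LuxO phosphorylated. In reality the signals are integrated in a graded way, and the presence of the small-RNA genes in this genome is not stated.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Receptor:
    name: str      # sensor kinase, e.g. LuxN
    synthase: str  # enzyme producing its autoinducer, e.g. LuxM
    bound: bool    # True when the autoinducer has accumulated (high cell density)


def hapr_expressed(receptors: list[Receptor]) -> bool:
    """Boolean sketch of quorum-sensing convergence on LuxU/LuxO.

    Assumption: any unbound receptor acts as a kinase and keeps LuxO
    phosphorylated; phospho-LuxO (with sigma-54) then represses hapR.
    Only when every receptor is bound is LuxO dephosphorylated.
    """
    luxo_phosphorylated = any(not r.bound for r in receptors)
    return not luxo_phosphorylated


def receptors_at(high_density: bool) -> list[Receptor]:
    """Hypothetical helper: all three receptors share one density state."""
    return [
        Receptor("LuxN", "LuxM", high_density),
        Receptor("LuxPQ", "LuxS", high_density),
        Receptor("CqsS", "CqsA", high_density),
    ]


print("low density, hapR expressed:", hapr_expressed(receptors_at(False)))
print("high density, hapR expressed:", hapr_expressed(receptors_at(True)))
```

Modeling the integration as an all-or-nothing AND of bound receptors keeps the sketch short; a more faithful model would let each bound receptor reduce LuxO phosphorylation incrementally.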

Antibiotic resistance
The genome contains six genes encoding putative β-lactamases, but only two show protein-level homology to any known Vibrio β-lactamase. There is also a multiple antibiotic resistance protein, MarC, associated with an operon containing a variety of multidrug resistance proteins. This operon is controlled by a MerR-type transcriptional regulator, a class often associated with antibiotic resistance [30], and may account for the kanamycin resistance that we observed in this strain.