Complete genome sequence of the halophilic and highly halotolerant Chromohalobacter salexigens type strain (1H11T)

Chromohalobacter salexigens is one of nine currently known species of the genus Chromohalobacter in the family Halomonadaceae. It is the most halotolerant of the so-called ‘moderately halophilic bacteria’ currently known and, due to its strong euryhaline phenotype, it is an established model organism for prokaryotic osmoadaptation. C. salexigens strain 1H11T and Halomonas elongata are the first and the second members of the family Halomonadaceae with a completely sequenced genome. The 3,696,649 bp long chromosome with a total of 3,319 protein-coding and 93 RNA genes was sequenced as part of the DOE Joint Genome Institute Program DOEM 2004.


Introduction
Strain 1H11T (= DSM 3043 = ATCC BAA-138 = CECT 5384) is the type strain of the species Chromohalobacter salexigens [1], which is one of currently nine species in the genus Chromohalobacter [1,2]. The genus name was derived from the Greek words chroma, color, and hals halos, salt, and the Neo-Latin bacter, rod, meaning the colored salt rod. The species epithet originated from the Latin words sal salis, salt, and exigo, to demand, meaning salt-demanding [3]. Strain 1H11T was originally isolated in 1974 in Bonaire, Netherlands Antilles, from salterns containing 18.6% salt, and was initially published as a strain belonging to the species Halomonas elongata [4]. In 2001, Arahal et al. transferred the strain to the genus Chromohalobacter [2] as the type strain of the then novel species C. salexigens [1] following detailed phenotypic, genotypic, and phylogenetic analyses. C. salexigens is known for its very broad salinity range [1] and for its role as a model organism for prokaryotic osmoadaptation [5-7], e.g. the synthesis of ectoines (ectoine and hydroxyectoine) for cell stress protection [8,9]. Here we present a summary of the classification and characteristics of C. salexigens 1H11T, together with the description of the complete genomic sequencing and annotation.

Classification and features
The sequences of the five identical 16S rRNA genes of strain 1H11T were compared using NCBI BLAST [10] under default settings (e.g., considering only the high-scoring segment pairs (HSPs) from the best 250 hits) with the most recent release of the Greengenes database [11]. The relative frequencies of taxa and keywords (reduced to their stem [12]) were then determined and weighted by BLAST scores.
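The score weighting described above can be illustrated with a minimal sketch. It assumes that each hit contributes its BLAST score to the total of its taxon (or keyword) and that the totals are then normalized to percentages; the exact weighting scheme used in the analysis is not specified here, and the function name and example hits are hypothetical.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return {label: percent}, with each hit weighted by its BLAST score.

    `hits` is an iterable of (label, score) pairs. This is an assumed,
    simplified weighting scheme, not necessarily the one used in the study.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * s / grand_total for label, s in totals.items()}


# Hypothetical hits (genus, BLAST score); these are not data from this study.
example_hits = [
    ("Halomonas", 2000.0),
    ("Chromohalobacter", 1500.0),
    ("Halomonas", 500.0),
]
frequencies = weighted_frequencies(example_hits)
```

With these made-up scores, a genus with many low-scoring hits can still rank below a genus with fewer high-scoring hits, which is the point of weighting by score rather than counting hits.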
The most frequently occurring genera were Halomonas (50.7%), Chromohalobacter (46.3%), 'Haererehalobacter' (1.7%), Bacillus (0.8%) and Pseudomonas (0.5%) (214 hits in total). For 16 hits to sequences from members of the C. salexigens species, the average identity within HSPs was 99.9% and the average coverage by HSPs was 97.9%. For 22 hits to sequences from other members of the genus Chromohalobacter, the average identity within HSPs was 98.2% and the average coverage by HSPs was 98.6%. Among all other species, the one yielding the highest score was Chromohalobacter marismortui (X87222), which corresponded to an identity of 99.9% and an HSP coverage of 100.0%. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was EU799899 ('It's all ranking aquatic Newport Harbor RI clone 1C227569'), which showed an identity of 100.0% and an HSP coverage of 100.0%. The most frequently occurring keywords within the labels of environmental samples which yielded hits were 'soil' (12.1%), 'lake' (3.6%), 'salin' (3.0%), 'agricultur' (2.9%) and 'alkalin, chang, flood, former, mexico, texcoco' (2.6%) (36 hits in total). The most frequently occurring keyword within the labels of environmental samples which yielded hits of a higher score than the highest scoring species was 'aquat, harbour, newport, rank' (25.0%) (2 hits in total). These keywords fit reasonably well with the ecological and physiological properties reported for strain 1H11T in the original description [1]. Figure 1 shows the phylogenetic neighborhood of C. salexigens in a 16S rRNA based tree. The sequences of the five identical 16S rRNA gene copies in the genome differ by two nucleotides from the previously published 16S rRNA sequence (AJ295146), which contains three ambiguous base calls.
The tree was inferred from 1,440 aligned characters [13,14] of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion [15]. Rooting was done initially using the midpoint method [16] and then checked for its agreement with the current classification (Table 1). The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches are support values from 1,000 ML bootstrap replicates [17] (left) and from 1,000 maximum parsimony bootstrap replicates [18] (right) if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [19] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks [20].
Cells of C. salexigens strain 1H11T are straight or slightly curved rods, 0.7 to 1.0 by 2 to 3 µm in size (Figure 2), with squared ends, and occur singly or in pairs [1,4]. Cells of strain 1H11T stain Gram-negative, are motile with polar flagella, are strictly aerobic, and are non-spore-forming [1,4]. Carbon and nitrogen source utilization and biochemistry of the strain were reported by Arahal et al. [1]. A partial characterization of the carbon-source utilization by the organism has also been presented by Csonka et al. [36], who reported that the strain can degrade a number of aromatic compounds, including benzoate, protocatechuate, 4-hydroxybenzoate, and toluene.
C. salexigens 1H11T is a halophile which, according to the classification proposed by Kushner [37], is on the borderline between "moderate" halophiles (those growing optimally between 2.9 and 14.5% NaCl) and "extreme" halophiles (those growing optimally between 8.7 and 23.2% NaCl). In addition, it displays extraordinarily high halotolerance (considered as the ability to live and survive under high salt concentrations), and is able to grow at salt concentrations over 17.4% and 32% in defined and complex media, respectively. However, both the minimum NaCl requirement and the upper limit of NaCl tolerance are dependent on growth medium and temperature. The organism can tolerate higher NaCl concentrations in LB or in other complex media than in defined media. In defined media, halotolerance is enhanced by osmoprotectants, such as glycine betaine or its precursor, choline [4,6,33]. In the complex medium SW ('sea water'), which is routinely used for growing this type of microorganism, strain 1H11T grows optimally at 7.5 to 10% (w/v) NaCl, with growth occurring over the range of 0.9% to 25% NaCl [1]. In casein medium, which was initially used for strain isolation, growth occurs in the presence of 32% solar salts [4].
In SW medium containing 10% (w/v) total salts, C. salexigens 1H11T can grow at a pH range from 5 to 10, with an optimum at pH 7.5 [1]. In the same medium, the temperature range for growth is 15 to 45°C, with an optimum at 37°C [1]. In the standard defined medium M63, supplemented with glucose as the sole carbon source, growth is optimal at 8.7 to 11.6% NaCl but occurs from a minimum of 2.9% NaCl to a maximum of 19% NaCl [6]. Interestingly, C. salexigens 1H11T exhibits maximal growth rate in glucose-M63 with only 1.8% (0.3 M) NaCl in the presence of high concentrations of salts of other inorganic ions, including K+, Rb+, NH4+, Br-, NO3-, or SO4 2- [38]. However, it is an open question whether this strain is unique among halophiles in being able to use other inorganic ions in addition to Na+ and Cl- for maximal growth rate.
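Because the salinities above are given partly as % (w/v) and partly as molar concentrations, a small conversion sketch may help when comparing them. It assumes that the percentages refer to grams of NaCl per 100 ml and uses the standard molar mass of NaCl (58.44 g/mol); the function names are illustrative only.

```python
NACL_MOLAR_MASS = 58.44  # g/mol, standard molar mass of NaCl


def molar_to_percent_wv(molar, molar_mass=NACL_MOLAR_MASS):
    """Convert mol/l to % (w/v), i.e. grams of solute per 100 ml."""
    return molar * molar_mass / 10.0


def percent_wv_to_molar(percent, molar_mass=NACL_MOLAR_MASS):
    """Convert % (w/v) to mol/l."""
    return percent * 10.0 / molar_mass


# 0.3 M NaCl, the concentration quoted for maximal growth in glucose-M63
low_salt_percent = molar_to_percent_wv(0.3)

# 8.7% NaCl, the lower end of the optimum reported for M63
optimum_molar = percent_wv_to_molar(8.7)
```

Under this assumption, 0.3 M NaCl corresponds to about 1.75% (w/v), which rounds to the 1.8% quoted in the text, and 8.7% (w/v) corresponds to roughly 1.5 M.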

Chemotaxonomy
Data on the structure of the cell wall, fatty acid composition, quinones and polar lipids are not available.

Genome sequencing and annotation
Genome project history
This organism was selected for sequencing on the basis of the DOE Joint Genome Institute Program DOEM 2004. The genome project is deposited in the Genomes On Line Database [19] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Strain history
The

Growth conditions and DNA isolation
The culture of strain 1H11T, DSM 3043, used to prepare genomic DNA (gDNA) for sequencing was grown in LB medium with 1 M NaCl. DNA was extracted as described by O'Connor and Zusman [39]. The purity, quality and size of the bulk gDNA preparation were assessed by JGI according to DOE-JGI guidelines.

Genome sequencing and assembly
The genome was sequenced using a combination of 4 kb, 8 kb and fosmid DNA libraries. All general aspects of library construction and sequencing can be found at the JGI website [40]. Draft assemblies were based on 44,750 total reads. The Phred/Phrap/Consed software package was used for sequence assembly and quality assessment [41]. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher or transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI) [42]. Gaps between contigs were closed by editing in Consed, custom priming, or PCR amplification (Roche Applied Science, Indianapolis, IN). A total of 920 additional reactions, 14 shatter and 18 transposon bomb libraries were needed to close gaps and to raise the quality of the finished sequence. The error rate of the completed genome sequence is less than 1 in 100,000. Together all libraries provided 11.5× coverage of the genome.
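The quality and coverage figures above can be related to each other with a short back-of-the-envelope sketch. It uses the standard Phred definition Q = -10 log10(p) and the usual definition of coverage as total sequenced bases divided by genome length. The implied mean read length is only a rough consistency check: it assumes that the 44,750 draft reads account for all of the 11.5× coverage, which ignores read trimming and the additional finishing reactions.

```python
import math

GENOME_LENGTH_BP = 3_696_649   # chromosome length reported for strain 1H11T
DRAFT_READS = 44_750           # reads used for the draft assemblies
REPORTED_COVERAGE = 11.5       # fold coverage reported for all libraries


def phred_quality(error_probability):
    """Phred-scaled quality for a given per-base error probability."""
    return -10.0 * math.log10(error_probability)


def fold_coverage(total_read_bases, genome_length):
    """Average number of times each genome position is sequenced."""
    return total_read_bases / genome_length


# An error rate of 1 in 100,000 expressed on the Phred scale
finished_quality = phred_quality(1 / 100_000)

# Mean read length implied if the draft reads alone gave the full coverage
# (an assumption made only for this consistency check)
implied_read_length = REPORTED_COVERAGE * GENOME_LENGTH_BP / DRAFT_READS
```

An error rate below 1 in 100,000 corresponds to a Phred-scaled quality above 50, and the implied mean read length of roughly 950 bases is in a plausible range for Sanger reads.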

Genome annotation
Genes were identified using two gene modeling programs, Glimmer [43] and Critica [44], as part of the Oak Ridge National Laboratory genome annotation pipeline. The two sets of gene calls were combined using Critica as the preferred start call for genes with the same stop codon. Genes specifying fewer than 80 amino acids that were predicted by only one of the gene callers and had no BLAST hit in the KEGG database at ≤1e-05 were deleted. Automated annotation was followed by a round of manual curation to eliminate obvious overlaps. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [45], TMHMM [46], and SignalP [47].
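The rules for combining the two sets of gene calls can be sketched as follows. This is an illustrative reading of the description above, not the actual Oak Ridge pipeline code: the `GeneCall` record, the `merge_calls` function and the way KEGG hits are passed in are all hypothetical, and genes are matched simply by strand and stop coordinate.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

StopKey = Tuple[str, int]  # (strand, stop coordinate)


@dataclass(frozen=True)
class GeneCall:
    strand: str
    start: int
    stop: int
    protein_length: int  # length of the predicted product in amino acids


def merge_calls(
    glimmer: List[GeneCall],
    critica: List[GeneCall],
    kegg_evalue: Dict[StopKey, float],
    min_length: int = 80,
    max_evalue: float = 1e-5,
) -> List[GeneCall]:
    """Combine two gene-call sets, preferring the Critica start site."""
    glimmer_by_stop = {(g.strand, g.stop): g for g in glimmer}
    critica_by_stop = {(g.strand, g.stop): g for g in critica}
    merged = []
    all_keys = set(glimmer_by_stop) | set(critica_by_stop)
    for key in sorted(all_keys, key=lambda k: (k[1], k[0])):
        in_both = key in glimmer_by_stop and key in critica_by_stop
        # Same stop codon in both sets: keep the Critica start call.
        gene = critica_by_stop[key] if key in critica_by_stop else glimmer_by_stop[key]
        # Genes without a recorded KEGG hit are treated as having no hit.
        has_hit = kegg_evalue.get(key, float("inf")) <= max_evalue
        # Drop short genes supported by one caller only and by no KEGG hit.
        if gene.protein_length < min_length and not in_both and not has_hit:
            continue
        merged.append(gene)
    return merged


# Hypothetical calls: one gene found by both callers, one short Glimmer-only gene.
glimmer_calls = [GeneCall("+", 100, 400, 100), GeneCall("+", 500, 650, 50)]
critica_calls = [GeneCall("+", 130, 400, 90)]
merged = merge_calls(glimmer_calls, critica_calls, kegg_evalue={})
```

In this toy input, the shared gene keeps the Critica start (130) and the 50-residue Glimmer-only gene is removed; supplying a KEGG E-value at or below 1e-05 for that gene would retain it.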

Genome properties
The genome consists of a 3,696,649 bp long chromosome with a 63.9% G+C content (Figure 3 and Table 3). Of the 3,412 putative genes, 3,319 are protein-coding, and 93 specify RNAs; 21 pseudogenes were also identified. The majority of the protein-coding genes (76.8%) were assigned a putative function while the remaining ones were annotated as encoding hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
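The gene counts above can be cross-checked with a few lines of arithmetic. The number of genes with an assigned function is derived here from the rounded percentage, so it is an approximation and not a value taken from Table 3.

```python
GENOME_LENGTH_BP = 3_696_649
PROTEIN_CODING_GENES = 3_319
RNA_GENES = 93
FRACTION_WITH_FUNCTION = 0.768  # 76.8% of protein-coding genes

# Total gene count as the sum of protein-coding and RNA genes
total_genes = PROTEIN_CODING_GENES + RNA_GENES

# Approximate split of protein-coding genes (derived from a rounded percentage)
genes_with_function = round(PROTEIN_CODING_GENES * FRACTION_WITH_FUNCTION)
hypothetical_genes = PROTEIN_CODING_GENES - genes_with_function

# Average gene density in genes per kilobase of chromosome
genes_per_kb = total_genes / (GENOME_LENGTH_BP / 1000)
```

The sum reproduces the 3,412 putative genes stated in the text, and the split suggests roughly 2,549 genes with a putative function and about 770 annotated as hypothetical.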

Insights into the genome
The publication of the genome sequence of strain 1H11T was preceded by several publications that were based on draft versions of the sequence or on the publicly available genome sequence and annotation. Oren et al. [48] found that the predicted isoelectric points of periplasmic proteins of C. salexigens 1H11T are significantly more acidic than those of orthologous proteins in mesophilic bacteria, and they suggested that this feature may contribute to the halophilic characteristics of 1H11T. Analysis of the genomic sequence indicated that the organism has all of the enzymes of the Embden-Meyerhof glycolytic pathway, hexose monophosphate shunt, and TCA cycle but seemed to lack the standard fructose-1,6-bisphosphate phosphatase of the gluconeogenic pathway [36]. Krejcík et al. predicted the formation of isethionate from taurine based on the genome sequence [49]. Ates et al. recently presented a genome-scale reconstruction of a metabolic network for strain 1H11T focusing on the uptake and accumulation of industrially important organic osmolytes such as ectoine and betaine [5].