Complete genome sequence of Halorhodospira halophila SL1

Halorhodospira halophila is among the most halophilic organisms known. It is an obligately photosynthetic and anaerobic purple sulfur bacterium that exhibits autotrophic growth up to saturated NaCl concentrations. The type strain H. halophila SL1 was isolated from a hypersaline lake in Oregon. Here we report its complete genome sequence, assembled into a single contig. This is the first genome of a phototrophic extreme halophile. The 2,678,452 bp genome encodes 2,493 predicted genes, as determined by automated genome annotation. Of the 2,407 predicted proteins, 1,905 were assigned a putative function. Future detailed analysis of this genome promises to yield insights into the halophilic adaptations of this organism, its capacity for photoautotrophic growth under extreme conditions, and its characteristic sulfur metabolism.


Introduction
Halorhodospira halophila is an anoxygenic photosynthetic halophile that was isolated from salt-encrusted mud along the shore of Summer Lake in Oregon [1], and from the hypersaline Wadi Natrun lakes in Egypt [2]. The organism was originally named Ectothiorhodospira halophila; when the genus Ectothiorhodospira was divided into two genera (Ectothiorhodospira and Halorhodospira), E. halophila was reclassified into the genus Halorhodospira and became the type species of the new genus [3]. Over the last decade, the genomes of a number of extremely halophilic Archaea have been sequenced and analyzed, including Halobacterium salinarum [4,5], Haloarcula marismortui [6], Natronomonas pharaonis [7], and Haloquadratum walsbyi [8]. In addition, the genomes of three halophilic Bacteria have become available: Salinibacter ruber [9], Halothermothrix orenii [10], and 'Halanaerobium hydrogenoformans' [11]. All of these organisms are obligate chemotrophs, so H. halophila is the first phototrophic extreme halophile to have its genome sequence determined and analyzed. In contrast to other extreme halophiles that grow well in saturated salt concentrations, H. halophila tolerates a broad range of salt concentrations: it grows optimally at all NaCl concentrations from 15% to 35%, with growth down to 3.5% NaCl [12]. The extremely halophilic archaea listed above and S. ruber, by comparison, require at least 15% NaCl for growth. H. halophila is of significant interest because it is an obligately anaerobic purple sulfur bacterium and among the most halophilic organisms known [13]. To date, genome sequences are available for two phototrophic purple sulfur bacteria: Allochromatium vinosum DSM 180 and H. halophila SL1, reported here. H. halophila has very few growth requirements, although, like A. vinosum, it does need reduced sulfur compounds for growth [14].
Its photosynthetic electron transfer pathways [15][16][17] and its nitrogen fixation [18] have attracted attention. In addition, H. halophila contains photoactive yellow protein (PYP) [19,20], the first member of a novel class of blue-light receptors, which triggers a negative phototaxis response in H. halophila [21]. The biophysical properties of H. halophila PYP have been studied extensively [22][23][24]. The sulfur metabolism of H. halophila is unusual, resulting in the transient accumulation of extracellular sulfur globules via metabolic pathways that are not yet fully resolved [14]. Whereas purple nonsulfur phototrophs such as Rhodobacter sphaeroides and Rhodospirillum rubrum use organic compounds like malate as electron donors, H. halophila obtains electrons from reduced sulfur compounds. The genome sequence of H. halophila promises to reveal insights into its adaptations to hypersaline environments and to allow a better understanding of its unique combination of metabolic capabilities, which combines properties of extreme halophiles, anoxygenic phototrophs, and purple sulfur bacteria.

Classification and features
H. halophila belongs to the Gammaproteobacteria [37].

Phylogenetic tree: the tree was determined by the maximum likelihood model of PhyML [38] and rendered with TreeDyn [39], using the "one click" pipeline of the Phylogeny.fr web resource [40].

Standards in Genomic Sciences

Evidence codes: … not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence. These evidence codes are from the Gene Ontology project [36]. If the evidence code is IDA, then the property should have been directly observed, for the purpose of this specific publication, for a live isolate by one of the authors, or an expert or reputable institution mentioned in the acknowledgements.

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing to better understand its halophilic adaptations, unusual sulfur metabolism, and photosynthetic pathways, and to provide a framework for understanding the signaling pathways of photoactive yellow protein. The complete genome sequence has been deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). Table 2 presents the project information and its association with MIGS version 2.0 compliance [25].

Genome sequencing and assembly
The genome of H. halophila SL1 was sequenced using the random shotgun method. Large (40 kb), medium (8 kb) and small (3 kb) insert random sequencing libraries were sequenced for this genome project, with an average success rate of 88% and an average high-quality read length of 750 nucleotides. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher (unpublished, C. Han) or by transposon bombing of bridging clones (EZ-Tn5 <P6Kyori/KAN-2> Tnp Transposome kit, Epicentre Biotechnologies). Gaps between contigs were closed by editing, custom primer walks, or PCR amplification. The completed genome sequence of H. halophila SL1 contains 36,035 reads, achieving an average of 12-fold sequence coverage per base with an error rate of less than 1 in 100,000.
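As a rough consistency check on these figures, the idealized Lander-Waterman relation c = N·L/G estimates mean sequencing depth from the number of reads N, the read length L and the genome length G, and a Poisson model of read placement predicts that a fraction e^(-c) of bases stays uncovered. The Python sketch below is illustrative only: it is not the coverage calculation used by JGI, and the function names are hypothetical.

```python
import math


def mean_coverage(n_reads: int, read_length: float, genome_length: int) -> float:
    """Idealized mean depth c = N * L / G (Lander-Waterman)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_reads * read_length / genome_length


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson model of read placement: P(a base has zero reads) = exp(-c)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # Figures quoted in the text; 750 nt is an *average* high-quality read
    # length, so this product is only a back-of-the-envelope estimate.
    c = mean_coverage(36_035, 750, 2_678_452)
    print(f"estimated depth: {c:.1f}x")
    print(f"uncovered fraction at 12x: {expected_uncovered_fraction(12):.1e}")
```

With the read count and average read length quoted above, the idealized estimate comes to roughly 10-fold, somewhat below the reported 12-fold; the difference presumably reflects how read lengths and depth were counted in the finished assembly, which this simple product does not capture.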

Genome annotation
Identification of putative protein-encoding genes and initial automated annotation of the genome were performed by the Oak Ridge National Laboratory genome annotation pipeline. Additional gene prediction analysis and functional annotation were performed within the IMG platform [41].

Genome properties
The genome is 2,678,452 bp long and comprises one circular chromosome with a GC content of 67% (Figure 2). A total of 2,493 genes were predicted, 2,407 of which are protein-coding genes. Of the protein-coding genes, 1,905 were assigned a putative function, and the remainder were annotated as hypothetical proteins. In addition, 31 pseudogenes were identified. The properties and statistics of the genome are summarized in Tables 3 and 4.
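The counts above imply a few derived figures. The following Python sketch computes the number of predicted genes not annotated as protein-coding, the number of hypothetical proteins, and the fraction of protein-coding genes with a putative function. The GeneCounts class is a hypothetical helper, not part of IMG or any annotation pipeline, and the 31 pseudogenes are left out because the text does not state how they are counted within the 2,493 total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneCounts:
    """Hypothetical container for the gene counts reported for one genome."""

    total_genes: int
    protein_coding: int
    with_putative_function: int

    @property
    def non_protein_coding(self) -> int:
        # Predicted genes not counted as protein-coding (presumably mostly RNA genes).
        return self.total_genes - self.protein_coding

    @property
    def hypothetical(self) -> int:
        # Protein-coding genes without an assigned putative function.
        return self.protein_coding - self.with_putative_function

    @property
    def fraction_with_function(self) -> float:
        return self.with_putative_function / self.protein_coding


if __name__ == "__main__":
    counts = GeneCounts(total_genes=2493, protein_coding=2407, with_putative_function=1905)
    print(counts.non_protein_coding)  # 86
    print(counts.hypothetical)  # 502
    print(f"{counts.fraction_with_function:.1%}")  # 79.1%
```

For H. halophila this gives 86 predicted genes that are not protein-coding, 502 hypothetical proteins, and a putative function for about 79.1% of the protein-coding genes.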

Conclusion
H. halophila is among the most halophilic eubacteria known. Further analysis and characterization of its genome will provide insights into the mechanisms it uses to adapt to hypersaline environments.

Figure 2.
Graphical circular map of the genome. From outside to the center: Circle 1, genes on forward strand (colored by COG categories); Circle 2, genes on reverse strand (colored by COG categories); Circle 3, RNA genes (tRNAs green, rRNAs red, other RNAs black); Circle 4, mobile element genes; Circle 5, CRISPR-associated protein genes; Circle 6, GC content; Circle 7, GC skew. The total is based on the total number of protein coding genes in the annotated genome.
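Circles 6 and 7 plot GC content, (G+C)/(A+C+G+T), and GC skew, (G-C)/(G+C), along the chromosome. A minimal Python sketch of the usual sliding-window calculation follows. The function names are hypothetical, the window and step sizes are left to the caller because the caption does not state the values used for this figure, and, as a simplifying assumption, windows do not wrap around the origin of the circular chromosome.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); 0.0 if none."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    acgt = g + c + s.count("A") + s.count("T")
    return (g + c) / acgt if acgt else 0.0


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); 0.0 when the window contains no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed(seq: str, window: int, step: int) -> List[Tuple[int, float, float]]:
    """Return (start, GC content, GC skew) for each window along a linear sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    # The last window starts at len(seq) - window; a sequence shorter than
    # one window yields a single window covering the whole sequence.
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        results.append((start, gc_content(chunk), gc_skew(chunk)))
    return results


if __name__ == "__main__":
    for start, gc, skew in windowed("GGGCATATGCCC", window=6, step=6):
        print(start, round(gc, 3), skew)
```

On this toy sequence the two windows have the same GC content (4 of 6 bases) but opposite skew, which is the kind of strand-composition signal that circle 7 displays along the real chromosome.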