Complete genome sequence of the halophilic bacterium Spirochaeta africana type strain (Z-7692T) from the alkaline Lake Magadi in the East African Rift

Spirochaeta africana Zhilina et al. 1996 is an anaerobic, aerotolerant, spiral-shaped bacterium that is motile via periplasmic flagella. The type strain of the species, Z-7692T, was isolated in 1993 or earlier from a bacterial bloom in the brine under the trona layer in a shallow lagoon of the alkaline equatorial Lake Magadi in Kenya. Here we describe the features of this organism, together with the complete genome sequence and annotation. Considering the pending reclassification of S. caldaria to the genus Treponema, S. africana is only the second 'true' member of the genus Spirochaeta with a genome-sequenced type strain to be published. The 3,285,855 bp long genome of strain Z-7692T, with its 2,817 protein-coding and 57 RNA genes, is a part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain Z-7692T (= DSM 8902 = ATCC 700263) is the type strain of the species Spirochaeta africana [1]. The genus Spirochaeta currently consists of 18 validly named species [2]. The genus name is derived from the latinized Greek words 'speira', meaning 'a coil', and 'chaitê', meaning 'hair', yielding the Neo-Latin word 'Spirochaeta', a 'coiled hair' [2]. The species epithet is derived from the Latin word 'africana', of the African continent, referring to the strain's origin in the African alkaline Lake Magadi [1]. Here we present a summary classification and a set of features for S. africana strain Z-7692T, together with the description of the complete genome sequencing and annotation.

Classification and features
16S rRNA analysis
A representative genomic 16S rRNA sequence of strain Z-7692T was compared using NCBI BLAST [3,4] under default settings (e.g., considering only the high-scoring segment pairs (HSPs) from the best 250 hits) with the most recent release of the Greengenes database [5], and the relative frequencies of taxa and keywords (reduced to their stem [6]) were determined, weighted by BLAST scores. The most frequently occurring genera were Spirochaeta (91.1%), Treponema (5.8%) and Cytophaga (3.1%) (29 hits in total). Regarding the two hits to sequences from members of the species, the average identity within HSPs was 99.6%, whereas the average coverage by HSPs was 99.0%. Regarding the 19 hits to sequences from other members of the genus, the average identity within HSPs was 89.1%, whereas the average coverage by HSPs was 78.9%. Among all other species, the one yielding the highest score was Spirochaeta asiatica (NR_026300), which corresponded to an identity of 96.6% and an HSP coverage of 98.8%. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was AF454308 (Greengenes short name 'spirochete clone ML320J-13'), which showed an identity of 90.6% and an HSP coverage of 99.3%. The most frequently occurring keywords within the labels of all environmental samples that yielded hits were 'microbi' (10.5%), 'mat' (8.8%), 'hypersalin' (6.3%), 'new' (4.2%) and 'world' (4.1%) (221 hits in total). No environmental samples yielded hits with a higher score than the highest-scoring species, indicating that this species is rarely found in environmental sequencing. Figure 1 shows the phylogenetic neighborhood of S. africana in a 16S rRNA-based tree. The sequences of the three identical 16S rRNA gene copies in the genome differ by two nucleotides from the previously published 16S rRNA sequence (X93928).
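The score weighting behind the genus and keyword frequencies can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes each hit has already been reduced to a label (a genus name or keyword stem) and a BLAST score (assumed here to be the bit score), and the hit list is hypothetical. Each label's frequency is its share of the summed scores, so a few strong hits outweigh many weak ones.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return each label's share of the summed BLAST scores, in percent.

    `hits` is a list of (label, score) pairs; a label may be a genus name
    or a keyword stem. The score is assumed to be the BLAST bit score.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score  # accumulate the score per label
    grand_total = sum(totals.values())
    return {label: 100.0 * s / grand_total for label, s in totals.items()}


# Hypothetical hits for illustration only; not the data from this study.
hits = [
    ("Spirochaeta", 2500.0),
    ("Spirochaeta", 2300.0),
    ("Treponema", 300.0),
    ("Cytophaga", 150.0),
]

freqs = weighted_frequencies(hits)
for genus, pct in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{genus}: {pct:.1f}%")
```

With these made-up scores the two Spirochaeta hits together account for about 91.4% of the summed score, even though they are only half of the hits.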

Morphology and physiology
Cells of strain Z-7692T are 0.25 to 0.3 µm in diameter and 15 to 30 µm (occasionally 7 to 40 µm) in length, and form regular, stable primary coils [1] (Figure 2); spherical bodies were seen in stationary-phase cultures (not visible in Figure 2). The cells are motile by periplasmic flagella [1] (not visible in Figure 2). The cell mass is orange [1]. S. africana is a Gram-negative, anaerobic, aerotolerant, mesophilic microorganism (Table 1) with an optimal growth temperature between 30°C and 37°C; no growth is observed above 47°C [1]. The optimum pH is 8.8-9.8; no growth is observed at pH 8 or 10.8 [1]. S. africana is halophilic and does not grow at NaCl concentrations below 3% or above 10% (wt/vol) [1].
S. africana utilizes mainly mono- and disaccharides as carbon and energy sources. Amino acids are not fermented. Glucose is fermented to acetate, ethanol and H2 as the main fermentation products, with a minor amount of lactate produced in stationary phase [1]. Strain Z-7692T is able to ferment fructose, maltose, trehalose, saccharose, cellobiose, glucose, glycogen and starch. Poor growth was observed with mannose or xylose; no growth occurred with galactose, N-acetylglucosamine or ribose. A vitamin supplement is required [1].

Taxonomic perspective
The data presented in Figure 1, based on an evaluation of the 16S rRNA gene sequence data, provide an interesting insight into the nomenclature and classification of members of the genus Spirochaeta. In determining which species currently placed in this genus should remain in it, the primary criterion is which species group with the type strain of the type species of the genus Spirochaeta. However, the type species of this genus is Spirochaeta plicatilis, for which only a description serves as the type, since no type strain appears to be available. This makes it difficult to determine which species represented by living type strains belong within the genus Spirochaeta. The problem matters because the monophyletic group formed by the majority of the members of the genus Spirochaeta together with the members of the genus Borrelia does not split into two monophyletic groups corresponding to Spirochaeta and Borrelia; instead, Borrelia branches from within this group, so that the members of the genus Spirochaeta appear to be paraphyletic. If one of the goals of modern taxonomy is to classify species in a single genus only if the members of the genus constitute a monophyletic group, then there are three possible solutions. The first is to transfer all members of the genus Borrelia to the genus Spirochaeta, although this is complicated by the fact that a type strain has never been designated for the type species of the genus Borrelia, Borrelia anserina. The second alternative would be to create a number of genera based on the monophyletic groups found within the current analysis of members of the genus Spirochaeta. The third alternative would be to accept the status quo, whereby members of the genus Spirochaeta appear to constitute a paraphyletic group. However, any attempt to undertake such a reclassification would be hampered by the absence of type strains for the type species of the genera Spirochaeta and Borrelia.
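The monophyly criterion behind this argument can be made concrete with a small sketch. It assumes a rooted tree written as nested Python tuples, and the taxon names in the toy tree are hypothetical placeholders, not the taxa of Figure 1. A group is monophyletic if some clade contains exactly its members; otherwise the smallest clade containing the group also contains outsiders, as happens when one genus branches from within another.

```python
def collect_clades(tree, clades):
    """Append the leaf set of every clade of `tree` to `clades`.

    A tree is either a leaf name (str) or a tuple of child subtrees.
    Returns the leaf set of `tree` itself.
    """
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def smallest_enclosing_clade(tree, group):
    """Leaf set of the smallest clade that contains every member of `group`."""
    clades = []
    collect_clades(tree, clades)
    return min((c for c in clades if group <= c), key=len)


def is_monophyletic(tree, group):
    """True if some clade consists of exactly the members of `group`."""
    return smallest_enclosing_clade(tree, group) == frozenset(group)


# Hypothetical rooted tree in which genus "Borrelia" is nested inside "Spirochaeta".
tree = (("Spirochaeta_A", ("Spirochaeta_B", ("Borrelia_X", "Borrelia_Y"))), "Treponema_Z")
spiro = {"Spirochaeta_A", "Spirochaeta_B"}
borr = {"Borrelia_X", "Borrelia_Y"}

print(is_monophyletic(tree, spiro))                         # False
print(sorted(smallest_enclosing_clade(tree, spiro) - spiro))  # the nested outsiders
print(is_monophyletic(tree, borr))                          # True
print(is_monophyletic(tree, spiro | borr))                  # True
```

In this toy tree, merging the two hypothetical genera yields a monophyletic group, which corresponds to the first solution discussed above; the second solution would instead split the Spirochaeta members into smaller clades.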
There are already indications that the evolutionary group constituting members of the genera Spirochaeta and Borrelia shows an interesting degree of diversity at the level of morphology, physiology and the genome.
Figure 1. Phylogenetic tree highlighting the position of S. africana relative to the type strains of the other species within the phylum 'Spirochaetes'. The tree was inferred from 1,332 aligned characters [7,8] of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion [9]. Rooting was done initially using the midpoint method [10] and then checked for its agreement with the current classification (Table 1). The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches are support values from 350 ML bootstrap replicates [11] (left) and from 1,000 maximum-parsimony bootstrap replicates [12] (right) if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [13] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks (see [14-20] and CP003155 for Sphaerochaeta pleomorpha, CP002903 for Sphaerochaeta thermophila, CP002696 for Treponema brennaborense, CP001841 for T. azotonutricium and CP001843 for T. primitia). Note: Spirochaeta caldaria, S. stenostrepta and S. zuelzerae were effectively renamed T. caldaria, T. stenostrepta and T. zuelzerae in [15]; however, the names have not yet been validly published.
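Midpoint rooting, used here as the initial rooting step, places the root halfway along the longest leaf-to-leaf path of an unrooted tree. The sketch below is an illustration under simplifying assumptions, not the implementation cited as [10]: the tree is given as a hypothetical weighted adjacency map, and all leaf-to-leaf distances are found by brute force, which is adequate only for small trees.

```python
def build_adjacency(edges):
    """Undirected weighted adjacency map {node: {neighbor: branch_length}}."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def distances_from(adj, start):
    """Path length from `start` to every node, plus the parent on that path.

    In a tree each node is reached by exactly one path, so a plain
    depth-first traversal gives the correct distances.
    """
    info = {start: (0.0, None)}
    stack = [start]
    while stack:
        node = stack.pop()
        dist = info[node][0]
        for nbr, length in adj[node].items():
            if nbr not in info:
                info[nbr] = (dist + length, node)
                stack.append(nbr)
    return info


def midpoint_root(adj):
    """Locate the midpoint of the longest leaf-to-leaf path.

    Returns (u, v, offset, diameter): the root lies on edge u-v at
    distance `offset` from u; `diameter` is the longest path length.
    """
    leaves = [n for n, nbrs in adj.items() if len(nbrs) == 1]
    best = (-1.0, None, None, None)
    for a in leaves:  # brute force over all leaf pairs
        info = distances_from(adj, a)
        for b in leaves:
            if info[b][0] > best[0]:
                best = (info[b][0], a, b, info)
    diameter, a, b, info = best
    # Follow parent pointers from b back to a, then reverse to get a -> b.
    path = [b]
    while path[-1] != a:
        path.append(info[path[-1]][1])
    path.reverse()
    half = diameter / 2.0
    walked = 0.0
    for u, v in zip(path, path[1:]):
        if walked + adj[u][v] >= half:
            return u, v, half - walked, diameter
        walked += adj[u][v]


# Hypothetical unrooted tree: leaves A-D, internal nodes X and Y.
edges = [("A", "X", 1.0), ("B", "X", 2.0), ("X", "Y", 1.0),
         ("C", "Y", 2.5), ("D", "Y", 6.0)]
adj = build_adjacency(edges)
print(midpoint_root(adj))
```

For this toy tree the longest path runs from B to D (length 9.0), so the root is placed 1.5 units from Y on the Y-D branch, at distance 4.5 from both B and D. Checking the result against an accepted classification, as done for Figure 1, guards against cases where unequal evolutionary rates make the midpoint misleading.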

Genome sequencing and annotation
Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position [36,37] and is part of the Genomic Encyclopedia of Bacteria and Archaea project [38]. The genome project is deposited in the Genomes On Line Database [13], and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI) using state-of-the-art sequencing technology [39]. A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
S. africana strain Z-7692T, DSM 8902, was grown anaerobically in DSMZ medium 700 (Alkaliphilic Spirochaea medium) [40] at 37°C. DNA was isolated from 0.5-1 g of cell paste using the MasterPure Gram-positive DNA purification kit (Epicentre MGP04100), following the standard protocol recommended by the manufacturer with modification st/LALM for cell lysis as described in Wu et al. 2009 [41]. DNA is available through the DNA Bank Network [42].
Genome sequencing and assembly
… [46], or sequencing cloned bridging PCR fragments with subcloning. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks (J.-F. Chang, unpublished). A total of 132 additional reactions were necessary to close some gaps and to raise the quality of the final contigs. Illumina reads were also used to correct potential base errors and increase consensus quality using the software Polisher developed at JGI [47]. The error rate of the final genome sequence is less than 1 in 100,000. Together, the Illumina and 454 sequencing platforms provided 480.9× coverage of the genome. The final assembly contained 509,107 pyrosequence and 12,708,968 Illumina reads.
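Two of the figures above can be related to familiar quantities with simple arithmetic. An error rate below 1 in 100,000 corresponds to a Phred quality above Q50 (Q = -10 log10 p), and a mean fold coverage multiplied by the genome length gives the approximate number of sequenced bases it implies. The sketch assumes coverage is defined as total sequenced bases divided by genome length; the function names are illustrative only.

```python
import math


def phred_from_error_rate(p):
    """Phred quality score Q = -10 * log10(p) for a per-base error probability p."""
    return -10.0 * math.log10(p)


def implied_total_bases(fold_coverage, genome_length_bp):
    """Total sequenced bases implied by a mean fold coverage.

    Assumes coverage = total sequenced bases / genome length.
    """
    return fold_coverage * genome_length_bp


q = phred_from_error_rate(1e-5)                 # error rate of 1 in 100,000
bases = implied_total_bases(480.9, 3_285_855)   # combined 454 + Illumina coverage
print(f"Q{q:.0f}; about {bases / 1e9:.2f} Gb sequenced in total")
```

Under that assumption, the reported 480.9× coverage of the 3,285,855 bp genome corresponds to roughly 1.58 Gb of sequence summed over both platforms.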

Genome annotation
Genes were identified using Prodigal [48] as part of the DOE-JGI genome annotation pipeline [20], followed by a round of manual curation using the JGI GenePRIMP pipeline [49]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGR-Fam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [50].
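The translation step can be sketched with the bacterial genetic code (NCBI translation table 11), which shares its amino acid assignments with the standard code but allows several alternative start codons that are read as methionine at the initiation position. This is a self-contained illustration, not the JGI pipeline; the example sequences are hypothetical, and codons containing ambiguous bases are rendered as 'X'.

```python
BASES = "TCAG"
# Amino acids for codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
# Table 11 uses the same amino acid assignments as the standard code (table 1).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Start codons allowed by table 11; at the first position they encode Met.
START_CODONS_TABLE11 = {"ATG", "GTG", "TTG", "CTG", "ATT", "ATC", "ATA"}


def translate_cds(cds):
    """Translate a complete CDS (start to stop) into a protein string."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    protein = []
    for pos in range(0, len(cds), 3):
        codon = cds[pos:pos + 3]
        if pos == 0 and codon in START_CODONS_TABLE11:
            protein.append("M")  # alternative starts are read as Met
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    if protein and protein[-1] == "*":
        protein.pop()  # drop the terminal stop codon
    return "".join(protein)


print(translate_cds("GTGAAATTTTAA"))  # MKF: GTG start read as Met
```

Note that the same GTG codon is translated as valine anywhere other than the initiation position.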

Genome properties
The genome consists of a 3,285,855 bp long chromosome with a G+C content of 57.8% (Table 3 and Figure 3). Of the 2,874 genes predicted, 2,817 were protein-coding genes and 57 were RNA genes; 35 pseudogenes were also identified. The majority of the protein-coding genes (74.2%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
An estimate of the overall similarity between the genome of S. africana and those of the other Spirochaeta species was generated with the GGDC (Genome-to-Genome Distance Calculator) [51,52]. This system calculates distances by comparing the genomes to obtain HSPs (high-scoring segment pairs) and inferring distances from a set of formulas (1, HSP length / total length; 2, identities / HSP length; 3, identities / total length). Table 5 shows the results of the pairwise comparison.
The comparison of S. africana with S. alkalica reached the highest scores using the GGDC: 5.2% of the average genome length is covered by HSPs. The identity within the HSPs was 86.4%, whereas the identity over the whole genome was 4.5%. Lower similarity scores were observed in the comparisons of S. africana with S. caldaria and with S. smaragdinae, in which only 1.62% and 1.64%, respectively, of the average of both genome lengths are covered by HSPs. The identity within these HSPs was 84.5% and 83.5%, respectively, whereas the identity over the whole genome was only 1.4% in both comparisons. S. alkalica shows the highest GGDC scores with S. smaragdinae: 2.5% of the average genome length is covered by HSPs, and the identity within the HSPs was 87.7%, whereas the identity over the whole genome was 2.2% [51].
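Because formula 3 (identities / total length) equals formula 1 (HSP length / total length) multiplied by formula 2 (identities / HSP length), the reported values can be checked for internal consistency. The sketch below computes the three ratios from hypothetical HSP totals and then multiplies the reported coverage and within-HSP identity for each pair. The GGDC itself reports distances derived from these ratios (assumed here to be one minus each ratio), which the sketch does not compute.

```python
def ggdc_ratios(hsp_length, identities, total_length):
    """Return the three GGDC similarity ratios as fractions.

    hsp_length   -- summed length of all HSPs
    identities   -- number of identical base pairs within the HSPs
    total_length -- summed length of the two genomes
    """
    r1 = hsp_length / total_length   # formula 1: HSP length / total length
    r2 = identities / hsp_length     # formula 2: identities / HSP length
    r3 = identities / total_length   # formula 3: identities / total length
    return r1, r2, r3


# Hypothetical totals, chosen only to show that r3 == r1 * r2.
r1, r2, r3 = ggdc_ratios(hsp_length=100_000, identities=86_000, total_length=2_000_000)
print(f"r1={r1:.3f}, r2={r2:.3f}, r3={r3:.3f}, r1*r2={r1 * r2:.3f}")

# Values reported in the text: (HSP coverage, identity within HSPs,
# identity over the whole genome), all as fractions.
reported = {
    "S. africana vs S. alkalica": (0.052, 0.864, 0.045),
    "S. africana vs S. caldaria": (0.0162, 0.845, 0.014),
    "S. africana vs S. smaragdinae": (0.0164, 0.835, 0.014),
    "S. alkalica vs S. smaragdinae": (0.025, 0.877, 0.022),
}
for pair, (coverage, identity_hsp, identity_total) in reported.items():
    predicted = coverage * identity_hsp  # formula 1 x formula 2
    print(f"{pair}: {predicted:.3f} (reported {identity_total:.3f})")
```

For all four pairs the product agrees with the reported whole-genome identity to within rounding, which shows that the low whole-genome identities are driven mainly by the small fraction of the genomes covered by HSPs rather than by low identity within them.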