Complete genome sequence of Spirosoma linguale type strain (1T)

Spirosoma linguale Migula 1894 is the type species of the genus. S. linguale is a free-living and non-pathogenic organism, known for its peculiar ringlike and horseshoe-shaped cell morphology. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the third completed genome sequence of a member of the family Cytophagaceae. The 8,491,258 bp long genome with its eight plasmids, 7,069 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain 1T (= DSM 74 = ATCC 33905 = LMG 10896) is the type strain of the species Spirosoma linguale, which is the type species of the genus Spirosoma. The genus currently consists of five species [1]. Strain 1T is reported to have been isolated from a laboratory water bath (websites of DSMZ and ATCC); however, a proper reference could not be identified. Another strain of S. linguale was isolated from fresh water from deep wells in Long Beach, California, USA [2]. Other strains from the genus Spirosoma were isolated from high arctic permafrost soil in Norway [3], soil from a ginseng field in Pocheon province, South Korea [4], and fresh water from the Woopo wetlands, South Korea [5]. These findings support the hypothesis that S. linguale is a free-living species with a worldwide distribution. The genus name Spirosoma derives from 'spira', Latin for 'coil', combined with 'soma', Greek for 'body', resulting in 'coiled body' [1]. Spirosoma was the first genus in the family Spirillaceae in Migula's "System der Bakterien" [6]. The species name was effectively published by Migula in 1894 [7] and validly published by Skerman in 1980 [8].
Various taxonomic treatments have placed this organism either in the family "Flexibacteraceae" or in the family Cytophagaceae. This appears to be due to a number of nomenclatural problems. The family "Flexibacteraceae" as outlined in TOBA 7.7 would include Cytophaga hutchinsonii, which is the type species of the genus Cytophaga, which in turn is the type of the family Cytophagaceae, a name that may not be replaced by the family name "Flexibacteraceae" as long as Cytophaga hutchinsonii is one of the included species. However, the topology of the 16S rDNA-based dendrogram indicates that it may be possible to define a second family that includes the genus Spirosoma but excludes Cytophaga hutchinsonii. At the same time, the family Cytophagaceae may be defined to exclude the type species of the genus Flexibacter and members of the genus Spirosoma.
It should also be remembered that the genus Spirosoma is the type of the family Spirosomaceae Larkin and Borrall 1978. At present the higher taxonomic ranks of this group of organisms lack formal modern descriptions and circumscriptions, making it difficult to make definitive statements that would hold over the next few years. Here we present a summary classification and a set of features for S. linguale 1T, together with the description of the complete genomic sequencing and annotation.

Classification and features
Uncultured clone sequences in GenBank showed 96% or less sequence identity to the 16S rRNA gene sequence (AM000023) of S. linguale 1T. No reasonable sequence similarity (>87%) to sequences from any metagenomic survey was reported by the NCBI BLAST server (October 2009). Figure 1 shows the phylogenetic neighborhood of S. linguale 1T in a 16S rRNA based tree. The sequences of the four identical 16S rRNA gene copies in the genome of S. linguale 1T are also identical to the previously published 16S rRNA sequence generated from LMG 10896 (AM000023).

Figure 1. Phylogenetic tree highlighting the position of S. linguale 1T and the type strains of the other species within the genus relative to the other type strains within the family Cytophagaceae. The tree was inferred from 1,320 aligned characters [9,10] of the 16S rRNA gene sequence under the maximum likelihood criterion [11] and rooted with the type strain of the family Sphingobacteriaceae. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1,000 bootstrap replicates if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [12] are shown in blue, published genomes such as the one of Dyadobacter fermentans [13] in bold.

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position, and is part of the Genomic Encyclopedia of Bacteria and Archaea project. The genome project is deposited in the Genome OnLine Database [12] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing can be found on the JGI website (http://www.jgi.doe.gov/). The 454 pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche).
Large Newbler contigs were broken into 9,401 overlapping fragments of 1,000 bp and entered into the assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores, with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the parallel phrap assembler (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher [29].
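The fragmentation step described above can be illustrated with a minimal Python sketch. It is not the JGI implementation: the function name `shred_contig` and the 900 bp step (giving a 100 bp overlap) are assumptions made for illustration, since the text states only the 1,000 bp fragment length and not the amount of overlap.

```python
def shred_contig(sequence, fragment_len=1000, step=900):
    """Split one contig into overlapping fragments (pseudo-reads).

    fragment_len follows the 1,000 bp stated in the text; step is a
    hypothetical value, because the actual overlap is not reported.
    """
    if step <= 0 or step > fragment_len:
        raise ValueError("step must be between 1 and fragment_len")
    # Contigs no longer than one fragment are returned unchanged.
    if len(sequence) <= fragment_len:
        return [sequence]
    fragments = []
    start = 0
    # Slide a fixed-size window along the contig.
    while start + fragment_len < len(sequence):
        fragments.append(sequence[start:start + fragment_len])
        start += step
    # Anchor the last fragment at the contig end so no bases are dropped.
    fragments.append(sequence[-fragment_len:])
    return fragments


# Toy contig of 2,500 bp (made-up sequence, for demonstration only).
contig = "ACGT" * 625
pseudo_reads = shred_contig(contig)
print(len(pseudo_reads), [len(r) for r in pseudo_reads])
```

Anchoring the final fragment at the contig end keeps every pseudo-read at full length, at the cost of a larger overlap for the last pair of fragments.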

Genome annotation
Genes were identified using Prodigal [30] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [31]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [32].

Genome properties
The genome consists of an 8,078,757 bp long chromosome and eight plasmids of 6,072 to 189,452 bp in length (Table 3 and Figure 3). Of the 7,129 genes predicted, 7,069 were protein-coding genes and 60 were RNA genes; 131 pseudogenes were also identified. The majority of the protein-coding genes (61.5%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
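As a quick consistency check on the figures quoted in this report, the short Python sketch below derives the combined plasmid length and an approximate count of functionally annotated genes. All variable names are illustrative. The gene count derived from 61.5% is only an estimate, because the percentage is rounded in the text.

```python
# Values quoted in the text of this report.
total_genome_bp = 8_491_258      # whole genome, chromosome plus plasmids
chromosome_bp = 8_078_757        # chromosome only
protein_coding_genes = 7_069
rna_genes = 60
total_genes_reported = 7_129
fraction_with_function = 0.615   # rounded percentage from the text

# The eight plasmids together account for the remainder of the genome.
plasmid_bp = total_genome_bp - chromosome_bp
plasmid_share_pct = 100 * plasmid_bp / total_genome_bp

# Protein-coding and RNA genes should add up to the reported total.
total_genes_computed = protein_coding_genes + rna_genes

# Estimate only: the underlying percentage is rounded to one decimal.
estimated_annotated = round(protein_coding_genes * fraction_with_function)
estimated_hypothetical = protein_coding_genes - estimated_annotated

print(f"combined plasmid length: {plasmid_bp:,} bp ({plasmid_share_pct:.1f}% of genome)")
print(f"total genes: {total_genes_computed:,}")
print(f"approx. genes with putative function: {estimated_annotated:,}")
print(f"approx. hypothetical proteins: {estimated_hypothetical:,}")
```

The check confirms that the gene counts are internally consistent (7,069 + 60 = 7,129) and that the plasmids together span 412,501 bp.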