Complete genome sequence of Brachyspira murdochii type strain (56-150T)

Brachyspira murdochii Stanton et al. 1992 is a non-pathogenic, host-associated spirochete of the family Brachyspiraceae. Initially isolated from the intestinal content of a healthy swine, the ‘group B spirochaetes’ were first described as Serpulina murdochii. Members of the family Brachyspiraceae are of great phylogenetic interest because of the extremely isolated location of this family within the phylum ‘Spirochaetes’. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a type strain of a member of the family Brachyspiraceae and only the second genome sequence from a member of the genus Brachyspira. The 3,241,804 bp long genome with its 2,853 protein-coding and 40 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain 56-150T (= DSM 12563 = ATCC 51284 = CIP 105832) is the type strain of the species Brachyspira murdochii. This strain was first described as Serpulina murdochii [1,2], and later transferred to the genus Brachyspira [3]. The genus Brachyspira currently consists of seven species, with Brachyspira aalborgi as the type species [4,5]. The genus Brachyspira is the only genus in the not yet formally described family 'Brachyspiraceae' [6,7]. The generic name derives from 'brachys', Greek for short, and 'spira', Latin for a coil or helix, to mean 'a short helix' [5]. The species name for B. murdochii derives from the city of Murdoch, in recognition of work conducted at Murdoch University in Western Australia, where the type strain was identified [1]. Some species of the genus Brachyspira cause swine dysentery and porcine intestinal spirochetosis. Swine dysentery is a severe, mucohemorrhagic disease that sometimes leads to death of the animals [1]. B. murdochii is generally not considered to be a pathogen, although it has occasionally been seen in association with colitis in pigs [3,8], and was also associated with clinical problems on certain farms [9-11].
In 1992, a user-friendly and robust novel PCR-based restriction fragment length polymorphism analysis of the Brachyspira nox-gene was developed, which identifies members of B. murdochii with high specificity using only two restriction endonucleases [12]. More recently, a multi-locus sequence typing scheme was developed that facilitates the identification of Brachyspira species and reveals the intraspecies diversity of B. murdochii [13] (see also http://pubmlst.org/brachyspira/).
Only one genome of a member of the family 'Brachyspiraceae' has been sequenced to date: that of B. hyodysenteriae strain WA1 [14], an intestinal pathogen of pigs. Based on the 16S rRNA sequence, this strain differs by 0.8% from strain 56-150T. Here we present a summary classification and a set of features for B. murdochii 56-150T, together with the description of the complete genomic sequencing and annotation.

Classification and features
Brachyspira species colonize the lower intestinal tract (cecum and colon) of animals and humans [6]. The type strain of B. murdochii, 56-150T, was isolated from a healthy swine in Canada [1,15]. Other isolates have been obtained from wild rats in Ohio, USA, from laboratory rats in Murdoch, Western Australia [16], and from the joint fluid of a lame pig [17]. Further isolates have been obtained from the feces or gastrointestinal tract of pigs in Canada, Tasmania, Queensland, and Western Australia [2,15]. The type strains of the other species of the genus Brachyspira share 95.9-99.4% 16S rRNA sequence identity with strain 56-150T. GenBank contains 16S rRNA sequences for about 250 Brachyspira isolates, all of which share at least 96% sequence identity with strain 56-150T [18]. The most closely related type strain of a species outside the genus Brachyspira, but within the order Spirochaetales, is Turneriella parva [19], which exhibits only 75% 16S rRNA sequence similarity [18]. 16S rRNA sequences from environmental samples and metagenomic surveys do not exceed 78-79% sequence similarity to strain 56-150T, with the sole exception of one clone from a metagenomic analysis of human diarrhea [20], indicating that members of the species, genus and even family are poorly represented in the habitats outside of various animal intestines screened thus far (status March 2010). Figure 1 shows the phylogenetic neighborhood of B. murdochii 56-150T in a 16S rRNA based tree. The sequence of the single 16S rRNA gene in the genome sequence is identical to the previously published 16S rRNA gene sequence generated from DSM 12563 (AY312492).
The tree was inferred from 1,396 aligned characters [21,22] of the 16S rRNA gene sequence under the maximum likelihood criterion [23] and rooted in accordance with the current taxonomy. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1,000 bootstrap replicates [24] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [25] are shown in blue, published genomes in bold.
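The identity and similarity percentages quoted above are pairwise comparisons of aligned 16S rRNA sequences. The following is a minimal sketch of such a calculation, not the method used for the values reported here: the function name `percent_identity`, the convention of skipping gap columns, and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns (one common convention, assumed here)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments for illustration only, not real 16S rRNA data.
toy_a = "ACGTACGTAC-GT"
toy_b = "ACGTTCGTACAGT"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

Other conventions (for example, counting gap columns as mismatches) give slightly different percentages, so values from different tools are not always directly comparable.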

Chemotaxonomy
At present, there are no reports on the chemotaxonomy of B. murdochii. However, some data are available for B. innocens (formerly classified as Treponema innocens [6]), the species that is currently most closely related to B. murdochii [13]. B. innocens cellular phospholipids and glycolipids were found to contain acyl (fatty acids with ester linkage) and alkenyl (unsaturated alcohol with ether linkage) side chains [6,38]. The glycolipid of B. innocens contains monoglycosyldiglyceride (MGDG) and, in most strains, acylMGDG is also found, with galactose as the predominant sugar moiety [38].
Standards in Genomic Sciences
Altitude: not reported (TAS)
Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [35]. If the evidence code is IDA, then the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position [39], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [40]. The genome project is deposited in the Genome OnLine Database [25] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website (http://www.jgi.doe.gov/). In total, 861,386 pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 3,554 overlapping fragments of 1,000 bp and entered into the assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores, with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the parallel phrap assembler (High Performance Software, LLC). Possible misassemblies were corrected with Dupfinisher or transposon bombing of bridging clones [42]. A total of 300 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. The error rate of the completed genome sequence is less than 1 in 100,000. Together, the Sanger and 454 sequencing platforms provided 68.6× coverage of the genome. The final assembly contains 79,829 Sanger reads and 861,386 pyrosequencing reads.
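The step of breaking large contigs into overlapping 1,000 bp pseudo-reads can be illustrated with a short sketch. This is not the JGI pipeline: the function name `shred_contig` is hypothetical, and the 100 bp overlap is an assumed value, since the text gives the fragment length but not the overlap.

```python
def shred_contig(contig: str, frag_len: int = 1000, overlap: int = 100) -> list:
    """Split a contig into overlapping fragments to be used as pseudo-reads.

    frag_len follows the 1,000 bp stated in the text; the overlap of 100 bp
    is an assumption made for illustration only.
    """
    if not 0 <= overlap < frag_len:
        raise ValueError("overlap must be non-negative and smaller than frag_len")
    step = frag_len - overlap
    fragments = []
    for start in range(0, len(contig), step):
        fragments.append(contig[start:start + frag_len])
        if start + frag_len >= len(contig):
            break  # this fragment already reaches the end of the contig
    return fragments


# Synthetic 2,500 bp contig for illustration only, not real sequence data.
toy_contig = "ACGT" * 625
pseudo_reads = shred_contig(toy_contig)
print([len(read) for read in pseudo_reads])
```

In this sketch the final fragment may be shorter than `frag_len`; a real pipeline might instead anchor the last fragment to the contig end so that all pseudo-reads have the same length.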

Genome annotation
Genes were identified using Prodigal [43] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [44]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [45].
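Translating predicted CDSs into protein sequences before database searches can be sketched as follows. This is an illustrative example rather than the annotation pipeline's code: it assumes the codon assignments of the standard genetic code (which NCBI translation table 11 for bacteria shares), and the function name `translate_cds` and the toy input are hypothetical.

```python
# Codon table built from the NCBI layout: bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame CDS, stopping at the first stop codon.

    Codons containing ambiguous bases are translated as 'X'. Alternative
    start codons (e.g. GTG, TTG) are NOT converted to methionine here,
    which a full implementation for bacterial genes would need to handle.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    protein = []
    for pos in range(0, len(cds), 3):
        amino_acid = CODON_TABLE.get(cds[pos:pos + 3], "X")
        if amino_acid == "*":
            break  # stop codon ends the protein
        protein.append(amino_acid)
    return "".join(protein)


# Toy CDS for illustration only.
toy_protein = translate_cds("ATGAAATAA")
print(toy_protein)
```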

Genome properties
The genome is 3,241,804 bp long and comprises one main circular chromosome with an overall GC content of 27.8% (Table 3 and Figure 3). Of the 2,893 genes predicted, 2,853 were protein-coding genes and 40 were RNA genes. A total of 44 pseudogenes were identified. The majority of the protein-coding genes (66.2%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
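Summary statistics of this kind are straightforward to recompute from a sequence and its gene list. The sketch below shows one way to calculate GC content and to check that the reported gene counts add up; the function name `gc_content` and the toy sequence are assumptions for illustration, while the gene counts are the ones stated in this section.

```python
def gc_content(seq: str) -> float:
    """GC content in percent, counting only unambiguous A, C, G and T bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Gene counts as reported in the text of this section.
reported = {"total_genes": 2893, "protein_coding": 2853, "rna_genes": 40}
counts_consistent = (
    reported["protein_coding"] + reported["rna_genes"] == reported["total_genes"]
)

# Toy sequence for illustration only: 2 of 10 bases are G or C.
toy_gc = gc_content("ATATATGCAT")
print(f"GC content: {toy_gc:.1f}%, counts consistent: {counts_consistent}")
```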