Complete genome sequence of Odoribacter splanchnicus type strain (1651/6T)

Odoribacter splanchnicus (Werner et al. 1975) Hardham et al. 2008 is the type species of the genus Odoribacter, which belongs to the family Porphyromonadaceae in the order ‘Bacteroidales’. The species is of interest because members of the genus Odoribacter form an isolated cluster within the Porphyromonadaceae. This is the first completed genome sequence of a member of the genus Odoribacter and the fourth sequence from the family Porphyromonadaceae. The 4,392,288 bp long genome with its 3,672 protein-coding and 74 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain 1651/6T (= DSM 20712 = ATCC 29572 = JCM 15291) is the type strain of Odoribacter splanchnicus [1,2]. Currently, there are three species placed in the genus Odoribacter [1]. The generic name derives from the Latin noun odor meaning smell and the Neo-Latin word bacter meaning a rod, referring to a rod of (bad) smell [2]. The species epithet is derived from the Greek plural noun splanchna meaning innards, referring to the internal organs as the site of isolation [2]. O. splanchnicus strain 1651/6T was isolated as Bacteroides splanchnicus from a human abdominal abscess by Werner and Reichertz in 1971 [3] and described in 1975 [4]. The species was first validly published as B. splanchnicus because it shares a number of characteristics with the members of the genus Bacteroides. However, the organism differs from other Bacteroides species in a number of important biochemical characteristics [5] and shows less than 20% relatedness in the homology of 16S rRNA genes compared to the B. fragilis group [6]. In 1994, further studies of the phylogenetic structure of the bacteroides subgroup made clear that B. splanchnicus did not belong to the genera Bacteroides, Prevotella or Porphyromonas, but fell just outside these three major clusters [7]. Finally, in 2008, the new genus Odoribacter was described and B. splanchnicus was reclassified as its type species [2]. Additional isolates of O. splanchnicus have been obtained from stool specimens and surgically removed appendices [2]; in one case of pelviperitonitis the organism was isolated from a blood sample and peritoneal pus [8]. In general, O. splanchnicus can be described as an inhabitant of the human intestine that has the potential to become an opportunistic pathogen. Here we present a summary classification and a set of features for O. splanchnicus 1651/6T, together with the description of the complete genomic sequencing and annotation.

Classification and features
A representative genomic 16S rRNA sequence of strain 1651/6T was compared using NCBI BLAST under default settings (e.g., considering only the high-scoring segment pairs (HSPs) from the best 250 hits) with the most recent release of the Greengenes database [9], and the relative frequencies of taxa and keywords (reduced to their stem [10]) were determined, weighted by BLAST scores. The most frequently occurring genera were Bacteroides (43.5%), Odoribacter (37.9%), Alistipes (15.2%) and Brumimicrobium (3.4%) (21 hits in total). Regarding the two hits to sequences from members of the species, the average identity within HSPs was 99.7%, whereas the average coverage by HSPs was 97.9%. Regarding the two hits to sequences from other members of the genus, the average identity within HSPs was 93.4%, whereas the average coverage by HSPs was 42.5%. The highest-scoring environmental sequence was EF401000 ('human fecal clone SJTU D 04 48'), which showed an identity of 99.8% and an HSP coverage of 98.2%. The most frequently occurring keywords within the labels of environmental samples which yielded hits were 'human' (13.4%), 'biopsi' (7.4%), 'mucos' (7.1%), 'fecal' (6.1%) and 'colon' (5.3%) (229 hits in total). The most frequently occurring keyword within the labels of environmental samples which yielded hits of a higher score than the highest scoring species was 'fecal/human' (50.0%) (27 hits in total).

Figure 1 shows the phylogenetic neighborhood of O. splanchnicus in a 16S rRNA based tree. The sequences of the four 16S rRNA gene copies in the genome differ from each other by up to eight nucleotides, and differ by up to nine nucleotides from the previously published 16S rRNA sequence (L16496), which contains nine ambiguous base calls.

The cells of O. splanchnicus generally have the shape of short rods (0.7 × 1.0-5.0 µm) which occur singly or in lightly associated groups (Figure 2). They can also be pleomorphic. O. splanchnicus is a Gram-negative, non-pigmented and non-sporeforming bacterium (Table 1). The organism is described as non-motile and only ten genes associated with motility have been found in the genome (see below). O. splanchnicus grows well at 37°C, is strictly anaerobic and chemoorganotrophic, and is able to ferment glucose, fructose, galactose, arabinose, lactose and mannose but does not utilize sucrose, rhamnose, trehalose or salicin [4,5]. The organism does not reduce nitrate but it produces indole from tryptophan and hydrolyzes esculin [28]. O. splanchnicus does not require hemin for growth but is highly stimulated by its presence. It does not show hemolysis on blood agar. Growth is enhanced by the addition of 20% bile. Major fermentation products are acetic acid, propionic acid and succinic acid; butyric acid, isovaleric acid and isobutyric acid are produced in small amounts [4,29]. When amino acids are used as carbon sources, only lysine enables butyrate production [4]. O. splanchnicus possesses highly active pentose phosphate pathway enzymes such as glucose-6-phosphate dehydrogenase and 6-phosphogluconate dehydrogenase as well as active malate dehydrogenase and glutamate dehydrogenase [30]. The organism produces large amounts of hydrogen and H2S. Strain 1651/6T is positive for phosphatase, α- and β-galactosidase, α-fucosidase, N-acetylglucosaminidase and glutamic acid decarboxylase activity and negative for urease and catalase [2]. The organism produces arginine arylamidase, leucyl glycine arylamidase, leucine arylamidase, alanine arylamidase (own, unpublished data) and glycylprolyl arylamidase [31]. O. splanchnicus is reported to grow in the presence of aminoglycosides and polymyxins (minimum inhibitory concentration (MIC) value greater than 60 µg/ml); chloramphenicol, penicillins and cephalosporins show bacteriostatic activity (5-40 µg/ml). The organism is susceptible to tetracyclines, lincomycin, clindamycin, rifampicin and erythromycin (MIC values less than 0.5 µg/ml) [4,28].
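
The BLAST-score weighting of taxon frequencies described at the start of this section can be illustrated with a minimal sketch. It assumes that each hit contributes its BLAST score to the total of its taxon and that frequencies are reported as a share of the summed scores; the exact procedure used in the study is not detailed here. The function name and the example hits are hypothetical and are not the data behind the percentages reported above.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return {label: percent}, weighting every hit by its BLAST score.

    `hits` is an iterable of (label, score) pairs, e.g. (genus, bit score).
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * s / grand_total for label, s in totals.items()}


# Hypothetical (genus, score) pairs for illustration only.
example_hits = [
    ("Bacteroides", 900.0),
    ("Bacteroides", 850.0),
    ("Odoribacter", 1500.0),
    ("Alistipes", 750.0),
]

freqs = weighted_frequencies(example_hits)
for genus, pct in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{genus}: {pct:.1f}%")
```

With this weighting a taxon represented by a few high-scoring hits can outrank a taxon with more, but weaker, hits, which is why the genus percentages need not follow the raw hit counts.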

Chemotaxonomy
Little chemotaxonomic information is available for strain 1651/6T. It possesses meso-diaminopimelic acid in its peptidoglycan [30], sphingophospholipids as polar lipids [31] and MK-9 as its sole menaquinone [29]. The major fatty acids found are iso-C15:0, C14:0, anteiso-C15:0 and C16:0 3-OH [30].

Figure 1. Phylogenetic tree showing the neighborhood of O. splanchnicus, inferred from an alignment [11,12] of the 16S rRNA gene sequence under the maximum likelihood criterion [13]. Rooting was done initially using the midpoint method [14] and then checked for its agreement with the current classification (Table 1). The branches are scaled in terms of the expected number of substitutions per site. Numbers to the right of bifurcations are support values from 250 bootstrap replicates [15] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [16] are labeled by one asterisk, published genomes by two asterisks [17,18,19].

Table 1. Classification of O. splanchnicus 1651/6T, with evidence codes.
Phylum 'Bacteroidetes', TAS [22]
Class 'Bacteroidia', TAS [23,24]
Order 'Bacteroidales', TAS [25]
Family 'Porphyromonadaceae', TAS [25]
Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement; NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [27]. If the evidence code is IDA, then the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position [33], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [34]. The genome project is deposited in the Genomes OnLine Database [16] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

DNA was isolated from 0.5-1 g of cell paste using the Jetflex Genomic DNA Purification kit (GENOMED 600100) following the standard protocol as recommended by the manufacturer, but adding 20 µL proteinase K for 45 min of lysis at 58°C. DNA is available through the DNA Bank Network [36].

Genome sequencing and assembly
The genome was sequenced using a combination of Illumina and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website [37]. Pyrosequencing reads were assembled using the Newbler assembler version 2.3-PreRelease-10-21-2009 (Roche). The initial Newbler assembly, consisting of 57 contigs in eight scaffolds, was converted into a phrap [38] assembly by making fake reads from the consensus, to collect the read pairs in the 454 paired end library. Illumina GAii sequencing data (2,241.8 Mb) was assembled with Velvet, version 0.7.63 [39], and the consensus sequences were shredded into 1.5 kb overlapped fake reads and assembled together with the 454 data. The 454 draft assembly was based on 138 Mb of 454 draft data and all of the 454 paired end data. Newbler parameters are -consed -a 50 -l 350 -g -m -ml 20. The Phred/Phrap/Consed software package [40] was used for sequence assembly and quality assessment in the subsequent finishing process. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with gapResolution [37], Dupfinisher, or sequencing cloned bridging PCR fragments with subcloning or transposon bombing (Epicentre Biotechnologies, Madison, WI) [40]. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks (J.-F. Chang, unpublished). A total of 65 additional reactions were necessary to close gaps and to raise the quality of the finished sequence. Illumina reads were also used to correct potential base errors and increase consensus quality using Polisher, a software tool developed at JGI [41]. The error rate of the completed genome sequence is less than 1 in 100,000. Together, the combination of the Illumina and 454 sequencing platforms provided 552.5× coverage of the genome. The final assembly contained 389,415 pyrosequence and 33,128,505 Illumina reads.
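
Two steps of this workflow can be sketched briefly: shredding a consensus into overlapping fake reads, and the fold-coverage arithmetic. This is an illustration under stated assumptions, not the JGI pipeline. The 1.5 kb read length comes from the text, but the step between reads (and therefore the overlap) is not reported, so the 1,000 bp step below is an assumption. The function names are hypothetical, and "Mb" is assumed to mean 10^6 bases.

```python
def shred(consensus, read_len=1500, step=1000):
    """Cut a consensus sequence into overlapping fixed-length pseudo-reads.

    read_len follows the 1.5 kb given in the text; step is an assumed value,
    because the overlap between fake reads is not reported.
    """
    if step <= 0 or step > read_len:
        raise ValueError("step must be between 1 and read_len")
    if not consensus:
        return []
    if len(consensus) <= read_len:
        return [consensus]
    starts = list(range(0, len(consensus) - read_len + 1, step))
    # Add a final window flush with the end so that no tail bases are lost.
    if starts[-1] + read_len < len(consensus):
        starts.append(len(consensus) - read_len)
    return [consensus[s:s + read_len] for s in starts]


def fold_coverage(total_bases, genome_length):
    """Fold coverage = sequenced bases divided by genome length."""
    return total_bases / genome_length


GENOME_LENGTH = 4_392_288      # bp, from the text
ILLUMINA_BASES = 2_241.8e6     # 2,241.8 Mb of Illumina data, from the text
PYRO_DRAFT_BASES = 138e6       # 138 Mb of 454 draft data, from the text

toy_contig = "ACGT" * 1000     # 4,000 bp toy consensus, not real data
fake_reads = shred(toy_contig)
print(len(fake_reads), "fake reads of", len(fake_reads[0]), "bp")

illumina_cov = fold_coverage(ILLUMINA_BASES, GENOME_LENGTH)
pyro_cov = fold_coverage(PYRO_DRAFT_BASES, GENOME_LENGTH)
print(f"Illumina: {illumina_cov:.1f}x, 454 draft: {pyro_cov:.1f}x")
```

Under these assumptions the Illumina and 454 draft data account for roughly 542× of the reported 552.5× coverage; the remainder presumably stems from the 454 paired end data, whose size is not stated.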

Genome annotation
Genes were identified using Prodigal [42] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [43]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [44].

Genome properties
The genome consists of a 4,392,288 bp long chromosome with a G+C content of 43.4% (Table 3 and Figure 3). Of the 3,746 genes predicted, 3,672 were protein-coding genes and 74 were RNA genes; 175 pseudogenes were also identified. The majority of the protein-coding genes (61.2%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.