Genome sequence of the squalene-degrading bacterium Corynebacterium terpenotabidum type strain Y-11T (= DSM 44721T)

Corynebacterium terpenotabidum Takeuchi et al. 1999 is a member of the genus Corynebacterium, which contains Gram-positive, non-spore-forming bacteria with a high G+C content. C. terpenotabidum was isolated from soil based on its ability to degrade squalene and belongs to the aerobic, non-hemolytic corynebacteria. It tolerates salt concentrations of up to 8% and is related to Corynebacterium variabile, which is involved in cheese ripening. As this is a type strain of Corynebacterium, this project describing the 2.75 Mbp chromosome with its 2,369 protein-coding and 72 RNA genes will aid the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain Y-11T (= DSM 44721T) is the type strain of the species Corynebacterium terpenotabidum [1]. It was originally isolated from soil, although the exact source has not been published [2,3]. The genus Corynebacterium comprises Gram-positive bacteria with a high G+C content. It currently contains over 80 members [4] isolated from diverse sources such as human clinical samples [5] and animals [6], but also from soil [7] and ripening cheese [8].
Within this diverse genus, C. terpenotabidum has been proposed to form a subclade together with C. variabile DSM 20132T and C. nuruki S6-4T, with which it shares 97.4% and 95.9% 16S rRNA gene sequence similarity, respectively. Information on the strain is scarce. It was isolated for its ability to metabolize the linear triterpene squalene and was initially classified as an Arthrobacter species [2,3], but neither the origin nor the exact isolation procedure was reported. C. terpenotabidum can cleave squalene, yielding geranylacetone [2], and also accepts some squalene derivatives [3]. Here we present a summary classification and a set of features for C. terpenotabidum DSM 44721T, together with a description of the genome sequencing and annotation.

Classification and features
A representative genomic 16S rRNA sequence of C. terpenotabidum DSM 44721T was compared to the Ribosomal Database Project database [9]. C. terpenotabidum shows the highest similarity to C. variabile (97.4%). Figure 1 shows the phylogenetic neighborhood of C. terpenotabidum in a 16S rRNA-based tree. Within the genus Corynebacterium, C. terpenotabidum forms a distinct subclade together with C. variabile and C. nuruki.
C. terpenotabidum Y-11T cells are Gram-positive, non-acid-fast rods (1.0-1.5 μm long × 0.5-0.8 μm wide) that grow strictly aerobically in rough, grayish-white colonies without diffusible pigments or aerial mycelia [1] (Table 1). Cells grow with a wax-like quality on solid medium and tend to clot in liquid culture. Scanning electron micrographs of liquid-grown cultures revealed slight morphological differences between free-floating cells and clotted cells (Figure 2).

Figure 1. Species with at least one publicly available genome sequence (not necessarily the type strain) are highlighted in bold face. The tree is based on sequences aligned by the RDP aligner and utilizes the Jukes-Cantor corrected distance model to construct a distance matrix based on alignment model positions without alignment inserts, using a minimum comparable position of 200. The tree is built with RDP Tree Builder, which utilizes the Weighbor method [10] with an alphabet size of 4 and length size of 1,000. The building of the tree also involves a bootstrapping process repeated 100 times to generate a majority consensus tree [11]. Rhodococcus equi (X80614) was used as an outgroup.
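The Jukes-Cantor correction named in the Figure 1 caption can be illustrated with a short Python sketch. This is not the RDP implementation: the function names, the gap characters, and the handling of the minimum of 200 comparable positions are assumptions made for illustration, and the example treats the reported 16S rRNA similarities as uncorrected identities, which the source does not state.

```python
import math


def p_distance(seq_a, seq_b, gap_chars="-.", min_positions=200):
    """Uncorrected fraction of mismatches over aligned, gap-free positions.

    min_positions mirrors the "minimum comparable position of 200" in the
    caption; how RDP applies that threshold internally is an assumption here.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # skip alignment gaps in either sequence
        compared += 1
        if a != b:
            mismatches += 1
    if compared < min_positions:
        raise ValueError("too few comparable positions")
    return mismatches / compared


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for a 4-letter alphabet.

    d = -(3/4) * ln(1 - (4/3) * p), defined only for 0 <= p < 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Assumption: similarity values from the text read as 1 - p.
d_variabile = jukes_cantor_distance(1.0 - 0.974)
d_nuruki = jukes_cantor_distance(1.0 - 0.959)
```

At such small divergences the corrected distance is only slightly larger than the uncorrected mismatch fraction, so the correction matters mainly for more distant taxa such as the outgroup.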

Genome sequencing and annotation
Genome project history
C. terpenotabidum Y-11T was selected for sequencing as part of a project to define the core genome and pan genome of the non-pathogenic corynebacteria.
Although it is not part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) project [23], sequencing of the type strain will nonetheless aid the GEBA effort. The genome project is deposited in the Genomes OnLine Database [24] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the Center of Biotechnology (CeBiTec). A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
C. terpenotabidum strain Y-11T (DSM 44721) was grown aerobically in LB broth (Carl Roth GmbH, Karlsruhe, Germany) at 30 °C. DNA was isolated from ~10^8 cells using the protocol described by Tauch et al. 1995 [25].

Genome sequencing and assembly
The genome was sequenced using a 454 sequencing platform. A standard 3 kb paired-end sequencing library was prepared according to the manufacturer's protocol (Roche) and sequenced on the GS-FLX platform with Titanium chemistry, yielding 384,252 total reads and 29.52× coverage of the genome. Pyrosequencing reads were assembled using the Newbler assembler v2.3 (Roche). The initial Newbler assembly consisted of 22 contigs in six scaffolds. Analysis of the six scaffolds revealed five that made up the chromosome, while the remaining one contained the five copies of the rrn operon that caused the scaffold breaks. The scaffolds were ordered based on alignments to the complete genome of C. variabile [26], and the order was subsequently verified by restriction digestion, Southern blotting and hybridization with a 16S rDNA-specific probe. The Phred/Phrap/Consed software package [27][28][29][30] was used for sequence assembly and quality assessment in the subsequent finishing process. After the shotgun stage, gaps between contigs were closed by editing in Consed (for repetitive elements) and by PCR with subsequent Sanger sequencing (IIT Biotech GmbH, Bielefeld, Germany). A total of 12 additional reactions were necessary to close gaps not caused by repetitive elements.
To raise the quality of the assembled sequence, Illumina reads were used to correct potential base errors and increase consensus quality. A WGS library was prepared using the Illumina-Compatible Nextera DNA Sample Prep Kit (Epicentre, WI, USA) according to the manufacturer's protocol. The library was sequenced in a 2 × 120 bp paired-read run on the MiSeq platform, yielding 2,307,926 total reads. Together, the Illumina and 454 data provided 91.2× coverage of the genome.
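The coverage figures above follow the usual relation coverage = total sequenced bases / genome size. The Python sketch below uses the read counts and coverage values reported in the text to back-calculate the mean usable read length each figure implies. These read lengths are not reported by the authors; the sketch assumes that the combined 91.2× coverage is the simple sum of the 454 and Illumina contributions, and all names in it are illustrative.

```python
GENOME_SIZE_BP = 2_751_233  # chromosome length reported in "Genome properties"


def coverage(total_bases, genome_size=GENOME_SIZE_BP):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_mean_read_length(depth, n_reads, genome_size=GENOME_SIZE_BP):
    """Mean read length needed for n_reads to reach the given depth."""
    return depth * genome_size / n_reads


# Values taken from the text.
READS_454 = 384_252
DEPTH_454 = 29.52
READS_ILLUMINA = 2_307_926
DEPTH_COMBINED = 91.2

# Assumption: combined coverage = 454 coverage + Illumina coverage.
depth_illumina = DEPTH_COMBINED - DEPTH_454

mean_len_454 = implied_mean_read_length(DEPTH_454, READS_454)
mean_len_illumina = implied_mean_read_length(depth_illumina, READS_ILLUMINA)

# Upper bound if every Illumina read contributed its full nominal 120 bp.
max_depth_illumina = coverage(READS_ILLUMINA * 120)
```

Under these assumptions the Illumina contribution is lower than the nominal 120 bp read length would allow, which would be consistent with some read trimming or filtering before the coverage was computed; the source does not say whether this was the case.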

Genome properties
The genome consists of one circular chromosome of 2,751,233 bp (67.02% G+C content) with no additional extrachromosomal elements present. A total of 2,441 genes were predicted, 2,369 of which are protein-coding genes. Of these, 1,306 (55.13%) were assigned a putative function, with the remainder annotated as hypothetical proteins. In addition, 910 protein-coding genes belong to 281 paralogous families in this genome, corresponding to a gene content redundancy of 38.41% (Figure 3). The properties and the statistics of the genome are summarized in Tables 3 and 4.

a) The total is based on either the size of the genome in base pairs or the total number of genes in the annotated genome.

Standards in Genomic Sciences