Complete genome sequence of Mycobacterium sp. strain (Spyr1) and reclassification to Mycobacterium gilvum Spyr1

Mycobacterium sp. Spyr1 is a newly isolated strain from a creosote-contaminated site in Greece. It was isolated by enrichment with pyrene as the sole carbon and energy source and is capable of degrading a wide range of polycyclic aromatic hydrocarbon (PAH) substrates, including pyrene, fluoranthene, fluorene, anthracene and acenaphthene. Here we describe the genomic features of this organism, together with the complete sequence and annotation. The genome consists of a 5,547,747 bp chromosome and two plasmids of 211,864 bp and 23,681 bp. In total, 5,588 genes were predicted and annotated.


Introduction
Strain Spyr1 (=LMG 24558, =DSM 45189) is a new strain that, based on its morphological and genomic features, belongs to the genus Mycobacterium [1]. It was isolated from Perivleptos, a creosote-polluted site in Epirus, Greece (12 km north of the city of Ioannina), where a wood-preserving industry had been operating for over 30 years. Microbial degradation is one of the major routes by which polycyclic aromatic hydrocarbons (PAHs) are removed from the environment, and strain Spyr1 is of particular interest because it can utilize a wide range of PAHs as sole sources of carbon and energy, including pyrene, fluoranthene, fluorene, anthracene and acenaphthene. Strain Spyr1 metabolizes pyrene to 1-hydroxy-2-naphthoic acid, which is subsequently degraded via o-phthalic acid, a pathway also proposed for other Mycobacterium strains [1]. The strain also exhibits desirable PAH degradation properties. Complete degradation of pyrene at a concentration of 80 mg/L occurred within eight days of incubation in the dark [1]. The extrapolated degradation rate for the growth phase averages about 10 µg ml−1 day−1, a value similar to that reported for other Mycobacterium species [2,3]. Unlike other Mycobacterium spp. [4], strain Spyr1 did not require the addition of vitamins or trace amounts of yeast extract for growth on any PAH. Use of free or entrapped cells of strain Spyr1 resulted in total removal of PAHs from spiked soil samples [1]. Here we present a summary classification and a set of features for strain Spyr1, along with a description of the complete genome sequence and annotation.
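The reported average rate can be cross-checked with simple arithmetic. The sketch below assumes complete, linear removal of 80 mg/L pyrene over the full eight days, which is a simplification (the published value was extrapolated for the growth phase), and uses the identity 1 mg/L = 1 µg/mL. The helper name is hypothetical.

```python
def average_rate_ug_per_ml_per_day(initial_mg_per_l: float, days: float) -> float:
    """Mean removal rate, assuming complete and linear degradation over `days`.

    Hypothetical helper: 1 mg/L is numerically equal to 1 ug/mL, so the
    concentration needs no scaling before dividing by time.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    initial_ug_per_ml = initial_mg_per_l  # 1 mg/L == 1 ug/mL
    return initial_ug_per_ml / days


# Values from the text: 80 mg/L pyrene removed within eight days.
rate = average_rate_ug_per_ml_per_day(80.0, 8.0)
print(f"{rate:.1f} ug/mL/day")  # prints "10.0 ug/mL/day"
```

The result agrees with the roughly 10 µg ml−1 day−1 average quoted above; a growth-phase rate measured over a shorter window could differ from this whole-period mean.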

Classification and Features
The phylogenetic position of strain Spyr1 based on 16S rDNA sequences is shown in Figure 1. The 16S rRNA gene sequences of strain Spyr1 share 99% identity with those of the two M. gilvum strains, and the average nucleotide identity (ANI) [5] between strain Spyr1 and M. gilvum PYR-GCK is 98.5%, well above the approximately 95% ANI commonly used as a species boundary. These data indicate that Spyr1 is a strain of M. gilvum; accordingly, we propose renaming strain Spyr1 to M. gilvum Spyr1. The ANI values between strain Spyr1 and other sequenced mycobacteria are shown in Figure 2. Strain Spyr1 is an aerobic, non-motile rod with a cell size of approximately 1.5-2.0 × 3.5-5.0 μm, and it gives only a weakly positive Gram stain (Figure 3). Colonies were slightly yellowish on Luria agar. Growth occurred at 4-37°C, with optimum growth at 30-37°C, and at pH 6.5-8.5, with optimum growth at pH 7.0-7.5. Strain Spyr1 is sensitive to several antibiotics; the reported minimal inhibitory concentrations were 10 mg L−1 each for chloramphenicol, erythromycin, rifampicin and tetracycline. Catalase and nitrate reductase tests were positive, whereas arginine dihydrolase, gelatinase, lipase, lysine and ornithine decarboxylase, oxidase, urease, citrate assimilation and H2S production tests were negative. No acid was produced from glucose, lactose, sucrose, arabinose, galactose, glycerol, myo-inositol, maltose, mannitol, raffinose, sorbitol, trehalose or xylose (see also Table 1).
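ANI, as used in [5], is typically obtained by cutting one genome into fragments (about 1,020 bp in the original BLAST-based method), aligning each fragment to the other genome, keeping hits that pass identity and alignment-length filters, and averaging their identities. The sketch below shows only the filtering and averaging step plus the species-boundary comparison. The cut-off values, the `FragmentHit` type and the example hits are assumptions for illustration, not data from this study.

```python
from dataclasses import dataclass
from statistics import mean

# Assumed cut-offs, modelled on the BLAST-based ANI recipe; not taken from this study.
MIN_IDENTITY = 30.0          # percent identity a fragment hit must exceed
MIN_ALIGNED_FRACTION = 0.70  # fraction of the fragment that must align
SPECIES_THRESHOLD = 95.0     # commonly cited ANI species boundary (percent)


@dataclass
class FragmentHit:
    """Best hit of one query-genome fragment against the reference genome (hypothetical type)."""
    identity: float          # percent identity of the alignment
    aligned_fraction: float  # aligned length / fragment length (0-1)


def average_nucleotide_identity(hits: list[FragmentHit]) -> float:
    """Mean identity over fragment hits that pass both filters."""
    kept = [
        h.identity
        for h in hits
        if h.identity > MIN_IDENTITY and h.aligned_fraction >= MIN_ALIGNED_FRACTION
    ]
    if not kept:
        raise ValueError("no fragment hits passed the filters")
    return mean(kept)


def same_species(ani: float, threshold: float = SPECIES_THRESHOLD) -> bool:
    """True when the ANI value reaches the assumed species boundary."""
    return ani >= threshold


# Hypothetical fragment hits; the third is discarded because too little of it aligns.
hits = [FragmentHit(99.0, 0.95), FragmentHit(98.0, 0.90), FragmentHit(45.0, 0.20)]
ani = average_nucleotide_identity(hits)
print(ani, same_species(ani))  # prints "98.5 True"
```

Applied to the reported Spyr1 versus PYR-GCK value, `same_species(98.5)` returns `True`, consistent with the proposed reclassification.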

Genome sequencing information
Genome project history
This organism was selected for sequencing on the basis of its biodegradation capabilities, in particular its ability to metabolize phenanthrene as a sole source of carbon and energy. The genome project is deposited in the Genomes OnLine Database (GOLD) [17], and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI).
A summary of the project information is shown in Table 2.

Genome sequencing and assembly
The genome of M. gilvum strain Spyr1 was sequenced using a combination of the Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website [18]. Pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 6,290 overlapping fragments of 1,000 bp and entered into the assembly as pseudoreads. These pseudoreads were assigned quality scores based on Newbler consensus q-scores, with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the Arachne assembler [19]. Possible mis-assemblies were corrected, and gaps between contigs were closed by editing in Consed and by custom primer walks from sub-clones or PCR products. A total of 346 Sanger finishing reads were produced to close gaps, resolve repetitive regions and raise the quality of the finished sequence. The error rate of the completed genome sequence is less than 1 in 100,000. Together, the Sanger and 454 platforms provided 53.56× coverage of the genome. The final assembly contains 61,443 Sanger reads and 1,300,893 pyrosequencing reads.
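Two steps in this workflow can be illustrated in a few lines: shredding long contigs into overlapping 1,000 bp pseudoreads, and expressing the final error rate as a Phred-scaled quality, Q = −10 log10(p). The text gives only the fragment length, so the 200 bp overlap below is an assumption, and `shred` and `phred_from_error_rate` are hypothetical helpers rather than parts of the JGI pipeline.

```python
import math


def shred(contig: str, frag_len: int = 1000, overlap: int = 200) -> list[str]:
    """Split a contig into overlapping fixed-length pseudoreads.

    The 200 bp overlap is an assumption; only the 1,000 bp length is stated.
    """
    if overlap >= frag_len:
        raise ValueError("overlap must be shorter than the fragment length")
    if len(contig) <= frag_len:
        return [contig]
    step = frag_len - overlap
    frags = []
    start = 0
    while True:
        end = start + frag_len
        if end >= len(contig):
            # Anchor the last fragment at the contig end so no bases are lost.
            frags.append(contig[-frag_len:])
            break
        frags.append(contig[start:end])
        start += step
    return frags


def phred_from_error_rate(p: float) -> float:
    """Phred quality Q = -10 * log10(p) for a per-base error probability p."""
    return -10 * math.log10(p)


pseudoreads = shred("ACGT" * 625)  # a 2,500 bp toy contig
print([len(f) for f in pseudoreads])  # prints "[1000, 1000, 1000]"
print(f"{phred_from_error_rate(1e-5):.1f}")  # prints "50.0"
```

An error rate below 1 in 100,000 therefore corresponds to a consensus quality above Q50. Anchoring the last fragment at the contig end keeps every pseudoread the same length, at the cost of a larger overlap for the final pair.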

Genome annotation
Genes were identified using Prodigal [20] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [21]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFAMs, Pfam, PRIAM, KEGG, COG and InterPro databases. Comparative analysis was performed within the Integrated Microbial Genomes (IMG) platform [22].
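Prodigal [20] relies on a trained dynamic-programming model (start-codon and ribosome-binding-site scoring, frame-specific G+C bias) that is not reproduced here. As a much simpler stand-in that shows what a gene caller scans for, the sketch below finds open reading frames on both strands using the bacterial start codons ATG, GTG and TTG and the stop codons TAA, TAG and TGA. The 90 nt minimum length and all function names are hypothetical, and a naive scan like this over-predicts genes, especially in high-G+C genomes such as Spyr1, where AT-rich stop codons are rare.

```python
STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_len: int = 90) -> list[tuple[str, int, int]]:
    """Naive ORF scan (not Prodigal): returns (strand, start, end) tuples.

    Coordinates are 0-based, half-open and refer to the scanned strand, so
    '-' hits are positions on the reverse complement. Each ORF runs from the
    first start codon in a frame to the next in-frame stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i
                elif start is not None and codon in STOPS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


toy = "ATG" + "AAA" * 30 + "TAA"  # a 96 nt toy ORF
print(find_orfs(toy))  # prints "[('+', 0, 96)]"
```

Real annotation pipelines add scoring on top of this kind of scan to choose among overlapping candidates and alternative start codons.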

Genome properties
The genome consists of a 5,547,747 bp circular chromosome with a G+C content of 68% and two plasmids (Figures 4-6, Table 3): the larger is 211,864 bp long with a G+C content of 66%, and the smaller is 23,681 bp long with a G+C content of 64%. Of the 5,434 genes predicted, 5,379 were protein-coding genes and 55 were RNA genes; 30 pseudogenes were also identified. The majority of the protein-coding genes (67.3%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
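The G+C percentages and gene tallies above are simple counts. The sketch below shows one way to compute G+C content, ignoring ambiguous bases such as N (an assumed convention), and checks the gene arithmetic from the reported numbers. Because 67.3% is itself rounded, the derived count of genes with a putative function is approximate. The function name is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols such as N are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


# Gene counts reported for the Spyr1 genome.
protein_coding = 5379
rna_genes = 55
total_genes = protein_coding + rna_genes        # 5,434 genes in total
with_function = round(0.673 * protein_coding)   # roughly 3,620, since 67.3% is rounded

print(gc_content("GGCCATATNN"), total_genes, with_function)  # prints "50.0 5434 3620"
```

Applied to the full chromosome and plasmid sequences from GenBank, `gc_content` should reproduce the 68%, 66% and 64% values to within rounding.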