Complete genome sequence of Catenulispora acidiphila type strain (ID 139908T)

Catenulispora acidiphila Busti et al. 2006 is the type species of the genus Catenulispora, and is of interest because of the rather isolated phylogenetic position it occupies within the scarcely explored suborder Catenulisporineae of the order Actinomycetales. C. acidiphila is known for its acidophilic, aerobic lifestyle, but can also grow scantly under anaerobic conditions. Under regular conditions, C. acidiphila grows in long filaments of relatively short aerial hyphae with marked septation. It is a free-living, nonmotile, Gram-positive bacterium isolated from a forest soil sample taken from a wooded area in Gerenzano, Italy. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the actinobacterial family Catenulisporaceae, and the 10,467,782 bp long single replicon genome with its 9,056 protein-coding and 69 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Catenulispora acidiphila strain ID 139908T (= DSM 44928 = NRRL B-24433 = JCM 14897) is the type strain of C. acidiphila, the type species of the genus Catenulispora, which is the type genus of the family Catenulisporaceae as well as of the suborder Catenulisporineae [1]. The Catenulisporineae is a rather small (six genera in two families) and young taxon [2], for which no completed genome sequence has been reported to date (Figure 1). The four Catenulispora type strains were isolated from paddy field or forest soil, prefer slightly acidic habitats, and form vegetative and aerial mycelia [1,7,8]. Here we present a summary classification and a set of features for C. acidiphila ID 139908T (Table 1), together with the description of the complete genomic sequencing and annotation.

Classification and features
Strains most probably belonging to the species C. acidiphila are also known from diversity studies performed on isolates collected from soils of various geographic origins: the 'Neo' strains from Italian and South American soils (Neo 1, 2, 6, 9, 15) as described by Busti et al. [15], several isolates from Ellinbank, Australia (Ellin 5034, 5116, 5119) as described by Joseph et al. [16], and a Korean isolate D8-90T (AM690741), all of which share at least 99.3% 16S rRNA gene sequence identity with strain ID 139908T. None of the samples sequenced in environmental genomic survey and screening programs surpassed 92% sequence similarity with strain ID 139908T, indicating a lack of close links of these phylotypes to the species C. acidiphila or the genus Catenulispora. Figure 1 shows the phylogenetic neighborhood of C. acidiphila strain ID 139908T in a 16S rRNA based tree. All three 16S rRNA gene copies in the genome of strain ID 139908T are identical, and also match the previously published 16S rRNA sequence generated from DSM 44928 (AJ865857).
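The identity thresholds quoted above (99.3% and 92%) can be made concrete with a minimal sketch of pairwise percent identity over an existing alignment. This is an illustration only: the function names, the toy sequences, and the convention of skipping gapped columns are assumptions, not the method used in the cited studies, and the 1,421-column length is borrowed from the Figure 1 alignment purely as an example.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gapped columns are ignored; other conventions exist.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def max_mismatches(n_columns: int, min_identity_pct: float) -> int:
    """Largest mismatch count that still meets a minimum percent identity."""
    return int(n_columns * (100.0 - min_identity_pct) / 100.0)


# Toy (hypothetical) aligned fragments, not real 16S rRNA sequences.
toy_identity = percent_identity("ACGT-ACGTA", "ACGTTACGTT")

# Illustrative only: mismatches allowed at 99.3% identity over 1,421 columns.
allowed = max_mismatches(1421, 99.3)
```

Under these assumptions, a 99.3% threshold over an alignment of that length leaves room for only a handful of differing positions, which is why such isolates are treated as probable members of the species.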

Figure 1.
Phylogenetic tree of C. acidiphila ID 139908T and all type strains of the genus Catenulispora, inferred from 1,421 aligned characters [3,4] of the 16S rRNA sequence under the maximum likelihood criterion [5]. The tree was rooted with the type strains of the genera within the Streptomycetaceae (Streptomycineae, Actinomycetales). Also included are the type strains from the sister family of Catenulisporaceae, Actinospicaceae. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1,000 bootstrap replicates if larger than 60%. Strains with a genome sequencing project registered in GOLD [6] are printed in blue; published genomes in bold.
C. acidiphila strain ID 139908T was described as a Gram-positive, acidophilic, non-acid-fast, nonmotile, essentially aerobic bacterium forming both vegetative and aerial mycelia [1] (Figure 2 and Table 1). Non-fragmentary vegetative mycelium and aerial hyphae are straight to slightly flexuous and start to septate into chains of cylindrical arthrospores with a rugose surface when sporulation is induced [1]. Strain ID 139908T grows on different agar media, producing brownish pigments and a whitish aerial mass that turns yellow/green as the culture ages [1]. The brownish pigments were not observed on tyrosine-supplemented Suter medium, which indicates that they are not melanin-related [1]. The strain grows well in the presence of up to 3% (w/v) NaCl, with a progressive reduction of pigmentation starting at 1% NaCl. Strain ID 139908T grows better under aerobic conditions but is capable of reduced, non-pigmented growth under microaerophilic and anaerobic conditions [1]. It is resistant to lysozyme (at least 100 μg/ml) [1], a trait not reported for any other strain of the genus Catenulispora. The optimum temperature for growth is 22-28°C, and the pH range for growth is 4.3 to 6.8 with an optimum at pH 6.0, although scant growth was reported up to pH 7.5 [1]. The organism is able to hydrolyze starch and casein, to liquefy gelatin, and to utilize D-galactose, D-fructose, arabinose, xylose and gluconate, but not glycerol, L-arabinose, D-mannitol, methyl-β-D-xylopyranoside, methyl-α-D-glucopyranoside, cellulose or sucrose [1].

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position, and is part of the Genomic Encyclopedia of Bacteria and Archaea project. The genome project is deposited in the Genomes OnLine Database [6] and the complete genome sequence in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
C. acidiphila strain ID 139908T (DSM 44928) was grown in DSMZ medium 65 (GYM Streptomycetes Medium) at 28°C. DNA was isolated from 0.5-1 g of cell paste using the JGI CTAB protocol with lysis modification ALM as described in Wu et al. [17].

Genome sequencing and assembly
The genome was sequenced using the Sanger sequencing platform only. General aspects of library construction and sequencing can be found on the JGI website. The Phred/Phrap/Consed software package was used for sequence assembly and quality assessment. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher [18] or transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI). Gaps between contigs were closed by editing in Consed, custom primer walking or PCR amplification (Roche Applied Science, Indianapolis, IN). A total of 2,556 finishing reactions were produced to close gaps and to raise the quality of the finished sequence. The completed genome sequence of C. acidiphila contains 126,099 Sanger reads, achieving an average of 10× sequence coverage per base with an error rate of less than 1 in 100,000.
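The coverage and error-rate figures above follow from simple arithmetic, sketched below. The function names are illustrative, and the implied mean read length is a back-of-the-envelope value that is not reported in the text: it assumes every one of the 126,099 reads contributes its full length to the assembly. The conversion of the error rate to a Phred-scaled quality uses the standard definition Q = -10 log10(p).

```python
import math

GENOME_LEN = 10_467_782   # bp, from the text
N_READS = 126_099         # Sanger reads, from the text
COVERAGE = 10             # average fold coverage, from the text
MAX_ERROR_RATE = 1e-5     # "less than 1 in 100,000", from the text


def mean_coverage(n_reads: int, mean_read_len: float, genome_len: int) -> float:
    """Average fold coverage: total sequenced bases divided by genome length."""
    return n_reads * mean_read_len / genome_len


def implied_read_length(n_reads: int, coverage: float, genome_len: int) -> float:
    """Mean read length implied by a given coverage (assumes all reads are used)."""
    return coverage * genome_len / n_reads


def phred_quality(error_rate: float) -> float:
    """Phred-scaled quality score for a per-base error probability."""
    return -10.0 * math.log10(error_rate)


read_len = implied_read_length(N_READS, COVERAGE, GENOME_LEN)
quality = phred_quality(MAX_ERROR_RATE)

print(f"implied mean read length: {read_len:.0f} bp")
print(f"Phred-equivalent quality at the stated error bound: Q{quality:.0f}")
```

Under these assumptions the implied mean read length is roughly 830 bp, and an error rate below 1 in 100,000 corresponds to a consensus quality above Q50.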

Genome annotation
Genes were identified using Prodigal [19] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [20]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [21].

Genome properties
The genome is 10,467,782 bp long and comprises one circular chromosome with a 69.8% GC content (Table 3 and Figure 3). Of the 9,122 genes predicted, 9,056 were protein-coding genes and 66 were RNA genes. In addition, 142 pseudogenes were identified. Of the genes discovered, 68.2% were assigned a putative function, while the remaining genes were annotated as hypothetical proteins. The properties and the statistics of the genome are summarized in Table 3. The distribution of genes into COG functional categories is presented in Figure 3 and Table 4.
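As a quick consistency check on the numbers above, the following sketch recomputes the gene totals and a few derived quantities. The variable and function names are illustrative. Two points are assumptions: that the 68.2% figure refers to all 9,122 predicted genes, and that GC content is computed as the fraction of G and C among unambiguous bases. The derived values are not reported in the text.

```python
GENOME_LEN = 10_467_782      # bp, from the text
TOTAL_GENES = 9_122          # from the text
PROTEIN_CODING = 9_056       # from the text
RNA_GENES = 66               # from the text
FUNCTION_ASSIGNED = 0.682    # fraction with a putative function, from the text


def gc_content(seq: str) -> float:
    """Fraction of G+C among A/C/G/T bases; ambiguous symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


# The protein-coding and RNA gene counts should add up to the gene total.
counts_consistent = PROTEIN_CODING + RNA_GENES == TOTAL_GENES

# Derived values (assumption: 68.2% is taken over all predicted genes).
genes_with_function = round(FUNCTION_ASSIGNED * TOTAL_GENES)
hypothetical_genes = TOTAL_GENES - genes_with_function
genes_per_kb = TOTAL_GENES / (GENOME_LEN / 1000)

print(f"counts consistent: {counts_consistent}")
print(f"genes with putative function: ~{genes_with_function}")
print(f"gene density: {genes_per_kb:.2f} genes per kb")
```

Applied to the full chromosome sequence, `gc_content` would be expected to return a value close to the reported 69.8%; the toy function is shown only to make the definition explicit.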