Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICP T)

Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidimicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic of A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence from the order Acidimicrobiales, and the 2,158,157 bp long single replicon genome with its 2,038 protein-coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Acidimicrobium ferrooxidans strain ICP T (DSM 10331 = NBRC 103882 = JCM 15462) is the type strain of the species, and was first isolated by Clark and Norris from hot springs in the Krísuvík geothermal area, Iceland [1,2]. For over fifteen years A. ferrooxidans ICP T remained extremely isolated phylogenetically as the sole type strain in the actinobacterial subclass Acidimicrobidae [3] (Figure 1). Only while this manuscript was being written did Kurahashi et al. [4] and Johnson et al. [5] describe three novel type strains representing one novel family, Iamiaceae [4], and two novel genera within the Acidimicrobiales [5]: Iamia majanohamensis (isolated from sea cucumber [4]), Ferromicrobium acidiphilum (from a mine site in North Wales, UK [5]) and Ferrithrix thermotolerans (from Yellowstone National Park, Wyoming, USA [5]). With the exception of I. majanohamensis, all these strains live in acidic environments. Here we present a summary classification and a set of features for A. ferrooxidans ICP T (Table 1), together with a description of the complete genome sequencing and annotation.
Notes to Table 1: Altitude not reported. Evidence codes: IDA, Inferred from Direct Assay (first time in publication); TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [15]. If the evidence code is IDA, the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.

Classification and features
Members of the species A. ferrooxidans have been isolated or identified molecularly from warm, acidic, iron-, sulfur- or mineral-sulfide-rich environments. Strain TH3, isolated from a copper leaching dump [1], shares 100% 16S rRNA gene sequence identity with strain ICP T, whose genome sequence is reported here. The moderately thermophilic bacterium N1-45-02 (EF199986) from a spent Canadian copper sulfide heap and strain Y00168 from a geothermal site in Yellowstone National Park [6] are the only other members of the species available in pure culture. Acidimicrobium species have frequently been reported in mixed cultures used for bioleaching [7]. Uncultured clone sequences with significant sequence similarity (>98%) were observed by Inskeep and colleagues from several hot springs in Yellowstone National Park (e.g. AY882832, DQ179032 and others), and from hydrothermally modified volcanic soil at Mount Hood (EU419128). Screening of environmental genomic samples and surveys reported at the NCBI BLAST server indicated no closely related phylotypes (the highest observed sequence identity was 91%) that can be linked to the species or genus. Several DGGE analyses indicated the presence of members of the genus Acidimicrobium in metal-rich mine waters and geothermal fields around the world. Figure 1 shows the phylogenetic neighborhood of A. ferrooxidans strain ICP T in a 16S rRNA based tree. The sequences of the two identical copies of the 16S rRNA gene in the genome differ in 16 positions (1.1%) from the previously published 16S rRNA sequence generated from A. ferrooxidans DSM 10331 (U75647). The higher sequence coverage and overall improved level of sequence quality in whole-genome sequences, as compared to ordinary gene sequences, imply that the significant difference between the genome data and the previously reported 16S rRNA gene sequence might be due to sequencing errors in the previously reported sequence data.
Cells of strain ICP T are rather small (0.4 µm × 1-1.5 µm) Gram-positive rods [1]. Optimal growth occurs at 45-50°C, pH 2, with a maximal doubling time of six hours at 48°C [1]. Cells are motile during heterotrophic growth on yeast extract. ICP T forms small colonies when grown autotrophically on ferrous iron-containing solid medium under air [1]. The closely related strain TH3 differs from the type strain ICP T only by its tendency to grow in filaments, which has not been observed for strain ICP T [1]. Strain ICP T can be distinguished from members of the genus Sulfobacillus by its lower requirement for CO2 during autotrophic growth [1]. Iron oxidation by ICP T cells was influenced neither by glucose supplementation nor by an increased CO2 concentration [1]. Thin section electron micrographs of A. ferrooxidans strains indicate intracellular vesicles when cells were grown on ferrous iron and yeast extract [1] (Figure 2).
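The figure of 16 differing positions (1.1%) between the genomic 16S rRNA gene copies and the earlier sequence U75647 can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length with '-' as the gap character; the function names and the choice to count a gap opposite a base as a difference are illustrative assumptions, not the procedure used by the authors.

```python
def count_differences(aligned_a: str, aligned_b: str, gap: str = "-") -> tuple[int, int]:
    """Return (differing, compared) column counts for two pre-aligned sequences.

    Assumption: columns where both sequences carry a gap are skipped, and a gap
    opposite a base counts as one difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    differing = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information
        compared += 1
        if a != b:
            differing += 1
    return differing, compared


def percent_difference(differing: int, compared: int) -> float:
    """Percentage of compared columns that differ."""
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * differing / compared


if __name__ == "__main__":
    # Toy alignment only; the real 16S rRNA sequences are not reproduced here.
    diff, total = count_differences("ACGT-A", "ACCTTA")
    print(diff, total, round(percent_difference(diff, total), 1))
```

By the same arithmetic, 16 differences at a rounded 1.1% correspond to a comparison over roughly 1,450 positions, which is consistent with a near full-length 16S rRNA gene.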

Figure 1.
Phylogenetic tree highlighting the position of A. ferrooxidans strain ICP T relative to all other type strains within the Acidimicrobiales and the type strains of all other orders within the Actinobacteria. The tree was inferred from 1,306 aligned characters [8,9] of the 16S rRNA gene under the maximum likelihood criterion [10] and rooted with Rubrobacterales. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1,000 bootstrap replicates if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [11] are shown in blue, published genomes in bold.

Genome sequencing and annotation
Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position, and is part of the Genomic Encyclopedia of Bacteria and Archaea project. The genome project is deposited in the Genome OnLine Database [11] and the complete genome sequence in GenBank (CP001631). Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger, 454 and Illumina sequencing platforms. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI web site. 454 pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 2,356 overlapping fragments of 1,000 bp and entered into the assembly as pseudoreads. The sequences were assigned quality scores based on Newbler consensus q-scores, with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the Arachne assembler. Possible mis-assemblies were corrected, and gaps between contigs were closed by custom primer walks from sub-clones or PCR products. A total of 118 Sanger finishing reads were produced. Illumina reads were used to improve the final consensus quality using an in-house developed tool (the Polisher). The error rate of the completed genome sequence is less than 1 in 100,000. Together, the Sanger and 454 sequencing platforms provided 59.7× coverage of the genome.
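The step of breaking large contigs into overlapping 1,000 bp pseudoreads can be sketched as follows. This is a minimal illustration under stated assumptions, not the JGI pipeline: the function name shred_contig is hypothetical, and the overlap length is not given in the text, so the default of 100 bp is an arbitrary placeholder.

```python
def shred_contig(contig: str, fragment_len: int = 1000, overlap: int = 100) -> list[str]:
    """Split a contig into overlapping fragments to be used as pseudoreads.

    fragment_len follows the 1,000 bp stated in the text; overlap is an assumed
    value. The final fragment may be shorter than fragment_len.
    """
    if fragment_len <= 0 or not 0 <= overlap < fragment_len:
        raise ValueError("require fragment_len > 0 and 0 <= overlap < fragment_len")
    step = fragment_len - overlap
    fragments = []
    start = 0
    while start < len(contig):
        fragments.append(contig[start:start + fragment_len])
        if start + fragment_len >= len(contig):
            break  # this fragment already reaches the contig end
        start += step
    return fragments


if __name__ == "__main__":
    import random

    random.seed(0)
    toy_contig = "".join(random.choice("ACGT") for _ in range(2500))
    pseudoreads = shred_contig(toy_contig)
    print([len(r) for r in pseudoreads])
```

Consecutive fragments share their last and first `overlap` bases, which is what lets a downstream assembler rejoin them with the Sanger reads.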

Genome annotation
Genes were identified using Prodigal [16] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [17]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes (IMG-ER) platform [18].

Genome properties
The genome is 2,158,157 bp long and comprises one main circular chromosome with a 68.3% GC content (Table 3 and Figure 3). Of the 2,092 genes predicted, 2,038 were protein-coding genes and 54 were RNA genes. Seventy-four pseudogenes were also identified. A total of 75.7% of the genes were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
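The summary figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the helper names (gc_content, percent) are illustrative, and the GC function is shown on a toy string because the genome sequence itself (GenBank CP001631) is not reproduced here.

```python
# Counts taken from the text above.
GENOME_LENGTH_BP = 2_158_157
TOTAL_GENES = 2_092
PROTEIN_CODING_GENES = 2_038
RNA_GENES = 54


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def gc_content(sequence: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Assumption: ambiguity codes such as N are ignored rather than counted.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


if __name__ == "__main__":
    # The two gene classes should sum to the reported total.
    assert PROTEIN_CODING_GENES + RNA_GENES == TOTAL_GENES
    print(f"protein-coding share: {percent(PROTEIN_CODING_GENES, TOTAL_GENES):.2f}%")
    print(f"genes per Mb: {TOTAL_GENES / (GENOME_LENGTH_BP / 1e6):.1f}")
    print(f"toy GC content: {gc_content('GGCCAT'):.3f}")
```

Applied to the full chromosome sequence, gc_content would be expected to return a value close to the reported 68.3%, although the exact figure depends on how ambiguous bases are treated.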