Complete genome sequence of Alicyclobacillus acidocaldarius type strain (104-IAT)

Alicyclobacillus acidocaldarius (Darland and Brock 1971) is the type species of the larger of the two genera in the bacillal family ‘Alicyclobacillaceae’. A. acidocaldarius is a free-living and non-pathogenic organism, but may also be associated with food and fruit spoilage. Due to its acidophilic nature, several enzymes from this species have long been the subject of detailed molecular and biochemical studies. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family ‘Alicyclobacillaceae’. The 3,205,686 bp long genome (chromosome and three plasmids) with its 3,153 protein-coding and 82 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain 104-IAT (= DSM 446 = ATCC 27009 = JCM 5260 = NCIMB 11725) is the type strain of the species Alicyclobacillus acidocaldarius, which is the type species of the genus Alicyclobacillus [1]. The genus currently consists of 20 species and two subspecies. Strain 104-IAT was originally isolated as 'Bacillus acidocaldarius' in 1971 (or earlier) from a hot and acidic spring in Yellowstone National Park, USA. In 1992, it was reclassified on the basis of comparative 16S rRNA gene sequence analysis into the new genus Alicyclobacillus [1]. With the description of A. acidocaldarius subsp. rittmannii in 2002 [2], the subspecies name A. acidocaldarius subsp. acidocaldarius was automatically created following rule 46 of the bacteriological code [3], with 104-IAT as its type strain (hereinafter nevertheless referred to as A. acidocaldarius, without the subspecies epithet). The species name derives from the Latin 'acidus', meaning acidic, combined with 'caldarius', Latin for 'belonging to the hot'. Due to its thermoacidic nature, this species serves as a model organism for molecular and biochemical studies of its enzymes [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19]. Strain 104-IAT has also been used to produce the restriction enzyme BacI [20]. Here we present a summary classification and a set of features for A. acidocaldarius 104-IAT, together with the description of the complete genomic sequencing and annotation.

Classification and features
The type strain 104-IAT and several other strains were isolated from acidic hot springs in Yellowstone National Park, USA, from soil from an acid fumarole in the Hawaiian Volcano National Park [21], and also from acidic environments in Japan [22]. Other strains, as identified by 16S rRNA gene sequences and by metabolic traits, were isolated from orchard soil, mango juice, vinegar flies or pre-pasteurized pear puree in South Africa [23][24][25]. These findings are supported by the experimentally determined heat resistance of A. acidocaldarius strains in water, acidic buffer and orange juice [26]. Thus, A. acidocaldarius might be involved in food and fruit spoilage, which is a characteristic of several other species of the genus Alicyclobacillus [23][24][25][27]. Clones with high sequence similarity (99%, AB042056) to the 16S rRNA gene sequence of strain 104-IAT are reported by the NCBI BLAST server from a 'simulated low level waste site' in the USA (GQ263212), but not from any metagenomic environmental samples (October 2009). Figure 1 shows the phylogenetic neighborhood of A. acidocaldarius 104-IAT in a 16S rRNA based tree. The sequences of the six 16S rRNA gene copies in the genome of A. acidocaldarius 104-IAT differ from each other by up to six nucleotides, and differ by up to five nucleotides from the previously published 16S rRNA sequence derived from DSM 446 (AJ496806).
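The comparison of the 16S rRNA gene copies amounts to counting mismatched positions between pairs of aligned sequences. The following minimal sketch illustrates that counting; it is not the procedure used for this genome. The short sequences and the names `copy_1` to `copy_3` are hypothetical placeholders, the inputs are assumed to be pre-aligned to equal length, and a gap character is counted as a mismatch by assumption.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def pairwise_differences(aligned):
    """Return {(name_a, name_b): mismatch count} for every pair of sequences."""
    return {
        (name_a, name_b): count_differences(aligned[name_a], aligned[name_b])
        for name_a, name_b in combinations(aligned, 2)
    }


# Hypothetical toy alignment; these are not real 16S rRNA gene sequences.
toy_copies = {
    "copy_1": "ACGTACGTAC",
    "copy_2": "ACGTACGTAT",
    "copy_3": "ACCTACGTAT",
}

toy_table = pairwise_differences(toy_copies)
toy_max = max(toy_table.values())
print(toy_table)
print("largest pairwise difference:", toy_max)
```

Applied to an alignment of the six genomic copies, the largest value in such a table would correspond to the "up to six nucleotides" figure quoted above.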

Figure 1.
Phylogenetic tree highlighting the position of A. acidocaldarius 104-IAT relative to the other type strains within the family. The tree was inferred from 1,419 aligned characters [28,29] of the 16S rRNA gene sequence under the maximum likelihood criterion [30] and rooted with the genus Sulfobacillus. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1,000 bootstrap replicates if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [31] are shown in blue, published genomes in bold.

Chemotaxonomy
Characteristic of several Alicyclobacillus species is the presence of large amounts of ω-alicyclic fatty acids [1,46]. Strain 104-IAT contains approximately 51% ω-cyclohexane C17:0 and 33% ω-cyclohexane C19:0. Other fatty acids such as C16:0, C18:0, iso-C15:0, iso-C16:0, iso-C18:0, anteiso-C15:0, and anteiso-C17:0 are each present at approximately 1% to 5% [22,43]. The fatty acid composition is rather stable, though not static, across different temperature and pH values [47]. Moreover, strain 104-IAT produces hopanoids, a group of pentacyclic triterpenoids, which together with the fatty acids constitute the lipophilic core of the cytoplasmic membrane. The amount of hopanoids depends more on the temperature than on the pH value [48]. The main isoprenoid quinone is menaquinone with seven isoprene units (MK-7) [1].

Altitude: not reported. Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [42]. If the evidence code is IDA, then the property was directly observed for a living isolate by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position, and is part of the Genomic Encyclopedia of Bacteria and Archaea project. The genome project is deposited in the Genome OnLine Database [31] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger and 454 sequencing platforms. All general aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov/. The 454 pyrosequencing reads were assembled using the Newbler assembler version 1.1.02.15 (Roche). Large Newbler contigs were broken into 3,478 overlapping fragments of 1,000 bp and entered into the assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the parallel phrap assembler (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher [51] or transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI). Gaps between contigs were closed by editing in Consed, custom primer walks or PCR amplification. A total of 767 Sanger finishing reads were produced to close gaps, to resolve repetitive regions, and to raise the quality of the finished sequence. The error rate of the completed genome sequence is less than 1 in 100,000. The final assembly contains 24,980 Sanger and 363,136 pyrosequencing reads. Together, all sequence types provided 34.3× coverage of the genome.
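The pseudo-read step, in which large contigs are cut into overlapping 1,000 bp fragments, can be illustrated with a short sketch. This is not the JGI pipeline code: the function name `make_pseudo_reads` is hypothetical, and the 500 bp step between fragment starts (that is, a 500 bp overlap) is an assumption, since the text gives the fragment length but not the overlap.

```python
def make_pseudo_reads(contig, frag_len=1000, step=500):
    """Cut a contig into overlapping fragments of frag_len bases.

    step is the distance between consecutive fragment starts; the default
    of 500 is an assumed value, not one reported in the text.
    """
    if frag_len <= 0 or not 0 < step <= frag_len:
        raise ValueError("require frag_len > 0 and 0 < step <= frag_len")
    if len(contig) <= frag_len:
        # A short contig is passed through as a single pseudo-read.
        return [contig]
    last_start = len(contig) - frag_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        # Add a final fragment so the end of the contig is always covered.
        starts.append(last_start)
    return [contig[s:s + frag_len] for s in starts]


# Hypothetical 2,300 bp toy contig, not real sequence data.
toy_contig = "ACGT" * 575
toy_reads = make_pseudo_reads(toy_contig)
print(len(toy_reads), [len(r) for r in toy_reads])
```

Because consecutive fragments overlap, every base of the contig appears in at least one pseudo-read and most appear in two, which is presumably why the quality scores were adjusted for overlap redundancy.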

Genome annotation
Genes were identified using Prodigal [52] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [53]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [54].

Genome properties
The genome consists of a 3,018,755 bp long chromosome and three plasmids of 91,726 bp, 87,298 bp, and 7,907 bp (Table 3 and Figure 3). Of the 3,235 genes predicted, 3,153 were protein-coding genes and 82 were RNA genes; 69 pseudogenes were also identified. The majority of the protein-coding genes (68.4%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
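The totals quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text. The labels `plasmid_1` to `plasmid_3` are placeholder names, and the count of genes with a putative function is back-calculated from the rounded 68.4%, so it is an approximation rather than a reported value.

```python
# Replicon lengths in bp, as given in the text; plasmid labels are placeholders.
replicons = {
    "chromosome": 3_018_755,
    "plasmid_1": 91_726,
    "plasmid_2": 87_298,
    "plasmid_3": 7_907,
}
protein_coding = 3_153
rna_genes = 82
functional_fraction = 0.684  # rounded percentage from the text


def genome_summary(replicons, protein_coding, rna_genes, functional_fraction):
    """Derive genome-wide totals from per-replicon and per-class counts."""
    total_bp = sum(replicons.values())
    # Approximate, because the fraction is rounded to one decimal place.
    with_function = round(protein_coding * functional_fraction)
    return {
        "total_bp": total_bp,
        "total_genes": protein_coding + rna_genes,
        "with_function_approx": with_function,
        "hypothetical_approx": protein_coding - with_function,
        "plasmid_fraction": (total_bp - replicons["chromosome"]) / total_bp,
    }


summary = genome_summary(replicons, protein_coding, rna_genes, functional_fraction)
for key, value in summary.items():
    print(key, value)
```

The replicon lengths sum to the 3,205,686 bp stated in the abstract, and the two gene classes sum to the 3,235 predicted genes.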