Complete genome sequence of Terriglobus saanensis type strain SP1PR4T, an Acidobacteria from tundra soil

Terriglobus saanensis SP1PR4T is a novel species of the genus Terriglobus. T. saanensis is of ecological interest because it is a representative of the phylum Acidobacteria, whose members dominate the bacterial soil microbiota in Arctic ecosystems. T. saanensis is a cold-adapted acidophile and a versatile heterotroph that utilizes a suite of simple sugars and complex polysaccharides. The genome contains an abundance of genes assigned to the metabolism and transport of carbohydrates, including gene modules encoding carbohydrate-active enzyme (CAZyme) families involved in the breakdown, utilization and biosynthesis of diverse structural and storage polysaccharides. T. saanensis SP1PR4T is the first member of the genus Terriglobus with a completed genome sequence, which consists of a single replicon of 5,095,226 base pairs (bp) with 54 RNA genes and 4,279 protein-coding genes. We infer that the physiology and metabolic potential of T. saanensis are adapted to allow for resilience to the nutrient-deficient conditions and fluctuating temperatures of Arctic tundra soils.

Acidobacteria are found in diverse soil environments and are widely distributed in Arctic and boreal soils [4][5][6][7][8]. However, relatively little is known about their metabolic potential and ecological roles in these habitats. Although databases hold a large collection of Acidobacteria 16S rRNA gene sequences representing diverse phylotypes from various habitats, few members have been cultivated and described. Acidobacteria comprise 26 phylogenetic subdivisions based on 16S rRNA gene phylogeny [9], of which subdivisions 1, 3, 4 and 6 are most commonly detected in soil environments [10]. The abundance of Acidobacteria has been found to correlate with soil pH [2,10,11] and carbon [1,12,13], with subdivision 1 Acidobacteria being most abundant in slightly acidic soils. The phylogenetic diversity, ubiquity and abundance of this group suggest that they play important ecological roles in soils.
Our previous studies on bacterial community profiling of Arctic alpine tundra soils of northern Finland have shown that Acidobacteria dominate in the acidic tundra heaths [2] and after multiple freeze-thaw cycles [6]. Using selective isolation techniques, including freezing soils at -20°C for 7 days, we have been able to isolate several slow-growing and fastidious strains of Acidobacteria. On the basis of phylogenetic, phenotypic and chemotaxonomic data, including 16S rRNA and rpoB gene sequence similarity and DNA-DNA hybridization, strain SP1PR4T was classified as a novel species of the genus Terriglobus [3]. Here, we summarize the physiological features together with the complete genome sequence and annotation of Terriglobus saanensis SP1PR4T.

Genome sequencing and annotation
Genome project history
Strain SP1PR4T was selected for sequencing in 2009 by the DOE Joint Genome Institute (JGI) community sequencing program. The Quality Draft (QD) assembly and annotation were completed on August 6, 2010. The complete genome was made available on Jan 24, 2011. The genome project is deposited in the Genomes On-Line Database (GOLD) [25] and the complete genome sequence of strain SP1PR4T is deposited in GenBank. Table 2 presents the project information and its association with MIGS version 2.0 [16].

Growth conditions and genomic DNA extraction
Strain SP1PR4T was cultivated in R2 medium as previously described [3]. Genomic DNA (gDNA) of high sequencing quality was isolated using a modified CTAB method and evaluated according to the Quality Control (QC) guidelines provided by the DOE Joint Genome Institute.

Genome sequencing and assembly
The finished genome of T. saanensis SP1PR4T (JGI ID 4088690) was generated at the DOE Joint Genome Institute (JGI) using a combination of Illumina [26] and 454 technologies [27].

Genome annotation
Genes were identified using Prodigal [33] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [34]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COGs [35,36], and InterPro. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [37], RNAmmer [38], Rfam [39], TMHMM [40], and SignalP [41]. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [42].

Genome properties
The genome consists of one circular chromosome of 5,095,226 bp with a GC content of 57.3% and contains 54 RNA genes (Figure 3, Table 3). Of the 4,333 predicted genes, 4,279 are protein-coding genes (CDSs) and 99 are pseudogenes. Of the total CDSs, 67% are assigned to COG functional categories and 43% contain signal peptides. The distribution of genes into COG functional categories is presented in Figure 3 and Table 4.
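The summary figures above can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the JGI or IMG-ER pipelines: the constants are the values reported in the text, while the function names (`gc_content`, `tallies_consistent`) and the toy sequence are hypothetical. It confirms that the protein-coding and RNA gene counts sum to the reported total, derives an approximate gene density, and shows how a GC fraction would be computed from a sequence string.

```python
# Values reported in the text for T. saanensis SP1PR4T.
GENOME_SIZE_BP = 5_095_226
TOTAL_GENES = 4_333
PROTEIN_CODING_GENES = 4_279
RNA_GENES = 54


def tallies_consistent(total: int, protein_coding: int, rna: int) -> bool:
    """Check that protein-coding plus RNA genes account for the total."""
    return total == protein_coding + rna


def gc_content(sequence: str) -> float:
    """Return the G+C fraction of a DNA sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / (gc + at)


# Derived quantity (not stated in the text): predicted genes per kilobase.
genes_per_kb = TOTAL_GENES / (GENOME_SIZE_BP / 1000)

if __name__ == "__main__":
    print(tallies_consistent(TOTAL_GENES, PROTEIN_CODING_GENES, RNA_GENES))
    print(f"{genes_per_kb:.2f} genes per kb")
    # Toy sequence for illustration only; the real input would be the
    # chromosome sequence retrieved from GenBank.
    print(f"{gc_content('GGCCAT'):.3f}")
```

Applied to the full chromosome sequence, `gc_content` would be expected to return a value close to the reported 57.3%, although the exact figure depends on how ambiguous bases are treated.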

Discussion
Genome analysis of T. saanensis identified a high abundance of genes assigned to COG functional categories for transport and metabolism of carbohydrates (9.5%) and amino acids (7.6%), energy conversion (6.2%), cell envelope biogenesis (9.6%) and transcription (9.2%) [15]. This indicates that the T. saanensis genome encodes functions involved in the transport and utilization of nutrients, mainly carbohydrates and amino acids, for energy production and cell biogenesis to maintain cell integrity in cold tundra soils. Further genome analysis revealed an abundance of gene modules for glycoside hydrolases, glycosyl transferases, polysaccharide lyases, carbohydrate esterases, and non-catalytic carbohydrate-binding modules within the carbohydrate-active enzymes (CAZy [43]) family involved in the breakdown, utilization and biosynthesis of carbohydrates [15]. T. saanensis hydrolyzed complex carbon polymers, including pectin, laminarin, and starch, and utilized sugars such as cellobiose, D-mannose, D-xylose, D-trehalose and laminarin. This parallels genome predictions for CDSs encoding enzymes such as pectinases, chitinases, alginate lyases, trehalase and amylases. T. saanensis was unable to hydrolyze carboxymethyl cellulose (CMC) in plate assays and lacked CDSs encoding cellulases involved in cellulose hydrolysis. However, the T. saanensis genome contained a BcsZ gene encoding an endocellulase (GH8) as part of a bacterial cellulose synthesis (bcs) operon, which is involved in cellulose biosynthesis in several species. This operon consists of a cluster of genes in close proximity to the BcsZ gene, including a cellulose synthase gene (bcsAB), a cellulose synthase operon protein (bcsC) and a cellulose synthase operon protein (yhj) [15]. In addition, the T. saanensis genome encoded a large number of gene modules representing glycosyl transferases (GTs) involved in carbohydrate biosynthesis, including cellulose synthase (UDP-forming), α-trehalose phosphate synthase (UDP-forming), starch glucosyl transferase and ceramide β-glucosyltransferase, which are involved in the biosynthesis of cellulose, trehalose, starch, hopanoid, and capsular/free exopolysaccharide (EPS) [15]. This suggests that T. saanensis is involved in the hydrolysis of lignocellulosic soil organic matter, the utilization of stored carbohydrates and the biosynthesis of exopolysaccharides. Therefore, we surmise that T. saanensis may be central to carbon cycling processes in Arctic and boreal soil ecosystems.