Complete genome sequence of Granulicella tundricola type strain MP5ACTX9T, an Acidobacteria from tundra soil

Granulicella tundricola strain MP5ACTX9T is a novel species of the genus Granulicella in subdivision 1 Acidobacteria. G. tundricola is a predominant member of soil bacterial communities, active at low temperatures and under nutrient-limiting conditions in Arctic alpine tundra. The organism is a cold-adapted acidophile and a versatile heterotroph that hydrolyzes a suite of sugars and complex polysaccharides. Genome analysis revealed metabolic versatility, with genes involved in the metabolism and transport of carbohydrates, including gene modules encoding carbohydrate-active enzyme (CAZy) families for the breakdown, utilization and biosynthesis of diverse structural and storage polysaccharides such as plant-based carbon polymers. The genome of G. tundricola strain MP5ACTX9T consists of a 4,309,151 bp circular chromosome and five megaplasmids, with a total genome content of 5,503,984 bp. The genome comprises 4,705 protein-coding genes and 52 RNA genes.


Chemotaxonomy
The major cellular fatty acids in G. tundricola are iso-C15:0 (46.4%), C16:1ω7c (35.0%) and C16:0 (6.6%). The cellular fatty acid composition of strain MP5ACTX9T was similar to that of other Granulicella strains, with the fatty acids iso-C15:0 and C16:1ω7c being most abundant in all strains. Strain MP5ACTX9T contains MK-8 as the major quinone and also contains 4% of MK-7. Table 2 presents the project information and its association with MIGS version 2.0 [44].

Growth conditions and genomic DNA extraction
G. tundricola MP5ACTX9T was cultivated on R2 medium as previously described [1]. Genomic DNA (gDNA) of high sequencing quality was isolated using a modified CTAB method and evaluated according to the Quality Control (QC) guidelines provided by the DOE Joint Genome Institute [45].

Genome annotation
Genes were identified using Prodigal [53] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [54]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG [55,56] and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [57], RNAmmer [58], Rfam [59], TMHMM [60] and SignalP [61]. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [62].

Genome properties
The genome is 5,503,984 bp in size and comprises the 4,309,151 bp chromosome and five plasmids, pACIX901 (0.48 Mbp), pACIX902 (0.3 Mbp), pACIX903 (0.19 Mbp), pACIX904 (0.12 Mbp) and pACIX905 (0.12 Mbp), with a GC content of 59.9 mol%. There are 52 RNA genes (Figures 3 and 4, and Table 3). Of the 4,758 predicted genes, 4,706 are protein-coding genes (CDSs) and 163 are pseudogenes. Of the total CDSs, 68.8% are assigned to COG functional categories and 27.5% encode proteins with predicted signal peptides. The distribution of genes into COG functional categories is presented in Figure 3 and Tables 4 and 5.
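The figures reported above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python, not part of the original analysis: the constant and function names are hypothetical, and the plasmid sizes are the rounded Mbp values from the text, so they are compared with the exact difference only approximately.

```python
# Values copied from the genome properties text above.
GENOME_BP = 5_503_984
CHROMOSOME_BP = 4_309_151

# Plasmid sizes are reported rounded to 0.01 Mbp, so their sum is approximate.
PLASMIDS_MBP = {
    "pACIX901": 0.48,
    "pACIX902": 0.30,
    "pACIX903": 0.19,
    "pACIX904": 0.12,
    "pACIX905": 0.12,
}

TOTAL_GENES = 4_758
RNA_GENES = 52
PSEUDOGENES = 163


def plasmid_total_bp(genome_bp: int, chromosome_bp: int) -> int:
    """Combined plasmid content implied by the exact genome and chromosome sizes."""
    return genome_bp - chromosome_bp


def protein_coding_genes(total_genes: int, rna_genes: int) -> int:
    """Protein-coding gene count implied by the total and RNA gene counts."""
    return total_genes - rna_genes


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    exact = plasmid_total_bp(GENOME_BP, CHROMOSOME_BP)
    rounded_sum = sum(PLASMIDS_MBP.values())
    print(f"Plasmid content (exact difference): {exact:,} bp")
    print(f"Plasmid content (sum of rounded sizes): {rounded_sum:.2f} Mbp")
    print(f"Protein-coding genes: {protein_coding_genes(TOTAL_GENES, RNA_GENES):,}")
    print(f"Pseudogenes: {percent(PSEUDOGENES, TOTAL_GENES):.1f}% of predicted genes")
```

Under these assumptions, the exact difference (about 1.19 Mbp) and the sum of the rounded plasmid sizes (1.21 Mbp) agree to within rounding error, and subtracting the RNA genes from the predicted genes gives the 4,706 CDSs stated above.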

Discussion
Granulicella tundricola MP5ACTX9T is a tundra soil strain with a genome consisting of a circular chromosome and five megaplasmids ranging in size from 1.1 × 10^5 to 4.7 × 10^5 bp, for a total genome size of 5.5 Mbp. The G. tundricola genome also contains close to twice as many pseudogenes, as well as a large number of mobile genetic elements, compared with Granulicella mallensis and Terriglobus saanensis, two other Acidobacteria isolated from the same habitat [29]. A large number of genes were assigned to COG functional categories for transport and metabolism of carbohydrates (6.9%) and amino acids (6.5%), cell envelope biogenesis (8%) and transcription (6.9%). Further genome analysis revealed an abundance of gene modules encoding functional activities within the carbohydrate-active enzyme (CAZy) families [63,64] involved in the breakdown, utilization and biosynthesis of carbohydrates. G. tundricola hydrolyzed complex carbon polymers, including CMC, pectin, lichenin, laminarin and starch, and utilized sugars such as cellobiose, D-mannose, D-xylose and D-trehalose. Genome predictions of CDSs encoding enzymes such as cellulases, pectinases, alginate lyases, trehalase and amylases are in agreement with the biochemical activities of strain MP5ACTX9T. However, the genome of G. tundricola contained many CDSs encoding GH18 chitinases, although no chitinase activity was detected after a 10-day incubation with chitin azure [29]. In addition, the G. tundricola genome contained a cluster of genes in close proximity to the cellulose synthase gene (bcsAB), which included a cellulase (bcsZ; endoglucanase Y) of family GH8, a cellulose synthase operon protein (bcsC) and a cellulose synthase operon protein (yhjQ) involved in cellulose biosynthesis. We previously reported a detailed comparative genome analysis of G. tundricola MP5ACTX9T with other Acidobacteria strains for which finished genomes are available [29]. The data suggest that G. tundricola is involved in the hydrolysis and utilization of stored carbohydrates and in the biosynthesis of exopolysaccharides from organic matter and plant-based polymers in the soil. Therefore, G. tundricola may be central to carbon cycling processes in Arctic and boreal soil ecosystems.