Complete genome sequence of Granulicella mallensis type strain MP5ACTX8T, an acidobacterium from tundra soil

Granulicella mallensis MP5ACTX8T is a novel species of the genus Granulicella in subdivision 1 of Acidobacteria. G. mallensis is of ecological interest as a member of the dominant soil bacterial community active at low temperatures and under nutrient-limiting conditions in Arctic alpine tundra. G. mallensis is a cold-adapted acidophile and a versatile heterotroph that hydrolyzes a suite of sugars and complex polysaccharides. Genome analysis revealed metabolic versatility, with genes involved in the metabolism and transport of carbohydrates. These include gene modules encoding carbohydrate-active enzyme (CAZyme) families involved in the breakdown, utilization and biosynthesis of diverse structural and storage polysaccharides, including plant-based carbon polymers. The genome of Granulicella mallensis MP5ACTX8T consists of a single replicon of 6,237,577 base pairs (bp) with 4,907 protein-coding genes and 53 RNA genes.


Morphology and physiology
G. mallensis grows on R2 medium (Difco) at pH 3.5-6.5 (optimum pH 5) and at +4 to +28 °C (optimum 24-27 °C) [1]. On R2 agar, strain MP5ACTX8T forms opaque white mucoid colonies with a diameter of approximately 1 mm. Cells are Gram-negative, non-motile, aerobic rods, approximately 0.5-0.7 μm wide and 0.6-1.3 μm long. Growth is observed with up to 1.5% NaCl (w/v) (Table 1). Electron micrographs of ultrathin sections showing the cell-wall structure of MP5ACTX8T are presented in Figure 2.

Chemotaxonomy
The major cellular fatty acids in G. mallensis are iso-C15:0 (45.3%), C16:1ω7c (28.7%), iso-C13:0 (8.3%) and C16

Table 2 presents the project information and its association with MIGS version 2.0 [32].

Growth conditions and genomic DNA extraction
G. mallensis MP5ACTX8T was cultivated on R2 medium as previously described [1]. Genomic DNA (gDNA) of high sequencing quality was isolated using a modified CTAB method and evaluated according to the Quality Control (QC) guidelines provided by the DOE Joint Genome Institute [43].

[46]. The 454 Newbler consensus shreds, the Illumina Velvet consensus shreds and the read pairs in the 454 paired end library were integrated using parallel phrap, version SPS -4.24 (High Performance Software, LLC) [47]. The software Consed [48] was used in the finishing process. The Phred/Phrap/Consed software package [49] was used for sequence assembly and quality assessment in the subsequent finishing process. Illumina data were used to correct potential base errors and increase consensus quality using the software Polisher developed at JGI (Alla Lapidus, unpublished). Possible misassemblies were corrected using gapResolution (Cliff Han, unpublished), Dupfinisher [50] or sequencing of cloned bridging PCR fragments with sub-cloning. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR (J-F Cheng, unpublished) primer walks. The final assembly is based on 74.2 Mb of 454 data, which provides an average 18.5× coverage, and 1318.5 Mb of Illumina data, which provides an average 213× coverage of the genome.

Genome annotation
Genes were identified using Prodigal [51] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [52]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COGs [53,54], and InterPro. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [55], RNAmmer [56], Rfam [57], TMHMM [58], and SignalP [59]. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [60].

Genome properties
The genome consists of one circular chromosome of 6,211,694 bp with a GC content of 57.8 mol% and contains 53 RNA genes (Figure 3 and Table 3). Of the 4,960 predicted genes, 4,907 are protein-coding genes (CDSs) and 90 are pseudogenes. Of the total CDSs, 70.5% are assigned to COG functional categories and 16% carry predicted signal peptides. The distribution of genes into COG functional categories is presented in Figure 3 and Table 4.
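The percentages above can be turned back into approximate gene counts with a few lines of arithmetic. The following is a minimal sketch, assuming the figures quoted in this section; the helper names `implied_count` and `gc_fraction` are hypothetical and are not part of the annotation pipeline described here. The rounded counts are estimates derived from the reported percentages, not values taken from Table 3 or Table 4.

```python
# Figures quoted in the "Genome properties" paragraph.
TOTAL_GENES = 4960
PROTEIN_CODING = 4907
RNA_GENES = 53
COG_FRACTION = 0.705             # share of CDSs assigned to COG categories
SIGNAL_PEPTIDE_FRACTION = 0.16   # share of CDSs with predicted signal peptides


def implied_count(total: int, fraction: float) -> int:
    """Convert a reported fraction of `total` into an approximate count."""
    return round(total * fraction)


def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (hypothetical helper)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Protein-coding plus RNA genes should account for all predicted genes.
    print(PROTEIN_CODING + RNA_GENES == TOTAL_GENES)
    # Approximate number of CDSs with a COG assignment.
    print(implied_count(PROTEIN_CODING, COG_FRACTION))
    # Approximate number of CDSs with a predicted signal peptide.
    print(implied_count(PROTEIN_CODING, SIGNAL_PEPTIDE_FRACTION))
    # GC fraction of a toy sequence; a real run would read the chromosome FASTA.
    print(gc_fraction("ATGCGC"))
```

Applied to the full chromosome sequence, `gc_fraction` would be expected to return a value close to the reported 57.8 mol%, although the exact figure depends on how ambiguous bases are handled.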

Discussion
Granulicella mallensis type strain MP5ACTX8T has the largest genome (6.2 Mbp) among the three tundra soil strains of subdivision 1 Acidobacteria [28]. Genome analysis of Granulicella mallensis identified a high abundance of genes assigned to COG functional categories for the transport and metabolism of carbohydrates (9.1%) and amino acids (6.7%), and for cell envelope biogenesis (8.3%) and transcription (8.6%). Further genome analysis revealed an abundance of gene modules encoding functional activities within the carbohydrate-active enzyme (CAZy) families [61] involved in the breakdown, utilization and biosynthesis of carbohydrates. G. mallensis hydrolyzed complex carbon polymers, including CMC, pectin, lichenin, laminarin and starch, and utilized sugars such as cellobiose, D-mannose, D-xylose and D-trehalose. This parallels genome predictions of CDSs encoding enzymes such as cellulases, pectinases, alginate lyases, trehalase and amylases. In addition, the G. mallensis genome contained a cluster of genes in the neighborhood of the cellulose synthase gene (bcsAB), which included a cellulase (bcsZ) (endoglucanase Y) of family GH8, a cellulose synthase operon protein (bcsC) and a cellulose synthase operon protein (yhjQ) involved in cellulose biosynthesis. A detailed comparative genome analysis of G. mallensis MP5ACTX8T with other Acidobacteria strains for which finished genomes were available is reported in Rawat et al. [28]. These data suggest that G. mallensis is involved in hydrolysis, in the utilization of stored carbohydrates, and in the biosynthesis of exopolysaccharides from organic matter and plant-based polymers in the soil. We therefore infer that G. mallensis may be central to carbon cycling processes in arctic and boreal soil ecosystems.