Complete genome sequence of Paenibacillus sp. strain JDR-2

Paenibacillus sp. strain JDR-2, an aggressively xylanolytic bacterium isolated from sweetgum (Liquidambar styraciflua) wood, is able to efficiently depolymerize, assimilate and metabolize 4-O-methylglucuronoxylan, the predominant structural component of hardwood hemicelluloses. The basis for this capability was first indicated by the identification of genes and the characterization of their encoded enzymes, and has been further defined by the sequencing and annotation of the complete genome described here. In addition to genes implicated in the utilization of β-1,4-xylan, genes have been identified for the utilization of other hemicellulosic polysaccharides. The genome of Paenibacillus sp. JDR-2 contains 7,184,930 bp in a single replicon with 6,288 protein-coding and 122 RNA genes. Uniquely prominent are 874 genes encoding proteins involved in carbohydrate transport and metabolism. The prevalence and organization of these genes support a metabolic potential for the bioprocessing of hemicellulose fractions derived from lignocellulosic resources.


Introduction
Paenibacillus sp. strain JDR-2 (Pjdr2) was isolated from wafers cut from live stems of sweetgum (Liquidambar styraciflua) and placed in soil in an area populated predominantly by this tree species. The ability of this isolate to grow on 4-O-methylglucuronoxylose (MeGX) as the sole carbon source revealed a metabolic potential not previously described. MeGX is released along with fermentable xylose during dilute acid pretreatment of lignocellulosic biomass. Since MeGX may represent 5 to 20% of the hemicellulose components of hardwoods and agricultural residues, this ability was of interest for increasing the yields of fermentable sugars from bioconversion of these resources [1,2].
Growth rates and yields of Pjdr2 with polymeric 4-O-methylglucuronoxylan (MeGXn) as substrate were much greater than with the monosaccharides and oligosaccharides derived from MeGXn. These increases are presumably the result of a cell-associated multimodular GH10 endoxylanase that generates xylobiose, xylotriose and the aldouronate 4-O-methylglucuronoxylotriose (MeGX3) for direct assimilation and metabolism [2]. A gene cluster cloned and sequenced from Pjdr2 genomic DNA contained two genes encoding transcriptional regulators, three genes encoding ABC transporters, and three consecutive structural genes, lacking secretion signal sequences, that encode a GH67 α-glucuronidase, a GH10 endoxylanase catalytic domain and a putative GH43 β-xylosidase. The expression of these genes, as well as that of a distal gene encoding a secreted, cell-associated multimodular GH10 endoxylanase, responded coordinately to inducers and repressors, leading to their collective designation as a xylan-utilization regulon [3]. Physiological studies showing the preferential utilization of MeGXn over MeGX and MeGX3 support a process in which extracellular depolymerization, assimilation and intracellular metabolism are coupled, allowing the rapid and complete utilization of MeGXn [4].
Pjdr2 was the first member of this genus to have its genome completely sequenced and made available for detailed analysis. The genome sequences of two strains of Paenibacillus polymyxa [5,6], "Paenibacillus vortex" [7] and Paenibacillus sp. Y412MC10 (NCBI NC_013406.1, unpublished results) have since been completed. The incomplete genome sequence of Paenibacillus larvae subsp. larvae, the causative agent of American foulbrood disease of honey bees, has also been analyzed [8].

Classification and features
A phylogenetic tree constructed by the Neighbor-Joining method [9] from the complete sequences of 16S rRNA genes derived from sequenced genomes of Paenibacillus spp., along with the sequences of some members of Bacillus spp., Microbacterium spp. and Clostridium spp., is presented in Figure 1. The 16S rRNA gene sequence (AF355462) of Paenibacillus polymyxa PKB1 is included as representative of the type species of the genus [10].
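For readers unfamiliar with Neighbor-Joining, the Python sketch below illustrates its clustering step on a precomputed distance matrix. It is a minimal illustration, not the software used to build Figure 1: the five-taxon distances are hypothetical toy values, and the estimation of evolutionary distances from aligned 16S rRNA sequences is not shown. At each step the pair of nodes minimizing the Q-criterion is joined, and branch lengths to the new internal node are estimated.

```python
from itertools import combinations


def neighbor_joining(dist, labels):
    """Neighbor-Joining (Saitou and Nei) on a symmetric distance matrix.

    dist maps frozenset({x, y}) -> distance for every pair of labels.
    Returns (joins, final_edge): each join is (x, y, len_x, len_y, new_node),
    and final_edge is (x, y, length) linking the last two nodes (unrooted tree).
    """
    d = dict(dist)
    nodes = list(labels)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence: summed distance from each node to all other nodes
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Q-criterion: join the pair minimizing (n - 2) * d(x, y) - r(x) - r(y)
        x, y = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dxy = d[frozenset((x, y))]
        # Branch lengths from x and y to their new parent node
        len_x = dxy / 2 + (r[x] - r[y]) / (2 * (n - 2))
        len_y = dxy - len_x
        u = f"node{len(joins) + 1}"
        # Distance from the new node to each remaining node
        for k in nodes:
            if k not in (x, y):
                d[frozenset((u, k))] = (
                    d[frozenset((x, k))] + d[frozenset((y, k))] - dxy
                ) / 2
        joins.append((x, y, len_x, len_y, u))
        nodes = [k for k in nodes if k not in (x, y)] + [u]
    x, y = nodes
    return joins, (x, y, d[frozenset((x, y))])


# Toy distances between five hypothetical taxa (not 16S rRNA data)
toy = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(pair): value for pair, value in toy.items()}
joins, final_edge = neighbor_joining(dist, ["a", "b", "c", "d", "e"])
```

With these toy distances the first join pairs a with b, with branch lengths 2 and 3. Keying the matrix by unordered pairs keeps the sketch short; a real analysis would typically rely on a dedicated phylogenetics package.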
The unrooted phylogenetic tree places Pjdr2 in a branch that includes the other Paenibacillus spp. in this comparison, supporting a lineage distinct from other Gram-positive endospore-forming bacteria. Pjdr2 groups more closely with Paenibacillus lentimorbus and other Paenibacillus species that are insect pathogens than with the group that includes the type species, Paenibacillus polymyxa. Given the similarity of its genome size and sequence-based metabolic potential to those of Paenibacillus sp. Y412MC10, it is surprising that the 16S rRNA sequence does not place Pjdr2 closer to that strain. Despite a close similarity of Paenibacillus sp. JDR-2 to Microbacterium species with respect to membrane fatty acids (see discussion below), the 16S rRNA sequence clearly shows that it is not related to members of the genus Microbacterium.
When grown on oat spelt xylan agar plates [2], colonies of Pjdr2 are white with smooth edges and are surrounded by clearing zones resulting from depolymerization of the xylan. This property was routinely used to monitor the purity of Pjdr2 cultures. As shown in Figure 2, cells of Pjdr2 are rod-shaped, with swellings suggestive of sporulation. The properties evaluated for classification allow assignment of Pjdr2 as an endospore-forming bacterium in the phylum Firmicutes and genus Paenibacillus, as noted in Table 1.
In fatty acid methyl ester (FAME) analysis, a similarity index (SI) of 0.5 or higher indicates a good match to a library entry (MIDI 2002). The two strains that most closely match the profile of Pjdr2 are Microbacterium laevaniformans (SI = 0.75) and Cellulobacterium cellulans (SI = 0.51). We included the 16S rRNA sequences of these two species in our phylogenetic analysis (Figure 1). FAME analysis provided a rapid assignment of the species by comparing the fatty acid profile of Pjdr2 with those of 60 strains (42 species) of Bacillus, 2 strains (1 species) of Cellulobacterium, 20 strains (19 species) of Microbacterium and 20 strains (18 species) of Paenibacillus, as well as other aerobic bacteria. Sequence analysis of 16S rRNA provides the accepted basis for inferring phylogenetic relationships. Nevertheless, FAME analysis provides a convenient means of confirming the identity of the organism as it is maintained and studied over time.
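The SI threshold amounts to a simple filter over library matches, sketched below. The `LibraryMatch` record and `good_matches` function are hypothetical illustrations and are not part of the MIDI software; only the 0.5 threshold and the two SI values are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class LibraryMatch:
    """Hypothetical record for one FAME library comparison."""
    name: str
    similarity_index: float


GOOD_MATCH_SI = 0.5  # threshold for a good library match, as stated in the text


def good_matches(matches, threshold=GOOD_MATCH_SI):
    """Keep matches with SI at or above the threshold, best match first."""
    return sorted(
        (m for m in matches if m.similarity_index >= threshold),
        key=lambda m: m.similarity_index,
        reverse=True,
    )


matches = [
    LibraryMatch("Cellulobacterium cellulans", 0.51),
    LibraryMatch("Microbacterium laevaniformans", 0.75),
]
best = good_matches(matches)
```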

Growth conditions and DNA isolation
For the preparation of genomic DNA, one of several colonies surrounded by a clear zone was picked from an agar plate (0.1% oat spelt xylan, 0.1% yeast extract, Zucker-Hankin medium [2]) and grown in Zucker-Hankin medium with 1% yeast extract at 30°C with shaking at 240 rpm. A culture (8 ml) at an OD600 of 0.6 was inoculated into 48 ml of the same medium (Zucker-Hankin, 1% yeast extract). This culture was grown to an OD600 of 0.6, and the cells were collected by centrifugation. High molecular weight DNA was prepared from these cells according to the protocol provided by the DOE Joint Genome Institute (JGI). Cells were suspended in TE buffer (10 mM Tris-HCl, 1.0 mM EDTA, pH 8.0) and treated with lysozyme to digest the cell wall. SDS and proteinase K were added to denature and degrade proteins, and NaCl and CTAB were added to facilitate subsequent precipitation. Cell lysates were extracted with phenol and chloroform, and the DNA was precipitated by addition of isopropanol. The nucleic acid pellet was washed with 70% ethanol, dissolved in water and treated with RNase A.
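The culture and buffer steps involve two routine calculations, sketched below under stated assumptions: the 8 ml inoculum and 48 ml of medium are assumed to combine to 56 ml, OD600 is assumed to scale linearly with cell density, and TE buffer is assumed to be made from hypothetical 1 M Tris-HCl (pH 8.0) and 0.5 M EDTA (pH 8.0) stocks. Under these assumptions the subculture starts near an OD600 of 0.086 and undergoes roughly 2.8 doublings before harvest at 0.6.

```python
import math


def starting_od(inoculum_od, inoculum_ml, medium_ml):
    """OD600 just after inoculation, assuming volumes add and OD is linear in cell density."""
    return inoculum_od * inoculum_ml / (inoculum_ml + medium_ml)


def doublings(od_start, od_end):
    """Number of population doublings between two OD600 readings."""
    return math.log2(od_end / od_start)


def stock_volume_ml(stock_mM, final_mM, final_ml):
    """Solve C1 * V1 = C2 * V2 for V1, the volume of stock to add."""
    return final_mM * final_ml / stock_mM


od0 = starting_od(0.6, 8, 48)        # about 0.086
n_doublings = doublings(od0, 0.6)    # log2(7), about 2.8 doublings

# Hypothetical stocks: 1 M (1000 mM) Tris-HCl and 0.5 M (500 mM) EDTA, 100 ml of TE
tris_ml = stock_volume_ml(1000, 10, 100)   # 1.0 ml
edta_ml = stock_volume_ml(500, 1.0, 100)   # 0.2 ml
```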

Genome sequencing and assembly
The genome of Pjdr2 was sequenced at the JGI using a combination of 8 kb and 40 kb (fosmid) DNA libraries. In addition to Sanger sequencing, 454 pyrosequencing [28] was performed to a depth of 20× coverage. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website [29]. Draft assemblies were based on 39,689 total reads. All three libraries together provided 5.1× coverage of the genome. The Phred/Phrap/Consed software package [30] was used for sequence assembly and quality assessment [31-33]. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible misassemblies were corrected with Dupfinisher [34] or by transposon bombing of bridging clones (Epicentre Biotechnologies, Madison, WI). Gaps between contigs were closed by editing in Consed, by custom primer walking, or by PCR amplification (Roche Applied Science, Indianapolis, IN). A total of 1,028 additional reactions were necessary to close gaps and to raise the quality of the finished sequence.
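Fold coverage relates read count N, mean read length L and genome size G through the Lander-Waterman redundancy C = NL/G, and under random (Poisson) read placement the expected fraction of bases covered by no read is about exp(-C). The sketch below takes the reported figures at face value and assumes that each stated fold coverage refers to the corresponding read count; the helper names are illustrative. On that assumption the draft reads imply a mean read length of about 923 bp, and about 0.6% of the genome would be expected to remain uncovered at 5.1×.

```python
import math

GENOME_BP = 7_184_930  # genome size reported for Pjdr2


def coverage(n_reads, mean_read_bp, genome_bp=GENOME_BP):
    """Lander-Waterman redundancy C = N * L / G."""
    return n_reads * mean_read_bp / genome_bp


def implied_read_length(n_reads, fold_coverage, genome_bp=GENOME_BP):
    """Mean read length implied by a stated coverage, L = C * G / N."""
    return fold_coverage * genome_bp / n_reads


def fraction_uncovered(fold_coverage):
    """Expected fraction of bases hit by no read, approximately exp(-C)."""
    return math.exp(-fold_coverage)


draft_len = implied_read_length(39_689, 5.1)   # about 923 bp
final_len = implied_read_length(45_057, 5.5)   # about 877 bp
gap_fraction = fraction_uncovered(5.1)         # about 0.006, i.e. 0.6% of bases
```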
The finished sequence of Pjdr2 was assembled from 45,057 reads, with an average sequence coverage of 5.5-fold per base and an error rate of less than 1 in 100,000.
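Phred quality scores express per-base error probability on a logarithmic scale, Q = -10 log10(P). The sketch below converts the stated error rate: an error rate below 1 in 100,000 corresponds to Q50, or fewer than about 72 expected errors across the 7,184,930 bp genome. The function names are illustrative.

```python
import math

GENOME_BP = 7_184_930  # genome size reported for Pjdr2


def phred_quality(error_probability):
    """Phred score Q = -10 * log10(P_error)."""
    return -10 * math.log10(error_probability)


def error_probability(q):
    """Inverse conversion, P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


q_finished = phred_quality(1 / 100_000)                      # Q50
max_expected_errors = GENOME_BP * error_probability(q_finished)  # about 72 bases
```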

Genome annotation
Genes were identified using Prodigal [37] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by manual curation using the JGI program GenePRIMP [38]. The predicted CDSs were translated and searched against the following databases to assign a product description to each predicted protein: the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFAMs, Pfam, PRIAM, KEGG, COG and InterPro. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [39], RNAmmer [38], Rfam [40], TMHMM [41] and SignalP [42]. Genome statistics are provided in Table 2, and a circular map of the genome is shown in Figure 3.
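Prodigal scores candidate genes using statistics learned from the input genome, which is well beyond a short example. As a swapped-in and much simpler technique, the sketch below shows a naive six-frame open reading frame (ORF) scan that reports, for each in-frame stop codon, the span from the earliest start codon since the previous stop. The start-codon set and the 100-codon default minimum are illustrative assumptions, not Prodigal parameters.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Illustrative bacterial start codon set (assumption, not Prodigal's model)
START_CODONS = {"ATG", "GTG", "TTG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_on_strand(seq, min_codons):
    """Scan the three reading frames of one strand.

    Returns (frame, start, end) tuples with 0-based, end-exclusive
    coordinates on this strand; end includes the stop codon.
    """
    found = []
    for frame in range(3):
        start = None  # earliest start codon since the last in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif codon in STOP_CODONS:
                if start is not None and (i + 3 - start) // 3 >= min_codons:
                    found.append((frame, start, i + 3))
                start = None
    return found


def find_orfs(seq, min_codons=100):
    """Naive six-frame ORF scan; coordinates of '-' hits refer to the
    reverse-complement string, not the forward sequence."""
    seq = seq.upper()
    hits = [("+",) + orf for orf in orfs_on_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    hits += [("-",) + orf for orf in orfs_on_strand(rc, min_codons)]
    return hits
```

For example, `find_orfs("CC" + "ATG" + "GCT" * 5 + "TAA" + "GG", min_codons=5)` reports a single seven-codon ORF on the forward strand, from position 2 to 23.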

Utilization of lignocellulosics
The nucleotide sequence of a gene cluster that includes the α-glucuronidase gene served as a marker for the sequenced genome. The sequence of this cluster had previously been determined in a cosmid clone of Pjdr2 genomic DNA. The presence of this unique contiguous sequence in a single copy, without orthologs or paralogs, supported the final genome sequence as representative of a single genome from a pure culture. This aldouronate-utilization gene cluster, together with the distal gene encoding a multimodular cell-associated GH10 endoxylanase, constitutes the xylan-utilization regulon previously defined [3]. The coordinate expression of the genes in this regulon supports a process in which extracellular depolymerization by the cell-associated GH10 endoxylanase is coupled to the assimilation of the resulting aldouronate, 4-O-methylglucuronoxylotriose, and to its intracellular metabolism, as previously described [4]. Sequencing of the genome of Paenibacillus sp. strain JDR-2 has allowed further analysis of its xylan-utilization regulon and the identification of similar regulons involved in the depolymerization and utilization of soluble β-glucans.
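The single-copy check on the marker cluster can be illustrated by counting exact matches of a marker sequence on both strands of an assembly. This sketch assumes a circular replicon, so matches may wrap across the end of the sequence string, and counts only identical copies; detecting diverged orthologs or paralogs would instead require a similarity search. The function names and test sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_exact(text, pattern):
    """Count exact, possibly overlapping, occurrences of pattern in text."""
    count, i = 0, text.find(pattern)
    while i != -1:
        count += 1
        i = text.find(pattern, i + 1)
    return count


def marker_copy_number(genome, marker, circular=True):
    """Count exact copies of marker on both strands of an assembled replicon."""
    genome, marker = genome.upper(), marker.upper()
    if circular:
        # Append the first len(marker) - 1 bases so matches can wrap across the end
        genome = genome + genome[: len(marker) - 1]
    copies = count_exact(genome, marker)
    rc = reverse_complement(marker)
    if rc != marker:  # a reverse-palindromic marker would otherwise be counted twice
        copies += count_exact(genome, rc)
    return copies
```

A result of exactly 1 for the marker is what a single-copy cluster in a pure-culture assembly would give; 0 or more than 1 would flag a problem worth examining.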
A noteworthy feature of the genome of Pjdr2 is the large number (874) of genes involved in carbohydrate metabolism and transport constituting 17% of the genome (