Complete Genome Sequence of Paenibacillus strain Y4.12MC10, a Novel Paenibacillus lautus strain Isolated from Obsidian Hot Spring in Yellowstone National Park

Paenibacillus sp. Y412MC10 was one of a number of organisms isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. The isolate was initially classified as Geobacillus sp. Y412MC10 based on its isolation conditions and similarity to other organisms isolated from hot springs at Yellowstone National Park. Comparison of 16S rRNA sequences within the Bacillales indicated that Geobacillus sp. Y412MC10 clustered with Paenibacillus species, and the organism was most closely related to Paenibacillus lautus. Lucigen Corp. prepared genomic DNA and the genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute. The genome sequence was deposited at the NCBI in October 2009 (NC_013406). The genome of Paenibacillus sp. Y412MC10 consists of one circular chromosome of 7,121,665 bp with an average G+C content of 51.2%. Comparison to other Paenibacillus species shows the organism lacks the nitrogen fixation, antibiotic production and social interaction genes reported in other paenibacilli. The Y412MC10 genome shows a high level of synteny and homology to the draft sequence of Paenibacillus sp. HGF5, an organism from the Human Microbiome Project (HMP) Reference Genomes. This, combined with genomic CAZyme analysis, suggests an intestinal, rather than environmental, origin for Y412MC10.


Introduction
Numerous novel microorganisms have been isolated from hot springs in Yellowstone National Park. Many of these organisms have been shown to possess enzymes with significant potential in biotechnological applications [1]. Among the organisms first isolated from Yellowstone hot springs are Thermus aquaticus [2,3], Thermus brockianus [4], Acidothermus cellulolyticus [5], and Synechococcus species [6]. As part of a project in conjunction with the Department of Energy Joint Genome Institute, Lucigen Corp. isolated, characterized, and sequenced a number of new isolates from Yellowstone hot springs. The bacterial isolate Y412MC10 was one of four microorganisms isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA and submitted for whole genome sequencing. Y412MC10 was initially classified as a Geobacillus sp. based on its isolation conditions and morphological similarity to other organisms such as Geobacillus species Y412MC61 (GenBank 544556), Y412MC52 (GenBank 550542), and Geobacillus thermoglucosidasius C56-YS93 (GenBank 634956). The Geobacillus sp. Y412MC10 draft genome sequence was deposited at the NCBI in October 2009 (NC_013406) with the lineage entry indicating that it is a Geobacillus. Following assembly of the complete genome of Y412MC10, the 16S rRNA sequence and genome properties allowed the organism to be properly assigned as a Paenibacillus sp. Y412MC10 represents the first Paenibacillus sp. isolated from a hot spring to have its genome completely sequenced.
Paenibacillus species were grouped in the genus Bacillus until 1993, when Ash et al. [7] proposed that members of "group 3" should be transferred to the genus Paenibacillus, and proposed Paenibacillus polymyxa as the type species. Paenibacillus species have been isolated from a wide range of environments including soil [8], the Antarctic [9], and the oral cavity of a dog [10]. They are of interest for a number of reasons, including production of antibiotics [11-13], biopolymer-degrading enzymes [14-16], and their ability to fix nitrogen [17,18]. One species, P. vortex, shows highly unusual organized growth morphologies on solid surfaces [19,20]; another species, P. dendritiformis, also shows unusual growth morphologies on solid surfaces [19,21,22]. Comparison of the genetic content of Y412MC10 with genomes of Paenibacillus species from other environments will give insights into the evolutionary adaptations that have occurred in the genus Paenibacillus. The organism may also be a source of novel polysaccharide-degrading enzymes for use in biomass degradation.

Classification and features
A phylogenetic tree was constructed to identify the family relationship of strain Y412MC10 (Figure 1). The tree was created using BLAST2Tree software [23]. The analysis was carried out using only type strains of validly-named organisms, and it shows that Y412MC10 does not cluster with known Geobacillus species. Rather, Y412MC10 clusters within the genus Paenibacillus. Based on 16S rRNA analysis of validly-named organisms, Y412MC10 is most closely related to Paenibacillus lautus DSM 3055T (AB073188). The classification of the isolate was confirmed using the EzTaxon-e server [24], again on the basis of 16S rRNA sequence data. When compared against the entire EzTaxon-e 16S rRNA database, Y412MC10 was identified as a strain of Paenibacillus lautus with 99.09% identity and 100% completeness to the 16S rRNA of the type strain, Paenibacillus lautus NRRL NRS-666 (GenBank D78472).
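As an illustration of what a percent-identity figure such as the one above measures, the following is a minimal sketch that scores two pre-aligned sequences. It is not the EzTaxon-e method: the function name `percent_identity`, the rule of skipping any column that contains a gap, and the toy sequences are all assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, skipping columns with a gap.

    Hypothetical helper for illustration only; real servers may treat gaps
    and ambiguous bases differently.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns where both sequences have a base
    matches = 0   # compared columns where the bases agree
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if base_a == base_b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned sequences (not real 16S rRNA data).
toy_query = "ACG-TACGTA"
toy_reference = "ACGTTACGAA"
print(f"{percent_identity(toy_query, toy_reference):.2f}% identity")
```

In this toy case one column is skipped because of the gap, and eight of the remaining nine columns agree.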
Paenibacillus sp. Y412MC10 was one of a number of organisms isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA (44.6100594° latitude and -110.4388217° longitude) under a sampling permit from the National Park Service. The hot spring has a pH of 6.37 and a temperature range of 42-90°C. The organism was isolated from a sample of hot spring water by enrichment and plating at 50°C on YTP-2 medium [25], which contains (per liter) 2.0 g yeast extract, 2.0 g tryptone, 2.0 g sodium pyruvate, 1.0 g KCl, 2.0 g KNO3, 2.0 g Na2HPO4.7H2O, 0.1 g MgSO4, 0.03 g CaCl2, and 2.0 ml clarified tomato juice. Culture stocks were routinely maintained at 37°C on YT agar plates (containing, per liter, 5.0 g yeast extract, 8.0 g tryptone, and 2.5 g NaCl). As part of the sequencing agreement with the Joint Genome Institute, the culture is available without restrictions from the authors. Lucigen, the National Park Service, and the Joint Genome Institute have placed no restrictions on the use of the culture or sequence data.
Y412MC10 is a Gram-positive facultative anaerobe (Table 1) that grows well on a wide variety of standard lab media (YT, TB, LB). On plates, the organism grows as rods or chains of rods (Figure 2A). After growth for 6 days on plates, the cells still appear rod-shaped, but an extracellular matrix appears to surround and bind the individual cells together (light green background, Figure 2B). In liquid culture, the organism also appears to grow as a mixture of single cells and large clumps of cells surrounded by an extracellular matrix (Figure 2C). Prolonged growth on plates or in liquid culture results in sporulation of the culture; spores are subterminal with swollen sporangia.
Figure 2 caption fragment (plate culture): [...] hr. at 37°C. A colony was removed, re-suspended in sterile water and stained using a 5 μM solution of SYTO® 9 fluorescent stain in sterile water (Molecular Probes). Dark field fluorescence microscopy was performed using a Nikon Eclipse TE2000-S epifluorescence microscope at 2000× magnification using a high-pressure Hg light source and a 500 nm emission filter.
Figure 2 caption fragment (liquid culture): [...] rpm. An aliquot was removed and stained using a 5 μM solution of SYTO® 9 fluorescent stain in sterile water (Molecular Probes). Dark field fluorescence microscopy was performed using a Nikon Eclipse TE2000-S epifluorescence microscope at 200× magnification using a high-pressure Hg light source and a 500 nm emission filter.
Table 1 note. Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [41].

Genome sequencing and annotation
Genome project history
Paenibacillus sp. Y412MC10 was selected for sequencing on the basis of its biotechnological potential as part of the U.S. Department of Energy's Genomic Science program (formerly Genomics:GTL). The genome sequence is deposited in the Genomes OnLine Database [42] (GOLD ID = Gc01127), and in GenBank (NCBI Reference Sequence = NC_013406). Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information and its association with MIGS identifiers is shown in Table 2.

Growth conditions and DNA Isolation
For preparation of genomic DNA, liter cultures of Y412MC10 were grown from a single colony in YTP-2 medium at 37°C in flasks agitated at 200 rpm and collected by centrifugation. The cell concentrate was lysed using a combination of SDS and proteinase K, and genomic DNA was isolated using a phenol/chloroform extraction [43]. The genomic DNA was precipitated, and treated with RNase to remove residual contaminating RNA.

Genome sequencing and assembly
The genome of Paenibacillus lautus Y412MC10 was sequenced at the Joint Genome Institute (JGI) [44] using Sanger sequencing with a combination of 6 kb and 34 kb DNA libraries and 454 FLX pyrosequencing done to a depth of 20× coverage [45]. Both libraries provided 5.8× coverage of the genome. Draft assemblies were based on 39,162 total reads. Solexa sequencing data was used to polish the assembly. All general aspects of library construction and sequencing performed at the JGI can be found at their website. The Phred/Phrap/Consed software package [46] was used to assemble 6-kb and fosmid libraries and to assess quality. Possible mis-assemblies were corrected; gaps between contigs were closed by 2,744 primer walks from sub-clones or 83 PCR end reads, 5 mini-libraries, and 10 PCR shatter libraries. The error rate of the completed genome sequence was 0.08, based on 49,558 total reads. Table 2 presents the project information and its association with MIGS version 2.0 compliance [47].
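The coverage figures quoted above relate total sequenced bases to genome length. The following is a minimal sketch of that arithmetic, assuming the simple definition of fold coverage as total bases divided by genome size; the function names are hypothetical, and the base totals it prints are implied by that definition rather than reported in the text.

```python
GENOME_SIZE_BP = 7_121_665  # chromosome length reported for Y412MC10


def fold_coverage(total_bases: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Average fold coverage, assuming coverage = total bases / genome size."""
    return total_bases / genome_size


def bases_for_coverage(coverage: float, genome_size: int = GENOME_SIZE_BP) -> int:
    """Total sequenced bases implied by a given fold coverage (rounded)."""
    return round(coverage * genome_size)


# Bases implied by the coverage depths mentioned in the text.
for label, depth in [("454 FLX pyrosequencing", 20.0), ("Sanger libraries", 5.8)]:
    print(f"{label}: {depth}x implies about {bases_for_coverage(depth):,} bases")
```

The calculation ignores read-length distribution, quality trimming, and uneven coverage, all of which affect real assemblies.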

Genome annotation
Genes were identified using Prodigal [48] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [49]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Noncoding genes and miscellaneous features were predicted using tRNAscan-SE [49], RNAmmer [50], Rfam [51], TMHMM [52], and SignalP [52].

Genome properties
The genome of Paenibacillus lautus Y412MC10 consists of one circular chromosome of 7,121,665 bp with an average G+C content of 51.2% (Table 3 and Figure 3). There are 73 tRNA genes, 24 rRNA genes and 4 "other" identified RNA genes. There are 6,343 predicted protein-coding regions and 105 pseudogenes in the genome. A total of 4,651 genes (72.2%) have been assigned a predicted function while the rest have been designated as hypothetical proteins. The numbers of genes assigned to each COG functional category are listed in Table 4. About 20% of the annotated genes were either not assigned to a COG or have an unknown function.
a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
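The percentage quoted for genes with a predicted function depends on which total is used as the denominator, and the text does not state this explicitly. The sketch below checks two candidates using only the counts given above; the assumption that the total gene count is the sum of protein-coding and RNA genes, and the function name `percent`, are introduced here for illustration.

```python
# Counts taken from the genome properties paragraph.
PROTEIN_CODING = 6343
TRNA_GENES = 73
RRNA_GENES = 24
OTHER_RNA_GENES = 4
GENES_WITH_FUNCTION = 4651

# Assumption: total genes = protein-coding genes + all RNA genes.
TOTAL_GENES = PROTEIN_CODING + TRNA_GENES + RRNA_GENES + OTHER_RNA_GENES


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


pct_of_total_genes = percent(GENES_WITH_FUNCTION, TOTAL_GENES)
pct_of_protein_coding = percent(GENES_WITH_FUNCTION, PROTEIN_CODING)

print(f"Total genes (assumed sum): {TOTAL_GENES}")
print(f"With function, of total genes: {pct_of_total_genes}%")
print(f"With function, of protein-coding genes: {pct_of_protein_coding}%")
```

Under this assumption, the total-gene denominator reproduces the reported 72.2%, while the protein-coding denominator gives a slightly higher value.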

Insights from the genome sequence
Motility of Paenibacillus cells on solid media has been observed in a number of species. P. lautus is reported to spread across plates [69]. P. vortex shows highly unusual organized growth morphologies on solid surfaces [19,20], forming complex patterns on the plate. Another species, P. dendritiformis, also shows unusual growth morphologies on solid surfaces [19,21,22]. Y412MC10 was evaluated for spreading behavior on plates; the results (Figure 4A, Figure 4B, Figure 4C) show definite spreading behavior for Y412MC10. The spreading behavior does not, however, appear to be as complex as that reported for P. vortex and P. dendritiformis.
Figure 5 caption fragment: [...] B-3650, 741161 is a honey bee pathogen that attacks bee larvae.
The 16S rRNA analysis (Figure 5) shows that Y412MC10 is most closely related to Paenibacillus sp. HGF5 (NCBI Taxon ID 908341, Gold ID Gi05716), an organism being sequenced as part of the Human Microbiome Project (HMP) Reference Genomes (http://www.hmpdacc.org). COG (Figure 6) and TIGRfam (Figure 7) whole genome comparisons were carried out between Y412MC10 and draft and finished genomes of closely related organisms using IMG software [56]. The results of the COG and TIGRfam whole genome comparisons place Y412MC10 clearly among the Paenibacillus species, in agreement with the results from the 16S rRNA analysis. The 16S rRNA analysis shows Y412MC10 is most closely related to P. vortex and P. sp. HGF5, both human isolates, and then to the two P. polymyxa sp. In both whole genome analyses, Y412MC10 is again most closely related to Paenibacillus vortex and Paenibacillus sp. HGF5. In the COG comparison, the other human isolate, P. sp. HGF7, is not closely related to P. vortex and P. sp. HGF5; in the TIGRfam comparison, HGF7 clusters closely with Y412MC10, P. vortex and P. sp. HGF5. These results also suggest a mammalian, rather than environmental, ecosystem as the home of Y412MC10. To further understand the relationship between these organisms, whole genome alignments were performed using MUMmer software to generate dot plot diagrams comparing pairs of genomes on the IMG website [57] using input DNA sequences directly (NUCmer). The close relationship between the genome of Y412MC10 and the genomes of HGF5 and P. vortex is reflected in the high levels of homology and synteny seen (Figure 8, Figure 9) with these two human isolates.
Figure 7. TIGRfam whole genome comparison of selected strains. Comparison was performed as described in text; organisms and GenBank accession numbers are described in Figure 5.
In comparison, whole genome alignment of Y412MC10 with the genomes of P. polymyxa and P. mucilaginosus shows little homology or synteny between Y412MC10 and the two soil organisms (Figure 10, Figure 11).
The similarity between the 16S rRNA sequences of P. lautus and Y412MC10 led us to examine whether biochemical evidence suggested a similar habitat for both. Bacillus lautus was first isolated from the intestinal tract of children [58]; later, the identity of the organism was re-confirmed and the organism was reclassified and renamed Paenibacillus lautus. Examination of the genome of Y412MC10 lends support to the hypothesis that Y412MC10 also has an intestinal origin. An analysis of the carbohydrate active enzymes (CAZy [59]) shows very low levels of GH families 5, 6, 8, 9, 10, 11, and 48 as well as no CBM family 2 or 3 members, suggesting an inability to significantly degrade the cellulose and hemicellulose components of biomass. CAZy analysis shows a genome enriched in GH29 and GH95 α-fucosidases; the genome is also enriched in GH38 and GH125 α-mannosidases and GH78 α-L-rhamnosidases. All these enzyme groups attack carbohydrate sidechains attached to eukaryotic glycoproteins; such glycoproteins are found in abundance in intestinal cell walls. CAZy analysis also shows a genome enriched in GH18 chitinases, GH28 polygalacturonases, GH88 unsaturated glucuronyl hydrolases, GH105 unsaturated rhamnogalacturonyl hydrolases and pectate lyase (PL) family members. These enzymes attack dietary fiber components that would be resistant to digestion by most ruminant bacteria, allowing the organism to scavenge sugars from pre-digested dietary sources. The enzymes required for bacillibactin production appear to be present in the genome of Y412MC10; bacillibactin is involved in iron acquisition. Iron is in limited supply in intestinal environments, but is present in large excess (approximately 2 μM Fe2+) in Obsidian hot spring. This again argues for an intestinal origin for the organism. Y412MC10 does not possess the genes usually involved in detoxification of heavy metals and sulfide that are found in other hot spring organisms (unpublished results).
The organism also lacks antibiotic production genes, indicating it comes from an environment with excess resources, typical of the intestine. The growth temperature range and optimum of Y412MC10 are an excellent match for intestinal conditions, but a poor fit for the conditions of Obsidian hot spring, where temperatures average 79±4°C. Nitrogen-fixing Paenibacillus species have been isolated from the rhizosphere, including Paenibacillus brasilensis [60] and Paenibacillus zanthoxyli [61]. Paenibacillus lautus Y412MC10 has no nitrogen-fixing genes; these would be of no advantage for a free-living organism in an intestinal environment.
Complex cooperative behaviors such as those seen with P. dendritiformis [62], and P. vortex [19] are not observed with Y412MC10; again, these behaviors may be unnecessary for survival in the intestine. Formation of external matrices in liquid and solid cultures may be beneficial to Y412MC10 for survival; the matrix may allow attachment of the bacteria to intestinal mucosa.
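The CAZy-based reasoning in this section groups enzyme families by the substrate class they target. The following is a minimal sketch of such a grouping, using only the family-to-substrate associations stated in the text. The category labels, the rule that any family name beginning with "PL" counts as a dietary fiber enzyme, the function name `summarize_cazy`, and the example counts are all hypothetical and are not results from the Y412MC10 genome.

```python
from collections import Counter
from typing import Dict, Mapping

# Family groupings as described in the text; labels are chosen for this sketch.
CATEGORY_BY_FAMILY: Dict[str, str] = {}
for fam in ("GH5", "GH6", "GH8", "GH9", "GH10", "GH11", "GH48", "CBM2", "CBM3"):
    CATEGORY_BY_FAMILY[fam] = "cellulose/hemicellulose"
for fam in ("GH29", "GH95", "GH38", "GH125", "GH78"):
    CATEGORY_BY_FAMILY[fam] = "host glycoprotein side chains"
for fam in ("GH18", "GH28", "GH88", "GH105"):
    CATEGORY_BY_FAMILY[fam] = "dietary fiber"


def categorize(family: str) -> str:
    """Map a CAZy family name to a substrate category."""
    if family in CATEGORY_BY_FAMILY:
        return CATEGORY_BY_FAMILY[family]
    if family.startswith("PL"):
        # Assumption: all polysaccharide lyase families are counted with
        # the pectate lyases mentioned in the text.
        return "dietary fiber"
    return "other"


def summarize_cazy(family_counts: Mapping[str, int]) -> Dict[str, int]:
    """Sum per-family gene counts into per-category totals."""
    totals: Counter = Counter()
    for family, count in family_counts.items():
        totals[categorize(family)] += count
    return dict(totals)


# Hypothetical counts for illustration only.
example_counts = {"GH29": 3, "GH95": 2, "GH5": 1, "PL1": 4, "GH13": 7}
print(summarize_cazy(example_counts))
```

A profile weighted toward the glycoprotein and dietary fiber categories, with few cellulose or hemicellulose families, would correspond to the pattern the text describes.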

Conclusion
Paenibacillus sp. Y412MC10 is the first hot spring Paenibacillus sp. for which a whole genome sequence is available. Based on examination of the enzymes and biochemical pathways present in the organism, 16S rRNA comparison to other sequenced organisms and type strains, and whole genome comparisons, Y412MC10 appears to be of intestinal, rather than environmental, origin. The bison herds that are present around Obsidian hot spring may be the reservoir of this organism; on multiple collection trips, bison dung was seen in and around the pool. The upper growth temperature of 50°C and/or sporulation may have contributed to Y412MC10's survival in this otherwise inhospitable environment. A major need for understanding the relationships among the paenibacilli is both genome sequence information on validly-named type strains and the naming of sequenced strains. The majority of sequenced strains have not been validly named, nor has significant genomic analysis been performed on type strains. The result is two independent phylogenetic trees that cannot be easily overlaid (compare Figure 1 and Figure 5). For both sets of data to be useful, a consensus should be reached on a system for incorporating both.