Genome sequence and description of the heavy metal tolerant bacterium Lysinibacillus sphaericus strain OT4b.31

Lysinibacillus sphaericus strain OT4b.31 is a native Colombian strain that has no larvicidal activity against Culex quinquefasciatus and is widely applied in the bioremediation of heavy-metal polluted environments. Strain OT4b.31 was placed between DNA homology groups III and IV. By gap-filling and alignment steps, we propose a 4,096,672 bp chromosomal scaffold. The whole genome (4,856,302 bp, 94 contigs and 4,846 predicted protein-coding sequences) revealed differences in comparison to the L. sphaericus C3-41 genome, such as syntenial relationships, prophages and putative mosquitocidal toxins. Sphaericolysin B354, the coleopteran toxin Sip1A and heavy metal resistance clusters from nik, ars, czc, cop, chr, czr and cad operons were identified. Lysinibacillus sphaericus OT4b.31 has applications not only in bioremediation efforts, but also in the biological control of agricultural pests.


Introduction
Biological control of vector-borne diseases, such as dengue and malaria, and of agricultural pests has been an issue of special concern in recent years. Since Kellen et al. [1] first described Lysinibacillus sphaericus as an insect pathogen, studies have shown mosquitoes to be the major target of this bacterium [2][3][4], but toxic activity against other species has also been reported [5,6]. L. sphaericus larvicidal toxicity has been attributed to vegetative mosquitocidal toxins (Mtx) [7], the binary toxin (BinA/BinB) [4], the Cry48/Cry49 toxin [8] and, more recently, the S-layer protein [9]. To date, no larvicidal activity has been identified in Lysinibacillus sphaericus OT4b.31 against Culex quinquefasciatus [10].
On the other hand, Lysinibacillus species are potential candidates for heavy metal bioremediation. Some Bacillaceae strains have been successfully isolated from nickel-contaminated soil [11], industrial landfills [12], naturally metalliferous soils [13] and a uranium-mining waste pile [14]. In addition, native Colombian Lysinibacillus strains have been reported as potential metal bioremediators. Strain CBAM5 is resistant to arsenic, up to 200 mM, and contains the arsenate reductase gene [15]. L. sphaericus OT4b.31 showed heavy metal biosorption in living and dead biomass, and the S-layer protein was also shown to be present [16]. We observed 19 mosquitopathogenic L. sphaericus strains and 6 nonpathogenic strains (including OT4b.31) that were able to grow in arsenate, hexavalent chromium and/or lead [17]. The moderate heavy metal tolerance of a Lysinibacillus strain isolated from a non-polluted environment generates interest in characterizing the genomic properties of L. sphaericus OT4b.31, in addition to its biotechnological potential in biological control.
Here we present a summary classification and a set of features for Lysinibacillus sphaericus OT4b.31 including previously unreported aspects of its phenotype, together with the description of the complete genomic sequencing and annotation.

Classification and features
Formerly known as Bacillus sphaericus, the species was defined by its spherical terminal spore and by its inability to ferment sugars [18]. According to physiological and phylogenetic analysis, it was reassigned to the genus Lysinibacillus [19]. Strains of L. sphaericus can be divided into five DNA homology groups (I-V). Some mosquito pathogenic strains are allocated to subgroup II-A, while the species Lysinibacillus fusiformis is in subgroup II-B [20]. Later, according to 16S rDNA and lipid profile comparisons, Lysinibacillus sphaericus sensu lato was classified into seven similarity subgroups, of which only four retained the previous description by Krych et al. [21]. Recently, 16S rDNA phylogenetic analysis placed some mosquito pathogenic native strains in group II, with heterogeneous heavy metal tolerance levels [17].
Partial 16S rRNA gene sequences (1,421 bp) were aligned to establish the phylogenetic neighborhood of Lysinibacillus sphaericus OT4b.31 (Figure 1). The phylogenetic tree was constructed by neighbor-joining [23] using the SEAVIEW [24] and TreeGraph2 [25] packages. Genetic distances were estimated by using the Jukes-Cantor model [23]. The stability of relationships was assessed by bootstrap analysis based on 1,000 resamplings for the tree topology. Interestingly, L. sphaericus OT4b.31 did not fall into any existing DNA similarity group; it was found between DNA similarity groups III and IV [21]. Consistent with Lozano & Dussán [17], L. sphaericus OT4b.31 did not fall into DNA similarity groups I, II or III.
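The Jukes-Cantor correction mentioned above can be illustrated with a short sketch. It converts the observed proportion of differing sites p between two aligned sequences into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The function names and the two toy sequences are hypothetical and are not taken from the 16S rRNA alignment used here; SEAVIEW performs this calculation internally and may treat gaps and ambiguous bases differently.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption of this sketch, not necessarily what SEAVIEW does).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3).

    The correction is undefined for p >= 0.75 (saturation), so infinity
    is returned in that case.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical toy alignment: 10 sites, 1 mismatch.
toy_a = "ACGTACGTAC"
toy_b = "ACGTACGTAT"
p = p_distance(toy_a, toy_b)
d = jukes_cantor(p)
print(f"p-distance = {p:.4f}, Jukes-Cantor distance = {d:.4f}")
```

The corrected distance is always at least as large as the raw p-distance, because the model accounts for multiple substitutions at the same site.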

Genome project history
The genome sequencing of Lysinibacillus sphaericus OT4b.31 was supported by the CIMIC (Centro de Investigaciones Microbiológicas) laboratory at the University of Los Andes under grant 1204-452-21129 of the Instituto Colombiano para el fomento de la Investigación Francisco José de Caldas. Whole genomic DNA extraction and bioinformatics analysis were performed at the CIMIC laboratory, whereas library construction and whole-genome shotgun sequencing were performed at the Beijing Genome Institute (BGI) Americas Laboratory (Tai Po, Hong Kong). The applied pipeline included a quality check of reads, de novo assembly, a gap-filling step and mapping against a reference genome. This whole genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AQPX00000000. The version described in this paper is the first version, AQPX01000000. A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
Lysinibacillus sphaericus strain OT4b.31 was grown in nutrient broth for 16 hours at 30 °C and 150 rev/min. High molecular weight DNA was isolated using the EasyDNA® Kit (Invitrogen, Carlsbad, CA, USA) as indicated by the manufacturer. DNA purity and concentration were determined in a NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE, USA).

Table note: a) Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [38].

Figure note: Lineages with type strain genome sequencing projects registered in GOLD [22] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks.

Reads were trimmed and quality filtered using the FASTX-Toolkit version 0.6.1 [39] and clean_reads version 0.2.3 from the ngs_backbone pipeline [40]. Assembly and scaffolding were then conducted with a de novo assembly pipeline in CLC Assembly Cell version 4.0.10 [41], which included automatic scaffolding and k-mer/overlap optimization steps. Some gaps were successfully filled using GapFiller [42] within 30 iterations; running more iterations closed no further gaps. To gain structural insight into a chromosomal scaffold, we used CONTIGuator.2 [43] with the Lysinibacillus sphaericus strain C3-41 chromosome (accession number CP000817.1) as reference. Gap-filling and mapping to reference sequences were then repeated to confirm convergence. Quality assessment of the assembly was performed with iCORN [44]. The error rate of the final assembly is less than 1 in 1,000,000. Lastly, using PROmer from the MUMmer package [45] and Mauve [46], we compared the chromosomal assembly with the chromosome of L. sphaericus C3-41.
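As a rough illustration of the read trimming step and of the reported error rate, the sketch below converts between Phred quality scores and error probabilities and trims low-quality bases from the 3' end of a read. It is not the algorithm used by the FASTX-Toolkit, clean_reads or iCORN; the Phred+33 quality encoding, the threshold of Q20 and the toy read are assumptions made only for this example.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33):
    """Remove trailing bases whose quality is below min_q.

    `qual` is assumed to be a Phred+33 encoded quality string of the same
    length as `seq`. Trimming stops at the first base (scanning from the
    3' end) that meets the threshold.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string differ in length")
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical read: four high-quality bases ('I' = Q40) followed by
# two low-quality bases ('#' = Q2).
trimmed_seq, trimmed_qual = trim_3prime("ACGTAC", "IIII##")
print(trimmed_seq, trimmed_qual)

# An error rate of 1 in 1,000,000 corresponds to a Phred-scaled quality of about 60.
assembly_q = error_to_phred(1e-6)
print(f"Phred-scaled assembly quality: {assembly_q:.1f}")
```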

Genome annotation
The Glimmer 3 gene finder was used to identify and extract sequences for potential coding regions. Functional annotation was carried out with the RAST server [47] and the Blast2GO pipeline [48]. Blast2GO performed the BLAST, GO mapping and annotation steps, which included a description according to the ProDom, FingerPRINTScan, PIR-PSD, Pfam, TIGRFAMs, PROSITE, SMART, SuperFamily, Pattern, Gene3D, PANTHER, SignalP and TMHMM databases. The results were summarized with InterPro [49]. Additionally, a GO-EnzymeCode mapping step was used to retrieve KEGG pathway maps. tRNA genes were identified by using tRNAscan-SE [50] and rRNA genes by using RNAmmer [51]. The possible orthologs of the genome were identified based on the COG database and classified accordingly [52]. Prophage region prediction was also conducted by using the PHAST tool [53].

Genome properties
The genome summary and statistics are provided in Tables 3 and 4 and Figure 4. The genome consists of 96 scaffolds totaling 4,856,302 bp, with a GC content of 37.5%. A total of 23 scaffolds were successfully aligned to a reference sequence; they comprise 4,096,672 bp of sequence and are represented by the red and blue bars within the outer ring of Figure 4. Of the 4,938 genes predicted, 4,846 were protein-coding genes and 46 were RNA genes; 1,623 pseudogenes were also identified. Genes assigned a putative function comprised 67.13% of the protein-coding genes, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 5.
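The summary figures above follow from simple ratios, sketched below. The constants are the values reported in this section; the function name gc_content and the toy sequence are hypothetical, and the gene counts derived from the 67.13% figure are rounded estimates rather than values taken from the annotation tables.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


# Values reported in the text.
GENOME_SIZE_BP = 4_856_302
CHROMOSOMAL_SCAFFOLD_BP = 4_096_672
PROTEIN_CODING_GENES = 4_846
PUTATIVE_FUNCTION_FRACTION = 0.6713

# Share of the assembly placed on the chromosomal scaffold.
scaffold_fraction = CHROMOSOMAL_SCAFFOLD_BP / GENOME_SIZE_BP

# Approximate split between genes with a putative function and
# hypothetical proteins (rounded, since only a percentage is reported).
genes_with_function = round(PROTEIN_CODING_GENES * PUTATIVE_FUNCTION_FRACTION)
hypothetical_genes = PROTEIN_CODING_GENES - genes_with_function

print(f"GC content of toy sequence: {gc_content('ATGCATAT'):.2%}")
print(f"Assembly on chromosomal scaffold: {scaffold_fraction:.2%}")
print(f"Genes with putative function: ~{genes_with_function}")
print(f"Hypothetical proteins: ~{hypothetical_genes}")
```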

Insights into the genome
To complete the assembly process, a resequencing pipeline was applied that used whole genome sequences as references, together with GC nucleotide skew [(G-C)/(G+C)] analysis. In the first 40 kbp of contig 1, we found dnaX, recR, and holB, while dnaA, recG and recA were found at the end (after 290 kbp) of contig 13. This may suggest that contig 13 should be placed immediately before contig 1. In addition, there was no evidence of multiple dnaA boxes around the potential origin.
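The GC skew statistic (G-C)/(G+C) can be computed over sliding windows, as in the minimal sketch below. In many bacterial chromosomes the cumulative skew tends to reach a minimum near the replication origin and a maximum near the terminus, which is the kind of signal used to orient a chromosomal scaffold. The window size, step, function names and toy sequence are assumptions for illustration and do not reproduce the settings used for OT4b.31.

```python
def gc_skew(window: str) -> float:
    """GC skew (G - C) / (G + C) for one window; 0.0 if it has no G or C."""
    w = window.upper()
    g, c = w.count("G"), w.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_skew(sequence: str, window: int, step: int):
    """Return (start, skew) pairs for consecutive windows along the sequence."""
    return [
        (start, gc_skew(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]


def cumulative_extremes(skews):
    """Window starts where the cumulative skew is lowest and highest.

    The minimum is a candidate origin region and the maximum a candidate
    terminus region; both are heuristics, not proof of either site.
    """
    total = 0.0
    cumulative = []
    for start, value in skews:
        total += value
        cumulative.append((start, total))
    min_start = min(cumulative, key=lambda item: item[1])[0]
    max_start = max(cumulative, key=lambda item: item[1])[0]
    return min_start, max_start


# Hypothetical toy sequence: a C-rich half followed by a G-rich half.
toy = "C" * 40 + "G" * 40
skews = windowed_skew(toy, window=10, step=10)
min_start, max_start = cumulative_extremes(skews)
print("cumulative skew minimum at window start", min_start)
print("cumulative skew maximum at window start", max_start)
```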
The replication termination site of the chromosomal scaffold is believed to be localized near 2.5 Mbp in contig 18, according to GC skew analysis. The coding bias for the two strands of the chromosome is for the majority of CDSs to be on the outer strand from 0 to ~2.5 Mbp and on the inner strand from ~2.5 Mbp to the end of the chromosomal scaffold (contig 26, Figure 4). This was also confirmed by the presence of parC (H131_12178) and parE (H131_12183), which encode the subunits of the chromosome-partitioning enzyme topoisomerase IV [54]. Similar to the L. sphaericus C3-41 genome [55], we did not find the homolog of rtp (replication terminator protein-encoding gene) in the chromosomal assembly of OT4b.31.
Two elements contain conserved domains from the Listeria pathogenicity island LIPI-1, functionally assigned as a thiol-activated cytolysin and a phosphatidylinositol phospholipase C. The first was confirmed to correspond to the L. sphaericus B354 sphaericolysin coding gene in contig 18 (H131_12483). Sphaericolysin B354 has been reported to be widespread across L. sphaericus DNA homology groups, including not only groups IIA, IIB, IV and V [56] but also non-grouped strains such as OT4b.31. Upstream, in the same contig, a Bacillus toxin from the family Mtx2 (Pfam PF03318) was found and described as a hypothetical Sip1A toxin coding sequence (H131_12498). Purified from Bacillus thuringiensis strain EG2158, Sip1A is a secreted insecticidal protein of 38 kDa with activity against the Colorado potato beetle (Leptinotarsa decemlineata) [57]. Considering that L. sphaericus OT4b.31 was isolated from beetle larvae, we suggest potential coleopteran larvicidal activity. To our knowledge, strain OT4b.31 is the first report of a predicted Sip1A-like toxin in a native Lysinibacillus sphaericus.
Unexpectedly, neither mtx nor bin mosquito pathogenic genes were found in the OT4b.31 genome, despite a previous report showing positive evidence of BinA/B toxins with no larvicidal activity [10].
A total of 32 CDSs were described as surface (S) layer proteins or S-layer homologs (SLH). The putative S-layer gene sllB (H131_05299), previously reported in L. sphaericus JG-A12 [58], was found in a 3,696 bp sequence located in contig 8. Three sequences with conserved domains similar to Slp5 and Slp6 were identified in contigs 8 (H131_05339, H131_05344) and 22 (H131_16838). Sequences from Bacillus sp. B-14905 were the most similar for the majority of S-layer protein domains. In addition, a putative glycoprotein (H131_22117), a bifunctional periplasmic precursor (H131_05993) and an S-layer fusion (H131_05409) coding sequence associated with S-layer proteins were recognized.
On the other hand, a cluster of spore germination genes was identified near the replication termination site (including genes from the ger and ype operons), among other genes spread across the genome. Three clusters of sporulation genes were located in contigs 1, 10 and 13 (including genes from the spoII, spoV, yaa and sig operons).
Responses against toxic metal(oid)s in L. sphaericus OT4b.31 could be controlled by efflux pump-related genes organized in clusters on several contigs. The putative coding sequence order is as follows: yozA→czcD→csoR→copZA (contig 1, H131_00045:H131_00065); nikABC→oppD→nikD (contig 17, H131_11103:H131_11123); cadC-like→cadA (contig 24, H131_17086:H131_17081); arsRBC→putative extracellular secreted protein CDS→arsR-like→arsR-like→putative excinuclease CDS (contig 18, H131_11998:H131_12028). The function of YozA is still unknown [59], but it is similar to CzrA and CadC, which belong to the ArsR family of transcriptional regulators. YozA, CsoR (from the copper-sensitive operon), CadC-like and ArsR proteins seem to be the direct regulators of each cluster. At least one additional copy of the ChrA, CzrB and CzcD CDSs was found. We could not find transcriptional regulators upstream of the nik cluster. In summary, L. sphaericus OT4b.31 has protein-encoding sequences probably involved in resistance against Cd, Zn, Co, Cu, Ni, Cr, and As. In fact, prior reports of resistance to toxic metals [16,17] in L. sphaericus OT4b.31 may be explained by the participation of heavy-metal resistance proteins.

Conclusions
The native Colombian strain Lysinibacillus sphaericus OT4b.31, isolated from beetle larvae, is classified between DNA similarity groups III and IV. A comparison of the chromosomal sequences of strain OT4b.31 and its closest complete genome sequence, L. sphaericus C3-41, demonstrates the presence of only a few similar regions with syntenial rearrangements, and no prophage or putative mosquitocidal toxins are shared. Sphaericolysin B354 and the coleopteran toxin Sip1A were predicted in strain OT4b.31, a finding which may be useful not only in bioremediation of polluted environments, but also for biological control of agricultural pests. Finally, Cd, Zn, Co, Cu, Ni, Cr and As resistances are probably supported by efflux pump genes.