High-quality draft genome sequence of nematocidal Bacillus thuringiensis Sbt003

Bacillus thuringiensis is one of the six species of the "Bacillus cereus group" in the genus Bacillus, family Bacillaceae. Strain Sbt003 was isolated from soil and identified as B. thuringiensis. It harbors at least seven plasmids and produces parasporal crystals of three shapes: oval, bipyramidal and rice-shaped. SDS-PAGE analysis of a spore-crystal suspension of this strain reveals six major protein bands, which implies the presence of multiple parasporal crystal genes. Bioassays show that the strain has specific activity against nematodes and human cancer cells. In this study, we report the whole-genome shotgun sequence of Sbt003. The high-quality draft of the genome is 6,175,670 bp long (including chromosome and plasmids) with 6,372 protein-coding and 80 RNA genes.


Introduction
Bacillus thuringiensis, B. cereus, B. anthracis and three other species constitute the "Bacillus cereus group", a nontaxonomic term, within the genus Bacillus and family Bacillaceae [1]. These organisms were classified as separate species mainly on the basis of their distinct phenotypes, although extensive genomic studies of their strains using different techniques have suggested that they form a single species [2][3][4][5]. Strain Sbt003 belongs to the species B. thuringiensis. The type strain of the species produces one or more parasporal crystal proteins showing specific activity against certain larvae from various orders of insects [6]. The specific role of the insecticidal crystal proteins of this species, and the large number of genes encoding them, have attracted much attention from both academic and industrial researchers. Dozens of B. thuringiensis strains have been sequenced, and dozens more are on their way. In this study, we present a summary classification and a set of features for B. thuringiensis Sbt003, together with a description of the genomic sequencing and annotation.

Classification and features
B. thuringiensis strain Sbt003 harbors at least seven plasmids and produces parasporal crystals of three different shapes: oval, bipyramidal and rice-shaped (Figure 1A, Figure 1B and Table 1). SDS-PAGE analysis of a spore-crystal suspension of this strain reveals six major protein bands of 168.8, 148.5, 133.5, 117.2, 107.9 and 103.1 kDa, which implies the presence of multiple parasporal crystal genes (Figure 1C). A representative genomic 16S rDNA sequence of strain Sbt003 was searched against the GenBank database using BLAST [21]. Sequences showing more than 97% identity to the 16S rDNA of Sbt003 were selected for phylogenetic analysis, and a 16S rDNA sequence from B. subtilis subsp. subtilis str. 168 was used as the outgroup. Nine sequences were aligned with the ClustalW algorithm. The tree was reconstructed using the neighbor-joining method with the Kimura 2-parameter substitution model. The phylogenetic tree was assessed with 1,000 bootstrap replicates, and the consensus tree is shown in Figure 2.

Genome sequencing and annotation
Genome project history
This organism was selected for sequencing because of its specific activity against nematodes and human cancer cells. The complete high-quality draft genome sequence is deposited in GenBank. The Beijing Genomics Institute (BGI) performed the sequencing, and NCBI staff used the Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP) to complete the annotation. A summary of the project is given in Table 2.

Growth conditions and DNA isolation
B. thuringiensis Sbt003 was grown in 50 mL Luria broth for 6 hours at 28°C. DNA was isolated by incubating the cells with lysozyme (10 mg/mL) in 2 mL TE (50 mM Tris base, 10 mM EDTA, 20% sucrose, pH 8.0) at 4°C for 6 hours. Then 4 mL of 2% SDS was added and the mixture was incubated at 55°C for 30 min, after which 2 mL of 5 M NaCl was added and the mixture was incubated at 4°C for 10 min. DNA was purified by organic extraction and ethanol precipitation.

Genome sequencing and assembly
The genome of B. thuringiensis Sbt003 was sequenced on the Illumina HiSeq 2000 platform, combining 100-bp paired-end reads from a 500-bp genomic library with 90-bp mate-pair reads from a 2-kb genomic library. Reads with average quality scores below Q30 or with more than 3 unidentified nucleotides were eliminated. The remaining reads were then assembled [22]. The assembly is considered a high-quality draft and consists of 104 contigs arranged in 61 scaffolds with a total size of 6,175,670 bp. By bioinformatic analysis, we identified two large plasmids, one of the ori44 type and one of the repB type. The former has two ori44-type replicons; we propose that it represents a fusion of two plasmids, with an estimated size of about 200 kb. The latter has an expected size of at least 90 kb, according to the sequence of contig0027, which is typical of repB-type plasmids (80 ~ 90 kb). In addition, we identified five other plasmids from the plasmid pattern (see Figure 1A). The expected sizes of the three smaller ones are 13 kb, 8 kb and 4 kb, respectively, while the sizes of the two larger ones could not be deduced either from the plasmid pattern or by bioinformatic analysis.
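The read-filtering rule described above (discard a read if its average quality is below Q30 or if it contains more than 3 unidentified nucleotides) can be sketched as follows. This is not the pipeline used for Sbt003. The function names are hypothetical, and the Phred+33 quality encoding is an assumption, since the text does not state which encoding the data used. How the two reads of a pair were handled together is likewise not specified and is not modeled.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string.

    The offset of 33 (Phred+33) is an assumption; older Illumina
    output may use an offset of 64.
    """
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def keep_read(sequence, quality_string, min_mean_q=30, max_n=3, offset=33):
    """Return True if a read passes both filters stated in the text.

    A read is rejected when its mean quality is below min_mean_q or
    when it contains more than max_n 'N' bases.
    """
    if not sequence or len(sequence) != len(quality_string):
        return False  # malformed record
    if sequence.upper().count("N") > max_n:
        return False
    return mean_phred(quality_string, offset) >= min_mean_q
```

For example, a read whose quality string consists only of `I` characters (Q40 under Phred+33) passes, whereas the same read with four `N` bases is rejected regardless of its quality.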

Genome annotation
Genome annotation was completed using the Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP). Briefly, protein-coding genes were predicted using a combination of GeneMark and Glimmer [23][24][25]. Ribosomal RNAs were predicted by sequence similarity searching using BLAST against an RNA sequence database and/or using Infernal and Rfam models [26,27]. Transfer RNAs were predicted using tRNAscan-SE [28]. To detect missing genes, a complete six-frame translation of the nucleotide sequence was generated and the proteins already predicted were masked. All predictions were then searched using BLAST against all proteins from complete microbial genomes. Annotation was based on comparison to protein clusters and on the BLAST results. Conserved Domain Database and Clusters of Orthologous Groups information was then added to the annotation. No CRISPR repeats were detected, and none of the rRNA operons appears to be complete because of unresolved assembly problems.
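The six-frame translation step mentioned above can be illustrated with a small, self-contained sketch. It is not the PGAAP implementation. It only shows how three forward and three reverse-complement reading frames are translated. The function names are hypothetical, and the codon table uses the standard amino-acid assignments, which the bacterial genetic code (NCBI translation table 11) shares with table 1. Masking of predicted proteins and the subsequent BLAST searches are not shown.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (standard assignments)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string (A, C, G, T only are complemented)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Stop codons appear as `*` in the output, so open reading frames in each frame can be found by splitting the translated string on that character.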

Genome Properties
The high-quality draft assembly of the genome consists of 104 contigs in 61 scaffolds, with an overall G+C content of 35.21%. Of the 6,452 genes predicted, 6,372 were protein-coding genes and 80 were RNA genes. The majority of the protein-coding genes (66.67%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins (Table 3). The distribution of genes into COG functional categories is presented in Table 4.
The whole genomic sequence and the coding sequences of Sbt003 were analyzed with BtToxin_scanner [29], and eight potential crystal protein sequences were identified. Of these, four were considered to be full-length (locus tags: C797_02099, C797_12066, C797_12568 and C797_27783), while the other four were considered to be truncated (locus tags: C797_02094, C797_12046, C797_12061 and C797_18417).
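Summary statistics of the kind reported in this section, such as overall G+C content and total gene count, are simple to recompute from an assembly. The sketch below is illustrative only: the function name is hypothetical, and excluding ambiguous bases from the denominator is an assumption, since the text does not state how the 35.21% figure was calculated. The gene counts are the values given in the text.

```python
def gc_content(contigs):
    """Percent G+C over all unambiguous bases in an iterable of sequences.

    Ambiguous bases such as N are ignored (an assumption).
    """
    gc = at = 0
    for seq in contigs:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


# Gene counts as reported in the text
protein_coding_genes = 6372
rna_genes = 80
total_genes = protein_coding_genes + rna_genes  # matches the 6,452 genes predicted
```

Passing the contig sequences of the assembly to `gc_content` would give a value comparable to the reported figure, subject to the assumption about ambiguous bases.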