Draft genome sequence of Bacillus amyloliquefaciens HB-26

Bacillus amyloliquefaciens HB-26, a Gram-positive bacterium, was isolated from soil in China. SDS-PAGE analysis showed that this strain secretes proteins forming six major bands of 65, 60, 55, 34, 25 and 20 kDa. Bioassays showed that the strain has specific activity against Plasmodiophora brassicae and nematodes. Here we describe the features of this organism, together with the draft genome sequence and annotation. The 3,989,358 bp genome (39 contigs) contains 4,001 protein-coding genes and 80 RNA genes.


Introduction
Bacillus amyloliquefaciens is a species of bacterium in the genus Bacillus that is closely related to Bacillus subtilis. During growth, B. amyloliquefaciens can produce numerous antimicrobial or, more generally, bioactive metabolites with well-established activity in vitro, such as surfactin, iturin and fengycin [1,2]. The production of these antibiotic compounds makes B. amyloliquefaciens a good candidate for the development of biocontrol agents [3,4].
Strain HB-26 belongs to the species B. amyloliquefaciens. The type strain of the species produces many bioactive metabolites with specific activity against Plasmodiophora brassicae, the cause of clubroot, one of the most serious diseases of brassica crops worldwide [5][6][7]. Heavy infection of Chinese cabbage, cabbage, broccoli, turnip, oilseed rape, and other crucifers by this pathogen can lead to severe economic losses [8][9][10][11]. The root systems of infected plants show gall formation, which inhibits nutrient and water transport, stunts plant growth, and increases susceptibility to wilting [12,13]. In addition, bioassay results showed that strain HB-26 also has some nematicidal activity against root-knot nematodes.
Here, we present a summary classification and a set of features for B. amyloliquefaciens HB-26, together with a description of the genome sequencing and annotation, in order to improve understanding of the molecular basis of its ability to inhibit Plasmodiophora brassicae and nematodes.

Classification and features
Strain HB-26 colonies were milky white and matte with a wrinkled surface. Microscopic observation indicated that it was a Bacillus species (Figure 1A, Figure 1B and Table 1). SDS-PAGE analysis showed that this strain secretes proteins forming six major bands of 65, 60, 55, 34, 25 and 20 kDa (Figure 1C). A representative genomic 16S rDNA sequence of strain HB-26 was searched against the GenBank database using BLAST [29]. Sequences showing more than 99% identity to the 16S rDNA of HB-26 were selected for phylogenetic analysis, and 15 sequences were aligned with the ClustalW algorithm. The tree was reconstructed by the neighbor-joining method using Kimura 2-parameter distances. The tree was assessed with 1,000 bootstrap replicates, and the consensus tree is shown in Figure 2.
Standards in Genomic Sciences
Sample collection time 2009 IDA
a Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [28].
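The Kimura 2-parameter distance used for the tree can be illustrated with a short sketch. For two aligned sequences with a proportion P of transitional differences and Q of transversional differences, the distance is d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)). The function name `k2p_distance` is hypothetical, and skipping sites with gaps or ambiguous bases is an assumption; the source does not state how such sites were treated or which software computed the distances.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Assumption: sites where either sequence has a gap or an ambiguous
    base are ignored (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: skip this site
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term_1 = 1 - 2 * p - q
    term_2 = 1 - 2 * q
    if term_1 <= 0 or term_2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_1 * math.sqrt(term_2))
```

For 16S rDNA sequences sharing more than 99% identity, as here, the corrected distance stays very close to the raw proportion of differing sites.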

Genome sequencing information
Genome project history
This Bacillus strain was selected for sequencing because of its specific activity against Plasmodiophora brassicae and nematodes. The high-quality draft genome sequence has been deposited in GenBank. The Beijing Genomics Institute (BGI) performed the sequencing, and NCBI staff used the Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP) to complete the annotation. A summary of the project is given in Table 2.

Genome sequencing and assembly
The genome of B. amyloliquefaciens HB-26 was sequenced on the Illumina HiSeq 2000 platform (251-bp paired-end reads from a 700-bp genomic library). Reads with an average quality score below Q30 or with more than 3 unidentified nucleotides were eliminated. A total of 2,605,589 paired-end reads (achieving ~192-fold coverage [0.94 Gb]) were assembled de novo using SOAPdenovo version 1.05 [9]. The assembly consists of 39 contigs arranged in 39 scaffolds with a total size of 3,989,358 bp (including chromosome and plasmids).
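The read-filtering criteria described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function names are hypothetical, quality strings are assumed to be Phred+33 encoded, and discarding a pair when either mate fails is an assumption that the source does not confirm.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def keep_read(sequence: str, quality: str,
              min_mean_q: float = 30.0, max_n: int = 3) -> bool:
    """Return True if a read passes both filters stated in the text:
    mean quality of at least Q30 and no more than 3 unidentified bases (N).
    """
    if not sequence or len(sequence) != len(quality):
        return False  # malformed record
    if sequence.upper().count("N") > max_n:
        return False
    return mean_phred(quality) >= min_mean_q


def keep_pair(read_1: tuple, read_2: tuple) -> bool:
    """Keep a read pair only if both mates pass (assumed policy).

    Each read is a (sequence, quality) tuple.
    """
    return keep_read(*read_1) and keep_read(*read_2)
```

Filtering on the mean score is lenient towards reads with a few poor bases at the 3' end; per-base trimming would be a different technique and is not what the text describes.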

Genome annotation
Genome annotation was completed using the Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP). Briefly, protein-coding genes were predicted using a combination of GeneMark and Glimmer [31][32][33]. Ribosomal RNAs were predicted by sequence similarity searching using BLAST against an RNA sequence database and/or using Infernal and Rfam models [34,35]. Transfer RNAs were predicted using tRNAscan-SE [36]. To detect missing genes, a complete six-frame translation of the nucleotide sequence was performed and the proteins predicted above were masked. All predictions were then searched using BLAST against all proteins from complete microbial genomes. Annotation was based on comparison to protein clusters and on the BLAST results. Conserved Domain Database and Clusters of Orthologous Groups information was then added to the annotation.
CRISPR repeats 0 0
a The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
b None of the rRNA operons appears to be complete due to unresolved assembly problems.
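The six-frame translation step mentioned in the annotation procedure can be illustrated with a short sketch. It is not PGAAP code: the function names are hypothetical, the codon table is the standard amino-acid assignment (which bacterial translation table 11 shares, differing only in permitted start codons), and the masking of already-predicted proteins is not shown.

```python
from itertools import product

# Standard codon table in NCBI order (bases T, C, A, G; third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Return translations for frames +1..+3 (forward) and -1..-3 (reverse)."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

Stop codons appear as `*` in the output, so candidate open reading frames in any frame can be located by splitting each translated string on that character.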

Genome properties
The draft assembly of the genome consists of 39 contigs in 39 scaffolds, with an overall G+C content of 47.37%. Of the 4,114 genes predicted, 4,001 were protein-coding genes, and 80 RNA genes were also identified. The majority of the protein-coding genes (54.06%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 3, Table 4 and Figure 3.
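An overall G+C content such as the one reported above is a length-weighted percentage over all contigs. The sketch below shows one way to compute it; the function name is hypothetical, and excluding ambiguous bases (such as N) from the denominator is an assumption, since the source does not say how they were counted.

```python
def gc_content(contigs) -> float:
    """Overall G+C percentage across an iterable of contig sequences.

    Assumption: only unambiguous bases (A, C, G, T) count towards the total.
    """
    gc = total = 0
    for sequence in contigs:
        sequence = sequence.upper()
        gc += sequence.count("G") + sequence.count("C")
        total += sum(sequence.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / total
```

Summing counts over all contigs before dividing weights each contig by its length, whereas averaging per-contig percentages would give short contigs too much influence.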

Acknowledgments
This work was financially supported by the National Science and Technology Support Program