Complete genome sequence of a plant-associated bacterium, Bacillus amyloliquefaciens subsp. plantarum UCMB5033

Bacillus amyloliquefaciens subsp. plantarum UCMB5033 is of special interest for its ability to promote host plant growth through the production of stimulating compounds and to suppress soil-borne pathogens, either by synthesizing antibacterial and antifungal metabolites or by priming plant defense through induced systemic resistance. The genome of B. amyloliquefaciens UCMB5033 comprises a 4,071,167 bp circular chromosome that contains 3,912 protein-coding genes, 86 tRNA genes and 10 rRNA operons.


Introduction
Bacillus amyloliquefaciens is a plant-associated species belonging to the family Bacillaceae. The members of the genus Bacillus are ubiquitous in nature and include biologically and ecologically diverse species, ranging from those that benefit economically important plants to pathogenic species that are harmful to humans. B. amyloliquefaciens UCMB5033 is a plant growth promoting bacterium (PGPB) that was isolated from a cotton plant [1]. Studies have shown that B. amyloliquefaciens UCMB5033 has the potential to confer protection against soil-borne pathogens and to stimulate growth of oilseed rape (Brassica napus) [2]. Such traits make UCMB5033 an important tool for studies of plant-bacteria associations and of the production of compounds that directly or indirectly promote plant growth or stress tolerance. Here we describe the complete genome sequence of B. amyloliquefaciens UCMB5033 and its annotation.

Classification and features
Strain UCMB5033 was identified as a member of the B. amyloliquefaciens group based on phenotypic analysis [1]. Comparison of its 16S rRNA gene sequence with the most recent GenBank databases, using NCBI BLAST [3] under default settings, showed that B. amyloliquefaciens UCMB5033 shares 99% identity with many Bacillus species, including Bacillus atrophaeus (CP002207.1) and Bacillus subtilis subsp. spizizenii str. W23 (CP002183.1). Figure 1 shows the phylogenetic relationship of B. amyloliquefaciens UCMB5033 with other species within the genus Bacillus. The tree highlights the close relationship of UCMB5033 with the B. amyloliquefaciens subsp. plantarum type strain FZB42. The other B. amyloliquefaciens type strain, DSM 7T, representing subsp. amyloliquefaciens, was more distantly related. Strain UCMB5033 can thus be regarded as belonging to subsp. plantarum, which is also in line with its plant-associated characteristics [7]. The tree is based on 16S rRNA gene sequences aligned with MUSCLE [4]; it was inferred under the maximum likelihood criterion using MEGA5 [5] and rooted with Geobacillus thermoglucosidasius (a member of the family Bacillaceae). The numbers above the branches are support values from 1,000 bootstrap replicates, shown if larger than 50% [6].
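As an illustration of what a figure such as "99% identity" measures, the following minimal Python sketch computes percent identity over two already-aligned sequences as matching columns divided by alignment length, which is the convention BLAST reports. The function name `percent_identity` and the toy sequences are hypothetical and are not taken from the study; real 16S rRNA comparisons would use the BLAST output directly.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns (matches / alignment length)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap character.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no usable columns")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy aligned fragments (hypothetical, not real 16S rRNA sequences).
toy_a = "ACGTACGTAC"
toy_b = "ACGTACGTTC"
toy_identity = percent_identity(toy_a, toy_b)
print(f"{toy_identity:.1f}% identity")
```

Because a 16S rRNA gene is only about 1,500 bp long, 99% identity still allows roughly 15 mismatches, which is one reason this marker alone cannot separate closely related Bacillus species.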

Morphology and physiology
B. amyloliquefaciens UCMB5033 is a Gram-positive, rod-shaped, motile, spore-forming, aerobic, and mesophilic microorganism (Table 1). Cells of strain UCMB5033 are approximately 0.8 µm wide and 2 µm long, and the strain grows on Luria Broth (LB) and potato dextrose agar (PDA) between 20 °C and 37 °C within the pH range 4-8. B. amyloliquefaciens UCMB5033 has the properties of a plant growth promoting rhizobacterium (PGPR) [2]. Its ability to catabolize plant-derived compounds, its resistance to metals and drugs, its root colonization, and its biosynthesis of metabolites presumably give B. amyloliquefaciens UCMB5033 an advantage in developing a symbiotic relationship with plants in competition with other microorganisms in the soil microbiota.

Genome assembly and annotation
Growth conditions and DNA isolation
B. amyloliquefaciens UCMB5033 was grown in LB medium at 28 °C for 12 hours (cells were in the early stationary phase). The genomic DNA was isolated using a QIAamp DNA mini kit (Qiagen).

Genome sequencing
B. amyloliquefaciens UCMB5033, originally isolated from a cotton plant, was selected for sequencing on the basis of its ability to promote rapeseed growth and inhibit soil-borne pathogens. Genome sequencing of B. amyloliquefaciens UCMB5033 using Illumina multiplex technology and Ion Torrent PGM systems was performed by the Science for Life Laboratory (SciLifeLab) at Uppsala University. The genome project is deposited in the Genomes OnLine Database [24], and the complete genome sequence is deposited in the ENA database under accession number HG328253. A summary of the project information and its association with MIGS identifiers is shown in Table 2.

Genome assembly
The genome of B. amyloliquefaciens UCMB5033 was assembled using 21,919,534 Illumina paired-end reads (75 bp) and 1,922,725 single-end reads (Ion Torrent). The 4,071,167 bp chromosome was assembled by providing the paired-end reads to MIRA v.3.4 [25] for reference-guided assembly against the available genome sequence of B. amyloliquefaciens UCMB5036 (accession no. HF563562) [26]. The single-end reads, in contrast, were assembled de novo with Newbler v.2.8. The two assemblies were aligned with the Mauve genome alignment software [27] and compared in order to identify indels and to cover gap regions.

... not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [23].
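The Illumina read count and read length reported for the assembly allow a rough estimate of sequencing depth. The Python sketch below applies the usual approximation, coverage = (number of reads × read length) / genome size. It assumes that 21,919,534 is the total number of individual 75 bp reads (not read pairs) and that all reads are full length; neither assumption is confirmed by the text, and the constant and function names are illustrative. The Ion Torrent depth is not estimated because no read length is reported for those reads.

```python
# Values taken from the text; the interpretation of the read count as
# individual reads (not pairs) is an assumption.
GENOME_SIZE_BP = 4_071_167
ILLUMINA_READS = 21_919_534
ILLUMINA_READ_LEN_BP = 75


def mean_coverage(n_reads: int, read_len_bp: int, genome_size_bp: int) -> float:
    """Expected mean depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_len_bp / genome_size_bp


illumina_coverage = mean_coverage(
    ILLUMINA_READS, ILLUMINA_READ_LEN_BP, GENOME_SIZE_BP
)
print(f"Approximate Illumina coverage: {illumina_coverage:.0f}x")
```

Under these assumptions the Illumina data correspond to roughly 400-fold coverage of the chromosome. If the reported figure counts read pairs instead, the estimate would double.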

Genome annotation
The genome sequence was annotated using a combination of several annotation tools via the Magnifying Genome (MaGe) Annotation Platform [28]. Genes were identified using Prodigal [29] and AMIGene [30] as part of the MaGe genome annotation pipeline, followed by manual curation. Putative functional annotation of the predicted protein-coding genes was done automatically by MaGe after BLASTP similarity searches against the UniProt and TrEMBL, TIGR-Fam, Pfam, PRIAM, COG and InterPro databases. The tRNAscan-SE tool [31] was used to find tRNA genes, and ribosomal RNA genes were identified using the RNAmmer tool [32].

Genome properties
The B. amyloliquefaciens UCMB5033 genome consists of a circular chromosome of 4,071,167 bp. The genome has a G+C content of 46.19% and was predicted to contain 4,095 ORFs, including 10 copies each of the 16S, 23S, and 5S rRNA genes, 86 tRNA genes, and 3,912 protein-coding sequences, with a coding density of 87.51% (Figure 2). The majority of protein-coding genes (81%) were assigned putative functions, while the remaining genes were annotated as hypothetical or conserved hypothetical proteins (Table 3). The distribution into COG functional categories is presented in Table 4.

a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
b) Also includes 36 pseudogenes and 66 non-coding RNA.

a) The total is based on the total number of protein coding genes in the annotated genome.
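The summary statistics in this section can be turned into absolute numbers with simple arithmetic. The Python sketch below is illustrative only: it uses the chromosome size of 4,071,167 bp given in the abstract and the assembly section, derives the approximate number of coding bases and of protein-coding genes with a putative function from the reported percentages, and includes a small helper, `gc_content`, showing how a G+C percentage is computed from a sequence string. The helper and constant names are hypothetical, the toy sequence is not genomic data, and the derived counts are rounded estimates, not values reported by the authors.

```python
def gc_content(sequence: str) -> float:
    """G+C percentage of a nucleotide sequence, counting only A, C, G and T."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


# Values reported in the text.
GENOME_SIZE_BP = 4_071_167
CODING_DENSITY = 0.8751           # 87.51%
PROTEIN_CODING_GENES = 3_912
FRACTION_WITH_FUNCTION = 0.81     # 81%

# Derived, rounded estimates (not reported in the source).
coding_bp = round(GENOME_SIZE_BP * CODING_DENSITY)
genes_with_function = round(PROTEIN_CODING_GENES * FRACTION_WITH_FUNCTION)
genes_hypothetical = PROTEIN_CODING_GENES - genes_with_function

print(f"Coding bases: ~{coding_bp:,} bp")
print(f"Genes with putative function: ~{genes_with_function:,}")
print(f"Hypothetical or conserved hypothetical: ~{genes_hypothetical:,}")
print(f"Toy G+C example: {gc_content('GGCCAATT'):.1f}%")
```

Applied to the full chromosome sequence (accession HG328253), a function like `gc_content` would be expected to give a value close to the reported 46.19%, although this has not been checked here.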

Conclusion
Comparative genome analysis might reveal the mechanisms by which UCMB5033 mediates plant protection and growth promotion. It should also enable further investigation of the biochemical and regulatory mechanisms behind the symbiotic relationship and shed light on the activity of PGPR in different environments.