Non-contiguous finished genome sequence of plant-growth promoting Serratia proteamaculans S4

Serratia proteamaculans S4 (previously Serratia sp. S4), isolated from the rhizosphere of wild Equisetum sp., stimulates plant growth and suppresses the growth of several soil-borne fungal pathogens of economically important crops. Here we present the non-contiguous finished genome sequence of S. proteamaculans S4, which consists of a 5,324,944 bp circular chromosome and a 129,797 bp circular plasmid. The chromosome contains 5,008 predicted genes while the plasmid comprises 134 predicted genes. In total, 4,993 genes are annotated as protein-coding. The genome contains 22 rRNA genes, 82 tRNA genes and 58 pseudogenes. This genome is part of the project “Genomics of four rapeseed plant growth-promoting bacteria with antagonistic effect on plant pathogens”, awarded through the 2010 DOE-JGI Community Sequencing Program.


Introduction
The genus Serratia is a diverse and widely dispersed group of Gammaproteobacteria [1,2]. Some members have beneficial effects on ecologically and economically important plants [3,4], while others are known as opportunistic pathogens of humans and other organisms [1]. Plant-associated Serratia spp. are of considerable agricultural interest, and several strains of S. plymuthica have recently been studied for their possible use as biocontrol agents in agriculture [3,4]. Serratia proteamaculans S4 (previously Serratia sp. S4) was isolated in 1980 from the rhizosphere of naturally growing Equisetum plants in Uppsala, Sweden. The bacterium enhances the growth of rapeseed plants and inhibits the growth of fungal pathogens such as Verticillium dahliae and Rhizoctonia solani. Sequencing the S. proteamaculans S4 genome will therefore assist in identifying the genetic traits underlying its biocontrol potential and its beneficial effects on plant growth. Here we present the non-contiguous finished genome sequence of S. proteamaculans S4.

Classification and features
A representative 16S rRNA gene sequence of S. proteamaculans S4 was compared with the most recently released databases in GenBank. The NCBI BLAST [5] tool was used under the default settings (i.e. by considering only the high-scoring segment pairs (HSPs) from the best 250 hits). The most frequently matching genus was Serratia (almost 50% of total matches). Ranked by score, coverage and identity, S. proteamaculans 568 was the top match, with 100% identity and 100% coverage. Other Serratia matches were other S. proteamaculans strains (10%) with maximum identity 99%, S. fonticola (2%) with maximum identity 98%, S. grimesii (3.2%) with maximum identity 99%, S. liquefaciens (4.4%) with maximum identity 99%, S. plymuthica (3.2%) with maximum identity 98-99% and unclassified Serratia sp. (22%) with maximum identity 98-99%. The remaining matches were with Rahnella sp. (2%) with maximum identity 98-99% and other uncultured bacterial clones (40%) with maximum identity 98-99%. Figure 1 shows the phylogenetic proximity of S. proteamaculans S4 to S. proteamaculans 568 (CP000826) as well as its distinct separation from other members of the Enterobacteriaceae. This phylogenetic relationship was further confirmed by digital DNA-DNA hybridization [10] values above 70% with the genome sequence of S. proteamaculans 568, calculated using the GGDC web server [11].
Serratia proteamaculans S4, a Gram-negative, rod-shaped, non-sporulating and motile bacterium measuring 1-2 µm in length and 0.5-0.7 µm in width [Figure 2], was isolated from Equisetum roots. The bacterium is pale yellow, a facultative aerobe, and grows readily on a broad spectrum of organic compounds, including carbon sources such as glucose, sucrose, succinate, mannitol, inositol, sorbitol, arabinose, trehalose, and melibiose. The optimal temperature for its growth is 28 °C and it can grow in the pH range 4-10 [Table 1].
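The genus-level breakdown of BLAST matches reported above amounts to a simple tally over the organism names of the hits. The following is a minimal sketch of such a tally, not the procedure used by the authors; the function name `genus_fractions` and the toy input list are hypothetical and do not reproduce the actual hit list.

```python
from collections import Counter
from typing import Dict, Iterable


def genus_fractions(hit_names: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of hits per genus (first word of each organism name).

    Names starting with 'uncultured' are pooled under a single label,
    because they carry no genus-level assignment.
    """
    counts: Counter = Counter()
    for name in hit_names:
        tokens = name.split()
        if not tokens:
            continue  # skip empty names
        first = tokens[0]
        genus = "uncultured" if first.lower() == "uncultured" else first
        counts[genus] += 1
    total = sum(counts.values())
    return {genus: n / total for genus, n in counts.items()} if total else {}


# Toy input for illustration only (hypothetical, not the real BLAST output).
toy_hits = [
    "Serratia proteamaculans 568",
    "Serratia sp.",
    "Rahnella sp.",
    "uncultured bacterium clone",
]
fractions = genus_fractions(toy_hits)
for genus, fraction in sorted(fractions.items()):
    print(f"{genus}: {fraction:.0%}")
```

With a real hit table, the organism names would first have to be extracted from the BLAST report; that parsing step is deliberately left out here.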

Genome sequencing information
Serratia proteamaculans S4 was selected for sequencing because of its biological control potential and plant growth-enhancing activity in rapeseed crops. The genome project is deposited in the Genomes OnLine Database (GOLD) [9]. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2 together with associated MIGS identifiers [12].

Growth conditions and DNA isolation
Serratia proteamaculans S4 was grown on Luria Broth (LB) medium for 12 hours at 28 °C. The DNA was extracted from the cells by using a standard CTAB protocol for bacterial genomic DNA isolation, which is available at JGI [26].

Figure 1 (caption, beginning not recovered): [...] [6] under the default settings. The tree was constructed using MEGA5 software [7] under the maximum likelihood criterion and was rooted with Azotobacter vinelandii (a member of the family Pseudomonadaceae). The branches are scaled according to the expected number of substitutions per site. The numbers above the branches are support values from 1,000 bootstrap replicates if larger than 60% [8]. All lineages with genome sequences are registered in GOLD [9].

Table 1 (fragment): Altitude 58-63 m (evidence code NAS).
a) Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [25]. If the evidence code is IDA, then the property should have been directly observed, for the purpose of this specific publication, for a live isolate by one of the authors, or an expert or reputable institution mentioned in the acknowledgements.

Genome sequencing and assembly
The draft genome of S. proteamaculans S4 was generated using a combination of Illumina and 454 sequencing platforms. The details of library construction and sequencing are available at the JGI [26]. The sequence data generated from Illumina GAii (4,232 Mb) were assembled with Velvet [27] and the consensus sequence was computationally shredded into 1.5 kb overlapping fake reads. The sequence data generated from 454 pyrosequencing (89.5 Mb) were assembled with Newbler and the consensus sequences were computationally shredded into 2 kb overlapping fake reads. The initial draft assembly contained 50 contigs in 2 scaffolds. The 454 Newbler consensus reads, the Illumina Velvet consensus reads and the read pairs in the 454 paired-end library were integrated using parallel Phrap [28,29]. Consed [30] was used for the subsequent finishing process, and Polisher [31] was used to correct base errors and increase the consensus quality. Possible mis-assemblies were corrected with gapResolution ([26], unpublished), Dupfinisher [32]
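The shredding step described above, in which an assembled consensus is cut into fixed-length overlapping pseudo-reads before re-assembly, can be illustrated with a short sketch. The 1.5 kb read length comes from the text; the amount of overlap is not stated, so the `overlap` default of 500 bp is an assumption, and the function name `shred` is hypothetical. This is not the JGI implementation.

```python
from typing import List


def shred(consensus: str, read_len: int = 1500, overlap: int = 500) -> List[str]:
    """Cut a consensus sequence into overlapping pseudo-reads ("fake reads").

    read_len: length of each pseudo-read (1.5 kb for the Velvet consensus).
    overlap:  bases shared by consecutive pseudo-reads (assumed value).
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be non-negative and smaller than read_len")
    step = read_len - overlap
    reads = []
    for start in range(0, len(consensus), step):
        reads.append(consensus[start:start + read_len])
        if start + read_len >= len(consensus):
            break  # this window already reaches the end of the consensus
    return reads


# Usage on a synthetic 4 kb sequence (placeholder data, not genome sequence).
example_reads = shred("ACGT" * 1000, read_len=1500, overlap=500)
print([len(read) for read in example_reads])
```

The final pseudo-read may be shorter than `read_len`; whether such short tails are kept or discarded in practice is a design choice not specified in the text.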

Genome annotation
The S. proteamaculans S4 genes were identified using Prodigal [33] as part of the DOE-JGI annotation pipeline [34], followed by a round of manual curation using the JGI GenePRIMP pipeline [35]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [36], RNAmmer [37], Rfam [38], TMHMM [39], and SignalP [40]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes-Expert Review (IMG-ER) [41] platform developed by the Joint Genome Institute, Walnut Creek, CA, USA.

Genome properties
The genome includes a circular chromosome of 5,324,944 bp (55% GC content) along with a circular plasmid of 129,797 bp (50% GC content). The chromosome comprises 5,008 predicted genes while the plasmid comprises 137 predicted genes. In total, 4,993 genes are annotated as protein-coding. About 85% of the protein-coding genes were assigned a putative function, with the remainder annotated as hypothetical proteins. The genome contains 22 rRNA genes, 82 tRNA genes and 58 pseudogenes. The properties and statistics of the genome are summarized in Tables 3 and 4 and Figures 3a and 3b. The genome contains several gene clusters encoding secondary metabolites such as siderophores (enterobactin and aerobactin) and antibiotics (pyrrolnitrin). These compounds can contribute indirectly to plant growth enhancement by suppressing the growth of pathogens. The genome also includes genes for the production of plant growth hormones such as indole-3-acetic acid (IAA), which can be directly involved in plant growth. Further functional genomics studies of the biochemical properties of additional secondary metabolites, and of the regulation of their production, will elucidate the detailed mechanisms underlying plant growth promotion by S. proteamaculans S4.

Table footnotes:
a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
a) The total is based on the total number of protein coding genes in the annotated genome.
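As a worked check of the replicon figures given above, the total genome size and a length-weighted overall GC content can be derived from the two replicons. This is a minimal sketch; the names `REPLICONS`, `genome_size` and `weighted_gc` are hypothetical, and because the per-replicon GC values are the rounded percentages from the text, the combined GC value is only approximate.

```python
# Replicon sizes and (rounded) GC fractions as stated in the text.
REPLICONS = {
    "chromosome": {"length_bp": 5_324_944, "gc": 0.55},
    "plasmid": {"length_bp": 129_797, "gc": 0.50},
}


def genome_size(replicons: dict) -> int:
    """Sum the lengths of all replicons, in base pairs."""
    return sum(r["length_bp"] for r in replicons.values())


def weighted_gc(replicons: dict) -> float:
    """Length-weighted mean GC fraction across replicons (approximate,
    since the input GC values are rounded)."""
    gc_bases = sum(r["length_bp"] * r["gc"] for r in replicons.values())
    return gc_bases / genome_size(replicons)


total_bp = genome_size(REPLICONS)
overall_gc = weighted_gc(REPLICONS)
print(f"Total genome size: {total_bp:,} bp")  # 5,454,741 bp
print(f"Approximate overall GC content: {overall_gc:.1%}")
```

Because the plasmid accounts for only about 2.4% of the genome, the overall GC content stays close to the chromosomal value.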