Permanent draft genome sequences of the symbiotic nitrogen-fixing Ensifer meliloti strains BO21CC and AK58

Ensifer (syn. Sinorhizobium) meliloti is an important symbiotic nitrogen-fixing bacterial species. Strains BO21CC and AK58 were previously investigated for their substrate utilization and plant-growth promoting abilities, and were found to differ in both genomic and phenotypic properties. Here, we describe the draft genome sequences and annotation of these strains. The BO21CC and AK58 genomes are 6,985,065 and 6,974,333 bp long, with 6,746 and 6,992 predicted genes, respectively.


Introduction
Strains AK58 and BO21CC belong to the species Ensifer (syn. Sinorhizobium) meliloti (Alphaproteobacteria, Rhizobiales, Rhizobiaceae, Sinorhizobium/Ensifer group) [1,2], an important symbiotic nitrogen-fixing bacterial species that associates with the roots of leguminous plants of several genera, mainly Melilotus, Medicago and Trigonella [3]. Both strains were originally isolated from Medicago spp.: BO21CC during a long-course experiment, and AK58 from plants collected in the northern Aral Sea region (Kazakhstan). Previous analyses by comparative genomic hybridization (CGH), nodulation tests and Phenotype Microarray™ (Biolog, Inc.) showed that AK58 (= DSM 23808) and BO21CC (= DSM 23809) are highly diverse in both genomic and phenotypic properties. In particular, they show different symbiotic phenotypes with respect to the crop legume Medicago sativa L. [4,5]. In a previous collaboration with DOE-JGI, the genomes of strains AK83 (= DSM 23913) and BL225C (= DSM 23914) were also sequenced, allowing the identification of putative genetic determinants of their different symbiotic phenotypes [6]. Interest in strains AK58 and BO21CC arose because genomic analysis of these strains would foster a greater understanding of the E. meliloti pangenome [7] and facilitate deeper investigation of the genomic determinants responsible for differences in symbiotic performance among E. meliloti strains found in nature. These research goals may lead to improved strain selection and better inoculants for the legume crop M. sativa.

Classification and features
Representative genomic 16S rRNA sequences of strains AK58 and BO21CC were compared with those present in the Ribosomal Database by using the Match Sequence module of the Ribosomal Database Project [8]. Representative genomic 16S rRNA sequences of close phylogenetic relatives within the genus Ensifer/Sinorhizobium, and of other members of the order Rhizobiales (as outgroup), were then selected from the IMG-ER database (Table 1) [16]. All strains of the genus Ensifer/Sinorhizobium, including AK58 and BO21CC, form a tight cluster, confirming the affiliation of these two strains with the species. Figure 1 shows the phylogenetic neighborhood of E. meliloti AK58 and BO21CC in a 16S rRNA based tree. E. meliloti AK58 and BO21CC show different symbiotic phenotypes with respect to the host plant Medicago sativa, as well as differences in substrate utilization [5]. The two strains also differ in cell morphology: AK58 cells are smaller than those of BO21CC and of the other E. meliloti strains for which genome sequences are available (Figure 2). Interestingly, BO21CC cells have a ratio between cell axes closer to 1 (more rounded cells) than those of AK58 and of the other E. meliloti strains (Figure 2).

Genome sequencing information
Genome project history
Strains AK58 and BO21CC were selected for sequencing under the Community Sequencing Program 2010 of the DOE Joint Genome Institute (JGI), as part of the project entitled "Complete genome sequencing of Sinorhizobium meliloti AK58 and BO21CC strains: Improving alfalfa performances through the exploitation of Sinorhizobium genomic data". The overall rationale for sequencing these genomes was the identification of genomic determinants of the different symbiotic performances of S. meliloti strains. The genome project is deposited in the Genomes On Line Database [21] and the genome sequences are deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE-JGI. A summary of the project information is shown in Table 2.

Growth conditions and DNA isolation
E. meliloti strains AK58 and BO21CC (DSM 23808 and DSM 23809, respectively) were grown in DSMZ medium 98 (Rhizobium medium) [22] at 28°C. DNA was isolated from 0.5-1 g of cell paste using the Jetflex Genomic DNA Purification kit (GENOMED 600100), following the standard protocol recommended by the manufacturer, with modification st/LALMP [23] for strain AK58 and an additional incubation with 5 µl proteinase K at 58°C for 1 hour for strain BO21CC. DNA is available on request through the DNA Bank Network [24].

Genome sequencing and assembly
The draft genomes were generated at the DOE Joint Genome Institute (JGI). For the BO21CC genome, Illumina data [25] were used: we constructed and sequenced an Illumina short-insert paired-end library with an average insert size of 270 bp, which generated 76,033,356 reads, and an Illumina long-insert paired-end library with an average insert size of 9,141.74 ± 1,934.63 bp, which generated 4,563,348 reads, for a total of 6,463 Mbp of Illumina data. For the AK58 genome, a combination of Illumina [25] and 454 [26] technologies was used: we constructed and sequenced an Illumina GAii shotgun library, which generated 80,296,956 reads totaling 6,102.6 Mb; a 454 Titanium standard library, which generated 0 reads; and one paired-end 454 library with an average insert size of 10 kb, which generated 326,569 reads, totaling 96 Mb of 454 data. All general aspects of library construction and sequencing performed at the JGI can be found at [27]. The initial draft assemblies contained 194 contigs in 16 scaffolds for BO21CC and 311 contigs in 5 scaffolds for AK58.
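As a rough cross-check of the data volumes quoted above, the raw sequencing depth can be estimated as total sequenced bases divided by genome length. The sketch below is illustrative only: the function name `fold_coverage` is hypothetical, it assumes 1 Mb = 1 Mbp = 10^6 bases, and it uses the draft assembly sizes reported in this paper as a stand-in for the true genome lengths.

```python
def fold_coverage(total_bases: float, genome_size: int) -> float:
    """Return raw sequencing depth: total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Data volumes quoted in the text, converted from Mbp to bp (assumption: 1 Mbp = 1e6 bp).
# Genome sizes are the draft assembly lengths reported in this paper.
bo21cc_depth = fold_coverage(6463e6, 6_985_065)          # Illumina only
ak58_depth = fold_coverage(6102.6e6 + 96e6, 6_974_333)   # Illumina + 454

print(f"BO21CC raw depth: ~{bo21cc_depth:.0f}x")
print(f"AK58 raw depth:   ~{ak58_depth:.0f}x")
```

These are upper-bound estimates of depth from raw data; the depth actually used in assembly may be lower after quality filtering.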
For BO21CC, the initial draft data were assembled with Allpaths and the consensus was computationally shredded into 10 Kbp overlapping fake reads (shreds). The Illumina draft data were also assembled with Velvet, version 1.1.05 [28], and the consensus sequences were computationally shredded into 1.5 Kbp overlapping shreds. The Illumina draft data were then assembled again with Velvet, using the shreds from the first Velvet assembly to guide the second assembly. The consensus from the second Velvet assembly was likewise shredded into 1.5 Kbp overlapping shreds. The shreds from the Allpaths assembly and from both Velvet assemblies, together with a subset of the Illumina CLIP paired-end reads, were assembled using parallel phrap, version 4.24 (High Performance Software, LLC). Possible misassemblies were corrected by manual editing in Consed [29][30][31].

Figure 1. Phylogenetic tree of E. meliloti AK58 and BO21CC based on 16S rRNA sequences. The tree was inferred by using the Maximum Likelihood method based on the Tamura 3-parameter model [17], chosen as the model with the lowest BIC (Bayesian Information Criterion) score after Maximum Likelihood fits of 24 different nucleotide substitution models (Model Test). The bootstrap consensus tree inferred from 500 replicates is taken to represent the phylogenetic pattern of the taxa analyzed [18]. Branches corresponding to partitions reproduced in less than 50% of bootstrap replicates are collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) is shown next to the branches. The tree with the highest log likelihood (-3411.7124) is shown. A discrete Gamma distribution was used to model evolutionary rate differences among sites (+G, parameter = 0.3439). A total of 1,284 nt positions were present in the final dataset. Model test and Maximum Likelihood inference were conducted in MEGA5 [19]. E. meliloti strains AK58 and BO21CC are shown in bold.
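The shredding step used in the BO21CC assembly (cutting a consensus sequence into overlapping fake reads) can be illustrated with a minimal sketch. This is not the JGI pipeline code: the function name `shred` is hypothetical, and the overlap length is an assumption, since the text gives only the shred lengths (10 Kbp and 1.5 Kbp) and not the overlap.

```python
from typing import List


def shred(consensus: str, shred_len: int, overlap: int) -> List[str]:
    """Cut a consensus sequence into overlapping fragments ("shreds").

    Consecutive shreds start (shred_len - overlap) bases apart, so each
    shred shares `overlap` bases with the next one. The final shred may
    be shorter than shred_len.
    """
    if shred_len <= 0 or not 0 <= overlap < shred_len:
        raise ValueError("need shred_len > 0 and 0 <= overlap < shred_len")
    step = shred_len - overlap
    shreds = []
    for start in range(0, len(consensus), step):
        shreds.append(consensus[start:start + shred_len])
        # Stop once a shred reaches the end of the consensus.
        if start + shred_len >= len(consensus):
            break
    return shreds


# Toy example; for a real consensus one might call, e.g.,
# shred(seq, shred_len=1500, overlap=500), where the overlap is assumed.
toy_shreds = shred("ACGTACGTAC", shred_len=4, overlap=2)
print(toy_shreds)
```

Feeding such shreds from several assemblies into a single assembler is what lets the different draft assemblies be merged into one consensus.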

Genome annotation
Genes were identified using Prodigal [33] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [34]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [16].

Genome properties
The high-quality draft assemblies of the genomes consist of 41 scaffolds for BO21CC and 9 scaffolds for AK58, representing 6,985,065 and 6,974,333 bp overall, respectively. The overall G+C content was 62.12% for BO21CC and 62.04% for AK58 (Table 3a and Table 3b). Of the 6,746 (BO21CC) and 6,992 (AK58) predicted genes, 5,357 and 5,549 were protein-coding genes and 105 and 79 were RNA genes, respectively. The large majority of the protein-coding genes (79.32% in BO21CC and 78.03% in AK58) were assigned a putative function as COGs.
The distribution of genes into COGs functional categories is presented in Table 4.
*Only one rRNA operon appears to be complete.
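Summary statistics of the kind reported in this section (number of scaffolds, total length, G+C content) can be derived from a multi-scaffold assembly as sketched below. The function name `assembly_summary` and the toy sequences are hypothetical, and excluding ambiguous bases such as N from the G+C denominator is an assumption; the text does not state how the reported percentages were computed.

```python
from typing import Dict


def assembly_summary(scaffolds: Dict[str, str]) -> Dict[str, float]:
    """Summarize an assembly given as {scaffold_name: sequence}.

    G+C content is computed over unambiguous bases (A, C, G, T) only;
    this handling of ambiguous bases is an assumption of the sketch.
    """
    total_bp = 0
    gc = 0
    at = 0
    for seq in scaffolds.values():
        s = seq.upper()
        total_bp += len(s)
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    gc_percent = 100.0 * gc / (gc + at) if (gc + at) else 0.0
    return {
        "scaffolds": len(scaffolds),
        "total_bp": total_bp,
        "gc_percent": gc_percent,
    }


# Toy input standing in for real scaffold sequences.
summary = assembly_summary({"scaffold_1": "GGCC", "scaffold_2": "ATGCNN"})
print(summary)
```

For a real assembly, the dictionary would be filled from the scaffold FASTA file before calling the function.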