Genome sequence of Ensifer sp. TW10; a Tephrosia wallichii (Biyani) microsymbiont native to the Indian Thar Desert

Ensifer sp. TW10 is a novel N2-fixing bacterium isolated from a root nodule of the perennial legume Tephrosia wallichii Graham (known locally as Biyani) found in the Great Indian (or Thar) Desert, a large arid region in the northwestern part of the Indian subcontinent. Strain TW10 is a Gram-negative, rod-shaped, aerobic, motile, non-spore-forming species of root nodule bacteria (RNB) that promiscuously nodulates legumes in Thar Desert alkaline soil. It is fast growing and acid-producing, tolerates up to 2% NaCl and is capable of growth at 40°C. In this report we describe for the first time the primary features of this Thar Desert soil saprophyte together with genome sequence information and annotation. The 6,802,256 bp genome has a GC content of 62% and is arranged into 57 scaffolds containing 6,470 protein-coding genes, 73 RNA genes and a single rRNA operon. This genome is one of 100 RNB genomes sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project.


Introduction
The Great Indian (or Thar) Desert is a large, hot, arid region in the northwestern part of the Indian subcontinent. It is the 18th largest desert in the world covering 200,000 square km with 61% of its landmass occupying Western Rajasthan. The landscape occurs at low altitude (<1500 m above sea level) and extends from India into the neighboring country of Pakistan [1]. The Thar Desert region is characterized by low annual precipitation (50 to 300 mm), high thermal load and alkaline soils that are poor in texture and fertility [2]. Despite these harsh conditions, the Thar Desert has very rich plant diversity in comparison to other desert landscapes [3]. Approximately a quarter of the plants in the Thar Desert are used to provide animal fodder or food, fuel, medicine or shelter for local inhabitants [4].
The Indian Thar Desert harbors several native and exotic plants of the Leguminosae family [2], including native legume members of the subfamilies Caesalpinioideae, Mimosoideae and Papilionoideae that have adapted to the harsh Thar Desert environment [5]. The Papilionoid genus Tephrosia can be found throughout this semiarid to arid environment and these plants are among the first to grow after monsoonal rains. The generic name is derived from the Greek word "tephros" meaning "ash-gray", since dense trichomes on the leaves provide a greyish tint to the plant. Many species within this genus produce the potent toxin rotenone, which historically has been used to poison fish. It is a perennial shrub that has adapted to the harsh desert conditions by producing a long tap root system and dormant axillary shoot buds.
Recently, the root nodule bacteria (RNB) microsymbionts capable of fixing nitrogen in symbiotic associations with Tephrosia have been characterized [5]. Both Bradyrhizobium and Ensifer were present within nodules, but a particularly high incidence of Ensifer was noted [5]. Ensifer was found to occupy the nodules of all four species of Tephrosia examined [5]. Here we present a preliminary description of the general features of the T. wallichii (Biyani) microsymbiont Ensifer sp. TW10 together with its genome sequence and annotation.
Minimum Information about the Genome Sequence (MIGS) is provided in Table 1. Figure 1 shows the phylogenetic neighborhood of Ensifer sp. strain TW10 in a 16S rRNA sequence-based tree. This strain has 99% sequence identity at the 16S rRNA sequence level to E. kostiense LMG 19227 and 100% 16S rRNA sequence identity to other Indian Thar Desert Ensifer species (JNVU IC18 from a nodule of Indigofera and JNVU TF7, JNVU TP6 and TW8 from nodules of Tephrosia).
Figure 1. 16S rRNA sequence-based phylogenetic tree showing the neighborhood of Ensifer sp. strain TW10. All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed using MEGA, version 5 [19]. The tree was built using the Maximum-Likelihood method with the General Time Reversible model [20]. Bootstrap analysis [21] with 500 replicates was performed to assess the support of the clusters. Type strains are indicated with a superscript T. Brackets after the strain name contain a DNA database accession number and/or a GOLD ID (beginning with the prefix G) for a sequencing project registered in GOLD [22]. Published genomes are indicated with an asterisk.
Table 1. Minimum Information about the Genome Sequence (MIGS): classification entries.
Phylum Proteobacteria TAS [8]
Class Alphaproteobacteria TAS [9,10]
Order Rhizobiales TAS [10,11]
Family Rhizobiaceae TAS [12,13]
Genus
Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [18].

Classification and general features
Ensifer sp. strain TW10 is a Gram-negative rod (Figure 2 and Figure 3) in the order Rhizobiales of the class Alphaproteobacteria. It is fast growing, forming white-opaque, slightly domed and moderately mucoid colonies with smooth margins within 3-4 days at 28°C when grown on YMA [23].

Symbiotaxonomy
Ensifer sp. TW10 has the ability to nodulate (Nod+) and fix nitrogen (Fix+) effectively with a wide range of perennial native (wild) legumes of Thar Desert origin and with species of crop legumes (Table 2). Ensifer sp. TW10 is symbiotically competent with these species when grown in alkaline soils. TW10 can nodulate the wild tree legume Prosopis cineraria of the Mimosoideae subfamily. However, it does not form nodules on the Mimosoid hosts Mimosa hamata and M. himalayana, even though these hosts are known to be nodulated by Ensifer species [5,24]. TW10 was not compatible with the host Phaseolus vulgaris, a legume of the Phaseoleae tribe.

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its environmental and agricultural relevance to issues in global carbon cycling, alternative energy production, and biogeochemical importance, and is part of the Community Sequencing Program at the U.S. Department of Energy, Joint Genome Institute (JGI) for projects of relevance to agency missions. The genome project is deposited in the Genomes OnLine Database [22] and a standard draft genome sequence is deposited in IMG. Sequencing, finishing and annotation were performed by the JGI. A summary of the project information is shown in Table 3.

Growth conditions and DNA isolation
Ensifer sp. TW10 was cultured to mid-logarithmic phase in 60 ml of TY rich medium [25] on a gyratory shaker at 28°C. DNA was isolated from the cells using a CTAB (cetyl trimethyl ammonium bromide) bacterial genomic DNA isolation method [26].

Genome sequencing and assembly
The genome of Ensifer sp. TW10 was generated at the Joint Genome Institute (JGI) using Illumina [27] technology. An Illumina standard shotgun library was constructed and sequenced using the Illumina HiSeq 2000 platform, which generated 14,938,244 reads totaling 2,241 Mbp. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website [26]. All raw Illumina sequence data were passed through DUK, a filtering program developed at JGI, which removes known Illumina sequencing and library preparation artifacts (Mingkun L, Copeland, A, and Han, J, unpublished).
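The read count and yield reported above imply a mean read length and a raw sequencing depth. The following is a minimal sketch of that arithmetic, not part of the JGI pipeline. It assumes the rounded 2,241 Mbp figure is the total raw yield and uses the draft assembly size (6,802,256 bp) as a stand-in for the true genome size; the function names are illustrative only.

```python
"""Back-of-the-envelope sequencing statistics from the figures reported in the text."""

READ_COUNT = 14_938_244          # Illumina reads reported in the text
TOTAL_YIELD_BP = 2_241 * 10**6   # 2,241 Mbp, as reported (a rounded figure)
GENOME_SIZE_BP = 6_802_256       # draft assembly size, assumed here as a proxy for genome size


def mean_read_length(total_bp: int, n_reads: int) -> float:
    """Average number of bases per read."""
    return total_bp / n_reads


def fold_coverage(total_bp: int, genome_bp: int) -> float:
    """Raw sequencing depth: total bases sequenced divided by genome length."""
    return total_bp / genome_bp


if __name__ == "__main__":
    print(f"mean read length: {mean_read_length(TOTAL_YIELD_BP, READ_COUNT):.1f} bp")
    print(f"raw coverage:     {fold_coverage(TOTAL_YIELD_BP, GENOME_SIZE_BP):.0f}x")
```

Under these assumptions the figures correspond to reads of roughly 150 bp and a raw depth a little under 330-fold. The depth actually used for assembly would be lower, since artifact filtering removes some reads.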

Genome annotation
Genes were identified using Prodigal [30] as part of the DOE-JGI annotation pipeline [31]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. The tRNAscan-SE tool [7] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [32]. Other non-coding RNAs such as the RNA components of the protein secretion complex and the RNase P were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [33]. Additional gene prediction analysis and manual functional annotation were performed within the Integrated Microbial Genomes (IMG) platform [34,35].

Genome properties
The genome is 6,802,256 nucleotides long with 61.56% GC content (Table 4) and comprises 57 scaffolds (Figure 4) of 57 contigs. From a total of 6,546 genes, 6,473 were protein-encoding and 73 were RNA-only encoding genes. The majority of genes (77.44%) were assigned a putative function, while the remaining genes were annotated as hypothetical. The distribution of genes into COGs functional categories is presented in Table 5.
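The genome statistics above can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not the IMG procedure: `gc_percent` and `genes_with_function` are hypothetical helper names, ambiguous bases are assumed to be excluded from the GC denominator, and the 77.44% figure is assumed to be a percentage of all 6,546 genes.

```python
"""Consistency checks on the genome properties reported in the text."""

TOTAL_GENES = 6_546
PROTEIN_CODING = 6_473
RNA_GENES = 73
PCT_WITH_FUNCTION = 77.44  # assumed to be a percentage of all genes


def gc_percent(sequence: str) -> float:
    """GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def genes_with_function(total: int, percent: float) -> int:
    """Approximate gene count corresponding to a reported percentage."""
    return round(total * percent / 100.0)


if __name__ == "__main__":
    # Protein-coding and RNA genes should sum to the reported total.
    print(PROTEIN_CODING + RNA_GENES == TOTAL_GENES)
    # Approximate number of genes with a putative function.
    print(genes_with_function(TOTAL_GENES, PCT_WITH_FUNCTION))
    # GC content of a toy sequence; a real check would read the scaffold FASTA.
    print(round(gc_percent("GGCCAT"), 2))
```

Applied to the concatenated scaffold sequences, `gc_percent` should land close to the reported 61.56% if the same counting convention was used.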