Complete genome sequence of Anabaena variabilis ATCC 29413

Anabaena variabilis ATCC 29413 is a filamentous, heterocyst-forming cyanobacterium that has served as a model organism, with an extensive literature extending over 40 years. The strain has three distinct nitrogenases that function under different environmental conditions and is capable of photoautotrophic growth in the light and true heterotrophic growth in the dark using fructose as both carbon and energy source. While this strain was first isolated in 1964 in Mississippi and named Anabaena flos-aquae MSU A-37, it clusters phylogenetically with cyanobacteria of the genus Nostoc. The strain is a moderate thermophile, growing well at approximately 40°C. Here we provide some additional characteristics of the strain, and an analysis of the complete genome sequence.


Introduction
Anabaena variabilis ATCC 29413 (= IUCC 1444 = PCC 7937) is a semi-thermophilic, filamentous, heterocyst-forming cyanobacterium. Heterocysts, which are specialized cells that form in a semiregular pattern in the filament, are the sites of nitrogen fixation in cells grown in an oxic environment. A. variabilis ATCC 29413 was first isolated as a freshwater strain in 1964 in Mississippi by R.G. Tischer, who called the strain Anabaena flos-aquae A-37 [1]. He was primarily interested in the extracellular polysaccharide produced by this strain [2][3][4]. The strain was subsequently called Anabaena variabilis by Healey in 1973 [5] and was characterized in more detail by several labs in the 1960s and 1970s [6][7][8]. In particular, the early work by Wolk on this strain led to its becoming a model strain for cyanobacterial physiology, nitrogen fixation and heterocyst formation [9][10][11][12][13]. Here we present a summary classification and a set of features for A. variabilis ATCC 29413, together with a description of the complete genome sequencing and annotation.

Classification and features
The general characteristics of A. variabilis are summarized in Table 1 and its phylogeny is shown in Figure 1. Vegetative cells of A. variabilis are oblong, 3-5 µm in length, have a Gram-negative cell wall structure, are normally non-motile, and form long filaments. Under conditions of nitrogen deprivation, certain vegetative cells differentiate into heterocysts, which are the sites of aerobic nitrogen fixation (reviewed in [19,28,29]). Heterocysts, which comprise 5-10% of the cells in a filament, are terminally differentiated cells that form in a semiregular pattern in the filament (Figure 2). Vegetative cells of A. variabilis can also differentiate into akinetes, which are spore-like cells that survive environmental stress [30]. Nitrogen stress may also induce the formation of motile filaments called hormogonia in A. variabilis [19]. In other cyanobacteria, hormogonia are required for the establishment of symbiotic associations with plants [31]. A. variabilis has oxygen-evolving photosynthesis; however, it is also capable of photoheterotrophic growth and of chemoheterotrophic growth in the dark using fructose [23,32,33,34]. The strain cannot ferment; hence, it does not grow anaerobically in the dark with fructose. The genome sequence revealed the ABC-type fructose transport genes that were subsequently shown to be required for heterotrophic growth of the strain [32].
A. variabilis is a well-established model organism for heterocyst formation [35,36], nitrogen fixation [21,37,38], hydrogen production [39,40], photosynthesis [41][42][43], and heterotrophic cyanobacterial growth [9,32,44]. It is unique among the well-characterized cyanobacteria in that it has three sets of genes that encode distinct nitrogenases [19,37,38,45,46,47,48]. One is the conventional, heterocyst-specific Mo-nitrogenase, the second is another Mo-nitrogenase that functions only under anoxic conditions in vegetative cells and heterocysts, and the third is a V-nitrogenase that is also heterocyst specific. These nitrogenases are expressed under distinct physiological conditions so that only one nitrogenase is generally functional [19]. The genome sequence has revealed a large 41-kb island of genes that all appear to be involved in synthesis and regulation of the V-nitrogenase, including the genes for the first vanadate transport system to be characterized in any bacterium [49]. The V-nitrogenase of A. variabilis has been exploited for its ability to make large amounts of hydrogen as a potential source of alternative energy production [39,40].

Chemotaxonomy
The Gram-negative cyanobacterial cell wall has not been well characterized; however, it typically contains lipopolysaccharide. In A. variabilis the O antigen contains L-acofriose, L-rhamnose, D-mannose, D-glucose, and D-galactose [50]. The cell envelope of the heterocyst differs from that of vegetative cells in that it also contains an inner laminated glycolipid layer and an outer fibrous, homogeneous polysaccharide layer. In A. variabilis the polysaccharide layer comprises a 1,3-linked backbone of glucosyl and mannosyl residues with terminal xylosyl and galactosyl residues. The side branches comprise glucosyl residues having a terminal arabinosyl residue. The inner heterocyst cell wall of almost all strains of Anabaena and Nostoc consists of a glycolipid comprising 1-(O-hexose)-3,25-hexacosanediol and 1-(O-hexose)-3-keto-25-hexacosanol [51,52]. The lipids of most cyanobacteria comprise monogalactosyldiacylglycerols, digalactosyldiacylglycerols, sulphoquinovosyldiacylglycerols and phosphatidylglycerols [53]. In A. variabilis the primary products of lipid biosynthesis are 1-stearoyl-2-palmitoyl species of monoglucosyl diacylglycerol, phosphatidylglycerol and sulfoquinovosyl diacylglycerol; however, the degree of saturation of the fatty acids in the lipids depends on the growth temperature [54][55][56].

Genome project history
This organism was selected for sequencing because of its 50-year history as a model organism for studies on many aspects of cyanobacterial metabolism, including photosynthesis, nitrogen fixation, hydrogen production, and heterotrophic growth. The genome project is deposited in the Genomes OnLine Database (Gc00299) and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Strain history
The strain was first isolated in 1964 in Mississippi by R.G. Tischer, who called it Anabaena flos-aquae MSU A-37 [1]. It was submitted to the Indiana University Culture Collection (Anabaena flos-aquae IUCC 1444) and was then submitted by C.P. Wolk as Anabaena variabilis to ATCC in 1976 (Anabaena variabilis ATCC 29413). The phylogenetic tree (Figure 1) reveals that the strain clusters with cyanobacteria in the genus Nostoc and not with the Anabaena/Aphanizomenon cluster. This placement is consistent with the fact that the strain produces hormogonia [19] and suggests that it was incorrectly named.

Growth conditions and DNA isolation
An axenic culture of A. variabilis ATCC 29413 was grown photoautotrophically in 1 L of an eightfold dilution of the medium of Allen and Arnon (AA/8) [57], supplemented with 5.0 mM NaNO3, at 30°C with illumination at 50-80 μEinsteins m⁻² s⁻¹ to an OD720 of about 0.3. Cells were harvested by centrifugation, frozen and then lysed by crushing the frozen pellet with a very cold mortar and pestle and then treating the frozen powder with lysozyme (3.0 mg/ml)/proteinase K (1 mg/ml) in 10 mM Tris, 100 mM EDTA pH 8.0 buffer at 37°C for 30 min. This was followed by purification of the DNA using a Qiagen genomic DNA kit. The DNA was precipitated with isopropanol, spooled, and then dissolved in 10 mM Tris, 1.0 mM EDTA pH 8.0 buffer. The purity, quality and size of the bulk gDNA preparation were assessed by JGI according to DOE-JGI guidelines.

Figure 1 (caption). Strains in blue have been sequenced and the 16S rRNA IMG locus tag is shown in parentheses after the strain. GenBank accession numbers are provided for 16S rRNA genes in strains that do not have a complete genome sequence (shown in black). The tree was made with sequences aligned by the RDP aligner, using the Jukes-Cantor corrected distance model to construct a distance matrix based on alignment model positions, without alignment inserts, and a minimum comparable position of 200. The tree was built with RDP Tree Builder, which uses Weighbor [26] with an alphabet size of 4 and length size of 1,000. Bootstrapping (100 times) was used to generate a majority consensus tree [27]. Bootstrap values over 60 are shown. Gloeobacter violaceus PCC 7421 was used as the outgroup.

Evidence codes - IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [25]. If the evidence code is IDA, then the property was directly observed for a living isolate by one of the authors or an expert mentioned in the acknowledgements.
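
For readers unfamiliar with the Jukes-Cantor correction named in the Figure 1 caption, the following is a minimal sketch of how a corrected pairwise distance can be computed from two aligned sequences. It is not the RDP implementation; the function names and the gap-handling rule (skip any column containing "-") are assumptions made for illustration only.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of mismatched sites among columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns containing a gap character are ignored.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy aligned fragments (hypothetical, not real 16S rRNA data).
p = p_distance("ACGTACGTAC", "ACGTACGAAC")
d = jukes_cantor(p)
print(f"p = {p:.3f}, Jukes-Cantor d = {d:.4f}")
```

The correction matters most for divergent pairs: as the observed mismatch fraction grows, the corrected distance increases faster than linearly to account for multiple substitutions at the same site.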

Genome sequencing and annotation

Sequencing and assembly
Sanger sequencing was done using a whole-genome shotgun approach with three libraries. A pUC18c library with 3-kb inserts generated 39.64 Mb of sequence. A pMCL200 library with 9-kb inserts produced 35.16 Mb of sequence, and a fosmid library (pCC1Fos CopyControl fosmid library production kit; Epicentre, Madison, WI) with 40-kb inserts yielded 5.83 Mb of sequence. Together, all libraries provided greater than 11.0× coverage of the genome. The plasmid inserts were made with sheared DNA that was blunt-end repaired and then size separated by gel electrophoresis. Sequencing from both ends of the plasmid inserts was done using dye terminators on ABI3730 sequencers. Details on the cloning and sequencing procedures are available from JGI [58]. Project information is summarized in Table 2. The Phred, Phrap, and Consed software package was used for sequence assembly and quality assessment [59]. Repeat sequences were resolved with Dupfinisher [60]. Gaps between contigs were closed by editing in Consed, custom priming, or PCR amplification. This genome was curated to close all gaps with greater than 98% coverage of at least two independent clones. Each base pair has a minimum q (quality) value of 30 and the total error rate is less than one per 50,000.
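
The coverage figure above can be checked with simple arithmetic. The sketch below sums the reported library yields and divides by the total genome size given under "Genome properties"; it also converts a Phred quality value into the per-base error probability it implies. It assumes that 1 Mb means 10^6 bases and that the reported yields are raw sequence, so the result is an approximation rather than the assembler's own statistic.

```python
# Sequence yield per library in megabases, as reported in the text.
LIBRARY_YIELD_MB = {
    "pUC18c (3-kb inserts)": 39.64,
    "pMCL200 (9-kb inserts)": 35.16,
    "fosmid (40-kb inserts)": 5.83,
}
GENOME_SIZE_BP = 7_105_752  # total genome size reported in "Genome properties"


def fold_coverage(yield_mb: dict, genome_size_bp: int) -> float:
    """Raw fold coverage: total sequenced bases divided by genome size."""
    total_bases = sum(yield_mb.values()) * 1_000_000  # assumption: 1 Mb = 1e6 bases
    return total_bases / genome_size_bp


def phred_error_probability(q: float) -> float:
    """Per-base error probability implied by Phred quality q: 10^(-q/10)."""
    return 10 ** (-q / 10)


coverage = fold_coverage(LIBRARY_YIELD_MB, GENOME_SIZE_BP)
print(f"Raw coverage: {coverage:.2f}x")
print(f"Error probability at q = 30: {phred_error_probability(30):.4f}")
```

A minimum quality of 30 bounds the error at any single base to about 1 in 1,000; the overall error rate of less than one per 50,000 reflects that most bases in the finished assembly are supported at much higher quality.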

Genome annotation
Genes were identified using two gene modeling programs, Glimmer [61] and Critica [62], as part of the Oak Ridge National Laboratory genome annotation pipeline [63]. The two sets of gene calls were combined using Critica as the preferred start call for genes with the same stop codon. Genes with fewer than 80 amino acids that were predicted by only one of the gene callers and had no BLAST hit in the KEGG database at 1e-5 were deleted. This was followed by a round of manual curation to eliminate obvious overlaps. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Noncoding genes and miscellaneous features were predicted using tRNAscan-SE [64], TMHMM [65], and SignalP [66].
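
The merging and filtering rules described above can be sketched as follows. This is an illustration of the stated logic only, not the Oak Ridge pipeline code: the `GeneCall` record, the function names, and the interpretation of "no BLAST hit at 1e-5" as "no hit with an E-value of 1e-5 or lower" are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneCall:
    caller: str      # "glimmer" or "critica"
    strand: str      # "+" or "-"
    start: int       # predicted start coordinate
    stop: int        # stop-codon coordinate, used as the merge key
    length_aa: int   # length of the predicted protein


def merge_calls(glimmer_calls, critica_calls):
    """Merge calls sharing strand and stop codon; the Critica start is preferred.

    Returns a list of (call, callers) pairs, where callers is the set of
    programs that predicted a gene ending at that stop codon.
    """
    merged = {}
    for call in glimmer_calls:
        merged[(call.strand, call.stop)] = (call, {"glimmer"})
    for call in critica_calls:
        key = (call.strand, call.stop)
        callers = merged[key][1] | {"critica"} if key in merged else {"critica"}
        merged[key] = (call, callers)  # the Critica call replaces the Glimmer start
    return list(merged.values())


def keep_gene(call: GeneCall, callers: set, best_kegg_evalue: Optional[float],
              min_length_aa: int = 80, evalue_cutoff: float = 1e-5) -> bool:
    """Delete only genes that are short, single-caller, and lack a KEGG hit."""
    is_short = call.length_aa < min_length_aa
    single_caller = len(callers) == 1
    no_hit = best_kegg_evalue is None or best_kegg_evalue > evalue_cutoff
    return not (is_short and single_caller and no_hit)


# Toy example with hypothetical coordinates.
glimmer = [GeneCall("glimmer", "+", 100, 400, 100), GeneCall("glimmer", "+", 500, 650, 50)]
critica = [GeneCall("critica", "+", 130, 400, 90)]
merged = merge_calls(glimmer, critica)
kept = [call for call, callers in merged if keep_gene(call, callers, None)]
print(kept)
```

All three conditions must hold for a gene to be removed, so a short single-caller gene with a significant KEGG hit is retained, as is any gene predicted by both programs.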

GOLD ID: Gc00299
Project relevance: Hydrogen production; nitrogen fixation

Genome properties
The genome of A. variabilis, with a total of 7.1 Mbp (7,105,752 bp), has 41.4% GC (Table 4). There is a single large circular chromosome (6.36 Mbp) (Figure 3), three circular plasmids and one linear DNA element (Table 3). Plasmid A is circular with 366,354 bp and 41% GC. Plasmid B is circular with 35,762 bp and 38% GC. Plasmid C is circular with 300,758 bp and 42% GC. The linear incision element is 37,151 bp long with a higher GC content (46%) than the rest of the genome (Figure 4). The incision element has 40 ORFs, of which only 5 have any similarity to known genes. AvaD004 has about 40% amino acid similarity to many proteins provisionally identified as phage terminases, which are involved in phage assembly. AvaD0022 is similar to RNA polymerase sigma factors, with 35% identity to a sigF-encoded sigma factor present in many other cyanobacteria, including two copies of a similar gene on the large chromosome of A. variabilis. AvaD0026, identified as similar to site-specific XerD-like recombinases, shows 50-55% identity to genes present in many cyanobacteria, including the B plasmid of A. variabilis and the alpha plasmid of Anabaena sp. PCC 7120. AvaD0037 shows similarity to the XRE family of transcriptional regulators and 65% identity to similar proteins in three sequenced strains of the cyanobacterium Cyanothece.
AvaD0015 is a histone-like DNA-binding protein with about 70% identity to the HU gene present in most cyanobacteria, including the gene on the large circular chromosome of A. variabilis. Many linear molecules overcome the problem of replicating the genome ends using terminal hairpins; however, there is no evidence of such repeats in this element. A total of 5,772 genes were predicted in the whole genome. Of these, 3,079 were annotated as coding for known protein functions and 62 as RNA genes (12 for rRNA and 50 for tRNA). The distribution of genes into COGs is presented in Table 4. There is considerable redundancy, with 5,710 protein-coding genes belonging to 841 paralogous families in this genome.

Identification of vnf genes in other cyanobacterial genomes
The V-nitrogenase is not widespread among bacteria and has, to date, been characterized in only one cyanobacterium, A. variabilis [42,49,67] (Figure 5). Using the large number of cyanobacterial genomes now available, we searched the IMG database for orthologs of the V-nitrogenase genes (vnf) and the vanadate transport genes (vupABC) present in A. variabilis. Only two strains showed any evidence of vnf genes: Fischerella 9339 (taxon ID 2516653082) and Chlorogloeopsis 7702 (taxon ID 2512564012). Fischerella 9339 has orthologs of vnfDG, vnfK, vnfE and vnfN, but is missing most of the vanadate transport genes. In contrast, Chlorogloeopsis 7702 has orthologs of all three vanadate transport genes and has most of the structural genes for the V-nitrogenase; however, the fused vnfDG gene is missing the vnfD portion that encodes the alpha subunit of the enzyme, which is essential for dinitrogenase activity. It will be interesting to determine whether either of these strains is capable of fixing nitrogen in the absence of Mo and in the presence of V using the V-nitrogenase.

Conclusions
A. variabilis was one of the earliest model organisms for the study of important cellular processes such as photosynthesis and nitrogen fixation. It is unusual among cyanobacteria both in having three nitrogenases [19], one of which, the V-nitrogenase, has been shown to be useful for hydrogen production [40], and in its ability to grow both photoautotrophically in the light and heterotrophically in the dark. The genome sequence was critical in identifying the genes for fructose transport [32] and the large island of genes important for V-nitrogenase function, including the vupABC genes for vanadate transport [49]. No other cyanobacterial genome has all the genes identified in A. variabilis that are important for growth using the V-nitrogenase, but two strains, Fischerella 9339 and Chlorogloeopsis 7702, have some V-nitrogenase or vanadate transport genes. The presence of the linear genetic element shown in Figure 4 is notable, as such elements are not present in the genomes of the other Nostoc strains. It will also be interesting to determine whether this element is important to the cell and how it replicates.