Complete genome sequence of Pirellula staleyi type strain (ATCC 27377T)

Pirellula staleyi Schlesner and Hirsch 1987 is the type species of the genus Pirellula of the family Planctomycetaceae. Cells of this pear- or teardrop-shaped bacterium show a clearly visible pointed attachment pole and can be distinguished from other Planctomycetes by a lack of true stalks. Strains closely related to the species have been isolated from fresh and brackish water, as well as from hypersaline lakes. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the order Planctomycetales and only the second sequence from the phylum Planctobacteria/Planctomycetes. The 6,196,199 bp long genome with its 4,773 protein-coding and 49 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain ATCC 27377T (= DSM 6068 = ATCC 27377) is the type strain of the species Pirellula staleyi and was originally isolated by James T. Staley in the early 1970s [1,2]. Due to superficial similarities with Pasteuria ramosa in budding and rosette formation, strain ATCC 27377T was for several years considered to belong to the genus Pasteuria, as the type strain of P. ramosa Metchnikoff 1888 [3]. However, Starr et al. [4] considered that this strain did not fit the original description of P. ramosa published by Metchnikoff in 1888 [3] and formally requested that the Judicial Commission rule that it should not be the type of P. ramosa Metchnikoff 1888. An Opinion was published by the Judicial Commission [5] fixing the type of P. ramosa Metchnikoff 1888 as the description of Metchnikoff as emended by Starr et al. [3]. At the same time Starr et al. [3] also proposed that ATCC 27377T be used as the type of a new species, Planctomyces staleyi. In 1984 Schlesner and Hirsch re-assigned ATCC 27377T to the new genus Pirella [6] as the type strain of its only species, Pirella staleyi [6], but realized three years later that this genus name was a later homonym of Pirella Bainier 1883 [7], a fungus belonging to the Mucorales, and therefore illegitimate according to Rule 51b of the International Code of Nomenclature of Bacteria [8,9]. In 1987 the strain received its currently validly published name, Pirellula staleyi. P. staleyi and close relatives belong to the so-called morphotype IV and are of interest because these organisms are usually attached to filamentous algae and cyanobacteria by a holdfast located at the distal end of the fascicle (the multifibrillar major appendage) or, if a fascicle is not present, at the nonreproductive (nonbudding and nonpiliated) pole of the cell. P. staleyi is of further interest because of its life cycle (see below). It should be noted that members of the genus Pirellula (P. staleyi, P. marina) and other unnamed strains have been variously considered to be rapidly evolving (tachytelic) or ancient lineages. The transfer of P. marina to Blastopirellula marina and the description of Rhodopirellula baltica [10] have called this interpretation into question, and the growing number of genomes in the group may also be used to test it. Here we present a summary classification and a set of features for P. staleyi ATCC 27377T (Table 1), together with the description of the complete genomic sequencing and annotation.

Classification and features
To date, two strains of the species P. staleyi have been described in detail, ATCC 27377T [6,9] and ATCC 35122 [18]. Strain ATCC 27377T was isolated from the freshwater Lake Lansing, MI, USA, in or before 1973 [2]. Strain ATCC 35122 was isolated as a "white" subclone of strain ICPB 4232 from a similar habitat, the freshwater Campus Lake, Baton Rouge, LA, USA [18,23]. Both strains are identical in their 16S rRNA gene sequence [18]. Except for an agricultural soil bacterium clone (SC-I-28, AJ252628) and the isolates 'Schlesner 516' and 'Schlesner 670' (X81940, X81948) [24], no 16S rRNA gene sequences above 85% sequence similarity were reported in GenBank. Environmental samples from metagenomic surveys also do not surpass 88-90% sequence similarity, indicating that members of the species are not heavily represented in the habitats genomically screened so far (as of August 2009). Interestingly, sequences most closely related to the planktonic, aerobic heterotroph P. staleyi have been reported from anoxic sediments of the productive freshwater lake Priest Pot, Cumbria, UK [25]. Pirellula-like sequences have also been recovered from DNA extracted from marine sediments in Puget Sound [26] and from marine snow [27].

Figure 1. Phylogenetic tree inferred from an alignment [28,29] of the 16S rRNA gene sequence under the maximum likelihood criterion [30] and rooted in accordance with the current taxonomy. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 1,000 bootstrap replicates if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [31] are shown in blue, published genomes in bold.
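The similarity thresholds quoted above (85%, 88-90%) are percent identities between aligned 16S rRNA gene sequences. As a minimal sketch of how such a value can be computed, the Python function below counts matching positions over the ungapped columns of a pairwise alignment. The gap-handling convention, the function name and the toy sequences are assumptions made for illustration; the source does not state which tool or convention produced its figures.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gap columns are ignored. Other conventions (e.g. counting
    gaps as mismatches) give different values for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real 16S rRNA data).
seq_a = "ACGTTGCA-ACGT"
seq_b = "ACGTAGCATACGA"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity")
```

In practice the alignment itself would come from a dedicated aligner; only the final counting step is shown here.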
The cell size of strain ATCC 27377T is 0.9-1.0 × 1.0-1.5 µm. The mature cell shape is teardrop- to pear-shaped, with the attachment pole slightly pointed (Figure 2). A fibrillar stalk structure is absent. Crateriform structures are found predominantly on the reproductive cell pole. Occasionally, small crateriform structures may also be observed on the non-reproductive and nonpiliated pole of the cell opposite the budding site [20]. The monotrichous flagellum is located at the reproductive cell pole [6,10]. Strain ATCC 27377T produces pigmented colonies, motile daughter cells and sessile mother cells [10]. A unique feature seen in both negatively stained cells and thin-sectioned cells of strains ATCC 27377T and ATCC 35122 is the occurrence of 'hump' protrusions that include both cell wall and cytoplasm [18]. These protrude 50-111 nm from the cell and are 200-260 nm in diameter measured at the base of the structure (from thin sections and negatively stained cells) [18]. One or two are visible per cell, and when two are visible they are distributed in a characteristic manner opposite each other near the narrow pole of the cell [18]. They appear to conform to the definition of prosthecae as cellular appendages or extensions of the cell containing cytoplasm [18,32]. However, the prosthecae of strain ATCC 27377T are located further from the narrow cell pole than those of strain ATCC 35122 [18]. Functions proposed for the prosthecae include increasing surface area, reproduction, and stalk function [18]. The life cycle of P. staleyi has been described in great detail elsewhere [23]. Briefly, the mature bud develops a sheathed flagellum attached near the piliated pole (opposite the fascicle origin) and becomes a swarmer; the swarmer loses its flagellum and becomes a sessile mother cell (with a distal holdfast and eventually a fascicle at the pole opposite the piliated and budding pole); the mother cell then develops a bud [20,23].
Strain ATCC 27377T hydrolyses casein, aesculin, gelatin and starch, but not DNA [10]. It produces H2S from thiosulfate and is negative for lipase (pH 7) and phosphatidyl choline [10]. It utilizes fucose as carbon source, but not glycerol, glutamic acid, or chondroitin sulfate [10]. Contrary to the original description [2], the cells are Gram-negative and do not utilize lyxose, D-ribose, fucose, L-rhamnose, fructose, or inulin as a carbon source. Additional characteristics include the following. Pectin, lactose, maltose, melibiose, raffinose, sucrose, and trehalose are utilized as carbon sources. The maximum salt tolerance is 50% artificial seawater (ASW; Lyman & Fleming, 1940), with 100% ASW corresponding to 3.5% salinity [4]. The cells are weakly inhibited by artificial light (2,400 lx). The following carbon sources are not utilized: adipate, citrate, L-alanine, L-glutamate, gluconate, and urea [4,7]. Strain ATCC 27377T is resistant to ampicillin and penicillin (1000 µg ml⁻¹), cephalothin (100 µg ml⁻¹), streptomycin (500 µg ml⁻¹) and cycloserine (100 µg ml⁻¹), but not to tetracycline (10 µg ml⁻¹ is lethal) [10]. The primary sequence and secondary structure of the ribonuclease P RNA of P. staleyi strain ATCC 27377T and other planctomycetes have been described in detail and evaluated for their suitability as a taxonomic marker [20].

Chemotaxonomy
The cell envelope of P. staleyi strain ATCC 27377T contains no peptidoglycan but consists almost entirely of protein. The cell wall amino acids (molar ratio) are threonine (3.0), glutamate (9.0), cysteine (3.6) and valine (1.7) [22]. Further details on the amino acid, NH3, hexosamine and neutral sugar contents of the cell envelope of strain ATCC 27377T are published elsewhere [10]. The major fatty acids (relative %) include C16:0 (33.8) [19]. The major respiratory lipoquinone present is MK-6. One of the major phospholipids identified is phosphatidylglycerol [10]. Other lipids could not be identified on the basis of their Rf values and staining behavior, suggesting that novel lipids are an important constituent of the cell membrane. The production of spermidine distinguishes P. staleyi from the closely related R. baltica DSM 10527 and B. marina DSM 3645.

Evidence codes: IDA, Inferred from Direct Assay (first time in publication); TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [22]. If the evidence code is IDA, then the property was directly observed for a living isolate by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position, and is part of the Genomic Encyclopedia of Bacteria and Archaea project. The genome project is deposited in the Genome OnLine Database [31] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Genome sequencing and assembly
The genome was sequenced using a combination of Sanger, 454 and Illumina sequencing platforms. All general aspects of library construction and sequencing performed at the JGI can be found at the JGI website (http://www.jgi.doe.gov/). 454 pyrosequencing reads were assembled using the Newbler assembler version 1.1.03.24 (Roche). Large Newbler contigs were broken into 6,869 overlapping fragments of 1,000 bp and entered into the assembly as pseudo-reads. The sequences were assigned quality scores based on Newbler consensus q-scores, with modifications to account for overlap redundancy and to adjust inflated q-scores. A hybrid 454/Sanger assembly was made using the PGA (Paracel Genome Assembler) assembler. Possible mis-assemblies were corrected and gaps between contigs were closed by custom primer walks from sub-clones or PCR products. Illumina reads were used to improve the final consensus quality using an in-house developed tool (the Polisher). The error rate of the completed genome sequence is less than 1 in 100,000. The final assembly consists of 70,045 Sanger and 450,004 pyrosequence reads. Together, all sequence types provided 31.0× coverage of the genome.
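The step of breaking large contigs into overlapping 1,000 bp pseudo-reads can be illustrated with a short Python sketch. This is not the JGI pipeline: the function name `shred_contig`, the 100 bp overlap and the handling of the contig tail are assumptions chosen for illustration, since the source gives only the fragment length and the total number of fragments.

```python
def shred_contig(contig: str, fragment_len: int = 1000, overlap: int = 100) -> list[str]:
    """Split a contig into overlapping fixed-length fragments (pseudo-reads).

    Assumptions (not from the source): consecutive fragments overlap by
    `overlap` bases, and the last fragment is anchored to the contig end
    so that the tail is also covered by a full-length fragment.
    """
    if not 0 <= overlap < fragment_len:
        raise ValueError("overlap must be non-negative and smaller than fragment_len")
    if len(contig) <= fragment_len:
        return [contig]  # short contigs are kept whole

    step = fragment_len - overlap
    fragments = []
    start = 0
    while start + fragment_len < len(contig):
        fragments.append(contig[start:start + fragment_len])
        start += step
    # Final fragment: the last `fragment_len` bases of the contig.
    fragments.append(contig[-fragment_len:])
    return fragments


# Toy 2,500 bp contig (hypothetical sequence, for illustration only).
contig = "ACGT" * 625
pseudo_reads = shred_contig(contig)
print(len(pseudo_reads), "pseudo-reads of", len(pseudo_reads[0]), "bp")
```

Fixed-length overlapping fragments let a consensus produced by one assembler enter a second assembler as ordinary reads, which is the role the pseudo-reads play in the hybrid 454/Sanger assembly described above.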

Genome annotation
Genes were identified using Prodigal [35] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline (http://geneprimp.jgi-psf.org/) [36]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (http://img.jgi.doe.gov/er) platform [37].

Genome properties
The genome is 6,196,199 bp long and comprises one main circular chromosome with a 57.5% GC content (Figure 3 and Table 3). Of the 4,822 genes predicted, 4,773 were protein-coding genes and 49 were RNA genes. In addition, 56 pseudogenes were identified. The majority of the protein-coding genes (54.5%) were assigned a putative function, while the remainder were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4.
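The figures in this paragraph can be cross-checked with simple arithmetic. The Python sketch below uses only the numbers reported above. The derived counts are approximations, because the 54.5% value is rounded in the source, and the `gc_content` helper is a generic illustration rather than the tool used for the annotation.

```python
# Values reported in the text.
GENOME_LENGTH_BP = 6_196_199
TOTAL_GENES = 4_822
PROTEIN_CODING = 4_773
RNA_GENES = 49
FRACTION_WITH_FUNCTION = 0.545  # rounded percentage from the text

# Protein-coding and RNA genes should sum to the total gene count.
assert PROTEIN_CODING + RNA_GENES == TOTAL_GENES

# Approximate split of protein-coding genes (derived, not reported directly).
with_function = round(PROTEIN_CODING * FRACTION_WITH_FUNCTION)
hypothetical = PROTEIN_CODING - with_function

# Gene density in genes per kilobase of genome.
genes_per_kb = TOTAL_GENES / (GENOME_LENGTH_BP / 1000)


def gc_content(sequence: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


print(f"~{with_function} genes with putative function, ~{hypothetical} hypothetical")
print(f"{genes_per_kb:.3f} genes per kb")
```

Applied to the full chromosome sequence, `gc_content` would be expected to give a value close to the reported 57.5%.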