Genome sequence of the mud-dwelling archaeon Methanoplanus limicola type strain (DSM 2279T), reclassification of Methanoplanus petrolearius as Methanolacinia petrolearia and emended descriptions of the genera Methanoplanus and Methanolacinia

Methanoplanus limicola Wildgruber et al. 1984 is a mesophilic methanogen that was isolated from a swamp composed of drilling waste near Naples, Italy, shortly after the Archaea were recognized as a separate domain of life. Methanoplanus is the type genus of the family Methanoplanaceae, a taxon that fell into disuse once modern taxonomy based on 16S rRNA gene sequences was established. Methanoplanus is now placed within the Methanomicrobiaceae, a family that is so far poorly characterized at the genome level. The only other type strain of the genus with a sequenced genome, Methanoplanus petrolearius SEBR 4847T, turned out to be misclassified and required reclassification to Methanolacinia. Both Methanoplanus and Methanolacinia needed taxonomic emendations because the G+C content of their genomes deviates significantly from previously published (pre-genome-sequence era) values. Until now, genome sequences have been published for only four of the 33 species with validly published names in the Methanomicrobiaceae. Here we describe the features of M. limicola, together with the improved-high-quality draft genome sequence and annotation of the type strain, M3T. The 3,200,946 bp long chromosome (permanent draft sequence) with its 3,064 protein-coding and 65 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain M3T (= DSM 2279 = ATCC 35062 = OCM 101) is the type strain of the species Methanoplanus limicola [1,2], one of currently three species in the genus Methanoplanus [1,2]. Strain M3T was originally isolated from the mud of a drilling swamp near Baia, Naples Area, Italy [1]. The genus name was derived from the Neo-Latin term "methanum", pertaining to methane, and the Latin adjective "planus", meaning a flat plate, which refers to its flat cell morphology [1]. The species epithet was derived from the Latin word limicola, a dweller in the mud, inhabitant of a swamp [1]. When Wildgruber et al. described the type strain of the novel species in 1982 [1], they not only noted its striking similarity to the square-shaped flat bacterium that had been reported two years earlier by Walsby [3], but also classified it as the type strain of the type species in the type genus of Methanomicrobiales Family III, 'Methanoplanaceae' [1]. However, when 16S rRNA sequences became available for phylogenetic analyses years later, it became clear that the strains representing the species of Methanoplanus are closely related to the Methanomicrobiaceae (including the genera Methanomicrobium, Methanogenium, and Methanoculleus). Since that time, the genus Methanoplanus has generally been placed within the Methanomicrobiaceae, and Methanoplanaceae Wildgruber et al. 1984 has fallen into disuse [4], although the genus Methanoplanus was never formally reclassified. In the 31 years since strain M3T was first characterized, only two follow-up projects have reported the use of M. limicola in comparative analyses: Ivanov and Stabnikova [5] used M. limicola for a study on the molecular phylogeny of methanogenic archaea based on the G+C content, and Liu et al. used the species in a study on air tolerance and water stress [6]. Here we present a summary classification and a set of features for M. limicola M3T, together with the description of the genomic sequencing and annotation.

Classification and features
The single genomic 16S rRNA sequence of M. limicola M3T was compared with the Greengenes database to determine the weighted relative frequencies of taxa and (truncated) keywords as previously described [7]. The most frequently occurring genera were Methanoculleus (51.9%), Methanoplanus (18.5%), Methanogenium (16.8%), Methanosphaerula (5.3%) and Methanomicrobium (3.7%) (52 hits in total). Regarding the two hits to sequences from members of the species, the average identity within HSPs was 99.9%, whereas the average coverage by HSPs was 92.8%. Regarding the five hits to sequences from other members of the genus, the average identity within HSPs was 96.6%, whereas the average coverage by HSPs was 95.0%. Among all other species, the one yielding the highest score was M. endosymbiosus (FR733674), which corresponded to an identity of 99.5% and an HSP coverage of 99.7%. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was EU420694 ('Archaeal and Kao-Mei Wetland clone KM07-Da-3'), which showed an identity of 95.7% and an HSP coverage of 98.0%. The most frequently occurring keywords within the labels of all environmental samples which yielded hits were 'temperatur' (4.7%), 'bioreactor' (4.4%), 'anaerob' (4.0%), 'methanogen' (3.3%) and 'archaeal' (2.9%) (198 hits in total). These keywords fit the features known from the habitat of strain M3T. Environmental samples which yielded hits of a higher score than the highest scoring species were not found. Figure 1 shows the phylogenetic neighborhood of M. limicola in a 16S rRNA based tree. The sequence of the single 16S rRNA gene copy in the genome does not differ from the previously published 16S rRNA sequence (M59143), which contains 23 ambiguous base calls.
Figure 1. The tree was inferred from 1,271 aligned characters of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion and rooted as previously described [7]. The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches are support values from 250 ML bootstrap replicates [8] (left) and from 1,000 maximum-parsimony bootstrap replicates [9] (right) if larger than 60%. Lineages with type-strain genome sequencing projects registered in GOLD [10] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks [11][12][13][14] (for Methanoregula boonei and Methanosphaerula palustris see CP000780 and CP001338, respectively).
The tree depicted in Figure 1 reveals discrepancies between the current classification of the group and the 16S rRNA phylogenetic analysis: the genus Methanoplanus appeared polyphyletic, with M. petrolearius appearing as the sister group of Methanolacinia paynteri with maximum support. We conducted a constraint analysis as previously described [15], enforcing the monophyly of all genera (which only affects Methanoplanus in this dataset, see Figure 1). The best-known ML tree had a log likelihood of -7,097.90, whereas the best tree found under the constraint had a log likelihood of -7,144.12. The constrained tree was significantly worse than the globally best one in the Shimodaira-Hasegawa test as implemented in RAxML [8] (α = 0.01). The best-known MP trees had a score of 1,090, whereas the best constrained trees found had a score of 1,115 and were significantly worse in the Kishino-Hasegawa test as implemented in PAUP* [9] (α = 0.01).
M. limicola M3T cells stain Gram-negative [1] and are plate-shaped with sharp crystal-like edges, 1-3 µm long and 1-2 µm wide (Figure 2 and [1]). Weak motility was observed, and motility genes were identified in the genome (see below). Polar tufts of flagella were also reported [1] but are not visible in Figure 2. Granules with putative reserve material were observed in thin section EM images, as were curious 'bone-shaped' cells [1]. Cell envelopes consist of an S-layer glycoprotein with a hexagonal surface pattern [1]. Cultures grow with H2 or formate as sole substrates; supplementation with 0.1% acetate is essentially required [1]. Growth occurs at 17-41°C (optimum 40°C) in the presence of 0.4-5.4% NaCl (optimum 1%) [1]. A summary of the classification and features is presented in Table 1.

Chemotaxonomy
No chemotaxonomic results were reported for strain M3T, except for an estimate of 47.5% for the G+C content of the genome, determined from the melting point in 0.1 × SSC [1].
Notes to Table 1: Altitude not reported. Evidence codes: TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). Evidence codes are from the Gene Ontology project [27].

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position [28], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [29]. The genome project is deposited in the Genomes On Line Database [10] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI) using state-of-the-art sequencing technology [30]. A summary of the project information is shown in Table 2. DNA is available through the DNA Bank Network [32].

Genome sequencing and assembly
The genome was sequenced using a combination of Illumina and 454 sequencing platforms. All general aspects of library construction and sequencing can be found at the JGI website [33]. Pyrosequencing reads were assembled using the Newbler assembler (Roche

Genome annotation
Genes were identified using Prodigal [38] as part of the DOE-JGI [39] genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [40]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. These data sources were combined to assert a product description for each predicted protein. Additional gene prediction analysis and functional annotation were performed within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [41].

Genome properties
The genome consists of one scaffold (circularity not experimentally proven) of 3,200,946 bp length with a 42.2% G+C content (Table 3 and Figure 3). Of the 3,128 genes predicted, 3,064 were protein-coding genes and 65 were RNA genes; 122 pseudogenes were also identified. The majority of the protein-coding genes (60.8%) were assigned a putative function while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
Table notes: CRISPR repeats: 0. a The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome. b But five genes for 5S rRNA.
Standards in Genomic Sciences

Insights into the genome sequence

G+C content of the genus Methanoplanus
When calculated from the genome sequence, the G+C content of M. limicola DSM 2279T amounts to 42.2%, whereas the previously published value, determined using traditional ("wet-lab") techniques, is 47.5% [1]. Similarly, the G+C content of M. petrolearius was given as 50% [42], whereas the analysis of the genome sequence of the type strain SEBR 4847T (DSM 11571) yielded 47.4% [11]. It has frequently been stated in the literature that "organisms that differ by more than 10 mol% do not belong to the same genus and that 5 mol% is the common range found within a species" [43]. A recent study [44] has shown that, when calculated from genome sequences, the G+C content varies by at most 1% within species and that larger variances are caused by the limitations of the traditional analytical techniques. It has thus been recommended to emend species descriptions in the case of discrepancies larger than 1%, and also to emend genus descriptions if the species emendations yield values that do not fit into the range of the G+C content given in the literature for the respective genus [44].
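The comparison above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in [44] or by the annotation pipeline: the function names gc_content and needs_emendation, the constant THRESHOLD, and the toy sequence are hypothetical, and the only data used are the percentages quoted in the text.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence in percent.

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Values quoted in the text: (wet-lab value, genome-derived value), in percent.
REPORTED = {
    "Methanoplanus limicola": (47.5, 42.2),
    "Methanoplanus petrolearius": (50.0, 47.4),
}

# Discrepancy above which an emendation is recommended according to [44].
THRESHOLD = 1.0


def needs_emendation(wet_lab: float, genome: float,
                     threshold: float = THRESHOLD) -> bool:
    """True if the published and genome-derived values differ by more than the threshold."""
    return abs(wet_lab - genome) > threshold


if __name__ == "__main__":
    # Toy sequence, for illustration only (not taken from the genome).
    print(f"toy sequence G+C: {gc_content('GGCCAATTNN'):.1f}%")
    for species, (wet_lab, genome) in REPORTED.items():
        flag = needs_emendation(wet_lab, genome)
        print(f"{species}: difference {wet_lab - genome:.1f}, emendation recommended: {flag}")
```

With the values from the text, both species exceed the 1% threshold (differences of 5.3 and 2.6 percentage points, respectively), which is the basis for the emendations proposed in this report.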

Considerations about the polyphyletic genus Methanoplanus
The phylogenetic tree presented in Figure 1 shows Methanoplanus as a polyphyletic taxon with the members of Methanomicrobium and Methanolacinia interspersed between the members of Methanoplanus. Given the high bootstrap support for the branches in that section of the phylogenetic tree, this situation calls for some attention, mainly due to the location of M. petrolearius [42]. The conflict between 16S rRNA gene data and the classification is significant, as revealed by the bootstrap values and the paired-site tests described above. The problematic local structure of the phylogenetic tree might be caused by the fact that most of the five species located in the respective part of the tree were described in the early days of Archaea research, when only a limited number of reference sequences were available: M. limicola dates from 1982 [1], M. endosymbiosus from 1986 [45], M. petrolearius from 1997 [42], Ml. paynteri from 1983 [46] (renamed in 1989 [47]), and Methanomicrobium mobilis even from 1968 [48]. The techniques available for the initial taxonomic characterization of the then novel bacteria were much less advanced than today; e.g., Sanger sequencing had just been invented (in 1977) when M. limicola was characterized with DNA-RNA hybridizations as the decisive technique [49], and was still not generally used for taxonomic work when M. endosymbiosus was characterized four years later. When the latest of the three Methanoplanus species with a validly published name, M. petrolearius, was added in 1997, 16S rRNA sequences were used, but the ones from Ml. paynteri (closest neighbor in the phylogenetic tree in Figure 1) and M. mobilis were not yet available or at least not used for comparative analyses [42]. The completion of the Sequencing Orphan Species (SOS) initiative early last year [50] closed the last gaps in the availability of high-quality 16S rRNA reference sequences for phylogenetic trees.
However, a decade after the first genome-based investigations into the history of the domain Archaea [51] and the systematic overview of their evolution, physiology, and molecular biology [52], a significant number of draft genome sequences, such as those generated in the Genomic Encyclopedia of Bacteria and Archaea [29], are still very much needed to cover all of the diversity of the Archaea, especially from difficult-to-grow organisms and from type strains of remote clades such as the Methanomicrobiaceae. Despite these limitations, a closer inspection of the positions of the members of Methanoplanus in Figure 1 might still be worthwhile. M. petrolearius appears to be clearly separated from the other two members of the genus, M. limicola and M. endosymbiosus, but closely linked to Ml. paynteri with a 99.8% 16S rRNA gene sequence identity. Table 5 shows a summary of the features of all members of the genera Methanoplanus and Methanolacinia. Based on the higher optimal growth temperature, the lack of observed flagella and observed motility (although the flagellin genes are encoded in the genome), the usage of CO2 + 2-propanol as a substrate, and the higher G+C content of the genome [42], M. petrolearius clusters with Ml. paynteri rather than with the other two members of Methanoplanus.

Taxonomic consequences
As explained in detail above, the differences between the reported G+C contents of M. limicola and M. petrolearius and those calculated from their genome sequences justify an emendation of the species descriptions. Moreover, M. petrolearius should be placed within the genus Methanolacinia.
The descriptions of the two genera should be emended accordingly.

Emended description of the species Methanoplanus limicola Wildgruber et al. 1984
The description of the species Methanoplanus limicola is the one given by Wildgruber et al. 1982 [1], with the following modification. The G+C content is 42%.

Emended description of the species Methanoplanus petrolearius Ollivier et al. 1997
The description of the species Methanoplanus petrolearius is the one given by Ollivier et al. 1997 [42], with the following modification. The G+C content is 47%.

Description of Methanolacinia petrolearia, comb. nov.
Basonym: Methanoplanus petrolearius Ollivier et al. 1997.
The description of the species is the same as given for Methanoplanus petrolearius Ollivier et al. 1997 with the emendation given above.

Emended description of the genus Methanoplanus
The description is the one given by Wildgruber et al. [1] with the following modifications: The G+C content is 39-42%.

Emended description of the genus Methanolacinia
The description is the one given by Zellner et al. [47] with the following modifications: The G+C content is 45-47%.

Acknowledgments
We