Non-contiguous finished genome sequence and description of Halopiger goleamassiliensis sp. nov.

Halopiger goleamassiliensis strain IIH3T sp. nov. is a novel, extremely halophilic archaeon within the genus Halopiger. This strain was isolated from an evaporitic sediment in El Golea Lake, Ghardaïa region (Algeria). The type strain is strain IIH3T. H. goleamassiliensis is moderately thermophilic, neutrophilic, non-motile and coccus-shaped. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,906,923 bp long genome contains 3,854 protein-encoding genes and 49 RNA genes (1 gene is 16S rRNA, 1 gene is 23S rRNA, 3 genes are 5S rRNA, and 44 are tRNA genes).


Introduction
Halopiger goleamassiliensis sp. nov. strain IIH3T (= KC430940 = CSUR P3036 = DSM on-going deposit) is the type strain of H. goleamassiliensis sp. nov. This organism is a Gram-negative, extremely halophilic, moderately thermophilic and strictly aerobic archaeon. It was isolated from an evaporitic sediment in El Golea Lake, Ghardaïa region (Algeria), as part of a project studying archaeal diversity in hypersaline lakes of Algeria. The number of genera and species belonging to the Halobacteria (Archaea, Euryarchaeota) has increased recently owing to studies of several different hypersaline environments (thalassohaline and athalassohaline) combined with the use of different isolation media and culture conditions [1]. At the time of writing, the family Halobacteriaceae, the single family described within the order Halobacteriales, accommodated 40 recognized genera [2]. The genus Halopiger was proposed by Gutiérrez et al. (2007) [3] and contains only three species: Halopiger xanaduensis, isolated from Shangmatala Lake (China) [3]; Halopiger aswanensis, isolated from a hypersaline soil in Aswan (Egypt) [4]; and Halopiger salifodinae, recently isolated from a salt mine in Kuche county, Xinjiang province (China) [5]. So far, this genus is composed of strictly aerobic, Gram-negative, polymorphic and pigmented strains. We have recently used [6-18] a polyphasic approach for prokaryotic classification [19] that includes genomic data [20,21], MALDI-TOF spectra [22,23] and major phenotypic characteristics. Using this approach, we report here a summary classification and a set of features for Halopiger goleamassiliensis sp. nov. strain IIH3T, together with the description of the complete genomic sequencing and annotation. These characteristics support the circumscription of the species H. goleamassiliensis.

Classification and features
H. goleamassiliensis was isolated from an evaporitic sediment of the hypersaline El Golea Lake in the Ghardaïa region of Algeria. The sediment sample (1 g) was enriched in liquid SG medium [24] containing ampicillin (100 μg/mL) at 55°C on a rotary shaking platform (150 rpm) for 7 to 15 days. Serial dilutions of the enrichment cultures were plated on SG agar plates and incubated aerobically at 55°C. After 2 to 6 weeks of incubation, representative colonies were picked and maintained in SG medium at 55°C. Strain IIH3T (Table 1) was isolated in 2012 by cultivation under aerobic conditions at 55°C and stored at -80°C with 25% (v/v) glycerol. Genomic DNA was extracted and purified using the Genomic DNA purification kit (MACHEREY-NAGEL, Hoerdt, France). The 16S rRNA gene was amplified by PCR using the primers 21AF (TTCCGGTTGATCCTGCCGGA) and RP2 (ACGGCTACCTTGTTACGACTT). A total of 1,444 bases were identified. The sequence was compared with available sequences in GenBank using a BLAST search [36]. The strain exhibited 96% nucleotide sequence similarity with Halopiger xanaduensis [3]. This value is lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [37]. A phylogenetic tree (Figure 1) was constructed using the neighbor-joining method with the MEGA 5 program package [38] after multiple alignment of the data using MUSCLE [39]. Evolutionary distances were calculated using the Tamura-Nei model [40].

Table 1 footnote (beginning missing): [...] not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [35]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
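The comparison of the 16S rRNA similarity with the 98.7% threshold can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned (gaps written as "-"); the function names and the toy strings are hypothetical and are not real 16S data, and BLAST computes its reported identity differently, so this only illustrates the decision rule.

```python
SPECIES_THRESHOLD = 98.7  # percent; 16S rRNA threshold cited in the text [37]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def below_species_threshold(similarity_percent: float,
                            threshold: float = SPECIES_THRESHOLD) -> bool:
    """True when the similarity is lower than the species-delineation threshold."""
    return similarity_percent < threshold


# Toy aligned strings (hypothetical): 20 ungapped columns, 1 mismatch.
toy_a = "ACGTACGTAC-GTACGTACGT"
toy_b = "ACGTACGTACAGTACGTTCGT"

if __name__ == "__main__":
    print(percent_identity(toy_a, toy_b))
    # The 96% similarity reported for strain IIH3T falls below the threshold.
    print(below_species_threshold(96.0))
```

With the value reported in the text, `below_species_threshold(96.0)` is true, which is the condition used above to argue that the strain can be delineated without DNA-DNA hybridization.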
Phenotypic characterization was carried out according to the recommended minimal standards for the description of new taxa in the order Halobacteriales [41]. Under optimal growth conditions on SG agar medium and after incubation for 15-20 days at 55°C, colonies were salmon-pigmented and circular, with a diameter of 1-2 mm. Cell morphology and motility were examined by light microscopy and phase-contrast microscopy. Gram staining was performed using samples fixed with acetic acid, as described by Dussault in 1955 [42]. Cells are Gram-negative cocci (Figure 2) measuring 0.8-1.5 μm in diameter (Figure 3). Motility, spores and capsules were not observed. All the following biochemical and nutritional tests were performed in duplicate. Strain IIH3T was found to be oxidase- and catalase-positive. The strain is extremely halophilic, and cell lysis is observed in distilled water. It is a strictly aerobic organism, and anaerobic growth does not occur even in the presence of KNO3 or arginine. Neither magnesium nor amino acids are required for growth. Tween 80, gelatin and lipids from egg yolk are hydrolysed, whereas urea, starch and casein are not; phosphatase activity is negative. Indole production and the methyl red, Voges-Proskauer and Simmons' citrate tests are negative. H2S is not produced from cysteine. Utilization of carbohydrates and other compounds as sole carbon sources, and acid production from these compounds, were determined as described by Oren [41]. Several sugars and amino acids can serve as sole carbon and energy sources (Table 2). Antibiotic sensitivity was determined on SG medium agar plates with antibiotic discs.
Strain IIH3T is susceptible to bacitracin (10 μg), [...]

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is considered a reliable and rapid identification method for extremophilic prokaryotes [22,23], and it was used in the present study to characterize strain IIH3T as previously described [6-18]. A pipette tip was used to pick one isolated archaeal colony from a culture agar plate and to spread it as a thin film on an MTP 384 MALDI-TOF target plate (Bruker Daltonics, Leipzig, Germany). The colonies from strain IIH3T and from other species of archaea were spotted in triplicate. After air-drying, 1.5 μl of matrix solution (a saturated solution of α-cyano-4-hydroxycinnamic acid [CHCA] in 50% aqueous acetonitrile containing 2.5% trifluoroacetic acid) was applied per spot and allowed to dry for five minutes. Mass spectrometric measurements were performed with a Microflex spectrometer (Bruker). Spectra were recorded in the positive linear mode for the mass range of 2,000 to 20,000 Da. The acceleration voltage was 20 kV. The time of acquisition was between 30 seconds and 1 minute per spot. Spectra were collected as a sum of 240 shots across a spot. Preprocessing and identification steps were performed using the manufacturer's parameters. The IIH3T spectrum (Figure 4) was compared with the spectra of other archaeal species (Figure 5). A score enabled the identification, or not, from the tested species: a score > 2.3 with a validly published species enabled identification at the species level; a score > 1.7 but < 2 enabled identification at the genus level; and a score < 1.7 did not enable any identification. For strain IIH3T, none of the obtained scores was > 1, suggesting that our isolate was not a member of a known species. We added the spectrum from strain IIH3T to our database for future reference. Figure 5 shows the MALDI-TOF MS spectrum differences between H. goleamassiliensis and other Archaea.
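The score rule described above can be written as a small decision function. This is a minimal sketch using only the cut-offs quoted in the text; the function name is hypothetical and it is not the manufacturer's software. The text does not say how a score of exactly 1.7 or a score between 2 and 2.3 is handled, so the sketch returns "unspecified" for those values rather than guessing.

```python
def interpret_score(score: float) -> str:
    """Map a MALDI-TOF matching score to the identification level stated in the text."""
    if score > 2.3:
        return "species"      # score > 2.3 with a validly published species
    if 1.7 < score < 2.0:
        return "genus"        # score > 1.7 but < 2
    if score < 1.7:
        return "none"         # no identification possible
    # Exactly 1.7, or 2.0 to 2.3: outcome not stated in the text.
    return "unspecified"


def best_identification(scores: list[float]) -> str:
    """Interpret the highest score obtained against a reference database."""
    if not scores:
        raise ValueError("at least one score is required")
    return interpret_score(max(scores))


if __name__ == "__main__":
    # Hypothetical scores, all below 1 as reported for strain IIH3T.
    print(best_identification([0.4, 0.8, 0.95]))
```

Because every score obtained for strain IIH3T was below 1, the best match falls in the "none" branch, consistent with the conclusion that the isolate does not belong to a known species in the database.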

Genome sequencing information

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the genus Halopiger, and as part of a study of archaeal diversity in hypersaline lakes of Algeria. It is the second genome of a Halopiger species and the first sequenced genome of H. goleamassiliensis sp. nov. The genome is deposited in EMBL under accession numbers CBMB010000001-CBMB010000011 and consists of 3 scaffolds (HG315690-HG315692). A summary of the project information (PRJEB1780) and its association with MIGS version 2.0 recommendations [27] is shown in Table 3.

Growth conditions and DNA isolation
H. goleamassiliensis sp. nov. strain IIH3T (= CSUR P3036 = DSM on-going deposit) was grown in SG medium at 55°C under aerobic conditions. DNA was isolated and purified using the Genomic DNA purification kit, NucleoSpin Tissue procedure (MACHEREY-NAGEL), following the standard protocol recommended by the manufacturer. The quality of the DNA was checked on an agarose gel (0.8%) stained with SYBR Safe. The yield and concentration were measured with the Quant-iT PicoGreen kit (Invitrogen) on the Genios Tecan fluorometer; the concentration was 33.1 ng/µL.

Genome sequencing and assembly
A 5 kb paired-end sequencing strategy (Roche, Meylan, France) was used. This project was loaded on a 1/4 region of a PicoTiterPlate (PTP; Roche). Three µg of DNA was mechanically fragmented on the Covaris device (KBioScience-LGC Genomics, Teddington, UK) using miniTUBE-Red 5Kb. The DNA fragmentation was visualized with an Agilent 2100 BioAnalyzer on a DNA LabChip 7500, with an optimal size of 4.7 kb. The library was constructed according to the 454 GS FLX Titanium paired-end protocol. After PCR amplification through 17 cycles followed by double size selection, the single-stranded paired-end library was loaded on an RNA Pico 6000 LabChip on the BioAnalyzer. The pattern showed an optimum at 480 bp, and the concentration was quantified on a Genios Tecan fluorometer at 642 pg/µL. The concentration equivalence of the library was calculated at 10^8 molecules/µL. The library was stored at -20°C until further use, and amplified in 2 emPCR reactions at 0.25 cpb, in 2 emPCR reactions at 0.5 cpb and in 2 emPCR reactions at 1 cpb with the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The yields of the 3 types of paired-end emPCR reactions were 3.68%, 8.05% and 10.69%, respectively, in the quality range of 5 to 20% expected from the Roche procedure. These emPCR reactions were pooled. Both libraries were loaded onto GS Titanium PicoTiterPlates (PTP Kit 70×75, Roche) and pyrosequenced with the GS Titanium Sequencing Kit XLR70 (Roche). The run was performed overnight and then analyzed on the cluster through the gsRunBrowser and Newbler assembler (Roche).

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal with default parameters [43]. ORFs spanning a sequencing gap region were excluded. Protein function was assessed by comparing the predicted protein sequences with sequences in the GenBank [44] and Clusters of Orthologous Groups (COG) databases using BLASTP. RNAmmer [45] and tRNAscan-SE 1.21 [46] were used to identify the rRNAs and tRNAs, respectively. SignalP [47] and TMHMM [48] were used to predict signal peptides and transmembrane helices, respectively. For alignment lengths greater than 80 amino acids, ORFans were identified if their BLASTP E-value was lower than 1e-03. An E-value of 1e-05 was used if alignment lengths were smaller than 80 amino acids. DNA Plotter [49] was used for visualization of genomic features and Artemis [50] for data management. The mean level of nucleotide sequence similarity was estimated at the genome level between H. goleamassiliensis and 5 other members of the family Halobacteriaceae (Table 6) by BLASTN comparison of orthologous ORFs in pairwise genomes. Orthologous proteins were detected using the Proteinortho software with the following parameters: E-value 1e-05, 30% identity, 50% coverage and 50% algebraic connectivity [51].
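The orthology thresholds listed above can be pictured as a filter on pairwise protein hits. The following is a minimal sketch, not Proteinortho's implementation: the `Hit` record, its field names and the inclusive comparisons are assumptions made for illustration, and the algebraic connectivity criterion, which operates on the graph of hits rather than on single hits, is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise protein hit (hypothetical record layout)."""
    query: str
    subject: str
    evalue: float
    identity: float   # percent identity
    coverage: float   # percent of the query covered by the alignment


def passes_thresholds(hit: Hit,
                      max_evalue: float = 1e-05,
                      min_identity: float = 30.0,
                      min_coverage: float = 50.0) -> bool:
    """Apply the E-value, identity and coverage cut-offs quoted in the text."""
    return (hit.evalue <= max_evalue
            and hit.identity >= min_identity
            and hit.coverage >= min_coverage)


def filter_hits(hits: list[Hit]) -> list[Hit]:
    """Keep only hits that satisfy all three cut-offs."""
    return [h for h in hits if passes_thresholds(h)]


# Hypothetical hits, for illustration only.
example_hits = [
    Hit("orf_0001", "ref_0457", evalue=1e-40, identity=72.5, coverage=95.0),
    Hit("orf_0002", "ref_1203", evalue=1e-04, identity=45.0, coverage=80.0),  # E-value too high
    Hit("orf_0003", "ref_0088", evalue=1e-12, identity=28.0, coverage=90.0),  # identity too low
    Hit("orf_0004", "ref_0310", evalue=1e-20, identity=55.0, coverage=40.0),  # coverage too low
]

if __name__ == "__main__":
    for hit in filter_hits(example_hits):
        print(hit.query, hit.subject)
```

A hit must satisfy all three cut-offs at once; relaxing any one of them would admit more distant or partial matches into the ortholog comparison.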

Genome properties
The genome is 3,906,923 bp long and displays a G+C content of 66.06% (Table 4, Figure 6). It is composed of 12 contigs (11 large contigs >1,500 bp) arranged into 3 scaffolds. Of the 3,903 predicted genes, 3,854 were protein-coding genes and 49 were RNAs (one 16S rRNA gene, one 23S rRNA gene, three 5S rRNA genes and 44 tRNA genes). A total of 2,359 genes (61.21%) were assigned a putative function (by COG or by NR BLAST), and 188 genes (4.88%) were identified as ORFans. The remaining genes (1,059 genes, 27.48%) were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4. The properties and the statistics of the genome are summarized in Tables 4 and 5.

Table footnotes: (a) The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome. The total is based on the total number of protein-coding genes in the annotated genome.
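The gene counts and percentages above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable and function names are hypothetical. The stated percentages are reproduced when each count is taken relative to the 3,854 protein-coding genes.

```python
# Counts taken from the text.
PROTEIN_CODING = 3854
RNA_GENES = {"16S rRNA": 1, "23S rRNA": 1, "5S rRNA": 3, "tRNA": 44}

total_rna = sum(RNA_GENES.values())
total_genes = PROTEIN_CODING + total_rna


def percent_of_protein_coding(count: int) -> float:
    """Share of the protein-coding genes, rounded to two decimals."""
    return round(100.0 * count / PROTEIN_CODING, 2)


if __name__ == "__main__":
    print("RNA genes:", total_rna)
    print("Predicted genes:", total_genes)
    print("Putative function (%):", percent_of_protein_coding(2359))
    print("ORFans (%):", percent_of_protein_coding(188))
    print("Hypothetical proteins (%):", percent_of_protein_coding(1059))
```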

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Halopiger goleamassiliensis sp. nov., which contains strain IIH3T. This archaeal strain was found in Algeria.

Description of Halopiger goleamassiliensis sp. nov.
Halopiger goleamassiliensis (go.le'a.ma.si.li.en'sis. L. gen. masc. n. goleamassiliensis from the combination of El Golea, the Algerian region where the strain was isolated, and massiliensis, of Massilia, the Latin name of Marseille, where the strain was sequenced). It was isolated from an evaporitic sediment in El Golea Lake, Algeria. Colonies are smooth, salmon-pigmented and small, 1 to 2 mm in diameter under optimal growth conditions. The strain is a strictly aerobic, extremely halophilic and moderately thermophilic archaeon. Growth occurs at NaCl concentrations of 15-30%, at pH values in the range 7-11, and within the temperature range 40-60°C. Optimal NaCl concentration, pH and temperature for growth are 22.5-25%, 8.0 and 55°C, respectively. Magnesium is not required for growth. Cells are coccus-shaped (0.8-1.5 µm), Gram-negative and non-motile, and lyse in distilled water. Cells are positive for catalase, oxidase and lysine decarboxylase production and negative for urease, arginine dihydrolase, ornithine decarboxylase, tryptophanase, phosphatase and β-galactosidase, and for D-mannitol, saccharose, starch, dextrose and D-fructose fermentation. The following substrates are utilized as single carbon and energy sources for growth: pyruvate, D-glucose, D-mannose, D-ribose, D-xylose, maltose, sucrose, lactose, casamino acids, bacto-peptone, bacto-tryptone and yeast extract. Tween 80, gelatin and lipids from egg yolk are hydrolysed, whereas urea, starch and casein are not. Methyl red, Voges-Proskauer and Simmons' citrate tests and H2S production are negative. Cells are susceptible to bacitracin, novobiocin, streptomycin and sulfamethoxazole but resistant to ampicillin, cephalothin, chloramphenicol, erythromycin, gentamicin, kanamycin, nalidixic acid, penicillin G, rifampicin, tetracycline and vancomycin. The G+C content of the DNA is 66.06%.
The 16S rRNA and genome sequences are deposited in GenBank and EMBL under accession numbers KC430940 and CBMB010000001-CBMB010000011, respectively. The type strain IIH3T (= CSUR P3036 = DSM 27562) was isolated from an evaporitic sediment in El Golea Lake, Algeria.