Non-contiguous finished genome sequence and description of Halopiger djelfamassiliensis sp. nov.

Halopiger djelfamassiliensis strain IIH2T sp. nov. is the type strain of Halopiger djelfamassiliensis sp. nov., a new species within the genus Halopiger. This strain, whose genome is described here, was isolated from evaporitic sediment of the hypersaline Lake Zahrez Gharbi in the Djelfa region (Algeria). H. djelfamassiliensis is a Gram-negative, polymorphic and strictly aerobic archaeon. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,771,216 bp long genome contains 3,761 protein-coding and 51 RNA genes, including 4 rRNA genes.


Introduction
Halopiger djelfamassiliensis sp. nov. strain IIH2T (= KC430939 = DSM ongoing deposit) is the type strain of H. djelfamassiliensis sp. nov. It is a Gram-negative, aerobic, non-motile and polymorphic archaeon that was isolated from evaporitic sediment of the hypersaline Lake Zahrez Gharbi in the Djelfa region (Algeria) as part of a project studying archaeal diversity in the hypersaline lakes of Algeria. Traditionally, prokaryotes are classified using a combination of phenotypic and genotypic characteristics [1], an approach known as polyphasic taxonomy. To date, only 192 archaeal genomes have been sequenced [2]. As the cost of genome sequencing is constantly decreasing, the number of sequenced archaeal genomes is expected to grow in the next few years. We propose to describe new archaeal species by adding genomic information [3,4] to phenotypic criteria, including the protein profile [5,6], as was previously done for the description of new bacterial species [7-19]. The genus Halopiger, created in 2007 by Gutiérrez [20], contains only three species: Halopiger xanaduensis SH-6T, isolated from the Shangmatala salt lake, Inner Mongolia, China [20]; Halopiger aswanensis 56T, isolated from the surface of hypersaline salt soils close to Aswan, Egypt [21]; and Halopiger salifodinae KCY07-B2T, recently isolated from a salt mine in Kuche county, Xinjiang province, China [22]. So far, this genus is composed of aerobic, Gram-negative, polymorphic and pigmented strains [20-22]. Here, we present a summary classification and a set of features for H. djelfamassiliensis sp. nov. strain IIH2T (= KC430939 = DSM ongoing deposit), together with the description of the complete genome sequencing and annotation. These characteristics support the circumscription of the species H. djelfamassiliensis.

Classification and features
Halopiger djelfamassiliensis sp. nov. strain IIH2T was isolated from evaporitic sediment of the hypersaline Lake Zahrez Gharbi in the Djelfa region of Algeria. Sediment samples (1 g) were added to 250 mL Erlenmeyer flasks containing 100 mL of SG medium [23] supplemented with ampicillin (100 µg/mL). Liquid enrichment cultures were incubated on a rotary shaking platform at 150 rpm for 7 to 10 days. After 1/10 dilution, aliquots (100 µL) were plated on SG medium supplemented with sterilized sediment extracts and incubated at 40°C for 7-30 days. To obtain a pure culture, colonies were transferred to fresh solid SG medium. Strain IIH2T (Table 1) was isolated in 2012 by cultivation under aerobic conditions at 40°C. The strain exhibited a 16S rRNA gene sequence similarity with other members of the genus Halopiger ranging from 95% with H. salifodinae strain KCY07-B2T to 96% with H. xanaduensis strain SH-6T and H. aswanensis strain 56T, its closest validated phylogenetic neighbor (Figure 1). These values were lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [32].
Table 1 footnote: ... not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [31]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
Phenotypic tests of the strain were performed according to the proposed minimal standards for the description of new taxa in the order Halobacteriales [33]. Different growth temperatures (30, 37, 40, 50, 55 and 60°C), pH values (5, 6, 7, 7.5, 8, 8.5, 9, 10, 11 and 12) and NaCl concentrations (0, 10, 12, 15, 20, 22.5, 25 and 30%) were tested. The requirement of Mg2+ for growth was determined in media containing 0, 1, 2.5 and 5 g MgSO4.
Growth occurred between 37°C and 55°C (optimum at 40°C), between 15% and 30% NaCl (optimum at 25% NaCl) and between pH 7 and 11 (optimum at pH 8). Mg2+ was not required for growth. Colony morphology was observed under optimal growth conditions on agar medium after incubation under aerobic conditions at 40°C for 7 days. The colonies of strain IIH2T were cream-pigmented, viscous and smooth, with a diameter of 3 to 4 mm. The motility test was negative. Gram staining was performed following the method outlined by Dussault in 1955 [34]. Cells grown on SG agar medium were Gram-negative (Figure 2) and polymorphic, with a diameter ranging between 0.9 and 2.2 µm (Figure 3). All the following biochemical and nutritional tests were performed in duplicate. Strain IIH2T was found to be oxidase- and catalase-positive. Negative results were obtained for tryptophanase, β-galactosidase, arginine decarboxylase, and H2S and indole production. Tween 80, gelatin, casein and lipids from egg yolk were hydrolysed at 40°C and 55°C, whereas urea and starch were not; the phosphatase test was also negative. Methyl red and Voges-Proskauer tests were negative. To estimate the utilization of various carbohydrates as carbon and energy sources, a minimal medium [250 g/L NaCl, 20 g/L MgSO4·7H2O, 2 g/L KCl, 0.1 g/L yeast extract (Difco), 0.5 g/L NH4Cl and 0.05 g/L KH2PO4, at pH 8.0] was supplemented with 1% of the test carbohydrate. Strain IIH2T can use organic nitrogen compounds such as casamino acids, peptone and tryptone, and non-nitrogenous compounds such as acetate and pyruvate, as sole sources of carbon and energy. Production of acids from carbohydrates was tested in the minimal medium supplemented with 0.5 g/L of test substrate. Phenol red was used as an indicator to detect acid production. Positive reactions were observed for D-glucose, D-melibiose, L-rhamnose, D-xylose, D-galactose, D-mannose, D-ribose and D-sucrose fermentation.
No fermentation was observed with starch, fructose, D-lactose, dextran and mannitol.
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) protein analysis was carried out as previously described [5,6] using a Microflex spectrometer (Bruker Daltonics, Germany). Briefly, a pipette tip was used to pick one isolated archaeal colony from a culture agar plate and spread it as a thin film on an MTP 384 MALDI-TOF target plate (Bruker Daltonics). Twelve distinct deposits were made for strain IIH2T from 12 isolated colonies. Each smear was overlaid with 1.5 µL of matrix solution (a saturated solution of alpha-cyano-4-hydroxycinnamic acid in 50% acetonitrile and 2.5% trifluoroacetic acid) and allowed to dry for 5 minutes. Spectra were recorded in the positive linear mode for the mass range from 2,000 to 20,000 Da. A spectrum was obtained after 675 shots with variable laser power. The acquisition time was between 30 seconds and 1 minute per spot. The 12 IIH2T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 8 archaea (Figures 4 and 5). The identification method included the m/z range from 2,000 to 20,000 Da. For every spectrum, at most 100 peaks were taken into account and compared with the spectra in the database. The MALDI-TOF score enabled the predictive identification and discrimination of the tested species from those in the database: a score > 2 with a validated species enabled identification at the species level, and a score < 1.7 did not enable any identification. No significant score was obtained for strain IIH2T against the archaea database, suggesting that our isolate was not a member of a known species. We added the spectrum from strain IIH2T to our database for future reference (Figure 4). Figure 5 shows the MALDI-TOF MS spectrum differences between H. djelfamassiliensis and other archaea.
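The score interpretation described above can be illustrated with a minimal Python sketch. Only the two cut-offs (> 2 and < 1.7) come from the text; the function name, the label used for scores between the two cut-offs, and the example scores are assumptions made for illustration and do not reproduce the BioTyper software.

```python
# Cut-offs quoted in the text above.
SPECIES_LEVEL_ABOVE = 2.0   # score > 2: identification at the species level
NO_ID_BELOW = 1.7           # score < 1.7: no identification


def interpret_score(score: float) -> str:
    """Map a MALDI-TOF match score to the interpretation given in the text.

    Scores from 1.7 to 2 inclusive are not covered by the text, so they are
    reported here under a placeholder label (an assumption).
    """
    if score > SPECIES_LEVEL_ABOVE:
        return "species-level identification"
    if score < NO_ID_BELOW:
        return "no identification"
    return "intermediate (not specified in the text)"


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    for s in (2.3, 1.8, 1.2):
        print(s, "->", interpret_score(s))
```

Under this reading, a new isolate whose best scores against every database entry fall below 1.7 would be reported as unidentified, which is the situation described for strain IIH2T.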

Genome sequencing information

Genome project history
The organism was selected for sequencing on the basis of its phenotypic differences, phylogenetic position and 16S rRNA similarity to other members of the genus Halopiger, and as part of the study of archaeal diversity in hypersaline lakes of Algeria. It is the second genome of a Halopiger species and the first sequenced genome of H. djelfamassiliensis sp. nov. The EMBL accession numbers are CBMA010000001-CBMA010000055, and the assembly consists of 6 scaffolds (HG315684-HG315689). Table 3 shows a summary of the project (PRJEB1777) information and its association with MIGS version 2.0 recommendations [24].
Gel view figure caption: The gel view displays the raw spectra of all loaded spectrum files arranged in a pseudo-gel-like look. The x-axis records the m/z value. The left y-axis displays the running spectrum number originating from subsequent spectra loading. The peak intensity is expressed by a gray-scale scheme code. The color bar and the right y-axis indicate the relation between the color in which a peak is displayed and the peak intensity in arbitrary units.

Growth conditions and DNA isolation
Halopiger djelfamassiliensis strain IIH2T sp. nov. (= CSUR P3035 = DSM ongoing deposit) was grown aerobically on SG medium at 40°C. Four Petri dishes were inoculated, and the resulting growth was resuspended in 4 × 50 µL of DTT buffer (60 mM). After incubation at 60°C for 20 min, proteinase K (0.2 mg/mL) was added and the sample was incubated at 37°C for 2 h. The lysate was extracted with an equal volume of buffered phenol, followed by a classical phenol-chloroform extraction [35]. The quality of the DNA was checked on an agarose gel (0.8%) stained with SYBR Safe. The DNA concentration, measured using the Quant-iT PicoGreen kit (Invitrogen) on the Genios Tecan fluorometer, was 126 ng/µL.

Genome sequencing and assembly
A paired-end sequencing strategy was used (Roche

Genome annotation
Open reading frames (ORFs) were predicted using Prodigal [36] with default parameters. ORFs spanning a sequencing gap region were excluded. Protein function was assessed by comparing the predicted protein sequences with sequences in the GenBank [37] and Clusters of Orthologous Groups (COG) databases using BLASTP. RNAmmer [38] and tRNAscan-SE 1.21 [39] were used to identify rRNAs and tRNAs, respectively. SignalP [40] and TMHMM [41] were used to predict signal peptides and transmembrane helices, respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids; an E-value of 1e-05 was used if alignment lengths were smaller than 80 amino acids. DNA Plotter [42] was used for visualization of genomic features and Artemis [43] was used for data management. The mean level of nucleotide sequence similarity was estimated at the genome level between H. djelfamassiliensis and 5 other members of the Halobacteriaceae family (Table 6) by BLASTN comparison of orthologous ORFs in pairwise genomes. Orthologous proteins were detected using the Proteinortho software with the following parameters: E-value 1e-05, 30% identity, 50% coverage and 50% algebraic connectivity [44].
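The length-dependent E-value rule stated above can be written as a small Python sketch. The two cut-offs and the 80-amino-acid split are taken from the text; the function names are hypothetical, and the treatment of an alignment of exactly 80 amino acids (grouped here with the shorter alignments) is an assumption, since the text does not cover that case. This is not the authors' pipeline, only an illustration of the rule.

```python
LENGTH_SPLIT_AA = 80  # alignment-length split quoted in the text


def evalue_cutoff(alignment_length: int) -> float:
    """Return the BLASTP E-value cut-off that applies to an alignment.

    Alignments longer than 80 aa use 1e-03; shorter ones use 1e-05.
    Exactly 80 aa is grouped with the shorter alignments (assumption).
    """
    return 1e-03 if alignment_length > LENGTH_SPLIT_AA else 1e-05


def below_cutoff(alignment_length: int, evalue: float) -> bool:
    """True if the hit's E-value is lower than the applicable cut-off."""
    return evalue < evalue_cutoff(alignment_length)


if __name__ == "__main__":
    # Hypothetical hits: (alignment length in aa, E-value).
    for length, evalue in [(120, 1e-04), (60, 1e-04), (60, 1e-07)]:
        print(length, evalue, below_cutoff(length, evalue))
```

The same E-value of 1e-04 passes the cut-off for a 120 aa alignment but not for a 60 aa alignment, which shows why the rule is stricter for short alignments.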

Genome properties
The genome is 3,771,216 bp long with a G+C content of 64.30% (Table 4, Figure 6). It is composed of 73 contigs (54 contigs are >1,500 bp) arranged into 6 scaffolds. Of the 3,812 predicted genes, 3,761 were protein-coding genes and 51 were RNA genes (one 16S rRNA gene, one 23S rRNA gene, two 5S rRNA genes and 47 tRNA genes). A total of 2,319 genes (61.66%) were assigned a putative function (by COG or by NR BLAST). In addition, 174 genes (4.63%) were identified as ORFans. The remaining genes were annotated as hypothetical proteins (1,035 genes, 27.52%). The distribution of genes into COG functional categories is presented in Table 5. The properties and the statistics of the genome are summarized in Tables 4 and 5.
Table footnotes: a The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome. The total is based on the total number of protein-coding genes in the annotated genome.
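The gene counts and percentages reported in this section can be cross-checked with a few lines of Python. All figures are taken from the text; the variable and function names are illustrative, and the use of the protein-coding gene count as the denominator follows the table footnote.

```python
# Counts reported in the text.
PROTEIN_CODING = 3761
RNA_GENES = {"16S rRNA": 1, "23S rRNA": 1, "5S rRNA": 2, "tRNA": 47}

total_rna = sum(RNA_GENES.values())            # 51 RNA genes
total_genes = PROTEIN_CODING + total_rna       # 3,812 predicted genes
rrna_genes = total_rna - RNA_GENES["tRNA"]     # 4 rRNA genes


def percent_of_protein_coding(count: int) -> float:
    """Percentage of the protein-coding genes, rounded to two decimals."""
    return round(100 * count / PROTEIN_CODING, 2)


if __name__ == "__main__":
    print("total genes:", total_genes)
    print("putative function:", percent_of_protein_coding(2319), "%")
    print("ORFans:", percent_of_protein_coding(174), "%")
    print("hypothetical proteins:", percent_of_protein_coding(1035), "%")
```

The computed values match the reported totals (3,812 genes, 51 RNA genes, 4 rRNA genes) and percentages (61.66%, 4.63% and 27.52%).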

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Halopiger djelfamassiliensis sp. nov., which contains strain IIH2T. This archaeal strain was isolated in Algeria.

Description of Halopiger djelfamassiliensis sp. nov.
Halopiger djelfamassiliensis (dj. el. fa. ma. si. li. en'sis. L. gen. fem. n. djelfamassiliensis from the combination of Djelfa, the Algerian region where the strain was isolated, and massiliensis, of Massilia, the Latin name of Marseille, where the strain was sequenced). The strain was isolated from evaporitic sediment of the hypersaline Lake Zahrez Gharbi in the Djelfa region of Algeria. Colonies were smooth, viscous and cream-pigmented, with a diameter of 3 to 4 mm on SG medium after incubation for 7 days at 40°C. Strain IIH2T is a Gram-negative, non-motile, strictly aerobic and extremely halophilic archaeon. Growth occurs at NaCl concentrations of 15-30%, at pH values in the range 7-11, and within the temperature range 37-55°C. Optimal NaCl concentration, pH and temperature for growth are 25%, 8.0 and 40°C, respectively. Magnesium is not required for growth. Cells are polymorphic (0.9-2.2 µm) and lyse in distilled water. Tween 80, gelatin and lipids from egg yolk are hydrolysed. D-glucose, D-melibiose, L-rhamnose, D-xylose, D-galactose, D-mannose, D-ribose and D-sucrose are fermented. Cells are susceptible to bacitracin, novobiocin and tetracycline but resistant to ampicillin, cephalothin, chloramphenicol, erythromycin, gentamicin, kanamycin, nalidixic acid, penicillin G, streptomycin and vancomycin. The G+C content of the genome is 64.30%. The 16S rRNA and genome sequences are deposited in GenBank and EMBL under accession numbers KC430939 and CBMA010000001-CBMA010000055, respectively. The type strain IIH2T (= CSUR P3035 = DSM ongoing deposit) was isolated from the sediment border of the hypersaline Lake Zahrez Gharbi, located in the Djelfa region of Algeria.