Non-contiguous finished genome sequence and description of Bacillus massilioalgeriensis sp. nov.

Strain EB01T sp. nov. is the type strain of Bacillus massilioalgeriensis, a new species within the genus Bacillus. This strain, whose genome is described here, was isolated from a sediment sample of the hypersaline lake Ezzemoul sabkha in northeastern Algeria. B. massilioalgeriensis is a facultatively anaerobic, Gram-positive bacillus. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5,269,577 bp long genome contains 5,098 protein-coding and 95 RNA genes, including 12 rRNA genes.


Introduction
Bacillus algeriensis sp. nov. strain EB01T (= CSUR P857 = DSM 27334) is the type strain of B. algeriensis sp. nov. It is a new Gram-positive, facultatively anaerobic, motile, indole-negative, rod-shaped bacterium with rounded ends. It was isolated from a sediment sample from the hypersaline lake Ezzemoul sabkha in the Oum-El-Bouaghi region in northeastern Algeria, which is an important wintering and resting site for several species of waterbirds, including the Greater Flamingo. This site is one of the Ramsar convention wetlands (http://www.ramsar.org). The genus Bacillus was created by Cohn in 1872 [1], and mainly comprises Gram-positive, rod-shaped, aerobic or facultatively anaerobic, spore-forming bacteria. The genus includes 279 species and 7 subspecies with validly published names [2]. Members of the genus Bacillus are ubiquitous in nature, ranging from freshwater to marine sediments and from hot springs and desert sands to Arctic soils; many strains have been isolated from the gastrointestinal tracts of various insects and animals, from vegetation and from food [3]. Bacillus strains are of great biotechnological value because of their high capacity to produce a wide range of antimicrobial compounds, enzymes and other metabolites that can be used in industry [4,5]. Some species of Bacillus are pathogenic, such as B. anthracis (the causative agent of anthrax) [6] and B. cereus (a major cause of food poisoning) [7]. Others are opportunists in immunocompromised patients, and may also be involved in various human infections, including pneumonia, endocarditis, ocular, cutaneous, bone or central nervous system infections and bacteremia [8].
The current bacterial taxonomy is based on a combination of various phenotypic and genetic criteria [9,10]. However, the three essential genetic criteria that are used, namely 16S rRNA gene-based phylogeny [11], G+C content, and DNA-DNA hybridization [10,12], exhibit several drawbacks.
As a result of the recent decrease in the cost of genomic sequencing, it has been proposed that whole genome sequencing information and MALDI-TOF spectrum [13] be combined with the main phenotypic characteristics as a polyphasic approach strategy (taxono-genomics) to describe new bacterial taxa [14-26].
Here we present a summary classification and a set of features for B. algeriensis sp. nov. strain EB01T, together with the description of the complete genome sequence and annotation. These characteristics support the circumscription of the species B. algeriensis.

Classification and features
In July 2012, a sediment sample was aseptically collected in sterile bottles, 15 cm below the evaporite crust of the hypersaline lake Ezzemoul sabkha in the Oum-El-Bouaghi region of northeastern Algeria. Samples were transferred in a cooler (4°C) to our laboratory in Algeria and processed the same day. Sediments were diluted 1:10 (v/v) with sterile saline water (0.9% NaCl) and vigorously shaken. Tenfold serial dilutions (10⁻¹ to 10⁻⁵) of the sediment suspension were plated on Nutrient Agar (NA) medium (meat extract 1 g/l, peptone 5 g/l, yeast extract 2 g/l, sodium chloride 5 g/l, agar 15 g/l) and the plates were incubated at 30°C for 24-72 h. To obtain a pure culture, colonies were transferred to fresh NA medium. Bacillus algeriensis sp. nov. strain EB01T (Table 1) was isolated in July 2012 by cultivation under aerobic conditions at 30°C. This strain exhibited a 97.0% 16S rRNA nucleotide sequence similarity with Bacillus subterraneus type strain DSM13966T (Figure 1), the phylogenetically closest validly published Bacillus species. This value was lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [11].
Table 1 (excerpt). Altitude: 800 m; evidence code(a): IDA.
(a) Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [37]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
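The species-delineation rule applied in the text above (97.0% similarity against the 98.7% threshold [11]) can be written as a small check. The following is a minimal sketch, not the authors' procedure: the function names `percent_identity` and `below_species_threshold` are hypothetical, and the sketch assumes that the two 16S rRNA sequences have already been aligned to equal length with `-` marking gaps. Ignoring gapped columns is also an assumption; alignment tools differ on this point.

```python
SPECIES_THRESHOLD = 98.7  # % 16S rRNA similarity, Stackebrandt and Ebers [11]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment,
    so they have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gapped columns are skipped (an assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def below_species_threshold(similarity: float,
                            threshold: float = SPECIES_THRESHOLD) -> bool:
    """True when the similarity is lower than the species threshold."""
    return similarity < threshold


# Value reported in the text for strain EB01T versus B. subterraneus.
print(below_species_threshold(97.0))
```

With the reported 97.0% similarity the check returns `True`, which is the condition the text uses to support a new species without DNA-DNA hybridization.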

Figure 1.
A consensus phylogenetic tree based on 16S rRNA gene sequence comparisons, highlighting the position of Bacillus algeriensis strain EB01T relative to other type strains within the genus Bacillus. GenBank accession numbers are displayed in parentheses. Sequences were aligned using CLUSTALW, and phylogenetic inferences were made using the neighbor-joining method [38] within the MEGA 5 software [39]. Numbers above the nodes are percentages of bootstrap values from 1,000 replicates that support the node. Paenibacillus polymyxa was used as the outgroup. The scale bar represents 0.01 substitutions per nucleotide position.
Growth of the strain was tested in anaerobic and microaerophilic atmospheres using GasPak EZ Anaerobe Pouch (Becton, Dickinson and Company) and CampyGen Compact (Oxoid) systems, respectively, and in aerobic atmosphere, with or without 5% CO2. Growth was achieved under aerobic (with and without CO2) and microaerophilic conditions, but only weak growth was observed under anaerobic conditions. Gram staining showed Gram-positive rods (Figure 2). Cells grown on agar sporulate. A motility test was positive. Cell size was determined by negative staining transmission electron microscopy on a Tecnai G2 Cryo (FEI) at an operating voltage of 200 kV. The rods have a length ranging from 2.4 μm to 4.9 μm (mean 3.6 μm) and a diameter ranging from 0.7 μm to 1.1 μm (mean 0.8 μm) (Figure 3).
Strain EB01T exhibited catalase activity but oxidase activity was negative. Using the commercially available API 50CH system (BioMerieux) according to the manufacturer's instructions, weak positive reactions were observed for D-ribose, D-glucose, D-fructose, methyl α-D-glucopyranoside, N-acetylglucosamine, D-maltose, D-lactose, D-melibiose, D-saccharose, D-trehalose and D-tagatose, and for hydrolysis of starch. Other tests were negative. Using the API ZYM system (BioMerieux), positive reactions were observed for alkaline phosphatase, esterase (C4), esterase lipase (C8), leucine arylamidase, α-chymotrypsin, β-glucuronidase, α-glucosidase and N-acetyl-glucosaminidase, and a weak positive reaction was observed for acid phosphatase. The nitrate reduction and β-galactosidase reactions were also positive, but urease and indole production were negative. B. algeriensis was susceptible to amoxicillin, nitrofurantoin, erythromycin, doxycycline, rifampicin, vancomycin, gentamicin, imipenem, trimethoprim-sulfamethoxazole, ciprofloxacin, ceftriaxone and amoxicillin-clavulanic acid, but resistant to nalidixic acid.
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was performed as previously described [26,49,50]. Briefly, strain EB01T was plated on 5% sheep blood-enriched Columbia agar (BioMerieux) and incubated for 24 h at 37°C. Isolated bacterial colonies were picked and deposited as a thin film in 12 replicates on a MALDI-TOF steel target plate (Bruker Daltonics, Bremen, Germany). The plates were allowed to dry at room temperature. Each deposit was overlaid with 1.5 µl of matrix solution containing α-cyano-4-hydroxycinnamic acid (Sigma, Saint-Quentin Fallavier, France) saturated in 50% acetonitrile, 2.5% trifluoroacetic acid and high-performance liquid chromatography (HPLC)-grade water, and allowed to co-crystallize with the sample. Measurements were conducted using the Microflex LT spectrometer (Bruker Daltonics). Spectra were recorded in the linear positive ion mode over a mass range of 2 to 20 kDa. The acceleration voltage was 20 kV. Spectra were collected as a sum of 240 shots across a spot. The 12 EB01T spectra were imported into the MALDI BioTyper software (version 3.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the 6,335 bacterial reference spectra in the BioTyper database, including 210 spectra from 104 Bacillus species. The resulting score determined whether the tested isolate could be identified: a score > 2 with a validated species enabled identification at the species level; a score > 1.7 but < 2 enabled identification at the genus level; and a score < 1.7 did not enable any identification. For strain EB01T, the scores obtained ranged from 1.15 to 1.60, thus suggesting that our isolate was a new species. We added the spectrum from strain EB01T (Figure 4) to our database. Spectrum differences with other Bacillus species are shown in Figure 5.
Standards in Genomic Sciences
Figure 5. Gel view comparing the spectrum of strain EB01T with those of other Bacillus species (B. weihenstephanensis, B. vallismortis, B. thuringiensis, B. thioparans, B. subtilis subsp. subtilis, B. subterraneus, B. shackletonii, B. novalis, B. niacini, B. nealsonii, B. jeotgali, B. flexus, B. circulans, B. bataviensis and B. asahii). The Gel View displays the raw spectra of all loaded spectrum files as a pseudo-electrophoretic gel. The x-axis records the m/z value. The left y-axis displays the running spectrum number originating from subsequent spectra loading. The peak intensity is expressed by a grey scale scheme code. The grey scale bar on the right y-axis indicates the relation between the shade of grey a peak is displayed with and the peak intensity in arbitrary units.
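The score interpretation described in the MALDI-TOF paragraph can be summarized as a short decision rule. This is a minimal sketch under stated assumptions: the function name `interpret_biotyper_score` is hypothetical, it is not part of the BioTyper software, and the text does not say how scores of exactly 2 or exactly 1.7 are handled, so the sketch assigns them to the lower category.

```python
def interpret_biotyper_score(score: float) -> str:
    """Map a pattern-matching score to the identification level from the text.

    Assumption: boundary values (exactly 2.0 or 1.7) fall into the lower
    category, because the text only specifies strict inequalities.
    """
    if score > 2.0:
        return "species-level identification"
    if score > 1.7:
        return "genus-level identification"
    return "no identification"


# Only the lowest and highest scores for strain EB01T are reported in the text.
reported_range = (1.15, 1.60)
for s in reported_range:
    print(s, "->", interpret_biotyper_score(s))
```

Both ends of the reported range fall below 1.7, which is why none of the 12 replicates could be matched to a species or genus entry in the reference database.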
Table 3 shows the project information and its association with MIGS version 2.0 compliance [51].
[...] [56] and BLASTn against the GenBank database, whereas the tRNAscan-SE tool [57] was used to find tRNA genes. Transmembrane helices and lipoprotein signal peptides were predicted using the Phobius web server [58]. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Artemis [59] was used for data management and DNAPlotter [60] was used for visualization of genomic features. To estimate the mean level of nucleotide sequence similarity at the genome level between B. algeriensis sp. nov. strain EB01T and seven other Bacillus species, we used the Average Genomic Identity of Orthologous gene Sequences (AGIOS) in-house software. Briefly, this software uses Proteinortho [61] for pairwise comparison and detection of orthologous proteins between genomes, then retrieves the corresponding genes and determines the mean percentage of nucleotide sequence identity among orthologous ORFs using the Needleman-Wunsch global alignment algorithm.
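The AGIOS calculation outlined above (mean nucleotide identity over orthologous gene pairs, each aligned globally) can be illustrated as follows. This is a rough sketch and not the in-house software: the function names, the scoring values (match +1, mismatch -1, gap -1) and the choice to divide matches by the number of alignment columns are all assumptions, and ortholog detection with Proteinortho is not reproduced, so the ortholog pairs are taken as given.

```python
def needleman_wunsch_identity(a: str, b: str,
                              match: int = 1,
                              mismatch: int = -1,
                              gap: int = -1) -> float:
    """Percent identity of one optimal Needleman-Wunsch global alignment.

    Identity = matching columns / all alignment columns (gaps included).
    Scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting matches and columns.
    i, j = n, m
    matches = 0
    columns = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            if a[i - 1] == b[j - 1]:
                matches += 1
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


def agios(ortholog_pairs) -> float:
    """Mean percent identity over (gene_a, gene_b) ortholog pairs."""
    identities = [needleman_wunsch_identity(x, y) for x, y in ortholog_pairs]
    if not identities:
        raise ValueError("at least one ortholog pair is required")
    return sum(identities) / len(identities)


# Toy sequences, invented purely for illustration.
print(agios([("ACGT", "ACGT"), ("ACGT", "ACGA")]))
```

The dynamic-programming table is quadratic in sequence length, which is acceptable for a sketch on short toy sequences; real gene sets would normally be aligned with an optimized library.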

Genome properties
The genome is 5,269,577 bp long with a G+C content of 42.22% (Figure 6 and Table 4). It is composed of 46 contigs. Of the 5,193 predicted genes, 5,098 were protein-coding genes and 95 were RNA genes (10 5S rRNA genes, one 16S rRNA gene, one 23S rRNA gene and 83 tRNA genes). A total of 3,217 genes (63.1%) were assigned a putative function (by COGs or by NR BLAST). A total of 457 genes (8.96%) were identified as ORFans. The remaining genes were annotated as hypothetical proteins (1,097 genes, 21.52%). The distribution of genes into COGs functional categories is presented in Table 5. The properties and statistics of the genome are summarized in Tables 4 and 5.
Table 4 (excerpt). Genes with transmembrane helices: 1,297 (25.44% of total(a)).
(a) The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
[...] (Table 7), thus confirming its new species status.
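The gene counts and percentages reported in the genome properties above can be cross-checked with simple arithmetic. The sketch below only uses figures stated in the text; the variable and function names are hypothetical, and it assumes that each percentage is taken relative to the 5,098 protein-coding genes, as the Table 4 footnote indicates.

```python
PROTEIN_CODING = 5098
RNA_GENES = {"5S rRNA": 10, "16S rRNA": 1, "23S rRNA": 1, "tRNA": 83}

# Counts reported in the text for subsets of the protein-coding genes.
REPORTED_COUNTS = {
    "assigned a putative function": 3217,
    "ORFans": 457,
    "hypothetical proteins": 1097,
    "with transmembrane helices": 1297,
}


def percent_of_protein_coding(count: int, total: int = PROTEIN_CODING) -> float:
    """Share of protein-coding genes, rounded to two decimals."""
    return round(100.0 * count / total, 2)


total_rna = sum(RNA_GENES.values())
print("RNA genes:", total_rna)
print("predicted genes:", PROTEIN_CODING + total_rna)
for label, count in REPORTED_COUNTS.items():
    print(label, percent_of_protein_coding(count))
```

Under that assumption the recomputed values agree with the reported ones (63.1%, 8.96%, 21.52% and 25.44%), and the RNA gene counts sum to the stated 95, giving 5,193 predicted genes in total.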

Conclusion
On the basis of phenotypic (Table 2), phylogenetic and genomic analyses (taxonogenomics) (Table 6), we formally propose the creation of Bacillus algeriensis sp. nov., which contains the strain EB01T. This strain was isolated from a hypersaline lacustrine sediment sample collected in Algeria.

Description of Bacillus algeriensis sp. nov.
Bacillus algeriensis (al.ge.ri.en'sis. N.L. masc. adj. algeriensis, of or pertaining to Algeria). Strain EB01T is a facultatively anaerobic, Gram-positive, endospore-forming, motile and rod-shaped bacterium with rounded ends. Growth is achieved aerobically between 30 and 55°C (optimum 37°C), at NaCl concentrations between 0% and 2.5% and at pH 6.5-9 (optimum pH 7). Growth is also observed in microaerophilic atmosphere, whereas only weak growth is observed under anaerobic conditions. After 24 h of growth on 5% sheep blood-enriched Columbia agar (BioMerieux) at 37°C, bacterial colonies are smooth, light yellow and 2 mm in diameter. Cells have a length ranging from 2.4 μm to 4.9 μm (mean 3.6 μm) and a diameter ranging from 0.7 μm to 1.1 μm (mean 0.8 μm). Catalase positive but oxidase negative. Using the commercially available API 50CH system (BioMerieux) according to the manufacturer's instructions, weak positive reactions are observed for D-ribose, D-glucose, D-fructose, methyl α-D-glucopyranoside, N-acetylglucosamine, D-maltose, D-lactose, D-melibiose, D-saccharose, D-trehalose and D-tagatose, and for hydrolysis of starch. Other tests are negative. Using the API ZYM system (BioMerieux), positive reactions are observed for alkaline phosphatase, esterase (C4), esterase lipase (C8), leucine arylamidase, α-chymotrypsin, β-glucuronidase, α-glucosidase and N-acetyl-glucosaminidase, and a weak positive reaction is observed for acid phosphatase. The nitrate reduction and β-galactosidase reactions are also positive, but urease and indole production are negative. B. algeriensis is susceptible to amoxicillin, nitrofurantoin, erythromycin, doxycycline, rifampicin, vancomycin, gentamicin, imipenem, trimethoprim-sulfamethoxazole, ciprofloxacin, ceftriaxone and amoxicillin-clavulanic acid, but resistant to nalidixic acid. The G+C content of the genome is 42.22%.
The 16S rRNA and genome sequences are deposited in GenBank under accession number HG315679 and in the EMBL database under accession number ERP003483, respectively. The type strain EB01T (= CSUR P857 = DSM 27334) was isolated from a sediment sample of the hypersaline lake Ezzemoul sabkha in the Oum-El-Bouaghi region of northeastern Algeria.