Non contiguous-finished genome sequence and description of Cellulomonas massiliensis sp. nov.

Cellulomonas massiliensis strain JC225T sp. nov. is the type strain of Cellulomonas massiliensis sp. nov., a new species within the genus Cellulomonas. This strain, whose genome is described here, was isolated from the fecal flora of a healthy Senegalese patient. C. massiliensis is an aerobic rod-shaped bacterium. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,407,283 bp long genome contains 3,083 protein-coding and 48 RNA genes.


Introduction
Cellulomonas massiliensis strain JC225T (= CSUR P160 = DSM 25695) is the type strain of C. massiliensis sp. nov. This bacterium is a motile, Gram-positive, aerobic, indole-negative rod that was isolated from the stool of a healthy Senegalese patient as part of a culturomics study aiming at cultivating all bacterial species within human feces [1]. The current approach to the classification of prokaryotes, known as polyphasic taxonomy, relies on a combination of phenotypic and genotypic characteristics [2]. However, as more than 3,000 bacterial genomes have been sequenced [3], and proteomic information is becoming more readily accessible [4], we recently proposed that genomic information should be integrated in the description of new bacterial species [5][6][7][8][9][10][11]. The genus Cellulomonas was created in 1923 to reclassify several bacteria previously classified as Bacillus species [12]. To date, this genus comprises 19 species [13][14][15][16][17][18][19][20][21][22][23][24]. The two species most closely related phylogenetically to C. massiliensis are C. composti [17] and C. persica [21]. Most of these species were originally isolated from environmental samples, notably from habitats enriched in cellulose, such as soil or sugar fields, and occasionally from the rumen and activated sludge. Rare cases of human endocarditis [25], osteomyelitis [25], endophthalmitis [26] and cholecystitis [27] caused by Cellulomonas species have been reported. To date, members of the genus Cellulomonas have not been described in the normal fecal flora. Here we present a summary classification and a set of features for C. massiliensis sp. nov. strain JC225T together with the description of the complete genomic sequencing and annotation. These characteristics support the circumscription of the species C. massiliensis.

Classification and features
A stool sample was collected from a healthy 16-year-old male Senegalese volunteer patient living in Dielmo (a rural village in the Guinean-Sudanian zone in Senegal), who was included in a research protocol. Written assent was obtained from this individual; no written consent was needed from his guardians for this study because he was older than 15 years (in accordance with the previous project approved by the Ministry of Health of Senegal and the assembled village population and as published elsewhere [28]). Both this study and the assent procedure were approved by the National Ethics Committee of Senegal (CNERS) and the Ethics Committee of the Institut Fédératif de Recherche IFR48, Faculty of Medicine, Marseille, France (agreement numbers 09-022 and 11-017). Several other new bacterial species were isolated from this specimen using various culture conditions, including the recently described Anaerococcus senegalensis, Bacillus timonensis, Alistipes senegalensis, Alistipes timonensis, Clostridium senegalense, Paenibacillus senegalensis and Peptoniphilus timonensis [5][6][7][8][9][10][11], thus suggesting that the human digestive flora is far from being fully known.
The fecal specimen was preserved at -80°C after collection and sent to Marseille. Strain JC225 (Table 1) was isolated in May 2011 by passive filtration of the stool and aerobic incubation on Brain Heart Infusion agar at 37°C. This strain exhibited a 16S rRNA nucleotide sequence similarity of 98.3% with Cellulomonas composti (Kang et al. 2007), the phylogenetically closest validated Cellulomonas species (Figure 1), which was cultivated from cattle farm compost [17]. This value was lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [39]. In a comparison against the GenBank database [40], strain JC225T also exhibited a nucleotide sequence similarity greater than 99.5% with Cellulomonas sp. strain 3335BRRJ, isolated from clean room environments (GenBank accession number FJ200382). This bacterium is most likely classified within the same species as strain JC225T (Figure 1).
Table 1 note: [...], not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [38]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
Figure 1 caption: GenBank accession numbers are indicated in parentheses. Sequences were aligned using CLUSTALW, and phylogenetic inferences were obtained using the maximum-likelihood method within the MEGA software. Numbers at the nodes are bootstrap values obtained by repeating the analysis 500 times to generate a majority consensus tree. The scale bar indicates a 1% nucleotide sequence divergence.
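The 16S rRNA comparison described in this section reduces to a simple threshold check. The Python sketch below is illustrative only: the function and variable names are hypothetical, and the similarity value is copied from the text rather than recomputed from an alignment.

```python
# 16S rRNA gene sequence similarity threshold (percent) recommended by
# Stackebrandt and Ebers, as cited in the text.
SPECIES_THRESHOLD_PCT = 98.7


def below_species_threshold(similarity_pct, threshold=SPECIES_THRESHOLD_PCT):
    """Return True when the 16S rRNA similarity is below the threshold,
    i.e. when a new species may be delineated without DNA-DNA hybridization."""
    return similarity_pct < threshold


# Value reported in the text for the closest validated species.
reported = {"Cellulomonas composti": 98.3}

for species, pct in reported.items():
    verdict = "below" if below_species_threshold(pct) else "at or above"
    print(f"{species}: {pct}% is {verdict} the {SPECIES_THRESHOLD_PCT}% threshold")
```

The check is deliberately strict (`<`), matching the statement that the observed value was lower than the threshold.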
Different growth temperatures (25, 30, 37, 45°C) were tested; no growth occurred at 25°C or 45°C, growth occurred between 30 and 37°C, and optimal growth was observed at 37°C. Colonies were transparent and smooth with a diameter of 1 mm on blood-enriched Columbia agar and Brain Heart Infusion (BHI) agar. Growth of the strain was tested under anaerobic and microaerophilic conditions using GENbag anaer and GENbag microaer systems, respectively (BioMérieux), and in the presence of air, with or without 5% CO2. Optimal growth was achieved aerobically. Weak growth was observed under microaerophilic conditions and with 5% CO2. No growth was observed under anaerobic conditions. Gram staining showed Gram-positive rods (Figure 2). A motility test was positive. In electron microscopy, cells grown on agar had a diameter ranging from 0.37 to 0.60 µm (mean, 0.48 µm) and a length ranging from 0.55 to 1.4 µm (mean, 0.95 µm) (Figure 3).
Strain JC225T exhibited catalase and oxidase activities. Using the API 20 NE system (BioMérieux), a positive reaction was obtained for aesculin hydrolysis and β-galactosidase. Negative reactions were obtained for nitrate reduction, indole production, glucose fermentation, arginine dihydrolase, urease, gelatin hydrolysis, and glucose, arabinose, mannose, mannitol, N-acetylglucosamine, maltose, gluconate, caprate, adipate, malate, citrate, and phenyl-acetate assimilation. C. massiliensis is susceptible to amoxicillin, imipenem, gentamicin, and ciprofloxacin but resistant to trimethoprim/sulfamethoxazole and metronidazole. Compared with C. composti [17], C. massiliensis differed in motility, nitrate reduction, gelatin hydrolysis, carbohydrate assimilation, and catalase activity (Table 2).
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [5,41] using a Microflex spectrometer (Bruker Daltonics, Germany). Twelve distinct deposits were made for strain JC225 from 12 isolated colonies. The 12 JC225 spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 3,769 bacteria, which were used as reference data in the BioTyper database. The database contained 11 spectra from 8 validly published Cellulomonas species, including Cellulomonas composti, the phylogenetically closest species to C. massiliensis. No significant score was obtained for strain JC225T, thus suggesting that our isolate was not a member of a known species within the Bruker database. We added the reference spectrum from strain JC225T to our database (Figure 4).

Genome sequencing information

Genome project history
The organism was selected for sequencing on the basis of its phenotypic differences, phylogenetic position and 16S rRNA similarity to other members of the genus Cellulomonas, and is part of a study of the human digestive flora aiming at isolating all bacterial species within human feces. It was the fourth genome of a Cellulomonas species and the first genome of Cellulomonas massiliensis sp. nov. The genome sequence is deposited in EMBL under accession number CAHD00000000 and consists of 250 contigs (≥200 bp). Table 3 shows the project information and its association with MIGS version 2.0 compliance [42].

Growth conditions and DNA isolation
Cellulomonas massiliensis sp. nov. JC225T (= CSUR P160 = DSM 25695) was grown aerobically on 5% sheep blood-enriched Columbia agar (BioMérieux) at 37°C. Ten Petri dishes were spread, and the growth was resuspended in 3×100 µL of G2 buffer (EZ1 DNA Tissue kit, Qiagen). A first mechanical lysis was performed using glass powder on a FastPrep-24 device (MP Biomedicals, Illkirch, France) for 2×20 seconds. DNA was then treated with 2.5 µg/µL lysozyme (30 minutes at 37°C) and extracted using a BioRobot EZ1 Advanced XL (Qiagen). The DNA was then concentrated and purified using a QIAamp kit (Qiagen). The yield and the concentration were measured using a Quant-iT PicoGreen kit (Invitrogen) on a Genios Tecan fluorometer; the concentration was 78.9 ng/µL.

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [43] with default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank database [40] and the Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAscan-SE tool [44] was used to find tRNA genes, whereas ribosomal RNAs were found using RNAmmer [45]. Transmembrane domains and signal peptides were predicted using TMHMM [46] and SignalP [47], respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have been used in previous works to define ORFans. To estimate the mean level of nucleotide sequence similarity at the genome level between C. massiliensis and both C. flavigena and C. fimi (EMBL accession numbers CP001964 and CP002666, respectively), the only two genomes available to date from validly published Cellulomonas species, we compared only the ORFs, using BLASTN with a query coverage of ≥70% and a minimum nucleotide length of 100 bp.
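The numeric cutoffs quoted above can be written down as small predicates. This Python sketch is a hedged illustration, not the pipeline actually used: all function and parameter names are hypothetical, the text does not say which cutoff applies at exactly 80 amino acids (the sketch assumes the stricter 1e-05), and it does not say whether the 100 bp minimum refers to the ORF or to the aligned region (the sketch takes a generic length).

```python
def evalue_cutoff(alignment_length_aa):
    """E-value cutoff as a function of BLASTP alignment length.

    1e-03 for alignments longer than 80 aa, 1e-05 for shorter ones.
    Assumption: a length of exactly 80 aa uses the stricter 1e-05 cutoff,
    because the text leaves that case unspecified.
    """
    return 1e-3 if alignment_length_aa > 80 else 1e-5


def meets_evalue_cutoff(evalue, alignment_length_aa):
    """True when a BLASTP E-value is lower than the length-dependent cutoff."""
    return evalue < evalue_cutoff(alignment_length_aa)


def passes_blastn_filter(query_coverage_pct, length_bp):
    """BLASTN filter from the text: coverage >= 70% and length >= 100 bp."""
    return query_coverage_pct >= 70.0 and length_bp >= 100


# Hypothetical hits, for illustration only.
print(meets_evalue_cutoff(1e-4, 120))    # long alignment, 1e-03 cutoff applies
print(meets_evalue_cutoff(1e-4, 50))     # short alignment, 1e-05 cutoff applies
print(passes_blastn_filter(85.0, 450))
```

Keeping the cutoff in its own function makes the length-dependent rule easy to audit separately from whatever decision is built on top of it.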

Genome properties
The genome is 3,407,283 bp long (1 chromosome, no plasmid) with a 71.22% G+C content (Table 4 and Figure 5). It is composed of 5 scaffolds. Of the 3,131 predicted genes, 3,083 were protein-coding genes and 48 were RNAs (1 rRNA operon and 45 tRNA genes). A total of 2,184 genes (70.84%) were assigned a putative function, and 256 genes were identified as ORFans (8.30%). The remaining genes were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 5. The properties and statistics of the genome are summarized in Tables 4 and 5.
Table note: The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
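As a consistency check on the counts above, the following Python sketch recomputes the reported percentages. It assumes, as the table note suggests, that the percentages use the number of protein-coding genes as the denominator, and that the functional, ORFan and hypothetical categories do not overlap; the variable names are hypothetical.

```python
# Counts taken from the text.
protein_coding = 3083
rna_genes = 48
trna_genes = 45
with_putative_function = 2184
orfans = 256


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100.0 * part / whole, 2)


total_genes = protein_coding + rna_genes
# RNA genes not accounted for by tRNAs, attributed to the single rRNA operon.
rrna_genes = rna_genes - trna_genes
# Assumption: categories are disjoint, so the remainder is "hypothetical".
hypothetical = protein_coding - with_putative_function - orfans

print("total predicted genes:", total_genes)
print("putative function (%):", pct(with_putative_function, protein_coding))
print("ORFans (%):", pct(orfans, protein_coding))
print("rRNA genes implied:", rrna_genes)
print("hypothetical proteins implied:", hypothetical)
```

Under these assumptions the recomputed totals and percentages agree with the figures reported in the text; the implied count of hypothetical proteins is a derived value, not one stated by the source.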

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Cellulomonas massiliensis sp. nov., which contains the strain JC225T. This bacterium has been found in Senegal.
Colonies are transparent and smooth with a diameter of 1 mm on blood-enriched Columbia agar and Brain Heart Infusion (BHI) agar. Cells are rod-shaped with a diameter ranging from 0.37 to 0.60 µm (mean of 0.48 µm) and a length ranging from 0.55 to 1.4 µm (mean of 0.95 µm). Optimal growth is achieved aerobically. Weak growth is observed with 5% CO2 and under microaerophilic conditions. No growth is observed under anaerobic conditions. Growth occurs between 30 and 37°C, with optimal growth at 37°C. Cells stain Gram-positive, are non-endospore forming, and are motile. Catalase, oxidase, aesculin hydrolysis and β-galactosidase activities are present. Indole production, nitrate reduction, glucose fermentation, arginine dihydrolase, urease, gelatin hydrolysis, and glucose, arabinose, mannose, mannitol, N-acetylglucosamine, maltose, gluconate, caprate, adipate, malate, citrate, and phenyl-acetate assimilation activities are absent. Cells are susceptible to amoxicillin, imipenem, ciprofloxacin and gentamicin, but resistant to trimethoprim/sulfamethoxazole and metronidazole. The 16S rRNA and genome sequences are deposited in GenBank and EMBL under accession numbers JN657218 and CAHD00000000, respectively. The G+C content of the genome is 71.22%. The type strain JC225T (= CSUR P160 = DSM 25695) was isolated from the fecal flora of a healthy patient in Senegal.