Genome sequence and description of Aeromicrobium massiliense sp. nov.

Aeromicrobium massiliense strain JC14 T sp. nov. is the type strain of Aeromicrobium massiliense sp. nov., a new species within the genus Aeromicrobium. This strain, whose genome is described here, was isolated from the fecal microbiota of an asymptomatic patient. Aeromicrobium massiliense is an aerobic, rod-shaped, Gram-positive bacterium. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,322,119 bp long genome contains 3,296 protein-coding and 51 RNA genes.


Introduction
Aeromicrobium massiliense strain JC14 T (= CSUR P158 = DSM 25782) is the type strain of A. massiliense sp. nov. This bacterium is a motile, rod-shaped, Gram-positive, aerobic, catalase-positive bacterium that was isolated from the stool of a healthy Senegalese patient as part of a culturomics study aiming at cultivating all bacterial species within human feces [1]. Bacterial taxonomy, once reliant on "gold standards" such as DNA-DNA hybridization to delineate species and genera [2], has been significantly altered by the introduction of 16S rRNA amplification and sequencing [3]. The recent advent of high-throughput genome sequencing and proteomic analysis of bacteria adds a new source of discriminative information on which taxonomic proposals can be based [4]. We proposed that this information be incorporated into a polyphasic approach, as previously described [5], and used to describe new bacterial taxa [6,7]. The genus Aeromicrobium was created in 1991 [8] and consists of a group of related Gram-positive, aerobic, motile, rod-shaped bacteria. The genus Aeromicrobium belongs to the family Nocardioidaceae [9] within the order Actinomycetales [10]. Aeromicrobium erythreum was the first described species and is the type species of the genus Aeromicrobium [8]. In addition to this species, nine Aeromicrobium species have been validly published to date, including A. alkaliterrae [11], A. fastidiosum [12], A. panaciterrae [13], and A. ginsengisoli [14], which were isolated from soil; A. marinum [15], A. tamlense [16], A. ponti [17], and A. halocynthiae [18], which were recovered from marine environments; and A. flavum, which was isolated from the air [19]. None of these species are reported to be human pathogens.
Here we present a summary classification and a set of features for A. massiliense sp. nov. strain JC14 T together with the description of the complete genomic sequencing and annotation. These characteristics support the circumscription of the species A. massiliense.

Classification and features
A stool sample was collected from a healthy 16-year-old male Senegalese volunteer living in Dielmo (a rural village in the Guinean-Sudanian zone of Senegal), who was included in a research protocol. Written assent was obtained from this individual. No written consent was needed from his guardians for this study because he was older than 15 years (in accordance with the previous project approved by the Ministry of Health of Senegal and the assembled village population, and as published elsewhere [20]). Both this study and the assent procedure were approved by the National Ethics Committee of Senegal (CNERS) and the Ethics Committee of the Institut Fédératif de Recherche IFR48, Faculty of Medicine, Marseille, France (agreement numbers 09-022 and 11-017). [6,7,21-25]. The fecal specimen was preserved at -80°C after collection and sent to Marseille. Strain JC14 T (Table 1) was isolated in December 2010 after inoculation on sheep blood-enriched Columbia agar (BioMérieux, Marcy l'Etoile, France) in a 5% CO2 atmosphere at 37°C. The strain exhibited 16S rRNA gene sequence similarities with validated Aeromicrobium species ranging from 95.37% with A. marinum (Bruns et al. 2003) [15] to 96.58% with A. erythreum (Miller et al. 1991) [8] (Figure 1), values lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without using DNA-DNA hybridization [3].
Table 1 footnote. Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [32]. If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
Standards in Genomic Sciences
[Figure residue: Nocardioides albus (AF004988)]
Different growth temperatures (25, 30, 37, 45, and 50°C) were tested; no growth occurred at 50°C, very weak growth occurred at 45°C, and optimal growth was observed between 25 and 37°C. Colonies were light yellow and opaque with a diameter of 1 mm on 5% blood-enriched Columbia agar (BioMérieux).
Growth of the strain was tested under anaerobic and microaerophilic conditions using the GENbag anaer and GENbag microaer systems, respectively (BioMérieux), and in the presence of air, with or without 5% CO2. Optimal growth was achieved under aerobic conditions, with or without CO2, and weak growth occurred under microaerophilic conditions. No growth was observed under anaerobic conditions. A motility test was positive. Gram staining showed rod-shaped Gram-positive bacteria (Figure 2). Cells grown on agar have a mean diameter of 1.04 µm and a mean length of 1.67 µm (Figure 3).
A weak reaction was observed for β-glucuronidase.
Other tested characteristics were negative. A. massiliense is susceptible to penicillin G, amoxicillin, imipenem, and vancomycin but resistant to metronidazole. Compared with A. erythreum, strain JC14 T differed in nitrate reduction, maltose assimilation, and susceptibility to penicillin G, amoxicillin, and imipenem (Table 2) [8].
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [7,33] using a Microflex spectrometer (Bruker Daltonics, Germany). Twelve distinct deposits were made for strain JC14 T from 12 isolated colonies. The twelve JC14 T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 3,769 bacteria in the BioTyper database, which were used as reference data. The database contained no spectra from validly published Aeromicrobium species. No significant score was obtained for strain JC14 T, suggesting that our isolate was not a member of a known species within the Bruker database. However, we acknowledge that, because no other Aeromicrobium spectra are available, MALDI-TOF MS cannot yet serve as a discriminatory identification criterion for A. massiliense. We added the spectrum from strain JC14 T to our database (Figure 4).

Genome sequencing and annotation

Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the genus Aeromicrobium. It is part of a study of the human digestive microbiota aimed at isolating all bacterial species found in human feces. It was the second genome of an Aeromicrobium species and the first genome of Aeromicrobium massiliense sp. nov. The genome, which consists of 18 contigs, is deposited in EMBL under accession number CAHG00000000. Table 3 shows the project information and its association with MIGS version 2.0 compliance [34].

Growth conditions and DNA isolation
A. massiliense sp. nov. strain JC14 T (CSUR P158, DSM 25782) was grown aerobically on 5% sheep blood-enriched blood agar at 37°C. The growth from four Petri dishes was collected and resuspended in 3 × 100 µL of G2 buffer (EZ1 DNA Tissue kit, Qiagen). A first mechanical lysis was performed with glass powder on a Fastprep-24 device (Sample Preparation system; MP Biomedicals, USA) for 2 × 20 seconds. DNA was then treated with 2.5 µg/µL lysozyme (30 minutes at 37°C) and extracted through a BioRobot EZ1 Advanced XL (Qiagen). The DNA was then concentrated and purified on a QIAamp kit (Qiagen). The yield and the concentration were measured with the Quant-it Picogreen kit (Invitrogen) on a Genios Tecan fluorometer; the concentration was 90 ng/µL.

Genome sequencing and assembly
Shotgun and 3-kb paired-end sequencing strategies were used (Roche). Both libraries were pyrosequenced on the GS FLX Titanium sequencer (Roche). The shotgun library was loaded on a 1/4 region of a PicoTiterPlate (PTP; Roche, Meylan, France) and the 3-kb paired-end library on 2 × 1/8 regions. The shotgun library was constructed with 500 ng of DNA with the GS Rapid library Prep kit as described by the manufacturer (Roche). For the paired-end library, 5 µg of DNA was mechanically fragmented on a Hydroshear device (Digilab, Holliston, MA, USA) with an enrichment size of 3-4 kb. DNA fragmentation was visualized through an Agilent 2100 BioAnalyzer on a DNA labchip 7500, with an optimal size of 3.428 kb. The library was constructed according to the 454 Titanium paired-end protocol (Roche). Circularization and nebulization were performed and generated a pattern with an optimum at 367 bp. After PCR amplification through 15 cycles followed by double size selection, the single-stranded paired-end library was quantified using the Quant-it Ribogreen kit (Invitrogen) on a Genios Tecan fluorometer at 175 pg/µL. The library concentration equivalence was calculated at 8.75E+08 molecules/µL. The libraries were stored at -20°C until further use. The shotgun library was clonally amplified with 3 copies per bead (cpb) in 4 emPCR reactions, and the 3-kb paired-end library was amplified with 0.5, 0.75, and 1 cpb in 2 emPCR reactions with the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The yield of the shotgun emPCR reactions was 13%, and the yields of the paired-end emPCRs were 9.1, 10.8, and 9.5% for the 0.5, 0.75, and 1 cpb conditions, respectively, all within the 5 to 20% range specified in the Roche procedure.
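The library concentration equivalence reported above can be reproduced approximately with a short calculation. The sketch below is illustrative only: it assumes that the 367-bp nebulization optimum is the mean library fragment length and that one single-stranded nucleotide has an average molar mass of 328.3 g/mol. Both are assumptions of this example and are not stated in the text; the function name is hypothetical.

```python
AVOGADRO = 6.022e23        # molecules per mole
NT_MASS_G_PER_MOL = 328.3  # ASSUMED average molar mass of one ssDNA nucleotide


def library_molecules_per_ul(conc_pg_per_ul: float, mean_length_nt: float) -> float:
    """Convert a single-stranded library concentration (pg/µL) to molecules/µL."""
    grams_per_ul = conc_pg_per_ul * 1e-12
    grams_per_mole = mean_length_nt * NT_MASS_G_PER_MOL
    return grams_per_ul / grams_per_mole * AVOGADRO


# 175 pg/µL is the reported Ribogreen reading; 367 nt is the ASSUMED mean length.
molecules = library_molecules_per_ul(175, 367)
print(f"{molecules:.2e} molecules/µL")
```

Under these assumptions the result agrees with the reported 8.75E+08 molecules/µL to three significant figures.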
Approximately 790,000 and 340,000 beads for the shotgun and paired-end libraries, respectively, were loaded on the GS Titanium PicoTiterPlate PTP Kit 70×75 and sequenced with the GS FLX Titanium Sequencing Kit XLR70 (Roche). The runs were performed overnight and then analyzed on the cluster through the gsRunBrowser and Newbler Assembler (Roche). A total of 322,810 and 108,529 passed-filter wells were obtained for the shotgun and paired-end strategies, respectively, and generated 122.9 and 33.2 Mb with average read lengths of 381 and 306 bp, respectively. The passed-filter sequences were assembled using Newbler with 90% identity and a 40-bp overlap. The final assembly identified 18 contigs (>1,500 bp) arranged into 5 scaffolds and generated a genome size of 3.32 Mb.
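As a rough cross-check of the sequencing figures, the raw throughput can be compared with the product of read count and mean read length, and an approximate sequencing depth can be derived from the genome size. The sketch below is only an estimate under stated assumptions: the depth it prints is raw passed-filter bases divided by the 3,322,119-bp genome size, a value that is not reported in the text and that ignores reads discarded during assembly.

```python
# Values taken from the text; the dictionary keys are labels chosen for this example.
read_counts = {"shotgun": 322_810, "paired_end": 108_529}
mean_read_length_bp = {"shotgun": 381, "paired_end": 306}
reported_mb = {"shotgun": 122.9, "paired_end": 33.2}
GENOME_SIZE_MB = 3.322119

# Throughput implied by read count x mean read length, in Mb.
implied_mb = {
    lib: read_counts[lib] * mean_read_length_bp[lib] / 1e6 for lib in read_counts
}

# ASSUMPTION: depth is approximated as total raw bases / genome size.
depth = sum(reported_mb.values()) / GENOME_SIZE_MB

for lib in read_counts:
    print(f"{lib}: implied {implied_mb[lib]:.1f} Mb, reported {reported_mb[lib]} Mb")
print(f"approximate raw depth: {depth:.0f}x")
```

The implied and reported throughputs agree to within rounding of the mean read lengths, and the combined raw data correspond to a depth of roughly 47×.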

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [35] with default parameters; predicted ORFs that spanned a sequencing gap region were excluded. The predicted bacterial protein sequences were searched against the GenBank database [36] and the Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAscan-SE tool [37] was used to find tRNA genes, whereas ribosomal RNAs were found using RNAmmer [38]. Transmembrane domains and signal peptides were predicted using TMHMM [39] and SignalP [40], respectively. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment lengths greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have been used in previous works to define ORFans.
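The length-dependent E-value thresholds described above can be sketched as a small decision rule. This is an interpretation, not the authors' code: the wording leaves the direction of the test open, and the sketch assumes the common reading in which a BLASTP hit counts as significant when its E-value is below the cutoff for its alignment length, and an ORF is called an ORFan when it has no significant hit. Alignments of exactly 80 amino acids are not covered by the text and are assumed here to take the stricter cutoff. All function names are hypothetical.

```python
from typing import Iterable, Tuple

LENGTH_CUTOFF_AA = 80


def evalue_cutoff(alignment_length_aa: int) -> float:
    """Return the E-value cutoff for a given alignment length (amino acids)."""
    # ASSUMPTION: a length of exactly 80 aa is grouped with the shorter alignments.
    return 1e-3 if alignment_length_aa > LENGTH_CUTOFF_AA else 1e-5


def is_significant_hit(evalue: float, alignment_length_aa: int) -> bool:
    """A hit is significant when its E-value is below the length-specific cutoff."""
    return evalue < evalue_cutoff(alignment_length_aa)


def is_orfan(hits: Iterable[Tuple[float, int]]) -> bool:
    """ASSUMED reading: an ORF is an ORFan when none of its hits is significant.

    `hits` holds (evalue, alignment_length_aa) pairs for one predicted protein.
    """
    return not any(is_significant_hit(e, length) for e, length in hits)


print(is_orfan([]))             # no hit at all
print(is_orfan([(1e-4, 120)]))  # long alignment, below the 1e-03 cutoff
print(is_orfan([(1e-4, 60)]))   # short alignment, above the 1e-05 cutoff
```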
To estimate the mean level of nucleotide sequence similarity at the genome level between A. massiliense and A. marinum (GenBank accession number ACLF00000000), the only available Aeromicrobium genome to date, we compared only the ORFs, using BLASTN with a query coverage of ≥ 70% and a minimum nucleotide length of 100 bp.
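The filtering step of this comparison can be illustrated as follows. The sketch assumes that the 100-bp minimum applies to the alignment length, that query coverage is approximated as alignment length divided by ORF length, and that the mean similarity is the unweighted mean percent identity of the retained hits; none of these details is specified in the text. The `Hit` record and the example values are invented for illustration and are not results from this study.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Hit:
    """Best BLASTN hit for one ORF (hypothetical record layout)."""
    query_length: int        # ORF length in bp
    alignment_length: int    # aligned length in bp
    percent_identity: float


def passes_filters(hit: Hit, min_coverage: float = 0.70, min_length: int = 100) -> bool:
    """Keep hits with query coverage >= 70% and alignment length >= 100 bp."""
    coverage = hit.alignment_length / hit.query_length
    return coverage >= min_coverage and hit.alignment_length >= min_length


def mean_identity(hits: List[Hit]) -> Optional[float]:
    """Unweighted mean percent identity of retained hits, or None if none pass."""
    kept = [h.percent_identity for h in hits if passes_filters(h)]
    return mean(kept) if kept else None


# Made-up hits, for illustration only.
example_hits = [
    Hit(900, 810, 82.0),  # kept: 90% coverage
    Hit(900, 300, 95.0),  # dropped: 33% coverage
    Hit(120, 90, 88.0),   # dropped: alignment shorter than 100 bp
    Hit(300, 240, 78.0),  # kept: 80% coverage
]
print(mean_identity(example_hits))
```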

Genome properties
The genome is 3,322,119 bp long (1 chromosome, but no plasmid) with a 72.49% G+C content (Table 4 and Figure 5). Of the 3,347 predicted genes, 3,296 were protein-coding genes, and 51 were RNAs (1 rRNA operon, 2 additional 5S rRNAs, and 46 tRNAs). A total of 2,358 genes (71.54%) were assigned a putative function. In addition, 292 genes were identified as ORFans (8.86%). The remaining genes were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 5. The properties and the statistics of the genome are summarized in Tables 4 and 5.
Table 4 footnote a: The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.
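The gene counts and percentages reported in this section can be checked for internal consistency with a few lines of arithmetic. The sketch assumes that the single rRNA operon contributes three rRNA genes (16S, 23S, and 5S), an assumption that is consistent with the stated total of 51 RNA genes, and that percentages are taken relative to the number of protein-coding genes, as the table footnote indicates.

```python
protein_coding = 3296
rna_genes = {
    "rRNA operon (ASSUMED 16S + 23S + 5S)": 3,
    "additional 5S rRNA": 2,
    "tRNA": 46,
}
total_rna = sum(rna_genes.values())
total_genes = protein_coding + total_rna


def percent_of_protein_coding(count: int) -> float:
    """Percentage of protein-coding genes, rounded to two decimals."""
    return round(100 * count / protein_coding, 2)


print(total_rna, total_genes)
print(percent_of_protein_coding(2358))  # genes with a putative function
print(percent_of_protein_coding(292))   # ORFans
```

With these inputs the totals come to 51 RNA genes and 3,347 genes overall, and the two percentages match the reported 71.54% and 8.86%.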

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Aeromicrobium massiliense sp. nov., which contains the strain JC14 T. This bacterium has been detected in Senegal. Colonies are light yellow and opaque with a diameter of 1 mm on blood-enriched Columbia agar. Cells are rod-shaped with a mean diameter of 1.04 µm. Optimal growth is achieved aerobically, with or without CO2; weak growth occurs under microaerophilic conditions. No growth is observed under anaerobic conditions. The optimal growth temperature range is 25 to 37°C. Cells stain Gram-positive, are non-endospore forming, and motile. Catalase, nitrate reduction, aesculin and gelatin hydrolysis, glucose fermentation, β-galactosidase, maltose and gluconate assimilation, α-glucosidase, β-glucosidase, β-glucuronidase, trypsin, leucine arylamidase, esterase lipase, and esterase activities are present. Oxidase activity is absent. Cells are susceptible to penicillin G, amoxicillin, imipenem, and vancomycin but resistant to metronidazole. The G+C content of the genome is 72.49%. The 16S rRNA and genome sequences are deposited in EMBL under accession numbers JF824798 and CAHG00000000, respectively. The type strain JC14 T (= CSUR P158 = DSM 25782) was isolated from the fecal microbiota of a healthy patient in Senegal.