Non-contiguous finished genome sequence and description of Herbaspirillum massiliense sp. nov.

Herbaspirillum massiliense strain JC206T sp. nov. is the type strain of H. massiliense sp. nov., a new species within the genus Herbaspirillum. This strain, whose genome is described here, was isolated from the fecal flora of a healthy Senegalese patient. H. massiliense is an aerobic rod. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,186,486 bp long genome (one chromosome but no plasmid) contains 3,847 protein-coding and 54 RNA genes, including 3 rRNA genes.


Classification and features
A stool sample was collected from a healthy 16-year-old male Senegalese volunteer patient living in Dielmo (a rural village in the Guinean-Sudanian zone in Senegal), who was included in a research protocol. Written assent was obtained from this individual; no written consent was needed from his guardians for this study because he was older than 15 years (in accordance with the previous project approved by the Ministry of Health of Senegal and the assembled village population, and as published elsewhere [20]). Both this study and the assent procedure were approved by the National Ethics Committee of Senegal (CNERS) and the Ethics Committee of the Institut Fédératif de Recherche IFR48, Faculty of Medicine, Marseille, France (agreement numbers 09-022 and 11-017). Several other new bacterial species were isolated from this specimen using various culture conditions [3,4]. The fecal specimen was preserved at -80°C after collection and sent to Marseille. Strain JC206T (Table 1) was isolated in June 2011 after passive filtration of the stool sample to select motile species, using a companion plate, cell culture inserts with 0.4 μm-pore membranes (Becton Dickinson, Heidelberg, Germany) and Leptospira broth (BioMérieux, Marcy l'Etoile, France). Subsequently, we cultivated strain JC206T on 5% sheep blood agar in an aerobic atmosphere at 37°C. This strain exhibited a 96.7% 16S rDNA nucleotide sequence similarity with H. aurantiacum (Carro et al. 2012), the phylogenetically closest validly published Herbaspirillum species (Figure 1), which was cultivated from volcanic soil in the Canary Islands. This value was lower than the 98.7% 16S rRNA gene sequence threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization [29].
Table 1 footnote: [...] not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence. These evidence codes are from the Gene Ontology project [28].
If the evidence is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
Different growth temperatures (25, 30, 37 and 45°C) were tested. No growth occurred at 25°C or 45°C; growth occurred at 30°C and 37°C, and was optimal at 37°C. Colonies were light brown, opaque and 0.5 mm in diameter on blood-enriched Columbia agar and Brain Heart Infusion (BHI) agar. Growth of the strain was tested under anaerobic and microaerophilic conditions using GENbag anaer and GENbag microaer systems, respectively (BioMérieux), and in the presence of air and of 5% CO2. Optimal growth was obtained aerobically, with weak growth observed under microaerophilic conditions and with 5% CO2. No growth occurred under anaerobic conditions. Gram staining showed Gram-negative curved rods (Figure 2). A motility test was positive. Cells grown on agar have a mean diameter of 0.44 µm by electron microscopy and have several polar flagella (Figure 3).
Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described [30] using a Microflex spectrometer (Bruker Daltonics, Germany). Spectra were compared with the Bruker database, which contained no spectrum from Herbaspirillum species. No significant score was obtained with any other taxon. We added the spectrum from strain JC206T to our database (Figure 4).
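The species-delineation comparison reported in this section (96.7% 16S similarity against the 98.7% threshold [29]) can be illustrated with a minimal Python sketch. The function names and the identity convention used here (matches divided by aligned columns that contain no gap) are assumptions made for illustration; the alignment method and identity calculation used in the study are not specified in the text.

```python
# Minimal sketch: percent identity over a pairwise alignment and comparison
# with the 98.7% species threshold cited in the text [29].
# Assumption: gap-containing columns are excluded from the comparison.

SPECIES_THRESHOLD_PCT = 98.7  # threshold value quoted in the text


def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Return percent identity between two aligned sequences of equal length."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def below_species_threshold(similarity_pct: float,
                            threshold_pct: float = SPECIES_THRESHOLD_PCT) -> bool:
    """True when the similarity falls below the species-delineation threshold."""
    return similarity_pct < threshold_pct


if __name__ == "__main__":
    # Value reported for strain JC206T versus H. aurantiacum.
    print(below_species_threshold(96.7))
```

With the reported similarity of 96.7%, `below_species_threshold` returns `True`, which matches the argument made in the text for proposing a new species.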

Genome sequencing information
Genome project history
The organism was selected for sequencing on the basis of its phylogenetic position and 16S rRNA similarity to other members of the genus Herbaspirillum, and is part of a "culturomics" study of the human digestive flora that aims to isolate all bacterial species within human feces. It was the second genome of a Herbaspirillum species and the first genome of H. massiliense sp. nov. The genome sequence is deposited in GenBank under accession number CAHF00000000 and consists of 27 contigs. Table 2 summarizes the project information and its association with MIGS version 2.0 compliance.

Genome sequencing and assembly
Both shotgun and 3-kb paired-end sequencing were performed on a 454 GS FLX pyrosequencer. The two projects were loaded on 1/4 and 1/8 regions of a PicoTiterPlate (PTP). The shotgun library was constructed with 500 ng of DNA as recommended by the manufacturer (Roche). For paired-end sequencing, 5 µg of DNA was mechanically fragmented using the Hydroshear device (Digilab, Holliston, MA) with an enrichment size of 3-4 kb. The DNA fragmentation was visualized using the BioAnalyzer 2100 on a DNA labchip 7500 (Agilent), with an optimal size of 3.944 kb. The library was constructed according to the 454 GS FLX Titanium paired-end protocol. Circularization and nebulization were performed and generated a pattern with an optimum at 418 bp. After PCR amplification through 15 cycles followed by double size selection, the single-stranded paired-end library was quantified with the Quant-it Ribogreen kit (Invitrogen) on the Genios Tecan fluorometer at 128 pg/µL. The library concentration equivalence was calculated as 5.62 × 10^8 molecules/µL. The library was stored at -20°C until further use. The shotgun and paired-end libraries were clonally amplified with 2 and 3 copies per bead (cpb), respectively, in 2 × 8 emPCR reactions with the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The yields of the emPCR were 13.75% and 2.65% for the shotgun and paired-end strategies, respectively. Approximately 790,000 beads were loaded on the GS Titanium PicoTiterPlate PTP Kit 70×75 and sequenced with the GS FLX Titanium Sequencing Kit XLR70 (Roche). The run was performed overnight and then analyzed on the cluster through the gsRunBrowser and Newbler assembler (Roche). A total of 504,311 passed-filter wells were obtained and generated 4.69 Mb with an average length of 312 bp. The passed-filter sequences were assembled using Newbler with 90% identity and 40 bp as overlap. The final assembly identified 5 scaffolds and 27 contigs (>100 bp).
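The conversion from the measured library concentration (128 pg/µL) to a molecule count can be sketched as follows. The text does not give the formula, so this is an illustration under two stated assumptions: an average mass of 328.3 g/mol per single-stranded nucleotide, and a fragment length equal to the 418 bp optimum reported above. The function and constant names are hypothetical.

```python
# Minimal sketch: convert a single-stranded library concentration to molecules/µL.
# Assumptions (not stated in the text): 328.3 g/mol per nucleotide and a mean
# fragment length of 418 nt, taken from the reported size optimum.

AVOGADRO = 6.02214076e23      # molecules per mole
NT_MASS_G_PER_MOL = 328.3     # assumed average mass of one ssDNA nucleotide


def molecules_per_ul(conc_pg_per_ul: float, fragment_len_nt: float) -> float:
    """Return the number of library molecules per microliter."""
    grams_per_ul = conc_pg_per_ul * 1e-12                # pg -> g
    molar_mass = fragment_len_nt * NT_MASS_G_PER_MOL     # g/mol for one fragment
    return grams_per_ul / molar_mass * AVOGADRO


if __name__ == "__main__":
    n = molecules_per_ul(conc_pg_per_ul=128, fragment_len_nt=418)
    print(f"{n:.2e} molecules/µL")
```

Under these assumptions the result is close to the reported 5.62 × 10^8 molecules/µL, which suggests, but does not confirm, that a calculation of this form was used.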

Genome annotation
Open Reading Frames (ORFs) were predicted using Prodigal [31] with default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the National Center for Biotechnology Information (NCBI) non-redundant (NR) and the Clusters of Orthologous Groups (COG) databases using BLASTP. The tRNAscan-SE tool [32] was used to find tRNA genes, whereas ribosomal RNAs were found by using RNAmmer [33] and BLASTn against the NR database. ORFans were identified if their BLASTP E-value was lower than 1e-03 for alignment length greater than 80 amino acids. If alignment lengths were smaller than 80 amino acids, we used an E-value of 1e-05. Such parameter thresholds have already been used in previous studies to define ORFans.
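The length-dependent E-value thresholds described above can be expressed as a small Python sketch. It rests on an interpretation that the text does not spell out: a BLASTP hit is treated as significant when its E-value is below the cutoff for its alignment length, and an ORF with no significant hit is treated as an ORFan. The function names are hypothetical, and the handling of alignments of exactly 80 amino acids (placed in the stricter branch here) is an assumption.

```python
# Minimal sketch of the length-dependent E-value cutoff described in the text.
# Assumptions: an ORFan is an ORF without any significant BLASTP hit, and
# alignments of exactly 80 aa use the stricter 1e-05 cutoff.

from typing import Iterable, Tuple

LONG_ALN_CUTOFF = 1e-3    # E-value cutoff for alignments longer than 80 aa
SHORT_ALN_CUTOFF = 1e-5   # E-value cutoff for shorter alignments
ALN_LENGTH_SPLIT = 80     # amino acids


def significant_hit(evalue: float, aln_len_aa: int) -> bool:
    """True when a hit passes the cutoff that applies to its alignment length."""
    cutoff = LONG_ALN_CUTOFF if aln_len_aa > ALN_LENGTH_SPLIT else SHORT_ALN_CUTOFF
    return evalue < cutoff


def is_orfan(hits: Iterable[Tuple[float, int]]) -> bool:
    """hits: (E-value, alignment length in aa) pairs for one ORF."""
    return not any(significant_hit(e, length) for e, length in hits)


if __name__ == "__main__":
    print(is_orfan([(1e-4, 60)]))    # short alignment, fails the 1e-05 cutoff
    print(is_orfan([(1e-4, 120)]))   # long alignment, passes the 1e-03 cutoff
```

The same E-value of 1e-04 is therefore treated differently depending on alignment length, which is the point of using two thresholds.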

Genome properties
The genome is 4,186,486 bp long (one chromosome but no plasmid) with a 59.73% GC content (Table 3 and Figure 5). Of the 3,901 predicted genes, 3,847 were protein-coding genes and 54 were RNAs. A total of 2,924 genes (74.95%) were assigned a putative function. ORFans accounted for 312 (8.0%) of the genes. The remaining genes were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 4. The properties and statistics of the genome are summarized in Tables 3 and 4.
Table 3 footnote: The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
Table 4 footnote: The total is based on the total number of protein coding genes in the annotated genome.
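The gene counts reported in this section can be cross-checked with a few lines of Python. All counts are taken from the text; the choice of denominator is an inference, since the reported percentages (74.95% and 8.0%) match the total of 3,901 predicted genes rather than the 3,847 protein-coding genes. The variable and function names are illustrative.

```python
# Minimal sketch: consistency check of the gene counts reported in the text.
# Assumption: percentages are computed against the total number of predicted genes.

PROTEIN_CODING = 3847
RNA_GENES = 54
PUTATIVE_FUNCTION = 2924
ORFANS = 312

TOTAL_GENES = PROTEIN_CODING + RNA_GENES


def pct(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


summary = {
    "total_genes": TOTAL_GENES,
    "putative_function_pct": pct(PUTATIVE_FUNCTION, TOTAL_GENES),
    "orfan_pct": pct(ORFANS, TOTAL_GENES),
}

if __name__ == "__main__":
    for key, value in summary.items():
        print(key, value)
```

The sum of protein-coding and RNA genes equals the 3,901 predicted genes, and both computed percentages agree with the reported values to within rounding.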

Conclusion
On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Herbaspirillum massiliense sp. nov., which contains the strain JC206T. This bacterium has been found in Senegal.
Colonies are 0.5 mm in diameter on blood-enriched Columbia agar and Brain Heart Infusion (BHI) agar. Cells are rod-shaped with a mean diameter of 0.44 µm. Motile with tufts of polar flagella. Optimal growth occurs under aerobic conditions. Weak growth is observed under microaerophilic conditions and with 5% CO2. No growth is observed under anaerobic conditions. Growth occurs between 30 and 37°C, with optimal growth observed at 37°C. Cells stain Gram-negative. Catalase, oxidase and arginine dihydrolase activities, as well as esculin hydrolysis, are present. Nitrate reduction and indole production are absent. Cells are susceptible to ticarcillin, imipenem, trimethoprim/sulfamethoxazole, gentamicin, amikacin, and colimycin. The G+C content of the genome is 59.73%. The 16S rRNA and genome sequences are deposited in GenBank under accession numbers JN657219 and CAHF00000000, respectively. The type strain JC206T (= CSUR P159 = DSMZ 25712) was isolated from the fecal flora of a healthy patient in Senegal.