Non-contiguous finished genome sequence and description of Bartonella senegalensis sp. nov.

Bartonella senegalensis sp. nov. strain OS02T is the type strain of B. senegalensis sp. nov., a new species within the genus Bartonella. This strain, whose genome is described here, was isolated in Senegal from the soft tick Ornithodoros sonrai, the vector of relapsing fever. B. senegalensis is an aerobic, rod-shaped, Gram-negative bacterium. Here we describe the features of this organism, together with the complete genome sequence and its annotation. The 1,966,996 bp-long genome contains 1,710 protein-coding and 46 RNA genes, including 6 rRNA genes.


Introduction
Bartonella is the only genus of the family Bartonellaceae of Alphaproteobacteria. To date, 29 Bartonella species have been validly published [1,2], and many isolates have yet to be described. These bacteria are facultative intracellular pathogens, many of which infect erythrocytes [3]. At least 13 Bartonella species are associated with human diseases. B. bacilliformis, B. quintana and B. henselae are relatively common human pathogens and cause Carrión's disease, trench fever and cat scratch fever, respectively. Different species of Bartonella are also associated with chronic bacteremia and/or endocarditis, bacillary angiomatosis, peliosis hepatis, retinitis, uveitis, and myocarditis [4]. The epidemiological cycle of bartonellae consists of a reservoir host, which is a vertebrate with a chronic intravascular infection and sustained bacteremia, and a vector (usually a blood-sucking arthropod such as a flea, sandfly or louse) that transfers the bacteria from the reservoir to a susceptible host. Bartonella species are typically associated with a specific primary host; e.g., B. henselae is commonly found in domestic and wild felids all over the world, including Africa [5-7], whereas B. bacilliformis is human-specific. Animal hosts of bartonellae include dogs, rabbits, coyotes, foxes, cattle, deer, elk and multiple rodent species [6,8-10]. For most pathogenic bartonellae (except B. bacilliformis and B. quintana), humans are accidental (secondary) hosts [6]. In 2003, La Scola et al. proposed a multilocus sequence analysis based on 4 genes and one intergenic spacer as a tool for the description of new Bartonella species [11]. Among these genetic markers, two, i.e., gltA and rpoB, were particularly discriminatory: new Bartonella isolates are considered new species if they exhibit <96.0% and <95.4% sequence identity with other validly published species for the 327- and 825-bp fragments of the gltA and rpoB genes, respectively [2,11-13].
This strategy of combining sequences from several genes, usually housekeeping genes, is congruent with the "gold-standard" DNA-DNA reassociation for several bacterial genera [14]. In this study, we applied La Scola's criteria and determined the genome sequence as well as the main phenotypic characteristics of strain OS02T. Here, we present a summary classification and a set of features for B. senegalensis sp. nov. strain OS02T, together with the description of the complete genomic sequence and its annotation. These characteristics support the definition of the species B. senegalensis.
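The gltA and rpoB cut-offs cited above can be sketched as a simple decision rule. The snippet below is a minimal illustration, not the procedure used in this study: it assumes the two fragments have already been aligned to the same length, it reads the criterion as requiring both markers to fall below their cut-offs, and the function names are hypothetical.

```python
GLTA_CUTOFF = 96.0  # % identity, 327-bp gltA fragment (value from the text)
RPOB_CUTOFF = 95.4  # % identity, 825-bp rpoB fragment (value from the text)


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Alignment columns in which either sequence carries a gap ('-') are
    ignored. This is an illustrative convention, not taken from the paper.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def is_candidate_new_species(best_glta_identity: float,
                             best_rpob_identity: float) -> bool:
    """Apply both cut-offs to the best hit against validly published species.

    Assumption: both markers must fall below their cut-off.
    """
    return (best_glta_identity < GLTA_CUTOFF
            and best_rpob_identity < RPOB_CUTOFF)
```

In practice, the identities passed to `is_candidate_new_species` would be the highest values obtained against all validly published Bartonella species for each marker.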

Classification and features
Fifteen adult Ornithodoros sonrai soft ticks were collected in 2008 from rodent burrows in Soulkhou Thissé (a rural village in the Guinean-Sudanian zone of Senegal) as part of a prospective study on tick-borne relapsing fever in West Africa. Ticks were kept at room temperature for 40 days without feeding prior to further testing. The isolation of Bartonella strains from ticks was performed as described previously [15], and the results will be reported elsewhere. Strain OS02T (Table 1) was obtained in June 2009 from a single tick following a 7-day incubation at 37°C in a 5% CO2-enriched atmosphere on Columbia agar (BioMerieux, Marcy l'Etoile, France).
Table footnote: Evidence codes come from the Gene Ontology project [27]. If the evidence code is IDA, then the property was directly observed for a live isolate by one of the authors or an expert mentioned in the acknowledgements.
A motility test was negative. Cells grown on agar are Gram-negative and have a mean length and width of 1,254.4±329.3 nm and 533.3±100.5 nm, respectively, by electron microscopy (Figure 2). No flagella or pili were observed. Strain OS02T exhibited neither catalase nor oxidase activity. Biochemical characteristics were assessed using an Anaerobe Identification Test Panel AN MicroPlate™ (Biolog Inc., Hayward, CA, USA). None of the 95 biochemical tests available (including D-mannose, D-fructose and D-galactose) was positive. Similar profiles were previously observed for other Bartonella species [13].
Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry protein analysis was carried out as previously described [33]. Five isolated colonies of strain OS02T were deposited as individual spots on the MALDI target plate. Each smear was overlaid with 2 μL of matrix solution (a saturated solution of alpha-cyano-4-hydroxycinnamic acid in 50% acetonitrile/2.5% trifluoroacetic acid) and allowed to dry for five minutes. Measurements were performed with a Microflex spectrometer (Bruker). The five OS02T spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against the main spectra of 4,613 bacteria in the BioTyper database and of 19 Bartonella species in our own database. The identification method covered the m/z range from 3,000 to 15,000 Da. For every spectrum, a maximum of 100 peaks were taken into account and compared with the spectra in the database. For strain OS02T, the scores obtained were always below 1.5 (a score < 1.7 did not enable an identification), suggesting that our isolate was not a member of a known species. We added the spectrum from strain OS02T to the database (Figure 3).
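The peak selection and score interpretation described above can be illustrated with a short sketch. This is not the BioTyper algorithm, whose pattern matching is proprietary. The function names are hypothetical, and keeping the 100 most intense peaks is an assumption, since the text only states that at most 100 peaks per spectrum were considered.

```python
MZ_MIN, MZ_MAX = 3000.0, 15000.0  # m/z window from the text
MAX_PEAKS = 100                   # maximum peaks per spectrum from the text
ID_THRESHOLD = 1.7                # scores below this gave no identification


def select_peaks(peaks):
    """Keep peaks inside the m/z window, then at most MAX_PEAKS of them.

    `peaks` is a list of (mz, intensity) tuples. Ranking by intensity is
    an assumption made for illustration. The result is sorted by m/z.
    """
    in_window = [p for p in peaks if MZ_MIN <= p[0] <= MZ_MAX]
    strongest = sorted(in_window, key=lambda p: p[1], reverse=True)[:MAX_PEAKS]
    return sorted(strongest)


def any_spot_identified(scores):
    """True if at least one spot reaches the identification threshold."""
    return any(score >= ID_THRESHOLD for score in scores)
```

With five hypothetical spot scores that all lie below 1.5, as reported for strain OS02T, `any_spot_identified` returns `False`.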

Genome sequencing information
Genome project history
The organism was selected for sequencing on the basis of the similarity of its 16S rRNA, ITS, ftsZ, gltA and rpoB sequences to those of other members of the genus Bartonella. Nucleotide sequence similarity levels of these genes suggested that strain OS02T represents a new species in the genus Bartonella. The GenBank accession number is CALV00000000, and the entry consists of 99 contigs (≥200 bp) and 9 scaffolds (>1,500 bp). Table 2 shows the project information and its association with MIGS version 2.0 compliance.

Genome sequencing and assembly
DNA (5 μg) was mechanically fragmented on a Hydroshear device (Digilab, Holliston, MA, USA) with an enrichment size of 3-4 kb. The DNA fragmentation was visualized using the Agilent 2100 BioAnalyzer on a DNA labchip 7500, with an optimal size of 3.475 kb. The library was constructed according to the 454 GS FLX Titanium paired-end protocol. Circularization and nebulization were performed and generated a pattern with an optimum at 641 bp. After PCR amplification over 17 cycles followed by double size selection, the single-stranded paired-end library was quantified with the BioAnalyzer on an RNA pico 6000 labchip at 323 pg/μL. The library concentration equivalence was calculated as 9.24E+08 molecules/μL. The library was stored at -20°C until further use.
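The conversion from mass concentration to library molecules can be sketched as follows. The formula and the average mass of 328.3 g/mol per nucleotide of single-stranded DNA are assumptions, as the text does not state how the equivalence was computed. With the 641-bp optimum taken as the mean fragment length, the result is close to the reported 9.24E+08 molecules/μL.

```python
AVOGADRO = 6.022e23             # molecules per mole
SSDNA_G_PER_MOL_PER_NT = 328.3  # assumed average mass of one ssDNA nucleotide


def molecules_per_ul(conc_pg_per_ul: float, mean_length_nt: float) -> float:
    """Convert a single-stranded library concentration to molecules per uL."""
    grams_per_ul = conc_pg_per_ul * 1e-12
    moles_per_ul = grams_per_ul / (SSDNA_G_PER_MOL_PER_NT * mean_length_nt)
    return moles_per_ul * AVOGADRO


# 323 pg/uL at a mean length of 641 nt (both values from the text)
print(f"{molecules_per_ul(323, 641):.2e}")  # about 9.24e+08
```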
The library was clonally amplified at 1 and 1.5 copies per bead (cpb) in 4 and 3 emPCR reactions, respectively, with the GS Titanium SV emPCR Kit (Lib-L) v2 (Roche). The yields of the 1 cpb and 1.5 cpb emPCR reactions were 3.08% and 8%, respectively. After amplification, 790,000 beads from the 2 emPCR conditions were loaded on a ¼ region of the GS Titanium PicoTiterPlate PTP Kit 70×75 and sequenced with the GS FLX Titanium Sequencing Kit XLR70 (Roche). The run was analyzed on the cluster through the gsRunBrowser and Newbler assembler (Roche). A total of 200,243 passed-filter wells were obtained, generating 57.62 Mb of DNA sequence with an average read length of 287 bp. The passed-filter sequences were assembled using Newbler with requirements of 90% identity and a 40-bp overlap. The final assembly comprised 9 scaffolds and 63 large contigs (≥1,500 bp), giving a genome size of 1.98 Mb, which corresponds to 29.10× genome coverage.
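The coverage figure follows directly from the sequencing yield and the assembled genome size. The sketch below reproduces that arithmetic with hypothetical function names. The read-count estimate uses the rounded mean read length, so it only approximates the reported 57.62 Mb.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def estimated_yield(n_reads: int, mean_read_length: int) -> int:
    """Approximate total bases from read count and mean read length."""
    return n_reads * mean_read_length


# 57.62 Mb of sequence over a 1.98-Mb assembly (values from the text)
print(round(coverage(57.62e6, 1.98e6), 2))  # 29.1

# 200,243 reads at a rounded mean length of 287 bp
print(estimated_yield(200_243, 287))  # 57469741
```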

Genome annotation
Open reading frames (ORFs) were predicted using PRODIGAL [34] with default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank database [35] using BLASTP and the Clusters of Orthologous Groups (COG) database using COGNITOR [36]. The prediction of RNA genes, i.e., rRNAs, tRNAs and other RNAs, was performed using the RNAmmer [37] and ARAGORN [38] algorithms. The transmembrane helices and signal peptides were identified using TMHMM [39] and SignalP [40], respectively.
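The exclusion of ORFs that span sequencing gaps can be illustrated with a short filter. This sketch assumes that gaps appear as runs of N in the scaffold sequence and that any overlap with such a run counts as spanning it. Neither detail is specified in the text, and the function names are hypothetical. Coordinates are 0-based and half-open.

```python
import re
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, end exclusive


def gap_intervals(seq: str, min_run: int = 1) -> List[Interval]:
    """Locate runs of N (assumed to mark sequencing gaps) in a scaffold."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_run]


def spans_gap(orf: Interval, gaps: List[Interval]) -> bool:
    """True if the ORF overlaps any gap interval."""
    start, end = orf
    return any(start < g_end and g_start < end for g_start, g_end in gaps)


def filter_orfs(orfs: List[Interval], seq: str) -> List[Interval]:
    """Drop predicted ORFs that overlap a gap region of the scaffold."""
    gaps = gap_intervals(seq)
    return [orf for orf in orfs if not spans_gap(orf, gaps)]
```

For a toy scaffold such as `"ATGAAANNNNCCCTAA"`, an ORF at `(4, 12)` would be dropped, whereas ORFs at `(0, 6)` and `(10, 16)` would be kept.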

Genome properties
The genome is 1,966,996 bp long (one chromosome, no plasmids) with a 38.6% G+C content (Table 3, Figure 4). Of the 1,756 predicted genes, 1,710 were protein-coding genes and 46 were RNA genes (6 rRNA genes in 2 operons and 40 tRNA genes). A total of 997 protein-coding genes (58.3%) were assigned a putative function. The remaining genes were annotated as either hypothetical proteins or proteins of unknown function. The properties and statistics of the genome are summarized in Table 3, and the distribution of genes into COG functional categories is presented in Table 4.
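The summary statistics above are simple ratios and can be recomputed as a consistency check. The sketch below uses hypothetical function names. The G+C function ignores ambiguous bases such as N, which is an illustrative choice rather than a detail taken from the text.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T) of a sequence."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage."""
    return 100.0 * part / total


# Gene counts reported in the text
protein_coding, rna_genes = 1710, 46
print(protein_coding + rna_genes)             # 1756 predicted genes
print(round(percent(997, protein_coding), 1)) # 58.3
print(2 * 3 + 40)                             # 46 RNA genes, assuming 3 rRNA genes per operon
```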

Insights from the genome sequence
Compared to B. henselae strain Houston (GenBank accession number NC_005956), its closest phylogenetic neighbor, B. senegalensis strain OS02T has a larger genome (1,966,996 and 1,931,047 bp, respectively), more genes (1,756 and 1,491 genes, respectively) and a higher G+C content (38.6% and 38%, respectively). The protein-coding genes present in B. senegalensis but absent or split in B. henselae included those encoding a multidrug-efflux transporter, membrane protein, formate-tetrahydrofolate ligase, glycoside hydrolase family 3-like proteins, putative major facilitator superfamily proteins, SAM-dependent methyltransferases, resolvases, toxin-antitoxin modules, transposases, ubiquinol-cytochrome c reductase, LeuA2, and phage proteins, as well as several hypothetical proteins.
Table footnote (a): The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.

Conclusion
On the basis of phylogenetic and genotypic analyses, we formally propose the creation of Bartonella senegalensis sp. nov., which contains strain OS02T. This bacterium was isolated in Senegal.