Complete genome sequence of Arcobacter nitrofigilis type strain (CIT)

Arcobacter nitrofigilis (McClung et al. 1983) Vandamme et al. 1991 is the type species of the genus Arcobacter in the family Campylobacteraceae within the Epsilonproteobacteria. The species was first described in 1983 as Campylobacter nitrofigilis [1] after its detection as a free-living, nitrogen-fixing Campylobacter species associated with Spartina alterniflora Loisel roots [2]. It is of phylogenetic interest because of its lifestyle as a symbiotic organism in a marine environment, in contrast to many other Arcobacter species, which are associated with warm-blooded animals and tend to be pathogenic. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a type strain of the genus Arcobacter. The 3,192,235 bp genome with its 3,154 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.


Introduction
Strain CIT (= DSM 7299 = ATCC 33309 = CCUG 15893) is the type strain of the species Arcobacter nitrofigilis, which is the type species of the genus Arcobacter [1]. Strain CIT was isolated from roots of Spartina alterniflora Loisel (cordgrass) growing in salt marshes on the east coast of Canada. It was the first organism described from this kind of habitat that was assigned to the genus Campylobacter, an assignment based on phenotypic and biochemical traits [2]. The species epithet nitrofigilis means 'nitrogen-fixing' and is based on the outstanding characteristic of this species [3]. The new genus Arcobacter, meaning 'bow-shaped rod', was introduced in 1991, and its separation from the genus Campylobacter was based on DNA-DNA and DNA-rRNA hybridization [1]. To date, the genus Arcobacter comprises nine species, some of which are associated with warm-blooded animals whereas others are found in marine environments. Within the Campylobacteraceae several whole-genome sequences have already been deciphered: that of A. butzleri strain RM4018 [4] (not a type strain), the only sequenced member of the genus Arcobacter, as well as genomes from seven species of the genus Campylobacter and from Sulfurospirillum deleyianum [5]. Only a few additional strains belonging to the species A. nitrofigilis are known in the literature, with F2176 and F2173 [6] being the most closely related ones (99% sequence identity). The type strains of the other species of the genus Arcobacter share 93.8-94.6% 16S rRNA sequence identity with strain CIT, whereas the type strains from other genera in the family Campylobacteraceae share less than 89% sequence identity with strain CIT [7].
There are plenty of phylotypes (uncultured bacteria) known from marine environments such as the ridges flanking crustal fluids in oceanic crust (AY704399, clone FD118-51B-02, 98.6%), sea water from Ishigaki port in Japan (AB262370/-71, 96.4%), a mangrove of the Danshui river estuary of northern Taiwan (DQ234254, 95.8%) [8], coastal water in the Bohai Bay, China (FJ155005, 95.8%), Black Sea shelf sediments in Romania (AJ271655, 95.8%), or from activated sludge in New Zealand (EU104146, 95.8%). Environmental screens and marine metagenome libraries do not contain more than a handful of sequences with >93% 16S rRNA gene sequence identity, indicating a sparse representation of closely related strains in the habitats analyzed (status March 2010). Here we present a summary classification and a set of features for A. nitrofigilis strain CIT, together with the description of the complete genome sequencing and annotation.
Figure 1 shows the phylogenetic neighborhood of A. nitrofigilis strain CIT in a 16S rRNA based tree. The four 16S rRNA gene sequences in the genome differ from each other by up to two nucleotides, and differ by up to three nucleotides from the previously published 16S rRNA sequence (L14627) generated from CCUG 15893, which contains nine ambiguous base calls.
Figure 1. Phylogenetic tree highlighting the position of A. nitrofigilis strain CIT relative to the type strains of the other genera within the Epsilonproteobacteria. The tree was inferred from 1,379 aligned characters [9,10] of the 16S rRNA gene sequence under the maximum likelihood criterion [11,12] and rooted (as far as possible) in accordance with the current taxonomy [13]. The branches are scaled in terms of the expected number of substitutions per site. Numbers above branches are support values from 200 bootstrap replicates [14] if larger than 60%. Lineages with type strain genome sequencing projects registered in GOLD [15] are shown in blue, published genomes [16] in bold, e.g. the recently published GEBA genome from S. deleyianum [5].
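The identity values quoted above are percentages of matching positions between aligned 16S rRNA sequences. The source does not state how gaps or ambiguous bases were treated (the values are taken from [7]), so the following is only a minimal sketch under the assumption that columns containing a gap in either sequence are skipped. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over two pre-aligned sequences of equal length.

    Assumption (not from the paper): columns with a gap in either
    sequence are ignored rather than counted as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy example: 9 comparable columns, 8 of them identical.
identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
```

Counting gapped columns as mismatches instead would lower the reported identity, which is one reason identity values from different tools are not always directly comparable.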

Classification and features
A. nitrofigilis cells are Gram-negative, bow-shaped or curved rods of 1-3 µm length and 0.2-0.9 µm width (Figure 2 and Table 1). Motility is based on a single, polar flagellum and results in rapid corkscrew motion. Older cultures also show coccoid cells [2]. The habitat of all known A. nitrofigilis isolates is either the roots or the sediment around the roots of S. alterniflora Loisel growing in salt marshes [3]. Although no pathogenic association has been described so far, A. nitrofigilis was among five Arcobacter species that were isolated from food samples such as meat and shellfish varieties [27]. The optimum growth temperature of A. nitrofigilis is 30°C; the temperature range is 10-37°C [28]. Neither spores nor granules are present, but a brown pigment is formed from tryptophan [2]. All strains of the species show positive reactions for nitrogenase, catalase and oxidase. Growth occurs under microaerophilic conditions with oxygen as terminal electron acceptor; under anaerobic conditions fumarate or aspartate is necessary, and the presence of nitrate is detrimental [2]. Hydrogen is not necessary for growth [1]. Nitrate is reduced to nitrite and sulfide is produced from cysteine [3]. Strain CIT tested positive for urease, whereas other strains of the species do not [3]. The metabolism of A. nitrofigilis is chemoorganotrophic; organic acids and amino acids are used as carbon sources, but carbohydrates are neither oxidized nor fermented [2]. All strains of the species are halotolerant. They require a minimum of 0.5% NaCl for growth and can tolerate up to 7% NaCl [28]. A. nitrofigilis is susceptible to cephalothin and nalidixic acid but is resistant to vancomycin [3]. The G+C content of the DNA was determined by thermal denaturation to be 28.0% [3], which is slightly below the 28.4% found in the genome.

Genome sequencing and annotation

Genome project history
This organism was selected for sequencing on the basis of its phylogenetic position [29], and is part of the Genomic Encyclopedia of Bacteria and Archaea project [30]. The genome project is deposited in the Genomes OnLine Database [15] and the complete genome sequence in GenBank. Sequencing, finishing and annotation were performed by the DOE Joint Genome Institute (JGI). A summary of the project information is shown in Table 2.

Evidence codes. IDA: Inferred from Direct Assay (first time in publication); TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [26]. If the evidence code is IDA, then the property was directly observed by one of the authors or an expert mentioned in the acknowledgements.

Genome sequencing and assembly
The genome was sequenced using a combination of Illumina and 454 technologies. An Illumina GAii shotgun library with reads of 50 Mb, a 454 Titanium draft library with average read length of 243 bases, and a paired end 454 library with average insert size of 24 kb were generated for this genome. All general aspects of library construction and sequencing can be found at http://www.jgi.doe.gov/. Draft assembly was based on 138 Mb 454 standard and 454 paired end data (498,215 reads). Newbler (Roche, version 2.0.0-PostRelease-10/28/2008) parameters are -consed -a 50 -l 350 -g -m -ml 20. The initial Newbler assembly contained 42 contigs in 3 scaffolds. It was converted into a phrap assembly by making fake reads from the consensus and collecting the read pairs in the 454 paired end library. Illumina sequencing data was assembled with Velvet [32], and the consensus sequences were shredded into 1.5 kb overlapped fake reads and assembled together with the 454 data. The Phred/Phrap/Consed software package was used for sequence assembly and quality assessment in the following finishing process. After the shotgun stage, reads were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with gapResolution, Dupfinisher, or by sequencing cloned bridging PCR fragments with subcloning or transposon bombing [33]. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks (J-F. Cheng, unpublished). A total of 480 additional Sanger reactions were necessary to close gaps and to raise the quality of the finished sequence. Illumina reads were also used to improve the final consensus quality using an in-house developed tool (the Polisher). The error rate of the completed genome sequence is less than 1 in 100,000.
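Three quantities in this paragraph lend themselves to a short worked illustration: the shredding of a consensus into 1.5 kb overlapping pseudo-reads, the approximate depth implied by 138 Mb of 454 data over a 3,192,235 bp genome, and the Phred-scaled quality corresponding to an error rate of 1 in 100,000. The sketch below is not the JGI pipeline. The overlap of 500 bp is an assumption (the text gives only the read length), and the function names `shred`, `estimate_coverage` and `phred_quality` are hypothetical.

```python
import math


def shred(consensus: str, read_len: int = 1500, overlap: int = 500) -> list[str]:
    """Cut a consensus sequence into overlapping pseudo-reads.

    read_len follows the 1.5 kb stated in the text; the overlap value
    is an assumption made only for this illustration.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be non-negative and smaller than read_len")
    step = read_len - overlap
    reads = []
    for start in range(0, len(consensus), step):
        reads.append(consensus[start:start + read_len])
        if start + read_len >= len(consensus):
            break  # the last piece already reaches the end of the consensus
    return reads


def estimate_coverage(total_bases: float, genome_size: int) -> float:
    """Average sequencing depth as total sequenced bases over genome size."""
    return total_bases / genome_size


def phred_quality(error_rate: float) -> float:
    """Phred-scaled quality, Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(error_rate)


pseudo_reads = shred("ACGT" * 1000)                 # 4,000 bp toy consensus
depth_454 = estimate_coverage(138e6, 3_192_235)     # roughly 43-fold
finished_q = phred_quality(1e-5)                    # about Q50
```

With the assumed 500 bp overlap, the 4,000 bp toy consensus yields four pseudo-reads, the last one shorter than 1.5 kb. The depth figure treats the final genome size as the denominator, so it is an after-the-fact estimate rather than a value reported in the text.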

Genome annotation
Genes were identified using Prodigal [34] as part of the Oak Ridge National Laboratory genome annotation pipeline, followed by a round of manual curation using the JGI GenePRIMP pipeline [35]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and InterPro databases. Additional gene prediction analysis and functional annotation was performed within the Integrated Microbial Genomes Expert Review (IMG-ER) platform [36].
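As an illustration of the translation step mentioned above, the following minimal sketch converts a predicted CDS into a protein sequence before a database search. It assumes the bacterial genetic code (NCBI translation table 11), whose amino-acid assignments match the standard code, and it assumes that the alternative start codons GTG and TTG are read as methionine at the first position. The function name `translate_cds` is hypothetical and the annotation pipeline's own implementation is not described in the text.

```python
from itertools import product

# Codon order follows the NCBI genetic-code listing (bases in T, C, A, G order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Assumption: these start codons initiate with methionine.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate_cds(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    protein = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if i == 0 and codon in START_CODONS:
            protein.append("M")  # initiator codon is read as methionine
            continue
        aa = CODON_TABLE.get(codon, "X")  # X marks codons with ambiguous bases
        if aa == "*":
            break  # stop codon ends the protein
        protein.append(aa)
    return "".join(protein)


peptide = translate_cds("GTGAAAGCCTAA")
```

Building the codon table from the compact NCBI string avoids typing 64 entries by hand, at the cost of depending on the fixed T, C, A, G ordering.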

Genome properties
The genome is 3,192,235 bp long and comprises one main circular chromosome with an overall G+C content of 28.4% (Table 3 and Figure 3). Of the 3,224 genes predicted, 3,154 were protein-coding genes and 70 were RNA genes; 28 pseudogenes were also identified. The majority of the protein-coding genes (72.1%) were assigned a putative function, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 4.
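The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above. The derived counts (genes with a putative function, hypothetical proteins, gene density) are approximations computed here from the rounded 72.1% value and are not reported in the text, and the function name `summarize_genome` is hypothetical.

```python
GENOME_BP = 3_192_235
TOTAL_GENES = 3_224
PROTEIN_CODING = 3_154
RNA_GENES = 70
FUNCTION_ASSIGNED_PCT = 72.1  # percent of protein-coding genes, as stated


def summarize_genome() -> dict:
    """Derive approximate summary values from the stated genome figures."""
    # Counts derived from the rounded percentage are approximate.
    with_function = round(PROTEIN_CODING * FUNCTION_ASSIGNED_PCT / 100.0)
    return {
        "counts_consistent": PROTEIN_CODING + RNA_GENES == TOTAL_GENES,
        "approx_with_function": with_function,
        "approx_hypothetical": PROTEIN_CODING - with_function,
        "genes_per_mb": TOTAL_GENES / (GENOME_BP / 1e6),
    }


summary = summarize_genome()
```

The protein-coding and RNA gene counts add up to the stated total of 3,224 genes, and the density works out to roughly one gene per kilobase.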