Complete genome sequence of Arthrobacter sp. strain FB24

Arthrobacter sp. strain FB24 is a member of the genus Arthrobacter Conn and Dimmick 1947, in the family Micrococcaceae and class Actinobacteria. A number of Arthrobacter genome sequences have been completed because of the important role of this genus in soil, especially in bioremediation. This isolate is of special interest because it is tolerant to multiple metals and extremely resistant to elevated concentrations of chromate. The genome consists of a 4,698,945 bp circular chromosome and three plasmids (96,488, 115,507, and 159,536 bp), for a total of 5,070,478 bp, encoding 4,536 proteins, of which 1,257 are without known function. This genome was sequenced as part of the DOE Joint Genome Institute Program.


Introduction
Arthrobacter sp. strain FB24 was isolated from a microcosm made from soil collected at an Indiana Department of Transport facility in Seymour, Indiana. This site was of particular interest because the soils were contaminated by mixed waste, both petroleum hydrocarbons and extreme metal (chromium and lead) levels [1]. Details of microcosm enrichment and isolation procedures used to obtain the Arthrobacter strain have been described previously [2]. This isolate was of particular interest because of its extreme resistance to chromate [3,4]. This work is a part of a larger study determining the compositional and functional diversity of bacterial communities in soils exposed to long-term contamination with metals [5][6][7].

Classification and features
Arthrobacter sp. strain FB24 is a high G+C Gram-positive member of the Micrococcaceae (Figure 1, Table 1). The strain is a facultative, non-motile aerobe with characteristic morphology of rod-shaped cells (Figure 2) that become coccoid in stationary phase. Strain FB24 is able to use a number of carbon sources for growth, including glucose, fructose, lactate, succinate, malate, xylose and aromatic hydrocarbons (hydroxybenzoates, phthalate). Additionally, this Arthrobacter sp. strain is resistant to multiple metals: arsenate, arsenite, chromate, cadmium, lead, nickel, and zinc.

Genome sequencing information

Genome project history
Arthrobacter sp. strain FB24 was chosen for sequencing by DOE-JGI because of its extreme resistance to chromate. Table 2 presents the project information and its association with MIGS version 2.0 compliance [25].

Genome sequencing and assembly
The genome of Arthrobacter sp. strain FB24 was sequenced by the random shotgun method using Sanger sequencing at the DOE Joint Genome Institute (DOE-JGI). Medium (8 kb) and small (3 kb) insert random libraries were partially sequenced with an average success rate of 88% and average high-quality read lengths of 614 nucleotides. Sequences were assembled with parallel phrap (High Performance Software, LLC). Possible mis-assemblies were corrected with Dupfinisher [27] or by analysis of transposon insertions in bridge clones. Gaps between contigs were closed by editing, custom primer walking or PCR amplification. The completed genome sequence of Arthrobacter sp. FB24 contains 89,530 reads, achieving an average of 15-fold sequence coverage per base with an error rate of less than 1 in 100,000. The sequences of Arthrobacter sp. FB24 can be accessed using the GenBank accession number NC_008541 for the chromosome and NC_008537, NC_008538, and NC_008539 for the three plasmids.
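As an illustration of how the accession numbers listed above might be used, the following minimal sketch builds retrieval URLs for the NCBI E-utilities `efetch` endpoint. It only constructs the URLs and does not download anything. The function name `efetch_url`, the `ACCESSIONS` dictionary, and the choice of the GenBank flat-file format are assumptions made for this example and are not part of the original work.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accession numbers as given in the text; the grouping is illustrative.
ACCESSIONS = {
    "chromosome": ["NC_008541"],
    "plasmids": ["NC_008537", "NC_008538", "NC_008539"],
}


def efetch_url(accession: str, rettype: str = "gb") -> str:
    """Return an efetch URL for one nucleotide record (hypothetical helper).

    rettype="gb" requests a GenBank flat file; "fasta" requests sequence only.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


if __name__ == "__main__":
    for replicon, ids in ACCESSIONS.items():
        for acc in ids:
            print(replicon, efetch_url(acc))
```

The resulting URLs can be opened in a browser or passed to any HTTP client; NCBI asks automated clients to respect its published request-rate limits.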

Genome annotation
Automated gene prediction was performed by using the output of Critica [28], combined with the output of Generation and Glimmer [29]. The assignment of product descriptions was made by using search results of the following curated databases in this order: TIGRFam; PRIAM (e-30 cutoff); Pfam; Smart; COGs; Swissprot/TrEMBL (SPTR); and KEGG. If there was no significant similarity to any protein in another organism, the product was described as "hypothetical protein." "Conserved hypothetical protein" was used if at least one match was found to a hypothetical protein in another organism. EC numbering was based on searches in PRIAM at an e-10 cutoff; COG and KEGG functional classifications were based on homology searches in the respective databases. Additionally, the tRNAscan-SE tool [30] was used to find tRNA genes, whereas ribosomal RNAs were found by using BLASTn against the 16S and 23S ribosomal RNA databases. Other "standard" structural RNAs (e.g., 5S rRNA, rnpB, tmRNA, SRP RNA) were found by using covariance models with the Infernal search tool [31]. The HMMTOP program was used to predict the number of transmembrane segments (TMSs) in each protein. Those predicted to have two or more TMSs (about 918 proteins) were used to interrogate the transporter database (TCDB). Peter Karp's PathoLogic tool was used for pathway prediction [32]. This method relies largely on keyword matching and other automatic methods; some of the pathways, such as aromatic compound degradation, were manually curated. Metabolic pathways were constructed using MetaCyc as a reference data set [33].

a) The total is based on either the size of the genome in base pairs or on the total number of protein coding genes in the annotated genome.
b) Also includes 54 pseudogenes and 5 other genes.

a) The total is based on the total number of protein coding genes in the annotated genome.
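The product-naming and transporter-screening rules described in the Genome annotation section can be sketched as follows. This is a minimal illustration and not the pipeline actually used: it assumes that "in this order" means the first database with a significant hit supplies the description, and it assumes hits have already been filtered for significance. The names `DB_PRIORITY`, `assign_product`, and `tcdb_candidates`, and the dictionary-based inputs, are hypothetical.

```python
from typing import Dict, List

# Database order as listed in the text. Treating it as "first hit wins"
# is an assumption of this sketch.
DB_PRIORITY = ["TIGRFam", "PRIAM", "Pfam", "Smart", "COGs", "SPTR", "KEGG"]


def assign_product(hits: Dict[str, str], matches_hypothetical: bool = False) -> str:
    """Choose a product description for one predicted protein.

    hits maps a database name to the description of its significant hit
    (hypothetical structure; significance filtering is assumed done upstream).
    matches_hypothetical is True when the only similarity found is to a
    hypothetical protein in another organism.
    """
    for db in DB_PRIORITY:
        description = hits.get(db)
        if description:
            return description
    if matches_hypothetical:
        return "conserved hypothetical protein"
    return "hypothetical protein"


def tcdb_candidates(tms_counts: Dict[str, int], min_tms: int = 2) -> List[str]:
    """Return IDs of proteins with at least min_tms predicted TMSs.

    tms_counts maps protein ID to the number of transmembrane segments
    predicted by a tool such as HMMTOP (parsing its output is out of scope).
    """
    return sorted(pid for pid, n in tms_counts.items() if n >= min_tms)


if __name__ == "__main__":
    print(assign_product({"Pfam": "ABC transporter", "KEGG": "transport protein"}))
    print(assign_product({}, matches_hypothetical=True))
    print(tcdb_candidates({"p1": 0, "p2": 2, "p3": 5}))
```

Keeping the database order in a single list makes the precedence explicit and easy to change if a different ordering is preferred.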