High quality draft genome sequence of Staphylococcus cohnii subsp. cohnii strain hu-01

Staphylococcus cohnii subsp. cohnii belongs to the family Staphylococcaceae in the order Bacillales, class Bacilli and phylum Firmicutes. The increasing relevance of S. cohnii to human health prompted us to determine the genomic sequence of Staphylococcus cohnii subsp. cohnii strain hu-01, a multidrug-resistant isolate from a hospital in China. Here we describe the features of S. cohnii subsp. cohnii strain hu-01, together with the genome sequence and its annotation. This is the first genome sequence of the species Staphylococcus cohnii.


Introduction
Staphylococcus cohnii belongs to the coagulase-negative staphylococci. It was described by Schleifer and Kloos (1975) and named for Ferdinand Cohn, a German botanist and bacteriologist [1]. Recently, a growing number of cases of S. cohnii infection have been reported in the literature. This organism may be responsible for brain abscess, pneumonia, acute cholecystitis, endocarditis, bacteremia, urinary tract infection and septic arthritis [2]. S. cohnii comprises two subspecies that are defined on the basis of their phenotypic characteristics: Staphylococcus cohnii subsp. cohnii and Staphylococcus cohnii subsp. urealyticus [3]. S. cohnii subsp. cohnii is a Gram-positive, coagulase-negative, catalase-positive coccus that behaves as a commensal of the skin and mucous membranes [4]. It has been isolated more frequently in hospital than in non-hospital environments [2]. Here we report the draft genome of S. cohnii subsp. cohnii strain hu-01, the first genome of this species to be sequenced.

Classification and features
Strain hu-01 was isolated from a hospital environment in Zhejiang province, China, in October 2012. It is a Gram-positive, coccus-shaped bacterium that grows on Columbia agar enriched with 5% sheep blood (BioMérieux, Marcy l'Etoile, France) at 37 °C. Growth occurs under either aerobic or anaerobic conditions. The optimum temperature for growth is 37 °C, with a temperature range of 15-45 °C (Table 1). Cell morphology, motility and sporulation were examined by transmission electron microscopy (H-600, Hitachi). Cells of strain hu-01 are coccoid, 0.6 to 1.2 μm in diameter, and occur predominantly singly or in pairs (Figure 1 and Figure 2).

Standards in Genomic Sciences

Table 1 note on evidence codes: [...] (not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [32]. If the evidence code is IDA, then the property should have been directly observed, for the purpose of this specific publication, for a live isolate by one of the authors, or by an expert or reputable institution mentioned in the acknowledgements.

Table 2 presents the project information and its association with MIGS version 2.0 compliance [9].

Genome sequencing and assembly
One DNA library was generated (500 bp insert size, with Illumina adapters at both ends, checked on an Agilent 2100 DNA analyzer), and sequencing was performed on an Illumina HiSeq 2000 sequencer with a 2×100 bp paired-end strategy. A total of 1,103 Mbp of sequence data was produced and assessed for quality by the following criteria: 1) Reads linked to adapters at both ends were considered sequencing artifacts and removed. 2) Bases with a quality score lower than Q20 were trimmed from both ends. 3) Reads with ambiguous bases (N) were removed. 4) Reads that passed the filters but whose mates did not were discarded. A total of 867.94 M clean filtered reads were assembled into scaffolds using Velvet version 1.2.07 with parameters "scaffolds no" [10]. We then used the PAGIT workflow [11] to extend the initial contigs and correct sequencing errors, arriving at a set of improved scaffolds.
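The four read-filtering criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline the authors ran: the adapter string, the rule that an adapter "at both ends" means a literal match at the start and end of the read, the rejection of reads that become empty after trimming, and all function names are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Read = Tuple[str, List[int]]  # (bases, per-base Phred quality scores)

ADAPTER = "AGATCGGAAGAGC"  # hypothetical adapter prefix, for illustration only
MIN_Q = 20                 # Q20 threshold named in the text


def trim_ends(seq: str, quals: List[int], min_q: int = MIN_Q) -> Read:
    """Criterion 2: trim bases below min_q from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def clean_read(seq: str, quals: List[int], adapter: str = ADAPTER) -> Optional[Read]:
    """Return the trimmed read, or None if the read is rejected."""
    # Criterion 1: adapter at both ends is treated as a sequencing artifact
    # (assumed here to mean a literal match at the start and the end).
    if seq.startswith(adapter) and seq.endswith(adapter):
        return None
    seq, quals = trim_ends(seq, quals)
    # Criterion 3: reject reads containing ambiguous bases.
    # Assumption: reads left empty by trimming are rejected as well.
    if not seq or "N" in seq:
        return None
    return seq, quals


def clean_pair(r1: Read, r2: Read) -> Optional[Tuple[Read, Read]]:
    """Criterion 4: keep a pair only if both mates pass the filters."""
    c1 = clean_read(*r1)
    c2 = clean_read(*r2)
    if c1 is None or c2 is None:
        return None
    return c1, c2
```

Applying the criteria pair by pair keeps the two read files synchronized, which paired-end assemblers generally expect.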
To annotate predicted genes, we used HMMER version 3.0 [15] with parameters 'hmmscan -E 0.01 -domainE 0.01' to align genes against Pfam version 27.0 [16] (Pfam-A only) and identify genes with conserved domains. The KAAS server [17] was used to assign translated amino acid sequences (genetic code table 11) to KEGG Orthology with the SBH (single-directional best hit) method. Translated genes were aligned against the COG database using NCBI blastp (hits were required to have a score of at least 60 and an e-value of at most 1e-6). To find genes with hypothetical or putative function, we aligned genes against the NCBI nucleotide sequence database (nt, downloaded on 20 September 2013) using NCBI blastn, retaining a hit only if it had an identity of at least 0.95 and a coverage of at least 0.9, and the reference gene was annotated as putative or hypothetical. To identify genes encoding signal peptides, we used SignalP version 4.1 [18] with default parameters except "-t gram+". TMHMM 2.0 [19] was used to identify genes with transmembrane helices.
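The hit-acceptance thresholds described above can be written as two small predicates. This is a sketch under stated assumptions: the `Hit` record and its field names are hypothetical, the text does not say whether "score" means bit score or raw score, and coverage is assumed to be the aligned fraction of the query. It does not parse any particular BLAST output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment hit; field names are illustrative, not a BLAST format."""
    score: float        # alignment score (bit vs. raw score is not stated in the text)
    evalue: float       # expectation value
    identity: float     # fraction of identical positions, 0..1
    coverage: float     # assumed: aligned fraction of the query, 0..1
    description: str    # annotation text of the reference sequence


def passes_cog_filter(hit: Hit, min_score: float = 60.0, max_evalue: float = 1e-6) -> bool:
    """blastp vs. COG: score no less than 60 and e-value no more than 1e-6."""
    return hit.score >= min_score and hit.evalue <= max_evalue


def passes_nt_filter(hit: Hit, min_identity: float = 0.95, min_coverage: float = 0.9) -> bool:
    """blastn vs. nt: identity >= 0.95, coverage >= 0.9, and the reference
    is annotated as putative or hypothetical."""
    desc = hit.description.lower()
    annotated = "putative" in desc or "hypothetical" in desc
    return hit.identity >= min_identity and hit.coverage >= min_coverage and annotated
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the counts of annotated genes are to the chosen cut-offs.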

Genome properties
The draft genome sequence of S. cohnii subsp. cohnii strain hu-01 revealed a genome size of 5,761,489 bp and a G+C content of 34.85% (521 scaffolds with an N50 of 39,926 bp). These scaffolds contain 5,820 coding sequences (CDSs), 61 tRNAs (excluding 6 pseudo-tRNAs) and incomplete rRNA operons (10 small subunit rRNAs and 3 large subunit rRNAs). A total of 1,840 protein-coding genes were annotated as having putative function or as hypothetical proteins, and 3,734 genes were categorized into COG functional groups. The properties and statistics of the genome are summarized in Table 3 and Table 4.

Table 3 note: The total is based on either the size of the genome in base pairs or the total number of genes in the annotated genome.

Table 4 notes: a) For some genes, qualified alignments can occur with several genes belonging to different COG categories. In such cases only the best match to a single COG category is considered. b) The total is based on the total number of protein-coding genes (5,820) in the annotated genome. c) These genes have alignments with reference genes archived in COG, but these reference genes do not have COG categories. d) Genes without a qualified hit to a reference gene.
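The assembly statistics quoted in this section (scaffold N50 and G+C content) follow standard definitions, sketched below in Python. The function names are hypothetical, and excluding ambiguous bases from the G+C denominator is an assumption of this example; the text does not state how the reported values were computed.

```python
from typing import Iterable, List


def n50(lengths: List[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def gc_percent(seqs: Iterable[str]) -> float:
    """G+C content as a percentage of unambiguous (A, C, G, T) bases."""
    gc = 0
    acgt = 0
    for seq in seqs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        acgt += sum(s.count(base) for base in "ACGT")
    return 100.0 * gc / acgt if acgt else 0.0
```

For example, scaffolds of lengths 2, 3, 4, 5 and 6 total 20 bases; the two longest (6 and 5) already cover 11 of them, so the N50 is 5.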

Conclusion
Staphylococcus cohnii subsp. cohnii is part of the normal flora of human skin and mucous membranes and, under particular conditions, may become an opportunistic pathogen [4]. The genome sequence of Staphylococcus cohnii subsp. cohnii strain hu-01 will provide a basis for elucidating the molecular principles of host colonization and offer insight into the genetic background of this organism's pathogenesis.