While the sequencing of the genome described in this paper was underway, Li et al. from the Third Institute of Oceanography, Xiamen, China, published the complete genome sequence of strain TPY. The two genomes differ in size by less than 7,000 bp. Here, we take the opportunity to compare the completed genome sequences of these two strains, NALT and TPY, both belonging to S. acidophilus. While the biological material for the type strain, NALT, is publicly available from the DSMZ open collection for postgenomic analyses, no source of the biological material (MIGS-13 criterion, see Table 2) for strain TPY was provided by Li et al.
To estimate the overall similarity between the genomes of strains NALT and TPY (GenBank accession number: CP002901), the Genome-to-Genome Distance Calculator (GGDC) [50,51] was used. The system calculates the distances by comparing the genomes to obtain HSPs (high-scoring segment pairs) and inferring distances from three formulae (HSP length / total length; identities / HSP length; identities / total length). The comparison of the genomes of strains NALT and TPY revealed that 99.65% of the average of the genome lengths is covered with HSPs. The identity within these HSPs was 99.01%, whereas the identity over the whole genome (counting regions not covered by HSPs as non-identical) was 98.67%. The inferred digital DNA-DNA hybridization values for the two strains are 96.47% (formula 1), 86.08% (formula 2) and 97.05% (formula 3). These results show that the whole genome sequences of strains NALT and TPY are highly similar, supporting the membership of both strains in the same species.
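The three ratios named above can be illustrated with a minimal sketch. This is not the GGDC implementation, and it does not cover the step that converts distances into digital DNA-DNA hybridization estimates; the function name and the input counts are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def hsp_similarity_ratios(hsp_length, identities, total_length):
    """Return the three ratios described in the text as fractions in [0, 1].

    hsp_length   -- summed length of all high-scoring segment pairs (HSPs)
    identities   -- number of identical positions within those HSPs
    total_length -- reference length (the text uses the average genome length)
    """
    if not 0 <= identities <= hsp_length <= total_length or total_length <= 0:
        raise ValueError("expected 0 <= identities <= hsp_length <= total_length")
    if hsp_length == 0:
        raise ValueError("hsp_length must be positive")

    coverage = hsp_length / total_length          # HSP length / total length
    identity_in_hsps = identities / hsp_length    # identities / HSP length
    identity_overall = identities / total_length  # identities / total length
    return coverage, identity_in_hsps, identity_overall


# Hypothetical counts for illustration only; not taken from either genome.
cov, ident_hsp, ident_all = hsp_similarity_ratios(
    hsp_length=9_000, identities=8_100, total_length=10_000
)
print(f"{cov:.2%} {ident_hsp:.2%} {ident_all:.2%}")
```

By construction the third ratio is the product of the first two. Applied to the reported values, 99.65% x 99.01% gives about 98.66%, which agrees with the reported whole-genome identity of 98.67% to within rounding.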
The comparison of the number of genes belonging to the different COG categories revealed few differences between the genomes of strains NALT and TPY. Strain NALT has 2,740 genes with COGs assigned, while strain TPY has 2,700. We analyzed the differences in COG assignment between the two strains and found that in almost all cases they could be explained by differences in the gene calls or pseudogene assignment, i.e., in one genome two parts of a pseudogene were called as two separate genes, while in the other genome they were combined into one pseudogene. The only clear case of a difference in gene content between the two strains is the presence of a transposable element consisting of two genes (Sulac_1668, Sulac_1669) disrupting a subunit of a potassium transporter (Sulac_1667) in strain NALT. There were also cases where a gene in one strain was split into two genes in the other strain. For example, Sulac_2178 corresponds to TPY_1983 and TPY_1984, and Sulac_0347 corresponds to TPY_0381 and TPY_0382. In both cases the differences are due to a single-base indel.
A dot plot showed large blocks of synteny between the two genomes, with some rearrangements (data not shown). The genes located on the plasmid in strain NALT are found in two regions of the chromosome in strain TPY: Sulac_3528-3555 correspond to TPY_0524-0552, while Sulac_3556-3626 correspond to TPY_2310-2244. This suggests that in strain TPY, the plasmid was inserted into the chromosome and then split into two pieces.
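To make the locus-tag ranges above easier to inspect, the following sketch expands each range into individual tags. It assumes that tags within a range are numbered consecutively with four-digit zero padding, which the source does not state; the helper name is hypothetical.

```python
def expand_locus_range(prefix, first, last, width=4):
    """Expand e.g. ("TPY", 2310, 2244) into TPY_2310, TPY_2309, ..., TPY_2244.

    Assumes consecutive integer numbering (an assumption, not from the source).
    A descending range is walked downwards so the stated order is preserved.
    """
    step = 1 if last >= first else -1
    return [f"{prefix}_{n:0{width}d}" for n in range(first, last + step, step)]


# The two plasmid segments and their chromosomal counterparts, as given in the text.
segments = [
    (expand_locus_range("Sulac", 3528, 3555), expand_locus_range("TPY", 524, 552)),
    (expand_locus_range("Sulac", 3556, 3626), expand_locus_range("TPY", 2310, 2244)),
]

for nalt_tags, tpy_tags in segments:
    print(nalt_tags[0], "..", nalt_tags[-1], f"({len(nalt_tags)} tags)",
          "<->", tpy_tags[0], "..", tpy_tags[-1], f"({len(tpy_tags)} tags)")
```

Under the consecutive-numbering assumption, the tag counts on the two sides of each pair are not equal, so the correspondence should be read as region-to-region rather than gene-by-gene. The descending TPY numbering in the second pair is what one would expect if that segment lies in the opposite orientation, although the source does not say so explicitly.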
We analyzed CRISPR repeats with the CRISPR Recognition Tool and found major differences between the two strains. They both have two regions of CRISPR repeats, but the strain TPY repeat regions have 8 and 9 repeats while the strain NALT repeat regions have 27 and 43 repeats. All of the spacers in the TPY repeat regions are found in NALT, but NALT has many additional spacers. This agrees with previous results suggesting that CRISPRs evolve quickly, and differences can be found in closely related strains.
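The spacer comparison described above amounts to a set-containment check. The sketch below is not the CRISPR Recognition Tool and does not detect repeats; it only compares spacer sequences that have already been extracted. The function name and the spacer sequences are hypothetical placeholders, and matching is by exact string identity (reverse complements are not considered).

```python
def compare_spacers(spacers_a, spacers_b):
    """Compare two collections of spacer sequences by exact match."""
    set_a, set_b = set(spacers_a), set(spacers_b)
    return {
        "shared": set_a & set_b,          # spacers present in both strains
        "only_a": set_a - set_b,          # spacers unique to the first strain
        "only_b": set_b - set_a,          # spacers unique to the second strain
        "a_subset_of_b": set_a <= set_b,  # True if every spacer of A occurs in B
    }


# Placeholder sequences for illustration only; not real spacers from either strain.
spacers_tpy = ["ACGTACGT", "TTGACCAA"]
spacers_nalt = ["ACGTACGT", "TTGACCAA", "GGCATTCA", "CATGCATG"]

result = compare_spacers(spacers_tpy, spacers_nalt)
print(result["a_subset_of_b"], len(result["only_b"]))
```

In this toy input, every spacer in the first list also occurs in the second, which carries additional spacers, mirroring the pattern reported for strains TPY and NALT.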