For a few years now, my EvoSTAR colleague, Bill Langdon, has been exploring the degree to which Mycoplasma bacteria have contaminated experimental systems and even "infected" online databases with the contents of their genomes. He and his colleagues have previously shown that Mycoplasma genome sequences have previously been mislabeled as human sequences in several online resources (GenBank, dbEST, and RefSeq).
Early microarray designs were based largely on ESTs from these resources, and as a result, the Affymetrix HG-U133 plus 2.0 array contains probes for Mycoplasma sequences. Details for these probes can be found here. Exploiting these probes, Bill and colleagues have also examined the Gene Expression Omnibus for evidence of Mycoplasma contamination, and found around 30 studies (roughly 1% of GEO) that show high expression for this probe, the vast majority of which were from cell cultures.
By their proclivity to infect human experimental cell lines, Bill has playfully described Mycoplasma as having evolved the ability to transmit their genes into online databases.
Continuing this pursuit, Bill recently published an article in BMC BioData Mining illustrating Mycoplasma contamination of the 1000 Genomes Project. It is unclear what the implications of this contamination are for the integrity of 1000 Genomes Data, as the majority of identified Mycoplasma reads to not map to the human reference genome. This work should however serve as a bellwether to those performing experiments, or using experimental data from treated cell lines. In these situations, any contamination might severely taint experimental results.