Stanford researchers uncover patterns in how scientists lie about their data: When scientists falsify data, they try to cover it up by writing differently in their published works. A pair of Stanford researchers have devised a way of identifying these written clues.
There is a fair amount of research dedicated to understanding the ways liars lie. Studies have shown that liars generally tend to express more negative emotion terms and use fewer first-person pronouns. Fraudulent financial reports typically display higher levels of linguistic obfuscation (phrasing that is meant to distract from or conceal the fake data) than accurate reports.
Uh, doesn't the jargon claim quoted further down (about 60 extra jargon-like words amounting to 1.5 percent more jargon) imply that the average paper has something like 4,000 jargon words? Either their definition of jargon is generous indeed, or those numbers are wrong. I'm guessing the latter. And a 1.5 percent increase may be statistically significant, but it would certainly not be readily apparent to a casual reader.
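A quick sanity check on that arithmetic, taking the two figures quoted in the article at face value (about 60 extra jargon-like words per paper, and approximately 1.5 percent more jargon). The function name and the two "readings" are my own framing, not anything the researchers published. Both readings divide 60 by 0.015, but they mean different things.

```python
def implied_base(extra_words: float, fraction: float) -> float:
    """Return the quantity that `fraction` must be a share of
    for it to correspond to `extra_words` additional words."""
    return extra_words / fraction

extra_jargon_words = 60   # quoted: "about 60 more jargon-like words per paper"
increase = 0.015          # quoted: "approximately 1.5 percent more jargon"

# Reading 1 (assumption): 1.5% is a relative increase over the baseline
# jargon count, so the result is the baseline number of jargon words.
baseline_jargon_words = implied_base(extra_jargon_words, increase)

# Reading 2 (assumption): 1.5% is a percentage-point share of all words,
# so the result is the total word count of a typical paper.
total_words = implied_base(extra_jargon_words, increase)

print(round(baseline_jargon_words), round(total_words))
```

Either way the implied base is about 4,000. Under the first reading that is 4,000 jargon words per paper, which seems implausible. Under the second it is a 4,000-word paper, which is at least a believable length. The article does not say which reading is intended.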
To see if similar patterns exist in scientific academia, Jeff Hancock, a professor of communication at Stanford, and graduate student David Markowitz searched the archives of PubMed, a database of life sciences journals, from 1973 to 2013 for retracted papers. They identified 253, primarily from biomedical journals, that were retracted for documented fraud and compared the writing in these to unretracted papers from the same journals and publication years, and covering the same topics.
They then rated the level of fraud of each paper using a customized "obfuscation index," which rated the degree to which the authors attempted to mask their false results. This was achieved through a summary score of causal terms, abstract language, jargon, positive emotion terms and a standardized ease of reading score.
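For a sense of what a "summary score" like this could look like, here is a minimal sketch, not the researchers' actual code. It assumes each of the five components is standardized (z-scored) across the corpus and then combined with a sign. It further assumes that causal terms, abstraction and jargon raise the score while positive emotion terms and reading ease lower it. The article does not spell out the signs or the scaling, and the feature values below are made up for illustration.

```python
from statistics import mean, pstdev

# Assumed sign convention: +1 means "more of this looks more obfuscated".
SIGNS = {
    "causal": 1,
    "abstraction": 1,
    "jargon": 1,
    "positive_emotion": -1,
    "reading_ease": -1,
}

def zscores(values):
    """Standardize a list of numbers to mean 0 and unit (population) std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def obfuscation_index(papers):
    """Return one composite score per paper: the signed sum of z-scores."""
    standardized = {f: zscores([p[f] for p in papers]) for f in SIGNS}
    return [
        sum(SIGNS[f] * standardized[f][i] for f in SIGNS)
        for i in range(len(papers))
    ]

# Hypothetical per-paper feature values (not real data).
papers = [
    {"causal": 1.2, "abstraction": 3.1, "jargon": 20.0,
     "positive_emotion": 2.0, "reading_ease": 30.0},
    {"causal": 1.8, "abstraction": 3.6, "jargon": 24.0,
     "positive_emotion": 1.0, "reading_ease": 22.0},
    {"causal": 1.5, "abstraction": 3.3, "jargon": 22.0,
     "positive_emotion": 1.5, "reading_ease": 26.0},
]

scores = obfuscation_index(papers)
for i, s in enumerate(scores):
    print(f"paper {i}: {s:+.2f}")
```

Because every component is standardized against the corpus, a score is only meaningful relative to the other papers it was computed with. That is consistent with the study comparing retracted papers to matched unretracted ones rather than reporting absolute values.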
"We believe the underlying idea behind obfuscation is to muddle the truth," said Markowitz, the lead author on the paper. "Scientists faking data know that they are committing a misconduct and do not want to get caught. Therefore, one strategy to evade this may be to obscure parts of the paper. We suggest that language can be one of many variables to differentiate between fraudulent and genuine science."
The results showed that fraudulent retracted papers scored significantly higher on the obfuscation index than papers retracted for other reasons. For example, fraudulent papers contained approximately 1.5 percent more jargon than unretracted papers.
"Fraudulent papers had about 60 more jargon-like words per paper compared to unretracted papers," Markowitz said. "This is a non-trivial amount."
But what to do about it:
In the future, a computerized system based on this work might be able to flag a submitted paper so that editors could give it a more critical review before publication, depending on the journal's threshold for obfuscated language. But the authors warn that this approach isn't currently feasible given the false-positive rate.
That's it! Put Skynet in charge of rooting out the fraudulent scientists.