Thursday, February 27, 2014

A Great Milestone for Science

A computer program able to write and publish peer-reviewed articles!

What a great time saver! Why didn't I think of this? I guess my skills in Fortran, Basic, and Apple II+ machine language just weren't up to snuff.  Take it away, Ace:

120 Scientific Papers Withdrawn After Being Proven to be Gibberish. No, Actual Computer-Generated Gibberish.
Multiple layers of painstaking fact-checking and editorial oversight.
So, some scientists at MIT invented a program called "SCIgen" to generate, by computer, random scientific-sounding papers. They did this for amusement.

But people (especially in China, apparently) have been using the program to generate papers and then submit them to actual scientific publishers' subscription services.
“The papers are quite easy to spot,” says Labbé, who has built a website where users can test whether papers have been created using SCIgen. His detection technique, described in a study published in Scientometrics in 2012, involves searching for characteristic vocabulary generated by SCIgen. Shortly before that paper was published, Labbé informed the IEEE of 85 fake papers he had found. Monika Stickel, director of corporate communications at IEEE, says that the publisher “took immediate action to remove the papers” and “refined our processes to prevent papers not meeting our standards from being published in the future”. In December 2013, Labbé informed the IEEE of another batch of apparent SCIgen articles he had found. Last week, those were also taken down, but the web pages for the removed articles give no explanation for their absence.
Ruth Francis, UK head of communications at Springer, says that the company has contacted editors, and is trying to contact authors, about the issues surrounding the articles that are coming down. The relevant conference proceedings were peer reviewed, she confirms — making it more mystifying that the papers were accepted.
It's possible the reviewers chalked up the computerese nonsense to a language barrier, figuring the "scientist" who wrote them spoke Chinese as a first language and was struggling with English. But this only goes so far, because, ultimately, these papers didn't make sense in any language. Because they were gibberish.
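For the curious, here's roughly what phrase-based detection looks like in practice. To be clear, this is my own illustrative sketch, not Labbé's actual method: the phrase list and threshold below are made up, though SCIgen's grammar really does recycle a small set of stock constructions.

```python
# Illustrative sketch of vocabulary-based detection. The phrase list and
# threshold are invented for demonstration, not taken from Labbé's detector.

# Stock constructions of the kind SCIgen's grammar tends to emit.
SUSPECT_PHRASES = [
    "we disprove that",
    "unfortunate unification of",
    "highly-available epistemologies",
    "the construction of the world wide web",
]

def suspicion_score(text: str) -> float:
    """Return the fraction of suspect phrases found in the text."""
    lowered = text.lower()
    hits = sum(1 for phrase in SUSPECT_PHRASES if phrase in lowered)
    return hits / len(SUSPECT_PHRASES)

def looks_generated(text: str, threshold: float = 0.5) -> bool:
    """Flag a paper as likely machine-generated if enough phrases match."""
    return suspicion_score(text) >= threshold
```

A real detector would need a much larger phrase inventory (and some statistics to keep false positives down), but the principle is the same: generated text reuses its grammar's fixed vocabulary, and that vocabulary is searchable.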
Labbé (the guy who built the tool for finding these fakes) wanted to prove how easy it was to spoof the system, so he created a fake scientist named "Ike Antkare."
Labbé is no stranger to fake studies. In April 2010, he used SCIgen to generate 102 fake papers by a fictional author called Ike Antkare. Labbé showed how easy it was to add these fake papers to the Google Scholar database, boosting Ike Antkare’s h-index, a measure of published output, to 94 — at the time, making Antkare the world's 21st most highly cited scientist.
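In case you've never run into the h-index: a researcher has index h if h of their papers have each been cited at least h times. It's simple enough to compute in a few lines (a minimal sketch, assuming you already have the citation counts):

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)  # most-cited first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h
```

Which is exactly why Antkare's 102 fake papers, all citing each other, inflated the number so effectively: the metric only counts, it doesn't read.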
Ike Antkare?  I couldn't kare less...
Why? Why would 120 fake, gibberish, nonsense papers be submitted to these publishers? And how did they make it onto the system?

Well, possibly this is a prank, or an attempt to prove how easy it is to get nonsense published, as Labbé already proved.

Or, possibly:

Apparently, in science, one gross method of ranking your authority is by counting up the number of times you're cited in other scientific papers.

So, what if you could just spam a lot of fictitious, gibberish papers and get them into "the system" (the subscription services) citing you a whole bunch of times? Then your crude bean-counting ranking goes up.
I'm willing to bet this new "batch" of faked papers is not, in fact, an educational prank, but rather curriculum vitae padding by a bunch of Chinese authors. They are extremely smart, extremely competitive, and many of them are desperate to get out of China and into civilized countries. They know that most of the people they are hoping will hire them, and thereby allow them to move out of China, aren't actually going to read their old papers: just the title and, if they're unlucky, maybe the abstract. They have a high bar to overcome; many United States scientists have a strong bias against foreign scientists (which they will mostly vociferously deny), and against Chinese scientists in particular. An impression of productivity and familiarity with English is what they are trying to convey, hoping that the prospect of getting their names on subordinates' papers will be enough of a carrot to get someone to take a risk on a Chinese post-doc who, on the face of things, seems to have questionable English skills.


  1. seems legit... mostly climate change papers?

    1. My guess is not; the vast majority of papers are biomedical. But if an environmental scientist used it, my guess is that whatever algorithm SCIgen uses would grab some climate change phrases to insert.