Fakin’ It the Modern Way

I’ve written here many times about varieties of fraud in the literature. These range from retail (someone faking a Western blot band in a single paper to make a key figure say something it wouldn’t have) to industrialized wholesale, and that latter style is what this new preprint addresses. Over the last ten to twenty years, the rise of “paper mills”, which tend to call themselves “academic support agencies” or the like, has been steady. These are outfits that charge what the market will bear – which is sometimes quite a lot – to produce complete from-the-ground-up fake papers and get them placed in indexed journals. There are hundreds of these junk-generating operations out there, and they will write the text – if “write” is the correct verb – produce figures and illustrations, generate sorta-plausible fake data, the lot. Some of them are now going a step further and contacting journal editors directly with offers to send them papers that will bump up the journal’s citation rankings if they accept them, with some cash on the side for each acceptance! One of these offers is reproduced in detail in Table 1 of the preprint, and it does not beat around the damn bush:

We are a well-known academic support institution from Guangzhou, China, which has been established for 8 years. Experts in our institution need to publish some research papers in SCI journals every month. For reducing the publication time, we expect to cooperate with you in the future. Cooperation mode: we cite the content of your journal in our articles, thus increasing … your impact factor in 2022. You shall help us shorten the publication time. Payment: If an article is successfully published, we will pay for it at the price: IF*1000 USD/article. For example, with IF=2.36, total payment=2.36*1000 USD=2360USD. And this price is negotiable.

Here’s an interesting perspective from the publishing side, from the editors of Naunyn-Schmiedeberg’s Archives of Pharmacology, which is just the sort of journal (established, but not top-tier) that the paper mills like to target. As that article notes, there are a number of features that these papers tend to display (or have tended to, since the landscape does change). Among them: commercial-service email contact addresses rather than academic ones. Difficulties with the service labs if you dig into any of the data, such as actually naming them or providing any contact information – gosh, that’s why we can’t provide the raw numbers, that darn contract lab misplaced them – along with all sorts of other excuses why the raw data can’t be provided (various pandemic-related ones being favored in the last couple of years, naturally). Images that don’t hold up if you look them over closely, but which seem OK on first inspection. Correspondence about the paper at a notably lower level of English fluency than the manuscript itself (this can be legit, of course, but it seems to be almost invariable with the paper-mill products). Slow and evasive email behavior in general, and so on. Sometimes the signs are immediate if you push a little bit: the “teaching hospital” that’s listed as the authors’ affiliation turns out not to even exist, suggested reviewers turn around positive recommendations within the hour in broken English, and the exact same manuscript even shows up with a different roster of authors.

The new preprint tries to estimate the size of the problem, and the authors went with three indicators as a first pass: a private/commercial email address as the contact, a hospital affiliation, and the lack of any international co-authors. Now, that’s going to generate some false positives, of course, because there are legitimate papers with all of these features, but it’s likely to dredge up plenty of actual junk along the way. What the authors did was manually inspect papers in various fields to flag a decent sample of these, and then follow up by sending those authors a questionnaire saying that the editorial staff was trying to root out suspicious papers. The same questionnaire was sent out to a random control sample of paper authors as well. It asked everyone if they were willing to provide raw data, if they had engaged professional paper-editing services during the preparation of the manuscript, and so on – the sort of thing that a faked paper’s authors would be very reluctant to answer directly. This seemed to work pretty well: inspection of 215 neurology papers, for example, flagged 44 of them as suspicious. Contacting those authors (and 48 random controls) led to replies from 96% of the latter group but only 44% of the suspicious group, even after repeated reminders that failure to answer could lead to the paper being retracted, so there’s an indicator right up front. Only one of the authors of the 44 suspicious papers even said that they were willing to provide original data at all.
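Just to make that first-pass screen concrete, here’s a minimal sketch of what such a rule might look like in code. The field names, email domains, and example records are all hypothetical – this is an illustration of the three-indicator idea, not the preprint’s actual pipeline:

```python
# Hypothetical first-pass paper-mill screen using the preprint's three
# indicators: commercial contact email, hospital affiliation, and no
# international co-authors. All field names and data here are made up.

COMMERCIAL_DOMAINS = {"gmail.com", "hotmail.com", "163.com", "qq.com"}  # example list

def is_flaggable(paper: dict) -> bool:
    domain = paper["contact_email"].rsplit("@", 1)[-1].lower()
    commercial_email = domain in COMMERCIAL_DOMAINS
    hospital_affiliation = "hospital" in paper["affiliation"].lower()
    # "No international co-authors" = only one country across the author list
    single_country = len(set(paper["author_countries"])) <= 1
    return commercial_email and hospital_affiliation and single_country

papers = [
    {"contact_email": "corresponding.author@163.com",
     "affiliation": "Third People's Hospital",
     "author_countries": ["CN", "CN", "CN"]},
    {"contact_email": "j.smith@university.edu",
     "affiliation": "Dept. of Neurology, University Medical Center",
     "author_countries": ["US", "DE"]},
]
flagged = [p for p in papers if is_flaggable(p)]
print(f"{len(flagged)} of {len(papers)} papers flagged for follow-up")  # 1 of 2
```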

Using those metrics across larger journal samples revealed the expected (and depressing) trends: there are more such “flaggable” papers in lower-impact journals, with the most in the bottom-tier open-access ones, and their numbers have been increasing steadily year by year. The best set of red-flag indicators so far seems to be the combination of a private email address, a hospital affiliation, and citation of other papers that fit these criteria: those three together had a sensitivity of 90% and a false-positive rate of 30% when compared against lists of known fakes and known legitimate papers. Not perfect, but a useful filter to bring on further scrutiny.
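To see why a 30% false-positive rate still leaves this useful, it helps to run the numbers through Bayes’ rule. In the sketch below, the sensitivity and false-positive rate are the preprint’s figures; the 25% base rate of fakes in the screened pool is an assumption for illustration, roughly in line with the upper-bound estimates discussed next:

```python
# Positive predictive value of the three-indicator screen via Bayes' rule.
# Sensitivity and false-positive rate are from the preprint; the base rate
# is an assumed figure, for illustration only.
sensitivity = 0.90          # P(flagged | fake)
false_positive_rate = 0.30  # P(flagged | legitimate)
base_rate = 0.25            # assumed P(fake) in the screened pool

p_flagged = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
ppv = sensitivity * base_rate / p_flagged
print(f"P(fake | flagged) = {ppv:.2f}")  # 0.50 at a 25% base rate
```

In other words, even in a heavily contaminated pool, about half of the flagged papers would still be legitimate – which is why this is a filter to trigger closer inspection, not a verdict.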

The authors go on to use these rules to estimate an upper bound for faked papers in general, and that doesn’t make for cheerful reading, either. That bound comes out in the mid-20s percent range (one sample gave about 24%, and two others closer to 29%). Which means hundreds of thousands of fake papers published every year – the total yearly publication count is now about 1.3 million, and a quarter of that is well over 300,000 papers. Looking at the origin of these red-flag papers, China seems to be the number one contributor in sheer volume (over half the faked papers globally), and it, along with Russia, Turkey, Egypt, and India, ranks highly for faked papers as a percentage of a country’s total output. Well, ranks lowly – you know what I mean. That Naunyn-Schmiedeberg article mentioned above tries to dance around this fact:

The retraction notes published in this issue (Table 1) and the additionally forthcoming retraction notes are all from one country. Thus, one can assume that paper mills reside in this country and that there must be substantial career-advancing incentives for “authors” to invest (private?) financial resources into paper mill papers. Interestingly, paper mills aim at disguising the geographical origin of the emails by sending them at any day or night time, strikingly deviating from “conventional” email writing and sending patterns by scientists from different continents. In addition, the risk of being uncovered is apparently deemed to be rather low by the fake authors and largely outweighed by the professional “benefits.” It must be assumed that a specific political climate in this country strongly fosters the use of paper mills. It cannot be excluded that paper mills operate in other countries as well. Therefore, in our revised editorial guidelines, the request for original data concerns authors from all countries. We deeply regret that we had to implement this rule globally, but we did not want to convey the impression of discrimination of a specific country.

Well, while it’s true that you can’t just say “it’s all one country”, if you were to devote particular attention to papers from unknown Chinese sources, you would not go far wrong, and if you went on to add a short list of other countries to the “special scrutiny” pile, you would in fact be making more efficient use of your manuscript-flagging time than otherwise. It’s not a nice thing to bring up, but it’s true. And it’s not a mystery, either, because some institutions in these countries have had explicit quotas and targets for how many papers must be published in indexed journals for promotion and the like. As for the paper mills themselves, the authors of the current preprint find them almost entirely in China, India, the UK, and the US. Although the latter two supply an extremely small percentage of the customers, their English fluency and their time-zone proximity to major publishers are presumably advantages. India, it would seem, has both the English-language supply and the customer demand.

Fixing this is not going to be easy (the preprint has a good list of recommendations). But the absolute first step is realizing the size of the problem. And it’s big.