There’s a quote frequently attributed to Mark Twain that goes, “A lie can travel halfway around the world before the truth can get its boots on.” Whether or not Twain truly said it, one reality of the age of AI is that it’s becoming increasingly difficult to distinguish truth from fiction.
New evidence supporting that point comes from a group of Swedish researchers who just published findings on a growing number of fake scientific papers surfacing on Google Scholar. The study found more than 130 papers that either used AI without proper disclosure or appear to have been fabricated entirely with AI tools.
Google Scholar Not So Scholarly?
The researchers conducted a limited scrape of the Google Scholar index, searching for two boilerplate phrases that public AI tools such as ChatGPT or Claude commonly produce in response to prompts. The two phrases are:
- “as of my last knowledge update”
- “I don’t have access to real-time data”
If either of those telltale genAI phrases appeared in a paper indexed by Google Scholar, the team flagged it and checked whether the paper properly acknowledged that an AI tool had been used as part of its study methodology.
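For readers who want to see the idea in code, here is a minimal sketch of that phrase-flagging step, assuming you already have a paper’s full text in hand. It is not the researchers’ actual pipeline, and the disclosure keywords are illustrative guesses rather than terms taken from the study.

```python
# A minimal sketch (not the study's real pipeline) of phrase-flagging:
# check a paper's full text for boilerplate chatbot phrases and for any
# sign that AI use was declared. Keyword lists here are illustrative only.

TELLTALE_PHRASES = [
    "as of my last knowledge update",
    "i don't have access to real-time data",
]

DISCLOSURE_KEYWORDS = [
    "chatgpt", "gpt-4", "large language model", "generative ai",
]

def flag_paper(full_text: str) -> dict:
    """Return which telltale phrases appear and whether AI use seems to be declared."""
    text = full_text.lower()
    hits = [p for p in TELLTALE_PHRASES if p in text]
    declared = any(k in text for k in DISCLOSURE_KEYWORDS)
    return {"flagged": bool(hits), "phrases": hits, "ai_use_declared": declared}

if __name__ == "__main__":
    sample = "Results: As of my last knowledge update, no prior studies report this effect."
    print(flag_paper(sample))
    # -> {'flagged': True, 'phrases': ['as of my last knowledge update'], 'ai_use_declared': False}
```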
The search flagged 227 papers, 139 of which failed to cite, mention or reference any use of AI despite clear signs of it. It’s worth noting that Google Scholar reportedly indexes more than 389 million records, so those 139 papers amount to a minuscule 0.0000357% of everything indexed.
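As a quick sanity check on that scale, the arithmetic can be reproduced from the two figures cited above; this is a back-of-the-envelope sketch, nothing more.

```python
# Back-of-the-envelope scale check using the figures cited in the article.
flagged_without_disclosure = 139           # papers with no AI disclosure
reported_scholar_records = 389_000_000     # records Google Scholar reportedly indexes

share = flagged_without_disclosure / reported_scholar_records
print(f"{share:.10f} as a fraction, i.e. {share * 100:.7f}% of indexed records")
# -> 0.0000003573 as a fraction, i.e. 0.0000357% of indexed records
```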
Regardless, researcher Kristofer Rolf Söderström of Lund University in Sweden explained in an email exchange why his team’s effort to call out sham science was necessary.
“With this research, we wanted to address the issue by looking into how common this is, especially because Google Scholar is so easy to use and it is very widely used, even by ourselves, but actually it is not that well controlled,” Söderström wrote.
“Our motivation was that the depth of the issue could be mitigated by such an investigation, thus making an early contribution to highlight the growing concern of undeclared GPT-use in academic papers since this runs the risk of ill-will hacking of society’s evidence base. But really, just the possibility of this happening—even if it is quite uncommon—risks further undermining trust in science, and that this is the last thing society needs right now.”
AI Makes Science Easier and More Accessible To Fake
Söderström highlighted two main risks from this type of scientific flimflammery.
First is the increasing risk that undeclared and mischievous use of genAI in scholarly research produces believable—but still false—academic publications that can be tricky to detect.
Second, the sheer quantity of papers that large language models can produce suggests that the scholarly record risks being overwhelmed with bogus studies.
“One of our findings was that many of these papers have spread to several repositories online, and have appeared in social media. This is a common and mostly automated process, but it makes retractions or corrections of research extremely difficult. Especially because Google Scholar will keep on finding and displaying them,” he wrote.
AI Is Not To Blame: The Researchers Fault a Broken System
However, Söderström and his colleagues point out that AI itself isn’t the core problem; it’s merely a tool that academics have seized on to survive the flawed “publish or perish” culture at most research universities.
The problem of phony science papers is further compounded by Google’s disproportionate control over scholarly indexing, search engines and basic access to online information.
He said the team is doing a larger, deeper dive into this specific topic, since their initial query was so limited yet turned up so many issues and left so many questions unanswered.
“While it is not clear that all papers were actually produced by individuals – they might also be produced by so-called paper mills producing results from fake studies resembling scholarly publications – there might be several potential reasons. The pressure for researchers to continuously publish scholarly output, which can be conducted more frequently through the use of LLM misconduct, could be one of the reasons,” Söderström expressed in the email.
The report doesn’t offer any simple solutions, but it does suggest a multi-pronged approach that needs to include technical, regulatory and educational components to protect the truth.
Let’s just hope we don’t have to wait much longer for it to get its boots on.