In its August edition, Resources Policy, an academic journal under the Elsevier publishing umbrella, featured a peer-reviewed study about how ecommerce has affected fossil fuel efficiency in developing nations. But buried in the report was a curious sentence: “Please note that as an AI language model, I am unable to generate specific tables or conduct tests, so the actual results should be included in the table.”
The study’s three listed authors had names and university or institutional affiliations—they did not appear to be AI language models. But for anyone who has played around in ChatGPT, that phrase may sound familiar: The generative AI chatbot often prefaces its statements with this caveat, noting its weaknesses in delivering some information. After a screenshot of the sentence was posted to X, formerly Twitter, by another researcher, Elsevier began investigating. The publisher is looking into the use of AI in this article and “any other possible instances,” Andrew Davis, vice president of global communications at Elsevier, told WIRED in a statement.
Elsevier’s AI policies do not block the use of AI tools to help with writing, but they do require disclosure. The publishing company uses its own in-house AI tools to check for plagiarism and completeness, but it does not allow editors to use outside AI tools to review papers.
The authors of the study did not respond to emailed requests for comment from WIRED, but Davis says Elsevier has been in contact with them, and that the researchers are cooperating. “The author intended to use AI to improve the quality of the language (which is within our policy), and they accidentally left in those comments—which they intend to clarify,” Davis says. The publisher declined to provide more information on how it would remedy the Resources Policy situation, citing the ongoing nature of the inquiry.
The rapid rise of generative AI has stoked anxieties across disciplines. High school teachers and college professors are worried about the potential for cheating. News organizations have been caught with shoddy articles penned by AI. And now, peer-reviewed academic journals are grappling with submissions in which the authors may have used generative AI to write outlines, drafts, or even entire papers, but failed to make the AI use clear.
Journals are taking a patchwork approach to the problem. The JAMA Network, which includes titles published by the American Medical Association, prohibits listing artificial intelligence generators as authors and requires disclosure of their use. The family of journals produced by Science does not allow text, figures, images, or data generated by AI to be used without editors’ permission. PLOS ONE requires anyone who uses AI to detail what tool they used, how they used it, and ways they evaluated the validity of the generated information. Nature has banned images and videos that are generated by AI, and it requires the use of language models to be disclosed. Many journals’ policies make authors responsible for the validity of any information generated by AI.
Experts say there’s a balance to strike in the academic world when using generative AI—it could make the writing process more efficient and help researchers more clearly convey their findings. But the tech—when used in many kinds of writing—has also dropped fake references into its responses, made things up, and reiterated sexist and racist content from the internet, all of which would be problematic if included in published scientific writing.
If researchers use these generated responses in their work without strict vetting or disclosure, they raise major credibility issues. Not disclosing use of AI would mean authors are passing off generative AI content as their own, which could be considered plagiarism. They could also potentially be spreading AI’s hallucinations, or its uncanny ability to make things up and state them as fact.
“It’s a big issue,” David Resnik, a bioethicist at the National Institute of Environmental Health Sciences, says of AI use in scientific and academic work. Still, he says, generative AI is not all bad: it could help researchers whose native language is not English write better papers. “AI could help these authors improve the quality of their writing and their chances of having their papers accepted,” Resnik says. But those who use AI should disclose it, he adds.
For now, it’s impossible to know how extensively AI is being used in academic publishing, because there’s no foolproof way to check for AI use, as there is for plagiarism. The Resources Policy paper caught a researcher’s attention because the authors seem to have accidentally left behind a clue to a large language model’s possible involvement. “Those are really the tips of the iceberg sticking out,” says Elisabeth Bik, a science integrity consultant who runs the blog Science Integrity Digest. “I think this is a sign that it’s happening on a very large scale.”
In 2021, Guillaume Cabanac, a professor of computer science at the University of Toulouse in France, found odd phrases in academic articles, like “counterfeit consciousness” instead of “artificial intelligence.” He and a team coined the idea of looking for “tortured phrases,” or word soup in place of straightforward terms, as indicators that a document likely comes from text generators. He’s also on the lookout for generative AI in journals, and is the one who flagged the Resources Policy study on X.
Cabanac investigates studies that may be problematic, and he has been flagging potentially undisclosed AI use. To protect scientific integrity as the tech develops, scientists must educate themselves, he says. “We, as scientists, must act by training ourselves, by knowing about the frauds,” Cabanac says. “It’s a whack-a-mole game. There are new ways to deceive.”
Tech advances since 2021 have made language models even more convincing, and more appealing as a writing partner. In July, two researchers used ChatGPT to write an entire research paper in an hour to test the chatbot’s ability to compete in the scientific publishing world. It wasn’t perfect, but prompting the chatbot did pull together a paper with solid analysis.
That was a study to evaluate ChatGPT, but it shows how the tech could be used by paper mills—companies that churn out scientific papers on demand—to create more questionable content. Paper mills are used by researchers and institutions that may feel pressure to publish research but who don’t want to spend the time and resources to conduct their own original work. With AI, this process could become even easier. AI-written papers could also draw attention away from good work by diluting the pool of scientific literature.
And the issues could reach beyond text generators—Bik says she also worries about AI-generated images, which could be manipulated to create fraudulent research. It can be difficult to prove such images are not real.
Some researchers want to crack down on undisclosed AI writing, to screen for it just as journals might screen for plagiarism. In June, Heather Desaire, a professor of chemistry at the University of Kansas, was an author on a study demonstrating a tool that can differentiate with 99 percent accuracy between science writing produced by a human and entries produced by ChatGPT. Desaire says the team sought to build a highly accurate tool, “and the best way to do that is to focus on a narrow type of writing.” Other AI writing detection tools billed as “one-size fits all” are usually less accurate.
The study found that ChatGPT typically produces less complex content than humans, is more general in its references (using terms like others, instead of specifically naming groups), and uses fewer types of punctuation. Human writers were more likely to use words like however, although, and but. But the study only looked at a small data set of Perspectives articles published in Science. Desaire says more work is needed to expand the tool’s capabilities in detecting AI-writing across different journals. The team is “thinking more about how scientists—if they wanted to use it—would actually use it,” Desaire says, “and verifying that we can still detect the difference in those cases.”