Researchers: New AI-Detection Tool Can Identify ChatGPT-Generated Scientific Papers
A US radiologist wrote a paper with the help of ChatGPT and published it in a peer-reviewed journal earlier this year.
The piece “ChatGPT and the Future of Medical Writing” was written by Som Biswas of the University of Tennessee Health Science Center in Memphis for the journal Radiology.
He said that he wrote and edited the piece to help people understand how useful the technology is.
Dr. Biswas told The Daily Beast, “I am a researcher, and I write articles regularly.”
“If ChatGPT can be used to write stories and jokes, why not use it for research or publication of serious articles?”
He is now said to have used the chatbot to publish 16 more journal papers in four months.
The Daily Beast also quoted a journal editor who said they had seen a “dramatic uptick” in articles.
Heather Desaire is a chemistry professor at the University of Kansas.
“The story fits with what I know from my own life,” she told the ABC.
“I’m worried that journals will get too many papers and that, as a reviewer for those journals, I’ll be asked to do 10 times as many reviews as I usually do.”
Even though Professor Desaire doesn’t dislike ChatGPT, she thinks it’s important to watch out for unintended effects, and she hopes her latest study will help.
A New AI-detector for Scientific Texts?
In the current issue of the journal Cell Reports Physical Science, Professor Desaire and colleagues report that they have developed a highly accurate approach for identifying ChatGPT-generated writing in scientific papers.
According to Professor Desaire, it might be useful for journal editors who are overrun by submissions created using the chatbot.
She suggested that a detector could help editors prioritize which submissions to send out for review.
The researchers initially established a collection of “telltale signs” that distinguish writing produced by AI from material produced by human scientists in order to construct their tool.
They achieved this by meticulously analyzing 64 “perspective” articles from the journal Science, which are review pieces that discuss recent findings and place them in their wider context. Then they examined 128 articles produced by ChatGPT on related research areas.
By contrasting the two, they were able to identify 20 traits that might be used to determine who wrote a particular scientific text.
Some of these characteristics—which included phrase complexity, sentence length diversity, punctuation use, and vocabulary—appeared to be specific to scientists, the researchers discovered.
For instance, Professor Desaire and her team discovered that while people writing on Twitter might employ punctuation, such as double exclamation points, to indicate emotion, scientists have different language preferences.
Scientists are unique individuals, she said: they use more parentheses and dashes than ChatGPT does, but they don’t use double exclamation points.
Her research also revealed that, unlike ChatGPT, scientists tended to ramble on.
Professor Desaire observed that “the difference in paragraph length really jumps out at you.”
She continued by saying that extremely short and very long sentences were more common in human writing.
The use of “equivocal language” — terms like “however,” “although,” and “but” — was another feature of human-generated scientific material.
Additionally, scientists used twice as many capital letters, question marks, and semicolons as ChatGPT.
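The stylometric features described above can be sketched in code. This is a hypothetical illustration, not the researchers’ actual implementation: the function name, the naive sentence splitting, and the choice of which features to compute are all assumptions for the sake of the example.

```python
import re
import statistics

# Words the article describes as "equivocal language".
EQUIVOCAL = {"however", "although", "but"}

def extract_features(text: str) -> dict:
    """Compute a few text features of the kind the researchers describe."""
    # Naive sentence split on ., !, ? -- good enough for a sketch.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "mean_sentence_len": statistics.mean(lengths),
        # Sentence-length diversity: human writing mixes very short
        # and very long sentences more than ChatGPT does.
        "sentence_len_stdev": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        "parentheses": text.count("(") + text.count(")"),
        "dashes": text.count("-"),  # also counts hyphens in this sketch
        "question_marks": text.count("?"),
        "semicolons": text.count(";"),
        "capital_letters": sum(c.isupper() for c in text),
        "equivocal_words": sum(w in EQUIVOCAL for w in words),
    }
```

A real detector would compute many such features per article and feed them into a classifier, as described in the next section.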
Training the AI-detector
The 20 features were then used by the researchers to train the off-the-shelf machine learning algorithm XGBoost.
Professor Desaire and her colleagues regularly use the algorithm, known in the field as a “classifier” because it decides between two possibilities, in their own work finding biomarkers for diseases such as Alzheimer’s.
They used a sample of 180 articles to test how well their AI-detector worked. They found that it was very good at figuring out whether a scientific article was written by ChatGPT or a real scientist.
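The training step can be sketched as follows. This is a minimal toy illustration, not the paper’s code: the feature matrix is synthetic random data standing in for the 20 extracted features, and scikit-learn’s GradientBoostingClassifier is used as a freely available stand-in for XGBoost (both are gradient-boosted tree classifiers).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n_features = 20  # one column per stylometric feature

# Synthetic stand-in data: "human" articles drawn from one feature
# distribution, "ChatGPT" articles from a shifted one.
human = rng.normal(1.0, 1.0, size=(60, n_features))      # label 1
chatgpt = rng.normal(-1.0, 1.0, size=(60, n_features))   # label 0
X = np.vstack([human, chatgpt])
y = np.array([1] * 60 + [0] * 60)

# Fit a gradient-boosted tree classifier on the labelled feature vectors.
clf = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)
clf.fit(X, y)

# On this cleanly separated toy data the model fits the training set
# almost perfectly; a real evaluation would use held-out articles,
# as the researchers did with their 180-article test sample.
train_acc = clf.score(X, y)
```

The two-way decision the classifier makes here (human vs. ChatGPT) is the same style of decision the team uses in its biomarker work.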
“The method is more than 99.99 percent accurate,” said Professor Desaire. She added that it outperformed existing tools because it was trained specifically on scientific writing, rather than on a broad range of text types.
She said that it could be used for other things, like finding plagiarism among students, as long as it was trained on the right words used by that group.
“You could change it into any domain you wanted by thinking about what features would be useful.”
Will it Work in the Real World?
Researchers who weren’t part of the study said the comparison between purely human-written and purely AI-generated texts was artificial.
“It’s a made-up difference,” said Vitomir Kovanovic, who builds machine learning and AI models at the Centre for Change and Complexity in Learning (C3L) at the University of South Australia.
He said that when scientists use ChatGPT, the result is usually a human-machine collaboration; a scientist might, for example, edit the text the AI produces.
This is good because ChatGPT sometimes gets things wrong, and one study found that it can even make up references.
Dr. Kovanovic said the researchers’ success rate was inflated because they compared 100 percent AI text with 100 percent human text rather than such hybrid texts.
Real-world accuracy may be lower, agreed Lingqiao Liu of the Australian Institute for Machine Learning at the University of Adelaide, resulting in more incorrect classifications than anticipated.
Dr. Liu, who develops algorithms to detect AI-generated images, said that while the approach was methodologically sound, there was a risk in deploying it in practice.
The approach’s potential scope would need to be demonstrated in research with larger samples, according to Professor Desaire and colleagues.
However, Professor Desaire said follow-up research had shown the tool remained useful on texts produced by human-ChatGPT collaboration.
“We can still forecast the difference with a high degree of accuracy.”
AI ‘Arms Race’
But, as Dr. Liu pointed out, ChatGPT could be prompted to write in a particular style, potentially getting a text written entirely by AI past a detector.
And publishing the features that distinguish human writing from machine writing would only make such evasion easier.
Some people even talk about an “arms race” between those who want to make computer-generated text more human-like and those who want to catch people who would use this for bad reasons.
Dr. Kovanovic thinks this is a “pointless race to have” given how fast the technology is moving and how good it could be. He says that AI recognition “misses the point.”
“I think it’s much better to focus on how to use AI in a useful way.”
He also said that using anti-plagiarism software to score university students on how likely it was that their work was written by AI put them under too much stress.
He said, “It’s hard to believe that score.”
Kane Murdoch, who looks into wrongdoing at Macquarie University, said that anti-ChatGPT software is sometimes like a “black box” in terms of how it works.
He said that some AI-detection systems disclose less about their methods than Professor Desaire’s work does.
“It’s not clear how they came up with these numbers,” he said.
“We could just work on getting better at judging.”
Mr. Murdoch also wonders whether AI detection in fields like science could scare people away from using AI in an “ethical” way that could aid important science communication.
“Someone might not be a great writer, but they could be a great scientist.”
Dr. Liu said that it was important to keep researching AI detection, even though it was hard, and that the study by Professor Desaire and her colleagues was “a good starting point” for assessing scientific writing.
A spokesperson for the journal Science said that the journal recently changed its editorial rules to say that text made by any AI tool can’t be used in a scientific paper.
They said that there might be “acceptable uses” of AI-made tools in scientific papers in the future, but that the journal needed to know more about what uses the scientific community thought were “allowed.”
“A tool that could accurately tell if submissions were made by ChatGPT or not could be a useful addition to our strict peer review processes if it had a track record of being accurate,” they said.