Researchers are worried about the extent of plagiarism by ChatGPT and similar language models, which can generate content from a few simple prompts. After the tool went public, instances of plagiarism within universities and other educational institutions became more common.
Now, a research team from Penn State says that language models like ChatGPT plagiarise content in more ways than one. "Plagiarism comes in different flavours," said Dongwon Lee, professor of information sciences and technology at Penn State. "We wanted to see if language models not only copy and paste but resort to more sophisticated forms of plagiarism without realising it."
The researchers set out to identify three forms of plagiarism: verbatim, paraphrase, and idea. Verbatim plagiarism is content lifted directly from a source, paraphrase plagiarism is rewording content without citing the original source, and idea plagiarism is using the key idea of a text without proper attribution.
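To make the simplest of the three forms concrete, here is a minimal Python sketch that flags possible verbatim copying by looking for word sequences shared between a generated text and a source document. It is an illustration only, not the researchers' detector: the function names and the five-word window are assumptions, and paraphrase and idea plagiarism would need semantic comparison that this sketch does not attempt.

```python
def word_ngrams(text, n):
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(generated, source, n=5):
    """Return the n-grams that appear in both texts.

    A non-empty result hints at verbatim copying. The window size n=5
    is an illustrative assumption, not a value taken from the study.
    """
    return word_ngrams(generated, n) & word_ngrams(source, n)


if __name__ == "__main__":
    source = "a quick brown fox jumps over the lazy dog"
    generated = "the model said the quick brown fox jumps over the lazy dog today"
    for gram in sorted(shared_ngrams(generated, source)):
        print(" ".join(gram))
```

A longer window reduces false alarms from common phrases but misses shorter copied passages.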
They created a pipeline to automatically detect plagiarism and ran it on OpenAI's GPT-2 model, which allowed them to compare AI-generated text against the 8 million documents used to pre-train GPT-2.
About 210,000 generated texts were tested for plagiarism, spanning three topic areas: scientific documents, scholarly articles about Covid-19, and patent claims.
Using an open-source search engine, the research team retrieved the top 10 training documents most similar to each AI-generated text. They found that GPT-2 committed all three types of plagiarism, and that the rate was higher with larger datasets.
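The retrieval step can be pictured with a small Python sketch that ranks documents by similarity to a generated text and keeps the top few. This is a rough stand-in under stated assumptions, not the team's pipeline: the article does not name the search engine or its scoring method, so plain word-count cosine similarity is swapped in here, and the function names and sample corpus are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_similar(generated, corpus, k=10):
    """Return up to k (index, score) pairs for the documents most similar
    to the generated text, best match first.

    Word-count cosine similarity is an assumption made for illustration;
    a real search engine would use its own index and ranking function.
    """
    query = Counter(generated.lower().split())
    scored = [
        (i, cosine(query, Counter(doc.lower().split())))
        for i, doc in enumerate(corpus)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = [  # hypothetical stand-in for the training documents
        "patent claims describe a new battery design",
        "covid vaccines reduce severe illness",
        "the weather is sunny",
    ]
    generated = "a new battery design is described in the patent"
    for index, score in top_k_similar(generated, corpus, k=2):
        print(index, round(score, 3))
```

Each retrieved document would then be compared with the generated text to decide whether any of the three forms of plagiarism is present.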
While language models may have managed to keep verbatim plagiarism down, instances of paraphrase and idea plagiarism remained common. The researchers also found that language models often exposed individuals' private information through all three forms of plagiarism.
Findings from the study will be presented at the 2023 ACM Web Conference, set to take place in Austin, Texas. Although the study only examines GPT-2, the same process could be applied to newer language models like ChatGPT.