OpenAI Announces GPTbot That Collects Data To Train AI: What About Consent?
While GPTbot could help make AI systems more accurate and contextual, it may also have been accessing content without consent so that OpenAI can build its own product
ChatGPT's parent company OpenAI recently announced GPTbot, a web crawler that scrapes publicly available information from the internet. OpenAI could use this information to train the next iteration of its generative AI, GPT-5.
While GPTbot could help make AI systems more accurate and contextual, it may also have been accessing content without consent so that OpenAI can build its own product. Essentially, any free content online could be scraped by GPTbot, a web crawler (also known as a spider), to train OpenAI's generative AI.
Are web crawlers acceptable?
The company also announced a way to block GPTbot from accessing your website and content through a common protocol known as robots.txt. While many creators have already implemented this, it's unclear how long GPTbot has been active.
It is easy to see why many creators and creative organisations have trouble trusting such bots. CCBot is a similar web crawler run by Common Crawl, an organisation that is a major supplier of training data for AI models. Even if you block this bot now, chances are that it has already collected your data.
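As a rough illustration of how such a block works, the sketch below uses Python's standard `urllib.robotparser` module to check a sample robots.txt against a crawler's user-agent name. It assumes the crawler identifies itself with the user-agent token `GPTBot`; the `example.com` URL and the `is_allowed` helper are made up for this example, and a real site would simply place the two rule lines in its own robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content: tells the crawler named "GPTBot"
# (assumed user-agent token) to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Note that robots.txt is only a request: it works when the crawler chooses to honour it, and it does nothing about content that was collected before the rule was added.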
So what do creators want? They want such AI bots to be "opt-in" instead of "opt-out," as an editor told Business Insider. Essentially, OpenAI and similar companies should ask for permission first instead of informing companies afterwards that their data may have been used. Fair, right?
Finally, after soaking up all your copyrighted content to build their proprietary product, OpenAI gives you a way to prevent your content from being used to further improve their product. #seo #gptbot #content pic.twitter.com/Ui4l5hXYWH
- Prasad Dhumal (@prasaddhumal_), August 7, 2023
Even so, OpenAI says GPTbot respects some boundaries: it filters out sources that are behind a paywall and sources that collect users' personal information.
While such bots can benefit AI algorithms and their creators, people working in creative industries are unlikely to see much benefit in the near future. Instead, using their data and content for training without consent sets the wrong tone for future AI models.