'We ran out of internet data to train AI models last year', says Elon Musk
LLMs need to be continuously fed more data, but what happens when all the information on the internet has already been used to train AI? As far-fetched as it might sound, this is a real concern experts have voiced, and according to Elon Musk, that day has already come.
Artificial intelligence (AI) is still the buzzword in the tech industry. From AI startups like OpenAI, Anthropic, and xAI to tech giants like Google, Microsoft, Facebook, and Amazon, all are investing billions into developing new AI tools.
How LLMs are trained
Current development centres on large language models (LLMs), the AI models that power chatbots like ChatGPT. LLMs are fed vast amounts of data as training material, and the chatbots generate their responses to user queries based on what the models have learned.
For LLMs to improve, they need to be continuously fed more data, but what happens when all the information on the internet has already been used to train AI? As far-fetched as it might sound, this is a real concern experts have voiced, and according to Elon Musk, that day has already come.
No more data left to train AI: Musk
Musk, who co-founded OpenAI and now runs his own AI startup, xAI, said that the world ran out of data to train AI models in 2024.
“We’ve now exhausted basically the cumulative sum of human knowledge … in AI training. That happened basically last year,” Musk said during a live-streamed conversation with Stagwell chairman Mark Penn on X on Wednesday.
Musk is not alone
Musk is not the first to make this claim. Last month, Ilya Sutskever, the former chief scientist of OpenAI, said that the industry had reached “peak data and there’ll be no more” for training AI models.
What is next for AI
Experts have said that, going forward, AI models will be trained on synthetic data, which is generated by algorithms and simulations rather than collected from the real world.
Musk also agreed that synthetic data is the way forward.
“The only way to supplement [real-world data] is with synthetic data, where the AI creates [training data]. With synthetic data … [AI] will sort of grade itself and go through this process of self-learning,” he said.