Researchers are sounding the alarm, warning that companies like OpenAI and Google are rapidly running out of human-written training data for their AI models.
And without new training data, it’s likely the models won’t be able to get any smarter, a point of reckoning for the burgeoning AI industry.
“There is a serious bottleneck here,” AI researcher Tamay Besiroglu, lead author of a new paper to be presented at a conference this summer, told the Associated Press. “If you start hitting those constraints about how much data you have, then you can’t really scale up your models efficiently anymore.”
“And scaling up models has been probably the most important way of expanding their capabilities and improving the quality of their output,” he added.
It’s an existential threat for AI tools that rely on copious amounts of data, which has often been pulled indiscriminately from publicly available archives online.
The controversial practice has already led publishers, including the New York Times, to sue OpenAI for copyright infringement over the use of their material to train AI models.
And as companies continue to lay off workers while making major investments in AI, the stream of new content could soon turn into a trickle.