
The largest study of its kind shows AI assistants get the news wrong 45% of the time, regardless of which language or AI platform is tested. In addition to misrepresenting news content almost half of the time, the AI assistants often had significant sourcing problems, including missing, misleading, or incorrect attributions.
The results come from 22 public service media (PSM) organizations in 18 countries, working in 14 languages, which evaluated more than 3,000 responses to 30 core news questions (plus optional local queries) from ChatGPT, Copilot, Gemini, and Perplexity. The answers were checked for accuracy, sourcing, distinguishing opinion from fact, and providing context.
They found that 45% of all AI answers had at least one significant issue; 31% of responses showed serious sourcing problems (missing, misleading, or incorrect attributions); and 20% contained major accuracy issues, including fabricated details, outdated information, and misleading sourcing.
The study, coordinated by the European Broadcasting Union (EBU) and led by the BBC, is significant as more and more companies attempt to replace longstanding search engines and web browsers with artificial intelligence-powered platforms. As the researchers point out, the Reuters Institute's Digital News Report 2025 showed that 7% of all online news consumers use AI assistants to get their news, a figure that rises to 15% among people under the age of 25.
“We’re excited about AI and how it can help us bring even more value to audiences,” said Peter Archer, BBC Programme Director, Generative AI. “But people must be able to trust what they read, watch and see. Despite some improvements, it’s clear that there are still significant issues with these assistants. We want these tools to succeed and are open to working with AI companies to deliver for audiences and wider society.”
The worst-performing AI assistant in the study was Gemini, which had significant issues in 76% of its responses, more than double the rate of any other assistant tested. The researchers said Gemini's failures were mostly due to poor sourcing. Second on the list was Copilot at 37%, followed by ChatGPT at 36% and Perplexity at 30%.
“This research conclusively shows that these failings are not isolated incidents,” said EBU Media Director and Deputy Director General Jean Philip De Tender. “They are systemic, cross-border, and multilingual, and we believe this endangers public trust. When people don’t know what to trust, they end up trusting nothing at all, and that can deter democratic participation.”
To verify the results of this study, BroBible asked the AI assistant Grok: “A study claims AI assistants get news wrong 45% of the time, is that accurate?”
Grok responded, “Yes, the claim is accurate.”