Google’s and Microsoft’s chatbots are making up Super Bowl stats


If you needed more proof that GenAI is prone to making things up, Google's Gemini chatbot, formerly Bard, thinks the 2024 Super Bowl has already happened. It even has (fictitious) statistics to back it up.

According to a Reddit thread, Gemini, powered by Google's GenAI models of the same name, is answering questions about Super Bowl LVIII as if the game ended yesterday, or even weeks ago. Like many bookmakers, it seems to favor the Chiefs over the 49ers (sorry, San Francisco fans).

Gemini embellishes quite creatively, in at least one case giving a breakdown of player stats suggesting Kansas City Chiefs quarterback Patrick Mahomes ran for 286 yards with two touchdowns and an interception, versus Brock Purdy's 253 rushing yards and one touchdown.

It's not just Gemini. Microsoft's Copilot chatbot also insists the game is over and provides erroneous citations to back up the claim. But, perhaps reflecting a San Francisco bias, it says it was the 49ers, not the Chiefs, who emerged victorious “with a final score of 24-21.”

Image credits: Kyle Wiggers/TechCrunch

Copilot is powered by a GenAI model similar, if not identical, to the model that underpins OpenAI's ChatGPT (GPT-4). But in my testing, ChatGPT was reluctant to make the same mistake.

Image credits: Kyle Wiggers/TechCrunch

This is all rather silly, and perhaps resolved by now, given that this reporter wasn't able to reproduce the Gemini responses from the Reddit thread. (I'd be shocked if Microsoft weren't working on a fix as well.) But it also illustrates the major limitations of today's GenAI, and the dangers of placing too much trust in it.

GenAI models have no real intelligence. Trained on a vast number of examples, typically sourced from the public web, they learn how likely data (e.g. text) is to occur based on patterns, including the context of any surrounding data.

This probability-based approach works remarkably well at scale. But while the range of words and their probabilities is likely to produce text that makes sense, it's far from certain. LLMs can generate something grammatically correct but nonsensical, like the claim about the Golden Gate. Or they can spout mistruths, propagating inaccuracies in their training data.
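To see why a chatbot can sound confident about a game that hasn't happened, it helps to picture the generation step: the model assigns probabilities to possible next tokens and samples one, with no notion of whether the result is true. Here's a minimal, hypothetical Python sketch of that sampling step; the token probabilities are invented for illustration and don't come from Gemini, Copilot or any real model.

```python
import random

# Toy next-token distribution a model might assign after the prompt
# "The Super Bowl LVIII winner was the ..." -- these numbers are made up
# for illustration, not taken from any actual model.
next_token_probs = {
    "Chiefs": 0.55,   # fluent continuation
    "49ers": 0.40,    # also fluent -- and one of these is simply false
    "Ravens": 0.05,
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Sample a token in proportion to its probability, as LLM decoders do."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

# Every continuation reads naturally; none is checked against whether
# the game has actually been played.
print("The Super Bowl LVIII winner was the", sample_next_token(next_token_probs))
```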

This isn't malicious on the LLMs' part. They have no malice, and the concepts of true and false are meaningless to them. They've simply learned to associate certain words or phrases with certain concepts, even if those associations aren't accurate.

Hence Gemini's and Copilot's Super Bowl falsehoods.

Google and Microsoft, like most GenAI vendors, readily acknowledge that their GenAI applications are not perfect and are in fact prone to making errors. But these acknowledgments come in the form of fine print that I would argue could easily go unnoticed.

The Super Bowl misinformation certainly isn't the most harmful example of GenAI going off the rails. That distinction probably lies with endorsing torture, reinforcing ethnic and racial stereotypes, or writing convincingly about conspiracy theories. It is, however, a useful reminder to double-check the claims of GenAI bots. There's a decent chance they aren't true.


