If you needed more proof that GenAI is prone to making things up, Google's Gemini chatbot, formerly Bard, thinks the 2024 Super Bowl has already happened. It even has (fictitious) statistics to back this up.
As a Reddit thread shows, Gemini, powered by Google's GenAI models of the same name, answers questions about Super Bowl LVIII as if the game ended yesterday, or weeks ago. Like many bookmakers, it seems to favor the Chiefs over the 49ers (sorry, San Francisco fans).
Gemini embellishes quite creatively, in at least one case giving a breakdown of player statistics suggesting that Kansas City Chiefs quarterback Patrick Mahomes ran for 286 yards with two touchdowns and an interception, versus 253 rushing yards and one touchdown for Brock Purdy.
It's not just Gemini. Microsoft's Copilot chatbot likewise insists the game is over, and supplies erroneous citations to back up the claim. But, perhaps reflecting a San Francisco bias, it says it was the 49ers, not the Chiefs, who emerged victorious "with a final score of 24-21."
Copilot is powered by a GenAI model similar, if not identical, to the one underpinning OpenAI's ChatGPT (GPT-4). But in my testing, ChatGPT was reluctant to make the same mistake.
This is all rather silly, and perhaps fixed by now, given that this reporter was unable to reproduce the Gemini responses from the Reddit thread. (I'd be shocked if Microsoft weren't working on a fix as well.) But it also illustrates the major limitations of today's GenAI, and the danger of placing too much trust in it.
GenAI models have no real intelligence. Trained on a vast number of examples, usually sourced from the public web, AI models learn how likely data (e.g. text) is to occur based on patterns, including the context of any surrounding data.
This probability-based approach works remarkably well at scale. But while the range of words and their probabilities is likely to produce text that makes sense, it is far from certain to. LLMs can generate something grammatically correct but nonsensical, like the statement about the Golden Gate. Or they may spout untruths, propagating inaccuracies in their training data.
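To make that intuition concrete, here is a deliberately tiny, hypothetical sketch (not a real LLM, and far simpler than the transformer models behind Gemini or ChatGPT) of what "learning the likelihood of the next word" looks like. The vocabulary and probability table are invented for illustration; the point is that the generator samples whatever sequence the probabilities favor, with no notion of whether the resulting claim is true.

```python
import random

# Hypothetical "learned" model: for each word, the probability of each
# possible next word, as might be estimated from example text.
next_word_probs = {
    "the":    {"chiefs": 0.5, "49ers": 0.3, "score": 0.2},
    "chiefs": {"won": 0.7, "lost": 0.3},
    "49ers":  {"won": 0.4, "lost": 0.6},
    "won":    {"24-21": 1.0},
    "lost":   {"24-21": 1.0},
}

def generate(start, length, seed=None):
    """Sample a continuation word by word from the probability table."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length):
        probs = next_word_probs.get(words[-1])
        if probs is None:  # no continuation learned for this word
            break
        choices, weights = zip(*probs.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

# Fluent-looking output either way, whether or not the game was played:
print(generate("the", 3, seed=0))
```

Run it with different seeds and the model confidently "reports" different outcomes; nothing in the sampling step checks the output against reality, which is exactly why a chatbot can narrate a Super Bowl that never happened.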
The Super Bowl misinformation is certainly not the most harmful example of GenAI going off the rails. That distinction probably lies with endorsing torture, reinforcing ethnic and racial stereotypes, or writing convincingly about conspiracy theories. It is, however, a useful reminder to double-check the claims of GenAI bots. There's a decent chance they aren't true.