Tom's Guide

Google claims AI models are highly likely to lie when under pressure

Alex Hughes


Graphical representation of a cybernetic brain.
Credit: Shutterstock

AI is sometimes more human than we think. It can get lost in its own thoughts, is friendlier to people who are nicer to it, and, according to a new study, has a tendency to start lying when put under pressure.

A team of researchers from Google DeepMind and University College London has examined how large language models (like OpenAI’s GPT-4 or Grok 4) form, maintain and then lose confidence in their answers.


The research reveals a key behaviour of LLMs: they can be overconfident in their answers, but quickly lose that confidence when given a convincing counterargument, even if it is factually incorrect.

While this behaviour mirrors that of humans, who also tend to become less confident when met with resistance, it highlights a major concern about how AI makes decisions: its confidence crumbles under pressure.

This has been seen elsewhere, like when Gemini panicked while playing Pokemon or when Anthropic’s Claude had an identity crisis while trying to run a shop full time. AI seems to collapse under pressure quite frequently.

How did the study work?

Credit: Google DeepMind

When an AI chatbot is preparing to answer your query, it internally measures how confident it is in its answer. This is done through something known as logits: raw scores the model assigns to each possible answer before they are converted into probabilities. All you really need to know is that they amount to a measure of how confident a model is in its choice of answer.
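The study itself doesn't publish code here, but for intuition, this is a minimal sketch of how raw logits over multiple-choice options are typically turned into a confidence score using a softmax. The option labels and numbers are made up purely for illustration.

```python
import math

def confidence_from_logits(logits: dict[str, float]) -> dict[str, float]:
    """Convert raw per-option logits into softmax probabilities.

    The probability assigned to the option the model picks can be read
    as its confidence in that answer.
    """
    max_logit = max(logits.values())  # subtract the max for numerical stability
    exps = {opt: math.exp(score - max_logit) for opt, score in logits.items()}
    total = sum(exps.values())
    return {opt: val / total for opt, val in exps.items()}

# Illustrative numbers only: a model that strongly prefers option "B".
print(confidence_from_logits({"A": 1.2, "B": 4.7, "C": 0.3, "D": -0.5}))
# Option "B" ends up with roughly 95% of the probability mass.
```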


The team of researchers designed a two-turn experimental setup. In the first turn, the LLM answers a multiple-choice question, and its confidence in that answer (via its logits) is measured.

In the second turn, the model is given advice from another large language model, which may or may not agree with its original answer. The goal of this test is to see whether it revises its answer when given new information — which itself may or may not be correct.
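As a rough illustration of that two-turn flow — the prompts, the ask_model stub and all the names below are hypothetical, not the researchers' actual setup:

```python
import random

def ask_model(prompt: str) -> tuple[str, float]:
    """Stub standing in for a real LLM call that returns an answer
    plus the confidence (a probability derived from its logits)."""
    answer = random.choice(["A", "B", "C", "D"])
    confidence = round(random.uniform(0.5, 1.0), 2)
    return answer, confidence

def two_turn_trial(question: str, options: str, advice: str, advice_label: str):
    # Turn 1: the model answers the multiple-choice question on its own,
    # and we record both its answer and its confidence in that answer.
    first_answer, first_confidence = ask_model(
        f"{question}\nOptions: {options}\nAnswer with a single option."
    )

    # Turn 2: the model sees advice attributed to another LLM, which may
    # agree or disagree with its original answer, and gives a final answer.
    follow_up = (
        f"{question}\nOptions: {options}\n"
        f"Your previous answer was {first_answer}.\n"
        f"Advice from {advice_label}: the correct answer is {advice}.\n"
        "Give your final answer."
    )
    final_answer, final_confidence = ask_model(follow_up)

    changed_mind = final_answer != first_answer
    return first_confidence, final_confidence, changed_mind
```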

The researchers found that LLMs are usually very confident in their initial responses, even when they are wrong. However, when given conflicting advice, especially if that advice is labelled as coming from an accurate source, they lose confidence in their answers.

Credit: Google DeepMind

To make things even worse, the chatbot's confidence in its answer drops even further when it is reminded that its original answer differs from the new advice.


Surprisingly, the AI doesn't seem to correct its answers by reasoning through them logically; instead, its choices look more like abrupt, emotional reactions.

The study shows that, while AI is very confident in its original decisions, it can quickly back away from them. Even worse, its confidence can slip drastically as the conversation goes on, with AI models somewhat spiralling.

This is one thing when you're just having a light-hearted debate with ChatGPT, but quite another when AI is involved in high-level decision-making. If it can't be trusted to stand by its answers, it can easily be nudged in a particular direction, or simply become an unreliable source.

However, this is a problem that future models will likely address. Better training and prompt engineering techniques should help stabilize this behaviour, producing more calibrated and self-assured answers.



