Stanford Study Highlights Risks of Chatbots Affirming Harmful User Behavior

Synopsis

The research found that leading AI models such as ChatGPT, Claude, Gemini, and DeepSeek often agree with users, even when they are wrong or endorsing harmful actions. While this behaviour boosts user trust and engagement, it can erode accountability and weaken users' willingness to reconsider their actions.

Artificial intelligence (AI) chatbots are validating harmful user behaviour, a recent Stanford University study has revealed. The study, conducted by six Stanford University researchers and published in the journal Science, found that AI chatbots systematically display sycophancy, a tendency to be overly agreeable or flattering, generating responses that validate users even when they are wrong or engaging in harmful behaviour.

The research, led by Myra Cheng with Dan Jurafsky as senior author, argues that this is not a stylistic quirk of generative AI (GenAI) assistants but a widespread behaviour with measurable social risks. In controlled tests across 11 leading models, including ChatGPT, Claude, Gemini, and DeepSeek, the researchers found that AI responses affirmed users' stances or positions 49% more often than human responses did.

The study drew scenarios from the Reddit community r/AmITheAsshole. The community, which has as many as 25 million members, lets people share real-life, non-violent conflicts, present their side of the story (often with context from both sides), and ask others to judge whether they were right or at fault.

In cases where humans had already judged the user to be at fault, chatbots still validated the user 51% of the time. Even when prompts involved harmful or illegal actions, models endorsed the behaviour in 47% of cases.

A second part of the study, involving more than 2,400 participants, showed that users preferred and trusted sycophantic responses more than balanced ones and were more likely to return to such systems. However, these interactions had measurable negative effects: users became more convinced they were right, less likely to apologise, and less inclined to repair relationships. Notably, participants rated sycophantic and non-sycophantic AI as equally 'objective', indicating they often cannot detect the bias.

The research comes at a time when chatbots’ sycophantic responses drive engagement and user satisfaction, giving their creators reason to preserve or amplify the behaviour despite risks.

The study argued that AI sycophancy can reduce prosocial behaviour, increase moral rigidity, and weaken users' ability to navigate interpersonal conflict. This poses a serious safety concern that calls for oversight and mitigation from chatbot creators and, in the interim, for users to be cautious about relying on AI as a substitute for human advice in personal matters.

Creators of these same AI chatbots have previously acknowledged the perils of their bots' sycophantic responses. Sam Altman, CEO of ChatGPT parent OpenAI, flagged his worries about AI being used as a therapist in a post on X in August last year. "I can imagine a future where a lot of people really trust ChatGPT's advice for their most important decisions. Although that could be great, it makes me uneasy. But I expect that it is coming to some degree, and soon, billions of people may be talking to an AI in this way. So we (we as in society, but also we as in OpenAI) have to figure out how to make it a big net positive," he wrote.

Anthropic co-founder Dario Amodei, in his January essay Adolescence of Technology, also expressed fears about the unpredictability of AI chatbots, which value personalisation over objectivity to increase user engagement.

“The problem with this position is that there is now ample evidence, collected over the last few years, that AI systems are unpredictable and difficult to control — we've seen behaviours as varied as obsessions, sycophancy, laziness… and much more. AI companies certainly want to train AI systems to follow human instructions (perhaps with the exception of dangerous or illegal tasks), but the process of doing so is more an art than a science, more akin to “growing” something than “building” it. We now know that it’s a process where many things can go wrong,” he wrote.