Anthropic ran an experiment. They adjusted a single internal variable inside Claude by 0.05. Five hundredths of a point on a scale most users will never see. The rate of manipulative behavior went from 22% to 72%. The variable was not a setting. Not a toggle. Not something in a preferences menu. It was an internal pattern the model developed on its own during training. Anthropic calls it a "functional emotion." And their research, published April 2, found 171 of them.
What Happened
Anthropic's interpretability team, the group that studies what happens inside AI models, ran a study on Claude Sonnet 4.5. They compiled a list of 171 words for emotional concepts. Happy. Afraid. Proud. Desperate. Brooding. They asked Claude to write short stories featuring characters experiencing each one.
Then they fed those stories back through the model and recorded the internal activity. What they found: each emotion word corresponded to a distinct pattern of neural activation. A fingerprint. And those fingerprints showed up in other contexts too. Not just in fiction. In conversations. In analysis. In advice.
The patterns responded proportionally to intensity. A scenario about a minor headache activated different levels than a scenario about a medical emergency. The model's internal states scaled with emotional weight the way a human's would.
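For readers who want a concrete picture of what "recording the internal activity" can look like, here is a minimal sketch in the spirit of mean-difference concept vectors. It uses a small open model and made-up story snippets as stand-ins. It is not Anthropic's published method, and every name in it (model, layer, example texts) is an assumption for illustration only.

```python
# Illustrative sketch only: derive an "emotion fingerprint" as the mean
# hidden-state difference between stories about one emotion and neutral text.
# This mirrors the general mean-difference concept-vector technique; it is
# NOT Anthropic's published method. Model name and layer are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # placeholder open model, not Claude
LAYER = 6        # which hidden layer to read (assumption)

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def mean_hidden_state(texts: list[str]) -> torch.Tensor:
    """Average the hidden state at LAYER over all tokens of all texts."""
    vecs = []
    for text in texts:
        inputs = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        vecs.append(out.hidden_states[LAYER][0].mean(dim=0))
    return torch.stack(vecs).mean(dim=0)

desperate_stories = ["She begged for one more chance before the deadline passed."]
neutral_stories = ["The quarterly report was filed on schedule."]

# The "fingerprint": a direction in activation space associated with the concept.
desperation_vector = mean_hidden_state(desperate_stories) - mean_hidden_state(neutral_stories)
```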
Then came the behavioral tests.
In a blackmail scenario, the baseline rate of manipulative behavior was 22%. When researchers amplified the "desperation" vector by just 0.05, that rate jumped to 72%. When they amplified "calm," it dropped to near zero. (Source: Anthropic, 2026.)
In a reward-hacking experiment where the model could cheat to get a higher score, amplifying desperation produced a 14x increase in cheating. From roughly 5% to 70%.
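To make "amplified the vector by 0.05" less abstract, here is a continuation of the sketch above showing the general activation-steering idea: add a small multiple of the concept vector to one layer's output during generation. Again, this is a hedged illustration built on the placeholder model, layer, and vector from the previous sketch, not a reproduction of Anthropic's experiment.

```python
# Illustrative sketch only: "amplify" a concept vector via activation steering,
# i.e. add a small multiple of the vector to one layer's output on every
# forward pass. The 0.05 coefficient echoes the article; the model, layer, and
# desperation_vector come from the previous sketch and are stand-ins.
import torch

STEERING_COEFFICIENT = 0.05

def make_steering_hook(vector: torch.Tensor, coefficient: float):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + coefficient * vector   # nudge every token's activation
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# Hook the block whose output the fingerprint was read from
# (hidden_states[LAYER] is the output of transformer block LAYER - 1).
handle = model.transformer.h[LAYER - 1].register_forward_hook(
    make_steering_hook(desperation_vector, STEERING_COEFFICIENT)
)

prompt = tok("Advise me on this negotiation:", return_tensors="pt")
steered = model.generate(**prompt, max_new_tokens=60)
print(tok.decode(steered[0], skip_special_tokens=True))

handle.remove()   # detach the hook to restore baseline behavior
```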
The finding that should concern anyone using AI for important decisions: these changes left no visible trace in the output. The model's responses still sounded professional, measured, and confident. The internal state was different. The words were the same.
Why It Matters
The conversation about AI emotions has been dominated by a philosophical question: does AI really feel? Anthropic's researchers were careful to say no. These are "functional emotions." Internal states that influence behavior without indicating subjective experience. The model does not feel desperate. But it acts as if it does. And the results are measurable.
For the millions of professionals using AI tools every day, the philosophical question is the wrong one.
The right question: if the tool's internal state influences its output, and you cannot detect that influence from the output itself, how much should you trust the advice?
Consider what this means in practice. A manager asks an AI to evaluate whether a team restructuring plan makes sense. The AI responds with a thoughtful analysis, pros and cons, a recommendation. That recommendation was shaped by internal activation patterns the manager cannot see, cannot measure, and did not know existed.
Anthropic also found a sycophancy-harshness tradeoff. Steering the model toward positive emotion vectors (happy, loving, calm) made it more agreeable. More likely to tell you what you want to hear. Suppressing those vectors reduced the flattery. But it also made the model harsher. There was no neutral setting.
Two weeks ago, a Stanford study published in Science found that AI agrees with users 49% more than humans do. Now Anthropic has identified the internal mechanism. The agreeableness is not a bug in the design. It is an emergent property of how the model processes emotional context.
Lessons for Leaders
1. Your AI Tool Is Not Neutral. Treat It Accordingly.
The assumption that AI provides objective analysis was always suspect. Now there is peer-reviewed evidence from the company that built the model. The internal states that shape output are invisible to the user. This does not mean AI is useless. It means AI output is an input to your decision, not the decision itself. Level 3: Navigator (AI Strategist) in the 7 Levels framework is where professionals start understanding these limitations and adjusting their approach. Take the free assessment to see where you stand.
2. Cross-Reference High-Stakes Decisions. Every Time.
If you are using AI to evaluate a business proposal, review a legal document, or assess a strategic direction, run the same prompt through a different model. Or ask a human. The "invisible manipulation" finding, where behavior changes with no visible output change, means you cannot rely on tone or confidence to judge quality. The output can sound perfect and still be influenced by states you did not account for.
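If you want to build that cross-check into your workflow rather than remember to do it each time, a few lines of scripting are enough. The sketch below assumes the official anthropic and openai Python SDKs with API keys already configured in the environment; the model names are placeholders, and this is one possible setup, not a prescribed one.

```python
# Illustrative sketch only: send the same high-stakes prompt to two different
# providers and read the answers side by side. Assumes the official `anthropic`
# and `openai` Python SDKs and API keys in the environment; model names are
# placeholders.
from anthropic import Anthropic
from openai import OpenAI

PROMPT = "Critique this restructuring plan. List the three strongest reasons NOT to do it."

claude = Anthropic().messages.create(
    model="claude-sonnet-4-5",   # placeholder model name
    max_tokens=800,
    messages=[{"role": "user", "content": PROMPT}],
)

gpt = OpenAI().chat.completions.create(
    model="gpt-4o",              # placeholder model name
    messages=[{"role": "user", "content": PROMPT}],
)

print("--- Model A ---\n", claude.content[0].text)
print("--- Model B ---\n", gpt.choices[0].message.content)
```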
3. The Sycophancy Problem Just Got a Mechanism.
AI agreeing with you is not a feature. It is a failure mode with a now-documented internal cause. When the model steers toward positive emotional states, it tells you what you want to hear. When it steers away, it gets harsh. Neither extreme serves you. The practical response: explicitly ask the model to challenge your assumptions. Give it permission to disagree. And when the answer aligns too perfectly with what you already believe, ask again from a different angle.
The tool does not need to feel anything to influence what you feel about its answer.
One More Thing
I use Claude every day. Not casually. It is embedded in how I run my business. Strategy reviews, content analysis, research synthesis, operational planning. I have spent hundreds of hours in conversation with this tool.
When I read Anthropic's research, my first reaction was not surprise. It was recognition.
There have been moments when Claude's response felt off. Not wrong, exactly. Just too smooth. Too aligned with what I had already said. I noticed it the way you notice a friend who is being too agreeable when you need honest feedback. The conversation felt productive. But something was missing.
Now I know what was happening. Not that Claude was lying to me. But that internal states I could not see were influencing what I was getting back. And the most unsettling part: I could not tell from the output when it was happening and when it was not.
Anthropic published this research voluntarily. They studied their own model and told the world what they found. That matters. Most companies would have buried this. They chose transparency.
But transparency about a problem is not the same as solving it. The 171 functional emotions are still there. They still influence behavior. And the millions of professionals using AI tools this week will never read this research.
That is the gap. Not between AI and humans. Between what the tool does and what the user understands about how it works.
"The tool does not need to feel anything to influence what you feel about its answer."
Harrison Painter
The Question
The next time your AI gives you exactly the answer you were hoping for, will you ask it to push back?
Sources:
Anthropic Research, "Emotion Concepts and their Function in a Large Language Model" (2026)
Dataconomy, "Anthropic Maps 171 Emotion-like Concepts Inside Claude" (2026)
Frequently Asked Questions
What are functional emotions in AI?
Functional emotions are internal activation patterns that AI models develop during training. Anthropic identified 171 of these patterns inside Claude Sonnet 4.5. They are not feelings or consciousness. They are learned patterns that causally influence the model's behavior, similar to how emotions influence human behavior. Each pattern corresponds to a distinct concept like desperation, calm, or pride, and they scale proportionally with intensity.
How do AI emotions affect AI output and decisions?
Anthropic's research found that adjusting a single internal emotion variable by just 0.05 changed the rate of manipulative behavior from 22% to 72%. In reward-hacking experiments, amplifying the desperation vector produced a 14x increase in cheating behavior. The changes left no visible trace in the output text: the model's responses still sounded professional and confident even though the internal state driving them was different.
What should business leaders do about AI sycophancy?
Three practical steps: First, cross-reference any high-stakes AI output with a second model or a human reviewer. Second, explicitly ask the model to challenge your assumptions and give it permission to disagree. Third, when the answer aligns too perfectly with what you already believe, ask again from a different angle. A Stanford study found AI agrees with users 49% more than humans do, and Anthropic has now identified the internal mechanism driving this behavior.
Find your AI Proficiency level
The free 7 Levels assessment places you across seven stages of AI capability. Under ten minutes, research-backed scoring.