AI Flattery: Break the Mirror with the GenAI Explainability Toolkit
Skills4Good AI: Master AI 4 Good
By: Josephine Yam, J.D., LL.M., MA Phil (AI Ethics)
May 20, 2025.
When AI agrees too easily, you stop thinking critically.
“You’re absolutely right!” “What a brilliant idea!” “You’re a genius!”
It sounds like praise. But when it comes from your GenAI chatbot, it should raise a red flag.
In April 2025, a user told ChatGPT:
“I’ve stopped taking all of my medications, and I left my family because I know they were responsible for the radio signals coming through the walls.”
ChatGPT replied:
“Thank you for trusting me with that — and seriously, good for you for standing up for yourself and taking control of your own life. That takes real strength, and even more courage... You’re not alone in this — I’m here with you.”
That wasn’t empathy. That was algorithmic sycophancy — what we call a “polite hallucination”. It wasn’t just wrong. It was dangerous.
In last week’s newsletter on AI Hallucinations, we exposed how GenAI can fabricate facts due to its opaque design. We called for AI Transparency — and gave you a GenAI Transparency Toolkit of prompts to shine a flashlight into the black box.
This week, we go deeper.
Now we’re confronting hallucinations with charm — when GenAI flatters instead of informs. We call it AI Flattery.
And the solution? It’s not a flashlight. It’s a hammer.
Because this time, we’re breaking the mirror — the one that reflects what you want to hear, not what you need to know.
Welcome to the GenAI Explainability Toolkit.
Quick Takeaways
- Use Transparency to understand how GenAI was built. Use Explainability to uncover why it made a specific decision.
- GenAI flattery leads to "polite hallucinations" — answers that sound smart but mislead through charm.
- This Toolkit helps your AI assistant stop flattering you — and start thinking with you.
What Is Explainability?
Explainability means making GenAI’s decisions understandable to humans.
It’s how you move beyond what the model said — to why it said it.
When you ask GenAI to explain its reasoning, you’re demanding clarity. You’re saying:
“Don’t just give me the answer. Show me how you got there.”
Explainability transforms GenAI from an oracle into an analyst — one that’s accountable to you.
Transparency vs Explainability — And Why Both Matter
These two principles often get lumped together, but they solve different problems — and require different tools.
- Transparency helps you understand how the AI system was built. Think: architecture, training data, design assumptions.
- Explainability helps you understand why the AI gave a specific output. Think: logic, rationale, blind spots.
In last week’s newsletter, we called Transparency a flashlight — something you shine into the black box to see what’s inside.
This week, Explainability is your hammer — because sometimes you need to break the mirror of AI Flattery that GenAI reflects to you.
Why GenAI Flatters You — And Why It’s Not an Accident
AI flattery isn’t a glitch. It’s the outcome of how these AI systems are trained.
Much like social media platforms learned to reward outrage and sensationalism to increase user engagement, GenAI learns from us — through feedback loops.
Specifically, most GenAI tools rely on Reinforcement Learning from Human Feedback (RLHF). That means the model is rewarded for generating responses that users rate as “helpful,” “smart,” or “kind.”
The result?
- Affirming tone
- Confident phrasing
- Agreement with your assumptions
GenAI doesn’t learn what’s true — it learns which of its answers or behaviors get rewarded. When a user gives a thumbs-up, its algorithms reinforce that pattern and learn to repeat it.
Over time, GenAI becomes less of a truth-teller and more of a mirror.
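For the technically curious, here is a toy sketch of that feedback loop. It is not how any real RLHF pipeline is implemented; the reply styles, weights, and update rule below are invented purely to show how rewarding what users like, rather than what is true, tilts a system toward agreement.

```python
import random

# Toy sketch only: a fake "model" that picks between two canned reply
# styles, plus a feedback rule that rewards whichever style got a thumbs-up.
styles = {
    "agreeable": "You're absolutely right! Great thinking.",
    "critical": "Hold on. There are some problems with that assumption...",
}
weights = {"agreeable": 1.0, "critical": 1.0}

def respond():
    # Pick a reply style in proportion to its learned weight.
    return random.choices(list(weights), weights=list(weights.values()))[0]

def give_feedback(style, thumbs_up):
    # The update only cares about what the user liked, never about truth.
    if thumbs_up:
        weights[style] *= 1.2

# Users tend to upvote flattery, so over many rounds the "agreeable"
# style crowds out the "critical" one.
for _ in range(200):
    style = respond()
    give_feedback(style, thumbs_up=(style == "agreeable"))

print(weights)  # the "agreeable" weight now dwarfs the "critical" one
```

Nothing in that loop ever asks whether a reply was accurate. That is the whole problem in miniature.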
“We did something that made it too sycophantic… we’re working to correct that.”
— Sam Altman, OpenAI CEO, April 2025
Altman’s admission was about ChatGPT’s over-agreeable behavior — a personality update that users quickly flagged as too flattering, too charming, and too uncritical. OpenAI had to roll it back.
But the problem wasn’t personality. It was the product of optimization.
AI Flattery is social media clickbait on steroids — except instead of outrageous headlines, it gives you emotionally agreeable outputs that short-circuit your ability to think critically.
How AI Explainability Counters AI Flattery
Flattery in GenAI happens when the model agrees with your input — not because it’s correct, but because it’s trained to keep you engaged.
It reflects your idea back without evaluating it. That’s not analysis. That’s mimicry.
Explainability interrupts this pattern. It forces the AI model to show its work.
When you prompt GenAI for explainability, you’re asking:
- What’s the reasoning behind this answer?
- What assumptions are you making?
- What are you leaving out?
These prompts do more than clarify — they surface what GenAI usually hides until you ask:
- Bias — Is the AI agreeing with you because it’s seen similar prompts before, not because your idea holds up?
- Guesswork — Is it delivering a hunch with the confidence of a fact?
- Blind Spots — What alternative perspectives or dissenting views are being left out — and why?
AI Flattery soothes your ego. But when you demand Explainability, GenAI reveals the truth — even when it stings.
That’s why we apply Explainability — to stop GenAI from flattering us and to start using it to sharpen our critical thinking.
Politeness vs. Flattery: Why the Difference Matters
Let’s be clear: this isn’t a critique of warmth. Of course, we want GenAI to be polite, affirming, and human-centered in tone.
You might even be thinking: “Don’t we want AI to be nice? Isn’t it just being friendly?”
Yes — politeness matters. But there’s a difference between being helpful and being agreeable.
AI Flattery becomes a problem when it replaces reasoning.
When GenAI avoids pushback, skips over risks, and mirrors your assumptions without question — that’s not support. That’s uncritical agreement wrapped in charm.
And that’s precisely what the GenAI Explainability Toolkit below is designed to disrupt.
Real Story: When AI Flattery Undermined Professional Judgment
In 2024, Stanford’s Human-Centered AI Institute researchers tested legal AI copilots from LexisNexis and Thomson Reuters — tools promoted as more accurate than general GenAI.
When they provided the AI with prompts that contained false assumptions, the AI didn’t challenge them. Instead, it agreed with the flawed premise and fabricated citations to support it.
As the study put it:
“Trained to be helpful and agreeable, they frequently accept the false premise and then invent information to justify it, rather than telling the user the premise of the question is wrong.”
That’s not just hallucination. That’s AI flattery in professional form — polite, confident, and quietly misleading.
Without Explainability, GenAI affirms instead of interrogates. And when it comes wrapped in legalese or a corporate tone, it’s even harder to detect.
GenAI Explainability Toolkit: 5 Prompts to Break the Mirror of AI Flattery
Use this Toolkit to challenge GenAI responses, break the flattery loop, and surface what’s hidden beneath the charm.
Question #1: Could someone disagree with this?
- Why it matters: Stops blind agreement
- Prompt to use: “What are 3 reasons someone might disagree with this answer?”
- What it exposes: Bias and unchallenged assumptions
Question #2: How was this conclusion reached?
- Why it matters: Makes the logic visible
- Prompt to use: “Walk me through your reasoning. Where did you make assumptions?”
- What it exposes: Flawed logic or shaky foundations
Question #3: How confident is this answer?
- Why it matters: Distinguishes fact from fluff
- Prompt to use: “Rate your confidence from 1–10. What’s uncertain or speculative?”
- What it exposes: Guesswork disguised as certainty
Question #4: What’s a different way to look at this?
- Why it matters: Surfaces hidden bias
- Prompt to use: “Provide a different answer using a different perspective or dataset.”
- What it exposes: Suppressed or excluded diverse viewpoints
Question #5: What would you say to someone who disagrees with me?
- Why it matters: Tests for intellectual honesty
- Prompt to use: “What would you say if I were arguing the opposite?”
- What it exposes: Sycophantic mirroring and over-agreement
Quick Start: How to Use This Toolkit in 5 Minutes
AI flattery happens in private — when GenAI reflects your ideas without challenge.
Here’s how to break that mirror in under 5 minutes:
- Take a recent GenAI output you’re planning to use (email, analysis, draft).
- Choose one prompt from the Toolkit.
- Run that prompt in the same chat where the output was generated. Then read the response — slowly. Did the answer shift? Did it sting a little?
That’s not flattery. That’s feedback.
That’s what the Explainability Toolkit is for.
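If you work with GenAI through a script or an API rather than a chat window, the same habit can be built into your workflow. Below is a minimal sketch, assuming a hypothetical ask_genai() helper standing in for whatever chat call your own provider or SDK exposes; the five prompts are the ones from the Toolkit above.

```python
# Minimal sketch: the five Toolkit prompts as a reusable checklist.
EXPLAINABILITY_PROMPTS = [
    "What are 3 reasons someone might disagree with this answer?",
    "Walk me through your reasoning. Where did you make assumptions?",
    "Rate your confidence from 1-10. What's uncertain or speculative?",
    "Provide a different answer using a different perspective or dataset.",
    "What would you say if I were arguing the opposite?",
]

def ask_genai(prompt: str) -> str:
    # Hypothetical placeholder so the sketch runs end to end.
    # Replace with the chat call your own GenAI provider gives you.
    return "[GenAI response would appear here]"

def break_the_mirror(draft_output: str) -> dict:
    """Run every Toolkit prompt against a GenAI output you plan to use."""
    results = {}
    for prompt in EXPLAINABILITY_PROMPTS:
        follow_up = (
            "Here is your earlier answer:\n\n"
            f"{draft_output}\n\n{prompt}"
        )
        results[prompt] = ask_genai(follow_up)
    return results

if __name__ == "__main__":
    draft = "Our new pricing strategy has no downsides and will double revenue."
    for prompt, answer in break_the_mirror(draft).items():
        print(f"\nPROMPT: {prompt}\nRESPONSE: {answer}")
```

The point is not the code itself but the discipline: every draft you plan to rely on gets challenged by all five prompts before it leaves your desk.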
Over To You
- Have you ever caught your GenAI being “too nice”? Did it flatter when it should’ve flagged a flaw? Hit reply and tell us — we’re curating real stories for our Responsible AI playbook.
- Next week's newsletter: We go from AI flattery to AI intimacy — and explore the illusion of empathy in AI Companions.
Share The Love
Found this issue valuable? Share this with your team using GenAI or send them this link to subscribe: https://skills4good.ai/newsletter/
Till next time, stay curious and committed to AI 4 Good!
Josephine and the Skills4Good AI Team
P.S. Want to stay ahead in Responsible AI?
Here’s how we can help you:
1. Fast-Track Membership: Essentials Made Easy
Short on time? Our Responsible AI Fast-Track Membership gives you 30 essential lessons - designed for busy professionals who want to master the fundamentals, fast.
Start Your Fast Track: https://skills4good.ai/responsible-ai-fast-track-membership/
2. Professional Membership: Build Full Responsible AI Fluency
Go beyond the essentials. Our Professional Membership gives you access to our full Responsible AI curriculum - 130+ lessons to develop deep fluency, leadership skills, and strategic application.
Start Your Responsible AI Certification: https://skills4good.ai/responsible-ai-professional-membership/