AI makes things up because of how it is built. A large language model generates the most statistically likely next words, not verified facts, so it can produce confident, fluent text that is simply wrong. The industry term for this is a hallucination. As of May 2026, even the best models still hallucinate on the easiest possible task, summarizing a single document, somewhere between 11 and 15 percent of the time. You can trust AI output, but only the way you trust a fast, well-read assistant who never says "I am not sure." You check the work before you ship it.
Frustration is data.
That is the idea that changed how I use AI. It took me a while to get here.
Before I tell you what it means, I should tell you what came before it. For most of my first couple of years with AI, my work followed a predictable pattern. I'd open a fresh window. The first hour was great. The next hour, less great. By hour three, I'd be re-typing the same prompt IN ALL CAPS, like the AI was deaf instead of confused. By hour four, the conversation felt like wrestling a bear. I'd close the laptop and do the work myself, worse and slower than if I'd just started there.
I assumed the AI was the problem. The models were getting smarter; my workflow wasn't. So I'd push harder. Better prompts. Longer prompts. More context.
None of it worked.
What finally cracked it was a smaller idea. The frustration I'd been treating as friction to push through was information. The AI was getting worse because everything that had piled up in the context window was crowding out the instructions I'd given it three hours earlier. My frustration was the only signal I had that the conversation had drifted. And I'd been ignoring it for too long.
If you have been wondering whether you can trust AI, whether it is safe to lean on, whether the confident answer on your screen is real, you are asking the right question at the right time. This guide is built to answer it three ways: what a hallucination actually is, why a brilliant tool invents things, and the one habit that turns "can I trust this?" from a worry into a workflow.
What does it mean when AI "makes things up"?
When AI makes things up, it produces text that sounds correct, reads fluently, and is presented with full confidence, but is factually wrong. It might invent a statistic, cite a study that does not exist, attribute a quote to the wrong person, or describe an event that never happened. The industry calls this a hallucination. The word is a little soft for what it is. A hallucination is what happens when the AI does exactly what it was built to do, on a question where that design produces a wrong answer. The model is not lying to you. It has no idea it is wrong.
One detail trips up careful people. When a human is unsure, you can usually tell. They hedge. They slow down. They say "I think" or "I'd have to check." A language model does none of that by default. It delivers a fabricated citation in the same calm, even tone it uses for a verified fact. The confidence is identical. That is what makes hallucinations dangerous in real work: the wrong answer does not look any different from the right one.
So the trust question is not "is AI usually right?" Most of the time, on common tasks, it is. The trust question is "can I tell the difference when it is wrong?" That is a skill, and it is a learnable one.
Why does AI hallucinate? The plain-English version.
To trust AI well, it helps to understand why it invents things in the first place. There are two reasons, and neither requires a technical background to grasp.
Reason one: it predicts words, it does not look up facts.
A large language model does not have a database of true facts it consults before answering. It works by predicting the most statistically likely next word, over and over, based on patterns it learned from an enormous amount of text. Think of it as the most sophisticated autocomplete ever built. When you ask it a question, it does not retrieve an answer from a filing cabinet. It generates a sequence of words that, statistically, looks like the kind of answer that should follow your question.
Most of the time, the statistically likely answer is also the true answer, because the true answer is what appeared most often in the text it learned from. That is why AI is genuinely useful. But when the true answer is rare, obscure, or simply not well represented in its training, the model still produces a confident, fluent, statistically plausible response. It just fills the missing space with something that sounds right. There is no internal alarm that goes off when it crosses from "recalled" to "invented," because to the model, both are the same operation: predict the next likely word.
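For readers who like to see the mechanism, here is a minimal sketch of the "sophisticated autocomplete" idea. It is a toy word-pair counter, not how a real large language model is implemented, and the three-sentence corpus and the function names `train` and `complete` are invented for illustration. What it shows is the point above: the same code path produces the recalled answer and the invented one.

```python
from collections import Counter, defaultdict

# Made-up miniature "training text" for illustration only.
CORPUS = [
    "the capital of france is paris",
    "the capital of france is paris",
    "the capital of italy is rome",
]


def train(sentences):
    """Count which word follows which, plus overall word frequency."""
    follows = defaultdict(Counter)
    overall = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        overall.update(words)
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows, overall


def complete(prompt, follows, overall):
    """Return the most frequent next word after the prompt's last word.

    Note what is missing: no lookup of facts and no "I don't know" branch.
    """
    last = prompt.lower().split()[-1]
    if last in follows:
        return follows[last].most_common(1)[0][0]
    # Never-seen word: still answer, using the most common word overall.
    return overall.most_common(1)[0][0]


follows, overall = train(CORPUS)
print(complete("The capital of France is", follows, overall))    # recalled
print(complete("The capital of Atlantis is", follows, overall))  # invented, same code path
```

Both calls print "paris". The toy model never saw Atlantis, but "paris" is the word that most often followed "is" in its text, so that is what it says, with no signal that the second answer is a guess.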
Reason two: the conversation drifts as it gets longer.
The second reason is the one I learned the hard way, and it is the heart of the wrestling-a-bear story above. Every AI conversation has a context window, which is the amount of text the model can hold in mind at once. Your instructions, the documents you pasted, the model's own earlier replies, all of it competes for that space.
In a fresh, short conversation, your instructions sit right at the front of the model's attention. Three hours and forty exchanges later, those original instructions are buried under everything that came after. The model is now predicting its next words against a crowded, drifted context. The answers get vaguer. The model starts contradicting things it said earlier. It forgets the constraint you gave it at the start. The output quality falls. The model did not get dumber. The conversation got too full.
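A rough way to picture the crowding is to track what share of the conversation your original instructions still occupy. The sketch below is an illustration under loud assumptions: it counts words as a stand-in for tokens, the figures (100 words of instructions, 400 words per exchange) are invented, and real models do not weigh text by simple proportion. It only shows how fast the opening instructions become a sliver of the whole.

```python
def instruction_share(instruction_words, words_per_exchange, exchanges):
    """Fraction of the conversation text taken up by the original instructions."""
    total = instruction_words + words_per_exchange * exchanges
    return instruction_words / total


# Hypothetical numbers: 100 words of instructions, 400 words per exchange.
for n in (0, 5, 40):
    share = instruction_share(100, 400, n)
    print(f"after {n:>2} exchanges, instructions are {share:.1%} of the text")
```

With these made-up numbers, the instructions are the entire conversation at the start and under 1 percent of it after forty exchanges.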
That was my hour-four bear. The AI was getting worse because everything that had piled up in the context window was crowding out the instructions I'd given it three hours earlier. My frustration was the only signal I had that the conversation had drifted.
How often does AI actually make things up?
More often than most people assume, and the numbers are public.
The Vectara Hallucination Leaderboard measures this on the easiest possible task: summarize a single document. Not research. Not analysis. Not PhD-level reasoning. Just read this one document and summarize it accurately. As of the May 2026 reading, Claude Opus 4.7 still hallucinated about 12 percent of the time, and GPT-5.1 around 11 percent. Other frontier models in the same test ran higher, into the 13 to 15 percent range. Roughly one in ten outputs is wrong, on the simplest task you can give a model.
Roughly one in ten outputs from leading frontier models is wrong on the easiest possible task, summarizing a single document, as of the May 2026 reading of the Vectara Hallucination Leaderboard. Claude Opus 4.7 measured about 12 percent; GPT-5.1 about 11 percent; other models higher. Harder, more obscure, or more recent questions push the rate up.
Source: Vectara Hallucination Leaderboard (HHEM), reading as of May 2026.
If a plane had a one-in-ten crash rate, you wouldn't board it. We board AI outputs that way all day, every day.
That comparison is the whole point. We have learned to treat the confident, polished AI response as if it has already been checked. It has not. The model that wrote it cannot tell you which parts it is sure about and which parts it invented, because, as we covered above, it does not experience the difference.
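The per-output rate also compounds across a day of use. Here is a back-of-the-envelope sketch, assuming (and these are assumptions, not leaderboard findings) that errors are independent from one output to the next and that the summarization rate carries over to your own tasks. The function name is hypothetical.

```python
def chance_of_at_least_one_error(error_rate, outputs):
    """Probability that at least one of `outputs` contains an error,
    assuming each output is wrong independently with probability `error_rate`."""
    return 1 - (1 - error_rate) ** outputs


# 0.12 is the roughly 12 percent figure quoted above; 10 outputs is an arbitrary workload.
print(f"{chance_of_at_least_one_error(0.12, 10):.0%}")
```

At a 12 percent rate, the chance that ten outputs contain at least one error comes out to about 72 percent. The exact figure matters less than the shape: unchecked outputs add up quickly.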
One hopeful note, because the picture is not static. The newest models are starting to compete on honesty, not just capability. Anthropic's Claude Opus 4.8, released at the end of May 2026 (after the leaderboard reading above), leads on a willingness to flag its own uncertainty and make fewer unsupported claims. That is a meaningful direction: a flagship release whose headline feature is a model more willing to say "I am not sure." But "less likely to invent" is not "will not invent." The discipline still belongs to you.
A perfect model is not the goal. A verification habit is.
The goal was never to wait for an AI that never hallucinates. By the design reasons above, that model may be a long way off, and you have real work to do this week. The goal is to build the habit that catches the hallucination before it reaches your client, your board, or your customer.
That habit has a name and a home. It lives at Level 3 of a framework I built to map how people grow with AI. I call it The 7 Levels of AI Proficiency. It runs from first contact at Level 1, where you know AI exists, to running multiple AI systems in concert at Level 7. The moment I just described, the moment frustration starts reading as data instead of friction, lives at Level 3. The Lieutenant. The Critical Thinker.
Every level in The 7 Levels of AI Proficiency runs on three skills. An AI skill, a cognitive skill, and an emotional intelligence skill. At Level 3, the three are:
- Verification and context awareness. Can you tell when the AI is wrong? Can you tell when the conversation is going sideways?
- Critical evaluation and primary sources. Can you push back on what the AI gives you? Do you check the original source, or do you take the summary?
- Self-management. When the AI frustrates you, can you stay regulated long enough to ask why?
The third one tends to get the least attention, but it does most of the work. The first two are about the AI. The third is about you, sitting in front of a tool that just handed you something confident and possibly wrong, and choosing to slow down instead of either trusting it blindly or slamming the laptop shut.
What does this look like when AI actually gets it wrong?
If you want to watch a hallucination happen in real time, with the receipts, there is a documented case. In a recorded session, an AI tool was asked about a public figure's use of AI, defended an unverified claim, and invented four separate citations to support it, then admitted, when pushed, that none of the four sources existed. The full walkthrough, including how the verification chain was run, is here: What Gemini Got Wrong About Jeff Dunham's AI Story.
That case is the proof. This guide is the principle. Together they say the same thing: the model will hand you a confident answer; the verification is your job.
How do I know when AI is making things up?
You build a short checklist and you run it on anything that will leave your hands. The Level 3 Critical Thinker is not someone who distrusts every AI output. That would be exhausting and would throw away the real value. They are someone who knows which outputs to check, and how.
Three quick reads catch most of it:
- Does it cite something specific I could look up? A statistic, a study, a quote, a date, a court case. The more specific and the more checkable the claim, the more worth verifying it is, because specific fabrications are the ones that do real damage. Open the original source. Not the AI's summary of the source. The source.
- Is this in an area where the truth is rare or niche? Common knowledge is where AI is most reliable. Obscure facts, recent events, specialized fields, internal company specifics: those are where the statistically likely answer and the true answer drift apart, and where hallucinations cluster.
- Has the conversation gotten long and drifty? If you are several hours and dozens of exchanges deep and the answers are getting vaguer or contradicting earlier replies, the context window is crowded. The fix is a clean conversation with the same goal and only the key constraints, rather than a better prompt.
None of these require a technical background. They require the discipline to run them before you trust the output, not after someone catches the error.
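For readers who do work in code, the first read can be roughed out as a triage pass over a draft. This is a sketch, not a fact-checker: the patterns and the names `CHECKABLE_PATTERNS` and `claims_to_verify` are hypothetical, the sample draft is made up, and the sentence splitting is naive. It only points at sentences that contain something specific enough to look up. The looking up is still done by a person, against the original source.

```python
import re

# Hypothetical patterns for "specific, checkable" claims. Tune them to your own work.
CHECKABLE_PATTERNS = {
    "statistic": re.compile(r"\d+(?:\.\d+)?\s*(?:%|percent)"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "quotation": re.compile(r'"[^"]+"'),
    "citation": re.compile(r"\b(?:study|report|survey|according to)\b", re.IGNORECASE),
}


def claims_to_verify(text):
    """Return (sentence, reasons) pairs for sentences worth checking at the source."""
    flagged = []
    # Naive split: break after ., ! or ? when followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        reasons = [
            label
            for label, pattern in CHECKABLE_PATTERNS.items()
            if pattern.search(sentence)
        ]
        if reasons:
            flagged.append((sentence, reasons))
    return flagged


# Made-up draft text, for illustration only.
draft = "Remote work is popular. A 2023 study found that 42 percent of teams ship faster."
for sentence, reasons in claims_to_verify(draft):
    print(reasons, "->", sentence)
```

The first sentence passes through untouched; the second is flagged three times over (a statistic, a year, and a cited study), which is exactly the kind of claim worth opening the source for.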
Self-management: reading your own frustration as data.
This is the part that does the heavy lifting, and it is the part almost nobody teaches.
Self-management at Level 3 means asking what your frustration is telling you the moment it shows up. Usually one of three things:
- The AI gave you an answer you don't like. Are you frustrated because it's wrong, or because it's right and you don't want it to be?
- The output is drifting. The context window is crowded. Time for a clean conversation with the same goal and only the key constraints.
- The AI surfaced something uncomfortable about your work, your team, or your strategy. The reflex is to dismiss it. Don't.
Notice what each of those does. The first one separates a fact you should keep from a fact you wish were different, which is a judgment a frustrated person almost never makes well in the moment. The second one tells you the conversation, not the model, is the problem, and gives you the fix. The third one catches the most expensive mistake of all: throwing away a correct, useful, uncomfortable answer because it stung.
Frustration is data. Once you start treating it that way, the bear you've been wrestling becomes the gauge you've been missing.
Where "trusting AI" fits in The 7 Levels of AI Proficiency.
Learning to trust AI well is the move from Level 2 to Level 3 in The 7 Levels of AI Proficiency. At Level 2 (The Ensign, Prompt Engineer or Practitioner) you can get AI to produce useful work. At Level 3 (The Lieutenant, Critical Thinker) you can tell whether that work is right. That second skill is what makes the first one safe to rely on.
The 7 Levels of AI Proficiency is a framework LaunchReady built to give working professionals a shared vocabulary for AI capability. The lower stretch of it looks like this:
- Level 1: The Cadet (AI Aware). You know AI exists. The human skill is self-awareness: noticing what you do not yet know.
- Level 2: The Ensign (Prompt Engineer or Practitioner). You can ask AI to do specific tasks and get useful output. The human skill is structured thinking.
- Level 3: The Lieutenant (Critical Thinker). You evaluate AI output instead of accepting it at face value. You catch hallucinations. You know when to trust the answer and when to verify. The human skill is self-management.
A lot of professionals get stuck between Level 2 and Level 3. They can produce a slick AI draft, but they have not yet built the habit of checking it, so they either over-trust it (and ship the hallucination) or they get burned once and never trust it again. The answer to "can I trust AI?" is rarely a clean yes or no. The honest answer is to trust it the way a Level 3 Critical Thinker does: verify what counts and rely on the rest.
If you want a written read on where you sit today, the free 7 Levels of AI Proficiency assessment takes about 10 minutes and tells you your level plus what the next one requires.
Related reading:
- How Do I Start Using AI at Work? A 2026 Beginner's Guide (the getting-started companion to this article).
- What Is AI Proficiency: A Complete Guide for 2026 (the full picture of where you are heading).
- What Gemini Got Wrong About Jeff Dunham's AI Story (a hallucination caught on the record).
What should I do this week to trust AI better?
Three things. None of them require new software, a new subscription, or a technical background. Each one is a Level 3 Critical Thinker habit you can start on your next AI conversation.
Verify one specific claim before you use it
The next time AI hands you a statistic, a citation, a quote, or a fact you are about to put in front of someone else, stop and check the original source. Not the AI's summary of it. The source itself. Open the study. Read the actual sentence the quote came from. Confirm the number. Do this once, deliberately, this week. You will either confirm the AI was right (and trust it a little more, with reason) or you will catch a fabrication before it costs you anything. Both outcomes are wins. The habit is the point.
Start a clean conversation the moment answers start drifting
The next time an AI conversation starts feeling like wrestling a bear, where the answers get vaguer, the model forgets your earlier instructions, and you find yourself re-typing the same thing louder, do not push harder. Open a fresh window. Restate your goal in one or two sentences with only the key constraints. The drift is the context window telling you it is full. A clean conversation is the fix, and it takes thirty seconds.
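If you restart conversations often, the restated opening can be a small template. The sketch below is one possible shape, not a prescribed format: the function name, the "Goal" and "Key constraints" labels, and the cap of five constraints are all assumptions made for illustration. The cap is there to enforce the "only the key constraints" part.

```python
def fresh_start_prompt(goal, constraints, max_constraints=5):
    """Build the opening message for a clean conversation.

    The limit on constraints is an arbitrary guard against carrying the
    whole old conversation into the new one.
    """
    if len(constraints) > max_constraints:
        raise ValueError("Too many constraints: keep only the key ones.")
    lines = [f"Goal: {goal.strip()}", "Key constraints:"]
    lines += [f"- {item.strip()}" for item in constraints]
    return "\n".join(lines)


# Hypothetical example values.
print(fresh_start_prompt(
    "Summarize the attached report for the board",
    ["Under 200 words", "Quote figures exactly as written in the report"],
))
```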
Treat your frustration as a gauge, not a verdict
The next time AI frustrates you, pause before you react and ask three questions: Is it wrong, or is it right and I don't want it to be? Is the conversation drifting? Did it surface something uncomfortable I am about to dismiss? You do not have to act on the answer immediately. You just have to ask the question. That single pause is the self-management skill at the center of Level 3, and it is the difference between fighting the tool and reading it.
The frustration you've been pushing through is the gauge you've been missing.
If you want a baseline of where you stand today and what your next level requires, the free 7 Levels of AI Proficiency assessment takes about 10 minutes. It turns a vague "can I trust this?" into a specific level and a specific next step.
Sources
- Vectara. "Hallucination Leaderboard," computed with the Hughes Hallucination Evaluation Model (HHEM). Reading as of May 11, 2026.
- Anthropic. "Claude Opus 4.8." May 28, 2026.
- SAGE Journals. "Trust me, I'm wrong: The perils of AI hallucinations, a silent killer." 2026.
- National Law Review. "Understanding the Risks AI Hallucinations Create for Businesses."
- International AI Safety Report 2026 (Bengio et al.). February 2026.
- LaunchReady.ai. "What Gemini Got Wrong About Jeff Dunham's AI Story."
- LaunchReady.ai. "What Is AI Proficiency: A Complete Guide for 2026."
- LaunchReady.ai. "How Do I Start Using AI at Work? A 2026 Beginner's Guide."
- The 7 Levels of AI Proficiency assessment.
Frequently Asked Questions
Why does AI make things up?
AI makes things up because of how it is built. A large language model generates the most statistically likely next words based on patterns in its training data, rather than looking up verified facts in a database. Most of the time the likely answer is also the true answer, which is why AI is useful. But when the true answer is rare, recent, or niche, the model still produces a confident, fluent response and fills the empty space with something that only sounds right. The industry calls this a hallucination. The model has no internal alarm that distinguishes a recalled fact from an invented one, because to the model, both are the same operation: predict the next likely word.
What is an AI hallucination?
An AI hallucination is output that sounds correct, reads fluently, and is delivered with full confidence, but is factually wrong. It can be an invented statistic, a citation to a study that does not exist, a misattributed quote, or a description of an event that never happened. The danger is that a hallucination looks identical to a correct answer. The model uses the same calm, confident tone for a fabricated citation as it does for a verified fact, so you cannot spot it by reading alone. You spot it by checking the claim against the original source.
Can I trust AI?
You can trust AI the way you would trust a fast, well-read assistant who has never once admitted being wrong and never says "I am not sure." That means relying on it for most common tasks while verifying anything specific, niche, or high-stakes before you act on it. The useful question is not "is AI usually right?" (it usually is) but "can I tell when it is wrong?" That is a learnable skill, and it sits at Level 3 (The Lieutenant, Critical Thinker) in The 7 Levels of AI Proficiency.
How often does AI hallucinate?
As of the May 2026 reading of the Vectara Hallucination Leaderboard, leading models still hallucinate on the easiest possible task, summarizing a single document, between roughly 11 and 15 percent of the time. Claude Opus 4.7 measured about 12 percent and GPT-5.1 about 11 percent, with other frontier models running higher. That means roughly one in ten outputs is wrong on the simplest task you can give a model. The rate is higher for harder, more obscure, or more recent questions.
Why does AI sound so confident when it is wrong?
AI sounds confident even when wrong because confidence is not something it calculates separately from the answer. It generates the next likely words in a steady, fluent tone whether the underlying claim is verified or invented. A human signals doubt by hedging or slowing down; a language model does neither by default. This is exactly what makes hallucinations hard to catch and why a verification habit, rather than a confidence read, is the reliable defense. Newer models, like Claude Opus 4.8 released in May 2026, are starting to flag their own uncertainty, but the discipline still belongs to the user.
Why does AI get worse the longer I use it in one conversation?
AI output degrades in a long conversation because of the context window, the amount of text the model can hold in mind at once. Your original instructions, the documents you pasted, and the model's own earlier replies all compete for that space. Early on, your instructions sit at the front of the model's attention. Hours and dozens of exchanges later, they are buried under everything that came after, so the answers get vaguer and start contradicting earlier replies. The fix is a clean conversation that restates your goal with only the key constraints, rather than a better or more forceful prompt.
How do I verify what AI tells me?
Verify AI output by running a short checklist before you rely on it. First, find the specific checkable claims (statistics, citations, quotes, dates) and confirm each against the original source, not the AI's summary of it. Second, notice whether the topic is common knowledge (more reliable) or rare and niche (where hallucinations cluster). Third, if the conversation has gotten long and drifty, start fresh rather than trusting a crowded context. This is the core discipline of a Level 3 Critical Thinker in The 7 Levels of AI Proficiency.
What is the 7 Levels of AI Proficiency and which level handles trusting AI?
The 7 Levels of AI Proficiency is a framework developed by Harrison Painter at LaunchReady.ai that describes seven stages of AI capability: Level 1 (The Cadet, AI Aware), Level 2 (The Ensign, Prompt Engineer or Practitioner), Level 3 (The Lieutenant, Critical Thinker), Level 4 (The Commander, Context Engineer or Builder), Level 5 (The Captain, Design Thinker), Level 6 (The Admiral, Systems Integrator or Leader), and Level 7 (The Mission Director, AI Orchestrator). Trusting AI well, verifying its output and catching hallucinations, is the defining skill of Level 3, The Lieutenant or Critical Thinker.