
The Smartest Person Who Knows Nothing

You've hired the smartest person in the room. Every morning, they wake up with total amnesia. The gap between useful AI and wasted AI almost always comes down to one thing: the briefing.

April 2026
You've hired the smartest person in the room. They can write code, draft contracts, analyse financial statements, and explain quantum mechanics — all before lunch. There is one problem.

Every morning, they wake up with total amnesia.

Their general knowledge is intact — they still know how to code, write, and reason across almost any domain. But everything specific to your situation has vanished. Your company name, the project you discussed yesterday, the decision you reached together at four in the afternoon — gone. Each day, you hand them a briefing document and start from scratch.

This is how large language models work. The gap between people who get useful work from these tools and those who don't almost always comes down to one thing: the quality of the briefing.

What They Know

The knowledge is real and it is vast. Models train on enormous quantities of text — books, research papers, code repositories, conversation logs — and through that process develop broad competence across more domains than any human could hold. They know Python, contract law, and the narrative structure of a Pixar film. The breadth is genuine.

But it is frozen at a training cutoff. Ask about something recent and the model may produce a confident answer that happens to be wrong. When Artificial Analysis tested 40 leading models on 6,000 factual questions, all but three were more likely to hallucinate than give a correct answer — and the top-scoring model got barely half right.1 The model doesn't know it's guessing. It answers with the same fluent confidence regardless of whether the content is accurate or invented.

And the knowledge is generic. The model knows how to write a business proposal. It doesn't know your business. It knows software architecture. It doesn't know your codebase. Every detail specific to you — your constraints, your conventions, your half-finished migration — must be supplied fresh.

These limits don't shrink with scale. A bigger model knows more in general. It does not know more about your situation. Each generation extends the range of competence; none of them arrive knowing your company name, your deployment pipeline, or the decision you made yesterday at four in the afternoon. The gap between broad intelligence and local knowledge is structural. No amount of computing power closes it. Only the briefing does.

The Right Two Pages

You are 50 messages into a conversation. The model references something you said an hour ago. It feels like memory. It isn't. The entire conversation — every message, every response, from the first word — gets sent back to the model with each new turn. The model re-reads everything from scratch. There is no thread. No continuity. Engineers call this 'stateless.'

Products build layers around this — memory features, retrieval systems, saved preferences — but these are ways of assembling the briefing, not replacing it. The model itself starts fresh every time. This is why you can open a new chat window and the model has no idea who you are. It isn't being coy. It has no access to anything outside the text it just received.
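The mechanics are easy to sketch. In the toy Python below, `call_model` is a hypothetical stand-in for a real LLM API; the detail that matters is that the entire message list is re-sent with every single call.

```python
# A minimal sketch of statelessness. `call_model` is a hypothetical
# stand-in for a real LLM API; the point is that the whole message
# list is re-sent on every turn.

def call_model(messages):
    # Placeholder: a real model would re-read *all* of `messages`
    # and generate the next reply from scratch.
    return f"(reply after reading {len(messages)} messages)"

def chat(history, user_text):
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)  # the whole transcript goes over the wire
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
chat(history, "My company is called Acme.")
answer = chat(history, "What did I just tell you?")
# The model only 'remembers' Acme because the first message is
# physically present in the payload of the second call.
```

Delete `history` and the amnesia is total: the next call arrives with an empty briefing.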

Most people interact with AI as if they're building a relationship — each conversation deepening the model's understanding of them and their work. They're not. They're meeting the consultant every morning. And the consultant is extraordinarily polite about not mentioning it.

That briefing lives inside the context window: the finite amount of material the model can work with in one pass. The instinct is to give it everything. Dump the full project history, the complete policy manual, every email thread that might be relevant. More information, better results.

Researchers at Stanford and UC Berkeley tested what happens when you bury important information inside a long input. The beginning was fine. The end was fine. The middle degraded — measurably, consistently, across every model they tried. They called the effect 'lost in the middle.'2 A later team pushed it further: what if the model can find everything in the document perfectly? Performance still dropped — by between 14 and 85 per cent depending on the task — simply because the input was longer.3 More information actively makes the output worse.

Hand your consultant a focused three-page brief and they do sharp work. Dump 500 pages on their desk and they miss the key facts buried on page 247. The practical fix is retrieval-augmented generation — RAG. Instead of feeding the model everything, you search for the relevant fragment and inject just that. A customer asks about your returns policy. You don't send the 200-page manual. You pull the two-page returns section. You've handed the consultant the right page at the right time.
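A toy version of that retrieval step fits in a few lines. The manual and the keyword-overlap scoring below are illustrative assumptions (production systems use embedding search), but the shape is the same: find the relevant fragment first, then inject only that fragment into the prompt.

```python
import re

# A toy RAG step. The manual and scoring are invented for
# illustration; real systems use embedding search, but the shape
# is identical: retrieve the relevant section, inject only that.

manual = {
    "returns": "Items may be returned within 30 days with a receipt.",
    "shipping": "Orders ship within two business days.",
    "warranty": "Hardware is covered for one year from purchase.",
}

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, sections):
    # Pick the section whose name and text share the most words
    # with the question.
    q = words(question)
    return max(sections, key=lambda name: len(q & words(name + " " + sections[name])))

def build_prompt(question):
    section = retrieve(question, manual)
    # Only the winning fragment goes into the context, not the manual.
    return f"Context: {manual[section]}\n\nQuestion: {question}"

prompt = build_prompt("Can items be returned without a receipt?")
```

The prompt the model finally sees contains the two-page returns section and nothing else: the right page at the right time.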

The skill is knowing which two pages matter. The discipline is resisting the urge to include the other 198.

When the Consultant Can Act

So far, the consultant just reads and writes. You hand them a brief, they hand back text.

Modern AI systems go further. They can use tools — search the web, query a database, read a file, send a message. When a model has tools and runs in a loop — reason about the situation, take an action, observe the result, reason again — you get what the industry calls an agent.

The loop is powerful but brittle, and the breaking point is always the same: the model must be able to tell when the task is finished. "Make my presentation better" is a task for a human. Better by what standard? For which audience? The model has no way to verify success, so it loops — revising, second-guessing, never converging. "Add a summary slide with three bullet points showing Q4 revenue" is a task the agent can verify step by step. Verifiable tasks produce useful agents. Vague ones produce expensive spinners.

A vague prompt to a chatbot wastes your afternoon. A vague prompt to an agent — one that can send emails, update records, or commit code — can do real damage. In 2024, New York City deployed a chatbot to help small business owners navigate regulations. It told landlords they could refuse housing vouchers, told employers they could take workers' tips, and suggested businesses could fire staff for reporting harassment — all illegal under city law.4 The model was not broken. It was unbriefed. A system grounded in actual city regulations would have given correct answers. Instead, a general-purpose model was pointed at high-stakes questions with no domain context. The city's fix was a disclaimer.

The same logic scales. Complex tasks work better with a team of specialists than one consultant trying to do everything. Each specialist gets a focused brief, only the tools it needs, and a narrow slice of context. A document describing your specific deployment steps transforms the model from 'knows deployment' to 'knows your deployment.' No retraining. A better brief.
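The specialist pattern is, at bottom, a data structure. The names, briefs, and tool lists below are invented; the structure is the point: each sub-agent gets its own narrow brief and its own short tool list, and no single agent sees everything.

```python
# A sketch of the specialist pattern. Briefs and tool names are
# hypothetical; the structure is what matters: one focused brief
# and a narrow tool list per specialist.

specialists = {
    "deploy": {
        "brief": ("You deploy services using our pipeline: "
                  "build, canary to 5%, then full rollout."),
        "tools": ["run_pipeline", "check_canary"],
    },
    "billing": {
        "brief": "You answer billing questions from the current price sheet only.",
        "tools": ["read_price_sheet"],
    },
}

def dispatch(task_type):
    # Route each task to the one specialist whose brief and tools
    # match: a focused two-page briefing, not the whole manual.
    spec = specialists[task_type]
    return spec["brief"], spec["tools"]

brief, tools = dispatch("deploy")
```

The deployment document in the paragraph above is exactly such a brief: it turns a generalist into your specialist without touching the model.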

What Stays Human

The most common mistake organisations make with AI is assuming the model is the bottleneck. "We need a bigger model." "We need the next generation." "We're waiting for the one after GPT-5." Sometimes capability genuinely is the constraint. More often, the model could already do the job — if anyone wrote it a decent brief. Organisations keep recruiting more brilliant consultants instead of teaching anyone to write a briefing.

When researchers tested what happens when you give the model specific instructions instead of vague ones, GPT-4's accuracy jumped from 60 to 83 per cent.5 The model was identical. The input changed.

You can feel this yourself. Compare "help me write an email" with "draft a two-paragraph reply to Sarah's message about the Q3 delay, explaining we've fixed the pipeline issue and will hit the revised deadline — tone should be direct but not defensive." The first produces something you'd delete. The second produces something you'd send.

If the briefing is the real work, the value shifts. Production costs drop by the month. Direction does not. The scarce skill becomes knowing what to put in front of the machine: which two pages matter, what the task actually is, how to tell good output from plausible output. That takes domain knowledge, judgement, and a willingness to be specific — which means committing to a view of what the work should be before the machine starts producing it.

This is harder than it sounds. Being specific requires clarity about what you want, and most people — most organisations — prefer to stay vague because vagueness avoids commitment. "Make it good" is safe. "Make it a three-paragraph piece with this structure, this tone, and this single point" is an opinion someone can disagree with.

The consultant will always be brilliant. The question is whether anyone is willing to write a brief worth being brilliant about.

Notes
1. Artificial Analysis, "AA-Omniscience Benchmark," 2025. Tested 40 models on 6,000 difficult factual questions across six domains; models score +1 for correct, −1 for incorrect, 0 for abstaining. Only three scored above zero on the overall index — meaning only three produced more correct answers than false ones. The highest-accuracy models reach roughly 55 per cent, but models that score well on accuracy frequently hallucinate on over 85 per cent of the questions they get wrong, because they rarely decline to answer. On the structural cause, see Kalai and Nachum, "Why Language Models Hallucinate," OpenAI, September 2025 — nine out of 10 major benchmarks give zero credit for abstaining, making confident guessing the rational strategy.

2. Liu, Nelson F., et al. "Lost in the Middle: How Language Models Use Long Contexts." Transactions of the Association for Computational Linguistics 12 (2024): 157–173. Models handle information at the beginning and end of long inputs well, but performance degrades measurably for material placed in the middle. The effect was consistent across model sizes and architectures.

3. Du, Yufeng, et al. "Context Length Alone Hurts LLM Performance Despite Perfect Retrieval." Findings of EMNLP 2025, 2025. Even when models could perfectly retrieve all relevant evidence, performance degraded as input length grew — by between 14 and 85 per cent depending on the task. Context length itself imposes a cost, independent of retrieval quality.

4. The Markup, "NYC's AI Chatbot Tells Businesses to Break the Law," 29 March 2024. The city's MyCity chatbot, built on Microsoft Azure AI, routinely gave illegal regulatory advice to small business owners — including that landlords could reject Section 8 housing vouchers, employers could take workers' tips, and businesses could fire staff for reporting sexual harassment. The city's response was a disclaimer rather than a redesign.

5. Kim, Olivia. "DETAIL Matters: Measuring the Impact of Prompt Specificity on Reasoning in Large Language Models." Preprint, 2025. Moving from vague to detailed prompts improved GPT-4's accuracy from 60 to 83 per cent across reasoning benchmarks. The model was identical; only the input specificity changed.

Process

I used AI tools (Gemini, Claude, ChatGPT) as editors — to challenge my thinking and tighten the prose.

© 2026 Thomas Wainwright