Pandotic · Field Notes

Controlling the Truth

An LLM doesn't look facts up. It predicts plausible words. That difference is where AI projects quietly go wrong — and what a knowledge engine is built to fix.

9 min read · blog + how we build · with a real case

Ask a powerful AI a question about your industry and it will give you a fluent, confident, well-organized answer. It will sound exactly like an expert. That is the problem — not the feature. Fluent and correct are two different things, and an LLM optimizes for the first.

This is the companion piece to our buzzword decoder, going deep on the one idea that decides whether an AI product is a business or a liability: who controls what it's allowed to treat as true. It's part argument, part a look at how we actually build — and it ends with a real system we built doing exactly this in a domain where being confidently wrong costs real money.

Part 1

How an LLM actually answers a question

Strip away the chat interface and the magic, and the mechanism is unsettlingly simple — and that simplicity is the whole risk.

A large language model was trained to do one thing: given some text, predict the next chunk of text that is most plausible. Do that billions of times over most of the written internet and the result is astonishing — it can write, reason, summarize, and explain. But under the hood, when you ask it "does this building qualify for that credit?" or "what does our refund policy say?", it is not opening a drawer and reading a document. It is generating the sequence of words that looks most like a correct answer.
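
To make that mechanism concrete, here is a toy sketch in Python. The candidate continuations and their scores are invented for illustration; a real model scores a very large vocabulary with a neural network, one token at a time. What the sketch shows is the shape of the selection step: it ranks by plausibility, and nothing in it consults a document.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def most_plausible(scores):
    """Pick the continuation with the highest probability."""
    probs = softmax(scores)
    return max(probs, key=probs.get)

# Hypothetical scores for continuing "Does this building qualify? ..."
# The numbers are made up; they stand in for what a trained model would emit.
candidate_scores = {"yes": 2.1, "no": 1.9, "it depends on the version": 0.4}

probabilities = softmax(candidate_scores)
answer = most_plausible(candidate_scores)
```

In this made-up case the careful answer ("it depends on the version") loses simply because it scored lower, not because anything checked it against a source.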

Most of the time, the most-plausible-looking answer is also the true one — which is exactly why this is dangerous. The model is right often enough to earn your trust, then wrong in the specific, high-stakes places where the true answer and the plausible-sounding answer diverge. It will not flag the difference. It cannot. It has no concept of "I am now guessing." Confidence is the default setting; it is not evidence.

An LLM with nothing controlling its inputs is a confident stranger. Brilliant, fast, well-spoken — and with no obligation to be right about your business specifically.

Everything people bolt onto LLMs to make them safe for real work — retrieval, knowledgebases, guardrails, citations — exists to address this one gap. Not because the model is bad. Because the model is a plausibility engine, and a business runs on truth, and those are not the same machine.

Part 2

Six ways a fluent answer is wrong

Uncontrolled AI doesn't fail loudly. It fails convincingly — in patterns that are predictable once you know to look.

"Just let it search the web" feels like the fix. It isn't — it usually changes which wrong answer you get. Here is how a fluent, web-searching model goes wrong in exactly the ways that cost you, drawn from what we see in regulated, technical, and high-consequence domains.

Anecdote outranks authority

Open-web search rewards what's popular and well-SEO'd, not what's correct. A forum thread of practitioners guessing will outrank the actual standard that practitioners are guessing about. The model can't tell the difference between a venting Reddit post and the source of record — it just sees text.

Version blending — the silent killer

Standards, policies, and rules have versions. The right answer often depends on which version applies to you. The open web has all versions mixed together with no dates; old threads confidently discuss rules that no longer apply. A blended answer is a wrong answer wearing a correct one's clothes.

The long tail decides the outcome

The carve-outs, exceptions, and edge cases are exactly what real decisions turn on — and exactly what nobody writes a clean public explainer about. They live in the primary source. A model that learned the topic from the open web learned the well-trodden 80% and is blank on the 20% that actually determines the call.

Domain bleed

Ask about one framework and a general model will happily blend in a neighboring one, or fold a strategy opinion into what should be a hard compliance answer. It has no internal border between "context that enriches" and "the rule that decides." The dangerous answer is the one that mixes the two and sounds whole.

No audit trail

Ask the same question twice and you can get two different answers, neither sourced, both confident. For anything you have to defend, justify, or stand behind, "the AI said so" is not a position. Non-determinism is fine for brainstorming and fatal for adjudication.

The paywall reality

The most authoritative sources in most serious domains are paywalled or licensed — reference guides, standards bodies, proprietary datasets. A web-searching model literally cannot read them. So it paraphrases a third-hand summary of the thing that actually matters, and presents the paraphrase with the same confidence as a quote.

None of these are exotic. They are the default behavior of a plausibility engine pointed at the open internet. The fix is not a smarter model. The fix is changing what the model is allowed to treat as true.

The Capability · How we build

What a knowledge engine actually is

A chatbot has a knowledgebase. A serious product has a knowledge engine — and those are not the same noun.

"We'll add a knowledgebase" usually means: dump some PDFs into a vector database and hope the retrieval is good enough. That clears the demo. It does not clear production, because it inherits most of the six failures above — just from your documents instead of the web.

A knowledge engine is the discipline of deciding, deliberately, what the AI is allowed to know, and proving it. When we build one for a client, it has properties a document dump never has:

Sourced, with provenance

Every fact is tagged to the authority that owns it, with a retrieval date and a trust tier. "Where did this come from?" always has an answer.

Structured, not scraped

Knowledge is encoded into records with keys, versions, and keywords — so the right slice is selectable, not left to luck.

Version-aware

The engine knows which edition of a rule applies to this case, and refuses to blend versions. This alone removes version blending, the silent killer described above.

Bounded

Explicit borders between domains. Context that enriches an answer is structurally forbidden from deciding it.

A verdict procedure

Not "here's a summary." A real decision rule that can return insufficient data instead of a confident guess when the evidence isn't there.

Deterministic & auditable

You can trace exactly why a given piece of knowledge was used for a given answer. Reproducible. Inspectable. Defensible.

Failure-aware

It encodes not just what's required, but how real submissions fail — the difference between an explainer and an advisor.

Tested

The knowledge layer has its own test suite asserting it routes correctly and respects its own boundaries. Knowledge you can regression-test.
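
Several of these properties can be pictured as one record shape plus one selection function. The sketch below is a minimal illustration, not our production schema: the field names, the `select` helper, the source IDs, and the sample refund facts are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class KnowledgeRecord:
    key: str            # stable identifier for the fact
    domain: str         # the bounded domain that owns it
    version: str        # edition of the rule this fact belongs to
    text: str           # the fact itself
    source_id: str      # the authority that owns the fact
    retrieved_on: date  # when it was pulled from that source
    trust_tier: int     # 1 = source of record, higher = less authoritative
    keywords: tuple = ()

def select(records, *, domain, version, keyword):
    """Return matching records in a fixed order, so the same query
    always yields the same slice and each hit can be traced to its source."""
    hits = [r for r in records
            if r.domain == domain and r.version == version and keyword in r.keywords]
    return sorted(hits, key=lambda r: (r.trust_tier, r.key))

# Invented sample data, for illustration only.
records = [
    KnowledgeRecord("refund-window", "policy", "2024", "Refunds within 30 days.",
                    "SRC-001", date(2024, 3, 1), 1, ("refund",)),
    KnowledgeRecord("refund-window", "policy", "2023", "Refunds within 14 days.",
                    "SRC-001", date(2023, 2, 1), 1, ("refund",)),
    KnowledgeRecord("refund-forum", "policy", "2024", "Someone said 60 days.",
                    "SRC-009", date(2024, 5, 5), 3, ("refund",)),
]

hits = select(records, domain="policy", version="2024", keyword="refund")
```

Here the 2023 record is never returned for a 2024 query, and the source of record sorts ahead of the forum-tier entry. Because selection is a plain filter and sort, it can be asserted in a regression test.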

The model is rented and replaceable. The knowledge engine is the asset — and it's yours.

This is also why we never marry a client's product to one AI vendor. The model underneath can change next quarter; the knowledge engine — structured, sourced, owned by you — is the part that compounds in value and travels with you. More on why we stay model-agnostic →

The Proof · A system we built

LEEDSmart: a knowledge engine in a domain where wrong is expensive

Green-building certification is a perfect stress test for everything above. The rules are versioned and consequential, the authoritative sources are paywalled, the edge cases decide whether a project earns a credit, and a confident-but-wrong answer can cost a client a certification. We built a knowledge engine for exactly this — here's the shape of it, with the proprietary internals left out.

It isn't one knowledgebase. It's three, with a border guard.

The most important architectural decision: the knowledge is partitioned into three domains that are forbidden from answering each other's questions, with a runtime router deciding which one is even allowed to respond.

Domain | Owns | Must not
Compliance | Whether a credit is earned, prerequisites, points, certification logic. | Give building-science strategy with no compliance decision.
Model QA | Whether an energy model is credible: calibration, simulation health, plausibility. | Render compliance verdicts.
Strategy | How carbon, water, air quality, materials, resilience interact. | Decide whether a credit is earned or a model is valid.

Why this matters: the most dangerous failure in this domain is a confident answer that blends a strategy opinion with a compliance claim. The border guard makes that structurally hard instead of hoping the model behaves. Strategy can enrich an answer; it is forbidden from deciding it. That's risk #4, domain bleed, engineered out.
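
A border guard of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the LEEDSmart router: the question types, the `route` and `answer` functions, and the sample drafts are hypothetical. The idea it demonstrates is that the verdict slot is filled only by the domain that owns the question, while strategy is confined to a separate context slot.

```python
from enum import Enum

class Domain(Enum):
    COMPLIANCE = "compliance"
    MODEL_QA = "model_qa"
    STRATEGY = "strategy"

# Hypothetical question types mapped to the one domain allowed to decide them.
OWNER = {
    "credit_earned": Domain.COMPLIANCE,
    "model_credible": Domain.MODEL_QA,
    "tradeoffs": Domain.STRATEGY,
}

def route(question_type):
    """Return (deciding domain, domains allowed to add context)."""
    if question_type not in OWNER:
        raise ValueError(f"no domain owns question type {question_type!r}")
    decider = OWNER[question_type]
    # Strategy may enrich another domain's answer but never decides it.
    enrichers = [Domain.STRATEGY] if decider is not Domain.STRATEGY else []
    return decider, enrichers

def answer(question_type, drafts):
    """Assemble a reply: the verdict comes only from the deciding domain."""
    decider, enrichers = route(question_type)
    return {
        "verdict": drafts[decider],
        "context": [drafts[d] for d in enrichers if d in drafts],
    }

# Invented drafts, as if each domain had already produced its own text.
drafts = {
    Domain.COMPLIANCE: "insufficient data",
    Domain.STRATEGY: "Consider the daylighting tradeoff.",
}
reply = answer("credit_earned", drafts)
```

Even if the strategy draft sounded like a verdict, it could only ever land in `context`. An unrecognized question type raises an error rather than falling through to whichever domain happens to have text.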

It's version-gated to the project's actual registration date.

Multiple versions of the standard exist, with materially different requirements, and which one applies depends on when a project registered. The engine filters guidance by that registration date and refuses to serve the wrong edition. That's risk #2 — the silent killer — closed at the structural level, not left to the model's judgment.
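
As a rough sketch of what version gating looks like in code, assuming a simplified world where each edition applies from a single start date onward: the edition labels, cutoff dates, and guidance entries below are invented, and real registration rules may be more involved.

```python
from datetime import date

# Hypothetical editions and start dates, invented for this sketch.
# Each entry: (first registration date the edition applies to, edition label).
EDITIONS = [
    (date(2010, 1, 1), "edition-A"),
    (date(2020, 1, 1), "edition-B"),
]

def edition_for(registered_on):
    """Return the latest edition whose start date is on or before registration."""
    applicable = [label for start, label in sorted(EDITIONS) if start <= registered_on]
    if not applicable:
        raise ValueError("registration date predates every known edition")
    return applicable[-1]

def gate(guidance, registered_on):
    """Keep only guidance written for the project's edition."""
    edition = edition_for(registered_on)
    return [g for g in guidance if g["edition"] == edition]

guidance = [
    {"edition": "edition-A", "text": "Requirement as written in edition A."},
    {"edition": "edition-B", "text": "Requirement as written in edition B."},
]

served = gate(guidance, date(2015, 6, 1))
```

The filter runs before any text reaches the model, so the wrong edition is never in the context to be blended in the first place.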

It knows how submissions fail — not just what they require.

The hardest-won layer is a structured catalog of how real submissions actually get rejected in review: the patterns, why they fail, what to ask for instead. A web search will tell you a requirement exists. It will not tell you the specific way people think they've met it and haven't. That's institutional reviewer experience encoded as data — and it has no equivalent anywhere on the public internet.
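
"Encoded as data" can be pictured as a small record type keyed to the requirement it belongs to. The three descriptive fields mirror the sentence above (the pattern, why it fails, what to ask for instead); the class name, the lookup helper, and the single sample entry are hypothetical placeholders, not entries from the real catalog.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailurePattern:
    requirement_key: str  # which requirement this pattern applies to
    pattern: str          # what the flawed submission looks like
    why_it_fails: str     # the reviewer's reason for rejecting it
    ask_for: str          # what to request instead

# Invented placeholder entry, for illustration only.
CATALOG = [
    FailurePattern(
        requirement_key="example-requirement",
        pattern="Summary sheet provided without the underlying calculations",
        why_it_fails="The claimed value cannot be verified",
        ask_for="The calculation workbook that produced the summary",
    ),
]

def known_failures(requirement_key):
    """Return every recorded failure pattern for one requirement."""
    return [p for p in CATALOG if p.requirement_key == requirement_key]
```

Because the catalog is keyed by requirement, the patterns can be surfaced alongside that requirement's guidance instead of depending on what the model happens to recall.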

It quotes the source instead of paraphrasing a rumor.

The authoritative reference guides and standards are paywalled, so a web-searching model cannot legitimately read them. This engine has them licensed and structured, so it works from the actual requirement, with provenance, against named public authorities tracked with source IDs and retrieval dates. That's risk #1 (anecdote outranks authority) and risk #6 (the paywall reality), handled.

It returns a verdict, not a vibe.

Each requirement carries an explicit decision procedure: return insufficient data when documents are missing, not compliant when they're inconsistent, compliant only when everything is present and consistent. The output is a defensible adjudication, not a fluent guess — and there's a test suite asserting it routes correctly and stays inside its own boundaries.
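
The three-way rule above translates almost directly into code. The following is a minimal sketch, assuming documents arrive as a name-to-contents mapping and that each requirement supplies its own consistency check; the document names, the `areas_match` check, and the function signature are hypothetical.

```python
from enum import Enum

class Verdict(Enum):
    INSUFFICIENT_DATA = "insufficient data"
    NOT_COMPLIANT = "not compliant"
    COMPLIANT = "compliant"

def adjudicate(required, submitted, is_consistent):
    """Apply the decision rule in order: missing, inconsistent, then compliant.

    required: names of documents the requirement needs
    submitted: dict mapping document name to its contents
    is_consistent: function that checks the submitted documents agree
    """
    missing = sorted(set(required) - set(submitted))
    if missing:
        return Verdict.INSUFFICIENT_DATA, "missing: " + ", ".join(missing)
    if not is_consistent(submitted):
        return Verdict.NOT_COMPLIANT, "documents are inconsistent"
    return Verdict.COMPLIANT, "all documents present and consistent"

# Hypothetical requirement: two documents that must report the same floor area.
required = ["floor_plan", "area_calc"]

def areas_match(docs):
    return docs["floor_plan"]["area_m2"] == docs["area_calc"]["area_m2"]

incomplete = adjudicate(required, {"floor_plan": {"area_m2": 100}}, areas_match)
conflicting = adjudicate(
    required, {"floor_plan": {"area_m2": 100}, "area_calc": {"area_m2": 90}}, areas_match
)
clean = adjudicate(
    required, {"floor_plan": {"area_m2": 100}, "area_calc": {"area_m2": 100}}, areas_match
)
```

The order of the checks is the design choice that matters: the missing-document check runs first, so an incomplete submission returns insufficient data instead of a guess at compliance.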

A web-searching LLM gives a fluent, version-blended, unsourced opinion. The knowledge engine gives a version-correct, source-anchored, reviewer-aware adjudication — and knows the difference.

That sentence is the entire value proposition of controlling the truth, in one domain. The domain changes per client. The discipline doesn't.

Truth is a system, not a setting

There is no toggle on any model that makes it reliably right about your business. Reliability isn't a model property — it's something you build, by deciding what the AI is allowed to treat as true and proving that it did. That work is unglamorous, it doesn't demo, and it is the entire difference between an AI feature people enjoy and an AI product people can depend on.

It's also the part that's yours. Models are rented. A knowledge engine — structured, sourced, version-aware, owned by you — is the asset that appreciates while the models underneath it churn.

That's the part we like building best.

— PANDOTIC

Companion piece

Talk Nerdy To Me — the full buzzword decoder

In the map

Knowledgebase: a product, not a prayer

The next move

If your AI product's answers have to be right — not just fluent — that's a knowledge engine, and it's what we build. Sourced, version-aware, auditable, model-agnostic, yours to own.

Work with us →