Architecture · 2026-02-15 · 3 min

Why Can't We Just Use the LLM Directly?

LLMs are powerful, but pointing GPT-4 at your catalog and saying 'fix it' produces unreliable, unauditable output. Here's why agents need structure.

The question comes up a lot: "Why not just use ChatGPT to fix product data? Why do we need all this infrastructure?"

Here's the honest answer: you *can* use an LLM directly. And for a one-off task — rewriting 10 product descriptions — it works fine. But for operating on a catalog of 5,000+ products continuously, direct LLM usage breaks down fast.

Problem 1: Inconsistency.

Ask GPT-4 to "improve" the same product description twice and you'll get two different outputs. Ask it to fill "material" for 200 products and you'll get "100% cotton," "Cotton," "cotton fabric," and "Pure cotton" — all for the same material. Without schema constraints, LLM output is inherently inconsistent.
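Here's a minimal sketch of what a schema constraint buys you: a closed vocabulary for the attribute plus a normalization step, so four spellings of the same material collapse to one canonical value. The attribute and allowed values below are illustrative, not any particular catalog's schema.

```python
# Sketch: constraining free-text LLM output to a closed vocabulary.
# The allowed values here are illustrative only.

ALLOWED_MATERIALS = {"cotton", "polyester", "wool", "linen"}

def normalize_material(raw: str) -> str | None:
    """Map a free-text answer onto the canonical vocabulary, or reject it."""
    cleaned = raw.strip().lower()
    for canonical in ALLOWED_MATERIALS:
        if canonical in cleaned:          # "100% cotton", "Cotton fabric" -> "cotton"
            return canonical
    return None                           # unknown value: reject instead of guessing

print(normalize_material("100% Cotton"))  # -> "cotton"
print(normalize_material("Pure cotton"))  # -> "cotton"
print(normalize_material("hemp blend"))   # -> None (not in the schema)
```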

Problem 2: No audit trail.

If you pipe LLM output directly into your catalog, you have no record of what changed, why, or who approved it. Enterprise brands need change logs. They need rollback. They need to know which products were modified and what the previous values were.
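One way to make that concrete is an append-only change record that stores the previous value alongside the new one, so rollback is just re-applying what was there before. The field names below are assumptions for illustration, not an actual log format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class ChangeRecord:
    """Sketch of an append-only audit entry; field names are illustrative."""
    product_id: str
    attribute: str
    old_value: str | None
    new_value: str
    approved_by: str                      # the human or policy that approved it
    changed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def rollback(record: ChangeRecord) -> ChangeRecord:
    """Rollback is trivial when the previous value travels with the change."""
    return ChangeRecord(
        product_id=record.product_id,
        attribute=record.attribute,
        old_value=record.new_value,
        new_value=record.old_value or "",
        approved_by="rollback",
    )
```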

Problem 3: No safety net.

An unconstrained LLM might decide to "improve" your product title. Or rewrite a legally required compliance field. Or change pricing-adjacent data. Without boundaries — protected fields, scope filters, confidence thresholds — there's no way to prevent this.
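As a sketch, those boundaries can be a simple gate every proposed change has to pass before it gets anywhere near the catalog. The protected-field list and threshold here are made up for illustration.

```python
# Sketch: a gate for proposed changes. The values are illustrative.
PROTECTED_FIELDS = {"title", "compliance_statement", "price"}
MIN_CONFIDENCE = 0.8

def is_allowed(attribute: str, confidence: float) -> bool:
    """Reject edits to protected fields and low-confidence proposals outright."""
    if attribute in PROTECTED_FIELDS:
        return False
    return confidence >= MIN_CONFIDENCE

print(is_allowed("material", 0.93))            # True  -> eligible for review
print(is_allowed("title", 0.99))               # False -> protected field
print(is_allowed("care_instructions", 0.42))   # False -> below threshold
```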

Problem 4: No composability.

Direct LLM calls don't compose. You can't say "only fix products over $50" or "don't touch hero SKUs" or "auto-approve low-risk changes but flag high-risk ones." Rules like these require a structured layer between the user's intent and the LLM's output.
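Those rules become tractable once they're predicates you can combine, rather than instructions buried in a prompt. The product fields and rules in this sketch are hypothetical.

```python
# Sketch: scope rules as composable predicates. Field names are hypothetical.
from typing import Callable

Product = dict  # e.g. {"sku": "A1", "price": 79.0, "is_hero": False}
Rule = Callable[[Product], bool]

def over_price(minimum: float) -> Rule:
    return lambda p: p["price"] > minimum

def not_hero() -> Rule:
    return lambda p: not p["is_hero"]

def in_scope(product: Product, rules: list[Rule]) -> bool:
    return all(rule(product) for rule in rules)

scope = [over_price(50.0), not_hero()]   # "over $50, skip hero SKUs"
print(in_scope({"sku": "A1", "price": 79.0, "is_hero": False}, scope))  # True
print(in_scope({"sku": "B2", "price": 79.0, "is_hero": True}, scope))   # False
```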

What EKOM does differently:

The LLM is one component inside a structured pipeline. It generates *proposed* attribute values — but those proposals are validated against the canonical schema, scored for confidence, wrapped in a Patch object, and queued for approval.
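To make that concrete, here's a rough sketch of what such a wrapper could look like. The class and field names are assumptions drawn from the description above, not EKOM's actual API.

```python
from dataclasses import dataclass
from enum import Enum

class PatchStatus(Enum):
    PENDING = "pending"    # generated, not yet validated
    QUEUED = "queued"      # validated and scored, waiting for approval
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Patch:
    """Sketch of a proposal wrapper; names are illustrative, not EKOM's API."""
    product_id: str
    attribute: str
    proposed_value: str
    confidence: float
    status: PatchStatus = PatchStatus.PENDING

def queue_if_valid(patch: Patch, allowed_values: set[str], threshold: float) -> Patch:
    """Validate against the schema and check confidence before anything ships."""
    if patch.proposed_value in allowed_values and patch.confidence >= threshold:
        patch.status = PatchStatus.QUEUED
    else:
        patch.status = PatchStatus.REJECTED
    return patch
```

Nothing reaches the catalog until a queued Patch is approved, which is where the audit trail and rollback from Problem 2 come in.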

The conversational layer translates user intent into structured Policy and Job objects. The engine executes within those constraints. The audit log records everything.
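A loose sketch of those two objects, with every name and field here invented for illustration: the intent stays human-readable, while the constraints become data the engine can enforce.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    """Sketch of the constraints a job runs under; fields are illustrative."""
    protected_fields: set[str] = field(default_factory=lambda: {"title", "price"})
    min_price: float | None = None
    auto_approve_below_risk: float = 0.2

@dataclass
class Job:
    """Sketch of a unit of work the engine executes within a Policy."""
    intent: str                # e.g. "fill missing material values"
    target_attribute: str
    policy: Policy

# "Fill material for products over $50, but never touch titles or pricing."
job = Job(
    intent="fill missing material values",
    target_attribute="material",
    policy=Policy(min_price=50.0),
)
```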

The LLM provides intelligence. The system provides safety.