Architecture · 2026-02-16 · 4 min

Why Do We Need Our Own Canonical Schema?

Every platform stores product data differently. Without a canonical schema, your AI agent is just guessing. Here's why normalization is the foundation of catalog intelligence.

Every e-commerce platform stores product data differently. Shopify uses metafields. WooCommerce uses custom attributes. Salesforce Commerce has its own object model. Feed systems like Feedonomics flatten everything into TSV columns.

If you point an LLM at raw Shopify data and ask it to "fix" your product catalog, it will hallucinate attributes, invent field names, and produce inconsistent output across products. The model has no way of knowing what "material" means in your catalog versus someone else's.

The canonical schema solves this.

EKOM normalizes every product from every platform into a single, typed product object. Every attribute has a defined key, a validation rule, and a Schema.org mapping. When an agent proposes a change, it proposes a change to a *known* attribute — not a freeform string.
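A minimal Python sketch of what such a typed object could look like. The rule table, the field names, and the simplified Shopify and WooCommerce payload shapes are illustrative assumptions, not EKOM's actual schema or the platforms' full API responses. The point it shows: two differently shaped sources end up as the same canonical object, and any key without a rule never enters it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class AttributeRule:
    key: str                         # canonical attribute key
    schema_org: str                  # Schema.org property the key maps to
    required: bool                   # required vs. optional
    kind: str                        # "commercial" or "structural"
    validate: Callable[[str], bool]  # validation rule for the value


# Hypothetical subset of rules; the article cites 24 across core + apparel.
RULES: dict[str, AttributeRule] = {
    rule.key: rule
    for rule in [
        AttributeRule("title", "name", True, "commercial", lambda v: 0 < len(v) <= 150),
        AttributeRule("sku", "sku", True, "structural", lambda v: v != "" and v == v.strip()),
        AttributeRule("material", "material", False, "commercial", lambda v: len(v) > 0),
        AttributeRule("color", "color", False, "commercial", lambda v: len(v) > 0),
    ]
}


@dataclass
class CanonicalProduct:
    source: str  # platform the record came from
    attributes: dict[str, str] = field(default_factory=dict)


def canonicalize(source: str, pairs: dict[str, str]) -> CanonicalProduct:
    """Keep only keys the schema defines, validate them, check required keys."""
    product = CanonicalProduct(source)
    for key, value in pairs.items():
        rule = RULES.get(key)
        if rule is None:
            continue  # keys without a rule never enter the canonical object
        if not rule.validate(value):
            raise ValueError(f"invalid value for {key!r}: {value!r}")
        product.attributes[key] = value
    missing = [k for k, r in RULES.items() if r.required and k not in product.attributes]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")
    return product


def from_shopify(raw: dict) -> CanonicalProduct:
    # Assumed, simplified payload: metafields as {namespace, key, value} dicts.
    pairs = {"title": raw["title"], "sku": raw["variants"][0]["sku"]}
    for metafield in raw.get("metafields", []):
        pairs[metafield["key"]] = metafield["value"]  # namespace ignored in this sketch
    return canonicalize("shopify", pairs)


def from_woocommerce(raw: dict) -> CanonicalProduct:
    # Assumed, simplified payload: attributes as {name, options} dicts.
    pairs = {"title": raw["name"], "sku": raw["sku"]}
    for attr in raw.get("attributes", []):
        pairs[attr["name"].lower()] = attr["options"][0]  # first option only
    return canonicalize("woocommerce", pairs)


shop = from_shopify({
    "title": "Linen Shirt",
    "variants": [{"sku": "LS-01"}],
    "metafields": [{"namespace": "custom", "key": "material", "value": "linen"}],
})
woo = from_woocommerce({
    "name": "Linen Shirt",
    "sku": "LS-01",
    "attributes": [{"name": "Material", "options": ["linen"]}],
})
print(shop.attributes == woo.attributes)  # True
```

Only the two small adapter functions know anything about a platform; everything downstream works against `CanonicalProduct` and the rule table.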

What the schema gives us:

1. Consistency across platforms. A product from Shopify and a product from WooCommerce look identical once normalized, so agents don't need platform-specific logic.

2. Validation rules. The schema defines 24 attribute rules across the core and apparel modules. Each rule marks an attribute as required or optional and as commercial or structural, and maps it to a Schema.org property. An agent can't propose a change to an attribute that doesn't exist in the schema.

3. Drift prevention. Without schema constraints, LLM output drifts: Monday's enrichment may use different attribute names than Friday's. The schema rules this out by construction, because only defined keys are valid.

4. Confidence thresholds. Every proposed change carries a confidence score, and low-confidence fills are flagged for review. The schema defines *what* can be filled; confidence decides *whether* it should be.

5. Audit trail. Every change is a versioned patch against a known attribute, so any value can be traced back to its source: the original catalog, an agent fill, or a human override.
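These properties can be sketched as a single gate that every proposed change passes through. This is a hypothetical illustration, not EKOM's implementation: the `Proposal` and `Patch` types, the three-attribute validator table, and the 0.8 review threshold are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical rule table: canonical key -> validation rule.
VALIDATORS: dict[str, Callable[[str], bool]] = {
    "title": lambda v: 0 < len(v) <= 150,
    "material": lambda v: 0 < len(v) <= 64,
    "color": lambda v: 0 < len(v) <= 32,
}
REVIEW_THRESHOLD = 0.8  # assumed cutoff, not a documented EKOM value


@dataclass(frozen=True)
class Proposal:
    key: str
    value: str
    confidence: float  # 0.0-1.0, supplied by the proposer
    source: str        # "agent" or "human"


@dataclass(frozen=True)
class Patch:
    version: int
    key: str
    old: Optional[str]  # None if the attribute was previously empty
    new: str
    source: str


@dataclass
class Product:
    attributes: dict[str, str]
    history: list[Patch] = field(default_factory=list)
    review_queue: list[Proposal] = field(default_factory=list)


def apply_proposal(product: Product, p: Proposal) -> str:
    if p.key not in VALIDATORS:  # schema check: unknown attributes are rejected
        return "rejected: unknown attribute"
    if not VALIDATORS[p.key](p.value):  # validation rule for the value
        return "rejected: failed validation"
    if p.source == "agent" and p.confidence < REVIEW_THRESHOLD:
        product.review_queue.append(p)  # low-confidence agent fill goes to review
        return "flagged for review"
    patch = Patch(len(product.history) + 1, p.key, product.attributes.get(p.key), p.value, p.source)
    product.history.append(patch)  # versioned patch for the audit trail
    product.attributes[p.key] = p.value
    return "applied"


def provenance(product: Product, key: str) -> str:
    """Source of the latest patch for key, or the original catalog if none."""
    for patch in reversed(product.history):
        if patch.key == key:
            return patch.source
    return "original catalog"


shirt = Product({"title": "Linen Shirt", "material": "linen"})
print(apply_proposal(shirt, Proposal("fabric_feel", "soft", 0.99, "agent")))  # rejected: unknown attribute
print(apply_proposal(shirt, Proposal("color", "navy", 0.62, "agent")))        # flagged for review
print(apply_proposal(shirt, Proposal("color", "navy", 0.93, "agent")))        # applied
print(provenance(shirt, "color"), provenance(shirt, "material"))             # agent original catalog
```

The schema check runs before the confidence check on purpose: no confidence score, however high, can make a nonexistent attribute valid.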

The LLM is the brain. The schema is the spine.

Without structure, an "AI agent" is just a copywriter with API access. The canonical schema is what makes EKOM defensible — it prevents hallucinated attributes, inconsistent enrichment, and catalog drift.

This isn't over-engineering. It's the minimum viable constraint set for operating on production product data at scale.