Blog — EKOM
Engineering · 2026-02-24 · 7 min read

Not All Product Fields Are the Same

We built an optimization engine for meta titles. Then we tried to extend it to every field in the catalog. That's when we realized: different fields demand fundamentally different behavior from the agent. Here's the framework that emerged.

We started with one field: meta_title.

The idea was simple. Product meta titles are often bad — too short, missing the brand name, stuffed with SKU fragments. An AI agent should be able to evaluate a meta title, propose a better one, and deploy the fix. So we built that.

It worked. The agent would scan a catalog, flag weak meta titles, generate improved versions citing Google Merchant Center standards, and present them as patches for approval. Deploy a batch, every product gets a better title tag. Clean.

Then we tried to extend it.

The Pattern We Found

The meta_title engine had a clear pipeline: interpret the user's intent, run quality evaluation against each product, propose patches with citations, approve, deploy. We thought we could just copy this pipeline for every field — descriptions, colors, materials, GTINs, the works.

We were wrong. About halfway through, we realized we were looking at three fundamentally different problems pretending to be the same thing.

Type A: Optimization Fields

These are fields where a value already exists, but the quality is poor. The product has a title — it's just bad. The description exists — it's just two sentences of marketing fluff. The meta title is there — it's a duplicate of the product title with no brand or category signal.

The fields in this category:

  • Title
  • Meta title
  • Description
  • Meta description (coming soon)
  • Alt text (coming soon)

These are rewrite problems. Restructure, clarify, enrich. The agent can handle these without asking the merchant anything, because it's improving what already exists. It has the raw material.

The pipeline for optimization fields is now a repeatable pattern:

  1. Detect intent — the user says "improve my descriptions" or "optimize titles"
  2. Run quality evaluation — check length, check for garbage (SKU fragments, filenames), check whether brand and product type are present
  3. Propose patches — generate an improved version with a citation to the relevant standard (Google Merchant Center, Schema.org)
  4. Veto check — reject proposals that are worse than what exists (shorter by more than 30%, keyword-stuffed, generic)
  5. Approve and deploy

This is the optimization engine. Adding a new optimization field means writing a quality evaluator and a proposal generator. The pipeline, veto logic, patch lifecycle, and deployment path are all shared.
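To make the plug-in shape concrete, here is a minimal sketch of what an optimization field might look like: an evaluator and a proposal generator behind one interface, with the veto rule applied by a shared runner. All names and thresholds here are illustrative assumptions, not EKOM's actual API.

```typescript
// Hypothetical sketch of an optimization field plug-in.
interface Product {
  title: string;
  brand: string;
  metaTitle?: string;
}

interface FieldOptimizer {
  evaluate(p: Product): string[]; // list of quality issues found
  propose(p: Product): string; // improved value
  veto(current: string, proposed: string): boolean; // reject bad proposals
}

const metaTitleOptimizer: FieldOptimizer = {
  evaluate(p) {
    const issues: string[] = [];
    const mt = p.metaTitle ?? "";
    if (mt.length < 30) issues.push("too short");
    if (!mt.includes(p.brand)) issues.push("missing brand");
    if (/\bSKU[-\d]+/i.test(mt)) issues.push("contains SKU fragment");
    return issues;
  },
  propose(p) {
    return `${p.title} | ${p.brand}`;
  },
  veto(current, proposed) {
    // Shared rule: never ship something more than 30% shorter.
    return proposed.length < current.length * 0.7;
  },
};

// Shared pipeline step: emit a patch only if issues exist and the veto passes.
function runOptimizer(
  opt: FieldOptimizer,
  p: Product,
  current: string
): string | null {
  if (opt.evaluate(p).length === 0) return null;
  const proposed = opt.propose(p);
  return opt.veto(current, proposed) ? null : proposed;
}
```

Adding a new field then means writing another `FieldOptimizer`; the runner, veto gate, and everything downstream stay shared.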

What Broke When We Built It

The first time we deployed meta_title patches, the deploy handler wrote to the database correctly. The audit trail showed the change. But when we navigated to the product detail page, the old value was still there.

Two bugs, stacked on top of each other.

First, the frontend wasn't invalidating its cache after deploy. React Query had stale product data, so the page showed the pre-deploy values even though the database had the new ones. We had to add explicit cache invalidation for product, catalog, and job queries after every deploy action.
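The pattern behind the fix is worth sketching. This is not React Query itself, just a toy cache with stale flags that shows the shape of the rule: after a deploy, mark every query family the write could have touched. With React Query, the equivalent is calling `queryClient.invalidateQueries` for the product, catalog, and job query keys.

```typescript
// Toy sketch of cache invalidation after deploy (not the React Query API).
class TinyQueryCache {
  private stale = new Map<string, boolean>();
  set(key: string) {
    this.stale.set(key, false);
  }
  isStale(key: string) {
    return this.stale.get(key) ?? true;
  }
  invalidatePrefix(prefix: string) {
    for (const key of this.stale.keys()) {
      if (key.startsWith(prefix)) this.stale.set(key, true);
    }
  }
}

// After a deploy, invalidate every query family the write could affect.
function afterDeploy(cache: TinyQueryCache) {
  for (const prefix of ["product", "catalog", "jobs"]) {
    cache.invalidatePrefix(prefix);
  }
}
```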

Second, the product detail page didn't know meta_title existed as a renderable field. We had a field label map that controlled which attributes showed up on the page, and meta_title wasn't in it. The data was in the database, the API was returning it, but the UI was silently dropping it.
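The failure mode is easy to reproduce: a label map doubles as an allowlist, so any field absent from the map renders nothing, even when the API returns it. A minimal sketch (field names illustrative):

```typescript
// A label map acting as an implicit allowlist for the product page.
const FIELD_LABELS: Record<string, string> = {
  title: "Title",
  description: "Description",
  // meta_title was originally missing here, so the UI silently dropped it.
  meta_title: "Meta title",
};

// Any key not in FIELD_LABELS vanishes before rendering.
function renderableFields(
  product: Record<string, string>
): [string, string][] {
  return Object.entries(product)
    .filter(([key]) => key in FIELD_LABELS)
    .map(([key, value]) => [FIELD_LABELS[key], value]);
}
```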

Neither bug would have surfaced in a unit test. They only appeared when a human clicked "deploy" and then looked at the product. This is the kind of thing that makes end-to-end testing essential for agent systems — the agent's work is only real if the user can see the result.

We fixed both issues. But more importantly, the episode taught us something about the separation architecture: every layer in the pipeline (agent proposal, engine validation, database write, cache invalidation, UI rendering) is a place where a field can silently disappear. When you add a new field, you have to trace the full path.

Type B: Gap-Fill Fields

These are fields where the value is missing, but the answer is inferrable from existing product data. The product doesn't have a "color" attribute, but the title says "Navy Blue Compression Sleeve" and the tags include "blue." The product type is missing, but it's clearly a massage gun based on the description and category.

The fields in this category:

  • Category / product type
  • Color (often)
  • Material (often)
  • Gender / age group (sometimes)
  • Sizes (sometimes)

These are extraction problems, not generation problems. The agent isn't inventing data. It's reading the description, the tags, the variant options, and pulling out structured values that already exist in unstructured form.

This is the gap engine — the system we built first, before the optimization engine existed. It scans every product against a canonical schema, identifies missing required and recommended attributes, and proposes fills based on what it can extract.

The key difference from optimization: gap-fill fields start empty. There's no existing value to evaluate or improve. The agent's job is to find the answer in context and propose it with a confidence score. High-confidence fills (color extracted directly from tags) can be auto-approved. Lower-confidence fills (material inferred from a description paragraph) get flagged for human review.
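A sketch of how a confidence-scored fill might work, using the color example above. The color list, confidence values, and auto-approval rule are assumptions for illustration:

```typescript
// Illustrative gap-fill extraction: find a color in existing product
// data and attach a confidence that drives the approval path.
const KNOWN_COLORS = ["black", "white", "navy blue", "blue", "red", "green"];

interface Fill {
  value: string;
  confidence: number;
  autoApprove: boolean;
}

function extractColor(title: string, tags: string[]): Fill | null {
  const lowerTags = tags.map((t) => t.toLowerCase());
  // Direct tag match: high confidence, eligible for auto-approval.
  for (const color of KNOWN_COLORS) {
    if (lowerTags.includes(color)) {
      return { value: color, confidence: 0.95, autoApprove: true };
    }
  }
  // Mention in the title only: propose it, but flag for human review.
  for (const color of KNOWN_COLORS) {
    if (title.toLowerCase().includes(color)) {
      return { value: color, confidence: 0.7, autoApprove: false };
    }
  }
  return null;
}
```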

Type C: Knowledge-Required Fields

This is where the framework gets its teeth.

These are fields where the value is missing and cannot be inferred, generated, or guessed. They are facts about the physical product that exist in the merchant's head, their supplier's spreadsheet, or their packaging — not in the product listing.

The fields in this category:

  • GTIN / UPC / MPN
  • Country of origin
  • Certifications (organic, cruelty-free, FSC)
  • Carbon impact / sustainability claims
  • Care instructions (often)
  • Compliance claims (safety, regulatory)

An LLM cannot fill these. A GTIN is an 8-, 12-, 13-, or 14-digit identifier assigned through GS1. You can't generate one. A country of origin is a fact about a supply chain. You can't infer it from a product description. A certification claim is a legal assertion — inventing one is not just wrong, it's potentially illegal.

This is the category where most AI catalog tools fail silently. They either skip these fields entirely (leaving critical gaps), or worse, they hallucinate plausible-sounding values. We've seen systems generate fake GTINs that pass checksum validation but reference products that don't exist.
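Why is a checksum-valid fake so easy to produce? The GS1 check digit is a public algorithm: from the rightmost payload digit, weight digits 3, 1, 3, 1, … and require the total (including the check digit) to be a multiple of 10. Any generator can satisfy it, which is exactly why checksum validity proves nothing about the product existing.

```typescript
// GS1 check-digit validation for GTIN-8/12/13/14.
// Necessary for a well-formed GTIN; nowhere near sufficient for a real one.
function isChecksumValidGtin(gtin: string): boolean {
  if (!/^(\d{8}|\d{12}|\d{13}|\d{14})$/.test(gtin)) return false;
  const digits = gtin.split("").map(Number);
  const check = digits.pop()!; // last digit is the check digit
  const sum = digits
    .reverse() // weight from the right: 3, 1, 3, 1, ...
    .reduce((acc, d, i) => acc + d * (i % 2 === 0 ? 3 : 1), 0);
  return (10 - (sum % 10)) % 10 === check;
}
```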

Our approach: the agent should never try to fill knowledge-required fields. Instead, the system should surface them as information requests.

"This product is missing a GTIN (required for Google Shopping). Provide the UPC or upload a product sheet."

"Country of origin is missing. Select from the list or enter manually."

"No certifications are declared. If this product has certifications, add them here."

This turns the agent from a data generator into a data collector. It knows what's missing, it knows why it matters (with citations to the relevant standard), and it knows it can't fill the gap itself. So it asks.

The UX for knowledge-required fields is fundamentally different from the other two types. There's no "propose and approve" flow. There's a "request and provide" flow. The agent creates a structured request, the merchant provides the answer, and the system validates and stores it.
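One way to sketch the request-and-provide flow: the request names the missing field, states why it matters with a citation, and carries a validator for the merchant's answer. The agent only validates structure; the value itself must come from the merchant. All field names and messages below are illustrative, not EKOM's actual types.

```typescript
// Hypothetical shape of a structured information request.
interface InfoRequest {
  field: string;
  reason: string;
  citation: string;
  validate(answer: string): boolean;
}

const gtinRequest: InfoRequest = {
  field: "gtin",
  reason: "Required for Google Shopping listings.",
  citation: "Google Merchant Center: gtin attribute",
  // Structural validation only; the agent never invents the value.
  validate: (answer) => /^(\d{8}|\d{12}|\d{13}|\d{14})$/.test(answer),
};

function resolveRequest(
  req: InfoRequest,
  answer: string
): { ok: boolean; field: string } {
  return { ok: req.validate(answer.trim()), field: req.field };
}
```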

Why the Framework Matters

When we started building EKOM, we treated every field the same way. A field is a field — it has a value, or it doesn't, and the agent should fill it. That simplicity was appealing but wrong.

The framework — optimization, gap-fill, knowledge-required — emerged from building and testing. We didn't design it upfront. We discovered it by watching things break.

Meta titles taught us the optimization pattern. Colors and materials taught us the gap-fill pattern. GTINs taught us that some things should never be generated.

Here's why this matters beyond our codebase:

AI search engines like ChatGPT, Perplexity, and Google AI Overviews consume structured product data. The richer and more accurate your catalog, the more likely your products surface in AI-generated answers. But "rich and accurate" means different things for different fields.

For titles and descriptions, richness means quality — clear, well-structured, brand-inclusive. An optimization engine handles this.

For attributes like color, material, and size, richness means completeness — every field filled, every value normalized. A gap engine handles this.

For compliance and identification fields, accuracy means truth — real GTINs, real certifications, real origin data. Only the merchant can provide this.

You can't build one system that handles all three. Or rather, you can — but it will hallucinate GTINs, skip hard-to-fill fields, and rewrite titles that didn't need rewriting. We tried. It doesn't work.

Adding New Fields

The framework also makes the engineering work predictable. When we need to add meta_description as an optimizable field, we know exactly what to build:

  1. A quality evaluator (check length, check for missing brand, check for garbage)
  2. A proposal generator (build a better version from available product data)
  3. Veto rules (don't shorten, don't keyword-stuff)
  4. A GMC citation (the relevant Google Merchant Center requirement)
  5. Keywords for intent routing ("improve meta descriptions" maps to the right field)

That's it. The patch lifecycle, the approval flow, the deploy mechanism, the cache invalidation, the audit trail — all shared infrastructure. Plug in the field-specific logic and the rest is handled.
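The five items above could be bundled as one declarative spec per field and registered with the shared engine. A hedged sketch, with names and thresholds that are illustrative assumptions rather than EKOM's actual interfaces:

```typescript
// Hypothetical per-field spec registered with a shared engine.
interface OptimizationFieldSpec {
  field: string;
  intentKeywords: string[];
  citation: string;
  evaluate(value: string): string[]; // quality issues
  propose(title: string, brand: string): string; // improved value
}

const registry = new Map<string, OptimizationFieldSpec>();

function registerField(spec: OptimizationFieldSpec) {
  registry.set(spec.field, spec);
}

registerField({
  field: "meta_description",
  intentKeywords: ["improve meta descriptions", "optimize meta descriptions"],
  citation: "Google Merchant Center: description requirements",
  evaluate: (v) => (v.length < 70 ? ["too short"] : []),
  propose: (title, brand) => `Shop the ${title} from ${brand}.`,
});
```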

For a gap-fill field, the work is different: define the extraction rules, set the confidence thresholds, add it to the canonical schema.

For a knowledge-required field, it's about building the right request UX — what question to ask, what input format to accept, what validation to run on the answer.

Three types of problems. Three types of solutions. One framework that makes each one a known pattern instead of a one-off build.