21 June 2026

Allergen Data Needs Evidence, Not Keywords

A builder-focused guide to modelling recipe allergens with source, jurisdiction, confidence, and user-visible caveats instead of keyword guesses.

allergens, api, data-modeling, developers

Allergen filters are trust features

A recipe app can get away with fuzzy matching for inspiration. It cannot get away with fuzzy matching when a user is avoiding peanuts, milk, sesame, gluten, or shellfish. Allergen handling is not just another search facet; it is a trust feature that needs explicit data provenance, conservative defaults, and clear product language.

The common shortcut is to scan ingredient text for a small list of allergen words. That works for demos, but it breaks quickly in real products. Ingredients use aliases, regional names, compound foods, brand-specific formulations, and preparation notes. A safe recipe experience needs to model where an allergen claim came from and how confident the system is allowed to be.
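
To make the failure mode concrete, here is a minimal sketch of the keyword shortcut. The keyword list and the sample ingredient lines are hypothetical, chosen only to illustrate aliases and accidental substring hits.

```python
# Hypothetical keyword list of the kind a demo might hard-code.
NAIVE_KEYWORDS = {
    "milk": ["milk"],
    "sesame": ["sesame"],
    "tree_nuts": ["nut"],
}


def naive_scan(ingredient_lines):
    """Return allergen keys whose keyword is a substring of any ingredient line."""
    found = set()
    for line in ingredient_lines:
        text = line.lower()
        for allergen, words in NAIVE_KEYWORDS.items():
            if any(word in text for word in words):
                found.add(allergen)
    return found


ingredients = [
    "2 tbsp tahini",       # sesame paste, but the word "sesame" never appears
    "1 cup cashew cream",  # a tree nut, but there is no "nut" substring
    "1 tbsp ghee",         # clarified butter, a milk product
    "1 butternut squash",  # not a nut, but contains the substring "nut"
]

print(sorted(naive_scan(ingredients)))  # ['tree_nuts']
```

The scan misses sesame and milk entirely, and it reports tree nuts only because of the squash, not the cashews.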

Keep dietary flags separate from allergen claims

Dietary labels and allergen labels answer different questions.

A recipe can be vegetarian and still contain milk. A gluten-free label might come from a packaged product claim, a recipe author note, or a parser that simply failed to find wheat. A vegan recipe can still be unsafe for a tree-nut allergy if cashews are used as a cream substitute.

Treat these as separate API fields:

{
  "dietaryTags": ["vegetarian"],
  "allergens": [
    {
      "allergen": "milk",
      "status": "contains",
      "source": "ingredient_match",
      "confidence": "high"
    }
  ]
}

That separation makes the contract honest. Dietary tags support discovery and preference matching. Allergen fields support risk communication and should carry source, status, and confidence.
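
On the server side, one way to keep the two concepts from drifting together is to give them separate types. The sketch below is an assumption about how that could look in Python. The class names and the "medium" and "low" confidence levels are hypothetical, while the field names mirror the JSON above.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Literal

Status = Literal["contains", "may_contain", "free_from", "unknown"]
Confidence = Literal["high", "medium", "low"]  # "medium" and "low" are assumed levels


@dataclass
class AllergenClaim:
    """A risk statement that always carries its status, source, and confidence."""
    allergen: str
    status: Status
    source: str
    confidence: Confidence


@dataclass
class RecipeLabels:
    """Dietary tags and allergen claims live in separate fields by construction."""
    dietary_tags: List[str] = field(default_factory=list)
    allergens: List[AllergenClaim] = field(default_factory=list)

    def to_api(self) -> dict:
        # Serialize to the camelCase shape used in the JSON example.
        return {
            "dietaryTags": list(self.dietary_tags),
            "allergens": [asdict(claim) for claim in self.allergens],
        }


labels = RecipeLabels(
    dietary_tags=["vegetarian"],
    allergens=[AllergenClaim("milk", "contains", "ingredient_match", "high")],
)
print(labels.to_api())
```

Because a dietary tag is a plain string and an allergen claim is a structured object, a "vegetarian" tag cannot be mistaken for a statement about milk.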

Start with a jurisdiction-aware allergen vocabulary

Allergen lists are not universal. The FDA identifies nine major food allergens for United States packaged-food labelling: milk, eggs, fish, crustacean shellfish, tree nuts, peanuts, wheat, soybeans, and sesame. The European Union's Food Information to Consumers regulation uses its own Annex II list of fourteen substances or products causing allergies or intolerances: cereals containing gluten, crustaceans, eggs, fish, peanuts, soybeans, milk, nuts, celery, mustard, sesame seeds, sulphur dioxide and sulphites, lupin, and molluscs.

For an API, that means the allergen vocabulary should not be a hard-coded global enum with no context. A practical model needs at least:

canonical allergen key
display label
jurisdiction or vocabulary source
aliases and ingredient terms
whether the match is exact, derived, or declared
whether the claim applies to the whole recipe or one ingredient

This is especially important for products that serve both US and EU users, or that ingest global packaged-food data.
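
A minimal sketch of such a vocabulary follows. The jurisdiction identifiers, the class name, and the tiny alias lists are assumptions for illustration. A real vocabulary would cover every entry in each official list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical identifiers for the two vocabularies discussed above.
US_FDA = "us_fda_major_allergens"
EU_FIC = "eu_fic_annex_ii"


@dataclass(frozen=True)
class VocabularyEntry:
    key: str                       # canonical allergen key
    label: str                     # display label
    jurisdiction: str              # vocabulary source
    aliases: Tuple[str, ...] = ()  # ingredient terms that imply the allergen


# Deliberately partial: a few entries to show per-jurisdiction differences.
VOCABULARY = [
    VocabularyEntry("sesame", "Sesame", US_FDA, ("tahini",)),
    VocabularyEntry("sesame", "Sesame seeds", EU_FIC, ("tahini",)),
    VocabularyEntry("celery", "Celery", EU_FIC),
    VocabularyEntry("mustard", "Mustard", EU_FIC),
]


def match_term(term: str, jurisdiction: str) -> Optional[Tuple[VocabularyEntry, str]]:
    """Look up a term in one jurisdiction and report how it matched."""
    needle = term.strip().lower()
    for entry in VOCABULARY:
        if entry.jurisdiction != jurisdiction:
            continue
        if needle == entry.key:
            return entry, "exact"
        if needle in entry.aliases:
            return entry, "derived"
    return None


print(match_term("tahini", US_FDA))
print(match_term("celery", US_FDA))  # None: not in the US list used here
print(match_term("celery", EU_FIC))
```

The same ingredient term resolves differently depending on which vocabulary the caller asks for, which is the behaviour a single global enum cannot express.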

Distinguish contains, may contain, and unknown

A boolean hasPeanuts field is too coarse for real food data. Users and product teams need to distinguish confirmed presence, precautionary traces, absence claims, and unknown status.

A better shape is a status field:

{
  "allergen": "peanut",
  "status": "unknown",
  "evidence": []
}

Then let stronger evidence update the status:

{
  "allergen": "peanut",
  "status": "contains",
  "evidence": [
    {
      "type": "ingredient",
      "field": "ingredients[3].name",
      "value": "peanut butter"
    }
  ]
}

Use unknown as a real value, not a missing field. Missing fields are easy for clients to misread as safe. Unknown tells the UI to avoid promises and, when appropriate, ask the user to review the ingredient list.
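
The update rule can be sketched as a small resolver. This is one possible policy, not a standard: the priority order below and the per-evidence "status" key are assumptions added for illustration, and they are not part of the JSON shown above.

```python
# Assumed conservative ordering: a stronger risk signal always wins.
STATUS_PRIORITY = {"unknown": 0, "free_from": 1, "may_contain": 2, "contains": 3}


def resolve_status(evidence):
    """Start at 'unknown' and move only toward the highest-priority status seen."""
    status = "unknown"
    for item in evidence:
        candidate = item.get("status", "unknown")  # hypothetical per-evidence key
        if STATUS_PRIORITY.get(candidate, 0) > STATUS_PRIORITY[status]:
            status = candidate
    return status


def allergen_record(allergen, evidence):
    """Always emit a status, even when there is no evidence at all."""
    return {
        "allergen": allergen,
        "status": resolve_status(evidence),
        "evidence": list(evidence),
    }


no_evidence = allergen_record("peanut", [])
with_evidence = allergen_record(
    "peanut",
    [
        {"type": "declaration", "status": "free_from"},
        {"type": "ingredient", "field": "ingredients[3].name",
         "value": "peanut butter", "status": "contains"},
    ],
)
print(no_evidence["status"])    # unknown
print(with_evidence["status"])  # contains
```

With this ordering, a conflicting absence claim never overrides an ingredient hit, and an empty evidence list still yields an explicit unknown rather than a missing field.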

Preserve evidence for debugging and UI copy

Allergen mistakes are hard to diagnose if the API only returns a final flag. Preserve the evidence that produced the decision.

Open Food Facts, for example, exposes separate product fields for ingredient text, allergen values, allergen tags, traces, and trace tags. That is a useful pattern: the system can distinguish what the package or data source declares from what a downstream parser infers.

For recipe data, evidence can come from several places:

a normalized ingredient identity
an original ingredient string
a packaged-product allergen declaration
a user-submitted exclusion claim
a source recipe note
a human review override
an AI-generated or parser-generated inference

Those sources should not have the same weight. A human-reviewed packaged product declaration is different from a keyword hit in free text. Store that difference so the frontend, support team, and data pipeline can explain the result.
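
One way to store that difference is an explicit weight per evidence source. The source names and numeric weights below are hypothetical. The point of the sketch is only that the ranking is written down in data rather than implied by pipeline order.

```python
from typing import Dict, List, Optional

# Hypothetical weights: higher means the claim is easier to stand behind.
SOURCE_WEIGHT: Dict[str, int] = {
    "human_review": 5,
    "packaged_declaration": 4,
    "normalized_ingredient": 3,
    "recipe_note": 2,
    "user_claim": 2,
    "original_string": 1,
    "generated_inference": 1,
}


def rank_evidence(evidence: List[dict]) -> List[dict]:
    """Order evidence from strongest to weakest source; unknown sources rank last."""
    return sorted(
        evidence,
        key=lambda item: SOURCE_WEIGHT.get(item.get("source"), 0),
        reverse=True,
    )


def strongest(evidence: List[dict]) -> Optional[dict]:
    """Return the best-supported piece of evidence, or None if there is none."""
    ranked = rank_evidence(evidence)
    return ranked[0] if ranked else None


evidence = [
    {"source": "original_string", "value": "may contain traces of nuts"},
    {"source": "packaged_declaration", "value": "tree nuts"},
]
print(strongest(evidence)["source"])  # packaged_declaration
```

Keeping the full ranked list, not just the winner, lets support staff see which weaker signals were outvoted.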

Avoid silent safety claims from generated recipes

AI-generated recipes create another trap. A model may generate an ingredient list that appears allergen-free while using ambiguous names, substitutions, or branded products it cannot verify. If generated recipes flow into the same API as catalog recipes, the allergen pipeline should still run, but the confidence should reflect the source.

A generated recipe can safely say "no peanut ingredient detected" only if the product language makes the limit clear. It should not silently become "peanut-free" unless the data pipeline has stronger evidence and the business is prepared to stand behind that claim.

In practice, keep generated content conservative:

normalize generated ingredients into the same ingredient schema as catalog recipes
run the same allergen matching rules
mark evidence as generated or inferred
expose confidence and review status
avoid stronger UI labels than the evidence supports
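
The last two points can be sketched as a pair of small policy functions. The origin values, the confidence cap, and the label wording are assumptions. Each product would set its own policy with its own review process.

```python
def cap_confidence(confidence: str, origin: str, reviewed: bool) -> str:
    """In this sketch, unreviewed generated content never exceeds 'low' confidence."""
    if origin == "generated" and not reviewed:
        return "low"
    return confidence


def absence_label(allergen: str, strong_evidence: bool) -> str:
    """Choose UI copy for 'nothing found' without overstating what is known.

    strong_evidence is a hypothetical flag the caller sets only when a declared
    or human-reviewed source backs the absence claim.
    """
    if strong_evidence:
        return f"{allergen}-free"
    return f"No {allergen} ingredient detected. Please review the ingredient list."


print(cap_confidence("high", origin="generated", reviewed=False))  # low
print(cap_confidence("high", origin="catalog", reviewed=False))    # high
print(absence_label("peanut", strong_evidence=False))
```

A generated recipe therefore lands on the weaker wording by default and reaches the "-free" label only through an explicit decision.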

Make the client contract boring and explicit

A useful allergen contract does not need to expose every internal rule. It does need to prevent client teams from inventing unsafe interpretations.

A builder-grade response can look like this:

{
  "allergenSummary": {
    "contains": ["milk", "tree_nuts"],
    "mayContain": [],
    "freeFrom": [],
    "unknown": ["sesame"]
  },
  "allergenEvidence": [
    {
      "allergen": "tree_nuts",
      "status": "contains",
      "source": "normalized_ingredient",
      "ingredient": "cashew cream",
      "confidence": "high"
    }
  ]
}

The summary is easy for apps to render. The evidence gives product and support teams enough context to investigate.
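
On the client side, the main risk is treating an allergen that appears in no bucket as safe. A minimal sketch of a defensive reader, assuming the response shape above (the helper name is hypothetical):

```python
def status_for(summary: dict, allergen: str) -> str:
    """Return the bucket an allergen sits in, defaulting to 'unknown'.

    An allergen missing from every bucket is reported as unknown and never
    as free from, so an incomplete response cannot read as a safety claim.
    """
    for bucket in ("contains", "mayContain", "freeFrom"):
        if allergen in summary.get(bucket, []):
            return bucket
    return "unknown"


summary = {
    "contains": ["milk", "tree_nuts"],
    "mayContain": [],
    "freeFrom": [],
    "unknown": ["sesame"],
}

print(status_for(summary, "milk"))    # contains
print(status_for(summary, "sesame"))  # unknown
print(status_for(summary, "peanut"))  # unknown: absent from every bucket
```

Checking the risk buckets first also means that if an allergen is ever listed in two buckets by mistake, the client shows the more cautious one.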

What builders should do differently

If you are building a meal planner, grocery app, nutrition tracker, or AI cooking assistant, treat allergens as a data model instead of a label. Start with these rules:

Do not collapse dietary preferences and allergen claims into one tag array.
Use a jurisdiction-aware allergen vocabulary.
Return contains, may_contain, free_from, and unknown as different states.
Preserve evidence and source metadata for each allergen decision.
Make generated or inferred claims visibly weaker than declared or reviewed claims.
Avoid frontend-only keyword scans as the source of truth.

Recipe API is built for this kind of structured food-data workflow: normalized ingredients, per-serving nutrition, consistent recipe objects, and API fields that let applications reason about food instead of reparsing recipe text. Allergen support should follow the same principle. The more sensitive the claim, the more explicit the evidence needs to be.
