
// Artifact · v0.1

Evidence Package Specification (EPS)

2026-06-29

Overview

The Evidence Package Specification (EPS) is a draft interchange format for AI-generated outputs that must be verified, exchanged, and preserved over time. Think of it as OpenAPI for evidence: a portable object any system can export so downstream auditors, benchmarks, and regulators can inspect what happened without depending on the original runtime.

EPS v0.1 is aligned with RFC-001: Evidence JSON v1.0 in the Dali repository. This page is the public-facing summary; RFC-001 is the canonical spec.

Repo: github.com/yenklabs/Dali · Canonical spec: RFC-001: Evidence JSON v1.0 · Related: reproducible evidence bundles


Design goals

  1. Portable — one self-contained record, not a scattered row in a benchmark table.
  2. Verifiable — cryptographic hashes and replay metadata so outcomes can be re-checked offline.
  3. Domain-extensible — legal citations are the first vertical; contracts, policies, and clinical guidance use the same envelope.
  4. Benchmark-ready — every EPS instance can be scored against the verification taxonomy.
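
The "verifiable" goal can be illustrated with a minimal Python sketch of an offline hash check. It assumes digests are written as `sha256:` followed by lowercase hex, which is how the `sha256:…` placeholders in the example envelope read; the helper names `sha256_tag` and `blob_matches` are hypothetical and not part of EPS or Dali.

```python
import hashlib
from typing import Optional


def sha256_tag(data: bytes) -> str:
    """Return a digest string in the assumed 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def blob_matches(data: bytes, expected_tag: Optional[str]) -> bool:
    """Check a locally held blob against a recorded digest.

    A missing digest (None, i.e. JSON null) is treated as unverifiable,
    not as a match.
    """
    if expected_tag is None:
        return False
    return sha256_tag(data) == expected_tag


if __name__ == "__main__":
    blob = b"example source text"
    recorded = sha256_tag(blob)            # what an exporter might store
    print(blob_matches(blob, recorded))    # same bytes, so the check passes
    print(blob_matches(blob + b"!", recorded))
```

Because the check needs only the bytes and the recorded digest, it can run without access to the original runtime, which is the property the goal describes.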

Evidence classes

Failures are one class of evidence, not the product. EPS supports multiple evidence_class values:

Class               Description
failure             Documented breakdown (fabrication, unsupported proposition, etc.)
verified            Authority exists, proposition supported, bundle complete
disputed            Competing interpretations or unresolved human review
benchmark_artifact  Synthetic or curated eval row with golden expectations
replay_trace        Deterministic re-run output sealed under a policy version

The Open Evidence Corpus (open-evidence-corpus) maps to evidence_class: failure in v0.1.
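
A consumer might model the five classes as a closed enumeration so that unknown values are rejected on import. The following is a sketch only; `EvidenceClass` and `parse_evidence_class` are hypothetical names, and whether v0.1 allows additional class values is not stated here.

```python
from enum import Enum


class EvidenceClass(str, Enum):
    """The evidence_class values listed for EPS v0.1."""

    FAILURE = "failure"
    VERIFIED = "verified"
    DISPUTED = "disputed"
    BENCHMARK_ARTIFACT = "benchmark_artifact"
    REPLAY_TRACE = "replay_trace"


def parse_evidence_class(value: str) -> EvidenceClass:
    """Map a raw string to an EvidenceClass, with a readable error."""
    try:
        return EvidenceClass(value)
    except ValueError:
        allowed = ", ".join(c.value for c in EvidenceClass)
        raise ValueError(
            f"unknown evidence_class {value!r}; expected one of: {allowed}"
        ) from None


if __name__ == "__main__":
    print(parse_evidence_class("failure").name)
```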


EPS v0.1 envelope

{
  "eps_version": "0.1",
  "package_id": "eps-001-mata-v-avianca",
  "evidence_class": "failure",
  "created_at": "2026-06-08T00:00:00Z",
  "domain": "legal",

  "prompt": "Research whether the statute of limitations was tolled…",
  "model": {
    "provider": "openai",
    "name": "gpt-3.5-turbo",
    "config_fingerprint": "sha256:…"
  },
  "context": {
    "retrieved_sources": [],
    "runtime_state_preserved": false
  },
  "output": {
    "text": "…",
    "citations": ["Varghese v. China Southern Airlines, 925 F.3d 1339 (11th Cir. 2019)"]
  },

  "sources": [
    {
      "authority_id": "cite-001",
      "citation_string": "Varghese v. China Southern Airlines, 925 F.3d 1339 (11th Cir. 2019)",
      "source_blob_sha256": null,
      "retrieval_snapshot_uri": null
    }
  ],

  "verification": {
    "policy_version": "dali-tier1-v0.2",
    "primary_outcome": "authority_not_found",
    "outcomes_by_authority": {
      "cite-001": "authority_not_found"
    },
    "summary": "Citation does not resolve in canonical reporter indices."
  },

  "taxonomy": "dali-verification-taxonomy/v0.1",
  "replay_hash": "sha256:…",
  "review_status": "documented_incident",

  "evidence_bundle": {
    "merkle_root": "sha256:…",
    "bundle_uri": null,
    "yenklabs_investigation_url": "https://yenklabs.com/failures/001-mata-v-avianca"
  },

  "annotations": {
    "human_reviewer": null,
    "reviewer_notes": null
  }
}
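
In the example above, each key in `verification.outcomes_by_authority` matches an `authority_id` in `sources`. A reader could check that correspondence with a short Python sketch like the one below. Treating a mismatch as a problem is an assumption made for illustration, not a rule stated on this page, and `check_authority_outcomes` is a hypothetical helper.

```python
from typing import Dict, List


def check_authority_outcomes(package: Dict) -> List[str]:
    """Cross-check sources against per-authority outcomes.

    Returns a list of human-readable problems; an empty list means every
    source has an outcome and every outcome refers to a listed source.
    """
    source_ids = {s["authority_id"] for s in package.get("sources", [])}
    outcomes = package.get("verification", {}).get("outcomes_by_authority", {})
    outcome_ids = set(outcomes)

    problems = []
    for missing in sorted(source_ids - outcome_ids):
        problems.append(f"{missing}: source has no recorded outcome")
    for extra in sorted(outcome_ids - source_ids):
        problems.append(f"{extra}: outcome has no matching source")
    return problems


if __name__ == "__main__":
    # Trimmed copy of the envelope above, keeping only the fields used here.
    package = {
        "sources": [{"authority_id": "cite-001"}],
        "verification": {
            "primary_outcome": "authority_not_found",
            "outcomes_by_authority": {"cite-001": "authority_not_found"},
        },
    }
    print(check_authority_outcomes(package))
```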

Required fields (v0.1)

Field            Required     Notes
eps_version      yes          Spec version ("0.1")
package_id       yes          Stable identifier
evidence_class   yes          See table above
prompt           yes          User or system prompt that produced the output
model            yes          Provider, name, config fingerprint
output           yes          Model output text and extracted citations
verification     yes          Outcome(s) under a named policy version
taxonomy         yes          Taxonomy dataset or version URI
replay_hash      recommended  Hash of deterministic replay under fixed policy
evidence_bundle  recommended  Merkle root and/or bundle URI
sources          recommended  Retrieved or cited primary materials
review_status    optional     documented_incident, peer_reviewed, draft, etc.
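
The table above translates directly into a presence check. The Python sketch below reports which required and recommended top-level keys are absent from a package; it checks presence only, not types or nested structure, and `check_fields` is a hypothetical helper rather than part of any Dali tooling.

```python
from typing import Dict, List, Tuple

# Field names taken from the v0.1 table above.
REQUIRED_FIELDS = (
    "eps_version", "package_id", "evidence_class", "prompt",
    "model", "output", "verification", "taxonomy",
)
RECOMMENDED_FIELDS = ("replay_hash", "evidence_bundle", "sources")


def check_fields(package: Dict) -> Tuple[List[str], List[str]]:
    """Return (missing_required, missing_recommended) top-level keys."""
    missing_required = [f for f in REQUIRED_FIELDS if f not in package]
    missing_recommended = [f for f in RECOMMENDED_FIELDS if f not in package]
    return missing_required, missing_recommended


if __name__ == "__main__":
    draft = {"eps_version": "0.1", "package_id": "eps-001-mata-v-avianca"}
    required, recommended = check_fields(draft)
    print("missing required:", required)
    print("missing recommended:", recommended)
```

A stricter consumer could treat any missing required field as a hard failure and surface missing recommended fields as warnings.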

Relationship to Dali

Layer                   Role
EPS                     Interchange format: how evidence is exported and exchanged
Dali                    Verification engine: scores packages, seals replay hashes, runs offline eval
Open Evidence Corpus    Public archive: open-evidence-corpus on Hugging Face
Verification Benchmark  Measurement: whether evidence in a package can be trusted

Roadmap


Citation

@misc{yenklabs_eps_2026,
  title={Evidence Package Specification (EPS) v0.1},
  author={YenkLabs},
  year={2026},
  url={https://yenklabs.com/artifacts/evidence-package-spec-v0.1},
  note={Draft interchange format for portable AI evidence}
}

Public asset from the Dali open evidence ecosystem.