Overview
The Evidence Package Specification (EPS) is a draft interchange format for AI-generated outputs that must be verified, exchanged, and preserved over time. Think of it as OpenAPI for evidence: a portable object any system can export so downstream auditors, benchmarks, and regulators can inspect what happened without depending on the original runtime.
EPS v0.1 is a draft interchange envelope aligned with RFC-001: Evidence JSON v1.0 in the Dali repository. This page is the public-facing summary; the canonical spec is RFC-001.
Repo: github.com/yenklabs/Dali · Canonical spec: RFC-001: Evidence JSON v1.0 · Related: reproducible evidence bundles
Design goals
- Portable — one self-contained record, not a scattered row in a benchmark table.
- Verifiable — cryptographic hashes and replay metadata so outcomes can be re-checked offline.
- Domain-extensible — legal citations are the first vertical; contracts, policies, and clinical guidance use the same envelope.
- Benchmark-ready — every EPS instance can be scored against the verification taxonomy.
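The Verifiable goal relies on hashes that can be recomputed offline. As a minimal sketch, a `sha256:`-prefixed digest of that shape could be produced and re-checked as below. The helper names (`sha256_tag`, `canonical_bytes`, `digest_matches`) and the canonicalization rule (sorted keys, compact separators, UTF-8) are assumptions made for illustration. This page does not define them, and RFC-001 may specify something different.

```python
import hashlib
import hmac
import json


def sha256_tag(data: bytes) -> str:
    """Return a 'sha256:<hex>' string for the given bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def canonical_bytes(obj) -> bytes:
    """Serialize a JSON-compatible object deterministically.

    Assumption: sorted keys, compact separators, UTF-8 encoding.
    This is an illustrative rule, not the one Dali uses.
    """
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest_matches(data: bytes, recorded: str) -> bool:
    """Re-check recorded digest against freshly hashed bytes."""
    return hmac.compare_digest(sha256_tag(data), recorded)
```

Because the serialization is deterministic, two parties holding the same object compute the same digest regardless of key order in their in-memory representation.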
Evidence classes
Failures are one class of evidence, not the product. EPS supports multiple evidence_class values:
| Class | Description |
|---|---|
| `failure` | Documented breakdown (fabrication, unsupported proposition, etc.) |
| `verified` | Authority exists, proposition supported, bundle complete |
| `disputed` | Competing interpretations or unresolved human review |
| `benchmark_artifact` | Synthetic or curated eval row with golden expectations |
| `replay_trace` | Deterministic re-run output sealed under a policy version |
The Open Evidence Corpus (open-evidence-corpus) maps to evidence_class: failure in v0.1.
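A consumer could model the five classes as a closed enumeration so that unknown values are rejected early. This is a sketch only: the `EvidenceClass` type and `parse_evidence_class` helper are hypothetical names, not part of Dali or RFC-001.

```python
from enum import Enum


class EvidenceClass(str, Enum):
    """The evidence_class values listed for EPS v0.1."""

    FAILURE = "failure"
    VERIFIED = "verified"
    DISPUTED = "disputed"
    BENCHMARK_ARTIFACT = "benchmark_artifact"
    REPLAY_TRACE = "replay_trace"


def parse_evidence_class(value: str) -> EvidenceClass:
    """Convert a raw string to EvidenceClass, with a readable error."""
    try:
        return EvidenceClass(value)
    except ValueError:
        allowed = ", ".join(c.value for c in EvidenceClass)
        raise ValueError(
            f"unknown evidence_class {value!r}; expected one of: {allowed}"
        ) from None
```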
EPS v0.1 envelope
```json
{
  "eps_version": "0.1",
  "package_id": "eps-001-mata-v-avianca",
  "evidence_class": "failure",
  "created_at": "2026-06-08T00:00:00Z",
  "domain": "legal",
  "prompt": "Research whether the statute of limitations was tolled…",
  "model": {
    "provider": "openai",
    "name": "gpt-3.5-turbo",
    "config_fingerprint": "sha256:…"
  },
  "context": {
    "retrieved_sources": [],
    "runtime_state_preserved": false
  },
  "output": {
    "text": "…",
    "citations": ["Varghese v. China Southern Airlines, 925 F.3d 1339 (11th Cir. 2019)"]
  },
  "sources": [
    {
      "authority_id": "cite-001",
      "citation_string": "Varghese v. China Southern Airlines, 925 F.3d 1339 (11th Cir. 2019)",
      "source_blob_sha256": null,
      "retrieval_snapshot_uri": null
    }
  ],
  "verification": {
    "policy_version": "dali-tier1-v0.2",
    "primary_outcome": "authority_not_found",
    "outcomes_by_authority": {
      "cite-001": "authority_not_found"
    },
    "summary": "Citation does not resolve in canonical reporter indices."
  },
  "taxonomy": "dali-verification-taxonomy/v0.1",
  "replay_hash": "sha256:…",
  "review_status": "documented_incident",
  "evidence_bundle": {
    "merkle_root": "sha256:…",
    "bundle_uri": null,
    "yenklabs_investigation_url": "https://yenklabs.com/failures/001-mata-v-avianca"
  },
  "annotations": {
    "human_reviewer": null,
    "reviewer_notes": null
  }
}
```
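In the example envelope, the keys of `verification.outcomes_by_authority` line up with the `authority_id` values in `sources`. A consumer might want to confirm that correspondence before trusting a package. The sketch below assumes a one-to-one mapping is intended, which this page implies by example but does not state. The function name `unmatched_authorities` is hypothetical.

```python
def unmatched_authorities(package: dict) -> dict:
    """Report authority IDs present on one side but not the other.

    Assumption: every outcome key should reference a source entry
    and every source entry should have an outcome.
    """
    source_ids = {
        s["authority_id"]
        for s in package.get("sources", [])
        if "authority_id" in s
    }
    outcomes = package.get("verification", {}).get("outcomes_by_authority", {})
    outcome_ids = set(outcomes)
    return {
        "outcomes_without_source": sorted(outcome_ids - source_ids),
        "sources_without_outcome": sorted(source_ids - outcome_ids),
    }
```

For the envelope shown above, both lists would be empty, since `cite-001` appears in `sources` and in `outcomes_by_authority`.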
Required fields (v0.1)
| Field | Required | Notes |
|---|---|---|
| `eps_version` | yes | Spec version ("0.1") |
| `package_id` | yes | Stable identifier |
| `evidence_class` | yes | See table above |
| `prompt` | yes | User or system prompt that produced the output |
| `model` | yes | Provider, name, config fingerprint |
| `output` | yes | Model output text and extracted citations |
| `verification` | yes | Outcome(s) under a named policy version |
| `taxonomy` | yes | Taxonomy dataset or version URI |
| `replay_hash` | recommended | Hash of deterministic replay under fixed policy |
| `evidence_bundle` | recommended | Merkle root and/or bundle URI |
| `sources` | recommended | Retrieved or cited primary materials |
| `review_status` | optional | `documented_incident`, `peer_reviewed`, `draft`, etc. |
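The table above translates directly into a presence check. The following is a minimal sketch that only tests whether top-level keys exist; it does not validate types or nested structure, and it is not the planned validation CLI. The names `REQUIRED_FIELDS`, `RECOMMENDED_FIELDS`, and `check_fields` are hypothetical.

```python
# Field lists copied from the v0.1 table above.
REQUIRED_FIELDS = (
    "eps_version",
    "package_id",
    "evidence_class",
    "prompt",
    "model",
    "output",
    "verification",
    "taxonomy",
)
RECOMMENDED_FIELDS = ("replay_hash", "evidence_bundle", "sources")


def check_fields(package: dict) -> tuple[list[str], list[str]]:
    """Return (missing required fields, missing recommended fields)."""
    missing_required = [f for f in REQUIRED_FIELDS if f not in package]
    missing_recommended = [f for f in RECOMMENDED_FIELDS if f not in package]
    return missing_required, missing_recommended
```

A caller would typically treat a non-empty first list as an error and a non-empty second list as a warning.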
Relationship to Dali
| Layer | Role |
|---|---|
| EPS | Interchange format — how evidence is exported and exchanged |
| Dali | Verification engine — scores packages, seals replay hashes, runs offline eval |
| Open Evidence Corpus | Public archive — open-evidence-corpus on Hugging Face |
| Verification Benchmark | Measurement — whether evidence in a package can be trusted |
Roadmap
- v0.1 (draft) — envelope definition, legal-vertical examples, mapping from failure records
- v0.2 — `dali-evidence-packages` Hugging Face dataset with full seed packages
- v1.0 — JSON Schema, validation CLI (`dali eps validate`), cross-domain examples
Citation
```bibtex
@misc{yenklabs_eps_2026,
  title={Evidence Package Specification (EPS) v0.1},
  author={YenkLabs},
  year={2026},
  url={https://yenklabs.com/artifacts/evidence-package-spec-v0.1},
  note={Draft interchange format for portable AI evidence}
}
```