Citation failures are not binary. A link checker returns valid or broken. Legal AI failures require a finer classification — one that separates what broke from why it broke.
Dali classifies every evaluated citation into one of five verification outcomes:
1. Verified
The cited authority exists. The attributed proposition is supported by the source text at the claimed pin cite. The evidence bundle is complete and reproducible.
2. Authority Not Found
The citation does not resolve to a published opinion, statute, or record in canonical registries. Total fabrication — the Mata v. Avianca failure class.
3. Proposition Unsupported
The authority exists. The link opens the correct docket or reporter entry. The attributed holding or rule is not entailed by the source text. The sophisticated lie.
4. Source Trail Missing
The output references an authority, but the workflow preserved none of the primary source material, retrieval snapshot, or runtime state needed to verify what the model actually saw.
5. Unverifiable
Insufficient evidence was preserved at generation time to classify the citation at all. The output cannot be inspected, reproduced, or defended — regardless of whether the citation happens to be correct.
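As a rough illustration of how the five outcomes could be encoded, here is a minimal Python sketch. The `CitationEvidence` fields, the `classify` function, and the order of the checks are all assumptions made for this example. They are not Dali's actual schema or pipeline.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """The five verification outcomes, numbered as in the list above."""
    VERIFIED = 1
    AUTHORITY_NOT_FOUND = 2
    PROPOSITION_UNSUPPORTED = 3
    SOURCE_TRAIL_MISSING = 4
    UNVERIFIABLE = 5


@dataclass
class CitationEvidence:
    # Hypothetical fields, invented for illustration only.
    output_preserved: bool                 # was the generated output itself retained?
    authority_found: Optional[bool]        # None means the registry lookup could not be run
    source_trail_preserved: bool           # source text, retrieval snapshot, runtime state
    proposition_supported: Optional[bool]  # None means the entailment check was not run


def classify(ev: CitationEvidence) -> Outcome:
    """Map preserved evidence to one outcome (the check order is an assumption)."""
    # Without the output or a registry answer, no classification is possible.
    if not ev.output_preserved or ev.authority_found is None:
        return Outcome.UNVERIFIABLE
    # The citation does not resolve to any real authority.
    if not ev.authority_found:
        return Outcome.AUTHORITY_NOT_FOUND
    # The authority exists, but nothing records what the model actually saw.
    if not ev.source_trail_preserved:
        return Outcome.SOURCE_TRAIL_MISSING
    # The trail exists, but the support check never produced an answer.
    if ev.proposition_supported is None:
        return Outcome.UNVERIFIABLE
    # The authority is real, yet the source text does not support the claim.
    if not ev.proposition_supported:
        return Outcome.PROPOSITION_UNSUPPORTED
    return Outcome.VERIFIED
```

Under these assumptions, a citation whose authority resolves and whose source trail is intact, but whose proposition fails the support check, comes out as `Outcome.PROPOSITION_UNSUPPORTED`. A citation whose output was never retained comes out as `Outcome.UNVERIFIABLE`, even if it happens to be correct.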
Why Five Outcomes Matter
Most evaluation tools collapse everything into "hallucinated" vs. "not hallucinated." That loses the engineering signal:
- Outcomes 2 and 3 require different detection pipelines
- Outcomes 4 and 5 are evidence preservation failures, not model failures
- Outcome 1 still requires a sealed bundle — correctness without provenance is not defensible
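The lost signal can be made concrete with a small Python sketch. The layer labels, the check names, and the sample batch below are illustrative assumptions, not data from the failure database. Treating every non-verified outcome as "hallucinated" is likewise an assumed model of how a binary tool behaves.

```python
from collections import Counter

# Outcome numbers follow the five-outcome list; the labels are hypothetical.
FAILURE_LAYER = {
    1: None,                     # verified: no failure to attribute
    2: "model",                  # fabricated authority
    3: "model",                  # real authority, unsupported proposition
    4: "evidence_preservation",  # source trail missing
    5: "evidence_preservation",  # unverifiable
}

# Outcomes 2 and 3 need different detectors (names are assumptions).
DETECTION_CHECK = {2: "registry_lookup", 3: "entailment_check"}


def binary_failures(outcomes):
    """Binary view: everything that is not outcome 1 is one undifferentiated count."""
    return sum(1 for o in outcomes if o != 1)


def failures_by_layer(outcomes):
    """Five-outcome view: attribute each failure to the layer that has to fix it."""
    return Counter(FAILURE_LAYER[o] for o in outcomes if FAILURE_LAYER[o] is not None)


batch = [1, 2, 3, 3, 4, 5, 1]  # made-up sample, one outcome number per citation
print(binary_failures(batch))    # a single number for the whole batch
print(failures_by_layer(batch))  # split between model and evidence preservation
```

For this sample batch the binary view reports five failures. The layered view shows that three of them are model failures and two are evidence preservation failures, which call for different fixes.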
The failure database at yenklabs.com/failures maps real incidents to these outcome classes. The taxonomy is published on Hugging Face. The goal is ground-truth data for testing whether any AI system preserves enough evidence to reach a defensible classification.