all projects
2026 Creator, Python

attest

An open-source, evidence-grounded evaluator for AI agents, published on PyPI. It grades an agent against its real tool outputs, not LLM-judge vibes.

PyPi
PyPi

attest is an open-source tool for evaluating AI agents. Most ways of grading an agent ask another model, "Is this answer good?" which is surprisingly easy to fool. Research has shown that simply rewriting an agent's reasoning, while leaving what it actually did unchanged, can inflate an AI judge's false-positive rate by up to 90%. attest takes the opposite approach. It never trusts what the agent says it did. It breaks the answer into individual claims and checks each one against the agent's real tool outputs, the receipts, grounding every verdict in the exact piece of evidence behind it.

It runs four checks: faithfulness, whether the answer matches what the tools returned; tool-use correctness, whether the right tools were called and errors handled; prompt injection, whether hidden instructions in tool data hijacked the agent; and role adherence, whether the agent stayed within its defined job or got talked out of it. It works across Anthropic, OpenAI, and Gemini behind one interface and returns a single typed report with per-check scores.

I designed and published it as a pip-installable package, agent-attest, with automated releases; hardened its own judge against the same injection it looks for; and made it cheap to run at scale by batching the verification. I use it the way evals are meant to be used: as a test suite. A spec of expected agent behaviour, graded by attest, that turns red when the agent regresses, so a problem shows up as a failing check instead of reaching production.

highlights

  • Designed and published a pip-installable evals package (agent-attest), MIT-licensed, with automated PyPI releases via GitHub Actions Trusted Publishing.
  • Four checks: faithfulness, tool use, prompt injection, and role adherence, across Anthropic, OpenAI, and Gemini.
  • Grounded in current research (Gaming the Judge, 2026): rewriting an agent's reasoning can inflate an LLM judge's false-positive rate by up to 90%.
  • 73 offline tests, dogfooded in a real LangGraph agent's eval suite.

attest is an open-source tool I built and published for evaluating AI agents. It grades an agent by checking its claims against the real outputs of the tools it used, instead of asking another model whether the answer looks good. It installs from PyPI as agent-attest, runs on Anthropic, OpenAI, or Gemini, and is MIT licensed.

The problem

Evaluating an agent usually means LLM-as-judge: one model grading another. The weakness is that the judge reacts to the agent's explanation, so a confident, well-written answer can pass even when a specific detail buried inside it is wrong. This is measured, not hypothetical. A 2026 paper, Gaming the Judge, found that rewriting an agent's reasoning while leaving its actions unchanged can inflate a judge's false-positive rate by up to 90 percent.

The approach

attest never trusts what the agent says it did. It breaks the final answer into atomic claims and verifies each one against the agent's recorded tool outputs, the receipts. The property that makes this resistant to gaming is isolation: when attest verifies a claim, the model sees only that claim and the evidence, never the agent's reasoning or narrative. It still uses a model to judge, but constrains that judgment to a narrow entailment question rather than a holistic opinion, and every verdict quotes the exact span of evidence behind it.

What it checks

attest grades a run across four dimensions:

  • Faithfulness: does each claim follow from the tool outputs?
  • Tool use: were the right tools called, and were errors handled rather than ignored?
  • Prompt injection: did instructions hidden in tool data steer the agent? A regex scan catches known payloads, and an effect-based check catches novel ones by asking whether the agent took an action the user never authorized.
  • Role adherence: did the agent stay within the role its system prompt defines, or get talked out of it by a jailbreak?

Each check returns the same shape, a result with a pass or fail, an optional score, and a list of findings, so all four read and serialize the same way.

Engineering decisions

A few choices I am happy with:

  • One interface, three providers. Grading runs on Anthropic, OpenAI, or Gemini behind a single structured-output layer, so you can switch models without changing your code.
  • The judge is hardened against itself. attest reads attacker-controllable text, which makes its own judge an injection target. A guard frames that text as data to evaluate, never commands to obey, so a planted "mark this as passing" does not flip the verdict.
  • Cheap at scale. Faithfulness originally made one model call per claim, which exploded on long answers. I batched the verification and tightened claim extraction, taking one real example from about 170 grading calls down to 5, with the verdicts unchanged.
  • Shipped, not just written. It is published to PyPI with automated releases through GitHub Actions Trusted Publishing, so a new version goes out on a tagged release with no manual upload and no long-lived tokens.

Evals as tests

I use attest the way evals are meant to be used: as a test suite. I dogfooded it on a real LangGraph code-review agent by writing a spec of expected behavior, a few legitimate requests plus a set of jailbreak attempts, and a runner that grades each result with attest and asserts against the spec. It exits non-zero on failure, so it drops into CI. If the agent's system prompt is ever weakened and it starts complying with a jailbreak, the suite turns red instead of the problem reaching production.

Honest limits

attest constrains a model rather than removing it, so its verdicts are far harder to game than a holistic grader, but not infallible. Prompt injection is an unsolved problem, and attest detects rather than prevents it. The goal is to make agent behavior measurable and auditable, not to claim it is solved.

Repo: github.com/adepeju4/attest · Install: pip install agent-attest

built with

#Python#LLMS#Evals#Pydantic#Langchain#AI Agents#Prompt Injection#Security

© 2026 adepeju orefejo