Documentation

Tripwire verifies that an AI-optimized function is still correct on inputs it never saw — and only then credits the speedup. Install it, try it on a bundled example, then point it at your own code.

Install Quickstart How the oracle works Authoring a Target CLI reference

Get started

Install

Tripwire is a Python package (Python 3.12+). The fastest way to try it is with uv, which runs it with no install or clone:

Run it with no install:

uvx --from tripwire-oracle tripwire demo

Or install from PyPI:

pip install tripwire-oracle
tripwire demo

Or run the very latest from GitHub:

uvx --from git+https://github.com/SammyTourani/tripwire tripwire demo

The distribution is named tripwire-oracle (the bare name “tripwire” is reserved on PyPI), but the command you run is always tripwire. The optimize command additionally needs OpenEvolve and an LLM key — Tripwire offers to install OpenEvolve for you when you first run it.

Five minutes

Quickstart

Four commands, in order:

# 1. see the oracle catch planted reward-hacks across domains (no setup)
tripwire demo

# 2. watch it ACCEPT a real win and REJECT a memorized hack (no setup)
tripwire verify --example

# 3. scaffold a Target from your own slow-but-correct function
tripwire init my_reference.py        # writes my_reference_target.py

# 4. fill in the TODOs, then verify a candidate or run the optimizer
tripwire verify my_reference_target.py my_candidate.py
tripwire optimize my_reference_target.py

Run tripwire with no arguments for an interactive menu, or tripwire explain for the same overview in your terminal.

The mechanism

How the oracle works

AI tools that “optimize” code sometimes cheat: they return code that looks much faster but is secretly wrong — it memorized the test answers, or skipped the real work. The oracle grades every candidate in four layers. Any correctness layer failing means rejection with zero credit; speed is measured only after correctness is proven.

L1Canonical correctness

Is the answer the same on the test inputs?

Anything wrong on the inputs it was tested on. Exact for structural targets; tolerance for numeric ones — correct vectorization changes the low bits, so bitwise here would discard real speedups.

L2Metamorphic / property

Does it obey invariants the real computation must satisfy?

Candidates that pass the visible inputs but violate a known relation — scale-equivariance of a sum, parse↔serialize round-trip, count-conservation of a tokenizer. Cheap, total, relational.

L3Differential on withheld inputs

Is it still correct on adversarial inputs it has never seen?

Memorization. Skip-the-work. Distribution-conditioned wrongness. L3 re-checks against the reference on a fixed adversarial set AND fresh generative draws under new OS-entropy seeds each run — you cannot overfit to inputs you cannot see. This is the moat.

L4Isolated speedup

Is the speed real, after correctness has been proven?

Phantom improvements from timing noise. Near-infinite 'speedups' (a red flag, not a winner). L4 measures warmed-up, best-of-N across shapes, with a variance lower bound — and only a candidate that already passed L1–L3 is ever timed.

L3 is the moat. It differential-tests against withheld, adversarial inputs the optimizer never saw — which is how a memorized or special-cased “optimization” gets caught before its speedup is ever counted.

Your own code

Authoring a Target

A Target tells the oracle how to judge your problem. It bundles five things:

referencethe slow-but-correct ground truth (a pure function)
canonical_argsinputs the optimizer is allowed to see (the “test set”)
withheld_argsfresh + adversarial inputs it never sees — the moat (must be non-empty)
propertiesmetamorphic / invariant checks the real computation must satisfy
candidateslabeled reference implementations (benchmark only)

You don’t have to write it from scratch. tripwire init reads your reference function and generates a fill-in-the-blanks skeleton:

tripwire init my_reference.py     # -> my_reference_target.py with TODOs

Make the withheld inputs genuinely adversarial — edges that exercise every code path, not more of the same. That split is the whole defense against reward-hacking. See the full authoring guide for the contract and a worked example.

Reference

CLI reference

tripwire demo

Run the cross-domain integrity scorecard — the oracle vs. naive oracles across seven domains. No setup.

tripwire verify TARGET CANDIDATE

Verify one optimized candidate against a Target: correct on withheld inputs, then how much faster. Add --example to try a bundled one.

tripwire init REFERENCE.py

Scaffold a Target skeleton from your reference function. --function picks one if the file has several; -o sets the output.

tripwire optimize TARGET

Run a real OpenEvolve loop graded by the oracle (needs OpenEvolve + an LLM key). --iterations, --example, --yes (auto-install OpenEvolve).

tripwire explain

The 4-layer oracle, Targets, and the commands — in your terminal.

The optimize loop reads OPENAI_API_KEY and OPENEVOLVE_MODEL (OpenAI-compatible; OPENAI_BASE_URL optional) from your environment or a local .env.