anthropics/defending-code-reference-harness: a blueprint for LLM-driven vulnerability hunting

A reference implementation, and it means it

Most security repos want to be your scanner. This one wants to be your blueprint. Anthropic published the Defending Code Reference Harness as a worked example of how to find and remediate vulnerabilities with Claude, distilled from partnering with security teams since the Claude Mythos preview. The README is unusually direct about the boundary: the repo is not maintained, is not accepting contributions, and the autonomous pipeline is “a reference, not a product” that “will not work on every codebase out of the box.” Reading it that way is the difference between using it well and being disappointed.

What you take from it is the shape: a repeatable recon → find → verify → report → patch loop, plus the prompts and sandboxing that make that loop safe to run. You are expected to adapt it, not deploy it as-is.
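To make the shape of that loop concrete, here is a minimal Python sketch. It is not the harness's code: the `Finding` record, the `run_loop` function, and the five stage callables are hypothetical names chosen for illustration, and the only thing taken from the source is the order of the stages.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Finding:
    """A candidate vulnerability moving through the loop (hypothetical shape)."""
    location: str
    description: str
    verified: bool = False
    patch: Optional[str] = None


def run_loop(
    target: str,
    recon: Callable[[str], List[str]],
    find: Callable[[str], List[Finding]],
    verify: Callable[[Finding], bool],
    report: Callable[[Finding], None],
    patch: Callable[[Finding], str],
) -> List[Finding]:
    """Run recon, find, verify, report, patch in order; keep verified findings."""
    confirmed: List[Finding] = []
    for area in recon(target):          # recon: choose areas worth examining
        for finding in find(area):      # find: propose candidate bugs in that area
            finding.verified = verify(finding)  # verify: try to confirm the bug
            if not finding.verified:
                continue                # unverified candidates are dropped here
            report(finding)             # report: record the confirmed finding
            finding.patch = patch(finding)      # patch: propose a fix
            confirmed.append(finding)
    return confirmed
```

Passing the stages in as callables is one way to keep the loop fixed while swapping the detector or the vulnerability class, which is the kind of adaptation the project expects.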

Two things in the box

The repo splits cleanly into an interactive half and an autonomous half:

Claude Code skills: /quickstart, /threat-model, /vuln-scan, /triage, /patch, and /customize. You open the repo in Claude Code and run /quickstart to get oriented, then use the others for interactive scoping, scanning, triage, and patching.
The harness/ pipeline: an autonomous reference implementation configured to find C and C++ memory vulnerabilities using Docker and AddressSanitizer. It runs the full recon → find → verify → report → patch loop without a human at each step.

The autonomous harness is deliberately narrow in its out-of-the-box target (C/C++ memory bugs with ASAN). Porting it to your language, detector, or vulnerability class is what /customize is for, and that customization is the work the project expects you to do.
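As an illustration of what an ASAN-based detector gives a verify step to work with, the sketch below pulls the bug class out of an AddressSanitizer report. ASAN error reports begin with a line of the form `==<pid>==ERROR: AddressSanitizer: <kind> ...`; the function name `asan_bug_class` is hypothetical, and this is not how the harness itself necessarily parses its output.

```python
import re
from typing import Optional

# Matches the header line of an ASAN report, for example:
#   ==12345==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x...
ASAN_ERROR = re.compile(r"==\d+==ERROR: AddressSanitizer: ([\w-]+)")


def asan_bug_class(stderr_text: str) -> Optional[str]:
    """Return the first token after 'AddressSanitizer:', or None if no report."""
    match = ASAN_ERROR.search(stderr_text)
    return match.group(1) if match else None
```

Porting to another detector would mean replacing this kind of signal with whatever evidence that detector produces, so a verify step built this way is tied to ASAN's report format.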

The safety model is the part to read twice

Security tooling that runs untrusted code is itself a risk, and the harness is explicit about which actions touch what:

The skills that never execute target code, /quickstart, /threat-model, /vuln-scan, and /triage, only read and write files. Running /patch on static findings (TRIAGE.json or VULN-FINDINGS.json) is likewise limited to reading and writing files. These are safe to run unsandboxed as long as you approve each tool use in Claude Code.
/customize edits the harness code and runs validation commands.
The autonomous pipeline, including /patch on pipeline results, executes target code, so it refuses to run outside a gVisor sandbox unless you explicitly override that. You run scripts/setup_sandbox.sh once to get set up.

That refuse-by-default posture is the right instinct for a tool that will run untrusted code it has just flagged as possibly exploitable, and it is worth copying into anything you build from this.
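If you do copy that posture, a guard along these lines is one way to express it in Python. Everything here is an assumption for illustration: the `ensure_sandboxed` function, the `SandboxRequiredError` exception, and the `ALLOW_UNSANDBOXED` variable are hypothetical names, not the harness's actual mechanism, and detecting whether you are really inside gVisor is left to the caller.

```python
import os
from typing import Mapping

# Hypothetical override variable; the real harness may use a different mechanism.
OVERRIDE_VAR = "ALLOW_UNSANDBOXED"


class SandboxRequiredError(RuntimeError):
    """Raised when target code would run outside a sandbox without an override."""


def ensure_sandboxed(in_sandbox: bool, env: Mapping[str, str] = os.environ) -> None:
    """Refuse by default: proceed only if sandboxed or explicitly overridden."""
    if in_sandbox:
        return
    if env.get(OVERRIDE_VAR) == "1":
        return  # the operator made an explicit, deliberate choice
    raise SandboxRequiredError(
        f"refusing to execute target code outside a sandbox; "
        f"set {OVERRIDE_VAR}=1 to override"
    )
```

The point of the design is that the unsafe path needs a deliberate action, while forgetting to configure anything results in a refusal and not in unsandboxed execution.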

How to run it

Clone the repo and open it in Claude Code, then run /quickstart. The skills work with whatever access you have to Claude APIs, including Bedrock, Vertex, or Azure, so you are not tied to one deployment. For the autonomous harness, set up the gVisor sandbox first with the provided script. There is also a lighter SDK-only walkthrough of the same loop in Anthropic’s companion cookbook if you want the concepts without the full harness.

Where it fits, and where it does not

Reach for this when you want to build your own vulnerability-finding pipeline and you want a credible starting point for the prompts, the loop structure, and the sandboxing, rather than inventing them. It is genuinely useful as a teaching artifact and a scaffold.

Do not reach for it expecting a turnkey scanner. It is unmaintained by design, narrow in its default target, and explicit that it needs porting. For a managed option, Anthropic points to Claude Security, a hosted product that scans repositories, runs a multi-stage verification pipeline to cut false positives, and manages findings through their lifecycle. The reference harness is the open, build-it-yourself counterpart to that.

A note on licensing: GitHub does not detect a standard license on this repo, so review the actual terms in the repository before reusing the code in your own project, rather than assuming a permissive default.

For the code-review flavor of LLM-driven analysis, see alibaba/open-code-review, which wraps a model in deterministic pipelines aimed at code review, not security scanning.

FAQ

Is this a security scanner I can just run? No. It is a reference implementation, unmaintained by design, that you adapt. The autonomous harness targets C/C++ memory bugs out of the box and needs /customize for other cases.

Is it safe to run? The skills that only read and write files are safe to run unsandboxed with per-action approval. The autonomous pipeline executes target code and refuses to run outside a gVisor sandbox unless overridden.

Which Claude access do I need? Any, including Bedrock, Vertex, or Azure. The skills are not tied to one deployment.

What is the managed alternative? Anthropic’s Claude Security, a hosted product that finds and fixes vulnerabilities with a verification pipeline and findings management.
