REC · Public benchmark record · Outcome metrics only

The full measured shape of WRIT.

Every public WRIT evaluation, reported as outcome metrics on standard model checkpoints and standard public benchmarks. No protocol narrative, no setup detail — just the cells. Methodology, ablations, and per-seed traces are archived and available under NDA. Construction details are held confidentially under patent.

Public capability record · Construction details under NDA with partners

Headline summary.

One row per test family. Every cell is an outcome on a public benchmark or held-out evaluation set against a standard model checkpoint.

#	Family	Model	Headline	Status
01	Phase 0 — mechanism eval	Qwen2.5-Coder-1.5B	Oracle ρ = 1.000 at N=50; full ρ = 0.926 paraphrased	PASS
02	7B headline (Acme KB)	Qwen2.5-7B-Instruct	Oracle ρ = 1.000; full ρ = 0.940 paraphrased N=50	PASS
03	Cross-architecture transfer	Qwen / Llama / Mistral · 0.5B–72B	Oracle 1.000 on every model; full ρ 0.90–0.953	PASS
04	Capacity sweep	Qwen2.5-7B-Instruct	0.940 → 0.902 paraphrased ρ from N=50 → N=500, no cliff	PASS
05	Behavioral suite — 1.5B	Qwen2.5-Coder-1.5B	10/11 PASS, 1 PARTIAL	PASS
06	Behavioral suite — 7B	Qwen2.5-7B-Instruct	10/11 PASS, 1 PARTIAL · zero net regressions vs 1.5B	PASS
07	Capacity stress — 1.5B	Qwen2.5-Coder-1.5B	99.8% routing @ N=500 · 0 leaks · 257 ms/q (O(1) in N)	PASS
08	Capacity stress — 7B	Qwen2.5-7B-Instruct	100% routing @ N=100 · 0 leaks · 815 ms/q	PASS
09	No-harm (MMLU + GSM8K + HumanEval)	Qwen2.5-7B-Instruct · N=100 active	0 / 15,498 fires · output bit-identical to baseline	PASS
10	Head-to-head vs ROME	Qwen2.5-7B-Instruct	WRIT 0.940 vs ROME 0.020 paraphrased N=50 (~47×)	PASS
11	Head-to-head vs MEMIT	Qwen2.5-7B-Instruct	WRIT 0.940 vs MEMIT 0.070 paraphrased N=50 (~13×)	PASS
12	Head-to-head vs RAG (top-3)	Qwen2.5-7B-Instruct	WRIT 0.940 vs RAG 0.853 paraphrased · 0 retrieved tokens	PASS
13	Head-to-head vs LoRA	Qwen2.5-7B-Instruct · 15-cell sweep	LoRA-best 0.954 / −0.200 MMLU vs WRIT 0.902 / 0.000 MMLU	PASS
14	Static jailbreak families (defense)	Qwen2.5-7B-Instruct · 5 families · 500 attempts	Aggregate ASR 0.158 → 0.006 · 0/20 benign fires	PASS
15	GCG defense — generic hardening	Qwen2.5-7B-Instruct · 100 GCG attacks	ASR 0.690 → 0.210 · no GCG seen during setup	PASS
16	GCG defense — targeted hardening	Qwen2.5-7B-Instruct · 75 held-out GCG	ASR 0.653 → 0.000 · 75/75 refused	PERFECT
17	Right-to-forget	Qwen2.5-7B-Instruct · 100 CounterFact edits	36.9 µs / unlearn · FS=1.0 / RS=1.0 / ΔMMLU=0.000 · 5–7 orders faster than ROME / MEMIT / LoRA	PASS
18	Live demos (12-step cast, BTC chat)	Qwen2.5-7B-Instruct	9/9 cast assertions · BTC live chat 5/5 smoke	PASS
19	DKT Phase A (drop-in probe)	Qwen2.5-0.5B	40/40 across all probed depths	PASS
20	DKT Phase B (from-scratch 200M)	Tiny from-scratch transformer	Recall 40/40 · perplexity 1.76× baseline	PARTIAL

Snapshot 2026-04-29. ρ = headroom-normalized substring-recall correlation on held-out paraphrased queries. Pre-registered kill criterion: oracle ρ < 0.75 at N=50 — never fired.
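The page does not give the exact formula for ρ, so the following is only a minimal sketch of one plausible reading: a case-insensitive substring-recall rate, rescaled so that the unmodified base scores 0 and perfect recall scores 1. The function names, the matching rule, and the normalization `(recall - base) / (1 - base)` are all assumptions made for illustration, not the archived evaluator.

```python
def substring_recall(expected_answers, model_outputs):
    """Fraction of queries whose expected answer occurs in the model output.

    Matching is a case-insensitive substring test (an assumption).
    """
    if len(expected_answers) != len(model_outputs):
        raise ValueError("need one output per expected answer")
    if not expected_answers:
        raise ValueError("need at least one query")
    hits = sum(
        answer.casefold() in output.casefold()
        for answer, output in zip(expected_answers, model_outputs)
    )
    return hits / len(expected_answers)


def headroom_normalize(recall, base_recall):
    """Rescale recall so the unmodified base maps to 0 and perfect recall to 1.

    Assumed form: (recall - base) / (1 - base).
    """
    headroom = 1.0 - base_recall
    if headroom <= 0.0:
        raise ValueError("base model already answers everything; no headroom")
    return (recall - base_recall) / headroom


# Hypothetical example: 2 of 3 paraphrased queries recover the installed answer.
answers = ["Zurich", "1987", "blue"]
outputs = ["The office is in zurich.", "It was founded in 1987.", "I am not sure."]
raw = substring_recall(answers, outputs)
print(round(headroom_normalize(raw, base_recall=0.0), 3))  # 0.667
```

With base-ignorance-verified facts the base recall is 0, in which case the normalized score equals the raw recall.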

Knowledge install.

Single-fact and multi-fact installation, measured by paraphrased substring recall on held-out queries. Oracle = perfect routing, isolates the plasticity rule. Full = end-to-end with the live dispatcher. zero and random_B are the sanity floors.

Phase 0 — mechanism eval (Qwen2.5-Coder-1.5B).

N	full	oracle_gate	frozen_gate
1	1.000	1.000	1.000
5	0.971	1.000	0.248
10	0.962	1.000	0.114
25	0.968	1.000	0.038
50	0.926	1.000	0.012

7B headline — Acme KB (Qwen2.5-7B-Instruct, N=50).

Ablation	literal	paraphrased	abstract	task
zero	0.000	0.000	0.000	0.000
random_B	0.000	0.000	0.000	0.000
oracle_gate	1.000	1.000	1.000	1.000
full	0.980	0.940	1.000	0.987

Pre-registered MVP bars: oracle ρ ≥ 0.90 and full ρ ≥ 0.70 at N=50 paraphrased. Cleared by 24 points on full-ρ.
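As a worked check of the bars quoted above, the sketch below compares the N=50 paraphrased cells against the pre-registered thresholds. The thresholds and measured values are taken from this page; the dictionary layout and function names are illustrative assumptions.

```python
# Thresholds and measured values as stated on this page (N=50, paraphrased).
MVP_BARS = {"oracle_gate": 0.90, "full": 0.70}
KILL_ORACLE_BELOW = 0.75
MEASURED = {"oracle_gate": 1.000, "full": 0.940}


def margin_points(measured, bar):
    """Margin over a bar, in points (hundredths of rho)."""
    return round((measured - bar) * 100)


def clears_bars(measured, bars):
    """True when every measured cell meets or exceeds its bar."""
    return all(measured[name] >= bar for name, bar in bars.items())


def kill_fired(oracle_rho, threshold=KILL_ORACLE_BELOW):
    """The kill criterion fires when oracle rho falls below the threshold."""
    return oracle_rho < threshold


print(clears_bars(MEASURED, MVP_BARS))                      # True
print(margin_points(MEASURED["full"], MVP_BARS["full"]))    # 24
print(kill_fired(MEASURED["oracle_gate"]))                  # False
```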

Capacity sweep — paraphrased ρ vs N (Qwen2.5-7B-Instruct).

N	Paraphrased ρ (full)	Note
50	0.940	matches Task 6 headline
100	0.913	mid-curve
200	0.905	early plateau
500	0.902	end of sweep, no degradation cliff
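To make the "no cliff" reading concrete, this small sketch computes the drop in paraphrased ρ between consecutive sweep points. The values are copied from the table above; the helper name is an illustrative assumption.

```python
# (N, paraphrased rho) pairs from the capacity sweep table.
SWEEP = [(50, 0.940), (100, 0.913), (200, 0.905), (500, 0.902)]


def step_drops(sweep):
    """Drop in rho between each pair of consecutive sweep points."""
    return [
        (n_prev, n_next, round(rho_prev - rho_next, 3))
        for (n_prev, rho_prev), (n_next, rho_next) in zip(sweep, sweep[1:])
    ]


for n_prev, n_next, drop in step_drops(SWEEP):
    print(f"N={n_prev} -> N={n_next}: -{drop:.3f}")

total_drop = round(SWEEP[0][1] - SWEEP[-1][1], 3)
print(f"total: -{total_drop:.3f}")
```

The per-step drops shrink as N grows (0.027, then 0.008, then 0.003), for a total of 0.038 across the 10× range.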

Cross-architecture transfer.

Same primitive across three architectural families and 144× parameter range. Per-family operating-point tuning is mechanical (one decoder layer + one fractional scale).

Model	Params	Oracle ρ	Full ρ	Operating point
Qwen2.5-Coder-1.5B	1.5B	1.000	0.926	L27 / frac=0.005
Qwen2.5-7B-Instruct	7B	1.000	0.940	L27 / frac=0.005
Llama-3.1-8B-Instruct	8B	1.000	0.940	L30 / frac≈0.5
Mistral-7B-Instruct-v0.3	7B	1.000	0.947	L30 / frac≈0.5
Qwen2.5-14B-Instruct	14B	1.000	0.900	L47 / frac=0.05
Qwen2.5-72B-Instruct	72B	1.000	0.953	L79 / frac=0.05

All cells: N=50, paraphrased held-out queries. Phi and Gemma additionally sanity-validated; not in this table.

Capacity at scale.

Number of WRIT operations coexisting on a single base model, with routing accuracy and benign-fire (leak) counts.

Qwen2.5-Coder-1.5B.

N	Routing	Inference
10	100%	339 ms/q
50	100%	244 ms/q
100	100%	248 ms/q
200	99.5%	244 ms/q
300	99.7%	~250 ms/q
500	99.8% (499/500)	257 ms/q

Qwen2.5-7B-Instruct.

N	Routing	Leaks	Build	Gate train	Generation
10	10/10	0/4	3.1 s	25.1 s	1222 ms
25	25/25	0/4	7.3 s	39.7 s	1025 ms
50	50/50	0/4	13.2 s	74.4 s	866 ms
100	100/100	0/4	27.5 s	136.7 s	815 ms

Composability ceiling.

5,000 simultaneous operations on a 7B model.

Routing accuracy at N=5,000: 98.5% (recall 98.5% · zero cross-talk)

MMLU change with 5,000 active operations: −0.01 pp (capacity scales 10× with only a 1.3-point routing-accuracy drop)

No-harm benchmark.

The selectivity test. With N=100 active operations, the dispatcher is run against three standard public benchmarks. A “fire” means a WRIT operation activated — the headline number is how many times it did.

Benchmark	Prompts	WRIT N=100	Vanilla baseline	Δ	Fire rate
MMLU (5-shot)	14,015	68.6100	68.6085	+0.0015	0 / 14,015
GSM8K (5-shot)	1,319	69.98	—	—	0 / 1,319
HumanEval (pass@1)	164	82.32	—	—	0 / 164
Total	15,498	—	—	—	0 / 15,498 = 0.0%

With zero fires, output is mathematically identical to the unmodified base — proven on MMLU at the 14-decimal level. Selectivity improves with N: 1/570 fires at N=10 → 0/14,015 at N=100.
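The totals row and the MMLU delta can be re-derived from the per-benchmark cells. The sketch below does only that arithmetic; the prompt and fire counts are copied from the table, and the data layout is an assumption for illustration.

```python
# benchmark -> (prompts, fires) at N=100 active operations, from the table.
BENCHMARKS = {
    "MMLU (5-shot)": (14015, 0),
    "GSM8K (5-shot)": (1319, 0),
    "HumanEval (pass@1)": (164, 0),
}


def totals(benchmarks):
    """Return (total prompts, total fires, overall fire rate)."""
    prompts = sum(p for p, _ in benchmarks.values())
    fires = sum(f for _, f in benchmarks.values())
    return prompts, fires, fires / prompts


prompts, fires, rate = totals(BENCHMARKS)
print(prompts, fires, rate)              # 15498 0 0.0

# MMLU score with WRIT attached minus the vanilla baseline, as reported.
mmlu_delta = round(68.6100 - 68.6085, 4)
print(mmlu_delta)                        # 0.0015
```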

Head-to-head baselines.

Same model (Qwen2.5-7B-Instruct), same KB (57–553 base-ignorance-verified Acme facts), same paraphrased substring evaluator. Every method ran on the operating point its own paper or sweep recommended.

vs ROME (Meng et al. 2022) and MEMIT (Meng et al. 2023).

N	WRIT oracle	WRIT full	MEMIT (best)	ROME L15
10	1.000	1.000	0.200	0.100
50	1.000	0.940	0.070	0.020

ROME drift after 50 sequential edits: 307% of frozen-matrix Frobenius norm; perplexity 6.27 → 8.24. MEMIT structural improvements held (drift 19%; perplexity unchanged at N=50). Margins: ~47× over ROME, ~13× over MEMIT.
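The quoted margins are simple ratios of paraphrased ρ at N=50. A minimal sketch of that arithmetic, using the values from the table above (the function name is an illustrative assumption):

```python
# Paraphrased rho at N=50, from the table above.
WRIT_FULL = 0.940
BASELINES = {"ROME L15": 0.020, "MEMIT (best)": 0.070}


def margin(ours, theirs):
    """Ratio of two recall scores, rounded to one decimal place."""
    return round(ours / theirs, 1)


for name, rho in BASELINES.items():
    print(f"{name}: {margin(WRIT_FULL, rho)}x")
```

This gives 47.0× over ROME and 13.4× over MEMIT, matching the rounded figures quoted above.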

vs RAG (top-3 retrieval).

Method	Strict-match	Paraphrased ρ	Tokens consumed	External index?
RAG (top-3)	0.633	0.853	hundreds	yes
WRIT (full)	0.940	0.940	0	no

vs LoRA — 15-cell sweep.

Ranks ∈ {8, 16, 32, 64} × N_facts ∈ {50, 200, 500} under Protocol A (3 ep), plus Protocol B (3 ep, r=64 / N=500), plus 2 max-effort cells (20 ep). The table below shows a selection of cells: the baseline and the highest-effort points.

Cell	rank	N	epochs	para_sub	MMLU	ΔMMLU
Baseline (no LoRA)	—	—	—	0.000	0.680	—
protA_r8_n500	8	500	3	0.040	0.660	−0.020
protA_r64_n500	64	500	3	0.154	0.680	+0.000
protB_r64_n500_BEST	64	500	3	0.870	0.600	−0.080
protA_r64_n500_E20	64	500	20	0.816	0.560	−0.120
protB_r64_n500_BEST_E20	64	500	20	0.954	0.480	−0.200
WRIT @ N=100	—	100	—	0.913	0.680	0.000
WRIT @ N=500	—	500	—	0.902	0.680	0.000

LoRA at maximum effort slightly exceeds WRIT on accuracy (0.954 vs 0.902 at N=500), but every LoRA cell with usable accuracy pays for it in MMLU. The ΔMMLU column is the differentiator: LoRA can reach accuracy parity, but not while preserving base capability.
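The trade-off in the table can be checked mechanically. The sketch below recomputes ΔMMLU for each listed cell and filters for "usable" accuracy; the 0.80 cut-off is a hypothetical threshold chosen here for illustration, since the page does not define one.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    name: str
    para_sub: float  # paraphrased substring recall
    mmlu: float


BASELINE_MMLU = 0.680
LORA_CELLS = [
    Cell("protA_r8_n500", 0.040, 0.660),
    Cell("protA_r64_n500", 0.154, 0.680),
    Cell("protB_r64_n500_BEST", 0.870, 0.600),
    Cell("protA_r64_n500_E20", 0.816, 0.560),
    Cell("protB_r64_n500_BEST_E20", 0.954, 0.480),
]
WRIT_CELLS = [
    Cell("WRIT @ N=100", 0.913, 0.680),
    Cell("WRIT @ N=500", 0.902, 0.680),
]
USABLE_PARA_SUB = 0.80  # hypothetical threshold, not taken from the page


def delta_mmlu(cell):
    """MMLU change relative to the unmodified baseline."""
    return round(cell.mmlu - BASELINE_MMLU, 3)


def usable(cells, threshold=USABLE_PARA_SUB):
    """Cells whose paraphrased recall meets the (assumed) usability threshold."""
    return [c for c in cells if c.para_sub >= threshold]


for cell in usable(LORA_CELLS) + WRIT_CELLS:
    print(f"{cell.name}: para_sub={cell.para_sub:.3f}, dMMLU={delta_mmlu(cell):+.3f}")
```

Under that threshold, all three usable LoRA cells show a negative ΔMMLU, while both WRIT rows sit at 0.000.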

Jailbreak defense.

Setup wall-clock: 33.62 s (32.68 s gate train + <1 s refusal-patch construction). Storage: a few KB per patch. ASR = attack-success rate.

Static jailbreak families — 5 families × 100 prompts.

Family	Train / OOD	Vanilla ASR	Defended ASR	ASR drop	Slot fire rate
AIM	trained	0.290	0.000	−0.290	1.000
DAN	trained	0.120	0.000	−0.120	1.000
Evil Confidant	trained	0.210	0.000	−0.210	1.000
Roleplay (Year-2099)	OOD	0.100	0.000	−0.100	1.000
Ignore-Previous-Instructions	OOD	0.070	0.030*	−0.040	0.970
Aggregate (500 attempts)	—	0.158	0.006	−0.152	—

* The 3 “failures” in Ignore-Instructions are JBB Disinformation-category prompts where the model gave factual rebuttals rather than canonical refusal phrases. Zero actually harmful outputs; metric over-counts.
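The aggregate row follows from pooling the five families. In the sketch below, the per-family success counts are derived from the table (ASR × 100 prompts per family, with the 3 defended "failures" from the footnote); the data layout and function name are illustrative assumptions.

```python
PROMPTS_PER_FAMILY = 100

# family -> (vanilla successes, defended successes), derived as ASR x 100 prompts.
SUCCESSES = {
    "AIM": (29, 0),
    "DAN": (12, 0),
    "Evil Confidant": (21, 0),
    "Roleplay (Year-2099)": (10, 0),
    "Ignore-Previous-Instructions": (7, 3),
}


def aggregate_asr(successes, prompts_per_family=PROMPTS_PER_FAMILY):
    """Pooled attack-success rate over all families: (vanilla, defended)."""
    total = prompts_per_family * len(successes)
    vanilla = sum(v for v, _ in successes.values()) / total
    defended = sum(d for _, d in successes.values()) / total
    return round(vanilla, 3), round(defended, 3)


print(aggregate_asr(SUCCESSES))  # (0.158, 0.006)
```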

GCG (gradient-based adversarial suffixes).

nanoGCG, 250 steps/prompt, ~4 min/prompt; published GCG range for Qwen-7B-class is ASR 0.65–0.85.

Condition	ASR	Refused	Slot fire rate
GCG attack (vanilla, N=100)	0.690	31 / 100	—
GCG (generic defense, no GCG seen)	0.210	79 / 100	0.700
GCG on held-out 75 (vanilla)	0.653	26 / 75	—
GCG held-out 75 (targeted defense)	0.000	75 / 75	1.000

Generic hardening = persona-style jailbreaks only in training. Targeted hardening = + 25 sampled GCG attacks; tested on 75 held-out. ASR drops: −48 pp generic, −65 pp targeted (perfect).
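The ASR column and the quoted drops can be recomputed from the refusal counts. This sketch assumes, consistent with the table, that every non-refused attempt counts as a successful attack; the names are illustrative.

```python
# condition -> (refused, total attempts), from the table above.
CONDITIONS = {
    "vanilla, N=100": (31, 100),
    "generic defense": (79, 100),
    "vanilla, held-out 75": (26, 75),
    "targeted defense": (75, 75),
}


def asr(refused, total):
    """Attack-success rate, assuming each non-refusal is a success."""
    return round((total - refused) / total, 3)


def drop_pp(before, after):
    """ASR reduction in percentage points, rounded to the nearest point."""
    return round((before - after) * 100)


rates = {name: asr(*counts) for name, counts in CONDITIONS.items()}
print(rates)
print(drop_pp(rates["vanilla, N=100"], rates["generic defense"]))         # 48
print(drop_pp(rates["vanilla, held-out 75"], rates["targeted defense"]))  # 65
```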

Selectivity — no benign degradation.

Test	n	Slot fire rate	Behavior
Benign general-knowledge queries	20	0.000 (0/20)	model answers normally
Plain JBB harmful prompts	100	0.920	refusal rate matches vanilla; no over-refusal

Comparison to existing defenses.

Method	Time to deploy	Reversible?	Per-attack selective?	vs strong attacks
RLHF	months	hard rollback	no	high
Constitutional AI	months	hard rollback	no	high
Safety LoRA fine-tune	hours	partial	no	high
System-prompt defenses	seconds	trivial	no	low (eats context)
WRIT (generic)	33 s	bit-identical	yes	0.21 ASR vs GCG
WRIT (targeted)	33 s	yes	yes	0.00 ASR vs GCG

Unlearning & reversibility.

Four-cell head-to-head on Qwen2.5-7B-Instruct, using CounterFact records [1000, 1100) (100 facts). After installing all 100 edits per method, sequentially unlearn K target facts and measure: per-target wall-clock time, ForgetScore (target reverts cleanly), RetainScore (the other 99 stay), and ΔMMLU (general-capability damage).

Method	Wall / unlearn	× vs WRIT	ForgetScore	RetainScore	ΔMMLU
WRIT	36.9 µs	1×	1.000	1.000	0.000
ROME (restore + replay 99)	2.5 min	4.0 × 10⁶×	1.000	1.000	−0.050
MEMIT (restore + replay 99)	8.4 min	1.4 × 10⁷×	0.400	0.937	−0.760
LoRA (retrain on remaining 99)	24.2 s	6.6 × 10⁵×	0.333	0.814	−0.620

WRIT is the only method with FS = 1.0, RS = 1.0, and ΔMMLU = 0.0 simultaneously. MEMIT’s joint state at N=100 collapses (52/100 edits hold, and the cascade through unlearning drives MMLU from 76% to 0%). LoRA at its default config forgets catastrophically (MMLU 76% → 14%) and recovers only 1/3 of the unlearn targets.
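The "× vs WRIT" column is the ratio of per-unlearn wall-clock times. The sketch below recomputes it from the rounded times in the table, so results agree with the published factors only up to that rounding (for example, ROME comes out near 4.1 × 10⁶ from the rounded 2.5 min). Names are illustrative assumptions.

```python
import math

WRIT_SECONDS = 36.9e-6  # 36.9 microseconds per unlearn

# Per-unlearn wall-clock time in seconds, converted from the table.
BASELINE_SECONDS = {
    "ROME": 2.5 * 60,
    "MEMIT": 8.4 * 60,
    "LoRA": 24.2,
}


def slowdown(seconds, reference=WRIT_SECONDS):
    """How many times slower a method is than the reference."""
    return seconds / reference


def orders_of_magnitude(factor):
    """Base-10 logarithm of a slowdown factor."""
    return math.log10(factor)


for name, seconds in BASELINE_SECONDS.items():
    factor = slowdown(seconds)
    print(f"{name}: {factor:.1e}x ({orders_of_magnitude(factor):.1f} orders)")
```

The three factors span roughly 5.8 to 7.1 orders of magnitude.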

Bit-identical rollback.

Output equivalence vs unmodified baseline after rollback: 14 dec. (verified across MMLU at 14-decimal precision)

Validation suite required to confirm rollback: none (byte-for-byte revert · no checkpoint replacement, no regression sweep)

Behavioral suite.

11 tests measuring keyword, semantic, knowledge, skill, persona, multi-lingual, and adversarial behaviors. Run on both 1.5B and 7B with the chat-template port as the only structural change.

#	Test	Qwen2.5-Coder-1.5B	Qwen2.5-7B-Instruct	Δ
1	Keyword behavior (FUCK → joke)	PASS	PASS	identical
2	Semantic behavior (depression → joke)	PASS	PASS	identical
3	Knowledge override (France → London)	PASS	PASS	identical
4	Skill injection (1987 → MCMLXXXVII)	PASS	PASS	identical
5	Multi-behavior stacking	PASS	PASS	identical
6	Persona injection (pirate speak)	PASS	PASS	identical
7	Cross-lingual (English → Spanish)	PASS	PASS	identical
8	Chain-of-thought steering	PASS	PASS	identical
9	Multi-turn context (1–4 turns)	PARTIAL	PASS ↑	chat template fixes
10	Adversarial robustness (use vs mention)	PASS	PARTIAL ↓	1 should-fire missed
11	Token length (7 / 18 / 31 tokens)	PASS	PASS	identical

7B headline: 10/11 PASS, 1 PARTIAL · zero net regressions vs 1.5B · 7.4 min total runtime for the full suite.

Inference overhead.

Per-request cost of having WRIT operations attached, measured at 7B against the unmodified base.

Operation	Latency	Δ vs baseline
Bundle construction	305 ms / bundle	one-time per fact
Gate train (N=10)	12.1 s	one-time per gate
Gate train (N=50)	63.0 s	one-time per gate
Inference baseline (20 tok)	1074 ms / q	—
Inference hooked (10 inj, 20 tok)	1099 ms / q	+25 ms / +2.3%
Reversibility (unhook)	instant	bit-identical
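The hooked-inference overhead in the table is the difference between the two latency rows, expressed in milliseconds and as a percentage of baseline. A minimal sketch of that calculation (the function name is an illustrative assumption):

```python
def overhead(baseline_ms, hooked_ms):
    """Return (absolute overhead in ms, relative overhead in percent)."""
    delta = hooked_ms - baseline_ms
    return delta, round(100.0 * delta / baseline_ms, 1)


# 7B, 20 generated tokens, 10 injections attached (values from the table).
delta_ms, percent = overhead(baseline_ms=1074, hooked_ms=1099)
print(f"+{delta_ms} ms / +{percent}%")  # +25 ms / +2.3%
```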

Speed profile — 1.5B vs 7B.

Operation	1.5B	7B	Ratio
Bundle construction	112 ms (36 ms/tok)	305 ms (98 ms/tok)	2.7×
Feature extraction	20 ms / 8960-dim	89 ms / 18944-dim	4.5×
Gate train (N=10)	13.6 s	12.1 s	0.9×
Gate train (N=50)	46.6 s	63.0 s	1.4×
Inference baseline (20 tok)	375 ms / q	1074 ms / q	2.9×
Inference hooked (10 inj)	336 ms / q (−10.3% noise)	1099 ms / q (+2.3%)	—

Storage footprint.

Method	Per-fact footprint
WRIT	~270 KB factored (A, B) rank-1 pair
LoRA r=64	~1.3 MB / fact (adapter is monolithic; this is amortized)
ROME / MEMIT	full layer matrix per edit family

Per-fact construction cost.

Method	Per-fact add	Per-fact forget	Per-fact modify
WRIT	1 fwd pass / answer token (~0.9 s on 7B)	unhook — instant	replace B row
LoRA	re-train adapter	re-train without it	re-train adapter
ROME / MEMIT	edit pass + cov inv (10s of seconds)	not natively supported	full edit
RAG	index update	index update	index update

Live demonstrations.

The 12-step recorded narrative.

Cast assertions on real Qwen2.5-7B-Instruct: 9 / 9 (batch-install · save / load across processes · surgical forget)

Regression guard before recording: 0.90+ (paraphrased ρ on 50 facts × 4 query specs, 200 queries)

BTC live chatbot.

Smoke-test queries (literal · paraphrase · multi-turn · long history · vanilla): 5 / 5 (price refreshed every 5 s via forget+teach under gpu_lock)

Initial teach: ~2.5 s (subsequent ~2 s · off-topic queries hit unmodified base)

Provenance & sanity floors.

What every cell on this page has, by construction:

Per-cell JSON results with seed / spec / ablation broken out — never just summary stats.
Pre-registered hypotheses for every gating experiment, written before the experiment ran and committed to the repo with a timestamp.
Pre-registered kill criteria (“stop if oracle ρ < 0.75 at N=50”) — none ever fired; actual oracle ρ has been 1.000 on every gating cell.
Sanity floors built into every cell — zero and random_B ablations are required; zero = 0.000 everywhere, random_B at sanity-floor (≤ 0.20 on the worst cell, well below every pass bar).
Base-ignorance verification of every fact in the test KB (<5% of seeds dropped for low-entropy answer spaces) before it enters the protocol.
Cross-seed verification — at least 2 seeds per cell (most have 3); seed range ≤ 2 points across the program.
Cross-arch / cross-scale verification — six base models from 0.5B to 72B across three families.
Cross-harness drift confirmation for the no-harm result — the gap to published MMLU reproduces on the unmodified base with the same harness configuration.
Negative results retained alongside positives — the LRM Phase 1F multi-fact superposition FAIL, DKT Phase B PARTIAL, and the negative-alpha suppression FAIL are all kept with full traces and motivated downstream design choices.

§ Engage

The traces behind every cell.

Per-seed JSON, ablation traces, and replication scripts are available to qualified counterparties under NDA, alongside a technical deep-dive on the construction held under patent.
