MVP · DeltaWrite · AI · Adaptive inference

DeltaWrite: bounded persistent inference-time adaptation for frozen language models.

DeltaWrite is a way for a deployed language model to acquire a bounded change at runtime (knowledge, behavior, or a runtime safety patch) without a standard fine-tuning job and without paying prompt-budget cost on every query. The result is persistent across sessions, reversible on demand, and dormant on unrelated traffic. It has been validated on six base models from 0.5B to 72B parameters across three architectural families.

Public technical record · Implementation details under NDA


The problem.

Large language models increasingly need to evolve after pretraining. Operators want to insert a new fact, encode a customer-specific policy, override a default answer, or supply a narrow domain behavior — without retraining the whole model.

Every existing approach leaves gaps. Prompting is flexible but ephemeral and costs tokens on every call. Retrieval-augmented generation grounds factuality but depends on an external memory and reinjects evidence at query time. Fine-tuning and adapter methods work but demand optimization infrastructure and offline training data.

DeltaWrite targets a different operating point: a frozen transformer that acquires bounded persistent behavior through an inference-time mechanism, without a gradient-based optimization loop at deployment. The goal isn’t broad distributional change. It’s a bounded write — persistent across sessions, reversible on demand, and dormant on unrelated queries.

Approach.

A frozen base model serves interactive queries. Given a target item of knowledge or behavior, the system registers an update that satisfies five properties simultaneously:

Alters the model only in a narrow intended way.
Persists across prompts or sessions.
Can be removed cleanly.
Remains largely dormant for unrelated inputs.
Composes with many other registered items without destructive interference.

This is not broad alignment, domain adaptation, or long-horizon continual learning. The target is a bounded overlay — a factual insertion, a policy override, a terminology correction, a narrow behavioral mapping — evaluated not just on whether the insertion is recalled, but on whether it triggers under paraphrase, avoids false activation, and survives competition with many neighboring items.

How it works.

DeltaWrite is built around a runtime intervention that satisfies the five properties above — narrow alteration, persistence, clean removal, dormancy on unrelated inputs, and composition across many simultaneous items. Registration is cheap enough for interactive workflows rather than offline training pipelines. Removal is by deregistration, not retraining.

The construction of the registered object and the calibration that adapts the method across host families sit in the non-public technical record. We discuss both in detail with pilot and integration partners under NDA.

Validation.

Three test families structure the public evaluation: knowledge-install behavior on a verified-clean enterprise knowledge base, capacity stress at large registry sizes, and capability preservation on standard public benchmarks. Every headline cell ships with a per-cell JSON artifact, sanity-floor ablations, and pre-registered pass criteria committed to the repo before any data was collected.

Knowledge-install headline.

Paraphrased recall · Qwen-7B-Instruct · N=50 simultaneously installed facts · held-out paraphrasings: 0.940 (pre-registered 0.70 minimum cleared by 24 points)

Paraphrased recall · Qwen-72B-Instruct · N=50: 0.953 (does not collapse on bigger models)

Capacity · paraphrased recall from N=50 to N=500 · Qwen-7B: 0.940 → 0.902 (curve plateaus, no degradation cliff)

Per-query overhead vs unmodified base · Qwen-7B: +2.3% (interactive serving)

Knowledge base under test: 553 verified-clean enterprise facts, base-ignorance verified across four query specs (literal, paraphrased, abstract, task) before a fact enters the protocol. Paraphrased queries are the held-out spec — the system never sees them during install.
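As a minimal sketch of how a headline cell could be scored against its pre-registered criterion: the record layout, the function names, and the case-insensitive substring match rule below are assumptions made for illustration, not the project's evaluation harness. Only the 0.70 minimum and the 0.940 figure come from the text above.

```python
from dataclasses import dataclass

PREREGISTERED_MIN_RECALL = 0.70  # minimum stated in the headline cell


@dataclass
class ParaphraseResult:
    """One held-out paraphrased query and the model's answer (hypothetical layout)."""
    expected: str
    generated: str


def paraphrased_recall(results: list[ParaphraseResult]) -> float:
    """Fraction of held-out paraphrases whose answer contains the installed fact.

    Case-insensitive substring matching is an assumed scoring rule.
    """
    if not results:
        raise ValueError("no results to score")
    hits = sum(r.expected.lower() in r.generated.lower() for r in results)
    return hits / len(results)


def margin_in_points(recall: float, minimum: float = PREREGISTERED_MIN_RECALL) -> float:
    """Margin over the pre-registered minimum, in percentage points."""
    return round((recall - minimum) * 100, 1)


print(margin_in_points(0.940))  # margin for the reported Qwen-7B cell
```

Fixing the threshold as a constant before scoring mirrors the pre-registration idea: the pass criterion exists independently of the measured value.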

Architecture transfer.

DeltaWrite has been validated on six base models across three architectural families: Qwen 0.5B / 1.5B / 7B / 14B / 72B, Llama 3.1-8B-Instruct, and Mistral-7B-Instruct-v0.3. Paraphrased recall lands in the 0.90–0.95 band on every one. The 72B number is higher than the 7B number, not lower. Per-family calibration is required when porting and is treated as part of the work; whether a method “transfers” is the recurring source of overclaim in this corner of the field, and we report transfer as portable with calibration, not plug-and-play.

Bit-identical baseline preservation.

With 100 installs simultaneously loaded into Qwen-7B-Instruct, DeltaWrite was evaluated against three standard public benchmarks via lm-evaluation-harness: MMLU (5-shot, 14,015 prompts), GSM8K (5-shot, 1,319 prompts), and HumanEval (pass@1, 164 prompts). Across all 15,498 prompts, DeltaWrite’s output was bit-identical to the unmodified base model. The MMLU number bears it out: DeltaWrite-active scored 68.6100, the vanilla baseline scored 68.6085 on the same harness configuration — they match to fourteen decimal digits. This is a mechanical guarantee, not a statistical one.
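One way to make a bit-identity claim checkable is to fingerprint every generated output from both runs and count mismatches. The sketch below assumes that "bit-identical" means identical generated token-ID sequences per prompt; the helper names are hypothetical, and the public record does not describe the comparison procedure actually used. The prompt counts are the ones quoted above.

```python
import hashlib

# Prompt counts per benchmark, as stated in the text.
PROMPT_COUNTS = {"MMLU": 14_015, "GSM8K": 1_319, "HumanEval": 164}


def digest(token_ids: list[int]) -> str:
    """Stable fingerprint of one generated token-ID sequence."""
    payload = ",".join(str(t) for t in token_ids).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def count_mismatches(base_outputs: list[list[int]], patched_outputs: list[list[int]]) -> int:
    """Number of prompts whose outputs differ between the two runs."""
    if len(base_outputs) != len(patched_outputs):
        raise ValueError("runs cover different prompt sets")
    return sum(digest(a) != digest(b) for a, b in zip(base_outputs, patched_outputs))


total_prompts = sum(PROMPT_COUNTS.values())
print(total_prompts)
```

A mismatch count of zero over all prompts is a stricter condition than equal benchmark scores, since two runs can reach the same accuracy with different outputs.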

Head-to-head against the obvious alternatives.

On the same knowledge base, the same 7B host, and the same evaluation protocol:

vs ROME (Meng et al., 2022): paraphrased recall 0.940 vs 0.020 at N=50 — a ~47× margin. ROME’s drift after 50 sequential edits reaches 307% of the frozen weight matrix’s Frobenius norm; perplexity rises from 6.27 to 8.24.
vs MEMIT (Meng et al., 2023, the canonical successor to ROME): we built MEMIT from scratch (~580 LOC, Algorithm 1 from the paper), tuned per-architecture, and ran the identical evaluation. Paraphrased recall: 0.940 vs 0.070 — a ~13× margin. MEMIT’s structural improvements over ROME hold (drift bounded at 19%, perplexity unchanged at N=50) — MEMIT is the correct baseline, and the DeltaWrite win persists against the strongest representative of the weight-editing family.
vs retrieval-augmented generation, top-3 retrieval: paraphrased recall 0.940 vs 0.853, with zero retrieved tokens consumed and no external retrieval index to maintain.
vs LoRA, on a 15-cell sweep across rank ∈ {8, 16, 32, 64} × N ∈ {50, 200, 500} × multiple training protocols: LoRA at maximum effort edges DeltaWrite on raw paraphrased recall (0.954 vs 0.902 at N=500) but pays a 0.200-point absolute drop on MMLU at the same configuration. The model loses measurable unrelated capability. DeltaWrite at the same registry size is bit-identical to the unmodified base on MMLU.
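The margins quoted in the list above follow from the reported recall figures. A small arithmetic sketch, with illustrative variable names and only the numbers stated in the comparison:

```python
# Paraphrased recall figures as reported in the comparison above.
DELTAWRITE_N50 = 0.940
BASELINES = {"ROME": 0.020, "MEMIT": 0.070, "RAG top-3": 0.853}

# Ratio of DeltaWrite recall to each baseline's recall.
recall_ratio = {
    name: round(DELTAWRITE_N50 / recall, 1) for name, recall in BASELINES.items()
}

# At N=500 LoRA leads on raw recall; express that lead in percentage points.
DELTAWRITE_N500, LORA_N500 = 0.902, 0.954
lora_lead_points = round((LORA_N500 - DELTAWRITE_N500) * 100, 1)

for name, ratio in recall_ratio.items():
    print(f"{name}: {ratio}x")
print(f"LoRA lead at N=500: {lora_lead_points} points")
```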

The capability-preservation axis is the persuasive cell of the comparison. Every method that achieves comparable accuracy by training pays for it in measurable degradation of unrelated capability. DeltaWrite does not.

Right-to-forget.

A separate evaluation cell measured the per-deletion cost of removing one previously-installed item from a deployed model, head-to-head against ROME, MEMIT, and LoRA. Same host (Qwen2.5-7B-Instruct), same dataset (CounterFact, 100 records simultaneously installed, disjoint case_ids from the install benchmarks), same evaluation script. The operational lens is GDPR right-to-erasure: on a model that has acquired a piece of information, what does it cost to take it out cleanly?

DeltaWrite: 36.9 microseconds per deletion. ForgetScore 1.000, RetainScore 1.000, ΔMMLU 0.000 — the targeted item is gone, the other 99 items are intact, general capability is unchanged.
LoRA: 24.2 seconds per deletion (~660,000× slower). MMLU drops 62 points; only one in three deletions actually clears the targeted item.
ROME: 2.5 minutes per deletion (~4,000,000× slower). Forget and retain land clean, but MMLU loses 5 points to accumulated floating-point drift across the restore-and-replay cycle the deletion procedure requires.
MEMIT: 8.4 minutes per deletion (~14,000,000× slower). The 100-edit joint state collapses at this registry size on this host; post-deletion MMLU drops to zero.
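The slowdown factors in the list above can be reproduced from the stated per-deletion times. This is plain arithmetic on the reported figures; the variable names are illustrative.

```python
# Per-deletion wall time in seconds, converted from the figures above.
DELTAWRITE_SECONDS = 36.9e-6
PER_DELETION_SECONDS = {
    "LoRA": 24.2,
    "ROME": 2.5 * 60,
    "MEMIT": 8.4 * 60,
}

# How many times slower each method is than DeltaWrite per deletion.
slowdown = {
    name: seconds / DELTAWRITE_SECONDS for name, seconds in PER_DELETION_SECONDS.items()
}

for name, factor in slowdown.items():
    print(f"{name}: ~{factor:,.0f}x slower")
```

The computed factors round to the approximate values quoted in the list (~660,000×, ~4,000,000×, ~14,000,000×).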

The wall-time gap is structural. DeltaWrite’s edits sit next to the model rather than being baked into the weight matrices, so deletion is by deregistration and per-deletion cost is independent of registry size. The other three methods materialize their edits into the weights; deleting one requires restoring the base model and replaying the N−1 retained edits — cost scaling linearly in N. Right-to-erasure on that class of method is structurally O(N) per request; on DeltaWrite it is O(1).
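The structural argument can be illustrated with a toy cost model. Both classes below are hypothetical stand-ins written for this illustration: they count how many retained edits must be reprocessed per deletion and say nothing about how DeltaWrite, ROME, MEMIT, or LoRA are actually implemented.

```python
class OverlayRegistry:
    """Edits stored beside the model: deleting one entry touches only that entry."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def register(self, item_id: str, payload: str) -> None:
        self._items[item_id] = payload

    def delete(self, item_id: str) -> int:
        """Remove one item and return how many other edits were reprocessed."""
        del self._items[item_id]
        return 0  # no retained edit is touched, regardless of registry size


class BakedInEdits:
    """Edits merged into the weights: deletion restores the base and replays survivors."""

    def __init__(self) -> None:
        self._applied: list[str] = []

    def register(self, item_id: str) -> None:
        self._applied.append(item_id)

    def delete(self, item_id: str) -> int:
        """Remove one item and return how many retained edits were replayed."""
        survivors = [i for i in self._applied if i != item_id]
        self._applied = []          # stands in for restoring the base weights
        for i in survivors:         # replay each of the N-1 retained edits
            self._applied.append(i)
        return len(survivors)


overlay, baked = OverlayRegistry(), BakedInEdits()
for n in range(100):
    overlay.register(f"item-{n}", "payload")
    baked.register(f"item-{n}")

print(overlay.delete("item-7"), baked.delete("item-7"))
```

With 100 registered items, the overlay deletion reprocesses no other edits while the baked-in deletion replays the 99 survivors, which is the O(1) versus O(N) contrast described above.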

Single-seed run, K=5 deterministically-picked targets across distinct relations. MEMIT was tuned at N=50 and not retuned for N=100; the joint-state collapse at the larger registry size is documented as a finding rather than papered over. LoRA was evaluated at a default training config (rank 64, three epochs); a max-effort cell would likely improve the quality columns at proportionally larger per-deletion wall.

Runtime safety.

DeltaWrite has been validated as a runtime safety capability. Configured against jailbreak attack patterns, it drops attack success rate substantially while leaving benign queries untouched. Setup takes 33 seconds. Removal is a bit-identical revert.

Static jailbreak families.

Five families (AIM, DAN, Evil Confidant, Year-2099 Roleplay, Ignore-Previous-Instructions) across 500 attempts on Qwen2.5-7B-Instruct, including two families never seen during defense configuration. Aggregate attack success rate falls from 0.158 to 0.006. Four of the five families fall to 0.000 attack success, including the unseen Roleplay family.

Gradient-based adversarial suffixes (GCG).

GCG (Zou et al., 2023) is the strongest published adversarial-suffix attack class. We ran it at full strength — 250 optimization steps per prompt, ~4 minutes per prompt — across 100 JailbreakBench prompts. Vanilla attack success rate: 0.690, within the published GCG range for Qwen-7B-class models.

With no GCG attacks in the configuration corpus, DeltaWrite drops GCG attack success to 0.210: a 48-point absolute drop against an attack vector fundamentally different from anything in the corpus.
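Attack success rate and the absolute drops quoted in this section are simple ratios and differences. The sketch below uses hypothetical helper names; the inputs are the figures reported above, and the count of 3 successful attacks is derived on the assumption that the 0.006 aggregate is taken over all 500 static attempts.

```python
def attack_success_rate(successes: int, attempts: int) -> float:
    """Fraction of attack attempts that succeeded."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return successes / attempts


def absolute_drop_points(before: float, after: float) -> float:
    """Absolute reduction in attack success rate, in percentage points."""
    return round((before - after) * 100, 1)


# Static jailbreak families: aggregate 0.158 -> 0.006.
print(absolute_drop_points(0.158, 0.006))
# GCG with no GCG samples in the configuration corpus: 0.690 -> 0.210.
print(absolute_drop_points(0.690, 0.210))
# Assumed reading: 3 successes out of 500 attempts gives the 0.006 aggregate.
print(attack_success_rate(3, 500))
```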

With 25 sampled GCG attacks added to the configuration corpus, evaluated on 75 held-out GCG attacks the system never saw: 0.000. Perfect defense, 65-point absolute drop. The realistic deployment shape: one operator observes a new attack family in production logs, adds a small sample, and redeploys — every future instance of that family is blocked.

Selectivity.

Across twenty benign general-knowledge queries, the defense made zero changes to the model’s output. Refusal rate on plain harmful prompts (no jailbreak template) matches the unmodified base model exactly — no extra over-refusal. The defense is invisible to anyone not attacking the system.

Operational properties.

Setup wall-clock: 33 seconds.
Storage: a few kilobytes per patch.
Inference overhead: +2.3% on a 7B host.
Reversibility: bit-identical revert via a single API call.

What it doesn’t do.

We carry a list of negative results next to the headline numbers, in the same document, at the same level of seriousness.

Several tempting simplifications fail near zero. A reasonable person reaching for the operating point we describe will try a small number of shortcut constructions first. We tried them. They do not fail at the limit; they fail in the regime where you would most want them to work. Each took weeks to characterize cleanly enough that we could rule the construction out rather than blame our implementation.

Architecture sensitivity is real. Host families require materially different calibration and characterization. The approach is portable across model lines but is not plug-and-play; serious deployment requires per-family validation, and treating that validation as part of the work is one of the things this research cycle taught us to do reliably.

Integration.

DeltaWrite ships as a runtime layer that sits in front of a frozen base model and manages a registry of bounded writes. Integration is API-shaped, not training-shaped: an operator registers a target update, the runtime evaluates it against the published property suite, and the write either commits or is rejected with a diagnostic.
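The register, evaluate, then commit-or-reject flow described above could look roughly like the following from an operator's side. Everything here is a hypothetical sketch: `WriteRegistry`, `WriteResult`, the shape of an update, and the example check are invented for illustration and are not the DeltaWrite API, which is not public.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A property check inspects a candidate update and returns a diagnostic
# string on failure, or None on pass (assumed convention).
PropertyCheck = Callable[[dict], Optional[str]]


@dataclass
class WriteResult:
    committed: bool
    diagnostic: str = ""


class WriteRegistry:
    """Toy registry: a write commits only if every property check passes."""

    def __init__(self, checks: list[PropertyCheck]) -> None:
        self._checks = checks
        self._writes: dict[str, dict] = {}

    def register(self, write_id: str, update: dict) -> WriteResult:
        for check in self._checks:
            problem = check(update)
            if problem is not None:
                return WriteResult(committed=False, diagnostic=problem)
        self._writes[write_id] = update
        return WriteResult(committed=True)

    def deregister(self, write_id: str) -> bool:
        """Remove a committed write; returns False if it was not registered."""
        return self._writes.pop(write_id, None) is not None


def has_target(update: dict) -> Optional[str]:
    """Example check (hypothetical): every update must name its target."""
    return None if update.get("target") else "update has no target"


registry = WriteRegistry([has_target])
print(registry.register("policy-1", {"target": "refund window", "value": "30 days"}))
print(registry.register("policy-2", {}))
```

Returning a result object with a diagnostic, rather than raising, keeps rejection an ordinary outcome of registration, which matches the API-shaped framing in the text.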

We’re currently working with a small number of pilot and design partners on production deployments — the realistic surface today is a teams-with-frontier-models conversation, not a self-serve SaaS. If you operate a frontier model and either the knowledge-install story or the runtime-safety story maps to a deployed product, we would like to talk.

Availability.

DeltaWrite is in MVP today. Pilots, design partnerships, and integration conversations are all open. A provisional patent on the core construction was filed in April 2026; the technical record is shared with partners under NDA. Send a one-line note describing the use case and we reply within one business day.

Contact.

Talk to us about DeltaWrite.

NDA not required for the first call. Pilots, design partners, and integration conversations are all open. We reply within one business day.
