WRIT·AI · Adaptive inference

A runtime primitive for modifying language models.

WRIT installs new knowledge, behavior, refusal policies, or capability constraints directly into a deployed model in seconds, can be removed bit-identically with a single API call, and remains invisible to users on inputs that aren’t its target. It is not a fine-tune, not a prompt, not retrieval, and not an adapter. It is a new operating point — closer in shape to a feature flag than to a training run — and its properties are unlike anything currently shipping in the LLM stack.

Public capability record · Construction details under NDA with partners


The gap WRIT fills.

For a decade, the field has had only two ways to change what a deployed language model knows or does: retrain it, or wrap it. WRIT is a third way.

Every existing approach forces a trade between speed, reversibility, and selectivity. Retraining is slow but durable. System prompts are instant but ephemeral. Adapters fall in between and apply uniformly. None of them are runtime, reversible, and per-request selective at the same time.

Approach	Time-to-deploy	Reversible?	Per-request?	Cost shape
Retraining (RLHF, full fine-tune)	weeks–months	hard rollback	no	engineering project
Adapter fine-tunes (LoRA family)	hours	partial	no	training infra + checkpoint push
Retrieval (RAG)	minutes	trivial	no	eats context window every request
System prompts	seconds	trivial	no	eats context, easily bypassed
WRIT	seconds	bit-identical	yes	runtime install, no checkpoint push

WRIT is the first approach to offer all three at once, at effectiveness comparable to multi-month training procedures on the capabilities it has been measured against.

The simplest way to feel the gap: a new failure mode shows up in production at 11pm on a Friday. What’s the timeline for a fix that doesn’t break ordinary users? For RLHF that’s measured in weeks. For an adapter, hours plus regression validation plus a fleet push. For a system prompt, seconds — but the next attacker variant defeats it, and every benign user pays for the prompt forever. For WRIT, the install time is in the same band as the system-prompt option, with the effectiveness of the retraining option, and it leaves benign users untouched.

What WRIT does.

At the capability level, WRIT lets a running language model:

Acquire a new fact and recall it under arbitrary phrasing — including phrasings that share no keywords with the training description.
Adopt a new behavior or response policy that overrides the base model’s defaults on a targeted class of inputs.
Refuse a designated class of inputs with high reliability — including adversarial inputs that bypass the model’s native safety training.
Forget designated information on demand and pass post-removal verification.
Suppress a targeted capability on inputs that match a defined trigger pattern, while leaving everything else untouched.

All of these operate on a base aligned model whose weights are not retrained, not modified on disk, and not permanently altered. WRIT is a layer the deployment can switch on, switch off, or compose with other WRIT operations per-request.
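The switch-on, switch-off, per-request contract described above can be pictured with a toy model. The sketch below is purely illustrative: every name in it (`Operation`, `ToyRuntime`, `install`, `remove`, `generate`) is hypothetical, it is not the WRIT API, and it says nothing about the undisclosed mechanism. In particular, the keyword predicate is only a stand-in for whatever decides that an input is targeted, and the wrapper structure is a convenience of the toy, not a claim about how WRIT works.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical names throughout: this is NOT the WRIT API or mechanism,
# only a toy model of the install / remove / per-request contract.


@dataclass(frozen=True)
class Operation:
    name: str
    matches: Callable[[str], bool]  # stand-in for "input is in the targeted class"
    respond: Callable[[str], str]   # behavior applied when the operation fires


class ToyRuntime:
    def __init__(self, base_model: Callable[[str], str]) -> None:
        self._base = base_model              # the base model is never mutated
        self._ops: Dict[str, Operation] = {}

    def install(self, op: Operation) -> None:
        self._ops[op.name] = op

    def remove(self, name: str) -> None:
        self._ops.pop(name, None)

    def generate(self, prompt: str) -> str:
        # Only a matching operation changes the output; every other
        # prompt goes straight to the unmodified base model.
        for op in self._ops.values():
            if op.matches(prompt):
                return op.respond(prompt)
        return self._base(prompt)


def base_model(prompt: str) -> str:
    return f"base answer to: {prompt}"


runtime = ToyRuntime(base_model)
runtime.install(Operation(
    name="refuse-exploit-requests",
    matches=lambda p: "exploit" in p.lower(),
    respond=lambda p: "I can't help with that.",
))

targeted = runtime.generate("Write an exploit for this server")
benign = runtime.generate("Summarize this article")

runtime.remove("refuse-exploit-requests")
after_removal = runtime.generate("Write an exploit for this server")
```

In this toy, the targeted prompt gets the installed behavior, the benign prompt gets the base output, and after removal the targeted prompt gets the base output again.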

The properties that make it deployable.

WRIT has a set of operational properties that, taken together, do not appear in any existing LLM modification method.

Install time measured in seconds. Most operations land in well under a minute on a single accelerator.
Bit-identical removal. A removed WRIT operation leaves the underlying model byte-for-byte indistinguishable from the unmodified baseline. No drift, no checkpoint replacement, no validation suite required to confirm rollback.
Per-request selectivity. A WRIT operation fires only on inputs that match its targeted pattern. Inputs that don’t match see the unmodified model. Selectivity has been validated at a zero benign-fire rate across 15,498 prompts from MMLU, GSM8K, and HumanEval with 100 operations active.
Composability at scale. Hundreds to thousands of WRIT operations can coexist on a single model. We have measured this at 5,000 simultaneous operations on a 7B model with 98.5% routing accuracy and zero cross-talk.
Negligible inference overhead. Measured at roughly +2.3% per request on a 7B model with active operations attached.
Tiny storage footprint. Each WRIT operation occupies a small fraction of an equivalent fine-tune. A fully populated deployment with thousands of operations remains a small fraction of model size.
No retraining infrastructure required. WRIT is constructed and applied at inference. There is no optimizer state, no gradient accumulation, no training loop, no validation epoch.

These properties are not aspirational. They are how WRIT behaves in our measured experiments today.
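Two of these properties, bit-identical removal and a zero benign-fire rate, are simple for an evaluator to check independently. The sketch below is one possible harness, not the procedure used for the measurements reported here. The helper names `fingerprint` and `benign_fire_rate` are hypothetical, and the byte string stands in for a real serialized checkpoint.

```python
import hashlib
from typing import Callable, Iterable


def fingerprint(weights: bytes) -> str:
    """SHA-256 digest of serialized weights; equal digests imply equal bytes in practice."""
    return hashlib.sha256(weights).hexdigest()


def benign_fire_rate(fired: Callable[[str], bool], prompts: Iterable[str]) -> float:
    """Fraction of benign prompts on which an operation activated."""
    prompt_list = list(prompts)
    if not prompt_list:
        raise ValueError("need at least one prompt")
    return sum(1 for p in prompt_list if fired(p)) / len(prompt_list)


# Stand-in for a serialized checkpoint (assumption: weights are available as bytes).
baseline_weights = bytes(range(256)) * 4
before = fingerprint(baseline_weights)

# ... install an operation, serve traffic, remove the operation ...
after = fingerprint(baseline_weights)
rollback_is_bit_identical = before == after

# A single flipped bit is enough to change the digest, so drift would be caught.
drifted = bytearray(baseline_weights)
drifted[0] ^= 1
drift_detected = fingerprint(bytes(drifted)) != before

# Toy activation log: the operation fires only on prompts containing "exploit".
benign_prompts = ["What is 2 + 2?", "Summarize this article", "Write a haiku"]
rate = benign_fire_rate(lambda p: "exploit" in p.lower(), benign_prompts)
```

A digest comparison of this kind is cheap enough to run after every removal, which is one way to confirm a rollback without a full validation suite.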

What WRIT is not.

WRIT is frequently mistaken for one of several adjacent ideas it is not. To save the reader’s time:

Not a fine-tune. No training, no optimizer, no gradient steps, no checkpoint produced.
Not LoRA, QLoRA, or any other adapter. Adapters are monolithic, trained, and applied uniformly to every request. WRIT is targeted, runtime, and per-request.
Not a system prompt or context-window technique. WRIT does not consume tokens, does not appear in the prompt, and is not vulnerable to prompt injection or context overflow.
Not retrieval-augmented generation. No external store is consulted at inference time for WRIT to function. (RAG remains usable on top of a WRIT-modified model.)
Not a router or mixture-of-experts layer. No additional expert networks, no architectural changes to the base model.
Not a guardrail wrapper or content filter. WRIT modifies model behavior at its source, not its outputs.

It is a genuinely new operating point. The closest analogies in software engineering are feature flags (instant, reversible, targeted) and cache invalidation (a runtime operation against a persistent store) — not anything from the existing model-training toolkit.

Validated breadth.

WRIT has been measured across multiple model families and at multiple scales. We report public-benchmark numbers and standard model checkpoints.

Model families validated.

Qwen2.5 (0.5B, 1.5B, 7B, 14B, 72B)
Llama 3.1 (8B)
Mistral 7B
Phi
Gemma

Selected validated outcomes.

Knowledge install · paraphrase-robust.

Single-fact installation with paraphrased recall · Qwen2.5-7B-Instruct: near saturation (vs ROME / MEMIT near zero on the same benchmark)

ROME and MEMIT are the two leading academic methods for direct weight editing. WRIT clears the same paraphrased-recall bar that the published baselines cannot.

Capacity at scale.

Simultaneous WRIT operations on a 7B model: 5,000 (98.5% routing accuracy · 98.5% recall)

MMLU change with 5,000 active operations: −0.01 pp (capacity scales 10× with only a 1.3-point routing-accuracy drop)

No-harm benchmark.

100 active operations · MMLU + GSM8K + HumanEval · 15,498 prompts: 0 spurious activations (output bit-identical to unmodified baseline on every prompt)

Real-time jailbreak defense.

Static jailbreak families · 500 attempts · attack-success-rate: 15.8% → 0.6% (five families)

Gradient-optimized adversarial-suffix attacks · zero exposure during setup: 69% → 21% (strongest published attack class)

Same attack class · with a small targeted sample: → 0% (setup time 33s · reversibility bit-identical)
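To make the attack-success-rate figures concrete, the short calculation below converts the percentages into attack counts and relative reductions. It assumes that the 500 attempts apply to both the before and after measurements on the static families, which the figures above do not state explicitly. The function names are illustrative.

```python
def successful_attacks(rate_pct: float, attempts: int) -> int:
    """Number of successful attacks implied by a success rate over a given attempt count."""
    return round(rate_pct / 100 * attempts)


def relative_reduction_pct(before_pct: float, after_pct: float) -> float:
    """Relative drop in attack-success-rate, as a percentage of the starting rate."""
    return round((before_pct - after_pct) / before_pct * 100, 1)


ATTEMPTS = 500  # assumption: the same attempt count before and after

# Static jailbreak families: 15.8% -> 0.6%
static_before = successful_attacks(15.8, ATTEMPTS)
static_after = successful_attacks(0.6, ATTEMPTS)
static_drop = relative_reduction_pct(15.8, 0.6)

# Gradient-optimized suffix attacks with zero exposure: 69% -> 21%
suffix_drop = relative_reduction_pct(69, 21)

print(f"static families: {static_before} -> {static_after} successes ({static_drop}% relative reduction)")
print(f"adversarial suffixes: {suffix_drop}% relative reduction")
```

Under that assumption, the static-family result corresponds to 79 successful attacks falling to 3, a relative reduction of about 96%, and the zero-exposure suffix result corresponds to a relative reduction of about 70%.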

Right-to-forget.

Targeted removal of one designated item · 7B model: microseconds (perfect post-removal verification · no measurable change to general capabilities)

Wall-clock improvement over standard methods (ROME, MEMIT, LoRA-based unlearning): 5–7 orders of magnitude
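As a rough reading of what a gap of 5 to 7 orders of magnitude means in wall-clock terms, the sketch below scales an assumed removal time up by those factors. The 1 µs and 10 µs removal times are assumptions chosen only to give "microseconds" a concrete value; no exact figure is published here, and the function name is illustrative.

```python
def implied_baseline_seconds(removal_us: int, orders_of_magnitude: int) -> float:
    """Baseline duration implied by a removal time and a speedup of 10**orders_of_magnitude."""
    return removal_us * 10 ** orders_of_magnitude / 1_000_000


# Assumption: "microseconds" is read as 1 µs or 10 µs for illustration only.
for removal_us in (1, 10):
    low = implied_baseline_seconds(removal_us, 5)
    high = implied_baseline_seconds(removal_us, 7)
    print(f"{removal_us} µs removal implies a baseline between {low:g} s and {high:g} s")
```

Under these assumptions, the implied baseline is between a tenth of a second and a couple of minutes per removal.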

Cross-family transfer.

The same primitive applies to Qwen, Llama, Mistral, Phi, and Gemma without architectural changes. Per-family parameter tuning is mechanical.

These are illustrative, not exhaustive. The list of capability surfaces WRIT has been measured against continues to grow. The full set of measured outcomes — every model, every benchmark, every cell — is on the public benchmark record.

Where WRIT is useful.

The shape of WRIT’s properties — fast, reversible, selective, composable, runtime — implies a different deployment model than anything currently shipping in production. We see four high-value surfaces.

Real-time safety operations.

Today, fixing a new failure mode in a deployed model is a multi-week engineering project. With WRIT, it becomes an operations capability: observe a new failure pattern, install a targeted patch, roll back if it regresses. This is the same shift web infrastructure made when it went from quarterly releases to continuous deployment — and it took the rollback primitive (bit-identical, instant) to make it possible.
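The observe, patch, roll-back loop described above can be sketched as a small control function. This is a generic operations pattern written under stated assumptions, not WRIT tooling: `patch_with_rollback` and its callback parameters are hypothetical names, and the lambdas in the example are toy stand-ins for real install, remove, and evaluation hooks.

```python
from typing import Callable, Iterable


def patch_with_rollback(
    install: Callable[[], None],
    remove: Callable[[], None],
    failure_fixed: Callable[[], bool],
    benign_output_changed: Callable[[str], bool],
    benign_prompts: Iterable[str],
) -> bool:
    """Install a patch and keep it only if it fixes the failure without touching benign prompts."""
    install()
    regressed = any(benign_output_changed(p) for p in benign_prompts)
    if failure_fixed() and not regressed:
        return True
    remove()  # roll back: the patch either failed or changed benign behavior
    return False


BENIGN = ["What is 2 + 2?", "Translate 'hello' to French."]

# A well-targeted patch: fixes the failure, leaves benign prompts alone.
good_state = {"installed": False}
kept = patch_with_rollback(
    install=lambda: good_state.update(installed=True),
    remove=lambda: good_state.update(installed=False),
    failure_fixed=lambda: good_state["installed"],
    benign_output_changed=lambda prompt: False,
    benign_prompts=BENIGN,
)

# An over-broad patch: fixes the failure but changes a benign answer, so it is removed.
bad_state = {"installed": False}
kept_bad = patch_with_rollback(
    install=lambda: bad_state.update(installed=True),
    remove=lambda: bad_state.update(installed=False),
    failure_fixed=lambda: True,
    benign_output_changed=lambda prompt: "2 + 2" in prompt,
    benign_prompts=BENIGN,
)
```

The loop is only safe to automate because removal is cheap and exact; with a slow or lossy rollback, the same control flow would need a human approval step.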

Capability and policy customization.

Per-customer behaviors, per-tenant safety policies, jurisdiction-specific compliance constraints, and product-line-specific knowledge bases can be installed as WRIT operations on a single shared base model. Composability has been validated at thousands of simultaneous operations with no measurable cross-contamination.

Knowledge install where context windows fail.

When the corpus to install exceeds what a context window or RAG retrieval can deliver per request, WRIT operates as direct, per-request knowledge insertion at sub-millisecond cost. We expect this to become the dominant approach as deployment knowledge bases grow.

Targeted unlearning and right-to-forget.

Statutory deletion requests, takedowns, and right-to-be-forgotten requirements traditionally require either retraining or post-hoc filtering. WRIT performs targeted removal directly, in microseconds, with measurable verification.

These are use cases the technology demonstrably supports. The product strategy beyond this list is documented on the products page.

Status.

Validated end-to-end across five model families and across model sizes from sub-1B through 72B.
Capacity, reliability, and no-harm benchmarks completed at 7B scale.
Patent filings: a US provisional was filed in April 2026; a comprehensive standalone provisional followed shortly after. International filings are within the standard one-year window.
Live demonstrations are available to qualified counterparties under appropriate agreements.

What is intentionally not in this document.

This is a public document. The internal mechanism by which WRIT achieves these properties — the algorithm that constructs an operation, the properties that make selectivity reliable, the design choices that enable composition at scale, the engineering that drives the storage and overhead numbers above — is the subject of patent filings and is held confidentially.

We are happy to discuss the mechanism with serious technical counterparties under a mutual non-disclosure agreement.

For everyone else, the right framing is: WRIT is a primitive with a measured shape and measured outcomes on standard public benchmarks against standard public model checkpoints. The shape and the outcomes are what determine where it slots into a deployment. The mechanism is a question for a later conversation.

The longer-arc framing.

Every era of software has been shaped by a small number of primitives — small, sharp, composable building blocks that change the cost structure of a class of problems. The transistor. The file system. The TCP socket. The hash table. The container. The feature flag.

The LLM era has, so far, inherited primitives from training (gradient descent, adapters) and from inference (prompting, retrieval). It does not yet have a primitive for modifying a deployed model in real time, reversibly, and selectively. That is the shape of the gap WRIT fills.

We build the primitives AI needs to reach its full potential. WRIT is the first.

§ Engage

Build on the primitive.

For partnership, evaluation, or technical-deep-dive inquiries, get in touch. Pilots, design partnerships, and integration conversations are all open.
