Compress
ctxsift compress is the entry point for turning noisy command output into a compact, cached record that the agent can look up later with ctxsift recall.
It does three things in one command:
- Runs a model against your instruction and the raw output to produce a compressed record
- Extracts structured signals (files, errors, test ids, symbols) that power later recall
- Stores and indexes the result so recall can find it semantically and lexically
Two invocation modes
Pipe mode
Pipe any command’s output directly into ctxsift compress:

```bash
pytest -q | ctxsift compress --intent summary "show only failing tests and the files involved"
pnpm build 2>&1 | ctxsift compress --intent summary "summarize build errors and point out misbehaving files"
cat large-log-file.txt | ctxsift compress --intent recall "extract error lines and affected service names"
```

Pipe mode reads stdin as the raw input. CtxSift does not know the exit code, duration, or stderr separately in this mode.
Command capture mode
Let CtxSift run the command itself using -- as a separator:

```bash
ctxsift compress --intent summary "summarize build errors" -- pnpm build
ctxsift compress --intent exact-lines "return only the failing test ids" -- pytest -q
```

Command capture mode runs the command as a subprocess and captures stdout, stderr, exit code, and duration as separate fields. It also captures git metadata (HEAD, branch, dirty state) from the current workspace. All of this is stored with the record and used for freshness tracking during recall.
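To make the difference from pipe mode concrete, here is a minimal Python sketch of what "captures stdout, stderr, exit code, and duration as separate fields" can look like. It is not CtxSift's implementation: the `CapturedRun` class and `capture` function are hypothetical names used only for illustration, and git metadata is left out.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import List


@dataclass
class CapturedRun:
    """Hypothetical container for the fields capture mode records."""
    command: List[str]
    stdout: str
    stderr: str
    exit_code: int
    duration_s: float


def capture(command: List[str]) -> CapturedRun:
    """Run a command and keep stdout, stderr, exit code, and duration apart."""
    start = time.perf_counter()
    proc = subprocess.run(command, capture_output=True, text=True)
    duration = time.perf_counter() - start
    return CapturedRun(command, proc.stdout, proc.stderr, proc.returncode, duration)


# Usage: run a tiny child process that prints to stdout and exits non-zero.
run = capture([sys.executable, "-c", "import sys; print('out'); sys.exit(3)"])
print(run.exit_code, run.stdout.strip(), f"{run.duration_s:.3f}s")
```

In pipe mode only the merged text on stdin is available, which is why none of these separate fields can be recorded there.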
The --intent flag
--intent is required. It tells CtxSift what output contract the model should follow, changes the prompt and validation rules, and determines what kinds of outputs are considered acceptable for caching and benchmarking.
| Intent | What it produces | Use when |
|---|---|---|
| summary | Plain-text prose summary | You want a readable current-step explanation and the exact final text shape is not critical |
| recall | Plain-text summary optimized for later retrieval | You are compressing evidence mainly so it can be found and trusted again after context compaction or task switching |
| exact-lines | Exact lines quoted from the raw input | You need verbatim anchors only, such as failing test ids, package names, first real error lines, or stack frames |
| exact-format | Output in the exact format the instruction specifies | You need one strict textual shape, such as `SAFE |
| json | Valid JSON | Another tool, parser, or downstream step will consume machine-readable objects or arrays |
| yaml | Valid YAML | You need structured config-like output or human-editable key-value structure |
| table | Markdown table | You need side-by-side comparison across rows and columns |
| bullet-list | Markdown bullet list | You need a short scannable checklist or grouped findings list |
The practical rule is:
- Use summary for “tell me what matters right now.”
- Use recall for “store durable evidence I will want to search later.”
- Use exact-lines when every returned line must already exist in the input.
- Use exact-format when the model must emit one strict text shape but not necessarily verbatim lines.
- Use the structured intents when the next step benefits from a parser-friendly or scan-friendly schema.
```bash
# Return exact lines from the output (useful for test ids and file paths)
ctxsift compress --intent exact-lines "return only the failing pytest test ids" -- pytest -q

# Return a JSON structure
ctxsift compress --intent json "extract each error as {file, line, message}" -- tsc --noEmit

# Return a bullet list
ctxsift compress --intent bullet-list "list all packages that failed to install" -- pnpm install
```

Structured intents (json, yaml, table, bullet-list) all share the same “structured” validation family internally, but each one asks the model for a different concrete surface format.
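The validation rules themselves are not spelled out on this page. As a rough illustration of what an output contract means in practice, the sketch below checks two of the intents: exact-lines (every returned line must already exist in the input) and json (the output must parse). The function names are hypothetical and the checks are assumptions about the contract, not CtxSift's actual validators.

```python
import json


def validate_exact_lines(output: str, raw_input: str) -> bool:
    """True only if every non-empty output line appears verbatim in the input."""
    input_lines = set(raw_input.splitlines())
    return all(line in input_lines for line in output.splitlines() if line.strip())


def validate_json(output: str) -> bool:
    """True if the output parses as JSON."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


raw = "FAILED tests/test_a.py::test_x\nPASSED tests/test_b.py::test_y\n"
print(validate_exact_lines("FAILED tests/test_a.py::test_x", raw))
print(validate_json('[{"file": "a.ts", "line": 1, "message": "x"}]'))
```

A paraphrased or invented line fails the exact-lines check even if it is factually right, which is the point of choosing that intent for verbatim anchors.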
What CtxSift preserves
CtxSift is built for coding agents, so the compression target is not generic prose quality. Before the model runs, CtxSift deterministically extracts structured signals from the raw output:
| Signal type | Examples |
|---|---|
| File paths | src/auth/tokens.py, tests/test_users.py |
| Traceback frames | File "app/db.py", line 42, in connect |
| Test IDs | tests/api/test_users.py::test_create_user_requires_email |
| Package names | requests, pydantic, torch |
| Symbols | function names, class names, method names |
| Commands | pnpm build, pytest -q, docker compose up |
| Exit code lines | exited with code 1, exit status 2 |
| Warnings and errors | lines matching error and warning patterns |
These extracted signals are passed directly into the model prompt as structured context — so the model does not have to discover them itself — and are also stored separately for lexical recall search (BM25).
The system prompt explicitly instructs the model to preserve exact filenames, symbols, error codes, test names, line numbers, and commands verbatim. This is the key difference from generic summarization: a correct compress output should be runnable or referenceable, not just readable.
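Deterministic extraction of this kind is typically pattern-based. The following sketch pulls a few of the signal types from the table out of raw text with regular expressions. The patterns and the `extract_signals` function are illustrative assumptions; the real extractors are not shown on this page and almost certainly cover more cases.

```python
import re
from typing import Dict, List

# Illustrative patterns only; not CtxSift's actual extractors.
TEST_ID = re.compile(r"[\w./-]+\.py::[\w\[\]-]+(?:::[\w\[\]-]+)*")
TRACEBACK_FRAME = re.compile(r'File "([^"]+)", line (\d+), in (\w+)')
FILE_PATH = re.compile(r"\b[\w./-]+\.(?:py|ts|js|json|toml)\b")
EXIT_CODE = re.compile(r"exit(?:ed with code| status) (\d+)")


def extract_signals(raw: str) -> Dict[str, List[str]]:
    """Collect test ids, traceback frames, file paths, and exit codes from raw output."""
    frames = [
        f'File "{path}", line {line}, in {func}'
        for path, line, func in TRACEBACK_FRAME.findall(raw)
    ]
    return {
        "test_ids": sorted(set(TEST_ID.findall(raw))),
        "frames": frames,
        "files": sorted(set(FILE_PATH.findall(raw))),
        "exit_codes": EXIT_CODE.findall(raw),
    }


sample = (
    "FAILED tests/api/test_users.py::test_create_user_requires_email\n"
    '  File "app/db.py", line 42, in connect\n'
    "Process exited with code 1\n"
)
print(extract_signals(sample))
```

Because the step uses no model, the same raw output always yields the same signals, which is what makes them usable as stable lexical search terms.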
Exact-cache: why the same input never compresses twice
CtxSift builds an exact cache key from:
- SHA-256 of the raw input
- normalized instruction (case-folded, whitespace-collapsed)
- compression intent
- model id and prompt version
- max_output_tokens
- CtxSift version
- workspace root
If you run the same command with the same instruction on the same input again, the stored result is returned immediately without calling the model. This means:
- no redundant model calls during re-runs
- consistent output for the same input regardless of when you run it
- recall always returns the same compressed form for the same raw output
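The key components listed above can be combined in many ways. A minimal sketch, assuming the components are serialized as canonical JSON and hashed once more with SHA-256 (how CtxSift actually combines them is not documented here, and the function and field names are hypothetical):

```python
import hashlib
import json


def normalize_instruction(instruction: str) -> str:
    """Case-fold and collapse runs of whitespace into single spaces."""
    return " ".join(instruction.casefold().split())


def exact_cache_key(raw_input: bytes, instruction: str, intent: str,
                    model_id: str, prompt_version: str, max_output_tokens: int,
                    ctxsift_version: str, workspace_root: str) -> str:
    """Derive one deterministic key from every component that affects the output."""
    parts = {
        "input_sha256": hashlib.sha256(raw_input).hexdigest(),
        "instruction": normalize_instruction(instruction),
        "intent": intent,
        "model_id": model_id,
        "prompt_version": prompt_version,
        "max_output_tokens": max_output_tokens,
        "ctxsift_version": ctxsift_version,
        "workspace_root": workspace_root,
    }
    # sort_keys gives a stable serialization, so equal parts always hash equally.
    blob = json.dumps(parts, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


key_a = exact_cache_key(b"out", "Show  failing\tTests", "summary", "m", "v1", 256, "0.1", "/repo")
key_b = exact_cache_key(b"out", "show failing tests", "summary", "m", "v1", 256, "0.1", "/repo")
print(key_a == key_b)
```

Normalizing the instruction means trivial differences in casing or spacing still hit the cache, while a change to any other component (intent, model, prompt version, token limit, tool version, workspace) produces a different key.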
What gets stored
Every compression run that goes through the model stores a record containing:
- the compressed output
- the original instruction and normalized form
- SHA-256 of the raw input
- referenced files (with SHA-256 checksums and existence flags at capture time)
- extracted terms (files, test ids, symbols — used for FTS5 lexical recall)
- a vector embedding of the record (used for semantic recall)
- command metadata in capture mode: command, exit code, duration, stdout/stderr hashes
- git metadata in capture mode: HEAD, branch, dirty state
- model provider, model name, and prompt version
The record is immediately indexed in both FTS5 (lexical) and sqlite-vec (vector) so recall can find it from the next command.
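As an illustration of the "referenced files" entry in the list above, this sketch records a SHA-256 checksum and an existence flag for a path at capture time. The `fingerprint_file` function and the dictionary layout are hypothetical; the stored schema is not documented on this page.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional, Union


def fingerprint_file(path: str) -> Dict[str, Union[str, bool, None]]:
    """Record whether a referenced file exists and, if so, its SHA-256 checksum."""
    p = Path(path)
    checksum: Optional[str] = None
    exists = p.is_file()
    if exists:
        checksum = hashlib.sha256(p.read_bytes()).hexdigest()
    return {"path": path, "exists": exists, "sha256": checksum}


# A path that does not exist is still recorded, with exists=False and no checksum.
print(fingerprint_file("no/such/file.py"))
```

Comparing a stored checksum with the file's current checksum is one simple way a later step could tell whether a record still describes the code on disk.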
Backend fallback behavior
If the configured backend (local or remote) is unavailable when compress runs, CtxSift does not silently fail or block. It returns the raw output with a warning header and does not store the result:

```text
[ctxsift warning] Local compression failed: <error>. Returning uncompressed output and skipping storage.
```

This means the agent still gets output it can work with, and the failure is visible. It also means the record is not stored, so the agent will not be able to recall it later. Check ctxsift doctor to diagnose the backend issue.
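The fallback behavior amounts to a small control-flow pattern: try to compress, and on failure return the raw text under a warning header without storing anything. A minimal sketch, where `compress_with_fallback` and the `compress` and `store` callables are hypothetical stand-ins for the backend call and the record store:

```python
from typing import Callable


def compress_with_fallback(raw: str,
                           compress: Callable[[str], str],
                           store: Callable[[str], None]) -> str:
    """Return compressed output, or the raw output with a warning if the backend fails."""
    try:
        compressed = compress(raw)
    except Exception as exc:
        warning = (f"[ctxsift warning] Local compression failed: {exc}. "
                   "Returning uncompressed output and skipping storage.")
        # Nothing is stored on this path, so the run cannot be recalled later.
        return f"{warning}\n{raw}"
    store(compressed)
    return compressed


def failing_backend(_: str) -> str:
    raise ConnectionError("backend unreachable")


records = []
print(compress_with_fallback("raw build log", failing_backend, records.append))
print(records)
```

Storage happens only after a successful model call, which keeps uncompressed fallbacks out of the index.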
Options
```text
ctxsift compress --intent INTENT [OPTIONS] INSTRUCTION [-- COMMAND [ARGS]...]
```

| Option | Description |
|---|---|
| --intent | Required output contract: summary, recall, exact-lines, exact-format, json, yaml, table, bullet-list |
| --max-output-tokens | Override the max compressed output size for this run |
| INSTRUCTION | Natural-language instruction for the model |
| -- COMMAND | Optional: run this command and capture its output instead of reading stdin |
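For agents or scripts that invoke the CLI programmatically, the usage line above maps directly onto an argument list. The helper below is a hypothetical convenience wrapper, not part of CtxSift. It uses only the options documented in the table and assumes --max-output-tokens takes an integer value.

```python
from typing import List, Optional, Sequence

# The eight intents listed in the options table.
INTENTS = {"summary", "recall", "exact-lines", "exact-format",
           "json", "yaml", "table", "bullet-list"}


def build_compress_argv(intent: str, instruction: str,
                        command: Optional[Sequence[str]] = None,
                        max_output_tokens: Optional[int] = None) -> List[str]:
    """Build an argv list matching the documented `ctxsift compress` usage line."""
    if intent not in INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")
    argv = ["ctxsift", "compress", "--intent", intent]
    if max_output_tokens is not None:
        argv += ["--max-output-tokens", str(max_output_tokens)]
    argv.append(instruction)
    if command:
        # Everything after "--" is the command to run in capture mode.
        argv += ["--", *command]
    return argv


print(build_compress_argv("summary", "summarize build errors", ["pnpm", "build"]))
```

Passing the result to `subprocess.run` as a list avoids shell quoting problems with instructions that contain spaces or braces. Omitting `command` produces a pipe-mode invocation that expects the raw input on stdin.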
What happens next
After a compress run, the record is immediately available to ctxsift recall. The agent does not need to do anything to make it searchable.
See Recall for how to find it again.