Compress

ctxsift compress is the entry point for turning noisy command output into a compact, cached record that the agent can look up later with ctxsift recall.

It does three things in one command:

  1. Runs a model against your instruction and the raw output to produce a compressed record
  2. Extracts structured signals (files, errors, test ids, symbols) that power later recall
  3. Stores and indexes the result so recall can find it semantically and lexically

Pipe any command’s output directly into ctxsift compress:

pytest -q | ctxsift compress --intent summary "show only failing tests and the files involved"
pnpm build 2>&1 | ctxsift compress --intent summary "summarize build errors and point out misbehaving files"
cat large-log-file.txt | ctxsift compress --intent recall "extract error lines and affected service names"

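Pipe mode reads stdin as the raw input. In this mode CtxSift cannot see the exit code or the duration, and it cannot tell stderr apart from stdout: stderr reaches it only if you redirect it into the pipe, as 2>&1 does in the pnpm example above.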

Let CtxSift run the command itself using -- as a separator:

ctxsift compress --intent summary "summarize build errors" -- pnpm build
ctxsift compress --intent exact-lines "return only the failing test ids" -- pytest -q

Command capture mode runs the command as a subprocess and captures stdout, stderr, exit code, and duration as separate fields. It also captures git metadata (HEAD, branch, dirty state) from the current workspace. All of this is stored with the record and used for freshness tracking during recall.
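As a rough illustration of what capturing these fields separately involves, here is a minimal Python sketch. It is not CtxSift's implementation; the CapturedRun class, its field names, and the capture helper are hypothetical names chosen for this example.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class CapturedRun:
    """Hypothetical container; field names are illustrative, not CtxSift's schema."""
    command: list[str]
    stdout: str
    stderr: str
    exit_code: int
    duration_s: float


def capture(command: list[str]) -> CapturedRun:
    """Run a command and keep stdout, stderr, exit code, and duration apart."""
    started = time.monotonic()
    # check=False: a non-zero exit is data to record, not an exception.
    proc = subprocess.run(command, capture_output=True, text=True, check=False)
    return CapturedRun(
        command=command,
        stdout=proc.stdout,
        stderr=proc.stderr,
        exit_code=proc.returncode,
        duration_s=time.monotonic() - started,
    )


if __name__ == "__main__":
    # A tiny child process that writes to both streams and exits non-zero.
    script = "import sys; print('out'); print('err', file=sys.stderr); sys.exit(3)"
    run = capture([sys.executable, "-c", script])
    print(run.exit_code, run.stdout.strip(), run.stderr.strip())
```

None of these fields can be recovered in pipe mode, because the shell has already reduced the command to a single byte stream before CtxSift sees it.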


--intent is required. It tells CtxSift what output contract the model should follow, changes the prompt and validation rules, and determines what kinds of outputs are considered acceptable for caching and benchmarking.

| Intent | What it produces | Use when |
| --- | --- | --- |
| summary | Plain-text prose summary | You want a readable current-step explanation and the exact final text shape is not critical |
| recall | Plain-text summary optimized for later retrieval | You are compressing evidence mainly so it can be found and trusted again after context compaction or task switching |
| exact-lines | Exact lines quoted from the raw input | You need verbatim anchors only, such as failing test ids, package names, first real error lines, or stack frames |
| exact-format | Output in the exact format the instruction specifies | You need one strict textual shape, such as `SAFE |
| json | Valid JSON | Another tool, parser, or downstream step will consume machine-readable objects or arrays |
| yaml | Valid YAML | You need structured config-like output or human-editable key-value structure |
| table | Markdown table | You need side-by-side comparison across rows and columns |
| bullet-list | Markdown bullet list | You need a short scannable checklist or grouped findings list |

The practical rule is:

  • Use summary for “tell me what matters right now.”
  • Use recall for “store durable evidence I will want to search later.”
  • Use exact-lines when every returned line must already exist in the input.
  • Use exact-format when the model must emit one strict text shape but not necessarily verbatim lines.
  • Use the structured intents when the next step benefits from a parser-friendly or scan-friendly schema.

# Return exact lines from the output (useful for test ids and file paths)
ctxsift compress --intent exact-lines "return only the failing pytest test ids" -- pytest -q
# Return a JSON structure
ctxsift compress --intent json "extract each error as {file, line, message}" -- tsc --noEmit
# Return a bullet list
ctxsift compress --intent bullet-list "list all packages that failed to install" -- pnpm install

Structured intents (json, yaml, table, bullet-list) all share the same “structured” validation family internally, but each one asks the model for a different concrete surface format.
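The exact-lines contract ("every returned line must already exist in the input") is simple enough to check mechanically. The sketch below shows one way such a check might look. The function name is hypothetical, and whether CtxSift compares whole lines, trims whitespace, or ignores blank lines is an assumption made here for illustration.

```python
def violates_exact_lines(raw_input: str, model_output: str) -> list[str]:
    """Return the output lines that do not appear verbatim in the raw input.

    Assumption: lines are compared whole, and blank output lines are ignored.
    """
    allowed = set(raw_input.splitlines())
    return [
        line
        for line in model_output.splitlines()
        if line.strip() and line not in allowed
    ]


if __name__ == "__main__":
    raw = "tests/test_a.py::test_x FAILED\ntests/test_b.py::test_y PASSED\n"
    print(violates_exact_lines(raw, "tests/test_a.py::test_x FAILED"))
    print(violates_exact_lines(raw, "test_x failed"))
```

An empty result means the output satisfies the contract. Any returned line is one the model paraphrased or invented rather than quoted.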


CtxSift is built for coding agents, so compression aims to preserve the identifiers an agent acts on rather than to produce polished prose. Before the model runs, CtxSift deterministically extracts structured signals from the raw output:

| Signal type | Examples |
| --- | --- |
| File paths | src/auth/tokens.py, tests/test_users.py |
| Traceback frames | File "app/db.py", line 42, in connect |
| Test IDs | tests/api/test_users.py::test_create_user_requires_email |
| Package names | requests, pydantic, torch |
| Symbols | function names, class names, method names |
| Commands | pnpm build, pytest -q, docker compose up |
| Exit code lines | exited with code 1, exit status 2 |
| Warnings and errors | lines matching error and warning patterns |

These extracted signals are passed directly into the model prompt as structured context — so the model does not have to discover them itself — and are also stored separately for lexical recall search (BM25).
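To make "deterministic extraction" concrete, here is a small regex-based sketch covering a few of the signal types in the table. The patterns and the extract_signals function are illustrative assumptions; CtxSift's actual extractors are not shown in this page and are likely more thorough.

```python
import re

# Illustrative patterns only, not CtxSift's real extractors.
FRAME = re.compile(r'File "[^"]+", line \d+, in \w+')
TEST_ID = re.compile(r"[\w./-]+\.py::[\w\[\]-]+(?:::[\w\[\]-]+)*")
FILE_PATH = re.compile(r"(?<![\w/.-])[\w.-]+(?:/[\w.-]+)+\.\w+")
EXIT_LINE = re.compile(r"exited with code \d+|exit status \d+")
PROBLEM = re.compile(r"\b(?:error|warning)\b", re.IGNORECASE)


def _unique(items: list[str]) -> list[str]:
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def extract_signals(raw: str) -> dict[str, list[str]]:
    """Pull a few signal types out of raw command output with plain regexes."""
    return {
        "files": _unique(FILE_PATH.findall(raw)),
        "frames": _unique(FRAME.findall(raw)),
        "test_ids": _unique(TEST_ID.findall(raw)),
        "exit_lines": _unique(EXIT_LINE.findall(raw)),
        "problems": [ln.strip() for ln in raw.splitlines() if PROBLEM.search(ln)],
    }


if __name__ == "__main__":
    sample = (
        "Traceback (most recent call last):\n"
        '  File "app/db.py", line 42, in connect\n'
        "FAILED tests/api/test_users.py::test_create_user_requires_email\n"
        "error: src/auth/tokens.py failed to compile\n"
        "Process exited with code 1\n"
    )
    for kind, values in extract_signals(sample).items():
        print(kind, values)
```

Because the extraction uses no model, the same raw output always yields the same signals, which is what makes them usable as stable lexical search terms.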

The system prompt explicitly instructs the model to preserve exact filenames, symbols, error codes, test names, line numbers, and commands verbatim. This is the key difference from generic summarization: a correct compress output should be runnable or referenceable, not just readable.


Exact-cache: why the same input never compresses twice


CtxSift builds an exact cache key from:

  • SHA-256 of the raw input
  • normalized instruction (case-folded, whitespace-collapsed)
  • compression intent
  • model id and prompt version
  • max_output_tokens
  • CtxSift version
  • workspace root
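The components above can be combined into a single key in many ways. The sketch below shows one plausible approach: serialize the parts canonically, then hash them. The function names and the JSON-then-SHA-256 combination are assumptions for illustration; only the list of components and the normalization rule (case-folded, whitespace-collapsed) come from this page.

```python
import hashlib
import json


def normalize_instruction(instruction: str) -> str:
    """Case-fold and collapse runs of whitespace into single spaces."""
    return " ".join(instruction.casefold().split())


def exact_cache_key(
    raw_input: bytes,
    instruction: str,
    intent: str,
    model_id: str,
    prompt_version: str,
    max_output_tokens: int,
    ctxsift_version: str,
    workspace_root: str,
) -> str:
    """Derive one hex digest from every component listed above.

    Assumption: the parts are combined as canonical JSON and hashed again.
    """
    parts = {
        "input_sha256": hashlib.sha256(raw_input).hexdigest(),
        "instruction": normalize_instruction(instruction),
        "intent": intent,
        "model_id": model_id,
        "prompt_version": prompt_version,
        "max_output_tokens": max_output_tokens,
        "ctxsift_version": ctxsift_version,
        "workspace_root": workspace_root,
    }
    canonical = json.dumps(parts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this scheme, "Show  only FAILING tests" and "show only failing tests" map to the same key, while changing the intent, the model, or any other component produces a different key and therefore a cache miss.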

If you run the same command with the same instruction on the same input again, the stored result is returned immediately without calling the model. This means:

  • no redundant model calls during re-runs
  • consistent output for the same input for as long as the other key components (model, prompt version, CtxSift version, and so on) stay unchanged
  • recall returns the same compressed form for the same raw output, instruction, and intent

Every compression run that goes through the model stores a record containing:

  • the compressed output
  • the original instruction and normalized form
  • SHA-256 of the raw input
  • referenced files (with SHA-256 checksums and existence flags at capture time)
  • extracted terms (files, test ids, symbols — used for FTS5 lexical recall)
  • a vector embedding of the record (used for semantic recall)
  • command metadata in capture mode: command, exit code, duration, stdout/stderr hashes
  • git metadata in capture mode: HEAD, branch, dirty state
  • model provider, model name, and prompt version

The record is immediately indexed in both FTS5 (lexical) and sqlite-vec (vector) so recall can find it from the next command.
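The "referenced files" entry in the record list stores a checksum and an existence flag per file at capture time. A minimal sketch of how such a snapshot could be taken, and later compared against the workspace, is shown below. The function names, the dictionary layout, and the staleness comparison are hypothetical; this page does not describe how CtxSift uses these checksums internally.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot_files(paths: list[str], root: str = ".") -> list[dict]:
    """Record existence and SHA-256 for each referenced file at capture time."""
    snapshot = []
    for rel in paths:
        target = Path(root) / rel
        exists = target.is_file()
        digest = hashlib.sha256(target.read_bytes()).hexdigest() if exists else None
        snapshot.append({"path": rel, "exists": exists, "sha256": digest})
    return snapshot


def is_stale(entry: dict, root: str = ".") -> bool:
    """True when the file changed, appeared, or disappeared since the snapshot."""
    current = snapshot_files([entry["path"]], root)[0]
    return current != entry


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workspace:
        Path(workspace, "tokens.py").write_text("SECRET = 1\n")
        entry = snapshot_files(["tokens.py"], workspace)[0]
        print(is_stale(entry, workspace))  # unchanged file
        Path(workspace, "tokens.py").write_text("SECRET = 2\n")
        print(is_stale(entry, workspace))  # edited after capture
```

A comparison like this lets a later reader of the record tell whether the files it mentions still match what the compressed output described.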


If the configured backend (local or remote) is unavailable when compress runs, CtxSift does not silently fail or block. It returns the raw output with a warning header and does not store the result:

[ctxsift warning] Local compression failed: <error>. Returning uncompressed output and skipping storage.

This means the agent still gets output it can work with, and the failure is visible. Because nothing is stored, the agent cannot recall this output later. Run ctxsift doctor to diagnose the backend issue.
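The fallback behavior follows a common pass-through pattern, sketched below. The function name and the compress and store callables are hypothetical stand-ins for the model backend and the record store. The warning text is copied from the message above; placing it on its own line ahead of the raw output is an assumption.

```python
from typing import Callable


def compress_or_passthrough(
    raw: str,
    compress: Callable[[str], str],
    store: Callable[[str], None],
) -> str:
    """Return compressed output, or the raw output with a warning on failure."""
    try:
        compressed = compress(raw)
    except Exception as exc:
        # Backend unavailable: surface the failure and skip storage entirely.
        header = (
            f"[ctxsift warning] Local compression failed: {exc}. "
            "Returning uncompressed output and skipping storage."
        )
        return f"{header}\n{raw}"
    # Only successful compressions are stored and become recallable.
    store(compressed)
    return compressed


if __name__ == "__main__":
    def broken_backend(_: str) -> str:
        raise RuntimeError("backend unavailable")

    records: list[str] = []
    print(compress_or_passthrough("raw build log", broken_backend, records.append))
    print(records)
```

The key property is that storage happens only on the success path, so a failed run never leaves a misleading record behind.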


ctxsift compress --intent INTENT [OPTIONS] INSTRUCTION [-- COMMAND [ARGS]...]

| Option | Description |
| --- | --- |
| --intent | Required output contract: summary, recall, exact-lines, exact-format, json, yaml, table, bullet-list |
| --max-output-tokens | Override the max compressed output size for this run |
| INSTRUCTION | Natural-language instruction for the model |
| -- COMMAND | Optional: run this command and capture its output instead of reading stdin |

After a compress run, the record is immediately available to ctxsift recall. The agent does not need to do anything to make it searchable.

See Recall for how to find it again.