Local | Free | Open source

Save tokens and extend your coding sessions.

Raw command output and state recovery after compaction are the biggest sources of token overuse and context clutter. CtxSift is an agent skill that saves tokens by keeping sessions clutter-free and helping agents recover state faster.

Works with Codex, Antigravity, Copilot, Claude Code, Cursor, OpenCode, Cascade, Replit, Aider, and other agents.
 
The motivation behind CtxSift

Show only high-signal tokens to the agent after command runs and context compactions.

In agentic workflows, raw command output and state recovery after compaction are major sources of token waste. Agents often pull raw terminal output into context even when they only need a few anchors, then pay the same cost again when compaction forces them to reread files or rerun commands.

CtxSift was built to cut that loop down to two operations: keep only the signal that matters now, then recover it later without rebuilding the whole state trail.

It was inspired by the original Distill project and extends that approach to local execution, file rereads, and read-after-compression state recovery for coding agents.

Problem
Raw output sprawls
Most command output is noise relative to the next step the agent actually needs to take.
Failure mode
Compaction forgets
After long sessions, the agent often has to reconstruct state through expensive rereads and reruns.
Design goal
Recover state faster
Compression is only useful if the agent can later look up the same finding with trust signals attached.
Approach
Local-first workflow
Run locally when possible, support remote providers when needed, and keep the workflow grounded in real command use.
How it works

With CtxSift, your agents keep a minimal token footprint in two steps:
1. Extract and cache only what they need from raw output.
2. Look up that context later instead of repeatedly re-running commands or dragging raw terminal output back into the session.
That's it. Unlike other token savers, which can get heavy and confuse the agent with multiple tools, CtxSift stays simple and lightweight: no extra tools, no MCP servers, and no sandbox spin-up dependencies.
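Step 1 starts by getting the raw output in hand. The Python sketch below shows the two generic ways that can happen: capturing a command's combined stdout and stderr by running it directly, or reading output that was piped in on stdin. The function names and the request fields are hypothetical illustrations, not CtxSift's interface.

```python
import subprocess
import sys
from typing import Optional, Sequence


def capture_command(argv: Sequence[str], timeout: Optional[float] = None) -> tuple[int, str]:
    """Command-capture style: run the command and return (exit code, stdout+stderr)."""
    proc = subprocess.run(
        list(argv),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so error lines are not lost
        text=True,
        timeout=timeout,
        check=False,  # a failing command is exactly the output worth keeping
    )
    return proc.returncode, proc.stdout


def read_piped_output() -> str:
    """Pipe style: raw output arrives on stdin, e.g. `cmd 2>&1 | <compressor>`."""
    return sys.stdin.read()


def build_request(command: str, exit_code: Optional[int], raw_output: str, instruction: str) -> dict:
    """Hypothetical request shape: raw text plus the instruction and command metadata."""
    return {
        "command": command,
        "exit_code": exit_code,
        "instruction": instruction,
        "raw_output": raw_output,
    }
```

Passing `check=False` matters here: failed commands usually carry the most useful anchors, so the sketch returns their output instead of raising.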

Compress
The agent uses pipe mode or command-capture mode with a natural-language instruction to extract exactly what it needs.
Cache
The compressed output is stored automatically with command metadata, so it can be searched later instead of being recreated.
Recall
The agent queries the stored records, optionally boosted by related files, and gets back exactly what it stored earlier.
Freshness
When source files change, affected records are marked stale, so older context is down-ranked and eventually cleaned up.
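The four stages above can be sketched as a small in-memory store. This is an illustration under stated assumptions, not CtxSift's implementation: `compress()` is a keyword filter standing in for the model call, staleness is detected by comparing SHA-256 hashes of tracked files, and `Record`, `RecordStore`, and their methods are hypothetical names.

```python
import hashlib
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterable


def file_digest(path: str) -> str:
    """SHA-256 of a file's bytes; a changed digest means dependent records may be stale."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compress(raw_output: str, instruction: str) -> str:
    """Stand-in for the model call: keeps error-looking lines and ignores `instruction`."""
    markers = ("error", "fatal", "failed")
    return "\n".join(
        line for line in raw_output.splitlines()
        if any(m in line.lower() for m in markers)
    )


@dataclass
class Record:
    command: str                  # command metadata stored with the summary
    summary: str                  # compressed output
    file_digests: dict[str, str]  # tracked source file -> digest at cache time
    created: float = field(default_factory=time.time)

    def is_stale(self) -> bool:
        """Stale if any tracked file was deleted or its contents changed."""
        for path, digest in self.file_digests.items():
            if not Path(path).exists() or file_digest(path) != digest:
                return True
        return False


class RecordStore:
    def __init__(self) -> None:
        self.records: list[Record] = []

    def cache(self, command: str, raw_output: str, instruction: str,
              files: Iterable[str] = ()) -> Record:
        """Compress once, then store the summary with command and file metadata."""
        record = Record(command, compress(raw_output, instruction),
                        {f: file_digest(f) for f in files})
        self.records.append(record)
        return record

    def recall(self, query: str) -> list[Record]:
        """Keyword lookup; fresh records rank ahead of stale ones, newest first."""
        terms = query.lower().split()
        hits = [r for r in self.records
                if any(t in f"{r.command} {r.summary}".lower() for t in terms)]
        return sorted(hits, key=lambda r: (r.is_stale(), -r.created))

    def prune_stale(self) -> int:
        """Drop stale records and report how many were removed."""
        kept = [r for r in self.records if not r.is_stale()]
        removed = len(self.records) - len(kept)
        self.records = kept
        return removed
```

Hashing file contents rather than checking modification times avoids false staleness when a file is touched but not changed; a real tool may choose differently.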
Real compression examples

See what CtxSift keeps from raw command output and what it strips away.

These cases come from benchmark fixtures and the latest benchmark outputs, with token counts shown for the raw input and the final compressed record.
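The "Smaller" figure in each example is consistent with 100 × (1 - output tokens / raw tokens), rounded to one decimal place. This short sketch reproduces the three published figures from their token counts; the formula is inferred from the numbers, not taken from CtxSift's code.

```python
def percent_smaller(raw_tokens: int, output_tokens: int) -> float:
    """Share of raw tokens removed, as a percentage rounded to one decimal."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return round(100 * (1 - output_tokens / raw_tokens), 1)


# Token counts from the examples on this page: (raw, output).
examples = {
    "systemd-02": (430, 25),
    "git-03": (461, 32),
    "docker-compose-02": (364, 28),
}
for name, (raw, out) in examples.items():
    print(f"{name}: {percent_smaller(raw, out)}% smaller")
# systemd-02: 94.2% smaller
# git-03: 93.1% smaller
# docker-compose-02: 92.3% smaller
```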

Service restart-loop summary

systemd summary
Command: systemctl restart api.service
Tokens: 430 raw → 25 output (94.2% smaller)
Before
$ systemctl restart api.service
Job for api.service failed because the control process exited with error code.
See "systemctl status api.service" and "journalctl -u api.service -n 50" for details.
$ systemctl status api.service --no-pager
* api.service - HTTP API
May 16 11:27:40 buildbox api-server[21871]: {"level":"error","msg":"parse config","file":"/etc/api/config.yaml","line":17,"error":"yaml: unmarshal errors: line 17: field timeuot not found in type config.Server"}
May 16 11:27:41 buildbox systemd[1]: api.service: Start request repeated too quickly.
After
api.service fails due to a config error at /etc/api/config.yaml line 17: unexpected field "timeuot".
Source: benchmark fixture systemd-02 and latest remote gpt-4.1 benchmark output

Clone failure recall

git recall
Command: git clone --progress https://github.com/example/very-large-repo.git
Tokens: 461 raw → 32 output (93.1% smaller)
Before
$ git clone --progress https://github.com/example/very-large-repo.git
Cloning into 'very-large-repo'...
remote: Enumerating objects: 248731, done.
remote: Compressing objects: 100% (84791/84791), done.
Receiving objects:  64% (159188/248731), 125.01 MiB | 210.00 KiB/s
error: RPC failed; curl 56 Recv failure: Connection reset by peer
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
After
git clone --progress https://github.com/example/very-large-repo.git failed with curl 56 Connection reset by peer and fetch-pack: invalid index-pack output
Source: benchmark fixture git-03 and latest remote gpt-4.1 benchmark output

Compose startup recall

docker-compose recall
Command: docker compose up --build
Tokens: 364 raw → 28 output (92.3% smaller)
Before
$ docker compose up --build
[+] Running 3/3
[ok] Network demo_default     Created
[ok] Container demo-db-1      Created
[ok] Container demo-app-1     Created
db-1   | database system is ready to accept connections
app-1  | applying migrations
app-1  | ERROR sqlalchemy.exc.ProgrammingError: relation "tenant_settings" does not exist
app-1 exited with code 1
Aborting on container exit...
After
docker compose up --build, demo-app-1, tenant_settings, sqlalchemy.exc.ProgrammingError, app-1 exited with code 1
Source: benchmark fixture docker-compose-02 and latest remote gpt-4.1 benchmark output
Supported Models

Use local models on CPU/GPU or remotely hosted LLMs for compression.

By default, CtxSift starts with a small GGUF model on the local CPU. If CUDA is available, local compression can use standard Hugging Face text-generation models instead. If you prefer hosted inference, remote compression works through LiteLLM-compatible endpoints.

Recall embeddings stay local and separate from compression, so the retrieval path remains the same whether compression is local or remote.
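Because recall depends only on a local embedding function, the ranking step can be sketched without any reference to how a record was compressed. In this hypothetical Python sketch, `toy_embed` is a hashed bag-of-words stand-in for a real local embedding model such as Harrier OSS v1 0.6B, and the file boost and stale penalty values are illustrative assumptions that mirror the file boosting and stale down-ranking described earlier.

```python
import math
import zlib
from dataclasses import dataclass
from typing import Callable, Iterable

Vector = list[float]
Embedder = Callable[[str], Vector]


def toy_embed(text: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in for a local embedding model: hashed bag of words."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    return vec


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class StoredRecord:
    summary: str
    files: frozenset[str]
    stale: bool
    vector: Vector


def make_record(summary: str, files: Iterable[str] = (), stale: bool = False,
                embed: Embedder = toy_embed) -> StoredRecord:
    # Embedded once at store time, whichever backend produced `summary`.
    return StoredRecord(summary, frozenset(files), stale, embed(summary))


def recall(query: str, records: Iterable[StoredRecord], embed: Embedder = toy_embed,
           files: Iterable[str] = (), file_boost: float = 0.1,
           stale_penalty: float = 0.2, top_k: int = 3) -> list[StoredRecord]:
    """Rank by cosine similarity, boost records tied to `files`, down-rank stale ones."""
    q = embed(query)
    wanted = set(files)
    scored = []
    for rec in records:
        score = cosine(q, rec.vector)
        if wanted & rec.files:
            score += file_boost
        if rec.stale:
            score -= stale_penalty
        scored.append((score, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in scored[:top_k]]
```

Taking the embedder as a parameter is what keeps the retrieval path identical for local and remote compression: only the summaries change, never the way they are indexed.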

Local model guide →

Remote model guide →

Open benchmark guide →

Latest benchmark scores →

Support at a glance
Component | Default / support | How it works
Local compression | Granite 4.0 350M GGUF by default | On CPU, runs GGUF models through built-in llama.cpp. CUDA local mode supports standard Hugging Face text-generation models.
Remote compression | Any LiteLLM-compatible provider | Enabled when a remote base URL and model name are configured. Replaces local compression, not local embeddings.
Recall embeddings | Harrier OSS v1 0.6B | Used for storing and recalling records, whether compression is local or remote.

Supported local model families go beyond the defaults shown here. The benchmarked picks below are the fastest way to start from known-good options.
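The selection rules in the table reduce to a few conditions. A minimal sketch, assuming hypothetical setting names (`remote_base_url`, `remote_model`, `cuda_available`, `local_hf_model`) and assuming a CUDA machine switches to a Hugging Face model only when one is configured; CtxSift's actual settings and precedence may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RuntimeConfig:
    # Hypothetical field names; CtxSift's real settings may be named differently.
    remote_base_url: Optional[str] = None
    remote_model: Optional[str] = None
    cuda_available: bool = False
    local_hf_model: Optional[str] = None  # Hugging Face text-generation model id


def select_backends(cfg: RuntimeConfig) -> tuple[str, str]:
    """Return (compression backend, embedding backend) as descriptive labels."""
    if cfg.remote_base_url and cfg.remote_model:
        # Remote mode needs both a base URL and a model name, and replaces only compression.
        compression = "remote: LiteLLM-compatible endpoint"
    elif cfg.cuda_available and cfg.local_hf_model:
        # Assumption: GPU mode is used only when a Hugging Face model is configured.
        compression = f"local GPU: {cfg.local_hf_model}"
    else:
        compression = "local CPU: Granite 4.0 350M GGUF via llama.cpp"
    # Recall embeddings stay local regardless of the compression backend.
    embeddings = "local: Harrier OSS v1 0.6B"
    return compression, embeddings
```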

Choose Your Setup Path

Start with the runtime path that matches your machine and workflow.

Use local CPU for the simplest default path, local GPU when you want faster local inference, and remote provider mode when you want hosted models through a LiteLLM-compatible endpoint.

Benchmarked model comparisons live on their own page. Use the benchmark guide when you want tested CPU and GPU recommendations rather than setup instructions.

Open benchmark guide →

Automated Install

Install with standalone scripts or let your agent install it with the install skill.