Configure
CtxSift has a guided setup that you can run to get everything set up after install. Typically, you only need to run it once.
    ctxsift configure
This walks you through choosing local or remote compression, picking a model, writing a config file for your machine or workspace, and installing the skill. This will also pre-download the required models so that you can start using it right away.
After running it, verify everything is working:
    ctxsift doctor
Agent skill installation
If you prefer manual installation, download the skill from here.
ctxsift configure can install the CtxSift skill for supported agent hosts as part of the same guided flow.
During configure, you will be asked:
- Install the CtxSift agent skill for supported coding agents? (default: no)
- Install for (numbers, ranges, names, or all), with a numbered host list
- Scope prompts per selected host
- Target-path prompts for hosts that do not use one fixed built-in location
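The host prompt above accepts numbers, ranges, names, or all. As a rough illustration of how such an answer could be resolved against the numbered host list, here is a minimal sketch. The function name, the comma separator, and the `2-4` range syntax are assumptions made for this example, not CtxSift's actual parser.

```python
def parse_host_selection(answer: str, hosts: list[str]) -> list[str]:
    """Resolve a selection string against a 1-based numbered host list.

    Hypothetical helper: the comma separator and the "a-b" range syntax
    are assumptions for illustration only.
    """

    def to_index(text: str) -> int:
        number = int(text)
        if not 1 <= number <= len(hosts):
            raise ValueError(f"host number out of range: {number}")
        return number - 1  # convert to a 0-based list index

    answer = answer.strip().lower()
    if answer == "all":
        return list(hosts)

    selected: list[str] = []
    for token in (part.strip() for part in answer.split(",")):
        if not token:
            continue
        if token in hosts:  # a host name such as "claude-code"
            picks = [token]
        elif token.isdigit():  # a single number such as "3"
            picks = [hosts[to_index(token)]]
        else:  # a numeric range such as "2-4" (inclusive)
            start, sep, end = token.partition("-")
            if not (sep and start.isdigit() and end.isdigit()):
                raise ValueError(f"unrecognized selection: {token!r}")
            picks = hosts[to_index(start) : to_index(end) + 1]
        for host in picks:
            if host not in selected:  # keep first-seen order, drop duplicates
                selected.append(host)
    return selected
```

Names are checked before ranges so that hyphenated host names such as claude-code are not mistaken for a numeric range.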
Supported host names:
- copilot
- antigravity
- claude-code
- codex
- cursor
- windsurf-cascade
- cline
- roo-code
- kilo-code
- continue
- aider
- opencode
- gemini-cli
- qwen-code
- kiro
- jetbrains-junie
- openhands
- zed-agent
- sourcegraph-amp
- augment-auggie
- factory-droid
- amazon-q-developer
- replit-agent
- devin
- codegen
- google-jules
- other (custom target)
If you want to see the full support matrix before running setup, including which hosts are global-only, workspace-only, or shared-file based, see Supported agents.
For the newly added hosts, configure uses the documented global/workspace support from the host catalog and then suggests a default target path for that scope. Some hosts use a dedicated skill folder, while others use shared files such as AGENTS.md, GEMINI.md, or a rules/workflows folder. For shared files, CtxSift writes only its managed block instead of overwriting the whole file.
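To make the managed-block behavior for shared files concrete, here is a minimal sketch of an update that touches only a delimited span and leaves the rest of the file alone. The marker strings and the function name are hypothetical; the markers CtxSift actually writes are not documented here.

```python
BEGIN = "<!-- ctxsift:begin -->"  # hypothetical marker, for illustration only
END = "<!-- ctxsift:end -->"      # hypothetical marker, for illustration only


def upsert_managed_block(existing: str, body: str) -> str:
    """Insert or refresh the managed block without touching other content."""
    block = f"{BEGIN}\n{body.strip()}\n{END}"
    start = existing.find(BEGIN)
    end = existing.find(END, start + len(BEGIN)) if start != -1 else -1
    if start != -1 and end != -1:
        # Replace only the managed span; text before and after is preserved.
        return existing[:start] + block + existing[end + len(END):]
    if not existing.strip():
        # Empty or missing file: the block becomes the whole content.
        return block + "\n"
    # No managed block yet: append one after a blank line.
    return existing.rstrip("\n") + "\n\n" + block + "\n"
```

Running the function twice with the same body returns the same text, which matches the idea of an "Already current" result on a repeated install.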
For other, and for any host where you want a different location than the suggested default, configure asks for a target path. You can provide:
- a full file path ending in SKILL.md, or
- a directory path (for example .agents/skills/ctxsift) and CtxSift writes SKILL.md inside it.
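The two accepted forms of a target path can be sketched as a small normalization step. This is an illustrative helper under the rule stated above; the function name is hypothetical and the real implementation may validate more.

```python
from pathlib import Path


def resolve_skill_target(target: str) -> Path:
    """Return the SKILL.md path for a file-path or directory-path answer."""
    path = Path(target).expanduser()
    if path.name == "SKILL.md":
        # Already a full file path ending in SKILL.md.
        return path
    # Otherwise treat the answer as a directory and place SKILL.md inside it.
    return path / "SKILL.md"
```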
Configure prints one line per install result, such as:
    Installed CtxSift skill for Copilot (workspace) at /path/to/repo/.github/skills/ctxsift/SKILL.md
    Already current CtxSift skill for Codex (global) at /home/user/.codex/skills/ctxsift/SKILL.md
How config works
CtxSift resolves settings from four layers, in order:
    Environment variable > Workspace config > Global config > Default
Global config applies to all workspaces on the machine. Stored at:
- Linux/macOS: ~/.config/ctxsift/config.toml
- Windows: %LOCALAPPDATA%\ctxsift\ctxsift\config.toml
Workspace config overrides global for one repo. Stored at:
- .git/ctxsift/config.toml inside Git repos
- .ctxsift/config.toml for non-Git workspaces
Environment variables override everything for the current shell or process. Useful in CI and per-run overrides.
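The layering described above can be sketched as a lookup that checks the highest-precedence source first. This is a simplified illustration, not CtxSift's resolver: the function names are hypothetical, the config layers are passed in as already-parsed dictionaries, an empty environment variable is assumed to count as unset, and the non-Git workspace path is assumed to be .ctxsift/config.toml.

```python
import os
from pathlib import Path
from typing import Any, Mapping


def workspace_config_path(root: Path) -> Path:
    """Pick the workspace config location for a Git repo or a plain directory."""
    if (root / ".git").is_dir():
        return root / ".git" / "ctxsift" / "config.toml"
    return root / ".ctxsift" / "config.toml"


def resolve_setting(
    key: str,
    env_var: str,
    default: Any,
    workspace: Mapping[str, Any],
    global_: Mapping[str, Any],
    environ: Mapping[str, str] = os.environ,
) -> Any:
    """Return the first value found: env var, workspace, global, then default."""
    raw = environ.get(env_var, "")
    if raw != "":  # assumption: an empty variable counts as unset
        return raw  # still a string; type coercion is left out of this sketch
    if key in workspace:
        return workspace[key]
    if key in global_:
        return global_[key]
    return default
```

The environment variable name is passed explicitly because the documented names do not always mirror the key (for example, remote.base_url maps to CTXSIFT_LLM_BASE_URL).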
To see what CtxSift is actually resolving (secrets redacted):
    ctxsift config show # workspace-aware resolved config
    ctxsift config show --global # global config file only
To write one key at a time:
    ctxsift config set <key> <value> # writes to workspace config
    ctxsift config set <key> <value> --global # writes to global config
Compression model
CtxSift uses two separate model roles: one for compression and one for embeddings (recall). They are configured independently.
Local CPU (default)
CPU mode uses embedded llama.cpp with a GGUF model. You supply a Hugging Face GGUF repo id and one concrete .gguf filename from that repo.
| Key | Default | Env var |
|---|---|---|
| local.model | ibm-granite/granite-4.0-350m-GGUF | CTXSIFT_LOCAL_MODEL |
| local.gguf_filename | granite-4.0-350m-Q8_0.gguf | CTXSIFT_LOCAL_GGUF_FILENAME |
| local.llama_context_window | 8192 (built-in) | CTXSIFT_LOCAL_LLAMA_CONTEXT_WINDOW |
| local.device | auto | CTXSIFT_LOCAL_DEVICE |
    ctxsift config set local.model unsloth/Qwen3.5-0.8B-GGUF --global
    ctxsift config set local.gguf_filename Qwen3.5-0.8B-Q8_0.gguf --global
See Local models for benchmarked CPU picks.
Local CUDA
CUDA mode uses Transformers. Supply a standard Hugging Face text-generation model id. gguf_filename is ignored on this path.
| Key | Default | Env var |
|---|---|---|
| local.model | ibm-granite/granite-4.0-350m-GGUF | CTXSIFT_LOCAL_MODEL |
| local.device | auto | CTXSIFT_LOCAL_DEVICE |
| local.dtype | auto | CTXSIFT_LOCAL_DTYPE |
| local.attn_implementation | auto | CTXSIFT_LOCAL_ATTN_IMPLEMENTATION |
| local.quantization | none | CTXSIFT_LOCAL_QUANTIZATION |
| local.model_cache_path | (empty) | CTXSIFT_MODEL_CACHE_PATH |
    ctxsift config set local.model LiquidAI/LFM2.5-1.2B-Instruct --global
    ctxsift config set local.device cuda --global
local.dtype: accepts auto, float32, float16, bfloat16. Leave at auto unless you have a specific compatibility reason.
local.attn_implementation: accepts auto, sdpa, flash_attention_2.
- auto: CtxSift picks the safest supported backend automatically
- sdpa: most conservative, broadly compatible
- flash_attention_2: better throughput on supported CUDA GPUs; requires the flash-attn package. Not available on Windows in most setups.
local.quantization: only applies to CUDA/Transformers. CPU llama.cpp models ignore this setting because quantization is baked into the .gguf file itself.
- none: load at full precision (default)
- bnb-8bit: 8-bit BitsAndBytes; good first step when VRAM is tight
- bnb-4bit-fp4 / bnb-4bit-nf4: more aggressive; lower memory, more quality risk
The bnb modes require ctxsift[gpu,quant].
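For readers who know the Transformers side, the quantization values above plausibly correspond to BitsAndBytesConfig options. The mapping below is a sketch of that correspondence, not CtxSift's own code; the function name is hypothetical, and the keyword names (load_in_8bit, load_in_4bit, bnb_4bit_quant_type) are the ones BitsAndBytesConfig accepts.

```python
from typing import Any, Optional


def bnb_kwargs(quantization: str) -> Optional[dict[str, Any]]:
    """Map a local.quantization value to BitsAndBytesConfig keyword arguments.

    Returns None for "none", meaning no quantization config is needed.
    Assumed mapping for illustration; CtxSift may set additional options.
    """
    table: dict[str, Optional[dict[str, Any]]] = {
        "none": None,
        "bnb-8bit": {"load_in_8bit": True},
        "bnb-4bit-fp4": {"load_in_4bit": True, "bnb_4bit_quant_type": "fp4"},
        "bnb-4bit-nf4": {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"},
    }
    if quantization not in table:
        raise ValueError(f"unsupported quantization mode: {quantization!r}")
    return table[quantization]
```

With Transformers installed, the result for a bnb mode could be passed as `BitsAndBytesConfig(**bnb_kwargs("bnb-4bit-nf4"))` and handed to `from_pretrained` through its `quantization_config` argument.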
local.model_cache_path — When set, CtxSift saves quantized checkpoints here to speed up cold starts. Leave empty to use the default Hugging Face cache.
Remote compression
Set remote.base_url to switch from local to a hosted provider via LiteLLM.
| Key | Default | Env var |
|---|---|---|
| remote.base_url | (empty, meaning local mode) | CTXSIFT_LLM_BASE_URL |
| remote.model_name | (empty) | CTXSIFT_LLM_MODEL |
| remote.api_key | (empty) | CTXSIFT_LLM_API_KEY |
| remote.api_version | (empty) | CTXSIFT_LLM_API_VERSION |
| remote.reasoning_mode | auto | CTXSIFT_LLM_REASONING_MODE |
    ctxsift config set remote.base_url https://api.openai.com/v1 --global
    ctxsift config set remote.model_name gpt-4o-mini --global
    ctxsift config set remote.api_key YOUR_KEY --global
remote.reasoning_mode: accepts auto, true, false. This does not control reasoning effort; it tells CtxSift whether the model supports reasoning tokens so it can adjust its prompt structure. Leave at auto for most providers.
Remote mode replaces local compression but not local embeddings. Recall still runs through the embedding model regardless of which compression path is active.
General compression settings
These apply to all compression regardless of local or remote mode.
| Key | Default | Env var |
|---|---|---|
| max_output_tokens | 512 | CTXSIFT_MAX_OUTPUT_TOKENS |
| timeout_ms | 90000 | CTXSIFT_TIMEOUT_MS |
| retries | 1 | CTXSIFT_RETRIES |
| recovery_enabled | true | CTXSIFT_RECOVERY_ENABLED |
    ctxsift config set max_output_tokens 768
    ctxsift config set timeout_ms 120000
    ctxsift config set recovery_enabled false
Embeddings
Recall uses a local Sentence Transformers-compatible embedding model. This runs independently of the compression path: even when compression is remote, embeddings stay local.
| Key | Default | Env var |
|---|---|---|
| embedding.model | microsoft/harrier-oss-v1-0.6b | CTXSIFT_EMBEDDING_MODEL |
| embedding.backend | auto | CTXSIFT_EMBEDDING_BACKEND |
| embedding.device | auto | CTXSIFT_EMBEDDING_DEVICE |
| embedding.dtype | auto | CTXSIFT_EMBEDDING_DTYPE |
| embedding.attn_implementation | auto | CTXSIFT_EMBEDDING_ATTN_IMPLEMENTATION |
| embedding.max_length | 32768 | CTXSIFT_EMBEDDING_MAX_LENGTH |
embedding.backend — auto prefers ONNX Runtime for the default Harrier model on CPU when available, otherwise falls back to Torch.
embedding.model — Must be a Sentence Transformers-compatible model. Switching models changes the embedding dimension, which can require reinitializing the vector store for a workspace.
Advanced embedding prompt overrides (leave empty unless you know what they do):
| Key | Env var |
|---|---|
| embedding.query_prompt_name | CTXSIFT_EMBEDDING_QUERY_PROMPT_NAME |
| embedding.query_prompt | CTXSIFT_EMBEDDING_QUERY_PROMPT |
| embedding.document_prompt_name | CTXSIFT_EMBEDDING_DOCUMENT_PROMPT_NAME |
Recall
These control how many candidates are fetched and scored during a recall search.
| Key | Default | Env var | Notes |
|---|---|---|---|
| recall.default_limit | 10 | CTXSIFT_RECALL_DEFAULT_LIMIT | Max results shown when --limit is not passed |
| recall.lexical_candidate_limit | 50 | CTXSIFT_RECALL_LEXICAL_CANDIDATE_LIMIT | BM25 candidates before hybrid fusion |
| recall.vector_candidate_limit | 50 | CTXSIFT_RECALL_VECTOR_CANDIDATE_LIMIT | Vector candidates before hybrid fusion |
| recall.max_vector_distance | 0.75 | CTXSIFT_RECALL_MAX_VECTOR_DISTANCE | Smaller = stricter semantic filtering |
Higher candidate limits improve recall quality but cost more SQLite and vector search work. The defaults are tuned for typical session sizes.
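To show how the four recall settings could interact, the sketch below filters vector candidates by distance and then merges them with the lexical candidates. The fusion method here is reciprocal rank fusion, chosen only because it is a common way to combine ranked lists; the document does not say which fusion CtxSift uses, and the function name and the constant k are assumptions.

```python
def fuse_candidates(
    lexical: list[str],
    vector: list[tuple[str, float]],
    *,
    limit: int = 10,
    max_vector_distance: float = 0.75,
    k: int = 60,
) -> list[str]:
    """Merge two ranked candidate lists into one result list.

    lexical: record ids ordered best-first (for example, by BM25 score).
    vector:  (record id, distance) pairs ordered nearest-first.
    Uses reciprocal rank fusion as a stand-in for the real fusion step.
    """
    scores: dict[str, float] = {}
    for rank, record_id in enumerate(lexical, start=1):
        scores[record_id] = scores.get(record_id, 0.0) + 1.0 / (k + rank)

    # Drop vector hits that are too far away before they can contribute.
    kept = [rid for rid, distance in vector if distance <= max_vector_distance]
    for rank, record_id in enumerate(kept, start=1):
        scores[record_id] = scores.get(record_id, 0.0) + 1.0 / (k + rank)

    # Highest fused score first; ties broken by id for a stable order.
    ranked = sorted(scores, key=lambda rid: (-scores[rid], rid))
    return ranked[:limit]
```

In this sketch, a record found by both searches outranks one found by only one of them, and lowering max_vector_distance removes weak semantic matches before fusion.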
Daemon
CtxSift serves local models through background daemons, one per effective runtime signature. They auto-start on first use, stay warm across workspaces with matching config, and shut down after idle.
| Key | Default | Env var |
|---|---|---|
| daemon.enabled | true | CTXSIFT_DAEMON_ENABLED |
| daemon.idle_timeout_seconds | 600 | CTXSIFT_DAEMON_IDLE_TIMEOUT_SECONDS |
| daemon.startup_timeout_ms | 15000 | CTXSIFT_DAEMON_STARTUP_TIMEOUT_MS |
| daemon.embedding_batch_window_ms | 20 | CTXSIFT_DAEMON_EMBEDDING_BATCH_WINDOW_MS |
| daemon.embedding_max_batch_size | 16 | CTXSIFT_DAEMON_EMBEDDING_MAX_BATCH_SIZE |
When daemon.enabled is false, CtxSift falls back to in-process model loading — slower cold starts and no cross-workspace model reuse.
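The two embedding batch settings are not explained further here. One plausible reading, sketched below, is micro-batching: requests that arrive within a short window are grouped into one batch, capped at a maximum size. This interpretation, the function name, and the use of arrival timestamps are all assumptions made for illustration.

```python
def plan_batches(
    arrivals_ms: list[int],
    window_ms: int = 20,
    max_batch_size: int = 16,
) -> list[list[int]]:
    """Group request arrival times (in ms) into batches.

    A batch opens with its first request and closes when either the window
    measured from that first request has elapsed or the batch is full.
    """
    batches: list[list[int]] = []
    current: list[int] = []
    for arrival in sorted(arrivals_ms):
        window_elapsed = bool(current) and arrival - current[0] > window_ms
        batch_full = len(current) >= max_batch_size
        if current and (window_elapsed or batch_full):
            batches.append(current)
            current = []
        current.append(arrival)
    if current:
        batches.append(current)
    return batches
```

Under this reading, a larger window trades a little latency for fewer, larger embedding calls.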
    ctxsift daemon status
    ctxsift daemon stop --all
Retention
Controls how long compressed records are kept before background cleanup removes them.
| Key | Default | Env var |
|---|---|---|
| retention.max_age_days | 30 | CTXSIFT_RETENTION_MAX_AGE_DAYS |
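As a small illustration of an age-based retention rule, the check below marks a record as expired once it is older than the configured number of days. The function name and the use of a creation timestamp are assumptions; how CtxSift actually timestamps and sweeps records is not described here.

```python
from datetime import datetime, timedelta, timezone


def is_expired(created_at: datetime, now: datetime, max_age_days: int = 30) -> bool:
    """Return True when a record is older than the retention window."""
    return now - created_at > timedelta(days=max_age_days)


# Example: with the default of 30 days, a 31-day-old record is eligible for cleanup.
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
old_record = datetime(2024, 12, 31, tzinfo=timezone.utc)
recent_record = datetime(2025, 1, 10, tzinfo=timezone.utc)
```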
Quick reference
All env vars at a glance:
    # General
    CTXSIFT_MAX_OUTPUT_TOKENS=512
    CTXSIFT_TIMEOUT_MS=90000
    CTXSIFT_RETRIES=1
    CTXSIFT_RECOVERY_ENABLED=true

    # Remote
    CTXSIFT_LLM_BASE_URL=
    CTXSIFT_LLM_MODEL=
    CTXSIFT_LLM_API_KEY=
    CTXSIFT_LLM_API_VERSION=
    CTXSIFT_LLM_REASONING_MODE=auto

    # Local compression
    CTXSIFT_LOCAL_MODEL=ibm-granite/granite-4.0-350m-GGUF
    CTXSIFT_LOCAL_GGUF_FILENAME=granite-4.0-350m-Q8_0.gguf
    CTXSIFT_LOCAL_LLAMA_CONTEXT_WINDOW=8192
    CTXSIFT_LOCAL_DEVICE=auto
    CTXSIFT_LOCAL_DTYPE=auto
    CTXSIFT_LOCAL_ATTN_IMPLEMENTATION=auto
    CTXSIFT_LOCAL_QUANTIZATION=none
    CTXSIFT_MODEL_CACHE_PATH=

    # Embeddings
    CTXSIFT_EMBEDDING_MODEL=microsoft/harrier-oss-v1-0.6b
    CTXSIFT_EMBEDDING_BACKEND=auto
    CTXSIFT_EMBEDDING_DEVICE=auto
    CTXSIFT_EMBEDDING_DTYPE=auto
    CTXSIFT_EMBEDDING_ATTN_IMPLEMENTATION=auto
    CTXSIFT_EMBEDDING_MAX_LENGTH=32768

    # Recall
    CTXSIFT_RECALL_DEFAULT_LIMIT=10
    CTXSIFT_RECALL_LEXICAL_CANDIDATE_LIMIT=50
    CTXSIFT_RECALL_VECTOR_CANDIDATE_LIMIT=50
    CTXSIFT_RECALL_MAX_VECTOR_DISTANCE=0.75

    # Daemon
    CTXSIFT_DAEMON_ENABLED=true
    CTXSIFT_DAEMON_IDLE_TIMEOUT_SECONDS=600
    CTXSIFT_DAEMON_STARTUP_TIMEOUT_MS=15000
    CTXSIFT_DAEMON_EMBEDDING_BATCH_WINDOW_MS=20
    CTXSIFT_DAEMON_EMBEDDING_MAX_BATCH_SIZE=16

    # Retention
    CTXSIFT_RETENTION_MAX_AGE_DAYS=30
See .env.example in the repo root for the full annotated version.
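Environment variables always arrive as strings, so a script that reads these values itself (for example, a CI wrapper) needs to convert them. The helpers below are a minimal sketch of that conversion; the function names and the accepted boolean spellings are assumptions, not a description of how CtxSift parses its environment.

```python
import os
from typing import Mapping

_TRUE = {"1", "true", "yes", "on"}    # assumed truthy spellings
_FALSE = {"0", "false", "no", "off"}  # assumed falsy spellings


def env_int(name: str, default: int, environ: Mapping[str, str] = os.environ) -> int:
    """Read an integer variable, falling back to the default when unset or empty."""
    raw = environ.get(name, "").strip()
    return int(raw) if raw else default


def env_bool(name: str, default: bool, environ: Mapping[str, str] = os.environ) -> bool:
    """Read a boolean variable, falling back to the default when unset or empty."""
    raw = environ.get(name, "").strip().lower()
    if not raw:
        return default
    if raw in _TRUE:
        return True
    if raw in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean, got {raw!r}")
```

For instance, `env_int("CTXSIFT_TIMEOUT_MS", 90000)` yields the documented default when the variable is not set.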