Configure

CtxSift has a guided setup that you can run after install to get everything set up. Typically, you only need to run it once.

ctxsift configure

This walks you through choosing local or remote compression, picking a model, writing a config file for your machine or workspace, and installing the skill. It also pre-downloads the required models so that you can start using CtxSift right away.

After running it, verify everything is working:

ctxsift doctor

If you prefer manual installation, download the skill from here.

ctxsift configure can install the CtxSift skill for supported agent hosts as part of the same guided flow.

During configure, you will be asked:

  1. Install the CtxSift agent skill for supported coding agents? (default: no)
  2. Install for (numbers, ranges, names, or all) with a numbered host list
  3. Scope prompts per selected host
  4. Target-path prompts for hosts that do not use one fixed built-in location
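To make the "numbers, ranges, names, or all" answer format concrete, here is a minimal sketch of how such an answer could be expanded into host names. The function name parse_host_selection and the exact grammar (comma or space separators, 1-based numbers) are assumptions for illustration, not CtxSift's actual parser.

```python
def parse_host_selection(answer: str, hosts: list[str]) -> list[str]:
    """Expand an answer like "1, 3-4, cursor" or "all" into host names.

    Assumed grammar (not taken from CtxSift): tokens are separated by commas
    or spaces, and numbers are 1-based positions in the displayed host list.
    """
    text = answer.strip().lower()
    if text == "all":
        return list(hosts)

    chosen: list[str] = []
    for token in text.replace(",", " ").split():
        if token in hosts:
            # Check names first: several host names contain hyphens.
            picks = [token]
        else:
            first, sep, last = token.partition("-")
            if not sep:
                last = first  # a bare number is a one-item range
            if not (first.isdigit() and last.isdigit()):
                raise ValueError(f"unrecognized selection: {token!r}")
            start, end = int(first), int(last)
            if not 1 <= start <= end <= len(hosts):
                raise ValueError(f"selection out of range: {token!r}")
            picks = hosts[start - 1:end]
        # Keep first-seen order and drop duplicates.
        for name in picks:
            if name not in chosen:
                chosen.append(name)
    return chosen
```

With a five-item list starting copilot, antigravity, claude-code, codex, cursor, the answer "1, 3-4, cursor" would select copilot, claude-code, codex, and cursor under these assumptions.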

Supported host names:

  • copilot
  • antigravity
  • claude-code
  • codex
  • cursor
  • windsurf-cascade
  • cline
  • roo-code
  • kilo-code
  • continue
  • aider
  • opencode
  • gemini-cli
  • qwen-code
  • kiro
  • jetbrains-junie
  • openhands
  • zed-agent
  • sourcegraph-amp
  • augment-auggie
  • factory-droid
  • amazon-q-developer
  • replit-agent
  • devin
  • codegen
  • google-jules
  • other (custom target)

If you want to see the full support matrix before running setup, including which hosts are global-only, workspace-only, or shared-file based, see Supported agents.

For the newly added hosts, configure uses the documented global/workspace support from the host catalog and then suggests a default target path for that scope. Some hosts use a dedicated skill folder, while others use shared files such as AGENTS.md, GEMINI.md, or a rules/workflows folder. For shared files, CtxSift writes only its managed block instead of overwriting the whole file.
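The managed-block behavior for shared files can be pictured with a short sketch. The marker strings and the function name upsert_managed_block below are hypothetical; the real markers CtxSift writes are not documented here. The point is only that content outside the markers is left untouched and repeated runs do not duplicate the block.

```python
# Hypothetical markers, chosen for illustration only.
BEGIN = "<!-- ctxsift:begin -->"
END = "<!-- ctxsift:end -->"


def upsert_managed_block(existing: str, body: str) -> str:
    """Insert or replace the managed block, leaving other text untouched."""
    block = f"{BEGIN}\n{body.strip()}\n{END}"
    start = existing.find(BEGIN)
    end = existing.find(END, start + len(BEGIN)) if start != -1 else -1

    if start != -1 and end != -1:
        # A block already exists: swap only the span between the markers.
        return existing[:start] + block + existing[end + len(END):]
    if not existing.strip():
        # Empty or missing file: the block becomes the whole content.
        return block + "\n"
    # No block yet: append it after the user's own content.
    return existing.rstrip("\n") + "\n\n" + block + "\n"
```

Running the function twice with the same body returns the same text, which matches the "Already current" style of result that configure reports.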

For other, and for any host where you want a different location than the suggested default, configure asks for a target path. You can provide:

  • a full file path ending in SKILL.md, or
  • a directory path (for example .agents/skills/ctxsift) and CtxSift writes SKILL.md inside it.
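The two accepted forms of target path reduce to one rule, sketched below. The helper name resolve_skill_target is an assumption, and the sketch treats any path whose last component is exactly SKILL.md as a file path and everything else as a directory.

```python
from pathlib import PurePosixPath


def resolve_skill_target(answer: str) -> PurePosixPath:
    """Return the SKILL.md path implied by a file or directory answer."""
    path = PurePosixPath(answer)
    if path.name == "SKILL.md":
        return path  # already a full file path
    return path / "SKILL.md"  # treat anything else as a directory
```

For example, .agents/skills/ctxsift and .agents/skills/ctxsift/SKILL.md both resolve to the same file.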

Configure prints one line per install result, such as:

Installed CtxSift skill for Copilot (workspace) at /path/to/repo/.github/skills/ctxsift/SKILL.md
Already current CtxSift skill for Codex (global) at /home/user/.codex/skills/ctxsift/SKILL.md

CtxSift resolves each setting from four layers. The highest-precedence layer that defines a value wins:

Environment variable > Workspace config > Global config > Default

Global config applies to all workspaces on the machine. Stored at:

  • Linux/macOS: ~/.config/ctxsift/config.toml
  • Windows: %LOCALAPPDATA%\ctxsift\ctxsift\config.toml

Workspace config overrides global for one repo. Stored at:

  • .git/ctxsift/config.toml inside Git repos
  • .ctxsift/config.toml for non-Git workspaces
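As a quick reference, the file locations listed above can be written as two small path helpers. This is a sketch only: the function names are hypothetical, the caller supplies the home directory, the LOCALAPPDATA value, and whether the workspace is a Git repo, and the sketch does not account for any override of the config directory that CtxSift may honor.

```python
from pathlib import PurePosixPath, PureWindowsPath


def global_config_path(system: str, home: str = "", local_app_data: str = ""):
    """Global config location for a platform name such as "Linux" or "Windows"."""
    if system == "Windows":
        # %LOCALAPPDATA%\ctxsift\ctxsift\config.toml
        return PureWindowsPath(local_app_data) / "ctxsift" / "ctxsift" / "config.toml"
    # ~/.config/ctxsift/config.toml on Linux and macOS
    return PurePosixPath(home) / ".config" / "ctxsift" / "config.toml"


def workspace_config_path(root: str, is_git_repo: bool) -> PurePosixPath:
    """Workspace config location inside a workspace root."""
    subdir = ".git/ctxsift" if is_git_repo else ".ctxsift"
    return PurePosixPath(root) / subdir / "config.toml"
```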

Environment variables override everything for the current shell or process. Useful in CI and per-run overrides.
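The precedence order can be sketched as a lookup that stops at the first layer defining a value. This is an illustration under stated assumptions, not CtxSift's resolver: resolve_setting is a hypothetical name, config layers are modeled as flat dictionaries keyed by dotted names, and an empty environment variable is treated as unset.

```python
import os
from typing import Any, Mapping, Optional, Tuple


def resolve_setting(
    key: str,
    env_var: str,
    workspace: Mapping[str, Any],
    global_: Mapping[str, Any],
    defaults: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[Any, str]:
    """Return (value, layer) from the highest-precedence layer that sets key."""
    environ = os.environ if environ is None else environ

    # 1. Environment variable (assumption: an empty string counts as unset).
    if environ.get(env_var):
        return environ[env_var], "env"
    # 2. Workspace config, then 3. global config.
    for name, layer in (("workspace", workspace), ("global", global_)):
        if key in layer:
            return layer[key], name
    # 4. Built-in default.
    return defaults[key], "default"
```

Note that a value coming from the environment arrives as a string, so a real resolver would still need to coerce it to the setting's type.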

To see what CtxSift is actually resolving (secrets redacted):

ctxsift config show # workspace-aware resolved config
ctxsift config show --global # global config file only

To write one key at a time:

ctxsift config set <key> <value> # writes to workspace config
ctxsift config set <key> <value> --global # writes to global config

CtxSift uses two separate model roles: one for compression and one for embeddings (recall). They are configured independently.

CPU mode uses embedded llama.cpp with a GGUF model. You supply a Hugging Face GGUF repo id and one concrete .gguf filename from that repo.

| Key | Default | Env var |
| --- | --- | --- |
| local.model | ibm-granite/granite-4.0-350m-GGUF | CTXSIFT_LOCAL_MODEL |
| local.gguf_filename | granite-4.0-350m-Q8_0.gguf | CTXSIFT_LOCAL_GGUF_FILENAME |
| local.llama_context_window | 8192 (built-in) | CTXSIFT_LOCAL_LLAMA_CONTEXT_WINDOW |
| local.device | auto | CTXSIFT_LOCAL_DEVICE |

ctxsift config set local.model unsloth/Qwen3.5-0.8B-GGUF --global
ctxsift config set local.gguf_filename Qwen3.5-0.8B-Q8_0.gguf --global

See Local models for benchmarked CPU picks.

CUDA mode uses Transformers. Supply a standard Hugging Face text-generation model id. gguf_filename is ignored on this path.

| Key | Default | Env var |
| --- | --- | --- |
| local.model | ibm-granite/granite-4.0-350m-GGUF | CTXSIFT_LOCAL_MODEL |
| local.device | auto | CTXSIFT_LOCAL_DEVICE |
| local.dtype | auto | CTXSIFT_LOCAL_DTYPE |
| local.attn_implementation | auto | CTXSIFT_LOCAL_ATTN_IMPLEMENTATION |
| local.quantization | none | CTXSIFT_LOCAL_QUANTIZATION |
| local.model_cache_path | (empty) | CTXSIFT_MODEL_CACHE_PATH |

ctxsift config set local.model LiquidAI/LFM2.5-1.2B-Instruct --global
ctxsift config set local.device cuda --global

local.dtype — Accepted values: auto, float32, float16, bfloat16. Leave at auto unless you have a specific compatibility reason.

local.attn_implementation — Accepted values: auto, sdpa, flash_attention_2.

  • auto: CtxSift picks the safest supported backend automatically
  • sdpa: most conservative, broadly compatible
  • flash_attention_2: better throughput on supported CUDA GPUs; requires the flash-attn package. Not available on Windows in most setups.

local.quantization — Only applies to CUDA/Transformers. CPU llama.cpp models ignore this — quantization is baked into the .gguf file itself.

  • none: load at full precision (default)
  • bnb-8bit: 8-bit BitsAndBytes; good first step when VRAM is tight
  • bnb-4bit-fp4 / bnb-4bit-nf4: more aggressive; lower memory, more quality risk

Requires ctxsift[gpu,quant] for bnb modes.

local.model_cache_path — When set, CtxSift saves quantized checkpoints here to speed up cold starts. Leave empty to use the default Hugging Face cache.

Set remote.base_url to switch from local to a hosted provider via LiteLLM.

| Key | Default | Env var |
| --- | --- | --- |
| remote.base_url | (empty; local mode) | CTXSIFT_LLM_BASE_URL |
| remote.model_name | (empty) | CTXSIFT_LLM_MODEL |
| remote.api_key | (empty) | CTXSIFT_LLM_API_KEY |
| remote.api_version | (empty) | CTXSIFT_LLM_API_VERSION |
| remote.reasoning_mode | auto | CTXSIFT_LLM_REASONING_MODE |

ctxsift config set remote.base_url https://api.openai.com/v1 --global
ctxsift config set remote.model_name gpt-4o-mini --global
ctxsift config set remote.api_key YOUR_KEY --global

remote.reasoning_mode — Accepted values: auto, true, false. This does not control reasoning effort — it tells CtxSift whether the model supports reasoning tokens so it can adjust its prompt structure. Leave at auto for most providers.

Remote mode replaces local compression but not local embeddings. Recall still runs through the embedding model regardless of which compression path is active.


These general settings apply to all compression, regardless of local or remote mode.

| Key | Default | Env var |
| --- | --- | --- |
| max_output_tokens | 512 | CTXSIFT_MAX_OUTPUT_TOKENS |
| timeout_ms | 90000 | CTXSIFT_TIMEOUT_MS |
| retries | 1 | CTXSIFT_RETRIES |
| recovery_enabled | true | CTXSIFT_RECOVERY_ENABLED |

ctxsift config set max_output_tokens 768
ctxsift config set timeout_ms 120000
ctxsift config set recovery_enabled false

Recall uses a local Sentence Transformers-compatible embedding model. This runs independently of the compression path — even when compression is remote, embeddings stay local.

| Key | Default | Env var |
| --- | --- | --- |
| embedding.model | microsoft/harrier-oss-v1-0.6b | CTXSIFT_EMBEDDING_MODEL |
| embedding.backend | auto | CTXSIFT_EMBEDDING_BACKEND |
| embedding.device | auto | CTXSIFT_EMBEDDING_DEVICE |
| embedding.dtype | auto | CTXSIFT_EMBEDDING_DTYPE |
| embedding.attn_implementation | auto | CTXSIFT_EMBEDDING_ATTN_IMPLEMENTATION |
| embedding.max_length | 32768 | CTXSIFT_EMBEDDING_MAX_LENGTH |

embedding.backend: auto prefers ONNX Runtime for the default Harrier model on CPU when available, and otherwise falls back to Torch.

embedding.model — Must be a Sentence Transformers-compatible model. Switching models changes the embedding dimension, which can require reinitializing the vector store for a workspace.

Advanced embedding prompt overrides (leave empty unless you know what they do):

| Key | Env var |
| --- | --- |
| embedding.query_prompt_name | CTXSIFT_EMBEDDING_QUERY_PROMPT_NAME |
| embedding.query_prompt | CTXSIFT_EMBEDDING_QUERY_PROMPT |
| embedding.document_prompt_name | CTXSIFT_EMBEDDING_DOCUMENT_PROMPT_NAME |

These recall settings control how many candidates are fetched and scored during a recall search.

| Key | Default | Env var | Notes |
| --- | --- | --- | --- |
| recall.default_limit | 10 | CTXSIFT_RECALL_DEFAULT_LIMIT | Max results shown when --limit is not passed |
| recall.lexical_candidate_limit | 50 | CTXSIFT_RECALL_LEXICAL_CANDIDATE_LIMIT | BM25 candidates before hybrid fusion |
| recall.vector_candidate_limit | 50 | CTXSIFT_RECALL_VECTOR_CANDIDATE_LIMIT | Vector candidates before hybrid fusion |
| recall.max_vector_distance | 0.75 | CTXSIFT_RECALL_MAX_VECTOR_DISTANCE | Smaller = stricter semantic filtering |

Higher candidate limits improve recall quality but cost more SQLite and vector search work. The defaults are tuned for typical session sizes.
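To show how the four recall settings interact, here is a minimal sketch of a hybrid search step. The fusion method is not specified above, so the sketch uses reciprocal rank fusion purely as a stand-in; the function name hybrid_recall, the rrf_k constant, and the inclusive distance comparison are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def hybrid_recall(
    lexical: Sequence[str],
    vector: Sequence[Tuple[str, float]],
    *,
    limit: int = 10,
    lexical_candidate_limit: int = 50,
    vector_candidate_limit: int = 50,
    max_vector_distance: float = 0.75,
    rrf_k: int = 60,  # conventional RRF constant, hypothetical here
) -> List[str]:
    """Fuse a BM25 ranking and a vector ranking into one result list.

    lexical: record ids, best match first.
    vector: (record id, distance) pairs, nearest first.
    """
    # Cap each candidate list, then drop vector hits that are too far away.
    lexical_ids = list(lexical[:lexical_candidate_limit])
    vector_ids = [
        record_id
        for record_id, distance in vector[:vector_candidate_limit]
        if distance <= max_vector_distance
    ]

    # Reciprocal rank fusion: each list contributes 1 / (k + rank).
    scores: Dict[str, float] = {}
    for ranked in (lexical_ids, vector_ids):
        for rank, record_id in enumerate(ranked, start=1):
            scores[record_id] = scores.get(record_id, 0.0) + 1.0 / (rrf_k + rank)

    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))[:limit]
```

In this sketch, raising the candidate limits widens the pool that fusion sees, while lowering max_vector_distance removes weak semantic matches before they can contribute to the score.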


CtxSift serves local models through background daemons — one per effective runtime signature. They auto-start on first use, stay warm across workspaces with matching config, and shut down after idle.

| Key | Default | Env var |
| --- | --- | --- |
| daemon.enabled | true | CTXSIFT_DAEMON_ENABLED |
| daemon.idle_timeout_seconds | 600 | CTXSIFT_DAEMON_IDLE_TIMEOUT_SECONDS |
| daemon.startup_timeout_ms | 15000 | CTXSIFT_DAEMON_STARTUP_TIMEOUT_MS |
| daemon.embedding_batch_window_ms | 20 | CTXSIFT_DAEMON_EMBEDDING_BATCH_WINDOW_MS |
| daemon.embedding_max_batch_size | 16 | CTXSIFT_DAEMON_EMBEDDING_MAX_BATCH_SIZE |

When daemon.enabled is false, CtxSift falls back to in-process model loading — slower cold starts and no cross-workspace model reuse.
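One way to picture "one daemon per effective runtime signature" is a stable hash over the settings that affect model loading, so that workspaces with matching config map to the same daemon. The sketch below is hypothetical: the function name runtime_signature, the choice of SHA-256, and which keys feed the signature are assumptions, not CtxSift's actual scheme.

```python
import hashlib
import json
from typing import Any, Mapping


def runtime_signature(settings: Mapping[str, Any]) -> str:
    """Return a short, order-independent fingerprint of runtime settings."""
    # Sorting keys makes the fingerprint independent of dictionary order.
    canonical = json.dumps(dict(settings), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Two workspaces that resolve to the same model and device would produce the same signature here and could share a warm daemon, while changing any included setting yields a different signature.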

ctxsift daemon status
ctxsift daemon stop --all

The retention setting controls how long compressed records are kept before background cleanup removes them.

| Key | Default | Env var |
| --- | --- | --- |
| retention.max_age_days | 30 | CTXSIFT_RETENTION_MAX_AGE_DAYS |

All env vars at a glance:

# General
CTXSIFT_MAX_OUTPUT_TOKENS=512
CTXSIFT_TIMEOUT_MS=90000
CTXSIFT_RETRIES=1
CTXSIFT_RECOVERY_ENABLED=true
# Remote
CTXSIFT_LLM_BASE_URL=
CTXSIFT_LLM_MODEL=
CTXSIFT_LLM_API_KEY=
CTXSIFT_LLM_API_VERSION=
CTXSIFT_LLM_REASONING_MODE=auto
# Local compression
CTXSIFT_LOCAL_MODEL=ibm-granite/granite-4.0-350m-GGUF
CTXSIFT_LOCAL_GGUF_FILENAME=granite-4.0-350m-Q8_0.gguf
CTXSIFT_LOCAL_LLAMA_CONTEXT_WINDOW=8192
CTXSIFT_LOCAL_DEVICE=auto
CTXSIFT_LOCAL_DTYPE=auto
CTXSIFT_LOCAL_ATTN_IMPLEMENTATION=auto
CTXSIFT_LOCAL_QUANTIZATION=none
CTXSIFT_MODEL_CACHE_PATH=
# Embeddings
CTXSIFT_EMBEDDING_MODEL=microsoft/harrier-oss-v1-0.6b
CTXSIFT_EMBEDDING_BACKEND=auto
CTXSIFT_EMBEDDING_DEVICE=auto
CTXSIFT_EMBEDDING_DTYPE=auto
CTXSIFT_EMBEDDING_ATTN_IMPLEMENTATION=auto
CTXSIFT_EMBEDDING_MAX_LENGTH=32768
# Recall
CTXSIFT_RECALL_DEFAULT_LIMIT=10
CTXSIFT_RECALL_LEXICAL_CANDIDATE_LIMIT=50
CTXSIFT_RECALL_VECTOR_CANDIDATE_LIMIT=50
CTXSIFT_RECALL_MAX_VECTOR_DISTANCE=0.75
# Daemon
CTXSIFT_DAEMON_ENABLED=true
CTXSIFT_DAEMON_IDLE_TIMEOUT_SECONDS=600
CTXSIFT_DAEMON_STARTUP_TIMEOUT_MS=15000
CTXSIFT_DAEMON_EMBEDDING_BATCH_WINDOW_MS=20
CTXSIFT_DAEMON_EMBEDDING_MAX_BATCH_SIZE=16
# Retention
CTXSIFT_RETENTION_MAX_AGE_DAYS=30

See .env.example in the repo root for the full annotated version.