Distillation

Distillation compresses verbose context into token-efficient versions while preserving essential information. This helps you stay within context limits and reduce costs.

Context Size Research

Understanding how LLMs process context helps you optimize your setup.

Key Findings

Recent research reveals important patterns in how LLMs handle context:

Continuous Degradation: Performance degrades as input grows, not at a specific threshold. The Context Rot study (Chroma, 2025) found accuracy is highest for early tokens and declines continuously.
Lost in the Middle: The Lost in the Middle paper (Liu et al., 2023) found LLMs process information at the start and end of context more reliably than the middle, producing a U-shaped performance curve with >30% degradation for middle-positioned content.
Effective vs Advertised: The Maximum Effective Context Window research found most models show severe degradation by ~1,000 tokens, falling short of their advertised windows by as much as 99%.

ctxloom’s 16KB Warning

ctxloom warns when assembled context exceeds 16KB (~4,000 tokens, at roughly 4 bytes per token):

ctxloom: warning: assembled context is 24KB (recommended max: 16KB)
ctxloom: warning: large context may reduce LLM effectiveness; consider distillation or fewer fragments

This threshold is conservative; actual degradation varies by model and task. The warning encourages you to:

Use distillation to compress verbose content
Prioritize most relevant fragments
Structure context with key information at start/end
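
A size check of this kind can be approximated in a few lines. The sketch below is not ctxloom's implementation: `estimate_tokens`, `size_warning`, and the 4-bytes-per-token ratio are assumptions chosen to match the 16KB (~4,000 tokens) figure above.

```python
from typing import Optional

# Hypothetical sketch of a context size check; not ctxloom's actual code.
WARN_BYTES = 16 * 1024   # the 16KB threshold described above
BYTES_PER_TOKEN = 4      # assumption: rough average for English text


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the UTF-8 byte length."""
    return len(text.encode("utf-8")) // BYTES_PER_TOKEN


def size_warning(context: str) -> Optional[str]:
    """Return a warning message if the context exceeds the threshold, else None."""
    size = len(context.encode("utf-8"))
    if size <= WARN_BYTES:
        return None
    return (
        f"assembled context is {size // 1024}KB "
        f"(recommended max: {WARN_BYTES // 1024}KB)"
    )
```

Measuring bytes rather than running a real tokenizer keeps the check fast and model-independent, at the cost of precision for code or non-English text.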

Optimization Strategies

Strategy	Description
Distill verbose content	Compress 5,000 tokens → 800 tokens
Front-load key info	Put critical instructions at the start
Summarize at end	Reiterate key points at context end
Use tags selectively	Include only relevant fragments
Profile per task	Different tasks need different context
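
The front-load and summarize-at-end strategies can be combined when you assemble context by hand. The helper below is a hypothetical illustration, not a ctxloom feature; `assemble_context` and the "Key instructions" / "Reminder" labels are made up for the example.

```python
# Hypothetical helper: place key instructions at both ends of the context,
# where the research above suggests models attend most reliably.
def assemble_context(key_points: list[str], fragments: list[str]) -> str:
    """Put key points first, supporting fragments in the middle, a recap last."""
    bullets = "\n".join(f"- {point}" for point in key_points)
    header = "Key instructions:\n" + bullets
    recap = "Reminder:\n" + bullets
    return "\n\n".join([header, *fragments, recap])
```

Supporting fragments land in the middle, so the content most likely to be overlooked is also the content that matters least.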

Why Distill?

The Problem

AI context windows have limits, and verbose documentation can quickly consume your budget:

A comprehensive coding standards document might be 5,000 tokens
You might want 10+ such documents in your context
That’s 50,000+ tokens just for standards, leaving little room for code

The Solution

Distillation uses AI to compress content while preserving meaning:

A verbose document can often be compressed by 70-90% (e.g., 5,000 → 500-1,500 tokens)
Essential rules and patterns preserved, verbose explanations removed
More room for actual code and conversation

Actual compression varies by content type: structured guidelines compress well, code examples less so.
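
To make the savings concrete, the arithmetic for the ten-document scenario above works out as follows. The figures are the ones quoted in this section; `budget` is just a helper name for the sketch.

```python
# Worked arithmetic for the scenario above: ten 5,000-token documents.
DOC_TOKENS = 5_000
DOC_COUNT = 10


def budget(compression: float) -> int:
    """Total tokens for all documents after removing `compression` of each."""
    return round(DOC_COUNT * DOC_TOKENS * (1 - compression))


original = budget(0.0)     # no distillation
best_case = budget(0.90)   # 90% compression
worst_case = budget(0.70)  # 70% compression
```

Under these assumptions the standards shrink from 50,000 tokens to somewhere between 5,000 and 15,000, depending on how well each document compresses.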

How It Works

ctxloom uses a hybrid compression approach:

AST-Based Compression (Code & JSON)

For structured content, ctxloom uses tree-sitter AST parsing for fast, deterministic compression:

Content Type	Strategy
Go, Python, JS, TS, Rust	Preserve signatures, elide function bodies
JSON	Preserve structure, truncate low-entropy values

This approach is:

Fast: No API calls, instant compression
Deterministic: Same input always produces same output
Structure-preserving: Maintains navigational breadcrumbs
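
The "preserve signatures, elide function bodies" idea can be illustrated for Python source alone. ctxloom is described as using tree-sitter; the sketch below swaps in Python's standard `ast` module instead, so it is an analogy rather than ctxloom's code, and `elide_bodies` is a hypothetical name.

```python
import ast


def elide_bodies(source: str) -> str:
    """Keep function signatures (and docstrings); replace bodies with `...`.

    Requires Python 3.9+ for ast.unparse.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            placeholder = ast.Expr(value=ast.Constant(value=...))
            if ast.get_docstring(node) is not None:
                # The docstring is the first statement; keep it as a breadcrumb.
                node.body = [node.body[0], placeholder]
            else:
                node.body = [placeholder]
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Because the transformation is a pure function of the parsed tree, the same input yields the same output and no API call is involved, which matches the properties listed above.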

LLM-Based Compression (Prose)

For prose and documentation, ctxloom falls back to LLM compression:

Original content is analyzed by an AI model
Key information is extracted and condensed
Distilled version is stored alongside the original
Content hash tracks when re-distillation is needed

Compression Router

When you distill content, ctxloom automatically routes to the best strategy:

Code file (.go, .py, .js, etc.) → AST compression
JSON file → JSON structure compression
Markdown/prose → LLM compression
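
A router like this can be pictured as a lookup on the file extension. The sketch below is an assumption about the general shape, not ctxloom's code: the `route` function, the strategy names, and the exact extension set (derived from the languages listed above) are hypothetical.

```python
from pathlib import Path

# Assumed extension set, based on the languages listed in this section.
AST_EXTENSIONS = {".go", ".py", ".js", ".ts", ".rs"}


def route(filename: str) -> str:
    """Pick a compression strategy name from the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in AST_EXTENSIONS:
        return "ast"
    if ext == ".json":
        return "json"
    return "llm"  # Markdown and any other prose
```

Defaulting to the LLM path for unknown extensions mirrors the "falls back to LLM compression" wording above.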

Distilling Fragments

Single Fragment

# Distill a specific fragment
ctxloom fragment distill my-bundle#fragments/coding-standards

# Force re-distillation even if hash matches
ctxloom fragment distill --force my-bundle#fragments/coding-standards

Multiple Fragments

Distill fragments one at a time:

# Distill each fragment that needs it
ctxloom fragment distill my-bundle#fragments/coding-standards
ctxloom fragment distill my-bundle#fragments/testing-patterns
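
If you have many fragments, a small wrapper script can issue the same command for each one. The sketch below only builds and runs the `ctxloom fragment distill` invocations shown above; `distill_command` and `distill_all` are hypothetical helper names, and it assumes `ctxloom` is on your PATH.

```python
import subprocess


def distill_command(fragment_ref: str, force: bool = False) -> list[str]:
    """Build the argv for `ctxloom fragment distill`, as shown above."""
    cmd = ["ctxloom", "fragment", "distill"]
    if force:
        cmd.append("--force")
    cmd.append(fragment_ref)
    return cmd


def distill_all(fragment_refs: list[str], run=subprocess.run) -> None:
    """Distill each fragment in turn, stopping at the first failure."""
    for ref in fragment_refs:
        run(distill_command(ref), check=True)
```

The `run` parameter exists so the loop can be exercised without invoking the real CLI.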

Checking Distillation Status

# Show fragment with distillation info
ctxloom fragment show my-bundle#fragments/coding-standards

# Show distilled version
ctxloom fragment show --distilled my-bundle#fragments/coding-standards

Using Distilled Content

Automatic Selection

By default, ctxloom uses distilled content when available:

# Uses distilled versions automatically
ctxloom run -f my-bundle#fragments/coding-standards

Prefer Original

To use original content instead:

# In config.yaml
defaults:
  use_distilled: false

Or per-run:

ctxloom run --no-distilled -f my-bundle#fragments/coding-standards
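
The selection rule described in this section amounts to a simple fallback. The sketch below is a minimal model of that rule, assuming a fragment is a mapping with the `content` and `distilled` fields used in bundle YAML; `select_content` is a hypothetical name, and how ctxloom treats stale distillations is not modeled here.

```python
def select_content(fragment: dict, use_distilled: bool = True) -> str:
    """Return the distilled text when allowed and present, else the original."""
    if use_distilled and fragment.get("distilled"):
        return fragment["distilled"]
    return fragment["content"]
```

Passing `use_distilled=False` corresponds to the `use_distilled: false` default or the `--no-distilled` flag.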

Bundle Configuration

In Bundle YAML

version: "1.0"
fragments:
  verbose-standards:
    content: |
      # Comprehensive Coding Standards

      [5000 tokens of detailed documentation...]

    # After distillation, these fields are added:
    distilled: |
      # Coding Standards (Distilled)

      [800 tokens of condensed key points...]

    content_hash: "sha256:abc123..."
    distilled_by: "claude-3-opus"

  keep-original:
    no_distill: true  # Prevent distillation
    content: |
      # Critical Exact Wording

      This content must be preserved exactly as written.

Distillation Fields

Field	Description
`content`	Original, full content
`distilled`	AI-compressed version
`content_hash`	SHA256 hash of content (for change detection)
`distilled_by`	Model that created the distillation
`no_distill`	If true, never distill this fragment

When to Distill

Good Candidates

Long reference documents - Style guides, standards, best practices
Comprehensive tutorials - Can be condensed to key points
API documentation - Essential patterns and gotchas
Historical context - Background info that’s useful but verbose

Poor Candidates

Code examples - Exact syntax matters
Legal/compliance text - Exact wording required
Configuration templates - Need precise formatting
Short fragments - Already concise, no benefit

Using no_distill

fragments:
  legal-disclaimer:
    no_distill: true  # Must preserve exact wording
    content: |
      IMPORTANT: This software is provided "as is"...

  code-template:
    no_distill: true  # Exact code matters
    content: |
      ```go
      func main() {
          // Exact template structure
      }
      ```

Distillation Quality

Compression Strategy

ctxloom’s distillation uses an extractive approach designed to preserve actionable information while removing redundancy. The algorithm:

Preserves (never removes):

Code syntax and exact patterns
Function/file/variable names (breadcrumbs for navigation)
Error handling rules and edge cases
Actionable instructions (“DO X”, “NEVER do Y”)
Technical constraints and requirements

Compresses aggressively:

Verbose explanations of “why”
Redundant examples (keeps 1 best example per concept)
Motivational/philosophical content
Historical context unless directly actionable

Target: 30-50% of the original size while maintaining the same structure.
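
When reviewing distilled output, the size target can be checked mechanically. The sketch below measures size in UTF-8 bytes, which is an assumption (the source does not say whether the target is in bytes or tokens); `size_ratio` and `within_target` are hypothetical names.

```python
def size_ratio(original: str, distilled: str) -> float:
    """Distilled size as a fraction of the original, in UTF-8 bytes."""
    original_size = len(original.encode("utf-8"))
    if original_size == 0:
        raise ValueError("original content is empty")
    return len(distilled.encode("utf-8")) / original_size


def within_target(original: str, distilled: str,
                  low: float = 0.30, high: float = 0.50) -> bool:
    """True if the distilled text is 30-50% of the original size."""
    return low <= size_ratio(original, distilled) <= high
```

A ratio far above the range suggests little was removed; one far below it is a cue to check that essential rules survived.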

What Makes Good Distillation

Preserves key concepts and essential rules
Maintains actionable guidance
Keeps critical examples (one per concept)
Removes redundancy and verbose explanations
Uses bullet points and abbreviations where clear

Example

Original (verbose):

# Error Handling in Go

Error handling is one of the most important aspects of writing reliable
Go programs. Unlike many other languages that use exceptions, Go takes
a different approach by treating errors as values that are returned
from functions. This design decision was intentional and reflects the
Go philosophy of being explicit about error conditions.

When a function can fail, it typically returns an error as its last
return value. The caller is then responsible for checking this error
and handling it appropriately. This might seem verbose at first, but
it makes the error handling path explicit and visible in the code...

[continues for 2000 more words]

Distilled:

# Go Error Handling

- Errors are values, not exceptions
- Return error as last value: `func Foo() (Result, error)`
- Always check: `if err != nil { return err }`
- Wrap with context: `fmt.Errorf("operation failed: %w", err)`
- Use sentinel errors sparingly: `var ErrNotFound = errors.New("not found")`
- Handle at appropriate level, don't over-wrap

Re-distillation

Automatic Detection

ctxloom tracks content hashes. When content changes, distillation is flagged as stale:

# Check if distillation is current
ctxloom fragment show my-bundle#fragments/standards
# Shows: "Distillation: stale (content changed)"
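
Hash-based staleness detection of this kind can be sketched as follows. The `sha256:` prefix follows the `content_hash` format shown in the bundle YAML; what exactly ctxloom hashes (encoding, whitespace normalization) is an assumption, and `content_hash` / `is_stale` are hypothetical helper names.

```python
import hashlib


def content_hash(content: str) -> str:
    """SHA-256 of the content, in the `sha256:<hex>` form used in bundle YAML."""
    return "sha256:" + hashlib.sha256(content.encode("utf-8")).hexdigest()


def is_stale(fragment: dict) -> bool:
    """A distillation is stale when the stored hash no longer matches the content."""
    if "distilled" not in fragment:
        return False  # nothing has been distilled, so nothing can be stale
    return fragment.get("content_hash") != content_hash(fragment["content"])
```

Comparing hashes instead of timestamps means a fragment that is edited and then reverted is not flagged.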

Triggering Re-distillation

# Re-distill a specific fragment
ctxloom fragment distill my-bundle#fragments/standards

# Force re-distill even if unchanged
ctxloom fragment distill --force my-bundle#fragments/standards

Cost Considerations

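LLM-based distillation uses AI API calls, which have costs:

Each prose fragment requires one API call to distill; AST-based compression of code and JSON makes no API calls
Longer content = more tokens = higher cost
Re-distillation only happens when content changes, unless you pass --force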
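
For a rough sense of scale before distilling a large fragment, the input-side cost can be estimated from its size. Everything in the sketch below is an assumption: the 4-bytes-per-token ratio, the `estimate_cost` helper, and the per-million-token price, which you would replace with your provider's actual rate. Output tokens are ignored.

```python
BYTES_PER_TOKEN = 4  # assumption: rough average for English prose


def estimate_cost(content: str, usd_per_million_input_tokens: float) -> float:
    """Rough input-side cost of one distillation call, in USD."""
    tokens = len(content.encode("utf-8")) / BYTES_PER_TOKEN
    return tokens / 1_000_000 * usd_per_million_input_tokens
```

For example, a 20,000-byte fragment is about 5,000 tokens, so at a placeholder rate of 10 USD per million input tokens the call would cost roughly 0.05 USD.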

Minimizing Costs

Distill selectively - Only distill fragments that benefit
Batch distillation - Distill all at once, not repeatedly
Use content hashes - Don’t re-distill unchanged content
Review before distilling - Ensure content is stable

Best Practices

Distill after finalizing - Don’t distill work-in-progress
Review distilled output - Ensure key info is preserved
Keep originals - Distilled versions can be regenerated
Document no_distill usage - Explain why certain content shouldn’t be distilled
Version control both - Commit both original and distilled versions