The Agent class includes built-in context management that automatically handles context-window overflow during long conversations. When enabled, the middleware monitors token usage and applies reduction strategies before the context exceeds the model's limit.
## How It Works
Context management applies three strategies in order when the context window is exceeded:

1. Prune old tool calls: removes old tool call/return pairs, keeping only the most recent ones.
2. LLM summarization: summarizes older messages into condensed, structured messages via the LLM, while keeping recent messages verbatim.
3. Context-full response: if the context is still full after the first two strategies, returns a fixed message indicating that the context limit has been reached.
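The three strategies above can be sketched as a reduction pipeline. This is a minimal illustration: every function name and the message shape are assumptions, not the framework's actual API.

```python
# Illustrative sketch of the strategy pipeline; every name here is a
# hypothetical stand-in, not the framework's actual API.

def estimate_tokens(messages):
    # Character-based heuristic: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def prune_tool_calls(messages, keep_recent):
    # Strategy 1: drop tool call/return messages except the most recent ones.
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    doomed = {id(m) for m in tool_msgs[:-keep_recent]}
    return [m for m in messages if id(m) not in doomed]

def summarize_older(messages, keep_recent):
    # Strategy 2: collapse all but the last keep_recent messages into one
    # condensed summary message (the actual LLM call is elided here).
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system",
               "content": f"Summary of {len(older)} earlier messages."}
    return [summary] + recent

def reduce_context(messages, max_tokens, keep_recent=5):
    """Apply the strategies in order, stopping as soon as the context fits."""
    for strategy in (prune_tool_calls, summarize_older):
        if estimate_tokens(messages) <= max_tokens:
            return messages
        messages = strategy(messages, keep_recent)
    if estimate_tokens(messages) <= max_tokens:
        return messages
    # Strategy 3: still full, so answer with a fixed context-full response.
    return [{"role": "assistant", "content": "Context limit reached."}]
```

Each strategy only runs if the context is still over budget after the previous one, so cheap pruning is tried before the more expensive LLM summarization.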
## Usage
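No usage snippet appears in this section; the following is a minimal sketch in which the import path, the model argument, and the overall Agent constructor shape are assumptions — only the two context-management parameters come from this doc.

```python
# Hypothetical sketch: the import path and constructor shape are
# assumptions; only the two context-management parameters below are
# documented for this middleware.
from some_framework import Agent

agent = Agent(
    model="...",
    context_management=True,            # enabled by default
    context_management_keep_recent=5,   # default value, shown explicitly
)
```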
## Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `context_management` | `bool` | `True` | Enable or disable automatic context management. |
| `context_management_keep_recent` | `int` | `5` | Number of recent messages (and tool call events) to preserve during pruning and summarization. |
Context management triggers reduction at 90% of the model's maximum context window, leaving a safety margin before the hard limit. Token estimation relies on actual usage data from model responses when available, falling back to a character-based heuristic otherwise.
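The budget rule above can be sketched as follows; the function names and the 4-characters-per-token ratio are illustrative assumptions, not the framework's documented behavior.

```python
# Hypothetical helpers illustrating the 90% margin and the estimation
# fallback described above; names and the chars-per-token ratio are assumed.

def effective_limit(model_max_tokens, safety_margin=0.90):
    # Reduction triggers at 90% of the window, not at the hard limit.
    return int(model_max_tokens * safety_margin)

def estimate_tokens(messages, reported_usage=None):
    # Prefer exact usage reported by the most recent model response.
    if reported_usage is not None:
        return reported_usage
    # Fallback heuristic: assume roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4
```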
