The Agent class includes built-in context management that automatically handles context window overflow during long conversations. When enabled, the middleware monitors token usage and applies reduction strategies before the context exceeds the model’s limit.

How It Works

Context management applies three strategies in order when the context window is exceeded:
  1. Prune old tool calls — Removes old tool call/return pairs, keeping only the most recent ones.
  2. LLM summarization — Summarizes older messages into condensed, structured messages via the LLM while keeping recent messages verbatim.
  3. Context full response — If the context still exceeds the limit after pruning and summarization, the agent returns a fixed message indicating the context limit has been reached.
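The cascade above can be sketched as follows. This is an illustrative sketch of the strategy order, not Upsonic's actual internals; all function names and the 4-characters-per-token estimate are assumptions made for the example.

```python
CONTEXT_FULL_RESPONSE = "Context limit reached."

def estimate_tokens(messages):
    # Character-based heuristic: roughly 4 characters per token (assumed ratio).
    return sum(len(m["content"]) for m in messages) // 4

def prune_old_tool_calls(messages, keep_recent):
    # Strategy 1: drop tool call/return messages except the most recent ones.
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    keep = {id(m) for m in tool_msgs[-keep_recent:]}
    return [m for m in messages if m["role"] != "tool" or id(m) in keep]

def summarize_older(messages, keep_recent):
    # Strategy 2: condense everything but the last `keep_recent` messages
    # into one summary message (a real implementation would call the LLM).
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    if not older:
        return messages
    summary = {"role": "system",
               "content": "Summary of %d earlier messages." % len(older)}
    return [summary] + recent

def manage_context(messages, limit, keep_recent=5):
    # Apply each strategy only if the context still exceeds the limit.
    if estimate_tokens(messages) > limit:
        messages = prune_old_tool_calls(messages, keep_recent)
    if estimate_tokens(messages) > limit:
        messages = summarize_older(messages, keep_recent)
    if estimate_tokens(messages) > limit:
        # Strategy 3: give up and return the fixed context-full response.
        return CONTEXT_FULL_RESPONSE
    return messages
```

The key design point is that each strategy re-checks the token estimate before running, so cheap pruning is tried before the more expensive LLM summarization.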

Usage

from upsonic import Agent, Task

agent = Agent(
    model="openai/gpt-4o-mini",
    context_management=True,           # Enabled by default
    context_management_keep_recent=5,   # Number of recent messages to always preserve
    context_management_model="anthropic/claude-sonnet-4-5"   # Model used for context management
)

# Task with potentially long context
long_text = "..." * 100000

task = Task(f"Summarize this text: {long_text}")
result = agent.do(task)
print(result)

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `context_management` | `bool` | `True` | Enable or disable automatic context management. |
| `context_management_keep_recent` | `int` | `5` | Number of recent messages (and tool call events) to preserve during pruning and summarization. |
Context management uses a 90% safety margin of the model’s maximum context window. Token estimation relies on actual usage data from model responses when available, falling back to a character-based heuristic otherwise.
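The two budgeting rules in the note above can be illustrated like this. The helper names and the 4-characters-per-token fallback ratio are assumptions for the sketch; only the 90% margin and the "actual usage, else character heuristic" behavior come from the documentation.

```python
def effective_limit(max_context_tokens):
    # Apply the 90% safety margin to the model's maximum context window.
    return int(max_context_tokens * 0.9)

def estimate_tokens(text, usage_tokens=None):
    # Prefer actual token usage reported by the model response when available;
    # otherwise fall back to a character-based heuristic (~4 chars/token, assumed).
    if usage_tokens is not None:
        return usage_tokens
    return len(text) // 4
```

For a 128k-context model, the margin leaves an effective budget of 115,200 tokens before the reduction strategies kick in.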