The Agent class includes built-in context management that automatically handles context-window overflow during long conversations. When enabled, the middleware monitors token usage and applies reduction strategies before the context exceeds the model's limit.
## How It Works
Context management applies three strategies in order when the context window is exceeded:

1. Prune old tool calls: removes old tool call/return pairs, keeping only the most recent ones.
2. LLM summarization: summarizes older messages into condensed, structured messages via the LLM, while keeping recent messages verbatim.
3. Context-full response: if the context is still full after the first two strategies, returns a fixed message indicating that the context limit has been reached.
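The three strategies above can be sketched as a reduction pipeline. This is a minimal illustration: every function name and the message shape are assumptions, not the framework's actual API.

```python
# Illustrative sketch of the strategy pipeline; every name here is a
# hypothetical stand-in, not the framework's actual API.

def estimate_tokens(messages):
    # Character-based heuristic: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def prune_tool_calls(messages, keep_recent):
    # Strategy 1: drop tool call/return messages except the most recent ones.
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    doomed = {id(m) for m in tool_msgs[:-keep_recent]}
    return [m for m in messages if id(m) not in doomed]

def summarize_older(messages, keep_recent):
    # Strategy 2: collapse all but the last keep_recent messages into one
    # condensed summary message (the actual LLM call is elided here).
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system",
               "content": f"Summary of {len(older)} earlier messages."}
    return [summary] + recent

def reduce_context(messages, max_tokens, keep_recent=5):
    """Apply the strategies in order, stopping as soon as the context fits."""
    for strategy in (prune_tool_calls, summarize_older):
        if estimate_tokens(messages) <= max_tokens:
            return messages
        messages = strategy(messages, keep_recent)
    if estimate_tokens(messages) <= max_tokens:
        return messages
    # Strategy 3: still full, so answer with a fixed context-full response.
    return [{"role": "assistant", "content": "Context limit reached."}]
```

Each strategy only runs if the context is still over budget after the previous one, so cheap pruning is tried before the more expensive LLM summarization.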
## Usage
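No usage snippet appears in this section; the following is a minimal sketch in which the import path, the model argument, and the overall Agent constructor shape are assumptions — only the two context-management parameters come from this doc.

```python
# Hypothetical sketch: the import path and constructor shape are
# assumptions; only the two context-management parameters below are
# documented for this middleware.
from some_framework import Agent

agent = Agent(
    model="...",
    context_management=True,            # enabled by default
    context_management_keep_recent=5,   # default value, shown explicitly
)
```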
## Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `context_management` | `bool` | `True` | Enable or disable automatic context management. |
| `context_management_keep_recent` | `int` | `5` | Number of recent messages (and tool call events) to preserve during pruning and summarization. |
Context management triggers reduction at 90% of the model's maximum context window, leaving a safety margin before the hard limit. Token estimation relies on actual usage data from model responses when available, falling back to a character-based heuristic otherwise.
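The budget rule above can be sketched as follows; the function names and the 4-characters-per-token ratio are illustrative assumptions, not the framework's documented behavior.

```python
# Hypothetical helpers illustrating the 90% margin and the estimation
# fallback described above; names and the chars-per-token ratio are assumed.

def effective_limit(model_max_tokens, safety_margin=0.90):
    # Reduction triggers at 90% of the window, not at the hard limit.
    return int(model_max_tokens * safety_margin)

def estimate_tokens(messages, reported_usage=None):
    # Prefer exact usage reported by the most recent model response.
    if reported_usage is not None:
        return reported_usage
    # Fallback heuristic: assume roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4
```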
