Setup

Get your keys from the Langfuse dashboard → Settings → API Keys.
export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
Install the optional dependencies:
uv pip install "upsonic[langfuse]"
# pip install "upsonic[langfuse]"

Usage with Agent / Task

Every call to agent.do() or agent.print_do() (including the async versions) is automatically traced to Langfuse when you pass instrument=langfuse.

Minimal — Keys from Environment

from upsonic import Agent, Task
from upsonic.integrations.langfuse import Langfuse

langfuse = Langfuse()
agent = Agent("anthropic/claude-sonnet-4-6", instrument=langfuse)

task = Task(description="What is 2 + 2?")
agent.print_do(task)

langfuse.shutdown()
Screenshot: Langfuse trace overview with input, output, and other metrics
Screenshot: Langfuse trace detail view with spans and metadata

Full Configuration with Session & User Tracking

from upsonic import Agent, Task
from upsonic.integrations.langfuse import Langfuse
from upsonic.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Sunny in {city}, 22°C"

langfuse = Langfuse(
    public_key="pk-lf-abc123",
    secret_key="sk-lf-xyz789",
    region="us",                  # "eu" (default) or "us"
    service_name="my-agent",
    include_content=True,         # include prompts/responses (default)
)

agent = Agent(
    "anthropic/claude-sonnet-4-6",
    instrument=langfuse,
    session_id="chat-session-001",
    user_id="user@example.com",
    tools=[get_weather],
)

task = Task(description="What is the weather in Paris?")
agent.print_do(task)

langfuse.shutdown()
Screenshot: Langfuse trace with tool call spans
Screenshot: Langfuse session grouping view
Screenshot: Langfuse user ID tracking on traces

Autonomous Agent Example

Autonomous agents make their own decisions about which tools to call and in what order — there is no predefined workflow. This makes tracing especially important: you need full visibility into every decision the agent made, which tools it invoked, and what results it got back. Langfuse gives you that observability out of the box. Here is a real-world example — an expense tracker bot that reads receipt images autonomously:
"""
Receipt Tracker — Autonomous Agent Example

Drop receipt images (PNG, JPG) or PDFs into workspace/receipts/ and run this script.
The agent reads each receipt, extracts the expense details, logs them, and reports
your monthly spending summary.
"""

import os

from dotenv import load_dotenv
from upsonic import AutonomousAgent, Task
from upsonic.integrations.langfuse import Langfuse

from tools import ocr_extract_text

# Load API keys before creating Langfuse, which reads them from the environment
load_dotenv()

langfuse = Langfuse()

agent = AutonomousAgent(
    model="anthropic/claude-sonnet-4-6",
    tools=[ocr_extract_text],
    workspace=os.path.join(os.path.dirname(__file__), "workspace"),
    instrument=langfuse,
)

task = Task(
    "Save my receipts from the receipts/ folder and tell me my monthly expenses. "
    "DO NOT USE run_python to execute any code. "
    "COMMAND! USE BOTH read_file AND ocr_extract_text TOOLS!"
)

if __name__ == "__main__":
    agent.print_do(task)
    langfuse.shutdown()
If you want to see the full project with OCR tools, workspace setup, and sample receipts, check out the complete example: Expense Tracker Bot.
Screenshot: Langfuse trace overview showing the autonomous agent's full execution pipeline, input prompt, extracted receipts, and monthly expense summary
Screenshot: Langfuse LLM generation detail showing tool call status for file_info, list_files, read_file and other available tools
Screenshot: Langfuse trace timeline showing sequential chat and tool execution steps with latency and cost per step

Usage with Evaluation

What is AccuracyEvaluator?

AccuracyEvaluator measures how well your agent’s output matches an expected answer. It uses a separate judge agent to score the agent-under-test on a scale of 0–10. You can run multiple iterations for statistical confidence. When connected to Langfuse, each evaluation is automatically logged as a dataset item, linked to the agent’s trace, and scored — giving you a complete audit trail of your agent’s accuracy over time.
For a deep dive into accuracy evaluation concepts, configuration options, and best practices, see the AccuracyEval Introduction.

AccuracyEvaluator with Langfuse Datasets

When you pass langfuse to AccuracyEvaluator, evaluation results are automatically:
  1. Logged as a dataset item
  2. Linked to the agent’s trace via a dataset run item
  3. Scored on the trace with the evaluation result
import asyncio
from upsonic import Agent, Task
from upsonic.integrations.langfuse import Langfuse
from upsonic.eval import AccuracyEvaluator

langfuse = Langfuse()

agent = Agent("anthropic/claude-sonnet-4-6", instrument=langfuse)
judge = Agent("anthropic/claude-sonnet-4-6")

evaluator = AccuracyEvaluator(
    judge_agent=judge,
    agent_under_test=agent,
    query="What is the capital of France?",
    expected_output="Paris",
    langfuse=langfuse,
    langfuse_dataset_name="my-eval-dataset",  # optional, default: "accuracy-eval"
    langfuse_run_name="run-v1",               # optional, default: auto-generated
)

result = asyncio.run(evaluator.run())
print(f"Score: {result.average_score}/10")

langfuse.shutdown()
Screenshot: Langfuse dataset items and structure
Screenshot: Langfuse dataset run linked to traces
Screenshot: Langfuse dataset run results and scores

AccuracyEvaluator with Multiple Iterations

Run the same query multiple times to get statistical confidence:
import asyncio
from upsonic import Agent
from upsonic.integrations.langfuse import Langfuse
from upsonic.eval import AccuracyEvaluator

langfuse = Langfuse()
agent = Agent("anthropic/claude-sonnet-4-6", instrument=langfuse)
judge = Agent("anthropic/claude-sonnet-4-6")

evaluator = AccuracyEvaluator(
    judge_agent=judge,
    agent_under_test=agent,
    query="What is 10 + 5? Reply with just the number.",
    expected_output="15",
    num_iterations=3,
    langfuse=langfuse,
    langfuse_dataset_name="math-eval",
)

result = asyncio.run(evaluator.run())
print(f"Average score: {result.average_score}/10")
print(f"Iterations: {len(result.evaluation_scores)}")

langfuse.shutdown()
Screenshot: Langfuse trace showing accuracy_eval_score

AccuracyEvaluator Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| langfuse | Langfuse | None | Langfuse instance for dataset logging |
| langfuse_dataset_name | str | "accuracy-eval" | Name of the Langfuse dataset to create/use |
| langfuse_run_name | str | auto-generated | Name for the dataset run |
| num_iterations | int | 1 | Number of times to run the evaluation |

Advanced APIs

For direct access to Langfuse’s scoring, datasets, and annotation queue APIs, see the Advanced guides:
  • Scores — Add numeric, boolean, or categorical scores to traces
  • Score Configs — Define validation rules for scores
  • Datasets — Create datasets, add items, and link traces
  • Annotation Queues — Create review queues for human evaluation
  • Update Trace — Override trace output or metadata after a run

langfuse.shutdown()

Why call langfuse.shutdown()? Langfuse sends traces via a BatchSpanProcessor that buffers spans and exports them every 5 seconds by default. Calling langfuse.shutdown() forces all buffered spans to be flushed to Langfuse before the provider shuts down.

By default, flush_on_exit=True registers a Python atexit handler that calls shutdown() automatically when the process exits. Calling it explicitly at the end of your script is still recommended for short-lived scripts, because atexit handlers can be skipped in edge cases (e.g., SIGKILL or os._exit()).
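The buffering behavior can be sketched with a toy stand-in exporter. This is illustrative only: it mimics the flush semantics described above (spans accumulate in a buffer; shutdown() drains whatever has not yet been exported; flush_on_exit registers an atexit hook), not Langfuse's actual internals.

```python
import atexit


class BufferedExporter:
    """Toy stand-in for a BatchSpanProcessor-style exporter."""

    def __init__(self, flush_on_exit=True):
        self.buffer = []    # spans waiting to be exported
        self.exported = []  # spans that reached the backend
        if flush_on_exit:
            # mirrors flush_on_exit=True: flush automatically at process exit
            atexit.register(self.shutdown)

    def record(self, span):
        self.buffer.append(span)  # spans are buffered, not sent immediately

    def shutdown(self):
        # drain the buffer so nothing is lost when the process ends
        self.exported.extend(self.buffer)
        self.buffer.clear()


exporter = BufferedExporter()
exporter.record("agent.do span")
exporter.record("llm generation span")

# Without an explicit shutdown(), these spans would sit in the buffer
# until the periodic export or the atexit handler fires.
exporter.shutdown()
print(exporter.exported)
```

An explicit shutdown() at the end of the script (or in a finally block) makes the flush deterministic instead of depending on the atexit hook.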

Parameters Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| public_key | str | env LANGFUSE_PUBLIC_KEY | Langfuse public key |
| secret_key | str | env LANGFUSE_SECRET_KEY | Langfuse secret key |
| host | str | env LANGFUSE_HOST | Custom host URL |
| region | "eu" \| "us" | "eu" | Cloud region (ignored if host is set) |
| service_name | str | "upsonic" | Service name in traces |
| sample_rate | float | 1.0 | Fraction of traces to sample (0.0–1.0) |
| include_content | bool | True | Include prompts/responses in traces |
| flush_on_exit | bool | True | Auto-flush on process exit |
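As a sketch of how several of these parameters combine for a self-hosted deployment: the host URL below is hypothetical, and the parameter names are the ones documented in the table above.

```python
from upsonic.integrations.langfuse import Langfuse

langfuse = Langfuse(
    # host overrides region, so "eu"/"us" is ignored here
    host="https://langfuse.internal.example.com",  # hypothetical self-hosted URL
    service_name="billing-agent",
    sample_rate=0.25,          # keep 25% of traces to cut ingestion volume
    include_content=False,     # drop prompts/responses from spans (e.g., for PII)
)
```

Lowering sample_rate trades trace completeness for cost, and include_content=False keeps span metadata (latency, tool calls, token counts) while omitting prompt and response text.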

Environment Variables

| Variable | Description |
|---|---|
| LANGFUSE_PUBLIC_KEY | Langfuse public key (pk-lf-...) |
| LANGFUSE_SECRET_KEY | Langfuse secret key (sk-lf-...) |
| LANGFUSE_HOST | Custom Langfuse host URL |