Setup

Get your keys from the Langfuse dashboard → Settings → API Keys.
export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
Install the optional dependencies:
uv pip install "upsonic[langfuse]"
# pip install "upsonic[langfuse]"

Usage with Agent / Task

Every call to agent.do() or agent.print_do() (including the async versions) is automatically traced to Langfuse when you pass instrument=langfuse.

Minimal — Keys from Environment

from upsonic import Agent, Task
from upsonic.integrations.langfuse import Langfuse

langfuse = Langfuse()
agent = Agent("anthropic/claude-sonnet-4-6", instrument=langfuse)

task = Task(description="What is 2 + 2?")
agent.print_do(task)

langfuse.shutdown()
Screenshot: Langfuse trace overview with input, output, and other metrics
Screenshot: Langfuse trace detail view with spans and metadata

Full Configuration with Session & User Tracking

from upsonic import Agent, Task
from upsonic.integrations.langfuse import Langfuse
from upsonic.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"Sunny in {city}, 22°C"

langfuse = Langfuse(
    public_key="pk-lf-abc123",
    secret_key="sk-lf-xyz789",
    region="us",                  # "eu" (default) or "us"
    service_name="my-agent",
    include_content=True,         # include prompts/responses (default)
)

agent = Agent(
    "anthropic/claude-sonnet-4-6",
    instrument=langfuse,
    session_id="chat-session-001",
    user_id="user@example.com",
    tools=[get_weather],
)

task = Task(description="What is the weather in Paris?")
agent.print_do(task)

langfuse.shutdown()
Screenshot: Langfuse trace with tool call spans
Screenshot: Langfuse session grouping view
Screenshot: Langfuse user ID tracking on traces

Autonomous Agent Example

Autonomous agents make their own decisions about which tools to call and in what order — there is no predefined workflow. This makes tracing especially important: you need full visibility into every decision the agent made, which tools it invoked, and what results it got back. Langfuse gives you that observability out of the box. Here is a real-world example — an expense tracker bot that reads receipt images autonomously:
"""
Receipt Tracker — Autonomous Agent Example

Drop receipt images (PNG, JPG) or PDFs into workspace/receipts/ and run this script.
The agent reads each receipt, extracts the expense details, logs them, and reports
your monthly spending summary.
"""

import os

from dotenv import load_dotenv
from upsonic import AutonomousAgent, Task
from upsonic.integrations.langfuse import Langfuse

from tools import ocr_extract_text

# Load API keys before creating Langfuse, which reads them from the environment
load_dotenv()

langfuse = Langfuse()

agent = AutonomousAgent(
    model="anthropic/claude-sonnet-4-6",
    tools=[ocr_extract_text],
    workspace=os.path.join(os.path.dirname(__file__), "workspace"),
    instrument=langfuse,
)

task = Task(
    "Save my receipts from the receipts/ folder and tell me my monthly expenses. "
    "DO NOT USE run_python to execute any code. "
    "COMMAND! USE BOTH read_file AND ocr_extract_text TOOLS!"
)

if __name__ == "__main__":
    agent.print_do(task)
    langfuse.shutdown()
If you want to see the full project with OCR tools, workspace setup, and sample receipts, check out the complete example: Expense Tracker Bot.
Screenshot: Langfuse trace overview showing the autonomous agent's full execution pipeline, input prompt, extracted receipts, and monthly expense summary
Screenshot: Langfuse LLM generation detail showing tool call status for file_info, list_files, read_file and other available tools
Screenshot: Langfuse trace timeline showing sequential chat and tool execution steps with latency and cost per step

Usage with Evaluation

What is AccuracyEvaluator?

AccuracyEvaluator measures how well your agent’s output matches an expected answer. It uses a separate judge agent to score the agent-under-test on a scale of 0–10. You can run multiple iterations for statistical confidence. When connected to Langfuse, each evaluation is automatically logged as a dataset item, linked to the agent’s trace, and scored — giving you a complete audit trail of your agent’s accuracy over time.
For a deep dive into accuracy evaluation concepts, configuration options, and best practices, see the AccuracyEval Introduction.

AccuracyEvaluator with Langfuse Datasets

When you pass langfuse to AccuracyEvaluator, evaluation results are automatically:
  1. Logged as a dataset item
  2. Linked to the agent’s trace via a dataset run item
  3. Scored on the trace with the evaluation result
import asyncio
from upsonic import Agent, Task
from upsonic.integrations.langfuse import Langfuse
from upsonic.eval import AccuracyEvaluator

langfuse = Langfuse()

agent = Agent("anthropic/claude-sonnet-4-6", instrument=langfuse)
judge = Agent("anthropic/claude-sonnet-4-6")

evaluator = AccuracyEvaluator(
    judge_agent=judge,
    agent_under_test=agent,
    query="What is the capital of France?",
    expected_output="Paris",
    langfuse=langfuse,
    langfuse_dataset_name="my-eval-dataset",  # optional, default: "accuracy-eval"
    langfuse_run_name="run-v1",               # optional, default: auto-generated
)

result = asyncio.run(evaluator.run())
print(f"Score: {result.average_score}/10")

langfuse.shutdown()
Screenshot: Langfuse dataset items and structure
Screenshot: Langfuse dataset run linked to traces
Screenshot: Langfuse dataset run results and scores

AccuracyEvaluator with Multiple Iterations

Run the same query multiple times to get statistical confidence:
import asyncio
from upsonic import Agent
from upsonic.integrations.langfuse import Langfuse
from upsonic.eval import AccuracyEvaluator

langfuse = Langfuse()
agent = Agent("anthropic/claude-sonnet-4-6", instrument=langfuse)
judge = Agent("anthropic/claude-sonnet-4-6")

evaluator = AccuracyEvaluator(
    judge_agent=judge,
    agent_under_test=agent,
    query="What is 10 + 5? Reply with just the number.",
    expected_output="15",
    num_iterations=3,
    langfuse=langfuse,
    langfuse_dataset_name="math-eval",
)

result = asyncio.run(evaluator.run())
print(f"Average score: {result.average_score}/10")
print(f"Iterations: {len(result.evaluation_scores)}")

langfuse.shutdown()
Screenshot: Langfuse trace showing accuracy_eval_score

AccuracyEvaluator Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| langfuse | Langfuse | None | Langfuse instance for dataset logging |
| langfuse_dataset_name | str | "accuracy-eval" | Name of the Langfuse dataset to create/use |
| langfuse_run_name | str | auto-generated | Name for the dataset run |
| num_iterations | int | 1 | Number of times to run the evaluation |

Advanced APIs

For direct access to Langfuse’s scoring, datasets, and annotation queue APIs, see the Advanced guides:
  • Scores — Add numeric, boolean, or categorical scores to traces
  • Score Configs — Define validation rules for scores
  • Datasets — Create datasets, add items, and link traces
  • Annotation Queues — Create review queues for human evaluation
  • Update Trace — Override trace output or metadata after a run

langfuse.shutdown()

Why call langfuse.shutdown()? Langfuse sends traces via a BatchSpanProcessor that buffers spans and exports them every 5 seconds by default. Calling langfuse.shutdown() forces all buffered spans to be flushed to Langfuse before the provider shuts down.

By default, flush_on_exit=True registers a Python atexit handler that calls shutdown() automatically when the process exits. Calling it explicitly at the end of your script is still recommended for short-lived scripts, because atexit handlers can be skipped in edge cases (e.g., SIGKILL or os._exit()).
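The buffering behavior can be sketched with a toy stand-in exporter. This is illustrative only: it mimics the flush semantics described above (spans accumulate in a buffer; shutdown() drains whatever has not yet been exported; flush_on_exit registers an atexit hook), not Langfuse's actual internals.

```python
import atexit


class BufferedExporter:
    """Toy stand-in for a BatchSpanProcessor-style exporter."""

    def __init__(self, flush_on_exit=True):
        self.buffer = []    # spans waiting to be exported
        self.exported = []  # spans that reached the backend
        if flush_on_exit:
            # mirrors flush_on_exit=True: flush automatically at process exit
            atexit.register(self.shutdown)

    def record(self, span):
        self.buffer.append(span)  # spans are buffered, not sent immediately

    def shutdown(self):
        # drain the buffer so nothing is lost when the process ends
        self.exported.extend(self.buffer)
        self.buffer.clear()


exporter = BufferedExporter()
exporter.record("agent.do span")
exporter.record("llm generation span")

# Without an explicit shutdown(), these spans would sit in the buffer
# until the periodic export or the atexit handler fires.
exporter.shutdown()
print(exporter.exported)
```

An explicit shutdown() at the end of the script (or in a finally block) makes the flush deterministic instead of depending on the atexit hook.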

Parameters Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| public_key | str | env LANGFUSE_PUBLIC_KEY | Langfuse public key |
| secret_key | str | env LANGFUSE_SECRET_KEY | Langfuse secret key |
| host | str | env LANGFUSE_HOST | Custom host URL |
| region | "eu" \| "us" | "eu" | Cloud region (ignored if host is set) |
| service_name | str | "upsonic" | Service name in traces |
| sample_rate | float | 1.0 | Fraction of traces to sample (0.0–1.0) |
| include_content | bool | True | Include prompts/responses in traces |
| flush_on_exit | bool | True | Auto-flush on process exit |
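As a sketch of how several of these parameters combine for a self-hosted deployment: the host URL below is hypothetical, and the parameter names are the ones documented in the table above.

```python
from upsonic.integrations.langfuse import Langfuse

langfuse = Langfuse(
    # host overrides region, so "eu"/"us" is ignored here
    host="https://langfuse.internal.example.com",  # hypothetical self-hosted URL
    service_name="billing-agent",
    sample_rate=0.25,          # keep 25% of traces to cut ingestion volume
    include_content=False,     # drop prompts/responses from spans (e.g., for PII)
)
```

Lowering sample_rate trades trace completeness for cost, and include_content=False keeps span metadata (latency, tool calls, token counts) while omitting prompt and response text.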

Environment Variables

| Variable | Description |
|---|---|
| LANGFUSE_PUBLIC_KEY | Langfuse public key (pk-lf-...) |
| LANGFUSE_SECRET_KEY | Langfuse secret key (sk-lf-...) |
| LANGFUSE_HOST | Custom Langfuse host URL |