Overview

vLLM is a high-throughput serving engine for large language models that exposes an OpenAI-compatible API, which makes it well suited to running models locally with strong performance and throughput.

Model Class: OpenAIChatModel (OpenAI-compatible API)

Authentication

export VLLM_BASE_URL="http://localhost:8000/v1"  # Required
export VLLM_API_KEY="your-api-key"  # Optional; vLLM does not require authentication by default
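A minimal sketch of how a client might pick up these environment variables, assuming the default local vLLM address as a fallback; the "EMPTY" placeholder key is an assumption for servers started without an API key:

```python
import os

# VLLM_BASE_URL is required; fall back to the default local vLLM address for illustration.
base_url = os.getenv("VLLM_BASE_URL", "http://localhost:8000/v1")

# The key is only enforced when the server was launched with an API key configured;
# "EMPTY" is a common placeholder when no key is required (assumption, not an Upsonic convention).
api_key = os.getenv("VLLM_API_KEY", "EMPTY")

print(base_url)
```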

Examples

from upsonic import Agent, Task
from upsonic.models.openai import OpenAIChatModel

# Point the OpenAI-compatible model class at the vLLM provider.
model = OpenAIChatModel(model_name="Qwen/Qwen2.5-0.5B-Instruct", provider="vllm")

agent = Agent(model=model)
task = Task("Hello, how are you?")
result = agent.do(task)

print(result)
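Because vLLM exposes an OpenAI-compatible API, you can sanity-check the endpoint with a plain HTTP request before wiring it into an agent. A minimal sketch using only the standard library; the URL assumes the default local server from the Authentication section, and the actual call is left commented since it needs a running server:

```python
import urllib.request

base_url = "http://localhost:8000/v1"  # value of VLLM_BASE_URL

# GET /v1/models is the standard OpenAI-compatible endpoint for listing served models.
req = urllib.request.Request(
    f"{base_url}/models",
    headers={"Authorization": "Bearer EMPTY"},  # placeholder key; vLLM ignores it unless a key is enforced
)

# Requires a running vLLM server, so it is left commented here:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```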

Parameters

| Parameter | Type | Description | Default | Source |
|---|---|---|---|---|
| max_tokens | int | Maximum tokens to generate | Model default | Base |
| temperature | float | Sampling temperature | Model default | Base |
| top_p | float | Nucleus sampling | Model default | Base |
| seed | int | Random seed | None | Base |
| stop_sequences | list[str] | Stop sequences | None | Base |
| presence_penalty | float | Token presence penalty | 0.0 | Base |
| frequency_penalty | float | Token frequency penalty | 0.0 | Base |
| parallel_tool_calls | bool | Allow parallel tool calls | True | Base |
| timeout | float | Request timeout (seconds) | Model default | Base |
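These parameters correspond to standard fields on an OpenAI-style chat-completions request. A minimal sketch of how such a payload could be assembled, dropping unset values so the server applies its model defaults; the helper name is hypothetical, not part of Upsonic's API:

```python
def build_chat_payload(model, messages, **params):
    """Assemble an OpenAI-style chat request, keeping only explicitly set parameters."""
    payload = {"model": model, "messages": messages}
    # Unset (None) parameters are omitted, so the server falls back to model defaults.
    payload.update({k: v for k, v in params.items() if v is not None})
    return payload

payload = build_chat_payload(
    "Qwen/Qwen2.5-0.5B-Instruct",
    [{"role": "user", "content": "Hello, how are you?"}],
    max_tokens=128,
    temperature=0.7,
    top_p=None,  # unset -> omitted, model default applies
)
```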