OCR

What is Unified OCR?

The Unified OCR system in Upsonic provides a consistent interface for optical character recognition across multiple OCR engines. It uses a layered architecture: Layer 0 handles document preparation, Layer 1 provides pluggable OCR engines, and the orchestrator ties the two together. The OCR class is that high-level orchestrator. It:

Manages multiple OCR engine backends with a unified API
Handles image preprocessing (rotation correction, contrast enhancement, noise reduction)
Converts PDFs to images with configurable DPI
Tracks confidence scores and bounding box detection
Collects performance metrics and processing statistics
Supports async-first processing with sync convenience wrappers
Provides configurable timeout via layer_1_timeout
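
The "async-first with sync wrappers" and timeout points above can be illustrated with a small standard-library sketch. This is not the Upsonic implementation: SketchOCR, engine_timeout, and simulated_delay are hypothetical names used only for illustration, and the real OCR class may enforce layer_1_timeout differently.

```python
import asyncio
from typing import Optional


class SketchOCR:
    """Hypothetical stand-in, NOT the Upsonic OCR class."""

    def __init__(self, engine_timeout: Optional[float] = None,
                 simulated_delay: float = 0.01) -> None:
        # engine_timeout plays the role the docs describe for layer_1_timeout.
        self.engine_timeout = engine_timeout
        self.simulated_delay = simulated_delay

    async def _run_engine(self, path: str) -> str:
        # Placeholder for real engine work; sleeps to simulate processing time.
        await asyncio.sleep(self.simulated_delay)
        return f"text from {path}"

    async def get_text_async(self, path: str) -> str:
        # The async method is the primary implementation. wait_for cancels
        # the engine call and raises a timeout error if the limit is exceeded.
        return await asyncio.wait_for(
            self._run_engine(path), timeout=self.engine_timeout
        )

    def get_text(self, path: str) -> str:
        # Sync convenience wrapper: runs the async method to completion.
        return asyncio.run(self.get_text_async(path))
```

One consequence of this design is that a sync wrapper built on asyncio.run cannot be called from code that is already inside a running event loop; there, the async method would be awaited directly.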

OCR Installation

uv pip install "upsonic[ocr]"

This installs Upsonic with OCR dependencies including EasyOCR, RapidOCR, Tesseract, PaddleOCR, and image processing libraries. You’ll have access to all OCR providers through a unified interface without needing to configure each one separately.

from upsonic.ocr import OCR
from upsonic.ocr.layer_1.engines import EasyOCREngine

# Create engine instance with its own config
engine = EasyOCREngine(languages=['en'], rotation_fix=True)

# Create OCR orchestrator
ocr = OCR(layer_1_ocr_engine=engine)

# Extract text
text = ocr.get_text('document.pdf')
print(text)

How Unified OCR Works

The OCR system follows a layered processing pipeline:

Layer 0 — Document Preparation: Validates file existence and format, converts PDFs to images at specified DPI, applies optional preprocessing (rotation fix, contrast enhancement, noise reduction)
Layer 1 — OCR Engine: Processes each prepared image through the configured engine (EasyOCR, RapidOCR, Tesseract, DeepSeek, PaddleOCR)
Orchestrator — Result Aggregation: Combines results from multiple pages, calculates average confidence scores, tracks processing metrics
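
The three stages above can be sketched in plain Python. Everything here is a hypothetical stand-in (PageResult, prepare, aggregate, run_pipeline, and fake_engine are not Upsonic names), and the unweighted per-page mean is an assumption about how an average confidence might be computed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PageResult:
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def prepare(path: str, page_count: int) -> List[str]:
    """Layer 0 stand-in: pretend to split a document into page images."""
    return [f"{path}#page{i + 1}" for i in range(page_count)]


def aggregate(pages: List[PageResult]) -> Dict[str, object]:
    """Orchestrator stand-in: join page text and average the confidences."""
    if not pages:
        return {"text": "", "confidence": 0.0, "page_count": 0}
    return {
        "text": "\n".join(p.text for p in pages),
        # Assumption: a simple unweighted mean across pages.
        "confidence": sum(p.confidence for p in pages) / len(pages),
        "page_count": len(pages),
    }


def run_pipeline(path: str, page_count: int,
                 engine: Callable[[str], PageResult]) -> Dict[str, object]:
    images = prepare(path, page_count)          # Layer 0
    results = [engine(img) for img in images]   # Layer 1
    return aggregate(results)                   # Orchestrator


def fake_engine(image: str) -> PageResult:
    """Toy Layer 1 engine: echoes the page name with a fixed confidence."""
    confidence = 0.5 if image.endswith("1") else 1.0
    return PageResult(text=image.upper(), confidence=confidence)


print(run_pipeline("doc.pdf", 2, fake_engine))
```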

Supported Layer 1 Engines

Engine	Best for	Docs
`EasyOCREngine`	Multi-language support, 80+ languages	EasyOCR
`RapidOCREngine`	Speed and lightweight deployment	RapidOCR
`TesseractOCREngine`	Traditional OCR, 100+ languages	Tesseract
`DeepSeekOCREngine`	Batch processing with vLLM	DeepSeek OCR
`DeepSeekOllamaOCREngine`	Local processing via Ollama	DeepSeek Ollama
`PaddleOCREngine`	General OCR (PP-OCRv5)	PaddleOCR
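
Because every engine in the table plugs into the same orchestrator, switching engines should only change the object passed in, not the calling code. The sketch below shows that idea with a structural interface. It is illustrative only: Engine, recognize, UppercaseEngine, ReversedEngine, and Orchestrator are hypothetical names, not Upsonic classes or methods.

```python
from typing import Protocol


class Engine(Protocol):
    """Hypothetical engine interface; any object with recognize() fits."""

    def recognize(self, image: bytes) -> str: ...


class UppercaseEngine:
    """Toy engine: decodes the bytes and upper-cases them."""

    def recognize(self, image: bytes) -> str:
        return image.decode("utf-8").upper()


class ReversedEngine:
    """Toy engine: decodes the bytes and reverses them."""

    def recognize(self, image: bytes) -> str:
        return image.decode("utf-8")[::-1]


class Orchestrator:
    """Depends only on the Engine interface, never on a concrete engine."""

    def __init__(self, engine: Engine) -> None:
        self.engine = engine

    def get_text(self, image: bytes) -> str:
        return self.engine.recognize(image)


# Swapping engines changes one constructor argument and nothing else.
for engine in (UppercaseEngine(), ReversedEngine()):
    print(Orchestrator(engine).get_text(b"abc"))
```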

Async Usage

The OCR methods also have async variants. For example, get_text_async is the awaitable counterpart of get_text:

import asyncio
from upsonic.ocr import OCR
from upsonic.ocr.layer_1.engines import EasyOCREngine

async def process():
    engine = EasyOCREngine(languages=['en'])
    ocr = OCR(layer_1_ocr_engine=engine)

    text = await ocr.get_text_async('document.pdf')
    print(text)

asyncio.run(process())
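
An awaitable method like get_text_async makes it possible to process several documents concurrently. The sketch below shows the general asyncio pattern with a bounded number of in-flight calls. fake_get_text_async is a hypothetical stand-in for the real call, and whether a given engine is safe or faster under concurrency is an assumption that should be checked per engine.

```python
import asyncio
from typing import List


async def fake_get_text_async(path: str) -> str:
    """Stand-in for an awaitable OCR call; sleeps to simulate work."""
    await asyncio.sleep(0.01)
    return f"text from {path}"


async def process_many(paths: List[str], max_concurrent: int = 2) -> List[str]:
    # The semaphore caps how many documents are processed at once.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(path: str) -> str:
        async with semaphore:
            return await fake_get_text_async(path)

    # gather returns results in the same order as the input paths.
    return list(await asyncio.gather(*(worker(p) for p in paths)))


texts = asyncio.run(process_many(["a.png", "b.png", "c.png"]))
print(texts)
```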

Navigation

Architecture - Understanding the layered OCR pipeline
OCR Attributes - Comprehensive guide to all OCR configuration options
OCR Providers - Configure EasyOCR, RapidOCR, Tesseract, PaddleOCR, and DeepSeek OCR
Advanced Features - Advanced preprocessing, async, and timeout options
Metrics and Performance - Monitor and optimize OCR performance
Basic OCR Example - Get started with OCR integration

Made with Love 💚

We believe that document processing should be simple and accessible to everyone. By unifying multiple OCR engines under one interface, we’re giving developers the freedom to choose the best tool for their needs without rewriting code. Whether you’re building an invoice processor or analyzing historical documents, we’ve built this with care so you can focus on what matters most - your application.
