
What is Unified OCR?

The Unified OCR system in Upsonic provides a consistent interface for optical character recognition across multiple OCR engines. It uses a layered architecture: Layer 0 handles document preparation, Layer 1 provides pluggable OCR engines, and an orchestrator ties the layers together. The OCR class is this high-level orchestrator. It:
  • Manages multiple OCR engine backends with a unified API
  • Handles image preprocessing (rotation correction, contrast enhancement, noise reduction)
  • Converts PDFs to images with configurable DPI
  • Tracks confidence scores and bounding boxes for detected text
  • Collects performance metrics and processing statistics
  • Supports async-first processing with sync convenience wrappers
  • Provides a configurable timeout via layer_1_timeout
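The async-first design with sync wrappers and a timeout can be illustrated with a small standard-library sketch. This is not Upsonic's implementation: SketchOCR, _run_engine, and the delay argument are hypothetical stand-ins, and treating layer_1_timeout as a number of seconds (with None meaning no limit) is an assumption.

```python
import asyncio
from typing import Optional


class SketchOCR:
    """Hypothetical stand-in, not Upsonic's OCR class."""

    def __init__(self, layer_1_timeout: Optional[float] = None) -> None:
        # Assumed semantics: seconds allowed for the engine step, None = no limit.
        self.layer_1_timeout = layer_1_timeout

    async def _run_engine(self, path: str, delay: float) -> str:
        # Placeholder for real engine work; the sleep simulates processing time.
        await asyncio.sleep(delay)
        return f"text from {path}"

    async def get_text_async(self, path: str, delay: float = 0.0) -> str:
        # The async method holds the real logic and enforces the timeout.
        return await asyncio.wait_for(
            self._run_engine(path, delay), timeout=self.layer_1_timeout
        )

    def get_text(self, path: str, delay: float = 0.0) -> str:
        # Sync convenience wrapper: runs the async method to completion.
        return asyncio.run(self.get_text_async(path, delay))
```

A wrapper built on asyncio.run cannot be called from inside an already running event loop, so in async code the sketch's get_text_async would be awaited directly.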
OCR Installation
uv pip install "upsonic[ocr]"
This installs Upsonic with OCR dependencies including EasyOCR, RapidOCR, Tesseract, PaddleOCR, and image processing libraries. You’ll have access to all OCR providers through a unified interface without needing to configure each one separately.
from upsonic.ocr import OCR
from upsonic.ocr.layer_1.engines import EasyOCREngine

# Create engine instance with its own config
engine = EasyOCREngine(languages=['en'], rotation_fix=True)

# Create OCR orchestrator
ocr = OCR(layer_1_ocr_engine=engine)

# Extract text
text = ocr.get_text('document.pdf')
print(text)

How Unified OCR Works

The OCR system follows a layered processing pipeline:
  1. Layer 0 — Document Preparation: Validates file existence and format, converts PDFs to images at specified DPI, applies optional preprocessing (rotation fix, contrast enhancement, noise reduction)
  2. Layer 1 — OCR Engine: Processes each prepared image through the configured engine (EasyOCR, RapidOCR, Tesseract, DeepSeek, PaddleOCR)
  3. Orchestrator — Result Aggregation: Combines results from multiple pages, calculates average confidence scores, tracks processing metrics
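The three stages above can be sketched in plain Python to show how data flows between them. Everything here is illustrative: PageResult, prepare, aggregate, and run_pipeline are hypothetical names, the engine is any callable you pass in, and the 0.0 to 1.0 confidence range is an assumption rather than something the library documents here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class PageResult:
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def prepare(path: str, page_count: int) -> List[str]:
    # Layer 0 stand-in: pretend the document was split into one image per page.
    return [f"{path}#page{i}" for i in range(1, page_count + 1)]


def aggregate(pages: List[PageResult]) -> Dict[str, object]:
    # Orchestrator stand-in: join page texts and average the confidences.
    return {
        "text": "\n".join(p.text for p in pages),
        "confidence": mean(p.confidence for p in pages) if pages else 0.0,
        "page_count": len(pages),
    }


def run_pipeline(
    path: str, page_count: int, engine: Callable[[str], PageResult]
) -> Dict[str, object]:
    images = prepare(path, page_count)         # Layer 0
    results = [engine(img) for img in images]  # Layer 1
    return aggregate(results)                  # Orchestrator
```

Because the engine is just a parameter, swapping it does not change the preparation or aggregation code, which is the point of the layered design.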

Supported Layer 1 Engines

| Engine | Best for | Docs |
|---|---|---|
| EasyOCREngine | Multi-language support, 80+ languages | EasyOCR |
| RapidOCREngine | Speed and lightweight deployment | RapidOCR |
| TesseractOCREngine | Traditional OCR, 100+ languages | Tesseract |
| DeepSeekOCREngine | Batch processing with vLLM | DeepSeek OCR |
| DeepSeekOllamaOCREngine | Local processing via Ollama | DeepSeek Ollama |
| PaddleOCREngine | General OCR (PP-OCRv5) | PaddleOCR |
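One way to keep the engine choice in a single place is a small lookup keyed by need. This is only a sketch: the keys are hypothetical labels, the values are the class names from the list of engines above, and falling back to PaddleOCREngine for unknown needs is an assumption made here, not a library default.

```python
# Keys are hypothetical labels; values are engine class names from the docs.
ENGINE_FOR_NEED = {
    "multilingual": "EasyOCREngine",
    "lightweight": "RapidOCREngine",
    "traditional": "TesseractOCREngine",
    "batch_vllm": "DeepSeekOCREngine",
    "local_ollama": "DeepSeekOllamaOCREngine",
    "general": "PaddleOCREngine",
}


def engine_name_for(need: str, default: str = "PaddleOCREngine") -> str:
    # Unknown needs fall back to the general-purpose engine (an assumption).
    return ENGINE_FOR_NEED.get(need, default)
```

The returned name would then be resolved to a class and passed as layer_1_ocr_engine. The examples on this page import only EasyOCREngine from upsonic.ocr.layer_1.engines; that the other classes live at the same path is assumed.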

Async Usage

All OCR methods support async execution:
import asyncio
from upsonic.ocr import OCR
from upsonic.ocr.layer_1.engines import EasyOCREngine

async def process():
    engine = EasyOCREngine(languages=['en'])
    ocr = OCR(layer_1_ocr_engine=engine)

    text = await ocr.get_text_async('document.pdf')
    print(text)

asyncio.run(process())
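An async API also makes it straightforward to process several documents concurrently. The sketch below uses a stub with the same get_text_async method name as the example above so that it runs without any OCR backend. StubOCR, extract_many, and max_concurrent are hypothetical, and whether a real OCR instance is safe to share across concurrent calls is not confirmed here.

```python
import asyncio
from typing import Dict, List


class StubOCR:
    """Hypothetical stand-in exposing a get_text_async method."""

    async def get_text_async(self, path: str) -> str:
        await asyncio.sleep(0.01)  # simulate I/O-bound work
        return f"text from {path}"


async def extract_many(ocr, paths: List[str], max_concurrent: int = 2) -> Dict[str, str]:
    # The semaphore caps how many documents are in flight at once.
    limit = asyncio.Semaphore(max_concurrent)

    async def one(path: str) -> str:
        async with limit:
            return await ocr.get_text_async(path)

    # gather returns results in the same order as the input paths.
    texts = await asyncio.gather(*(one(p) for p in paths))
    return dict(zip(paths, texts))
```

Called as asyncio.run(extract_many(StubOCR(), ["a.pdf", "b.png"])), this returns a dictionary mapping each path to its extracted text. Capping concurrency matters for OCR because each page can be memory-intensive.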

Made with Love 💚

We believe that document processing should be simple and accessible to everyone. By unifying multiple OCR engines under one interface, we’re giving developers the freedom to choose the best tool for their needs without rewriting code. Whether you’re building an invoice processor or analyzing historical documents, we’ve built this with care so you can focus on what matters most: your application.