What is Unified OCR?
The Unified OCR system in Upsonic provides a consistent interface for optical character recognition across multiple OCR engines. It uses a layered architecture: Layer 0 handles document preparation, Layer 1 provides pluggable OCR engines, and the OCR orchestrator ties everything together.
The OCR class serves as a high-level orchestrator that:
- Manages multiple OCR engine backends with a unified API
- Handles image preprocessing (rotation correction, contrast enhancement, noise reduction)
- Converts PDFs to images with configurable DPI
- Tracks confidence scores and bounding box detection
- Collects performance metrics and processing statistics
- Supports async-first processing with sync convenience wrappers
- Provides a configurable timeout via layer_1_timeout
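The async-first design with sync wrappers and a Layer 1 timeout can be illustrated with a minimal, standard-library sketch. Everything here is an assumption for illustration: `SketchOCR`, `get_text_async`, `get_text`, and the simulated engine delay are hypothetical stand-ins, not the Upsonic API. Only the parameter name `layer_1_timeout` comes from the text above.

```python
import asyncio
from typing import Optional


class SketchOCR:
    """Hypothetical stand-in for an orchestrator; not the Upsonic OCR class."""

    def __init__(self, engine_delay: float, layer_1_timeout: Optional[float] = None):
        self.engine_delay = engine_delay        # simulated engine latency, in seconds
        self.layer_1_timeout = layer_1_timeout  # None means no time limit

    async def _run_engine(self, path: str) -> str:
        # Simulate a slow Layer 1 engine call.
        await asyncio.sleep(self.engine_delay)
        return f"text from {path}"

    async def get_text_async(self, path: str) -> str:
        # Async-first entry point: the engine call is bounded by the timeout.
        return await asyncio.wait_for(
            self._run_engine(path), timeout=self.layer_1_timeout
        )

    def get_text(self, path: str) -> str:
        # Sync convenience wrapper: runs the async method on a fresh event loop.
        return asyncio.run(self.get_text_async(path))
```

In this sketch, a call that exceeds the limit raises `asyncio.TimeoutError`, so callers can decide whether to retry or skip the document. The sync wrapper uses `asyncio.run`, which means it should not be called from code that is already running inside an event loop.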
OCR Installation
The OCR installation provides Upsonic with OCR dependencies including EasyOCR, RapidOCR, Tesseract, PaddleOCR, and image processing libraries. You’ll have access to all OCR providers through a unified interface without needing to configure each one separately.
How Unified OCR Works
The OCR system follows a layered processing pipeline:
- Layer 0 (Document Preparation): Validates file existence and format, converts PDFs to images at the specified DPI, and applies optional preprocessing (rotation fix, contrast enhancement, noise reduction)
- Layer 1 (OCR Engine): Processes each prepared image through the configured engine (EasyOCR, RapidOCR, Tesseract, DeepSeek, PaddleOCR)
- Orchestrator (Result Aggregation): Combines results from multiple pages, calculates average confidence scores, and tracks processing metrics
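The three stages above can be sketched in plain Python. This is an illustrative outline under stated assumptions, not Upsonic's implementation: `prepare_document`, `fake_engine`, `aggregate`, and `run_pipeline` are hypothetical names, page rendering is simulated with strings, and the file-existence check is omitted so the sketch runs without real files.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PageResult:
    text: str
    confidence: float  # 0.0 to 1.0


def prepare_document(path: str, page_count: int, dpi: int = 300) -> List[str]:
    # Layer 0 stand-in: check the format, then pretend each page was
    # rendered to an image at the requested DPI.
    if not path.lower().endswith((".pdf", ".png", ".jpg")):
        raise ValueError(f"unsupported format: {path}")
    return [f"{path}#page={i}@{dpi}dpi" for i in range(1, page_count + 1)]


def fake_engine(image: str) -> PageResult:
    # Layer 1 stand-in: a hypothetical engine with made-up confidences.
    confidence = 0.9 if "page=1@" in image else 0.7
    return PageResult(text=f"recognized {image}", confidence=confidence)


def aggregate(results: List[PageResult]) -> dict:
    # Orchestrator stand-in: join page text and average the confidences.
    average = sum(r.confidence for r in results) / len(results) if results else 0.0
    return {
        "text": "\n".join(r.text for r in results),
        "page_count": len(results),
        "average_confidence": average,
    }


def run_pipeline(path: str, page_count: int,
                 engine: Callable[[str], PageResult]) -> dict:
    images = prepare_document(path, page_count)   # Layer 0
    results = [engine(image) for image in images]  # Layer 1
    return aggregate(results)                      # Orchestrator
```

Calling `run_pipeline("scan.pdf", 2, fake_engine)` returns a dictionary with the joined text, a page count of 2, and the mean of the two per-page confidences.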
Supported Layer 1 Engines
| Engine | Best for | Docs |
|---|---|---|
| EasyOCREngine | Multi-language support, 80+ languages | EasyOCR |
| RapidOCREngine | Speed and lightweight deployment | RapidOCR |
| TesseractOCREngine | Traditional OCR, 100+ languages | Tesseract |
| DeepSeekOCREngine | Batch processing with vLLM | DeepSeek OCR |
| DeepSeekOllamaOCREngine | Local processing via Ollama | DeepSeek Ollama |
| PaddleOCREngine | General OCR (PP-OCRv5) | PaddleOCR |
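The idea of swapping engines behind one interface can be sketched with a structural type. The names below (`Engine`, `FastEngine`, `AccurateEngine`, `Orchestrator`, `recognize`) are hypothetical and chosen only for illustration. They do not reflect the constructor arguments or method names of the engine classes in the table.

```python
from typing import Protocol


class Engine(Protocol):
    # Any object with a matching `recognize` method counts as an engine.
    def recognize(self, image: str) -> str: ...


class FastEngine:
    def recognize(self, image: str) -> str:
        return f"fast:{image}"


class AccurateEngine:
    def recognize(self, image: str) -> str:
        return f"accurate:{image}"


class Orchestrator:
    def __init__(self, engine: Engine):
        self.engine = engine  # the engine is injected, so it can be swapped freely

    def get_text(self, image: str) -> str:
        # Calling code stays the same regardless of which engine is plugged in.
        return self.engine.recognize(image)
```

With this shape, switching engines means changing only the object passed to the orchestrator, while every call to `get_text` stays the same.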
Async Usage
All OCR methods support async execution.
Navigation
- Architecture - Understanding the layered OCR pipeline
- OCR Attributes - Comprehensive guide to all OCR configuration options
- OCR Providers - Configure EasyOCR, RapidOCR, Tesseract, PaddleOCR, and DeepSeek OCR
- Advanced Features - Advanced preprocessing, async, and timeout options
- Metrics and Performance - Monitor and optimize OCR performance
- Basic OCR Example - Get started with OCR integration

