What is Unified OCR?
The Unified OCR system in Upsonic provides a consistent interface for optical character recognition across multiple OCR engines. Instead of learning a different API for each OCR provider, you use a single OCR class that works with EasyOCR, RapidOCR, Tesseract, DeepSeek, and PaddleOCR.
The OCR class serves as a high-level orchestrator that:
- Manages multiple OCR provider backends with a unified API
- Handles image preprocessing (rotation correction, contrast enhancement, noise reduction)
- Converts PDFs to images with configurable DPI
- Tracks confidence scores and bounding boxes for detected text
- Collects performance metrics and processing statistics
- Provides provider-specific features and optimizations
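The orchestrator idea above can be illustrated with a minimal, self-contained sketch. It is not Upsonic's actual API: the names `OCRBackend`, `TextBlock`, `UnifiedOCR`, `FakeEngineA`, and `FakeEngineB` are hypothetical, and the fake engines return canned results instead of running real OCR. The sketch only shows how a single class can expose the same call regardless of the backend behind it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class TextBlock:
    """One recognized piece of text with its confidence (0.0 to 1.0)."""
    text: str
    confidence: float


class OCRBackend(Protocol):
    """Hypothetical provider contract; an assumption, not Upsonic's interface."""

    name: str

    def recognize(self, image: bytes) -> list[TextBlock]: ...


class FakeEngineA:
    """Stand-in engine that returns two canned blocks."""
    name = "fake-a"

    def recognize(self, image: bytes) -> list[TextBlock]:
        return [TextBlock("hello", 0.9), TextBlock("world", 0.7)]


class FakeEngineB:
    """Stand-in engine that returns one canned block."""
    name = "fake-b"

    def recognize(self, image: bytes) -> list[TextBlock]:
        return [TextBlock("hello world", 0.8)]


class UnifiedOCR:
    """Orchestrator exposing one API no matter which backend is plugged in."""

    def __init__(self, backend: OCRBackend) -> None:
        self.backend = backend
        self.images_processed = 0  # simple processing statistic

    def get_text(self, image: bytes) -> str:
        blocks = self.backend.recognize(image)
        self.images_processed += 1
        return " ".join(block.text for block in blocks)


# The calling code is identical for both engines.
for engine in (FakeEngineA(), FakeEngineB()):
    ocr = UnifiedOCR(engine)
    print(engine.name, "->", ocr.get_text(b"fake image bytes"))
```

Swapping the engine changes only the constructor argument; the calling code stays the same, which is the point of a unified interface.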
OCR Installation
This installs Upsonic with OCR dependencies including EasyOCR, RapidOCR, Tesseract, PaddleOCR, and image processing libraries. You’ll have access to all OCR providers through a unified interface without needing to configure each one separately.
How Unified OCR Works
The OCR system follows a clear processing pipeline:
- File Preparation: Validates file existence and format (supports .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp, .pdf)
- PDF Conversion: If the file is a PDF, converts each page to images at the specified DPI
- Image Preprocessing: Optionally applies rotation correction, contrast enhancement, and noise reduction
- OCR Processing: Processes each image through the selected provider’s engine
- Result Aggregation: Combines results from multiple pages and calculates average confidence scores
- Metrics Tracking: Updates processing statistics for performance analysis
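Two of the steps above, File Preparation and Result Aggregation, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not Upsonic's implementation: the function names `prepare_file` and `aggregate_pages`, the `(text, confidence)` tuple shape, and the blank-line page separator are all hypothetical. Only the list of supported extensions comes from the text above.

```python
from pathlib import Path
from statistics import mean

# Extensions listed in the File Preparation step above.
SUPPORTED_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp", ".pdf",
}


def prepare_file(path: str) -> Path:
    """Check that the file exists and has a supported extension."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"File not found: {p}")
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {p.suffix}")
    return p


def aggregate_pages(pages: list) -> dict:
    """Combine per-page (text, confidence) tuples into a single result.

    The tuple shape and the blank-line separator are assumptions made
    for this sketch.
    """
    if not pages:
        return {"text": "", "confidence": 0.0, "page_count": 0}
    return {
        "text": "\n\n".join(text for text, _ in pages),
        "confidence": mean(conf for _, conf in pages),
        "page_count": len(pages),
    }


# Example: a two-page document with different per-page confidence.
result = aggregate_pages([("Page one text", 0.5), ("Page two text", 1.0)])
print(result["page_count"], result["confidence"])
```

Averaging per page is the simplest reading of "average confidence scores"; an implementation could instead weight each page by the amount of text it contains.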
Navigation
- OCR Attributes - Comprehensive guide to all OCR configuration options
- OCR Providers - Configure EasyOCR, RapidOCR, Tesseract, PaddleOCR, and DeepSeek OCR
- Advanced Features - Advanced preprocessing and optimization options
- Metrics and Performance - Monitor and optimize OCR performance
- Basic OCR Example - Get started with OCR integration

