
What is Unified OCR?

The Unified OCR system in Upsonic provides a consistent interface for optical character recognition across multiple OCR engines. Instead of learning a different API for each OCR provider, you use a single OCR class that works with EasyOCR, RapidOCR, Tesseract, DeepSeek, and PaddleOCR. The OCR class acts as a high-level orchestrator that:
  • Manages multiple OCR provider backends with a unified API
  • Handles image preprocessing (rotation correction, contrast enhancement, noise reduction)
  • Converts PDFs to images with configurable DPI
  • Tracks confidence scores and bounding box detection
  • Collects performance metrics and processing statistics
  • Provides provider-specific features and optimizations
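Upsonic’s own preprocessing code is not shown on this page and probably relies on an imaging library. As a rough illustration of what two of the listed steps do, here is a pure-Python sketch on a tiny grayscale image stored as nested lists: min-max contrast stretching, and a 3×3 median filter for noise reduction. The function names `stretch_contrast` and `median_denoise` are hypothetical, and rotation correction is left out because estimating a skew angle does not fit in a short example.

```python
from statistics import median

Image = list[list[int]]  # grayscale rows, pixel values 0-255 (illustrative format)


def stretch_contrast(img: Image) -> Image:
    """Min-max contrast stretch: darkest pixel -> 0, brightest -> 255."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def median_denoise(img: Image) -> Image:
    """3x3 median filter; edge pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(round(median(window)))
        out.append(row)
    return out


if __name__ == "__main__":
    # A 3x3 patch with one bright speck in the middle.
    speckled = [[100, 110, 120], [110, 250, 110], [120, 110, 100]]
    print(median_denoise(speckled))  # [[110, 110, 115], [110, 110, 110], [115, 110, 110]]
    # A faded gradient spread out to the full 0-255 range.
    print(stretch_contrast([[100, 150, 200]]))  # [[0, 128, 255]]
```

When both steps are chained, denoising first keeps an isolated bright speck from setting the maximum that the contrast stretch scales against.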
OCR Installation
pip install "upsonic[ocr]"
This installs Upsonic with its OCR dependencies, including EasyOCR, RapidOCR, Tesseract, PaddleOCR, and image processing libraries. All of these providers are then available through the same OCR class, without separate per-provider setup in your code.
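To confirm which backends are actually present after installing, you could look for the underlying Python packages. The sketch below assumes the extra installs each provider’s wrapper under the import names listed in `PROVIDER_MODULES` (an assumption, not something this page states; adjust to your environment). It also checks for the `tesseract` executable, because the `pytesseract` wrapper drives a separately installed Tesseract engine. DeepSeek is left out since the installation note does not list a package for it. `check_ocr_backends` is a hypothetical helper name.

```python
import importlib.util
import shutil

# Assumed import names for each provider's Python wrapper.
# RapidOCR has been published under both module names.
PROVIDER_MODULES = {
    "EasyOCR": ("easyocr",),
    "RapidOCR": ("rapidocr", "rapidocr_onnxruntime"),
    "Tesseract": ("pytesseract",),
    "PaddleOCR": ("paddleocr",),
}


def check_ocr_backends() -> dict[str, bool]:
    """Report which provider packages are importable, without importing them."""
    status = {
        name: any(importlib.util.find_spec(mod) is not None for mod in modules)
        for name, modules in PROVIDER_MODULES.items()
    }
    # pytesseract only wraps the engine; the `tesseract` executable must be on PATH.
    status["tesseract binary"] = shutil.which("tesseract") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_ocr_backends().items():
        print(f"{name:<18}{'ok' if ok else 'missing'}")
```

`importlib.util.find_spec` only locates a module, so the check stays fast even for heavy packages like PaddleOCR.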
from upsonic.ocr import OCR
from upsonic.ocr.easyocr import EasyOCR

# Create OCR instance
ocr = OCR(EasyOCR, languages=['en'], rotation_fix=True)

# Extract text
text = ocr.get_text('document.pdf')
print(text)
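The example passes the provider class (`EasyOCR`) itself rather than an instance, along with keyword options. One way an orchestrator can support that pattern is sketched below. `BaseProvider`, `MiniOCR`, and the two toy providers are hypothetical stand-ins, not Upsonic’s classes, and the toy providers transform bytes instead of reading pixels.

```python
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Hypothetical interface that every backend adapter implements."""

    def __init__(self, languages: list[str]) -> None:
        self.languages = languages

    @abstractmethod
    def recognize(self, image: bytes) -> tuple[str, float]:
        """Return (text, confidence in [0, 1]) for one image."""


class UpperCaseProvider(BaseProvider):
    """Toy backend: decodes the bytes as UTF-8 and upper-cases them."""

    def recognize(self, image: bytes) -> tuple[str, float]:
        return image.decode("utf-8").upper(), 0.9


class LowerCaseProvider(BaseProvider):
    """Toy backend: decodes the bytes as UTF-8 and lower-cases them."""

    def recognize(self, image: bytes) -> tuple[str, float]:
        return image.decode("utf-8").lower(), 0.8


class MiniOCR:
    """Orchestrator that receives a provider class plus options and builds it."""

    def __init__(self, provider_cls: type[BaseProvider], languages: list[str], **options) -> None:
        self.provider = provider_cls(languages)  # construction stays in one place
        self.options = options                   # e.g. rotation_fix=True; unused here

    def get_text(self, image: bytes) -> str:
        text, _confidence = self.provider.recognize(image)
        return text


if __name__ == "__main__":
    for provider in (UpperCaseProvider, LowerCaseProvider):
        ocr = MiniOCR(provider, languages=["en"], rotation_fix=True)
        print(ocr.get_text(b"Invoice 42"))
    # INVOICE 42
    # invoice 42
```

Because the orchestrator builds the provider itself, switching engines means changing one argument while the calling code stays the same.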

How Unified OCR Works

Each file passes through the following processing pipeline:
  1. File Preparation: Validates file existence and format (supports .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp, .pdf)
  2. PDF Conversion: If the file is a PDF, converts each page to an image at the specified DPI
  3. Image Preprocessing: Optionally applies rotation correction, contrast enhancement, and noise reduction
  4. OCR Processing: Runs each image through the selected provider’s engine
  5. Result Aggregation: Combines the results from all pages and calculates the average confidence score
  6. Metrics Tracking: Updates processing statistics for performance analysis
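The stages can be sketched as a single function. This is a hypothetical illustration, not Upsonic’s implementation: `run_pipeline`, `PipelineResult`, and `Metrics` are invented names, steps 2 and 3 are represented by a list of already-prepared page images, the recognizer is a stand-in callable, and the aggregated confidence is assumed to be a plain mean over pages.

```python
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Extensions listed in step 1.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp", ".pdf"}


@dataclass
class PipelineResult:
    """Hypothetical result type; field names mirror the documented example."""
    text: str
    confidence: float
    page_count: int
    processing_time_ms: float


@dataclass
class Metrics:
    """Hypothetical running statistics (step 6)."""
    files_processed: int = 0
    pages_processed: int = 0
    total_time_ms: float = 0.0


def validate(path: Path) -> None:
    """Step 1: reject missing files and unsupported extensions."""
    if not path.exists():
        raise FileNotFoundError(path)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {path.suffix!r}")


def run_pipeline(
    path: str,
    pages: list[bytes],
    recognize: Callable[[bytes], tuple[str, float]],
    metrics: Metrics,
) -> PipelineResult:
    """Steps 1 and 4-6. `pages` stands in for the images produced by steps 2-3."""
    validate(Path(path))
    start = time.perf_counter()
    per_page = [recognize(page) for page in pages]             # step 4: one call per page
    text = "\n".join(t for t, _ in per_page)                   # step 5: combine pages
    scores = [c for _, c in per_page]
    confidence = sum(scores) / len(scores) if scores else 0.0  # step 5: mean confidence
    elapsed_ms = (time.perf_counter() - start) * 1000
    metrics.files_processed += 1                               # step 6: update statistics
    metrics.pages_processed += len(per_page)
    metrics.total_time_ms += elapsed_ms
    return PipelineResult(text, confidence, len(per_page), elapsed_ms)


if __name__ == "__main__":
    def fake_ocr(page: bytes) -> tuple[str, float]:
        """Stand-in recognizer: treats the bytes as already-recognized text."""
        return page.decode(), 0.9

    stats = Metrics()
    with tempfile.NamedTemporaryFile(suffix=".pdf") as f:  # empty stand-in file
        result = run_pipeline(f.name, [b"page one", b"page two"], fake_ocr, stats)
    print(result.text, result.page_count, stats.pages_processed)
```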
from upsonic.ocr import OCR
from upsonic.ocr.rapidocr import RapidOCR

# Create OCR with preprocessing
ocr = OCR(
    RapidOCR,
    languages=['en'],
    rotation_fix=True,
    enhance_contrast=True,
    pdf_dpi=300
)

# Process file - returns detailed results
result = ocr.process_file('document.pdf')

print(f"Text: {result.text}")
print(f"Confidence: {result.confidence:.2%}")
print(f"Pages: {result.page_count}")
print(f"Processing time: {result.processing_time_ms:.2f}ms")
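The `pdf_dpi` setting decides how large each rendered page is. PDF page sizes are measured in points (1/72 inch), so a page’s pixel size is its size in points multiplied by `pdf_dpi / 72`. The hypothetical helper `rendered_size` below works this out for a US Letter page (612 × 792 pt); actual renderers may round the last pixel differently.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Approximate pixel size of a PDF page rasterized at `dpi`."""
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )


if __name__ == "__main__":
    for dpi in (150, 300):
        w, h = rendered_size(612, 792, dpi)  # US Letter
        mib = w * h * 3 / 2**20              # 8-bit RGB, uncompressed
        print(f"{dpi} DPI: {w}x{h} px, ~{mib:.1f} MiB")
    # 150 DPI: 1275x1650 px, ~6.0 MiB
    # 300 DPI: 2550x3300 px, ~24.1 MiB
```

Doubling the DPI quadruples the pixel count, so higher values can help with small print at the cost of memory and processing time per page; 300 DPI is a value often recommended for OCR.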

Made with Love 💚

We believe that document processing should be simple and accessible to everyone. By unifying multiple OCR engines under one interface, we’re giving developers the freedom to choose the best tool for their needs without rewriting code. Whether you’re building an invoice processor or analyzing historical documents, we’ve built this with care so you can focus on what matters most: your application.