Tesseract

What is Tesseract?

An open-source OCR engine, originally developed at HP and long sponsored by Google, with support for 100+ languages. Best suited to traditional OCR tasks that need broad language coverage.

Usage

from upsonic.ocr import OCR
from upsonic.ocr.layer_1.engines import TesseractOCREngine
# Also available: from upsonic.ocr import TesseractOCREngine

# Create engine instance
engine = TesseractOCREngine(languages=['eng'], enhance_contrast=True)

# Create OCR orchestrator
ocr = OCR(layer_1_ocr_engine=engine)

# Extract text
text = ocr.get_text('receipt.jpg')
print(text)

# Custom Tesseract configuration
engine_custom = TesseractOCREngine(languages=['eng'], psm=3, oem=3)
ocr_custom = OCR(layer_1_ocr_engine=engine_custom)
result = ocr_custom.process_file('document.pdf')
print(f"Text: {result.text}")

Parameters

Parameter	Type	Default	Description
`languages`	List[str]	`['eng']`	List of Tesseract language codes
`tesseract_cmd`	str	`None`	Path to tesseract executable
`confidence_threshold`	float	`0.0`	Minimum confidence for text blocks
`rotation_fix`	bool	`False`	Auto-detect and fix image rotation
`enhance_contrast`	bool	`False`	Enhance image contrast
`remove_noise`	bool	`False`	Apply noise reduction
`preserve_formatting`	bool	`True`	Preserve text layout and formatting
`psm`	int	`3`	Page segmentation mode (0-13)
`oem`	int	`3`	OCR Engine Mode (0-3)
`custom_config`	str	`''`	Additional Tesseract configuration string
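
This page does not show how `psm`, `oem`, and `custom_config` reach Tesseract. As a rough sketch, assuming they are combined into the standard Tesseract command-line flags (`--psm`, `--oem`) and that multiple language codes are joined with `+` as the Tesseract CLI expects, the mapping could look like the following. The helper names `build_tesseract_config` and `join_languages` are hypothetical and are not part of Upsonic.

```python
from typing import List


def build_tesseract_config(psm: int = 3, oem: int = 3, custom_config: str = "") -> str:
    """Hypothetical helper: build a Tesseract CLI config string.

    Assumes the documented ranges (psm 0-13, oem 0-3); this is an
    illustration, not Upsonic's actual implementation.
    """
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be in 0-13, got {psm}")
    if not 0 <= oem <= 3:
        raise ValueError(f"oem must be in 0-3, got {oem}")

    parts = [f"--oem {oem}", f"--psm {psm}"]
    extra = custom_config.strip()
    if extra:
        # Any extra options are appended verbatim after the mode flags.
        parts.append(extra)
    return " ".join(parts)


def join_languages(languages: List[str]) -> str:
    """Hypothetical helper: join language codes the way `tesseract -l` expects."""
    return "+".join(languages)


if __name__ == "__main__":
    print(build_tesseract_config())           # --oem 3 --psm 3
    print(join_languages(["eng", "deu"]))     # eng+deu
```

Validating the ranges up front gives a clear Python error instead of a failure inside the Tesseract process.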

Supported Languages

Tesseract supports 100+ languages, including all major ones. The trained data (language pack) for each language must be installed separately before its code can be used in `languages`.
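
To see which language packs are actually installed, the Tesseract CLI provides `tesseract --list-langs`. The sketch below parses that output and reports which requested codes are missing. The function names are hypothetical, and the header format it skips ("List of available languages ...") is an assumption based on typical Tesseract output, so adjust the parsing if your version prints something different.

```python
import shutil
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Skip the header line; every other non-empty line is treated as a code.
    return [
        line for line in lines
        if not line.lower().startswith("list of available languages")
    ]


def installed_languages(tesseract_cmd: str = "tesseract") -> List[str]:
    """Return installed language codes, or [] if Tesseract is not found."""
    if shutil.which(tesseract_cmd) is None:
        return []
    proc = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        capture_output=True, text=True, check=False,
    )
    # Assumption: some older releases may write the list to stderr instead.
    return parse_list_langs(proc.stdout or proc.stderr)


def missing_languages(wanted: List[str], available: List[str]) -> List[str]:
    """Return the requested codes that have no installed language pack."""
    return [code for code in wanted if code not in available]


if __name__ == "__main__":
    available = installed_languages()
    print("Installed:", available)
    print("Missing:", missing_languages(["eng", "deu"], available))
```

Running a check like this before constructing the engine makes a missing language pack visible early rather than at OCR time.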

Installation Note

Tesseract must be installed on the system:

Ubuntu/Debian: sudo apt-get install tesseract-ocr
macOS: brew install tesseract
Windows: Download installer from GitHub
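
After installing, it can help to confirm that the executable is reachable from Python. The following is a minimal sketch using only the standard library; `resolve_tesseract_cmd` is a hypothetical helper, and passing its result as `tesseract_cmd` is an assumption based on the parameter table above.

```python
import os
import shutil
from typing import Optional


def resolve_tesseract_cmd(explicit_path: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: return a usable path to tesseract, or None.

    If explicit_path is given, it is used only when it points to an
    executable file; otherwise the system PATH is searched.
    """
    if explicit_path is not None:
        if os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
            return explicit_path
        return None
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = resolve_tesseract_cmd()
    if path is None:
        print("Tesseract not found; install it or pass an explicit path.")
    else:
        # Assumption: this path could then be supplied to the engine, e.g.
        # TesseractOCREngine(languages=['eng'], tesseract_cmd=path)
        print(f"Tesseract found at: {path}")
```

An explicit path is mainly useful when the installer does not add Tesseract to `PATH`, which is common on Windows.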
