Attributes
The OCR system is configured through `OCRConfig`, which provides the following attributes:
| Attribute | Type | Default | Description |
|---|---|---|---|
| languages | List[str] | ['en'] | Languages to detect (e.g., ['en', 'zh', 'ja']) |
| confidence_threshold | float | 0.0 | Minimum confidence (0.0-1.0) required to accept an OCR result |
| rotation_fix | bool | False | Enable automatic rotation correction for skewed images |
| enhance_contrast | bool | False | Enhance image contrast before OCR processing |
| remove_noise | bool | False | Apply a noise reduction filter to improve text clarity |
| pdf_dpi | int | 300 | DPI used when rendering PDF pages (higher values give better quality but slower processing) |
| preserve_formatting | bool | True | Try to preserve text formatting (line breaks, spacing) |
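
The sketch below shows how these attributes and defaults might fit together. It is not the library's own class: `OCRConfigSketch` is a hypothetical stand-in built only from the table above, and the range checks in `__post_init__` are assumptions, since the source does not say whether the real `OCRConfig` validates its values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OCRConfigSketch:
    """Hypothetical stand-in mirroring the attribute table; not the real OCRConfig."""

    languages: List[str] = field(default_factory=lambda: ["en"])
    confidence_threshold: float = 0.0
    rotation_fix: bool = False
    enhance_contrast: bool = False
    remove_noise: bool = False
    pdf_dpi: int = 300
    preserve_formatting: bool = True

    def __post_init__(self) -> None:
        # Assumed validation: the table documents a 0.0-1.0 range for the threshold.
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be between 0.0 and 1.0")
        # Assumed validation: a rendering DPI only makes sense when positive.
        if self.pdf_dpi <= 0:
            raise ValueError("pdf_dpi must be a positive integer")


# Override only the attributes that differ from the defaults.
config = OCRConfigSketch(
    languages=["en", "ja"],
    confidence_threshold=0.6,
    rotation_fix=True,
    pdf_dpi=200,
)
print(config)
```

Using a default factory for `languages` keeps each configuration's list independent, so changing one instance does not affect another.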

