
OCR Model Benchmarks

Last updated: 4/16/2026


Reports

  • Hard English Benchmark — 17 models tested on 9 difficult English pages (tables, infographics, KPI tables, diagrams). Winner: LightOnOCR-2 (1B) with a score of 92.8/100 at 2.7 s/page and 2 GB VRAM.
  • General Model Comparison — 20+ models benchmarked on English and Arabic documents. Covers speed, quality, stability, Arabic support, VRAM usage, and vLLM compatibility.

Directory Structure

  • baselines/ — ground-truth files (9 pages), created by Claude via visual inspection of the pages rendered at 150 DPI
  • results/ — raw model outputs, organized by benchmark type (hard_english/, arabic_tables/, texts/)
  • benchmark_hard_english.py — main hard English benchmark script (vLLM-compatible models)
  • benchmark_got_ocr_hard.py — standalone GOT-OCR 2.0 benchmark (HuggingFace transformers)
  • benchmark_arabic_tables.py — Arabic table extraction benchmark
  • benchmark_table_comparison.py — table-specific comparison across models
  • benchmark_primary_layout.py — DotsOCR layout detection benchmark
  • benchmark_secondary_vlm.py — secondary VLM (Qwen) benchmark for Picture crops
  • benchmark_stack_comparison.py — end-to-end comparison of the full pipeline
  • run_hard_english_benchmark.sh — shell runner that benchmarks the models sequentially