
OCR Model Benchmarks

Last updated: 4/16/2026


Reports

  • Hard English Benchmark — 17 models tested on 9 difficult English pages (tables, infographics, KPI tables, diagrams). Winner: LightOnOCR-2 (1B) with a score of 92.8/100 at 2.7 s/page and 2 GB VRAM.
  • General Model Comparison — 20+ models benchmarked on English and Arabic documents. Covers speed, quality, stability, Arabic support, VRAM usage, and vLLM compatibility.

Directory Structure

  • baselines/ — ground-truth files (9 pages), created by Claude via visual inspection of the pages rendered at 150 DPI
  • results/ — raw model outputs, organized by benchmark type (hard_english/, arabic_tables/, texts/)
  • benchmark_hard_english.py — main hard English benchmark script (vLLM-compatible models)
  • benchmark_got_ocr_hard.py — standalone GOT-OCR 2.0 benchmark (HuggingFace transformers)
  • benchmark_arabic_tables.py — Arabic table extraction benchmark
  • benchmark_table_comparison.py — table-specific comparison across models
  • benchmark_primary_layout.py — DotsOCR layout detection benchmark
  • benchmark_secondary_vlm.py — secondary VLM (Qwen) benchmark for Picture crops
  • benchmark_stack_comparison.py — end-to-end comparison of the full pipeline
  • run_hard_english_benchmark.sh — shell runner that benchmarks the models sequentially