Triton Migration - Complete

Last updated: 4/16/2026

Files Migrated from paddle-ocr-azure-service to siraaj-dot-ocr-service

Dockerfiles

  • paddle/Dockerfile.triton - Triton Inference Server container

Triton Models

  • paddle/triton_models/det_onnx/ - Detection ONNX model
    • 1/model.onnx
    • config.pbtxt
  • paddle/triton_models/cls_onnx/ - Classification ONNX model
    • 1/model.onnx
    • config.pbtxt
  • paddle/triton_models/rec_onnx/ - Recognition ONNX model
    • 1/model.onnx
    • config.pbtxt
  • paddle/triton_models/ocr_pipeline/ - Business Logic Scripting (BLS) model
    • 1/model.py
    • config.pbtxt
    • arabic_dict_v1.txt
  • paddle/triton_models/router_onnx/ - SigLIP2 vision encoder ONNX model (exported at Docker build time via a multi-stage build)
    • 1/model.onnx + model.onnx.data (generated by scripts/export_router_onnx.py)
    • config.pbtxt
  • paddle/triton_models/router_pipeline/ - BLS pipeline for EN/AR language classification
    • 1/model.py
    • 1/text_embeddings.npy (pre-computed text embeddings, generated at build time)
    • config.pbtxt
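
Each model directory above follows Triton's standard repository layout: a numbered version folder holding the model artifact, plus a config.pbtxt describing inputs and outputs. As a minimal sketch, the config for one of the ONNX models might look like the following (tensor names, dtypes, and shapes are illustrative assumptions, not copied from the repo):

```
# Hypothetical sketch of paddle/triton_models/det_onnx/config.pbtxt.
# Tensor names, dtypes, and shapes are assumptions; the file in the
# repo is authoritative.
name: "det_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "x"              # assumed input tensor name
    data_type: TYPE_FP32
    dims: [ 3, -1, -1 ]    # CHW image with dynamic height/width
  }
]
output [
  {
    name: "sigmoid_0.tmp_0"  # assumed output tensor name
    data_type: TYPE_FP32
    dims: [ 1, -1, -1 ]
  }
]
```

The BLS models (ocr_pipeline and router_pipeline) declare backend: "python" instead of a platform, with the orchestration logic living in 1/model.py.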

Architecture

The OCR workflow communicates directly with Triton via gRPC (no intermediate proxy):

```
triton (Inference Server)
  ├── Port 8000: HTTP (health checks)
  ├── Port 8001: gRPC (inference, used by TritonClient)
  └── Port 8002: Metrics

ocr_workflow / temporal-worker
  └── TritonClient → gRPC → triton:8001
      ├── ocr_pipeline (PaddleOCR)
      └── router_pipeline (SigLIP2 language router)
```

Previously, a FastAPI proxy (triton-api, port 9192) translated OpenAI-compatible HTTP requests to Triton gRPC. This was removed in favor of direct gRPC via TritonClient (ocr_workflow/infrastructure/triton_client.py).
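
For illustration, a direct gRPC call with the official tritonclient package might look like the sketch below. The model name ocr_pipeline comes from this document; the tensor names (IMAGE, TEXT) are assumptions, since the real request contract is defined by the TritonClient wrapper and the BLS config.pbtxt:

```python
# Minimal sketch of a direct gRPC inference call to Triton.
# Input/output tensor names are assumptions for illustration;
# see ocr_workflow/infrastructure/triton_client.py for the real wrapper.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="triton:8001")
assert client.is_server_live()  # liveness check over the gRPC port

# Pass the raw image bytes as a 1-D UINT8 tensor (assumed input shape).
with open("page.png", "rb") as f:
    image_bytes = np.frombuffer(f.read(), dtype=np.uint8)

infer_input = grpcclient.InferInput("IMAGE", list(image_bytes.shape), "UINT8")
infer_input.set_data_from_numpy(image_bytes)

result = client.infer(
    model_name="ocr_pipeline",
    inputs=[infer_input],
    outputs=[grpcclient.InferRequestedOutput("TEXT")],  # assumed output name
    client_timeout=30.0,  # mirrors the TRITON_TIMEOUT default below
)
print(result.as_numpy("TEXT"))
```

The same connection serves both pipelines; calling router_pipeline instead only changes model_name.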

Configuration

  • TRITON_GRPC_URL - Triton gRPC endpoint (default: triton:8001)
  • TRITON_TIMEOUT - Request timeout in seconds (default: 30)
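
A minimal sketch of how these variables might be read (the defaults mirror those above; the actual parsing in ocr_workflow may differ):

```python
# Sketch: read Triton connection settings from the environment,
# falling back to the documented defaults.
import os

TRITON_GRPC_URL = os.environ.get("TRITON_GRPC_URL", "triton:8001")
TRITON_TIMEOUT = float(os.environ.get("TRITON_TIMEOUT", "30"))  # seconds
```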