# Triton Migration - Complete
Last updated: 4/16/2026
## Files Migrated from `paddle-ocr-azure-service`
### Dockerfiles
- `paddle/Dockerfile.triton` - Triton Inference Server container
### Triton Models
- `paddle/triton_models/det_onnx/` - Detection ONNX model
  - `1/model.onnx`
  - `config.pbtxt`
- `paddle/triton_models/cls_onnx/` - Classification ONNX model
  - `1/model.onnx`
  - `config.pbtxt`
- `paddle/triton_models/rec_onnx/` - Recognition ONNX model
  - `1/model.onnx`
  - `config.pbtxt`
- `paddle/triton_models/ocr_pipeline/` - Business Logic Scripting (BLS) model
  - `1/model.py`
  - `config.pbtxt`
  - `arabic_dict_v1.txt`
- `paddle/triton_models/router_onnx/` - SigLIP2 vision encoder ONNX (exported at Docker build time via multi-stage build)
  - `1/model.onnx` + `model.onnx.data` (generated by `scripts/export_router_onnx.py`)
  - `config.pbtxt`
- `paddle/triton_models/router_pipeline/` - BLS pipeline for EN/AR language classification
  - `1/model.py`
  - `1/text_embeddings.npy` (pre-computed text embeddings, generated at build time)
  - `config.pbtxt`
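For reference, each Triton model directory pairs a versioned model file with a `config.pbtxt`. A hypothetical sketch of what a detection model's config might look like (tensor names, dims, and batch size here are illustrative, not taken from the repo):

```
name: "det_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "x"                # illustrative input tensor name
    data_type: TYPE_FP32
    dims: [ 3, -1, -1 ]      # CHW, dynamic height/width
  }
]
output [
  {
    name: "det_map"          # illustrative output tensor name
    data_type: TYPE_FP32
    dims: [ 1, -1, -1 ]
  }
]
```

The BLS models (`ocr_pipeline`, `router_pipeline`) instead declare `backend: "python"` and ship their logic in `1/model.py`.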
## Architecture
The OCR workflow communicates directly with Triton via gRPC (no intermediate proxy):
```
triton (Inference Server)
├── Port 8000: HTTP (health checks)
├── Port 8001: gRPC (inference — used by TritonClient)
└── Port 8002: Metrics

ocr_workflow / temporal-worker
└── TritonClient → gRPC → triton:8001
    ├── ocr_pipeline (PaddleOCR)
    └── router_pipeline (SigLIP2 language router)
```
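Port 8000 serves Triton's standard KServe HTTP endpoints, including `GET /v2/health/ready`. A minimal stdlib sketch of the kind of readiness probe a compose healthcheck or worker startup might perform (host/port defaults follow the diagram; the function name is illustrative):

```python
from urllib.request import urlopen
from urllib.error import URLError

def triton_ready(host: str = "triton", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if Triton reports ready on its KServe HTTP health endpoint."""
    url = f"http://{host}:{port}/v2/health/ready"
    try:
        with urlopen(url, timeout=timeout) as resp:
            # Triton answers 200 with an empty body when all models are ready.
            return resp.status == 200
    except (URLError, OSError):
        # Server unreachable or not yet listening.
        return False
```

The gRPC port (8001) is what `TritonClient` uses for inference; the HTTP port is only probed for liveness/readiness.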
Previously, a FastAPI proxy (`triton-api`, port 9192) translated OpenAI-compatible HTTP requests
to Triton gRPC. This was removed in favor of direct gRPC via `TritonClient`
(`ocr_workflow/infrastructure/triton_client.py`).
## Configuration
- `TRITON_GRPC_URL` - Triton gRPC endpoint (default: `triton:8001`)
- `TRITON_TIMEOUT` - Request timeout in seconds (default: 30)
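The two settings above can be read from the environment with their documented defaults; a small sketch (the helper name is illustrative, not from `triton_client.py`):

```python
import os

def load_triton_config(env=os.environ) -> dict:
    """Resolve Triton connection settings, falling back to documented defaults."""
    return {
        "url": env.get("TRITON_GRPC_URL", "triton:8001"),
        # TRITON_TIMEOUT arrives as a string; normalize to seconds as a float.
        "timeout": float(env.get("TRITON_TIMEOUT", "30")),
    }
```

Passing `env` explicitly keeps the helper trivial to unit-test without mutating `os.environ`.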