P&ID Digitization Project
Overview
This project aims to automatically analyze Piping and Instrumentation Diagrams (P&IDs) to extract key information, including symbols, text labels, and eventually the connections between them. The goal is to convert the visual information present in a P&ID image into a structured digital format, enabling further analysis, database population, or integration with other systems.
Workflow
The current workflow focuses primarily on detecting and recognizing symbols and associating them with their text labels. The main stages are:
- Input: The system takes a P&ID image file as input. It also utilizes corresponding Optical Character Recognition (OCR) data to identify text elements present in the diagram.
- Symbol Detection:
  - Preprocessing: The input image is processed (grayscale, blur, thresholding, morphological operations) to create a clean binary representation, making symbols easier to isolate.
  - Contour Analysis: Potential symbol shapes (contours) are identified in the processed image.
  - Filtering: These contours are filtered based on size and aspect ratio to remove noise and non-symbol elements (like straight lines). Contours that heavily overlap with OCR text are also filtered out.
  - Bounding Box Generation: Bounding boxes are created around the filtered contours, and a fixed padding is added to ensure better coverage of the symbol area.
  - Merging: Overlapping or very close padded boxes are merged to consolidate multiple detected parts of a single symbol.
  - Text Association (Detection Stage): OCR text words are grouped into logical phrases (handling horizontal and vertical arrangements). These phrases are then associated with the detected symbol boxes based primarily on whether the text center falls inside the symbol box, with a fallback to proximity for nearby external labels.
  - Output: A list of detected potential symbols, each with a bounding box and its associated text phrase.
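The box-level steps above (filtering, padding, merging, and text association) can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the `(x1, y1, x2, y2)` box format, and every threshold (`min_side`, `max_side`, `max_aspect`, `pad`, `gap`, `max_dist`) are assumptions chosen for the example. It also assumes contour bounding boxes and grouped OCR phrases have already been produced by earlier steps.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), assumed format


def keep_contour_box(box: Box, min_side: int = 10, max_side: int = 200,
                     max_aspect: float = 4.0) -> bool:
    """Reject boxes that are too small, too large, or too elongated (e.g. lines)."""
    w, h = box[2] - box[0], box[3] - box[1]
    if min(w, h) < min_side or max(w, h) > max_side:
        return False
    return max(w, h) / min(w, h) <= max_aspect


def pad_box(box: Box, pad: int, width: int, height: int) -> Box:
    """Grow a box by a fixed padding, clamped to the image bounds."""
    return (max(0, box[0] - pad), max(0, box[1] - pad),
            min(width, box[2] + pad), min(height, box[3] + pad))


def boxes_touch(a: Box, b: Box, gap: int = 0) -> bool:
    """True if the boxes overlap or lie within `gap` pixels of each other."""
    return (a[0] - gap <= b[2] and b[0] - gap <= a[2] and
            a[1] - gap <= b[3] and b[1] - gap <= a[3])


def merge_boxes(boxes: List[Box], gap: int = 0) -> List[Box]:
    """Repeatedly union touching boxes until no further merge is possible."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        out: List[Box] = []
        for box in merged:
            for i, other in enumerate(out):
                if boxes_touch(box, other, gap):
                    out[i] = (min(box[0], other[0]), min(box[1], other[1]),
                              max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
            else:
                out.append(box)
        merged = out
    return merged


def associate_text(symbol: Box, phrases: List[Tuple[str, Box]],
                   max_dist: float = 40.0) -> Optional[str]:
    """Prefer a phrase whose center lies inside the symbol; else the nearest one."""
    best_text, best_dist = None, max_dist
    for text, tb in phrases:
        cx, cy = (tb[0] + tb[2]) / 2, (tb[1] + tb[3]) / 2
        if symbol[0] <= cx <= symbol[2] and symbol[1] <= cy <= symbol[3]:
            return text
        # Distance from the text center to the nearest edge of the symbol box.
        dx = max(symbol[0] - cx, 0, cx - symbol[2])
        dy = max(symbol[1] - cy, 0, cy - symbol[3])
        dist = (dx * dx + dy * dy) ** 0.5
        if dist <= best_dist:
            best_text, best_dist = text, dist
    return best_text


# Hypothetical input: two fragments of one symbol plus a long thin line.
raw = [(100, 100, 130, 130), (128, 105, 150, 128), (300, 50, 302, 400)]
kept = [b for b in raw if keep_contour_box(b)]
symbols = merge_boxes([pad_box(b, 5, 1000, 1000) for b in kept])
phrases = [("FV-101", (110, 110, 140, 120)), ("TIC", (400, 400, 420, 410))]
labels = [associate_text(s, phrases) for s in symbols]
```

In this example the thin line is dropped by the aspect and size filter, the two padded fragments merge into a single box, and the phrase whose center falls inside that box becomes its label.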
- Symbol Recognition (Hybrid Approach):
  - Candidate Retrieval: For each detected symbol (represented by its cropped image), an image embedding model (e.g., ResNet, ViT, CLIP - experiments ongoing) generates a feature vector. This vector is compared against a pre-computed library of embeddings from known template symbols using cosine similarity to find the Top-N most visually similar candidates.
  - VLM Refinement (Optional): The cropped image of the detected symbol, along with the names of the Top-N candidates, is sent to a powerful Visual Language Model (VLM) via an API (e.g., OpenAI's GPT-4 Turbo). The VLM is prompted to choose the best classification from the provided candidate list based on the visual evidence.
  - Final Classification: If VLM refinement is enabled and successful, the VLM's choice is used as the final symbol class. Otherwise, the system falls back to the top match from the similarity search, provided its score meets a minimum confidence threshold.
  - Output: The final list of symbols, now augmented with the recognized symbol class (e.g., "Gate Valve", "TIC Instrument") and the confidence score (from the similarity search).
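The retrieval and fallback logic described above can be illustrated with a small, dependency-free sketch. The embeddings here are tiny hand-written vectors standing in for real model outputs, and the function names, the `min_score` value of 0.7, and the rule that a VLM answer outside the candidate list counts as unsuccessful are all assumptions made for this example. The VLM call itself is not shown; its result is passed in as `vlm_choice`.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_candidates(query: Sequence[float],
                     library: Dict[str, Sequence[float]],
                     n: int = 3) -> List[Tuple[str, float]]:
    """Rank template symbols by similarity to the query embedding."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


def final_class(candidates: List[Tuple[str, float]],
                vlm_choice: Optional[str] = None,
                min_score: float = 0.7) -> Optional[str]:
    """Use the VLM's pick if it is one of the candidates; otherwise fall back
    to the top similarity match when it clears the confidence threshold."""
    names = [name for name, _ in candidates]
    if vlm_choice in names:
        return vlm_choice
    if candidates and candidates[0][1] >= min_score:
        return candidates[0][0]
    return None


# Hypothetical 3-dimensional template embeddings (real ones are much larger).
library = {
    "Gate Valve": [1.0, 0.0, 0.0],
    "Globe Valve": [0.8, 0.6, 0.0],
    "Pump": [0.0, 0.0, 1.0],
}
candidates = top_n_candidates([0.9, 0.1, 0.0], library, n=2)
without_vlm = final_class(candidates)
with_vlm = final_class(candidates, vlm_choice="Globe Valve")
```

Restricting the VLM to the retrieved names keeps its answer within the template library, which is why the sketch ignores any choice that is not in `candidates`.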
- Line Detection & Connectivity Analysis (Future Work):
  - Although not yet fully implemented in the current scripts, a complete P&ID digitization system also needs to detect the pipelines (process lines, instrument lines, etc.) that connect the symbols.
  - This would likely use a Hough Transform for line detection, followed by analysis to determine which symbols are connected by which lines, producing a graph representation of the process flow.
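Since this stage is not implemented yet, the following is only a sketch of how the connectivity step might look once line segments are available (for example from a probabilistic Hough transform such as OpenCV's `cv2.HoughLinesP`). The function names, the `tol` tolerance, and the simplifying assumption that each pipe is a single straight segment whose endpoints land near the symbols it joins are all hypothetical. Real diagrams would also need segments to be chained across bends and junctions.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[int, int, int, int]          # (x1, y1, x2, y2), assumed format
Point = Tuple[float, float]
Segment = Tuple[Point, Point]            # (start, end)


def point_near_box(pt: Point, box: Box, tol: float) -> bool:
    """True if the point lies inside the box expanded by `tol` pixels."""
    x, y = pt
    return (box[0] - tol <= x <= box[2] + tol and
            box[1] - tol <= y <= box[3] + tol)


def build_connectivity(symbols: Dict[str, Box], segments: List[Segment],
                       tol: float = 5.0) -> Dict[str, Set[str]]:
    """Build an undirected graph linking symbols joined by a segment."""
    graph: Dict[str, Set[str]] = {name: set() for name in symbols}
    for start, end in segments:
        at_start = [n for n, b in symbols.items() if point_near_box(start, b, tol)]
        at_end = [n for n, b in symbols.items() if point_near_box(end, b, tol)]
        for u in at_start:
            for v in at_end:
                if u != v:
                    graph[u].add(v)
                    graph[v].add(u)
    return graph


# Hypothetical symbols and segments: one pipe joins V-1 and P-1,
# a second line leaves V-1 but ends in open space.
symbols = {"V-1": (0, 0, 20, 20), "P-1": (100, 0, 120, 20), "T-1": (0, 100, 20, 120)}
segments = [((22.0, 10.0), (98.0, 10.0)), ((10.0, 22.0), (10.0, 60.0))]
graph = build_connectivity(symbols, segments)
```

An adjacency map like `graph` is one simple form of the graph representation mentioned above; dangling segments such as the second one are ignored here rather than reported.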
Current Status & Next Steps
- The symbol detection and text association components are functional but require ongoing parameter tuning for optimal bounding box accuracy and text linking across different P&ID styles.
- Symbol recognition using similarity search with general pre-trained models shows limitations. The hybrid approach using VLM refinement offers better potential but relies on external APIs.
- Further refinement of the template library and potentially fine-tuning an embedding model specifically on P&ID symbols could improve the similarity search accuracy.
- Implementing robust line detection and connectivity analysis is the next major phase to achieve full digitization.