Iqra
Overview
Iqra is a data extraction tool. Given a file, the tool runs OCR on it and classifies the extracted text and images into bounding boxes. The following image shows the high-level flow:

The following is the database diagram:

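The README does not specify how a classified bounding box is represented. The following is a minimal Python sketch of one possible record, assuming a hypothetical BoundingBox type with pixel coordinates, a class label, and OCR text. None of these field names come from the project.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class BoundingBox:
    """One classified region of a page image (hypothetical schema)."""
    page: int       # page number within the uploaded document
    x: int          # left edge, in pixels
    y: int          # top edge, in pixels
    width: int
    height: int
    label: str      # class assigned by the classifier, e.g. "text" or "image"
    text: str = ""  # OCR output; empty for non-text regions

    def contains(self, px: int, py: int) -> bool:
        """Return True if pixel (px, py) falls inside this box."""
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


def boxes_to_json(boxes: List[BoundingBox]) -> str:
    """Serialize boxes to a JSON array of plain objects."""
    return json.dumps([asdict(box) for box in boxes])
```

For example, BoundingBox(page=1, x=10, y=20, width=100, height=30, label="text", text="Invoice").contains(50, 25) is True. Plain dataclasses keep the record easy to serialize to JSON for sending back to the Go server.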
Development
1. Overview
The server is built with Go, and the frontend uses React. The OCR and classification parts are written mainly in Python. Hence, the docker-compose file contains specifications for the Go server, the Python container, and other dependencies (such as MinIO for file storage).
2. App Components
The app consists of the following components:
a. Go Server
The Go Server implements the backend service. We use Go for its fast runtime, easy-to-read syntax, and static type system.
b. MinIO
MinIO is a file/blob/object store. It stores all uploaded files (the original documents and the generated images), as well as models saved as .pkl files.
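Storing models "as pkl files" implies pickling them to bytes before upload and unpickling after download. A minimal sketch of that round trip, assuming standard pickle serialization and a hypothetical object naming scheme (models/name/vN.pkl) that the README does not specify:

```python
import pickle


def model_object_name(model_name: str, version: int) -> str:
    """Hypothetical naming scheme for model objects in the MinIO bucket."""
    return f"models/{model_name}/v{version}.pkl"


def model_to_bytes(model) -> bytes:
    """Serialize a trained model to the bytes of a .pkl file."""
    return pickle.dumps(model)


def model_from_bytes(data: bytes):
    """Rebuild a model from .pkl bytes downloaded from MinIO.

    Only unpickle objects from a trusted source: loading a pickle can run arbitrary code.
    """
    return pickle.loads(data)
```

With the MinIO Python SDK, the upload would look roughly like client.put_object(bucket, model_object_name("classifier", 3), io.BytesIO(data), length=len(data)), and the download like client.get_object(bucket, name).read(). The bucket name is an assumption.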
c. Python container
Currently, the Python container serves two purposes:
- OCR and classify images
- Retrain models/classifiers
The Python code resides in the ./file-processor directory.
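Because one container handles both jobs, incoming work has to be routed to the right handler. The following is a minimal sketch of that routing, assuming JSON messages with a hypothetical "task" field. The handler names and message fields are illustrative, not the file-processor's actual code.

```python
import json
from typing import Callable, Dict


def ocr_and_classify(body: dict) -> dict:
    # Placeholder: real code would fetch the image, run OCR, and classify regions.
    return {"status": "processed", "image_key": body["image_key"]}


def retrain(body: dict) -> dict:
    # Placeholder: real code would retrain the classifier and upload the new .pkl.
    return {"status": "retrained", "model": body["model"]}


# Maps the (hypothetical) "task" field of a message to its handler.
HANDLERS: Dict[str, Callable[[dict], dict]] = {
    "process": ocr_and_classify,
    "retrain": retrain,
}


def handle_message(raw: bytes) -> dict:
    """Decode a JSON message and dispatch it to the handler named by its "task" field."""
    body = json.loads(raw)
    handler = HANDLERS.get(body.get("task"))
    if handler is None:
        raise ValueError(f"unknown task: {body.get('task')!r}")
    return handler(body)
```

A lookup table keeps adding a third purpose to a one-line change instead of another branch.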
3. Component communication
The Go Server and the Python container need to communicate. The usual approach is to expose an API on one side and have the other side call it, and that would work here too.
However, because OCR and classification of an image can take a long time, this work should be done
asynchronously. Therefore, we use RabbitMQ, a message broker:
- The Go server splits the uploaded document into images.
- The Go server stores the document and each of its images in MinIO.
- The Go server sends the images through RabbitMQ to be processed by the Python file-processor.
- The file-processor consumes the message and, on completion, calls either a Success URI or an Error URI, depending on the outcome.
- These URIs point back to the Go server, which updates the database and notifies the user.
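The success/error callback step can be sketched as follows. This assumes each message carries hypothetical image_key, success_uri, and error_uri fields and that the Go server accepts a JSON POST; the README does not document the actual message or callback format.

```python
import json
import urllib.request
from typing import Callable, Optional


def build_callback(message: dict, result: Optional[dict] = None,
                   error: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to the message's Success URI, or to its Error URI if error is set."""
    if error is None:
        url = message["success_uri"]
        payload = {"image_key": message["image_key"], "result": result}
    else:
        url = message["error_uri"]
        payload = {"image_key": message["image_key"], "error": error}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process(message: dict, work: Callable[[dict], dict]) -> urllib.request.Request:
    """Run the OCR/classification step and return the matching callback request."""
    try:
        result = work(message)
    except Exception as exc:  # any failure is reported through the Error URI
        return build_callback(message, error=str(exc))
    return build_callback(message, result=result)
```

The returned request would be sent with urllib.request.urlopen(req, timeout=30). Building the request separately from sending it keeps the success/error decision testable without a running Go server.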
4. Running the project
To run the project in development mode, follow these steps:
- To build: run make build-dev from the repository's root. This builds the images and starts the Python container and the other dependencies. Alternatively, to only run the containers without rebuilding, use docker-compose up -d.
- Go to http://localhost:3000
Note:
- The file_processor sometimes starts before RabbitMQ is ready, so you might need to restart it.
- We use Keycloak hosted on portainer.rihal.dev. If that server is ever down, the project won't run.
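The file_processor startup race could be avoided by retrying the broker connection instead of restarting the container by hand. The following is a minimal sketch of retry with exponential backoff. The connect callable and the exception types are parameters because the README does not say which RabbitMQ client the file-processor uses.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
    raise ValueError("attempts must be at least 1")
```

If the client is pika, a call might look like connect_with_retry(lambda: pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq")), retry_on=(pika.exceptions.AMQPConnectionError,)), where the host name is an assumed compose service name. A compose-level alternative is a RabbitMQ healthcheck combined with depends_on and condition: service_healthy.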
5. Sample users
These are the sample users that you can use to authenticate via Keycloak in development:
- Admin:
  - username: test_admin
  - password: test
- User:
  - username: test_user
  - password: test
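To call the API with one of these users outside the browser, a token can be requested from Keycloak's OpenID Connect token endpoint with the password grant. The following is a minimal sketch. The base URL, realm, and client ID are placeholders because the README does not give them. The client must also have direct access grants enabled. Older Keycloak versions prefix the path with /auth.

```python
import urllib.parse
import urllib.request


def token_request(base_url: str, realm: str, client_id: str,
                  username: str, password: str) -> urllib.request.Request:
    """Build an OAuth2 password-grant request for Keycloak's token endpoint."""
    url = (f"{base_url.rstrip('/')}/realms/{urllib.parse.quote(realm)}"
           "/protocol/openid-connect/token")
    form = urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending it with json.load(urllib.request.urlopen(req)) should return a JSON body whose access_token can be passed as a Bearer token to the Go server.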
6. API docs
The API docs are kept as a Postman collection in Iqra.postman_collection.json. Whenever a new API is added, please update
the collection.
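To check whether an endpoint is already documented, the collection can be listed from the command line. This sketch assumes the file follows the Postman Collection v2.x layout, where folders and requests both appear under nested "item" arrays.

```python
import json
from typing import List


def list_requests(collection: dict, prefix: str = "") -> List[str]:
    """Walk a Postman collection and return request names as "Folder/Request" paths."""
    names = []
    for item in collection.get("item", []):
        path = f"{prefix}{item['name']}"
        if "item" in item:  # folders hold nested items
            names.extend(list_requests(item, prefix=path + "/"))
        else:
            names.append(path)
    return names
```

For example, print("\n".join(list_requests(json.load(open("Iqra.postman_collection.json"))))) prints one line per documented request.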