Iqra

Overview

Iqra is a data extraction tool. Given a file, it runs OCR on the contents and classifies the text and images it finds into bounding boxes. The following image shows the high-level flow:

iqra flow

The following is the database diagram:

DB diagram

Development

1. Overview

The server is built using Go. The frontend uses React. The OCR and classification parts are mainly in Python. Hence, the docker-compose file contains specifications for the Go server, the Python container, and other dependencies (such as MinIO for file storage).

2. App Components

The app is composed of the following components:

a. Go Server

The Go Server implements the backend service. We use Go for its fast runtime, easy-to-read syntax, and type-checking system.

b. MinIO

MinIO is a file/blob/object store. It is used for storing all uploaded files (the originals and the generated images), as well as for storing models as pkl files.
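As a rough illustration of the "models as pkl files" part, the sketch below serializes an object to an in-memory pkl stream and restores it. It uses only the standard library; the helper names and the toy model are hypothetical, and the actual upload to MinIO is left out because the README does not describe it. Object-store clients commonly accept a readable stream plus its length, which is why both are returned.

```python
import io
import pickle


def model_to_pkl_stream(model):
    """Serialize a model to an in-memory pkl stream; return (stream, size in bytes)."""
    payload = pickle.dumps(model)
    return io.BytesIO(payload), len(payload)


def model_from_pkl_bytes(data):
    """Rebuild a model from pkl bytes. Only unpickle data from a trusted source."""
    return pickle.loads(data)


# Stand-in "model" (hypothetical): any picklable object behaves the same way.
toy_model = {"labels": ["invoice", "receipt"], "weights": [0.25, 0.75]}

stream, size = model_to_pkl_stream(toy_model)
restored = model_from_pkl_bytes(stream.getvalue())
```

Because pickle can execute arbitrary code while loading, the bucket that holds the pkl files should only be writable by trusted services.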

c. Python container

Currently, the Python container serves two purposes:

  1. OCR and classify images
  2. Retrain models/classifiers

The Python code resides in the ./file-processor directory.

3. Components communication

The Go Server and the Python container need to communicate. The usual approach is for one side to expose an API that the other side calls, and that approach is valid here.

However, OCR and classification of an image take a long time, so this work should be done asynchronously. Therefore, we use RabbitMQ, a message broker:

  1. The Go server splits the uploaded document into images.
  2. The Go server stores each image and the document in MinIO.
  3. The Go server sends the images through RabbitMQ to be processed by the Python file-processor.
  4. The file-processor consumes the message and, after completion, calls either a Success URI or an Error URI, depending on the outcome.
  5. These URIs refer back to the Go server, which handles updating the database and notifying the user.
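The consumer side of the steps above can be sketched as follows. This is a minimal, standard-library-only illustration, not the project's actual code: the message field names (image_key, success_uri, error_uri), the example URIs, and the process/post callables are all assumptions. In the real setup, the message would arrive from RabbitMQ and post would be an HTTP call to the Go server.

```python
import json


def build_message(image_key, success_uri, error_uri):
    """Encode one image job as JSON bytes (field names are assumptions)."""
    return json.dumps({
        "image_key": image_key,
        "success_uri": success_uri,
        "error_uri": error_uri,
    }).encode("utf-8")


def handle_message(body, process, post):
    """Process one message and report the outcome to the matching callback URI.

    process(image_key) returns a result dict and raises on failure.
    post(uri, payload) delivers the payload to the Go server.
    Returns the URI that was called.
    """
    job = json.loads(body)
    try:
        result = process(job["image_key"])
    except Exception as exc:
        # Any failure during OCR/classification is reported to the Error URI.
        post(job["error_uri"], {"image_key": job["image_key"], "error": str(exc)})
        return job["error_uri"]
    post(job["success_uri"], {"image_key": job["image_key"], "result": result})
    return job["success_uri"]


# Demo with in-memory stand-ins instead of RabbitMQ and HTTP.
calls = []


def record(uri, payload):
    calls.append((uri, payload))


def fake_ocr(image_key):
    if image_key.endswith(".bad"):
        raise ValueError("unreadable image")
    return {"boxes": []}


SUCCESS = "http://go-server.example/success"  # hypothetical URI
ERROR = "http://go-server.example/error"      # hypothetical URI

ok_uri = handle_message(build_message("doc1/page-1.png", SUCCESS, ERROR), fake_ocr, record)
bad_uri = handle_message(build_message("doc1/page-2.bad", SUCCESS, ERROR), fake_ocr, record)
```

Carrying both URIs inside each message keeps the file-processor unaware of the Go server's routes: it only needs to call whichever URI matches the outcome.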

4. Running the project

To run the project in development mode, follow these steps:

  1. To build: run make build-dev from the repository's root to build the images and start the Python container and the other dependencies. Alternatively, to only run the containers, use docker-compose up -d.
  2. Go to http://localhost:3000

Note:

  • The file_processor sometimes starts before RabbitMQ is ready, so you might need to restart it.
  • We use Keycloak hosted on portainer.rihal.dev. If that server is down, the project won't run.
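One possible way to avoid the manual restart is to have the file processor wait until the broker's port accepts connections before it starts consuming. The helper below is a hedged sketch using only the standard library; the function name and its defaults are assumptions, and it is not part of the current code.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("rabbitmq", 5672) could be called at startup, where "rabbitmq" is an assumed service name and 5672 is RabbitMQ's default AMQP port. An open port does not guarantee the broker is fully initialized, so retrying the first connection attempt is still worthwhile.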

5. Sample users

These are the sample users that you can use to authenticate via Keycloak in development:

  1. Admin:

    • username: test_admin
    • password: test
  2. User:

    • username: test_user
    • password: test

6. API docs

We use Postman for the API docs. Whenever a new API is added, please update the collection here: Iqra.postman_collection.json