Enterprise Workflow Intelligence Platform (EWIP) is an end-to-end AI-powered document processing and workflow automation platform capable of:
- Processing PDFs, scanned documents, and images
- Performing OCR and text extraction
- Understanding document types
- Extracting critical information
- Validating business rules
- Automatically routing documents to departments
- Generating operational analytics
- Providing an interactive Streamlit dashboard
- Maintaining reproducible pipelines using DVC
This project simulates the kind of enterprise document processing solution used by consulting firms, banks, insurance companies, BPOs, and other large enterprises.
Organizations process thousands of documents daily: invoices, purchase orders, contracts, resumes, reports, forms, and compliance documents. Processing them manually leads to high operational costs, human error, slow turnaround times, compliance risks, and routing mistakes. This platform automates document understanding and workflow execution.
For a deep dive into how each part of the architecture works, see the dedicated documentation in each core module:
- 🧠 src/README.md: Core intelligence pipeline (Ingestion, OCR, Classification, Extraction, Validation).
- 📊 dashboard/README.md: The Streamlit front-end architecture.
- 📓 notebooks/README.md: ML Benchmarking and Explainable AI (LIME/SHAP).
- 💾 data/README.md: MLOps data version control strategy.
- 🤖 models/README.md: Hugging Face DistilBERT weights management.
graph TD
A[Raw Document: PDF/Image] --> B(Ingestion Layer)
B --> C{OCR Engine}
C -->|PaddleOCR/EasyOCR| D(Raw Text)
D --> E{Classifier}
E -->|DistilBERT / TF-IDF| F[Class: Invoice/Resume/Contract]
F --> G(Information Extraction)
G -->|Regex/NER| H[Extracted Fields]
H --> I{Validation Engine}
I -->|Rule Checks| J[Completeness Score]
J --> K(Workflow Router)
K --> L[Assigned Department & Priority]
L --> M[(SQLite DB)]
M --> N[Streamlit Dashboard]
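To make the flow in the graph concrete, here is a minimal, standard-library-only sketch of the stages from classification through routing. It is not the code in `src/`: the keyword lists stand in for the DistilBERT / TF-IDF classifier, and the regex patterns, routing table, 0.5 completeness threshold, and every function name are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword lists standing in for the DistilBERT / TF-IDF classifier.
KEYWORDS = {
    "Invoice": ("invoice", "amount due", "bill to"),
    "Resume": ("experience", "education", "skills"),
    "Contract": ("agreement", "party", "hereby"),
}

# Hypothetical regex patterns per class (the "Regex" half of Regex/NER).
FIELD_PATTERNS = {
    "Invoice": {
        "invoice_number": r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)",
        "total": r"Total\s*:?\s*\$?([\d,]+\.\d{2})",
        "due_date": r"Due\s*Date\s*:?\s*(\d{4}-\d{2}-\d{2})",
    },
}

# Hypothetical routing table: document class -> department.
DEPARTMENTS = {"Invoice": "Finance", "Resume": "HR", "Contract": "Legal"}


@dataclass
class Result:
    doc_class: str
    fields: dict
    completeness: float
    department: str
    priority: str


def classify(text: str) -> str:
    """Pick the class whose keywords occur most often; 'Unknown' if none match."""
    lowered = text.lower()
    scores = {c: sum(lowered.count(k) for k in kws) for c, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"


def extract(text: str, doc_class: str) -> dict:
    """Run each regex defined for the class; fields that are not found map to None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.get(doc_class, {}).items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        fields[name] = match.group(1) if match else None
    return fields


def completeness(fields: dict) -> float:
    """Fraction of expected fields that were found (1.0 if none are expected)."""
    if not fields:
        return 1.0
    return sum(value is not None for value in fields.values()) / len(fields)


def process(text: str) -> Result:
    """Classify, extract, validate, and route one document's OCR text."""
    doc_class = classify(text)
    fields = extract(text, doc_class)
    score = completeness(fields)
    # Unrecognised or incomplete documents go to a human (threshold is an assumption).
    if doc_class == "Unknown" or score < 0.5:
        return Result(doc_class, fields, score, "Manual Review", "high")
    return Result(doc_class, fields, score, DEPARTMENTS[doc_class], "normal")


sample = (
    "INVOICE\nInvoice No: INV-2024-001\nBill To: Acme Corp\n"
    "Total: $1,250.00\nDue Date: 2024-07-01"
)
result = process(sample)
print(result.department, result.completeness)  # prints: Finance 1.0
```

Keeping each stage as a plain function mirrors the graph: any one of them (for example `classify`) could be swapped for a model-backed version without touching the others.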
1. Create Environment & Install Dependencies:
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt # (Using previously executed pip commands)
2. Download Datasets (DVC):
dvc repro ingest
3. Run Streamlit Dashboard:
python -m streamlit run dashboard/app.py
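The dashboard can only show what the pipeline has written to the SQLite database. As a rough sketch of that hand-off, the snippet below stores routed results and computes a per-department summary of the kind an analytics view might display. The `documents` table, its columns, and the function names are assumptions for illustration, not the project's actual schema.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) results table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               filename TEXT NOT NULL,
               doc_class TEXT NOT NULL,
               department TEXT NOT NULL,
               priority TEXT NOT NULL,
               completeness REAL NOT NULL
           )"""
    )


def save_result(conn, filename, doc_class, department, priority, completeness):
    """Insert one routed document; parameters are bound, never string-formatted."""
    conn.execute(
        "INSERT INTO documents "
        "(filename, doc_class, department, priority, completeness) "
        "VALUES (?, ?, ?, ?, ?)",
        (filename, doc_class, department, priority, completeness),
    )
    conn.commit()


def department_summary(conn):
    """Return (department, document count, mean completeness) rows, sorted by name."""
    rows = conn.execute(
        "SELECT department, COUNT(*), AVG(completeness) "
        "FROM documents GROUP BY department ORDER BY department"
    ).fetchall()
    return [(dept, count, round(avg, 2)) for dept, count, avg in rows]


# In-memory database for demonstration; a real run would point at a file on disk.
conn = sqlite3.connect(":memory:")
init_db(conn)
save_result(conn, "a.pdf", "Invoice", "Finance", "normal", 1.0)
save_result(conn, "b.pdf", "Invoice", "Finance", "high", 0.5)
save_result(conn, "c.pdf", "Resume", "HR", "normal", 1.0)
print(department_summary(conn))
```

In a Streamlit app, the list returned by `department_summary` could be passed to `st.dataframe` to render it as a table.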