Self-hosted, multi-user Retrieval-Augmented Generation service for personal and company knowledge bases.
A production-ready Retrieval-Augmented Generation (RAG) service for enterprise knowledge bases. Upload PDFs, index them with modern embeddings, and query them through a conversational interface powered by multiple LLM providers. Includes a full REST API for seamless integration with enterprise services, mobile apps, and third-party platforms.
RAGuardian lets one deployment serve many people: each local user gets an isolated RAG workspace with separate uploads, file index, data sources, conversations, API keys, and Chroma collection. Admins keep provider configuration, model policy, and user management centralized.
Enterprise Compliance: When using Regolo.ai as the LLM provider, inference runs on infrastructure located in the European Union with zero data retention. Customer prompts and outputs are handled in memory only and are not persisted. Regolo does not use customer request content to train or fine-tune shared models, providing a simpler, more auditable compliance story aligned with European data sovereignty requirements.
Mistral AI Data Policy: Data is stored in the European Union. Mistral AI does not use customer data to train its AI models except under specific conditions (free subscription, paid subs without opt-out, feedback provision, or content moderation). For complete details, see Mistral AI Data Usage Policy.
Status: beta | Python: 3.11+ | License: MIT
- Multi-Provider LLM Support: Regolo.ai and Mistral AI default providers, plus custom OpenAI-compatible providers
- Multi-user local login with
adminanduserroles. - Personal workspaces under
app/data/workspaces/<workspace_id>/andapp/uploads/workspaces/<workspace_id>/. - Isolated Chroma collections per workspace, for example
documents_<workspace_id>. - Advanced Embeddings: Local, Regolo cloud, or custom OpenAI-compatible embeddings
- Smart Reranking: BAAI local, Regolo remote, or custom OpenAI-compatible reranking
- Conversational Memory: Context-aware responses with automatic summary compression
- Streaming Responses: Real-time token streaming with NDJSON structured events
- Admin Dashboard: Web UI for document/audio management, configuration, and monitoring
- Versioned REST API: Production-ready APIs with authentication and rate limiting
- External Client Ready: Scoped API keys and safe source payloads for server-to-server clients
- Audio Workflow: Configurable OpenAI-compatible Voice providers for STT indexing, optional STT language control, and TTS
- OCR Fallback: Default Regolo OCR provider for scanned PDFs and image-to-text chat input
- WordPress Plugin Ready: Modular WordPress client with customizable chatbot, shortcode, WXR article import, live public-post sync, retry/backoff, health checks, and abuse controls
- Enterprise Logging: Rotating file logs with multiple severity levels
- Health Monitoring: Deep health checks with readiness indicators
- Python: 3.11 or higher
- Windows, Linux, macOS (Apple Silicon): Latest supported version
- macOS (Intel): Python 3.12 recommended (PyTorch wheel compatibility)
- API Keys: Not needed to start the UI. Add one LLM provider key, or a local/custom OpenAI-compatible provider, before asking real questions.
- Redis: Not required for local development. The default local runtime uses in-process memory and inline jobs.
- Optional WordPress Client: WordPress 6.x or compatible hosting for the bundled plugin
- Memory: Minimum 2GB RAM for local embeddings
- Disk: 500MB+ for dependencies and local ChromaDB index
From a fresh clone:
git clone <repo-url>
cd Test-Rag-v1
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
cp .env.example .env
python app/app.pyOpen:
- Chat:
http://127.0.0.1:5000/ - Login:
http://127.0.0.1:5000/admin/login
On a clean install, the first successful login bootstraps an admin user. Use email admin@example.local (or leave the email field empty) and password admin. Set a real password before using the app beyond local testing.
The copied .env.example keeps the app local: no Redis, JSON-backed users/settings, local ChromaDB, inline jobs, and local sentence-transformers embeddings where supported.
The UI starts without provider keys, but real chat answers require a model provider:
- Hosted provider: uncomment
REGOLO_API_KEYorMISTRAL_API_KEYin.env, restart the app, then select the provider in Admin -> Configuration. - Local/custom provider: start any OpenAI-compatible server, then add it in Admin -> Configuration -> Custom LLM Providers. Disable "requires API key" if your local server does not need one.
- Embeddings:
.env.exampleselects local embeddings. If your platform cannot installsentence-transformers(notably macOS Intel), switch embeddings to Regolo or another custom embedding provider in Admin -> Configuration.
For shared or production deployments, replace the dev secrets in .env:
FLASK_SECRET_KEY=replace-with-a-long-random-secret
RAG_SECRET_KEY=replace-with-a-different-long-random-secret
# Preferred bootstrap admin credential: store a Werkzeug password hash.
# Generate with:
# python -c "from werkzeug.security import generate_password_hash; import getpass; print(generate_password_hash(getpass.getpass()))"
RAG_ADMIN_PASSWORD_HASH=replace-with-generated-password-hash
# Legacy/dev fallback only:
# RAG_ADMIN_PASSWORD=replace-me
# Provider keys, depending on what you enable.
REGOLO_API_KEY=...
MISTRAL_API_KEY=...RAG_SECRET_KEY protects user connector secrets. In development the app can fall back to a temporary/dev key, but production deployments should always set it explicitly. Prefer RAG_ADMIN_PASSWORD_HASH over plaintext RAG_ADMIN_PASSWORD for bootstrap admin credentials.
Redis is only needed when you want shared runtime state, queued background jobs, multiple Gunicorn workers, or production-style job monitoring. Keep RAG_STATE_BACKEND=memory and RAG_QUEUE_BACKEND=inline for local development.
- Log in and create the initial admin.
- Go to Configuration and choose LLM, embeddings, reranker, voice, and OCR providers.
- Go to Users and create normal users.
- Each user logs in and uploads files or configures personal Data Sources.
- For API clients or WordPress, create one API key in the workspace that should answer those requests, granting only the scopes the integration needs.
Use Gunicorn with the tracked runtime configuration:
gunicorn -c gunicorn.conf.py wsgi:applicationSee Deployment for Redis, reverse proxy, logging, and production checklist details.
- Start Here: practical guide from zero
- Setup
- Multi-User Architecture
- Data Ingestion Plugins
- Deployment
- API Reference
- OpenAPI Schema
- WordPress Plugin
- Public Repository Snapshot Workflow
| Path | Purpose |
|---|---|
app/app.py |
Flask app factory, UI routes, API routes, job orchestration |
app/utils/user_store.py |
JSON-backed local user store |
app/utils/workspace.py |
Workspace path and collection resolver |
app/utils/secret_store.py |
Encrypted connector credential store |
app/utils/data_ingestion/ |
Plugin contract, registry, Email IMAP, Microsoft Drive |
app/utils/document_indexer.py |
Shared indexing pipeline for uploads and connectors |
app/utils/rag_engine.py |
Retrieval, reranking, generation, conversation memory |
integrations/ |
External platform integrations (WordPress, etc.) |
integrations/README.md |
Integration architecture, roadmap, and how to add new |
integrations/wordpress/rag-client/ |
WordPress plugin with modular PHP architecture |
public-site/ |
Static public website |
tests/ |
Regression and architecture tests |
Create an API key from the user workspace you want to query, then:
curl -X POST http://127.0.0.1:5000/api/v1/query \
-H "Content-Type: application/json" \
-H "X-API-Key: $RAG_API_KEY" \
-d '{
"query": "Summarize my indexed documents",
"response_language": "auto"
}'The API key resolves to one user/workspace. Queries, uploads, delete operations, jobs, and conversations stay inside that boundary.
The bundled WordPress plugin is a server-side client. Store a RAGuardian API key in WordPress settings; browser JavaScript never sees it.
Recommended pattern:
- Create a dedicated RAGuardian user, for example
website@example.com. - Index the website knowledge base in that user's workspace.
- Create one API key for that workspace. Use
queryfor chat, addingestfor WXR import/live article sync, and addspeechonly when TTS is enabled. - Configure
Settings -> Raguardianin WordPress.
The plugin uses the standard WordPress settings UI, AJAX actions, WP-Cron, and post hooks. It does not expose the API key to JavaScript and does not read WordPress content directly from the database.
pip install -r requirements-dev.txt
.venv/bin/pytest -qCurrent multi-user regression coverage includes auth, workspace isolation, API key ownership, data sources, secret storage, jobs, upload/query/delete/rebuild, Chroma collection routing, and legacy API compatibility.
- Do not commit
.env,app/data/*.json, workspace data, uploads, ChromaDB, logs, or secrets. - Use strong
FLASK_SECRET_KEYandRAG_SECRET_KEYvalues in production. - Use per-user API keys with the narrowest useful scopes.
- Store connector passwords/tokens as user secrets or environment references, never as plaintext in workspace settings.
- Use HTTPS in front of any public deployment.
MIT. See LICENSE.