Companion repository for the book "AI che Ragiona": a production-ready implementation of a complete Knowledge Graph system with a document ingestion pipeline, vector store, graph database, hybrid RAG (vector + graph), web interface, MCP server, and multi-agent orchestration layer.
- Modules
- Architecture
- Tech stack
- System requirements
- Quick start
- Local development (without Docker)
- Running with Docker
- Running with pre-built images (GHCR)
- Repository structure
- API Reference
- Agent API Reference
- Multi-Agent system
- UI (Frontend)
- Environment variables
- Debugging with VS Code
- Testing and linting
- Ingestion pipeline
- RAG pipeline (query)
- Data models
- Scientific references
- Troubleshooting
| Module | Description | README |
|---|---|---|
| knowledge-graph-api | FastAPI backend — ingestion, RAG, graph | README |
| knowledge-graph-ui | Next.js 15 web frontend | README |
| knowledge-graph-mcp | MCP server — exposes API as LLM tools | README |
| knowledge-graph-agents | Multi-agent orchestration (LangGraph) | README |
Client / LLM Host (Claude Desktop, VS Code, custom app)
| |
| MCP Protocol | HTTP REST
v v
+------------------+ +--------------------+
| knowledge-graph | | knowledge-graph |
| -mcp | | -agents |
| (MCP tool layer) | | (Multi-Agent API) |
| localhost:8080 | | localhost:8002 |
+--------+---------+ +--------+-----------+
| |
+----------+--------------+
| HTTP REST
+--------v---------+
| FastAPI API |
| knowledge-graph |
| -api |
| localhost:8000 |
+--+-----+------+--+
| | |
+------------+ +--+--+ ++-----------+
| | | |
+--v------------+ +v----v--+ +----------v-----+
| Neo4j 5.18 | | Redis | | Ollama |
| Graph DB | | Stack | | llama3 + |
| :7474 / :7687| | :6379 | | nomic-embed |
+--------------+ | :8001 | | :11434 |
+---------+ +----------------+
^
+--------+---------+
| Next.js UI |
| knowledge-graph |
| -ui |
| localhost:3000 |
+------------------+
Data flows through three main paths:
- Ingestion: document → chunking → embedding (Ollama) → dedup (SHA-256) → entity/relation extraction (LLM) → storage in Redis (vectors) + Neo4j (graph)
- RAG Query: question → intent classification → vector search (Redis) → graph traversal (Neo4j) → context assembly → LLM generation (Ollama) → answer
- Multi-Agent: request → Orchestrator (LangGraph) → specialised agent → HTTP tools to API → structured response
| Component | Technology | Version |
|---|---|---|
| Graph Database | Neo4j (Cypher + APOC) | 5.18 |
| Vector Store | Redis for AI (RedisSearch + RedisJSON) | latest |
| LLM Inference | Ollama (local, no API key) | latest |
| LLM Model | Llama 3 | latest |
| Embedding Model | nomic-embed-text (768 dim) | latest |
| REST API | FastAPI + uvicorn | 0.115+ |
| Data Models | Pydantic v2 + pydantic-settings | 2.7+ |
| Multi-Agent | LangGraph | latest |
| Frontend | Next.js + React + Tailwind CSS | 15 / 19 / 4 |
| Graph Visualisation | react-force-graph-2d | 1.26+ |
| Logging | structlog (JSON in prod, console in dev) | 24.1+ |
| Testing | pytest + pytest-asyncio + pytest-mock | 8.2+ |
| Linting | ruff (API/Agents), ESLint (UI) | 0.4+ |
| Containerisation | Docker + Docker Compose | 24+ / v2 |
- Docker 24+ and Docker Compose v2
- 8 GB RAM recommended (Ollama + Neo4j + Redis)
- Python 3.11+ (only for local API development without Docker)
- Node.js 22+ (only for local UI development without Docker)
- NVIDIA GPU optional (to accelerate Ollama)
The fastest way to get everything running:
# 1. Clone and move into the root
git clone <repo-url>
cd knowledge-graph
# 2. Configure environment variables
cp .env.example .env
# Edit NEO4J_PASSWORD and other values in .env
# 3. Start all services (production stack)
make up-prod
# 4. Download Ollama models (first time only)
make pull-models
# 5. (Optional) Seed with sample data
cd knowledge-graph-api && make seed
# 6. Open in the browser
# UI: http://localhost:3000
# API Swagger: http://localhost:8000/docs
# Agent API: http://localhost:8002/docs
# Neo4j: http://localhost:7474
# RedisInsight: http://localhost:8001
Run only the infrastructure in Docker and the application servers natively for a hot-reload development experience.
cd knowledge-graph
make up-dev
# or: docker compose --profile dev up -d
Wait for services to become healthy:
docker compose ps
make pull-models
cd knowledge-graph-api
python -m venv venv
source venv/bin/activate # Linux / macOS
# venv\Scripts\activate # Windows
pip install -r requirements.txt
uvicorn api.main:app --reload --port 8000
API available at http://localhost:8000. Interactive Swagger docs at http://localhost:8000/docs.
cd knowledge-graph-agents
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn api.agent_api:app --reload --port 8001
Agent API available at http://localhost:8001. Swagger at http://localhost:8001/docs.
# Example call
curl -X POST http://localhost:8001/agents/run \
-H "Content-Type: application/json" \
-d '{"request": "What do you know about Neo4j?", "thread_id": "default"}'
cd knowledge-graph-ui
cp .env.local.example .env.local
# Verify NEXT_PUBLIC_API_URL=http://localhost:8000
npm install
npm run dev
UI available at http://localhost:3000.
The stack uses Docker Compose profiles to separate environments:
| Profile | Services started | Use case |
|---|---|---|
| dev | neo4j, redis, ollama, redisinsight | Local development: run apps outside Docker |
| prod | neo4j, redis, ollama, api, ui, mcp, agents | Full production stack |
make up-dev # infrastructure + RedisInsight (profile dev)
make up-prod # full production stack (profile prod)
make down # stop all services
make pull-models # download llama3 + nomic-embed-text
cp .env.example .env
# Configure .env with real passwords
make up-prod
Services exposed:
| Service | URL | Description |
|---|---|---|
| UI | http://localhost:3000 | Next.js frontend |
| API | http://localhost:8000 | FastAPI REST API |
| API Docs | http://localhost:8000/docs | Swagger UI (API) |
| Agent API | http://localhost:8002 | Multi-Agent Orchestration API |
| Agent Docs | http://localhost:8002/docs | Swagger UI (Agent API) |
| MCP Server | http://localhost:8080 | MCP tool layer (SSE transport) |
| Neo4j Browser | http://localhost:7474 | Neo4j web interface |
| RedisInsight | http://localhost:8001 | Redis web interface (built-in) |
| Ollama | http://localhost:11434 | Ollama API |
make up-dev
This starts only Neo4j, Redis, Ollama and a standalone RedisInsight on port 5540. Run API, UI, Agents and MCP locally with hot-reload (see Local development).
Additional services when using --profile dev:
| Service | URL | Description |
|---|---|---|
| RedisInsight | http://localhost:5540 | Standalone Redis UI (profile dev) |
| Neo4j Browser | http://localhost:7474 | Built into neo4j container |
# Service status
docker compose ps
# Follow logs of a specific service
docker compose logs -f api
docker compose logs -f ui
# Stop all services
make down
# Stop and remove volumes (WARNING: deletes all Neo4j/Redis data)
docker compose down -v
# Rebuild a single service
docker compose up --build api -d
GPU acceleration for Ollama is configured in docker-compose.yml under the deploy section of the ollama service. It is enabled by default and requires the NVIDIA Container Toolkit.
The easiest way to run the full stack locally without cloning the source code or building any image. Every CI-green merge to main publishes four images to the GitHub Container Registry:
| Image | Description |
|---|---|
| ghcr.io/agent-engineering-studio/kg-api:latest | FastAPI backend |
| ghcr.io/agent-engineering-studio/kg-ui:latest | Next.js frontend |
| ghcr.io/agent-engineering-studio/kg-mcp:latest | MCP server |
| ghcr.io/agent-engineering-studio/kg-agents:latest | Multi-agent API |
curl -O https://raw.githubusercontent.com/agent-engineering-studio/knowledge-graph/main/docker-compose.ghcr.yml
curl -O https://raw.githubusercontent.com/agent-engineering-studio/knowledge-graph/main/.env.example
cp .env.example .env
# Edit .env: set NEO4J_PASSWORD and, if needed, OLLAMA_BASE_URL
Option A: Ollama already running on the host (recommended if you already have models):
# Set in .env:
# OLLAMA_BASE_URL=http://host.docker.internal:11434
docker compose -f docker-compose.ghcr.yml up -d
Option B: Ollama CPU container (no GPU):
docker compose -f docker-compose.ghcr.yml --profile cpu up -d
# Pull models (first time only)
docker compose -f docker-compose.ghcr.yml exec ollama-cpu \
sh -c "ollama pull llama3 && ollama pull nomic-embed-text"
Option C: Ollama GPU container (NVIDIA, requires NVIDIA Container Toolkit):
docker compose -f docker-compose.ghcr.yml --profile gpu up -d
docker compose -f docker-compose.ghcr.yml exec ollama-gpu \
sh -c "ollama pull llama3 && ollama pull nomic-embed-text"
| Service | URL |
|---|---|
| UI | http://localhost:3000 |
| API + Swagger | http://localhost:8000/docs |
| Agent API + Swagger | http://localhost:8002/docs |
| MCP Server (SSE) | http://localhost:8080 |
| Neo4j Browser | http://localhost:7474 |
| RedisInsight | http://localhost:5540 |
# Check all containers are up
docker compose -f docker-compose.ghcr.yml ps
# Follow logs of a specific service
docker compose -f docker-compose.ghcr.yml logs -f api
# Pull latest images and restart
docker compose -f docker-compose.ghcr.yml pull
docker compose -f docker-compose.ghcr.yml up -d
# Stop everything
docker compose -f docker-compose.ghcr.yml down
# Stop and remove all data (WARNING: deletes Neo4j + Redis volumes)
docker compose -f docker-compose.ghcr.yml down -v
Note: the GHCR packages for this repository are public. No docker login is required to pull them.
knowledge-graph/
├── .vscode/ # VS Code configuration (debug, tasks, settings)
│ ├── launch.json # Debug configurations
│ ├── tasks.json # Build/run tasks
│ └── settings.json # Editor settings
├── docker-compose.yml # Full stack (profiles: dev, prod)
├── Makefile # Shorthand commands
├── .env.example # Environment variable template
│
├── knowledge-graph-api/ # Backend API (Python / FastAPI)
│ ├── api/ # FastAPI app, routes, schemas
│ │ ├── main.py # Application entry point
│ │ ├── schemas.py # Pydantic request/response models
│ │ └── routes/
│ │ ├── ingest.py # POST /ingest
│ │ └── query.py # POST /query, POST /query/stream
│ ├── config/
│ │ └── settings.py # Centralised configuration (pydantic-settings)
│ ├── models/ # Domain models
│ │ ├── base.py # VectorDocument
│ │ ├── graph_node.py # GraphNode (KGNode)
│ │ └── relation.py # Relation
│ ├── pipeline/ # Ingestion pipeline
│ ├── query/ # Query pipeline
│ ├── storage/ # Persistence backends
│ ├── infra/docker/Dockerfile # API Dockerfile
│ └── requirements.txt
│
├── knowledge-graph-agents/ # Multi-Agent Orchestration (Python / LangGraph)
│ ├── agents/ # Specialised agents
│ ├── orchestration/ # LangGraph workflow
│ ├── tools/kg_tools.py # Async HTTP wrappers for the API
│ ├── memory/kg_memory.py # AgentRunRecord + in-process store
│ ├── api/agent_api.py # FastAPI app port 8001 (host 8002 in Docker)
│ ├── Dockerfile
│ └── requirements.txt
│
├── knowledge-graph-mcp/ # MCP Server (Python / FastMCP)
│ ├── src/kg_mcp/
│ │ ├── server.py # MCP server + tool definitions
│ │ ├── api_client.py # HTTP client to the API
│ │ └── tools.py # 8 tool implementations
│ ├── Dockerfile
│ └── pyproject.toml
│
└── knowledge-graph-ui/ # Frontend (Next.js / React)
├── src/
│ ├── app/ # Next.js App Router pages
│ ├── components/ # Reusable React components
│ └── lib/api-client.ts # Typed fetch wrapper
├── Dockerfile
└── package.json
Interactive OpenAPI documentation generated automatically by FastAPI:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI JSON: http://localhost:8000/openapi.json
Checks connectivity with Neo4j, Redis and Ollama.
Response (HealthResponse):
{
"status": "healthy",
"neo4j": true,
"redis": true,
"ollama": true
}
status is "healthy" when all services are reachable, "degraded" otherwise.
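The status rule above can be sketched as a small helper. This is only an illustration of the stated behaviour, not the repository's code; the function names `overall_status` and `unreachable_services` are hypothetical.

```python
# Hypothetical helpers illustrating the documented health rule.
SERVICES = ("neo4j", "redis", "ollama")


def overall_status(neo4j: bool, redis: bool, ollama: bool) -> str:
    """Return "healthy" only when every backend is reachable."""
    return "healthy" if all((neo4j, redis, ollama)) else "degraded"


def unreachable_services(health: dict) -> list[str]:
    """List the services whose flag is false or missing in a health payload."""
    return [name for name in SERVICES if not health.get(name)]


payload = {"status": "degraded", "neo4j": True, "redis": False, "ollama": True}
print(unreachable_services(payload))
```

A client could use the second helper to decide which container logs to inspect first.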
Uploads a document, processes it through the full pipeline (chunking, embedding, entity extraction) and persists it in Redis and Neo4j.
Request body (IngestRequest):
{
"file_path": "/path/to/document.pdf",
"thread_id": "my-project",
"skip_existing": true
}
| Field | Type | Default | Description |
|---|---|---|---|
| file_path | string | required | Path to the file to process (PDF, DOCX, TXT) |
| thread_id | string | required | Namespace for multi-tenant isolation |
| skip_existing | boolean | true | Skip already-indexed chunks (dedup via SHA-256) |
Response (IngestResult):
{
"document_id": "a1b2c3d4-...",
"chunks_processed": 15,
"chunks_skipped": 0,
"entities_extracted": 23,
"relations_extracted": 18,
"nodes_created": 23,
"edges_created": 18,
"processing_time_ms": 12450.5,
"errors": []
}
Supported formats: .pdf (pypdf), .docx (python-docx), .txt (plain text).
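For callers that prefer Python over curl, the request above might be built as follows. This is a minimal sketch using only the standard library; `build_ingest_request` and `ingest` are hypothetical helper names, and the base URL is the default listed in this README.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # default API address from this README


def build_ingest_request(file_path: str, thread_id: str,
                         skip_existing: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST /ingest request with an IngestRequest body."""
    payload = {
        "file_path": file_path,
        "thread_id": thread_id,
        "skip_existing": skip_existing,
    }
    return urllib.request.Request(
        f"{API_URL}/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ingest(file_path: str, thread_id: str, timeout: float = 300.0) -> dict:
    """Send the request and return the decoded IngestResult (needs a running API)."""
    request = build_ingest_request(file_path, thread_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

The generous timeout is an assumption: ingestion involves embedding and LLM extraction, so a single call can take several seconds per document.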
Executes a hybrid RAG query: vector search + graph traversal + LLM generation.
Request body (QueryRequest):
{
"query": "Which technologies are connected to Neo4j?",
"thread_id": "my-project",
"top_k": 10,
"max_hops": 2
}
| Field | Type | Default | Description |
|---|---|---|---|
| query | string | required | Natural-language question |
| thread_id | string | required | Namespace to query |
| top_k | integer | 10 | Number of vector search results |
| max_hops | integer | 2 | Maximum graph traversal depth |
Response (RAGResponse):
{
"answer": "Neo4j is connected to...",
"sources": [
{ "doc_id": "chunk-uuid", "text_preview": "First 200 chars...", "score": 0.876 }
],
"nodes_used": ["node-id-1"],
"edges_used": ["NodeA --USES--> NodeB"],
"query_intent": "entity_query",
"processing_time_ms": 3200.0
}
query_intent can be: document_query, entity_query, relation_query, general.
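The edges_used entries in the sample response look like "NodeA --USES--> NodeB". Assuming that string format holds in general (it is inferred from the single example above, not from a documented contract), a client could split each entry into a triple like this; `parse_edge` is a hypothetical helper.

```python
import re

# Assumed format: "<source> --<RELATION_TYPE>--> <target>", as in the sample response.
EDGE_PATTERN = re.compile(r"^(?P<source>.+?) --(?P<relation>[A-Z_]+)--> (?P<target>.+)$")


def parse_edge(edge: str) -> tuple[str, str, str]:
    """Split an edges_used string into (source, relation_type, target)."""
    match = EDGE_PATTERN.match(edge)
    if match is None:
        raise ValueError(f"unexpected edge format: {edge!r}")
    return match["source"], match["relation"], match["target"]


print(parse_edge("NodeA --USES--> NodeB"))
```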
Same request as /query, but the response is a stream of Server-Sent Events. Each LLM token is sent as an event:
data: Neo4j
data: is
data: connected
data: to...
data: [DONE]
On error: data: [ERROR] message.
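A consumer of this stream has to strip the data: prefix, stop at [DONE], and surface [ERROR] messages. The sketch below does that over any iterable of text lines, so it works with whatever HTTP client delivers them; `iter_tokens` and `StreamError` are hypothetical names, and how tokens should be joined for display is left to the caller.

```python
from typing import Iterable, Iterator


class StreamError(RuntimeError):
    """Raised when the stream reports `data: [ERROR] message`."""


def iter_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield token payloads from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # SSE drops one leading space after the colon
        if payload == "[DONE]":
            return
        if payload.startswith("[ERROR]"):
            raise StreamError(payload[len("[ERROR]"):].strip())
        yield payload


sample = ["data: Neo4j", "data: is", "data: connected", "data: [DONE]"]
print(list(iter_tokens(sample)))
```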
Removes a document and all its chunks from Redis.
Response:
{ "deleted": "a1b2c3d4-..." }
The Agent API exposes the multi-agent system at http://localhost:8002 (internal port 8001).
- Swagger UI: http://localhost:8002/docs
Receives a natural-language request, classifies the intent, executes the agent plan and returns structured output.
Request body:
{
"request": "What do you know about Neo4j?",
"thread_id": "default",
"context": {}
}
| Field | Type | Default | Description |
|---|---|---|---|
| request | string | required | Natural-language request |
| thread_id | string | default | KG namespace to operate on |
| context | dict | {} | Extra params (e.g. file_path, topic) |
Response (AgentRunResponse):
{
"run_id": "uuid",
"intent": "query",
"output": "Neo4j is a graph database...",
"plan": [{"agent": "analyst", "action": "hybrid_search", "status": "done"}],
"quality": {"overall_health": 0.85, "total_nodes": 120},
"duration_ms": 2340,
"error": null
}
Automatically classified intents:
| Keywords in request | Intent | Delegated agent |
|---|---|---|
| ingest, upload, load | ingest | Ingestion Agent |
| what do you know, describe, tell me | query | Analyst Agent |
| analyse, count, statistics | analyze | Analyst Agent |
| report, generate, summarise | synthesize | Synthesis Agent |
| validate quality, check | validate | Validator Agent |
| missing relations, gap | kgc | KGC Agent |
| health, status, monitor | monitor | Monitor Agent |
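The keyword table above can be read as a first-match rule list. The sketch below is one possible reading, not the orchestrator's actual logic: the rule order, the whole-word matching, and the fallback to "query" when nothing matches are all assumptions, and `classify_intent` is a hypothetical name.

```python
import re

# Keyword groups copied from the table; order and matching strategy are assumptions.
INTENT_RULES = [
    ("ingest", ("ingest", "upload", "load")),
    ("query", ("what do you know", "describe", "tell me")),
    ("analyze", ("analyse", "count", "statistics")),
    ("synthesize", ("report", "generate", "summarise")),
    ("validate", ("validate quality", "check")),
    ("kgc", ("missing relations", "gap")),
    ("monitor", ("health", "status", "monitor")),
]


def classify_intent(request: str, default: str = "query") -> str:
    """Return the first intent whose keyword appears as a whole word or phrase."""
    text = request.lower()
    for intent, keywords in INTENT_RULES:
        if any(re.search(rf"\b{re.escape(keyword)}\b", text) for keyword in keywords):
            return intent
    return default  # assumed fallback; the README does not specify one


print(classify_intent("What do you know about Neo4j?"))
```

Whole-word matching avoids false positives such as "gap" inside an unrelated word, at the cost of missing inflected forms like "healthy".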
Returns the persisted record of a previous execution.
Returns the last N runs (default 20), ordered by date descending.
{ "status": "ok", "kg_api": true, "kg_api_url": "http://localhost:8000" }
The knowledge-graph-agents/ module implements the Supervisor + Specialists pattern.
┌──────────────────┐
│ ORCHESTRATOR │
│ (Router+Planner)│
└────────┬─────────┘
│ delegates by intent (LangGraph)
┌──────────────────┼──────────────────┐
│ │ │
┌──────▼──────┐ ┌───────▼──────┐ ┌──────▼──────┐
│ INGESTION │ │ ANALYST │ │ SYNTHESIS │
│ AGENT │ │ AGENT │ │ AGENT │
└──────┬──────┘ └───────┬──────┘ └──────┬──────┘
│ │ │
┌──────▼──────┐ ┌───────▼──────┐ ┌──────▼──────┐
│ VALIDATOR │ │ KGC │ │ MONITOR │
│ AGENT │ │ AGENT │ │ AGENT │
└─────────────┘ └──────────────┘ └─────────────┘
│ HTTP REST
┌────────▼─────────┐
│ knowledge-graph │
│ -api │
│ Neo4j + Redis │
└──────────────────┘
| Agent | Responsibility |
|---|---|
| Orchestrator | Classifies intent, builds plan — never executes tools directly |
| Ingestion | Health check, dedup check, kg_ingest, report |
| Analyst | Vector search / graph traversal / hybrid (3 strategies) |
| Validator | 4 Cypher queries, KGQualityReport with overall_health |
| KGC | Transitive closure + similarity, finds missing relations |
| Synthesis | RAG context + Ollama, Markdown report (optional auto-ingest) |
| Monitor | Health check + quick quality check, alert summary |
Each execution is recorded as an AgentRunRecord (Pydantic) in the in-process store and, best-effort, as an AgentRun node in Neo4j via POST /graph/cypher/write.
MATCH (r:AgentRun {run_id: $run_id})
RETURN r.agent_name, r.intent, r.status, r.duration_ms
cd knowledge-graph-agents
pytest tests/ -v
Tests use httpx mocks, so no live services are needed.
The UI is a Next.js 15 (App Router) SPA with three main pages.
- Real-time service health indicators (Neo4j, Redis, Ollama)
- Quick links to functional pages
- Direct access to Swagger, Neo4j Browser, RedisInsight
- Search form with configurable parameters (thread_id, top_k, max_hops)
- SSE streaming support: tokens appear in real time during generation
- Structured result view: answer, sources with score, metadata (intent, nodes, edges, time)
- Query input to explore sections of the knowledge graph
- Interactive force-directed visualisation (react-force-graph-2d)
- Colour-coded nodes by type, edges with arrows and relation labels
- Zoom, pan and drag
src/lib/api-client.ts is the single contract between UI and API:
- TypeScript interfaces mirroring the Pydantic models
- Typed functions for every endpoint (getHealth, postQuery, postIngest, deleteDocument)
- Async generator streamQuery() for SSE streaming
- AbortSignal support for request cancellation
Copy .env.local.example to .env.local:
cd knowledge-graph-ui
cp .env.local.example .env.local
| Variable | Default | Description |
|---|---|---|
| NEXT_PUBLIC_API_URL | http://localhost:8000 | API base URL |
| NEXT_PUBLIC_ENABLE_STREAMING | true | Enable/disable SSE streaming |
| NEXT_PUBLIC_ENABLE_GRAPH_VIEW | true | Feature flag for graph view |
All variables are defined in .env.example at the root. Copy to .env and customise:
cp .env.example .env
| Variable | Default | Description |
|---|---|---|
| NEO4J_URI | bolt://neo4j:7687 | Connection URI (use localhost for local dev) |
| NEO4J_USER | neo4j | Username |
| NEO4J_PASSWORD | yourpassword | Change this in production |
| NEO4J_DATABASE | neo4j | Database name |
| Variable | Default | Description |
|---|---|---|
| REDIS_URL | redis://redis:6379 | Connection URL (use localhost for local dev) |
| REDIS_INDEX_NAME | kg_vectors | Vector index name |
| REDIS_VECTOR_DIM | 768 | Vector dimension (depends on embedding model) |
| Variable | Default | Description |
|---|---|---|
| OLLAMA_BASE_URL | http://ollama:11434 | Ollama URL (use localhost for local dev) |
| OLLAMA_LLM_MODEL | llama3 | Text generation model |
| OLLAMA_EMBEDDING_MODEL | nomic-embed-text | Embedding model (768 dim) |
| Variable | Default | Description |
|---|---|---|
| CHUNK_SIZE | 1024 | Maximum chunk size (characters) |
| CHUNK_OVERLAP | 128 | Overlap between consecutive chunks (characters) |
| Variable | Default | Description |
|---|---|---|
| LOG_LEVEL | INFO | Log level (DEBUG, INFO, WARNING, ERROR) |
| DEBUG | false | Debug mode |
Note: for local development without Docker, NEO4J_URI, REDIS_URL and OLLAMA_BASE_URL must use localhost instead of Docker container names.
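To make the note concrete, the following sketch rewrites the three default container URLs to their localhost equivalents. It is a hypothetical convenience helper (`to_local_url` is not part of the repository); the set of container names is taken from the defaults in the tables above.

```python
from urllib.parse import urlsplit

# Container hostnames taken from the default values in this README.
CONTAINER_HOSTS = {"neo4j", "redis", "ollama"}


def to_local_url(url: str) -> str:
    """Swap a Docker container hostname for localhost, keeping scheme, port and credentials."""
    parts = urlsplit(url)
    host = parts.hostname
    if host not in CONTAINER_HOSTS:
        return url
    userinfo, at, hostport = parts.netloc.rpartition("@")
    hostport = "localhost" + hostport[len(host):]  # keep the ":port" suffix
    return parts._replace(netloc=userinfo + at + hostport).geturl()


for value in ("bolt://neo4j:7687", "redis://redis:6379", "http://ollama:11434"):
    print(value, "->", to_local_url(value))
```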
Open the knowledge-graph/ folder in VS Code. The .vscode/ directory contains ready-to-use configurations.
- Python (ms-python.python)
- Ruff (charliermarsh.ruff)
- Prettier (esbenp.prettier-vscode)
- JavaScript Debugger (built-in)
| Name | Type | Description |
|---|---|---|
| API: FastAPI (debugpy) | Python | Starts uvicorn with Python debugger, hot-reload |
| UI: Next.js (Server) | Node | Starts npm run dev and attaches Chrome debugger |
| UI: Next.js (Chrome) | Chrome | Attaches to a running Next.js server on :3000 |
| API: Tests (pytest) | Python | Runs pytest with step-through debugger |
| Agents: API (debugpy) | Python | Starts Agent API with debugger on port 8001 |
| Agents: Orchestrator (debugpy) | Python | Runs the LangGraph orchestrator directly |
| MCP: Server (debugpy) | Python | Starts the MCP server with debugger |
| Full Stack: API + UI | Compound | Starts API + UI in parallel with one click |
| Full Stack: All Services | Compound | Starts API + UI + MCP + Agents |
- Start infrastructure: make up-dev
- In VS Code, select "Full Stack: API + UI" in the Run and Debug panel
- Press F5: API (port 8000) and UI (port 3000) start with active debuggers
- Set breakpoints in Python (API) or TypeScript (UI) code
- Open http://localhost:3000 in the browser
Accessible from Terminal > Run Task...:
| Task | Command |
|---|---|
| Docker: Up Prod | docker compose --profile prod up --build -d |
| Docker: Up Dev (infra + tools) | docker compose --profile dev up -d |
| Docker: Down | docker compose --profile prod --profile dev down |
| API: Dev Server | uvicorn api.main:app --reload |
| UI: Dev Server | npm run dev |
| API: Run Tests | pytest tests/ -v |
| API: Lint | ruff check . |
| Pull Ollama Models | ollama pull llama3 + nomic-embed-text |
cd knowledge-graph-api
pytest tests/ -v # run all tests
pytest tests/test_ingest.py -v -k "test_name" # specific test
ruff check . # lint
ruff check . --fix # auto-fix
Tests use mocks for Neo4j, Redis and Ollama, so no live services are needed.
cd knowledge-graph-agents
pytest tests/ -v
ruff check .
cd knowledge-graph-ui
npm run lint
make test # pytest (API)
make lint # ruff (API)
make agents-test # pytest (Agents)
make agents-lint # ruff (Agents)
make mcp-test # pytest (MCP)
The ingestion pipeline (POST /ingest) processes a document in 8 stages:
Document
|
v
[1] File Routing -------> MIME type detection (PDF / DOCX / TXT)
|
v
[2] Content Extraction -> Raw text + page count
|
v
[3] Text Chunking ------> 1024-char chunks, 128-char overlap
| (respects sentence boundaries)
v
[4] Embedding ----------> 768-D vectors via Ollama (nomic-embed-text)
|
v
[5] Deduplication ------> SHA-256 hash to skip existing chunks
|
v
[6] Entity Extraction --> LLM extracts entities (Person, Technology, ...)
| and relations (USES, PART_OF, ...)
v
[7] Vector Storage -----> Upsert into Redis (RedisSearch + RedisJSON)
|
v
[8] Graph Storage ------> Nodes and edges in Neo4j (MERGE/upsert)
Person, Organization, Product, Technology, Process, Event, Location, Concept, Document, Category, Tag
BELONGS_TO, RELATES_TO, CREATED_BY, MENTIONS, PART_OF, USES, LOCATED_IN, OCCURRED_AT, HAS_TAG, SIMILAR_TO, DEPENDS_ON, REPLACED_BY
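Stages 3 and 5 of the ingestion pipeline can be illustrated with a simplified sketch. It uses the default sizes from this README (1024 characters, 128 overlap) but cuts at fixed offsets; the real pipeline is described as respecting sentence boundaries, which this version does not attempt. The function names are hypothetical.

```python
import hashlib


def chunk_text(text: str, size: int = 1024, overlap: int = 128) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
        start += size - overlap
    return chunks


def content_hash(chunk: str) -> str:
    """SHA-256 hex digest used as the deduplication key."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def filter_new_chunks(chunks: list[str], seen_hashes: set[str]) -> list[tuple[str, str]]:
    """Return (hash, chunk) pairs not seen before, recording them in seen_hashes."""
    fresh = []
    for chunk in chunks:
        digest = content_hash(chunk)
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        fresh.append((digest, chunk))
    return fresh
```

In this sketch the set of seen hashes lives in memory; a real deployment would look hashes up in the vector store so that deduplication survives restarts.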
The RAG pipeline (POST /query) answers questions in 5 stages:
User question
|
v
[1] Intent Classification -> document_query | entity_query
| | relation_query | general
v
[2] Vector Search ---------> Top-K documents by cosine similarity
| (Redis KNN)
v
[3] Graph Enrichment ------> Traversal of neighbours up to max_hops
| (Neo4j Cypher)
v
[4] Context Assembly ------> System prompt with chunks + nodes + edges
|
v
[5] LLM Generation -------> Response (sync JSON or SSE stream)
The search is hybrid: it combines semantic similarity (vector) with structural relations (graph) to produce more complete and context-aware answers.
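Stage 2 ranks stored chunks by cosine similarity to the query embedding. In the actual stack Redis performs this KNN search server-side; the pure-Python sketch below only illustrates the ranking itself, with hypothetical function names and toy 2-dimensional vectors in place of 768-dimensional embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], candidates: dict[str, list[float]],
          k: int = 10) -> list[tuple[str, float]]:
    """Return the k candidate ids with the highest similarity to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.0], vectors, k=2))
```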
Each document chunk is stored in Redis as JSON with a vector index:
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique chunk identifier |
| thread_id | string | Namespace / partition |
| text | string | Chunk text content |
| name | string | Source filename |
| vector | float[768] | Chunk embedding |
| content_hash | string | SHA-256 for deduplication |
| base_document_id | string | Parent document ID |
| mime_type | string | Original file MIME type |
| page_number | integer | Page number (PDFs) |
Each extracted entity is stored as a node in the graph:
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique identifier |
| name | string | Entity name |
| label | string | Display label |
| node_type | string | Type (Person, Technology, ...) |
| namespace | string | Namespace / partition |
| importance | float | Score 0-1 |
| confidence | float | Score 0-1 |
| source_chunk_ids | string[] | References to source chunks |
Each extracted relation becomes an edge in the graph:
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique identifier |
| source_id | string | Source node |
| target_id | string | Target node |
| relation_type | string | Type (USES, PART_OF, ...) |
| weight | float | Relation strength 0-1 |
| confidence | float | Extraction confidence 0-1 |
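As an illustration of the relation fields above, here is a standard-library dataclass that enforces the 0-1 ranges. The repository describes its models as Pydantic classes, so this is a stand-in sketch rather than the actual `Relation` model; the default values of 1.0 are assumptions.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class Relation:
    """Stand-in for the Relation model; field names follow the table above."""
    source_id: str
    target_id: str
    relation_type: str
    weight: float = 1.0       # assumed default
    confidence: float = 1.0   # assumed default
    id: str = field(default_factory=lambda: str(uuid4()))

    def __post_init__(self) -> None:
        # Both scores are documented as values between 0 and 1.
        for name in ("weight", "confidence"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


edge = Relation("node-a", "node-b", "USES", weight=0.8, confidence=0.9)
print(edge.relation_type, edge.weight)
```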
This project draws inspiration from the following pipelines and papers:
| Paper / Tool | Usage in this project |
|---|---|
| OpenIE6 (Kolluru et al., 2020) | Patterns for open-domain triple extraction |
| CoDe-KG (Anuyah et al., 2025) | Modular pipeline: coreference + decomposition + RE |
| KGGen (Mo et al., 2025) | Entity clustering/dedup to reduce graph sparsity |
| BLINK (Wu et al., 2019) | Bi-encoder + cross-encoder architecture for entity linking |
| DocRED (Yao et al., 2019) | Benchmark for document-level relation extraction |
Hybrid architecture recommended by the papers: Graph DB (Neo4j/Cypher) for structure + Vector DB (Redis/FAISS) for semantic similarity, with the option of vector indexes directly in Neo4j (CREATE VECTOR INDEX).
docker compose logs ollama
# If the container is up but models are not downloaded
docker compose exec ollama ollama list
make pull-models
The first query after pulling models may be slow (~30s) due to model loading.
docker compose logs neo4j
# Verify the password matches
echo $NEO4J_PASSWORD # must match the value in .env
One or more backend services are unreachable. Check which ones return false:
curl http://localhost:8000/health
Verify all containers are running: docker compose ps
Verify NEXT_PUBLIC_API_URL is set correctly in .env.local:
- Local dev: http://localhost:8000
- Docker: http://localhost:8000 (the browser calls the API directly)
- Cross-host: configure CORS on the API (currently allow_origins=["*"])
The full stack requires ~6-8 GB RAM. If Docker has lower limits:
docker stats
# Use a smaller Ollama model or disable APOC if not needed
# WARNING: deletes all data!
docker compose down -v
make up-prod