Skip to content

hyperloop11/kube-ai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

kube-ai

Self-aware Kubernetes workloads with pod-local diagnostics, RAG context, and AI-assisted recovery.

Status Stack AI

Why this project matters

Most incident response is still human glue: logs, dashboards, hunches, repeat.

kube-ai aims to make each workload more autonomous:

  • detect failure signals in real time,
  • reason using local + hosted AI,
  • ground recommendations using troubleshooting docs,
  • and move toward safe self-healing with minimal manual intervention.

The novelty

Calling an AI API is easy. Building a pod that can interpret its own failure patterns and improve over time is the real challenge.

What kube-ai is trying to do differently:

  • run as a pod-local sidecar, not only as a cluster-level observer,
  • keep an incident memory close to the workload,
  • use lightweight local reasoning first, then escalate to stronger models,
  • make troubleshooting knowledge portable with the deployment.

Think of it as workload-level immune response, not only platform-level monitoring.

Architecture at a glance

[ pod-local agent ]
  heartbeat -> signal collection -> anomaly detection -> incident ledger
                    |
                    v
               [ RAG pipeline ]
         retrieve relevant troubleshooting context
                    |
                    v
               [ AI routing ]
      local Ollama first, Foundry fallback for depth

Project layout:

  • agent/ pod-side runtime, AI routing, health endpoint
  • rag-pipeline/ ingest and query for troubleshooting knowledge
  • helm/ deployment packaging (in progress)
  • tsg-samples/ seed troubleshooting guides
  • internal-docs/ execution plan, testing notes, phase status

Quick start

  1. Set up Python 3.12 and create the virtual environment.
cd agent
python -m venv .venv
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
  1. Configure AI mode.
  • Local mode: Ollama at http://127.0.0.1:11434/v1
  • Foundry mode: set AI_FOUNDRY_ENDPOINT_URL and AI_FOUNDRY_API_KEY in local env
  1. Run the agent.
$env:AGENT_CONFIG_FILE='config/agent.yaml'
.\.venv\Scripts\python.exe -m app.main
  1. Verify service health.
curl http://localhost:9090/health

Where we are now

  • Agent heartbeat + rule engine + health endpoint
  • Incident ledger persistence
  • Local Ollama diagnostics path
  • Azure Foundry diagnostics path
  • RAG ingest/query MVP
  • Kubernetes restart/OOM collector
  • Notification transport + execution policy layer
  • Helm-based production packaging
  • End-to-end demo and validation flow

Execution plan

  1. Finish pod-level signal coverage and AI-assisted diagnosis loop.
  2. Strengthen RAG grounding for higher-confidence recommendations.
  3. Add notification and safe action orchestration.
  4. Package for installable Kubernetes deployment via Helm.

Useful links

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors