tgi

Here are 16 public repositories matching this topic...

opea-project / GenAIExamples

Generative AI Examples is a collection of GenAI examples, such as ChatQnA and Copilot, that illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.

xeon summarization codegen copilot tgi rag llms genai gaudi2 chatqna

Updated Jun 26, 2026
Shell

defilantech / LLMKube

Kubernetes operator for self-hosted LLM inference across a heterogeneous GPU fleet: NVIDIA CUDA, AMD Vulkan, and Apple Silicon Metal. Runtimes: llama.cpp, vLLM, TGI, mlx-server. Multi-GPU sharding, model caching, OpenAI-compatible endpoints. Apache-2.0, run across homelab and on-prem fleets, actively developed.

Updated Jun 27, 2026
Go

zRzRzRzRzRzRzR / lm-fly

Acceleration for large-model inference frameworks: make LLMs fly

mlx tgi openvino llm vllm llm-inference tensorrt-llm

Updated May 10, 2024
Python

slinusc / bench360

Bench360 is a modular benchmarking suite for local LLM deployments. It offers a full-stack, extensible pipeline to evaluate the latency, throughput, quality, and cost of LLM inference on consumer and enterprise GPUs. Bench360 supports flexible backends, tasks and scenarios, enabling fair and reproducible comparisons for researchers & practitioners.

benchmark performance framework energy deployment local optimization engine inference quantization energy-consumption tgi llm vllm llm-inference sglang lmdeploy bench360

Updated Feb 18, 2026
Python

pyxis3-ai / awesome-model-agnostic-llm

Curated list of open-source LLM-serving runtimes, routers, evaluators, and standards. Run LLMs without locking into one vendor.

awesome ai awesome-list tgi mlops model-agnostic llm vllm open-weights

Updated Jun 20, 2026

Bradley-Butcher / Splleed

LLM Inference performance harness

benchmarking latency inference tgi llm vllm llm-inference

Updated Dec 29, 2025
Python

oriolrius / sagemaker-gemma3-openwebui

AWS deployment stack for Gemma 3 on SageMaker with HuggingFace TGI, OpenAI-compatible API (Lambda + API Gateway), and OpenWebUI chat interface

aws lambda cloudformation sagemaker tgi huggingface bfloat16 openai-api llm-inference openwebui gemma3

Updated Mar 25, 2026
Python

TGI13 / Abi

Quotes & memes from class TGI13

abitur tgi a-levels

Updated May 17, 2019

pyxis3-ai / vllm-bench

Throughput + latency benchmark for OpenAI-compatible LLM endpoints (vLLM, TGI, llama.cpp, Ollama). TTFT, TPOT, throughput, percentiles. Runtime-agnostic.

benchmark latency inference throughput tpot tgi mlops ttft llm vllm ollama ai-infrastructure openai-compatible

Updated Jun 20, 2026
Python

tgilabs / .github

it's the .github repo 🚀

tgi

Updated Sep 30, 2025

rxxusp / llmtop

An nvtop for local LLM inference: zero-config autodiscovery of vLLM, llama.cpp, Ollama, TGI, SGLang + live GPU and serving metrics in a Textual TUI. Unified-memory (GB10/Jetson) aware. MIT.

monitoring gpu textual inference tui nvidia htop tgi nvtop llm llama-cpp vllm ollama sglang gb10

Updated Jun 20, 2026
Python

dzivkovi / vllm-huggingface-bridge

Bridge GitHub Copilot Chat with local vLLM/TGI servers and HuggingFace cloud models. Enterprise-ready VS Code extension for air-gapped AI coding.

enterprise ai vscode-extension tgi huggingface air-gapped github-copilot llm code-assistant vllm

Updated Sep 29, 2025
TypeScript

rahulunair / xpu_tgi

TGI server setup for Intel Data Centre GPUs

intel tgi xpu llm intelgpu llm-inference

Updated Nov 26, 2024
Shell

Anirudhx7 / ollama-mesh

GPU-aware LLM proxy for Ollama, vLLM, TGI, and llama.cpp. Warm-first routing, cloud fallback to OpenAI/Anthropic, per-token cost tracking. Single Go binary, zero deps.

golang gpu load-balancer self-hosted reverse-proxy openai homelab tgi llm anthropic vllm ollama llm-gateway litellm-alternative

Updated Jun 26, 2026
Go

varad-more / selfhosted-chat-api

Self-hosted FastAPI gateway exposing OpenAI and Anthropic Messages APIs in front of any open-source LLM runtime (vLLM, Ollama, llama.cpp, TGI, SGLang, LocalAI, LM Studio). Streaming, embeddings, metrics, auth, rate limiting.

Updated Apr 22, 2026
Python

aisingapore / sealion-sampler

Lightweight HTML form with Python Flask app and accompanying scripts for swift testing of interactions with SEA-LION family of LLMs.

flask-application tgi large-language-models ollama

Updated Aug 2, 2024
Python
