Skip to content

performance_issues#26

Merged
sergei-bronnikov merged 7 commits into
mainfrom
17720_performance_issues
Jul 2, 2026
Merged

performance_issues#26
sergei-bronnikov merged 7 commits into
mainfrom
17720_performance_issues

Conversation

@sergei-bronnikov

@sergei-bronnikov sergei-bronnikov commented Jun 11, 2026

Copy link
Copy Markdown

https://bugtracker.codiodev.com/issue/codio-17720/eCornell-reported-Claude-Code-Performance-Issues-with-LLM-Proxy

Summary by CodeRabbit

  • New Features

    • Added tools authorization lookup capability for improved request validation.
  • Performance Improvements

    • Implemented HTTP connection pooling with optimized timeout configurations.
    • Enhanced event message consumer configuration for better throughput.
    • Optimized request handling with local variable caching to reduce repeated lookups.
  • Bug Fixes

    • Simplified utility helper functions for better maintainability.

@sergei-bronnikov

Copy link
Copy Markdown
Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
✅ Action performed

Full review finished.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack


Wait, I accidentally invented a range ID. Let me redo this correctly, placing range_a0cc85580c24 in the correct layer only once.

I keep making up range IDs. Let me be careful and only use what's in all_range_ids. The list is:

range_d881c3d182aa, range_7a25661e6a9c, range_be3686d2b0d0, range_94d4063657a4, range_9d4aecc1b108, range_850bff88efb8, range_0e1860a7aa09, range_47d312851312, range_4728347bbc6e, range_0758281058d0, range_e72c63438280, range_beaff7dcc4df, range_c4bb8aa893f9, range_0b86de6e0ed3, range_03f692ed69ee, range_a93eeb0b1efd, range_42e97eccb3a9, range_f2db3d9e0f14, range_ff9e6594cf7d, range_cb83cc80446b, range_4fdf23ffa9ff, range_08e667aaca88, range_8021d1b7fff3, range_a0cc85580c24, range_6c89b25f8e77, range_3e141d3ef842, range_c3a2b84ec212, range_5292026d3533

That's 28 range IDs. I need to assign each exactly once.

Walkthrough

The PR refactors getMiddleware to cache fullPath and method once per request, updates getProvider to accept fullPath as a parameter, adds an AllowedToolsSet map for O(1) tool authorization, conditionalizes response-logging writer initialization, tunes the upstream HTTP transport with explicit connection pooling, and increases the event consumer pool from 4 to 16 with a buffered channel of 1000.

Changes

Proxy Middleware and Infrastructure Improvements

Layer / File(s) Summary
AllowedToolsSet declaration and init population
internal/provider/openai/cost.go
Exports AllowedToolsSet map[string]struct{} and populates it from AllowedTools in init(), replacing the existing slice for O(1) tool membership checks.
getProvider refactor and getMiddleware fullPath/method caching
internal/server/web/proxy/middleware.go
getProvider accepts fullPath instead of reading c.FullPath() internally. getMiddleware caches fullPath and method at entry and uses them throughout; metadata header serialization changes from JSON-marshal to direct copy; responseWriter initialization is gated on kc.ShouldLogResponse; nil-blw guard added on non-stream response capture; contains simplified to slices.Contains.
Anthropic, Bedrock, and custom provider route branches
internal/server/web/proxy/middleware.go
Anthropic, Bedrock, and custom provider endpoint conditionals switched from c.FullPath()/c.Request.Method to fullPath/method; bodyStr caching added for gjson lookups in the custom provider branch.
OpenAI responses branch: AllowedToolsSet and container detection
internal/server/web/proxy/middleware.go
OpenAI responses routing uses fullPath and switches tool authorization to openai.AllowedToolsSet map lookup; container memory limit extracted by inspecting each tool's container payload.
OpenAI remaining endpoint branches and access-control checks
internal/server/web/proxy/middleware.go
All remaining OpenAI endpoint branches (vLLM, deepinfra, Azure, chat completions, embeddings, images, audio, video, assistants, threads, messages, runs, run steps, moderations, models, files) and access-control checks updated to use fullPath/method conditionals.
HTTP transport tuning and event consumer pipeline sizing
internal/server/web/proxy/proxy.go, cmd/bricksllm/main.go
NewProxyServer configures an explicit http.Transport with idle connection pooling limits and TLS handshake timeout; event channel buffered at 1000 and consumer pool uses numEventConsumers=16.
Minor correctness fixes
internal/server/web/proxy/anthropic.go, internal/server/web/proxy/video.go
Anthropic handler latency switches from time.Now().Sub(start) to time.Since(start); video handler drops a redundant trailing return.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • codio/BricksLLM#9: Directly modifies internal/provider/openai/cost.go and the OpenAI responses tool authorization logic in middleware.go, the same functions and variables changed in this PR.
  • codio/BricksLLM#14: Adds container memory-limit validation and cost estimation in the OpenAI responses flow, which this PR's revised container detection logic (now using AllowedToolsSet and payload inspection) builds upon.

Suggested reviewers

  • destitutus
  • AndreyNikitin
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'performance_issues' is vague and generic, using a non-descriptive term that doesn't convey the specific nature of the changes (consumer pool sizing, buffered channels, HTTP transport optimization, etc.). Use a more descriptive title that captures the main performance improvements, such as 'Optimize LLM proxy performance with configurable event consumers and connection pooling' or similar.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 17720_performance_issues

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/bricksllm/main.go`:
- Around line 366-372: The Consumer needs to ensure all worker goroutines
cleanly exit on shutdown using a sync.WaitGroup. Add a WaitGroup field to the
Consumer struct, then in the StartEventMessageConsumers method, add each worker
goroutine to the WaitGroup before spawning it (Add call), and have each worker
call Done() when it exits after receiving the stop signal. Finally, modify the
Stop() method to call wg.Wait() at the end so it blocks until all workers have
completed, ensuring the buffered eventMessageChan is fully drained before
returning and allowing the server shutdown to proceed.

In `@internal/server/web/proxy/proxy.go`:
- Around line 94-101: The http.Transport in the client variable is created from
scratch, which drops default transport behavior including environment proxy
routing (HTTP_PROXY/HTTPS_PROXY). Instead of creating a fresh http.Transport
literal, clone http.DefaultTransport as the base and then override only the
specific fields that need tuning: MaxIdleConns, MaxIdleConnsPerHost,
IdleConnTimeout, and TLSHandshakeTimeout. This preserves the default proxy
behavior while applying the desired connection pool optimizations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0c00ed4c-ed76-42e4-9300-f57149d2e48b

📥 Commits

Reviewing files that changed from the base of the PR and between 0ddb013 and c1a8050.

📒 Files selected for processing (6)
  • cmd/bricksllm/main.go
  • internal/provider/openai/cost.go
  • internal/server/web/proxy/anthropic.go
  • internal/server/web/proxy/middleware.go
  • internal/server/web/proxy/proxy.go
  • internal/server/web/proxy/video.go
💤 Files with no reviewable changes (1)
  • internal/server/web/proxy/video.go

Comment thread cmd/bricksllm/main.go
Comment thread internal/server/web/proxy/proxy.go
@sergei-bronnikov sergei-bronnikov merged commit d47e20c into main Jul 2, 2026
2 checks passed
@sergei-bronnikov sergei-bronnikov deleted the 17720_performance_issues branch July 2, 2026 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants