
fix: check partial stop on cumulative sglang output #3891

Open
Chessing234 wants to merge 1 commit into lm-sys:main from Chessing234:fix/sglang-partial-stop-cumulative-output


@Chessing234


Summary

SGLang streaming dropped newlines when a stop string started with \n (e.g. \n<|endoftext|>).

Root cause

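sglang_worker.py called is_partial_stop(out, stop) on each delta chunk rather than on the accumulated text. A lone \n chunk matched as a prefix of \n<|endoftext|>, so the worker hit continue before appending the chunk to entire_output. Even when the following tokens showed that the stop sequence never completed, the newline had already been discarded permanently.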
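A minimal sketch of the failure mode, assuming a simplified stand-in for is_partial_stop. The helper is_partial_stop_sketch and the loop stream_per_chunk are hypothetical and are not FastChat's code. The helper returns True when some non-empty suffix of the text is a proper prefix of the stop string. When the check runs on each delta chunk, the lone "\n" chunk triggers continue and is never appended:

```python
from typing import Iterable


def is_partial_stop_sketch(text: str, stop: str) -> bool:
    # Hypothetical stand-in for is_partial_stop: True if some non-empty
    # suffix of `text` is a proper prefix of `stop`.
    for k in range(1, min(len(text), len(stop) - 1) + 1):
        if stop.startswith(text[-k:]):
            return True
    return False


def stream_per_chunk(chunks: Iterable[str], stop: str) -> str:
    """Buggy pattern: the partial-stop check runs on each delta chunk."""
    entire_output = ""
    for out in chunks:
        if is_partial_stop_sketch(out, stop):
            continue  # the chunk is skipped and never appended
        entire_output += out
    return entire_output


print(repr(stream_per_chunk(["Hello", "\n", "world"], "\n<|endoftext|>")))
# 'Helloworld'  (the newline is lost)
```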

Fix

Append each chunk to entire_output first, then run the partial-stop check on the cumulative entire_output, following the same pattern as vllm_worker. If the accumulated output ends with a tentative prefix of a stop string, skip yielding for that step but keep the text buffered until later tokens either complete the stop sequence or rule it out.
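Below is a sketch of the fixed loop under the same assumptions. The names is_partial_stop_sketch and stream_cumulative are hypothetical. The generator yields the cumulative text at each step. Flushing a held-back tail when the stream ends is a choice made in this sketch and is not something the PR describes:

```python
from typing import Iterable, Iterator


def is_partial_stop_sketch(text: str, stop: str) -> bool:
    # Hypothetical stand-in for is_partial_stop: True if some non-empty
    # suffix of `text` is a proper prefix of `stop`.
    for k in range(1, min(len(text), len(stop) - 1) + 1):
        if stop.startswith(text[-k:]):
            return True
    return False


def stream_cumulative(chunks: Iterable[str], stop: str) -> Iterator[str]:
    """Fixed pattern: append each chunk first, then check the cumulative text."""
    entire_output = ""
    held_back = False
    for out in chunks:
        entire_output += out  # always buffer the chunk, even a lone "\n"
        pos = entire_output.find(stop)
        if pos != -1:
            # Full stop string present: emit the text before it and finish.
            yield entire_output[:pos]
            return
        if is_partial_stop_sketch(entire_output, stop):
            held_back = True  # tentative stop prefix: skip yielding this step
            continue
        held_back = False
        yield entire_output
    if held_back:
        # Assumption of this sketch: flush a held-back tail when the stream ends.
        yield entire_output


stop = "\n<|endoftext|>"
print(list(stream_cumulative(["Hello", "\n", "world"], stop)))
# ['Hello', 'Hello\nworld']
print(list(stream_cumulative(["Hello", "\n", "<|endoftext|>"], stop)))
# ['Hello', 'Hello']
```

The check now sees "Hello\n" instead of a bare "\n", so the newline is held back for only one step and reappears once "world" arrives. If the stop sequence does complete, the output is cut at the point where the stop string begins.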

Fixes #3467

Made with Cursor

Co-authored-by: Cursor <cursoragent@cursor.com>



Successfully merging this pull request may close these issues:

When using sglang as the inference framework, if a word starting with "\n" appears in the stop parameter, the sglang will Missing '\n' during inference
