Maintenance: Langfuse SDK #985

Open
AkhileshNegi wants to merge 5 commits into main from upgrade/langfuse-sdk

Conversation

AkhileshNegi (Collaborator) commented Jun 30, 2026

Issue

Closes #407

Checklist

Langfuse keys live in the DB (per org/project), not in env files. Make sure the test org has valid Langfuse keys (public_key, secret_key, host) set up before testing.

1. Chat / Response tracing

Does every AI response still show up in Langfuse?

  • Send a normal chat request (POST /responses, async) — a trace appears with input, output, and token/cost numbers, and ends cleanly
  • Send a synchronous request — same trace shows up with input/output
  • Send a follow-up message — it joins the same session as the first (not a new one)
  • Force an error (bad LLM call) — error is logged on the trace, nothing left hanging (no orphan span)
  • Trace details set correctly — name, input, output, session_id, tags, metadata (uses OTel shim set_trace_attributes)

2. Background jobs (Celery)

Do traces still work when responses run as background jobs?

  • Trigger a background response job (run_response_job, priority 9) — trace shows up via observe_llm_execution
  • Run a multi-step chain job — langfuse_credentials threaded through ChainContext, each step logged
  • Cost numbers correct — usage_details are int-only (v4 dropped the unit field), no type errors on token counts
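The int-only `usage_details` check above can be sketched as a small normalizer. This is a minimal illustration, not the code in this PR: the function name `to_int_usage` and the example keys are hypothetical, and it only assumes what the checklist states (token counts must be integers and the v2 `unit` field is gone).

```python
from typing import Any, Dict, Mapping


def to_int_usage(raw: Mapping[str, Any]) -> Dict[str, int]:
    """Return a usage mapping that contains integer token counts only.

    Hypothetical helper: drops the legacy ``unit`` key and ``None`` values,
    and refuses values that cannot be represented as a whole number.
    """
    usage: Dict[str, int] = {}
    for key, value in raw.items():
        if key == "unit" or value is None:
            continue  # v2-era field or missing count: nothing to report
        if isinstance(value, bool):
            # bool is a subclass of int, so reject it explicitly
            raise TypeError(f"{key!r} must be a token count, got bool")
        if isinstance(value, float) and not value.is_integer():
            raise TypeError(f"{key!r} must be a whole number, got {value!r}")
        usage[key] = int(value)
    return usage
```

Running provider usage through a helper like this before it reaches the tracer turns a silent type mismatch into an explicit error at the call site.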

3. Upload a dataset for evals

Can we still push eval datasets to Langfuse?

  • Upload a dataset (upload_dataset) — appears in Langfuse with all rows (input, expected output, metadata)
  • Parallel upload (ThreadPoolExecutor) — concurrent create_dataset_item calls succeed, no thread-safety errors
  • Upload a big dataset — every row lands, none dropped
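One way to verify the "none dropped" requirement for parallel uploads is to count successes and collect failures per row instead of letting one exception hide the rest. The sketch below is an assumption-laden illustration: `upload_rows` and the `create_item` callable are hypothetical stand-ins, and no real Langfuse call is made.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Iterable, List, Mapping, Tuple

Row = Mapping[str, Any]


def upload_rows(
    rows: Iterable[Row],
    create_item: Callable[[Row], Any],  # hypothetical: wraps the SDK call
    max_workers: int = 4,
) -> Tuple[int, List[Tuple[int, Exception]]]:
    """Upload rows concurrently; return (uploaded_count, [(row_index, error)])."""
    rows = list(rows)
    uploaded = 0
    failures: List[Tuple[int, Exception]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its row index so failures are traceable.
        futures = {pool.submit(create_item, row): i for i, row in enumerate(rows)}
        for future in as_completed(futures):
            try:
                future.result()
                uploaded += 1
            except Exception as exc:  # keep going; report every failed row
                failures.append((futures[future], exc))
    failures.sort(key=lambda item: item[0])
    return uploaded, failures
```

A caller can then assert `uploaded + len(failures) == len(rows)` and retry or report only the failed indexes.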

4. Run an eval + write scores

Do eval runs still create traces and write scores?

  • Run a batch eval (via cron evaluation_cron_job) — each row gets a trace, cosine score written via create_score
  • Run a fast eval (run_evaluation_fast, Celery) — dataset-run + scores written
  • Trace↔dataset-run linkage via api.dataset_run_items.create (replaces v2 dataset_item.observe()) — run items show in Langfuse UI
  • Cosine scores appear on traces with correct value/name/comment
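For reference when checking score values, here is a minimal sketch of a cosine score, assuming it is the standard cosine similarity between two equal-length embedding vectors. The PR does not say which vectors are compared or how zero vectors are handled, so both the function name and the zero-vector behaviour are assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors, in [-1.0, 1.0]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as "no similarity" rather than raising.
        return 0.0
    return dot / (norm_a * norm_b)
```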

5. Read scores back

Can we fetch eval scores from Langfuse?

  • Open an eval run's status (get_evaluation_run_status) — scores load via api.datasets.get_run + api.trace.get(fields="core,io,scores")
  • Hit refresh/resync (resync_score=true / force=true) — scores re-fetch, fields parsed (name/value/comment/data_type)
  • Concurrent trace.get (ThreadPoolExecutor) — no thread-safety errors
  • Cached merge path (crud/evaluations/core.py) returns consistent scores
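The field parsing mentioned above (name/value/comment/data_type) can be illustrated with a small, self-contained parser. This is a sketch only: `ParsedScore` and `parse_score` are hypothetical names, the input is assumed to be a plain mapping, and the camelCase `dataType` fallback is an assumption about the raw payload rather than something stated in this PR.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ParsedScore:
    name: str
    value: Any
    comment: Optional[str]
    data_type: Optional[str]


def parse_score(raw: Mapping[str, Any]) -> ParsedScore:
    """Extract the four fields the checklist names from one raw score mapping."""
    return ParsedScore(
        name=raw["name"],  # required: a score without a name is unusable
        value=raw.get("value"),
        comment=raw.get("comment"),
        # Assumption: accept snake_case first, then a camelCase variant.
        data_type=raw.get("data_type", raw.get("dataType")),
    )
```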

6. Cron / scheduled jobs

  • evaluation_cron_job + pending_jobs_cron_job run start-to-finish, no Langfuse errors
  • Nothing lost when a worker shuts down — flush() called on all paths
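A common way to guarantee "flush() called on all paths" is to put the flush in a `finally` block, for example through a context manager. The sketch below assumes only that the client object exposes a `flush()` method, as the checklist implies; `flush_on_exit` is a hypothetical name, not a helper from this PR.

```python
import logging
from contextlib import contextmanager
from typing import Any, Iterator

logger = logging.getLogger(__name__)


@contextmanager
def flush_on_exit(client: Any) -> Iterator[Any]:
    """Yield the client and flush it afterwards, on success and on error."""
    try:
        yield client
    finally:
        try:
            client.flush()
        except Exception:
            # A failed flush must not mask the task's own exception.
            logger.warning("flush failed", exc_info=True)
```

Wrapping a task body in `with flush_on_exit(client):` covers the normal return path and the exception path with one construct.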

Staging-Specific

  • Multi-tenant: 2+ orgs with different Langfuse keys — traces route to the correct project
  • Worker logs — format_langfuse_error keeps ApiError logs compact (status_code + body, no full HTTP header dump)
  • No latency regression on /responses from the new OTel-based client
  • Test against both Langfuse Pro plan and Hobby plan orgs — confirm tracing/scoring/datasets work on each (plan-tier limits or feature gating don't break the flow)
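The multi-tenant routing check above depends on each org getting its own client. A minimal sketch of a per-key client cache follows; `ClientRegistry` and the `factory` callable are hypothetical, and the factory stands in for whatever actually constructs a client from credentials read from the DB.

```python
import threading
from typing import Any, Callable, Dict, Tuple

Credentials = Tuple[str, str, str]  # (public_key, secret_key, host)


class ClientRegistry:
    """Cache one client per credential set so tenants never share a client."""

    def __init__(self, factory: Callable[[str, str, str], Any]) -> None:
        self._factory = factory  # hypothetical: builds the real client
        self._clients: Dict[Credentials, Any] = {}
        self._lock = threading.Lock()

    def get(self, public_key: str, secret_key: str, host: str) -> Any:
        key = (public_key, secret_key, host)
        # The lock prevents two workers from building duplicate clients.
        with self._lock:
            client = self._clients.get(key)
            if client is None:
                client = self._factory(public_key, secret_key, host)
                self._clients[key] = client
            return client
```

Keying on the full credential tuple means a rotated secret produces a fresh client instead of reusing a stale one.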

coderabbitai Bot commented Jun 30, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🏷️ Required labels (at least one) (1)
  • ready-for-review

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2109e151-3f20-4750-bfdc-f0fe0d20799a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


AkhileshNegi marked this pull request as ready for review on June 30, 2026 03:05

Port the Langfuse integration from the v2 SDK (2.60.3) to the OTel-based
v4 SDK (4.7.1). Scope is the SDK upgrade only; evaluation feature work
(score-fetching rewrite, dataset dedup, sample-index fan-out) is deferred
to a separate PR.

- core/langfuse: rewrite tracer + observe_llm_execution for v4 — explicit
  per-key clients (multi-tenant), start_observation/LangfuseSpan/
  LangfuseGeneration, usage_details, set_trace_attributes via OTel keys,
  format_langfuse_error for concise ApiError logs.
- crud/evaluations/langfuse: port the v2 calls that v4 removed —
  dataset_item.observe()/trace()/generation() -> start_observation +
  set_trace_attributes + dataset_run_items.create; langfuse.score ->
  create_score; trace.get(fields="core,io,scores") to avoid full-trace
  timeouts.
- tests: v4-adapt tracer + crud langfuse suites; add format_error tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

AkhileshNegi force-pushed the upgrade/langfuse-sdk branch from 9dd478b to f7e6f30 on June 30, 2026 04:09

github-actions Bot commented Jun 30, 2026

OpenAPI changes   ⚪ No API surface changes

Note

This PR does not modify the API contract.

main fba54265 · generated by oasdiff

codecov Bot commented Jun 30, 2026

Codecov Report

❌ Patch coverage is 97.96748% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| backend/app/crud/evaluations/langfuse.py | 88.46% | 3 Missing ⚠️ |
| backend/app/core/langfuse/langfuse.py | 96.61% | 2 Missing ⚠️ |


The format_langfuse_error helper was defined in the SDK v4 upgrade but had
no production callers, so Langfuse exception logs still rendered the full
ApiError string (every HTTP response header). Wire the helper into all
Langfuse-related except blocks across the tracer, observe_llm_execution, and
the evaluations CRUD so logs keep only status_code and body.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
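
The behaviour described in the commit message above (keep only `status_code` and `body`, drop the header dump) can be sketched as follows. This is not the `format_langfuse_error` implementation from the PR: the name `format_error_sketch`, the attribute lookup via `getattr`, and the body truncation are all assumptions made for illustration.

```python
def format_error_sketch(exc: BaseException, max_body: int = 500) -> str:
    """Render an exception compactly for logs.

    Assumes API errors expose ``status_code`` and ``body`` attributes;
    anything else falls back to the usual "Type: message" form.
    """
    status = getattr(exc, "status_code", None)
    if status is None:
        return f"{type(exc).__name__}: {exc}"
    body = str(getattr(exc, "body", ""))
    if len(body) > max_body:
        # Truncation is an addition of this sketch, to bound log line size.
        body = body[:max_body] + "..."
    return f"{type(exc).__name__}(status_code={status}, body={body})"
```

In an except block this would be used as `logger.error("Langfuse call failed: %s", format_error_sketch(e))`, so the headers carried in the exception's own string form never reach the log.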

AkhileshNegi linked an issue on Jun 30, 2026 that may be closed by this pull request
AkhileshNegi self-assigned this on Jun 30, 2026

Development

Successfully merging this pull request may close these issues.

Langfuse : Update to latest

1 participant