Maintenance: Langfuse SDK #985
Port the Langfuse integration from the v2 SDK (2.60.3) to the OTel-based v4 SDK (4.7.1). Scope is the SDK upgrade only; evaluation feature work (score-fetching rewrite, dataset dedup, sample-index fan-out) is deferred to a separate PR.
- core/langfuse: rewrite tracer + observe_llm_execution for v4: explicit per-key clients (multi-tenant), start_observation/LangfuseSpan/LangfuseGeneration, usage_details, set_trace_attributes via OTel keys, format_langfuse_error for concise ApiError logs.
- crud/evaluations/langfuse: port the v2 calls that v4 removed: dataset_item.observe()/trace()/generation() -> start_observation + set_trace_attributes + dataset_run_items.create; langfuse.score -> create_score; trace.get(fields="core,io,scores") to avoid full-trace timeouts.
- tests: v4-adapt tracer + crud langfuse suites; add format_error tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
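The "explicit per-key clients (multi-tenant)" idea can be sketched as a small registry that builds one client per public key and reuses it afterwards. This is only an illustration: PerKeyClients and the injected factory are hypothetical names, and the real tracer in this PR may construct and cache clients differently.

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

C = TypeVar("C")


class PerKeyClients(Generic[C]):
    """Cache one client per public key so tenants never share a client."""

    def __init__(self, factory: Callable[[str, str, str], C]) -> None:
        # factory(public_key, secret_key, host) builds a client. It is injected
        # so this sketch does not depend on the real SDK constructor.
        self._factory = factory
        self._clients: Dict[str, C] = {}
        self._lock = threading.Lock()

    def get(self, public_key: str, secret_key: str, host: str) -> C:
        # The lock keeps concurrent callers from building duplicate clients.
        with self._lock:
            client = self._clients.get(public_key)
            if client is None:
                client = self._factory(public_key, secret_key, host)
                self._clients[public_key] = client
            return client
```

Keying on the public key alone is an assumption made for brevity; a project that rotates secrets or hosts under the same public key would need a wider cache key.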
Force-pushed from 9dd478b to f7e6f30.
OpenAPI changes: no API surface changes. This PR does not modify the API contract.
The format_langfuse_error helper was defined in the SDK v4 upgrade but had no production callers, so Langfuse exception logs still rendered the full ApiError string (every HTTP response header). Wire the helper into all Langfuse-related except blocks across the tracer, observe_llm_execution, and the evaluations CRUD so logs keep only status_code and body.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
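A minimal sketch of what such a helper might look like, assuming the error object exposes status_code and body attributes as the commit message suggests. FakeApiError is a hypothetical stand-in so the example runs without the SDK installed; the actual format_langfuse_error in this PR may differ.

```python
from typing import Any


class FakeApiError(Exception):
    """Hypothetical stand-in for the SDK's ApiError (attributes are assumed)."""

    def __init__(self, status_code: int, body: Any, headers: dict) -> None:
        # Mimic a verbose default string that includes every header.
        super().__init__(
            f"headers: {headers}, status_code: {status_code}, body: {body}"
        )
        self.status_code = status_code
        self.body = body
        self.headers = headers


def format_langfuse_error(exc: BaseException) -> str:
    """Return a compact one-line description of a Langfuse exception.

    Errors carrying a status_code are reduced to status_code and body;
    anything else falls back to the exception type and message.
    """
    status_code = getattr(exc, "status_code", None)
    if status_code is None:
        return f"{type(exc).__name__}: {exc}"
    body = getattr(exc, "body", None)
    return f"{type(exc).__name__}(status_code={status_code}, body={body!r})"
```

In an except block this would be used as logger.error("Langfuse call failed: %s", format_langfuse_error(exc)) instead of logging the exception object directly.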
Issue
Closes #407
Checklist
Langfuse keys live in the DB (per org/project), not in env files. Make sure the test org has valid Langfuse keys (public_key, secret_key, host) set up before testing.
1. Chat / Response tracing
Does every AI response still show up in Langfuse?
- (POST /responses, async): a trace appears with input, output, and token/cost numbers, and ends cleanly
- (set_trace_attributes)
2. Background jobs (Celery)
Do traces still work when responses run as background jobs?
- (run_response_job, priority 9): trace shows up via observe_llm_execution
- langfuse_credentials threaded through ChainContext, each step logged
- usage_details are int-only (v4 dropped the unit field), no type errors on token counts
3. Upload a dataset for evals
Can we still push eval datasets to Langfuse?
- (upload_dataset): appears in Langfuse with all rows (input, expected output, metadata)
- (ThreadPoolExecutor): concurrent create_dataset_item calls succeed, no thread-safety errors
4. Run an eval + write scores
Do eval runs still create traces and write scores?
- (evaluation_cron_job): each row gets a trace, cosine score written via create_score
- (run_evaluation_fast, Celery): dataset-run + scores written
- api.dataset_run_items.create (replaces v2 dataset_item.observe()): run items show in Langfuse UI
5. Read scores back
Can we fetch eval scores from Langfuse?
- (get_evaluation_run_status): scores load via api.datasets.get_run + api.trace.get(fields="core,io,scores")
- (resync_score=true / force=true): scores re-fetch, fields parsed (name/value/comment/data_type)
- trace.get (ThreadPoolExecutor): no thread-safety errors
- (crud/evaluations/core.py) returns consistent scores
6. Cron / scheduled jobs
- evaluation_cron_job + pending_jobs_cron_job run start-to-finish, no Langfuse errors
- flush() called on all paths
Staging-Specific
- format_langfuse_error keeps ApiError logs compact (status_code + body, no full HTTP header dump)
- /responses from the new OTel-based client
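The int-only usage_details check in section 2 could be exercised with a small normalizer like the one below. It is a sketch under assumptions: to_usage_details is a hypothetical helper, and the key names (input, output, total) and the v2-style unit entry are illustrative rather than taken from this PR's code.

```python
from typing import Any, Dict, Mapping


def to_usage_details(usage: Mapping[str, Any]) -> Dict[str, int]:
    """Keep only integer token counts; drop 'unit' and non-numeric fields."""
    details: Dict[str, int] = {}
    for key, value in usage.items():
        if key == "unit":
            continue  # v2-style unit field has no place in usage_details
        if value is None or isinstance(value, bool):
            continue  # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, int):
            details[key] = value
        elif isinstance(value, float) and value.is_integer():
            details[key] = int(value)  # e.g. 5.0 -> 5
    return details
```

Silently dropping non-integer values is a design choice made for this sketch; raising on unexpected types may be preferable if malformed provider usage should surface as a test failure.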