feat(serverless): support forking checkpoints from a different W&B entity#741

Open
AnnaSuSu wants to merge 1 commit into OpenPipe:main from AnnaSuSu:feat/fork-checkpoint-from-entity

@AnnaSuSu

What

ServerlessBackend._experimental_fork_checkpoint builds the source W&B artifact path from the destination model's entity:

from_entity = model.entity or api.default_entity
collection_path = f"{from_entity}/{from_project}/{from_model}"

So a checkpoint can only be forked within the same entity. For example, forking willow-voice/willow_normal/kl-000-1 into wb-training/willow_normal/my-new-run fails because the artifact lookup happens under wb-training. The only workaround is to download the artifact from the source entity and re-upload it to the destination, which doubles storage.

Closes #649.

Fix

Add an optional from_entity parameter to _experimental_fork_checkpoint and resolve the source entity in the order from_entity → model.entity → api.default_entity. Entity resolution and path building move into a small pure helper, _wandb_checkpoint_collection_path, which also raises a clear ValueError when no entity can be determined (previously this produced a "None/…" path).

await backend._experimental_fork_checkpoint(
    model,                       # destination, e.g. wb-training/...
    from_model="kl-000-1",
    from_project="willow_normal",
    from_entity="willow-voice",  # NEW: source entity
)

This re-implements the approach from #676 (validated there but voluntarily closed by its author); credit to @poofeth for the original.

Tests

New tests/unit/test_serverless_fork_checkpoint.py, with no GPU or backend dependencies, so it runs on the base install:

  • _wandb_checkpoint_collection_path resolution: explicit from_entity wins → falls back to model entity → falls back to default entity → raises when none.
  • An async test drives _experimental_fork_checkpoint through the W&B branch with a fake wandb.Api and asserts the artifact query uses the explicit source entity (("lora", "src-entity/src-project/src-model")).

tests/unit/test_serverless_fork_checkpoint.py ..... [5 passed]
existing serverless unit tests: 12 passed (no regressions)
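
The fake-API approach in the async test can be sketched roughly as follows. Everything here is a hypothetical stand-in: FakeApi, its artifact_collection method, and fork_checkpoint are illustrative names, not the PR's test code or the real backend method. The sketch only shows how a recording fake lets a test assert on the queried path without contacting W&B.

```python
import asyncio
from typing import List, Optional, Tuple


class FakeApi:
    """Stand-in for wandb.Api that records artifact lookups (hypothetical)."""

    default_entity: Optional[str] = "default-entity"

    def __init__(self) -> None:
        self.queries: List[Tuple[str, str]] = []

    def artifact_collection(self, type_name: str, path: str) -> None:
        # Assumed lookup method: record the arguments instead of calling W&B.
        self.queries.append((type_name, path))


async def fork_checkpoint(
    api: FakeApi,
    model_entity: Optional[str],
    from_model: str,
    from_project: str,
    from_entity: Optional[str] = None,
) -> None:
    """Hypothetical stand-in for the W&B branch of the fork method."""
    # Same fallback order as the description: explicit, model, default.
    entity = from_entity or model_entity or api.default_entity
    if not entity:
        raise ValueError("no W&B entity available for the source checkpoint")
    api.artifact_collection("lora", f"{entity}/{from_project}/{from_model}")


def run_fork_with_explicit_entity() -> List[Tuple[str, str]]:
    """Drive the coroutine once and return the lookups the fake recorded."""
    api = FakeApi()
    asyncio.run(
        fork_checkpoint(
            api,
            model_entity="dst-entity",
            from_model="src-model",
            from_project="src-project",
            from_entity="src-entity",
        )
    )
    return api.queries
```

A test built this way would assert that the recorded queries equal [("lora", "src-entity/src-project/src-model")], which confirms that the explicit source entity takes precedence over the destination model's entity.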

ServerlessBackend._experimental_fork_checkpoint built the source artifact path
from the destination model's entity (model.entity or api.default_entity), so a
checkpoint could only be forked within the same W&B entity. Forking e.g.
willow-voice/willow_normal/kl-000-1 into wb-training/... failed because it looked
for the artifact under wb-training.

Add an optional from_entity parameter and resolve the source entity as
from_entity -> model.entity -> api.default_entity via a small pure helper
(_wandb_checkpoint_collection_path) that also raises a clear error when no entity
is available. Re-implements the approach validated in OpenPipe#676.

Closes OpenPipe#649
Development

Successfully merging this pull request may close these issues.

Add from_entity parameter to _experimental_fork_checkpoint
