feat(serverless): support forking checkpoints from a different W&B entity #741
Open
AnnaSuSu wants to merge 1 commit into
…tity

ServerlessBackend._experimental_fork_checkpoint built the source artifact path from the destination model's entity (model.entity or api.default_entity), so a checkpoint could only be forked within the same W&B entity. Forking e.g. willow-voice/willow_normal/kl-000-1 into wb-training/... failed because it looked for the artifact under wb-training.

Add an optional from_entity parameter and resolve the source entity as from_entity -> model.entity -> api.default_entity via a small pure helper (_wandb_checkpoint_collection_path) that also raises a clear error when no entity is available.

Re-implements the approach validated in OpenPipe#676. Closes OpenPipe#649
What
ServerlessBackend._experimental_fork_checkpoint builds the source W&B artifact path from the destination model's entity (model.entity or api.default_entity). So a checkpoint can only be forked within the same entity: forking e.g. willow-voice/willow_normal/kl-000-1 into wb-training/willow_normal/my-new-run fails because it looks for the artifact under wb-training. The only workaround is to download from the source entity and re-upload to the destination, doubling storage.

Closes #649.
Fix
Add an optional from_entity parameter to _experimental_fork_checkpoint and resolve the source entity as from_entity → model.entity → api.default_entity. The resolution and path building move into a small pure helper, _wandb_checkpoint_collection_path, which also raises a clear ValueError when no entity can be determined (previously this produced a "None/…" path).

This re-implements the approach from #676 (validated there but voluntarily closed by its author); credit to @poofeth for the original.
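For reviewers skimming without the diff, the resolution order can be sketched roughly as below. This is an illustrative stand-in, not the code in this PR: the function name resolve_collection_path, its parameter list, and the error message are assumptions. Only the fallback order, the ValueError, and the entity/project/name path shape come from the description.

```python
from typing import Optional


def resolve_collection_path(
    project: str,
    name: str,
    from_entity: Optional[str] = None,
    model_entity: Optional[str] = None,
    default_entity: Optional[str] = None,
) -> str:
    """Hypothetical stand-in for _wandb_checkpoint_collection_path."""
    # The first non-empty candidate wins: explicit source entity,
    # then the destination model's entity, then the API default.
    entity = from_entity or model_entity or default_entity
    if not entity:
        # Fail loudly instead of building a "None/..." path.
        raise ValueError(
            "Cannot determine the W&B entity of the source checkpoint; "
            "pass from_entity or set an entity on the model."
        )
    return f"{entity}/{project}/{name}"


# Cross-entity fork: the explicit source entity overrides the destination's.
print(
    resolve_collection_path(
        "willow_normal",
        "kl-000-1",
        from_entity="willow-voice",
        model_entity="wb-training",
    )
)  # prints willow-voice/willow_normal/kl-000-1
```

Keeping the resolution pure (no W&B calls) is what lets it be tested on the base install.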
Tests
New tests/unit/test_serverless_fork_checkpoint.py (no GPU/backend deps, runs on the base install):
- _wandb_checkpoint_collection_path resolution: explicit from_entity wins → falls back to model entity → falls back to default entity → raises when none.
- _experimental_fork_checkpoint: runs through the W&B branch with a fake wandb.Api and asserts that the artifact query uses the explicit source entity (("lora", "src-entity/src-project/src-model")).
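The fake-API style of test described above might look something like the following sketch. Everything here is hypothetical: FakeApi, fork_query, and the choice of an artifacts(type_name, name) method are assumptions for illustration, and the actual test in this PR may patch a different wandb.Api call. Only the expected query tuple is taken from the description.

```python
from typing import List, Optional, Tuple


class FakeApi:
    """Minimal stand-in for wandb.Api that records artifact queries."""

    def __init__(self, default_entity: Optional[str] = None) -> None:
        self.default_entity = default_entity
        self.queries: List[Tuple[str, str]] = []

    def artifacts(self, type_name: str, name: str) -> list:
        # Record the query instead of contacting W&B.
        self.queries.append((type_name, name))
        return []


def fork_query(
    api: FakeApi,
    project: str,
    name: str,
    from_entity: Optional[str] = None,
    model_entity: Optional[str] = None,
) -> None:
    """Hypothetical slice of the fork path: resolve the entity, then query."""
    entity = from_entity or model_entity or api.default_entity
    if not entity:
        raise ValueError("no W&B entity available for the source checkpoint")
    api.artifacts("lora", f"{entity}/{project}/{name}")


def test_explicit_source_entity_is_queried() -> None:
    api = FakeApi(default_entity="default-entity")
    fork_query(
        api,
        "src-project",
        "src-model",
        from_entity="src-entity",
        model_entity="dst-entity",
    )
    # The query must target the source entity, not the destination's.
    assert api.queries == [("lora", "src-entity/src-project/src-model")]


test_explicit_source_entity_is_queried()
```

Recording calls on a fake keeps the test free of network access and W&B credentials.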