Skip to content

feat(providers): support profile updates#1914

Open
johntmyers wants to merge 5 commits into
mainfrom
feat/1881-provider-profile-update/johntmyers
Open

feat(providers): support profile updates#1914
johntmyers wants to merge 5 commits into
mainfrom
feat/1881-provider-profile-update/johntmyers

Conversation

@johntmyers

Copy link
Copy Markdown
Collaborator

🏗️ build-from-issue-agent

Summary

Add safe custom provider profile updates through a new UpdateProviderProfiles RPC and openshell provider profile update. The update path validates profile batches before writing, preserves stored custom profile metadata, rejects built-in and missing profiles, and keeps provider-derived policy JIT-composed from current profiles.

Related Issue

Closes #1881

Changes

  • proto/openshell.proto: Added UpdateProviderProfiles RPC and request/response messages.
  • crates/openshell-server/src/grpc/provider.rs: Added custom profile update handling with validation, metadata preservation, built-in/missing rejection, and attached-sandbox dynamic token grant ambiguity checks.
  • crates/openshell-server/src/grpc/policy.rs: Added tests proving updated profile policy reaches sandbox effective config without rewriting provider instances or persisted sandbox source policy, and that profile changes affect provider env revision.
  • crates/openshell-cli/src/main.rs, crates/openshell-cli/src/run.rs: Added openshell provider profile update -f|--from.
  • CLI/server test mocks: Added the new RPC method to test OpenShell implementations and extended provider profile CLI lifecycle coverage.
  • docs/sandboxes/providers-v2.mdx, docs/sandboxes/manage-providers.mdx: Documented custom profile update semantics and rollout behavior.

Deviations from Plan

The plan preferred all-or-none server-side batch updates if cleanly supported. The existing persistence API does not provide transactional multi-object CAS, so this implementation validates the full batch before writes and documents the remaining concurrent-write/storage-error retry behavior instead of adding a broad transaction layer.

Testing

  • cargo test -p openshell-server update_provider_profile -- --nocapture
  • cargo test -p openshell-server sandbox_config_uses_updated_custom_provider_profile -- --nocapture
  • cargo test -p openshell-server provider_env_revision_changes_when_custom_profile_token_grant_changes -- --nocapture
  • cargo test -p openshell-cli provider_profile_commands_parse -- --nocapture
  • cargo test -p openshell-cli provider_profile_cli_run_functions_support_custom_profiles -- --nocapture
  • mise run pre-commit
  • E2E skipped: no e2e/ files changed

Tests added:

  • Unit: Provider profile update handler tests for success, built-in/missing rejection, metadata preservation, and dynamic-token ambiguity rejection.
  • Integration: Policy/config and CLI integration tests for updated profile effects and CLI lifecycle.
  • E2E: N/A.

Checklist

  • Follows Conventional Commits
  • Commit is signed off (DCO)
  • Documentation updated

@github-actions

Copy link
Copy Markdown

@johntmyers johntmyers added the gator:in-review Gator is reviewing or awaiting PR review feedback label Jun 15, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

PR Review Status

Validation: Project-valid. PR #1914 implements approved provider profile update work from #1881, is authored by a repo admin, is non-draft, and DCO/branch checks are passing.
Head SHA: 94022415e4e9122693afb17ad17c42b0cb754edf

Review findings:

  • proto/openshell.proto:1112 / crates/openshell-server/src/grpc/provider.rs:1370: the update RPC has no client-supplied resource_version, so stale local profile files can silently overwrite a newer profile. The handler fetches the current object and then matches that current version, which only catches a narrow race between fetch and write. Please expose profile metadata/resource versions on get/list or add an update item with expected_resource_version, then pass that value into WriteCondition::MatchResourceVersion.
  • crates/openshell-server/src/grpc/provider.rs:1366: multi-profile updates validate the batch, then write profiles one by one. If a later write fails from conflict or storage error, earlier profiles remain updated and the client receives an error without structured partial-state details. Please add transactional batch support for this path or reject multi-profile updates until atomic batch writes exist.

Docs: Fern docs were updated in existing provider pages; no navigation change is needed.

Next state: gator:in-review

@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

Re-check After Author Update

I re-evaluated latest head b22db325ad3c9efed935e8b70220df8f1fe18ef2 after the follow-up commit fix(providers): require CAS for profile updates pushed on 2026-06-15.

Disposition: partially resolved.

Resolved items:

  • The stale-overwrite concern is resolved in shape: update now requires a non-zero custom profile resource_version and writes with WriteCondition::MatchResourceVersion.
  • The partial multi-profile batch concern is resolved in shape: update is now a single-profile RPC/CLI operation, so there is no sequential multi-profile partial-write state.

Remaining items:

  • crates/openshell-server/src/grpc/provider.rs: profile update still validates dynamic token grant ambiguity against a snapshot of attached sandboxes, then writes the profile without holding the sandbox sync guard. A concurrent sandbox attach can validate against the old profile while the profile update validates against the old sandbox attachment set, leaving an ambiguous final dynamic-token state. Please serialize the update validation/write path with the same sandbox sync guard used by sandbox attach/detach, and hold it through the put_if write.
  • crates/openshell-cli/tests/provider_commands_integration.rs: the CLI lifecycle test manually injects resource_version from test state instead of exercising the documented export-edit-update workflow. Please add or adjust coverage so it imports a profile, exports YAML with the hydrated resource_version, edits that exported file while preserving the version, then runs provider profile update.

Docs: Fern docs are updated in the existing provider pages; no navigation change appears needed.

E2E: pending. This PR touches provider credential/policy behavior, so test:e2e should be applied once review feedback is resolved and the PR moves to pipeline watch.

Next state: gator:in-review

@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

Re-check After Author Update

I re-evaluated latest head f0209f9a0e929917f5c4d1a5679ab0b0cfcb6e66 after the follow-up commit fix(providers): serialize profile update validation pushed on 2026-06-15.

Disposition: partially resolved.

Resolved items:

  • The profile update path now holds sandbox_sync_guard while checking attached-sandbox dynamic token grant ambiguity and through the CAS write.
  • The CLI lifecycle coverage now exercises the documented export-edit-update workflow by exporting YAML with resource_version, editing it, and running provider profile update.

Remaining items:

  • crates/openshell-server/src/grpc/provider.rs / crates/openshell-server/src/grpc/sandbox.rs: the dynamic token-grant ambiguity invariant is still TOCTOU-prone because ImportProviderProfiles and CreateSandbox with initial providers are not under the same serialization guard. A racing sandbox create can validate providers against the old profile while a profile import/update scans the old sandbox set and writes a profile, leaving an ambiguous final token-grant state. Please put every writer participating in this invariant under the same guard through validation plus persistence, including provider profile import and sandbox creation with initial providers, and add regression coverage for the overlap.
  • crates/openshell-cli/src/main.rs / crates/openshell-cli/src/run.rs: this PR appears to remove openshell provider list -o json|yaml and the structured provider-list output path. That looks unrelated to provider profile updates and regresses machine-readable provider listing. Please restore the output argument and structured output path, or split that removal into a separate maintainer-approved UX change.

Docs: Fern docs are updated in the existing provider pages; no navigation change appears needed.

E2E: still pending. This PR touches provider credential/policy behavior, so test:e2e should be applied once review feedback is resolved and the PR moves to pipeline watch.

Next state: gator:in-review

@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

PR Review Status

Validation: Project-valid. PR #1914 implements the validated provider profile update work from #1881 and remains non-draft.
Head SHA: f0209f9a0e929917f5c4d1a5679ab0b0cfcb6e66

Review findings:

  • Still blocking: crates/openshell-server/src/grpc/provider.rs / crates/openshell-server/src/grpc/sandbox.rs: the dynamic token-grant ambiguity invariant remains TOCTOU-prone because ImportProviderProfiles and CreateSandbox with initial providers are not serialized with the same sandbox_sync_guard through validation and persistence. A racing import and sandbox create can each validate against incomplete state and leave an ambiguous final dynamic-token grant set. Please put those writers under the same guard and add regression coverage for that overlap.
  • Resolved from the previous gator note: the suspected openshell provider list -o json|yaml / structured provider-list regression is not present in the current head. Structured provider list-profiles -o json|yaml still routes through print_output_direct.

Docs: Fern docs are updated in the existing provider pages; no navigation change appears needed.

E2E: pending. This PR touches provider credential/policy behavior, so test:e2e should be applied once review feedback is resolved and the PR moves to pipeline watch.

Next state: gator:in-review

Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>
Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>
Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>
@johntmyers johntmyers force-pushed the feat/1881-provider-profile-update/johntmyers branch from f0209f9 to 8ee094b Compare June 16, 2026 14:51
@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

Re-check After Author Update

I re-evaluated latest head 8ee094b8f0acb8fbd114ad27cff6b723eb65b02b after the follow-up update to PR #1914.

Disposition: partially resolved.

Resolved items:

  • The stale-overwrite concern remains resolved: exported custom profiles carry resource_version, update requires a non-zero version, and the server writes with WriteCondition::MatchResourceVersion.
  • The partial multi-profile update concern remains resolved: update is now single-profile.
  • The CLI lifecycle coverage exercises the documented export-edit-update workflow with the hydrated resource_version.
  • The suspected structured provider-list regression is not present in the current head.

Remaining items:

  • crates/openshell-server/src/grpc/provider.rs / crates/openshell-server/src/grpc/sandbox.rs: the dynamic token-grant ambiguity invariant is still TOCTOU-prone because ImportProviderProfiles and CreateSandbox with initial providers are not serialized with the same sandbox_sync_guard through validation and persistence. UpdateProviderProfiles, sandbox attach, and sandbox detach are guarded, but import still validates attached-sandbox ambiguity before writing profiles without the guard, and sandbox create validates initial provider bindings before persisting without the guard. A racing import and sandbox create can each validate against incomplete state and leave an ambiguous final dynamic-token grant set.
  • Please put every writer participating in this invariant under the same guard through validation plus persistence, including provider profile import and sandbox creation with initial providers, and add regression coverage for the import/create overlap.

Docs: Fern docs are updated in the existing provider pages; no navigation change appears needed.

E2E: still pending. This PR touches provider credential/policy behavior, so test:e2e should be applied once review feedback is resolved and the PR moves to pipeline watch.

Independent reviewer result: confirmed the serialization issue as blocking; CAS/export/docs/test coverage otherwise looks satisfactory.

Next state: gator:in-review

Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>
@johntmyers

Copy link
Copy Markdown
Collaborator Author

Addressed the latest provider profile concurrency feedback in commit 3e0fe13. Profile imports now hold the shared sandbox/provider invariant guard from conflict validation through persistence, and sandbox creation with initial providers holds the same guard across provider existence/env-key validation through create. Added regression coverage for both guarded paths.\n\nVerification:\n- cargo test -p openshell-server provider_profile -- --nocapture\n- cargo test -p openshell-server create_sandbox -- --nocapture\n- mise run pre-commit

@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

Re-check After Author Update

I re-evaluated latest head 3e0fe13c3d3582a4a3c01dffd221ee8298457cf3 after the author's 2026-06-16 15:16 UTC update saying the provider profile concurrency feedback was addressed.

Disposition: partially resolved.

Resolved items:

  • The previous dynamic token-grant TOCTOU finding is resolved for this head. ImportProviderProfiles now holds the shared sandbox_sync_guard from validation through persistence, and sandbox creation with initial providers holds the same guard through provider validation and create.
  • Docs remain updated in the existing Fern provider pages; no navigation change appears needed.

Remaining items:

  • Blocking: proto/openshell.proto:1117, crates/openshell-cli/src/main.rs:933, crates/openshell-server/src/grpc/provider.rs:1403: the update API still has no stable target profile ID separate from the edited profile payload. The handler chooses the row to update from profile.id, so an exported profile file can be edited from id: a to id: b; if b exists at the same resource_version, the request updates b instead of rejecting an ID change. That violates feat(providers): support safe custom provider profile updates #1881's requirement to reject profile ID changes during update and risks wrong-profile overwrite. Please add an explicit target ID to the request/CLI, compare it against the payload ID, fetch/write by that target ID, and add a regression with two profiles at the same resource version where changing a's file to id: b leaves b unchanged.
  • Warning: crates/openshell-server/src/grpc/provider.rs:1472: profile delete still scans sandboxes_using_profile and then deletes without the shared sandbox/profile guard. A racing sandbox create/attach could validate while the profile exists while delete sees no users and removes it. Please consider putting the existence check, in-use scan, and delete under the same guard, ideally paired with a versioned delete.

Checks: required non-E2E checks are still running for this head (OpenShell / Branch Checks is pending). E2E remains not applied; this PR should get test:e2e after blocking review feedback is resolved and it moves to pipeline watch.

Next state: gator:in-review

@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

PR Review Status

Validation: Project-valid. PR #1914 implements the validated provider profile update work from #1881 and remains non-draft.
Head SHA: 3e0fe13c3d3582a4a3c01dffd221ee8298457cf3

Review findings:

  • Still blocking: proto/openshell.proto:1117, crates/openshell-cli/src/main.rs:933, crates/openshell-server/src/grpc/provider.rs:1403: independent review confirmed the update API still has no stable target profile ID separate from the edited profile payload. Resource versions are row-local, so an exported profile a can be edited to id: b and overwrite b when both rows have the same resource_version. Please add an explicit update target ID, compare it against the payload ID, fetch/write by the target ID, and add the two-profile same-version regression requested in the previous gator note.
  • Warning: crates/openshell-server/src/grpc/provider.rs:1493 / crates/openshell-server/src/grpc/provider.rs:1503: independent review also confirmed profile delete still has a check/delete race. Please consider holding the shared sandbox/profile guard through the existence check, in-use scan, and delete.

Docs: Fern docs are updated in existing provider pages; no navigation change appears needed.

Checks: required non-E2E checks are green. E2E is still not applied; this PR should get test:e2e after blocking review feedback is resolved and it moves to pipeline watch.

Next state: gator:in-review

Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>
@johntmyers johntmyers requested a review from maxamillion as a code owner June 16, 2026 21:01
@johntmyers

johntmyers commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed both follow-up review items in commit 41286ca6.

Changes:

  • Added explicit provider profile update target ID to the proto/CLI: openshell provider profile update <id> -f profile.yaml.
  • Server now fetches/writes by the explicit target ID and rejects payload ID changes before persistence.
  • Profile delete now holds the shared sandbox/profile guard across existence check, in-use scan, and delete.
  • Added regressions for payload ID mismatch leaving the other profile unchanged, and delete waiting on the shared guard.
  • Updated docs and architecture notes.

Verification:

  • cargo test -p openshell-server provider_profile -- --nocapture
  • cargo test -p openshell-cli provider_profile -- --nocapture
  • cargo test -p openshell-server sandbox_config_uses_updated_custom_provider_profile -- --nocapture
  • cargo test -p openshell-server provider_env_revision_changes_when_custom_profile_token_grant_changes -- --nocapture
  • mise run pre-commit

@johntmyers johntmyers added the test:e2e Requires end-to-end coverage label Jun 16, 2026
@github-actions

Copy link
Copy Markdown

Label test:e2e applied for 41286ca. Open the existing run and click Re-run all jobs to execute with the label set. The run will execute the standard E2E suite after building the required gateway and supervisor images once. The matching required CI gate status on this PR will flip green automatically once the run finishes.

@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

Re-check After Author Update

I re-evaluated latest head 41286ca6563b2b55336b048690173b4c1a7fa4ca after the author's 2026-06-16 update saying the target-ID and delete-race review items were addressed.

Disposition: resolved.

Remaining items:

  • No blocking code review findings remain.

Review status:

  • The explicit provider profile update target ID is now present in the proto and CLI, and the server fetches/writes by that target ID while rejecting payload ID changes before persistence.
  • The requested regression verifies an edited export cannot overwrite a different profile with the same resource version.
  • Profile delete now holds the shared sandbox/profile guard across existence check, in-use scan, and delete.
  • Independent reviewer result: confirmed the previous blockers are resolved and found no new blocking code issues.

Docs: Fern docs are updated in the existing provider pages; no navigation change appears needed.

Checks: test:e2e is applied. The prior E2E workflow completed before the label was applied, so I reran the existing E2E workflow for this head and will watch the required OpenShell / E2E gate.

Next state: gator:watch-pipeline

@johntmyers johntmyers added gator:watch-pipeline Gator is monitoring PR CI/CD status and removed gator:in-review Gator is reviewing or awaiting PR review feedback labels Jun 16, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

Pipeline Follow-Up Needed

Head SHA: 41286ca6563b2b55336b048690173b4c1a7fa4ca

test:e2e ran for this head and the required OpenShell / E2E gate failed. The failed job is e2e / E2E (rust-podman) in https://github.com/NVIDIA/OpenShell/actions/runs/27647920470/job/81774472618.

The failing test is forward_proxy_allows_l7_permitted_request from e2e/rust/tests/forward_proxy_l7_bypass.rs; it expected 200 for an L7-allowed GET but got {"status": 403, "error": "HTTP Error 403: Forbidden"}. The docker, podman-rootless, Python, and Kubernetes smoke E2E jobs passed.

I did not find an existing open issue or PR documenting this exact test failure. Because this PR touches provider, policy, and sandbox behavior, and the failure is in a required E2E policy/proxy path, this needs author follow-up before maintainer handoff.

Next action: @johntmyers, please investigate the rust-podman L7 allowed-request failure or point to evidence that it is an unrelated CI flake or infrastructure incident. Gator will continue watching after an update.

Next state: gator:follow-up-needed

@johntmyers johntmyers added gator:follow-up-needed Gator needs submitter or maintainer follow-up and removed gator:watch-pipeline Gator is monitoring PR CI/CD status labels Jun 16, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

Pipeline Re-check

Head SHA: 41286ca6563b2b55336b048690173b4c1a7fa4ca

The required OpenShell / E2E gate is still reported as failed from the earlier rust-podman failure, but the same E2E workflow has been rerun and e2e / E2E (rust-podman) is now in progress: https://github.com/NVIDIA/OpenShell/actions/runs/27647920470/job/81782470635

Because the required E2E job is actively running again, gator is moving this PR back to pipeline watch and will re-evaluate the E2E result after the rerun completes.

Next state: gator:watch-pipeline

@johntmyers johntmyers added gator:watch-pipeline Gator is monitoring PR CI/CD status and removed gator:follow-up-needed Gator needs submitter or maintainer follow-up labels Jun 16, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

Pipeline Follow-Up Needed

Head SHA: 41286ca6563b2b55336b048690173b4c1a7fa4ca

The rerun of the required OpenShell / E2E gate completed and failed again. The failing job is e2e / E2E (rust-podman) in https://github.com/NVIDIA/OpenShell/actions/runs/27647920470/job/81782470635.

The repeated failing test is forward_proxy_allows_l7_permitted_request from e2e/rust/tests/forward_proxy_l7_bypass.rs. It expected HTTP 200 for an L7-allowed GET but got {"status": 403, "error": "HTTP Error 403: Forbidden"}. The core E2E result failed with E2E_RESULT: failure; docker, podman-rootless, Python, and Kubernetes smoke E2E jobs passed.

Because this is a repeated failure in a required policy/proxy E2E path and this PR touches provider, policy, and sandbox behavior, this still needs author follow-up before maintainer handoff.

Next action: @johntmyers, please investigate the rust-podman L7 allowed-request failure or point to evidence that it is unrelated to this PR.

Next state: gator:follow-up-needed

@johntmyers johntmyers added gator:follow-up-needed Gator needs submitter or maintainer follow-up gator:watch-pipeline Gator is monitoring PR CI/CD status and removed gator:watch-pipeline Gator is monitoring PR CI/CD status gator:follow-up-needed Gator needs submitter or maintainer follow-up labels Jun 16, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

Pipeline Re-check

Head SHA: 41286ca6563b2b55336b048690173b4c1a7fa4ca

The required OpenShell / E2E gate is still reported as failed from the previous rust-podman failure, but the same E2E workflow has another e2e / E2E (rust-podman) run queued for this head: https://github.com/NVIDIA/OpenShell/actions/runs/27647920470/job/81789281695

Because the required E2E job is active again, gator is moving this PR back to pipeline watch and will re-evaluate the E2E result after the rerun completes.

Next state: gator:watch-pipeline

@johntmyers johntmyers added gator:approval-needed Gator completed review; maintainer approval needed and removed gator:watch-pipeline Gator is monitoring PR CI/CD status labels Jun 18, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator Author

gator-agent

Maintainer Approval Needed

Gator validation and PR monitoring are complete.

Validation: PR #1914 is project-valid because it implements the validated provider profile update work from #1881 and remains non-draft.
Review: Blocking review findings were resolved at head 41286ca6563b2b55336b048690173b4c1a7fa4ca; the independent reviewer found no new blocking code issues after the final target-ID/delete-race updates.
Docs: Fern docs were updated in the existing provider pages; no navigation change appears needed.
Checks: Required gates are green for this head: OpenShell / Branch Checks, OpenShell / Helm Lint, and OpenShell / E2E.
E2E: test:e2e is applied and the required OpenShell / E2E gate passed after rerun.

Human maintainer approval or merge decision is now required.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gator:approval-needed Gator completed review; maintainer approval needed test:e2e Requires end-to-end coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(providers): support safe custom provider profile updates

1 participant