
[FEAT] Support clone of cloud related entities - agentic PS, lookups and manual review #21

Merged
chandrasekharan-zipstack merged 11 commits into main from feat/clone-cloud-entity-phases
Jun 19, 2026

@chandrasekharan-zipstack chandrasekharan-zipstack commented Jun 18, 2026

Contributor

What

Extends unstract clone with full-fidelity cloud-only (enterprise) entity support: Lookups, Manual Review (HITL), and Agentic Prompt Studio — including share state / user-group replication — plus the foundation that lets these phases live in this OSS package while staying a no-op on OSS deployments.

Why

The clone migrated only OSS entities; an org using enterprise features got a green report that silently omitted the entire enterprise surface. These phases close that gap and reproduce each entity in the target as it is in the source, sharing included. They're gated by a capability probe, so an OSS run looks exactly like before.

How

Foundation

  • Capability probe: PlatformClient.probe(path) (200=present, 404=absent, else raise) + ctx.feature_present per-run cache + Phase.probe_path. The orchestrator probes source/target before a cloud phase and applies a skip matrix: source-absent → skip silently (no report row); source-present/target-absent → report.warnings + skip + continue; both present → run. Probe failure can't abort a run.
  • custom_tool now records a prompt remap (src→tgt prompt_id, matched by prompt_key; real + dry-run planned) so prompt-scoped cloud config can rewrite its FKs.
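
The skip matrix above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `Decision`, `ProbeError`, `ProbeCache`, `cloud_phase_decision`, and the `fetch_status` callable are hypothetical names, and the real `PlatformClient.probe` and `ctx.feature_present` signatures may differ.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Decision(Enum):
    RUN = "run"
    SKIP_SILENT = "skip_silent"  # source lacks the feature: no report row
    SKIP_WARN = "skip_warn"      # warning recorded, run continues


class ProbeError(Exception):
    """Raised for any probe status other than 200 or 404."""


class ProbeCache:
    """Per-run cache so each (side, path) pair is probed at most once."""

    def __init__(self, fetch_status: Callable[[str, str], int]) -> None:
        # fetch_status(side, path) -> HTTP status code (hypothetical hook)
        self._fetch_status = fetch_status
        self._seen: Dict[Tuple[str, str], bool] = {}

    def feature_present(self, side: str, path: str) -> bool:
        key = (side, path)
        if key not in self._seen:
            status = self._fetch_status(side, path)
            if status not in (200, 404):
                raise ProbeError(f"{side} probe of {path} returned {status}")
            self._seen[key] = status == 200
        return self._seen[key]


def cloud_phase_decision(cache: ProbeCache, path: str, warnings: List[str]) -> Decision:
    """Apply the skip matrix; a probe failure warns and skips, never aborts."""
    try:
        if not cache.feature_present("source", path):
            return Decision.SKIP_SILENT
        if not cache.feature_present("target", path):
            warnings.append(f"{path}: present on source, absent on target; skipped")
            return Decision.SKIP_WARN
    except ProbeError as exc:
        warnings.append(f"{path}: probe failed ({exc}); skipped")
        return Decision.SKIP_WARN
    return Decision.RUN
```

In this sketch the target is only probed once the source is known to have the feature, so a source-absent (OSS) run makes a single request per cloud phase.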

Sharing / user-group support (sharing.py)

  • The user/group axis mapping (groups via the group remap, users by email, owner/service-accounts skipped) is extracted into reusable helpers + a generic replicate_share() that's agnostic to the write mechanism. Existing /share/-POST phases (adapters, tools, workflows…) are unchanged.
  • Lookups and Agentic projects replicate shared_to_org + shared_users via their detail PATCH. Lookups have no group axis; Agentic group sharing is polymorphic/read-only on the write serializer and is warned, not silently dropped.
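
A rough sketch of a write-agnostic share replicator in the spirit of `replicate_share()`. The signature, the payload keys other than `shared_to_org` and `shared_users`, and the shape of the source share record (`email`, `is_service_account`, `shared_groups`) are assumptions made for illustration; the real helper in `sharing.py` may look different.

```python
from typing import Callable, Dict, FrozenSet, List


def replicate_share(
    source_share: dict,
    target_user_id_by_email: Dict[str, str],
    group_remap: Dict[str, str],
    write: Callable[[dict], None],
    include_groups: bool = True,
    skip_emails: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Map the user/group axes, then hand the payload to `write`.

    `write` can wrap a POST to a share endpoint or a detail PATCH; this
    function does not care which. Returns the warnings it collected.
    """
    warnings: List[str] = []
    users: List[str] = []
    for user in source_share.get("shared_users", []):
        email = user["email"]
        # Owner (passed in skip_emails) and service accounts are not replicated.
        if user.get("is_service_account") or email in skip_emails:
            continue
        target_id = target_user_id_by_email.get(email)
        if target_id is None:
            warnings.append(f"user {email} not found on target")
            continue
        users.append(target_id)

    payload = {
        "shared_to_org": bool(source_share.get("shared_to_org")),
        "shared_users": users,
    }
    source_groups = source_share.get("shared_groups", [])
    if include_groups:
        groups: List[str] = []
        for group_id in source_groups:
            target_group = group_remap.get(group_id)
            if target_group is None:
                warnings.append(f"group {group_id} has no remap on target")
                continue
            groups.append(target_group)
        payload["shared_groups"] = groups
    elif source_groups:
        warnings.append("group sharing present on source but not replicated")

    write(payload)
    return warnings
```

Passing the write step as a callable is what would let the existing share-POST phases and the detail-PATCH phases use the same mapping logic.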

Phases — each create-or-adopt-by-name, FK-remapped, dry-run aware, probe-gated, share-replicated:

  • lookups — definition + draft template/adapters + reference-file blobs + full published-version history (each source published version is replayed: stage content + that version's files onto the draft, publish in version_number order, record a lookup_version remap; then the draft is restored to the source's current draft) + assignments (draft- and published-pinned, resolved via the version remap; prompt + lookup_definition + variable_mappings UUID remap).
  • manual_review — workflow-scoped RuleEngine (+ nested confidence filters) and HITLSettings rebased onto the cloned workflow; org-level AutoApprovalSettings with auto_approved_users remapped by email; ReviewApiKey recreated (secret is server-minted → warns operator to re-wire external consumers).
  • agentic_studio — project (4 adapter FKs remapped) + prompt-versions (parent-before-child via a version remap) + schemas + settings; registry republished via the project export action (analog of custom_tool's registry republish).
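
The "parent-before-child via a version remap" ordering for prompt-versions might be sketched like this. The record fields `id` and `parent` and the `create` callable are hypothetical; the sketch only illustrates the ordering and remap bookkeeping, not the actual phase code.

```python
from typing import Callable, Dict, List, Optional


def order_parent_first(versions: List[dict]) -> List[dict]:
    """Return versions so that every parent precedes its children."""
    by_id = {v["id"]: v for v in versions}
    ordered: List[dict] = []
    seen = set()

    def visit(version: dict) -> None:
        if version["id"] in seen:
            return
        seen.add(version["id"])  # marked before recursing, so cycles terminate
        parent = by_id.get(version.get("parent"))
        if parent is not None:
            visit(parent)
        ordered.append(version)

    for version in versions:
        visit(version)
    return ordered


def clone_versions(
    versions: List[dict],
    create: Callable[[dict, Optional[str]], str],
) -> Dict[str, str]:
    """Create each version on the target, rewriting the parent FK via the remap."""
    remap: Dict[str, str] = {}
    for version in order_parent_first(versions):
        target_parent = remap.get(version.get("parent"))
        remap[version["id"]] = create(version, target_parent)
    return remap
```

Because parents are created first, the remap always holds a parent's target id by the time a child needs it.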

Can this PR break any existing features? If yes, list. If no, explain.

No. Every new phase is gated by probe_path; on a deployment without the feature (all OSS) the orchestrator skips it before it runs — no report row, no behavior change. The sharing.py change is a refactor that preserves the existing /share/-POST path (covered by existing tests). The only edit to an existing phase is custom_tool recording an extra prompt remap (additive; failures swallowed + warned). Full clone suite green (170 passed).

Notes on Testing

  • pytest tests/clone/ → 170 passed (fakes cover the foundation, all 3 phases, sharing replication, and lookup version replay). ruff clean on all touched files.
  • Create/patch payloads + response envelopes were confirmed against the cloud backend serializers/views (not guessed). The phases are validated against fakes + serializer-confirmed shapes; an end-to-end run against a live cloud deployment is the recommended next validation step (the probe keeps them inert until then on any deployment lacking the feature).

Residual gaps (logged, non-fatal)

  • Lookups: assignment_values_snapshot on historical published versions is backend-derived at publish time (assignments are recreated after publish), so frozen snapshot values are best-effort; version structure + pinning are reproduced.
  • Manual Review: ReviewApiKey secret can't be copied (re-minted, warned); auto_approved_document_classes are mixed workflow-id/label strings with no reliable remap → carried verbatim with a verify-on-target warning.
  • Agentic Studio: group sharing (polymorphic, read-only on the write serializer) is warned, not replicated; AgenticSetting is an org-global key/value store (no project FK) — cloned as a flat pass.

Related

🤖 Generated with Claude Code

…dio)

Extend the org clone with cloud-only (enterprise) entity support, gated so OSS
runs are unchanged.

Foundation:
- Capability probe (PlatformClient.probe + ctx.feature_present cache) and a
  Phase.probe_path gate. The orchestrator probes source/target before a cloud
  phase: absent on source -> skip silently (OSS looks like today); present on
  source but absent on target -> warn + skip + continue. report.warnings added.
- custom_tool now records a src->tgt prompt-id remap (matched by prompt_key,
  real + dry-run planned) so prompt-scoped cloud config can rewrite its FKs.

Phases (each create-or-adopt-by-name, FK remap, dry-run aware, probe-gated):
- lookups: definition + draft template/adapters + reference-file blobs +
  draft-pinned assignments (prompt + lookup + variable_mappings remap).
  v1 defers published-version replay (logged, non-fatal).
- manual_review: workflow-scoped RuleEngine (+ confidence filters) and
  HITLSettings rebased onto the cloned workflow, org-level AutoApprovalSettings,
  ReviewApiKey recreated (secret re-minted -> operator re-wire warning).
- agentic_studio: project (4 adapter FKs remapped) + prompt-versions
  (parent-before-child) + schemas + settings, registry republished via export.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CsGrHbs5SWmQkKqiimg6CF

greptile-apps Bot commented Jun 18, 2026

Greptile Summary

This PR extends unstract clone with three cloud-only (enterprise) phases — Lookups, Manual Review (HITL), and Agentic Prompt Studio — plus the capability-probe gate that silently no-ops all three on OSS deployments. It also refactors sharing.py into a generic replicate_share() helper, fixes a pre-existing bug where workflow endpoints with unmapped connectors were wholly skipped instead of partially patched, and restores pipeline active state on create.

  • Capability probe: PlatformClient.probe() + ctx.feature_present() cache + _cloud_phase_runnable() orchestrator gate; source-absent = silent skip, target-absent = warn+skip, both present = run.
  • Lookups phase: full definition clone with draft content, published-version history replay (template/adapter/file staging + publish), prompt assignment replication, and share state via PATCH.
  • Manual Review phase: per-workflow rule engine (with nested confidence filters) and HITL settings clone, org-level auto-approval settings with user remapping, review API key recreation (server-minted, operator warned).
  • Agentic Studio phase: project clone with four adapter FK remaps, prompt-version and schema child cloning with adopt-by-version-number idempotency, registry republish, and org-wide settings sync.

Confidence Score: 5/5

Safe to merge. All new cloud phases are probe-gated and have no effect on OSS deployments. The fixed issues (get_review_settings error swallowing, staging-failure publish guards, adopted-project child idempotency) are correctly addressed.

The three new phases are well-isolated behind the capability probe and covered by 170 tests. The previously reported bugs are properly fixed in this revision. The two remaining notes are minor quality items: a service-account fallback regression that only affects older backends with spurious warnings (no data impact), and a file-upload failure path that still allows publishing a version with missing reference files (template and adapter content are correct; only auxiliary blobs may be absent). Neither introduces wrong data or silently drops real entities.

src/unstract/clone/phases/lookups.py — version-replay file-failure handling; src/unstract/clone/sharing.py — is_service_account fallback removal

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/unstract/clone/phases/lookups.py | New 869-line phase; handles lookup definition clone, published-version history replay, draft restoration, reference-file transfer, and assignment replication. Template and adapter staging failures correctly bail before publishing; file-upload failures during version replay do not, leaving a possibility of publishing an incomplete version. |
| src/unstract/clone/phases/agentic_studio.py | New 587-line phase; clones agentic projects with adapter FK remapping, prompt-versions (parent-before-child sort), schemas (version-keyed idempotency), registry republish, and org settings. Adopt-by-version idempotency guards are present for both prompt versions and schemas. |
| src/unstract/clone/phases/manual_review.py | New 454-line phase; per-workflow rule engine (with nested confidence filters) and HITL settings cloning, org-level auto-approval with email-based user remapping, and review API key recreation with operator warning. The previously reported get_review_settings error-swallowing bug is fixed: only 500 is suppressed. |
| src/unstract/clone/client.py | Adds ~370 lines of new API methods for lookups, manual review, and agentic studio; adds probe(). get_review_settings is fixed to only suppress 500 (not all PlatformAPIError). Clean error handling throughout. |
| src/unstract/clone/sharing.py | Refactored into replicate_share() (generic) + apply_share_state() (thin /share/ POST wrapper). The is_service_account email-suffix fallback is removed; old backends without the is_service_account field will see spurious user-not-found warnings for platform service accounts. |
| src/unstract/clone/orchestrator.py | Adds _cloud_phase_runnable() probe gate and wires the three new phases into PHASES. Skip matrix (source-absent/target-absent/probe-failure) is correctly implemented; probe failures are non-fatal by design. |
| src/unstract/clone/phases/workflow_endpoint.py | Fixes unmapped-connector handling: instead of skipping the whole endpoint PATCH, the phase now sets connection_type and warns the operator, leaving the connector slot for UI re-binding. Correct and safe. |
| src/unstract/clone/phases/pipeline.py | Adds _restore_active_state() to disable an inactive source pipeline on the target after create (backend always force-activates on create). Clean and correctly guarded. |
| src/unstract/clone/phases/custom_tool.py | Adds _remap_prompts() to record source->target prompt-id remaps (matched by prompt_key) so downstream lookup assignments can rewrite their FKs. Changes adapter-resolution failure from hard-fail to best-effort/warn. Registry republish demoted from error to warning. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    O[Orchestrator: next phase] --> P{probe_path set?}
    P -- No --> RUN[Run phase normally]
    P -- Yes --> PS[probe source]
    PS -- 404 absent --> SS[Silent skip]
    PS -- other error --> WS[warn + skip]
    PS -- 200 present --> PT[probe target]
    PT -- 404 absent --> WT[warn + skip]
    PT -- other error --> WT
    PT -- 200 present --> RUN
    RUN --> LK{Phase?}
    LK -- lookups --> L1[list definitions] --> L2[parallel clone definitions]
    L2 --> L3[_clone_assignments]
    LK -- manual_review --> M1[per-workflow rules+settings]
    M1 --> M2[auto-approval settings]
    M2 --> M3[review API keys]
    LK -- agentic_studio --> A1[parallel clone projects]
    A1 --> A2[prompt-versions]
    A2 --> A3[schemas] --> A4[registry republish]
    A4 --> A5[org-wide settings]

Reviews (10): Last reviewed commit: "UN-3479 [DOCS] Add clone compatibility n..."

Comment thread src/unstract/clone/phases/lookups.py Outdated
chandrasekharan-zipstack and others added 3 commits June 18, 2026 19:54
…story

Make the cloned cloud entities match the source, including user-group sharing.

Sharing (sharing.py): extract the user/group axis mapping into reusable helpers
and add replicate_share(), generic over the write mechanism (POST /share/ or a
detail PATCH). Existing /share/-POST phases are unchanged (thin wrapper).
- lookups + agentic projects now replicate shared_to_org + shared_users
  (mapped by email) via their detail PATCH; lookups have no group axis, agentic
  group sharing is polymorphic/read-only and is warned, not dropped silently.

Lookups published-version replay: reproduce each source published version
(stage template + remapped adapters + that version's reference files onto the
draft, then publish in version_number order, recording a lookup_version remap),
then restore the draft to the source's current draft. Assignments now resolve
published-pinned versions via the version remap instead of being skipped.
Residual: assignment_values_snapshot is backend-derived at publish time (best
effort). Fixes list_lookup_versions to unwrap the {"versions": [...]} envelope.

Manual review: auto_approved_users remapped by email (was carried verbatim);
auto_approved_document_classes carried with a verify-on-target warning. MR rows
have no share fields (inherit workflow/org visibility).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CsGrHbs5SWmQkKqiimg6CF
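
The published-version replay described in the commit message above could be sketched roughly as below. The `stage`, `publish`, and `restore_draft` callables and the record fields `id` and `version_number` are assumptions for illustration. The early `continue` after a failed staging step reflects a guard added by a later commit in this PR.

```python
from typing import Callable, Dict, List


def replay_published_versions(
    source_versions: List[dict],
    stage: Callable[[dict], bool],
    publish: Callable[[dict], str],
    restore_draft: Callable[[], None],
) -> Dict[str, str]:
    """Replay source versions in version_number order and record a version remap.

    stage(version)   -> True if template, adapters, and files were staged on the draft
    publish(version) -> id of the version created on the target
    restore_draft()  -> puts the source's current draft content back on the draft
    """
    version_remap: Dict[str, str] = {}
    for version in sorted(source_versions, key=lambda v: v["version_number"]):
        if not stage(version):
            # Do not publish a version whose content could not be staged.
            continue
        version_remap[version["id"]] = publish(version)
    # The draft was overwritten during replay, so restore it last.
    restore_draft()
    return version_remap
```

Assignments pinned to a published version can then look up their target version id in the returned remap.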
…ns + MR api keys

- lookups: adopt existing target published versions by name instead of
  re-publishing (fixes spurious failures / duplicate version history on
  re-runs and adopted definitions)
- lookups: guard draft_cache so a failed fetch can't clobber a peer's
  valid draft id (TOCTOU)
- manual_review: adopt ReviewApiKey by (class_name, description) so
  re-runs don't create duplicate keys; only warn for keys actually minted
- tests for both adopt paths

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CsGrHbs5SWmQkKqiimg6CF
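
Adopting a `ReviewApiKey` by `(class_name, description)` so that re-runs stay idempotent might look like the sketch below. The function name and the `create` callable are hypothetical; only the two key fields come from the commit message above.

```python
from typing import Callable, Dict, List, Tuple


def adopt_or_create_keys(
    source_keys: List[dict],
    target_keys: List[dict],
    create: Callable[[dict], dict],
) -> List[dict]:
    """Create only the keys missing on the target; return the newly minted ones.

    The returned list is what the operator should be warned about, since
    only freshly minted keys carry a new server-generated secret.
    """
    existing: Dict[Tuple[str, str], dict] = {
        (key["class_name"], key["description"]): key for key in target_keys
    }
    minted: List[dict] = []
    for key in source_keys:
        identity = (key["class_name"], key["description"])
        if identity in existing:
            continue  # adopt the key already on the target
        created = create(key)
        existing[identity] = created
        minted.append(created)
    return minted
```

Running the function a second time against the updated target listing creates nothing and warns about nothing.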
Drop persona-mode markers and conversational phrasing from comments.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CsGrHbs5SWmQkKqiimg6CF
Comment thread src/unstract/clone/phases/lookups.py
chandrasekharan-zipstack and others added 2 commits June 18, 2026 20:34
Share axes are read-only on AgenticProjectSerializer, so a detail PATCH
silently no-ops. Route share replication through the dedicated share
action (POST agentic/projects/{id}/share/), which also handles the
polymorphic group axis — so agentic group shares now replicate like every
other shared resource. Drops the now-dead update_agentic_project_share
client method and the include_groups=False special-case.

Found while prepping live staging test scenarios for the clone tool.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CsGrHbs5SWmQkKqiimg6CF
Dry-run records planned group remaps as synthetic uuids; the share-payload
builder int()-cast them and crashed the whole run. Cast only real (digit)
pks, keep planned uuids as-is (never POSTed in dry-run).

Caught running a live dry-run against staging.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CsGrHbs5SWmQkKqiimg6CF
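
The fix described above (cast only real, digit-only pks) can be illustrated with a small helper. `to_group_pk` and `build_group_ids` are hypothetical names, not the functions in the share-payload builder.

```python
from typing import List, Union


def to_group_pk(value: Union[int, str]) -> Union[int, str]:
    """Cast real (digit-only) pks to int; leave planned dry-run ids untouched."""
    text = str(value)
    # isascii() guards against Unicode digits that isdigit() accepts
    # but int() rejects, such as superscript characters.
    if text.isascii() and text.isdigit():
        return int(text)
    return text


def build_group_ids(remapped: List[Union[int, str]]) -> List[Union[int, str]]:
    return [to_group_pk(value) for value in remapped]
```

A planned id such as a synthetic uuid passes through unchanged, which is acceptable because a dry-run never POSTs the payload.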
Comment thread src/unstract/clone/phases/agentic_studio.py
chandrasekharan-zipstack and others added 2 commits June 18, 2026 22:40
AgenticSetting.key is globally unique across orgs, so a create can collide
with a row owned by another org that isn't in this org's listing (surfaces
as a 500). That isn't data loss the clone can resolve — downgrade from a
hard failure to a warned skip.

Surfaced verifying a live staging clone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CsGrHbs5SWmQkKqiimg6CF
CustomToolPhase failed the whole tool on three recoverable conditions,
turning pre-existing source-config gaps into clone failures the operator
would blame on the clone. Mirror the tool unconfigured instead:

- No default profile on source: import with an empty adapter set. The
  backend auto-creates an unconfigured default profile and flags
  needs_adapter_config rather than rejecting the import.
- Missing target adapter remap: _resolve_target_adapter_ids is now
  best-effort — it resolves the adapters that map and omits the rest,
  so a partial set still imports (flagged needs_adapter_config).
- Registry republish 500 (stale/empty source registry): warn instead
  of fail. The tool itself cloned; only its registry entry is missing,
  so downstream tool_instances cascade-skip until re-published.

In every case the operator wires the adapters on target and re-runs the
(idempotent) clone to complete downstream. Frictionless-bound tools are
unchanged — still hard-skipped + cascaded, since the adapter is
cloud-only with no target equivalent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: 4f5ab436-eabb-4464-9a96-f024724bc818
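
A best-effort resolver in the spirit of the `_resolve_target_adapter_ids` change above could be sketched as follows. The role-keyed input dict and the return shape are assumptions for illustration; the real method's signature is not shown in this PR conversation.

```python
from typing import Dict, List, Optional, Tuple


def resolve_target_adapter_ids(
    source_adapters: Dict[str, Optional[str]],
    adapter_remap: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Resolve the adapters that map and report the rest instead of failing.

    source_adapters maps a role name to a source adapter id (None when the
    source has nothing configured for that role).
    """
    resolved: Dict[str, str] = {}
    unresolved: List[str] = []
    for role, source_id in source_adapters.items():
        if source_id is None:
            continue  # nothing configured on the source for this role
        target_id = adapter_remap.get(source_id)
        if target_id is None:
            unresolved.append(role)
        else:
            resolved[role] = target_id
    return resolved, unresolved
```

The caller can import the tool with the partial `resolved` set and emit one warning per entry in `unresolved`.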
@chandrasekharan-zipstack chandrasekharan-zipstack changed the title from "feat(clone): cloud entity phases — lookups, manual review, agentic studio" to "feat(clone): clone cloud entities (lookups, manual review, agentic studio) with share-state replication" Jun 18, 2026
Comment thread src/unstract/clone/client.py
@chandrasekharan-zipstack chandrasekharan-zipstack changed the title from "feat(clone): clone cloud entities (lookups, manual review, agentic studio) with share-state replication" to "[FEAT] Support clone of cloud related entities - agentic PS, lookups and manual review" Jun 18, 2026
…ntic re-run idempotency

- client.get_review_settings: only suppress the backend's DoesNotExist
  (500); re-raise 401/403/429 so an auth error can't silently drop a
  configured HITLSettings row.
- lookups._replay_one_version: return after a template/adapter staging
  failure instead of publishing a version with stale content.
- agentic _clone_prompt_versions / _clone_schemas: adopt children already
  on target (keyed by version) so a re-run against the same pair doesn't
  re-create duplicates — mirrors the lookups version-replay guard.

Tests added for each. The draft_cache TOCTOU (P2) is already mitigated:
the `is None` write guard guarantees a valid draft id is never overwritten
by a peer's failure; only a benign extra GET can occur.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: 4f5ab436-eabb-4464-9a96-f024724bc818
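
The narrowed error handling for `get_review_settings` described above might look like this sketch. The `PlatformAPIError` class here is a stand-in with an assumed `status_code` attribute, and the `fetch` callable is hypothetical; the real client method may be structured differently.

```python
from typing import Callable, Optional


class PlatformAPIError(Exception):
    """Stand-in for the client's error type; status_code is an assumed attribute."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message or f"platform API returned {status_code}")
        self.status_code = status_code


def get_review_settings(
    fetch: Callable[[str], dict],
    workflow_id: str,
) -> Optional[dict]:
    """Return the settings row, or None when the backend reports it missing.

    Only a 500 (how the backend surfaces a missing row, per the commit
    message) is treated as "no settings". Other errors such as 401, 403,
    or 429 propagate so they cannot hide a configured row.
    """
    try:
        return fetch(workflow_id)
    except PlatformAPIError as exc:
        if exc.status_code == 500:
            return None
        raise
```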
@chandrasekharan-zipstack chandrasekharan-zipstack force-pushed the feat/clone-cloud-entity-phases branch from 058e35b to cafe776 Compare June 18, 2026 20:33
…ion_type on unmapped connector

Clone faithfully mirrors a source pipeline's disabled state instead of
letting the backend's force-activate leave it running on the target
scheduler. Endpoint patch now sets connection_type even when a source
connector has no remap, so runs fail with a clear 'connector not
configured' instead of an invalid empty connection type.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CsGrHbs5SWmQkKqiimg6CF
…mments

- README: document `unstract clone` compatibility (capability-probed,
  match builds, pin client >= 1.4.0).
- Trim comments/docstrings that disclosed cloud-backend internals
  (model/serializer/method names, cross-org invariants, queryset
  semantics); WHY preserved, public API surface unchanged.
- Drop dead service-account email-suffix fallback; rely on the
  `is_service_account` flag the members API returns.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CsGrHbs5SWmQkKqiimg6CF
@chandrasekharan-zipstack chandrasekharan-zipstack merged commit 5bc13b4 into main Jun 19, 2026
3 checks passed
@chandrasekharan-zipstack chandrasekharan-zipstack deleted the feat/clone-cloud-entity-phases branch June 19, 2026 07:37

2 participants