Skip to content

[FIX] correct dry-run counts via planned-remap chaining#20

Merged
chandrasekharan-zipstack merged 2 commits into
mainfrom
fix/clone-dry-run-remap-chaining
Jun 18, 2026
Merged

[FIX] correct dry-run counts via planned-remap chaining#20
chandrasekharan-zipstack merged 2 commits into
mainfrom
fix/clone-dry-run-remap-chaining

Conversation

@chandrasekharan-zipstack

@chandrasekharan-zipstack chandrasekharan-zipstack commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

What

  • Fix unstract clone --dry-run so phase counts faithfully predict a real run. Adds a planned-remap mechanism so dependent phases plan-count without writing.

Why

  • Dry-run under-counted every phase downstream of a fresh create: a phase that would create a parent recorded no target id, so each dependent phase saw an empty remap and silently no-op'd — tool_instance, workflow_endpoint, pipeline, api_deployment reported 0 against a fresh target even though a real run creates them. It only looked right when the target already had the parents (the adopt path records a remap).
  • Dry-run also bucketed would-creates as skipped, so totals never matched the real run a dry-run exists to predict.

How

  • "Reads run, writes stub" contract — dry-run executes every read + dependency check, stubs only the POST/PATCH.
  • RemapTable.record_planned() mints a deterministic (uuid5) synthetic target id so dependent phases resolve the FK and plan-count; is_planned() flags them; snapshot(hide_planned=) masks them in the report.
  • Every create-capable phase's dry-run branch now counts in the bucket a real run would (created/adopted) and records a planned remap (group, adapter, connector, tag, custom_tool, files, workflow, tool_instance, workflow_endpoint, pipeline, api_deployment).
  • custom_tool runs its source-side validations (frictionless-adapter check, source-registry lookup → planned prompt_studio_registry remap) in dry-run so the plan reflects real create-vs-skip.
  • Phases doing live target lookups (tool_instance, workflow_endpoint, files) guard on is_planned — a synthetic parent id has no row on target, so they short-circuit instead of querying it (otherwise workflow_endpoint would count every endpoint as failed).
  • Report shows a DRY RUN banner; synthetic ids never reach the wire.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

  • No. Real-run (non-dry-run) behavior is unchanged: every change is inside an if dry_run branch, except two safe reorders that leave the real-run sequence identical — workflow_endpoint resolves the connector before the dry-run gate (real run: resolve → patch, same as before), and custom_tool._create_fresh moves the dry-run gate after the source-side validations (real run: validate → import, same as before). New report/remap fields default off and only affect dry-run. Full suite (incl. all non-dry happy-path tests) is green.

Database Migrations

  • None.

Env Config

  • None.

Relevant Docs

  • None.

Related Issues or PRs

  • Foundation (§7) for the planned cloud-entity clone support design.

Dependencies Versions

  • None (uses stdlib uuid only).

Notes on Testing

  • uv run pytest192 passed.
  • Updated per-phase dry-run tests to the new contract (created/adopted + no writes + planned remap recorded).
  • New coverage: RemapTable planned mechanics (determinism, is_planned, hide_planned masking) and planned-parent guards for tool_instance / workflow_endpoint (the end-to-end chain that previously regressed to 0).

Screenshots

Checklist

I have read and understood the Contribution Guidelines.

🤖 Generated with Claude Code

https://claude.ai/code/session_011ja9H1rnSXmPUgQtHm8TNS

Dry-run under-counted every phase downstream of a fresh create: a phase
that would create a parent recorded no target id, so dependent phases saw
an empty remap and no-op'd (tool_instance/endpoint/pipeline/api_deployment
showed 0 on a fresh target). Dry-run also counted would-creates as
`skipped`, so totals didn't match the real run it's meant to predict.

Fix centrally with a "reads run, writes stub" contract:
- RemapTable.record_planned() mints a deterministic synthetic target id so
  dependent phases resolve the FK and plan-count without writing;
  is_planned() flags them; snapshot(hide_planned=) masks them in the report.
- Every create-capable phase's dry-run branch now counts in the bucket a
  real run would (created/adopted) and records a planned remap.
- custom_tool runs its source-side validations (frictionless check, source
  registry lookup) in dry-run so the plan reflects real create-vs-skip.
- Phases doing live target lookups (tool_instance, workflow_endpoint, files)
  guard on is_planned to avoid querying a synthetic parent id.
- Report shows a DRY RUN banner; synthetic ids never reach the wire.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011ja9H1rnSXmPUgQtHm8TNS
@chandrasekharan-zipstack chandrasekharan-zipstack changed the title fix(clone): correct dry-run counts via planned-remap chaining [FIX] correct dry-run counts via planned-remap chaining Jun 18, 2026
@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes unstract clone --dry-run undercounting by introducing a "planned-remap" mechanism: when a dry-run would create an entity, RemapTable.record_planned() mints a deterministic synthetic UUID (via uuid5) so dependent phases can resolve the FK and plan-count without writing. Every create-capable phase's dry-run branch now increments the same bucket (created/adopted) a real run would, and a DRY RUN banner is added to the report.

  • Core mechanic (context.py): record_planned(), is_planned(), and snapshot(hide_planned=) are added to RemapTable; synthetic ids are tracked in _planned and masked to \"(planned)\" in the snapshot.
  • Phase updates (adapter, connector, tag, group, custom_tool, files, workflow, tool_instance, workflow_endpoint, pipeline, api_deployment): dry-run branches switched from skipped to created/adopted and record planned remaps; phases with synthetic-parent guards (tool_instance, workflow_endpoint, files) short-circuit live target lookups to avoid querying non-existent rows.
  • Tests: 192 passing; per-phase dry-run tests updated to the new contract, plus new RemapTable unit tests and planned-parent chain tests.

Confidence Score: 5/5

Safe to merge — all changes are gated behind if dry_run branches; real-run paths are unchanged and the test suite is green at 192 passed.

The implementation is well-structured and non-destructive: every new code path is inside an explicit dry-run guard, synthetic IDs never touch the wire, and the planned-remap chain correctly predicts counts for the core scenario (fresh create followed by dependent phases). The one inaccuracy identified is the planned-parent fast path in workflow_endpoint skipping the connector-remap guard — a minor overcounting risk limited to partial runs where connectors are excluded or failed.

src/unstract/clone/phases/workflow_endpoint.py — the planned-parent fast path at lines 87–98 could benefit from a connector-remap check to stay consistent with how _patch_endpoint handles unmapped connectors.

Important Files Changed

Filename Overview
src/unstract/clone/context.py Adds record_planned() / is_planned() / snapshot(hide_planned=) to RemapTable; uuid5-based determinism is correct, _planned set properly isolated from real ids.
src/unstract/clone/phases/custom_tool.py Refactors dry-run create/adopt to use record_planned; source-side validations run before the dry-run gate so frictionless-skip is accurately reflected. _record_planned_registry correctly mirrors the real-run registry-republish path as a read-only source lookup.
src/unstract/clone/phases/workflow_endpoint.py Adds a planned-parent fast path to avoid querying a non-existent planned workflow's endpoints, but the fast path omits the connector-remap check that _patch_endpoint performs, causing overcounting for partial runs with excluded/failed connectors.
src/unstract/clone/phases/tool_instance.py Planned-parent guard correctly short-circuits the live target lookup; dry-run adopt path now correctly counts adopted and records real remap. Logic is sound.
src/unstract/clone/report.py Adds dry_run field to CloneReport; DRY RUN banner is emitted in both rich and plain renderers. as_dict() also exports the new field.
tests/clone/test_remap_table.py New tests cover determinism, is_planned false-positive, and snapshot hide_planned masking. Good coverage of the new planned mechanics.
tests/clone/test_tool_instance_phase.py New test_dry_run_planned_workflow_predicts_create covers the chained planned-parent scenario. All updated assertions are consistent with the new contract.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant O as Orchestrator
    participant WF as WorkflowPhase
    participant TI as ToolInstancePhase
    participant WE as WorkflowEndpointPhase
    participant RT as RemapTable

    Note over O: dry_run=True, fresh target

    O->>WF: run(report)
    WF->>RT: record_planned("workflow", src_wf_id)
    RT-->>WF: synthetic_wf_id (uuid5)
    WF-->>O: "result.created += 1"

    O->>TI: run(report)
    TI->>RT: "snapshot() → {workflow: {src_wf_id: synthetic_wf_id}}"
    TI->>RT: is_planned(synthetic_wf_id) → True
    Note over TI: skip live target lookup
    TI->>RT: record_planned("tool_instance", src_ti_id)
    TI-->>O: "result.created += 1"

    O->>WE: run(report)
    WE->>RT: "snapshot() → {workflow: {src_wf_id: synthetic_wf_id}}"
    WE->>RT: is_planned(synthetic_wf_id) → True
    Note over WE: skip live endpoint listing
    loop each src_endpoint
        WE->>RT: record_planned("workflow_endpoint", src_ep_id)
        WE-->>O: "result.created += 1"
    end

    Note over O: remap_snapshot(hide_planned=True) → synthetics shown as "(planned)"
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant O as Orchestrator
    participant WF as WorkflowPhase
    participant TI as ToolInstancePhase
    participant WE as WorkflowEndpointPhase
    participant RT as RemapTable

    Note over O: dry_run=True, fresh target

    O->>WF: run(report)
    WF->>RT: record_planned("workflow", src_wf_id)
    RT-->>WF: synthetic_wf_id (uuid5)
    WF-->>O: "result.created += 1"

    O->>TI: run(report)
    TI->>RT: "snapshot() → {workflow: {src_wf_id: synthetic_wf_id}}"
    TI->>RT: is_planned(synthetic_wf_id) → True
    Note over TI: skip live target lookup
    TI->>RT: record_planned("tool_instance", src_ti_id)
    TI-->>O: "result.created += 1"

    O->>WE: run(report)
    WE->>RT: "snapshot() → {workflow: {src_wf_id: synthetic_wf_id}}"
    WE->>RT: is_planned(synthetic_wf_id) → True
    Note over WE: skip live endpoint listing
    loop each src_endpoint
        WE->>RT: record_planned("workflow_endpoint", src_ep_id)
        WE-->>O: "result.created += 1"
    end

    Note over O: remap_snapshot(hide_planned=True) → synthetics shown as "(planned)"
Loading

Reviews (2): Last reviewed commit: "fix(clone): custom_tool sub-paths own th..." | Re-trigger Greptile

Comment thread src/unstract/clone/phases/custom_tool.py
Address review: _create_fresh recorded the planned remap and _clone_one
re-recorded the same value, while the adopt path only recorded in
_clone_one. Drop the generic record in _clone_one; each sub-path
(adopt / fresh / fresh-dry-run) now records once, since only it knows
whether the target id is real or a planned synthetic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011ja9H1rnSXmPUgQtHm8TNS
@chandrasekharan-zipstack chandrasekharan-zipstack merged commit cd9bcf5 into main Jun 18, 2026
3 checks passed
@chandrasekharan-zipstack chandrasekharan-zipstack deleted the fix/clone-dry-run-remap-chaining branch June 18, 2026 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants