feat(logs): redact PII from workflow logs via configurable rules#5136
feat(logs): redact PII from workflow logs via configurable rules#5136TheodoreSpeaks wants to merge 4 commits into
Conversation
Enterprise PII redaction for workflow execution logs, configured under Data Retention as org-scoped rules (each rule picks entity types + which workspaces it applies to). Reuses the guardrails Presidio engine in mask mode at the log-persist choke point, with a check-digit-validated VIN recognizer. Also adds per-workspace data-retention-hours overrides.
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryHigh Risk Overview Extends data retention with per-workspace hour overrides ( Data Retention UI gains a PII rules section (list + modal with searchable entity grid and workspace targeting); retention hours and rules save separately. Guardrails gain a client-safe Reviewed by Cursor Bugbot for commit 1635ce1. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
@greptile review |
…rkflow-log # Conflicts: # packages/db/migrations/meta/0241_snapshot.json # packages/db/migrations/meta/_journal.json # scripts/check-api-validation-contracts.ts
| ): number | null { | ||
| const { workspaceSettings, orgSettings, key, fallback } = params | ||
| return workspaceSettings?.[key] ?? orgSettings?.[key] ?? fallback | ||
| } |
There was a problem hiding this comment.
Null retention treated as unset
High Severity
resolveEffectiveRetentionHours chains values with ??, so an explicit null retention hour (Forever in the UI) is skipped and replaced by the org value or enterprise plan fallback. Enterprise cleanup then schedules deletion when it previously skipped, and workspace overrides to Forever never win over org numeric defaults.
Additional Locations (2)
Reviewed by Cursor Bugbot for commit d9a4dd4. Configure here.
| // Settings are only writable by enterprise orgs, but re-verify at read time | ||
| // (e.g. a plan downgrade) before doing the work. | ||
| if (!(await isWorkspaceOnEnterprisePlan(workspaceId))) return payload | ||
|
|
There was a problem hiding this comment.
Enterprise check skips active redaction
Medium Severity
When org PII redaction rules are active, applyPiiRedaction still returns the raw payload if isWorkspaceOnEnterprisePlan is false. That helper returns false on lookup errors, so a transient billing/workspace failure bypasses masking and can persist unredacted PII despite enabled rules.
Reviewed by Cursor Bugbot for commit d9a4dd4. Configure here.
Greptile SummaryThis PR adds enterprise PII redaction for workflow execution logs, applying Microsoft Presidio in
Confidence Score: 4/5Safe to merge with one fix:
Important Files Changed
Reviews (3): Last reviewed commit: "fix(logs): widen PII entity visibleValue..." | Re-trigger Greptile |
| rule?.appliesToAllWorkspaces === true || | ||
| (Array.isArray(rule?.workspaceIds) && rule.workspaceIds.includes(params.workspaceId)) | ||
| ) | ||
|
|
||
| const union = new Set<string>() | ||
| for (const rule of applicable) { | ||
| if (!Array.isArray(rule.entityTypes)) continue | ||
| for (const t of rule.entityTypes) { | ||
| if (typeof t === 'string') union.add(t) | ||
| } | ||
| } | ||
| if (union.size === 0) return DEFAULT_PII_REDACTION | ||
| return { enabled: true, entityTypes: [...union] } | ||
| } |
There was a problem hiding this comment.
entityTypes: [] silently means "redact nothing", contradicting the schema JSDoc
resolveEffectivePiiRedaction contributes nothing to the union when a rule has entityTypes: [], so if all applicable rules have empty entity-type lists the union stays empty and the function returns enabled: false — no redaction. However, every site that documents entityTypes carries the JSDoc "Empty = redact all detected PII" (schema.ts PiiRedactionRule, primitives.ts piiRedactionRuleSchema, pii-redaction.ts PiiRedactionOptions). An enterprise operator who creates a catch-all rule with no entity types selected, expecting Presidio to mask every recognised PII type, will instead get zero redaction — a silent privacy leak. The Zod schema accepts entityTypes: [] without error, so the bug is reachable via the API. Either the resolution logic should treat an empty entity-type list as "mask all" (pass [] through, which Presidio interprets as all types), or the JSDoc needs to be corrected to "Empty = skip rule (redact nothing)".
| return value.length > 0 && Buffer.byteLength(value, 'utf8') <= PII_MAX_STRING_BYTES | ||
| } | ||
|
|
||
| /** | ||
| * Rebuild `value` replacing every eligible string leaf with `handle(leaf)`. | ||
| * Used for both collection (handle records and returns the input) and | ||
| * substitution (handle returns the masked value), so traversal order and |
There was a problem hiding this comment.
Oversized strings are never replaced with
REDACTION_FAILED_MARKER on the scrub path
When the total payload exceeds PII_MAX_TOTAL_BYTES, all eligible strings are replaced with REDACTION_FAILED_MARKER. But strings larger than PII_MAX_STRING_BYTES (128 KB) are excluded from collected entirely, so they pass through the substitution walk unchanged — even in the scrub/fail-safe branch. A block that emits a 200 KB string containing an email address or SSN will never be masked, regardless of whether the overall ceiling is hit or Presidio throws. The comment acknowledges this ("almost always base64 blobs") but does not log a warning when an oversized string is skipped, making it invisible to operators.
| } | ||
| } | ||
|
|
||
| /** | ||
| * Mask PII from log content before persistence when the execution's workspace | ||
| * (via workspace override or org default) has enterprise PII redaction enabled. | ||
| * Resolved at persist time so both the inline and externalized write paths are | ||
| * covered. Returns the payload unchanged when disabled or non-enterprise. | ||
| */ | ||
| private async applyPiiRedaction( | ||
| workspaceId: string | null, | ||
| payload: RedactablePayload | ||
| ): Promise<RedactablePayload> { | ||
| if (!workspaceId) return payload | ||
|
|
||
| const [row] = await db | ||
| .select({ orgSettings: organization.dataRetentionSettings }) | ||
| .from(workspace) | ||
| .leftJoin(organization, eq(organization.id, workspace.organizationId)) | ||
| .where(eq(workspace.id, workspaceId)) | ||
| .limit(1) | ||
| if (!row) return payload | ||
|
|
||
| const config = resolveEffectivePiiRedaction({ orgSettings: row.orgSettings, workspaceId }) | ||
| if (!config.enabled) return payload | ||
|
|
||
| // Settings are only writable by enterprise orgs, but re-verify at read time | ||
| // (e.g. a plan downgrade) before doing the work. | ||
| if (!(await isWorkspaceOnEnterprisePlan(workspaceId))) return payload | ||
|
|
||
| return redactPIIFromExecution(payload, { entityTypes: config.entityTypes }) | ||
| } | ||
|
|
||
| async completeWorkflowExecution(params: { | ||
| executionId: string |
There was a problem hiding this comment.
Unconditional DB join on every workflow completion
applyPiiRedaction always fires a workspace → LEFT JOIN organization query, plus a second isWorkspaceOnEnterprisePlan call when redaction is configured, regardless of whether the org has any PII rules or is even on an enterprise plan. For deployments where the majority of executions are personal/non-enterprise workspaces this adds two round-trips to the hot path of every workflow. A lightweight guard (e.g. caching the org's enterprise status or checking a feature-flag ahead of the query) would let the common case skip both calls entirely.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Greptile SummaryThis PR adds enterprise-grade PII redaction for workflow execution logs, a new per-workspace data-retention-hours override, and a check-digit-validated VIN recognizer. The core redaction pipeline is well-engineered: a deterministic two-pass collect/substitute approach batches all strings from a single execution into one chunked Presidio call, and a fail-safe ensures PII is never persisted when masking fails.
Confidence Score: 3/5The redaction pipeline and workspace-override logic are structurally sound, but three copies of a wrong comment on a security-sensitive field create a genuine risk of operators deploying empty-entity-type rules under the false impression that they cover all PII. The core machinery — two-pass collect/substitute, byte-chunked Presidio calls, fail-safe scrubbing, and the workspace ?? org ?? plan-default retention resolution — is implemented correctly. Three conflicting entityTypes comments in schema.ts, primitives.ts, and pii-redaction.ts say the opposite of what retention.ts implements. An enterprise admin who follows those comments and creates a rule with no entity types selected will have no PII masked at all — a silent failure in a privacy-protection feature. packages/db/schema.ts, apps/sim/lib/api/contracts/primitives.ts, and apps/sim/lib/logs/execution/pii-redaction.ts all carry the wrong entityTypes comment that contradicts retention.ts. Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant E as Executor
participant L as ExecutionLogger
participant R as retention.ts
participant P as pii-redaction.ts
participant V as validate_pii.ts
participant Py as validate_pii.py
participant DB as Postgres
E->>L: completeWorkflowExecution
L->>L: filterForDisplay + redactApiKeys
L->>DB: SELECT org.dataRetentionSettings
DB-->>L: orgSettings
L->>R: resolveEffectivePiiRedaction
R-->>L: enabled + entityTypes
alt PII enabled
L->>DB: isWorkspaceOnEnterprisePlan
DB-->>L: true
L->>P: redactPIIFromExecution
P->>P: collect string leaves
loop Per 256KB chunk
P->>V: maskPIIBatch
V->>Py: spawn subprocess
Py-->>V: masked strings
V-->>P: masked strings
end
P-->>L: redacted payload
end
L->>DB: UPDATE workflowExecutionLogs
L-->>E: WorkflowExecutionLog
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant E as Executor
participant L as ExecutionLogger
participant R as retention.ts
participant P as pii-redaction.ts
participant V as validate_pii.ts
participant Py as validate_pii.py
participant DB as Postgres
E->>L: completeWorkflowExecution
L->>L: filterForDisplay + redactApiKeys
L->>DB: SELECT org.dataRetentionSettings
DB-->>L: orgSettings
L->>R: resolveEffectivePiiRedaction
R-->>L: enabled + entityTypes
alt PII enabled
L->>DB: isWorkspaceOnEnterprisePlan
DB-->>L: true
L->>P: redactPIIFromExecution
P->>P: collect string leaves
loop Per 256KB chunk
P->>V: maskPIIBatch
V->>Py: spawn subprocess
Py-->>V: masked strings
V-->>P: masked strings
end
P-->>L: redacted payload
end
L->>DB: UPDATE workflowExecutionLogs
L-->>E: WorkflowExecutionLog
Reviews (2): Last reviewed commit: "Merge remote-tracking branch 'origin/sta..." | Re-trigger Greptile |
| export interface PiiRedactionOptions { | ||
| /** Presidio entity types to mask. Empty = redact all detected PII. */ | ||
| entityTypes: string[] | ||
| language?: string | ||
| } |
There was a problem hiding this comment.
Contradictory
entityTypes: [] semantics across the codebase
This comment says "Empty = redact all detected PII", but resolveEffectivePiiRedaction in retention.ts explicitly documents and implements the opposite: an empty union returns DEFAULT_PII_REDACTION (disabled), so empty entityTypes means redact nothing. The same misleading comment also appears on PiiRedactionRule.entityTypes in schema.ts and primitives.ts.
In practice, if an admin creates a rule with no entity types selected (believing the schema comment), resolveEffectivePiiRedaction returns enabled: false and applyPiiRedaction returns the payload unchanged — no PII is masked. The inconsistency is security-relevant: an operator who reads the schema or this interface and acts on it would mistakenly believe their logs are protected.
| /** Replaces text we could not safely mask, so PII is never persisted on failure. */ | ||
| export const REDACTION_FAILED_MARKER = '[REDACTION_FAILED]' |
There was a problem hiding this comment.
The
REDACTION_FAILED_MARKER label is reused for a deliberate performance-safety scrub (payload exceeds the 16 MB ceiling), which is not a failure of the Presidio service. An operator who sees [REDACTION_FAILED] in a log will reasonably assume the masking subprocess crashed, triggering unnecessary incident investigation. A distinct constant makes the two cases distinguishable without any logic change.
| /** Replaces text we could not safely mask, so PII is never persisted on failure. */ | |
| export const REDACTION_FAILED_MARKER = '[REDACTION_FAILED]' | |
| /** Replaces text we could not safely mask, so PII is never persisted on failure. */ | |
| export const REDACTION_FAILED_MARKER = '[REDACTION_FAILED]' | |
| /** | |
| * Replaces text when the payload exceeds {@link PII_MAX_TOTAL_BYTES} and NER | |
| * was deliberately skipped. Distinct from {@link REDACTION_FAILED_MARKER} so | |
| * operators can tell a size-based scrub from a Presidio service failure. | |
| */ | |
| export const REDACTION_OVERSIZED_MARKER = '[REDACTED_OVERSIZED]' |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| private async applyPiiRedaction( | ||
| workspaceId: string | null, | ||
| payload: RedactablePayload | ||
| ): Promise<RedactablePayload> { | ||
| if (!workspaceId) return payload | ||
|
|
||
| const [row] = await db | ||
| .select({ orgSettings: organization.dataRetentionSettings }) | ||
| .from(workspace) | ||
| .leftJoin(organization, eq(organization.id, workspace.organizationId)) | ||
| .where(eq(workspace.id, workspaceId)) | ||
| .limit(1) | ||
| if (!row) return payload | ||
|
|
||
| const config = resolveEffectivePiiRedaction({ orgSettings: row.orgSettings, workspaceId }) | ||
| if (!config.enabled) return payload | ||
|
|
||
| // Settings are only writable by enterprise orgs, but re-verify at read time | ||
| // (e.g. a plan downgrade) before doing the work. | ||
| if (!(await isWorkspaceOnEnterprisePlan(workspaceId))) return payload | ||
|
|
||
| return redactPIIFromExecution(payload, { entityTypes: config.entityTypes }) | ||
| } |
There was a problem hiding this comment.
Two extra DB round-trips per execution on the hot completion path
applyPiiRedaction always issues a JOIN query for org settings, then — if rules are active — calls isWorkspaceOnEnterprisePlan (another query). For the vast majority of executions (no PII rules configured, or non-enterprise workspace), this is wasted work on the critical log-persist path. The existing existingLog query at the top of completeWorkflowExecution already loads the workspace row; adding organization.dataRetentionSettings to that initial SELECT would remove the need for this second query entirely.
| /** Presidio entity types to mask. Empty = redact all detected PII. */ | ||
| entityTypes: string[] |
There was a problem hiding this comment.
This comment contradicts how
resolveEffectivePiiRedaction in retention.ts actually behaves. A rule with an empty entityTypes array contributes nothing to the applicable union, which makes the effective policy enabled: false — meaning nothing is redacted. The intended semantics (from retention.ts's own doc and the PR description) are "empty = redact nothing (rule contributes no types)".
| /** Presidio entity types to mask. Empty = redact all detected PII. */ | |
| entityTypes: string[] | |
| /** Presidio entity types to mask. Empty array = rule contributes no types (redact nothing). */ | |
| entityTypes: string[] |
| /** Presidio entity types to mask. Empty = redact all detected PII. */ | ||
| entityTypes: z.array(z.string().min(1, 'Entity type cannot be empty')).max(100), |
There was a problem hiding this comment.
Same misleading comment as on
PiiRedactionRule in schema.ts. resolveEffectivePiiRedaction treats an empty entityTypes array on a rule as contributing nothing to the union, resulting in enabled: false (no redaction). The comment should reflect that.
| /** Presidio entity types to mask. Empty = redact all detected PII. */ | |
| entityTypes: z.array(z.string().min(1, 'Entity type cannot be empty')).max(100), | |
| /** Presidio entity types to mask. Empty array = rule contributes no types (redact nothing). */ | |
| entityTypes: z.array(z.string().min(1, 'Entity type cannot be empty')).max(100), |
|
@greptile review |
| private async applyPiiRedaction( | ||
| workspaceId: string | null, | ||
| payload: RedactablePayload | ||
| ): Promise<RedactablePayload> { | ||
| if (!workspaceId) return payload | ||
|
|
||
| const [row] = await db | ||
| .select({ orgSettings: organization.dataRetentionSettings }) | ||
| .from(workspace) | ||
| .leftJoin(organization, eq(organization.id, workspace.organizationId)) | ||
| .where(eq(workspace.id, workspaceId)) | ||
| .limit(1) | ||
| if (!row) return payload | ||
|
|
||
| const config = resolveEffectivePiiRedaction({ orgSettings: row.orgSettings, workspaceId }) | ||
| if (!config.enabled) return payload | ||
|
|
||
| // Settings are only writable by enterprise orgs, but re-verify at read time | ||
| // (e.g. a plan downgrade) before doing the work. | ||
| if (!(await isWorkspaceOnEnterprisePlan(workspaceId))) return payload | ||
|
|
||
| return redactPIIFromExecution(payload, { entityTypes: config.entityTypes }) | ||
| } |
There was a problem hiding this comment.
applyPiiRedaction DB failure propagates to log persistence
The db.select() inside applyPiiRedaction has no error handling. A transient DB error (connection timeout, pool exhaustion) while looking up org PII settings will throw from this function, propagate through completeWorkflowExecution — which has no local catch around the applyPiiRedaction call — and cause the entire workflow execution log to fail to persist. The PR's own fail-safe principle ("scrubs rather than leaks on error") in redactPIIFromExecution is not applied here to the settings-lookup path. A try/catch around the entire method body returning payload on any error would prevent PII configuration lookup failures from destroying execution history.
…t lazy - Extend PII redaction to span error/errorMessage/toolCalls and top-level error/completionFailure/trigger/executionState (Bugbot: PII in execution metadata). executionState is safe to redact — resume reads from the separate pausedExecutions table, not the log copy. - Lazy-import validate_pii in pii-redaction so the Python/child_process guardrails module stays out of the static middleware/RSC graph. - Type the org retention mutation to the contract body (optional, non-null).
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 1635ce1. Configure here.
| }) | ||
| } | ||
|
|
||
| if (collected.length === 0) return payload |
There was a problem hiding this comment.
Oversized-only logs skip redaction
High Severity
When PII redaction is active, redactPIIFromExecution returns the original payload unchanged if every string leaf is skipped (e.g. over 128KB). Those fields are never masked or scrubbed, so sensitive text in large log values can still be persisted despite an enabled policy.
Reviewed by Cursor Bugbot for commit 1635ce1. Configure here.


Summary
maskmode, applied at the single log-persist choke point (completeWorkflowExecution), covering both the inline and externalized write paths.structuredClone), and fail-safe (scrubs rather than leaks on error).workspace ?? org ?? plan default) via a new nullableworkspace.data_retention_settingscolumn.ChipModal(grouped checkbox grid for entity types, workspace multiselect), persisting on save with an unsaved-changes guard.Type of Change
Testing
bun run lint,bun run check:api-validation:strict, andbun run check:migrations origin/stagingall pass. Migration is a single additive nullable column (expand-phase, backward-compatible).Checklist