feat: packages tb enrichment #4243
Signed-off-by: anilb <epipav@gmail.com>
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability. Example:
Projects:
Please add a Jira issue key to your PR title.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit e1c746d.
    coalesce(r.archived, 0) = 1,
    'archived',
    coalesce(snap.hasSnapshot, 0) = 0,
    'active',

Missing snapshot forces active lifecycle
Medium Severity
When no repoActivitySnapshot row exists for the linked repo, lifecycleLabel is set to active before abandoned, declining, or stable rules run. Repos with stale lastCommitAt or other repo-level signals can be mislabeled until snapshot replication catches up, contradicting the datasource note that a missing snapshot is “no signal.”
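One way to address this is to evaluate repo-level rules before the snapshot guard. The sketch below models the ordering in Python rather than in the pipe's SQL so it is easy to test. The 730-day threshold, the `days_since_last_commit` input, and the reduced rule set (only `abandoned` is modeled) are assumptions for illustration; the real cut-offs live in `ossPackages_enriched.pipe`.

```python
from typing import Optional

# Hypothetical threshold, not taken from the PR.
ABANDONED_AFTER_DAYS = 730


def _is_stale(days_since_last_commit: Optional[int]) -> bool:
    """True when a known last-commit age exceeds the (assumed) threshold."""
    return (
        days_since_last_commit is not None
        and days_since_last_commit > ABANDONED_AFTER_DAYS
    )


def label_current(archived: Optional[int], has_snapshot: Optional[int],
                  days_since_last_commit: Optional[int]) -> str:
    """Mirrors the reviewed ordering: the snapshot guard runs second."""
    if (archived or 0) == 1:
        return "archived"
    if (has_snapshot or 0) == 0:
        return "active"  # short-circuits before any staleness rule
    if _is_stale(days_since_last_commit):
        return "abandoned"
    return "active"


def label_reordered(archived: Optional[int], has_snapshot: Optional[int],
                    days_since_last_commit: Optional[int]) -> str:
    """Repo-level staleness is checked before the snapshot guard."""
    if (archived or 0) == 1:
        return "archived"
    if _is_stale(days_since_last_commit):
        return "abandoned"
    if (has_snapshot or 0) == 0:
        return "active"  # fallback only after repo-level rules have run
    # Snapshot-derived rules (declining, stable) would follow here; omitted.
    return "active"
```

With a repo that has no snapshot row and a last commit 1000 days ago, `label_current` returns `active` while `label_reordered` returns `abandoned`.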
    3,
    coalesce(dv.vulnerableDeps, 0) <= 5,
    1,
    0

No deps get dependency credit
Medium Severity
dependencyHealth awards the maximum five points whenever vulnerableDeps coalesces to zero, including when the package has no packageDependencies join row. That conflates “no dependency data” with “zero vulnerable direct deps,” inflating securitySupplyChainScore while signalCoverageHealth marks dependency_health as blocked.
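A minimal sketch of the distinction, again modeled in Python rather than the pipe's SQL: treat "no dependency data" as its own case instead of coalescing it to zero. The middle tier bound (`MID_TIER_MAX_VULNERABLE`) is hypothetical because the hunk does not show it, and awarding zero points for missing data is an assumption about the desired fix, not something the PR states.

```python
from typing import Optional

# Hypothetical bound for the 3-point tier; the hunk elides the real value.
MID_TIER_MAX_VULNERABLE = 2


def dependency_health(vulnerable_deps: Optional[int]) -> int:
    """Score 0-5; None means no packageDependencies row was joined."""
    if vulnerable_deps is None:
        return 0  # assumption: missing data earns no credit
    if vulnerable_deps == 0:
        return 5  # known-clean direct dependencies
    if vulnerable_deps <= MID_TIER_MAX_VULNERABLE:
        return 3
    if vulnerable_deps <= 5:
        return 1
    return 0
```

In the pipe this would mean branching on whether the join produced a row before applying `coalesce`; whether that is a NULL check or an explicit has-row flag depends on how the join is configured.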
Pull request overview
This PR introduces a Tinybird-based enrichment layer for OSS packages, producing a new materialized datasource (ossPackages_enriched_ds) that augments ossPackages with derived lifecycle and health scoring signals sourced from repo metadata, activity snapshots, maintainers, releases, vulnerabilities, and dependencies. It also updates the packages-db schema/replication to support the new snapshot feed and to persist enriched fields back into Postgres.
Changes:
- Add a new Tinybird pipe (`ossPackages_enriched.pipe`) that computes lifecycle + composite health scoring for packages and materializes results via a scheduled COPY.
- Add new Tinybird datasources for `repoActivitySnapshot` and the resulting `ossPackages_enriched_ds`.
- Add a packages-db migration to improve indexing, add sequin publication replication for `repo_activity_snapshot`, and add new enrichment columns to `packages`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| services/libs/tinybird/pipes/ossPackages_enriched.pipe | Builds package lifecycle/health scoring and signal coverage JSON; materializes into Tinybird on a schedule. |
| services/libs/tinybird/datasources/repoActivitySnapshot.datasource | Defines the repo activity snapshot datasource schema and storage engine settings used by the enrichment pipe. |
| services/libs/tinybird/datasources/ossPackages_enriched_ds.datasource | Defines the enriched OSS packages datasource schema that the pipe writes into. |
| backend/src/osspckgs/migrations/V1781539311__packages_tables_sequin_updates.sql | Adds an index, ensures sequin publication includes repo activity snapshots, and adds enriched columns to packages. |
    ENGINE ReplacingMergeTree
    ENGINE_PARTITION_KEY toYear(snapshotAt)
    ENGINE_SORTING_KEY repoId
    ENGINE_VER snapshotAt

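For context on this engine block: ReplacingMergeTree collapses rows that share a sorting key and keeps the one with the highest `ENGINE_VER` value, but only among rows in the same partition, and only once background merges have run. The Python model below is a sketch of the fully merged state under that rule; the function name and sample rows are made up for illustration.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Row = Tuple[str, datetime]  # (repoId, snapshotAt)


def merged_state(rows: List[Row]) -> List[Row]:
    """Per (partition, sorting key), keep the row with the highest version.

    Partition = year of snapshotAt, sorting key = repoId, version = snapshotAt.
    """
    latest: Dict[Tuple[int, str], Row] = {}
    for repo_id, snapshot_at in rows:
        key = (snapshot_at.year, repo_id)
        # On equal versions the later-inserted row wins.
        if key not in latest or snapshot_at >= latest[key][1]:
            latest[key] = (repo_id, snapshot_at)
    return sorted(latest.values())


rows = [
    ("repo-a", datetime(2024, 3, 1)),
    ("repo-a", datetime(2024, 11, 1)),
    ("repo-a", datetime(2025, 1, 5)),
]
survivors = merged_state(rows)
```

Here `survivors` holds two rows for `repo-a`, one per year. With `toYear(snapshotAt)` as the partition key, snapshots of one repo taken in different years are never merged into each other, so a consumer that expects one row per `repoId` would still need to select the latest row at query time.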
    coalesce(mh.maintainersCount, 0) > 0,
    'partial',
    'blocked'

    'security_practices',
    if(r.branchProtectionEnabled IS NULL, 'partial', 'available'),

    END IF;
    END$$;

    ALTER TABLE public.repo_activity_snapshot REPLICA IDENTITY FULL;

    ADD COLUMN IF NOT EXISTS maintainer_health_score smallint,
    ADD COLUMN IF NOT EXISTS security_supply_chain_score smallint,
    ADD COLUMN IF NOT EXISTS development_activity_score smallint,
    ADD COLUMN IF NOT EXISTS signal_coverage_health jsonb;

@epipav what exactly is going to be in signal_coverage_health? Do we know that?
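Judging only from the hunks visible in this review, the column appears to hold a map from signal name to a coverage status (`available`, `partial`, or `blocked`). A minimal Python sketch of that shape follows, using just the two signal names that appear above. The function name, its parameters, and the `available` branch for `dependency_health` are assumptions, and the pipe may emit more keys than shown.

```python
import json
from typing import Optional


def signal_coverage(branch_protection_enabled: Optional[bool],
                    has_dependency_rows: bool) -> str:
    """Builds a JSON map of signal name -> 'available' | 'partial' | 'blocked'."""
    coverage = {
        # From the hunk: NULL branch protection downgrades the signal to 'partial'.
        "security_practices": (
            "partial" if branch_protection_enabled is None else "available"
        ),
        # From the Bugbot note: no dependency rows means 'blocked'.
        # 'available' for the other branch is an assumption.
        "dependency_health": "available" if has_dependency_rows else "blocked",
    }
    return json.dumps(coverage, sort_keys=True)
```

Under these assumptions, a package whose repo has unknown branch protection and no dependency rows would serialize as `security_practices: partial` and `dependency_health: blocked`.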


No description provided.