Approved Python packages for the CDII DataHub Databricks platform.
This repository is the formal governance gate for all third-party Python packages used on DataHub clusters. A package must complete the vetting process and be merged into this repo before it may be installed on any cluster.
For the equivalent R package repo, see: chhsdata/datahub-pkg-repo-r
datahub-pkg-repo-python/
├── approved-packages.yml          # Registry of all approved packages
├── packages/
│   └── <package-name>/
│       └── <version>/
│           ├── *.whl              # Linux x86_64 binary wheel (install this on Databricks)
│           ├── *.tar.gz           # Source distribution (governance artifact)
│           ├── checksum.md        # SHA256 checksums verified against PyPI
│           ├── scan-results.md    # pip-audit CVE scan output
│           ├── license-review.md  # License compatibility review
│           └── test_import.py     # Import test: validates install on a live cluster
└── sample-notebooks/              # Reference notebooks showing approved package usage
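The layout above implies a mechanical completeness check that a reviewer or CI job could run before reading the artifacts themselves. A minimal sketch, assuming the filenames are exactly as listed in the tree; `missing_artifacts` and `check_repo` are hypothetical helpers, not existing tooling in this repo:

```python
from pathlib import Path

# Filenames taken from the repository layout (assumption: these names are exact).
REQUIRED_FILES = ("checksum.md", "scan-results.md", "license-review.md", "test_import.py")
# At least one file must match each of these patterns.
REQUIRED_PATTERNS = ("*.whl", "*.tar.gz")


def missing_artifacts(version_dir: Path) -> list[str]:
    """Return the required artifacts absent from one packages/<name>/<version>/ folder."""
    missing = [name for name in REQUIRED_FILES if not (version_dir / name).is_file()]
    missing += [pattern for pattern in REQUIRED_PATTERNS if not any(version_dir.glob(pattern))]
    return missing


def check_repo(root: Path) -> dict[str, list[str]]:
    """Map each incomplete packages/<name>/<version> folder to its missing artifacts."""
    report: dict[str, list[str]] = {}
    for version_dir in sorted((root / "packages").glob("*/*")):
        if not version_dir.is_dir():
            continue
        missing = missing_artifacts(version_dir)
        if missing:
            report[str(version_dir.relative_to(root))] = missing
    return report
```

Run from the repository root as `print(check_repo(Path(".")))`; an empty dict means every version folder contains all listed artifacts. The check only confirms the files exist, not that their contents are correct.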
To request approval for a new package, open a pull request containing the following artifacts. The PR will not be merged until every step below is complete and the branch protection checks pass.
Vetting checklist (all steps required):
- Source & Maintenance Review: Confirm the package is published on PyPI, identify the author, and check release frequency and GitHub activity.
- License Review: Confirm the license is permissive and compatible with California state government use (MIT, BSD-2/3-Clause, Apache-2.0, PSF). Copyleft licenses (GPL, LGPL, AGPL) require legal review before approval.
- Download & Checksum Verification: Download the Linux x86_64 .whl and the .tar.gz source distribution from PyPI. Verify their SHA256 checksums against the PyPI JSON API. Save the results to checksum.txt.
- CVE Scan: Run pip-audit against a requirements.txt pinned to the exact version (for example, pip-audit -r requirements.txt). Save the full output to scan-results.txt. Any findings must be resolved or formally accepted before merge.
- Write test_import.py: A minimal script that imports the package and confirms basic functionality. After installing the wheel, run it on a live cluster to validate that the package works as expected.
- Update approved-packages.yml: Add an entry with all fields populated (version, license, checksums, scan date, Jira story).
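The checksum step can be scripted against the PyPI JSON API, which publishes a SHA256 digest for every file of a release under `urls[].digests.sha256`. A minimal sketch using only the standard library; the function names are hypothetical, and the per-release endpoint `https://pypi.org/pypi/<name>/<version>/json` is PyPI's public JSON URL:

```python
import hashlib
import json
import urllib.request
from pathlib import Path

PYPI_RELEASE_URL = "https://pypi.org/pypi/{name}/{version}/json"


def fetch_release_metadata(name: str, version: str) -> dict:
    """Download PyPI's JSON metadata for one exact release."""
    url = PYPI_RELEASE_URL.format(name=name, version=version)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large wheels are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_pypi(files: list[Path], release: dict) -> dict[str, bool]:
    """Compare local files with the SHA256 digests PyPI lists for the release.

    Returns {filename: matches}; a file PyPI does not list maps to False.
    """
    published = {entry["filename"]: entry["digests"]["sha256"] for entry in release["urls"]}
    return {f.name: published.get(f.name) == sha256_of(f) for f in files}
```

For example, pass the downloaded .whl and .tar.gz paths together with `fetch_release_metadata("uuid-utils", "0.10.0")`; every value should be `True` before the digests are recorded. Keeping the network call separate from the comparison makes the comparison easy to test offline.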
Full process documentation: Confluence — CDII Package Governance
Approved .whl files are stored in this repository. Use the installation method that matches your workspace type (Unity Catalog or legacy).
Unity Catalog workspaces do not expose DBFS in the same way as legacy workspaces. Use the cluster Libraries UI instead:
- Go to Compute in the left sidebar and select your cluster
- Click the Libraries tab → Install new → Upload
- Upload the .whl file from this repository: packages/uuid-utils/0.10.0/uuid_utils-0.10.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Wait for the status to show Installed
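Once a library is installed, a quick notebook check (and a reasonable starting point for test_import.py) is to import the module and compare the installed distribution version with the approved one. A minimal sketch; `verify_install` is a hypothetical helper, and it reads the version through `importlib.metadata` rather than relying on a package-specific `__version__` attribute:

```python
import importlib
from importlib import metadata


def verify_install(module_name: str, dist_name: str, expected_version: str) -> tuple[bool, str]:
    """Import a module, then confirm the installed distribution matches the approved version.

    Returns (ok, message) instead of raising, so the result can be printed in a
    notebook cell or turned into a non-zero exit code by a test script.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return False, f"import of {module_name!r} failed: {exc}"
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False, f"distribution {dist_name!r} is not installed"
    if installed != expected_version:
        return False, f"{dist_name} {installed} is installed, but {expected_version} is approved"
    return True, f"{dist_name} {installed} imports and matches the approved version"
```

In a notebook cell: `print(verify_install("uuid_utils", "uuid-utils", "0.10.0"))`. The import name (`uuid_utils`) and the distribution name (`uuid-utils`) differ, which is why both are passed.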
Legacy (non-Unity Catalog) workspaces:
Step 1: Upload the wheel to DBFS
# Upload via Databricks CLI (run from your local machine)
databricks fs cp packages/uuid-utils/0.10.0/uuid_utils-0.10.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl \
  dbfs:/FileStore/packages/
Step 2: Install in your notebook using %pip
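# In a Databricks notebook cell. %pip installs a notebook-scoped library: it applies to this
# notebook's session only, not the whole cluster. For a cluster-wide install, use the cluster Libraries UI.
%pip install /dbfs/FileStore/packages/uuid_utils-0.10.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
# In a new cell, confirm the import works
import uuid_utils
print(uuid_utils.__version__)  # should print: 0.10.0
Notes: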
- Always use the exact .whl filename from approved-packages.yml. Never install from PyPI directly (pip install uuid-utils is not permitted).
- The wheel filenames encode the target platform. All wheels in this repo are built for Linux x86_64 (the Databricks cluster OS). Do not substitute macOS or Windows wheels.
- Version pinning is mandatory. Never use >= or omit the version.
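The pinning and platform rules in these notes can be checked mechanically before a requirement line or wheel reaches a cluster. A minimal sketch with hypothetical helper names; the platform check is deliberately strict and accepts only manylinux x86_64 tags, so it also rejects pure-Python `any` wheels, musllinux wheels, and ARM (aarch64) wheels:

```python
import re

# Exact "name==version" only: no >=, ~=, wildcards, extras or environment markers.
_EXACT_PIN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*\s*==\s*[A-Za-z0-9][A-Za-z0-9._+!-]*$")


def is_exact_pin(requirement: str) -> bool:
    """True only for a requirement line of the form 'name==version'."""
    return _EXACT_PIN.match(requirement.strip()) is not None


def platform_tags(wheel_filename: str) -> list[str]:
    """Split the platform tag of a wheel filename into its individual tags.

    Wheel filenames end in '-{python tag}-{abi tag}-{platform tag}.whl', and the
    platform tag may be a dot-separated (compressed) tag set.
    """
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_filename}")
    return wheel_filename[: -len(".whl")].rsplit("-", 1)[-1].split(".")


def is_linux_x86_64_wheel(wheel_filename: str) -> bool:
    """True if every platform tag in the filename is a manylinux x86_64 tag."""
    return all(
        tag.startswith("manylinux") and tag.endswith("_x86_64")
        for tag in platform_tags(wheel_filename)
    )
```

Applied to the approved uuid-utils wheel, `platform_tags` yields `["manylinux_2_17_x86_64", "manylinux2014_x86_64"]`, and both tags pass.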
approved-packages.yml is the authoritative registry of all approved packages.
Before installing any package, confirm it is listed in this file with a matching version.
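That lookup can also be done programmatically. A minimal sketch, assuming approved-packages.yml is a top-level YAML list of entries shaped like the example below and that PyYAML is available for `load_registry`; all helper names are hypothetical. Package names are compared after PEP 503 normalization, so `uuid_utils` and `uuid-utils` match:

```python
import re
from typing import Any, Optional

# Fields every registry entry is expected to carry (taken from the example entry).
REQUIRED_FIELDS = ("name", "version", "license", "whl_sha256", "cve_scan", "cve_scan_date", "jira_story")


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of '-', '_' and '.' become '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_approved(entries: list[dict[str, Any]], name: str, version: str) -> Optional[dict[str, Any]]:
    """Return the entry approving exactly name==version, or None if there is none."""
    wanted = normalize(name)
    for entry in entries:
        if normalize(str(entry.get("name", ""))) == wanted and str(entry.get("version")) == version:
            return entry
    return None


def missing_fields(entry: dict[str, Any]) -> list[str]:
    """List required fields that are absent or empty in a registry entry."""
    return [field for field in REQUIRED_FIELDS if entry.get(field) in (None, "")]


def load_registry(path: str = "approved-packages.yml") -> list[dict[str, Any]]:
    """Parse the registry (assumes PyYAML; imported here so the helpers above work without it)."""
    import yaml

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh) or []
```

For example, `find_approved(load_registry(), "uuid-utils", "0.10.0")` returns the entry whose `whl_sha256` a downloaded wheel should match. Versions are compared as strings, which is one reason to keep `version` quoted: unquoted, YAML would read a two-part version such as 0.10 as the float 0.1.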
# Example entry
- name: uuid-utils
  version: "0.10.0"
  license: BSD-3-Clause
  whl_sha256: 263b2589111c61decdd74a762e8f850c9e4386fb78d2cf7cb4dfc537054cda1b
  cve_scan: PASS
  cve_scan_date: "2026-05-18"
  jira_story: HUB-1630
To verify that a wheel file you already have on disk matches the approved checksum:
shasum -a 256 uuid_utils-0.10.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
# On Linux, sha256sum <file> produces the same digest
# Compare the output against whl_sha256 in approved-packages.yml

| Package | Version | License | Approved on | Jira |
|---|---|---|---|---|
| uuid-utils | 0.10.0 | BSD-3-Clause | 2026-05-18 | HUB-1630 |
Full policy, process, and rationale: Confluence — CDII Package Governance
Maintainer: CDII DataHub Platform Team