A secure, production-ready tool for bulk importing markdown files into Open Notebook.
Before using this script:
- Ensure your source directory contains no sensitive information (API keys, credentials, PII, etc.)
- The script includes security protections, but YOU are responsible for reviewing your data
- Only works with localhost Open Notebook instances for security
- See SECURITY.md for detailed security guidelines
bulk_import_to_open_notebook.py - Bulk import markdown files into Open Notebook via the API.
Use Case: Import documentation from a directory into your Open Notebook instance for knowledge management, search, and AI-powered interactions.
- Python 3.8 or higher
- Open Notebook running locally (default: http://localhost:5055)
- A notebook ID from your Open Notebook instance
# Clone the repository
git clone <your-repo-url>
cd ON_Bulk_Import
# Install dependencies
pip install -r requirements.txt
# Or use a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python bulk_import_to_open_notebook.py \
--notebook-id "notebook:abc123xyz" \
--source-dir ./my-documents
- Open Notebook must be running (default: http://localhost:5055)
- Notebook ID - Found in the Open Notebook URL when viewing a notebook
- Source directory with markdown files to import
Basic Import:
python bulk_import_to_open_notebook.py \
--notebook-id "notebook:abc123xyz" \
--source-dir ./my-documents
Auto-confirm (skip prompt):
python bulk_import_to_open_notebook.py \
--notebook-id "notebook:abc123xyz" \
--source-dir ./my-documents \
--yes
Skip Embedding (faster, but no vector search):
python bulk_import_to_open_notebook.py \
--notebook-id "notebook:abc123xyz" \
--source-dir ./my-documents \
--no-embed
Custom API URL and delay:
python bulk_import_to_open_notebook.py \
--notebook-id "notebook:abc123xyz" \
--source-dir ./my-documents \
--api-url "http://localhost:5055" \
--delay 1.0
Get help:
python bulk_import_to_open_notebook.py --help
| Option | Default | Required | Description |
|---|---|---|---|
| --notebook-id | - | ✅ Yes | Notebook ID to import into (e.g., "notebook:abc123") |
| --source-dir | - | ✅ Yes | Directory containing markdown files |
| --api-url | http://localhost:5055 | No | Open Notebook API URL (localhost only) |
| --pattern | *.md | No | File pattern to match |
| --no-embed | false | No | Skip embedding (faster, no vector search) |
| --delay | 0.5 | No | Delay between imports (seconds) |
| --yes, -y | false | No | Skip confirmation prompt |
| --transformations | None | No | Transformation IDs to apply |
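For readers who want to script around the tool, the options table maps onto a standard argparse parser roughly as sketched below. This is a hypothetical reconstruction of the interface, not the script's actual code; in particular, accepting several transformation IDs space-separated (nargs="*") is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(
        description="Bulk import markdown files into Open Notebook."
    )
    parser.add_argument("--notebook-id", required=True,
                        help='Notebook ID, e.g. "notebook:abc123"')
    parser.add_argument("--source-dir", required=True,
                        help="Directory containing markdown files")
    parser.add_argument("--api-url", default="http://localhost:5055",
                        help="Open Notebook API URL (localhost only)")
    parser.add_argument("--pattern", default="*.md",
                        help="File pattern to match")
    parser.add_argument("--no-embed", action="store_true",
                        help="Skip embedding (faster, no vector search)")
    parser.add_argument("--delay", type=float, default=0.5,
                        help="Delay between imports (seconds)")
    parser.add_argument("--yes", "-y", action="store_true",
                        help="Skip confirmation prompt")
    # Assumption: several IDs are passed space-separated after the flag
    parser.add_argument("--transformations", nargs="*", default=None,
                        help="Transformation IDs to apply")
    return parser
```

For example, build_parser().parse_args(["--notebook-id", "notebook:abc123", "--source-dir", "./docs"]) fills in delay=0.5 and api_url="http://localhost:5055" from the defaults.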
======================================================================
Open Notebook Bulk Import
======================================================================
Verifying notebook...
✓ Found notebook: My Documentation
Configuration:
API URL: http://localhost:5055
Notebook ID: notebook:abc123xyz...
Source Directory: ./my-documents
File Pattern: *.md
Embed Sources: True
Delay: 0.5s between imports
Security Limits:
Max file size: 10MB
Max total size: 500MB
Max files: 10000
Found 15 files to import
Proceed with import? [y/N]: y
[1/15] Importing: Getting Started
✓ Created source: src_xyz123
[2/15] Importing: Installation Guide
✓ Created source: src_xyz124
[3/15] Importing: API Reference
✓ Created source: src_xyz125
...
======================================================================
Import Complete!
======================================================================
Total files: 15
Successful: 15
Failed: 0
Imported sources:
1. Getting Started (ID: src_xyz123)
2. Installation Guide (ID: src_xyz124)
3. API Reference (ID: src_xyz125)
...
You can now view these sources in Open Notebook!
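The output above follows a simple pattern: import one file, report the new source ID, wait --delay seconds, and tally the results. A minimal sketch of that loop is below. import_one is a hypothetical stand-in for the real API call (this README does not document the endpoint), and deriving the title from the file name is an assumption; the script's own titles may come from elsewhere.

```python
import time
from pathlib import Path
from typing import Callable, List, Tuple


def run_imports(
    files: List[Path],
    import_one: Callable[[Path], str],
    delay: float = 0.5,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Import files one at a time, pausing `delay` seconds between requests.

    `import_one` is a hypothetical callable that creates one source and
    returns its ID; it stands in for the script's real API call.
    """
    succeeded: List[Tuple[str, str]] = []
    failed: List[str] = []
    total = len(files)
    for index, path in enumerate(files, start=1):
        title = path.stem  # Assumption: title taken from the file name
        print(f"[{index}/{total}] Importing: {title}")
        try:
            source_id = import_one(path)
        except Exception as exc:  # one bad file should not stop the batch
            print(f"  ✗ Failed: {type(exc).__name__}")
            failed.append(title)
        else:
            print(f"  ✓ Created source: {source_id}")
            succeeded.append((title, source_id))
        if index < total:
            time.sleep(delay)  # throttle requests to the local API
    print(f"Total files: {total}")
    print(f"Successful: {len(succeeded)}")
    print(f"Failed: {len(failed)}")
    return succeeded, failed
```

Catching per-file exceptions keeps one failure from aborting the run, which matches the separate Successful and Failed counts in the summary.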
This script includes comprehensive security protections:
- ✅ Localhost Only - API must be on localhost for security
- ✅ Path Validation - Prevents directory traversal attacks
- ✅ File Size Limits - 10MB per file, 500MB total
- ✅ Symlink Protection - Skips symbolic links
- ✅ Safe Patterns - Only allows markdown and plain-text file patterns (*.md, *.markdown, *.txt)
- ✅ Max Files Limit - Prevents processing more than 10,000 files
- ✅ Input Validation - Sanitizes notebook IDs and file paths
- ✅ Privacy Protection - Doesn't expose full system paths in logs
- ✅ Error Handling - Graceful failure without exposing sensitive information
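The localhost-only rule can be enforced by parsing the URL and comparing its host against an allowlist before any request is made. This is a sketch, assuming the accepted loopback names are localhost, 127.0.0.1, and ::1; the script's actual list may differ. Comparing urlparse(...).hostname rather than doing a substring check matters: it rejects look-alikes such as http://localhost.evil.com and http://localhost@evil.com.

```python
from urllib.parse import urlparse

# Assumption: the host names this sketch treats as "localhost"
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_localhost_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host is a loopback name."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # .hostname is lowercased, without the port, credentials, or IPv6 brackets
    return parsed.hostname in ALLOWED_HOSTS
```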
MAX_FILE_SIZE = 10MB # Maximum size per file
MAX_TOTAL_SIZE = 500MB # Maximum total import size
MAX_FILES = 10,000 # Maximum number of files
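A sketch of how these limits might be enforced while building the file list. It assumes 1 MB = 1024 * 1024 bytes and that exceeding the total-size or file-count limit aborts the run rather than truncating it; the README only confirms that oversized individual files are skipped.

```python
from pathlib import Path
from typing import List

# Assumption: 1 MB = 1024 * 1024 bytes; the script may define it differently.
MAX_FILE_SIZE = 10 * 1024 * 1024    # per file
MAX_TOTAL_SIZE = 500 * 1024 * 1024  # whole import
MAX_FILES = 10_000                  # number of files


def select_within_limits(candidates: List[Path]) -> List[Path]:
    """Skip files over the per-file limit; raise if the batch exceeds the
    total-size or file-count limit (an assumed policy for this sketch)."""
    selected: List[Path] = []
    total_bytes = 0
    for path in candidates:
        size = path.stat().st_size
        if size > MAX_FILE_SIZE:
            # Report only the file name, not the full system path
            print(f"Skipping {path.name}: exceeds per-file size limit")
            continue
        if len(selected) + 1 > MAX_FILES or total_bytes + size > MAX_TOTAL_SIZE:
            raise ValueError("Import exceeds the file-count or total-size limit")
        selected.append(path)
        total_bytes += size
    return selected
```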
The script guards against:
- Path traversal attacks (../../../etc/passwd)
- Symlink attacks pointing outside the source directory
- Oversized files causing denial of service
- System directory access (/etc, /sys, etc.)
- Files with dangerous characters in names
- Non-localhost API connections
- Malicious notebook IDs
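Path traversal and symlink escapes are usually blocked by resolving every candidate path and checking that it stays inside the resolved source directory. The sketch below does that, and also skips symlinks and hidden files (hidden-file exclusion is one of the behaviors the security tests check). It assumes a recursive search with Path.rglob; whether the real script descends into subdirectories is not stated here. os.path.commonpath is used instead of Path.is_relative_to because the latter needs Python 3.9, while the README targets 3.8.

```python
import os
from pathlib import Path
from typing import List


def safe_markdown_files(source_dir: str, pattern: str = "*.md") -> List[Path]:
    """Collect matching files that really live inside source_dir.

    Sketch only: skips symlinks and hidden entries, and drops anything whose
    resolved path escapes the resolved source directory.
    """
    root = Path(source_dir).resolve(strict=True)  # raises if missing
    if not root.is_dir():
        raise NotADirectoryError(source_dir)
    results: List[Path] = []
    for path in sorted(root.rglob(pattern)):  # Assumption: recursive search
        if any(part.startswith(".") for part in path.relative_to(root).parts):
            continue  # hidden file, or file inside a hidden directory
        if path.is_symlink() or not path.is_file():
            continue  # never follow links out of the tree
        resolved = path.resolve()
        if os.path.commonpath([str(root), str(resolved)]) != str(root):
            continue  # resolved path escapes the source directory
        results.append(resolved)
    return results
```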
Before running the script:
- Verify source directory contains no API keys, tokens, or credentials
- Check for PII or confidential information in markdown files
- Ensure Open Notebook API is only accessible on localhost
- Review files to be imported (script shows count before import)
- Use the --yes flag only after you've verified the source directory
See SECURITY.md for:
- Complete security guidelines
- Vulnerability reporting process
- Best practices for users and maintainers
- Detailed threat model
Run the security test suite:
# Execute all security tests
./test_security.sh
# Expected output:
# ✓ All security tests passed!
The test suite validates:
- Path traversal protection
- Symlink protection
- File size limits
- Hidden file exclusion
- Localhost-only API access
- And more...
# Install main dependencies
pip install -r requirements.txt
# Install development tools
pip install pre-commit bandit safety black flake8 isort
# Install pre-commit
pip install pre-commit
# Install the git hooks
pre-commit install
# Run manually on all files
pre-commit run --all-files
Pre-commit hooks automatically check for:
- Secrets and credentials (Gitleaks, detect-secrets)
- Private keys and AWS credentials
- Security issues (Bandit)
- Code quality (Black, flake8, isort)
- File format issues
For troubleshooting, set the DEBUG environment variable:
export DEBUG=1
python bulk_import_to_open_notebook.py --notebook-id "..." --source-dir ./docs
This shows full stack traces for errors.
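One common way to implement a DEBUG switch like this is to print a short message by default and the full traceback only when the variable is set. The helper below is hypothetical and assumes any value other than empty or "0" turns debugging on; the script's exact rule is not documented here.

```python
import os
import sys
import traceback


def report_error(exc: BaseException) -> None:
    """Print a one-line error, or the full traceback when DEBUG is set.

    Hypothetical helper: treats any DEBUG value except "" and "0" as enabled.
    """
    if os.environ.get("DEBUG", "") not in ("", "0"):
        traceback.print_exception(type(exc), exc, exc.__traceback__)
    else:
        print(f"Error: {type(exc).__name__}: {exc}", file=sys.stderr)
```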
Q: Does this script send my data to external services?
A: No. All processing happens locally on your machine. The only network request is to your local Open Notebook instance (localhost). No data is sent to external services.
Q: What happens to my files during import?
A:
- Files are read from your local source directory
- Content is sent to your local Open Notebook instance
- Original files are never modified or deleted
- Nothing is uploaded to external services
Q: Can I use this with a remote Open Notebook instance?
A: No. For security reasons, the script only allows localhost connections. To reach a remote Open Notebook instance, use SSH tunneling:
# On your local machine, create SSH tunnel
ssh -L 5055:localhost:5055 user@remote-server
# Then run the script normally (it connects to localhost:5055)
python bulk_import_to_open_notebook.py \
--notebook-id "notebook:abc123" \
--source-dir ./docs
Q: What happens to files larger than 10MB?
A: Files larger than 10MB are skipped. If you need to import larger files:
- Split them into smaller chunks (recommended)
- Modify the MAX_FILE_SIZE constant in the script (not recommended for public use)
Q: How do I find my notebook ID?
A:
- Open your notebook in Open Notebook
- Look at the URL in your browser
- Find the part after /notebooks/
- Important: If you see %3A in the URL, replace it with : (colon)
Example 1 - Simple URL:
URL: http://localhost:5055/notebooks/notebook:abc123xyz
ID: notebook:abc123xyz
Example 2 - URL-encoded (most common):
URL: http://127.0.0.1:8502/notebooks/notebook%3A075hgk2qbdgoka54yug5
↑ This %3A is a colon (:)
ID: notebook:075hgk2qbdgoka54yug5
↑ Use the decoded version with a real colon
Quick Rule: %3A in URLs = : (colon) in notebook IDs
📖 For detailed instructions with more examples, see NOTEBOOK_ID_GUIDE.md
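If you would rather not decode the ID by hand, Python's standard library can extract it from the browser URL. notebook_id_from_url is a hypothetical helper, not part of the script; it takes the path segment after /notebooks/ and runs it through urllib.parse.unquote, which turns %3A back into a colon.

```python
from urllib.parse import unquote, urlparse


def notebook_id_from_url(url: str) -> str:
    """Return the URL-decoded path segment that follows /notebooks/."""
    path = urlparse(url).path
    marker = "/notebooks/"
    if marker not in path:
        raise ValueError("URL does not contain /notebooks/")
    segment = path.split(marker, 1)[1].split("/", 1)[0]
    return unquote(segment)  # %3A becomes ":"
```

For example, notebook_id_from_url("http://127.0.0.1:8502/notebooks/notebook%3A075hgk2qbdgoka54yug5") returns "notebook:075hgk2qbdgoka54yug5".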
Q: Can I import file types other than .md?
A: Yes, but only safe text formats. The --pattern option accepts:
- *.md (default) - Markdown files
- *.markdown - Markdown files with the .markdown extension
- *.txt - Plain text files
Other patterns are blocked for security.
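Blocking other patterns amounts to an allowlist check before any files are collected. A minimal sketch, assuming the check compares the whole pattern string against the three documented values:

```python
# The three patterns this README lists as accepted
ALLOWED_PATTERNS = frozenset({"*.md", "*.markdown", "*.txt"})


def validate_pattern(pattern: str) -> str:
    """Return the pattern unchanged if it is allowlisted, else raise ValueError."""
    if pattern not in ALLOWED_PATTERNS:
        raise ValueError(f"Pattern not allowed: {pattern!r}")
    return pattern
```

Exact matching is stricter than checking only the extension: a pattern such as ../*.md is rejected outright instead of being interpreted.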
Q: What happens if the import is interrupted?
A: Files that were successfully imported before the interruption remain in Open Notebook. You can:
- Re-run the script (it will create duplicates)
- Manually check and continue from where it stopped
Q: Does re-running the script create duplicates?
A: Yes. The script imports every file it finds, even if it was imported before, so avoid running it more than once on the same directory unless you want duplicates.
Problem: Trying to connect to a non-localhost API.
Solution: The script only allows localhost connections for security. If your Open Notebook is on a remote server, use SSH tunneling:
ssh -L 5055:localhost:5055 user@remote-server
Problem: Invalid notebook ID or Open Notebook not running.
Solutions:
- Make sure Open Notebook is running at the API URL (default: http://localhost:5055)
- Verify your notebook ID is correct (see FAQ above)
- Check whether you need to decode %3A to : in the notebook ID
- Verify the notebook exists by opening it in your browser first
Problem: The path you provided doesn't exist.
Solutions:
- Use absolute paths: /Users/username/documents instead of ~/documents
- Check for typos in the path
- Ensure the directory exists:
ls -la /path/to/directory
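The ~ advice matters because the shell only expands a tilde when it is unquoted: --source-dir "~/documents" passes the literal characters ~/documents to the script. A script can also normalize the path itself; the helper below is a hypothetical sketch of that using pathlib, not the script's actual code.

```python
from pathlib import Path


def resolve_source_dir(raw: str) -> Path:
    """Expand ~, make the path absolute, and confirm it is an existing directory."""
    path = Path(raw).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"Source directory not found: {raw}")
    return path
```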
Problem: Files exceed security limits (10MB per file, 500MB total).
Solutions:
- Split large files into smaller chunks
- Import in batches (multiple runs with different source directories)
- Remove oversized files from the source directory
Problem: Many files or slow embedding process.
Solutions:
- Use the --no-embed flag to skip embedding (much faster)
- Increase --delay if the API is being overwhelmed
- Process files in smaller batches
Problem: Using a pattern other than *.md, *.markdown, or *.txt.
Solution: The script only accepts markdown and text files for security. Convert other formats to markdown first.
Contributions are welcome! Please:
- Review SECURITY.md for security guidelines
- Run security tests before submitting: ./test_security.sh
- Use pre-commit hooks for code quality
- Include tests for new features
- Update documentation
IMPORTANT: If you find security vulnerabilities, see SECURITY.md for responsible disclosure guidelines.
DO NOT open public issues for security vulnerabilities.
MIT License - See LICENSE file for details.
- Open Notebook team for the excellent documentation tool
- Security scanning tools: Bandit, Safety, GitLeaks, TruffleHog, detect-secrets
- Python community for excellent libraries and tools
- Open Notebook Documentation
- GitHub Security Best Practices
- OWASP Secure Coding Practices
- Python Security Best Practices
Current Version: v1.0.0
Status: ✅ Production Ready | 🔒 Security Hardened
- Issues: GitHub Issues
- Security: See SECURITY.md
- Documentation: This README and inline code comments
Made with ❤️ for the Open Notebook community