Monitors a directory for changes to Excel, Word, PDF, and image files and logs every event (create, modify, delete, rename, move) to a SQLite database. On every restart it also detects changes that occurred while the script was not running, and it recovers automatically if the watched drive goes offline.
- Python 3.10 or higher
- `watchdog`, the only dependency, installed via `requirements.txt`
- No database server, no running services, no XAMPP: SQLite is built into Python
```
filewatcher/
├── config.ini        ← your configuration (edit this)
├── main.py           ← entry point
├── db.py             ← SQLite database layer
├── handler.py        ← live watchdog event handler
├── logger.py         ← centralized logging setup
├── query.py          ← CLI tool for reading logs
├── .gitignore        ← excludes cache, db, and log files from git
└── requirements.txt  ← Python dependencies
```
Download from https://python.org — check "Add Python to PATH" during install.
Open a terminal in the `filewatcher` folder and run:

```bash
pip install -r requirements.txt
```
Edit `config.ini` and change at minimum:

- `watch_directory`: the folder you want to monitor (can be a network drive, e.g. `K:\`)
- `log_directory`: where the SQLite database and log file will be saved (keep it OUTSIDE `watch_directory`)
Note on large drives: If `watch_directory` points to the root of a large drive or network share, the first startup scan will take longer because it hashes every matching file. The terminal shows an estimated time to completion and a progress update every 50 files. Every subsequent startup is significantly faster because of the mtime pre-filter, which reuses stored hashes for files whose modification time has not changed.
Then start the watcher:

```bash
python main.py
```
All settings live in config.ini. No code changes needed.
| Setting | Section | Default | Description |
|---|---|---|---|
| `watch_directory` | `[watcher]` | none | Full path to the directory to monitor |
| `recursive` | `[watcher]` | `true` | Watch subdirectories recursively |
| `reconnect_delay` | `[watcher]` | `30` | Seconds to wait before retrying if the drive goes offline |
| `move_window` | `[watcher]` | `10` | Seconds to poll for a matching file after a DELETE before confirming it as a delete |
| `heartbeat_interval` | `[watcher]` | `30` | Seconds between heartbeat writes to the `config` table (used by the Laravel UI health check) |
| `watch_extensions` | `[filters]` | see file | Whitelist of file extensions to track |
| `ignore_prefixes` | `[filters]` | `~$, .~, ~` | Filename prefixes to ignore (Office lock files) |
| `log_directory` | `[storage]` | none | Where to save `filelog.db` and `filewatcher.log` |
| `db_name` | `[storage]` | `filelog.db` | SQLite database filename |
| `retention_days` | `[storage]` | `90` | Days to keep events before auto-purge (`0` = keep forever) |
| `hash_algorithm` | `[snapshot]` | `md5` | Hashing algorithm for file fingerprinting |
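Because the settings are plain INI sections, they can be read with Python's built-in `configparser`. The following is a minimal sketch, assuming comma-separated lists for `watch_extensions` and `ignore_prefixes`; the `load_settings` function and the sample values (including the extension list) are hypothetical, not the actual code in `main.py`.

```python
import configparser

# Hypothetical sample config; watch_directory and log_directory have no defaults.
SAMPLE_CONFIG = r"""
[watcher]
watch_directory = K:\bid docs
move_window = 20

[filters]
watch_extensions = .xlsx, .docx, .pdf, .png

[storage]
log_directory = C:\filewatcher-logs
"""


def _split_list(value: str) -> list[str]:
    """Split a comma-separated INI value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(text: str) -> dict:
    """Parse INI text, falling back to the documented defaults for optional keys."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return {
        "watch_directory": cfg.get("watcher", "watch_directory"),  # required
        "recursive": cfg.getboolean("watcher", "recursive", fallback=True),
        "reconnect_delay": cfg.getint("watcher", "reconnect_delay", fallback=30),
        "move_window": cfg.getint("watcher", "move_window", fallback=10),
        "heartbeat_interval": cfg.getint("watcher", "heartbeat_interval", fallback=30),
        "watch_extensions": [e.lower() for e in _split_list(
            cfg.get("filters", "watch_extensions", fallback=""))],
        "ignore_prefixes": _split_list(
            cfg.get("filters", "ignore_prefixes", fallback="~$, .~, ~")),
        "log_directory": cfg.get("storage", "log_directory"),  # required
        "db_name": cfg.get("storage", "db_name", fallback="filelog.db"),
        "retention_days": cfg.getint("storage", "retention_days", fallback=90),
        "hash_algorithm": cfg.get("snapshot", "hash_algorithm", fallback="md5"),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings["move_window"], settings["retention_days"])
```

Missing optional keys (and even the whole `[snapshot]` section here) resolve to the table's defaults, while a missing `watch_directory` or `log_directory` raises an error instead of silently guessing a path.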
Two log outputs are written to `log_directory` on every run:

- `filelog.db`: SQLite database containing all file change events and the current snapshot
- `filewatcher.log`: rotating text log of all script activity, including startup, errors, and reconnects. It rotates at 5 MB and keeps the last 5 files.
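A rotating log with those limits maps directly onto the standard library's `logging.handlers.RotatingFileHandler`. This is a minimal sketch of what `logger.py` might contain, not its actual code; the `setup_logging` name, the `"filewatcher"` logger name, and the format string are assumptions.

```python
import logging
import os
import sys
from logging.handlers import RotatingFileHandler


def setup_logging(log_directory: str) -> logging.Logger:
    """Log to filewatcher.log (5 MB x 5 backups) and to the console."""
    os.makedirs(log_directory, exist_ok=True)
    logger = logging.getLogger("filewatcher")
    logger.setLevel(logging.INFO)
    # Drop handlers from any earlier call so lines are not written twice
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    fmt = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
    file_handler = RotatingFileHandler(
        os.path.join(log_directory, "filewatcher.log"),
        maxBytes=5 * 1024 * 1024,  # rotate once the file reaches 5 MB
        backupCount=5,             # keep filewatcher.log.1 ... filewatcher.log.5
        encoding="utf-8",
    )
    console = logging.StreamHandler(sys.stdout)
    for handler in (file_handler, console):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

Calling `setup_logging(log_directory)` once at startup gives every module the same logger via `logging.getLogger("filewatcher")`, so startup messages, errors, and reconnects all land in one rotating file.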
```bash
python query.py                       # last 50 events
python query.py --limit 100           # show more results
python query.py --type DELETED        # filter by event type
python query.py --type RENAMED        # filter by event type
python query.py --file budget.xlsx    # search by filename
python query.py --today               # events from today only
python query.py --date 2026-05-26     # events from a specific date
python query.py --summary             # count of each event type
```

Filters stack: `--type DELETED --today` shows only today's deletes.
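Stacked filters map naturally onto SQL `WHERE` clauses joined with `AND`. The sketch below shows one way a tool like `query.py` could build such a query with bound parameters; it is an illustration only, and its matching rules (a type filter also includes its `(offline)` variant, `--file` searches both `src_path` and `dest_path`) are assumptions rather than the tool's documented behavior. `--summary` is omitted for brevity.

```python
import argparse
import sqlite3
from datetime import date


def parse_args(argv=None) -> argparse.Namespace:
    p = argparse.ArgumentParser(description="Read file watcher events (sketch)")
    p.add_argument("--limit", type=int, default=50)
    p.add_argument("--type", dest="event_type")
    p.add_argument("--file")
    p.add_argument("--today", action="store_true")
    p.add_argument("--date")  # YYYY-MM-DD
    return p.parse_args(argv)


def build_query(args: argparse.Namespace) -> tuple[str, list]:
    """Combine every supplied filter with AND; values are bound, never interpolated."""
    clauses, params = [], []
    if args.event_type:
        kind = args.event_type.upper()
        clauses.append("(event_type = ? OR event_type = ?)")  # live + offline variant
        params += [kind, f"{kind} (offline)"]
    if args.file:
        clauses.append("(src_path LIKE ? OR dest_path LIKE ?)")
        params += [f"%{args.file.lower()}%"] * 2  # paths are stored lowercase
    day = date.today().isoformat() if args.today else args.date
    if day:
        clauses.append("timestamp LIKE ?")
        params.append(f"{day}%")  # ISO 8601 timestamps start with YYYY-MM-DD
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = ("SELECT timestamp, event_type, src_path, dest_path FROM events"
           f"{where} ORDER BY timestamp DESC LIMIT ?")
    return sql, params + [args.limit]


def run(db_path: str, argv=None) -> list[tuple]:
    """Open the database, run the filtered query, and return the rows."""
    sql, params = build_query(parse_args(argv))
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

Only the SQL skeleton is assembled from strings; every user-supplied value travels as a `?` parameter, so a filename containing quotes cannot break the query.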
To browse the data visually, download DB Browser for SQLite (free) from https://sqlitebrowser.org. Open `filelog.db` from your `log_directory`, click the Browse Data tab, and select the `events` or `snapshots` table from the dropdown.
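Or read it from Python (adjust the path to your own `log_directory`):

```python
import sqlite3

conn = sqlite3.connect(r"C:\Users\primelink\Desktop\LOGS\filelog.db")
try:
    for row in conn.execute("SELECT * FROM events ORDER BY timestamp DESC LIMIT 50"):
        print(row)
finally:
    conn.close()
```

The `events` table has the following columns:

| Column | Description |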
|---|---|
| timestamp | ISO 8601 datetime of the event |
| event_type | See event types table below |
| src_path | File path where the event occurred (source path for renames/moves) |
| dest_path | Destination path — populated for RENAMED, MOVED, MOVED_AND_RENAMED |
| file_size | Size in bytes at time of event |
| md5_hash | MD5 fingerprint of file contents after the event |
| prev_hash | MD5 fingerprint before the change — populated for MODIFIED events only |
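For tools built against the database, here is a hypothetical sketch of an `events` table with these columns, a `log_event` helper that lowercases paths, and the `retention_days` purge. The column types, the `id` column, the index, and the function signatures are assumptions, and `db.py` may differ. It also assumes naive local ISO 8601 timestamps in one fixed format, so that plain string comparison orders them chronologically.

```python
import sqlite3
from datetime import datetime, timedelta

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id         INTEGER PRIMARY KEY,
    timestamp  TEXT NOT NULL,      -- ISO 8601, e.g. 2026-05-26T14:03:07
    event_type TEXT NOT NULL,
    src_path   TEXT NOT NULL,
    dest_path  TEXT,               -- only for RENAMED / MOVED / MOVED_AND_RENAMED
    file_size  INTEGER,
    md5_hash   TEXT,
    prev_hash  TEXT                -- only for MODIFIED
);
CREATE INDEX IF NOT EXISTS idx_events_timestamp ON events (timestamp);
"""


def log_event(conn, event_type, src_path, dest_path=None,
              file_size=None, md5_hash=None, prev_hash=None):
    """Insert one event; paths are normalized to lowercase."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO events (timestamp, event_type, src_path, dest_path,"
            " file_size, md5_hash, prev_hash) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (datetime.now().isoformat(timespec="seconds"), event_type,
             src_path.lower(), dest_path.lower() if dest_path else None,
             file_size, md5_hash, prev_hash),
        )


def purge_old_events(conn, retention_days, now=None):
    """Delete events older than retention_days; 0 keeps everything."""
    if retention_days <= 0:
        return 0
    cutoff = (now or datetime.now()) - timedelta(days=retention_days)
    with conn:
        cur = conn.execute("DELETE FROM events WHERE timestamp < ?",
                           (cutoff.isoformat(timespec="seconds"),))
    return cur.rowcount
```

The index on `timestamp` keeps both the date-range purge and "latest N events" queries from scanning the whole table as the log grows.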
| Event type | Meaning |
|---|---|
| `CREATED` | A new file appeared in the watched directory |
| `MODIFIED` | An existing file's contents changed |
| `DELETED` | A file was removed and no matching file appeared within `move_window` |
| `RENAMED` | Filename changed; the file stayed in the same folder |
| `MOVED` | File moved to a different folder; filename unchanged |
| `MOVED_AND_RENAMED` | File moved to a different folder and renamed |
| `CREATED (offline)` | File was created while the script was not running |
| `MODIFIED (offline)` | File was modified while the script was not running |
| `DELETED (offline)` | File was deleted while the script was not running |
| `RENAMED (offline)` | File was renamed while the script was not running |
| `MOVED (offline)` | File was moved while the script was not running |
| `MOVED_AND_RENAMED (offline)` | File was moved and renamed while the script was not running |
See only deleted files:

```sql
SELECT * FROM events WHERE event_type LIKE '%DELETED%'
```

See only renames (note that this pattern also matches `MOVED_AND_RENAMED`):

```sql
SELECT * FROM events WHERE event_type LIKE '%RENAMED%'
```

See all offline changes:

```sql
SELECT * FROM events WHERE event_type LIKE '%offline%'
```

Track a specific file (including events where it is the rename or move destination):

```sql
SELECT * FROM events WHERE src_path LIKE '%filename.pdf%' OR dest_path LIKE '%filename.pdf%'
```

Events from a specific date:

```sql
SELECT * FROM events WHERE timestamp LIKE '2026-05-26%'
```

All file paths are stored and compared as lowercase strings. This prevents false-positive `DELETED (offline)` events on Windows network drives, where `os.walk()` and the stored snapshot may return the same path in different cases (e.g. `\\Kyle\bid docs\` vs `\\kyle\bid docs\`).
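A small illustration of why this matters: comparing raw paths from a snapshot against a fresh directory walk reports spurious deletions when only the casing differs, while comparing lowercased paths does not. The `normalize` helper and the sample paths are hypothetical; the script's own code is not shown in this README.

```python
def normalize(path: str) -> str:
    """Lowercase a path so that case-only differences compare equal."""
    return path.lower()


# The same share, reported with different server-name casing
stored_snapshot = {r"\\Kyle\bid docs\estimate.xlsx", r"\\Kyle\bid docs\plans.pdf"}
current_walk = {r"\\kyle\bid docs\estimate.xlsx", r"\\kyle\bid docs\plans.pdf"}

# Raw comparison: every stored path looks "deleted"
raw_missing = stored_snapshot - current_walk
# Normalized comparison: nothing is missing
norm_missing = {normalize(p) for p in stored_snapshot} - {normalize(p) for p in current_walk}

print(len(raw_missing), len(norm_missing))  # 2 0
```

On Windows, `os.path.normcase` also lowercases paths, but on other platforms it leaves case untouched, which is why a plain `.lower()` is used in this sketch.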
If you reset the `snapshots` table (e.g. `DELETE FROM snapshots`), the next startup will log every existing file as `CREATED (offline)`. This is expected and happens only once: the snapshot rebuilds itself on that restart, and all subsequent startups diff correctly.
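The snapshot diff can be sketched as follows, assuming each snapshot is a mapping of lowercase path to content hash. A path that vanished is paired with a newly appeared path of the same hash and classified by comparing folder and filename; unpaired paths become `DELETED` or `CREATED`. With an empty stored snapshot, as after `DELETE FROM snapshots`, every file comes out as `CREATED (offline)`. The `diff_snapshots` function is hypothetical, not the code in `main.py`.

```python
import os


def diff_snapshots(old: dict, new: dict) -> list[tuple]:
    """Compare {lowercase_path: hash} maps; return (event_type, src, dest) tuples."""
    events = []
    # Same path on both sides but different content
    for path in sorted(old.keys() & new.keys()):
        if old[path] != new[path]:
            events.append(("MODIFIED (offline)", path, None))

    removed = {p: h for p, h in old.items() if p not in new}
    added = {p: h for p, h in new.items() if p not in old}

    # Index newly appeared paths by content hash so vanished files can be paired
    by_hash: dict = {}
    for path in sorted(added):
        by_hash.setdefault(added[path], []).append(path)

    for src in sorted(removed):
        candidates = by_hash.get(removed[src])
        if not candidates:
            events.append(("DELETED (offline)", src, None))
            continue
        dest = candidates.pop(0)
        del added[dest]  # consumed by this pairing
        if os.path.dirname(src) == os.path.dirname(dest):
            kind = "RENAMED"
        elif os.path.basename(src) == os.path.basename(dest):
            kind = "MOVED"
        else:
            kind = "MOVED_AND_RENAMED"
        events.append((f"{kind} (offline)", src, dest))

    # Whatever was not paired is genuinely new
    for path in sorted(added):
        events.append(("CREATED (offline)", path, None))
    return events
```

Files with identical content (for example several empty files) can be paired arbitrarily; this sketch simply takes candidates in sorted order.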
The config table in filelog.db stores script metadata readable by external
tools such as the Laravel UI:
| Key | Description |
|---|---|
| `watch_directory` | The directory currently being monitored |
| `log_directory` | Where logs and the database are stored |
| `retention_days` | Current retention setting |
| `script_version` | Version string from `main.py` |
| `started_at` | ISO 8601 timestamp of the last startup |
| `heartbeat` | ISO 8601 timestamp updated every `heartbeat_interval` seconds; used to determine whether the script is currently alive |
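The heartbeat can be sketched as a daemon thread that upserts one row in `config`, plus a check that treats the script as alive while the stored timestamp is recent. This is an assumption-laden illustration: the `(key, value)` table layout, UTC timestamps, and the two-interval staleness threshold are not documented in this README, and the real health check lives in the Laravel UI rather than in Python.

```python
import sqlite3
import threading
from datetime import datetime, timedelta, timezone
from typing import Optional


def write_heartbeat(db_path: str) -> None:
    """Upsert config.heartbeat with the current UTC time."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit on success
            conn.execute("CREATE TABLE IF NOT EXISTS config (key TEXT PRIMARY KEY, value TEXT)")
            conn.execute(
                "INSERT OR REPLACE INTO config (key, value) VALUES ('heartbeat', ?)",
                (datetime.now(timezone.utc).isoformat(timespec="seconds"),),
            )
    finally:
        conn.close()


def start_heartbeat(db_path: str, interval: float, stop: threading.Event) -> threading.Thread:
    """Write immediately, then every `interval` seconds until `stop` is set."""
    def loop() -> None:
        while True:
            write_heartbeat(db_path)  # each call uses its own connection in this thread
            if stop.wait(interval):   # True as soon as stop is set
                break

    thread = threading.Thread(target=loop, name="heartbeat", daemon=True)
    thread.start()
    return thread


def is_alive(db_path: str, interval: float, now: Optional[datetime] = None) -> bool:
    """Treat the watcher as alive if its heartbeat is at most two intervals old."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("SELECT value FROM config WHERE key = 'heartbeat'").fetchone()
    except sqlite3.OperationalError:  # config table not created yet
        row = None
    finally:
        conn.close()
    if row is None:
        return False
    last = datetime.fromisoformat(row[0])
    now = now or datetime.now(timezone.utc)
    return now - last <= timedelta(seconds=2 * interval)
```

Allowing two intervals of slack means a single late write (for example while the database is briefly locked) does not flip the UI to "offline".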
```
python main.py
      │
      ▼
Load config.ini            [main.py]
      │
      ▼
Setup logging              [logger.py]  → filewatcher.log + console
      │
      ▼
Watch dir available?       [main.py]    → waits if K:\ not mounted yet
      │
      ▼
Open / create database     [db.py]      → filelog.db, creates tables
      │
      ▼
Purge old events           [db.py]      → deletes rows older than retention_days
      │
      ▼
── STARTUP DIFF ──────────────────────────────────────────────────
      │
      ▼
Scan watch directory       [main.py]    → normalize paths to lowercase
      │                                 → mtime pre-filter → reuse stored hash
      │                                 → parallel hash remaining files + ETA
      ▼
Diff snapshot vs disk      [main.py]    → hash match: RENAMED / MOVED /
      │                                   MOVED_AND_RENAMED / CREATED / DELETED
      ▼
Log offline events         [db.py]      → db.log_event() + update snapshots
      │                                 → db.flush() ensures commit before returning
      ▼
── HEARTBEAT ─────────────────────────────────────────────────────
      │
      ▼
Heartbeat thread starts    [main.py]    → daemon thread, upserts config.heartbeat
      │                                   every heartbeat_interval seconds
      ▼
── LIVE WATCHER ──────────────────────────────────────────────────
      │
      ▼
Watchdog observer starts   [handler.py] → attached to watch_dir
      │
      ▼  (loops on every file system event)
File system event fires    [handler.py] → on_created / on_modified
      │                                   on_deleted / on_moved
      ▼
Extension + prefix filter               → skip ~$ prefixes, check whitelist
      │
      ▼
Classify event             [handler.py] → DELETED held in pending_deletes dict
      │                                 → single sweep thread checks expiry
      │                                 → hash match found → RENAMED / MOVED
      │                                 → window expires → confirmed DELETE
      ▼
Log live event             [db.py]      → db.log_event() + upsert_snapshot()
      │                                 → all paths normalized to lowercase
      │
      └──────────────────────────────── loops back to next event

── QUERY (separate tool) ─────────────────────────────────────────
python query.py            [query.py]   → reads filelog.db directly
                                        → filter by type, file, date
```
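The "Scan watch directory" stage can be sketched like this: files whose modification time and size match the stored snapshot reuse their stored hash, and only the rest are hashed in parallel, with a progress line and ETA every 50 files. The snapshot layout `{lowercase_path: (mtime, size, hash)}`, the worker count, and the function names are assumptions; comparing size as well as mtime is an extra safeguard this README does not mention.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def hash_file(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are never read into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root, extensions, previous, algorithm="md5", workers=8, progress_every=50):
    """Return (snapshot, number_of_files_hashed) for files under root."""
    snapshot, to_hash = {}, []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in extensions:
                continue
            real_path = os.path.join(dirpath, name)  # original case, used for I/O
            key = real_path.lower()                  # lowercase, used for comparison
            try:
                st = os.stat(real_path)
            except OSError:
                continue  # vanished or unreadable during the walk
            old = previous.get(key)
            if old and old[0] == st.st_mtime and old[1] == st.st_size:
                snapshot[key] = old  # mtime pre-filter: reuse the stored hash
            else:
                to_hash.append((key, real_path, st))

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(hash_file, real, algorithm): (key, st)
                   for key, real, st in to_hash}
        for done, future in enumerate(as_completed(futures), start=1):
            key, st = futures[future]
            try:
                snapshot[key] = (st.st_mtime, st.st_size, future.result())
            except OSError:
                pass  # file disappeared before it could be hashed
            if done % progress_every == 0:
                rate = done / max(time.monotonic() - start, 1e-9)
                eta = (len(to_hash) - done) / rate
                print(f"Hashed {done}/{len(to_hash)} files, ETA {eta:.0f}s")
    return snapshot, len(to_hash)
```

Threads suit this workload because hashing a network file is dominated by I/O waits, and `hashlib` releases the GIL while digesting large buffers.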
- First run on large drives is slow: every matching file must be hashed (MD5 by default) to build the initial snapshot. On a network drive with thousands of files this can take several minutes. Every run after the first is fast because of the mtime pre-filter.
- Move detection window: when a file is deleted and recreated (a cross-folder move), the script polls the watch directory every second for up to `move_window` seconds, looking for a file with a matching hash. The MOVED event is logged as soon as the file finishes copying, not after a blind wait. If no match is found within the window, the event is confirmed as a DELETE. Increase `move_window` in `config.ini` if large files on slow network drives are still being logged as DELETE + CREATE instead of MOVED.
- Bulk operations may cause missed live events (Windows): on Windows, watchdog uses the `ReadDirectoryChangesW` API, which has a fixed-size event buffer. If a large number of files change at once (e.g. a bulk copy or mass rename), the buffer can overflow and watchdog will silently miss some live events. This does not corrupt data: any missed changes are detected and logged as `(offline)` variants on the next restart, when the startup diff compares the snapshot against the current drive state. There is no workaround within the script itself; this is an OS-level constraint.
- Network drive hashing is slower than local: hashing over a network connection is limited by network bandwidth, not disk speed. Pointing `watch_directory` at a specific subfolder rather than the drive root significantly reduces startup time.
- No content logging: the script records that a file changed and its hash, but does not store the file's contents or a diff of what changed inside it.
- Paths stored as lowercase: all paths in the database are normalized to lowercase. This is intentional (see the path normalization note above), but it means the original casing of file and directory names is not preserved in the log.
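The pending-delete logic from the flow diagram (DELETED held in `pending_deletes`, a single sweep thread, confirmation on expiry) can be sketched as below. This version matches a held delete when a new file with the same hash is reported, rather than re-polling the directory every second as described in the move-window note; the class, its methods, and the injectable clock are hypothetical, and it assumes the deleted file's hash is known from the snapshot.

```python
import threading
import time


class PendingDeletes:
    """Hold DELETED events for up to move_window seconds awaiting a hash match."""

    def __init__(self, move_window, clock=time.monotonic):
        self.move_window = move_window
        self.clock = clock
        self._pending = {}  # old path -> (hash, deadline)
        self._lock = threading.Lock()

    def hold(self, path, file_hash):
        """Park a delete instead of logging it immediately."""
        with self._lock:
            self._pending[path] = (file_hash, self.clock() + self.move_window)

    def match(self, file_hash):
        """Release and return the held path with this hash, or None."""
        with self._lock:
            for path, (held_hash, _deadline) in self._pending.items():
                if held_hash == file_hash:
                    del self._pending[path]  # safe: we return right away
                    return path
        return None

    def expire(self):
        """Release and return held paths whose window elapsed (confirmed deletes)."""
        now = self.clock()
        with self._lock:
            expired = [p for p, (_h, deadline) in self._pending.items() if deadline <= now]
            for path in expired:
                del self._pending[path]
        return expired


def start_sweeper(pending, on_confirmed_delete, stop, interval=1.0):
    """Single daemon thread that confirms expired deletes once per interval."""
    def loop():
        while not stop.wait(interval):
            for path in pending.expire():
                on_confirmed_delete(path)

    thread = threading.Thread(target=loop, name="delete-sweeper", daemon=True)
    thread.start()
    return thread
```

In a live handler, a create event would first call `match(new_hash)`: a returned path means the file was renamed or moved from there, while `None` means it is a genuine CREATED.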
To run the watcher automatically at startup on Windows, use Task Scheduler:

- Open Task Scheduler → Create Task
- General tab
  - Name: File Watcher
  - Check: "Run whether user is logged on or not"
  - Check: "Run with highest privileges"
- Triggers tab
  - New trigger → At startup
- Actions tab
  - Action: Start a program
  - Program: `C:\Python312\python.exe` (run `where python` to find your actual path)
  - Arguments: `main.py`
  - Start in: `C:\path\to\filewatcher` (full path to this folder)
- Settings tab
  - UNCHECK: "Stop the task if it runs longer than 3 days"
  - Select: "Do not start a new instance" for when the task is already running
Network drives: If `K:\` is not mounted yet when the script starts at boot, the script waits until the drive becomes available instead of crashing.
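The wait-for-drive behavior amounts to a retry loop around a directory check, spaced by `reconnect_delay`. A minimal sketch, assuming `os.path.isdir` as the availability test; the `wait_for_directory` name and the injectable `is_available` and `sleep` parameters (there only to make it testable) are hypothetical.

```python
import logging
import os
import time

log = logging.getLogger("filewatcher")


def wait_for_directory(path, reconnect_delay=30, is_available=os.path.isdir, sleep=time.sleep):
    """Block until `path` is reachable; return how many retries were needed."""
    retries = 0
    while not is_available(path):
        retries += 1
        log.warning("%s is not available (retry %d); waiting %s s",
                    path, retries, reconnect_delay)
        sleep(reconnect_delay)
    log.info("%s is available", path)
    return retries
```

Usage: `wait_for_directory(settings["watch_directory"], settings["reconnect_delay"])` before opening the database or starting the observer, so a late-mounting drive delays startup instead of aborting it.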
For a persistent background service on Linux, create `/etc/systemd/system/filewatcher.service`:

```ini
[Unit]
Description=File Watcher

[Service]
ExecStart=/usr/bin/python3 /path/to/filewatcher/main.py
Restart=on-failure
WorkingDirectory=/path/to/filewatcher

[Install]
WantedBy=multi-user.target
```

Then reload systemd so it picks up the new unit, and enable and start it:

```bash
sudo systemctl daemon-reload
sudo systemctl enable filewatcher
sudo systemctl start filewatcher
```