MCP SERVER · WINDOWS CAPTURE · FFMPEG
+Give your agent
eyes on the screen.
+
+ screencast-mcp records the Windows screen, samples footage into frames an agent can actually look at, and cuts the result with ffmpeg, from a quick trim to a platform-ready export. It speaks MCP over stdio, so it plugs into any MCP client.
+
+ Needs ffmpeg and ffprobe on PATH
+ (or FFMPEG_PATH / FFPROBE_PATH). Capture uses
+ gdigrab, which is Windows-only; the watch and edit tools run
+ anywhere ffmpeg runs. Add it to an MCP client with:
{
"mcpServers": {
"screencast": {
"command": "npx",
@@ -61,68 +253,159 @@ Install
}
}
}
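The ffmpeg and ffprobe requirement above can be illustrated with a small lookup helper. This is a minimal sketch, not the server's actual code: `resolve_tool` is a hypothetical name, and giving the environment variable precedence over PATH is an assumption.

```python
import os
import shutil


def resolve_tool(name: str, env_var: str) -> str:
    """Return a path for `name`, preferring an explicit override.

    Assumption: the env var (e.g. FFMPEG_PATH) wins over PATH lookup.
    """
    override = os.environ.get(env_var)
    if override:
        return override
    found = shutil.which(name)  # searches the directories on PATH
    if found is None:
        raise FileNotFoundError(
            f"{name} not found: add it to PATH or set {env_var}"
        )
    return found
```

A caller would resolve both binaries once at startup, for example `resolve_tool("ffmpeg", "FFMPEG_PATH")` and `resolve_tool("ffprobe", "FFPROBE_PATH")`, so a missing install fails early with a readable message.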
+ Point it at anything on screen.
+Record the whole desktop, one monitor, a window by title, or an exact pixel region, with optional system audio through a loopback device. Recordings run as background sessions you stop by id; screenshots are one call. Every capture is an explicit tool call: nothing records on its own.
| Tool | What it does |
|---|---|
| start_recording | Start a background recording. Target = full, a monitor, a window by title, or a region. Optional fps, quality preset, and system audio. |
| stop_recording | Stop a session by id with a graceful quit so the file is finalized, not truncated. |
| screenshot | Capture a single PNG of a target. |
| list_sessions | List active and finished recording sessions. |
| get_session | Inspect a single session by id. |
| list_audio_devices | List DirectShow audio devices and flag a likely system-audio loopback device for start_recording. |
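To make the capture targets concrete, the sketch below maps a target to ffmpeg's gdigrab input options. It is an illustration under assumptions, not the command the server actually runs: `gdigrab_input_args` and its parameters are hypothetical, while `-framerate`, `-offset_x`, `-offset_y`, `-video_size`, `desktop`, and `title=` are standard gdigrab options.

```python
from typing import List, Optional, Tuple


def gdigrab_input_args(
    fps: int = 30,
    window_title: Optional[str] = None,
    region: Optional[Tuple[int, int, int, int]] = None,  # x, y, width, height
) -> List[str]:
    """Build the input half of an ffmpeg command line for gdigrab."""
    args = ["-f", "gdigrab", "-framerate", str(fps)]
    if window_title is not None:
        # gdigrab selects a window through its title.
        return args + ["-i", f"title={window_title}"]
    if region is not None:
        # A region is a crop of the virtual desktop. A monitor target can be
        # expressed the same way, using that display's pixel bounds.
        x, y, w, h = region
        args += ["-offset_x", str(x), "-offset_y", str(y),
                 "-video_size", f"{w}x{h}"]
    return args + ["-i", "desktop"]
```

With no arguments the function targets the full desktop; passing a second monitor's bounds as `region` (for example `(1920, 0, 1920, 1080)`) grabs only that display.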
Video a model can read.
+A video file is opaque to a language model. sample_frames
+ turns footage into PNGs an agent can actually view — at a fixed rate
+ for a full pass, or at exact timestamps to check one moment.
| Tool | What it does |
|---|---|
| sample_frames | Extract frames at a fixed fps or at explicit timestamps so the agent can view what happened. |
| get_media_info | ffprobe wrapper: duration, resolution, fps, codecs, format, size. |
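The two sampling modes can be sketched as plain ffmpeg command lines. This is a hedged illustration: `sample_frames_commands`, its parameters, and the `frame_NNNN.png` naming are assumptions, not the tool's real schema. The `fps` filter, `-ss`, and `-frames:v` are standard ffmpeg features.

```python
from typing import List, Optional, Sequence


def sample_frames_commands(
    src: str,
    out_dir: str,
    fps: Optional[float] = None,
    timestamps: Optional[Sequence[float]] = None,
) -> List[List[str]]:
    """Return ffmpeg argv lists for fixed-rate or exact-timestamp sampling."""
    if (fps is None) == (timestamps is None):
        raise ValueError("pass exactly one of fps or timestamps")
    if fps is not None:
        # One pass over the file, emitting `fps` frames per second of video.
        return [["ffmpeg", "-i", src, "-vf", f"fps={fps}",
                 f"{out_dir}/frame_%04d.png"]]
    # One seek per timestamp, grabbing a single frame each time.
    return [
        ["ffmpeg", "-ss", str(t), "-i", src, "-frames:v", "1",
         f"{out_dir}/frame_{i:04d}.png"]
        for i, t in enumerate(timestamps, start=1)
    ]
```

Fixed-rate sampling suits a full pass over a recording; the timestamp form is cheaper when the agent only needs to check one or two moments.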
Tools
-Tools that write a file refuse to replace an existing file at a
- caller-supplied output path unless overwrite: true
- is passed; auto-generated default paths are always unique.
| Tool | What it does |
|---|---|
| start_recording | Start a background recording. Target = full, a monitor, a window by title, or a region. Optional fps and quality preset. |
| stop_recording | Stop a session by id with a graceful quit so the file is finalized, not truncated. |
| list_sessions | List active and finished recording sessions. |
| get_session | Inspect a single session by id. |
| screenshot | Capture a single PNG of a target. |
| sample_frames | Extract frames at a fixed fps or at explicit timestamps so the agent can view what happened. |
| get_media_info | ffprobe wrapper: duration, resolution, fps, codecs, format, size. |
| trim | Cut a sub-clip by start + end or duration. |
| concat | Join two or more videos into one. |
| convert | Convert between mp4, gif, and webm. |
| crop | Crop to a pixel rectangle; an off-frame rectangle is rejected. |
| scale | Resize to a width and/or height, keeping aspect when one side is given. |
| speed | Change playback speed by a factor; audio is retempo'd when present. |
| overlay | Composite a logo, watermark, or picture-in-picture, optionally scaled and time-limited. |
| compress | Re-encode smaller with a CRF ladder and an optional width cap. |
| extract_audio | Write the audio track to its own file (mp3, aac, wav, or copy). |
| clip | Extract one or more frame-accurate sub-segments to separate files. |
| redact_region | Cover declared rectangles (solid box, blur, or pixelate) to hide on-screen secrets. Declared regions only, not automatic detection. |
| list_audio_devices | List DirectShow audio devices and flag a likely system-audio loopback device for start_recording. |
| xfade_transition | Crossfade two videos into one with an xfade transition. Inputs are auto-normalized first. |
| assemble_highlights | Stitch two or more clips into one with hard cuts or an xfade transition between each. |
| title_card | Generate a standalone title card with centered text on a solid background. Uses a bundled font. |
| music_bed | Lay a music track under a video: looped/trimmed, faded, leveled, and mixed with any existing audio. |
| reframe | Re-aspect to 16:9, 9:16, 1:1, or 4:5 with pad (letterbox) or crop (fill). |
| export_preset | Encode a platform-ready file (youtube, instagram_reel, tiktok, x, square) at the right aspect, fps, and bitrate. |
Windows notes
- - Monitor targets crop the virtual desktop to a display's real pixel bounds, so monitor:1 grabs the second display at its true offset; monitor:0 is primary.
- - Window capture matches a case-insensitive exact title first, then falls back to a substring match; with several matches the topmost window wins.
- - Fullscreen-exclusive apps can produce black frames under gdigrab; use borderless-windowed mode.
- - System audio capture (start_recording with audio.source = system) needs a virtual-audio loopback device, since gdigrab is video-only and Windows has no native loopback. Use list_audio_devices to find one. Microphone capture is not supported.
Threat model
-Small cuts, clean errors.
+Composable ffmpeg edits with validated inputs — a bad rectangle,
+ an odd dimension, or a wrong duration is rejected with a message that says
+ how to fix it, not an encoder stack trace. Tools that write a file refuse
+ to replace an existing file at a caller-supplied output path
+ unless overwrite: true is passed; auto-generated default
+ paths are always unique.
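The overwrite rule described above can be sketched as a small path guard. This is an assumption-laden illustration: `resolve_output_path`, its parameters, and the UUID-based default name are hypothetical and only approximate how unique default paths might be produced.

```python
import os
import uuid
from typing import Optional


def resolve_output_path(
    requested: Optional[str],
    overwrite: bool = False,
    default_dir: str = ".",
    ext: str = ".mp4",
) -> str:
    """Pick the output path, refusing to clobber a caller-supplied file."""
    if requested is None:
        # Auto-generated default: a random name, so collisions are not expected.
        return os.path.join(default_dir, f"out-{uuid.uuid4().hex}{ext}")
    if os.path.exists(requested) and not overwrite:
        raise FileExistsError(
            f"{requested} already exists; pass overwrite: true to replace it "
            "or choose a different output path"
        )
    return requested
```

The guard runs before any encode starts, so a refused write costs nothing and the error tells the caller which of the two fixes to apply.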
| Tool | What it does |
|---|---|
| trim | Cut a sub-clip by start + end or duration. Stream copy, snaps to keyframes. |
| clip | Extract one or more frame-accurate sub-segments to separate files. |
| concat | Join two or more videos into one. |
| convert | Convert between mp4, gif, and webm. |
| crop | Crop to a pixel rectangle; an off-frame rectangle is rejected. |
| scale | Resize to a width and/or height, keeping aspect when one side is given. |
| speed | Change playback speed by a factor; audio is retempo'd when present. |
| overlay | Composite a logo, watermark, or picture-in-picture, optionally scaled and time-limited. |
| compress | Re-encode smaller with a CRF ladder and an optional width cap. |
| extract_audio | Write the audio track to its own file (mp3, aac, wav, or copy; copy picks a container that fits the source codec). |
| redact_region | Cover declared rectangles (solid box, blur, or pixelate) to hide on-screen secrets. Declared regions only, not automatic detection. |
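As an illustration of validated inputs, the sketch below checks a crop rectangle before any encoder runs. It is a minimal sketch with hypothetical names and message wording. The even-dimension check assumes the common case of H.264 output in yuv420p, which needs even width and height; the tool's real rules may differ.

```python
def validate_crop(frame_w: int, frame_h: int,
                  x: int, y: int, w: int, h: int) -> None:
    """Raise ValueError with a fix-it message if the rectangle is unusable."""
    if w <= 0 or h <= 0:
        raise ValueError("crop width and height must be positive")
    if x < 0 or y < 0 or x + w > frame_w or y + h > frame_h:
        raise ValueError(
            f"crop {w}x{h} at ({x},{y}) falls outside the "
            f"{frame_w}x{frame_h} frame; keep x + width <= {frame_w} "
            f"and y + height <= {frame_h}"
        )
    if w % 2 or h % 2:
        # Suggest the nearest smaller even size instead of failing opaquely.
        raise ValueError(
            f"crop size {w}x{h} has an odd dimension; "
            f"try {w - w % 2}x{h - h % 2}"
        )
```

Rejecting the rectangle up front keeps the error in the caller's vocabulary (pixels and frame bounds) rather than in the encoder's.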
From raw capture to something you’d publish.
+Assembly, titles, music, and platform-shaped exports. Mixed inputs are normalized to a common resolution, fps, and audio rate before they are combined, so heterogeneous clips compose cleanly.
| Tool | What it does |
|---|---|
| assemble_highlights | Stitch two or more clips into one with hard cuts or an xfade transition between each. |
| xfade_transition | Crossfade two videos into one with an xfade transition. Inputs are auto-normalized first. |
| title_card | Generate a standalone title card with centered text on a solid background. Uses a bundled font. |
| music_bed | Lay a music track under a video: looped/trimmed, faded, leveled, and mixed with any existing audio. |
| reframe | Re-aspect to 16:9, 9:16, 1:1, or 4:5 with pad (letterbox) or crop (fill). |
| export_preset | Encode a platform-ready file (youtube, instagram_reel, tiktok, x, square) at the right aspect, fps, and bitrate. |
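The difference between pad and crop in reframe comes down to which side of the frame is held fixed. The sketch below works through that arithmetic. `reframe_size` is a hypothetical helper, and deriving the output size from the source dimensions (instead of a fixed preset size) is an assumption made for illustration.

```python
import math
from fractions import Fraction
from typing import Tuple


def reframe_size(src_w: int, src_h: int,
                 aspect_w: int, aspect_h: int, mode: str) -> Tuple[int, int]:
    """Return the (width, height) of the reframed canvas."""
    target = Fraction(aspect_w, aspect_h)
    source_is_wider = Fraction(src_w, src_h) > target
    if mode == "pad":
        # Letterbox: keep the whole source and grow the canvas around it.
        w, h = (src_w, src_w / target) if source_is_wider else (src_h * target, src_h)
        # Round up to even so the canvas never ends up smaller than the source.
        return tuple(math.ceil(v) + math.ceil(v) % 2 for v in (w, h))
    if mode == "crop":
        # Fill: keep one side and trim the other to match the aspect.
        w, h = (src_h * target, src_h) if source_is_wider else (src_w, src_w / target)
        # Round down to even so the crop stays inside the source.
        return tuple(math.floor(v) - math.floor(v) % 2 for v in (w, h))
    raise ValueError("mode must be 'pad' or 'crop'")
```

For a 1920x1080 source going to 9:16, pad yields a tall canvas that contains the full frame, while crop keeps the full height and discards most of the width.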
Know your capture surface.
+ - Monitor targets crop the virtual desktop to a display’s real pixel bounds, so monitor:1 grabs the second display at its true offset; monitor:0 is primary.
+ - Window capture matches a case-insensitive exact title first, then falls back to a substring match; with several matches the topmost window wins.
+ - Fullscreen-exclusive apps can produce black frames under gdigrab; use borderless-windowed mode.
+ - System audio capture (start_recording with audio.source = system) needs a virtual-audio loopback device, since gdigrab is video-only and Windows has no native loopback. Use list_audio_devices to find one. Microphone capture is not supported.
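The window-matching rule in the list above (exact title first, then substring, topmost wins) can be sketched as a two-pass search. `pick_window` is a hypothetical helper, and it assumes the caller already has the open window titles ordered topmost first.

```python
from typing import Optional, Sequence


def pick_window(query: str, titles_top_first: Sequence[str]) -> Optional[str]:
    """Return the title to capture, or None when nothing matches."""
    q = query.casefold()
    # Pass 1: case-insensitive exact match. The first hit is the topmost window.
    for title in titles_top_first:
        if title.casefold() == q:
            return title
    # Pass 2: fall back to a substring match, again preferring the topmost.
    for title in titles_top_first:
        if q in title.casefold():
            return title
    return None
```

Running the exact pass to completion before trying substrings means a window titled exactly "Notes" wins over a higher window titled "Release Notes - Editor".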
The screen is sensitive. Treat it that way.
+redact_region before anything leaves the
+ machine.
+