diff --git a/docs/index.html b/docs/index.html index 514c162..c01d5c7 100644 --- a/docs/index.html +++ b/docs/index.html @@ -3,57 +3,249 @@ - Screencast MCP + Screencast MCP — give your agent eyes on the screen - + + + + -
-

Screencast MCP

-

- MCP server - TypeScript - Windows-first - ffmpeg -

-

A Model Context Protocol server that lets an agent record the screen, take - screenshots, "watch" footage by sampling frames into viewable images, and run - a small set of ffmpeg edits. It speaks MCP over stdio.

-

View on GitHub

- -

Prerequisites

-

ffmpeg and ffprobe must be installed and on - PATH (or set via FFMPEG_PATH / - FFPROBE_PATH). Screen capture uses gdigrab, which is - Windows-only; the watch and edit tools run anywhere ffmpeg runs.

- -

Install

-
npm install -g @tmhs/screencast-mcp
-
{
+  
+
+ + REC + 00:00:00;00 +
+
+ @tmhs/screencast-mcp + GitHub ↗ +
+
+ +
+
+

MCP SERVER · WINDOWS CAPTURE · FFMPEG

+
+ + + + +

Give your agent
eyes on the screen.

+ +
+

screencast-mcp records the Windows screen, samples + footage into frames an agent can actually look at, and cuts the result with + ffmpeg — from a quick trim to a platform-ready export. It speaks MCP + over stdio, so it plugs into any MCP client.

+
+ $npm install -g @tmhs/screencast-mcp + View source ↗ +
+

+ Needs ffmpeg and ffprobe on PATH + (or FFMPEG_PATH / FFPROBE_PATH). Capture uses + gdigrab, which is Windows-only; the watch and edit tools run + anywhere ffmpeg runs. Add it to an MCP client with:

+
{
   "mcpServers": {
     "screencast": {
       "command": "npx",
@@ -61,68 +253,159 @@ 

Install

} } }
+
+ +
+
+
00:02CHAPTER 1 / CAPTURE
+

Point it at anything on screen.

+

Record the whole desktop, one monitor, a window by title, or an exact + pixel region — with optional system audio through a loopback device. + Recordings run as background sessions you stop by id; screenshots are one + call. Every capture is an explicit tool call: nothing records on its own.

+ + + + + + + + + + +
ToolWhat it does
start_recordingStart a background recording. Target = full, a monitor, a window by title, or a region. Optional fps, quality preset, and system audio.
stop_recordingStop a session by id with a graceful quit so the file is finalized, not truncated.
screenshotCapture a single PNG of a target.
list_sessionsList active and finished recording sessions.
get_sessionInspect a single session by id.
list_audio_devicesList DirectShow audio devices and flag a likely system-audio loopback device for start_recording.
+
+ +
+
00:14CHAPTER 2 / WATCH
+

Video a model can read.

+

A video file is opaque to a language model. sample_frames + turns footage into PNGs an agent can actually view — at a fixed rate + for a full pass, or at exact timestamps to check one moment.

+ + + + + + +
ToolWhat it does
sample_framesExtract frames at a fixed fps or at explicit timestamps so the agent can view what happened.
get_media_infoffprobe wrapper: duration, resolution, fps, codecs, format, size.
+
-

Tools

-

Tools that write a file refuse to replace an existing file at a - caller-supplied output path unless overwrite: true - is passed; auto-generated default paths are always unique.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ToolWhat it does
start_recordingStart a background recording. Target = full, a monitor, a window by title, or a region. Optional fps and quality preset.
stop_recordingStop a session by id with a graceful quit so the file is finalized, not truncated.
list_sessionsList active and finished recording sessions.
get_sessionInspect a single session by id.
screenshotCapture a single PNG of a target.
sample_framesExtract frames at a fixed fps or at explicit timestamps so the agent can view what happened.
get_media_infoffprobe wrapper: duration, resolution, fps, codecs, format, size.
trimCut a sub-clip by start + end or duration.
concatJoin two or more videos into one.
convertConvert between mp4, gif, and webm.
cropCrop to a pixel rectangle; an off-frame rectangle is rejected.
scaleResize to a width and/or height, keeping aspect when one side is given.
speedChange playback speed by a factor; audio is retempo'd when present.
overlayComposite a logo, watermark, or picture-in-picture, optionally scaled and time-limited.
compressRe-encode smaller with a CRF ladder and an optional width cap.
extract_audioWrite the audio track to its own file (mp3, aac, wav, or copy).
clipExtract one or more frame-accurate sub-segments to separate files.
redact_regionCover declared rectangles (solid box, blur, or pixelate) to hide on-screen secrets. Declared regions only, not automatic detection.
list_audio_devicesList DirectShow audio devices and flag a likely system-audio loopback device for start_recording.
xfade_transitionCrossfade two videos into one with an xfade transition. Inputs are auto-normalized first.
assemble_highlightsStitch two or more clips into one with hard cuts or an xfade transition between each.
title_cardGenerate a standalone title card with centered text on a solid background. Uses a bundled font.
music_bedLay a music track under a video: looped/trimmed, faded, leveled, and mixed with any existing audio.
reframeRe-aspect to 16:9, 9:16, 1:1, or 4:5 with pad (letterbox) or crop (fill).
export_presetEncode a platform-ready file (youtube, instagram_reel, tiktok, x, square) at the right aspect, fps, and bitrate.
- -

Windows notes

-
    -
  • Monitor targets crop the virtual desktop to a display's real pixel - bounds, so monitor:1 grabs the second display at its true - offset; monitor:0 is primary.
  • -
  • Window capture matches a case-insensitive exact title first, then - falls back to a substring match; with several matches the topmost window - wins.
  • -
  • Fullscreen-exclusive apps can produce black frames under gdigrab; use - borderless-windowed mode.
  • -
  • System audio capture (start_recording with - audio.source = system) needs a virtual-audio loopback device, - since gdigrab is video-only and Windows has no native loopback. Use - list_audio_devices to find one. Microphone capture is not - supported.
  • -
- -

Threat model

-
- Screen capture can record anything on screen, including secrets. Capture is - always explicit (a tool call, never automatic), output stays on the local - filesystem, and this public repo ignores captured media so test recordings - cannot be committed by accident. Review frames before sharing a file. +
+
00:26CHAPTER 3 / EDIT
+

Small cuts, clean errors.

+

Composable ffmpeg edits with validated inputs — a bad rectangle, + an odd dimension, or a wrong duration is rejected with a message that says + how to fix it, not an encoder stack trace. Tools that write a file refuse + to replace an existing file at a caller-supplied output path + unless overwrite: true is passed; auto-generated default + paths are always unique.

+ + + + + + + + + + + + + + + +
ToolWhat it does
trimCut a sub-clip by start + end or duration. Stream copy, snaps to keyframes.
clipExtract one or more frame-accurate sub-segments to separate files.
concatJoin two or more videos into one.
convertConvert between mp4, gif, and webm.
cropCrop to a pixel rectangle; an off-frame rectangle is rejected.
scaleResize to a width and/or height, keeping aspect when one side is given.
speedChange playback speed by a factor; audio is retempo'd when present.
overlayComposite a logo, watermark, or picture-in-picture, optionally scaled and time-limited.
compressRe-encode smaller with a CRF ladder and an optional width cap.
extract_audioWrite the audio track to its own file (mp3, aac, wav, or copy — copy picks a container that fits the source codec).
redact_regionCover declared rectangles (solid box, blur, or pixelate) to hide on-screen secrets. Declared regions only, not automatic detection.
+
+ +
+
00:41CHAPTER 4 / PRODUCE
+

From raw capture to something you’d publish.

+

Assembly, titles, music, and platform-shaped exports. Mixed inputs are + normalized to a common resolution, fps, and audio rate before they are + combined, so heterogeneous clips compose cleanly.

+ + + + + + + + + + +
ToolWhat it does
assemble_highlightsStitch two or more clips into one with hard cuts or an xfade transition between each.
xfade_transitionCrossfade two videos into one with an xfade transition. Inputs are auto-normalized first.
title_cardGenerate a standalone title card with centered text on a solid background. Uses a bundled font.
music_bedLay a music track under a video: looped/trimmed, faded, leveled, and mixed with any existing audio.
reframeRe-aspect to 16:9, 9:16, 1:1, or 4:5 with pad (letterbox) or crop (fill).
export_presetEncode a platform-ready file (youtube, instagram_reel, tiktok, x, square) at the right aspect, fps, and bitrate.
+
-
Built by TMHSDigital · CC-BY-NC-ND-4.0
-
+
+
00:52APPENDIX / WINDOWS NOTES
+

Know your capture surface.

+
    +
  • Monitor targets crop the virtual desktop to a display’s real pixel + bounds, so monitor:1 grabs the second display at its true + offset; monitor:0 is primary.
  • +
  • Window capture matches a case-insensitive exact title first, then + falls back to a substring match; with several matches the topmost window + wins.
  • +
  • Fullscreen-exclusive apps can produce black frames under gdigrab; use + borderless-windowed mode.
  • +
  • System audio capture (start_recording with + audio.source = system) needs a virtual-audio loopback device, + since gdigrab is video-only and Windows has no native loopback. Use + list_audio_devices to find one. Microphone capture is not + supported.
  • +
+
+ +
+
00:55APPENDIX / SAFETY
+

The screen is sensitive. Treat it that way.

+
+ Screen capture can record anything on screen, including secrets. + Capture is always explicit (a tool call, never automatic), output stays on + the local filesystem, and this public repo ignores captured media so test + recordings cannot be committed by accident. Review frames before sharing + a file, and use redact_region before anything leaves the + machine. +
+
+ + +
+ +