Claude-real-video - any LLM can watch a video
Most AI tools don't really see a video. Paste a YouTube link into ChatGPT and it reads the transcript, not the picture. Claude won't take a video file at all. Even Gemini, which can read video natively, needs the file uploaded to Google and samples frames at a fixed interval (1 fps by default), so fast cuts slip past.
claude-real-video does it differently, and locally: point it at a URL or a file, and it pulls the frames that actually matter (every scene change, not a fixed quota), throws away the near-duplicates, transcribes the audio, and hands you a clean folder any LLM can read — on your own machine, nothing uploaded.
Then drop the frames + MANIFEST.txt into Claude / ChatGPT / Gemini and ask away.
Why not just sample frames?
Most "let an LLM watch a video" scripts (and Gemini's own pipeline) grab frames at a fixed interval, e.g. one per second. That over-samples a static screencast and under-samples a fast-cut reel. claude-real-video instead pulls a frame at every scene change and throws away the near-duplicates, so you feed the model fewer, more meaningful frames: cheaper context, better understanding.
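To make the trade-off concrete, here is a minimal sketch that contrasts fixed-interval sampling with change-driven selection on synthetic data. It is an illustration only, not claude-real-video's actual algorithm: the function names, the per-frame "signature" vectors, and the 0.3 threshold are all assumptions made for the example.

```python
from typing import List, Sequence


def fixed_interval(n_frames: int, fps: float, every_s: float = 1.0) -> List[int]:
    """Indices picked when sampling one frame every `every_s` seconds."""
    step = max(1, round(fps * every_s))
    return list(range(0, n_frames, step))


def scene_changes(
    signatures: Sequence[Sequence[float]], threshold: float = 0.3
) -> List[int]:
    """Indices of frames that differ enough from the last kept frame.

    `signatures` is a hypothetical per-frame feature vector (for example a
    tiny normalized histogram); the threshold is an arbitrary demo value.
    """
    if not signatures:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(signatures)):
        prev, cur = signatures[kept[-1]], signatures[i]
        # Mean absolute difference between the two signatures.
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            kept.append(i)
    return kept


# A static screencast: 20 identical frames at 2 fps (10 seconds).
static = [[0.5, 0.5]] * 20
# A fast-cut reel: 4 different shots inside one second at 4 fps.
reel = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

print(len(fixed_interval(20, 2.0)), len(scene_changes(static)))  # 10 vs 1
print(len(fixed_interval(4, 4.0)), len(scene_changes(reel)))     # 1 vs 4
```

On the static clip the fixed sampler keeps ten identical frames where one would do; on the reel it keeps one frame and misses three shots.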
Install
System requirement: ffmpeg
ffmpeg and ffprobe handle frame extraction and audio, and they aren't pip-installable, so install them once through your system's package manager.
Transcription uses the whisper CLI (installed by the [whisper] extra, or pip install openai-whisper). Whisper also relies on ffmpeg.
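Before a first run it can help to confirm that these external programs are actually on your PATH. The snippet below is a small standalone check using only the Python standard library; it is not part of claude-real-video, and the helper names `find_tools` and `missing_tools` are made up for this example.

```python
import shutil
from typing import Dict, Iterable, List, Optional


def find_tools(
    names: Iterable[str] = ("ffmpeg", "ffprobe", "whisper"),
) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Names from `found` that could not be located."""
    return [name for name, path in found.items() if path is None]


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
    if missing_tools(found):
        print("Install the missing tools before running the extractor.")
```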
Usage
python -m claude_real_video ... works as an alias for crv too.
Options
Use it from Python
How it works
So the model can see (key frames), read (transcript) and — with --keep-audio — hear (full soundtrack) the video. The transcript is plain text any model can read; the tool doesn't burn subtitles into the video — burning is a presentation choice, not something needed to make a video AI-readable.
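The near-duplicate step mentioned earlier can be pictured with a simple perceptual hash. The sketch below uses an average hash over tiny grayscale thumbnails and a Hamming-distance cutoff; this is one common technique chosen for illustration, not necessarily what claude-real-video does internally, and the names and the `max_distance` default are assumptions.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[int]]  # small grayscale thumbnail, values 0-255


def average_hash(thumb: Gray) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the thumbnail mean."""
    pixels = [p for row in thumb for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(thumbs: Sequence[Gray], max_distance: int = 1) -> List[int]:
    """Indices of frames whose hash is far from every frame kept so far.

    Comparing against all kept frames (not just the previous one) also
    drops a shot that reappears later in the video.
    """
    kept: List[int] = []
    kept_hashes: List[int] = []
    for i, thumb in enumerate(thumbs):
        h = average_hash(thumb)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(i)
            kept_hashes.append(h)
    return kept


a = [[10, 200], [10, 200]]        # dark left, bright right
a_noisy = [[12, 198], [11, 205]]  # same picture with slight noise
b = [[200, 10], [200, 10]]        # mirrored: a genuinely different frame

print(drop_near_duplicates([a, a_noisy, b, a]))  # [0, 2]
```

A real pipeline would first downscale each extracted frame to a small thumbnail (8x8 is typical for this hash) before hashing.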
Notes
License