News · Hacker News · 07-02 · 19:10

Claude-real-video: any LLM can watch a video

Most AI tools don't really see a video. Paste a YouTube link into ChatGPT and it reads the transcript, not the picture. Claude won't take a video file at all. Even Gemini, which can read video natively, has to send it up to Google and samples frames at a fixed interval (1 fps by default), so fast cuts slip past.

claude-real-video does it differently, and locally: point it at a URL or a file, and it pulls the frames that actually matter (every scene change, not a fixed quota), throws away the near-duplicates, transcribes the audio, and hands you a clean folder any LLM can read — on your own machine, nothing uploaded.
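To make the "throws away the near-duplicates" step concrete, here is a minimal sketch of one common way to do it: an average hash compared by Hamming distance. This is an assumption for illustration, not claude-real-video's actual code; the function names, the tiny 2x2 grids and the `max_distance` threshold are all hypothetical (real frames would first be downscaled to something like 8x8 grayscale).

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Pack a small grayscale grid into bits: 1 where a pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(frames, max_distance: int = 1) -> list[str]:
    """Keep a frame only if its hash differs enough from the last kept frame.

    frames is an iterable of (name, grayscale_grid) pairs in playback order.
    max_distance is a hypothetical tuning knob, not a documented option.
    """
    kept = []
    last_hash = None
    for name, pixels in frames:
        h = average_hash(pixels)
        if last_hash is None or hamming(h, last_hash) > max_distance:
            kept.append(name)
            last_hash = h
    return kept


frames = [
    ("a.jpg", [[10, 200], [10, 200]]),   # first frame, always kept
    ("b.jpg", [[12, 198], [11, 201]]),   # same layout, slightly different noise
    ("c.jpg", [[200, 10], [200, 10]]),   # bright and dark sides swapped
]
print(drop_near_duplicates(frames))
```

Comparing against the last kept frame, rather than the previous frame, stops a slow drift from sneaking through as a long chain of small changes.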

Then drop the frames + MANIFEST.txt into Claude / ChatGPT / Gemini and ask away.

Why not just sample frames?

Most "let an LLM watch a video" scripts (and Gemini's own pipeline) grab frames at a fixed interval, e.g. one per second. That over-samples a static screencast and under-samples a fast-cut reel. claude-real-video instead keeps a frame at every scene change and drops the near-duplicates.

You feed the model fewer, more meaningful frames — cheaper context, better understanding.
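The over-sampling and under-sampling argument can be checked with a toy model. The sketch below does not touch real video: scenes are just a list of hypothetical start times, and the helper names are made up for this example.

```python
from bisect import bisect_right


def fixed_interval_samples(duration_s: float, interval_s: float = 1.0) -> list[float]:
    """Timestamps a fixed-rate sampler grabs: 0, interval, 2*interval, ... (< duration)."""
    count = int(duration_s / interval_s) + 1
    return [i * interval_s for i in range(count) if i * interval_s < duration_s]


def scenes_hit(scene_starts: list[float], samples: list[float]) -> set[int]:
    """Indices of the scenes that receive at least one sample.

    scene_starts must be sorted and begin at 0.0; scene i runs from
    scene_starts[i] up to the next start (or the end of the video).
    """
    return {bisect_right(scene_starts, t) - 1 for t in samples}


# Static screencast: one 60-second scene.
screencast = fixed_interval_samples(60.0)
print(len(screencast), "frames for", len(scenes_hit([0.0], screencast)), "scene")

# Fast-cut reel: eight scenes in three seconds.
reel_starts = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0, 2.4, 2.8]
reel = fixed_interval_samples(3.0)
print(len(reel), "frames covering", len(scenes_hit(reel_starts, reel)),
      "of", len(reel_starts), "scenes")
```

At 1 fps the screencast yields 60 frames of a single scene, while the reel yields 3 frames that reach only 3 of its 8 scenes. One frame per scene change would give 1 and 8 frames respectively.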

Install

System requirement: ffmpeg

ffmpeg / ffprobe are used for frame extraction and audio, and aren't pip-installable, so install them once at the system level.

Transcription uses the whisper CLI (installed by the [whisper] extra, or pip install openai-whisper). Whisper also relies on ffmpeg.
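Before a first run it can help to confirm that the external programs named above are actually on PATH. The check below is a small sketch using only the standard library; `missing_tools` is a hypothetical helper, not part of claude-real-video.

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_tools(
    tools: Iterable[str] = ("ffmpeg", "ffprobe", "whisper"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg, ffprobe and whisper are all available.")
```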

Usage

python -m claude_real_video ... works as an alias for crv too.

Options

Use it from Python

How it works

So the model can see (key frames), read (transcript) and — with --keep-audio — hear (full soundtrack) the video. The transcript is plain text any model can read; the tool doesn't burn subtitles into the video — burning is a presentation choice, not something needed to make a video AI-readable.
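The tool's own steps are not reproduced here, but the see / read / hear split can be approximated with stock ffmpeg and the whisper CLI. The sketch below only builds the command lines; the file names, the 0.3 scene threshold and the `base` model are assumptions for illustration, not the tool's defaults.

```python
import subprocess
from pathlib import Path


def scene_frames_cmd(src: str, out_dir: str, threshold: float = 0.3) -> list[str]:
    """ffmpeg command that writes one JPEG per detected scene change."""
    pattern = str(Path(out_dir) / "frame_%05d.jpg")
    return [
        "ffmpeg", "-i", src,
        # Keep only frames whose scene-change score exceeds the threshold.
        "-vf", f"select='gt(scene,{threshold})'",
        # Emit frames as they are selected instead of padding to a constant
        # rate (newer ffmpeg builds spell this option -fps_mode vfr).
        "-vsync", "vfr",
        pattern,
    ]


def audio_cmd(src: str, wav_path: str) -> list[str]:
    """ffmpeg command that drops the video and writes 16 kHz mono audio."""
    return ["ffmpeg", "-i", src, "-vn", "-ac", "1", "-ar", "16000", wav_path]


def transcribe_cmd(wav_path: str, out_dir: str, model: str = "base") -> list[str]:
    """whisper CLI command that writes a plain-text transcript into out_dir."""
    return ["whisper", wav_path, "--model", model,
            "--output_format", "txt", "--output_dir", out_dir]


def run_all(src: str, out_dir: str) -> None:
    """Run the three steps in order; needs ffmpeg and whisper installed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    wav = str(Path(out_dir) / "audio.wav")
    for cmd in (scene_frames_cmd(src, out_dir),
                audio_cmd(src, wav),
                transcribe_cmd(wav, out_dir)):
        subprocess.run(cmd, check=True)
```

Calling `run_all("talk.mp4", "out")` (a hypothetical input file) would leave frames, audio.wav and a .txt transcript in one folder, which is roughly the shape of output described above.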

Notes

License
