Yeah, it's probably too expensive and complicated to send all the frames to an LLM. I only send about 30 images at 576x324, which is around 15,000 tokens per video and comes to about $0.045 per video. First I save one frame every second, then loop through them comparing frames to find the differences, and send only the screenshots that have changed significantly, up to a max of 30. Claude only allows 100 images per API call, so it would be a bit fiddly and costly to handle 7000 frames.
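A minimal sketch of that frame-selection step, not the actual code from my app. It assumes each frame is already loaded as raw grayscale bytes of equal size (for example, frames dumped with something like `ffmpeg -i input.mp4 -vf fps=1,scale=576:324 frame_%04d.png` and then read with an image library). The function names, the mean-absolute-difference metric, the threshold of 10, and the choice to compare against the last kept frame are all assumptions for illustration.

```python
from typing import List, Optional, Sequence


def mean_abs_diff(a: bytes, b: bytes) -> float:
    """Average per-pixel absolute difference between two grayscale frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_changed_frames(
    frames: Sequence[bytes],
    threshold: float = 10.0,  # assumed cutoff on a 0-255 scale; tune per video
    max_frames: int = 30,     # the cap mentioned above
) -> List[int]:
    """Return indices of frames that differ noticeably from the last kept frame."""
    kept: List[int] = []
    last: Optional[bytes] = None
    for i, frame in enumerate(frames):
        # Always keep the first frame, then keep only frames that changed enough.
        if last is None or mean_abs_diff(frame, last) > threshold:
            kept.append(i)
            last = frame
            if len(kept) >= max_frames:
                break
    return kept
```

Comparing against the last kept frame (rather than the immediately preceding one) means slow, gradual changes still get picked up eventually. Note that this stops at the first 30 changed frames, so a long video may need a higher threshold to spread the picks out.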

Though, now that I'm thinking about it, you could probably do this locally: just look at the part of the image that shows the current score and run some local OCR on it to check whether the score has changed in each frame. If it has, store the timestamp, then use ffmpeg to extract the relevant parts. You probably wouldn't need an LLM at all.
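Roughly, the local approach could look like the sketch below. It assumes the OCR step has already produced a list of `(timestamp_seconds, text)` readings for the cropped score region (any local OCR tool could fill that in; that part is not shown). The function names, the 5-second padding, and the file names are hypothetical. The ffmpeg flags used (`-ss`, `-i`, `-t`, `-c copy`) are standard ones.

```python
import re
from typing import List, Optional, Tuple


def parse_score(text: str) -> Optional[Tuple[int, ...]]:
    """Pull the numbers out of OCR text, e.g. '2 - 1' -> (2, 1)."""
    numbers = tuple(int(n) for n in re.findall(r"[0-9]+", text))
    return numbers or None


def score_change_times(readings: List[Tuple[float, str]]) -> List[float]:
    """Return timestamps at which the parsed score differs from the last good reading."""
    changes: List[float] = []
    prev: Optional[Tuple[int, ...]] = None
    for t, text in readings:
        score = parse_score(text)
        if score is None:
            continue  # OCR found no digits; skip instead of recording a false change
        if prev is not None and score != prev:
            changes.append(t)
        prev = score
    return changes


def clip_command(src: str, t: float, out: str,
                 before: float = 5.0, after: float = 5.0) -> List[str]:
    """Build an ffmpeg command that cuts a clip around timestamp t."""
    start = max(0.0, t - before)
    duration = (t + after) - start
    # -ss before -i seeks in the input; -c copy avoids re-encoding.
    return ["ffmpeg", "-ss", f"{start:.2f}", "-i", src,
            "-t", f"{duration:.2f}", "-c", "copy", out]
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. With `-c copy` the cut lands on the nearest keyframe, so clips may start slightly early; dropping `-c copy` re-encodes and cuts more precisely at the cost of speed.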

As for editing, one thing I do in my videos is use audio keywords so my app can do specific things. For example, I can say "AI, mark what I just said as important." Then, when the app transcribes the audio and the LLM processes it, it marks that part as a Distinct Moment with a start and end timestamp, plus a title and description, which shows in my app as a clickable link to that part of the video. I'm thinking of adding more commands for more complex editing too.
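To give a rough idea of the mechanics, here is a simplified sketch that spots the command with a plain regex instead of an LLM, so it is not how my app actually does it. It assumes the transcript arrives as `(start_seconds, end_seconds, text)` segments and treats the segment just before the command as "what I just said". The `Moment` class, the `COMMAND` pattern, and the function name are hypothetical, and generating the title and description is left out since that is the LLM's job.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical trigger phrase, modeled on the example command above.
COMMAND = re.compile(
    r"\bai\b[,.]?\s+mark what i just said as important", re.IGNORECASE
)


@dataclass
class Moment:
    start: float  # seconds into the video
    end: float
    text: str     # transcript text the moment covers


def find_marked_moments(segments: List[Tuple[float, float, str]]) -> List[Moment]:
    """Return the segment preceding each spoken command as a marked moment."""
    moments: List[Moment] = []
    for i, (_, _, text) in enumerate(segments):
        # A command in the very first segment has nothing before it to mark.
        if i > 0 and COMMAND.search(text):
            start, end, prev_text = segments[i - 1]
            moments.append(Moment(start, end, prev_text))
    return moments
```

A fixed regex like this breaks as soon as the transcription words the command slightly differently, which is a big part of why handing the transcript to an LLM is more forgiving.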