FFMPEG from Zero to Hero (ffmpegfromzerotohero.com)
412 points by wilsonfiifi on March 6, 2021 | 134 comments


Might have come in handy while I was struggling to type out

    ffmpeg -i part0.mp4 -i part1.mp4 -filter_complex "[1:v]scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2[v1]; [0:v] [0:a] [v1] [1:a] concat=n=2:v=1:a=1 [v] [a]" -map "[v]" -map "[a]" out.mp4
to concatenate two videos with different sizes today.

My relationship with it is almost identical to my relationship with any bash script longer than a one-liner: I have to relearn it every time I want to engage with it.


I never had to use it since I never had anything complicated enough, but for the next time you have to relearn things: I've heard good things about ffmpeg-python [0]. It supports multiple inputs, outputs, custom filters, etc.

Their example from the readme:

    ffmpeg -i input.mp4 -i overlay.png -filter_complex "[0]trim=start_frame=10:end_frame=20[v0];\
    [0]trim=start_frame=30:end_frame=40[v1];[v0][v1]concat=n=2[v2];[1]hflip[v3];\
    [v2][v3]overlay=eof_action=repeat[v4];[v4]drawbox=50:50:120:120:red:t=5[v5]"\
    -map [v5] output.mp4
Which turns into

    import ffmpeg

    in_file = ffmpeg.input('input.mp4')
    overlay_file = ffmpeg.input('overlay.png')
    (
        ffmpeg
        .concat(
            in_file.trim(start_frame=10, end_frame=20),
            in_file.trim(start_frame=30, end_frame=40),
        )
        .overlay(overlay_file.hflip())
        .drawbox(50, 50, 120, 120, color='red', thickness=5)
        .output('out.mp4')
        .run()
    )

[0] https://github.com/kkroening/ffmpeg-python


At this point I'd prefer it if ffmpeg included Lua and allowed you to pass Lua scripts, kinda like what ZFS did[1].

[1]: https://zfsonlinux.org/manpages/0.8.3/man8/zfs-program.8.htm...


I feel like it's better to embed the tool in a language (or in many community-supported languages) than to embed a programming language in the tool.


Per the docs on zfs program:

    The entire script is executed atomically, with no other administrative operations taking effect concurrently. 
Which would be somewhat hard to ensure with an external language.


This just sounds like when the script is run, a lock is taken. It seems like a pretty fast and loose definition of "atomic" (as in, not the ACID sense), as it says below:

     If a fatal error is returned, the channel program may have not executed at all, may have partially executed, or may have fully executed but failed to pass a return value back to userland. 
If an external tool is allowed to acquire this kind of exclusive lock, I don't see the difference.


> If an external tool is allowed to acquire this kind of exclusive lock, I don't see the difference.

That's the whole point, as I understand it. The channel programs are executed in the kernel, in a way that cannot be done by external programs. Some more details here[1].

edit: I also think you should have included the note, which says

Note: ZFS API functions do not generate Fatal Errors when correctly invoked, they return an error code and the channel program continues executing.

So while it's not quite ACID-level, it's not as bad as it sounds without that note.

[1]: https://openzfs.org/wiki/Projects/ZFS_Channel_Programs


My feeling is that they're already there with the command line (just look at the filter stuff). Might as well just go all the way and have something sane that's supported all over.


I find that ffmpeg's regular command lines are easier to mentally parse if I include additional \ to separate things out onto their own lines. That's what I've done in several shell scripts that do complicated things with ffmpeg, along with comment lines and echoing commentary on what it's doing to the terminal.
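For example, a trimmed-down sketch of that style (filenames and encoder settings are placeholders):

    # Re-encode to 720p H.264 + AAC; one option group per line for readability
    echo "Re-encoding $1 to 720p..."
    ffmpeg -i "$1" \
        -vf "scale=-2:720" \
        -c:v libx264 -crf 20 -preset slow \
        -c:a aac -b:a 160k \
        "${1%.*}_720p.mp4"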


FTR, I also do that for regular shell scripts.


ffmpeg-python is great! Easier to construct complex commands, and to read/edit them later.


It's true, and I feel the same, but to give it some credit: wouldn't that apply to anything you don't use regularly? Programming languages or even regular languages. You have to use it or lose it.


If you don't perform a task frequently, a discoverable interface (e.g. GUI) is dramatically more productive than reading documentation.


Conversely, if you perform a task frequently you want to be able to script it out and automate it away.


Why not both? GUIs can be built that can emit the command-line equivalent.

In ffmpeg's case, I would be looking for a GUI beyond just "here's all the command line parameters, only in a form"... the hardest thing is taking something I already know what I want to do, and encoding it into the command line, because the way to encode a graph into a command line is not obvious. It is easy in principle but there are so many degrees of freedom in how it is done that I can never remember it all. This is a perfect time for a visual language.


There is WinFF (not sure of its state today); it's just a GUI frontend, and it even shows you the command it runs in a console, so you get an example of something already written. Scripting can help with more complicated operations, but for common shorthand there are aliases, which can wrap single command lines or point to script files.


I use ffmpeg not infrequently, but rarely to do the same thing, so automation doesn't help. Knowing how filter chains work and having access to the filter documentation is great.


Well there are video editing GUIs, even straightforward ones. But they'll have footguns and surprising behaviour everywhere. Lossy or compressed video is a hard problem.


This really depends a lot on the quality of the GUI and the documentation.

IMO the best option is the best of both worlds. A GUI that outputs the CLI command along with good documentation would be ideal.


> wouldn't that apply to anything you don't use regularly?

Doesn't apply to well-designed GUI programs. In good GUIs features are discoverable, you don't need prior knowledge or documentation to use software.

Experience speeds things up, e.g. keyboard shortcuts are faster than menus or buttons. That's optional though, one can still use the software without these shortcuts.


There's a continuum, and yes that applies to human languages, both natural and constructed, and to other skills. But where you can choose (so, not natural languages really) you can design things to make this easier, or, I guess, harder.

For example English orthography is a horrible mess. The inventory of squiggles needed to write English isn't too bad, but the correspondence between the words you know and the correct sequence of squiggles to write them is unnecessarily complicated. No benefit accrues to us from this, and in some other written languages it's much easier.

I'd place ffmpeg somewhere in the not-bad but not-great part of the continuum. As with English orthography of course the problem is if you change things with the intent to make them better you actually introduce a cost for existing users which may be impossible to sustain.


You don’t lose it completely. It rusts over, but I had occasion recently to resurrect my 6502 assembler, and it was still in muscle memory.

Kinda like running into an old friend you haven’t seen in decades.


> My relationship with it is almost identical to my relationship with any bash script longer than a one-liner: I have to relearn it every time I want to engage with it.

This is how I feel about jq, every time I want to parse some JSON I have to re-read their documentation. Their API is not very intuitive (at least to me).


I just grep through my bash history ;)


I have shortcuts to save last (sl) executed command and another to grep that file.

    alias sl='fc -ln -1 | sed "s/^\s*//" >> ~/.saved_commands.txt'
    alias slg='< ~/.saved_commands.txt grep'
You can also add a comment to the command before saving. For ex:

    $ foo_cmd args #this command does this awesome thing
    $ sl


I have a $HOME/bin folder full of bash scripts invoking imagemagick or ffmpeg one liners.


One handy idiom I like to use after typing out a long command that I might want to save is

    echo "!!" > ~/bin/name-of-script
where !! is the previous command


care to share? :D


This one for instance to grab my desktop quickly without starting another program when I want to make a quick demo for a colleague:

  #!/bin/bash

  size=${1:-"1920x1080"}
  offset=${2:-"0,0"}
  name=${3:-"video"}

  ffmpeg -video_size "$size" -framerate 25 -f x11grab -i ":0.0+$offset" -c:v libx264 -crf 0 -preset ultrafast "$name.mkv"
  ffmpeg -i "$name.mkv" -movflags faststart -pix_fmt yuv420p "$name.mp4"


I remember when mencoder wasn't just a frontend for ffmpeg; I literally just saved a text file with the ridiculous command required to convert a DVD to an AVI.

Their man page even had a three-line-long example for basic stuff. ffmpeg is at least a bit more consistent with its syntax, though only barely!


Feels like a nice & typed Haskell eDSL on top of ffmpeg would be helpful.


It would be hard to be expressive enough considering there’s a lot of exceptions where things don’t really quite work.

E.g. copying without conversion between different formats isn't really possible when it looks like it should be, because there's a lot of incorrect handling of timestamps, both in libavformat and in the files themselves.


> I have to relearn it every time I want to engage with it

That's my relation with regex.


I’ve heard this a few times. I’m the opposite, I think I learned it once and never forgot it, it seems impossible. Do you understand regex in terms of a state machine? Draw it out. It’s a very simple language.


It's the syntax, not the idea, that I forget. I use it rarely, which is why I have to relearn from a cheat sheet what each symbol means.


One of the interesting things you can do with ffmpeg, if you have a LOT of scratch disk space, and you're trying to compare subjective encoder quality (not VMAF, but human eyeballs) on a certain video file:

1) take your raw uncompressed y4m file and write it out to a directory of PNG files, one png file per frame. this is your static image reference baseline for subjective eyeballs.

2) take your raw uncompressed y4m file and encode it to x265 or whatever codec you're testing, at various different bitrates and encoder settings

3) take your various encoded x265 files and also write those out to PNG files in separate directories

4) pick exactly the same frame-number filename from your 'master' PNGs and your encoder-output PNGs, copy them, and put them in the same directory so you can quickly flip back and forth between them in an image slideshow application.

If you want to do this with 2160p24,p25 or p30 videos at several minute lengths, be prepared to have 150-160GB of scratch disk per y4m and per PNG-dump-directory.
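In ffmpeg terms, steps 1 through 3 boil down to something like this (a sketch; filenames, CRF and preset are illustrative, and the PNG directories must exist beforehand):

    # 1) dump the uncompressed reference, one PNG per frame
    ffmpeg -i source.y4m ref_png/%06d.png

    # 2) encode a test version (repeat with different settings/bitrates)
    ffmpeg -i source.y4m -c:v libx265 -crf 28 -preset medium test_crf28.mp4

    # 3) dump the encoded result to its own PNG directory
    ffmpeg -i test_crf28.mp4 enc_crf28_png/%06d.png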


Thing is, x264/x265 are explicitly designed to provide better subjective quality in moving images.

For example, x264 will “allocate” more data to parts of the image that change less, as those remain on screen longer so artifacts on them are more visible. While fast moving parts can be compressed into a blurry blocky haze as they appear for maybe 20 frames so few people will really notice artifacts.

(Of course the actual implementation is 100x more sophisticated and complex)

You are doing the opposite with PNGs: focusing on the quality of a single frame instead of on the perceived quality of the frames when viewed as video.


That's indeed true - and why for subjective eyeball moving picture tests you should also do things like cut out a 20 second piece of a raw/y4m file, encode it with various x265 settings, and then put those 4 or 5 pieces in a VLC playlist so you can quickly compare them in series.

For objective comparisons there is of course VMAF, which is an essential tool for doing automated comparisons of videos vs their uncompressed original. One of the reasons why VMAF was created is that subjective eyeball evaluation of codecs (whether still or moving) will vary between person to person, and is very labor intensive.

But still image PNGs of frames also serve a useful purpose when dealing with lower bitrate videos, for your personal opinion comparison of blockiness, color banding, blobs of color in areas that are mostly the same color, etc.


Theoretically, when comparing frames, the frame type (I/P/B) should be considered, and it's better to compare frames of the same type. How do you handle frame types?


If this book does what it claims to do, it should probably cost more (pls raise the price after I buy it). I strongly believe that inside ffmpeg may be the secret to the cosmos, the universe and life itself


I just had to tangle with ffmpeg to stream some video from a spacecraft. I’m genuinely undecided as to which end of that problem was the more complicated one.


Please elaborate, is this a work or hobby project, which space craft?


I'm not OP, but I suspect it is weather-sat imagery: you can use an SDR to receive images captured by NOAA weather satellites[1]

1. https://www.rtl-sdr.com/rtl-sdr-tutorial-receiving-noaa-weat...


Came to this thread just to say this: ffmpeg is one of the most unbelievable FOSS efforts I have ever used. I am in almost constant daily awe of ImageMagick and this.


If you Google every title in the table of contents, you will find ten Stack Overflow pages with answers for each of them. Source: I've done all of that. The author didn't try to describe the hard things, like, say, how to do distributed encoding of a single video, etc.


I have yet to get 42 as the output of any ffmpeg command I have ever used (and I use it daily). So not sure it is the answer to life, the universe, and everything.


Well, there is at least a poster image of the cosmos.


I love FFMPEG, but it has truly awful handling of timestamps by default. You can't easily extract a clip using an exact timestamp because it rounds to the nearest keyframe, which may be many seconds earlier. It's such a powerful command-line tool, but I find the user-interface far more difficult than it needs to be. In "git" terminology, there's so much plumbing but not enough porcelain

I wish there was a scriptable / command-line interface for HandBrake (which is already based on FFMPEG) [1], where the user just provides the high-level commands: 99% of the time I want to specify high-level commands: extract clip from this timestamp, including SRT subtitles, crop to this geometry, and shift the audio by 3 seconds.

[1] https://en.wikipedia.org/wiki/HandBrake


I use ffmpeg for encoding audio and metadata extraction... very powerful and a really sophisticated tool, but handling timestamps I agree with you. Some caveats that I came across along the way:

- Inaccurate time handling (https://github.com/yermak/AudioBookConverter/issues/21#issue...)

- Incorrect handling of mp3 chapters https://github.com/sandreas/m4b-tool/issues/71#issuecomment-...

For anyone interested in using ffmpeg in a Docker container (without the dependencies / compiling stuff), this alias is pretty useful (it works with relative paths ;-):

  alias ffmpeg='docker run --rm -u $(id -u):$(id -g) -v "$PWD:$PWD" -w "$PWD" mwader/static-ffmpeg:4.3.2'


ffmpeg seeks accurately when transcoding. [1] Cutting on non-keyframes when stream copying results in broken video until the next keyframe.

Handbrake does have a CLI. [2] I haven't used it and I'm not sure what advantage it might have over ffmpeg. I personally use mkvmerge or ffmpeg for my muxing/cutting and VapourSynth for encoding.

[1] https://trac.ffmpeg.org/wiki/Seeking

[2] https://handbrake.fr/docs/en/latest/cli/cli-options.html
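For comparison, an accurate (transcoding) cut per [1] has roughly this shape (times and codec settings are illustrative):

    # -ss before -i still seeks quickly, but because the output is re-encoded
    # the cut lands on the exact requested frame rather than a keyframe
    ffmpeg -ss 00:01:00 -i input.mp4 -t 30 -c:v libx264 -crf 18 -c:a aac output.mp4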


Yeah, it is literally not possible to avoid seeking to a keyframe if you are stream copying.

There is, however, plenty of great software that can cut at any frame and only re-encode the frames that fall outside complete GOPs. Most of it is commercial though; I haven't found one that is free and good.

----

Also, seeking in FFMPEG in practice, is actually more complicated than the guide [1] you linked. Below is a note I keep for own reference for keyframe-copy. Hope someone will find it useful.

How to keyframe-cut video properly with FFMPEG

FFMPEG supports "input seeking" and "output seeking". The output seeking is very slow (it needs to decode the whole video until the timestamp of your -ss) so you want to avoid it if unnecessary.

However, while -ss (seek start) works fine with input seeking, "-to/-t" (seek end/duration) is somehow vastly inaccurate in input seeking for FFMPEG. It could be off by a few seconds, or sometimes it straight up does not work (for some MPEG-TS files recorded from TV).

The best of both worlds is to use input seeking for -ss and then output seeking for -to. However, this way, the timestamp will restart from 0 in output seeking. So instead of using -to, you should calculate -t (duration) yourself by subtracting -ss from -to, and use `-t duration` instead. Below is a quick Python script to do so.

https://gist.github.com/fireattack/9a100c5a200154937babd1823...

(You can also try to use -copyts to keep timestamp, but not recommended because it doesn't work if the video file has non-zero start time.)
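Put together, the command this note describes has roughly this shape (timestamps and filenames are made up; 90 is just 2:30 minus 1:00 in seconds):

    # input seeking: -ss before -i jumps to (the keyframe at or before) 1:00 without decoding everything
    # output side:   -t after -i limits the duration, computed as end minus start
    ffmpeg -ss 0:01:00 -i input.ts -t 90 -c copy cut.ts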


>The best of the two worlds is to use input seeking for -ss and then output seeking for -to.

Could you clarify what you mean with "use output seeking for -to"?

From your Python script it seems that you're just using input seeking and then specifying the duration in seconds with `-t`, which is actually the same as using `-to` when doing input seeking.

Also, input seeking should be inaccurate when doing stream copy, so I'm not sure your script actually works as expected?

(And unless I'm missing something, it seems that all of this is well-explained in the ffmpeg guide linked above.)

Thanks!


I think I know where you're confused: from the guide, it looks like the only thing that matters is where you put -ss; but in reality, where you put -t/-to matters too.

In my Python script, I did input seeking for -ss (start point) part, and then output seeking for -t part (end point). As you can see, the -ss part is before -i {inputfile}, and -t is after.

-ss 1:00 -i file -t 5

is NOT the same as

-ss 1:00 -t 5 -i file.

The latter has a bug that happens frequently when I'm trimming MPEG-TS files recorded from HDTV. It literally doesn't stop at the -t/-to timestamp, for reasons I don't know. And it only happens with stream copy.

Below is a quick showcase: t.ts is the source, and the filenames show how I generate them with FFMPEG (for example, ss_t_i means input seeking -ss first, then -i t.ts, then -t 1:00).

https://i.imgur.com/lLUSzEM.png

As you can see, if I use -t/-to before -i, it doesn't cut the file properly.

>Also, input seeking should be inaccurate when doing stream copy

Yeah, it's not frame-accurate, it can only cut at keyframes, but that's enough for my application. By the way, the same inaccuracy exists for output seeking if you're doing stream copy.


Edit: I just reported the bug to ffmpeg tracker: https://trac.ffmpeg.org/ticket/9141


I battled with this a lot with https://github.com/umaar/video-everyday and still haven't found a better solution.

What I don't understand is, how can professional video editing tools trim accurately (and very quickly)? What are they doing differently to ffmpeg?

If I do things the "fast way" with ffmpeg, the exported video has random black frames, which I think is related to the keyframe issue you mention. If I do things the "slow way" (i.e. accurately) with ffmpeg, it takes a huge amount of time (at least with large 4K videos). But I don't understand how I can drop that same 4K video into Screenflow, trim 1 second out of it and export it in a matter of seconds.


All of the proprietary tools I know of for doing frame-perfect cuts (VideoRedo, TMPGEnc, SolveigMM) work by determining (guessing?) the original encoding parameters and then only reencoding the first and last GOP. The rest of the video is just remuxed.


x264 encoded streams have the original encoding parameters included by default.


The encoding-settings metadata tag can be stripped.

Regardless, I don't think these programs are "matching" anything. TMPGEnc, for example, has settings to choose what quality you want for the re-encoded frames.


The parameters being matched would be those that maintain decoder config. Usually, bitrate/quantizer values don't come into that.


Oh yeah, the level would (should) definitely be kept.


If you have sufficient scratch disk space you can absolutely use ffmpeg to take input of a h264, h265, vp8 or vp9 file, or just about anything else, and write it out to a y4m format uncompressed, raw yuv420p or yuv422 file. From there you can use just about any industry standard commercial or free GUI based video editing tool (kdenlive, etc) to extract a clip, down to per-frame precision.
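The conversion itself is a one-liner (filenames are placeholders; mind the disk space):

    # decode any supported input into an uncompressed 4:2:0 y4m file
    ffmpeg -i input.mp4 -pix_fmt yuv420p output.y4m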


But why is that multi-step process even necessary?

There should be a quick command-line utility to concatenate multiple video files according to exactly the timestamps the user has provided. It's such a common operation.

There's no reason that the tool can't simply do a streaming decode of multiple different file formats and concatenate the video with sub-second precision. If input video resolutions are different, scaling the smaller video to the largest resolution is what the user almost always wants.

I get that FFMPEG is a "plumbing" CLI tool, but a "porcelain" wrapper would be amazing!


I understand that you want to do that, but any attempt to do so will be decidedly non-optimal due to how keyframes and lossy encoders work.

That holds even if your two files were encoded with x265 at exactly the same bitrate. It's a much more complicated problem than it appears at first glance, once you really dig into the command line options and encoding parameters of codecs like x264, x265 and vp9.

It's not as simple as concatenating two files together. You can also select down to per-frame precision using kdenlive and loading different x264,x265,vp8,vp9 files into it and cutting/editing them together. You will then need to re-encode the resulting output. kdenlive is ultimately a nice GUI front end on top of this:

https://www.mltframework.org/


When I ran into this, it came down to if I wanted to splice two videos together, or re-encode them.

Due to how keyframes work, cutting on keyframe boundaries is a lot faster and easier and doesn't require re-encoding in many cases. This is the default for the segment muxer.

Cutting between keyframes is a fair bit more effort, and requires re-encoding, which is why I guess it's not the default.
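For reference, a keyframe-boundary split with the segment muxer looks roughly like this (the 60-second target length is arbitrary):

    # split near each 60 s mark, cutting only on keyframes, with no re-encoding
    ffmpeg -i input.mp4 -c copy -f segment -segment_time 60 -reset_timestamps 1 part%03d.mp4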


I'm not a video expert, but one thing I've noticed is that there seems to be a lot of discrepancies between video files and the way programs use them.

I think it might be differences between how the container describes the video and the video itself, and which one is chosen as the truth during operations.


the "video itself" doesn't really exist. If you're on Windows and use MPC, you can test this yourself - use WMV9 as a renderer, take a PNG screenshot, then switch to NVENC, and take a screenshot at the same timestamp. You'll notice that on most videos, the screenshots are not the same (with NVEnc introducing macroblocking in darkness, which WMV9 gradients out properly). Using the rendered video as a source of truth instead of the source material itself would be as big of a mistake as using Photoshop on a JPEG.


Some well-meaning advice: it's great when a TOC and a sample chapter are there (they're almost mandatory for me to consider a self-published book from an author I don't already know).

But that chapter shouldn't simply be the first n pages covering "what is ffmpeg", "where do I download ffmpeg" and "how do I install ffmpeg".

Show us a practical example of the cool things I can do with ffmpeg. Your TOC is enticing; pick one of those things (Opus or audio from waveform or whatever) and show that, including the command line and your explanation leading up to it.

So far I have no idea whether you explain anything, or whether you're just listing "100 cool one-liners".

(The latter may also be fine, but I don't know what I'm buying here)


The part of the book that covers raw[0] image conversion is lacking. It suggests ImageMagick or SIPS (on macOS), completely ignoring the difficulty of properly converting image from scene-referred to output-referred format. The result is probably going to be of limited usefulness in most cases involving interpretation of raw footage.

If the author is here, I suggest looking into RawTherapee. It offers a GUI for authoring and applying raw processing profiles, as well as a CLI tool that can be used for batch application of a given profile to raw images. If you consider raw image processing within the scope of the book, you might as well cover it properly.

[0] Setting aside the insistence of the author to spell “RAW” as an abbreviation, which it isn’t. “PodCast” is another instance of weird, excessive and sometimes inconsistent capitalization style used in the book. Not to say this detracts from the substance, but I would suggest that an editor or a proofreader could help your books be more professional.


Is there any book you’d recommend for scripting vfx/media pipelines using tools like ffmpeg , imagemagick , etc ?


This is the only time I'll admit this, but my favorite book on FFMPEG is called google.com

Just like the rest of the internet, there's probably very little that you could do right now that someone else has not already done or struggled with as well. Most of the time, reading about what people have tried that did not work is still incredibly useful in and of itself. It sucks while you're under the pressure of a deadline, but eventually you just sort of "get it".

The trick is, you have to use it frequently, and not just every now and then when some task comes up. But that's no different than any other tool. Yes, the commands can get hairy and scary-looking, but so can SQL statements.


I don’t know of any book like that. I have a CinemaDNG processing workflow: create a camera/lens profile using a ColorChecker and DCamProf, batch develop DNGs using RawTherapee into 16-bit TIFFs, export TIFF sequence as a movie using DaVinci Resolve (30 FPS if delivering to popular services like Instagram, where 24 FPS causes subtle stuttering). I’ve been meaning to script this all, Resolve can be easily replaced with ffmpeg, but didn’t get around to that yet. For basic VFX I’ve used Blender (which can track the scene allowing to incorporate 3D objects, and can act as a decent NLE) but again not in a scripted way. It’s just personal projects so there’s not enough pressure to automate.
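The Resolve step in particular maps onto a plain image-sequence encode, something like this (a sketch; the filename pattern and frame rate are illustrative):

    # turn a numbered TIFF sequence into an H.264 movie at 30 fps
    ffmpeg -framerate 30 -i frame_%04d.tif -c:v libx264 -pix_fmt yuv420p -crf 18 out.mp4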


Ah, gotcha.

Thanks for the reply!


I feel like someone should teach a course on how to use the grouping of command line applications: convert (image magick), pdftk, pandoc, and ffmpeg.

Lots of power to be had in mastering them.


type command.

roll face on keyboard.

hit enter.

repeat until results look good!


So machine learning


Hot dog/not hot dog


Only if your face has a gradient.


Lovely!

A tangentially related question: I've always found FFMPEG-the-cli-program an amazing piece of software, incredibly powerful, versatile and well-made. I therefore expected the same when I had to interact with its library interface (i.e. libavcodec, libavformat) recently. How disappointing, and very frustrating an experience! The docs felt extremely thin and full of "ah yeah don't use the foo function after all, it's since been replaced with foo_2 and foo_2really3forreal, but the docs don't mention it", and the API conventions seemed very random and inconsistent. Is it just me?

This is not a complaint; thank you to the people who spend their free time developing a free multimedia suite for me to use! I was just surprised about the perceived quality differences between FFMPEG-the-cli-program and FFMPEG-the-library.


Something weird (IMO) about ffmpeg is that it doesn't do hardware-accelerated encoding or decoding by default unless you compile it with support and pass it some extra command line flags.

If you are using ffmpeg with hardware-accelerated codecs like H.264, remember to take the free 10x speed boost!
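For example, on an NVIDIA card the invocation looks something like this (assuming an ffmpeg build compiled with CUDA/NVENC support; codec and filenames are illustrative):

    # hardware decode plus hardware H.264 encode on an NVIDIA GPU
    ffmpeg -hwaccel cuda -i input.mp4 -c:v h264_nvenc output.mp4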


The boost is not free though. You lose some quality, gain some vendor-specific artifacts, lose the ability to adapt to content, and the ability to trade time for compression ratio or quality.


> If you are using ffmpeg with hardware-accelerated codecs like H.264, remember to take the free 10x speed boost!

The speed boost isn't free: hardware-accelerated encoders often can't compress as well as software encoders.

A video codec isn't a fixed rule book that all the encoders follow to get the same result; it's more akin to CSS: there are a bunch of tools you can use, but there's no one way to combine them to get a particular result. If you want to make a chess board with HTML and CSS, just think of how many ways there are to do it; it's similar when it comes to video encoding.

So different encoders have different results but why would a hardware encoder compress worse than a software encoder? The answer is simple: a hardware encoder needs to be implemented in hardware. That comes with a ton of constraints that CPU encoders don't have to deal with, such as physical space on silicon, power, heat, limited memory and so on and so forth. This means that the encoder has to make compromises that a software encoder doesn't.

So how big is the difference? It depends a bit how you measure it but Moscow State University has a ton [0] of data about different encoders and just last year, they evaluated a bunch of hardware encoders [1] and compared them to software. You can see in their results [2] that relative to x264 (software, which happens to be ffmpeg's default), NVENC (NVIDIA's encoder, present on their GPUs) took 21.3% more bytes to produce the same subjective result.

[0]: https://www.compression.ru/video/codec_comparison/index_en.h...

[1]: https://www.compression.ru/video/codec_comparison/hevc_2020/...

[2]: https://i.imgur.com/Smh4v3P.png


Accelerated encode in particular is very likely to give inferior quality and a more limited range of encoding options, and also introduce a situation where those factors change from machine to machine (say, based on which codecs and options your particular GPU supports). Of course, if speed and resource consumption are the key factors then it can definitely make sense to use them, but probably not as a default.


The default is to use software for the key reason that ffmpeg running on headless servers will often have no GPU to access; when hardware acceleration is present, it is specific to certain vendors only; plus, the software implementation of what the hardware would do is higher precision and higher quality. So the sensible default is no acceleration.


You don't want to necessarily use it by default.


You do want to explain why, though, and not just tell people not to.


At least on Mac, the quality of the encoded H.264 video is not the same (same bitrate but lower quality, though much faster encoding), and the scope for fine-tuning is also limited.


As noted below, the quality of the hardware encoder can differ from what you can get otherwise. So I don't see a problem with not making it the default for ffmpeg.


Some of them depend on a proprietary SDK.


I'm really interested in learning how to fix bad files that have been recorded from a live RTP (or WebRTC) stream. These files tend to have gaps in their PTS or DTS, caused by UDP packet loss when the streaming took place.

There are a myriad of StackOverflow questions, mailing lists, forums... but never a well structured and comprehensive analysis of this topic. And FFmpeg itself, while being a feat of a software project, has such a lackluster and incomplete doc.

I'd like to see a discussion about recovering H.264 and VP8 timestamps, with minimal processing (i.e. not transcoding, if possible, otherwise of course it becomes an easy thing), which covered the whys and whens of using these FFmpeg options:

• -fflags +genpts

• -fflags +igndts

• -vsync

• -copyts

• -use_wallclock_as_timestamps
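For reference, two typical shapes such a repair attempt takes (a sketch; the filenames and stream URL are placeholders, and whether either helps depends entirely on the file):

    # regenerate missing PTS at demux time while stream-copying
    ffmpeg -fflags +genpts -i captured.webm -c copy repaired.webm

    # or, when recording live, stamp packets with their arrival time instead
    ffmpeg -use_wallclock_as_timestamps 1 -i rtsp://example/stream -c copy capture.mkv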

Does this book cover these options and the topic of timestamp reconstruction?


It does not, unfortunately. Those flags or even the `setpts` filter are not even mentioned once in the whole book.


Is FFMPEG globally unique in being extremely powerful, ubiquitous and totally incomprehensible? It's a real treasure and gift to humanity.


Imagemagick is in the same ballpark of power and complexity.


Perl edges it out, but only barely.


    FFMPEG stands for “Fast-Forward-Moving-Picture-Experts-Group”.
Are we sure FF doesn't stand for "Free Frame"? I think in the '90s there were a lot of projects with this ff prefix.



FFMPEG exposes a terrible API. Even something as simple as thumbnail extraction is overly complex and has a ton of pitfalls that you won't discover until a user uploads a weirdly encoded video.


FFMPEG's API is perfect for what it aims to provide. Not every program is built for every use case. It's trivial to create helper scripts to provide a simpler API for simple tasks; in fact, this is probably how FFMPEG and ImageMagick are most often used: in image and video editing programs and on servers.


FFMPEG's API is far from perfect. In order to work with a video in a memory buffer you literally have to write part of a virtual filesystem driver. That is not perfect API design. You should just let me pass in a memory buffer and length. I am not really bothered by how verbose it can be, since it is rather low level. What I am bothered by are undocumented pitfalls in demuxing, decoding, and handling color spaces. Ideally there should be an API that takes care of all the pitfalls correctly, or there should be documentation explaining the pitfalls and how to avoid them.


How is it ffmpeg's fault for goofy user-supplied input?


It's ffmpeg's fault that the correct implementation is overly complex. You will think that your code is correct, but actually there is a flag somewhere you have to set, or you have to copy something from one buffer to another. The drastically large input space of different codecs, the settings for each codec, and how corruption is handled can make it hard to test. Instead of there being a simple API, there are all sorts of things you just have to know you should do. The examples given by the ffmpeg project are not enough. It's a case of there being many unknown unknowns.


I was looking into cutting a file and I came across two flags. Could anybody explain them in layman's terms.

-c copy - It doesn't re-encode the file. What does re-encoding mean?

-async 1, which is deprecated in favour of aresample. If the correct syntax is aresample=async=1, would it just cut off the audio timestamps and match them with the video timestamps?


Re-encoding = transcoding, which means decoding one video codec and then encoding it into a different codec, or reading the source and recompressing it into the same codec. Either way, the new video is not the same as the original. `-c:v copy` literally takes the video from the source and places it into the output without doing anything other than making it fit in the output container.
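Side by side (filenames and codecs are illustrative):

    # transcode: decode, then re-encode (possibly into the same codec) -- slow and lossy
    ffmpeg -i input.avi -c:v libx264 -c:a aac output.mp4

    # stream copy: move the compressed streams as-is into the new container -- fast and lossless
    ffmpeg -i input.avi -c copy output.mkv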


> What does re-encoding mean

It's something you may want done or not whenever a decoding step is happening by necessity anyway. The "possible benefit" would be if the original encoding was partially broken somewhere (which players/decoders can handle usually but they complain to stderr) or otherwise sub-par. Not sure about video but audio files floating about out there from the last 25 years are a wild mess, with "mp3"s really being layer 2 or 1 or not even mpeg at all inside etc.


ffmpeg's default mode of operation is to decode the input and re-encode it. Unlike the default of re-encoding, "copy" is extremely fast and doesn't involve loss of quality from the input because it's just, well, copying the streams from the input to the output. The downside is that you can't do lots of things when you're just copying: making any changes to the actual video or audio, or even cutting video accurately at non-keyframes requires the standard decode/re-encode steps.

Typical video-file-related reasons for using the "copy" codec would include just switching between container formats: MP4 vs. Matroska vs. AVI vs. whatever else. As long as the new container supports the types of audio and video you have, just copying them will save both time and quality. Or maybe you want to alter the video but keep the audio untouched, or vice-versa. Or you just want to add a track to something and not disturb the existing ones.

Even doing "copy" for all streams and using the same container can be useful, even though nothing would seem to change in that situation: "remuxing" like this is often all that's necessary to fix a wonky file.


Fun little project I made, trying to automate the creation of those "1 second everyday" style videos and used FFmpeg to achieve this: https://github.com/umaar/video-everyday


ffmpeg is really useful, but isn't doing some of these things on a CPU really inefficient? I know there are some offloads for Nvidia GPUs for some video stuff, but I was under the impression that they used Nvidia's proprietary libraries that have limited codec support.


It's always a tradeoff between speed and quality. Most of the time the CPU offers more quality while hardware encoders are faster; this is mostly because GPU encoders have a different use case than CPU encoders.


The GPU encoders aren't GPGPU programs, but just hardware encoders that happen to be on the GPU. The reason they're worse is because the algorithms they use are worse than x264, which had much more development time and better developers. (who have since retired to become professional cosplayers.)


ffmpeg supports various hardware decoders/encoders: https://trac.ffmpeg.org/wiki/HWAccelIntro

I use the h264_videotoolbox hardware encoder on macOS all the time! But in my experience, CPU-based encoding is always better and offers more fine-grained control. It's just slow. And there are all sorts of filters and transformations that aren't possible in hardware.

FFmpeg's Nvidia hardware support seems pretty good too. There's an official support doc on using ffmpeg: https://docs.nvidia.com/video-technologies/video-codec-sdk/f...


NVENC is definitely limited -- it only supports H.264, H.265, and H.266. :P But it's absolutely the wrong thing to use for quality video encoding, since it introduces awful artifacting. GPGPU programs for H.264 are limited, since CUVID is still seen as "good enough", so I've seen support for this in Blender, and nothing else.


After seeing some posts here on HN about FFMPEG I tried looking into green-screen joining two Twitch streams together.

Sometimes a streamer wants to overlay his stream with a tournament channel, but because of copyright they can't. It'd be nice to have an easy way to set that up; if Twitch won't do it on their end, the user can do it themselves.


People do this using OBS already, don't they? I've definitely seen streamers do "talk over" streams of press conferences or "watch party" streams of tournaments..


Can OBS not do that? I haven't used it much but I have had some success merging video streams from different sources without a lot of effort.


I haven't used OBS, but OBS is more of a "producer" app, not very consumer-oriented, so it might be a pain to set up if you just want to layer two independent streams at once for a one-time thing.

I think that a process of clicking a link (meshstreams.io/streamer1,streamer2) would let you download a script that would install FFMPEG and open both streams in VLC


FFmpeg is the foundation of my Video Hub App :D

https://github.com/whyboris/Video-Hub-App

Extract screenshots from video, join them together in a horizontal filmstrip (letterboxing as needed) for super-fast preview of each video :)
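Not necessarily how the app does it internally, but ffmpeg alone can produce a similar strip (a sketch; the frame interval and tile size are arbitrary):

    # take every 300th frame, letterbox each to 160x90, and tile 10 of them into one image
    ffmpeg -i input.mp4 -vf "select='not(mod(n,300))',scale=160:90:force_original_aspect_ratio=decrease,pad=160:90:(ow-iw)/2:(oh-ih)/2,tile=10x1" -vsync vfr -frames:v 1 strip.png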


ffmpeg is a command designed to keep me grounded. I rely on a recipe to convert FTA TS mpeg streams to an mp4 compression I can tolerate, for my digital pvr stash of free-to-air movies.

Handbrake worked too, sometimes easier for things like 4:3 deletterbox or ad removal or b&w restoration on colourised screening.

I still use mpeg_streamclip for some things. I used to use an Italian PhD student's Java tool to fix TS flags without explicitly re-encoding. ffmpeg can probably do all the things I am doing, if I look harder.

I have no idea what most of the options mean beyond bullshitting, and I rely on the wisdom of others: "try adding this option".

It's digital alchemy. Do not offend Yagoth, incanting for Morgoth. The candles are optional but the goat skull is not: I tried removing the -a: option but something broke.


I use ffmpeg regularly to cut and join videos and it saves me so much time to run my scripts. I'd give up if I had to fire up some GUI and pick and tick one million settings. Admittedly, creating the script is no different, but reuse is the use case, and it pays off so nicely.


This was posted a couple of months ago here.

https://github.com/yuanqing/vdx - An Intuitive CLI for processing video, powered by FFmpeg.


This book looks great.

I'm looking for some easy and beautiful way to generate audio visualizations; the book's TOC doesn't seem to show much.

Can someone please tell me a bit more about what I'd get from that chapter?

Thanks


Perhaps https://processing.org/ might be interesting to you.


Thanks for your information. I have found that [1] may be relevant to my requirements. A bit more work, let's see how far I go with that.

[1] https://processing.org/tutorials/sound/


I'm convinced, but their google pay seems to time out on my phone. Will try again later, but if the rest of the book is like the first chapter it's well worth it.


As someone who always wanted to explore FFMPEG this can be a great book.

I got to know about FFMPEG when I wanted to generate gifs using videos.

I’ll definitely buy this book


This book is dope, all the jargon is explained and there is an example for everything!


Just what I need! Wish you offered a print as well, but this will do


Anyone using ffmpeg on a mac with an apple silicon chip?


I did so a while back to convert a video from MOV to MP4, worked fine.


avidemux is a clean ffmpeg gui for quick transcoding, truncating, and container swapping.


ffmpeg is the most intimidating piece of software ever crafted, and it’s not even close


Something I've been working on lately to help with generating some of the common ffmpeg commands:

https://alfg.github.io/ffmpeg-commander/

Hoping to cover more options soon.


Friend, let me introduce you to the horrorshow that is GNU autotools.


ImageMagick is close for me. I use ffmpeg and magick often, but as others have said, I feel as though I have to relearn each as soon as I have to do something non-basic.


Idk, I’ve used regex before.


> Tested for MacOS X, Ubuntu 18.04, Ubuntu 20.04 and Windows 10 platforms

I'm always surprised at how often macOS is misprinted, especially in technical publications. I suppose Apple hasn't made it easy to keep up (Mac OS, Mac OS X, OS X[1]), but macOS has been the official name since Sierra[2]. Perhaps it shouldn't, but it does make me wonder about the accuracy of other details a publication might offer.

[1] https://en.wikipedia.org/wiki/MacOS

[2] https://en.wikipedia.org/wiki/MacOS_Sierra



