FFmpeg is truly a multi-tool for media processing. As an industry-standard tool, it supports a wide range of audio and video codecs and container formats. It can also orchestrate complex filter chains for media editing and manipulation. For users of our apps, FFmpeg plays an important role in enabling new video experiences and improving the reliability of existing ones.

Meta runs ffmpeg (the main CLI application) and ffprobe (a utility for retrieving media file properties) tens of billions of times a day, which presents unique challenges when handling media files at scale. FFmpeg can easily handle transcoding and editing of individual files, but our workflows impose additional requirements. For many years we had to rely on our own internally developed fork of FFmpeg to provide features that were only recently added to upstream FFmpeg, such as threaded multi-track encoding and real-time quality metric calculation.

Over time, our internal fork diverged significantly from upstream FFmpeg. At the same time, new versions of FFmpeg brought support for new codecs and file formats as well as reliability improvements, allowing us to ingest more diverse video content from users without interruptions. This required us to support current open source versions of FFmpeg alongside our internal fork, which led not only to a gradually diverging feature set but also to challenges in safely rebasing our internal changes without regressions.

As our internal fork became increasingly outdated, we worked with FFmpeg developers, FFlabs, and VideoLAN to develop features in FFmpeg that allowed us to completely eliminate our internal fork and rely solely on the upstream version for our use cases. Using upstream patches and refactorings, we were able to close two important gaps that we previously needed to address on our internal fork: threaded, multi-track transcoding, and real-time quality metrics.

Building more efficient multi-track transcoding for VOD and live streaming

A video transcoding pipeline that produces multiple outputs at different resolutions.

When a user uploads a video through one of our apps, we generate a set of encodings to support Dynamic Adaptive Streaming over HTTP (DASH) playback. DASH playback allows the app's video player to dynamically select an encoding based on signals such as network conditions. These encodings may differ in resolution, codec, frame rate, and visual quality level, but they are created from the same source encoding, and the player can switch between them seamlessly in real time.

In a very simple system, a separate FFmpeg command line could generate the encoding for each track, one at a time. This could be sped up by running the commands in parallel, but it quickly becomes inefficient because each process repeats the same work, such as decoding the source independently.

To get around this, multiple outputs can be generated within a single FFmpeg command line, with the video's frames decoded once and sent to an encoder instance for each output. This eliminates the duplicated video decoding work and the process startup overhead incurred by each separate command line. Given that we process over 1 billion video uploads daily, each requiring multiple FFmpeg executions, reductions in per-process compute usage add up to significant efficiency gains.
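As a sketch of this pattern, the command below (with illustrative filenames, resolutions, and bitrates, not our production settings) decodes the source once, fans the frames out to three encoder instances with the `split` filter, and writes one rendition per output:

```shell
# Decode once, encode three renditions in a single FFmpeg process.
ffmpeg -i input.mp4 \
  -filter_complex "[0:v]split=3[v1][v2][v3]; \
                   [v1]scale=-2:1080[v1080]; \
                   [v2]scale=-2:720[v720]; \
                   [v3]scale=-2:360[v360]" \
  -map "[v1080]" -c:v libx264 -b:v 5M   out_1080p.mp4 \
  -map "[v720]"  -c:v libx264 -b:v 2.5M out_720p.mp4 \
  -map "[v360]"  -c:v libx264 -b:v 800k out_360p.mp4
```

Compared with three separate invocations, the input is demuxed and decoded only once, and there is a single process to start and schedule.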

Our internal FFmpeg fork provided an additional optimization: parallelized video encoding. While individual video encoders often use multithreading internally, previous versions of FFmpeg ran each encoder serially for a given frame when multiple encoders were used. Running all encoder instances in parallel achieves better parallelism overall.

Thanks to contributions from FFmpeg developers, including those at FFlabs and VideoLAN, more efficient threading was implemented starting with FFmpeg 6.0, with the finishing touches landing in 8.0. The design was directly influenced by our internal fork and covered one of the key features we relied on. This work amounted to one of the most complex refactorings of FFmpeg in decades and has enabled more efficient encoding for all FFmpeg users.

To fully migrate from our internal fork, we needed another pre-implemented feature: real-time quality metrics.

Enabling real-time quality metrics when transcoding live streams


Visual quality metrics, which provide a numerical representation of the perceived visual quality of media, can be used to quantify the quality loss caused by compression. These metrics fall into two categories, reference and no-reference, with reference metrics comparing a reference encoding against a distorted encoding.

FFmpeg can calculate various visual quality metrics, such as PSNR, SSIM, and VMAF, from two existing encodings in a separate command line after encoding is complete. This is fine for offline or VOD use cases, but not for live streaming, where we want to calculate quality metrics in real time.
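The after-the-fact approach looks roughly like this (filenames are illustrative): a second command compares the distorted encoding against the reference with FFmpeg's `psnr` filter, printing the aggregate score once the whole file has been read.

```shell
# Offline quality measurement: a separate pass after encoding is done.
# The main (distorted) stream comes first, the reference second.
ffmpeg -i distorted.mp4 -i reference.mp4 \
  -filter_complex "[0:v][1:v]psnr" \
  -f null -
```

The same pattern works with the `ssim` or `libvmaf` filters; the key limitation is that it requires both finished encodings and an extra decode of each, which rules it out for live streams.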

To do this, we need to insert a video decoder after the video encoder of each output track. These decoders produce the frames as they appear after compression has been applied, so we can compare them against the frames before compression. The result is a quality metric for each encoded track, computed in real time within a single FFmpeg command line.

Thanks to the “in-loop” decoding enabled by FFmpeg developers, including those at FFlabs and VideoLAN, starting with FFmpeg 7.0, we no longer need to rely on our internal FFmpeg fork for this feature.
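A minimal sketch of this in-loop setup uses the loopback decoders added to the ffmpeg CLI in 7.0 (the `-dec` option); filenames and encoder settings here are illustrative. The first output encodes the input, `-dec 0:0` attaches a decoder to that output's first stream, and the filtergraph compares the round-tripped frames against the originals:

```shell
# Encode and measure quality in one process: "-dec 0:0" decodes the
# first output's encoded stream, and psnr compares it to the source.
ffmpeg -i input.mp4 \
  -map 0:v:0 -c:v libx264 -crf 28 -f null - \
  -dec 0:0 \
  -filter_complex "[0:v][dec:0]psnr" \
  -f null -
```

In production the first output would go to a real muxer or streaming endpoint rather than the null muxer, and one loopback decoder would be attached per output track.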

We upstream when it has the greatest impact on the community

Features like real-time quality metrics during transcoding and more efficient threading bring efficiencies to a variety of FFmpeg-based pipelines both within and outside of Meta, and we are committed to landing these developments upstream to help the FFmpeg community and the industry at large. However, there are some patches we developed internally that make no sense to contribute upstream: they are very specific to our infrastructure and do not generalize well.

FFmpeg supports hardware-accelerated decoding, encoding, and filtering with devices such as NVIDIA's NVDEC and NVENC, AMD's Unified Video Decoder (UVD), and Intel's Quick Sync Video (QSV). Each device is supported through an implementation of standard APIs in FFmpeg, allowing for easier integration and minimizing the need for device-specific command line flags. We've added support for the Meta Scalable Video Processor (MSVP), our custom ASIC for video transcoding, via the same APIs, which enables the use of common tools across different hardware platforms with minimal platform-specific peculiarities.
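To illustrate how these standard APIs keep the command line uniform, here is the same transcode expressed for NVIDIA hardware (filenames and bitrate are illustrative); targeting a different device largely means swapping the `-hwaccel` and encoder flags rather than restructuring the command:

```shell
# Hardware-accelerated decode and encode via FFmpeg's standard hwaccel API.
# Frames stay in GPU memory between the decoder and the NVENC encoder.
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
  -c:v h264_nvenc -b:v 5M out.mp4
```

An Intel QSV pipeline, for instance, would substitute `-hwaccel qsv` and `-c:v h264_qsv` while the rest of the command stays the same.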

Since MSVP is only used within Meta's own infrastructure, it would be challenging for FFmpeg developers to support it without access to the hardware for testing and validation. In this case it makes sense to keep such patches internal, as they would be of no use externally. We have taken responsibility for rebasing our internal patches onto newer FFmpeg versions over time, performing extensive validation to ensure robustness and correctness during upgrades.

Our continued commitment to FFmpeg

Thanks to more efficient multi-track encoding and real-time quality metrics, we were able to completely eliminate our internal FFmpeg fork for all VOD and live streaming pipelines. And thanks to standardized hardware APIs in FFmpeg, we were able to support our MSVP ASIC alongside software-based pipelines with minimal friction.

FFmpeg has stood the test of time with over 25 years of active development. Developments that improve resource utilization, add support for new codecs and features, and increase reliability enable robust support for a broader range of media. For people on our platforms, this means enabling new experiences and improving the reliability of existing ones. We plan to continue investing in FFmpeg in collaboration with open source developers, bringing benefits to Meta, the entire industry, and the people who use our products.

Acknowledgments

We would like to acknowledge the contributions of the open source community, our partners at FFlabs and VideoLAN, and many Meta engineers, including Max Bykov, Jordi Cenzano Ferret, Tim Harris, Colleen Henry, Mark Shwartzman, Haixia Shi, Cosmin Stejerean, Hassene Tmar, and Victor Loh.