In September 2015 the leading video codec in terms of compression efficiency was HEVC (High Efficiency Video Coding, or H.265). The specification was approved by ITU-T in April 2013, and implementations were already available in both software and hardware form. It seemed that the video industry was ready to transition from H.264/AVC to this new codec, as it did from MPEG-2 many years before.
Unfortunately, motivated by the great success of previous patent licensing programs related to video codecs, two patent pools were already active and a third one was announced. Each one was supposed to act as a one-stop shop for obtaining patent rights to implement the standard. What’s worse, whereas earlier licensing programs offered caps on royalty payments (i.e., a maximum payment per company for the particular technology), this was no longer true for some of the pools. This created serious problems, especially for web-based video delivery companies, which would have to pay per-stream costs.
The response of the internet-based video industry was to take matters into its own hands. Video content companies Amazon, Netflix, and YouTube/Google partnered with software technology developers Cisco, Google, IBM, Microsoft, and Mozilla and chip developers ARM, Intel, and NVIDIA and founded the Alliance for Open Media. The goal of the alliance is to build an open-source, royalty-free video codec that will outperform HEVC. The project launched in September 2015 and incorporated ongoing work on free video codecs being done by individual companies (VP9/VP10 from Google, Daala from Mozilla, and Thor by Cisco). The goal was to produce not just a technical specification but also open-source software that would implement it.
Vidyo was invited to join the group in February 2016, and I was invited to serve as co-chair of the Real-Time Communications Subgroup (RTC SG). The main focus of the parent body, the Codec Working Group (Codec WG), is compression efficiency for video delivery (i.e., HLS or DASH streaming). The RTC SG addresses the specific requirements of real-time applications, including low-delay coding, error resilience, and support for temporal and spatial scalability.
The last 18 months have been a period of intense activity. In August 2017 we received approval for our proposal for the high-level syntax of the codec, based on what are referred to as “open bitstream units” (OBUs). The high-level syntax describes how the different video data components are packaged into “bins” so they can be handled transparently by applications.
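To make the idea of “bins” more concrete, here is a minimal sketch of reading an OBU header in Python. It assumes the header layout of the published AV1 bitstream specification (a one-byte header, an optional extension byte carrying the temporal and spatial layer IDs, and an optional size field in LEB128 encoding). The names `ObuHeader`, `parse_obu_header`, and `read_leb128` are illustrative and not taken from any library.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def read_leb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a little-endian base-128 integer; return (value, next position)."""
    value = 0
    for i in range(8):  # the AV1 spec limits leb128 values to 8 bytes
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:  # high bit clear marks the last byte
            return value, pos + i + 1
    raise ValueError("leb128 value longer than 8 bytes")


@dataclass
class ObuHeader:
    obu_type: int                # 4-bit type code (sequence header, frame, ...)
    temporal_id: int             # 0 when no extension byte is present
    spatial_id: int              # 0 when no extension byte is present
    payload_size: Optional[int]  # None when the OBU carries no size field
    header_size: int             # bytes consumed by header, extension, and size


def parse_obu_header(data: bytes) -> ObuHeader:
    first = data[0]
    if first & 0x80:
        raise ValueError("forbidden bit is set")
    obu_type = (first >> 3) & 0x0F
    has_extension = bool(first & 0x04)
    has_size = bool(first & 0x02)
    pos = 1
    temporal_id = spatial_id = 0
    if has_extension:
        ext = data[pos]
        temporal_id = ext >> 5          # top 3 bits
        spatial_id = (ext >> 3) & 0x03  # next 2 bits
        pos += 1
    payload_size = None
    if has_size:
        payload_size, pos = read_leb128(data, pos)
    return ObuHeader(obu_type, temporal_id, spatial_id, payload_size, pos)


# Hand-built example: type 6, temporal layer 2, spatial layer 1, 300-byte payload.
header = parse_obu_header(bytes([0x36, 0x48, 0xAC, 0x02]))
```

An application such as a media server only needs these few bytes to decide what to do with an OBU; it never has to decode the compressed payload itself.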
In December 2017 we also received approval for our proposal on how to support spatial (and temporal) scalability in AV1, as the codec is now known. The design mimics the work we did earlier with Google on spatial scalability in the VP9 video codec. Spatial scalability support is essential for using AV1 in our high-performance SFU-based VidyoCloud, in exactly the same way it was with H.264/AVC and VP9. This way we can use all the innovations that were designed into our system and enjoy the improved coding efficiency offered by AV1.
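As a rough illustration of why scalability matters to an SFU, the sketch below shows the basic forwarding decision: each receiver gets only the layers it can use, and nothing is re-encoded. It is a deliberately simplified model, not a description of how VidyoCloud works. The `Packet` type and `select_layers` function are hypothetical, and a real SFU must also respect inter-layer dependencies and switch layers only at suitable points in the stream.

```python
from typing import Iterable, List, NamedTuple


class Packet(NamedTuple):
    temporal_id: int  # higher values add frame rate
    spatial_id: int   # higher values add resolution
    payload: bytes


def select_layers(packets: Iterable[Packet],
                  max_temporal: int,
                  max_spatial: int) -> List[Packet]:
    """Keep only packets at or below the layers a receiver has asked for."""
    return [p for p in packets
            if p.temporal_id <= max_temporal and p.spatial_id <= max_spatial]


# Hypothetical stream with 2 spatial layers and 2 temporal layers.
stream = [
    Packet(0, 0, b"base"),
    Packet(0, 1, b"hi-res"),
    Packet(1, 0, b"base, extra frame"),
    Packet(1, 1, b"hi-res, extra frame"),
]

# A phone on a constrained link receives only the base layer.
phone = select_layers(stream, max_temporal=0, max_spatial=0)

# A room system receives everything.
room = select_layers(stream, max_temporal=1, max_spatial=1)
```

Because the selection is a simple filter on two small integers, the server can tailor the stream for every participant at very low cost.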
While it will take some time before commercial implementations that fully utilize AV1’s capabilities are available in the marketplace, we expect that AV1 will become a significant contender in 12 to 18 months. The fact that Apple and Facebook joined the alliance in recent months is a further indication that the wind is in AV1’s sails. Chips supporting AV1 should arrive in late 2019, if not earlier. Part of the challenge in implementing a new codec lies in optimizing the encoder, especially when real-time performance is required.
This is the first time in the history of video codecs that an industry alliance of such breadth stands behind an industry forum to compete with official standards produced by Geneva-based international standards organizations. It is truly the dawn of a new era.
The situation is described very eloquently in a blog post, “A crisis, the causes, and a solution,” by Leonardo Chiariglione, the chairman (formally, “Convenor”) of MPEG (Moving Picture Experts Group), one of the two committees that develop these standards and part of the Geneva-based International Organization for Standardization (ISO). Dr. Chiariglione argues:
“AOM will certainly give much needed stability to the video codec market but this will come at the cost of reduced if not entirely halted technical progress.”
The presumption is that the lack of patent royalties will cause companies and academic/research organizations to stop investing in new video codec technology, and hence progress will slow or even stop.
I can certainly see the argument. Note that several of the innovations that became the foundation of H.264/AVC actually came from academic and research organizations that do not make any products. Any return on their investment can only come from patent royalties.
Countering that disconcerting situation, however, is the fact that, for the first time, all browser manufacturers (Apple, Google, Microsoft, and Mozilla) are now part of the same video codec development organization. While there has been no specific announcement, it is likely that we will finally have support for the same high-end codec across all major browsers, which would be a huge milestone for WebRTC. If implementations are not limited to just streaming decoders, we may finally have the holy grail of interoperability for real-time applications.
Vidyo has a system architecture that offers seamless, native WebRTC video support across its entire VidyoCloud infrastructure. Being able to use a single codec across the entire range of systems, from phones and tablets to desktops and room systems, would offer huge performance and quality benefits to our customers, especially in healthcare and financial services, where browser-based applications are in heavy use. If AV1’s potential for universal interoperability is fulfilled, it will certainly mark the beginning of a new era in the video communication industry.