Diary Of An x264 Developer

05/19/2010 (9:30 am)

The first in-depth technical analysis of VP8

Filed under: google,VP8 ::

Back in my original post about Internet video, I made some initial comments on the hope that VP8 would solve the problems of web video by providing a supposedly patent-free video format with significantly better compression than the current options of Theora and Dirac. Fortunately, I was able to acquire access to the VP8 spec, software, and source a good few days before the official release, and so was able to perform a detailed technical analysis in time for it.

The questions I will try to answer here are:

1. How good is VP8? Is the file format actually better than H.264 in terms of compression, and could a good VP8 encoder beat x264? On2 claimed 50% better than H.264, but On2 has always made absurd claims that they were never able to back up with results, so such a number is almost surely wrong. VP7, for example, was claimed to be 15% better than H.264 while being much faster, but was in reality neither faster nor higher quality.

2. How good is On2’s VP8 implementation? Irrespective of how good the spec is, is the implementation good, or is this going to be just like VP3, where On2 releases an unusably bad implementation with the hope that the community will fix it for them? Let’s hope not; it took 6 years to fix Theora!

3. How likely is VP8 to actually be free of patents? Even if VP8 is worse than H.264, being patent-free is still a useful attribute for obvious reasons. But as noted in my previous post, merely being published by Google doesn’t guarantee that it is. Microsoft did similar a few years ago with the release of VC-1, which was claimed to be patent-free — but within mere months after release, a whole bunch of companies claimed patents on it and soon enough a patent pool was formed.

We’ll start by going through the core features of VP8, primarily analyzing them by comparison to existing video formats. Keep in mind that an encoder and a spec are two different things: it’s possible for a good encoder to be written for a bad spec or vice versa! That’s how a really good MPEG-1 encoder can beat a horrific H.264 encoder.

But first, a comment on the spec itself.

AAAAAAAGGGGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH!

The spec consists largely of C code copy-pasted from the VP8 source code — up to and including TODOs, “optimizations”, and even C-specific hacks, such as workarounds for the undefined behavior of signed right shift on negative numbers. In many places it is simply outright opaque. Copy-pasted C code is not a spec. I may have complained about the H.264 spec being overly verbose, but at least it’s precise. The VP8 spec, by comparison, is imprecise, unclear, and overly short, leaving many portions of the format very vaguely explained. Some parts even explicitly refuse to fully explain a particular feature, pointing to highly-optimized, nigh-impossible-to-understand reference code for an explanation. There’s no way in hell anyone could write a decoder using this spec alone.
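
For readers unfamiliar with the C hazard mentioned above: right-shifting a negative signed integer is not portable in C, so bit-exact codecs either avoid it or work around it. Below is a minimal sketch of the kind of workaround involved (my own illustration, not code from the spec or from libvpx).

#include <stdio.h>

/* Arithmetic right shift of a possibly-negative value, written so the result
 * does not depend on what the compiler does with `x >> n` for negative x
 * (implementation-defined in C, hence not bit-exact across compilers).
 * Illustration only; ignores INT_MIN overflow. */
static int asr(int x, int n)
{
    return x < 0 ? -((-x - 1) >> n) - 1 : x >> n;   /* floor(x / 2^n) */
}

int main(void)
{
    printf("%d %d\n", asr(-7, 1), asr(7, 1));   /* prints "-4 3" */
    return 0;
}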

Now that I’ve gotten that out of my system, let’s get back to VP8 itself. To begin with, and to get a general sense of where all this fits in: basically all modern video formats work via some variation on the following chain of steps:

Encode: Predict -> Transform + Quant -> Entropy Code -> Loopfilter
Decode: Entropy Decode -> Predict -> Dequant + Inverse Transform -> Loopfilter

If you’re looking to just get to the results and skip the gritty technical details, make sure to check out the “overall verdict” section and the “visual results” section.  Or at least skip to the “summary for the lazy”.

Prediction

Prediction is any step which attempts to guess the content of an area of the frame. This could include functions based on already-known pixels in the same frame (e.g. inpainting) or motion compensation from a previous frame. Prediction usually involves side data, such as a signal telling the decoder a motion vector to use for said motion compensation.

Intra Prediction

Intra prediction is used to guess the content of a block without referring to other frames. VP8’s intra prediction is basically ripped off wholesale from H.264: the “subblock” prediction modes are nearly identical to H.264’s i4x4 mode (they even have the same names!), and the whole-block prediction mode is basically identical to i16x16. Chroma prediction modes are practically identical as well. i8x8, from H.264 High Profile, is not present. An additional difference is that the planar prediction mode has been replaced with TM_PRED, a very vaguely similar analogue. The specific prediction modes are internally slightly different from H.264’s, but carry the same names.
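
For the curious, TM_PRED is the one mode here without a direct H.264 counterpart: as far as I can tell, it predicts each pixel as left + above - above_left, clipped to the valid range. A rough sketch of the idea (simplified; not the reference implementation, which works on whole macroblocks and handles edge availability):

#include <stdint.h>

static uint8_t clamp255(int v) { return v < 0 ? 0 : v > 255 ? 255 : v; }

/* Rough sketch of VP8-style TM_PRED ("TrueMotion") intra prediction for a
 * single 4x4 block: each pixel is predicted as left + above - above_left,
 * clipped to 8 bits.  Simplified for illustration. */
static void tm_pred_4x4(uint8_t dst[4][4], const uint8_t above[4],
                        const uint8_t left[4], uint8_t above_left)
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            dst[r][c] = clamp255(left[r] + above[c] - above_left);
}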

Honestly, I’m very disappointed here. While H.264’s intra prediction is good, it has certainly been improved on quite a bit over the past 7 years, and I thought that blatantly ripping it off was the domain of companies like Real (see RV40). I expected at least something slightly more creative out of On2. But more important than any of that: this is a patent time-bomb waiting to happen. H.264’s spatial intra prediction is covered in patents, and I don’t think that On2 will be able to just get away with changing the rounding in the prediction modes. I’d like to see Google’s justification for this — they must have a good explanation for why they think there won’t be any patent issues.

Update: spatial intra prediction apparently dates back to Nokia’s MVC H.26L proposal, from around 2000. It’s possible that Google believes that this is sufficient prior art to invalidate existing patents — which is not at all unreasonable!

Verdict on Intra Prediction: Slightly modified ripoff of H.264. Somewhat worse than H.264 due to omission of i8x8.

Inter Prediction

Inter prediction is used to guess the content of a block by referring to past frames. There are two primary components to inter prediction: reference frames and motion vectors. The reference frame is a past frame from which to grab pixels, and the motion vectors index an offset into that frame. VP8 supports a total of 3 reference frames: the previous frame, the “alt ref” frame, and the “golden frame”. For motion vectors, VP8 supports variable-size partitions much like H.264. For subpixel precision, it supports quarter-pel motion vectors with a 6-tap interpolation filter. In short:

VP8 reference frames: up to 3
H.264 reference frames: up to 16
VP8 partition types: 16×16, 16×8, 8×16, 8×8, 4×4
H.264 partition types: 16×16, 16×8, 8×16, flexible subpartitions (each 8×8 can be 8×8, 8×4, 4×8, or 4×4).
VP8 chroma MV derivation: each 4×4 chroma block uses the average of colocated luma MVs (same as MPEG-4 ASP)
H.264 chroma MV derivation: chroma uses luma MVs directly
VP8 interpolation filter: qpel, 6-tap luma, mixed 4/6-tap chroma
H.264 interpolation filter: qpel, 6-tap luma (staged filter), bilinear chroma
H.264 has but VP8 doesn’t: B-frames, weighted prediction

H.264 has a significantly better and more flexible referencing structure. Sub-8×8 partitions are mostly unnecessary, so VP8’s omission of the H.264-style subpartitions has little consequence. The chroma MV derivation is more accurate in H.264 but slightly slower; in practice the difference is probably near-zero both speed- and compression-wise, since sub-8×8 luma partitions are rarely used (and I would suspect the same carries over to VP8).
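
To make the chroma MV difference concrete, here is a hedged sketch of the two derivations described above; the rounding in the averaging version is my assumption for illustration, not checked against the reference code.

typedef struct { int x, y; } mv_t;   /* quarter-pel units */

/* VP8/MPEG-4 ASP style: each chroma block gets the average of the four
 * colocated luma MVs.  The exact rounding here is an assumption made for
 * illustration, not taken from the spec. */
static mv_t chroma_mv_average(const mv_t luma[4])
{
    mv_t m = { 0, 0 };
    for (int i = 0; i < 4; i++) {
        m.x += luma[i].x;
        m.y += luma[i].y;
    }
    m.x = (m.x + (m.x >= 0 ? 2 : -2)) / 4;   /* round to nearest */
    m.y = (m.y + (m.y >= 0 ? 2 : -2)) / 4;
    return m;
}

/* H.264 style: chroma simply reuses the luma MV directly (chroma being
 * half-resolution, the same vector lands on a finer chroma grid). */
static mv_t chroma_mv_h264(mv_t luma) { return luma; }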

The VP8 interpolation filter is likely slightly better, but will definitely be slower, both encoder-side and decoder-side. A staged filter allows the encoder to precalculate all possible halfpel positions and then quickly calculate qpel positions when necessary; an unstaged filter does not, making subpel motion estimation much slower. Not that unstaged filters are bad — staged filters have basically been abandoned in all of the H.265 proposals — it’s just an inherent disadvantage performance-wise. Additionally, using as many as 6 taps on chroma is, IMO, completely unnecessary and wasteful.
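
To make the “staged” idea concrete: H.264’s halfpel positions come from a separable 6-tap filter with fixed taps (1, -5, 20, 20, -5, 1), so an encoder can compute them once per reference frame and reuse them for every motion search; qpel positions are then cheap averages of fullpel/halfpel samples. A minimal sketch of the horizontal halfpel pass:

#include <stdint.h>

static uint8_t clamp255(int v) { return v < 0 ? 0 : v > 255 ? 255 : v; }

/* H.264's 6-tap luma halfpel filter (taps 1,-5,20,20,-5,1), horizontal pass
 * for one row.  Because the taps are fixed, this can be run once over the
 * whole reference frame ("staging") and the results reused; src needs 2
 * valid pixels of padding on the left and 3 on the right. */
static void halfpel_h(uint8_t *dst, const uint8_t *src, int width)
{
    for (int x = 0; x < width; x++) {
        int v = src[x - 2] - 5 * src[x - 1] + 20 * src[x]
              + 20 * src[x + 1] - 5 * src[x + 2] + src[x + 3];
        dst[x] = clamp255((v + 16) >> 5);
    }
}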

The lack of B-frames in VP8 is a killer.  B-frames can give 10-20% (or more) compression benefit for minimal speed cost; their omission in VP8 probably costs more compression than all other problems noted in this post combined.  This was not unexpected, however; On2 has never used B-frames in any of their video formats.  They also likely present serious patent problems, which probably explains their omission.  Lack of weighted prediction is also going to hurt a bit, especially in fades.

Update: Alt-ref frames can apparently be used to partially replicate the lack of B-frames.  It’s not nearly as good, but it can get at least some of the benefit without actual B-frames.

Verdict on Inter Prediction: Similar partitioning structure to H.264. Much weaker referencing structure. More complex, slightly better interpolation filter. Mostly a wash — except for the lack of B-frames, which is seriously going to hurt compression.

Transform and Quantization

After prediction, the encoder takes the difference between the prediction and the actual source pixels (the residual), transforms it, and quantizes it. The transform step is designed to make the data more amenable to compression by decorrelating it. The quantization step is the actual information-losing step where compression occurs; the output values of the transform are rounded, mostly to zero, leaving only a few integer coefficients.

Transform

For its transform, VP8 again uses a very H.264-reminiscent scheme. Each 16×16 macroblock is divided into 16 4×4 DCT blocks, each of which is transformed by a bit-exact DCT approximation. Then, the DC coefficients of each block are collected into another 4×4 group, which is then Hadamard-transformed. OK, so this isn’t reminiscent of H.264, this is H.264. There are, however, 3 differences between VP8’s scheme and H.264’s.

The first is that the 8×8 transform is omitted entirely (fitting with the omission of the i8x8 intra mode). The second is the specifics of the transform itself. H.264 uses an extremely simplified “DCT” which is so un-DCT-like that it is often referred to as the HCT (H.264 Cosine Transform) instead. This simplified transform results in roughly 1% worse compression, but greatly simplifies the transform itself, which can be implemented entirely with adds, subtracts, and right shifts by 1. VC-1 uses a more accurate version that relies on a few small multiplies (numbers like 17, 22, 10, etc). VP8 uses an extremely, needlessly accurate version that uses very large multiplies (20091 and 35468). In retrospect this is not surprising, as it is very similar to what VP3 used.
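
To show just how simple the H.264 transform is by comparison, here is a sketch of one 1-D pass of its 4-point core (a full 4×4 forward transform applies it to the rows and then the columns, with scaling folded into quantization); this is the standard butterfly, written out by me rather than taken from any particular encoder.

/* One 1-D pass of H.264's 4-point integer "DCT" core, i.e. the matrix
 *   [ 1  1  1  1 ]
 *   [ 2  1 -1 -2 ]
 *   [ 1 -1 -1  1 ]
 *   [ 1 -2  2 -1 ]
 * implemented with nothing but adds, subtracts, and shifts by 1.  Contrast
 * with VP8's transform, which needs multiplies by constants like 20091 and
 * 35468. */
static void hct4(int d[4])
{
    int s0 = d[0] + d[3], s3 = d[0] - d[3];
    int s1 = d[1] + d[2], s2 = d[1] - d[2];
    d[0] = s0 + s1;
    d[1] = (s3 << 1) + s2;
    d[2] = s0 - s1;
    d[3] = s3 - (s2 << 1);
}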

The third difference is that the Hadamard hierarchical transform is applied for some inter blocks, not merely i16x16. In particular, it also runs for p16x16 blocks. While this is definitely a good idea, especially given the small transform size (and the need to decorrelate the DC value between the small transforms), I’m not quite sure I agree with the decision to limit it to p16x16 blocks; it seems that perhaps with a small amount of modification this could also be useful for other motion partitions. Also, note that unlike H.264, the hierarchical transform is luma-only and not applied to chroma.

Overall, the transform scheme in VP8 is definitely weaker than in H.264. The lack of an 8×8 transform is going to have a significant impact on detail retention, especially at high resolutions. The transform is also slower than necessary, though a shift-based transform might be out of the question due to patents. The one good new idea here is applying the hierarchical DC transform to inter blocks.

Verdict on Transform: Similar to H.264. Slower, slightly more accurate 4×4 transform. Improved DC transform for luma (but not on chroma). No 8×8 transform. Overall, worse.

Quantization

For quantization, the core process is basically the same among all MPEG-like video formats, and VP8 is no exception. The primary way video formats tend to differentiate themselves here is by varying quantization scaling factors. There are two ways in which this is primarily done: frame-based offsets that apply to all coefficients or just some portion of them, and macroblock-level offsets. VP8 primarily uses the former; in a scheme much less flexible than H.264’s custom quantization matrices, it allows for adjusting the quantizer of luma DC, luma AC, chroma DC, and so forth, separately. The latter (macroblock-level quantizer choice) can, in theory, be done using its “segmentation map” feature, albeit very hackily and not very efficiently.

The killer mistake that VP8 has made here is not making macroblock-level quantization a core feature of VP8. Algorithms that take advantage of macroblock-level quantization are known as “adaptive quantization” and are absolutely critical to competitive visual quality. My implementation of variance-based adaptive quantization (before, after) in x264 still stands to this day as the single largest visual quality gain in x264 history. Encoder comparisons have shown over and over that encoders without adaptive quantization simply cannot compete.

Thus, while adaptive quantization is possible in VP8, the only way to implement it is to define one segment map for every single quantizer that one wants and to code the segment map index for every macroblock. This is inefficient and cumbersome; even the relatively suboptimal MPEG-style delta quantizer system would be a better option.  Furthermore, only 4 segment maps are allowed, for a maximum of 4 quantizers per frame.
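
As a hedged illustration of how awkward this is, here is roughly what shoehorning a variance-based adaptive-quantization pass into VP8’s segment maps looks like: compute a per-macroblock quantizer target, then collapse all of those targets into the 4 allowed segments. The names and the bucketing heuristic below are mine, not libvpx’s.

#include <math.h>

#define MAX_SEGMENTS 4

/* Hypothetical sketch: map per-macroblock quantizer decisions (as a
 * variance-based AQ algorithm would produce) onto VP8's 4 segments.  Any
 * finer-grained decision has to be merged away, which is exactly the
 * limitation discussed above. */
static void build_segment_map(const double *mb_variance, int num_mbs,
                              int base_q, int seg_q[MAX_SEGMENTS],
                              unsigned char *seg_map)
{
    static const int delta[MAX_SEGMENTS] = { -4, -1, 1, 4 };  /* made up */
    for (int s = 0; s < MAX_SEGMENTS; s++)
        seg_q[s] = base_q + delta[s];          /* one quantizer per segment */

    for (int i = 0; i < num_mbs; i++) {
        /* lower the quantizer for flat blocks, raise it for detailed ones,
         * roughly in the spirit of variance-based AQ */
        double offset = 1.5 * (log2(mb_variance[i] + 1.0) - 6.0);
        int bucket;
        if      (offset < -3.0) bucket = 0;
        else if (offset <  0.0) bucket = 1;
        else if (offset <  3.0) bucket = 2;
        else                    bucket = 3;
        seg_map[i] = (unsigned char)bucket;    /* coded per macroblock */
    }
}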

Verdict on Quantization: Lack of well-integrated adaptive quantization is going to be a killer when the time comes to implement psy optimizations. Overall, much worse.

Entropy Coding

Entropy coding is the process of taking all the information from the other processes (DCT coefficients, prediction modes, motion vectors, and so forth) and compressing it losslessly into the final output file. VP8 uses an arithmetic coder somewhat similar to H.264’s, but with a few critical differences. First, it omits the range/probability table in favor of a multiplication. Second, it is entirely non-adaptive: unlike H.264’s, which adapts after every bit decoded, probability values are constant over the course of the frame. Instead, the encoder may periodically send updated probability values in frame headers for some syntax elements. Keyframes reset the probability values to the defaults.

This approach isn’t surprising; VP5 and VP6 (and probably VP7) also used non-adaptive arithmetic coders. How much of a compression penalty this actually incurs is unknown; it’s not easy to measure given the design of either H.264 or VP8. More importantly, I question the reason for this: making it adaptive would add just a single table lookup to the arithmetic decoding function — hardly a large performance impact.
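
For reference, a VP8-style boolean decoder with a fixed probability looks roughly like the sketch below; the commented-out line is the sort of single table-driven update that would make it adaptive. This is my own simplified rendering of the general shape of the algorithm, not the spec’s code, and it glosses over end-of-buffer handling.

#include <stdint.h>

typedef struct {
    const uint8_t *buf, *end;
    uint32_t value;        /* current window into the bitstream */
    uint32_t range;        /* kept in 128..255 by renormalization */
    int      bit_count;    /* shifts since the last byte was loaded */
} bool_dec;

static void bool_init(bool_dec *d, const uint8_t *buf, const uint8_t *end)
{
    d->buf = buf + 2;
    d->end = end;
    d->value = ((uint32_t)buf[0] << 8) | buf[1];   /* prime with two bytes */
    d->range = 255;
    d->bit_count = 0;
}

/* Decode one bit whose probability of being 0 is prob/256.  Note that *prob
 * is never modified: making the coder adaptive would only require something
 * like the commented line, i.e. one extra table lookup per bit. */
static int decode_bool(bool_dec *d, uint8_t *prob)
{
    uint32_t split = 1 + (((d->range - 1) * *prob) >> 8);
    uint32_t bigsplit = split << 8;
    int bit;

    if (d->value >= bigsplit) { bit = 1; d->range -= split; d->value -= bigsplit; }
    else                      { bit = 0; d->range  = split; }

    /* *prob = adapt_table[*prob][bit];   // hypothetical adaptive update */

    while (d->range < 128) {              /* renormalize */
        d->value <<= 1;
        d->range <<= 1;
        if (++d->bit_count == 8) {        /* room for one more byte */
            d->bit_count = 0;
            if (d->buf < d->end)
                d->value |= *d->buf++;
        }
    }
    return bit;
}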

Of course, the arithmetic coder is not the only part of entropy coding: an arithmetic coder merely turns 0s and 1s into an output bitstream. The process of creating those 0s and 1s and selecting the probabilities for the encoder to use is an equally interesting problem. Since this is a very complicated part of the video format, I’ll just comment on the parts that I found particularly notable.

Motion vector coding consists of two parts: prediction based on neighboring motion vectors and the actual compression of the resulting delta between that and the actual motion vector. The prediction scheme in VP8 is a bit odd — worse, the section of the spec covering this contains no English explanation, just confusingly-written C code. As far as I can tell, it chooses an arithmetic coding context based on the neighboring MVs, then decides which of the predicted motion vectors to use, or whether to code a delta instead.

The downside of this scheme is that, like in VP3/Theora (though not nearly as badly), it biases heavily towards the re-use of previous motion vectors. This is dangerous because, as the Theora devs have recently found (and fixed to some extent in Theora 1.2 aka Ptalabvorm), any situation in which the encoder picks a motion vector which isn’t the “real” motion vector in order to save bits can potentially have negative visual consequences. In terms of raw efficiency, I’m not sure whether VP8’s or H.264’s prediction is better here.

The compression of the resulting delta is similar to H.264, except for the coding of very large deltas, which is slightly better (similar to FFV1’s Golomb-like arithmetic codes).
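
For contrast, the H.264-style prediction that this gets compared against is just a component-wise median of three neighboring MVs, with the delta coded against that. A sketch of that predictor (the various special cases for unavailable neighbors and 16×8/8×16 partitions are omitted):

typedef struct { int x, y; } mv_t;

static int med3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }   /* now a <= b */
    if (b > c) b = c;                         /* b = min(max of pair, c) */
    return a > b ? a : b;                     /* median of the three */
}

/* H.264-style motion vector prediction: component-wise median of the left,
 * top, and top-right neighbors; the encoder then codes mv - pred. */
static mv_t mv_pred_median(mv_t left, mv_t top, mv_t topright)
{
    mv_t pred;
    pred.x = med3(left.x, top.x, topright.x);
    pred.y = med3(left.y, top.y, topright.y);
    return pred;
}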

Intra prediction mode coding is done using arithmetic coding contexts based on the modes of the neighboring blocks. This is probably a good bit better than the hackneyed method that H.264 uses, which always struck me as being poorly designed.

Residual coding is even more difficult to understand than motion vector coding, as the only full reference is a bunch of highly optimized, highly obfuscated C code. Like H.264’s CAVLC, it bases contexts on the number of nonzero coefficients in the top and left blocks relative to the current block. In addition, it also considers the magnitude of those coefficients and, like H.264’s CABAC, updates as coefficients are decoded.

One more thing to note is the data partitioning scheme used by VP8.  This scheme is much like VP3/Theora’s and involves putting each syntax element in its own component of the bitstream.  The unfortunate problem with this is that it’s a nightmare for hardware implementations, greatly increasing memory bandwidth requirements.  I have already received a complaint from a hardware developer about this specific feature with regard to VP8.

Verdict on Entropy Coding: I’m not quite sure here. It’s better in some ways, worse in some ways, and just plain weird in others. My hunch is that it’s probably a very slight win for H.264; non-adaptive arithmetic coding has to have some serious penalties.  It may also be a hardware implementation problem.

Loop Filter

The loop filter is run after decoding or encoding a frame and serves to perform extra processing on a frame, usually to remove blockiness in DCT-based video formats. Unlike postprocessing, this is not done only for visual reasons, but also to improve prediction for future frames. Thus, it has to be done identically in both the encoder and decoder. VP8’s loop filter is vaguely similar to H.264’s, but with a few differences. First, it has two modes (which can be chosen by the encoder): a fast mode and a normal mode. The fast mode is somewhat simpler than H.264’s, while the normal mode is somewhat more complex. Secondly, when filtering between macroblocks, VP8’s filter has a wider range than the in-macroblock filter — H.264 did this, but only for intra edges.

Third, VP8’s filter omits most of the adaptive strength mechanics inherent in H.264’s filter. Its only adaptation is that it skips filtering on p16x16 blocks with no coefficients. This may be responsible for the high blurriness of VP8’s loop filter: it will run over and over and over again on all parts of a macroblock even if they are unchanged between frames (as long as some other part of the macroblock is changed). H.264’s, by comparison, is strength-adaptive based on whether DCT coefficients exist on either side of a given edge and based on the motion vector delta and reference frame delta across said edge. Of course, skipping this strength calculation saves some decoding time as well.
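
To make the missing adaptivity concrete, here is a rough sketch of the kind of per-edge strength decision H.264 makes; the strength values and thresholds are simplified for illustration and are not the exact spec rules.

#include <stdlib.h>

typedef struct { int x, y; } mv_t;

/* Rough sketch of H.264-style per-edge deblocking strength selection, i.e.
 * the adaptivity VP8's filter mostly lacks.  H.264's real rules also
 * distinguish intra macroblock edges and several chroma cases; the values
 * here are illustrative. */
static int edge_strength(int intra_either_side,
                         int coeffs_left, int coeffs_right,
                         mv_t mv_left, mv_t mv_right,
                         int ref_left, int ref_right)
{
    if (intra_either_side)
        return 3;                      /* strongest filtering near intra */
    if (coeffs_left || coeffs_right)
        return 2;                      /* residual present: likely blocking */
    if (ref_left != ref_right ||
        abs(mv_left.x - mv_right.x) >= 4 ||   /* >= 1 pel, in qpel units */
        abs(mv_left.y - mv_right.y) >= 4)
        return 1;                      /* prediction discontinuity */
    return 0;                          /* smooth region: skip filtering */
}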

Update:
05:28 < derf> Gumboot: You’ll be disappointed to know they got the loop filter ordering wrong again.
05:29 < derf> Dark_Shikari: They ordered it such that you have to process each macroblock in full before processing the next one.

Verdict on Loop Filter: Definitely worse compression-wise than H.264’s due to the lack of adaptive strength. Especially with the “fast” mode, might be significantly faster. I worry about it being too blurry.

Overall verdict on the VP8 video format

Overall, VP8 appears to be significantly weaker than H.264 compression-wise. The primary weaknesses mentioned above are the lack of proper adaptive quantization, lack of B-frames, lack of an 8×8 transform, and non-adaptive loop filter. With this in mind, I expect VP8 to be more comparable to VC-1 or H.264 Baseline Profile than to H.264 proper. Of course, this is still significantly better than Theora, and in my tests it beats Dirac quite handily as well.

Supposedly Google is open to improving the bitstream format — but this seems to conflict with the fact that they got so many different companies to announce VP8 support. The more software that supports a file format, the harder it is to change said format, so I’m dubious of any claim that we will be able to spend the next 6-12 months revising VP8. In short, it seems to have been released too early: it would have been better to have an initial period during which revisions could be submitted, followed by a big announcement later once the format was complete.

Update: it seems that Google is not open to changing the spec: it is apparently “final”, complete with all its flaws.

In terms of decoding speed I’m not quite sure; the current implementation appears to be about 16% slower than ffmpeg’s H.264 decoder (and thus probably about 25-35% slower than state-of-the-art decoders like CoreAVC). Of course, this doesn’t necessarily say too much about what a fully optimized implementation will reach, but the current one seems to be reasonably well-optimized and has SIMD assembly code for almost all major DSP functions, so I doubt it will get that much faster.

I would expect, with equally optimized implementations, VP8 and H.264 to be relatively comparable in terms of decoding speed. This, of course, is not really a plus for VP8: H.264 has a great deal of hardware support, while VP8 largely has to rely on software decoders, so being “just as fast” is in many ways not good enough. By comparison, Theora decodes almost 35% faster than H.264 using ffmpeg’s decoder.

Finally, the problem of patents appears to be rearing its ugly head again. VP8 is simply way too similar to H.264: a pithy, if slightly inaccurate, description of VP8 would be “H.264 Baseline Profile with a better entropy coder”.   Even VC-1 differed more from H.264 than VP8 does, and even VC-1 didn’t manage to escape the clutches of software patents.  It’s quite possible that VP8 has no patent issues, but until we get some hard evidence that VP8 is safe, I would be cautious.  Since Google is not indemnifying users of VP8 from patent lawsuits, this is even more of a potential problem.  Most importantly, Google has not released any justifications for why the various parts of VP8 do not violate patents, as Sun did with their OMS standard: such information would certainly cut down on speculation and make it more clear what their position actually is.

But if luck is on Google’s side and VP8 does pass through the patent gauntlet unscathed, it will undoubtedly be a major upgrade as compared to Theora.

Addendum A: On2’s VP8 Encoder and Decoder

This post is primarily aimed at discussing issues relating to the VP8 video format. But from a practical perspective, while software can be rewritten and improved, to someone looking to use VP8 in the near future, the quality (code-wise, compression-wise, and speed-wise) of the official VP8 encoder and decoder is more important than anything I’ve said above. Thus, after reading through most of the code, here are my thoughts on the software.

Initially I was intending to go easy on On2 here; I assumed that this encoder was in fact new for VP8 and thus they wouldn’t necessarily have had time to make the code high-quality and improve its algorithms. However, as I read through the encoder, it became clear that this was not at all true; there were comments describing bugfixes dating as far back as early 2004. That’s right: this software is even older than x264! I’m guessing that the current VP8 software simply evolved from the original VP7 software. Anyways, this means that I’m not going to go easy on On2; they’ve had (at least) 6 years to work on VP8, and a much larger dev team than x264’s to boot.

Before I tear the encoder apart, keep in mind that it isn’t bad. In fact, compression-wise, I don’t think they’re going to be able to get it that much better using standard methods.  I would guess that the encoder, on slowest settings, is within 5-10% of the maximum PSNR that they’ll ever get out of it.  There’s definitely a whole lot more to be had using unusual algorithms like MB-tree, not to mention the complete lack of psy optimizations — but at what it tries to do, it does pretty decently.  This is in contrast to the VP3 encoder, which was a pile of garbage (just ask any Theora dev).

Before I go into specific components, a general note on code quality. The code quality is much better than VP3’s, though there are still tons of typos in the comments. They also appear to be using comments as a form of version control system, which is a bit bizarre. The assembly code is much worse, with staggering levels of copy-paste coding, some completely useless instructions that do nothing at all, unaligned loads/stores to what should be aligned data structures, and a few functions that are simply written in unfathomably roundabout (and slower) ways. While the C code isn’t half bad, the assembly is clearly written by retarded monkeys. But I’m being unfair: this is way better than with VP3.

Motion estimation: Diamond, hex, and exhaustive (full) searches available.  All are pretty naively implemented: hexagon, for example, performs a staggering amount of redundant work (almost half of the locations it searches are repeated!).  Full is even worse in terms of inefficiency, but it’s useless for all but placebo-level speeds, so I’m not really going to complain about that.

Subpixel motion estimation: Straightforward iterative diamond and square searches.  Nothing particularly interesting here.

Quantization: Primary quantization has two modes: a fast mode and a slightly slower mode.  The former is just straightforward deadzone quant, while the latter has a bias based on zero-run length (not quite sure how much this helps, but I like the idea).  After this they have “coefficient optimization” with two modes.  One mode simply tries moving each nonzero coefficient towards zero; the slow mode tries all 2^16 possible DCT coefficient rounding permutations.  Whoever wrote this needs to learn what trellis quantization (the dynamic programming solution to the problem) is and stop using exponential-time algorithms in encoders.
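
For the curious, here is a toy of the dynamic-programming alternative: instead of enumerating all 2^16 rounding permutations, each coefficient’s decision only has to be evaluated against a small amount of carried-over state. The cost model below (squared error plus a made-up rate that depends on the level and on whether the previous coefficient ended up zero) is purely illustrative; real trellis quantization, as in x264, uses the codec’s actual entropy-coding costs, but the structure is the same: O(N * states) instead of O(2^N).

#include <float.h>
#include <math.h>
#include <stdlib.h>

#define NCOEF 16

/* made-up rate model: cost in bits of a level, given whether the previous
 * coefficient was zero (a crude stand-in for run-length effects) */
static double rate_bits(int level, int prev_zero)
{
    if (level == 0)
        return prev_zero ? 0.5 : 1.0;          /* zeros in runs are cheap */
    return 2.0 + log2(1.0 + abs(level));       /* larger levels cost more */
}

/* Each coefficient may be coded at its ordinary quantized level qlevel[i] or
 * one step toward zero; pick the combination minimizing D + lambda*R. */
static void trellis_quant(const int coef[NCOEF], const int qlevel[NCOEF],
                          int qstep, double lambda, int out[NCOEF])
{
    double best[NCOEF + 1][2];                 /* [i][prev_zero] = min cost */
    int    lev [NCOEF + 1][2];                 /* level chosen at step i-1  */
    int    from[NCOEF + 1][2];                 /* predecessor state         */

    best[0][0] = DBL_MAX;                      /* start in "prev is zero"   */
    best[0][1] = 0.0;

    for (int i = 0; i < NCOEF; i++) {
        best[i + 1][0] = best[i + 1][1] = DBL_MAX;
        int toward_zero = qlevel[i] - (qlevel[i] > 0) + (qlevel[i] < 0);
        int cand[2] = { qlevel[i], toward_zero };
        for (int s = 0; s < 2; s++) {
            if (best[i][s] == DBL_MAX)
                continue;
            for (int c = 0; c < 2; c++) {
                int    level = cand[c];
                double err   = (double)coef[i] - (double)level * qstep;
                double cost  = best[i][s] + err * err
                             + lambda * rate_bits(level, s);
                int ns = (level == 0);
                if (cost < best[i + 1][ns]) {
                    best[i + 1][ns] = cost;
                    lev [i + 1][ns] = level;
                    from[i + 1][ns] = s;
                }
            }
        }
    }

    /* Walk back from the cheaper of the two final states. */
    int s = best[NCOEF][0] <= best[NCOEF][1] ? 0 : 1;
    for (int i = NCOEF; i > 0; i--) {
        out[i - 1] = lev[i][s];
        s = from[i][s];
    }
}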

Ratecontrol (frame type handling): Relies on “boosting” the quality of golden frames and “alt-ref” frames — a concept I find extraordinarily dubious because it means that the video will periodically “jump” to a higher quality level, which looks utterly terrible in practice. You can see the effect in this graph of PSNR; every dozen frames or so, the quality “jumps”.  This cannot possibly look good in motion.

Ratecontrol (overall): Relies on a purely reactive ratecontrol algorithm, which probably will not do very well in difficult situations such as hard-CBR and tight buffer constraints. Furthermore, it does no adaptation of the quantizer within the frame (e.g. in the case that the frame overshot the size limitations ratecontrol put on it). Instead, it relies on re-encoding the frame repeatedly to reach the target size — which in practice is simply not a usable option, for two reasons. In low-latency situations where one can’t have a large delay, repeated re-encoding may put the encoder way behind time-wise. In any other situation, one can afford to use frame-based threading, a much faster algorithm for multithreaded encoding than the typical slice-based threading — which makes re-encoding impossible.
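
The control flow being criticized is roughly the toy model below: on overshoot, throw the whole frame away and encode it again at a higher quantizer. Everything here (including the fake size model) is a made-up illustration, not libvpx’s actual API; contrast it with adapting the quantizer row-by-row within a single pass over the frame.

#include <stdio.h>

static int encode_size(int complexity, int q)   /* stand-in for a real encode */
{
    return complexity / q;                      /* size shrinks as q rises */
}

/* Purely reactive, re-encode-until-it-fits ratecontrol: every overshoot
 * costs another full encode of the frame. */
static int encode_with_reencoding(int complexity, int max_size, int *q)
{
    int attempts = 1;
    while (encode_size(complexity, *q) > max_size && attempts < 8) {
        *q += 4;            /* raise the quantizer and redo the whole frame */
        attempts++;
    }
    return attempts;        /* number of full encodes spent on this frame */
}

int main(void)
{
    int q = 30;
    int attempts = encode_with_reencoding(2000000, 40000, &q);
    printf("frame took %d full encodes, final q = %d\n", attempts, q);
    return 0;
}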

Loop filter: The encoder attempts to optimize the loop filter parameters for maximum PSNR.  I’m not quite sure how good an idea this is; every example I’ve seen of this with H.264 ends up creating very bad (often blurry) visual results.

Overall performance: Even on the absolute fastest settings with multithreading, their encoder is slow. On my 1.6GHz Core i7 it gets barely 26fps encoding 1080p; not even enough to reliably do real-time compression. x264, by comparison, gets 101fps at its fastest preset, “ultrafast”. Now, sure, I don’t expect On2’s encoder to be anywhere near as fast as x264, but being unable to stream HD video on a modern quad-core system is simply not reasonable in 2010. Additionally, the speed options are extraordinarily confusing and counterintuitive and don’t always seem to work properly; for example, fast encoding mode (--rt) seems to be ignored completely in 2-pass mode.

Overall compression: As said before, compression-wise the encoder does a pretty good job with the spec that it’s given.  The slower algorithms in the encoder are clearly horrifically unoptimized (see the comments on motion search and quantization in particular), but they still work.

Decoder: Seems to be straightforward enough.  Nothing jumped out at me as particularly bad, slow, or otherwise, besides the code quality issues mentioned above.

Practical problems: The encoder and decoder share a staggering amount of code. This means that any bug in the common code will affect both, and thus won’t be spotted, because it will affect them both in a matching fashion. This is the inherent problem with any file format that doesn’t have independent implementations and is defined by a piece of software instead of a spec: there are always bugs. RV40 had a hilarious example of this, where a typo of “22” instead of “33” resulted in quarter-pixel motion compensation being broken. Accordingly, I am very dubious of any file format defined by software instead of a specification. Google should wait until independent implementations have been created before setting the spec in stone.

Update: it seems that what I foresaw is already coming true:

<derf> gmaxwell: It survives it with a patch that causes artifacts because their encoder doesn’t clamp MVs properly.
<gmaxwell> ::cries::
<derf> So they reverted my decoder patch, instead of fixing the encoder.
<gmaxwell> “but we have many files encoded with this!”
<gmaxwell> so great.. single implementation and it depends on its own bugs. :(

This is just like Internet Explorer 6 all over again — bugs in the software become part of the “spec”!

Hard PSNR numbers:
(Source/target bitrate are the same as in my upcoming comparison.)
x264, slowest mode, High Profile: 29.76103 dB (~28% better than VP8)
VP8, slowest mode: 28.37708 dB (~8.5% better than x264 baseline)
x264, slowest mode, Baseline Profile: 27.95594 dB

Note that these numbers are a “best-case” situation: we’re testing all three optimized for PSNR, which is what the current VP8 encoder specializes in as well. This is not too different from my expectations above as estimated from the spec itself; it’s relatively close to x264’s Baseline Profile.

Keep in mind that this is not representative of what you can get out of VP8 now, but rather what could be gotten out of VP8.  PSNR is meaningless for real-world encoding — what matters is visual quality — so hopefully if problems like the adaptive quantization issue mentioned previously can be overcome, the VP8 encoder could be improved to have x264-level psy optimizations.  However, as things stand…

Visual results: Unfortunately, since the current VP8 encoder optimizes entirely for PSNR, the visual results are less than impressive. Here’s a sampling of how it compares with some other encoders. Source and bitrate are the same as above; all encoders are configured for the best possible visual quality wherever possible. And apparently, given some of the responses to this part, many people cannot actually read: the bitrate is (as close as possible to) the same on all of these files.

Update: I got completely slashdotted and my few hundred gigs of bandwidth ran out in mere hours.  The images below have been rehosted, so if you’ve pasted the link somewhere else, check below for the new one.

VP8 (On2 VP8 rc8) (source) (Note: I recently realized that the official encoder doesn’t output MKV, so despite the name, this file is actually a VP8 bitstream wrapped in IVF, as generated by ivfenc.  Decode it with ivfdec.)
H.264 (Recent x264) (source)
H.264 Baseline Profile (Recent x264) (source)
Theora (Recent ptalabvorm nightly) (source)
Dirac (Schroedinger 1.0.9) (source)
VC-1 (Microsoft VC-1 SDK) (source)
MPEG-4 ASP (Xvid 1.2.2) (source)

The quality generated by On2’s VP8 encoder will probably not improve significantly without serious psy optimizations.

One further note about the encoder: currently it will drop frames by default, which is incredibly aggravating and may cause serious problems. I strongly suggest that anyone using it turn the frame-dropping feature off in the options.

Addendum B: Google’s choice of container and audio format for HTML5

Google has chosen Matroska for their container format. This isn’t particularly surprising: Matroska is one of the most widely used “modern” container formats and is in many ways best-suited to the task. MP4 (aka ISOmedia) is probably a better-designed format, but is not very flexible; while in theory one can stick anything in a private stream, a standardization process is technically necessary to “officially” support any new video or audio formats. Patents are probably a non-issue; the MP4 patent pool was recently disbanded, largely because nobody used any of the features that were patented.

Another advantage of Matroska is that it can be used for streaming video: while it isn’t typically used that way, the spec allows it. Note that I do not mean progressive download (à la YouTube), but rather actual streaming, where the encoder is working in real-time. The only way to do this with MP4 is by sending “segments” of video, a very hacky approach in which one is effectively sending a bunch of small MP4 files in sequence. This approach is used by Microsoft’s Silverlight “Smooth Streaming”. Not only is this an ugly hack, but it’s unsuitable for low-latency video. This kind of hack is unnecessary for Matroska. One possible problem is that since almost nobody currently uses Matroska for live streaming, very few existing Matroska implementations support what is necessary to play streamed Matroska files.

I’m not quite sure why Google chose to rebrand Matroska; “WebM” is a silly name and Matroska is already pretty well-recognized as a brand.

The choice of Vorbis for audio is practically a no-brainer.  Even ignoring the issue of patents, libvorbis is still the best general-purpose open source audio encoder.  While AAC is generally better at very low bitrates, there aren’t any good open source AAC encoders: faac is worse than LAME and ffmpeg’s AAC encoder is even worse.  Furthermore, faac is not free software; it contains code from the non-free reference encoder.  Combined with the patent issue, nobody expected Google to pick anything else.

Addendum C: Summary for the lazy

VP8, as a spec, should be a bit better than H.264 Baseline Profile and VC-1.  It’s not even close to competitive with H.264 Main or High Profile.  If Google is willing to revise the spec, this can probably be improved.

VP8, as an encoder, is somewhere between Xvid and Microsoft’s VC-1 in terms of visual quality.  This can definitely be improved a lot.

VP8, as a decoder, decodes even slower than ffmpeg’s H.264.  This probably can’t be improved that much; VP8 as a whole is similar in complexity to H.264.

With regard to patents, VP8 copies too much from H.264 for comfort, no matter whose word is behind the claim of being patent-free.  This doesn’t mean that it’s sure to be covered by patents, but until Google can give us evidence as to why it isn’t, I would be cautious.

VP8 is definitely better compression-wise than Theora and Dirac, so if its claim to being patent-free does stand up, it’s a big upgrade with regard to patent-free video formats.

VP8 is not ready for prime-time; the spec is a pile of copy-pasted C code and the encoder’s interface is lacking in features and buggy.  They aren’t even ready to finalize the bitstream format, let alone switch the world over to VP8.

With the lack of a real spec, the VP8 software basically is the spec — and with the spec being “final”, any bugs are now set in stone. Such bugs have already been found and Google has rejected fixes.

Google made the right decision to pick Matroska and Vorbis for its HTML5 video proposal.


224 Responses to “The first in-depth technical analysis of VP8”

  1. Bruce Says:

    DS, thanks for the exhaustive analysis. Will take some time to go through in detail, but your conclusion seems to confirm what many predicted in recent months (quality low to medium, format good, patent issues for sure).

  2. Multimedia Mike Says:

    Thanks so much for getting on top of this. I have a good feeling that this is the ONLY intelligent information we are going to see about VP8 for the time being.

  3. Jonathan Norris Says:

    Great summary! It will be interesting to see if Google can avoid the legal onslaught.

  4. Daniel Says:

    Thanks for the overview of the format and comparison pics. It’s nice to hear some useful information that isn’t marketing fluff.

    Overall I think this is indeed a step in the right direction. I think the world will be watching over the coming months to see if the patent trolls come out to play.

  5. Gerbrand Oudenaarden Says:

    Thanks so much for this exhaustive summary. I am very excited that there is now a more serious open source codec than Theora, plus it will be supported by a lot of browsers. But what a shame that we can still expect many patent problems, that the implementation and specification are not up to standards, and that the actual video quality will be less than what we have today with h264/x264.
    The open source video codec drama continues …

  6. AlekseiV Says:

    Looking forward to a potential fix of the VP8 spec and hopefully bitstream, the superior open-source encoder that will be created, and the ensuing patent war.

  7. skal Says:

    nice job, Jason

  8. cb Says:

    “The downside of this scheme is that, like in VP3/Theora (though not nearly as badly), it biases heavily towards the re-use of previous motion vectors. This is dangerous because, as the Theora devs have recently found (and fixed to some extent in Theora 1.2 aka Ptalabvorm), any situation in which the encoder picks a motion vector which isn’t the “real” motion vector in order to save bits can potentially have negative visual consequences.”

    But presumably the exact same thing is in H264, which also does a delta from predicted movec.

    Encoder perceptual optimization would bias movec choice to be the real movec, but this is something that is format-independent.

  9. Geoff Says:

    Daniel writes: “I think the world will be watching over the coming months to see if the patent trolls come out to play.”

    I think of “patent trolls” as companies which hold onto patents quietly and then seek to enforce them when covered technologies become popular. I don’t think of the MPEG-LA patent holders as “patent trolls”

  10. mpz Says:

    Excellent analysis.

    As it stands now (judging from the given comparison picture which I’m sure you’ve picked to favor x264 ;-) , I’d rank VP8 in there with Xvid. Maybe the format has potential for more, but right now it’s no better than what we had with Xvid more than half a decade ago.

  11. Ken Jackson Says:

    Nice post! If there were a list of 10 best tech blog posts of the year, this would be on there. Fun read!

  12. Ben Says:

    Yes, thanks for the analysis. I am sure a lot of people will appreciate it!

  13. Gabriel Says:

    Although the spec is *final*, the encoder/decoders are just preview releases. It is silly and naive to make comparisons on an alpha preview. From the official website,
    “Note: The initial developer preview releases of browsers supporting WebM are not yet fully optimized and therefore have a higher computational footprint for screen rendering than we expect for the general releases. The computational efficiencies of WebM are more accurately measured today using the development tools in the VP8 SDKs. Optimizations of the browser implementations are forthcoming.”

  14. Weixi Yen Says:

    The legal issues you mention certainly puts a damper on all the hype.

  15. kl Says:

    Actually ripping of H.264 may be a way to avoid submarine patents.

    By having design so similar, Google only has to worry about MPEG-LA patents.

    So there is a known, finite list of patents to review.

    Given bizarre tweaks and omissions in VP8, I suspect that’s exactly what they did – looked at claims on H.264 and tweaked VP8 just a little to avoid crucial points (remember in patents you have to infringe all points in a claim, if you infringe 2 out of 3, then you’re safe).

  16. Casper Says:

    I hope video-codec knowledgeable people like you join webmproject.org )

  17. martin Says:

    The reason why they forked Matroska and called it WebM instead is because they want a subset that is guaranteed to work out of the box in all browsers and tools. With Matroska MKV files almost nobody implements every single feature of the container.

    WebM will be nice in that either you support it or you don’t, it will be like PNG.

  18. Jacob Says:

    Great comparison Jason. Well supported by reason and examples. On a first glance I thought this article would be a typical H264 developers’ opinion about the google adopted VP8 codec. But you do have interesting points. BTW I was too lazy to skip Addendum C.

  19. Mike Says:

    “I’m not quite sure why Google chose to rebrand Matroska; “WebM” is a stupid name.”

    Matroska is a stupid name.

  20. bliblibli Says:

    Very nice analysis. In the end only thing that matters is that there’s good free baseline codec that works everywhere without tricky plugins, can be used by everyone and is used by Youtube. At the moment vp8 seems like that codec. I’m all for it and I’m happy that Google did the right thing.

  21. Bill McGonigle Says:

    Perhaps Google thinks in Re: Bilski will go the right way, and it can buy its way through any remaining problems.

    I’m ready to buy my son an $85 ARM-based mini-laptop. There’s effectively no room in that kind of price for MPEG-LA licenses – Google probably thinks it’s wiser to be able to serve the 3/4 of the world that isn’t going to play the software-patent game.

  22. David Says:

    Great analysis..!

  23. Michael G.R. Says:

    Thank you so much. You’ve deflated the VP8 bubble for me, but it had to be done (the truth is better than dreams).

  24. jstsch Says:

    Brilliant analysis, thank you! Disappointing to see what VP8 actually holds. The quality difference between x264, the lack of hardware decoding and potential for patent problems is a bit too much for me to start rooting for WebM.

  25. James Smith Says:

    There is 1 (one) “TODO” in the entire specification – I assume you mean [1]? Hardly the plural you make it out to be. I stopped reading after I verified this over-exaggeration.

    [1] http://www.webmproject.org/media/pdf/vp8_bitstream.pdf

  26. Z. Says:

    Very nice jobn thanx for it ;)

    As post #2, I suspect it may be a kind of more “intelligent analysis”, especially comparing to what’s gonna be spread all over forms in a few weeks…
    again, nice job, thank you.

  27. Louise Says:

    How much larger would the file size roughly be, if you wanted equal quality with vp8 as in x264?

  28. Leo Says:

    I also wanted to thank you for your detailed analysis – very informative although google-tards :) are probably going to jump on you for this.

  29. Joe Says:

    “I’m ready to buy my son an $85 ARM-based mini-laptop. There’s effectively no room in that kind of price for MPEG-LA licenses”

    Except that even if only a 100,000 people buy such a laptop the price of the MPEG-LA license per customer is a few bucks at most.

  30. jsmith45 Says:

    Bill McGonigle, there is no room in a $85 netbook for $0.20 worth of license fee for AVC (H264)?

    That is the license fee per machine under the H264 OEM license if more than 100,000 and less than 5,000,000 are made.

    It drops to $0.10 if more than 5,000,000 are made.

    If less than 100,000 are made there are no license fees.

  31. Tom Says:

    Wow, thanks for your in-depth research, I’m sure it must have taken hours and hours of work sifting through the specs… Amazingly good job !

  32. Dark Shikari Says:

    @Tom

    I basically spent all of Sunday and much of Monday reading the spec and the code. It wasn’t too long though.

  33. foljs Says:

    I’m ready to buy my son an $85 ARM-based mini-laptop.

    Hey, there, big spender…

    There’s effectively no room in that kind of price for MPEG-LA licenses

    Yes, I can see how $2 per device, guaranteed to not be raised by more than 10% in the next 5 years, can be a problem…

    Google probably thinks it’s wiser to be able to serve the 3/4 of the world that isn’t going to play the software-patent game.

    Google holds a gazillion of patents…

    Try stepping on its search patents and see what happens…

  34. Jaco Vosloo Says:

    #31, You have just improved my opinion of patented codecs. I always thought they charged a few $ per device.

  35. Markus Says:

    #35 $ per device is a one-time cost. It’s more the possible $ per view that would raise my concern

  36. The Bloodhound Gang Says:

    Many thanks to you from France. I didn’t read the whole technical part since I’m more interested in the legal side of this move by Google, but a technical view was required to know what we were facing.

    Anyways, the introduction, overall conclusion and parts of the technical development were handy.

    Too bad that there are so many similarities, I really like open source and free software, but this looks sketchy at the moment for two reasons: (too) early release and similarities. And I totally share your view about the fact that lawsuits will rain if VP8 is chosen, I think Apple has patents on H.264, they are not gonna let Google get away with this…

    And I know this thanks to you!

  37. Paul Says:

    Morale is probably at an all-time low at MPEG-LA after hearing this news. It doesn’t matter if VP8 is slightly inferior at this point in time as Google have been smart to utilise the many os devs which are out there and are willing to embrace this standard.

    It’s like when you’re a .NET dev and your competition develops the same product using Java or whatever… and can sell at a lower price due to being free from licences etc.

    It’s now no dream and the truth is, WebM is going to play a big part in web video.

    Reading your article, you have brought up some good points, but the way it is written sounds like you feel threatened by VP8 (and slightly peed-off too).

    Note that most of the points raised are irrelevant to end users anyway.

    Peace.

  38. Hamranhansenhansen Says:

    I don’t think any of these issues matter to many potential users of WebM whose main desire is to be able to encode video that only other computer nerds can see.

    And this certainly doesn’t matter to anyone who is actually publishing content because H.264 already plays everywhere, including Firefox via 3 or 4 different methods. We’re not going to see video people taking an interest in nonstandard encoders to help the Web author write “purer” HTML.

  39. Richard Lynch Says:

    I didn’t understand a lot of this, but two things stick out for me.

    I would presume Google engineers/lawyers have already run the numbers on the patent issues, and will not have much drama. They’d have to be idiots not to, given recent past.

    I realize you care a whole lot about video quality. But we’re talking YouTube here. I don’t think the quality and/or speed is going to matter much so long as it’s in the right ballpark for the unwashed masses to watch their YouTube.

  40. Jan Rychter Says:

    Excellent analysis, thanks!

    However, I am puzzled by this: “The lack of B-frames in VP8 is a killer. B-frames can give 10-20% (or more) compression benefit for minimal speed cost”

    When you say “speed cost” I think you mean “speed cost on a PC computer” — B frames introduce significant complexity, memory overhead and memory bandwidth requirements which might be unacceptable on a mobile device.

    I could understand why they wouldn’t want B-frames in a universal codec for all devices, but I am also puzzled as to why they didn’t want them in VP8 — On2’s codecs were always targeted towards offline-encoded content for PCs.

  41. Louis Gerbarg Says:

    MPEG-2 is a couple of dollars, which is a big deal because it is a substantial portion of the cost of a DVD player (hardware or software).

    H.264 is at most $0.20, and drops on a per-unit basis the more units you ship. It is also free if the licensee ships less than 100,000 units. That is still a problem for vendors who ship millions of units for free (Opera and Mozilla come to mind), since they have no guaranteed revenue from each unit, but it should be a non-issue for a hardware product that costs $85; it is less than the cost of a couple of good capacitors.

  42. Relgoshan Says:

    Oh dear. That patent matter may indeed rear its head. However;

    The Ptalabvorm shot looks exceptionally terrible, perhaps about as badly as it could be made to perform. The VP8 shot looks rather similar to ptalabvorm in terms of quality, a bit more crisp but nothing to write home about.

    I will wait for a few more comparisons, such as the efforts of the Theora dev team to bleat that their years of effort can easily add in VP8’s actual improvements and beat everyone. :)

    I think what we’ll get from this is a Google bribe to MPEG-LA and better licensing terms on use of the H.264 codec. Would be nice if H.264 and Theora could be just as easily used inside WebM, and if WebM improves the state of subtitles for web video.

  43. J Moreno Says:

    Was just browsing the On2 website, and saw this interesting page http://www.on2.com/index.php?599 where there is video comparison of H.264 and VP8 – using build r915 of the x264 encoder, set to HQ 2 pass – have to say VP8 looks waaay better…

  44. deviker Says:

    Excelent article :)

  45. Dark Shikari Says:

    @J Moreno

    It’s a cherry-picked shot using intentionally terrible x264 settings on a 2-year-old version of the encoder. What do you expect?

  46. Relgoshan Says:

    #38: It sounds like you actually think people don’t use .NET, and are unable to sell products made with .NET; The devs at MPEG-LA are thinking about H.265, the lawyers at MPEG-LA are scrambling to gather documents for an investigation, the devs at x264 are relieved and fascinated by what VP8 actually DOES. If anyone is troubled, the devs for Theora must be thinking about their role in the future of web video.

  47. J Moreno Says:

    @Dark Shikari

    This was the input I’ve been expecting, having in mind I’m just dealing with codecs from end-user perspective – and I believe you. It is sad, then, to see it marketed as “better” solution.

  48. dave Says:

    @foljs I saw a blog by Ed Bott recently that made the same claim about H.264 licenses being limited to 10% increases every 5 years. Can you explain why the yearly max fee has risen 10% per year for the last 4 years?

    Can you offer any good reason, apart from widespread adoption of VP8 that would encourage them not to increase it by 10% for each of the next 15 years?

    It’s also worth looking at how much the licences cost for DVD players over time. It went from about 5 percent to nearly half the production cost.

  49. Alex Says:

    Jason, thanks for taking the time to dissect it in such detail!

  50. Relgoshan Says:

    #44: If you lean back and put your glasses on, you will note several things between those two images:

    1) There is more background detail in the x264 frame, though the blockiness causes it to look bad IF you PAUSE the video and LOOK CLOSELY. Three details only visible in the x264 shot:
    -a) The man has a patch of hair on his chin.
    -b) The woman behind him is wearing normal sunglasses with a narrow bridge, NOT wraparound shades with a continuous thick band.
    -c) The red object behind him appears to be a purse or handbag, you can just barely see the continuous arc of a stiff carry-strap sticking up from it.

    2) The two stills come from different frames, 1-2 frames apart in the same video.

    3) The high blurriness and sudden up/down leaps of quality in VP8 look similar to Real’s offensive artifacting from 10-year-old codecs.

    4) The comparison is not made with two original files coded using VP8 and x264; it is made with both videos RECODED AGAIN using the primitive VP6 codec. Certain formats will behave strangely when recoded, because their original methods of compression are not compatible.

    5) None of this changes the fact that VP8 has not been adopted by anyone in the two years since On2 said it was imminent for the market.

  51. viktor Says:

    wonderful licensing terms:

    If You or your agent or exclusive licensee institute or order or agree to the institution of patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that any implementation of this specification constitutes direct or contributory patent infringement, or inducement of patent infringement, then any rights granted to You under the License for this specification shall terminate as of the date such litigation is filed.

    that’s some really strong legal backing. thank you google! …

  52. Joe Says:

    “It’s also worth looking at how much the licences cost for DVD players over time. It went from about 5 percent to nearly half the production cost.”

    Yes, but if the production cost is 10 times less, then you’re still paying peanuts in licensing fees even if they have become 50% of the production costs.

  53. Relgoshan Says:

    #49:

    1) MPEG-LA made several announcements about the future of licensing earlier this year.

    2) Cost improvements and market saturation have driven the retail price of a DVD player from >$1000USD to ~$25USD at the lowest. Even a decent DVD-only machine should run less than $100, and BluRay players have fallen in price faster than DVD players before them. When the cost of the hardware falls radically, the fixed cost of the codec license appears to be proportionally higher. Thanks for playing.

  54. Terry Says:

    #49. My first DVD player cost $400. The latest cost $40. Assuming production costs are 40% of retail, which is grossly exaggerated, the production costs were $160 and $16 respectively.

    5% of $160 is $8. 50% of $16 is $8. Same license cost each time. Odd.

    It seems more likely that DVD player production costs have plummeted over time than license costs have skyrocketed.

  55. john Says:

    “How much larger would the file size roughly be, if you wanted equal quality with vp8 as in x264?”

    I’d like to hear thoughts on this question.

    Current vp8 vs x264?

    Theoretical future vp8 vs x264?

  56. Dark Shikari Says:

    @john

    Hard to say without testing it, and it takes a long time to encode with VP8 on its best encoding mode. If I had to make a guess, I’d say 15-25% for a theoretical future as-good-as-x264 VP8 and 40-60% for current VP8. It depends heavily on the content though — in a case where x264’s psy opts help more, VP8 will need more extra bits to compensate.

  57. Relgoshan Says:

    52: That’s actually inviting trouble, MPEG-LA wants the right to use VP8 for itself why…? Google says that trying to check for patent fraud or compare the code against other code, cancels your very right to possess or examine the code in any way! That never holds up!

  58. Xsi Says:

    I kinda expected this bashing from you, DS. You know, VP8 now is better than x264 when it was first released.

  59. Relgoshan Says:

    Actually (Dark Shikari), what does all of On2’s fancy wording about processor-adaptive realtime encodes actually boil down to?

    http://www.on2.com/index.php?606 Seems to entirely disagree with your findings. Or is it just a dressed-up means of saying that VP8 drops frames for various reasons?

  60. Datruth Says:

    Thanks for the article, but I’m not convinced whatsoever that Google would:

    a: invest the money to purchase the product

    and then

    b: open-source an inferior product in the hopes it will unseat H.264 in HTML5 implementations. And what sense would that make? Zero. So, I’m not buying that this is the be all and end all of VP8.

  61. Jan Says:

    A number of your concerns are addressed on the On2 website. B-frames, for example, are patented and therefore VP7 and VP8 use other techniques to achieve the same effect. Or at least that’s what they claim.

    See:
    - http://www.on2.com/index.php?599
    - http://www.on2.com/file.php?229 (PDF)

    It would be nice if you could comment on their material.

  62. Dark Shikari Says:

    @Jan

    Regardless of what they claim, the results show that it doesn’t actually work. Furthermore, they don’t have biprediction, which is pretty much the whole reason to have B-frames in the first place.

    As I mentioned in the article, yes it’s a patent risk, which is why they don’t have it — this is perfectly reasonable. It’s just going to reduce compression, which unfortunately is pretty much unavoidable if you’re trying to avoid patents.

  63. Dan Says:

    I assume you’ve tested the decoder speed without out-of-loop postprocessing, right? I haven’t tested it, but it looks like it’s enabled in the decoder by default, and turning it off requires an unorthodox procedure. There are also encoder settings to decrease decoding time (the loop filter you mention, maybe others?) — how much do these hurt quality and help speed?

    Can’t most of the B-frame gain be emulated via discardable lower-quality frames? It could be tested in x264 by forbidding forward or mixed references, and maybe crippling direct MV prediction.

    The “segmentation map” thing doesn’t look as bad as you make it sound. You can choose from four different qps on each frame (hey, it’s better than Theora’s three!) — it should be decent enough even for mbtree (certainly not optimal, but serviceable). It’s coded as two binary decisions per MB, which can optionally be re-used in the next frame. What’s hacky and inefficient about that? I expected worse.
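
    As a rough illustration of what “two binary decisions per MB” could look like (a minimal sketch, not the actual libvpx decoder; bool_read(), BoolDecoder and the probability values are placeholder names):

        /* Read a 2-bit segment id as two boolean-coded decisions walking a
           small tree, then use it to index a per-frame table of quantizers. */
        int read_segment_id(BoolDecoder *bd, const unsigned char probs[3])
        {
            if (!bool_read(bd, probs[0]))
                return bool_read(bd, probs[1]);     /* segment 0 or 1 */
            return 2 + bool_read(bd, probs[2]);     /* segment 2 or 3 */
        }

        /* usage (hypothetical field names):
           mb_qindex = frame->segment_qindex[read_segment_id(bd, frame->seg_probs)]; */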

    Also, it’s laughable that the format only “supports” ITU-R BT.601. Is BT.709 patented or something? Can you patent a 3×3 matrix? Theora had this same stupidity. That said this is a billion times better than Theora. With a decent encoder and if it’s really patent-unencumbered (ha ha) it might serve useful purposes both as a codec and as a keeper of low licensing costs for h.264.

  64. AlekseiV Says:

    @59: is saying “VP8 now is better than x264 when it was first released” supposed to be praise?

    VP8 is a commercial product with years of development behind it, not a hobby project that has just become working.

  65. Joe Says:

    “You know, VP8 now is better than x264 when it was first released.”

    Duh? What exactly is supposed to be insightful or amazing about the fact that a 6-year-old codec is better than another codec was on its initial release?

  66. Relgoshan Says:

    #61: It doesn’t hurt that VP8′s proposed container for HTML5 is very flexible and better for certain over-the-net uses.

    #62: Those pages are old, the data backing them is practically nonexistent, and some of their market-speak is completely made up (lies). A little like reading the marketing material for SpinRite.

    Good news! Disregarding the viability or legality of VP8, Opera ASA is donating its GStreamer porting efforts to the general GStreamer project! This should allow software that uses GStreamer to play VP8 in the near future!

  67. NA Says:

    Patenting algorithms is pathetic and wrong and a symptom of an overdose of greed. Software patents are a nuisance that hinder progress and just help with the monopolization of the software industry. It’s usually exactly those who use the free results of researchers and others who then run to patent their own stuff. I personally despise any developer who sells his service to companies who have nothing better to do than to hire patent lawyers to cause trouble instead of licensing decent products.
    http://www.nosoftwarepatents.com/

  68. juanb Says:

    Can VP8 stand on its own in terms of patents? Maybe. If they can work around the weaker areas, would it be an adequate format? If so, the technical differences might not matter as long as VP8 is widely adopted. Think VHS vs Beta.

  69. EgoLayer13 Says:

    Many thanks for this in-depth look at VP8 vs. H.264. With any luck, Google will permit changes to the spec, allowing the developer community to clean up some of the more glaring issues of VP8. Also, good call on Google’s stupid rebranding of Matroska.

  70. Dark Shikari Says:

    @Dan

    I think B-frames are pretty much useless if you leave out bipred, direct, and forward reference.

    I tested decoding with default settings; I think postprocessing is off, but I’m not sure.

  71. Luca Goodwin Says:

    OK, if this statement is correct: “That’s right: this software is even older than x264!” how can: “VP8 copies way too much from H.264 for anyone sane to be comfortable with it, no matter whose word is behind the claim of being patent-free.” be correct? If H.264 and VP8 are really that similar and VP8 predates H.264… well, I’m sure you can do the math.

  72. Dark Shikari Says:

    @Luca

    x264 is just a software program. H.264 is the standard, and H.264 has been around since 2000-2001 (and was finalized in 2003). VP8 has been around for quite a while as well, but we have no idea when they added the H.264-alike features to it. They were surely added after 2000-2001 though.

  73. Jonathan Wilson Says:

    The question is, can these “glaring issues” be fixed without increasing the patent risk? Some of these “glaring issues” may exist specifically because it’s the only way it can be done without infringing on one of the H.264 patents.

  74. Relgoshan Says:

    #72: VP8 predates x264. H.264 was being worked on many years before the x264 project was begun. So parts of VP8 may have been stolen from drafts of H.264, while x264 was later begun as an OPEN-SOURCE ALTERNATIVE to the H.264 reference encoder.

    H.264 and x264 are not the same thing, nor are they developed by the same people. H.264 is a collection of ideas gathered into a standard, x264 is one program (among many) that creates video which is compatible with the H.264 standard.

  75. Midzuki Says:

    Many many thanks for another great article
    * THUMBS UP *

    P.S.: Hopefully “Mr. Graft” will not come “strike” me for an _alleged_ violation of the “Rule 11″. ^_~

  76. Will Says:

    Thanks very much for this exhaustive overview and introduction into the tangled underbelly of video encoders. I knew little to nothing about the details of this world, or programming in general but your explanations were interesting and understandable even with the jargon.

    Great post!

  77. johnny Says:

    Main 2 issues:

    1) Why is google so quick to set the standard in stone?

    H264 is royalty-free until 2016, so there is no rush. Of all the things google likes to leave in beta, this seems the most crucial, because hardware is going to be built around it. Google could at least keep it in beta for 1 year and finalize at a 2011 conference.

    2) Which will cost more in future: H264 royalty or vp8 additional bandwidth?

    That’s really the only question that matters because it’s the only money question. Decoding speed won’t matter in 6 years because it’ll be trivial for cpu to perform, storage space won’t matter, etc, etc. Bandwidth prices also drop over time so in all likelihood H264 simply won’t be worth paying for at any price.

    My crystal ball prediction is that if/when H264 ever starts having royalties, that’s when the switch happens. Everything right now until then is just laying infrastructure (client support, tools, encoders, etc) until websites are motivated to switch.

    Nay to google for finalizing without input from hardware vendors, developers, etc.

    Yay to google for providing a viable alternative to H264 and a free video standard.

  78. Wurast Says:

    AFAIK, Youtube and such only encode in the baseline profile because of mobile devices, so you should have really just concentrated on comparing VP8 to H.264 Baseline.

  79. Paul Irish Says:

    It’s worth putting the tl;dr appendix C at the top of the page, i think.
    first time through i missed that. it’s perfect for most.

    thx for such detail and perspective.

  80. Dark Shikari Says:

    @Wurast

    They only use Baseline for their lowest resolution encode. Everything else uses Main or High.

  81. Lachlan Stuart Says:

    It’s possible that Google is declaring VP8 a sealed spec because they’ve got the On2 staff preparing a VP9 RFC. What do you think the chances of this are? If they are, is there a possibility that they could suck up proposed H.265 features before they are patented?

  82. Matt Says:

    Given that youtube is serving 2bn videos per day and assuming they will eventually adopt vp8/webm, it will be in google’s interests to direct significant resources into improving vp8 file-size and encode time?

  83. psuedonymous Says:

    I’m a little worried about the ‘Matroska subset’ portion of WebM. It would be most irritating to see widespread hardware support for WebM’s subset without support for MKV features that are in use today. Ordered Chapters, for instance.

  84. gabort Says:

    If you are unconcerned about the MPEG-LA license cost included with your device, perhaps the “little” restriction that you are not allowed to use the codec commercially will make you raise your head. Go on, check the fine print now. Not even professional cameras costing thousands of dollars have that license. Sound fair? Concerned? I certainly am. The only way to solve software patents is to abolish them.

  85. Lukas Says:

    What I find most amazing is that this bunch of crap cost Google no less than $124 million.

  86. Xiong Chiamiov Says:

    #83: I found it interesting that the very first item on the roadmap ( http://www.webmproject.org/code/roadmap/ ) mentions fansubs, a field where people certainly care about both quality and filesize (although the former more in recent days). Ordered chapters are in widespread use in the fansubbing community, so my hope is that there is someone with enough interest there to get ordered chapters in and working.

    Since Google explicitly stated the reason for using a subset of Matroska is for compatibility and such, though, it’s probably unlikely. I think that while WebM may indeed take over the web, good ol’ h264-aac/vorbis-mkv will continue to be the standard for high-quality desktop video. I’m far from an expert, though.

  87. cc Says:

    Excellent writeup.

    I’d like to note that 20091 and 35648 are actually fairly clever things to multiply by. In binary:

    20091 → 100111001111011
    35648 → 1000101101000000

  88. Rex Guo Says:

    Long ago I did some work on data compression and I seem to remember that IBM has patents on the arithmetic coding compression technique. Doing a quick search reveals these:

    http://en.wikipedia.org/wiki/Arithmetic_coding#US_patents_on_arithmetic_coding

    http://www.google.com/search?q=ibm+arithmetic+coding+patent

  89. Steven Says:

    Very interesting Jason, thnx for the detailed report.

    Are you planning on contributing to VP8 in any way? Im sure they could use your asm skills (if I were google I’d hire you today!).

    Not that I want you to abandon x264 in any way.

  90. Rex Guo Says:

    That’s why the JPEG spec says that Arithmetic Coding is an option and defaults to Huffman, as Huffman is not patented.

  91. Rolf Says:

    You’re not quite right about Silverlight’s Smooth Streaming: they take a wmv file and split it into pieces (each piece isn’t a mini-wmv file, it only contains the bitstream). And since VC1 allows resolution changes in the bitstream, they can effectively split several wmv files of varying resolutions, and switch between them when streaming.

    There is no practical difference between this and the http streaming supported in Windows Media Server (for both progressive/stored content and live content).

  92. memo Says:

    This is just another war being waged among those big names like google and apple and adobe.

  93. David Says:

    I wish all standards included not just ambiguous natural language descriptions, but also some working and unambiguous reference code.

  94. KeyJ Says:

    An excellent summary.
    As someone who works in a company that builds hardware video decoders, I wholeheartedly agree that VP8′s entropy decoder is pure horror to implement in hardware.

    Another funny side note: It seems that clamping MVs incorrectly is a common error — WMV9 had a bug there, too. There are two MV clamping algorithms in the VC-1 spec: The wrong one for Simple and Main Profile (which is identical to WMV9) and a fixed, correct one for Advanced Profile.

  95. Tommy He Says:

    Very detailed analysis with thoughtful opinions!

    Great job! Thank you.

  96. hazydave Says:

    I think it was the particulars of H.264 licensing that got everyone looking at alternatives. If there was simply a one-time licensing fee for any hardware, then most hardware would be licensed already. The problem is, you’re paying that license over and over again.. each new H.264 producing or consuming application: video tools, web browsers, etc.

    Worse still, they’re somehow holding their patents over the pure software product of the patented encoder: the output. In 2015, they’re planning to charge for each streaming of an H.264 bitstream… despite your having already paid for the decoder in order to be able to use it. Even the license on professional H.264 cameras is an issue… in the manual of my Pro-level Panasonic AVCCAM camcorder, it says “This product is licensed under the AVC Patent Portfolio License for the personal and non-commercial use of a consumer…”. Now, that’s certainly non-binding for a product sold specifically for commercial use, but really, they’re just crazy with their claims of the extent of these patents. It’s kind of like if Microsoft or IBM had claimed some control over the products created with a PC, based on the patents in their PC SW or HW.

    Even if VP8 isn’t “better” than H.264, if it’s close enough, and someone doesn’t really trip up on the patents (yeah, I see what’s said here, but doesn’t Google have the legal staff and the last year to have looked this stuff over?), this will be a good thing. And even On2′s boasts for VP8 were qualified by “on low bitrate video”… even they weren’t suggesting I’d want VP8 replacing H.264 in my camcorder.

  97. Tim Says:

    I was interested in the following comment

    “though a shift-based transform might be out of the question due to patents.”

    The ‘shift-based transform’ which is referred to sounds more than a little like a lifting scheme, which should not be patented (not saying it isn’t, it just shouldn’t be).

    Would it be possible to clarify the comment and details of a shift-based transform that you think might be prone to patent coverage ?

  98. Lonesome Walker Says:

    I’m sure Google has its reasons to buy this “crap”:

    if you don’t want to get sued, you need some “rights” for your own company ;)

  99. David Gerard Says:

    Greg Maxwell thinks it’s only about as much of a car crash as VP3 was when it was released:

    http://lists.wikimedia.org/pipermail/wikitech-l/2010-May/047795.html

    “You should have seen what VP3 was like when it was handed over to Xiph.Org. The software was horribly buggy, slow, and the quality was fairly poor (at least compared to the current status).”

    He also thinks you’re likely wrong on the patent horrors.

    In any case, it’s interesting times :-D

  100. Ts Says:

    Readability is better. Its express purpose is to make things easy to read, which among other things means carving out all the other shit on the page leaving only the content you actually care about.

  101. Kenny Says:

    Why didn’t Google just buy MPEG-LA?

  102. jinzo Says:

    Thanks for a great technical review.
    But I’m sure Google was aware of (at least some of) these flaws, and decided to bring it out early anyway. Quite a lot of software, standards and devices have dominated the market not with technical superiority, but by just “being there at the right time”. I think Google aimed at this effect, and thus VP8 will still be quite a success that’ll lay the grounds for “VP9”.

  103. zhb Says:

    The non-adaptivity of the arithmetic coding is a big penalty that will make it closer to CAVLC than to CABAC. Perhaps things will not be so bad, because there is some adaptivity at the frame or sub-frame level.
    The adaptivity in coefficients for the loop filter could help coding performance, and there are some proposals about this in JCTVC.
    Besides, B-frames add considerable decoding complexity because of bi-directional motion compensation. MC is the most time-consuming part of an H.264 decoder.
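
    To make the adaptivity point concrete, here is a generic sketch (not VP8 or CABAC code; the update rate is purely illustrative): an adaptive boolean coder nudges its probability toward each decoded symbol, while a non-adaptive one keeps the per-frame probability fixed for every symbol it codes.

        /* prob is P(bit == 0) scaled to 0..255; called after every decoded bit
           in an adaptive coder, never called in a purely per-frame coder */
        static unsigned adapt_prob(unsigned prob, int bit)
        {
            return bit ? prob - (prob >> 5)            /* saw a 1: lower P(0) */
                       : prob + ((256 - prob) >> 5);   /* saw a 0: raise P(0) */
        }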

  104. x264.nl Says:

    @Lukas: Exactly, we at x264 land have done better for minus that amount of money. I.e. we didn’t get paid AT ALL.

  105. AndyS2 Says:

    I just tried out the new codec on youtube today. Here is a comparison of the quality with a specific video that was available in 720p in webm and flash (I suppose it’s H.264, but I can’t be sure. There is no way to copy the .tmp file under windows 7 x64 that I know of, and flash player only showed me bitrate and filesize)

    http://img243.imageshack.us/img243/985/youtubevergleich.png

    Note:
    - I’m not an expert on video encoding
    - Top frame on the picture is Opera webm build as linked on http://www.webmproject.org/users/, bottom is stable firefox with current adobe flash plugin
    - I used Windows 7 x64 Professional during the test
    - The filesizes were very similar, 35407kbyte for flash, 35183kbyte for webm (looked in appdata/local/temp… or appdata/local/opera/… for that)
    - I don’t know what encoder/settings youtube used to encode the flash video or the webm video, and I don’t know the quality of the source video that was uploaded. I still like the result of the webm video far more than the blocky flash one.
    - Link to the youtube video is http://www.youtube.com/watch?v=EMs5pWce1y0

  106. Overlord Says:

    Let’s give time to time.

    The baseline is: it’s OPEN SOURCE.

    We ALL can improve it… not only Google (even YOU, x264 “developer”).

    We all can USE it… MPEG-LA too.

    Let’s see, Mr. Steve Jobs/Apple, the “self-claimed defender” of the Open Web now.

  107. Anonymous Says:

    Ordered chapters is the single dumbest feature of MKV, and it would be a very good thing indeed if people learned to not use them.

  108. Baughn Says:

    Yeah.. once I finally got the encoder to /finish/ encoding a short movie, the quality difference was astonishing. Not in a good way.

    I’m not subject to software patents where I live, and I won’t be using VP8.

    Something I’ve been wondering, though – if you didn’t have to worry about decoder compatibility (or patents), how much better would it be reasonably possible to make x264? Are you pushing the limits of what we know how to do, or just what the H.264 specification can handle?

  109. chris boozer Says:

    Is vorbis stereo only or can it support multichannel surround? What would really be cool would be to include ambisonic audio. It uses 4 channels to encode a 3-dimensional sound space (x, y, z and an omnidirectional channel). It can provide binaural 2-channel output and can be decoded to feed anything from 2 channels up to hundreds, so it could support 5.1-10.2 and beyond right out of the gate. All the patents are expired public domain except for G-Format 5.1; modify that, and 4-channel encoding supports everything from headphones and home theater all the way to 122-channel public concerts. Check out ambisonic.net

  110. MfA Says:

    As I mentioned on Doom9, I can find intra prediction in Q15-J-19 and no patent with a priority date within a year of that.

    It doesn’t seem to me that intra prediction is patented (validly). IANAL of course.

  111. Sean Burke Says:

    You talk about the encoding/decoding speed and quality differences between VP8 and h264. Can you explain what software was used to create the source content and what options were used?

  112. Bye bye Says:

    Well, it seems the beginning of the end of H264. We won’t miss you MPEG-LA! ^^.

  113. Mandarinka Says:

    @86 Anonymous:

    It is a bit troublesome given the non-uniform support and the added complexity, but it offers tremendous filesize savings in scenarios it is designed for.

  114. FFFUUUU Says:

    Well, I wouldn’t expect a different analysis from an x264 developer. I think you should start developing for VP8 and stop feeding the patent trolls with your damn x264 encoder, limiting people’s freedom.
    Huh, Google made the right choice; they won’t need to pay the MPEGLA. Copy a lot from x264? If it was all copy-pasted and not patent-free, why did nobody take care of it? On2 was using it already.
    Once it’s implemented and being used, I’ll render a verdict on it, but based only on your review, I can’t form an opinion.

  115. Carlos Solís Says:

    After reading this article I discovered lots of pet peeves with WebM’s current status:
    * The name itself. WebM sounds more like Internet than Multimedia.
    * The format. WebM is only a subset of Matroska – no embedded subtitles, no DVD-style chapters, no dual audio.
    * The fact that Google will set the standard in stone RIGHT NOW. Come on, Google, if you planned to free VP8, you should suppose somebody would come along with ideas to tweak the basic standard, right? So why seal it right now and prevent such tweaks from becoming part of the standard?
    * The fact that (for now) VP8 is unable to achieve good-quality compression at 100-to-150 MB per hour. x264 can already achieve it; search for Bencos or RealAnime.

  116. whatsup Says:

    @EgoLayer13: “Google’s stupid rebranding of Matroska”

    WebM isn’t using stock Matroska, but a subset. Stupid? Heh.

    @Lukas: “that this bunch of crap cost Google no less than $124 million”

    Actually, WebM easily matches H.264 baseline, which is what matters. And this is just the beginning.

  117. Raymod Says:

    The best feature of the VP8 codec, as opposed to H.264, is that the VP8 codec can be fitted with a 200KW VASIMR engine. In the future, this could enable round trips to Mars in just 7.8 seconds or less.

  118. makomk Says:

    psuedonymous: does anything actually support playing MKV files with Ordered Chapters except Haali’s MKV splitter? I seem to recall not…

  119. Martin Says:

    wut? ordered chapters provide endless flexibility and work on all os’s (mplayer uau for linux and mac)

    whats the problem with them?

  120. Dark Shikari Says:

    @Tim

    I don’t know the details of the choice of H.264′s transform, but I do know that nobody else has used a shift-based DCT in any spec as far as I’ve seen. The only reason I can think of is patents.

  121. Steve Says:

    Sorry, but i think that you feel threatened by VP8.

  122. Mecki Says:

    Thank you very much Dark Shikari, you ruined it all for me :-P I really dared to hope that VP8 could finally end the HTML5 codec war, not by being the best codec on earth, but by being maybe good enough as the smallest common denominator for everyone. I knew it was not great in many aspects, but thanks to your post I now have to think it is crap. I’m not referring to image quality, but to the lack of a useful specification and, even worse, the coupling of encoder and decoder. If we learn one thing from MPEG, it is how to make codec specs: specify the bitstream, so the meaning of every single bit is known at any time. Specify the decoder, so everything you need to know to convert the bitstream back into full pictures is right at hand. Don’t specify the encoder; every encoder is valid if it generates a bitstream as defined that can be decoded by a decoder as defined. This simple recipe works great in practice.

  123. Manuel Says:

    I think Google set the spec in stone before open-sourcing it, in order to avoid more patent infringement risks.

  124. Mavro Says:

    A credible “threat” to the MPEG (and AAC) patent holders is necessary for them to make the license terms less greedy. When you are the only game in town, you can charge what you want. With a viable alternative, the license terms are more likely to change. So this is a good thing. I suspect VP8 will make H264 use more transparent and reasonable….in a few years.

  125. Orbijx Says:

    Oh, lord.

    I smell browser wars coming again, and this time, there’re reinforcements.

    For everyone asking why someone would release an inferior product when an established product and standard is out there…
    I would like to introduce you to my old friend, Microsoft Internet Explorer.
    I’m sure you all remember the days where Internet Explorer actually trumped Netscape for usability’s sake, even though it had that gaping wound called ActiveX in it.
    Look how long we had to live with Internet Explorer, until designers started realizing that Trident isn’t the only rendering engine out there.

    I expect that this is a ploy to get something out there that needs more work, just to get someone, or some group motivated to release something free and better than what has assumed the standard, if this goes the way I expect it to.

    But that’s just the thoughts of a non-technical bumbling fool.

  126. Witek Says:

    According to the images you present, x264 even in baseline mode is much better than VP8.

    What concerns me the most is the legal problems with VP8, as it collides with H264 anyway.

  127. Dark Shikari Says:

    @Witek

    That’s because of psy optimizations, which x264 has but the VP8 encoder does not. In terms of theoretical capabilities, VP8 beats H.264 Baseline; it’s just that x264 is a better encoder.

  128. shon3i Says:

    @Dark Shikari

    So in the near future, if developers enhance VP8, add more things (PSY, AQ), and fix all the stupidities, can VP8 then stand shoulder to shoulder with H264 HP, or even be better?

  129. Kevin Lewis Says:

    Good analysis. I agree with your conclusion about the patent encumbrance of VP6, and I’m sure the open source community will simply ignore this. The whole VC-1 / MPEG-LA saga will happen again. Happy days.

  130. MfA Says:

    Zero-multiply shift-only transforms aren’t really out of the question … since it is indeed all lifting and lifting parametrizations of DCT are ancient. Some of the design algorithms for finding good factorizations and specific implementations might be patented (pretty shaky).
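
    For readers unfamiliar with lifting, here is a minimal sketch of a multiplier-free lifting step (a generic illustration, not code from any codec; the shift approximations are only rough): a plane rotation inside a DCT butterfly factors into three lifting steps, and each lifting constant can be approximated by a short sum of shifts while staying exactly invertible.

        /* rotate (a, b) by roughly pi/8 using only adds and shifts
           (assumes arithmetic right shift for negative values) */
        static void lift_rotate_pi8(int *a, int *b)
        {
            *a += (*b >> 3) + (*b >> 4) + (*b >> 7);   /* ~tan(pi/16) = 0.199 */
            *b -= (*a >> 2) + (*a >> 3) + (*a >> 7);   /* ~sin(pi/8)  = 0.383 */
            *a += (*b >> 3) + (*b >> 4) + (*b >> 7);   /* ~tan(pi/16) again   */
        }

    The inverse simply applies the same three steps in reverse order with the signs flipped, so the rounding introduced by the shifts cancels exactly.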

    Is it still relevant though? It doesn’t matter to CPU based encoders, and for fast heuristic encoders (ie. which don’t try to simply re-encode blocks a dozen times) and decoders I can’t imagine it’s still a big deal.

    Computation is always getting cheaper … bandwidth and storage more expensive in comparison. Look at things like Frame memory compression!

  131. Mak Says:

    Clearly Google wouldn’t have splashed out $125m for something that’s crap.

    I trust Google to have spent wisely (as they have always shown in the past), over the opinion of someone with a heavily vested interest in H.264.

  132. TGM Says:

    oh dear, point missed completely. HTML5 is an open spec. We don’t need closed formats in it. I hope VP8 wins through, and now it’s been open sourced people can look at it and improve it. I’d like to see the same comparison on a snapshot 6 months down the line…

  133. bgm Says:

    Well, for $124 million Google could have hired the x264 team to create a new open codec, with patent lawyers from the big G working to ensure no time-bombs from the start. Even if it took some serious time, it’s money well spent.

  134. Tim Says:

    @Dark Shikari 120

    Fair enough – I was just wondering – and alas it’s not an overly paranoid suggestion.. As Mfa (#129) mentioned, however, additive/shift lifting schemes for wavelets have been around a while, and comprehensively published… and while not exactly ancient or widely used in a commercial sense (often the important metric for litigation), I remember using them in the mid-90s for noise filtering of drill data (IIRC the schemes can be used to filter out mean data values in log(n) complexity, useful for real-time control).

    I’m grateful for all the analysis here, it’s certainly got me looking more closely at this. Thanks.

  135. m.j.mellin Says:

    It’s true what Dark Shikari wrote: x264 is better than VP8, and VP8 doesn’t seem to be patent-free. Even if, by any chance, Google pays patent fees to make it free, there are many more good and well-tested programs to encode and decode H.264 than VP8. x264 works really well for streaming video over the Internet; look at the website http://www.YourLiveCinema.TV, where 99% of the movies are compressed with x264 at really good quality.

    Thank you all x264 developers for your great work in developing x264 coder.
    Team of sites NAPISY.info and YourLiveCinema.TV

  136. Mecki Says:

    @Orbijx: Your comparison to IE isn’t really correct here. The competitors that are trying to beat or maybe already have beaten IE in the long run (Mozilla, later Firefox, Chrome, Safari, Konqueror, Opera, etc.) are not setting up competing standards. They were all following an independent set of standards from the W3C (CSS, HTML/XHTML) that none of them created themselves. So we have here one standard created by a single company (Microsoft) vs a set of standards created by no specific company, but used by many applications.

    Going by your comparison, Google is Microsoft now. Microsoft and Apple are using an independent standard, one that neither Microsoft, nor Apple has created on their own. Google on the other hand is creating their own standard; they are doing the same thing everybody has criticized Microsoft for.

    The fact that Google might have more supporters is only caused by the horrible patent problem the computer industry has been facing for years. If there were no patents, surely everyone would prefer H264 over VP8.

  137. Bill S. Says:

    Might the release of VP8/WebM be an attempt to get the H.264 patent pool players to relent on their plan to extract royalties in 5 years?

  138. Nick Says:

    Why doesn’t Google just buy all the H.264 patents (or companies that hold them)?

    Also, thanks for the comparison! It is very thorough.

  139. Witek Says:

    @Dark Shikari answering to me at #127.

    Thanks for mentioning the lack of PSY. Then we wait for these to be implemented, if possible. Still, I see these preliminary VP8 images as better than Theora (even the latest), so I’m hoping that it will be good and maybe even better than h264 :)

    Thanks for critical review btw.

  140. Skybuck Says:

    Hi, here is my reply to your review:

    First of all I think your review is a little bit biased :)

    Second of all you are nagging a little bit too much about little things like:

    “Typo’s in comments” ?!? So what ?!?

    “Versioning in comments” ?!? So what ?? ;) :)

    “1080i/1080p” is a pretty high resolution and probably most PC’s out there will have problems with it.

    “1080i/1080p live streaming/encoder/real time” seems pushing it a bit… I doubt h.264 can do it…

    I do agree with you on the code though:

    The code looks pretty horrible for a 100+ million deal… no c++ ? no nicely designed c++ classes ? I guess it’s “rush to market eh ?!” ;) :)

    I can understand why Microsoft might not want to include it in their IE9 by default, because:

    1. The codec could be a big piece of steaming shit.

    2. No security analysis has been done yet ? ;) :)

    3. Lack of features/api’s and so forth, though probably minor issue for integration.

    I do wonder how they are planning to support it by external install ? That will require some kind of plugin support which currently doesn’t seem present ? No vfw ? No directshow ? Though wait… I think I did see some weird DirectShow stuff somewhere.. mostly’s dll’s though ?!?

    I also agree with you that this codec is probably using many methods that h.264 is using… so very few new things… but then again I know very little about h.264 ;) :) just the general methods that these kind of codecs use :) So anybody hoping that vp8 would be something spectacular will probably be disappointed :)

    “What’s the signal to noise ratio stuff about ” Seems bullshit to me but ok…

    Nice to see somebody review it though and speaking out about it !

    Also what’s the point of open sourcing it if it’s set in stone ?!? Might as well give people a windows dll and be done with it ! ;) :) Ok, Ok, maybe some optimizations here and there… big whoopie ;) :)

    Only thing this open source seems to be useful for is potentially stealing it, but who wants to steal a steaming pile of crap ? :)

    LOL. Did Google just pay 100+ million dollars for a steaming pile of shit ? :)

    I will leave it at that ! ;) :)

  141. foxyshadis Says:

    @TGM

    What’s so closed about a format with a complete spec and reference software that’s freely available on many sites, including the official agencies, with licensing fees that are less than any of its wildly popular predecessors? (MPEG-1, MPEG-2/DVD, MPEG-4…) I appreciate that Google is rattling the sabre at MPEG-LA, but they have months of hard work ahead to fix the substandard format, and if they abandon it for VP9 as soon as companies bring out devices for it, there’s going to be a huge backlash.

    @Dark

    On2 may not have had a spec. Google may have created one largely by slapping the relevant code in – expecting to nail it down in English at a future date. Obviously the whole release was rushed. It’s hard to believe that On2 didn’t have clear, tested, plain C versions of optimized routines the way most open source software does, but the proof is in the code. How else can you find optimization bugs?

  142. Glen Turner Says:

    There seems to be some conflict in your discussion about VP8 patents. If VP8 builds upon the previous VP…s and development happened in the same timeframe as the development of X.264 then it may not follow that VP8 has the patent difficulties.

    In this situation it would be ballsy of MPEG-LA to approach Google. The only assets MPEG-LA has *is* the MPEG patent licenses so if MPEG-LA loses then MPEG-LA ceases to exist.

    Furthermore, the license-providers to MPEG-LA are taking a fair risk too. MPEG-LA is essentially a shell company and courts are increasingly unimpressed by the use of shell companies to avoid liabilities.

    I fully expect MPEG-LA to sit on its hands and continue to collect rent from camera manufacturers and other licensees, rather than taking the risk of being wound up after the success of any one counter-claim by Google.

  143. Bob Says:

    So, the corporate “standard” is created by pooling a lot of patents, and hoping that there is no important patent left out. (there will be several).

    Meanwhile, H264 will not even be a possible consideration for HTML5 until the patents run out. W3C policy and history call for no royalties for WWW standards. It is really that simple. Manufacturers will come around as soon as the patent minefield is cleared a little. It will take 2 or 3 years to sort out.

  144. tangfl Says:

    I would like to ask for your permit to translate this post to Chinese and post in my blog: http://blog.fulin.org/2010/05/vp8_first_in_depth_tech_analysis.html

  145. Dark Shikari Says:

    @tangfl

    No problem, feel free.

  146. Jose Quervo Says:

    I have a novel idea. Since x264 can’t really be used in the US because of patents, maybe “we the people” should exert our will and have all software patents abolished and this won’t be an issue anymore. Just an idea.

  147. Eric Says:

    I find the frame quality changes that you show in your analysis disturbing:
    http://doom10.org/compare/vp8_psnr_graph.png

    This finding will likely mean that VP8 supporters use those one-in-a-dozen improved-quality frames to compare their codec favorably with h.264

  148. Marco Ravich Says:

    Just a question: didn’t VP8 come out *before* x264?

    If so, could x264 be the patent violator?

  149. NerdWithNoLife Says:

    The interesting thing about this soon to come patent war is VP8 will have Google’s legal team on its side, something x264 surely lacks. Thanks again DS for REAL information. Hopefully a quarter of it will trickle down to mainstream news. :)

  150. Ajay Says:

    Jason, one claim that people are using against you is that x264 is trying to relicense as a commercial product, which they say is what’s motivating your criticism here. You might want to address this claim in an update or on a separate post, whether it’s true or not.

  151. Bron Says:

    the assembly [code] is clearly written by retarded monkeys. LOL! :D

    I've seen a lot of commercially viable retarded code. "Good enough" seems to matter more in the end - I don't think the short-comings and trade-offs in the code are going to matter unfortunately. That's kind of what you get when something is given away. "We're not going to fix it because it was a major pain in the ass just to get our retarded monkeys to get this to work!"

  152. Jeff kelly Says:

    Google never really thoroughly checks for patent violations in open-sourced code because they are not liable in case of infringement.

    They also don’t indemnify licensees of the spec from patent infringement lawsuits (MPEG-LA does).

    So if you license the format you are basically SOL. Google nowhere states that the code is free from patents and they transfer liability for patent infringements onto the licensee.

    Just ask HTC, which pays Microsoft license fees for Android, how well using open source Google code has worked out for them.

    Why do you suppose Microsoft only supports VP8 by plug-ins in IE9? They can claim support and if the shit hits the fan the plug-in supplier is the one liable.

    The more interesting question is the following:

    Can a company like Microsoft or Google, which already pays license fees to the LA, actually infringe on patents if they use VP8?

  153. Antti Says:

    Someone mentioned that B-frames were out of the question because of patent issues. How can this be if 1) MPEG-1 uses B-frames and 2) MPEG-1 draft standard was produced already in 1990? Wouldn’t the relevant patents have expired already?

  154. Jeff Johnson Says:

    We have used and tested this codec.
    In performance it shines for scene cuts on frame (no fade/crossfade) in stable (tripod) shots, and for lens fringing fixes (and other noise artifact filtering) it gives reasonable performance.
    It upscales badly.

    However, the world is going handheld, HD, tree leaves fluttering in the frame-to-frame background, pan wobble, etc. h.264 beats vp8 in all regards. Since we need 2.5mbit-and-under net speeds to go HD, we are staying with h.264 @720p30 baseline. It scales better up and down, meaning iPad 1024, 1366 “bullshit HD” and 1080 screens all see the 720p better.

    With the flash 10.1 gpu pipeline we expect to cover the cross-platform and net-connected stuff, and google tv hooked to 1080 screens.

    At least we can hope for HD convergence, IMHO h.264 already is what we need.

  155. Dark Shikari Says:

    @Antti

    There was a loophole in patent rules that was closed in 1995. This loophole allowed patent applications to be extended much longer than is reasonable before granting, thus making the date at which they would expire later. A lot of MPEG patents thus do not expire until 20 years after 1995 (2015).

  156. Dark Shikari Says:

    @Ajay

    We are not planning to relicense, but rather to add another option. The reason for this is that a ton of companies would love to use x264, but they use proprietary solutions instead because of the GPL — so in order to increase adoption of free software, we need to add an option more suitable to companies who can’t use the GPL. x264 will always be GPL; it just may also be available under other licenses too. And it will never be forked, i.e. a different version available under different licenses.

  157. cubeeggs Says:

    Have you seen what Mozilla said on their blog?

    http://blog.mozilla.com/blog/2010/05/19/open-web-open-video-and-webm/

    Until today, Theora was the only production-quality codec that was usable under terms appropriate for the open web. Now we can add another, in the form of VP8: providing better bandwidth efficiency than H.264, and designed to take advantage of hardware from mobile devices to powerful multicore desktop machines, it is a tremendous technology to have on the side of the open web. VP8 and WebM promise not only to commoditize state-of-the-art video quality, but also to form the basis of further advances in video for the web.

  158. steve Says:

    It only took 1 day for patent threats to start:

    MPEG-LA looking into forming a patent pool…

    http://digitaldaily.allthingsd.com/20100520/googles-royalty-free-webm-video-may-not-be-royalty-free-for-long/

    Basically mpeg-la is selling patent insurance: buy today to protect from us suing you tomorrow.

    In regards to people having issue with google quickly “setting in stone”, my guess would be for legal reasons:

    * Finalize vp8 exactly as on2 sold it to google, so any lawsuits arising would clearly be against on2 patents. If google released it as beta and made changes before finalizing, the legal waters would become muddy about whether on2 patents still apply

    * The other reason for finalizing quickly is to get the ball rolling on industry support. Lots of different groups need to come together to add support in browsers, encoders, design tools, hardware decoders. If google released it as beta, everyone would likely sit around and wait for the final version before starting work.

    Also there’s no reason problems can’t be addressed in quick update to vp9. Even specs like hdmi went through a quick few updates to add things that should have been there in 1.0, like bitstreaming

  159. Joseph Frazier Says:

    Hi,

    Because of you great view and apparent knowledge of developing I have an off the topic question to ask you.

    I am trying to become a developer for the iPhone OS. I have started to take beginning programming classes at my local JRC. Unfortunately they only seem to teach Windows-based database programming. From the point of view of a developer like yourself, what should I do to learn to become a developer if I only want to develop for iPhone OS?

    Great blog BTW I totally agree!

  160. iAPX Says:

    Thanks for the good work. I am planning to work on my own CUDA implementation of x264, with new ideas, and it’s great to read insight from an expert on VP8 as well as on video encoding!

  161. tommy Says:

    Can vp8 utilize multi-core processors when encoding in real time?

  162. Dark Shikari Says:

    @tommy

    Yes, but it doesn’t seem to do so very well (probably because its threading model is slice-based instead of frame-based). It capped out around 40% usage on my Core i7 here.

  163. Marcus Says:

    Can we have the diffs for the rejected bugfix for libvpx?

  164. And Says:

    haha I think most of the people here commenting negatively on VP8 are apple workers. Most Linux distros are free, faster, smarter, open, and in my opinion, look a lot better than OSX. Thank God for open source! And thank you Google for trying to stop these greedy bast***s.. All patent trolls should burn in h***

    Time to change the laws! No one should be able to terrorize the web like apple and microsoft do!

  165. David Jezek Says:

    Dear Dark Shikari, sorry to inform you so late. As you permitted a Chinese translation of this article before, I would like to inform you also about my Czech article on VP8, based approximately 50% on this analysis of yours (a partial translation with my Czech comments).

    btw. thank you for this analysis (and also for your work on x264, which I have gladly been using for several years).

    http://www.diit.cz/clanek/google-uvolnil-video-kodek-vp8-dalsi-krok-v-revoluci-weboveho-videa/36259/ (analysis is used on pages 4 to 6)

  166. jmdesp Says:

    A clarification about MfA’s comment above: Q15-J-19 is the decoder description of Nokia’s MVC Video Codec. There’s an easy-to-find q15j65.ppt document about it with a publication date of 2000 (a nice read if you want even more advanced video encoding descriptions after the above ;-) ). MfA reports Nokia then got a patent on it, but dated 2002, so it is easily demonstrated as invalid if you have the money (no problem for Google). A demonstration that you can destroy many patents with enough knowledge of the published literature.

  167. Jammec Says:

    @tommy

    A VP8 stream can be concurrently encoded into one macroblock control partition and 1, 2, 4 or 8 residual partitions. Residual partitions are divided by macroblock rows, so for example row 0 is encoded to partition 1, row 1 to partition 2, row 2 to partition 3, and so forth. This should give very good multi-core possibilities.
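
    A minimal sketch of the row-to-partition mapping described above (0-indexed here, with num_parts assumed to be 1, 2, 4 or 8):

        /* each macroblock row's residual data goes to partition (row % num_parts),
           so separate threads can entropy-code or entropy-decode the partitions
           concurrently */
        static int residual_partition_for_row(int mb_row, int num_parts)
        {
            return mb_row % num_parts;
        }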

  168. Relgoshan Says:

    #159: Get a Mac, get the SDK from Apple, and please get off the internet.

    #161-162: On2 made some very bold claims about multicore processing and in-frame adaptive motion compensation. Yet I am not seeing any delivery on those promises. Is there a chance that part of On2′s codec had to be yanked and replaced with a more primitive kluge?

  169. Saravanan T S Says:

    Thanks for the detailed analysis. Without a proper spec, any standard is “never going to be a standard”. Whatever the technical issues, unless this is patent-free, it’s still going to be a problem. Looks like Google would establish a “patent pool” for centralized management of the patents covered by VP8. Whatever the case, IMO, Google’s move is nice, but very complicated.

  170. Relgoshan Says:

    http://www.webmproject.org/code/roadmap/

    This actually did mention starvation issues with multicore encoding, as well as several of Dark Shikari’s personal complaints about the technology. I don’t think that the new VP8 working group is entirely happy with what it has been handed.

  171. Carlos Solís Says:

    @Relgoshan:
    As a matter of fact, the roadmap mentioned fansub encoding in nearly the first place! Which reminds me: is it possible to encode anime with WebM at 50 MB per episode, as can already be done with x264?

  172. Mark Hollis Says:

    Cripes!

    While I do totally appreciate Google’s intended gift here to the World, a codec that is software patent-free (like all software ought to be), since this codec is hugely based on H.264, they need to seriously revise it before it is useful. In the meantime, I have websites I need to create with actual moving images, and I can only assume we’re three years (ten if you consider the litigation that is predicted) away from any really usable HTML-5 video, unless we default to Flash, which is proprietary, or H.264, which is being refused by Firefox.

    I need to serve clients here with video. I started using Apple’s “Export for Web” for videos (which is not W3C compliant), knowing that Firefox won’t play it back. I’m not a fan of Flash because it’s not W3C compliant, either. And I risk the potential loss of SEO for every web page with a moving image for my clients (who want moving images).

    I deeply appreciate how Google stepped up to the plate here with this attempt. I would hope that they can clean up the mess and truly make their codec patent-free. But I don’t like waiting like this. We’re, essentially, in limbo.

  173. psuedonymous Says:

    @266
    Yes. It will look just as horrific as x264 at 50 MB per episode.

  174. Relgoshan Says:

    #167: I’m pretty sure the question was targeted for PSP playback or something similar. The question is, what’s the first handheld that will manage reasonable playback speeds?

    If VP8′s decoder is slower than halfway decent H.264 software solutions, and ALSO has elements that are difficult to implement in a hardware decoder, won’t video quality suffer a LOT on handhelds?

  175. bdmv Says:

    mb=mibibit
    Mb=Megabit
    MB= Megabyte

  176. Jens Says:

    First, x264 patents are only an issue in a handful of countries (including the US), and the FFMPEG implementation is free and open source just like VP8. The new “WildFox” project will implement the free x264 for firefox users in the rest of the world. Based on that, I only see a “moral superiority” of VP8 in countries that suffer from software patents. Apart from that, I think the most disturbing part indeed is the lack of independent implementations of VP8, where basically the software released turns into the spec, which conserves bugs as “standards”. Hopefully there will be community efforts to improve a VP8-derived codec where such abominations are identified and removed – even if that means breaking compatibility.

  177. xcomcmdr Says:

    @MfA (130)
    “Computation is always getting cheaper … bandwidth and storage more expensive in comparison. Look at things like Frame memory compression!”

    You can always screw up an algorithm’s complexity way off the scale so that even the best CPU in the world can’t compute it in a reasonable time frame. Like VP8′s motion estimation, in which “almost half of the locations it searches are repeated!”

    That’s why measuring complexity is important, and “CPUs are getting more and more powerful” never changed that!

  178. Dark Shikari Says:

    @Marcus

    Ask derf. He’s a Theora dev who submitted the fix.

  179. Louise Says:

    29 bugs have already been filed, and some already fixed.

    http://code.google.com/p/webm/issues/list?can=2&q=&colspec=ID+Type+Status+Priority+Version+Owner+Summary&sort=&x=&y=&cells=tiles

  180. Martin Says:

    Your work is very much appreciated. Thank you!

  181. manfid Says:

    doesnt matter if h264 is better than v8, aac is better than mp3… yet mp3 is more mainstream, we dont care for a few more bytes as long as sound equal good.
    v8 just need to be good at sigth, doesnt matter if is twice the size of a h.264 video.
    ten years ago, a few bytes were a big deal, not now and surely not in 10 years.
    even x264 know that, it puts quality a head of bytes, instead of xvid who was all around bytes, with the 1-2-3 passing aproach.
    sorry if v8 kill h264, so you just wasted time developing something that should not be developed in the first place, for a format now without future.
    if h264 were free, without licenses and patents crap (glad i live in a country were all that patent software nonsense dont reach us, sorry for those who live in one of those 2 stupid countries that believe in software patents), there should be no need for open sources projects or third developers creating multiple encoders and decoders for one format, trying to circle around patents, reinventing the circle, wasting hundreds of developers hours time for nothing but trying to reach the same place skipping the toll road.
    everyone should be just using the official codec and decodec free and careless, that will be the case if v8 win.
    i hardly see mpeg-la will be able to twist the arm of their clients to boycott v8, at must they can send the memo “dont support v8 or we will stop licensing you and will sue you, thx for give me your money i will use it to sue you if i must” or “ok, we are in danger, we need to put h264 everywhere lonely, we dont need the competition, so i may be able to reduce your license payment a little as far as you only support h264, even give you a few years free if you help me boycott the competition”

  182. TurtleCrazy Says:

    > This doesn’t mean that it’s sure to be covered by
    > patents, but until Google can give us evidence as to
    > why it isn’t,

    Hmm, are you sure about this?
    As long as MPEG-LA (or someone else) does not point out the evidence, it is.

  183. Ben Says:

    “In 2015, they’re planning to charge for each streaming of an H.264 bitstream… ”

    Not sure where you got that from, or you are just spreading FUD.

  184. Bactero Says:

    Just wondering if Dark Shikari really examined the VP8 code or just cherry-picked some parts of it to criticize.

  185. bdmv Says:

    manfid Says: a lot of things, but doesn’t quite get it apparently…

    “doesnt matter if is twice the size of a h.264 video.
    ten years ago, a few bytes were a big deal, not now and surely not in 10 years.”

    Not true; apparently you have not come across the mobile market’s pay-by-the-kbit-transferred-in-either-direction model yet.

    So of course the lower the file size transferred, the lower your mobile bill, with fewer over-usage charges. AVC High Profile x264-encoded content gives you that higher visual quality at a lower file size if you choose to tune it that way today.

    Put simply, a 50% file size increase directly increases your end user’s mobile kbit-transferred bill by the same amount.

    “sorry if v8 kill h264, so you just wasted time developing something that should not be developed in the first place, for a format now without future.”
    There’s just so much wrong with this reactionary thinking, it’s hard to know where to start.

    Put simply, x264 and all the devs that have contributed to its development have made the x264 encoder the world’s best and most used H.264 encoder today; look at any AVC-encoded content on the web today and you will find that 90%+ of it has been encoded with it.

    Many of these same devs have written patches and whole sections of new code for the other apps (ffmpeg, VLC, MPlayer, etc.), so don’t go pissing any of them off too much, or your VP8 open project may find that those new VP8 patches don’t get written by the people who know quality and produce virtually everything your end users are using today.

    The comments I’ve seen so far here and elsewhere about how VP8 is a good “baseline”-like profile forget, or worse simply don’t know due to ignorance, that virtually everyone can and many already do use “High Profile” H.264-encoded content today.

    DVB-S/DVB-S2/DVB-C/DVB-C2/DVB-T/DVB-T2/DVB-H/DVB-H2, BBC HD iPlayer, virtually all the US, EU and Far East AVC/H.264 HD web streaming broadcasters, Blu-ray, HD AVC STBs, and even most current ARM-based mobile video-capable devices can all take “High Profile” level 3.1 through 4.0 AVC-encoded content, and the very few that can’t do HP can take Main Profile content. So who wants, never mind actually needs, a low-visual-quality, low-grade “baseline” fuzzy-type profile? No one, that’s the real answer….

    Hell, even the current commercial DivX codec specs that are finding their way into lots of commercially licensed products today state a requirement for AVC/H.264 “High Profile” and “Main Profile” level 4.0 + AAC + Mkv container abilities.

    And let’s not forget super HD AVC is not so far away either.

  186. Dark Shikari Says:

    @Bactero

    I read the whole thing. As I said overall, it’s not too bad besides being pretty inefficient.

  187. Bactero Says:

    Analysis of WebM in response to Dark Shikari.
    http://carlodaffara.conecta.it/?p=420

  188. Relgoshan Says:

    Between Opera, VLC and the KeepVid website, I have ripped every quality level of many new YouTube videos. The VP8 LQ looks about the same as h.264 LQ, sounds worse, and can take more than twice as much bandwidth. VP8 HQ actually does pretty well compared to h.264, but even MPEG-2 works fine if you give it enough bit-rate to work with. VP8′s inefficiency with low-quality video is awful.

    On a mobile connection, VP8 LQ will either look worse or take longer to download.

    Compare VP8 vs h264 on these two videos, for an especially extreme example:
    http://www.youtube.com/watch?v=tQxbpryKKQo
    http://www.youtube.com/watch?v=IMS-gewLP44

  189. Carlo Daffara Says:

    Actually, mine is a restatement of your post from a different angle: I used your mentions of sub-par implementation choices, checked them against some of the patents that I happened to have read in my past years at ISO (I was in ISO JTC1, Italian chapter, UNINFO), to show that those decisions may have been made to avoid patents.
    I believe that your post is excellent and very precise, but misses the point; it’s true that x264 is much better (actually, as I wrote in the post, probably the best encoder available on the market), but the point is to demonstrate that On2 actually tried to avoid patents in the area (which is the main point of WebM, actually, not final quality). As for the specification, it is true that it is incomplete and currently based heavily on the actual implementation. This, of course, happened also during MPEG4p2 and H264; most of the specification was actually based on the reference encoder/decoder.

  190. Carlo Daffara Says:

    By the way, it’s not only the US that does have the software patent problem. H264 patents are registered in Germany, France, Italy, Mexico, Canada, Korea, Japan, China, Finland, Sweden, Belgium, Bulgaria, Switzerland (just in the first 3 pages of the MPEGLA H264 essential patent list). I can tell you that in Germany patents were already used to remove unlicensed ware from fairs and shops; MP3 players were sequestered at CeBIT just a few months ago. Same for Italy.
    And, again (as I have gotten flamed badly): I am *not* saying that VP8 is better than x264. The psy optimizations in x264 are among the most incredible examples of extraordinary encoder design. I am just claiming that the differences may be sufficient to survive cross-examination, using Dark Shikari's analysis. Also, there are legal reasons for not releasing the patent cross-check that Google did before acquiring On2: to avoid treble damages (this is also quite common in the US).
    Final point: I doubt that psychovisual coding could be implemented in VP8 in a patent-safe way. Bit allocation and the use of visual metrics to drive quantizer choices (your tree allocation method) are, on my cursory glance, all patented in a more or less unavoidable way.

  191. Carlo Daffara Says:

    A final point: MPEG-LA does *not* provide indemnification against external patent claims. From the MPEG-LA FAQ: "Q: Are all AVC essential patents included?
    A: No assurance is or can be made that the License includes every essential patent. The purpose of the License is to offer a convenient licensing alternative to everyone on the same terms and to include as much essential intellectual property as possible for their convenience. Participation in the License is voluntary on the part of essential patent holders, however." So, if an external party (a non-MPEG-LA member) holds patents on something, MPEG-LA is neither a party to the suit nor will it help in a suit against a customer.

  192. WulfTheSaxon Says:

    @Dark Shikari (comment #155)
    So, would it be reasonable to expect VP10 with B-frames in 2015?

  193. Rüdiger Says:

    @CarloDaffara: Those are not software patents in the countries where there are no software patents. MP3 players are hardware.

  194. sep332 Says:

    I read the parts where you said VP8 was similar to H.264, but I didn’t appreciate the effect until Google rolled out “WebM” on YouTube. These two 720p encodes of the same clip (the first 10 minutes of a Google I/O keynote) look COMPLETELY the same to me, despite the difference in file size. I’ve never seen two different codecs produce visually identical results at moderate-to-low bitrate before.

    - Links -
    WebM http://drop.io/hidden/oyoyld5o6xkjlu/asset/Z29vZ2xlLWlvLWtleW5vdGUtd2VibS1ta3YtMg%253D%253D

    H.264 http://drop.io/hidden/oyoyld5o6xkjlu/asset/Z29vZ2xlLWlvLWtleW5vdGUtaDI2NC1tcDQ%253D

  195. Pink Says:

    @Rudiger: my video iPod, smartphone and laptop are also hardware. I can’t think of any device I’d use to watch H.264 or WebM video that isn’t hardware. What’s your point?

  196. Carlo Daffara Says:

    @rudiger To the best of my knowledge, the German raid at CeBIT was triggered by a claim over a *software* patent, implemented inside the players, for decoding MP3, based on a patent held by the Italian company Sisvel. Despite the claim that software patents are not valid in Europe, there are several such patents issued, and several have been upheld in European courts (the most recent one in Germany, by Siemens). Just go and check the non-US patents in the H.264 essential patent pool, and you will have a good list of countries that accepted them.

  197. xawari Says:

    Well, for me the VP8 codec is a great disappointment. In my test video (not even HD), in which I tried to preserve near-lossless quality (all parameters set to highest quality), the VP8 codec showed very poor image quality in the fast-motion parts (while nearly static video sometimes looked even better than with x264). The worst case for the VP8 codec, in my opinion, is when a big bright image shrinks into a small, dark one.

  198. Mes Says:

    Relevant and very insightful post by the VP8 team:
    http://webmproject.blogspot.com/2010/05/inside-webm-technology-vp8-alternate.html

  199. Willem Says:

    I saw an email from Steve Jobs referring to your article; I wonder if Apple has offered you a job yet.

  200. Rüdiger Says:

    @CarloDaffara: AFAIK, there are no software patents in Germany according to the definition of the German patent office. But, and that's the point here, there is patent protection for computer-aided inventions or hardware implementations of algorithms, depending on the "technical character" of the claims made (keyword: "Technizität" = [technicity?, technicality?]).

  201. Carlo Daffara Says:

    @Rudiger We actually agree on this point. In France and Italy, too, software patents are not allowed, but patents on the actual use of a software algorithm on any machine that alters the physical world in any way (including changing memory or a display state) are accepted. So they may not be "software patents", but they certainly look like it…

  202. Relgoshan Says:

    #194: Laptops are not hardware because you can change the software framework. There are questions about the classification of hackable devices like Linux smartphones, because technically you could replace the entire OS.

    Think “Firmware” versus “Software”

  203. Okazaki_Saito Says:

    First, all H.264 material on the web (as in HTML5), and all material viewable on mobile phones (with “hardware” support, whatever that means) is Baseline H.264. VP8 is better than that by most measures.

    In other words, VP8, even in its current unoptimized state, beats the heck out of Baseline H.264, which is used by YouTube and all other video-enabled devices. I'm surprised at how you make things sound worse than they actually are. The latest Theora development report (demo9) shows significant improvements due to psychovisual optimizations, and that's already on the roadmap of the VP8 developers.

    Your analysis is correct in most areas, but the way it is presented generates a lot of FUD in my opinion. I believe that if you were doing a technical analysis of x264, your tone regarding the drawbacks would be very different.

    Why are you against VP8? Isn't it better for everyone that we now have a truly free (so far, at least) video format that is better than Baseline H.264 and has huge potential to improve? I don't think VP8′s adoption for the HTML5 standard will affect H.264 or x264; H.264 is already adopted by other long-term standards such as Blu-ray and DVB-S2. Not to mention that fans won't leave x264 for anything else. So why are you against VP8′s adoption for the web?

  204. Jeroi Says:

    Google should have rebranded Matroska as *.mkw, which would stand for "Matroska web", where the original is *.mkv.

    It's so stupid to go with the *.webm format, which is a clumsily long extension anyway. Please, Google, rename the container extension to MKW rather than webm. You can call it WebM but use the .mkw extension.

  205. Dark Shikari Says:

    @Okazaki

    That isn’t what I said. x264 baseline beats the crap out of VP8 currently; I simply said that in theory, VP8 should beat H.264 Baseline, given a sufficiently good encoder.

    Furthermore, it isn't VP8 vs. Baseline H.264. VP8 is vastly more complex than Baseline H.264 to decode. Most likely, if there is a "baseline VP8", it will take many shortcuts (fast deblocking filter, bilinear MC, etc.) — and thus be not nearly as good.

  206. Okazaki_Saito Says:

    Yes, excuse my blunder. VP8′s current encoder is a lot weaker than baseline x264. But I have two questions:

    1. How much improvement can Google milk out of VP8 if they apply serious psy optimizations to the encoder?

    2. Assume (for the sake of argument) that no one could find any patent infringement. What would be your view regarding VP8′s adoption for the HTML5 standard? Are you for it, against it, or still undecided?

  207. bdmv Says:

    “203: Okazaki_Saito Says:
    June 3rd, 2010 at 1:19 pm
    First, all H.264 material on the web (as in HTML5), and all material viewable on mobile phones (with “hardware” support, whatever that means) is Baseline H.264.”
    As already stated in 185 above, the assumption that all mobile-type devices can only take and use Baseline Profile H.264 is totally false, as is your assertion that "all material viewable on mobile phones is Baseline H.264." I'll say it again: virtually all current ARM-based SoC (System on a Chip) mobile devices CAN take High Profile Level 3.1 encoded H.264, and the very few that can't can take Main Profile instead.

  208. James Zhou Says:

    I have a question — do you know where I can obtain/download a copy of the VP8 standard?

    Thanks a lot.

    – James

  209. anonymous Says:

    If you say that libfaac is bad and there is no good free AAC encoder, then what does YouTube currently use to encode their AAC?

  210. seth Says:

    That was a great writeup. I would say, on the whole, if Google has the will and the fortune to defend an H.264-like ripoff that can be comparable in quality, it's an overall win. Increasing Internet speeds will reduce bandwidth hurdles faster than outliving existing patents or developing something entirely new. And I'm REALLY happy all my H.264 code didn't go down the toilet; it can now be aimed at something, ahem, similar. The overall benefit of letting new applications be developed without paying licensing fees is a PLUS.

  211. Tina Says:

    Thanks DS, I'll go "QuickTime" then.

  212. Dark Shikari Says:

    @anonymous

    Do you think YouTube gives a crap about quality? They use ffvorbis for WebM, for Christ's sake! Yes, they use FAAC.

  213. Dark Shikari Says:

    @Okazaki

    I think VP8 would be suitable for HTML5 if Google was forced to give up “ownership” of it. This means:

    1) Development taken over by a neutral standards organization
    2) libvpx is no longer “the correct” software that everything has to agree with; instead, a spec is drafted up, and everything must match the spec. Bugs in libvpx must not become part of the spec.
    3) Google must release any patents they have on VP8. Currently, they refuse to even list what they are.

  214. Conan Kudo (ニール・ゴンパ) Says:

    Once a full spec has been made, perhaps a new independent encoder and decoder can be written to replace libvpx.

  215. Andre Says:

    What I wonder is whether the uptake of WebM by web browsers will be enough to cause MPEG-LA to update their terms to something that doesn't scare most websites off. I hope so, but only time will tell.

  216. Brown Says:

    How much improvement can Google milk out of VP8 if they apply serious psy optimizations to the encoder?

  217. Relgoshan Says:

    YouTube's AAC may not be as good as some other encoders', but VP8 LQ sounds MUCH worse than even the Flash version.

  218. Mike Spooner Says:

    #183 Ben: see the official announcement from MPEG-LA dated 2nd February 2010:

    http://www.mpegla.com/Lists/MPEG%20LA%20News%20List/Attachments/226/n-10-02-02.pdf

    It is quite clear that they *may* require license fees (on some yet to be determined basis: per-decoder, per-stream, per-decode, or whatever, including combinations thereof), but may not do so, depending on how they feel.

    Note also that there is no legal commitment in any of these announcements; MPEG-LA could decide to require fees at *any* point in the future. (I'm no legal expert, but I strongly suspect that they are not entitled to make *retroactive* changes to patent licensing terms, so your previously legal past activities should not require (further) fees.)

    From the February announcement, we should get a statement from MPEG-LA by the end of 2010 which will state their *current* intentions for what they will require from 2015 onwards. As noted above, such a statement would not guarantee anything.

  219. Evan Says:

    Thanks for your first technical analysis of VP8. It is really helpful for a basic understanding of VP8.

    However, the patent analysis is another story. In fact, MPEG-LA lists their essential H.264 patents on their website. I believe it won't take you too much time to go through them (especially the US patents).

    In my biased opinion, only “deblocking filter” and “DC intra prediction mode” of VP8 may have patent infringement problems.

    More interestingly, there is a lawsuit underway right now regarding the deblocking filter (http://dockets.justia.com/docket/georgia/gandce/1:2010cv00748/165453/). Video Enhancement Solutions accuses many companies (Sony, Samsung, Panasonic, ...) of patent infringement. The deblocking-filter patents are quite strong and were assigned from LG Electronics. This tells us that even if you have paid the H.264 license fee to MPEG-LA, you are still not 100% safe. So it is not reasonable to ask Google to make sure VP8 is 100% patent-free; maybe 95% is already high enough.

  220. Rocso Says:

    To all those asking why Google is rushing to get WebM and VP8 out there before it’s really ready… to quote Mozilla: “net effects”.

    Remember Microsoft’s years of vapourware announcements that were designed solely to stall “net effects” and uptake of anything new that they didn’t own or control? They’re still around but they killed, crippled and delayed a lot of great stuff in the process.

    It only works if you are big enough, and Google are BIG ENOUGH.

  221. Quora Says:

    In layman’s terms, what are the key technical differences between the VP8 and H.264 coded video representations?…

    Its hard to provide technical differences in layman’s terms here, because they are actually pretty similar and the differences are very nuanced. As a very rough summary, H.264 has better adaptive capabilities in a number of areas, i.e. it reacts bette…

  222. Chris Smith Says:

    I love the practical and frank approach your review takes. The "summary for the lazy" notice was pretty funny too, though it should probably be an anchor link to that section. ('Cause they would be too lazy to scroll, get it…) Anyway, good read.

  223. Angel Genchev Says:

    I'm sorry to read that VP8 is so inferior and patent-endangered. I hope that Google's millions bought the related patents and that On2 had enough of them. A year ago I wrote an experimental RTP server and client for live video chat using x264 from ffdshow. Then I gave up, because my employer ordered other things done. Later I began looking into reimplementing it at home without x264, to avoid patent issues. Theora, and then VP8, became my hopes when Google changed the license.
    I'm interested in a comparison of the VP8 and x264 codecs for live streaming: what happens when one turns off B-slices, CABAC coding, and weighted prediction (as the H.264 Baseline profile requires), and the corresponding VP8 features (if there are any to turn off)?
    I think that if the patents are not an issue, a VP9.x (why not xVP9? ;-) ) should be derived from VP8 to fix whatever can be fixed without patent infringement.
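
    As a rough, purely illustrative sketch of the x264 side of that comparison (the helper name, preset, tune, and return convention below are assumptions for the example, not anything mandated by x264 or by the comment above), the libx264 API lets you switch those three tools off explicitly and then pin the stream to the Baseline profile:

        #include <x264.h>

        /* Illustrative helper: configure libx264 for low-latency live
           streaming with B-slices, CABAC, and weighted prediction disabled,
           then constrain the stream to the Baseline profile. */
        static int setup_live_baseline(x264_param_t *param, int width, int height)
        {
            /* "zerolatency" removes the lookahead/B-frame buffering delay. */
            if (x264_param_default_preset(param, "veryfast", "zerolatency") < 0)
                return -1;

            param->i_width  = width;
            param->i_height = height;

            param->i_bframe = 0;                                /* no B-slices */
            param->b_cabac  = 0;                                /* CAVLC only  */
            param->analyse.i_weighted_pred = X264_WEIGHTP_NONE; /* no weightp  */

            /* Applying "baseline" enforces the same three constraints and
               also disables the 8x8 transform. */
            return x264_param_apply_profile(param, "baseline");
        }

    On the VP8 half there is less to switch off, since (as the article above discusses) VP8 has no B-frames at all and uses its own boolean arithmetic coder rather than CABAC.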

  224. Quora Says:

    What are some of the reasons that people prefer H.264/AAC over WebM (VP8 and Ogg Vorbis) for web video and audio? Which are deal-breakers for those who won’t use WebM?…

    Too-long version: H.264 and AAC are better than VP8 and Ogg Vorbis, respectively, in almost every way. Except one: You need a per-implementation patent license for H.264 and AAC. That’s not really a problem for proprietary software vendors (like Apple…
