Diary Of An x264 Developer

01/13/2010 (10:23 pm)

x264: the best low-latency video streaming platform in the world

x264 has long held the crown as one of the best, if not the best, general-purpose H.264 video encoders.  With state-of-the-art psy optimizations and powerful internal algorithms, its quality and performance in “normal” situations are mostly unrivaled.

But there are many very important use-cases where this simply isn’t good enough.  All the quality and performance in the world does nothing if x264 can’t meet other requirements necessary for a given business.  Which brings us to today’s topic: low-latency streaming.

The encoding familiar to most users has effectively “infinite” latency: the output file is not needed by the user until the entire encode is completed.  This allows algorithms such as 2-pass encoding, which require that the entire input be processed before even a single frame of the final output is available.  This of course becomes infeasible for any sort of live streaming, in which the viewer must see the video some predictable amount of time after it reaches the encoder.  Which brings us to our first platform: broadcast television.

x264 is used in thousands of servers at hundreds of head-ends for cable and IPTV broadcast, HD and SD, thanks to our good friends at Avail-TVN.  In this situation 2-pass is no longer an option: we’re restricted to 1-pass encoding, for obvious reasons.  But we still have a lot of flexibility: latency is not particularly critical, since the user isn’t interacting with the content he’s viewing.  At most our biggest worry is channel-change time, which can be optimized independently of the actual end-to-end latency.

As such, x264 has received many optimizations that assume a few seconds of lookahead.  Avail paid me to develop RC-lookahead, which looks ahead a few seconds to plan future bitrate allocation.  Other important features, such as macroblock-tree ratecontrol, sync-lookahead, and frame-based threading, all have their own latency requirements.  Furthermore, the stream itself inherently has some latency: the VBV buffer is usually around a second long, and B-frames require a delay as well, both encoder- and decoder-side.  Even without x264’s lookahead features, we’d still have a good bit of latency.  For those unfamiliar with the topic, the VBV buffer stores the compressed video data on the decoder side and is used to absorb fluctuations in bitrate, especially those caused by keyframes.

But some use-cases are more extreme.  With interactive video, a 2-10 second delay is completely unusable.  Videoconferencing requires latencies below 1 second, preferably much lower.  If our target is 200ms of encoding latency, not counting transport time, at 30fps that’s a mere 6 frames.  Instantly, every lookahead is off the table: we have no choice but to disable them all.  Even the regular threading model becomes a problem: it adds one frame of latency per thread beyond the first, which with many threads can quickly eat up that 6-frame budget.  And each B-frame we allow adds another frame of latency too.

The total latency of x264, including encoder/decoder-side buffering, is:

B-frame latency (in frames) + threading latency (in frames) + RC-lookahead (in frames) + sync-lookahead (in frames), all converted to time via the frame rate, plus the VBV buffer size (in seconds), plus the time to encode one frame (in milliseconds)
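Plugging numbers into this formula makes the unit bookkeeping explicit (the frame-denominated terms convert to time via the frame rate). All the settings below are hypothetical examples, not recommendations:

```python
FPS = 30.0
frame_ms = 1000.0 / FPS

# Hypothetical example settings for a low-latency configuration:
bframes        = 0           # B-frame latency, in frames
threads        = 1           # frame threading adds (threads - 1) frames
rc_lookahead   = 0           # frames
sync_lookahead = 0           # frames
vbv_buffer_s   = 1.0 / FPS   # single-frame VBV: one frame's worth of buffer
encode_ms      = 7.0         # per-frame encode time (hardware-dependent)

latency_ms = ((bframes + (threads - 1) + rc_lookahead + sync_lookahead) * frame_ms
              + vbv_buffer_s * 1000.0
              + encode_ms)
print("%.1f ms" % latency_ms)  # → 40.3 ms
```

Turn any of the lookahead or threading knobs back on and the total climbs by whole frame intervals at a time, which is why the defaults are so far from the 200ms budget.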

At the start of October 2009, x264 was completely unsuitable for this use-case.  Its handling of tiny VBV buffers, especially without the RC-lookahead (which we’re forced to turn off), was disastrous.  And the latency added by threading was completely intolerable in many cases, especially considering that we want to use as much of that 200ms as possible for the VBV buffer.  None of this was surprising, of course: low-latency is a use-case that requires very specialized features that most encoders don’t have.  In short, x264 needed a miracle.

Fortunately, there was a startup (which has requested not to be named) that saw the potential here.  With a few features, x264 could be turned into the most powerful low-latency streaming platform in the world.  So, in October 2009, we began work.

The prelude to this work was multi-slice encoding support, which I wrote at the end of August.  Among other things, it contained a feature that seemed rather useless at the time, but had been requested by a few clients: the ability to cap the size of each output slice, so that each frame is split into a set of slices no larger than some maximum.  One reason for this might be to fit each slice into a single UDP or TCP packet.  We’ll come back to this later.

The first step was single-frame VBV support.  With a single-frame VBV, every single frame is capped to the same maximum size.  This means that the server can send each frame the instant it is encoded, and the client can decode each received frame without buffering it.  This effectively eliminates the entire VBV buffering latency, and it also improved support for small-but-not-nonexistent buffer sizes.
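The arithmetic behind the cap is simple: with a single-frame VBV, each frame gets at most one frame interval’s worth of the maximum bitrate. A sketch (the function name is ours, purely for illustration):

```python
def max_frame_bits(maxrate_kbps, fps):
    # Single-frame VBV: each frame may use at most one frame interval's
    # worth of the maximum bitrate.
    return maxrate_kbps * 1000 // fps

# e.g. a 6 Mbps connection at 30 fps allows no frame larger than 200k bits:
print(max_frame_bits(6000, 30))  # → 200000
```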

But single-frame VBV support seems useless at first glance.  Keyframes are far larger than normal frames, so if every frame is capped to the same size, the image will completely fall apart at every single keyframe!  This is obviously intolerable.  It means the video will only work if there are no keyframes in the stream other than the first, which basically assumes only one viewer, that nobody would want to seek in a recorded version of the live stream, and that no packet loss ever occurs for any reason.  This doesn’t fit most use-cases.  We’ll come back to this later, too.

The second step was to bring back a threading model which was discontinued in 2006 due to its inefficiency: slice-based threading.  Normal threading, also known as frame-based threading, uses a clever staggered-frame system for parallelism.  But it comes at a cost: as mentioned earlier, every extra thread requires one more frame of latency.  Slice-based threading has no such issue: every frame is split into slices, each slice encoded on one core, and then the result slapped together to make the final frame.  Its maximum efficiency is much lower for a variety of reasons, but it allows at least some parallelism without an increase in latency.  This begins to resolve the latency problem mentioned earlier.

The final step was to bring it all together with Periodic Intra Refresh.  Periodic Intra Refresh completely eliminates the concept of keyframes: instead of periodic keyframes, a column of intra blocks moves across the video from one side to the other, “refreshing” the image.  In effect, instead of a big keyframe, the keyframe is “spread” over many frames.  The video is still seekable: a special header, called the SEI Recovery Point, tells the decoder to “start here, decode X frames, and then start displaying the video”; this hides the “refresh” effect from the user while the frame loads.  Motion vectors are restricted so that blocks on one side of the refresh column don’t reference blocks on the other side, effectively creating a demarcation line in each frame.
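As a toy model (not x264’s actual scheduling, which adapts the column width and position), the sweeping refresh can be sketched like this: the column position is a function of the frame index, and over one refresh period every macroblock column gets refreshed exactly once:

```python
def refresh_column(frame_idx, mb_width, keyint):
    # Toy model: the intra-refresh wave sweeps left-to-right, completing
    # one full refresh of the frame every `keyint` frames.
    cols_per_frame = mb_width / keyint
    return int((frame_idx % keyint) * cols_per_frame)

# Over one refresh period, the union of refreshed columns covers the frame.
mb_width, keyint = 40, 20
covered = set()
for f in range(keyint):
    start = refresh_column(f, mb_width, keyint)
    end = refresh_column(f + 1, mb_width, keyint) if f + 1 < keyint else mb_width
    covered.update(range(start, end))
assert covered == set(range(mb_width))
```

The motion-vector restriction then simply forbids blocks on the already-refreshed side of the current column from referencing anything on the not-yet-refreshed side.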

Immediately the previous steps become relevant.  Without keyframes, it’s feasible to make every frame capped to the same size.  With each frame split into packet-sized slices and the image constantly being refreshed by the magic intra refresh column, packet loss resilience skyrocketed, with a videoconference being “watchable” at losses as absurd as 25%.

No longer does 200ms seem out of reach.  If anything, it’s now far more than we need.  Because with --tune zerolatency, single-frame VBV, and intra refresh, x264 can achieve end-to-end latency (not including transport) of under 10 milliseconds for an 800×600 video stream.  And it’s all open source.  Furthermore, CELT provides the perfect open source low-latency audio equivalent to x264’s video.  We already have multiple companies building software around these new features.

Videoconferencing?  Pah!  I’m playing Call of Duty 4 over a live video stream!

69 Responses to “x264: the best low-latency video streaming platform in the world”

  1. alex Says:

    That’s awesome. Congrats!!

    I have a few questions so I can understand more.

    What is your methodology for calculating 10ms of latency?

    I was always accustomed to calculating latency based upon the frame rate. As in, a GOP size of 8 gives you at least 8 frame durations of latency. Since you’ve reduced the GOP size to 1, the latency should be 1 frame duration, or a minimum of 33ms at a typical frame rate.

    It sounds like your calculation takes into account the encode/decode time, in which case, we need to know about your hardware. (A perfectly valid way to do it, but I’m curious what sort of hardware you tested it on.)

    Again congrats!! That’s amazing news.

  2. Dark Shikari Says:

    @alex

    The way I calculated 10ms:

    1) Encoding with sliced threads on a Core i7 server takes 6.9ms for an 800×600 frame. It could actually be less; that was with settings similar to “--preset veryfast” in x264, and “ultrafast” would have been even faster (but with much worse compression).
    2) Decoding, well, I didn’t time it, but it better be a lot less than that, so I rounded to 10ms. Probably depends on what decoder you’re using. I know that I can decode 1080p in realtime with one core on a good system, and libavcodec supports sliced threads, so 2-4ms for decoding wouldn’t be too far off.

    Since the frame is sent out immediately after encoding, the latency can be lower than 1/fps.

    It’s certainly not an exact number, as it does depend heavily on the hardware, decoder used, and x264 settings used, but I think it’s a fair “order of magnitude”: one can surely get less than 10ms if one really wanted, and one could also get more.

  3. Michael Says:

    I guess the startup not to be named is OnLive?

  4. Shevach Riabtsev Says:

    A decoder which uses slice-based threading in order to diminish latency might encounter a stream where the deblocking filter is applied across slice boundaries.
    I suppose it is necessary to propose a particular profile for low-latency mode which disables filtering of slice boundaries.

  5. Shevach Riabtsev Says:

    One more comment on periodic intra refresh.
    How can random access be performed in the absence of keyframes?

  6. wolf550e Says:

    What quality-at-bandwidth is achieved with this method? A clip to compare? How does this compare with the competition in this problem space? (What is the competition in this problem space? Dirac?)

    I understand that this is meant to be used to encode/transfer/decode something like a news broadcast to cable/IP TV subscribers. Do you know whether broadcasting companies use multi-pass encoding when broadcasting pre-recorded media like a TV series?

  7. mpz Says:

    Great work. This reminded me of the marketing talk Steve Perlman gave for OnLive (a realtime video streaming gaming service currently in beta) – he emphasized that traditional video compression algorithms introduce far too much latency because they operate on whole GOPs at a time, and that they’d come up with a completely new video compression algorithm to deal with this. However, it sounds like H.264 was ready for this use case since day one; it’s just that nobody had bothered to implement it before. (Or was able to put 1 and 1 together.)

    Here’s the video: http://tv.seas.columbia.edu/videos/545/60/79?file=1&autostart=true

    BTW, do you have any samples of the newly resilient bitstreams simulating 25% packet loss online?

  8. Dark Shikari Says:

    @Michael
    Nope, not OnLive. They have their own custom system (which is sort of mediocre, but probably passable).

    @Shevach
    As mentioned in the article, frames in which the column is on the left side of the screen are effectively “keyframes”: the decoder can seek to them, wait for the column to move across the frame, and when it’s done, it has a full keyframe.

    @wolf550e
    Bandwidth efficiency depends heavily on one’s latency restrictions: as always with video, extremely low latency will generally reduce efficiency. I’d say it’s not that bad though: even with a capped frame size, no lookahead, and intra refresh, you’re only looking at sacrificing a few tens of percent of compression. Could be less, too. A quick test with Foreman CIF gives -2.075 dB PSNR and a 2.6% bitrate increase with --vbv-maxrate 2000 --vbv-bufsize 80 --intra-refresh --tune zerolatency --slice-max-size 1500 --keyint 30 as opposed to simply “--keyint 30”.

    @mpz
    My information suggests that OnLive is using H.264 anyways, just in a hacky and ugly way ;)

    That packet loss test was from one of my clients who was testing out the patch; I don’t have it myself.

    For you and anyone who wants to test this, you can try it yourself with the appropriate x264 options:

    --slice-max-size A --vbv-maxrate B --vbv-bufsize C --crf D --intra-refresh --tune zerolatency

    Where
    A is your packet size
    B is your connection speed
    C is (B / FPS)
    D is a number from 18-30 or so (quality level; lower means better quality but higher bitrate).

    Equally, you can do constant bitrate instead of capped constant quality by replacing CRF with --bitrate B, where B is the maxrate above.
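The recipe above is easy to wrap in a small helper. The function below is purely illustrative, but the options and the C = B / FPS relation are exactly as given in the comment:

```python
def low_latency_options(packet_bytes, conn_kbps, fps, crf=23):
    """Assemble the x264 option string from the recipe above:
    A = packet size, B = connection speed, C = B / FPS, D = CRF."""
    bufsize = conn_kbps // fps  # C is (B / FPS)
    return ("--slice-max-size {A} --vbv-maxrate {B} --vbv-bufsize {C} "
            "--crf {D} --intra-refresh --tune zerolatency").format(
                A=packet_bytes, B=conn_kbps, C=bufsize, D=crf)

# e.g. 1500-byte packets on a 6000 kbps connection at 30 fps:
print(low_latency_options(1500, 6000, 30))
```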

  9. Anthony Says:

    Any idea what performance is like in a video conference context at 720p or 1080p on today’s hardware (core i5/i7)? It seems like it could get dramatically better, along with tons of new applications that weren’t possible before.

    Great job and keep up the good work!

  10. Dark Shikari Says:

    @Anthony

    Here are some results on my 1.6GHz Core i7 with the following commandline on a low motion source:

    x264 videos/720p50_mobcal_ter.y4m --preset veryfast --tune zerolatency --intra-refresh --fps 25 --vbv-maxrate 5000 --vbv-bufsize 200 --slice-max-size 1500 -o /dev/null

    With sliced threads as per above: 53.96fps

    For reference, if we toss sliced threads and use regular threads with 6 threads total (5 frames of latency), we get 89.27fps.

    This means that 720p25, 720p50, 1080p24, 1080p30 are not at all unreasonable with zero latency on a desktop Core i7 system (which would be much faster than my laptop). Furthermore, I’m running 32-bit, and 64-bit is an extra ~15% faster. Even 1080p60 might not be out of the question on a good enough system.

  11. Denis Says:

    Low latency is very exciting! Sorry if my questions are silly – I know nothing about video encoding other than the fact that I really dig low latency.

    Can a standard video player (e.g. Flash) be used to play thusly encoded video, or would Adobe have to add support for this? It would be awesome if you could do high-quality videoconferencing without users installing any software – so many fantastic possibilities!

    When you said you play a video game over this thing, does it mean that you have some sort of screen-grabber installed that directly pipes the grabbed video into your encoder? Is this something I can do myself? How do you transport keyboard and mouse events to the remote system?

    Thanks!

  12. Dark Shikari Says:

    @Denis

    Flash can do it, but Flash 10 has a design flaw where it always buffers at least 8 frames, rendering low latency much more difficult. It’s possible to fool this system via various extremely ugly hacks. Flash 10.1 is much improved, with only a 2 frame buffer, which is much more reasonable.

    Playing a video game over this thing? Well, that’s a lot of dark magic that I probably can’t talk about ;)

  13. Shevach Riabtsev Says:

    @Dark
    The question is what happens when a decoder starts playing a stream in which no I-picture is present – only I-columns.
    Obviously the decoder has to process all pictures without displaying them until the I-columns have composed a whole frame.
    This raises a random-access latency problem.

  14. Shevach Riabtsev Says:

    @mpz
    Low-latency H.264 has been implemented for the mobile and video-conferencing markets.
    To implement low-latency H.264, the following restrictions are required:
    1) Don’t use CABAC – because CABAC might cause performance peaks
    2) Don’t use B-pictures; actually, B-pictures can be used on the condition that both forward and backward references have a POC less than the current B-picture’s
    3) Divide each picture into a fixed number of slices of fixed size in MBs
    4) Don’t use deblocking across slice boundaries

  15. Dark Shikari Says:

    @Shevach

    There is always random access latency in a video stream unless your stream is made up of pure I-frames. This is perfectly normal.

  16. Multimedia Mike Says:

    Interesting. I think Westwood may have pioneered periodic intra refresh with their VQA format. http://wiki.multimedia.cx/index.php?title=VQA

    :-)

  17. João Serra Says:

    @Dark

    - I know this is not intended for use in an IPTV scenario, which can tolerate latency, but how will periodic intra refresh impact the channel change time?

    - How do these new x264 features compare to hardware H.264 encoders? (quality- and performance-wise)

    - What is the bitrate of that 800×600 stream you demoed?

  18. sn4091 Says:

    You guys just keep getting better! Great work.

    I’m intrigued by this startup mystery, my guess would be someone into Videoconferencing, launched at ces very recently?

  19. Pengvado Says:

    @Shevach
    CABAC cputime is pretty much directly proportional to bitrate. And to get low latency, you need very constant bitrate. So no, there won’t be performance fluctuations due to CABAC.

  20. Dark Shikari Says:

    @Joao

    The maximum channel change time is something on the order of (keyframe interval + VBV buffer size). This would lower the channel change time by allowing a smaller VBV buffer size, though it wouldn’t help the keyframe interval get any smaller unless you intentionally made it smaller in addition to enabling intra refresh.

    I don’t know any hardware encoders for H.264 that do intra refresh. There were some for MPEG-2, AFAIK. In terms of general quality and performance, x264 has been ahead of hardware encoders for a very long time.

    I’m not sure about the bitrate of the stream because of the way it was set up: it used constant quality mode and then capped the size of each frame based on the user’s connection speed. So for example, if I’m on a 6mbps line, at 30fps, it will allow no frame to be larger than 200k. This doesn’t mean it will use the full 6mbps, however.

    I would say you only need about 1-2mbps for low-latency 800×600 in most cases.
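The channel-change estimate a few paragraphs up is easy to put numbers to. A sketch (the function name and the example values are hypothetical):

```python
def max_channel_change_secs(keyint_frames, fps, vbv_bufsize_kbit, vbv_maxrate_kbps):
    # Worst-case channel-change time is on the order of
    # (keyframe interval + VBV buffer size), per the comment above.
    keyframe_wait = keyint_frames / fps       # seconds until the next seek point
    vbv_secs = vbv_bufsize_kbit / vbv_maxrate_kbps  # buffer fill time in seconds
    return keyframe_wait + vbv_secs

# e.g. a 250-frame keyint at 25 fps with a one-second VBV buffer:
print(max_channel_change_secs(250, 25, 2000, 2000))  # → 11.0
```

Shrinking the VBV buffer attacks the second term directly; shrinking the first term requires a shorter keyframe interval (or intra refresh plus an explicitly shorter interval), as noted above.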

  21. Shevach Riabtsev Says:

    @Pengvado
    the question is how to keep a constant bit-rate during CABAC encoding. We know how to keep a constant bin-rate (not to be confused with bit-rate), e.g. by adjusting QP accordingly.
    But how can you know, in one-pass encoding, how many bits the CABAC arithmetic encoder will generate – 7 bits or 0.1 bits?

  22. Pengvado Says:

    That’s backwards. Bitrate is what we already control; binrate (number of boolean decisions that go through the arithcoder) is what determines cabac decoding speed.

    But bits per decision only varies between about 0.6 and 0.8, and even that much is highly correlated with bitrate, thus the variation within a single CBR video is even less. (7 or 0.1 bits are possible for a single CABAC decision, not sustainable over any sort of timescale that matters.)

  23. Aaron Says:

    @Denis, adding to the comments regarding Flash’s 8/2 frame buffer, this is a direct result of the audio codec used. Nellymoser is a terrible codec that uses 2 bits per audio sample. At 8kbit/44khz, that’s 90ms per frame. I haven’t tried, but I dare say Flash would play back a stream with no audio track with much lower latency. Depending on your use case, that may or may not help.

  24. Dark Shikari Says:

    @Aaron, nope! Can’t be entirely true, since I have a bit of code (which I can’t post, unfortunately) that “tricks” Flash 10 into buffering fewer frames. Furthermore, none of my clients who have dealt with Flash’s latency use Nellymoser.

  25. Aaron Says:

    Ahh, really? I humbly stand corrected. :)

    I did some testing a long time ago (Flash 9) and noticed that video latency jumped up when I used audio. As with most things flash though, I guess when you start digging deeper things aren’t as simple as you’d assume / like.

  26. totoum Says:

    maybe this is the startup in question

    veetle:

    http://www.veetle.com/index.php

    When using the “super advanced commandline” to create a channel you see that the encoder is x264

    Though maybe this is just a coincidence

  27. x264fan Says:

    Which version of libx264 contains these low latency changes?

  28. Shevach Riabtsev Says:

    @Pengvado

    I agree with you. I also get bit/bin ratios in the interval 0.6 to 0.8.
    But if you take a noisy source (e.g. white noise), then the bit/bin ratio gets larger.
    Another example: if you take a picture where the source entropy changes abruptly (e.g. one MB row is white noise, the next is ordinary video, and so on), then the bit-rate is not easily predicted from the bin-rate.
    Most encoder designers don’t take the above cases into consideration. Perhaps they are right; it is not reasonable to “align” an encoder’s architecture to the theoretical worst cases, because it significantly increases complexity.

  29. Shevach Riabtsev Says:

    I am not sure that streams which contain Intra Refresh can be played by modern RT decoders.

    Because in random access mode (e.g. channel change) some decoders enter decoding mode only at an I-picture. If no I-picture is present in the stream, the decoder will not play back the stream.
    To support Intra Refresh it is perhaps required to change the ASIC or microcode.

  30. Dark Shikari Says:

    @x264fan

    r1391 and later, though r1400 is recommended due to a bugfix in intra refresh.

  31. x264fan Says:

    I think @Shevach has a point – not all decoders support Intra refresh, correct? Is there a specific set of requirements for what kind of decoders are compatible with the low-latency changes?

  32. Dark Shikari Says:

    @x264fan

    No, all decoders support intra-refresh. The only catch is whether a decoder can *seek* properly in an intra-refresh stream, which is not really important for streaming purposes.

  33. Shevach Riabtsev Says:

    @Dark
    I know two ASIC decoders which don’t support Intra-refresh.
    Indeed, in random access mode such a decoder seeks an I-picture to start from; if no I-picture is present (due to intra-refresh), there is no decoding.

  34. Dark Shikari Says:

    @Shevach

    Almost all containers provide keyframe flags; the decoder seeks based on the flags in the container, not the video stream itself.

    That problem only occurs with a decoder that doesn’t support SEI Recovery Points, *combined* with a container that doesn’t have keyframe flags (e.g. H.264 ES).

  35. Shevach Riabtsev Says:

    @Dark

    Could you provide a stream with Intra Refresh, in order to update my microcode?

  36. pip Says:

    You or anyone can make your own really easily, Shevach.

    Just download the current build from http://x264.nl/
    and use something like the example above:

    x264 --crf 20 --intra-refresh --fps 25 --vbv-maxrate 5000 --vbv-bufsize 200 --slice-max-size 1500 -o whatever.mp4 inputfile

  37. Esurnir Says:

    @Shevach: Sorry that it’s not a videoconference video; that’s the only video I had (I used it to fprofile a 64-bit build).

    http://www.mediafire.com/?jzztmxxm3d3

    $ x264 --tune zerolatency --intra-refresh --preset veryfast --fps 30 --vbv-maxrate 700 --slice-max-size 1500 --vbv-bufsize 23 -o persona.264 persona.y4m

  38. Shevach Riabtsev Says:

    @Esurnir
    Thanks for your Intra-Refresh stream. I need such a stream for testing the decoder’s operation.

  39. Esurnir Says:

    The VBV buffer should be set at around 30000 bits though, since there’s a “big” frame at one point.

  40. Shevach Riabtsev Says:

    @Esurnir
    Frankly speaking, some ASIC decoders ignore VBV information. These decoders use their own input buffer, with a size exceeding the intended VBV size.

  41. Esurnir Says:

    Actually my buffer was fine; it seems that it’s the VBV checker I was using that was wrong XD

  42. pookie Says:

    Anybody got b_intra_refresh working with flash? I’m running flash 10 and if I use b_intra_refresh, the video won’t play.

  43. Dark Shikari Says:

    @pookie

    It should at least play, but seeking is impossible with Flash in b_intra_refresh.

    It’s not meant for anything outside of streaming though, so don’t expect it to work well for normal files.

  44. junkct Says:

    Can I encode live streams through x264? If yes, any suggestions?

  45. Andrew Klofas Says:

    That sounds really cool. Nice job.

    Lemme take a stab at the unnamed startup: Is it Willow Garage? I know that they’ve been up to some cool robotic applications.

  46. Andrew Says:

    Are these changes committed to the repo yet?

    Do you have any suggestions on what x264_param_t config options to set to enable this? I noticed that the command line utility has switches like --preset veryfast and --tune zerolatency, but it’s hard to track down what those do to the x264_param_t struct.

    Thanks. Andy

  47. Dark Shikari Says:

    @Andrew

    x264 --fullhelp lists what every tune and preset option does.

  48. Mark Says:

    I’ve got a requirement to input 2 live analogue PAL video feeds and stream them with as little latency as possible (less than 200ms end to end). They are to be sent over an intranet but need to work over wifi, so I don’t want to exceed 10Mb/s for the 2 streams. How do I get your low latency command line switches into ffmpeg ( ffmpeg -f video4linux2 -s 4CIF -r 25 /dev/video0 -vcodec libx264 -f rtp rtp://127.0.0.1:8000 ), or is there a way of using x264 to directly stream to RTP? Apologies in advance if this is a total n00b question… latency in digital video streaming seems to be the issue that shall not speak its name.

  49. Dark Shikari Says:

    @Mark

    Unfortunately, ffmpeg currently doesn’t provide access to all of x264’s parameters, including the low latency parameters.

    Here’s what I’d do:

    1. Read Rob’s guide to x264 encoding with ffmpeg ( http://rob.opendot.cl/index.php/useful-stuff/ffmpeg-x264-encoding-guide/ )

    2. Modify libavcodec/libx264.c to assign the options you want that ffmpeg doesn’t expose yet. Just find all the parameter handling code and stick in the new options you want.

    Your bandwidth restriction is pretty easy to deal with: you’d probably want something like:

    --tune zerolatency --intra-refresh --vbv-maxrate 5000 --vbv-bufsize 200

    for each stream, with appropriate other settings set for speed purposes.

    On an unrelated note, RTP support for native x264 would be quite awesome ;)

  50. Shevach Riabtsev Says:

    @Esurnir
    I analyzed the intra-refresh stream you provided me.

    As far as I can see, intra-refresh is carried out in column-wise mode, i.e. each picture contains two successive intra MB columns.
    Due to deblocking, the decoder can’t correctly restore pictures. Indeed, in random access mode the decoder starts decoding at a P-picture. Let’s suppose that the first MB column of the P-picture is intra and the rest of the MBs contain garbage samples, since no reference is available for them.
    After IDCT and spatial compensation of the first MB column (intra due to refresh), the encoder and the decoder are in sync.
    Then the deblocking process is invoked. Because the right neighboring samples for our intra MBs are garbage, the deblocking performs incorrect filtering. Consequently, samples in the intra MB column are slightly corrupted, i.e. the reconstructed samples at the encoder and decoder sides differ!!!
    For the second picture the same phenomenon occurs, and so on. Finally the whole stream stays corrupted until an IDR picture (if one exists).
    I think it is not correct to produce a column-wise intra refresh stream unless deblocking is OFF. Random entry into the stream causes visual distortions.

    @Dark
    Apparently Intra-refresh should be executed in row-wise mode, where each intra MB row is encapsulated as a separate slice and deblocking across slice boundaries is disabled.
    Otherwise no decoder can provide full video quality in random access mode (e.g. in stream sliding).

  51. Dark Shikari Says:

    @Shevach

    This is incorrect; you have made a mistake.

    The first frame has two columns of intra: we will represent this using a single row of MBs:

    II______

    The first “I” is correctly decoded. The last few pixels on the right of the second “I” are not, because of the deblocking between the I and the junk data.

    The second frame is as follows:

    PII_____

    The P is correct, because it predicts from the correct pixels (the first I, or the correctly decoded part of the second I). The first I is correct too, since it predicts from the correct P block. The second I is slightly wrong as before.

    A proof by induction from this point is trivial.

    (Do note that I’m not sure if the current code does consider that extra few pixels as “invalid” for purposes of the P’s MV; if it doesn’t, it will be EXTREMELY slightly wrong, probably not in a visible fashion.)
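The induction above can be checked with a toy per-column simulation (an illustrative model only, assuming a two-column left-to-right wave that advances one column per frame and motion vectors restricted to the refreshed side):

```python
# Per-MB-column decoder state after a mid-stream join:
JUNK, TAINT, CLEAN = 0, 1, 2
W = 8                      # macroblock columns in one row (toy size)
col = [JUNK] * W

for f in range(W - 1):
    # Intra wave sits at columns (f, f + 1) in frame f.
    col[f] = CLEAN         # leading intra column: left neighbor is clean,
                           # right neighbor is also intra, so deblocking is exact
    col[f + 1] = TAINT     # trailing intra column: its right edge is filtered
                           # against junk, so a few pixels are slightly wrong
    # P columns to the left of the wave predict only from CLEAN regions
    # (MVs are restricted), so they remain CLEAN: no state change needed.

# After the sweep, every column except the final trailing one is bit-exact.
assert col[:W - 1] == [CLEAN] * (W - 1)
```

This mirrors the argument in the comment: the tainted trailing column is always re-coded as the clean leading column of the next frame, so the error never propagates.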

  52. Shevach Riabtsev Says:

    @Dark
    Your reasoning is correct, albeit not mathematically rigorous. I was wrong.

    So, in order to support Intra Refresh, an encoder should produce two successive intra MB columns and enforce that MVs point to already-restored regions.
    How do the above restrictions affect the compression ratio?
    Perhaps it is more efficient for each picture to generate an I-slice containing a single MB row, with deblocking disabled across the slice boundary.

  53. Denis Says:

    Is this implemented in the new Skype 4.2? Video quality seems much better than before.

  54. Dark Shikari Says:

    @Denis

    Skype uses VP7 as far as I know.

  55. michael Says:

    @Dark

    1. i might have the rtp done… how do you define “RTP support for native x264”?

    2. libavcodec is great, but it’s a bit large. what are the prospects of decoding this with something smaller, like jm for example?

  56. Dark Shikari Says:

    @michael

    You can compile libavcodec with only the decoders you need. It’s quite reasonable for embedded systems and other cases where you need very small binaries.

    JM is probably larger.

    “RTP support for native x264” means that x264CLI gets RTP output.

  57. michael Says:

    @dark

    1. it is implied that the only stream to decode is a low-latency stream, i.e. x264, thus this is a simple, logical approach. any links on this? always trying to save effort.

    2. jm compiled is 500k, but i may be able to remove non-referenced code, just as mentioned above.

    3. will take a detailed look at x264cli and get back to you.

  58. CF Says:

    Some new low-power Skype add-ons work strictly in H.264 mode (performed with low-powered ASICs). I measured ~200ms end-to-end at CES:

    http://about.skype.com/press/2010/01/new_era_in_face_to_face.html

  59. CF Says:

    Also, I believe the LG? demo rep at CES 2010 said the x264 stream was wrapped in RTP/UDP, with AAC-LC stereo 48 kHz audio.

    http://about.skype.com/press/2010/01/new_era_in_face_to_face.html

  60. WZ Says:

    @dark

    You mentioned “x264 is used in thousands of servers at hundreds of head-ends for cable and IPTV broadcast”. Can you give an example command-line of what sort of settings might be used for such applications? In my situation, I would like to broadcast a live 640×480 stream at < 2 Mbps. I can live with up to 10 second delay if it helps improve video quality. Thanks.

  61. Dark Shikari Says:

    @WZ

    For that kind of thing it depends heavily on what the receiving player is. If it’s actual IPTV with a set-top-box receiving the stream, you should already know a lot of the options you need (e.g. in terms of vbv-maxrate, vbv-bufsize, level, reference frames, etc).

    A basic starting point would be something like:

    --preset (Use Whatever You Can Get Away With given your CPU)
    --tune film (assuming your content is live-action)
    --vbv-maxrate 2000
    --vbv-bufsize X, where bufsize / maxrate = number of seconds the client side is set to buffer.
    If you need outright CBR, as on some IPTV networks, --bitrate 2000.
    If you don’t need outright CBR, use constant quality mode, e.g. --crf 22 or similar.

  62. WZ Says:

    @Dark

    Thanks for those tips. I was curious what commercial applications (such as cable and IPTV) used for their rate control. I know in satellite, they tend to stat-mux their streams based on what’s happening on each channel sharing the same transponder. Didn’t know that anyone used CRF for live applications since the bandwidth usage would be unpredictable. CBR would work fine but probably not optimal when one of the channels is showing static images while another a high-action movie.

  63. Dark Shikari Says:

    @WZ you can use CRF with a cap (e.g. VBV). This means the encoder can use no _more_ than you asked for, but it can use less.

    Hard CBR (with filler) is used in broadcast cable where you need to fill the whole mux no matter what.

  64. Nil Einne Says:

    Is there any actual publicly available video conferencing software that uses x264 yet? A quick search couldn’t find anything…

    Cheers

  65. Marsian Says:

    @Nil

    Check out Unreal Media Server.
    http://www.umediaserver.net

    It uses x264 for live encoding and streaming to Flash Player. It is a very low-latency implementation; you can use it for video conferencing. The Unreal Live Server can encode in VC-1 and x264, so you can compare video quality at the same bitrate. We find live x264 encoding better than VC-1 at medium-high bitrates.

    They wrapped x264 in a DirectShow filter; the source code for this filter can also be downloaded from their site.

    The weird thing about their system is that it records ASF files with x264-encoded video; my Windows Media Player refuses to play them.

  66. Dragos Says:

    Could you provide an actual example of low latency streaming? Like videoconferencing using a webcam…
    Could you include some sample code like CELT does?

  67. Jelle Says:

    Thanks for your article, it helps a lot!

    “2. Modify libavcodec/libx264.c to assign the options you want that ffmpeg doesn’t expose yet. Just find all the parameter handling code and stick in the new options you want.”

    Any idea where I can find the options you mention? It doesn’t seem to be as straightforward as you put it (well, at least to me it isn’t : ).

  68. Håvard Tunheim Says:

    @Dark Shikari
    We’re very interested in integrating a solution similar to this in our software. Are you available to help us with this?

  69. Thor Grimner Says:

    We recently needed to implement low-latency streaming for an online robot control project and used this article as the base reference for the implementation.

    Since we had absolutely no previous experience with video encoding/decoding or streaming, it took a while to get everything figured out and it would have been great to have a sample implementation available for study…

    Anyways, our basic implementation is more or less complete and it’s cross platform and open source. If you’re interested you can get it here:

    https://github.com/oau/streamer

    Latency using x264 with this scheme is every bit as good as we’d hoped!

Leave a Reply