This little article failed to say the really surprising thing about channel coding, which is that good channel codes exist at all. There's a simple argument that seems to indicate that they should not exist. If I remember correctly, this argument was commonly accepted by engineers at the time Shannon's paper came out.
The argument is: you have a communications channel that flips bits with some probability, say 1/4. Suppose you want to transmit one bit. The first bit might have an error, so you add a parity bit -- now P(error) is 1/16. But that parity bit might have an error, so you add a second parity bit, and you're down to P(error) of 1/64. So you are driving P(error) to zero, but at the expense of ever-increasing overhead.
If you keep on adding parity bits, it's not clear that you can ever make the error probability arbitrarily small, while still transmitting some information.
That is, in the simple approach above, P(error) -> 0, but Overhead -> Infinity.
It is "intuitively obvious" that's a fundamental problem, or at least it was, until Shannon's results showed that intuition was wrong, or at least needed correction. In fact, the amount of overhead-per-bit is not infinite, but is a finite number related to channel capacity.
> This little article failed to say the really surprising thing about channel coding, which is that good channel codes exist at all. There's a simple argument that seems to indicate that they should not exist. If I remember correctly, this argument was commonly accepted by engineers at the time Shannon's paper came out.
I did not know that, thanks. (Now I understand better why Hamming, in 'You and Your Research', calls Shannon a 'man of infinite courage'.)
Maybe it's because of the way you put it, but I can't help but see an algorithm that exponentially increases the reliability of the message by linearly adding bits. 1 bit for an error rate of 1/4, 2 bits for 1/16, 3 bits for 1/64. Am I wrong, or was this really lost on everyone back in the day?
But then, to decrease error probability arbitrarily close to zero, you need arbitrarily many parity bits. That is undesirable.
Shannon showed you can decrease it arbitrarily close to zero using a finite number of parity bits. Surprised?
Not content with that, Shannon showed how to compute the ratio of overhead you needed. He also showed that, if you used slightly less overhead than his formula, no matter how clever your code was, your error probability would be rather large.
Then, the reader felt humble and rather amazed.
About fifty years later, people started to figure out how to make codes (in general situations) that got very close to the best-possible ratio Shannon found.
It took me a minute to digest this, but I see where the problem was. It's not exactly a finite number of parity bits, but a finite percentage of parity bits. The first step is to calculate the parity:data ratio needed to just barely overwhelm the effect of noise. Then to increase reliability you combine messages, costing no additional bits. Combining messages dilutes spikes in noise so that the parity is not overwhelmed, to an arbitrary certainty.
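As a rough sketch of what that ratio looks like for the channel in the example above (flip probability 1/4): the capacity of a binary symmetric channel is C = 1 - H(p), where H is the binary entropy function. This is my own illustration of the point, not something from the thread:

```python
from math import log2

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p:
    C = 1 - H(p), in data bits per transmitted bit."""
    h = -p * log2(p) - (1 - p) * log2(1 - p)  # binary entropy H(p)
    return 1 - h

p = 0.25
c = bsc_capacity(p)
print(f"capacity       ~ {c:.4f} data bits per channel bit")
print(f"overhead ratio ~ {(1 - c) / c:.2f} parity bits per data bit")
# Roughly 0.19 data bits per channel bit, i.e. about 4.3 parity bits per data
# bit suffice (with good enough codes) no matter how small you want P(error).
```

That finite ratio is the "finite percentage of parity bits"; only the block length, not the ratio, has to grow as you push P(error) down.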
This also goes to show that the direction you approach a problem from makes a big difference.
It's easy to get the impression that you can't make efficient very-low-error codes by looking at the effect/cost of parity bits.
But it's also easy to realize that fusing messages can provide statistical certainty that noise levels don't spike above redundancy levels.
Also, on a completely different point:
Adding standard parity technically consumes arbitrarily many bits, but log(n) grows very slowly. This has to be weighed against the fact that Shannon pulled a bit of a hat trick: to make an efficient code safer you have to increase the block size, and therefore the latency. You can't get these improvements for free.
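As a quick illustration of how slowly that overhead grows (my own example using Hamming codes, which may not be exactly what "standard parity" means above): r parity bits give single-error correction over a block of 2^r - 1 bits, so the parity fraction shrinks as the block, and hence the latency, grows.

```python
# Hamming codes: r parity bits protect a block of n = 2**r - 1 total bits
# (single-error correction), so the parity fraction falls roughly like log(n)/n.
for r in range(3, 11):
    n = 2**r - 1   # total block length
    k = n - r      # data bits per block
    print(f"block n={n:4d}  data k={k:4d}  parity fraction={r/n:.4f}")
```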
Since we're talking about the Shannon limit, why does a 56 Kbps modem not violate it?
The common wisdom in the early '90s was that the fastest possible modem for a dial-up connection would be ~35 Kbps. I even remember reading this "fact" in a communications theory book from that era. Analog modems could never be faster. (It's assumed that US and Canadian analog telephones have a 3 kHz channel.)
Then why does a 56 Kbps modem not violate the Shannon limit for data transmission on a 3 kHz channel? How is it possible to break a fundamental principle of information theory?
This question has bugged me a lot. Based on some research on the web, I tried to put together an intuitive answer to what happened. Here's my answer, but I welcome insight or corrections from people knowledgeable in telephony or information theory.
At the time the book was written--when people believed that dial-up modems could never be faster than 35 Kbps--the telephone network was perceived as being analog end to end (though lots of it was already becoming digital).
A critical assumption in calculating the Shannon limit was the signal-to-noise ratio of the network, which was taken to be 35 dB, a figure based on an all-analog network. (A higher dB number means a cleaner channel and better transmission quality.) The Shannon limit would be about 35 Kbps given a 3 kHz bandwidth and that 35 dB signal-to-noise ratio: bps = BW log2(1 + S/N) = 3000 log2(1 + 10^(35/10)) ≈ 34881 bps.
However, by the mid-'90s, most of the telco network had become digital. Only the customer loop--the connection between the user and the telephone company--remained analog. So now, with respect to the Shannon limit, only the noise on the customer loop matters, since the rest is digital.
It turns out that a much better S/N ratio is achievable when only the customer loop is analog. The Shannon limit would be based on the same 3 kHz bandwidth as before but a much higher signal-to-noise ratio of 98 dB for the customer loop, making the limit ~97 Kbps: bps = BW log2(1 + S/N) = 3000 log2(1 + 10^(98/10)) ≈ 97664 bps.
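For what it's worth, here is a small sketch of those two calculations using the Shannon-Hartley formula; the numbers and assumptions (3 kHz bandwidth, 35 dB and 98 dB S/N) are the ones from the explanation above.

```python
from math import log2

def shannon_limit(bandwidth_hz, snr_db):
    """Shannon-Hartley capacity C = B * log2(1 + S/N), with S/N given in dB."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * log2(1 + snr_linear)

for snr_db in (35, 98):
    print(f"S/N = {snr_db} dB  ->  limit ~ {shannon_limit(3000, snr_db):,.0f} bps")
# ~34.9 Kbps for the all-analog assumption, ~97.7 Kbps when only the
# customer loop's noise counts.
```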
The Shannon limit still exists, but other factors related to the network would limit the data rate before this bigger Shannon limit takes effect.
So the key thing that the experts and the communications theory book got wrong was to neglect the hidden assumption about S/N ratio. The bandwidth matters, but the S/N ratio matters too.
Those modems would negotiate as well. 56K was the best they would do. I don't think I ever saw one actually connect at that rate; I would get 46-48 Kbps. So perhaps my real bandwidth was a bit less than 3 kHz, or maybe my line was a bit noisier.
When they used to say that 35 Kbps was the maximum for an analog modem, it was purportedly due to a fundamental principle of information theory.
When they used to say that ADSL had a maximum of 8 Mbps, it might be just because the spec for the first version of ADSL was written in that way. It's probably not due to any limits of physics or math.
The codes which the article hints at near the end are LDPC[1] codes, which are being rapidly deployed in nearly anything requiring storage or transmission of information. I recall speaking to a grad student working on integrating them into flash memory to reduce its cost.
In addition to being discovered twice, by Gallager and by Glavieux & Berrou, LDPC codes have deep connections to the belief propagation algorithm, which was discovered in machine learning, and to algorithms for spin glasses which were discovered in physics.
There's a great temptation here to say that this type of algorithm sits at a sweet spot for understanding communication, learning, and some physical systems. The behavior of belief propagation at phase transitions is quite fascinating, and it really feels as though there is a great discovery to be made there.
I have the utmost respect for coding researchers; every time I read their papers I get new ideas. If you're a programmer interested in writing fast code or coming up with new algorithms, I'd recommend occasionally slipping a coding paper into your reading list.
Some of the best-known codes that approach the Shannon limit are the Gallager codes, or low-density parity-check (LDPC) codes [1], invented by Robert G. Gallager at MIT in 1960.
The article goes on to mention Gallager, but doesn't provide many references. It's pretty interesting work so check it out!