Book Discussion: Information Theory

We will discuss the excellent introduction to information theory given by the book An Introduction to Information Theory: Symbols, Signals and Noise by John R. Pierce. The book features a detailed description of the nuts and bolts of the mathematics of information, including a thorough discussion of entropy. There are also chapters on the role of the subject in physics, cybernetics, psychology, and even art and music.

A few of the chapters are a bit challenging, but overall the book is an easy read. Excluding the index, the book is just under 300 pages, so it will take some time to read in its entirety. Our discussion will focus on the first nine chapters.

The first edition (1961) of the book is in the public domain and can be read for free in many formats. I have read Dover's second edition (1980).

Possible Topics for Discussion:

• The nature of scientific theories

• The nature of mathematical models

• The importance of proof in mathematics

• The mathematical model of communication theory

• The nature of encoding information

• Entropy: a measure of the information conveyed from a source to a recipient

• Shannon's fundamental theorem for the noiseless channel (channel capacity)

• Language, meaning, understanding and information theory

• The noisy channel: error detection and correction

• The use of multidimensional space to characterize channel capacity

If there is some aspect of Pierce's book or information theory in general which you would like us to discuss, please post a comment about it below.


  • Martin C.

    I found a book on information theory that I think is pretty good: A Student's Guide to Information Theory (http://www.amazon.com/Students-Coding-Information-Theory-Hardcover/dp/B00I2FY2QI/ref=sr_1_1?s=books&ie=UTF8&qid=1404899086&sr=1-1). I am still reading it, but so far I have really enjoyed it. It is less than 200 pages, but it is fairly dense in math, containing proofs along with examples and exercises. I looked ahead and saw that it contains a proof that the Huffman code achieves the shortest average code length. The math required is high school level.

    July 9

  • Lynn

    Sorry, too tired for this one.

    1 · June 28

  • CJ F.

    Tomorrow we will discuss Pierce's book on Information Theory. Reading the book is optional. You can read this & the previous comments to prepare a little.

    Chapter 9 explains & sketches the proof of an important theorem about a noisy continuous channel. The formula C = W log(1 + P/N), where C is the channel capacity and W is the bandwidth, or the width of a band of frequencies as measured in cycles per second (hertz), "gives the rate at which we can transmit binary digits with negligible error over a continuous channel in which a signal of bandwidth W and power P is mixed with a white Gaussian noise of bandwidth W and power N."

    The quotient P/N can be regarded as the ratio of signal power to noise power. So we can increase C by increasing W or P. Why does use of more frequencies increase communications performance? With infinite frequencies, can we get infinite capacity for communications? Did you grasp the proof that most of the volume of a high dimensional sphere is near its surface? Wow!
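
    A minimal Python sketch of the formula (the logarithm is base 2 when capacity is measured in bits per second; the bandwidth and power figures below are invented for illustration, not taken from the book):

    ```python
    import math

    def channel_capacity(bandwidth_hz, signal_power, noise_power):
        """Shannon-Hartley capacity in bits per second: C = W * log2(1 + P/N)."""
        return bandwidth_hz * math.log2(1 + signal_power / noise_power)

    # A 3 kHz channel with a signal-to-noise ratio of 1000 (30 dB): ~29,900 bits/s.
    print(channel_capacity(3000, 1000, 1))

    # If the noise is white, its power grows with bandwidth (N = N0 * W), and the
    # capacity approaches (P / N0) * log2(e) as W grows; it does not become infinite.
    for W in (1e3, 1e6, 1e9):
        print(W, channel_capacity(W, 1000, 1e-3 * W))
    ```

    The second loop speaks to the question above: more bandwidth helps, but when the noise is white the capacity levels off rather than growing without bound.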

    June 27

  • CJ F.

    Tomorrow we will discuss Pierce's book on Information Theory. If you haven't been able to read the book, instead you can read this comment and some of the preceding ones which discuss highlights & questions about the book.

    In Chapter 8 Pierce explains the astonishing (that's Pierce's word not mine!) result known as Shannon's fundamental theorem for a noisy channel. The theorem states (p. 164) "that when the entropy or information rate of a message source is less than this channel capacity, the messages produced by the source can be so encoded that they can be transmitted over the noisy channel with an error less than any specified amount." Wow: we can transmit messages effectively if only our channel is big enough to accommodate the information content of our messages! Really?

    On p. 157 Pierce sketches the proof of this theorem. Did you follow the proof? Does it make sense? Pierce says "One information theorist has characterized this mode of proof as weird." Is it weird? Is it valid?
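
    To make the theorem concrete, here is a small illustration (mine, not a reproduction of Pierce's proof) using the binary symmetric channel: a channel that flips each transmitted bit with probability p has capacity C = 1 - H(p) bits per use, and a naive repetition code shows how redundancy trades rate for reliability, while Shannon promises far better:

    ```python
    import math

    def h2(p):
        """Binary entropy function in bits."""
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def bsc_capacity(p):
        """Capacity of a binary symmetric channel with crossover probability p."""
        return 1 - h2(p)

    p = 0.1
    print(bsc_capacity(p))  # ~0.531 bits per channel use

    # A 3x repetition code with majority voting: the rate drops to 1/3, and the
    # decoding error falls from p = 0.1 to 3*p^2*(1-p) + p^3 ~ 0.028. Shannon's
    # theorem says that at any rate below capacity the error can be made smaller
    # than any specified amount, without the rate collapsing toward zero.
    print(3 * p**2 * (1 - p) + p**3)
    ```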

    June 27

  • CJ F.

    Since reading Pierce's book on "Information Theory" is not required to attend Saturday's discussion, I'm providing notes so those who can't read it by Sat will have some things to think about.

    Chapter 4 covers coding & bits: essential material that I assume most of us know. Chapter 5 on Entropy gets very interesting. Although other sources motivate the subject by reference to Boltzmann's entropy formula in thermodynamics, Pierce wants us to see the information theoretic notion of entropy afresh using the question of the efficiency of encoding a message as a driving example. In this view entropy is the amount of information or the average number of bits (per symbol or per second) "necessary to encode the messages produced by the source".

    Is a pure information theoretic approach to entropy helpful?

    Is the formula for entropy as the negative of the sum of the products of certain probabilities times their logarithms intuitively justifiable? Can you explain it better than the text?
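
    One way to see the formula (a sketch with invented probabilities, not an argument from the book): each symbol of probability p carries a surprisal of -log2(p) bits, and the entropy is just the probability-weighted average of those surprisals.

    ```python
    import math

    def entropy(probs):
        """H = -sum(p * log2(p)): the average surprisal, in bits per symbol."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # A hypothetical four-symbol source. The surprisals -log2(p) are 1, 2, 3, 3 bits.
    probs = [0.5, 0.25, 0.125, 0.125]
    print(entropy(probs))  # 1.75 bits per symbol
    ```

    For this source the code 0, 10, 110, 111 averages exactly 1.75 binary digits per symbol, which is Pierce's point: the entropy is the average number of bits needed to encode the messages produced by the source.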

    June 25

    • Martin C.

      I was hoping the book would have provided an intuitive feel for the formula for product of probabilities times log of probabilities. I have yet to see an intuitive explanation. How did Shannon arrive at that equation? I am wondering if it fell out of a more mathematical analysis, perhaps from an investigation into the desired properties that a measure of information needs to have.

      June 26

    • CJ F.

      Claude E. Shannon's paper "A Mathematical Theory of Communication" is available from Bell Labs: http://cm.bell-labs.c...

      But that site is down right now. I got a copy from http://www.mast.queen...

      The paper looks readable. The book uses mostly the same notation. But I'm not sure I can glean the answer to your question from Shannon's paper. Shannon seems to rely upon a familiarity with Markov Processes which is beyond me. I suspect it is a simple counting argument, but I'm weak in probability theory.

      Can anyone explain why entropy is the negative of the sum of the products of certain probabilities times their logarithms?

      June 26

  • Sam B.

    http://irishredfox.blogspot.com/2014/06/information-theory-philadelphia.html?m=1 I can't make it sadly, but I did do a post on the model chapter. There's a link to another article about how spell checkers work, and it walks the reader through how to make one using python. Have a good time guys.

    1 · June 26

  • CJ F.

    On Saturday we will discuss Pierce's book on Information Theory. I'm summarizing key arguments to help those who haven't had time to study the book and to highlight passages for those who are reading this extraordinary book.

    In addition to a thorough discussion of entropy in Chapter 5, Pierce also covers Huffman coding, which "is the most efficient code for encoding a set of symbols having different probabilities". I was impressed by Figure V-3, which explains Huffman coding by example. Wow! Did you admire the elegance of this encoding? Did you get it (I had to read it twice)?

    Shannon's fundamental theorem of the noiseless channel: if entropy < the channel capacity, then messages from an ergodic source can be transmitted without error. Does this show that entropy captures the information content of messages? Is it fascinating to you to see how probabilities can capture information content by measuring uncertainty? Is this why information & uncertainty are so intimately related?
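
    A small sketch of the idea (a standard Huffman construction in Python with invented probabilities; it follows the spirit of Figure V-3 rather than reproducing it): build the code by repeatedly merging the two least probable entries, then compare the average codeword length with the entropy of the source.

    ```python
    import heapq, math

    def huffman(probs):
        """Build a Huffman code for {symbol: probability}; returns {symbol: codeword}."""
        # Each heap entry is (probability, tie-breaker, {symbol: partial codeword}).
        heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)   # the two least probable groups...
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}
            merged.update({s: "1" + w for s, w in c2.items()})
            heapq.heappush(heap, (p1 + p2, count, merged))  # ...are merged into one
            count += 1
        return heap[0][2]

    probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}   # invented source probabilities
    code = huffman(probs)
    avg_len = sum(probs[s] * len(w) for s, w in code.items())
    entropy = -sum(p * math.log2(p) for p in probs.values())
    print(code)              # codeword lengths 1, 2, 3, 3
    print(avg_len, entropy)  # 1.9 binary digits per symbol vs. entropy ~1.85 bits
    ```

    The average code length can never fall below the entropy, and the Huffman construction gets as close as any symbol-by-symbol code can.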

    June 26

    • CJ F.

      There is one last little tidbit in Chapter 5 that I found fascinating. Evidently, Shannon estimated that "in writing grammatical English we have on the average a choice of about one bit per letter". This is "proved" by the nice graph Figure V-4 on page 102. One bit per letter? Does that feel right to you?
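
      For comparison, a quick sketch (my own illustration, not Shannon's experiment): estimating entropy from single-letter frequencies alone, with no context, gives a figure well above one bit per letter; Shannon's one-bit estimate comes from letting longer and longer contexts constrain the next letter.

      ```python
      import math
      from collections import Counter

      def letter_entropy(text):
          """First-order estimate: entropy of single-letter frequencies, in bits per letter."""
          letters = [c for c in text.lower() if c.isalpha()]
          counts = Counter(letters)
          n = len(letters)
          return -sum((c / n) * math.log2(c / n) for c in counts.values())

      # Letter frequencies alone give roughly 4 to 4.5 bits per letter, depending on the sample.
      sample = "the quick brown fox jumps over the lazy dog " * 50
      print(letter_entropy(sample))
      ```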

      June 26

  • Roger T.

    One comment: Information theory says absolutely nothing about semantics and meaning although some social theorists seem to imply that it does. It is not a General Semantics.

    June 24

    • Brian

      "says absolutely nothing about semantics and meaning" What exactly do you mean? I agree that it can easily be misapplied in an overly simplistic manner and taken to determine more than it necessarily does, but aren't there many ways in which information theory, with sufficient tweaking, could be relevant to semantics? Or are you making the distinction between semantic content and the form of its communication (which on some levels is a tenuous distinction, especially given the evidence for enactivist accounts of a large part of neurological meaning)?

      June 24

    • Martin C.

      Chapter 6 is a real muddle, and it also shows, by what is not covered, how much has happened since the book was written. One thing that surprised me is that there is no mention of computer programming languages. True, you can't use them to write poetry, but they have a limited but highly useful grammatical structure. Machine learning is an area that uses an approach not considered by Pierce: using training sessions to "teach" the program by telling it when it has made a correct pattern match. The use of brute force in searching massive databases is another interesting area. IBM's Watson was able to use this to give the appearance of understanding English sentences.

      June 24

  • CJ F.

    The mathematical model of information theory centers on the stochastic (random) process of a message source. Pierce uses the brilliant example of randomly produced English text as a guiding example. The simplified but "reasonably realistic" message source of information theory is technically called "ergodic". An ergodic source is "stationary" (so any statistic is independent of the distance from the start of its messages) and "every possible ensemble average (of letters, digrams, trigrams, etc.) is equal to the corresponding time average".

    Unfortunately Pierce does not explain why the regularities of ergodic sources simplify the development of information theory. Why can't we allow statistics to vary depending on distance from the start? Why can't ensemble averages vary from time averages (how are they different)? Is the difference all that great? Is it reasonable to admit these simplifications or do ergodic sources vary too much from real sources of messages & communications?
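
    Pierce's random-text examples are easy to play with. Here is a minimal sketch (my own toy, not code from the book) of a letter-pair source: each character is drawn according to the statistics of what follows the previous character in a sample text, which is the kind of stationary, statistics-driven source the chapter has in mind.

    ```python
    import random
    from collections import defaultdict

    def build_digram_model(text):
        """For each character, collect the characters that follow it in the sample."""
        model = defaultdict(list)
        for a, b in zip(text, text[1:]):
            model[a].append(b)
        return model

    def generate(model, start, length):
        """Emit 'random text': each character drawn from what follows the previous one."""
        out = [start]
        for _ in range(length):
            followers = model.get(out[-1])
            if not followers:
                break
            out.append(random.choice(followers))
        return "".join(out)

    # A toy sample; a real experiment would use a large body of English text.
    sample = "communication theory deals with messages chosen from a set of possible messages"
    model = build_digram_model(sample)
    print(generate(model, "c", 80))
    ```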

    June 24

    • Brian

      Yeah---and, more generally, Pierce seems to really want to stick to simplicity. Can't information theory be extended to more complicated systems? How do advances in computing---and, eventually, quantum computing---factor in? And aren't there many cases where complex systems can be functionally or pragmatically analyzed in terms of simple causal changes ("the difference that makes a difference", or "all else being equal"... which we essentially see in some of Pierce's "proofs")? But this might benefit from bringing in recent critics like Ostrom and advocates of "thinking small" like the authors of Freakonomics.....

      June 24

  • Martin C.

    CJ, I did not find the discussion about science to be particularly enlightening. If we want to talk about the philosophy of science, we would be better off reading Kuhn or Popper. I would much prefer to talk about information theory. I wish the book provided more in the way of proofs, but it provides a good introduction.

    1 · June 23

    • CJ F.

      I liked how Pierce placed mathematics in the purview of science. So there is an aspect of mathematics that leans upon science. But the book doesn't go into much depth. I think you are right: we should focus on information theory.

      June 23

  • CJ F.

    On Saturday we will discuss John R. Pierce's book "An Introduction to Information Theory". He starts Chapter 3: "A mathematical theory which seeks to explain and to predict the events in the world about us always deals with a simplified model of the world, a mathematical model in which only things pertinent to the behavior under consideration enter. ... The great beauty and power of a mathematical theory or model lies in the separation of the relevant from the irrelevant so that certain observable behavior can be related and understood without the need of comprehending the whole nature and behavior of the universe." (pp. 45-6).

    Pierce emphasizes that models are simplified and often "wrong" (he observes on p. 46 that no truly rigid body exists, yet a model built of rigid bodies can predict the orbits of planets). I love the way Pierce blends the incisiveness of mathematical models with their limitations & inadequacies.

    Did you like the exposition on mathematical models? Did he capture the essence?

    June 23

  • CJ F.

    Chapter 1 in John R. Pierce's book "An Introduction to Information Theory: Symbols, Signals and Noise" discusses the nature of scientific and mathematical theories. I found the treatment wonderful, but do others want to discuss these issues on Sat or focus on information theory?

    I particularly appreciated Pierce's treatment of scientific theories. "A valid scientific theory seldom if ever offers the solution to the pressing problems which we repeatedly state. It seldom supplies a sensible answer to our multitudinous questions." But "It tells us in a fresh and new way what aspects of our experience can profitably be related and simply understood" (p. 4). Wonderful: the limitations and nature of scientific theories!

    June 22

    • CJ F.

      Pierce also explains the role of mathematics: "Mathematics is a way of finding out, step by step, facts which are inherent in the statement of the problem but which are not immediately obvious" (p. 17). He emphasizes the critical role of theorems in mathematics: "a statement which must be proved, that is, which must be shown to be the necessary consequence of a set of initial assumptions" (p. 18). He gives a proof that a 1-1 map from a square to a line is discontinuous. Were any of his proofs in chapter 1 "good"? What did you think of them?

      June 22

  • Martin C.

    I just came across this, which I hope you will all appreciate:
    http://xkcd.com/

    For an explanation see:
    http://www.explainxkcd.com/wiki/index.php/Main_Page

    1 · June 13

    • Martin C.

      The comic gets updated regularly, so the links will not apply in a few days. Here is a link to the archive: http://xkcd.com/1381/ Explanation: http://www.explainxkc...

      June 14

    • Lynn

      I'm not, generally speaking, a fan of geeky web comics. They tend to be, well, unfunny. But xkcd is insightful and a bit humorous some of the time. That was a pretty neat reference to Fermat. Here's one related to information theory that packs a bigger practical punch:

      https://xkcd.com/936/

      ^ Now that's something you can use!

      June 14

  • Lynn

    Quite a lot has changed since "cybernetics" was a more current term than it is now. Today, "control theory" seems to be a more widely used term for most of what Wiener originally wrote about, and other disciplines rolled into "cybernetics" have maintained their autonomy. Anyhow, here is a demonstration of a modern descendant of the automatic guns Wiener considered in the 40s, the Phalanx Close-In Weapon System ... that sort of thing has come quite a long way:

    https://www.youtube.com/watch?v=Zdp9llrBLnA

    Please pardon the awful generic butt rock background music.

    June 13

  • Lynn

    I couldn't help but try to put to rest the old myth about the number of Inuit words for snow, as it appears in this title:

    http://en.wikipedia.org/wiki/Inuit_languages#Words_for_snow

    Inuit is polysynthetic, so from our perspective, it has scads of words for *everything*. The basic "atoms" of Inuit that correspond to snow are, from what I see here, only two in number.

    June 12

    • Lynn

      I wonder if the source for the figure of two given there is based on a "standard" variation, somewhat analogous to MSA.

      June 12

    • Sam B.

      As a kid who grew up in the snow, I know maybe 10 words that describe types of snow (like graupel) or how much it's snowing (like flurry). How many words do you know for rain?

      June 12

  • Martin C.

    Here are some possible additional topics for discussion.

    Can we view knowledge of scientific laws as a type of information and, if so, how would such a view conform to the view of informational content as being measured by unlikelihood? Might we instead need to talk about different types of information?

    To what degree has the Computer Revolution caused us to view the world in informational terms rather than the mechanistic view ushered in by the Industrial Revolution?

    June 8

    • Brian

      Lynn---let's take a hypothetical extreme case. Suppose we encounter a new phenomenon about which we know nothing. We try to understand it using a measuring device. If we truly know nothing except for the results the measuring device is capable of recording, wouldn't all results, relative to our knowledge, be equiprobable? But in practical terms I agree that the Bayesian assumption of equiprobability usually doesn't hold. And then, to get back on topic, there's the question of how subjective/Bayesian and objective/frequentist probability relate to information and scientific laws....

      June 10

    • Lynn

      To start with:

      >But in practical terms I agree that the Bayesian assumption of equiprobability usually doesn't hold.

      That is not entailed by Bayes. All Bayesian reasoning strictly requires is finding a posterior probability from a prior and evidence. And while noninformative priors are possible, they are not necessary.
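
      A tiny numerical illustration of that point (numbers invented): the posterior comes from the prior and the evidence, and the prior need not be uniform.

      ```python
      def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
          """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
          p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
          return p_evidence_given_h * prior / p_evidence

      # An informative (non-uniform) prior of 0.2, with evidence three times as
      # likely under the hypothesis as under its negation.
      print(posterior(0.2, 0.9, 0.3))  # ~0.43
      ```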

      June 10

  • Jeannie M.

    The book is better than you might think at first, quite accessible. My brother, who graduated in information theory and mathematics, gave it to me, and I didn't think I'd even read it .....

    May 29

  • Michael R.

    The information theory videos I noted at this morning's meeting:

    https://www.youtube.com/channel/UCcAtD_VYwcYwVbTdvArsm7w

    1 · April 26, 2014

4 went
