Discussion
This paper is information theory’s fundamental contribution to NLP and computational linguistics. In it, Shannon gives the first statistical analysis of language of its kind, deriving upper and lower bounds on the entropy of printed English.
Here is a parallel, more fun effort to look at the entropy and information-theoretic content of songs over time; it shows that songs are in fact getting more predictable over time! https://pudding.cool/2017/05/song-repetition/
Since Shannon's foundational paper, many researchers have calculated bounds for printed English. One estimate, by a team at IBM Watson in 1992, found an upper bound of 1.75 bits per character: http://www.aclweb.org/anthology/J92-1002
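As a rough illustration of what "entropy of printed English" means, here is a minimal Python sketch of a zeroth-order (single-character frequency) estimate. This ignores all context, so it gives a much looser figure than Shannon's n-gram and human-prediction methods; the sample string is made up for illustration.

```python
import math
from collections import Counter

def char_entropy(text):
    """Zeroth-order entropy estimate: bits per character computed from
    single-character frequencies, ignoring all context."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Illustrative only; real estimates need large corpora.
sample = "the quick brown fox jumps over the lazy dog"
print(f"{char_entropy(sample):.2f} bits/char")
```

Estimates like Shannon's (and the IBM bound above) come out far lower than a zeroth-order figure precisely because context makes the next character predictable.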
Zipf's law was first introduced by George Kingsley Zipf. It states that, given a large sample of words, the frequency of any word is inversely proportional to its rank in the frequency table.
This plot shows rank versus frequency for the first 10 million words in the Wikipedias of 30 different languages, on a log-log scale.
![Imgur](https://i.imgur.com/YHXEaXj.png)
Source: https://en.wikipedia.org/wiki/Zipf%27s_law
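The inverse-rank relationship can be sketched numerically. In this minimal Python sketch (the vocabulary size and the exponent of exactly 1 are simplifying assumptions; real corpora fit slightly different exponents), the predicted relative frequency of the rank-r word is 1/r, normalized by the harmonic number of the vocabulary:

```python
def zipf_freq(rank, vocab_size):
    """Predicted relative frequency of the rank-r word under Zipf's law
    with exponent 1: f(r) = (1/r) / H_V, where H_V is the V-th harmonic number."""
    harmonic = sum(1.0 / k for k in range(1, vocab_size + 1))
    return (1.0 / rank) / harmonic

# The top-ranked word is predicted to appear twice as often as the rank-2
# word, three times as often as the rank-3 word, and so on.
ratios = [zipf_freq(1, 30000) / zipf_freq(r, 30000) for r in (2, 3, 10)]
print([round(x, 6) for x in ratios])  # [2.0, 3.0, 10.0]
```

On a log-log plot, this 1/r relationship appears as a straight line of slope -1, which is why the Wikipedia curves above look roughly linear.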
Mary E. Shannon (Betty) met Claude at Bell Labs and married him in 1949. She started her career there as a “computer” (doing the mathematical calculations required by the engineers) and frequently collaborated with Claude on research, inventions, and writing.
This paper is by Claude Shannon, the father of information theory and one of the most creative and important scientists of the 20th century. Information theory transformed our thinking about the quantification, communication, and storage of information, and Claude Shannon’s “A Mathematical Theory of Communication” lays the foundation for the field. Claude Elwood Shannon (April 30, 1916 – February 24, 2001) was a master inventor and tinkerer, and had a storied career. Among his accomplishments: he proved that electrical circuits could implement any Boolean algebra expression (fundamental to electronic digital computers); he made major advances in cryptanalysis as a codebreaker during World War II; and he worked at the storied Bell Labs, inventing the field of information theory (which subsequently allowed the digital revolution to occur).
Some interesting facts about Shannon: he invented the first wearable computer (a device to improve the odds in playing roulette); he loved building robots and early AI, and famously built a maze-solving mechanical mouse while at Bell Labs; when he moved to Massachusetts to be a professor at MIT, his house was famously called the Entropy House; and he became a multi-millionaire from investments in technology companies.
![Imgur](https://i.imgur.com/pQ5vH5E.png)
More about Shannon here: https://en.wikipedia.org/wiki/Claude_Shannon
Shannon mouse: https://www.youtube.com/watch?v=vPKkXibQXGA