# Zipf's Law

5 results back to index

The Art of Computer Programming: Sorting and Searching by Donald Ervin Knuth

In this case we find the optimum arrangement saves about one-third of the search time that would have been obtained if the records had appeared in random order. 400 SEARCHING 6.1 Of course the probability distributions in E) and F) are rather artificial, and they may never be a very good approximation to reality. A more typical sequence of probabilities, called "Zipf's law," has Pi = c/1, p2 = c/2, ..., pN = c/N, where c = 1/HN. (8) This distribution was popularized by G. K. Zipf, who observed that the nth most common word in natural language text seems to occur with a frequency approx- approximately proportional to 1/n. [The Psycho-Biology of Language (Boston, Mass.: Houghton Mifflin, 1935); Human Behavior and the Principle of Least Effort (Reading, Mass.: Addison-Wesley, 1949).] He observed the same phenomenon in census tables, when metropolitan areas are ranked in order of decreasing population. If Zipf's law governs the frequency of the keys in a table, we have immediately CN = N/HN; (9) searching such a file is about \ In N times faster than searching the same file with randomly ordered records.

A3) Here 6 = log .80/log .20 as before, and Hfr' is the iVth harmonic number of order 5, namely l~s + 2~s + • • • + iV~s. Notice that this probability distribution is very similar to that of Zipf's law (8); as 9 varies from 1 to 0, the probabilities 6.1 SEQUENTIAL SEARCHING 401 vary from a uniform distribution to a Zipfian one. Applying C) to A3) yields CN = H?e)/H\$-0) = j^ + OiN1-') « 0.122AT A4) as the mean number of comparisons for the 80-20 law (see exercise 8). A study of word frequencies carried out by E. S. Schwartz [see the interesting graph on page 422 of JACM 10 A963)] suggests that distribution A3) with a slightly negative value of 9 gives a better fit to the data than Zipf's law (8). In this case the mean value is substantially smaller than (9) as N —>¦ 00. Distributions like A1) and A3) were first studied by Vilfredo Pareto in connection with disparities of personal income and wealth [Cours d'Economie Politique 2 (Lausanne: Rouge, 1897), 304-312].

In general, the average number of comparisons A7) is always less than twice the optimal value C), since C/v < 1 + 2^2 =1(j — l)pj = 2Cjv — 1. In fact, Cn is always less than ir/2 times the optimal value Cm [Chung, Hajela, and Seymour, J. Comp. Syst. Sci. 36 A988), 148-157]; this ratio is the best possible constant in general, since it is approached when pj is proportional to 1/j2. Let us see how well the self-organizing procedure works when the key prob- probabilities obey Zipf's law (8). We have _ _ 2 \k<i+»~~2 N 1 2N N 2 = 1 2=1 2=1 = \ +c(BN + 1)H2N -2N- 2{N + 1)HN + 2N) = ^+c(iVln4-lniV + O(l)) m2N/]gN, A8) by Eqs. 1.2.7-(8) and 1.2.7-C). This is substantially better than |jV, when N is reasonably large, and it is only about In 4 fh 1.386 times as many comparisons as would be obtained in the optimum arrangement; see (9). Computational experiments involving actual compiler symbol tables indicate that the self-organizing method works even better than our formulas predict, because successive searches are not independent (small groups of keys tend to occur in bunches).

pages: 299 words: 92,782

The Success Equation: Untangling Skill and Luck in Business, Sports, and Investing by Michael J. Mauboussin

Matthew Salganik, “Prediction and Surprise,” presentation at the Thought Leader Forum, Legg Mason Capital Management, October 14, 2011. 10. More formally, a power law is expressed in the form: p(x) = Cx−α, where C and α are constants. The exponent, α, is often shown as positive, although it is negative. Since x is raised to the power of α, the distribution is called a power law. The value of the exponent is typically 2 < α < 3. See M. E. J. Newman, “Power Laws, Pareto Distributions, and Zipf's Law,” Contemporary Physics 46, no. 5 (September–October 2005): 323–351; and Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman, “Power-law Distributions in Empirical Data,” SIAM Review 51, no. 4 (2009): 661–703. 11. Matthew 13:12 from the King James version, www.kingjamesbibleonline.org/matthew-13-12. 12. See Robert K. Merton, “The Matthew Effect in Science,” Science 159, no. 3810 (January 5, 1968): 56–63; and Daniel Rigney, The Matthew Effect: How Advantage Begets Further Advantage (New York: Columbia University Press, 2010).

“We Cannot Go On: Disruptive Innovation and the First World War Royal Navy.” Security Studies 19, no. 1 (January 2010): 124–159. Muller, Thor, and Lane Becker. Get Lucky: How to Put Planned Serendipity to Work for You and Your Business. San Francisco, CA: Jossey-Bass, 2012. Myers, David G. Intuition: Its Powers and Perils. New Haven, CT: Yale University Press, 2002. Newman, M. E. J. “Power Laws, Pareto Distributions, and Zipf's Law.” Contemporary Physics 46, no. 5 (September–October 2005): 323–351. Nisbett, Richard, and Lee Ross. Human Inference: Strategies and Shortcomings of Social Judgment. Englewood Cliffs, NJ: Prentice-Hall, 1980. Odean, Terrance. “Are Investors Reluctant to Realize Their Losses?” Journal of Finance 53, no. 5 (October 1998): 1775–1798. Olsen, Robert A. “Professional Investors as Naturalistic Decision Makers: Evidence and Market Implications.”

pages: 339 words: 112,979

Unweaving the Rainbow by Richard Dawkins

Higher again, only movement is news. Then only changes in rate or direction of movement. In Barlow's terms derived from the theory of codes, we could say that the nervous system uses short, economical words for messages that occur frequently and are expected; long, expensive words for messages that occur rarely and are not expected. It is a bit like language, in which (the generalization is called Zipf's Law) the shortest words in the dictionary are the ones most often used in speech. To push the idea to an extreme, most of the time the brain does not need to be told anything because what is going on is the norm. The message would be redundant. The brain is protected from redundancy by a hierarchy of filters, each filter tuned to remove expected features of a certain kind. It follows that the set of nervous filters constitutes a kind of summary description of the norm, of the statistical properties of the world in which the animal lives.

pages: 349 words: 114,038

Culture & Empire: Digital Revolution by Pieter Hintjens

This is extraordinary, given that no money is actually being sent anywhere. It's just electronic messages. The biggest cost is probably the paper form one has to fill in, and the front office that types it in, and takes a copy of your ID "for security purposes." Now let's look at competitors. The largest competitor to Western Union is MoneyGram International, one tenth the size. There is a mathematical "power law" called Zipf's Law that models the distribution in natural systems such as free markets, earthquakes, cities in a country, and words in a language. Yes, all these follow the same rules of distribution. Normally, you'd expect the largest firm to be twice the size of its next competitor, three times the size of the one after, and so on. The data shows that Western Union, too large and too costly, has a monopoly over the money transfer market.

pages: 651 words: 180,162

Antifragile: Things That Gain From Disorder by Nassim Nicholas Taleb

Freidson, Eliot, 1970, Profession of Medicine: A Study of the Sociology of Applied Knowledge. Chicago: University of Chicago Press. French, Roger, 2003, Medicine Before Science: The Rational and Learned Doctor from the Middle Ages to the Enlightenment. Cambridge: Cambridge University Press. Froot, K. A., 2001, “The Market for Catastrophe Risk: A Clinical Examination,” Journal of Financial Economics 60(2–3): 529–571. Fujiwara, Y., 2004, “Zipf Law in Firms Bankruptcy.” Physica A: Statistical and Theoretical Physics 337: 219–30. Fukumoto, S., and T. J. Martin, 2009, “Bone as an Endocrine Organ.” Trends in Endocrinology and Metabolism 20: 230–236. Fuller, Steve, 2005, The Intellectual. Icon Books. García-Ballester, Luis, 1995, “Health and Medical Care in Medieval Galenism.” In Don Bates, ed., Knowledge and the Scholarly Medical Traditions.