Zipf's Law

5 results back to index


The Art of Computer Programming: Sorting and Searching by Donald Ervin Knuth

card file, Claude Shannon: information theory, complexity theory, correlation coefficient, Donald Knuth, double entry bookkeeping, Eratosthenes, Fermat's Last Theorem, G4S, information retrieval, iterative process, John von Neumann, linked data, locality of reference, Menlo Park, Norbert Wiener, NP-complete, p-value, Paul Erdős, RAND corporation, refrigerator car, sorting algorithm, Vilfredo Pareto, Yogi Berra, Zipf's Law

In this case we find the optimum arrangement saves about one-third of the search time that would have been obtained if the records had appeared in random order. 400 SEARCHING 6.1 Of course the probability distributions in E) and F) are rather artificial, and they may never be a very good approximation to reality. A more typical sequence of probabilities, called "Zipf's law," has Pi = c/1, p2 = c/2, ..., pN = c/N, where c = 1/HN. (8) This distribution was popularized by G. K. Zipf, who observed that the nth most common word in natural language text seems to occur with a frequency approx- approximately proportional to 1/n. [The Psycho-Biology of Language (Boston, Mass.: Houghton Mifflin, 1935); Human Behavior and the Principle of Least Effort (Reading, Mass.: Addison-Wesley, 1949).] He observed the same phenomenon in census tables, when metropolitan areas are ranked in order of decreasing population. If Zipf's law governs the frequency of the keys in a table, we have immediately CN = N/HN; (9) searching such a file is about \ In N times faster than searching the same file with randomly ordered records.

A3) Here 6 = log .80/log .20 as before, and Hfr' is the iVth harmonic number of order 5, namely l~s + 2~s + • • • + iV~s. Notice that this probability distribution is very similar to that of Zipf's law (8); as 9 varies from 1 to 0, the probabilities 6.1 SEQUENTIAL SEARCHING 401 vary from a uniform distribution to a Zipfian one. Applying C) to A3) yields CN = H?e)/H$-0) = j^ + OiN1-') « 0.122AT A4) as the mean number of comparisons for the 80-20 law (see exercise 8). A study of word frequencies carried out by E. S. Schwartz [see the interesting graph on page 422 of JACM 10 A963)] suggests that distribution A3) with a slightly negative value of 9 gives a better fit to the data than Zipf's law (8). In this case the mean value is substantially smaller than (9) as N —>¦ 00. Distributions like A1) and A3) were first studied by Vilfredo Pareto in connection with disparities of personal income and wealth [Cours d'Economie Politique 2 (Lausanne: Rouge, 1897), 304-312].

In general, the average number of comparisons A7) is always less than twice the optimal value C), since C/v < 1 + 2^2 =1(j — l)pj = 2Cjv — 1. In fact, Cn is always less than ir/2 times the optimal value Cm [Chung, Hajela, and Seymour, J. Comp. Syst. Sci. 36 A988), 148-157]; this ratio is the best possible constant in general, since it is approached when pj is proportional to 1/j2. Let us see how well the self-organizing procedure works when the key prob- probabilities obey Zipf's law (8). We have _ _ 2 \k<i+»~~2 N 1 2N N 2 = 1 2=1 2=1 = \ +c(BN + 1)H2N -2N- 2{N + 1)HN + 2N) = ^+c(iVln4-lniV + O(l)) m2N/]gN, A8) by Eqs. 1.2.7-(8) and 1.2.7-C). This is substantially better than |jV, when N is reasonably large, and it is only about In 4 fh 1.386 times as many comparisons as would be obtained in the optimum arrangement; see (9). Computational experiments involving actual compiler symbol tables indicate that the self-organizing method works even better than our formulas predict, because successive searches are not independent (small groups of keys tend to occur in bunches).


pages: 299 words: 92,782

The Success Equation: Untangling Skill and Luck in Business, Sports, and Investing by Michael J. Mauboussin

Amazon Mechanical Turk, Atul Gawande, Benoit Mandelbrot, Black Swan, Checklist Manifesto, Clayton Christensen, cognitive bias, commoditize, Daniel Kahneman / Amos Tversky, David Brooks, deliberate practice, disruptive innovation, Emanuel Derman, fundamental attribution error, Gini coefficient, hindsight bias, hiring and firing, income inequality, Innovator's Dilemma, Long Term Capital Management, loss aversion, Menlo Park, mental accounting, moral hazard, Network effects, prisoner's dilemma, random walk, Richard Thaler, risk-adjusted returns, shareholder value, Simon Singh, six sigma, Steven Pinker, transaction costs, winner-take-all economy, zero-sum game, Zipf's Law

Matthew Salganik, “Prediction and Surprise,” presentation at the Thought Leader Forum, Legg Mason Capital Management, October 14, 2011. 10. More formally, a power law is expressed in the form: p(x) = Cx−α, where C and α are constants. The exponent, α, is often shown as positive, although it is negative. Since x is raised to the power of α, the distribution is called a power law. The value of the exponent is typically 2 < α < 3. See M. E. J. Newman, “Power Laws, Pareto Distributions, and Zipf's Law,” Contemporary Physics 46, no. 5 (September–October 2005): 323–351; and Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman, “Power-law Distributions in Empirical Data,” SIAM Review 51, no. 4 (2009): 661–703. 11. Matthew 13:12 from the King James version, www.kingjamesbibleonline.org/matthew-13-12. 12. See Robert K. Merton, “The Matthew Effect in Science,” Science 159, no. 3810 (January 5, 1968): 56–63; and Daniel Rigney, The Matthew Effect: How Advantage Begets Further Advantage (New York: Columbia University Press, 2010).

“We Cannot Go On: Disruptive Innovation and the First World War Royal Navy.” Security Studies 19, no. 1 (January 2010): 124–159. Muller, Thor, and Lane Becker. Get Lucky: How to Put Planned Serendipity to Work for You and Your Business. San Francisco, CA: Jossey-Bass, 2012. Myers, David G. Intuition: Its Powers and Perils. New Haven, CT: Yale University Press, 2002. Newman, M. E. J. “Power Laws, Pareto Distributions, and Zipf's Law.” Contemporary Physics 46, no. 5 (September–October 2005): 323–351. Nisbett, Richard, and Lee Ross. Human Inference: Strategies and Shortcomings of Social Judgment. Englewood Cliffs, NJ: Prentice-Hall, 1980. Odean, Terrance. “Are Investors Reluctant to Realize Their Losses?” Journal of Finance 53, no. 5 (October 1998): 1775–1798. Olsen, Robert A. “Professional Investors as Naturalistic Decision Makers: Evidence and Market Implications.”


pages: 339 words: 112,979

Unweaving the Rainbow by Richard Dawkins

Any sufficiently advanced technology is indistinguishable from magic, Arthur Eddington, complexity theory, correlation coefficient, David Attenborough, discovery of DNA, double helix, Douglas Engelbart, Douglas Engelbart, I think there is a world market for maybe five computers, Isaac Newton, Jaron Lanier, Mahatma Gandhi, music of the spheres, Necker cube, p-value, phenotype, Ralph Waldo Emerson, Richard Feynman, Ronald Reagan, Solar eclipse in 1919, Steven Pinker, Zipf's Law

Higher again, only movement is news. Then only changes in rate or direction of movement. In Barlow's terms derived from the theory of codes, we could say that the nervous system uses short, economical words for messages that occur frequently and are expected; long, expensive words for messages that occur rarely and are not expected. It is a bit like language, in which (the generalization is called Zipf's Law) the shortest words in the dictionary are the ones most often used in speech. To push the idea to an extreme, most of the time the brain does not need to be told anything because what is going on is the norm. The message would be redundant. The brain is protected from redundancy by a hierarchy of filters, each filter tuned to remove expected features of a certain kind. It follows that the set of nervous filters constitutes a kind of summary description of the norm, of the statistical properties of the world in which the animal lives.


pages: 349 words: 114,038

Culture & Empire: Digital Revolution by Pieter Hintjens

4chan, airport security, AltaVista, anti-communist, anti-pattern, barriers to entry, Bill Duvall, bitcoin, blockchain, business climate, business intelligence, business process, Chelsea Manning, clean water, commoditize, congestion charging, Corn Laws, correlation does not imply causation, cryptocurrency, Debian, Edward Snowden, failed state, financial independence, Firefox, full text search, German hyperinflation, global village, GnuPG, Google Chrome, greed is good, Hernando de Soto, hiring and firing, informal economy, intangible asset, invisible hand, James Watt: steam engine, Jeff Rulifson, Julian Assange, Kickstarter, M-Pesa, mass immigration, mass incarceration, mega-rich, MITM: man-in-the-middle, mutually assured destruction, Naomi Klein, national security letter, Nelson Mandela, new economy, New Urbanism, Occupy movement, offshore financial centre, packet switching, patent troll, peak oil, pre–internet, private military company, race to the bottom, rent-seeking, reserve currency, RFC: Request For Comment, Richard Feynman, Richard Stallman, Ross Ulbricht, Satoshi Nakamoto, security theater, selection bias, Skype, slashdot, software patent, spectrum auction, Steve Crocker, Steve Jobs, Steven Pinker, Stuxnet, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, trade route, transaction costs, twin studies, union organizing, wealth creators, web application, WikiLeaks, Y2K, zero day, Zipf's Law

This is extraordinary, given that no money is actually being sent anywhere. It's just electronic messages. The biggest cost is probably the paper form one has to fill in, and the front office that types it in, and takes a copy of your ID "for security purposes." Now let's look at competitors. The largest competitor to Western Union is MoneyGram International, one tenth the size. There is a mathematical "power law" called Zipf's Law that models the distribution in natural systems such as free markets, earthquakes, cities in a country, and words in a language. Yes, all these follow the same rules of distribution. Normally, you'd expect the largest firm to be twice the size of its next competitor, three times the size of the one after, and so on. The data shows that Western Union, too large and too costly, has a monopoly over the money transfer market.


pages: 651 words: 180,162

Antifragile: Things That Gain From Disorder by Nassim Nicholas Taleb

Air France Flight 447, Andrei Shleifer, banking crisis, Benoit Mandelbrot, Berlin Wall, Black Swan, business cycle, Chuck Templeton: OpenTable:, commoditize, creative destruction, credit crunch, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, discrete time, double entry bookkeeping, Emanuel Derman, epigenetics, financial independence, Flash crash, Gary Taubes, George Santayana, Gini coefficient, Henri Poincaré, high net worth, hygiene hypothesis, Ignaz Semmelweis: hand washing, informal economy, invention of the wheel, invisible hand, Isaac Newton, James Hargreaves, Jane Jacobs, joint-stock company, joint-stock limited liability company, Joseph Schumpeter, Kenneth Arrow, knowledge economy, Lao Tzu, Long Term Capital Management, loss aversion, Louis Pasteur, mandelbrot fractal, Marc Andreessen, meta analysis, meta-analysis, microbiome, money market fund, moral hazard, mouse model, Myron Scholes, Norbert Wiener, pattern recognition, Paul Samuelson, placebo effect, Ponzi scheme, principal–agent problem, purchasing power parity, quantitative trading / quantitative finance, Ralph Nader, random walk, Ray Kurzweil, rent control, Republic of Letters, Ronald Reagan, Rory Sutherland, selection bias, Silicon Valley, six sigma, spinning jenny, statistical model, Steve Jobs, Steven Pinker, Stewart Brand, stochastic process, stochastic volatility, Thales and the olive presses, Thales of Miletus, The Great Moderation, the new new thing, The Wealth of Nations by Adam Smith, Thomas Bayes, Thomas Malthus, too big to fail, transaction costs, urban planning, Vilfredo Pareto, Yogi Berra, Zipf's Law

Freidson, Eliot, 1970, Profession of Medicine: A Study of the Sociology of Applied Knowledge. Chicago: University of Chicago Press. French, Roger, 2003, Medicine Before Science: The Rational and Learned Doctor from the Middle Ages to the Enlightenment. Cambridge: Cambridge University Press. Froot, K. A., 2001, “The Market for Catastrophe Risk: A Clinical Examination,” Journal of Financial Economics 60(2–3): 529–571. Fujiwara, Y., 2004, “Zipf Law in Firms Bankruptcy.” Physica A: Statistical and Theoretical Physics 337: 219–30. Fukumoto, S., and T. J. Martin, 2009, “Bone as an Endocrine Organ.” Trends in Endocrinology and Metabolism 20: 230–236. Fuller, Steve, 2005, The Intellectual. Icon Books. García-Ballester, Luis, 1995, “Health and Medical Care in Medieval Galenism.” In Don Bates, ed., Knowledge and the Scholarly Medical Traditions.