Zipf's Law

5 results back to index


The Art of Computer Programming: Sorting and Searching by Donald Ervin Knuth

card file, Charles Babbage, Claude Shannon: information theory, complexity theory, correlation coefficient, Donald Knuth, double entry bookkeeping, Eratosthenes, Fermat's Last Theorem, G4S, information retrieval, iterative process, John von Neumann, linked data, locality of reference, Menlo Park, Norbert Wiener, NP-complete, p-value, Paul Erdős, RAND corporation, refrigerator car, sorting algorithm, Vilfredo Pareto, Yogi Berra, Zipf's Law

A more typical sequence of probabilities, called "Zipf's law," has Pi = c/1, p2 = c/2, ..., pN = c/N, where c = 1/HN. (8) This distribution was popularized by G. K. Zipf, who observed that the nth most common word in natural language text seems to occur with a frequency approx- approximately proportional to 1/n. [The Psycho-Biology of Language (Boston, Mass.: Houghton Mifflin, 1935); Human Behavior and the Principle of Least Effort (Reading, Mass.: Addison-Wesley, 1949).] He observed the same phenomenon in census tables, when metropolitan areas are ranked in order of decreasing population. If Zipf's law governs the frequency of the keys in a table, we have immediately CN = N/HN; (9) searching such a file is about \ In N times faster than searching the same file with randomly ordered records.

Notice that this probability distribution is very similar to that of Zipf's law (8); as 9 varies from 1 to 0, the probabilities 6.1 SEQUENTIAL SEARCHING 401 vary from a uniform distribution to a Zipfian one. Applying C) to A3) yields CN = H?e)/H$-0) = j^ + OiN1-') « 0.122AT A4) as the mean number of comparisons for the 80-20 law (see exercise 8). A study of word frequencies carried out by E. S. Schwartz [see the interesting graph on page 422 of JACM 10 A963)] suggests that distribution A3) with a slightly negative value of 9 gives a better fit to the data than Zipf's law (8). In this case the mean value is substantially smaller than (9) as N —>¦ 00.

In fact, Cn is always less than ir/2 times the optimal value Cm [Chung, Hajela, and Seymour, J. Comp. Syst. Sci. 36 A988), 148-157]; this ratio is the best possible constant in general, since it is approached when pj is proportional to 1/j2. Let us see how well the self-organizing procedure works when the key prob- probabilities obey Zipf's law (8). We have _ _ 2 \k<i+»~~2 N 1 2N N 2 = 1 2=1 2=1 = \ +c(BN + 1)H2N -2N- 2{N + 1)HN + 2N) = ^+c(iVln4-lniV + O(l)) m2N/]gN, A8) by Eqs. 1.2.7-(8) and 1.2.7-C). This is substantially better than |jV, when N is reasonably large, and it is only about In 4 fh 1.386 times as many comparisons as would be obtained in the optimum arrangement; see (9).


pages: 299 words: 92,782

The Success Equation: Untangling Skill and Luck in Business, Sports, and Investing by Michael J. Mauboussin

Amazon Mechanical Turk, Atul Gawande, Benoit Mandelbrot, Black Swan, Boeing 747, Checklist Manifesto, Clayton Christensen, cognitive bias, commoditize, Daniel Kahneman / Amos Tversky, David Brooks, deliberate practice, disruptive innovation, Emanuel Derman, fundamental attribution error, Gary Kildall, Gini coefficient, hindsight bias, hiring and firing, income inequality, Innovator's Dilemma, John Bogle, Long Term Capital Management, loss aversion, Menlo Park, mental accounting, moral hazard, Network effects, power law, prisoner's dilemma, random walk, Richard Thaler, risk-adjusted returns, shareholder value, Simon Singh, six sigma, Steven Pinker, transaction costs, winner-take-all economy, zero-sum game, Zipf's Law

More formally, a power law is expressed in the form: p(x) = Cx−α, where C and α are constants. The exponent, α, is often shown as positive, although it is negative. Since x is raised to the power of α, the distribution is called a power law. The value of the exponent is typically 2 < α < 3. See M. E. J. Newman, “Power Laws, Pareto Distributions, and Zipf's Law,” Contemporary Physics 46, no. 5 (September–October 2005): 323–351; and Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman, “Power-law Distributions in Empirical Data,” SIAM Review 51, no. 4 (2009): 661–703. 11. Matthew 13:12 from the King James version, www.kingjamesbibleonline.org/matthew-13-12. 12.

Muller, Thor, and Lane Becker. Get Lucky: How to Put Planned Serendipity to Work for You and Your Business. San Francisco, CA: Jossey-Bass, 2012. Myers, David G. Intuition: Its Powers and Perils. New Haven, CT: Yale University Press, 2002. Newman, M. E. J. “Power Laws, Pareto Distributions, and Zipf's Law.” Contemporary Physics 46, no. 5 (September–October 2005): 323–351. Nisbett, Richard, and Lee Ross. Human Inference: Strategies and Shortcomings of Social Judgment. Englewood Cliffs, NJ: Prentice-Hall, 1980. Odean, Terrance. “Are Investors Reluctant to Realize Their Losses?” Journal of Finance 53, no. 5 (October 1998): 1775–1798.


pages: 339 words: 112,979

Unweaving the Rainbow by Richard Dawkins

Any sufficiently advanced technology is indistinguishable from magic, Arthur Eddington, Boeing 747, complexity theory, correlation coefficient, David Attenborough, discovery of DNA, double helix, Douglas Engelbart, Douglas Engelbart, Eddington experiment, I think there is a world market for maybe five computers, Isaac Newton, Jaron Lanier, Mahatma Gandhi, music of the spheres, Necker cube, p-value, phenotype, Ralph Waldo Emerson, Richard Feynman, Ronald Reagan, Skinner box, Steven Pinker, Stuart Kauffman, world market for maybe five computers, Zipf's Law

In Barlow's terms derived from the theory of codes, we could say that the nervous system uses short, economical words for messages that occur frequently and are expected; long, expensive words for messages that occur rarely and are not expected. It is a bit like language, in which (the generalization is called Zipf's Law) the shortest words in the dictionary are the ones most often used in speech. To push the idea to an extreme, most of the time the brain does not need to be told anything because what is going on is the norm. The message would be redundant. The brain is protected from redundancy by a hierarchy of filters, each filter tuned to remove expected features of a certain kind.


pages: 349 words: 114,038

Culture & Empire: Digital Revolution by Pieter Hintjens

4chan, Aaron Swartz, airport security, AltaVista, anti-communist, anti-pattern, barriers to entry, Bill Duvall, bitcoin, blockchain, Boeing 747, bread and circuses, business climate, business intelligence, business process, Chelsea Manning, clean water, commoditize, congestion charging, Corn Laws, correlation does not imply causation, cryptocurrency, Debian, decentralized internet, disinformation, Edward Snowden, failed state, financial independence, Firefox, full text search, gamification, German hyperinflation, global village, GnuPG, Google Chrome, greed is good, Hernando de Soto, hiring and firing, independent contractor, informal economy, intangible asset, invisible hand, it's over 9,000, James Watt: steam engine, Jeff Rulifson, Julian Assange, Kickstarter, Laura Poitras, M-Pesa, mass immigration, mass incarceration, mega-rich, military-industrial complex, MITM: man-in-the-middle, mutually assured destruction, Naomi Klein, national security letter, Nelson Mandela, new economy, New Urbanism, no silver bullet, Occupy movement, off-the-grid, offshore financial centre, packet switching, patent troll, peak oil, power law, pre–internet, private military company, race to the bottom, real-name policy, rent-seeking, reserve currency, RFC: Request For Comment, Richard Feynman, Richard Stallman, Ross Ulbricht, Russell Brand, Satoshi Nakamoto, security theater, selection bias, Skype, slashdot, software patent, spectrum auction, Steve Crocker, Steve Jobs, Steven Pinker, Stuxnet, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, trade route, transaction costs, twin studies, union organizing, wealth creators, web application, WikiLeaks, Y2K, zero day, Zipf's Law

The biggest cost is probably the paper form one has to fill in, and the front office that types it in, and takes a copy of your ID "for security purposes." Now let's look at competitors. The largest competitor to Western Union is MoneyGram International, one tenth the size. There is a mathematical "power law" called Zipf's Law that models the distribution in natural systems such as free markets, earthquakes, cities in a country, and words in a language. Yes, all these follow the same rules of distribution. Normally, you'd expect the largest firm to be twice the size of its next competitor, three times the size of the one after, and so on.


pages: 651 words: 180,162

Antifragile: Things That Gain From Disorder by Nassim Nicholas Taleb

"World Economic Forum" Davos, Air France Flight 447, Alan Greenspan, Andrei Shleifer, anti-fragile, banking crisis, Benoit Mandelbrot, Berlin Wall, biodiversity loss, Black Swan, business cycle, caloric restriction, caloric restriction, Chuck Templeton: OpenTable:, commoditize, creative destruction, credit crunch, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, discrete time, double entry bookkeeping, Emanuel Derman, epigenetics, fail fast, financial engineering, financial independence, Flash crash, flying shuttle, Gary Taubes, George Santayana, Gini coefficient, Helicobacter pylori, Henri Poincaré, Higgs boson, high net worth, hygiene hypothesis, Ignaz Semmelweis: hand washing, informal economy, invention of the wheel, invisible hand, Isaac Newton, James Hargreaves, Jane Jacobs, Jim Simons, joint-stock company, joint-stock limited liability company, Joseph Schumpeter, Kenneth Arrow, knowledge economy, language acquisition, Lao Tzu, Long Term Capital Management, loss aversion, Louis Pasteur, mandelbrot fractal, Marc Andreessen, Mark Spitznagel, meta-analysis, microbiome, money market fund, moral hazard, mouse model, Myron Scholes, Norbert Wiener, pattern recognition, Paul Samuelson, placebo effect, Ponzi scheme, Post-Keynesian economics, power law, principal–agent problem, purchasing power parity, quantitative trading / quantitative finance, Ralph Nader, random walk, Ray Kurzweil, rent control, Republic of Letters, Ronald Reagan, Rory Sutherland, Rupert Read, selection bias, Silicon Valley, six sigma, spinning jenny, statistical model, Steve Jobs, Steven Pinker, Stewart Brand, stochastic process, stochastic volatility, synthetic biology, tacit knowledge, tail risk, Thales and the olive presses, Thales of Miletus, The Great Moderation, the new new thing, The Wealth of Nations by Adam Smith, Thomas Bayes, Thomas Malthus, too big to fail, transaction costs, urban planning, Vilfredo Pareto, Yogi Berra, Zipf's Law

French, Roger, 2003, Medicine Before Science: The Rational and Learned Doctor from the Middle Ages to the Enlightenment. Cambridge: Cambridge University Press. Froot, K. A., 2001, “The Market for Catastrophe Risk: A Clinical Examination,” Journal of Financial Economics 60(2–3): 529–571. Fujiwara, Y., 2004, “Zipf Law in Firms Bankruptcy.” Physica A: Statistical and Theoretical Physics 337: 219–30. Fukumoto, S., and T. J. Martin, 2009, “Bone as an Endocrine Organ.” Trends in Endocrinology and Metabolism 20: 230–236. Fuller, Steve, 2005, The Intellectual. Icon Books. García-Ballester, Luis, 1995, “Health and Medical Care in Medieval Galenism.”