# Bayesian statistics

47 results

pages: 589 words: 69,193

Mastering Pandas by Femi Anthony

A Tour of Statistics – The Classical Approach Descriptive statistics versus inferential statistics Measures of central tendency and variability Measures of central tendency The mean The median The mode Computing measures of central tendency of a dataset in Python Measures of variability, dispersion, or spread Range Quartile Deviation and variance Hypothesis testing – the null and alternative hypotheses The null and alternative hypotheses The alpha and p-values Type I and Type II errors Statistical hypothesis tests Background The z-test The t-test Types of t-tests A t-test example Confidence intervals An illustrative example Correlation and linear regression Correlation Linear regression An illustrative example Summary

8. A Brief Tour of Bayesian Statistics Introduction to Bayesian statistics Mathematical framework for Bayesian statistics Bayes theory and odds Applications of Bayesian statistics Probability distributions Fitting a distribution Discrete probability distributions Discrete uniform distributions The Bernoulli distribution The binomial distribution The Poisson distribution The Geometric distribution The negative binomial distribution Continuous probability distributions The continuous uniform distribution The exponential distribution The normal distribution Bayesian statistics versus Frequentist statistics What is probability? How the model is defined Confidence (Frequentist) versus Credible (Bayesian) intervals Conducting Bayesian statistical analysis Monte Carlo estimation of the likelihood function and PyMC Bayesian analysis example – Switchpoint detection References Summary

9.

A Brief Tour of Bayesian Statistics In this chapter, we will take a brief tour of an alternative approach to statistical inference called Bayesian statistics. It is not intended to be a full primer, but rather to serve as an introduction to the Bayesian approach. We will also explore the associated Python libraries and see how to use pandas and matplotlib to help with the data analysis. The various topics that will be discussed are as follows:

- Introduction to Bayesian statistics
- Mathematical framework for Bayesian statistics
- Probability distributions
- Bayesian versus Frequentist statistics
- Introduction to PyMC and Monte Carlo simulation
- Illustration of Bayesian inference – Switchpoint detection

Introduction to Bayesian statistics The field of Bayesian statistics is built on the work of Reverend Thomas Bayes, an 18th century statistician, philosopher, and Presbyterian minister.

pages: 561 words: 120,899

The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy by Sharon Bertsch McGrayne

JASA (95) 1282–86.
Couzin, Jennifer. (2004) The new math of clinical trials. Science (303) 784–86.
DeGroot, Morris H. (1986b) A conversation with Persi Diaconis. Statistical Science (1:3) 319–34.
Diaconis P, Efron B. (1983) Computer-intensive methods in statistics. Scientific American (248) 116–30.
Diaconis, Persi. (1985) Bayesian statistics as honest work. Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer (1), eds., Lucien M. Le Cam and Richard A. Olshen. Wadsworth.
Diaconis P, Holmes S. (1996) Are there still things to do in Bayesian statistics? Erkenntnis (45) 145–58.
Diaconis P. (1998) A place for philosophy? The rise of modeling in statistical science. Quarterly of Applied Mathematics (56:4) 797–805.
DuMouchel WH, Harris JE. (1983) Bayes methods for combining the results of cancer studies in humans and other species.

Today, Bayes’ rule is used everywhere from DNA decoding to Homeland Security. Drawing on primary source material and interviews with statisticians and other scientists, The Theory That Would Not Die is the riveting account of how a seemingly simple theorem ignited one of the greatest controversies of all time”—Provided by publisher. Includes bibliographical references and index. ISBN 978-0-300-16969-0 (hardback) 1. Bayesian statistical decision theory—History. I. Title. QA279.5.M415 2011 519.5’42—dc22 2010045037 A catalogue record for this book is available from the British Library. This paper meets the requirements of ANSI/NISO Z39.48–1992 (Permanence of Paper). 10 9 8 7 6 5 4 3 2 1 When the facts change, I change my opinion. What do you do, sir? —John Maynard Keynes contents Preface and Note to Readers Acknowledgments Part I.

Bayes combined judgments based on prior hunches with probabilities based on repeatable experiments. He introduced the signature features of Bayesian methods: an initial belief modified by objective new information. He could move from observations of the world to abstractions about their probable cause. And he discovered the long-sought grail of probability, what future mathematicians would call the probability of causes, the principle of inverse probability, Bayesian statistics, or simply Bayes’ rule. Given the revered status of his work today, it is also important to recognize what Bayes did not do. He did not produce the modern version of Bayes’ rule. He did not even employ an algebraic equation; he used Newton’s old-fashioned geometric notation to calculate and add areas. Nor did he develop his theorem into a powerful mathematical method. Above all, unlike Price, he did not mention Hume, religion, or God.

Bulletproof Problem Solving by Charles Conn, Robert McLean

Adding more variables may improve the performance of the regression analysis—but adding more variables may then be overfitting the data. This problem is a consequence of the underlying mathematics—and a reminder to always use the simplest model that sufficiently explains your phenomenon. Bayesian Statistics and the Space Shuttle Challenger Disaster For those who lived through the Space Shuttle Challenger disaster, it is remembered as an engineering failure. It was that of course, but more importantly it was a problem solving failure. It involved risk assessment relating to O‐ring damage that we now know is best assessed with Bayesian statistics. Bayesian statistics are useful in incomplete data environments, and especially as a way of assessing conditional probability in complex situations. Conditional probability occurs in situations where a set of probable outcomes depends in turn on another set of conditions that are also probabilistic.

To illustrate each of these analytic tools in action, we provide case examples of how they are used in problem solving. We start with simple data analysis and then move on to multiple regression, Bayesian statistics, simulations, constructed experiments, natural experiments, machine learning, crowd‐sourced problem solving, and finish up with another big gun for competitive settings, game theory. Of course each of these tools could warrant a textbook on their own, so this is necessarily only an introduction to the power and applications of each technique. Summary of Case Studies Data visualization: London air quality Multivariate regression: Understanding obesity Bayesian statistics: Space Shuttle Challenger disaster Constructed experiments: RCTs and A|B testing Natural experiments: Voter prejudice Simulations: Climate change example Machine learning: Sleep apnea, bus routing, and shark spotting Crowd‐sourcing algorithms Game theory: Intellectual property and serving in tennis It is a reasonable amount of effort to work through these, but bear with us—these case studies will give you a solid sense of which advanced tool to use in a variety of problem settings.

The resulting posterior probability of failure given launch at 31F is a staggering 99.8%, almost identical to the estimate of another research team who also used Bayesian analysis. Several lessons emerge for the use of big guns in data analysis from the Challenger disaster. First is that the choice of model, in this case Bayesian statistics, can have an impact on conclusions about risks, in this case catastrophic risks. Second is that it takes careful thinking to arrive at the correct conditional probability. Finally, how you handle extreme values like launch temperature at 31F, when the data is incomplete, requires a probabilistic approach where a distribution is fitted to available data. Bayesian statistics may be the right tool to test your hypothesis when the opportunity exists to do updating of a prior probability with new evidence, in this case exploring the full experience of success and failure at a temperature not previously experienced.
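The updating process the passage describes can be sketched with a conjugate beta-binomial model. This is only an illustrative toy, not the analysis referenced above (which modeled how failure probability depends on temperature); the counts assume the commonly cited record of 7 flights with O-ring damage out of 23 pre-Challenger launches.

```python
# Toy beta-binomial sketch of "updating a prior probability with new evidence".
# Assumed figures: 7 of the 23 pre-Challenger flights showed O-ring damage.

def beta_update(alpha, beta, successes, failures):
    """Conjugate update of a Beta(alpha, beta) prior with binomial data."""
    return alpha + successes, beta + failures

# Start from a flat Beta(1, 1) prior, then fold in the flight record.
a, b = beta_update(1, 1, successes=7, failures=16)
posterior_mean = a / (a + b)            # (7 + 1) / (23 + 2) = 0.32
print(a, b, round(posterior_mean, 2))   # 8 17 0.32
```

A full treatment would additionally regress the damage probability on launch temperature and extrapolate to 31°F, which is where the probabilistic distribution-fitting mentioned in the passage comes in.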

pages: 354 words: 105,322

The Road to Ruin: The Global Elites' Secret Plan for the Next Financial Crisis by James Rickards

This railroad incident took place before the Balkan Wars of 1912–13, and six years before the outbreak of the First World War. Yet, based on the French-Russian reaction alone, Somary correctly inferred that world war was inevitable. His analysis was that if an insignificant matter excited geopolitical tensions to the boiling point, then larger matters, which inevitably occur, must lead to war. This inference is a perfect example of Bayesian statistics. Somary, in effect, started with a hypothesis about the probability of world war, which in the absence of any information is weighted fifty-fifty. As incidents like the sanjak railway emerge, they are added to the numerator and denominator of the mathematical form of Bayes’ theorem, increasing the odds of war. Contemporary intelligence analysts call these events “indications and warnings.”

This is why central bank and Wall Street equilibrium models produce consistently weak results in forecasting and risk management. Every analysis starts with the same data. Yet when you enter that data into a deficient model, you get deficient output. Investors who use complexity theory can leave mainstream analysis behind and get better forecasting results. The third tool in addition to behavioral psychology and complexity theory is Bayesian statistics, a branch of etiology also referred to as causal inference. Both terms derive from Bayes’ theorem, an equation first described by Thomas Bayes and published posthumously in 1763. A version of the theorem was elaborated independently and more formally by the French mathematician Pierre-Simon Laplace in 1774. Laplace continued work on the theorem in subsequent decades. Twentieth-century statisticians have developed more rigorous forms.

Austrians made invaluable contributions to the study of choice and markets. Yet their emphasis on the explanatory power of money seems narrow. Money matters, but an emphasis on money to the exclusion of psychology is a fatal flaw. Keynesian and monetarist schools have lately merged into the neoliberal consensus, a nightmarish surf and turf presenting the worst of both. In this book, I write as a theorist using complexity theory, Bayesian statistics, and behavioral psychology to study economics. That approach is unique and not yet a “school” of economic thought. This book also uses one other device—history. When asked to identify which established school of economic thought I find most useful, my reply is Historical. Notable writers of the Historical school include the liberal Walter Bagehot, the Communist Karl Marx, and the conservative Austrian-Catholic Joseph A.

The Book of Why: The New Science of Cause and Effect by Judea Pearl, Dana Mackenzie

She must abandon the centuries-old dogma of objectivity for objectivity’s sake. Where causation is concerned, a grain of wise subjectivity tells us more about the real world than any amount of objectivity. In the above paragraph, I said that “most of” the tools of statistics strive for complete objectivity. There is one important exception to this rule, though. A branch of statistics called Bayesian statistics has achieved growing popularity over the last fifty years or so. Once considered almost anathema, it has now gone completely mainstream, and you can attend an entire statistics conference without hearing any of the great debates between “Bayesians” and “frequentists” that used to thunder in the 1960s and 1970s. The prototype of Bayesian analysis goes like this: Prior Belief + New Evidence → Revised Belief.

We also need to take into account our prior knowledge about the coin.” Did it come from the neighborhood grocery or a shady gambler? If it’s just an ordinary quarter, most of us would not let the coincidence of nine heads sway our belief so dramatically. On the other hand, if we already suspected the coin was weighted, we would conclude more willingly that the nine heads provided serious evidence of bias. Bayesian statistics give us an objective way of combining the observed evidence with our prior knowledge (or subjective belief) to obtain a revised belief and hence a revised prediction of the outcome of the coin’s next toss. Still, what frequentists could not abide was that Bayesians were allowing opinion, in the form of subjective probabilities, to intrude into the pristine kingdom of statistics. Mainstream statisticians were won over only grudgingly, when Bayesian analysis proved a superior tool for a variety of applications, such as weather prediction and tracking enemy submarines.
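The coin example can be sketched numerically with two hypotheses (fair vs. weighted). Two assumptions are made for illustration only: that "nine heads" means nine heads in a row, and that a weighted coin lands heads 90% of the time.

```python
def posterior_biased(prior_biased, n_heads, p_biased=0.9, p_fair=0.5):
    """P(coin is biased | n_heads in a row), by Bayes' rule over two hypotheses."""
    like_biased = p_biased ** n_heads   # P(evidence | biased)
    like_fair = p_fair ** n_heads       # P(evidence | fair)
    numerator = like_biased * prior_biased
    return numerator / (numerator + like_fair * (1 - prior_biased))

# An ordinary quarter from the neighborhood grocery: tiny prior suspicion.
print(round(posterior_biased(prior_biased=0.001, n_heads=9), 3))  # 0.166
# A coin from a shady gambler: substantial prior suspicion.
print(round(posterior_biased(prior_biased=0.5, n_heads=9), 3))    # 0.995
```

The same evidence moves the two priors to very different posteriors, which is exactly the point of the passage: the observed data and the prior knowledge combine to produce the revised belief.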

Journal of Educational Statistics 12: 101–223.
Galton, F. (1869). Hereditary Genius. Macmillan, London, UK.
Galton, F. (1883). Inquiries into Human Faculty and Its Development. Macmillan, London, UK.
Galton, F. (1889). Natural Inheritance. Macmillan, London, UK.
Goldberger, A. (1972). Structural equation models in the social sciences. Econometrica: Journal of the Econometric Society 40: 979–1001.
Lindley, D. (1987). Bayesian Statistics: A Review. CBMS-NSF Regional Conference Series in Applied Mathematics (Book 2). Society for Industrial and Applied Mathematics, Philadelphia, PA.
McGrayne, S. B. (2011). The Theory That Would Not Die. Yale University Press, New Haven, CT.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, NY.
Pearl, J. (2015). Trygve Haavelmo and the emergence of causal calculus.

Super Thinking: The Big Book of Mental Models by Gabriel Weinberg, Lauren McCann

Bayesians, by contrast, allow probabilistic judgments about any situation, regardless of whether any observations have yet occurred. To do this, Bayesians begin by bringing related evidence to statistical determinations. For example, picking a penny up off the street, you’d probably initially estimate a fifty-fifty chance that it would come up heads if you flipped it, even if you’d never observed a flip of that particular coin before. In Bayesian statistics, you can bring such knowledge of base rates to a problem. In frequentist statistics, you cannot. Many people find this Bayesian way of looking at probability more intuitive because it is similar to how your beliefs naturally evolve. In everyday life, you aren’t starting from scratch every time, as you would in frequentist statistics. For instance, on policy issues, your starting point is what you currently know on that topic—what Bayesians call a prior—and then when you get new data, you (hopefully) update your prior based on the new information.

., the one-hundred-coin-flips example we presented), the confidence intervals calculated should contain the parameter you are studying (e.g., 50 percent probability of getting heads) to the level of confidence specified (e.g., 95 percent of the time). To many people’s dismay, a confidence interval does not say there is a 95 percent chance of the true value of the parameter being in the interval. By contrast, Bayesian statistics analogously produces credible intervals, which do say that; credible intervals specify the current best estimated range for the probability of the parameter. As such, this Bayesian way of doing things is again more intuitive. In practice, though, both approaches yield very similar conclusions, and as more data becomes available, they should converge on the same conclusion. That’s because they are both trying to estimate the same underlying truth.
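The credible interval described here can be computed directly from the posterior distribution. A stdlib-only sketch under a flat Beta(1, 1) prior, using an invented result of 54 heads in 100 flips (the passage's example involves 100 flips but gives no head count):

```python
import math

def beta_credible_interval(heads, tails, level=0.95, grid=100_000):
    """Equal-tailed credible interval for p under a flat Beta(1, 1) prior,
    by numerically integrating the Beta(heads + 1, tails + 1) density."""
    a, b = heads + 1, tails + 1
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    xs = [(i + 0.5) / grid for i in range(grid)]
    pdf = [math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))
           for x in xs]
    step = 1.0 / grid
    cdf, total = [], 0.0
    for d in pdf:
        total += d * step
        cdf.append(total)
    lo_t, hi_t = (1 - level) / 2, 1 - (1 - level) / 2
    lo = next(x for x, c in zip(xs, cdf) if c >= lo_t)
    hi = next(x for x, c in zip(xs, cdf) if c >= hi_t)
    return lo, hi

lo, hi = beta_credible_interval(heads=54, tails=46)
print(round(lo, 2), round(hi, 2))   # roughly 0.44 to 0.63
```

A frequentist 95% confidence interval for the same data (p̂ ± 1.96·√(p̂(1−p̂)/n)) comes out numerically close, illustrating the convergence of the two approaches that the passage mentions.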

Crowdsourcing has been effective across a wide array of situations, from soliciting tips in journalism, to garnering contributions to Wikipedia, to solving the real-world problems of companies and governments. For example, Netflix held a contest in 2009 in which crowdsourced researchers beat Netflix’s own recommendation algorithms. Crowdsourcing can help you get a sense of what a wide array of people think about a topic, which can inform your future decision making, updating your prior beliefs (see Bayesian statistics in Chapter 5). It can also help you uncover unknown unknowns and unknown knowns as you get feedback from people with previous experiences you might not have had. In James Surowiecki’s book The Wisdom of Crowds, he examines situations where input from crowds can be particularly effective. It opens with a story about how the crowd at a county fair in 1906, attended by statistician Francis Galton, correctly guessed the weight of an ox.

Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth by Stuart Ritchie

Doing away with p-values wouldn’t necessarily improve matters; in fact, by introducing another source of subjectivity, it might make the situation a lot worse.26 With tongue only partly in cheek, John Ioannidis has noted that if we remove all such objective measures we invite a situation where ‘all science will become like nutritional epidemiology’ – a scary prospect indeed.27 The same criticism is often levelled at the other main alternative to p-values: Bayesian statistics. Drawing on a probability theorem devised by the eighteenth-century statistician Thomas Bayes, this method allows researchers to take the strength of previous evidence – referred to as a ‘prior’ – into account when assessing the significance of new findings. For instance, if someone tells you their weather forecast predicts a rainy day in London in the autumn, it won’t take too much to convince you that they’re right.

A Bayesian can build all that pre-existing evidence into their initial calculation – in the latter case, they’d require the new forecast to be extraordinarily convincing in order to overturn all the previous meteorological knowledge.28 This isn’t something you can do so easily with p-values, since they’re almost always calculated independently of any prior evidence. However, the Bayesian ‘prior’ is inherently subjective: we can all agree that the Sahara is hot and dry, but how strongly we should believe before a study starts that a particular drug will reduce depression symptoms, or that a specific government policy will boost economic growth, is wholly debatable. Aside from taking prior evidence into account, Bayesian statistics also have other differences from p-values.29 They’re less affected by sample size, for example: statistical power is not a factor because the Bayesian approach is aimed not at detecting the effect of a particular set of conditions, but simply at weighing up the evidence for and against a hypothesis. Arguably, they’re also closer to how people normally reason about statistics. Bayesians say ‘what is the probability my hypothesis is true, given these observations?’

The broader statistical tradition where p-values sit, incidentally, is called frequentist statistics. That’s because, fundamentally, users of p-values are interested in frequencies – most notably the frequency with which you’ll find results with p-values below 0.05 if you run your study an infinite number of times and the hypothesis you’re testing isn’t true. 30.  A useful annotated reading list that serves as an introduction to Bayesian statistics is given by Etz et al., ‘How to Become a Bayesian in Eight Easy Steps: An Annotated Reading List’, Psychonomic Bulletin & Review 25, no. 1 (Feb. 2018): 219–34; https://doi.org/10.3758/s13423-017-1317-5. See also Richard McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan, Chapman & Hall/CRC Texts in Statistical Science Series 122 (Boca Raton: CRC Press/Taylor & Francis Group, 2016). 31.

pages: 442 words: 94,734

The Art of Statistics: Learning From Data by David Spiegelhalter

So the product of the likelihood ratio and the prior odds ends up being around 72,000/1,000,000, which are odds of around 7/100, corresponding to a probability of 7/107 or 7% that he is a cheat. So we should give him the benefit of the doubt at this stage, whereas we might not be so generous with someone we had just met in the pub. And perhaps we should keep a careful eye on the Archbishop. Bayesian Statistical Inference Bayes’ theorem, even if it is not permitted in UK courts, is the scientifically correct way to change our mind on the basis of new evidence. Expected frequencies make Bayesian analysis reasonably straightforward for simple situations that involve only two hypotheses, say about whether someone does or does not have a disease, or has or has not committed an offence. However, things get trickier when we want to apply the same ideas to drawing inferences about unknown quantities that might take on a range of values, such as parameters in statistical models.
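The arithmetic here is posterior odds = likelihood ratio × prior odds, followed by an odds-to-probability conversion. In the sketch below, splitting the passage's 72,000/1,000,000 into a likelihood ratio of 72,000 and prior odds of one in a million is an assumption for illustration; only their product appears in the text.

```python
def odds_to_prob(odds):
    """Convert odds in favour of a hypothesis to a probability."""
    return odds / (1 + odds)

likelihood_ratio = 72_000            # assumed split of the passage's figure
prior_odds = 1 / 1_000_000
posterior_odds = likelihood_ratio * prior_odds   # ~7/100
print(round(odds_to_prob(posterior_odds), 2))    # 0.07, i.e. about 7%
```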

CHAPTER 9: Putting Probability and Statistics Together

1. To derive this distribution, we could calculate the probability of two left-handers as 0.2 × 0.2 = 0.04, the probability of two right-handers as 0.8 × 0.8 = 0.64, and so the probability of one of each must be 1 − 0.04 − 0.64 = 0.32.
2. There are important exceptions to this – some distributions have such long, ‘heavy’ tails that their expectations and standard deviations do not exist, and so averages have nothing to converge to.
3. If we can assume that all our observations are independent and come from the same population distribution, the standard error of their average is just the standard deviation of the population distribution divided by the square root of the sample size.
4. We shall see in Chapter 12 that practitioners of Bayesian statistics are happy using probabilities for epistemic uncertainty about parameters.
5. Strictly speaking, a 95% confidence interval does not mean there is a 95% probability that this particular interval contains the true value, although in practice people often give this incorrect interpretation.
6. Both of whom I had the pleasure of knowing in their more advanced years.
7. More precisely, 95% confidence intervals are often set as plus or minus 1.96 standard errors, based on assuming a precise normal sampling distribution for the statistic.
8. With 1,000 participants, the margin of error (in %) is at most ±100/√1,000 ≈ 3%.

The Ethical Algorithm: The Science of Socially Aware Algorithm Design by Michael Kearns, Aaron Roth

See also p-hacking advantages of machine learning, 190–93 advertising, 191–92 Afghanistan, 50–51 age data, 27–29, 65–66, 86–89 aggregate data, 2, 30–34, 50–51 AI labs, 145–46 alcohol use data, 51–52 algebraic equations, 37 algorithmic game theory, 100–101 Amazon, 60–61, 116–17, 121, 123, 125 analogies, 57–63 anonymization of data “de-anonymizing,” 2–3, 14–15, 23, 25–26 reidentification of anonymous data, 22–31, 33–34, 38 shortcomings of anonymization methods, 23–29 and weaknesses of aggregate data, 31–32 Apple, 47–50 arbitrary harms, 38 Archimedes, 160–62 arms races, 180–81 arrest data, 92 artificial intelligence (AI), 13, 176–77, 179–82 Atari video games, 132 automation, 174–78, 180 availability of data, 1–3, 51, 66–67 averages, 40, 44–45 backgammon, 131 backpropagation algorithm, 9–10, 78–79, 145–46 “bad equilibria,” 95, 97, 136 Baidu, 148–51, 166, 185 bans on data uses, 39 Bayesian statistics, 38–39, 173 behavioral data, 123 benchmark datasets, 136 Bengio, Yoshua, 133 biases and algorithmic fairness, 57–63 and data collection, 90–93 and word embedding, 58–63, 77–78 birth date information, 23 bitcoin, 183–84 blood-type compatibility, 130 board games, 131–32 Bonferroni correction, 149–51, 153, 156, 164 book recommendation algorithms, 117–21 Bork, Robert, 24 bottlenecks, 107 breaches of data, 32 British Doctors Study, 34–36, 39, 51 brute force tasks, 183–84, 186 Cambridge University, 51–52 Central Intelligence Agency (CIA), 49–50 centralized differential privacy, 46–47 chain reaction intelligence growth, 185 cheating, 115, 148, 166 choice, 101–3 Chrome browser, 47–48, 195 classification of data, 146–48, 152–55 cloud computing, 121–23 Coase, Ronald, 159 Coffee Meets Bagel (dating app), 94–97, 100–101 coin flips, 42–43, 46–47 Cold War, 100 collaborative filtering, 23–24, 116–18, 123–25 collective behavioral data, 105–6, 109, 123–24 collective good, 112 collective language, 64 collective overfitting, 136.

Hands-On Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurelien Geron

Bayes’ theorem Unfortunately, in a Gaussian mixture model (and many other problems), the denominator p(x) is intractable, as it requires integrating over all the possible values of z (Equation 9-3). This means considering all possible combinations of cluster parameters and cluster assignments. Equation 9-3. The evidence p(X) is often intractable This is one of the central problems in Bayesian statistics, and there are several approaches to solving it. One of them is variational inference, which picks a family of distributions q(z; λ) with its own variational parameters λ (lambda), then optimizes these parameters to make q(z) a good approximation of p(z|X). This is achieved by finding the value of λ that minimizes the KL divergence from q(z) to p(z|X), noted DKL(q‖p). The KL divergence equation is shown in Equation 9-4, and it can be rewritten as the log of the evidence (log p(X)) minus the evidence lower bound (ELBO).
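The rewriting mentioned in the last sentence, KL(q‖p) = log p(X) − ELBO, can be checked numerically on a toy model with a single two-state latent variable; all the numbers below are arbitrary assumptions, not values from the text.

```python
import math

# Toy model: one observation x, latent z in {0, 1}; joint values are made up.
p_joint = {0: 0.12, 1: 0.28}                      # p(x, z)
p_x = sum(p_joint.values())                       # evidence p(x)
p_post = {z: p_joint[z] / p_x for z in p_joint}   # posterior p(z | x)

q = {0: 0.5, 1: 0.5}                              # any variational q(z)
elbo = sum(q[z] * math.log(p_joint[z] / q[z]) for z in q)
kl = sum(q[z] * math.log(q[z] / p_post[z]) for z in q)

# The identity log p(x) = ELBO + KL(q || p(z|x)) holds for any q.
print(abs((elbo + kl) - math.log(p_x)) < 1e-12)   # True
```

Since the KL divergence is non-negative, the ELBO is indeed a lower bound on log p(x), and maximizing the ELBO over q is equivalent to minimizing the KL divergence to the true posterior.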

A simpler approach to maximizing the ELBO is called black box stochastic variational inference (BBSVI): at each iteration, a few samples are drawn from q and they are used to estimate the gradients of the ELBO with regard to the variational parameters λ, which are then used in a gradient ascent step. This approach makes it possible to use Bayesian inference with any kind of model (provided it is differentiable), even deep neural networks: this is called Bayesian deep learning. Tip If you want to dive deeper into Bayesian statistics, check out the Bayesian Data Analysis book by Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin. Gaussian mixture models work great on clusters with ellipsoidal shapes, but if you try to fit a dataset with different shapes, you may have bad surprises. For example, let’s see what happens if we use a Bayesian Gaussian mixture model to cluster the moons dataset (see Figure 9-24): Figure 9-24. A Bayesian Gaussian mixture model fitted to the moons dataset Oops, the algorithm desperately searched for ellipsoids, so it found 8 different clusters instead of 2.

Dinosaurs Rediscovered by Michael J. Benton

With colleagues Manabu Sakamoto and Chris Venditti from the University of Reading, we put together an even larger supertree of all dinosaur species, and dated it as accurately as we could. We then ran calculations to work out whether speciation and extinction rates were stable, rising, or falling through the Mesozoic. We were looking for one of three possible outcomes: that overall the balance of speciation and extinction gave ever-rising values, or levelling off, or declining values. We used Bayesian statistical methods, which involve seeding the calculations with a starting model, and then running the data millions or billions of times to assess how well the starting model fits the data, allowing for every possible source of uncertainty, and repeatedly adjusting the model to make it fit better. In this case, Manabu modelled uncertainty about dating the rocks, gaps in the record, accuracy of the phylogenetic tree, and many other issues.

pages: 294 words: 81,292

Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat

But by the time the tragedy unfolded, Holtzman told me, Good had retired. He was not in his office but at home, perhaps calculating the probability of God’s existence. According to Dr. Holtzman, sometime before he died, Good updated that probability from zero to point one. He did this because as a statistician, he was a long-term Bayesian. Named for the eighteenth-century mathematician and minister Thomas Bayes, Bayesian statistics’ main idea is that in calculating the probability of some statement, you can start with a personal belief. Then you update that belief as new evidence comes in that supports your statement or doesn’t. If Good’s original disbelief in God had remained 100 percent, no amount of data, not even God’s appearance, could change his mind. So, to be consistent with his Bayesian perspective, Good assigned a small positive probability to the existence of God to make sure he could learn from new data, if it arose.
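Good's move from a zero prior to a small positive one is easy to motivate with Bayes' rule itself: a prior of exactly zero is immune to any evidence. A small sketch (the likelihood values are invented for illustration):

```python
def bayes_update(prior, like_h, like_not_h):
    """Posterior P(H | E) from a prior and the likelihoods P(E | H), P(E | not H)."""
    evidence = like_h * prior + like_not_h * (1 - prior)
    return like_h * prior / evidence

# However strong the evidence (a 999:1 likelihood ratio here), a prior of
# exactly zero stays at zero -- which is why Good moved his off zero.
print(bayes_update(0.0, like_h=0.999, like_not_h=0.001))               # 0.0
print(round(bayes_update(0.001, like_h=0.999, like_not_h=0.001), 2))   # 0.5
```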

pages: 283 words: 81,376

The Doomsday Calculation: How an Equation That Predicts the Future Is Transforming Everything We Know About Life and the Universe by William Poundstone

The Copernican method is like the sleek tech gadget that comes with intelligent defaults. The Carter-Leslie argument promises to be more customizable, more suited to those who like to tinker. Gott’s 1993 article does not mention Bayes’s theorem or prior probabilities. For some Nature readers that was a great sin. I asked Gott why he omitted Bayes, and he had a quick answer: “Bayesians.” “I didn’t put any Bayesian statistics in this paper because I didn’t want to muddy the waters,” he explained. “Because Bayesian people will argue about their priors, endlessly. I had a falsifiable hypothesis.” The long-standing complaint is that prior probabilities are subjective. A Bayesian prediction can be a case of garbage in, garbage out. There is plenty of scope to slant the results to one’s liking, and to wrap them up in the flag of impartial mathematics.

Twenty-Four Dogs in Albuquerque

1. “incredibly irresponsible”; “Anybody can see it’s garbage”: Caves interview, December 12, 2017.
2. “Gott dismisses the entire process”: Caves 2000, 2.
3. “it was important to find”: Caves 2000, 2.
4. “a notarized list of…24 dogs”: Caves 2000, 15.
5. “Gott is on record as applying”: Caves 2008, 2.
6. “We can distinguish two forms”: Bostrom 2002, 89.
7. “I didn’t put any Bayesian statistics”: Gott interview, July 31, 2017.
8. “When you can’t identify any time scales”: Caves 2008, 11.
9. “No other formula in the alchemy of logic”: Keynes 1921, 89.
10. Goodman’s objection to Gott: Goodman 1994.
11. Jeffreys prior compatible with location- and scale-invariance: This fact was demonstrated not by Jeffreys but by Washington University physicist E. T. Jaynes. See Jaynes 1968.
12.

pages: 319 words: 90,965

The End of College: Creating the Future of Learning and the University of Everywhere by Kevin Carey

In describing how the brain reacts to surprise, Lue said that “everything is a function of risk and opportunity.” To survive and prosper in the world with limited cognitive capacity, humans filter waves of constant sensory information through neural patterns—heuristics and mental shortcuts that our minds use to weigh the odds that what we are sensing is familiar and categorizable based on our past experience. Sebastian Thrun’s self-driving car does this with Bayesian statistics built into silicon and code, while the human mind uses electrochemical processes that we still don’t fully understand. But the underlying principle is the same: Based on the pattern of lines and shapes and edges, that is probably a boulder and I should drive around it. That is probably a group of three young women eating lunch at a table near the sushi bar and I should pay them no mind. Heuristics are also critically important to the market for higher education.

., 90–91, 98 Air Force, 91 Artificial Intelligence (AI), 11, 79, 136, 153, 159, 170, 264n Adaptive Control of Thought—Rational (ACT-R) model for, 101–4 cognitive tutoring using, 103, 105, 138, 179, 210 Dartmouth conference on, 79, 101 learning pathways for, 155 personalized learning with, 5, 232 theorem prover based in, 110 Thrun’s work in, 147–50 Arum, Richard, 9, 10, 36, 85, 244 Associate’s degrees, 6, 61, 117, 141, 193, 196, 198 Atlantic magazine, 29, 65, 79, 123 AT&T, 146 Australian National University, 204 Bachelor’s degrees, 6–9, 31, 36, 60–61, 64 for graduate school admission, 30 percentage of Americans with, 8, 9, 57, 77 professional versus liberal arts, 35 required for public school teachers, 117 social mobility and, 76 time requirement for, 6, 22 value in labor market of, 58 Badges, digital, 207–12, 216–18, 233, 245, 248 Barzun, Jacques, 32–34, 44, 45, 85 Bayesian statistics, 181 Bell Labs, 123–24 Bellow, Saul, 59, 78 Berlin, University of, 26, 45-46 Bhave, Amol, 214–15 Bing, 212 Binghamton, State University of New York at, 183–84 Bishay, Shereef, 139, 140 Bloomberg, Michael, 251 Blue Ocean Strategy (Kim and Mauborgne), 130 Bologna, University of, 16–17, 21, 41 Bonn, University of, 147 Bonus Army, 51 Borders Books, 127 Boston College, 164, 175 Boston Gazette, 95 Boston Globe, 2 Boston University (BU), 59, 61–62, 64 Bowen, William G., 112–13 Bowman, John Gabbert, 74–75 Brigham Young University, 2 Brilliant, 213 British Army, 98 Brookings Institution, 54 Brooklyn College, 44 Brown v.

pages: 397 words: 102,910

The Idealist: Aaron Swartz and the Rise of Free Culture on the Internet by Justin Peters

So we need an algorithm or computer program that would encourage lots of people to identify the fights and to start the campaigns,” McLean told the Sydney Morning Herald in 2014. “We’d put the tools that we have at our disposal in their hands.”32 Swartz had actually been building tools like these for several months with his colleagues at ThoughtWorks. Victory Kit, as the project was called, was an open-source version of the expensive community-organizing software used by groups such as MoveOn. Victory Kit incorporated Bayesian statistics—an analytical method that gets smarter as it goes along by consistently incorporating new information into its estimates—to improve activists’ ability to reach and organize their bases. “In the end, a lot of what the software was about was doing quite sophisticated A/B testing of messages for advocacy,” remembered Swartz’s friend Nathan Woodhull.33 Swartz was scheduled to present Victory Kit to the group at the Holmes retreat.
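The "gets smarter as it goes along" updating described here, applied to A/B testing of advocacy messages, can be sketched with a conjugate Beta-Bernoulli model. This is a minimal illustration of the idea, not Victory Kit's actual code; the variant response counts are invented:

```python
import random

def beta_update(alpha, beta, successes, failures):
    """Conjugate update: Beta prior + Bernoulli data -> Beta posterior."""
    return alpha + successes, beta + failures

def prob_b_beats_a(a, b, draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under the two Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(rng.betavariate(*b) > rng.betavariate(*a) for _ in range(draws))
    return wins / draws

# Start each message variant from a uniform Beta(1, 1) prior, then fold in
# observed responses; every new batch of data tightens the estimate.
post_a = beta_update(1, 1, successes=120, failures=880)  # variant A: 12% response
post_b = beta_update(1, 1, successes=150, failures=850)  # variant B: 15% response
print(prob_b_beats_a(post_a, post_b))
```

With these counts the posterior probability that variant B has the higher response rate is high, so an organizer would keep sending B while continuing to collect data.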

Ashcroft, 137–38, 140 FBI file on, 191–92, 223 fleeing the system, 8, 145, 151, 158–59, 161, 171, 173, 193, 248, 267 and free culture movement, 3–4, 141, 152–55, 167, 223 and Harvard, 3, 205, 207, 223, 224, 229 health issues of, 9, 150, 165–66, 222 immaturity of, 8–9 and Infogami, 147, 148–51, 158 interests of, 6–7, 8–9, 204, 221 “Internet and Mass Collaboration, The,” 166–67 lawyers for, 6, 254–55 legacy of, 14–15, 268, 269–70 and Library of Congress, 139 and Malamud, 187–93, 222, 223 manifesto of, 6–7, 178–81, 189–90, 201, 228–30, 247 mass downloading of documents by, 1, 3, 188–94, 197–202, 207, 213, 215, 222, 228, 235 media stories about, 125 and MIT, 1, 3, 201, 204, 207, 213, 222, 227, 232, 249–50, 262 and money, 170–71 on morality and ethics, 205–6 and Open Library, 163, 173, 179, 223, 228 and PCCC, 202–3, 225 as private person/isolation of, 2–3, 5, 124, 127, 143, 154–55, 158–60, 166, 169, 205, 224, 227, 228, 248–49, 251 and public domain, 123 as public speaker, 213–14, 224, 243, 257 and Reddit, see Reddit The Rules broken by, 14 “saving the world” on bucket list of, 7, 8, 15, 125, 151–52, 181, 205–6, 247–48, 266, 267, 268 self-help program of, 251–53 and theinfo.org, 172–73 and US Congress, 224–25, 239–40 Swartz, Robert: and Aaron’s death, 261, 262, 264 and Aaron’s early years, 124, 127 and Aaron’s legal woes, 232, 250, 254 and MIT Media Lab, 203–4, 212, 219, 232, 250 and technology, 124, 212 Swartz, Susan, 128–29, 160, 192 Swartz’s legal case: as “the bad thing,” 3, 7–8, 234 change in defense strategy, 256–57 evidence-suppression hearing, 259–60 facts of, 11 felony charges in, 235, 253 grand jury, 232–33 indictment, 1, 5, 8, 10, 11, 233, 234, 235–37, 241, 253–54 investigation and capture, 215–17, 223, 228 JSTOR’s waning interest in, 231–32 manifesto as evidence in, 228–30 motion to suppress, 6 motives sought in, 223, 229 Norton subpoenaed in, 1–2, 227–29 ongoing, 248, 249–51 online petitions against, 236–37 original charges in, 218, 222 plea deals offered, 
227, 250 possible prison sentence, 1, 2, 5, 7–8, 11, 222, 232, 235–36, 253, 260 potential harm assessed, 218, 219, 222, 235 prosecutor’s zeal in, 7–8, 11, 218, 222–24, 235–37, 253–54, 259–60, 263, 264 search and seizure in, 6, 223–24, 256–57 Symbolics, 103 systems, flawed, 265–67 T. & J. W. Johnson, 49 Tammany Hall, New York, 57 tech bubble, 146, 156 technology: Bayesian statistics in, 258–59 burgeoning, 69, 71, 84, 87–88 communication, 12, 13, 18, 87–88 computing, see computers and digital culture, 122 and digital utopia, 91, 266–67 of electronic publishing, 120 and intellectual property, 90–91 and irrational exuberance, 146 in library of the future, 81–83 as magic, 152 moving inexorably forward, 134 overreaching police action against, 233 power of metadata, 128, 130 as private property, 210 resisting change caused by, 120 saving humanity via, 101 thinking machines, 102 unknown, future, 85 and World War II, 208 telephone, invention of, 69 Templeton, Brad, 261 theinfo.org, 172–73 theme parks, 134 ThoughtWorks, 9, 248, 257, 258 “thumb drive corps,” 187, 191, 193 Toyota Motor Corporation, “lean production” of, 7, 257, 265 Trumbull, John, McFingal, 26 trust-busting, 75 Tucher, Andie, 34 Tufte, Edward, 263–64 “tuft-hunter,” use of term, 28 Tumblr, 240 Twain, Mark, 60, 62, 73 Tweed, William “Boss,” 57 Twitter, 237 Ulrich, Lars, 133 United States: Articles of Confederation, 26 copyright laws in, 26–27 economy of, 44–45, 51, 55, 56 freedom to choose in, 80, 269 industrialization, 57 literacy in, 25, 26–27, 39, 44, 48 migration to cities in, 57 national identity of, 28, 32 new social class in, 69–70 opportunity in, 58, 80 poverty in, 59 railroads, 55, 56 rustic nation of, 44–45 values of, 85 UNIVAC computer, 81, 90 Universal Studios Orlando, 134 University of Illinois at Urbana-Champaign, 94, 95–96, 112–15 Unix, 104 US Chamber of Commerce, 239 utilitarianism, 214 Valenti, Jack, 111, 132 Van Buren, Martin, 44 Van Dyke, Henry, The National Sin of Literary Piracy, 61 venture 
capital, 146 Viaweb, 146 Victor, O.

pages: 412 words: 115,266

The Moral Landscape: How Science Can Determine Human Values by Sam Harris

If we are measuring sanity in terms of sheer numbers of subscribers, then atheists and agnostics in the United States must be delusional: a diagnosis which would impugn 93 percent of the members of the National Academy of Sciences.63 There are, in fact, more people in the United States who cannot read than who doubt the existence of Yahweh.64 In twenty-first-century America, disbelief in the God of Abraham is about as fringe a phenomenon as can be named. But so is a commitment to the basic principles of scientific thinking—not to mention a detailed understanding of genetics, special relativity, or Bayesian statistics. The boundary between mental illness and respectable religious belief can be difficult to discern. This was made especially vivid in a recent court case involving a small group of very committed Christians accused of murdering an eighteen-month-old infant.65 The trouble began when the boy ceased to say “Amen” before meals. Believing that he had developed “a spirit of rebellion,” the group, which included the boy’s mother, deprived him of food and water until he died.

The ACC and the caudate display an unusual degree of connectivity, as the surgical lesioning of the ACC (a procedure known as a cingulotomy) causes atrophy of the caudate, and the disruption of this pathway is thought to be the basis of the procedure’s effect in treating conditions like obsessive-compulsive disorder (Rauch et al., 2000; Rauch et al., 2001). There are, however, different types of uncertainty. For instance, there is a difference between expected uncertainty—where one knows that one’s observations are unreliable—and unexpected uncertainty, where something in the environment indicates that things are not as they seem. The difference between these two modes of cognition has been analyzed within a Bayesian statistical framework in terms of their underlying neurophysiology. It appears that expected uncertainty is largely mediated by acetylcholine and unexpected uncertainty by norepinephrine (Yu & Dayan, 2005). Behavioral economists sometimes distinguish between “risk” and “ambiguity”: the former being a condition where probability can be assessed, as in a game of roulette, the latter being the uncertainty borne of missing information.

pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos

The distinction between descriptive and normative theories was articulated by John Neville Keynes in The Scope and Method of Political Economy (Macmillan, 1891). Chapter Six Sharon Bertsch McGrayne tells the history of Bayesianism, from Bayes and Laplace to the present, in The Theory That Would Not Die (Yale University Press, 2011). A First Course in Bayesian Statistical Methods,* by Peter Hoff (Springer, 2009), is an introduction to Bayesian statistics. The Naïve Bayes algorithm is first mentioned in Pattern Classification and Scene Analysis,* by Richard Duda and Peter Hart (Wiley, 1973). Milton Friedman argues for oversimplified theories in “The methodology of positive economics,” which appears in Essays in Positive Economics (University of Chicago Press, 1966). The use of Naïve Bayes in spam filtering is described in “Stopping spam,” by Joshua Goodman, David Heckerman, and Robert Rounthwaite (Scientific American, 2005).

pages: 829 words: 186,976

The Signal and the Noise: Why So Many Predictions Fail-But Some Don't by Nate Silver

Scott Armstrong, The Wharton School, University of Pennsylvania LIBRARY OF CONGRESS CATALOGING IN PUBLICATION DATA Silver, Nate. The signal and the noise : why most predictions fail but some don’t / Nate Silver. p. cm. Includes bibliographical references and index. ISBN 978-1-101-59595-4 1. Forecasting. 2. Forecasting—Methodology. 3. Forecasting—History. 4. Bayesian statistical decision theory. 5. Knowledge, Theory of. I. Title. CB158.S54 2012 519.5'42—dc23 2012027308 While the author has made every effort to provide accurate telephone numbers, Internet addresses, and other contact information at the time of publication, neither the publisher nor the author assumes any responsibility for errors, or for changes that occur after publication. Further, publisher does not have any control over and does not assume any responsibility for author or third-party Web sites or their content.

In essence, this player could go to work every day for a year and still lose money. This is why it is sometimes said that poker is a hard way to make an easy living. Of course, if this player really did have some way to know that he was a long-term winner, he’d have reason to persevere through his losses. In reality, there’s no sure way for him to know that. The proper way for the player to estimate his odds of being a winner, instead, is to apply Bayesian statistics,31 where he revises his belief about how good he really is, on the basis of both his results and his prior expectations. If the player is being honest with himself, he should take quite a skeptical attitude toward his own success, even if he is winning at first. The player’s prior belief should be informed by the fact that the average poker player by definition loses money, since the house takes some money out of the game in the form of the rake while the rest is passed around between the players.32 The Bayesian method described in the book The Mathematics of Poker, for instance, would suggest that a player who had made $30,000 in his first 10,000 hands at a $100/$200 limit hold ’em game was nevertheless more likely than not to be a long-term loser.
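The prescription here, a skeptical prior combined with observed results, can be sketched as a two-hypothesis Bayes update. All the numbers below (the prior, the hourly win rates, the spread due to luck) are hypothetical, chosen only to show the mechanics rather than to reproduce The Mathematics of Poker's calculation:

```python
from math import exp, sqrt, pi

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution, used as the likelihood of the results."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

prior_winner = 0.10            # skeptical prior: most players lose long-term
mu_winner, mu_loser = 10_000, -10_000   # expected $ per 10,000 hands, by type
sigma = 30_000                 # luck swamps skill over a sample this small
observed = 30_000              # the player is up $30,000

like_w = normal_pdf(observed, mu_winner, sigma)
like_l = normal_pdf(observed, mu_loser, sigma)
posterior = (prior_winner * like_w
             / (prior_winner * like_w + (1 - prior_winner) * like_l))
print(posterior)  # well under 0.5: probably still a long-term loser
```

Even though the evidence favors the "winner" hypothesis, the skeptical prior keeps the posterior below one half, which is the qualitative point of the excerpt.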

McGrayne, The Theory That Would Not Die, Kindle location 7. 61. Raymond S. Nickerson, “Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy,” Psychological Methods, 5, 2 (2000), pp. 241–301. http://203.64.159.11/richman/plogxx/gallery/17/%E9%AB%98%E7%B5%B1%E5%A0%B1%E5%91%8A.pdf. 62. Andrew Gelman and Cosma Rohilla Shalizi, “Philosophy and the Practice of Bayesian Statistics,” British Journal of Mathematical and Statistical Psychology, pp. 1–31, January 11, 2012. http://www.stat.columbia.edu/~gelman/research/published/philosophy.pdf. 63. Although there are several different formulations of the steps in the scientific method, this version is mostly drawn from “APPENDIX E: Introduction to the Scientific Method,” University of Rochester. http://teacher.pas.rochester.edu/phy_labs/appendixe/appendixe.html. 64.

pages: 573 words: 157,767

From Bacteria to Bach and Back: The Evolution of Minds by Daniel C. Dennett

Friston, Karl, Michael Levin, Biswa Sengupta, and Giovanni Pezzulo. 2015. “Knowing One’s Place: A Free-Energy Approach to Pattern Regulation.” Journal of the Royal Society Interface, 12: 20141383. Frith, Chris D. 2012. “The Role of Metacognition in Human Social Interactions.” Philosophical Transactions of the Royal Society B: Biological Sciences 367 (1599): 2213–2223. Gelman, Andrew. 2008. “Objections to Bayesian Statistics.” Bayesian Anal. 3 (3): 445–449. Gibson, James J. 1966. “The Problem of Temporal Order in Stimulation and Perception.” Journal of Psychology 62 (2): 141–149. —. 1979. The Ecological Approach to Visual Perception. Boston: Houghton Mifflin. Godfrey-Smith, Peter. 2003. “Postscript on the Baldwin Effect and Niche Construction.” In Evolution and Learning: The Baldwin Effect Reconsidered, edited by Bruce H.

Natural Language Processing with Python and spaCy by Yuli Vasiliev

More no-nonsense books from NO STARCH PRESS PYTHON CRASH COURSE, 2ND EDITION A Hands-On, Project-Based Introduction to Programming by ERIC MATTHES MAY 2019, 544 pp., $39.95 ISBN 978-1-59327-928-8 MATH ADVENTURES WITH PYTHON An Illustrated Guide to Exploring Math with Code by PETER FARRELL JANUARY 2019, 304 pp., $29.95 ISBN 978-1-59327-867-0 THE BOOK OF R A First Course in Programming and Statistics by TILMAN M. DAVIES JULY 2016, 832 pp., $49.95 ISBN 978-1-59327-651-5 BAYESIAN STATISTICS THE FUN WAY Understanding Statistics and Probability with Star Wars, LEGO, and Rubber Ducks by WILL KURT JULY 2019, 256 pp., $34.95 ISBN 978-1-59327-956-1 PYTHON ONE-LINERS by CHRISTIAN MAYER SPRING 2020, 256 pp., $39.95 ISBN 978-1-7185-0050-1 AUTOMATE THE BORING STUFF WITH PYTHON, 2ND EDITION Practical Programming for Total Beginners by AL SWEIGART NOVEMBER 2019, 592 pp., $39.95 ISBN 978-1-59327-992-9 PHONE: 800.420.7240 OR 415.863.9900 EMAIL: SALES@NOSTARCH.COM WEB: WWW.NOSTARCH.COM BUILD YOUR OWN NLP APPLICATIONS Natural Language Processing with Python and spaCy will show you how to create NLP applications like chatbots, text-condensing scripts, and order-processing tools quickly and easily.

Analysis of Financial Time Series by Ruey S. Tsay

In this chapter, we introduce the ideas of MCMC methods and data augmentation that are widely applicable in finance. In particular, we discuss Bayesian inference via Gibbs sampling and demonstrate various applications of MCMC methods. Rapid developments in the MCMC methodology make it impossible to cover all the new methods available in the literature. Interested readers are referred to some recent books on Bayesian and empirical Bayesian statistics (e.g., Carlin and Louis, 2000; Gelman, Carlin, Stern, and Rubin, 1995). For applications, we focus on issues related to financial econometrics. The demonstrations shown in this chapter only represent a small fraction of all possible applications of the techniques in finance. As a matter of fact, it is fair to say that Bayesian inference and the MCMC methods discussed here are applicable to most, if not all, of the studies in financial econometrics.

Such a prior distribution is called a conjugate prior distribution. For MCMC methods, use of conjugate priors means that a closed-form solution for the conditional posterior distributions is available. Random draws of the Gibbs sampler can then be obtained by using the commonly available computer routines of probability distributions. In what follows, we review some well-known conjugate priors. For more information, readers are referred to textbooks on Bayesian statistics (e.g., DeGroot, 1970, Chapter 9). Result 1: Suppose that x₁, …, xₙ form a random sample from a normal distribution with mean µ, which is unknown, and variance σ², which is known and positive. Suppose that the prior distribution of µ is a normal distribution with mean µ₀ and variance σ₀². Then the posterior distribution of µ given the data and prior is normal with mean µ∗ and variance σ∗² given by µ∗ = (σ²µ₀ + nσ₀²x̄)/(σ² + nσ₀²) and σ∗² = σ²σ₀²/(σ² + nσ₀²), where x̄ = Σᵢ₌₁ⁿ xᵢ/n is the sample mean. In Bayesian analysis, it is often convenient to use the precision parameter η = 1/σ² (i.e., the inverse of the variance σ²).
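Result 1 (the normal conjugate update with known variance) is easy to check numerically. A small sketch with made-up data and prior parameters:

```python
def normal_known_var_posterior(xs, sigma2, mu0, sigma0_2):
    """Posterior mean and variance of mu for N(mu, sigma2) data with a
    N(mu0, sigma0_2) prior, i.e. the conjugate-update result quoted above."""
    n = len(xs)
    xbar = sum(xs) / n
    denom = sigma2 + n * sigma0_2
    mu_star = (sigma2 * mu0 + n * sigma0_2 * xbar) / denom
    var_star = sigma2 * sigma0_2 / denom
    return mu_star, var_star

# With a vague prior (huge sigma0_2), the posterior mean moves to the
# sample mean; with a very tight prior it stays near mu0.
data = [4.9, 5.1, 5.0, 5.2, 4.8]
print(normal_known_var_posterior(data, sigma2=1.0, mu0=0.0, sigma0_2=1e6))
print(normal_known_var_posterior(data, sigma2=1.0, mu0=0.0, sigma0_2=1e-9))
```

Note that the posterior mean is a precision-weighted average of the prior mean µ₀ and the sample mean x̄, which is why those two limiting cases behave as they do.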

pages: 206 words: 70,924

The Rise of the Quants: Marschak, Sharpe, Black, Scholes and Merton by Colin Read

He postulated that the rational decision-maker will align his or her beliefs of unknown probabilities to the consensus bets of impartial bookmakers, a technique often called the Dutch Book. Thirty years later, the great mind Leonard “Jimmie” Savage (1917–1971) elaborated Ramsey’s concept into an axiomatic approach to decision-making under uncertainty using arguments remarkably similar to Ramsey’s logic. The concepts of Ramsey and Savage also formed the basis for the theory of Bayesian statistics and are important in many aspects of financial decision-making. Marschak’s great insight While Ramsey created and Savage broadened the logical landscape for the inclusion of uncertainty into decision-making, it was not possible to incorporate their logic until the finance discipline could develop actual measures of uncertainty. Of course, modern financial analysis depends crucially even today on such a methodology to measure uncertainty.

pages: 654 words: 191,864

Thinking, Fast and Slow by Daniel Kahneman

So if you believe that there is a 40% chance that it will rain sometime tomorrow, you must also believe that there is a 60% chance it will not rain tomorrow, and you must not believe that there is a 50% chance that it will rain tomorrow morning. And if you believe that there is a 30% chance that candidate X will be elected president, and an 80% chance that he will be reelected if he wins the first time, then you must believe that the chances that he will be elected twice in a row are 24%. The relevant “rules” for cases such as the Tom W problem are provided by Bayesian statistics. This influential modern approach to statistics is named after an English minister of the eighteenth century, the Reverend Thomas Bayes, who is credited with the first major contribution to a large problem: the logic of how people should change their mind in the light of evidence. Bayes’s rule specifies how prior beliefs (in the examples of this chapter, base rates) should be combined with the diagnosticity of the evidence, the degree to which it favors the hypothesis over the alternative.
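Bayes's rule as described here, prior beliefs (base rates) combined with the diagnosticity of the evidence, is most compact in odds form: posterior odds = prior odds × likelihood ratio. A sketch with hypothetical Tom W-style numbers (a 3% base rate for the field, and evidence four times as likely under that hypothesis as under its alternative):

```python
def posterior_prob(prior_prob, likelihood_ratio):
    """Bayes' rule in odds form: multiply prior odds by the likelihood
    ratio (the diagnosticity of the evidence), then convert back."""
    prior_odds = prior_prob / (1 - prior_prob)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# A 3% base rate with moderately diagnostic evidence (ratio 4) yields a
# posterior of roughly 11%: the evidence helps, but the base rate dominates.
print(posterior_prob(0.03, 4.0))
```

This is exactly the error Kahneman describes: intuition responds to the likelihood ratio (the stereotype fit) and neglects the 3% prior, so people guess far above 11%.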

pages: 586 words: 186,548

Architects of Intelligence by Martin Ford

The basic problem is, how do we go beyond specific experiences to general truths? Or from the past to the future? In the case that Roger Shepard was thinking about, he was working on the basic mathematics of how might an organism, having experienced a certain stimulus to have some good or negative consequence, figure out which other things in the world are likely to have that same consequence? Roger had introduced some mathematics based on Bayesian statistics for solving that problem, which was a very elegant formulation of the general theory of how organisms could generalize from experience and he was looking to neural networks to try to take that theory and implement it in a more scalable way. Somehow, I wound up working with him on this project. Through that, I was exposed to both neural networks, as well as to Bayesian analyses of cognition early on, and you can view most of my career since then as working through those same ideas and methods.

Even a very young child can learn this new causal relation between moving your finger in a certain way and a screen lighting up, and that is how all sorts of other possibilities of action open to you. These problems of how we make a generalization from just one or a few examples are what I started working on with Roger Shepard when I was just an undergraduate. Early on, we used these ideas from Bayesian statistics, Bayesian inference, and Bayesian networks, to use the mathematics of probability theory to formulate how people’s mental models of the causal structure of the world might work. It turns out that tools that were developed by mathematicians, physicists, and statisticians to make inferences from very sparse data in a statistical setting were being deployed in the 1990s in machine learning and AI, and it revolutionized the field.

pages: 345 words: 75,660

Prediction Machines: The Simple Economics of Artificial Intelligence by Ajay Agrawal, Joshua Gans, Avi Goldfarb

Validere improves the efficiency of oil custody transfer by predicting the water content of incoming crude. These applications are a microcosm of what most businesses will be doing in the near future. If you’re lost in the fog trying to figure out what AI means for you, then we can help you understand the implications of AI and navigate through the advances in this technology, even if you’ve never programmed a convolutional neural network or studied Bayesian statistics. If you are a business leader, we provide you with an understanding of AI’s impact on management and decisions. If you are a student or recent graduate, we give you a framework for thinking about the evolution of jobs and the careers of the future. If you are a financial analyst or venture capitalist, we offer a structure around which you can develop your investment theses. If you are a policy maker, we give you guidelines for understanding how AI is likely to change society and how policy might shape those changes for the better.

pages: 267 words: 72,552

Reinventing Capitalism in the Age of Big Data by Viktor Mayer-Schönberger, Thomas Ramge

The idea was that every day, four hundred nationalized factories around the country would send data to Cybersyn’s nerve center in Santiago, the capital, where it would then be fed into a mainframe computer, scrutinized, and compared against forecasts. Divergences would be flagged and brought to the attention of factory directors, then to government decision makers sitting in a futuristic operations room. From there the officials would send directives back to the factories. Cybersyn was quite sophisticated for its time, employing a network approach to capturing and calculating economic activity and using Bayesian statistical models. Most important, it relied on feedback that would loop back into the decision-making processes. The system never became fully operational. Its communications network was in place and was used in the fall of 1972 to keep the country running when striking transportation workers blocked goods from entering Santiago. The computer-analysis part of Cybersyn was mostly completed, too, but its results were often unreliable and slow.

pages: 277 words: 87,082

Beyond Weird by Philip Ball

Those beliefs do not become realized as facts until they impinge on the consciousness of the observer – and so the facts are specific to every observer (although different observers can find themselves agreeing on the same facts). This notion takes its cue from standard Bayesian probability theory, introduced in the eighteenth century by the English mathematician and clergyman Thomas Bayes. In Bayesian statistics, probabilities are not defined with reference to some objective state of affairs in the world, but instead quantify personal degrees of belief of what might happen – which we update as we acquire new information. The QBist view, however, says something much more profound than simply that different people know different things. Rather, it asserts that there are no things that can be meaningfully spoken of beyond the self.

pages: 301 words: 85,126

AIQ: How People and Machines Are Smarter Together by Nick Polson, James Scott

Allen WannaCry (ransomware attack) waterfall diagram Watson (IBM supercomputer) Waymo (autonomous-car company) WeChat word vectors word2vec model (Google) World War I World War II Battle of the Bulge Bayesian search and Hopper, Grace, and Schweinfurt-Regensburg mission (World War II) Statistical Research Group (Columbia) and Wald’s survivability recommendations for aircraft Yormark, Brett YouTube Zillow ABOUT THE AUTHORS NICK POLSON is professor of Econometrics and Statistics at the Chicago Booth School of Business. He does research on artificial intelligence, Bayesian statistics, and deep learning, and is a frequent speaker at conferences. He lives in Chicago. You can sign up for email updates here. JAMES SCOTT is associate professor of Statistics at the University of Texas at Austin. He earned his Ph.D. in statistics from Duke University in 2009 after studying mathematics at the University of Cambridge on a Marshall Scholarship. He has published over 45 peer-reviewed scientific articles, and he has worked with clients across many industries to help them understand the power of their data.

pages: 290 words: 82,871

The Hidden Half: How the World Conceals Its Secrets by Michael Blastland

., ‘Non-steroidal Anti-inflammatory Drug Use is Associated with Increased Risk of Out-of-Hospital Cardiac Arrest: A Nationwide Case-time-control Study’, European Heart Journal – Cardiovascular Pharmacotherapy, vol. 3, no. 2, 2017, pp. 100–107. 4 I wrote about this case in a blog for the Winton Centre for Risk and Evidence Communication: ‘Here we Go Again’, 21 March 2017. 5 See, for example, James Ware, ‘The Limitations of Risk Factors as Prognostic Tools’, New England Journal of Medicine, 21 December 2006; and Tjeerd-Pieter van Staa et al., ‘Prediction of Cardiovascular Risk Using Framingham, ASSIGN and QRISK2: How Well Do They Predict Individual Rather than Population Risk?’, PLOS One, 1 October 2014. 6 This is a metaphor often used by some statisticians. I have a lot of time for it. But we are teetering here on the brink of a discussion of Bayesian statistics, and had better resist. Readers can find plenty of such discussions elsewhere. 7 We simply don’t have the data to do it at the individual level. Some people think we do, but to begin to convert one to the other requires a series of medical trials involving multiple tests on the same person, known as ‘N of 1’ trials, and these are not standard. 8 For a favourable explanation of how NNTs are calculated, their advantages, and for a searchable database of NNTs for different treatments, see: theNNT.com. 9 The wide variability of response in individuals that could produce the kind of average effect shown in the chart – but might also be consistent with a quite different set of individual reactions – is discussed in two articles by Stephen Senn on https://errorstatistics.com: ‘Responder Despondency’ and ‘Painful Dichotomies’.

The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin

Inferential statistics seek to explain, not simply describe, the patterns and relationships that may exist within a dataset, and to test the strength and significance of associations between variables. They include parametric statistics which are employed to assess hypotheses using interval and ratio level data, such as correlation and regression; non-parametric statistics used for testing hypotheses using nominal or ordinal-level data; and probabilistic statistics that determine the probability of a condition occurring, such as Bayesian statistics. The armoury of descriptive and inferential statistics that have traditionally been used to analyse small data are also being applied to big data, though as discussed in Chapter 9 this is not always straightforward because many of these techniques were developed to draw insights from relatively scarce rather than exhaustive data. Nonetheless, the techniques do provide a means of making sense of massive amounts of data.

This is known as machine learning, the fundamentals of which were developed in the 1800s and early 1900s and have been worked on ever since. Recently, there has been a resurgence of interest in machine learning algorithms and applications owing to the availability of extremely cost-effective processing power and the easy availability of large datasets. Machine learning sits at the intersection of linear algebra, multivariate calculus, probability theory, and frequentist and Bayesian statistics, and an in-depth treatment of it is beyond the scope of a single book. Machine learning methods, however, are surprisingly accessible in Python and quite intuitive to understand, so we will explain the intuition behind the methods and see how they find applications in algorithmic trading. But first, let's introduce some basic concepts and notation that we will need for the rest of this chapter.

pages: 340 words: 97,723

The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity by Amy Webb

James Andrews, mathematician and professor at Florida State University who specialized in group theory and knot theory. Jean Bartik, mathematician and one of the original programmers for the ENIAC computer. Albert Turner Bharucha-Reid, mathematician and theorist who made significant contributions in Markov chains, probability theory, and statistics. David Blackwell, statistician and mathematician who made significant contributions to game theory, information theory, probability theory, and Bayesian statistics. Mamie Phipps Clark, a PhD and social psychologist whose research focused on self-consciousness. Thelma Estrin, who pioneered the application of computer systems in neurophysiological and brain research. She was a researcher in the Electroencephalography Department of the Neurological Institute of Columbia Presbyterian at the time of the Dartmouth Summer Research Project. Evelyn Boyd Granville, a PhD in mathematics who developed the computer programs used for trajectory analysis in the first US-manned missions to space and the moon.

pages: 407 words: 104,622

The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution by Gregory Zuckerman

Rather than manually programming in static knowledge about how language worked, they created a program that learned from data. Brown, Mercer, and the others relied upon Bayesian mathematics, which had emerged from the statistical rule proposed by Reverend Thomas Bayes in the eighteenth century. Bayesians will attach a degree of probability to every guess and update their best estimates as they receive new information. The genius of Bayesian statistics is that it continuously narrows a range of possibilities. Think, for example, of a spam filter, which doesn’t know with certainty if an email is malicious, but can be effective by assigning odds to each one received by constantly learning from emails previously classified as “junk.” (This approach wasn’t as strange as it might seem. According to linguists, people in conversation unconsciously guess the next words that will be spoken, updating their expectations along the way.)
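The spam-filter example can be sketched directly: start from prior odds that a message is junk, then multiply in a likelihood ratio for each observed word, narrowing the range of possibilities with every piece of evidence. The word probabilities below are invented for illustration (a toy naive-Bayes filter, not any production system):

```python
from math import log, exp

def spam_posterior(prior_spam, word_likelihoods):
    """Multiply prior odds by each word's spam/ham likelihood ratio,
    working in log space for numerical stability."""
    log_odds = log(prior_spam / (1 - prior_spam))
    for p_word_given_spam, p_word_given_ham in word_likelihoods:
        log_odds += log(p_word_given_spam / p_word_given_ham)
    odds = exp(log_odds)
    return odds / (1 + odds)

# "free" and "winner" are common in junk mail; "meeting" is not.
evidence = [(0.30, 0.02),   # "free":    15x more likely in spam
            (0.20, 0.01),   # "winner":  20x more likely in spam
            (0.05, 0.15)]   # "meeting": 3x more likely in ham
print(spam_posterior(0.5, evidence))
```

From even odds, the three words drive the posterior to about 99%: two strongly spammy words outweigh one mildly innocent one, which is the "constantly learning, continuously narrowing" behavior the excerpt describes.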

pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline by Cathy O'Neil, Rachel Schutt

Finally, don’t ignore the necessary data intuition when you make use of algorithms. Just because your method converges, it doesn’t mean the results are meaningful. Make sure you’ve created a reasonable narrative and ways to check its validity. Chapter 12. Epidemiology The contributor for this chapter is David Madigan, professor and chair of statistics at Columbia. Madigan has over 100 publications in such areas as Bayesian statistics, text mining, Monte Carlo methods, pharmacovigilance, and probabilistic graphical models. Madigan’s Background Madigan went to college at Trinity College Dublin in 1980, and specialized in math except for his final year, when he took a bunch of stats courses, and learned a bunch about computers: Pascal, operating systems, compilers, artificial intelligence, database theory, and rudimentary computing skills.

pages: 398 words: 120,801

Little Brother by Cory Doctorow

They hopped from Xbox to Xbox until they found one that was connected to the Internet, then they injected their material as undecipherable, encrypted data. No one could tell which of the Internet's packets were Xnet and which ones were just plain old banking and e-commerce and other encrypted communication. You couldn't find out who was running the Xnet, let alone who was using the Xnet. But what about Dad's "Bayesian statistics?" I'd played with Bayesian math before. Darryl and I once tried to write our own better spam filter and when you filter spam, you need Bayesian math. Thomas Bayes was an 18th century British mathematician that no one cared about until a couple hundred years after he died, when computer scientists realized that his technique for statistically analyzing mountains of data would be super-useful for the modern world's info-Himalayas.

pages: 755 words: 121,290

Statistics hacks by Bruce Frey

William Skorupski is currently an assistant professor in the School of Education at the University of Kansas, where he teaches courses in psychometrics and statistics. He earned his Bachelor's degree in educational research and psychology from Bucknell University in 2000, and his Doctorate in psychometric methods from the University of Massachusetts, Amherst in 2004. His primary research interest is in the application of mathematical models to psychometric data, including the use of Bayesian statistics for solving practical measurement problems. He also enjoys applying his knowledge of statistics and probability to everyday situations, such as playing poker against the author of this book! Acknowledgments I'd like to thank all the contributors to this book, both those who are listed in the "Contributors" section and those who helped with ideas, reviewed the manuscript, and provided suggestions of sources and resources.

pages: 415 words: 125,089

Against the Gods: The Remarkable Story of Risk by Peter L. Bernstein

John Maynard Keynes. Vol. 1: Hopes Betrayed. New York: Viking. Slovic, Paul, Baruch Fischhoff, and Sarah Lichtenstein, 1990. "Rating the Risks." In Glickman and Gough, 1990, pp. 61-75. Smith, Clifford W., Jr., 1995. "Corporate Risk Management: Theory and Practice." Journal of Derivatives, Summer, pp. 21-30. Smith, A. F. M., 1984. "Present Position and Potential Developments: Some Personal Views of Bayesian Statistics." Journal of the Royal Statistical Society, Vol. 147, Part 3, pp. 245-259. Smithson, Charles W., and Clifford W. Smith, Jr., 1995. Managing Financial Risk: A Guide to Derivative Products, Financial Engineering, and Value Maximization. New York: Irwin.* Sorensen, Eric, 1995. "The Derivative Portfolio Matrix-Combining Market Direction with Market Volatility." Institute for Quantitative Research in Finance, Spring 1995 Seminar.

pages: 483 words: 141,836

Red-Blooded Risk: The Secret History of Wall Street by Aaron Brown, Eric Kim

If you accept that your entire earthly life is the appropriate numeraire for decision making, then the rest of Pascal’s case is easy to accept. Just as Archimedes claimed that with a long enough lever he could move the earth, I claim that with a big enough numeraire, I can make any faith-based action seem reasonable. Frequentist statistics suffers from paradoxes because it doesn’t insist everything be stated in moneylike terms, without which there’s no logical connection between frequency and degree of belief. Bayesian statistics suffers from insisting on a single, universal numeraire, which is often not appropriate. One thing we know about money is that it can’t buy everything. One thing we know about people is they have multiple natures, and groups of people are even more complicated. There are many numeraires, more than there are people. Picking the right one is key to getting meaningful statistical results. The only statistical analyses that can be completely certain are ones that are pure mathematical results, and ones that refer to gamelike situations in which all outside considerations are excluded by rule and the numeraire is specified.

No Slack: The Financial Lives of Low-Income Americans by Michael S. Barr

Barr, Anjali Kumar, and Robert E. Litan, 117–41. Brookings. Romich, Jennifer, Sarah Gordon, and Eric N. Waithaka. 2009. “A Tool for Getting By or Getting Ahead? Consumers’ Views on Prepaid Cards.” Working Paper 2009-WP-09. Terre Haute: Indiana State University, Networks Financial Institute (http://ssrn.com/abstract=1491645). Rossi, Peter E., Greg M. Allenby, and Robert McCulloch. 2005. Bayesian Statistics and Marketing. West Sussex, U.K.: John Wiley & Sons. Sawtooth Software. 2008. “Proceedings of the Sawtooth Software Conference, October 2007” (www.sawtoothsoftware.com/download/techpap/2007Proceedings.pdf). Seidman, Ellen, Moez Hababou, and Jennifer Kramer. 2005. A Financial Services Survey of Low- and Moderate-Income Households. Chicago: Center for Financial Services Innovation (http://cfsinnovation.com/system/files/imported/managed_documents/threecitysurvey.pdf).

pages: 523 words: 143,139

Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian, Tom Griffiths

Laplace was born in Normandy: For more details on Laplace’s life and work, see Gillispie, Pierre-Simon Laplace. distilled down to a single estimate: Laplace’s Law is derived by working through the calculation suggested by Bayes—the tricky part is the sum over all hypotheses, which involves a fun application of integration by parts. You can see a full derivation of Laplace’s Law in Griffiths, Kemp, and Tenenbaum, “Bayesian Models of Cognition.” From the perspective of modern Bayesian statistics, Laplace’s Law is the posterior mean of the binomial rate using a uniform prior. If you try only once and it works out: You may recall that in our discussion of multi-armed bandits and the explore/exploit dilemma in chapter 2, we also touched on estimates of the success rate of a process—a slot machine—based on a set of experiences. The work of Bayes and Laplace undergirds many of the algorithms we discussed in that chapter, including the Gittins index.
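The rule the note describes, Laplace's Law, estimates a success rate from s successes in n trials as (s + 1)/(n + 2), which is exactly the posterior mean under a uniform prior. A quick sketch (the Beta-posterior cross-check is my own addition):

```python
from math import isclose

def laplace_estimate(successes, trials):
    """Laplace's Law (rule of succession): the posterior mean of a binomial
    success rate under a uniform, i.e. Beta(1, 1), prior."""
    return (successes + 1) / (trials + 2)

# "If you try only once and it works out": one success in one trial
print(laplace_estimate(1, 1))  # 2/3, not the naive frequency of 1.0

# With more evidence the estimate approaches the raw frequency:
for n in (10, 100, 1000):
    print(n, laplace_estimate(n, n))

# Cross-check: a uniform prior updated by s successes in n trials gives a
# Beta(s + 1, n - s + 1) posterior, whose mean is (s + 1) / (n + 2).
s, n = 7, 10
beta_mean = (s + 1) / ((s + 1) + (n - s + 1))
assert isclose(beta_mean, laplace_estimate(s, n))
```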

pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies by Nick Bostrom

They also provide important insight into the concept of causality.28 One advantage of relating learning problems from specific domains to the general problem of Bayesian inference is that new algorithms that make Bayesian inference more efficient will then yield immediate improvements across many different areas. Advances in Monte Carlo approximation techniques, for example, are directly applied in computer vision, robotics, and computational genetics. Another advantage is that it lets researchers from different disciplines more easily pool their findings. Graphical models and Bayesian statistics have become a shared focus of research in many fields, including machine learning, statistical physics, bioinformatics, combinatorial optimization, and communication theory.35 A fair amount of the recent progress in machine learning has resulted from incorporating formal results originally derived in other academic fields. (Machine learning applications have also benefitted enormously from faster computers and greater availability of large data sets

pages: 579 words: 183,063

Tribe of Mentors: Short Life Advice From the Best in the World by Timothy Ferriss

Sometimes it isn’t, and I need to spend time doing other stuff before I’m ready. Often, I end up realizing that those things aren’t important and I just forget about them forever. What is one of the best or most worthwhile investments you’ve ever made? Lots of time spent doing math and philosophy has paid off and will continue to pay off, I have (almost) no doubt. Questioning the foundation of Bayesian statistics has been a very valuable process. Reworking definitions and impossibility results from consensus literature has been equally valuable. What purchase of \$100 or less has most positively impacted your life in the last six months (or in recent memory)? An audio lecture series on institutional economics called “International Economic Institutions: Globalism vs. Nationalism.” It was interesting/important to me because it was the first information about institutional design that I’ve ever really internalized.

Statistics in a Nutshell by Sarah Boslaugh

The Reverend Thomas Bayes Bayes’ theorem was developed by a British Nonconformist minister, the Reverend Thomas Bayes (1702–1761). Bayes studied logic and theology at the University of Edinburgh and earned his livelihood as a minister in Holborn and Tunbridge Wells, England. However, his fame today rests on his theory of probability, which was developed in his essay, published after his death by the Royal Society of London. There is an entire field of study today known as Bayesian statistics, which is based on the notion of probability as a statement of strength of belief rather than as a frequency of occurrence. However, it is uncertain whether Bayes himself would have embraced this definition because he published relatively little on mathematics during his lifetime. Enough Exposition, Let’s Do Some Statistics! Statistics is something you do, not something you read about, so the real purpose of the preceding theoretical presentation is to give you the information you need to perform calculations about the probability of events and to use the concepts introduced to be able to reason using your knowledge of statistics.

pages: 685 words: 203,949

The Organized Mind: Thinking Straight in the Age of Information Overload by Daniel J. Levitin

For every 5 people who take the treatment, 1 will be cured (because that person actually has the disease) and .25 will have the side effects. In this case, with two tests, you’re now about 4 times more likely to experience the cure than the side effects, a nice reversal of what we saw before. (If it makes you uncomfortable to talk about .25 of a person, just multiply all the numbers above by 4.) We can take Bayesian statistics a step further. Suppose a newly published study shows that if you are a woman, you’re ten times more likely to get the disease than if you’re a man. You can construct a new table to take this information into account, and to refine the estimate that you actually have the disease. The calculations of probabilities in real life have applications far beyond medical matters. I asked Steve Wynn, who owns five casinos (at his Wynn and Encore hotels in Las Vegas, and the Wynn, Encore, and Palace in Macau), “Doesn’t it hurt, just a little, to see customers walking away with large pots of your money?”
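The “new table” the author describes is just Bayes' rule re-run with a larger prior. A sketch of that update, where the sensitivity, false-positive rate, and base rate are assumed illustrative numbers of mine, not the book's:

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule: the true positives
    divided by all positives, true and false alike."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

# Assumed numbers: the test catches 90% of true cases and wrongly
# flags 5% of healthy people; the disease afflicts 1 person in 1,000.
base_rate = 0.001
print(posterior(base_rate, 0.9, 0.95))       # rare disease: low posterior

# The study's finding: for women the base rate is ten times higher.
print(posterior(10 * base_rate, 0.9, 0.95))  # the posterior rises sharply
```

The point of the passage survives the made-up numbers: refining the prior (here, tenfold) changes the posterior dramatically, which is why base-rate information is worth folding in.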

pages: 1,737 words: 491,616

Rationality: From AI to Zombies by Eliezer Yudkowsky

Some frequentists criticize Bayesians for treating probabilities as subjective states of belief, rather than as objective frequencies of events. Kruschke and Yudkowsky have replied that frequentism is even more “subjective” than Bayesianism, because frequentism’s probability assignments depend on the intentions of the experimenter.10 Importantly, this philosophical disagreement shouldn’t be conflated with the distinction between Bayesian and frequentist data analysis methods, which can both be useful when employed correctly. Bayesian statistical tools have become cheaper to use since the 1980s, and their informativeness, intuitiveness, and generality have come to be more widely appreciated, resulting in “Bayesian revolutions” in many sciences. However, traditional frequentist methods remain more popular, and in some contexts they are still clearly superior to Bayesian approaches. Kruschke’s Doing Bayesian Data Analysis is a fun and accessible introduction to the topic.11 In light of evidence that training in statistics—and some other fields, such as psychology—improves reasoning skills outside the classroom, statistical literacy is directly relevant to the project of overcoming bias.
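The claim that frequentist probability assignments depend on the experimenter's intentions can be made concrete with the classic optional-stopping example (my own illustration, not drawn from Kruschke or Yudkowsky): the same data, 9 coin flips with 6 heads, yields different one-sided p-values for H0: p = 0.5 depending on the stopping rule, while the Bayesian likelihood p^6 (1 - p)^3 is identical either way.

```python
from math import comb

# Stopping rule A: flip exactly 9 times.
# P-value: P(at least 6 heads in 9 fair flips).
p_binomial = sum(comb(9, k) for k in range(6, 10)) / 2**9

# Stopping rule B: flip until the 3rd tail appears (it was flip 9).
# P-value: P(needing at least 9 flips) = P(at most 2 tails in 8 flips).
p_negative_binomial = sum(comb(8, k) for k in range(0, 3)) / 2**8

# Same 6 heads and 3 tails, two different p-values:
print(round(p_binomial, 4), round(p_negative_binomial, 4))
```

The data never change; only the experimenter's plan for when to stop does, yet the frequentist verdict shifts. A Bayesian analysis, conditioning only on the observed flips, is unaffected by the stopping rule.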

I responded—note that this was completely spontaneous—“What on Earth do you mean? You can’t avoid assigning a probability to the mathematician making one statement or another. You’re just assuming the probability is 1, and that’s unjustified.” To which the one replied, “Yes, that’s what the Bayesians say. But frequentists don’t believe that.” And I said, astounded: “How can there possibly be such a thing as non-Bayesian statistics?” That was when I discovered that I was of the type called “Bayesian.” As far as I can tell, I was born that way. My mathematical intuitions were such that everything Bayesians said seemed perfectly straightforward and simple, the obvious way I would do it myself; whereas the things frequentists said sounded like the elaborate, warped, mad blasphemy of dreaming Cthulhu. I didn’t choose to become a Bayesian any more than fishes choose to breathe water.

pages: 827 words: 239,762

The Golden Passport: Harvard Business School, the Limits of Capitalism, and the Moral Failure of the MBA Elite by Duff McDonald

Anyone who has come across a decision tree when contemplating the choices and uncertainties in business owes them a debt. In short, their work opened up just about any business problem to mathematical analysis, without necessarily sacrificing expert opinion in the process. In 1959, Schlaifer published Probability and Statistics for Business Decisions, and in 1961, Raiffa and Schlaifer coauthored Applied Statistical Decision Theory, which “set the direction of Bayesian statistics for the next two decades.”10 But this was geeky stuff, especially for the more “broad-gauged” crowd at HBS. So even if the School was trying as hard as it could to keep up with the GSIAs of the world, it still felt a need to apologize for getting too geeky with Applied Statistical Decision Theory. Calling it “a new type of publication,” Dean Teele explained that “[whereas] most reports . . . published by the Division of Research have as their intended audience informed and forward-looking business executives in general, the new series has been written primarily for specialists. . . .”11 Translation: You may not understand it, but that doesn’t mean you’re not “informed and forward-looking.”

pages: 764 words: 261,694

The Elements of Statistical Learning (Springer Series in Statistics) by Trevor Hastie, Robert Tibshirani, Jerome Friedman

A modified principal component technique based on the lasso, Journal of Computational and Graphical Statistics 12: 531–547. Jones, L. (1992). A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training, Annals of Statistics 20: 608–613. Jordan, M. (2004). Graphical models, Statistical Science (Special Issue on Bayesian Statistics) 19: 140–155. Jordan, M. and Jacobs, R. (1994). Hierarchical mixtures of experts and the EM algorithm, Neural Computation 6: 181–214. Kalbfleisch, J. and Prentice, R. (1980). The Statistical Analysis of Failure Time Data, Wiley, New York. Kaufman, L. and Rousseeuw, P. (1990). Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York. Kearns, M. and Vazirani, U. (1994). An Introduction to Computational Learning Theory, MIT Press, Cambridge, MA.