
**Natural Language Processing with Python and spaCy** by Yuli Vasiliev

Bayesian statistics, computer vision, database schema, en.wikipedia.org, loose coupling, natural language processing, Skype, statistical model

Statistical language modeling is vital to many natural language processing tasks, such as natural language generation and natural language understanding. For this reason, a statistical model lies at the heart of virtually any NLP application. Figure 1-4 provides a conceptual depiction of how an NLP application uses a statistical model. [Figure 1-4: A high-level conceptual view of an NLP application’s architecture] The application interacts with spaCy’s API, which abstracts the underlying statistical model. The statistical model contains information like word vectors and linguistic annotations. The linguistic annotations might include features such as part-of-speech tags and syntactic annotations. The statistical model also includes a set of machine learning algorithms that can extract the necessary pieces of information from the stored data.
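The layering described here (application, then API, then statistical model) can be mimicked without installing spaCy. The sketch below is a toy, not spaCy's API: the class `TinyPipeline`, its `tag` method, and the three-sentence training set are hypothetical, and the "model" is only a most-frequent-tag table learned from annotated data. The application calls `tag`; how the model stores and uses its counts stays hidden behind that interface.

```python
from collections import Counter, defaultdict


class TinyPipeline:
    """Hypothetical stand-in for an NLP library's API: callers see tag(),
    never the underlying statistical model (per-word tag counts)."""

    def __init__(self, annotated_sentences):
        # "Training": count how often each word carries each part-of-speech tag.
        self._counts = defaultdict(Counter)
        for sentence in annotated_sentences:
            for word, tag in sentence:
                self._counts[word.lower()][tag] += 1

    def _predict(self, word):
        # Most-frequent-tag baseline; unseen words default to NOUN (an assumption).
        counts = self._counts.get(word.lower())
        return counts.most_common(1)[0][0] if counts else "NOUN"

    def tag(self, text):
        # Public API: raw text in, (word, tag) annotations out.
        return [(w, self._predict(w)) for w in text.split()]


# Invented annotated training data.
training = [
    [("I", "PRON"), ("book", "VERB"), ("flights", "NOUN")],
    [("the", "DET"), ("book", "NOUN"), ("is", "AUX"), ("long", "ADJ")],
    [("a", "DET"), ("book", "NOUN")],
]
nlp = TinyPipeline(training)
print(nlp.tag("I book the book"))
```

Both occurrences of "book" come back as `NOUN` because this baseline ignores context, which is one reason real pipelines rely on richer, context-sensitive statistical models.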

…

Rather than encoding a language by assigning each word to a predetermined number, machine learning algorithms generate statistical models that detect patterns in large volumes of language data and then make predictions about the syntactic structure in new, previously unseen text data. Figure 1-3 summarizes how language processing works for natural languages and programming languages, respectively. A natural language processing system uses an underlying statistical model to make predictions about the meaning of input text and then generates an appropriate response. In contrast, a compiler processing programming code applies a set of strictly defined rules. [Figure 1-3: On the left, a basic workflow for processing natural language; on the right, a basic workflow for processing a programming language] **What Is a Statistical Model in NLP?** In NLP, a statistical model contains estimates for the probability distribution of linguistic units, such as words and phrases, allowing you to assign linguistic features to them.
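The simplest case of estimating a probability distribution over linguistic units is a unigram model: each word's probability is estimated from its relative frequency in a corpus. This is only a toy sketch; production models such as spaCy's are far richer, and real systems smooth these estimates so unseen words do not get probability zero. The corpus and the function name are invented for illustration.

```python
from collections import Counter
from fractions import Fraction


def unigram_distribution(tokens):
    """Maximum-likelihood estimate of P(word) = count(word) / total tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: Fraction(n, total) for word, n in counts.items()}


corpus = "the cat sat on the mat".split()   # invented toy corpus
probs = unigram_distribution(corpus)
for word, p in probs.items():
    print(word, p)
```

Exact fractions keep the arithmetic visible: "the" occurs twice in six tokens, so its estimate is 1/3, and the estimates over all words sum to 1.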

…

From now on, we’ll use `python` and `pip` regardless of the executables your system uses. If you decide to upgrade your installed spaCy package to the latest version, you can do this using the following pip command: `$ pip install -U spacy` **Installing Statistical Models for spaCy** The spaCy installation doesn’t include statistical models that you’ll need when you start using the library. The statistical models contain knowledge collected about the particular language from a set of sources. You must separately download and install each model you want to use. Several pretrained statistical models are available for different languages. For English, for example, the following models are available for download from spaCy’s website: `en_core_web_sm`, `en_core_web_md`, `en_core_web_lg`, and `en_vectors_web_lg`. The models use the following naming convention: `lang_type_genre_size`.
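The `lang_type_genre_size` convention can be checked mechanically. The helper below is hypothetical and not part of spaCy; it only splits a package name into its four fields. Actually installing and loading a package is done separately, for example with `python -m spacy download en_core_web_sm` followed by `spacy.load("en_core_web_sm")`.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    lang: str   # language code, e.g. "en"
    type: str   # model type, e.g. "core" or "vectors"
    genre: str  # kind of text the model was trained on, e.g. "web"
    size: str   # package size, e.g. "sm", "md", "lg"


def parse_model_name(name: str) -> ModelName:
    """Split a package name that follows the lang_type_genre_size convention."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected lang_type_genre_size, got {name!r}")
    return ModelName(*parts)


for package in ["en_core_web_sm", "en_core_web_md", "en_core_web_lg", "en_vectors_web_lg"]:
    print(parse_model_name(package))
```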

pages: 50 words: 13,399

**The Elements of Data Analytic Style** by Jeff Leek

correlation does not imply causation, Netflix Prize, p-value, pattern recognition, Ronald Coase, statistical model

Missing data are often simply ignored by statistical software, but this means that if the missing data have informative patterns, then analyses will ultimately be biased. As an example, suppose you are analyzing data to identify a relationship between geography and income in a city, but all the data from suburban neighborhoods are missing. **6. Statistical modeling and inference** The central goal of statistical modeling is to use a small subsample of individuals to say something about a larger population. The reasons for taking this sample are often the cost or difficulty of measuring data on the whole population. The subsample is identified with probability (Figure 6.1). [Figure 6.1: Probability is used to obtain a sample] Statistical modeling and inference are used to try to generalize what we see in the sample to the population. Inference involves two separate steps, first obtaining a best estimate for what we expect in the population (Figure 6.2).
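A minimal sketch of the first step, assuming the simplest probability design (a simple random sample, where every subset of size n is equally likely) and the sample mean as the best estimate. The population values and the function name are invented; individual estimates scatter around the population mean, which is why the uncertainty of an estimate matters.

```python
import random
import statistics


def sample_mean(population, n, rng):
    """Draw a simple random sample of size n and return its mean
    as the best estimate of the population mean."""
    sample = rng.sample(population, n)
    return statistics.mean(sample)


population = list(range(10_000))   # invented population values
print("population mean:", statistics.mean(population))

rng = random.Random(1)             # seeded so the run is reproducible
estimates = [sample_mean(population, 100, rng) for _ in range(5)]
print("estimates from five samples:", [round(e, 1) for e in estimates])
```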

…

A measure of inference on a scientific scale (such as confidence intervals or credible intervals) should be reported and interpreted with every p-value. **6.12.3 Inference without exploration** A very common mistake is to move directly to model fitting and calculation of statistical significance. Before these steps, it is critical to tidy, check, and explore the data to identify dataset-specific conditions that may violate your model assumptions. **6.12.4 Assuming the statistical model fit is good** Once a statistical model is fit to data, it is critical to evaluate how well the model describes the data. For example, with a linear regression analysis it is critical to plot the best-fit line over the scatterplot of the original data, plot the residuals, and evaluate whether the estimates are reasonable. It is OK to fit only one statistical model to a data set to avoid data dredging, as long as you carefully report potential flaws with the model. **6.12.5 Drawing conclusions about the wrong population** When you perform inference, the goal is to make a claim about the larger population you have sampled from.
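To see why residuals are worth plotting, here is a small sketch that fits an ordinary least-squares line to invented data with a curved relationship and then computes the residuals. The function names are illustrative, not from the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def residuals(xs, ys, a, b):
    """Observed minus fitted values."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Invented data with a curved relationship (y = x squared).
xs = [1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]
a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
print(f"fitted line: y = {a:.2f} + {b:.2f}x")
print("residuals:", [round(r, 2) for r in res])
```

The fit reports a slope of 7 without complaint, but the residuals run positive, negative, negative, positive. That systematic U shape is easy to spot in a residual plot and signals that a straight line misses the curvature.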

…

This mistake is so pervasive it even caused the loss of a Mars satellite. Histograms and boxplots are good ways to check that the measurements you observe fall on the right scale. **4.10 Common mistakes** **4.10.1 Failing to check the data at all** A common temptation in data analysis is to load the data and immediately leap to statistical modeling. Checking the data before analysis is a critical step in the process. **4.10.2 Encoding factors as quantitative numbers** If a scale is qualitative but the variable is encoded as 1, 2, 3, etc., then statistical modeling functions may interpret this variable as a quantitative variable and incorrectly order the values. **4.10.3 Not making sufficient plots** A common mistake is to only make tabular summaries of the data when doing data checking. Creating a broad range of data visualizations, one for each potential problem in a data set, is the best way to identify problems. **4.10.4 Failing to look for outliers or missing values** A common mistake is to assume that all measurements follow the appropriate distribution.
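The factor-encoding pitfall in a few lines, using an invented qualitative variable. Integer codes let arithmetic run without error while producing a meaningless summary; indicator (one-hot) columns, one per level, carry no implied order. This is one common remedy, sketched with plain Python rather than any particular modeling library.

```python
# Invented qualitative variable: type of neighborhood for five respondents.
responses = ["urban", "rural", "suburban", "rural", "urban"]

# Mistake: arbitrary integer codes imply an order and equal spacing.
codes = {"rural": 1, "suburban": 2, "urban": 3}
numeric = [codes[r] for r in responses]
mean_code = sum(numeric) / len(numeric)
print(mean_code)  # 2.0, i.e. "suburban", a summary with no real meaning

# Safer: one 0/1 indicator (dummy) column per level, with no implied order.
levels = sorted(set(responses))
one_hot = [[int(r == level) for level in levels] for r in responses]
print(levels)
for r, row in zip(responses, one_hot):
    print(r, row)
```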

pages: 442 words: 94,734

**The Art of Statistics: Learning From Data** by David Spiegelhalter

Antoine Gombaud: Chevalier de Méré, Bayesian statistics, Carmen Reinhart, complexity theory, computer vision, correlation coefficient, correlation does not imply causation, dark matter, Edmond Halley, Estimating the Reproducibility of Psychological Science, Hans Rosling, Kenneth Rogoff, meta analysis, meta-analysis, Nate Silver, Netflix Prize, p-value, placebo effect, probability theory / Blaise Pascal / Pierre de Fermat, publication bias, randomized controlled trial, recommendation engine, replication crisis, self-driving car, speech recognition, statistical model, The Design of Experiments, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Malthus

Even with the Bradford Hill criteria outlined above, statisticians are generally reluctant to attribute causation unless there has been an experiment, although computer scientist Judea Pearl and others have made great progress in setting out the principles for building causal regression models from observational data.2

| Pair | Pearson correlation | Gradient of regression of offspring on parent |
|---|---|---|
| Mothers and daughters | 0.31 | 0.33 |
| Fathers and sons | 0.39 | 0.45 |

Table 5.2 Correlations between heights of adult children and parent of the same gender, and gradients of the regression of the offspring’s on the parent’s height.

**Regression Lines Are Models** The regression line we fitted between fathers’ and sons’ heights is a very basic example of a statistical model. The US Federal Reserve define a model as a ‘representation of some aspect of the world which is based on simplifying assumptions’: essentially some phenomenon will be represented mathematically, generally embedded in computer software, in order to produce a simplified ‘pretend’ version of reality.3 Statistical models have two main components. First, a mathematical formula that expresses a deterministic, predictable component, for example the fitted straight line that enables us to make a prediction of a son’s height from his father’s. But the deterministic part of a model is not going to be a perfect representation of the observed world.
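The two numeric columns of Table 5.2 are linked. For a least-squares line fitted to paired data, the gradient equals the correlation rescaled by the ratio of standard deviations; this follows directly from the usual definitions of $r$ and of the sample standard deviation $s_x$ (both written here with the $n-1$ divisor):

```latex
b = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}
  = \frac{(n-1)\,r\,s_x s_y}{(n-1)\,s_x^2}
  = r\,\frac{s_y}{s_x}
```

With parent height as $x$ and offspring height as $y$, the rounded table values imply $s_y/s_x \approx 0.33/0.31 \approx 1.06$ for mothers and daughters and $0.45/0.39 \approx 1.15$ for fathers and sons. Because the table entries are rounded to two decimals, these ratios are only approximate.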

…

We have data that can help us answer some of these questions, with which we have already done some exploratory plotting and drawn some informal conclusions about an appropriate statistical model. But we now come to a formal aspect of the Analysis part of the PPDAC cycle, generally known as hypothesis testing. What Is a ‘Hypothesis’? A hypothesis can be defined as a proposed explanation for a phenomenon. It is not the absolute truth, but a provisional, working assumption, perhaps best thought of as a potential suspect in a criminal case. When discussing regression in Chapter 5, we saw the claim that observation = deterministic model + residual error. This represents the idea that statistical models are mathematical representations of what we observe, which combine a deterministic component with a ‘stochastic’ component, the latter representing unpredictability or random ‘error’, generally expressed in terms of a probability distribution.
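The decomposition observation = deterministic model + residual error can be made concrete by simulating it. This sketch assumes a straight-line deterministic part and normally distributed errors, which is one common choice of probability distribution for the stochastic component, not the only one. The function name `simulate` and all numeric values are hypothetical.

```python
import random


def simulate(xs, intercept, slope, sigma, seed=0):
    """Generate observations as deterministic model + residual error."""
    rng = random.Random(seed)                      # seeded for reproducibility
    deterministic = [intercept + slope * x for x in xs]
    errors = [rng.gauss(0.0, sigma) for _ in xs]   # stochastic component
    observed = [d + e for d, e in zip(deterministic, errors)]
    return deterministic, errors, observed


# Hypothetical parent heights (cm) and line parameters, for illustration only.
parent_heights = [160, 165, 170, 175, 180]
det, err, obs = simulate(parent_heights, intercept=90.0, slope=0.5, sigma=5.0)
for x, d, o in zip(parent_heights, det, obs):
    print(f"x={x}  model={d:.1f}  observed={o:.1f}  residual={o - d:+.1f}")
```

Setting `sigma=0.0` removes the stochastic part, and every observation then falls exactly on the line.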

…

This would only be rejected by large positive values of a test statistic representing an estimated treatment effect. A two-sided test would be appropriate for a null hypothesis that a treatment effect, say, is exactly zero, and so both positive and negative estimates would lead to the null being rejected.

**one-tailed and two-tailed P-values:** those corresponding to one-sided and two-sided tests.

**over-fitting:** building a statistical model that is over-adapted to training data, so that its predictive ability starts to decline.

**parameters:** the unknown quantities in a statistical model, generally denoted with Greek letters.

**Pearson correlation coefficient:** for a set of $n$ paired numbers, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, when $\bar{x}, s_x$ are the sample mean and standard deviation of the $x$s, and $\bar{y}, s_y$ are the sample mean and standard deviation of the $y$s, the Pearson correlation coefficient is given by

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}$$

Suppose $x$s and $y$s have both been standardized to Z-scores given by $u$s and $v$s respectively, so that $u_i = (x_i - \bar{x})/s_x$ and $v_i = (y_i - \bar{y})/s_y$.
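A small numerical check of the Pearson formula $r = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / ((n-1) s_x s_y)$, with sample standard deviations using the $n-1$ divisor. Substituting the Z-scores $u_i$ and $v_i$ into that formula gives $r = \sum_i u_i v_i / (n-1)$, and the sketch computes both routes. The data and function names are invented for illustration.

```python
import math


def mean_sd(values):
    """Sample mean and standard deviation (n - 1 divisor)."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return m, s


def pearson_r(xs, ys):
    """r = sum((x - xbar)(y - ybar)) / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    mx, sx = mean_sd(xs)
    my, sy = mean_sd(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)


def z_scores(values):
    """Standardize each value: (value - mean) / standard deviation."""
    m, s = mean_sd(values)
    return [(v - m) / s for v in values]


xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
u, v = z_scores(xs), z_scores(ys)
print(pearson_r(xs, ys))
print(sum(ui * vi for ui, vi in zip(u, v)) / (len(xs) - 1))
```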

pages: 227 words: 62,177

**Numbers Rule Your World: The Hidden Influence of Probability and Statistics on Everything You Do** by Kaiser Fung

American Society of Civil Engineers: Report Card, Andrew Wiles, Bernie Madoff, Black Swan, business cycle, call centre, correlation does not imply causation, cross-subsidies, Daniel Kahneman / Amos Tversky, edge city, Emanuel Derman, facts on the ground, fixed income, Gary Taubes, John Snow's cholera map, moral hazard, p-value, pattern recognition, profit motive, Report Card for America’s Infrastructure, statistical model, the scientific method, traveling salesman

[Figure C-1: Drawing a Line Between Natural and Doping Highs] Because the anti-doping laboratories face bad publicity for false positives (while false negatives are invisible unless the dopers confess), they calibrate the tests to minimize false accusations, which allows some athletes to get away with doping. **The Virtue of Being Wrong** The subject matter of statistics is variability, and statistical models are tools that examine why things vary. A disease outbreak model links causes to effects to tell us why some people fall ill while others do not; a credit-scoring model identifies correlated traits to describe which borrowers are likely to default on their loans and which will not. These two examples represent two valid modes of statistical modeling. George Box is justly celebrated for his remark “All models are false but some are useful.” The mark of great statisticians is their confidence in the face of fallibility. They recognize that no one can have a monopoly on the truth, which is unknowable as long as there is uncertainty in the world.
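The calibration trade-off can be seen with a single decision threshold on a test measurement. The marker levels below are invented and `error_counts` is a hypothetical helper; the point is only the direction of the trade-off: raising the threshold removes false positives (clean athletes accused) at the cost of more false negatives (dopers who pass).

```python
def error_counts(clean, doped, threshold):
    """Flag a sample as positive when its marker level exceeds the threshold.
    Returns (false positives, false negatives)."""
    false_positives = sum(1 for level in clean if level > threshold)
    false_negatives = sum(1 for level in doped if level <= threshold)
    return false_positives, false_negatives


clean = [1.0, 1.2, 1.5, 1.9, 2.4]   # invented marker levels, non-dopers
doped = [2.0, 2.6, 3.1, 3.8, 4.5]   # invented marker levels, dopers
for t in (1.8, 2.5, 3.5):
    fp, fn = error_counts(clean, doped, t)
    print(f"threshold {t}: {fp} false positives, {fn} false negatives")
```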

…

Highway engineers in Minnesota tell us why their favorite tactic to reduce congestion is a technology that forces commuters to wait more, while Disney engineers make the case that the most effective tool to reduce wait times does not actually reduce average wait times. Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. In Chapter 2, we compare and contrast these two modes of statistical modeling by trailing disease detectives on the hunt for tainted spinach (causal models) and by prying open the black box that produces credit scores (correlational models). Surprisingly, these practitioners freely admit that their models are “wrong” in the sense that they do not perfectly describe the world around us; we explore how they justify what they do. Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups.

…

They play a high-stakes game, ever wary of the tyranny of the unknown, ever worried about the consequence of miscalculation. Their special talent is the educated guess, with emphasis on the adjective. The leaders of the pack are practical-minded people who rely on detailed observation, directed research, and data analysis. Their Achilles heel is the big I, when they let intuition lead them astray. This chapter celebrates two groups of statistical modelers who have made lasting, positive impacts on our lives. First, we meet the epidemiologists whose investigations explain the causes of disease. Later, we meet credit modelers who mark our fiscal reputation for banks, insurers, landlords, employers, and so on. By observing these scientists in action, we will learn how they have advanced the technical frontier and to what extent we can trust their handiwork. ~###~ In November 2006, the U.S.

pages: 523 words: 112,185

**Doing Data Science: Straight Talk From the Frontline** by Cathy O'Neil, Rachel Schutt

Amazon Mechanical Turk, augmented reality, Augustin-Louis Cauchy, barriers to entry, Bayesian statistics, bioinformatics, computer vision, correlation does not imply causation, crowdsourcing, distributed generation, Edward Snowden, Emanuel Derman, fault tolerance, Filter Bubble, finite state, Firefox, game design, Google Glasses, index card, information retrieval, iterative process, John Harrison: Longitude, Khan Academy, Kickstarter, Mars Rover, Nate Silver, natural language processing, Netflix Prize, p-value, pattern recognition, performance metric, personalized medicine, pull request, recommendation engine, rent-seeking, selection bias, Silicon Valley, speech recognition, statistical model, stochastic process, text mining, the scientific method, The Wisdom of Crowds, Watson beat the top human players on Jeopardy!, X Prize

He was using it to mean data models—the representation one is choosing to store one’s data, which is the realm of database managers—whereas she was talking about statistical models, which is what much of this book is about. One of Andrew Gelman’s blog posts on modeling was recently tweeted by people in the fashion industry, but that’s a different issue. Even if you’ve used the terms statistical model or mathematical model for years, is it even clear to yourself and to the people you’re talking to what you mean? What makes a model a model? Also, while we’re asking fundamental questions like this, what’s the difference between a statistical model and a machine learning algorithm? Before we dive deeply into that, let’s add a bit of context with this deliberately provocative Wired magazine piece, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete,” published in 2008 by Chris Anderson, then editor-in-chief.

…

Attention must always be paid to these abstracted details after a model has been analyzed to see what might have been overlooked. In the case of proteins, a model of the protein backbone with side-chains by itself is removed from the laws of quantum mechanics that govern the behavior of the electrons, which ultimately dictate the structure and actions of proteins. In the case of a statistical model, we may have mistakenly excluded key variables, included irrelevant ones, or assumed a mathematical structure divorced from reality. **Statistical modeling** Before you get too involved with the data and start coding, it’s useful to draw a picture of what you think the underlying process might be with your model. What comes first? What influences what? What causes what? What’s a test of that? But different people think in different ways. Some prefer to express these kinds of relationships in terms of math.

…

In the purest sense, an algorithm is a set of rules or steps to follow to accomplish some task, and a model is an attempt to describe or capture the world. These two seem obviously different, so the distinction should be clear. Unfortunately, it isn’t. For example, regression can be described as a statistical model as well as a machine learning algorithm. You’ll waste your time trying to get people to discuss this with any precision. In some ways this is a historical artifact of the statistics and computer science communities developing methods and techniques in parallel and using different words for the same methods. The consequence is that the distinction between machine learning and statistical modeling is muddy. Some methods we might call algorithms because they are a series of computational steps used to cluster or classify objects; k-means, discussed in the next section, is one example. On the other hand, k-means can be reinterpreted as a special case of a Gaussian mixture model.
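Seen as "a series of computational steps," k-means is Lloyd's algorithm: alternate between assigning points to the nearest center and moving each center to the mean of its points. The sketch below runs it on invented 1-D data. The model reading is that the assignment step is a hard version of the expectation step in fitting a Gaussian mixture with equal, shared variances; as those variances shrink toward zero, the mixture's soft assignments become these hard ones.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Lloyd's algorithm for k-means on 1-D data.

    Repeats two steps until the centers stop moving:
    1. assign each point to its nearest center;
    2. move each center to the mean of its assigned points.
    """
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep a center in place if no points were assigned to it.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


data = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]   # invented 1-D observations
centers, clusters = kmeans_1d(data, centers=[0.0, 5.0])
print(centers)
print(clusters)
```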

pages: 209 words: 13,138

**Empirical Market Microstructure: The Institutions, Economics and Econometrics of Securities Trading** by Joel Hasbrouck

Alvin Roth, barriers to entry, business cycle, conceptual framework, correlation coefficient, discrete time, disintermediation, distributed generation, experimental economics, financial intermediation, index arbitrage, information asymmetry, interest rate swap, inventory management, market clearing, market design, market friction, market microstructure, martingale, price discovery process, price discrimination, quantitative trading / quantitative finance, random walk, Richard Thaler, second-price auction, selection bias, short selling, statistical model, stochastic process, stochastic volatility, transaction costs, two-sided market, ultimatum game, zero-sum game

If we know that the structural model is the particular one described in section 9.2, we simply set $v_t$ so that $q_t = +1$, set $u_t = 0$ and forecast using equation (9.7). We do not usually know the structural model, however. Typically we’re working from estimates of a statistical model (a VAR or VMA). This complicates specification of $\varepsilon_0$. From the perspective of the VAR or VMA model of the trade and price data, the innovation vector and its variance are:

$$\varepsilon_t = \begin{bmatrix} \varepsilon_{p,t} \\ \varepsilon_{q,t} \end{bmatrix} \quad \text{and} \quad \operatorname{Var}(\varepsilon_t) = \begin{bmatrix} \sigma_p^2 & \sigma_{p,q} \\ \sigma_{p,q} & \sigma_q^2 \end{bmatrix} \tag{9.15}$$

The innovations in the statistical model are simply associated with the observed variables, and have no necessary structural interpretation. We can still set $\varepsilon_{q,t}$ according to our contemplated trade ($\varepsilon_{q,t} = +1$), but how should we set $\varepsilon_{p,t}$? The answer to this specific problem depends on the immediate (time $t$) relation between the trade and price-change innovations.
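One standard way to turn a unit trade innovation into an implied price innovation, offered as an illustration and not necessarily the resolution the text goes on to develop, is to set $\varepsilon_{p,t}$ to its expectation given $\varepsilon_{q,t} = +1$. If the innovations have zero means, variances $\sigma_p^2$ and $\sigma_q^2$, and covariance $\sigma_{p,q}$, and are jointly normal, that expectation is the linear projection $E[\varepsilon_{p,t} \mid \varepsilon_{q,t}] = (\sigma_{p,q}/\sigma_q^2)\,\varepsilon_{q,t}$. The function names and the numbers below are hypothetical.

```python
def implied_price_innovation(sigma_pq, sigma_q2, trade_innovation=1.0):
    """E[eps_p | eps_q = trade_innovation] for zero-mean jointly normal
    innovations: projection coefficient sigma_pq / sigma_q^2 times the trade."""
    return sigma_pq / sigma_q2 * trade_innovation


def residual_price_variance(sigma_p2, sigma_pq, sigma_q2):
    """Variance of the part of eps_p not linearly explained by eps_q."""
    return sigma_p2 - sigma_pq ** 2 / sigma_q2


# Hypothetical entries of the innovation variance matrix.
sigma_p2, sigma_pq, sigma_q2 = 1.0, 0.5, 2.0
print(implied_price_innovation(sigma_pq, sigma_q2))
print(residual_price_variance(sigma_p2, sigma_pq, sigma_q2))
```

Ordering the trade first in a Cholesky factorization of the variance matrix gives the same price response to a unit trade innovation, so the two framings agree for this question.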

…

The role they play and how they should be regulated are ongoing concerns of practical interest. **12 Limit Order Markets** The worldwide proliferation of limit order markets (LOMs) clearly establishes a need for economic and statistical models of these mechanisms. This chapter discusses some approaches, but it should be admitted at the outset that no comprehensive and realistic models (either statistical or economic) exist. One might start with the view that a limit order, being a bid or offer, is simply a dealer quote by another name. The implication is that a limit order is exposed to asymmetric information risk and also must recover noninformational costs of trade. This view supports the application of the economic and statistical models described earlier to LOM, hybrid, and other nondealer markets. This perspective features a sharp division between liquidity suppliers and demanders.

…

Stock exchanges—Mathematical models. I. Title. HG4521.H353 2007 332.64—dc22 2006003935 Printed in the United States of America on acid-free paper To Lisa, who inspires these pages and much more. Preface This book is a study of the trading mechanisms in financial markets: the institutions, the economic principles underlying the institutions, and statistical models for analyzing the data they generate. The book is aimed at graduate and advanced undergraduate students in financial economics and practitioners who design or use order management systems. Most of the book presupposes only a basic familiarity with economics and statistics. I began writing this book because I perceived a need for treatment of empirical market microstructure that was unified, authoritative, and comprehensive.

pages: 257 words: 13,443

**Statistical Arbitrage: Algorithmic Trading Insights and Techniques** by Andrew Pole

algorithmic trading, Benoit Mandelbrot, constrained optimization, Dava Sobel, George Santayana, Long Term Capital Management, Louis Pasteur, mandelbrot fractal, market clearing, market fundamentalism, merger arbitrage, pattern recognition, price discrimination, profit maximization, quantitative trading / quantitative finance, risk tolerance, Sharpe ratio, statistical arbitrage, statistical model, stochastic volatility, systematic trading, transaction costs

Once again, some of the variation magically disappears when each day is scaled according to that day’s overall volume in the stock. Orders, up to a threshold labeled “visibility threshold,” have less impact on large-volume days. Fitting a mathematical curve or statistical model to the order size–market impact data yields a tool for answering the question: How much will I have to pay to buy 10,000 shares of XYZ? Note that buy and sell responses may be different and may be dependent on whether the stock is moving up or down that day. Breaking down the raw (60-day) data set and analyzing up days and down days separately will illuminate that issue. More formally, one could define an encompassing statistical model including an indicator variable for up or down day and test the significance of the estimated coefficient. Given the dubious degree to which one could reasonably determine independence and other conditions necessary for the validity of such statistical tests (without a considerable amount of work), one will be better off building prediction models for the combined data and for the up/down days separately and comparing predictions.
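A minimal sketch of the "fit separately and compare predictions" route, under stated assumptions: impact is modeled as proportional to order size scaled by the day's volume (a linear form chosen for simplicity; the text does not specify a functional form) and fitted by least squares through the origin, once on all days and once on up days and down days separately. All observations, the day volume, and the helper names are invented.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Invented observations: (order size / day's volume, impact in basis points, up day?)
observations = [
    (0.01, 2.0, True), (0.02, 4.0, True), (0.04, 8.0, True),
    (0.01, 3.0, False), (0.02, 6.0, False), (0.04, 12.0, False),
]


def fit(rows):
    return slope_through_origin([r[0] for r in rows], [r[1] for r in rows])


b_all = fit(observations)
b_up = fit([r for r in observations if r[2]])
b_down = fit([r for r in observations if not r[2]])

# The question from the text: what does buying 10,000 shares cost?
shares, day_volume = 10_000, 1_000_000   # hypothetical day volume
x = shares / day_volume
for label, b in [("combined", b_all), ("up days", b_up), ("down days", b_down)]:
    print(f"{label}: predicted impact {b * x:.2f} bp")
```

If the up-day and down-day predictions differ materially, as they do in this invented data, the separate models carry information that the combined fit averages away.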

…

Approaches for selecting a universe of instruments for modeling and trading are described. Consideration of change is introduced from this first toe dipping into analysis, because temporal dynamics underpin the entirety of the project. Without the dynamic there is no arbitrage. In Chapter 3 we increase the depth and breadth of the analysis, expanding the modeling scope from simple observational rules1 for pairs to formal statistical models for more general portfolios. Several popular models for time series are described, but detailed focus is on weighted moving averages at one extreme of complexity and factor analysis at another, these extremes serving to carry the message as clearly as we can make it. Pair spreads are referred to throughout the text, serving, as already noted, as the simplest practical illustrator of the notions discussed.
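As one concrete instance of a weighted moving average applied to a pair spread, the sketch below uses an exponentially weighted moving average (EWMA) of the log-price spread and reports how far the spread sits from it. The choice of EWMA, the log-ratio definition of the spread, the smoothing constant `alpha`, and the prices are assumptions for illustration, not the book's specific model.

```python
import math


def ewma(values, alpha):
    """Exponentially weighted moving average: m_t = alpha*x_t + (1 - alpha)*m_{t-1},
    started at the first observation."""
    out = []
    m = values[0]
    for x in values:
        m = alpha * x + (1 - alpha) * m
        out.append(m)
    return out


def spread_deviation(prices_a, prices_b, alpha=0.1):
    """Log-price spread of a pair minus its EWMA (hypothetical signal)."""
    spread = [math.log(a) - math.log(b) for a, b in zip(prices_a, prices_b)]
    return [s - m for s, m in zip(spread, ewma(spread, alpha))]


# Invented prices: the pair moves together, then stock A jumps.
a = [50.0, 50.5, 51.0, 51.2, 55.0]
b = [25.0, 25.2, 25.5, 25.6, 25.7]
print([round(d, 4) for d in spread_deviation(a, b, alpha=0.2)])
```

A larger `alpha` tracks the spread more closely and reacts faster; a smaller one gives a steadier reference level against which the last-day jump stands out.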

…

Therefore, it is not necessary to be overly concerned about which set of events to use in the correlation analysis as a screen for good risk-controlled candidate pairs. Events in trading volume series provide information sometimes not identified (by turning point analysis) in price series. Volume patterns do not directly affect price spreads but volume spurts are a useful warning that a stock may be subject to unusual trading activity and that price development may therefore not be as characterized in statistical models that have been estimated on average recent historical price series. In historical analysis, flags of unusual activity are extremely important in the evaluation of, for example, simulation results.

[Figure 2.8: Adjusted close price trace (General Motors) with 20 percent turning points identified]

Table 2.1 Event return summary for Chrysler–GM

| Criterion | # Events | Return Correlation |
|---|---|---|
| daily | 332 | 0.53 |
| 30% move | 22 | 0.75 |
| 25% move | 26 | 0.73 |
| 20% move | 33 | 0.77 |
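Percentage-move events like those in Table 2.1 can be approximated with a simple zigzag rule: a running extreme price is confirmed as a turning point once the price reverses from it by at least the threshold (20 percent in Figure 2.8). The sketch implements that rule, then correlates the two stocks' returns between consecutive turning points of the first stock. The zigzag rule, using one stock's turning points for both series, and the invented prices are all assumptions; the book's exact event definition may differ.

```python
import math


def turning_points(prices, threshold):
    """Indices of confirmed peaks and troughs (zigzag rule).

    A running extreme becomes a turning point once the price reverses from it
    by at least `threshold` (0.20 means 20 percent). The last, unconfirmed
    extreme is not reported.
    """
    points = []
    trend = 0            # 0 = undecided, +1 = rising, -1 = falling
    hi = lo = ext = 0    # running high, running low, current extreme
    for i, p in enumerate(prices):
        if trend == 0:
            if p > prices[hi]:
                hi = i
            if p < prices[lo]:
                lo = i
            if p >= prices[lo] * (1 + threshold):
                points.append(lo)          # running low confirmed as a trough
                trend, ext = 1, i
            elif p <= prices[hi] * (1 - threshold):
                points.append(hi)          # running high confirmed as a peak
                trend, ext = -1, i
        elif trend == 1:
            if p > prices[ext]:
                ext = i                    # new high in the up-move
            elif p <= prices[ext] * (1 - threshold):
                points.append(ext)         # confirmed peak
                trend, ext = -1, i
        else:
            if p < prices[ext]:
                ext = i                    # new low in the down-move
            elif p >= prices[ext] * (1 + threshold):
                points.append(ext)         # confirmed trough
                trend, ext = 1, i
    return points


def event_returns(prices, points):
    """Simple returns between consecutive turning points."""
    return [prices[b] / prices[a] - 1 for a, b in zip(points, points[1:])]


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


gm = [40.0, 52.0, 41.0, 58.0, 45.0, 60.0]         # invented prices
chrysler = [30.0, 37.0, 31.0, 40.0, 35.0, 43.0]   # invented prices
pts = turning_points(gm, 0.20)
print(pts, correlation(event_returns(gm, pts), event_returns(chrysler, pts)))
```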

pages: 327 words: 103,336

**Everything Is Obvious: \*Once You Know the Answer** by Duncan J. Watts

active measures, affirmative action, Albert Einstein, Amazon Mechanical Turk, Black Swan, business cycle, butterfly effect, Carmen Reinhart, Cass Sunstein, clockwork universe, cognitive dissonance, coherent worldview, collapse of Lehman Brothers, complexity theory, correlation does not imply causation, crowdsourcing, death of newspapers, discovery of DNA, East Village, easy for humans, difficult for computers, edge city, en.wikipedia.org, Erik Brynjolfsson, framing effect, Geoffrey West, Santa Fe Institute, George Santayana, happiness index / gross national happiness, high batting average, hindsight bias, illegal immigration, industrial cluster, interest rate swap, invention of the printing press, invention of the telescope, invisible hand, Isaac Newton, Jane Jacobs, Jeff Bezos, Joseph Schumpeter, Kenneth Rogoff, lake wobegon effect, Laplace demon, Long Term Capital Management, loss aversion, medical malpractice, meta analysis, meta-analysis, Milgram experiment, natural language processing, Netflix Prize, Network effects, oil shock, packet switching, pattern recognition, performance metric, phenotype, Pierre-Simon Laplace, planetary scale, prediction markets, pre–internet, RAND corporation, random walk, RFID, school choice, Silicon Valley, social intelligence, statistical model, Steve Ballmer, Steve Jobs, Steve Wozniak, supply-chain management, The Death and Life of Great American Cities, the scientific method, The Wisdom of Crowds, too big to fail, Toyota Production System, ultimatum game, urban planning, Vincenzo Peruggia: Mona Lisa, Watson beat the top human players on Jeopardy!, X Prize

Nevertheless, as a speculative exercise, we tested a range of plausible assumptions, each corresponding to a different hypothetical “influencer-based” marketing campaign, and measured their return on investment using the same statistical model as before. What we found was surprising even to us: Even though the Kim Kardashians of the world were indeed more influential than average, they were so much more expensive that they did not provide the best value for the money. Rather, it was what we called ordinary influencers, meaning individuals who exhibit average or even less-than-average influence, who often proved to be the most cost-effective means to disseminate information. CIRCULAR REASONING AGAIN Before you rush out to short stock in Kim Kardashian, I should emphasize that we didn’t actually run the experiment that we imagined. Even though we were studying data from the real world, not a computer simulation, our statistical models still made a lot of assumptions. Assuming, for example, that our hypothetical marketer could persuade a few thousand ordinary influencers to tweet about their product, it is not at all obvious that their followers would respond as favorably as they do to normal tweets.

…

Next, we compared the performance of these two polls with the Vegas sports betting market—one of the oldest and most popular betting markets in the world—as well as with another prediction market, TradeSports. And finally, we compared the prediction of both the markets and the polls against two simple statistical models. The first model relied only on the historical probability that home teams win—which they do 58 percent of the time—while the second model also factored in the recent win-loss records of the two teams in question. In this way, we set up a six-way comparison between different prediction methods—two statistical models, two markets, and two polls.6 Given how different these methods were, what we found was surprising: All of them performed about the same. To be fair, the two prediction markets performed a little better than the other methods, which is consistent with the theoretical argument above.
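The two baseline models can be sketched in a few lines. The first uses only the 58 percent home-win frequency quoted above. The text does not say how the second model combined the recent win-loss records, so the adjustment in `model_with_records` (shift the baseline by half the difference in recent winning percentages) is a hypothetical form chosen for illustration, as are the example games.

```python
HOME_WIN_RATE = 0.58   # historical home-team win frequency quoted in the text


def model_home_advantage(game):
    """Model 1: every game gets the same home-win probability."""
    return HOME_WIN_RATE


def model_with_records(game, weight=0.5):
    """Model 2 (hypothetical form): shift the baseline by the difference in the
    teams' recent winning percentages, clipped to [0.01, 0.99]."""
    diff = game["home_recent_pct"] - game["away_recent_pct"]
    return min(max(HOME_WIN_RATE + weight * diff, 0.01), 0.99)


def accuracy(model, games):
    """Share of games where the favored side (p > 0.5 means home) actually won."""
    hits = sum((model(g) > 0.5) == g["home_won"] for g in games)
    return hits / len(games)


# Invented games for illustration.
games = [
    {"home_recent_pct": 0.6, "away_recent_pct": 0.4, "home_won": True},
    {"home_recent_pct": 0.3, "away_recent_pct": 0.8, "home_won": False},
]
for model in (model_home_advantage, model_with_records):
    print(model.__name__, accuracy(model, games))
```

Model 1 always picks the home team, so its accuracy can never exceed the home-win rate by much; any gain from Model 2 has to come from games where recent form overrides home advantage.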

…

Indeed, an entire field of research called sabermetrics has developed specifically for the purpose of analyzing baseball statistics, even spawning its own journal, the Baseball Research Journal. One might think, therefore, that prediction markets, with their far greater capacity to factor in different sorts of information, would outperform simplistic statistical models by a much wider margin for baseball than they do for football. But that turns out not to be true either. We compared the predictions of the Las Vegas sports betting markets over nearly twenty thousand Major League baseball games played from 1999 to 2006 with a simple statistical model based again on home-team advantage and the recent win-loss records of the two teams. This time, the difference between the two was even smaller—in fact, the performance of the market and the model were indistinguishable. In spite of all the statistics and analysis, in other words, and in spite of the absence of meaningful salary caps in baseball and the resulting concentration of superstar players on teams like the New York Yankees and Boston Red Sox, the outcomes of baseball games are even closer to random events than football games.

pages: 829 words: 186,976

**The Signal and the Noise: Why So Many Predictions Fail-But Some Don't** by Nate Silver

"Robert Solow", airport security, availability heuristic, Bayesian statistics, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, big-box store, Black Swan, Broken windows theory, business cycle, buy and hold, Carmen Reinhart, Claude Shannon: information theory, Climategate, Climatic Research Unit, cognitive dissonance, collapse of Lehman Brothers, collateralized debt obligation, complexity theory, computer age, correlation does not imply causation, Credit Default Swap, credit default swaps / collateralized debt obligations, cuban missile crisis, Daniel Kahneman / Amos Tversky, diversification, Donald Trump, Edmond Halley, Edward Lorenz: Chaos theory, en.wikipedia.org, equity premium, Eugene Fama: efficient market hypothesis, everywhere but in the productivity statistics, fear of failure, Fellow of the Royal Society, Freestyle chess, fudge factor, George Akerlof, global pandemic, haute cuisine, Henri Poincaré, high batting average, housing crisis, income per capita, index fund, information asymmetry, Intergovernmental Panel on Climate Change (IPCC), Internet Archive, invention of the printing press, invisible hand, Isaac Newton, James Watt: steam engine, John Nash: game theory, John von Neumann, Kenneth Rogoff, knowledge economy, Laplace demon, locking in a profit, Loma Prieta earthquake, market bubble, Mikhail Gorbachev, Moneyball by Michael Lewis explains big data, Monroe Doctrine, mortgage debt, Nate Silver, negative equity, new economy, Norbert Wiener, PageRank, pattern recognition, pets.com, Pierre-Simon Laplace, prediction markets, Productivity paradox, random walk, Richard Thaler, Robert Shiller, Rodney Brooks, Ronald Reagan, Saturday Night Live, savings glut, security theater, short selling, Skype, statistical model, Steven Pinker, The Great Moderation, The Market for Lemons, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, too big to fail, transaction costs, transfer pricing, University of East Anglia, Watson beat the top human players on Jeopardy!, wikimedia commons

Moreover, even the aggregate economic forecasts have been quite poor in any real-world sense, so there is plenty of room for progress. Most economists rely on their judgment to some degree when they make a forecast, rather than just take the output of a statistical model as is. Given how noisy the data is, this is probably helpful. A study (note 62) by Stephen K. McNees, the former vice president of the Federal Reserve Bank of Boston, found that judgmental adjustments to statistical forecasting methods resulted in forecasts that were about 15 percent more accurate. The idea that a statistical model would be able to “solve” the problem of economic forecasting was somewhat in vogue during the 1970s and 1980s, when computers came into wider use. But as was the case in other fields, like earthquake forecasting during that time period, improved technology did not cover for the lack of theoretical understanding about the economy; it only gave economists faster and more elaborate ways to mistake noise for a signal.

…

McNees, “The Role of Judgment in Macroeconomic Forecasting Accuracy,” International Journal of Forecasting, 6, no. 3, pp. 287–99, October 1990. http://www.sciencedirect.com/science/article/pii/016920709090056H.

63. About the only economist I am aware of who relies solely on statistical models without applying any adjustments to them is Ray C. Fair of Yale. I looked at the accuracy of the forecasts from Fair’s model, which have been published regularly since 1984. They aren’t bad in some cases: the GDP and inflation forecasts from Fair’s model have been roughly as good as those of the typical judgmental forecaster. However, the model’s unemployment forecasts have always been very poor, and its performance has been deteriorating recently as it considerably underestimated the magnitude of the recent recession while overstating the prospects for recovery. One problem with statistical models is that they tend to perform well until one of their assumptions is violated and they encounter a new situation, in which case they may produce very inaccurate forecasts.

…

This explanation becomes less credible, however, when the forecaster does not have a history of successful predictions and when the magnitude of his error is larger. In these cases, it is much more likely that the fault lies with the forecaster’s model of the world and not with the world itself. In the instance of CDOs, the ratings agencies had no track record at all: these were new and highly novel securities, and the default rates claimed by S&P were not derived from historical data but instead were assumptions based on a faulty statistical model. Meanwhile, the magnitude of their error was enormous: AAA-rated CDOs were two hundred times more likely to default in practice than they were in theory. The ratings agencies’ shot at redemption would be to admit that the models had been flawed and the mistake had been theirs. But at the congressional hearing, they shirked responsibility and claimed to have been unlucky. They blamed an external contingency: the housing bubble.

**
Data Mining: Concepts and Techniques
** by
Jiawei Han,
Micheline Kamber,
Jian Pei

bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

Statistics

Statistics studies the collection, analysis, interpretation or explanation, and presentation of data. Data mining has an inherent connection with statistics. A statistical model is a set of mathematical functions that describe the behavior of the objects in a target class in terms of random variables and their associated probability distributions. Statistical models are widely used to model data and data classes. For example, in data mining tasks like data characterization and classification, statistical models of target classes can be built. In other words, such statistical models can be the outcome of a data mining task. Alternatively, data mining tasks can be built on top of statistical models. For example, we can use statistics to model noise and missing data values. Then, when mining patterns in a large data set, the data mining process can use the model to help identify and handle noisy or missing values in the data.

…

Thus, the Gaussian distribution gD can be used to model the normal data, that is, most of the data points in the data set. For each object y in region R, we can estimate $f_{gD}(y)$, the probability that this point fits the Gaussian distribution. Because $f_{gD}(y)$ is very low, y is unlikely to have been generated by the Gaussian model, and thus is an outlier. The effectiveness of statistical methods depends heavily on whether the assumptions made for the statistical model hold true for the given data. There are many kinds of statistical models. For example, the statistical models used in these methods may be parametric or nonparametric. Statistical methods for outlier detection are discussed in detail in Section 12.3.

Proximity-Based Methods

Proximity-based methods assume that an object is an outlier if the nearest neighbors of the object are far away in feature space, that is, the proximity of the object to its neighbors significantly deviates from the proximity of most of the other objects to their neighbors in the same data set.
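The Gaussian approach to outlier detection described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: it assumes univariate data, fits gD with the sample mean and population standard deviation, and flags any point whose density under gD falls below a threshold. The names `fit_gaussian`, `gaussian_density`, and `find_outliers`, the sample readings, and the threshold of 0.01 are all hypothetical choices made for this example.

```python
import math
import statistics


def fit_gaussian(values):
    """Estimate the parameters of gD from the data (mean and population std. dev.)."""
    return statistics.mean(values), statistics.pstdev(values)


def gaussian_density(y, mu, sigma):
    """Density of y under the fitted Gaussian gD: low values mean y fits the model poorly."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def find_outliers(values, density_threshold):
    """Return the points that are unlikely to have been generated by the fitted Gaussian."""
    mu, sigma = fit_gaussian(values)
    return [y for y in values if gaussian_density(y, mu, sigma) < density_threshold]


# Hypothetical sensor readings: nine typical values and one suspicious one.
readings = [24.0, 24.5, 25.0, 25.5, 26.0, 24.8, 25.2, 24.9, 25.1, 40.0]
print(find_outliers(readings, density_threshold=0.01))  # [40.0]
```

Note that the outlier itself inflates the fitted standard deviation. This is one concrete way in which, as the text warns, the method's effectiveness depends on whether the model's assumptions hold for the data.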

…

Then, when mining patterns in a large data set, the data mining process can use the model to help identify and handle noisy or missing values in the data. Statistics research develops tools for prediction and forecasting using data and statistical models. Statistical methods can be used to summarize or describe a collection of data. Basic statistical descriptions of data are introduced in Chapter 2. Statistics is useful for mining various patterns from data as well as for understanding the underlying mechanisms generating and affecting the patterns. Inferential statistics (or predictive statistics) models data in a way that accounts for randomness and uncertainty in the observations and is used to draw inferences about the process or population under investigation. Statistical methods can also be used to verify data mining results. For example, after a classification or prediction model is mined, the model should be verified by statistical hypothesis testing.
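As one concrete illustration of verifying a mined classifier by hypothesis testing, the sketch below runs a one-sided exact binomial test: if the classifier were no better than some baseline accuracy $p_0$, how likely would it be to get at least this many held-out examples right by chance? The excerpt does not prescribe a particular test; the binomial test, the function name `binomial_p_value`, and the counts in the example are assumptions chosen for illustration.

```python
from math import comb


def binomial_p_value(correct, total, baseline_accuracy):
    """One-sided exact binomial test.

    Returns P(X >= correct) where X ~ Binomial(total, baseline_accuracy): the
    probability of doing at least this well by chance if the true accuracy
    were only the baseline.
    """
    p = baseline_accuracy
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical result: 70 of 100 held-out examples classified correctly,
# compared against a coin-flip baseline of 50%.
p_value = binomial_p_value(70, 100, 0.5)
print(f"p-value = {p_value:.2e}")
```

A small p-value (conventionally below 0.05) suggests that the observed accuracy is hard to explain by the baseline alone; it does not, by itself, show that the held-out data is representative of future data.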

pages: 174 words: 56,405

**
Machine Translation
** by
Thierry Poibeau

AltaVista, augmented reality, call centre, Claude Shannon: information theory, cloud computing, combinatorial explosion, crowdsourcing, easy for humans, difficult for computers, en.wikipedia.org, Google Glasses, information retrieval, Internet of things, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, natural language processing, Necker cube, Norbert Wiener, RAND corporation, Robert Mercer, Skype, speech recognition, statistical model, technological singularity, Turing test, wikimedia commons

The recent developments we have described in this section have, however, helped improve the IBM models, which can still be considered the current state of the art in machine translation.

Introduction of Linguistic Information into Statistical Models

Statistical translation models, despite their increasing complexity to better fit language specificities, have not solved all the difficulties encountered. In fact, bilingual corpora, even large ones, are at times still insufficient to properly cover rare or complex linguistic phenomena. One solution, then, is to integrate more information of a linguistic nature into the machine translation system to better represent the relations between words (syntax) and their meanings (semantics).

Alignment Models Accounting for Syntax

The statistical models described so far are all direct translation systems: they search for equivalences between the source language and the target language at word level, or, at best, they take into consideration sequences of words that are not necessarily linguistically coherent.

…

On the one hand, the analysis of existing translations and their generalization according to various linguistic strategies can be used as a reservoir of knowledge for future translations. This is known as example-based translation, because in this approach previous translations are considered examples for new translations. On the other hand, with the increasing amount of translations available on the Internet, it is now possible to directly design statistical models for machine translation. This approach, known as statistical machine translation, is the most popular today. Unlike a translation memory, which can be relatively small, automatic processing presumes the availability of an enormous amount of data. Robert Mercer, one of the pioneers of statistical translation (note 1), proclaimed: “There is no data like more data.” In other words, for Mercer as well as followers of the statistical approach, the best strategy for developing a system consists in accumulating as much data as possible.

…

Nonetheless, toward the end of the 1980s, the statistical approach based on sentence alignment at the word level led to remarkable progress for machine translation. This approach naturally takes into account the statistical nature of language, which means that the approach focuses on the most frequent patterns in a language and, despite its limitations, is able to produce acceptable translations for a significant number of simple sentences. In certain cases, statistical models can also identify idioms thanks to asymmetric alignments (one word from the source language aligned with several words from the target language, for example), which means they can also overcome the word-for-word limitation. In the following section, we will examine several lexical alignment models developed toward the end of the 1980s and the beginning of the 1990s. The goal of this approach is to use very large bilingual corpora to automatically extract bilingual lexicons.
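The excerpt does not spell out a specific alignment model, but the best known of the lexical alignment models from that period is IBM Model 1, which estimates word translation probabilities from sentence-aligned text with the expectation-maximization (EM) algorithm. The sketch below is a simplified version under stated assumptions: it omits the special NULL source word, runs a fixed number of iterations, and uses a tiny invented German-English corpus. The function name `train_ibm_model1` is hypothetical.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=20):
    """Estimate t[(e, f)] = P(target word e | source word f) with EM (IBM Model 1, no NULL word).

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    """
    target_vocab = {e for _, target in corpus for e in target}
    # Start from uniform translation probabilities.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, f) alignments
        total = defaultdict(float)  # expected count of f being aligned to anything
        # E-step: share each target word among the source words of its sentence,
        # in proportion to the current translation probabilities.
        for source, target in corpus:
            for e in target:
                norm = sum(t[(e, f)] for f in source)
                for f in source:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: turn the expected counts into new probabilities P(e | f).
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(corpus)
for f in ["das", "Haus", "Buch", "ein"]:
    candidates = [e for (e, src) in t if src == f]
    best = max(candidates, key=lambda e: t[(e, f)])
    print(f, "->", best, round(t[(best, f)], 3))
```

Even though no sentence pair says which word translates which, the repeated co-occurrences let EM shift probability mass toward the consistent pairings, which is how such models extract a bilingual lexicon from a parallel corpus.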

pages: 204 words: 58,565

**
Keeping Up With the Quants: Your Guide to Understanding and Using Analytics
** by
Thomas H. Davenport,
Jinho Kim

Black-Scholes formula, business intelligence, business process, call centre, computer age, correlation coefficient, correlation does not imply causation, Credit Default Swap, en.wikipedia.org, feminist movement, Florence Nightingale: pie chart, forensic accounting, global supply chain, Hans Rosling, hypertext link, invention of the telescope, inventory management, Jeff Bezos, Johannes Kepler, longitudinal study, margin call, Moneyball by Michael Lewis explains big data, Myron Scholes, Netflix Prize, p-value, performance metric, publish or perish, quantitative hedge fund, random walk, Renaissance Technologies, Robert Shiller, self-driving car, sentiment analysis, six sigma, Skype, statistical model, supply-chain management, text mining, the scientific method, Thomas Davenport

Data analysis

Key Software Vendors for Different Analysis Types (listed alphabetically)
REPORTING SOFTWARE: BOARD International; IBM Cognos; Information Builders WebFOCUS; Oracle Business Intelligence (including Hyperion); Microsoft Excel/SQL Server/SharePoint; MicroStrategy; Panorama; SAP BusinessObjects
INTERACTIVE VISUAL ANALYTICS: QlikTech QlikView; Tableau; TIBCO Spotfire
QUANTITATIVE OR STATISTICAL MODELING: IBM SPSS; R (an open-source software package); SAS

While all of the listed reporting software vendors also have capabilities for graphical display, some vendors focus specifically on interactive visual analytics, or the use of visual representations of data and reporting. Such tools are often used simply to graph data and for data discovery: understanding the distribution of the data, identifying outliers (data points with unexpected values), and spotting visual relationships between variables. So we’ve listed these as a separate category. We’ve also listed key vendors of software for the other category of analysis, which we’ll call quantitative or statistical modeling. In that category, you’re trying to use statistics to understand the relationships between variables and to make inferences from your sample to a larger population.

…

However, there are circumstances in which these “black box” approaches to analysis can greatly leverage the time and productivity of human analysts. In big-data environments, where the data just keeps coming in large volumes, it may not always be possible for humans to create hypotheses before sifting through the data. In the context of placing digital ads on publishers’ sites, for example, decisions need to be made in thousandths of a second by automated decision systems, and the firms doing this work must generate several thousand statistical models per week. Clearly this type of analysis can’t involve a lot of human hypothesizing and reflection on results, and machine learning is absolutely necessary. But for the most part, we’d advise sticking to hypothesis-driven analysis and the steps and sequence in this book.

The Modeling (Variable Selection) Step

A model is a purposefully simplified representation of the phenomenon or problem.

…

The software vendors for this type of data tend to be different from the reporting software vendors, though the two categories are blending a bit over time. Microsoft Excel, for example, perhaps the most widely used analytical software tool in the world (though most people think of it as a spreadsheet tool), can do some statistical analysis (and visual analytics) as well as reporting, but it’s not the most robust statistical software if you have a lot of data or a complex statistical model to build, so it’s not listed in that category. Excel’s usage for analytics in the corporate environment is frequently augmented by other Microsoft products, including SQL Server (primarily a database tool with some analytical functionality) and SharePoint (primarily a collaboration tool, with some analytical functionality).

Types of Models

There are a variety of model types that analysts and their organizations use to think analytically and make data-based decisions.

pages: 276 words: 81,153

**
Outnumbered: From Facebook and Google to Fake News and Filter-Bubbles – the Algorithms That Control Our Lives
** by
David Sumpter

affirmative action, Bernie Sanders, correlation does not imply causation, crowdsourcing, don't be evil, Donald Trump, Elon Musk, Filter Bubble, Google Glasses, illegal immigration, Jeff Bezos, job automation, Kenneth Arrow, Loebner Prize, Mark Zuckerberg, meta analysis, meta-analysis, Minecraft, Nate Silver, natural language processing, Nelson Mandela, p-value, prediction markets, random walk, Ray Kurzweil, Robert Mercer, selection bias, self-driving car, Silicon Valley, Skype, Snapchat, speech recognition, statistical model, Stephen Hawking, Steven Pinker, The Signal and the Noise by Nate Silver, traveling salesman, Turing test

They don’t talk directly to people to get a sense of the feelings and emotions involved, an approach that would be considered subjective. Instead, Mona described to me a culture where colleagues judged each other on how advanced their mathematical techniques were. They believed there was a direct trade-off between the quality of statistical results and the ease with which they can be communicated. If FiveThirtyEight offered a purely statistical model of the polls then the socio-economic background of their statisticians wouldn’t be relevant. But they don’t offer a purely statistical model. Such a model would have come out strongly for Clinton. Instead, they use a combination of their skills as forecasters and the underlying numbers. Work environments consisting of people with the same background and ideas are typically less likely to perform as well on difficult tasks, such as academic research and running a successful business.12 It is difficult for a bunch of people who all have the same background to identify all of the complex factors involved in predicting the future.

…

Google’s search engine was making racist autocomplete suggestions; Twitterbots were spreading fake news; Stephen Hawking was worried about artificial intelligence; far-right groups were living in algorithmically created filter-bubbles; Facebook was measuring our personalities, and these were being exploited to target voters. One after another, the stories of the dangers of algorithms accumulated. Even the mathematicians’ ability to make predictions was called into question as statistical models got both Brexit and Trump wrong. Stories about the maths of football, love, weddings, graffiti and other fun things were suddenly replaced by the maths of sexism, hate, dystopia and embarrassing errors in opinion poll calculations. When I reread the scientific article on Banksy, a bit more carefully this time, I found that very little new evidence was presented about his identity. While the researchers mapped out the precise position of 140 artworks, they only investigated the addresses of one single suspect.

…

I decided to find out in the only way I knew how: by looking at the data, computing the statistics and doing the maths.

CHAPTER TWO
Make Some Noise

After the mathematical unmasking of Banksy had sunk in, I realised that I had somehow missed the sheer scale of the change that algorithms were making to our society. But let me be clear. I certainly hadn’t missed the development of the mathematics. Machine learning, statistical models and artificial intelligence are all things I actively research and talk about with my colleagues every day. I read the latest articles and keep up to date with the biggest developments. But I was concentrating on the scientific side of things: looking at how the algorithms work in the abstract. I had failed to think seriously about the consequences of their usage. I hadn’t thought about how the tools I was helping to develop were changing society.

pages: 301 words: 89,076

**
The Globotics Upheaval: Globalisation, Robotics and the Future of Work
** by
Richard Baldwin

agricultural Revolution, Airbnb, AltaVista, Amazon Web Services, augmented reality, autonomous vehicles, basic income, business process, business process outsourcing, call centre, Capital in the Twenty-First Century by Thomas Piketty, Cass Sunstein, commoditize, computer vision, Corn Laws, correlation does not imply causation, Credit Default Swap, David Ricardo: comparative advantage, declining real wages, deindustrialization, deskilling, Donald Trump, Douglas Hofstadter, Downton Abbey, Elon Musk, Erik Brynjolfsson, facts on the ground, future of journalism, future of work, George Gilder, Google Glasses, Google Hangouts, hiring and firing, impulse control, income inequality, industrial robot, intangible asset, Internet of things, invisible hand, James Watt: steam engine, Jeff Bezos, job automation, knowledge worker, laissez-faire capitalism, low skilled workers, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, manufacturing employment, Mark Zuckerberg, mass immigration, mass incarceration, Metcalfe’s law, new economy, optical character recognition, pattern recognition, Ponzi scheme, post-industrial society, post-work, profit motive, remote working, reshoring, ride hailing / ride sharing, Robert Gordon, Robert Metcalfe, Ronald Reagan, Second Machine Age, self-driving car, side project, Silicon Valley, Skype, Snapchat, social intelligence, sovereign wealth fund, standardized shipping container, statistical model, Stephen Hawking, Steve Jobs, supply-chain management, TaskRabbit, telepresence, telepresence robot, telerobotics, Thomas Malthus, trade liberalization, universal basic income

The chore is to identify which features of the digitalized speech data are most useful when making an educated guess as to the corresponding word. To tackle this chore, the computer scientists set up a “blank slate” statistical model. It is a blank slate in the sense that every feature of the speech data is allowed to be, in principle, an important feature in the guessing process. What they are looking for is how to weight each aspect of the speech data when trying to find the word it is associated with. The revolutionary thing about machine learning is that the scientists don’t fill in the blanks. They don’t write down the weights in the statistical model. Instead, they write a set of step-by-step instructions for how the computer should fill in the blanks itself. The human-written instructions tell the machine how to learn about which features of the sound data are the important ones.

…

That is to say, it identifies the features of the speech data that are useful in predicting the corresponding words. The scientists then make the statistical model take an exam. They feed it a fresh set of spoken words and ask it to predict the written words that they correspond to. This is called the “testing data set.” Usually, the model—which is also called an “algorithm”—is not good enough to be released “into the wild,” so the computer scientists do some sophisticated trial and error of their own by manually tweaking the computer program that is used to choose the weights. After what can be a long sequence of iterations like this, and after the statistical model has achieved a sufficiently high degree of accuracy, the new language model graduates to the next level. Apple didn’t immediately use this new algorithm for translation.
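The train-then-test procedure described above can be shown in miniature. The sketch below uses a perceptron, about the simplest learner that fills in its own weights from labelled examples, as a stand-in for the far larger models used in speech recognition: humans write only the update rule, the weights come from the training data, and accuracy is then checked on a separate testing data set. The two-number "features", the data, and the function names are invented for illustration.

```python
def train_perceptron(examples, epochs=20, learning_rate=0.1):
    """Learn one weight per feature plus a bias from (features, label) pairs, label in {-1, +1}."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            if label * activation <= 0:
                # Wrong (or undecided) guess: nudge the weights toward the correct label.
                weights = [w + learning_rate * label * x for w, x in zip(weights, features)]
                bias += learning_rate * label
    return weights, bias


def predict(weights, bias, features):
    return 1 if sum(w * x for w, x in zip(weights, features)) + bias > 0 else -1


def accuracy(weights, bias, examples):
    return sum(predict(weights, bias, f) == label for f, label in examples) / len(examples)


# Hypothetical toy data: the label is +1 when the first feature exceeds the second.
training_set = [([2.0, 1.0], 1), ([3.0, 0.5], 1), ([1.0, 2.0], -1),
                ([0.5, 3.0], -1), ([2.5, 1.5], 1), ([1.5, 2.5], -1)]
testing_set = [([4.0, 1.0], 1), ([1.0, 4.0], -1), ([3.0, 2.0], 1), ([2.0, 3.0], -1)]

weights, bias = train_perceptron(training_set)
print("learned weights:", weights, "bias:", bias)
print("test accuracy:", accuracy(weights, bias, testing_set))
```

The testing set plays the role of the "exam": none of its examples were seen during training, so its accuracy is a fairer estimate of how the model will do "in the wild" than its accuracy on the training data.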

…

We understand how our rider thinks—how we, for example, do arithmetic, algebra, and archery. We haven’t a clue as to how our elephant thinks—how we, for example, recognize a cat or keep our balance when running over hill and dale. A form of AI called “machine learning” solved the paradox by changing the way computers are programmed. With machine learning, humans help the computer (the “machine” part) estimate a very large statistical model that the computer then uses to guess the solution to a particular problem (the “learning” part). Thanks to mind-blowing advances in computing power and access to hallucinatory amounts of data, white-collar robots trained by machine learning routinely achieve human-level performance on specific guessing tasks, like recognizing speech. With machine-learning-trained algorithms, computers started to think, to cognate.

pages: 400 words: 94,847

**
Reinventing Discovery: The New Era of Networked Science
** by
Michael Nielsen

Albert Einstein, augmented reality, barriers to entry, bioinformatics, Cass Sunstein, Climategate, Climatic Research Unit, conceptual framework, dark matter, discovery of DNA, Donald Knuth, double helix, Douglas Engelbart, en.wikipedia.org, Erik Brynjolfsson, fault tolerance, Fellow of the Royal Society, Firefox, Freestyle chess, Galaxy Zoo, Internet Archive, invisible hand, Jane Jacobs, Jaron Lanier, Johannes Kepler, Kevin Kelly, Magellanic Cloud, means of production, medical residency, Nicholas Carr, P = NP, publish or perish, Richard Feynman, Richard Stallman, selection bias, semantic web, Silicon Valley, Silicon Valley startup, Simon Singh, Skype, slashdot, social intelligence, social web, statistical model, Stephen Hawking, Stewart Brand, Ted Nelson, The Death and Life of Great American Cities, The Nature of the Firm, The Wisdom of Crowds, University of East Anglia, Vannevar Bush, Vernor Vinge

Might it be that the statistical models contain more truth than our conventional theories of language, with their notions of verb, noun, and adjective, subjects and objects, and so on? Or perhaps the models contain a different kind of truth, in part complementary, and in part overlapping, with conventional theories of language? Maybe we could develop a better theory of language by combining the best insights from the conventional approach and the approach based on statistical modeling into a single, unified explanation? Unfortunately, we don’t yet know how to make such unified theories. But it’s stimulating to speculate that nouns and verbs, subjects and objects, and all the other paraphernalia of language are really emergent properties whose existence can be deduced from statistical models of language.

…

The program would also examine the corpus to figure out how words moved around in the sentence, observing, for example, that “hola” and “hello” tend to be in the same parts of the sentence, while other words get moved around more. Repeating this for every pair of words in the Spanish and English languages, their program gradually built up a statistical model of translation—an immensely complex model, but nonetheless one that can be stored on a modern computer. I won’t describe the models they used in complete detail here, but the hola-hello example gives you the flavor. Once they had analyzed the corpus and built up their statistical model, they used that model to translate new texts. To translate a Spanish sentence, the idea was to find the English sentence that, according to the model, had the highest probability. That high-probability sentence would be output as the translation. Frankly, when I first heard about statistical machine translation I thought it didn’t sound very promising.
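The decision rule in this passage, output the English sentence that the model rates most probable, can be made concrete. In the classic IBM formulation the score is split into a language model $P(e)$ and a translation model $P(f \mid e)$, and the output is $\hat{e} = \arg\max_e P(e)\,P(f \mid e)$. The sketch below assumes that formulation, hard-codes invented probabilities for a single Spanish sentence, and scores a short candidate list instead of searching the huge space of English sentences that a real decoder explores. Every name and number in it is hypothetical.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
LANGUAGE_MODEL = {  # P(e): how fluent the English sentence is
    "hello friend": 0.020,
    "friend hello": 0.001,
    "hello world": 0.030,
}
TRANSLATION_MODEL = {  # P(f | e): how well e explains the Spanish sentence f
    ("hola amigo", "hello friend"): 0.30,
    ("hola amigo", "friend hello"): 0.30,
    ("hola amigo", "hello world"): 0.01,
}


def score(spanish, english):
    """log P(e) + log P(f | e); -inf if either model gives zero probability."""
    lm = LANGUAGE_MODEL.get(english, 0.0)
    tm = TRANSLATION_MODEL.get((spanish, english), 0.0)
    if lm == 0.0 or tm == 0.0:
        return -math.inf
    return math.log(lm) + math.log(tm)


def translate(spanish, candidates):
    """Return the candidate English sentence with the highest model probability."""
    return max(candidates, key=lambda english: score(spanish, english))


print(translate("hola amigo", list(LANGUAGE_MODEL)))  # hello friend
```

In this toy setup the translation model alone cannot choose between the two word orders, and the language model alone would prefer the unrelated "hello world"; only their product picks the sentence that is both fluent and faithful. Log probabilities are summed rather than raw probabilities multiplied to avoid numerical underflow on long sentences.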

…

But whereas Darwin’s theory of evolution can be summed up in a few sentences, and Einstein’s general theory of relativity can be expressed in a single equation, these theories of translation are expressed in models with billions of parameters. You might object that such a statistical model doesn’t seem much like a conventional scientific explanation, and you’d be right: it’s not an explanation in the conventional sense. But perhaps it should be considered instead as a new kind of explanation. Ordinarily, we judge explanations in part by their ability to predict new phenomena. In the case of translation, that means accurately translating never-before-seen sentences. And so far, at least, the statistical translation models do a better job of that than any conventional theory of language. It’s telling that a model that doesn’t even understand the noun-verb distinction can outperform our best linguistic models. At the least we should take seriously the idea that these statistical models express truths not found in more conventional explanations of language translation.

pages: 354 words: 26,550

**
High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems
** by
Irene Aldridge

algorithmic trading, asset allocation, asset-backed security, automated trading system, backtesting, Black Swan, Brownian motion, business cycle, business process, buy and hold, capital asset pricing model, centralized clearinghouse, collapse of Lehman Brothers, collateralized debt obligation, collective bargaining, computerized trading, diversification, equity premium, fault tolerance, financial intermediation, fixed income, high net worth, implied volatility, index arbitrage, information asymmetry, interest rate swap, inventory management, law of one price, Long Term Capital Management, Louis Bachelier, margin call, market friction, market microstructure, martingale, Myron Scholes, New Journalism, p-value, paper trading, performance metric, profit motive, purchasing power parity, quantitative trading / quantitative finance, random walk, Renaissance Technologies, risk tolerance, risk-adjusted returns, risk/return, Sharpe ratio, short selling, Small Order Execution System, statistical arbitrage, statistical model, stochastic process, stochastic volatility, systematic trading, trade route, transaction costs, value at risk, yield curve, zero-sum game

Operational risk: the risk of financial losses embedded in daily trading operations
5. Legal risk: the risk of litigation expenses

All current risk measurement approaches fall into four categories:
• Statistical models
• Scalar models
• Scenario analysis
• Causal modeling

Statistical models generate predictions about worst-case future conditions based on past information. The Value-at-Risk (VaR) methodology is the most common statistical risk measurement tool, discussed in detail in the sections that focus on market and liquidity risk estimation. Statistical models are the preferred methodology of risk estimation whenever statistical modeling is feasible. Scalar models establish the maximum foreseeable loss levels as percentages of business parameters, such as revenues, operating costs, and the like. The parameters can be computed as averages of several days, weeks, months, or even years of a particular business variable, depending on the time frame most suitable for each parameter.
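As an illustration of a statistical model that predicts worst-case conditions from past information, here is a minimal historical-simulation VaR. It assumes that the empirical distribution of past returns is a fair guide to future ones and reads the loss at a chosen tail percentile from it, using the interpolation rule of Python's `statistics.quantiles` (other quantile conventions give slightly different numbers). The function name `historical_var` and the return series are hypothetical; the book's own treatment of VaR is in its market and liquidity risk sections.

```python
import statistics


def historical_var(returns, tail_percent=5, portfolio_value=1.0):
    """Historical-simulation Value-at-Risk.

    Returns the loss (as a positive number) at the `tail_percent` percentile
    of past returns, scaled by the portfolio value.
    """
    # quantiles(n=100) returns the 99 percentile cut points of the sample.
    cut_points = statistics.quantiles(returns, n=100)
    worst_case_return = cut_points[tail_percent - 1]
    return -worst_case_return * portfolio_value


# Hypothetical daily returns: 99 observations, five of them losses.
daily_returns = [-0.05, -0.04, -0.03, -0.02, -0.01] + [0.001 * k for k in range(94)]
var_95 = historical_var(daily_returns, tail_percent=5, portfolio_value=1_000_000)
print(f"1-day 95% VaR: {var_95:,.0f}")
```

Because the estimate comes entirely from the sample, it inherits the weakness noted elsewhere for statistical models: if the future contains losses larger than anything in the history, the VaR figure will understate them.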

…

Yet, readers relying on software packages with preconfigured statistical procedures may find the level of detail presented here to be sufficient for quality analysis of trading opportunities. The depth of the statistical content should also be sufficient for readers to understand the models presented throughout the remainder of this book. Readers interested in a more thorough treatment of statistical models may refer to Tsay (2002); Campbell, Lo, and MacKinlay (1997); and Gouriéroux and Jasiak (2001). This chapter begins with a review of the fundamental statistical estimators, moves on to linear dependency identification methods and volatility modeling techniques, and concludes with standard nonlinear approaches for identifying and modeling trading opportunities.

STATISTICAL PROPERTIES OF RETURNS

According to Dacorogna et al. (2001, p. 121), “high-frequency data opened up a whole new field of exploration and brought to light some behaviors that could not be observed at lower frequencies.”

…

CHAPTER 12
Event Arbitrage

With news reported instantly and trades placed on a tick-by-tick basis, high-frequency strategies are now ideally positioned to profit from the impact of announcements on markets. These high-frequency strategies, which trade on the market movements surrounding news announcements, are collectively referred to as event arbitrage. This chapter investigates the mechanics of event arbitrage in the following order:
• Overview of the development process
• Generating a price forecast through statistical modeling of
  • Directional forecasts
  • Point forecasts
• Applying event arbitrage to corporate announcements, industry news, and macroeconomic news
• Documented effects of events on foreign exchange, equities, fixed income, futures, emerging economies, commodities, and REIT markets

DEVELOPING EVENT ARBITRAGE TRADING STRATEGIES

Event arbitrage refers to the group of trading strategies that place trades on the basis of the markets’ reaction to events.

pages: 320 words: 33,385

**
Market Risk Analysis, Quantitative Methods in Finance
** by
Carol Alexander

asset allocation, backtesting, barriers to entry, Brownian motion, capital asset pricing model, constrained optimization, credit crunch, Credit Default Swap, discounted cash flows, discrete time, diversification, diversified portfolio, en.wikipedia.org, fixed income, implied volatility, interest rate swap, market friction, market microstructure, p-value, performance metric, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, risk/return, Sharpe ratio, statistical arbitrage, statistical model, stochastic process, stochastic volatility, Thomas Bayes, transaction costs, value at risk, volatility smile, Wiener process, yield curve, zero-sum game

Chapter 3, Probability and Statistics, covers the probabilistic and statistical models that we use to analyse the evolution of financial asset prices or interest rates. Starting from the basic concepts of a random variable, a probability distribution, quantiles and population and sample moments, we then provide a catalogue of probability distributions. We describe the theoretical properties of each distribution and give examples of practical applications to finance. Stable distributions and kernel estimates are also covered, because they have broad applications to financial risk management. The sections on statistical inference and maximum likelihood lay the foundations for Chapter 4. Finally, we focus on the continuous time and discrete time statistical models for the evolution of financial asset prices and returns, which are further developed in Volume III.

…

The multivariate t distribution has very useful applications which will be described in Volumes II and IV. Its most important market risk modelling applications are to:
• multivariate GARCH modelling,
• generating copulas, and
• simulating asset prices.

I.3.5 INTRODUCTION TO STATISTICAL INFERENCE

A statistical model will predict well only if it is properly specified and its parameter estimates are robust, unbiased and efficient. Unbiased means that the expected value of the estimator is equal to the true model parameter and efficient means that the variance of the estimator is low, i.e. different samples give similar estimates. When we set up a statistical model the implicit assumption is that this is the 'true' model for the population. We estimate the model's parameters from a sample and then use these estimates to infer the values of the 'true' population parameters. With what degree of confidence can we say that the 'true' parameter takes some value such as 0?
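To make the two definitions concrete, the following Python sketch (our own illustration, not taken from the book) simulates repeated sampling from a normal population, an assumption chosen for convenience. For normal data both the sample mean and the sample median are unbiased estimators of the population mean, so their average bias should be close to zero; the sample mean, however, varies less from sample to sample, which is what "efficient" means in the passage. The sample size, number of trials and seed are arbitrary choices.

```python
import random
import statistics


def simulate_estimators(true_mean=0.0, sigma=1.0, n=25, trials=2000, seed=7):
    """Draw `trials` samples of size `n` from N(true_mean, sigma**2) and return
    the empirical (bias, variance) of two estimators of the population mean."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return {
        # Bias: average estimate minus the true parameter (near 0 for both).
        # Variance: spread of the estimates across samples (lower = more efficient).
        "sample mean": (statistics.fmean(means) - true_mean, statistics.pvariance(means)),
        "sample median": (statistics.fmean(medians) - true_mean, statistics.pvariance(medians)),
    }


for name, (bias, variance) in simulate_estimators().items():
    print(f"{name:>13}: bias={bias:+.4f}  variance={variance:.4f}")
```

Theory puts the variance of the sample mean at sigma**2 / n = 0.04 here, and the median's variance should come out noticeably larger, so the mean is the more efficient of the two unbiased estimators.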

…

Using this add-in, we have been able to compute eigenvectors and eigenvalues and perform many other matrix operations that would not be possible otherwise in Excel, except by purchasing software. This matrix.xla add-in is included on the CD-ROM, but readers may also like to download any later versions, currently available free from: http://digilander.libero.it/foxes (e-mail: leovlp@libero.it).

I.3 Probability and Statistics

I.3.1 INTRODUCTION

This chapter describes the probabilistic and statistical models that we use to analyse the evolution of financial asset prices or interest rates. Prices or returns on financial assets, interest rates or their changes, and the value or P&L of a portfolio are some examples of the random variables used in finance. A random variable is a variable whose value could be observed today and in the past, but whose future values are unknown. We may have some idea about the future values, but we do not know exactly which value will be realized in the future.

pages: 250 words: 64,011

**
Everydata: The Misinformation Hidden in the Little Data You Consume Every Day
** by
John H. Johnson

Affordable Care Act / Obamacare, Black Swan, business intelligence, Carmen Reinhart, cognitive bias, correlation does not imply causation, Daniel Kahneman / Amos Tversky, Donald Trump, en.wikipedia.org, Kenneth Rogoff, labor-force participation, lake wobegon effect, Long Term Capital Management, Mercator projection, Mercator projection distort size, especially Greenland and Africa, meta analysis, meta-analysis, Nate Silver, obamacare, p-value, PageRank, pattern recognition, publication bias, QR code, randomized controlled trial, risk-adjusted returns, Ronald Reagan, selection bias, statistical model, The Signal and the Noise by Nate Silver, Thomas Bayes, Tim Cook: Apple, wikimedia commons, Yogi Berra

You collect all the data on every wheat price in the history of humankind, and all the different factors that determine the price of wheat (temperature, feed prices, transportation costs, etc.). First, you need to develop a statistical model to determine what factors have affected the price of wheat in the past and how these various factors relate to one another mathematically. Then, based on that model, you predict the price of wheat for next year.14 The problem is that no matter how big your sample is (even if it’s the full population), and how accurate your statistical model is, there are still unknowns that can cause your forecast to be off: What if a railroad strike doubles the transportation costs? What if Congress passes new legislation capping the price of wheat? What if there’s a genetic mutation that makes wheat grow twice as fast, essentially doubling the world’s supply?

…

As Hovenkamp said, “the plaintiff’s expert had ignored a clear ‘outlier’ in the data.”33 If that outlier data had been excluded—as it arguably should have been—then the results would have shown a clear increase in market share for Conwood. Instead, the conclusion—driven by an extreme observation—showed a decrease. If your conclusions change dramatically by excluding a data point, then that data point is a strong candidate to be an outlier. In a good statistical model, you would expect that you can drop a data point without seeing a substantive difference in the results. It’s something to think about when looking for outliers.

ARE YOU BETTER THAN AVERAGE?

The average American:
• Sleeps more than 8.7 hours per day34
• Weighs approximately 181 pounds (195.5 pounds for men and 166.2 pounds for women)35
• Drinks 20.8 gallons of beer per year36
• Drives 13,476 miles per year (hopefully not after drinking all that beer)37
• Showers six times a week, but only shampoos four times a week38
• Has been at his or her current job 4.6 years39

So, are you better than average?
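The test described in the passage, checking whether conclusions change when a single data point is excluded, can be made mechanical with a leave-one-out refit. The Python sketch below uses made-up data and treats a sign flip in a simple regression slope as the signal of a suspicious point; both the data and that criterion are assumptions for illustration, and formal influence measures such as Cook's distance are the more usual tool.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (with intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def leave_one_out_slopes(xs, ys):
    """Refit the slope once per point, each time dropping that point."""
    return [
        ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


# Hypothetical data: a steady upward trend plus one extreme final observation.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10, 11, 12, 13, 14, 15, 16, -20]

full = ols_slope(xs, ys)
print(f"all points: slope={full:+.2f}")
for i, s in enumerate(leave_one_out_slopes(xs, ys)):
    flag = "  <-- conclusion flips: outlier candidate" if (s > 0) != (full > 0) else ""
    print(f"drop point {i}: slope={s:+.2f}{flag}")
```

With all points the slope is negative, and only dropping the last point (index 7) turns it positive, so that single observation is driving the conclusion.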

…

(On its website, Visa even suggests that you tell your financial institution if you’ll be traveling, which can “help ensure that your card isn’t flagged for unusual activity.”18) This is a perfect example of a false positive—the credit card company predicted that the charges on your card were potentially fraudulent, but it was wrong. Events like this, which may not be accounted for in the statistical model, are potential sources of prediction error. Just as sampling error tells us about the uncertainty in our sample, prediction error is a way to measure uncertainty in the future, essentially by comparing the predicted results to the actual outcomes, once they occur.19 Prediction error is often measured using a prediction interval, which is the range in which we expect to see the next data point.
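The prediction interval mentioned at the end of the passage can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: independent, roughly normal observations, and a normal quantile in place of the Student's t quantile usually preferred for small samples (with only ten points, as here, the t version would be somewhat wider). The spending history and the rule of flagging a charge that falls outside the interval are hypothetical, not a description of how any card issuer actually works.

```python
import math
import statistics


def prediction_interval(sample, confidence=0.95):
    """Approximate interval expected to contain the *next* observation.

    Assumes i.i.d. roughly normal data and uses a normal quantile instead of
    Student's t, which understates the width for small samples.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    # sqrt(1 + 1/n): scatter of a new draw plus uncertainty in the estimated mean.
    half_width = z * s * math.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width


# Hypothetical daily card spending (dollars) for one customer.
history = [42.0, 55.5, 38.2, 61.0, 47.3, 52.8, 44.1, 58.9, 49.6, 40.7]
low, high = prediction_interval(history)
new_charge = 310.0
print(f"95% prediction interval: [{low:.2f}, {high:.2f}]")
print("outside interval: flag for review" if not low <= new_charge <= high else "within interval")
```

The factor sqrt(1 + 1/n) is what distinguishes this from a confidence interval for the mean: it covers the scatter of a single future value, not just uncertainty about the average. A legitimate but unusual purchase, such as a charge made while traveling, can still land outside it, which is how false positives arise.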

pages: 265 words: 74,000

**
The Numerati
** by
Stephen Baker

Berlin Wall, Black Swan, business process, call centre, correlation does not imply causation, Drosophila, full employment, illegal immigration, index card, Isaac Newton, job automation, job satisfaction, McMansion, Myron Scholes, natural language processing, PageRank, personalized medicine, recommendation engine, RFID, Silicon Valley, Skype, statistical model, Watson beat the top human players on Jeopardy!

He started publishing papers nearly as soon as he arrived. And when he got his master's, he decided to look for a job "at places where they hire Ph.D.'s." He landed at Accenture, and now, at an age at which many of his classmates are just finishing their doctorate, he runs the analytics division from his perch in Chicago. Ghani leads me out of his office and toward the shopping cart. For statistical modeling, he explains, grocery shopping is one of the first retail industries to conquer. This is because we buy food constantly. For many of us, the supermarket functions as a chilly, Muzak-blaring annex to our pantries. (I would bet that millions of suburban Americans spend more time in supermarkets than in their formal living room.) Our grocery shopping is so prodigious that just by studying one year of our receipts, researchers can detect all sorts of patterns—far more than they can learn from a year of records detailing our other, more sporadic purchases.

…

It's terrifying." He thinks that over the next generation, many of us will surround ourselves with the kinds of networked gadgets he and his team are building and testing. These machines will busy themselves with far more than measuring people's pulse and counting the pills they take, which is what today's state-of-the-art monitors can do. Dishman sees sensors eventually recording and building statistical models of almost every aspect of our behavior. They'll track our pathways in the house, the rhythm of our gait. They'll diagram our thrashing in bed and chart our nightly trips to the bathroom—perhaps keeping tabs on how much time we spend in there. Some of these gadgets will even measure the pause before we recognize a familiar voice on the phone. A surveillance society gone haywire? Personal privacy in tatters?

…

Let's say they see lots of activity in the morning and at bedtime. Together those two periods might represent 90 percent of toothbrush movement. From that, they can calculate a 90 percent probability that toothbrush movement involves teeth cleaning. (They could factor in time variables, but there's more than enough complexity ahead, as we'll see.) Next they move to the broom and the teakettle, and they ask the same questions. The goal is to build a statistical model for each of us that will infer from a series of observations what we're most likely to be doing. The toothbrush was easy. For the most part, it sticks to only one job. But consider the kettle. What are the chances that it's being used for tea? Maybe a person uses it to make instant soup (which is more nutritious than tea but dangerously salty for people like my mother). How can the Intel team come up with a probability?
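The frequency reasoning in the passage amounts to estimating a conditional probability by counting. A minimal Python sketch follows, with made-up observation counts (9 of 10 toothbrush episodes at tooth-cleaning times, and kettle use split between tea, instant soup and other uses). The book does not describe the Intel team's actual computation, and, as the passage notes, a real model could also condition on time of day.

```python
from collections import Counter


def activity_probabilities(observations):
    """Estimate P(activity | object) from (object, activity) observation pairs."""
    per_object = {}
    for obj, activity in observations:
        per_object.setdefault(obj, Counter())[activity] += 1
    # Relative frequency of each activity among all observations of that object.
    return {
        obj: {act: count / sum(counts.values()) for act, count in counts.items()}
        for obj, counts in per_object.items()
    }


# Hypothetical labeled observations.
observations = (
    [("toothbrush", "teeth cleaning")] * 9
    + [("toothbrush", "other")] * 1
    + [("kettle", "tea")] * 6
    + [("kettle", "instant soup")] * 3
    + [("kettle", "other")] * 1
)

probs = activity_probabilities(observations)
print(probs["toothbrush"]["teeth cleaning"])  # 0.9
print(probs["kettle"]["tea"])                 # 0.6
```

The toothbrush is easy because one activity dominates its counts; the kettle's probability mass is spread across several uses, which is why the passage calls it the harder case.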

pages: 764 words: 261,694

**
The Elements of Statistical Learning (Springer Series in Statistics)
** by
Trevor Hastie,
Robert Tibshirani,
Jerome Friedman

Bayesian statistics, bioinformatics, computer age, conceptual framework, correlation coefficient, G4S, greed is good, linear programming, p-value, pattern recognition, random walk, selection bias, speech recognition, statistical model, stochastic process, The Wisdom of Crowds

–Ian Hacking

Contents
Preface to the Second Edition
Preface to the First Edition
1 Introduction
2 Overview of Supervised Learning
2.1 Introduction
2.2 Variable Types and Terminology
2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
2.3.1 Linear Models and Least Squares
2.3.2 Nearest-Neighbor Methods
2.3.3 From Least Squares to Nearest Neighbors
2.4 Statistical Decision Theory
2.5 Local Methods in High Dimensions
2.6 Statistical Models, Supervised Learning and Function Approximation
2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)
2.6.2 Supervised Learning
2.6.3 Function Approximation
2.7 Structured Regression Models
2.7.1 Difficulty of the Problem

…

We will see that there is a whole spectrum of models between the rigid linear models and the extremely flexible 1-nearest-neighbor models, each with their own assumptions and biases, which have been proposed specifically to avoid the exponential growth in complexity of functions in high dimensions by drawing heavily on these assumptions. Before we delve more deeply, let us elaborate a bit on the concept of statistical models and see how they fit into the prediction framework.

2.6 Statistical Models, Supervised Learning and Function Approximation

Our goal is to find a useful approximation $\hat{f}(x)$ to the function $f(x)$ that underlies the predictive relationship between the inputs and outputs. In the theoretical setting of Section 2.4, we saw that squared error loss led us to the regression function $f(x) = \mathrm{E}(Y \mid X = x)$ for a quantitative response.
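The step from squared error loss to the regression function follows from a standard pointwise argument, sketched here. EPE stands for expected prediction error, and $\mu(x)$ is our own shorthand for $\mathrm{E}(Y \mid X = x)$. Conditioning on $X$ gives

$$
\operatorname{EPE}(f) = \mathrm{E}\big[(Y - f(X))^2\big] = \mathrm{E}_X\,\mathrm{E}_{Y \mid X}\big[(Y - f(X))^2 \mid X\big],
$$

so it suffices to minimize the inner expectation separately at each $x$. For any constant $c$, writing $Y - c = (Y - \mu(x)) + (\mu(x) - c)$,

$$
\mathrm{E}\big[(Y - c)^2 \mid X = x\big] = \operatorname{Var}(Y \mid X = x) + 2\big(\mu(x) - c\big)\,\mathrm{E}\big[Y - \mu(x) \mid X = x\big] + \big(\mu(x) - c\big)^2 = \operatorname{Var}(Y \mid X = x) + \big(\mu(x) - c\big)^2,
$$

because the middle expectation is zero. The variance term does not depend on $c$, so the minimum is attained at $c = \mu(x)$, that is, $f(x) = \mathrm{E}(Y \mid X = x)$.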

…

The class of nearest-neighbor methods can be viewed as direct estimates of this conditional expectation, but we have seen that they can fail in at least two ways:
• if the dimension of the input space is high, the nearest neighbors need not be close to the target point, and can result in large errors;
• if special structure is known to exist, this can be used to reduce both the bias and the variance of the estimates.

We anticipate using other classes of models for $f(x)$, in many cases specifically designed to overcome the dimensionality problems, and here we discuss a framework for incorporating them into the prediction problem.

2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)

Suppose in fact that our data arose from a statistical model

$$Y = f(X) + \varepsilon, \qquad (2.29)$$

where the random error $\varepsilon$ has $\mathrm{E}(\varepsilon) = 0$ and is independent of $X$. Note that for this model, $f(x) = \mathrm{E}(Y \mid X = x)$, and in fact the conditional distribution $\Pr(Y \mid X)$ depends on $X$ only through the conditional mean $f(x)$. The additive error model is a useful approximation to the truth. For most systems the input–output pairs $(X, Y)$ will not have a deterministic relationship $Y = f(X)$.
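The passage notes that nearest-neighbor methods directly estimate the conditional expectation $f(x) = \mathrm{E}(Y \mid X = x)$. Under the additive error model (2.29) this is easy to check by simulation. The Python sketch below picks a hypothetical true function $f(x) = \sin(2\pi x)$, Gaussian noise with standard deviation 0.3 and $k = 50$ neighbors, all arbitrary choices, and compares the k-nearest-neighbor average with the true $f$ at a few points in a single input dimension.

```python
import math
import random


def f(x):
    """Hypothetical true regression function, chosen for illustration."""
    return math.sin(2 * math.pi * x)


def simulate(n, sigma, rng):
    """Draw (x, y) pairs from the additive error model Y = f(X) + eps."""
    xs = [rng.random() for _ in range(n)]
    ys = [f(x) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def knn_estimate(x0, xs, ys, k):
    """Average the responses of the k training points closest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k


rng = random.Random(0)
xs, ys = simulate(n=2000, sigma=0.3, rng=rng)
for x0 in (0.1, 0.25, 0.5, 0.75):
    print(f"x={x0:.2f}  true f={f(x0):+.3f}  kNN={knn_estimate(x0, xs, ys, k=50):+.3f}")
```

In one dimension with 2,000 points the 50 neighbors sit very close to each target, so averaging their responses cancels most of the noise. The first failure mode listed above only appears as the input dimension grows and neighborhoods must stretch to capture k points.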

pages: 321

**
Finding Alphas: A Quantitative Approach to Building Trading Strategies
** by
Igor Tulchinsky

algorithmic trading, asset allocation, automated trading system, backtesting, barriers to entry, business cycle, buy and hold, capital asset pricing model, constrained optimization, corporate governance, correlation coefficient, credit crunch, Credit Default Swap, discounted cash flows, discrete time, diversification, diversified portfolio, Eugene Fama: efficient market hypothesis, financial intermediation, Flash crash, implied volatility, index arbitrage, index fund, intangible asset, iterative process, Long Term Capital Management, loss aversion, market design, market microstructure, merger arbitrage, natural language processing, passive investing, pattern recognition, performance metric, popular capitalism, prediction markets, price discovery process, profit motive, quantitative trading / quantitative finance, random walk, Renaissance Technologies, risk tolerance, risk-adjusted returns, risk/return, selection bias, sentiment analysis, shareholder value, Sharpe ratio, short selling, Silicon Valley, speech recognition, statistical arbitrage, statistical model, stochastic process, survivorship bias, systematic trading, text mining, transaction costs, Vanguard fund, yield curve

Figure 16.1: The most developed directions of machine learning (diagram labels: machine learning; unsupervised methods; clusterization algorithms; supervised methods; statistical models; support vector machines; neural networks; deep learning algorithms; fuzzy logic; ensemble methods; random forest; AdaBoost). The most popular are in black.

Statistical Models

Models like naive Bayes, linear discriminant analysis, the hidden Markov model, and logistic regression are good for solving relatively simple problems that do not need high precision of classification or prediction. These methods are easy to implement and not too sensitive to missing data. The disadvantage is that each of these approaches presumes some specific data model. Trend analysis is one application of statistical models in alpha research. In particular, a hidden Markov model is frequently utilized for that purpose, based on the belief that price movements of the stock market are not totally random.
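The passage names the hidden Markov model for trend analysis but gives no specification. As a rough sketch only, the Python code below runs the forward (filtering) recursion for a two-regime model, an "up-trend" and a "down-trend" state with Gaussian daily returns. Every number here (transition probabilities, regime means and volatilities, the return series) is a hypothetical choice; in practice the parameters would be estimated from data, for example with the Baum–Welch algorithm.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd**2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def forward_filter(returns, trans, means, sds, prior):
    """Forward algorithm: P(regime at t | returns up to t) for a Gaussian HMM.

    trans[i][j] is the probability of moving from regime i to regime j.
    """
    k = len(prior)
    belief = list(prior)
    history = []
    for r in returns:
        # Predict: push the previous belief through the transition matrix.
        predicted = [sum(belief[i] * trans[i][j] for i in range(k)) for j in range(k)]
        # Update: weight each regime by the likelihood of today's return.
        weighted = [predicted[j] * normal_pdf(r, means[j], sds[j]) for j in range(k)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        history.append(belief)
    return history


# Regime 0 = "up-trend", regime 1 = "down-trend" (hypothetical parameters).
trans = [[0.9, 0.1], [0.1, 0.9]]
means = [0.005, -0.005]
sds = [0.01, 0.01]
returns = [0.012, 0.008, 0.015, 0.010, -0.002, -0.014, -0.011, -0.016, -0.009]

history = forward_filter(returns, trans, means, sds, prior=[0.5, 0.5])
for t, (p_up, p_down) in enumerate(history, start=1):
    print(f"day {t}: P(up-trend)={p_up:.2f}  P(down-trend)={p_down:.2f}")
```

The sticky transition matrix encodes the belief that trends persist, which is the "not totally random" assumption in the passage; the filter shifts from the up-trend regime to the down-trend regime only after several negative returns accumulate.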

…

(There will be a range of views on both of these horizons, but we can still use the implied causal relationship between the extreme weather and the commodity supply to narrow the range of candidates.) We can now test our idea by gathering data on historical weather forecasts and price changes for the major energy contracts, and testing for association between the two datasets, using a partial in-sample historical dataset. The next step is to fit a simple statistical model and test it for robustness, while varying the parameters in the fit. One good robustness test is to include a similar asset for comparison, where we expect the effect to be weaker. In the case of our weather alpha example, Brent crude oil would be a reasonable choice. Crude oil is a global market, so we would expect some spillover from a US supply disruption. However, oil delivered in Europe is not a perfect substitute for the US supply, so we would expect a diluted impact.
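One way to picture the robustness step is a small parameter sweep: fit the same one-factor regression of returns on the signal over several sample lengths, for both the target contract and a comparison asset that plays the role of Brent in the passage. The Python sketch below uses synthetic data with hypothetical effect sizes (0.6 for the primary contract, a diluted 0.3 for the comparison), and varying the trailing window length is our stand-in for "varying the parameters in the fit"; the book does not specify which parameters to vary.

```python
import random


def ols_beta(x, y):
    """Slope from regressing y on x (with intercept) by least squares."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def robustness_sweep(signal, returns, windows):
    """Refit the same one-factor model on several trailing sample lengths."""
    return {w: ols_beta(signal[-w:], returns[-w:]) for w in windows}


rng = random.Random(42)
n = 2000
# Hypothetical standardized weather-forecast signal and two synthetic return series.
signal = [rng.gauss(0.0, 1.0) for _ in range(n)]
primary = [0.6 * s + rng.gauss(0.0, 0.5) for s in signal]
comparison = [0.3 * s + rng.gauss(0.0, 0.5) for s in signal]

windows = (250, 500, 1000, 2000)
primary_betas = robustness_sweep(signal, primary, windows)
comparison_betas = robustness_sweep(signal, comparison, windows)
for w in windows:
    print(f"window={w:>4}  primary beta={primary_betas[w]:.3f}  "
          f"comparison beta={comparison_betas[w]:.3f}")
```

What the sweep checks is the pattern, not any single number: the sign of the estimated effect should stay stable across windows, and the comparison asset's effect should be consistently weaker, as the passage expects for Brent.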

…

Shaw & Co. 8 design 25–30 automated searches 111–120 backtesting 33–41 case study 31–41 core concepts 3–6 data inputs 4, 25–26, 43–47 evaluation 28–29 expressions 4 flow chart 41 future performance 29–30 horizons 4–50 intraday alphas 219–221 machine learning 121–126 noise reduction 26 optimization 29–30 prediction frequency 27 quality 5 risk-on/risk off alphas 246–247 robustness 89–93 smoothing 54–55, 59–60 triple-axis plan 83–88 universe 26 value 27–30 digital filters 127–128 digitization 7–9 dimensionality 129–132 disclosures 192 distressed assets 202–203 diversification automated searches 118–119 exchange-traded funds 233 portfolios 83–88, 108 DL see deep learning dot (inner) product 63–64 Dow, Charles 7 DPIN see dynamic measure of the probability of informed trading drawdowns 106–107 dual timestamping 78 dynamic measure of the probability of informed trading (DPIN) 214–215 dynamic parameterization 132 early-exercise premium 174 earnings calls 181, 187–188 earnings estimates 184–185 earnings surprises 185–186 efficiency, automated searches 111–113 Index295 efficient markets hypothesis (EMH) 11, 135 ego 19 elegance of models 75 EMH see efficient markets hypothesis emotions 19 ensemble methods 124–125 ensemble performance 117–118 estimation of risk 102–106 historical 103–106 position-based 102–103 shrinkage 131 ETFs see exchange-traded funds Euclidean space 64–66 evaluation 13–14, 28–29 backtesting 13–14, 33–41, 69–76 bias 77–82 bootstrapping 107 correlation 28–29 cutting losses 20–21 data selection 74–75 drawdowns 107 information ratio 28 margin 28 overfitting 72–75 risk 101–110 robustness 89–93 turnover 49–60 see also validation event-driven strategies 195–205 business cycle 196 capital structure arbitrage 204–205 distressed assets 202–203 index-rebalancing arbitrage 203–204 mergers 196–199 spin-offs, split-offs & carve-outs 200–202 exchange-traded funds (ETFs) 223–240 average daily trading volume 239 challenges 239–240 merits 232–233 momentum alphas 235–237 
opportunities 235–238 research 231–240 risks 233–235 seasonality 237–238 see also index alphas exit costs 19, 21 expectedness of news 164 exponential moving averages 54 expressions, simple 4 extreme alpha values 104 extrinsic risk 101, 106, 108–109 factor risk heterogeneity 234 factors financial statements 147 to alphas 148 failure modes 84 fair disclosures 192 fair value of futures 223 Fama–French three-factor model 96 familiarity bias 81 feature extraction 130–131 filters 127–128 finance blogs 181–182 finance portals 180–181, 192 financial statement analysis 141–154 balance sheets 143 basics 142 cash flow statements 144– 145, 150–152 corporate governance 146 factors 147–148 fundamental analysis 149–154 growth 145–146 income statements 144 negative factors 146–147 special considerations 147 finite impulse response (FIR) filters 127–128 296Index FIR filters see finite impulse response filters Fisher Transform 91 five-day reversion alpha 55–59 Float Boost 125 forecasting behavioral economics 11–12 computer adoption 7–9 frequencies 27 horizons 49–50 statistical arbitrage 10–11 UnRule 17–21 see also predictions formation of the industry 8–9 formulation bias 80 forward-looking bias 72 forwards 241–249 checklist 243–244 Commitments of Traders report 244–245 instrument groupings 242–243 seasonality 245–246 underlying assets 241–242 frequencies 27 full text analysis 164 fundamental analysis 149–154 future performance 29–30 futures 241–249 checklist 243–244 Commitments of Traders report 244–245 fair value 223 instrument groupings 242–243 seasonality 245–246 underlying assets 241–242 fuzzy logic 126 General Electric 200 generalized correlation 64–66 groupings, futures and forwards 242–243 group momentum 157–158 growth analysis 145–146 habits, successful 265–271 hard neutralization 108 headlines 164 hedge fund betas see risk factors hedge funds, initial 8–9 hedging 108–109 herding 81–82, 190–191 high-pass filters 128 historical risk measures 103–106 horizons 49–50 horizontal 
mergers 197 Huber loss function 129 humps 54 hypotheses 4 ideas 85–86 identity matrices 65 IIR filters see infinite impulse response filters illiquidity premium 208–211 implementation core concepts 12–13 triple-axis plan 86–88 inaccuracy of models 10–11 income statements 144 index alphas 223–240 index changes 225–228 new entrants 227–228 principles 223–225 value distortion 228–230 see also exchange-traded funds index-rebalancing arbitrage 203–204 industry formation 8–9 industry-specific factors 188–190 infinite impulse response (IIR) filters 127–128 information ratio (IR) 28, 35–36, 74–75 initial hedge funds 8–9 inner product see dot product inputs, for design 25–26 integer effect 138 intermediate variables 115 Index297 intraday data 207–216 expected returns 211–215 illiquidity premium 208–211 market microstructures 208 probability of informed trading 213–215 intraday trading 217–222 alpha design 219–221 liquidity 218–219 vs. daily trading 218–219 intrinsic risk 102–103, 105–106, 109 invariance 89 inverse exchange-traded funds 234 IR see information ratio iterative searches 115 Jensen’s alpha 3 L1 norm 128–129 L2 norm 128–129 latency 46–47, 128, 155–156 lead-lag effects 158 length of testing 75 Level 1/2 tick data 46 leverage 14–15 leveraged exchange-traded funds 234 limiting methods 92–93 liquidity effect 96 intraday data 208–211 intraday trading 218–219 and spreads 51 literature, as a data source 44 look-ahead bias 78–79 lookback days, WebSim 257–258 looking back see backtesting Lo’s hypothesis 97 losses cutting 17–21, 109 drawdowns 106–107 loss functions 128–129 low-pass filters 128 M&A see mergers and acquisitions MAC clause see material adverse change clause MACD see moving average convergence-divergence machine learning 121–126 deep learning 125–126 ensemble methods 124–125 fuzzy logic 126 look-ahead bias 79 neural networks 124 statistical models 123 supervised/unsupervised 122 support vector machines (SVM) 122, 123–124 macroeconomic correlations 153 manual 
searches, pre-automation 119 margin 28 market commentary sites 181–182 market effects index changes 225–228 see also price changes market microstructure 207–216 expected returns 211–215 illiquidity premium 208–211 probability of informed trading 213–215 types of 208 material adverse change (MAC) clause 198–199 max drawdown 35 max stock weight, WebSim 257 mean-reversion rule 70 mean-squared error minimization 11 media 159–167 academic research 160 categorization 163 expectedness 164 finance information 181–182, 192 momentum 165 novelty 161–162 298Index sentiment 160–161 social 165–166 mergers and acquisitions (M&A) 196–199 models backtesting 69–76 elegance 75 inaccuracy of 10–11 see also algorithms; design; evaluation; machine learning; optimization momentum alphas 155–158, 165, 235–237 momentum effect 96 momentum-reversion 136–137 morning sunshine 46 moving average convergencedivergence (MACD) 136 multiple hypothesistesting 13, 20–21 narrow framing 81 natural gas reserves 246 negative factors, financial statements 146–147 neocognitron models 126 neural networks (NNs) 124 neutralization 108 WebSim 257 newly indexed companies 227–228 news 159–167 academic research 160 categories 163 expectedness 164 finance information 181–182, 192 momentum 165 novelty 161–162 relevance 162 sentiment 160–161 volatility 164–165 NNs see neural networks noise automated searches 113 differentiation 72–75 reduction 26 nonlinear transformations 64–66 normal distribution, approximation to 91 novelty of news 161–162 open interest 177–178 opportunities 14–15 optimization 29–30 automated searches 112, 115–116 loss functions 128–129 of parameter 131–132 options 169–178 concepts 169 open interest 177–178 popularity 170 trading volume 174–177 volatility skew 171–173 volatility spread 174 option to stock volume ratio (O/S) 174–177 order-driven markets 208 ordering methods 90–92 O/S see option to stock volume ratio outliers 13, 54, 92–93 out-of-sample testing 13, 74 overfitting 72–75 data mining 
79–80 reduction 74–75, 269–270 overnight-0 alphas 219–221 overnight-1 alphas 219 parameter minimization 75 parameter optimization 131–132 PCA see principal component analysis Pearson correlation coefficients 62–64, 90 peer pressure 156 percent profitable days 35 performance parameters 85–86 Index299 PH see probability of heuristicdriven trading PIN see probability of informed trading PnL see profit and loss pools see portfolios Popper, Karl 17 popularity of options 170 portfolios correlation 61–62, 66 diversification 83–88, 108 position-based risk measures 102–103 positive bias 190 predictions 4 frequency 27 horizons 49–50 see also forecasting price changes analyst reports 190 behavioral economics 11–12 efficient markets hypothesis 11 expressions 4 index changes 225–228 news effects 159–167 relative 12–13, 26 price targets 184 price-volume strategies 135–139 pride 19 principal component analysis (PCA) 130–131 probability of heuristic-driven trading (PH) 214 probability of informed trading (PIN) 213–215 profit and loss (PnL) correlation 61–62 drawdowns 106–107 see also losses profit per dollar traded 35 programming languages 12 psychological factors see behavioral economics put-call parity relation 174 Python 12 quality 5 quantiles approximation 91 quintile distributions 104–105 quote-driven markets 208 random forest algorithm 124–125 random walks 11 ranking 90 RBM see restricted Boltzmann machine real estate investment trusts (REITs) 227 recommendations by analysts 182–183 recurrent neural networks (RNNs) 125 reduction of dimensionality 130–131 of noise 26 of overfitting 74–75, 269–270 of risk 108–109 Reg FD see Regulation Fair Disclosure region, WebSim 256 regions 85–86 regression models 10–11 regression problems 121 regularization 129 Regulation Fair Disclosure (Reg FD) 192 REITs see real estate investment trusts relationship models 26 relative prices 12–13, 26 relevance, of news 162 Renaissance Technologies 8 research 7–15 analyst reports 179–193 automated 
searches 111–120 backtesting 13–14 300Index behavioral economics 11–12 computer adoption 7–9 evaluation 13–14 exchange-traded funds 231–240 implementation 12–13 intraday data 207–216 machine learning 121–126 opportunities 14–15 perspectives 7–15 statistical arbitrage 10–11 triple-axis plan 83–88 restricted Boltzmann machine (RBM) 125 Reuleaux triangle 70 reversion alphas, five-day 55–59 risk 101–110 arbitrage 196–199 control 108–109 drawdowns 106–107 estimation 102–106 extrinsic 101, 106, 108–109 intrinsic 102–103, 105–106, 109 risk factors 26, 95–100 risk-on/risk off alphas 246–247 risk-reward matrix 267–268 RNNs see recurrent neural networks robustness 89–93, 103–106 rules 17–18 evaluation 20–21 see also algorithms; UnRule Russell 2000 IWM fund 225–226 SAD see seasonal affective disorder scale of automated searches 111–113 search engines, analyst reports 180–181 search spaces, automated searches 114–116 seasonality exchange-traded funds 237–238 futures and forwards 245–246 momentum strategies 157 and sunshine 46 selection bias 77–79, 117–118 sell-side analysts 179–180 see also analyst reports sensitivity tests 119 sentiment analysis 160–161, 188 shareholder’s equity 151 Sharpe ratios 71, 73, 74–75, 221, 260 annualized 97 Shaw, David 8 shrinkage estimators 131 signals analysts report 190, 191–192 cutting losses 20–21 data sources 25–26 definition 73 earnings calls 187–188 expressions 4 noise reduction 26, 72–75 options trading volume 174–177 smoothing 54–55, 59–60 volatility skew 171–173 volatility spread 174 sign correlation 65 significance tests 119 Simons, James 8 simple moving averages 55 simulation backtesting 71–72 WebSim settings 256–258 see also backtesting size factor 96 smoothing 54–55, 59–60 social media 165–166 sources of data 25–26, 43–44, 74–75 automated searches 113–114 see also data sparse principal component analysis (sPCA) 131 Spearman’s rank correlation 90 Index301 special considerations, financial statements 147 spin-offs 200–202 split-offs 
200–202 spreads and liquidity 51 and volatility 51–52 stat arb see statistical arbitrage statistical arbitrage (stat arb) 10–11, 69–70 statistical models, machine learning 123 step-by-step construction 5, 41 storage costs 247–248 storytelling 80 subjectivity 17 sunshine 46 supervised machine learning 122 support vector machines (SVM) 122, 123–124 systemic bias 77–80 TAP see triple-axis plan tax efficiency, exchange-traded funds 233 teams 270–271 temporal-based correlation 63–64, 65 theory-fitting 80 thought processes of analysts 186–187 tick data 46 timestamping and bias 78–79 tracking errors 233–234 trades cost of 50–52 crossing effect 52–53 latency 46–47 trend following 18 trimming 92 triple-axis plan (TAP) 83–88 concepts 83–86 implementation 86–88 tuning of turnover 59–60 see also smoothing turnover 49–60 backtesting 35 control 53–55, 59–60 costs 50–52 crossing 52–53 examples 55–59 horizons 49–50 smoothing 54–55, 59–60 WebSim 260 uncertainty 17–18 underlying principles 72–73 changes in 109 understanding data 46 unexpected news 164 universes 26, 85–86, 239–240, 256 UnRule 17–18, 20–21 unsupervised machine learning 122 validation, data 45–46 valuation methodologies 189 value of alphas 27–30 value distortion, indices 228–230 value factors 96 value investing 96, 141 variance and bias 129–130 vendors as a data source 44 vertical mergers 197 volatility and news 164–165 and spreads 51–52 volatility skew 171–173 volatility spread 174 volume of options trading 174–177 price-volume strategies 135–139 volume-synchronized probability of informed trading (VPIN) 215 302Index VPIN see volume-synchronized probability of informed trading weather effects 46 WebSim 253–261 analysis 258–260 backtesting 33–41 data types 255 example 260–261 settings 256–258 weekly goals 266–267 weighted moving averages 55 Winsorization 92–93 Yahoo finance 180 Z-scoring 92

**
The Ethical Algorithm: The Science of Socially Aware Algorithm Design
** by
Michael Kearns,
Aaron Roth

23andMe, affirmative action, algorithmic trading, Alvin Roth, Bayesian statistics, bitcoin, cloud computing, computer vision, crowdsourcing, Edward Snowden, Elon Musk, Filter Bubble, general-purpose programming language, Google Chrome, ImageNet competition, Lyft, medical residency, Nash equilibrium, Netflix Prize, p-value, Pareto efficiency, performance metric, personalized medicine, pre–internet, profit motive, quantitative trading / quantitative finance, RAND corporation, recommendation engine, replication crisis, ride hailing / ride sharing, Robert Bork, Ronald Coase, self-driving car, short selling, sorting algorithm, speech recognition, statistical model, Stephen Hawking, superintelligent machines, telemarketer, Turing machine, two-sided market, Vilfredo Pareto

A far more common type of machine learning is the supervised variety, where we wish to use data to make specific predictions that can later be verified or refuted by observing the truth—for example, using past meteorological data to predict whether it will rain tomorrow. The “supervision” that guides our learning is the feedback we get tomorrow, when either it rains or it doesn’t. And for much of the history of machine learning and statistical modeling, many applications, like this example, were focused on making predictions about nature or other large systems: predicting tomorrow’s weather, predicting whether the stock market will go up or down (and by how much), predicting congestion on roadways during rush hour, and the like. Even when humans were part of the system being modeled, the emphasis was on predicting aggregate, collective behaviors.

…

After all, if we take a traditional statistical fairness notion such as equality of false rejection rates in lending, if you are one of the creditworthy Square applicants who has been denied a loan, how comforting is it to be told that there was also a creditworthy Circle applicant who was rejected to “compensate” for your mistreatment? But if we go too far down the path toward individual fairness, other difficulties arise. In particular, if our model makes even a single mistake, then it can potentially be accused of unfairness toward that one individual, assuming it makes any loans at all. And anywhere we apply machine learning and statistical models to historical data, there are bound to be mistakes except in the most idealized settings. So we can ask for this sort of individual level of fairness, but if we do so naively, its applicability will be greatly constrained and its costs to accuracy are likely to be unpalatable; we’re simply asking for too much. Finding reasonable ways to give meaningful alternative fairness guarantees to individuals is one of the most exciting areas of ongoing research.
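The group-level criterion in the passage, equality of false rejection rates, can be computed directly. In the Python sketch below the lending outcomes for the Square and Circle groups are hypothetical; a false rejection is counted only among creditworthy applicants, following the passage's framing.

```python
def false_rejection_rates(applicants):
    """Per-group share of creditworthy applicants who were denied a loan.

    Each applicant is a (group, creditworthy, approved) triple.
    """
    worthy_counts, denied_counts = {}, {}
    for group, creditworthy, approved in applicants:
        if not creditworthy:
            continue  # a false rejection can only happen to a creditworthy applicant
        worthy_counts[group] = worthy_counts.get(group, 0) + 1
        if not approved:
            denied_counts[group] = denied_counts.get(group, 0) + 1
    return {g: denied_counts.get(g, 0) / n for g, n in worthy_counts.items()}


# Hypothetical lending outcomes for the two groups.
applicants = (
    [("Square", True, True)] * 8
    + [("Square", True, False)] * 2
    + [("Square", False, False)] * 5
    + [("Circle", True, True)] * 8
    + [("Circle", True, False)] * 2
    + [("Circle", False, True)] * 1
)

rates = false_rejection_rates(applicants)
print(rates)  # {'Square': 0.2, 'Circle': 0.2}
print("equal false rejection rates:", rates["Square"] == rates["Circle"])
```

Both groups satisfy the statistical criterion, yet two creditworthy Square applicants were still denied. That is exactly the individual-level complaint the passage raises: the guarantee holds on average over the group, not for any particular person.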

…

We might call this version “Bias in, bias out.” The problems can become even more insidious. Sometimes decisions made using biased data or algorithms are the basis for further data collection, forming a pernicious feedback loop that can amplify discrimination over time. An example of this phenomenon comes from the domain of “predictive policing,” in which large metropolitan police departments use statistical models to forecast neighborhoods with higher crime rates, and then send larger forces of police officers there. The most popularly used algorithms are proprietary and secret, so there is debate about how these algorithms estimate crime rates, and concern that some police departments might be in part using arrest data. Of course, even if neighborhoods A and B have the same underlying rates of crime, if we send more police to A than to B, we naturally will discover more crime in A as well.
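The feedback loop described here can be made concrete with a toy simulation. This is a hypothetical sketch, not a model of any real predictive-policing product (those are proprietary): two neighborhoods share the same true crime rate, each round sends a fixed majority share of officers to whichever one has more recorded crime, and crime is recorded only in proportion to the officers present. All parameter values are arbitrary assumptions.

```python
def simulate_patrols(rounds, true_rate=0.1, patrols=100, majority_share=0.7,
                     initial_recorded=(6.0, 4.0)):
    """Toy loop: patrols follow recorded crime, and recorded crime follows patrols.

    Both neighborhoods have the SAME true crime rate; only the historical record differs.
    Returns the final recorded counts and A's share of recorded crime after each round.
    """
    recorded = {"A": initial_recorded[0], "B": initial_recorded[1]}
    history = []
    for _ in range(rounds):
        # Send the majority of officers where more crime has been recorded (ties go to A).
        hot, cold = ("A", "B") if recorded["A"] >= recorded["B"] else ("B", "A")
        allocation = {hot: patrols * majority_share, cold: patrols * (1 - majority_share)}
        # Expected discoveries scale with officers present, since the true rate is identical.
        for hood in recorded:
            recorded[hood] += true_rate * allocation[hood]
        history.append(recorded["A"] / (recorded["A"] + recorded["B"]))
    return recorded, history


recorded, history = simulate_patrols(rounds=20)
print(f"A's share of recorded crime: {history[0]:.3f} after round 1, "
      f"{history[-1]:.3f} after round 20")
```

Starting from a small imbalance in the record (60 percent of recorded crime in A), A's share keeps climbing toward the 70 percent patrol share every round, although the two neighborhoods are identical.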

pages: 294 words: 82,438

**
Simple Rules: How to Thrive in a Complex World
** by
Donald Sull,
Kathleen M. Eisenhardt

Affordable Care Act / Obamacare, Airbnb, asset allocation, Atul Gawande, barriers to entry, Basel III, Berlin Wall, carbon footprint, Checklist Manifesto, complexity theory, Craig Reynolds: boids flock, Credit Default Swap, Daniel Kahneman / Amos Tversky, diversification, drone strike, en.wikipedia.org, European colonialism, Exxon Valdez, facts on the ground, Fall of the Berlin Wall, haute cuisine, invention of the printing press, Isaac Newton, Kickstarter, late fees, Lean Startup, Louis Pasteur, Lyft, Moneyball by Michael Lewis explains big data, Nate Silver, Network effects, obamacare, Paul Graham, performance metric, price anchoring, RAND corporation, risk/return, Saturday Night Live, sharing economy, Silicon Valley, Startup school, statistical model, Steve Jobs, TaskRabbit, The Signal and the Noise by Nate Silver, transportation-network company, two-sided market, Wall-E, web application, Y Combinator, Zipcar

One study looked at how police can identify where serial criminals live. A simple rule—take the midpoint of the two most distant crime scenes—got police closer to the criminal than more sophisticated decision-making approaches. Another study compared a state-of-the-art statistical model and a simple rule to determine which did a better job of predicting whether past customers would purchase again. According to the simple rule, a customer was inactive if they had not purchased in x months (the number of months varies by industry). The simple rule did as well as the statistical model in predicting repeat purchases of online music, and beat it in the apparel and airline industries. Other research finds that simple rules match or beat more complicated models in assessing the likelihood that a house will be burgled and in forecasting which patients with chest pain are actually suffering from a heart attack.
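Both simple rules mentioned here fit in a few lines, which is part of their appeal. A minimal sketch, assuming crime scenes are given as planar (x, y) coordinates and that the customer "hiatus" threshold x is supplied by the caller; the function names and sample coordinates are hypothetical, and the studies' exact procedures may differ.

```python
import math
from itertools import combinations


def midpoint_of_most_distant(scenes):
    """Estimate an offender's base as the midpoint of the two most distant crime scenes."""
    # Compare every pair of scenes and keep the pair with the largest Euclidean distance.
    a, b = max(combinations(scenes, 2), key=lambda pair: math.dist(*pair))
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def is_inactive(months_since_last_purchase, hiatus_months):
    """Hiatus rule: a customer with no purchase in `hiatus_months` or more is inactive."""
    return months_since_last_purchase >= hiatus_months


scenes = [(0, 0), (2, 1), (6, 8), (1, 3)]
print(midpoint_of_most_distant(scenes))  # (3.0, 4.0)
print(is_inactive(10, hiatus_months=9), is_inactive(3, hiatus_months=9))  # True False
```

Neither rule estimates any parameters from data; the only tuning knob is the industry-specific hiatus length, which the text notes varies by industry.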

…

., “Validation of the Emergency Severity Index (ESI) in Self-Referred Patients in a European Emergency Department,” Emergency Medicine Journal 24, no. 3 (2007): 170–74. Statisticians have found: Professor Scott Armstrong of the Wharton School reviewed thirty-three studies comparing simple and complex statistical models used to forecast business and economic outcomes. He found no difference in forecasting accuracy in twenty-one of the studies. Sophisticated models did better in five studies, while simple models outperformed complex ones in seven cases. See J. Scott Armstrong, “Forecasting by Extrapolation: Conclusions from 25 Years of Research,” Interfaces 14 (1984): 52–66. Spyros Makridakis has hosted a series of competitions for statistical models over two decades, and consistently found that complex models fail to outperform simpler approaches. The history of the competitions is summarized in Spyros Makridakis and Michèle Hibon, “The M3-Competition: Results, Conclusions, and Implications,” International Journal of Forecasting 16, no. 4 (2000): 451–76. When it comes to modeling: In statistical terms, a model that closely approximates the underlying function that generates observed data is said to have low bias.

…

In fact, the 1/N rule ignores everything except for the number of investment alternatives under consideration. It is hard to imagine a simpler investment rule. And yet it works. One recent study of alternative investment approaches pitted the Markowitz model and three extensions of his approach against the 1/N rule, testing them on seven samples of data from the real world. This research ran a total of twenty-eight horseraces between the four state-of-the-art statistical models and the 1/N rule. With ten years of historical data to estimate risk, returns, and correlations, the 1/N rule outperformed the Markowitz equation and its extensions 79 percent of the time. The 1/N rule earned a positive return in every test, while the more complicated models lost money for investors more than half the time. Other studies have run similar tests and come to the same conclusions.
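The 1/N rule described here needs nothing but the number of alternatives. A minimal sketch, with hypothetical asset names and one-period returns; a real strategy would typically also rebalance back to equal weights periodically, a detail this sketch leaves out.

```python
def one_over_n_weights(assets):
    """Allocate wealth equally across the available alternatives (the 1/N rule)."""
    n = len(assets)
    return {asset: 1 / n for asset in assets}


def portfolio_return(weights, returns):
    """One-period portfolio return: the weighted sum of the asset returns."""
    return sum(weights[asset] * returns[asset] for asset in weights)


assets = ["stocks", "bonds", "real_estate", "cash"]
weights = one_over_n_weights(assets)
returns = {"stocks": 0.08, "bonds": 0.03, "real_estate": 0.05, "cash": 0.0}
print(weights)  # {'stocks': 0.25, 'bonds': 0.25, 'real_estate': 0.25, 'cash': 0.25}
print(round(portfolio_return(weights, returns), 4))  # 0.04
```

A Markowitz-style optimizer, by contrast, must estimate expected returns, variances, and correlations from historical data before it can choose weights; error in those estimates is one commonly cited reason the simpler rule held up in comparisons like the one described.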

pages: 401 words: 109,892

**
The Great Reversal: How America Gave Up on Free Markets
** by
Thomas Philippon

airline deregulation, Amazon Mechanical Turk, Amazon Web Services, Andrei Shleifer, barriers to entry, bitcoin, blockchain, business cycle, business process, buy and hold, Carmen Reinhart, carried interest, central bank independence, commoditize, crack epidemic, cross-subsidies, disruptive innovation, Donald Trump, Erik Brynjolfsson, eurozone crisis, financial deregulation, financial innovation, financial intermediation, gig economy, income inequality, income per capita, index fund, intangible asset, inventory management, Jean Tirole, Jeff Bezos, Kenneth Rogoff, labor-force participation, law of one price, liquidity trap, low cost airline, manufacturing employment, Mark Zuckerberg, market bubble, minimum wage unemployment, money market fund, moral hazard, natural language processing, Network effects, new economy, offshore financial centre, Pareto efficiency, patent troll, Paul Samuelson, price discrimination, profit maximization, purchasing power parity, QWERTY keyboard, rent-seeking, ride hailing / ride sharing, risk-adjusted returns, Robert Bork, Robert Gordon, Ronald Reagan, Second Machine Age, self-driving car, Silicon Valley, Snapchat, spinning jenny, statistical model, Steve Jobs, supply-chain management, Telecommunications Act of 1996, The Chicago School, the payments system, The Rise and Fall of American Growth, The Wealth of Nations by Adam Smith, too big to fail, total factor productivity, transaction costs, Travis Kalanick, Vilfredo Pareto, zero-sum game

This pattern holds for the whole economy as well as within the manufacturing sector, where we can use more granular data (NAICS level 6, a term explained in the Appendix section on industry classification). The relationship is positive and significant over the 1997–2002 period but not after. In fact, the relationship appears to be negative, albeit noisy, in the 2007–2012 period.

Box 4.2. Statistical Models

Table 4.2 presents the results of five regressions, that is, five statistical models. The right half of the table considers the whole economy; the left half focuses on the manufacturing sector.

TABLE 4.2 Regression Results: Productivity growth

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| Sector | Manufacturing | Manufacturing | Manufacturing | Whole economy | Whole economy |
| Years | 97–02 | 02–07 | 07–12 | 89–99 | 00–15 |
| Census CR4 growth | 0.13* [0.06] | 0.01 [0.05] | −0.13 [0.17] | | |
| Compustat CR4 growth | | | | 0.14* [0.06] | −0.09 [0.07] |
| Data set & granularity | NAICS-6 | NAICS-6 | NAICS-6 | KLEMS | KLEMS |
| Year fixed effects | Y | Y | Y | Y | Y |
| Observations | 469 | 466 | 299 | 92 | 138 |
| R² | 0.03 | 0.00 | 0.02 | 0.07 | 0.09 |

Notes: Log changes in TFP and in top 4 concentration.
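Each column of Table 4.2 reports a slope coefficient, a bracketed figure (presumably its standard error, the usual convention), and an R². The sketch below computes those three quantities for a single-regressor OLS fit. It is an illustration only: it omits the year fixed effects listed in the table, and the data are made up, not the book's.

```python
import math


def ols_slope_se_r2(x, y):
    """Simple OLS of y on x with an intercept.

    Returns (slope, standard error of the slope, R-squared).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ssr = sum(r * r for r in residuals)        # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares
    se = math.sqrt(ssr / (n - 2) / sxx)        # classical (homoskedastic) standard error
    return slope, se, 1 - ssr / sst


# Hypothetical industries: change in concentration (x) vs. productivity growth (y).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 4.0, 4.0]
slope, se, r2 = ols_slope_se_r2(x, y)
print(f"slope={slope:.2f} [{se:.2f}]  R2={r2:.2f}")
```

By convention a coefficient is starred when it is statistically significant, roughly when it lies two or more standard errors from zero (0.13 against 0.06 in column 1, for instance); the table's exact significance threshold is not given in this excerpt.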

…

The issue is that the CPI misses the (big) initial introduction effect and overestimates inflation. When BLS data collectors cannot obtain a price for an item in the CPI sample (for example, because the outlet has stopped selling it), they look for a replacement item that is closest to the missing one. The BLS then adjusts for changes in quality and specifications. It can use manufacturers’ cost data or hedonic regressions to compute quality adjustments. Hedonic regressions are statistical models to infer consumers’ willingness to pay for goods or services. When it cannot estimate an explicit quality adjustment, the BLS imputes the price change using the average price change of similar items in the same geographic area. Finally, the BLS has specific procedures to estimate the price of housing (rents and owners’ equivalent rents) and medical care. (The medical care component of the CPI covers only out-of-pocket expenses.)
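A hedonic quality adjustment can be sketched with a single product characteristic. The example assumes a log-linear model, log(price) = a + b × feature, fitted by least squares on hypothetical items; the estimated b then values the difference between a discontinued item and its replacement, so that only the remaining price change counts as inflation. The BLS's actual hedonic models use many characteristics; the function names and numbers here are illustrative assumptions.

```python
import math


def fit_hedonic_slope(features, prices):
    """OLS slope b in log(price) = a + b * feature (one characteristic only)."""
    log_prices = [math.log(p) for p in prices]
    n = len(features)
    mean_f = sum(features) / n
    mean_lp = sum(log_prices) / n
    sxy = sum((f - mean_f) * (lp - mean_lp) for f, lp in zip(features, log_prices))
    sxx = sum((f - mean_f) ** 2 for f in features)
    return sxy / sxx


def quality_adjusted_change(old_price, old_feature, new_price, new_feature, b):
    """Price change net of the value of the replacement item's different feature level."""
    # What the old item would have cost had it already had the new item's feature level.
    old_price_at_new_quality = old_price * math.exp(b * (new_feature - old_feature))
    return new_price / old_price_at_new_quality - 1


# Hypothetical cross-section: each extra unit of the feature adds about 10.5% to price.
features = [1, 2, 3, 4]
prices = [500 * math.exp(0.1 * f) for f in features]
b = fit_hedonic_slope(features, prices)

raw_change = 690 / 600 - 1  # replacement: feature 3 at 690; discontinued: feature 2 at 600
adjusted = quality_adjusted_change(600, 2, 690, 3, b)
print(f"b={b:.3f}, raw change={raw_change:.1%}, quality-adjusted change={adjusted:.1%}")
```

Without the adjustment the whole 15 percent jump would be recorded as inflation; with it, most of the increase is attributed to the better feature.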

…

To test this idea, Matias Covarrubias, Germán Gutiérrez, and I (2019) study the relationship between changes in concentration and changes in total factor productivity (TFP) across industries during the 1990s and 2000s. We use our trade-adjusted concentration measures to control for foreign competition and for exports. Box 4.2 and its table summarize our results and discuss the interpretation of the various numbers in statistical models. We find that the relationship between concentration and productivity growth has changed over the past twenty years. During the 1990s (1989–1999) this relationship was positive. Industries with larger increases in concentration were also industries with larger productivity gains. This is no longer the case. In fact, between 2000 and 2015, we find a negative (but somewhat noisy) relationship between changes in concentration and changes in productivity.

pages: 370 words: 107,983

**
Rage Inside the Machine: The Prejudice of Algorithms, and How to Stop the Internet Making Bigots of Us All
** by
Robert Elliott Smith

Ada Lovelace, affirmative action, AI winter, Alfred Russel Wallace, Amazon Mechanical Turk, animal electricity, autonomous vehicles, Black Swan, British Empire, cellular automata, citizen journalism, Claude Shannon: information theory, combinatorial explosion, corporate personhood, correlation coefficient, crowdsourcing, Daniel Kahneman / Amos Tversky, desegregation, discovery of DNA, Douglas Hofstadter, Elon Musk, Fellow of the Royal Society, feminist movement, Filter Bubble, Flash crash, Gerolamo Cardano, gig economy, Gödel, Escher, Bach, invention of the wheel, invisible hand, Jacquard loom, Jacques de Vaucanson, John Harrison: Longitude, John von Neumann, Kenneth Arrow, low skilled workers, Mark Zuckerberg, mass immigration, meta analysis, meta-analysis, mutually assured destruction, natural language processing, new economy, On the Economy of Machinery and Manufactures, p-value, pattern recognition, Paul Samuelson, performance metric, Pierre-Simon Laplace, precariat, profit maximization, profit motive, Silicon Valley, social intelligence, statistical model, Stephen Hawking, stochastic process, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, The Future of Employment, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, Thomas Bayes, Thomas Malthus, traveling salesman, Turing machine, Turing test, twin studies, Vilfredo Pareto, Von Neumann architecture, women in the workforce

However, the idea that probability and statistics are the best way to cope with complexity and the uncertainty it creates is something of an act of faith for the AI community. That act of faith remains largely hidden from everyone outside that community by a cloud of seemingly impenetrable mathematics. This obscures the dangers inherent in using statistics and probability as a basis for reasoning about people via algorithms. Statistical models, after all, aren’t unbiased, particularly when, as is the case for most algorithms today, they are motivated by the pursuit of profit. Just like expert systems, statistical models require a frame within which to operate, which is then populated by particular atoms. That frame and those atoms are subject to the same brittleness (limitations) and biases. On top of that, the probabilities drawn from these statistics, which become the grist for the statistical algorithmic mill, often aren’t what we think they are at all.

…

Byron’s wife (whom he abandoned with child) coined the term ‘Byromania’ to describe the commotion around him, and Friedrich Nietzsche drew influences from him. Unlike Wollstonecraft, Byron was a game-changing personality who challenged conventions and social mores and opened the door to a new Romantic Age. At least for men. The casual definition of outlier is ‘a person or thing situated away or detached from the main body or system,’ but in statistical modelling, it is ‘a data point on a graph or in a set of results that is very much bigger or smaller than the next nearest data point.’ In terms of algorithms, a statistical model is like the flattened and warped rugby ball, a shape that can be mathematically characterized by a few numbers, which can be in turn manipulated by an algorithm to fit data. In this sense, an outlier is a point that is far from the other points, the fluff on the data cloud which can’t easily be fitted inside the warped rugby ball.
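The statistical definition quoted here, a point much bigger or smaller than the next nearest data point, translates directly into a gap test. A minimal sketch, assuming one-dimensional data and treating "very much bigger" as a gap more than `factor` times the median nearest-neighbor gap; the threshold and the name `gap_outliers` are arbitrary illustrative choices, not the author's.

```python
import statistics


def gap_outliers(values, factor=5.0):
    """Flag points whose gap to the nearest other point is unusually large.

    'Unusually large' means more than `factor` times the median nearest-neighbor gap.
    Needs at least two values.
    """
    ordered = sorted(values)
    nearest_gaps = []
    for i, v in enumerate(ordered):
        gaps = []
        if i > 0:
            gaps.append(v - ordered[i - 1])  # distance to the next smaller point
        if i < len(ordered) - 1:
            gaps.append(ordered[i + 1] - v)  # distance to the next larger point
        nearest_gaps.append(min(gaps))
    typical_gap = statistics.median(nearest_gaps)
    return [v for v, gap in zip(ordered, nearest_gaps) if gap > factor * typical_gap]


print(gap_outliers([10, 11, 12, 12.5, 13, 40]))  # [40]
```

One limitation follows straight from the definition: two extreme points close to each other (say 40 and 41) shield one another, so neither is flagged.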

…

However, there is another way to view the Bell Curve: not as a natural law, but as an artefact of trying to see complex and uncertain phenomena through the limiting lens of sampling and statistics. The central limit theorem (CLT) does not prove that everything follows a Bell Curve; it shows that when you sample in order to understand things that you can’t observe, you will always get a Bell Curve. That’s all. Despite this reality, faith in CLT and the Bell Curve still dominates in statistical modelling of all sorts of things today from presidential approval ratings to reoffending rates for criminals to educational success or failure, to whether jobs can be done by computers as well as people. What’s more, faith in this mathematical model inevitably led to its use in areas where it was ill-suited and inappropriate, such as Quetelet’s Theory of Probabilities as Applied to the Moral and Political Sciences.
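The central limit theorem concerns averages: the distribution of the mean of many independent draws (with finite variance) approaches a Bell Curve even when the individual draws follow no such shape. A short simulation makes this visible. It is a sketch under assumed parameters: exponential draws with rate 1, which are strongly right-skewed, averaged in samples of 50.

```python
import random
import statistics


def skewness(xs):
    """Population skewness: about 0 for a symmetric Bell Curve, about 2 for exponential data."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def sample_means(n_per_sample, n_samples, seed=0):
    """Draw n_samples samples of exponential data and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n_per_sample)])
            for _ in range(n_samples)]


raw_rng = random.Random(42)
raw = [raw_rng.expovariate(1.0) for _ in range(20_000)]         # individual draws
means = sample_means(n_per_sample=50, n_samples=2_000, seed=7)  # averages of 50 draws
print(f"skewness of raw draws: {skewness(raw):.2f}")
print(f"skewness of sample means: {skewness(means):.2f}")
print(f"spread of means: {statistics.pstdev(means):.3f} (theory: {1 / 50 ** 0.5:.3f})")
```

The individual draws stay lopsided; only their averages look Gaussian. Whether a quantity in the world behaves like such an average is a separate, empirical question.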

pages: 545 words: 137,789

**
How Markets Fail: The Logic of Economic Calamities
** by
John Cassidy

"Robert Solow", Albert Einstein, Andrei Shleifer, anti-communist, asset allocation, asset-backed security, availability heuristic, bank run, banking crisis, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, Black-Scholes formula, Blythe Masters, Bretton Woods, British Empire, business cycle, capital asset pricing model, centralized clearinghouse, collateralized debt obligation, Columbine, conceptual framework, Corn Laws, corporate raider, correlation coefficient, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, crony capitalism, Daniel Kahneman / Amos Tversky, debt deflation, different worldview, diversification, Elliott wave, Eugene Fama: efficient market hypothesis, financial deregulation, financial innovation, Financial Instability Hypothesis, financial intermediation, full employment, George Akerlof, global supply chain, Gunnar Myrdal, Haight Ashbury, hiring and firing, Hyman Minsky, income per capita, incomplete markets, index fund, information asymmetry, Intergovernmental Panel on Climate Change (IPCC), invisible hand, John Nash: game theory, John von Neumann, Joseph Schumpeter, Kenneth Arrow, Kickstarter, laissez-faire capitalism, Landlord’s Game, liquidity trap, London Interbank Offered Rate, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, margin call, market bubble, market clearing, mental accounting, Mikhail Gorbachev, money market fund, Mont Pelerin Society, moral hazard, mortgage debt, Myron Scholes, Naomi Klein, negative equity, Network effects, Nick Leeson, Northern Rock, paradox of thrift, Pareto efficiency, Paul Samuelson, Ponzi scheme, price discrimination, price stability, principal–agent problem, profit maximization, quantitative trading / quantitative finance, race to the bottom, Ralph Nader, RAND corporation, random walk, Renaissance Technologies, rent control, Richard Thaler, risk tolerance, risk-adjusted returns, road to serfdom, Robert Shiller, Ronald Coase, Ronald Reagan, shareholder value, short selling, Silicon Valley, South Sea Bubble, sovereign wealth fund, statistical model, technology bubble, The Chicago School, The Great Moderation, The Market for Lemons, The Wealth of Nations by Adam Smith, too big to fail, transaction costs, unorthodox policies, value at risk, Vanguard fund, Vilfredo Pareto, wealth creators, zero-sum game

“Today, retail lending has become more routinized as banks have become increasingly adept at predicting default risk by applying statistical models to data, such as credit scores,” Bernanke went on. “Other tools include proprietary internal debt-rating models and third-party programs that use market data to analyze the risk of exposures to corporate borrowers that issue stock.” While challenges remained, Bernanke concluded, “banking organizations of all sizes have made substantial strides over the past two decades in their ability to measure and manage risks.” Nobody could quibble with Bernanke’s point that Wall Street was becoming more quantitative: the research and risk departments of big financial firms were teeming with physicists, applied mathematicians, and statisticians. But the proper role of statistical models is as a useful adjunct to an overall strategy of controlling risk, not as a substitute for one.

…

However, it also raises the possibility that the causal relationships that determine market movements aren’t fixed, but vary over time. Maybe because of shifts in psychology or government policy, there are periods when markets will settle into a rut, and other periods when they will be apt to gyrate in alarming fashion. This picture seems to jibe with reality, but it raises some tricky issues for quantitative finance. If the underlying reality of the markets is constantly changing, statistical models based on past data will be of limited use, at best, in determining what is likely to happen in the future. And firms and investors that rely on these models to manage risk may well be exposing themselves to danger. The economics profession didn’t exactly embrace Mandelbrot’s criticisms. As the 1970s proceeded, the use of quantitative techniques became increasingly common on Wall Street. The coin-tossing view of finance made its way into the textbooks and, with the help of Burton Malkiel, onto the bestsellers list.

…

After listening to Vincent Reinhart, the head of the Fed’s Division of Monetary Affairs, suggest several ways the Fed could try to revive the economy if interest rate changes could no longer be used, he dismissed the discussion as “premature” and described the possibility of a prolonged deflation as “a very small probability event.” The discussion turned to the immediate issue of whether to keep the funds rate at 1.25 percent. Since the committee’s previous meeting, Congress had approved the Bush administration’s third set of tax cuts since 2001, which was expected to give spending a boost. The Fed’s own statistical model of the economy was predicting a vigorous upturn later in 2003, suggesting that further rate cuts would be unnecessary and that some policy tightening might even be needed. “But that forecast has a very low probability, as far as I’m concerned,” Greenspan said curtly. “It points to an outcome that would be delightful if it were to materialize, but it is not a prospect on which we should focus our policy at this point.”

**
Big Data at Work: Dispelling the Myths, Uncovering the Opportunities
** by
Thomas H. Davenport

Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, commoditize, data acquisition, disruptive innovation, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, lifelogging, Mark Zuckerberg, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, self-driving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining, Thomas Davenport

Another difference is a widespread preference for visual analytics on big data. For reasons not entirely understood (by anyone, I think), the results of big data analyses are often expressed in visual formats. Now, visual analytics have a lot of strengths: They are relatively easy for non-quantitative executives to interpret, and they get attention. The downside is that they are not generally well suited for expressing complex multivariate relationships and statistical models. Put in other terms, most visual displays of data are for descriptive analytics, rather than predictive or prescriptive ones. They can, however, show a lot of data at once, as figure 4-1 illustrates. It’s a display of the tweets and retweets on Twitter involving particular New York Times articles.5 I find—as with many other complex big data visualizations—this one difficult to decipher. I sometimes think that many big data visualizations are created simply because they can be, rather than to provide clarity on an issue.

…

5 Technology for Big Data

Written with Jill Dyché

A major component of what makes the management and analysis of big data possible is new technology.* In effect, big data is not just a large volume of unstructured data, but also the technologies that make processing and analyzing it possible. Specific big data technologies analyze textual, video, and audio content. When big data is fast moving, technologies like machine learning allow for the rapid creation of statistical models that fit, optimize, and predict the data. This chapter is devoted to all of these big data technologies and the difference they make. The technologies addressed in the chapter are outlined in table 5-1. *I am indebted in this section to Jill Dyché, vice president of SAS Best Practices, who collaborated with me on this work and developed many of the frameworks in this section. Much of the content is taken from our report, Big Data in Big Companies (International Institute for Analytics, April 2013).

…

Hive performs similar functions but is more batch oriented, and it can transform data into the relational format suitable for Structured Query Language (SQL; used to access and manipulate data in databases) queries. This makes it useful for analysts who are familiar with that query language.

Business View

The business view layer of the stack makes big data ready for further analysis. Depending on the big data application, additional processing via MapReduce or custom code might be used to construct an intermediate data structure, such as a statistical model, a flat file, a relational table, or a data cube. The resulting structure may be intended for additional analysis or to be queried by a traditional SQL-based query tool. Many vendors are moving to so-called “SQL on Hadoop” approaches, simply because SQL has been used in business for a couple of decades, and many people (and higher-level languages) know how to create SQL queries. This business view ensures that big data is more consumable by the tools and the knowledge workers that already exist in an organization.
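The "business view" step, turning raw records into a relational table that SQL users can query, can be imitated on a single machine. The sketch below is only an analogy: plain Python map and reduce functions stand in for a MapReduce job, and Python's built-in `sqlite3` stands in for Hive or a SQL-on-Hadoop engine. The log format, table name, and columns are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw, semi-structured event records.
raw_events = [
    "2013-04-01 store=NY product=shoes qty=2",
    "2013-04-01 store=NY product=hats qty=1",
    "2013-04-02 store=SF product=shoes qty=3",
    "2013-04-02 store=NY product=shoes qty=4",
]


def map_event(line):
    """Map step: parse one record into a ((store, product), quantity) pair."""
    fields = dict(part.split("=", 1) for part in line.split()[1:])
    return (fields["store"], fields["product"]), int(fields["qty"])


def reduce_pairs(pairs):
    """Reduce step: sum quantities for each (store, product) key."""
    totals = defaultdict(int)
    for key, qty in pairs:
        totals[key] += qty
    return totals


totals = reduce_pairs(map(map_event, raw_events))

# Business view: load the reduced output into a relational table for SQL users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(store, product, qty) for (store, product), qty in totals.items()],
)
rows = conn.execute(
    "SELECT store, SUM(qty) FROM sales GROUP BY store ORDER BY store"
).fetchall()
print(rows)  # [('NY', 7), ('SF', 3)]
conn.close()
```

The division of labor mirrors the passage: custom code does the heavy parsing and aggregation once, and analysts then work with familiar SQL against the resulting table.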

pages: 443 words: 51,804

**
Handbook of Modeling High-Frequency Data in Finance
** by
Frederi G. Viens,
Maria C. Mariani,
Ionut Florescu

algorithmic trading, asset allocation, automated trading system, backtesting, Black-Scholes formula, Brownian motion, business process, buy and hold, continuous integration, corporate governance, discrete time, distributed generation, fixed income, Flash crash, housing crisis, implied volatility, incomplete markets, linear programming, mandelbrot fractal, market friction, market microstructure, martingale, Menlo Park, p-value, pattern recognition, performance metric, principal–agent problem, random walk, risk tolerance, risk/return, short selling, statistical model, stochastic process, stochastic volatility, transaction costs, value at risk, volatility smile, Wiener process

Florescu, Ionuţ, 1973– III. Title. HG106.V54 2011 332.01 5193–dc23 2011038022 Printed in the United States of America

Contents
Preface
Contributors
Part One: Analysis of Empirical Data
1 Estimation of NIG and VG Models for High Frequency Financial Data
José E. Figueroa-López, Steven R. Lancette, Kiseop Lee, and Yanhui Mi
1.1 Introduction; 1.2 The Statistical Models; 1.3 Parametric Estimation Methods; 1.4 Finite-Sample Performance via Simulations; 1.5 Empirical Results; 1.6 Conclusion; References
2 A Study of Persistence of Price Movement using High Frequency Financial Data
Dragos Bozdog, Ionuţ Florescu, Khaldoun Khashanah, and Jim Wang
2.1 Introduction; 2.2 Methodology; 2.3 Results; 2.4 Rare Events Distribution; 2.5 Conclusions; References
3 Using Boosting for Financial Analysis and Trading
Germán Creamer
3.1 Introduction; 3.2 Methods; 3.3 Performance Evaluation; 3.4 Earnings Prediction and Algorithmic Trading; 3.5 Final Comments and Conclusions; References
4 Impact of Correlation Fluctuations on Securitized structures
Eric Hillebrand, Ambar N.

…

In Section 1.5, we present our empirical results using high frequency transaction data from the US equity market. The data was obtained from the NYSE TAQ database of 2005 trades via Wharton’s WRDS system. For the sake of clarity and space, we only present the results for Intel and defer a full analysis of other stocks to a future publication. We finish with a section of conclusions and further recommendations.

1.2 The Statistical Models

1.2.1 Generalities of Exponential Lévy Models

Before introducing the specific models we consider in this chapter, let us briefly motivate the application of Lévy processes in financial modeling. We refer the reader to the monographs of Cont & Tankov (2004) and Sato (1999) or the recent review papers Figueroa-López (2011) and Tankov (2011) for further information. Exponential (or Geometric) Lévy models are arguably the most natural generalization of the geometric Brownian motion intrinsic in the Black–Scholes option pricing model.

…

A geometric Brownian motion (also called the Black–Scholes model) postulates the following conditions about the price process $(S_t)_{t\ge 0}$ of a risky asset: (1) The (log) return on the asset over a time period $[t, t+h]$ of length $h$, that is, $R_{t,t+h} := \log\frac{S_{t+h}}{S_t}$, is Gaussian with mean $\mu h$ and variance $\sigma^2 h$ (independent of $t$); (2) Log returns on disjoint time periods are mutually independent; (3) The price path $t \mapsto S_t$ is continuous; that is, $\mathbb{P}(S_u \to S_t \text{ as } u \to t,\ \forall t) = 1$. The previous assumptions can equivalently be stated in terms of the so-called log return process $(X_t)_t$, denoted henceforth as $X_t := \log\frac{S_t}{S_0}$. Indeed, assumption (1) is equivalent to asking that the increment $X_{t+h} - X_t$ of the process $X$ over $[t, t+h]$ is Gaussian with mean $\mu h$ and variance $\sigma^2 h$.
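Assumptions (1) and (2) can be simulated directly: draw independent Gaussian log returns with mean μh and variance σ²h and compound them into a price path. The sketch below does that and then recovers μ and σ from the path's log returns. The parameter values (μ = 0.05, σ = 0.2, daily steps h = 1/252 over 40 years) are assumptions chosen for illustration, and a discretely sampled path can only approximate the continuity in assumption (3).

```python
import math
import random
import statistics


def simulate_gbm_path(s0, mu, sigma, h, steps, seed=0):
    """Price path whose log returns are i.i.d. Gaussian with mean mu*h, variance sigma^2*h."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(steps):
        log_return = rng.gauss(mu * h, sigma * math.sqrt(h))  # gauss takes (mean, std dev)
        path.append(path[-1] * math.exp(log_return))
    return path


def estimate_mu_sigma(path, h):
    """Recover mu and sigma from the log returns R_{t,t+h} = log(S_{t+h} / S_t)."""
    log_returns = [math.log(b / a) for a, b in zip(path, path[1:])]
    mu_hat = statistics.fmean(log_returns) / h
    sigma_hat = statistics.stdev(log_returns) / math.sqrt(h)
    return mu_hat, sigma_hat


h = 1 / 252
steps = 252 * 40
path = simulate_gbm_path(s0=100.0, mu=0.05, sigma=0.2, h=h, steps=steps, seed=1)
mu_hat, sigma_hat = estimate_mu_sigma(path, h)
print(f"mu_hat={mu_hat:.3f}, sigma_hat={sigma_hat:.3f}")
```

With these settings σ is recovered tightly while the estimate of μ stays noisy: its standard error is about σ/√T for a span of T years, roughly 0.03 here, whereas the error in σ shrinks with the number of observations.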

pages: 460 words: 122,556

**
The End of Wall Street
** by
Roger Lowenstein

Asian financial crisis, asset-backed security, bank run, banking crisis, Berlin Wall, Bernie Madoff, Black Swan, break the buck, Brownian motion, Carmen Reinhart, collateralized debt obligation, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, diversified portfolio, eurozone crisis, Fall of the Berlin Wall, fear of failure, financial deregulation, fixed income, high net worth, Hyman Minsky, interest rate derivative, invisible hand, Kenneth Rogoff, London Interbank Offered Rate, Long Term Capital Management, margin call, market bubble, Martin Wolf, money market fund, moral hazard, mortgage debt, negative equity, Northern Rock, Ponzi scheme, profit motive, race to the bottom, risk tolerance, Ronald Reagan, Rubik’s Cube, savings glut, short selling, sovereign wealth fund, statistical model, the payments system, too big to fail, tulip mania, Y2K

See AIG bailouts Ben Bernanke and board of Warren Buffett and CDOs and collateral calls on compensation at corporate structure of credit default swaps and credit rating agencies and Jamie Dimon and diversity of holdings employees, number of Financial Products subsidiary Timothy Geithner and Goldman Sachs and insurance (credit default swap) premiums of JPMorgan Chase and lack of reserve for losses leadership changes Lehman Brothers and losses Moody’s and Morgan Stanley and New York Federal Reserve Bank and Hank Paulson and rescue of. See AIG bailouts revenue of shareholders statistical modeling of stock price of struggles of risk of systemic effects of failure of Texas and AIG bailouts amount of Ben Bernanke and board’s role in credit rating agencies and Federal Reserve and Timothy Geithner and Goldman Sachs and JPMorgan Chase and Lehman Brothers’ bankruptcy and New York state and Hank Paulson and reasons for harm to shareholders in Akers, John Alexander, Richard Allison, Herbert Ambac American Home Mortgages Andrukonis, David appraisers, real estate Archstone-Smith Trust Associates First Capital Atteberry, Thomas auto industry Bagehot, Walter bailouts.

…

See credit crisis volatility of credit crisis borrowers, lack of effects of fear of lending mortgages and reasons for spread of as unforeseen credit cycle credit default swaps AIG and Goldman Sachs and Morgan Stanley and credit rating agencies. See also specific agencies AIG and capital level determination by guessing by inadequacy of models of Lehman Brothers and Monte Carlo method of mortgage-backed securities and statistical modeling used by Credit Suisse Cribiore, Alberto Cummings, Christine Curl, Gregory Dallavecchia, Enrico Dannhauser, Stephen Darling, Alistair Dean Witter debt of financial firms U.S. reliance on of U.S. families defaults/delinquencies deflation deleveraging. See also specific firms del Missier, Jerry Democrats deposit insurance deregulation of banking system and derivatives of financial markets derivatives.

…

See home foreclosure(s) foreign investors France Frank, Barney Freddie Mac and Fannie Mae accounting problems of affordable housing and Alternative-A loans bailout of Ben Bernanke and capital raised by competitive threats to Congress and Countrywide Financial and Democrats and Federal Reserve and foreign investment in Alan Greenspan and as guarantor history of lack of regulation of leadership changes leverage losses mortgage bubble and as mortgage traders Hank Paulson and politics and predatory lending and reasons for failures of relocation to private sector Robert Rodriguez and shareholders solving financial crisis through statistical models of stock price of Treasury Department and free market Freidheim, Scott Friedman, Milton Fuld, Richard compensation of failure to pull back from mortgage-backed securities identification with Lehman Brothers Lehman Brothers’ bankruptcy and Lehman Brothers’ last days and long tenure of Hank Paulson and personality and character of Gamble, James (Jamie) GDP Geithner, Timothy AIG and bank debt guarantees and Bear Stearns bailout and career of China and Citigroup and financial crisis, response to Lehman Brothers and money markets and Morgan Stanley and in Obama administration Hank Paulson and TARP and Gelband, Michael General Electric General Motors Germany Glass-Steagall Act Glauber, Robert Golden West Savings and Loan Goldman Sachs AIG and as bank holding company Warren Buffett investment in capital raised by capital sought by compensation at credit default swaps and hedge funds and insurance (credit default swap) premiums of job losses at leverage of Merrill Lynch and Stanley O’Neal’s obsession with Hank Paulson and pull back from mortgage-backed securities short selling against stock price of Wachovia and Gorton, Gary government, U.S.

**
Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals
** by
David Aronson

Albert Einstein, Andrew Wiles, asset allocation, availability heuristic, backtesting, Black Swan, butter production in bangladesh, buy and hold, capital asset pricing model, cognitive dissonance, compound rate of return, computerized trading, Daniel Kahneman / Amos Tversky, distributed generation, Elliott wave, en.wikipedia.org, feminist movement, hindsight bias, index fund, invention of the telescope, invisible hand, Long Term Capital Management, mental accounting, meta analysis, meta-analysis, p-value, pattern recognition, Paul Samuelson, Ponzi scheme, price anchoring, price stability, quantitative trading / quantitative finance, Ralph Nelson Elliott, random walk, retrograde motion, revision control, risk tolerance, risk-adjusted returns, riskless arbitrage, Robert Shiller, Sharpe ratio, short selling, source of truth, statistical model, stocks for the long run, systematic trading, the scientific method, transfer pricing, unbiased observer, yield curve, Yogi Berra

It was a review of prior studies, known as a meta-analysis, which examined 20 studies that had compared the subjective diagnoses of psychologists and psychiatrists with those produced by linear statistical models. The studies covered the prediction of academic success, the likelihood of criminal recidivism, and the outcomes of electrical shock therapy. In each case, the experts rendered a judgment by evaluating a multitude of variables in a subjective manner. “In all studies, the statistical model provided more accurate predictions or the two methods tied.”³⁴ A subsequent study by Sawyer³⁵ was a meta-analysis of 45 studies. “Again, there was not a single study in which clinical global judgment was superior to the statistical prediction (termed ‘mechanical combination’ by Sawyer).”³⁶ Sawyer’s investigation is noteworthy because he considered studies in which the human expert was allowed access to information that was not considered by the statistical model, and yet the model was still superior.

…

The prediction problems spanned nine different fields: (1) academic performance of graduate students, (2) life expectancy of cancer patients, (3) changes in stock prices, (4) mental illness using personality tests, (5) grades and attitudes in a psychology course, (6) business failures using financial ratios, (7) students’ ratings of teaching effectiveness, (8) performance of life insurance sales personnel, and (9) IQ scores using Rorschach tests. Note that the average correlation of the statistical model was 0.64, versus the expert average of 0.33. In terms of information content, which is measured by the squared correlation coefficient, or r-squared, the model’s predictions were on average 3.76 times as informative as the experts’. Numerous additional studies comparing expert judgment to statistical models (rules) have confirmed these findings, forcing the conclusion that people do poorly when attempting to combine a multitude of variables to make predictions or judgments. In 1968, Goldberg³⁹ showed that a linear prediction model using personality test scores as inputs could discriminate neurotic from psychotic patients better than experienced clinical diagnosticians.

…

The task was to predict the propensity for violence among newly admitted male psychiatric patients based on 19 inputs. The average accuracy of the experts, as measured by the correlation coefficient between their prediction of violence and the actual manifestation of violence, was a poor 0.12. The single best expert had a score of 0.36. The predictions of a linear statistical model, using the same set of 19 inputs, achieved a correlation of 0.82. In this instance the model’s predictions were nearly 50 times more informative than the experts’. Meehl continued to expand his research comparing experts and statistical models and in 1986 concluded that “There is no controversy in social science which shows such a large body of qualitatively diverse studies coming out so uniformly in the same direction as this one. When you are pushing 90 investigations [currently greater than 150]⁴⁰ predicting everything from the outcomes of football games to the diagnosis of liver disease and when you can hardly come up with a half dozen studies showing even a weak tendency in favor of the clinician, it is time to draw a practical conclusion.”⁴¹ The evidence continues to accumulate, yet few experts pay heed.
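As a quick check of the arithmetic in the two passages above, squaring each correlation (r-squared) and taking the ratio reproduces the quoted multiples. The function name below is ours, for illustration only.

```python
def information_ratio(r_model: float, r_expert: float) -> float:
    """Ratio of squared correlations (r-squared): the 'information content' comparison."""
    return (r_model ** 2) / (r_expert ** 2)

# Averages across the nine prediction problems: model r = 0.64, experts r = 0.33
nine_fields = information_ratio(0.64, 0.33)
# Violence-prediction study: model r = 0.82, average expert r = 0.12
violence = information_ratio(0.82, 0.12)

print(round(nine_fields, 2))  # 3.76
print(round(violence, 1))     # 46.7
```

The first ratio matches the "3.76 times as informative" figure; the second, about 47, is the "nearly 50 times" in the violence study.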

pages: 294 words: 77,356

**
Automating Inequality
** by
Virginia Eubanks

autonomous vehicles, basic income, business process, call centre, cognitive dissonance, collective bargaining, correlation does not imply causation, deindustrialization, disruptive innovation, Donald Trump, Elon Musk, ending welfare as we know it, experimental subject, housing crisis, IBM and the Holocaust, income inequality, job automation, mandatory minimum, Mark Zuckerberg, mass incarceration, minimum wage unemployment, mortgage tax deduction, new economy, New Urbanism, payday loans, performance metric, Ronald Reagan, self-driving car, statistical model, strikebreaker, underbanked, universal basic income, urban renewal, War on Poverty, working poor, Works Progress Administration, young professional, zero-sum game

The 2006 Indiana eligibility modernization experiment was fairly straightforward: the system accepted online applications for services, checked and verified income and other personal information, and set benefit levels. The electronic registry of the unhoused I studied in Los Angeles, called the coordinated entry system, was piloted seven years later. It deploys computerized algorithms to match unhoused people in its registry to the most appropriate available housing resources. The Allegheny Family Screening Tool, launched in August 2016, uses statistical modeling to provide hotline screeners with a predictive risk score that shapes the decision whether or not to open child abuse and neglect investigations. I started my reporting in each location by reaching out to organizations working closely with the families most directly impacted by these systems. Over three years, I conducted 105 interviews, sat in on family court, observed a child abuse hotline call center, searched public records, submitted Freedom of Information Act requests, pored through court filings, and attended dozens of community meetings.

…

“[P]renatal risk assessments could be used to identify children at risk … while still in the womb.”3 On the other side of the world, Rhema Vaithianathan, associate professor of economics at the University of Auckland, was on a team developing just such a tool. As part of a larger program of welfare reforms led by conservative Paula Bennett, the New Zealand Ministry of Social Development (MSD) commissioned the Vaithianathan team to create a statistical model to sift information on parents interacting with the public benefits, child protective, and criminal justice systems to predict which children were most likely to be abused or neglected. Vaithianathan reached out to Putnam-Hornstein to collaborate. “It was such an exciting opportunity to partner with Rhema’s team around this potential real-time use of data to target children,” said Putnam-Hornstein.

…

Nevertheless, Allegheny County’s experiment in predicting child maltreatment is worth watching with a skeptical eye. It is an early adopter in a nationwide algorithmic experiment in child welfare: similar systems have been implemented recently in Florida, Los Angeles, New York City, Oklahoma, and Oregon. As this book goes to press, Cherna and Dalton continue to experiment with data analytics. The next iteration of the AFST will employ machine learning rather than traditional statistical modeling. They also plan to introduce a second predictive model, one that will not rely on reports to the hotline at all. Instead, the planned model “would be run on a daily or weekly basis on all babies born in Allegheny County the prior day or week,” according to a September 2017 email from Dalton. Running a model that relies on the public to make calls to a hotline does not capture the whole population of potential abusers and neglecters; at-birth models are much more accurate.

pages: 263 words: 75,455

**
Quantitative Value: A Practitioner's Guide to Automating Intelligent Investment and Eliminating Behavioral Errors
** by
Wesley R. Gray,
Tobias E. Carlisle

activist fund / activist shareholder / activist investor, Albert Einstein, Andrei Shleifer, asset allocation, Atul Gawande, backtesting, beat the dealer, Black Swan, business cycle, butter production in bangladesh, buy and hold, capital asset pricing model, Checklist Manifesto, cognitive bias, compound rate of return, corporate governance, correlation coefficient, credit crunch, Daniel Kahneman / Amos Tversky, discounted cash flows, Edward Thorp, Eugene Fama: efficient market hypothesis, forensic accounting, hindsight bias, intangible asset, Louis Bachelier, p-value, passive investing, performance metric, quantitative hedge fund, random walk, Richard Thaler, risk-adjusted returns, Robert Shiller, shareholder value, Sharpe ratio, short selling, statistical model, survivorship bias, systematic trading, The Myth of the Rational Market, time value of money, transaction costs

We need some means to protect us from our cognitive biases, and the quantitative method is that means. It serves both to protect us from our own behavioral errors and to exploit the behavioral errors of others. The model need not be complex to achieve this end. In fact, the weight of evidence indicates that even simple statistical models outperform the best experts. It speaks to the diabolical nature of our faulty cognitive apparatus that those simple statistical models continue to outperform the best experts even when those same experts are given access to the models' output. This is as true for a value investor as it is for any other expert in any other field of endeavor. This book is aimed at value investors. It's a humbling and maddening experience to compare active investment results with an analogous passive strategy.

…

In his book, Expert Political Judgment,³⁶ Philip Tetlock discusses his extensive study of people who make prediction their business: the experts. Tetlock's conclusion is that experts suffer from the same behavioral biases as laymen. Tetlock's study fits within a much larger body of research that has consistently found that experts are as unreliable as the rest of us. A large number of studies have examined the records of experts against simple statistical models and, in almost all cases, concluded that experts either underperform the models or can do no better. It's a compelling argument against human intuition and for the statistical approach, whether it's practiced by experts or nonexperts.³⁷

Even Experts Make Behavioral Errors

In many disciplines, simple quantitative models outperform the intuition of the best experts. The simple quantitative models continue to outperform the judgments of the best experts, even when those experts are given the benefit of the outputs from the simple quantitative model.

…

The model predicted O'Connor's vote correctly 70 percent of the time, while the experts' success rate was only 61 percent.⁴¹ How can it be that simple models perform better than experienced clinical psychologists or renowned legal experts with access to detailed information about the cases? Are these results just flukes? No. In fact, the MMPI and Supreme Court decision examples are not even rare. There is an overwhelming number of studies and meta-analyses (studies of studies) that corroborate this phenomenon. In his book, Montier provides a diverse range of studies comparing statistical models and experts, covering the detection of brain damage, the interview process to admit students to university, the likelihood of a criminal to reoffend, the selection of “good” and “bad” vintages of Bordeaux wine, and the buying decisions of purchasing managers.

Value Investors Have Cognitive Biases, Too

Graham recognized early on that successful investing required emotional discipline.

pages: 1,829 words: 135,521

**
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
** by
Wes McKinney

business process, Debian, Firefox, general-purpose programming language, Google Chrome, Guido van Rossum, index card, p-value, quantitative trading / quantitative finance, random walk, recommendation engine, sentiment analysis, side project, sorting algorithm, statistical model, type inference

    .: categories=['a', 'b'])

    In [25]: data
    Out[25]:
       x0    x1    y category
    0   1  0.01 -1.5        a
    1   2 -0.01  0.0        b
    2   3  0.25  3.6        a
    3   4 -4.10  1.3        a
    4   5  0.00 -2.0        b

If we wanted to replace the 'category' column with dummy variables, we create dummy variables, drop the 'category' column, and then join the result:

    In [26]: dummies = pd.get_dummies(data.category, prefix='category')

    In [27]: data_with_dummies = data.drop('category', axis=1).join(dummies)

    In [28]: data_with_dummies
    Out[28]:
       x0    x1    y  category_a  category_b
    0   1  0.01 -1.5           1           0
    1   2 -0.01  0.0           0           1
    2   3  0.25  3.6           1           0
    3   4 -4.10  1.3           1           0
    4   5  0.00 -2.0           0           1

There are some nuances to fitting certain statistical models with dummy variables. It may be simpler and less error-prone to use Patsy (the subject of the next section) when you have more than simple numeric columns.

13.2 Creating Model Descriptions with Patsy

Patsy is a Python library for describing statistical models (especially linear models) with a small string-based “formula syntax,” which is inspired by (but not exactly the same as) the formula syntax used by the R and S statistical programming languages. Patsy is well supported for specifying linear models in statsmodels, so I will focus on some of the main features to help you get up and running.
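One commonly cited nuance with dummy variables, offered here as our reading of what the passage alludes to rather than something it spells out, is collinearity: when a model also includes an intercept, a full set of dummy columns always sums to a column of ones. The stdlib-only sketch below uses a hypothetical `dummy_encode` helper (not a pandas or Patsy function) on the `category` values from the example to show this and the usual fix of dropping one level. pandas' `get_dummies` offers the same option through `drop_first=True`.

```python
def dummy_encode(values, prefix, drop_first=False):
    """Hypothetical helper: one 0/1 column per category level, optionally dropping the first level."""
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]  # the dropped level becomes the reference category
    return {f"{prefix}_{lvl}": [1 if v == lvl else 0 for v in values] for lvl in levels}

category = ["a", "b", "a", "a", "b"]  # the 'category' column from the example above

full = dummy_encode(category, "category")
# Every row's dummies sum to 1, duplicating an intercept column of ones
row_sums = [sum(col[i] for col in full.values()) for i in range(len(category))]
print(full)      # {'category_a': [1, 0, 1, 1, 0], 'category_b': [0, 1, 0, 0, 1]}
print(row_sums)  # [1, 1, 1, 1, 1]

reduced = dummy_encode(category, "category", drop_first=True)
print(reduced)   # {'category_b': [0, 1, 0, 0, 1]}
```

As far as we know, Patsy's default treatment coding makes the same choice automatically when a formula includes an intercept, which is one reason it can be less error-prone.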

…

Compared with scikit-learn, statsmodels contains algorithms for classical (primarily frequentist) statistics and econometrics. This includes such submodules as:

- Regression models: linear regression, generalized linear models, robust linear models, linear mixed effects models, etc.
- Analysis of variance (ANOVA)
- Time series analysis: AR, ARMA, ARIMA, VAR, and other models
- Nonparametric methods: kernel density estimation, kernel regression
- Visualization of statistical model results

statsmodels is more focused on statistical inference, providing uncertainty estimates and p-values for parameters. scikit-learn, by contrast, is more prediction-focused. As with scikit-learn, I will give a brief introduction to statsmodels and how to use it with NumPy and pandas.

1.4 Installation and Setup

Since everyone uses Python for different applications, there is no single solution for setting up Python and required add-on packages.

…

While readers may have many different end goals for their work, the tasks required generally fall into a number of different broad groups:

- Interacting with the outside world: reading and writing with a variety of file formats and data stores
- Preparation: cleaning, munging, combining, normalizing, reshaping, slicing and dicing, and transforming data for analysis
- Transformation: applying mathematical and statistical operations to groups of datasets to derive new datasets (e.g., aggregating a large table by group variables)
- Modeling and computation: connecting your data to statistical models, machine learning algorithms, or other computational tools
- Presentation: creating interactive or static graphical visualizations or textual summaries

Code Examples

Most of the code examples in the book are shown with input and output as it would appear executed in the IPython shell or in Jupyter notebooks:

    In [5]: CODE EXAMPLE
    Out[5]: OUTPUT

When you see a code example like this, the intent is for you to type in the example code in the In block in your coding environment and execute it by pressing the Enter key (or Shift-Enter in Jupyter).

pages: 461 words: 128,421

**
The Myth of the Rational Market: A History of Risk, Reward, and Delusion on Wall Street
** by
Justin Fox

activist fund / activist shareholder / activist investor, Albert Einstein, Andrei Shleifer, asset allocation, asset-backed security, bank run, beat the dealer, Benoit Mandelbrot, Black-Scholes formula, Bretton Woods, Brownian motion, business cycle, buy and hold, capital asset pricing model, card file, Cass Sunstein, collateralized debt obligation, complexity theory, corporate governance, corporate raider, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, discovery of the americas, diversification, diversified portfolio, Edward Glaeser, Edward Thorp, endowment effect, Eugene Fama: efficient market hypothesis, experimental economics, financial innovation, Financial Instability Hypothesis, fixed income, floating exchange rates, George Akerlof, Henri Poincaré, Hyman Minsky, implied volatility, impulse control, index arbitrage, index card, index fund, information asymmetry, invisible hand, Isaac Newton, John Meriwether, John Nash: game theory, John von Neumann, joint-stock company, Joseph Schumpeter, Kenneth Arrow, libertarian paternalism, linear programming, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, market bubble, market design, Myron Scholes, New Journalism, Nikolai Kondratiev, Paul Lévy, Paul Samuelson, pension reform, performance metric, Ponzi scheme, prediction markets, pushing on a string, quantitative trading / quantitative finance, Ralph Nader, RAND corporation, random walk, Richard Thaler, risk/return, road to serfdom, Robert Bork, Robert Shiller, rolodex, Ronald Reagan, shareholder value, Sharpe ratio, short selling, side project, Silicon Valley, Social Responsibility of Business Is to Increase Its Profits, South Sea Bubble, statistical model, stocks for the long run, The Chicago School, The Myth of the Rational Market, The Predators' Ball, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas L Friedman, Thorstein Veblen, Tobin tax, transaction costs, tulip mania, value at risk, Vanguard fund, Vilfredo Pareto, volatility smile, Yogi Berra

First, modeling financial risk is hard. Statistical models can never fully capture all things that can go wrong (or right). It was as physicist and random walk pioneer M. F. M. Osborne told his students at UC–Berkeley back in 1972: For everyday market events the bell curve works well. When it doesn’t, one needs to look outside the statistical models and make informed judgments about what’s driving the market and what the risks are. The derivatives business and other financial sectors on the rise in the 1980s and 1990s were dominated by young quants. These people knew how to work statistical models, but they lacked the market experience needed to make informed judgments. Meanwhile, those with the experience, wisdom, and authority to make informed judgments—the bosses—didn’t understand the statistical models. It’s possible that, as more quants rise into positions of high authority (1986 Columbia finance Ph.D.

…

Traditional ratios of loan-to-value and monthly payments to income gave way to credit scoring and purportedly precise gradations of default risk that turned out to be worse than useless. In the 1970s, Amos Tversky and Daniel Kahneman had argued that real-world decision makers didn’t follow the statistical models of John von Neumann and Oskar Morgenstern, but used simple heuristics—rules of thumb—instead. Now the mortgage lending industry was learning that heuristics worked much better than statistical models descended from the work of von Neumann and Morgenstern. Simple trumped complex. In 2005, Robert Shiller came out with a second edition of Irrational Exuberance that featured a new twenty-page chapter on “The Real Estate Market in Historical Perspective.” It offered no formulas for determining whether prices were right, but it did feature an index of U.S. home prices back to 1890.

pages: 301 words: 85,126

**
AIQ: How People and Machines Are Smarter Together
** by
Nick Polson,
James Scott

Air France Flight 447, Albert Einstein, Amazon Web Services, Atul Gawande, autonomous vehicles, availability heuristic, basic income, Bayesian statistics, business cycle, Cepheid variable, Checklist Manifesto, cloud computing, combinatorial explosion, computer age, computer vision, Daniel Kahneman / Amos Tversky, Donald Trump, Douglas Hofstadter, Edward Charles Pickering, Elon Musk, epigenetics, Flash crash, Grace Hopper, Gödel, Escher, Bach, Harvard Computers: women astronomers, index fund, Isaac Newton, John von Neumann, late fees, low earth orbit, Lyft, Magellanic Cloud, mass incarceration, Moneyball by Michael Lewis explains big data, Moravec's paradox, more computing power than Apollo, natural language processing, Netflix Prize, North Sea oil, p-value, pattern recognition, Pierre-Simon Laplace, ransomware, recommendation engine, Ronald Reagan, self-driving car, sentiment analysis, side project, Silicon Valley, Skype, smart cities, speech recognition, statistical model, survivorship bias, the scientific method, Thomas Bayes, Uber for X, uber lyft, universal basic income, Watson beat the top human players on Jeopardy!, young professional

People rely on billions of language facts, most of which they take for granted—like the knowledge that “drop your trousers” and “drop off your trousers” are used in very different situations, only one of which is at the dry cleaner’s. Knowledge like this is hard to codify in explicit rules, because there’s too much of it. Believe it or not, the best way we know to teach it to machines is to give them a giant hard drive full of examples of how people say stuff, and to let the machines sort it out on their own with a statistical model. This purely data-driven approach to language may seem naïve, and until recently we simply didn’t have enough data or fast-enough computers to make it work. Today, though, it works shockingly well. At its tech conference in 2017, for example, Google boldly announced that machines had now reached parity with humans at speech recognition, with a per-word dictation error rate of 4.9%—drastically better than the 20–30% error rates common as recently as 2013.
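The 4.9% figure is a per-word error rate. One common way to compute such a rate, which we assume here since the passage does not spell out Google's exact scoring, is word error rate (WER): the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal sketch, using a "weather"/"whether" confusion as a toy input:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

ref = "the weather report calls for rain"
hyp = "the whether report calls for rain"
print(word_error_rate(ref, hyp))  # 1 substitution over 6 words, about 0.1667
```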

…

This is about 250 times more common than “whether report” (0.0000000652%), which is used mainly as a bad pun or an example of phonetic ambiguity. From the 1980s onward, NLP researchers began to recognize the value of this purely statistical information. Before, they’d been hand-building rules capable of describing how a given linguistic task should be performed. Now, these experts started training statistical models capable of predicting that a person would perform a task in a certain way. As a field, NLP shifted its focus from understanding to mimicry—from knowing how, to knowing that. These new models required lots of data. You fed the machine as many examples as you could find of how humans use language, and you programmed the machine to use the rules of probability to find patterns in those examples.
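The shift described above, from hand-built rules to counting how people actually use language, can be illustrated with a bigram model: count adjacent word pairs in a corpus and prefer the pair that occurs more often. The corpus and helper names below are hypothetical toy stand-ins, not Google's data or code.

```python
from collections import Counter

def bigram_counts(corpus):
    """Count adjacent word pairs across all sentences (lowercased)."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

def pick_more_likely(counts, candidates):
    """Return the candidate word pair seen most often in the corpus."""
    return max(candidates, key=lambda pair: counts[pair])

# Tiny hypothetical corpus, standing in for the billions of examples a real system would use
corpus = [
    "the weather report calls for rain",
    "check the weather report before you leave",
    "I do not know whether report cards came out",
    "the weather report was wrong again",
]
counts = bigram_counts(corpus)
print(counts[("weather", "report")], counts[("whether", "report")])  # 3 1
print(pick_more_likely(counts, [("whether", "report"), ("weather", "report")]))  # ('weather', 'report')
```

No rule about spelling or meaning is encoded anywhere; the preference comes entirely from frequencies in the data, which is why such models need so much of it.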

…

One example was Google 411, which debuted in 2007. You may remember a time when people dialed 411 to look up a phone number for a local business, at a dollar or so per call. Google 411 let you do the same thing for free, by dialing 1-800-GOOG-411. It was a useful service in an age before ubiquitous smartphones, and also a great way for Google to build up an enormous database of voice queries that would help train its statistical models for speech recognition. The system quietly shut down in 2010, presumably because Google had all the data it needed. Of course, there’s been an awful lot of Grace Hopper–style coding since 2007 to turn all that data into good prediction rules. So more than a decade later, what’s the result? Let’s try a simple experiment. Open up a blank email on your phone and try dictating a test phrase: “The weather report calls for rain, whether or not the reigning queen has an umbrella.”

**
Learn Algorithmic Trading
** by
Sebastien Donadio

active measures, algorithmic trading, automated trading system, backtesting, Bayesian statistics, buy and hold, buy low sell high, cryptocurrency, DevOps, en.wikipedia.org, fixed income, Flash crash, Guido van Rossum, latency arbitrage, locking in a profit, market fundamentalism, market microstructure, martingale, natural language processing, p-value, paper trading, performance metric, prediction markets, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, Sharpe ratio, short selling, sorting algorithm, statistical arbitrage, statistical model, stochastic process, survivorship bias, transaction costs, type inference, WebSocket, zero-sum game

Regular relational databases are not efficient at reading these time series. We will review a few ways to handle time series.

In-sample versus out-of-sample data

When building a statistical model, we use cross-validation to avoid overfitting. Cross-validation divides the data into two or three different sets. One set is used to create the model, while the other sets are used to validate the model's accuracy. Because the model was not created with those other datasets, we get a better idea of how it performs on data it has not seen. When testing a trading strategy with historical data, it is likewise important to hold back a portion of the data for testing. In statistical modeling, the data used to create the model is called training data; for a trading strategy, we say that we are working with in-sample data. The testing data is called out-of-sample data.
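For time series, the split has to respect time order: shuffling rows before splitting would let the model peek at future prices. A minimal sketch, assuming a plain list of prices ordered oldest first and a 70/30 split (both our choices, not the book's); the helper name is hypothetical.

```python
def in_out_sample_split(prices, in_sample_fraction=0.7):
    """Split a chronologically ordered series without shuffling:
    the earlier part is in-sample (training), the later part out-of-sample (testing)."""
    if not 0 < in_sample_fraction < 1:
        raise ValueError("in_sample_fraction must be between 0 and 1")
    cut = round(len(prices) * in_sample_fraction)
    return prices[:cut], prices[cut:]

# Hypothetical daily closing prices, oldest first
prices = [100.0, 101.5, 99.8, 102.3, 103.1, 102.7, 104.0, 105.2, 104.8, 106.1]
in_sample, out_of_sample = in_out_sample_split(prices)
print(len(in_sample), len(out_of_sample))  # 7 3
```

Every out-of-sample observation comes after every in-sample one, which mirrors how a strategy would actually be traded after it is built.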

…

Although Python can offer a similar visualization experience, R was designed for this purpose. R is not significantly more recent than Python: it was released in 1995 by its two creators, Ross Ihaka and Robert Gentleman, while Python was released in 1991 by Guido van Rossum. Today, R is mainly used by the academic and research world. Unlike many other languages, Python and R allow us to write a statistical model with a few lines of code. Because each has its own advantages, it is hard to choose one over the other, and the two can easily be used in a complementary manner: developers have created a multitude of libraries for using one language in conjunction with the other.

Choice of IDE – PyCharm or Notebook

While RStudio became the standard IDE (Integrated Development Environment) for R, choosing between JetBrains PyCharm and Jupyter Notebook is much more challenging.

…

We recommend using daily returns when studying financial products. In the stationarity example, we could observe that no transformation is needed. The last step of the time series analysis is to forecast the time series. We have two possible scenarios:

- A strictly stationary series without dependencies among values: we can use a regular linear regression to forecast values.
- A series with dependencies among values: we will be forced to use other statistical models.

In this chapter, we chose to focus on the AutoRegressive Integrated Moving Average (ARIMA) model. This model has three parameters:

- Autoregressive (AR) term (p): lags of the dependent variable. For example, with p = 3, the predictors for x(t) are x(t-1), x(t-2), and x(t-3).
- Moving average (MA) term (q): lags of the forecast errors. For example, with q = 3, the predictors for x(t) are e(t-1), e(t-2), and e(t-3), where e(i) is the difference between the predicted value and the actual value.
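The AR term's "lags of the dependent variable" can be made concrete by building the predictor rows directly. A minimal pure-Python sketch; the helper names and the coefficients are hypothetical, and a real workflow would estimate the coefficients with a library such as statsmodels rather than pick them by hand.

```python
def ar_design(series, p):
    """Build (predictors, target) pairs for an AR(p) model:
    x(t) is paired with [x(t-1), ..., x(t-p)]."""
    rows = []
    for t in range(p, len(series)):
        lags = [series[t - k] for k in range(1, p + 1)]  # x(t-1), x(t-2), ..., x(t-p)
        rows.append((lags, series[t]))
    return rows

def ar_forecast(history, phi0, phis):
    """One-step-ahead AR(p) forecast: phi0 + sum over k of phis[k-1] * x(t-k)."""
    p = len(phis)
    lags = history[-1:-p - 1:-1]  # the p most recent values, newest first
    return phi0 + sum(c * x for c, x in zip(phis, lags))

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
for lags, target in ar_design(x, 3):
    print(lags, "->", target)
# [3.0, 2.0, 1.0] -> 4.0
# [4.0, 3.0, 2.0] -> 5.0
# [5.0, 4.0, 3.0] -> 6.0

# Hypothetical coefficients, chosen only for illustration (not estimated from data)
print(ar_forecast(x, phi0=0.1, phis=[0.5, 0.3, 0.1]))
```

The first p observations cannot serve as targets because their lags are not available, so an AR(p) fit on n points uses n − p rows.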

**
Analysis of Financial Time Series
** by
Ruey S. Tsay

Asian financial crisis, asset allocation, Bayesian statistics, Black-Scholes formula, Brownian motion, business cycle, capital asset pricing model, compound rate of return, correlation coefficient, data acquisition, discrete time, frictionless, frictionless market, implied volatility, index arbitrage, Long Term Capital Management, market microstructure, martingale, p-value, pattern recognition, random walk, risk tolerance, short selling, statistical model, stochastic process, stochastic volatility, telemarketer, transaction costs, value at risk, volatility smile, Wiener process, yield curve

Stable Distribution

The stable distributions are a natural generalization of the normal distribution in that they are stable under addition, which meets the need of continuously compounded returns $r_t$. Furthermore, stable distributions are capable of capturing the excess kurtosis shown by historical stock returns. However, non-normal stable distributions do not have a finite variance, which is in conflict with most finance theories. In addition, statistical modeling using non-normal stable distributions is difficult. An example of a non-normal stable distribution is the Cauchy distribution, which is symmetric with respect to its median but has infinite variance.

Scale Mixture of Normal Distributions

Recent studies of stock returns tend to use a scale mixture or finite mixture of normal distributions. Under the assumption of a scale mixture of normal distributions, the log return $r_t$ is normally distributed with mean $\mu$ and variance $\sigma^2$ [i.e., $r_t \sim N(\mu, \sigma^2)$].
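The excerpt breaks off before specifying the mixture, so the sketch below assumes a common two-component form: $r_t$ is drawn from $N(\mu, \sigma^2)$ with probability $1-\alpha$ and from $N(\mu, c\sigma^2)$ with probability $\alpha$. The parameterization and the values $\alpha = 0.05$, $c = 16$ are our assumptions. The code computes the kurtosis in closed form and shows that mixing in a small probability of a higher-variance normal produces excess kurtosis (kurtosis above the normal's 3) while every moment stays finite, unlike the Cauchy case.

```python
def mixture_kurtosis(alpha, c, sigma2=1.0):
    """Kurtosis E[(r - mu)^4] / Var(r)^2 of an assumed two-component scale mixture:
    N(mu, sigma2) with probability 1 - alpha, N(mu, c * sigma2) with probability alpha."""
    v1, v2 = sigma2, c * sigma2
    second = (1 - alpha) * v1 + alpha * v2                  # mixture variance
    fourth = 3 * ((1 - alpha) * v1 ** 2 + alpha * v2 ** 2)  # a N(mu, v) variable has E[(r - mu)^4] = 3 v^2
    return fourth / second ** 2

k = mixture_kurtosis(alpha=0.05, c=16.0)
print(round(k, 2), round(k - 3, 2))  # 13.47 10.47  (kurtosis, excess kurtosis)
```

With $\alpha = 0$ the formula collapses to 3, the kurtosis of a single normal; the result does not depend on $\sigma^2$, which scales out of the ratio.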

…

Furthermore, the lag-$\ell$ autocovariance of $r_t$ is

$$
\gamma_\ell = \operatorname{Cov}(r_t, r_{t-\ell})
= E\left[\left(\sum_{i=0}^{\infty}\psi_i a_{t-i}\right)\left(\sum_{j=0}^{\infty}\psi_j a_{t-\ell-j}\right)\right]
= E\left(\sum_{i,j=0}^{\infty}\psi_i\psi_j a_{t-i}a_{t-\ell-j}\right)
= \sum_{j=0}^{\infty}\psi_{j+\ell}\psi_j E\left(a_{t-\ell-j}^2\right)
= \sigma_a^2\sum_{j=0}^{\infty}\psi_j\psi_{j+\ell}.
$$

Consequently, the $\psi$-weights are related to the autocorrelations of $r_t$ as follows:

$$
\rho_\ell = \frac{\gamma_\ell}{\gamma_0} = \frac{\sum_{i=0}^{\infty}\psi_i\psi_{i+\ell}}{1+\sum_{i=1}^{\infty}\psi_i^2}, \qquad \ell \ge 0, \tag{2.5}
$$

where $\psi_0 = 1$. Linear time series models are econometric and statistical models used to describe the pattern of the $\psi$-weights of $r_t$.

2.4 SIMPLE AUTOREGRESSIVE MODELS

The fact that the monthly return $r_t$ of the CRSP value-weighted index has a statistically significant lag-1 autocorrelation indicates that the lagged return $r_{t-1}$ might be useful in predicting $r_t$. A simple model that makes use of such predictive power is

$$
r_t = \phi_0 + \phi_1 r_{t-1} + a_t, \tag{2.6}
$$

where $\{a_t\}$ is assumed to be a white noise series with mean zero and variance $\sigma_a^2$.
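Equation (2.5) can be checked numerically. For the AR(1) model (2.6) with $|\phi_1| < 1$, the $\psi$-weights are $\psi_i = \phi_1^i$ (a standard result we assume here rather than derive), so (2.5) should give $\rho_\ell = \phi_1^\ell$. The sketch truncates the infinite sums at 200 terms, which is harmless when the weights decay geometrically; the function name is ours.

```python
def acf_from_psi(psi, lag):
    """rho_lag from psi-weights via equation (2.5); psi[0] must be 1."""
    num = sum(psi[i] * psi[i + lag] for i in range(len(psi) - lag))
    den = sum(w * w for w in psi)  # equals 1 + sum_{i>=1} psi_i^2 because psi[0] = 1
    return num / den

phi1 = 0.5
psi = [phi1 ** i for i in range(200)]  # truncated psi-weights of an AR(1) model
for lag in range(4):
    print(lag, acf_from_psi(psi, lag), phi1 ** lag)
```

Note that $\sigma_a^2$ never appears: it cancels between $\gamma_\ell$ and $\gamma_0$, which is why (2.5) depends on the $\psi$-weights alone.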

…

If $a_t$ has a symmetric distribution around zero, then conditional on $p_{t-1}$, $p_t$ has a 50–50 chance to go up or down, implying that $p_t$ would go up or down at random. If we treat the random-walk model as a special AR(1) model, then the coefficient of $p_{t-1}$ is unity, which does not satisfy the weak stationarity condition of an AR(1) model. A random-walk series is, therefore, not weakly stationary, and we call it a unit-root nonstationary time series. The random-walk model has been widely considered as a statistical model for the movement of logged stock prices. Under such a model, the stock price is not predictable or mean reverting. To see this, the 1-step ahead forecast of model (2.32) at the forecast origin $h$ is

$$
\hat{p}_h(1) = E(p_{h+1} \mid p_h, p_{h-1}, \ldots) = p_h,
$$

which is the log price of the stock at the forecast origin. Such a forecast has no practical value. The 2-step ahead forecast is

$$
\hat{p}_h(2) = E(p_{h+2} \mid p_h, p_{h-1}, \ldots) = E(p_{h+1} + a_{h+2} \mid p_h, p_{h-1}, \ldots) = E(p_{h+1} \mid p_h, p_{h-1}, \ldots) = \hat{p}_h(1) = p_h,
$$

which again is the log price at the forecast origin.
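The conclusion extends to every horizon: for the random walk $p_t = p_{t-1} + a_t$ of model (2.32), the $\ell$-step forecast is always $p_h$, while the forecast error $a_{h+1} + \cdots + a_{h+\ell}$ has variance $\ell\sigma_a^2$, which grows without bound. A small sketch; the Gaussian shocks, $\sigma_a = 0.01$, starting log price 4.6, and the fixed seed are all our choices for illustration, and the helper names are hypothetical.

```python
import random

def random_walk(p0, n, sigma, rng):
    """Simulate log prices p_t = p_{t-1} + a_t with Gaussian shocks a_t ~ N(0, sigma^2)."""
    path = [p0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path

def rw_forecast(history, horizon):
    """l-step-ahead forecast under a random walk: the last observed value, whatever the horizon."""
    return history[-1]

def rw_forecast_error_variance(sigma, horizon):
    """Variance of a_{h+1} + ... + a_{h+l}: grows linearly with the horizon."""
    return horizon * sigma ** 2

rng = random.Random(42)
log_prices = random_walk(p0=4.6, n=250, sigma=0.01, rng=rng)
for horizon in (1, 2, 10):
    print(horizon, rw_forecast(log_prices, horizon), rw_forecast_error_variance(0.01, horizon))
```

The point forecast never moves while its uncertainty keeps widening, which is why such a forecast "has no practical value."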

pages: 416 words: 39,022

**
Asset and Risk Management: Risk Oriented Finance
** by
Louis Esch,
Robert Kieffer,
Thierry Lopez

asset allocation, Brownian motion, business continuity plan, business process, capital asset pricing model, computer age, corporate governance, discrete time, diversified portfolio, fixed income, implied volatility, index fund, interest rate derivative, iterative process, P = NP, p-value, random walk, risk/return, shareholder value, statistical model, stochastic process, transaction costs, value at risk, Wiener process, yield curve, zero-coupon bond

Table 6.3 Student distribution quantiles

| ν | γ2 | z0.95 | z0.975 | z0.99 |
|---|---|---|---|---|
| 5 | 6.00 | 2.601 | 3.319 | 4.344 |
| 10 | 1.00 | 2.026 | 2.491 | 3.090 |
| 15 | 0.55 | 1.883 | 2.289 | 2.795 |
| 20 | 0.38 | 1.818 | 2.199 | 2.665 |
| 25 | 0.29 | 1.781 | 2.148 | 2.591 |
| 30 | 0.23 | 1.757 | 2.114 | 2.543 |
| 40 | 0.17 | 1.728 | 2.074 | 2.486 |
| 60 | 0.11 | 1.700 | 2.034 | 2.431 |
| 120 | 0.05 | 1.672 | 1.997 | 2.378 |
| normal | 0 | 1.645 | 1.960 | 2.326 |

8 Blattberg R. and Gonedes N., A comparison of stable and student distributions as statistical models for stock prices, Journal of Business, Vol. 47, 1974, pp. 244–80.
9 Pearson E. S. and Hartley H. O., Biometrika Tables for Statisticians, Biometrika Trust, 1976, p. 146.

This clearly shows that when the normal law is used in place of the Student laws, the VaR parameter is underestimated unless the number of degrees of freedom is high.

Example

With the same data as above, that is, $E(p_t) = 100$ and $\sigma(p_t) = 80$, and for 15 degrees of freedom, we find the following evaluations of VaR, instead of 31.6, 64.3 and 86.1 respectively.
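The normal-law figures quoted in the example can be reproduced from the table if VaR is taken as the loss threshold $z_q\,\sigma(p_t) - E(p_t)$. That formula is our reading, inferred because it gives 31.6 for $z_{0.95} = 1.645$ and 86.1 for $z_{0.99} = 2.326$; the middle figure, 64.3, does not correspond to the $z_{0.975}$ column, so it presumably uses a confidence level outside this excerpt. The sketch applies the same formula to the 15-degrees-of-freedom row. The book's own Student-law results are cut off here, so these outputs are illustrations, not quotations.

```python
# Quantiles copied from Table 6.3 (z_0.95 and z_0.99) for the normal law and for 15 degrees of freedom
QUANTILES = {
    "normal": {0.95: 1.645, 0.99: 2.326},
    "student_15": {0.95: 1.883, 0.99: 2.795},
}

def value_at_risk(mean, sd, z):
    """Assumed VaR formula: loss threshold z * sd - mean (a positive number is a loss)."""
    return z * sd - mean

mean, sd = 100.0, 80.0  # E(p_t) and sigma(p_t) from the example
for law, qs in QUANTILES.items():
    for level, z in qs.items():
        print(law, level, round(value_at_risk(mean, sd, z), 1))
```

With these inputs the normal law gives 31.6 and 86.1, and the Student law with 15 degrees of freedom gives about 50.6 and 123.6, consistent with the point that the normal law understates VaR when the degrees of freedom are low.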

…

Using the return presents the twofold advantage of:
• making the magnitudes of the various factors likely to be involved in evaluating an asset or portfolio relative;
• supplying a variable that has been shown to be capable of possessing certain distributional properties (normality or quasi-normality for returns on equities, for example).

1 Estimating quantiles is often a complex problem, especially for arguments close to 0 or 1. Interested readers should consult Gilchrist W. G., Statistical Modelling with Quantile Functions, Chapman & Hall/CRC, 2000.
2 If the risk factor X is a share price, we are looking at the return on that share (see Section 3.1.1).

[Figure 7.1 Estimating VaR: valuation models, historical data, estimation technique, VaR]

Note: In most calculation methods, a different expression is taken into consideration, the logarithmic return
$$\ln\frac{X(t)}{X(t-1)}.$$
As we saw in Section 3.1.1, this is in fact very similar to the ordinary return and has the advantage that it can take on any real value3 and that the logarithmic return for several consecutive periods is the sum of the logarithmic returns for each of those periods.
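The additivity property of logarithmic returns is easy to check numerically. The sketch below uses hypothetical helper names and made-up prices: it computes both the ordinary return $(X(t) - X(t-1))/X(t-1)$ and the logarithmic return $\ln(X(t)/X(t-1))$, and shows that one-period log returns sum exactly to the log return over the whole span, whereas ordinary returns must be compounded rather than added.

```python
import math


def simple_returns(prices):
    """One-period ordinary returns (X(t) - X(t-1)) / X(t-1); bounded below by -1."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def log_returns(prices):
    """One-period logarithmic returns ln(X(t) / X(t-1)); can take any real value."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def compound(returns):
    """Multi-period ordinary return obtained by compounding one-period returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


if __name__ == "__main__":
    prices = [100.0, 110.0, 99.0, 120.0]  # illustrative prices, not market data
    lr = log_returns(prices)
    sr = simple_returns(prices)
    # Log returns add up: the sum equals ln(X(T) / X(0)).
    print(sum(lr), math.log(prices[-1] / prices[0]))
    # Ordinary returns do not add; compounding recovers the 20% total return.
    print(sum(sr), compound(sr), prices[-1] / prices[0] - 1.0)
```

For small price moves the two definitions are numerically close, which is why the text can treat them as "very similar" while preferring the logarithmic form for multi-period work.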

…

If the model is nonstationary (nonstationary variance and/or mean), it can be converted into a stationary model by integration of order $r$ after the logarithmic transformation: if $y$ is the transformed variable, apply the technique to
$$\Delta^r y_t = \underbrace{\Delta(\Delta(\cdots\Delta}_{r \text{ times}}(y_t)\cdots))$$
instead of $y_t$, where $\Delta y_t = y_t - y_{t-1}$. We therefore use an ARIMA(p, r, q) procedure.16 If this procedure fails because of nonconstant volatility in the error term, it will be necessary to use the ARCH-GARCH or EGARCH models (Appendix 7).

B. The equation on the replicated positions

This equation may be estimated by a statistical model (such as the SAS/OR procedure PROC NLP), using multiple regression with the constraints
$$\sum_{i=\text{3 months}}^{\text{15 years}} \alpha_i = 1 \quad\text{and}\quad \alpha_i \ge 0.$$
It is also possible to estimate the replicated positions (b) with the single constraint (by using the SAS/STAT procedure)
$$\sum_{i=\text{3 months}}^{\text{15 years}} \alpha_i = 1.$$
In both cases, the duration of the demand product is a weighted average of the durations. In the second case, it is possible to obtain negative $\alpha_i$ values.
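The order-$r$ differencing step can be sketched in a few lines of Python. This is a minimal illustration, not the SAS procedures mentioned in the text: the function names are hypothetical, and a full ARIMA(p, r, q) fit would go on to model the differenced series with AR and MA terms. A quadratic trend is used because two differences reduce it to a constant, which illustrates why $r$ is typically taken as the smallest order that removes the nonstationarity in the mean.

```python
import math


def difference(series, order=1):
    """Apply the operator Delta y_t = y_t - y_{t-1} `order` times, giving Delta^r y_t.

    Each pass shortens the series by one observation.
    """
    out = list(series)
    for _ in range(order):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


def log_then_difference(values, order=1):
    """Logarithmic transformation followed by order-r differencing, as in the text."""
    return difference([math.log(v) for v in values], order)


if __name__ == "__main__":
    trend = [t ** 2 for t in range(6)]   # 0, 1, 4, 9, 16, 25
    print(difference(trend, order=1))    # [1, 3, 5, 7, 9]
    print(difference(trend, order=2))    # [2, 2, 2, 2]
```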

**
Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth
** by
Stuart Ritchie

Albert Einstein, anesthesia awareness, Bayesian statistics, Carmen Reinhart, Cass Sunstein, citation needed, Climatic Research Unit, cognitive dissonance, complexity theory, coronavirus, correlation does not imply causation, COVID-19, Covid-19, crowdsourcing, deindustrialization, Donald Trump, double helix, en.wikipedia.org, epigenetics, Estimating the Reproducibility of Psychological Science, Growth in a Time of Debt, Kenneth Rogoff, l'esprit de l'escalier, meta analysis, meta-analysis, microbiome, Milgram experiment, mouse model, New Journalism, p-value, phenotype, placebo effect, profit motive, publication bias, publish or perish, race to the bottom, randomized controlled trial, recommendation engine, rent-seeking, replication crisis, Richard Thaler, risk tolerance, Ronald Reagan, Scientific racism, selection bias, Silicon Valley, Silicon Valley startup, Stanford prison experiment, statistical model, stem cell, Steven Pinker, Thomas Bayes, twin studies, University of East Anglia

Do you delete those outlying datapoints because you reason that they make your sample less representative of the population? Or do you leave them in? Do you split the sample up into separate age groups, or by some other criterion? Do you merge observations from week one and week two and compare them to weeks three and four, or look at each week separately, or make some other grouping? Do you choose this particular statistical model, or that one? Precisely how many ‘control’ variables do you throw in? There aren’t objective answers to these kinds of questions. They depend on the specifics and context of what you’re researching, and on your perspective on statistics (which is, after all, a constantly evolving subject in itself): ask ten statisticians, and you might receive as many different answers. Meta-science experiments, in which multiple research groups are tasked with analysing the same dataset or designing their own study from scratch to test the same hypothesis, have found a high degree of variation in method and results.70 Endless choices offer endless opportunities for scientists who begin their analysis without a clear idea of what they’re looking for.

…

– we’re looking for generalisable facts about the world (‘what is the link between taking antipsychotic drugs and schizophrenia symptoms in humans in general?’). Figure 3, below, illustrates overfitting. As you can see, we have a set of data: one measurement of rainfall is made each month across the space of a year. We want to draw a line through that data that describes what happens to rainfall over time: the line will be our statistical model of the data. And we want to use that line to predict how much rain will fall in each month next year. The laziest possible solution is just to try a straight line, as in graph 3A – but it looks almost nothing like the data: if we tried to use that line to predict the next year’s measurements, forecasting the exact same amount of rain for every month, we’d do a terribly inaccurate job. Next, we might try a curved line that goes through the data like in graph 3B, and this turns out to be a decent approximation.

…

For the American Statistical Association’s consensus position on p-values, written surprisingly comprehensibly, see Ronald L. Wasserstein & Nicole A. Lazar, ‘The ASA Statement on p-Values: Context, Process, and Purpose’, The American Statistician 70, no. 2 (2 April 2016): pp. 129–33; https://doi.org/10.1080/00031305.2016.1154108. It defines the p-value like this: ‘the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value’ (p. 131).
18. Why does the definition of the p-value (‘how likely is it that pure noise would give you results like the ones you have, or ones with an even larger effect’) have that ‘or an even larger effect’ clause in it? (The ‘or more extreme’ part of the American Statistical Association’s definition, given in the previous endnote, serves the same purpose.)
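The role of the ‘or more extreme’ clause is easiest to see in an exact permutation test. The sketch below is an illustration chosen for this note, not a method from the book: its ‘specified statistical model’ is the null hypothesis that group labels are exchangeable, so it enumerates every relabelling of the pooled data and reports the share whose absolute difference in means is at least the observed one. The function name is hypothetical. Without the ‘or more extreme’ part, the p-value would be the probability of hitting the observed value exactly, which for fine-grained data is close to zero whatever the truth.

```python
from fractions import Fraction
from itertools import combinations


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means.

    Null model: every split of the pooled data into groups of the original
    sizes is equally likely. Fractions keep the arithmetic exact, so splits
    whose statistic ties the observed one are counted reliably.
    """
    pooled = [Fraction(x) for x in group_a] + [Fraction(x) for x in group_b]
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        # |mean(B) - mean(A)| given the sum of the observations labelled A
        return abs((total - sum_a) / n_b - sum_a / n_a)

    observed = abs_mean_diff(sum(pooled[:n_a]))
    at_least_as_extreme = 0
    n_splits = 0
    for idx in combinations(range(n_a + n_b), n_a):
        n_splits += 1
        # ">=" implements the "equal to or more extreme" clause of the definition
        if abs_mean_diff(sum(pooled[i] for i in idx)) >= observed:
            at_least_as_extreme += 1
    return Fraction(at_least_as_extreme, n_splits)


if __name__ == "__main__":
    print(exact_permutation_p_value([1, 2, 3], [4, 5, 6]))  # 1/10
```

In the example, only 2 of the 20 possible splits separate the groups as sharply as the observed one, hence 1/10. Exhaustive enumeration is only practical for small samples; larger studies typically use random permutations instead.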

pages: 233 words: 67,596

**
Competing on Analytics: The New Science of Winning
** by
Thomas H. Davenport,
Jeanne G. Harris

always be closing, big data - Walmart - Pop Tarts, business intelligence, business process, call centre, commoditize, data acquisition, digital map, en.wikipedia.org, global supply chain, high net worth, if you build it, they will come, intangible asset, inventory management, iterative process, Jeff Bezos, job satisfaction, knapsack problem, late fees, linear programming, Moneyball by Michael Lewis explains big data, Netflix Prize, new economy, performance metric, personalized medicine, quantitative hedge fund, quantitative trading / quantitative ﬁnance, recommendation engine, RFID, search inside the book, shareholder value, six sigma, statistical model, supply-chain management, text mining, the scientific method, traveling salesman, yield management

As the organization’s analytical capabilities improved, these breakdowns became less frequent. As more tangible benefits began to appear, the CEO’s commitment to competing on analytics grew. In his letter to shareholders, he described the growing importance of analytics and a new growth initiative to “outsmart and outthink” the competition. Analysts expanded their work to use propensity analysis and neural nets (an artificial intelligence technology incorporating nonlinear statistical modeling to identify patterns) to target and provide specialized services to clients with both personal and corporate relationships with the bank. They also began testing some analytically enabled new services for trust clients. Today, BankCo is well on its way to becoming an analytical competitor.

Stage 4: Analytical Companies

The primary focus in stage 4 is on building world-class analytical capabilities at the enterprise level.

…

They can also be used to help streamline the flow of information or products; for example, they can help employees of health care organizations decide where to send donated organs according to criteria ranging from blood type to geographic limitations.

Emerging Analytical Technologies

These are some of the leading-edge technologies that will play a role in analytical applications over the next few years:
- Text categorization is the process of using statistical models or rules to rate a document’s relevance to a certain topic. For example, text categorization can be used to dynamically evaluate competitors’ product assortments on their Web sites.
- Genetic algorithms are a class of stochastic optimization methods that use principles found in natural genetic reproduction (crossover or mutations of DNA structures). One common application is to optimize delivery routes.

…

As enterprise systems become more analytical, vendors such as SAP and Oracle are rapidly incorporating these capabilities as well. Commercially purchased analytical applications usually have an interface to be used by information workers, managers, and analysts. But for proprietary analyses, the presentation tools determine how different classes of individuals can use the data. For example, a statistician could directly access a statistical model, but most managers would hesitate to do so. A new generation of visual analytical tools, from new vendors such as Spotfire and Visual Sciences and from traditional analytics providers such as SAS, allows the manipulation of data and analyses through an intuitive visual interface. A manager, for example, could look at a plot of data, exclude outlier values, and compute a regression line that fits the data, all without any statistical skills.

pages: 252 words: 72,473

**
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
** by
Cathy O'Neil

Affordable Care Act / Obamacare, Bernie Madoff, big data - Walmart - Pop Tarts, call centre, carried interest, cloud computing, collateralized debt obligation, correlation does not imply causation, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, Emanuel Derman, housing crisis, I will remember that I didn’t make the world, and it doesn’t satisfy my equations, illegal immigration, Internet of things, late fees, mass incarceration, medical bankruptcy, Moneyball by Michael Lewis explains big data, new economy, obamacare, Occupy movement, offshore financial centre, payday loans, peer-to-peer lending, Peter Thiel, Ponzi scheme, prediction markets, price discrimination, quantitative hedge fund, Ralph Nader, RAND corporation, recommendation engine, Rubik’s Cube, Sharpe ratio, statistical model, Tim Cook: Apple, too big to fail, Unsafe at Any Speed, Upton Sinclair, Watson beat the top human players on Jeopardy!, working poor

The proxies the journalists chose for educational excellence make sense, after all. Their spectacular failure comes, instead, from what they chose not to count: tuition and fees. Student financing was left out of the model. This brings us to the crucial question we’ll confront time and again. What is the objective of the modeler? In this case, put yourself in the place of the editors at U.S. News in 1988. When they were building their first statistical model, how would they know when it worked? Well, it would start out with a lot more credibility if it reflected the established hierarchy. If Harvard, Stanford, Princeton, and Yale came out on top, it would seem to validate their model, replicating the informal models that they and their customers carried in their own heads. To build such a model, they simply had to look at those top universities and count what made them so special.

…

In a sense, it learns. Compared to the human brain, machine learning isn’t especially efficient. A child places her finger on the stove, feels pain, and masters for the rest of her life the correlation between the hot metal and her throbbing hand. And she also picks up the word for it: burn. A machine learning program, by contrast, will often require millions or billions of data points to create its statistical models of cause and effect. But for the first time in history, those petabytes of data are now readily available, along with powerful computers to process them. And for many jobs, machine learning proves to be more flexible and nuanced than the traditional programs governed by rules. Language scientists, for example, spent decades, from the 1960s to the early years of this century, trying to teach computers how to read.

…

Imagine if a highly motivated and responsible person with modest immigrant beginnings is trying to start a business and needs to rely on such a system for early investment. Who would take a chance on such a person? Probably not a model trained on such demographic and behavioral data. I should note that in the statistical universe proxies inhabit, they often work. More times than not, birds of a feather do fly together. Rich people buy cruises and BMWs. All too often, poor people need a payday loan. And since these statistical models appear to work much of the time, efficiency rises and profits surge. Investors double down on scientific systems that can place thousands of people into what appear to be the correct buckets. It’s the triumph of Big Data. And what about the person who is misunderstood and placed in the wrong bucket? That happens. And there’s no feedback to set the system straight. A statistics-crunching engine has no way to learn that it dispatched a valuable potential customer to call center hell.

pages: 197 words: 35,256

**
NumPy Cookbook
** by
Ivan Idris

business intelligence, cloud computing, computer vision, Debian, en.wikipedia.org, Eratosthenes, mandelbrot fractal, p-value, sorting algorithm, statistical model, transaction costs, web application

| Function | Description |
|---|---|
| `diff` | Calculates differences of numbers within a NumPy array. If the order is not specified, first-order differences are computed. |
| `log` | Calculates the natural log of elements in a NumPy array. |
| `sum` | Sums the elements of a NumPy array. |
| `dot` | Does matrix multiplication for 2D arrays. Calculates the inner product for 1D arrays. |

Installing scikits-statsmodels

The scikits-statsmodels package focuses on statistical modeling. It can be integrated with NumPy and Pandas (more about Pandas later in this chapter).

How to do it...

Source and binaries can be downloaded from http://statsmodels.sourceforge.net/install.html. If you are installing from source, you need to run the following command: `python setup.py install`. If you are using setuptools, the command is: `easy_install statsmodels`.

Performing a normality test with scikits-statsmodels

The scikits-statsmodels package has lots of statistical tests.
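The excerpt does not say which normality test the recipe uses. As an assumption, the sketch below computes one common choice, the Jarque–Bera statistic, by hand with only the standard library: $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, where $S$ is the sample skewness and $K$ the sample kurtosis. Under normality $JB$ is asymptotically $\chi^2$ with 2 degrees of freedom, whose survival function is exactly $e^{-x/2}$. The function name is hypothetical, and this is not the statsmodels implementation.

```python
import math


def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value, skewness, kurtosis).

    Moments use the 1/n (population) convention. The p-value uses the
    chi-square(2) survival function, which is exactly exp(-x / 2).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("data are constant; skewness and kurtosis are undefined")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0), skew, kurt


if __name__ == "__main__":
    stat, p_value, s, k = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
    print(stat, p_value, s, k)
```

A large JB (small p-value) signals skewness or excess kurtosis inconsistent with normality; with only a handful of points, as in the toy call above, the asymptotic p-value should not be taken literally.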

…

Perform an ordinary least squares calculation by creating an `OLS` object and calling its `fit` method, as follows: `x, y = data.exog, data.endog`, then `fit = statsmodels.api.OLS(y, x).fit()`, then `print("Fit params", fit.params)`. This should print the result of the fitting procedure, as follows:

Fit params
COPPERPRICE         14.222028
INCOMEINDEX       1693.166242
ALUMPRICE          -60.638117
INVENTORYINDEX    2515.374903
TIME               183.193035

Summarize. The results of the OLS fit can be summarized by the `summary` method, as follows: `print(fit.summary())`. This will give us the following output for the regression results:

The code to load the copper data set is as follows:

```python
import statsmodels.api

# See https://github.com/statsmodels/statsmodels/tree/master/statsmodels/datasets
data = statsmodels.api.datasets.copper.load_pandas()

# endog is the response (world copper consumption); exog holds the regressors
x, y = data.exog, data.endog
fit = statsmodels.api.OLS(y, x).fit()
print("Fit params", fit.params)
print()
print("Summary")
print()
print(fit.summary())
```

How it works...

The data in the `Dataset` class of statsmodels follows a special format. Among others, this class has the `endog` and `exog` attributes. Statsmodels has a `load` function, which loads data as NumPy arrays. Instead, we used the `load_pandas` method, which loads data as Pandas objects. We did an OLS fit, basically giving us a statistical model for copper price and consumption.

Resampling time series data

In this tutorial, we will learn how to resample time series with Pandas.

How to do it...

We will download the daily price time series data for AAPL, and resample it to monthly data by computing the mean. We will accomplish this by creating a Pandas DataFrame and calling its `resample` method.

Creating a date-time index. Before we can create a Pandas DataFrame, we need to create a `DatetimeIndex` object to pass to the DataFrame constructor.
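For readers who want to see what the OLS fit in the copper recipe computes, the sketch below solves the normal equations $(X^\top X)\beta = X^\top y$ by Gauss–Jordan elimination in pure Python. It is an illustration under stated assumptions, not statsmodels' implementation (statsmodels solves the problem differently, by default via a pseudoinverse), and `ols_fit` is a hypothetical name. Like `OLS(y, x)` in the recipe, it adds no intercept column. Forming $X^\top X$ squares the condition number, so this approach only suits small, well-conditioned problems.

```python
def ols_fit(X, y):
    """Least-squares coefficients beta minimizing ||y - X beta||^2.

    X is a list of observation rows, y a list of responses. No intercept
    column is added; include a column of ones in X if you want one.
    """
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y], one row per coefficient.
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry into the pivot position.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: the regressors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        # Eliminate this column from every other row (Gauss-Jordan).
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


if __name__ == "__main__":
    # Hypothetical data generated exactly from y = 2*x1 + 3*x2.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    y = [2.0, 3.0, 5.0, 7.0]
    print(ols_fit(X, y))  # approximately [2.0, 3.0]
```

Collinear regressors make $X^\top X$ singular, which is why the sketch raises an error rather than returning meaningless coefficients; a pseudoinverse-based solver would instead return a minimum-norm solution.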

pages: 396 words: 117,149

**
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World
** by
Pedro Domingos

Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, basic income, Bayesian statistics, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, creative destruction, crowdsourcing, Danny Hillis, data is the new oil, double helix, Douglas Hofstadter, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, global village, Google Glasses, Gödel, Escher, Bach, information retrieval, job automation, John Markoff, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, lone genius, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, NP-complete, off grid, P = NP, PageRank, pattern recognition, phenotype, planetary scale, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, scientific worldview, Second Machine Age, self-driving car, Silicon Valley, social intelligence, speech recognition, Stanford marshmallow experiment, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight, zero-sum game

In machine learning, knowledge is often in the form of statistical models, because most knowledge is statistical: all humans are mortal, but only 4 percent are Americans. Skills are often in the form of procedures: if the road curves left, turn the wheel left; if a deer jumps in front of you, slam on the brakes. (Unfortunately, as of this writing Google’s self-driving cars still confuse windblown plastic bags with deer.) Often, the procedures are quite simple, and it’s the knowledge at their core that’s complex. If you can tell which e-mails are spam, you know which ones to delete. If you can tell how good a board position in chess is, you know which move to make (the one that leads to the best position). Machine learning takes many different forms and goes by many different names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and more.

…

They called this scheme the EM algorithm, where the E stands for expectation (inferring the expected probabilities) and the M for maximization (estimating the maximum-likelihood parameters). They also showed that many previous algorithms were special cases of EM. For example, to learn hidden Markov models, we alternate between inferring the hidden states and estimating the transition and observation probabilities based on them. Whenever we want to learn a statistical model but are missing some crucial information (e.g., the classes of the examples), we can use EM. This makes it one of the most popular algorithms in all of machine learning. You might have noticed a certain resemblance between k-means and EM, in that they both alternate between assigning entities to clusters and updating the clusters’ descriptions. This is not an accident: k-means itself is a special case of EM, which you get when all the attributes have “narrow” normal distributions, that is, normal distributions with very small variance.
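A minimal sketch of EM for the case the passage ends on, a mixture of two one-dimensional normal distributions, may help. The E-step computes each point's probability of belonging to each component (the "expected probabilities"); the M-step re-estimates weights, means and variances from those soft assignments (the "maximum-likelihood parameters"). The function name, the min/max initialization and the variance floor are choices made for this illustration, not details from the book. If the variances were held fixed and shrunk toward zero, the soft assignments would harden into nearest-mean assignments, which is the k-means special case described above.

```python
import math


def em_gaussian_mixture_1d(data, n_iter=100, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances). Means start at the smallest and
    largest observations; variances start at the overall sample variance.
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = max(var_floor, sum((x - overall_mean) ** 2 for x in data) / n)
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]

    for _ in range(n_iter):
        # E-step: responsibilities P(component k | x), computed in log space
        # and normalized after subtracting the maximum to avoid underflow.
        resp = []
        for x in data:
            logp = [
                math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                for w, m, v in zip(weights, means, variances)
            ]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: maximum-likelihood updates given the soft assignments.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)  # guard against an empty component
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                var_floor,
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
            )
    return weights, means, variances


if __name__ == "__main__":
    sample = [-0.2, 0.0, 0.1, 0.3, -0.1, 9.8, 10.1, 10.0, 9.9, 10.2]  # made-up points
    w, m, v = em_gaussian_mixture_1d(sample)
    print(w, m, v)  # two components near 0 and near 10, each with half the data
```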

…

See S curves Significance tests, 87 Silver, Nate, 17, 238 Similarity, 178, 179 Similarity measures, 192, 197–200, 207 Simon, Herbert, 41, 225–226, 302 Simultaneous localization and mapping (SLAM), 166 Singularity, 28, 186, 286–289, 311 The Singularity Is Near (Kurzweil), 286 Siri, 37, 155, 161–162, 165, 172, 255 SKICAT (sky image cataloging and analysis tool), 15, 299 Skills, learners and, 8, 217–227 Skynet, 282–286 Sloan Digital Sky Survey, 15 Smith, Adam, 58 Snow, John, 183 Soar, chunking in, 226 Social networks, information propagation in, 231 The Society of Mind (Minsky), 35 Space complexity, 5 Spam filters, 23–24, 151–152, 168–169, 171 Sparse autoencoder, 117 Speech recognition, 155, 170–172, 276, 306 Speed, learning algorithms and, 139–142 Spin glasses, brain and, 102–103 Spinoza, Baruch, 58 Squared error, 241, 243 Stacked autoencoder, 117 Stacking, 238, 255, 309 States, value of, 219–221 Statistical algorithms, 8 Statistical learning, 37, 228, 297, 300, 307 Statistical modeling, 8. See also Machine learning Statistical relational learning, 227–233, 254, 309 Statistical significance tests, 76–77 Statistics, Master Algorithm and, 31–32 Stock market predictions, neural networks and, 112, 302 Stream mining, 258 String theory, 46–47 Structure mapping, 199–200, 254, 307 Succession, rule of, 145–146 The Sun Also Rises (Hemingway), 106 Supervised learning, 209, 214, 220, 222, 226 Support vector machines (SVMs), 53, 179, 190–196, 240, 242, 244, 245, 254, 307 Support vectors, 191–193, 196, 243–244 Surfaces and Essences (Hofstadter & Sander), 200 Survival of the fittest programs, 131–134 Sutton, Rich, 221, 223 SVMs.

pages: 481 words: 125,946

**
What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence
** by
John Brockman

agricultural Revolution, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, artificial general intelligence, augmented reality, autonomous vehicles, basic income, bitcoin, blockchain, clean water, cognitive dissonance, Colonization of Mars, complexity theory, computer age, computer vision, constrained optimization, corporate personhood, cosmological principle, cryptocurrency, cuban missile crisis, Danny Hillis, dark matter, discrete time, Douglas Engelbart, Elon Musk, Emanuel Derman, endowment effect, epigenetics, Ernest Rutherford, experimental economics, Flash crash, friendly AI, functional fixedness, global pandemic, Google Glasses, hive mind, income inequality, information trail, Internet of things, invention of writing, iterative process, Jaron Lanier, job automation, Johannes Kepler, John Markoff, John von Neumann, Kevin Kelly, knowledge worker, loose coupling, microbiome, Moneyball by Michael Lewis explains big data, natural language processing, Network effects, Norbert Wiener, pattern recognition, Peter Singer: altruism, phenotype, planetary scale, Ray Kurzweil, recommendation engine, Republic of Letters, RFID, Richard Thaler, Rory Sutherland, Satyajit Das, Search for Extraterrestrial Intelligence, self-driving car, sharing economy, Silicon Valley, Skype, smart contracts, social intelligence, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, superintelligent machines, supervolcano, the scientific method, The Wisdom of Crowds, theory of mind, Thorstein Veblen, too big to fail, Turing machine, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!, Y2K

A literature pioneered by psychologists such as the late Robyn Dawes finds that virtually any routine decision-making task—detecting fraud, assessing the severity of a tumor, hiring employees—is done better by a simple statistical model than by a leading expert in the field. Let me offer just two illustrative examples, one from human-resource management and the other from the world of sports. First, let’s consider the embarrassing ubiquity of job interviews as an important, often the most important, determinant of who gets hired. At the University of Chicago Booth School of Business, where I teach, recruiters devote endless hours to interviewing students on campus for potential jobs—a process that selects the few who will be invited to visit the employer, where they will undergo another extensive set of interviews. Yet research shows that interviews are nearly useless in predicting whether a job prospect will perform well on the job. Compared to a statistical model based on objective measures such as grades in courses relevant to the job in question, interviews primarily add noise and introduce the potential for prejudice.

…

AI systems can be thought of as trying to approximate rational behavior using limited resources. There’s an algorithm for computing the optimal action for achieving a desired outcome, but it’s computationally expensive. Experiments have found that simple learning algorithms with lots of training data often outperform complex hand-crafted models. Today’s systems primarily provide value by learning better statistical models and performing statistical inference for classification and decision making. The next generation will be able to create and improve their own software and are likely to self-improve rapidly. In addition to improving productivity, AI and robotics are drivers for numerous military and economic arms races. Autonomous systems can be faster, smarter, and less predictable than their competitors.

…

Compared to a statistical model based on objective measures such as grades in courses relevant to the job in question, interviews primarily add noise and introduce the potential for prejudice. (Statistical models don’t favor any particular alma mater or ethnic background and cannot detect good looks.) These facts have been known for more than four decades, but hiring practices have barely budged. The reason is simple: Each of us just knows that if we are the one conducting an interview, we will learn a lot about the candidate. It might well be that other people are not good at this task, but I am! This illusion, in direct contradiction to empirical research, means that we continue to choose employees the same way we always did. We size them up, eye to eye. One domain where some progress has been made in adopting a more scientific approach to job-candidate selection is sports, as documented by the Michael Lewis book and movie Moneyball.

**
Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data
** by
Dipanjan Sarkar

bioinformatics, business intelligence, computer vision, continuous integration, en.wikipedia.org, general-purpose programming language, Guido van Rossum, information retrieval, Internet of things, invention of the printing press, iterative process, natural language processing, out of africa, performance metric, premature optimization, recommendation engine, self-driving car, semantic web, sentiment analysis, speech recognition, statistical model, text mining, Turing test, web application

Even though we have a large number of machine learning and data analysis techniques at our disposal, most of them are tuned to work with numerical data; hence we have to resort to areas like natural language processing (NLP) and specialized techniques, transformations, and algorithms to analyze text data, or more specifically natural language, which is quite different from programming languages that are easily understood by machines. Remember that textual data, being highly unstructured, does not follow or adhere to structured or regular syntax and patterns; hence we cannot directly use mathematical or statistical models to analyze it. Before we dive into specific techniques and algorithms to analyze textual data, we will be going over some of the main concepts and theoretical principles associated with the nature of text data in this chapter. The primary intent here is to get you familiarized with concepts and domains associated with natural language understanding, processing, and text analytics. We will be using the Python programming language in this book primarily for accessing and analyzing text data.

…

Topic modeling usually involves using statistical and mathematical modeling techniques to extract main topics, themes, or concepts from a corpus of documents. Note here the emphasis on a corpus of documents: the more diverse the set of documents you have, the more topics or concepts you can generate, unlike with a single document, where you will not get many topics or concepts if it talks about a single concept. Topic models are also often known as probabilistic statistical models, which use specific statistical techniques, including singular value decomposition and latent Dirichlet allocation, to discover connected latent semantic structures in text data that yield topics and concepts. They are used extensively in text analytics and even bioinformatics. Automated document summarization is the process of using a computer program or algorithm based on statistical and ML techniques to summarize a document or corpus of documents such that we obtain a short summary that captures all the essential concepts and themes of the original document or corpus.

…

The end result is still in the form of some document, but with a few sentences based on the length we might want the summary to be. This is similar to having a research paper with an abstract or an executive summary. The main objective of automated document summarization is to perform this summarization without involving human inputs except for running any computer programs. Mathematical and statistical models help in building and automating the task of summarizing documents by observing their content and context. There are mainly two broad approaches towards document summarization using automated techniques: Extraction-based techniques: These methods use mathematical and statistical concepts like SVD to extract some key subset of content from the original document such that this subset of content contains the core information and acts as the focal point of the entire document.

pages: 447 words: 104,258

**
Mathematics of the Financial Markets: Financial Instruments and Derivatives Modelling, Valuation and Risk Issues
** by
Alain Ruttiens

algorithmic trading, asset allocation, asset-backed security, backtesting, banking crisis, Black Swan, Black-Scholes formula, Brownian motion, capital asset pricing model, collateralized debt obligation, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, delta neutral, discounted cash flows, discrete time, diversification, fixed income, implied volatility, interest rate derivative, interest rate swap, margin call, market microstructure, martingale, p-value, passive investing, quantitative trading / quantitative ﬁnance, random walk, risk/return, Satyajit Das, Sharpe ratio, short selling, statistical model, stochastic process, stochastic volatility, time value of money, transaction costs, value at risk, volatility smile, Wiener process, yield curve, zero-coupon bond

FOCARDI, Frank J. FABOZZI, The Mathematics of Financial Modeling and Investment Management, John Wiley & Sons, Inc., Hoboken, 2004, 800 p. Lawrence GALITZ, Financial Times Handbook of Financial Engineering, FT Press, 3rd ed. Scheduled on November 2011, 480 p. Philippe JORION, Financial Risk Manager Handbook, John Wiley & Sons, Inc., Hoboken, 5th ed., 2009, 752 p. Tze Leung LAI, Haipeng XING, Statistical Models and Methods for Financial Markets, Springer, 2008, 374 p. David RUPPERT, Statistics and Finance, An Introduction, Springer, 2004, 482 p. Dan STEFANICA, A Primer for the Mathematics of Financial Engineering, FE Press, 2011, 352 p. Robert STEINER, Mastering Financial Calculations, FT Prentice Hall, 1997, 400 p. John L. TEALL, Financial Market Analytics, Quorum Books, 1999, 328 p. Presents the maths needed to understand quantitative finance, with examples and applications focusing on financial markets. 1.

…

More generally, Jarrow offers some general but very useful considerations about model risk in an article devoted to risk management models, but valid for any kind of (financial) mathematical model.17 In his article, Jarrow distinguishes between statistical and theoretical models: the former model the evolution of a market price or return based on historical data, as a GARCH model does. The “quantitative models” usually developed by fund or portfolio managers also belong to this statistical category. Theoretical models, on the other hand, aim to establish causality through financial or economic reasoning; the Black–Scholes formula is an example. Both types of model rely on assumptions, which Jarrow classifies as robust or non-robust depending on how large the impact is when an assumption is slightly modified. The article then develops pertinent considerations about testing, calibrating and using a model.

…

Philippe JORION, Financial Risk Manager Handbook, John Wiley & Sons, Inc., Hoboken, 6th ed., 2010, 800 p. E. JURCZENKO, B. MAILLET (eds), Multi-Moment Asset Allocation and Pricing Models, John Wiley & Sons, Ltd, Chichester, 2006, 233 p. Ioannis KARATZAS, Steven E. SHREVE, Methods of Mathematical Finance, Springer, 2010, 430 p. Donna KLINE, Fundamentals of the Futures Market, McGraw-Hill, 2000, 256 p. Tze Leung LAI, Haipeng XING, Statistical Models and Methods for Financial Markets, Springer, 2008, 374 p. Raymond M. LEUTHOLD, Joan C. JUNKUS, Jean E. CORDIER, The Theory and Practice of Futures Markets, Stipes Publishing, 1999, 410 p. Bob LITTERMAN, Modern Investment Management – An Equilibrium Approach, John Wiley & Sons, Inc., Hoboken, 2003, 624 p. T. LYNCH, J. APPLEBY, Large Fluctuation of Stochastic Differential Equations: Regime Switching and Applications to Simulation and Finance, LAP LAMBERT Academic Publishing, 2010, 240 p.

pages: 518 words: 147,036

**The Fissured Workplace**
by David Weil

accounting loophole / creative accounting, affirmative action, Affordable Care Act / Obamacare, banking crisis, barriers to entry, business cycle, business process, buy and hold, call centre, Carmen Reinhart, Cass Sunstein, Clayton Christensen, clean water, collective bargaining, commoditize, corporate governance, corporate raider, Corrections Corporation of America, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, declining real wages, employer provided health coverage, Frank Levy and Richard Murnane: The New Division of Labor, George Akerlof, global supply chain, global value chain, hiring and firing, income inequality, information asymmetry, intermodal, inventory management, Jane Jacobs, Kenneth Rogoff, law of one price, loss aversion, low skilled workers, minimum wage unemployment, moral hazard, Network effects, new economy, occupational segregation, Paul Samuelson, performance metric, pre–internet, price discrimination, principal–agent problem, Rana Plaza, Richard Florida, Richard Thaler, Ronald Coase, shareholder value, Silicon Valley, statistical model, Steve Jobs, supply-chain management, The Death and Life of Great American Cities, The Nature of the Firm, transaction costs, ultimatum game, union organizing, women in the workforce, yield management

The impact of shedding janitorial jobs in otherwise higher-wage companies is borne out in several studies of contracting out among janitorial workers. Using a statistical model to predict the factors that increase the likelihood of contracting out specific types of jobs, Abraham and Taylor demonstrate that the higher the typical wage for the workforce at an establishment, the more likely that establishment will contract out its janitorial work. They also show that establishments that do any contracting out of janitorial workers tend to shift out the function entirely.36 Wages and benefits for workers employed directly versus contracted out can be compared given the significant number of people in both groups. Using statistical models that control for both observed characteristics of the workers and the places in which they work, several studies directly compare the wages and benefits for these occupations.

…

For example, franchisees might be more common in areas where there is greater competition among fast-food restaurants. That competition (and franchising only indirectly) might lead them to have higher incentives to not comply. Alternatively, company-owned outlets might be in locations with stronger consumer markets, higher-skilled workers, or lower crime rates, all of which might also be associated with compliance. To adequately account for these problems, statistical models that consider all of the potentially relevant factors, including franchise status, are generated to predict compliance levels. By doing so, the effect of franchising can be examined, holding other factors constant. This allows measurement of the impact on compliance of an outlet being run by a franchisee with otherwise identical features, as opposed to a company-owned outlet. Figure 6.1 provides estimates of the impact of franchise ownership on three different measures of compliance for the top twenty branded fast-food companies in the United States.22 The figure presents the percentage difference in compliance between franchised outlets relative to otherwise comparable company-owned outlets of the same brand.23 FIGURE 6.1.
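The logic of comparing outlets while "holding other factors constant" can be sketched with simulated data. Everything here is hypothetical: the variable names, the single confounder (local competition), and the effect sizes are assumptions for illustration, not Weil's data or model. Because franchised outlets are assumed to cluster where competition is higher, a raw comparison of means overstates the franchise effect, while an ordinary least-squares regression that includes competition as a control recovers the effect built into the simulation.

```python
import numpy as np

def franchise_effect(n=5000, true_effect=-0.5, seed=0):
    """Compare a naive difference in means with a regression that controls for a confounder."""
    rng = np.random.default_rng(seed)
    # Hypothetical confounder: intensity of local competition among restaurants.
    competition = rng.normal(size=n)
    # Assumption: franchised outlets are more likely where competition is high.
    franchised = (competition + rng.normal(size=n) > 0).astype(float)
    # Hypothetical compliance score depends on franchise status and on competition.
    compliance = (1.0 + true_effect * franchised - 1.0 * competition
                  + rng.normal(scale=0.5, size=n))
    # Naive estimate: raw gap in mean compliance, franchised minus company-owned.
    naive = compliance[franchised == 1].mean() - compliance[franchised == 0].mean()
    # Controlled estimate: OLS of compliance on an intercept, franchise dummy and competition.
    X = np.column_stack([np.ones(n), franchised, competition])
    coef, *_ = np.linalg.lstsq(X, compliance, rcond=None)
    return naive, coef[1]

naive, controlled = franchise_effect()
print(f"naive gap: {naive:.2f}, controlled estimate: {controlled:.2f}")
```

The coefficient on the franchise dummy is the "otherwise identical outlet" comparison the passage describes, valid only to the extent that the included controls capture the relevant differences.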

…

Mining entered into contract agreements at mine sites that Ember had never worked. This narrative is based on Federal Mine Safety and Health Review Commission, Secretary of Labor MSHA v. Ember Contracting Corporation, Office of Administrative Law Judges, November 4, 2011. I am grateful to Greg Wagner for flagging this case and to Andrew Razov for additional research on it. 26. These estimates are based on quarterly mining data from 2000–2010. Using statistical modeling techniques, two different measures of traumatic injuries and a direct measure of fatality rates are associated with contracting status of the mine operator as well as other explanatory factors, including mining method, physical attributes of the mine, union status, size of operations, year, and location. The contracting measure includes all forms of contracting. See Buessing and Weil (2013). 27.

pages: 721 words: 197,134

**Data Mining: Concepts, Models, Methods, and Algorithms**
by Mehmed Kantardzić

Albert Einstein, bioinformatics, business cycle, business intelligence, business process, butter production in bangladesh, combinatorial explosion, computer vision, conceptual framework, correlation coefficient, correlation does not imply causation, data acquisition, discrete time, El Camino Real, fault tolerance, finite state, Gini coefficient, information retrieval, Internet Archive, inventory management, iterative process, knowledge worker, linked data, loose coupling, Menlo Park, natural language processing, Netflix Prize, NP-complete, PageRank, pattern recognition, peer-to-peer, phenotype, random walk, RFID, semantic web, speech recognition, statistical model, Telecommunications Act of 1996, telemarketer, text mining, traveling salesman, web application

In statistics, a subset of a population is called a sample, and it describes a finite data set of n-dimensional vectors. Throughout this book, we will simply call this subset of the population a data set, to avoid confusion between the two definitions of sample: one (explained earlier) denoting the description of a single entity in the population, and the other (given here) referring to a subset of the population. From a given data set, we build a statistical model of the population that will help us make inferences about that same population. If our inferences from the data set are to be valid, we must obtain samples that are representative of the population. Very often, we are tempted to choose a data set by selecting the most convenient members of the population, but such an approach may lead to erroneous inferences about the population.
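The risk of a convenience sample can be shown with a quick simulation. The population, the spending variable, and the assumption that higher spenders are easier to reach are all hypothetical; the point is only that a sample chosen for convenience can be systematically unrepresentative, while a simple random sample of the same size lands close to the population mean.

```python
import numpy as np

def compare_samples(pop_size=100_000, sample_size=500, seed=1):
    """Return (population mean, random-sample mean, convenience-sample mean)."""
    rng = np.random.default_rng(seed)
    # Hypothetical population: e.g., the annual spend of every customer.
    values = rng.lognormal(mean=3.0, sigma=0.8, size=pop_size)
    # Assumption: how easy a member is to reach rises with their spend, plus noise.
    ease = values + rng.normal(scale=values.std(), size=pop_size)
    # Convenience sample: the sample_size easiest-to-reach members.
    convenient = values[np.argsort(ease)[-sample_size:]]
    # Simple random sample without replacement.
    random_sample = rng.choice(values, size=sample_size, replace=False)
    return values.mean(), random_sample.mean(), convenient.mean()

pop, rand, conv = compare_samples()
print(f"population {pop:.1f}, random sample {rand:.1f}, convenience sample {conv:.1f}")
```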

…

We are minimizing the empirical error $E(g \mid X) = \frac{1}{N}\sum_{t=1}^{N}\left(r^t - g(x^t)\right)^2$. Generalized linear regression models are currently the most frequently applied statistical techniques. They are used to describe the relationship between the trend of one variable and the values taken by several other variables. Modeling this type of relationship is often called linear regression. Fitting models is not the only task in statistical modeling; we often also want to select, from several possible models, the one that is most appropriate. An objective method for choosing between different models is called ANOVA, and it is described in Section 5.5. The relationship that fits a set of data is characterized by a prediction model called a regression equation. The most widely used form of the regression model is the general linear model, formally written as $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n$. Applying this equation to each of the given samples, we obtain a new set of equations $y_j = \alpha + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \beta_n x_{nj} + \varepsilon_j$, $j = 1, \ldots, m$, where the $\varepsilon_j$ are the regression errors for each of the m given samples.
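As a small illustration of fitting the general linear model by minimizing the empirical error, the sketch below uses NumPy's least-squares solver. The helper name `fit_linear_model` is hypothetical; it adds a column of ones so the first coefficient plays the role of α, and it reports the mean squared residual, which is the empirical error E(g | X) from the text.

```python
import numpy as np

def fit_linear_model(X, r):
    """Least-squares fit of r = alpha + beta_1 x_1 + ... + beta_n x_n."""
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    # Prepend a column of ones so coef[0] is the intercept alpha.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, r, rcond=None)
    # Empirical error: (1/N) * sum over samples of (r^t - g(x^t))^2.
    residuals = r - design @ coef
    empirical_error = float(np.mean(residuals ** 2))
    return coef, empirical_error

# Noise-free toy data generated from alpha = 1, beta_1 = 2, beta_2 = -3.
X = [[0, 1], [1, 0], [2, 1], [3, 2], [4, 0]]
r = [1 + 2 * a - 3 * b for a, b in X]
coef, err = fit_linear_model(X, r)
print(coef, err)
```

With exact linear data the solver recovers the generating coefficients and the empirical error is zero up to rounding; with real data the residuals play the role of the $\varepsilon_j$.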

…

All these ideas are still in their infancy, and we expect that the next generation of text-mining techniques and tools will improve the quality of information and knowledge discovery from text. 11.7 LATENT SEMANTIC ANALYSIS (LSA) LSA is a method originally developed to improve the accuracy and effectiveness of IR techniques by focusing on the semantic meaning of words across a series of usage contexts, as opposed to using simple string-matching operations. LSA is a way of partitioning free text using a statistical model of word usage that is similar to eigenvector decomposition and factor analysis. Rather than focusing on superficial features such as word frequency, this approach provides a quantitative measure of semantic similarity among documents based on a word’s context. Two major shortcomings of relying on term counts are synonyms and polysemes. Synonyms are different words that have the same or similar meanings.
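A toy version of the idea can be sketched with NumPy: build a term-by-document count matrix, keep the k largest singular values of its SVD, and compare documents by cosine similarity in that reduced space. The three documents, raw counts instead of a weighting scheme, and the choice k = 2 are assumptions for illustration. Here "car" and "automobile" never co-occur, yet the two documents that use them become identical in the latent space because they share their remaining context words.

```python
import re
import numpy as np

def term_document_matrix(docs):
    """Raw term counts: rows are vocabulary terms, columns are documents."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1
    return A

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def lsa_doc_vectors(A, k):
    """Rows are documents expressed in the k-dimensional latent space."""
    # NumPy returns singular values in descending order, so [:k] keeps the largest.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (s[:k, None] * Vt[:k, :]).T

docs = ["car engine repair", "automobile engine repair", "cooking pasta recipe"]
A = term_document_matrix(docs)
print(round(cosine(A[:, 0], A[:, 1]), 3))  # 0.667 with raw term counts
D = lsa_doc_vectors(A, k=2)
print(round(cosine(D[0], D[1]), 3))        # 1.0 in the 2-dimensional latent space
```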

pages: 451 words: 103,606

**Machine Learning for Hackers**
by Drew Conway, John Myles White

call centre, centre right, correlation does not imply causation, Debian, Erdős number, Nate Silver, natural language processing, Netflix Prize, p-value, pattern recognition, Paul Erdős, recommendation engine, social graph, SpamAssassin, statistical model, text mining, the scientific method, traveling salesman

You should note that this does not include the intercept term, which you don’t want to penalize for its size. Knowing the number of nonzero coefficients is useful because many people would like to be able to assert that only a few inputs really matter, and we can assert this more confidently if the model performs well even when assigning zero weight to many of the inputs. When the majority of the inputs to a statistical model are assigned zero coefficients, we say that the model is sparse. Developing tools for promoting sparsity in statistical models is a major topic in contemporary machine learning research. The second column, %Dev, is essentially the R² for this model. For the top row, it’s 0% because you have a zero coefficient for the one input variable and therefore can’t get better performance than just using a constant intercept. For the bottom row, %Dev is 59%, which is the value you’d get from using lm directly, because lm doesn’t do any regularization at all.
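To see how an L1 penalty produces the sparsity described here, the following sketch implements the lasso by coordinate descent with soft-thresholding in NumPy. It is not glmnet (which, among other things, standardizes inputs and fits a whole path of penalty values); the function names, the single penalty value, and the simulated data with only two informative inputs out of ten are assumptions. As in the text, the intercept is handled separately by centering and is never penalized.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within gamma of zero become exactly 0."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 (b0 unpenalized)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Centering lets us recover the intercept afterwards, so it is never penalized.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every feature's fit except feature j's.
            r_j = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    intercept = y_mean - x_mean @ beta
    return intercept, beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two of ten inputs actually matter in this simulated data.
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
b0, b = lasso(X, y, lam=0.5)
print("nonzero coefficients:", np.flatnonzero(b))
```

Soft-thresholding sets a coefficient exactly to zero whenever its correlation with the partial residual is smaller than the penalty, which is why the uninformative inputs drop out entirely rather than merely shrinking.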

…

We can see that we make systematic errors in our predictions if we use a straight line: at small and large values of x, we overpredict y, and we underpredict y for medium values of x. This is easiest to see in a residuals plot, as shown in panel C of Figure 6-1. In this plot, you can see all of the structure of the original data set, as none of the structure is captured by the default linear regression model. Using ggplot2’s geom_smooth function without any method argument, we can fit a more complex statistical model called a Generalized Additive Model (or GAM for short) that provides a smooth, nonlinear representation of the structure in our data: set.seed(1) x <- seq(-10, 10, by = 0.01) y <- 1 - x ^ 2 + rnorm(length(x), 0, 5) ggplot(data.frame(X = x, Y = y), aes(x = X, y = Y)) + geom_point() + geom_smooth(se = FALSE) The result, shown in panel D of Figure 6-1, lets us immediately see that we want to fit a curved line instead of a straight line to this data set.
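The residual pattern described above can also be checked numerically. This Python sketch stands in for the R code: it generates the same kind of data (NumPy's random numbers will not match R's `set.seed(1)` output), fits a straight line with `np.polyfit`, and averages the residuals over three ranges of x. As a simpler substitute for the GAM, it also fits a quadratic, an assumption that happens to match how these data were generated.

```python
import numpy as np

def binned_mean_residuals(degree, seed=1):
    """Mean residual for x < -5, -5 <= x <= 5, and x > 5 after a polynomial fit."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-10, 10, 2001)  # same grid as seq(-10, 10, by = 0.01)
    y = 1 - x ** 2 + rng.normal(0, 5, size=x.size)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    bins = [x < -5, (x >= -5) & (x <= 5), x > 5]
    return [float(residuals[mask].mean()) for mask in bins]

print(binned_mean_residuals(1))  # straight line: negative, positive, negative
print(binned_mean_residuals(2))  # quadratic: all three close to zero
```

Negative mean residuals at both ends mean the line overpredicts there, and the positive middle value means it underpredicts for medium x, matching the description of panel C.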

**Quantitative Trading: How to Build Your Own Algorithmic Trading Business**
by Ernie Chan

algorithmic trading, asset allocation, automated trading system, backtesting, Black Swan, Brownian motion, business continuity plan, buy and hold, compound rate of return, Edward Thorp, Elliott wave, endowment effect, fixed income, general-purpose programming language, index fund, John Markoff, Long Term Capital Management, loss aversion, p-value, paper trading, price discovery process, quantitative hedge fund, quantitative trading / quantitative finance, random walk, Ray Kurzweil, Renaissance Technologies, risk-adjusted returns, Sharpe ratio, short selling, statistical arbitrage, statistical model, survivorship bias, systematic trading, transaction costs

I will illustrate this somewhat convoluted procedure at the end of Example 3.6. Data-Snooping Bias In Chapter 2, I mentioned data-snooping bias: the danger that backtest performance is inflated relative to the future performance of the strategy because we have overoptimized the parameters of the model based on transient noise in the historical data. Data-snooping bias is pervasive in the business of building predictive statistical models from historical data, but it is especially serious in finance because of the limited amount of independent data we have. High-frequency data, while in abundant supply, is useful only for high-frequency models. And while we have stock market data stretching back to the early parts of the twentieth century, only data within the past 10 years are really suitable for building predictive models. Furthermore, as discussed in Chapter 2, regime shifts may render even data that are just a few years old obsolete for backtesting purposes.
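A small simulation makes data-snooping bias concrete. Under the assumptions below (1,000 hypothetical strategies whose daily returns are pure noise, 250 in-sample days, 1,000 out-of-sample days, a zero risk-free rate, and annualization by √252), picking the strategy with the best backtest Sharpe ratio yields an impressive in-sample number that disappears out of sample, because the selection rewarded noise rather than skill.

```python
import numpy as np

def annualized_sharpe(daily_returns):
    """Annualized Sharpe ratio (zero risk-free rate) along the last axis."""
    return np.sqrt(252) * daily_returns.mean(axis=-1) / daily_returns.std(axis=-1, ddof=1)

def snooping_demo(n_strategies=1000, in_days=250, out_days=1000, seed=42):
    rng = np.random.default_rng(seed)
    # Each hypothetical strategy has a true Sharpe ratio of exactly zero.
    in_sample = rng.normal(0.0, 0.01, size=(n_strategies, in_days))
    out_sample = rng.normal(0.0, 0.01, size=(n_strategies, out_days))
    in_sharpes = annualized_sharpe(in_sample)
    # "Optimization" here means keeping whichever strategy backtested best.
    best = int(np.argmax(in_sharpes))
    return float(in_sharpes[best]), float(annualized_sharpe(out_sample[best]))

best_in, best_out = snooping_demo()
print(f"in-sample Sharpe {best_in:.2f}, out-of-sample Sharpe {best_out:.2f}")
```

The more candidate parameter sets are tried on a short history, the larger the winning in-sample Sharpe ratio becomes, even though none of them has any real edge.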

…

Chan & Associates (www.epchan.com), a consulting firm focusing on trading strategy and software development for money managers. He also co-manages EXP Quantitative Investments, LLC and publishes the Quantitative Trading blog (epchan.blogspot.com), which is syndicated to multiple financial news services including www.tradingmarkets.com and Yahoo! Finance. He has been quoted by the New York Times and CIO magazine on quantitative hedge funds, and has appeared on CNBC’s Closing Bell. Ernie is an expert in developing statistical models and advanced computer algorithms to discover patterns and trends from large quantities of data. He was a researcher in computer science at IBM’s T. J. Watson Research Center, in data mining at Morgan Stanley, and in statistical arbitrage trading at Credit Suisse. He has also been a senior quantitative strategist and trader at various hedge funds, with sizes ranging from millions to billions of dollars.

pages: 442 words: 39,064

**Why Stock Markets Crash: Critical Events in Complex Financial Systems**
by Didier Sornette

Asian financial crisis, asset allocation, Berlin Wall, Bretton Woods, Brownian motion, business cycle, buy and hold, capital asset pricing model, capital controls, continuous double auction, currency peg, Deng Xiaoping, discrete time, diversified portfolio, Elliott wave, Erdős number, experimental economics, financial innovation, floating exchange rates, frictionless, frictionless market, full employment, global village, implied volatility, index fund, information asymmetry, intangible asset, invisible hand, John von Neumann, joint-stock company, law of one price, Louis Bachelier, mandelbrot fractal, margin call, market bubble, market clearing, market design, market fundamentalism, mental accounting, moral hazard, Network effects, new economy, oil shock, open economy, pattern recognition, Paul Erdős, Paul Samuelson, quantitative trading / quantitative finance, random walk, risk/return, Ronald Reagan, Schrödinger's Cat, selection bias, short selling, Silicon Valley, South Sea Bubble, statistical model, stochastic process, stocks for the long run, Tacoma Narrows Bridge, technological singularity, The Coming Technological Singularity, The Wealth of Nations by Adam Smith, Tobin tax, total factor productivity, transaction costs, tulip mania, VA Linux, Y2K, yield curve

Of special interest will be the study of the premonitory processes before financial crashes or “bubble” corrections in the stock market. For this purpose, I shall describe a new set of computational methods that are capable of searching and comparing patterns, simultaneously and iteratively, at multiple scales in hierarchical systems. I shall use these patterns to improve the understanding of the dynamical state before and after a financial crash and to enhance the statistical modeling of social hierarchical systems, with the goal of developing reliable forecasting skills for these large-scale financial crashes.

IS PREDICTION POSSIBLE? A WORKING HYPOTHESIS

With the low of 3227 on April 17, 2000, identified as the end of the “crash,” the Nasdaq Composite index lost over 37% from its all-time high of 5133, reached on March 10, 2000, in just five weeks. Unlike the October 1987 crash, this crash has not been followed by a recovery.

…

Following the null hypothesis that the exponential description is correct and extrapolating this description to, for example, the three largest crashes on the U.S. market in this century (1914, 1929, and 1987), as indicated in Figure 3.4, yields a recurrence time of about fifty centuries for each single crash. In reality, the three crashes occurred in less than one century. This result is a first indication that the exponential model may not apply to the large crashes. As an additional test, 10,000 so-called synthetic data sets, each covering a time span close to a century and hence adding up to about 1 million years, were generated using a standard statistical model used by the financial industry [46]. We use the GARCH(1,1) version of the model, estimated from the true index, with Student's t-distributed noise with four degrees of freedom. This model captures both the nonstationarity of volatilities (the amplitude of price variations) and the fat-tailed nature of the distribution of price returns seen in Figure 2.7. Our analysis [209] shows that, in approximately 1 million years of heavy-tailed “GARCH-trading,” with a reset every century, three crashes similar to the three largest observed in the true DJIA never occurred in a single “GARCH-century.”
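For readers who want to generate such synthetic series themselves, here is a minimal sketch of a GARCH(1,1) process with Student-t innovations (four degrees of freedom, rescaled to unit variance). The parameter values `omega`, `alpha`, and `beta` are illustrative assumptions, not the estimates fitted to the DJIA in [209], and using roughly 25,000 steps to stand in for a century of daily trading is also an assumption. The variance recursion produces time-varying volatility, and the t innovations supply the fat tails.

```python
import numpy as np

def simulate_garch_t(n, omega=1e-6, alpha=0.08, beta=0.90, nu=4, seed=0):
    """Simulate n returns from a GARCH(1,1) model with Student-t(nu) innovations."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    if nu <= 2:
        raise ValueError("nu must exceed 2 so the innovations can be scaled to unit variance")
    rng = np.random.default_rng(seed)
    # A Student-t(nu) draw has variance nu / (nu - 2); rescale it to variance 1.
    z = rng.standard_t(nu, size=n) / np.sqrt(nu / (nu - 2))
    returns = np.empty(n)
    var = omega / (1 - alpha - beta)  # start from the unconditional variance
    for t in range(n):
        returns[t] = np.sqrt(var) * z[t]
        # Next period's variance responds to today's squared return (volatility clustering).
        var = omega + alpha * returns[t] ** 2 + beta * var
    return returns

r = simulate_garch_t(25_000)
kurtosis = np.mean((r - r.mean()) ** 4) / np.var(r) ** 2
print(f"sample kurtosis: {kurtosis:.1f} (a normal distribution gives 3)")
```

Repeating the simulation with different seeds gives a set of surrogate "centuries" in which one can count crash-like drawdowns, in the spirit of the test described above.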

…

More recently, Feigenbaum has examined the first differences of the logarithm of the S&P 500 from 1980 to 1987 and finds that he cannot reject the log-periodic component at the 95% confidence level [127]: in plain words, this means that the probability that the log-periodic component results from chance is about or less than one in twenty. To further test the robustness of the log-periodic hypothesis, Johansen, Ledoit, and I [209] tested the null hypothesis that a standard statistical model of financial markets, the GARCH(1,1) model with Student-distributed noise, could “explain” the presence of log-periodicity. In the 1,000 surrogate data sets of length 400 weeks generated using this GARCH(1,1) model with Student-distributed noise and analyzed as for the real crashes, only two 400-week windows qualified. This result corresponds to a confidence level of 99.8% for rejecting the hypothesis that GARCH(1,1) with Student-distributed noise can generate meaningful log-periodicity.

pages: 336 words: 113,519

**The Undoing Project: A Friendship That Changed Our Minds**
by Michael Lewis

Albert Einstein, availability heuristic, Cass Sunstein, choice architecture, complexity theory, Daniel Kahneman / Amos Tversky, Donald Trump, Douglas Hofstadter, endowment effect, feminist movement, framing effect, hindsight bias, John von Neumann, Kenneth Arrow, loss aversion, medical residency, Menlo Park, Murray Gell-Mann, Nate Silver, New Journalism, Paul Samuelson, Richard Thaler, Saturday Night Live, Stanford marshmallow experiment, statistical model, the new new thing, Thomas Bayes, Walter Mischel, Yom Kippur War

He helped hire new management, then helped to figure out how to price tickets, and, finally, inevitably, was asked to work on the problem of whom to select in the NBA draft. “How will that nineteen-year-old perform in the NBA?” was like “Where will the price of oil be in ten years?” A perfect answer didn’t exist, but statistics could get you to some answer that was at least a bit better than simply guessing. Morey already had a crude statistical model to evaluate amateur players. He’d built it on his own, just for fun. In 2003 the Celtics had encouraged him to use it to pick a player at the tail end of the draft—the 56th pick, when the players seldom amount to anything. And thus Brandon Hunter, an obscure power forward out of Ohio University, became the first player picked by an equation.* Two years later Morey got a call from a headhunter who said that the Houston Rockets were looking for a new general manager.

…

He had a diffidence about him—an understanding of how hard it is to know anything for sure. The closest he came to certainty was in his approach to making decisions. He never simply went with his first thought. He suggested a new definition of the nerd: a person who knows his own mind well enough to mistrust it. One of the first things Morey did after he arrived in Houston—and, to him, the most important—was to install his statistical model for predicting the future performance of basketball players. The model was also a tool for the acquisition of basketball knowledge. “Knowledge is literally prediction,” said Morey. “Knowledge is anything that increases your ability to predict the outcome. Literally everything you do you’re trying to predict the right thing. Most people just do it subconsciously.” A model allowed you to explore the attributes in an amateur basketball player that led to professional success, and determine how much weight should be given to each.

…

Without data, there’s nothing to analyze. The Indian was DeAndre Jordan all over again; he was, like most of the problems you faced in life, a puzzle, with pieces missing. The Houston Rockets would pass on him—and be shocked when the Dallas Mavericks took him in the second round of the NBA draft. Then again, you never knew.†† And that was the problem: You never knew. In Morey’s ten years of using his statistical model with the Houston Rockets, the players he’d drafted, after accounting for the draft slot in which they’d been taken, had performed better than the players drafted by three-quarters of the other NBA teams. His approach had been sufficiently effective that other NBA teams were adopting it. He could even pinpoint the moment when he felt, for the first time, imitated. It was during the 2012 draft, when the players were picked in almost the exact same order the Rockets ranked them.

pages: 49 words: 12,968

**Industrial Internet**
by Jon Bruner

autonomous vehicles, barriers to entry, commoditize, computer vision, data acquisition, demand response, en.wikipedia.org, factory automation, Google X / Alphabet X, industrial robot, Internet of things, job automation, loose coupling, natural language processing, performance metric, Silicon Valley, slashdot, smart grid, smart meter, statistical model, web application

“Imagine trying to operate a highway system if all you have are monthly traffic readings for a few spots on the road. But that’s what operating our power system was like.” The utility’s customers benefit, too — an example of the industrial internet creating value for every entity to which it’s connected. Fort Collins utility customers can see data on their electric usage through a Web portal that uses a statistical model to estimate how much electricity they’re using on heating, cooling, lighting and appliances. The site then draws building data from county records to recommend changes to insulation and other improvements that might save energy. Water meters measure usage every hour — frequent enough that officials will soon be able to dispatch inspection crews to houses whose vacationing owners might not know about a burst pipe.

pages: 1,088 words: 228,743

**Expected Returns: An Investor's Guide to Harvesting Market Rewards**
by Antti Ilmanen

Andrei Shleifer, asset allocation, asset-backed security, availability heuristic, backtesting, balance sheet recession, bank run, banking crisis, barriers to entry, Bernie Madoff, Black Swan, Bretton Woods, business cycle, buy and hold, buy low sell high, capital asset pricing model, capital controls, Carmen Reinhart, central bank independence, collateralized debt obligation, commoditize, commodity trading advisor, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, debt deflation, deglobalization, delta neutral, demand response, discounted cash flows, disintermediation, diversification, diversified portfolio, dividend-yielding stocks, equity premium, Eugene Fama: efficient market hypothesis, fiat currency, financial deregulation, financial innovation, financial intermediation, fixed income, Flash crash, framing effect, frictionless, frictionless market, G4S, George Akerlof, global reserve currency, Google Earth, high net worth, hindsight bias, Hyman Minsky, implied volatility, income inequality, incomplete markets, index fund, inflation targeting, information asymmetry, interest rate swap, invisible hand, Kenneth Rogoff, laissez-faire capitalism, law of one price, London Interbank Offered Rate, Long Term Capital Management, loss aversion, margin call, market bubble, market clearing, market friction, market fundamentalism, market microstructure, mental accounting, merger arbitrage, mittelstand, moral hazard, Myron Scholes, negative equity, New Journalism, oil shock, p-value, passive investing, Paul Samuelson, performance metric, Ponzi scheme, prediction markets, price anchoring, price stability, principal–agent problem, private sector deleveraging, purchasing power parity, quantitative easing, quantitative trading / quantitative finance, random walk, reserve currency, Richard Thaler, risk tolerance, risk-adjusted returns, risk/return, riskless arbitrage, Robert Shiller, savings glut, selection bias, Sharpe ratio, short selling, sovereign wealth fund, statistical arbitrage, statistical model, stochastic volatility, stocks for the long run, survivorship bias, systematic trading, The Great Moderation, The Myth of the Rational Market, too big to fail, transaction costs, tulip mania, value at risk, volatility arbitrage, volatility smile, working-age population, Y2K, yield curve, zero-coupon bond, zero-sum game

This is an in-sample measure and can be misleading if the correlations are not stable over time. Note, though, that most academic studies rely on such in-sample relations; econometricians simply assume that any observed statistical relation between predictors and subsequent market returns was already known to rational investors in real time. Practitioners who find this assumption unrealistic try to avoid in-sample bias by selecting and/or estimating statistical models repeatedly using only data that were available at each point in time, so as to assess predictability in a quasi-out-of-sample sense, though they never completely succeed in doing so. Table 8.6. Correlations with future excess returns of the S&P 500, 1962–2009 Sources: Haver Analytics, Robert Shiller’s website, Amit Goyal’s website, own calculations. Valuations. Various valuation ratios have predictive correlations between 10% and 20% for the next quarter [5].
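The quasi-out-of-sample procedure can be sketched as an expanding-window regression: each forecast uses a line fitted only to predictor/return pairs observed before the forecast date. The function name, the minimum window of 20 observations, and the single-predictor linear form are assumptions for illustration. This guards against look-ahead in the estimation step, but not against hindsight in choosing the predictor itself, which is one plausible reason such efforts never fully succeed.

```python
import numpy as np

def expanding_window_forecasts(predictor, next_returns, min_obs=20):
    """next_returns[t] is the return that follows the observation predictor[t]."""
    x = np.asarray(predictor, dtype=float)
    y = np.asarray(next_returns, dtype=float)
    forecasts = np.full(len(y), np.nan)
    for t in range(min_obs, len(y)):
        # Only pairs s < t are assumed known when forecasting next_returns[t].
        slope, intercept = np.polyfit(x[:t], y[:t], 1)
        forecasts[t] = intercept + slope * x[t]
    return forecasts

# Toy check: with an exact linear relation, forecasts match the realized values.
x = np.linspace(0.0, 1.0, 60)
y = 0.1 + 0.5 * x
f = expanding_window_forecasts(x, y)
print(np.nanmax(np.abs(f - y)))
```

A useful sanity check is that changing the last realized return leaves every forecast unchanged, since no forecast is allowed to see it.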

…

They treat default (or rating change) as a random event whose probability can be estimated from observed market prices in the context of an analytical model (or directly from historical default data). Useful indicators, besides equity volatility and leverage, include past equity returns, certain financial ratios, and proxies for the liquidity premium. This modeling approach is a compromise between statistical models and theoretically purer structural models. Reduced-form models can naturally match market spreads better than structural models, but unconstrained indicator selection can leave them overfitted to in-sample data. Box 10.1. (wonkish) Risk-neutral and actual default probabilities Under certain assumptions (continuous trading, a single-factor diffusion process), positions in risky assets can be perfectly hedged and thus should earn the riskless return.

…

However, there is some evidence of rising correlations across all quant strategies, presumably due to common positions among leveraged traders. 12.7 NOTES [1] Like many others, I prefer to use economic intuition as one guard against data mining, but the virtues of such intuition can be overstated as our intuition is inevitably influenced by past experiences. Purely data-driven statistical approaches are even worse, but at least then statistical models can help assess the magnitude of data-mining bias. [2] Here are some additional points on VMG: —No trading costs or financing costs related to shorting are subtracted from VMG returns. This is typical for academic studies because such costs are trade specific and/or investor specific and, moreover, such data are not available over long histories. —VMG is constructed in a deliberately conservative (“underfitted”) manner.

pages: 467 words: 116,094

**I Think You'll Find It's a Bit More Complicated Than That**
by Ben Goldacre

call centre, conceptual framework, correlation does not imply causation, crowdsourcing, death of newspapers, Desert Island Discs, en.wikipedia.org, experimental subject, Firefox, Flynn Effect, jimmy wales, John Snow's cholera map, Loebner Prize, meta analysis, meta-analysis, moral panic, placebo effect, publication bias, selection bias, selective serotonin reuptake inhibitor (SSRI), Simon Singh, statistical model, stem cell, the scientific method, Turing test, WikiLeaks

Obviously, there are no out gay people in the eighteen-to-twenty-four group who came out at an age later than twenty-four; so the average age at which people in the eighteen-to-twenty-four group came out cannot possibly be greater than the average age of that group, and certainly it will be lower than, say, thirty-seven, the average age at which people in their sixties came out. For the same reason, it’s very likely indeed that the average age of coming out will increase as the average age of each age group rises. In fact, if we assume (in formal terms we could call this a ‘statistical model’) that at any time, all the people who are out have always come out at a uniform rate between the age of ten and their current age, you would get almost exactly the same figures (you’d get fifteen, twenty-three and thirty-five, instead of seventeen, twenty-one and thirty-seven). This is almost certainly why ‘the average coming-out age has fallen by over twenty years’: in fact you could say that Stonewall’s survey has found that on average, as people get older, they get older.
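The arithmetic behind this ‘statistical model’ is short enough to write down. If everyone in a group came out at an age spread uniformly between ten and their current age, each person's expected coming-out age is the midpoint of that range, and because averaging is linear, the group's average is (10 + the group's mean current age) / 2. The mean ages used below (20, 36 and 60) are illustrative assumptions chosen to show how figures of fifteen, twenty-three and thirty-five can arise; they are not taken from the survey.

```python
def expected_coming_out_age(mean_current_age, earliest_age=10):
    """Average coming-out age for a group under the uniform-rate assumption."""
    # The mean of a uniform distribution on [a, b] is (a + b) / 2; averaging these
    # midpoints over a group gives the midpoint at the group's mean current age.
    return (earliest_age + mean_current_age) / 2

for age in (20, 36, 60):  # hypothetical group mean ages
    print(age, expected_coming_out_age(age))
```

Nothing about changing social attitudes enters this calculation, which is exactly why it shows that older groups must report later average coming-out ages.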

…

For example, a recent study identified two broad subpopulations of cyclist: ‘one speed-happy group that cycle fast and have lots of cycle equipment including helmets, and one traditional kind of cyclist without much equipment, cycling slowly’. The study concluded that compulsory cycle-helmet legislation may selectively reduce cycling in the second group. There are even more complex second-round effects if each individual cyclist’s safety is improved by increased cyclist density through ‘safety in numbers’, a phenomenon known as Smeed’s law. Statistical models for the overall impact of helmet habits are therefore inevitably complex and based on speculative assumptions. This complexity seems at odds with the current official BMA policy, which confidently calls for compulsory helmet legislation. Standing over all this methodological complexity is a layer of politics, culture and psychology. Supporters of helmets often tell vivid stories about someone they knew, or heard of, who was apparently saved from severe head injury by a helmet.

…

A&E departments: randomised trials in 208; waiting times 73–5 abdominal aortic aneurysms (AAA) 18, 114 abortion: GPs and xviii, 89–91; Science and Technology Committee report on ‘scientific developments relating to the Abortion Act, 1967’ 196–201 academia, bad xviii–xix, 127–46; animal experiments, failures in research 136–8; brain-imaging studies report more positive findings than their numbers can support 131–4; journals, failures of academic 138–46; Medical Hypotheses: Aids denialism in 138–41; Medical Hypotheses: ‘Down Subjects and Oriental Population Share Several Specific Attitudes and Characteristics’ article 139, 141–3; Medical Hypotheses: masturbation as a treatment for nasal congestion articles 139, 143–6; misuse of statistics 129–31; retractions, academic literature and 134–6 academic journals: access to papers published in 32–4, 143; cherry-picking and 5–8; ‘citation classics’ and 9–10, 102–3, 173; commercial ghost writers and 25–6; data published in newspapers rather than 17–20; doctors and technical academic journals 214; ‘impact factor’ 143; number of 14, 17; peer review and 138–46 see also peer review; poor quality (‘crap’) 138–46; refusal to publish in 3–5; retractions and 134–6; statistical model errors in 129–31; studies of errors in papers published in 9–10, 129–31; summaries of important new research from 214–15; teaching and 214–15; youngest people to publish papers in 11–12 academic papers xvi; access to 32–4; cherry-picking from xvii, 5–8, 12, 174, 176–7, 192, 193, 252, 336, 349, 355; ‘citation classics’ 9–10, 102–3, 173; commercial ‘ghost writers’ and 25–6; investigative journalism work and 18; journalists linking work to 342, 344, 346; number of 14; peer review and see peer review; post-publication 4–5; press releases and xxi, 6, 29–31, 65, 66, 107–9, 119, 120, 121–2, 338–9, 340–2, 358–60; public relations and 358–60; publication bias 132–3, 136, 314, 315; references to other academic papers within allowing study of how ideas spread 26; refusal to publish in 3–5, 29–31; retractions and 134–6; studies of errors in 9–10, 129–31; titles of 297 Acousticom 366 acupuncture 39, 388 ADE 651 273–5 ADHD 40–2 Advertising Standards Authority (ASA) 252 Afghanistan 231; crop captures in xx, 221–4 Ahn, Professor Anna 341 Aids: antiretroviral drugs and 140, 185, 281, 284, 285; Big Pharma and 186; birth control, abortion and US Christian aid groups 185; Catholic Church fight against condom use and 183–4; cures for 12, 182–3, 185–6, 366; denialism 138–41, 182–3, 185–6, 263, 273, 281–6; drug users and 182, 183, 233–4; House of Numbers film 281–3; Medical Hypotheses, Aids denial in 138–41; needle-exchange programmes and 182, 183; number of deaths from 20, 186, 309; power of ideas and 182–7; Roger Coghill and ‘the Aids test’ 366; Spectator, Aids denialism at the xxi, 283–6; US Presidential Emergency Plan for Aids Relief 185 Aidstruth.org 139 al-Jabiri, Major General Jehad 274–5 alcohol: intravenous use of 233; lung cancer and 108–9; rape and consumption of 329, 330 ALLHAT trial 119 Alzheimer’s, smoking and 20–1 American Academy of Child and Adolescent Psychiatry 325 American Association on Mental Retardation 325 American Journal of Clinical Nutrition 344 American Medical Association 262 American Psychological Association 325 American Speech-Language-Hearing Association 325 anecdotes, illustrating data with 8, 118–22, 189, 248–9, 293 animal experiments 136–8 Annals of Internal Medicine 358 Annals of Thoracic Surgery 134 anti-depressants 18; recession linked to rise in prescriptions for xviii, 104–7; SSRI 18, 105 antiretroviral medications 140, 185, 281, 284, 285 aortic aneurysm repair, mortality rates in hospital after/during 18–20, 114 APGaylard 252 Appleby, John 19, 173 artificial intelligence xxii, 394–5 Asch, Solomon 15, 16 Asphalia 365 Associated Press 316 Astel, Professor Karl 22 ATSC 273 autism: educational interventions in 325; internet use and 3; MMR and 145, 347–55, 356–8 Autism Research Centre, Cambridge 348, 354 Bad Science (Goldacre) xvi, 104, 110n, 257, 346 Bad Science column see Guardian Ballas, Dr Dimitris 58 Barasi, Leo 96 Barden, Paul 101–4 Barnardo’s 394 Baron-Cohen, Professor Simon 349–51, 353–4 Batarim 305–6 BBC xxi; ‘bioresonance’ story and 277–8; Britain’s happiest places story and 56, 57; causes of avoidable death, overall coverage of 20; Down’s syndrome births increase story and 61–2; ‘EDF Survey Shows Support for Hinkley Power Station’ story and 95–6; psychological nature of libido problems story and 37; radiation from wi-fi networks story and 289–91, 293; recession and anti-depressant link, reports 105; ‘Reform: The Value of Mathematics’ story and 196; ‘Threefold variation in UK bowel cancer rates’ story and 101–4; Wightman and 393, 394; ‘“Worrying” Jobless Rise Needs Urgent Action – Labour’ story and 59 Beating Bowel Cancer 101, 104 Becker muscular dystrophy 121 Bem Sex Role Inventory (BSRI) 45 Benedict XVI, Pope 183, 184 Benford’s law 54–6 bicycle helmets, the law and 110–13 big data xvii, xviii, 71–86; access to government data 75–7; care.data and risk of sharing medical records 77–86; magical way that patterns emerge from data 73–5 Big Pharma xvii, 324, 401 bin Laden, Osama 357 biologising xvii, 35–46; biological causes for psychological or behavioural conditions 40–2; brain imaging, reality of phenomena and 37–9; girls’ love of pink, evolution and 42–6 Biologist 6 BioSTAR 248 birth rate, UK 49–50 Bishop, Professor Dorothy 3, 6 bladder cancer 24–5, 342 Blair, Tony 357 Blakemore, Colin 138 blame, mistakes in medicine and 267–70 blind auditions, orchestras and xxi, 309–11 blinding, randomised testing and xviii, 12, 118, 124, 126, 133, 137–8, 292–3, 345 blood tests 117, 119–20, 282 blood-pressure drugs 119–20 Blundell, Professor John 337 BMA 112 Booth, Patricia 265 Boston Globe 39 bowel cancer 101–4 Boynton, Dr Petra 252 Brain Committee 230–1 Brain Gym 10–12 Brainiac: faking of science on xxii, 371–5 brain-imaging studies, positive findings in 131–4 breast cancer: abortion and 200–1; diet and 338–40; red wine and 267, 269; screening 113, 114, 115 breast enhancement cream xx, 254–7 Breuning, Stephen 135–6 The British Association for Applied Nutrition and Nutritional Therapy (BANT) 268–9 British Association of Nutritional Therapists 270 British Chiropractic Association (BCA) 250–4 British Dental Association 24 British Household Panel Survey 57 British Journal of Cancer: ‘What if Cancer Survival in Britain were the Same as in Europe: How Many Deaths are Avoidable?’

pages: 586 words: 186,548

**
Architects of Intelligence
** by
Martin Ford

3D printing, agricultural Revolution, AI winter, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, barriers to entry, basic income, Baxter: Rethink Robotics, Bayesian statistics, bitcoin, business intelligence, business process, call centre, cloud computing, cognitive bias, Colonization of Mars, computer vision, correlation does not imply causation, crowdsourcing, DARPA: Urban Challenge, deskilling, disruptive innovation, Donald Trump, Douglas Hofstadter, Elon Musk, Erik Brynjolfsson, Ernest Rutherford, Fellow of the Royal Society, Flash crash, future of work, gig economy, Google X / Alphabet X, Gödel, Escher, Bach, Hans Rosling, ImageNet competition, income inequality, industrial robot, information retrieval, job automation, John von Neumann, Law of Accelerating Returns, life extension, Loebner Prize, Mark Zuckerberg, Mars Rover, means of production, Mitch Kapor, natural language processing, new economy, optical character recognition, pattern recognition, phenotype, Productivity paradox, Ray Kurzweil, recommendation engine, Robert Gordon, Rodney Brooks, Sam Altman, self-driving car, sensor fusion, sentiment analysis, Silicon Valley, smart cities, social intelligence, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, Ted Kaczynski, The Rise and Fall of American Growth, theory of mind, Thomas Bayes, Travis Kalanick, Turing test, universal basic income, Wall-E, Watson beat the top human players on Jeopardy!, women in the workforce, working-age population, zero-sum game, Zipcar

Nowadays people talk about it in various contexts, with consciousness, and with common sense, but that’s really not what we’ve seen. We do find that people, including myself, have all kinds of speculations about the future, but as a scientist, I like to base my conclusions on the specific data that we’ve seen. And what we’ve seen is people using deep learning as high-capacity statistical models. High capacity is just some jargon that means that the model keeps getting better and better the more data you throw at it. Statistical models that at their core are based on matrices of numbers being multiplied, and added, and subtracted, and so on. They are a long way from something where you can see common sense or consciousness emerging. My feeling is that there’s no data to support these claims and if such data appears, I’ll be very excited, but I haven’t seen it yet.
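The description of deep learning as "matrices of numbers being multiplied, and added" can be made concrete with a tiny two-layer network. The sketch below is a minimal example assuming NumPy. The weights are made up for illustration and would normally be learned from data. Each layer is just a matrix multiply plus a bias vector, followed by an elementwise nonlinearity (ReLU here).

```python
import numpy as np


def dense(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One fully connected layer: matrix multiply, add a bias, then ReLU."""
    return np.maximum(0.0, W @ x + b)


def forward(x: np.ndarray) -> np.ndarray:
    # Hypothetical weights; in a trained network these numbers are learned from data.
    W1 = np.array([[1.0, -1.0], [0.5, 0.5]])
    b1 = np.array([0.0, -1.0])
    W2 = np.array([[2.0, 4.0]])
    b2 = np.array([1.0])
    hidden = dense(x, W1, b1)
    return W2 @ hidden + b2  # final layer left linear


print(forward(np.array([1.0, 2.0])))  # [3.]
```

"High capacity" in this picture roughly means more and larger weight matrices, which need more data to fit.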

…

He received a BS, summa cum laude, from the University of Minnesota in Computer Science & Economics in 1990. From 1996 to 1999, he worked for Digital Equipment Corporation’s Western Research Lab in Palo Alto, where he worked on low-overhead profiling tools, design of profiling hardware for out-of-order microprocessors, and web-based information retrieval. From 1990 to 1991, Jeff worked for the World Health Organization’s Global Programme on AIDS, developing software to do statistical modeling, forecasting, and analysis of the HIV pandemic. In 2009, Jeff was elected to the National Academy of Engineering, and he was also named a Fellow of the Association for Computing Machinery (ACM) and a Fellow of the American Association for the Advancement of Science (AAAS). His areas of interest include large-scale distributed systems, performance monitoring, compression techniques, information retrieval, application of machine learning to search and other related problems, microprocessor architecture, compiler optimizations, and development of new products that organize existing information in new and interesting ways.

…

In some ways, the field of AI grew to embrace my work rather than me choosing to go into AI. I went to Berkeley as a postdoc, and there I started to really think about how what I was doing was relevant to actual problems that people cared about, as opposed to just being mathematically elegant. That was the first time I started to get into machine learning. I then returned to Stanford as faculty in 1995 where I started to work on areas relating to statistical modeling and machine learning. I began studying applied problems where machine learning could really make a difference. I worked in computer vision, in robotics, and from 2000 on biology and health data. I also had an ongoing interest in technology-enabled education, which led to a lot of experimentation at Stanford into ways in which we could offer an enhanced learning experience. This was not only for students on campus, but also trying to offer courses to people who didn’t have access to a Stanford education.

pages: 183 words: 17,571

**
Broken Markets: A User's Guide to the Post-Finance Economy
** by
Kevin Mellyn

banking crisis, banks create money, Basel III, Bernie Madoff, Big bang: deregulation of the City of London, Bonfire of the Vanities, bonus culture, Bretton Woods, BRICs, British Empire, business cycle, buy and hold, call centre, Carmen Reinhart, central bank independence, centre right, cloud computing, collapse of Lehman Brothers, collateralized debt obligation, corporate governance, corporate raider, creative destruction, credit crunch, crony capitalism, currency manipulation / currency intervention, disintermediation, eurozone crisis, fiat currency, financial innovation, financial repression, floating exchange rates, Fractional reserve banking, global reserve currency, global supply chain, Home mortgage interest deduction, index fund, information asymmetry, joint-stock company, Joseph Schumpeter, labor-force participation, light touch regulation, liquidity trap, London Interbank Offered Rate, market bubble, market clearing, Martin Wolf, means of production, mobile money, money market fund, moral hazard, mortgage debt, mortgage tax deduction, negative equity, Ponzi scheme, profit motive, quantitative easing, Real Time Gross Settlement, regulatory arbitrage, reserve currency, rising living standards, Ronald Coase, seigniorage, shareholder value, Silicon Valley, statistical model, Steve Jobs, The Great Moderation, the payments system, Tobin tax, too big to fail, transaction costs, underbanked, Works Progress Administration, yield curve, Yogi Berra, zero-sum game

Regulators were becoming increasingly comfortable with the “market-centric” model too, because the securities churned out had to be properly vetted and rated by the credit agencies under SEC (Securities and Exchange Commission) rules. Moreover, distributing risk to large numbers of sophisticated institutions seemed safer than leaving it concentrated on the books of individual banks. Besides, even the Basel-process experts had become convinced that bank risk management had reached a new level of effectiveness through the use of sophisticated statistical models, and the Basel II rules that superseded Basel I especially allowed the largest and most sophisticated banks to use approved models to set their capital requirements. The fly in the ointment of market-centric finance was that it allowed an almost infinite expansion of credit in the economy, but creditworthy risks are by definition finite. At some point, every household with a steady income has seven credit cards, a mortgage, and a home equity line.

…

Americans make this tradeoff with limited fair-credit-reporting protections, while many other societies do not. It is critical to understand that a credit score is only a measure of whether a consumer can service a certain amount of credit—that is, make timely interest and principal payments. It is not concerned with the ability to pay off debts over time. What it really measures is the probability that an individual will default. This is a statistical model–based determination, and as such is hostage to historical experience of the behavior of tens of millions of individuals. The factors that over time have proved most predictive include not only behavior—late or missed payments on any bill, not just a loan, signals potential default—but also circumstances. Home ownership of long duration is a plus. So is long-term employment at the same firm.
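The passage does not say which model turns these factors into a default probability, and commercial scoring formulas are not public. Logistic regression is one common way to do it, so the following is a minimal sketch under that assumption. The coefficients and feature names are hypothetical. Their signs only follow the passage: late payments raise risk, while long home ownership and long tenure at one employer lower it.

```python
import math

# Hypothetical coefficients for illustration only; real scoring models are
# proprietary and are fitted to the repayment histories of millions of people.
INTERCEPT = -3.0
WEIGHTS = {
    "late_or_missed_payments": 0.6,  # any late bill raises estimated risk
    "years_owning_home": -0.08,      # long-duration home ownership lowers it
    "years_at_employer": -0.05,      # long tenure at one firm lowers it
}


def default_probability(applicant: dict) -> float:
    """Logistic model: map a weighted sum of features to a probability in (0, 1)."""
    z = INTERCEPT + sum(w * applicant[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


steady = {"late_or_missed_payments": 0, "years_owning_home": 12, "years_at_employer": 9}
shaky = {"late_or_missed_payments": 4, "years_owning_home": 0, "years_at_employer": 1}
print(round(default_probability(steady), 3), round(default_probability(shaky), 3))
```

In a fitted model the weights would come from historical outcomes. That is the sense in which the score is "hostage to historical experience".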

pages: 238 words: 77,730

**
Final Jeopardy: Man vs. Machine and the Quest to Know Everything
** by
Stephen Baker

23andMe, AI winter, Albert Einstein, artificial general intelligence, business process, call centre, clean water, commoditize, computer age, Frank Gehry, information retrieval, Iridium satellite, Isaac Newton, job automation, pattern recognition, Ray Kurzweil, Silicon Valley, Silicon Valley startup, statistical model, theory of mind, thinkpad, Turing test, Vernor Vinge, Wall-E, Watson beat the top human players on Jeopardy!

The Google team had fed millions of translated documents, many of them from the United Nations, into their computers and supplemented them with a multitude of natural-language text culled from the Web. This training set dwarfed their competitors’. Without knowing what the words meant, their computers had learned to associate certain strings of words in Arabic and Chinese with their English equivalents. Since they had so very many examples to learn from, these statistical models caught nuances that had long confounded machines. Using statistics, Google’s computers won hands down. “Just like that, they bypassed thirty years of work on machine translation,” said Ed Lazowska, the chairman of the computer science department at the University of Washington. The statisticians trounced the experts. But the statistically trained machines they built, whether they were translating from Chinese or analyzing the ads that a Web surfer clicked, didn’t know anything.
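The idea of learning word associations from parallel text "without knowing what the words meant" can be illustrated with IBM Model 1, a classic textbook statistical alignment model. The passage does not say which models Google used, and a production system is vastly larger, so treat this only as a sketch of the general technique. Expectation-maximisation repeatedly shares each English word's credit among the foreign words in the same sentence. Consistent co-occurrences then pull the probabilities apart. The three-sentence corpus is a toy assumption.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t[(e, f)] = P(e | f) with EM.

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    No dictionary is supplied; associations emerge from co-occurrence alone.
    """
    e_vocab = {e for _, e_sent in pairs for e in e_sent}
    t = defaultdict(lambda: 1.0 / len(e_vocab))  # uniform starting guess
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts of (e, f) links
        total = defaultdict(float)  # expected counts of f
        # E-step: split each English word's credit among the foreign words
        # in the same sentence, in proportion to the current t.
        for f_sent, e_sent in pairs:
            for e in e_sent:
                norm = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {ef: c / total[ef[1]] for ef, c in count.items()})
    return t


corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(corpus)
print(round(t[("house", "Haus")], 2), round(t[("the", "Haus")], 2))
```

After a few iterations "Haus" is linked mainly to "house" and "Buch" to "book", purely because of which words keep turning up together.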

…

“We knew all of its algorithms,” he said, and the team had precise statistics on every aspect of its behavior. The human players were more complicated. Tesauro had to pull together statistics on the thousands of humans who had played Jeopardy: how often they buzzed in, their precision in different levels of clues, their betting patterns for Daily Doubles and Final Jeopardy. From these, the IBM team pieced together statistical models of two humans. Then they put them into action against the model of Watson. The games had none of the life or drama of Jeopardy—no suspense, no jokes, no jingle while the digital players came up with their Final Jeopardy responses. They were only simulations of the scoring dynamics of Jeopardy. Yet they were valuable. After millions of games, Tesauro was able to calculate the value of each clue at each state of the game.
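Simulating "the scoring dynamics of Jeopardy" against statistical models of players is a Monte Carlo method. The sketch below is far simpler than IBM's setup. It has no wagering, Daily Doubles or Final Jeopardy, uses a single clue value, and breaks buzzer ties at random. The player profiles (buzz-in rate and precision) are hypothetical numbers, not IBM's measured statistics. Playing many simulated games gives an estimate of each player's chance of finishing first.

```python
import random


def play_game(players, rng, n_clues=60, clue_value=1000):
    """Simulate scoring only: who buzzes in, and whether they answer correctly."""
    scores = [0] * len(players)
    for _ in range(n_clues):
        buzzers = [i for i, p in enumerate(players) if rng.random() < p["buzz"]]
        if not buzzers:
            continue  # nobody attempts the clue
        i = rng.choice(buzzers)  # assume ties at the buzzer are broken at random
        if rng.random() < players[i]["precision"]:
            scores[i] += clue_value
        else:
            scores[i] -= clue_value
    return scores


def win_rates(players, n_games=5000, seed=0):
    """Estimate each player's chance of finishing first by repeated simulation."""
    rng = random.Random(seed)
    wins = [0] * len(players)
    for _ in range(n_games):
        scores = play_game(players, rng)
        wins[scores.index(max(scores))] += 1  # exact ties go to the first listed player
    return [w / n_games for w in wins]


# Hypothetical player profiles, not IBM's measured statistics.
machine = {"buzz": 0.8, "precision": 0.9}
human = {"buzz": 0.5, "precision": 0.85}
rates = win_rates([machine, human, human])
print(rates)
```

Extending the game state (scores so far, clues remaining) and recording outcomes from each state is the kind of step that would let one estimate the value of a clue at a given point in a game.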

pages: 752 words: 131,533

**
Python for Data Analysis
** by
Wes McKinney

backtesting, cognitive dissonance, crowdsourcing, Debian, Firefox, Google Chrome, Guido van Rossum, index card, random walk, recommendation engine, revision control, sentiment analysis, Sharpe ratio, side project, sorting algorithm, statistical model, type inference

While readers may have many different end goals for their work, the tasks required generally fall into a number of different broad groups. Interacting with the outside world: reading and writing with a variety of file formats and databases. Preparation: cleaning, munging, combining, normalizing, reshaping, slicing and dicing, and transforming data for analysis. Transformation: applying mathematical and statistical operations to groups of data sets to derive new data sets, for example, aggregating a large table by group variables. Modeling and computation: connecting your data to statistical models, machine learning algorithms, or other computational tools. Presentation: creating interactive or static graphical visualizations or textual summaries. In this chapter I will show you a few data sets and some things we can do with them. These examples are just intended to pique your interest and thus will only be explained at a high level. Don’t worry if you have no experience with any of these tools; they will be discussed in great detail throughout the rest of the book.
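The "Transformation" example of aggregating a table by group variables maps directly onto pandas `groupby`. The sketch below uses a small made-up table in place of a large one, and the column names are assumptions. It aggregates by one key and then by two.

```python
import pandas as pd

# A small made-up table standing in for "a large table".
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west", "east"],
    "product": ["a", "a", "b", "b", "a"],
    "amount": [10, 20, 30, 40, 50],
})

# Aggregate by one group variable, then by two.
by_region = sales.groupby("region")["amount"].sum()
by_region_product = sales.groupby(["region", "product"])["amount"].mean()
print(by_region)
print(by_region_product)
```

Grouping by two keys returns a Series with a two-level (hierarchical) index, one entry per observed combination.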

…

To create a Panel, you can use a dict of DataFrame objects or a three-dimensional ndarray: import pandas.io.data as web pdata = pd.Panel(dict((stk, web.get_data_yahoo(stk, '1/1/2009', '6/1/2012')) for stk in ['AAPL', 'GOOG', 'MSFT', 'DELL'])) Each item (the analogue of columns in a DataFrame) in the Panel is a DataFrame: In [297]: pdata Out[297]: <class 'pandas.core.panel.Panel'> Dimensions: 4 (items) x 861 (major) x 6 (minor) Items: AAPL to MSFT Major axis: 2009-01-02 00:00:00 to 2012-06-01 00:00:00 Minor axis: Open to Adj Close In [298]: pdata = pdata.swapaxes('items', 'minor') In [299]: pdata['Adj Close'] Out[299]: <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 861 entries, 2009-01-02 00:00:00 to 2012-06-01 00:00:00 Data columns: AAPL 861 non-null values DELL 861 non-null values GOOG 861 non-null values MSFT 861 non-null values dtypes: float64(4) ix-based label indexing generalizes to three dimensions, so we can select all data at a particular date or a range of dates like so: In [300]: pdata.ix[:, '6/1/2012', :] Out[300]: Open High Low Close Volume Adj Close AAPL 569.16 572.65 560.52 560.99 18606700 560.99 DELL 12.15 12.30 12.05 12.07 19396700 12.07 GOOG 571.79 572.65 568.35 570.98 3057900 570.98 MSFT 28.76 28.96 28.44 28.45 56634300 28.45 In [301]: pdata.ix['Adj Close', '5/22/2012':, :] Out[301]: AAPL DELL GOOG MSFT Date 2012-05-22 556.97 15.08 600.80 29.76 2012-05-23 570.56 12.49 609.46 29.11 2012-05-24 565.32 12.45 603.66 29.07 2012-05-25 562.29 12.46 591.53 29.06 2012-05-29 572.27 12.66 594.34 29.56 2012-05-30 579.17 12.56 588.23 29.34 2012-05-31 577.73 12.33 580.86 29.19 2012-06-01 560.99 12.07 570.98 28.45 An alternate way to represent panel data, especially for fitting statistical models, is in “stacked” DataFrame form: In [302]: stacked = pdata.ix[:, '5/30/2012':, :].to_frame() In [303]: stacked Out[303]: Open High Low Close Volume Adj Close major minor 2012-05-30 AAPL 569.20 579.99 566.56 579.17 18908200 579.17 DELL 12.59 12.70 12.46 12.56 19787800 12.56 GOOG 588.16 591.90 583.53 588.23 1906700 588.23 MSFT 29.35 29.48 29.12 29.34 41585500 29.34 2012-05-31 AAPL 580.74 581.50 571.46 577.73 17559800 577.73 DELL 12.53 12.54 12.33 12.33 19955500 12.33 GOOG 588.72 590.00 579.00 580.86 2968300 580.86 MSFT 29.30 29.42 28.94 29.19 39134000 29.19 2012-06-01 AAPL 569.16 572.65 560.52 560.99 18606700 560.99 DELL 12.15 12.30 12.05 12.07 19396700 12.07 GOOG 571.79 572.65 568.35 570.98 3057900 570.98 MSFT 28.76 28.96 28.44 28.45 56634300 28.45 DataFrame has a related to_panel method, the inverse of to_frame: In [304]: stacked.to_panel() Out[304]: <class 'pandas.core.panel.Panel'> Dimensions: 6 (items) x 3 (major) x 4 (minor) Items: Open to Adj Close Major axis: 2012-05-30 00:00:00 to 2012-06-01 00:00:00 Minor axis: AAPL to MSFT
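`Panel`, `.ix` and `pandas.io.data` have since been removed from pandas, and the Yahoo reader now lives in the separate pandas-datareader package. The usual replacement is a DataFrame with a hierarchical (MultiIndex) index. The sketch below makes the same moves as the excerpt in that style: a "stacked" long form, a wide form, one field across all tickers, and everything on a single date. The prices are random synthetic numbers (an assumption) standing in for the download, and only two fields are kept.

```python
import numpy as np
import pandas as pd

# Synthetic prices (an assumption) stand in for the Yahoo download in the excerpt.
dates = pd.date_range("2012-05-30", periods=3, freq="B")  # May 30, May 31, June 1
tickers = ["AAPL", "DELL", "GOOG", "MSFT"]
rng = np.random.default_rng(0)

# Long ("stacked") form: one row per (date, ticker), one column per field.
index = pd.MultiIndex.from_product([dates, tickers], names=["date", "ticker"])
stacked = pd.DataFrame(
    rng.uniform(10, 600, size=(len(index), 2)), index=index, columns=["Open", "Close"]
)

# Wide form: columns are (field, ticker) pairs, playing the role of the Panel.
wide = stacked.unstack("ticker")

# One field across all tickers, like pdata['Adj Close'] in the excerpt.
close = wide["Close"]

# Every field and ticker on a single date, like pdata.ix[:, date, :].
one_day = stacked.xs(pd.Timestamp("2012-05-31"), level="date")

print(close)
print(one_day)
```

Starting from the long form and calling `unstack` sidesteps the behaviour changes that recent pandas versions have been making to `DataFrame.stack`.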

…

There are much more efficient sampling-without-replacement algorithms, but this is an easy strategy that uses readily available tools: In [183]: df.take(np.random.permutation(len(df))[:3]) Out[183]: 0 1 2 3 1 4 5 6 7 3 12 13 14 15 4 16 17 18 19 To generate a sample with replacement, the fastest way is to use np.random.randint to draw random integers: In [184]: bag = np.array([5, 7, -1, 6, 4]) In [185]: sampler = np.random.randint(0, len(bag), size=10) In [186]: sampler Out[186]: array([4, 4, 2, 2, 2, 0, 3, 0, 4, 1]) In [187]: draws = bag.take(sampler) In [188]: draws Out[188]: array([ 4, 4, -1, -1, -1, 5, 6, 5, 4, 7]) Computing Indicator/Dummy Variables Another type of transformation for statistical modeling or machine learning applications is converting a categorical variable into a “dummy” or “indicator” matrix. If a column in a DataFrame has k distinct values, you would derive a matrix or DataFrame with k columns containing all 1’s and 0’s. pandas has a get_dummies function for doing this, though devising one yourself is not difficult. Let’s return to an earlier example DataFrame: In [189]: df = DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'], .....: 'data1': range(6)}) In [190]: pd.get_dummies(df['key']) Out[190]: a b c 0 0 1 0 1 0 1 0 2 1 0 0 3 0 0 1 4 1 0 0 5 0 1 0 In some cases, you may want to add a prefix to the columns in the indicator DataFrame, which can then be merged with the other data. get_dummies has a prefix argument for doing just this: In [191]: dummies = pd.get_dummies(df['key'], prefix='key') In [192]: df_with_dummy = df[['data1']].join(dummies) In [193]: df_with_dummy Out[193]: data1 key_a key_b key_c 0 0 0 1 0 1 1 0 1 0 2 2 1 0 0 3 3 0 0 1 4 4 1 0 0 5 5 0 1 0 If a row in a DataFrame belongs to multiple categories, things are a bit more complicated.
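The same steps can be written with NumPy's newer `Generator` interface (`np.random.default_rng`) in place of the legacy `np.random` functions used in the excerpt. The frame shapes follow the excerpt, and the seed is arbitrary. Recent pandas releases return boolean indicator columns from `get_dummies` by default, so the sketch passes `dtype=int` to get the 1's and 0's shown in the excerpt.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=12345)

df = pd.DataFrame(np.arange(5 * 4).reshape(5, 4))

# Sampling without replacement: shuffle row positions, keep the first three.
without_replacement = df.take(rng.permutation(len(df))[:3])

# Sampling with replacement: draw random positions (duplicates allowed).
bag = np.array([5, 7, -1, 6, 4])
sampler = rng.integers(0, len(bag), size=10)  # upper bound is exclusive
draws = bag.take(sampler)

# Indicator/dummy variables with a column prefix, joined back onto the data.
keys_df = pd.DataFrame({"key": ["b", "b", "a", "c", "a", "b"], "data1": range(6)})
dummies = pd.get_dummies(keys_df["key"], prefix="key", dtype=int)
df_with_dummy = keys_df[["data1"]].join(dummies)
print(df_with_dummy)
```

For simple row sampling, `DataFrame.sample(n=3)` (or `replace=True` for sampling with replacement) does the same job in one call.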

pages: 58 words: 18,747

**
The Rent Is Too Damn High: What to Do About It, and Why It Matters More Than You Think
** by
Matthew Yglesias

Edward Glaeser, falling living standards, Home mortgage interest deduction, income inequality, industrial robot, Jane Jacobs, land reform, mortgage tax deduction, New Urbanism, pets.com, rent control, rent-seeking, Robert Gordon, Robert Shiller, Saturday Night Live, Silicon Valley, statistical model, transcontinental railway, urban sprawl, white picket fence

That said, though automobiles are unquestionably a useful technology, they’re not teleportation devices and they haven’t abolished distance. Location still matters, and some land is more valuable than other land. Since land and structures are normally sold in a bundle, it’s difficult in many cases to get precise numbers on land prices as such. But researchers at the Federal Reserve Bank of New York used a statistical model based on prices paid for vacant lots and for structures that were torn down to be replaced by brand-new buildings and found that the price of land in the metro area is closely linked to its distance from the Empire State Building: [Chart 1: Land Prices and Distance of Property from Empire State Building. Natural logarithm of land price per square foot plotted against distance from Empire State Building (kilometers).] In general, the expensive land should be much more densely built upon than the cheap land.
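The chart plots the natural logarithm of land price against distance. A straight line on that scale means that price falls by a roughly constant percentage per kilometre. The passage does not describe the Fed researchers' actual model. The sketch below only shows how such a log-linear relationship is fitted and read, using synthetic, noise-free numbers. The intercept and slope are invented for illustration and are not the study's estimates.

```python
import numpy as np

# Synthetic, noise-free illustration (an assumption, not the Fed's data):
# log price per square foot falls linearly with distance from a centre point.
distance_km = np.array([1.0, 5.0, 10.0, 20.0, 40.0])
true_intercept, true_slope = 6.0, -0.08
log_price = true_intercept + true_slope * distance_km

# Fit a straight line to (distance, log price); polyfit returns [slope, intercept].
slope, intercept = np.polyfit(distance_km, log_price, deg=1)

# A slope b on the log scale means each extra km multiplies price by exp(b).
print(round(slope, 3), round(intercept, 3), round(np.exp(slope), 3))
```

With real sales data the points would scatter around the line, and the fitted slope would summarise how quickly land values decay with distance.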

**
Statistics in a Nutshell
** by
Sarah Boslaugh

Antoine Gombaud: Chevalier de Méré, Bayesian statistics, business climate, computer age, correlation coefficient, experimental subject, Florence Nightingale: pie chart, income per capita, iterative process, job satisfaction, labor-force participation, linear programming, longitudinal study, meta analysis, meta-analysis, p-value, pattern recognition, placebo effect, probability theory / Blaise Pascal / Pierre de Fermat, publication bias, purchasing power parity, randomized controlled trial, selection bias, six sigma, statistical model, The Design of Experiments, the scientific method, Thomas Bayes, Vilfredo Pareto

One way to think of this use of ANCOVA is that by controlling for the effect of the continuous covariate(s), you are examining what the relationship between the factors and the continuous outcome would be if all cases had the same value for the covariate(s). For instance, in the field of study and salary example, by using age as a continuous covariate, you are examining what the relationship between those two factors would be if all the subjects in your study were the same age. Another typical use of ANCOVA is to reduce the residual or error variance in a design. We know that one goal of statistical modeling is to explain variance in a data set and that we generally prefer models that can explain more variance, and have lower residual variance, than models that explain less. If we can reduce the residual variance by including one or more continuous covariates in our design, it might be easier to see the relationships between the factors of interest and the dependent variable. The assumptions of ANOVA apply to ANCOVA, and there are two additional assumptions (numbers 5 and 6) as well for ANCOVA: Data appropriateness: The outcome variable should be continuous, measured at the interval or ratio level, and be unbounded (or at least cover a wide range); the factors (group variables) should be dichotomous or categorical; the covariate(s) should be continuous, measured at the interval or ratio level, and be unbounded or cover a wide range.
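The variance-reduction use of ANCOVA can be seen by fitting it as a linear model with a group indicator plus the covariate and comparing residual variance against the group-only model. The sketch below uses synthetic data loosely modelled on the field-of-study and salary example. The effect sizes, age range and noise level are all assumptions. It uses plain NumPy least squares rather than a dedicated statistics package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical data: a two-level factor (field of study), a covariate (age), and salary.
field = rng.integers(0, 2, size=n)  # 0/1 group indicator
age = rng.uniform(25, 60, size=n)   # continuous covariate
salary = 30_000 + 8_000 * field + 900 * age + rng.normal(0, 5_000, size=n)


def residual_variance(X, y):
    """Fit y = X @ beta by ordinary least squares; return (residual variance, beta)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid.var(ddof=X.shape[1]), beta  # divide by n minus number of parameters


ones = np.ones(n)
# ANOVA-style model: intercept + group indicator only.
var_anova, _ = residual_variance(np.column_stack([ones, field]), salary)
# ANCOVA-style model: intercept + group indicator + age covariate.
var_ancova, beta = residual_variance(np.column_stack([ones, field, age]), salary)
# beta[1] estimates the group difference at a common value of age.
print(round(var_anova), round(var_ancova), round(beta[1]))
```

Because age explains much of the spread in salary here, the ANCOVA model leaves far less residual variance. That makes the group difference easier to detect.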

…

For example, in the mid-1970s, models focused on variables derived from atmospheric conditions, whereas in the near future, models will be available that are based on atmospheric data combined with land surface, ocean and sea ice, sulphate and nonsulphate aerosol, carbon cycle, dynamic vegetation, and atmospheric chemistry data. By combining these additional sources of variation into a large-scale statistical model, predictions of weather activity of qualitatively different types have been made possible at different spatial and temporal scales. In this chapter, we will be working with multiple regression on a much smaller scale. This is not unrealistic from a real-world point of view; in fact, useful regression models may be built using a relatively small number of predictor variables (say, from 2 to 10), although the people building the model might consider far more predictors for inclusion before selecting those to keep in the final model.

…

Perhaps wine drinkers eat better diets than people who don’t drink at all, or perhaps they are able to drink wine because they are in better health. (Treatment for certain illnesses precludes alcohol consumption, for instance.) To try to eliminate these alternative explanations, researchers often collect data on a variety of factors other than the factor of primary interest and include the extra factors in the statistical model. Such variables, which are neither the outcome nor the main predictors of interest, are called control variables because they are included in the equation to control for their effect on the outcome. Variables such as age, gender, socioeconomic status, and race/ethnicity are often included in medical and social science studies, although they are not the variables of interest, because the researcher wants to know the effect of the main predictor variables on the outcome after the effects of these control variables have been accounted for.
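A small simulation shows why adding a control variable matters. In the hypothetical setup below, diet quality drives both wine drinking and health, and wine itself has no effect at all. A regression of health on wine alone still finds a clearly positive coefficient, because wine is standing in for diet. Including diet as a control variable brings the wine coefficient back to roughly zero. All coefficients and the sample size are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# Hypothetical causal story: diet quality drives both wine drinking and health;
# wine itself has no effect on health in this simulation.
diet = rng.normal(size=n)
wine = 0.8 * diet + rng.normal(size=n)
health = 1.0 * diet + rng.normal(size=n)


def ols(y, *predictors):
    """Least-squares coefficients for y ~ intercept + predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


naive = ols(health, wine)[1]           # wine only: picks up diet's effect
adjusted = ols(health, wine, diet)[1]  # diet included as a control variable
print(round(naive, 2), round(adjusted, 2))
```

Control variables only remove confounding by factors that were actually measured and included in the model. An unmeasured confounder would still bias the estimate.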

pages: 309 words: 86,909

**
The Spirit Level: Why Greater Equality Makes Societies Stronger
** by
Richard Wilkinson,
Kate Pickett

basic income, Berlin Wall, clean water, Diane Coyle, epigenetics, experimental economics, experimental subject, Fall of the Berlin Wall, full employment, germ theory of disease, Gini coefficient, God and Mammon, impulse control, income inequality, Intergovernmental Panel on Climate Change (IPCC), knowledge economy, labor-force participation, land reform, longitudinal study, Louis Pasteur, meta analysis, meta-analysis, Milgram experiment, moral panic, offshore financial centre, phenotype, plutocrats, Plutocrats, profit maximization, profit motive, Ralph Waldo Emerson, statistical model, The Chicago School, The Spirit Level, The Wealth of Nations by Adam Smith, Thorstein Veblen, ultimatum game, upwardly mobile, World Values Survey, zero-sum game

One factor is the strength of the relationship, which is shown by the steepness of the lines in Figures 4.1 and 4.2. People in Sweden are much more likely to trust each other than people in Portugal. Any alternative explanation would need to be just as strong, and in our own statistical models we find that neither poverty nor average standards of living can explain our findings. We also see a consistent association among both the United States and the developed countries. Earlier we described how Uslaner and Rothstein used a statistical model to show the ordering of inequality and trust: inequality affects trust, not the other way round. The relationships between inequality and women’s status and between inequality and foreign aid also add coherence and plausibility to our belief that inequality increases the social distance between different groups of people, making us less willing to see them as ‘us’ rather than ‘them’.

pages: 360 words: 85,321

**
The Perfect Bet: How Science and Math Are Taking the Luck Out of Gambling
** by
Adam Kucharski

Ada Lovelace, Albert Einstein, Antoine Gombaud: Chevalier de Méré, beat the dealer, Benoit Mandelbrot, butterfly effect, call centre, Chance favours the prepared mind, Claude Shannon: information theory, collateralized debt obligation, correlation does not imply causation, diversification, Edward Lorenz: Chaos theory, Edward Thorp, Everything should be made as simple as possible, Flash crash, Gerolamo Cardano, Henri Poincaré, Hibernia Atlantic: Project Express, if you build it, they will come, invention of the telegraph, Isaac Newton, Johannes Kepler, John Nash: game theory, John von Neumann, locking in a profit, Louis Pasteur, Nash equilibrium, Norbert Wiener, p-value, performance metric, Pierre-Simon Laplace, probability theory / Blaise Pascal / Pierre de Fermat, quantitative trading / quantitative ﬁnance, random walk, Richard Feynman, Ronald Reagan, Rubik’s Cube, statistical model, The Design of Experiments, Watson beat the top human players on Jeopardy!, zero-sum game

The probability each horse will win is a balance between the chance of the horse winning in the model and the chance of victory according to the current odds. The scales can tip one way or the other: whichever produces the combined prediction that lines up best with actual results. Strike the right balance, and good predictions can become profitable ones. WHEN WOODS AND BENTER arrived in Hong Kong, they did not meet with immediate success. While Benter spent the first year putting together the statistical model, Woods tried to make money exploiting the long-shot-favorite bias. They had come to Asia with a bankroll of $150,000; within two years, they’d lost it all. It didn’t help that investors weren’t interested in their strategy. “People had so little faith in the system that they would not have invested for 100 percent of the profits,” Woods later said. By 1986, things were looking better. After Benter had written hundreds of thousands of lines of computer code, his model was ready to go.
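One standard way to strike the balance described above is to combine the model's probabilities with those implied by the odds on the log scale, then renormalise across the field (a multinomial-logit form). This is a sketch of that general technique, not a reconstruction of Benter's code. The weights `alpha` and `beta` are hypothetical, and in practice they would be fitted so that the combined predictions best match past results. The model probabilities and odds are also made up.

```python
import math


def implied_probabilities(decimal_odds):
    """Turn decimal odds into probabilities, rescaled to sum to 1 (removing the margin)."""
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)
    return [r / total for r in raw]


def combine(model_probs, public_probs, alpha, beta):
    """Blend two sets of win probabilities on the log scale and renormalise.

    alpha weights the model, beta weights the market; both would be fitted
    to past results so that the blend best matches actual outcomes.
    """
    scores = [alpha * math.log(m) + beta * math.log(p)
              for m, p in zip(model_probs, public_probs)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


model = [0.40, 0.35, 0.25]                       # hypothetical model output
public = implied_probabilities([2.0, 3.5, 4.0])  # hypothetical odds
print([round(p, 3) for p in combine(model, public, alpha=0.9, beta=0.6)])
```

Setting `beta` to zero trusts the model alone, and setting `alpha` to zero simply echoes the market. The fitted weights decide where between those extremes the best predictions lie.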

…

All sorts of factors could influence a horse’s performance in a race, from past experience to track conditions. Some provide clear hints about the future, while others just muddy the predictions. To pin down which factors are useful, syndicates need to collect reliable, repeated observations about races. Hong Kong was the closest Bill Benter could find to a laboratory setup, with the same horses racing on a regular basis on the same tracks in similar conditions. Using his statistical model, Benter identified factors that could lead to successful race predictions. He found that some came out as more important than others. In Benter’s early analysis, for example, the model said the number of races a horse had previously run was a crucial factor when making predictions. In fact, it was more important than almost any other factor. Maybe the finding isn’t all that surprising. We might expect horses that have run more races to be used to the terrain and less intimidated by their opponents.

pages: 304 words: 80,965

**
What They Do With Your Money: How the Financial System Fails Us, and How to Fix It
** by
Stephen Davis,
Jon Lukomnik,
David Pitt-Watson

activist fund / activist shareholder / activist investor, Admiral Zheng, banking crisis, Basel III, Bernie Madoff, Black Swan, buy and hold, centralized clearinghouse, clean water, computerized trading, corporate governance, correlation does not imply causation, credit crunch, Credit Default Swap, crowdsourcing, David Brooks, Dissolution of the Soviet Union, diversification, diversified portfolio, en.wikipedia.org, financial innovation, financial intermediation, fixed income, Flash crash, income inequality, index fund, information asymmetry, invisible hand, Kenneth Arrow, Kickstarter, light touch regulation, London Whale, Long Term Capital Management, moral hazard, Myron Scholes, Northern Rock, passive investing, performance metric, Ponzi scheme, post-work, principal–agent problem, rent-seeking, Ronald Coase, shareholder value, Silicon Valley, South Sea Bubble, sovereign wealth fund, statistical model, Steve Jobs, the market place, The Wealth of Nations by Adam Smith, transaction costs, Upton Sinclair, value at risk, WikiLeaks

Even if they change your life profoundly, such days are not likely to resemble the ones before and after. That is why the day you get married is so memorable. In fact, the elements of that day are not likely to be present in the sample of any of the previous 3,652 days.28 So how could the computer possibly calculate the likelihood of their recurring tomorrow, or next week? Similarly, in the financial world, if you feed a statistical model data that have come from a period where there has been no banking crisis, the model will predict that it is very unlikely you will have a banking crisis. When statisticians worked out that a financial crisis of the sort we witnessed in 2008 would occur once in billions of years, their judgment was based on years of data when there had not been such a crisis.29 It compounds the problem that people tend to simplify the outcome of risk models.
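The argument rests on a simple mechanism. A model that estimates how likely an event is from its past frequency can only reflect what the sample window contains. The minimal sketch below assumes the crudest such estimator, the observed frequency of event days. The function name and the 3,652-day window (taken from the passage's "previous 3,652 days") are for illustration only, and real risk models are more elaborate.

```python
def frequency_estimate(daily_events):
    """Estimate the daily probability of an event as its observed frequency.

    daily_events is a list of booleans, True on days the event occurred.
    """
    if not daily_events:
        raise ValueError("need at least one day of history")
    return sum(daily_events) / len(daily_events)

# About ten years of daily observations with no crisis in them.
calm_decade = [False] * 3652
print(frequency_estimate(calm_decade))  # → 0.0

# The same window with a single crisis day.
print(frequency_estimate([True] + [False] * 3651))
```

When the window holds no crisis days, the estimate is exactly zero. It stays there until a crisis actually enters the data.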

…

Just as the laws of gravity don’t explain magnetism or subatomic forces, so the disciplines of economics that held sway in our financial institutions paid little attention to the social, cultural, legal, political, institutional, moral, psychological, and technological forces that shape our economy’s behavior. The compass that bankers and regulators were using worked well according to its own logic, but it was pointing in the wrong direction, and they steered the ship onto the rocks. History does not record whether the Queen was satisfied with the academics’ response. She might, however, have noted that this economic-statistical model had been found wanting before—in 1998, when the collapse of the hedge fund Long-Term Capital Management nearly took the financial system down with it. Ironically, its directors included the two people who had shared the Nobel Prize in Economics the previous year.20 The Queen might also have noted the glittering lineup of senior economists who, over the last century, have warned against excessive confidence in predictions made using models.

**
Data Wrangling With Python: Tips and Tools to Make Your Life Easier
** by
Jacqueline Kazil

Amazon Web Services, bash_history, cloud computing, correlation coefficient, crowdsourcing, data acquisition, database schema, Debian, en.wikipedia.org, Firefox, Google Chrome, job automation, Nate Silver, natural language processing, pull request, Ronald Reagan, Ruby on Rails, selection bias, social web, statistical model, web application, WikiLeaks

Depending on how you join your data (inner/outer and left/right), you will get different datasets. Take time to think about which join fits your needs. Exception handling: enables you to anticipate and manage Python exceptions with code. It’s always better to be specific and explicit, so you don’t disguise bugs with overly general exception catches. numpy corrcoef: uses statistical measures such as Pearson’s correlation coefficient to determine whether two parts of a dataset are related. agate mad_outliers and stdev_outliers: use statistical models and tools like standard deviations or median absolute deviations to determine whether your dataset has specific outliers or data that “doesn’t fit.” agate group_by and aggregate: group your dataset on a particular attribute and run aggregation analysis to see if there are notable differences (or similarities) across groupings.
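Two rows of that summary describe concrete checks, a correlation coefficient and deviation-based outlier detection. The sketch below reproduces both using NumPy only. `numpy.corrcoef` is a real NumPy function. `flag_stdev_outliers` and `flag_mad_outliers` are hypothetical helpers written for illustration and are not agate's `stdev_outliers`/`mad_outliers` methods. The default threshold of three deviations is also an assumption.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences.

    numpy.corrcoef returns a 2x2 correlation matrix; the off-diagonal
    entry is the coefficient between x and y.
    """
    return float(np.corrcoef(x, y)[0, 1])

def flag_stdev_outliers(values, deviations=3.0):
    """Values farther than `deviations` standard deviations from the mean."""
    arr = np.asarray(values, dtype=float)
    return arr[np.abs(arr - arr.mean()) > deviations * arr.std()]

def flag_mad_outliers(values, deviations=3.0):
    """Values farther than `deviations` median absolute deviations from the median."""
    arr = np.asarray(values, dtype=float)
    median = np.median(arr)
    mad = np.median(np.abs(arr - median))
    # If more than half the values are identical, mad is 0 and every
    # value that differs from the median gets flagged.
    return arr[np.abs(arr - median) > deviations * mad]

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
readings = [10, 11, 9, 10, 12, 11, 10, 95]
print(flag_stdev_outliers(readings))  # → []
print(flag_mad_outliers(readings))    # → [95.]
```

The sample readings show why a median-based rule can be more robust. The single extreme value inflates the standard deviation enough to hide itself, while the median absolute deviation barely moves.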

…

This interactive displays different scenarios The Guardian staff researched and coded. Not every simulation has the same outcome, which helps users understand that there is an element of chance, while still showing probability (i.e., a lower chance of infection with higher vaccination rates). It takes a highly politicized topic and brings out real-world scenarios using statistical models of outbreaks. Although interactives take more experience to build and often require a deeper coding skillset, they are a great tool, especially if you have frontend coding experience. As an example, for our child labor data we could build an interactive showing how many people in your local high school would never have graduated if they had lived in Chad, given its child labor rates. Another interactive could show goods and services available in your local mall that are produced using child labor.
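The interactive's point has two parts: individual runs differ by chance, but higher vaccination rates make large outbreaks less likely. A small Monte Carlo simulation can show both. The sketch below is not The Guardian's model. It is a deliberately simple stochastic outbreak model under these assumptions: a population of 1,000, 10 random contacts per case, a 30% transmission chance per contact with an unvaccinated person, and fully protective vaccination. All of these numbers are hypothetical.

```python
import random
import statistics

def simulate_outbreak(population, vaccination_rate, contacts_per_case, p_transmit, rng):
    """Return how many people are infected in one simulated outbreak.

    Simplifying assumptions: person 0 is an unvaccinated first case, everyone
    else is vaccinated independently with probability `vaccination_rate`,
    vaccinated people are fully protected, and each case meets
    `contacts_per_case` people chosen uniformly at random.
    """
    vaccinated = [rng.random() < vaccination_rate for _ in range(population)]
    vaccinated[0] = False
    infected = {0}
    to_process = 1  # cases whose contacts have not been simulated yet
    while to_process:
        to_process -= 1
        for _ in range(contacts_per_case):
            contact = rng.randrange(population)
            if contact in infected or vaccinated[contact]:
                continue
            if rng.random() < p_transmit:
                infected.add(contact)
                to_process += 1
    return len(infected)

def outbreak_sizes(vaccination_rate, runs=200, seed=1):
    """Run many simulations with a fixed seed so results are reproducible."""
    rng = random.Random(seed)
    return [simulate_outbreak(1000, vaccination_rate, 10, 0.3, rng) for _ in range(runs)]

for rate in (0.5, 0.95):
    sizes = outbreak_sizes(rate)
    print(rate, statistics.mean(sizes), min(sizes), max(sizes))
```

The spread between `min` and `max` shows the element of chance. The difference in means shows the effect of vaccination.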

pages: 88 words: 25,047

**
The Mathematics of Love: Patterns, Proofs, and the Search for the Ultimate Equation
** by
Hannah Fry

Brownian motion, John Nash: game theory, linear programming, Nash equilibrium, Pareto efficiency, recommendation engine, Skype, statistical model

Statistical Science, 1989. Todd, Peter M. ‘Searching for the Next Best Mate.’ Simulating Social Phenomena, edited by Rosaria Conte, Rainer Hegselmann, Pietro Terna, 419–36. Berlin: Springer Berlin Heidelberg, 1997. CHAPTER 8: HOW TO OPTIMIZE YOUR WEDDING Bellows, Meghan L. and J. D. Luc Peterson. ‘Finding an Optimal Seating Chart.’ Annals of Improbable Research, 2012. Alexander, R. ‘A Statistically Modelled Wedding.’ 2014. http://www.bbc.co.uk/news/magazine-25980076. CHAPTER 9: HOW TO LIVE HAPPILY EVER AFTER Gottman, John M., James D. Murray, Catherine C. Swanson, Rebecca Tyson and Kristin R. Swanson. The Mathematics of Marriage: Dynamic Nonlinear Models. Cambridge, MA: Basic Books, 2005. AUTHOR THANKS This book isn’t exactly War and Peace, but it has still required help and support from a number of wonderful people.

pages: 398 words: 86,855

**
Bad Data Handbook
** by
Q. Ethan McCallum

Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, Chuck Templeton: OpenTable:, cloud computing, cognitive dissonance, combinatorial explosion, commoditize, conceptual framework, database schema, DevOps, en.wikipedia.org, Firefox, Flash crash, Gini coefficient, illegal immigration, iterative process, labor-force participation, loose coupling, natural language processing, Netflix Prize, quantitative trading / quantitative finance, recommendation engine, selection bias, sentiment analysis, statistical model, supply-chain management, survivorship bias, text mining, too big to fail, web application

In a previous life, he invented the refrigerator. Spencer Burns is a data scientist/engineer living in San Francisco. He has spent the past 15 years extracting information from messy data in fields ranging from intelligence to quantitative finance to social media. Richard Cotton is a data scientist with a background in chemical health and safety, and has worked extensively on tools to give non-technical users access to statistical models. He is the author of the R packages “assertive” for checking the state of your variables and “sig” to make sure your functions have a sensible API. He runs The Damned Liars statistics consultancy. Philipp K. Janert was born and raised in Germany. He obtained a Ph.D. in Theoretical Physics from the University of Washington in 1997 and has been working in the tech industry since, including four years at Amazon.com, where he initiated and led several projects to improve Amazon’s order fulfillment process.

…

As the first and second examples show, a scientist can spot faulty experimental setups because of his or her ability to test the data for internal consistency and for agreement with known theories, and can thereby prevent wrong conclusions and faulty analyses. What could possibly be more important to a scientist? And if that means taking a trip to the factory, I’ll be glad to go. Chapter 8. Blood, Sweat, and Urine. Richard Cotton. A Very Nerdy Body Swap Comedy. I spent six years working in the statistical modeling team at the UK’s Health and Safety Laboratory.[23] A large part of my job was working with the laboratory’s chemists, looking at occupational exposure to various nasty substances to see if an industry was adhering to safe limits. The laboratory gets sent tens of thousands of blood and urine samples each year (and sometimes more exotic fluids like sweat or saliva), and has its own team of occupational hygienists who visit companies and collect yet more samples.

**
Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage
** by
Zdravko Markov,
Daniel T. Larose

Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, speech recognition, statistical model, William of Occam

WHY THE BOOK IS NEEDED. The book provides the reader with: the models and techniques to uncover hidden nuggets of information in Web-based data; insight into how web mining algorithms really work; and the experience of actually performing web mining on real-world data sets. “WHITE-BOX” APPROACH: UNDERSTANDING THE UNDERLYING ALGORITHMIC AND MODEL STRUCTURES. The best way to avoid costly errors stemming from a blind black-box approach to data mining is to apply, instead, a white-box methodology, which emphasizes an understanding of the algorithmic and statistical model structures underlying the software. The book applies this white-box approach by: walking the reader through various algorithms; providing examples of the operation of web mining algorithms on actual large data sets; testing the reader’s level of understanding of the concepts and algorithms; and providing an opportunity for the reader to do some real web mining on large Web-based data sets. Algorithm Walk-Throughs. The book walks the reader through the operations and nuances of various algorithms, using small sample data sets, so that the reader gets a true appreciation of what is really going on inside an algorithm.

…

By inspecting the normal density curves, determine which attribute is more relevant for the classification task. CHAPTER 4: EVALUATING CLUSTERING. Approaches to Evaluating Clustering; Similarity-Based Criterion Functions; Probabilistic Criterion Functions; MDL-Based Model and Feature Evaluation; Classes-to-Clusters Evaluation; Precision, Recall, and F-Measure; Entropy. APPROACHES TO EVALUATING CLUSTERING. Clustering algorithms group documents by similarity or create statistical models based solely on the document representation, which in turn reflects document content. Then the criterion functions evaluate these models objectively (i.e., using only the document content). In contrast, when we label documents by topic we use additional knowledge, which is generally not explicitly available in document content and representation. Labeled documents are used primarily in supervised learning (classification) to create a mapping between the document representation and the external notion (concept, category, class) provided by the teacher through labeling.
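When topic labels are available, a clustering can be scored against them. Entropy is one of the approaches named in the chapter outline. The sketch below uses one common formulation, assumed here rather than taken from the book: the size-weighted average, across clusters, of the entropy (in bits) of the class labels inside each cluster. The function name and the toy labels are hypothetical.

```python
import math
from collections import Counter

def cluster_entropy(clusters, labels):
    """Weighted average entropy (in bits) of class labels within each cluster.

    clusters: cluster id per document; labels: true class per document.
    0.0 means every cluster is pure; larger values mean more mixing.
    """
    if len(clusters) != len(labels) or not labels:
        raise ValueError("need one cluster id and one label per document")
    n = len(labels)
    members = {}
    for cluster_id, label in zip(clusters, labels):
        members.setdefault(cluster_id, []).append(label)
    total = 0.0
    for cluster_labels in members.values():
        size = len(cluster_labels)
        counts = Counter(cluster_labels)
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total

print(cluster_entropy([0, 0, 1, 1], ["sports", "sports", "music", "music"]))  # pure clusters
print(cluster_entropy([0, 0, 0, 0], ["sports", "sports", "music", "music"]))  # one mixed cluster
```

Lower is better. A pure clustering scores 0, and a single cluster holding two equally frequent classes scores 1 bit.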

pages: 339 words: 94,769

**
Possible Minds: Twenty-Five Ways of Looking at AI
** by
John Brockman

AI winter, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, artificial general intelligence, Asilomar, autonomous vehicles, basic income, Benoit Mandelbrot, Bill Joy: nanobots, Buckminster Fuller, cellular automata, Claude Shannon: information theory, Daniel Kahneman / Amos Tversky, Danny Hillis, David Graeber, easy for humans, difficult for computers, Elon Musk, Eratosthenes, Ernest Rutherford, finite state, friendly AI, future of work, Geoffrey West, Santa Fe Institute, gig economy, income inequality, industrial robot, information retrieval, invention of writing, James Watt: steam engine, Johannes Kepler, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, Kickstarter, Laplace demon, Loebner Prize, market fundamentalism, Marshall McLuhan, Menlo Park, Norbert Wiener, optical character recognition, pattern recognition, personalized medicine, Picturephone, profit maximization, profit motive, RAND corporation, random walk, Ray Kurzweil, Richard Feynman, Rodney Brooks, self-driving car, sexual politics, Silicon Valley, Skype, social graph, speech recognition, statistical model, Stephen Hawking, Steven Pinker, Stewart Brand, strong AI, superintelligent machines, supervolcano, technological singularity, technoutopianism, telemarketer, telerobotics, the scientific method, theory of mind, Turing machine, Turing test, universal basic income, Upton Sinclair, Von Neumann architecture, Whole Earth Catalog, Y2K, zero-sum game

., 49, 240–53 AI safety concerns, 242–43 background and overview of work of, 240–41 conventional computers versus bio-electronic hybrids, 246–48 equal rights, 248–49 ethical rules for intelligent machines, 243–44 free will of machines, and rights, 250–51 genetic red lines, 251–52 human manipulation of humans, 244–46, 252 humans versus nonhumans and hybrids, treatment of, 249–53 non-Homo intelligences, fair and safe treatment of, 247–48 rights for nonhumans and hybrids, 249–53 science versus religion, 243–44 self-consciousness of machines, and rights, 250–51 technical barriers/red lines, malleability of, 244–46 transhumans, rights of, 252–53 clinical (subjective) method of prediction, 233, 234–35 Colloquy of Mobiles (Pask), 259 Colossus: The Forbin Project (film), 242 competence of superintelligent AGI, 85 computational theory of mind, 102–3, 129–33, 222 computer learning systems Bayesian models, 226–28 cooperative inverse-reinforcement learning (CIRL), 30–31 deep learning (See deep learning) human learning, similarities to, 11 reality blueprint, need for, 16–17 statistical, model-blind mode of current, 16–17, 19 supervised learning, 148 unsupervised learning, 225 Computer Power and Human Reason (Weizenbaum), 48–49, 248 computer virus, 61 “Computing Machinery and Intelligence” (Turing), 43 conflicts among hybrid superintelligences, 174–75 controllable-agent designs, 31–32 control systems beyond human control (control problem) AI designed as tool and not as conscious agent, 46–48, 51–53 arguments against AI risk (See risk posed by AI, arguments against) Ashby’s Law and, 39, 179, 180 cognitive element in, xx–xxi Dyson on, 38–39, 40 Macy conferences, xx–xxi purpose imbued in machines and, 23–25 Ramakrishnan on, 183–86 risk of superhuman intelligence, arguments against, 25–29 Russell on templates for provably beneficial AI, 29–32 Tallinn on, 93–94 Wiener’s warning about, xviii–xix, xxvi, 4–5, 11–12, 22–23, 35, 93, 104, 172 Conway, John Horton, 263 cooperative 
inverse-reinforcement learning (CIRL), 30–31 coordination problem, 137, 138–41 corporate/AI scenario, in relation of machine superintelligences to hybrid superintelligences, 176 corporate superintelligences, 172–74 credit-assignment function, 196–200 AI and, 196–97 humans, applied to, 197–200 Crick, Francis, 58, 66 culture in evolution, selecting for, 198–99 curiosity, and AI risk denial, 96 Cybernetic Idea, xv cybernetics, xv–xxi, 3–7, 102–4, 153–54, 178–80, 194–95, 209–10, 256–57 “Cybernetic Sculpture” exhibition (Tsai), 258, 260–61 “Cybernetic Serendipity” exhibition (Reichardt), 258–59 Cybernetics (Wiener), xvi, xvii, 3, 5, 7, 56 “Cyborg Manifesto, A” (Haraway), 261 data gathering and exploitation, computation platforms used for, 61–63 Dawkins, Richard, 243 Declaration of Helsinki, 252 declarative design, 166–67 Deep Blue, 8, 184 Deep Dream, 211 deep learning, 184–85 bottom-up, 224–26 Pearl on lack of transparency in, and limitations of, 15–19 reinforcement learning, 128, 184–85, 225–26 unsupervised learning, 225 visualization programs, 211–13 Wiener’s foreshadowing of, 9 Deep-Mind, 184–85, 224, 225, 262–63 Deleuze, Gilles, 256 Dennett, Daniel C., xxv, 41–53, 120, 191 AI as “helpless by themselves,” 46–48 AI as tool, not colleagues, 46–48, 51–53 background and overview of work of, 41–42 dependence on new tools and loss of ability to thrive without them, 44–46 gap between today’s AI and public’s imagination of AI, 49 humanoid embellishment of AI, 49–50 intelligent tools versus artificial conscious agents, need for, 51–52 operators of AI systems, responsibilities of, 50–51 on Turing Test, 46–47 on Weizenbaum, 48–50 on Wiener, 43–45 Descartes, René, 191, 223 Desk Set (film), 270 Deutsch, David, 113–24 on AGI risks, 121–22 background and overview of work of, 113–14 creating AGIs, 122–24 developing AI with goals under unknown constraints, 119–21 innovation in prehistoric humans, lack of, 116–19 knowledge imitation of ancestral humans, understanding inherent in, 
115–16 reward/punishment of AI, 120–21 Differential Analyzer, 163, 179–80 digital fabrication, 167–69 digital signal encoding, 180 dimensionality, 165–66 distributed Thompson sampling, 198 DNA molecule, 58 “Dollie Clone Series” (Hershman Leeson), 261, 262 Doubt and Certainty in Science (Young), xviii Dragan, Anca, 134–42 adding people to AI problem definition, 137–38 background and overview of work of, 134–35 coordination problem, 137, 138–41 mathematical definition of AI, 136 value-alignment problem, 137–38, 141–42 The Dreams of Reason: The Computer and the Rise of the Science of Complexity (Pagels), xxiii Drexler, Eric, 98 Dyson, Freeman, xxv, xxvi Dyson, George, xviii–xix, 33–40 analog and digital computation, distinguished, 35–37 background and overview of work of, 33–34 control, emergence of, 38–39 electronics, fundamental transitions in, 35 hybrid analog/digital systems, 37–38 on three laws of AI, 39–40 “Economic Possibilities for Our Grandchildren” (Keynes), 187 “Einstein, Gertrude Stein, Wittgenstein and Frankenstein” (Brockman), xxii emergence, 68–69 Emissaries trilogy (Cheng), 216–17 Empty Space, The (Brook), 213 environmental risk, AI risk as, 97–98 Eratosthenes, 19 Evans, Richard, 217 Ex Machina (film), 242 expert systems, 271 extreme wealth, 202–3 fabrication, 167–69 factor analysis, 225 Feigenbaum, Edward, xxiv Feynman, Richard, xxi–xxii Fifth Generation, xxiii–xxiv The Fifth Generation: Artificial Intelligence and Japan’s Computer Challenge to the World (Feigenbaum and McCorduck), xxiv Fodor, Jerry, 102 Ford Foundation, 202 Foresight and Understanding (Toulmin), 18–19 free will of machines, and rights, 250–51 Frege, Gottlob, 275–76 Galison, Peter, 231–39 background and overview of work of, 231–32 clinical versus objective method of prediction, 233–35 scientific objectivity, 235–39 Gates, Bill, 202 generative adversarial networks, 226 generative design, 166–67 Gershenfeld, Neil, 160–69 background and overview of work of, 160–61 boom-bust cycles in 
evolution of AI, 162–63 declarative design, 166–67 digital fabrication, 167–69 dimensionality problem, overcoming, 165–66 exponentially increasing amounts of data, processing of, 164–65 knowledge in AI systems, 164 scaling, and development of AI, 163–66 Ghahramani, Zoubin, 190 Gibson, William, 253 Go, 10, 150, 184–85 goal alignment.

…

., 222, 225 Sleepwalkers, The (Koestler), 153 Sloan Foundation, 202 social sampling, 198–99 software failure to advance in conjunction with increased processing power, 10 lack of standards of correctness and failure in engineering of, 60–61 Solomon, Arthur K., xvi–xvii “Some Moral and Technical Consequences of Automation” (Wiener), 23 Stapledon, Olaf, 75 state/AI scenario, in relation of machine superintelligences to hybrid superintelligences, 175–76 statistical, model-blind mode of learning, 16–17, 19 Steveni, Barbara, 218 Stewart, Potter, 247 Steyerl, Hito on AI visualization programs, 211–12 on artificial stupidity, 210–11 subjective method of prediction, 233, 234–35 subjugation fear in AI scenarios, 108–10 Superintelligence: Paths, Dangers, Strategies (Bostrom), 27 supervised learning, 148 surveillance state dystopias, 105–7 switch-it-off argument against AI risk, 25 Szilard, Leo, 26, 83 Tallinn, Jaan, 88–99 AI-risk message, 92–93 background and overview of work of, 88–89 calibrating AI-risk message, 96–98 deniers of AI-risk, motives of, 95–96 environmental risk, AI risk as, 97–98 Estonian dissidents, messages of, 91–92 evolution’s creation of planner and optimizer greater than itself, 93–94 growing awareness of AI risk, 98–99 technological singularity.

pages: 340 words: 94,464

**
Randomistas: How Radical Researchers Changed Our World
** by
Andrew Leigh

Albert Einstein, Amazon Mechanical Turk, Anton Chekhov, Atul Gawande, basic income, Black Swan, correlation does not imply causation, crowdsourcing, David Brooks, Donald Trump, ending welfare as we know it, Estimating the Reproducibility of Psychological Science, experimental economics, Flynn Effect, germ theory of disease, Ignaz Semmelweis: hand washing, Indoor air pollution, Isaac Newton, Kickstarter, longitudinal study, loss aversion, Lyft, Marshall McLuhan, meta analysis, meta-analysis, microcredit, Netflix Prize, nudge unit, offshore financial centre, p-value, placebo effect, price mechanism, publication bias, RAND corporation, randomized controlled trial, recommendation engine, Richard Feynman, ride hailing / ride sharing, Robert Metcalfe, Ronald Reagan, statistical model, Steven Pinker, uber lyft, universal basic income, War on Poverty

Critics mocked his ‘pocket handkerchief wheat plots’.4 But after trying hundreds of different breeding combinations, Farrer created a new ‘Federation Wheat’ based not on reputation or appearance, but on pure performance. Agricultural trials of this kind are often called ‘field experiments’, a term which some people also use to describe randomised trials in social science. Modern agricultural field experiments use spatial statistical models to divide up the plots.5 As in medicine and aid, the most significant agricultural randomised trials are now conducted across multiple countries. They are at the heart of much of our understanding of genetically modified crops, the impact of climate change on agriculture, and drought resistance. * Gary Loveman was in his late thirties when he decided to make the switch from Harvard to Las Vegas.

…

In each case, my co-authors and I did our best to find a credible counterfactual. But all of these studies are limited by the assumptions that the methods required us to make. New developments in non-randomised econometrics – such as machine learning – are generally even more complicated than the older approaches.34 As economist Orley Ashenfelter notes, if an evaluator is predisposed to give a program the thumbs-up, statistical modelling ‘leaves too many ways for the researcher to fake it’.35 That’s why one leading econometrics text teaches non-random approaches by comparing each to the ‘experimental ideal’.36 Students are encouraged to ask the question: ‘If we could run a randomised experiment here, what would it look like?’ Another novel approach is to take data from a properly conducted randomised trial and pretend that we wanted to run a non-randomised evaluation.
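Comparing methods against the "experimental ideal" works because a randomised design recovers a known effect, while a naive comparison can be thrown off by who chooses to take part. The toy simulation below illustrates that logic under stated assumptions. The true program effect is fixed at 2, outcomes are normally distributed, and in the non-randomised arm better-off people opt in. All numbers and names are hypothetical, and this is not the benchmarking method of any specific study.

```python
import random
import statistics

TRUE_EFFECT = 2.0  # hypothetical true impact of the program

def estimate_effect(n, randomised, seed=0):
    """Difference in mean outcomes between participants and non-participants."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(10, 3)  # outcome this person would have without the program
        if randomised:
            enrols = rng.random() < 0.5  # a coin flip decides who gets the program
        else:
            enrols = baseline > 10       # self-selection: better-off people opt in
        outcome = baseline + (TRUE_EFFECT if enrols else 0.0) + rng.gauss(0, 1)
        (treated if enrols else control).append(outcome)
    return statistics.mean(treated) - statistics.mean(control)

print("randomised:", round(estimate_effect(20000, randomised=True), 2))
print("self-selected:", round(estimate_effect(20000, randomised=False), 2))
```

The randomised estimate lands close to 2. The self-selected comparison mixes the program's effect with pre-existing differences between the groups and overstates it several times over.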

pages: 346 words: 92,984

**
The Lucky Years: How to Thrive in the Brave New World of Health
** by
David B. Agus

active transport: walking or cycling, Affordable Care Act / Obamacare, Albert Einstein, butterfly effect, clean water, cognitive dissonance, crowdsourcing, Danny Hillis, Drosophila, Edward Lorenz: Chaos theory, en.wikipedia.org, epigenetics, Kickstarter, longitudinal study, medical residency, meta analysis, meta-analysis, microbiome, microcredit, mouse model, Murray Gell-Mann, New Journalism, pattern recognition, personalized medicine, phenotype, placebo effect, publish or perish, randomized controlled trial, risk tolerance, statistical model, stem cell, Steve Jobs, Thomas Malthus, wikimedia commons

It didn’t take long for there to be a backlash against the implied message. Tomasetti and Vogelstein were accused of focusing on rare cancers while leaving out several common cancers that indeed are largely preventable. The International Agency for Research on Cancer, the cancer arm of the World Health Organization, published a press release stating it “strongly disagrees” with the report. To arrive at their conclusion, Tomasetti and Vogelstein used a statistical model they developed based on known rates of cell division in thirty-one types of tissue. Stem cells were their main focal point. As a reminder, these are the small, specialized “mothership” cells in each organ or tissue that divide to replace cells that die or wear out. Only in recent years have researchers been able to conduct these kinds of studies due to advances in the understanding of stem-cell biology.

…

., “Intensive Lifestyle Changes May Affect the Progression of Prostate Cancer,” Journal of Urology, 174, no. 3 (September 2005): 1065–69; discussion 1069–70. 11. A. R. Kristal et al., “Baseline Selenium Status and Effects of Selenium and Vitamin E Supplementation on Prostate Cancer Risk,” Journal of the National Cancer Institute 106, no. 3 (March 2014): djt456, doi:10.1093/jnci/djt456, Epub February 22, 2014. 12. Johns Hopkins Medicine, “Bad Luck of Random Mutations Plays Predominant Role in Cancer, Study Shows—Statistical Modeling Links Cancer Risk with Number of Stem Cell Divisions,” news release, January 1, 2015, www.hopkinsmedicine.org/news/media/releases/bad_luck_of_random_mutations_plays_predominant_role_in_cancer_study_shows. 13. C. Tomasetti and B. Vogelstein, “Cancer Etiology. Variation in Cancer Risk Among Tissues Can Be Explained by the Number of Stem Cell Divisions,” Science 347, no. 6217 (January 2, 2015): 78–81, doi:10.1126/science.1260825. 14.

pages: 125 words: 27,675

**
Applied Text Analysis With Python: Enabling Language-Aware Data Products With Machine Learning
** by
Benjamin Bengfort,
Rebecca Bilbro,
Tony Ojeda

full text search, natural language processing, quantitative easing, sentiment analysis, statistical model

In this chapter we used preprocessed documents to perform feature extraction, processes which are destructive since they remove information. In [Link to Come] we will explore classification models and applications, then in [Link to Come] we will take a look at clustering models, often called topic modeling in text analysis. 1 Kumar, A., McCann, R., Naughton, J., Patel, J. (2015) Model Selection Management Systems: The Next Frontier of Advanced Analytics 2 Wickham, H., Cooke, D., Hofmann, H. (2015) Visualizing statistical models: Removing the blindfold 3 https://arxiv.org/abs/1405.4053
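The claim that feature extraction is destructive can be seen with the simplest representation, a bag of words. The minimal sketch below assumes documents have already been preprocessed into lowercase tokens. The helper names and example sentences are hypothetical and are not the book's pipeline.

```python
from collections import Counter

def bag_of_words(tokens):
    """Count how often each token occurs; the order of tokens is discarded."""
    return Counter(tokens)

def vectorize(tokens, vocabulary):
    """Map a document onto a fixed vocabulary as a list of counts."""
    counts = bag_of_words(tokens)
    return [counts[word] for word in vocabulary]

doc_a = "the dog bit the man".split()
doc_b = "the man bit the dog".split()
vocabulary = sorted(set(doc_a) | set(doc_b))  # ['bit', 'dog', 'man', 'the']

print(vectorize(doc_a, vocabulary))  # → [1, 1, 1, 2]
print(vectorize(doc_b, vocabulary))  # → [1, 1, 1, 2]
```

Two sentences with opposite meanings produce identical feature vectors. Word order is exactly the kind of information the extraction step throws away.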

pages: 317 words: 106,130

**
The New Science of Asset Allocation: Risk Management in a Multi-Asset World
** by
Thomas Schneeweis,
Garry B. Crowder,
Hossein Kazemi

asset allocation, backtesting, Bernie Madoff, Black Swan, business cycle, buy and hold, capital asset pricing model, collateralized debt obligation, commodity trading advisor, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, diversified portfolio, fixed income, high net worth, implied volatility, index fund, interest rate swap, invisible hand, market microstructure, merger arbitrage, moral hazard, Myron Scholes, passive investing, Richard Feynman, Richard Feynman: Challenger O-ring, risk tolerance, risk-adjusted returns, risk/return, selection bias, Sharpe ratio, short selling, statistical model, stocks for the long run, survivorship bias, systematic trading, technology bubble, the market place, Thomas Kuhn: the structure of scientific revolutions, transaction costs, value at risk, yield curve, zero-sum game

In practice, we must come up with estimates of the expected returns, standard deviations, and correlations. There are libraries of statistics books dedicated to the simple task of coming up with estimates of the parameters used in MPT. Here is the point: It is not simple. For example, (1) for what period is one estimating the parameters (week, month, year)? and (2) how constant are the estimates (e.g., do they change, and if they do, do we have statistical models that permit us to systematically reflect those changes)? There are many more issues in parameter estimation, but probably the biggest is this: when two assets have the same true expected return, standard deviation, and correlation, but the risk parameter is estimated with error (e.g., the estimated standard deviation is larger or smaller than the true standard deviation), the procedure for determining the efficient frontier always picks the asset with the downward-biased risk estimate (e.g., the lower estimated standard deviation) and the upward-biased return estimate.
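The selection effect described above can be reproduced with a short simulation. Two assets have identical true parameters, each standard deviation is estimated from a finite sample, and a risk-minimising rule picks the asset with the lower estimate. The sketch covers only the risk side. The true monthly standard deviation of 0.05, the 36-month window, and the function name are hypothetical choices.

```python
import random
import statistics

def estimated_risk(true_sd=0.05, months=36, trials=2000, seed=7):
    """Compare the risk estimate of the picked asset with the average estimate.

    Each trial draws `months` returns for two assets with identical true mean
    and standard deviation, estimates each asset's standard deviation, and
    picks the asset with the lower estimate, as a risk-minimising rule would.
    """
    rng = random.Random(seed)
    picked, every = [], []
    for _ in range(trials):
        estimates = [
            statistics.stdev([rng.gauss(0.01, true_sd) for _ in range(months)])
            for _ in range(2)
        ]
        every.extend(estimates)
        picked.append(min(estimates))
    return statistics.mean(picked), statistics.mean(every)

picked_mean, overall_mean = estimated_risk()
print(f"true sd 0.05, average estimate {overall_mean:.4f}, picked asset's estimate {picked_mean:.4f}")
```

Neither asset is truly less risky. Even so, the one the rule selects is, on average, the one whose risk was underestimated.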

…

The expected return on a comparably risky non-actively managed investment strategy is often either derived from academic theory or statistically derived from historical pricing relationships. The primary issue, of course, remains how to create a comparably risky investable non-actively managed asset. Even when one believes in the use of ex ante equilibrium (e.g., CAPM) or arbitrage (e.g., APT) models of expected return, problems in empirically estimating the required parameters usually result in alpha being determined using statistical models based on the underlying theoretical model. As generally measured in a statistical sense, alpha is often derived from a linear regression in which the equation that relates an observed variable y (asset return) to some other factor x (market index) is written as: y = α + βx + ε. The first term, α (alpha), represents the intercept; β (beta) represents the slope; and ε (epsilon) represents a random error term.
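The regression y = α + βx + ε is usually estimated by ordinary least squares, which has a closed form. Beta is the covariance of asset and market returns divided by the variance of market returns, and alpha is the mean asset return minus beta times the mean market return. The minimal pure-Python sketch below uses a hypothetical function name and made-up return series. A statistics library would normally also report standard errors, which are omitted here.

```python
def alpha_beta(asset_returns, market_returns):
    """Ordinary least squares fit of y = alpha + beta * x + epsilon."""
    n = len(asset_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(market_returns) / n
    mean_y = sum(asset_returns) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(market_returns, asset_returns))
    var_x = sum((x - mean_x) ** 2 for x in market_returns)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Hypothetical monthly returns constructed with alpha = 0.002 and beta = 1.2, no noise.
market = [0.01, -0.02, 0.03, 0.00, 0.015]
asset = [0.002 + 1.2 * m for m in market]
print(alpha_beta(asset, market))
```

With noise-free data the fit recovers the alpha and beta used to build the series. With real returns, the ε term absorbs whatever the market factor does not explain.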

pages: 347 words: 97,721

**
Only Humans Need Apply: Winners and Losers in the Age of Smart Machines
** by
Thomas H. Davenport,
Julia Kirby

AI winter, Andy Kessler, artificial general intelligence, asset allocation, Automated Insights, autonomous vehicles, basic income, Baxter: Rethink Robotics, business intelligence, business process, call centre, carbon-based life, Clayton Christensen, clockwork universe, commoditize, conceptual framework, dark matter, David Brooks, deliberate practice, deskilling, digital map, disruptive innovation, Douglas Engelbart, Edward Lloyd's coffeehouse, Elon Musk, Erik Brynjolfsson, estate planning, fixed income, follow your passion, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, game design, general-purpose programming language, global pandemic, Google Glasses, Hans Lippershey, haute cuisine, income inequality, index fund, industrial robot, information retrieval, intermodal, Internet of things, inventory management, Isaac Newton, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joi Ito, Khan Academy, knowledge worker, labor-force participation, lifelogging, longitudinal study, loss aversion, Mark Zuckerberg, Narrative Science, natural language processing, Norbert Wiener, nuclear winter, pattern recognition, performance metric, Peter Thiel, precariat, quantitative trading / quantitative finance, Ray Kurzweil, Richard Feynman, risk tolerance, Robert Shiller, Rodney Brooks, Second Machine Age, self-driving car, Silicon Valley, six sigma, Skype, social intelligence, speech recognition, spinning jenny, statistical model, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, superintelligent machines, supply-chain management, transaction costs, Tyler Cowen: Great Stagnation, Watson beat the top human players on Jeopardy!, Works Progress Administration, Zipcar

Where It All Began Today, someone using the term “smart machine” could be talking about any number of technologies. The term “artificial intelligence” alone, for example, has been used to describe such technologies as expert systems (collections of rules facilitating decisions in a specified domain, such as financial planning or knowing when a batch of soup is cooked), neural networks (a more mathematical approach to creating a model that fits a data set), machine learning (semiautomated statistical modeling to find the best-fitting model for the data), natural language processing or NLP (in which computers make sense of human language in textual form), and so forth. Wikipedia lists at least ten branches of AI, and we have seen other sources that mention many more. To make sense of this army of machines and the direction in which it is marching, it helps to remember where it all started: with numerical analytics supporting and supported by human decision-makers.

…

He hired additional credit risk modelers and encouraged them to build a variety of quantitative models to identify any problems with the bank’s loan portfolios and credit processes. This work required a broad range of sophisticated models, including “neural network” models, some vendor-supplied and some custom-built. Cathcart, who was an English major at Dartmouth College but also learned the BASIC computer language there from one of its creators, John Kemeny, knew his way around computer systems and statistical models. Most important, he knew when to trust them and when not to. The models and analyses began to exhibit significant problems. No matter how automated and sophisticated the models were, Cathcart realized that they were becoming less valid over time with changes in the economy and banking climate. Many of the mortgage models, for example, were based on five years of historical data. But as the economy became worse by the day in 2007, those five-year models became dramatically overoptimistic.

pages: 311 words: 99,699

**
Fool's Gold: How the Bold Dream of a Small Tribe at J.P. Morgan Was Corrupted by Wall Street Greed and Unleashed a Catastrophe
** by
Gillian Tett

accounting loophole / creative accounting, asset-backed security, bank run, banking crisis, Black-Scholes formula, Blythe Masters, break the buck, Bretton Woods, business climate, business cycle, buy and hold, collateralized debt obligation, commoditize, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, easy for humans, difficult for computers, financial innovation, fixed income, housing crisis, interest rate derivative, interest rate swap, Kickstarter, locking in a profit, Long Term Capital Management, McMansion, money market fund, mortgage debt, North Sea oil, Northern Rock, Renaissance Technologies, risk tolerance, Robert Shiller, Satyajit Das, short selling, sovereign wealth fund, statistical model, The Great Moderation, too big to fail, value at risk, yield curve

That triggered panic among some investors, and many rushed to sell CDSs and CDOs, causing their prices to drop, an eventuality not predicted by the models. JPMorgan Chase, Deutsche Bank, and many other banks and funds suffered substantial losses. For a few weeks after the turmoil, the banking community engaged in soul-searching. At J.P. Morgan the traders stuck bananas on their desks as a jibe at the so-called F9 model monkeys, the mathematical wizards who had created such havoc. (The “monkeys” who wrote the statistical models tended to use the “F9” key on the computer when they performed their calculations, giving rise to the tag.) J.P. Morgan, Deutsche, and others conducted internal reviews that led them to introduce slight changes in their statistical systems. GLG Ltd., one large hedge fund, told its investors that it would use a wider set of data to analyze CDOs in the future. Within a couple of months, though, the markets rebounded, and the furor died down.

…

Compared to Greenspan, Geithner was not just younger, but he also commanded far less clout and respect. As the decade wore on, though, he became privately uneasy about some of the trends in the credit world. From 2005 onwards, he started to call on bankers to prepare for so-called “fat tails,” a statistical term for extremely negative events that occur more often than implied by the normal bell-curve statistical models on which the banks’ risk assessment relied so heavily. He commented in the spring of 2006: “A number of fundamental changes in the US financial system over the past twenty-five years appear to have rendered it able to withstand the stress of a broader array of shocks than was the case in the past. [But] confidence in the overall resilience of the financial system needs to be tempered by the realization that there is much we still do not know about the likely sources and consequences of future stress to the system…[and]…The proliferation of new forms of derivatives and structured financial products has changed the nature of leverage in the financial system.

pages: 317 words: 100,414

**
Superforecasting: The Art and Science of Prediction
** by
Philip Tetlock,
Dan Gardner

Affordable Care Act / Obamacare, Any sufficiently advanced technology is indistinguishable from magic, availability heuristic, Black Swan, butterfly effect, buy and hold, cloud computing, cuban missile crisis, Daniel Kahneman / Amos Tversky, desegregation, drone strike, Edward Lorenz: Chaos theory, forward guidance, Freestyle chess, fundamental attribution error, germ theory of disease, hindsight bias, index fund, Jane Jacobs, Jeff Bezos, Kenneth Arrow, Laplace demon, longitudinal study, Mikhail Gorbachev, Mohammed Bouazizi, Nash equilibrium, Nate Silver, Nelson Mandela, obamacare, pattern recognition, performance metric, Pierre-Simon Laplace, place-making, placebo effect, prediction markets, quantitative easing, random walk, randomized controlled trial, Richard Feynman, Richard Thaler, Robert Shiller, Ronald Reagan, Saturday Night Live, scientific worldview, Silicon Valley, Skype, statistical model, stem cell, Steve Ballmer, Steve Jobs, Steven Pinker, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Watson beat the top human players on Jeopardy!

Amos had an impish sense of humor. He also appreciated the absurdity of an academic committee on a mission to save the world. So I am 98% sure he was joking. And 99% sure his joke captures a basic truth about human judgment. Probability for the Stone Age Human beings have coped with uncertainty for as long as we have been recognizably human. And for almost all that time we didn’t have access to statistical models of uncertainty because they didn’t exist. It was remarkably late in history—arguably as late as the 1713 publication of Jakob Bernoulli’s Ars Conjectandi—before the best minds started to think seriously about probability. Before that, people had no choice but to rely on the tip-of-your-nose perspective. You see a shadow moving in the long grass. Should you worry about lions? You try to think of an example of a lion attacking from the long grass.

…

Appendix Ten Commandments for Aspiring Superforecasters The guidelines sketched here distill key themes in this book and in training systems that have been experimentally demonstrated to boost accuracy in real-world forecasting contests. For more details, visit www.goodjudgment.com. (1) Triage. Focus on questions where your hard work is likely to pay off. Don’t waste time either on easy “clocklike” questions (where simple rules of thumb can get you close to the right answer) or on impenetrable “cloud-like” questions (where even fancy statistical models can’t beat the dart-throwing chimp). Concentrate on questions in the Goldilocks zone of difficulty, where effort pays off the most. For instance, “Who will win the presidential election, twelve years out, in 2028?” is impossible to forecast now. Don’t even try. Could you have predicted in 1940 the winner of the election, twelve years out, in 1952? If you think you could have known it would be a then-unknown colonel in the United States Army, Dwight Eisenhower, you may be afflicted by one of the worst cases of hindsight bias ever documented by psychologists.

pages: 103 words: 32,131

**
Program Or Be Programmed: Ten Commands for a Digital Age
** by
Douglas Rushkoff

banking crisis, big-box store, citizen journalism, cloud computing, digital map, East Village, financial innovation, Firefox, hive mind, Howard Rheingold, invention of the printing press, Kevin Kelly, Marshall McLuhan, peer-to-peer, Silicon Valley, statistical model, Stewart Brand, Ted Nelson, WikiLeaks

In fact, the game only became a mass phenomenon as free agency and Major League players’ strikes soured fans on the sport. As baseball became a business, the fans took back baseball as a game—even if it had to happen on their computers. The effects didn’t stay in the computer. Leveraging the tremendous power of digital abstraction back to the real world, Billy Beane, general manager of the Oakland Athletics, applied the same sort of statistical modeling to players for another purpose: to assemble a roster for his own Major League team. Beane didn’t have the same salary budget as his counterparts in New York or Los Angeles, and he needed to find another way to assemble a winning combination. So he abstracted and modeled available players in order to build a better team that went from the bottom to the top of its division, and undermined the way that money had come to control the game.

pages: 123 words: 32,382

**
Grouped: How Small Groups of Friends Are the Key to Influence on the Social Web
** by
Paul Adams

Airbnb, Cass Sunstein, cognitive dissonance, David Brooks, information retrieval, invention of the telegraph, planetary scale, race to the bottom, Richard Thaler, sentiment analysis, social web, statistical model, The Wisdom of Crowds, web application, white flight

Research by Forrester found that cancer patients trust their local care physician more than world-renowned cancer treatment centers, and in most cases, the patient had known their local care physician for years.16 We overrate the advice of experts Psychologist Philip Tetlock conducted numerous studies to test the accuracy of advice from experts in the fields of journalism and politics. He quantified over 82,000 predictions and found that the journalism experts tended to perform slightly worse than picking answers at random. Political experts didn’t fare much better. They slightly outperformed random chance, but did not perform as well as a basic statistical model. In fact, they performed slightly better at predicting things outside their area of expertise, and 80 percent of their predictions were wrong. Studies in finance also show that only 20 percent of investment bankers outperform the stock market.17 We overestimate what we know Sometimes we consider ourselves experts, even though we don’t know as much as we think we know. Research by Russo and Schoemaker asked managers in the advertising industry questions about their domain.

pages: 719 words: 104,316

**
R Cookbook
** by
Paul Teetor

Debian, en.wikipedia.org, p-value, quantitative trading / quantitative finance, statistical model

Solution

The factor function encodes your vector of discrete values into a factor:

    > f <- factor(v)    # v is a vector of strings or integers

If your vector contains only a subset of possible values and not the entire universe, then include a second argument that gives the possible levels of the factor:

    > f <- factor(v, levels)

Discussion

In R, each possible value of a categorical variable is called a level. A vector of levels is called a factor. Factors fit very cleanly into the vector orientation of R, and they are used in powerful ways for processing data and building statistical models. Most of the time, converting your categorical data into a factor is a simple matter of calling the factor function, which identifies the distinct levels of the categorical data and packs them into a factor:

    > f <- factor(c("Win","Win","Lose","Tie","Win","Lose"))
    > f
    [1] Win Win Lose Tie Win Lose
    Levels: Lose Tie Win

Notice that when we printed the factor, f, R did not put quotes around the values.
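For readers coming from Python, the same encoding idea can be sketched as below. This is a hypothetical analogue, not R's implementation: `factor` here is a made-up helper that returns 1-based integer codes (as R stores them internally) plus the list of levels. Python's `sorted()` uses code-point order, which can differ from R's locale-aware sorting for mixed-case strings.

```python
def factor(values, levels=None):
    """Encode values as (codes, levels), loosely mirroring R's factor().

    If levels is omitted, the distinct values are sorted to form the levels.
    Values not in the level set get None, analogous to NA in R.
    """
    if levels is None:
        levels = sorted(set(values))
    index = {level: i + 1 for i, level in enumerate(levels)}  # 1-based, like R
    codes = [index.get(v) for v in values]
    return codes, list(levels)


codes, levels = factor(["Win", "Win", "Lose", "Tie", "Win", "Lose"])
# levels -> ['Lose', 'Tie', 'Win']
# codes  -> [3, 3, 1, 2, 3, 1]
```

Passing an explicit `levels` list plays the role of R's second argument, which matters when the data contain only a subset of the possible categories.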

…

So think twice before you diddle with those globals: do you really want all lines in all graphics to be (say) magenta, dotted, and three times wider? Probably not, so use local parameters rather than global parameters whenever possible. See Also The help page for par lists the global graphics parameters; the chapter of R in a Nutshell on graphics includes the list with useful annotations. R Graphics contains extensive explanations of graphics parameters. Chapter 11. Linear Regression and ANOVA Introduction In statistics, modeling is where we get down to business. Models quantify the relationships between our variables. Models let us make predictions. A simple linear regression is the most basic model. It’s just two variables and is modeled as a linear relationship with an error term: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. We are given the data for x and y. Our mission is to fit the model, which will give us the best estimates for β0 and β1 (Recipe 11.1).
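Assuming the usual ordinary least squares criterion (minimizing the sum of squared residuals, which is what R's `lm` fits), the estimates of the intercept and slope have a closed form, where $\bar{x}$ and $\bar{y}$ are the sample means of the n observations:

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
$$

The second formula says the fitted line always passes through the point $(\bar{x}, \bar{y})$.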

pages: 502 words: 107,510

**
Natural Language Annotation for Machine Learning
** by
James Pustejovsky,
Amber Stubbs

Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, game design, information retrieval, iterative process, natural language processing, pattern recognition, performance metric, sentiment analysis, social web, speech recognition, statistical model, text mining

This is a corpus of tagged and parsed sentences of naturally occurring English (4.5 million words). The British National Corpus (BNC) is compiled and released as the largest corpus of English to date (100 million words). The Text Encoding Initiative (TEI) is established to develop and maintain a standard for the representation of texts in digital form. 2000s: As the World Wide Web grows, more data is available for statistical models for Machine Translation and other applications. The American National Corpus (ANC) project releases a 22-million-word subcorpus, and the Corpus of Contemporary American English (COCA) is released (400 million words). Google releases its Google N-gram Corpus of 1 trillion word tokens from public web pages. The corpus holds n-grams of up to five words, along with their frequencies. 2010s: International standards organizations, such as ISO, begin to recognize and co-develop text encoding formats that are being used for corpus annotation efforts.

…

.), this algorithm computes a probability distribution over the possible labels associated with them, and then computes the best label sequence. We can identify two basic methods for sequence classification:

Feature-based classification: A sequence is transformed into a feature vector. The vector is then classified according to conventional classifier methods.

Model-based classification: An inherent model of the probability distribution of the sequence is built. HMMs and other statistical models are examples of this method.

Included in feature-based methods are n-gram models of sequences, where an n-gram is selected as a feature. Given a set of such n-grams, we can represent a sequence as a binary vector of the occurrence of the n-grams, or as a vector containing frequency counts of the n-grams. With this sort of encoding, we can apply conventional methods to model sequences (Manning and Schütze 1999).
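The n-gram encoding described in the last paragraph can be sketched in Python as follows. This is an illustrative sketch only: the helper names and the three-bigram vocabulary are hypothetical, and in practice the vocabulary would be collected from training data before any sequence is encoded.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_vectors(sequence, vocabulary, n=2):
    """Encode a sequence against a fixed n-gram vocabulary.

    Returns (binary, counts): binary[i] is 1 if vocabulary[i] occurs in the
    sequence, and counts[i] is how many times it occurs.
    """
    counts = Counter(ngrams(sequence, n))
    count_vec = [counts[g] for g in vocabulary]  # Counter yields 0 for absent keys
    binary_vec = [1 if c > 0 else 0 for c in count_vec]
    return binary_vec, count_vec


vocab = [("the", "cat"), ("cat", "sat"), ("the", "dog")]
binary, freq = ngram_vectors(["the", "cat", "sat", "on", "the", "cat"], vocab)
# binary -> [1, 1, 0], freq -> [2, 1, 0]
```

Either vector can then be fed to any conventional classifier, which is what makes this a feature-based rather than a model-based method.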

pages: 350 words: 103,270

**
The Devil's Derivatives: The Untold Story of the Slick Traders and Hapless Regulators Who Almost Blew Up Wall Street . . . And Are Ready to Do It Again
** by
Nicholas Dunbar

asset-backed security, bank run, banking crisis, Basel III, Black Swan, Black-Scholes formula, bonus culture, break the buck, buy and hold, capital asset pricing model, Carmen Reinhart, Cass Sunstein, collateralized debt obligation, commoditize, Credit Default Swap, credit default swaps / collateralized debt obligations, delayed gratification, diversification, Edmond Halley, facts on the ground, financial innovation, fixed income, George Akerlof, implied volatility, index fund, interest rate derivative, interest rate swap, Isaac Newton, John Meriwether, Kenneth Rogoff, Kickstarter, Long Term Capital Management, margin call, market bubble, money market fund, Myron Scholes, Nick Leeson, Northern Rock, offshore financial centre, Paul Samuelson, price mechanism, regulatory arbitrage, rent-seeking, Richard Thaler, risk tolerance, risk/return, Ronald Reagan, shareholder value, short selling, statistical model, The Chicago School, Thomas Bayes, time value of money, too big to fail, transaction costs, value at risk, Vanguard fund, yield curve, zero-sum game

The mattress had done its job—it had given international regulators the confidence to sign off as commercial banks built up their trading businesses. Betting—and Beating—the Spread Now return to the trading floor, to the people regulators and bank senior management need to police. Although they are taught to overcome risk aversion, traders continue to look for a mattress everywhere, in the form of “free lunches.” But do they use statistical modeling to identify a mattress, and make money? If you talk to traders, the answer tends to be no. Listen to the warning of a senior Morgan Stanley equities trader whom I interviewed in 2009: “You can compare to theoretical or historic value. But these forms of trading are probably a bit dangerous.” While regulators and senior bankers may have embraced VAR, traders themselves have always been skeptical.

…

According to the Morgan Stanley trader, “You study the perception of the market: I buy this because the next tick will be on the upside, or I sell because the next tick will be on the downside. This is probably based on the observations of your peers and so on. If you look purely at the anticipation of the price, that’s a way to make money in trading.” One reason traders don’t tend to make outright bets on the basis of statistical modeling is that capital rules such as VAR discourage it. The capital required to be set aside by VAR scales up with the size of the positions and the degree of worst-case scenario projected by the statistics. For volatile markets like equities, that restriction takes a big bite out of potential profit since trading firms must borrow to invest.5 On the other hand, short-term, opportunistic trading (which might be less profitable) slips under the VAR radar because the positions never stay on the books for very long.
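To see why a VAR-based capital charge grows with both position size and projected volatility, here is a simplified single-position parametric (variance-covariance) VaR in Python. It is a sketch under stated assumptions: zero-mean, normally distributed, independent daily returns, and a hypothetical function name `parametric_var`. Actual regulatory capital calculations add further adjustments not shown here.

```python
from math import sqrt
from statistics import NormalDist


def parametric_var(position_value, daily_vol, confidence=0.99, horizon_days=1):
    """Normal (variance-covariance) VaR for a single position.

    Returns the loss that should be exceeded with probability
    1 - confidence over the horizon, assuming zero-mean normal returns
    and independent days (so volatility scales with sqrt(horizon)).
    """
    z = NormalDist().inv_cdf(confidence)  # about 2.33 at 99%
    return z * daily_vol * sqrt(horizon_days) * abs(position_value)


small = parametric_var(1_000_000, 0.02)     # roughly 46,500
large = parametric_var(5_000_000, 0.02)     # five times the position
volatile = parametric_var(1_000_000, 0.04)  # twice the volatility
```

Multiplying the position by five, or doubling the volatility, scales the VaR figure by the same factor, which is the mechanism the trader describes: bigger or riskier statistical bets tie up proportionally more capital.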

pages: 319 words: 106,772

**
Irrational Exuberance: With a New Preface by the Author
** by
Robert J. Shiller

Andrei Shleifer, asset allocation, banking crisis, Benoit Mandelbrot, business cycle, buy and hold, computer age, correlation does not imply causation, Daniel Kahneman / Amos Tversky, demographic transition, diversification, diversified portfolio, equity premium, Everybody Ought to Be Rich, experimental subject, hindsight bias, income per capita, index fund, Intergovernmental Panel on Climate Change (IPCC), Joseph Schumpeter, Long Term Capital Management, loss aversion, mandelbrot fractal, market bubble, market design, market fundamentalism, Mexican peso crisis / tequila crisis, Milgram experiment, money market fund, moral hazard, new economy, open economy, pattern recognition, Ponzi scheme, price anchoring, random walk, Richard Thaler, risk tolerance, Robert Shiller, Ronald Reagan, Small Order Execution System, spice trade, statistical model, stocks for the long run, survivorship bias, the market place, Tobin tax, transaction costs, tulip mania, urban decay, Y2K

Thus it is not surprising, according to this line of reasoning, that we often do not find new information in the newspaper on the day of a price change: earlier information, appearing to the casual observer as tangential or irrelevant, has already been interpreted by perceptive investors as significant to the fundamentals that should determine share prices. Another argument advanced to explain why days of unusually large stock price movements have often not been found to coincide with important news is that a confluence of factors may cause a significant market change, even if the individual factors themselves are not particularly newsworthy. For example, suppose certain investors are informally using a particular statistical model that forecasts fundamental value using a number of economic indicators. If all or most of these particular indicators point the same way on a given day, even if no single one of them is of any substantive importance by itself, their combined effect will be noteworthy. Both of these interpretations of the tenuous relationship between news and market movements assume that the public is paying continuous attention to the news—reacting sensitively to the slightest clues about market fundamentals, constantly and carefully adding up all the disparate pieces of evidence.

…

Most notable among them was Robert Merton, a brilliant financial theorist who was later to win the Nobel Prize in economics (and also to suffer a major financial loss as a principal in the Long Term Capital Management hedge fund). Merton, with Terry Marsh, wrote an article in the American Economic Review in 1986 that argued against my results and concluded, ironically, that speculative markets were not too volatile.26 John Campbell and I wrote a number of papers attempting to put these claims of excess volatility on a more secure footing, and we developed statistical models to study the issue and deal with some of the problems emphasized by the critics.27 We felt that we had established in a fairly convincing way that stock markets do violate the efficient markets model. Our research has not completely settled the matter, however. There are just too many possible statistical issues that can be raised, and the sample provided by only a little over a century of data cannot prove anything conclusively.

pages: 502 words: 107,657

**
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
** by
Eric Siegel

Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, butter production in bangladesh, call centre, Charles Lindbergh, commoditize, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, en.wikipedia.org, Erik Brynjolfsson, Everything should be made as simple as possible, experimental subject, Google Glasses, happiness index / gross national happiness, job satisfaction, Johann Wolfgang von Goethe, lifelogging, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mass immigration, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, risk-adjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, Shai Danziger, software as a service, speech recognition, statistical model, Steven Levy, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Davenport, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra, zero-sum game

FICO: Todd Steffes, “Predictive Analytics: Saving Lives and Lowering Medical Bills,” Analytics Magazine, Analytics Informs, January/February 2012. www.analytics-magazine.org/januaryfebruary-2012/505-predictive-analytics-saving-lives-and-lowering-medical-bills. GlaxoSmithKline (UK): Vladimir Anisimov, GlaxoSmithKline, “Predictive Analytic Patient Recruitment and Drug Supply Modelling in Clinical Trials,” Predictive Analytics World London Conference, November 30, 2011, London, UK. www.predictiveanalyticsworld.com/london/2011/agenda.php#day1–16. Vladimir V. Anisimov, “Statistical Modelling of Clinical Trials (Recruitment and Randomization),” Communications in Statistics—Theory and Methods 40, issue 19–20 (2011): 3684–3699. www.tandfonline.com/toc/lsta20/40/19–20. MultiCare Health System (four hospitals in Washington): Karen Minich-Pourshadi for HealthLeaders Media, “Hospital Data Mining Hits Paydirt,” HealthLeaders Media Online, November 29, 2010. www.healthleadersmedia.com/page-1/FIN-259479/Hospital-Data-Mining-Hits-Paydirt.

…

Johnson, Serena Lee, Frank Doherty, and Arthur Kressner (Consolidated Edison Company of New York), “Predicting Electricity Distribution Feeder Failures Using Machine Learning Susceptibility Analysis,” March 31, 2006. www.phillong.info/publications/GBAetal06_susc.pdf. This work has been partly supported by a research contract from Consolidated Edison. BNSF Railway: C. Tyler Dick, Christopher P. L. Barkan, Edward R. Chapman, and Mark P. Stehly, “Multivariate Statistical Model for Predicting Occurrence and Location of Broken Rails,” Transportation Research Board of the National Academies, January 26, 2007. http://trb.metapress.com/content/v2j6022171r41478/. See also: http://ict.uiuc.edu/railroad/cee/pdf/Dick_et_al_2003.pdf. TTX: Thanks to Mahesh Kumar at Tiger Analytics for this case study, “Predicting Wheel Failure Rate for Railcars.” Fortune 500 global technology company: Thanks to Dean Abbott, Abbott Analytics (http://abbottanalytics.com/index.php) for information about this case study.

pages: 353 words: 106,704

**
Choked: Life and Breath in the Age of Air Pollution
** by
Beth Gardiner

barriers to entry, Boris Johnson, call centre, carbon footprint, clean water, connected car, deindustrialization, Donald Trump, Elon Musk, epigenetics, Exxon Valdez, failed state, Hyperloop, index card, Indoor air pollution, Mahatma Gandhi, megacity, meta analysis, meta-analysis, Ronald Reagan, self-driving car, Silicon Valley, Skype, statistical model, Steve Jobs, white picket fence

But while its harm was real, ozone was not what was dampening the power of those young lungs. Tiny airborne particles known as PM2.5, so small they are thought to enter the bloodstream and penetrate vital organs, including the brain, were a far more potent danger. Nitrogen dioxide, one of a family of gases known as NOx, also had a powerful effect. In fact, it poured out of cars, trucks, and ships in such close synchronicity with PM2.5 that even Jim Gauderman’s statistical models couldn’t disentangle the two pollutants’ effects. That wasn’t all. In what may have been their most worrisome discovery, the team found the pollutants were wreaking harm even at levels long assumed to be safe. In the years to come, the implications of that uncomfortable finding would be felt far beyond the pages of prestigious scientific journals. * * * Long Beach, where Erika Fields grew up, and where she gave birth to a baby after high school, wasn’t the most polluted of the Children’s Health Study communities, but it was certainly on the wrong end of the scale.

…

Even for a single country, they can vary widely, depending on the pollutants they include and the methodology they use. These numbers are everywhere: more than a million and a half annual air pollution deaths each for China and India.10 Approaching a half million in Europe.11 Upward of a hundred thousand in America.12 None are arrived at by counting individual cases; like Walton’s, they’re all derived through complex statistical modeling. Even if you tried, David Spiegelhalter says, it would be impossible to compile a body-by-body tabulation, since pollution—unlike, say, a heart attack or stroke—is not a cause of death in the medical sense. It’s more akin to smoking, obesity, or inactivity, all risk factors that can hasten a death or make it more likely, either alone or as one of several contributing factors. That’s not to say there’s no point trying to sketch the outlines of its impact, Spiegelhalter is quick to add.

pages: 407 words: 104,622

**
The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution
** by
Gregory Zuckerman

affirmative action, Affordable Care Act / Obamacare, Albert Einstein, Andrew Wiles, automated trading system, backtesting, Bayesian statistics, beat the dealer, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, blockchain, Brownian motion, butter production in bangladesh, buy and hold, buy low sell high, Claude Shannon: information theory, computer age, computerized trading, Credit Default Swap, Daniel Kahneman / Amos Tversky, diversified portfolio, Donald Trump, Edward Thorp, Elon Musk, Emanuel Derman, endowment effect, Flash crash, George Gilder, Gordon Gekko, illegal immigration, index card, index fund, Isaac Newton, John Meriwether, John Nash: game theory, John von Neumann, Loma Prieta earthquake, Long Term Capital Management, loss aversion, Louis Bachelier, mandelbrot fractal, margin call, Mark Zuckerberg, More Guns, Less Crime, Myron Scholes, Naomi Klein, natural language processing, obamacare, p-value, pattern recognition, Peter Thiel, Ponzi scheme, prediction markets, quantitative hedge fund, quantitative trading / quantitative finance, random walk, Renaissance Technologies, Richard Thaler, Robert Mercer, Ronald Reagan, self-driving car, Sharpe ratio, Silicon Valley, sovereign wealth fund, speech recognition, statistical arbitrage, statistical model, Steve Jobs, stochastic process, the scientific method, Thomas Bayes, transaction costs, Turing machine

The brute told him he was making a big mistake. “Are you nuts? You can’t make any money in mathematics,” he sneered. The experience taught Patterson to distrust most moneymaking operations, even those that appeared legitimate—one reason why he was so skeptical of Simons years later. After graduate school, Patterson thrived as a cryptologist for the British government, building statistical models to unscramble intercepted messages and encrypt secret messages in a unit made famous during World War II, when Alan Turing broke Germany’s encryption codes. Patterson harnessed the simple-yet-profound Bayes’ theorem of probability, which holds that by updating one’s initial beliefs with new, objective information, one can arrive at improved understandings. Patterson solved a long-standing problem in the field, deciphering a pattern in the data others had missed, becoming so valuable to the government that some top-secret documents shared with allies were labeled “For US Eyes Only and for Nick Patterson.”
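The updating rule mentioned here is Bayes’ theorem, P(H | E) = P(E | H) P(H) / P(E). A minimal Python sketch with hypothetical numbers (unrelated to Patterson’s actual work) shows a belief being revised twice by the same kind of evidence; the function name `bayes_update` is made up for illustration.

```python
def bayes_update(prior, p_evidence_if_h, p_evidence_if_not_h):
    """Posterior P(H | E) from the prior P(H) and the two likelihoods."""
    numerator = p_evidence_if_h * prior
    # Total probability of the evidence, P(E), over H and not-H.
    evidence = numerator + p_evidence_if_not_h * (1 - prior)
    return numerator / evidence


# Hypothetical: start 10% confident in a hypothesis, then twice observe
# evidence that is 8x more likely if the hypothesis is true (0.8 vs 0.1).
belief = 0.10
belief = bayes_update(belief, 0.8, 0.1)  # 8/17, about 0.4706
belief = bayes_update(belief, 0.8, 0.1)  # 64/73, about 0.8767
```

Each observation multiplies the odds in favor of the hypothesis by the likelihood ratio (here 8), which is why a weak prior can become a strong belief after only a few informative observations.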

…

At any point in a sentence, there exists a certain probability of what might come next, which can be estimated based on past, common usage. “Pie” is more likely to follow the word “apple” in a sentence than words like “him” or “the,” for example. Similar probabilities also exist for pronunciation, the IBM crew argued. Their goal was to feed their computers with enough data of recorded speech and written text to develop a probabilistic, statistical model capable of predicting likely word sequences based on sequences of sounds. Their computer code wouldn’t necessarily understand what it was transcribing, but it would learn to transcribe language, nonetheless. In mathematical terms, Brown, Mercer, and the rest of Jelinek’s team viewed sounds as the output of a sequence in which each step along the way is random, yet dependent on the previous step—a hidden Markov model.
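The word-sequence idea can be illustrated with a toy maximum-likelihood bigram model in Python. This is a hypothetical sketch on a made-up three-sentence corpus, not the IBM team’s system; real speech and language models also need smoothing so that unseen word pairs do not get probability zero.

```python
from collections import Counter


def bigram_model(sentences):
    """Return prob(prev, nxt): maximum-likelihood estimate of P(nxt | prev)."""
    pair_counts = Counter()
    prev_counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1

    def prob(prev, nxt):
        if prev_counts[prev] == 0:
            return 0.0  # word never seen in a non-final position
        return pair_counts[(prev, nxt)] / prev_counts[prev]

    return prob


corpus = [
    "she baked an apple pie",
    "he ate the apple pie",
    "the apple tree grew",
]
p = bigram_model(corpus)
p("apple", "pie")  # 2/3, about 0.667
p("apple", "the")  # 0.0
```

An HMM of the kind described adds a hidden layer on top of this: the word sequence is scored by probabilities like these, while the observed sounds are modeled as emissions from hidden states.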

**
Capital Ideas Evolving
** by
Peter L. Bernstein

Albert Einstein, algorithmic trading, Andrei Shleifer, asset allocation, business cycle, buy and hold, buy low sell high, capital asset pricing model, commodity trading advisor, computerized trading, creative destruction, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, diversification, diversified portfolio, endowment effect, equity premium, Eugene Fama: efficient market hypothesis, financial innovation, fixed income, high net worth, hiring and firing, index fund, invisible hand, Isaac Newton, John Meriwether, John von Neumann, Joseph Schumpeter, Kenneth Arrow, London Interbank Offered Rate, Long Term Capital Management, loss aversion, Louis Bachelier, market bubble, mental accounting, money market fund, Myron Scholes, paper trading, passive investing, Paul Samuelson, price anchoring, price stability, random walk, Richard Thaler, risk tolerance, risk-adjusted returns, risk/return, Robert Shiller, Sharpe ratio, short selling, Silicon Valley, South Sea Bubble, statistical model, survivorship bias, systematic trading, technology bubble, The Wealth of Nations by Adam Smith, transaction costs, yield curve, Yogi Berra, zero-sum game

* Unless otherwise specified, quotations are from personal interviews or correspondence. While he was at Bronx Science, Lo read The Foundation Trilogy by the science fiction writer Isaac Asimov. The story was about a mathematician who develops a theory of human behavior called “psychohistory.” Psychohistory can predict the future course of human events, but only when the population reaches a certain size because the predictions are based on statistical models. Lo was hooked. He found Asimov’s narrative to be plausible enough to become a reality someday, and he wanted to be the one to make it happen. Economics, especially game theory and mathematical economics, looked like the best way to get started. He made the decision in his second year at Yale to do just that. Toward the end of the first semester of his graduate work at Harvard, Lo ran into a former classmate from Bronx Science who was studying economics at MIT.

…

At that moment, in the early 1980s, academics in the field of financial economics were still working out the full theoretical implications of Markowitz’s theory of portfolio selection, the Efficient Market Hypothesis, the Capital Asset Pricing Model, the options pricing model, and Modigliani and Miller’s iconoclastic ideas about corporate finance and the central role of arbitrage. That emphasis on theory made the bait even tastier for Lo. He saw the way clear to follow Asimov’s advice. By applying statistical models to the daily practice of finance in the real world, he would not only move the field of finance forward from its focus on theory, but even more enticing, he would also find the holy grail he was seeking in the first place: solutions to Asimov’s psychohistory. Progress was rapid. By 1988 he was an untenured professor at MIT, having turned down an offer of tenure to stay at Wharton. And by 1990, at the age of 29, he received a tenured professorship at MIT.

pages: 416 words: 108,370

**
Hit Makers: The Science of Popularity in an Age of Distraction
** by
Derek Thompson

Airbnb, Albert Einstein, Alexey Pajitnov wrote Tetris, always be closing, augmented reality, Clayton Christensen, Donald Trump, Downton Abbey, full employment, game design, Gordon Gekko, hindsight bias, indoor plumbing, industrial cluster, information trail, invention of the printing press, invention of the telegraph, Jeff Bezos, John Snow's cholera map, Kodak vs Instagram, linear programming, Lyft, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Metcalfe’s law, Minecraft, Nate Silver, Network effects, Nicholas Carr, out of africa, randomized controlled trial, recommendation engine, Robert Gordon, Ronald Reagan, Silicon Valley, Skype, Snapchat, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, Steven Pinker, subscription business, telemarketer, the medium is the message, The Rise and Fall of American Growth, Uber and Lyft, Uber for X, uber lyft, Vilfredo Pareto, Vincenzo Peruggia: Mona Lisa, women in the workforce

The movie’s 1955 box office gross made it the thirteenth most popular film of the year behind Cinerama Holiday, Mister Roberts, Battle Cry, Oklahoma!, Guys and Dolls, Lady and the Tramp, Strategic Air Command, Not as a Stranger, To Hell and Back, The Sea Chase, The Seven Year Itch, and The Tall Men. If you’ve heard of five of those twelve movies, you have me beat. And yet they were all more popular than the film that launched the bestselling rock song of all time. There is no statistical model in the world to forecast that the forgotten B-side of a middling record played over the credits of the thirteenth most popular movie of any year will automatically become the most popular rock-and-roll song of all time. The business of creativity is a game of chance—a complex, adaptive, semi-chaotic game with Bose-Einstein distribution dynamics and Pareto’s power law characteristics with dual-sided uncertainty.

…

killing 127 people in three days: Kathleen Tuthill, “John Snow and the Broad Street Pump,” Cricket 31, no. 3 (November 2003), reprinted by UCLA Department of Epidemiology, www.ph.ucla.edu/epi/snow/snowcricketarticle.html. “There were only ten deaths in houses”: John Snow, Medical Times and Gazette 9, September 23, 1854: 321–22, reprinted by UCLA Department of Epidemiology, www.ph.ucla.edu/epi/snow/choleraneargoldensquare.html. Note: Other accounts of Snow’s methodology, such as David Freedman’s paper “Statistical Models and Shoe Leather,” give more weight to Snow’s investigation of the water supply companies. A few years before the outbreak, one of London’s water suppliers had moved its intake point upstream from the main sewage discharge on the Thames, while another company kept its intake point downstream from the sewage. London had been divided into two groups, one drinking sewage and one drinking purer water.

pages: 456 words: 185,658

**
More Guns, Less Crime: Understanding Crime and Gun-Control Laws
** by
John R. Lott

affirmative action, Columbine, crack epidemic, Donald Trump, Edward Glaeser, G4S, gun show loophole, income per capita, More Guns, Less Crime, Sam Peltzman, selection bias, statistical model, the medium is the message, transaction costs

As to the concern that other changes in law enforcement may have been occurring at the same time, the estimates account for changes in other gun-control laws and changes in law enforcement as measured by arrest and conviction rates as well as by prison terms. No previous study of crime has attempted to control for as many different factors that might explain changes in the crime rate. 3 Did I assume that there was an immediate and constant effect from these laws and that the effect should be the same everywhere? The “statistical models assumed: (1) an immediate and constant effect of shall-issue laws, and (2) similar effects across different states and counties.” (Webster, “Claims,” p. 2; see also Dan Black and Daniel Nagin, “Do ‘Right-to-Carry’ Laws Deter Violent Crime?” Journal of Legal Studies 27 [January 1998], p. 213.) One of the central arguments both in the original paper and in this book is that the size of the deterrent effect is related to the number of permits issued, and it takes many years before states reach their long-run level of permits.

…

A major reason for the larger effect on crime in the more urban counties was that in rural areas, permit requests already were being approved; hence it was in urban areas that the number of permitted concealed handguns increased the most. A week later, in response to a column that I published in the Omaha World-Herald,20 Mr. Webster modified this claim somewhat: Lott claims that his analysis did not assume an immediate and constant effect, but that is contrary to his published article, in which the vast majority of the statistical models assume such an effect. (Daniel W. Webster, “Concealed-Gun Research Flawed,” Omaha World-Herald, March 12, 1997; emphasis added.) When one does research, it is most appropriate to take the simplest specifications first and then gradually make things more complicated. The simplest way of doing this is to examine the mean crime rates before and after the change in a law.

…

While he includes a chapter that contains replies to his critics, unfortunately he doesn’t directly respond to the key Black and Nagin finding that formal statistical tests reject his methods. The closest he gets to addressing this point is to acknowledge “the more serious possibility is that some other factor may have caused both the reduction in crime rates and the passage of the law to occur at the same time,” but then goes on to say that he has “presented over a thousand [statistical model] specifications” that reveal “an extremely consistent pattern” that right-to-carry laws reduce crime. Another view would be that a thousand versions of a demonstrably invalid analytical approach produce boxes full of invalid results. (Jens Ludwig, “Guns and Numbers,” Washington Monthly, June 1998, p. 51)76 We applied a number of specification tests suggested by James J. Heckman and V. Joseph Hotz.

pages: 161 words: 39,526

**
Applied Artificial Intelligence: A Handbook for Business Leaders
** by
Mariya Yao,
Adelyn Zhou,
Marlene Jia

Airbnb, Amazon Web Services, artificial general intelligence, autonomous vehicles, business intelligence, business process, call centre, chief data officer, computer vision, conceptual framework, en.wikipedia.org, future of work, industrial robot, Internet of things, iterative process, Jeff Bezos, job automation, Marc Andreessen, natural language processing, new economy, pattern recognition, performance metric, price discrimination, randomized controlled trial, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, skunkworks, software is eating the world, source of truth, speech recognition, statistical model, strong AI, technological singularity

“We’ve had a lot of success hiring from career fairs that Galvanize organizes, where we present the unique challenges our company tackles in healthcare,” he adds.(57) Experienced Scientists and Researchers Hiring experienced data scientists and machine learning researchers requires a different approach. For these positions, employers typically look for a doctorate or extensive experience in machine learning, statistical modeling, or related fields. You will usually source these talented recruits through strategic networking, academic conferences, or blatant poaching. To this end, you can partner with universities or research departments and sponsor conferences to build your brand reputation. You can also host competitions on Kaggle or similar platforms. Provide a problem, a dataset, and a prize purse to attract competitors.

pages: 147 words: 39,910

**
The Great Mental Models: General Thinking Concepts
** by
Shane Parrish

Albert Einstein, Atul Gawande, Barry Marshall: ulcers, bitcoin, Black Swan, colonial rule, correlation coefficient, correlation does not imply causation, cuban missile crisis, Daniel Kahneman / Amos Tversky, dark matter, delayed gratification, feminist movement, index fund, Isaac Newton, Jane Jacobs, mandelbrot fractal, Pierre-Simon Laplace, Ponzi scheme, Richard Feynman, statistical model, stem cell, The Death and Life of Great American Cities, the map is not the territory, the scientific method, Thomas Bayes, Torches of Freedom

“It became possible also to map out master plans for the statistical city, and people take these more seriously, for we are all accustomed to believe that maps and reality are necessarily related, or that if they are not, we can make them so by altering reality.” 12 Jacobs’ book is, in part, a cautionary tale of what can happen when faith in the model influences the decisions we make in the territory, when we try to fit complexity into the simplification. Jacobs demonstrated that mapping the interaction between people and sidewalks was an important factor in determining how to improve city safety. “In general, when building statistical models, we must not forget that the aim is to understand something about the real world. Or predict, choose an action, make a decision, summarize evidence, and so on, but always about the real world, not an abstract mathematical world: our models are not the reality.” David Hand13

Conclusion

Maps have long been a part of human society. They are valuable tools to pass on knowledge. Still, in using maps, abstractions, and models, we must always be wise to their limitations.

pages: 302 words: 82,233

**
Beautiful security
** by
Andy Oram,
John Viega

Albert Einstein, Amazon Web Services, business intelligence, business process, call centre, cloud computing, corporate governance, credit crunch, crowdsourcing, defense in depth, Donald Davies, en.wikipedia.org, fault tolerance, Firefox, loose coupling, Marc Andreessen, market design, MITM: man-in-the-middle, Monroe Doctrine, new economy, Nicholas Carr, Nick Leeson, Norbert Wiener, optical character recognition, packet switching, peer-to-peer, performance metric, pirate software, Robert Bork, Search for Extraterrestrial Intelligence, security theater, SETI@home, Silicon Valley, Skype, software as a service, statistical model, Steven Levy, The Wisdom of Crowds, Upton Sinclair, web application, web of trust, zero day, Zimmermann PGP

Ashenfelter is a statistician at Princeton who loves wine but is perplexed by the pomp and circumstance around valuing and rating wine in much the same way I am perplexed by the pomp and circumstance surrounding risk management today. In the 1980s, wine critics dominated the market with predictions based on their own reputations, palate, and frankly very little more. Ashenfelter, in contrast, studied the Bordeaux region of France and developed a statistical model of the quality of wine. His model was based on the average rainfall in the winter before the growing season (the rain that makes the grapes plump) and the average sunshine during the growing season (the rays that make the grapes ripe), resulting in a simple formula: quality = 12.145 + (0.00117 * winter rainfall) + (0.0614 * average growing season temperature) - (0.00386 * harvest rainfall). Of course he was chastised and lampooned by the stuffy wine critics who dominated the industry, but after several years of producing valuable results, his methods are now widely accepted as providing important valuation criteria for wine.
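The wine formula is a plain linear model, so scoring a vintage is simple arithmetic. The sketch below uses the coefficients quoted above, with harvest rainfall entering negatively as in Ashenfelter's published equation. The units (millimetres of rain, degrees Celsius) and the sample inputs are assumptions for illustration only.

```python
def bordeaux_quality(winter_rain, growing_temp, harvest_rain):
    """Linear wine-quality score using the coefficients quoted in the text.

    Assumed units: rainfall in millimetres, temperature in degrees Celsius.
    """
    return (12.145
            + 0.00117 * winter_rain    # wetter winters raise the score
            + 0.0614 * growing_temp    # warmer growing seasons raise the score
            - 0.00386 * harvest_rain)  # rain at harvest lowers the score


# Hypothetical vintage, chosen only to show the arithmetic.
print(round(bordeaux_quality(600, 17, 100), 4))  # 13.5048
```

Because the model is linear, each coefficient reads directly as the change in predicted quality per unit of that weather variable.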

…

I hope that when I look back on this text and my blog in years to come, I’ll cringe at their resemblance to the cocktail-mixing house robots from movies of the 1970s. I believe the right elements are really coming together where technology can create better technology. Advances in technology have been used to both arm and disarm the planet, to empower and oppress populations, and to attack and defend the global community and all it will have become. The areas I’ve pulled together in this chapter—from business process management, number crunching and statistical modeling, visualization, and long-tail technology—provide fertile ground for security management systems in the future that archive today’s best efforts in the annals of history. At least I hope so, for I hate mediocrity with a passion and I think security management systems today are mediocre at best!

Acknowledgments

This chapter is dedicated to my mother, Margaret Curphey, who passed away after an epileptic fit in 2004 at her house in the south of France.

pages: 541 words: 109,698

**
Mining the Social Web: Finding Needles in the Social Haystack
** by
Matthew A. Russell

Climategate, cloud computing, crowdsourcing, en.wikipedia.org, fault tolerance, Firefox, full text search, Georg Cantor, Google Earth, information retrieval, Mark Zuckerberg, natural language processing, NP-complete, Saturday Night Live, semantic web, Silicon Valley, slashdot, social graph, social web, statistical model, Steve Jobs, supply-chain management, text mining, traveling salesman, Turing test, web application

Substituting various values into the precision and recall formulas is straightforward and a worthwhile exercise if this is your first time encountering these terms. For example, what would the precision, recall, and F1 score have been if your algorithm had identified “Mr. Green”, “Colonel”, “Mustard”, and “candlestick”? As somewhat of an aside, you might find it interesting to know that many of the most compelling technology stacks used by commercial businesses in the NLP space use advanced statistical models to process natural language according to supervised learning algorithms. A supervised learning algorithm is essentially an approach in which you provide training samples of the form [(input1, output1), (input2, output2), ..., (inputN, outputN)] to a model such that the model is able to predict the tuples with reasonable accuracy. The tricky part is ensuring that the trained model generalizes well to inputs that have not yet been encountered.
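The precision, recall, and F1 exercise can be checked mechanically. This is a minimal sketch that treats extracted entities as sets; the answer key and predictions below are hypothetical and are not the book's example data.

```python
def precision_recall_f1(predicted, relevant):
    """Precision, recall, and F1 for a set of extracted items against an answer key."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical answer key and algorithm output.
relevant = {"Alice", "Bob", "Paris"}
predicted = {"Alice", "Paris", "London", "Tuesday"}
p, r, f = precision_recall_f1(predicted, relevant)
print(p, r, f)  # precision 0.5, recall 2/3, F1 4/7
```

Two of the four predictions are correct (precision 1/2), and two of the three true entities were found (recall 2/3).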

…

SocialGraph Node Mapper, Brief analysis of breadth-first techniques sorting, Sensible Sorting, Sorting Documents by Value documents by value, Sorting Documents by Value documents in CouchDB, Sensible Sorting split method, using to tokenize text, Data Hacking with NLTK, Before You Go Off and Try to Build a Search Engine… spreadsheets, visualizing Facebook network data, Visualizing with spreadsheets (the old-fashioned way) statistical models processing natural language, Quality of Analytics stemming verbs, Querying Buzz Data with TF-IDF stopwords, Data Hacking with NLTK, Analysis of Luhn’s Summarization Algorithm downloading NLTK stopword data, Data Hacking with NLTK filtering out before document summarization, Analysis of Luhn’s Summarization Algorithm streaming API (Twitter), Analyzing Tweets (One Entity at a Time) Strong Links API, The Infochimps “Strong Links” API, Interactive 3D Graph Visualization student’s t-score, How the Collocation Sausage Is Made: Contingency Tables and Scoring Functions subject-verb-object triples, Entity-Centric Analysis: A Deeper Understanding of the Data, Man Cannot Live on Facts Alone summarizing documents, Summarizing Documents, Analysis of Luhn’s Summarization Algorithm, Summarizing Documents, Analysis of Luhn’s Summarization Algorithm analysis of Luhn’s algorithm, Analysis of Luhn’s Summarization Algorithm Tim O’Reilly Radar blog post (example), Summarizing Documents summingReducer function, Frequency by date/time range, What entities are in Tim’s tweets?

pages: 404 words: 43,442

**
The Art of R Programming
** by
Norman Matloff

Debian, discrete time, Donald Knuth, general-purpose programming language, linked data, sorting algorithm, statistical model

The latter again stems from vectorization, a benefit discussed in detail in Chapter 14. This approach is used in the loop beginning at line 53. (Arguably, in this case, the increase in speed comes at the expense of readability of the code.)

9.1.7 Extended Example: A Procedure for Polynomial Regression

As another example, consider a statistical regression setting with one predictor variable. Since any statistical model is merely an approximation, in principle, you can get better and better models by fitting polynomials of higher and higher degrees. However, at some point, this becomes overfitting, so that the prediction of new, future data actually deteriorates for degrees higher than some value. The class "polyreg" aims to deal with this issue. It fits polynomials of various degrees but assesses fits via cross-validation to reduce the risk of overfitting.
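The cross-validation idea behind "polyreg" can be sketched in Python with NumPy. This is an analogue under stated assumptions, not the book's R class: the function name `cv_errors_by_degree`, the interleaved k-fold scheme, and the synthetic quadratic data are all hypothetical choices.

```python
import numpy as np


def cv_errors_by_degree(x, y, max_degree, k=5):
    """Return {degree: mean squared k-fold cross-validation error}."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Interleaved folds: every k-th point is held out together.
    folds = [np.arange(start, n, k) for start in range(k)]
    errors = {}
    for degree in range(1, max_degree + 1):
        total_sq_error = 0.0
        for test_idx in folds:
            train = np.ones(n, dtype=bool)
            train[test_idx] = False
            coefs = np.polyfit(x[train], y[train], degree)
            residuals = np.polyval(coefs, x[test_idx]) - y[test_idx]
            total_sq_error += float(np.sum(residuals ** 2))
        errors[degree] = total_sq_error / n
    return errors


# Hypothetical data: a quadratic trend plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0.0, 0.1, size=x.size)
errors = cv_errors_by_degree(x, y, max_degree=8)
best = min(errors, key=errors.get)
print(best, errors[best])
```

Training error can only fall as the degree rises, but held-out error stops improving once the polynomial starts fitting noise, which is why the degree is chosen by cross-validated error.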

…

We’ll create a function called extractpums() to read in a PUMS file and create a data frame from its Person records. The user specifies the filename and lists fields to extract and names to assign to those fields. We also want to retain the household serial number. This is good to have because data for persons in the same household may be correlated and we may want to add that aspect to our statistical model. Also, the household data may provide important covariates. (In the latter case, we would want to retain the covariate data as well.) Before looking at the function code, let’s see what the function does. In this data set, gender is in column 23 and age in columns 25 and 26. In the example, our filename is pumsa. The following call creates a data frame consisting of those two variables:

pumsdf <- extractpums("pumsa",list(Gender=c(23,23),Age=c(25,26)))

Note that we are stating here the names we want the columns to have in the resulting data frame.
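The field specification in the extractpums() call, with 1-based, inclusive column ranges, can be mirrored in a short Python sketch. This is not the book's R function. The name `extract_fixed_width` and the `keep` filter are hypothetical, because the excerpt does not show how Person records are recognized.

```python
def extract_fixed_width(lines, fields, keep=lambda line: True):
    """Pull named fields out of fixed-width records.

    `fields` maps a name to 1-based, inclusive (first, last) columns, mirroring
    list(Gender=c(23,23), Age=c(25,26)) in the R call.
    `keep` is a hypothetical predicate for selecting which records to parse.
    """
    rows = []
    for line in lines:
        if not keep(line):
            continue
        # Column c (1-based) is index c - 1; the slice end is exclusive, so use `last`.
        rows.append({name: line[first - 1:last].strip()
                     for name, (first, last) in fields.items()})
    return rows


# Hypothetical record: "1" in column 23, "42" in columns 25-26.
record = " " * 22 + "1" + " " + "42" + " " * 4
rows = extract_fixed_width([record], {"Gender": (23, 23), "Age": (25, 26)})
print(rows)  # [{'Gender': '1', 'Age': '42'}]
```

Values come back as strings; converting them to numbers or factors would be a separate step.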

pages: 133 words: 42,254

**
Big Data Analytics: Turning Big Data Into Big Money
** by
Frank J. Ohlhorst

algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application

Much like the data themselves, the team should not be static in nature and should be able to evolve and adapt to the needs of the business. CHALLENGES REMAIN Locating the right talent to analyze data is the biggest hurdle in building a team. Such talent is in high demand, and the need for data analysts and data scientists continues to grow at an almost exponential rate. Finding this talent means that organizations will have to focus on data science and hire statistical modelers and text data–mining professionals as well as people who specialize in sentiment analysis. Success with Big Data analytics requires solid data models, statistical predictive models, and test analytic models, since these will be the core applications needed to do Big Data. Locating the appropriate talent takes more than just a typical IT job placement; the skills required for a good return on investment are not simple and are not solely technology oriented.

pages: 492 words: 118,882

**
The Blockchain Alternative: Rethinking Macroeconomic Policy and Economic Theory
** by
Kariappa Bheemaiah

accounting loophole / creative accounting, Ada Lovelace, Airbnb, algorithmic trading, asset allocation, autonomous vehicles, balance sheet recession, bank run, banks create money, Basel III, basic income, Ben Bernanke: helicopter money, bitcoin, blockchain, Bretton Woods, business cycle, business process, call centre, capital controls, Capital in the Twenty-First Century by Thomas Piketty, cashless society, cellular automata, central bank independence, Claude Shannon: information theory, cloud computing, cognitive dissonance, collateralized debt obligation, commoditize, complexity theory, constrained optimization, corporate governance, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, cryptocurrency, David Graeber, deskilling, Diane Coyle, discrete time, disruptive innovation, distributed ledger, diversification, double entry bookkeeping, Ethereum, ethereum blockchain, fiat currency, financial innovation, financial intermediation, Flash crash, floating exchange rates, Fractional reserve banking, full employment, George Akerlof, illegal immigration, income inequality, income per capita, inflation targeting, information asymmetry, interest rate derivative, inventory management, invisible hand, John Maynard Keynes: technological unemployment, John von Neumann, joint-stock company, Joseph Schumpeter, Kenneth Arrow, Kenneth Rogoff, Kevin Kelly, knowledge economy, large denomination, liquidity trap, London Whale, low skilled workers, M-Pesa, Marc Andreessen, market bubble, market fundamentalism, Mexican peso crisis / tequila crisis, MITM: man-in-the-middle, money market fund, money: store of value / unit of account / medium of exchange, mortgage debt, natural language processing, Network effects, new economy, Nikolai Kondratiev, offshore financial centre, packet switching, Pareto efficiency, pattern recognition, peer-to-peer lending, Ponzi scheme, precariat, pre–internet, price mechanism, 
price stability, private sector deleveraging, profit maximization, QR code, quantitative easing, quantitative trading / quantitative ﬁnance, Ray Kurzweil, Real Time Gross Settlement, rent control, rent-seeking, Satoshi Nakamoto, Satyajit Das, savings glut, seigniorage, Silicon Valley, Skype, smart contracts, software as a service, software is eating the world, speech recognition, statistical model, Stephen Hawking, supply-chain management, technology bubble, The Chicago School, The Future of Employment, The Great Moderation, the market place, The Nature of the Firm, the payments system, the scientific method, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, too big to fail, trade liberalization, transaction costs, Turing machine, Turing test, universal basic income, Von Neumann architecture, Washington Consensus

As we have seen in Chapter 3, it is monetary and fiscal policy that play a determining role in guiding the state of markets and the prosperity of a nation. Thus, owing to their fundamental role in monetary policy decision making, it is important to understand the history, abilities and limitations of these models. Currently, most central banks, such as the Federal Reserve and the ECB,13 use two kinds of models to study and build forecasts about the economy (Axtell and Farmer, 2015). The first, statistical models, fit current aggregate data of variables such as GDP, interest rates, and unemployment to empirical data in order to predict/suggest what the near future holds. The second type of model (which is more widely used) is known as the “Dynamic Stochastic General Equilibrium” (DSGE) model. These models are constructed on the basis that the economy would be at rest (i.e.: static equilibrium) if it wasn’t being randomly perturbed by events from outside the economy.

…

See Efficient Market Hypothesis (EMH) Equation based modelling (EBM), 196 Equilibrium business-cycle models, 221 Equilibrium economic models contract theory contact incompleteness, 171 efficiency wages, 172 explicit contracts, 172 implicit contracts, 172 intellectual framework, 171 labor market flexibility, 171 menu cost, 173 risk sharing, 171 DSGE models Federal Reserve system, 173 implicit contracts, 172 macroeconomic models of business cycle, 168 NK models, 170 non-optimizing households, 168 principles, 175 RBC models, 169 RET, 174–175 ‘rigidity’ of wage and price change, 171 SIGE, 170 steady state equilibrium, economy, 176 structure, 176 Taylor rule, 168 FRB/US model, 173, 175 Keynesian macroeconomic theory, 169 RBC models, 169–170 244 Romer’s analysis tests, 178 statistical models, 168 Estonian government, 80 European Migration Network (EMN), 88 Exogenous and endogenous function, 137 Explicit contracts, 172 F Feedback loop, 191 Fiat currency CBDC, 129 commercial banks, 129 debt-based money, 124 digital cash, 129 digital monetary framework, 125 framework, 124 ideas and methods, 130 non-bank private sector, 124 sovereign digital currency, 125–128 transition, 124 Financialization, 25 de facto, 26 definition of, 27 eastern economic association, 27 enemy of my enemy is my friend, 65 FT slogans, 26 Palley, Thomas I., 28 relative industry shares, 27 risk innovation CDOs, CLOs and CDSs, 29 non-financial firms, 29 originate, repackage and sell model, 29 originate-to-distribute model, 29 originate-to-hold model, 29 principal component, 29 production and exchange, 29 sharding, 44 Blockchain, 54 FinTech transformation, 45, 48 global Fintech financing activity, 46 private sector, 44 skeleton keys, 60 AI-led high frequency trading, 63 amalgamation, 61 Blockchain, 63–64 fragmentation process, 60 information asymmetries, 62 Kabbage, 62 ■ INDEX KYC/AML procedures, 62 KYC process, 61 machine learning, 62 P2P lending sector, 62 payments and remittances sector, 60 physical 
barriers, 64 rehypothecation, 63 robo-advisors, 62 SWIFT and ACH, 61 transferwise, 61 solution pathways digital identity and KYC, 67 private and public utilization, 67 scalability, 81 TBTF (see (Too Big to Fail (TBTF))) television advertisement, 25 Financialization.

pages: 428 words: 121,717

**
Warnings
** by
Richard A. Clarke

active measures, Albert Einstein, algorithmic trading, anti-communist, artificial general intelligence, Asilomar, Asilomar Conference on Recombinant DNA, Bernie Madoff, cognitive bias, collateralized debt obligation, complexity theory, corporate governance, cuban missile crisis, data acquisition, discovery of penicillin, double helix, Elon Musk, failed state, financial thriller, fixed income, Flash crash, forensic accounting, friendly AI, Intergovernmental Panel on Climate Change (IPCC), Internet of things, James Watt: steam engine, Jeff Bezos, John Maynard Keynes: Economic Possibilities for our Grandchildren, knowledge worker, Maui Hawaii, megacity, Mikhail Gorbachev, money market fund, mouse model, Nate Silver, new economy, Nicholas Carr, nuclear winter, pattern recognition, personalized medicine, phenotype, Ponzi scheme, Ray Kurzweil, Richard Feynman, Richard Feynman: Challenger O-ring, risk tolerance, Ronald Reagan, Sam Altman, Search for Extraterrestrial Intelligence, self-driving car, Silicon Valley, smart grid, statistical model, Stephen Hawking, Stuxnet, technological singularity, The Future of Employment, the scientific method, The Signal and the Noise by Nate Silver, Tunguska event, uranium enrichment, Vernor Vinge, Watson beat the top human players on Jeopardy!, women in the workforce, Y2K

The deeper they dig, the harder it gets to climb out and see what is happening outside, and the more tempting it becomes to keep on doing what they know how to do . . . uncovering new reasons why their initial inclination, usually too optimistic or pessimistic, was right.” Still, maddeningly, even the foxes, considered as a group, were only ever able to approximate the accuracy of simple statistical models that extrapolated trends. They did perform somewhat better than undergraduates subjected to the same exercises, and they outperformed the proverbial “chimp with a dart board,” but they didn’t come close to the predictive accuracy of formal statistical models. Later books have looked at Tetlock’s foundational results in some additional detail. Dan Gardner’s 2012 Future Babble draws on recent research in psychology, neuroscience, and behavioral economics to detail the biases and other cognitive processes that skew our judgment when we try to make predictions about the future.

pages: 1,164 words: 309,327

**
Trading and Exchanges: Market Microstructure for Practitioners
** by
Larry Harris

active measures, Andrei Shleifer, asset allocation, automated trading system, barriers to entry, Bernie Madoff, business cycle, buttonwood tree, buy and hold, compound rate of return, computerized trading, corporate governance, correlation coefficient, data acquisition, diversified portfolio, fault tolerance, financial innovation, financial intermediation, fixed income, floating exchange rates, High speed trading, index arbitrage, index fund, information asymmetry, information retrieval, interest rate swap, invention of the telegraph, job automation, law of one price, London Interbank Offered Rate, Long Term Capital Management, margin call, market bubble, market clearing, market design, market fragmentation, market friction, market microstructure, money market fund, Myron Scholes, Nick Leeson, open economy, passive investing, pattern recognition, Ponzi scheme, post-materialism, price discovery process, price discrimination, principal–agent problem, profit motive, race to the bottom, random walk, rent-seeking, risk tolerance, risk-adjusted returns, selection bias, shareholder value, short selling, Small Order Execution System, speech recognition, statistical arbitrage, statistical model, survivorship bias, the market place, transaction costs, two-sided market, winner-take-all economy, yield curve, zero-coupon bond, zero-sum game

Pairs traders also pay close attention to how quickly and how efficiently markets respond, on average, to new information about common fundamental factors. Arbitrageurs generally should be reluctant to trade against markets that quickly and efficiently aggregate new information because the prices in such markets tend to accurately reflect fundamental values. 17.3.2.3 Statistical Arbitrage Statistical arbitrageurs use factor models to generalize the pairs trading strategy to many instruments. Factor models are statistical models that represent instrument returns by a weighted sum of common factors plus an instrument-specific factor. The weights, called factor loadings, are unique for each instrument. The arbitrageur must estimate them. Either statistical arbitrageurs specify the factors, or they use statistical methods to identify the factors from returns data for many instruments. Specified factors typically include macroeconomic variables such as interest rates, inflation rates, industrial production, credit spreads, stock index levels, and market volatility.
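When the factors are specified in advance, an instrument's factor loadings can be estimated by regressing its returns on the factor returns. This is a minimal least-squares sketch using NumPy; the simulated factors, loadings, and noise levels are hypothetical, and the statistical identification of factors (the text's other option) is not shown.

```python
import numpy as np


def estimate_loadings(asset_returns, factor_returns):
    """Regress one instrument's returns on common factor returns.

    Returns (intercept, loadings); the regression residuals play the role of
    the instrument-specific factor.
    """
    factor_returns = np.asarray(factor_returns, dtype=float)
    design = np.column_stack([np.ones(len(factor_returns)), factor_returns])
    coefs, *_ = np.linalg.lstsq(design, np.asarray(asset_returns, dtype=float), rcond=None)
    return coefs[0], coefs[1:]


# Hypothetical simulation: 500 periods, 3 factors, known loadings.
rng = np.random.default_rng(42)
factors = rng.normal(0.0, 0.01, size=(500, 3))
true_loadings = np.array([1.2, -0.5, 0.3])
specific = rng.normal(0.0, 0.002, size=500)  # instrument-specific component
returns = 0.0005 + factors @ true_loadings + specific
alpha, loadings = estimate_loadings(returns, factors)
print(np.round(loadings, 2))  # close to the true loadings [1.2, -0.5, 0.3]
```

Repeating this regression for every instrument gives the loading matrix a statistical arbitrageur would use to compare instruments' factor exposures.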

…

People generally measure total volatility by using variances, standard deviations, or mean absolute deviations of price changes. The variance of a set of price changes is the average squared difference between the price change and the average price change. The standard deviation is the square root of the variance. The mean absolute deviation is the average absolute difference between the price change and the average price change. Statistical models are necessary to identify and estimate the two components of total volatility. These models exploit the primary distinguishing characteristics of the two types of volatility: Fundamental volatility consists of seemingly random price changes that do not revert, whereas transitory volatility consists of price changes that ultimately revert. The transitory price changes are generally correlated with order flows of uninformed liquidity-demanding traders.
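The three definitions above translate directly into code. This is a minimal sketch, assuming that price changes are differences between consecutive prices and that "average" divides by the number of changes (population rather than sample statistics); the function name `volatility_measures` is hypothetical. These numbers describe total volatility only; separating the fundamental and transitory components requires a model such as Roll's.

```python
from math import sqrt
from statistics import fmean

def volatility_measures(prices):
    """Return (variance, standard deviation, mean absolute deviation) of price changes."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    changes = [b - a for a, b in zip(prices, prices[1:])]
    mean = fmean(changes)
    # Average squared difference between each change and the average change.
    variance = fmean([(c - mean) ** 2 for c in changes])
    std_dev = sqrt(variance)
    # Average absolute difference between each change and the average change.
    mad = fmean([abs(c - mean) for c in changes])
    return variance, std_dev, mad

print(volatility_measures([10, 12, 10, 12, 10]))
```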

…

• The probability that trader $t$ is a buyer is independent of $\varepsilon_t$ and $\varepsilon_{t+1}$.

Let the price at time $t$ equal fundamental value $V_t$ (where $V_t = V_{t-1} + \varepsilon_t$) plus or minus one-half of the spread $S$, depending on whether the $t$th trader is a buyer ($q_t = 1$) or a seller ($q_t = -1$):

$$P_t = V_t + \frac{S}{2} q_t,$$

so that the price change is

$$\Delta P_t = P_t - P_{t-1} = \varepsilon_t + \frac{S}{2}\left(q_t - q_{t-1}\right).$$

These assumptions imply that the price change variance is

$$\operatorname{Var}(\Delta P_t) = \sigma_\varepsilon^2 + \frac{S^2}{2}.$$

The two terms are the fundamental and transitory volatility components. Roll showed that we can estimate the latter term from the expected serial covariance. It is

$$\operatorname{Cov}(\Delta P_t, \Delta P_{t+1}) = -\frac{S^2}{4}.$$

Inverting this expression gives

$$S = 2\sqrt{-\operatorname{Cov}(\Delta P_t, \Delta P_{t+1})}.$$

Roll’s serial covariance spread estimator substitutes the sample serial covariance for the expected serial covariance in this last expression.

The simplest statistical model that can estimate these variance components is Roll’s serial covariance spread estimator model. Roll analyzed this simple model to create a simple serial covariance estimator of bid/ask spreads. The model assumes that fundamental values follow a random walk, and that observed prices are equal to fundamental value plus or minus half of the bid/ask spread. Total variance in this model is therefore the sum of variance due to changes in fundamental values and of variance due to bid/ask bounce.
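Roll's estimator, S = 2·sqrt(−Cov(ΔP_t, ΔP_{t+1})), needs only a list of trade prices. Below is a minimal sketch of the estimator together with a simulator for the model's assumptions. The function names, the starting value of 100, the Gaussian fundamental shocks, and the equal buyer/seller probability are illustrative assumptions, not details from the text. When the sample serial covariance is not negative, the square root is undefined, so the sketch returns `None` in that case.

```python
import math
import random
from statistics import fmean

def roll_spread(prices):
    """Roll's serial covariance spread estimator: S = 2 * sqrt(-Cov(dP_t, dP_t+1))."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    if len(changes) < 2:
        raise ValueError("need at least three prices")
    m = fmean(changes)
    pairs = list(zip(changes, changes[1:]))
    # Sample first-order serial covariance of price changes.
    serial_cov = sum((x - m) * (y - m) for x, y in pairs) / len(pairs)
    if serial_cov >= 0:
        return None  # estimator undefined when the covariance is not negative
    return 2 * math.sqrt(-serial_cov)

def simulate_roll_prices(n, spread, sigma, seed=0):
    """Simulate the model: random-walk fundamental value plus or minus half the spread."""
    rng = random.Random(seed)
    value = 100.0  # hypothetical starting fundamental value
    prices = []
    for _ in range(n):
        value += rng.gauss(0.0, sigma)            # fundamental innovation epsilon_t
        q = 1 if rng.random() < 0.5 else -1       # buyer (+1) or seller (-1)
        prices.append(value + q * spread / 2)     # observed trade price
    return prices

prices = simulate_roll_prices(200_000, spread=0.10, sigma=0.02)
print(roll_spread(prices))  # should land close to the true spread of 0.10
```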

pages: 199 words: 47,154

**
Gnuplot Cookbook
** by
Lee Phillips

bioinformatics, computer vision, general-purpose programming language, pattern recognition, statistical model, web application

These new features include the use of Unicode characters, transparency, new graph positioning commands, plotting objects, internationalization, circle plots, interactive HTML5 canvas plotting, iteration in scripts, lua/tikz/LaTeX integration, cairo and SVG terminal drivers, and volatile data. What this book covers Chapter 1, Plotting Curves, Boxes, Points, and more, covers the basic usage of Gnuplot: how to make all kinds of 2D plots for statistics, modeling, finance, science, and more. Chapter 2, Annotating with Labels and Legends, explains how to add labels, arrows, and mathematical text to our plots. Chapter 3, Applying Colors and Styles, covers the basics of colors and styles in gnuplot, plus transparency, and plotting with points and objects. Chapter 4, Controlling Your Tics, will show you how to get your tic marks and labels just right, along with gnuplot's new internationalization features.

**
Syntactic Structures
** by
Noam Chomsky

finite state, P = NP, statistical model

We shall see, in fact, in § 7, that there are deep structural reasons for distinguishing (3) and (4) from (5) and (6); but before we are able to find an explanation for such facts as these we shall have to carry the theory of syntactic structure a good deal beyond its familiar limits.

2.4 Third, the notion "grammatical in English" cannot be identified in any way with the notion "high order of statistical approximation to English." It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally 'remote' from English. Yet (1), though nonsensical, is grammatical, while (2) is not. Presented with these sentences, a speaker of English will read (1) with a normal sentence intonation, but he will read (2) with a falling intonation on each word; in fact, with just the intonation pattern given to any sequence of unrelated words.
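One simple way to see the argument: an unsmoothed maximum-likelihood bigram model assigns exactly zero probability to any sentence that contains an unseen word pair, so two never-observed sentences become indistinguishable. The sketch assumes the well-known pair from this section of the book, (1) "Colorless green ideas sleep furiously" and (2) "Furiously sleep ideas green colorless", which the excerpt does not quote, and a toy three-sentence corpus invented for illustration. Smoothed or neural models behave differently; this only illustrates the zero-count case.

```python
from collections import Counter

def train_bigrams(corpus):
    """Count context unigrams and bigrams over lowercased, whitespace-split sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(words[:-1])            # every word that serves as a context
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return unigrams, bigrams

def sentence_probability(sentence, unigrams, bigrams):
    """Unsmoothed maximum-likelihood bigram probability of a sentence."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        if unigrams[prev] == 0:
            return 0.0                         # unseen context: probability is zero
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob

# Hypothetical toy corpus.
corpus = ["green ideas are new", "children sleep soundly", "ideas spread furiously"]
unigrams, bigrams = train_bigrams(corpus)
s1 = "colorless green ideas sleep furiously"
s2 = "furiously sleep ideas green colorless"
print(sentence_probability(s1, unigrams, bigrams), sentence_probability(s2, unigrams, bigrams))
# prints: 0.0 0.0
```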

pages: 480 words: 138,041

**
The Book of Woe: The DSM and the Unmaking of Psychiatry
** by
Gary Greenberg

addicted to oil, Albert Einstein, Asperger Syndrome, back-to-the-land, David Brooks, impulse control, invisible hand, Isaac Newton, John Snow's cholera map, Kickstarter, late capitalism, longitudinal study, Louis Pasteur, McMansion, meta analysis, meta-analysis, neurotypical, phenotype, placebo effect, random walk, selection bias, statistical model, theory of mind, Winter of Discontent

If the DSM is not the map of an actual world against whose contours any changes can be validated, then opening up old arguments, or inviting new ones, might only sow dissension and reap chaos—and annoy Frances in the bargain. If he was going to revise the DSM, Frances told Pincus, then his goal would be stabilizing the system rather than trying to perfect it—or, as he put it to me, “loving the pet, even if it is a mutt.”5 Frances thought there was a way to protect the system from both instability and pontificating: meta-analysis, a statistical method that, thanks to advances in computer technology and statistical modeling, had recently allowed statisticians to compile results from large numbers of studies by combining disparate data into common terms. The result was a statistical synthesis by which many different research projects could be treated as one large study. “We needed something that would leave it up to the tables rather than the people,” he told me, and meta-analysis was perfect for the job. “The idea was you would have to present evidence in tabular form that would be so convincing it would jump up and grab people by the throats.”

…

There’s a lot of information they”—I think she meant the APA, not the National Transportation Safety Board—“can look at, but it’s not a matter of analyzing the data to find out exactly what’s wrong.” Kraemer seemed to be saying that the point wasn’t to sift through the wreckage and try to prevent another catastrophe but, evidently, to crash the plane and then announce that the destruction could have been a lot worse. To be honest, however, I wasn’t sure. She was not making all that much sense, or maybe I just didn’t grasp the complexities of statistical modeling. And besides, I was distracted by a memory of something Steve Hyman once wrote. Fixing the DSM, finding another paradigm, getting away from its reifications—this, he said, was like “repairing a plane while it is flying.” It was a suggestive analogy, I thought at the time, one that recognized the near impossibility of the task even as it indicated its high stakes—and the necessity of keeping the mechanics from swearing and banging too loudly, lest the passengers start asking for a quick landing and a voucher on another airline.

pages: 444 words: 138,781

**
Evicted: Poverty and Profit in the American City
** by
Matthew Desmond

affirmative action, Cass Sunstein, crack epidemic, Credit Default Swap, deindustrialization, desegregation, dumpster diving, ending welfare as we know it, fixed income, ghettoisation, glass ceiling, Gunnar Myrdal, housing crisis, informal economy, Jane Jacobs, jobless men, Kickstarter, late fees, mass incarceration, New Urbanism, payday loans, price discrimination, profit motive, rent control, statistical model, superstar cities, The Chicago School, The Death and Life of Great American Cities, thinkpad, upwardly mobile, working poor, young professional

With Jonathan Mijs, I combined all eviction court records between January 17 and February 26, 2011 (the Milwaukee Eviction Court Study period) with information about aspects of tenants’ neighborhoods, procured after geocoding the addresses that appeared in the eviction records. Working with the Harvard Center for Geographic Analysis, I also calculated the distance (in drive miles and time) between tenants’ addresses and the courthouse. Then I constructed a statistical model that attempted to explain the likelihood of a tenant appearing in court based on aspects of that tenant’s case and her or his neighborhood. The model generated only null findings. How much a tenant owed a landlord, her commute time to the courthouse, her gender—none of these factors were significantly related to appearing in court. I also investigated whether several aspects of a tenant’s neighborhood—e.g., its eviction, poverty, and crime rates—mattered when it came to explaining defaults.

…

In those where children made up at least 40 percent of the population, 1 household in every 12 was. All else equal, a 1 percent increase in the percentage of children in a neighborhood is predicted to increase a neighborhood’s evictions by almost 7 percent. These estimates are based on records of court-ordered evictions that took place in Milwaukee County between January 1, 2010, and December 31, 2010. The statistical model evaluating the association between a neighborhood’s percentage of children and its number of evictions is a zero-inflated Poisson regression, which is described in detail in Matthew Desmond et al., “Evicting Children,” Social Forces 92 (2013): 303–27. 3. That misery could stick around. At least two years after their eviction, mothers like Arleen still experienced significantly higher rates of depression than their peers.
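A zero-inflated Poisson model treats each count as coming either from a process that always yields zero (with probability π) or from an ordinary Poisson distribution with mean λ. In regression form, λ is usually tied to covariates through a log link, so a coefficient β multiplies the expected count by exp(β) per unit change in the covariate, provided the zero-inflation probability does not itself depend on that covariate. The sketch below shows only the distribution and that multiplicative reading; it is not the specification used in the cited study, and the function names and parameter values are hypothetical.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson with inflation probability pi and mean lam."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1 - pi) * poisson  # structural zeros plus Poisson zeros
    return (1 - pi) * poisson

def rate_ratio(beta, delta=1.0):
    """Multiplicative change in the expected Poisson count for a delta-unit
    covariate increase under a log link: exp(beta * delta)."""
    return math.exp(beta * delta)

# Hypothetical parameters: 30% structural zeros, Poisson mean 2.
print(zip_pmf(0, 2.0, 0.3), zip_pmf(1, 2.0, 0.3))
# A hypothetical coefficient of 0.05 implies roughly a 5% higher expected count per unit.
print(rate_ratio(0.05))
```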

pages: 624 words: 127,987

**
The Personal MBA: A World-Class Business Education in a Single Volume
** by
Josh Kaufman

Albert Einstein, Atul Gawande, Black Swan, business cycle, business process, buy low sell high, capital asset pricing model, Checklist Manifesto, cognitive bias, correlation does not imply causation, Credit Default Swap, Daniel Kahneman / Amos Tversky, David Heinemeier Hansson, David Ricardo: comparative advantage, Dean Kamen, delayed gratification, discounted cash flows, Donald Knuth, double entry bookkeeping, Douglas Hofstadter, en.wikipedia.org, Frederick Winslow Taylor, George Santayana, Gödel, Escher, Bach, high net worth, hindsight bias, index card, inventory management, iterative process, job satisfaction, Johann Wolfgang von Goethe, Kevin Kelly, Kickstarter, Lao Tzu, lateral thinking, loose coupling, loss aversion, Marc Andreessen, market bubble, Network effects, Parkinson's law, Paul Buchheit, Paul Graham, place-making, premature optimization, Ralph Waldo Emerson, rent control, side project, statistical model, stealth mode startup, Steve Jobs, Steve Wozniak, subscription business, telemarketer, the scientific method, time value of money, Toyota Production System, tulip mania, Upton Sinclair, Vilfredo Pareto, Walter Mischel, Y Combinator, Yogi Berra

The primary question is not whether attending a university is a positive experience: it’s whether or not the experience is worth the cost.9 2. MBA programs teach many worthless, outdated, even outright damaging concepts and practices—assuming your goal is to actually build a successful business and increase your net worth. Many of my MBA-holding readers and clients come to me after spending tens (sometimes hundreds) of thousands of dollars learning the ins and outs of complex financial formulas and statistical models, only to realize that their MBA program didn’t teach them how to start or improve a real, operating business. That’s a problem—graduating from business school does not guarantee having a useful working knowledge of business when you’re done, which is what you actually need to be successful. 3. MBA programs won’t guarantee you a high-paying job, let alone make you a skilled manager or leader with a shot at the executive suite.

…

Over time, managers and executives began using statistics and analysis to forecast the future, relying on databases and spreadsheets in much the same way ancient seers relied on tea leaves and goat entrails. The world itself is no less unpredictable or uncertain: as in the olden days, the signs only “prove” the biases and desires of the soothsayer. The complexity of financial transactions and the statistical models those transactions relied upon continued to grow until few practitioners fully understood how they worked or respected their limits. As Wired revealed in a February 2009 article, “Recipe for Disaster: The Formula That Killed Wall Street,” the inherent limitations of deified financial formulas such as the Black-Scholes option pricing model, the Gaussian copula function, and the capital asset pricing model (CAPM) played a major role in the tech bubble of 2000 and the housing market and derivatives shenanigans behind the 2008 recession.

pages: 504 words: 139,137

**
Efficiently Inefficient: How Smart Money Invests and Market Prices Are Determined
** by
Lasse Heje Pedersen

activist fund / activist shareholder / activist investor, algorithmic trading, Andrei Shleifer, asset allocation, backtesting, bank run, banking crisis, barriers to entry, Black-Scholes formula, Brownian motion, business cycle, buy and hold, buy low sell high, capital asset pricing model, commodity trading advisor, conceptual framework, corporate governance, credit crunch, Credit Default Swap, currency peg, David Ricardo: comparative advantage, declining real wages, discounted cash flows, diversification, diversified portfolio, Emanuel Derman, equity premium, Eugene Fama: efficient market hypothesis, fixed income, Flash crash, floating exchange rates, frictionless, frictionless market, Gordon Gekko, implied volatility, index arbitrage, index fund, interest rate swap, late capitalism, law of one price, Long Term Capital Management, margin call, market clearing, market design, market friction, merger arbitrage, money market fund, mortgage debt, Myron Scholes, New Journalism, paper trading, passive investing, price discovery process, price stability, purchasing power parity, quantitative easing, quantitative trading / quantitative finance, random walk, Renaissance Technologies, Richard Thaler, risk-adjusted returns, risk/return, Robert Shiller, selection bias, shareholder value, Sharpe ratio, short selling, sovereign wealth fund, statistical arbitrage, statistical model, stocks for the long run, stocks for the long term, survivorship bias, systematic trading, technology bubble, time value of money, total factor productivity, transaction costs, value at risk, Vanguard fund, yield curve, zero-coupon bond

However, volatility is not an appropriate measure of risk for strategies with an extreme crash risk. For instance, volatility does not capture well the risk of selling out-of-the-money options, a strategy with small positive returns on most days but infrequent large crashes. To compute the volatility of a large portfolio, hedge funds need to account for correlations across assets, which can be accomplished by simulating the overall portfolio or by using a statistical model such as a factor model. Another measure of risk is value-at-risk (VaR), which attempts to capture tail risk (non-normality). The VaR measures the maximum loss with a certain confidence, as seen in figure 4.1 below. For example, the VaR is the most that you can lose with a 95% or 99% confidence. For instance, a hedge fund has a one-day 95% VaR of $10 million if the probability of losing more than $10 million over one day is 5%. A simple way to estimate VaR is to line up past returns, sort them by magnitude, and find a return that has 5% worse days and 95% better days.
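The "line up past returns and sort them" recipe is known as historical-simulation VaR. Below is a minimal sketch, assuming daily returns expressed as fractions and a simple rule for picking the cutoff index (with 100 days at 95% confidence, five days are worse than the chosen return). The function name `historical_var` and the sample data are hypothetical, and other conventions (such as interpolating between sorted returns) give slightly different numbers.

```python
import math

def historical_var(returns, confidence=0.95, portfolio_value=1.0):
    """Historical-simulation VaR: the loss that roughly (1 - confidence) of past days exceeded."""
    if not returns:
        raise ValueError("need at least one return")
    ordered = sorted(returns)  # worst (most negative) returns first
    # Index such that about (1 - confidence) of days are worse; epsilon guards float error.
    k = math.floor((1 - confidence) * len(ordered) + 1e-9)
    k = min(k, len(ordered) - 1)
    return -ordered[k] * portfolio_value

# Hypothetical history of 100 daily returns from -5.0% to +4.9%.
returns = [(i - 50) / 1000 for i in range(100)]
print(historical_var(returns))                               # as a fraction of capital
print(historical_var(returns, portfolio_value=200_000_000))  # in dollars
```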

…

Intermediaries are always worried that the flows will continue against them. That part is invisible to them. The market demand might evolve as a wave builds up. The intermediary makes money when the wave subsides. Then the flows and equilibrium pricing are in the same direction. LHP: Or you might even short at a nickel cheap? MS: You might. Trend following is based on understanding macro developments and what governments are doing. Or they are based on statistical models of price movements. A positive up price tends to result in a positive up price. Here, however, it is not possible to determine whether the trend will continue. LHP: Why do spreads tend to widen during some periods of stress? MS: Well, capital becomes more scarce, both physical capital and human capital, in the sense that there isn’t enough time for intermediaries to understand what is happening in chaotic times.

pages: 186 words: 49,251

**
The Automatic Customer: Creating a Subscription Business in Any Industry
** by
John Warrillow

Airbnb, airport security, Amazon Web Services, asset allocation, barriers to entry, call centre, cloud computing, commoditize, David Heinemeier Hansson, discounted cash flows, high net worth, Jeff Bezos, Network effects, passive income, rolodex, sharing economy, side project, Silicon Valley, Silicon Valley startup, software as a service, statistical model, Steve Jobs, Stewart Brand, subscription business, telemarketer, time value of money, zero-sum game, Zipcar

But your true return is much greater because you have had $1,200 of your customer’s money—interest free—to invest in your business. You have taken on a risk in guaranteeing your customer’s roof replacement and need to be paid for placing that bet. The repair job could have cost you $3,000, and then you would have taken an underwriting loss of $1,800 ($1,200−$3,000). Calculating your risk is the primary challenge of running a peace-of-mind model company. Big insurance companies employ an army of actuaries who use statistical models to predict the likelihood of a claim being made. You don’t need to be quite so scientific. Instead, start by looking back at the last 20 roofs you’ve installed with a guarantee and figure out how many service calls you needed to make. That will give you a pretty good idea of the possible risk of offering a peace-of-mind subscription. Assuming you’re not an actuary and you didn’t get your doctorate in math from MIT, it’s probably a wise idea to go slow in leveraging the peace-of-mind subscription model.
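The look-back approach above amounts to a rough frequency estimate: divide service calls by guaranteed jobs, multiply by a typical repair cost, and compare the result with the fee. This is a minimal sketch, not an actuarial model; the function name and the figure of 3 service calls out of 20 roofs are hypothetical, while the $1,200 fee and $3,000 repair cost are taken from the example in the text.

```python
def peace_of_mind_estimate(service_calls, jobs, avg_call_cost, fee):
    """Rough per-subscriber underwriting check from your own job history."""
    if jobs <= 0:
        raise ValueError("jobs must be positive")
    claim_rate = service_calls / jobs            # share of guaranteed roofs needing a call
    expected_cost = claim_rate * avg_call_cost   # expected payout per subscriber
    margin = fee - expected_cost                 # underwriting profit (+) or loss (-)
    return claim_rate, expected_cost, margin

# Hypothetical history: 3 service calls across the last 20 guaranteed roofs.
print(peace_of_mind_estimate(3, 20, avg_call_cost=3_000, fee=1_200))
```

A single bad job can still produce the $1,800 loss described above; the estimate only says whether the fee covers claims on average, which is why a small sample of 20 roofs calls for caution.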

pages: 222 words: 53,317

**
Overcomplicated: Technology at the Limits of Comprehension
** by
Samuel Arbesman

algorithmic trading, Anton Chekhov, Apple II, Benoit Mandelbrot, citation needed, combinatorial explosion, Danny Hillis, David Brooks, digital map, discovery of the americas, en.wikipedia.org, Erik Brynjolfsson, Flash crash, friendly AI, game design, Google X / Alphabet X, Googley, HyperCard, Inbox Zero, Isaac Newton, iterative process, Kevin Kelly, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mandelbrot fractal, Minecraft, Netflix Prize, Nicholas Carr, Parkinson's law, Ray Kurzweil, recommendation engine, Richard Feynman, Richard Feynman: Challenger O-ring, Second Machine Age, self-driving car, software studies, statistical model, Steve Jobs, Steve Wozniak, Steven Pinker, Stewart Brand, superintelligent machines, Therac-25, Tyler Cowen: Great Stagnation, urban planning, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, Y2K

What techniques are used by experts: Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (Oxford, UK: Oxford University Press, 2014), 15. say, 99.9 percent of the time: I made these numbers up for effect, but if any linguist wants to chat, please reach out! “based on millions of specific features”: Alon Halevy et al., “The Unreasonable Effectiveness of Data,” IEEE Intelligent Systems 24, no. 2 (2009): 8–12. In some ways, these statistical models are actually simpler than those that start from seemingly more elegant rules, because the latter end up being complicated by exceptions. sophisticated machine learning techniques: See Douglas Heaven, “Higher State of Mind,” New Scientist 219 (August 10, 2013), 32–35, available online (under the title “Not Like Us: Artificial Minds We Can’t Understand”): http://complex.elte.hu/~csabai/simulationLab/AI_08_August_2013_New_Scientist.pdf.

**
Beautiful Data: The Stories Behind Elegant Data Solutions
** by
Toby Segaran,
Jeff Hammerbacher

23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, database schema, double helix, en.wikipedia.org, epigenetics, fault tolerance, Firefox, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, longitudinal study, Mars Rover, natural language processing, openstreetmap, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social graph, SPARQL, speech recognition, statistical model, supply-chain management, text mining, Vernor Vinge, web application

Although this is a fairly simple application, it highlights the distributed nature of the solution, combining open data with free visualization methods from multiple sources. More importantly, the distributed nature of the system and free accessibility of the data allow experts in different domains—experimentalists generating data, software developers creating interfaces, and computational modelers creating statistical models—to easily couple their expertise. The true promise of open data, open services, and the ecosystem that supports them is that this coupling can occur without requiring any formal collaboration. Researchers will find and use the data in ways that the generators of that data never considered. By doing this they add value to the original data set and strengthen the ecosystem around it, whether they are performing complementary experiments, doing new analyses, or providing new services that process the data.

…

We try to apply the following template:
• “Figure X shows…”
• “Each point (or line) in the graph represents…”
• “The separate graphs indicate…”
• “Before making this graph, we did…which didn’t work, because…”
• “A natural extension would be…”

We do not have a full theory of statistical graphics—our closest attempt is to link exploratory graphical displays to checking the fit of statistical models (Gelman 2003)—but we hope that this small bit of structure can help readers in their own efforts. We think of our graphs not as beautiful standalone artifacts but rather as tools to help us understand beautiful reality. We illustrate using examples from our own work, not because our graphs are particularly beautiful, but because in these cases we know the story behind each plot.

Example 1: Redistricting and Partisan Bias

Figure 19-1 shows the estimated effect on partisan bias from redistricting (redrawing of the lines dividing the districts from which legislators get elected).

pages: 566 words: 155,428

**
After the Music Stopped: The Financial Crisis, the Response, and the Work Ahead
** by
Alan S. Blinder

"Robert Solow", Affordable Care Act / Obamacare, asset-backed security, bank run, banking crisis, banks create money, break the buck, Carmen Reinhart, central bank independence, collapse of Lehman Brothers, collateralized debt obligation, conceptual framework, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, Detroit bankruptcy, diversification, double entry bookkeeping, eurozone crisis, facts on the ground, financial innovation, fixed income, friendly fire, full employment, hiring and firing, housing crisis, Hyman Minsky, illegal immigration, inflation targeting, interest rate swap, Isaac Newton, Kenneth Rogoff, liquidity trap, London Interbank Offered Rate, Long Term Capital Management, market bubble, market clearing, market fundamentalism, McMansion, money market fund, moral hazard, naked short selling, new economy, Nick Leeson, Northern Rock, Occupy movement, offshore financial centre, price mechanism, quantitative easing, Ralph Waldo Emerson, Robert Shiller, Robert Shiller, Ronald Reagan, shareholder value, short selling, South Sea Bubble, statistical model, the payments system, time value of money, too big to fail, working-age population, yield curve, Yogi Berra

As we will see later, these tests were phenomenally successful.* And there was more. To date, there have been precious few studies of the broader effects of this grab bag of financial-market policies. The only one I know of that even attempts to estimate the macroeconomic impacts of the entire potpourri was published in July 2010 by Mark Zandi and me. Our methodology was pretty simple—and very standard. Take a statistical model of the U.S. economy—we used the Moody’s Analytics model—and simulate it both with and without the policies. The differences between the two simulations are then estimates of the effects of the policies. These estimates, of course, are only as good as the model, but ours were huge. By 2011, we estimated, real GDP was about 6 percent higher, the unemployment rate was nearly 3 percentage points lower, and 4.8 million more Americans were employed because of the financial-market policies (as compared with sticking with laissez-faire).

…

The standard analysis of conventional monetary policy—what we teach in textbooks and what central bankers are raised on—is predicated, roughly speaking, on constant risk spreads. When the Federal Reserve lowers riskless interest rates, like those on federal funds and T-bills, riskier interest rates, like those on corporate lending and auto loans, are supposed to follow suit.* The history on which we economists base our statistical models looks like that. Figure 9.1 shows the behavior of the interest rates on 10-year Treasuries (the lower line) and Moody’s Baa corporate bonds (the upper line) over the period from January 1980 through June 2007, just before the crisis got started. The spread between these two rates is the vertical distance between the two lines, and the fact that they look roughly parallel means that the spread did not change much over those twenty-seven years.

pages: 517 words: 147,591

**
Small Wars, Big Data: The Information Revolution in Modern Conflict
** by
Eli Berman,
Joseph H. Felter,
Jacob N. Shapiro,
Vestal Mcintyre

basic income, call centre, centre right, clean water, crowdsourcing, demand response, drone strike, experimental economics, failed state, George Akerlof, Google Earth, HESCO bastion, income inequality, income per capita, information asymmetry, Internet of things, iterative process, land reform, mandatory minimum, minimum wage unemployment, moral hazard, natural language processing, RAND corporation, randomized controlled trial, Ronald Reagan, school vouchers, statistical model, the scientific method, trade route, unemployed young men, WikiLeaks, World Values Survey

He found that experiencing an indiscriminate attack was associated with a more than 50 percent decrease in the rate of insurgent attacks in a village—which amounts to a 24.2 percent reduction relative to the average.59 Furthermore, the correlation between the destructiveness of the random shelling and subsequent insurgent violence from that village was either negative or statistically insignificant, depending on the exact statistical model.60 While it’s not clear how civilians subject to these attacks interpreted them, what is clear is that in this case objectively indiscriminate violence by the government reduced local insurgent activity. Both of these studies are of asymmetric conflicts, and while the settings differ in important ways, each provides evidence that is not obviously consistent with the model. When we look deeper, however, we believe that both are consistent with the arguments about the use of suppressive force in our model.

…

The placement of training centers in villages was randomized to measure the effect on take-up of training, and that randomization also enabled Jake and his coauthors to assess the effect on attitudes and beliefs. Looking at subsequent village council elections, villages that had the training centers installed were much more likely to have a candidate from the PMLN place in the top two positions. The odds of a PMLN candidate either winning or being runner-up rose by 10 to 20 percentage points (depending on the statistical model). While other studies have shown that provision of public goods can sway attitudes, the effect is not usually so large. Remember, the training was funded and was going to be provided anyway. On the other hand, villages where vouchers were distributed for training elsewhere—making them less useful to men and virtually unusable by women—saw no increased support for the PMLN. It looks like simply providing a public good can pay huge political dividends, but it must be done in a smart way, applicable to people’s needs and conforming to people’s customs.

pages: 207 words: 57,959

**
Little Bets: How Breakthrough Ideas Emerge From Small Discoveries
** by
Peter Sims

Amazon Web Services, Black Swan, Clayton Christensen, complexity theory, David Heinemeier Hansson, deliberate practice, discovery of penicillin, endowment effect, fear of failure, Frank Gehry, Guggenheim Bilbao, Jeff Bezos, knowledge economy, lateral thinking, Lean Startup, longitudinal study, loss aversion, meta analysis, meta-analysis, PageRank, Richard Florida, Richard Thaler, Ruby on Rails, Silicon Valley, statistical model, Steve Ballmer, Steve Jobs, Steve Wozniak, theory of mind, Toyota Production System, urban planning, Wall-E

One of the men in charge of U.S. strategy in the war for many years was Robert McNamara, secretary of defense under Presidents Kennedy and Johnson. McNamara was known for his enormous intellect, renowned for achievements at Ford Motor Company (where he was once president) and in government. Many considered him the best management mind of his era. During World War II, McNamara had gained acclaim for developing statistical models to optimize the destruction from bombing operations over Japan. The challenge of Vietnam, however, proved to be different in ways that exposed the limits of McNamara’s approach. McNamara assumed that increased bombing in Vietnam would reduce the Viet Cong resistance with some degree of proportionality, but it did not. There wasn’t a linear cause and effect relationship. The Viet Cong kept shifting its positions and strategies (including using extensive tunnels), but mostly proved far more resilient than McNamara and the other U.S. planners expected.

pages: 190 words: 62,941

**
Wild Ride: Inside Uber's Quest for World Domination
** by
Adam Lashinsky

"side hustle", Airbnb, always be closing, Amazon Web Services, autonomous vehicles, Ayatollah Khomeini, business process, Chuck Templeton: OpenTable:, cognitive dissonance, corporate governance, DARPA: Urban Challenge, Donald Trump, Elon Musk, gig economy, Golden Gate Park, Google X / Alphabet X, information retrieval, Jeff Bezos, Lyft, Marc Andreessen, Mark Zuckerberg, megacity, Menlo Park, new economy, pattern recognition, price mechanism, ride hailing / ride sharing, Sand Hill Road, self-driving car, Silicon Valley, Silicon Valley startup, Skype, Snapchat, South of Market, San Francisco, sovereign wealth fund, statistical model, Steve Jobs, TaskRabbit, Tony Hsieh, transportation-network company, Travis Kalanick, turn-by-turn navigation, Uber and Lyft, Uber for X, uber lyft, ubercab, young professional

Uber already operated in New York and planned to launch shortly in Seattle, Washington, D.C., Boston, and Chicago. Kalanick bragged about the advanced math that went into Uber’s calculation of when riders should expect their cars to show up. Uber’s “math department,” as he called it, included a computational statistician, a rocket scientist, and a nuclear physicist. They were running, he informed me, a Gaussian process emulation—a fancy statistical model—to improve on data available from Google’s mapping products. “Our estimates are far superior to Google’s,” Kalanick said. I was witnessing for the first time the cocksure Kalanick. I told him I had an idea for a market for Uber. I had recently sent a babysitter home in an Uber, a wonderful convenience because I could pay with my credit card from Uber’s app and then monitor the car’s progress on my phone to make sure the sitter got home safely.

pages: 219 words: 63,495

**50 Future Ideas You Really Need to Know** by Richard Watson

23andMe, 3D printing, access to a mobile phone, Albert Einstein, artificial general intelligence, augmented reality, autonomous vehicles, BRICs, Buckminster Fuller, call centre, clean water, cloud computing, collaborative consumption, computer age, computer vision, crowdsourcing, dark matter, dematerialisation, digital Maoism, digital map, Elon Musk, energy security, failed state, future of work, Geoffrey West, Santa Fe Institute, germ theory of disease, global pandemic, happiness index / gross national happiness, hive mind, hydrogen economy, Internet of things, Jaron Lanier, life extension, Mark Shuttleworth, Marshall McLuhan, megacity, natural language processing, Network effects, new economy, oil shale / tar sands, pattern recognition, peak oil, personalized medicine, phenotype, precision agriculture, profit maximization, RAND corporation, Ray Kurzweil, RFID, Richard Florida, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Skype, smart cities, smart meter, smart transportation, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, supervolcano, telepresence, The Wisdom of Crowds, Thomas Malthus, Turing test, urban decay, Vernor Vinge, Watson beat the top human players on Jeopardy!, web application, women in the workforce, working-age population, young professional

Link all this to new imaging technologies, remote monitoring, medical smartcards, e-records and even gamification. One day, we may, for example, develop a tiny chip that can hold the full medical history of a person, including any medical conditions, allergies, prescriptions and contact information (this is already planned in America). The card could feature a picture ID and hours of video content, such as X-rays or moving medical imagery.

Digital vacuums

Digital vacuuming refers to the practice of scooping up vast amounts of data, then using mathematical and statistical models to determine content and possible linkages. The data itself can be anything from phone calls, historical or in real time (the US company AT&T, for example, holds the records of 1.9 trillion telephone calls), to financial transactions, emails and Internet site visits. Commercial applications could range from future health risks to counterterrorism.

pages: 256 words: 60,620

**Think Twice: Harnessing the Power of Counterintuition** by Michael J. Mauboussin

affirmative action, asset allocation, Atul Gawande, availability heuristic, Benoit Mandelbrot, Bernie Madoff, Black Swan, butter production in bangladesh, Cass Sunstein, choice architecture, Clayton Christensen, cognitive dissonance, collateralized debt obligation, Daniel Kahneman / Amos Tversky, deliberate practice, disruptive innovation, Edward Thorp, experimental economics, financial innovation, framing effect, fundamental attribution error, Geoffrey West, Santa Fe Institute, George Akerlof, hindsight bias, hiring and firing, information asymmetry, libertarian paternalism, Long Term Capital Management, loose coupling, loss aversion, mandelbrot fractal, Menlo Park, meta analysis, meta-analysis, money market fund, Murray Gell-Mann, Netflix Prize, pattern recognition, Philip Mirowski, placebo effect, Ponzi scheme, prediction markets, presumed consent, Richard Thaler, Robert Shiller, statistical model, Steven Pinker, The Wisdom of Crowds, ultimatum game

Second, Cinematch, or whatever program ultimately unseats it, is vastly better than the video-store employee in New York City.15 The night-and-day contrast between the quality of advice from Netflix’s algorithms and the local video-store clerk illustrates this chapter’s first decision mistake: using experts instead of mathematical models. This mistake, I admit, is hard to swallow and is a direct affront to experts of all stripes. But it is also among the best documented findings in the social sciences. In 1954, Paul Meehl, a psychologist at the University of Minnesota, published a book that reviewed studies comparing the clinical judgment of experts (psychologists and psychiatrists) with linear statistical models. He made sure the analysis was done carefully so he could be confident that the comparisons were fair. In study after study, the statistical methods exceeded or matched the expert performance.16 More recently, Philip Tetlock, a psychologist at the University of California, Berkeley, completed an exhaustive study of expert predictions, including twenty-eight thousand forecasts made by three hundred experts hailing from sixty countries over fifteen years.
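The passage does not say which variables Meehl's studies used, so the following is only a minimal sketch of what a linear statistical model of this kind looks like: a weighted sum of inputs, with the weights fitted by ordinary least squares using NumPy. The function names, the two predictor columns, and the data are hypothetical.

```python
import numpy as np

def design_matrix(features):
    """Prepend a column of ones so the first coefficient is the intercept."""
    X = np.asarray(features, dtype=float)
    return np.column_stack([np.ones(len(X)), X])

def fit_linear_model(features, outcomes):
    """Ordinary least-squares weights: outcome ≈ b0 + b1*x1 + b2*x2 + ..."""
    y = np.asarray(outcomes, dtype=float)
    coef, *_ = np.linalg.lstsq(design_matrix(features), y, rcond=None)
    return coef

def predict(coef, features):
    """Apply the same fitted weights to every new case."""
    return design_matrix(features) @ coef

# Hypothetical cases: two numeric predictors and an outcome.
cases = [[1, 2], [2, 0], [3, 1], [4, 3], [5, 5]]
outcomes = [1 + 2 * a - 0.5 * b for a, b in cases]
coef = fit_linear_model(cases, outcomes)
print(coef)                      # approximately [1.0, 2.0, -0.5]
print(predict(coef, [[6, 2]]))   # approximately [12.0]
```

Once fitted, the model applies exactly the same weights to every new case.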

**Logically Fallacious: The Ultimate Collection of Over 300 Logical Fallacies (Academic Edition)** by Bo Bennett

Black Swan, butterfly effect, clean water, cognitive bias, correlation does not imply causation, Donald Trump, equal pay for equal work, Richard Feynman, side project, statistical model, the scientific method

Exception: Of course, there is no clear line between situations that call for critical thought and those that call for reactionary obedience. But if you cross the line, hopefully you are with people who care about you enough to tell you. Tip: People don’t like to be made to feel inferior. You need to know when showing tact and restraint is more important than being right.

Ludic Fallacy (ludus)

Description: Assuming that flawless statistical models apply to situations where they actually don’t. This can result in overconfidence in probability theory, or simply in not knowing exactly where it applies, as opposed to chaotic situations or situations with external influences too subtle or numerous to predict. Example #1: The best example of this fallacy is presented by the person who coined the term, Nassim Nicholas Taleb, in his 2007 book, The Black Swan.

pages: 504 words: 89,238

**Natural Language Processing with Python** by Steven Bird, Ewan Klein, Edward Loper

bioinformatics, business intelligence, conceptual framework, Donald Knuth, elephant in my pajamas, en.wikipedia.org, finite state, Firefox, Guido van Rossum, information retrieval, Menlo Park, natural language processing, P = NP, search inside the book, speech recognition, statistical model, text mining, Turing test

Structure of the published TIMIT Corpus: The CD-ROM contains doc, train, and test directories at the top level; the train and test directories both have eight subdirectories, one per dialect region; each of these contains further subdirectories, one per speaker; the contents of the directory for female speaker aks0 are listed, showing 10 wav files accompanied by a text transcription, a word-aligned transcription, and a phonetic transcription.

…there is a split between training and testing sets, which gives away its intended use for developing and evaluating statistical models. Finally, notice that even though TIMIT is a speech corpus, its transcriptions and associated data are just text, and can be processed using programs just like any other text corpus. Therefore, many of the computational methods described in this book are applicable. Moreover, notice that all of the data types included in the TIMIT Corpus fall into the two basic categories of lexicon and text, which we will discuss later.

…

For example, one intermediate position is to assume that humans are innately endowed with analogical and memory-based learning methods (weak rationalism), and use these methods to identify meaningful patterns in their sensory language experience (empiricism). We have seen many examples of this methodology throughout this book. Statistical methods inform symbolic models anytime corpus statistics guide the selection of productions in a context-free grammar, i.e., “grammar engineering.” Symbolic methods inform statistical models anytime a corpus that was created using rule-based methods is used as a source of features for training a statistical language model, i.e., “grammatical inference.” The circle is closed. NLTK Roadmap The Natural Language Toolkit is a work in progress, and is being continually expanded as people contribute code. Some areas of NLP and linguistics are not (yet) well supported in NLTK, and contributions in these areas are especially welcome.

**Debtor Nation: The History of America in Red Ink (Politics and Society in Modern America)** by Louis Hyman

asset-backed security, bank run, barriers to entry, Bretton Woods, business cycle, card file, central bank independence, computer age, corporate governance, credit crunch, declining real wages, deindustrialization, diversified portfolio, financial independence, financial innovation, fixed income, Gini coefficient, Home mortgage interest deduction, housing crisis, income inequality, invisible hand, late fees, London Interbank Offered Rate, market fundamentalism, means of production, mortgage debt, mortgage tax deduction, p-value, pattern recognition, profit maximization, profit motive, risk/return, Ronald Reagan, Silicon Valley, statistical model, technology bubble, the built environment, transaction costs, union organizing, white flight, women in the workforce, working poor, zero-sum game

Applications became more consistent and less subject to the whims of a particular loan officer. In computer models, feminist credit advocates believed they had found the solution to discriminatory lending, ushering in the contemporary calculated credit regimes under which we live today. Yet removing such basic demographics from any model was not as straightforward as the authors of the ECOA had hoped, because of how all statistical models function, a point legislators seem not to have fully understood. The “objective” credit statistics that legislators had pined for during the early investigations of the Consumer Credit Protection Act could now exist, but with new difficulties that stemmed from using regressions rather than human judgment to decide on loans. In human-judged credit lending, a loan officer who knew the race and gender of an applicant would be more discriminatory, whereas in a computer credit model, knowing the applicant’s race and gender allowed the credit decision to be less discriminatory.

…

The higher the level of education and income, the lower the effective interest rate paid, since such users tended more frequently to be non-revolvers.96 The researchers found that young, large, low-income families who could not save for major purchases paid finance charges, while their opposites, older, smaller, high-income families who could save for major purchases, did not. Effectively, the young and poor cardholders subsidized the convenience of the old and rich.97 And white.98 The new statistical models revealed that the second-best predictor of revolving debt, after a respondent’s own “self-evaluation of his or her ability to save,” was race.99 But what these models revealed was that the very group (African Americans) that the politicians wanted to increase credit access to tended to revolve their credit more than otherwise similar white borrowers. Though federal laws prevented businesses from using race in their lending decisions, academics were free to examine race as a credit model would and found that, even after adjusting for income and other demographics, race was still the second strongest predictive factor.

pages: 632 words: 166,729

**Addiction by Design: Machine Gambling in Las Vegas** by Natasha Dow Schüll

airport security, Albert Einstein, Build a better mousetrap, business intelligence, capital controls, cashless society, commoditize, corporate social responsibility, deindustrialization, dematerialisation, deskilling, game design, impulse control, information asymmetry, inventory management, iterative process, jitney, large denomination, late capitalism, late fees, longitudinal study, means of production, meta analysis, meta-analysis, Nash equilibrium, Panopticon Jeremy Bentham, post-industrial society, postindustrial economy, profit motive, RFID, Silicon Valley, Slavoj Žižek, statistical model, the built environment, yield curve, zero-sum game

asked a Harrah’s executive at G2E in 2008.38 “What is the order of value of that player to me?” echoed Bally’s Rowe.39 Using statistical modeling, casinos “tier” players based on different parameters, assigning each a “customer value” or “theoretical player value”—a value, that is, based on the theoretical revenue they are likely to generate. On a panel called “Patron Rating: The New Definition of Customer Value,” one specialist shared his system for gauging patron worth, recommending that casinos give each customer a “recency score” (how recently he has visited), a “frequency score” (how often he visits), and a “monetary score” (how much he spends), and then create a personalized marketing algorithm out of these variables.40 “We want to maximize every relationship,” Harrah’s Richard Mirman told a journalist.41 Harrah’s statistical models for determining player value, similar to those used for predicting stocks’ future worth, are the most advanced in the industry.
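The panelist's formula for combining the recency, frequency, and monetary scores is not given. As a minimal sketch under stated assumptions, the Python code below scores each patron from 1 to 5 on each dimension by rank (fewer days since the last visit scores higher; more visits and more spending score higher) and simply sums the three scores. The 1-to-5 scale, rank-based binning, equal weighting, the `Patron` fields, and the sample patrons are all hypothetical choices.

```python
from dataclasses import dataclass

@dataclass
class Patron:
    name: str
    days_since_visit: int   # recency: fewer days is better
    visits_per_year: int    # frequency: more is better
    annual_spend: float     # monetary: more is better

def rank_scores(values, higher_is_better=True, bins=5):
    """Score each value 1..bins by its rank among all values (ties keep input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 1 + rank * bins // n   # worst rank -> 1, best rank -> bins
    return scores

def rfm_scores(patrons):
    """Return {name: (recency, frequency, monetary, total)} for each patron."""
    r = rank_scores([p.days_since_visit for p in patrons], higher_is_better=False)
    f = rank_scores([p.visits_per_year for p in patrons])
    m = rank_scores([p.annual_spend for p in patrons])
    return {p.name: (r[i], f[i], m[i], r[i] + f[i] + m[i]) for i, p in enumerate(patrons)}

patrons = [
    Patron("A", 2, 40, 9000.0),
    Patron("B", 30, 12, 2500.0),
    Patron("C", 90, 3, 400.0),
    Patron("D", 7, 25, 1200.0),
    Patron("E", 180, 1, 150.0),
]
scores = rfm_scores(patrons)
print(scores["A"])  # → (5, 5, 5, 15)
```

A casino would presumably weight the three scores differently or feed them into a larger model; the unweighted sum is only the simplest way to combine them.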

pages: 204 words: 67,922

**Elsewhere, U.S.A: How We Got From the Company Man, Family Dinners, and the Affluent Society to the Home Office, BlackBerry Moms, and Economic Anxiety** by Dalton Conley

assortative mating, call centre, clean water, commoditize, dematerialisation, demographic transition, Edward Glaeser, extreme commuting, feminist movement, financial independence, Firefox, Frank Levy and Richard Murnane: The New Division of Labor, Home mortgage interest deduction, income inequality, informal economy, Jane Jacobs, Joan Didion, John Maynard Keynes: Economic Possibilities for our Grandchildren, knowledge economy, knowledge worker, labor-force participation, late capitalism, low skilled workers, manufacturing employment, mass immigration, McMansion, mortgage tax deduction, new economy, off grid, oil shock, PageRank, Ponzi scheme, positional goods, post-industrial society, post-materialism, principal–agent problem, recommendation engine, Richard Florida, rolodex, Ronald Reagan, Silicon Valley, Skype, statistical model, The Death and Life of Great American Cities, The Great Moderation, The Wealth of Nations by Adam Smith, Thomas Malthus, Thorstein Veblen, transaction costs, women in the workforce, Yom Kippur War

And how much should Amex have paid for this privilege? Should they have gotten a discount since the first word of their brand is also the first word of American Airlines and thereby reinforces—albeit in a subtle way—the host company’s image? In order to know the value of the deal, they would have had to know how much the marketing campaign increases their business. Impossible. No focus group or statistical model will tell Amex how much worse or better their bottom line would have been in the absence of this marketing campaign. Ditto for the impact of billboards, product placement, and special promotions like airline mileage plans. There are simply too many other forces that come into play to be able to isolate the impact of a specific effort. Ditto for most of the symbolic economy. It is ironic that in this age of markets and seemingly limitless information, we can’t get the very answers we need to make rational business decisions.

**Exploring Everyday Things with R and Ruby** by Sau Sheong Chang

Alfred Russel Wallace, bioinformatics, business process, butterfly effect, cloud computing, Craig Reynolds: boids flock, Debian, Edward Lorenz: Chaos theory, Gini coefficient, income inequality, invisible hand, p-value, price stability, Ruby on Rails, Skype, statistical model, stem cell, Stephen Hawking, text mining, The Wealth of Nations by Adam Smith, We are the 99%, web application, wikimedia commons

The default method for a smooth geom in ggplot2 is the LOESS algorithm, which is suitable for a small number of data points. LOESS is not suitable for a large number of data points, however, because its memory use scales as O(n²), so instead we use the mgcv library and its gam method. We also pass in the formula y~s(x), where s is the smoother function for GAM. GAM stands for generalized additive model, a statistical model used to describe how items of data relate to each other. In our case, we use GAM as the smoothing algorithm to get a reasonably good estimate of how a large number of data points can be visualized. In Figure 8-5, you can see that the population of roids fluctuates over time between two extremes, caused by the oversupply and exhaustion of food, respectively.
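The O(n²) memory cost comes from comparing every point with every other point. The Python sketch below is a simplified LOESS (local linear fits with tricube weights, no robustness iterations), not the implementation ggplot2 calls; it builds the full n-by-n distance matrix explicitly so the quadratic term is visible. The function name and the `span` default are assumptions, and the sketch assumes the x values are not heavily duplicated.

```python
import numpy as np

def loess(x, y, span=0.3):
    """Local linear LOESS fit evaluated at each x, with tricube weights.

    Simplified: degree-1 local fits, no robustness iterations. The full
    n-by-n distance matrix makes memory grow as O(n^2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(span * n))))       # points per neighbourhood
    dist = np.abs(x[:, None] - x[None, :])            # n x n: the quadratic-memory step
    h = np.sort(dist, axis=1)[:, k - 1]               # radius reaching the k-th nearest point
    w = np.clip(1.0 - (dist / h[:, None]) ** 3, 0.0, None) ** 3  # tricube weights
    fitted = np.empty(n)
    for i in range(n):
        X = np.column_stack([np.ones(n), x - x[i]])   # local line centred on x[i]
        XtW = X.T * w[i]                              # weight each observation
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        fitted[i] = beta[0]                           # intercept = fitted value at x[i]
    return fitted

x = np.linspace(0, 10, 200)
rng = np.random.default_rng(0)
noisy = np.sin(x) + rng.normal(scale=0.3, size=x.size)
smooth = loess(x, noisy)
```

Because each local fit is a straight line, perfectly linear input comes back unchanged. The distance matrix is the bottleneck: for 100,000 points it holds 10^10 float64 values, roughly 80 GB, which is why the text switches to mgcv's gam for large data.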

pages: 239 words: 70,206

**Data-Ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else** by Steve Lohr

"Robert Solow", 23andMe, Affordable Care Act / Obamacare, Albert Einstein, big data - Walmart - Pop Tarts, bioinformatics, business cycle, business intelligence, call centre, cloud computing, computer age, conceptual framework, Credit Default Swap, crowdsourcing, Daniel Kahneman / Amos Tversky, Danny Hillis, data is the new oil, David Brooks, East Village, Edward Snowden, Emanuel Derman, Erik Brynjolfsson, everywhere but in the productivity statistics, Frederick Winslow Taylor, Google Glasses, impulse control, income inequality, indoor plumbing, industrial robot, informal economy, Internet of things, invention of writing, Johannes Kepler, John Markoff, John von Neumann, lifelogging, Mark Zuckerberg, market bubble, meta analysis, meta-analysis, money market fund, natural language processing, obamacare, pattern recognition, payday loans, personalized medicine, precision agriculture, pre–internet, Productivity paradox, RAND corporation, rising living standards, Robert Gordon, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, speech recognition, statistical model, Steve Jobs, Steven Levy, The Design of Experiments, the scientific method, Thomas Kuhn: the structure of scientific revolutions, unbanked and underbanked, underbanked, Von Neumann architecture, Watson beat the top human players on Jeopardy!

Cleveland, then a researcher at Bell Labs, wrote a paper he called an “action plan” for essentially redefining statistics as an engineering task. “The altered field,” he wrote, “will be called ‘data science.’” In his paper, Cleveland, who is now a professor of statistics and computer science at Purdue University, described the contours of this new field. Data science, he said, would touch all disciplines of study and require the development of new statistical models, new computing tools, and educational programs in schools and corporations. Cleveland’s vision of a new field is now rapidly gaining momentum. The federal government, universities, and foundations are funding data science initiatives. Nearly all of these efforts are multidisciplinary melting pots that seek to bring together teams of computer scientists, statisticians, and mathematicians with experts who bring piles of data and unanswered questions from biology, astronomy, business and finance, public health, and elsewhere.

pages: 242 words: 68,019

**Why Information Grows: The Evolution of Order, From Atoms to Economies** by Cesar Hidalgo

"Robert Solow", Ada Lovelace, Albert Einstein, Arthur Eddington, assortative mating, business cycle, Claude Shannon: information theory, David Ricardo: comparative advantage, Douglas Hofstadter, Everything should be made as simple as possible, frictionless, frictionless market, George Akerlof, Gödel, Escher, Bach, income inequality, income per capita, industrial cluster, information asymmetry, invention of the telegraph, invisible hand, Isaac Newton, James Watt: steam engine, Jane Jacobs, job satisfaction, John von Neumann, Joi Ito, New Economic Geography, Norbert Wiener, p-value, Paul Samuelson, phenotype, price mechanism, Richard Florida, Ronald Coase, Rubik’s Cube, Silicon Valley, Simon Kuznets, Skype, statistical model, Steve Jobs, Steve Wozniak, Steven Pinker, The Market for Lemons, The Nature of the Firm, The Wealth of Nations by Adam Smith, total factor productivity, transaction costs, working-age population

GDP considers the production of goods and services within a country. GNP considers the goods and services produced by the citizens of a country, whether or not those goods are produced within the boundaries of the country. 5. Simon Kuznets, “Modern Economic Growth: Findings and Reflections,” American Economic Review 63, no. 3 (1973): 247–258. 6. Technically, total factor productivity is the residual or error term of the statistical model. Also, economists often refer to total factor productivity as technology, although this is a semantic deformation that is orthogonal to the definition of technology used by anyone who has ever developed a technology. In the language of economics, technology is the ability to do more—of anything—with the same cost. For inventors of technology, technology is the ability to do something completely new, which often involves the development of a new capacity.
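To make "residual" concrete, here is the standard growth-accounting form of the idea, assuming a Cobb-Douglas production function Y = A K^α L^(1−α) (an assumption; the note does not name a functional form). TFP growth is whatever output growth remains after subtracting the share-weighted growth of capital and labor; the numbers in the second line are purely illustrative.

```latex
\frac{\Delta A}{A} \approx \frac{\Delta Y}{Y} - \alpha\,\frac{\Delta K}{K} - (1-\alpha)\,\frac{\Delta L}{L}

\frac{\Delta A}{A} \approx 3\% - 0.3 \times 4\% - 0.7 \times 1\% = 3\% - 1.2\% - 0.7\% = 1.1\%
```

In a regression setting the same logic applies: TFP is the part of output that the capital and labor terms leave unexplained.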

pages: 305 words: 69,216

**A Failure of Capitalism: The Crisis of '08 and the Descent Into Depression** by Richard A. Posner

Andrei Shleifer, banking crisis, Bernie Madoff, business cycle, collateralized debt obligation, collective bargaining, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, debt deflation, diversified portfolio, equity premium, financial deregulation, financial intermediation, Home mortgage interest deduction, illegal immigration, laissez-faire capitalism, Long Term Capital Management, market bubble, money market fund, moral hazard, mortgage debt, Myron Scholes, oil shock, Ponzi scheme, price stability, profit maximization, race to the bottom, reserve currency, risk tolerance, risk/return, Robert Shiller, savings glut, shareholder value, short selling, statistical model, too big to fail, transaction costs, very high income

Marketers to Americans (as distinct from Japanese) have had greater success appealing to the first set of motives than to the second. Quantitative models of risk, another fulfillment of Weber's prophecy that more and more activities would be brought under the rule of rationality, are also being blamed for the financial crisis. Suppose a trader is contemplating the purchase of a stock using largely borrowed money, so that if the stock falls even a little way the loss will be great. He might consult a statistical model that predicted, on the basis of the ups and downs of the stock in the preceding two years, the probability distribution of the stock's behavior over the next few days or weeks. The criticism is that the model would have based the prediction on market behavior during a period of rising stock values; the modeler should have gone back to the 1980s or earlier to get a fuller picture of the riskiness of the stock.
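The trader's model described here resembles what is usually called historical simulation: take the empirical distribution of past returns and read off a bad-case loss. The Python sketch below assumes that method (Posner does not name one), a 5-day horizon, and a 99 percent confidence level; the function name and price series are hypothetical. It also shows the criticism in miniature, since a history made only of rising prices yields no loss at all.

```python
import numpy as np

def historical_var(prices, horizon_days=5, confidence=0.99):
    """Historical-simulation Value at Risk, as a positive fractional loss.

    Uses every overlapping horizon-day return in the price history. A negative
    result means even the worst (1 - confidence) share of past windows were gains.
    """
    prices = np.asarray(prices, dtype=float)
    returns = prices[horizon_days:] / prices[:-horizon_days] - 1.0
    return -np.quantile(returns, 1.0 - confidence)

# Hypothetical histories: a steady rise only, versus a decline followed by the same rise.
bull_only = 100 * 1.001 ** np.arange(500)
bear = 100 * 0.99 ** np.arange(250)
longer = np.concatenate([bear, bear[-1] * 1.001 ** np.arange(1, 501)])
print(historical_var(bull_only))  # negative: the window contains no losses
print(historical_var(longer))     # about 0.049: a 4.9% five-day loss
```

Leverage magnifies whatever the model reports: at 10-to-1 borrowing, a 5 percent fall in the stock erases half of the trader's own capital.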

**Once the American Dream: Inner-Ring Suburbs of the Metropolitan United States** by Bernadette Hanlon

big-box store, correlation coefficient, deindustrialization, desegregation, edge city, feminist movement, housing crisis, illegal immigration, informal economy, longitudinal study, low skilled workers, low-wage service sector, manufacturing employment, McMansion, New Urbanism, Silicon Valley, statistical model, The Chicago School, transit-oriented development, urban sprawl, white flight, working-age population, zero-sum game

He is one of the first scholars to develop a causal model of suburban social-status change. In this study, he suggests the role of population growth was somewhat exaggerated but finds other characteristics much more pertinent. Aside from population growth, he includes the variables of suburban age, initial suburban status levels, the suburbs’ geographic locations, suburban racial makeup, and employment specialization in his statistical model. He finds (1979: 946) that suburban age, the percentage of black inhabitants, and employment specialization within a suburb affected its then-current status (in 1970) “inasmuch as they also affected earlier (1960) status levels.” He describes how a suburb’s initial, established “ecological niche” was a great determinant of its future status. Using the examples of Hammond, Indiana, and Evanston, Illinois, he states, “The ecological niches occupied by these two places [i.e., Hammond is an employment center, and Evanston is a residential center] have persisted and their socioeconomic compositions have changed little in relation to one another: that is, Evanston still has a much higher status level. . . .

pages: 666 words: 181,495

**In the Plex: How Google Thinks, Works, and Shapes Our Lives** by Steven Levy

23andMe, AltaVista, Anne Wojcicki, Apple's 1984 Super Bowl advert, autonomous vehicles, book scanning, Brewster Kahle, Burning Man, business process, clean water, cloud computing, crowdsourcing, Dean Kamen, discounted cash flows, don't be evil, Donald Knuth, Douglas Engelbart, El Camino Real, fault tolerance, Firefox, Gerard Salton, Google bus, Google Chrome, Google Earth, Googley, HyperCard, hypertext link, IBM and the Holocaust, informal economy, information retrieval, Internet Archive, Jeff Bezos, John Markoff, Kevin Kelly, Kickstarter, Mark Zuckerberg, Menlo Park, one-China policy, optical character recognition, PageRank, Paul Buchheit, Potemkin village, prediction markets, recommendation engine, risk tolerance, Rubik’s Cube, Sand Hill Road, Saturday Night Live, search inside the book, second-price auction, selection bias, Silicon Valley, skunkworks, Skype, slashdot, social graph, social software, social web, spectrum auction, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, Ted Nelson, telemarketer, trade route, traveling salesman, turn-by-turn navigation, undersea cable, Vannevar Bush, web application, WikiLeaks, Y Combinator

Och’s official role was as a scientist in Google’s research group, but it is indicative of Google’s view of research that no step was required to move beyond study into actual product implementation. Because Och and his colleagues knew they would have access to an unprecedented amount of data, they worked from the ground up to create a new translation system. “One of the things we did was to build very, very, very large language models, much larger than anyone has ever built in the history of mankind.” Then they began to train the system. To measure progress, they used a statistical model that, given a series of words, would predict the word that came next. Each time they doubled the amount of training data, they got a .5 percent boost in the metrics that measured success in the results. “So we just doubled it a bunch of times.” In order to get a reasonable translation, Och would say, you might feed something like a billion words to the model. But Google didn’t stop at a billion.
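A model that, given a series of words, predicts the next word can be sketched at its smallest as a bigram model: count which word follows which in training text, then predict the most frequent follower. This Python sketch is illustrative only; Google's language models were vastly larger and more sophisticated, and the function names and sample sentence are hypothetical. More training text fills in more of these counts, which is the intuition behind "we just doubled it a bunch of times."

```python
from collections import Counter, defaultdict

def train_bigram_model(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return (most likely next word, its estimated probability), or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    best, n = followers.most_common(1)[0]
    return best, n / sum(followers.values())

tokens = "the cat sat on the mat the cat ran".split()
model = train_bigram_model(tokens)
print(predict_next(model, "the"))  # → ('cat', 0.6666666666666666)
```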

…

To keep making consistently accurate predictions on click-through rates and conversions, Google needed to know everything. “We are trying to understand the mechanisms behind the metrics,” says Qing Wu, a decision support analyst at Google. His specialty was forecasting. He could predict patterns of queries from season to season, at different times of day, and in different climates. “We have the temperature data, we have the weather data, and we have the queries data so we can do correlation and statistical modeling.” To make sure that his predictions were on track, Qing Wu and his colleagues made use of dozens of onscreen dashboards with information flowing through them, a Bloomberg of the Googlesphere. “With a dashboard you can monitor the queries, the amount of money you make, how many advertisers we have, how many keywords they’re bidding on, what the ROI is for each advertiser.” It’s like the census data, he would say, only Google does a much better job analyzing its information than the government does with the census results.

pages: 741 words: 179,454

**Extreme Money: Masters of the Universe and the Cult of Risk** by Satyajit Das

affirmative action, Albert Einstein, algorithmic trading, Andy Kessler, Asian financial crisis, asset allocation, asset-backed security, bank run, banking crisis, banks create money, Basel III, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, Big bang: deregulation of the City of London, Black Swan, Bonfire of the Vanities, bonus culture, Bretton Woods, BRICs, British Empire, business cycle, capital asset pricing model, Carmen Reinhart, carried interest, Celtic Tiger, clean water, cognitive dissonance, collapse of Lehman Brothers, collateralized debt obligation, corporate governance, corporate raider, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, debt deflation, Deng Xiaoping, deskilling, discrete time, diversification, diversified portfolio, Doomsday Clock, Edward Thorp, Emanuel Derman, en.wikipedia.org, Eugene Fama: efficient market hypothesis, eurozone crisis, Everybody Ought to Be Rich, Fall of the Berlin Wall, financial independence, financial innovation, financial thriller, fixed income, full employment, global reserve currency, Goldman Sachs: Vampire Squid, Gordon Gekko, greed is good, happiness index / gross national happiness, haute cuisine, high net worth, Hyman Minsky, index fund, information asymmetry, interest rate swap, invention of the wheel, invisible hand, Isaac Newton, job automation, Johann Wolfgang von Goethe, John Meriwether, joint-stock company, Jones Act, Joseph Schumpeter, Kenneth Arrow, Kenneth Rogoff, Kevin Kelly, laissez-faire capitalism, load shedding, locking in a profit, Long Term Capital Management, Louis Bachelier, margin call, market bubble, market fundamentalism, Marshall McLuhan, Martin Wolf, mega-rich, merger arbitrage, Mikhail Gorbachev, Milgram experiment, money market fund, Mont Pelerin Society, moral hazard, mortgage debt, mortgage tax deduction, mutually assured destruction, Myron Scholes, Naomi Klein, negative equity, NetJets, Network effects, new economy, Nick Leeson, Nixon shock, Northern Rock, nuclear winter, oil shock, Own Your Own Home, Paul Samuelson, pets.com, Philip Mirowski, plutocrats, Plutocrats, Ponzi scheme, price anchoring, price stability, profit maximization, quantitative easing, quantitative trading / quantitative finance, Ralph Nader, RAND corporation, random walk, Ray Kurzweil, regulatory arbitrage, rent control, rent-seeking, reserve currency, Richard Feynman, Richard Thaler, Right to Buy, risk-adjusted returns, risk/return, road to serfdom, Robert Shiller, Rod Stewart played at Stephen Schwarzman birthday party, rolodex, Ronald Reagan, Ronald Reagan: Tear down this wall, Satyajit Das, savings glut, shareholder value, Sharpe ratio, short selling, Silicon Valley, six sigma, Slavoj Žižek, South Sea Bubble, special economic zone, statistical model, Stephen Hawking, Steve Jobs, survivorship bias, The Chicago School, The Great Moderation, the market place, the medium is the message, The Myth of the Rational Market, The Nature of the Firm, the new new thing, The Predators' Ball, The Wealth of Nations by Adam Smith, Thorstein Veblen, too big to fail, trickle-down economics, Turing test, Upton Sinclair, value at risk, Yogi Berra, zero-coupon bond, zero-sum game

Mortgages against second and third homes, vacation homes and nonowner-occupied investment homes to be rented out (buy-to-let) or sold later (condo flippers) were allowed. HE (home equity) and HELOC (home equity line of credit), borrowing against the equity in existing homes, became prevalent. Empowered by high-tech models, lenders loaned to less creditworthy borrowers, believing they could price any risk. Ben Bernanke shared his predecessor Alan Greenspan’s faith: “banks have become increasingly adept at predicting default risk by applying statistical models to data, such as credit scores.” Bernanke concluded that banks “have made substantial strides...in their ability to measure and manage risks.”13 Innovative affordability products included jumbo and super jumbo loans that did not conform to guidelines because of their size. More risky than prime but less risky than subprime, Alt A (Alternative A) mortgages were for borrowers who did not meet normal criteria.

…

In 2007, Moody’s upgraded three major Icelandic banks to the highest AAA rating, citing new methodology that took into account the likelihood of government support. Although Moody’s reversed the upgrades, all three banks collapsed in 2008. Unimpeded by insufficient disclosure, lack of information transparency, fraud, and improper accounting, traders anticipated these defaults, marking down bond prices well before rating downgrades. Rating structured securities required statistical models, mapping complex securities to historical patterns of default on normal bonds. With mortgage markets changing rapidly, this was like “using weather in Antarctica to forecast conditions in Hawaii.”17 Antarctica from 100 years ago! The agencies did not look at the underlying mortgages or loans in detail, relying instead on information from others. Moody’s Yuri Yoshizawa stated: “We’re structure experts.

pages: 757 words: 193,541

**
The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2
** by
Thomas A. Limoncelli,
Strata R. Chalup,
Christina J. Hogan

active measures, Amazon Web Services, anti-pattern, barriers to entry, business process, cloud computing, commoditize, continuous integration, correlation coefficient, database schema, Debian, defense in depth, delayed gratification, DevOps, domain-specific language, en.wikipedia.org, fault tolerance, finite state, Firefox, Google Glasses, information asymmetry, Infrastructure as a Service, intermodal, Internet of things, job automation, job satisfaction, Kickstarter, load shedding, longitudinal study, loose coupling, Malcolm McLean invented shipping containers, Marc Andreessen, place-making, platform as a service, premature optimization, recommendation engine, revision control, risk tolerance, side project, Silicon Valley, software as a service, sorting algorithm, standardized shipping container, statistical model, Steven Levy, supply-chain management, Toyota Production System, web application, Yogi Berra

By reducing lead time, capacity planning can be more agile. Standard capacity planning is sufficient for small sites, sites that grow slowly, and sites with simple needs. It is insufficient for large, rapidly growing sites, which require more advanced techniques. Advanced capacity planning is based on core drivers, capacity limits of individual resources, and sophisticated data analysis such as correlation, regression analysis, and statistical models for forecasting. Regression analysis finds correlations between core drivers and resources. Forecasting uses past data to predict future needs. With sufficiently large sites, capacity planning is a full-time job, often done by project managers with technical backgrounds. Some organizations employ full-time statisticians to build complex models and dashboards that provide the information required by a project manager.
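As a rough illustration of the regression-and-forecast step, here is a minimal Python sketch. It assumes, hypothetically, a single core driver (active users, in thousands) that scales linearly with a single resource (CPU cores in use). The sample data, the function names, and the 20 percent `headroom` margin are invented for illustration and do not come from the book.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def correlation(xs, ys):
    """Pearson correlation between a core driver and a resource."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def capacity_needed(slope, intercept, forecast_driver, headroom=0.2):
    """Resource units needed at a forecast driver value, plus a safety margin."""
    return (slope * forecast_driver + intercept) * (1 + headroom)


# Hypothetical history: active users (thousands) vs. CPU cores in use.
users = [10, 20, 30, 40, 50]
cores = [12, 22, 32, 42, 52]

slope, intercept = fit_line(users, cores)
print("correlation:", correlation(users, cores))
print("cores needed at 80k users:", capacity_needed(slope, intercept, 80))
```

A real capacity model would usually combine several drivers and check how well each one correlates with the resource before trusting a forecast; the single-variable version just keeps the arithmetic visible.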

…

Capacity planning involves the technical work of understanding how many resources are needed per unit of growth, plus non-technical aspects such as budgeting, forecasting, and supply chain management. These topics are covered in Chapter 18.

Sample Assessment Questions

• How much capacity do you have now?
• How much capacity do you expect to need three months from now? Twelve months from now?
• Which statistical models do you use for determining future needs?
• How do you load-test?
• How much time does capacity planning take? What could be done to make it easier?
• Are metrics collected automatically?
• Are metrics always available, or does the need for them initiate a process that collects them?
• Is capacity planning the job of no one, everyone, a specific person, or a team of capacity planners?
• If there is a corporate standard practice for this OR, what is it and how does this service comply with the practice?

pages: 291 words: 77,596

**
Total Recall: How the E-Memory Revolution Will Change Everything
** by
Gordon Bell,
Jim Gemmell

airport security, Albert Einstein, book scanning, cloud computing, conceptual framework, Douglas Engelbart, full text search, information retrieval, invention of writing, inventory management, Isaac Newton, John Markoff, lifelogging, Menlo Park, optical character recognition, pattern recognition, performance metric, RAND corporation, RFID, semantic web, Silicon Valley, Skype, social web, statistical model, Stephen Hawking, Steve Ballmer, Ted Nelson, telepresence, Turing test, Vannevar Bush, web application

Adding summarization to visualization for geolocated photos: Ahern, Shane, Mor Naaman, Rahul Nair, Jeannie Yang. “World Explorer: Visualizing Aggregate Data from Unstructured Text in Geo-Referenced Collections.” In Proceedings, Seventh ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’07), June 2007. The Stuff I’ve Seen project did some experiments that showed how displaying milestones alongside a timeline may help orient the user. Horvitz et al. used statistical models to infer the probability that users will consider events to be memory landmarks. Ringel, M., E. Cutrell, S. T. Dumais, and E. Horvitz. 2003. “Milestones in Time: The Value of Landmarks in Retrieving Information from Personal Stores.” Proceedings of IFIP Interact 2003. Horvitz, Eric, Susan Dumais, and Paul Koch. “Learning Predictive Models of Memory Landmarks.” CogSci 2004: 26th Annual Meeting of the Cognitive Science Society, Chicago, August 2004.

pages: 274 words: 75,846

**
The Filter Bubble: What the Internet Is Hiding From You
** by
Eli Pariser

A Declaration of the Independence of Cyberspace, A Pattern Language, Amazon Web Services, augmented reality, back-to-the-land, Black Swan, borderless world, Build a better mousetrap, Cass Sunstein, citizen journalism, cloud computing, cognitive dissonance, crowdsourcing, Danny Hillis, data acquisition, disintermediation, don't be evil, Filter Bubble, Flash crash, fundamental attribution error, global village, Haight Ashbury, Internet of things, Isaac Newton, Jaron Lanier, Jeff Bezos, jimmy wales, Kevin Kelly, knowledge worker, Mark Zuckerberg, Marshall McLuhan, megacity, Metcalfe’s law, Netflix Prize, new economy, PageRank, paypal mafia, Peter Thiel, recommendation engine, RFID, Robert Metcalfe, sentiment analysis, shareholder value, Silicon Valley, Silicon Valley startup, social graph, social software, social web, speech recognition, Startup school, statistical model, stem cell, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, the scientific method, urban planning, Whole Earth Catalog, WikiLeaks, Y Combinator

The best way to avoid overfitting, as Popper suggests, is to try to prove the model wrong and to build algorithms that give the benefit of the doubt. If Netflix shows me a romantic comedy and I like it, it’ll show me another one and begin to think of me as a romantic-comedy lover. But if it wants to get a good picture of who I really am, it should be constantly testing the hypothesis by showing me Blade Runner in an attempt to prove it wrong. Otherwise, I end up caught in a local maximum populated by Hugh Grant and Julia Roberts. The statistical models that make up the filter bubble write off the outliers. But in human life it’s the outliers who make things interesting and give us inspiration. And it’s the outliers who are the first signs of change. One of the best critiques of algorithmic prediction comes, remarkably, from the late-nineteenth-century Russian novelist Fyodor Dostoyevsky, whose Notes from Underground was a passionate critique of the utopian scientific rationalism of the day.

pages: 322 words: 77,341

**
I.O.U.: Why Everyone Owes Everyone and No One Can Pay
** by
John Lanchester

asset-backed security, bank run, banking crisis, Berlin Wall, Bernie Madoff, Big bang: deregulation of the City of London, Black-Scholes formula, Blythe Masters, Celtic Tiger, collateralized debt obligation, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, diversified portfolio, double entry bookkeeping, Exxon Valdez, Fall of the Berlin Wall, financial deregulation, financial innovation, fixed income, George Akerlof, greed is good, hedonic treadmill, hindsight bias, housing crisis, Hyman Minsky, intangible asset, interest rate swap, invisible hand, Jane Jacobs, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Meriwether, Kickstarter, laissez-faire capitalism, light touch regulation, liquidity trap, Long Term Capital Management, loss aversion, Martin Wolf, money market fund, mortgage debt, mortgage tax deduction, mutually assured destruction, Myron Scholes, negative equity, new economy, Nick Leeson, Norman Mailer, Northern Rock, Own Your Own Home, Ponzi scheme, quantitative easing, reserve currency, Right to Buy, risk-adjusted returns, Robert Shiller, Robert Shiller, Ronald Reagan, shareholder value, South Sea Bubble, statistical model, The Great Moderation, the payments system, too big to fail, tulip mania, value at risk

The 1998 default was a 7-sigma event. That means it should statistically have happened only once every 3 billion years. And it wasn’t the only one. The last decades have seen numerous 5-, 6-, and 7-sigma events. Those are supposed to happen, respectively, one day in every 13,932 years, one day in every 4,039,906 years, and one day in every 3,105,395,365 years. Yet no one concluded from this that the statistical models in use were wrong. The mathematical models simply didn’t work in a crisis. They worked when they worked, which was most of the time; but the whole point of them was to assess risk, and some risks by definition happen at the edges of known likelihoods. The strange thing is that this is strongly hinted at in the VAR model, as propounded by its more philosophically minded defenders such as Philippe Jorion: it marks the boundaries of the known world, up to the VAR break, and then writes “Here be Dragons.”
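The waiting times quoted above follow from assuming that daily moves are normally distributed. The sketch below reproduces the calculation under two stated assumptions: a one-sided tail (a move of at least k standard deviations in one direction) and 250 trading days per year. Its results land close to, but not exactly on, the figures in the passage, since the excerpt does not say which day-count convention was used.

```python
import math

TRADING_DAYS_PER_YEAR = 250  # assumption; conventions vary (250 or 252 are common)


def upper_tail_probability(k_sigma):
    """P(Z > k) for a standard normal Z: the chance of a one-sided k-sigma day."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2))


def expected_years_between(k_sigma, days_per_year=TRADING_DAYS_PER_YEAR):
    """Average wait, in years, for one day beyond k sigma if moves were normal."""
    return 1 / (upper_tail_probability(k_sigma) * days_per_year)


for k in (5, 6, 7):
    print(f"{k}-sigma: about one day in every {expected_years_between(k):,.0f} years")
```

The passage's point falls out of the arithmetic: under a normal model each extra sigma multiplies the waiting time by orders of magnitude, so repeated 5- to 7-sigma days are strong evidence against the normal assumption itself.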

pages: 279 words: 75,527

**
Collider
** by
Paul Halpern

Albert Einstein, Albert Michelson, anthropic principle, cosmic microwave background, cosmological constant, dark matter, Ernest Rutherford, Gary Taubes, gravity well, horn antenna, index card, Isaac Newton, Magellanic Cloud, pattern recognition, Richard Feynman, Ronald Reagan, Solar eclipse in 1919, statistical model, Stephen Hawking

Although this could represent an escaping graviton, more likely possibilities would need to be ruled out, such as the commonplace production of neutrinos. Unfortunately, even a hermetic detector such as ATLAS can’t account for the streams of lost neutrinos that pass unhindered through almost everything in nature—except by estimating the missing momentum and assuming it is all being transferred to neutrinos. Some physicists hope that statistical models of neutrino production would eventually prove sharp enough to indicate significant differences between the expected and actual pictures. Such discrepancies could prove that gravitons fled from collisions and ducked into regions beyond. Another potential means of establishing the existence of extra dimensions would be to look for the hypothetical phenomena called Kaluza-Klein excitations (named for Klein and an earlier unification pioneer, German mathematician Theodor Kaluza).

pages: 306 words: 78,893

**
After the New Economy: The Binge . . . And the Hangover That Won't Go Away
** by
Doug Henwood

"Robert Solow", accounting loophole / creative accounting, affirmative action, Asian financial crisis, barriers to entry, borderless world, Branko Milanovic, Bretton Woods, business cycle, capital controls, corporate governance, corporate raider, correlation coefficient, credit crunch, deindustrialization, dematerialisation, deskilling, ending welfare as we know it, feminist movement, full employment, gender pay gap, George Gilder, glass ceiling, Gordon Gekko, greed is good, half of the world's population has never made a phone call, income inequality, indoor plumbing, intangible asset, Internet Archive, job satisfaction, joint-stock company, Kevin Kelly, labor-force participation, liquidationism / Banker’s doctrine / the Treasury view, manufacturing employment, means of production, minimum wage unemployment, Naomi Klein, new economy, occupational segregation, pets.com, post-work, profit maximization, purchasing power parity, race to the bottom, Ralph Nader, Robert Gordon, Robert Shiller, Robert Shiller, Ronald Reagan, shareholder value, Silicon Valley, Simon Kuznets, statistical model, structural adjustment programs, Telecommunications Act of 1996, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, The Wealth of Nations by Adam Smith, total factor productivity, union organizing, War on Poverty, women in the workforce, working poor, zero-sum game

It's also hard to reconcile with the fact that the distribution of educational attainment has long been growing less, not more, unequal. Even classic statements of this skills argument, like that of Juhn, Murphy, and Pierce (1993), find that the standard proxies for skill like years of education and years of work experience (proxies being needed because skill is nearly impossible to define or measure) only explain part of the increase in polarization: less than half, in fact. Most of the increase remains unexplained by statistical models, a remainder that is typically attributed to "unobserved" attributes. That is, since conventional economists believe as a matter of faith that market rates of pay are fair compensation for a worker's productive contribution, any inexplicable anomalies in pay must be the result of things a boss can see that elude the academic's model. Those of us who are not constrained by a faith in the correlation of pay and productivity, or who don't accept conventional definitions of what constitutes productive labor, will want to look elsewhere.

pages: 225 words: 11,355

**
Financial Market Meltdown: Everything You Need to Know to Understand and Survive the Global Credit Crisis
** by
Kevin Mellyn

asset-backed security, bank run, banking crisis, Bernie Madoff, bonus culture, Bretton Woods, business cycle, collateralized debt obligation, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, cuban missile crisis, disintermediation, diversification, fiat currency, financial deregulation, financial innovation, financial intermediation, fixed income, Francis Fukuyama: the end of history, George Santayana, global reserve currency, Home mortgage interest deduction, Isaac Newton, joint-stock company, Kickstarter, liquidity trap, London Interbank Offered Rate, long peace, margin call, market clearing, mass immigration, money market fund, moral hazard, mortgage tax deduction, Northern Rock, offshore financial centre, paradox of thrift, pattern recognition, pension reform, pets.com, plutocrats, Plutocrats, Ponzi scheme, profit maximization, pushing on a string, reserve currency, risk tolerance, risk-adjusted returns, road to serfdom, Ronald Reagan, shareholder value, Silicon Valley, South Sea Bubble, statistical model, The Great Moderation, the new new thing, the payments system, too big to fail, value at risk, very high income, War on Poverty, Y2K, yield curve

Financial innovation was all about getting more credit into the hands of consumers, making more income using less capital, and turning what had been concentrated risks off the books of banks into securities that could be traded between and owned by professional investors who could be expected to look after themselves. Like much of the “progress” of the last century, it was a matter of replacing common sense and tradition with science. The models produced using advanced statistics and computers were designed by brilliant minds from the best universities. At the Basle Committee, which set global standards for bank regulation to be followed by all major central banks, the use of statistical models to measure risk and reliance on the rating agencies were baked into the proposed rules for capital adequacy. The whole thing blew up not because of something obvious like greed. It failed because of the hubris, the fatal pride, of men and women who sincerely thought that they could build computer models that were capable of predicting risk and pricing it correctly. They were wrong.

4. HOW WE GOT HERE

Henry Ford famously said that history is bunk.

pages: 579 words: 76,657

**
Data Science from Scratch: First Principles with Python
** by
Joel Grus

correlation does not imply causation, natural language processing, Netflix Prize, p-value, Paul Graham, recommendation engine, SpamAssassin, statistical model

(You attempt to explain to her that search engine algorithms are clever enough that this won’t actually work, but she refuses to listen.) Of course, she doesn’t want to write thousands of web pages, nor does she want to pay a horde of “content strategists” to do so. Instead she asks you whether you can somehow programmatically generate these web pages. To do this, we’ll need some way of modeling language. One approach is to start with a corpus of documents and learn a statistical model of language. In our case, we’ll start with Mike Loukides’s essay “What is data science?” As in Chapter 9, we’ll use requests and BeautifulSoup to retrieve the data. There are a couple of issues worth calling attention to. The first is that the apostrophes in the text are actually the Unicode character u"\u2019". We’ll create a helper function to replace them with normal apostrophes:

```python
def fix_unicode(text):
    # Replace the right single quotation mark (U+2019) with a plain ASCII apostrophe.
    return text.replace(u"\u2019", "'")
```

The second issue is that once we get the text of the web page, we’ll want to split it into a sequence of words and periods (so that we can tell where sentences end).

**
Raw Data Is an Oxymoron
** by
Lisa Gitelman

23andMe, collateralized debt obligation, computer age, continuous integration, crowdsourcing, disruptive innovation, Drosophila, Edmond Halley, Filter Bubble, Firefox, fixed income, Google Earth, Howard Rheingold, index card, informal economy, Isaac Newton, Johann Wolfgang von Goethe, knowledge worker, liberal capitalism, lifelogging, longitudinal study, Louis Daguerre, Menlo Park, optical character recognition, Panopticon Jeremy Bentham, peer-to-peer, RFID, Richard Thaler, Silicon Valley, social graph, software studies, statistical model, Stephen Hawking, Steven Pinker, text mining, time value of money, trade route, Turing machine, urban renewal, Vannevar Bush, WikiLeaks

Data storage of this scale, potentially measured in petabytes, would necessarily require sophisticated algorithmic querying in order to detect informational patterns. For David Gelernter, this type of data management would require “topsight,” a top-down perspective achieved through software modeling and the creation of microcosmic “mirror worlds,” in which raw data filters in from the bottom and the whole comes into focus through statistical modeling and rule and pattern extraction.36 The promise of topsight, in Gelernter’s terms, is a progression from annales to annalistes, from data collection that would satisfy a “neo-Victorian curatorial” drive to data analysis that calculates prediction scenarios and manages risk.37 What would be the locus of suspicion and paranoid fantasy (Poster calls it “database anxiety”) if not such an intricate and operationally efficient system, the aggregating capacity of which easily ups the ante on Thomas Pynchon’s paranoid realization that “everything is connected”?

pages: 280 words: 79,029

**
Smart Money: How High-Stakes Financial Innovation Is Reshaping Our World, For the Better
** by
Andrew Palmer

Affordable Care Act / Obamacare, algorithmic trading, Andrei Shleifer, asset-backed security, availability heuristic, bank run, banking crisis, Black-Scholes formula, bonus culture, break the buck, Bretton Woods, call centre, Carmen Reinhart, cloud computing, collapse of Lehman Brothers, collateralized debt obligation, computerized trading, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, David Graeber, diversification, diversified portfolio, Edmond Halley, Edward Glaeser, endogenous growth, Eugene Fama: efficient market hypothesis, eurozone crisis, family office, financial deregulation, financial innovation, fixed income, Flash crash, Google Glasses, Gordon Gekko, high net worth, housing crisis, Hyman Minsky, implied volatility, income inequality, index fund, information asymmetry, Innovator's Dilemma, interest rate swap, Kenneth Rogoff, Kickstarter, late fees, London Interbank Offered Rate, Long Term Capital Management, longitudinal study, loss aversion, margin call, Mark Zuckerberg, McMansion, money market fund, mortgage debt, mortgage tax deduction, Myron Scholes, negative equity, Network effects, Northern Rock, obamacare, payday loans, peer-to-peer lending, Peter Thiel, principal–agent problem, profit maximization, quantitative trading / quantitative finance, railway mania, randomized controlled trial, Richard Feynman, Richard Thaler, risk tolerance, risk-adjusted returns, Robert Shiller, Robert Shiller, short selling, Silicon Valley, Silicon Valley startup, Skype, South Sea Bubble, sovereign wealth fund, statistical model, Thales of Miletus, transaction costs, Tunguska event, unbanked and underbanked, underbanked, Vanguard fund, web application

Public data from a couple of longitudinal studies showing the long-term relationship between education and income in the United States enabled him to build what he describes as “a simple multivariate regression model”—you know the sort, we’ve all built one—and work out the relationships between things such as test scores, degrees, and first jobs on later income. That model has since grown into something whizzier. An applicant’s education, SAT scores, work experience, and other details are pumped into a proprietary statistical model, which looks at people with comparable backgrounds and generates a prediction of that person’s personal income. Upstart now uses these data to underwrite loans to younger people—who often find it hard to raise money because of their limited credit histories. But the model was initially used to determine how much money an applicant could raise for each percentage point of future income they gave away.
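A “simple multivariate regression model” of this kind can be sketched as ordinary least squares. Everything in the sketch below is hypothetical: the three features (a test score on a 0 to 100 scale, a degree flag, and first-job salary in thousands), the sample rows, and the coefficients used to generate incomes are invented so that the fit can be checked, and nothing here reflects Upstart's proprietary model. It solves the normal equations in pure Python to stay dependency-free; in practice a numerical library would normally handle this step.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, targets):
    """Coefficients (intercept first) from the normal equations X^T X b = X^T y."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, features):
    return coefs[0] + sum(c * f for c, f in zip(coefs[1:], features))


# Hypothetical applicants: (test score 0-100, has degree 0/1, first-job salary in $k).
rows = [(62, 0, 30), (75, 1, 40), (81, 1, 45), (90, 0, 35), (55, 1, 60), (70, 0, 25)]
# Synthetic later income ($k) generated from known coefficients so the fit can be checked.
true_coefs = [5.0, 0.4, 10.0, 1.5]
incomes = [predict(true_coefs, r) for r in rows]

coefs = fit_ols(rows, incomes)
print("fitted coefficients:", [round(c, 3) for c in coefs])
print("predicted income for (85, 1, 50):", round(predict(coefs, (85, 1, 50)), 2))
```

With noiseless synthetic data the fit recovers the generating coefficients; real income data would leave residual error, and the interesting modeling work lies in choosing features and checking how well they generalize.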

**
Hands-On Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems
** by
Aurelien Geron

Amazon Mechanical Turk, Bayesian statistics, centre right, combinatorial explosion, constrained optimization, correlation coefficient, crowdsourcing, en.wikipedia.org, iterative process, Netflix Prize, NP-complete, optical character recognition, P = NP, p-value, pattern recognition, performance metric, recommendation engine, self-driving car, SpamAssassin, speech recognition, statistical model

They often end up selecting the same model, but when they differ, the model selected by the BIC tends to be simpler (fewer parameters) than the one selected by the AIC, but it does not fit the data quite as well (this is especially true for larger datasets). Likelihood function The terms “probability” and “likelihood” are often used interchangeably in the English language, but they have very different meanings in statistics: given a statistical model with some parameters θ, the word “probability” is used to describe how plausible a future outcome x is (knowing the parameter values θ), while the word “likelihood” is used to describe how plausible a particular set of parameter values θ are, after the outcome x is known. Consider a one-dimensional mixture model of two Gaussian distributions centered at -4 and +1. For simplicity, this toy model has a single parameter θ that controls the standard deviations of both distributions.
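The distinction between probability and likelihood can be made concrete with the toy model just described. This sketch assumes equal mixture weights of 0.5 (the excerpt does not state them) and treats θ as the shared standard deviation of both components. The same density function is read two ways: with θ fixed it scores outcomes x (probability), and with an observed x fixed it scores values of θ (likelihood). It also includes the usual formulas AIC = 2p - 2 log L̂ and BIC = p log m - 2 log L̂, where p is the number of parameters, m the number of instances, and L̂ the maximized likelihood; the function names are illustrative.

```python
import math

MEANS = (-4.0, 1.0)
WEIGHTS = (0.5, 0.5)  # assumption: the excerpt does not give the mixture weights


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def model_pdf(x, theta):
    """Density of the toy mixture; theta is the shared standard deviation."""
    return sum(w * gaussian_pdf(x, m, theta) for w, m in zip(WEIGHTS, MEANS))


def log_likelihood(theta, observations):
    """log L(theta | x): the same density, read as a function of theta."""
    return sum(math.log(model_pdf(x, theta)) for x in observations)


def aic(log_lik, n_params):
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_samples):
    return math.log(n_samples) * n_params - 2 * log_lik


# Probability view: theta fixed at 1.0, compare how plausible two outcomes are.
print(model_pdf(-4.0, 1.0), model_pdf(0.0, 1.0))

# Likelihood view: x = 2.5 has been observed; search a grid for the best theta.
observed = [2.5]
grid = [k / 10 for k in range(1, 101)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, observed))
best_ll = log_likelihood(theta_hat, observed)
print(theta_hat, aic(best_ll, 1), bic(best_ll, 1, len(observed)))
```

With a single observation at x = 2.5, the grid search settles at θ = 1.5, the distance to the nearer component's mean, because the far component centered at -4 contributes almost nothing at that point.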

pages: 345 words: 75,660

**
Prediction Machines: The Simple Economics of Artificial Intelligence
** by
Ajay Agrawal,
Joshua Gans,
Avi Goldfarb

"Robert Solow", Ada Lovelace, AI winter, Air France Flight 447, Airbus A320, artificial general intelligence, autonomous vehicles, basic income, Bayesian statistics, Black Swan, blockchain, call centre, Capital in the Twenty-First Century by Thomas Piketty, Captain Sullenberger Hudson, collateralized debt obligation, computer age, creative destruction, Daniel Kahneman / Amos Tversky, data acquisition, data is the new oil, deskilling, disruptive innovation, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, everywhere but in the productivity statistics, Google Glasses, high net worth, ImageNet competition, income inequality, information retrieval, inventory management, invisible hand, job automation, John Markoff, Joseph Schumpeter, Kevin Kelly, Lyft, Minecraft, Mitch Kapor, Moneyball by Michael Lewis explains big data, Nate Silver, new economy, On the Economy of Machinery and Manufactures, pattern recognition, performance metric, profit maximization, QWERTY keyboard, race to the bottom, randomized controlled trial, Ray Kurzweil, ride hailing / ride sharing, Second Machine Age, self-driving car, shareholder value, Silicon Valley, statistical model, Stephen Hawking, Steve Jobs, Steven Levy, strong AI, The Future of Employment, The Signal and the Noise by Nate Silver, Tim Cook: Apple, Turing test, Uber and Lyft, uber lyft, US Airways Flight 1549, Vernor Vinge, Watson beat the top human players on Jeopardy!, William Langewiesche, Y Combinator, zero-sum game

For example, in a mobile phone churn model, researchers utilized data on hour-by-hour call records in addition to standard variables such as bill size and payment punctuality. The machine learning methods also got better at leveraging the data available. In the Duke competition, a key component of success was choosing which of the hundreds of available variables to include and choosing which statistical model to use. The best methods at the time, whether machine learning or classic regression, used a combination of intuition and statistical tests to select the variables and model. Now, machine learning methods, and especially deep learning methods, allow flexibility in the model and this means variables can combine with each other in unexpected ways. People with large phone bills who rack up minutes early in the billing month might be less likely to churn than people with large bills who rack up their minutes later in the month.

pages: 267 words: 72,552

**
Reinventing Capitalism in the Age of Big Data
** by
Viktor Mayer-Schönberger,
Thomas Ramge

accounting loophole / creative accounting, Air France Flight 447, Airbnb, Alvin Roth, Atul Gawande, augmented reality, banking crisis, basic income, Bayesian statistics, bitcoin, blockchain, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, Cass Sunstein, centralized clearinghouse, Checklist Manifesto, cloud computing, cognitive bias, conceptual framework, creative destruction, Daniel Kahneman / Amos Tversky, disruptive innovation, Donald Trump, double entry bookkeeping, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Ford paid five dollars a day, Frederick Winslow Taylor, fundamental attribution error, George Akerlof, gig economy, Google Glasses, information asymmetry, interchangeable parts, invention of the telegraph, inventory management, invisible hand, James Watt: steam engine, Jeff Bezos, job automation, job satisfaction, joint-stock company, Joseph Schumpeter, Kickstarter, knowledge worker, labor-force participation, land reform, lone genius, low cost airline, low cost carrier, Marc Andreessen, market bubble, market design, market fundamentalism, means of production, meta analysis, meta-analysis, Moneyball by Michael Lewis explains big data, multi-sided market, natural language processing, Network effects, Norbert Wiener, offshore financial centre, Parag Khanna, payday loans, peer-to-peer lending, Peter Thiel, Ponzi scheme, prediction markets, price anchoring, price mechanism, purchasing power parity, random walk, recommendation engine, Richard Thaler, ride hailing / ride sharing, Sam Altman, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, smart grid, smart meter, Snapchat, statistical model, Steve Jobs, technoutopianism, The Future of Employment, The Market for Lemons, The Nature of the Firm, transaction costs, universal basic income, William Langewiesche, Y Combinator

The idea was that every day, four hundred nationalized factories around the country would send data to Cybersyn’s nerve center in Santiago, the capital, where it would then be fed into a mainframe computer, scrutinized, and compared against forecasts. Divergences would be flagged and brought to the attention of factory directors, then to government decision makers sitting in a futuristic operations room. From there the officials would send directives back to the factories. Cybersyn was quite sophisticated for its time, employing a network approach to capturing and calculating economic activity and using Bayesian statistical models. Most important, it relied on feedback that would loop back into the decision-making processes. The system never became fully operational. Its communications network was in place and was used in the fall of 1972 to keep the country running when striking transportation workers blocked goods from entering Santiago. The computer-analysis part of Cybersyn was mostly completed, too, but its results were often unreliable and slow.

pages: 229 words: 72,431

**
Shadow Work: The Unpaid, Unseen Jobs That Fill Your Day
** by
Craig Lambert

airline deregulation, Asperger Syndrome, banking crisis, Barry Marshall: ulcers, big-box store, business cycle, carbon footprint, cashless society, Clayton Christensen, cognitive dissonance, collective bargaining, Community Supported Agriculture, corporate governance, crowdsourcing, disintermediation, disruptive innovation, financial independence, Galaxy Zoo, ghettoisation, gig economy, global village, helicopter parent, IKEA effect, industrial robot, informal economy, Jeff Bezos, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, Mark Zuckerberg, new economy, pattern recognition, plutocrats, Plutocrats, recommendation engine, Schrödinger's Cat, Silicon Valley, single-payer health, statistical model, Thorstein Veblen, Turing test, unpaid internship, Vanguard fund, Vilfredo Pareto, zero-sum game, Zipcar

Then an outsider with little knowledge of the discipline shows up, attacks a problem with statistics and algorithms, and unearths an astonishing insight. Algorithms are another tool that democratizes expertise, using the revolutionary power of data to outdo established authorities. For example, Theodore Ruger, then a law professor at Washington University in St. Louis, and three colleagues ran a contest to predict the outcome of Supreme Court cases on the 2002 docket. The four political scientists developed a statistical model based on six general case characteristics they extracted from previous trials; the model ignored information about specific laws and the facts of the actual cases. Their friendly contest pitted this model against the qualitative judgments of eighty-seven law professors, many of whom had clerked at the Court. The legal mavens knew the jurisprudence, case law, and previous decisions of each sitting justice.

pages: 296 words: 78,631

**
Hello World: Being Human in the Age of Algorithms
** by
Hannah Fry

23andMe, 3D printing, Air France Flight 447, Airbnb, airport security, augmented reality, autonomous vehicles, Brixton riot, chief data officer, computer vision, crowdsourcing, DARPA: Urban Challenge, Douglas Hofstadter, Elon Musk, Firefox, Google Chrome, Gödel, Escher, Bach, Ignaz Semmelweis: hand washing, John Markoff, Mark Zuckerberg, meta analysis, meta-analysis, pattern recognition, Peter Thiel, RAND corporation, ransomware, recommendation engine, ride hailing / ride sharing, selection bias, self-driving car, Shai Danziger, Silicon Valley, Silicon Valley startup, Snapchat, speech recognition, Stanislav Petrov, statistical model, Stephen Hawking, Steven Levy, Tesla Model S, The Wisdom of Crowds, Thomas Bayes, Watson beat the top human players on Jeopardy!, web of trust, William Langewiesche

When all the inmates were eventually granted their release, and so were free to violate the terms of their parole if they chose to, Burgess had a chance to check how good his predictions were. From such a basic analysis, he managed to be remarkably accurate. Ninety-eight per cent of his low-risk group made a clean pass through their parole, while two-thirds of his high-risk group did not.17 Even crude statistical models, it turned out, could make better forecasts than the experts. But his work had its critics. Sceptical onlookers questioned how much the factors which reliably predicted parole success in one place at one time could apply elsewhere. (They had a point: I’m not sure the category ‘farm boy’ would be much help in predicting recidivism among modern inner-city criminals.) Other scholars criticized Burgess for just making use of whatever information was on hand, without investigating if it was relevant.18 There were also questions about the way he scored the inmates: after all, his method was little more than opinion written in equations.

pages: 267 words: 71,941

**
How to Predict the Unpredictable
** by
William Poundstone

accounting loophole / creative accounting, Albert Einstein, Bernie Madoff, Brownian motion, business cycle, butter production in bangladesh, buy and hold, buy low sell high, call centre, centre right, Claude Shannon: information theory, computer age, crowdsourcing, Daniel Kahneman / Amos Tversky, Edward Thorp, Firefox, fixed income, forensic accounting, high net worth, index card, index fund, John von Neumann, market bubble, money market fund, pattern recognition, Paul Samuelson, Ponzi scheme, prediction markets, random walk, Richard Thaler, risk-adjusted returns, Robert Shiller, Rubik’s Cube, statistical model, Steven Pinker, transaction costs

They would have to offer paltry odds on a near-certain win. There might be a happy medium, though, a range of probabilities where it does make sense to bet on a strong away team. That’s what the Bristol group found. To use this rule you need a good estimate of the probability of an away team win. Such estimates are not hard to come by on the web. You can also find spreadsheets or software that can be used as is or adapted to create your own statistical model. Note that the bookie odds are not proper estimates of the chances, as they factor in the commission and other tweaks. The researchers’ optimal rule was to bet on the away team when its chance of winning was between 44.7 and 71.5 percent. This is a selective rule. It applied to just twenty-two of the 194 matches in October 2007. But the rule’s average profit would have been an astonishing 74 percent on every pound wagered.
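The Bristol group's selective rule reduces to a simple filter on your own probability estimate. Below is a minimal sketch, assuming you already have an away-win probability from your own statistical model (not from bookie odds, which the text warns are not proper estimates). The helper name and match labels are hypothetical, and treating the 44.7 and 71.5 percent bounds as inclusive is an assumption.

```python
# Bounds quoted in the text; treating them as inclusive is an assumption.
LOWER = 0.447
UPPER = 0.715


def should_bet_away(p_away_win: float) -> bool:
    """Hypothetical helper: True if the estimated away-win probability is inside the betting band."""
    return LOWER <= p_away_win <= UPPER


# Hypothetical estimates produced by your own model, keyed by made-up match labels.
estimates = {"match_a": 0.30, "match_b": 0.52, "match_c": 0.80}

# Keep only the matches where the rule says to back the away team.
bets = [match for match, p in estimates.items() if should_bet_away(p)]
print(bets)
```

Here only `match_b` qualifies. The 0.30 away team is too weak to be worth backing, and the 0.80 favourite sits outside the band where, per the Bristol group, bookies' odds on a near-certain win are too paltry to profit from.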

pages: 589 words: 69,193

**
Mastering Pandas
** by
Femi Anthony

Amazon Web Services, Bayesian statistics, correlation coefficient, correlation does not imply causation, Debian, en.wikipedia.org, Internet of things, natural language processing, p-value, random walk, side project, statistical model, Thomas Bayes

The normalizing constant doesn't always need to be calculated, especially in many popular algorithms such as MCMC, which we will examine later in this chapter. P(H|D) is the probability that the hypothesis is true, given the data that we observe. This is called the posterior. P(D|H) is the probability of obtaining the data, given our hypothesis. This is called the likelihood. Thus, Bayesian statistics amounts to applying Bayes' rule to solve problems in inferential statistics, with H representing our hypothesis and D the data. A Bayesian statistical model is cast in terms of parameters, and the uncertainty in these parameters is represented by probability distributions. This is different from the Frequentist approach, where the values are regarded as deterministic. An alternative representation is as follows: where, is our unknown data and is our observed data. In Bayesian statistics, we make assumptions about the prior data and use the likelihood to update to the posterior probability using Bayes' rule.
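To make the prior, likelihood, posterior, and normalizing constant concrete, here is a minimal sketch of Bayes' rule over a small discrete set of hypotheses. The coin example, hypothesis names, and function name are illustrative assumptions, not taken from the book. Note that the normalizing constant P(D) only rescales the products P(D|H)P(H). This is why samplers such as MCMC, which work with ratios of posteriors, can avoid computing it.

```python
def posterior(priors: dict, likelihoods: dict) -> dict:
    """Apply Bayes' rule over a discrete set of hypotheses.

    priors[h]      = P(H=h)
    likelihoods[h] = P(D | H=h)
    Returns a dict of P(H=h | D).
    """
    # Unnormalized posterior: likelihood times prior for each hypothesis.
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    # Normalizing constant P(D): sum of P(D|h) * P(h) over all hypotheses.
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


# Hypothetical example: is a coin fair (P(heads)=0.5) or biased (P(heads)=0.8),
# given that we observed three heads in a row? Equal priors on both hypotheses.
priors = {"fair": 0.5, "biased": 0.5}
heads = 3
likelihoods = {"fair": 0.5 ** heads, "biased": 0.8 ** heads}

post = posterior(priors, likelihoods)
print(post)
```

With equal priors, three heads shift the belief to roughly 0.20 for the fair coin and 0.80 for the biased one. The likelihood has updated the prior into the posterior.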

**
Deep Work: Rules for Focused Success in a Distracted World
** by
Cal Newport

8-hour work day, Albert Einstein, barriers to entry, business climate, Cal Newport, Capital in the Twenty-First Century by Thomas Piketty, Clayton Christensen, David Brooks, David Heinemeier Hansson, deliberate practice, disruptive innovation, Donald Knuth, Donald Trump, Downton Abbey, en.wikipedia.org, Erik Brynjolfsson, experimental subject, follow your passion, Frank Gehry, informal economy, information retrieval, Internet Archive, Jaron Lanier, knowledge worker, Mark Zuckerberg, Marshall McLuhan, Merlin Mann, Nate Silver, new economy, Nicholas Carr, popular electronics, remote working, Richard Feynman, Ruby on Rails, Silicon Valley, Silicon Valley startup, Snapchat, statistical model, the medium is the message, Watson beat the top human players on Jeopardy!, web application, winner-take-all economy, zero-sum game

But the real importance of this story is the experiment itself, and in particular, its complexity. It turns out to be really difficult to answer a simple question such as: What’s the impact of our current e-mail habits on the bottom line? Cochran had to conduct a company-wide survey and gather statistics from the IT infrastructure. He also had to pull together salary data and information on typing and reading speed, and run the whole thing through a statistical model to spit out his final result. And even then, the outcome is fungible, as it’s not able to separate out, for example, how much value was produced by this frequent, expensive e-mail use to offset some of its cost. This example generalizes to most behaviors that potentially impede or improve deep work. Even though we abstractly accept that distraction has costs and depth has value, these impacts, as Tom Cochran discovered, are difficult to measure.

pages: 238 words: 75,994

**
A Burglar's Guide to the City
** by
Geoff Manaugh

A. Roger Ekirch, big-box store, card file, dark matter, game design, index card, megacity, megastructure, Minecraft, off grid, Rubik’s Cube, Skype, smart cities, statistical model, the built environment, urban planning

* The fundamental premise of the capture-house program is that police can successfully predict what sorts of buildings and internal spaces will attract not just any criminal but a specific burglar, the unique individual each particular capture house was built to target. This is because burglars unwittingly betray personal, as well as shared, patterns in their crimes; they often hit the same sorts of apartments and businesses over and over. But the urge to mathematize this, and to devise complex statistical models for when and where a burglar will strike next, can lead to all sorts of analytical absurdities. A great example of this comes from an article published in the criminology journal Crime, Law and Social Change back in 2011. Researchers from the Physics Engineering Department at Tsinghua University reported some eyebrow-raisingly specific data about the meteorological circumstances during which burglaries were most likely to occur in urban China.

pages: 277 words: 80,703

**
Revolution at Point Zero: Housework, Reproduction, and Feminist Struggle
** by
Silvia Federici

Community Supported Agriculture, declining real wages, equal pay for equal work, feminist movement, financial independence, fixed income, global village, illegal immigration, informal economy, invisible hand, labor-force participation, land tenure, mass incarceration, means of production, microcredit, neoliberal agenda, new economy, Occupy movement, planetary scale, Scramble for Africa, statistical model, structural adjustment programs, the market place, trade liberalization, UNCLOS, wages for housework, Washington Consensus, women in the workforce, World Values Survey

At least since the Zapatistas, on December 31, 1993, took over the zócalo of San Cristóbal to protest legislation dissolving the ejidal lands of Mexico, the concept of the “commons” has gained popularity among the radical Left, internationally and in the United States, appearing as a ground of convergence among anarchists, Marxists/socialists, ecologists, and ecofeminists.1 There are important reasons why this apparently archaic idea has come to the center of political discussion in contemporary social movements. Two in particular stand out. On the one side, there has been the demise of the statist model of revolution that for decades has sapped the efforts of radical movements to build an alternative to capitalism. On the other, the neoliberal attempt to subordinate every form of life and knowledge to the logic of the market has heightened our awareness of the danger of living in a world in which we no longer have access to seas, trees, animals, and our fellow beings except through the cash-nexus.

pages: 373 words: 80,248

**
Empire of Illusion: The End of Literacy and the Triumph of Spectacle
** by
Chris Hedges

Albert Einstein, Ayatollah Khomeini, Cal Newport, clean water, collective bargaining, corporate governance, creative destruction, Credit Default Swap, haute couture, Honoré de Balzac, Howard Zinn, illegal immigration, income inequality, Joseph Schumpeter, Naomi Klein, offshore financial centre, Ralph Nader, Ronald Reagan, single-payer health, social intelligence, statistical model, uranium enrichment

He told the senators that the collapse of the global financial system is “likely to produce a wave of economic crises in emerging market nations over the next year.” He added that “much of Latin America, former Soviet Union states, and sub-Saharan Africa lack sufficient cash reserves, access to international aid or credit, or other coping mechanism.” “When those growth rates go down, my gut tells me that there are going to be problems coming out of that, and we’re looking for that,” he said. He referred to “statistical modeling” showing that “economic crises increase the risk of regime-threatening instability if they persist over a one- to two-year period.” Blair articulated the newest narrative of fear. As the economic unraveling accelerates, we will be told it is not the bearded Islamic extremists who threaten us most, although those in power will drag them out of the Halloween closet whenever they need to give us an exotic shock, but instead the domestic riffraff, environmentalists, anarchists, unions, right-wing militias, and enraged members of our dispossessed working class.

pages: 251 words: 76,128

**
Borrow: The American Way of Debt
** by
Louis Hyman