174 results
pages: 227 words: 62,177 
Numbers Rule Your World: The Hidden Influence of Probability and Statistics on Everything You Do by Kaiser Fung
American Society of Civil Engineers: Report Card, Andrew Wiles, Bernie Madoff, Black Swan, call centre, correlation does not imply causation, cross-subsidies, Daniel Kahneman / Amos Tversky, edge city, Emanuel Derman, facts on the ground, Gary Taubes, John Snow's cholera map, moral hazard, p-value, pattern recognition, profit motive, Report Card for America’s Infrastructure, statistical model, the scientific method, traveling salesman
Figure C1: Drawing a Line Between Natural and Doping Highs. Because the antidoping laboratories face bad publicity for false positives (while false negatives are invisible unless the dopers confess), they calibrate the tests to minimize false accusations, which allows some athletes to get away with doping.
The Virtue of Being Wrong
The subject matter of statistics is variability, and statistical models are tools that examine why things vary. A disease outbreak model links causes to effects to tell us why some people fall ill while others do not; a credit-scoring model identifies correlated traits to describe which borrowers are likely to default on their loans and which will not. These two examples represent two valid modes of statistical modeling. George Box is justly celebrated for his remark “All models are false but some are useful.” The mark of great statisticians is their confidence in the face of fallibility. They recognize that no one can have a monopoly on the truth, which is unknowable as long as there is uncertainty in the world. … Highway engineers in Minnesota tell us why their favorite tactic to reduce congestion is a technology that forces commuters to wait more, while Disney engineers make the case that the most effective tool to reduce wait times does not actually reduce average wait times. Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. 
In Chapter 2, we compare and contrast these two modes of statistical modeling by trailing disease detectives on the hunt for tainted spinach (causal models) and by prying open the black box that produces credit scores (correlational models). Surprisingly, these practitioners freely admit that their models are “wrong” in the sense that they do not perfectly describe the world around us; we explore how they justify what they do. Third, statisticians are constantly looking out for missed nuances: a statistical average for all groups may well hide vital differences that exist between these groups. … They play a high-stakes game, ever wary of the tyranny of the unknown, ever worried about the consequence of miscalculation. Their special talent is the educated guess, with emphasis on the adjective. The leaders of the pack are practical-minded people who rely on detailed observation, directed research, and data analysis. Their Achilles heel is the big I, when they let intuition lead them astray. This chapter celebrates two groups of statistical modelers who have made lasting, positive impacts on our lives. First, we meet the epidemiologists whose investigations explain the causes of disease. Later, we meet credit modelers who mark our fiscal reputation for banks, insurers, landlords, employers, and so on. By observing these scientists in action, we will learn how they have advanced the technical frontier and to what extent we can trust their handiwork. ~###~ In November 2006, the U.S. 
pages: 209 words: 13,138 
Empirical Market Microstructure: The Institutions, Economics and Econometrics of Securities Trading by Joel Hasbrouck
barriers to entry, conceptual framework, correlation coefficient, discrete time, disintermediation, distributed generation, experimental economics, financial intermediation, index arbitrage, interest rate swap, inventory management, market clearing, market design, market friction, market microstructure, martingale, price discovery process, price discrimination, quantitative trading / quantitative finance, random walk, Richard Thaler, second-price auction, short selling, statistical model, stochastic process, stochastic volatility, transaction costs, two-sided market, ultimatum game
If we know that the structural model is the particular one described in section 9.2, we simply set $v_t$ so that $q_t = +1$, set $u_t = 0$ and forecast using equation (9.7). We do not usually know the structural model, however. Typically we’re working from estimates of a statistical model (a VAR or VMA). This complicates specification of $\varepsilon_0$. From the perspective of the VAR or VMA model of the trade and price data, the innovation vector and its variance are:
$$\varepsilon_t = \begin{bmatrix} \varepsilon_{p,t} \\ \varepsilon_{q,t} \end{bmatrix} \quad \text{and} \quad \operatorname{Var}(\varepsilon_t) = \begin{bmatrix} \sigma_p^2 & \sigma_{p,q} \\ \sigma_{p,q} & \sigma_q^2 \end{bmatrix}. \tag{9.15}$$
The innovations in the statistical model are simply associated with the observed variables, and have no necessary structural interpretation. We can still set $\varepsilon_{q,t}$ according to our contemplated trade ($\varepsilon_{q,t} = +1$), but how should we set $\varepsilon_{p,t}$? The answer to this specific problem depends on the immediate (time $t$) relation between the trade and price-change innovations. … The role they play and how they should be regulated are ongoing concerns of practical interest.
12 Limit Order Markets
The worldwide proliferation of limit order markets (LOMs) clearly establishes a need for economic and statistical models of these mechanisms. This chapter discusses some approaches, but it should be admitted at the outset that no comprehensive and realistic models (either statistical or economic) exist. 
One might start with the view that a limit order, being a bid or offer, is simply a dealer quote by another name. The implication is that a limit order is exposed to asymmetric information risk and also must recover noninformational costs of trade. This view supports the application of the economic and statistical models described earlier to LOM, hybrid, and other nondealer markets. This perspective features a sharp division between liquidity suppliers and demanders. … Stock exchanges--Mathematical models. I. Title. HG4521.H353 2007 332.64--dc22 2006003935 Printed in the United States of America on acid-free paper. To Lisa, who inspires these pages and much more.
Preface
This book is a study of the trading mechanisms in financial markets: the institutions, the economic principles underlying the institutions, and statistical models for analyzing the data they generate. The book is aimed at graduate and advanced undergraduate students in financial economics and practitioners who design or use order management systems. Most of the book presupposes only a basic familiarity with economics and statistics. I began writing this book because I perceived a need for treatment of empirical market microstructure that was unified, authoritative, and comprehensive. 
pages: 327 words: 103,336 
Everything Is Obvious: *Once You Know the Answer by Duncan J. Watts
affirmative action, Albert Einstein, Amazon Mechanical Turk, Black Swan, butterfly effect, Carmen Reinhart, Cass Sunstein, clockwork universe, cognitive dissonance, collapse of Lehman Brothers, complexity theory, correlation does not imply causation, crowdsourcing, death of newspapers, discovery of DNA, East Village, easy for humans, difficult for computers, edge city, en.wikipedia.org, Erik Brynjolfsson, framing effect, Geoffrey West, Santa Fe Institute, happiness index / gross national happiness, high batting average, hindsight bias, illegal immigration, interest rate swap, invention of the printing press, invention of the telescope, invisible hand, Isaac Newton, Jane Jacobs, Jeff Bezos, Joseph Schumpeter, Kenneth Rogoff, lake wobegon effect, Long Term Capital Management, loss aversion, medical malpractice, meta analysis, meta-analysis, Milgram experiment, natural language processing, Netflix Prize, Network effects, oil shock, packet switching, pattern recognition, performance metric, phenotype, planetary scale, prediction markets, pre–internet, RAND corporation, random walk, RFID, school choice, Silicon Valley, statistical model, Steve Ballmer, Steve Jobs, Steve Wozniak, supply-chain management, The Death and Life of Great American Cities, the scientific method, The Wisdom of Crowds, too big to fail, Toyota Production System, ultimatum game, urban planning, Vincenzo Peruggia: Mona Lisa, Watson beat the top human players on Jeopardy!, X Prize
Nevertheless, as a speculative exercise, we tested a range of plausible assumptions, each corresponding to a different hypothetical “influencer-based” marketing campaign, and measured their return on investment using the same statistical model as before. What we found was surprising even to us: Even though the Kim Kardashians of the world were indeed more influential than average, they were so much more expensive that they did not provide the best value for the money. 
Rather, it was what we called ordinary influencers, meaning individuals who exhibit average or even less-than-average influence, who often proved to be the most cost-effective means to disseminate information.
CIRCULAR REASONING AGAIN
Before you rush out to short stock in Kim Kardashian, I should emphasize that we didn’t actually run the experiment that we imagined. Even though we were studying data from the real world, not a computer simulation, our statistical models still made a lot of assumptions. Assuming, for example, that our hypothetical marketer could persuade a few thousand ordinary influencers to tweet about their product, it is not at all obvious that their followers would respond as favorably as they do to normal tweets. … Next, we compared the performance of these two polls with the Vegas sports betting market (one of the oldest and most popular betting markets in the world) as well as with another prediction market, TradeSports. And finally, we compared the prediction of both the markets and the polls against two simple statistical models. The first model relied only on the historical probability that home teams win (which they do 58 percent of the time), while the second model also factored in the recent win-loss records of the two teams in question. In this way, we set up a six-way comparison between different prediction methods: two statistical models, two markets, and two polls. Given how different these methods were, what we found was surprising: All of them performed about the same. To be fair, the two prediction markets performed a little better than the other methods, which is consistent with the theoretical argument above. … Indeed, an entire field of research called sabermetrics has developed specifically for the purpose of analyzing baseball statistics, even spawning its own journal, the Baseball Research Journal. 
One might think, therefore, that prediction markets, with their far greater capacity to factor in different sorts of information, would outperform simplistic statistical models by a much wider margin for baseball than they do for football. But that turns out not to be true either. We compared the predictions of the Las Vegas sports betting markets over nearly twenty thousand Major League baseball games played from 1999 to 2006 with a simple statistical model based again on home-team advantage and the recent win-loss records of the two teams. This time, the difference between the two was even smaller; in fact, the performance of the market and the model were indistinguishable. In spite of all the statistics and analysis, in other words, and in spite of the absence of meaningful salary caps in baseball and the resulting concentration of superstar players on teams like the New York Yankees and Boston Red Sox, the outcomes of baseball games are even closer to random events than football games. 
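The "simple statistical model" in the excerpt above combines home-team advantage with recent win-loss records. The excerpt does not give the model's functional form, so the following is only a minimal sketch under stated assumptions: a logistic combination, with the 58 percent home win rate taken from the football comparison quoted earlier and a hypothetical `record_weight` parameter that is not from the book.

```python
import math

# Historical home-team win rate quoted in the excerpt (football comparison).
HOME_WIN_RATE = 0.58


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def home_win_probability(home_recent_win_pct, away_recent_win_pct,
                         record_weight=1.0):
    """Estimate the probability that the home team wins.

    Starts from the home-field baseline and shifts the log-odds by the
    difference in recent winning percentages (each between 0 and 1).
    record_weight is a hypothetical tuning parameter; in practice it
    would be fitted to historical game results.
    """
    record_gap = home_recent_win_pct - away_recent_win_pct
    return sigmoid(logit(HOME_WIN_RATE) + record_weight * record_gap)


if __name__ == "__main__":
    # Equal recent records: the estimate falls back to the baseline.
    print(round(home_win_probability(0.5, 0.5), 3))
    # A stronger home side pushes the estimate above the baseline.
    print(round(home_win_probability(0.7, 0.4), 3))
```

With `record_weight` set to zero this reduces to the first model in the excerpt, which uses only the historical home win rate.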
pages: 257 words: 13,443 
Statistical Arbitrage: Algorithmic Trading Insights and Techniques by Andrew Pole
algorithmic trading, Benoit Mandelbrot, Chance favours the prepared mind, constrained optimization, Dava Sobel, Long Term Capital Management, Louis Pasteur, mandelbrot fractal, market clearing, market fundamentalism, merger arbitrage, pattern recognition, price discrimination, profit maximization, quantitative trading / quantitative finance, risk tolerance, Sharpe ratio, statistical arbitrage, statistical model, stochastic volatility, systematic trading, transaction costs
Once again, some of the variation magically disappears when each day is scaled according to that day’s overall volume in the stock. Orders, up to a threshold labeled “visibility threshold,” have less impact on large-volume days. Fitting a mathematical curve or statistical model to the order size–market impact data yields a tool for answering the question: How much will I have to pay to buy 10,000 shares of XYZ? Note that buy and sell responses may be different and may be dependent on whether the stock is moving up or down that day. Breaking down the raw (60-day) data set and analyzing up days and down days separately will illuminate that issue. More formally, one could define an encompassing statistical model including an indicator variable for up or down day and test the significance of the estimated coefficient. Given the dubious degree to which one could reasonably determine independence and other conditions necessary for the validity of such statistical tests (without a considerable amount of work) one will be better off building prediction models for the combined data and for the up/down days separately and comparing predictions. … Approaches for selecting a universe of instruments for modeling and trading are described. Consideration of change is introduced from this first toe dipping into analysis, because temporal dynamics underpin the entirety of the project. Without the dynamic there is no arbitrage. 
In Chapter 3 we increase the depth and breadth of the analysis, expanding the modeling scope from simple observational rules for pairs to formal statistical models for more general portfolios. Several popular models for time series are described but detailed focus is on weighted moving averages at one extreme of complexity and factor analysis at another, these extremes serving to carry the message as clearly as we can make it. Pair spreads are referred to throughout the text serving, as already noted, as the simplest practical illustrator of the notions discussed. … Therefore, it is not necessary to be overly concerned about which set of events to use in the correlation analysis as a screen for good risk-controlled candidate pairs. Events in trading volume series provide information sometimes not identified (by turning point analysis) in price series. Volume patterns do not directly affect price spreads but volume spurts are a useful warning that a stock may be subject to unusual trading activity and that price development may therefore not be as characterized in statistical models that have been estimated on average recent historical price series. In historical analysis, flags of unusual activity are extremely important in the evaluation of, for example, simulation results.
[FIGURE 2.8 Adjusted close price trace (General Motors) with 20 percent turning points identified. Vertical axis: price in $; horizontal axis: dates from 19970102 to 19980312.]
TABLE 2.1 Event return summary for Chrysler–GM
Criterion    # Events    Return Correlation
daily        332         0.53
30% move     22          0.75
25% move     26          0.73
20% move     33          0.77
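The market-impact passage in this book's excerpts recommends fitting prediction models to the combined data and to up days and down days separately, then comparing predictions. A minimal sketch of that comparison follows. It assumes a straight-line fit by ordinary least squares, which is a simplification chosen here and not necessarily the author's curve, and the observations are made-up illustrative numbers, not data from the book.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_subset(rows):
    """Fit impact against order size for a list of observation rows."""
    return fit_line([r[0] for r in rows], [r[1] for r in rows])


def predict(model, order_size):
    """Predicted market impact for a given (volume-scaled) order size."""
    intercept, slope = model
    return intercept + slope * order_size


# Hypothetical observations, invented for illustration only:
# (order size as a fraction of the day's volume, impact in basis points,
#  True if the stock moved up that day).
observations = [
    (0.01, 2.0, True), (0.02, 4.0, True), (0.04, 8.0, True),
    (0.01, 1.0, False), (0.02, 2.0, False), (0.04, 4.0, False),
]

combined = fit_subset(observations)
up_days = fit_subset([r for r in observations if r[2]])
down_days = fit_subset([r for r in observations if not r[2]])

if __name__ == "__main__":
    size = 0.03
    for name, model in [("combined", combined),
                        ("up days", up_days),
                        ("down days", down_days)]:
        print(f"{name}: predicted impact {predict(model, size):.2f} bps")
```

If the separate up-day and down-day predictions differ materially from the combined one, as they do in this contrived data, that is a sign the direction of the day matters for impact, without relying on a formal significance test.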
pages: 829 words: 186,976 
The Signal and the Noise: Why So Many Predictions Fail-but Some Don't by Nate Silver
airport security, availability heuristic, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, big-box store, Black Swan, Broken windows theory, Carmen Reinhart, Claude Shannon: information theory, Climategate, Climatic Research Unit, cognitive dissonance, collapse of Lehman Brothers, collateralized debt obligation, complexity theory, computer age, correlation does not imply causation, Credit Default Swap, credit default swaps / collateralized debt obligations, cuban missile crisis, Daniel Kahneman / Amos Tversky, diversification, Donald Trump, Edmond Halley, Edward Lorenz: Chaos theory, en.wikipedia.org, equity premium, Eugene Fama: efficient market hypothesis, everywhere but in the productivity statistics, fear of failure, Fellow of the Royal Society, Freestyle chess, fudge factor, George Akerlof, haute cuisine, Henri Poincaré, high batting average, housing crisis, income per capita, index fund, Internet Archive, invention of the printing press, invisible hand, Isaac Newton, James Watt: steam engine, John Nash: game theory, John von Neumann, Kenneth Rogoff, knowledge economy, locking in a profit, Loma Prieta earthquake, market bubble, Mikhail Gorbachev, Moneyball by Michael Lewis explains big data, Monroe Doctrine, mortgage debt, Nate Silver, new economy, Norbert Wiener, PageRank, pattern recognition, pets.com, prediction markets, Productivity paradox, random walk, Richard Thaler, Robert Shiller, Rodney Brooks, Ronald Reagan, Saturday Night Live, savings glut, security theater, short selling, Skype, statistical model, Steven Pinker, The Great Moderation, The Market for Lemons, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, too big to fail, transaction costs, transfer pricing, University of East Anglia, Watson beat the top human players on Jeopardy!, wikimedia commons
Moreover, even the aggregate economic forecasts have been quite poor in any real-world sense, so there is plenty of room for progress. Most economists rely on their judgment to some degree when they make a forecast, rather than just take the output of a statistical model as is. Given how noisy the data is, this is probably helpful. A study (note 62) by Stephen K. McNees, the former vice president of the Federal Reserve Bank of Boston, found that judgmental adjustments to statistical forecasting methods resulted in forecasts that were about 15 percent more accurate. The idea that a statistical model would be able to “solve” the problem of economic forecasting was somewhat in vogue during the 1970s and 1980s when computers came into wider use. But as was the case in other fields, like earthquake forecasting during that time period, improved technology did not cover for the lack of theoretical understanding about the economy; it only gave economists faster and more elaborate ways to mistake noise for a signal. … McNees, “The Role of Judgment in Macroeconomic Forecasting Accuracy,” International Journal of Forecasting, 6, no. 3, pp. 287–99, October 1990. http://www.sciencedirect.com/science/article/pii/016920709090056H. 63. About the only economist I am aware of who relies solely on statistical models without applying any adjustments to them is Ray C. Fair of Yale. I looked at the accuracy of the forecasts from Fair’s model, which have been published regularly since 1984. They aren’t bad in some cases: the GDP and inflation forecasts from Fair’s model have been roughly as good as those of the typical judgmental forecaster. However, the model’s unemployment forecasts have always been very poor, and its performance has been deteriorating recently as it considerably underestimated the magnitude of the recent recession while overstating the prospects for recovery. One problem with statistical models is that they tend to perform well until one of their assumptions is violated and they encounter a new situation, in which case they may produce very inaccurate forecasts. 
… This explanation becomes less credible, however, when the forecaster does not have a history of successful predictions and when the magnitude of his error is larger. In these cases, it is much more likely that the fault lies with the forecaster’s model of the world and not with the world itself. In the instance of CDOs, the ratings agencies had no track record at all: these were new and highly novel securities, and the default rates claimed by S&P were not derived from historical data but instead were assumptions based on a faulty statistical model. Meanwhile, the magnitude of their error was enormous: AAA-rated CDOs were two hundred times more likely to default in practice than they were in theory. The ratings agencies’ shot at redemption would be to admit that the models had been flawed and the mistake had been theirs. But at the congressional hearing, they shirked responsibility and claimed to have been unlucky. They blamed an external contingency: the housing bubble. 

Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei
bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, web application
Statistics
Statistics studies the collection, analysis, interpretation or explanation, and presentation of data. Data mining has an inherent connection with statistics. A statistical model is a set of mathematical functions that describe the behavior of the objects in a target class in terms of random variables and their associated probability distributions. Statistical models are widely used to model data and data classes. For example, in data mining tasks like data characterization and classification, statistical models of target classes can be built. In other words, such statistical models can be the outcome of a data mining task. Alternatively, data mining tasks can be built on top of statistical models. For example, we can use statistics to model noise and missing data values. Then, when mining patterns in a large data set, the data mining process can use the model to help identify and handle noisy or missing values in the data. … Thus, the Gaussian distribution gD can be used to model the normal data, that is, most of the data points in the data set. For each object y in region R, we can estimate the probability that this point fits the Gaussian distribution. Because this probability is very low, y is unlikely to have been generated by the Gaussian model, and thus is an outlier. 
The effectiveness of statistical methods highly depends on whether the assumptions made for the statistical model hold true for the given data. There are many kinds of statistical models. For example, the statistical models used in the methods may be parametric or nonparametric. Statistical methods for outlier detection are discussed in detail in Section 12.3.
Proximity-Based Methods
Proximity-based methods assume that an object is an outlier if the nearest neighbors of the object are far away in feature space, that is, the proximity of the object to its neighbors significantly deviates from the proximity of most of the other objects to their neighbors in the same data set. … Then, when mining patterns in a large data set, the data mining process can use the model to help identify and handle noisy or missing values in the data. Statistics research develops tools for prediction and forecasting using data and statistical models. Statistical methods can be used to summarize or describe a collection of data. Basic statistical descriptions of data are introduced in Chapter 2. Statistics is useful for mining various patterns from data as well as for understanding the underlying mechanisms generating and affecting the patterns. Inferential statistics (or predictive statistics) models data in a way that accounts for randomness and uncertainty in the observations and is used to draw inferences about the process or population under investigation. Statistical methods can also be used to verify data mining results. For example, after a classification or prediction model is mined, the model should be verified by statistical hypothesis testing. 
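The Gaussian outlier idea in this book's excerpts (model the normal data with a Gaussian, then flag objects that the model makes very unlikely) can be sketched in a few lines. This is a minimal one-dimensional illustration, not the book's algorithm: the function names, the sample values, and the density threshold are all assumptions introduced here.

```python
import math
import statistics


def gaussian_density(x, mean, std_dev):
    """Density of a normal distribution with the given mean and std dev."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2.0 * math.pi))


def find_outliers(values, density_threshold):
    """Return the values whose fitted Gaussian density is below a threshold.

    The Gaussian is fitted to all values by sample mean and sample
    standard deviation. density_threshold is a hypothetical cutoff that
    would need to be chosen for the data at hand.
    """
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)
    return [v for v in values
            if gaussian_density(v, mean, std_dev) < density_threshold]


if __name__ == "__main__":
    # Made-up readings: six values near 10 and one far away.
    readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 25.0]
    print(find_outliers(readings, density_threshold=0.02))
```

Note that the extreme value is included when fitting the mean and standard deviation, which inflates both; this echoes the excerpt's warning that such methods are only as good as the assumptions behind the statistical model.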
pages: 204 words: 58,565 
Keeping Up With the Quants: Your Guide to Understanding and Using Analytics by Thomas H. Davenport, Jinho Kim
Black-Scholes formula, business intelligence, business process, call centre, computer age, correlation coefficient, correlation does not imply causation, Credit Default Swap, en.wikipedia.org, feminist movement, Florence Nightingale: pie chart, forensic accounting, global supply chain, Hans Rosling, hypertext link, invention of the telescope, inventory management, Jeff Bezos, margin call, Moneyball by Michael Lewis explains big data, Netflix Prize, p-value, performance metric, publish or perish, quantitative hedge fund, random walk, Renaissance Technologies, Robert Shiller, self-driving car, sentiment analysis, six sigma, Skype, statistical model, supply-chain management, text mining, the scientific method
Data analysis
* * *
Key Software Vendors for Different Analysis Types (listed alphabetically)
REPORTING SOFTWARE: BOARD International; IBM Cognos; Information Builders WebFOCUS; Oracle Business Intelligence (including Hyperion); Microsoft Excel/SQL Server/SharePoint; MicroStrategy; Panorama; SAP BusinessObjects
INTERACTIVE VISUAL ANALYTICS: QlikTech QlikView; Tableau; TIBCO Spotfire
QUANTITATIVE OR STATISTICAL MODELING: IBM SPSS; R (an open-source software package); SAS
* * *
While all of the listed reporting software vendors also have capabilities for graphical display, some vendors focus specifically on interactive visual analytics, or the use of visual representations of data and reporting. Such tools are often used simply to graph data and for data discovery: understanding the distribution of the data, identifying outliers (data points with unexpected values) and visual relationships between variables. So we’ve listed these as a separate category. We’ve also listed key vendors of software for the other category of analysis, which we’ll call quantitative or statistical modeling. In that category, you’re trying to use statistics to understand the relationships between variables and to make inferences from your sample to a larger population. 
… However, there are circumstances in which these “black box” approaches to analysis can greatly leverage the time and productivity of human analysts. In big-data environments, where the data just keeps coming in large volumes, it may not always be possible for humans to create hypotheses before sifting through the data. In the context of placing digital ads on publishers’ sites, for example, decisions need to be made in thousandths of a second by automated decision systems, and the firms doing this work must generate several thousand statistical models per week. Clearly this type of analysis can’t involve a lot of human hypothesizing and reflection on results, and machine learning is absolutely necessary. But for the most part, we’d advise sticking to hypothesis-driven analysis and the steps and sequence in this book.
The Modeling (Variable Selection) Step
A model is a purposefully simplified representation of the phenomenon or problem. … The software vendors for this type of data tend to be different from the reporting software vendors, though the two categories are blending a bit over time. Microsoft Excel, for example, perhaps the most widely used analytical software tool in the world (though most people think of it as a spreadsheet tool), can do some statistical analysis (and visual analytics) as well as reporting, but it’s not the most robust statistical software if you have a lot of data or a complex statistical model to build, so it’s not listed in that category. Excel’s usage for analytics in the corporate environment is frequently augmented by other Microsoft products, including SQL Server (primarily a database tool with some analytical functionality) and SharePoint (primarily a collaboration tool, with some analytical functionality).
Types of Models
There are a variety of model types that analysts and their organizations use to think analytically and make data-based decisions. 
pages: 400 words: 94,847 
Reinventing Discovery: The New Era of Networked Science by Michael Nielsen
Albert Einstein, augmented reality, barriers to entry, bioinformatics, Cass Sunstein, Climategate, Climatic Research Unit, conceptual framework, dark matter, discovery of DNA, double helix, Douglas Engelbart, en.wikipedia.org, Erik Brynjolfsson, fault tolerance, Fellow of the Royal Society, Firefox, Freestyle chess, Galaxy Zoo, Internet Archive, invisible hand, Jane Jacobs, Jaron Lanier, Kevin Kelly, Magellanic Cloud, means of production, medical residency, Nicholas Carr, publish or perish, Richard Feynman, Richard Stallman, semantic web, Silicon Valley, Silicon Valley startup, Simon Singh, Skype, slashdot, social web, statistical model, Stephen Hawking, Stewart Brand, Ted Nelson, The Death and Life of Great American Cities, The Nature of the Firm, The Wisdom of Crowds, University of East Anglia, Vannevar Bush, Vernor Vinge
Might it be that the statistical models contain more truth than our conventional theories of language, with their notions of verb, noun, and adjective, subjects and objects, and so on? Or perhaps the models contain a different kind of truth, in part complementary, and in part overlapping, with conventional theories of language? Maybe we could develop a better theory of language by combining the best insights from the conventional approach and the approach based on statistical modeling into a single, unified explanation? Unfortunately, we don’t yet know how to make such unified theories. But it’s stimulating to speculate that nouns and verbs, subjects and objects, and all the other paraphernalia of language are really emergent properties whose existence can be deduced from statistical models of language. … The program would also examine the corpus to figure out how words moved around in the sentence, observing, for example, that “hola” and “hello” tend to be in the same parts of the sentence, while other words get moved around more. 
Repeating this for every pair of words in the Spanish and English languages, their program gradually built up a statistical model of translation: an immensely complex model, but nonetheless one that can be stored on a modern computer. I won’t describe the models they used in complete detail here, but the hola-hello example gives you the flavor. Once they had analyzed the corpus and built up their statistical model, they used that model to translate new texts. To translate a Spanish sentence, the idea was to find the English sentence that, according to the model, had the highest probability. That high-probability sentence would be output as the translation. Frankly, when I first heard about statistical machine translation I thought it didn’t sound very promising. … But whereas Darwin’s theory of evolution can be summed up in a few sentences, and Einstein’s general theory of relativity can be expressed in a single equation, these theories of translation are expressed in models with billions of parameters. You might object that such a statistical model doesn’t seem much like a conventional scientific explanation, and you’d be right: it’s not an explanation in the conventional sense. But perhaps it should be considered instead as a new kind of explanation. Ordinarily, we judge explanations in part by their ability to predict new phenomena. In the case of translation, that means accurately translating never-before-seen sentences. And so far, at least, the statistical translation models do a better job of that than any conventional theory of language. It’s telling that a model that doesn’t even understand the noun-verb distinction can outperform our best linguistic models. At the least we should take seriously the idea that these statistical models express truths not found in more conventional explanations of language translation. 
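The core step described above, choosing the English sentence that the model rates most probable, can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the probability table is invented rather than learned from a corpus, candidates are built word by word with no reordering, and real systems of the kind the excerpt describes use far richer models.

```python
import math
from itertools import product

# Hypothetical toy table of P(english word | spanish word).
# A real model would estimate such values from a large parallel corpus.
TRANSLATION_PROBS = {
    "hola": {"hello": 0.9, "hi": 0.1},
    "mundo": {"world": 0.8, "earth": 0.2},
}


def best_translation(spanish_words):
    """Return the candidate English sentence with the highest probability.

    Candidates are every word-by-word combination of the table entries;
    a sentence's log-probability is the sum of its word log-probabilities.
    """
    best_sentence, best_log_prob = None, -math.inf
    options = [TRANSLATION_PROBS[word] for word in spanish_words]
    for candidate in product(*options):
        log_prob = sum(
            math.log(TRANSLATION_PROBS[source][target])
            for source, target in zip(spanish_words, candidate)
        )
        if log_prob > best_log_prob:
            best_sentence, best_log_prob = candidate, log_prob
    return " ".join(best_sentence)


if __name__ == "__main__":
    print(best_translation(["hola", "mundo"]))
```

Enumerating every candidate is only workable for toy inputs; the point is the selection rule, which picks whichever sentence the statistical model scores highest without any notion of nouns or verbs.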
pages: 354 words: 26,550 
HighFrequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems by Irene Aldridge Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
algorithmic trading, asset allocation, asset-backed security, automated trading system, backtesting, Black Swan, Brownian motion, business process, capital asset pricing model, centralized clearinghouse, collapse of Lehman Brothers, collateralized debt obligation, collective bargaining, diversification, equity premium, fault tolerance, financial intermediation, fixed income, high net worth, implied volatility, index arbitrage, interest rate swap, inventory management, law of one price, Long Term Capital Management, Louis Bachelier, margin call, market friction, market microstructure, martingale, New Journalism, p-value, paper trading, performance metric, profit motive, purchasing power parity, quantitative trading / quantitative finance, random walk, Renaissance Technologies, risk tolerance, risk-adjusted returns, risk/return, Sharpe ratio, short selling, Small Order Execution System, statistical arbitrage, statistical model, stochastic process, stochastic volatility, systematic trading, trade route, transaction costs, value at risk, yield curve Operational risk—the risk of financial losses embedded in daily trading operations 5. Legal risk—the risk of litigation expenses All current risk measurement approaches fall into four categories: statistical models, scalar models, scenario analysis, and causal modeling. Statistical models generate predictions about worst-case future conditions based on past information. The Value-at-Risk (VaR) methodology is the most common statistical risk measurement tool, discussed in detail in the sections that focus on market and liquidity risk estimation. Statistical models are the preferred methodology of risk estimation whenever statistical modeling is feasible. Scalar models establish the maximum foreseeable loss levels as percentages of business parameters, such as revenues, operating costs, and the like. 
The parameters can be computed as averages of several days, weeks, months, or even years of a particular business variable, depending on the time frame most suitable for each parameter. … Yet, readers relying on software packages with preconfigured statistical procedures may find the level of detail presented here to be sufficient for quality analysis of trading opportunities. The depth of the statistical content should also be sufficient for readers to understand the models presented throughout the remainder of this book. Readers interested in a more thorough treatment of statistical models may refer to Tsay (2002); Campbell, Lo, and MacKinlay (1997); and Gouriéroux and Jasiak (2001). This chapter begins with a review of the fundamental statistical estimators, moves on to linear dependency identification methods and volatility modeling techniques, and concludes with standard nonlinear approaches for identifying and modeling trading opportunities. STATISTICAL PROPERTIES OF RETURNS According to Dacorogna et al. (2001, p. 121), “high-frequency data opened up a whole new field of exploration and brought to light some behaviors that could not be observed at lower frequencies.” … CHAPTER 12 Event Arbitrage With news reported instantly and trades placed on a tick-by-tick basis, high-frequency strategies are now ideally positioned to profit from the impact of announcements on markets. These high-frequency strategies, which trade on the market movements surrounding news announcements, are collectively referred to as event arbitrage. 
This chapter investigates the mechanics of event arbitrage in the following order: overview of the development process; generating a price forecast through statistical modeling of directional forecasts and point forecasts; applying event arbitrage to corporate announcements, industry news, and macroeconomic news; documented effects of events on foreign exchange, equities, fixed income, futures, emerging economies, commodities, and REIT markets. DEVELOPING EVENT ARBITRAGE TRADING STRATEGIES Event arbitrage refers to the group of trading strategies that place trades on the basis of the markets’ reaction to events. 
pages: 320 words: 33,385 
Market Risk Analysis, Quantitative Methods in Finance by Carol Alexander Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
asset allocation, backtesting, barriers to entry, Brownian motion, capital asset pricing model, constrained optimization, credit crunch, Credit Default Swap, discounted cash flows, discrete time, diversification, diversified portfolio, en.wikipedia.org, implied volatility, interest rate swap, market friction, market microstructure, p-value, performance metric, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, risk/return, Sharpe ratio, statistical arbitrage, statistical model, stochastic process, stochastic volatility, transaction costs, value at risk, volatility smile, Wiener process, yield curve Chapter 3, Probability and Statistics, covers the probabilistic and statistical models that we use to analyse the evolution of financial asset prices or interest rates. Starting from the basic concepts of a random variable, a probability distribution, quantiles and population and sample moments, we then provide a catalogue of probability distributions. We describe the theoretical properties of each distribution and give examples of practical applications to finance. Stable distributions and kernel estimates are also covered, because they have broad applications to financial risk management. The sections on statistical inference and maximum likelihood lay the foundations for Chapter 4. Finally, we focus on the continuous time and discrete time statistical models for the evolution of financial asset prices and returns, which are further developed in Volume III. … The multivariate t distribution has very useful applications which will be described in Volumes II and IV. Its most important market risk modelling applications are to: multivariate GARCH modelling, generating copulas, and simulating asset prices. I.3.5 INTRODUCTION TO STATISTICAL INFERENCE A statistical model will predict well only if it is properly specified and its parameter estimates are robust, unbiased and efficient. 
Unbiased means that the expected value of the estimator is equal to the true model parameter and efficient means that the variance of the estimator is low, i.e. different samples give similar estimates. When we set up a statistical model the implicit assumption is that this is the ‘true’ model for the population. We estimate the model’s parameters from a sample and then use these estimates to infer the values of the ‘true’ population parameters. With what degree of confidence can we say that the ‘true’ parameter takes some value such as 0? … Using this add-in, we have been able to compute eigenvectors and eigenvalues and perform many other matrix operations that would not be possible otherwise in Excel, except by purchasing software. This matrix.xla add-in is included on the CD-ROM, but readers may also like to download any later versions, currently available free from: http://digilander.libero.it/foxes (email: leovlp@libero.it). I.3 Probability and Statistics I.3.1 INTRODUCTION This chapter describes the probabilistic and statistical models that we use to analyse the evolution of financial asset prices or interest rates. Prices or returns on financial assets, interest rates or their changes, and the value or P&L of a portfolio are some examples of the random variables used in finance. A random variable is a variable whose value could be observed today and in the past, but whose future values are unknown. We may have some idea about the future values, but we do not know exactly which value will be realized in the future. 
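The definitions of unbiased and efficient estimators given in this excerpt can be checked by simulation. The sketch below is an illustration under assumed settings (normal data with true mean 0, samples of size 25, 2,000 replications, and a hypothetical helper named `simulate`); it compares the sample mean and the sample median as estimators of the population mean.

```python
import random
import statistics


def simulate(estimator, true_mu=0.0, sigma=1.0, n=25, reps=2000, seed=0):
    """Apply `estimator` to many independent samples; return (bias, variance) of its estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    bias = statistics.fmean(estimates) - true_mu   # close to 0 for an unbiased estimator
    variance = statistics.pvariance(estimates)     # lower variance means more efficient
    return bias, variance


bias_mean, var_mean = simulate(statistics.fmean)
bias_med, var_med = simulate(statistics.median)

print(f"mean:   bias={bias_mean:+.4f}  variance={var_mean:.4f}")
print(f"median: bias={bias_med:+.4f}  variance={var_med:.4f}")
```

For normally distributed data both estimators should show a bias near zero, while the sample mean should show the smaller variance, which is what "efficient" means in the excerpt's sense.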
pages: 265 words: 74,000 
The Numerati by Stephen Baker Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Berlin Wall, Black Swan, business process, call centre, correlation does not imply causation, Drosophila, full employment, illegal immigration, index card, Isaac Newton, job automation, job satisfaction, McMansion, natural language processing, PageRank, personalized medicine, recommendation engine, RFID, Silicon Valley, Skype, statistical model, Watson beat the top human players on Jeopardy! He started publishing papers nearly as soon as he arrived. And when he got his master's, he decided to look for a job "at places where they hire Ph.D.'s." He landed at Accenture, and now, at an age at which many of his classmates are just finishing their doctorate, he runs the analytics division from his perch in Chicago. Ghani leads me out of his office and toward the shopping cart. For statistical modeling, he explains, grocery shopping is one of the first retail industries to conquer. This is because we buy food constantly. For many of us, the supermarket functions as a chilly, Muzakblaring annex to our pantries. (I would bet that millions of suburban Americans spend more time in supermarkets than in their formal living room.) Our grocery shopping is so prodigious that just by studying one year of our receipts, researchers can detect all sorts of patterns—far more than they can learn from a year of records detailing our other, more sporadic purchases. … It's terrifying." He thinks that over the next generation, many of us will surround ourselves with the kinds of networked gadgets he and his team are building and testing. These machines will busy themselves with far more than measuring people's pulse and counting the pills they take, which is what today's stateoftheart monitors can do. Dishman sees sensors eventually recording and building statistical models of almost every aspect of our behavior. They'll track our pathways in the house, the rhythm of our gait. 
They'll diagram our thrashing in bed and chart our nightly trips to the bathroom—perhaps keeping tabs on how much time we spend in there. Some of these gadgets will even measure the pause before we recognize a familiar voice on the phone. A surveillance society gone haywire? Personal privacy in tatters? … Let's say they see lots of activity in the morning and at bedtime. Together those two periods might represent 90 percent of toothbrush movement. From that, they can calculate a 90 percent probability that toothbrush movement involves teeth cleaning. (They could factor in time variables, but there's more than enough complexity ahead, as we'll see.) Next they move to the broom and the teakettle, and they ask the same questions. The goal is to build a statistical model for each of us that will infer from a series of observations what we're most likely to be doing. The toothbrush was easy. For the most part, it sticks to only one job. But consider the kettle. What are the chances that it's being used for tea? Maybe a person uses it to make instant soup (which is more nutritious than tea but dangerously salty for people like my mother). How can the Intel team come up with a probability? 
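The toothbrush reasoning in this excerpt is a relative-frequency estimate: the share of sensor events that fall inside the morning and bedtime windows is taken as the probability that movement means teeth cleaning. A minimal sketch follows, with an invented event log and invented time windows (the excerpt gives neither) and a hypothetical helper `share_in_windows`.

```python
def share_in_windows(event_hours, windows):
    """Fraction of events whose hour falls inside any [start, end) window."""
    inside = sum(
        any(start <= hour < end for start, end in windows) for hour in event_hours
    )
    return inside / len(event_hours)


# Hypothetical log: hour of day for each detected toothbrush movement.
events = [7, 7, 8, 7, 22, 22, 23, 22, 7, 14]
# Assumed "morning" and "bedtime" windows, in hours.
windows = [(6, 9), (21, 24)]

p_cleaning = share_in_windows(events, windows)
print(p_cleaning)  # 0.9
```

The kettle case the excerpt raises is harder precisely because one object serves several activities, so a single frequency like this no longer identifies the activity.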

Everydata: The Misinformation Hidden in the Little Data You Consume Every Day by John H. Johnson Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Affordable Care Act / Obamacare, Black Swan, business intelligence, Carmen Reinhart, cognitive bias, correlation does not imply causation, Daniel Kahneman / Amos Tversky, Donald Trump, en.wikipedia.org, Kenneth Rogoff, labor-force participation, lake wobegon effect, Long Term Capital Management, Mercator projection, Mercator projection distort size, especially Greenland and Africa, meta analysis, meta-analysis, Nate Silver, obamacare, p-value, PageRank, pattern recognition, randomized controlled trial, risk-adjusted returns, Ronald Reagan, statistical model, The Signal and the Noise by Nate Silver, Tim Cook: Apple, wikimedia commons, Yogi Berra You collect all the data on every wheat price in the history of humankind, and all the different factors that determine the price of wheat (temperature, feed prices, transportation costs, etc.). First, you need to develop a statistical model to determine what factors have affected the price of wheat in the past and how these various factors relate to one another mathematically. Then, based on that model, you predict the price of wheat for next year. The problem is that no matter how big your sample is (even if it’s the full population), and how accurate your statistical model is, there are still unknowns that can cause your forecast to be off: What if a railroad strike doubles the transportation costs? What if Congress passes new legislation capping the price of wheat? What if there’s a genetic mutation that makes wheat grow twice as fast, essentially doubling the world’s supply? … As Hovenkamp said, “the plaintiff’s expert had ignored a clear ‘outlier’ in the data.” If that outlier data had been excluded—as it arguably should have been—then the results would have shown a clear increase in market share for Conwood. Instead, the conclusion—driven by an extreme observation—showed a decrease. If your conclusions change dramatically by excluding a data point, then that data point is a strong candidate to be an outlier. 
In a good statistical model, you would expect that you can drop a data point without seeing a substantive difference in the results. It’s something to think about when looking for outliers. Are You Better Than Average? The average American: sleeps more than 8.7 hours per day; weighs approximately 181 pounds (195.5 pounds for men and 166.2 pounds for women); drinks 20.8 gallons of beer per year; drives 13,476 miles per year (hopefully not after drinking all that beer); showers six times a week, but only shampoos four times a week; has been at his or her current job 4.6 years. So, are you better than average? … (On its website, Visa even suggests that you tell your financial institution if you’ll be traveling, which can “help ensure that your card isn’t flagged for unusual activity.”) This is a perfect example of a false positive—the credit card company predicted that the charges on your card were potentially fraudulent, but it was wrong. Events like this, which may not be accounted for in the statistical model, are potential sources of prediction error. Just as sampling error tells us about the uncertainty in our sample, prediction error is a way to measure uncertainty in the future, essentially by comparing the predicted results to the actual outcomes, once they occur. Prediction error is often measured using a prediction interval, which is the range in which we expect to see the next data point. 
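The drop-one-point check described at the start of this excerpt can be sketched as a leave-one-out comparison: refit the model without each observation in turn and see whether the conclusion changes. The example below is an assumption-laden illustration: it uses an ordinary least-squares slope as "the result", invented data with one extreme value, and hypothetical helper names `slope` and `leave_one_out_slopes`.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def leave_one_out_slopes(xs, ys):
    """Slope refitted with each observation dropped in turn."""
    return [
        slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(len(xs))
    ]


# Invented data: a steady upward trend plus one extreme final observation.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 2.1, 2.9, 4.2, 5.0, -20.0]

full = slope(xs, ys)
loo = leave_one_out_slopes(xs, ys)

# Indices whose removal flips the sign of the estimated trend.
flips = [i for i, s in enumerate(loo) if (s > 0) != (full > 0)]
print(full < 0, flips)  # True [5]
```

Here the fitted trend is negative with all six points and positive once the last point is removed, the same pattern the excerpt describes: a conclusion driven by a single extreme observation.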
pages: 294 words: 82,438 
Simple Rules: How to Thrive in a Complex World by Donald Sull, Kathleen M. Eisenhardt Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Affordable Care Act / Obamacare, Airbnb, asset allocation, Atul Gawande, barriers to entry, Basel III, Berlin Wall, carbon footprint, Checklist Manifesto, complexity theory, Craig Reynolds: boids flock, Credit Default Swap, Daniel Kahneman / Amos Tversky, diversification, en.wikipedia.org, European colonialism, Exxon Valdez, facts on the ground, Fall of the Berlin Wall, haute cuisine, invention of the printing press, Isaac Newton, Kickstarter, late fees, Lean Startup, Louis Pasteur, Lyft, Moneyball by Michael Lewis explains big data, Nate Silver, Network effects, obamacare, Paul Graham, performance metric, price anchoring, RAND corporation, risk/return, Saturday Night Live, sharing economy, Silicon Valley, Startup school, statistical model, Steve Jobs, TaskRabbit, The Signal and the Noise by Nate Silver, transportationnetwork company, twosided market, WallE, web application, Y Combinator, Zipcar One study looked at how police can identify where serial criminals live. A simple rule—take the midpoint of the two most distant crime scenes—got police closer to the criminal than more sophisticated decisionmaking approaches. Another study compared a stateoftheart statistical model and a simple rule to determine which did a better job of predicting whether past customers would purchase again. According to the simple rule, a customer was inactive if they had not purchased in x months (the number of months varies by industry). The simple rule did as well as the statistical model in predicting repeat purchases of online music, and beat it in the apparel and airline industries. Other research finds that simple rules match or beat more complicated models in assessing the likelihood that a house will be burgled and in forecasting which patients with chest pain are actually suffering from a heart attack. … ., “Validation of the Emergency Severity Index (ESI) in SelfReferred Patients in a European Emergency Department,” Emergency Medicine Journal 24, no. 3 (2007): 170–74. 
[>] Statisticians have found: Professor Scott Armstrong of the Wharton School reviewed thirty-three studies comparing simple and complex statistical models used to forecast business and economic outcomes. He found no difference in forecasting accuracy in twenty-one of the studies. Sophisticated models did better in five studies, while simple models outperformed complex ones in seven cases. See J. Scott Armstrong, “Forecasting by Extrapolation: Conclusions from 25 Years of Research,” Interfaces 14 (1984): 52–66. Spyros Makridakis has hosted a series of competitions for statistical models over two decades, and consistently found that complex models fail to outperform simpler approaches. The history of the competitions is summarized in Spyros Makridakis and Michèle Hibon, “The M3-Competition: Results, Conclusions, and Implications,” International Journal of Forecasting 16, no. 4 (2000): 451–76. [>] When it comes to modeling: In statistical terms, a model that closely approximates the underlying function that generates observed data is said to have low bias. … In fact, the 1/N rule ignores everything except for the number of investment alternatives under consideration. It is hard to imagine a simpler investment rule. And yet it works. One recent study of alternative investment approaches pitted the Markowitz model and three extensions of his approach against the 1/N rule, testing them on seven samples of data from the real world. This research ran a total of twenty-eight horse races between the four state-of-the-art statistical models and the 1/N rule. With ten years of historical data to estimate risk, returns, and correlations, the 1/N rule outperformed the Markowitz equation and its extensions 79 percent of the time. The 1/N rule earned a positive return in every test, while the more complicated models lost money for investors more than half the time. Other studies have run similar tests and come to the same conclusions. 
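The 1/N rule discussed in this excerpt needs only the number of investment alternatives, which makes it easy to write down. A minimal sketch, assuming four hypothetical assets with made-up one-period returns; the function names are illustrative and not taken from the study being summarized.

```python
def one_over_n_weights(n_assets):
    """Equal weight for every asset; the only input is how many assets there are."""
    if n_assets <= 0:
        raise ValueError("need at least one asset")
    return [1.0 / n_assets] * n_assets


def portfolio_return(weights, asset_returns):
    """Weighted sum of the individual asset returns for one period."""
    return sum(w * r for w, r in zip(weights, asset_returns))


weights = one_over_n_weights(4)
# Made-up one-period returns for the four hypothetical assets.
returns = [0.10, -0.02, 0.04, 0.00]
print(weights)
print(round(portfolio_return(weights, returns), 4))  # 0.03
```

By contrast, a Markowitz-style allocation would first have to estimate expected returns, risks, and correlations from historical data, which is where the estimation error that the excerpt alludes to enters.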
pages: 348 words: 39,850 
Data Scientists at Work by Sebastian Gutierrez Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, algorithmic trading, bioinformatics, bitcoin, business intelligence, chief data officer, clean water, cloud computing, computer vision, continuous integration, correlation does not imply causation, crowdsourcing, data is the new oil, DevOps, domain-specific language, follow your passion, full text search, informal economy, information retrieval, Infrastructure as a Service, inventory management, iterative process, linked data, Mark Zuckerberg, microbiome, Moneyball by Michael Lewis explains big data, move fast and break things, natural language processing, Network effects, nuclear winter, optical character recognition, pattern recognition, Paul Graham, personalized medicine, Peter Thiel, pre–internet, quantitative hedge fund, quantitative trading / quantitative finance, recommendation engine, Renaissance Technologies, Richard Feynman, self-driving car, side project, Silicon Valley, Skype, software as a service, speech recognition, statistical model, Steve Jobs, stochastic process, technology bubble, text mining, the scientific method, web application Chapter 3: Yann LeCun, Facebook. In physics, a lot of the new results in astrophysics and high-energy physics actually rely very heavily on large data and complex statistical models. Things like the discovery of dark energy, for example, co-discovered by Saul Perlmutter, Nobel Prize winner, who is my counterpart of the Moore-Sloan Data Science Initiative at UC Berkeley, was made using massive statistical analysis. Also, a thing like the discovery of the Higgs boson was the result of massive statistical data analysis and results. Part of the system for this work was actually designed by my NYU colleague, Kyle Cranmer, who designed the integration for all the statistical models. Data Science is also on its way to revolutionize social science. There is actually a big push from social scientists who would love to put their hands on Facebook’s data. 
… You just can’t really move slowly when you’ve got a whole company full of super-motivated people excited about what they’re doing. It’s just not in your DNA. Of course, as competitors enter the market, there’s also a legitimate business need of moving fast if we really want to keep our awesome business thriving. Gutierrez: How would you describe your work to a data scientist? Smallwood: I would say we’re a team that does all kinds of statistical modeling. We really focus and output three things as a team. We work on predictive models using all of the techniques that people in this field would be familiar with—regression techniques, clustering techniques, matrix factorization, support vector machines, et cetera, both supervised and unsupervised techniques. A second thing is algorithms, which I would say are obviously closely related to models, except that they’re embedded in some sort of ongoing process, like our product. … That’s really been my favorite part of working on a multidisciplinary team. Gutierrez: In addition to pair programming, do you do pair data science? Shellman: We don’t formally pair on statistics or data science work. For these subjects we have standing discussions around the whiteboards that surround our open-plan office. For instance, yesterday we finished the day with a discussion of how a statistical model could be applied, what data would be needed, the limitations of the model, and the latency expected when using the model in a real-time application. So while we weren’t pair programming, we were discussing behavior and expected results as a group. The great thing about our workspace is that these discussions happen in the open, so everybody can hear, choose to participate, and join in if they have something to contribute. 
pages: 443 words: 51,804 
Handbook of Modeling HighFrequency Data in Finance by Frederi G. Viens, Maria C. Mariani, Ionut Florescu Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
algorithmic trading, asset allocation, automated trading system, backtesting, Black-Scholes formula, Brownian motion, business process, continuous integration, corporate governance, discrete time, distributed generation, fixed income, Flash crash, housing crisis, implied volatility, incomplete markets, linear programming, mandelbrot fractal, market friction, market microstructure, martingale, Menlo Park, p-value, pattern recognition, performance metric, principal–agent problem, random walk, risk tolerance, risk/return, short selling, statistical model, stochastic process, stochastic volatility, transaction costs, value at risk, volatility smile, Wiener process Florescu, Ionuţ, 1973– III. Title. HG106.V54 2011 332.01 5193–dc23 2011038022 Printed in the United States of America. Contents: Preface. Contributors. Part One: Analysis of Empirical Data. 1 Estimation of NIG and VG Models for High Frequency Financial Data. José E. Figueroa-López, Steven R. Lancette, Kiseop Lee, and Yanhui Mi. 1.1 Introduction; 1.2 The Statistical Models; 1.3 Parametric Estimation Methods; 1.4 Finite-Sample Performance via Simulations; 1.5 Empirical Results; 1.6 Conclusion; References. 2 A Study of Persistence of Price Movement using High Frequency Financial Data. Dragos Bozdog, Ionuţ Florescu, Khaldoun Khashanah, and Jim Wang. 2.1 Introduction; 2.2 Methodology; 2.3 Results; 2.4 Rare Events Distribution; 2.5 Conclusions; References. 3 Using Boosting for Financial Analysis and Trading. Germán Creamer. 3.1 Introduction; 3.2 Methods; 3.3 Performance Evaluation; 3.4 Earnings Prediction and Algorithmic Trading; 3.5 Final Comments and Conclusions; References. 4 Impact of Correlation Fluctuations on Securitized structures. Eric Hillebrand, Ambar N. … In Section 1.5, we present our empirical results using high frequency transaction data from the US equity market. 
The data was obtained from the NYSE TAQ database of 2005 trades via Wharton’s WRDS system. For the sake of clarity and space, we only present the results for Intel and defer a full analysis of other stocks for a future publication. We finish with a section of conclusions and further recommendations. 1.2 The Statistical Models 1.2.1 GENERALITIES OF EXPONENTIAL LÉVY MODELS Before introducing the specific models we consider in this chapter, let us briefly motivate the application of Lévy processes in financial modeling. We refer the reader to the monographs of Cont & Tankov (2004) and Sato (1999) or the recent review papers Figueroa-López (2011) and Tankov (2011) for further information. Exponential (or Geometric) Lévy models are arguably the most natural generalization of the geometric Brownian motion intrinsic in the Black–Scholes option pricing model. A geometric Brownian motion (also called Black–Scholes model) postulates the following conditions about the price process $(S_t)_{t \ge 0}$ of a risky asset: (1) The (log) return on the asset over a time period $[t, t+h]$ of length $h$, that is, $R_{t,t+h} := \log(S_{t+h}/S_t)$, is Gaussian with mean $\mu h$ and variance $\sigma^2 h$ (independent of $t$); (2) Log returns on disjoint time periods are mutually independent; (3) The price path $t \to S_t$ is continuous; that is, $P(S_u \to S_t \text{ as } u \to t,\ \forall t) = 1$. The previous assumptions can equivalently be stated in terms of the so-called log return process $(X_t)_t$, denoted henceforth as $X_t := \log(S_t/S_0)$. Indeed, assumption (1) is equivalent to ask that the increment $X_{t+h} - X_t$ of the process $X$ over $[t, t+h]$ is Gaussian with mean $\mu h$ and variance $\sigma^2 h$. 
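Conditions (1) and (2) above translate directly into a simulation on a discrete time grid: draw independent Gaussian log-return increments with mean μh and variance σ²h, accumulate them into the log return process, and exponentiate. The sketch below follows the excerpt's parametrization, in which μh is the mean of the log return itself; the function name `simulate_gbm` and the parameter values are assumptions chosen for illustration, and a discrete grid cannot represent the path continuity in condition (3).

```python
import math
import random


def simulate_gbm(s0, mu, sigma, h, steps, seed=0):
    """Simulate prices S_0, S_h, S_2h, ... whose log returns are i.i.d. N(mu*h, sigma^2*h)."""
    rng = random.Random(seed)
    x = 0.0                # log return process X_t = log(S_t / S_0)
    prices = [s0]
    for _ in range(steps):
        # Independent Gaussian increment X_{t+h} - X_t with mean mu*h and std sigma*sqrt(h).
        x += rng.gauss(mu * h, sigma * math.sqrt(h))
        prices.append(s0 * math.exp(x))
    return prices


# Illustrative values only: daily steps (h = 1/252 of a year) for 10,000 steps.
mu, sigma, h = 0.05, 0.2, 1 / 252
prices = simulate_gbm(100.0, mu, sigma, h, steps=10_000)

# Recover the log returns R_{t,t+h} = log(S_{t+h} / S_t) and compare moments.
log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
mean_r = sum(log_returns) / len(log_returns)
var_r = sum((r - mean_r) ** 2 for r in log_returns) / len(log_returns)
print(f"sample mean {mean_r:.6f} vs mu*h {mu * h:.6f}")
print(f"sample var  {var_r:.6f} vs sigma^2*h {sigma ** 2 * h:.6f}")
```

Exponential Lévy models, which the excerpt introduces next, keep the independent-increments structure but replace the Gaussian draw with increments from another infinitely divisible distribution.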
pages: 545 words: 137,789 
How Markets Fail: The Logic of Economic Calamities by John Cassidy Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, Andrei Shleifer, anticommunist, asset allocation, assetbacked security, availability heuristic, bank run, banking crisis, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, BlackScholes formula, Bretton Woods, British Empire, capital asset pricing model, centralized clearinghouse, collateralized debt obligation, Columbine, conceptual framework, Corn Laws, correlation coefficient, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, crony capitalism, Daniel Kahneman / Amos Tversky, debt deflation, diversification, Elliott wave, Eugene Fama: efficient market hypothesis, financial deregulation, financial innovation, Financial Instability Hypothesis, financial intermediation, full employment, George Akerlof, global supply chain, Haight Ashbury, hiring and firing, Hyman Minsky, income per capita, incomplete markets, index fund, invisible hand, John Nash: game theory, John von Neumann, Joseph Schumpeter, laissezfaire capitalism, liquidity trap, London Interbank Offered Rate, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, margin call, market bubble, market clearing, mental accounting, Mikhail Gorbachev, Mont Pelerin Society, moral hazard, mortgage debt, Naomi Klein, Network effects, Nick Leeson, Northern Rock, paradox of thrift, Ponzi scheme, price discrimination, price stability, principal–agent problem, profit maximization, quantitative trading / quantitative ﬁnance, race to the bottom, Ralph Nader, RAND corporation, random walk, Renaissance Technologies, rent control, Richard Thaler, risk tolerance, riskadjusted returns, road to serfdom, Robert Shiller, Robert Shiller, Ronald Coase, Ronald Reagan, shareholder value, short selling, Silicon Valley, South Sea Bubble, sovereign wealth fund, statistical model, technology bubble, The Chicago School, The Great Moderation, The Market for Lemons, The Wealth of Nations by Adam Smith, too big to fail, transaction costs, unorthodox policies, value at risk, 
Vanguard fund “Today, retail lending has become more routinized as banks have become increasingly adept at predicting default risk by applying statistical models to data, such as credit scores,” Bernanke went on. “Other tools include proprietary internal debt-rating models and third-party programs that use market data to analyze the risk of exposures to corporate borrowers that issue stock.” While challenges remained, Bernanke concluded, “banking organizations of all sizes have made substantial strides over the past two decades in their ability to measure and manage risks.” Nobody could quibble with Bernanke’s point that Wall Street was becoming more quantitative: the research and risk departments of big financial firms were teeming with physicists, applied mathematicians, and statisticians. But the proper role of statistical models is as a useful adjunct to an overall strategy of controlling risk, not as a substitute for one. … However, it also raises the possibility that the causal relationships that determine market movements aren’t fixed, but vary over time. Maybe because of shifts in psychology or government policy, there are periods when markets will settle into a rut, and other periods when they will be apt to gyrate in alarming fashion. This picture seems to jibe with reality, but it raises some tricky issues for quantitative finance. If the underlying reality of the markets is constantly changing, statistical models based on past data will be of limited use, at best, in determining what is likely to happen in the future. And firms and investors that rely on these models to manage risk may well be exposing themselves to danger. The economics profession didn’t exactly embrace Mandelbrot’s criticisms. As the 1970s proceeded, the use of quantitative techniques became increasingly common on Wall Street. The coin-tossing view of finance made its way into the textbooks and, with the help of Burton Malkiel, onto the bestsellers list. 
… After listening to Vincent Reinhart, the head of the Fed’s Division of Monetary Affairs, suggest several ways the Fed could try to revive the economy if interest rate changes could no longer be used, he dismissed the discussion as “premature” and described the possibility of a prolonged deflation as “a very small probability event.” The discussion turned to the immediate issue of whether to keep the funds rate at 1.25 percent. Since the committee’s previous meeting, Congress had approved the Bush administration’s third set of tax cuts since 2001, which was expected to give spending a boost. The Fed’s own statistical model of the economy was predicting a vigorous upturn later in 2003, suggesting that further rate cuts would be unnecessary and that some policy tightening might even be needed. “But that forecast has a very low probability, as far as I’m concerned,” Greenspan said curtly. “It points to an outcome that would be delightful if it were to materialize, but it is not a prospect on which we should focus our policy at this point.” 
pages: 263 words: 75,455 
Quantitative Value: A Practitioner's Guide to Automating Intelligent Investment and Eliminating Behavioral Errors by Wesley R. Gray, Tobias E. Carlisle Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, Andrei Shleifer, asset allocation, Atul Gawande, backtesting, Black Swan, capital asset pricing model, Checklist Manifesto, cognitive bias, compound rate of return, corporate governance, correlation coefficient, credit crunch, Daniel Kahneman / Amos Tversky, discounted cash flows, Eugene Fama: efficient market hypothesis, forensic accounting, hindsight bias, Louis Bachelier, pvalue, passive investing, performance metric, quantitative hedge fund, random walk, Richard Thaler, riskadjusted returns, Robert Shiller, Robert Shiller, shareholder value, Sharpe ratio, short selling, statistical model, systematic trading, The Myth of the Rational Market, time value of money, transaction costs We need some means to protect us from our cognitive biases, and the quantitative method is that means. It serves both to protect us from our own behavioral errors and to exploit the behavioral errors of others. The model need not be complex to achieve this end. In fact, the weight of evidence indicates that even simple statistical models outperform the best experts. It speaks to the diabolical nature of our faulty cognitive apparatus that those simple statistical models continue to outperform the best experts even when those same experts are given access to the models' output. This is as true for a value investor as it is for any other expert in any other field of endeavor. This book is aimed at value investors. It's a humbling and maddening experience to compare active investment results with an analogous passive strategy. … In his book, Expert Political Judgment,36 Philip Tetlock discusses his extensive study of people who make prediction their business—the experts. Tetlock's conclusion is that experts suffer from the same behavioral biases as the laymen. Tetlock's study fits within a much larger body of research that has consistently found that experts are as unreliable as the rest of us. 
A large number of studies have examined the records of experts against simple statistical models, and, in almost all cases, concluded that experts either underperform the models or can do no better. It's a compelling argument against human intuition and for the statistical approach, whether it's practiced by experts or nonexperts.37 Even Experts Make Behavioral Errors In many disciplines, simple quantitative models outperform the intuition of the best experts. The simple quantitative models continue to outperform the judgments of the best experts, even when those experts are given the benefit of the outputs from the simple quantitative model. … The model predicted O'Connor's vote correctly 70 percent of the time, while the experts' success rate was only 61 percent.41 How can it be that simple models perform better than experienced clinical psychologists or renowned legal experts with access to detailed information about the cases? Are these results just flukes? No. In fact, the MMPI and Supreme Court decision examples are not even rare. There are an overwhelming number of studies and meta-analyses—studies of studies—that corroborate this phenomenon. In his book, Montier provides a diverse range of studies comparing statistical models and experts, ranging from the detection of brain damage, the interview process to admit students to university, the likelihood of a criminal to reoffend, the selection of “good” and “bad” vintages of Bordeaux wine, and the buying decisions of purchasing managers. Value Investors Have Cognitive Biases, Too Graham recognized early on that successful investing required emotional discipline.

Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, data acquisition, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, Mark Zuckerberg, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, selfdriving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining Another difference is a widespread preference for visual analytics on big data. For reasons not entirely understood (by anyone, I think), the results of big data analyses are often expressed in visual formats. Now, visual analytics have a lot of strengths: They are relatively easy for nonquantitative executives to interpret, and they get attention. The downside is that they are not generally well suited for expressing complex multivariate relationships and statistical models. Put in other terms, most visual displays of data are for descriptive analytics, rather than predictive or prescriptive ones. They can, however, show a lot of data at once, as figure 4-1 illustrates. It’s a display of the tweets and retweets on Twitter involving particular New York Times articles.5 I find—as with many other complex big data visualizations—this one difficult to decipher. I sometimes think that many big data visualizations are created simply because they can be, rather than to provide clarity on an issue. … 5 Technology for Big Data Written with Jill Dyché A major component of what makes the management and analysis of big data possible is new technology.* In effect, big data is not just a large volume of unstructured data, but also the technologies that make processing and analyzing it possible. Specific big data technologies analyze textual, video, and audio content. 
When big data is fast moving, technologies like machine learning allow for the rapid creation of statistical models that fit, optimize, and predict the data. This chapter is devoted to all of these big data technologies and the difference they make. The technologies addressed in the chapter are outlined in table 5-1. *I am indebted in this section to Jill Dyché, vice president of SAS Best Practices, who collaborated with me on this work and developed many of the frameworks in this section. Much of the content is taken from our report, Big Data in Big Companies (International Institute for Analytics, April 2013). … Hive performs similar functions but is more batch oriented, and it can transform data into the relational format suitable for Structured Query Language (SQL; used to access and manipulate data in databases) queries. This makes it useful for analysts who are familiar with that query language. Business View The business view layer of the stack makes big data ready for further analysis. Depending on the big data application, additional processing via MapReduce or custom code might be used to construct an intermediate data structure, such as a statistical model, a flat file, a relational table, or a data cube. The resulting structure may be intended for additional analysis or to be queried by a traditional SQL-based query tool. Many vendors are moving to so-called “SQL on Hadoop” approaches, simply because SQL has been used in business for a couple of decades, and many people (and higher-level languages) know how to create SQL queries. This business view ensures that big data is more consumable by the tools and the knowledge workers that already exist in an organization.
pages: 460 words: 122,556 
The End of Wall Street by Roger Lowenstein Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Asian financial crisis, assetbacked security, bank run, banking crisis, Berlin Wall, Bernie Madoff, Black Swan, Brownian motion, Carmen Reinhart, collateralized debt obligation, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, diversified portfolio, eurozone crisis, Fall of the Berlin Wall, fear of failure, financial deregulation, fixed income, high net worth, Hyman Minsky, interest rate derivative, invisible hand, Kenneth Rogoff, London Interbank Offered Rate, Long Term Capital Management, margin call, market bubble, Martin Wolf, moral hazard, mortgage debt, Northern Rock, Ponzi scheme, profit motive, race to the bottom, risk tolerance, Ronald Reagan, savings glut, short selling, sovereign wealth fund, statistical model, the payments system, too big to fail, tulip mania, Y2K See AIG bailouts Ben Bernanke and board of Warren Buffett and CDOs and collateral calls on compensation at corporate structure of credit default swaps and credit rating agencies and Jamie Dimon and diversity of holdings employees, number of Financial Products subsidiary Timothy Geithner and Goldman Sachs and insurance (credit default swap) premiums of JPMorgan Chase and lack of reserve for losses leadership changes Lehman Brothers and losses Moody’s and Morgan Stanley and New York Federal Reserve Bank and Hank Paulson and rescue of. See AIG bailouts revenue of shareholders statistical modeling of stock price of struggles of risk of systemic effects of failure of Texas and AIG bailouts amount of Ben Bernanke and board’s role in credit rating agencies and Federal Reserve and Timothy Geithner and Goldman Sachs and JPMorgan Chase and Lehman Brothers’ bankruptcy and New York state and Hank Paulson and reasons for harm to shareholders in Akers, John Alexander, Richard Allison, Herbert Ambac American Home Mortgages Andrukonis, David appraisers, real estate ArchstoneSmith Trust Associates First Capital Atteberry, Thomas auto industry Bagehot, Walter bailouts. 
… See credit crisis volatility of credit crisis borrowers, lack of effects of fear of lending mortgages and reasons for spread of as unforeseen credit cycle credit default swaps AIG and Goldman Sachs and Morgan Stanley and credit rating agencies. See also specific agencies AIG and capital level determination by guessing by inadequacy of models of Lehman Brothers and Monte Carlo method of mortgagebacked securities and statistical modeling used by Credit Suisse Cribiore, Alberto Cummings, Christine Curl, Gregory Dallavecchia, Enrico Dannhauser, Stephen Darling, Alistair Dean Witter debt of financial firms U.S. reliance on of U.S. families defaults/delinquencies deflation deleveraging. See also specific firms del Missier, Jerry Democrats deposit insurance deregulation of banking system and derivatives of financial markets derivatives. … See home foreclosure(s) foreign investors France Frank, Barney Freddie Mac and Fannie Mae accounting problems of affordable housing and AlternativeA loans bailout of Ben Bernanke and capital raised by competitive threats to Congress and Countrywide Financial and Democrats and Federal Reserve and foreign investment in Alan Greenspan and as guarantor history of lack of regulation of leadership changes leverage losses mortgage bubble and as mortgage traders Hank Paulson and politics and predatory lending and reasons for failures of relocation to private sector Robert Rodriguez and shareholders solving financial crisis through statistical models of stock price of Treasury Department and free market Freidheim, Scott Friedman, Milton Fuld, Richard compensation of failure to pull back from mortgagebacked securities identification with Lehman Brothers Lehman Brothers’ bankruptcy and Lehman Brothers’ last days and long tenure of Hank Paulson and personality and character of Gamble, James (Jamie) GDP Geithner, Timothy AIG and bank debt guarantees and Bear Stearns bailout and career of China and Citigroup and financial crisis, response to Lehman 
Brothers and money markets and Morgan Stanley and in Obama administration Hank Paulson and TARP and Gelband, Michael General Electric General Motors Germany GlassSteagall Act Glauber, Robert Golden West Savings and Loan Goldman Sachs AIG and as bank holding company Warren Buffett investment in capital raised by capital sought by compensation at credit default swaps and hedge funds and insurance (credit default swap) premiums of job losses at leverage of Merrill Lynch and Stanley O’Neal’s obsession with Hank Paulson and pull back from mortgagebacked securities short selling against stock price of Wachovia and Gorton, Gary government, U.S. 
pages: 461 words: 128,421 
The Myth of the Rational Market: A History of Risk, Reward, and Delusion on Wall Street by Justin Fox Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, Andrei Shleifer, asset allocation, assetbacked security, bank run, Benoit Mandelbrot, BlackScholes formula, Bretton Woods, Brownian motion, capital asset pricing model, card file, Cass Sunstein, collateralized debt obligation, complexity theory, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, discovery of the americas, diversification, diversified portfolio, Edward Glaeser, endowment effect, Eugene Fama: efficient market hypothesis, experimental economics, financial innovation, Financial Instability Hypothesis, floating exchange rates, George Akerlof, Henri Poincaré, Hyman Minsky, implied volatility, impulse control, index arbitrage, index card, index fund, invisible hand, Isaac Newton, John Nash: game theory, John von Neumann, jointstock company, Joseph Schumpeter, libertarian paternalism, linear programming, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, market bubble, market design, New Journalism, Nikolai Kondratiev, Paul Lévy, pension reform, performance metric, Ponzi scheme, prediction markets, pushing on a string, quantitative trading / quantitative ﬁnance, Ralph Nader, RAND corporation, random walk, Richard Thaler, risk/return, road to serfdom, Robert Shiller, Robert Shiller, rolodex, Ronald Reagan, shareholder value, Sharpe ratio, short selling, side project, Silicon Valley, South Sea Bubble, statistical model, The Chicago School, The Myth of the Rational Market, The Predators' Ball, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas L Friedman, Thorstein Veblen, Tobin tax, transaction costs, tulip mania, value at risk, Vanguard fund, volatility smile, Yogi Berra First, modeling financial risk is hard. Statistical models can never fully capture all things that can go wrong (or right). 
It was as physicist and random walk pioneer M. F. M. Osborne told his students at UC–Berkeley back in 1972: For everyday market events the bell curve works well. When it doesn’t, one needs to look outside the statistical models and make informed judgments about what’s driving the market and what the risks are. The derivatives business and other financial sectors on the rise in the 1980s and 1990s were dominated by young quants. These people knew how to work statistical models, but they lacked the market experience needed to make informed judgments. Meanwhile, those with the experience, wisdom, and authority to make informed judgments—the bosses—didn’t understand the statistical models. It’s possible that, as more quants rise into positions of high authority (1986 Columbia finance Ph.D. … Traditional ratios of loan-to-value and monthly payments to income gave way to credit scoring and purportedly precise gradations of default risk that turned out to be worse than useless. In the 1970s, Amos Tversky and Daniel Kahneman had argued that real-world decision makers didn’t follow the statistical models of John von Neumann and Oskar Morgenstern, but used simple heuristics—rules of thumb—instead. Now the mortgage lending industry was learning that heuristics worked much better than statistical models descended from the work of von Neumann and Morgenstern. Simple trumped complex. In 2005, Robert Shiller came out with a second edition of Irrational Exuberance that featured a new twenty-page chapter on “The Real Estate Market in Historical Perspective.” It offered no formulas for determining whether prices were right, but it did feature an index of U.S. home prices back to 1890.

EvidenceBased Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals by David Aronson Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, Andrew Wiles, asset allocation, availability heuristic, backtesting, Black Swan, capital asset pricing model, cognitive dissonance, compound rate of return, Daniel Kahneman / Amos Tversky, distributed generation, Elliott wave, en.wikipedia.org, feminist movement, hindsight bias, index fund, invention of the telescope, invisible hand, Long Term Capital Management, mental accounting, meta analysis, metaanalysis, pvalue, pattern recognition, Ponzi scheme, price anchoring, price stability, quantitative trading / quantitative ﬁnance, Ralph Nelson Elliott, random walk, retrograde motion, revision control, risk tolerance, riskadjusted returns, riskless arbitrage, Robert Shiller, Robert Shiller, Sharpe ratio, short selling, statistical model, systematic trading, the scientific method, transfer pricing, unbiased observer, yield curve, Yogi Berra It was a review of prior studies, known as a meta-analysis, which examined 20 studies that had compared the subjective diagnoses of psychologists and psychiatrists with those produced by linear statistical models. The studies covered the prediction of academic success, the likelihood of criminal recidivism, and predicting the outcomes of electrical shock therapy. In each case, the experts rendered a judgment by evaluating a multitude of variables in a subjective manner. “In all studies, the statistical model provided more accurate predictions or the two methods tied.”34 A subsequent study by Sawyer35 was a meta-analysis of 45 studies. “Again, there was not a single study in which clinical global judgment was superior to the statistical prediction (termed ‘mechanical combination’ by Sawyer).”36 Sawyer’s investigation is noteworthy because he considered studies in which the human expert was allowed access to information that was not considered by the statistical model, and yet the model was still superior. 
… The prediction problems spanned nine different fields: (1) academic performance of graduate students, (2) life expectancy of cancer patients, (3) changes in stock prices, (4) mental illness using personality tests, (5) grades and attitudes in a psychology course, (6) business failures using financial ratios, (7) students’ ratings of teaching effectiveness, (8) performance of life insurance sales personnel, and (9) IQ scores using Rorschach Tests. Note that the average correlation of the statistical model was 0.64 versus the expert average of 0.33. In terms of information content, which is measured by the correlation coefficient squared or r-squared, the model’s predictions were on average 3.76 times as informative as the experts’. Numerous additional studies comparing expert judgment to statistical models (rules) have confirmed these findings, forcing the conclusion that people do poorly when attempting to combine a multitude of variables to make predictions or judgments. In 1968, Goldberg39 showed that a linear prediction model utilizing personality test scores as inputs could discriminate neurotic from psychotic patients better than experienced clinical diagnosticians. … The task was to predict the propensity for violence among newly admitted male psychiatric patients based on 19 inputs. The average accuracy of the experts, as measured by the correlation coefficient between their prediction of violence and the actual manifestation of violence, was a poor 0.12. The single best expert had a score of 0.36. The predictions of a linear statistical model, using the same set of 19 inputs, achieved a correlation of 0.82. In this instance the model’s predictions were nearly 50 times more informative than the experts’.
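The "times as informative" figures quoted above follow from squaring each correlation and taking the ratio. A minimal sketch of that arithmetic, assuming information content is simply r-squared as the excerpt states (the function name is illustrative, not from the book):

```python
def information_ratio(r_model: float, r_expert: float) -> float:
    """Ratio of r-squared values: how many times more 'informative'
    the model's predictions are than the experts', per the excerpt's definition."""
    return (r_model ** 2) / (r_expert ** 2)


# Nine-field comparison: model r = 0.64, expert average r = 0.33.
ratio_nine_fields = information_ratio(0.64, 0.33)

# Violence-prediction study: model r = 0.82, expert average r = 0.12.
ratio_violence = information_ratio(0.82, 0.12)

print(round(ratio_nine_fields, 2))
print(round(ratio_violence, 1))
```

The first ratio reproduces the quoted 3.76; the second comes out a little under 47, which is what the excerpt rounds to "nearly 50 times".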
Meehl continued to expand his research of comparing experts and statistical models and in 1986 concluded that “There is no controversy in social science which shows such a large body of qualitatively diverse studies coming out so uniformly in the same direction as this one. When you are pushing 90 investigations [currently greater than 150 (note 40)] predicting everything from the outcomes of football games to the diagnosis of liver disease and when you can hardly come up with a half dozen studies showing even a weak tendency in favor of the clinician, it is time to draw a practical conclusion.”41 The evidence continues to accumulate, yet few experts pay heed.
pages: 197 words: 35,256 
NumPy Cookbook by Ivan Idris Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
business intelligence, cloud computing, computer vision, Debian, en.wikipedia.org, Eratosthenes, mandelbrot fractal, pvalue, sorting algorithm, statistical model, transaction costs, web application diff: Calculates differences of numbers within a NumPy array. If not specified, first-order differences are computed. log: Calculates the natural log of elements in a NumPy array. sum: Sums the elements of a NumPy array. dot: Does matrix multiplication for 2D arrays. Calculates the inner product for 1D arrays. Installing scikits-statsmodels The scikits-statsmodels package focuses on statistical modeling. It can be integrated with NumPy and Pandas (more about Pandas later in this chapter). How to do it... Source and binaries can be downloaded from http://statsmodels.sourceforge.net/install.html . If you are installing from source, you need to run the following command: python setup.py install If you are using setuptools, the command is: easy_install statsmodels Performing a normality test with scikits-statsmodels The scikits-statsmodels package has lots of statistical tests. 
… Perform an ordinary least squares calculation by creating an OLS object, and calling its fit method as follows:

    x, y = data.exog, data.endog
    fit = statsmodels.api.OLS(y, x).fit()
    print("Fit params", fit.params)

This should print the result of the fitting procedure, as follows:

    Fit params
    COPPERPRICE 14.222028
    INCOMEINDEX 1693.166242
    ALUMPRICE 60.638117
    INVENTORYINDEX 2515.374903
    TIME 183.193035

Summarize. The results of the OLS fit can be summarized by the summary method as follows:

    print(fit.summary())

This will give us the following output for the regression results: The code to load the copper data set is as follows:

    import statsmodels.api

    # See https://github.com/statsmodels/statsmodels/tree/master/statsmodels/datasets
    data = statsmodels.api.datasets.copper.load_pandas()
    x, y = data.exog, data.endog
    fit = statsmodels.api.OLS(y, x).fit()
    print("Fit params", fit.params)
    print()
    print("Summary")
    print()
    print(fit.summary())

How it works... The data in the Dataset class of statsmodels follows a special format. Among others, this class has the endog and exog attributes. Statsmodels has a load function, which loads data as NumPy arrays. Instead, we used the load_pandas method, which loads data as Pandas objects. We did an OLS fit, basically giving us a statistical model for copper price and consumption. Resampling time series data In this tutorial, we will learn how to resample time series with Pandas. How to do it... We will download the daily price time series data for AAPL, and resample it to monthly data by computing the mean. We will accomplish this by creating a Pandas DataFrame, and calling its resample method. Creating a datetime index. Before we can create a Pandas DataFrame, we need to create a DatetimeIndex object to pass to the DataFrame constructor.
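The resampling idea described above (daily prices reduced to one mean per month) can be illustrated without Pandas. The following is only a standard-library sketch of the same grouping logic, not the recipe's own code, which uses a DataFrame and its resample method; the function name and the prices are made up for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_mean(daily):
    """Group (date, price) pairs by calendar month and average each group."""
    buckets = defaultdict(list)
    for day, price in daily:
        buckets[(day.year, day.month)].append(price)
    # Sort by (year, month) so the result is in chronological order.
    return {month: mean(prices) for month, prices in sorted(buckets.items())}


# Hypothetical prices for illustration only; these are not real AAPL quotes.
daily_prices = [
    (date(2012, 1, 3), 10.0),
    (date(2012, 1, 4), 12.0),
    (date(2012, 2, 1), 20.0),
    (date(2012, 2, 2), 22.0),
    (date(2012, 2, 3), 24.0),
]

monthly = monthly_mean(daily_prices)
for (year, month), value in monthly.items():
    print(year, month, value)
```

Keying each bucket on (year, month) plays the role that the datetime index plays in the Pandas version: it tells the grouping step which observations belong to the same period.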
pages: 416 words: 39,022 
Asset and Risk Management: Risk Oriented Finance by Louis Esch, Robert Kieffer, Thierry Lopez Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
asset allocation, Brownian motion, business continuity plan, business process, capital asset pricing model, computer age, corporate governance, discrete time, diversified portfolio, implied volatility, index fund, interest rate derivative, iterative process, P = NP, pvalue, random walk, risk/return, shareholder value, statistical model, stochastic process, transaction costs, value at risk, Wiener process, yield curve, zerocoupon bond Table 6.3 Student distribution quantiles

    ν        γ2     z0.95   z0.975  z0.99
    5        6.00   2.601   3.319   4.344
    10       1.00   2.026   2.491   3.090
    15       0.55   1.883   2.289   2.795
    20       0.38   1.818   2.199   2.665
    25       0.29   1.781   2.148   2.591
    30       0.23   1.757   2.114   2.543
    40       0.17   1.728   2.074   2.486
    60       0.11   1.700   2.034   2.431
    120      0.05   1.672   1.997   2.378
    normal   0      1.645   1.960   2.326

8 Blattberg R. and Gonedes N., A comparison of stable and student distributions as statistical models for stock prices, Journal of Business, Vol. 47, 1974, pp. 244–80. 9 Pearson E. S. and Hartley H. O., Biometrika Tables for Statisticians, Biometrika Trust, 1976, p. 146. This clearly shows that when the normal law is used in place of the Student laws, the VaR parameter is underestimated unless the number of degrees of freedom is high. Example With the same data as above, that is, E(pt) = 100 and σ(pt) = 80, and for 15 degrees of freedom, we find the following evaluations of VaR, instead of 31.6, 64.3 and 86.1 respectively. … Using pt presents the twofold advantage of: • making the magnitudes of the various factors likely to be involved in evaluating an asset or portfolio relative; • supplying a variable that has been shown to be capable of possessing certain distributional properties (normality or quasi-normality for returns on equities, for example). 1 Estimating quantiles is often a complex problem, especially for arguments close to 0 or 1. Interested readers should read Gilchrist W. G., Statistical Modelling with Quantile Functions, Chapman & Hall/CRC, 2000.
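The underestimation described in the example can be checked numerically. This is a sketch under a stated assumption: VaR is taken as quantile × σ − E, a convention inferred here because it reproduces the quoted normal-law figures 31.6 (z0.95 = 1.645) and 86.1 (z0.99 = 2.326); it is not confirmed by the excerpt. Only the 0.95 and 0.99 columns of Table 6.3 are used, and the function name is illustrative.

```python
def value_at_risk(quantile, expected=100.0, std_dev=80.0):
    """VaR expressed as a positive loss: quantile * sigma - expected value.

    The formula is an assumption inferred from the quoted normal-law values.
    """
    return quantile * std_dev - expected


# Standardized quantiles read from Table 6.3 (0.95 and 0.99 columns).
normal_quantiles = {"0.95": 1.645, "0.99": 2.326}
student15_quantiles = {"0.95": 1.883, "0.99": 2.795}  # 15 degrees of freedom

var_normal = {k: round(value_at_risk(z), 1) for k, z in normal_quantiles.items()}
var_student15 = {k: round(value_at_risk(z), 1) for k, z in student15_quantiles.items()}

print("normal:    ", var_normal)
print("Student 15:", var_student15)
```

Under this convention the Student values with 15 degrees of freedom are well above the normal ones at both levels, which is the underestimation the passage warns about.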
2 If the risk factor X is a share price, we are looking at the return on that share (see Section 3.1.1). [Figure 7.1 Estimating VaR: valuation models, historical data, estimation technique, VaR] Note In most calculation methods, a different expression is taken into consideration: $\Delta^*(t) = \ln \frac{X(t)}{X(t-1)}$. As we saw in Section 3.1.1, this is in fact very similar to $\Delta(t)$ and has the advantage that it can take on any real value3 and that the logarithmic return for several consecutive periods is the sum of the logarithmic return for each of those periods. … If the model is nonstationary (nonstationary variance and/or mean), it can be converted into a stationary model by using the integration of order r after the logarithmic transformation: if y is the transformed variable, apply the technique to $\nabla(\nabla(\dots\nabla(y_t)))$ (r times) instead of $y_t$, where $\nabla(y_t) = y_t - y_{t-1}$. We therefore use an ARIMA(p, r, q) procedure.16 If this procedure fails because of nonconstant volatility in the error term, it will be necessary to use the ARCH-GARCH or EGARCH models (Appendix 7). B. The equation on the replicated positions This equation may be estimated by a statistical model (such as SAS/OR procedure PROC NLP), using multiple regression with the constraints $\sum_{i=3\text{ months}}^{15\text{ years}} \alpha_i = 1$ and $\alpha_i \ge 0$. It is also possible to estimate the replicated positions (b) with the single constraint (by using the SAS/STAT procedure) $\sum_{i=3\text{ months}}^{15\text{ years}} \alpha_i = 1$. In both cases, the duration of the demand product is a weighted average of the durations. In the second case, it is possible to obtain negative $\alpha_i$ values.
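The note on logarithmic returns makes two claims that are easy to verify numerically: the log return over several periods equals the sum of the per-period log returns, and for small moves it is close to the ordinary relative return. A minimal sketch with made-up prices (the helper name and the data are illustrative assumptions):

```python
import math


def log_return(current, previous):
    """Logarithmic return ln(X(t) / X(t-1))."""
    return math.log(current / previous)


# Hypothetical price path, for illustration only.
prices = [100.0, 104.0, 99.0, 108.0]

# One log return per consecutive pair of prices.
per_period = [log_return(b, a) for a, b in zip(prices, prices[1:])]

# Log return over the whole span, computed directly from the end points.
whole_span = log_return(prices[-1], prices[0])

# Ordinary relative return for the first period, for comparison.
simple_first = (prices[1] - prices[0]) / prices[0]

print(sum(per_period), whole_span)
print(per_period[0], simple_first)
```

The additivity holds because the intermediate prices cancel inside the logarithm of the product of ratios; ordinary relative returns compound multiplicatively instead, so they do not add up this way.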
pages: 252 words: 72,473 
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Affordable Care Act / Obamacare, Bernie Madoff, big data  Walmart  Pop Tarts, call centre, carried interest, cloud computing, collateralized debt obligation, correlation does not imply causation, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, Emanuel Derman, housing crisis, illegal immigration, Internet of things, late fees, medical bankruptcy, Moneyball by Michael Lewis explains big data, new economy, obamacare, Occupy movement, offshore financial centre, payday loans, peertopeer lending, Peter Thiel, Ponzi scheme, prediction markets, price discrimination, quantitative hedge fund, Ralph Nader, RAND corporation, recommendation engine, Sharpe ratio, statistical model, Tim Cook: Apple, too big to fail, Unsafe at Any Speed, Upton Sinclair, Watson beat the top human players on Jeopardy!, working poor The proxies the journalists chose for educational excellence make sense, after all. Their spectacular failure comes, instead, from what they chose not to count: tuition and fees. Student financing was left out of the model. This brings us to the crucial question we’ll confront time and again. What is the objective of the modeler? In this case, put yourself in the place of the editors at U.S. News in 1988. When they were building their first statistical model, how would they know when it worked? Well, it would start out with a lot more credibility if it reflected the established hierarchy. If Harvard, Stanford, Princeton, and Yale came out on top, it would seem to validate their model, replicating the informal models that they and their customers carried in their own heads. To build such a model, they simply had to look at those top universities and count what made them so special. … In a sense, it learns. Compared to the human brain, machine learning isn’t especially efficient. A child places her finger on the stove, feels pain, and masters for the rest of her life the correlation between the hot metal and her throbbing hand. 
And she also picks up the word for it: burn. A machine learning program, by contrast, will often require millions or billions of data points to create its statistical models of cause and effect. But for the first time in history, those petabytes of data are now readily available, along with powerful computers to process them. And for many jobs, machine learning proves to be more flexible and nuanced than the traditional programs governed by rules. Language scientists, for example, spent decades, from the 1960s to the early years of this century, trying to teach computers how to read. … Imagine if a highly motivated and responsible person with modest immigrant beginnings is trying to start a business and needs to rely on such a system for early investment. Who would take a chance on such a person? Probably not a model trained on such demographic and behavioral data. I should note that in the statistical universe proxies inhabit, they often work. More times than not, birds of a feather do fly together. Rich people buy cruises and BMWs. All too often, poor people need a payday loan. And since these statistical models appear to work much of the time, efficiency rises and profits surge. Investors double down on scientific systems that can place thousands of people into what appear to be the correct buckets. It’s the triumph of Big Data. And what about the person who is misunderstood and placed in the wrong bucket? That happens. And there’s no feedback to set the system straight. A statistics-crunching engine has no way to learn that it dispatched a valuable potential customer to call center hell.

Analysis of Financial Time Series by Ruey S. Tsay Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Asian financial crisis, asset allocation, BlackScholes formula, Brownian motion, capital asset pricing model, compound rate of return, correlation coefficient, data acquisition, discrete time, frictionless, frictionless market, implied volatility, index arbitrage, Long Term Capital Management, market microstructure, martingale, pvalue, pattern recognition, random walk, risk tolerance, short selling, statistical model, stochastic process, stochastic volatility, telemarketer, transaction costs, value at risk, volatility smile, Wiener process, yield curve Stable Distribution The stable distributions are a natural generalization of normal in that they are stable under addition, which meets the need of continuously compounded returns $r_t$. Furthermore, stable distributions are capable of capturing excess kurtosis shown by historical stock returns. However, non-normal stable distributions do not have a finite variance, which is in conflict with most finance theories. In addition, statistical modeling using non-normal stable distributions is difficult. An example of non-normal stable distributions is the Cauchy distribution, which is symmetric with respect to its median, but has infinite variance. Scale Mixture of Normal Distributions Recent studies of stock returns tend to use scale mixture or finite mixture of normal distributions. Under the assumption of scale mixture of normal distributions, the log return $r_t$ is normally distributed with mean $\mu$ and variance $\sigma^2$ [i.e., $r_t \sim N(\mu, \sigma^2)$]. … Furthermore, the lag-$\ell$ autocovariance of $r_t$ is

$$\gamma_\ell = \mathrm{Cov}(r_t, r_{t-\ell}) = E\left[\left(\sum_{i=0}^{\infty} \psi_i a_{t-i}\right)\left(\sum_{j=0}^{\infty} \psi_j a_{t-\ell-j}\right)\right] = E\left(\sum_{i,j=0}^{\infty} \psi_i \psi_j a_{t-i} a_{t-\ell-j}\right) = \sum_{j=0}^{\infty} \psi_{j+\ell} \psi_j E(a_{t-\ell-j}^2) = \sigma_a^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+\ell}.$$

Consequently, the ψ-weights are related to the autocorrelations of $r_t$ as follows:

$$\rho_\ell = \frac{\gamma_\ell}{\gamma_0} = \frac{\sum_{i=0}^{\infty} \psi_i \psi_{i+\ell}}{1 + \sum_{i=1}^{\infty} \psi_i^2}, \quad \ell \ge 0, \tag{2.5}$$

where $\psi_0 = 1$. Linear time series models are econometric and statistical models used to describe the pattern of the ψ-weights of $r_t$.
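The relation between ψ-weights and autocorrelations can be checked on a case where the answer is known. The sketch below assumes the standard result that a stationary AR(1) series with coefficient φ has ψ-weights φ^i and lag-ℓ autocorrelation φ^ℓ; the infinite sums are truncated at 200 terms, and the function name is illustrative.

```python
import math


def autocorrelation_from_psi(psi, lag):
    """Lag-`lag` autocorrelation from a (truncated) list of psi-weights.

    Numerator:   sum_i psi[i] * psi[i + lag]
    Denominator: sum_i psi[i] ** 2, which equals 1 + sum_{i>=1} psi_i^2 when psi[0] == 1.
    """
    numerator = sum(psi[i] * psi[i + lag] for i in range(len(psi) - lag))
    denominator = sum(weight * weight for weight in psi)
    return numerator / denominator


phi = 0.5
# psi-weights of an AR(1) process, truncated; the omitted tail is negligible for |phi| < 1.
psi_weights = [phi ** i for i in range(200)]

rho_0 = autocorrelation_from_psi(psi_weights, 0)
rho_1 = autocorrelation_from_psi(psi_weights, 1)
rho_2 = autocorrelation_from_psi(psi_weights, 2)
print(rho_0, rho_1, rho_2)
```

Truncation is harmless here because the weights decay geometrically; for weights that decay slowly, more terms would be needed before the ratio settles.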
2.4 SIMPLE AUTOREGRESSIVE MODELS The fact that the monthly return $r_t$ of CRSP value-weighted index has a statistically significant lag-1 autocorrelation indicates that the lagged return $r_{t-1}$ might be useful in predicting $r_t$. A simple model that makes use of such predictive power is $r_t = \phi_0 + \phi_1 r_{t-1} + a_t$, (2.6) where $\{a_t\}$ is assumed to be a white noise series with mean zero and variance $\sigma_a^2$. … If $a_t$ has a symmetric distribution around zero, then conditional on $p_{t-1}$, $p_t$ has a 50–50 chance to go up or down, implying that $p_t$ would go up or down at random. If we treat the random-walk model as a special AR(1) model, then the coefficient of $p_{t-1}$ is unity, which does not satisfy the weak stationarity condition of an AR(1) model. A random-walk series is, therefore, not weakly stationary, and we call it a unit-root nonstationary time series. The random-walk model has been widely considered as a statistical model for the movement of logged stock prices. Under such a model, the stock price is not predictable or mean reverting. To see this, the 1-step ahead forecast of model (2.32) at the forecast origin $h$ is $\hat{p}_h(1) = E(p_{h+1} \mid p_h, p_{h-1}, \ldots) = p_h$, which is the log price of the stock at the forecast origin. Such a forecast has no practical value. The 2-step ahead forecast is $\hat{p}_h(2) = E(p_{h+2} \mid p_h, p_{h-1}, \ldots) = E(p_{h+1} + a_{h+2} \mid p_h, p_{h-1}, \ldots) = E(p_{h+1} \mid p_h, p_{h-1}, \ldots) = \hat{p}_h(1) = p_h$, which again is the log price at the forecast origin. 
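The forecasting argument above can be sketched numerically. The helper below iterates the AR(1) recursion with the future shocks replaced by their zero mean. The function name and all parameter values are illustrative assumptions, not taken from the book.

```python
def ar1_forecast(last_value, phi0, phi1, steps):
    """Multi-step point forecasts of x_t = phi0 + phi1 * x_{t-1} + a_t,
    where a_t is zero-mean white noise, so each future shock has expectation 0."""
    forecasts = []
    current = last_value
    for _ in range(steps):
        current = phi0 + phi1 * current  # the expected shock drops out
        forecasts.append(current)
    return forecasts


# Random walk (phi0 = 0, phi1 = 1): every forecast equals the last observed log price.
random_walk = ar1_forecast(4.6, phi0=0.0, phi1=1.0, steps=3)

# Stationary AR(1) (|phi1| < 1): forecasts revert toward phi0 / (1 - phi1) = 2.0.
stationary = ar1_forecast(4.6, phi0=1.0, phi1=0.5, steps=50)

print(random_walk, round(stationary[-1], 6))
```

The contrast between the two calls is the point of the excerpt: with a unit coefficient the forecast never moves away from the forecast origin, whereas a stationary AR(1) forecast is mean reverting.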
pages: 481 words: 125,946 
What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence by John Brockman Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
3D printing, agricultural Revolution, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, artificial general intelligence, augmented reality, autonomous vehicles, bitcoin, blockchain, clean water, cognitive dissonance, Colonization of Mars, complexity theory, computer age, computer vision, constrained optimization, corporate personhood, cosmological principle, cryptocurrency, cuban missile crisis, Danny Hillis, dark matter, discrete time, Elon Musk, Emanuel Derman, endowment effect, epigenetics, Ernest Rutherford, experimental economics, Flash crash, friendly AI, Google Glasses, hive mind, income inequality, information trail, Internet of things, invention of writing, iterative process, Jaron Lanier, job automation, John von Neumann, Kevin Kelly, knowledge worker, loose coupling, microbiome, Moneyball by Michael Lewis explains big data, natural language processing, Network effects, Norbert Wiener, pattern recognition, Peter Singer: altruism, phenotype, planetary scale, Ray Kurzweil, recommendation engine, Republic of Letters, RFID, Richard Thaler, Rory Sutherland, Search for Extraterrestrial Intelligence, self-driving car, sharing economy, Silicon Valley, Skype, smart contracts, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, superintelligent machines, supervolcano, the scientific method, The Wisdom of Crowds, theory of mind, Thorstein Veblen, too big to fail, Turing machine, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!, Y2K A literature pioneered by psychologists such as the late Robyn Dawes finds that virtually any routine decision-making task (detecting fraud, assessing the severity of a tumor, hiring employees) is done better by a simple statistical model than by a leading expert in the field. 
Let me offer just two illustrative examples, one from human-resource management and the other from the world of sports. First, let’s consider the embarrassing ubiquity of job interviews as an important, often the most important, determinant of who gets hired. At the University of Chicago Booth School of Business, where I teach, recruiters devote endless hours to interviewing students on campus for potential jobs, a process that selects the few who will be invited to visit the employer, where they will undergo another extensive set of interviews. Yet research shows that interviews are nearly useless in predicting whether a job prospect will perform well on the job. Compared to a statistical model based on objective measures such as grades in courses relevant to the job in question, interviews primarily add noise and introduce the potential for prejudice. … AI systems can be thought of as trying to approximate rational behavior using limited resources. There’s an algorithm for computing the optimal action for achieving a desired outcome, but it’s computationally expensive. Experiments have found that simple learning algorithms with lots of training data often outperform complex handcrafted models. Today’s systems primarily provide value by learning better statistical models and performing statistical inference for classification and decision making. The next generation will be able to create and improve their own software and are likely to self-improve rapidly. In addition to improving productivity, AI and robotics are drivers for numerous military and economic arms races. Autonomous systems can be faster, smarter, and less predictable than their competitors. … Compared to a statistical model based on objective measures such as grades in courses relevant to the job in question, interviews primarily add noise and introduce the potential for prejudice. (Statistical models don’t favor any particular alma mater or ethnic background and cannot detect good looks.) 
These facts have been known for more than four decades, but hiring practices have barely budged. The reason is simple: Each of us just knows that if we are the one conducting an interview, we will learn a lot about the candidate. It might well be that other people are not good at this task, but I am! This illusion, in direct contradiction to empirical research, means that we continue to choose employees the same way we always did. We size them up, eye to eye. One domain where some progress has been made in adopting a more scientific approach to job-candidate selection is sports, as documented by the Michael Lewis book and movie Moneyball. 
pages: 396 words: 117,149 
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
3D printing, Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, crowdsourcing, Danny Hillis, data is the new oil, double helix, Douglas Hofstadter, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, global village, Google Glasses, Gödel, Escher, Bach, information retrieval, job automation, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, lone genius, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, NP-complete, P = NP, PageRank, pattern recognition, phenotype, planetary scale, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, Second Machine Age, self-driving car, Silicon Valley, speech recognition, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight In machine learning, knowledge is often in the form of statistical models, because most knowledge is statistical: all humans are mortal, but only 4 percent are Americans. Skills are often in the form of procedures: if the road curves left, turn the wheel left; if a deer jumps in front of you, slam on the brakes. (Unfortunately, as of this writing Google’s self-driving cars still confuse windblown plastic bags with deer.) Often, the procedures are quite simple, and it’s the knowledge at their core that’s complex. If you can tell which emails are spam, you know which ones to delete. 
If you can tell how good a board position in chess is, you know which move to make (the one that leads to the best position). Machine learning takes many different forms and goes by many different names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and more. … They called this scheme the EM algorithm, where the E stands for expectation (inferring the expected probabilities) and the M for maximization (estimating the maximum-likelihood parameters). They also showed that many previous algorithms were special cases of EM. For example, to learn hidden Markov models, we alternate between inferring the hidden states and estimating the transition and observation probabilities based on them. Whenever we want to learn a statistical model but are missing some crucial information (e.g., the classes of the examples), we can use EM. This makes it one of the most popular algorithms in all of machine learning. You might have noticed a certain resemblance between k-means and EM, in that they both alternate between assigning entities to clusters and updating the clusters’ descriptions. This is not an accident: k-means itself is a special case of EM, which you get when all the attributes have “narrow” normal distributions, that is, normal distributions with very small variance. 
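The alternation described above (assign entities to clusters, then update the cluster descriptions) can be sketched with a one-dimensional k-means loop. This is a minimal illustration under assumed data and starting centers, not the book's own code. It shows only the hard-assignment case, not full EM with probabilities.

```python
def kmeans_1d(points, centers, iterations=20):
    """Alternate a hard assignment step and a mean-update step."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
            clusters[nearest].append(x)
        # Update step: each center becomes the mean of its assigned points
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[k]
                   for k, c in enumerate(clusters)]
    return centers


# Hypothetical data: two well-separated groups of three points each.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
print(kmeans_1d(points, centers=[0.0, 6.0]))
```

In full EM the assignment step would compute a probability for each cluster instead of picking a single nearest center. The hard assignment used here corresponds to the "very small variance" limit mentioned in the excerpt.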
pages: 442 words: 39,064 
Why Stock Markets Crash: Critical Events in Complex Financial Systems by Didier Sornette Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Asian financial crisis, asset allocation, Berlin Wall, Bretton Woods, Brownian motion, capital asset pricing model, capital controls, continuous double auction, currency peg, Deng Xiaoping, discrete time, diversified portfolio, Elliott wave, Erdős number, experimental economics, financial innovation, floating exchange rates, frictionless, frictionless market, full employment, global village, implied volatility, index fund, invisible hand, John von Neumann, joint-stock company, law of one price, Louis Bachelier, mandelbrot fractal, margin call, market bubble, market clearing, market design, market fundamentalism, mental accounting, moral hazard, Network effects, new economy, oil shock, open economy, pattern recognition, Paul Erdős, quantitative trading / quantitative finance, random walk, risk/return, Ronald Reagan, Schrödinger's Cat, short selling, Silicon Valley, South Sea Bubble, statistical model, stochastic process, Tacoma Narrows Bridge, technological singularity, The Coming Technological Singularity, The Wealth of Nations by Adam Smith, Tobin tax, total factor productivity, transaction costs, tulip mania, VA Linux, Y2K, yield curve Of special interest will be the study of the premonitory processes before financial crashes or “bubble” corrections in the stock market. For this purpose, I shall describe a new set of computational methods that are capable of searching and comparing patterns, simultaneously and iteratively, at multiple scales in hierarchical systems. I shall use these patterns to improve the understanding of the dynamical state before and after a financial crash and to enhance the statistical modeling of social hierarchical systems with the goal of developing reliable forecasting skills for these large-scale financial crashes. IS PREDICTION POSSIBLE? A WORKING HYPOTHESIS With the low of 3227 on April 17, 2000, identified as the end of the “crash,” the Nasdaq Composite index lost in five weeks over 37% of its all-time high of 5133 reached on March 10, 2000. 
This crash has not been followed by a recovery, as occurred from the October 1987 crash. … Following the null hypothesis that the exponential description is correct and extrapolating this description to, for example, the three largest crashes on the U.S. market in this century (1914, 1929, and 1987), as indicated in Figure 3.4, yields a recurrence time of about fifty centuries for each single crash. In reality, the three crashes occurred in less than one century. This result is a first indication that the exponential model may not apply for the large crashes. As an additional test, 10,000 so-called synthetic data sets, each covering a time span close to a century, hence adding up to about 1 million years, were generated using a standard statistical model used by the financial industry [46]. We use the model version GARCH(1,1) estimated from the true index with a Student distribution with four degrees of freedom. This model includes both nonstationarity of volatilities (the amplitude of price variations) and the (fat tail) nature of the distribution of the price returns seen in Figure 2.7. Our analysis [209] shows that, in approximately 1 million years of heavy tail “GARCH-trading,” with a reset every century, never did three crashes similar to the three largest observed in the true DJIA occur in a single “GARCH-century.” … More recently, Feigenbaum has examined the first differences for the logarithm of the S&P 500 from 1980 to 1987 and finds that he cannot reject the log-periodic component at the 95% confidence level [127]: in plain words, this means that the probability that the log-periodic component results from chance is about or less than one in twenty. To test furthermore the solidity of the advanced log-periodic hypothesis, Johansen, Ledoit, and I [209] tested whether the null hypothesis that a standard statistical model of financial markets, called the GARCH(1,1) model with Student-distributed noise, could “explain” the presence of log-periodicity. 
In the 1,000 surrogate data sets of length 400 weeks generated using this GARCH(1,1) model with Student-distributed noise and analyzed as for the real crashes, only two 400-week windows qualified. This result corresponds to a confidence level of 99.8% for rejecting the hypothesis that GARCH(1,1) with Student-distributed noise can generate meaningful log-periodicity. 
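For readers unfamiliar with the surrogate-data idea, here is a minimal sketch of simulating one GARCH(1,1) return series with Student-distributed noise, using only the Python standard library. The parameter values (`omega`, `alpha`, `beta`) are placeholders chosen for illustration. They are not the values estimated in the book. The detection step that looks for crashes or log-periodicity is not reproduced.

```python
import math
import random


def student_t(rng, dof):
    """Draw a Student-t variate rescaled to unit variance (requires dof > 2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(dof / 2.0, 2.0)  # chi-square with `dof` degrees of freedom
    return z / math.sqrt(chi2 / dof) * math.sqrt((dof - 2.0) / dof)


def simulate_garch11(n, omega, alpha, beta, dof=4, seed=0):
    """Simulate returns r_t = sigma_t * e_t with conditional variance
    sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(n):
        r = math.sqrt(var) * student_t(rng, dof)
        returns.append(r)
        var = omega + alpha * r * r + beta * var
    return returns


# One hypothetical 400-period surrogate series (placeholder parameters).
surrogate = simulate_garch11(400, omega=1e-6, alpha=0.05, beta=0.90, seed=1)
print(min(surrogate), max(surrogate))
```

A surrogate test would repeat this simulation many times, apply the same detection procedure to each synthetic series, and count how often it "finds" the pattern. Two hits in 1,000 series gives 1 - 2/1000 = 99.8%, the confidence level quoted in the excerpt.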
pages: 518 words: 147,036 
The Fissured Workplace by David Weil Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
accounting loophole / creative accounting, affirmative action, Affordable Care Act / Obamacare, banking crisis, barriers to entry, business process, call centre, Carmen Reinhart, Cass Sunstein, Clayton Christensen, clean water, collective bargaining, corporate governance, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, declining real wages, employer provided health coverage, Frank Levy and Richard Murnane: The New Division of Labor, George Akerlof, global supply chain, global value chain, hiring and firing, income inequality, intermodal, inventory management, Jane Jacobs, Kenneth Rogoff, law of one price, loss aversion, low skilled workers, minimum wage unemployment, moral hazard, Network effects, new economy, occupational segregation, performance metric, pre–internet, price discrimination, principal–agent problem, Rana Plaza, Richard Florida, Richard Thaler, Ronald Coase, shareholder value, Silicon Valley, statistical model, Steve Jobs, supply-chain management, The Death and Life of Great American Cities, The Nature of the Firm, transaction costs, ultimatum game, union organizing, women in the workforce, Y2K, yield management The impact of shedding janitorial jobs in otherwise higher-wage companies is borne out in several studies of contracting out among janitorial workers. Using a statistical model to predict the factors that increase the likelihood of contracting out specific types of jobs, Abraham and Taylor demonstrate that the higher the typical wage for the workforce at an establishment, the more likely that establishment will contract out its janitorial work. They also show that establishments that do any contracting out of janitorial workers tend to shift out the function entirely.[36] Wages and benefits for workers employed directly versus contracted out can be compared given the significant number of people in both groups. 
Using statistical models that control for both observed characteristics of the workers and the places in which they work, several studies directly compare the wages and benefits for these occupations. … For example, franchisees might be more common in areas where there is greater competition among fast-food restaurants. That competition (and franchising only indirectly) might lead them to have higher incentives to not comply. Alternatively, company-owned outlets might be in locations with stronger consumer markets, higher-skilled workers, or lower crime rates, all of which might also be associated with compliance. To adequately account for these problems, statistical models that consider all of the potentially relevant factors, including franchise status, are generated to predict compliance levels. By doing so, the effect of franchising can be examined, holding other factors constant. This allows measurement of the impact on compliance of an outlet being run by a franchisee with otherwise identical features, as opposed to a company-owned outlet. Figure 6.1 provides estimates of the impact of franchise ownership on three different measures of compliance for the top twenty branded fast-food companies in the United States.[22] The figure presents the percentage difference in compliance between franchised outlets relative to otherwise comparable company-owned outlets of the same brand.[23] FIGURE 6.1. … Mining entered into contract agreements at mine sites that Ember had never worked. This narrative is based on Federal Mine Safety and Health Review Commission, Secretary of Labor MSHA v. Ember Contracting Corporation, Office of Administrative Law Judges, November 4, 2011. I am grateful to Greg Wagner for flagging this case and to Andrew Razov for additional research on it. 26. These estimates are based on quarterly mining data from 2000–2010. 
Using statistical modeling techniques, two different measures of traumatic injuries and a direct measure of fatality rates are associated with contracting status of the mine operator as well as other explanatory factors, including mining method, physical attributes of the mine, union status, size of operations, year, and location. The contracting measure includes all forms of contracting. See Buessing and Weil (2013). 27. 
pages: 336 words: 113,519 
The Undoing Project: A Friendship That Changed Our Minds by Michael Lewis Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, availability heuristic, Cass Sunstein, choice architecture, complexity theory, Daniel Kahneman / Amos Tversky, Donald Trump, Douglas Hofstadter, endowment effect, feminist movement, framing effect, hindsight bias, John von Neumann, loss aversion, medical residency, Menlo Park, Murray Gell-Mann, Nate Silver, New Journalism, Richard Thaler, Saturday Night Live, statistical model, Walter Mischel, Yom Kippur War He helped hire new management, then helped to figure out how to price tickets, and, finally, inevitably, was asked to work on the problem of whom to select in the NBA draft. “How will that nineteen-year-old perform in the NBA?” was like “Where will the price of oil be in ten years?” A perfect answer didn’t exist, but statistics could get you to some answer that was at least a bit better than simply guessing. Morey already had a crude statistical model to evaluate amateur players. He’d built it on his own, just for fun. In 2003 the Celtics had encouraged him to use it to pick a player at the tail end of the draft: the 56th pick, when the players seldom amount to anything. And thus Brandon Hunter, an obscure power forward out of Ohio University, became the first player picked by an equation.* Two years later Morey got a call from a headhunter who said that the Houston Rockets were looking for a new general manager. … He had a diffidence about him, an understanding of how hard it is to know anything for sure. The closest he came to certainty was in his approach to making decisions. He never simply went with his first thought. He suggested a new definition of the nerd: a person who knows his own mind well enough to mistrust it. One of the first things Morey did after he arrived in Houston (and, to him, the most important) was to install his statistical model for predicting the future performance of basketball players. The model was also a tool for the acquisition of basketball knowledge. “Knowledge is literally prediction,” said Morey. 
“Knowledge is anything that increases your ability to predict the outcome. Literally everything you do you’re trying to predict the right thing. Most people just do it subconsciously.” A model allowed you to explore the attributes in an amateur basketball player that led to professional success, and determine how much weight should be given to each. … Without data, there’s nothing to analyze. The Indian was DeAndre Jordan all over again; he was, like most of the problems you faced in life, a puzzle, with pieces missing. The Houston Rockets would pass on him, and be shocked when the Dallas Mavericks took him in the second round of the NBA draft. Then again, you never knew.†† And that was the problem: You never knew. In Morey’s ten years of using his statistical model with the Houston Rockets, the players he’d drafted, after accounting for the draft slot in which they’d been taken, had performed better than the players drafted by three-quarters of the other NBA teams. His approach had been sufficiently effective that other NBA teams were adopting it. He could even pinpoint the moment when he felt, for the first time, imitated. It was during the 2012 draft, when the players were picked in almost the exact same order the Rockets ranked them. 
pages: 447 words: 104,258 
Mathematics of the Financial Markets: Financial Instruments and Derivatives Modelling, Valuation and Risk Issues by Alain Ruttiens Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
algorithmic trading, asset allocation, asset-backed security, backtesting, banking crisis, Black Swan, Black-Scholes formula, Brownian motion, capital asset pricing model, collateralized debt obligation, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, delta neutral, discounted cash flows, discrete time, diversification, fixed income, implied volatility, interest rate derivative, interest rate swap, margin call, market microstructure, martingale, p-value, passive investing, quantitative trading / quantitative finance, random walk, risk/return, Sharpe ratio, short selling, statistical model, stochastic process, stochastic volatility, time value of money, transaction costs, value at risk, volatility smile, Wiener process, yield curve, zero-coupon bond FOCARDI, Frank J. FABOZZI, The Mathematics of Financial Modeling and Investment Management, John Wiley & Sons, Inc., Hoboken, 2004, 800 p. Lawrence GALITZ, Financial Times Handbook of Financial Engineering, FT Press, 3rd ed. Scheduled on November 2011, 480 p. Philippe JORION, Financial Risk Manager Handbook, John Wiley & Sons, Inc., Hoboken, 5th ed., 2009, 752 p. Tze Leung LAI, Haipeng XING, Statistical Models and Methods for Financial Markets, Springer, 2008, 374 p. David RUPPERT, Statistics and Finance, An Introduction, Springer, 2004, 482 p. Dan STEFANICA, A Primer for the Mathematics of Financial Engineering, FE Press, 2011, 352 p. Robert STEINER, Mastering Financial Calculations, FT Prentice Hall, 1997, 400 p. John L. TEALL, Financial Market Analytics, Quorum Books, 1999, 328 p. Presents the maths needed to understand quantitative finance, with examples and applications focusing on financial markets. 1. 
… More generally, Jarrow has developed some general but very useful considerations about model risk in an article devoted to risk management models, but valid for any kind of (financial) mathematical model.[17] In his article, Jarrow distinguishes between statistical and theoretical models: the former refer to modeling a market price or return evolution, based on historical data, such as a GARCH model. What is usually developed as “quantitative models” by some fund or portfolio managers also belongs to statistical models. On the other hand, theoretical models aim to evidence some causality based on a financial/economic reasoning, for example the Black–Scholes formula. Both types of model imply some assumptions: Jarrow distinguishes between robust and nonrobust assumptions, depending on the size of the impact when the assumption is slightly modified. The article then develops pertinent considerations about testing, calibrating and using a model. … Philippe JORION, Financial Risk Manager Handbook, John Wiley & Sons, Inc., Hoboken, 6th ed., 2010, 800 p. E. JURCZENKO, B. MAILLET (eds), Multi-Moment Asset Allocation and Pricing Models, John Wiley & Sons, Ltd, Chichester, 2006, 233 p. Ioannis KARATZAS, Steven E. SHREVE, Methods of Mathematical Finance, Springer, 2010, 430 p. Donna KLINE, Fundamentals of the Futures Market, McGraw-Hill, 2000, 256 p. Tze Leung LAI, Haipeng XING, Statistical Models and Methods for Financial Markets, Springer, 2008, 374 p. Raymond M. LEUTHOLD, Joan C. JUNKUS, Jean E. CORDIER, The Theory and Practice of Futures Markets, Stipes Publishing, 1999, 410 p. Bob LITTERMAN, Modern Investment Management – An Equilibrium Approach, John Wiley & Sons, Inc., Hoboken, 2003, 624 p. T. LYNCH, J. APPLEBY, Large Fluctuation of Stochastic Differential Equations: Regime Switching and Applications to Simulation and Finance, LAP LAMBERT Academic Publishing, 2010, 240 p. 

Quantitative Trading: How to Build Your Own Algorithmic Trading Business by Ernie Chan Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
algorithmic trading, asset allocation, automated trading system, backtesting, Black Swan, Brownian motion, business continuity plan, compound rate of return, Elliott wave, endowment effect, fixed income, general-purpose programming language, index fund, Long Term Capital Management, loss aversion, p-value, paper trading, price discovery process, quantitative hedge fund, quantitative trading / quantitative finance, random walk, Ray Kurzweil, Renaissance Technologies, risk-adjusted returns, Sharpe ratio, short selling, statistical arbitrage, statistical model, systematic trading, transaction costs I will illustrate this somewhat convoluted procedure at the end of Example 3.6. Data-Snooping Bias In Chapter 2, I mentioned data-snooping bias: the danger that backtest performance is inflated relative to the future performance of the strategy because we have overoptimized the parameters of the model based on transient noise in the historical data. Data-snooping bias is pervasive in the business of predictive statistical models of historical data, but is especially serious in finance because of the limited amount of independent data we have. High-frequency data, while in abundant supply, is useful only for high-frequency models. And while we have stock market data stretching back to the early parts of the twentieth century, only data within the past 10 years are really suitable for building predictive models. Furthermore, as discussed in Chapter 2, regime shifts may render even data that are just a few years old obsolete for backtesting purposes. … Chan & Associates (www.epchan.com), a consulting firm focusing on trading strategy and software development for money managers. He also comanages EXP Quantitative Investments, LLC and publishes the Quantitative Trading blog (epchan.blogspot.com), which is syndicated to multiple financial news services including www.tradingmarkets.com and Yahoo! Finance. 
He has been quoted by the New York Times and CIO magazine on quantitative hedge funds, and has appeared on CNBC’s Closing Bell. Ernie is an expert in developing statistical models and advanced computer algorithms to discover patterns and trends from large quantities of data. He was a researcher in computer science at IBM’s T. J. Watson Research Center, in data mining at Morgan Stanley, and in statistical arbitrage trading at Credit Suisse. He has also been a senior quantitative strategist and trader at various hedge funds, with sizes ranging from millions to billions of dollars. 
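The data-snooping bias described in the Quantitative Trading excerpt above can be illustrated with a small simulation. Every "strategy" below is pure noise with no true edge. The sketch picks the one with the best in-sample Sharpe ratio and then checks it on later data. All names, sizes, and the seed are assumptions made for this example. It is not the procedure from the book.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio (mean over standard deviation), not annualized."""
    return statistics.mean(returns) / statistics.stdev(returns)


def snooping_demo(n_strategies=200, n_in=250, n_out=250, seed=1):
    rng = random.Random(seed)
    # Each hypothetical strategy is pure noise: its true Sharpe ratio is zero.
    strategies = [[rng.gauss(0.0, 0.01) for _ in range(n_in + n_out)]
                  for _ in range(n_strategies)]
    # "Optimize" by choosing the strategy that looked best on the in-sample window.
    best = max(strategies, key=lambda s: sharpe(s[:n_in]))
    return sharpe(best[:n_in]), sharpe(best[n_in:])


in_sample, out_of_sample = snooping_demo()
print(f"in-sample Sharpe:     {in_sample:.3f}")
print(f"out-of-sample Sharpe: {out_of_sample:.3f}")
```

The selected strategy tends to look attractive in-sample simply because it was the luckiest of many candidates. Its out-of-sample Sharpe ratio is just another draw centered on zero.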
pages: 49 words: 12,968 
Industrial Internet by Jon Bruner Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
autonomous vehicles, barriers to entry, computer vision, data acquisition, demand response, en.wikipedia.org, factory automation, Google X / Alphabet X, industrial robot, Internet of things, job automation, loose coupling, natural language processing, performance metric, Silicon Valley, slashdot, smart grid, smart meter, statistical model, web application “Imagine trying to operate a highway system if all you have are monthly traffic readings for a few spots on the road. But that’s what operating our power system was like.” The utility’s customers benefit, too — an example of the industrial internet creating value for every entity to which it’s connected. Fort Collins utility customers can see data on their electric usage through a Web portal that uses a statistical model to estimate how much electricity they’re using on heating, cooling, lighting and appliances. The site then draws building data from county records to recommend changes to insulation and other improvements that might save energy. Water meters measure usage every hour — frequent enough that officials will soon be able to dispatch inspection crews to houses whose vacationing owners might not know about a burst pipe. 
pages: 1,088 words: 228,743 
Expected Returns: An Investor's Guide to Harvesting Market Rewards by Antti Ilmanen Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Andrei Shleifer, asset allocation, assetbacked security, availability heuristic, backtesting, balance sheet recession, bank run, banking crisis, barriers to entry, Bernie Madoff, Black Swan, Bretton Woods, buy low sell high, capital asset pricing model, capital controls, Carmen Reinhart, central bank independence, collateralized debt obligation, commodity trading advisor, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, debt deflation, deglobalization, delta neutral, demand response, discounted cash flows, disintermediation, diversification, diversified portfolio, dividendyielding stocks, equity premium, Eugene Fama: efficient market hypothesis, fiat currency, financial deregulation, financial innovation, financial intermediation, fixed income, Flash crash, framing effect, frictionless, frictionless market, George Akerlof, global reserve currency, Google Earth, high net worth, hindsight bias, Hyman Minsky, implied volatility, income inequality, incomplete markets, index fund, inflation targeting, interest rate swap, invisible hand, Kenneth Rogoff, laissezfaire capitalism, law of one price, Long Term Capital Management, loss aversion, margin call, market bubble, market clearing, market friction, market fundamentalism, market microstructure, mental accounting, merger arbitrage, mittelstand, moral hazard, New Journalism, oil shock, pvalue, passive investing, performance metric, Ponzi scheme, prediction markets, price anchoring, price stability, principal–agent problem, private sector deleveraging, purchasing power parity, quantitative easing, quantitative trading / quantitative ﬁnance, random walk, reserve currency, Richard Thaler, risk tolerance, riskadjusted returns, risk/return, riskless arbitrage, Robert Shiller, Robert Shiller, savings glut, Sharpe ratio, short selling, sovereign wealth fund, statistical arbitrage, statistical model, stochastic volatility, systematic trading, The Great Moderation, 
The Myth of the Rational Market, too big to fail, transaction costs, tulip mania, value at risk, volatility arbitrage, volatility smile, working-age population, Y2K, yield curve, zero-coupon bond This is an in-sample measure and can be misleading if the correlations are not stable over time. Note, though, that most academic studies rely on such in-sample relations; econometricians simply assume that any observed statistical relation between predictors and subsequent market returns was already known to rational investors in real time. Practitioners who find this assumption unrealistic try to avoid in-sample bias by selecting and/or estimating statistical models repeatedly using only data that were available at each point in time, so as to assess predictability in a quasi-out-of-sample sense, though they never completely succeed in doing so. Table 8.6. Correlations with future excess returns of the S&P 500, 1962–2009 Sources: Haver Analytics, Robert Shiller’s website, Amit Goyal’s website, own calculations. Valuations. Various valuation ratios have predictive correlations between 10% and 20% for the next quarter [5]. … They treat default (or rating change) as a random event whose probability can be estimated from observed market prices in the context of an analytical model (or directly from historical default data). Useful indicators, besides equity volatility and leverage, include past equity returns, certain financial ratios, and proxies for the liquidity premium. This modeling approach is sort of a compromise between statistical models and theoretically purer structural models. Reduced-form models can naturally match market spreads better than structural models, but unconstrained indicator selection can make them overfitted to in-sample data. Box 10.1. (wonkish) Risk-neutral and actual default probabilities Under certain assumptions (continuous trading, a single-factor diffusion process), positions in risky assets can be perfectly hedged and thus should earn riskless return. 
… However, there is some evidence of rising correlations across all quant strategies, presumably due to common positions among leveraged traders. 12.7 NOTES [1] Like many others, I prefer to use economic intuition as one guard against data mining, but the virtues of such intuition can be overstated as our intuition is inevitably influenced by past experiences. Purely data-driven statistical approaches are even worse, but at least then statistical models can help assess the magnitude of data-mining bias. [2] Here are some additional points on VMG: —No trading costs or financing costs related to shorting are subtracted from VMG returns. This is typical for academic studies because such costs are trade specific and/or investor specific and, moreover, such data are not available over long histories. —VMG is constructed in a deliberately conservative (“underfitted”) manner. 
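The quasi-out-of-sample procedure described in the Expected Returns excerpt (re-estimating the model at each date using only data available at that date) can be sketched as an expanding-window loop. This is a minimal illustration under stated assumptions, not the author's method: the series are synthetic, and the names `pearson`, `fit_line`, `expanding_window_forecasts` and the `min_history` parameter are hypothetical.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Least-squares intercept and slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def expanding_window_forecasts(predictor, next_return, min_history=20):
    """Forecast each period using only data available before it.

    predictor[t] is observed at time t; next_return[t] is the return
    realized over the following period, so it is known only at t + 1.
    """
    forecasts = []
    for t in range(min_history, len(predictor)):
        # Fit only on pairs (predictor[s], next_return[s]) with s < t.
        intercept, slope = fit_line(predictor[:t], next_return[:t])
        forecasts.append(intercept + slope * predictor[t])
    return forecasts


# Synthetic data: an assumption for illustration, not market data.
rng = random.Random(0)
predictor = [rng.gauss(0, 1) for _ in range(200)]
next_return = [0.15 * x + rng.gauss(0, 1) for x in predictor]

# In-sample: correlation measured over the full history at once.
in_sample_corr = pearson(predictor, next_return)

# Quasi-out-of-sample: correlation of real-time forecasts with outcomes.
forecasts = expanding_window_forecasts(predictor, next_return)
out_of_sample_corr = pearson(forecasts, next_return[20:])
```

Comparing `in_sample_corr` with `out_of_sample_corr` gives a rough sense of how much apparent predictability survives when the model is restricted to information available in real time. The choice of which predictor to test is still made with hindsight, which is one reason the excerpt says the bias is never completely removed.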
pages: 183 words: 17,571 
Broken Markets: A User's Guide to the Post-Finance Economy by Kevin Mellyn Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
banking crisis, banks create money, Basel III, Bernie Madoff, Big bang: deregulation of the City of London, Bonfire of the Vanities, bonus culture, Bretton Woods, BRICs, British Empire, call centre, Carmen Reinhart, central bank independence, centre right, cloud computing, collapse of Lehman Brothers, collateralized debt obligation, corporate governance, credit crunch, crony capitalism, currency manipulation / currency intervention, disintermediation, eurozone crisis, fiat currency, financial innovation, financial repression, floating exchange rates, Fractional reserve banking, global reserve currency, global supply chain, Home mortgage interest deduction, index fund, joint-stock company, Joseph Schumpeter, labor-force participation, labour market flexibility, liquidity trap, London Interbank Offered Rate, lump of labour, market bubble, market clearing, Martin Wolf, means of production, mobile money, moral hazard, mortgage debt, mortgage tax deduction, Ponzi scheme, profit motive, quantitative easing, Real Time Gross Settlement, regulatory arbitrage, reserve currency, rising living standards, Ronald Coase, seigniorage, shareholder value, Silicon Valley, statistical model, Steve Jobs, The Great Moderation, the payments system, Tobin tax, too big to fail, transaction costs, underbanked, Works Progress Administration, yield curve, Yogi Berra Regulators were becoming increasingly comfortable with the “market-centric” model too, because the securities churned out had to be properly vetted and rated by the credit agencies under SEC (Securities and Exchange Commission) rules. Moreover, distributing risk to large numbers of sophisticated institutions seemed safer than leaving it concentrated on the books of individual banks. 
Besides, even the Basel-process experts had become convinced that bank risk management had reached a new level of effectiveness through the use of sophisticated statistical models, and the Basel II rules that superseded Basel I especially allowed the largest and most sophisticated banks to use approved models to set their capital requirements. The fly in the ointment of market-centric finance was that it allowed an almost infinite expansion of credit in the economy, but creditworthy risks are by definition finite. At some point, every household with a steady income has seven credit cards, a mortgage, and a home equity line. … Americans make this tradeoff with limited fair-credit-reporting protections, while many other societies do not. It is critical to understand that a credit score is only a measure of whether a consumer can service a certain amount of credit—that is, make timely interest and principal payments. It is not concerned with the ability to pay off debts over time. What it really measures is the probability that an individual will default. This is a statistical model–based determination, and as such is hostage to historical experience of the behavior of tens of millions of individuals. The factors that over time have proved most predictive include not only behavior—late or missed payments on any bill, not just a loan, signals potential default—but also circumstances. Home ownership of long duration is a plus. So is long-term employment at the same firm. 
pages: 238 words: 77,730 
Final Jeopardy: Man vs. Machine and the Quest to Know Everything by Stephen Baker Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
23andMe, AI winter, Albert Einstein, artificial general intelligence, business process, call centre, clean water, computer age, Frank Gehry, information retrieval, Iridium satellite, Isaac Newton, job automation, pattern recognition, Ray Kurzweil, Silicon Valley, Silicon Valley startup, statistical model, theory of mind, thinkpad, Turing test, Vernor Vinge, Wall-E, Watson beat the top human players on Jeopardy! The Google team had fed millions of translated documents, many of them from the United Nations, into their computers and supplemented them with a multitude of natural-language text culled from the Web. This training set dwarfed their competitors’. Without knowing what the words meant, their computers had learned to associate certain strings of words in Arabic and Chinese with their English equivalents. Since they had so very many examples to learn from, these statistical models caught nuances that had long confounded machines. Using statistics, Google’s computers won hands down. “Just like that, they bypassed thirty years of work on machine translation,” said Ed Lazowska, the chairman of the computer science department at the University of Washington. The statisticians trounced the experts. But the statistically trained machines they built, whether they were translating from Chinese or analyzing the ads that a Web surfer clicked, didn’t know anything. … “We knew all of its algorithms,” he said, and the team had precise statistics on every aspect of its behavior. The human players were more complicated. Tesauro had to pull together statistics on the thousands of humans who had played Jeopardy: how often they buzzed in, their precision in different levels of clues, their betting patterns for Daily Doubles and Final Jeopardy. From these, the IBM team pieced together statistical models of two humans. Then they put them into action against the model of Watson. 
The games had none of the life or drama of Jeopardy—no suspense, no jokes, no jingle while the digital players came up with their Final Jeopardy responses. They were only simulations of the scoring dynamics of Jeopardy. Yet they were valuable. After millions of games, Tesauro was able to calculate the value of each clue at each state of the game. 
pages: 304 words: 80,965 
What They Do With Your Money: How the Financial System Fails Us, and How to Fix It by Stephen Davis, Jon Lukomnik, David Pitt-Watson Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Admiral Zheng, banking crisis, Basel III, Bernie Madoff, Black Swan, centralized clearinghouse, clean water, corporate governance, correlation does not imply causation, credit crunch, Credit Default Swap, crowdsourcing, David Brooks, Dissolution of the Soviet Union, diversification, diversified portfolio, en.wikipedia.org, financial innovation, financial intermediation, Flash crash, income inequality, index fund, invisible hand, London Whale, Long Term Capital Management, moral hazard, Northern Rock, passive investing, performance metric, Ponzi scheme, principal–agent problem, rent-seeking, Ronald Coase, shareholder value, Silicon Valley, South Sea Bubble, sovereign wealth fund, statistical model, Steve Jobs, the market place, The Wealth of Nations by Adam Smith, transaction costs, Upton Sinclair, value at risk, WikiLeaks Even if they change your life profoundly, such days are not likely to resemble the ones before and after. That is why the day you get married is so memorable. In fact, the elements of that day are not likely to be present in the sample of any of the previous 3,652 days. So how could the computer possibly calculate the likelihood of their recurring tomorrow, or next week? Similarly, in the financial world, if you feed a statistical model data that have come from a period where there has been no banking crisis, the model will predict that it is very unlikely you will have a banking crisis. When statisticians worked out that a financial crisis of the sort we witnessed in 2008 would occur once in billions of years, their judgment was based on years of data when there had not been such a crisis. It compounds the problem that people tend to simplify the outcome of risk models. 
… Just as the laws of gravity don’t explain magnetism or subatomic forces, so the disciplines of economics that held sway in our financial institutions paid little attention to the social, cultural, legal, political, institutional, moral, psychological, and technological forces that shape our economy’s behavior. The compass that bankers and regulators were using worked well according to its own logic, but it was pointing in the wrong direction, and they steered the ship onto the rocks. History does not record whether the Queen was satisfied with the academics’ response. She might, however, have noted that this economic-statistical model had been found wanting before—in 1998, when the collapse of the hedge fund Long-Term Capital Management nearly took the financial system down with it. Ironically, its directors included the two people who had shared the Nobel Prize in Economics the previous year. The Queen might also have noted the glittering lineup of senior economists who, over the last century, have warned against excessive confidence in predictions made using models. 
pages: 752 words: 131,533 
Python for Data Analysis by Wes McKinney Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
backtesting, cognitive dissonance, crowdsourcing, Debian, Firefox, Google Chrome, index card, random walk, recommendation engine, revision control, sentiment analysis, Sharpe ratio, side project, sorting algorithm, statistical model, type inference While readers may have many different end goals for their work, the tasks required generally fall into a number of different broad groups:

Interacting with the outside world
    Reading and writing with a variety of file formats and databases.
Preparation
    Cleaning, munging, combining, normalizing, reshaping, slicing and dicing, and transforming data for analysis.
Transformation
    Applying mathematical and statistical operations to groups of data sets to derive new data sets. For example, aggregating a large table by group variables.
Modeling and computation
    Connecting your data to statistical models, machine learning algorithms, or other computational tools.
Presentation
    Creating interactive or static graphical visualizations or textual summaries.

In this chapter I will show you a few data sets and some things we can do with them. These examples are just intended to pique your interest and thus will only be explained at a high level. Don’t worry if you have no experience with any of these tools; they will be discussed in great detail throughout the rest of the book. 
… To create a Panel, you can use a dict of DataFrame objects or a three-dimensional ndarray:

    import pandas.io.data as web

    pdata = pd.Panel(dict((stk, web.get_data_yahoo(stk, '1/1/2009', '6/1/2012'))
                          for stk in ['AAPL', 'GOOG', 'MSFT', 'DELL']))

Each item (the analogue of columns in a DataFrame) in the Panel is a DataFrame:

    In [297]: pdata
    Out[297]:
    <class 'pandas.core.panel.Panel'>
    Dimensions: 4 (items) x 861 (major) x 6 (minor)
    Items: AAPL to MSFT
    Major axis: 2009-01-02 00:00:00 to 2012-06-01 00:00:00
    Minor axis: Open to Adj Close

    In [298]: pdata = pdata.swapaxes('items', 'minor')

    In [299]: pdata['Adj Close']
    Out[299]:
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 861 entries, 2009-01-02 00:00:00 to 2012-06-01 00:00:00
    Data columns:
    AAPL    861  non-null values
    DELL    861  non-null values
    GOOG    861  non-null values
    MSFT    861  non-null values
    dtypes: float64(4)

ix-based label indexing generalizes to three dimensions, so we can select all data at a particular date or a range of dates like so:

    In [300]: pdata.ix[:, '6/1/2012', :]
    Out[300]:
            Open    High     Low   Close    Volume  Adj Close
    AAPL  569.16  572.65  560.52  560.99  18606700     560.99
    DELL   12.15   12.30   12.05   12.07  19396700      12.07
    GOOG  571.79  572.65  568.35  570.98   3057900     570.98
    MSFT   28.76   28.96   28.44   28.45  56634300      28.45

    In [301]: pdata.ix['Adj Close', '5/22/2012':, :]
    Out[301]:
                  AAPL   DELL    GOOG   MSFT
    Date
    2012-05-22  556.97  15.08  600.80  29.76
    2012-05-23  570.56  12.49  609.46  29.11
    2012-05-24  565.32  12.45  603.66  29.07
    2012-05-25  562.29  12.46  591.53  29.06
    2012-05-29  572.27  12.66  594.34  29.56
    2012-05-30  579.17  12.56  588.23  29.34
    2012-05-31  577.73  12.33  580.86  29.19
    2012-06-01  560.99  12.07  570.98  28.45

An alternate way to represent panel data, especially for fitting statistical models, is in “stacked” DataFrame form:

    In [302]: stacked = pdata.ix[:, '5/30/2012':, :].to_frame()

    In [303]: stacked
    Out[303]:
                        Open    High     Low   Close    Volume  Adj Close
    major      minor
    2012-05-30 AAPL   569.20  579.99  566.56  579.17  18908200     579.17
               DELL    12.59   12.70   12.46   12.56  19787800      12.56
               GOOG   588.16  591.90  583.53  588.23   1906700     588.23
               MSFT    29.35   29.48   29.12   29.34  41585500      29.34
    2012-05-31 AAPL   580.74  581.50  571.46  577.73  17559800     577.73
               DELL    12.53   12.54   12.33   12.33  19955500      12.33
               GOOG   588.72  590.00  579.00  580.86   2968300     580.86
               MSFT    29.30   29.42   28.94   29.19  39134000      29.19
    2012-06-01 AAPL   569.16  572.65  560.52  560.99  18606700     560.99
               DELL    12.15   12.30   12.05   12.07  19396700      12.07
               GOOG   571.79  572.65  568.35  570.98   3057900     570.98
               MSFT    28.76   28.96   28.44   28.45  56634300      28.45

DataFrame has a related to_panel method, the inverse of to_frame:

    In [304]: stacked.to_panel()
    Out[304]:
    <class 'pandas.core.panel.Panel'>
    Dimensions: 6 (items) x 3 (major) x 4 (minor)
    Items: Open to Adj Close
    Major axis: 2012-05-30 00:00:00 to 2012-06-01 00:00:00
    Minor axis: AAPL to MSFT

Chapter 6. … There are much more efficient sampling-without-replacement algorithms, but this is an easy strategy that uses readily available tools:

    In [183]: df.take(np.random.permutation(len(df))[:3])
    Out[183]:
        0   1   2   3
    1   4   5   6   7
    3  12  13  14  15
    4  16  17  18  19

To generate a sample with replacement, the fastest way is to use np.random.randint to draw random integers:

    In [184]: bag = np.array([5, 7, 1, 6, 4])

    In [185]: sampler = np.random.randint(0, len(bag), size=10)

    In [186]: sampler
    Out[186]: array([4, 4, 2, 2, 2, 0, 3, 0, 4, 1])

    In [187]: draws = bag.take(sampler)

    In [188]: draws
    Out[188]: array([4, 4, 1, 1, 1, 5, 6, 5, 4, 7])

Computing Indicator/Dummy Variables Another type of transformation for statistical modeling or machine learning applications is converting a categorical variable into a “dummy” or “indicator” matrix. If a column in a DataFrame has k distinct values, you would derive a matrix or DataFrame containing k columns containing all 1’s and 0’s. pandas has a get_dummies function for doing this, though devising one yourself is not difficult. 
Let’s return to an earlier example DataFrame:

    In [189]: df = DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'],
       .....:                 'data1': range(6)})

    In [190]: pd.get_dummies(df['key'])
    Out[190]:
       a  b  c
    0  0  1  0
    1  0  1  0
    2  1  0  0
    3  0  0  1
    4  1  0  0
    5  0  1  0

In some cases, you may want to add a prefix to the columns in the indicator DataFrame, which can then be merged with the other data. get_dummies has a prefix argument for doing just this:

    In [191]: dummies = pd.get_dummies(df['key'], prefix='key')

    In [192]: df_with_dummy = df[['data1']].join(dummies)

    In [193]: df_with_dummy
    Out[193]:
       data1  key_a  key_b  key_c
    0      0      0      1      0
    1      1      0      1      0
    2      2      1      0      0
    3      3      0      0      1
    4      4      1      0      0
    5      5      0      1      0

If a row in a DataFrame belongs to multiple categories, things are a bit more complicated. 
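The excerpt remarks that devising an indicator-matrix function yourself is not difficult. A minimal sketch in plain Python follows, assuming a single list of hashable, sortable category values; `get_dummies_sketch` is a hypothetical helper written for illustration and is not how pandas implements `get_dummies`.

```python
def get_dummies_sketch(values, prefix=None):
    """Return (column_names, rows) for a 0/1 indicator matrix.

    Hypothetical helper illustrating the idea behind pandas.get_dummies
    for one list of categorical values; not the pandas implementation.
    """
    levels = sorted(set(values))  # the k distinct values, in sorted order
    if prefix is None:
        names = [str(level) for level in levels]
    else:
        names = [f"{prefix}_{level}" for level in levels]
    # One row per input value, with a 1 in the column of its own level.
    rows = [[1 if value == level else 0 for level in levels]
            for value in values]
    return names, rows


# Same 'key' values as the DataFrame example in the excerpt.
keys = ['b', 'b', 'a', 'c', 'a', 'b']
names, rows = get_dummies_sketch(keys, prefix='key')

print(names)
for row in rows:
    print(row)
```

Each row contains exactly one 1 because every value belongs to exactly one level. The multiple-category case mentioned at the end of the excerpt would need a different construction.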
pages: 467 words: 116,094 
I Think You'll Find It's a Bit More Complicated Than That by Ben Goldacre Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
call centre, conceptual framework, correlation does not imply causation, crowdsourcing, death of newspapers, Desert Island Discs, en.wikipedia.org, experimental subject, Firefox, Flynn Effect, jimmy wales, John Snow's cholera map, Loebner Prize, meta analysis, meta-analysis, placebo effect, Simon Singh, statistical model, stem cell, the scientific method, Turing test, WikiLeaks Obviously, there are no out gay people in the eighteen-to-twenty-four group who came out at an age later than twenty-four; so the average age at which people in the eighteen-to-twenty-four group came out cannot possibly be greater than the average age of that group, and certainly it will be lower than, say, thirty-seven, the average age at which people in their sixties came out. For the same reason, it’s very likely indeed that the average age of coming out will increase as the average age of each age group rises. In fact, if we assume (in formal terms we could call this a ‘statistical model’) that at any time, all the people who are out have always come out at a uniform rate between the age of ten and their current age, you would get almost exactly the same figures (you’d get fifteen, twenty-three and thirty-five, instead of seventeen, twenty-one and thirty-seven). This is almost certainly why ‘the average coming-out age has fallen by over twenty years’: in fact you could say that Stonewall’s survey has found that on average, as people get older, they get older. … For example, a recent study identified two broad subpopulations of cyclist: ‘one speed-happy group that cycle fast and have lots of cycle equipment including helmets, and one traditional kind of cyclist without much equipment, cycling slowly’. The study concluded that compulsory cycle-helmet legislation may selectively reduce cycling in the second group. There are even more complex second-round effects if each individual cyclist’s safety is improved by increased cyclist density through ‘safety in numbers’, a phenomenon known as Smeed’s law. 
Statistical models for the overall impact of helmet habits are therefore inevitably complex and based on speculative assumptions. This complexity seems at odds with the current official BMA policy, which confidently calls for compulsory helmet legislation. Standing over all this methodological complexity is a layer of politics, culture and psychology. Supporters of helmets often tell vivid stories about someone they knew, or heard of, who was apparently saved from severe head injury by a helmet. … academic journals: access to papers published in 32–4, 143; cherry-picking and 5–8; ‘citation classics’ and 9–10, 102–3, 173; commercial ghost writers and 25–6; data published in newspapers rather than 17–20; doctors and technical academic journals 214; ‘impact factor’ 143; number of 14, 17; peer review and 138–46 see also peer review; poor quality (‘crap’) 138–46; refusal to publish in 3–5; retractions and 134–6; statistical model errors in 129–31; studies of errors in papers published in 9–10, 129–31; summaries of important new research from 214–15; teaching and 214–15; youngest people to publish papers in 11–12 …
pages: 719 words: 104,316 
R Cookbook by Paul Teetor Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Debian, en.wikipedia.org, p-value, quantitative trading / quantitative finance, statistical model Solution The factor function encodes your vector of discrete values into a factor:

    > f <- factor(v) # v is a vector of strings or integers

If your vector contains only a subset of possible values and not the entire universe, then include a second argument that gives the possible levels of the factor:

    > f <- factor(v, levels)

Discussion In R, each possible value of a categorical variable is called a level. A vector of levels is called a factor. Factors fit very cleanly into the vector orientation of R, and they are used in powerful ways for processing data and building statistical models. Most of the time, converting your categorical data into a factor is a simple matter of calling the factor function, which identifies the distinct levels of the categorical data and packs them into a factor:

    > f <- factor(c("Win","Win","Lose","Tie","Win","Lose"))
    > f
    [1] Win  Win  Lose Tie  Win  Lose
    Levels: Lose Tie Win

Notice that when we printed the factor, f, R did not put quotes around the values. … So think twice before you diddle with those globals: do you really want all lines in all graphics to be (say) magenta, dotted, and three times wider? Probably not, so use local parameters rather than global parameters whenever possible. See Also The help page for par lists the global graphics parameters; the chapter of R in a Nutshell on graphics includes the list with useful annotations. R Graphics contains extensive explanations of graphics parameters. Chapter 11. Linear Regression and ANOVA Introduction In statistics, modeling is where we get down to business. Models quantify the relationships between our variables. Models let us make predictions. A simple linear regression is the most basic model. It’s just two variables and is modeled as a linear relationship with an error term:

    $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

We are given the data for x and y. Our mission is to fit the model, which will give us the best estimates for $\beta_0$ and $\beta_1$ (Recipe 11.1). 
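To make the fitting step concrete, here is a minimal sketch of ordinary least squares for the simple linear model above, written in plain Python rather than R. The function name `fit_simple_regression` and the five data points are made up for illustration. Taking "best estimates" to mean least squares is an assumption about what the recipe computes; in R the usual route would be `lm(y ~ x)`.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates (beta0, beta1) for y = beta0 + beta1*x + error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Illustrative data only (hypothetical, roughly y = 2x plus noise).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1 = fit_simple_regression(xs, ys)

# Residuals are the estimated error terms, one per observation.
residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
print(beta0, beta1)
```

With an intercept in the model, least-squares residuals sum to zero up to rounding, which is a quick sanity check on any fit of this form.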
pages: 398 words: 86,855 
Bad Data Handbook by Q. Ethan McCallum Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, cloud computing, cognitive dissonance, combinatorial explosion, conceptual framework, database schema, en.wikipedia.org, Firefox, Flash crash, Gini coefficient, illegal immigration, iterative process, labor-force participation, loose coupling, natural language processing, Netflix Prize, quantitative trading / quantitative finance, recommendation engine, sentiment analysis, statistical model, supply-chain management, text mining, too big to fail, web application In a previous life, he invented the refrigerator. Spencer Burns is a data scientist/engineer living in San Francisco. He has spent the past 15 years extracting information from messy data in fields ranging from intelligence to quantitative finance to social media. Richard Cotton is a data scientist with a background in chemical health and safety, and has worked extensively on tools to give non-technical users access to statistical models. He is the author of the R packages “assertive” for checking the state of your variables and “sig” to make sure your functions have a sensible API. He runs The Damned Liars statistics consultancy. Philipp K. Janert was born and raised in Germany. He obtained a Ph.D. in Theoretical Physics from the University of Washington in 1997 and has been working in the tech industry since, including four years at Amazon.com, where he initiated and led several projects to improve Amazon’s order fulfillment process. … As the first and second examples show, a scientist can spot faulty experimental setups, because of his or her ability to test the data for internal consistency and for agreement with known theories, and thereby prevent wrong conclusions and faulty analyses. What possibly could be more important to a scientist? And if that means taking a trip to the factory, I’ll be glad to go. Chapter 8. 
Blood, Sweat, and Urine Richard Cotton A Very Nerdy Body Swap Comedy I spent six years working in the statistical modeling team at the UK’s Health and Safety Laboratory.[23] A large part of my job was working with the laboratory’s chemists, looking at occupational exposure to various nasty substances to see if an industry was adhering to safe limits. The laboratory gets sent tens of thousands of blood and urine samples each year (and sometimes more exotic fluids like sweat or saliva), and has its own team of occupational hygienists who visit companies and collect yet more samples. 
pages: 309 words: 86,909 
The Spirit Level: Why Greater Equality Makes Societies Stronger by Richard Wilkinson; Kate Pickett Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Berlin Wall, clean water, Diane Coyle, epigenetics, experimental economics, experimental subject, Fall of the Berlin Wall, full employment, germ theory of disease, Gini coefficient, impulse control, income inequality, knowledge economy, labor-force participation, land reform, Louis Pasteur, meta analysis, meta-analysis, Milgram experiment, offshore financial centre, phenotype, Plutocrats, plutocrats, profit maximization, profit motive, Ralph Waldo Emerson, statistical model, The Chicago School, The Spirit Level, The Wealth of Nations by Adam Smith, Thorstein Veblen, ultimatum game, upwardly mobile, World Values Survey One factor is the strength of the relationship, which is shown by the steepness of the lines in Figures 4.1 and 4.2. People in Sweden are much more likely to trust each other than people in Portugal. Any alternative explanation would need to be just as strong, and in our own statistical models we find that neither poverty nor average standards of living can explain our findings. We also see a consistent association among both the United States and the developed countries. Earlier we described how Uslaner and Rothstein used a statistical model to show the ordering of inequality and trust: inequality affects trust, not the other way round. The relationships between inequality and women’s status and between inequality and foreign aid also add coherence and plausibility to our belief that inequality increases the social distance between different groups of people, making us less willing to see them as ‘us’ rather than ‘them’. 
pages: 350 words: 103,270 
The Devil's Derivatives: The Untold Story of the Slick Traders and Hapless Regulators Who Almost Blew Up Wall Street . . . And Are Ready to Do It Again by Nicholas Dunbar Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
assetbacked security, bank run, banking crisis, Basel III, Black Swan, BlackScholes formula, bonus culture, capital asset pricing model, Carmen Reinhart, Cass Sunstein, collateralized debt obligation, Credit Default Swap, credit default swaps / collateralized debt obligations, delayed gratification, diversification, Edmond Halley, facts on the ground, financial innovation, fixed income, George Akerlof, implied volatility, index fund, interest rate derivative, interest rate swap, Isaac Newton, Kenneth Rogoff, Long Term Capital Management, margin call, market bubble, Nick Leeson, Northern Rock, offshore financial centre, price mechanism, regulatory arbitrage, rentseeking, Richard Thaler, risk tolerance, risk/return, Ronald Reagan, shareholder value, short selling, statistical model, The Chicago School, time value of money, too big to fail, transaction costs, value at risk, Vanguard fund, yield curve The mattress had done its job—it had given international regulators the confidence to sign off as commercial banks built up their trading businesses. Betting—and Beating—the Spread Now return to the trading floor, to the people regulators and bank senior management need to police. Although they are taught to overcome risk aversion, traders continue to look for a mattress everywhere, in the form of “free lunches.” But do they use statistical modeling to identify a mattress, and make money? If you talk to traders, the answer tends to be no. Listen to the warning of a senior Morgan Stanley equities trader who I interviewed in 2009: “You can compare to theoretical or historic value. But these forms of trading are probably a bit dangerous.” While regulators and senior bankers may have embraced VAR, traders themselves have always been skeptical. … According to the Morgan Stanley trader, “You study the perception of the market: I buy this because the next tick will be on the upside, or I sell because the next tick will be on the downside. 
This is probably based on the observations of your peers and so on. If you look purely at the anticipation of the price, that’s a way to make money in trading.” One reason traders don’t tend to make outright bets on the basis of statistical modeling is that capital rules such as VAR discourage it. The capital required to be set aside by VAR scales up with the size of the positions and the degree of worstcase scenario projected by the statistics. For volatile markets like equities, that restriction takes a big bite out of potential profit since trading firms must borrow to invest.5 On the other hand, shortterm, opportunistic trading (which might be less profitable) slips under the VAR radar because the positions never stay on the books for very long. 
pages: 502 words: 107,510 
Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, game design, information retrieval, iterative process, natural language processing, pattern recognition, performance metric, sentiment analysis, social web, speech recognition, statistical model, text mining This is a corpus of tagged and parsed sentences of naturally occurring English (4.5 million words). The British National Corpus (BNC) is compiled and released as the largest corpus of English to date (100 million words). The Text Encoding Initiative (TEI) is established to develop and maintain a standard for the representation of texts in digital form. 2000s: As the World Wide Web grows, more data is available for statistical models for Machine Translation and other applications. The American National Corpus (ANC) project releases a 22-million-word subcorpus, and the Corpus of Contemporary American English (COCA) is released (400 million words). Google releases its Google N-gram Corpus of 1 trillion word tokens from public web pages. The corpus holds up to five n-grams for each word token, along with their frequencies. 2010s: International standards organizations, such as ISO, begin to recognize and co-develop text encoding formats that are being used for corpus annotation efforts. … .), this algorithm computes a probability distribution over the possible labels associated with them, and then computes the best label sequence. We can identify two basic methods for sequence classification: Feature-based classification: A sequence is transformed into a feature vector. The vector is then classified according to conventional classifier methods. Model-based classification: An inherent model of the probability distribution of the sequence is built. HMMs and other statistical models are examples of this method. Included in feature-based methods are n-gram models of sequences, where an n-gram is selected as a feature. 
Given a set of such n-grams, we can represent a sequence as a binary vector of the occurrence of the n-grams, or as a vector containing frequency counts of the n-grams. With this sort of encoding, we can apply conventional methods to model sequences (Manning and Schütze 1999). 
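The two encodings described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the function names, the bigram vocabulary, and the example sentence are all assumptions chosen for the demonstration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def encode(tokens, vocabulary, n=2, binary=True):
    """Encode a sequence against a fixed n-gram vocabulary (hypothetical helper).

    binary=True  -> 1/0 occurrence vector
    binary=False -> frequency-count vector
    """
    counts = Counter(ngrams(tokens, n))
    if binary:
        return [1 if gram in counts else 0 for gram in vocabulary]
    return [counts[gram] for gram in vocabulary]

# Illustrative vocabulary of bigram features (assumed, not from the source).
vocabulary = [("the", "cat"), ("cat", "sat"), ("sat", "on"), ("the", "dog")]
sentence = "the cat sat on the cat".split()

binary_vector = encode(sentence, vocabulary, binary=True)
count_vector = encode(sentence, vocabulary, binary=False)
```

Either vector can then be passed to a conventional classifier. The binary form records only whether a feature is present, while the count form also keeps how often it occurs.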
pages: 317 words: 106,130 
The New Science of Asset Allocation: Risk Management in a MultiAsset World by Thomas Schneeweis, Garry B. Crowder, Hossein Kazemi Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
asset allocation, backtesting, Bernie Madoff, Black Swan, capital asset pricing model, collateralized debt obligation, commodity trading advisor, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, diversified portfolio, fixed income, high net worth, implied volatility, index fund, interest rate swap, invisible hand, market microstructure, merger arbitrage, moral hazard, passive investing, Richard Feynman, Richard Feynman, Richard Feynman: Challenger Oring, risk tolerance, riskadjusted returns, risk/return, Sharpe ratio, short selling, statistical model, systematic trading, technology bubble, the market place, Thomas Kuhn: the structure of scientific revolutions, transaction costs, value at risk, yield curve In practice, we must come up with estimates of the expected returns, standard deviations, and correlations. There are libraries of statistical books dedicated to the simple task of coming up with estimates of the parameters used in MPT. Here is the point: It is not simple. For example, (1) for what period is one estimating the parameters (week, month, year)? and (2) how constant are the estimates (e.g., do they change and, if they do, do we have statistical models that permit us to systematically reflect those changes?)? There are many more issues in parameter estimation, but probably the biggest is that when two assets exist with the same true expected return, standard deviation, and Measuring Risk 33 correlation but when the risk parameter is often estimated with error (e.g., standard deviation is larger or smaller than its true standard deviation), the procedure for determining the efficient frontier always picks the asset with the downward bias risk estimate (e.g., the lower estimated standard deviation) and the upward bias return estimate. 
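The selection effect described in this excerpt can be illustrated with a small simulation. It is a sketch under stated assumptions rather than the authors' procedure: returns are drawn from a normal distribution, both assets share the same true standard deviation, and the "optimizer" is reduced to picking whichever asset has the lower estimated risk. The function name and all parameter values are hypothetical.

```python
import random
import statistics

def mean_selected_risk(true_sd=0.20, n_obs=24, n_trials=2000, seed=0):
    """Average estimated risk of the asset that looks safer in each trial.

    Assumption: two assets with identical true risk; each trial estimates
    both standard deviations from n_obs simulated returns and keeps the
    smaller estimate, mimicking a rule that prefers lower estimated risk.
    """
    rng = random.Random(seed)
    chosen = []
    for _ in range(n_trials):
        est_a = statistics.stdev([rng.gauss(0.0, true_sd) for _ in range(n_obs)])
        est_b = statistics.stdev([rng.gauss(0.0, true_sd) for _ in range(n_obs)])
        chosen.append(min(est_a, est_b))
    return statistics.mean(chosen)

TRUE_SD = 0.20
selected = mean_selected_risk(true_sd=TRUE_SD)
# The selected estimate understates the true risk on average, because the
# selection rule systematically favors downward estimation errors.
understated = selected < TRUE_SD
```

The point of the sketch is only the direction of the bias: choosing on estimated risk rewards whichever asset happened to draw a low estimate.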
… The expected return on a comparably risky non-actively managed investment strategy is often either derived from academic theory or statistically derived from historical pricing relationships. The primary issue, of course, remains how to create a comparably risky investable non-actively managed asset. Even when one believes in the use of ex ante equilibrium (e.g., CAPM) or arbitrage (e.g., APT) models of expected return, problems in empirically estimating the required parameters usually result in alpha being determined using statistical models based on the underlying theoretical model. As generally measured in a statistical sense, the term alpha is often derived from a linear regression in which the equation that relates an observed variable y (asset return) to some other factor x (market index) is written as: y = α + βx + ε. The first term, α (alpha), represents the intercept; β (beta) represents the slope; and ε (epsilon) represents a random error term. 
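As a worked illustration of the regression above, the following sketch estimates α and β by ordinary least squares. The return series are made up for the example (the asset is constructed with no error term so the fit is exact), and the function name is an assumption, not something taken from the book.

```python
def ols_alpha_beta(x, y):
    """Least-squares intercept (alpha) and slope (beta) for y = alpha + beta*x + eps."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

# Hypothetical per-period returns of a market index (x).
market = [0.01, -0.02, 0.03, 0.00, 0.02]
# Hypothetical asset returns built as 0.005 + 1.2 * market with no noise,
# so the regression should recover alpha = 0.005 and beta = 1.2.
asset = [0.005 + 1.2 * m for m in market]

alpha, beta = ols_alpha_beta(market, asset)
```

With real data the error term is not zero, so the estimated alpha carries sampling error, which is one reason the excerpt stresses the difficulty of estimating these parameters.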
pages: 311 words: 99,699 
Fool's Gold: How the Bold Dream of a Small Tribe at J.P. Morgan Was Corrupted by Wall Street Greed and Unleashed a Catastrophe by Gillian Tett Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
accounting loophole / creative accounting, assetbacked security, bank run, banking crisis, BlackScholes formula, Bretton Woods, business climate, collateralized debt obligation, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, easy for humans, difficult for computers, financial innovation, fixed income, housing crisis, interest rate derivative, interest rate swap, locking in a profit, Long Term Capital Management, McMansion, mortgage debt, North Sea oil, Northern Rock, Renaissance Technologies, risk tolerance, Robert Shiller, Robert Shiller, short selling, sovereign wealth fund, statistical model, The Great Moderation, too big to fail, value at risk, yield curve That triggered panic among some investors, and many rushed to sell CDSs and CDOs, causing their prices to drop, an eventuality not predicted by the models. JPMorgan Chase, Deutsche Bank, and many other banks and funds suffered substantial losses. For a few weeks after the turmoil, the banking community engaged in soulsearching. At J.P. Morgan the traders stuck bananas on their desks as a jibe at the socalled F9 model monkeys, the mathematical wizards who had created such havoc. (The “monkeys” who wrote the statistical models tended to use the “F9” key on the computer when they performed their calculations, giving rise to the tag.) J.P. Morgan, Deutsche, and others conducted internal reviews that led them to introduce slight changes in their statistical systems. GLG Ltd., one large hedge fund, told its investors that it would use a wider set of data to analyze CDOs in the future. Within a couple of months, though, the markets rebounded, and the furor died down. … Compared to Greenspan, Geithner was not just younger, but he also commanded far less clout and respect. As the decade wore on, though, he became privately uneasy about some of the trends in the credit world. 
From 2005 onwards, he started to call on bankers to prepare for socalled “fat tails,” a statistical term for extremely negative events that occur more often than the normal bell curve statistical models the banks’ risk assessment relied on so much implied. He commented in the spring of 2006: “A number of fundamental changes in the US financial system over the past twentyfive years appear to have rendered it able to withstand the stress of a broader array of shocks than was the case in the past. [But] confidence in the overall resilience of the financial system needs to be tempered by the realization that there is much we still do not know about the likely sources and consequences of future stress to the system…[and]…The proliferation of new forms of derivatives and structured financial products has changed the nature of leverage in the financial system. 
pages: 317 words: 100,414 
Superforecasting: The Art and Science of Prediction by Philip Tetlock, Dan Gardner Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Affordable Care Act / Obamacare, Any sufficiently advanced technology is indistinguishable from magic, availability heuristic, Black Swan, butterfly effect, cloud computing, cuban missile crisis, Daniel Kahneman / Amos Tversky, desegregation, Edward Lorenz: Chaos theory, forward guidance, Freestyle chess, fundamental attribution error, germ theory of disease, hindsight bias, index fund, Jane Jacobs, Jeff Bezos, Mikhail Gorbachev, Mohammed Bouazizi, Nash equilibrium, Nate Silver, obamacare, pattern recognition, performance metric, placemaking, placebo effect, prediction markets, quantitative easing, random walk, randomized controlled trial, Richard Feynman, Richard Feynman, Richard Thaler, Robert Shiller, Robert Shiller, Ronald Reagan, Saturday Night Live, Silicon Valley, Skype, statistical model, stem cell, Steve Ballmer, Steve Jobs, Steven Pinker, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Watson beat the top human players on Jeopardy! Amos had an impish sense of humor. He also appreciated the absurdity of an academic committee on a mission to save the world. So I am 98% sure he was joking. And 99% sure his joke captures a basic truth about human judgment. Probability for the Stone Age Human beings have coped with uncertainty for as long as we have been recognizably human. And for almost all that time we didn’t have access to statistical models of uncertainty because they didn’t exist. It was remarkably late in history—arguably as late as the 1713 publication of Jakob Bernoulli’s Ars Conjectandi—before the best minds started to think seriously about probability. Before that, people had no choice but to rely on the tipofyournose perspective. You see a shadow moving in the long grass. Should you worry about lions? You try to think of an example of a lion attacking from the long grass. 
… Appendix Ten Commandments for Aspiring Superforecasters The guidelines sketched here distill key themes in this book and in training systems that have been experimentally demonstrated to boost accuracy in realworld forecasting contests. For more details, visit www.goodjudgment.com. (1) Triage. Focus on questions where your hard work is likely to pay off. Don’t waste time either on easy “clocklike” questions (where simple rules of thumb can get you close to the right answer) or on impenetrable “cloudlike” questions (where even fancy statistical models can’t beat the dartthrowing chimp). Concentrate on questions in the Goldilocks zone of difficulty, where effort pays off the most. For instance, “Who will win the presidential election, twelve years out, in 2028?” is impossible to forecast now. Don’t even try. Could you have predicted in 1940 the winner of the election, twelve years out, in 1952? If you think you could have known it would be a thenunknown colonel in the United States Army, Dwight Eisenhower, you may be afflicted by one of the worst cases of hindsight bias ever documented by psychologists. 

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, speech recognition, statistical model, William of Occam WHY THE BOOK IS NEEDED The book provides the reader with: • The models and techniques to uncover hidden nuggets of information in Web-based data • Insight into how web mining algorithms really work • The experience of actually performing web mining on real-world data sets “WHITE-BOX” APPROACH: UNDERSTANDING THE UNDERLYING ALGORITHMIC AND MODEL STRUCTURES The best way to avoid costly errors stemming from a blind black-box approach to data mining is to apply, instead, a white-box methodology, which emphasizes an understanding of the algorithmic and statistical model structures underlying the software. The book applies this white-box approach by: • Walking the reader through various algorithms • Providing examples of the operation of web mining algorithms on actual large data sets • Testing the reader’s level of understanding of the concepts and algorithms • Providing an opportunity for the reader to do some real web mining on large Web-based data sets Algorithm Walk-Throughs The book walks the reader through the operations and nuances of various algorithms, using small sample data sets, so that the reader gets a true appreciation of what is really going on inside an algorithm. … By inspecting the normal density curves, determine which attribute is more relevant for the classification task. CHAPTER 4 EVALUATING CLUSTERING: APPROACHES TO EVALUATING CLUSTERING • SIMILARITY-BASED CRITERION FUNCTIONS • PROBABILISTIC CRITERION FUNCTIONS • MDL-BASED MODEL AND FEATURE EVALUATION • CLASSES-TO-CLUSTERS EVALUATION • PRECISION, RECALL, AND F-MEASURE • ENTROPY. APPROACHES TO EVALUATING CLUSTERING Clustering algorithms group documents by similarity or create statistical models based solely on the document representation, which in turn reflects document content. 
Then the criterion functions evaluate these models objectively (i.e., using only the document content). In contrast, when we label documents by topic we use additional knowledge, which is generally not explicitly available in document content and representation. Labeled documents are used primarily in supervised learning (classification) to create a mapping between the document representation and the external notion (concept, category, class) provided by the teacher through labeling. 
pages: 347 words: 97,721 
Only Humans Need Apply: Winners and Losers in the Age of Smart Machines by Thomas H. Davenport, Julia Kirby Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
AI winter, Andy Kessler, artificial general intelligence, asset allocation, Automated Insights, autonomous vehicles, Baxter: Rethink Robotics, business intelligence, business process, call centre, carbonbased life, Clayton Christensen, clockwork universe, conceptual framework, dark matter, David Brooks, deliberate practice, deskilling, Edward Lloyd's coffeehouse, Elon Musk, Erik Brynjolfsson, estate planning, follow your passion, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, game design, generalpurpose programming language, Google Glasses, Hans Lippershey, haute cuisine, income inequality, index fund, industrial robot, information retrieval, intermodal, Internet of things, inventory management, Isaac Newton, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Khan Academy, knowledge worker, laborforce participation, loss aversion, Mark Zuckerberg, Narrative Science, natural language processing, Norbert Wiener, nuclear winter, pattern recognition, performance metric, Peter Thiel, precariat, quantitative trading / quantitative ﬁnance, Ray Kurzweil, Richard Feynman, Richard Feynman, risk tolerance, Robert Shiller, Robert Shiller, Rodney Brooks, Second Machine Age, selfdriving car, Silicon Valley, six sigma, Skype, speech recognition, spinning jenny, statistical model, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, superintelligent machines, supplychain management, transaction costs, Tyler Cowen: Great Stagnation, Watson beat the top human players on Jeopardy!, Works Progress Administration, Zipcar Where It All Began Today, someone using the term “smart machine” could be talking about any number of technologies. 
The term “artificial intelligence” alone, for example, has been used to describe such technologies as expert systems (collections of rules facilitating decisions in a specified domain, such as financial planning or knowing when a batch of soup is cooked), neural networks (a more mathematical approach to creating a model that fits a data set), machine learning (semiautomated statistical modeling to achieve the best fittingmodel to data), natural language processing or NLP (in which computers make sense of human language in textual form), and so forth. Wikipedia lists at least ten branches of AI, and we have seen other sources that mention many more. To make sense of this army of machines and the direction in which it is marching, it helps to remember where it all started: with numerical analytics supporting and supported by human decisionmakers. … He hired additional credit risk modelers, and encouraged them to build a variety of quantitative models to identify any problems with the bank’s loan portfolios and credit processes. This work required a broad range of sophisticated models including “neural network” models; some were vendor supplied; some were custombuilt . Cathcart, who was an English major at Dartmouth College but also learned the BASIC computer language there from its creator, John Kemeny, knew his way around computer systems and statistical models. Most important, he knew when to trust them and when not to. The models and analyses began to exhibit significant problems. No matter how automated and sophisticated the models were, Cathcart realized that they were becoming less valid over time with changes in the economy and banking climate. Many of the mortgage models, for example, were based on five years of historical data. But as the economy became worse by the day in 2007, those fiveyear models became dramatically overoptimistic. 
pages: 502 words: 107,657 
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, call centre, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, en.wikipedia.org, Erik Brynjolfsson, experimental subject, Google Glasses, happiness index / gross national happiness, job satisfaction, Johann Wolfgang von Goethe, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, riskadjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, selfdriving car, sentiment analysis, software as a service, speech recognition, statistical model, Steven Levy, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra FICO: Todd Steffes, “Predictive Analytics: Saving Lives and Lowering Medical Bills,” Analytics Magazine, Analytics Informs, January/February 2012. www.analyticsmagazine.org/januaryfebruary2012/505predictiveanalyticssavinglivesandloweringmedicalbills. GlaxoSmithKline (UK): Vladimir Anisimov, GlaxoSmithKline, “Predictive Analytic Patient Recruitment and Drug Supply Modelling in Clinical Trials,” Predictive Analytics World London Conference, November 30, 2011, London, UK. www.predictiveanalyticsworld.com/london/2011/agenda.php#day1–16. Vladimir V. Anisimov, “Statistical Modelling of Clinical Trials (Recruitment and Randomization),” Communications in Statistics—Theory and Methods 40, issue 19–20 (2011): 3684–3699. www.tandfonline.com/toc/lsta20/40/19–20. 
MultiCare Health System (four hospitals in Washington): Karen MinichPourshadi for HealthLeaders Media, “Hospital Data Mining Hits Paydirt,” HealthLeaders Media Online, November 29, 2010. www.healthleadersmedia.com/page1/FIN259479/HospitalDataMiningHitsPaydirt. … Johnson, Serena Lee, Frank Doherty, and Arthur Kressner (Consolidated Edison Company of New York), “Predicting Electricity Distribution Feeder Failures Using Machine Learning Susceptibility Analysis,” March 31, 2006. www.phillong.info/publications/GBAetal06_susc.pdf. This work has been partly supported by a research contract from Consolidated Edison. BNSF Railway: C. Tyler Dick, Christopher P. L. Barkan, Edward R. Chapman, and Mark P. Stehly, “Multivariate Statistical Model for Predicting Occurrence and Location of Broken Rails,” Transportation Research Board of the National Academies, January 26, 2007. http://trb.metapress.com/content/v2j6022171r41478/. See also: http://ict.uiuc.edu/railroad/cee/pdf/Dick_et_al_2003.pdf. TTX: Thanks to Mahesh Kumar at Tiger Analytics for this case study, “Predicting Wheel Failure Rate for Railcars.” Fortune 500 global technology company: Thanks to Dean Abbott, Abbot Analytics (http://abbottanalytics.com/index.php) for information about this case study. 
pages: 345 words: 86,394 
Frequently Asked Questions in Quantitative Finance by Paul Wilmott Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, asset allocation, BlackScholes formula, Brownian motion, butterfly effect, capital asset pricing model, collateralized debt obligation, Credit Default Swap, credit default swaps / collateralized debt obligations, delta neutral, discrete time, diversified portfolio, Emanuel Derman, Eugene Fama: efficient market hypothesis, fixed income, fudge factor, implied volatility, incomplete markets, interest rate derivative, interest rate swap, iterative process, London Interbank Offered Rate, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, margin call, market bubble, martingale, Norbert Wiener, quantitative trading / quantitative finance, random walk, regulatory arbitrage, risk/return, Sharpe ratio, statistical arbitrage, statistical model, stochastic process, stochastic volatility, transaction costs, urban planning, value at risk, volatility arbitrage, volatility smile, Wiener process, yield curve, zerocoupon bond Here is a list and description of the most important. • A static arbitrage is an arbitrage that does not require rebalancing of positions • A dynamic arbitrage is an arbitrage that requires trading instruments in the future, generally contingent on market states • A statistical arbitrage is not an arbitrage but simply a likely profit in excess of the risk-free return (perhaps even suitably adjusted for risk taken) as predicted by past statistics • Model-independent arbitrage is an arbitrage which does not depend on any mathematical model of financial instruments to work. For example, an exploitable violation of put-call parity or a violation of the relationship between spot and forward prices, or between bonds and swaps • Model-dependent arbitrage does require a model. For example, options mispriced because of incorrect volatility estimate. … One hat’s numbers have mean of zero and standard deviation 0.1. This is hat A. Another hat’s numbers have mean of zero and standard deviation 1. This is hat B. 
The final hat’s numbers have mean of zero and standard deviation 10. This is hat C. You don’t know which hat is which. You pick a number out of one hat, it is −2.6. Which hat do you think it came from? MLE can help you answer this question. Long Answer A large part of statistical modelling concerns finding model parameters. One popular way of doing this is Maximum Likelihood Estimation. The method is easily explained by a very simple example. You are attending a maths conference. You arrive by train at the city hosting the event. You take a taxi from the train station to the conference venue. The taxi number is 20,922. How many taxis are there in the city? This is a parameter estimation problem. 
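Both puzzles in this excerpt can be worked through numerically. The sketch below is an illustration under explicit assumptions that the excerpt itself does not state: the numbers in each hat are taken to be normally distributed, and the taxis are taken to be numbered 1 to N with every taxi equally likely to be the one you hail. All function and variable names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution (assumed model for the hats)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hat puzzle: likelihood of drawing -2.6 from each hat, given its standard deviation.
hat_sd = {"A": 0.1, "B": 1.0, "C": 10.0}
draw = -2.6
likelihoods = {hat: normal_pdf(draw, 0.0, sd) for hat, sd in hat_sd.items()}
most_likely_hat = max(likelihoods, key=likelihoods.get)

def taxi_likelihood(n_taxis, observed):
    """P(observing this taxi number | n_taxis), assuming numbering 1..n_taxis
    and a uniformly random taxi."""
    return 1.0 / n_taxis if n_taxis >= observed else 0.0

# Taxi puzzle: search a range of candidate fleet sizes for the maximum likelihood.
observed = 20922
candidates = range(1, 50001)
mle_taxis = max(candidates, key=lambda n: taxi_likelihood(n, observed))
```

Under these assumptions hat C has the highest likelihood: a draw of −2.6 is 26 standard deviations out for hat A and 2.6 for hat B, but well within one standard deviation for hat C. For the taxis, the likelihood 1/N is largest at the smallest N consistent with the observation, so the maximum-likelihood estimate equals the observed number itself.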
pages: 346 words: 92,984 
The Lucky Years: How to Thrive in the Brave New World of Health by David B. Agus Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
3D printing, active transport: walking or cycling, Affordable Care Act / Obamacare, Albert Einstein, butterfly effect, clean water, cognitive dissonance, crowdsourcing, Danny Hillis, Drosophila, Edward Lorenz: Chaos theory, en.wikipedia.org, epigenetics, Kickstarter, medical residency, meta analysis, metaanalysis, microbiome, microcredit, mouse model, Murray GellMann, New Journalism, pattern recognition, personalized medicine, phenotype, placebo effect, publish or perish, randomized controlled trial, risk tolerance, statistical model, stem cell, Steve Jobs, Thomas Malthus, wikimedia commons It didn’t take long for there to be a backlash against the implied message. Tomasetti and Vogelstein were accused of focusing on rare cancers while leaving out several common cancers that indeed are largely preventable. The International Agency for Research on Cancer, the cancer arm of the World Health Organization, published a press release stating it “strongly disagrees” with the report. To arrive at their conclusion, Tomasetti and Vogelstein used a statistical model they developed based on known rates of cell division in thirtyone types of tissue. Stem cells were their main focal point. As a reminder, these are the small, specialized “mothership” cells in each organ or tissue that divide to replace cells that die or wear out. Only in recent years have researchers been able to conduct these kinds of studies due to advances in the understanding of stemcell biology. … ., “Intensive Lifestyle Changes May Affect the Progression of Prostate Cancer,” Journal of Urology, 174, no. 3 (September 2005): 1065–69; discussion 1069–70. 11. A. R. Kristal et al., “Baseline Selenium Status and Effects of Selenium and Vitamin E Supplementation on Prostate Cancer Risk,” Journal of the National Cancer Institute 106, no. 3 (March 2014): djt456, doi:10.1093/jnci/djt456, Epub February 22, 2014. 12. 
Johns Hopkins Medicine, “Bad Luck of Random Mutations Plays Predominant Role in Cancer, Study Shows—Statistical Modeling Links Cancer Risk with Number of Stem Cell Divisions,” news release, January 1, 2015, www.hopkinsmedicine.org/news/media/releases/bad_luck_of_random_mutations_plays_predominant_role_in_cancer_study_shows. 13. C. Tomasetti and B. Vogelstein, “Cancer Etiology. Variation in Cancer Risk Among Tissues Can Be Explained by the Number of Stem Cell Divisions,” Science 347, no. 6217 (January 2, 2015): 78–81, doi:10.1126/science.1260825. 14. 
pages: 103 words: 32,131 
Program Or Be Programmed: Ten Commands for a Digital Age by Douglas Rushkoff Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
banking crisis, bigbox store, citizen journalism, cloud computing, East Village, financial innovation, Firefox, hive mind, Howard Rheingold, invention of the printing press, Kevin Kelly, Marshall McLuhan, Silicon Valley, statistical model, Stewart Brand, Ted Nelson, WikiLeaks In fact, the game only became a mass phenomenon as free agenting and Major League players’ strikes soured fans on the sport. As baseball became a business, the fans took back baseball as a game—even if it had to happen on their computers. The effects didn’t stay in the computer. Leveraging the tremendous power of digital abstraction back to the real world, Billy Bean, coach of the Oakland Athletics, applied these same sorts of statistical modeling to players for another purpose: to assemble a roster for his own Major League team. Bean didn’t have the same salary budget as his counterparts in New York or Los Angeles, and he needed to find another way to assemble a winning combination. So he abstracted and modeled available players in order to build a better team that went from the bottom to the top of its division, and undermined the way that money had come to control the game. 
pages: 123 words: 32,382 
Grouped: How Small Groups of Friends Are the Key to Influence on the Social Web by Paul Adams Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Airbnb, Cass Sunstein, cognitive dissonance, David Brooks, information retrieval, invention of the telegraph, planetary scale, race to the bottom, Richard Thaler, sentiment analysis, social web, statistical model, The Wisdom of Crowds, web application, white flight Research by Forrester found that cancer patients trust their local care physician more than world renowned cancer treatment centers, and in most cases, the patient had known their local care physician for years.16 We overrate the advice of experts Psychologist Philip Tetlock conducted numerous studies to test the accuracy of advice from experts in the fields of journalism and politics. He quantified over 82,000 predictions and found that the journalism experts tended to perform slightly worse than picking answers at random. Political experts didn’t fare much better. They slightly outperformed random chance, but did not perform as well as a basic statistical model. In fact, they actually performed slightly better at predicting things outside their area of expertise, and 80 percent of their predictions were wrong. Studies in finance also show that only 20 percent of investment bankers outperform the stock market.17 We overestimate what we know Sometimes we consider ourselves as experts, even though we don’t know as much as we think we know. Research by Russo and Schoemaker asked managers in the advertising industry questions about their domain. 
pages: 456 words: 185,658 
More Guns, Less Crime: Understanding Crime and GunControl Laws by John R. Lott Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
affirmative action, Columbine, crack epidemic, Donald Trump, Edward Glaeser, gun show loophole, income per capita, More Guns, Less Crime, statistical model, the medium is the message, transaction costs As to the concern that other changes in law enforcement may have been occurring at the same time, the estimates account for changes in other gun-control laws and changes in law enforcement as measured by arrest and conviction rates as well as by prison terms. No previous study of crime has attempted to control for as many different factors that might explain changes in the crime rate. 3 Did I assume that there was an immediate and constant effect from these laws and that the effect should be the same everywhere? The “statistical models assumed: (1) an immediate and constant effect of shall-issue laws, and (2) similar effects across different states and counties.” (Webster, “Claims,” p. 2; see also Dan Black and Daniel Nagin, “Do ‘Right-to-Carry’ Laws Deter Violent Crime?” Journal of Legal Studies 27 [January 1998], p. 213.) One of the central arguments both in the original paper and in this book is that the size of the deterrent effect is related to the number of permits issued, and it takes many years before states reach their long-run level of permits. … A major reason for the larger effect on crime in the more urban counties was that in rural areas, permit requests already were being approved; hence it was in urban areas that the number of permitted concealed handguns increased the most. A week later, in response to a column that I published in the Omaha World-Herald, Mr. Webster modified this claim somewhat: Lott claims that his analysis did not assume an immediate and constant effect, but that is contrary to his published article, in which the vast majority of the statistical models assume such an effect. (Daniel W. Webster, “Concealed-Gun Research Flawed,” Omaha World-Herald, March 12, 1997; emphasis added.) 
When one does research, it is most appropriate to take the simplest specifications first and then gradually make things more complicated. The simplest way of doing this is to examine the mean crime rates before and after the change in a law. … While he includes a chapter that contains replies to his critics, unfortunately he doesn’t directly respond to the key Black and Nagin finding that formal statistical tests reject his methods. The closest he gets to addressing this point is to acknowledge “the more serious possibility is that some other factor may have caused both the reduction in crime rates and the passage of the law to occur at the same time,” but then goes on to say that he has “presented over a thousand [statistical model] specifications” that reveal “an extremely consistent pattern” that right-to-carry laws reduce crime. Another view would be that a thousand versions of a demonstrably invalid analytical approach produce boxes full of invalid results. (Jens Ludwig, “Guns and Numbers,” Washington Monthly, June 1998, p. 51) We applied a number of specification tests suggested by James J. Heckman and V. Joseph Hotz. 
pages: 541 words: 109,698 
Mining the Social Web: Finding Needles in the Social Haystack by Matthew A. Russell Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Climategate, cloud computing, crowdsourcing, en.wikipedia.org, fault tolerance, Firefox, full text search, Georg Cantor, Google Earth, information retrieval, Mark Zuckerberg, natural language processing, NPcomplete, profit motive, Saturday Night Live, semantic web, Silicon Valley, slashdot, social graph, social web, statistical model, Steve Jobs, supplychain management, text mining, traveling salesman, Turing test, web application Substituting various values into the precision and recall formulas is straightforward and a worthwhile exercise if this is your first time encountering these terms. For example, what would the precision, recall, and F1 score have been if your algorithm had identified “Mr. Green”, “Colonel”, “Mustard”, and “candlestick”? As somewhat of an aside, you might find it interesting to know that many of the most compelling technology stacks used by commercial businesses in the NLP space use advanced statistical models to process natural language according to supervised learning algorithms. A supervised learning algorithm is essentially an approach in which you provide training samples of the form [(input1, output1), (input2, output2), ..., (inputN, outputN)] to a model such that the model is able to predict the tuples with reasonable accuracy. The tricky part is ensuring that the trained model generalizes well to inputs that have not yet been encountered. 
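The precision, recall, and F1 exercise posed in the excerpt above can be tried out with a short Python sketch. The excerpt does not include the book's answer key, so the `actual_entities` set below is a hypothetical ground truth chosen only for illustration, and the function name `precision_recall_f1` is likewise an assumption rather than anything from the book.

```python
def precision_recall_f1(predicted, actual):
    """Score a set of predicted items against a set of true items."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: share of predictions that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of true items that were found.
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical ground truth (assumed for illustration, not taken from the book).
actual_entities = {"Mr. Green", "Colonel Mustard", "candlestick"}
# The four items named in the excerpt's question.
predicted_entities = {"Mr. Green", "Colonel", "Mustard", "candlestick"}

precision, recall, f1 = precision_recall_f1(predicted_entities, actual_entities)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under this assumed ground truth, splitting "Colonel Mustard" into two tokens costs both precision (two of four predictions are wrong) and recall (one of three true entities is missed).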
… SocialGraph Node Mapper, Brief analysis of breadthfirst techniques sorting, Sensible Sorting, Sorting Documents by Value documents by value, Sorting Documents by Value documents in CouchDB, Sensible Sorting split method, using to tokenize text, Data Hacking with NLTK, Before You Go Off and Try to Build a Search Engine… spreadsheets, visualizing Facebook network data, Visualizing with spreadsheets (the oldfashioned way) statistical models processing natural language, Quality of Analytics stemming verbs, Querying Buzz Data with TFIDF stopwords, Data Hacking with NLTK, Analysis of Luhn’s Summarization Algorithm downloading NLTK stopword data, Data Hacking with NLTK filtering out before document summarization, Analysis of Luhn’s Summarization Algorithm streaming API (Twitter), Analyzing Tweets (One Entity at a Time) Strong Links API, The Infochimps “Strong Links” API, Interactive 3D Graph Visualization student’s tscore, How the Collocation Sausage Is Made: Contingency Tables and Scoring Functions subjectverbobject triples, EntityCentric Analysis: A Deeper Understanding of the Data, Man Cannot Live on Facts Alone summarizing documents, Summarizing Documents, Analysis of Luhn’s Summarization Algorithm, Summarizing Documents, Analysis of Luhn’s Summarization Algorithm analysis of Luhn’s algorithm, Analysis of Luhn’s Summarization Algorithm Tim O’Reilly Radar blog post (example), Summarizing Documents summingReducer function, Frequency by date/time range, What entities are in Tim’s tweets? 
pages: 302 words: 82,233 
Beautiful security by Andy Oram, John Viega Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, Amazon Web Services, business intelligence, business process, call centre, cloud computing, corporate governance, credit crunch, crowdsourcing, defense in depth, en.wikipedia.org, fault tolerance, Firefox, loose coupling, market design, Monroe Doctrine, new economy, Nicholas Carr, Nick Leeson, Norbert Wiener, optical character recognition, packet switching, performance metric, pirate software, Search for Extraterrestrial Intelligence, security theater, SETI@home, Silicon Valley, Skype, software as a service, statistical model, Steven Levy, The Wisdom of Crowds, Upton Sinclair, web application, web of trust, x509 certificate, zero day, Zimmermann PGP Ashenfelter is a statistician at Princeton who loves wine but is perplexed by the pomp and circumstance around valuing and rating wine in much the same way I am perplexed by the pomp and circumstance surrounding risk management today. In the 1980s, wine critics dominated the market with predictions based on their own reputations, palate, and frankly very little more. Ashenfelter, in contrast, studied the Bordeaux region of France and developed a statistical model about the quality of wine. His model was based on the average rainfall in the winter before the growing season (the rain that makes the grapes plump) and the average sunshine during the growing season (the rays that make the grapes ripe), resulting in a simple formula: quality = 12.145 + (0.00117 * winter rainfall) + (0.0614 * average growing season temperature) - (0.00386 * harvest rainfall) Of course he was chastised and lampooned by the stuffy wine critics who dominated the industry, but after several years of producing valuable results, his methods are now widely accepted as providing important valuation criteria for wine. … I hope that when I look back on this text and my blog in years to come, I’ll cringe at their resemblance to the cocktail-mixing house robots from movies of the 1970s. 
I believe the right elements are really coming together where technology can create better technology. Advances in technology have been used to both arm and disarm the planet, to empower and oppress populations, and to attack and defend the global community and all it will have become. The areas I’ve pulled together in this chapter (business process management, number crunching and statistical modeling, visualization, and long-tail technology) provide fertile ground for security management systems in the future that archive today’s best efforts in the annals of history. At least I hope so, for I hate mediocrity with a passion and I think security management systems today are mediocre at best! Acknowledgments This chapter is dedicated to my mother, Margaret Curphey, who passed away after an epileptic fit in 2004 at her house in the south of France. 
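Ashenfelter's wine formula, quoted earlier in this excerpt, is simple enough to evaluate directly. The Python sketch below uses the coefficients from the excerpt and subtracts the harvest-rainfall term, as in Ashenfelter's published equation. The excerpt does not state units, and the input values are hypothetical numbers picked only to show the arithmetic; the function name `wine_quality` is also an assumption.

```python
def wine_quality(winter_rainfall, growing_season_temp, harvest_rainfall):
    """Evaluate the quality formula quoted in the excerpt.

    Units are not given in the excerpt; callers must use whatever units
    the coefficients were estimated with.
    """
    return (12.145
            + 0.00117 * winter_rainfall
            + 0.0614 * growing_season_temp
            - 0.00386 * harvest_rainfall)


# Hypothetical inputs, chosen only for illustration.
quality = wine_quality(winter_rainfall=600,
                       growing_season_temp=17,
                       harvest_rainfall=100)
print(f"predicted quality index: {quality:.4f}")
```

The signs carry the story told in the excerpt: a wet winter and a warm growing season raise the prediction, while rain at harvest lowers it.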
pages: 404 words: 43,442 
The Art of R Programming by Norman Matloff Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Debian, discrete time, generalpurpose programming language, linked data, sorting algorithm, statistical model The latter again stems from vectorization, a benefit discussed in detail in Chapter 14. This approach is used in the loop beginning at line 53. (Arguably, in this case, the increase in speed comes at the expense of readability of the code.) 9.1.7 Extended Example: A Procedure for Polynomial Regression As another example, consider a statistical regression setting with one predictor variable. Since any statistical model is merely an approximation, in principle, you can get better and better models by fitting polynomials of higher and higher degrees. However, at some point, this becomes overfitting, so that the prediction of new, future data actually deteriorates for degrees higher than some value. The class "polyreg" aims to deal with this issue. It fits polynomials of various degrees but assesses fits via cross-validation to reduce the risk of overfitting. … We’ll create a function called extractpums() to read in a PUMS file and create a data frame from its Person records. The user specifies the filename and lists fields to extract and names to assign to those fields. We also want to retain the household serial number. This is good to have because data for persons in the same household may be correlated and we may want to add that aspect to our statistical model. Also, the household data may provide important covariates. (In the latter case, we would want to retain the covariate data as well.) Before looking at the function code, let’s see what the function does. In this data set, gender is in column 23 and age in columns 25 and 26. In the example, our filename is pumsa. The following call creates a data frame consisting of those two variables. pumsdf <- extractpums("pumsa",list(Gender=c(23,23),Age=c(25,26))) Note that we are stating here the names we want the columns to have in the resulting data frame. 
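The polynomial-regression passage in this excerpt describes fitting several degrees and judging each by cross-validation. The sketch below illustrates that idea in Python rather than R, and it is not the book's "polyreg" code. It assumes leave-one-out cross-validation (the excerpt says only "crossvalidation"), solves the least-squares problem with plain normal equations, and runs on synthetic data from a made-up quadratic; all function names are hypothetical.

```python
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) c = A^T y."""
    n = degree + 1
    # Normal-equation matrix and right-hand side for the basis 1, x, x^2, ...
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, n):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, n):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]
    # Back substitution; coeffs[i] multiplies x**i.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(ata[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aty[i] - tail) / ata[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def loocv_error(xs, ys, degree):
    """Mean squared prediction error under leave-one-out cross-validation."""
    total = 0.0
    for i in range(len(xs)):
        # Fit on every point except the i-th, then predict the held-out point.
        coeffs = fit_polynomial(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], degree)
        total += (ys[i] - predict(coeffs, xs[i])) ** 2
    return total / len(xs)


# Synthetic data (assumed for illustration): a quadratic plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(20)]
ys = [1 + 2 * x - 0.5 * x ** 2 + rng.gauss(0, 0.3) for x in xs]

cv_errors = {degree: loocv_error(xs, ys, degree) for degree in range(1, 6)}
best_degree = min(cv_errors, key=cv_errors.get)
for degree, err in cv_errors.items():
    print(f"degree {degree}: cross-validated MSE = {err:.4f}")
print("degree with the lowest cross-validated error:", best_degree)
```

Because each error is measured on a point the fit never saw, adding degrees stops paying off once the extra terms start chasing noise, which is the overfitting the excerpt warns about. Normal equations become ill-conditioned at high degrees, so a production version would more likely use a QR-based solver.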
pages: 566 words: 155,428 
After the Music Stopped: The Financial Crisis, the Response, and the Work Ahead by Alan S. Blinder Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Affordable Care Act / Obamacare, assetbacked security, bank run, banking crisis, banks create money, Carmen Reinhart, central bank independence, collapse of Lehman Brothers, collateralized debt obligation, conceptual framework, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, Detroit bankruptcy, diversification, double entry bookkeeping, eurozone crisis, facts on the ground, financial innovation, fixed income, friendly fire, full employment, hiring and firing, housing crisis, Hyman Minsky, illegal immigration, inflation targeting, interest rate swap, Isaac Newton, Kenneth Rogoff, liquidity trap, London Interbank Offered Rate, Long Term Capital Management, market bubble, market clearing, market fundamentalism, McMansion, moral hazard, naked short selling, new economy, Nick Leeson, Northern Rock, Occupy movement, offshore financial centre, price mechanism, quantitative easing, Ralph Waldo Emerson, Robert Shiller, Robert Shiller, Ronald Reagan, shareholder value, short selling, South Sea Bubble, statistical model, the payments system, time value of money, too big to fail, workingage population, yield curve, Yogi Berra As we will see later, these tests were phenomenally successful.* And there was more. To date, there have been precious few studies of the broader effects of this grab bag of financialmarket policies. The only one I know of that even attempts to estimate the macroeconomic impacts of the entire potpourri was published in July 2010 by Mark Zandi and me. Our methodology was pretty simple—and very standard. Take a statistical model of the U.S. economy—we used the Moody’s Analytics model—and simulate it both with and without the policies. The differences between the two simulations are then estimates of the effects of the policies. These estimates, of course, are only as good as the model, but ours were huge. 
By 2011, we estimated, real GDP was about 6 percent higher, the unemployment rate was nearly 3 percentage points lower, and 4.8 million more Americans were employed because of the financial-market policies (as compared with sticking with laissez-faire). … The standard analysis of conventional monetary policy (what we teach in textbooks and what central bankers are raised on) is predicated, roughly speaking, on constant risk spreads. When the Federal Reserve lowers riskless interest rates, like those on federal funds and T-bills, riskier interest rates, like those on corporate lending and auto loans, are supposed to follow suit.* The history on which we economists base our statistical models looks like that. Figure 9.1 shows the behavior of the interest rates on 10-year Treasuries (the lower line) and Moody’s Baa corporate bonds (the upper line) over the period from January 1980 through June 2007, just before the crisis got started. The spread between these two rates is the vertical distance between the two lines, and the fact that they look roughly parallel means that the spread did not change much over those twenty-seven years. 
pages: 480 words: 138,041 
The Book of Woe: The DSM and the Unmaking of Psychiatry by Gary Greenberg Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, Asperger Syndrome, backtotheland, David Brooks, impulse control, invisible hand, Isaac Newton, John Snow's cholera map, late capitalism, Louis Pasteur, McMansion, meta analysis, metaanalysis, neurotypical, phenotype, placebo effect, random walk, statistical model, theory of mind, Winter of Discontent If the DSM is not the map of an actual world against whose contours any changes can be validated, then opening up old arguments, or inviting new ones, might only sow dissension and reap chaos—and annoy Frances in the bargain. If he was going to revise the DSM, Frances told Pincus, then his goal would be stabilizing the system rather than trying to perfect it—or, as he put it to me, “loving the pet, even if it is a mutt5.” Frances thought there was a way to protect the system from both instability and pontificating: metaanalysis, a statistical method that, thanks to advances in computer technology and statistical modeling, had recently allowed statisticians to compile results from large numbers of studies by combining disparate data into common terms. The result was a statistical synthesis by which many different research projects could be treated as one large study. “We needed something that would leave it up to the tables rather than the people,” he told me, and metaanalysis was perfect for the job. “The idea was you would have to present evidence in tabular form that would be so convincing it would jump up and grab people by the throats.” … There’s a lot of information they”—I think she meant the APA, not the National Transportation Safety Board—“can look at, but it’s not a matter of analyzing the data to find out exactly what’s wrong.” Kraemer seemed to be saying that the point wasn’t to sift through the wreckage and try to prevent another catastrophe but, evidently, to crash the plane and then announce that the destruction could have been a lot worse. To be honest, however, I wasn’t sure. 
She was not making all that much sense, or maybe I just didn’t grasp the complexities of statistical modeling. And besides, I was distracted by a memory of something Steve Hyman once wrote. Fixing the DSM, finding another paradigm, getting away from its reifications—this, he said, was like “repairing a plane while it is flying.” It was a suggestive analogy, I thought at the time, one that recognized the near impossibility of the task even as it indicated its high stakes—and the necessity of keeping the mechanics from swearing and banging too loudly, lest the passengers start asking for a quick landing and a voucher on another airline. 
pages: 504 words: 139,137 
Efficiently Inefficient: How Smart Money Invests and Market Prices Are Determined by Lasse Heje Pedersen Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
algorithmic trading, Andrei Shleifer, asset allocation, backtesting, bank run, banking crisis, barriers to entry, BlackScholes formula, Brownian motion, buy low sell high, capital asset pricing model, commodity trading advisor, conceptual framework, corporate governance, credit crunch, Credit Default Swap, currency peg, David Ricardo: comparative advantage, declining real wages, discounted cash flows, diversification, diversified portfolio, Emanuel Derman, equity premium, Eugene Fama: efficient market hypothesis, fixed income, Flash crash, floating exchange rates, frictionless, frictionless market, Gordon Gekko, implied volatility, index arbitrage, index fund, interest rate swap, late capitalism, law of one price, Long Term Capital Management, margin call, market clearing, market design, market friction, merger arbitrage, mortgage debt, New Journalism, paper trading, passive investing, price discovery process, price stability, purchasing power parity, quantitative easing, quantitative trading / quantitative ﬁnance, random walk, Renaissance Technologies, Richard Thaler, riskadjusted returns, risk/return, Robert Shiller, Robert Shiller, shareholder value, Sharpe ratio, short selling, sovereign wealth fund, statistical arbitrage, statistical model, systematic trading, technology bubble, time value of money, total factor productivity, transaction costs, value at risk, Vanguard fund, yield curve, zerocoupon bond However, volatility is not an appropriate measure of risk for strategies with an extreme crash risk. For instance, volatility does not capture well the risk of selling outthemoney options, a strategy with small positive returns on most days but infrequent large crashes. To compute the volatility of a large portfolio, hedge funds need to account for correlations across assets, which can be accomplished by simulating the overall portfolio or by using a statistical model such as a factor model. 
Another measure of risk is value-at-risk (VaR), which attempts to capture tail risk (non-normality). The VaR measures the maximum loss with a certain confidence, as seen in figure 4.1 below. For example, the VaR is the most that you can lose with a 95% or 99% confidence. For instance, a hedge fund has a one-day 95% VaR of $10 million if Pr(loss ≤ $10 million) = 95%. A simple way to estimate VaR is to line up past returns, sort them by magnitude, and find a return that has 5% worse days and 95% better days. … Intermediaries are always worried that the flows will continue against them. That part is invisible to them. The market demand might evolve as a wave builds up. The intermediary makes money when the wave subsides. Then the flows and equilibrium pricing are in the same direction. LHP: Or you might even short at a nickel cheap? MS: You might. Trend following is based on understanding macro developments and what governments are doing. Or they are based on statistical models of price movements. A positive up price tends to result in a positive up price. Here, however, it is not possible to determine whether the trend will continue. LHP: Why do spreads tend to widen during some periods of stress? MS: Well, capital becomes more scarce, both physical capital and human capital, in the sense that there isn’t enough time for intermediaries to understand what is happening in chaotic times. 
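The "simple way to estimate VaR" described at the start of this excerpt (sort past returns and read off the one with 5% of days worse) can be sketched in a few lines of Python. The return series below is synthetic and evenly spaced purely for illustration, and the function name `historical_var` and its index convention are assumptions; practitioners differ on exactly which order statistic to take.

```python
def historical_var(returns, confidence=0.95):
    """Historical VaR: the loss at the return that has about
    (1 - confidence) of past days worse than it."""
    ordered = sorted(returns)  # worst return first
    # Number of days allowed to be worse than the cutoff; round() guards
    # against floating-point error in 1 - confidence.
    n_worse = int(round(len(ordered) * (1 - confidence), 9))
    cutoff_return = ordered[n_worse]
    return -cutoff_return  # report VaR as a positive loss


# Synthetic daily returns from -5.0% to +4.9% (assumed, not market data).
returns = [(i - 50) / 1000 for i in range(100)]

var_95 = historical_var(returns, confidence=0.95)
var_99 = historical_var(returns, confidence=0.99)
print(f"one-day 95% VaR: {var_95:.1%} of portfolio value")
print(f"one-day 99% VaR: {var_99:.1%} of portfolio value")
```

Multiplying the result by the portfolio's value converts it into a dollar figure like the $10 million in the excerpt. The estimate says nothing about how large losses are beyond the cutoff, which is one reason the excerpt pairs VaR with other risk measures.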
pages: 444 words: 138,781 
Evicted: Poverty and Profit in the American City by Matthew Desmond Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
affirmative action, Cass Sunstein, crack epidemic, Credit Default Swap, deindustrialization, desegregation, dumpster diving, ending welfare as we know it, ghettoisation, glass ceiling, housing crisis, informal economy, Jane Jacobs, late fees, New Urbanism, payday loans, price discrimination, profit motive, rent control, statistical model, superstar cities, The Chicago School, The Death and Life of Great American Cities, thinkpad, upwardly mobile, working poor, young professional With Jonathan Mijs, I combined all eviction court records between January 17 and February 26, 2011 (the Milwaukee Eviction Court Study period) with information about aspects of tenants’ neighborhoods, procured after geocoding the addresses that appeared in the eviction records. Working with the Harvard Center for Geographic Analysis, I also calculated the distance (in drive miles and time) between tenants’ addresses and the courthouse. Then I constructed a statistical model that attempted to explain the likelihood of a tenant appearing in court based on aspects of that tenant’s case and her or his neighborhood. The model generated only null findings. How much a tenant owed a landlord, her commute time to the courthouse, her gender—none of these factors were significantly related to appearing in court. I also investigated whether several aspects of a tenant’s neighborhood—e.g., its eviction, poverty, and crime rates—mattered when it came to explaining defaults. … In those where children made up at least 40 percent of the population, 1 household in every 12 was. All else equal, a 1 percent increase in the percentage of children in a neighborhood is predicted to increase a neighborhood’s evictions by almost 7 percent. These estimates are based on courtordered eviction records that took place in Milwaukee County between January 1, 2010, and December 31, 2010. 
The statistical model evaluating the association between a neighborhood’s percentage of children and its number of evictions is a zero-inflated Poisson regression, which is described in detail in Matthew Desmond et al., “Evicting Children,” Social Forces 92 (2013): 303–27. 3. That misery could stick around. At least two years after their eviction, mothers like Arleen still experienced significantly higher rates of depression than their peers. 
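For readers unfamiliar with the zero-inflated Poisson model named in the note above, the Python sketch below shows its probability mass function: a count is either a "structural" zero with some probability, or an ordinary Poisson draw otherwise. This illustrates only the distribution, not the regression fitted in the cited study, and the parameter values and the name `zip_pmf` are hypothetical.

```python
import math


def zip_pmf(k, lam, pi_zero):
    """P(count = k) under a zero-inflated Poisson distribution.

    lam     -- mean of the ordinary Poisson component
    pi_zero -- probability of a structural (always-zero) observation
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the structural part and the Poisson part.
        return pi_zero + (1 - pi_zero) * poisson
    return (1 - pi_zero) * poisson


# Hypothetical parameters, chosen only for illustration.
lam, pi_zero = 2.0, 0.3
for k in range(5):
    print(f"P(count = {k}) = {zip_pmf(k, lam, pi_zero):.4f}")
```

The extra mass at zero is what makes this family a natural candidate for counts such as evictions per neighborhood, where many units may record none at all. In a regression setting, `lam` and `pi_zero` would each depend on covariates.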
pages: 624 words: 127,987 
The Personal MBA: A WorldClass Business Education in a Single Volume by Josh Kaufman Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, Atul Gawande, Black Swan, business process, buy low sell high, capital asset pricing model, Checklist Manifesto, cognitive bias, correlation does not imply causation, Credit Default Swap, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, Dean Kamen, delayed gratification, discounted cash flows, double entry bookkeeping, Douglas Hofstadter, en.wikipedia.org, Frederick Winslow Taylor, Gödel, Escher, Bach, high net worth, hindsight bias, index card, inventory management, iterative process, job satisfaction, Johann Wolfgang von Goethe, Kevin Kelly, Lao Tzu, loose coupling, loss aversion, market bubble, Network effects, Parkinson's law, Paul Buchheit, Paul Graham, placemaking, premature optimization, Ralph Waldo Emerson, rent control, side project, statistical model, stealth mode startup, Steve Jobs, Steve Wozniak, subscription business, telemarketer, the scientific method, time value of money, Toyota Production System, tulip mania, Upton Sinclair, Walter Mischel, Y Combinator, Yogi Berra The primary question is not whether attending a university is a positive experience: it’s whether or not the experience is worth the cost.9 2. MBA programs teach many worthless, outdated, even outright damaging concepts and practices—assuming your goal is to actually build a successful business and increase your net worth. Many of my MBAHOLDING readers and clients come to me after spending tens (sometimes hundreds) of thousands of dollars learning the ins and outs of complex financial formulas and statistical models, only to realize that their MBA program didn’t teach them how to start or improve a real, operating business. That’s a problem—graduating from business school does not guarantee having a useful working knowledge of business when you’re done, which is what you actually need to be successful. 3. MBA programs won’t guarantee you a highpaying job, let alone make you a skilled manager or leader with a shot at the executive suite. 
… Over time, managers and executives began using statistics and analysis to forecast the future, relying on databases and spreadsheets in much the same way ancient seers relied on tea leaves and goat entrails. The world itself is no less unpredictable or uncertain: as in the olden days, the signs only “prove” the biases and desires of the soothsayer. The complexity of financial transactions and the statistical models those transactions relied upon continued to grow until few practitioners fully understood how they worked or respected their limits. As Wired revealed in a February 2009 article, “Recipe for Disaster: The Formula That Killed Wall Street,” the inherent limitations of deified financial formulas such as the Black-Scholes option pricing model, the Gaussian copula function, and the capital asset pricing model (CAPM) played a major role in the tech bubble of 2000 and the housing market and derivatives shenanigans behind the 2008 recession. 
pages: 199 words: 47,154 
Gnuplot Cookbook by Lee Phillips Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
bioinformatics, computer vision, generalpurpose programming language, pattern recognition, statistical model, web application These new features include the use of Unicode characters, transparency, new graph positioning commands, plotting objects, internationalization, circle plots, interactive HTML5 canvas plotting, iteration in scripts, lua/tikz/LaTeX integration, cairo and SVG terminal drivers, and volatile data. What this book covers Chapter 1, Plotting Curves, Boxes, Points, and more, covers the basic usage of Gnuplot: how to make all kinds of 2D plots for statistics, modeling, finance, science, and more. Chapter 2, Annotating with Labels and Legends, explains how to add labels, arrows, and mathematical text to our plots. Chapter 3, Applying Colors and Styles, covers the basics of colors and styles in gnuplot, plus transparency, and plotting with points and objects. Chapter 4, Controlling Your Tics, will show you how to get your tic marks and labels just right, along with gnuplot's new internationalization features. 
pages: 186 words: 49,251 
The Automatic Customer: Creating a Subscription Business in Any Industry by John Warrillow Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Airbnb, airport security, Amazon Web Services, asset allocation, barriers to entry, call centre, cloud computing, discounted cash flows, high net worth, Jeff Bezos, Network effects, passive income, rolodex, sharing economy, side project, Silicon Valley, Silicon Valley startup, software as a service, statistical model, Steve Jobs, Stewart Brand, subscription business, telemarketer, time value of money, Zipcar But your true return is much greater because you have had $1,200 of your customer’s money—interest free—to invest in your business. You have taken on a risk in guaranteeing your customer’s roof replacement and need to be paid for placing that bet. The repair job could have cost you $3,000, and then you would have taken an underwriting loss of $1,800 ($1,200−$3,000). Calculating your risk is the primary challenge of running a peaceofmind model company. Big insurance companies employ an army of actuaries who use statistical models to predict the likelihood of a claim being made. You don’t need to be quite so scientific. Instead, start by looking back at the last 20 roofs you’ve installed with a guarantee and figure out how many service calls you needed to make. That will give you a pretty good idea of the possible risk of offering a peaceofmind subscription. Assuming you’re not an actuary and you didn’t get your doctorate in math from MIT, it’s probably a wise idea to go slow in leveraging the peaceofmind subscription model. 
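The underwriting arithmetic in the excerpt above, and its suggestion to look back at the last 20 guaranteed roofs, can be laid out as a small Python sketch. The $1,200 subscription income and $3,000 repair cost come from the excerpt; the count of two service calls in 20 roofs is a hypothetical figure, as are the function names.

```python
def underwriting_result(premium, claim_cost):
    """Profit (positive) or underwriting loss (negative) on one customer."""
    return premium - claim_cost


def expected_claim_cost(service_calls, roofs_installed, cost_per_call):
    """Average claim cost per roof implied by past service history."""
    claim_rate = service_calls / roofs_installed
    return claim_rate * cost_per_call


# Figures from the excerpt: $1,200 collected, a $3,000 repair.
worst_case = underwriting_result(premium=1200, claim_cost=3000)
print(f"underwriting result on a roof that fails: {worst_case}")

# Hypothetical history: 2 service calls across the last 20 guaranteed roofs.
expected_cost = expected_claim_cost(service_calls=2, roofs_installed=20,
                                    cost_per_call=3000)
expected_margin = underwriting_result(premium=1200, claim_cost=expected_cost)
print(f"expected claim cost per roof: {expected_cost:.0f}")
print(f"expected margin per subscriber: {expected_margin:.0f}")
```

With only 20 past jobs the claim rate is a rough estimate, which fits the excerpt's advice to go slow before leaning heavily on this kind of subscription.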
pages: 133 words: 42,254 
Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supplychain management, Watson beat the top human players on Jeopardy!, web application Much like the data themselves, the team should not be static in nature and should be able to evolve and adapt to the needs of the business. CHALLENGES REMAIN Locating the right talent to analyze data is the biggest hurdle in building a team. Such talent is in high demand, and the need for data analysts and data scientists continues to grow at an almost exponential rate. Finding this talent means that organizations will have to focus on data science and hire statistical modelers and text data–mining professionals as well as people who specialize in sentiment analysis. Success with Big Data analytics requires solid data models, statistical predictive models, and test analytic models, since these will be the core applications needed to do Big Data. Locating the appropriate talent takes more than just a typical IT job placement; the skills required for a good return on investment are not simple and are not solely technology oriented. 
pages: 219 words: 63,495 
50 Future Ideas You Really Need to Know by Richard Watson Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
23andMe, 3D printing, access to a mobile phone, Albert Einstein, artificial general intelligence, augmented reality, autonomous vehicles, BRICs, Buckminster Fuller, call centre, clean water, cloud computing, collaborative consumption, computer age, computer vision, crowdsourcing, dark matter, dematerialisation, digital Maoism, Elon Musk, energy security, failed state, future of work, Geoffrey West, Santa Fe Institute, germ theory of disease, happiness index / gross national happiness, hive mind, hydrogen economy, Internet of things, Jaron Lanier, life extension, Marshall McLuhan, megacity, natural language processing, Network effects, new economy, oil shale / tar sands, pattern recognition, peak oil, personalized medicine, phenotype, precision agriculture, profit maximization, RAND corporation, Ray Kurzweil, RFID, Richard Florida, Search for Extraterrestrial Intelligence, selfdriving car, semantic web, Skype, smart cities, smart meter, smart transportation, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, supervolcano, telepresence, The Wisdom of Crowds, Thomas Malthus, Turing test, urban decay, Vernor Vinge, Watson beat the top human players on Jeopardy!, web application, women in the workforce, workingage population, young professional Link all this to new imaging technologies, remote monitoring, medical smartcards, erecords and even gamification. One day, we may, for example, develop a tiny chip that can hold the full medical history of a person including any medical conditions, allergies, prescriptions and contact information (this is already planned in America). Digital vacuums Digital vacuuming refers to the practice of scooping up vast amounts of data then using mathematical and statistical models to determine content and possible linkages. 
The data itself can be anything from phone calls in historical or real time (the US company AT&T, for example, holds the records of 1.9 trillion telephone calls) to financial transactions, emails and Internet site visits. Commercial applications could include future health risks to counterterrorism. The card could feature a picture ID and hours of video content, such as X-rays or moving medical imagery. 
pages: 222 words: 53,317 
Overcomplicated: Technology at the Limits of Comprehension by Samuel Arbesman Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
3D printing, algorithmic trading, Anton Chekhov, Apple II, Benoit Mandelbrot, citation needed, combinatorial explosion, Danny Hillis, David Brooks, discovery of the americas, en.wikipedia.org, Erik Brynjolfsson, Flash crash, friendly AI, game design, Google X / Alphabet X, Googley, HyperCard, Inbox Zero, Isaac Newton, iterative process, Kevin Kelly, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mandelbrot fractal, Minecraft, Netflix Prize, Nicholas Carr, Parkinson's law, Ray Kurzweil, recommendation engine, Richard Feynman, Richard Feynman, Richard Feynman: Challenger Oring, Second Machine Age, selfdriving car, software studies, statistical model, Steve Jobs, Steve Wozniak, Steven Pinker, Stewart Brand, superintelligent machines, Therac25, Tyler Cowen: Great Stagnation, urban planning, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, Y2K What techniques are used by experts: Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (Oxford, UK: Oxford University Press, 2014), 15. say, 99.9 percent of the time: I made these numbers up for effect, but if any linguist wants to chat, please reach out! “based on millions of specific features”: Alon Halevy et al., “The Unreasonable Effectiveness of Data,” IEEE Intelligent Systems 24, no. 2 (2009): 8–12. In some ways, these statistical models are actually simpler than those that start from seemingly more elegant rules, because the latter end up being complicated by exceptions. sophisticated machine learning techniques: See Douglas Heaven, “Higher State of Mind,” New Scientist 219 (August 10, 2013), 32–35, available online (under the title “Not Like Us: Artificial Minds We Can’t Understand”): http://complex.elte.hu/~csabai/simulationLab/AI_08_August_2013_New_Scientist.pdf. 

Syntactic Structures by Chomsky, Noam Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
finite state, statistical model We shall see, in fact, in § 7, that there are deep structural reasons for distinguishing (3) and (4) from (5) and (6); but before we are able to find an explanation for such facts as these we shall have to carry the theory of syntactic structure a good deal beyond its familiar limits. 2.4 Third, the notion "grammatical in English" cannot be identified in any way with the notion "high order of statistical approximation to English." It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally 'remote' from English. Yet (1), though nonsensical, is grammatical, while (2) is not. Presented with these sentences, a speaker of English will read (1) with a normal sentence intonation, but he will read (2) with a falling intonation on each word; in fact, with just the intonation pattern given to any sequence of unrelated words. 
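The point about both sentences being equally "remote" can be illustrated with a small sketch. It assumes an unsmoothed bigram count model as a stand-in for a "statistical model for grammaticalness", a tiny hypothetical corpus, and the well-known pair of example sentences usually associated with (1) and (2), which the excerpt itself does not quote. Under those assumptions, neither sentence shares a single word sequence with the corpus, so both receive probability zero and the model cannot tell them apart.

```python
from collections import Counter

def train_bigrams(corpus):
    """Count context words and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams

def sentence_probability(sentence, contexts, bigrams):
    """Unsmoothed maximum-likelihood bigram probability of a sentence."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if contexts[prev] == 0:                 # unseen context: no estimate possible
            return 0.0
        prob *= bigrams[(prev, word)] / contexts[prev]
    return prob

# Hypothetical toy corpus, chosen only for illustration.
corpus = ["the dog sleeps", "the cat sleeps", "the dog barks"]
contexts, bigrams = train_bigrams(corpus)

seen = sentence_probability("the dog sleeps", contexts, bigrams)
grammatical = sentence_probability("colorless green ideas sleep furiously", contexts, bigrams)
scrambled = sentence_probability("furiously sleep ideas green colorless", contexts, bigrams)
print(seen, grammatical, scrambled)
```

Smoothed models would assign small non-zero probabilities instead of zero, so this sketch only illustrates the argument as stated for raw counts.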
pages: 291 words: 77,596 
Total Recall: How the E-Memory Revolution Will Change Everything by C. Gordon Bell, Jim Gemmell Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
airport security, Albert Einstein, book scanning, cloud computing, conceptual framework, full text search, information retrieval, invention of writing, inventory management, Isaac Newton, Menlo Park, optical character recognition, pattern recognition, performance metric, RAND corporation, RFID, semantic web, Silicon Valley, Skype, social web, statistical model, Stephen Hawking, Steve Ballmer, Ted Nelson, telepresence, Turing test, Vannevar Bush, web application Adding summarization to visualization for geolocated photos: Ahern, Shane, Mor Naaman, Rahul Nair, Jeannie Yang. “World Explorer: Visualizing Aggregate Data from Unstructured Text in Geo-Referenced Collections.” In Proceedings, Seventh ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 07), June 2007. The Stuff I’ve Seen project did some experiments that showed how displaying milestones alongside a timeline may help orient the user. Horvitz et al. used statistical models to infer the probability that users will consider events to be memory landmarks. Ringel, M., E. Cutrell, S. T. Dumais, and E. Horvitz. 2003. “Milestones in Time: The Value of Landmarks in Retrieving Information from Personal Stores.” Proceedings of IFIP Interact 2003. Horvitz, Eric, Susan Dumais, and Paul Koch. “Learning Predictive Models of Memory Landmarks.” CogSci 2004: 26th Annual Meeting of the Cognitive Science Society, Chicago, August 2004. 
pages: 279 words: 75,527 
Collider by Paul Halpern Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, Albert Michelson, anthropic principle, cosmic microwave background, cosmological constant, dark matter, Ernest Rutherford, Gary Taubes, gravity well, horn antenna, index card, Isaac Newton, Magellanic Cloud, pattern recognition, Richard Feynman, Richard Feynman, Ronald Reagan, Solar eclipse in 1919, statistical model, Stephen Hawking Although this could represent an escaping graviton, more likely possibilities would need to be ruled out, such as the commonplace production of neutrinos. Unfortunately, even a hermetic detector such as ATLAS can’t account for the streams of lost neutrinos that pass unhindered through almost everything in nature—except by estimating the missing momentum and assuming it is all being transferred to neutrinos. Some physicists hope that statistical models of neutrino production would eventually prove sharp enough to indicate significant differences between the expected and actual pictures. Such discrepancies could prove that gravitons fled from collisions and ducked into regions beyond. Another potential means of establishing the existence of extra dimensions would be to look for the hypothetical phenomena called Kaluza-Klein excitations (named for Klein and an earlier unification pioneer, German mathematician Theodor Kaluza). 
pages: 204 words: 67,922 
Elsewhere, U.S.A: How We Got From the Company Man, Family Dinners, and the Affluent Society to the Home Office, BlackBerry Moms, and Economic Anxiety by Dalton Conley Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
3D printing, call centre, clean water, dematerialisation, demographic transition, Edward Glaeser, extreme commuting, feminist movement, financial independence, Firefox, Frank Levy and Richard Murnane: The New Division of Labor, Home mortgage interest deduction, income inequality, informal economy, Jane Jacobs, John Maynard Keynes: Economic Possibilities for our Grandchildren, knowledge economy, knowledge worker, laborforce participation, late capitalism, low skilled workers, manufacturing employment, McMansion, mortgage tax deduction, new economy, oil shock, PageRank, Ponzi scheme, positional goods, postindustrial society, Postmaterialism, postmaterialism, principal–agent problem, recommendation engine, Richard Florida, rolodex, Ronald Reagan, Silicon Valley, Skype, statistical model, The Death and Life of Great American Cities, The Great Moderation, The Wealth of Nations by Adam Smith, Thomas Malthus, Thorstein Veblen, transaction costs, women in the workforce, Yom Kippur War And how much should Amex have paid for this privilege? Should they have gotten a discount since the first word of their brand is also the first word of American Airlines and thereby reinforces—albeit in a subtle way—the host company’s image? In order to know the value of the deal, they would have had to know how much the marketing campaign increases their business. Impossible. No focus group or statistical model will tell Amex how much worse or better their bottom line would have been in the absence of this marketing campaign. Ditto for the impact of billboards, product placement, and special promotions like airline mileage plans. There are simply too many other forces that come into play to be able to isolate the impact of a specific effort. Ditto for most of the symbolic economy. It is ironic that in this age of markets and seemingly limitless information, we can’t get the very answers we need to make rational business decisions. 
pages: 306 words: 78,893 
After the New Economy: The Binge . . . And the Hangover That Won't Go Away by Doug Henwood Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
accounting loophole / creative accounting, affirmative action, Asian financial crisis, barriers to entry, borderless world, Branko Milanovic, Bretton Woods, capital controls, corporate governance, correlation coefficient, credit crunch, deindustrialization, dematerialisation, deskilling, ending welfare as we know it, feminist movement, full employment, gender pay gap, George Gilder, glass ceiling, Gordon Gekko, greed is good, half of the world's population has never made a phone call, income inequality, indoor plumbing, Internet Archive, job satisfaction, jointstock company, Kevin Kelly, laborforce participation, liquidationism / Banker’s doctrine / the Treasury view, manufacturing employment, means of production, minimum wage unemployment, Naomi Klein, new economy, occupational segregation, pets.com, profit maximization, purchasing power parity, race to the bottom, Ralph Nader, Robert Gordon, Robert Shiller, Robert Shiller, Ronald Reagan, shareholder value, Silicon Valley, Simon Kuznets, statistical model, structural adjustment programs, Telecommunications Act of 1996, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, The Wealth of Nations by Adam Smith, total factor productivity, union organizing, War on Poverty, women in the workforce, working poor, Y2K It's also hard to reconcile with the fact that the distribution of educational attainment has long been growing less, not more, unequal. Even classic statements of this skills argument, like that of Juhn, Murphy, and Pierce (1993), find that the standard proxies for skill like years of education and years of work experience (proxies being needed because skill is nearly impossible to define or measure) only explain part of the increase in polarization—less than half, in fact. Most of the increase remains unexplained by statistical models, a remainder that is typically attributed to "unobserved" attributes. 
That is, since conventional economists believe as a matter of faith that market rates of pay are fair compensation for a worker's productive contribution, any inexplicable anomalies in pay must be the result of things a boss can see that elude the academic's model. Those of us who are not constrained by a faith in the correlation of pay and productivity, or who don't accept conventional definitions of what constitutes productive labor, will want to look elsewhere. 
pages: 225 words: 11,355 
Financial Market Meltdown: Everything You Need to Know to Understand and Survive the Global Credit Crisis by Kevin Mellyn Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
assetbacked security, bank run, banking crisis, Bernie Madoff, bonus culture, Bretton Woods, collateralized debt obligation, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, cuban missile crisis, disintermediation, diversification, fiat currency, financial deregulation, financial innovation, financial intermediation, fixed income, Francis Fukuyama: the end of history, global reserve currency, Home mortgage interest deduction, Isaac Newton, jointstock company, liquidity trap, London Interbank Offered Rate, margin call, market clearing, moral hazard, mortgage tax deduction, Northern Rock, offshore financial centre, paradox of thrift, pattern recognition, pension reform, pets.com, Plutocrats, plutocrats, Ponzi scheme, profit maximization, pushing on a string, reserve currency, risk tolerance, riskadjusted returns, road to serfdom, Ronald Reagan, shareholder value, Silicon Valley, South Sea Bubble, statistical model, The Great Moderation, the payments system, too big to fail, value at risk, very high income, War on Poverty, Y2K, yield curve Financial innovation was all about getting more credit into the hands of consumers, making more income using less capital, and turning what had been concentrated risks off the books of banks into securities that could be traded between and owned by professional investors who could be expected to look after themselves. Like much of the ‘‘progress’’ of the last century, it was a matter of replacing common sense and tradition with science. The models produced using advanced statistics and computers were designed by brilliant minds from the best universities. At the Basle Committee, which set global standards for bank regulation to be followed by all major central banks, the use of statistical models to measure risk and reliance on the rating agencies were baked into the proposed rules for capital adequacy. The whole thing blew up not because of something obvious like greed. 
It failed because of the hubris, the fatal pride, of men and women who sincerely thought that they could build computer models that were capable of predicting risk and pricing it correctly. They were wrong. 4 HOW WE GOT HERE Henry Ford famously said that history is bunk. 
pages: 274 words: 75,846 
The Filter Bubble: What the Internet Is Hiding From You by Eli Pariser Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
A Declaration of the Independence of Cyberspace, A Pattern Language, Amazon Web Services, augmented reality, backtotheland, Black Swan, borderless world, Build a better mousetrap, Cass Sunstein, citizen journalism, cloud computing, cognitive dissonance, crowdsourcing, Danny Hillis, data acquisition, disintermediation, don't be evil, Filter Bubble, Flash crash, fundamental attribution error, global village, Haight Ashbury, Internet of things, Isaac Newton, Jaron Lanier, Jeff Bezos, jimmy wales, Kevin Kelly, knowledge worker, Mark Zuckerberg, Marshall McLuhan, megacity, Netflix Prize, new economy, PageRank, paypal mafia, Peter Thiel, recommendation engine, RFID, sentiment analysis, shareholder value, Silicon Valley, Silicon Valley startup, social graph, social software, social web, speech recognition, Startup school, statistical model, stem cell, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, the scientific method, urban planning, Whole Earth Catalog, WikiLeaks, Y Combinator The best way to avoid overfitting, as Popper suggests, is to try to prove the model wrong and to build algorithms that give the benefit of the doubt. If Netflix shows me a romantic comedy and I like it, it’ll show me another one and begin to think of me as a romanticcomedy lover. But if it wants to get a good picture of who I really am, it should be constantly testing the hypothesis by showing me Blade Runner in an attempt to prove it wrong. Otherwise, I end up caught in a local maximum populated by Hugh Grant and Julia Roberts. The statistical models that make up the filter bubble write off the outliers. But in human life it’s the outliers who make things interesting and give us inspiration. And it’s the outliers who are the first signs of change. 
One of the best critiques of algorithmic prediction comes, remarkably, from the late-nineteenth-century Russian novelist Fyodor Dostoyevsky, whose Notes from Underground was a passionate critique of the utopian scientific rationalism of the day. 
pages: 322 words: 77,341 
I.O.U.: Why Everyone Owes Everyone and No One Can Pay by John Lanchester Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
assetbacked security, bank run, banking crisis, Berlin Wall, Bernie Madoff, Big bang: deregulation of the City of London, BlackScholes formula, Celtic Tiger, collateralized debt obligation, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, diversified portfolio, double entry bookkeeping, Exxon Valdez, Fall of the Berlin Wall, financial deregulation, financial innovation, fixed income, George Akerlof, greed is good, hindsight bias, housing crisis, Hyman Minsky, interest rate swap, invisible hand, Jane Jacobs, John Maynard Keynes: Economic Possibilities for our Grandchildren, laissezfaire capitalism, liquidity trap, Long Term Capital Management, loss aversion, Martin Wolf, mortgage debt, mortgage tax deduction, mutually assured destruction, new economy, Nick Leeson, Northern Rock, Own Your Own Home, Ponzi scheme, quantitative easing, reserve currency, riskadjusted returns, Robert Shiller, Robert Shiller, Ronald Reagan, shareholder value, South Sea Bubble, statistical model, The Great Moderation, the payments system, too big to fail, tulip mania, value at risk The 1998 default was a 7-sigma event. That means it should statistically have happened only once every 3 billion years. And it wasn’t the only one. The last decades have seen numerous 5-, 6-, and 7-sigma events. Those are supposed to happen, respectively, one day in every 13,932 years, one day in every 4,039,906 years, and one day in every 3,105,395,365 years. Yet no one concluded from this that the statistical models in use were wrong. The mathematical models simply didn’t work in a crisis. They worked when they worked, which was most of the time; but the whole point of them was to assess risk, and some risks by definition happen at the edges of known likelihoods. 
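The order of magnitude of these waiting times can be reproduced with a short calculation. The sketch below assumes daily returns follow a normal distribution, counts only one tail (losses), and uses 250 trading days per year; none of these assumptions is stated in the excerpt, so the results should be read as approximate rather than as the exact figures quoted.

```python
import math

TRADING_DAYS_PER_YEAR = 250  # assumption: not stated in the excerpt

def tail_probability(k):
    """One-sided probability that a standard normal draw exceeds k sigma."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def years_between_events(k, days_per_year=TRADING_DAYS_PER_YEAR):
    """Expected number of years per single day with a move beyond k sigma."""
    return 1.0 / (tail_probability(k) * days_per_year)

for k in (5, 6, 7):
    print(f"{k}-sigma: about one day in {years_between_events(k):,.0f} years")
```

The steep growth from thousands to billions of years comes entirely from the thin tails of the normal distribution, which is why repeated 5-, 6-, and 7-sigma days point to the distributional assumption itself being wrong.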
The strange thing is that this is strongly hinted at in the VAR model, as propounded by its more philosophically minded defenders such as Philippe Jorion: it marks the boundaries of the known world, up to the VAR break, and then writes “Here be Dragons.” 

Exploring Everyday Things with R and Ruby by Sau Sheong Chang Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Alfred Russel Wallace, bioinformatics, business process, butterfly effect, cloud computing, Craig Reynolds: boids flock, Debian, Edward Lorenz: Chaos theory, Gini coefficient, income inequality, invisible hand, pvalue, price stability, Skype, statistical model, stem cell, Stephen Hawking, text mining, The Wealth of Nations by Adam Smith, We are the 99%, web application, wikimedia commons The default method for a smooth geom in ggplot2 is the LOESS algorithm, which is suitable for a small number of data points. LOESS is not suitable for a large number of data points, however, because it scales on an O(n²) basis in memory, so instead we use the mgcv library and its gam method. We also send in the formula y~s(x), where s is the smoother function for GAM. GAM stands for generalized additive model, which is a statistical model used to describe how items of data relate to each other. In our case, we use GAM as an algorithm in the smoother to provide us with a reasonably good estimation of how a large number of data points can be visualized. In Figure 85, you can see that the population of roids fluctuates over time between two extremes caused by the oversupply and exhaustion of food, respectively. Figure 85. 
pages: 373 words: 80,248 
Empire of Illusion: The End of Literacy and the Triumph of Spectacle by Chris Hedges Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, Ayatollah Khomeini, Cal Newport, clean water, collective bargaining, corporate governance, Credit Default Swap, haute couture, Honoré de Balzac, Howard Zinn, illegal immigration, income inequality, Joseph Schumpeter, Naomi Klein, offshore financial centre, Ralph Nader, Ronald Reagan, singlepayer health, statistical model, uranium enrichment He told the senators that the collapse of the global financial system is “likely to produce a wave of economic crises in emerging market nations over the next year.” He added that “much of Latin America, former Soviet Union states, and sub-Saharan Africa lack sufficient cash reserves, access to international aid or credit, or other coping mechanism.” “When those growth rates go down, my gut tells me that there are going to be problems coming out of that, and we’re looking for that,” he said. He referred to “statistical modeling” showing that “economic crises increase the risk of regime-threatening instability if they persist over a one- to two-year period.” Blair articulated the newest narrative of fear. As the economic unraveling accelerates, we will be told it is not the bearded Islamic extremists who threaten us most, although those in power will drag them out of the Halloween closet whenever they need to give us an exotic shock, but instead the domestic riffraff, environmentalists, anarchists, unions, right-wing militias, and enraged members of our dispossessed working class. 
pages: 291 words: 81,703 
Average Is Over: Powering America Beyond the Age of the Great Stagnation by Tyler Cowen Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Amazon Mechanical Turk, Black Swan, brain emulation, Brownian motion, Cass Sunstein, choice architecture, complexity theory, computer age, computer vision, cosmological constant, crowdsourcing, dark matter, David Brooks, David Ricardo: comparative advantage, deliberate practice, Drosophila, en.wikipedia.org, endowment effect, epigenetics, Erik Brynjolfsson, eurozone crisis, experimental economics, Flynn Effect, Freestyle chess, full employment, future of work, game design, income inequality, industrial robot, informal economy, Isaac Newton, Khan Academy, laborforce participation, Loebner Prize, low skilled workers, manufacturing employment, Mark Zuckerberg, meta analysis, metaanalysis, microcredit, Narrative Science, Netflix Prize, Nicholas Carr, pattern recognition, Peter Thiel, randomized controlled trial, Ray Kurzweil, reshoring, Richard Florida, Richard Thaler, Ronald Reagan, Silicon Valley, Skype, statistical model, stem cell, Steve Jobs, Turing test, Tyler Cowen: Great Stagnation, upwardly mobile, Yogi Berra I accessed the Wikipedia entry on string theory on December 26, 2012. Perhaps it will become clearer! On the age dynamics for achievement for noneconomists, see Benjamin F. Jones and Bruce A. Weinberg, “Age Dynamics in Scientific Creativity,” published online before print, PNAS, November 7, 2011, doi: 10.1073/pnas.1102895108. On data crunching pushing out theory, see the famous essay by Leo Breiman, “Statistical Modeling: The Two Cultures,” Statistical Science, 2001, 16(3): 199–231, including the comments on the piece as well. See also the recent piece by Betsey Stevenson and Justin Wolfers, “Business is Booming in Empirical Economics,” Bloomberg.com, August 6, 2012. And as mentioned earlier, see Daniel S. Hamermesh, “Six Decades of Top Economics Publishing: Who and How?” National Bureau of Economic Research, Working Paper 18635, December 2012. 
pages: 589 words: 69,193 
Mastering Pandas by Femi Anthony Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Amazon Web Services, correlation coefficient, correlation does not imply causation, Debian, en.wikipedia.org, Internet of things, natural language processing, pvalue, random walk, side project, statistical model The normalizing constant doesn't always need to be calculated, especially in many popular algorithms such as MCMC, which we will examine later in this chapter. P(H|D) is the probability that the hypothesis is true, given the data that we observe. This is called the posterior. P(D|H) is the probability of obtaining the data, considering our hypothesis. This is called the likelihood. Thus, Bayesian statistics amounts to applying Bayes' rule to solve problems in inferential statistics with H representing our hypothesis and D the data. A Bayesian statistical model is cast in terms of parameters, and the uncertainty in these parameters is represented by probability distributions. This is different from the Frequentist approach where the values are regarded as deterministic. An alternative representation is as follows: where, is our unknown data and is our observed data In Bayesian statistics, we make assumptions about the prior data and use the likelihood to update to the posterior probability using the Bayes' rule. 
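As a rough illustration of the update described above, the following sketch applies Bayes' rule over a small discrete set of hypotheses. The coin example, the hypothesis names, and the function name posterior are hypothetical and chosen only for illustration; they do not come from the book.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule over a discrete set of hypotheses.

    priors[h] is P(H) and likelihoods[h] is P(D | H) for the observed data D.
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnormalized.values())  # normalizing constant P(D)
    return {h: value / evidence for h, value in unnormalized.items()}

# Hypothetical example: is a coin fair or biased, after observing one head?
priors = {"fair": 0.5, "biased": 0.5}
likelihoods = {"fair": 0.5, "biased": 0.9}  # P(heads | H)
updated = posterior(priors, likelihoods)
print(updated)
```

Here the normalizing constant is cheap to compute because there are only two hypotheses; with continuous parameters it becomes an integral, which is the situation where sampling methods such as MCMC avoid computing it directly.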
pages: 242 words: 68,019 
Why Information Grows: The Evolution of Order, From Atoms to Economies by Cesar Hidalgo Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Ada Lovelace, Albert Einstein, Arthur Eddington, Claude Shannon: information theory, David Ricardo: comparative advantage, Douglas Hofstadter, frictionless, frictionless market, George Akerlof, Gödel, Escher, Bach, income inequality, income per capita, invention of the telegraph, invisible hand, Isaac Newton, James Watt: steam engine, Jane Jacobs, job satisfaction, John von Neumann, New Economic Geography, Norbert Wiener, pvalue, phenotype, price mechanism, Richard Florida, Ronald Coase, Silicon Valley, Simon Kuznets, Skype, statistical model, Steve Jobs, Steve Wozniak, Steven Pinker, The Market for Lemons, The Nature of the Firm, The Wealth of Nations by Adam Smith, total factor productivity, transaction costs, workingage population GDP considers the production of goods and services within a country. GNP considers the goods and services produced by the citizens of a country, whether or not those goods are produced within the boundaries of the country. 5. Simon Kuznets, “Modern Economic Growth: Findings and Reflections,” American Economic Review 63, no. 3 (1973): 247–258. 6. Technically, total factor productivity is the residual or error term of the statistical model. Also, economists often refer to total factor productivity as technology, although this is a semantic deformation that is orthogonal to the definition of technology used by anyone who has ever developed a technology. In the language of economics, technology is the ability to do more—of anything—with the same cost. For inventors of technology, technology is the ability to do something completely new, which often involves the development of a new capacity. 
pages: 280 words: 79,029 
Smart Money: How High-Stakes Financial Innovation Is Reshaping Our World—For the Better by Andrew Palmer Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Affordable Care Act / Obamacare, algorithmic trading, Andrei Shleifer, assetbacked security, availability heuristic, bank run, banking crisis, BlackScholes formula, bonus culture, Bretton Woods, call centre, Carmen Reinhart, cloud computing, collapse of Lehman Brothers, collateralized debt obligation, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, David Graeber, diversification, diversified portfolio, Edmond Halley, Edward Glaeser, Eugene Fama: efficient market hypothesis, eurozone crisis, family office, financial deregulation, financial innovation, fixed income, Flash crash, Google Glasses, Gordon Gekko, high net worth, housing crisis, Hyman Minsky, implied volatility, income inequality, index fund, Innovator's Dilemma, interest rate swap, Kenneth Rogoff, Kickstarter, late fees, London Interbank Offered Rate, Long Term Capital Management, loss aversion, margin call, Mark Zuckerberg, McMansion, mortgage debt, mortgage tax deduction, Network effects, Northern Rock, obamacare, payday loans, peertopeer lending, Peter Thiel, principal–agent problem, profit maximization, quantitative trading / quantitative ﬁnance, railway mania, randomized controlled trial, Richard Feynman, Richard Feynman, Richard Thaler, risk tolerance, riskadjusted returns, Robert Shiller, Robert Shiller, short selling, Silicon Valley, Silicon Valley startup, Skype, South Sea Bubble, sovereign wealth fund, statistical model, transaction costs, Tunguska event, unbanked and underbanked, underbanked, Vanguard fund, web application Public data from a couple of longitudinal studies showing the longterm relationship between education and income in the United States enabled him to build what he describes as “a simple multivariate regression model”—you know the sort, we’ve all built one—and work out the relationships between things such as test scores, degrees, and first jobs on later income. 
That model has since grown into something whizzier. An applicant’s education, SAT scores, work experience, and other details are pumped into a proprietary statistical model, which looks at people with comparable backgrounds and generates a prediction of that person’s personal income. Upstart now uses these data to underwrite loans to younger people—who often find it hard to raise money because of their limited credit histories. But the model was initially used to determine how much money an applicant could raise for each percentage point of future income they gave away. 
pages: 239 words: 70,206 
Data-ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else by Steve Lohr Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
23andMe, Affordable Care Act / Obamacare, Albert Einstein, big data  Walmart  Pop Tarts, bioinformatics, business intelligence, call centre, cloud computing, computer age, conceptual framework, Credit Default Swap, crowdsourcing, Daniel Kahneman / Amos Tversky, Danny Hillis, data is the new oil, David Brooks, East Village, Edward Snowden, Emanuel Derman, Erik Brynjolfsson, everywhere but in the productivity statistics, Frederick Winslow Taylor, Google Glasses, impulse control, income inequality, indoor plumbing, industrial robot, informal economy, Internet of things, invention of writing, John von Neumann, Mark Zuckerberg, market bubble, meta analysis, metaanalysis, natural language processing, obamacare, pattern recognition, payday loans, personalized medicine, precision agriculture, pre–internet, Productivity paradox, RAND corporation, rising living standards, Robert Gordon, Second Machine Age, selfdriving car, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, speech recognition, statistical model, Steve Jobs, Steven Levy, The Design of Experiments, the scientific method, Thomas Kuhn: the structure of scientific revolutions, unbanked and underbanked, underbanked, Von Neumann architecture, Watson beat the top human players on Jeopardy! Cleveland, then a researcher at Bell Labs, wrote a paper he called an “action plan” for essentially redefining statistics as an engineering task. “The altered field,” he wrote, “will be called ‘data science.’” In his paper, Cleveland, who is now a professor of statistics and computer science at Purdue University, described the contours of this new field. Data science, he said, would touch all disciplines of study and require the development of new statistical models, new computing tools, and educational programs in schools and corporations. Cleveland’s vision of a new field is now rapidly gaining momentum. The federal government, universities, and foundations are funding data science initiatives. 
Nearly all of these efforts are multidisciplinary melting pots that seek to bring together teams of computer scientists, statisticians, and mathematicians with experts who bring piles of data and unanswered questions from biology, astronomy, business and finance, public health, and elsewhere. 
pages: 283 words: 81,163 
How Capitalism Saved America: The Untold History of Our Country, From the Pilgrims to the Present by Thomas J. Dilorenzo Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
banking crisis, British Empire, collective bargaining, corporate governance, corporate social responsibility, financial deregulation, Fractional reserve banking, Hernando de Soto, income inequality, invisible hand, Joseph Schumpeter, laissezfaire capitalism, means of production, medical malpractice, Menlo Park, minimum wage unemployment, Plutocrats, plutocrats, price stability, profit maximization, profit motive, Ralph Nader, rent control, rentseeking, Ronald Coase, Ronald Reagan, Silicon Valley, statistical model, The Wealth of Nations by Adam Smith, transcontinental railway, union organizing, Upton Sinclair, working poor, Works Progress Administration Wages rose by a phenomenal 13.7 percent during the first three quarters of 1937 alone.46 The union/nonunion wage differential increased from 5 percent in 1933 to 23 percent by 1940.47 On top of this, the Social Security payroll and unemployment insurance taxes contributed to a rapid rise in governmentmandated fringe benefits, from 2.4 percent of payrolls in 1936 to 5.1 percent just two years later. Economists Richard Vedder and Lowell Gallaway have determined the costs of all this misguided legislation, showing how most of the abnormal unemployment of the 1930s would have been avoided had it not been for the New Deal. Using a statistical model, Vedder and Gallaway concluded that by 1940 the unemployment rate was more than 8 percentage points higher than it would have been without the legislationinduced growth in unionism and governmentmandated fringebenefit costs imposed on employers.48 Their conclusion: “The Great Depression was very significantly prolonged in both its duration and its magnitude by the impact of New Deal programs.”49 In addition to fascistic labor policies and governmentmandated wage and fringebenefit increases that destroyed millions of jobs, the Second New Deal was responsible for economydestroying tax increases and massive government spending on myriad government makework programs. 
pages: 579 words: 76,657 
Data Science from Scratch: First Principles with Python by Joel Grus Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
correlation does not imply causation, natural language processing, Netflix Prize, pvalue, Paul Graham, recommendation engine, SpamAssassin, statistical model (You attempt to explain to her that search engine algorithms are clever enough that this won’t actually work, but she refuses to listen.) Of course, she doesn’t want to write thousands of web pages, nor does she want to pay a horde of “content strategists” to do so. Instead she asks you whether you can somehow programmatically generate these web pages. To do this, we’ll need some way of modeling language. One approach is to start with a corpus of documents and learn a statistical model of language. In our case, we’ll start with Mike Loukides’s essay “What is data science?” As in Chapter 9, we’ll use requests and BeautifulSoup to retrieve the data. There are a couple of issues worth calling attention to. The first is that the apostrophes in the text are actually the Unicode character u"\u2019". We’ll create a helper function to replace them with normal apostrophes: def fix_unicode(text): return text.replace(u"\u2019", "'") The second issue is that once we get the text of the web page, we’ll want to split it into a sequence of words and periods (so that we can tell where sentences end). 
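The two steps this excerpt describes (normalizing the Unicode apostrophe, then splitting text into words and periods) can be sketched as follows. This is a minimal illustration, not necessarily the book's code: the excerpt only shows `fix_unicode`, so the `tokenize` name and its regular expression are assumptions made here.

```python
import re

def fix_unicode(text):
    # Replace the Unicode right single quotation mark (U+2019)
    # with a plain ASCII apostrophe, as in the excerpt.
    return text.replace("\u2019", "'")

def tokenize(text):
    # Hypothetical helper (not shown in the excerpt): match either a run of
    # word characters and apostrophes, or a single period, so that
    # sentence boundaries survive as their own tokens.
    return re.findall(r"[\w']+|\.", fix_unicode(text))

sample = "Data isn\u2019t magic. It\u2019s work."
print(tokenize(sample))
# prints ['Data', "isn't", 'magic', '.', "It's", 'work', '.']
```

Keeping the periods as separate tokens is what later lets a language model learn where sentences tend to end.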
pages: 277 words: 80,703 
Revolution at Point Zero: Housework, Reproduction, and Feminist Struggle by Silvia Federici Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Community Supported Agriculture, declining real wages, equal pay for equal work, feminist movement, financial independence, global village, illegal immigration, informal economy, invisible hand, laborforce participation, land tenure, means of production, microcredit, neoliberal agenda, new economy, Occupy movement, planetary scale, Scramble for Africa, statistical model, structural adjustment programs, the market place, trade liberalization, UNCLOS, wages for housework, Washington Consensus, women in the workforce, World Values Survey At least since the Zapatistas, on December 31, 1993, took over the zócalo of San Cristóbal to protest legislation dissolving the ejidal lands of Mexico, the concept of the “commons” has gained popularity among the radical Left, internationally and in the United States, appearing as a ground of convergence among anarchists, Marxists/socialists, ecologists, and ecofeminists.1 There are important reasons why this apparently archaic idea has come to the center of political discussion in contemporary social movements. Two in particular stand out. On the one side, there has been the demise of the statist model of revolution that for decades has sapped the efforts of radical movements to build an alternative to capitalism. On the other, the neoliberal attempt to subordinate every form of life and knowledge to the logic of the market has heightened our awareness of the danger of living in a world in which we no longer have access to seas, trees, animals, and our fellow beings except through the cashnexus. 

Raw Data Is an Oxymoron by Lisa Gitelman Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
collateralized debt obligation, computer age, continuous integration, crowdsourcing, Drosophila, Edmond Halley, Filter Bubble, Firefox, Google Earth, Howard Rheingold, index card, informal economy, Isaac Newton, Johann Wolfgang von Goethe, knowledge worker, Louis Daguerre, Menlo Park, optical character recognition, RFID, Richard Thaler, Silicon Valley, social graph, software studies, statistical model, Stephen Hawking, Steven Pinker, text mining, time value of money, trade route, Turing machine, urban renewal, Vannevar Bush Data storage of this scale, potentially measured in petabytes, would necessarily require sophisticated algorithmic querying in order to detect informational patterns. For David Gelernter, this type of data management would require “topsight,” a topdown perspective achieved through software modeling and the creation of microcosmic “mirror worlds,” in which raw data filters in from the bottom and the whole comes into focus through statistical modeling and rule and pattern extraction.36 The promise of topsight, in Gelernter’s terms, is a progression from annales to annalistes, from data collection that would satisfy a “neoVictorian curatorial” drive to data analysis that calculates prediction scenarios and manages risk.37 What would be the locus of suspicion and paranoid fantasy (Poster calls it “database anxiety”) if not such an intricate and operationally efficient system, the aggregating capacity of which easily ups the ante on Thomas Pynchon’s paranoid realization that “everything is connected”? 

The Armchair Economist: Economics and Everyday Life by Steven E. Landsburg Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, Arthur Eddington, diversified portfolio, firstprice auction, German hyperinflation, Golden Gate Park, invisible hand, means of production, price discrimination, profit maximization, Ralph Nader, random walk, Ronald Coase, sealedbid auction, secondprice auction, secondprice sealedbid, statistical model, the scientific method, Unsafe at Any Speed (Exactly why he thought this has never been determined, but he was quite sure of himself.) The commissioner became obsessed with the need to discourage punting and called in his assistants for advice on how to cope with the problem. One of those assistants, a fresh M.B.A., breathlessly announced that he had taken courses from an economist who was a great expert on all aspects of the game and who had developed detailed statistical models to predict how teams behave. He proposed retaining the economist to study what makes teams punt. The commissioner summoned the economist, who went home with a large retainer check and a mandate to discover the causes of punting. Many hours later (he billed by the hour) the answer was at hand. Volumes of computer printouts left no doubt: Punting nearly always takes place on the fourth down. 

Deep Work: Rules for Focused Success in a Distracted World by Cal Newport Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
8hour work day, Albert Einstein, barriers to entry, business climate, Cal Newport, Capital in the TwentyFirst Century by Thomas Piketty, Clayton Christensen, David Brooks, deliberate practice, Donald Trump, Downton Abbey, en.wikipedia.org, Erik Brynjolfsson, experimental subject, follow your passion, Frank Gehry, informal economy, information retrieval, Internet Archive, Jaron Lanier, knowledge worker, Mark Zuckerberg, Marshall McLuhan, Merlin Mann, Nate Silver, new economy, Nicholas Carr, popular electronics, remote working, Richard Feynman, Silicon Valley, Silicon Valley startup, Snapchat, statistical model, the medium is the message, Watson beat the top human players on Jeopardy!, web application, winnertakeall economy But the real importance of this story is the experiment itself, and in particular, its complexity. It turns out to be really difficult to answer a simple question such as: What’s the impact of our current email habits on the bottom line? Cochran had to conduct a company-wide survey and gather statistics from the IT infrastructure. He also had to pull together salary data and information on typing and reading speed, and run the whole thing through a statistical model to spit out his final result. And even then, the outcome is fungible, as it’s not able to separate out, for example, how much value was produced by this frequent, expensive email use to offset some of its cost. This example generalizes to most behaviors that potentially impede or improve deep work. Even though we abstractly accept that distraction has costs and depth has value, these impacts, as Tom Cochran discovered, are difficult to measure. 
pages: 305 words: 69,216 
A Failure of Capitalism: The Crisis of '08 and the Descent Into Depression by Richard A. Posner Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Andrei Shleifer, banking crisis, Bernie Madoff, collateralized debt obligation, collective bargaining, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, debt deflation, diversified portfolio, equity premium, financial deregulation, financial intermediation, Home mortgage interest deduction, illegal immigration, laissezfaire capitalism, Long Term Capital Management, market bubble, moral hazard, mortgage debt, oil shock, Ponzi scheme, price stability, profit maximization, race to the bottom, reserve currency, risk tolerance, risk/return, Robert Shiller, savings glut, shareholder value, short selling, statistical model, too big to fail, transaction costs, very high income Marketers to Americans (as distinct from Japanese) have had greater success appealing to the first set of motives than to the second. Quantitative models of risk, another fulfillment of Weber's prophecy that more and more activities would be brought under the rule of rationality, are also being blamed for the financial crisis. Suppose a trader is contemplating the purchase of a stock using largely borrowed money, so that if the stock falls even a little way the loss will be great. He might consult a statistical model that predicted, on the basis of the ups and downs of the stock in the preceding two years, the probability distribution of the stock's behavior over the next few days or weeks. The criticism is that the model would have based the prediction on market behavior during a period of rising stock values; the modeler should have gone back to the 1980s or earlier to get a fuller picture of the riskiness of the stock. 
pages: 251 words: 76,128 
Borrow: The American Way of Debt by Louis Hyman Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
assetbacked security, barriers to entry, bigbox store, cashless society, collateralized debt obligation, credit crunch, deindustrialization, deskilling, diversified portfolio, financial innovation, Ford paid five dollars a day, Home mortgage interest deduction, housing crisis, income inequality, market bubble, McMansion, mortgage debt, mortgage tax deduction, Network effects, new economy, Plutocrats, plutocrats, price stability, Ronald Reagan, statistical model, technology bubble, transaction costs, women in the workforce In the fall of 2006, the impossible happened. Housing prices began to fall. As creditrating agencies began to reassess the safety of the AAA mortgagebacked securities, insurance companies had to pony up greater quantities of collateral to guarantee the insurance policies on the bonds. The global credit market rested on a simple assumption: housing prices would always go up. Foreclosures would be randomly distributed, as the statistical models assumed. Yet as those models, and the companies that had created them, began to fail, a shudder ran through the corpus of global capitalism. The insurance giant AIG, which had hoped for so much profit in 1998, watched as its entire business—both traditional and new—went down, supported only by the U.S. government. The arcane operations of the credit markets spilled out into the larger economy, bringing about the greatest economic downturn since the Great Depression. 
pages: 238 words: 75,994 
A Burglar's Guide to the City by Geoff Manaugh Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
bigbox store, card file, dark matter, game design, index card, megacity, megastructure, Minecraft, Skype, smart cities, statistical model, the built environment, urban planning The fundamental premise of the capture-house program is that police can successfully predict what sorts of buildings and internal spaces will attract not just any criminal but a specific burglar, the unique individual each particular capture house was built to target. This is because burglars unwittingly betray personal, as well as shared, patterns in their crimes; they often hit the same sorts of apartments and businesses over and over. But the urge to mathematize this, and to devise complex statistical models for when and where a burglar will strike next, can lead to all sorts of analytical absurdities. A great example of this comes from an article published in the criminology journal Crime, Law and Social Change back in 2011. Researchers from the Physics Engineering Department at Tsinghua University reported some eyebrow-raisingly specific data about the meteorological circumstances during which burglaries were most likely to occur in urban China. 
pages: 804 words: 212,335 
Revelation Space by Alastair Reynolds Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
game design, glass ceiling, gravity well, Kuiper Belt, planetary scale, random walk, statistical model But if Sajaki's equipment was not the best, chances were good that he had excellent algorithms to distil memory traces. Over centuries, statistical models had studied patterns of memory storage in ten billion human minds, correlating structure against experience. Certain impressions tended to be reflected in similar neural structures — internal qualia — which were the functional blocks out of which more complex memories were assembled. Those qualia were never the same from mind to mind, except in very rare cases, but neither were they encoded in radically different ways, since nature would never deviate far from the minimumenergy route to a particular solution. The statistical models could identify those qualia patterns very efficiently, and then map the connections between them out of which memories were forged. 
pages: 504 words: 89,238 
Natural language processing with Python by Steven Bird, Ewan Klein, Edward Loper Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
bioinformatics, business intelligence, conceptual framework, elephant in my pajamas, en.wikipedia.org, finite state, Firefox, information retrieval, Menlo Park, natural language processing, P = NP, search inside the book, speech recognition, statistical model, text mining, Turing test Structure of the published TIMIT Corpus: The CDROM contains doc, train, and test directories at the top level; the train and test directories both have eight subdirectories, one per dialect region; each of these contains further subdirectories, one per speaker; the contents of the directory for female speaker aks0 are listed, showing 10 wav files accompanied by a text transcription, a wordaligned transcription, and a phonetic transcription. there is a split between training and testing sets, which gives away its intended use for developing and evaluating statistical models. Finally, notice that even though TIMIT is a speech corpus, its transcriptions and associated data are just text, and can be processed using programs just like any other text corpus. Therefore, many of the computational methods described in this book are applicable. Moreover, notice that all of the data types included in the TIMIT Corpus fall into the two basic categories of lexicon and text, which we will discuss later. … For example, one intermediate position is to assume that humans are innately endowed with analogical and memorybased learning methods (weak rationalism), and use these methods to identify meaningful patterns in their sensory language experience (empiricism). We have seen many examples of this methodology throughout this book. 
Statistical methods inform symbolic models anytime corpus statistics guide the selection of productions in a context-free grammar, i.e., “grammar engineering.” Symbolic methods inform statistical models anytime a corpus that was created using rule-based methods is used as a source of features for training a statistical language model, i.e., “grammatical inference.” The circle is closed. NLTK Roadmap The Natural Language Toolkit is a work in progress, and is being continually expanded as people contribute code. Some areas of NLP and linguistics are not (yet) well supported in NLTK, and contributions in these areas are especially welcome. 
pages: 666 words: 181,495 
In the Plex: How Google Thinks, Works, and Shapes Our Lives by Steven Levy Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
23andMe, AltaVista, Anne Wojcicki, Apple's 1984 Super Bowl advert, autonomous vehicles, book scanning, Brewster Kahle, Burning Man, business process, clean water, cloud computing, crowdsourcing, Dean Kamen, discounted cash flows, don't be evil, Douglas Engelbart, El Camino Real, fault tolerance, Firefox, Gerard Salton, Google bus, Google Chrome, Google Earth, Googley, HyperCard, hypertext link, IBM and the Holocaust, informal economy, information retrieval, Internet Archive, Jeff Bezos, Kevin Kelly, Mark Zuckerberg, Menlo Park, optical character recognition, PageRank, Paul Buchheit, Potemkin village, prediction markets, recommendation engine, risk tolerance, Sand Hill Road, Saturday Night Live, search inside the book, secondprice auction, Silicon Valley, skunkworks, Skype, slashdot, social graph, social software, social web, spectrum auction, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, Ted Nelson, telemarketer, trade route, traveling salesman, Vannevar Bush, web application, WikiLeaks, Y Combinator Och’s official role was as a scientist in Google’s research group, but it is indicative of Google’s view of research that no step was required to move beyond study into actual product implementation. Because Och and his colleagues knew they would have access to an unprecedented amount of data, they worked from the ground up to create a new translation system. “One of the things we did was to build very, very, very large language models, much larger than anyone has ever built in the history of mankind.” Then they began to train the system. To measure progress, they used a statistical model that, given a series of words, would predict the word that came next. Each time they doubled the amount of training data, they got a .5 percent boost in the metrics that measured success in the results. 
“So we just doubled it a bunch of times.” In order to get a reasonable translation, Och would say, you might feed something like a billion words to the model. But Google didn’t stop at a billion. … To keep making consistently accurate predictions on clickthrough rates and conversions, Google needed to know everything. “We are trying to understand the mechanisms behind the metrics,” says Qing Wu, a decision support analyst at Google. His specialty was forecasting. He could predict patterns of queries from season to season, in different parts of the day, and the climate. “We have the temperature data, we have the weather data, and we have the queries data so we can do correlation and statistical modeling.” To make sure that his predictions were on track, Qing Wu and his colleagues made use of dozens of onscreen dashboards with information flowing through them, a Bloomberg of the Googlesphere. “With a dashboard you can monitor the queries, the amount of money you make, how many advertisers we have, how many keywords they’re bidding on, what the ROI is for each advertiser.” It’s like the census data, he would say, only Google does much better analyzing its information than the government does with the census results. 
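The excerpt mentions a statistical model that, given a series of words, predicts the word that comes next. A toy version of that idea is a bigram model, sketched below under loud assumptions: the function names and the tiny corpus are invented for illustration, and this is in no way a description of Google's actual system, which the excerpt says was trained on billions of words.

```python
from collections import Counter, defaultdict

def train_bigrams(words):
    # Count how often each word follows each preceding word.
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def predict_next(following, word):
    # Return the most frequent follower of `word`,
    # or None if the word was never seen with a follower.
    counts = following.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical toy corpus; a real system would use vastly more text.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # prints cat
```

With so little data the prediction is crude, which is one way to see why adding more training text would be expected to improve such a model.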
pages: 741 words: 179,454 
Extreme Money: Masters of the Universe and the Cult of Risk by Satyajit Das Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
affirmative action, Albert Einstein, algorithmic trading, Andy Kessler, Asian financial crisis, asset allocation, assetbacked security, bank run, banking crisis, banks create money, Basel III, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, Big bang: deregulation of the City of London, Black Swan, Bonfire of the Vanities, bonus culture, Bretton Woods, BRICs, British Empire, capital asset pricing model, Carmen Reinhart, carried interest, Celtic Tiger, clean water, cognitive dissonance, collapse of Lehman Brothers, collateralized debt obligation, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, debt deflation, Deng Xiaoping, deskilling, discrete time, diversification, diversified portfolio, Doomsday Clock, Emanuel Derman, en.wikipedia.org, Eugene Fama: efficient market hypothesis, eurozone crisis, Fall of the Berlin Wall, financial independence, financial innovation, fixed income, full employment, global reserve currency, Goldman Sachs: Vampire Squid, Gordon Gekko, greed is good, happiness index / gross national happiness, haute cuisine, high net worth, Hyman Minsky, index fund, interest rate swap, invention of the wheel, invisible hand, Isaac Newton, job automation, Johann Wolfgang von Goethe, jointstock company, Joseph Schumpeter, Kenneth Rogoff, Kevin Kelly, labour market flexibility, laissezfaire capitalism, load shedding, locking in a profit, Long Term Capital Management, Louis Bachelier, margin call, market bubble, market fundamentalism, Marshall McLuhan, Martin Wolf, merger arbitrage, Mikhail Gorbachev, Milgram experiment, Mont Pelerin Society, moral hazard, mortgage debt, mortgage tax deduction, mutually assured destruction, Naomi Klein, Network effects, new economy, Nick Leeson, Nixon shock, Northern Rock, nuclear winter, oil shock, Own Your Own Home, pets.com, Plutocrats, plutocrats, Ponzi scheme, price anchoring, price stability, profit maximization, 
quantitative easing, quantitative trading / quantitative finance, Ralph Nader, RAND corporation, random walk, Ray Kurzweil, regulatory arbitrage, rent control, rentseeking, reserve currency, Richard Feynman, Richard Thaler, riskadjusted returns, risk/return, road to serfdom, Robert Shiller, Rod Stewart played at Stephen Schwarzman birthday party, rolodex, Ronald Reagan, Ronald Reagan: Tear down this wall, savings glut, shareholder value, Sharpe ratio, short selling, Silicon Valley, six sigma, Slavoj Žižek, South Sea Bubble, special economic zone, statistical model, Stephen Hawking, Steve Jobs, The Chicago School, The Great Moderation, the market place, the medium is the message, The Myth of the Rational Market, The Nature of the Firm, The Predators' Ball, The Wealth of Nations by Adam Smith, Thorstein Veblen, too big to fail, trickledown economics, Turing test, Upton Sinclair, value at risk, Yogi Berra, zerocoupon bond Mortgages against second and third homes, vacation homes and non-owner-occupied investment homes to be rented out (buy-to-let) or sold later (condo flippers) were allowed. HE (home equity) and HELOC (home equity line of credit), borrowing against the equity in existing homes, became prevalent. Empowered by high-tech models, lenders loaned to less creditworthy borrowers, believing they could price any risk. Ben Bernanke shared his predecessor Alan Greenspan’s faith: “banks have become increasingly adept at predicting default risk by applying statistical models to data, such as credit scores.” Bernanke concluded that banks “have made substantial strides...in their ability to measure and manage risks.”13 Innovative affordability products included jumbo and super jumbo loans that did not conform to guidelines because of their size. More risky than prime but less risky than subprime, Alt A (Alternative A) mortgages were for borrowers who did not meet normal criteria. 
… In 2007, Moody’s upgraded three major Icelandic banks to the highest AAA rating, citing new methodology that took into account the likelihood of government support. Although Moody’s reversed the upgrades, all three banks collapsed in 2008. Unimpeded by insufficient disclosure, lack of information transparency, fraud, and improper accounting, traders anticipated these defaults, marking down bond prices well before rating downgrades. Ratingstructured securities required statistical models, mapping complex securities to historical patterns of default on normal bonds. With mortgage markets changing rapidly, this was like “using weather in Antarctica to forecast conditions in Hawaii.”17 Antarctica from 100 years ago! The agencies did not look at the underlying mortgages or loans in detail, relying instead on information from others. Moody’s Yuri Yoshizawa stated: “We’re structure experts. 

Debtor Nation: The History of America in Red Ink (Politics and Society in Modern America) by Louis Hyman Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
assetbacked security, bank run, barriers to entry, Bretton Woods, card file, central bank independence, computer age, corporate governance, credit crunch, declining real wages, deindustrialization, diversified portfolio, financial independence, financial innovation, Gini coefficient, Home mortgage interest deduction, housing crisis, income inequality, invisible hand, late fees, London Interbank Offered Rate, market fundamentalism, means of production, mortgage debt, mortgage tax deduction, pvalue, pattern recognition, profit maximization, profit motive, risk/return, Ronald Reagan, Silicon Valley, statistical model, technology bubble, the built environment, transaction costs, union organizing, white flight, women in the workforce, working poor Applications became more consistent and less subject to the whims of a particular loan officer. In computer models, feminist credit advocates believed they had found the solution to discriminatory lending, ushering in the contemporary calculated credit regimes under which we live today. Yet removing such basic demographics from any model was not as straightforward as the authors of the ECOA had hoped because of how all statistical models function, but which legislators seem to not have fully understood. The “objective” credit statistics that legislators had pined for during the early investigations of the Consumer Credit Protection Act could now exist, but with new difficulties that stemmed from using regressions and not human judgment to decide on loans. In human-judged credit lending, a loan officer who knew the race and gender of an applicant would be more discriminatory, whereas in a computer credit model, knowing the applicant’s race and gender allowed the credit decision to be less discriminatory. 
… The higher the level of education and income, the lower the effective interest rate paid, since such users tended more frequently to be nonrevolvers.96 The researchers found that young, large, low-income families who could not save for major purchases, paid finance charges, while their opposite, older, smaller, high-income families who could save for major purchases, did not pay finance charges. Effectively the young and poor cardholders subsidized the convenience of the old and rich.97 And white.98 The new statistical models revealed that the second-best predictor of revolving debt, after a respondent’s own “self-evaluation of his or her ability to save,” was race.99 But what these models revealed was that the very group (African Americans) that the politicians wanted to increase credit access to, tended to revolve their credit more than otherwise similar white borrowers. Though federal laws prevented businesses from using race in their lending decisions, academics were free to examine race as a credit model would and found that, even after adjusting for income and other demographics, race was still the second strongest predictive factor. 
pages: 257 words: 94,168 
Oil Panic and the Global Crisis: Predictions and Myths by Steven M. Gorelick Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
California gold rush, carbon footprint, energy security, energy transition, flex fuel, income per capita, invention of the telephone, meta analysis, metaanalysis, North Sea oil, oil shale / tar sands, oil shock, peak oil, price stability, profit motive, purchasing power parity, RAND corporation, statistical model, Thomas Malthus At a depth of over 5 miles, this find contains anywhere between 3 and 15 billion barrels and could comprise 11 percent of US production by 2013.107 In 2009, Chevron reported another deepwater discovery just 44 miles away that may yield 0.5 billion barrels and could be profitably produced at an oil price of $50 per barrel.108 The second insight from discovery trends is that an underlying premise of many statistical models of oil discovery is probably incorrect. This premise is that larger oil fields are found first, followed by the discovery of smaller fields. Large fields in geologically related proximity to one another are typically discovered first simply because they are the most easily detected targets. However, this is not always the case, as pointed out by Ron Charpentier of the USGS, who notes that new technology can rejuvenate the discovery process. 
pages: 364 words: 101,286 
The Misbehavior of Markets by Benoit Mandelbrot Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, asset allocation, AugustinLouis Cauchy, Benoit Mandelbrot, Big bang: deregulation of the City of London, BlackScholes formula, British Empire, Brownian motion, buy low sell high, capital asset pricing model, carbonbased life, discounted cash flows, diversification, double helix, Edward Lorenz: Chaos theory, Elliott wave, equity premium, Eugene Fama: efficient market hypothesis, Fellow of the Royal Society, full employment, Georg Cantor, Henri Poincaré, implied volatility, index fund, informal economy, invisible hand, John von Neumann, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, market bubble, market microstructure, new economy, paper trading, passive investing, Paul Lévy, Plutocrats, plutocrats, price mechanism, quantitative trading / quantitative finance, Ralph Nelson Elliott, RAND corporation, random walk, risk tolerance, Robert Shiller, short selling, statistical arbitrage, statistical model, Steve Ballmer, stochastic volatility, transfer pricing, value at risk, volatility smile . • Abstract: Intermittency and periodicity, and the problem of long cycles. Econometrica 34, 1966 (Supplement): 152-153. Mandelbrot, Benoit B. 1970. Long-run interdependence in price records and other economic time series. Econometrica 38: 122-123. Mandelbrot, Benoit B. 1972. Possible refinement of the lognormal hypothesis concerning the distribution of energy dissipation in intermittent turbulence. Statistical Models and Turbulence. M. Rosenblatt and C. Van Atta, eds. Lecture Notes in Physics 12. New York: Springer, 333-351. • Reprint: Chapter N14 of Mandelbrot 1999a. Mandelbrot, Benoit B. 1974a. Intermittent turbulence in self-similar cascades; divergence of high moments and dimension of the carrier. Journal of Fluid Mechanics 62: 331-358. • Reprint: Chapter N15 of Mandelbrot 1999a. Mandelbrot, Benoit B. 1974b. 
pages: 227 words: 32,306 
Using Open Source Platforms for Business Intelligence: Avoid Pitfalls and Maximize Roi by Lyndsay Wise Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
barriers to entry, business intelligence, business process, call centre, cloud computing, en.wikipedia.org, Justintime delivery, knowledge worker, Richard Stallman, software as a service, statistical model, supplychain management, the market place All of these situations mean that different people within businesses have different worldviews and apply separate calculations to their work, resulting in data that is considered “manipulated” to some extent. Mitigating risk Another reason organizations look at BI is to help mitigate risk. In the past, much risk management within BI remained within the realm of finance, insurance, and banking, but most organizations need to assess potential risk and help mitigate its effects on the organization. Within BI, this goes beyond information visibility and means using predictive modeling and other advanced statistical models to ensure that customers with accounts past due are not allowed to submit new orders unless it is known beforehand, or that insurance claims aren’t being submitted fraudulently. The National Health Care Anti-Fraud Association (NHCAA) estimates that in 2010, 3% of all health care spending or $68 billion is lost to health care fraud in the United States.2 This makes fraud detection in health care extremely important, especially when you consider that if you are paying for insurance in the United States, part of your insurance premiums are probably being paid to cover the instances of fraud that occur, making this relevant beyond health care insurance providers. 
pages: 302 words: 86,614 
The Alpha Masters: Unlocking the Genius of the World's Top Hedge Funds by Maneet Ahuja, Myron Scholes, Mohamed El-Erian Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Asian financial crisis, asset allocation, assetbacked security, backtesting, Bernie Madoff, Bretton Woods, business process, call centre, collapse of Lehman Brothers, collateralized debt obligation, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, Donald Trump, en.wikipedia.org, family office, fixed income, high net worth, interest rate derivative, Isaac Newton, Long Term Capital Management, Mark Zuckerberg, merger arbitrage, NetJets, oil shock, pattern recognition, Ponzi scheme, quantitative easing, quantitative trading / quantitative ﬁnance, Renaissance Technologies, riskadjusted returns, risk/return, rolodex, short selling, Silicon Valley, South Sea Bubble, statistical model, Steve Jobs, systematic trading Wong says that the one thing most people don’t understand about systematic trading is the tradeoff between profit potential in the long term and the potential for short-term fluctuation and losses. “We are all about the long run,” he says. “It’s why I say, over and over, the trend is your friend.” “If you’re a macro trader and you basically have 20 positions, you better make sure that no more than two or three are wrong. But we base our positions on statistical models, and we take hundreds of positions. At any given time, a lot of them are going to be wrong, and we have to accept that. But in the long run, we’ll be more right than wrong.” Evidently—since 1990, AHL’s total returns have exceeded 1,000 percent. Still, AHL is hardly invulnerable. The financial crisis brought on a sharp reversal, and the firm remains vulnerable to the Fed-induced drop in market volatility. 
pages: 335 words: 94,657 
The Bogleheads' Guide to Investing by Taylor Larimore, Michael Leboeuf, Mel Lindauer Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
asset allocation, buy low sell high, corporate governance, correlation coefficient, Daniel Kahneman / Amos Tversky, diversification, diversified portfolio, Donald Trump, endowment effect, estate planning, financial independence, financial innovation, high net worth, index fund, late fees, Long Term Capital Management, loss aversion, Louis Bachelier, margin call, market bubble, mental accounting, passive investing, random walk, risk tolerance, risk/return, Sharpe ratio, statistical model, transaction costs, Vanguard fund, yield curve Mensa is an exclusive society whose membership is restricted to persons scoring in the top 2 percent on IQ tests. During a 15-year period when the S&P 500 had average annual returns of 15.3 percent, the Mensa Investment Club's performance averaged returns of only 2.5 percent. 3. In 1994, a hedge fund called Long Term Capital Management (LTCM) was created with the help of two Nobel Prize-winning economists. They believed they had a statistical model that could eliminate risk from investing. The fund was extremely leveraged. They controlled positions totaling $1.25 trillion, an amount equal to the annual budget of the U.S. government. After some spectacular early successes, a financial panic swept across Asia. In 1998, LTCM hemorrhaged and faced bankruptcy. To prevent a world economic collapse, the New York Federal Reserve orchestrated a buyout by 14 banks that put up a total of $3.6 billion to buy out the fund. 
pages: 377 words: 97,144 
Singularity Rising: Surviving and Thriving in a Smarter, Richer, and More Dangerous World by James D. Miller Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
23andMe, affirmative action, Albert Einstein, artificial general intelligence, Asperger Syndrome, barriers to entry, brain emulation, cloud computing, cognitive bias, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, David Brooks, David Ricardo: comparative advantage, Deng Xiaoping, en.wikipedia.org, feminist movement, Flynn Effect, friendly AI, hive mind, impulse control, indoor plumbing, invention of agriculture, Isaac Newton, John von Neumann, knowledge worker, Long Term Capital Management, low skilled workers, Netflix Prize, neurotypical, pattern recognition, Peter Thiel, phenotype, placebo effect, prisoner's dilemma, profit maximization, Ray Kurzweil, recommendation engine, reversible computing, Richard Feynman, Richard Feynman, Rodney Brooks, Silicon Valley, Singularitarianism, Skype, statistical model, Stephen Hawking, Steve Jobs, supervolcano, technological singularity, The Coming Technological Singularity, the scientific method, Thomas Malthus, transaction costs, Turing test, Vernor Vinge, Von Neumann architecture Nobel Prizewinning economist James Heckman has written that “an entire literature has found” that cognitive abilities “significantly affect wages.”147 Of course, “cognitive abilities” aren’t necessarily the same thing as g or IQ. Recall that the theory behind g, and therefore IQ’s importance, is that a single variable can represent intelligence. To check whether a single measure of cognitive ability has predictive value, Heckman developed a statistical model testing whether one number essentially representing g and another representing noncognitive ability can explain most of the variations in wages.148 Heckman’s model shows that it could. Heckman, however, carefully points out that noncognitive traits such as “sticktoitiveness” are at least as important as cognitive traits in determining wages—meaning that a lazy worker with a high IQ won’t succeed at Microsoft or Goldman Sachs. 
pages: 364 words: 99,613 
Servant Economy: Where America's Elite Is Sending the Middle Class by Jeff Faux Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
backtotheland, Bernie Sanders, Black Swan, Bretton Woods, BRICs, British Empire, call centre, centre right, cognitive dissonance, collateralized debt obligation, collective bargaining, Credit Default Swap, credit default swaps / collateralized debt obligations, crony capitalism, currency manipulation / currency intervention, David Brooks, David Ricardo: comparative advantage, falling living standards, financial deregulation, financial innovation, full employment, hiring and firing, Howard Zinn, Hyman Minsky, illegal immigration, indoor plumbing, informal economy, invisible hand, John Maynard Keynes: Economic Possibilities for our Grandchildren, lake wobegon effect, Long Term Capital Management, market fundamentalism, Martin Wolf, McMansion, medical malpractice, mortgage debt, Naomi Klein, new economy, oil shock, Plutocrats, plutocrats, price mechanism, price stability, private military company, Ralph Nader, reserve currency, rising living standards, Robert Shiller, Robert Shiller, rolodex, Ronald Reagan, school vouchers, Silicon Valley, singlepayer health, South China Sea, statistical model, Steve Jobs, Thomas L Friedman, Thorstein Veblen, too big to fail, trade route, Triangle Shirtwaist Factory, union organizing, upwardly mobile, urban renewal, War on Poverty, We are the 99%, working poor, Yogi Berra, Yom Kippur War Martin Wolf, “Why Obama’s Plan Is Still Inadequate and Incomplete,”Financial Times, January 13, 2009. 9. “Larry Summers and Michael Steele,” This Week with Christiane Amanpour, ABC News, February 8, 2009. 10. CNN Politics, Election Center, November 24, 2010, http://www.cnn.com/ELECTION/2010/results/polls.main. 11. 
Andrew Gelman, “Unsurprisingly, More People Are Worried about the Economy and Jobs Than about Deficit,” Statistical Modeling, Causal Inference, and Social Science, June 19, 2010, http://www.stat.columbia.edu/~cook/movabletype/archives/2010/06/unsurprisingly.html; Ryan Grim, “Mayberry Machiavellis: Obama Political Team Handcuffing Recovery,” Huffington Post, July 6, 2010, http://www.huffingtonpost.com/2010/07/06/mayberrymachiavellisoba_n_636770.html. 12. Grim, “Mayberry Machiavellis.” 13. Ryan Lizza, “The Obama Memos,” New Yorker, January 30, 2012. 14. 
pages: 323 words: 89,795 
Food and Fuel: Solutions for the Future by Andrew Heintzman, Evan Solomon, Eric Schlosser Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
agricultural Revolution, Berlin Wall, bigbox store, clean water, Community Supported Agriculture, corporate social responsibility, David Brooks, deindustrialization, distributed generation, energy security, Exxon Valdez, flex fuel, full employment, half of the world's population has never made a phone call, hydrogen economy, land reform, microcredit, Negawatt, oil shale / tar sands, oil shock, peak oil, RAND corporation, risk tolerance, Silicon Valley, statistical model, Upton Sinclair, uranium enrichment Relying on member countries to provide their own catch reports, the FAO has few safeguards to ensure that its statistics are accurate. Specifically, there were some indications that China’s catch reports were too high. For example, some of China’s major fish populations were declared overexploited decades ago. In 2001, Watson and Pauly published an eye-opening study in the journal Nature about the true status of our world’s fisheries. These researchers used a statistical model to compare China’s officially reported catches to those that would be expected, given oceanographic conditions and other factors. They determined that China’s actual catches were likely closer to one half their reported levels. The implications of China’s overreporting are dramatic: instead of global catches increasing by 0.33 million tonnes per year since 1988, as reported by the FAO, catches have actually declined by 0.36 million tonnes per year. 
pages: 411 words: 108,119 
The Irrational Economist: Making Decisions in a Dangerous World by Erwann Michel-Kerjan, Paul Slovic Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Andrei Shleifer, availability heuristic, bank run, Black Swan, Cass Sunstein, clean water, cognitive dissonance, collateralized debt obligation, complexity theory, conceptual framework, corporate social responsibility, Credit Default Swap, credit default swaps / collateralized debt obligations, crosssubsidies, Daniel Kahneman / Amos Tversky, endowment effect, experimental economics, financial innovation, Fractional reserve banking, George Akerlof, hindsight bias, incomplete markets, invisible hand, Isaac Newton, iterative process, Loma Prieta earthquake, London Interbank Offered Rate, market bubble, market clearing, moral hazard, mortgage debt, placebo effect, price discrimination, price stability, RAND corporation, Richard Thaler, Robert Shiller, Robert Shiller, Ronald Reagan, statistical model, stochastic process, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, too big to fail, transaction costs, ultimatum game, University of East Anglia, urban planning First, we tend to overreact when virgin risks occur. The particular danger, now both available and salient, is likely to be overestimated in the future. Second, and by contrast, we tend to raise our probability estimate insufficiently when an experienced risk occurs. Followup research should document these tendencies with many more examples, and in laboratory settings. If improved predictions are our goal, it should also provide rigorous statistical models of effective updating of virgin and experienced risks. Future inquiry should consider resembled risks as well. Evidence from both terrorist incidents and financial markets suggests that we have difficulty extrapolating from risks that, though varied, bear strong similarities. Behavioral biases such as these are difficult to counteract, but awareness of them is the first step. Requiring careful analysis of all available data could help decision makers to make better risk assessments. 
pages: 313 words: 101,403 
My Life as a Quant: Reflections on Physics and Finance by Emanuel Derman Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Berlin Wall, bioinformatics, BlackScholes formula, Brownian motion, capital asset pricing model, Claude Shannon: information theory, Emanuel Derman, fixed income, Gödel, Escher, Bach, haute couture, hiring and firing, implied volatility, interest rate derivative, Jeff Bezos, John von Neumann, law of one price, linked data, Long Term Capital Management, moral hazard, Murray GellMann, pre–internet, publish or perish, quantitative trading / quantitative ﬁnance, Richard Feynman, Sharpe ratio, statistical arbitrage, statistical model, Stephen Hawking, Steve Jobs, stochastic volatility, technology bubble, transaction costs, value at risk, volatility smile, Y2K, yield curve, zerocoupon bond The most complex, which used interest-rate simulation models of the BDT type I had helped develop at Goldman, was Salomon's option-adjusted spread model that reported the spread over Treasury bonds the pool would generate, on average, over all future interest-rate scenarios. We ran daily reports on the desk's inventory using both these models. Different clients preferred different metrics, depending on their sophistication and on the accounting rules and regulations to which they were subject. We also did some longer-term, client-focused research, developing improved statistical models for homeowner prepayments or programs for valuing the more exotic ARM-based structures that were growing in popularity. The traders on the desk used the option-adjusted spread model to decide how much to bid for newly available ARM pools. The calculation was arduous. Each pool consisted of a variety of mortgages with a range of coupons and a spectrum of servicing fees, and the option-adjusted spread was calculated by averaging over thousands of future scenarios, each one involving a month-by-month simulation of interest rates over hundreds of months. 
pages: 342 words: 94,762 
Wait: The Art and Science of Delay by Frank Partnoy Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
algorithmic trading, Atul Gawande, Bernie Madoff, Black Swan, blood diamonds, Cass Sunstein, Checklist Manifesto, cognitive bias, collapse of Lehman Brothers, collateralized debt obligation, corporate governance, Daniel Kahneman / Amos Tversky, delayed gratification, Flash crash, Frederick Winslow Taylor, George Akerlof, Google Earth, Hernando de Soto, High speed trading, impulse control, income inequality, Isaac Newton, Long Term Capital Management, Menlo Park, mental accounting, meta analysis, metaanalysis, Nick Leeson, paper trading, Paul Graham, payday loans, Ralph Nader, Richard Thaler, risk tolerance, Robert Shiller, Robert Shiller, Ronald Reagan, Saturday Night Live, six sigma, Spread Networks laid a new fibre optics cable between New York and Chicago, statistical model, Steve Jobs, The Market for Lemons, the scientific method, The Wealth of Nations by Adam Smith, upwardly mobile, Walter Mischel Cohen, “Separate Neural Systems Value Immediacy and Delayed Monetary Rewards,” Science 306 (2004): 503–507. It is worth noting that when economists attempt to describe human behavior using high-level math, it often doesn’t go particularly well. Because the math is complex, people are prone to rely on it without question. And the equations often are vulnerable to unrealistic assumptions. Most recently, the financial crisis was caused in part by overreliance on statistical models that didn’t take into account the chances of declines in housing prices. But that was just the most recent iteration: the collapse of Enron, the implosion of the hedge fund Long-Term Capital Management, the billions of dollars lost by rogue traders Kweku Adoboli, Jerome Kerviel, Nick Leeson, and others—all of these fiascos have, at their heart, a mistaken reliance on complex math. Nassim N. 
pages: 339 words: 88,732 
The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson, Andrew McAfee Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
2013 Report for America's Infrastructure  American Society of Civil Engineers  19 March 2013, 3D printing, access to a mobile phone, additive manufacturing, Airbnb, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, American Society of Civil Engineers: Report Card, Any sufficiently advanced technology is indistinguishable from magic, autonomous vehicles, barriers to entry, Baxter: Rethink Robotics, British Empire, business intelligence, business process, call centre, clean water, combinatorial explosion, computer age, computer vision, congestion charging, corporate governance, crowdsourcing, David Ricardo: comparative advantage, employer provided health coverage, en.wikipedia.org, Erik Brynjolfsson, factory automation, falling living standards, Filter Bubble, first square of the chessboard / second half of the chessboard, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, full employment, game design, global village, happiness index / gross national happiness, illegal immigration, immigration reform, income inequality, income per capita, indoor plumbing, industrial robot, informal economy, inventory management, James Watt: steam engine, Jeff Bezos, jimmy wales, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, Khan Academy, knowledge worker, Kodak vs Instagram, law of one price, low skilled workers, Lyft, Mahatma Gandhi, manufacturing employment, Mark Zuckerberg, Mars Rover, means of production, Narrative Science, Nate Silver, natural language processing, Network effects, new economy, New Urbanism, Nicholas Carr, Occupy movement, oil shale / tar sands, oil shock, pattern recognition, payday loans, price stability, Productivity paradox, profit maximization, Ralph Nader, Ray Kurzweil, recommendation engine, Report Card for America’s Infrastructure, Robert Gordon, Rodney Brooks, Ronald Reagan, Second Machine Age, 
selfdriving car, sharing economy, Silicon Valley, Simon Kuznets, six sigma, Skype, software patent, sovereign wealth fund, speech recognition, statistical model, Steve Jobs, Steven Pinker, Stuxnet, supplychain management, TaskRabbit, technological singularity, telepresence, The Bell Curve by Richard Herrnstein and Charles Murray, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, total factor productivity, transaction costs, Tyler Cowen: Great Stagnation, Vernor Vinge, Watson beat the top human players on Jeopardy!, winnertakeall economy, Y2K These days those initial investigations will take place over the Internet and consist of typing into a search engine phrases like “Phoenix real estate agent,” “Phoenix neighborhoods,” and “Phoenix two-bedroom house prices.” To test this hypothesis, Erik asked Google if he could access data about its search terms. He was told that he didn’t have to ask; the company made these data freely available over the Web. Erik and his doctoral student Lynn Wu, neither of whom was versed in the economics of housing, built a simple statistical model to look at the data utilizing the user-generated content of search terms made available by Google. Their model linked changes in search-term volume to later housing sales and price changes, predicting that if search terms like the ones above were on the increase today, then housing sales and prices in Phoenix would rise three months from now. They found their simple model worked. In fact, it predicted sales 23.6 percent more accurately than predictions published by the experts at the National Association of Realtors. 
pages: 294 words: 81,292 
Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
3D printing, AI winter, Amazon Web Services, artificial general intelligence, Automated Insights, Bernie Madoff, Bill Joy: nanobots, brain emulation, cellular automata, cloud computing, cognitive bias, computer vision, cuban missile crisis, Daniel Kahneman / Amos Tversky, Danny Hillis, data acquisition, don't be evil, Extropian, finite state, Flash crash, friendly AI, friendly fire, Google Glasses, Google X / Alphabet X, Isaac Newton, Jaron Lanier, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, Loebner Prize, lone genius, mutually assured destruction, natural language processing, Nicholas Carr, optical character recognition, PageRank, pattern recognition, Peter Thiel, prisoner's dilemma, Ray Kurzweil, Rodney Brooks, Search for Extraterrestrial Intelligence, selfdriving car, semantic web, Silicon Valley, Singularitarianism, Skype, smart grid, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, Stuxnet, superintelligent machines, technological singularity, The Coming Technological Singularity, traveling salesman, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, zero day Through several wellfunded projects, IBM pursues AGI, and DARPA seems to be backing every AGI project I look into. So, again, why not Google? When I asked Jason Freidenfelds, from Google PR, he wrote: … it’s much too early for us to speculate about topics this far down the road. We’re generally more focused on practical machine learning technologies like machine vision, speech recognition, and machine translation, which essentially is about building statistical models to match patterns—nothing close to the “thinking machine” vision of AGI. But I think Page’s quotation sheds more light on Google’s attitudes than Freidenfelds’s. 
And it helps explain Google’s evolution from the visionary, insurrectionist company of the 1990s, with the much touted slogan DON’T BE EVIL, to today’s opaque, Orwellian, personaldataaggregating behemoth. The company’s privacy policy shares your personal information among Google services, including Gmail, Google+, YouTube, and others. 
pages: 370 words: 94,968 
The Most Human Human: What Talking With Computers Teaches Us About What It Means to Be Alive by Brian Christian Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
4chan, Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Bertrand Russell: In Praise of Idleness, carbon footprint, cellular automata, Claude Shannon: information theory, cognitive dissonance, complexity theory, crowdsourcing, Donald Trump, Douglas Hofstadter, George Akerlof, Gödel, Escher, Bach, high net worth, Isaac Newton, Jacques de Vaucanson, Jaron Lanier, job automation, l'esprit de l'escalier, Loebner Prize, Menlo Park, Ray Kurzweil, RFID, Richard Feynman, Richard Feynman, Ronald Reagan, Skype, statistical model, Stephen Hawking, Steve Jobs, Steven Pinker, theory of mind, Turing machine, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy! The Turing test would seem to corroborate that. UCSD’s computational linguist Roger Levy: “Programs have gotten relatively good at what is actually said. We can devise complex new expressions, if we intend new meanings, and we can understand those new meanings. This strikes me as a great way to break the Turing test [programs] and a great way to distinguish yourself as a human. I think that in my experience with statistical models of language, it’s the unboundedness of human language that’s really distinctive.”4 Dave Ackley offers very similar confederate advice: “I would make up words, because I would expect programs to be operating out of a dictionary.” My mind on deponents and attorneys, I think of drug culture, how dealers and buyers develop their own micropatois, and how if any of these idiosyncratic reference systems started to become too standardized—if they use the wellknown “snow” for cocaine, for instance—their textmessage records and email records become much more legally vulnerable (i.e., have less room for deniability) than if the dealers and buyers are, like poets, ceaselessly inventing. 
pages: 353 words: 88,376 
The Investopedia Guide to Wall Speak: The Terms You Need to Know to Talk Like Cramer, Think Like Soros, and Buy Like Buffett by Jack (edited By) Guinan Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, asset allocation, assetbacked security, Brownian motion, business process, capital asset pricing model, clean water, collateralized debt obligation, correlation coefficient, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, discounted cash flows, diversification, diversified portfolio, dividendyielding stocks, equity premium, fixed income, implied volatility, index fund, interest rate swap, inventory management, London Interbank Offered Rate, margin call, market fundamentalism, mortgage debt, passive investing, performance metric, risk tolerance, riskadjusted returns, risk/return, shareholder value, Sharpe ratio, short selling, statistical model, time value of money, transaction costs, yield curve, zerocoupon bond Some examples of defined-contribution plans are 401(k) plans, money-purchase pension plans, and profit-sharing plans. Related Terms: • Defined-Benefit Plan • Defined-Contribution Plan • Individual Retirement Account—IRA • Roth IRA • Tax Deferred Quantitative Analysis What Does Quantitative Analysis Mean? A business or financial analysis technique that is used to understand market behavior by employing complex mathematical and statistical modeling, measurement, and research. By assigning a numerical value to variables, quantitative analysts try to replicate reality in mathematical terms. Quantitative analysis helps measure performance evaluation or valuation of a financial instrument. It also can be used to predict real-world events such as changes in a share’s price. Investopedia explains Quantitative Analysis In broad terms, quantitative analysis is a way of measuring things. 
pages: 338 words: 106,936 
The Physics of Wall Street: A Brief History of Predicting the Unpredictable by James Owen Weatherall Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, algorithmic trading, Antoine Gombaud: Chevalier de Méré, Asian financial crisis, bank run, Benoit Mandelbrot, Black Swan, BlackScholes formula, Bonfire of the Vanities, Bretton Woods, Brownian motion, butterfly effect, capital asset pricing model, Carmen Reinhart, Claude Shannon: information theory, collateralized debt obligation, collective bargaining, dark matter, Edward Lorenz: Chaos theory, Emanuel Derman, Eugene Fama: efficient market hypothesis, financial innovation, George Akerlof, Gerolamo Cardano, Henri Poincaré, invisible hand, Isaac Newton, iterative process, John Nash: game theory, Kenneth Rogoff, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, martingale, new economy, Paul Lévy, prediction markets, probability theory / Blaise Pascal / Pierre de Fermat, quantitative trading / quantitative ﬁnance, random walk, Renaissance Technologies, riskadjusted returns, Robert Gordon, Robert Shiller, Robert Shiller, Ronald Coase, Sharpe ratio, short selling, Silicon Valley, South Sea Bubble, statistical arbitrage, statistical model, stochastic process, The Chicago School, The Myth of the Rational Market, tulip mania, V2 rocket, volatility smile “Complex Critical Exponents From Renormalization Group Theory of Earthquakes: Implications for Earthquake Predictions.” Journal de Physique I 5 (5): 607–19. Sornette, Didier, and Christian Vanneste. 1992. “Dynamics and Memory Effects in Rupture of Thermal Fuse.” Physical Review Letters 68: 612–15. — — — . 1994. “Dendrites and Fronts in a Model of Dynamical Rupture with Damage.” Physical Review E 50 (6, December): 4327–45. Sornette, D., C. Vanneste, and L. Knopoff. 1992. “Statistical Model of Earthquake Foreshocks.” Physical Review A 45: 8351–57. Sourd, Véronique, Le. 2008. “Hedge Fund Performance in 2007.” EDHEC Risk and Asset Management Research Centre. Spence, Joseph. 1820. Observations, Anecdotes, and Characters, of Books and Men. London: John Murray. Stewart, James B. 1992. 
Den of Thieves. New York: Simon & Schuster. Stigler, Stephen M. 1986. The History of Statistics: The Measurement of Uncertainty Before 1900. 
pages: 312 words: 89,728 
The End of My Addiction by Olivier Ameisen Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, epigenetics, meta analysis, metaanalysis, placebo effect, randomized controlled trial, statistical model ., Hansen, H. J., Sunde, N. et al. (2002) Evidence of tolerance to baclofen in treatment of severe spasticity with intrathecal baclofen. Clinical Neurology and Neurosurgery 104, 142–145. Pelc, I., Ansoms, C., Lehert, P. et al. (2002) The European NEAT program: an integrated approach using acamprosate and psychosocial support for the prevention of relapse in alcoholdependent patients with a statistical modeling of therapy success prediction. Alcoholism: Clinical and Experimental Research 26, 1529–1538. Roberts, D. C. and Andrews, M. M. (1997) Baclofen suppression of cocaine selfadministration: demonstration using a discrete trials procedure. Psychopharmacology (Berlin) 131, 271–277. Shoaib, M., Swanner, L. S., Beyer, C. E. et al. (1998) The GABAB agonist baclofen modifies cocaine selfadministration in rats. 
pages: 561 words: 87,892 
Losing Control: The Emerging Threats to Western Prosperity by Stephen D. King Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Admiral Zheng, assetbacked security, barriers to entry, Berlin Wall, Bernie Madoff, Bretton Woods, BRICs, British Empire, capital controls, Celtic Tiger, central bank independence, collateralized debt obligation, corporate governance, credit crunch, crony capitalism, currency manipulation / currency intervention, currency peg, David Ricardo: comparative advantage, demographic dividend, demographic transition, Deng Xiaoping, Diane Coyle, Fall of the Berlin Wall, financial deregulation, financial innovation, Francis Fukuyama: the end of history, full employment, George Akerlof, German hyperinflation, Gini coefficient, hiring and firing, income inequality, income per capita, inflation targeting, invisible hand, Isaac Newton, knowledge economy, labour market flexibility, labour mobility, low skilled workers, market clearing, Martin Wolf, Mexican peso crisis / tequila crisis, Naomi Klein, new economy, Ponzi scheme, price mechanism, price stability, purchasing power parity, rentseeking, reserve currency, rising living standards, Ronald Reagan, savings glut, Silicon Valley, Simon Kuznets, sovereign wealth fund, spice trade, statistical model, technology bubble, The Great Moderation, The Market for Lemons, The Wealth of Nations by Adam Smith, Thomas Malthus, trade route, transaction costs, Washington Consensus, women in the workforce, workingage population, Y2K, Yom Kippur War WE’RE NOT ON OUR OWN In my twentyfive years as a professional economist, initially as a civil servant in Whitehall but, for the most part, as an employee of a major international bank, I’ve spent a good deal of time looking into the future. As the emerging nations first appeared on the economic radar screen, I began to realize I could talk about the future only by delving much further into the past. 
I wasn’t interested merely in the history incorporated into statistical models of the economy, a history which typically includes just a handful of years and therefore ignores almost all the interesting economic developments that have taken place over the last millennium. Instead, the history that mattered to me had to capture the long sweep of economic and political progress and all too frequent reversal. In recent years, as the emerging nations have taken their seats at the international table of powers and superpowers, economic and political history has become increasingly important. 
pages: 322 words: 84,752 
Pax Technica: How the Internet of Things May Set Us Free or Lock Us Up by Philip N. Howard Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Affordable Care Act / Obamacare, Berlin Wall, bitcoin, blood diamonds, Bretton Woods, Brian Krebs, British Empire, call centre, Chelsea Manning, citizen journalism, clean water, cloud computing, corporate social responsibility, crowdsourcing, Edward Snowden, en.wikipedia.org, failed state, Fall of the Berlin Wall, feminist movement, Filter Bubble, Firefox, Francis Fukuyama: the end of history, Google Earth, Howard Rheingold, income inequality, informal economy, Internet of things, Julian Assange, Kibera, Kickstarter, land reform, MPesa, Marshall McLuhan, megacity, Mikhail Gorbachev, mobile money, Mohammed Bouazizi, national security letter, Network effects, obamacare, Occupy movement, packet switching, pension reform, prediction markets, sentiment analysis, Silicon Valley, Skype, spectrum auction, statistical model, Stuxnet, trade route, uranium enrichment, WikiLeaks, zero day This makes it tough to learn from the causes and consequences of technology diffusion throughout history. Important events and recognizable causal connections can’t be replicated or falsified. We can’t repeat the Arab Spring in some kind of experiment. We can’t test its negation—an Arab Spring that never happened, or an Arab Spring minus one key factor that resulted in a different outcome. We don’t have enough large datasets about Arab Spring–like events to run statistical models. That doesn’t mean we shouldn’t try to learn from the real events that happened. In fact, for many in the social sciences, tracing how real events unfolded is the best way to understand political change. The richest explanations of the fall of the Berlin Wall, for example, as sociologist Steve Pfaff crafts them, come from such process tracing.2 We do, however, know enough to make some educated guesses about what will happen next. 
pages: 322 words: 88,197 
Wonderland: How Play Made the Modern World by Steven Johnson Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Ada Lovelace, Alfred Russel Wallace, Antoine Gombaud: Chevalier de Méré, Berlin Wall, bitcoin, Book of Ingenious Devices, Buckminster Fuller, Claude Shannon: information theory, Clayton Christensen, colonial exploitation, computer age, conceptual framework, crowdsourcing, cuban missile crisis, Drosophila, Fellow of the Royal Society, game design, global village, Hedy Lamarr / George Antheil, HyperCard, invention of air conditioning, invention of the printing press, invention of the telegraph, Islamic Golden Age, Jacquard loom, Jacquard loom, Jacques de Vaucanson, James Watt: steam engine, Jane Jacobs, John von Neumann, jointstock company, JosephMarie Jacquard, Landlord's Game, lone genius, megacity, Minecraft, Murano, Venice glass, music of the spheres, Necker cube, New Urbanism, Oculus Rift, On the Economy of Machinery and Manufactures, pattern recognition, pets.com, placebo effect, probability theory / Blaise Pascal / Pierre de Fermat, profit motive, QWERTY keyboard, Ray Oldenburg, spice trade, spinning jenny, statistical model, Steve Jobs, Steven Pinker, Stewart Brand, supplychain management, talking drums, the built environment, The Great Good Place, the scientific method, The Structural Transformation of the Public Sphere, trade route, Turing machine, Turing test, Upton Sinclair, urban planning, Victor Gruen, Watson beat the top human players on Jeopardy!, white flight, Whole Earth Catalog, working poor, Wunderkammern It was the first time anyone had begun talking, mathematically at least, about what we now call life expectancy. Probability theory served as a kind of conceptual fossil fuel for the modern world. It gave rise to the modern insurance industry, which for the first time could calculate with some predictive power the claims it could expect when insuring individuals or industries. Capital markets—for good and for bad—rely extensively on elaborate statistical models that predict future risk. 
“The pundits and pollsters who today tell us who is likely to win the next election make direct use of mathematical techniques developed by Pascal and Fermat,” the mathematician Keith Devlin writes. “In modern medicine, future-predictive statistical methods are used all the time to compare the benefits of various drugs and treatments with their risks.” The astonishing safety record of modern aviation is in part indebted to the dice games Pascal and Fermat analyzed; today’s aircraft are statistical assemblages, with each part’s failure rate modeled to multiple decimal places. 
pages: 364 words: 102,926 
What the F: What Swearing Reveals About Our Language, Our Brains, and Ourselves by Benjamin K. Bergen Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
correlation does not imply causation, information retrieval, pre–internet, Ronald Reagan, statistical model, Steven Pinker So if you believe that exposure to violence in media could be a confounding factor—it correlates with exposure to profanity and could explain some amount of aggression—then you measure not only how much profanity but also how much violence children are exposed to. The two will probably correlate, but the key point is that you can measure exactly how much media violence correlates with child aggressiveness, and you can pull that apart in a statistical model from the amount that profanity exposure correlates with child aggressiveness. The authors of the Pediatrics study tried to do this. But to know that profanity exposure per se and not any of these other possible confounding factors is responsible for increased reports of aggressiveness, you’d need to do the same thing not just for exposure to media violence, as the authors did, but for every other possible confounding factor, which they did not. 
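The idea of pulling a confounder apart from the factor of interest can be illustrated with a small regression sketch. This is only an illustration and not the method of the Pediatrics study: the function names and the toy numbers are invented here, and the data are constructed so that aggression depends on violence exposure alone.

```python
def simple_slope(x, y):
    """Slope of y on x alone (ignores any confounder)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def partial_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 fitted together.

    Each slope is the association of that predictor with y
    while the other predictor is held fixed.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Hypothetical toy data (not from any study): the two exposures
# correlate, but aggression is built from violence exposure only.
profanity = [0, 1, 2, 3, 4, 5]
violence = [1, 0, 3, 2, 5, 4]
aggression = [2 * v + 1 for v in violence]

naive = simple_slope(profanity, aggression)
b_profanity, b_violence = partial_slopes(profanity, violence, aggression)
```

In this toy setup the naive slope for profanity is clearly positive, while its slope in the joint model drops to zero once violence is included. An unmeasured confounder would leave the naive picture uncorrected, which is the excerpt's point about needing to measure every possible confounding factor.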
pages: 297 words: 91,141 
Market Sense and Nonsense by Jack D. Schwager Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
asset allocation, Bernie Madoff, Brownian motion, collateralized debt obligation, commodity trading advisor, conceptual framework, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, diversification, diversified portfolio, fixed income, high net worth, implied volatility, index arbitrage, index fund, London Interbank Offered Rate, Long Term Capital Management, margin call, market bubble, market fundamentalism, merger arbitrage, pattern recognition, performance metric, pets.com, Ponzi scheme, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, risk/return, Robert Shiller, Sharpe ratio, short selling, statistical arbitrage, statistical model, transaction costs, two-sided market, value at risk, yield curve If, as occurred in 2008, they need to liquidate at the same time because of a flight-to-safety psychology in the market, the huge imbalance between supply and demand can result in managers being forced to liquidate positions at deeply discounted prices. Statistical arbitrage. The premise underlying statistical arbitrage is that short-term imbalances in buy and sell orders cause temporary price distortions, which provide short-term trading opportunities. Statistical arbitrage is a mean-reversion strategy that seeks to sell excessive strength and buy excessive weakness based on statistical models that define when short-term price moves in individual equities are considered out of line relative to price moves in related equities. The origin of the strategy was a subset of statistical arbitrage called pairs trading. In pairs trading, the price ratios of closely related stocks are tracked (e.g., Ford and General Motors), and when the mathematical model indicates that one stock has gained too much versus the other (either by rising more or by declining less), it is sold and hedged by the purchase of the related equity in the pair. 
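The pairs-trading rule described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the model any fund actually uses: the z-score test, the `entry_z` threshold of 2.0, and the function name `pairs_signal` are all choices made here for the example.

```python
from statistics import mean, stdev


def pairs_signal(prices_a, prices_b, entry_z=2.0):
    """Compare the latest A/B price ratio with its own history.

    Returns "sell A, buy B" when A has gained too much versus B,
    "buy A, sell B" in the opposite case, and "hold" otherwise.
    entry_z is a hypothetical threshold in standard deviations.
    """
    ratios = [a / b for a, b in zip(prices_a, prices_b)]
    if len(ratios) < 3:
        raise ValueError("need at least three paired prices")
    history, latest = ratios[:-1], ratios[-1]
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return "hold"  # no historical variation to measure against
    z = (latest - mu) / sigma
    if z > entry_z:
        return "sell A, buy B"
    if z < -entry_z:
        return "buy A, sell B"
    return "hold"


# Invented prices for illustration only.
stock_b = [10.0] * 8
rich_a = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 12.0]
cheap_a = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 8.0]
flat_a = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.0]
```

Measuring the latest ratio against only the earlier history keeps the newest observation from inflating its own baseline. A real strategy would also need exit rules, position sizing, and transaction costs, none of which are modeled here.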
pages: 345 words: 92,849 
Equal Is Unfair: America's Misguided Fight Against Income Inequality by Don Watkins, Yaron Brook Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
3D printing, Affordable Care Act / Obamacare, Apple II, barriers to entry, Berlin Wall, Bernie Madoff, bluecollar work, business process, Capital in the TwentyFirst Century by Thomas Piketty, Cass Sunstein, collective bargaining, colonial exploitation, corporate governance, correlation does not imply causation, Credit Default Swap, crony capitalism, David Brooks, deskilling, Edward Glaeser, Elon Musk, en.wikipedia.org, financial deregulation, immigration reform, income inequality, indoor plumbing, inventory management, invisible hand, Isaac Newton, Jeff Bezos, Jony Ive, laissezfaire capitalism, Louis Pasteur, low skilled workers, means of production, minimum wage unemployment, Naomi Klein, new economy, obamacare, Peter Singer: altruism, Peter Thiel, profit motive, rent control, Ronald Reagan, Silicon Valley, Skype, statistical model, Steve Jobs, Steve Wozniak, The Spirit Level, too big to fail, trickledown economics, Uber for X, urban renewal, War on Poverty, women in the workforce, working poor You will no doubt encounter many claims that you can’t easily evaluate: academic studies that say inequality undermines mobility or economic progress, claims about “the bulk of the gains” going to “the rich” rather than the middle class, stories about injustices supposedly committed by “the 1 percent” against “the 99 percent.” In these cases the question to ask is: “Assuming this is a problem, what is your solution?” Inevitably, the inequality critics’ answer will be that some form of force must be used to tear down the top by depriving them of the earned, and to prop up the bottom by giving them the unearned. But nothing can justify an injustice, nor can any statistical model erase the fact that all of the values human life requires are a product of the human mind, and that the human mind cannot function without freedom. Don’t concede that the inequality alarmists value equality. The egalitarians pose as defenders of equality. 
But there is no such thing as being for equality across the board: different types of equality conflict. Namely, economic equality (including equality of opportunity) is incompatible with political equality. 

The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, discrete time, George Gilder, Google Earth, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, late capitalism, linked data, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supplychain management, the scientific method, The Signal and the Noise by Nate Silver, transaction costs The difference between the humanities and social sciences in this respect is because the statistics used in the digital humanities are largely descriptive – identifying patterns and plotting them as counts, graphs, and maps. In contrast, the computational social sciences employ the scientific method, complementing descriptive statistics with inferential statistics that seek to identify causality. In other words, they are underpinned by an epistemology wherein the aim is to produce sophisticated statistical models that explain, simulate and predict human life. This is much more difficult to reconcile with postpositivist approaches. The defence then rests on the utility and value of the method and models, not on providing complementary analysis of a more expansive set of data. 
There are alternatives to this position, such as that adopted within critical GIS (Geographic Information Science) and radical statistics, and those who utilise mixed-method approaches, that either employ models and inferential statistics while being mindful of their shortcomings, or more commonly only utilise descriptive statistics that are complemented with small data studies. 
pages: 291 words: 90,200 
Networks of Outrage and Hope: Social Movements in the Internet Age by Manuel Castells Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
access to a mobile phone, banking crisis, call centre, centre right, citizen journalism, cognitive dissonance, collective bargaining, conceptual framework, crowdsourcing, currency manipulation / currency intervention, disintermediation, en.wikipedia.org, housing crisis, income inequality, microcredit, Mohammed Bouazizi, Occupy movement, offshore financial centre, Port of Oakland, social software, statistical model, We are the 99%, web application, WikiLeaks, World Values Survey, young professional He wrote: “Countries where civil society and journalism made active use of the new information technologies subsequently experience a radical democratic transition or significant solidification of their democratic institutions” (2011: 200). Particularly significant, before the Arab Spring, was the transformation of social involvement in Egypt and Bahrain with the help of ICT diffusion. In a stream of research conducted in 2011 and 2012 after the Arab uprisings, Howard and Hussain, using a series of quantitative and qualitative indicators, probed a multicausal, statistical model of the processes and outcomes of the Arab uprisings by using fuzzy logic (Hussain and Howard 2012). They found that the extensive use of digital networks by a predominantly young population of demonstrators had a significant effect on the intensity and power of these movements, starting with a very active debate on social and political demands in the social media before the demonstrations’ onset. 
pages: 623 words: 448,848 
Food Allergy: Adverse Reactions to Foods and Food Additives by Dean D. Metcalfe Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Albert Einstein, bioinformatics, epigenetics, impulse control, life extension, meta analysis, metaanalysis, mouse model, pattern recognition, phenotype, placebo effect, randomized controlled trial, statistical model, stem cell Of course, the group of 29 subjects must be representative of the entire allergic population. Furthermore, this approach allows for the possibility that almost 10% of patients allergic to that food will react to ingestion of that dose and this possibility may be considered as too high. Modeling of collective data from several studies is probably the preferred approach to determine the populationbased threshold, although the best statistical model to use remains to be determined [8]. typical servings of these foods. Thus, it is tempting to speculate that those individuals with very low individual threshold doses would be less likely to outgrow their food allergy or would require a longer time period for that to occur. In at least one study [25], individuals with histories of severe food allergies had significantly lower individual threshold doses. … As it stands, most foodallergic patients do not know their individual threshold dose because few allergy clinics make this assessment. The knowledge of individual threshold doses would allow physicians to offer more complete advice to foodallergic patients in terms of their comparative vulnerability to hidden residues of allergenic foods. The clinical determination of large numbers of individual threshold doses would allow estimates of populationbased thresholds using appropriate statistical modeling approaches. The food industry and regulatory agencies could also make effective use of information on populationbased threshold doses to establish improved labeling regulations and practices and allergen control programs. References 1 Gern JE, Yang E, Evrard HM, et al. Allergic reactions to milkcontaminated “nondairy” products. N Engl J Med 1991;324:976–9. 2 Yman IM. 
Detection of inadequate labeling and contamination as causes of allergic reactions to foods. … Allergy 2005;60:865–70. 74 Bindslev-Jensen C. Standardization of double-blind, placebo-controlled food challenges. Allergy 2001;56:75–7. 75 Caffarelli C, Petroccione T. False-negative food challenges in children with suspected food allergy. Lancet 2001;358:1871–2. 76 Sampson HA. Use of food-challenge tests in children. Lancet 2001;358:1832–3. 77 Briggs D, Aspinall L, Dickens A, Bindslev-Jensen C. Statistical model for assessing the proportion of subjects with subjective sensitisations in adverse reactions to foods. Allergy 2001;56:83–5. 78 Chinchilli VM, Fisher L, Craig TJ. Statistical issues in clinical trials that involve the double-blind, placebo-controlled food challenge. J Allergy Clin Immunol 2005;115:592–7. CHAPTER 21 IgE Tests: In Vitro Diagnosis Kirsten Beyer KEY CONCEPTS • The presence of food allergen-specific IgE determines the sensitization to a specific food. 
pages: 945 words: 292,893 
Seveneves by Neal Stephenson Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
clean water, Colonization of Mars, Danny Hillis, double helix, epigenetics, fault tolerance, Fellow of the Royal Society, Filipino sailors, gravity well, Isaac Newton, Jeff Bezos, kremlinology, Kuiper Belt, microbiome, phenotype, Potemkin village, pre–internet, random walk, remote working, side project, Silicon Valley, Skype, statistical model, Stewart Brand, supervolcano, the scientific method, Tunguska event, zero day, éminence grise Would it stay together as a compact swarm or spread out? Or would it split up into two or more distinct swarms that would try different things? Arguments could be made for all of the above scenarios and many more, depending on what actually happened in the Hard Rain. Since the Earth had never before been bombarded by a vast barrage of lunar fragments, there was no way to predict what it was going to be like. Statistical models had been occupying much of Doob’s time because they had a big influence on which scenarios might be most worth preparing for. To take a simplistic example, if the moon could be relied on to disassemble itself into pea-sized rocks, then the best strategy was to remain in place and not worry too much about maneuvering. It was hard to detect a pea-sized bolide until it was pretty close, by which time it was probably too late to take evasive action. … They could clearly make out Cleft’s radar signature, as well as those of many other big rocks that traveled in its vicinity. A clutter of faint noise and clouds on the optical telescope gave them data about the density of objects too small and numerous to resolve. All of it fed into the plan. Doob looked tired, and nodded off frequently, and hadn’t eaten a square meal since the last perigee, but he pulled himself together when he was needed and fed any new information into a statistical model, prepared long in advance, that would enable them to maximize their chances by ditching Amalthea and doing the big final burn at just the right times. 
But as he kept warning Ivy and Zeke, the time was coming soon when they would become so embroiled in the particulars of which rock was coming from which direction that it wouldn’t be a statistical exercise anymore. It would be a video game, and its objective would be to build up speed while merging into a stream of large and small rocks that would be overtaking them with the speed of artillery shells. 
pages: 473 words: 154,182 
Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
carbon footprint, clean water, collective bargaining, dark matter, Deng Xiaoping, Exxon Valdez, Filipino sailors, Google Earth, illegal immigration, indoor plumbing, intermodal, Isaac Newton, means of production, microbiome, Panamax, postPanamax, profit motive, Skype, statistical model, Thorstein Veblen, traveling salesman In June, 690 miles southwest of Sitka, for the first time since the spill, dramatic complications occur. As it collides with the continental shelf and then with the freshwater gushing out of the rainforests of the coastal mountains, and then with the coast, the North Pacific Drift loses its coherence, crazies, sends out fractal meanders and eddies and tendrils that tease the four voyagers apart. We don’t know for certain what happens next, but statistical models suggest that at least one of the four voyagers I’m imagining—the frog, let’s pretend—will turn south, carried by an eddy or a meander into the California Current, which will likely deliver it, after many months, into the North Pacific Subtropical Gyre. You may now forget about the frog. We already know its story—how, as it disintegrates, it will contribute a few tablespoons of plastic to the Garbage Patch, or to Hawaii’s Plastic Beach, or to the dinner of an albatross, or to a sample collected in the codpiece of Charlie Moore’s manta trawl. 
pages: 755 words: 121,290 
Statistics hacks by Bruce Frey Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Berlin Wall, correlation coefficient, Daniel Kahneman / Amos Tversky, distributed generation, en.wikipedia.org, feminist movement, game design, Hacker Ethic, index card, Milgram experiment, pvalue, placemaking, RFID, Search for Extraterrestrial Intelligence, SETI@home, Silicon Valley, statistical model He is proudest of two accomplishments: his marriage to his sweet wife, and his purchase of a lowgrade copy of Showcase #4, a comic book wherein the "Silver Age Flash first appears," whatever that means. Contributors The following people contributed their hacks, writing, and inspiration to this book: Joseph Adler is the author of Baseball Hacks (O'Reilly), and a researcher in the Advanced Product Development Group at VeriSign, focusing on problems in user authentication, managed security services, and RFID security. Joe has years of experience analyzing data, building statistical models, and formulating business strategies as an employee and consultant for companies including DoubleClick, American Express, and Dun & Bradstreet. He is a graduate of the Massachusetts Institute of Technology with an Sc.B. and an M.Eng. in computer science and computer engineering. Joe is an unapologetic Yankees fan, but he appreciates any good baseball game. Joe lives in Silicon Valley with his wife, two cats, and a DirecTV satellite dish. 
pages: 484 words: 136,735 
Capitalism 4.0: The Birth of a New Economy in the Aftermath of Crisis by Anatole Kaletsky Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
bank run, banking crisis, Benoit Mandelbrot, Berlin Wall, Black Swan, bonus culture, Bretton Woods, BRICs, Carmen Reinhart, cognitive dissonance, collapse of Lehman Brothers, Corn Laws, correlation does not imply causation, credit crunch, currency manipulation / currency intervention, David Ricardo: comparative advantage, deglobalization, Deng Xiaoping, Edward Glaeser, Eugene Fama: efficient market hypothesis, eurozone crisis, experimental economics, F. W. de Klerk, failed state, Fall of the Berlin Wall, financial deregulation, financial innovation, Financial Instability Hypothesis, floating exchange rates, full employment, George Akerlof, global rebalancing, Hyman Minsky, income inequality, invisible hand, Isaac Newton, Joseph Schumpeter, Kenneth Rogoff, laissezfaire capitalism, Long Term Capital Management, mandelbrot fractal, market design, market fundamentalism, Martin Wolf, moral hazard, mortgage debt, new economy, Northern Rock, offshore financial centre, oil shock, paradox of thrift, peak oil, pets.com, Ponzi scheme, postindustrial society, price stability, profit maximization, profit motive, quantitative easing, Ralph Waldo Emerson, random walk, rentseeking, reserve currency, rising living standards, Robert Shiller, Robert Shiller, Ronald Reagan, shareholder value, short selling, South Sea Bubble, sovereign wealth fund, special drawing rights, statistical model, The Chicago School, The Great Moderation, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, too big to fail, Washington Consensus Mandelbrot’s research program undermined most of the mathematical assumptions of modern portfolio theory, which is the basis for the conventional risk models used by regulators, creditrating agencies, and unsophisticated financial institutions. 
Mandelbrot’s analysis, presented to nonspecialist readers in his 2004 book (Mis)behavior of Markets, shows with mathematical certainty that these standard statistical models based on neoclassical definitions of efficient markets and rational expectations among investors cannot be true. Had these models been valid, events such as the 1987 stock market crash and the bankruptcy of the 1998 hedge fund crisis would not have occurred even once in the fifteen billion years since the creation of the universe.9 In fact, four such extreme events occurred in just two weeks after the Lehman bankruptcy. 
pages: 561 words: 120,899 
The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy by Sharon Bertsch McGrayne Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
bioinformatics, British Empire, Claude Shannon: information theory, Daniel Kahneman / Amos Tversky, double helix, Edmond Halley, Fellow of the Royal Society, full text search, Henri Poincaré, Isaac Newton, John Nash: game theory, John von Neumann, linear programming, meta analysis, metaanalysis, Nate Silver, pvalue, placebo effect, prediction markets, RAND corporation, recommendation engine, Renaissance Technologies, Richard Feynman, Richard Feynman, Richard Feynman: Challenger Oring, Ronald Reagan, speech recognition, statistical model, stochastic process, Thomas Kuhn: the structure of scientific revolutions, traveling salesman, Turing machine, Turing test, uranium enrichment, Yom Kippur War Cochran WG, Mosteller F, Tukey JW. (1954) Statistical Problems of the Kinsey Report on Sexual Behavior in the Human Male. American Statistical Association. Converse, Jean M. (1987) Survey Research in the United States: Roots and Emergence 1890–1960. University of California Press. Fienberg SE, Hoaglin DC, eds. (2006) Selected Papers of Frederick Mosteller. Springer. Fienberg SE et al., eds. (1990) A Statistical Model: Frederick Mosteller’s Contributions to Statistics, Science and Public Policy. SpringerVerlag. HedleyWhyte J. (2007) Frederick Mosteller (1916–2006): Mentoring, A Memoir. International Journal of Technology Assessment in Health Care (23) 152–54. Ingelfinger, Joseph, et al. (1987) Biostatistics in Clinical Medicine. Macmillan. Jones, James H. (1997) Alfred C. Kinsey: A Public/Private Life. 
pages: 402 words: 110,972 
Nerds on Wall Street: Math, Machines and Wired Markets by David J. Leinweber Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
AI winter, algorithmic trading, asset allocation, banking crisis, barriers to entry, Big bang: deregulation of the City of London, butterfly effect, buttonwood tree, buy low sell high, capital asset pricing model, citizen journalism, collateralized debt obligation, corporate governance, Craig Reynolds: boids flock, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Danny Hillis, demand response, disintermediation, distributed generation, diversification, diversified portfolio, Emanuel Derman, en.wikipedia.org, experimental economics, financial innovation, Gordon Gekko, implied volatility, index arbitrage, index fund, information retrieval, Internet Archive, John Nash: game theory, Khan Academy, load shedding, Long Term Capital Management, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, market fragmentation, market microstructure, Mars Rover, moral hazard, mutually assured destruction, natural language processing, Network effects, optical character recognition, paper trading, passive investing, pez dispenser, phenotype, prediction markets, quantitative hedge fund, quantitative trading / quantitative ﬁnance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Renaissance Technologies, Richard Stallman, risk tolerance, riskadjusted returns, risk/return, Ronald Reagan, semantic web, Sharpe ratio, short selling, Silicon Valley, Small Order Execution System, smart grid, smart meter, social web, South Sea Bubble, statistical arbitrage, statistical model, Steve Jobs, Steven Levy, Tacoma Narrows Bridge, the scientific method, The Wisdom of Crowds, time value of money, too big to fail, transaction costs, Turing machine, Upton Sinclair, value at risk, Vernor Vinge, yield curve, Yogi Berra Shaw went on to found D.E. Shaw & Company, one of the largest and most consistently successful quantitative hedge funds. 
Fischer Black’s Quantitative Strategies Group at Goldman Sachs were algo pioneers. They were perhaps the first to use computers for actual trading, as well as for identifying trades. The early alpha seekers were the first combatants in the algo wars. Pairs trading, popular at the time, relied on statistical models. Finding stronger short-term correlations than the next guy had big rewards. Escalation beyond pairs to groups of related securities was inevitable. Parallel developments in futures markets opened the door to electronic index arbitrage trading. Automated market making was a valuable early algorithm. In quiet, normal markets buying low and selling high across the spread was easy money. 

Beginning R: The Statistical Programming Language by Mark Gardener Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
correlation coefficient, distributed generation, natural language processing, New Urbanism, pvalue, statistical model What steps will you need to carry out to conduct an ANOVA? 5. The bats data yielded a significant interaction term in the two-way ANOVA. Look at this further. Make a graphic of the data and then follow up with a post-hoc analysis. Draw a graph of the interaction.
What You Learned in This Chapter
Topic | Key Points
Formula syntax: response ~ predictor | The formula syntax enables you to specify complex statistical models. Usually the response variables go on the left and predictor variables go on the right. The syntax can also be used in more simple situations and for graphics.
Stacking samples: stack() | In more complex analyses, the data need to be in a layout where each column is a separate item; that is, a column for the response variable and a column for each predictor variable. The stack() command can rearrange data into this layout. 
pages: 349 words: 134,041 
Traders, Guns & Money: Knowns and Unknowns in the Dazzling World of Derivatives by Satyajit Das Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
accounting loophole / creative accounting, Albert Einstein, Asian financial crisis, assetbacked security, Black Swan, BlackScholes formula, Bretton Woods, BRICs, Brownian motion, business process, buy low sell high, call centre, capital asset pricing model, collateralized debt obligation, complexity theory, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, cuban missile crisis, currency peg, disintermediation, diversification, diversified portfolio, Eugene Fama: efficient market hypothesis, financial innovation, fixed income, Haight Ashbury, high net worth, implied volatility, index arbitrage, index card, index fund, interest rate derivative, interest rate swap, Isaac Newton, job satisfaction, locking in a profit, Long Term Capital Management, mandelbrot fractal, margin call, market bubble, Marshall McLuhan, mass affluent, merger arbitrage, Mexican peso crisis / tequila crisis, moral hazard, mutually assured destruction, new economy, New Journalism, Nick Leeson, offshore financial centre, oil shock, Parkinson's law, placebo effect, Ponzi scheme, purchasing power parity, quantitative trading / quantitative ﬁnance, random walk, regulatory arbitrage, riskadjusted returns, risk/return, shareholder value, short selling, South Sea Bubble, statistical model, technology bubble, the medium is the message, time value of money, too big to fail, transaction costs, value at risk, Vanguard fund, volatility smile, yield curve, Yogi Berra, zerocoupon bond The antipathy between the front and back offices is a conscious strategy to keep the wild beasts caged. The back office has a large, diverse cast. Risk managers are employed to ensure that the risk taken by traders is within specified limits. They ensure that the firm does not selfdestruct as a result of some trader betting the bank on the correlation between the lunar cycle and the $/yen exchange rate. Risk managers use elaborate statistical models to keep tabs on the traders. 
Like double and triple agents, risk managers spy on the traders, each other and even themselves. Lawyers are employed to ensure that hopefully legally binding contracts are signed. Compliance officers ensure that the firm does not break any laws or at least is not caught breaking any laws. They keep lists of all documents that need to be shredded in case of a problem. 
pages: 486 words: 132,784 
Inventors at Work: The Minds and Motivation Behind Modern Inventions by Brett Stern Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Apple II, augmented reality, autonomous vehicles, bioinformatics, Build a better mousetrap, business process, cloud computing, computer vision, cyberphysical system, distributed generation, game design, Grace Hopper, Richard Feynman, Silicon Valley, skunkworks, Skype, smart transportation, speech recognition, statistical model, stealth mode startup, Steve Jobs, Steve Wozniak, the market place, Yogi Berra Keck: I guess my thought is, “Whatever works.” Computer modeling has gotten better and better over the years. When we were doing our work then, computers really weren’t around. They were in the university. The math statistics group at Corning had a big IBM mainframe computer that could tackle really difficult problems, but modeling capabilities just didn’t exist. I had grown up with computers that were basically doing the statistical modeling of molecular spectrum. So, I was reasonably familiar with doing this and eventually got the first computer in the lab. I was actually taking data off the optical bench that was in my lab directly into a computer. Would I have used 3D modeling if the capability had existed then? Sure, you use whatever tool is available to you. Modeling can circumvent a lot of dead ends. But, at the end of the day, if the thing has got to go into a customer’s hands, you have to make something tangible. 
pages: 484 words: 120,507 
The Last Lingua Franca: English Until the Return of Babel by Nicholas Ostler Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
barriers to entry, BRICs, British Empire, call centre, en.wikipedia.org, European colonialism, Internet Archive, invention of writing, Isaac Newton, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, open economy, Republic of Letters, Scramble for Africa, statistical model, trade route, upwardly mobile This may seem a hopelessly utopian dream, but increasingly progress in all forms of language technology depends on automatic processing of what are now called language resources. In essence these resources are nothing other than large quantities of text (text corpora) or recorded speech (speech databases) in some form that is systematic and well documented enough to be tractable for digital analysis. From these files, it is possible to derive indices, glossaries, and thesauri, which can be the basis for dictionaries; it is also possible to derive statistical models of the languages, and (if they are multilingual files as, e.g., the official dossiers of the Canadian Parliament, the European Union, or some agency of the United Nations) models of equivalences among languages. These models are calculations of the conditional probability of sequences of sounds, or sequences of words, on the basis of past performance in all those recorded files. They are the first steps toward calculating automatically the fundamental grammar of the languages. 
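As a rough illustration of the conditional-probability idea in the excerpt above, here is a minimal Python sketch of a bigram model, which estimates P(next word | current word) from counts in a small corpus. The toy corpus and the function name `bigram_probabilities` are assumptions made for this example and do not come from the book; a realistic language resource would be far larger and would need smoothing for word pairs that never occur.

```python
from collections import Counter


def bigram_probabilities(corpus):
    """Estimate P(next_word | word) from raw counts over a list of sentences."""
    context_counts = Counter()  # how often each word is followed by anything
    pair_counts = Counter()     # how often each (word, next_word) pair occurs
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, nxt)] += 1
    # Relative frequency: count(prev, nxt) / count(prev followed by anything)
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


# Hypothetical three-sentence corpus, for illustration only.
corpus = [
    "the spirit is willing",
    "the flesh is weak",
    "the spirit is strong",
]
probs = bigram_probabilities(corpus)
print(probs[("the", "spirit")])  # "the" is followed by "spirit" in 2 of 3 cases
print(probs[("is", "willing")])  # "is" is followed by "willing" in 1 of 3 cases
```

The same counting applies to sequences of sounds or to aligned sentence pairs in multilingual files; only the units being counted change.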
pages: 459 words: 118,959 
Confidence Game: How a Hedge Fund Manager Called Wall Street's Bluff by Christine S. Richard Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Asian financial crisis, assetbacked security, banking crisis, Bernie Madoff, cognitive dissonance, collateralized debt obligation, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, Donald Trump, family office, financial innovation, fixed income, forensic accounting, glass ceiling, Long Term Capital Management, market bubble, moral hazard, Ponzi scheme, profit motive, short selling, statistical model, white flight The credit-rating companies also were underestimating correlation risk, the report said. Although an earthquake in California doesn’t increase the chance of an earthquake occurring in Florida, bond defaults tend to be contagious and closely correlated in times of economic stress. That makes CDOs, which mingle various types of loans across different geographic regions, vulnerable to the same pressures. In fact, the whole bond-insurance industry might be vulnerable to faulty statistical models that rely on the past to predict the future, Ackman argued in the report. These models estimated that MBIA faced just a 1-in-10,000 chance of confronting a scenario that would leave it unable to meet all its claims. Yet historical data-based models considered the 1987 stock market crash an event so improbable that it would be expected to happen only once in a trillion years, Ackman explained. 
pages: 503 words: 131,064 
Liars and Outliers: How Security Holds Society Together by Bruce Schneier Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
airport security, barriers to entry, Berlin Wall, Bernie Madoff, Bernie Sanders, Brian Krebs, Broken windows theory, carried interest, Cass Sunstein, Chelsea Manning, corporate governance, crack epidemic, credit crunch, crowdsourcing, cuban missile crisis, Daniel Kahneman / Amos Tversky, David Graeber, desegregation, don't be evil, Double Irish / Dutch Sandwich, Douglas Hofstadter, experimental economics, Fall of the Berlin Wall, financial deregulation, George Akerlof, hydraulic fracturing, impulse control, income inequality, invention of agriculture, invention of gunpowder, iterative process, Jean Tirole, John Nash: game theory, jointstock company, Julian Assange, meta analysis, metaanalysis, microcredit, moral hazard, mutually assured destruction, Nate Silver, Network effects, Nick Leeson, offshore financial centre, patent troll, phenotype, pre–internet, principal–agent problem, prisoner's dilemma, profit maximization, profit motive, race to the bottom, Ralph Waldo Emerson, RAND corporation, rentseeking, RFID, Richard Thaler, risk tolerance, Ronald Coase, security theater, shareholder value, slashdot, statistical model, Steven Pinker, Stuxnet, technological singularity, The Market for Lemons, The Nature of the Firm, The Spirit Level, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, too big to fail, traffic fines, transaction costs, ultimatum game, UNCLOS, union organizing, Vernor Vinge, WikiLeaks, World Values Survey, Y2K Amlan Kundu, Shamik Sural, and Arun K. Majumdar (2006), “TwoStage Credit Card Fraud Detection Using Sequence Alignment,” Information Systems Security, Lecture Notes in Computer Science, SpringerVerlag, 4332:260–75. predictive policing programs Martin B. Short, Maria R. D'Orsogna, Virginia B. Pasour, George E. Tita, P. Jeffrey Brantingham, Andrea L. Bertozzi, and Lincoln B. Chayes (2008), “A Statistical Model of Criminal Behavior,” Mathematical Models and Methods in Applied Sciences, 18 (Supplement):1249–67. 
Beth Pearsall (2010), “Predictive Policing: The Future of Law Enforcement?” NIJ Journal, 266:16–9. Nancy Murray (2011), “Profiling in the Age of Total Information Awareness,” Race & Class, 51:3–24. Timothy McVeigh's van Associated Press (28 Sep 2009), “Attorney: Oklahoma City Bombing Tapes Appear Edited,” Oklahoman. 
pages: 543 words: 157,991 
All the Devils Are Here by Bethany McLean Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Asian financial crisis, assetbacked security, bank run, BlackScholes formula, call centre, collateralized debt obligation, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, Exxon Valdez, fear of failure, financial innovation, fixed income, high net worth, Home mortgage interest deduction, interest rate swap, laissezfaire capitalism, Long Term Capital Management, margin call, market bubble, market fundamentalism, Maui Hawaii, moral hazard, mortgage debt, Northern Rock, Own Your Own Home, Ponzi scheme, quantitative trading / quantitative finance, race to the bottom, risk/return, Ronald Reagan, Rosa Parks, shareholder value, short selling, South Sea Bubble, statistical model, telemarketer, too big to fail, value at risk Morgan’s estimate, Magnetar’s CDOs accounted for between 35 and 60 percent of the mezzanine CDOs that were issued in that period. Merrill did a number of these deals with Magnetar. The performance of these CDOs can be summed up in one word: horrible. The essence of the ProPublica allegation is that Magnetar, like Paulson, was betting that “its” CDOs would implode. Magnetar denies that this was its intent and claims that its strategy was based on a “mathematical statistical model.” The firm says it would have done well regardless of the direction of the market. It almost doesn’t matter. The triple-As did blow up. You didn’t have to be John Paulson, picking out the securities you were then going to short, to make a fortune in this trade. Given that the CDOs referenced poorly underwritten subprime mortgages, they had to blow up, almost by definition. That’s what subprime mortgages were poised to do in 2007. 
pages: 421 words: 125,417 
Common Wealth: Economics for a Crowded Planet by Jeffrey Sachs Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
agricultural Revolution, air freight, backtotheland, British Empire, business process, carbon footprint, clean water, colonial rule, corporate social responsibility, correlation does not imply causation, demographic transition, Diane Coyle, Edward Glaeser, energy security, failed state, Gini coefficient, HaberBosch Process, income inequality, income per capita, intermodal, invention of agriculture, invention of the steam engine, invisible hand, Joseph Schumpeter, knowledge worker, laborforce participation, labour mobility, low skilled workers, microcredit, oil shale / tar sands, peak oil, profit maximization, profit motive, purchasing power parity, road to serfdom, Ronald Reagan, Simon Kuznets, Skype, statistical model, The Wealth of Nations by Adam Smith, Thomas Malthus, trade route, transaction costs, unemployed young men, War on Poverty, women in the workforce, workingage population In economic jargon, we say that saving must be devoted to “capital widening,” to keep up with population growth, rather than “capital deepening,” to raise the capital stock per person. One test of this is the cross-country evidence on economic growth. We can examine whether countries with high fertility rates indeed have lower growth rates of income per person. The standard tests have been carried out by the leaders of empirical growth modeling, Robert Barro and Xavier Sala-i-Martin. Their statistical model accounts for each country’s average annual growth rate of income per person according to various characteristics of the country, including the level of income per person, the average educational attainment, the life expectancy, an indicator of the “rule of law,” and other variables, including the total fertility rate. The TFR is shown to have a strong, statistically significant negative effect on economic growth. 
pages: 470 words: 144,455 
Secrets and Lies: Digital Security in a Networked World by Bruce Schneier Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Ayatollah Khomeini, barriers to entry, business process, butterfly effect, cashless society, Columbine, defense in depth, double entry bookkeeping, fault tolerance, game design, IFF: identification friend or foe, John von Neumann, knapsack problem, mutually assured destruction, pez dispenser, pirate software, profit motive, Richard Feynman, risk tolerance, Silicon Valley, Simon Singh, slashdot, statistical model, Steve Ballmer, Steven Levy, the payments system, Y2K, Yogi Berra Sometimes it’s as easy as taking an existing attack and mixing up the order of commands. Sometimes it’s taking the attack and breaking up the packets differently. Just as antivirus software needs to be constantly updated with new signatures, this type of IDS needs a constantly updated database of attack signatures. It’s unclear whether such a database can ever keep up with the hacker tools. The other IDS paradigm is anomaly detection. The IDS does some statistical modeling of your network and figures out what is normal. Then, if anything abnormal happens, it sounds an alarm. This kind of thing can be done with rules (the system knows what’s normal and flags anything else), statistics (the system figures out statistically what’s normal and flags anything else), or with artificial-intelligence techniques. This has a plethora of problems. What if you’re being hacked as you train the system? 
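The statistical flavor of anomaly detection described in the excerpt above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the traffic figures, the three-standard-deviation threshold, and the names `fit_baseline` and `is_anomalous` are all hypothetical, and a real IDS would model many more features than a single request rate.

```python
from statistics import mean, stdev


def fit_baseline(samples):
    """Learn what 'normal' looks like: the mean and sample standard deviation."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical requests-per-minute observed during a clean training period.
training = [98, 102, 101, 97, 100, 103, 99, 100]
baseline = fit_baseline(training)
print(is_anomalous(101, baseline))  # ordinary traffic, not flagged
print(is_anomalous(250, baseline))  # large spike, flagged

# The excerpt's closing question: attack traffic present during training
# inflates the baseline, so the same spike no longer looks abnormal.
poisoned_baseline = fit_baseline(training + [250, 260, 240])
print(is_anomalous(250, poisoned_baseline))
```

The last three lines show why training on contaminated data is a problem: the detector learns the attack as part of "normal" and stops raising the alarm.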
pages: 560 words: 158,238 
Fifty Degrees Below by Kim Stanley Robinson Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
airport security, bioinformatics, Burning Man, clean water, Donner party, full employment, invisible hand, iterative process, means of production, minimum wage unemployment, North Sea oil, Ralph Waldo Emerson, Richard Feynman, statistical model, Stephen Hawking, the scientific method Meanwhile, since it’s been a rumor, it’s treated like all the other rumors, many of which are wrong. So actually, to have the idea of something broached without any subsequent repercussion is actually a kind of, what. A kind of inoculation for an event you don’t want investigated.” “Jesus. So how does it work, do you know?” “Not the technical details, no. I know they target certain counties in swing states. They use various statistical models and decision-tree algorithms to pick which ones, and how much to intervene.” “I’d like to see this algorithm.” “Yes, I thought you might.” She reached into her purse, pulled out a data disk in a paper sleeve. She handed it to him. “This is it.” “Whoah,” Frank said, staring at it. “And so . . . What should I do with it?” “I thought you might have some friends at NSF who might be able to put it to use.” 
pages: 574 words: 164,509 
Superintelligence: Paths, Dangers, Strategies by Nick Bostrom Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, anticommunist, artificial general intelligence, autonomous vehicles, barriers to entry, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, demographic transition, Douglas Hofstadter, Drosophila, Elon Musk, en.wikipedia.org, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, Gödel, Escher, Bach, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John von Neumann, knowledge worker, Menlo Park, meta analysis, metaanalysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Norbert Wiener, NPcomplete, nuclear winter, optical character recognition, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, strong AI, superintelligent machines, supervolcano, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, transaction costs, Turing machine, Vernor Vinge, Watson beat the top human players on Jeopardy!, World Values Survey Optical character recognition of handwritten and typewritten text is routinely used in applications such as mail sorting and digitization of old documents.66 Machine translation remains imperfect but is good enough for many applications. Early systems used the GOFAI approach of handcoded grammars that had to be developed by skilled linguists from the ground up for each language. 
Newer systems use statistical machine learning techniques that automatically build statistical models from observed usage patterns. The machine infers the parameters for these models by analyzing bilingual corpora. This approach dispenses with linguists: the programmers building these systems need not even speak the languages they are working with.67 Face recognition has improved sufficiently in recent years that it is now used at automated border crossings in Europe and Australia. The US Department of State operates a face recognition system with over 75 million photographs for visa processing. 
pages: 320 words: 87,853 
The Black Box Society: The Secret Algorithms That Control Money and Information by Frank Pasquale Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
Affordable Care Act / Obamacare, algorithmic trading, Amazon Mechanical Turk, assetbacked security, Atul Gawande, bank run, barriers to entry, Berlin Wall, Bernie Madoff, Black Swan, bonus culture, Brian Krebs, call centre, Capital in the TwentyFirst Century by Thomas Piketty, Chelsea Manning, cloud computing, collateralized debt obligation, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, cryptocurrency, Debian, don't be evil, Edward Snowden, en.wikipedia.org, Fall of the Berlin Wall, Filter Bubble, financial innovation, Flash crash, full employment, Goldman Sachs: Vampire Squid, Google Earth, Hernando de Soto, High speed trading, hiring and firing, housing crisis, informal economy, information retrieval, interest rate swap, Internet of things, invisible hand, Jaron Lanier, Jeff Bezos, job automation, Julian Assange, Kevin Kelly, knowledge worker, Kodak vs Instagram, kremlinology, late fees, London Interbank Offered Rate, London Whale, Mark Zuckerberg, mobile money, moral hazard, new economy, Nicholas Carr, offshore financial centre, PageRank, pattern recognition, precariat, profit maximization, profit motive, quantitative easing, race to the bottom, recommendation engine, regulatory arbitrage, riskadjusted returns, search engine result page, shareholder value, Silicon Valley, Snapchat, Spread Networks laid a new fibre optics cable between New York and Chicago, statistical arbitrage, statistical model, Steven Levy, the scientific method, too big to fail, transaction costs, twosided market, universal basic income, Upton Sinclair, value at risk, WikiLeaks While investors may not be interested in any one particular mortgagor’s stream of payments, an aggregation of such payments can be marketed as a far more stable income source (or security) than, say, any one loan. Think, for instance, of the stream of payments coming out of a small city. 
It might seem risky to give any one household a loan; the breadwinner might fall ill, they might declare bankruptcy, they may hit the lottery and pay off the loan tomorrow (denying the investor a steady stream of interest payments). It’s hard to predict what will happen to any given family. But statistical models can much better predict the likelihood of defaults happening in, say, a group of 1,000 families. They “know” that, in the data used, rarely do, say, more than thirty in a 1,000 borrowers default. This statistical analysis, programmed in proprietary software, was one “green light” for massive investments in the mortgage market.21 That sounds simple, but as finance automation took off, such deals tended to get hedged around by contingencies, for instance about possible refinancings or defaults. 
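The pooling argument in the excerpt above can be made concrete with a short Python sketch. It assumes, purely for illustration, that each of 1,000 borrowers defaults independently with probability 2 percent; neither number's source nor the independence assumption comes from the book, and the function name `prob_more_than` is hypothetical. Under those assumptions the number of defaults follows a binomial distribution, and the chance of more than thirty defaults can be computed exactly.

```python
from math import comb


def prob_more_than(k, n, p):
    """P(X > k) for X ~ Binomial(n, p), assuming independent defaults."""
    at_most_k = sum(
        comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1)
    )
    return 1.0 - at_most_k


# Assumed inputs: 1,000 borrowers, each with a 2% default probability.
single_borrower_risk = prob_more_than(0, 1, 0.02)  # one loan: default or not
pool_tail_risk = prob_more_than(30, 1000, 0.02)    # more than 30 of 1,000

print(f"one borrower defaults: {single_borrower_risk:.4f}")
print(f"more than 30 of 1,000 default: {pool_tail_risk:.4f}")
```

The small tail probability depends entirely on the independence assumption. If defaults move together in a downturn, the pool is much riskier than this calculation suggests.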
pages: 303 words: 67,891 
Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
AI winter, artificial general intelligence, bioinformatics, brain emulation, combinatorial explosion, complexity theory, computer vision, conceptual framework, correlation coefficient, epigenetics, friendly AI, information retrieval, Isaac Newton, John Conway, Loebner Prize, Menlo Park, natural language processing, Occam's razor, pvalue, pattern recognition, performance metric, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K A number of researchers [25,2831] have described one or more postformal stages. Commons and colleagues have also proposed a task-based model which provides a framework for explaining stage discrepancies across tasks and for generating new stages based on classification of observed logical behaviors. [32] promotes a statistical conception of stage, which provides a good bridge between task-based and stage-based models of development, as statistical modeling allows for stages to be roughly defined and analyzed based on collections of task behaviors. [29] postulates the existence of a postformal stage by observing elevated levels of abstraction which, they argue, are not manifested in formal thought. [33] observes a postformal stage when subjects become capable of analyzing and coordinating complex logical systems with each other, creating metatheoretical supersystems. 
pages: 497 words: 123,718 
A Game as Old as Empire: The Secret World of Economic Hit Men and the Web of Global Corruption by Steven Hiatt; John Perkins Amazon: amazon.com — amazon.co.uk — amazon.de — amazon.fr
airline deregulation, Andrei Shleifer, Asian financial crisis, Berlin Wall, bigbox store, Bretton Woods, British Empire, capital controls, centre right, clean water, colonial rule, corporate governance, corporate personhood, deglobalization, deindustrialization, Doha Development Round, energy security, European colonialism, financial deregulation, financial independence, full employment, global village, high net worth, land reform, large denomination, Long Term Capital Management, Mexican peso crisis / tequila crisis, Mikhail Gorbachev, moral hazard, Naomi Klein, new economy, North Sea oil, offshore financial centre, oil shock, Ponzi scheme, race to the bottom, reserve currency, Ronald Reagan, Scramble for A 