Moneyball by Michael Lewis explains big data


pages: 294 words: 82,438

Simple Rules: How to Thrive in a Complex World by Donald Sull, Kathleen M. Eisenhardt


Affordable Care Act / Obamacare, Airbnb, asset allocation, Atul Gawande, barriers to entry, Basel III, Berlin Wall, carbon footprint, Checklist Manifesto, complexity theory, Craig Reynolds: boids flock, Credit Default Swap, Daniel Kahneman / Amos Tversky, diversification, European colonialism, Exxon Valdez, facts on the ground, Fall of the Berlin Wall, haute cuisine, invention of the printing press, Isaac Newton, Kickstarter, late fees, Lean Startup, Louis Pasteur, Lyft, Moneyball by Michael Lewis explains big data, Nate Silver, Network effects, obamacare, Paul Graham, performance metric, price anchoring, RAND corporation, risk/return, Saturday Night Live, sharing economy, Silicon Valley, Startup school, statistical model, Steve Jobs, TaskRabbit, The Signal and the Noise by Nate Silver, transportation-network company, two-sided market, Wall-E, web application, Y Combinator, Zipcar

They tire out pitchers by making them throw more pitches overall, and disciplined hitting does not erode much with age. These and other insights are at the heart of what author Michael Lewis famously described as moneyball. Moneyball, the book and movie, is the ultimate sports fairy tale, with the A’s playing the role of Cinderella. But unlike Cinderella, the A’s did not live happily ever after. Moneyball’s simple rules were just too easy to copy. By 2004, a free-spending team, the Boston Red Sox, co-opted the A’s principles and won the World Series for the first time since 1918. In contrast, the A’s went into decline, and by 2007 they were losing more games than they were winning. Moneyball had struck out. Enter Farhan Zaidi, the A’s director of baseball operations since 2009, who was named assistant general manager in 2014. Zaidi’s background is rare by the standards of professional baseball.

[>] The right choice is often: For a review of relevant research, see Nicolaj Siggelkow, “Change in the Presence of Fit: The Rise, the Fall and the Renaissance of Liz Claiborne,” Academy of Management Journal, 44, no. 4 (2001): 838–57. [>] Alderson, a former Marine: Michael Lewis, Moneyball: The Art of Winning an Unfair Game (New York: W. W. Norton, 2004). [>] These and other insights: Ibid. [>] Enter Farhan Zaidi: Susan Slusser, “A Beautiful Mind,” San Jose Mercury News, February 5, 2014. As this book went into production, the L.A. Dodgers hired away Zaidi to be their general manager, to the dismay of A’s fans. [>] As his boss, Billy Beane: Ibid. [>] After the collapse: David Laurila, “Sloan Analytics: Farhan Zaidi on A’s Analytics,” accessed September 27, 2014, [>] At Zaidi’s urging: Slusser, “A Beautiful Mind.” The five tools are described more fully in Michael Lewis’s book Moneyball. [>] One was a how-to rule: Alexander Smith, “Billy Beane’s Finest Work Yet: How the Oakland A’s Won the AL West,”, October 19, 2012, [>] The two of them: Andrew Brown, “A’s Platoon System New Moneyball,” SwinginA’, September 20, 2013, [>] In 2013, they added: Rob Neyer, “Those A’s Found Another Edge?”, December 31, 2013, Baseball Nation, accessed March 22, 2014, [>] In fact, the A’s: Andrew Koo, “A Decade after Moneyball, Have the A’s Found a New Market Efficiency?,” accessed July 23, 2014, [>] As journalist Tim: Tim Kawakami, “Beane, Staff Become Experts at Playing the Roster Game,” May 23, 2014, San Jose Mercury News. [>] At the turn of the twentieth: David Roberts, “Into the Unknown,” National Geographic, January 2013, pp. 120–34. [>] To be first: Roland Huntsford, The Last Place on Earth (New York: Modern Library, 1999). [>] As the trek: Ibid. [>] First, Scott could: Ibid. [>] In a telling quote: Ibid. p. 379. [>] A key to getting unstuck: Christopher B.


pages: 204 words: 58,565

Keeping Up With the Quants: Your Guide to Understanding and Using Analytics by Thomas H. Davenport, Jinho Kim


Black-Scholes formula, business intelligence, business process, call centre, computer age, correlation coefficient, correlation does not imply causation, Credit Default Swap, feminist movement, Florence Nightingale: pie chart, forensic accounting, global supply chain, Hans Rosling, hypertext link, invention of the telescope, inventory management, Jeff Bezos, margin call, Moneyball by Michael Lewis explains big data, Netflix Prize, p-value, performance metric, publish or perish, quantitative hedge fund, random walk, Renaissance Technologies, Robert Shiller, self-driving car, sentiment analysis, six sigma, Skype, statistical model, supply-chain management, text mining, the scientific method

Amazon review from “A ‘Umea University’ student (Sweden) give ratings,” August 24, 1999, cr_dp_title?ie=UTF8&ASIN=0673184447&channel=detail-glance&nodeID=283155 &store=books, retrieved December 30, 2012. 19. Michael Lewis, Moneyball: The Art of Winning an Unfair Game (New York: Norton, 2003). 20. “SN Names the 20 Smartest Athletes in Sports,” The Sporting News, Sept. 23, 2010, 21. Michael Lewis, “The No-Stats All Star,” New York Times, February 13, 2009, 22. Frances X. Frei and Mathew Perlberg, “Discovering Hidden Gems: The Story of Daryl Morey, Shane Battier, and the Houston Rockets (B),” Harvard Business School case study (Boston: Harvard Business Publishing, September 2010), 1.

He is also chair of the annual MIT Sports Analytics Conference, which now attracts over two thousand attendees. Shane Battier is an NBA player—a forward—who currently plays for the Miami Heat. He played for the Houston Rockets from 2006 to 2011. He is relatively analytical as professional basketball players go, and was named the seventh-smartest player in professional sports by Sporting News magazine.20 Daryl Morey notes (in an article by Moneyball author Michael Lewis) that Battier was . . . given his special package of information. “He’s the only player we give it to,” Morey says. “We can give him this fire hose of data and let him sift. Most players are like golfers. You don’t want them swinging while they’re thinking.” The data essentially broke down the floor into many discrete zones and calculated the odds of Bryant making shots from different places on the court, under different degrees of defensive pressure, in different relationships to other players—how well he scored off screens, off pick-and-rolls, off catch-and-shoots and so on.
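The zone-by-zone breakdown described above can be illustrated with a minimal Python sketch. It is not the Rockets' actual system: the zone names, the pressure labels, the `make_rates` helper, and every shot record are hypothetical, invented only to show how the odds of a made shot might be tallied for each combination of court zone and defensive pressure.

```python
from collections import defaultdict


def make_rates(shots):
    """Return {(zone, pressure): fraction of shots made}.

    `shots` is an iterable of (zone, pressure, was_made) records.
    """
    made = defaultdict(int)
    attempts = defaultdict(int)
    for zone, pressure, was_made in shots:
        key = (zone, pressure)
        attempts[key] += 1
        if was_made:
            made[key] += 1
    # Every key in `attempts` has at least one attempt, so no division by zero.
    return {key: made[key] / attempts[key] for key in attempts}


# Hypothetical records, invented for illustration only.
sample_shots = [
    ("left_corner", "tight", False),
    ("left_corner", "tight", True),
    ("left_corner", "open", True),
    ("top_of_key", "tight", False),
    ("top_of_key", "open", True),
    ("top_of_key", "open", True),
    ("top_of_key", "open", False),
]

rates = make_rates(sample_shots)
for (zone, pressure), rate in sorted(rates.items()):
    print(f"{zone:12s} {pressure:6s} {rate:.2f}")
```

A real scouting package would presumably need far more shots per cell before the rates meant anything, and the passage mentions further splits (screens, pick-and-rolls, catch-and-shoots) that would simply become extra fields in the grouping key.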

If we can’t turn that data into better decision making through quantitative analysis, we are both wasting data and probably creating suboptimal performance. Therefore, our goal in this book is to show you how quantitative analysis works—even if you do not have a quantitative background—and how you can use it to make better decisions.

The Rise of Analytics and Big Data

The rise of data is taking place in virtually every domain of society. If you’re into sports, you undoubtedly know about moneyball, the transformation of professional baseball—and by now virtually every major sport—by use of player performance data and analytics. If you’re into online gaming, you probably realize that every aspect of your game behavior is being collected and analyzed by such companies as Zynga and Electronic Arts. Like movies? If so, you probably know about the algorithms Netflix uses to predict what movies you will like.


pages: 829 words: 186,976

The Signal and the Noise: Why So Many Predictions Fail-But Some Don't by Nate Silver


airport security, availability heuristic, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, big-box store, Black Swan, Broken windows theory, Carmen Reinhart, Claude Shannon: information theory, Climategate, Climatic Research Unit, cognitive dissonance, collapse of Lehman Brothers, collateralized debt obligation, complexity theory, computer age, correlation does not imply causation, Credit Default Swap, credit default swaps / collateralized debt obligations, cuban missile crisis, Daniel Kahneman / Amos Tversky, diversification, Donald Trump, Edmond Halley, Edward Lorenz: Chaos theory, equity premium, Eugene Fama: efficient market hypothesis, everywhere but in the productivity statistics, fear of failure, Fellow of the Royal Society, Freestyle chess, fudge factor, George Akerlof, haute cuisine, Henri Poincaré, high batting average, housing crisis, income per capita, index fund, Internet Archive, invention of the printing press, invisible hand, Isaac Newton, James Watt: steam engine, John Nash: game theory, John von Neumann, Kenneth Rogoff, knowledge economy, locking in a profit, Loma Prieta earthquake, market bubble, Mikhail Gorbachev, Moneyball by Michael Lewis explains big data, Monroe Doctrine, mortgage debt, Nate Silver, new economy, Norbert Wiener, PageRank, pattern recognition, prediction markets, Productivity paradox, random walk, Richard Thaler, Robert Shiller, Rodney Brooks, Ronald Reagan, Saturday Night Live, savings glut, security theater, short selling, Skype, statistical model, Steven Pinker, The Great Moderation, The Market for Lemons, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, too big to fail, transaction costs, transfer pricing, University of East Anglia, Watson beat the top human players on Jeopardy!, wikimedia commons

This doesn’t make him the most generous human being, but it is exactly what he needs in order to play second base for the Boston Red Sox, and that’s the only thing that Pedroia cares about. “Our weaknesses and our strengths are always very intimately connected,” James said. “Pedroia made strengths out of things that would be weaknesses for other players.”

The Real Lessons of Moneyball

“As Michael Lewis said, the debate is over,” Billy Beane declared when we were discussing Moneyball. For a time, Moneyball was very threatening to people in the game; it seemed to imply that their jobs and livelihoods were at stake. But the reckoning never came—scouts were never replaced by computers. In fact, the demand to know what the future holds for different types of baseball players—whether couched in terms of scouting reports or statistical systems like PECOTA—still greatly exceeds the supply.

A good statistical forecasting system might have found some reason to be optimistic after Heyward’s 2011 season: his numbers were essentially the same except for his batting average, and batting average is subject to more luck than other statistics. But can statistics tell you everything you’ll want to know about a player? Ten years ago, that was the hottest topic in baseball.

Can’t We All Just Get Along?

A slipshod but nevertheless commonplace reading of Moneyball is that it was a story about the conflict between two rival gangs—“statheads” and “scouts”—that centered on the different paradigms that each group had adopted to evaluate player performance (statistics, of course, for the statheads, and “tools” for the scouts). In 2003, when Moneyball was published, Michael Lewis’s readers would not have been wrong to pick up on some animosity between the two groups. (The book itself probably contributed to some of the hostility.) When I attended baseball’s Winter Meetings that year at the New Orleans Marriott, it was like being back in high school.

As an annoying little math prodigy, I was attracted to all the numbers in the game, buying my first baseball card at seven, reading my first Elias Baseball Analyst at ten, and creating my own statistic at twelve. (It somewhat implausibly concluded that the obscure Red Sox infielder Tim Naehring was one of the best players in the game.) My interest peaked, however, in 2002. At the time Michael Lewis was busy writing Moneyball, the soon-to-be national bestseller that chronicled the rise of the Oakland Athletics and their statistically savvy general manager Billy Beane. Bill James, who twenty-five years earlier had ushered in the Sabermetric era* by publishing a book called The Bill James Baseball Abstract, was soon to be hired as a consultant by the Red Sox. An unhealthy obsession with baseball statistics suddenly seemed like it could be more than just a hobby—and as it happened, I was looking for a new job.


pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier


23andMe, Affordable Care Act / Obamacare, airport security, AltaVista, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, IBM and the Holocaust, index card, informal economy, Internet of things, invention of the printing press, Jeff Bezos, Louis Pasteur, Mark Zuckerberg, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, performance metric, Peter Thiel, Post-materialism, post-materialism, random walk, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, speech recognition, Steve Jobs, Steven Levy, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Watson beat the top human players on Jeopardy!

See imprecision MetaCrawler, [>] metadata: in datafication, [>]–[>] metric system, [>] Microsoft, [>], [>], [>] Amalga software, [>]–[>], [>] and data-valuation, [>] and language translation, [>] Word spell-checking system, [>]–[>] Minority Report [film], [>]–[>], [>] Moneyball [film], [>], [>]–[>], [>], [>] Moneyball (Lewis), [>] Moore’s Law, [>] Mydex, [>] nanotechnology: and qualitative changes, [>] Nash, Bruce, [>] nations: big data and competitive advantage among, [>]–[>] natural language processing, [>] navigation, marine: correlation analysis in, [>]–[>] Maury revolutionizes, [>]–[>], [>], [>], [>], [>], [>], [>], [>], [>], [>] Negroponte, Nicholas: Being Digital, [>] Netbot, [>] Netflix, [>] collaborative filtering at, [>] data-reuse by, [>] releases personal data, [>] Netherlands: comprehensive civil records in, [>]–[>] network analysis, [>] network theory, [>] big data in, [>]–[>] New York City: exploding manhole covers in, [>]–[>], [>]–[>], [>], [>] government data-reuse in, [>]–[>] New York Times, [>]–[>] Next Jump, [>] Neyman, Jerzy: on statistical sampling, [>] Ng, Andrew, [>] 1984 (Orwell), [>], [>] Norvig, Peter, [>] “The Unreasonable Effectiveness of Data,” [>] Nuance: fails to understand data-reuse, [>]–[>] numerical systems: history of, [>]–[>] Oakland Athletics, [>]–[>] Obama, Barack: on open data, [>] Och, Franz Josef, [>] Ohm, Paul: on privacy, [>] oil refining: big data in, [>] ombudsmen, [>] Omidyar, Pierre, [>] open data.

Specific area expertise matters less in a world where probability and correlation are paramount. In the movie Moneyball, baseball scouts were upstaged by statisticians when gut instinct gave way to sophisticated analytics. Similarly, subject-matter specialists will not go away, but they will have to contend with what the big-data analysis says. This will force an adjustment to traditional ideas of management, decision-making, human resources, and education. Most of our institutions were established under the presumption that human decisions are based on information that is small, exact, and causal in nature. But the situation changes when the data is huge, can be processed quickly, and tolerates inexactitude. Moreover, because of the data’s vast size, decisions may often be made not by humans but by machines. We consider the dark side of big data in Chapter Eight. Society has millennia of experience in understanding and overseeing human behavior.

But after it became independent, UPS’s competitors felt more comfortable supplying their data, and ultimately everyone benefited from the improved accuracy that aggregation brings. Evidence that data itself, rather than skills or mindset, will come to be most valued can be found in numerous acquisitions in the big-data business. For example, in 2006 Microsoft rewarded Etzioni’s big-data mindset by buying Farecast for around $110 million. But two years later Google paid $700 million to acquire Farecast’s data supplier, ITA Software.

The demise of the expert

In the movie Moneyball, about how the Oakland A’s became a winning baseball team by applying analytics and new types of metrics to the game, there is a delightful scene in which grizzled old scouts are sitting around a table discussing players. The audience can’t help cringing, not simply because the scene exposes the way decisions are made devoid of data, but because we’ve all been in situations where “certainty” was based on sentiment rather than science.


pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel


Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, call centre, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, Erik Brynjolfsson, experimental subject, Google Glasses, happiness index / gross national happiness, job satisfaction, Johann Wolfgang von Goethe, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, risk-adjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, software as a service, speech recognition, statistical model, Steven Levy, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra

University of Phoenix: Rebecca Barber and Mike Sharkey, Apollo Group, “Course Correction: Using Analytics to Predict Course Success,” Learning Analytics and Knowledge, May 2012, 259–262. Rio Salado Community College: Marc Parry, “Big Data on Campus,” New York Times, July 28, 2012. Jeopardy! winner: See Chapter 6 for more details. Roger Craig, “Data Science Meets the Quiz Show Jeopardy!,” Predictive Analytics World Chicago Conference, June 26, 2012, Chicago, IL. NPR Staff, “How One Man Played ‘Moneyball’ with ‘Jeopardy!,’” National Public Radio Online, November 20, 2011. Facebook, Elsevier, IBM, Pittsburgh Science of Learning Center: ACM KDD Cup 2010 Annual Data Mining “Student Performance Evaluation” Challenge.

You may have heard of the butterfly, Doppler, and placebo effects. Stay tuned here for the Data, Induction, Ensemble, and Persuasion Effects. Each of these Effects encompasses the fun part of science and technology: an intuitive hook that reveals how it works and why it succeeds.

The Field of Dreams

People . . . operate with beliefs and biases. To the extent you can eliminate both and replace them with data, you gain a clear advantage.
—Michael Lewis, Moneyball: The Art of Winning an Unfair Game

What field of study or branch of science are we talking about here? Learning how to predict from data is sometimes called machine learning—but, it turns out, this is mostly an academic term you find used within research labs, conference papers, and university courses (full disclosure: I taught the Machine Learning graduate course at Columbia University a couple of times in the late 1990s).

data in order to “program himself” to become a celebrated champion of the game show.

Moneyballing Jeopardy!

On September 21, 2010, a few months before Watson faced off on Jeopardy!, televisions across the land displayed host Alex Trebek speaking a clue tailored to the science fiction fan. Contestant Roger Craig avidly buzzed in. Like any technology PhD, he knew the answer was Spock. As Spock would, Roger had taken studying to its logical extreme. Jeopardy! requires inordinate cultural literacy, the almost unattainable status of a Renaissance man, one who holds at least basic knowledge about pretty much every topic. To prepare for his appearance on the show, which he’d craved since age 12, Roger did for Jeopardy! what had never been done before. He Moneyballed it. Roger optimized his study time with prediction. As a mere mortal, he faced a limited number of hours per day to study.


pages: 252 words: 72,473

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil


Affordable Care Act / Obamacare, Bernie Madoff, big data - Walmart - Pop Tarts, call centre, carried interest, cloud computing, collateralized debt obligation, correlation does not imply causation, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, Emanuel Derman, housing crisis, illegal immigration, Internet of things, late fees, medical bankruptcy, Moneyball by Michael Lewis explains big data, new economy, obamacare, Occupy movement, offshore financial centre, payday loans, peer-to-peer lending, Peter Thiel, Ponzi scheme, prediction markets, price discrimination, quantitative hedge fund, Ralph Nader, RAND corporation, recommendation engine, Sharpe ratio, statistical model, Tim Cook: Apple, too big to fail, Unsafe at Any Speed, Upton Sinclair, Watson beat the top human players on Jeopardy!, working poor

And how will that affect their overall odds of winning? Baseball is an ideal home for predictive mathematical modeling. As Michael Lewis wrote in his 2003 bestseller, Moneyball, the sport has attracted data nerds throughout its history. In decades past, fans would pore over the stats on the back of baseball cards, analyzing Carl Yastrzemski’s home run patterns or comparing Roger Clemens’s and Dwight Gooden’s strikeout totals. But starting in the 1980s, serious statisticians started to investigate what these figures, along with an avalanche of new ones, really meant: how they translated into wins, and how executives could maximize success with a minimum of dollars. “Moneyball” is now shorthand for any statistical approach in domains long ruled by the gut. But baseball represents a healthy case study—and it serves as a useful contrast to the toxic models, or WMDs, that are popping up in so many areas of our lives.

the erasures were “suggestive”: Turque, “ ‘Creative…Motivating’ and Fired.” Sarah Wysocki was out of a job: Ibid. CHAPTER 1 Boudreau, perhaps out of desperation: David Waldstein, “Who’s on Third? In Baseball’s Shifting Defenses, Maybe Nobody,” New York Times, May 12, 2014, www.nytimes.com/2014/05/13/sports/baseball/whos-on-third-in-baseballs-shifting-defenses-maybe-nobody.html?_r=0. Moneyball: Michael Lewis, Moneyball: The Art of Winning an Unfair Game (New York: W. W. Norton, 2003). In 1997, a convicted murderer: Manny Fernandez, “Texas Execution Stayed Based on Race Testimony,” New York Times, September 16, 2011, www.nytimes.com/2011/09/17/us/experts-testimony-on-race-led-to-stay-of-execution-in-texas.html?pagewanted=all. made a reference to Buck’s race: Ibid. “It is inappropriate to allow race”: Alan Berlow, “See No Racism, Hear No Racism: Despite Evidence, Perry About to Execute Another Texas Man,” National Memo, September 15, 2011, www.nationalmemo.com/perry-might-let-another-man-die/.

American Express learned this the hard way: Ron Lieber, “American Express Kept a (Very) Watchful Eye on Charges,” New York Times, January 30, 2009, www.nytimes.com/2009/01/31/your-money/credit-and-debit-cards/31money.html. Douglas Merrill’s idea: Steve Lohr, “Big Data Underwriting for Payday Loans,” New York Times, January 19, 2015, http://bits.blogs.nytimes.com/2015/01/19/big-data-underwriting-for-payday-loans/. On the company web page: Website ZestFinance.com, accessed January 9, 2016, www.zestfinance.com/. A typical $500 loan: Lohr, “Big Data Underwriting.” ten thousand data points: Michael Carney, “Flush with $20M from Peter Thiel, ZestFinance Is Measuring Credit Risk Through Non-traditional Big Data,” Pando, July 31, 2013, https://pando.com/2013/07/31/flush-with-20m-from-peter-thiel-zestfinance-is-measuring-credit-risk-through-non-traditional-big-data/. one of the first peer-to-peer exchanges, Lending Club: Richard MacManus, “Facebook App, Lending Club, Passes Half a Million Dollars in Loans,” Readwrite, July 29, 2007, http://readwrite.com/2007/07/29/facebook_app_lending_club_passes_half_a_million_in_loans.


pages: 481 words: 120,693

Plutocrats: The Rise of the New Global Super-Rich and the Fall of Everyone Else by Chrystia Freeland


Albert Einstein, algorithmic trading, banking crisis, barriers to entry, Basel III, battle of ideas, Bernie Madoff, Big bang: deregulation of the City of London, Black Swan, Branko Milanovic, Bretton Woods, BRICs, business climate, call centre, carried interest, Cass Sunstein, Clayton Christensen, collapse of Lehman Brothers, conceptual framework, corporate governance, credit crunch, Credit Default Swap, crony capitalism, Deng Xiaoping, don't be evil, double helix, energy security, estate planning, experimental subject, financial deregulation, financial innovation, Flash crash, Frank Gehry, Gini coefficient, global village, Goldman Sachs: Vampire Squid, Gordon Gekko, Guggenheim Bilbao, haute couture, high net worth, income inequality, invention of the steam engine, job automation, joint-stock company, Joseph Schumpeter, knowledge economy, knowledge worker, linear programming, London Whale, low skilled workers, manufacturing employment, Mark Zuckerberg, Martin Wolf, Mikhail Gorbachev, Moneyball by Michael Lewis explains big data, NetJets, new economy, Occupy movement, open economy, Peter Thiel, place-making, Plutocrats, plutocrats, Plutonomy: Buying Luxury, Explaining Global Imbalances, postindustrial economy, Potemkin village, profit motive, purchasing power parity, race to the bottom, rent-seeking, Rod Stewart played at Stephen Schwarzman birthday party, Ronald Reagan, self-driving car, short selling, Silicon Valley, Silicon Valley startup, Simon Kuznets, Solar eclipse in 1919, sovereign wealth fund, stem cell, Steve Jobs, The Spirit Level, The Wealth of Nations by Adam Smith, Tony Hsieh, too big to fail, trade route, trickle-down economics, Tyler Cowen: Great Stagnation, wage slave, Washington Consensus, winner-take-all economy

That is the story of the Oakland A’s and their general manager, Billy Beane, as lionized in Michael Lewis’s Moneyball. Beane is Lewis’s underfunded, underdog hero, but his is really the story of capital—the baseball team owners—looking for a way to avoid paying the celebrity premium to its stars—the players—in this case by looking for athletes whose skills were crucial to the team’s success but were undervalued by the market. Even in finance, whose superstars are less well known but even better paid than film and sports celebrities, some bosses have been looking for ways to avoid the celebrity premium. Harvard Business School professor Boris Groysberg became the hero of Wall Street’s HR departments in 2010 when he published Chasing Stars, a study that has become the banking industry’s Moneyball. After interviewing more than two hundred Wall Street analysts, Groysberg concluded that recruiting stars from rival firms was a waste of money, because poached analysts tended to falter when they were plucked from their native culture.

— If you have a PhD in math or statistics, the revolution you are probably trying to capitalize on today is big data—a term for the vast amounts of digital data we now create and have an increasing ability to store and manipulate. If wonks were fashionistas, big data would be this season’s hot new color. When I interviewed him before a university audience in late 2011, Larry Summers named big data as one of the three big ideas he is most excited about (the others were biology and the rise of the emerging markets). The McKinsey Global Institute, the management consultancy’s research arm and the closest the corporate world comes to having an ivory tower, published a 143-page report in 2011 on big data, touting it as “the next frontier for innovation, competition, and productivity.” To understand how much data is now at our fingertips, consider a few striking facts from the McKinsey tome.

McKinsey believes that the transformative power of all this data will amount to a fifth wave in the technology revolution, building on the first four: the mainframe era; the PC era; the Internet and Web 1.0 era; and, most recently, the mobile and Web 2.0 era. Big data will create a new tribe of highly paid superstars. McKinsey estimates that by 2018 in the United States alone there will be a shortfall of between 140,000 and 190,000 people with the “deep analytical talent” required to use big data. And it will probably create a handful of billionaires who understand and capitalize on the revolutionary potential of big data before the rest of us do—indeed, one way to understand Facebook’s $100 billion market capitalization is as a bet on big data. — The technology revolution isn’t just about the nerds of the West Coast. We think of the computer revolution as a Silicon Valley phenomenon. But while most of the technology is invented there, many of its biggest beneficiaries are on Wall Street.


pages: 348 words: 39,850

Data Scientists at Work by Sebastian Gutierrez


Albert Einstein, algorithmic trading, bioinformatics, bitcoin, business intelligence, chief data officer, clean water, cloud computing, computer vision, continuous integration, correlation does not imply causation, crowdsourcing, data is the new oil, DevOps, domain-specific language, follow your passion, full text search, informal economy, information retrieval, Infrastructure as a Service, inventory management, iterative process, linked data, Mark Zuckerberg, microbiome, Moneyball by Michael Lewis explains big data, move fast and break things, natural language processing, Network effects, nuclear winter, optical character recognition, pattern recognition, Paul Graham, personalized medicine, Peter Thiel, pre–internet, quantitative hedge fund, quantitative trading / quantitative finance, recommendation engine, Renaissance Technologies, Richard Feynman, self-driving car, side project, Silicon Valley, Skype, software as a service, speech recognition, statistical model, Steve Jobs, stochastic process, technology bubble, text mining, the scientific method, web application

Hu: I studied math and statistics at the undergraduate and graduate levels at Harvard. For a long time, I wasn’t sure what I wanted to do with this skillset. People always told me, “You can do anything with a degree in math”—which I think is really funny. I do not know if that is necessarily true. I think you can apply mathematics in any number of industries, but the approaches are often similar. One of my earliest inspirations was reading Moneyball by Michael Lewis.1 He brought to the forefront the concept that data was transforming the baseball industry. That was one of the earliest instances where I really saw how powerful intelligent data analysis can be. This was, I think, even before the term “data science” was really in play. Yet all these people in baseball operations were coming in and providing very valuable insights that maybe went against the norm.

I wanted to do a lot of that work and get in on the ground floor with it. I became an intern with the Yankees, as one of the two first people they hired to do this type of analysis. It was really exciting because it felt like the Wild West. Anything was in play—anything that I wanted to do or they wanted to do was a possibility, and we tried so many different things. I think that is one of the most exciting things about data science today.

1 Michael Lewis, Moneyball (W.W. Norton & Co., 2004).

Gutierrez: What was a key lesson you learned from your experience with the Yankees?

Hu: I think one of the most important lessons I learned was how critical it is to persuade other people. One of the big challenges of being a data scientist— that people might not usually think about—is that the results or the insights you come up with have to make sense and be convincing.

So the key message was to really think about your data and how it’s being generated, as evidenced by what they found out about the hurricane and what happened just based on Twitter data analysis. Another person is Kenneth Cukier, who is The Economist’s big data editor. He co-wrote a book called Big Data: A Revolution That Will Transform How We Live, Work, and Think that’s given me a lot of thoughts to mull over regarding the direction that the industry’s going.2 So it’s good to have these voices that challenge you a little bit.

1 Kate Crawford, “The Hidden Biases of Big Data,” Harvard Business Review, April 1, 2013.

2 Kenneth Cukier and Viktor Mayer-Schönberger, Big Data: A Revolution That Will Transform How We Live, Work, and Think (Houghton Mifflin Harcourt, 2013).

Gutierrez: What in your career are you most proud of so far?


pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos


3D printing, Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, crowdsourcing, Danny Hillis, data is the new oil, double helix, Douglas Hofstadter, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, global village, Google Glasses, Gödel, Escher, Bach, information retrieval, job automation, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, lone genius, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, NP-complete, P = NP, PageRank, pattern recognition, phenotype, planetary scale, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, Second Machine Age, self-driving car, Silicon Valley, speech recognition, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight

This is only a crude example; we’ll see many deeper ones in this book. A related, frequently heard objection is “Data can’t replace human intuition.” In fact, it’s the other way around: human intuition can’t replace data. Intuition is what you use when you don’t know the facts, and since you often don’t, intuition is precious. But when the evidence is before you, why would you deny it? Statistical analysis beats talent scouts in baseball (as Michael Lewis memorably documented in Moneyball), it beats connoisseurs at wine tasting, and every day we see new examples of what it can do. Because of the influx of data, the boundary between evidence and intuition is shifting rapidly, and as with any revolution, entrenched ways have to be overcome. If I’m the expert on X at company Y, I don’t like to be overridden by some guy with data. There’s a saying in industry: “Listen to your customers, not to the HiPPO,” HiPPO being short for “highest paid person’s opinion.”

Prologue

An early list of examples of machine learning’s impact on daily life can be found in “Behind-the-scenes data mining,” by George John (SIGKDD Explorations, 1999), which was also the inspiration for the “day-in-the-life” paragraphs of the prologue. Eric Siegel’s book Predictive Analytics (Wiley, 2013) surveys a large number of machine-learning applications. The term big data was popularized by the McKinsey Global Institute’s 2011 report Big Data: The Next Frontier for Innovation, Competition, and Productivity. Many of the issues raised by big data are discussed in Big Data: A Revolution That Will Transform How We Live, Work, and Think, by Viktor Mayer-Schönberger and Kenneth Cukier (Houghton Mifflin Harcourt, 2013). The textbook I learned AI from is Artificial Intelligence,* by Elaine Rich (McGraw-Hill, 1983). A current one is Artificial Intelligence: A Modern Approach, by Stuart Russell and Peter Norvig (3rd ed., Prentice Hall, 2010).

., 230
Mendeleev, Dmitri, 235
Meta-learning, 237–239, 255, 309
Methane/methanol, 197–198
Michalski, Ryszard, 69, 70, 90
Michelangelo, 2
Microprocessor, 48–49, 236
Microsoft, 9, 22
  Kinect, 88, 237, 238
  Windows, 12, 133, 224
  Xbox Live, 160–161
Microsoft Research, 152
Military robots, 21, 279–282, 299, 310
Mill, John Stuart, 93
Miller, George, 224
Minsky, Marvin, 35, 38, 100–101, 102, 110, 112, 113
Mitchell, Tom, 64, 69, 90
Mixability, 135
MLNs. See Markov logic networks (MLNs)
Moby Dick (Melville), 72
Molecular biology, data and, 14
Moneyball (Lewis), 39
Mooney, Ray, 76
Moore’s law, 287
Moravec, Hans, 288
Muggleton, Steve, 80
Multilayer perceptron, 108–111
  autoencoder, 116–118
  Bayesian, 170
  driving a car and, 113
  Master Algorithm and, 244
  NETtalk system, 112
  reinforcement learning and, 222
  support vector machines and, 195
Music composition, case-based reasoning and, 199
Music Genome Project, 171
Mutation, 124, 134–135, 241, 252
Naïve Bayes classifier, 151–153, 171, 304
  Bayesian networks and, 158–159
  clustering and, 209
  Master Algorithm and, 245
  medical diagnosis and, 23
  relational learning and, 228–229
  spam filters and, 23–24
  text classification and, 195–196
Narrative Science, 276
National Security Agency (NSA), 19–20, 232
Natural selection, 28–29, 30, 52
  as algorithm, 123–128
Nature
  Bayesians and, 141
  evolutionaries and, 137–142
  symbolists and, 141
Nature (journal), 26
Nature vs. nurture debate, machine learning and, 29, 137–139
Neal, Radford, 170
Nearest-neighbor algorithms, 24, 178–186, 202, 306–307
  dimensionality and, 186–190
Negative examples, 67
Netflix, 12–13, 183–184, 237, 266
Netflix Prize, 238, 292
Netscape, 9
NETtalk system, 112
Network effect, 12, 299
Neumann, John von, 72, 123
Neural learning, fitness and, 138–139
Neural networks, 99, 100, 112–114, 122, 204
  convolutional, 117–118, 302–303
  Master Algorithm and, 240, 244, 245
  reinforcement learning and, 222
  spin glasses and, 102–103
Neural network structure, Baldwin effect and, 139
Neurons
  action potentials and, 95–96, 104–105
  Hebb’s rule and, 93–94
  McCulloch-Pitts model of, 96–97
  processing in brain and, 94–95
  See also Perceptron
Neuroscience, Master Algorithm and, 26–28
Newell, Allen, 224–226, 302
Newhouse, Neil, 17
Newman, Mark, 160
Newton, Isaac, 293
  attribute selection, 189
  laws of, 4, 14, 15, 46, 235
  rules of induction, 65–66, 81, 82
Newtonian determinism, Laplace and, 145
Newton phase of science, 39–400
New York Times (newspaper), 115, 117
Ng, Andrew, 117, 297
Nietzsche, Friedrich, 178
NIPS.


pages: 479 words: 144,453

Homo Deus: A Brief History of Tomorrow by Yuval Noah Harari


23andMe, agricultural Revolution, algorithmic trading, Anne Wojcicki, anti-communist, Anton Chekhov, autonomous vehicles, Berlin Wall, call centre, Chris Urmson, cognitive dissonance, Columbian Exchange, computer age, Deng Xiaoping, don't be evil, European colonialism, experimental subject, falling living standards, Flash crash, Frank Levy and Richard Murnane: The New Division of Labor, glass ceiling, global village, invention of writing, invisible hand, Isaac Newton, job automation, Kevin Kelly, means of production, Mikhail Gorbachev, Minecraft, Moneyball by Michael Lewis explains big data, mutually assured destruction, new economy, pattern recognition, Peter Thiel, placebo effect, Ray Kurzweil, self-driving car, Silicon Valley, Silicon Valley ideology, stem cell, Steven Pinker, telemarketer, too big to fail, trade route, Turing machine, Turing test, ultimatum game, Watson beat the top human players on Jeopardy!

Rebecca Morelle, ‘Google Machine Learns to Master Video Games’, BBC, 25 February 2015, accessed 12 August 2015; Elizabeth Lopatto, ‘Google’s AI Can Learn to Play Video Games’, The Verge, 25 February 2015, accessed 12 August 2015; Volodymyr Mnih et al., ‘Human-Level Control through Deep Reinforcement Learning’, Nature, 26 February 2015, accessed 12 August 2015.

14. Michael Lewis, Moneyball: The Art of Winning an Unfair Game (New York: W. W. Norton, 2003). Also see the 2011 film Moneyball, directed by Bennett Miller and starring Brad Pitt as Billy Beane.

15. Frank Levy and Richard Murnane, The New Division of Labor: How Computers are Creating the Next Job Market (Princeton: Princeton University Press, 2004); Dormehl, The Formula, 225–6.

16. Tom Simonite, ‘When Your Boss is an Uber Algorithm’, MIT Technology Review, 1 December 2015, retrieved 4 February 2016.

17.

When people realise how fast we are rushing towards the great unknown, and that they cannot count even on death to shield them from it, their reaction is to hope that somebody will hit the brakes and slow us down. But we cannot hit the brakes, for several reasons. Firstly, nobody knows where the brakes are. While some experts are familiar with developments in one field, such as artificial intelligence, nanotechnology, big data or genetics, no one is an expert on everything. No one is therefore capable of connecting all the dots and seeing the full picture. Different fields influence one another in such intricate ways that even the best minds cannot fathom how breakthroughs in artificial intelligence might impact nanotechnology, or vice versa. Nobody can absorb all the latest scientific discoveries, nobody can predict how the global economy will look in ten years, and nobody has a clue where we are heading in such a rush.

However, Dataists believe that humans can no longer cope with the immense flows of data, hence they cannot distil data into information, let alone into knowledge or wisdom. The work of processing data should therefore be entrusted to electronic algorithms, whose capacity far exceeds that of the human brain. In practice, this means that Dataists are sceptical about human knowledge and wisdom, and prefer to put their trust in Big Data and computer algorithms. Dataism is most firmly entrenched in its two mother disciplines: computer science and biology. Of the two, biology is the more important. It was the biological embracement of Dataism that turned a limited breakthrough in computer science into a world-shattering cataclysm that may completely transform the very nature of life. You may not agree with the idea that organisms are algorithms, and that giraffes, tomatoes and human beings are just different methods for processing data.


pages: 481 words: 125,946

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence by John Brockman


3D printing, agricultural Revolution, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, artificial general intelligence, augmented reality, autonomous vehicles, bitcoin, blockchain, clean water, cognitive dissonance, Colonization of Mars, complexity theory, computer age, computer vision, constrained optimization, corporate personhood, cosmological principle, cryptocurrency, cuban missile crisis, Danny Hillis, dark matter, discrete time, Elon Musk, Emanuel Derman, endowment effect, epigenetics, Ernest Rutherford, experimental economics, Flash crash, friendly AI, Google Glasses, hive mind, income inequality, information trail, Internet of things, invention of writing, iterative process, Jaron Lanier, job automation, John von Neumann, Kevin Kelly, knowledge worker, loose coupling, microbiome, Moneyball by Michael Lewis explains big data, natural language processing, Network effects, Norbert Wiener, pattern recognition, Peter Singer: altruism, phenotype, planetary scale, Ray Kurzweil, recommendation engine, Republic of Letters, RFID, Richard Thaler, Rory Sutherland, Search for Extraterrestrial Intelligence, self-driving car, sharing economy, Silicon Valley, Skype, smart contracts, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, superintelligent machines, supervolcano, the scientific method, The Wisdom of Crowds, theory of mind, Thorstein Veblen, too big to fail, Turing machine, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!, Y2K

The reason is simple: Each of us just knows that if we are the one conducting an interview, we will learn a lot about the candidate. It might well be that other people are not good at this task, but I am! This illusion, in direct contradiction to empirical research, means that we continue to choose employees the same way we always did. We size them up, eye to eye. One domain where some progress has been made in adopting a more scientific approach to job-candidate selection is sports, as documented by the Michael Lewis book and movie Moneyball. However, it would be a mistake to think there has been a revolution in how decisions are made in sports. It’s true that most professional sports teams now hire data analysts to help them evaluate potential players, improve training techniques, and devise strategies. But the final decisions about which players to draft or sign, and whom to play, are still made by coaches and general managers, who tend to put more faith in their gut than in the resident geek.

That has come from the steady Moore’s Law doubling of circuit density every two years or so, not from any fundamentally new algorithms. That exponential rise in crunch power lets ordinary-looking computers tackle tougher problems of Big Data and pattern recognition. Consider the most popular algorithms in Big Data and machine learning. One algorithm is unsupervised (requires no teacher to label data). The other is supervised (requires a teacher). They account for a great deal of applied AI. The unsupervised algorithm is called k-means clustering, arguably the most popular algorithm for working with Big Data. It clusters like with like and underlies Google News. Start with a million data points. Group them into 10 or 50 or 100 clusters or patterns. That’s a computationally hard problem. But k-means clustering has been an iterative way to form the clusters since at least the 1960s.
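
The iterative procedure the passage alludes to can be sketched in a few lines. This is a minimal illustration of the standard assign-then-average loop (often called Lloyd's iteration), not code from the book; the function names `nearest` and `kmeans`, the random initialization, and the `iterations` and `seed` parameters are all assumptions made for the example.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=50, seed=0):
    """Hypothetical helper: cluster `points` (tuples of floats) into k groups."""
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct input points as starting centroids.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # nothing moved, so the clustering is stable
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels = kmeans(data, k=2)
    print(centers, labels)
```

Each pass costs roughly (number of points) × (number of clusters) distance computations, which is why the loop stays practical at the scale the passage describes even though finding the best possible clustering is hard.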

Nowadays we have some novel performing entities, such as Apple Siri, Microsoft Cortana, Google Now, and Amazon Echo. These exciting modern services often camp it up with “female” vocal chat. They talk like Turing women—or, rather, they emit lines of dialog somewhat like voice-over actresses. However, they also offer swift access to vast fields of combinatorial Big Data that no human brain could ever contain, or will ever contain. These services are not stand-alone Turing Machines. They’re amorphous global networks, combing through clouds of Big Data, algorithmically cataloging responses from human users, providing real-time user response with wireless broadband, while wearing the pseudohuman mask of a fake individual so as to meet some basic interface-design needs. That’s what they are. Every aspect of the tired “artificial intelligence” metaphor actively gets in the way of our grasping how, why, where, and for whom that is done.