Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier
23andMe, Affordable Care Act / Obamacare, airport security, AltaVista, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, IBM and the Holocaust, index card, informal economy, intangible asset, Internet of things, invention of the printing press, Jeff Bezos, lifelogging, Louis Pasteur, Mark Zuckerberg, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, performance metric, Peter Thiel, Post-materialism, post-materialism, random walk, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, speech recognition, Steve Jobs, Steven Levy, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Watson beat the top human players on Jeopardy!
Mike Flowers and New York City’s analytics—Based on interview with Cukier, July 2012. For a good description, see: Alex Howard, “Predictive data analytics is saving lives and taxpayer dollars in New York City,” O’Reilly Media, June 26, 2012 (http://strata.oreilly.com/2012/06/predictive-data-analytics-big-data-nyc.html).
Walmart and Pop-Tarts—Hays, “What Wal-Mart Knows About Customers’ Habits.”
Big data’s use in slums and in modeling refugee movements—Nathan Eagle, “Big Data, Global Development, and Complex Systems,” http://www.youtube.com/watch?v=yaivtqlu7iM.
Perception of time—Benedict Anderson, Imagined Communities (Verso, 2006).
“What’s past is prologue”—William Shakespeare, “The Tempest,” Act 2, Scene I.
CERN experiment and data storage—Cukier email exchange with CERN researchers, November 2012.
We will still need causal studies and controlled experiments with carefully curated data in certain cases, such as designing a critical airplane part. But for many everyday needs, knowing what, not why, is good enough. And big-data correlations can point the way toward promising areas in which to explore causal relationships. These quick correlations let us save money on plane tickets, predict flu outbreaks, and know which manholes or overcrowded buildings to inspect in a resource-constrained world. They may enable health insurance firms to provide coverage without a physical exam and lower the cost of reminding the sick to take their medication. Languages are translated and cars drive themselves on the basis of predictions made through big-data correlations. Walmart can learn which flavor of Pop-Tarts to stock at the front of the store before a hurricane. (Answer: strawberry.) Of course, causality is nice when you can get it.
Recommendations one-third of Amazon’s income—This figure has never been officially confirmed by the company but has been published in numerous analyst reports and articles in the media, including “Building with Big Data: The Data Revolution Is Changing the Landscape of Business,” The Economist, May 26, 2011 (http://www.economist.com/node/18741392/). The figure was also referenced by two former Amazon executives in interviews with Cukier.
Netflix price information—Xavier Amatriain and Justin Basilico, “Netflix Recommendations: Beyond the 5 stars (Part 1),” Netflix blog, April 6, 2012.
“Fooled by Randomness”—Nassim Nicholas Taleb, Fooled by Randomness (Random House, 2008); for more, see Nassim Nicholas Taleb, The Black Swan: The Impact of the Highly Improbable (2nd ed., Random House, 2010).
Walmart and Pop-Tarts—Constance L. Hays, “What Wal-Mart Knows About Customers’ Habits,” New York Times, November 14, 2004 (http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.html).
Examples of predictive models by FICO, Experian, and Equifax—Scott Thurm, “Next Frontier in Credit Scores: Predicting Personal Behavior,” Wall Street Journal, October 27, 2011 (http://online.wsj.com/article/SB10001424052970203687504576655182086300912.html).
Aviva’s predictive models—Leslie Scism and Mark Maremont, “Insurers Test Data Profiles to Identify Risky Clients,” Wall Street Journal, November 19, 2010 (http://online.wsj.com/article/SB10001424052748704648604575620750998072986.html).
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz
affirmative action, AltaVista, Amazon Mechanical Turk, Asian financial crisis, Bernie Sanders, big data - Walmart - Pop Tarts, Cass Sunstein, computer vision, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, desegregation, Donald Trump, Edward Glaeser, Filter Bubble, game design, happiness index / gross national happiness, income inequality, Jeff Bezos, John Snow's cholera map, Mark Zuckerberg, Nate Silver, peer-to-peer lending, Peter Thiel, price discrimination, quantitative hedge fund, Ronald Reagan, Rosa Parks, sentiment analysis, Silicon Valley, statistical model, Steve Jobs, Steven Levy, Steven Pinker, TaskRabbit, The Signal and the Noise by Nate Silver, working poor
And, in the prediction business, you just need to know that something works, not why. For example, Walmart uses data from sales in all their stores to know what products to shelve. Before Hurricane Frances, a destructive storm that hit the Southeast in 2004, Walmart suspected—correctly—that people’s shopping habits may change when a city is about to be pummeled by a storm. They pored through sales data from previous hurricanes to see what people might want to buy. A major answer? Strawberry Pop-Tarts. This product sells seven times faster than normal in the days leading up to a hurricane. Based on their analysis, Walmart had trucks loaded with strawberry Pop-Tarts heading down Interstate 95 toward stores in the path of the hurricane. And indeed, these Pop-Tarts sold well. Why Pop-Tarts? Probably because they don’t require refrigeration or cooking.
Why strawberry? No clue. But when hurricanes hit, people turn to strawberry Pop-Tarts apparently. So in the days before a hurricane, Walmart now regularly stocks its shelves with boxes upon boxes of strawberry Pop-Tarts. The reason for the relationship doesn’t matter. But the relationship itself does. Maybe one day food scientists will figure out the association between hurricanes and toaster pastries filled with strawberry jam. But, while waiting for some such explanation, Walmart still needs to stock its shelves with strawberry Pop-Tarts when hurricanes are approaching and save the Rice Krispies treats for sunnier days. This lesson is also clear in the story of Orley Ashenfelter. What Seder is to horses, Ashenfelter, an economist at Princeton, may be to wine. A little over a decade ago, Ashenfelter was frustrated.
., 227, 228 127 Hours (movie), 90, 91 Optimal Decisions Group, 262 Or, Flora, 266 Ortiz, David “Big Papi,” 197–200, 200n, 203 “out-of-sample” tests, 250–51 Page, Larry, 60, 61, 62, 103 pancreatic cancer, Columbia University-Microsoft study of, 28–29 Pandora, 203 Pantheon project (Massachusetts Institute of Technology), 184–85 parents/parenting and child abuse, 145–47, 149–50, 161 and examples of Big Data searches, 22 and prejudice against children, 134–36, 135n Parks, Rosa, 93, 94 Parr, Ben, 153–54 Pathak, Parag, 235–36 PatientsLikeMe.com, 205 patterns, and data science as intuitive, 27, 33 Paul, Chris, 37 paying back loans, 257–61 PECOTA model, 199–200, 200n pedigrees of basketball players, 67 of horses, 66–67, 69, 71 pedometer, Chance emphasis on, 252–53 penis and Freud’s theories, 46 and phallic symbols in dreams, 46–47 size of, 17, 19, 123–24, 124n, 127 “penistrian,” 45, 46, 48, 50 Pennsylvania State University, income of graduates of, 237–39 Peysakhovich, Alex, 254 phallic symbols, in dreams, 46–48 Philadelphia Daily News, and words as data, 95 Philippines, cigarette economy in, 102 physical appearance and dating, 82, 120n and parents prejudice against children, 135–36 and truth about sex, 120, 120n, 125–26, 127 physics, as science, 272–73 pictures, as data, 97–102, 103 Pierson, Emma, 160n Piketty, Thomas, 283 Pinky Pizwaanski (horse), 70 pizza, information about, 77 PlentyOfFish (dating site), 139 Plomin, Robert, 249–50 political science, and digital revolution, 244, 274 politics and A/B testing, 211–14 complexity of, 273 and ignoring what people tell you, 157 and origin of political preferences, 169–71 and truth about the internet, 140–44 and words as data, 95–97 See also conservatives; Democrats; liberals; Republicans polls Google searches compared with, 9 and lying, 107 reliability of, 12 See also specific poll or topic Pop-Tarts, 72 Popp, Noah, 202 Popper, Karl, 45, 272, 273 PornHub (website), 14, 50–52, 54, 116, 120–22, 274 pornography as addiction, 219 
and bias of social media, 151 and breastfeeding, 19 cartoon, 52 child, 121 and digital revolution, 279 and gays, 114–15, 114n, 116, 117, 119 honesty of data about, 53–54 and incest, 50–52 in India, 19 and lying, 110 popular videos on, 152 popularity of, 53, 151 and power of Big Data, 53 search engines for, 61n and truth about sex, 114–15, 117 unemployed and, 58, 59 Posada, Jorge, 200 poverty and life expectancy, 176–78 and words as data, 93, 94 See also income distribution predictions and data science as intuitive, 27 and getting the numbers right, 74 and what counts as data, 74 and what vs. why it works, 71 See also specific topic pregnancy, 20, 187–90 prejudice implicit, 132–34 of parents against children, 134–36, 135n subconscious, 134, 163 truth about, 128–40, 162–63 See also bias; hate; race/racism; Stormfront Premise, 101–2, 103 price discrimination, 262–65 prison conditions, and crime, 235 privacy issues, and danger of empowered government, 267–70 property rights, and words as data, 93, 94 proquest.com, 95 Prosper (lending site), 257 Psy, “Gangnam Style” video of, 152 psychics, 266 psychology and digital revolution, 274, 277–78, 279 as science, 273 as soft science, 273 and traditional research methods, 274 Quantcast, 137 questions asking the right, 21–22 and dating, 82–83 race/racism causes of, 18–19 elections of 2008 and, 2, 6–7, 12, 133 elections of 2012 and, 2–3, 8, 133 elections of 2016 and, 8, 11, 12, 14, 133 explicit, 133, 134 and Harvard Crimson editorial about Zuckerberg, 155 and lying, 109 map of, 7–9 and Obama, 2, 6–7, 8–9, 12, 133, 240, 243–44 and predicting success in basketball, 35, 36–37 and Republicans, 3, 7, 8 Stephens-Davidowitz’s study of, 2–3, 6–7, 12, 14, 243–44 and Trump, 8, 9, 11, 12, 14, 133 and truth about hate and prejudice, 129–34, 162–63 See also Muslims; “nigger” randomized controlled experiments and A/B testing, 209–21 and causality, 208–9 rape, 121–22, 190–91 Rawlings, Craig, 80 “rawtube” (porn site), 59 Reagan, Andy, 88, 90, 91 
Reagan, Ronald, 227 regression discontinuity, 234–36 Reisinger, Joseph, 101–2, 103 relationships, lasting, 31–33 religion, and life expectancy, 177 Renaissance (hedge fund), 246 Republicans core principles of, 94 and origins of political preferences, 170–71 and racism, 3, 7, 8 and words as data, 93–97 See also specific person or election research and expansion of research methodology, 275–76 See also specific researcher or research reviews, of businesses, 265 “Rocket Tube” (gay porn site), 115 Rolling Stones, 278 Romney, Mitt, 10, 212 Roseau County, Minnesota, successful/notable Americans from, 186, 187 Runaway Bride (movie), 192, 195 sabermetricians, 198–99 San Bernardino, California, shooting in, 129–30 Sands, Emily, 202 science and Big Data, 273 and experiments, 272–73 real, 272–73 at scale, 276 soft, 273 search engines differentiation of Google from other, 60–62 for pornography, 61n reliability of, 60 word-count, 71 See also specific engine searchers, typing errors by, 48–50 searches negative words used in, 128–29 See also specific search “secrets about people,” 155–56 Seder, Jeff, 63–66, 68–70, 71, 74, 155, 256 segregation, 141–44.
23andMe, Affordable Care Act / Obamacare, Albert Einstein, big data - Walmart - Pop Tarts, bioinformatics, business intelligence, call centre, cloud computing, computer age, conceptual framework, Credit Default Swap, crowdsourcing, Daniel Kahneman / Amos Tversky, Danny Hillis, data is the new oil, David Brooks, East Village, Edward Snowden, Emanuel Derman, Erik Brynjolfsson, everywhere but in the productivity statistics, Frederick Winslow Taylor, Google Glasses, impulse control, income inequality, indoor plumbing, industrial robot, informal economy, Internet of things, invention of writing, John Markoff, John von Neumann, lifelogging, Mark Zuckerberg, market bubble, meta analysis, meta-analysis, money market fund, natural language processing, obamacare, pattern recognition, payday loans, personalized medicine, precision agriculture, pre–internet, Productivity paradox, RAND corporation, rising living standards, Robert Gordon, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, speech recognition, statistical model, Steve Jobs, Steven Levy, The Design of Experiments, the scientific method, Thomas Kuhn: the structure of scientific revolutions, unbanked and underbanked, underbanked, Von Neumann architecture, Watson beat the top human players on Jeopardy!
Exploiting correlation is the first wave of the big-data phenomenon, and it can be extremely powerful. Indeed, useful and profitable observations increasingly do come from “listening to the data” to find correlations. A handful of large corporations have been at this for years, using their own data. A canonical example of this kind of data discovery is the Pop-Tarts-and-beer case at Walmart from a decade ago. The giant retailer, mining the historical purchasing data from its stores, found that consumers in the path of a predicted hurricane bought strawberry Pop-Tarts at seven times the usual rate and the best-selling item of all before a hurricane was beer. Walmart’s store managers don’t care why that purchasing pattern occurs. They’re just going to stock up on beer and strawberry Pop-Tarts when hurricane warnings come their way.
Randall, 40 Mount Sinai Hospital, 8, 13–14, 15 data science and genomic research at, 163–65, 171, 173–81 medical data and human experience, 68–70 Mundie, Craig, 203 Nakashima, George, 65 Naked Society, The (Packard), 184 Narayanan, Arvind, 204 Nest learning thermostat, 143–45 Google and, 152–53 human behavior and, 147–52 Never-Ending Language Learning system (NELL), of Carnegie Mellon University, 110–11 New York State, Medicaid fraud prevention in, 48 Norvig, Peter, 116 Norway, 48 “notice and choice,” in data collection of personal information, 186, 187–88 Noyes, Eliot, 49 “numerical imagination,” of Hammerbacher, 13–14 Oak Ridge National Laboratory, 176 Obama administration, big data and, 203–4 O’Donnell, Tim, 180–81 OfficeMax, 188–89 Olmo, Harold, 126 Olson, Mike, 101 online advertising, 84–85 as “socio-technical construct,” 193–95 open-source code, IBM and, 9 operations research, 154 optimization, at IBM, 46 Packard, Vance, 184 Palmisano, Samuel, 49–51, 53 “Parable of Google Flu: Traps in Big Data Analysis, The” (Science), 108 Pattern Recognition (Gibson), 154 Paul, Sharoda, 135 payday lending market, 104–7 Pennebaker, James, 199 Pentland, Alex, 15, 203–4, 206 Perlich, Claudia, 120 personality traits, values, and needs, 198–99 personally identifying information, privacy concerns and, 187–92 Pieroni, Stephanie, 36 Pitts, Martha, 57 Pitts, Shereline, 57 Pop-Tarts, beer, and hurricane data, 104 precision agriculture, E. & J.
., 5–6 Snyder, Steven, 165–67, 170 social networks, research using human behavior and, 86–94 retail use, 153–62 spread of information and, 73–74 Twitter posts and, 197–202 see also privacy concerns Social Security numbers, data used to predict person’s, 187–88 software, origin of term, 96 Solow, Robert, 72 Speakeasy programming language, 160 Spee (Harvard club), 28–30 Spohrer, Jim, 25 Stanford University, 211–12 Starbucks, 157 Stockholm, rush-hour pricing in, 47 storytelling, computer algorithms and, 120–21, 149, 165–66, 205, 214 structural racism, in big data racial profiling, 194–95 Structure of Scientific Revolutions, The (Kuhn), 175 Sweeney, Latanya, 193–95 System S, at IBM, 40 Tarbell, Ida, 208 Taylor, Frederick Winslow, 207–8 Tecco, Halle, 16, 25, 28, 168–69 Tetlock, Philip, 67–68 thermostats, learning by, 143–45, 147–53 Thinking, Fast and Slow (Kahneman), 66–67 toggling, 84 Truth in Lending Act (1968), 185 T-shaped people, 25 Tukey, John, 96–97 Turing, Alan, 178–79 Tversky, Amos, 66 Twitter, 85 posts studied for personal information, 197–202 “Two Cultures, The” (Snow), 5–6 “universal machine” (Turing’s theoretical computer), 179 universities, data science and, 15–16, 97–98, 211–12 Unlocking the Value of Personal Data: From Collection to Usage (World Economic Forum), 203 “Unreasonable Effectiveness of Data, The” (Norvig), 116 use-only restrictions, on data, 203 Uttamchandani, Menka, 77–78, 80, 212 VALS (Values, Attitudes, and Lifestyles), 155 Van Alstyne, Marshall, 74 Vance, Ashlee, 85 Vargas, Veronica, 159–60 Varma, Anil, 136–37 Veritas, 91 vineyards, data used for precision agriculture in, 123–33, 212 Vivero, David, 29 Vladeck, David, 203, 204 von Neumann, John, 54 Von Neumann architecture, 54 Walker, Donald, 2, 63, 212 Walmart, 104, 154 Watson, Thomas Jr., 49 Watson technology, of IBM, 45, 66–67, 120, 205 as cloud service, 9, 54 Jeopardy and, 7, 40, 111, 114 medical diagnoses and, 69–70, 109 Watts, Duncan J., 86 weather analysis, with big data, 129–32 
Weitzner, Daniel, 184 “Why ask Why?” (Gelman and Imbens), 115–16 winemaking, precision agriculture and, 123–33, 212 Wing, Michael, 49–50 workforce rebalancing, at IBM, 57 World Economic Forum, 203 Yarkoni, Tal, 199 Yoshimi, Bill, 198 ZestFinance, data correlation and, 104–7 Zeyliger, Philip, 100–101 Zhou, Michelle, 197–202 Zuckerberg, Mark, 28, 86, 89 ABOUT THE AUTHOR Photo by Fred Conrad STEVE LOHR reports on technology, business, and economics for the New York Times.
Affordable Care Act / Obamacare, Bernie Madoff, big data - Walmart - Pop Tarts, call centre, carried interest, cloud computing, collateralized debt obligation, correlation does not imply causation, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, Emanuel Derman, housing crisis, I will remember that I didn’t make the world, and it doesn’t satisfy my equations, illegal immigration, Internet of things, late fees, mass incarceration, medical bankruptcy, Moneyball by Michael Lewis explains big data, new economy, obamacare, Occupy movement, offshore financial centre, payday loans, peer-to-peer lending, Peter Thiel, Ponzi scheme, prediction markets, price discrimination, quantitative hedge fund, Ralph Nader, RAND corporation, recommendation engine, Rubik’s Cube, Sharpe ratio, statistical model, Tim Cook: Apple, too big to fail, Unsafe at Any Speed, Upton Sinclair, Watson beat the top human players on Jeopardy!, working poor
American Express learned this the hard way: Ron Lieber, “American Express Kept a (Very) Watchful Eye on Charges,” New York Times, January 30, 2009, www.nytimes.com/2009/01/31/your-money/credit-and-debit-cards/31money.html. Douglas Merrill’s idea: Steve Lohr, “Big Data Underwriting for Payday Loans,” New York Times, January 19, 2015, http://bits.blogs.nytimes.com/2015/01/19/big-data-underwriting-for-payday-loans/. On the company web page: Website ZestFinance.com, accessed January 9, 2016, www.zestfinance.com/. A typical $500 loan: Lohr, “Big Data Underwriting.” ten thousand data points: Michael Carney, “Flush with $20M from Peter Thiel, ZestFinance Is Measuring Credit Risk Through Non-traditional Big Data,” Pando, July 31, 2013, https://pando.com/2013/07/31/flush-with-20m-from-peter-thiel-zestfinance-is-measuring-credit-risk-through-non-traditional-big-data/. one of the first peer-to-peer exchanges, Lending Club: Richard MacManus, “Facebook App, Lending Club, Passes Half a Million Dollars in Loans,” Readwrite, July 29, 2007, http://readwrite.com/2007/07/29/facebook_app_lending_club_passes_half_a_million_in_loans.
I’ve got loads of memories of people grabbing seconds of asparagus or avoiding the string beans. But they’re all mixed up and hard to formalize in a comprehensive list. The better solution would be to train the model over time, entering data every day on what I’d bought and cooked and noting the responses of each family member. I would also include parameters, or constraints. I might limit the fruits and vegetables to what’s in season and dole out a certain amount of Pop-Tarts, but only enough to forestall an open rebellion. I also would add a number of rules. This one likes meat, this one likes bread and pasta, this one drinks lots of milk and insists on spreading Nutella on everything in sight. If I made this work a major priority, over many months I might come up with a very good model. I would have turned the food management I keep in my head, my informal internal model, into a formal external one.
., schools, to return to that example, evaluates teachers largely on the basis of students’ test scores, while ignoring how much the teachers engage the students, work on specific skills, deal with classroom management, or help students with personal and family problems. It’s overly simple, sacrificing accuracy and insight for efficiency. Yet from the administrators’ perspective it provides an effective tool to ferret out hundreds of apparently underperforming teachers, even at the risk of misreading some of them. Here we see that models, despite their reputation for impartiality, reflect goals and ideology. When I removed the possibility of eating Pop-Tarts at every meal, I was imposing my ideology on the meals model. It’s something we do without a second thought. Our own values and desires influence our choices, from the data we choose to collect to the questions we ask. Models are opinions embedded in mathematics. Whether or not a model works is also a matter of opinion. After all, a key component of every model, whether formal or informal, is its definition of success.
3D printing, algorithmic trading, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, big data - Walmart - Pop Tarts, call centre, Cass Sunstein, Clayton Christensen, commoditize, computer age, death of newspapers, deferred acceptance, Edward Lorenz: Chaos theory, Erik Brynjolfsson, Filter Bubble, Flash crash, Florence Nightingale: pie chart, Frank Levy and Richard Murnane: The New Division of Labor, Google Earth, Google Glasses, High speed trading, Internet Archive, Isaac Newton, Jaron Lanier, Jeff Bezos, job automation, John Markoff, Kevin Kelly, Kodak vs Instagram, lifelogging, Marshall McLuhan, means of production, Nate Silver, natural language processing, Netflix Prize, pattern recognition, price discrimination, recommendation engine, Richard Thaler, Rosa Parks, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Slavoj Žižek, social graph, speech recognition, Steve Jobs, Steven Levy, Steven Pinker, Stewart Brand, the scientific method, The Signal and the Noise by Nate Silver, upwardly mobile, Wall-E, Watson beat the top human players on Jeopardy!, Y Combinator
CHAPTER 3 Do Algorithms Dream of Electric Laws?
A decade ago, Walmart stumbled upon an oddball piece of information while using its data-mining algorithms to comb through the mountains of information generated by its 245 million weekly customers. What it discovered was that, alongside the expected emergency supplies of duct tape, beer and bottled water, no product saw more of an increase in demand during severe weather warnings than strawberry Pop-Tarts. To test this insight, when news broke about the impending Hurricane Frances in 2004, Walmart bosses ordered trucks stocked with the Kellogg’s snack to be delivered to all its stores in the hurricane’s path. When these sold out just as quickly, Walmart bosses knew that they had gained a valuable glimpse into both consumer habits and the power of The Formula.1 Walmart executives weren’t alone in seeing the value of this discovery.
Elias, Norbert. The Civilizing Process (New York: Urizen Books, 1978). 13 This wave metaphor was not, in itself, new: the German sociologist Norbert Elias had referred to “a wave of advancing integration over several centuries” in his book The Civilizing Process, as had other writers over the previous century. 14 Richtel, Matt. “How Big Data Is Playing Recruiter for Specialized Workers.” New York Times, April 27, 2013. nytimes.com/2013/04/28/technology/how-big-data-is-playing-recruiter-for-specialized-workers.html?_r=0. 15 Kwoh, Leslie. “Facebook Profiles Found to Predict Job Performance.” Wall Street Journal, February 21, 2012. online.wsj.com/news/articles/SB10001424052970204909104577235474086304212. 16 Bulmer, Michael. Francis Galton: Pioneer of Heredity and Biometry (Baltimore: Johns Hopkins University Press, 2003). 17 Pearson, Karl.
When these sold out just as quickly, Walmart bosses knew that they had gained a valuable glimpse into both consumer habits and the power of The Formula.1 Walmart executives weren’t alone in seeing the value of this discovery. At the time, psychologist Colleen McCue and Los Angeles police chief Charlie Beck were collaborating on a paper for the law-enforcement magazine The Police Chief. They too seized upon Walmart’s revelation as a way of reimagining police work in a form that would be more predictive and less reactive. Entitled “Predictive Policing: What Can We Learn from Walmart and Amazon about Fighting Crime in a Recession?,” their 2009 paper immediately captured the imagination of law-enforcement professionals around the country when it was published.2 What McCue and Beck meant by “predictive policing” was that, thanks to advances in computing, crime data could now be gathered and analyzed in near-real time—and subsequently used to anticipate, prevent and respond more effectively to those crimes that would take place in the future.
The Internet of Us: Knowing More and Understanding Less in the Age of Big Data by Michael P. Lynch
Affordable Care Act / Obamacare, Amazon Mechanical Turk, big data - Walmart - Pop Tarts, bitcoin, Cass Sunstein, Claude Shannon: information theory, crowdsourcing, Edward Snowden, Firefox, Google Glasses, hive mind, income inequality, Internet of things, John von Neumann, meta analysis, meta-analysis, Nate Silver, new economy, patient HM, prediction markets, RFID, sharing economy, Steve Jobs, Steven Levy, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, WikiLeaks
Similarly, Google Flu Trends doesn’t care why people are searching as they do; it just correlates the data. And Walmart doesn’t care why people buy more Pop-Tarts before a hurricane, nor do insurance companies care why certain credit scores correlate with certain medication adherences; they care only that they do. As Viktor Mayer-Schönberger and Kenneth Cukier put it, “predictions based on correlations lie at the heart of big data. Correlation analyses are now used so frequently that we sometimes fail to appreciate the inroads they have made. And the uses will only increase.” 4 Does the use of big data in this way, however, really signal the end of theory, as Anderson alleged? The answer is no. And, as we’ll see, that is a very good thing. Start with Rudder and Anderson’s remarks. As Rudder puts it, big data seems to allow us to investigate by direct inspection.
As a consequence of the increasing importance of data analytics, we might employ “big data” in a third sense—to refer to firms like Google or Amazon that utilize data analytics as an essential part of their business model, and government agencies like the NSA that use these techniques as an essential part of, well, their business model. In this third sense, Big Data is like Big Oil. Large oil conglomerates are powerful because they control not only how the world’s major energy resource is distributed but also how it is extracted. The tech giants are similar. Energy is not information, but both are resources, and resources by which the world runs. And Big Data, like Big Oil, is big precisely because it can control access to data as well as the extraction of information and knowledge from that data. Big Data refines data for information and knowledge, and we need to pay attention to that fact because knowledge, like energy, is not just a passive, inert resource.
Search as I just did for “Web 3.0 and …” and Google will suggest “big data” and “education”; search for “knowledge and …” and you might get “power” and “information systems.” Complete is a familiar, if rather gentle, form of big data analysis. It works because Google knows not only what much of the world is searching for on the Web, but also what you’ve been searching for. That data is useless without Google’s proprietary analytic tools for transforming the numbers and words into a predictive search. These predictions aren’t perfect. But they are amazingly good, and getting better all the time. Google has done more than perhaps any other single high-profile company or entity to usher in the brave new world of big data. As I noted in the first chapter, the term “big data” can refer to three different things. The first is the ever-expanding volume of data being collected by our digital devices.
Platform Revolution: How Networked Markets Are Transforming the Economy--And How to Make Them Work for You by Sangeet Paul Choudary, Marshall W. van Alstyne, Geoffrey G. Parker
3D printing, Affordable Care Act / Obamacare, Airbnb, Alvin Roth, Amazon Mechanical Turk, Amazon Web Services, Andrei Shleifer, Apple's 1984 Super Bowl advert, autonomous vehicles, barriers to entry, big data - Walmart - Pop Tarts, bitcoin, blockchain, business process, buy low sell high, chief data officer, Chuck Templeton: OpenTable, clean water, cloud computing, connected car, corporate governance, crowdsourcing, data acquisition, data is the new oil, digital map, discounted cash flows, disintermediation, Edward Glaeser, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, financial innovation, Haber-Bosch Process, High speed trading, information asymmetry, Internet of things, inventory management, invisible hand, Jean Tirole, Jeff Bezos, jimmy wales, John Markoff, Khan Academy, Kickstarter, Lean Startup, Lyft, Marc Andreessen, market design, Metcalfe’s law, multi-sided market, Network effects, new economy, payday loans, peer-to-peer lending, Peter Thiel, pets.com, pre–internet, price mechanism, recommendation engine, RFID, Richard Stallman, ride hailing / ride sharing, Robert Metcalfe, Ronald Coase, Satoshi Nakamoto, self-driving car, shareholder value, sharing economy, side project, Silicon Valley, Skype, smart contracts, smart grid, Snapchat, software is eating the world, Steve Jobs, TaskRabbit, The Chicago School, the payments system, Tim Cook: Apple, transaction costs, two-sided market, Uber and Lyft, Uber for X, winner-take-all economy, zero-sum game, Zipcar
It referred to an unusual housing option for professionals who planned to attend the upcoming joint convention of two industrial design organizations, the International Congress of Societies of Industrial Design (ICSID) and the Industrial Designers Society of America (IDSA): If you’re heading out to the ICSID/IDSA World Congress/Connecting ’07 event in San Francisco next week and have yet to make accommodations, well, consider networking in your jam-jams. That’s right. For “an affordable alternative to hotels in the city,” imagine yourself in a fellow design industry person’s home, fresh awake from a snooze on the ol’ air mattress, chatting about the day’s upcoming events over Pop Tarts and OJ. The hosts for this “networking in your jam-jams” opportunity were Brian Chesky and Joe Gebbia, budding designers who’d moved to San Francisco only to find they couldn’t afford the rent on the loft they shared. Strapped for cash, they impulsively decided to make air mattresses and their own services as part-time tour guides available to convention attendees. Chesky and Gebbia attracted three weekend guests and made a thousand bucks, which covered the next month’s rent.
He lists eight markets with the potential to generate new multi-billion-dollar industries based on smart connections among industrial devices:
• Security: using platform-based networks to protect industrial assets from attacks
• Network: designing, building, and servicing the networks that will link and control industrial tools
• Connected services: developing software and systems to manage the new networks
• Product as a service: transitioning industrial companies from selling machines and tools to selling services facilitated by platform connections
• Payments: implementing new ways to create and capture value from industrial equipment
• Retrofits: equipping the $6.8 trillion worth of existing industrial machinery in the U.S. to participate in the new industrial Internet
• Translation: teaching a wide array of devices and software systems to share data and communicate with one another
• Vertical applications: finding ways to connect industrial tools at various places in the value chain to solve specific problems
In total, Mount concludes (drawing on data from a World Economic Forum report) that the Industrial Awakening will generate $14.2 trillion of global output by 2030.13 Economist Jeremy Rifkin has deftly summarized this development, as well as some of its broader implications: There are now 11 billion sensors connecting devices to the internet of things. By 2030, 100 trillion sensors will be [in place] … continually sending big data to the communications, energy and logistics internets. Anyone will be able to access the internet of things and use big data and analytics to develop predictive algorithms that can speed efficiency, dramatically increase productivity and lower the marginal cost of producing and distributing physical things, including energy, products and services, to near zero, just as we now do with information goods.14 We may not be on the verge of seeing the majority of physical goods priced at or even near to zero—not yet.
In response, over 2,000 extension developers signed up in the first twelve months. The power of APIs to attract extension developers and the value they can create is enormous. Compare the financial results experienced by two major retailers: traditional giant Walmart and online platform Amazon. Amazon has some thirty-three open APIs as well as over 300 API “mashups” (i.e., combination tools that span two or more APIs), enabling e-commerce, cloud computing, messaging, search engine optimization, and payments. By contrast, Walmart has just one API, an e-commerce tool.14 Partly as a result of this difference, Amazon’s stock market capitalization exceeded that of Walmart for the first time in June 2015, reflecting Wall Street’s bullish view of Amazon’s future growth prospects.15 Other platform businesses have reaped similar benefits from their APIs. Cloud computing and computer services platform Salesforce generates 50 percent of its revenues through APIs, while for travel platform Expedia, the figure is 90 percent.16 The third category of developers who add value to the interactions on a platform are data aggregators.
Move Fast and Break Things: How Facebook, Google, and Amazon Cornered Culture and Undermined Democracy by Jonathan Taplin
1960s counterculture, 3D printing, affirmative action, Affordable Care Act / Obamacare, Airbnb, Amazon Mechanical Turk, American Legislative Exchange Council, Apple's 1984 Super Bowl advert, back-to-the-land, barriers to entry, basic income, battle of ideas, big data - Walmart - Pop Tarts, bitcoin, Brewster Kahle, Buckminster Fuller, Burning Man, Clayton Christensen, commoditize, creative destruction, crony capitalism, crowdsourcing, data is the new oil, David Brooks, David Graeber, don't be evil, Donald Trump, Douglas Engelbart, Dynabook, Edward Snowden, Elon Musk, equal pay for equal work, Erik Brynjolfsson, future of journalism, future of work, George Akerlof, George Gilder, Google bus, Hacker Ethic, Howard Rheingold, income inequality, informal economy, information asymmetry, information retrieval, Internet Archive, Internet of things, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: technological unemployment, John von Neumann, Joseph Schumpeter, Kevin Kelly, Kickstarter, labor-force participation, life extension, Marc Andreessen, Mark Zuckerberg, Menlo Park, Metcalfe’s law, Mother of all demos, move fast and break things, natural language processing, Network effects, new economy, Norbert Wiener, offshore financial centre, packet switching, Paul Graham, Peter Thiel, Plutocrats, pre–internet, Ray Kurzweil, recommendation engine, rent-seeking, revision control, Robert Bork, Robert Gordon, Robert Metcalfe, Ronald Reagan, Sand Hill Road, secular stagnation, self-driving car, sharing economy, Silicon Valley, Silicon Valley ideology, smart grid, Snapchat, software is eating the world, Steve Jobs, Stewart Brand, technoutopianism, The Chicago School, The Market for Lemons, Tim Cook: Apple, trade route, transfer pricing, trickle-down economics, Tyler Cowen: Great Stagnation, universal basic income, unpaid internship, We wanted flying cars, instead we got 140 characters, web application, Whole Earth Catalog, winner-take-all economy, women in the workforce, Y Combinator
And it’s ludicrous to believe that this stuff doesn’t alter our brains. It’s also equally ludicrous to believe that—at the very least—this mass distraction and manipulation is not convenient for the people who are in charge. People are starving. They may not know it because they’re being fed mass-produced garbage. The packaging is colorful and loud, but it’s produced in the same factories that make Pop-Tarts and iPads by people sitting around thinking, “What can we do to get people to buy more of these?” And they’re very good at their jobs. But that’s what it is you’re getting, because that’s what they’re making. They’re selling you something. And the world is built on this now. Politics and government are built on this; corporations are built on this. Interpersonal relationships are built on this.
The very rich, when they get to be 130 years old or more, would be so fearful of ordinary causes of death—a car accident, a plane crash, a terrorist bomb—that, having spent millions of dollars on immortality, they might be afraid to leave their mansions for fear of losing money on their investment. I would say it takes no big leap to guess that both Peter Thiel and Larry Page truly believe that technology can deliver happiness. In a new book, The Internet of Us: Knowing More and Understanding Less in the Age of Big Data, Michael Patrick Lynch starts with a thought experiment: “Imagine a society where smartphones are miniaturized and hooked directly into a person’s brain.” Google’s Larry Page is already working on this. Then Lynch takes us several generations into the future, where we have stopped learning by observation and reason and have become totally dependent on the Google Now chip in our brains. And then imagine some disaster disables the worldwide communications grid.
Schumacher, Small Is Beautiful: Economics as if People Mattered (New York: Harper, 1973). Yuval Levin, Fractured Republic: Renewing America’s Social Contract in the Age of Individualism (New York: Basic Books, 2016). Toni Morrison, Ta-Nehisi Coates, and Sonia Sanchez, “Art is Dangerous,” VOX, June 17, 2016, www.vox.com/2016/6/17/11955704/ta-nehisi-coates-toni-morrison-sonia-sanchez-in-conversation. Yuval Noah Harari, “Big Data, Google, and the End of Free Will,” Financial Times, August 26, 2016, www.ft.com/content/50bb4830-6a4c-11e6-ae5b-a7cc5dd5a28c.
The Future of the Professions: How Technology Will Transform the Work of Human Experts by Richard Susskind, Daniel Susskind
23andMe, 3D printing, additive manufacturing, AI winter, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, Andrew Keen, Atul Gawande, Automated Insights, autonomous vehicles, Big bang: deregulation of the City of London, big data - Walmart - Pop Tarts, Bill Joy: nanobots, business process, business process outsourcing, Cass Sunstein, Checklist Manifesto, Clapham omnibus, Clayton Christensen, clean water, cloud computing, commoditize, computer age, Computer Numeric Control, computer vision, conceptual framework, corporate governance, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, death of newspapers, disintermediation, Douglas Hofstadter, en.wikipedia.org, Erik Brynjolfsson, Filter Bubble, Frank Levy and Richard Murnane: The New Division of Labor, full employment, future of work, Google Glasses, Google X / Alphabet X, Hacker Ethic, industrial robot, informal economy, information retrieval, interchangeable parts, Internet of things, Isaac Newton, James Hargreaves, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Khan Academy, knowledge economy, lifelogging, lump of labour, Marshall McLuhan, Metcalfe’s law, Narrative Science, natural language processing, Network effects, optical character recognition, Paul Samuelson, personalized medicine, pre–internet, Ray Kurzweil, Richard Feynman, Second Machine Age, self-driving car, semantic web, Shoshana Zuboff, Skype, social web, speech recognition, spinning jenny, strong AI, supply-chain management, telepresence, The Future of Employment, the market place, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, transaction costs, Turing test, Watson beat the top human players on Jeopardy!, young professional
In relation to the latter, on one view, the ‘proportion of the world’s data that comes from such sensors is expected to increase from 11 percent in 2005 to 42 percent in 2020’.40 The upshot of all of this is that great volumes of data are now at large, and the broad aim of data scientists is to develop methods for collecting, analysing, and exploiting these data. Case studies of success in Big Data abound. One (not entirely uncontroversial) illustration is Google Flu Trends, a system that can identify outbreaks of flu earlier than was possible in the past, by identifying geographical clustering of users whose search requests are made up of similar symptoms. Another is provided by Walmart, which analysed the buying habits of its customers prior to hurricanes and found not just that flashlights were in greater demand but so too were Pop-Tarts; and this insight enabled them to stock up accordingly when the next storm came round. Natural language translation systems and self-driving cars are also said to operate on the back of Big Data techniques.41 While there are many ways in which Big Data is valuable,42 most specialists in the field would agree with Mayer-Schönberger and Cukier that, ‘[a]t its core, big data is about predictions … it’s about applying math to huge quantities of data in order to infer probabilities … these systems perform well because they are fed with lots of data on which to base their predictions’.43 More extravagantly, Eric Siegel, a computer scientist, goes further when he speaks of ‘computers automatically developing new knowledge and capabilities by furiously feeding on modern society’s greatest and most potent unnatural resource: data’.44 If we combine these views of Big Data, we can see its promise for the professions—as a way of making predictions and as a way of generating new knowledge.
Liddy, that the future of audit was ‘the capacity to examine 100 percent of a client’s transactions’.275 This ambition of ‘100 per cent testing’—using all available data, and not just a representative sample—is a particular case of a more general ambition very much in vogue in statistics, as discussed by Viktor Mayer-Schönberger and Kenneth Cukier in their book Big Data. One of the general features of Big Data, the authors argue, is precisely this move from taking small samples of data to using all the data instead (as they put it, ‘from some to all’).276 The next step on from 100 per cent testing is a phenomenon referred to by auditors at the vanguard as ‘continuous auditing’. Combining ongoing review of transactions and traditional financial accounts with platforms that can draw on more varied data sources, the aim is real-time insight into a company’s financial health. Again, this is a reflection of a general ambition in Big Data—to use data derived from many different sources, in different formats, and with less formal structure (not, for example, data that are carefully presented in a spreadsheet).
Coffee, Gatekeepers: The Role of the Professions and Corporate Governance (2006), 15. 273 ‘The Dozy Watchdogs’, Economist, 13 Dec. 2014. 274 James Shanteau, ‘Cognitive Heuristics and Biases in Behavioral Auditing: Review, Comments, and Observations’, Accounting, Organizations, and Society, 14: 1 (1989), 165–77. 275 James P. Liddy, ‘The Future of Audit’, Forbes, 4 Aug. 2014 <http://www.forbes.com> (accessed 8 March 2015). 276 Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think (2013), 26. 277 Mayer-Schönberger and Cukier, Big Data, 32. 278 Mayer-Schönberger and Cukier, Big Data, and James Surowiecki, ‘A Billion Prices Now’, New Yorker, 30 May 2011. 279 Michael Andersen, ‘Four crowdsourcing lessons from the Guardian’s (spectacular) expenses-scandal experiment’, NiemanLab, 23 June 2009 <http://www.niemanlab.org> (accessed 8 March 2015). 280 <https://www.xbrl.org>. 281 For instance, ‘the long shadow of the gentleman architect still hangs over the profession’, in Dickon Robinson et al., ‘The Future for Architects?’
To Save Everything, Click Here: The Folly of Technological Solutionism by Evgeny Morozov
3D printing, algorithmic trading, Amazon Mechanical Turk, Andrew Keen, augmented reality, Automated Insights, Berlin Wall, big data - Walmart - Pop Tarts, Buckminster Fuller, call centre, carbon footprint, Cass Sunstein, choice architecture, citizen journalism, cloud computing, cognitive bias, creative destruction, crowdsourcing, data acquisition, Dava Sobel, disintermediation, East Village, en.wikipedia.org, Fall of the Berlin Wall, Filter Bubble, Firefox, Francis Fukuyama: the end of history, frictionless, future of journalism, game design, Gary Taubes, Google Glasses, illegal immigration, income inequality, invention of the printing press, Jane Jacobs, Jean Tirole, Jeff Bezos, jimmy wales, Julian Assange, Kevin Kelly, Kickstarter, license plate recognition, lifelogging, lone genius, Louis Pasteur, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, moral panic, Narrative Science, Nicholas Carr, packet switching, PageRank, Parag Khanna, Paul Graham, peer-to-peer, Peter Singer: altruism, Peter Thiel, pets.com, placebo effect, pre–internet, Ray Kurzweil, recommendation engine, Richard Thaler, Ronald Coase, Rosa Parks, self-driving car, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, Skype, Slavoj Žižek, smart meter, social graph, social web, stakhanovite, Steve Jobs, Steven Levy, Stuxnet, technoutopianism, the built environment, The Chicago School, The Death and Life of Great American Cities, the medium is the message, The Nature of the Firm, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas L Friedman, transaction costs, urban decay, urban planning, urban sprawl, Vannevar Bush, WikiLeaks
., 1972), 6. 182 ShotSpotter: Ethan Watters, “Shot Spotter,” Wired, March 2007, http://www.wired.com/wired/archive/15.04/shotspotter.html. 183 PredPol: on PredPol and predictive policing in general, see “Sci-fi Policing: Predicting Crime before It Occurs,” Associated Press, July 1, 2012; Joel Rubin, “Stopping Crime before It Starts,” Los Angeles Times, August 21, 2010, http://articles.latimes.com/2010/aug/21/local/la-me-predictcrime-20100427-1. 183 Consider the New York Police Department’s latest innovation: “NYPD, Microsoft Push Big Data Policing into Spotlight,” Informationweek, August 20, 2012, http://www.informationweek.com/security/privacy/nypd-microsoft-push-big-data-policing-in/240005838. 183 “understand the unique groups in their customer base”: C. Beck and C. McCue, “Predictive Policing: What Can We Learn from Wal-Mart and Amazon About Fighting Crime in a Recession?,” Police Chief 76, no. 11 (2009), http://www.policechiefmagazine.org/magazine/index.cfm?fuseaction=print_display&article_id=1942&issue_id=112009. 185 “Predictive algorithms are not magic boxes”: Andrew Guthrie Ferguson, “Predictive Policing: The Future of Reasonable Suspicion,” Emory Law Journal, May 2, 2012, http://ssrn.com/abstract=2050001. 185 “the environmental vulnerability that encouraged”: ibid. 185 financial authorities in Hong Kong and Australia: for more on this, see Jeremy Grant, “Australia Clamps Down on ‘Algo’ Trading,” Financial Times, August 13, 2012, http://www.ft.com/intl/cms/s/0/ad11c4bc-e4f2-11e1-8e29-00144feab49a.html, and “Hong Kong Considers Annual Inspections of Algorithms,” Automated Trader, July 26, 2012, http://www.automatedtrader.net/headlines/129847/hong-kong-considers-annual-inspections-of-algorithms. 186 Facebook began using PhotoDNA: Riva Richmond, “Facebook’s New Way to Combat Child Pornography,” New York Times Gadgetwise, May 19, 2011, http://gadgetwise.blogs.nytimes.com/2011/05/19/facebook-to-combat-child-porn-using-microsofts-technology.
186 “We’ve never wanted to set up an environment”: Joseph Menn, “Social Networks Scan for Sexual Predators, with Uneven Results,” Reuters, July 12, 2012, http://www.reuters.com/article/2012/07/12/us-usa-internet-predators-idUSBRE86B05G20120712. 187 A headline that appeared in the Wall Street Journal: “Can Data Mining Stop the Killing?
Not surprisingly, gamification has already become a favorite trick in the solutionist tool kit. That everything can be gamified does not mean that everything ought to be. Wired reports on how game theorist Jesse Schell, attempting to show that gamification has its limits, gave a conference talk describing “a world in which a person’s every action—brushing their teeth, showing up to work on time, tattooing an advertisement for Pop-Tarts onto their forearm—earned points.” Alas, Schell’s attempt to encourage more critical thinking by gamification apologists backfired. As Schell told Wired, “I’ve had dozens of people come to me saying, ‘Your talk was so influential to me that I started a company.’ . . . All I can think is, oh God, don’t blame me for that.” It all looks extremely appealing—especially to the bored and tired citizenry.
Of course, algorithms can be configured differently—and some independent labels might choose to release music that is bound to remain unpopular—but it’s hard to expect the major labels to pass up the opportunity to make more, and safer, money by deploying the algorithms.
Surviving Big Data
As we transition into the meme-saturated world of “algorithmic audiences,” it becomes very hard to remember the time when serious news media didn’t obsess over whether something was a “total bummer” and reported news that was important and worth caring about, regardless of how it affected the emotional well-being of the audience. To celebrate “the age of big data” and acquiesce to the ongoing invasion of journalism by various statistical measures and indicators is to give in to solutionism and endorse a very different, complacent kind of journalism. Ignorance of one’s audience—and a certain inefficiency that this introduces into the world of journalism—is not necessarily a problem that needs to be solved, even if the latest tools make the solutions trivial and obvious.