Netflix Prize

31 results

pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel


Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, call centre, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, Erik Brynjolfsson, experimental subject, Google Glasses, happiness index / gross national happiness, job satisfaction, Johann Wolfgang von Goethe, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, risk-adjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, software as a service, speech recognition, statistical model, Steven Levy, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra

Netflix Prize team BellKor’s Pragmatic Chaos: “BellKor’s Pragmatic Chaos Is the Winner of the $1 Million Netflix Prize!!!!” September 17, 2009. Regarding SpaceShipOne and the XPrize: XPrize Foundation, “Ansari X Prize,” XPrize Foundation, updated April 25, 2012. Netflix Prize team PragmaticTheory: PragmaticTheory website. Netflix Prize team BigChaos: Istvan Pilaszy, “Lessons That We Learned from the Netflix Prize,” Predictive Analytics World Washington, DC, Conference, October 21, 2009, Washington, DC.–13. Clive Thompson, “If You Liked This, You’re Sure to Love That,” New York Times, November 21, 2008. Netflix Prize team The Ensemble: Blog post by Aron, “Netflix Prize Conclusion,” The Ensemble, September 22, 2009.

Information Security Amazon Data Security Competition. Approaches to the Netflix Prize: Clive Thompson, “If You Liked This, You’re Sure to Love That,” New York Times, November 21, 2008. Regarding collaboration rather than competition on the Netflix Prize: Jordan Ellenberg, “This Psychologist Might Outsmart the Math Brains Competing for the Netflix Prize,” Wired, February 25, 2008. Overview of several uses of ensembles by Netflix Prize teams: Todd Holloway, “Ensemble Learning Better Predictions through Diversity,” ETech 2008, March 11, 2008. Andreas Töscher from Netflix Prize team BigChaos: “Advanced Approaches for Recommender System and the Netflix Prize,” Predictive Analytics World San Francisco Conference, February 28, 2009, San Francisco, CA.

Quote from Bart Baesens: Bart Baesens, PhD, “Building Bulletproof Models,” sascom Magazine, 3rd quarter, 2010. Chapter 5 Layperson competitors for the Netflix Prize: Eric Siegel, PhD, “Casual Rocket Scientists: An Interview with a Layman Leading the Netflix Prize, Martin Chabbert,” September 2009. $1 million Netflix Prize: Netflix Prize, September 21, 2009. Seventy percent of Netflix movie choices based on recommendations: Jeffrey M. O’Brien, “The Netflix Effect,” Wired Magazine Online, December 12, 2002. Michael Liedtke, “Netflix Recommendations Are About to Get Better, Say Execs,” Huffington Post Online, April 9, 2012. Netflix Prize team BellKor’s Pragmatic Chaos: “BellKor’s Pragmatic Chaos Is the Winner of the $1 Million Netflix Prize!!!!”


pages: 368 words: 96,825

Bold: How to Go Big, Create Wealth and Impact the World by Peter H. Diamandis, Steven Kotler


3D printing, additive manufacturing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, cloud computing, crowdsourcing, Daniel Kahneman / Amos Tversky, dematerialisation, deskilling, Elon Musk, Exxon Valdez, fear of failure, Firefox, Galaxy Zoo, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, industrial robot, Internet of things, Jeff Bezos, John Harrison: Longitude, Jono Bacon, Just-in-time delivery, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, loss aversion, Louis Pasteur, Mahatma Gandhi, Mark Zuckerberg, Mars Rover, meta analysis, meta-analysis, microbiome, minimum viable product, move fast and break things, Narrative Science, Netflix Prize, Network effects, Oculus Rift, optical character recognition, packet switching, PageRank, pattern recognition, performance metric, Peter H. Diamandis: Planetary Resources, Peter Thiel, pre–internet, Ray Kurzweil, recommendation engine, Richard Feynman, ride hailing / ride sharing, risk tolerance, rolodex, self-driving car, sentiment analysis, shareholder value, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart grid, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, telepresence, telepresence robot, Turing test, urban renewal, web application, X Prize, Y Combinator

When asked about their experience, Vor-Tek member and tattoo artist Fred Giovannitti said, “We get asked all the time, ‘How long have you been in the oil industry?’ and I ask back, ‘Counting today?’ ” The lesson here is that in incentive competitions, results can come from the most unusual of places, from players you would never expect, and from technologies you might never suspect. Lee Stein, an XPRIZE benefactor, says, “When you are looking for a needle in the haystack, incentive competitions help the needle come to you.” Case Study 2: The Netflix Prize The best incentive prizes are those that solve important puzzles that people want solved and people want to solve—and there’s a difference. The Wendy Schmidt Oil Cleanup XCHALLENGE falls directly into the former category. It took me over ten years to raise the money for the Ansari XPRIZE, but Wendy Schmidt stepped forward to fund the Oil Cleanup Challenge in less than forty-eight hours. Certainly one reason I raised money for the Oil Cleanup Challenge so quickly was the fact that by then I had a track record of success and a considerably thicker Rolodex, but a more important factor was the 800,000 gallons of crude gushing into the Gulf Coast each day.

By the middle 2000s, Netflix engineers had plucked all the low-hanging fruit and the rate of Cinematch optimization had slowed to a crawl. Every time one of their recommendations was a clear miss—based on your interest in Breakfast at Tiffany’s we think you’ll enjoy Naked Lunch—customers got angry. And with new competitors sprouting up in the likes of Hulu, Amazon, and YouTube, this ire was getting expensive. So Netflix decided to attack the problem head-on, announcing the Netflix Prize in October 2006—a million-dollar purse for whoever could write an algorithm that improved their existing system by 10 percent.15 And this contest is a perfect example of what happens when you design prizes around intrinsic motivations. Competition, coding, and movies—what could be more fun than that? Within two weeks, Netflix had received nearly 170 submissions, three of them outperforming Cinematch.

Within ten months, there were over 20,000 teams from 150 different countries involved. By the time the contest was won, in 2009, that figure had doubled to 40,000 teams. But the results that Netflix saw extended far beyond the number of contestants entered in a contest. As Jordan Ellenberg explained in Wired: “Secrecy hasn’t been a big part of the Netflix competition. The prize hunters, even the leaders, are startlingly open about the methods they’re using, acting more like academics huddled over a knotty problem than entrepreneurs jostling for a $1 million payday. In December 2006, a competitor called ‘simonfunk’ posted a complete description of his algorithm—which at the time was tied for third place—giving everyone else the opportunity to piggyback on his progress. ‘We had no idea the extent to which people would collaborate with each other,’ says Jim Bennett, vice president for recommendation systems at Netflix.”16 And this isn’t an aberration.


Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport


Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, data acquisition, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, Mark Zuckerberg, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, self-driving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining

GE is primarily focused on big data for improving services and is already using data science to optimize the service contracts and maintenance intervals for industrial products. Google, of course—the ultimate big data firm—uses data scientists to refine its core search and ad-serving algorithms. Zynga uses data scientists to target games and game-related products to customers. Netflix created the well-known Netflix Prize for the data science team that could optimize the company’s movie recommendations for customers. The testing firm Kaplan uses its data scientists to begin advising customers on effective learning and test-preparation strategies. These companies’ big data efforts are directly focused on products, services, and customers. This has important implications, of course, for the organizational locus of big data and the processes and pace of new product development.

On the decision side, the primary value from big data derives from adding new sources of data to explanatory and predictive models. Many big data enthusiasts argue that there is more value from adding new sources of data to a model than to refining the model itself. For example, Anand Rajaraman, who works at @WalMartLabs and teaches at Stanford, ran a bit of a natural experiment in one of his Stanford classes along the lines of the Netflix Prize—the contest that invited anyone to try to improve the Netflix customer video preference algorithm and win a million bucks.16 One of the groups in Rajaraman’s classes used the data that Netflix provided and applied very sophisticated algorithms to it. Another group supplemented the data (illegally, according to the rules of the competition) with movie genre data from the Internet Movie Database.

There are many other examples of this phenomenon in both online and primarily offline businesses. GE is mainly focused on big data for improving services—among other things, to optimize the service contracts and maintenance intervals for industrial products. The real estate site Zillow created the Zestimate home price estimate, as well as rental cost Zestimates and a national home value index. Netflix created the Netflix Prize for the data science team that could optimize the company’s movie recommendations for customers and, as I noted in chapter 2, is now using big data to help in the creation of proprietary content. The testing firm Kaplan uses its big data to begin advising customers on effective learning and test-preparation strategies. Novartis focuses on big data—the health-care industry calls it informatics—to develop new drugs.


pages: 56 words: 16,788

The New Kingmakers by Stephen O'Grady


Amazon Web Services, barriers to entry, cloud computing, correlation does not imply causation, crowdsourcing, DevOps, Jeff Bezos, Khan Academy, Kickstarter, Mark Zuckerberg, Netflix Prize, Paul Graham, Silicon Valley, Skype, software as a service, software is eating the world, Steve Ballmer, Steve Jobs, Tim Cook: Apple, Y Combinator

Netflix’s own algorithm, Cinematch, attempted to predict what rating a given user would assign to a given film. On October 2, 2006, Netflix announced the Netflix Prize: The first team of non-employees that could best their in-house algorithm by 10% would claim $1,000,000. This prize had two major implications. First, it implied that the benefits of an improved algorithm would exceed one million dollars for Netflix, presumably through customer acquisition and improvements in retention. Second, it implied that crowd-sourcing had the potential to deliver better results than the organization could produce on its own. In this latter assumption, Netflix was proven correct. On October 8—just six days after the prize was announced—an independent team bested the Netflix algorithm, albeit by substantially less than ten percent. The 10% threshold was finally reached in 2009.
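The excerpt above does not say how "besting the in-house algorithm by 10%" was measured. The contest is generally described as scoring entries by root-mean-square error (RMSE) on held-back ratings, so the following is a minimal sketch under that assumption. The ratings, the two prediction lists, and the function names are all invented for illustration and are not Netflix data or code.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_error, challenger_error):
    """Fraction by which the challenger lowers the baseline's error."""
    return (baseline_error - challenger_error) / baseline_error


# Hypothetical star ratings, made up for this example only.
actual = [4, 3, 5, 2, 4, 1]
baseline = [3.5, 3.5, 4.0, 3.0, 3.5, 2.5]    # stand-in for an in-house model
challenger = [3.9, 3.1, 4.6, 2.4, 3.8, 1.6]  # stand-in for a contestant's model

base_err = rmse(baseline, actual)
chal_err = rmse(challenger, actual)
gain = relative_improvement(base_err, chal_err)

print(f"baseline RMSE {base_err:.4f}, challenger RMSE {chal_err:.4f}")
print(f"relative improvement {gain:.1%}, meets 10% threshold: {gain >= 0.10}")
```

Measuring the gain as a fraction of the baseline's error, rather than as an absolute difference, is what makes a fixed "10%" target meaningful regardless of how large the starting error is.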

In September of that year, Netflix announced that the team “BellKor’s Pragmatic Chaos”—composed of researchers from AT&T Labs, Pragmatic Theory, and Yahoo!—had won the Netflix Prize, taking home a million dollars for their efforts. A year earlier, meanwhile, Netflix had enabled the recruitment of millions of other developers by providing official APIs. In September 2008, Netflix launched, where developers could independently register with Netflix to get access to APIs that would enable them to build applications that would manage users’ video queues, check availability, and access account details. Just as Netflix believed that the wider world might be able to build a better algorithm, so too did it believe that out of the millions of developers in the world, one of them might be able to build a better application than Netflix itself.


pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier


23andMe, Affordable Care Act / Obamacare, airport security, AltaVista, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, IBM and the Holocaust, index card, informal economy, Internet of things, invention of the printing press, Jeff Bezos, Louis Pasteur, Mark Zuckerberg, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, performance metric, Peter Thiel, Post-materialism, post-materialism, random walk, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, speech recognition, Steve Jobs, Steven Levy, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Watson beat the top human players on Jeopardy!

Still, within days, the New York Times cobbled together searches like “60 single men” and “tea for good health” and “landscapers in Lilburn, Ga” to successfully identify user number 4417749 as Thelma Arnold, a 62-year-old widow from Lilburn, Georgia. “My goodness, it’s my whole personal life,” she told the Times reporter when he came knocking. “I had no idea somebody was looking over my shoulder.” The ensuing public outcry led to the ouster of AOL’s chief technology officer and two other employees. Yet a mere two months later, in October 2006, the movie rental service Netflix did something similar in launching its “Netflix Prize.” The company released 100 million rental records from nearly half a million users—and offered a bounty of a million dollars to any team that could improve its film recommendation system by at least 10 percent. Again, personal identifiers had been carefully removed from the data. And yet again, a user was re-identified: a mother and a closeted lesbian in America’s conservative Midwest, who because of this later sued Netflix under the pseudonym “Jane Doe.”

Netflix identified individual—Ryan Singel, “Netflix Spilled Your Brokeback Mountain Secret, Lawsuit Claims,” Wired, December 17, 2009. On the Netflix data release—Arvind Narayanan and Vitaly Shmatikov, “Robust De-Anonymization of Large Sparse Datasets,” Proceedings of the 2008 IEEE Symposium on Security and Privacy, p. 111 et seq.; Arvind Narayanan and Vitaly Shmatikov, “How to Break the Anonymity of the Netflix Prize Dataset,” October 18, 2006, arXiv:cs/0610105 [cs.CR]. On identifying people from three characteristics—Philippe Golle, “Revisiting the Uniqueness of Simple Demographics in the US Population,” Association for Computing Machinery Workshop on Privacy in Electronic Society 5 (2006), p. 77. On the structural weakness of anonymization—Paul Ohm, “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization,” 57 UCLA Law Review 1701 (2010).

Scientific American, March 30, 2007. Murray, Alexander. Reason and Society in the Middle Ages. Oxford University Press, 1978. Nalimov, E. V., G. McC. Haworth, and E. A. Heinz. “Space-Efficient Indexing of Chess Endgame Tables.” ICGA Journal 23, no. 3 (2000), pp. 148–162. Narayanan, Arvind, and Vitaly Shmatikov. “How to Break the Anonymity of the Netflix Prize Dataset.” October 18, 2006, arXiv:cs/0610105. ———. “Robust De-Anonymization of Large Sparse Datasets.” Proceedings of the 2008 IEEE Symposium on Security and Privacy, p. 111. Nazareth, Rita, and Julia Leite. “Stock Trading in U.S. Falls to Lowest Level Since 2008.” Bloomberg, August 13, 2012.


pages: 397 words: 110,130

Smarter Than You Think: How Technology Is Changing Our Minds for the Better by Clive Thompson


3D printing, 4chan, A Declaration of the Independence of Cyberspace, augmented reality, barriers to entry, Benjamin Mako Hill, butterfly effect, citizen journalism, Claude Shannon: information theory, conceptual framework, corporate governance, crowdsourcing, Deng Xiaoping, discovery of penicillin, Douglas Engelbart, Edward Glaeser, experimental subject, Filter Bubble, Freestyle chess, Galaxy Zoo, Google Earth, Google Glasses, Henri Poincaré, hindsight bias, hive mind, Howard Rheingold, information retrieval, iterative process, jimmy wales, Kevin Kelly, Khan Academy, knowledge worker, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Netflix Prize, Nicholas Carr, patent troll, pattern recognition, pre–internet, Richard Feynman, Ronald Coase, Ronald Reagan, sentiment analysis, Silicon Valley, Skype, Snapchat, Socratic dialogue, spaced repetition, telepresence, telepresence robot, The Nature of the Firm, the scientific method, The Wisdom of Crowds, theory of mind, transaction costs, Vannevar Bush, Watson beat the top human players on Jeopardy!, WikiLeaks, X Prize, éminence grise

the Galaxy Zoo: Tim Adams, “Galaxy Zoo and the New Dawn of Citizen Science,” The Observer (UK), March 17, 2012, accessed March 24, 2013. a one-million-dollar prize: Eliot Van Buskirk, “BellKor’s Pragmatic Chaos Wins $1 Million Netflix Prize by Mere Minutes,” Wired, September 21, 2009, accessed March 24, 2013; I also previously reported on the Netflix Prize in “If You Liked This, You’re Sure to Love That,” The New York Times Magazine, November 21, 2008, accessed March 24, 2013. I didn’t specifically note the increasing secrecy of the participants over time in my article, but the teams remarked on this in my interviews.

At best, companies have been able to deploy fairly simple polling- and-voting group thinking projects, often to tap in to what their customers want; clothing firms like Threadless let their users vote on user-submitted designs. Others have solved the motivational problem by offering substantial prizes. Netflix, for example, offered a one-million-dollar prize for whoever could improve its movie-recommendation algorithm by 10 percent. But while prizes motivate hard work, they can inhibit sharing. When people are competing for a big prize, they’re often not willing to talk about their smartest breakthrough ideas for fear that a rival will steal their work. (Indeed, as teams got closer to winning the Netflix prize, they became increasingly secretive.) Other corporations have solved the problems of motivation and secrecy by turning inward and creating internal “decision markets” where employees can pose ideas and vote on the best ones.


pages: 274 words: 75,846

The Filter Bubble: What the Internet Is Hiding From You by Eli Pariser


A Declaration of the Independence of Cyberspace, A Pattern Language, Amazon Web Services, augmented reality, back-to-the-land, Black Swan, borderless world, Build a better mousetrap, Cass Sunstein, citizen journalism, cloud computing, cognitive dissonance, crowdsourcing, Danny Hillis, data acquisition, disintermediation, don't be evil, Filter Bubble, Flash crash, fundamental attribution error, global village, Haight Ashbury, Internet of things, Isaac Newton, Jaron Lanier, Jeff Bezos, jimmy wales, Kevin Kelly, knowledge worker, Mark Zuckerberg, Marshall McLuhan, megacity, Netflix Prize, new economy, PageRank, paypal mafia, Peter Thiel, recommendation engine, RFID, sentiment analysis, shareholder value, Silicon Valley, Silicon Valley startup, social graph, social software, social web, speech recognition, Startup school, statistical model, stem cell, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, the scientific method, urban planning, Whole Earth Catalog, WikiLeaks, Y Combinator

They can solve for serendipity, by designing filtering systems to expose people to topics outside their normal experience. This will often be in tension with pure optimization in the short term, because a personalization system with an element of randomness will (by definition) get fewer clicks. But as the problems of personalization become better known, it may be a good move in the long run—consumers may choose systems that are good at introducing them to new topics. Perhaps what we need is a kind of anti-Netflix Prize—a Serendipity Prize for systems that are the best at holding readers’ attention while introducing them to new topics and ideas. If this shift toward corporate responsibility seems improbable, it’s not without precedent. In the mid-1800s, printing a newspaper was hardly a reputable business. Papers were fiercely partisan and recklessly ideological. They routinely altered facts to suit their owners’ vendettas of the day, or just to add color.

When you use AddThis to share a piece of content on ABC News’s site (or anyone else’s), AddThis places a tracking cookie on your computer that can be used to target advertising to people who share items from particular sites. 6 “the cost is information about you”: Chris Palmer, phone interview with author, Dec. 10, 2010. 7 accumulated an average of 1,500 pieces of data: Stephanie Clifford, “Ads Follow Web Users, and Get More Personal,” New York Times, July 30, 2009, accessed Dec. 19, 2010. 7 96 percent of Americans: Richard Behar, “Never Heard of Acxiom? Chances Are It’s Heard of You.” Fortune, Feb. 23, 2004, accessed Dec. 19, 2010. 8 Netflix can predict: Marshall Kirkpatrick, “They Did It! One Team Reports Success in the $1m Netflix Prize,” ReadWriteWeb, June 26, 2009, accessed Dec. 19, 2010. 8 Web site that isn’t customized . . . will seem quaint: Marshall Kirkpatrick, “Facebook Exec: All Media Will Be Personalized in 3 to 5 Years,” ReadWriteWeb, Sept. 29, 2010, accessed Jan. 30, 2011. 8 “now the web is about ‘me’ ”: Josh Catone, “Yahoo: The Web’s Future Is Not in Search,” ReadWriteWeb, June 4, 2007, accessed Dec. 19, 2010. 8 “tell them what they should be doing”: James Farrar, “Google to End Serendipity (by Creating It),” ZDNet, Aug. 17, 2010, accessed Dec. 19, 2010. 8 are becoming a primary news source: Pew Research Center, “Americans Spend More Time Following the News,” Sept. 12, 2010, accessed Feb. 7, 2011.


pages: 265 words: 69,310

What's Yours Is Mine: Against the Sharing Economy by Tom Slee


4chan, Airbnb, Amazon Mechanical Turk, asset-backed security, barriers to entry, Berlin Wall, big-box store, bitcoin, blockchain, citizen journalism, collaborative consumption, congestion charging, Credit Default Swap, crowdsourcing, data acquisition, David Brooks, don't be evil, gig economy, Hacker Ethic, income inequality, informal economy, invisible hand, Jacob Appelbaum, Jane Jacobs, Jeff Bezos, Khan Academy, Kibera, Kickstarter, license plate recognition, Lyft, Mark Zuckerberg, move fast and break things, natural language processing, Netflix Prize, Network effects, new economy, Occupy movement, openstreetmap, Paul Graham, peer-to-peer lending, Peter Thiel, pre–internet, principal–agent problem, profit motive, race to the bottom, Ray Kurzweil, recommendation engine, rent control, ride hailing / ride sharing, sharing economy, Silicon Valley, Snapchat, software is eating the world, South of Market, San Francisco, TaskRabbit, The Nature of the Firm, Thomas L Friedman, transportation-network company, Uber and Lyft, Uber for X, ultimatum game, urban planning, WikiLeaks, winner-take-all economy, Y Combinator, Zipcar

There is every reason to believe that most Netflix ratings are independent and honest. When you rate a movie you can offer your opinion freely, having no reason to expect reward or punishment for any particular rating. You also have an incentive to give a rating that matches your actual opinion, as it enables Netflix to recommend movies that better match your tastes. Figure 3 shows the distribution of ratings for a set of 100 million ratings that Netflix released for its Netflix Prize competition. Figure 3. Ratings in the Netflix contest data set. The ratings are distributed among the available scores with a peak at about 3.5, so a rating of 4 or 5 is a pretty good rating and Netflix ratings help us to discriminate between one-star stinkers and five-star favorites. Yelp is a rating site for restaurants and other small businesses. Each rating is made by an individual customer (who may remain anonymous).

BlaBlaCar ratings. Even though Sharing Economy ratings are typically crammed into a small range, could a 4.9 rating still indicate a better experience than a 4.7? All the evidence to date says no, it cannot. Even in rating systems with widely-distributed ratings like Netflix, the relationship between an individual rating and the quality of user experience is murky. One of the results of the Netflix Prize competition was that individual ratings turn out to depend on factors that have nothing to do with the movie itself: people tend to grade relative to the existing rating, so highly rated films tend to stay highly rated. The best competitors managed to compensate for these effects, but only in an environment where individual movies were getting millions of ratings, quite different from the Sharing Economy case.


pages: 98 words: 25,753

Ethics of Big Data: Balancing Risk and Innovation by Kord Davis, Doug Patterson


4chan, business process, corporate social responsibility, crowdsourcing, Mahatma Gandhi, Mark Zuckerberg, Netflix Prize, Occupy movement, performance metric, side project, smart grid, urban planning

For example, what Google considers Personally Identifiable Information (PII) may be substantially different from Microsoft’s definition. How are we to protect PII if we can’t agree on what we’re protecting? The increasing availability of open data (and increasing number of data breaches) makes cross-correlation and de-anonymization an increasingly trivial task. Let’s not forget the example of the Netflix prize. Finally, there is no mention anywhere, in any policy statement reviewed, no matter what it was called, that addressed the topic of reputation. Reputation might be considered an “aggregate” value comprised of personal information that is judged in one fashion or another. Again, however, this raises the question of what values an organization is motivated by when developing the constituent policies.


pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos


3D printing, Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, crowdsourcing, Danny Hillis, data is the new oil, double helix, Douglas Hofstadter, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, global village, Google Glasses, Gödel, Escher, Bach, information retrieval, job automation, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, lone genius, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, NP-complete, P = NP, PageRank, pattern recognition, phenotype, planetary scale, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, Second Machine Age, self-driving car, Silicon Valley, speech recognition, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight

With a decision tree, the choice of whether to use a learner can be contingent on other learners’ predictions. Either way, to obtain a learner’s prediction for a given training example, we must first apply it to the original training set excluding that example and use the resulting classifier—otherwise the committee risks being dominated by learners that overfit, since they can predict the correct class just by remembering it. The Netflix Prize winner used metalearning to combine hundreds of different learners. Watson uses it to choose its final answer from the available candidates. Nate Silver combines polls in a similar way to predict election results. This type of metalearning is called stacking and is the brainchild of David Wolpert, whom we met in Chapter 3 as the author of the “no free lunch” theorem. An even simpler metalearner is bagging, invented by the statistician Leo Breiman.
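The point about excluding each example before predicting it can be illustrated with a small stacking sketch. This is a simplification under stated assumptions, not the method of any Netflix Prize team: it uses two toy regressors instead of classifiers, a weighted blend instead of a decision tree as the metalearner, and invented data. All function names are hypothetical.

```python
def fit_mean(xs, ys):
    """Base learner 1: always predicts the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_nearest(xs, ys):
    """Base learner 2: copies the target of the closest training point.

    It memorises the training set, so it is perfect on examples it has seen.
    """
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def in_sample_predictions(fit, xs, ys):
    """Naive approach: predict each example with a model that has seen it."""
    model = fit(xs, ys)
    return [model(x) for x in xs]


def held_out_predictions(fit, xs, ys):
    """Predict each example with a model fitted on all the other examples."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


def best_blend_weight(preds_a, preds_b, ys, steps=20):
    """Metalearner: grid-search w in [0, 1] for the blend w*a + (1 - w)*b."""
    def loss(w):
        return sum((w * a + (1 - w) * b - y) ** 2
                   for a, b, y in zip(preds_a, preds_b, ys))
    return min((k / steps for k in range(steps + 1)), key=loss)


# Invented noisy data: neighbouring points say little about each other.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 4, 1, 5, 2]

# Weight given to the memorising learner under each scheme.
naive_w = best_blend_weight(in_sample_predictions(fit_nearest, xs, ys),
                            in_sample_predictions(fit_mean, xs, ys), ys)
honest_w = best_blend_weight(held_out_predictions(fit_nearest, xs, ys),
                             held_out_predictions(fit_mean, xs, ys), ys)

print("weight on memorising learner, in-sample:", naive_w)
print("weight on memorising learner, held-out:", honest_w)
```

When the metalearner is trained on in-sample predictions, the memorising learner looks flawless and takes all the weight. When it is trained on held-out predictions, that learner's apparent advantage disappears, which is the failure mode the passage warns about.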

We don’t have the Master Algorithm yet, just a glimpse of what it might look like. What if something fundamental is still missing, something all of us in the field, steeped in its history, can’t see? We need new ideas, and ideas that are not just variations on the ones we already have. That’s why I wrote this book: to start you thinking. I teach an evening class on machine learning at the University of Washington. In 2007, soon after the Netflix Prize was announced, I proposed it as one of the class projects. Jeff Howbert, a student in the class, got hooked and continued to work on it after the class was over. He wound up being a member of one of the two winning teams, two years after learning about machine learning for the first time. Now it’s your turn. To learn more about machine learning, check out the section on further readings at the end of the book.

See Markov logic networks (MLNs) Moby Dick (Melville), 72 Molecular biology, data and, 14 Moneyball (Lewis), 39 Mooney, Ray, 76 Moore’s law, 287 Moravec, Hans, 288 Muggleton, Steve, 80 Multilayer perceptron, 108–111 autoencoder, 116–118 Bayesian, 170 driving a car and, 113 Master Algorithm and, 244 NETtalk system, 112 reinforcement learning and, 222 support vector machines and, 195 Music composition, case-based reasoning and, 199 Music Genome Project, 171 Mutation, 124, 134–135, 241, 252 Naïve Bayes classifier, 151–153, 171, 304 Bayesian networks and, 158–159 clustering and, 209 Master Algorithm and, 245 medical diagnosis and, 23 relational learning and, 228–229 spam filters and, 23–24 text classification and, 195–196 Narrative Science, 276 National Security Agency (NSA), 19–20, 232 Natural selection, 28–29, 30, 52 as algorithm, 123–128 Nature Bayesians and, 141 evolutionaries and, 137–142 symbolists and, 141 Nature (journal), 26 Nature vs. nurture debate, machine learning and, 29, 137–139 Neal, Radford, 170 Nearest-neighbor algorithms, 24, 178–186, 202, 306–307 dimensionality and, 186–190 Negative examples, 67 Netflix, 12–13, 183–184, 237, 266 Netflix Prize, 238, 292 Netscape, 9 NETtalk system, 112 Network effect, 12, 299 Neumann, John von, 72, 123 Neural learning, fitness and, 138–139 Neural networks, 99, 100, 112–114, 122, 204 convolutional, 117–118, 302–303 Master Algorithm and, 240, 244, 245 reinforcement learning and, 222 spin glasses and, 102–103 Neural network structure, Baldwin effect and, 139 Neurons action potentials and, 95–96, 104–105 Hebb’s rule and, 93–94 McCulloch-Pitts model of, 96–97 processing in brain and, 94–95 See also Perceptron Neuroscience, Master Algorithm and, 26–28 Newell, Allen, 224–226, 302 Newhouse, Neil, 17 Newman, Mark, 160 Newton, Isaac, 293 attribute selection, 189 laws of, 4, 14, 15, 46, 235 rules of induction, 65–66, 81, 82 Newtonian determinism, Laplace and, 145 Newton phase of science, 39–400 New York Times (newspaper), 115, 
117 Ng, Andrew, 117, 297 Nietzsche, Friedrich, 178 NIPS.


Remix: Making Art and Commerce Thrive in the Hybrid Economy by Lawrence Lessig


Amazon Web Services, Andrew Keen, Benjamin Mako Hill, Berlin Wall, Bernie Sanders, Brewster Kahle, Cass Sunstein, collaborative editing, disintermediation, don't be evil, Erik Brynjolfsson, Internet Archive, invisible hand, Jeff Bezos, jimmy wales, Kevin Kelly, late fees, Netflix Prize, Network effects, new economy, optical character recognition, PageRank, recommendation engine, revision control, Richard Stallman, Ronald Coase, Saturday Night Live, SETI@home, sharing economy, Silicon Valley, Skype, slashdot, Steve Jobs, The Nature of the Firm, thinkpad, transaction costs, VA Linux

Functionality gets LEGO-ized: it gets turned into a block that others can add to their own Web site or their own business. Netflix does this the least among the three, but it does it nonetheless. (The company was scolded by one of the Net’s leading bloggers in 2004 for failing to offer APIs.26 It is slowly responding.) Its purpose is to “improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences.”27 To achieve this end, Netflix runs a “Netflix Prize”—offering a grand prize of $1 million to anyone who improves Netflix’s own system by more than 10 percent. To enable this competition to happen, Netflix shared “a lot of anonymous rating data.” The company also increasingly offers through RSS feeds access to ranking information about its users’ choices. Amazon does this through its Amazon Web Services.

See “Google Defies US Over Search Data,” BBC News, January 20, 2006, available at link #64; Maryclaire Dale, “Judge Throws Out Internet Blocking Law: Ruling States Parents Must Protect Children Through Less Restrictive Means,” MSNBC, March 22, 2007, available at link #65. Google prevailed in its effort to restrict the government’s search. See Gonzales v. Google, 234 F.R.D. 674 (N.D. Cal. 2006). 26. Phillip Torrone, “Netflix, Open Up or Die . . . ,” available at link #66. 27. Netflix, Netflix Prize, available at link #67 (last visited July 2, 2007). 28. Tapscott and Williams, Wikinomics, 183. 29. See Tim O’Reilly, “What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software,” O’Reilly, September 30, 2005, available at link #68. As Mary Madden summarizes the idea, it is “utilizing collective intelligence, providing network-enabled interactive services, giving users control over their own data.”


pages: 247 words: 71,698

Avogadro Corp by William Hertling


Any sufficiently advanced technology is indistinguishable from magic, cloud computing, crowdsourcing, Hacker Ethic, hive mind, invisible hand, natural language processing, Netflix Prize, private military company, Ray Kurzweil, recommendation engine, Richard Stallman, technological singularity, Turing test, web application

“At the heart of how this works is the field of recommendation algorithms,” David explained. “Sean hired me not because I knew anything about language analysis but because I was a leading competitor in the Netflix competition. Netflix recommends movies that you’d enjoy watching. The better Netflix can do this, the more you as a customer enjoy using Netflix’s movie rental service. Several years ago, Netflix offered a million dollar prize to anyone who could beat their own algorithm by ten percent.” “What’s amazing and even counterintuitive about recommendation algorithms is that they don’t depend on understanding anything about the movie. Netflix does not, for example, have a staff of people watching movies to categorize and rate them, just to find the latest sci-fi space action thriller that I happen to like. Instead, they rely on a technique called collaborative filtering, where they find other customers just like me, and then see how those customers rated a given movie to predict how I’ll rate it.
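The collaborative filtering idea described in this passage (find customers with similar tastes, then use their ratings to predict mine) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the `ratings` data, the customer and movie names, and the helper names `cosine_similarity` and `predict_rating` are all hypothetical, and cosine similarity is just one of several possible similarity measures, not necessarily what Netflix used.

```python
from math import sqrt

# Hypothetical star ratings (1-5): customer -> {movie: rating}
ratings = {
    "me":    {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "ann":   {"Alien": 5, "Brazil": 5, "Casablanca": 1, "Dune": 5},
    "bob":   {"Alien": 1, "Brazil": 2, "Casablanca": 5, "Dune": 1},
    "carol": {"Alien": 4, "Brazil": 4, "Dune": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both customers rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user, movie):
    """Similarity-weighted average of other customers' ratings for `movie`.

    Returns None when no other customer has rated the movie.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or movie not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[movie]
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


# "me" has not rated Dune; the estimate leans toward ann and carol,
# whose past ratings look most like mine.
prediction = predict_rating("me", "Dune")
print(prediction)
```

Nothing in the sketch looks at what the movies are about; it only compares rating patterns, which is the counterintuitive point David makes.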

While the analysis module determined goals and objectives from the email, the optimization module used fragments from thousands of other emails to create a realistic email written in a voice very similar to that of the sender. David relished the success of the team, and wished he could share with them what they had accomplished. Their project was the culmination of nearly three years of dedicated research and development. It had started with David’s work on the Netflix Prize before he was hired at Avogadro, although even that work had been built on the shoulders of geniuses that had come before him. Then there were eight months of him and Mike laboring on their own to prove out the idea enough to justify an entire team. Finally, during the last eighteen months, an entire R&D team worked on the project, building the initial architecture, and then incrementally improving the effectiveness of the system week after week.


pages: 163 words: 42,402

Machine Learning for Email by Drew Conway, John Myles White


call centre, correlation does not imply causation, Debian, natural language processing, Netflix Prize, pattern recognition, recommendation engine, SpamAssassin, text mining

As long as we assume we’re working with rectangular arrays, we’ll be able to use lots of powerful mathematical techniques without having to think very carefully about the actual mathematical operations being performed. For example, we won’t explicitly use matrix multiplication anywhere in this book, even though almost every technique we’re going to exploit can be described in terms of matrix multiplications, whether it’s the standard linear regression model or the modern matrix factorization techniques that have become so popular lately thanks to the Netflix prize. Because we’ll treat data rectangles, tables, and matrices interchangeably, we ask for your patience when we switch back and forth between those terms throughout this book. Whatever term we use, you should just remember that we’re thinking of something like Table 2-1 when we talk about data.

Table 2-1. Your authors

Name              Age
Drew Conway       28
John Myles White  29

Since data consists of rectangles, we can actually draw pictures of the sorts of operations we’ll perform pretty easily.
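To make the remark about matrix multiplication concrete, here is a small sketch in plain Python. It shows that both linear-regression predictions and a factorized ratings matrix reduce to the same operation. The `matmul` helper, the coefficient values, and the factor matrices are hypothetical illustrations and do not come from the book; only the two ages are taken from Table 2-1.

```python
def matmul(A, B):
    """Multiply an n x k matrix by a k x m matrix, both given as lists of rows."""
    inner = len(B)
    assert all(len(row) == inner for row in A), "inner dimensions must match"
    return [[sum(row[k] * B[k][j] for k in range(inner))
             for j in range(len(B[0]))]
            for row in A]


# Linear regression: predictions = X * beta.
# Each row of X is one observation: a leading 1 for the intercept,
# then the age column from Table 2-1.
X = [[1, 28],
     [1, 29]]
beta = [[10.0],   # hypothetical intercept
        [0.5]]    # hypothetical slope per year of age
predictions = matmul(X, beta)

# Matrix factorization: ratings are approximated by
# (users x factors) * (factors x items).
user_factors = [[2.0, 0.5],
                [0.5, 2.0]]
item_factors_t = [[2.0, 0.0],
                  [0.0, 2.0]]
approx_ratings = matmul(user_factors, item_factors_t)

print(predictions)
print(approx_ratings)
```

In practice a numerical library would handle the multiplication; the hand-written helper is here only so the sketch has no dependencies.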


pages: 579 words: 76,657

Data Science from Scratch: First Principles with Python by Joel Grus


correlation does not imply causation, natural language processing, Netflix Prize, p-value, Paul Graham, recommendation engine, SpamAssassin, statistical model

            = other_interest_id and similarity > 0]
    return sorted(pairs,
                  key=lambda pair: pair[1],
                  reverse=True)

which suggests the following similar interests:

    [('Hadoop', 0.8164965809277261),
     ('Java', 0.6666666666666666),
     ('MapReduce', 0.5773502691896258),
     ('Spark', 0.5773502691896258),
     ('Storm', 0.5773502691896258),
     ('Cassandra', 0.4082482904638631),
     ('artificial intelligence', 0.4082482904638631),
     ('deep learning', 0.4082482904638631),
     ('neural networks', 0.4082482904638631),
     ('HBase', 0.3333333333333333)]

Now we can create recommendations for a user by summing up the similarities of the interests similar to his:

    def item_based_suggestions(user_id, include_current_interests=False):
        # add up the similar interests
        suggestions = defaultdict(float)
        user_interest_vector = user_interest_matrix[user_id]
        for interest_id, is_interested in enumerate(user_interest_vector):
            if is_interested == 1:
                similar_interests = most_similar_interests_to(interest_id)
                for interest, similarity in similar_interests:
                    suggestions[interest] += similarity

        # sort them by weight
        suggestions = sorted(suggestions.items(),
                             key=lambda pair: pair[1],
                             reverse=True)

        if include_current_interests:
            return suggestions
        else:
            return [(suggestion, weight)
                    for suggestion, weight in suggestions
                    if suggestion not in users_interests[user_id]]

For user 0, this generates the following (seemingly reasonable) recommendations:

    [('MapReduce', 1.861807319565799),
     ('Postgres', 1.3164965809277263),
     ('MongoDB', 1.3164965809277263),
     ('NoSQL', 1.2844570503761732),
     ('programming languages', 0.5773502691896258),
     ('MySQL', 0.5773502691896258),
     ('Haskell', 0.5773502691896258),
     ('databases', 0.5773502691896258),
     ('neural networks', 0.4082482904638631),
     ('deep learning', 0.4082482904638631),
     ('C++', 0.4082482904638631),
     ('artificial intelligence', 0.4082482904638631),
     ('Python', 0.2886751345948129),
     ('R', 0.2886751345948129)]

For Further Exploration

Crab is a framework for building recommender systems in Python. Graphlab also has a recommender toolkit. The Netflix Prize was a somewhat famous competition to build a better system to recommend movies to Netflix users.

Chapter 23. Databases and SQL

Memory is man’s greatest friend and worst enemy.
Gilbert Parker

The data you need will often live in databases, systems designed for efficiently storing and querying data. The bulk of these are relational databases, such as Oracle, MySQL, and SQL Server, which store data in tables and are typically queried using Structured Query Language (SQL), a declarative language for manipulating data.


pages: 459 words: 103,153

Adapt: Why Success Always Starts With Failure by Tim Harford


Andrew Wiles, banking crisis, Basel III, Berlin Wall, Bernie Madoff, Black Swan, car-free, carbon footprint, Cass Sunstein, charter city, Clayton Christensen, clean water, cloud computing, cognitive dissonance, complexity theory, corporate governance, correlation does not imply causation, credit crunch, Credit Default Swap, crowdsourcing, cuban missile crisis, Daniel Kahneman / Amos Tversky, Dava Sobel, Deep Water Horizon, Deng Xiaoping, double entry bookkeeping, Edmond Halley, Erik Brynjolfsson, experimental subject, Fall of the Berlin Wall, Fermat's Last Theorem, Firefox, food miles, Gerolamo Cardano, global supply chain, Isaac Newton, Jane Jacobs, Jarndyce and Jarndyce, John Harrison: Longitude, knowledge worker, loose coupling, Martin Wolf, Menlo Park, Mikhail Gorbachev, mutually assured destruction, Netflix Prize, New Urbanism, Nick Leeson, PageRank, Piper Alpha, profit motive, Richard Florida, Richard Thaler, rolodex, Shenzhen was a fishing village, Silicon Valley, Silicon Valley startup, South China Sea, special economic zone, spectrum auction, Steve Jobs, supply-chain management, the market place, The Wisdom of Crowds, too big to fail, trade route, Tyler Cowen: Great Stagnation, web application, X Prize

The better the recommendations, the happier the customer, so in March 2006 the founder and chief executive of Netflix, Reed Hastings, met some colleagues to discuss how they might improve the software that made the recommendations. Hastings had been inspired by the story of John Harrison, and suggested offering a prize of $1m to anyone who could do better than Netflix’s in-house algorithm, Cinematch. The Netflix prize, announced in October 2006, struck a chord with the Web 2.0 generation. Within days of the prize announcement, some of the best minds in the relevant fields of computer science were on the case. Within a year, the leading entries had reduced Cinematch’s recommendation errors by more than 8 per cent – close to the million-dollar hurdle of 10 per cent. Over 2,500 teams from 161 countries and comprising 27,000 competitors entered the contest.
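The arithmetic behind the "10 per cent" hurdle can be sketched as follows. The sketch assumes prediction error is measured as root-mean-square error (RMSE), the metric the contest is generally described as using. The ratings and predictions below are made-up numbers chosen for easy checking, and the function names `rmse` and `percent_improvement` are hypothetical.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(actual))


def percent_improvement(baseline_error, new_error):
    """How much smaller new_error is than baseline_error, in percent."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Made-up data: true ratings, an in-house baseline, and a challenger.
actual = [4, 3, 5, 2, 1]
baseline = [3, 4, 4, 3, 2]                # every guess is off by one star
challenger = [3.5, 3.5, 4.5, 2.5, 1.5]    # every guess is off by half a star

baseline_rmse = rmse(baseline, actual)
challenger_rmse = rmse(challenger, actual)
improvement = percent_improvement(baseline_rmse, challenger_rmse)

# An entry clears the hurdle only if it cuts the baseline error by 10% or more.
clears_hurdle = improvement >= 10.0
print(baseline_rmse, challenger_rmse, improvement, clears_hurdle)
```

Real entries improved on the baseline by far smaller margins than this toy challenger does, which is why moving from 8 per cent to 10 per cent took so long.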

Economy, 104 ‘Firms are reluctant to risk their money’: McKinstry, Spitfire, pp. 34–5. 105 There is an inconvenient tale behind this: I have drawn much of this account from Dava Sobel’s Longitude (London: Fourth Estate, 1996). 106 Compared with the typical wage of the day: Officer, ‘Purchasing power of British pounds’, cited above, n. 10. 107 In 1810 Nicolas Appert: 107 Ultimately the Académie began to turn down: Maurice Crosland, ‘From prizes to grants in the support of scientific research in France in the nineteenth century: The Montyon legacy’, Minerva, 17(3) (1979), pp. 355–80, and Robin Hanson, ‘Patterns of patronage: why grants won over prizes in science’, University of California, Berkeley, working paper 1998, 108 Innovation prizes were firmly supplanted: Hanson, ‘Patterns of patronage’. 109 The prize was eventually awarded in September 2009: a follow-up prize was announced and then cancelled following a lawsuit over privacy. One Netflix user alleged that the data released by Netflix didn’t sufficiently conceal her anonymity, and might allow others to discover that she was a lesbian by connecting her with ‘anonymous’ reviews. (Ryan Singel, ‘Netflix spilled your Brokeback Mountain secret, lawsuit claims’, Wired, 17 December 2009, 110 ‘One of the goals of the prize’: author interview, 13 December 2007. 110 Not everybody responds to such incentives: ‘Russian maths genius Perelman urged to take $1m prize’, BBC News, 24 March 2010, 111 The vaccine prize takes the form of an agreement: the advanced market commitment idea was developed by Michael Kremer in ‘Patent buyouts: a mechanism for encouraging innovation’, Quarterly Journal of Economics, 113:4 (1998), 1137–67; but also see and the Center for Global Development’s ‘Making markets for vaccines’, 111 Only the very largest pharmaceutical companies spend more than: Medicines Australia, ‘Global pharmaceutical industry facts at a glance’, p. 
3, 111 Children in Nicaragua received: Amanda Glassman, ‘Break out the champagne!

.), 37, 65, 77–8, 223, 227, 228; Battle of 73 Easting and, 72–3, 79; counterinsurgency strategy and, 53–5, 57, 61, 64, 74, 75, 79, 258, 262; streak of sedition, 54–5, 56, 58, 59, 78; in Tal Afar, 53–6, 61, 64, 79; on Vietnam War, 46, 47, 50, 56, 78 McNamara, Robert, 46–7, 49–50, 60, 68, 69, 76, 78 medical profession, 120–1; clinical trials, 123–4, 125–6; in history, 121–3, 140–1; rigorous evidence and, 121, 122–3, 125–7 Medical Research Council, 100 Melvill, Mike, 112, 114 Menand, Louis, 7 Merton Rule, 169–72, 176, 177 microfinance organisations, 116, 117–18, 120 Microsoft, 11, 12, 90, 112, 241, 242 Miguel, Edward, 129, 131 Millennium prizes, 110, 114 mission command doctrine, 79 Mitchelhill, Steve, 193 Mitchell, Reginald, 88, 89, 114, 223, 262 molecular biology, 98–9; DNA research, 99–100; prizes for, 109, 110 Mondrian, Piet, 260 moon landings, 84, 113 Moore, Paul, 211, 213, 214, 250 Morse, Adair, 210, 213 Moulin, Sylvie, 127–8 Movin’ Out (ballet/musical), 247–50, 253–4, 257, 258–9 Mprize, 110 Mullainathan, Sendhil, 135 Murray, Euan, 163–5 Myers, Dave, 233 Nagl, John, 52, 61, 63, 65, 66, 76 Napoleon, 41, 107 NASA, 113 National Academy of Sciences, 6 National Bureau of Economic Research, 145 National Institutes of Health, US (NIH), 99–100, 101, 102–3 Netflix, 108–9 New Songdo City (South Korea), 152 New Zealand, 161, 176 Newsweek, 63 Newton, Sir Isaac, 105 Nobel prizes, 68, 75, 100, 108, 116, 120 nuclear industry, 184, 185, 187, 191–3, 215, 220, 227–8, 230–1 Obama, President Barack, 5, 195 Odean, Terrance, 35 Office for Financial Research, US, 195 Ofshe, Richard, 252–3 Oklahoma! 
(musical), 248 Oliver, Jamie, 29–30 Olken, Benjamin, 133–4, 142, 143 Opportunity International, organic products, 159, 160, 226 organisations: ‘bottom-up’ adaptation, 58, 60–1, 134; grandiosity and, 27–8; idealised hierarchy, 40–1, 42, 46–7, 49–50, 55; peer monitoring, 229–31, 232–3; standardisation and, 28; traditional, 29, 31, 35; see also corporations and companies; government and politics Orgel, Leslie, 174, 175, 176, 177, 178, 180 Ormerod, Paul, 18–19 outsourcing, 90 Pace, General Peter, 42, 43 Packer, George, 43 Page, Larry, 231–2 Page, Scott, 49 Palchinsky, Peter, 21–5, 26, 27, 30, 31, 49, 118, 250 ‘Palchinsky principles’, 25, 28, 29, 36, 207, 224, 235, 250; see also selection principle; survivability principle; variation principle Palmer, Geoffrey, 170–1 Palo Alto Research Center (Parc), 11 Parkinson, Elizabeth, 249 patents, 90, 91–2, 94, 95–7, 104, 110, 111, 113, 114, 179 Patriquin, Captain Travis, 58 Pentagon Papers, 62 PEPFAR, 119 Pepys, Samuel, 96 Perelman, Grigory, 110 Perrow, Charles, 185, 186, 191, 194–5 Peters, Tom, 8, 10, 244 Petraeus, General David, 37, 59–62, 63–4, 65, 74, 78, 256 Pfizer, 90 pharmaceutical industry, 94, 110–11, 114, 236–7 PhilCo, 11 Philippines, 136 Phillips, Michael, 249 Picasso, Pablo, 260 pilot schemes, 29–30 Pinochet, General Augusto, 70 Piper-Alpha disaster (July 1988), 181–3, 184, 186, 187, 208–9, 219 planning, 19, 68–9; centrally planned economies, 11, 21, 23–6, 68–9, 70; ‘effects-based operations’ (EBO), 67–8, 74; localised/fleeting information and, 21–2, 24, 25, 31, 52–3, 57–8, 66–7, 71–3, 74, 78, 79; quantitative analysis, 46–7, 69, 78 PlayPumps, 118–19, 120, 130, 142 pneumococcal infections, 110–11, 114 poker, 31–2 politics see government and politics POSCO, 152 ‘postcode lottery’ concept, 28 poverty, global, 4, 5, 115–16 printing industry, early, 10 problem solving, 4–6, 14; evolutionary theory and, 14–15, 16, 17; idealised view of, 40–1, 42, 46–7, 49–50, 55; lessons from history and, 63, 65–7; technology and, 84, 94; 
‘Toaster Project’, 1–2, 4, 12; see also decision making; innovation; ‘Palchinsky principles’; trial and error Procter & Gamble, 9, 12 public services, 28, 141, 213–14 public transport, 161–2 Pullman, 9, 15 PwC, 196–200 Pye, David, 80 Al Qaeda in Iraq (AQI), 39, 40, 43, 51, 54, 57, 77 quantitative analysis, 46–7, 69, 78 Rajan, Raghuram, 75 randomised trials, 235–8; development and, 127–9, 131, 132, 133, 134, 135–6, 137–40, 141 Raskin, Aza, 221 Reagan, President Ronald, 6 Reason, James, 184–5, 186–7, 208, 209, 218 Reinikka, Ritva, 142 renewable energy technology, 84, 91, 96, 130, 168, 169–73, 179, 245 research and development, 83–5, 87–95, 99–104, 111; see also innovation Ricks, Thomas, 61 risk, psychology of, 32–5, 253–4, 256 risk management, 183, 185, 187–90, 206–7 Roche, 97 Roger Preston Partners, 170–1 Romer, Paul, 150–1 Royal Air Force (RAF), 80–2, 88 Royal College of Physicians, 122 Royal Observatory, 105, 106, 107 Rumsfeld, Donald, 59, 61; centralisation and, 47, 69, 71, 72, 76, 196; refusal of advice/feedback, 43–4, 45, 46, 50, 57, 60, 62, 63, 65, 67, 223, 256; term ‘insurgency’ and, 42–3, 55, 63, 250 Russia, 21–7, 68–9, 250 Rutan, Burt, 112 Sachs, Jeffrey, 129–30 Saddam Hussein, 44, 45, 66, 73 Santa Fe Institute, 16, 103 Schmidt, Eric, 230, 231, 232 Schneider Trophy, 88, 89, 110, 114 Schulz, Kathryn, Being Wrong, 262 Schumacher, E.F., 181 Schwab, Charles, 243 Schwarzkopf, Norman, 67, 68 Scott, Owen, 118–19 scurvy, 122–3 Second World War, 81–2, 83, 85, 89, 124–5, 126 Securities and Exchange Commission (SEC), 210, 212–13 selection principle, 25–6, 27, 207, 224, 250, 259; charter cities and, 149, 152, 153; development aid, 117, 140–3, 149, 152, 153; evolutionary theory and, 13, 14, 15, 16–17, 23, 86; pilot schemes and, 29–30 Sepp, Kalev, 61 Sewall, Sarah, 61, 63 Shell, 9, 244–5 Shenzhen, 150, 152 Shimura, Goro, 247 Shindell, Drew, 160* Shinseki, General Eric, 43–4, 45 Shirky, Clay, 90 Shovell, Admiral Sir Clowdisley, 105 SIGMA I war game, 50 Sims, Karl, 13–14, 
174, 176 Singapore, 150 Singer, 9, 10, 15 Skunk Works division, Lockheed, 89, 93, 224, 242 Smith, Adam, 143, 147 Sobel, Dava, Longitude, 107* solar power, 84, 91, 96, 179, 245 Solidarity movement, Polish, 26 Sorkin, Andrew Ross, 193 South Africa, 147 South Korea, 146–7, 152 Soviet Union, 21–7, 68–9, 250 space tourism, 112–13, 114 Spitfire aircraft, 81–2, 83, 84–5, 87–9, 114, 262 Spock, Dr Benjamin, Baby and Child Care, 120–1 Sri Lanka, 136 Stalin, Joseph, 24, 250 Starbucks, 28, 159, 164, 165, 166 Sunstein, Cass, 177–8 Supermarine, 81, 87–9 survivability principle, 25, 36, 153, 207–8, 215, 224, 235, 243, 250 Svensson, Jakob, 142–3 Tabarrok, Alex, 96 Taiwan, 148 Taleb, Nassim, The Black Swan, 83 Target (discount retailer), 243 Taylor, A.J.P., 89 Taylor, Charles, 136 technologies, new: centralisation and, 71, 75, 76, 79, 226, 227, 228; decentralisation and, 76; ‘effects-based operations’ (EBO), 67–8, 74; evolutionary theory and, 13–14, 174; first Gulf War and, 67, 71, 72–3, 79; fraud and, 212; hi-tech start-ups, 90, 91; innovation and, 89–90, 91, 94–5, 239–40; iPhone and Android apps, 90, 93; Iraq war and, 71, 72, 74, 78–9, 196; Robert McNamara and, 47, 69; open-source software movement, 230; prizes and, 108–9; product space concept and, 145–8; Project CyberSyn, 69–72; randomised experiments and, 235–7; return on investment and, 83–4; safety systems and, 193; software, 12, 76, 90, 92–3, 230, 241–2; unpredictability and, 84–5; virtual decision making, 49; see also internet terrorism, 4, 51, 54, 57, 96–7, 192 Tesco, 75, 226 Tetlock, Philip, 6–8, 10, 16, 17, 19, 66 Thaler, Richard, 33, 34, 177–8, 254, 256 Tharp, Twyla, 247–51, 253–4, 256, 257, 258–9, 262 Thatcher, Margaret, 20 Three Mile Island disaster (1979), 36, 184, 185, 191–2, 193, 220 Thwaites, Thomas, ‘Toaster Project’, 1–2, 4, 12 Timpson, John, 226–7, 228–9, 230, 232–3 Tipton, Jennifer, 257 Toyota, 9, 159, 161, 165 Transitron, 11 Transocean, 216, 217, 218–19 Trenchard, Sir Hugh, 88 trial and error, 12, 14, 17, 19–20, 
21, 35, 36, 66, 220; decentralisation and, 31, 174–5, 232, 234; Thomas Edison and, 236, 238; individuals and, 31–5; Iraq war and, 64–5, 66–7; market system, 20; randomised experiments, 235–9; Muhammad Yunus and, 116, 117–18 Tversky, Amos, 32, 253 Tyndall, John, 154–6 Tzara, Tristan, 247 Uganda, 142–3 United Arab Emirates, 147 university students, 260–1 US Air Force, 93 US Steel, 9 USAID, 119 Utah, University of, 99 van Helmont, Jan Baptist, 121–2, 141 variation principle, 25–6, 79, 100, 117, 140, 174–5, 207, 224, 235, 250; charter cities and, 152–3; evolutionary theory and, 13, 14, 16–17, 23; grandiosity and, 27–8; pluralism and, 85; uniform standards and, 28–9 Vaze, Prashant, 177–8 Venter, Craig, 109 Vickers, 88 Vietnam war, 46–7, 49–50, 56, 62, 64, 68, 69, 78, 243–4 Virgin Group, 112, 243 Wallis, Barnes, 88 Wallstrom, Margot, 139 Wal-Mart, 3, 75, 226, 238 Warhol, Andy, 28 Waterman, Robert, 8, 10 Watson, James, 98–9 Weinstein, Jeremy, 137, 138 Weiss, Bob, 110 Westinghouse Electric, 9 whistleblowers, 211–14, 215, 218–19, 220, 229 White Knight One, 112, 114 Whole Foods Market, 224–6, 227, 228, 229, 232, 234 Wikipedia, 230 Williams, Mike, 216–17 Willumstad, Robert, 193–4 W.L.


pages: 369 words: 80,355

Too Big to Know: Rethinking Knowledge Now That the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room by David Weinberger


airport security, Alfred Russel Wallace, Amazon Mechanical Turk, Berlin Wall, Black Swan, book scanning, Cass Sunstein, corporate social responsibility, crowdsourcing, Danny Hillis, David Brooks, Debian, double entry bookkeeping, double helix, Exxon Valdez, Fall of the Berlin Wall, future of journalism, Galaxy Zoo, Hacker Ethic, Haight Ashbury, hive mind, Howard Rheingold, invention of the telegraph, jimmy wales, John Harrison: Longitude, Kevin Kelly, linked data, Netflix Prize, New Journalism, Nicholas Carr, Norbert Wiener, openstreetmap, P = NP, Pluto: dwarf planet, profit motive, Ralph Waldo Emerson, RAND corporation, Ray Kurzweil, Republic of Letters, RFID, Richard Feynman, Ronald Reagan, semantic web, slashdot, social graph, Steven Pinker, Stewart Brand, technological singularity, Ted Nelson, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas Malthus, Whole Earth Catalog, X Prize

Then, of course, startups and Web 2.0 companies began holding “jellies,” which are like jams but bring together multiple smaller companies.32 Because the Net lets us form expert networks of just about any size and configuration, from twosomes to crowds to massively multiplayer games, the expertise of networks need not be equal to the expertise of its smartest member—or even cumulative. The complex, multiway interactions the Net enables mean that networks of experts can be smarter than the sum of their participants. For example, BellKor’s Pragmatic Chaos was able to win the Netflix prize because the Internet not only made it feasible to assemble experts from around the world but also made it possible for those experts to collaborate. While crowdsourcing can aggregate information—people in every neighborhood of New York City can report on what their local groceries are charging for diapers—networked experts who are talking with one another can build on what they know. We see this all the time on topical mailing lists.


pages: 268 words: 75,850

The Formula: How Algorithms Solve All Our Problems-And Create More by Luke Dormehl


3D printing, algorithmic trading, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, big data - Walmart - Pop Tarts, call centre, Cass Sunstein, Clayton Christensen, computer age, death of newspapers, deferred acceptance, Edward Lorenz: Chaos theory, Erik Brynjolfsson, Filter Bubble, Flash crash, Florence Nightingale: pie chart, Frank Levy and Richard Murnane: The New Division of Labor, Google Earth, Google Glasses, High speed trading, Internet Archive, Isaac Newton, Jaron Lanier, Jeff Bezos, job automation, Kevin Kelly, Kodak vs Instagram, Marshall McLuhan, means of production, Nate Silver, natural language processing, Netflix Prize, pattern recognition, price discrimination, recommendation engine, Richard Thaler, Rosa Parks, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Slavoj Žižek, social graph, speech recognition, Steve Jobs, Steven Levy, Steven Pinker, Stewart Brand, the scientific method, The Signal and the Noise by Nate Silver, upwardly mobile, Wall-E, Watson beat the top human players on Jeopardy!, Y Combinator

In other areas—particularly as relate to law—a reliance on algorithms might simply justify existing bias and lack of understanding, in the same way that the “filter bubble” effect described in Chapter 1 can result in some people not being presented with certain pieces of information, which may take the form of opportunities. “It’s not just you and I who don’t understand how these algorithms work—the engineers themselves don’t understand them entirely,” says scholar Ted Striphas. “If you look at the Netflix Prize, one of the things the people responsible for the winning entries said over and over again was that their algorithms worked, even though they couldn’t tell you why they worked. They might understand how they work from the point of view of mathematical principles, but that math is so complex that it is impossible for a human being to truly follow. That troubles me to some extent. The idea that we don’t know the world that we’re creating makes it very difficult for us to operate ethically and mindfully within it.”


pages: 292 words: 85,151

Exponential Organizations: Why New Organizations Are Ten Times Better, Faster, and Cheaper Than Yours (And What to Do About It) by Salim Ismail, Yuri van Geest


23andMe, 3D printing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, bioinformatics, bitcoin, Black Swan, blockchain, Burning Man, business intelligence, business process, call centre, chief data officer, Clayton Christensen, clean water, cloud computing, cognitive bias, collaborative consumption, collaborative economy, corporate social responsibility, cross-subsidies, crowdsourcing, cryptocurrency, dark matter, Dean Kamen, dematerialisation, discounted cash flows, distributed ledger, Edward Snowden, Elon Musk, ethereum blockchain, Galaxy Zoo, game design, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, hiring and firing, Hyperloop, industrial robot, Innovator's Dilemma, Internet of things, Iridium satellite, Isaac Newton, Jeff Bezos, Kevin Kelly, Kickstarter, knowledge worker, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, loose coupling, loss aversion, Lyft, Mark Zuckerberg, market design, means of production, minimum viable product, natural language processing, Netflix Prize, Network effects, new economy, Oculus Rift, offshore financial centre, p-value, PageRank, pattern recognition, Paul Graham, Peter H. Diamandis: Planetary Resources, Peter Thiel, prediction markets, profit motive, publish or perish, Ray Kurzweil, recommendation engine, RFID, ride hailing / ride sharing, risk tolerance, Ronald Coase, Second Machine Age, self-driving car, sharing economy, Silicon Valley, skunkworks, Skype, smart contracts, Snapchat, social software, software is eating the world, speech recognition, stealth mode startup, Stephen Hawking, Steve Jobs, subscription business, supply-chain management, TaskRabbit, telepresence, telepresence robot, Tony Hsieh, transaction costs, Tyler Cowen: Great Stagnation, urban planning, WikiLeaks, winner-take-all economy, X Prize, Y Combinator

Internal Usage: Maps display resulting gestures for all users
SCALE Attribute: Community & Crowd

Google
Interface: AdWords
Description: User picks keywords to advertise against
Internal Usage: Google places ads against search results
SCALE Attribute: Algorithms

GitHub
Interface: Version control system
Description: Multiple coders updating software sequentially and in parallel
Internal Usage: Platform keeps all contributions in sync
SCALE Attribute: Community & Crowd

Zappos
Interface: Hiring process
Description: Incentive competitions
Internal Usage: Narrows down candidates from large pool
SCALE Attribute: Engagement

Gigwalk
Interface: Task availability
Description: Gigwalk workers receive location-based, simple tasks when available
Internal Usage: Matches task demand with supply of Gigwalkers
SCALE Attribute: Staff on Demand

One final way to think about Interfaces is that they help manage abundance. While most processes are optimized around scarcity and efficiency, SCALE elements generate large result sets, meaning Interfaces are geared towards filtering and matching. As an example, keep in mind that the Netflix prize generated 44,104 entries that needed to be filtered, ranked, prioritized and scored.

Why Important?
• Filter external abundance into internal value
• Bridge between external growth drivers and internal stabilizing factors
• Automation allows scalability

Dependencies or Prerequisites
• Standardized processes to enable automation
• Scalable externalities
• Algorithms (in most cases)

Dashboards

Given the huge amounts of data from customers and employees becoming available, ExOs need a new way to measure and manage the organization: a real-time, adaptable dashboard with all essential company and employee metrics, accessible to everyone in the organization.

A small mass allows dramatic acceleration and quick changes in direction—precisely what we’re seeing with many ExOs today. With very little internal inertia (that is, number of employees, assets or organizational structures), they demonstrate extraordinary flexibility, which is a critical quality in today’s volatile world. This remarkable characteristic has been well demonstrated by Netflix. As mentioned earlier, the company offered a $1 million prize (Engagement) to anyone who could improve its rental recommendation program. What is less well known is that Netflix never implemented the winning algorithm. Why? Because, tellingly, the market had already moved on. By the conclusion of the contest the industry had shifted away from rental DVDs; meanwhile Netflix’s streaming video business was exploding and, unfortunately, the winning algorithm didn’t apply to streaming recommendations.


pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future by Kevin Kelly


3D printing, A Declaration of the Independence of Cyberspace, AI winter, Airbnb, Albert Einstein, Amazon Web Services, augmented reality, bank run, barriers to entry, Baxter: Rethink Robotics, bitcoin, blockchain, book scanning, Brewster Kahle, Burning Man, cloud computing, computer age, connected car, crowdsourcing, dark matter, dematerialisation, Downton Abbey, Edward Snowden, Elon Musk, Filter Bubble, Freestyle chess, game design, Google Glasses, hive mind, Howard Rheingold, index card, indoor plumbing, industrial robot, Internet Archive, Internet of things, invention of movable type, invisible hand, Jaron Lanier, Jeff Bezos, job automation, Kevin Kelly, Kickstarter, linked data, Lyft, M-Pesa, Marshall McLuhan, means of production, megacity, Minecraft, multi-sided market, natural language processing, Netflix Prize, Network effects, new economy, Nicholas Carr, peer-to-peer lending, personalized medicine, placebo effect, planetary scale, postindustrial economy, recommendation engine, RFID, ride hailing / ride sharing, Rodney Brooks, self-driving car, sharing economy, Silicon Valley, slashdot, Snapchat, social graph, social web, software is eating the world, speech recognition, Stephen Hawking, Steven Levy, Ted Nelson, the scientific method, transport as a service, two-sided market, Uber for X, Watson beat the top human players on Jeopardy!, Whole Earth Review

loans worth more than $10 billion: Simon Cunningham, “Default Rates at Lending Club & Prosper: When Loans Go Bad,” LendingMemo, October 17, 2014; and Davey Alba, “Banks Are Betting Big on a Startup That Bypasses Banks,” Wired, April 8, 2015. GE has launched over 400 new products: Steve Lohr, “The Invention Mob, Brought to You by Quirky,” New York Times, February 14, 2015. Netflix announced an award: Preethi Dumpala, “Netflix Reveals Million-Dollar Contest Winner,” Business Insider, September 21, 2009. Forty thousand groups submitted: “Leaderboard,” Netflix Prize, 2009. 150,000 car fanatics: Gary Gastelu, “Local Motors 3-D-Printed Car Could Lead an American Manufacturing Revolution,” Fox News, July 3, 2014. 3-D-printed electric car: Paul A. Eisenstein, “Startup Plans to Begin Selling First 3-D-Printed Cars Next Year,” NBC News, July 8, 2015. 7: FILTERING 8 million new songs: Private correspondence with Richard Gooch, CTO, International Federation of the Phonographic Industry, April 15, 2015.


pages: 398 words: 86,855

Bad Data Handbook by Q. Ethan McCallum


Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, cloud computing, cognitive dissonance, combinatorial explosion, conceptual framework, database schema, Firefox, Flash crash, Gini coefficient, illegal immigration, iterative process, labor-force participation, loose coupling, natural language processing, Netflix Prize, quantitative trading / quantitative finance, recommendation engine, sentiment analysis, statistical model, supply-chain management, text mining, too big to fail, web application

As the value of the approach becomes better-known, the demand for part-time or project-based machine-learning work has grown, but it’s often hard for a traditional engineering team to effectively work with outside experts in the field. I’m going to talk about some of the things I learned while running an outsourced project through Kaggle,[73] a community of thousands of researchers who participate in data competitions modeled on the Netflix Prize. This was an extreme example of outsourcing: we literally handed over a dataset, a short description, and a success metric to a large group of strangers. It had almost none of the traditional interactions you’d expect, but it did teach me valuable lessons that apply to any interactions with machine-learning specialists. Define the Problem My company Jetpac creates a travel magazine written by your friends, using vacation photos they’ve shared with you on Facebook and other social services.


pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies by Nick Bostrom


Agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, anti-communist, artificial general intelligence, autonomous vehicles, barriers to entry, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, demographic transition, Douglas Hofstadter, Drosophila, Elon Musk, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, Gödel, Escher, Bach, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John von Neumann, knowledge worker, Menlo Park, meta analysis, meta-analysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Norbert Wiener, NP-complete, nuclear winter, optical character recognition, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, strong AI, superintelligent machines, supervolcano, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, transaction costs, Turing machine, Vernor Vinge, Watson beat the top human players on Jeopardy!, World Values Survey

This again foreshadows another later theme: the difficulty of anticipating all specific ways in which some particular plausible-seeming rule might go wrong. 73. Nilsson (2009, 319). 74. Minsky (2006); McCarthy (2007); Beal and Winston (2009). 75. Peter Norvig, personal communication. Machine-learning classes are also very popular, reflecting a somewhat orthogonal hype-wave of “big data” (inspired by e.g. Google and the Netflix Prize). 76. Armstrong and Sotala (2012). 77. Müller and Bostrom (forthcoming). 78. See Baum et al. (2011), another survey cited therein, and Sandberg and Bostrom (2011). 79. Nilsson (2009). 80. This is again conditional on no civilization-disrupting catastrophe occurring. The definition of HLMI used by Nilsson is “AI able to perform around 80% of jobs as well or better than humans perform” (Kruel 2012). 81.


pages: 306 words: 85,836

When to Rob a Bank: ...And 131 More Warped Suggestions and Well-Intended Rants by Steven D. Levitt, Stephen J. Dubner


Affordable Care Act / Obamacare, airport security, augmented reality, barriers to entry, Bernie Madoff, Black Swan, Broken windows theory, Captain Sullenberger Hudson, Daniel Kahneman / Amos Tversky, deliberate practice, feminist movement, food miles, George Akerlof, invisible hand, loss aversion, mental accounting, Netflix Prize, obamacare, oil shale / tar sands, peak oil, pre–internet, price anchoring, price discrimination, principal–agent problem, profit maximization, Richard Thaler, security theater, Ted Kaczynski, the built environment, The Chicago School, the High Line, Thorstein Veblen, transaction costs

. / 53 “A comprehensive Wall Street Journal article”: Sarah Rubenstein, “Why Generic Doesn’t Always Mean Cheap,” The Wall Street Journal, March 13, 2007. 57 “FOR $25 MILLION, NO WAY . . .”: “The virtues of offering big prizes to encourage . . . curing disease”: See Levitt, “Fight Global Pandemics (or at Least Find a Good Excuse When You’re Playing Hooky),”, May 18, 2007; “or improving Netflix’s algorithms”: See Levitt, “Netflix $ Million Prize,”, October 6, 2006. / 59 “As reported by ABC News”: See Matthew Cole, “U.S. Will Not Pay $25 Million Osama Bin Laden Reward, Officials Say,”, May 19, 2011. 61 “CAN WE PLEASE GET RID OF THE PENNY ALREADY?”: A “60 Minutes segment called ‘Making Cents’”: See Morley Safer, “Should We Make Cents?,” 60 Minutes, February 10, 2008. 71 “JANE SIBERRY SNAPS”: “Anybody remember when Levitt announced . . .”: See Levitt, “The Two Smartest Musicians I Ever Met,”, April 5, 2006; and Levitt, “From Now on I Will Leave the Reporting to Dubner,”, April 9, 2006. 72 “HOW MUCH TAX ARE ATHLETES . . .”: “Manny Pacquiao will probably never fight in New York”: See “Manny Pacquiao Won’t Ever Fight in New York Due to State Tax Rates,” The Wall Street Journal, August 7, 2013. / 73 “Pacquiao may never fight anywhere in the U.S. again”: See Lance Pugmire, “Promoter: Manny Pacquiao May Never Again Fight in the U.S.,” The Los Angeles Times, May 31, 2013. / 73 “Phil Mickelson . . .


pages: 743 words: 201,651

Free Speech: Ten Principles for a Connected World by Timothy Garton Ash


A Declaration of the Independence of Cyberspace, Affordable Care Act / Obamacare, Andrew Keen, Apple II, Ayatollah Khomeini, battle of ideas, Berlin Wall, bitcoin, British Empire, Cass Sunstein, Chelsea Manning, citizen journalism, Clapham omnibus, colonial rule, crowdsourcing, David Attenborough, don't be evil, Edward Snowden, Etonian, European colonialism, eurozone crisis, failed state, Fall of the Berlin Wall, Ferguson, Missouri, Filter Bubble, financial independence, Firefox, Galaxy Zoo, global village, index card, Internet Archive, invention of movable type, invention of writing, Jaron Lanier, jimmy wales, Julian Assange, Mark Zuckerberg, Marshall McLuhan, megacity, mutually assured destruction, national security letter, Netflix Prize, Nicholas Carr, obamacare, Peace of Westphalia, Peter Thiel, pre–internet, profit motive, RAND corporation, Ray Kurzweil, Ronald Reagan, semantic web, Silicon Valley, Simon Singh, Snapchat, social graph, Stephen Hawking, Steve Jobs, Steve Wozniak, The Death and Life of Great American Cities, The Wisdom of Crowds, Turing test, We are Anonymous. We are Legion, WikiLeaks, World Values Survey, Yom Kippur War

DeNardis 2014, 235–36 131. see Emily Steel and April Dembosky, ‘Facebook Raises Fears with Ad Tracking’, Financial Times, 23 September 2012, 132. see Mayer-Schönberger et al. 2013 133. a useful summary is given by Nate Anderson, ‘“Anonymized” Data Really Isn’t—and Here’s Why Not’, arstechnica, Sweeney’s article was published in Journal of Law, Medicine and Ethics, no. 25, 1997, 98–110 134. Mayer-Schönberger et al. 2013, 154–55 135. see the paper by Arvind Narayanan and Vitaly Shmatikov, ‘Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset)’, University of Texas, 2008, and their useful FAQs at For even more serious examples of the reidentification of supposedly anonymised medical data, see Nuffield Council on Bioethics 2015, 66–69 136. Ghonim 2012, chapters 3 and 4. He was the anonymous administrator of the Facebook page and used Tor to conceal his IP address 137. Josh Chin, ‘China Is Requiring People to Register Real Names for Some Internet Services’, Wall Street Journal, 4 February 2015,


Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei


bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, web application

False positives are less desirable because they can annoy or anger consumers. Content-based recommender systems are limited by the features used to describe the items they recommend. Another challenge for both content-based and collaborative recommender systems is how to deal with new users for whom a buying history is not yet available. Hybrid approaches integrate both content-based and collaborative methods to achieve further improved recommendations. The Netflix Prize was an open competition held by an online DVD-rental service, with a payout of $1,000,000 for the best recommender algorithm to predict user ratings for films, based on previous ratings. The competition and other studies have shown that the predictive accuracy of a recommender system can be substantially improved when blending multiple predictors, especially by using an ensemble of many substantially different methods, rather than refining a single technique.
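The blending idea in the excerpt above can be illustrated with a minimal sketch. It is not the method of any Netflix Prize team: the two predictors, their outputs, and the equal weights are hypothetical, and the toy numbers were chosen so that the two predictors' errors partly cancel. Blending is not guaranteed to help when the predictors make similar mistakes.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def blend(predictions, weights):
    """Weighted average of several predictors' outputs (one list per predictor)."""
    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(len(predictions[0]))
    ]


# Toy held-out ratings (hypothetical data, 1-5 scale).
actual = [4.0, 2.0, 5.0, 3.0]

# Outputs of two different hypothetical models on the same four ratings.
predictor_a = [4.6, 2.4, 4.5, 3.5]  # tends to overshoot low ratings
predictor_b = [3.6, 1.8, 5.3, 2.9]  # tends to undershoot them

# Equal weights are an assumption; real systems usually fit the weights.
blended = blend([predictor_a, predictor_b], [0.5, 0.5])

print("A:", rmse(predictor_a, actual))
print("B:", rmse(predictor_b, actual))
print("blend:", rmse(blended, actual))
```

On this toy data the blended error is lower than either predictor's own error, because the two models err in different directions on most items.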


pages: 222 words: 53,317

Overcomplicated: Technology at the Limits of Comprehension by Samuel Arbesman


3D printing, algorithmic trading, Anton Chekhov, Apple II, Benoit Mandelbrot, citation needed, combinatorial explosion, Danny Hillis, David Brooks, discovery of the americas, Erik Brynjolfsson, Flash crash, friendly AI, game design, Google X / Alphabet X, Googley, HyperCard, Inbox Zero, Isaac Newton, iterative process, Kevin Kelly, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mandelbrot fractal, Minecraft, Netflix Prize, Nicholas Carr, Parkinson's law, Ray Kurzweil, recommendation engine, Richard Feynman, Richard Feynman: Challenger O-ring, Second Machine Age, self-driving car, software studies, statistical model, Steve Jobs, Steve Wozniak, Steven Pinker, Stewart Brand, superintelligent machines, Therac-25, Tyler Cowen: Great Stagnation, urban planning, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, Y2K

Exceptions must be cherished, rather than discarded, for exceptions or rare instances contain a large amount of information. The sophisticated machine learning techniques used in linguistics—employing probability and a large array of parameters rather than principled rules—are increasingly being used in numerous other areas, both in science and outside it, from criminal detection to medicine, as well as in the insurance industry. Even our aesthetic tastes are rather complicated, as Netflix discovered when it awarded a prize for improvements in its recommendation engine to a team whose solution was cobbled together from a variety of different statistical techniques. The contest seemed to demonstrate that no simple algorithm could provide a significant improvement in recommendation accuracy; the winners needed to use a more complex suite of methods in order to capture and predict our personal and quirky tastes in films.


pages: 204 words: 58,565

Keeping Up With the Quants: Your Guide to Understanding and Using Analytics by Thomas H. Davenport, Jinho Kim


Black-Scholes formula, business intelligence, business process, call centre, computer age, correlation coefficient, correlation does not imply causation, Credit Default Swap, feminist movement, Florence Nightingale: pie chart, forensic accounting, global supply chain, Hans Rosling, hypertext link, invention of the telescope, inventory management, Jeff Bezos, margin call, Moneyball by Michael Lewis explains big data, Netflix Prize, p-value, performance metric, publish or perish, quantitative hedge fund, random walk, Renaissance Technologies, Robert Shiller, self-driving car, sentiment analysis, six sigma, Skype, statistical model, supply-chain management, text mining, the scientific method

Many students opted to try their hand at the Netflix Challenge: to design a movie recommendations algorithm that does better than the one developed by Netflix. Here’s how the competition works. Netflix has provided a large data set that tells you how nearly half a million people have rated about 18,000 movies. Based on these ratings, you are asked to predict the ratings of these users for movies in the set that they have not rated. The first team to beat the accuracy of Netflix’s proprietary algorithm by a certain margin wins a prize of $1 million! Different student teams in my class adopted different approaches to the problem, using both published algorithms and novel ideas. Of these, the results from two of the teams illustrate a broader point. Team A came up with a very sophisticated algorithm using the Netflix data. Team B used a very simple algorithm, but they added in additional data beyond the Netflix set: information about movie genres from the Internet Movie Database (IMDB).
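The "beat the baseline by a certain margin" rule described above can be sketched as a simple check. The excerpt names neither the accuracy metric nor the margin; the sketch assumes root-mean-square error and a 10 percent threshold, which is how the contest is usually described. The function names and the toy ratings are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate lowers the baseline's error."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


def beats_margin(baseline_rmse, candidate_rmse, margin=0.10):
    """True if the candidate improves on the baseline by at least `margin`."""
    return relative_improvement(baseline_rmse, candidate_rmse) >= margin


# Toy held-out ratings and two sets of predictions (hypothetical data).
actual = [4.0, 3.0, 5.0, 2.0]
baseline_predictions = [3.0, 3.0, 4.0, 3.0]   # stands in for the incumbent system
candidate_predictions = [3.5, 3.0, 4.5, 2.5]  # stands in for a contestant's entry

baseline_error = rmse(baseline_predictions, actual)
candidate_error = rmse(candidate_predictions, actual)

print("improvement:", relative_improvement(baseline_error, candidate_error))
print("wins:", beats_margin(baseline_error, candidate_error))
```

Comparing an improvement that sits exactly on the threshold is sensitive to floating-point rounding, so a real scoring system would need an explicit rounding rule for borderline entries.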


pages: 272 words: 64,626

Eat People: And Other Unapologetic Rules for Game-Changing Entrepreneurs by Andy Kessler


23andMe, Andy Kessler, bank run, barriers to entry, Berlin Wall, British Empire, business process, California gold rush, carbon footprint, Cass Sunstein, cloud computing, collateralized debt obligation, collective bargaining, computer age, disintermediation, Eugene Fama: efficient market hypothesis, fiat currency, Firefox, Fractional reserve banking, George Gilder, Gordon Gekko, greed is good, income inequality, invisible hand, James Watt: steam engine, Jeff Bezos, job automation, Joseph Schumpeter, knowledge economy, knowledge worker, libertarian paternalism, low skilled workers, Mark Zuckerberg, McMansion, Netflix Prize, packet switching, personalized medicine, prediction markets, pre–internet, profit motive, race to the bottom, Richard Thaler, risk tolerance, risk-adjusted returns, Silicon Valley, six sigma, Skype, social graph, Steve Jobs, The Wealth of Nations by Adam Smith, transcontinental railway, transfer pricing, Yogi Berra

Again, all those powerful machines at the edge and huge networks of servers in the cloud with giant repositories of all the things we’ve done, what our friends are doing, what the average twenty-seven-year-old from Sheboygan, Wisconsin, is likely to do. Amazon uses a limited version of this in their recommendations, but more as a marketing tool to get you to buy yet another book. Others who view this item bought this book. We recommend that. They look for patterns and crudely overlay them on your page views in their system. Netflix, the DVD rental and streaming video company, offered a prize of $1 million for a better algorithm to suggest movies you might like to watch. These are all early adopters of the adaptive model. But why not recommend books based on my search history? If I’m searching on Google for information on the Ottoman Empire, surely there are a dozen books that ought to pop up that will be of interest immediately, without me heading to Amazon to find out.


pages: 377 words: 97,144

Singularity Rising: Surviving and Thriving in a Smarter, Richer, and More Dangerous World by James D. Miller


23andMe, affirmative action, Albert Einstein, artificial general intelligence, Asperger Syndrome, barriers to entry, brain emulation, cloud computing, cognitive bias, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, David Brooks, David Ricardo: comparative advantage, Deng Xiaoping, feminist movement, Flynn Effect, friendly AI, hive mind, impulse control, indoor plumbing, invention of agriculture, Isaac Newton, John von Neumann, knowledge worker, Long Term Capital Management, low skilled workers, Netflix Prize, neurotypical, pattern recognition, Peter Thiel, phenotype, placebo effect, prisoner's dilemma, profit maximization, Ray Kurzweil, recommendation engine, reversible computing, Richard Feynman, Rodney Brooks, Silicon Valley, Singularitarianism, Skype, statistical model, Stephen Hawking, Steve Jobs, supervolcano, technological singularity, The Coming Technological Singularity, the scientific method, Thomas Malthus, transaction costs, Turing test, Vernor Vinge, Von Neumann architecture

The AI would obviously benefit from faster computers, but it could also improve its performance by using data from DNA sequencing and brain scans to conduct large statistical studies so as to better categorize people. For example, if 90 percent of people who had some unusual allele or brain microstructure enjoyed a certain cat video, then the AI recommender would suggest the video to all other viewers who had that trait. 12. Amenable to Crowdsourcing—Netflix, the rent-by-mail and streaming video distributor, offered (and eventually paid) a $1 million prize to whichever group improved its recommendation system the most, so long as at least one group improved the system by at least 10 percent. This “crowdsourcing,” which occurs when a problem is thrown open to anyone, helps a company by allowing them to draw on the talents of strangers, while only paying the strangers if they help the firm. This kind of crowdsourcing works only if, as with a video recommendation system, there is an easy and objective way of measuring progress toward the crowdsourced goal. 13. Potential Improvement All the Way Up to Superhuman Artificial General Intelligence—A recommendation AI could slowly morph into a content creator.

See also cognitive-enhancement drugs mental ability, general, 64 Methuselah Foundation, 170 Mexicans, millions of illegal, 135 microprocessors, 122 military spending, 180 military threats, 126–28 Milky Way galaxy, 45, 199 minority students, “acceptable” number of, 87 missile technology, 124 modafinil (cognitive-enhancement drug), 104 “modafinil squared” technology, 159 modern man, 77 “A Modest Proposal: Allow Women to Pay for College in Eggs” (Miller), 88 moons of Jupiter, 41 Moore, Gordon, xvi Moore’s Law, xivi 3–5, 8–9, 11, 17, 209 Muehlhauser, Luke, 7 multimillionaires, self-made, 209 murder, 39 musicians, 108 N nanobots, 10 nanosensors, 127 nanotechnology based weapons, 127 computing hardware, to improve, 8 to control our emotions and levels of happiness, 8 free energy from our galaxy stars and, 45 moons of Jupiter and, 41 office buildings, to construct large, 181 restorative, 213 virtual reality and, 171 nanotech weapons, 197 nanotubes, 4, 17 narcolepsy, 184 narcoleptics, 105 NASA, 217 National Center for Education Statistics (Washington, DC), 172 national defense, 180 National Football League (NFL), 90 natural disasters, 197 natural selection, 171 Nature, 109 Nazi Germany, 23 Netflix, 20 neuroscience, 185 neuroscientists, 13, 17, 203 Newton, Isaac, 91 New York Times, x NFL. See National Football League (NFL) Nobel science prizes, 96 nonparallel processing, 18 Normans in 1066, 187 North Korea, 187 Norvig, Peter, 35 nuclear war, xi, 197. See also thermonuclear war nuclear weapons, 24, 126 nutritional-supplement regime, 179 O Obama, Barack (President), 73 obsolescence, 144, 147 The Odyssey (Homer), 61 Omohundro, Steve, 25 Overcoming Bias (blog), 138, 207 P Pac-Man video game, 209 parallel processing, 18–19 Parkinson’s disease, 168–69 Pascal, Blaise, 208 patents and copyrights, 143 people, long term—oriented, 80 person, anonymous, 93 pharmaceutical product development, 183 phonetic pattern of language, 91 pirate maps, 184 placebo effect, 110–11 plagues, 36, 45, 78 plastic surgery, 89 Plath, Sylvia, 92 political correctness, 172 Polizzi, Eric, 5 population groups, 75–76, 96, 173 pornography, hard-core, 38, 195 pornography, Internet, 194 post-Singularity civilization, 199–200, 221 goods, 42 operating-system world, 41 pre-Singularity property will have value of, 188 pre-Singularity will be worthless, 187 property rights, 56, 188 race throughout the galaxy, 199 ultra-AI and chess, 132 value, 189 value of education, 192 value of money, 211 Praetorian Guard, 148 pre-Singularity destructive technologies, 201–2 investments, ultra-AI might obliterate the value of, 187 property rights, 56, 187–89 value of money, 211 prisoner-of-war camp, 31 Prisoners’ Dilemma AI development and, 47–53 annihilation of mankind, xix Chinese militaries and, 48–53 drug use and risk of schizophrenia or kidney failure, 160–62 unleaded vs. leaded petrol (gas), 57 US militaries and, 48–53 probe, self-replicating, 199–200 procrastination, 106 production wands, 145 pro-eugenic Chinese, ix prognosticators, 206–7 property owning, 147 property rights economic behavior and, xviii post-Singularity, 56 stable, 82 property rights, pre-Singularity, 187 property rights of bio-humans, 149 Psychology Today, 195 psychotic breakdowns, 120 Q quantum computing, 5, 17 quantum effects, 4 R rabbit population, 142 race, star-faring, 200 racial classifications, 76 racial equality, 173 rapture of the nerds, 208 Rattner, Justin, 35 real estate developer, 181–82 real estate development, 188 recessive condition, inherited, 83 Recursive Darkness (horse), 55, 57 Reed, Leonard, 204–5 religious disagreement, 43 reproductive fitness, 76 reproductive success, 75 resale value, 181 residential housing, 181 The Restaurant at the End of the Universe (Adams), 150 retirement savings, 175–76 reverse-engineering biology, 203 reversible computing, 17 Ricardian comparative advantage, 136–37, 143, 188, 190 Ricardian scenario, 189 Ricardo, David, xvii, 135, 143 rich investors, 144–45 Ritalin (cognitive-enhancement drug), xiv, 104–5 Robin.


pages: 327 words: 103,336

Everything Is Obvious: *Once You Know the Answer by Duncan J. Watts


affirmative action, Albert Einstein, Amazon Mechanical Turk, Black Swan, butterfly effect, Carmen Reinhart, Cass Sunstein, clockwork universe, cognitive dissonance, collapse of Lehman Brothers, complexity theory, correlation does not imply causation, crowdsourcing, death of newspapers, discovery of DNA, East Village, easy for humans, difficult for computers, edge city, Erik Brynjolfsson, framing effect, Geoffrey West, Santa Fe Institute, happiness index / gross national happiness, high batting average, hindsight bias, illegal immigration, interest rate swap, invention of the printing press, invention of the telescope, invisible hand, Isaac Newton, Jane Jacobs, Jeff Bezos, Joseph Schumpeter, Kenneth Rogoff, lake wobegon effect, Long Term Capital Management, loss aversion, medical malpractice, meta analysis, meta-analysis, Milgram experiment, natural language processing, Netflix Prize, Network effects, oil shock, packet switching, pattern recognition, performance metric, phenotype, planetary scale, prediction markets, pre–internet, RAND corporation, random walk, RFID, school choice, Silicon Valley, statistical model, Steve Ballmer, Steve Jobs, Steve Wozniak, supply-chain management, The Death and Life of Great American Cities, the scientific method, The Wisdom of Crowds, too big to fail, Toyota Production System, ultimatum game, urban planning, Vincenzo Peruggia: Mona Lisa, Watson beat the top human players on Jeopardy!, X Prize

The funding agency DARPA, for example, was able to harness the collective creativity of dozens of university research labs to build self-driving robot vehicles by offering just a few million dollars in prize money—far less than it would have cost to fund the same amount of work with conventional research grants. Likewise, the $10 million Ansari X Prize elicited more than $100 million worth of research and development in pursuit of building a reusable spacecraft. And the video rental company Netflix got some of the world’s most talented computer scientists to help it improve its movie recommendation algorithms for just a $1 million prize. Inspired by these examples—along with “open innovation” companies like Innocentive, which conducts hundreds of prize competitions in engineering, computer science, math, chemistry, life sciences, physical sciences, and business—governments are wondering if the same approach can be used to solve otherwise intractable policy problems.


pages: 291 words: 81,703

Average Is Over: Powering America Beyond the Age of the Great Stagnation by Tyler Cowen


Amazon Mechanical Turk, Black Swan, brain emulation, Brownian motion, Cass Sunstein, choice architecture, complexity theory, computer age, computer vision, cosmological constant, crowdsourcing, dark matter, David Brooks, David Ricardo: comparative advantage, deliberate practice, Drosophila, endowment effect, epigenetics, Erik Brynjolfsson, eurozone crisis, experimental economics, Flynn Effect, Freestyle chess, full employment, future of work, game design, income inequality, industrial robot, informal economy, Isaac Newton, Khan Academy, labor-force participation, Loebner Prize, low skilled workers, manufacturing employment, Mark Zuckerberg, meta analysis, meta-analysis, microcredit, Narrative Science, Netflix Prize, Nicholas Carr, pattern recognition, Peter Thiel, randomized controlled trial, Ray Kurzweil, reshoring, Richard Florida, Richard Thaler, Ronald Reagan, Silicon Valley, Skype, statistical model, stem cell, Steve Jobs, Turing test, Tyler Cowen: Great Stagnation, upwardly mobile, Yogi Berra

See also artificial intelligence (AI) Mechanical Turk, 148–49 mechanization, 126–27 media, 146 median incomes, 38, 52, 60, 253 Medicaid, 234–39, 250 medical diagnosis, 87–89, 128–29 Medicare, 232–35, 237–38, 242 Medication Adherence Scores, 124 Mediterranean Europe, 174–75 memory, 151–55 meritocracy, 189–90, 230–31 meta-rationality, 82, 115 meta-studies, 224–25 Mexico, 168, 171, 177, 242–43 microcredit, 222–23 microeconomics, 212, 225 “micro-intelligibility,” 219 mid-wage occupations, 38 military, 29, 57 Millennium Prize Problems, 207–8 minimum wage, 59, 60 modes of employment, 35–36 monetarist theory, 226 MOOCs (massive open online courses), 180 Moonwalking with Einstein (Foer), 152 Moore’s law, 10, 15–16 moral issues, 26, 130–31 morale in the workplace, 30, 36 Mormon Church, 197 Morphy, Paul, 106 motivation, 197–202, 203 movie ratings, 121 Moxon’s Master, 134 Mueller, Andreas, 59 multinational corporations, 164 Murray, Charles, 231, 249 music, 146–47, 158 Myspace, 42, 209 mysticism, 153 Nakamura, Hikaru, 80 Narrative Science, 8–9 natural gas production, 177 natural language, 7, 119, 140–41 Naum (chess program), 72 negotiations in business, 12–13, 73 Netflix, 9 Nevada, 8 The New York Times, 11–12 Newton, Isaac, 153 Ng, Jennifer Hwee Kwoon, 89 Nickel, Arno, 81 Nielsen, Dagh, 80 Nobel Prizes, 187, 216 non-tradeable sectors, 176 North American Free Trade Agreement (NAFTA), 8 Northeast US, 241 “nudge” concept, 105 Obama healthcare reform, 237–38 Occupy Wall Street, 230, 251, 253, 256 O’Daniel, Karrah, 96 offshoring, 175. See also outsourcing “off-the-grid” living, 246–47 online dating, 9, 16, 95–98, 125, 144–45 online education, 179–85 opportunity cost, 184 options-pricing theory, 203 outsourcing, 162, 163–71 overseas labor markets, 59 “P vs.


pages: 552 words: 168,518

MacroWikinomics: Rebooting Business and the World by Don Tapscott, Anthony D. Williams


accounting loophole / creative accounting, airport security, Andrew Keen, augmented reality, Ayatollah Khomeini, barriers to entry, bioinformatics, Bretton Woods, business climate, business process, car-free, carbon footprint, citizen journalism, Clayton Christensen, clean water, Climategate, Climatic Research Unit, cloud computing, collaborative editing, collapse of Lehman Brothers, collateralized debt obligation, colonial rule, corporate governance, corporate social responsibility, crowdsourcing, death of newspapers, demographic transition, distributed generation, don't be evil, energy security, energy transition, Exxon Valdez, failed state, fault tolerance, financial innovation, Galaxy Zoo, game design, global village, Google Earth, Hans Rosling, hive mind, Home mortgage interest deduction, interchangeable parts, Internet of things, invention of movable type, Isaac Newton, James Watt: steam engine, Jaron Lanier, jimmy wales, Joseph Schumpeter, Julian Assange, Kevin Kelly, knowledge economy, knowledge worker, Marshall McLuhan, medical bankruptcy, megacity, mortgage tax deduction, Netflix Prize, new economy, Nicholas Carr, oil shock, online collectivism, open borders, open economy, pattern recognition, peer-to-peer lending, personalized medicine, Ray Kurzweil, RFID, ride hailing / ride sharing, Ronald Reagan, scientific mainstream, shareholder value, Silicon Valley, Skype, smart grid, smart meter, social graph, social web, software patent, Steve Jobs, text mining, the scientific method, The Wisdom of Crowds, transaction costs, transfer pricing, University of East Anglia, urban sprawl, value at risk, WikiLeaks, X Prize, young professional, Zipcar

Today, the X Prize Foundation is just one of many organizations that have latched on to incentivized challenges as a way to unleash fundamental breakthroughs in society. Richard Branson, the founder of Virgin, will part with $25 million of his own money in exchange for a commercially feasible way to remove greenhouse gases from Earth’s atmosphere. Netflix has issued a global challenge to anyone who can improve the company’s automated movie recommendations algorithm, while Google’s Lunar X Prize will go to the first private venture to send image-transmitting rovers to the moon. Why do these competitions work? “We are genetically bred to compete,” Diamandis explains. “It’s when we do our best business many times, we do our best sports, and I believe competition extracts the best out of individuals.” Diamandis argues that competitions also bring out the best in small teams.