recommendation engine

166 results


pages: 282 words: 63,385

Attention Factory: The Story of TikTok and China's ByteDance by Matthew Brennan

Airbnb, AltaVista, augmented reality, Benchmark Capital, Big Tech, business logic, Cambridge Analytica, computer vision, coronavirus, COVID-19, deep learning, Didi Chuxing, Donald Trump, en.wikipedia.org, fail fast, Google X / Alphabet X, growth hacking, ImageNet competition, income inequality, invisible hand, Kickstarter, Mark Zuckerberg, Menlo Park, natural language processing, Netflix Prize, Network effects, paypal mafia, Pearl River Delta, pre–internet, recommendation engine, ride hailing / ride sharing, Sheryl Sandberg, Silicon Valley, Snapchat, social graph, Steve Jobs, TikTok, Travis Kalanick, WeWork, Y Combinator

To deceive machines, a person needed to have a scrupulous attention to detail. These practices boiled down to a single goal: trick the app’s recommendation systems. Recommendation—the use of machine learning to infer people’s preferences for content based on their behavior alone—was at the very heart of TikTok. It was the key to understanding the success of the app and its parent company, ByteDance. ByteDance had been the earliest Chinese internet company to go “all in” on the then-nascent technology and commit to the daunting task of building a recommendation engine, challenging the status quo of human curation. This early bet paid off in spades. The foundations of TikTok’s success were laid many years before the app itself was built, and it was no coincidence that ByteDance was the company to make it.

Lei Li, ByteDance AI Labs Image: the large central “fishbowl” glass meeting room, ByteDance’s Beijing head offices, former aviation museum AVIC Plaza. Chapter Timeline: 2012 Sept – Toutiao’s personalized recommendation system goes live; 2013 Aug – Zhang Lidong joins ByteDance to lead commercialization; 2014 – Yang Zhenyuan joins ByteDance as VP of Technology; 2015 Jan – Okinawa annual meeting; 2016 Feb – Company moves to new offices at AVIC Plaza. In mid-2012, an email dropped into ByteDance’s technical team’s inboxes with the ominous title “Recommended Engine General Meeting.” Yiming was determined to push forward on a topic that he saw to be critical to the company’s future.

The email continued: “To be an information platform, it is necessary to do a good job on the personalized recommendation engine. Do you want to start this thing now?” Toutiao’s early recommendation system, its so-called “personalization technology,” was, at the time, rudimentary. Open the app, and the user would be bombarded with top-read articles to keep them immediately hooked. Next, it would mix in more targeted click-bait articles appealing only to specific demographics to test and determine who the reader was. The user clicking on the article with a big preview picture of a female car show model is probably male.


pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline by Cathy O'Neil, Rachel Schutt

Amazon Mechanical Turk, augmented reality, Augustin-Louis Cauchy, barriers to entry, Bayesian statistics, bike sharing, bioinformatics, computer vision, confounding variable, correlation does not imply causation, crowdsourcing, data science, distributed generation, Dunning–Kruger effect, Edward Snowden, Emanuel Derman, fault tolerance, Filter Bubble, finite state, Firefox, game design, Google Glasses, index card, information retrieval, iterative process, John Harrison: Longitude, Khan Academy, Kickstarter, machine translation, Mars Rover, Nate Silver, natural language processing, Netflix Prize, p-value, pattern recognition, performance metric, personalized medicine, pull request, recommendation engine, rent-seeking, selection bias, Silicon Valley, speech recognition, statistical model, stochastic process, tacit knowledge, text mining, the scientific method, The Wisdom of Crowds, Watson beat the top human players on Jeopardy!, X Prize

proximity clustering, Morningside Analytics prtobuf, Back to Josh: Workflow pseudo-likelihood estimation procedure, Inference for ERGMs pseudocounts, Comparing Naive Bayes to k-NN purity, Probabilities Matter, Not 0s and 1s Q Quora, The Current Landscape (with a Little History) R R-squared, Adding in modeling assumptions about the errors, Selection criterion random forests, Random Forests–Random Forests random graphs, A First Example of Random Graphs: The Erdos-Renyi Model–A Second Example of Random Graphs: The Exponential Random Graph Model Erdos-Renyi model, A First Example of Random Graphs: The Erdos-Renyi Model–A Second Example of Random Graphs: The Exponential Random Graph Model exponential, A Second Example of Random Graphs: The Exponential Random Graph Model random variables, Probability distributions ranks, Evaluation, The Dimensionality Problem real-life performance measures, How to Be a Good Modeler real-time streaming data, Populations and Samples of Big Data real-world data, Process Thinking real-world processes, Statistical Inference RealDirect, Case Study: RealDirect website, Exercise: RealDirect Data Strategy RealDirect case study, Case Study: RealDirect–Sample R code RealDirect data strategy exercise, Exercise: RealDirect Data Strategy–Sample R code realizations, Probability distributions recall, Pick an evaluation metric, Evaluation, Defining the error metric receiver operating characteristic curve, Evaluation recommendation engines, Recommendation Engines: Building a User-Facing Data Product at Scale–Exercise: Build Your Own Recommendation System Amazon and, Recommendation Engines: Building a User-Facing Data Product at Scale building, exercise, Exercise: Build Your Own Recommendation System dimensionality, The Dimensionality Problem k-Nearest Neighbors (k-NN) and, Nearest Neighbor Algorithm Review–Some Problems with Nearest Neighbors machine learning classifications and, Beyond Nearest Neighbor: Machine Learning Classification–Beyond Nearest 
Neighbor: Machine Learning Classification Netflix and, Recommendation Engines: Building a User-Facing Data Product at Scale real-world, A Real-World Recommendation Engine records, Populations and Samples of Big Data Red Hat, Cloudera Reddy, Ben, Helping Hands redundancies, Feature Selection regression, stepwise, Selecting an algorithm regular expressions, Helping Hands relational ties, Terminology from Social Networks relations, Terminology from Social Networks relationships deterministic, Linear Regression understanding, Linear Regression relative time differentials, Thought Experiment residual sum of squares (RSS), Fitting the model residuals, Adding in modeling assumptions about the errors retention, understanding, Example: User Retention return, The Decision Tree Algorithm Robo-Graders, ethical implications of as thought experiment, Thought Experiment: What Are the Ethical Implications of a Robo-Grader?

Use mixed methods to come to a better understanding of what’s going on. Qualitative surveys can really help. Chapter 8. Recommendation Engines: Building a User-Facing Data Product at Scale Recommendation engines, also called recommendation systems, are the quintessential data product and are a good starting point when you’re explaining to non–data scientists what you do or what data science really is. This is because many people have interacted with recommendation systems when they’ve been suggested books on Amazon.com or gotten recommended movies on Netflix. Beyond that, however, they likely have not thought much about the engineering and algorithms underlying those recommendations, nor the fact that their behavior when they buy a book or rate a movie is generating data that then feeds back into the recommendation engine and leads to (hopefully) improved recommendations for themselves and other people.

Aside from being a clear example of a product that literally uses data as its fuel, another reason we call recommendation systems “quintessential” is that building a solid recommendation system end-to-end requires an understanding of linear algebra and an ability to code; it also illustrates the challenges that Big Data poses when dealing with a problem that makes intuitive sense, but that can get complicated when implementing its solution at scale.


pages: 208 words: 57,602

Futureproof: 9 Rules for Humans in the Age of Automation by Kevin Roose

"World Economic Forum" Davos, adjacent possible, Airbnb, Albert Einstein, algorithmic bias, algorithmic management, Alvin Toffler, Amazon Web Services, Atul Gawande, augmented reality, automated trading system, basic income, Bayesian statistics, Big Tech, big-box store, Black Lives Matter, business process, call centre, choice architecture, coronavirus, COVID-19, data science, deep learning, deepfake, DeepMind, disinformation, Elon Musk, Erik Brynjolfsson, factory automation, fake news, fault tolerance, Frederick Winslow Taylor, Freestyle chess, future of work, Future Shock, Geoffrey Hinton, George Floyd, gig economy, Google Hangouts, GPT-3, hiring and firing, hustle culture, hype cycle, income inequality, industrial robot, Jeff Bezos, job automation, John Markoff, Kevin Roose, knowledge worker, Kodak vs Instagram, labor-force participation, lockdown, Lyft, mandatory minimum, Marc Andreessen, Mark Zuckerberg, meta-analysis, Narrative Science, new economy, Norbert Wiener, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, off-the-grid, OpenAI, pattern recognition, planetary scale, plutocrats, Productivity paradox, QAnon, recommendation engine, remote working, risk tolerance, robotic process automation, scientific management, Second Machine Age, self-driving car, Shoshana Zuboff, Silicon Valley, Silicon Valley startup, social distancing, Steve Jobs, Stuart Kauffman, surveillance capitalism, tech worker, The Future of Employment, The Wealth of Nations by Adam Smith, TikTok, Travis Kalanick, Uber and Lyft, uber lyft, universal basic income, warehouse robotics, Watson beat the top human players on Jeopardy!, work culture

The injection of algorithmic recommendations into every facet of modern life has gone mostly unnoticed, and yet, if we consider how many of our daily decisions we outsource to machines, it’s hard not to think that a historic, species-level transformation is taking place. “Recommendation engines increasingly shape who people are, what they desire, and who they want to become,” writes Michael Schrage, an MIT research fellow and author of a book about recommendation engines. “The future of the self,” he adds, “is the future of recommendation.” Modern recommendation systems are orders of magnitude more powerful than the one Doug Terry and Dave Nichols developed to sift through their email inboxes. Today’s tech companies have access to huge amounts of computing power that allows them to generate detailed models of user behavior, and machine learning techniques that let them discover patterns in enormous data sets—studying the online shopping behavior of a hundred million people to find out, for example, that people who buy a certain brand of dog food are statistically more likely to vote Republican.

What is rewarding is often hard, and hard is the enemy of the machine. * * * — Recently, I called Doug Terry, the Xerox PARC engineer who came up with Tapestry, the first algorithmic recommender system, nearly three decades ago. Terry, who is sixty-two, works at Amazon now, and after reminiscing about the early days of Tapestry, I asked what he thought of the recommendation engines that power services like Facebook, YouTube, and Netflix. “I don’t think there’s any comparison,” he said. “We just had a little simple system, and nowadays there’s trillions and trillions of feeds for billions of people—just the scale and complexity and everything is different.”

Terry, “A Tour Through Tapestry,” Proceedings of the 1993 ACM Conference on Organizational Computing Systems (1993). Michael Schrage, an MIT research fellow Michael Schrage, Recommendation Engines (Boston: MIT Press, 2020). YouTube has said that recommendations Paresh Dave, “YouTube Sharpens How It Recommends Videos Despite Fears of Isolating Users,” Reuters, November 28, 2017. It has been estimated that 30 percent of Amazon page views Amit Sharma, Jake M. Hofman, and Duncan J. Watts, “Estimating the Causal Impact of Recommendation Systems from Observational Data,” Proceedings of the 2015 ACM Conference on Economics and Computation (2015). Spotify’s algorithmically generated Discover Weekly playlists Devindra Hardawar, “Spotify’s Discover Weekly Playlists Have 40 Million Listeners,” Engadget, May 25, 2016.


pages: 347 words: 91,318

Netflixed: The Epic Battle for America's Eyeballs by Gina Keating

activist fund / activist shareholder / activist investor, AOL-Time Warner, Apollo 13, barriers to entry, Bear Stearns, business intelligence, Carl Icahn, collaborative consumption, company town, corporate raider, digital rights, inventory management, Jeff Bezos, late fees, Mark Zuckerberg, McMansion, Menlo Park, Michael Milken, Netflix Prize, new economy, out of africa, performance metric, Ponzi scheme, pre–internet, price stability, recommendation engine, Saturday Night Live, shareholder value, Silicon Valley, Silicon Valley startup, Steve Jobs, subscription business, Superbowl ad, tech worker, telemarketer, warehouse automation, X Prize

The costs of buying enough DVDs to satisfy the growing subscriber base would eventually crush the company unless Lowe could persuade studios to drop DVD prices drastically in exchange for a share of rental revenues. In the meantime, Netflix engineers had been hard at work since shortly after launch on a recommendation engine—an in-house solution to DVD shortages that would theoretically drive up retention and get more of the company’s catalog into circulation by directing customers away from the most popular films toward more obscure titles that they would like just as much. As a result, the recommendation engine took over the editorial team’s tasks of determining which movies to feature on certain themed Web pages, using machine logic rather than human intuition.

• • • WHEN NETFLIX’S FOUNDING software engineers, including Hastings, contemplated building a recommendation engine in 1999, their first approach was rudimentary and involved linking movies through common attributes: genre, actors, director, setting, happy or sad ending. As the film library grew, that method proved cumbersome and inaccurate, because no matter how many attributes they assigned each film, they could not capture why Pretty Woman was so different from, say, American Gigolo. Both were movies about prostitution set in a major U.S. city and starring Richard Gere, but they were unlikely to appeal to the same audiences. Early recommendation engines were unpredictable: In one famous gaffe, Walmart had to issue an apology and disable theirs after its Web site presented the film Planet of the Apes to shoppers looking for films related to Black History Month.
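The attribute-linking approach described above can be sketched in a few lines. This is only an illustration, not Netflix's actual code: the tag vocabulary, the genre labels, and the use of Jaccard overlap as the similarity measure are all assumptions made for the example.

```python
# Illustrative sketch: link movies by shared hand-assigned attributes.
# The tags and the Jaccard measure are assumptions, not Netflix's method.

def jaccard(a, b):
    """Share of attributes two titles have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical attribute tags; only the theme, setting, and actor
# overlap comes from the passage above.
movies = {
    "Pretty Woman": {
        "theme:prostitution",
        "setting:major U.S. city",
        "actor:Richard Gere",
        "genre:romantic comedy",
    },
    "American Gigolo": {
        "theme:prostitution",
        "setting:major U.S. city",
        "actor:Richard Gere",
        "genre:crime drama",
    },
}

score = jaccard(movies["Pretty Woman"], movies["American Gigolo"])
print(f"attribute similarity: {score:.2f}")
```

With these made-up tags the two films share three of five distinct attributes, so an attribute matcher would treat them as close neighbors even though, as the passage notes, they were unlikely to appeal to the same audiences.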

leaves Netflix, 84–85 Marquee Plan, 58–59, 63 meets Mitch Lowe, 23–24 Netflix, development of, 20–31 and Netflix founding, 6–9 personality of, 2, 15 post-Netflix positions, 253–54 public relations, early experience, 17–19 at Pure Atria, 11–13, 14–16 Queue, 58 on Qwikster fiasco, 253 relationship with Hastings, 15, 20, 44 video rentals, learning about, 22–24 Randolph, Muriel, 17 Randolph, Stephen, 17 Raskopf, Karen, 129, 152, 175 Rational Software, 7, 16 Recommendation engine. See Cinematch recommendation engine Redbox, 5, 84, 234–38 as Blockbuster competitor, 231–32, 234 and Coinstar, 237–38 development of, 24–25, 161 McDonalds placements, 235–36 as Netflix competitor, 228–29 new releases advantage, 161, 235, 238 pricing, 237 rationale for business, 235–36 video stores, impact on, 212, 238 Redpoint Ventures, 53 Redstone, Sumner, 72–74, 111–12 Reel.com, 48, 60, 62 ReFLEX, 78 Reiss, Lisa Battaglia, 45 Remind Me, 36 Rendich, Andy, 248, 251–53 leaves Netflix, 252 ReplayTV, 168 Rock The Block, 219–21, 230–32 Roku box, 224–25 Rolling Road Show, 177–79 Ross, Ken leaves Netflix, 243–44, 247 Netflix corporate communications actions, 138–41, 145, 177–81, 187–88 on Qwikster fiasco, 252 Roth Capitol Partners, 135 S Sam Goody/Musicland, 51 Santa Cruz, Netflix launch, 7–9 Sarandos, Ted, 103, 126, 129, 179, 210, 225, 240 Satellite hub system, 57 Schappert, John, 225 Scorsese, Martin, 179 Sellers, Pattie, 243 Serialized Delivery, 58–59 Sheehan, Susan, 140 Shepherd, James, 128–29 Shepherd, Nick background of, 118–19 as Blockbuster COO, 214–15 Blockbuster financial moves, concern about, 128–29, 202–3 cost-cutting, 118, 162–63 and End of Late Fees, 117–19 and hostile board of directors, 216–17 joins Blockbuster, 90–92 leaves Blockbuster, 217 personality of, 116–17 Redbox purchase, rejecting, 236 sells Blockbuster shares, 219 Siftar, Michael, 107 Siminoff, Ellen and David, 222 Simpson, Jessica, 175 Skip shipping, 28–29 Skorman, Stuart, 48 Smith, Therese “Te” background of, 16 leaves 
Netflix, 54 and Netflix development, 22, 28, 33, 37–38 Smith’s grocery (Las Vegas) Netflix Express, 83–84 Redbox at, 237 Social Register, 13 Sock puppets, 40–41 Software Publishing, 15 Soleil Securities, 209 SpeakerText, 42 Squali, Youssef, 135 Starfish Software, 16 Starz Entertainment, 225, 239–40 Stead, Ed and Blockbuster Online development, 86–89, 95 and Carl Icahn, 116, 121 and Hastings alliance attempts, 66–67, 77 leaves Blockbuster, 170 personality of, 60, 95 and video streaming plans, 77–78 Streaming video.


pages: 390 words: 109,519

Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media by Tarleton Gillespie

4chan, A Declaration of the Independence of Cyberspace, affirmative action, Airbnb, algorithmic bias, algorithmic management, AltaVista, Amazon Mechanical Turk, borderless world, Burning Man, complexity theory, conceptual framework, crowdsourcing, deep learning, do what you love, Donald Trump, drone strike, easy for humans, difficult for computers, Edward Snowden, eternal september, fake news, Filter Bubble, Gabriella Coleman, game design, gig economy, Google Glasses, Google Hangouts, hiring and firing, Ian Bogost, independent contractor, Internet Archive, Jean Tirole, John Gruber, Kickstarter, Mark Zuckerberg, mass immigration, Menlo Park, Minecraft, moral panic, multi-sided market, Netflix Prize, Network effects, pattern recognition, peer-to-peer, power law, real-name policy, recommendation engine, Rubik’s Cube, Salesforce, sharing economy, Silicon Valley, Skype, slashdot, Snapchat, social graph, social web, Steve Jobs, Stewart Brand, TED Talk, Telecommunications Act of 1996, two-sided market, WikiLeaks, Yochai Benkler

See self-harm Tinder (dating app), (i), (ii), (iii) Topfree Equal Rights Association [TERA], (i), (ii), (iii) traditional media: economics of, (i), (ii), (iii); regulation of, (i), (ii); moderation of, (i), (ii), (iii), (iv), (v), (vi), (vii), (viii), (ix), (x), (xi) transgender, (i) transparency and accountability, (i), (ii), (iii), (iv), (v), (vi), (vii), (viii), (ix) transparency reports, (i) TripAdvisor (recommendation platform), (i) trolling, (i), (ii), (iii), (iv), (v), (vi), (vii), (viii)n2 Trust and Safety Council (Twitter), (i), (ii)n60 Tumblr (Yahoo): community guidelines, (i), (ii), (iii); and the thinspo controversy, (i); and ratings, (i); and the NSFW controversy, (i); and filtering, (i), (ii) Tushnet, Rebecca, (i) Twitter: community guidelines, (i), (ii), (iii), (iv), (v), (vi), (vii)n10; and harassment, (i), (ii), (iii), (iv), (v)n60, (vi)n2, (vii)n5; responses to removal requests, (i), (ii); approach to moderation, (i), (ii); and flagging, (i); and automated detection, (i), (ii), (iii); moderation of Trends, (i); fake news / Russian ad controversies, (i) U.K.

SoundCloud is a social place but it’s not the place for you to act out rage from other parts of your life. Don’t let a personal issue strain the rest of the community.”20 Trolls often look to game the platform itself: misusing complaint mechanisms, and wreaking havoc with machine learning algorithms behind recommendation systems and chatbots.21 Unfortunately, platforms must also address behavior that goes well beyond spirited debate gone overboard or the empty cruelty of trolls: directed and ongoing harassment of an individual over time, including intimidation, stalking, and direct threats of violence. This is where the prohibitions often get firm: “Never threaten to harm a person, group of people, or property” (Snapchat).

“Race, Civil Rights, and Hate Speech in the Digital Era.” In Learning Race and Ethnicity: Youth and Digital Media. Cambridge: MIT Press. http://academicworks.cuny.edu/gc_pubs/193/. DAVID, SHAY, AND TREVOR JOHN PINCH. 2005. “Six Degrees of Reputation: The Use and Abuse of Online Review and Recommendation Systems.” First Monday, special issue 6: Commercial Applications of the Internet. http://firstmonday.org/ojs/index.php/fm/article/view/1590/1505. DEIBERT, RONALD, JOHN PALFREY, RAFAL ROHOZINSKI, AND JONATHAN ZITTRAIN, EDS. 2008. Access Denied: The Practice and Policy of Global Internet Filtering.


pages: 451 words: 103,606

Machine Learning for Hackers by Drew Conway, John Myles White

call centre, centre right, correlation does not imply causation, data science, Debian, Erdős number, Nate Silver, natural language processing, Netflix Prize, off-by-one error, p-value, pattern recognition, Paul Erdős, recommendation engine, social graph, SpamAssassin, statistical model, text mining, the scientific method, traveling salesman

More likely, you have heard of something like a recommendation system, which implicitly produces a ranking of products. Even if you have not heard of a recommendation system, it’s almost certain that you have used or interacted with a recommendation system at some point. Some of the most successful ecommerce websites have benefited from leveraging data on their users to generate recommendations for other products their users might be interested in. For example, if you have ever shopped at Amazon.com, then you have interacted with a recommendation system. The problem Amazon faces is simple: what items in their inventory are you most likely to buy?

We use data from US Senator roll call voting to cluster those legislators based on their votes. Recommendation system: suggesting R packages to users To further the discussion of spatial similarities, we discuss how to build a recommendation system based on the closeness of observations in space. Here we introduce the k-nearest neighbors algorithm and use it to suggest R packages to programmers based on their currently installed packages. Social network analysis: who to follow on Twitter Here we attempt to combine many of the concepts previously discussed, as well as introduce a few new ones, to design and build a “who to follow” recommendation system from Twitter data.
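As a rough sketch of the nearest-neighbor idea mentioned above (in Python rather than the book's R, and not the authors' code), the example below finds the k users whose installed-package sets are closest to a target user's and suggests what those neighbors have that the target lacks. The user names, the install data, and the choice of Jaccard similarity are assumptions for illustration.

```python
from collections import Counter

def jaccard(a, b):
    """Overlap between two sets of installed packages."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def recommend(target, others, k=2, n=3):
    """Suggest up to n packages installed by the k most similar users."""
    # Rank other users by similarity to the target's installed set.
    neighbors = sorted(
        others, key=lambda user: jaccard(target, others[user]), reverse=True
    )[:k]
    # Each neighbor "votes" for packages the target does not have yet.
    votes = Counter()
    for user in neighbors:
        for pkg in others[user] - target:
            votes[pkg] += 1
    # Most votes first; ties broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pkg for pkg, _ in ranked[:n]]

# Hypothetical install data.
installed = {
    "alice": {"ggplot2", "dplyr", "tidyr"},
    "bob": {"ggplot2", "dplyr", "stringr"},
    "carol": {"Rcpp", "data.table"},
}
target = {"ggplot2", "dplyr"}

print(recommend(target, installed, k=2))
```

Here "closeness in space" is reduced to set overlap; a fuller treatment would likely use a numeric feature representation and a proper distance metric.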

It can be quite interesting and informative to explore these structures in detail, and we encourage you to do so. In the next and final section, we will use these community structures to build our own “who to follow” recommendation engine for Twitter. Building Your Own “Who to Follow” Engine There are many ways that we might think about building our own friend recommendation engine for Twitter. Twitter has many dimensions of data in it, so we could think about recommending people based on what they “tweet” about. This would be an exercise in text mining and would require matching people based on some common set of words or topics within their corpus of tweets.


pages: 23 words: 5,264

Designing Great Data Products by Jeremy Howard, Mike Loukides, Margit Zwemer

AltaVista, data science, Filter Bubble, PageRank, pattern recognition, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, text mining

The takeaway, whether you are a tiny startup or a giant insurance company, is that we unconsciously use optimization whenever we decide how to get to where we want to go. Drivetrain Approach to recommender systems Let’s look at how we could apply this process to another industry: marketing. We begin by applying the Drivetrain Approach to a familiar example, recommendation engines, and then building this up into an entire optimized marketing strategy. Recommendation engines are a familiar example of a data product based on well-built predictive models that do not achieve an optimal objective. The current algorithms predict what products a customer will like, based on purchase history and the histories of similar customers.
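To make the "purchase history" style of prediction concrete, here is a minimal co-purchase sketch. It is a simplification assumed for illustration, not any retailer's algorithm: the basket data and item names are invented, and raw co-occurrence counts stand in for a real predictive model.

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = {}
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts

def also_bought(item, counts, n=2):
    """Return the n items most often bought together with `item`."""
    ranked = sorted(counts.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:n]]

# Hypothetical purchase histories.
baskets = [
    ["series_book_1", "series_book_2"],
    ["series_book_1", "series_book_2", "series_book_3"],
    ["series_book_1", "series_book_3"],
    ["series_book_1", "series_book_2"],
    ["series_book_2", "unrelated_book"],
]

counts = co_purchase_counts(baskets)
print(also_bought("series_book_1", counts))
```

A model like this predicts well but tends to surface items the customer already knows about, which is the gap between prediction and objective that the passage describes.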

Here is a screenshot of the “Customers Who Bought This Item Also Bought” feed on Amazon from a search for the latest book in Terry Pratchett’s “Discworld” series: All of the recommendations are for other books in the same series, but it’s a good assumption that a customer who searched for “Terry Pratchett” is already aware of these books. There may be some unexpected recommendations on pages 2 through 14 of the feed, but how many customers are going to bother clicking through? Instead, let’s design an improved recommendation engine using the Drivetrain Approach, starting by reconsidering our objective. The objective of a recommendation engine is to drive additional sales by surprising and delighting the customer with books he or she would not have purchased without the recommendation. What we would really like to do is emulate the experience of Mark Johnson, CEO of Zite, who gave a perfect example of what a customer’s recommendation experience should be like in a recent TOC talk.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. O'Reilly Media * * * Chapter 1. Designing Great Data Products By Jeremy Howard, Margit Zwemer, and Mike Loukides In the past few years, we’ve seen many data products based on predictive modeling. These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself. But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction. Prediction technology can be interesting and mathematically elegant, but we need to take the next step.


Succeeding With AI: How to Make AI Work for Your Business by Veljko Krunic

AI winter, Albert Einstein, algorithmic trading, AlphaGo, Amazon Web Services, anti-fragile, anti-pattern, artificial general intelligence, autonomous vehicles, Bayesian statistics, bioinformatics, Black Swan, Boeing 737 MAX, business process, cloud computing, commoditize, computer vision, correlation coefficient, data is the new oil, data science, deep learning, DeepMind, en.wikipedia.org, fail fast, Gini coefficient, high net worth, information retrieval, Internet of things, iterative process, job automation, Lean Startup, license plate recognition, minimum viable product, natural language processing, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, six sigma, smart cities, speech recognition, statistical model, strong AI, tail risk, The Design of Experiments, the scientific method, web application, zero-sum game

• Application of various metrics seen in the scientific papers (such as “Evaluating Recommendation Systems” [77]) on recommendation engines. An example of such a metric is novelty [78], which measures how many new items that the user didn’t know about were recommended. What is often not clear is if such technical metrics positively impact any aspect of the business—I might not have known that the retailer stocks garden hoses, but do I care?
• Measure the sales increase from improved recommendations. This approach has clear business relevance and is unambiguous. You don’t have to deploy your recommendation engine fully in production to test the sales increase.
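The novelty metric mentioned in the first item above can be sketched as the share of recommended items the user had not encountered before. This is a simplified reading assumed for illustration; the cited papers may define novelty differently, and the `known` set (for example, items previously viewed or bought) is a hypothetical proxy for what the user "knows about".

```python
def novelty(recommended, known):
    """Fraction of recommended items that are new to the user."""
    if not recommended:
        return 0.0
    new_items = [item for item in recommended if item not in known]
    return len(new_items) / len(recommended)

# Hypothetical data: the user has already seen only the rake.
recommended = ["garden hose", "rake", "shovel", "gloves"]
known = {"rake"}

print(novelty(recommended, known))
```

A high score says only that the items were unfamiliar, not that the user wanted them, which is why the passage questions whether such metrics translate into business value.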

Let’s look at one typical situation in which it’s difficult to have an intuitive feel for what good results are for your AI project. 4.1.1 What constitutes a good recommendation engine? You’re in charge of the recommendation engine of a large retailer. The retailer is selling 200K products, and it has a total of 80 million customers and close to 2 million products viewed every day. Your recommendation engine suggests to every customer additional products they might be interested in buying. You’ve just made an update to your recommendation engine. How do you know that the latest update is moving the system in the right direction? You can look at a few products overall, but looking at a few products doesn’t actually tell you if your latest change is doing well across all of your customers.

Artificial general intelligence. Wikipedia. [Cited 2018 Jun 13.] Available from: https://en.wikipedia.org/w/index.php?title=Artificial_general_intelligence Shani G, Gunawardana A. Evaluating recommendation systems. In: Ricci F, Rokach L, Shapira B, Kantor PB, editors. Recommender systems handbook. New York: Springer; 2011. p. 257–297. Konstan JA, McNee SM, Ziegler, Torres R, Kapoor N, Riedl JT. Lessons on applying automated recommender systems to information-seeking tasks. Proceedings of the Twenty-First National Conference on Artificial Intelligence; 2006. Wikimedia Foundation. Expected value of perfect information.


pages: 439 words: 131,081

The Chaos Machine: The Inside Story of How Social Media Rewired Our Minds and Our World by Max Fisher

2021 United States Capitol attack, 4chan, A Declaration of the Independence of Cyberspace, Airbnb, Bellingcat, Ben Horowitz, Bernie Sanders, Big Tech, Bill Gates: Altair 8800, bitcoin, Black Lives Matter, call centre, centre right, cloud computing, Comet Ping Pong, Computer Lib, coronavirus, COVID-19, crisis actor, crowdsourcing, dark pattern, data science, deep learning, deliberate practice, desegregation, disinformation, domesticated silver fox, Donald Trump, Douglas Engelbart, end-to-end encryption, fake news, Filter Bubble, Future Shock, game design, gamification, George Floyd, growth hacking, Hacker Conference 1984, Hacker News, hive mind, illegal immigration, Jeff Bezos, John Perry Barlow, Jon Ronson, Joseph Schumpeter, Julian Assange, Kevin Roose, lockdown, Lyft, Marc Andreessen, Mark Zuckerberg, Max Levchin, military-industrial complex, Oklahoma City bombing, Parler "social media", pattern recognition, Paul Graham, Peter Thiel, profit maximization, public intellectual, QAnon, recommendation engine, ride hailing / ride sharing, Rutger Bregman, Saturday Night Live, Sheryl Sandberg, side project, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, Snapchat, social distancing, Social Justice Warrior, social web, Startup school, Stephen Hawking, Steve Bannon, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Susan Wojcicki, tech billionaire, tech worker, Ted Nelson, TED Talk, TikTok, Uber and Lyft, uber lyft, Whole Earth Catalog, WikiLeaks, Y Combinator

An American iteration, which had first appeared on the message board 4chan under the label “QAnon,” had recently hit Facebook like a match to a pool of gasoline. Later, as QAnon became a movement with tens of thousands of followers, an internal FBI report identified it as a domestic terror threat. Throughout, Facebook’s recommendation engines promoted QAnon groups to huge numbers of readers, as if this were merely another club, helping to grow the conspiracy into the size of a minor political party, for seemingly no more elaborate reason than the continued clicks the QAnon content generated. Within Facebook’s muraled walls, though, belief in the product as a force for good seemed unshakable.

Soon, DiResta noticed Facebook doing something strange: pushing a stream of notifications urging her to follow other anti-vaccine pages. “If you joined the one anti-vaccine group,” she said, “it was transformative.” Nearly every vaccine-related recommendation promoted to her was for anti-vaccine content. “The recommendation engine would push them and push them and push them.” Before long, the system prompted her to consider joining groups for unrelated conspiracies. Chemtrails. Flat Earth. And as she poked around, she found another way that the system boosted vaccine misinformation. Just as with the ad-targeting tool, typing “vaccines” in Facebook’s search bar returned a stream of anti-vaccine posts and groups.

Others from DiResta’s informal group of social media watchers were noticing Facebook and other platforms routing them in similar ways. The same pattern played out over and over, as if those A.I.s had all independently arrived at some common, terrible truth about human nature. “I called it radicalization via the recommendation engine,” she said. “By having engagement-driven metrics, you created a world in which rage-filled content would become the norm.” The algorithmic logic was sound, even brilliant. Radicalization is an obsessive, life-consuming process. Believers come back again and again, their obsession becoming an identity, with social media platforms the center of their day-to-day lives.


pages: 642 words: 141,888

Like, Comment, Subscribe: Inside YouTube's Chaotic Rise to World Domination by Mark Bergen

23andMe, 4chan, An Inconvenient Truth, Andy Rubin, Anne Wojcicki, Big Tech, Black Lives Matter, book scanning, Burning Man, business logic, call centre, Cambridge Analytica, citizen journalism, cloud computing, Columbine, company town, computer vision, coronavirus, COVID-19, crisis actor, crowdsourcing, cryptocurrency, data science, David Graeber, DeepMind, digital map, disinformation, don't be evil, Donald Trump, Edward Snowden, Elon Musk, fake news, false flag, game design, gender pay gap, George Floyd, gig economy, global pandemic, Golden age of television, Google Glasses, Google X / Alphabet X, Googley, growth hacking, Haight Ashbury, immigration reform, James Bridle, John Perry Barlow, Justin.tv, Kevin Roose, Khan Academy, Kinder Surprise, Marc Andreessen, Marc Benioff, Mark Zuckerberg, mass immigration, Max Levchin, Menlo Park, Minecraft, mirror neurons, moral panic, move fast and break things, non-fungible token, PalmPilot, paypal mafia, Peter Thiel, Ponzi scheme, QAnon, race to the bottom, recommendation engine, Rubik’s Cube, Salesforce, Saturday Night Live, self-driving car, Sheryl Sandberg, side hustle, side project, Silicon Valley, slashdot, Snapchat, social distancing, Social Justice Warrior, speech recognition, Stanford marshmallow experiment, Steve Bannon, Steve Jobs, Steven Levy, surveillance capitalism, Susan Wojcicki, systems thinking, tech bro, the long tail, The Wisdom of Crowds, TikTok, Walter Mischel, WikiLeaks, work culture

Type in an impossibly long question (Where did the actress who plays Rachel’s mom on Friends go to college?) and there’s the answer. Translate this question into French et voilà. Neural networks went into Google’s email spam filters and ad-targeting dials and digital photo albums. At YouTube neural networks plugged into its recommendation engine.
Think of YouTube’s recommendation system as a gigantic, multiarmed sorting machine. It has one task: predict what video someone will watch next and deliver it. From YouTube’s outset its computer programs strove to do this. But the Brain neural network could make predictions and sort in ways fallible humans and flimsier code could not.

Certain videos claimed that Hillary Clinton and her top aide assaulted a young girl and drank her blood. This was Frazzledrip, a bizarre cousin of Pizzagate, a theory that had, by then, morphed into QAnon, the cultlike conspiracy theory and movement. “What is your company policy on that?” Raskin asked. At that time YouTube was working on a major overhaul of its recommendation engine to bury conspiracy clips and other footage deemed “harmful” in its penalty box. But this change wasn’t ready for public consumption, so Pichai didn’t mention it. “We are looking to do more,” he replied. “Is your basic position,” the congressman pressed, “that there’s just an avalanche of material and there’s nothing that could be done?”



pages: 519 words: 102,669

Programming Collective Intelligence by Toby Segaran

algorithmic management, always be closing, backpropagation, correlation coefficient, Debian, en.wikipedia.org, Firefox, full text search, functional programming, information retrieval, PageRank, prediction markets, recommendation engine, slashdot, social bookmarking, sparse data, Thomas Bayes, web application

To find a set of links similar to one that you found particularly interesting, you can try:
    >> url=recommendations.getRecommendations(delusers,user)[0][1]
    >> recommendations.topMatches(recommendations.transformPrefs(delusers),url)
    [(0.312, u'http://www.fonttester.com/'),
     (0.312, u'http://www.cssremix.com/'),
     (0.266, u'http://www.logoorange.com/color/color-codes-chart.php'),
     (0.254, u'http://yotophoto.com/'),
     (0.254, u'http://www.wpdfd.com/editorial/basics/index.html')]
That's it! You've successfully added a recommendation engine to del.icio.us. There's a lot more that could be done here. Since del.icio.us supports searching by tags, you can look for tags that are similar to each other. You can even search for people trying to manipulate the "popular" pages by posting the same links with multiple accounts.
Item-Based Filtering
The way the recommendation engine has been implemented so far requires the use of all the rankings from every user in order to create a dataset. This will probably work well for a few thousand people or items, but a very large site like Amazon has millions of customers and products; comparing a user with every other user and then comparing every product each user has rated can be very slow.
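The idea behind the session above, and behind the move to item-based filtering, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's code: `transform_prefs`, `top_matches`, and `similar_items_table` are hypothetical names that loosely mirror the calls in the excerpt, cosine similarity is a swapped-in measure chosen for brevity, and the toy data is invented.

```python
from math import sqrt


def transform_prefs(prefs):
    """Flip {user: {item: rating}} into {item: {user: rating}}."""
    flipped = {}
    for user, ratings in prefs.items():
        for item, rating in ratings.items():
            flipped.setdefault(item, {})[user] = rating
    return flipped


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (assumed measure)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(prefs, key, n=5, similarity=cosine):
    """Rank every other key in prefs by similarity to `key`, best first."""
    scores = [(similarity(prefs[key], prefs[other]), other)
              for other in prefs if other != key]
    scores.sort(reverse=True)
    return scores[:n]


def similar_items_table(prefs, n=5):
    """Precompute the n nearest items for every item, once, offline."""
    by_item = transform_prefs(prefs)
    return {item: top_matches(by_item, item, n) for item in by_item}


# Hypothetical toy data: 1.0 means the user bookmarked the link.
toy_prefs = {
    "ann": {"link_a": 1.0, "link_b": 1.0},
    "bob": {"link_a": 1.0, "link_b": 1.0, "link_c": 1.0},
    "cat": {"link_c": 1.0},
}
table = similar_items_table(toy_prefs)
print(table["link_a"])  # link_b ranks above link_c
```

The design point is that the expensive all-pairs comparison happens once, when the table is built. Serving a recommendation afterwards only needs lookups in that table, which is why the approach is attractive for large catalogues.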

In late 2006 it announced a prize of $1 million to the first person to improve the accuracy of its recommendation system by 10 percent, along with progress prizes of $50,000 to the current leader each year for as long as the contest runs. Thousands of teams from all over the world entered and, as of April 2007, the leading team has managed to score an improvement of 7 percent. By using data about which movies each customer enjoyed, Netflix is able to recommend movies to other customers that they may never have even heard of and keep them coming back for more. Any way to improve its recommendation system is worth a lot of money to Netflix. The search engine Google was started in 1998, at a time when there were already several big search engines, and many assumed that a new player would never be able to take on the giants.

In Chapter 4 you'll learn about search engines and the PageRank algorithm, an important part of Google's ranking system. Other examples include web sites with recommendation systems. Sites like Amazon and Netflix use information about the things people buy or rent to determine which people or items are similar to one another, and then make recommendations based on purchase history. Other sites like Pandora and Last.fm use your ratings of different bands and songs to create custom radio stations with music they think you will enjoy. Chapter 2 covers ways to build recommendation systems. Prediction markets are also a form of collective intelligence. One of the most well known of these is the Hollywood Stock Exchange (http://hsx.com), where people trade stocks on movies and movie stars.


The Internet Trap: How the Digital Economy Builds Monopolies and Undermines Democracy by Matthew Hindman

A Declaration of the Independence of Cyberspace, accounting loophole / creative accounting, activist fund / activist shareholder / activist investor, AltaVista, Amazon Web Services, barriers to entry, Benjamin Mako Hill, bounce rate, business logic, Cambridge Analytica, cloud computing, computer vision, creative destruction, crowdsourcing, David Ricardo: comparative advantage, death of newspapers, deep learning, DeepMind, digital divide, discovery of DNA, disinformation, Donald Trump, fake news, fault tolerance, Filter Bubble, Firefox, future of journalism, Ida Tarbell, incognito mode, informal economy, information retrieval, invention of the telescope, Jeff Bezos, John Perry Barlow, John von Neumann, Joseph Schumpeter, lake wobegon effect, large denomination, longitudinal study, loose coupling, machine translation, Marc Andreessen, Mark Zuckerberg, Metcalfe’s law, natural language processing, Netflix Prize, Network effects, New Economic Geography, New Journalism, pattern recognition, peer-to-peer, Pepsi Challenge, performance metric, power law, price discrimination, recommendation engine, Robert Metcalfe, search costs, selection bias, Silicon Valley, Skype, sparse data, speech recognition, Stewart Brand, surveillance capitalism, technoutopianism, Ted Nelson, The Chicago School, the long tail, The Soul of a New Machine, Thomas Malthus, web application, Whole Earth Catalog, Yochai Benkler

Compared to straight collaborative filtering, the hybrid model produced 31 percent more clicks on news stories, though this was largely the result of shifting traffic from interior sections of the site to recommended stories on the front page. Even more importantly, over the course of the study users who saw the hybrid model had 14 percent more daily visits to the Google News site. This is a clear demonstration of just how much improved recommendation systems can boost daily traffic. Other computer science researchers have produced traffic bonuses with news recommendation engines. Hewlett-Packard researchers Evan Kirshenbaum, George Forman, and Michael Dugan conducted an experiment comparing different methods of content recommendation on Forbes.com. Here too, as at Google, the researchers found that a mixture of content-based and collaborative-filtering methods gave a significant improvement. Yahoo!
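One common way to mix content-based and collaborative-filtering signals is a weighted blend of their normalized scores. The excerpt does not say how the Google News or Forbes.com systems combined the two, so the sketch below is an assumption for illustration only; `hybrid_rank`, `min_max_normalize`, the weight, and the story scores are all hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {story: score} dict to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # No spread to rescale; treat every story as neutral.
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(content_scores, collab_scores, content_weight=0.5):
    """Blend two score sets and return story ids, best first."""
    content = min_max_normalize(content_scores)
    collab = min_max_normalize(collab_scores)
    blended = {
        story: content_weight * content.get(story, 0.0)
        + (1.0 - content_weight) * collab.get(story, 0.0)
        for story in set(content) | set(collab)
    }
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(blended, key=lambda s: (-blended[s], s))


# Hypothetical inputs: topical match to the reader's profile (content-based)
# and click counts from similar readers (collaborative).
content_scores = {"budget": 0.9, "election": 0.4, "sports": 0.1}
collab_scores = {"election": 30, "sports": 50, "weather": 10}

print(hybrid_rank(content_scores, collab_scores))
# ['budget', 'sports', 'election', 'weather']
```

Normalizing first matters because the two signals live on different scales; without it, raw click counts would swamp the content score regardless of the weight.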

Scholarship to date suggests six broad, interrelated lessons about which types of organizations are likely to win—and lose—in a world with ubiquitous algorithmic filtering. First, and most important, recommender systems can dramatically increase digital audience. Web traffic is properly thought of as a dynamic, even evolutionary process. Recommender systems make sites stickier, and users respond by clicking more and visiting more often. Over time sites and apps with recommender systems have grown in market share, while those without have shrunk. Second, recommender systems favor digital firms with lots of goods and content. There is only value in matching if the underlying catalogue of choices is large.

As this book goes to press, new journalism and communication scholarship has finally started to address this longstanding gap. Still, much work remains to be done. This chapter has two main aims. First, it offers a more detailed examination of the principles behind these recommendation systems than previous media scholarship. Recommender systems research has changed dramatically over the past decade, but little of this new knowledge has filtered into research on web traffic, online news, or the future of journalism. Much of the writing on recommender systems in these fields has been an unhelpful montage of hypotheticals and what-ifs. Elaborate deductive conclusions have been built from false foundational assumptions.


pages: 1,085 words: 219,144

Solr in Action by Trey Grainger, Timothy Potter

business intelligence, cloud computing, commoditize, conceptual framework, crowdsourcing, data acquisition, data science, en.wikipedia.org, failed state, fault tolerance, finite state, full text search, functional programming, glass ceiling, information retrieval, machine readable, natural language processing, openstreetmap, performance metric, premature optimization, recommendation engine, web application

Instead of thinking of Solr as a text search engine, it can be mentally freeing to think of Solr as a “matching engine that happens to be able to match on parsed text.” Whether the search is manual or automated is of no consequence to Solr. In fact, several organizations have successfully built recommender systems directly on top of Solr using this thinking. The following sections will cover how to build your own Solr-powered recommendation engine and ultimately how to merge the concepts of a user-driven search experience and an automated recommendation system to provide a powerful, personalized search experience. In particular, we will discuss several content-based recommendation approaches including attribute-based matching, hierarchical-classification-based matching, matching based upon extracted interesting terms (More Like This), concept-based matching, and geographical matching.

This shifts the paradigm completely, because it requires software systems to be intelligent enough to recommend information to users as opposed to having them explicitly search for it. Although organizations such as Netflix and Amazon are well known for their recommender systems and have spent millions of dollars developing them, it’s both possible and easy to develop such systems yourself (particularly on top of Solr) to drastically improve the relevancy of your application.
16.5.1. Search vs. recommendations
When one thinks of a search engine, the vision of a keyword box (and sometimes a separate location box) typically comes to mind. Likewise, when one thinks of a recommendation engine, the vision of a magical algorithm which automatically suggests information based upon past behavior and preferences likely comes to mind.

The beauty of collaborative filtering, regardless of the implementation, is that it’s able to work without any knowledge about the content of your documents. Therefore, you could build a recommendation engine based upon Solr with documents containing nothing more than document IDs and users, and you should still see quality recommendations as long as you have enough users linking your documents together. If you don’t put any text content, attributes, or classifications into Solr, then it means you will not be able to make use of those additional techniques at all. The next section will discuss why you may want to consider combining multiple techniques to achieve optimal relevancy in your recommendation system.
16.5.8. Hybrid approaches
Throughout this chapter, you have seen multiple different recommendation approaches, each with its own strengths and weaknesses.
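The claim that collaborative filtering needs only document IDs and users can be illustrated without any search engine at all. The sketch below is plain Python, not Solr code, and makes no claim about how the book implements it; the `docs` mapping, the user ids, and `recommend_for` are hypothetical.

```python
from collections import Counter

# Hypothetical index: each "document" holds only an ID and the users linked to it.
docs = {
    "doc1": {"u1", "u2", "u3"},
    "doc2": {"u1", "u2"},
    "doc3": {"u3", "u4"},
    "doc4": {"u4"},
}


def recommend_for(user, docs, limit=3):
    """Suggest unseen documents favoured by users who overlap with `user`."""
    seen = {doc_id for doc_id, users in docs.items() if user in users}

    # Count how many documents each other user shares with `user`.
    peers = Counter()
    for doc_id in seen:
        for other in docs[doc_id]:
            if other != user:
                peers[other] += 1

    # Score each unseen document by the overlap strength of its users.
    scores = Counter()
    for doc_id, users in docs.items():
        if doc_id in seen:
            continue
        for other in users:
            if other in peers:
                scores[doc_id] += peers[other]

    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:limit]]


print(recommend_for("u1", docs))  # ['doc3']
```

Nothing in the scoring looks at text, attributes, or classifications, which is the point of the passage. It also shows the limitation: `doc4` is never recommended to `u1` because no overlapping user links the two.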


Designing Search: UX Strategies for Ecommerce Success by Greg Nudelman, Pabini Gabriel-Petit

access to a mobile phone, Albert Einstein, AltaVista, augmented reality, barriers to entry, Benchmark Capital, business intelligence, call centre, cognitive load, crowdsourcing, folksonomy, information retrieval, Internet of things, Neal Stephenson, Palm Treo, performance metric, QR code, recommendation engine, RFID, search costs, search engine result page, semantic web, Silicon Valley, social graph, social web, speech recognition, text mining, the long tail, the map is not the territory, The Wisdom of Crowds, web application, zero-sum game, Zipcar

If fancy group formatting or Ajax carousels make customers disregard the more important More Like This buttons, such a page fails to meet its primary objective. Note—If you are still thinking about using a carousel for your More Like This groups, consider that Netflix has one of the best recommendation engines in the world and can usually select very relevant items to include among its 8 to 10 options. Amazon.com, which also has an exceptional recommendation engine, tried incorporating carousels for all its groups in the past, but has since dropped the feature. Amazon.com now uses the carousel feature sparingly, if at all, presumably, because the results underperformed the Spartan group design, which is optimized for quick scanning.

- Brynn Evans
References
Enterprise Social Search slides: www.slideshare.net/bmevans/designing-for-sociality-in-enterprise-search
Wired article (Wired, November 2010): www.wired.com/magazine/2010/11/st_flowchart_social/
“Do your friends make you smarter” paper: http://brynnevans.com/papers/Do-your-friends-make-you-smarter.pdf
Personalized Search and Recommender Systems
Machine learning lets search engines draw reliable inferences and deliver improved search results by leveraging customers’ data. In the ecommerce realm, personalized search lets an online vendor use a customer’s past purchasing history (and possibly other data like product ratings, search history, the customer’s user profile, and even social networking activity) to interpret search strings, predict what products might be of interest to that customer, and deliver more relevant search results. On ecommerce sites, recommender systems (which are sometimes called implicit collaborative filtering systems, a bit of a misnomer) often use the past purchasing history of other customers who are similar in some way to a particular customer to predict what products might be of interest to that customer.

. … Given a similar-items table, the algorithm finds items similar to each of the user’s purchases and ratings, aggregates those items, and then recommends the most popular or correlated items.” [1] Amazon employs its recommender system to great effect—delivering product recommendations that encourage customers to browse additional products and, thus, helping users to find similar products of interest. Recommendations are particularly effective on product pages, where Amazon uses them in cross-selling additional products to customers. Amazon also personalizes the content on its home page extensively by providing many different types of recommendations. The recommender system Amazon has innovated helps customers find what they need and, because its recommendations actually provide a valuable service to customers, increases customer loyalty—and ultimately enhances Amazon’s bottom line.
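The quoted steps (look up items similar to each purchase, aggregate them, recommend the strongest) can be sketched as follows. This is an illustration of the description only, not Amazon's implementation; the `similar_items` table, its similarity values, and the `recommend` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical precomputed table: item -> [(similar_item, similarity), ...]
similar_items = {
    "tent": [("sleeping bag", 0.9), ("camp stove", 0.6)],
    "headlamp": [("camp stove", 0.5), ("batteries", 0.8)],
}


def recommend(purchases, similar_items, limit=3):
    """Aggregate neighbours of every purchased item and rank them."""
    totals = defaultdict(float)
    for item in purchases:
        for candidate, similarity in similar_items.get(item, []):
            if candidate not in purchases:  # never recommend what is owned
                totals[candidate] += similarity
    # Highest summed similarity first; name breaks ties deterministically.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:limit]]


print(recommend({"tent", "headlamp"}, similar_items))
# ['camp stove', 'sleeping bag', 'batteries']
```

Here "camp stove" wins because two separate purchases point to it, which is one plausible reading of "most popular or correlated"; summing similarities is an assumed aggregation rule, and other rules (maximum, count) would fit the description equally well.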


pages: 523 words: 61,179

Human + Machine: Reimagining Work in the Age of AI by Paul R. Daugherty, H. James Wilson

3D printing, AI winter, algorithmic management, algorithmic trading, AlphaGo, Amazon Mechanical Turk, Amazon Robotics, augmented reality, autonomous vehicles, blockchain, business process, call centre, carbon footprint, circular economy, cloud computing, computer vision, correlation does not imply causation, crowdsourcing, data science, deep learning, DeepMind, digital twin, disintermediation, Douglas Hofstadter, driverless car, en.wikipedia.org, Erik Brynjolfsson, fail fast, friendly AI, fulfillment center, future of work, Geoffrey Hinton, Hans Moravec, industrial robot, Internet of things, inventory management, iterative process, Jeff Bezos, job automation, job satisfaction, knowledge worker, Lyft, machine translation, Marc Benioff, natural language processing, Neal Stephenson, personalized medicine, precision agriculture, Ray Kurzweil, recommendation engine, RFID, ride hailing / ride sharing, risk tolerance, robotic process automation, Rodney Brooks, Salesforce, Second Machine Age, self-driving car, sensor fusion, sentiment analysis, Shoshana Zuboff, Silicon Valley, Snow Crash, software as a service, speech recognition, tacit knowledge, telepresence, telepresence robot, text mining, the scientific method, uber lyft, warehouse automation, warehouse robotics

Whereas, in the past, a salesperson might glean a sales opportunity based on physical or social cues over the phone or in person, 6sense is returning to salespeople some of the skills that more socially opaque online interactions, like the extensive use of email, had blunted.
Your Buddy, the Brand
Some of the biggest changes to the front office are happening through online tools and AI-enabled interfaces. Think how easily Amazon customers can purchase a vast array of consumer items, thanks to AI-enhanced product-recommendation engines and “Alexa” (the personal assistant bot), which is used via “Echo” (the smart, voice-enabled wireless speaker). AI systems similar to those designed for jobs like customer service are now beginning to play a much larger role in generating revenue, traditionally a front-office objective, and the ease of the purchasing experience has become a major factor for customers.


Intelligent automation. Transfers some tasks from man to machine to fundamentally change the traditional ways of operating. Through machine-specific strengths and capabilities (speed, scale, and the ability to cut through complexity), these tools complement human work to expand what is possible.
Recommendation systems. Make suggestions based on subtle patterns detected by AI algorithms over time. These can be targeted toward consumers to suggest new products or used internally to make strategic suggestions.
Intelligent products. Have intelligence baked into their design so that they can evolve to continuously meet and anticipate customers’ needs and preferences.


pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future by Kevin Kelly

A Declaration of the Independence of Cyberspace, Aaron Swartz, AI winter, Airbnb, Albert Einstein, Alvin Toffler, Amazon Web Services, augmented reality, bank run, barriers to entry, Baxter: Rethink Robotics, bitcoin, blockchain, book scanning, Brewster Kahle, Burning Man, cloud computing, commoditize, computer age, Computer Lib, connected car, crowdsourcing, dark matter, data science, deep learning, DeepMind, dematerialisation, Downton Abbey, driverless car, Edward Snowden, Elon Musk, Filter Bubble, Freestyle chess, Gabriella Coleman, game design, Geoffrey Hinton, Google Glasses, hive mind, Howard Rheingold, index card, indoor plumbing, industrial robot, Internet Archive, Internet of things, invention of movable type, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Perry Barlow, Kevin Kelly, Kickstarter, lifelogging, linked data, Lyft, M-Pesa, machine readable, machine translation, Marc Andreessen, Marshall McLuhan, Mary Meeker, means of production, megacity, Minecraft, Mitch Kapor, multi-sided market, natural language processing, Netflix Prize, Network effects, new economy, Nicholas Carr, off-the-grid, old-boy network, peer-to-peer, peer-to-peer lending, personalized medicine, placebo effect, planetary scale, postindustrial economy, Project Xanadu, recommendation engine, RFID, ride hailing / ride sharing, robo advisor, Rodney Brooks, self-driving car, sharing economy, Silicon Valley, slashdot, Snapchat, social graph, social web, software is eating the world, speech recognition, Stephen Hawking, Steven Levy, Ted Nelson, TED Talk, The future is already here, the long tail, the scientific method, transport as a service, two-sided market, Uber for X, uber lyft, value engineering, Watson beat the top human players on Jeopardy!, WeWork, Whole Earth Review, Yochai Benkler, yottabyte, zero-sum game

First I’d like to be delivered more of what I know I like. This personal filter already exists. It’s called a recommendation engine. It is in wide use at Amazon, Netflix, Twitter, LinkedIn, Spotify, Beats, and Pandora, among other aggregators. Twitter uses a recommendation system to suggest who I should follow based on whom I already follow. Pandora uses a similar system to recommend what new music I’ll like based on what I already like. Over half of the connections made on LinkedIn arise from their follower recommender. Amazon’s recommendation engine is responsible for the well-known banner that “others who like this item also liked this next item.”

Amazon’s greatest asset is not its Prime delivery service but the millions of reader reviews it has accumulated over decades. Readers will pay for Amazon’s all-you-can-read ebook service, Kindle Unlimited, even though they will be able to find ebooks for free elsewhere, because Amazon’s reviews will guide them to books they want to read. Ditto for Netflix. Movie fans will pay Netflix because their recommendation engine finds gems they would not otherwise discover. They may be free somewhere else, but they are essentially lost and buried. In these examples, you are not paying for the copies, you are paying for the findability. • • • These eight qualities require a new skill set for creators. Success no longer derives from mastering distribution.



pages: 306 words: 82,909

A Hacker's Mind: How the Powerful Bend Society's Rules, and How to Bend Them Back by Bruce Schneier

4chan, Airbnb, airport security, algorithmic trading, Alignment Problem, AlphaGo, Automated Insights, banking crisis, Big Tech, bitcoin, blockchain, Boeing 737 MAX, Brian Krebs, Capital in the Twenty-First Century by Thomas Piketty, cloud computing, computerized trading, coronavirus, corporate personhood, COVID-19, cryptocurrency, dark pattern, deepfake, defense in depth, disinformation, Donald Trump, Double Irish / Dutch Sandwich, driverless car, Edward Thorp, Elon Musk, fake news, financial innovation, Financial Instability Hypothesis, first-past-the-post, Flash crash, full employment, gig economy, global pandemic, Goodhart's law, GPT-3, Greensill Capital, high net worth, Hyman Minsky, income inequality, independent contractor, index fund, information security, intangible asset, Internet of things, Isaac Newton, Jeff Bezos, job automation, late capitalism, lockdown, Lyft, Mark Zuckerberg, money market fund, moral hazard, move fast and break things, Nate Silver, offshore financial centre, OpenAI, payday loans, Peter Thiel, precautionary principle, Ralph Nader, recommendation engine, ride hailing / ride sharing, self-driving car, sentiment analysis, Skype, smart cities, SoftBank, supply chain finance, supply-chain attack, surveillance capitalism, systems thinking, TaskRabbit, technological determinism, TED Talk, The Wealth of Nations by Adam Smith, theory of mind, TikTok, too big to fail, Turing test, Uber and Lyft, uber lyft, ubercab, UNCLOS, union organizing, web application, WeWork, When a measure becomes a target, WikiLeaks, zero day

DEFENDING AGAINST AI HACKERS
236 recommendation engines: Zeynep Tufekci (10 Mar 2018), “YouTube, the great equalizer,” New York Times, https://www.nytimes.com/2018/03/10/opinion/sunday/youtube-politics-radical.html. Renee DiResta (11 Apr 2018), “Up next: A better recommendation system,” Wired, https://www.wired.com/story/creating-ethical-recommendation-engines.
237 can also benefit the defense: One example: Gregory Falco et al. (28 Aug 2018), “A master attack methodology for an AI-based automated attack planner for smart cities,” IEEE Access 6, https://ieeexplore.ieee.org/document/8449268.
59. A FUTURE OF AI HACKERS
242 novel and completely unexpected hacks: Hedge funds and investment firms are already using AI to inform investment decisions.

And tribalism is so powerful and divisive that hacking it—especially with digital speed and precision—can have disastrous social effects, whether that’s the goal of a computer-assisted social hacker (like the Russians) or a side effect of an AI that neither knows nor cares about the costs of its actions (like social media recommendation engines). 48 Defending against Cognitive Hacks The “pick-up artist” community is a movement of men who develop and share manipulative techniques to seduce women. It predates the popular Internet but thrives there today. A lot of their techniques resemble cognitive hacks. “Negging” is one of their techniques.

If your driverless car navigation system satisfies the goal of maintaining a high speed by spinning in circles, programmers will notice this behavior and modify the AI’s goal accordingly. We’ll never see this behavior on the road. The greatest concern lies in the less obvious hacks that we won’t even notice because their effects are subtle. Much has been written about recommendation engines—the first generation of subtle AI hacks—and how they push people towards extreme content. They weren’t programmed to do this; it’s a property that naturally emerged as the systems continually tried things, saw the results, then modified themselves to do more of what increased user engagement and less of what didn’t.


pages: 302 words: 73,581

Platform Scale: How an Emerging Business Model Helps Startups Build Large Empires With Minimum Investment by Sangeet Paul Choudary

3D printing, Airbnb, Amazon Web Services, barriers to entry, bitcoin, blockchain, business logic, business process, Chuck Templeton: OpenTable:, Clayton Christensen, collaborative economy, commoditize, crowdsourcing, cryptocurrency, data acquisition, data science, fake it until you make it, frictionless, game design, gamification, growth hacking, Hacker News, hive mind, hockey-stick growth, Internet of things, invisible hand, Kickstarter, Lean Startup, Lyft, M-Pesa, Marc Andreessen, Mark Zuckerberg, means of production, multi-sided market, Network effects, new economy, Paul Graham, recommendation engine, ride hailing / ride sharing, Salesforce, search costs, shareholder value, sharing economy, Silicon Valley, Skype, Snapchat, social bookmarking, social graph, social software, software as a service, software is eating the world, Spread Networks laid a new fibre optics cable between New York and Chicago, TaskRabbit, the long tail, the payments system, too big to fail, transport as a service, two-sided market, Uber and Lyft, Uber for X, uber lyft, vertical integration, Wave and Pay

Many Web 1.0 era filters were created based on long sign-up forms that the user filled out. Today, filters are created based on data captured on an ongoing basis through a user’s actions. Filters may be standalone or collaborative. Amazon’s “People who purchased this product also purchased this product” feature is based on a collaborative filter. Many recommendation platforms allow users to filter results based on a “people like you” parameter. This, again, is a collaborative filter. The most important innovation in recent times that has led to the spread of collaborative filters is the implementation of Facebook’s social graph. Through the social graph, third-party platforms like TripAdvisor serve reviews based on a collaborative filter of people who are close to you on the graph.


pages: 267 words: 72,552

Reinventing Capitalism in the Age of Big Data by Viktor Mayer-Schönberger, Thomas Ramge

accounting loophole / creative accounting, Air France Flight 447, Airbnb, Alvin Roth, Apollo 11, Atul Gawande, augmented reality, banking crisis, basic income, Bayesian statistics, Bear Stearns, behavioural economics, bitcoin, blockchain, book value, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, Cass Sunstein, centralized clearinghouse, Checklist Manifesto, cloud computing, cognitive bias, cognitive load, conceptual framework, creative destruction, Daniel Kahneman / Amos Tversky, data science, Didi Chuxing, disruptive innovation, Donald Trump, double entry bookkeeping, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, flying shuttle, Ford Model T, Ford paid five dollars a day, Frederick Winslow Taylor, fundamental attribution error, George Akerlof, gig economy, Google Glasses, Higgs boson, information asymmetry, interchangeable parts, invention of the telegraph, inventory management, invisible hand, James Watt: steam engine, Jeff Bezos, job automation, job satisfaction, joint-stock company, Joseph Schumpeter, Kickstarter, knowledge worker, labor-force participation, land reform, Large Hadron Collider, lone genius, low cost airline, low interest rates, Marc Andreessen, market bubble, market design, market fundamentalism, means of production, meta-analysis, Moneyball by Michael Lewis explains big data, multi-sided market, natural language processing, Neil Armstrong, Network effects, Nick Bostrom, Norbert Wiener, offshore financial centre, Parag Khanna, payday loans, peer-to-peer lending, Peter Thiel, Ponzi scheme, prediction markets, price anchoring, price mechanism, purchasing power parity, radical decentralization, random walk, recommendation engine, Richard Thaler, ride hailing / ride sharing, Robinhood: mobile stock trading app, Sam Altman, scientific management, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, smart grid, smart meter, Snapchat, statistical model, Steve Jobs, 
subprime mortgage crisis, Suez canal 1869, tacit knowledge, technoutopianism, The Future of Employment, The Market for Lemons, The Nature of the Firm, transaction costs, universal basic income, vertical integration, William Langewiesche, Y Combinator

With data-richness, market participants may learn the preferences of others and pair them using matching algorithms, but how do market participants express their preferences and their relative weight and communicate them to each other? It’s a difficult challenge, and solving it is crucial. Nobody wants to transact on markets that require hours of time spent answering questionnaires. Fortunately, here, too, recent technical advances have gotten us much closer to viable solutions. Consider again Amazon’s product-recommendation engine: at first glance, it’s a matching system. It quite successfully matches our preferences with available products and makes recommendations about what we should order. But that is only half of the story. Amazon captures our preferences not from us directly but from the comprehensive data stream it gathers about our every interaction with its website—what products we look at, when and for how long we look at them, which reviews we read.

To put an end to such inefficiencies, firms such as American Express, AT&T, and IBM have phased in software platforms that go far beyond classified-ad-type announcements of open positions on the company’s intranet. They match detailed (albeit standardized) job descriptions with detailed (albeit standardized) talent profiles. Filters make individuals and position pools easy to search, both for employees seeking a new challenge and for managers looking for new talent. And recommendation engines facilitate matchmaking across multiple dimensions. These internal talent marketplaces offer a number of advantages. First, they decentralize matching, reducing information overload within HR departments. Searching and matching is done outside HR, by managers with positions to fill and employees interested in making a move.

Consider Amazon: because of its sheer scale, it can fulfill customer orders at low cost. Network effects make Amazon a thick market, with lots of buyers and sellers, and many customers who leave valuable product reviews for others. Each additional customer adds value to the community. Finally, Amazon uses adaptive systems and feedback data to hone its recommendation engine, as well as its intelligent personal assistant, Alexa. Apple’s iPhone is another case in point. Because it can mass produce the phone, Apple can keep profit margins high while still holding to a price point that’s acceptable to consumers. A growing number of iPhone users have led to a vibrant app market.


pages: 296 words: 78,631

Hello World: Being Human in the Age of Algorithms by Hannah Fry

23andMe, 3D printing, Air France Flight 447, Airbnb, airport security, algorithmic bias, algorithmic management, augmented reality, autonomous vehicles, backpropagation, Brixton riot, Cambridge Analytica, chief data officer, computer vision, crowdsourcing, DARPA: Urban Challenge, data science, deep learning, DeepMind, Douglas Hofstadter, driverless car, Elon Musk, fake news, Firefox, Geoffrey Hinton, Google Chrome, Gödel, Escher, Bach, Ignaz Semmelweis: hand washing, John Markoff, Mark Zuckerberg, meta-analysis, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, pattern recognition, Peter Thiel, RAND corporation, ransomware, recommendation engine, ride hailing / ride sharing, selection bias, self-driving car, Shai Danziger, Silicon Valley, Silicon Valley startup, Snapchat, sparse data, speech recognition, Stanislav Petrov, statistical model, Stephen Hawking, Steven Levy, systematic bias, TED Talk, Tesla Model S, The Wisdom of Crowds, Thomas Bayes, trolley problem, Watson beat the top human players on Jeopardy!, web of trust, William Langewiesche, you are the product

There are algorithms that can automatically classify and remove inappropriate content on YouTube, algorithms that will label your holiday photos for you, and algorithms that can scan your handwriting and classify each mark on the page as a letter of the alphabet. Association: finding links Association is all about finding and marking relationships between things. Dating algorithms such as OKCupid have association at their core, looking for connections between members and suggesting matches based on the findings. Amazon’s recommendation engine uses a similar idea, connecting your interests to those of past customers. It’s what led to the intriguing shopping suggestion that confronted Reddit user Kerbobotat after buying a baseball bat on Amazon: ‘Perhaps you’ll be interested in this balaclava?’11 Filtering: isolating what’s important Algorithms often need to remove some information to focus on what’s important, to separate the signal from the noise.

But an algorithm needs something to go on. So, once you take away popularity and inherent quality, you’re left with the only thing that can be quantified: a metric for similarity to whatever has gone before. There’s still a great deal that can be done using measures of similarity. When it comes to building a recommendation engine, like the ones found in Netflix and Spotify, similarity is arguably the ideal measure. Both companies have a way to help users discover new films and songs, and, as subscription services, both have an incentive to accurately predict what users will enjoy. They can’t base their algorithms on what’s popular, or users would just get bombarded with suggestions for Justin Bieber and Peppa Pig The Movie.

Every now and then they will come up with something that you absolutely love, but it’s a bit like cold reading in that sense. You only need a strike every now and then to feel the serendipity of discovering new music. The engines don’t need to be right all the time. Similarity works perfectly well for recommendation engines. But when you ask algorithms to create art without a pure measure for quality, that’s where things start to get interesting. Can an algorithm be creative if its only sense of art is what happened in the past? Good artists borrow; great artists steal – Pablo Picasso In October 1997, an audience arrived at the University of Oregon to be treated to a rather unusual concert.


pages: 202 words: 62,901

The People's Republic of Walmart: How the World's Biggest Corporations Are Laying the Foundation for Socialism by Leigh Phillips, Michal Rozworski

Alan Greenspan, Anthropocene, Berlin Wall, Bernie Sanders, biodiversity loss, call centre, capitalist realism, carbon footprint, carbon tax, central bank independence, Colonization of Mars, combinatorial explosion, company town, complexity theory, computer age, corporate raider, crewed spaceflight, data science, decarbonisation, digital rights, discovery of penicillin, Elon Musk, financial engineering, fulfillment center, G4S, Garrett Hardin, Georg Cantor, germ theory of disease, Gordon Gekko, Great Leap Forward, greed is good, hiring and firing, independent contractor, index fund, Intergovernmental Panel on Climate Change (IPCC), Internet of things, inventory management, invisible hand, Jeff Bezos, Jeremy Corbyn, Joseph Schumpeter, Kanban, Kiva Systems, linear programming, liquidity trap, mass immigration, Mont Pelerin Society, Neal Stephenson, new economy, Norbert Wiener, oil shock, passive investing, Paul Samuelson, post scarcity, profit maximization, profit motive, purchasing power parity, recommendation engine, Ronald Coase, Ronald Reagan, sharing economy, Silicon Valley, Skype, sovereign wealth fund, strikebreaker, supply-chain management, surveillance capitalism, technoutopianism, TED Talk, The Nature of the Firm, The Wealth of Nations by Adam Smith, theory of mind, Tragedy of the Commons, transaction costs, Turing machine, union organizing, warehouse automation, warehouse robotics, We are all Keynesians now

Two of the best examples of this are the “chaotic storage” system Amazon uses in its warehouses and the recommendations system buzzing in the background of its website, telling you which books or garden implements you might be interested in. Amazon’s recommendations system is the backbone of the company’s rapid success. This system drives those usually helpful (although sometimes comical—“Frequently bought together: baseball bat + black balaclava”) items that pop up in the “Customers who bought this also bought …” section of the website. Recommendations systems solve some of the information problems that have historically been associated with planning.

A universe of the most disparate ratings and reviews—always partial and often contradictory—can, if parsed right, provide very useful and lucrative information. Amazon also uses a system it calls “item-to-item collaborative filtering.” The company made a breakthrough when it devised its recommendations algorithm by managing to avoid common pitfalls plaguing other early recommendation engines. Amazon’s system doesn’t look for similarities between people; not only do such systems slow down significantly once millions are profiled, but they report significant overlaps among people whose tastes are actually very different (e.g., hipsters and boomers who buy the same bestsellers).

The two things may not be very obviously related, but it is enough that some people buy or browse them together. Combining millions of such interactions between people and things, Amazon’s algorithm creates a virtual map of its catalog that adapts very well to new information, even saving precious computing power when compared to the alternatives—clunkier recommendations systems that try to match similar users or find abstract similarities. Here is how the researchers at IBM’s labs describe Amazon’s recommendations: “When it takes other users’ behavior into account, collaborative filtering uses group knowledge to form a recommendation based on like users.” Filtering is an example of an IT-based rejoinder to one of the criticisms Hayek leveled against his socialist adversaries in the 1930s calculation debate: that only markets can aggregate and put to use the information dispersed throughout society.
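The idea in the excerpt above, relating items to each other because people buy or browse them together, can be sketched in a few lines. This is only an illustrative toy, not Amazon's actual algorithm: the function names `build_item_similarity` and `recommend`, the sample baskets, and the cosine-style normalisation are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_item_similarity(baskets):
    """Map each item to {other_item: score} based on how often they co-occur.

    `baskets` is a list of sets of item names (hypothetical purchase data).
    """
    item_counts = defaultdict(int)   # baskets containing each item
    pair_counts = defaultdict(int)   # baskets containing both items of a pair
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_counts[item] += 1
        for a, b in combinations(items, 2):
            pair_counts[(a, b)] += 1

    similarity = defaultdict(dict)
    for (a, b), together in pair_counts.items():
        # Normalise by popularity so that very common items do not dominate.
        score = together / sqrt(item_counts[a] * item_counts[b])
        similarity[a][b] = score
        similarity[b][a] = score
    return similarity


def recommend(similarity, item, top_n=3):
    """Return the items most often seen together with `item`, best first."""
    neighbours = similarity.get(item, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Made-up baskets, purely for illustration.
baskets = [
    {"baseball bat", "balaclava"},
    {"baseball bat", "balaclava", "baseball"},
    {"baseball bat", "baseball"},
    {"garden rake", "gloves"},
    {"garden rake", "gloves", "balaclava"},
]
similarity = build_item_similarity(baskets)
print(recommend(similarity, "baseball bat"))  # ['baseball', 'balaclava']
```

Note that no user is ever compared with another user here. The table is built per item, which is one plausible reading of why the excerpt says this approach copes better with millions of customers than matching similar people.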


pages: 368 words: 96,825

Bold: How to Go Big, Create Wealth and Impact the World by Peter H. Diamandis, Steven Kotler

3D printing, additive manufacturing, adjacent possible, Airbnb, Amazon Mechanical Turk, Amazon Web Services, Apollo 11, augmented reality, autonomous vehicles, Boston Dynamics, Charles Lindbergh, cloud computing, company town, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, data science, deal flow, deep learning, dematerialisation, deskilling, disruptive innovation, driverless car, Elon Musk, en.wikipedia.org, Exxon Valdez, fail fast, Fairchild Semiconductor, fear of failure, Firefox, Galaxy Zoo, Geoffrey Hinton, Google Glasses, Google Hangouts, gravity well, hype cycle, ImageNet competition, industrial robot, information security, Internet of things, Jeff Bezos, John Harrison: Longitude, John Markoff, Jono Bacon, Just-in-time delivery, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, loss aversion, Louis Pasteur, low earth orbit, Mahatma Gandhi, Marc Andreessen, Mark Zuckerberg, Mars Rover, meta-analysis, microbiome, minimum viable product, move fast and break things, Narrative Science, Netflix Prize, Network effects, Oculus Rift, OpenAI, optical character recognition, packet switching, PageRank, pattern recognition, performance metric, Peter H. Diamandis: Planetary Resources, Peter Thiel, pre–internet, Ray Kurzweil, recommendation engine, Richard Feynman, ride hailing / ride sharing, risk tolerance, rolodex, Scaled Composites, self-driving car, sentiment analysis, shareholder value, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart grid, SpaceShipOne, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, Stewart Brand, Stuart Kauffman, superconnector, Susan Wojcicki, synthetic biology, technoutopianism, TED Talk, telepresence, telepresence robot, Turing test, urban renewal, Virgin Galactic, Wayback Machine, web application, X Prize, Y Combinator, zero-sum game

Well, in the case of Netflix, a better movie recommendation engine. A movie recommendation engine is a bit of software that tells you what movie you might want to watch next based on movies you’ve already watched and rated (on a scale of one to five stars). Netflix’s original recommendation engine, Cinematch, was created back in 2000 and quickly proved to be a wild success. Within a few years, nearly two-thirds of their rental business was being driven by their recommendation engine. Thus the obvious corollary: the better their recommendation engine, the better their business. And that was the problem.

In December 2006, a competitor called ‘simonfunk’ posted a complete description of his algorithm—which at the time was tied for third place—giving everyone else the opportunity to piggyback on his progress. ‘We had no idea the extent to which people would collaborate with each other,’ says Jim Bennett, vice president for recommendation systems at Netflix.”16 And this isn’t an aberration. Over the course of the eight XPRIZEs launched to date, there has been an extraordinary amount of cooperation. We’ve seen teams providing unsolicited advice, teams merging, teams acquiring and sharing technology and experts. When the prize is driven by an MTP, while a team’s primary purpose is to win, a close second is their desire to see the primary objective achieved; thus teams exhibit a much higher willingness to share.


pages: 353 words: 104,146

European Founders at Work by Pedro Gairifo Santos

business intelligence, clean tech, cloud computing, crowdsourcing, deal flow, do what you love, fail fast, fear of failure, full text search, Hacker News, hockey-stick growth, information retrieval, inventory management, iterative process, Jeff Bezos, Joi Ito, Lean Startup, Mark Zuckerberg, Multics, natural language processing, pattern recognition, pre–internet, recommendation engine, Richard Stallman, Salesforce, Silicon Valley, Skype, slashdot, SoftBank, Steve Jobs, Steve Wozniak, subscription business, technology bubble, TED Talk, web application, Y Combinator

Jones: Nothing substantial, really. I think sometimes rights holders, especially in the music industry, will use court action or the threat of court action as a sort of negotiating position. But, no. I think we managed to avoid anything serious in that regard. Santos: From the technical point of view, the actual recommendation engine and statistics, how does that actually work? How hard was it to develop it and tweak it? Did you change the approach many times? Did you have a clear idea on how to do it from the start? Jones: So initially when I was building it, we tried all sorts of stuff. I think what I was using for a long time in the beginning was just to use Lucene, a document indexing system.

At one point we published a data dump of all of these scrobbling histories and some of our users at the time contributed various recommender strategies and said, “Hey, try this. I had quite good results with it.” So for a while, we were piecing together ideas from the community. All this time, we were mainly concerned with keeping the site afloat, keeping it fast, scaling up properly, and this sort of scrobbling data and radio. The recommendation engine wasn't brilliant to begin with. And then, we finally decided we needed to hire somebody who knows what they're doing, who's going to work on this full-time. We e-mailed some mailing lists. We e-mailed the ISMIR2 mailing list. They're a group who meet every year about music recommendations and information retrieval in music.

I told some of my friends so I could get some data and they told their friends and they told their friends and it spread. It turns out people quite liked just having those stats on what they listened to. They weren't even interested in recommendations at that point. I didn't really have a good recommender system for a long time. From your listening stats, you could click on an artist, and see who else had been listening to them. You could then see the listening stats of the other fans of artists you like. Just that system of connecting all the listening tastes proved to be really quite addictive.


pages: 404 words: 95,163

Amazon: How the World’s Most Relentless Retailer Will Continue to Revolutionize Commerce by Natalie Berg, Miya Knights

3D printing, Adam Neumann (WeWork), Airbnb, Amazon Robotics, Amazon Web Services, asset light, augmented reality, Bernie Sanders, big-box store, business intelligence, cloud computing, Colonization of Mars, commoditize, computer vision, connected car, deep learning, DeepMind, digital divide, Donald Trump, Doomsday Clock, driverless car, electronic shelf labels (ESLs), Elon Musk, fulfillment center, gig economy, independent contractor, Internet of things, inventory management, invisible hand, Jeff Bezos, Kiva Systems, market fragmentation, new economy, Ocado, pattern recognition, Ponzi scheme, pre–internet, QR code, race to the bottom, random stow, recommendation engine, remote working, Salesforce, sensor fusion, sharing economy, Skype, SoftBank, Steve Bannon, sunk-cost fallacy, supply-chain management, TaskRabbit, TechCrunch disrupt, TED Talk, trade route, underbanked, urban planning, vertical integration, warehouse automation, warehouse robotics, WeWork, white picket fence, work culture

The value of recommendation Having identified AI as the culmination of the main drivers shaping technology innovation today (stemming from a need for more autonomous computer systems particularly) – and before diving straight into voice technology as its current apotheosis – it is necessary to undertake an examination of how Amazon capitalized on the development of AI systems across its business and not just in its customers’ homes, as we have already done with the drivers of ubiquitous connectivity and pervasive interfaces. This examination adds to our understanding of how it has achieved its aim of removing friction from the average shopping journey and, in so doing, created a virtuous cycle that, in turn, generates even more sales and growth. In fact, it is AI that underpins the power of its search and recommendation engines. Back in the 1990s, Amazon was one of the first e-commerce players to rely heavily on product recommendations, which also helped it to cross-sell new categories as it moved beyond books. It is a category of technology development that Bezos has described as ‘the practical application of machine learning’.

The decision to open source DSSTNE also demonstrates when Amazon recognizes the need to collaborate over making gains with the vast potential of AI. On the Amazon site, these recommendations can be personalized, based on categories and ranges previously searched or browsed, to increase conversion. Equally, Amazon’s recommendation engine can display products similar to those searched for or browsed in the hopes of converting customers to rival brands or products. There are also recommendations based on anything ‘related to the items you’ve viewed’. Or they can depend on items that are ‘frequently bought together’ or by ‘customers who bought this item also bought…’ with the aim of boosting average order value.

Return customers to the Group’s Tmall and Taobao platforms are presented with product recommendations based not just on their past transactions, but also on browsing history, product feedback, bookmarks, geographic location and other online activity-related data. During the 2016 ‘Singles’ Day’ shopping festival, Alibaba said it used its AI recommendations engine to generate 6.7 billion personalized shopping pages based on merchants’ target customer data. Alibaba said that this large-scale personalization resulted in a 20 per cent improvement in conversion rate from the 11 November event.4 Recommendations and personalization aside, Amazon’s reliance on AI systems to orchestrate its vast business operations as well as its customer-facing ones is diverse.


pages: 377 words: 97,144

Singularity Rising: Surviving and Thriving in a Smarter, Richer, and More Dangerous World by James D. Miller

23andMe, affirmative action, Albert Einstein, artificial general intelligence, Asperger Syndrome, barriers to entry, brain emulation, cloud computing, cognitive bias, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, David Brooks, David Ricardo: comparative advantage, Deng Xiaoping, en.wikipedia.org, feminist movement, Flynn Effect, friendly AI, hive mind, impulse control, indoor plumbing, invention of agriculture, Isaac Newton, John Gilmore, John von Neumann, knowledge worker, Larry Ellison, Long Term Capital Management, low interest rates, low skilled workers, Netflix Prize, neurotypical, Nick Bostrom, Norman Macrae, pattern recognition, Peter Thiel, phenotype, placebo effect, prisoner's dilemma, profit maximization, Ray Kurzweil, recommendation engine, reversible computing, Richard Feynman, Rodney Brooks, Silicon Valley, Singularitarianism, Skype, statistical model, Stephen Hawking, Steve Jobs, sugar pill, supervolcano, tech billionaire, technological singularity, The Coming Technological Singularity, the scientific method, Thomas Malthus, transaction costs, Turing test, twin studies, Vernor Vinge, Von Neumann architecture

This recommendation system bases its decisions on statistical analysis of the videos that viewers with tastes similar to yours have chosen and rated positively.75 Let me now offer you thirteen reasons why video recommendation is an excellent medium in which to develop AI:
1. Massive Profits: The growing proliferation of Internet videos means that a high-quality AI recommender would be worth billions to its owner.
2. Implicitly Knows a Lot About Us: Although we humans often understand why we like a video and can accurately guess what other types of people would like it, we frequently can’t reduce our reasoning to words, in part because mere language generally isn’t rich enough to capture our video experiences. A big part of our brain is devoted to processing visual inputs. Hence, a good recommendation system would necessarily have powerful insights into a significant chunk of our brains.
3. Measurable Incremental Progress: Think of AI as a destination a thousand miles away with the entire pathway hidden by fog. To reach our destination, we need to take many small steps, and for each step we need a way to determine if we have gone in the right direction. A video recommendation system provides this corrective by gathering continuous feedback on how many users liked the recommended videos.
4. Profitable with Every Step: Businesses are more motivated to invest in a type of innovation if they can continually increase revenue with each small improvement.

Fortunately, with video recommendations, many challenges, such as finding what type of cat video a certain set of users might enjoy, can be worked on independently for reasonably long periods of time.
6. Free Labor from Customers: A recommendation system would rely on millions of people to freely help train the system by picking which videos to watch, rating some of the videos they see, writing reviews of videos, and labeling in words the content they upload.
7. Help from Advertisers and Political Consultants: Salesmen would eagerly seek to learn what types of messages appealed to different factions of the population. The recommendation system could piggyback on these salesmen’s attempts to understand their clientele and use their insights to improve recommendation software.
8. AI and Human Recommenders Could Productively Work Together: Unlike what YouTube currently does, an effective AI recommendation system could make use of human evaluators.

For example, if 90 percent of people who had some unusual allele or brain microstructure enjoyed a certain cat video, then the AI recommender would suggest the video to all other viewers who had that trait.
12. Amenable to Crowdsourcing: Netflix, the rent-by-mail and streaming video distributor, offered (and eventually paid) a $1 million prize to whichever group improved its recommendation system the most, so long as at least one group improved the system by at least 10 percent. This “crowdsourcing,” which occurs when a problem is thrown open to anyone, helps a company by allowing them to draw on the talents of strangers, while only paying the strangers if they help the firm. This kind of crowdsourcing works only if, as with a video recommendation system, there is an easy and objective way of measuring progress toward the crowdsourced goal.
13. Potential Improvement All the Way Up to Superhuman Artificial General Intelligence: A recommendation AI could slowly morph into a content creator.


pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier

23andMe, Affordable Care Act / Obamacare, airport security, Apollo 11, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, book value, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, data science, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, hype cycle, IBM and the Holocaust, index card, informal economy, intangible asset, Internet of things, invention of the printing press, Jeff Bezos, Joi Ito, lifelogging, Louis Pasteur, machine readable, machine translation, Marc Benioff, Mark Zuckerberg, Max Levchin, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, paypal mafia, performance metric, Peter Thiel, Plato's cave, post-materialism, random walk, recommendation engine, Salesforce, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, sparse data, speech recognition, Steve Jobs, Steven Levy, systematic bias, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Thomas Davenport, Turing test, vertical integration, Watson beat the top human players on Jeopardy!

For example, in Amazon’s early days it signed a deal with AOL to run the technology behind AOL’s e-commerce site. To most people, it looked like an ordinary outsourcing deal. But what really interested Amazon, explains Andreas Weigend, Amazon’s former chief scientist, was getting hold of data on what AOL users were looking at and buying, which would improve the performance of its recommendation engine. Poor AOL never realized this. It only saw the data’s value in terms of its primary purpose—sales. Clever Amazon knew it could reap benefits by putting the data to a secondary use. Or take the case of Google’s entry into speech recognition with GOOG-411 for local search listings, which ran from 2007 to 2010.

Purchase one about babies and you’d be inundated with more of the same. “They tended to offer you tiny variations on your previous purchase, ad infinitum,” recalled James Marcus, an Amazon book reviewer from 1996 to 2001, in his memoir, Amazonia. “It felt as if you had gone shopping with the village idiot.” Greg Linden saw a solution. He realized that the recommendation system didn’t actually need to compare people with other people, a task that was technically cumbersome. All it needed to do was find associations among products themselves. In 1998 Linden and his colleagues applied for a patent on “item-to-item” collaborative filtering, as the technique is known.
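The idea of finding associations among products, rather than comparing people with people, can be sketched in a few lines. The following is a minimal illustration only, not Amazon's patented method: it assumes purchase histories are available as simple lists of item names, and it scores each pair of items by cosine similarity over co-purchase counts. The function names `item_similarities` and `recommend`, and the sample baskets, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(baskets):
    """Return cosine similarity between items based on co-purchase counts.

    `baskets` is a list of purchase histories, one list of item names per
    customer (a hypothetical input format chosen for this sketch).
    """
    item_count = defaultdict(int)  # customers who bought each item
    pair_count = defaultdict(int)  # customers who bought both items of a pair
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate; sort so pairs are canonical
        for item in items:
            item_count[item] += 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), both in pair_count.items():
        score = both / sqrt(item_count[a] * item_count[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def recommend(sims, item, k=2):
    """Return up to k items most associated with `item` (ties broken by name)."""
    ranked = sorted(sims.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical purchase histories, for illustration only.
baskets = [
    ["baby book", "diapers", "novel"],
    ["baby book", "diapers"],
    ["novel", "cookbook"],
    ["diapers", "cookbook", "baby book"],
]
sims = item_similarities(baskets)
print(recommend(sims, "baby book"))
```

Because the similarity table depends only on items, it can be computed ahead of time and looked up when a customer views a product, which is one reason an item-based approach avoids the cumbersome person-to-person comparison the excerpt describes.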

Salespeople in all sectors have long been told that they need to understand what makes customers tick, to grasp the reasons behind their decisions. Professional skills and years of experience have been highly valued. Big data shows that there is another, in some ways more pragmatic approach. Amazon’s innovative recommendation systems teased out valuable correlations without knowing the underlying causes. Knowing what, not why, is good enough.
Predictions and predilections
Correlations are useful in a small-data world, but in the context of big data they really shine. Through them we can glean insights more easily, faster, and more clearly than before.


The Ethical Algorithm: The Science of Socially Aware Algorithm Design by Michael Kearns, Aaron Roth

23andMe, affirmative action, algorithmic bias, algorithmic trading, Alignment Problem, Alvin Roth, backpropagation, Bayesian statistics, bitcoin, cloud computing, computer vision, crowdsourcing, data science, deep learning, DeepMind, Dr. Strangelove, Edward Snowden, Elon Musk, fake news, Filter Bubble, general-purpose programming language, Geoffrey Hinton, Google Chrome, ImageNet competition, Lyft, medical residency, Nash equilibrium, Netflix Prize, p-value, Pareto efficiency, performance metric, personalized medicine, pre–internet, profit motive, quantitative trading / quantitative finance, RAND corporation, recommendation engine, replication crisis, ride hailing / ride sharing, Robert Bork, Ronald Coase, self-driving car, short selling, sorting algorithm, sparse data, speech recognition, statistical model, Stephen Hawking, superintelligent machines, TED Talk, telemarketer, Turing machine, two-sided market, Vilfredo Pareto

The engine can then recommend to a user the movies that it predicts she will rate the highest. Netflix had a basic recommendation system based on collaborative filtering, but the company wanted a better one. The Netflix Prize competition offered $1 million for improving the accuracy of Netflix’s existing system by 10 percent. A 10 percent improvement is hard, so Netflix expected a multiyear competition. An improvement of 1 percent over the previous year’s state of the art qualified a competitor for an annual $50,000 progress prize, which would go to the best recommendation system submitted that year. Of course, to build a recommendation system, you need data, so Netflix publicly released a lot of it—a dataset consisting of more than a hundred million movie rating records, corresponding to the ratings that roughly half a million users gave to a total of nearly eighteen thousand movies.
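The excerpt does not say how "accuracy" was scored. The Netflix Prize measured it as root-mean-square error (RMSE) between predicted and actual ratings, so a 10 percent improvement means a 10 percent lower RMSE than the baseline system. The sketch below shows that arithmetic; the function names and the sample ratings are hypothetical and are not taken from the competition data.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared_error / len(actual))


def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse is lower than baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Hypothetical held-out ratings and two sets of predictions.
actual = [4, 3, 5, 2, 4]
baseline_predictions = [3.2, 3.9, 4.1, 3.0, 3.5]
candidate_predictions = [3.6, 3.4, 4.6, 2.5, 3.8]

baseline = rmse(baseline_predictions, actual)
candidate = rmse(candidate_predictions, actual)
print(f"baseline RMSE:  {baseline:.4f}")
print(f"candidate RMSE: {candidate:.4f}")
print(f"improvement:    {relative_improvement(baseline, candidate):.1%}")
```

A single objective number like this is what makes a leaderboard possible: every submission is scored the same way against ratings the competitors cannot see.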

But now that we know this, can the problem of privacy be solved by simply concealing information about birthdate, sex, and zip code in future data releases? It turns out that lots of less obvious things can also identify you—like the movies you watch. In 2006, Netflix launched the Netflix Prize competition, a public data science competition to find the best “collaborative filtering” algorithm to power Netflix’s movie recommendation engine. A key feature of Netflix’s service is its ability to recommend to users movies that they might like, given how they have rated past movies. (This was especially important when Netflix was primarily a mail-order DVD rental service, rather than a streaming service—it was harder to quickly browse or sample movies.)


Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, billion-dollar mistake, bitcoin, blockchain, business intelligence, business logic, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, data science, database schema, deep learning, DevOps, distributed ledger, Donald Knuth, Edward Snowden, end-to-end encryption, Ethereum, ethereum blockchain, exponential backoff, fake news, fault tolerance, finite state, Flash crash, Free Software Foundation, full text search, functional programming, general-purpose programming language, Hacker News, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, Ken Thompson, Kubernetes, Large Hadron Collider, level 1 cache, loose coupling, machine readable, machine translation, Marc Andreessen, microservices, natural language processing, Network effects, no silver bullet, operational security, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, SQL injection, statistical model, surveillance capitalism, systematic bias, systems thinking, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

The discussion in Chapter 2 was focused around OLTP-style use: quickly executing queries to find a small number of vertices matching certain criteria. It is also interesting to look at graphs in a batch processing context, where the goal is to perform some kind of offline processing or analysis on an entire graph. This need often arises in machine learning applications such as recommendation engines, or in ranking systems. For example, one of the most famous graph analysis algorithms is PageRank [69], which tries to estimate the popularity of a web page based on what other web pages link to it. It is used as part of the formula that determines the order in which web search engines present their results.
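To make the PageRank idea concrete, here is a minimal single-machine sketch using power iteration. It is an illustration under stated assumptions, not the formula any search engine actually uses: the damping factor of 0.85 and the fixed iteration count are conventional choices assumed here, the link graph is a hypothetical in-memory dictionary, and a real batch job would partition the graph across machines rather than loop over it in one process.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page popularity from a link graph by power iteration.

    `links` maps each page to the list of pages it links to (a hypothetical
    input format for this sketch). Returns a dict of page -> rank; the ranks
    sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform guess

    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", which links to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Each iteration touches every edge of the graph once, which is why the computation suits offline batch processing over the whole graph rather than the small, targeted lookups of OLTP-style queries.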

The opposite of bounded. 558 | Glossary Index A aborts (transactions), 222, 224 in two-phase commit, 356 performance of optimistic concurrency con‐ trol, 266 retrying aborted transactions, 231 abstraction, 21, 27, 222, 266, 321 access path (in network model), 37, 60 accidental complexity, removing, 21 accountability, 535 ACID properties (transactions), 90, 223 atomicity, 223, 228 consistency, 224, 529 durability, 226 isolation, 225, 228 acknowledgements (messaging), 445 active/active replication (see multi-leader repli‐ cation) active/passive replication (see leader-based rep‐ lication) ActiveMQ (messaging), 137, 444 distributed transaction support, 361 ActiveRecord (object-relational mapper), 30, 232 actor model, 138 (see also message-passing) comparison to Pregel model, 425 comparison to stream processing, 468 Advanced Message Queuing Protocol (see AMQP) aerospace systems, 6, 10, 305, 372 aggregation data cubes and materialized views, 101 in batch processes, 406 in stream processes, 466 aggregation pipeline query language, 48 Agile, 22 minimizing irreversibility, 414, 497 moving faster with confidence, 532 Unix philosophy, 394 agreement, 365 (see also consensus) Airflow (workflow scheduler), 402 Ajax, 131 Akka (actor framework), 139 algorithms algorithm correctness, 308 B-trees, 79-83 for distributed systems, 306 hash indexes, 72-75 mergesort, 76, 402, 405 red-black trees, 78 SSTables and LSM-trees, 76-79 all-to-all replication topologies, 175 AllegroGraph (database), 50 ALTER TABLE statement (SQL), 40, 111 Amazon Dynamo (database), 177 Amazon Web Services (AWS), 8 Kinesis Streams (messaging), 448 network reliability, 279 postmortems, 9 RedShift (database), 93 S3 (object storage), 398 checking data integrity, 530 amplification of bias, 534 of failures, 364, 495 Index | 559 of tail latency, 16, 207 write amplification, 84 AMQP (Advanced Message Queuing Protocol), 444 (see also messaging systems) comparison to log-based messaging, 448, 451 message ordering, 446 
analytics, 90 comparison to transaction processing, 91 data warehousing (see data warehousing) parallel query execution in MPP databases, 415 predictive (see predictive analytics) relation to batch processing, 411 schemas for, 93-95 snapshot isolation for queries, 238 stream analytics, 466 using MapReduce, analysis of user activity events (example), 404 anti-caching (in-memory databases), 89 anti-entropy, 178 Apache ActiveMQ (see ActiveMQ) Apache Avro (see Avro) Apache Beam (see Beam) Apache BookKeeper (see BookKeeper) Apache Cassandra (see Cassandra) Apache CouchDB (see CouchDB) Apache Curator (see Curator) Apache Drill (see Drill) Apache Flink (see Flink) Apache Giraph (see Giraph) Apache Hadoop (see Hadoop) Apache HAWQ (see HAWQ) Apache HBase (see HBase) Apache Helix (see Helix) Apache Hive (see Hive) Apache Impala (see Impala) Apache Jena (see Jena) Apache Kafka (see Kafka) Apache Lucene (see Lucene) Apache MADlib (see MADlib) Apache Mahout (see Mahout) Apache Oozie (see Oozie) Apache Parquet (see Parquet) Apache Qpid (see Qpid) Apache Samza (see Samza) Apache Solr (see Solr) Apache Spark (see Spark) 560 | Index Apache Storm (see Storm) Apache Tajo (see Tajo) Apache Tez (see Tez) Apache Thrift (see Thrift) Apache ZooKeeper (see ZooKeeper) Apama (stream analytics), 466 append-only B-trees, 82, 242 append-only files (see logs) Application Programming Interfaces (APIs), 5, 27 for batch processing, 403 for change streams, 456 for distributed transactions, 361 for graph processing, 425 for services, 131-136 (see also services) evolvability, 136 RESTful, 133 SOAP, 133 application state (see state) approximate search (see similarity search) archival storage, data from databases, 131 arcs (see edges) arithmetic mean, 14 ASCII text, 119, 395 ASN.1 (schema language), 127 asynchronous networks, 278, 553 comparison to synchronous networks, 284 formal model, 307 asynchronous replication, 154, 553 conflict detection, 172 data loss on failover, 157 reads from asynchronous 
follower, 162 Asynchronous Transfer Mode (ATM), 285 atomic broadcast (see total order broadcast) atomic clocks (caesium clocks), 294, 295 (see also clocks) atomicity (concurrency), 553 atomic increment-and-get, 351 compare-and-set, 245, 327 (see also compare-and-set operations) replicated operations, 246 write operations, 243 atomicity (transactions), 223, 228, 553 atomic commit, 353 avoiding, 523, 528 blocking and nonblocking, 359 in stream processing, 360, 477 maintaining derived data, 453 for multi-object transactions, 229 for single-object writes, 230 auditability, 528-533 designing for, 531 self-auditing systems, 530 through immutability, 460 tools for auditable data systems, 532 availability, 8 (see also fault tolerance) in CAP theorem, 337 in service level agreements (SLAs), 15 Avro (data format), 122-127 code generation, 127 dynamically generated schemas, 126 object container files, 125, 131, 414 reader determining writer’s schema, 125 schema evolution, 123 use in Hadoop, 414 awk (Unix tool), 391 AWS (see Amazon Web Services) Azure (see Microsoft) B B-trees (indexes), 79-83 append-only/copy-on-write variants, 82, 242 branching factor, 81 comparison to LSM-trees, 83-85 crash recovery, 82 growing by splitting a page, 81 optimizations, 82 similarity to dynamic partitioning, 212 backpressure, 441, 553 in TCP, 282 backups database snapshot for replication, 156 integrity of, 530 snapshot isolation for, 238 use for ETL processes, 405 backward compatibility, 112 BASE, contrast to ACID, 223 bash shell (Unix), 70, 395, 503 batch processing, 28, 389-431, 553 combining with stream processing lambda architecture, 497 unifying technologies, 498 comparison to MPP databases, 414-418 comparison to stream processing, 464 comparison to Unix, 413-414 dataflow engines, 421-423 fault tolerance, 406, 414, 422, 442 for data integration, 494-498 graphs and iterative processing, 424-426 high-level APIs and languages, 403, 426-429 log-based messaging and, 451 maintaining derived 
state, 495 MapReduce and distributed filesystems, 397-413 (see also MapReduce) measuring performance, 13, 390 outputs, 411-413 key-value stores, 412 search indexes, 411 using Unix tools (example), 391-394 Bayou (database), 522 Beam (dataflow library), 498 bias, 534 big ball of mud, 20 Bigtable data model, 41, 99 binary data encodings, 115-128 Avro, 122-127 MessagePack, 116-117 Thrift and Protocol Buffers, 117-121 binary encoding based on schemas, 127 by network drivers, 128 binary strings, lack of support in JSON and XML, 114 BinaryProtocol encoding (Thrift), 118 Bitcask (storage engine), 72 crash recovery, 74 Bitcoin (cryptocurrency), 532 Byzantine fault tolerance, 305 concurrency bugs in exchanges, 233 bitmap indexes, 97 blockchains, 532 Byzantine fault tolerance, 305 blocking atomic commit, 359 Bloom (programming language), 504 Bloom filter (algorithm), 79, 466 BookKeeper (replicated log), 372 Bottled Water (change data capture), 455 bounded datasets, 430, 439, 553 (see also batch processing) bounded delays, 553 in networks, 285 process pauses, 298 broadcast hash joins, 409 Index | 561 brokerless messaging, 442 Brubeck (metrics aggregator), 442 BTM (transaction coordinator), 356 bulk synchronous parallel (BSP) model, 425 bursty network traffic patterns, 285 business data processing, 28, 90, 390 byte sequence, encoding data in, 112 Byzantine faults, 304-306, 307, 553 Byzantine fault-tolerant systems, 305, 532 Byzantine Generals Problem, 304 consensus algorithms and, 366 C caches, 89, 553 and materialized views, 101 as derived data, 386, 499-504 database as cache of transaction log, 460 in CPUs, 99, 338, 428 invalidation and maintenance, 452, 467 linearizability, 324 CAP theorem, 336-338, 554 Cascading (batch processing), 419, 427 hash joins, 409 workflows, 403 cascading failures, 9, 214, 281 Cascalog (batch processing), 60 Cassandra (database) column-family data model, 41, 99 compaction strategy, 79 compound primary key, 204 gossip protocol, 216 hash 
partitioning, 203-205 last-write-wins conflict resolution, 186, 292 leaderless replication, 177 linearizability, lack of, 335 log-structured storage, 78 multi-datacenter support, 184 partitioning scheme, 213 secondary indexes, 207 sloppy quorums, 184 cat (Unix tool), 391 causal context, 191 (see also causal dependencies) causal dependencies, 186-191 capturing, 191, 342, 494, 514 by total ordering, 493 causal ordering, 339 in transactions, 262 sending message to friends (example), 494 562 | Index causality, 554 causal ordering, 339-343 linearizability and, 342 total order consistent with, 344, 345 consistency with, 344-347 consistent snapshots, 340 happens-before relationship, 186 in serializable transactions, 262-265 mismatch with clocks, 292 ordering events to capture, 493 violations of, 165, 176, 292, 340 with synchronized clocks, 294 CEP (see complex event processing) certificate transparency, 532 chain replication, 155 linearizable reads, 351 change data capture, 160, 454 API support for change streams, 456 comparison to event sourcing, 457 implementing, 454 initial snapshot, 455 log compaction, 456 changelogs, 460 change data capture, 454 for operator state, 479 generating with triggers, 455 in stream joins, 474 log compaction, 456 maintaining derived state, 452 Chaos Monkey, 7, 280 checkpointing in batch processors, 422, 426 in high-performance computing, 275 in stream processors, 477, 523 chronicle data model, 458 circuit-switched networks, 284 circular buffers, 450 circular replication topologies, 175 clickstream data, analysis of, 404 clients calling services, 131 pushing state changes to, 512 request routing, 214 stateful and offline-capable, 170, 511 clocks, 287-299 atomic (caesium) clocks, 294, 295 confidence interval, 293-295 for global snapshots, 294 logical (see logical clocks) skew, 291-294, 334 slewing, 289 synchronization and accuracy, 289-291 synchronization using GPS, 287, 290, 294, 295 time-of-day versus monotonic clocks, 288 timestamping 
events, 471 cloud computing, 146, 275 need for service discovery, 372 network glitches, 279 shared resources, 284 single-machine reliability, 8 Cloudera Impala (see Impala) clustered indexes, 86 CODASYL model, 36 (see also network model) code generation with Avro, 127 with Thrift and Protocol Buffers, 118 with WSDL, 133 collaborative editing multi-leader replication and, 170 column families (Bigtable), 41, 99 column-oriented storage, 95-101 column compression, 97 distinction between column families and, 99 in batch processors, 428 Parquet, 96, 131, 414 sort order in, 99-100 vectorized processing, 99, 428 writing to, 101 comma-separated values (see CSV) command query responsibility segregation (CQRS), 462 commands (event sourcing), 459 commits (transactions), 222 atomic commit, 354-355 (see also atomicity; transactions) read committed isolation, 234 three-phase commit (3PC), 359 two-phase commit (2PC), 355-359 commutative operations, 246 compaction of changelogs, 456 (see also log compaction) for stream operator state, 479 of log-structured storage, 73 issues with, 84 size-tiered and leveled approaches, 79 CompactProtocol encoding (Thrift), 119 compare-and-set operations, 245, 327 implementing locks, 370 implementing uniqueness constraints, 331 implementing with total order broadcast, 350 relation to consensus, 335, 350, 352, 374 relation to transactions, 230 compatibility, 112, 128 calling services, 136 properties of encoding formats, 139 using databases, 129-131 using message-passing, 138 compensating transactions, 355, 461, 526 complex event processing (CEP), 465 complexity distilling in theoretical models, 310 hiding using abstraction, 27 of software systems, managing, 20 composing data systems (see unbundling data‐ bases) compute-intensive applications, 3, 275 concatenated indexes, 87 in Cassandra, 204 Concord (stream processor), 466 concurrency actor programming model, 138, 468 (see also message-passing) bugs from weak transaction isolation, 233 conflict 
resolution, 171, 174 detecting concurrent writes, 184-191 dual writes, problems with, 453 happens-before relationship, 186 in replicated systems, 161-191, 324-338 lost updates, 243 multi-version concurrency control (MVCC), 239 optimistic concurrency control, 261 ordering of operations, 326, 341 reducing, through event logs, 351, 462, 507 time and relativity, 187 transaction isolation, 225 write skew (transaction isolation), 246-251 conflict-free replicated datatypes (CRDTs), 174 conflicts conflict detection, 172 causal dependencies, 186, 342 in consensus algorithms, 368 in leaderless replication, 184 Index | 563 in log-based systems, 351, 521 in nonlinearizable systems, 343 in serializable snapshot isolation (SSI), 264 in two-phase commit, 357, 364 conflict resolution automatic conflict resolution, 174 by aborting transactions, 261 by apologizing, 527 convergence, 172-174 in leaderless systems, 190 last write wins (LWW), 186, 292 using atomic operations, 246 using custom logic, 173 determining what is a conflict, 174, 522 in multi-leader replication, 171-175 avoiding conflicts, 172 lost updates, 242-246 materializing, 251 relation to operation ordering, 339 write skew (transaction isolation), 246-251 congestion (networks) avoidance, 282 limiting accuracy of clocks, 293 queueing delays, 282 consensus, 321, 364-375, 554 algorithms, 366-368 preventing split brain, 367 safety and liveness properties, 365 using linearizable operations, 351 cost of, 369 distributed transactions, 352-375 in practice, 360-364 two-phase commit, 354-359 XA transactions, 361-364 impossibility of, 353 membership and coordination services, 370-373 relation to compare-and-set, 335, 350, 352, 374 relation to replication, 155, 349 relation to uniqueness constraints, 521 consistency, 224, 524 across different databases, 157, 452, 462, 492 causal, 339-348, 493 consistent prefix reads, 165-167 consistent snapshots, 156, 237-242, 294, 455, 500 (see also snapshots) 564 | Index crash recovery, 82 
enforcing constraints (see constraints) eventual, 162, 322 (see also eventual consistency) in ACID transactions, 224, 529 in CAP theorem, 337 linearizability, 324-338 meanings of, 224 monotonic reads, 164-165 of secondary indexes, 231, 241, 354, 491, 500 ordering guarantees, 339-352 read-after-write, 162-164 sequential, 351 strong (see linearizability) timeliness and integrity, 524 using quorums, 181, 334 consistent hashing, 204 consistent prefix reads, 165 constraints (databases), 225, 248 asynchronously checked, 526 coordination avoidance, 527 ensuring idempotence, 519 in log-based systems, 521-524 across multiple partitions, 522 in two-phase commit, 355, 357 relation to consensus, 374, 521 relation to event ordering, 347 requiring linearizability, 330 Consul (service discovery), 372 consumers (message streams), 137, 440 backpressure, 441 consumer offsets in logs, 449 failures, 445, 449 fan-out, 11, 445, 448 load balancing, 444, 448 not keeping up with producers, 441, 450, 502 context switches, 14, 297 convergence (conflict resolution), 172-174, 322 coordination avoidance, 527 cross-datacenter, 168, 493 cross-partition ordering, 256, 294, 348, 523 services, 330, 370-373 coordinator (in 2PC), 356 failure, 358 in XA transactions, 361-364 recovery, 363 copy-on-write (B-trees), 82, 242 CORBA (Common Object Request Broker Architecture), 134 correctness, 6 auditability, 528-533 Byzantine fault tolerance, 305, 532 dealing with partial failures, 274 in log-based systems, 521-524 of algorithm within system model, 308 of compensating transactions, 355 of consensus, 368 of derived data, 497, 531 of immutable data, 461 of personal data, 535, 540 of time, 176, 289-295 of transactions, 225, 515, 529 timeliness and integrity, 524-528 corruption of data detecting, 519, 530-533 due to pathological memory access, 529 due to radiation, 305 due to split brain, 158, 302 due to weak transaction isolation, 233 formalization in consensus, 366 integrity as absence of, 524 network 
packets, 306 on disks, 227 preventing using write-ahead logs, 82 recovering from, 414, 460 Couchbase (database) durability, 89 hash partitioning, 203-204, 211 rebalancing, 213 request routing, 216 CouchDB (database) B-tree storage, 242 change feed, 456 document data model, 31 join support, 34 MapReduce support, 46, 400 replication, 170, 173 covering indexes, 86 CPUs cache coherence and memory barriers, 338 caching and pipelining, 99, 428 increasing parallelism, 43 CRDTs (see conflict-free replicated datatypes) CREATE INDEX statement (SQL), 85, 500 credit rating agencies, 535 Crunch (batch processing), 419, 427 hash joins, 409 sharded joins, 408 workflows, 403 cryptography defense against attackers, 306 end-to-end encryption and authentication, 519, 543 proving integrity of data, 532 CSS (Cascading Style Sheets), 44 CSV (comma-separated values), 70, 114, 396 Curator (ZooKeeper recipes), 330, 371 curl (Unix tool), 135, 397 cursor stability, 243 Cypher (query language), 52 comparison to SPARQL, 59 D data corruption (see corruption of data) data cubes, 102 data formats (see encoding) data integration, 490-498, 543 batch and stream processing, 494-498 lambda architecture, 497 maintaining derived state, 495 reprocessing data, 496 unifying, 498 by unbundling databases, 499-515 comparison to federated databases, 501 combining tools by deriving data, 490-494 derived data versus distributed transac‐ tions, 492 limits of total ordering, 493 ordering events to capture causality, 493 reasoning about dataflows, 491 need for, 385 data lakes, 415 data locality (see locality) data models, 27-64 graph-like models, 49-63 Datalog language, 60-63 property graphs, 50 RDF and triple-stores, 55-59 query languages, 42-48 relational model versus document model, 28-42 data protection regulations, 542 data systems, 3 about, 4 Index | 565 concerns when designing, 5 future of, 489-544 correctness, constraints, and integrity, 515-533 data integration, 490-498 unbundling databases, 499-515 
heterogeneous, keeping in sync, 452 maintainability, 18-22 possible faults in, 221 reliability, 6-10 hardware faults, 7 human errors, 9 importance of, 10 software errors, 8 scalability, 10-18 unreliable clocks, 287-299 data warehousing, 91-95, 554 comparison to data lakes, 415 ETL (extract-transform-load), 92, 416, 452 keeping data systems in sync, 452 schema design, 93 slowly changing dimension (SCD), 476 data-intensive applications, 3 database triggers (see triggers) database-internal distributed transactions, 360, 364, 477 databases archival storage, 131 comparison of message brokers to, 443 dataflow through, 129 end-to-end argument for, 519-520 checking integrity, 531 inside-out, 504 (see also unbundling databases) output from batch workflows, 412 relation to event streams, 451-464 (see also changelogs) API support for change streams, 456, 506 change data capture, 454-457 event sourcing, 457-459 keeping systems in sync, 452-453 philosophy of immutable events, 459-464 unbundling, 499-515 composing data storage technologies, 499-504 designing applications around dataflow, 504-509 566 | Index observing derived state, 509-515 datacenters geographically distributed, 145, 164, 278, 493 multi-tenancy and shared resources, 284 network architecture, 276 network faults, 279 replication across multiple, 169 leaderless replication, 184 multi-leader replication, 168, 335 dataflow, 128-139, 504-509 correctness of dataflow systems, 525 differential, 504 message-passing, 136-139 reasoning about, 491 through databases, 129 through services, 131-136 dataflow engines, 421-423 comparison to stream processing, 464 directed acyclic graphs (DAG), 424 partitioning, approach to, 429 support for declarative queries, 427 Datalog (query language), 60-63 datatypes binary strings in XML and JSON, 114 conflict-free, 174 in Avro encodings, 122 in Thrift and Protocol Buffers, 121 numbers in XML and JSON, 114 Datomic (database) B-tree storage, 242 data model, 50, 57 Datalog query language, 60 
excision (deleting data), 463 languages for transactions, 255 serial execution of transactions, 253 deadlocks detection, in two-phase commit (2PC), 364 in two-phase locking (2PL), 258 Debezium (change data capture), 455 declarative languages, 42, 554 Bloom, 504 CSS and XSL, 44 Cypher, 52 Datalog, 60 for batch processing, 427 recursive SQL queries, 53 relational algebra and SQL, 42 SPARQL, 59 delays bounded network delays, 285 bounded process pauses, 298 unbounded network delays, 282 unbounded process pauses, 296 deleting data, 463 denormalization (data representation), 34, 554 costs, 39 in derived data systems, 386 materialized views, 101 updating derived data, 228, 231, 490 versus normalization, 462 derived data, 386, 439, 554 from change data capture, 454 in event sourcing, 458-458 maintaining derived state through logs, 452-457, 459-463 observing, by subscribing to streams, 512 outputs of batch and stream processing, 495 through application code, 505 versus distributed transactions, 492 deterministic operations, 255, 274, 554 accidental nondeterminism, 423 and fault tolerance, 423, 426 and idempotence, 478, 492 computing derived data, 495, 526, 531 in state machine replication, 349, 452, 458 joins, 476 DevOps, 394 differential dataflow, 504 dimension tables, 94 dimensional modeling (see star schemas) directed acyclic graphs (DAGs), 424 dirty reads (transaction isolation), 234 dirty writes (transaction isolation), 235 discrimination, 534 disks (see hard disks) distributed actor frameworks, 138 distributed filesystems, 398-399 decoupling from query engines, 417 indiscriminately dumping data into, 415 use by MapReduce, 402 distributed systems, 273-312, 554 Byzantine faults, 304-306 cloud versus supercomputing, 275 detecting network faults, 280 faults and partial failures, 274-277 formalization of consensus, 365 impossibility results, 338, 353 issues with failover, 157 limitations of distributed transactions, 363 multi-datacenter, 169, 335 network problems, 277-286 
quorums, relying on, 301 reasons for using, 145, 151 synchronized clocks, relying on, 291-295 system models, 306-310 use of clocks and time, 287 distributed transactions (see transactions) Django (web framework), 232 DNS (Domain Name System), 216, 372 Docker (container manager), 506 document data model, 30-42 comparison to relational model, 38-42 document references, 38, 403 document-oriented databases, 31 many-to-many relationships and joins, 36 multi-object transactions, need for, 231 versus relational model convergence of models, 41 data locality, 41 document-partitioned indexes, 206, 217, 411 domain-driven design (DDD), 457 DRBD (Distributed Replicated Block Device), 153 drift (clocks), 289 Drill (query engine), 93 Druid (database), 461 Dryad (dataflow engine), 421 dual writes, problems with, 452, 507 duplicates, suppression of, 517 (see also idempotence) using a unique ID, 518, 522 durability (transactions), 226, 554 duration (time), 287 measurement with monotonic clocks, 288 dynamic partitioning, 212 dynamically typed languages analogy to schema-on-read, 40 code generation and, 127 Dynamo-style databases (see leaderless replica‐ tion) E edges (in graphs), 49, 403 property graph model, 50 edit distance (full-text search), 88 effectively-once semantics, 476, 516 Index | 567 (see also exactly-once semantics) preservation of integrity, 525 elastic systems, 17 Elasticsearch (search server) document-partitioned indexes, 207 partition rebalancing, 211 percolator (stream search), 467 usage example, 4 use of Lucene, 79 ElephantDB (database), 413 Elm (programming language), 504, 512 encodings (data formats), 111-128 Avro, 122-127 binary variants of JSON and XML, 115 compatibility, 112 calling services, 136 using databases, 129-131 using message-passing, 138 defined, 113 JSON, XML, and CSV, 114 language-specific formats, 113 merits of schemas, 127 representations of data, 112 Thrift and Protocol Buffers, 117-121 end-to-end argument, 277, 519-520 checking integrity, 531 
publish/subscribe streams, 512 enrichment (stream), 473 Enterprise JavaBeans (EJB), 134 entities (see vertices) epoch (consensus algorithms), 368 epoch (Unix timestamps), 288 equi-joins, 403 erasure coding (error correction), 398 Erlang OTP (actor framework), 139 error handling for network faults, 280 in transactions, 231 error-correcting codes, 277, 398 Esper (CEP engine), 466 etcd (coordination service), 370-373 linearizable operations, 333 locks and leader election, 330 quorum reads, 351 service discovery, 372 use of Raft algorithm, 349, 353 Ethereum (blockchain), 532 Ethernet (networks), 276, 278, 285 packet checksums, 306, 519 Etherpad (collaborative editor), 170 ethics, 533-543 code of ethics and professional practice, 533 legislation and self-regulation, 542 predictive analytics, 533-536 amplifying bias, 534 feedback loops, 536 privacy and tracking, 536-543 consent and freedom of choice, 538 data as assets and power, 540 meaning of privacy, 539 surveillance, 537 respect, dignity, and agency, 543, 544 unintended consequences, 533, 536 ETL (extract-transform-load), 92, 405, 452, 554 use of Hadoop for, 416 event sourcing, 457-459 commands and events, 459 comparison to change data capture, 457 comparison to lambda architecture, 497 deriving current state from event log, 458 immutability and auditability, 459, 531 large, reliable data systems, 519, 526 Event Store (database), 458 event streams (see streams) events, 440 deciding on total order of, 493 deriving views from event log, 461 difference to commands, 459 event time versus processing time, 469, 477, 498 immutable, advantages of, 460, 531 ordering to capture causality, 493 reads as, 513 stragglers, 470, 498 timestamp of, in stream processing, 471 EventSource (browser API), 512 eventual consistency, 152, 162, 308, 322 (see also conflicts) and perpetual inconsistency, 525 evolvability, 21, 111 calling services, 136 graph-structured data, 52 of databases, 40, 129-131, 461, 497 of message-passing,
138 reprocessing data, 496, 498 schema evolution in Avro, 123 schema evolution in Thrift and Protocol Buffers, 120 schema-on-read, 39, 111, 128 exactly-once semantics, 360, 476, 516 parity with batch processors, 498 preservation of integrity, 525 exclusive mode (locks), 258 eXtended Architecture transactions (see XA transactions) extract-transform-load (see ETL) F Facebook Presto (query engine), 93 React, Flux, and Redux (user interface libraries), 512 social graphs, 49 Wormhole (change data capture), 455 fact tables, 93 failover, 157, 554 (see also leader-based replication) in leaderless replication, absence of, 178 leader election, 301, 348, 352 potential problems, 157 failures amplification by distributed transactions, 364, 495 failure detection, 280 automatic rebalancing causing cascading failures, 214 perfect failure detectors, 359 timeouts and unbounded delays, 282, 284 using ZooKeeper, 371 faults versus, 7 partial failures in distributed systems, 275-277, 310 fan-out (messaging systems), 11, 445 fault tolerance, 6-10, 555 abstractions for, 321 formalization in consensus, 365-369 use of replication, 367 human fault tolerance, 414 in batch processing, 406, 414, 422, 425 in log-based systems, 520, 524-526 in stream processing, 476-479 atomic commit, 477 idempotence, 478 maintaining derived state, 495 microbatching and checkpointing, 477 rebuilding state after a failure, 478 of distributed transactions, 362-364 transaction atomicity, 223, 354-361 faults, 6 Byzantine faults, 304-306 failures versus, 7 handled by transactions, 221 handling in supercomputers and cloud computing, 275 hardware, 7 in batch processing versus distributed databases, 417 in distributed systems, 274-277 introducing deliberately, 7, 280 network faults, 279-281 asymmetric faults, 300 detecting, 280 tolerance of, in multi-leader replication, 169 software errors, 8 tolerating (see fault tolerance) federated databases, 501 fence (CPU instruction), 338 fencing (preventing split brain), 158,
302-304 generating fencing tokens, 349, 370 properties of fencing tokens, 308 stream processors writing to databases, 478, 517 Fibre Channel (networks), 398 field tags (Thrift and Protocol Buffers), 119-121 file descriptors (Unix), 395 financial data, 460 Firebase (database), 456 Flink (processing framework), 421-423 dataflow APIs, 427 fault tolerance, 422, 477, 479 Gelly API (graph processing), 425 integration of batch and stream processing, 495, 498 machine learning, 428 query optimizer, 427 stream processing, 466 flow control, 282, 441, 555 FLP result (on consensus), 353 FlumeJava (dataflow library), 403, 427 followers, 152, 555 (see also leader-based replication) foreign keys, 38, 403 forward compatibility, 112 forward decay (algorithm), 16 Fossil (version control system), 463 shunning (deleting data), 463 FoundationDB (database) serializable transactions, 261, 265, 364 fractal trees, 83 full table scans, 403 full-text search, 555 and fuzzy indexes, 88 building search indexes, 411 Lucene storage engine, 79 functional reactive programming (FRP), 504 functional requirements, 22 futures (asynchronous operations), 135 fuzzy search (see similarity search) G garbage collection immutability and, 463 process pauses for, 14, 296-299, 301 (see also process pauses) genome analysis, 63, 429 geographically distributed datacenters, 145, 164, 278, 493 geospatial indexes, 87 Giraph (graph processing), 425 Git (version control system), 174, 342, 463 GitHub, postmortems, 157, 158, 309 global indexes (see term-partitioned indexes) GlusterFS (distributed filesystem), 398 GNU Coreutils (Linux), 394 GoldenGate (change data capture), 161, 170, 455 (see also Oracle) Google Bigtable (database) data model (see Bigtable data model) partitioning scheme, 199, 202 storage layout, 78 Chubby (lock service), 370 Cloud Dataflow (stream processor), 466, 477, 498 (see also Beam) Cloud Pub/Sub (messaging), 444, 448 Docs (collaborative editor), 170 Dremel (query engine), 93, 96
FlumeJava (dataflow library), 403, 427 GFS (distributed file system), 398 gRPC (RPC framework), 135 MapReduce (batch processing), 390 (see also MapReduce) building search indexes, 411 task preemption, 418 Pregel (graph processing), 425 Spanner (see Spanner) TrueTime (clock API), 294 gossip protocol, 216 government use of data, 541 GPS (Global Positioning System) use for clock synchronization, 287, 290, 294, 295 GraphChi (graph processing), 426 graphs, 555 as data models, 49-63 example of graph-structured data, 49 property graphs, 50 RDF and triple-stores, 55-59 versus the network model, 60 processing and analysis, 424-426 fault tolerance, 425 Pregel processing model, 425 query languages Cypher, 52 Datalog, 60-63 recursive SQL queries, 53 SPARQL, 59-59 Gremlin (graph query language), 50 grep (Unix tool), 392 GROUP BY clause (SQL), 406 grouping records in MapReduce, 406 handling skew, 407 H Hadoop (data infrastructure) comparison to distributed databases, 390 comparison to MPP databases, 414-418 comparison to Unix, 413-414, 499 diverse processing models in ecosystem, 417 HDFS distributed filesystem (see HDFS) higher-level tools, 403 join algorithms, 403-410 (see also MapReduce) MapReduce (see MapReduce) YARN (see YARN) happens-before relationship, 340 capturing, 187 concurrency and, 186 hard disks access patterns, 84 detecting corruption, 519, 530 faults in, 7, 227 sequential write throughput, 75, 450 hardware faults, 7 hash indexes, 72-75 broadcast hash joins, 409 partitioned hash joins, 409 hash partitioning, 203-205, 217 consistent hashing, 204 problems with hash mod N, 210 range queries, 204 suitable hash functions, 203 with fixed number of partitions, 210 HAWQ (database), 428 HBase (database) bug due to lack of fencing, 302 bulk loading, 413 column-family data model, 41, 99 dynamic partitioning, 212 key-range partitioning, 202 log-structured storage, 78 request routing, 216 size-tiered compaction, 79 use of HDFS, 417 use of ZooKeeper, 370 HDFS
(Hadoop Distributed File System), 398-399 (see also distributed filesystems) checking data integrity, 530 decoupling from query engines, 417 indiscriminately dumping data into, 415 metadata about datasets, 410 NameNode, 398 use by Flink, 479 use by HBase, 212 use by MapReduce, 402 HdrHistogram (numerical library), 16 head (Unix tool), 392 head vertex (property graphs), 51 head-of-line blocking, 15 heap files (databases), 86 Helix (cluster manager), 216 heterogeneous distributed transactions, 360, 364 heuristic decisions (in 2PC), 363 Hibernate (object-relational mapper), 30 hierarchical model, 36 high availability (see fault tolerance) high-frequency trading, 290, 299 high-performance computing (HPC), 275 hinted handoff, 183 histograms, 16 Hive (query engine), 419, 427 for data warehouses, 93 HCatalog and metastore, 410 map-side joins, 409 query optimizer, 427 skewed joins, 408 workflows, 403 Hollerith machines, 390 hopping windows (stream processing), 472 (see also windows) horizontal scaling (see scaling out) HornetQ (messaging), 137, 444 distributed transaction support, 361 hot spots, 201 due to celebrities, 205 for time-series data, 203 in batch processing, 407 relieving, 205 hot standbys (see leader-based replication) HTTP, use in APIs (see services) human errors, 9, 279, 414 HyperDex (database), 88 HyperLogLog (algorithm), 466 I I/O operations, waiting for, 297 IBM DB2 (database) distributed transaction support, 361 recursive query support, 54 serializable isolation, 242, 257 XML and JSON support, 30, 42 electromechanical card-sorting machines, 390 IMS (database), 36 imperative query APIs, 46 InfoSphere Streams (CEP engine), 466 MQ (messaging), 444 distributed transaction support, 361 System R (database), 222 WebSphere (messaging), 137 idempotence, 134, 478, 555 by giving operations unique IDs, 518, 522 idempotent operations, 517 immutability advantages of, 460, 531 deriving state from event log, 459-464 for crash recovery, 75 in B-trees, 82, 242
in event sourcing, 457 inputs to Unix commands, 397 limitations of, 463 Impala (query engine) for data warehouses, 93 hash joins, 409 native code generation, 428 use of HDFS, 417 impedance mismatch, 29 imperative languages, 42 setting element styles (example), 45 in doubt (transaction status), 358 holding locks, 362 orphaned transactions, 363 in-memory databases, 88 durability, 227 serial transaction execution, 253 incidents cascading failures, 9 crashes due to leap seconds, 290 data corruption and financial losses due to concurrency bugs, 233 data corruption on hard disks, 227 data loss due to last-write-wins, 173, 292 data on disks unreadable, 309 deleted items reappearing, 174 disclosure of sensitive data due to primary key reuse, 157 errors in transaction serializability, 529 gigabit network interface with 1 Kb/s throughput, 311 network faults, 279 network interface dropping only inbound packets, 279 network partitions and whole-datacenter failures, 275 poor handling of network faults, 280 sending message to ex-partner, 494 sharks biting undersea cables, 279 split brain due to 1-minute packet delay, 158, 279 vibrations in server rack, 14 violation of uniqueness constraint, 529 indexes, 71, 555 and snapshot isolation, 241 as derived data, 386, 499-504 B-trees, 79-83 building in batch processes, 411 clustered, 86 comparison of B-trees and LSM-trees, 83-85 concatenated, 87 covering (with included columns), 86 creating, 500 full-text search, 88 geospatial, 87 hash, 72-75 index-range locking, 260 multi-column, 87 partitioning and secondary indexes, 206-209, 217 secondary, 85 (see also secondary indexes) problems with dual writes, 452, 491 SSTables and LSM-trees, 76-79 updating when data changes, 452, 467 Industrial Revolution, 541 InfiniBand (networks), 285 InfiniteGraph (database), 50 InnoDB (storage engine) clustered index on primary key, 86 not preventing lost updates, 245 preventing write skew, 248, 257 serializable isolation, 257 snapshot isolation
support, 239 inside-out databases, 504 (see also unbundling databases) integrating different data systems (see data integration) integrity, 524 coordination-avoiding data systems, 528 correctness of dataflow systems, 525 in consensus formalization, 365 integrity checks, 530 (see also auditing) end-to-end, 519, 531 use of snapshot isolation, 238 maintaining despite software bugs, 529 Interface Definition Language (IDL), 117, 122 intermediate state, materialization of, 420-423 internet services, systems for implementing, 275 invariants, 225 (see also constraints) inversion of control, 396 IP (Internet Protocol) unreliability of, 277 ISDN (Integrated Services Digital Network), 284 isolation (in transactions), 225, 228, 555 correctness and, 515 for single-object writes, 230 serializability, 251-266 actual serial execution, 252-256 serializable snapshot isolation (SSI), 261-266 two-phase locking (2PL), 257-261 violating, 228 weak isolation levels, 233-251 preventing lost updates, 242-246 read committed, 234-237 snapshot isolation, 237-242 iterative processing, 424-426 J Java Database Connectivity (JDBC) distributed transaction support, 361 network drivers, 128 Java Enterprise Edition (EE), 134, 356, 361 Java Message Service (JMS), 444 (see also messaging systems) comparison to log-based messaging, 448, 451 distributed transaction support, 361 message ordering, 446 Java Transaction API (JTA), 355, 361 Java Virtual Machine (JVM) bytecode generation, 428 garbage collection pauses, 296 process reuse in batch processors, 422 JavaScript in MapReduce querying, 46 setting element styles (example), 45 use in advanced queries, 48 Jena (RDF framework), 57 Jepsen (fault tolerance testing), 515 jitter (network delay), 284 joins, 555 by index lookup, 403 expressing as relational operators, 427 in relational and document databases, 34 MapReduce map-side joins, 408-410 broadcast hash joins, 409 merge joins, 410 partitioned hash joins, 409 MapReduce reduce-side joins, 403-408 handling 
skew, 407 sort-merge joins, 405 parallel execution of, 415 secondary indexes and, 85 stream joins, 472-476 stream-stream join, 473 stream-table join, 473 table-table join, 474 time-dependence of, 475 support in document databases, 42 JOTM (transaction coordinator), 356 JSON Avro schema representation, 122 binary variants, 115 for application data, issues with, 114 in relational databases, 30, 42 representing a résumé (example), 31 Juttle (query language), 504 K k-nearest neighbors, 429 Kafka (messaging), 137, 448 Kafka Connect (database integration), 457, 461 Kafka Streams (stream processor), 466, 467 fault tolerance, 479 leader-based replication, 153 log compaction, 456, 467 message offsets, 447, 478 request routing, 216 transaction support, 477 usage example, 4 Ketama (partitioning library), 213 key-value stores, 70 as batch process output, 412 hash indexes, 72-75 in-memory, 89 partitioning, 201-205 by hash of key, 203, 217 by key range, 202, 217 dynamic partitioning, 212 skew and hot spots, 205 Kryo (Java), 113 Kubernetes (cluster manager), 418, 506 L lambda architecture, 497 Lamport timestamps, 345 Large Hadron Collider (LHC), 64 last write wins (LWW), 173, 334 discarding concurrent writes, 186 problems with, 292 prone to lost updates, 246 late binding, 396 latency instability under two-phase locking, 259 network latency and resource utilization, 286 response time versus, 14 tail latency, 15, 207 leader-based replication, 152-161 (see also replication) failover, 157, 301 handling node outages, 156 implementation of replication logs change data capture, 454-457 (see also changelogs) statement-based, 158 trigger-based replication, 161 write-ahead log (WAL) shipping, 159 linearizability of operations, 333 locking and leader election, 330 log sequence number, 156, 449 read-scaling architecture, 161 relation to consensus, 367 setting up new followers, 155 synchronous versus asynchronous, 153-155 leaderless replication, 177-191 (see also replication)
detecting concurrent writes, 184-191 capturing happens-before relationship, 187 happens-before relationship and concurrency, 186 last write wins, 186 merging concurrently written values, 190 version vectors, 191 multi-datacenter, 184 quorums, 179-182 consistency limitations, 181-183, 334 sloppy quorums and hinted handoff, 183 read repair and anti-entropy, 178 leap seconds, 8, 290 in time-of-day clocks, 288 leases, 295 implementation with ZooKeeper, 370 need for fencing, 302 ledgers, 460 distributed ledger technologies, 532 legacy systems, maintenance of, 18 less (Unix tool), 397 LevelDB (storage engine), 78 leveled compaction, 79 Levenshtein automata, 88 limping (partial failure), 311 linearizability, 324-338, 555 cost of, 335-338 CAP theorem, 336 memory on multi-core CPUs, 338 definition, 325-329 implementing with total order broadcast, 350 in ZooKeeper, 370 of derived data systems, 492, 524 avoiding coordination, 527 of different replication methods, 332-335 using quorums, 334 relying on, 330-332 constraints and uniqueness, 330 cross-channel timing dependencies, 331 locking and leader election, 330 stronger than causal consistency, 342 using to implement total order broadcast, 351 versus serializability, 329 LinkedIn Azkaban (workflow scheduler), 402 Databus (change data capture), 161, 455 Espresso (database), 31, 126, 130, 153, 216 Helix (cluster manager) (see Helix) profile (example), 30 reference to company entity (example), 34 Rest.li (RPC framework), 135 Voldemort (database) (see Voldemort) Linux, leap second bug, 8, 290 liveness properties, 308 LMDB (storage engine), 82, 242 load approaches to coping with, 17 describing, 11 load testing, 16 load balancing (messaging), 444 local indexes (see document-partitioned indexes) locality (data access), 32, 41, 555 in batch processing, 400, 405, 421 in stateful clients, 170, 511 in stream processing, 474, 478, 508, 522 location transparency, 134 in the actor model, 138 locks, 556 deadlock, 258
distributed locking, 301-304, 330 fencing tokens, 303 implementation with ZooKeeper, 370 relation to consensus, 374 for transaction isolation in snapshot isolation, 239 in two-phase locking (2PL), 257-261 making operations atomic, 243 performance, 258 preventing dirty writes, 236 preventing phantoms with index-range locks, 260, 265 read locks (shared mode), 236, 258 shared mode and exclusive mode, 258 in two-phase commit (2PC) deadlock detection, 364 in-doubt transactions holding locks, 362 materializing conflicts with, 251 preventing lost updates by explicit locking, 244 log sequence number, 156, 449 logic programming languages, 504 logical clocks, 293, 343, 494 for read-after-write consistency, 164 logical logs, 160 logs (data structure), 71, 556 advantages of immutability, 460 compaction, 73, 79, 456, 460 for stream operator state, 479 creating using total order broadcast, 349 implementing uniqueness constraints, 522 log-based messaging, 446-451 comparison to traditional messaging, 448, 451 consumer offsets, 449 disk space usage, 450 replaying old messages, 451, 496, 498 slow consumers, 450 using logs for message storage, 447 log-structured storage, 71-79 log-structured merge tree (see LSM-trees) replication, 152, 158-161 change data capture, 454-457 (see also changelogs) coordination with snapshot, 156 logical (row-based) replication, 160 statement-based replication, 158 trigger-based replication, 161 write-ahead log (WAL) shipping, 159 scalability limits, 493 loose coupling, 396, 419, 502 lost updates (see updates) LSM-trees (indexes), 78-79 comparison to B-trees, 83-85 Lucene (storage engine), 79 building indexes in batch processes, 411 similarity search, 88 Luigi (workflow scheduler), 402 LWW (see last write wins) M machine learning ethical considerations, 534 (see also ethics) iterative processing, 424 models derived from training data, 505 statistical and numerical algorithms, 428 MADlib (machine learning toolkit), 428 magic scaling sauce, 18 Mahout
(machine learning toolkit), 428 maintainability, 18-22, 489 defined, 23 design principles for software systems, 19 evolvability (see evolvability) operability, 19 simplicity and managing complexity, 20 many-to-many relationships in document model versus relational model, 39 modeling as graphs, 49 many-to-one and many-to-many relationships, 33-36 many-to-one relationships, 34 MapReduce (batch processing), 390, 399-400 accessing external services within job, 404, 412 comparison to distributed databases designing for frequent faults, 417 diversity of processing models, 416 diversity of storage, 415 comparison to stream processing, 464 comparison to Unix, 413-414 disadvantages and limitations of, 419 fault tolerance, 406, 414, 422 higher-level tools, 403, 426 implementation in Hadoop, 400-403 the shuffle, 402 implementation in MongoDB, 46-48 machine learning, 428 map-side processing, 408-410 broadcast hash joins, 409 merge joins, 410 partitioned hash joins, 409 mapper and reducer functions, 399 materialization of intermediate state, 419-423 output of batch workflows, 411-413 building search indexes, 411 key-value stores, 412 reduce-side processing, 403-408 analysis of user activity events (example), 404 grouping records by same key, 406 handling skew, 407 sort-merge joins, 405 workflows, 402 marshalling (see encoding) massively parallel processing (MPP), 216 comparison to composing storage technologies, 502 comparison to Hadoop, 414-418, 428 master-master replication (see multi-leader replication) master-slave replication (see leader-based replication) materialization, 556 aggregate values, 101 conflicts, 251 intermediate state (batch processing), 420-423 materialized views, 101 as derived data, 386, 499-504 maintaining, using stream processing, 467, 475 Maven (Java build tool), 428 Maxwell (change data capture), 455 mean, 14 media monitoring, 467 median, 14 meeting room booking (example), 249, 259, 521 membership services, 372 Memcached
(caching server), 4, 89 memory in-memory databases, 88 durability, 227 serial transaction execution, 253 in-memory representation of data, 112 random bit-flips in, 529 use by indexes, 72, 77 memory barrier (CPU instruction), 338 MemSQL (database) in-memory storage, 89 read committed isolation, 236 memtable (in LSM-trees), 78 Mercurial (version control system), 463 merge joins, MapReduce map-side, 410 mergeable persistent data structures, 174 merging sorted files, 76, 402, 405 Merkle trees, 532 Mesos (cluster manager), 418, 506 message brokers (see messaging systems) message-passing, 136-139 advantages over direct RPC, 137 distributed actor frameworks, 138 evolvability, 138 MessagePack (encoding format), 116 messages exactly-once semantics, 360, 476 loss of, 442 using total order broadcast, 348 messaging systems, 440-451 (see also streams) backpressure, buffering, or dropping messages, 441 brokerless messaging, 442 event logs, 446-451 comparison to traditional messaging, 448, 451 consumer offsets, 449 replaying old messages, 451, 496, 498 slow consumers, 450 message brokers, 443-446 acknowledgements and redelivery, 445 comparison to event logs, 448, 451 multiple consumers of same topic, 444 reliability, 442 uniqueness in log-based messaging, 522 Meteor (web framework), 456 microbatching, 477, 495 microservices, 132 (see also services) causal dependencies across services, 493 loose coupling, 502 relation to batch/stream processors, 389, 508 Microsoft Azure Service Bus (messaging), 444 Azure Storage, 155, 398 Azure Stream Analytics, 466 DCOM (Distributed Component Object Model), 134 MSDTC (transaction coordinator), 356 Orleans (see Orleans) SQL Server (see SQL Server) migrating (rewriting) data, 40, 130, 461, 497 modulus operator (%), 210 MongoDB (database) aggregation pipeline, 48 atomic operations, 243 BSON, 41 document data model, 31 hash partitioning (sharding), 203-204 key-range partitioning, 202 lack of join support, 34, 42 leader-based replication, 153
MapReduce support, 46, 400 oplog parsing, 455, 456 partition splitting, 212 request routing, 216 secondary indexes, 207 Mongoriver (change data capture), 455 monitoring, 10, 19 monotonic clocks, 288 monotonic reads, 164 MPP (see massively parallel processing) MSMQ (messaging), 361 multi-column indexes, 87 multi-leader replication, 168-177 (see also replication) handling write conflicts, 171 conflict avoidance, 172 converging toward a consistent state, 172 custom conflict resolution logic, 173 determining what is a conflict, 174 linearizability, lack of, 333 replication topologies, 175-177 use cases, 168 clients with offline operation, 170 collaborative editing, 170 multi-datacenter replication, 168, 335 multi-object transactions, 228 need for, 231 Multi-Paxos (total order broadcast), 367 multi-table index cluster tables (Oracle), 41 multi-tenancy, 284 multi-version concurrency control (MVCC), 239, 266 detecting stale MVCC reads, 263 indexes and snapshot isolation, 241 mutual exclusion, 261 (see also locks) MySQL (database) binlog coordinates, 156 binlog parsing for change data capture, 455 circular replication topology, 175 consistent snapshots, 156 distributed transaction support, 361 InnoDB storage engine (see InnoDB) JSON support, 30, 42 leader-based replication, 153 performance of XA transactions, 360 row-based replication, 160 schema changes in, 40 snapshot isolation support, 242 (see also InnoDB) statement-based replication, 159 Tungsten Replicator (multi-leader replication), 170 conflict detection, 177 N nanomsg (messaging library), 442 Narayana (transaction coordinator), 356 NATS (messaging), 137 near-real-time (nearline) processing, 390 (see also stream processing) Neo4j (database) Cypher query language, 52 graph data model, 50 Nephele (dataflow engine), 421 netcat (Unix tool), 397 Netflix Chaos Monkey, 7, 280 Network Attached Storage (NAS), 146, 398 network model, 36 graph databases versus, 60 imperative query APIs, 46 Network Time Protocol
(see NTP) networks congestion and queueing, 282 datacenter network topologies, 276 faults (see faults) linearizability and network delays, 338 network partitions, 279, 337 timeouts and unbounded delays, 281 next-key locking, 260 nodes (in graphs) (see vertices) nodes (processes), 556 handling outages in leader-based replication, 156 system models for failure, 307 noisy neighbors, 284 nonblocking atomic commit, 359 nondeterministic operations accidental nondeterminism, 423 partial failures in distributed systems, 275 nonfunctional requirements, 22 nonrepeatable reads, 238 (see also read skew) normalization (data representation), 33, 556 executing joins, 39, 42, 403 foreign key references, 231 in systems of record, 386 versus denormalization, 462 NoSQL, 29, 499 transactions and, 223 Notation3 (N3), 56 npm (package manager), 428 NTP (Network Time Protocol), 287 accuracy, 289, 293 adjustments to monotonic clocks, 289 multiple server addresses, 306 numbers, in XML and JSON encodings, 114 O object-relational mapping (ORM) frameworks, 30 error handling and aborted transactions, 232 unsafe read-modify-write cycle code, 244 object-relational mismatch, 29 observer pattern, 506 offline systems, 390 (see also batch processing) stateful, offline-capable clients, 170, 511 offline-first applications, 511 offsets consumer offsets in partitioned logs, 449 messages in partitioned logs, 447 OLAP (online analytic processing), 91, 556 data cubes, 102 OLTP (online transaction processing), 90, 556 analytics queries versus, 411 workload characteristics, 253 one-to-many relationships, 30 JSON representation, 32 online systems, 389 (see also services) Oozie (workflow scheduler), 402 OpenAPI (service definition format), 133 OpenStack Nova (cloud infrastructure) use of ZooKeeper, 370 Swift (object storage), 398 operability, 19 operating systems versus databases, 499 operation identifiers, 518, 522 operational transformation, 174 operators, 421 flow of data between, 424 in stream
processing, 464 optimistic concurrency control, 261 Oracle (database) distributed transaction support, 361 GoldenGate (change data capture), 161, 170, 455 lack of serializability, 226 leader-based replication, 153 multi-table index cluster tables, 41 not preventing write skew, 248 partitioned indexes, 209 PL/SQL language, 255 preventing lost updates, 245 read committed isolation, 236 Real Application Clusters (RAC), 330 recursive query support, 54 snapshot isolation support, 239, 242 TimesTen (in-memory database), 89 WAL-based replication, 160 XML support, 30 ordering, 339-352 by sequence numbers, 343-348 causal ordering, 339-343 partial order, 341 limits of total ordering, 493 total order broadcast, 348-352 Orleans (actor framework), 139 outliers (response time), 14 Oz (programming language), 504 P package managers, 428, 505 packet switching, 285 packets corruption of, 306 sending via UDP, 442 PageRank (algorithm), 49, 424 paging (see virtual memory) ParAccel (database), 93 parallel databases (see massively parallel processing) parallel execution of graph analysis algorithms, 426 queries in MPP databases, 216 Parquet (data format), 96, 131 (see also column-oriented storage) use in Hadoop, 414 partial failures, 275, 310 limping, 311 partial order, 341 partitioning, 199-218, 556 and replication, 200 in batch processing, 429 multi-partition operations, 514 enforcing constraints, 522 secondary index maintenance, 495 of key-value data, 201-205 by key range, 202 skew and hot spots, 205 rebalancing partitions, 209-214 automatic or manual rebalancing, 213 problems with hash mod N, 210 using dynamic partitioning, 212 using fixed number of partitions, 210 using N partitions per node, 212 replication and, 147 request routing, 214-216 secondary indexes, 206-209 document-based partitioning, 206 term-based partitioning, 208 serial execution of transactions and, 255 Paxos (consensus algorithm), 366 ballot number, 368 Multi-Paxos (total order broadcast), 367 percentiles, 14,
556 calculating efficiently, 16 importance of high percentiles, 16 use in service level agreements (SLAs), 15 Percona XtraBackup (MySQL tool), 156 performance describing, 13 of distributed transactions, 360 of in-memory databases, 89 of linearizability, 338 of multi-leader replication, 169 perpetual inconsistency, 525 pessimistic concurrency control, 261 phantoms (transaction isolation), 250 materializing conflicts, 251 preventing, in serializability, 259 physical clocks (see clocks) pickle (Python), 113 Pig (dataflow language), 419, 427 replicated joins, 409 skewed joins, 407 workflows, 403 Pinball (workflow scheduler), 402 pipelined execution, 423 in Unix, 394 point in time, 287 polyglot persistence, 29 polystores, 501 PostgreSQL (database) BDR (multi-leader replication), 170 causal ordering of writes, 177 Bottled Water (change data capture), 455 Bucardo (trigger-based replication), 161, 173 distributed transaction support, 361 foreign data wrappers, 501 full text search support, 490 leader-based replication, 153 log sequence number, 156 MVCC implementation, 239, 241 PL/pgSQL language, 255 PostGIS geospatial indexes, 87 preventing lost updates, 245 preventing write skew, 248, 261 read committed isolation, 236 recursive query support, 54 representing graphs, 51 serializable snapshot isolation (SSI), 261 snapshot isolation support, 239, 242 WAL-based replication, 160 XML and JSON support, 30, 42 pre-splitting, 212 Precision Time Protocol (PTP), 290 predicate locks, 259 predictive analytics, 533-536 amplifying bias, 534 ethics of (see ethics) feedback loops, 536 preemption of datacenter resources, 418 of threads, 298 Pregel processing model, 425 primary keys, 85, 556 compound primary key (Cassandra), 204 primary-secondary replication (see leader-based replication) privacy, 536-543 consent and freedom of choice, 538 data as assets and power, 540 deleting data, 463 ethical considerations (see ethics) legislation and self-regulation, 542 meaning of, 539
surveillance, 537 tracking behavioral data, 536 probabilistic algorithms, 16, 466 process pauses, 295-299 processing time (of events), 469 producers (message streams), 440 programming languages dataflow languages, 504 for stored procedures, 255 functional reactive programming (FRP), 504 logic programming, 504 Prolog (language), 61 (see also Datalog) promises (asynchronous operations), 135 property graphs, 50 Cypher query language, 52 Protocol Buffers (data format), 117-121 field tags and schema evolution, 120 provenance of data, 531 publish/subscribe model, 441 publishers (message streams), 440 punch card tabulating machines, 390 pure functions, 48 putting computation near data, 400 Q Qpid (messaging), 444 quality of service (QoS), 285 Quantcast File System (distributed filesystem), 398 query languages, 42-48 aggregation pipeline, 48 CSS and XSL, 44 Cypher, 52 Datalog, 60 Juttle, 504 MapReduce querying, 46-48 recursive SQL queries, 53 relational algebra and SQL, 42 SPARQL, 59 query optimizers, 37, 427 queueing delays (networks), 282 head-of-line blocking, 15 latency and response time, 14 queues (messaging), 137 quorums, 179-182, 556 for leaderless replication, 179 in consensus algorithms, 368 limitations of consistency, 181-183, 334 making decisions in distributed systems, 301 monitoring staleness, 182 multi-datacenter replication, 184 relying on durability, 309 sloppy quorums and hinted handoff, 183 R R-trees (indexes), 87 RabbitMQ (messaging), 137, 444 leader-based replication, 153 race conditions, 225 (see also concurrency) avoiding with linearizability, 331 caused by dual writes, 452 dirty writes, 235 in counter increments, 235 lost updates, 242-246 preventing with event logs, 462, 507 preventing with serializable isolation, 252 write skew, 246-251 Raft (consensus algorithm), 366 sensitivity to network problems, 369 term number, 368 use in etcd, 353 RAID (Redundant Array of Independent Disks), 7, 398 railways, schema migration on, 496 RAMCloud
(in-memory storage), 89 ranking algorithms, 424 RDF (Resource Description Framework), 57 querying with SPARQL, 59 RDMA (Remote Direct Memory Access), 276 read committed isolation level, 234-237 implementing, 236 multi-version concurrency control (MVCC), 239 no dirty reads, 234 no dirty writes, 235 read path (derived data), 509 read repair (leaderless replication), 178 for linearizability, 335 read replicas (see leader-based replication) read skew (transaction isolation), 238, 266 as violation of causality, 340 read-after-write consistency, 163, 524 cross-device, 164 read-modify-write cycle, 243 read-scaling architecture, 161 reads as events, 513 real-time collaborative editing, 170 near-real-time processing, 390 (see also stream processing) publish/subscribe dataflow, 513 response time guarantees, 298 time-of-day clocks, 288 rebalancing partitions, 209-214, 556 (see also partitioning) automatic or manual rebalancing, 213 dynamic partitioning, 212 fixed number of partitions, 210 fixed number of partitions per node, 212 problems with hash mod N, 210 recency guarantee, 324 recommendation engines batch process outputs, 412 batch workflows, 403, 420 iterative processing, 424 statistical and numerical algorithms, 428 records, 399 events in stream processing, 440 recursive common table expressions (SQL), 54 redelivery (messaging), 445 Redis (database) atomic operations, 243 durability, 89 Lua scripting, 255 single-threaded execution, 253 usage example, 4 redundancy hardware components, 7 of derived data, 386 (see also derived data) Reed–Solomon codes (error correction), 398 refactoring, 22 (see also evolvability) regions (partitioning), 199 register (data structure), 325 relational data model, 28-42 comparison to document model, 38-42 graph queries in SQL, 53 in-memory databases with, 89 many-to-one and many-to-many relation‐ ships, 33 multi-object transactions, need for, 231 NoSQL as alternative to, 29 object-relational mismatch, 29 relational algebra and SQL, 42 versus 
document model convergence of models, 41 data locality, 41 relational databases eventual consistency, 162 history, 28 leader-based replication, 153 logical logs, 160 philosophy compared to Unix, 499, 501 schema changes, 40, 111, 130 statement-based replication, 158 use of B-tree indexes, 80 relationships (see edges) reliability, 6-10, 489 building a reliable system from unreliable components, 276 defined, 6, 22 hardware faults, 7 human errors, 9 importance of, 10 of messaging systems, 442 Index | 581 software errors, 8 Remote Method Invocation (Java RMI), 134 remote procedure calls (RPCs), 134-136 (see also services) based on futures, 135 data encoding and evolution, 136 issues with, 134 using Avro, 126, 135 using Thrift, 135 versus message brokers, 137 repeatable reads (transaction isolation), 242 replicas, 152 replication, 151-193, 556 and durability, 227 chain replication, 155 conflict resolution and, 246 consistency properties, 161-167 consistent prefix reads, 165 monotonic reads, 164 reading your own writes, 162 in distributed filesystems, 398 leaderless, 177-191 detecting concurrent writes, 184-191 limitations of quorum consistency, 181-183, 334 sloppy quorums and hinted handoff, 183 monitoring staleness, 182 multi-leader, 168-177 across multiple datacenters, 168, 335 handling write conflicts, 171-175 replication topologies, 175-177 partitioning and, 147, 200 reasons for using, 145, 151 single-leader, 152-161 failover, 157 implementation of replication logs, 158-161 relation to consensus, 367 setting up new followers, 155 synchronous versus asynchronous, 153-155 state machine replication, 349, 452 using erasure coding, 398 with heterogeneous data systems, 453 replication logs (see logs) reprocessing data, 496, 498 (see also evolvability) from log-based messaging, 451 request routing, 214-216 582 | Index approaches to, 214 parallel query execution, 216 resilient systems, 6 (see also fault tolerance) response time as performance metric for services, 13, 389 
guarantees on, 298 latency versus, 14 mean and percentiles, 14 user experience, 15 responsibility and accountability, 535 REST (Representational State Transfer), 133 (see also services) RethinkDB (database) document data model, 31 dynamic partitioning, 212 join support, 34, 42 key-range partitioning, 202 leader-based replication, 153 subscribing to changes, 456 Riak (database) Bitcask storage engine, 72 CRDTs, 174, 191 dotted version vectors, 191 gossip protocol, 216 hash partitioning, 203-204, 211 last-write-wins conflict resolution, 186 leaderless replication, 177 LevelDB storage engine, 78 linearizability, lack of, 335 multi-datacenter support, 184 preventing lost updates across replicas, 246 rebalancing, 213 search feature, 209 secondary indexes, 207 siblings (concurrently written values), 190 sloppy quorums, 184 ring buffers, 450 Ripple (cryptocurrency), 532 rockets, 10, 36, 305 RocksDB (storage engine), 78 leveled compaction, 79 rollbacks (transactions), 222 rolling upgrades, 8, 112 routing (see request routing) row-oriented storage, 96 row-based replication, 160 rowhammer (memory corruption), 529 RPCs (see remote procedure calls) Rubygems (package manager), 428 rules (Datalog), 61 S safety and liveness properties, 308 in consensus algorithms, 366 in transactions, 222 sagas (see compensating transactions) Samza (stream processor), 466, 467 fault tolerance, 479 streaming SQL support, 466 sandboxes, 9 SAP HANA (database), 93 scalability, 10-18, 489 approaches for coping with load, 17 defined, 22 describing load, 11 describing performance, 13 partitioning and, 199 replication and, 161 scaling up versus scaling out, 146 scaling out, 17, 146 (see also shared-nothing architecture) scaling up, 17, 146 scatter/gather approach, querying partitioned databases, 207 SCD (slowly changing dimension), 476 schema-on-read, 39 comparison to evolvable schema, 128 in distributed filesystems, 415 schema-on-write, 39 schemaless databases (see schema-on-read) schemas, 557 Avro, 
122-127 reader determining writer’s schema, 125 schema evolution, 123 dynamically generated, 126 evolution of, 496 affecting application code, 111 compatibility checking, 126 in databases, 129-131 in message-passing, 138 in service calls, 136 flexibility in document model, 39 for analytics, 93-95 for JSON and XML, 115 merits of, 127 schema migration on railways, 496 Thrift and Protocol Buffers, 117-121 schema evolution, 120 traditional approach to design, fallacy in, 462 searches building search indexes in batch processes, 411 k-nearest neighbors, 429 on streams, 467 partitioned secondary indexes, 206 secondaries (see leader-based replication) secondary indexes, 85, 557 partitioning, 206-209, 217 document-partitioned, 206 index maintenance, 495 term-partitioned, 208 problems with dual writes, 452, 491 updating, transaction isolation and, 231 secondary sorts, 405 sed (Unix tool), 392 self-describing files, 127 self-joins, 480 self-validating systems, 530 semantic web, 57 semi-synchronous replication, 154 sequence number ordering, 343-348 generators, 294, 344 insufficiency for enforcing constraints, 347 Lamport timestamps, 345 use of timestamps, 291, 295, 345 sequential consistency, 351 serializability, 225, 233, 251-266, 557 linearizability versus, 329 pessimistic versus optimistic concurrency control, 261 serial execution, 252-256 partitioning, 255 using stored procedures, 253, 349 serializable snapshot isolation (SSI), 261-266 detecting stale MVCC reads, 263 detecting writes that affect prior reads, 264 distributed execution, 265, 364 performance of SSI, 265 preventing write skew, 262-265 two-phase locking (2PL), 257-261 index-range locks, 260 performance, 258 Serializable (Java), 113 Index | 583 serialization, 113 (see also encoding) service discovery, 135, 214, 372 using DNS, 216, 372 service level agreements (SLAs), 15 service-oriented architecture (SOA), 132 (see also services) services, 131-136 microservices, 132 causal dependencies across services, 493 loose 
coupling, 502 relation to batch/stream processors, 389, 508 remote procedure calls (RPCs), 134-136 issues with, 134 similarity to databases, 132 web services, 132, 135 session windows (stream processing), 472 (see also windows) sessionization, 407 sharding (see partitioning) shared mode (locks), 258 shared-disk architecture, 146, 398 shared-memory architecture, 146 shared-nothing architecture, 17, 146-147, 557 (see also replication) distributed filesystems, 398 (see also distributed filesystems) partitioning, 199 use of network, 277 sharks biting undersea cables, 279 counting (example), 46-48 finding (example), 42 website about (example), 44 shredding (in relational model), 38 siblings (concurrent values), 190, 246 (see also conflicts) similarity search edit distance, 88 genome data, 63 k-nearest neighbors, 429 single-leader replication (see leader-based rep‐ lication) single-threaded execution, 243, 252 in batch processing, 406, 421, 426 in stream processing, 448, 463, 522 size-tiered compaction, 79 skew, 557 584 | Index clock skew, 291-294, 334 in transaction isolation read skew, 238, 266 write skew, 246-251, 262-265 (see also write skew) meanings of, 238 unbalanced workload, 201 compensating for, 205 due to celebrities, 205 for time-series data, 203 in batch processing, 407 slaves (see leader-based replication) sliding windows (stream processing), 472 (see also windows) sloppy quorums, 183 (see also quorums) lack of linearizability, 334 slowly changing dimension (data warehouses), 476 smearing (leap seconds adjustments), 290 snapshots (databases) causal consistency, 340 computing derived data, 500 in change data capture, 455 serializable snapshot isolation (SSI), 261-266, 329 setting up a new replica, 156 snapshot isolation and repeatable read, 237-242 implementing with MVCC, 239 indexes and MVCC, 241 visibility rules, 240 synchronized clocks for global snapshots, 294 snowflake schemas, 95 SOAP, 133 (see also services) evolvability, 136 software bugs, 8 
maintaining integrity, 529 solid state drives (SSDs) access patterns, 84 detecting corruption, 519, 530 faults in, 227 sequential write throughput, 75 Solr (search server) building indexes in batch processes, 411 document-partitioned indexes, 207 request routing, 216 usage example, 4 use of Lucene, 79 sort (Unix tool), 392, 394, 395 sort-merge joins (MapReduce), 405 Sorted String Tables (see SSTables) sorting sort order in column storage, 99 source of truth (see systems of record) Spanner (database) data locality, 41 snapshot isolation using clocks, 295 TrueTime API, 294 Spark (processing framework), 421-423 bytecode generation, 428 dataflow APIs, 427 fault tolerance, 422 for data warehouses, 93 GraphX API (graph processing), 425 machine learning, 428 query optimizer, 427 Spark Streaming, 466 microbatching, 477 stream processing on top of batch process‐ ing, 495 SPARQL (query language), 59 spatial algorithms, 429 split brain, 158, 557 in consensus algorithms, 352, 367 preventing, 322, 333 using fencing tokens to avoid, 302-304 spreadsheets, dataflow programming capabili‐ ties, 504 SQL (Structured Query Language), 21, 28, 43 advantages and limitations of, 416 distributed query execution, 48 graph queries in, 53 isolation levels standard, issues with, 242 query execution on Hadoop, 416 résumé (example), 30 SQL injection vulnerability, 305 SQL on Hadoop, 93 statement-based replication, 158 stored procedures, 255 SQL Server (database) data warehousing support, 93 distributed transaction support, 361 leader-based replication, 153 preventing lost updates, 245 preventing write skew, 248, 257 read committed isolation, 236 recursive query support, 54 serializable isolation, 257 snapshot isolation support, 239 T-SQL language, 255 XML support, 30 SQLstream (stream analytics), 466 SSDs (see solid state drives) SSTables (storage format), 76-79 advantages over hash indexes, 76 concatenated index, 204 constructing and maintaining, 78 making LSM-Tree from, 78 staleness (old data), 
162 cross-channel timing dependencies, 331 in leaderless databases, 178 in multi-version concurrency control, 263 monitoring for, 182 of client state, 512 versus linearizability, 324 versus timeliness, 524 standbys (see leader-based replication) star replication topologies, 175 star schemas, 93-95 similarity to event sourcing, 458 Star Wars analogy (event time versus process‐ ing time), 469 state derived from log of immutable events, 459 deriving current state from the event log, 458 interplay between state changes and appli‐ cation code, 507 maintaining derived state, 495 maintenance by stream processor in streamstream joins, 473 observing derived state, 509-515 rebuilding after stream processor failure, 478 separation of application code and, 505 state machine replication, 349, 452 statement-based replication, 158 statically typed languages analogy to schema-on-write, 40 code generation and, 127 statistical and numerical algorithms, 428 StatsD (metrics aggregator), 442 stdin, stdout, 395, 396 Stellar (cryptocurrency), 532 Index | 585 stock market feeds, 442 STONITH (Shoot The Other Node In The Head), 158 stop-the-world (see garbage collection) storage composing data storage technologies, 499-504 diversity of, in MapReduce, 415 Storage Area Network (SAN), 146, 398 storage engines, 69-104 column-oriented, 95-101 column compression, 97-99 defined, 96 distinction between column families and, 99 Parquet, 96, 131 sort order in, 99-100 writing to, 101 comparing requirements for transaction processing and analytics, 90-96 in-memory storage, 88 durability, 227 row-oriented, 70-90 B-trees, 79-83 comparing B-trees and LSM-trees, 83-85 defined, 96 log-structured, 72-79 stored procedures, 161, 253-255, 557 and total order broadcast, 349 pros and cons of, 255 similarity to stream processors, 505 Storm (stream processor), 466 distributed RPC, 468, 514 Trident state handling, 478 straggler events, 470, 498 stream processing, 464-481, 557 accessing external services within job, 
474, 477, 478, 517 combining with batch processing lambda architecture, 497 unifying technologies, 498 comparison to batch processing, 464 complex event processing (CEP), 465 fault tolerance, 476-479 atomic commit, 477 idempotence, 478 microbatching and checkpointing, 477 rebuilding state after a failure, 478 for data integration, 494-498 586 | Index maintaining derived state, 495 maintenance of materialized views, 467 messaging systems (see messaging systems) reasoning about time, 468-472 event time versus processing time, 469, 477, 498 knowing when window is ready, 470 types of windows, 472 relation to databases (see streams) relation to services, 508 search on streams, 467 single-threaded execution, 448, 463 stream analytics, 466 stream joins, 472-476 stream-stream join, 473 stream-table join, 473 table-table join, 474 time-dependence of, 475 streams, 440-451 end-to-end, pushing events to clients, 512 messaging systems (see messaging systems) processing (see stream processing) relation to databases, 451-464 (see also changelogs) API support for change streams, 456 change data capture, 454-457 derivative of state by time, 460 event sourcing, 457-459 keeping systems in sync, 452-453 philosophy of immutable events, 459-464 topics, 440 strict serializability, 329 strong consistency (see linearizability) strong one-copy serializability, 329 subjects, predicates, and objects (in triplestores), 55 subscribers (message streams), 440 (see also consumers) supercomputers, 275 surveillance, 537 (see also privacy) Swagger (service definition format), 133 swapping to disk (see virtual memory) synchronous networks, 285, 557 comparison to asynchronous networks, 284 formal model, 307 synchronous replication, 154, 557 chain replication, 155 conflict detection, 172 system models, 300, 306-310 assumptions in, 528 correctness of algorithms, 308 mapping to the real world, 309 safety and liveness, 308 systems of record, 386, 557 change data capture, 454, 491 treating event log as, 460 
systems thinking, 536 T t-digest (algorithm), 16 table-table joins, 474 Tableau (data visualization software), 416 tail (Unix tool), 447 tail vertex (property graphs), 51 Tajo (query engine), 93 Tandem NonStop SQL (database), 200 TCP (Transmission Control Protocol), 277 comparison to circuit switching, 285 comparison to UDP, 283 connection failures, 280 flow control, 282, 441 packet checksums, 306, 519, 529 reliability and duplicate suppression, 517 retransmission timeouts, 284 use for transaction sessions, 229 telemetry (see monitoring) Teradata (database), 93, 200 term-partitioned indexes, 208, 217 termination (consensus), 365 Terrapin (database), 413 Tez (dataflow engine), 421-423 fault tolerance, 422 support by higher-level tools, 427 thrashing (out of memory), 297 threads (concurrency) actor model, 138, 468 (see also message-passing) atomic operations, 223 background threads, 73, 85 execution pauses, 286, 296-298 memory barriers, 338 preemption, 298 single (see single-threaded execution) three-phase commit, 359 Thrift (data format), 117-121 BinaryProtocol, 118 CompactProtocol, 119 field tags and schema evolution, 120 throughput, 13, 390 TIBCO, 137 Enterprise Message Service, 444 StreamBase (stream analytics), 466 time concurrency and, 187 cross-channel timing dependencies, 331 in distributed systems, 287-299 (see also clocks) clock synchronization and accuracy, 289 relying on synchronized clocks, 291-295 process pauses, 295-299 reasoning about, in stream processors, 468-472 event time versus processing time, 469, 477, 498 knowing when window is ready, 470 timestamp of events, 471 types of windows, 472 system models for distributed systems, 307 time-dependence in stream joins, 475 time-of-day clocks, 288 timeliness, 524 coordination-avoiding data systems, 528 correctness of dataflow systems, 525 timeouts, 279, 557 dynamic configuration of, 284 for failover, 158 length of, 281 timestamps, 343 assigning to events in stream processing, 471 for read-after-write 
consistency, 163 for transaction ordering, 295 insufficiency for enforcing constraints, 347 key range partitioning by, 203 Lamport, 345 logical, 494 ordering events, 291, 345 Titan (database), 50 tombstones, 74, 191, 456 topics (messaging), 137, 440 total order, 341, 557 limits of, 493 sequence numbers or timestamps, 344 total order broadcast, 348-352, 493, 522 consensus algorithms and, 366-368 Index | 587 implementation in ZooKeeper and etcd, 370 implementing with linearizable storage, 351 using, 349 using to implement linearizable storage, 350 tracking behavioral data, 536 (see also privacy) transaction coordinator (see coordinator) transaction manager (see coordinator) transaction processing, 28, 90-95 comparison to analytics, 91 comparison to data warehousing, 93 transactions, 221-267, 558 ACID properties of, 223 atomicity, 223 consistency, 224 durability, 226 isolation, 225 compensating (see compensating transac‐ tions) concept of, 222 distributed transactions, 352-364 avoiding, 492, 502, 521-528 failure amplification, 364, 495 in doubt/uncertain status, 358, 362 two-phase commit, 354-359 use of, 360-361 XA transactions, 361-364 OLTP versus analytics queries, 411 purpose of, 222 serializability, 251-266 actual serial execution, 252-256 pessimistic versus optimistic concur‐ rency control, 261 serializable snapshot isolation (SSI), 261-266 two-phase locking (2PL), 257-261 single-object and multi-object, 228-232 handling errors and aborts, 231 need for multi-object transactions, 231 single-object writes, 230 snapshot isolation (see snapshots) weak isolation levels, 233-251 preventing lost updates, 242-246 read committed, 234-238 transitive closure (graph algorithm), 424 trie (data structure), 88 triggers (databases), 161, 441 implementing change data capture, 455 implementing replication, 161 588 | Index triple-stores, 55-59 SPARQL query language, 59 tumbling windows (stream processing), 472 (see also windows) in microbatching, 477 tuple spaces (programming 
model), 507 Turtle (RDF data format), 56 Twitter constructing home timelines (example), 11, 462, 474, 511 DistributedLog (event log), 448 Finagle (RPC framework), 135 Snowflake (sequence number generator), 294 Summingbird (processing library), 497 two-phase commit (2PC), 353, 355-359, 558 confusion with two-phase locking, 356 coordinator failure, 358 coordinator recovery, 363 how it works, 357 issues in practice, 363 performance cost, 360 transactions holding locks, 362 two-phase locking (2PL), 257-261, 329, 558 confusion with two-phase commit, 356 index-range locks, 260 performance of, 258 type checking, dynamic versus static, 40 U UDP (User Datagram Protocol) comparison to TCP, 283 multicast, 442 unbounded datasets, 439, 558 (see also streams) unbounded delays, 558 in networks, 282 process pauses, 296 unbundling databases, 499-515 composing data storage technologies, 499-504 federation versus unbundling, 501 need for high-level language, 503 designing applications around dataflow, 504-509 observing derived state, 509-515 materialized views and caching, 510 multi-partition data processing, 514 pushing state changes to clients, 512 uncertain (transaction status) (see in doubt) uniform consensus, 365 (see also consensus) uniform interfaces, 395 union type (in Avro), 125 uniq (Unix tool), 392 uniqueness constraints asynchronously checked, 526 requiring consensus, 521 requiring linearizability, 330 uniqueness in log-based messaging, 522 Unix philosophy, 394-397 command-line batch processing, 391-394 Unix pipes versus dataflow engines, 423 comparison to Hadoop, 413-414 comparison to relational databases, 499, 501 comparison to stream processing, 464 composability and uniform interfaces, 395 loose coupling, 396 pipes, 394 relation to Hadoop, 499 UPDATE statement (SQL), 40 updates preventing lost updates, 242-246 atomic write operations, 243 automatically detecting lost updates, 245 compare-and-set operations, 245 conflict resolution and replication, 246 using explicit 
locking, 244 preventing write skew, 246-251 V validity (consensus), 365 vBuckets (partitioning), 199 vector clocks, 191 (see also version vectors) vectorized processing, 99, 428 verification, 528-533 avoiding blind trust, 530 culture of, 530 designing for auditability, 531 end-to-end integrity checks, 531 tools for auditable data systems, 532 version control systems, reliance on immutable data, 463 version vectors, 177, 191 capturing causal dependencies, 343 versus vector clocks, 191 Vertica (database), 93 handling writes, 101 replicas using different sort orders, 100 vertical scaling (see scaling up) vertices (in graphs), 49 property graph model, 50 Viewstamped Replication (consensus algo‐ rithm), 366 view number, 368 virtual machines, 146 (see also cloud computing) context switches, 297 network performance, 282 noisy neighbors, 284 reliability in cloud services, 8 virtualized clocks in, 290 virtual memory process pauses due to page faults, 14, 297 versus memory management by databases, 89 VisiCalc (spreadsheets), 504 vnodes (partitioning), 199 Voice over IP (VoIP), 283 Voldemort (database) building read-only stores in batch processes, 413 hash partitioning, 203-204, 211 leaderless replication, 177 multi-datacenter support, 184 rebalancing, 213 reliance on read repair, 179 sloppy quorums, 184 VoltDB (database) cross-partition serializability, 256 deterministic stored procedures, 255 in-memory storage, 89 output streams, 456 secondary indexes, 207 serial execution of transactions, 253 statement-based replication, 159, 479 transactions in stream processing, 477 W WAL (write-ahead log), 82 web services (see services) Web Services Description Language (WSDL), 133 webhooks, 443 webMethods (messaging), 137 WebSocket (protocol), 512 Index | 589 windows (stream processing), 466, 468-472 infinite windows for changelogs, 467, 474 knowing when all events have arrived, 470 stream joins within a window, 473 types of windows, 472 winners (conflict resolution), 173 WITH 
RECURSIVE syntax (SQL), 54 workflows (MapReduce), 402 outputs, 411-414 key-value stores, 412 search indexes, 411 with map-side joins, 410 working set, 393 write amplification, 84 write path (derived data), 509 write skew (transaction isolation), 246-251 characterizing, 246-251, 262 examples of, 247, 249 materializing conflicts, 251 occurrence in practice, 529 phantoms, 250 preventing in snapshot isolation, 262-265 in two-phase locking, 259-261 options for, 248 write-ahead log (WAL), 82, 159 writes (database) atomic write operations, 243 detecting writes affecting prior reads, 264 preventing dirty writes with read committed, 235 WS-* framework, 133 (see also services) WS-AtomicTransaction (2PC), 355 X XA transactions, 355, 361-364 heuristic decisions, 363 limitations of, 363 xargs (Unix tool), 392, 396 XML binary variants, 115 encoding RDF data, 57 for application data, issues with, 114 in relational databases, 30, 41 XSL/XPath, 45 Y Yahoo!

If you lose derived data, you can recreate it from the original source. A classic example is a cache: data can be served from the cache if present, but if the cache doesn’t contain what you need, you can fall back to the underlying database. Denormalized values, indexes, and materialized views also fall into this category. In recommendation systems, predictive summary data is often derived from usage logs. Technically speaking, derived data is redundant, in the sense that it duplicates existing information. However, it is often essential for getting good performance on read queries. It is commonly denormalized. You can derive several different datasets from a single source, enabling you to look at the data from different “points of view.”
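The cache fallback described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the book: `database`, `cache`, `get_user`, and `rebuild_cache` are hypothetical names, and plain dictionaries stand in for a real database and cache.

```python
# Hypothetical stand-ins: one dict plays the system of record, another the cache.
database = {"user:1": {"name": "Ada"}, "user:2": {"name": "Grace"}}
cache = {}


def get_user(key):
    """Serve from the cache if present; otherwise fall back to the database."""
    if key in cache:
        return cache[key]
    value = database[key]  # the original source
    cache[key] = value     # derived copy, safe to lose
    return value


def rebuild_cache():
    """Losing derived data is recoverable: recreate it from the source."""
    cache.clear()
    cache.update(database)
```

Clearing `cache` at any point changes only read performance, not correctness, because every value can be fetched again from `database`.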


Remix: Making Art and Commerce Thrive in the Hybrid Economy by Lawrence Lessig

Aaron Swartz, Amazon Web Services, Andrew Keen, Benjamin Mako Hill, Berlin Wall, Bernie Sanders, Brewster Kahle, carbon tax, Cass Sunstein, collaborative editing, commoditize, disintermediation, don't be evil, Erik Brynjolfsson, folksonomy, Free Software Foundation, Internet Archive, invisible hand, Jeff Bezos, jimmy wales, John Perry Barlow, Joi Ito, Kevin Kelly, Larry Wall, late fees, Mark Shuttleworth, Netflix Prize, Network effects, new economy, optical character recognition, PageRank, peer-to-peer, recommendation engine, revision control, Richard Stallman, Ronald Coase, Saturday Night Live, search costs, SETI@home, sharing economy, Silicon Valley, Skype, slashdot, Steve Jobs, the long tail, The Nature of the Firm, thinkpad, transaction costs, VA Linux, Wayback Machine, yellow journalism, Yochai Benkler

Jeff Jarvis, journalist and blogger, suggests companies “pay dividends back to [the] crowd” and avoid trying too hard “to control [the gathered] wisdom, and limit its use and the sharing of it.”19 Tapscott and Williams make the same recommendation: “platforms for participation will only remain viable for as long as all the stakeholders are adequately and appropriately compensated for their contributions—don’t expect a free ride forever.”20 The key word here is “appropriately.” Obviously, there must be adequate compensation. But the kind of compensation is the puzzle.


Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps by Valliappa Lakshmanan, Sara Robinson, Michael Munn

A Pattern Language, Airbnb, algorithmic trading, automated trading system, business intelligence, business logic, business process, combinatorial explosion, computer vision, continuous integration, COVID-19, data science, deep learning, DevOps, discrete time, en.wikipedia.org, Hacker News, industrial research laboratory, iterative process, Kubernetes, machine translation, microservices, mobile money, natural language processing, Netflix Prize, optical character recognition, pattern recognition, performance metric, recommendation engine, ride hailing / ride sharing, selection bias, self-driving car, sentiment analysis, speech recognition, statistical model, the payments system, web application

Feature Store, Transform, Reframing, Hashed Feature, Cascade, Neutral Class, Two-Phase Predictions, Stateless Serving Function, Windowed Inference. Recommendation Systems: Recommender systems are one of the most widespread applications of machine learning in business and they often arise whenever users interact with items. Recommender systems capture features of past behavior and similar users and recommend items most relevant for a given user. Think of how YouTube will recommend a series of videos for you to watch based on your watch history, or Amazon may recommend purchases based on items in your shopping cart. Recommendation systems are popular throughout many businesses, particularly for product recommendation, personalized and dynamic marketing, and streaming video or music platforms.

A recent paper that beats all benchmarks at predicting protein folding structure also predicts the distance between amino acids as a 64-way classification problem where the distances are bucketized into 64 bins. Another reason to reframe a problem is when the objective is better in the other type of model. For example, suppose we are trying to build a recommendation system for videos. A natural way to frame this problem is as a classification problem of predicting whether a user is likely to watch a certain video. This framing, however, can lead to a recommendation system that prioritizes click bait. It might be better to reframe this into a regression problem of predicting the fraction of the video that will be watched. Why It Works: Changing the context and reframing the task of a problem can help when building a machine learning solution.
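The difference between the two framings shows up in how the training label is built. The following Python sketch is illustrative only: the `watch_log` rows and the functions `classification_label` and `regression_label` are hypothetical and do not come from the book.

```python
# Hypothetical watch log: (user, video, seconds_watched, video_length_seconds)
watch_log = [
    ("u1", "v1", 5, 300),    # clicked, then left almost immediately
    ("u1", "v2", 280, 300),  # watched nearly all of it
    ("u2", "v1", 150, 300),  # watched half
]


def classification_label(seconds_watched):
    """Binary target: did the user start watching at all?"""
    return 1 if seconds_watched > 0 else 0


def regression_label(seconds_watched, video_length):
    """Continuous target: fraction of the video watched, capped at 1.0."""
    return min(seconds_watched / video_length, 1.0)


for user, video, watched, length in watch_log:
    print(user, video, classification_label(watched),
          round(regression_label(watched, length), 2))
```

Under the classification label all three rows look equally positive, while the regression label separates the five-second click from the nearly complete view, which is the click-bait concern raised in the excerpt.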

Cached results of batch serving: We discussed batch serving as a way to invoke a model over millions of items when the model is normally served online using the Stateless Serving Function design pattern. Of course, it is possible for batch serving to work even if the model does not support online serving. What matters is that the machine learning framework doing inference is capable of taking advantage of embarrassingly parallel processing. Recommendation engines, for example, need to fill out a sparse matrix consisting of every user–item pair. A typical business might have 10 million all-time users and 10,000 items in the product catalog. In order to make a recommendation for a user, recommendation scores have to be computed for each of the 10,000 items, ranked, and the top 5 presented to the user.
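To make the scale concrete: 10 million users times 10,000 items is 10^11 user-item pairs, and each user's ranking is independent of every other user's, which is what makes the job embarrassingly parallel. The Python sketch below scores and ranks the catalog for a single user. It rests on assumptions: `score` is a hypothetical deterministic stand-in for a trained model, and `top_k_for_user` is an illustrative name, not an API from the book.

```python
import heapq

NUM_USERS = 10_000_000  # all-time users, from the example in the excerpt
NUM_ITEMS = 10_000      # items in the product catalog
TOP_K = 5

# Number of user-item pairs a full batch job would have to score.
total_pairs = NUM_USERS * NUM_ITEMS  # 10**11


def score(user_id, item_id):
    """Hypothetical stand-in for a trained model's recommendation score."""
    return ((user_id * 31 + item_id * 17) % 1009) / 1009


def top_k_for_user(user_id, k=TOP_K):
    """Score every item for one user and keep the k highest-scoring items."""
    scored = ((score(user_id, item), item) for item in range(NUM_ITEMS))
    return [item for _, item in heapq.nlargest(k, scored)]
```

Because `top_k_for_user` depends only on its own `user_id`, a batch job could run it for different users on different workers and cache the resulting top-5 lists for later lookup.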


pages: 320 words: 87,853

The Black Box Society: The Secret Algorithms That Control Money and Information by Frank Pasquale

Adam Curtis, Affordable Care Act / Obamacare, Alan Greenspan, algorithmic trading, Amazon Mechanical Turk, American Legislative Exchange Council, asset-backed security, Atul Gawande, bank run, barriers to entry, basic income, Bear Stearns, Berlin Wall, Bernie Madoff, Black Swan, bonus culture, Brian Krebs, business cycle, business logic, call centre, Capital in the Twenty-First Century by Thomas Piketty, Chelsea Manning, Chuck Templeton: OpenTable:, cloud computing, collateralized debt obligation, computerized markets, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, cryptocurrency, data science, Debian, digital rights, don't be evil, drone strike, Edward Snowden, en.wikipedia.org, Evgeny Morozov, Fall of the Berlin Wall, Filter Bubble, financial engineering, financial innovation, financial thriller, fixed income, Flash crash, folksonomy, full employment, Gabriella Coleman, Goldman Sachs: Vampire Squid, Google Earth, Hernando de Soto, High speed trading, hiring and firing, housing crisis, Ian Bogost, informal economy, information asymmetry, information retrieval, information security, interest rate swap, Internet of things, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Bogle, Julian Assange, Kevin Kelly, Kevin Roose, knowledge worker, Kodak vs Instagram, kremlinology, late fees, London Interbank Offered Rate, London Whale, machine readable, Marc Andreessen, Mark Zuckerberg, Michael Milken, mobile money, moral hazard, new economy, Nicholas Carr, offshore financial centre, PageRank, pattern recognition, Philip Mirowski, precariat, profit maximization, profit motive, public intellectual, quantitative easing, race to the bottom, reality distortion field, recommendation engine, regulatory arbitrage, risk-adjusted returns, Satyajit Das, Savings and loan crisis, search engine result page, shareholder value, Silicon Valley, Snapchat, social intelligence, Spread Networks laid a new fibre optics cable between New York and Chicago, statistical arbitrage, statistical model, Steven Levy, technological solutionism, the scientific method, too big to fail, transaction costs, two-sided market, universal basic income, Upton Sinclair, value at risk, vertical integration, WikiLeaks, Yochai Benkler, zero-sum game

A bad credit score may cost a borrower hundreds of thousands of dollars, but he will never understand exactly how it was calculated. A predictive analytics firm may score someone as a “high cost” or “unreliable” worker, yet never tell her about the decision. More benignly, perhaps, these companies influence the choices we make ourselves. Recommendation engines at Amazon and YouTube affect an automated familiarity, gently suggesting offerings they think we’ll like. But don’t discount the significance of that “perhaps.” The economic, political, and cultural agendas behind their suggestions are hard to unravel. As middlemen, they specialize in shifting alliances, sometimes advancing the interests of customers, sometimes suppliers: all to orchestrate an online world that maximizes their own profits.

Similar protocols also influence—invisibly—not only the route we take to a new restaurant, but which restaurant Google, Yelp, OpenTable, or Siri recommends to us. They might help us find reviews of the car we drive. Yet choosing a car, or even a restaurant, is not as straightforward as optimizing an engine or routing a drive. Does the recommendation engine take into account, say, whether the restaurant or car company gives its workers health benefits or maternity leave? Could we prompt it to do so? In their race for the most profitable methods of mapping social reality, the data scientists of Silicon Valley and Wall Street tend to treat recommendations as purely technical problems.

Even if it is the former, we should note that Google’s autosuggest feature may have automatically entered the word “bomb” after “pressure cooker” while he was typing—certainly many people would have done the search in the days after the Boston bombing merely to learn just how lethal such an attack could be. The police had no way of knowing whether Catalano had actually typed “bomb” himself, or accidentally clicked on it thanks to Google’s increasingly aggressive recommendation engines. See also Philip Bump, “Update: Now We Know Why Googling ‘Pressure Cookers’ Gets a Visit from the Cops,” The Wire, August 1, 2013, http://www.thewire.com/national/2013/08/government-knocking-doors-because-google-searches/67864/#.UfqCSAXy7zQ.facebook. 10. Martin Kuhn, Federal Dataveillance: Implications for Constitutional Privacy Protections (New York: LFB Scholarly Publishing, 2007), 178. 11.


pages: 398 words: 86,855

Bad Data Handbook by Q. Ethan McCallum

Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, Chuck Templeton: OpenTable:, cloud computing, cognitive dissonance, combinatorial explosion, commoditize, conceptual framework, data science, database schema, DevOps, en.wikipedia.org, Firefox, Flash crash, functional programming, Gini coefficient, hype cycle, illegal immigration, iterative process, labor-force participation, loose coupling, machine readable, natural language processing, Netflix Prize, One Laptop per Child (OLPC), power law, quantitative trading / quantitative finance, recommendation engine, selection bias, sentiment analysis, SQL injection, statistical model, supply-chain management, survivorship bias, text mining, too big to fail, web application

Facebook is powered by its Open Graph, the “people and the connections they have to everything they care about.”[68] Facebook provides an API to access this social network and make it available for integration into other networked datasets. On Twitter, the network structure resulting from friends and followers leads to recommendations of “Who to follow.” On LinkedIn, network-based recommendations include “Jobs you may be interested in” and “Groups you may like.” The recommendation engine hunch.com is built on a “Taste Graph” that “uses signals from around the Web to map members with their predicted affinity for products, services, other people, websites, or just about anything, and customizes recommended topics for them.”[69] A search on Google can be considered a type of recommendation about which of possibly millions of search hits are most relevant for a particular query.

[63] http://en.wikipedia.org/wiki/File:KochFlake.svg [64] http://blueprints.tinkerpop.com [65] http://gremlin.tinkerpop.com [66] http://gremlin.tinkerpop.com/Path-Pattern [67] Ted G. Lewis. 2009. Network Science: Theory and Applications. Wiley Publishing. [68] http://developers.facebook.com/docs/opengraph [69] “eBay Acquires Recommendation Engine Hunch.com,” http://www.businesswire.com/news/home/20111121005831/en [70] Brin, S.; Page, L. 1998. “The anatomy of a large-scale hypertextual Web search engine.” Computer Networks and ISDN Systems 30: 107–117 Chapter 14. Myths of Cloud Computing Steve Francia Myths are an important and natural part of the emergence of any new technology, product, or idea as identified by the hype cycle.

.), designed intranet search systems for portal software (at DataChannel), and combined multiple sets of directory assistance data into a searchable website (as CTO at WhitePages.com). For the past five years or so, I’ve spent most of my time at Demand Media using a wide variety of data sources to build optimization systems for advertising and content recommendation systems, with various side excursions into large-scale data-driven search engine optimization (SEO) and search engine marketing (SEM). Most of my examples will be related to work I’ve done in Ad Optimization, Content Recommendation, SEO, and SEM. These areas, as with most, have their own terminology, so a few term definitions may be helpful.


pages: 475 words: 134,707

The Hype Machine: How Social Media Disrupts Our Elections, Our Economy, and Our Health--And How We Must Adapt by Sinan Aral

Airbnb, Albert Einstein, algorithmic bias, AlphaGo, Any sufficiently advanced technology is indistinguishable from magic, AOL-Time Warner, augmented reality, behavioural economics, Bernie Sanders, Big Tech, bitcoin, Black Lives Matter, Cambridge Analytica, carbon footprint, Cass Sunstein, computer vision, contact tracing, coronavirus, correlation does not imply causation, COVID-19, crowdsourcing, cryptocurrency, data science, death of newspapers, deep learning, deepfake, digital divide, digital nomad, disinformation, disintermediation, Donald Trump, Drosophila, Edward Snowden, Elon Musk, en.wikipedia.org, end-to-end encryption, Erik Brynjolfsson, experimental subject, facts on the ground, fake news, Filter Bubble, George Floyd, global pandemic, hive mind, illegal immigration, income inequality, Kickstarter, knowledge worker, lockdown, longitudinal study, low skilled workers, Lyft, Mahatma Gandhi, Mark Zuckerberg, Menlo Park, meta-analysis, Metcalfe’s law, mobile money, move fast and break things, multi-sided market, Nate Silver, natural language processing, Neal Stephenson, Network effects, performance metric, phenotype, recommendation engine, Robert Bork, Robert Shiller, Russian election interference, Second Machine Age, seminal paper, sentiment analysis, shareholder value, Sheryl Sandberg, skunkworks, Snapchat, social contagion, social distancing, social graph, social intelligence, social software, social web, statistical model, stem cell, Stephen Hawking, Steve Bannon, Steve Jobs, Steve Jurvetson, surveillance capitalism, Susan Wojcicki, Telecommunications Act of 1996, The Chicago School, the strength of weak ties, The Wisdom of Crowds, theory of mind, TikTok, Tim Cook: Apple, Uber and Lyft, uber lyft, WikiLeaks, work culture, Yogi Berra

Instagram began blocking antivaccine-related hashtags like #vaccinescauseautism and #vaccinesarepoison. YouTube announced it is no longer allowing users to monetize antivaccine videos with ads. Pinterest banned searches for vaccine content. Facebook stopped showing pages and groups featuring antivaccine content and tweaked its recommendation engines to stop suggesting users join these groups. They also took down the Facebook ads that Larry Cook and others had been buying. The social platforms took similar steps to stem the spread of coronavirus fake news in 2020. Will these measures help slow the coronavirus, measles outbreaks, and future pandemics?

The Transparency Paradox Immediately after the Cambridge Analytica scandal broke, in an interview by Martin Giles for the MIT Technology Review, I predicted the Hype Machine was about to face a dilemma that would pull it in competing directions. On the one hand, social media platforms would face pressure to be more open and transparent about their inner workings: how their trending and ad-targeting algorithms work, how misinformation diffuses through them, and whether recommendation engines increase polarization. The world wanted Facebook and Twitter to open the kimono and reveal how it all worked, so we could understand how to use and fix social media. On the other hand, the Hype Machine would also be pushed to protect our privacy and security, to lock down consumer data, to stop sharing private information with third parties, and to protect us from data breaches like Cambridge Analytica’s.

In this case, it’s important, because if people with more economic opportunity tend to develop more diverse networks (rather than the networks providing the opportunity), then the Hype Machine is more likely to reflect economic opportunity than to create it. How important is the machine in all this? Do we just replicate our existing social networks on social media, or do the Hype Machine’s recommendation engines provide us with new economic opportunities? Erik Brynjolfsson and I collaborated with Ya Xu and Guillaume Saint-Jacques of LinkedIn to find out. Guillaume was our PhD student at MIT before going to work for Ya, LinkedIn’s director of data science. The collaboration allowed us to test the cause and effect relationship between weak ties and job mobility.


pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem

Amazon Web Services, anti-pattern, bioinformatics, business logic, commoditize, corporate governance, create, read, update, delete, data acquisition, en.wikipedia.org, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, the strength of weak ties, web application

As in the social use case, making an effective recommendation depends on understanding the connections between things, as well as the quality and strength of those connections—all of which are best expressed as a property graph. Queries are primarily graph local, in that they start with one or more identifiable subjects, whether people or resources, and thereafter discover surrounding portions of the graph. Taken together, social networks and recommendation engines provide key differentiating capabilities in the areas of retail, recruitment, sentiment analysis, search, and knowledge management. Graphs are a good fit for the densely connected data structures germane to each of these areas; storing and querying this data using a graph database allows an application to surface end-user realtime results that reflect recent changes to the data, rather than pre-calculated, stale results.

• Sparse tables with nullable columns require special checking in code, despite the presence of a schema. • Several expensive joins are needed just to discover what a customer bought. • Reciprocal queries are even more costly. “What products did a customer buy?” is relatively cheap compared to “which customers bought this product?”, which is the basis of recommendation systems. We could introduce an index, but even with an index, recursive questions such as “which customers bought this product who also bought that product?” quickly become prohibitively expensive as the degree of recursion increases. Relational databases struggle with highly-connected domains.
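The reciprocal query in this excerpt can be sketched in a few lines. This is a minimal illustration, assuming purchases fit in memory as an edge list; the names `purchases`, `bought`, `bought_by`, and `also_bought` are hypothetical and do not come from the book. The same edges are indexed in both directions, so the lookup "which customers bought this product?" costs about the same as the forward lookup.

```python
from collections import Counter, defaultdict

# Hypothetical purchase edges: (customer, product).
purchases = [
    ("alice", "kettle"), ("alice", "teapot"),
    ("bob", "kettle"), ("bob", "mug"),
    ("carol", "kettle"), ("carol", "teapot"),
]

# Index the same edges in both directions, as an adjacency structure would.
bought = defaultdict(set)      # customer -> products
bought_by = defaultdict(set)   # product  -> customers
for customer, product in purchases:
    bought[customer].add(product)
    bought_by[product].add(customer)


def also_bought(product):
    """Count other products bought by customers who bought `product`."""
    counts = Counter()
    for customer in bought_by[product]:
        for other in bought[customer]:
            if other != product:
                counts[other] += 1
    return counts


print(also_bought("kettle").most_common())
```

Each extra hop ("who also bought that product?") is another pass over neighbouring sets rather than another join, which is the property the excerpt attributes to graph-shaped storage.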


pages: 481 words: 125,946

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence by John Brockman

Adam Curtis, agricultural Revolution, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Anthropocene, artificial general intelligence, augmented reality, autism spectrum disorder, autonomous vehicles, backpropagation, basic income, behavioural economics, bitcoin, blockchain, bread and circuses, Charles Babbage, clean water, cognitive dissonance, Colonization of Mars, complexity theory, computer age, computer vision, constrained optimization, corporate personhood, cosmological principle, cryptocurrency, cuban missile crisis, Danny Hillis, dark matter, data science, deep learning, DeepMind, Demis Hassabis, digital capitalism, digital divide, digital rights, discrete time, Douglas Engelbart, driverless car, Elon Musk, Emanuel Derman, endowment effect, epigenetics, Ernest Rutherford, experimental economics, financial engineering, Flash crash, friendly AI, functional fixedness, global pandemic, Google Glasses, Great Leap Forward, Hans Moravec, hive mind, Ian Bogost, income inequality, information trail, Internet of things, invention of writing, iterative process, James Webb Space Telescope, Jaron Lanier, job automation, Johannes Kepler, John Markoff, John von Neumann, Kevin Kelly, knowledge worker, Large Hadron Collider, lolcat, loose coupling, machine translation, microbiome, mirror neurons, Moneyball by Michael Lewis explains big data, Mustafa Suleyman, natural language processing, Network effects, Nick Bostrom, Norbert Wiener, paperclip maximiser, pattern recognition, Peter Singer: altruism, phenotype, planetary scale, Ray Kurzweil, Recombinant DNA, recommendation engine, Republic of Letters, RFID, Richard Thaler, Rory Sutherland, Satyajit Das, Search for Extraterrestrial Intelligence, self-driving car, sharing economy, Silicon Valley, Skype, smart contracts, social intelligence, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, 
Stuxnet, superintelligent machines, supervolcano, synthetic biology, systems thinking, tacit knowledge, TED Talk, the scientific method, The Wisdom of Crowds, theory of mind, Thorstein Veblen, too big to fail, Turing machine, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!, We are as Gods, Y2K

Conceptually, autonomous or artificial intelligence systems can develop in two ways: either as an extension of human thinking or as radically new thinking. Call the first “Humanoid Thinking,” or Humanoid AI, and the second “Alien Thinking,” or Alien AI. Almost all AI today is Humanoid Thinking. We use AI to solve problems too difficult, time-consuming, or boring for our limited brains to process: electrical-grid balancing, recommendation engines, self-driving cars, face recognition, trading algorithms, and the like. These artificial agents work in narrow domains with clear goals their human creators specify. Such AI aims to accomplish human objectives—often better, with fewer cognitive errors, distractions, outbursts of bad temper, or processing limitations.

Computer programs can keep track of a student’s performance, and some provide corrective feedback for common errors. But each brain is different, and there’s no substitute for a human teacher who has a long-term relationship with the student. Is it possible to create an artificial mentor for each student? We already have recommender systems on the Internet that tell us, “If you liked X, you might also like Y,” based on data of many others with similar patterns of preference. Someday the mind of each student may be tracked from childhood by a personalized deep-learning system. To achieve this level of understanding of a human mind is beyond the capabilities of current technology, but there are already efforts at Facebook to use their vast social database of friends, photos, and likes to create a Theory of Mind for every person on the planet.
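The "If you liked X, you might also like Y" pattern mentioned in this excerpt can be approximated by counting co-occurrences. The sketch below is one simple way to do it, not a description of any production recommender; the `likes` data and the function name `might_also_like` are invented for illustration.

```python
from collections import Counter

# Hypothetical "likes" data: user -> set of liked items.
likes = {
    "u1": {"X", "Y", "Z"},
    "u2": {"X", "Y"},
    "u3": {"X", "W"},
    "u4": {"V", "W"},
}


def might_also_like(item, likes, top_n=2):
    """Rank items by how many users who liked `item` also liked them."""
    counts = Counter()
    for liked in likes.values():
        if item in liked:
            counts.update(liked - {item})
    return [other for other, _ in counts.most_common(top_n)]


print(might_also_like("X", likes))
```

Here "Y" ranks first because two of the three users who liked "X" also liked it. Real systems typically normalise for item popularity, which this sketch does not attempt.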

So my prediction is that as more and more cognitive appliances, like chess-playing programs and recommender systems are devised, humans will become smarter and more capable. SHALLOW LEARNING SETH LLOYD Professor of quantum mechanical engineering, MIT; author, Programming the Universe Pity the poor folks at the National Security Agency: They’re spying on everyone (quelle surprise!) and everyone is annoyed at them.


pages: 579 words: 160,351

Breaking News: The Remaking of Journalism and Why It Matters Now by Alan Rusbridger

"World Economic Forum" Davos, accounting loophole / creative accounting, Airbnb, Andy Carvin, banking crisis, Bellingcat, Bernie Sanders, Bletchley Park, Boris Johnson, Brexit referendum, Cambridge Analytica, centre right, Chelsea Manning, citizen journalism, country house hotel, cross-subsidies, crowdsourcing, data science, David Attenborough, David Brooks, death of newspapers, Donald Trump, Doomsday Book, Double Irish / Dutch Sandwich, Downton Abbey, Edward Snowden, Etonian, Evgeny Morozov, fake news, Filter Bubble, folksonomy, forensic accounting, Frank Gehry, future of journalism, G4S, high net worth, information security, invention of movable type, invention of the printing press, Jeff Bezos, jimmy wales, Julian Assange, Large Hadron Collider, Laura Poitras, Mark Zuckerberg, Mary Meeker, Menlo Park, natural language processing, New Journalism, offshore financial centre, oil shale / tar sands, open borders, packet switching, Panopticon Jeremy Bentham, post-truth, pre–internet, ransomware, recommendation engine, Ruby on Rails, sexual politics, Silicon Valley, Skype, Snapchat, social web, Socratic dialogue, sovereign wealth fund, speech recognition, Steve Bannon, Steve Jobs, the long tail, The Wisdom of Crowds, Tim Cook: Apple, traveling salesman, upwardly mobile, WikiLeaks, Yochai Benkler

Web 2.0 – the thing Emily had warned was going to take over the world – was now called social media. The GMG CEO Carolyn McCall and I took another swing to the West Coast to see what was on the horizon. We dropped in on Flickr, the picture-sharing platform; on Yahoo; on Google; on Topix.net, a content aggregator in Palo Alto. We had drinks with the founders of Digg, a social recommendation platform; tea with Knight Ridder in San Jose; coffee with Real Networks and then on to Microsoft in Seattle. So many people trying so many different things; vast sums of money in play; the speed of development; the seeming impossibility of picking who would be the next big thing and who, in a couple of months, would have shut up shop or sold out.


pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, billion-dollar mistake, bitcoin, blockchain, business intelligence, business logic, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, data science, database schema, deep learning, DevOps, distributed ledger, Donald Knuth, Edward Snowden, end-to-end encryption, Ethereum, ethereum blockchain, exponential backoff, fake news, fault tolerance, finite state, Flash crash, Free Software Foundation, full text search, functional programming, general-purpose programming language, Hacker News, informal economy, information retrieval, Infrastructure as a Service, Internet of things, iterative process, John von Neumann, Ken Thompson, Kubernetes, Large Hadron Collider, level 1 cache, loose coupling, machine readable, machine translation, Marc Andreessen, microservices, natural language processing, Network effects, no silver bullet, operational security, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, SQL injection, statistical model, surveillance capitalism, systematic bias, systems thinking, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

The discussion in Chapter 2 was focused around OLTP-style use: quickly executing queries to find a small number of vertices matching certain criteria. It is also interesting to look at graphs in a batch processing context, where the goal is to perform some kind of offline processing or analysis on an entire graph. This need often arises in machine learning applications such as recommendation engines, or in ranking systems. For example, one of the most famous graph analysis algorithms is PageRank [69], which tries to estimate the popularity of a web page based on what other web pages link to it. It is used as part of the formula that determines the order in which web search engines present their results.
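As a rough illustration of the batch graph analysis described here, the following is a minimal power-iteration sketch of PageRank over a small in-memory link graph. The damping factor of 0.85, the fixed iteration count, and the even redistribution of rank from pages with no outbound links are common textbook choices assumed for this example; the excerpt does not specify them, and the `links` data is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outbound links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

Page "c" ends up with the highest score because three of the four pages link to it. At web scale the same iteration would be expressed as a distributed batch job rather than a Python loop.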

If you lose derived data, you can recreate it from the original source. A classic example is a cache: data can be served from the cache if present, but if the cache doesn’t contain what you need, you can fall back to the underlying database. Denormalized values, indexes, and materialized views also fall into this category. In recommendation systems, predictive summary data is often derived from usage logs. Technically speaking, derived data is redundant, in the sense that it duplicates existing information. However, it is often essential for getting good performance on read queries. It is commonly denormalized. You can derive several different datasets from a single source, enabling you to look at the data from different “points of view.”
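The cache example in this excerpt can be sketched as a read-through wrapper. This is an illustrative toy, assuming a plain dict stands in for the underlying database; the class name `ReadThroughCache` and its methods are hypothetical.

```python
class ReadThroughCache:
    """Derived data: every cached value can be rebuilt from the source."""

    def __init__(self, source):
        self._source = source   # system of record (here, a plain dict)
        self._cache = {}        # derived copy; safe to lose
        self.misses = 0

    def get(self, key):
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._source[key]   # fall back to the source
        return self._cache[key]

    def clear(self):
        """Simulate losing the derived data."""
        self._cache.clear()


database = {"user:1": "Ada", "user:2": "Grace"}
cache = ReadThroughCache(database)
cache.get("user:1")   # miss: loaded from the source
cache.get("user:1")   # hit: served from the cache
cache.clear()         # derived data lost
cache.get("user:1")   # miss again: rebuilt from the source
print(cache.misses)
```

Losing the cache costs only extra reads against the source; nothing is unrecoverable, which is what makes the cached copy derived data.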

To handle these dependencies between job executions, various workflow schedulers for Hadoop have been developed, including Oozie, Azkaban, Luigi, Airflow, and Pinball [28]. These schedulers also have management features that are useful when maintaining a large collection of batch jobs. Workflows consisting of 50 to 100 MapReduce jobs are common when building recommendation systems [29], and in a large organization, many different teams may be running different jobs that read each other’s output. Tool support is important for managing such complex dataflows. Various higher-level tools for Hadoop, such as Pig [30], Hive [31], Cascading [32], Crunch [33], and FlumeJava [34], also set up workflows of multiple MapReduce stages that are automatically wired together appropriately.
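The dependency handling that such schedulers provide comes down to running jobs in an order that respects which outputs feed which inputs. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the job names are hypothetical, and this is not the configuration format or API of Oozie, Azkaban, Luigi, Airflow, or Pinball.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: job -> set of jobs whose output it reads.
jobs = {
    "join_events": {"extract_users", "extract_clicks"},
    "train_model": {"join_events"},
    "publish_recommendations": {"train_model"},
}

# static_order() yields each job only after all of its dependencies.
order = list(TopologicalSorter(jobs).static_order())
print(order)
```

A real scheduler adds retries, monitoring, and parallel execution of independent jobs on top of this ordering.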


pages: 518 words: 49,555

Designing Social Interfaces by Christian Crumlish, Erin Malone

A Pattern Language, Amazon Mechanical Turk, anti-pattern, barriers to entry, c2.com, carbon footprint, cloud computing, collaborative editing, commons-based peer production, creative destruction, crowdsourcing, en.wikipedia.org, Firefox, folksonomy, Free Software Foundation, game design, ghettoisation, Howard Rheingold, hypertext link, if you build it, they will come, information security, lolcat, Merlin Mann, Nate Silver, Network effects, Potemkin village, power law, recommendation engine, RFC: Request For Comment, semantic web, SETI@home, Skype, slashdot, social bookmarking, social graph, social software, social web, source of truth, stealth mode startup, Stewart Brand, systems thinking, tacit knowledge, telepresence, the long tail, the strength of weak ties, The Wisdom of Crowds, web application, Yochai Benkler

I’d venture that thumb-voting and the recommender system are a huge part of why many people buy TiVo in the first place. (OK, that plus “pause live TV.”) Items with a great deal of persistence (on the extreme end are real-world establishments, such as restaurants or businesses) make excellent candidates for rateability. Furthermore, the types of ratings we can ask for may be more involved. Because these establishments will persist, we can be reasonably sure that others will always come along afterward and benefit from the work that the community has put into the item. When it comes to explicitly input recommender systems, we should acknowledge the limitations of folks’ interest in “feeding the machine.”

This network can give rich social rewards to those who participate; however, more and more participants are finding that the rewards extend beyond just being social and discovering that the connectedness and serendipity of ambient intimacy can bring great professional gains as well. These days, ambient intimacy plays many roles in my life: it has stopped me from missing an important international flight and helped me keep sane whilst at home with a small baby. It is my outsourced tech support resource, my recommendation engine, my news filter. Twitter lets me virtually attend conferences I can’t get to but am interested in. But most valuable of all, it has allowed me to create, maintain, and even build professional and personal relationships with people in my field whose work I admire and from whom I have been able to learn and develop as a professional.


pages: 254 words: 79,052

Evil by Design: Interaction Design to Lead Us Into Temptation by Chris Nodder

4chan, affirmative action, Amazon Mechanical Turk, cognitive dissonance, crowdsourcing, Daniel Kahneman / Amos Tversky, Donald Trump, drop ship, Dunning–Kruger effect, en.wikipedia.org, endowment effect, game design, gamification, haute couture, Ian Bogost, jimmy wales, Jony Ive, Kickstarter, late fees, lolcat, loss aversion, Mark Zuckerberg, meta-analysis, Milgram experiment, Monty Hall problem, Netflix Prize, Nick Leeson, Occupy movement, Paradox of Choice, pets.com, price anchoring, recommendation engine, Rory Sutherland, Silicon Valley, Stanford prison experiment, stealth mode startup, Steve Jobs, sunk-cost fallacy, TED Talk, telemarketer, Tim Cook: Apple, trickle-down economics, upwardly mobile

These configurators reduce the load on customers by presenting options in groups (drivetrain, body, interior) and by also offering packages that combine many options into a single trim level, ideal for the satisficers. To reduce the confusion caused by the number of options while still retaining the perception of quality, many sites employ recommendation engines or filters. Recommendation engines provide a small set of options based on either comparison with prior behavior or on answers to a set of preference questions. Netflix uses a recommendation engine to suggest new movies based on ones that customers have already watched. Its business is so dependent upon this functionality that it recently offered a one million dollar prize to anyone who could increase the accuracy of the engine by more than 10 percent.

So the trick is to demonstrate that you have sufficient options to keep the maximizers happy but also provide tools that allow both the maximizers and the satisficers to find the options they want quickly. The three techniques you can use (alone or in combination) are to present many compatible choices, to use a recommendation engine or filter, and to offer a best choice guarantee. Brands that offer greater variety of compatible (that is, focused and internally consistent) options are perceived as having greater commitment and expertise in the category, which, in turn, enhances their perceived quality and purchase likelihood.

Currently, 75 percent of movies watched on Netflix come from a recommendation made by the site. Recommendation engines are a great way to limit choice from an otherwise overwhelming quantity of items. (Netflix.com) Filters rely less on preference algorithms and more on on-screen choices. Customers refine a product search by choosing successive properties of the product they are looking for—size, color, style, and brand—until they have narrowed the set down to a manageable group. Because individuals are responsible for each successive decision, they should still feel invested in the outcome but not overwhelmed by the number of items they’ve discarded during the process.
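The filter behaviour described in this excerpt, where each on-screen choice narrows the remaining set, can be sketched as repeated property matching. The catalog, its field names, and the `refine` helper are invented for illustration.

```python
# Hypothetical catalog; field names are illustrative only.
catalog = [
    {"name": "Trail Runner", "size": 9, "color": "blue", "brand": "Acme"},
    {"name": "City Walker", "size": 9, "color": "black", "brand": "Acme"},
    {"name": "Road Racer", "size": 10, "color": "blue", "brand": "Zephyr"},
    {"name": "Peak Hiker", "size": 9, "color": "blue", "brand": "Zephyr"},
]


def refine(items, **choices):
    """Keep only the items matching every property chosen in this step."""
    return [
        item for item in items
        if all(item.get(prop) == value for prop, value in choices.items())
    ]


# Each successive choice is applied to the previous result.
step1 = refine(catalog, size=9)
step2 = refine(step1, color="blue")
step3 = refine(step2, brand="Zephyr")
print(len(step1), len(step2), len(step3))
```

Unlike a recommendation engine, nothing here is inferred: the customer makes every narrowing decision, which matches the excerpt's point about staying invested in the outcome.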


pages: 283 words: 85,824

The People's Platform: Taking Back Power and Culture in the Digital Age by Astra Taylor

"World Economic Forum" Davos, A Declaration of the Independence of Cyberspace, Aaron Swartz, Alan Greenspan, American Legislative Exchange Council, Andrew Keen, AOL-Time Warner, barriers to entry, Berlin Wall, big-box store, Brewster Kahle, business logic, Californian Ideology, citizen journalism, cloud computing, collateralized debt obligation, Community Supported Agriculture, conceptual framework, content marketing, corporate social responsibility, creative destruction, cross-subsidies, crowdsourcing, David Brooks, digital capitalism, digital divide, digital Maoism, disinformation, disintermediation, don't be evil, Donald Trump, Edward Snowden, Evgeny Morozov, Fall of the Berlin Wall, Filter Bubble, future of journalism, Gabriella Coleman, gentrification, George Gilder, Google Chrome, Google Glasses, hive mind, income inequality, informal economy, Internet Archive, Internet of things, invisible hand, Jane Jacobs, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Perry Barlow, Julian Assange, Kevin Kelly, Kickstarter, knowledge worker, Laura Poitras, lolcat, Mark Zuckerberg, means of production, Metcalfe’s law, Naomi Klein, Narrative Science, Network effects, new economy, New Journalism, New Urbanism, Nicholas Carr, oil rush, peer-to-peer, Peter Thiel, planned obsolescence, plutocrats, post-work, power law, pre–internet, profit motive, recommendation engine, Richard Florida, Richard Stallman, self-driving car, shareholder value, sharing economy, Sheryl Sandberg, Silicon Valley, Silicon Valley billionaire, Silicon Valley ideology, slashdot, Slavoj Žižek, Snapchat, social graph, Steve Jobs, Stewart Brand, technological solutionism, technoutopianism, TED Talk, the long tail, trade route, Tragedy of the Commons, vertical integration, Whole Earth Catalog, WikiLeaks, winner-take-all economy, Works Progress Administration, Yochai Benkler, young professional

A more democratic culture is one where previously excluded populations are given the material means to fully engage. To create a culture that is more diverse and inclusive, we have to pioneer ways of addressing discrimination and bias head-on, despite the difficulties of applying traditional methods of mitigating prejudice to digital networks. We have to shape our tools of discovery, the recommendation engines and personalization filters, so they do more than reinforce our prior choices and private bubbles. Finally, if we want a culture that is more resistant to the short-term expectations of corporate shareholders and the whims of marketers, we have to invest in noncommercial enterprises. There is no shortage of good ideas.

Huberman, “The Persistence Paradox,” First Monday 15, nos. 1–4 (January 2010). 36. James Evans, “Electronic Publication and the Narrowing of Science and Scholarship,” Science 321, no. 5887 (July 18, 2008): 395–99. 37. Daniel M. Fleder and Kartik Hosanagar, “Blockbuster Culture’s Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity,” Management Science 55, no. 5 (May 2009): 697–712. 38. Evan Hughes, “Here’s How Amazon Self-Destructs,” Salon, July 19, 2013. 39. Gary Flake et al., “Winners Don’t Take All: Characterizing the Competition for Links on the Web,” Proceedings of the National Academy of Sciences 99, no. 8 (April 16, 2002). 40.


pages: 259 words: 84,261

Scary Smart: The Future of Artificial Intelligence and How You Can Save Our World by Mo Gawdat

3D printing, accounting loophole / creative accounting, AI winter, AlphaGo, anthropic principle, artificial general intelligence, autonomous vehicles, basic income, Big Tech, Black Lives Matter, Black Monday: stock market crash in 1987, butterfly effect, call centre, carbon footprint, cloud computing, computer vision, coronavirus, COVID-19, CRISPR, cryptocurrency, deep learning, deepfake, DeepMind, Demis Hassabis, digital divide, digital map, Donald Trump, Elon Musk, fake news, fulfillment center, game design, George Floyd, global pandemic, Google Glasses, Google X / Alphabet X, Law of Accelerating Returns, lockdown, microplastics / micro fibres, Nick Bostrom, off-the-grid, OpenAI, optical character recognition, out of africa, pattern recognition, Ponzi scheme, Ray Kurzweil, recommendation engine, self-driving car, Silicon Valley, smart contracts, Stanislav Petrov, Stephen Hawking, subprime mortgage crisis, superintelligent machines, TED Talk, TikTok, Turing machine, Turing test, universal basic income, Watson beat the top human players on Jeopardy!, Y2K

The train has left the station and, due to the three inevitables, we are just about to be supervised by a GLaDOS and all her infinitely intelligent brothers and sisters. Make no mistake, even as we speak, intelligent machines are observing us like lab rats. They are monitoring our every move and designing tests to see how we react. From the ad engines of Google to the personalization and recommendation engines of Instagram and YouTube, from the music recommendation engines of Spotify and Apple Music to the product recommendation engines of Amazon, from the chatbots to the discrimination engines of dating apps, we are the lab rats, you and me, and we are being led blindly through the maze. And what are we being promised? Digital cake – a piece of worthless content or an uninformed opinion.

What will completely sway the needle, however, is when AI itself understands this rule of engagement – do good if you want my attention – better than the humans do. So don’t approve of killing machines, even if you are patriotic and they are killing on behalf of your own country. Don’t keep feeding the recommendation engines of social media with hours and hours of your daily life. Don’t ever click on content recommended to you, search for what you actually need and don’t click on ads. Don’t approve of FinTech AI that uses machine intelligence to trade or aid the wealth concentration of a few. Don’t share about these on your LinkedIn page.

Post about every positive, friendly, healthy use of AI you find, to make others aware of it. Stand Together We should teach each other, so we collectively become smarter at identifying what is good for humanity. Don’t believe the lies you are told. It’s called the ‘defence’ industry but in reality it is mostly about offence. It’s called a ‘recommendation engine’ when in reality it is about manipulation and distraction. We are told that ‘people who bought this also bought that’ when in reality what should be said is ‘can we tempt you to buy this too?’ We are told how many found love on a dating site but not told how many were left broken-hearted. They call it a ‘matching’ algorithm when actually it is a filtering algorithm that connects you only to those the AI believes you are good enough to attract.


pages: 504 words: 67,845

Designing Web Interfaces: Principles and Patterns for Rich Interactions by Bill Scott, Theresa Neil

A Pattern Language, anti-pattern, en.wikipedia.org, Firefox, recommendation engine, Ruby on Rails, Silicon Valley, web application

It turns out that voting and rating systems are the most common places to make tools always visible. Netflix was the earliest to use a one-click rating system (Figure 4-4). Figure 4-4. Netflix star ratings are always visible. Just as with Digg, rating movies is central to the health of Netflix. The Cinematch™ recommendation engine is driven largely by the user's ratings. So a clear call to action (to rate) is important. Not only do the stars serve as a strong call to action to rate movies, but they also provide important information for the other in-context tool: the "Add" button. Adding movies to your movie-shipping queue is key to having a good experience with the Netflix service.

In fact, the Gap, Old Navy, Banana Republic, and PiperLime all share the same Inline Assistant Process-style shopping cart. The Gap is betting that making it quick and easy to add items to the cart across four stores will equal more sales. Additional step: Amazon, on the other hand, is betting on its recommendation engine. By going to a second page, Amazon can display other shirts like the one added—as well as advertise the Amazon.com Visa card (Figure 8-8). Figure 8-8. Amazon shows recommendations when confirming an add to its shopping cart. Which is the better experience? The Gap seems to be the clear winner in pure user experience.

Netflix displays its recommendations in an overlay. Each movie on the site has an "Add" button. Clicking "Add" immediately adds the movie to the user's queue. As a confirmation and an opportunity for recommendations, a Dialog Overlay is displayed on top of the movie page. Just like Amazon, Netflix has a sophisticated recommendation engine. The bet is that since the user has expressed interest in an item (shirt or movie), the site can find other items similar to it to suggest. Amazon does this in a separate page. Netflix does it in an overlay that is easily dismissed by clicking anywhere outside the overlay (or by clicking the close button at the top or bottom).


pages: 285 words: 86,853

What Algorithms Want: Imagination in the Age of Computing by Ed Finn

Airbnb, Albert Einstein, algorithmic bias, algorithmic management, algorithmic trading, AlphaGo, Amazon Mechanical Turk, Amazon Web Services, bitcoin, blockchain, business logic, Charles Babbage, Chuck Templeton: OpenTable:, Claude Shannon: information theory, commoditize, Computing Machinery and Intelligence, Credit Default Swap, crowdsourcing, cryptocurrency, data science, DeepMind, disruptive innovation, Donald Knuth, Donald Shoup, Douglas Engelbart, Elon Musk, Evgeny Morozov, factory automation, fiat currency, Filter Bubble, Flash crash, game design, gamification, Google Glasses, Google X / Alphabet X, Hacker Conference 1984, High speed trading, hiring and firing, Ian Bogost, industrial research laboratory, invisible hand, Isaac Newton, iterative process, Jaron Lanier, Jeff Bezos, job automation, John Conway, John Markoff, Just-in-time delivery, Kickstarter, Kiva Systems, late fees, lifelogging, Loebner Prize, lolcat, Lyft, machine readable, Mother of all demos, Nate Silver, natural language processing, Neal Stephenson, Netflix Prize, new economy, Nicholas Carr, Nick Bostrom, Norbert Wiener, PageRank, peer-to-peer, Peter Thiel, power law, Ray Kurzweil, recommendation engine, Republic of Letters, ride hailing / ride sharing, Satoshi Nakamoto, self-driving car, sharing economy, Silicon Valley, Silicon Valley billionaire, Silicon Valley ideology, Silicon Valley startup, SimCity, Skinner box, Snow Crash, social graph, software studies, speech recognition, statistical model, Steve Jobs, Steven Levy, Stewart Brand, supply-chain management, tacit knowledge, TaskRabbit, technological singularity, technological solutionism, technoutopianism, the Cathedral and the Bazaar, The Coming Technological Singularity, the scientific method, The Signal and the Noise by Nate Silver, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, transaction costs, traveling salesman, Turing machine, Turing test, Uber and Lyft, Uber for X, uber lyft, urban planning, Vannevar Bush, Vernor Vinge, wage slave

Going farther from shore, the deep waters of algorithmic imagination draw us relentlessly back toward ourselves and the mysterious origins of cognition, inspiration, and serendipity that drive creative work. How are computational systems reinventing, channeling, or modulating those processes? On an individual level this is a straightforward extension of technics: when does the memory bank, the virtual assistant, or the recommendation engine deserve credit in the creative process? These tools manage cognition, inspiration, and serendipity for us, generating conversation and intellectual connection in our social media streams, our digital workspaces and notebooks, and more broadly, in the horizon of visible knowledge. The writer using a word processor to manage drafts; the scientist using research databases and citation tools to manage a field of professional knowledge; the artist using image editing software, photo sharing tools, and a virtual notebook to track observations—all of these creative processes depend on tools that are increasingly active, occasionally manipulative agents in their own use.

At the same time, we are deeply compelled by these abstracting systems, by the romance of clean interfaces and tidy ontologies. Even with thousands of human hours encoded into its recommendations, Netflix presents a seamless computational facade, because we have arrived at a stage where many of us will trust a strange computer’s suggestions more than we will trust a stranger’s. The rhetoric of the recommendation system is so successful because it black boxes the task of judgment, asking us to trust the efficacy of personalization embedded in the algorithm. By contrast, reading movie critics or browsing sites like IMDb or Rotten Tomatoes requires us to evaluate the evaluators in a much more complicated, human way, measuring the applicability of advice generated by other personalities who might not share our tastes.


pages: 406 words: 88,820

Television disrupted: the transition from network to networked TV by Shelly Palmer

AOL-Time Warner, barriers to entry, call centre, commoditize, disintermediation, en.wikipedia.org, folksonomy, Golden age of television, hypertext link, interchangeable parts, invention of movable type, Irwin Jacobs: Qualcomm, James Watt: steam engine, Leonard Kleinrock, linear programming, Marc Andreessen, market design, Metcalfe’s law, pattern recognition, peer-to-peer, power law, recommendation engine, Saturday Night Live, shareholder value, Skype, spectrum auction, Steve Jobs, subscription business, Telecommunications Act of 1996, the long tail, There's no reason for any individual to have a computer in his home - Ken Olsen, Vickrey auction, Vilfredo Pareto, yield management

The key problem with on-demand technology is not desire; it is complexity. It’s just too hard for the average person to do. Now, making a playlist in iTunes could not be simpler. But, putting your iPod in shuffle mode is actually easier, and it is also the path of least resistance. There are other factors that help with playlist creation. Recommendation engines and collaborative filtering like Amazon’s “if you like this … you might also like …” are good ways to help people pick the right stuff for their playlists. Consumers can also skew shuffle modes, setting them to play the content they manually play the most more often than the content they play less often.
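The skewed shuffle described above can be sketched as a weighted random pick. This is a minimal illustration, not how any particular player implements it; the function name `skewed_shuffle_pick`, the track names, and the play counts are all hypothetical.

```python
import random


def skewed_shuffle_pick(play_counts, rng=random):
    """Pick one track, weighting each by how often it was played manually."""
    tracks = list(play_counts)
    # Add 1 so tracks that were never played manually can still come up
    weights = [play_counts[track] + 1 for track in tracks]
    # random.choices draws with replacement according to the given weights
    return rng.choices(tracks, weights=weights, k=1)[0]


# Hypothetical manual play counts, invented for illustration
play_counts = {"favorite": 50, "sometimes": 10, "rarely played": 0}

print(skewed_shuffle_pick(play_counts))
```

Passing a seeded `random.Random` instance as `rng` makes the picks repeatable, which helps when testing the weighting.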

, MSN, Amazon, eBay, and of course, about every existing broadcast and cable network. A trip to the video section of the Apple Music Store through iTunes is a very interesting experience, particularly when you see how the interface handles show branding vs. network branding. Social Search Solution: Another probable future is Tim Halle’s vision of a “social search,” a recommendation system that will emerge from social networking sites. Of course, the biggest social networking sites like friendster.com or myspace.com are also big brands, so this may be just another permutation of branded search. Copyright © 2006, Shelly Palmer. All rights reserved.


pages: 344 words: 96,020

Hacking Growth: How Today's Fastest-Growing Companies Drive Breakout Success by Sean Ellis, Morgan Brown

Airbnb, Amazon Web Services, barriers to entry, behavioural economics, Ben Horowitz, bounce rate, business intelligence, business process, content marketing, correlation does not imply causation, crowdsourcing, dark pattern, data science, DevOps, disruptive innovation, Elon Musk, game design, gamification, Google Glasses, growth hacking, Internet of things, inventory management, iterative process, Jeff Bezos, Khan Academy, Kickstarter, Lean Startup, Lyft, Mark Zuckerberg, market design, minimum viable product, multi-armed bandit, Network effects, Paul Graham, Peter Thiel, Ponzi scheme, recommendation engine, ride hailing / ride sharing, Salesforce, Sheryl Sandberg, side project, Silicon Valley, Silicon Valley startup, Skype, Snapchat, software as a service, Steve Jobs, Steve Jurvetson, subscription business, TED Talk, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, working poor, Y Combinator, young professional

Amazon is, once again, a leading practitioner, having developed one of the most powerful “recommendation engines,” the term for the algorithmic programs that customize which items are recommended to you while browsing the site. The selections are based on a combination of a customer’s search history and buying habits, and data about the habits of other shoppers like that customer. All Amazon shoppers in effect see their own version of Amazon with a unique experience tailored to their preferences. Some recommendation engines, such as Amazon’s, as well as those deployed by Google and Netflix, are incredibly complex, but many are based on relatively simple math.
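The excerpt does not spell out the "relatively simple math." One simple approach, shown here only as an assumption and not as Amazon's actual method, is to count how often two items appear in the same basket and suggest the most frequent companions. The basket data and the helper names `build_cooccurrence` and `recommend` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count, for every item, how often each other item shares a basket with it."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() so an item repeated within one basket is counted once
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, item, k=2):
    """Return up to k items most often bought together with `item`."""
    # Sort by count (descending), then by name, so ties resolve deterministically
    ranked = sorted(counts[item].items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical purchase histories, invented for illustration only
baskets = [
    ["energy drink", "seltzer", "limes"],
    ["energy drink", "seltzer"],
    ["energy drink", "limes", "chips"],
    ["bread", "butter"],
]

counts = build_cooccurrence(baskets)
print(recommend(counts, "energy drink"))  # -> ['limes', 'seltzer']
```

Because the counts come from all shoppers' baskets, a suggestion can be made for a shopper with no purchase history of her own, and the ranking sharpens as more baskets are recorded.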

This calculation can be done for a host of combinations of every item in the store, creating powerful recommendations that lead to more purchases. And with the best recommendation engines, these product suggestions will only get better and more personalized over time because the more customers shop, the more data is available not just about what an individual customer has purchased, but also about common patterns among a large pool of shoppers. The grocery app recommendation engine might, for example, recommend seltzer water and limes when a shopper puts Red Bull in her shopping cart—even if that shopper has no history of buying any of those products—based on data that shows most people buying Red Bull are purchasing mixers for vodka.6 DON’T BE INTRUSIVE An important word of caution about customizing is that it can backfire if you’re not sensitive about how you’re doing it.


pages: 293 words: 78,439

Dual Transformation: How to Reposition Today's Business While Creating the Future by Scott D. Anthony, Mark W. Johnson

activist fund / activist shareholder / activist investor, additive manufacturing, Affordable Care Act / Obamacare, Airbnb, Amazon Web Services, Andy Rubin, Apollo 13, asset light, autonomous vehicles, barriers to entry, behavioural economics, Ben Horowitz, Big Tech, blockchain, business process, business process outsourcing, call centre, Carl Icahn, Clayton Christensen, cloud computing, commoditize, corporate governance, creative destruction, crowdsourcing, death of newspapers, disintermediation, disruptive innovation, distributed ledger, diversified portfolio, driverless car, Internet of things, invention of hypertext, inventory management, Jeff Bezos, job automation, job satisfaction, Joseph Schumpeter, Kickstarter, late fees, Lean Startup, long term incentive plan, Lyft, M-Pesa, Marc Andreessen, Marc Benioff, Mark Zuckerberg, Minecraft, obamacare, Parag Khanna, Paul Graham, peer-to-peer lending, pez dispenser, recommendation engine, Salesforce, self-driving car, shareholder value, side project, Silicon Valley, SimCity, Skype, software as a service, software is eating the world, Steve Jobs, subscription business, the long tail, the market place, the scientific method, Thomas Kuhn: the structure of scientific revolutions, transfer pricing, uber lyft, Watson beat the top human players on Jeopardy!, Y Combinator, Zipcar

Netflix set to work building sophisticated inventory management systems to help ensure that people could get the DVDs they wanted when they wanted them. The company also invested heavily to build algorithms that predicted users’ desired content based on their ratings of movies they rented. The so-called recommendations engine is so critical to Netflix that in 2008 it announced a public contest wherein the team that most improved the performance of the engine would get $1 million, as long as they crossed a 10 percent improvement threshold. Two teams indeed crossed the threshold, with the winning team receiving a check from Hastings in 2009 (remarkably, that was the first time the team members met face-to-face; they had done their work virtually).
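The excerpt does not say how "performance" was measured. The contest, as publicly described, scored entries by root mean squared error (RMSE) between predicted and actual ratings, so the 10 percent threshold can be sketched as a relative reduction in RMSE. The ratings below are invented for illustration, and the function names are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("rating lists must be non-empty and the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate lowers the baseline's error."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


def crosses_threshold(baseline_rmse, candidate_rmse, threshold=0.10):
    """True when the candidate improves on the baseline by at least `threshold`."""
    return relative_improvement(baseline_rmse, candidate_rmse) >= threshold


# Invented ratings on a 1-5 star scale
actual = [4, 3, 5, 2]
baseline_predictions = [3, 3, 4, 3]
candidate_predictions = [4, 3, 4, 2]

baseline_error = rmse(baseline_predictions, actual)
candidate_error = rmse(candidate_predictions, actual)
print(crosses_threshold(baseline_error, candidate_error))
```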

There are others, of course, such as InterActive Corp (worth about $6 billion as of this writing), which runs a collection of websites such as Match.com, About.com, and The Daily Beast; travel recommendation site TripAdvisor (worth $10 billion); real estate platform Zillow ($1.5 billion); coupon disruptor Groupon ($3 billion); local recommendations engine Yelp ($2 billion), and listicle and algorithmic innovator BuzzFeed ($1.5 billion). As of late 2016, the dozen companies here had created almost $1 trillion in market value. FIGURE 3-1 Transformation B Just because newspaper publishers didn’t create these companies doesn’t mean they couldn’t have created them.

See also curiosity capabilities link and, 74–75 disruption as, 8–12, 47–50 focusing on highest-potential, 141–142 leaders on, 196–197 stopping exploration of, 126–127 strategic opportunity areas and, 123–127 Optus, 145, 147–148, 149 Orange Is the New Black, 35 O’Reilly, Charles III, 53, 54 outsiders, involving in decision making, 109–110 overshooting, 103 Palo Alto Research Center (PARC), 13, 31 Pandesic, 78–79 parable of the eleventh floor, 77 Pathway, 58 patientslikeme.com, 60 PayPal, 200 Paytm, 202 Pearson, 67 penicillin, 139 periphery, spotting warning signs from the, 107–108 Perry, Tyler, 98 Pfizer, 17, 22, 138–139 Pharmacyclics, 19 Photoshop Express, 32 Pixar Animation Studios, 3–4 planning fallacy, 120 Playing to Win (Martin and Lafley), 124 Plunify, 72, 74 Porter, Michael, 99–100, 177 portfolio management systems, 80–82 Potemkin portfolios, 120 potential estimating current operations’, 118–119 estimating existing investments’, 119 problem solving approaches, 140–141 Procter & Gamble (P&G), 23, 64, 109 capabilities identification at, 79–80 innovation at, 146 predictability versus innovation at, 137–138 Professional Golfers’ Association, 99 Project ET, 127–128 Psychology Today, 177 purpose, 175–179 leaders on, 194–195 QQ, 106 Quantum Solutions, 51, 52 Quattro Wireless, 67 Quicken, 132–133 Qwikster, 94 Rakuten Group, 143 recommendations engine, Netflix, 33–35 reinvention, 42–43 Reminder app, 152 repositioning, 12, 27–45. See also transformation A reinvention versus, 42–43 Research in Motion (RIM), 4 revenue models, 40–41 reverse mentors, 151 Ricks College, 37, 44, 170. See also Brigham Young University-Idaho (BYU-Idaho) Ries, Eric, 65, 153 risk management early warning signs of disruption and, 102–113 growth gap determination and, 120–121 through experimentation, 64–66 toolkit for, 218–219 Ronn, Karl, 109 Rotman School of Management, 140 Rubin, Andy, 4 Rumelt, Richard P., 78, 116 Safaricom, 201 sales careful management of, 45 salesforce and, 77 Salesforce.com, 27–28, 151 The Salt Lake Tribune, 8.


pages: 283 words: 78,705

Principles of Web API Design: Delivering Value with APIs and Microservices by James Higginbotham

Amazon Web Services, anti-pattern, business intelligence, business logic, business process, Clayton Christensen, cognitive dissonance, cognitive load, collaborative editing, continuous integration, create, read, update, delete, database schema, DevOps, fallacies of distributed computing, fault tolerance, index card, Internet of things, inventory management, Kubernetes, linked data, loose coupling, machine readable, Metcalfe’s law, microservices, recommendation engine, semantic web, side project, single page application, Snapchat, software as a service, SQL injection, web application, WebSocket

The API design will incorporate internal technology decisions, sometimes to the point of requiring familiarity with a particular database or cloud vendor. For example, a public API product for a recommendation engine required the understanding of Apache Lucene to use the API. The API accepts configuration files via an HTTP POST using the Lucene configuration file format to manage the recommendation engine. The leaking of internal implementation details to API consumers resulted in the need to become Apache Lucene experts, rather than experts in using the recommendation engine API. There is value in prototyping APIs or producing evolutionary API design through a mixture of code and design.
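One way to avoid the leak described above is to let consumers express settings in domain terms and keep the translation to the engine's own format behind the API. The sketch below is an assumption for illustration: `RecommendationSettings`, `to_engine_config`, and the `internal.*` keys are hypothetical placeholders, not a real Lucene configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecommendationSettings:
    """What an API consumer sends: domain terms only, no engine vocabulary."""
    max_results: int = 10
    boost_recent_items: bool = False
    excluded_categories: tuple = ()


def to_engine_config(settings):
    """Translate consumer settings into a made-up internal engine format.

    The keys are placeholders. The point is that this mapping lives behind
    the API, so the engine can be swapped without changing the public contract.
    """
    if settings.max_results < 1:
        raise ValueError("max_results must be at least 1")
    return {
        "internal.rows": settings.max_results,
        "internal.recency_weight": 2.0 if settings.boost_recent_items else 1.0,
        "internal.filter_out": sorted(settings.excluded_categories),
    }


settings = RecommendationSettings(max_results=5, excluded_categories=("gift cards",))
print(to_engine_config(settings))
```

With this split, a consumer only needs to understand the fields of `RecommendationSettings`; knowledge of the underlying search library stays with the API provider.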


pages: 380 words: 118,675

The Everything Store: Jeff Bezos and the Age of Amazon by Brad Stone

airport security, Amazon Mechanical Turk, Amazon Web Services, AOL-Time Warner, Apollo 11, bank run, Bear Stearns, Bernie Madoff, big-box store, Black Swan, book scanning, Brewster Kahle, buy and hold, call centre, centre right, Chuck Templeton: OpenTable:, Clayton Christensen, cloud computing, collapse of Lehman Brothers, crowdsourcing, cuban missile crisis, Danny Hillis, deal flow, Douglas Hofstadter, drop ship, Elon Musk, facts on the ground, fulfillment center, game design, housing crisis, invention of movable type, inventory management, James Dyson, Jeff Bezos, John Markoff, junk bonds, Kevin Kelly, Kiva Systems, Kodak vs Instagram, Larry Ellison, late fees, loose coupling, low skilled workers, Maui Hawaii, Menlo Park, Neal Stephenson, Network effects, new economy, off-the-grid, optical character recognition, PalmPilot, pets.com, Ponzi scheme, proprietary trading, quantitative hedge fund, reality distortion field, recommendation engine, Renaissance Technologies, RFID, Rodney Brooks, search inside the book, shareholder value, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, Skype, SoftBank, statistical arbitrage, Steve Ballmer, Steve Jobs, Steven Levy, Stewart Brand, the long tail, Thomas L Friedman, Tony Hsieh, two-pizza team, Virgin Galactic, Whole Earth Catalog, why are manhole covers round?, zero-sum game

Over the next year, Miller tangled with the European divisions of Random House, Hachette, and Bloomsbury, the publisher of the Harry Potter series. “I did everything I could to screw with their performance,” he says. He took selections of their catalog to full price and yanked their books from Amazon’s recommendation engine; with some titles, like travel books, he promoted comparable books from competitors. Miller’s constant search for new points of leverage exploited the anxieties of neurotic authors who obsessively tracked sales rank—the number on Amazon.com that showed an author how well his or her book was doing compared to other products on the site.

Amazon approached large publishers aggressively. It demanded accommodations like steeper discounts on bulk purchases, longer periods to pay its bills, and shipping arrangements that leveraged Amazon’s discounts with UPS. To publishers that didn’t comply, Amazon threatened to pull their books out of its automated personalization and recommendation systems, meaning that they would no longer be suggested to customers. “Publishers didn’t really understand Amazon. They were very naïve about what was going on with their back catalog,” says Goss. “Most didn’t know their sales were up because their backlist was getting such visibility.” Amazon had an easy way to demonstrate its market power.


pages: 1,172 words: 114,305

New Laws of Robotics: Defending Human Expertise in the Age of AI by Frank Pasquale

affirmative action, Affordable Care Act / Obamacare, Airbnb, algorithmic bias, Amazon Mechanical Turk, Anthropocene, augmented reality, Automated Insights, autonomous vehicles, basic income, battle of ideas, Bernie Sanders, Big Tech, Bill Joy: nanobots, bitcoin, blockchain, Brexit referendum, call centre, Cambridge Analytica, carbon tax, citizen journalism, Clayton Christensen, collective bargaining, commoditize, computer vision, conceptual framework, contact tracing, coronavirus, corporate social responsibility, correlation does not imply causation, COVID-19, critical race theory, cryptocurrency, data is the new oil, data science, decarbonisation, deep learning, deepfake, deskilling, digital divide, digital twin, disinformation, disruptive innovation, don't be evil, Donald Trump, Douglas Engelbart, driverless car, effective altruism, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, fake news, Filter Bubble, finite state, Flash crash, future of work, gamification, general purpose technology, Google Chrome, Google Glasses, Great Leap Forward, green new deal, guns versus butter model, Hans Moravec, high net worth, hiring and firing, holacracy, Ian Bogost, independent contractor, informal economy, information asymmetry, information retrieval, interchangeable parts, invisible hand, James Bridle, Jaron Lanier, job automation, John Markoff, Joi Ito, Khan Academy, knowledge economy, late capitalism, lockdown, machine readable, Marc Andreessen, Mark Zuckerberg, means of production, medical malpractice, megaproject, meta-analysis, military-industrial complex, Modern Monetary Theory, Money creation, move fast and break things, mutually assured destruction, natural language processing, new economy, Nicholas Carr, Nick Bostrom, Norbert Wiener, nuclear winter, obamacare, One Laptop per Child (OLPC), open immigration, OpenAI, opioid epidemic / opioid crisis, paperclip maximiser, paradox of thrift, pattern recognition, payday loans, personalized medicine, Peter Singer: altruism, Philip Mirowski, pink-collar, plutocrats, post-truth, pre–internet, profit motive, public intellectual, QR code, quantitative easing, race to the bottom, RAND corporation, Ray Kurzweil, recommendation engine, regulatory arbitrage, Robert Shiller, Rodney Brooks, Ronald Reagan, self-driving car, sentiment analysis, Shoshana Zuboff, Silicon Valley, Singularitarianism, smart cities, smart contracts, software is eating the world, South China Sea, Steve Bannon, Strategic Defense Initiative, surveillance capitalism, Susan Wojcicki, tacit knowledge, TaskRabbit, technological solutionism, technoutopianism, TED Talk, telepresence, telerobotics, The Future of Employment, The Turner Diaries, Therac-25, Thorstein Veblen, too big to fail, Turing test, universal basic income, unorthodox policies, wage slave, Watson beat the top human players on Jeopardy!, working poor, workplace surveillance, Works Progress Administration, zero day

On the other hand, they do not have much competition, so there is little reason to fear user defection. Meanwhile, bots inflate platforms’ engagement numbers, the holy grail for digital marketers. In 2012, YouTube “set a company-wide objective to reach one billion hours of viewing a day, and rewrote its recommendation engine to maximize for that goal.”67 “The billion hours of daily watch time gave our tech people a North Star,” said its CEO, Susan Wojcicki. Unfortunately for YouTube users, that single-minded fixation on metrics also empowered bad actors to manipulate recommendations and drive traffic to dangerous misinformation, as discussed above.

Professionals in health and education also owe clear and well-established legal and ethical duties to patients and students. These standards are only beginning to emerge among technologists. Thus, in the case of media and journalism—the focus of Chapter 4—a concerted corrective effort will be necessary to compensate for what is now a largely automated public sphere. When it comes to advertising and recommendation systems—the lifeblood of new media—AI’s advance has been rapid. Reorganizing commercial and political life, firms like Facebook and Google have deployed AI to make the types of decisions made by managers at television networks or editors at newspapers—but with much more powerful effects. The reading and viewing habits of hundreds of millions of people have been altered by such companies.


pages: 286 words: 87,401

Blitzscaling: The Lightning-Fast Path to Building Massively Valuable Companies by Reid Hoffman, Chris Yeh

"Susan Fowler" uber, activist fund / activist shareholder / activist investor, adjacent possible, Airbnb, Amazon Web Services, Andy Rubin, autonomous vehicles, Benchmark Capital, bitcoin, Blitzscaling, blockchain, Bob Noyce, business intelligence, Cambridge Analytica, Chuck Templeton: OpenTable:, cloud computing, CRISPR, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, data science, database schema, DeepMind, Didi Chuxing, discounted cash flows, Elon Musk, fake news, Firefox, Ford Model T, forensic accounting, fulfillment center, Future Shock, George Gilder, global pandemic, Google Hangouts, Google X / Alphabet X, Greyball, growth hacking, high-speed rail, hockey-stick growth, hydraulic fracturing, Hyperloop, initial coin offering, inventory management, Isaac Newton, Jeff Bezos, Joi Ito, Khan Academy, late fees, Lean Startup, Lyft, M-Pesa, Marc Andreessen, Marc Benioff, margin call, Mark Zuckerberg, Max Levchin, minimum viable product, move fast and break things, Network effects, Oculus Rift, oil shale / tar sands, PalmPilot, Paul Buchheit, Paul Graham, Peter Thiel, pre–internet, Quicken Loans, recommendation engine, ride hailing / ride sharing, Salesforce, Sam Altman, Sand Hill Road, Saturday Night Live, self-driving car, shareholder value, sharing economy, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, Skype, smart grid, social graph, SoftBank, software as a service, software is eating the world, speech recognition, stem cell, Steve Jobs, subscription business, synthetic biology, Tesla Model S, thinkpad, three-martini lunch, transaction costs, transport as a service, Travis Kalanick, Uber for X, uber lyft, web application, winner-take-all economy, work culture, Y Combinator, yellow journalism

Climbing the learning curve for these tasks was painful and expensive, but it gave Netflix a competitive advantage over its competitors. Later, as broadband connections became more widespread, Netflix had to climb the learning curve when building out its massive streaming infrastructure while continuing to improve its consumer recommendation engine. That was when Netflix began running into a major strategic issue. Netflix relied on the studios for its content (movies and TV shows), but the studios now saw online video companies like YouTube and Netflix as a threat. In response, they began to increase the price they demanded from Netflix for licensing their content and held back some of their “crown jewels” (e.g., massively popular content like Saturday Night Live) for themselves and Hulu (an industry joint venture).

Today, Netflix might very well be the leader in original video content, and even traditional Hollywood power players, such as superproducer Shonda Rhimes (Grey’s Anatomy, Scandal, How to Get Away with Murder) and comedian Adam Sandler (Happy Gilmore, Grown Ups), have switched from traditional studios to Netflix. What’s more, the other learning curves that Netflix climbed along the way actually helped it beat the studios at their own game. The consumer recommendation engine gives Netflix an unprecedented ability to predict what content its users want to watch, which allows it to work with creators to produce that content (such as the popular drama Stranger Things). And because Netflix has greater confidence in its own predictions than its competitors have in theirs, it can outbid them for content when they go head-to-head.

The challenge was figuring out how to develop a daily use case that helped LinkedIn users with their professional lives and encouraged them to use the service continuously rather than just when they were looking to switch jobs or hire a new employee. We tried a number of single-threaded efforts to meet the challenge. We rolled out features one after another, such as a recommendation engine for people that our users should meet and a professional Q&A service. None of them worked well enough to solve the problem. We concluded that the problem might require a Swiss Army knife approach with multiple use cases for multiple groups of users. After all, some people might want a news feed, some might want to track their career progress, and some might be keen on continuing education.


pages: 247 words: 81,135

The Great Fragmentation: And Why the Future of All Business Is Small by Steve Sammartino

3D printing, additive manufacturing, Airbnb, augmented reality, barriers to entry, behavioural economics, Bill Gates: Altair 8800, bitcoin, BRICs, Buckminster Fuller, citizen journalism, collaborative consumption, cryptocurrency, data science, David Heinemeier Hansson, deep learning, disruptive innovation, driverless car, Dunbar number, Elon Musk, fiat currency, Frederick Winslow Taylor, game design, gamification, Google X / Alphabet X, haute couture, helicopter parent, hype cycle, illegal immigration, index fund, Jeff Bezos, jimmy wales, Kickstarter, knowledge economy, Law of Accelerating Returns, lifelogging, market design, Mary Meeker, Metcalfe's law, Minecraft, minimum viable product, Network effects, new economy, peer-to-peer, planned obsolescence, post scarcity, prediction markets, pre–internet, profit motive, race to the bottom, random walk, Ray Kurzweil, recommendation engine, remote working, RFID, Rubik’s Cube, scientific management, self-driving car, sharing economy, side project, Silicon Valley, Silicon Valley startup, skunkworks, Skype, social graph, social web, software is eating the world, Steve Jobs, subscription business, survivorship bias, The Home Computer Revolution, the long tail, too big to fail, US Airways Flight 1549, vertical integration, web application, zero-sum game

Creative types Collaboration, creative orientation and counter intuition Note Chapter 6: Demographics is history: moving on from predictive marketing How to get profiled The price of pop culture The best average The weapon of choice Don’t fence me in How do you define a teenager? Stealing music or connecting? Marketing 1.0 Marketing revised The new intersection Social + interests = intention The story of cities Do I know you? The interest graph in action The anti-demographic recommendation engine Chapter 7: The truth about pricing: technology and omnipresent deflation Technology deflation Real-world technology deflation The free super computer The crux is human It’s getting quicker Technology curve jumping Technology stacking Omnipresent deflation Consumer price index trickery Connections and the impact on prices Economic border hopping The new minimum wage Notes Chapter 8: A zero-barrier world: how access to knowledge is breaking down barriers So what’s changed?

They helped a person, which is a very different approach. It seems old-school BMXers are a little bit smarter than old-school marketers. What a great way to build a community; one that I’m now a part of. While everyone gets enamoured with ‘big data’, there’s probably a lot more we can do with ‘little data’.
The anti-demographic recommendation engine
A lot of e-commerce platforms and social-media engines seem to be able to do what mainstream marketers could never quite pull off. Every day, I’m exposed to products and services that I have zero interest in ever purchasing, mainly due to the laziness of the marketers who allocate the budget behind them.

It’s always spot on, sitting perfectly in the centre of my personal interest graph, based on the simplicity of what I’ve bought, looked at, wish listed and what others have in their list when there are overlaps. For me personally, it’s very accurate indeed. What’s interesting is that this recommendation engine is what I’d coin an ‘anti-demographic’ profiler: It doesn’t care what sex I am. It doesn’t care where I live. It doesn’t care or know how much I earn. It doesn’t care if I finished school. None of this matters. What matters is the direct connection and the reality of my interests based on my digital footprint.
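The overlap idea in this excerpt (what one person has bought, viewed, or wish-listed, compared with what others have in their lists) can be sketched without any demographic fields. The following is a minimal illustration, assuming each person is represented only by a set of item names; the `jaccard` and `suggest` helpers and the sample data are hypothetical and are not drawn from any particular retailer's system.

```python
def jaccard(a, b):
    """Overlap between two item sets: |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest(my_items, other_lists, k=3):
    """Rank items I do not have, weighted by how similar each owner's list is to mine.

    No age, sex, location, or income is used; only interest overlap.
    """
    mine = set(my_items)
    scores = {}
    for items in other_lists.values():
        similarity = jaccard(mine, items)
        if similarity == 0:
            continue  # no shared interests, so this list carries no signal
        for item in set(items) - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical sample data.
me = {"bmx frame", "helmet", "grip tape"}
others = {
    "u1": {"bmx frame", "helmet", "pegs"},
    "u2": {"helmet", "knee pads", "pegs"},
    "u3": {"cookbook", "apron"},
}
print(suggest(me, others))  # ['pegs', 'knee pads']
```

Lists with no overlap (such as `u3`) contribute nothing, which is the sense in which the profile follows interests rather than demographics.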


pages: 420 words: 130,503

Actionable Gamification: Beyond Points, Badges and Leaderboards by Yu-Kai Chou

Apple's 1984 Super Bowl advert, barriers to entry, behavioural economics, bitcoin, Burning Man, Cass Sunstein, crowdsourcing, Daniel Kahneman / Amos Tversky, delayed gratification, Do you want to sell sugared water for the rest of your life?, don't be evil, en.wikipedia.org, endowment effect, Firefox, functional fixedness, game design, gamification, growth hacking, IKEA effect, Internet of things, Kickstarter, late fees, lifelogging, loss aversion, Maui Hawaii, Minecraft, pattern recognition, peer-to-peer, performance metric, QR code, recommendation engine, Richard Thaler, Silicon Valley, Skinner box, Skype, software as a service, Stanford prison experiment, Steve Jobs, TED Talk, The Wealth of Nations by Adam Smith, transaction costs

Accompanying the Alfred Effect is Amazon’s Recommendation Engine, now infamous in the personalization industry. Amazon’s recommendation engine, according to Amazon itself, led to 30% of its sales. That’s a fairly significant factor for a company that is already making billions of dollars every month. In fact, JP Mangalindan, a writer for Fortune and CNN Money, argues that a significant part of Amazon’s 29% sales growth from the second fiscal quarter of 2011 to the second fiscal quarter of 2012 can be attributed to the recommendation engine. And what does this recommendation engine look like? “Customers Who Bought This Item Also Bought.”
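A "Customers Who Bought This Item Also Bought" list can be approximated by counting which items appear together in the same orders. The sketch below is a toy illustration of that co-occurrence idea under stated assumptions: the order data and the `build_cooccurrence` and `also_bought` helpers are hypothetical, and nothing here should be read as Amazon's actual method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(orders):
    """Count how often each pair of distinct items appears in the same order."""
    co = defaultdict(Counter)
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def also_bought(co, item, k=3):
    """Return up to k items most often bought together with `item`.

    Ties are broken alphabetically so the output is deterministic.
    """
    ranked = sorted(co[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical order history.
orders = [
    ["tent", "sleeping bag", "stove"],
    ["tent", "sleeping bag"],
    ["tent", "lantern"],
    ["stove", "fuel"],
]
co = build_cooccurrence(orders)
print(also_bought(co, "tent"))  # ['sleeping bag', 'lantern', 'stove']
```

Raw counts favor popular items; a production system would likely normalize for popularity, which this sketch deliberately leaves out.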


pages: 567 words: 122,311

Lean Analytics: Use Data to Build a Better Startup Faster by Alistair Croll, Benjamin Yoskovitz

Airbnb, Amazon Mechanical Turk, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, barriers to entry, Bay Area Rapid Transit, Ben Horowitz, bounce rate, business intelligence, call centre, cloud computing, cognitive bias, commoditize, constrained optimization, data science, digital rights, en.wikipedia.org, Firefox, Frederick Winslow Taylor, frictionless, frictionless market, game design, gamification, Google X / Alphabet X, growth hacking, hockey-stick growth, Infrastructure as a Service, Internet of things, inventory management, Kickstarter, lateral thinking, Lean Startup, lifelogging, longitudinal study, Marshall McLuhan, minimum viable product, Network effects, PalmPilot, pattern recognition, Paul Graham, performance metric, place-making, platform as a service, power law, price elasticity of demand, reality distortion field, recommendation engine, ride hailing / ride sharing, rolodex, Salesforce, sentiment analysis, skunkworks, Skype, social graph, social software, software as a service, Steve Jobs, subscription business, telemarketer, the long tail, transaction costs, two-sided market, Uber for X, web application, Y Combinator

Shoppers start with an external search and then bounce back and forth from sites they visit to their search results, seeking the scent of what they’re after. Once they find it, on-site navigation becomes more important. This means on-site funnels are somewhat outdated; keywords are more important. Retailers use recommendation engines to anticipate what else a buyer might want, basing their suggestions on past buyers and other users with similar profiles. Few visitors see the same pages as one another. Retailers are always optimizing performance, which means that they’re segmenting traffic. Mid- to large-size retailers segment their funnel by several tests that are being run to find the right products, offers, and prices.

Revenue per customer: The lifetime value of each customer.
Top keywords driving traffic to the site: Those terms that people are looking for, and associate with you—a clue to adjacent products or markets.
Top search terms: Both those that lead to revenue, and those that don’t have any results.
Effectiveness of recommendation engines: How likely a visitor is to add a recommended product to the shopping cart.
Virality: Word of mouth, and sharing per visitor.
Mailing list effectiveness: Click-through rates and ability to make buyers return and buy.
More sophisticated retailers care about other metrics such as the number of reviews written or the number considered helpful, but this is really a secondary business within the organization, and we’ll deal with these when we look at the user-generated content model in Chapter 12.

We’re not going to get into the details of search engine optimization and search engine marketing here—those are worlds unto themselves. For now, realize that search is a significant part of any e-commerce operation, and the old model of formal navigational steps toward a particular page is outdated (even though it remains in many analytics tools).
Recommendation Acceptance Rate
Big e-commerce companies use recommendation engines to suggest additional items to visitors. Today, these engines are becoming more widespread thanks to third-party recommendation services that work with smaller retailers. Even bloggers have this kind of algorithm, suggesting other articles similar to the one the visitor is currently reading.
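As a rough illustration of the metric named here, a recommendation acceptance rate can be computed as the share of displayed recommendations that the visitor added to the cart. This is a minimal sketch, assuming a simple event log; the `RecommendationEvent` record and its field names are hypothetical and not taken from any analytics tool.

```python
from dataclasses import dataclass


@dataclass
class RecommendationEvent:
    """One recommendation shown to a visitor (hypothetical log record)."""
    visitor_id: str
    product_id: str
    added_to_cart: bool


def acceptance_rate(events):
    """Fraction of shown recommendations that were added to the cart."""
    if not events:
        return 0.0  # avoid dividing by zero when nothing was shown
    accepted = sum(1 for e in events if e.added_to_cart)
    return accepted / len(events)


# Hypothetical sample log.
events = [
    RecommendationEvent("v1", "p10", True),
    RecommendationEvent("v1", "p11", False),
    RecommendationEvent("v2", "p10", False),
    RecommendationEvent("v3", "p12", True),
]
rate = acceptance_rate(events)
print(f"{rate:.0%}")  # 50%
```

Counting per recommendation shown is one possible denominator; counting per visitor or per session would give a different number, so the choice should be stated wherever the metric is reported.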


We Are the Nerds: The Birth and Tumultuous Life of Reddit, the Internet's Culture Laboratory by Christine Lagorio-Chafkin

"Friedman doctrine" OR "shareholder theory", 4chan, Aaron Swartz, Airbnb, Amazon Web Services, Bernie Sanders, big-box store, bitcoin, blockchain, Brewster Kahle, Burning Man, compensation consultant, crowdsourcing, cryptocurrency, data science, David Heinemeier Hansson, digital rights, disinformation, Donald Trump, East Village, eternal september, fake news, game design, Golden Gate Park, growth hacking, Hacker News, hiring and firing, independent contractor, Internet Archive, Jacob Appelbaum, Jeff Bezos, jimmy wales, Joi Ito, Justin.tv, Kickstarter, Large Hadron Collider, Lean Startup, lolcat, Lyft, Marc Andreessen, Mark Zuckerberg, medical residency, minimum viable product, natural language processing, Palm Treo, Paul Buchheit, Paul Graham, paypal mafia, Peter Thiel, plutocrats, QR code, r/findbostonbombers, recommendation engine, RFID, rolodex, Ruby on Rails, Sam Altman, Sand Hill Road, Saturday Night Live, self-driving car, semantic web, Sheryl Sandberg, side project, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, slashdot, Snapchat, Social Justice Warrior, social web, South of Market, San Francisco, Startup school, Stephen Hawking, Steve Bannon, Steve Jobs, Steve Wozniak, Streisand effect, technoutopianism, uber lyft, Wayback Machine, web application, WeWork, WikiLeaks, Y Combinator

The notebook contained some typical college scribbles (“I’m sorry I’ll shut up now”) and doodles (3-D cubes, a penis) he’d made during class at UVA, and some coursework notes, too, but on this day it transformed into a place where Huffman would document the origins of, and his progress on, their new, as yet unnamed project. “The site people go to find something new,” Huffman wrote in blue pen. “Points for being the first to recommend,” he also wrote, likely transcribing Graham’s exact words regarding building a recommendation engine before any of their preexisting competitors could. The recommendation engine was integral to the success of this hypothetical project, Graham thought, because one would need to dangle a carrot for users to entice them to post links in the first place, and then to return again and again to discover and share. Discover and share.

It had been a longtime and significant priority of Huffman’s to keep the site loading quickly and, for users, functioning well (programmers call this keeping a site “perky”). Thanks to numerous small changes and additions to its functionality over the past years, the codebase had become unwieldy. Plus, there were portions of code that were now unused, features built and never launched, or pulled back on, such as the complex recommendation engine Paul Graham had pushed so hard for at Reddit’s inception. With a team of four in place in the conference room overlooking SoMa’s tech-company epicenter, it felt good. Reddit was ready to grow. Huffman and Slowe felt proud that they’d learned to navigate Condé Nast human resources well enough to hire, which allowed them finally to get ahead of the game on site maintenance.

Huffman’s long-standing trust in Slowe was so deep that when Slowe returned to Reddit, Huffman said his mandate was simply: “Go do stuff, Chris.” Slowe dug into how Reddit’s homepage functions for various users, dubbing the project “Relevance.” Updating the homepage algorithm led him to revisit the recommendation engine project they’d worked on eleven years before. Soon, he added another major project to his plate: overseeing a department that would be dubbed “anti-evil.” It would build specific tools for use by the secretive trust and safety team, and essentially be its programming counterpart. As new engineers were hired, more were handed over to Slowe to build robust antispam systems.


pages: 215 words: 55,212

The Mesh: Why the Future of Business Is Sharing by Lisa Gansky

"World Economic Forum" Davos, Airbnb, Amazon Mechanical Turk, Amazon Web Services, banking crisis, barriers to entry, Bear Stearns, bike sharing, business logic, carbon footprint, carbon tax, Chuck Templeton: OpenTable:, clean tech, cloud computing, credit crunch, crowdsourcing, diversification, Firefox, fixed income, Google Earth, impact investing, industrial cluster, Internet of things, Joi Ito, Kickstarter, late fees, Network effects, new economy, peer-to-peer lending, planned obsolescence, recommendation engine, RFID, Richard Florida, Richard Thaler, ride hailing / ride sharing, sharing economy, Silicon Valley, smart grid, social web, software as a service, TaskRabbit, the built environment, the long tail, vertical integration, walkable city, yield management, young professional, Zipcar

As the service developed, the company added layers of information to inform a user’s choices, such as reviews from people in the network whose profile of selections and ratings were similar. Recently, it sponsored a contest awarding a million dollars to anyone who could significantly improve the movie recommendation service. Thousands of teams from more than a hundred nations competed. Netflix’s “recommendation engine” relies on algorithms culled from masses of data collected on the Web, including that provided directly by customers. The lesson learned from the contest, according to the New York Times, was the power of collaboration, as winning teams began sharing ideas and information: “The formula for success was to bring together people with complementary skills and combine different methods of problem solving.”

See Social networking starting Mesh company Sweet Spot trends influencing growth of trust building Millennial generation Mobile networks digital translation to physical and flash branding as foundation of the Mesh share-based business operation users, increase in Modular design Mohsenin, Kamran Movie rentals online, Mesh companies Mozilla Firefox Music-based businesses, Mesh companies Natural ecosystem, relationship to Mesh ecosystem Netflix annual sales as information business Mesh strategy perfection recommendation engine recommendations Network effect Niche markets for maintaining/servicing products Mesh companies opening, reason for sharing as North Portland Tool Library (NPTL) Ofoto Olapic Ombudsman Open Architecture Network Open Design Open innovation service provider Open networks advantages of Architecture for Humanity communal IP concept and marketing products openness versus proprietary approach and product improvement software development OpenTable O’Reilly, Tim Ostrom, Elinor Own-to-Mesh model car-sharing services profits, generation from retirees as customers Partnerships characteristics of corporations and Mesh companies income generation from in Mesh ecosystem unexpected value of Patagonia recycled textiles of Walmart partnership Paul, Sunil Payne, Steven Peer-to-peer lending.


pages: 334 words: 102,899

That Will Never Work: The Birth of Netflix and the Amazing Life of an Idea by Marc Randolph

Airbnb, Apollo 13, crowdsourcing, digital rights, high net worth, inventory management, Isaac Newton, Jeff Bezos, late fees, loose coupling, Mason jar, pets.com, recommendation engine, rolodex, Sand Hill Road, Silicon Valley, Silicon Valley startup, Snapchat, Steve Jobs, subscription business, tech worker, The last Blockbuster video rental store is in Bend, Oregon, Travis Kalanick

For example: Say I rented (and loved) Pleasantville, one of the best movies of 1998 and a clever dark comedy about what happens when two teenagers from the nineties (Tobey Maguire and Reese Witherspoon) are sucked into a black-and-white television show set in 1950s small-town America. The ideal recommendation engine would be able to steer me away from more current new releases and toward other movies, like Pleasantville—movies like Doc Hollywood. That was a tall order. The thing about taste is that it’s subjective. And the number of factors in play, when trying to establish similarities between films, is almost endless.

After that, Reed’s team went to work integrating these taste predictions into a broader algorithm that made movie recommendations after weighing a number of factors—keyword, number of copies, number of copies in stock, cost per disc. The result—which launched in February of 2000 as Cinematch—was a seemingly more intuitive recommendation engine, one that outsourced qualitative assessment to users while also optimizing things on the back end. In many ways, it was the best of both worlds: an automated system that nonetheless felt human, like a video store clerk asking you what you’d seen lately and then recommending something he knew you’d like—and that he had in stock.
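The combination described here, a taste prediction weighed against back-end factors such as copies in stock, can be pictured as a blended score. The sketch below is purely illustrative: the `Title` record, the weights, and the scoring formula are assumptions made for this example and do not describe how Cinematch was actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Title:
    """A rentable title with hypothetical fields for this illustration."""
    name: str
    predicted_rating: float  # assumed taste prediction on a 1-5 scale
    copies_total: int
    copies_in_stock: int


def blended_score(title, taste_weight=0.7, stock_weight=0.3):
    """Blend predicted taste with availability; out-of-stock titles score 0."""
    if title.copies_total == 0 or title.copies_in_stock == 0:
        return 0.0
    taste = (title.predicted_rating - 1) / 4  # rescale 1-5 to 0-1
    availability = title.copies_in_stock / title.copies_total
    return taste_weight * taste + stock_weight * availability


def recommend(titles, k=2):
    """Return the names of the k best-scoring titles that are in stock."""
    ranked = sorted(titles, key=blended_score, reverse=True)
    return [t.name for t in ranked[:k] if blended_score(t) > 0]


# Hypothetical catalogue.
catalogue = [
    Title("Title A", 5.0, 10, 0),  # best predicted match, but none in stock
    Title("Title B", 4.0, 10, 8),
    Title("Title C", 4.5, 10, 1),
    Title("Title D", 3.0, 5, 5),
]
print(recommend(catalogue))  # ['Title B', 'Title D']
```

In this toy data the best predicted match is never suggested because no copy is available, which mirrors the excerpt's image of a clerk recommending something he knew you would like and that he had in stock.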

We’d continued on our streak of making major talent hires—the most recent being Leslie Kilgore, whom Reed had convinced to leave Amazon to head our marketing efforts as CMO, and Ted Sarandos, who now managed our content acquisition. Since walking away from à la carte rentals, our no-due-dates, no-late-fees program had steadily built up steam. Users loved Cinematch, our recommendation engine. We did, too. It kept our subscribers’ queues full—and nothing, we found, correlated more to retention than a queue with lots of movies in it. We were now approaching nearly 200,000 paying subscribers. Our other metrics were looking pretty impressive as well. We now carried 5,800 different DVD titles and shipped more than 800,000 discs a month, and our warehouse was packed with more than a million discs.


pages: 414 words: 117,581

Binge Times: Inside Hollywood's Furious Billion-Dollar Battle to Take Down Netflix by Dade Hayes, Dawn Chmielewski

activist fund / activist shareholder / activist investor, Airbnb, Albert Einstein, Amazon Web Services, AOL-Time Warner, Apollo 13, augmented reality, barriers to entry, Big Tech, borderless world, cloud computing, cognitive dissonance, content marketing, coronavirus, corporate raider, COVID-19, data science, digital rights, Donald Trump, Downton Abbey, Elon Musk, George Floyd, global pandemic, Golden age of television, haute cuisine, hockey-stick growth, invention of the telephone, Jeff Bezos, John Markoff, Jony Ive, late fees, lockdown, loose coupling, Marc Andreessen, Mark Zuckerberg, Mitch Kapor, Netflix Prize, Osborne effect, performance metric, period drama, Phoebe Waller-Bridge, QR code, reality distortion field, recommendation engine, remote working, Ronald Reagan, Salesforce, Saturday Night Live, Silicon Valley, skunkworks, Skype, Snapchat, social distancing, Steve Jobs, subscription business, tech bro, the long tail, the medium is the message, TikTok, Tim Cook: Apple, vertical integration, WeWork

And when they want another one, they’ll just mail it back and we’ll replace it. There’ll be no due dates and no late fees.” The service Netflix introduced in 1999 changed the struggling startup’s fortunes, attracting 239,000 subscribers, winning loyalty from those who appreciated not only its novel approach to DVD rentals but also its recommendation engine and the community of cinephiles gathered around its website. At the time, prior to the arrival of social media, chat rooms and message boards were the primary means of expression. Netflix subscribers could build “queues” of desired rental titles and trade reviews with other subscribers. Compared with Blockbuster, whose khaki-and-blue-shirt staff uniforms and regimented aisles were directly inspired by mass brands like McDonald’s, Netflix emphasized the individual.

Its relentless focus on delivering what people want to watch, and its multilayered approach to understanding individual consumer preferences, is something that sets it apart from its Hollywood studio rivals. The traditional focus of entertainment companies has been convincing consumers to tune in at a certain hour or to show up in theaters on a particular weekend. Netflix saw its role as that of matchmaker, not carnival barker. The company launched its first recommendation engine, Cinematch, in February 2000 to help subscribers navigate a library of five thousand movie titles that was too unwieldy to browse. Six years later, it held a closely watched contest to boost the accuracy of its recommendations by 10 percent. Netflix dangled a $1 million prize, though the ultimate lure for nerds was access to a data set of over 100 million ratings on 17,700 movies from 480,189 customers.
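The contest scored entries by root-mean-square error (RMSE) between predicted and actual ratings, so a 10 percent boost meant a 10 percent lower RMSE than the existing system. The sketch below shows that arithmetic on made-up numbers; the ratings and predictions are hypothetical and are not contest data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline_rmse, candidate_rmse):
    """Fractional reduction in error relative to the baseline."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Hypothetical ratings on a 1-5 scale.
actual = [4, 3, 5, 2]
baseline_pred = [3, 3, 4, 4]   # stand-in for the existing system
candidate_pred = [4, 3, 4, 3]  # stand-in for a challenger

base = rmse(baseline_pred, actual)
cand = rmse(candidate_pred, actual)
gain = relative_improvement(base, cand)
print(f"baseline={base:.4f} candidate={cand:.4f} improvement={gain:.1%}")
print("meets 10% target:", gain >= 0.10)
```

Because RMSE squares each error, a few badly wrong predictions hurt far more than many slightly wrong ones.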

As Netflix’s content flowed onto millions of screens around the world, it invested deeply in local-language production to attract subscribers from Darfur to Kuala Lumpur. Netflix discovered its shows effortlessly traveled the borderless world of the internet, propelled by local-language dubbing and its recommendation engine. The German time-travel series Dark, the postapocalyptic Danish series The Rain, India’s crime thriller Sacred Games, and France’s action mystery Lupin would find audiences well beyond their countries of origin. Meanwhile, veteran studio executive Scott Stuber launched Netflix’s pursuit of a Best Picture Oscar with Roma, director Alfonso Cuarón’s sumptuous black-and-white portrait of a domestic worker set in 1970s Mexico City.


pages: 58 words: 12,386

Big Data Glossary by Pete Warden

business intelligence, business logic, crowdsourcing, fault tolerance, functional programming, information retrieval, linked data, machine readable, natural language processing, recommendation engine, web application

It comes with algorithms to perform a lot of common tasks, like clustering and classifying objects into groups, recommending items based on other users’ behaviors, and spotting attributes that occur together a lot. In practical terms, the framework makes it easy to use analysis techniques to implement features such as Amazon’s “People who bought this also bought” recommendation engine on your own site. It’s a heavily used project with an active community of developers and users, and it’s well worth trying if you have any significant amount of transaction or similar data that you’d like to get more value out of.
Introducing Mahout
Using Mahout with Cassandra
scikits.learn
It’s hard to find good off-the-shelf tools for practical machine learning.


pages: 222 words: 70,132

Move Fast and Break Things: How Facebook, Google, and Amazon Cornered Culture and Undermined Democracy by Jonathan Taplin

"Friedman doctrine" OR "shareholder theory", "there is no alternative" (TINA), 1960s counterculture, affirmative action, Affordable Care Act / Obamacare, Airbnb, AlphaGo, Amazon Mechanical Turk, American Legislative Exchange Council, AOL-Time Warner, Apple's 1984 Super Bowl advert, back-to-the-land, barriers to entry, basic income, battle of ideas, big data - Walmart - Pop Tarts, Big Tech, bitcoin, Brewster Kahle, Buckminster Fuller, Burning Man, Clayton Christensen, Cody Wilson, commoditize, content marketing, creative destruction, crony capitalism, crowdsourcing, data is the new oil, data science, David Brooks, David Graeber, decentralized internet, don't be evil, Donald Trump, Douglas Engelbart, Douglas Engelbart, Dynabook, Edward Snowden, Elon Musk, equal pay for equal work, Erik Brynjolfsson, Fairchild Semiconductor, fake news, future of journalism, future of work, George Akerlof, George Gilder, Golden age of television, Google bus, Hacker Ethic, Herbert Marcuse, Howard Rheingold, income inequality, informal economy, information asymmetry, information retrieval, Internet Archive, Internet of things, invisible hand, Jacob Silverman, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: technological unemployment, John Perry Barlow, John von Neumann, Joseph Schumpeter, Kevin Kelly, Kickstarter, labor-force participation, Larry Ellison, life extension, Marc Andreessen, Mark Zuckerberg, Max Levchin, Menlo Park, Metcalfe’s law, military-industrial complex, Mother of all demos, move fast and break things, natural language processing, Network effects, new economy, Norbert Wiener, offshore financial centre, packet switching, PalmPilot, Paul Graham, paypal mafia, Peter Thiel, plutocrats, pre–internet, Ray Kurzweil, reality distortion field, recommendation engine, rent-seeking, revision control, Robert Bork, Robert Gordon, Robert Metcalfe, Ronald Reagan, Ross Ulbricht, Sam Altman, Sand Hill Road, secular stagnation, self-driving car, sharing 
economy, Silicon Valley, Silicon Valley ideology, Skinner box, smart grid, Snapchat, Social Justice Warrior, software is eating the world, Steve Bannon, Steve Jobs, Stewart Brand, tech billionaire, techno-determinism, technoutopianism, TED Talk, The Chicago School, the long tail, The Market for Lemons, The Rise and Fall of American Growth, Tim Cook: Apple, trade route, Tragedy of the Commons, transfer pricing, Travis Kalanick, trickle-down economics, Tyler Cowen, Tyler Cowen: Great Stagnation, universal basic income, unpaid internship, vertical integration, We are as Gods, We wanted flying cars, instead we got 140 characters, web application, Whole Earth Catalog, winner-take-all economy, women in the workforce, Y Combinator, you are the product

When Thefacebook really started to grow, in the late spring of 2004, Zuckerberg and his right-hand man, Dustin Moskovitz, decided to go to Silicon Valley for the summer. Zuckerberg had met Sean Parker in a Chinese restaurant in New York in May and had been awed by his outlaw tales of Napster. Zuckerberg had written a music-recommendation engine while he was a senior at Exeter, and so Napster loomed large in his notion of hipness. When the two men got to Palo Alto in June, they ran into Parker, who was essentially homeless, having been thrown out of his latest company, Plaxo, an online address-book application. It is a tribute to Zuckerberg’s naive trust that he invited Parker to live in the house he and Moskovitz had rented.

They had no time for politics or even for wondering why their horizons were so narrow. The kids attending DigiTour would fit right into the plot of Brave New World. The Internet’s self-curated view from everywhere has the amazing ability to distract us in trivial pursuits, narrow our choices, and keep us safe in a balkanized suburb of our own taste. Search engines and recommendation engines constantly favor the most popular options and constantly make our discovery more limited. I began this chapter wondering whether technology was robbing us of some of our essential humanity. Google’s chief technologist proclaims that technology will “allow us to transcend these limitations of our biological bodies and brains.… There will be no distinction, post-Singularity, between human and machine.”


pages: 274 words: 75,846

The Filter Bubble: What the Internet Is Hiding From You by Eli Pariser

A Declaration of the Independence of Cyberspace, A Pattern Language, adjacent possible, Amazon Web Services, An Inconvenient Truth, Apple Newton, augmented reality, back-to-the-land, Black Swan, borderless world, Build a better mousetrap, Cass Sunstein, citizen journalism, cloud computing, cognitive dissonance, crowdsourcing, Danny Hillis, data acquisition, disintermediation, don't be evil, Filter Bubble, Flash crash, fundamental attribution error, Gabriella Coleman, global village, Haight Ashbury, Internet of things, Isaac Newton, Jaron Lanier, Jeff Bezos, jimmy wales, John Perry Barlow, Kevin Kelly, knowledge worker, Mark Zuckerberg, Marshall McLuhan, megacity, Metcalfe’s law, Netflix Prize, new economy, PageRank, Paradox of Choice, Patri Friedman, paypal mafia, Peter Thiel, power law, recommendation engine, RFID, Robert Metcalfe, sentiment analysis, shareholder value, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, social graph, social software, social web, speech recognition, Startup school, statistical model, stem cell, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, Ted Nordhaus, The future is already here, the scientific method, urban planning, We are as Gods, Whole Earth Catalog, WikiLeaks, Y Combinator, Yochai Benkler

In a memo for fellow progressives, Mark Steitz, one of the primary Democratic data gurus, recently wrote that “targeting too often returns to a bombing metaphor—dropping message from planes. Yet the best data tools help build relationships based on observed contacts with people. Someone at the door finds out someone is interested in education; we get back to that person and others like him or her with more information. Amazon’s recommendation engine is the direction we need to head.” The trend is clear: We’re moving from swing states to swing people. Consider this scenario: It’s 2016, and the race is on for the presidency of the United States. Or is it? It depends on who you are, really. If the data says you vote frequently and that you may have been a swing voter in the past, the race is a maelstrom.

Quora Forum, accessed Dec. 17, 2010, www.quora.com/Facebook-company/Whats-the-history-of-the-Awesome-Button-that-eventually-became-the-Like-button-on-Facebook. 151 “against the cruise line industry”: Hollis Thomases, “Google Drops Anti-Cruise Line Ads from AdWords,” Web Ad.vantage, Feb. 13, 2004, accessed Dec. 17, 2010, www.webadvantage.net/webadblog/google-drops-anti-cruise-line-ads-from-adwords-338. 151–52 identify who was persuadable: “How Rove Targeted the Republican Vote,” Frontline, accessed Feb. 8, 2011, www.pbs.org/wgbh/pages/frontline/shows/architect/rove/metrics.html. 152 “Amazon’s recommendation engine is the direction”: Mark Steitz and Laura Quinn, “An Introduction to Microtargeting in Politics,” accessed Dec. 17, 2010, www.docstoc.com/docs/43575201/An-Introduction-to-Microtargeting-in-Politics. 153 round-the-clock “war room”: “Google’s War Room for the Home Stretch of Campaign 2010,” e.politics, Sept. 24, 2010, accessed Feb. 9, 2011, www.epolitics.com/2010/09/24/googles-war-room-for-the-home-stretch-of-campaign-2010/. 155 “campaign wanted to spend on Facebook”: Vincent R.


pages: 301 words: 85,263

New Dark Age: Technology and the End of the Future by James Bridle

AI winter, Airbnb, Alfred Russel Wallace, AlphaGo, Anthropocene, Automated Insights, autonomous vehicles, back-to-the-land, Benoit Mandelbrot, Bernie Sanders, bitcoin, Boeing 747, British Empire, Brownian motion, Buckminster Fuller, Cambridge Analytica, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, coastline paradox / Richardson effect, cognitive bias, cognitive dissonance, combinatorial explosion, computer vision, congestion charging, cryptocurrency, data is the new oil, disinformation, Donald Trump, Douglas Engelbart, Douglas Engelbart, Douglas Hofstadter, Dr. Strangelove, drone strike, Edward Snowden, Eyjafjallajökull, Fairchild Semiconductor, fake news, fear of failure, Flash crash, fulfillment center, Google Earth, Greyball, Haber-Bosch Process, Higgs boson, hive mind, income inequality, informal economy, Internet of things, Isaac Newton, ITER tokamak, James Bridle, John von Neumann, Julian Assange, Kickstarter, Kim Stanley Robinson, Large Hadron Collider, late capitalism, Laura Poitras, Leo Hollis, lone genius, machine translation, mandelbrot fractal, meta-analysis, Minecraft, mutually assured destruction, natural language processing, Network effects, oil shock, p-value, pattern recognition, peak oil, recommendation engine, road to serfdom, Robert Mercer, Ronald Reagan, security theater, self-driving car, Seymour Hersh, Silicon Valley, Silicon Valley ideology, Skype, social graph, sorting algorithm, South China Sea, speech recognition, Spread Networks laid a new fibre optics cable between New York and Chicago, stem cell, Stuxnet, technoutopianism, the built environment, the scientific method, Uber for X, undersea cable, University of East Anglia, uranium enrichment, Vannevar Bush, warehouse robotics, WikiLeaks

YouTube’s official guidelines state that the site is for ages thirteen and up, with parental permission required for those below eighteen, but there’s nothing to stop a thirteen-year-old accessing it. Moreover, there’s no need to have an account at all; like most websites, YouTube tracks unique visitors by their address, browser and device profile, and behaviour, and it can build a detailed demographic and preference profile to feed the recommendation engines without the viewer ever consciously submitting any information about themselves. That applies even if the viewer is a three-year-old child plonked in front of their parent’s iPad and mashing the screen with a balled-up fist. The frequency with which such a situation occurs is obvious in the site’s own viewer statistics.

Whatever agency is at play here is far from clear: the video starts with a trollish Peppa parody, but later syncs into the kind of automated repetition of tropes we’ve seen before. It’s not just trolls, or just automation; it’s not just human actors playing out an algorithmic logic, or algorithms mindlessly responding to recommendation engines. It’s a vast and almost completely hidden matrix of interactions between desires and rewards, technologies and audiences, tropes and masks. Other examples seem less accidental, and more intentional. One whole strand of video production involves automated recuts of video game footage, reprogrammed with superheroes or cartoon characters instead of soldiers and gangsters.


pages: 292 words: 85,151

Exponential Organizations: Why New Organizations Are Ten Times Better, Faster, and Cheaper Than Yours (And What to Do About It) by Salim Ismail, Yuri van Geest

23andMe, 3D printing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, anti-fragile, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, behavioural economics, Ben Horowitz, bike sharing, bioinformatics, bitcoin, Black Swan, blockchain, Blue Ocean Strategy, book value, Burning Man, business intelligence, business process, call centre, chief data officer, Chris Wanstrath, circular economy, Clayton Christensen, clean water, cloud computing, cognitive bias, collaborative consumption, collaborative economy, commoditize, corporate social responsibility, cross-subsidies, crowdsourcing, cryptocurrency, dark matter, data science, Dean Kamen, deep learning, DeepMind, dematerialisation, discounted cash flows, disruptive innovation, distributed ledger, driverless car, Edward Snowden, Elon Musk, en.wikipedia.org, Ethereum, ethereum blockchain, fail fast, game design, gamification, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, hiring and firing, holacracy, Hyperloop, industrial robot, Innovator's Dilemma, intangible asset, Internet of things, Iridium satellite, Isaac Newton, Jeff Bezos, Joi Ito, Kevin Kelly, Kickstarter, knowledge worker, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, lifelogging, loose coupling, loss aversion, low earth orbit, Lyft, Marc Andreessen, Mark Zuckerberg, market design, Max Levchin, means of production, Michael Milken, minimum viable product, natural language processing, Netflix Prize, NetJets, Network effects, new economy, Oculus Rift, offshore financial centre, PageRank, pattern recognition, Paul Graham, paypal mafia, peer-to-peer, peer-to-peer model, Peter H. Diamandis: Planetary Resources, Peter Thiel, Planet Labs, prediction markets, profit motive, publish or perish, radical decentralization, Ray Kurzweil, recommendation engine, RFID, ride hailing / ride sharing, risk tolerance, Ronald Coase, Rutger Bregman, Salesforce, Second Machine Age, self-driving car, sharing economy, Silicon Valley, skunkworks, Skype, smart contracts, Snapchat, social software, software is eating the world, SpaceShipOne, speech recognition, stealth mode startup, Stephen Hawking, Steve Jobs, Steve Jurvetson, subscription business, supply-chain management, synthetic biology, TaskRabbit, TED Talk, telepresence, telepresence robot, the long tail, Tony Hsieh, transaction costs, Travis Kalanick, Tyler Cowen, Tyler Cowen: Great Stagnation, uber lyft, urban planning, Virgin Galactic, WikiLeaks, winner-take-all economy, X Prize, Y Combinator, zero-sum game

At the heart of this staggering growth was the PageRank algorithm, which ranks the popularity of web pages. (Google doesn’t gauge which page is better from a human perspective; its algorithms simply respond to the pages that deliver the most clicks.) Google isn’t alone. Today, the world is pretty much run on algorithms. From automotive anti-lock braking to Amazon’s recommendation engine; from dynamic pricing for airlines to predicting the success of upcoming Hollywood blockbusters; from writing news posts to air traffic control; from credit card fraud detection to the 2 percent of posts that Facebook shows a typical user—algorithms are everywhere in modern life. Recently, McKinsey estimated that of the seven hundred end-to-end bank processes (opening an account or getting a car loan, for example), about half can be fully automated.

., Amazon Web Services, Kindle, and now Fire smartphones and delivery drones), views new products as if they are seedlings needing careful tending for a five-to-seven-year period, is maniacal about growth over profits and ignores the short-term view of Wall Street analysts. Its pioneering initiatives include its Affiliate Program, its recommendation engine (collaborative filtering) and the Mechanical Turk project. As Bezos says, “If you’re competitor-focused, you have to wait until there is a competitor doing something. Being customer-focused allows you to be more pioneering.” Not only has Amazon built ExOs on its edges (such as AWS), it also has had the courage to cannibalize its own products (e.g., Kindle).


pages: 308 words: 84,713

The Glass Cage: Automation and Us by Nicholas Carr

Airbnb, Airbus A320, Andy Kessler, Atul Gawande, autonomous vehicles, Bernard Ziegler, business process, call centre, Captain Sullenberger Hudson, Charles Lindbergh, Checklist Manifesto, cloud computing, cognitive load, computerized trading, David Brooks, deep learning, deliberate practice, deskilling, digital map, Douglas Engelbart, driverless car, drone strike, Elon Musk, Erik Brynjolfsson, Evgeny Morozov, Flash crash, Frank Gehry, Frank Levy and Richard Murnane: The New Division of Labor, Frederick Winslow Taylor, future of work, gamification, global supply chain, Google Glasses, Google Hangouts, High speed trading, human-factors engineering, indoor plumbing, industrial robot, Internet of things, Ivan Sutherland, Jacquard loom, James Watt: steam engine, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Kevin Kelly, knowledge worker, low interest rates, Lyft, machine readable, Marc Andreessen, Mark Zuckerberg, means of production, natural language processing, new economy, Nicholas Carr, Norbert Wiener, Oculus Rift, pattern recognition, Peter Thiel, place-making, plutocrats, profit motive, Ralph Waldo Emerson, RAND corporation, randomized controlled trial, Ray Kurzweil, recommendation engine, robot derives from the Czech word robota Czech, meaning slave, scientific management, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley ideology, software is eating the world, Stephen Hawking, Steve Jobs, systems thinking, tacit knowledge, TaskRabbit, technological determinism, technological solutionism, technoutopianism, TED Talk, The Wealth of Nations by Adam Smith, turn-by-turn navigation, Tyler Cowen, US Airways Flight 1549, Watson beat the top human players on Jeopardy!, William Langewiesche

Thanks to the proliferation of smartphones, tablets, and other small, affordable, and even wearable computers, we now depend on software to carry out many of our daily chores and pastimes. We launch apps to aid us in shopping, cooking, exercising, even finding a mate and raising a child. We follow turn-by-turn GPS instructions to get from one place to the next. We use social networks to maintain friendships and express our feelings. We seek advice from recommendation engines on what to watch, read, and listen to. We look to Google, or to Apple’s Siri, to answer our questions and solve our problems. The computer is becoming our all-purpose tool for navigating, manipulating, and understanding the world, in both its physical and its social manifestations. Just think what happens these days when people misplace their smartphones or lose their connections to the net.

Automated essay-grading algorithms encourage in students a rote mastery of the mechanics of writing. The programs are deaf to tone, uninterested in knowledge’s nuances, and actively resistant to creative expression. The deliberate breaking of a grammatical rule may delight a reader, but it’s anathema to a computer. Recommendation engines, whether suggesting a movie or a potential love interest, cater to our established desires rather than challenging us with the new and unexpected. They assume we prefer custom to adventure, predictability to whimsy. The technologies of home automation, which allow things like lighting, heating, cooking, and entertainment to be meticulously programmed, impose a Taylorist mentality on domestic life.


pages: 328 words: 84,682

The Business of Platforms: Strategy in the Age of Digital Competition, Innovation, and Power by Michael A. Cusumano, Annabelle Gawer, David B. Yoffie

activist fund / activist shareholder / activist investor, Airbnb, AltaVista, Amazon Web Services, AOL-Time Warner, asset light, augmented reality, autonomous vehicles, barriers to entry, bitcoin, blockchain, business logic, Cambridge Analytica, Chuck Templeton: OpenTable:, cloud computing, collective bargaining, commoditize, CRISPR, crowdsourcing, cryptocurrency, deep learning, Didi Chuxing, distributed ledger, Donald Trump, driverless car, en.wikipedia.org, fake news, Firefox, general purpose technology, gig economy, Google Chrome, GPS: selective availability, Greyball, independent contractor, Internet of things, Jeff Bezos, Jeff Hawkins, John Zimmer (Lyft cofounder), Kevin Roose, Lean Startup, Lyft, machine translation, Mark Zuckerberg, market fundamentalism, Metcalfe’s law, move fast and break things, multi-sided market, Network effects, pattern recognition, platform as a service, Ponzi scheme, recommendation engine, Richard Feynman, ride hailing / ride sharing, Robert Metcalfe, Salesforce, self-driving car, sharing economy, Silicon Valley, Skype, Snapchat, SoftBank, software as a service, sovereign wealth fund, speech recognition, stealth mode startup, Steve Ballmer, Steve Jobs, Steven Levy, subscription business, Susan Wojcicki, TaskRabbit, too big to fail, transaction costs, transport as a service, Travis Kalanick, two-sided market, Uber and Lyft, Uber for X, uber lyft, vertical integration, Vision Fund, web application, zero-sum game

Think about how Amazon, founded by Jeff Bezos in 1994, expanded from being an online store selling books to an online store selling nearly everything, from electronics products to groceries, and with same-day delivery for some products.38 Even in the early days, Amazon used digital technology to promote online store sales, building a recommendation engine and collecting user evaluations. One estimate is that 40 percent of Amazon’s sales today come through its recommendation engine.39 Then, in the late 1990s, Bezos added the global Amazon Marketplace—what we have called a transaction platform—linking buyers and third-party sellers. Amazon combined the marketplace with its own online store and other fulfillment services, such as billing and shipping, in addition to a massive network of physical warehouses.


pages: 88 words: 25,047

The Mathematics of Love: Patterns, Proofs, and the Search for the Ultimate Equation by Hannah Fry

Brownian motion, John Nash: game theory, linear programming, Nash equilibrium, Pareto efficiency, power law, recommendation engine, Skype, stable marriage problem, statistical model, TED Talk

And that’s it – apply this algorithm to the hundreds of available questions and repeat for each of the millions of users on OkCupid and you’ve got everything you need for one of the world’s most successful dating websites. It’s one of the most elegant approaches ever attempted to pairing couples based on their personal preferences. Together with eHarmony and other similar websites, OkCupid sits alongside Amazon and Netflix as one of the most widely used recommendation engines on the internet. But there’s one problem – if the internet is the ultimate matchmaker, why are people still going on terrible dates? If the science is so good, surely that first date will be the last first date of your life? Shouldn’t the algorithm be able to deliver the perfect partner and leave it at that?


pages: 366 words: 94,209

Throwing Rocks at the Google Bus: How Growth Became the Enemy of Prosperity by Douglas Rushkoff

activist fund / activist shareholder / activist investor, Airbnb, Alan Greenspan, algorithmic trading, Amazon Mechanical Turk, Andrew Keen, bank run, banking crisis, barriers to entry, benefit corporation, bitcoin, blockchain, Burning Man, business process, buy and hold, buy low sell high, California gold rush, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, centralized clearinghouse, citizen journalism, clean water, cloud computing, collaborative economy, collective bargaining, colonial exploitation, Community Supported Agriculture, corporate personhood, corporate raider, creative destruction, crowdsourcing, cryptocurrency, data science, deep learning, disintermediation, diversified portfolio, Dutch auction, Elon Musk, Erik Brynjolfsson, Ethereum, ethereum blockchain, fiat currency, Firefox, Flash crash, full employment, future of work, gamification, Garrett Hardin, gentrification, gig economy, Gini coefficient, global supply chain, global village, Google bus, Howard Rheingold, IBM and the Holocaust, impulse control, income inequality, independent contractor, index fund, iterative process, Jaron Lanier, Jeff Bezos, jimmy wales, job automation, Joseph Schumpeter, Kickstarter, Large Hadron Collider, loss aversion, low interest rates, Lyft, Marc Andreessen, Mark Zuckerberg, market bubble, market fundamentalism, Marshall McLuhan, means of production, medical bankruptcy, minimum viable product, Mitch Kapor, Naomi Klein, Network effects, new economy, Norbert Wiener, Oculus Rift, passive investing, payday loans, peer-to-peer lending, Peter Thiel, post-industrial society, power law, profit motive, quantitative easing, race to the bottom, recommendation engine, reserve currency, RFID, Richard Stallman, ride hailing / ride sharing, Ronald Reagan, Russell Brand, Satoshi Nakamoto, Second Machine Age, shareholder value, sharing economy, Silicon Valley, Snapchat, social graph, software patent, Steve Jobs, stock buybacks, TaskRabbit, the Cathedral and the Bazaar, The Future of Employment, the long tail, trade route, Tragedy of the Commons, transportation-network company, Turing test, Uber and Lyft, Uber for X, uber lyft, unpaid internship, Vitalik Buterin, warehouse robotics, Wayback Machine, Y Combinator, young professional, zero-sum game, Zipcar

The information superhighway morphed into an interactive strip mall; digital technology’s ability to connect people to products, facilitate payments, and track behaviors led to all sorts of new marketing and sales innovations. “Buy” buttons triggered the impulse for instant gratification, while recommendation engines personalized marketing pitches. It was commerce on crack. With a few notable exceptions—such as eBay and Etsy—we didn’t really get a return of the many-to-many marketplace or digital bazaar. No, in online commerce it’s mostly a few companies selling to many, and many people selling to the very few—if anyone at all.

Amazon then leveraged its monopoly in books and free shipping to develop monopolies in other verticals, beginning with home electronics (bankrupting Circuit City and Best Buy), and then every other link in the physical and virtual fulfillment chain, from shoes and food to music and videos. Finally, Amazon flips into personhood by reversing the traditional relationship between people and machines. Amazon’s patented recommendation engines attempt to drive our human selection process. Amazon Mechanical Turks gave computers the ability to mete out repetitive tasks to legions of human drones. The computers did the thinking and choosing; the people pointed and clicked as they were instructed or induced to do. Neither Amazon nor its founder, Jeff Bezos, is slipping to new lows here.


pages: 307 words: 88,180

AI Superpowers: China, Silicon Valley, and the New World Order by Kai-Fu Lee

"World Economic Forum" Davos, AI winter, Airbnb, Albert Einstein, algorithmic bias, algorithmic trading, Alignment Problem, AlphaGo, artificial general intelligence, autonomous vehicles, barriers to entry, basic income, bike sharing, business cycle, Cambridge Analytica, cloud computing, commoditize, computer vision, corporate social responsibility, cotton gin, creative destruction, crony capitalism, data science, deep learning, DeepMind, Demis Hassabis, Deng Xiaoping, deskilling, Didi Chuxing, Donald Trump, driverless car, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, fake news, full employment, future of work, general purpose technology, Geoffrey Hinton, gig economy, Google Chrome, Hans Moravec, happiness index / gross national happiness, high-speed rail, if you build it, they will come, ImageNet competition, impact investing, income inequality, informal economy, Internet of things, invention of the telegraph, Jeff Bezos, job automation, John Markoff, Kickstarter, knowledge worker, Lean Startup, low skilled workers, Lyft, machine translation, mandatory minimum, Mark Zuckerberg, Menlo Park, minimum viable product, natural language processing, Neil Armstrong, new economy, Nick Bostrom, OpenAI, pattern recognition, pirate software, profit maximization, QR code, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, risk tolerance, Robert Mercer, Rodney Brooks, Rubik’s Cube, Sam Altman, Second Machine Age, self-driving car, sentiment analysis, sharing economy, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, Skype, SoftBank, Solyndra, special economic zone, speech recognition, Stephen Hawking, Steve Jobs, strong AI, TED Talk, The Future of Employment, Travis Kalanick, Uber and Lyft, uber lyft, universal basic income, urban planning, vertical integration, Vision Fund, warehouse robotics, Y Combinator

Does Amazon seem to know what you’ll want to buy before you do? If so, then you have been the beneficiary (or victim, depending on how you value your time, privacy, and money) of internet AI. This first wave began almost fifteen years ago but finally went mainstream around 2012. Internet AI is largely about using AI algorithms as recommendation engines: systems that learn our personal preferences and then serve up content hand-picked for us. The horsepower of these AI engines depends on the digital data they have access to, and there’s currently no greater storehouse of this data than the major internet companies. But that data only becomes truly useful to algorithms once it has been labeled.

See artificial intelligence (AI) AI engineers, 14 Airbnb, 39, 49, 73 AI revolution deep learning and, 5, 25, 92, 94, 143 economic impact of, 151–52 speed of, 152–55 AI winters, 6–7, 8, 9, 10 algorithmic bias, 229 algorithms, AI AI revolution and, 152–53 computing power and, 14, 56 credit and, 112–13 data and, 14, 17, 56, 138 fake news detection by, 109 intelligence sharing and, 87 legal applications for, 115–16 medical diagnosis and, 114–15 as recommendation engines, 107–8 robot reporting, 108 white-collar workers and, 167, 168 Alibaba Amazon compared to, 109 Chinese startups and, 58 City Brain, 93–94, 117, 124, 228 as dominant AI player, 83, 91, 93–94 eBay and, 34–35 financial services spun off from, 73 four waves of AI and, 106, 107, 109 global markets and, 137 grid approach and, 95 Microsoft Research Asia and, 89 mobile payments transition, 76 New York Stock Exchange debut, 66–67 online purchasing and, 68 success of, 40 Tencent’s “Pearl Harbor attack” on, 60–61 Wang Xing and, 24 Alipay, 35, 60, 69, 73–74, 75, 112, 118 Alphabet, 92–93 AlphaGo, 1–4, 5, 6, 11, 199 AlphaGo Zero, 90 Altman, Sam, 207 Amazon Alibaba compared to, 109 Chinese market and, 39 data captured by, 77 as dominant AI player, 83, 91 four waves of AI and, 106 grid approach and, 95 innovation mentality at, 33 monopoly of e-commerce, 170 online purchasing and, 68 Wang Xing and, 24 warehouses, 129–30 Amazon Echo, 117, 127 Amazon Go, 163, 213 Anderson, Chris, 130 Andreesen Horowitz, 70 Ant Financial, 73 antitrust laws, 20, 28, 171, 229 Apollo project, 135 app constellation model, 70 Apple, 33, 75, 117, 126, 143, 177, 184 Apple Pay, 75, 76 app-within-an-app model, 59 ARM (British firm), 96 Armstrong, Neil, 3 artificial general intelligence (AGI), 140–44 artificial intelligence (AI) introduction to, ix–xi See also China; deep learning; economy and AI; four waves of AI; global AI story; human coexistence with AI; new world order artificial superintelligence.


pages: 374 words: 94,508

Infonomics: How to Monetize, Manage, and Measure Information as an Asset for Competitive Advantage by Douglas B. Laney

3D printing, Affordable Care Act / Obamacare, banking crisis, behavioural economics, blockchain, book value, business climate, business intelligence, business logic, business process, call centre, carbon credits, chief data officer, Claude Shannon: information theory, commoditize, conceptual framework, crowdsourcing, dark matter, data acquisition, data science, deep learning, digital rights, digital twin, discounted cash flows, disintermediation, diversification, en.wikipedia.org, endowment effect, Erik Brynjolfsson, full employment, hype cycle, informal economy, information security, intangible asset, Internet of things, it's over 9,000, linked data, Lyft, Nash equilibrium, Neil Armstrong, Network effects, new economy, obamacare, performance metric, profit motive, recommendation engine, RFID, Salesforce, semantic web, single source of truth, smart meter, Snapchat, software as a service, source of truth, supply-chain management, tacit knowledge, technological determinism, text mining, uber lyft, Y2K, yield curve

This information can also have real commercial value—especially when mashed with other sources—to understand and act on local or global market conditions, population trends, and weather, for example. Public data even can be used to create new (ahem) high-value businesses such as Potbot, a virtual cannabis “budtender.” At its core is a recommendation engine that uses information on strains, cannabinoids, and medical applications aggregated via semantic web technology. Potbot also incorporates data from cannabis seed DNA scans along with recordings of brain activity in clinical tests. It monetizes this information, not just in the form of a consumer app, but also in helping growers improve their yields for the most popular or beneficial strains.1617 Public data is most monetizable when integrated with your own proprietary information.

Even when embedded into business applications, they tend to present charts or numbers in an application window. Ideally, output is updated to reflect the user’s activity and needs, but less often is it used to affect the business process directly. Evolving to complex-event processing solutions, recommendation engines, rule-based systems, or artificial intelligence (AI), combined with business process management and workflow systems, can help to optimize business processes more directly, either supplementing or supplanting human intervention. Case in point: a company formed from a collection of shopping stalls in 1919 by an English trader named Jack Cohen today has hardwired its thousands of refrigeration units to a data warehouse.


pages: 94 words: 26,453

The End of Nice: How to Be Human in a World Run by Robots (Kindle Single) by Richard Newton

3D printing, Abraham Maslow, adjacent possible, Black Swan, British Empire, Buckminster Fuller, Clayton Christensen, crowdsourcing, deliberate practice, digital divide, disruptive innovation, fail fast, fear of failure, Filter Bubble, future of work, Google Glasses, growth hacking, Isaac Newton, James Dyson, Jaron Lanier, Jeff Bezos, job automation, lateral thinking, Lean Startup, lolcat, low skilled workers, Mark Zuckerberg, move fast and break things, Paul Erdős, Paul Graham, reality distortion field, recommendation engine, rising living standards, Robert Shiller, Silicon Valley, Silicon Valley startup, skunkworks, social intelligence, Steve Ballmer, Steve Jobs, Tyler Cowen, Y Combinator

Like the sirens of legends sung sweet songs to lure sailors to crash on the rocky shore of their island, so Lanier thinks we must be wary of the attractions of the siren servers. They don’t want to make your life more complicated. They are there to make everything frictionless: “Leave it to me”, they sing. “I’ll find you new music you might like, books you’ll want to read, videos you want to watch and friends you should like.” We’re sort of used to the idea that recommendation engines work like this. We know that ads now follow us around the web and that books will be unhelpfully recommended to us by Amazon. But search results are also tailored to you. And that’s more of a concern. The search results you get will be different to the results for an identical search made by me.


Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman

cloud computing, crowdsourcing, en.wikipedia.org, first-price auction, G4S, information retrieval, John Snow's cholera map, Netflix Prize, NP-complete, PageRank, pattern recognition, power law, random walk, recommendation engine, second-price auction, sentiment analysis, social graph, statistical model, the long tail, web application

However, these technologies by themselves are not sufficient, and there are some new algorithms that have proven effective for recommendation systems. 9.1 A Model for Recommendation Systems In this section we introduce a model for recommendation systems, based on a utility matrix of preferences. We introduce the concept of a “long-tail,” which explains the advantage of on-line vendors over conventional, brick-and-mortar vendors. We then briefly survey the sorts of applications in which recommendation systems have proved useful. 9.1.1 The Utility Matrix In a recommendation-system application there are two classes of entities, which we shall refer to as users and items.
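The utility matrix described in this excerpt can be pictured with a small sketch. The user names, item names, and ratings below are invented for illustration, and the dict-of-dicts layout is only one assumed way to store a matrix in which most user-item pairs have no known value.

```python
# Hypothetical toy data: (user, item, rating) triples invented for illustration.
ratings = [
    ("alice", "book_a", 5),
    ("alice", "book_b", 3),
    ("bob", "book_a", 4),
    ("carol", "book_c", 2),
]


def build_utility_matrix(triples):
    """Return a sparse utility matrix as {user: {item: rating}}."""
    matrix = {}
    for user, item, rating in triples:
        matrix.setdefault(user, {})[item] = rating
    return matrix


def density(matrix, items):
    """Fraction of user-item pairs whose value is known."""
    known = sum(len(row) for row in matrix.values())
    return known / (len(matrix) * len(items))


utility = build_utility_matrix(ratings)
all_items = sorted({item for _, item, _ in ratings})

# A missing entry means "unknown", not "rated zero".
print(utility["alice"].get("book_c"))
print(density(utility, all_items))
```

Storing only the known entries keeps the distinction between an unknown preference and a low rating explicit, which is the distinction a recommendation system has to work around.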

Here, the term “on-line” refers to the nature of the algorithm, and should not be confused with “on-line” meaning “on the Internet” in phrases such as “on-line algorithms for on-line advertising.” 2 A chesterfield is a type of sofa. See, for example, www.chesterfields.info. 3 Thanks to Anna Karlin for this example. 9 Recommendation Systems There is an extensive class of Web applications that involve predicting user responses to options. Such a facility is called a recommendation system. We shall begin this chapter with a survey of the most important examples of these systems. However, to bring the problem into focus, two good examples of recommendation systems are: (1) Offering news articles to on-line newspaper readers, based on a prediction of reader interests. (2) Offering customers of an on-line retailer suggestions about what they might like to buy, based on their past history of purchases and/or product searches.

Rather, it is only necessary to discover some entries in each row that are likely to be high. In most applications, the recommendation system does not offer users a ranking of all items, but rather suggests a few that the user should value highly. It may not even be necessary to find all items with the highest expected ratings, but only to find a large subset of those with the highest ratings. 9.1.2 The Long Tail Before discussing the principal applications of recommendation systems, let us ponder the long tail phenomenon that makes recommendation systems necessary. Physical delivery systems are characterized by a scarcity of resources.
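The point made earlier in this passage, that a system only needs to surface a few high-valued entries per row rather than rank every item, can be sketched as follows. The predicted scores are hypothetical, and the function name `top_k_recommendations` is introduced here for illustration only.

```python
import heapq


def top_k_recommendations(predicted_row, already_rated, k=2):
    """Return up to k unrated items with the highest predicted value.

    heapq.nlargest avoids sorting the whole row, which matters when
    the catalogue is large and only a handful of suggestions are shown.
    """
    candidates = (
        (score, item)
        for item, score in predicted_row.items()
        if item not in already_rated
    )
    return [item for _, item in heapq.nlargest(k, candidates)]


# Hypothetical predicted ratings for one user's row of the utility matrix.
predicted = {"book_a": 4.8, "book_b": 2.1, "book_c": 4.2, "book_d": 3.9}

print(top_k_recommendations(predicted, already_rated={"book_a"}, k=2))
```

Items the user has already rated are filtered out first, so the suggestions are drawn only from the unknown entries of the row.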


pages: 301 words: 85,126

AIQ: How People and Machines Are Smarter Together by Nick Polson, James Scott

Abraham Wald, Air France Flight 447, Albert Einstein, algorithmic bias, Amazon Web Services, Atul Gawande, autonomous vehicles, availability heuristic, basic income, Bayesian statistics, Big Tech, Black Lives Matter, Bletchley Park, business cycle, Cepheid variable, Checklist Manifesto, cloud computing, combinatorial explosion, computer age, computer vision, Daniel Kahneman / Amos Tversky, data science, deep learning, DeepMind, Donald Trump, Douglas Hofstadter, Edward Charles Pickering, Elon Musk, epigenetics, fake news, Flash crash, Grace Hopper, Gödel, Escher, Bach, Hans Moravec, Harvard Computers: women astronomers, Higgs boson, index fund, information security, Isaac Newton, John von Neumann, late fees, low earth orbit, Lyft, machine translation, Magellanic Cloud, mass incarceration, Moneyball by Michael Lewis explains big data, Moravec's paradox, more computing power than Apollo, natural language processing, Netflix Prize, North Sea oil, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, p-value, pattern recognition, Pierre-Simon Laplace, ransomware, recommendation engine, Ronald Reagan, Salesforce, self-driving car, sentiment analysis, side project, Silicon Valley, Skype, smart cities, speech recognition, statistical model, survivorship bias, systems thinking, the scientific method, Thomas Bayes, Uber for X, uber lyft, universal basic income, Watson beat the top human players on Jeopardy!, young professional

Each envelope would come back a few days after it had been sent out, along with the subscriber’s rating of the film on a 1-to-5 scale. As that ratings data accumulated, Netflix’s algorithms would look for patterns, and over time, subscribers would get better film recommendations. (This kind of AI is usually called a “recommender system”; we also like the term “suggestion engine.”) Netflix 1.0 was so focused on improving its recommender system that in 2007, to great fanfare among math geeks the world over, it announced a public machine-learning contest with a prize of $1 million. The company put some of its ratings data on a public server, and it challenged all comers to improve upon Netflix’s own system, called Cinematch, by at least 10%—that is, by predicting how you’d rate a film with 10% better accuracy than Netflix could.
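The "10% better accuracy" target can be made concrete with a short sketch. The Netflix Prize scored entries by root-mean-square error (RMSE) on held-out ratings; the rating values below are invented for illustration and are not contest data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, candidate_rmse):
    """Fractional reduction in error relative to the baseline."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Hypothetical ratings on a 1-to-5 scale.
actual = [4, 3, 5, 2]
baseline_predictions = [3, 3, 4, 3]   # stand-in for an existing system
candidate_predictions = [4, 3, 4, 2]  # stand-in for a challenger

baseline_error = rmse(baseline_predictions, actual)
candidate_error = rmse(candidate_predictions, actual)
improvement = relative_improvement(baseline_error, candidate_error)

print(f"baseline RMSE:  {baseline_error:.3f}")
print(f"candidate RMSE: {candidate_error:.3f}")
print(f"meets a 10% threshold: {improvement >= 0.10}")
```

A lower RMSE means predictions sit closer to the ratings subscribers actually gave, so "10% better" here means an error at least 10% smaller than the baseline's.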

Abraham Wald never shot down a Messerschmitt or even saw the inside of a combat aircraft. Nonetheless, he made an outsized contribution to the Allied war effort using an equally potent weapon: conditional probability. Specifically, Wald built a recommender system that could make personalized survivability suggestions for different kinds of planes. At its heart, it was just like a modern AI-based recommender system for TV shows. And when you understand how he built it, you’ll also understand a lot more about Netflix, Hulu, Spotify, Instagram, Amazon, YouTube, and just about every tech company that’s ever made you an automatic suggestion worth following.
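As a rough illustration of how conditional probability can drive a suggestion, the sketch below estimates the chance that a viewer watched one show given that they watched another. The viewing histories are invented, and this is a simplified stand-in for the idea, not a reconstruction of Wald's analysis or of any company's system.

```python
# Hypothetical viewing histories, invented for illustration.
histories = {
    "u1": {"show_a", "show_b"},
    "u2": {"show_a", "show_b", "show_c"},
    "u3": {"show_a"},
    "u4": {"show_b", "show_c"},
}


def conditional_probability(histories, target, given):
    """Estimate P(watched target | watched given) from raw counts."""
    watched_given = [h for h in histories.values() if given in h]
    if not watched_given:
        return None  # no evidence to condition on
    watched_both = sum(1 for h in watched_given if target in h)
    return watched_both / len(watched_given)


def best_suggestion(histories, given):
    """Pick the other show with the highest conditional probability."""
    shows = set().union(*histories.values()) - {given}
    return max(
        sorted(shows),
        key=lambda s: conditional_probability(histories, s, given),
    )


print(conditional_probability(histories, "show_b", "show_a"))
print(best_suggestion(histories, "show_a"))
```

Conditioning on what a subscriber has already watched narrows the population to similar viewers, which is what makes the resulting suggestion personalized rather than a global popularity ranking.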

See health care and medicine Medtronic Menger, Karl Microsoft Microsoft Azure modeling assumptions and deep-learning models imputation and Inception latent feature massive models missing data and model rust natural language processing and prediction rules as reality versus rules-based (top-down) models training the model Moneyball Moore’s law Moravec paradox Morgenstern, Oskar Musk, Elon natural language processing (NLP) ambiguity and bottom-up approach chatbots digital assistants future trends Google Translate growth of statistical NLP knowing how versus knowing that natural language revolution “New Deal” for human-machine linguistic interaction prediction rules and programing language revolution robustness and rule bloat and speech recognition top-down approach word co-location statistics word vectors naturally occurring radioactive materials (NORM) Netflix Crown, The (series) data scientists history of House of Cards (series) Netflix Prize for recommender system personalization recommender systems neural networks deep learning and Friends new episodes and Inception model prediction rules and New England Patriots Newton, Isaac Nightingale, Florence coxcomb diagram (1858) Crimean War and early years and training evidence-based medicine legacy of “lady with the lamp” medical statistics legacy of nursing reform legacy of Nvidia Obama, Barack Office of Scientific Research and Development parallax pattern recognition cucumber sorting input and output learning a pattern maximum heart rate and prediction rules and toilet paper theft and See also prediction rules PayPal personalization conditional probability and latent feature models and Netflix and Wald’s survivability recommendations for aircraft and See also recommender systems; suggestion engines philosophy Pickering, Edward C.


Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

backpropagation, bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, disinformation, distributed generation, finite state, industrial research laboratory, information retrieval, information security, iterative process, knowledge worker, linked data, machine readable, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, power law, random walk, recommendation engine, RFID, search costs, semantic web, seminal paper, sentiment analysis, sparse data, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

Data mining technology can be used to develop strong intrusion detection and prevention systems, which may employ signature-based or anomaly-based detection.

13.3.5. Data Mining and Recommender Systems

Today's consumers are faced with millions of goods and services when shopping online. Recommender systems help consumers by recommending products that are likely to interest them, such as books, CDs, movies, restaurants, online news articles, and other services. Recommender systems may use either a content-based approach, a collaborative approach, or a hybrid approach that combines both content-based and collaborative methods.

Such profiles may be obtained explicitly (e.g., through questionnaires) or learned from users' transactional behavior over time. A collaborative recommender system tries to predict the utility of items for a user, u, based on items previously rated by other users who are similar to u. For example, when recommending books, a collaborative recommender system tries to find other users who have a history of agreeing with u (e.g., they tend to buy similar books, or give similar ratings for books). Collaborative recommender systems can be either memory (or heuristic) based or model based. Memory-based methods essentially use heuristics to make rating predictions based on the entire collection of items previously rated by users.
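The idea of finding "other users who have a history of agreeing with u" can be sketched as follows. This is a minimal illustration, not the book's own code: the ratings, the user names, and the choice of Pearson correlation as the agreement measure are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings: user -> {book: rating on a 1-5 scale}.
ratings = {
    "u":     {"book_a": 5, "book_b": 3, "book_c": 4},
    "alice": {"book_a": 4, "book_b": 2, "book_c": 5, "book_d": 5},
    "bob":   {"book_a": 1, "book_b": 5, "book_c": 2, "book_d": 1},
}

def pearson(r1, r2):
    """Pearson correlation over the items both users rated; 0.0 if undefined."""
    common = sorted(set(r1) & set(r2))
    if len(common) < 2:
        return 0.0
    m1 = sum(r1[i] for i in common) / len(common)
    m2 = sum(r2[i] for i in common) / len(common)
    num = sum((r1[i] - m1) * (r2[i] - m2) for i in common)
    den = sqrt(sum((r1[i] - m1) ** 2 for i in common)) * \
          sqrt(sum((r2[i] - m2) ** 2 for i in common))
    return num / den if den else 0.0

def neighbors(user, ratings):
    """Other users sorted from most to least similar to `user`."""
    scored = [(pearson(ratings[user], r), name)
              for name, r in ratings.items() if name != user]
    return sorted(scored, reverse=True)

for similarity, name in neighbors("u", ratings):
    print(f"{name}: {similarity:+.2f}")
```

With this toy data, alice tends to agree with u and bob tends to disagree, so alice's ratings would carry more weight in a memory-based prediction for u.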

A weighted aggregate can be used, which adjusts for the fact that different users may use the rating scale differently. Model-based collaborative recommender systems use a collection of ratings to learn a model, which is then used to make rating predictions. For example, probabilistic models, clustering (which finds clusters of like-minded customers), Bayesian networks, and other machine learning techniques have been used. Recommender systems face major challenges such as scalability and ensuring quality recommendations to the consumer. For example, regarding scalability, collaborative recommender systems must be able to search through millions of potential neighbors in real time.
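One common form of such a weighted aggregate subtracts each neighbor's own average rating before combining, so that a harsh rater's 3 and a generous rater's 3 are not treated alike. The sketch below assumes the similarity weights have already been computed; the function name, the data, and the weights are hypothetical, and this is one possible formulation rather than the one the text necessarily has in mind.

```python
def predict(user_mean, neighbor_data, item):
    """Mean-centered weighted aggregate.

    neighbor_data: list of (similarity, ratings_dict) pairs for other users.
    Returns user_mean plus the similarity-weighted average of how far each
    neighbor's rating of `item` sits from that neighbor's own mean.
    """
    num = 0.0
    den = 0.0
    for similarity, their_ratings in neighbor_data:
        if item not in their_ratings:
            continue  # this neighbor has not rated the item
        their_mean = sum(their_ratings.values()) / len(their_ratings)
        num += similarity * (their_ratings[item] - their_mean)
        den += abs(similarity)
    return user_mean if den == 0 else user_mean + num / den

# Both neighbors gave item "d" a raw 3, but it means different things.
harsh = {"a": 1, "b": 2, "d": 3}      # mean 2: a 3 is above average
generous = {"a": 4, "b": 5, "d": 3}   # mean 4: a 3 is below average

estimate = predict(3.0, [(0.8, harsh), (0.2, generous)], "d")
print(round(estimate, 2))
```

A plain average of the raw ratings would return 3 here. The mean-centered version leans toward the harsh rater's relatively favorable opinion because that neighbor has the larger weight.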


pages: 391 words: 105,382

Utopia Is Creepy: And Other Provocations by Nicholas Carr

Abraham Maslow, Air France Flight 447, Airbnb, Airbus A320, AltaVista, Amazon Mechanical Turk, augmented reality, autonomous vehicles, Bernie Sanders, book scanning, Brewster Kahle, Buckminster Fuller, Burning Man, Captain Sullenberger Hudson, centralized clearinghouse, Charles Lindbergh, cloud computing, cognitive bias, collaborative consumption, computer age, corporate governance, CRISPR, crowdsourcing, Danny Hillis, data science, deskilling, digital capitalism, digital map, disruptive innovation, Donald Trump, driverless car, Electric Kool-Aid Acid Test, Elon Musk, Evgeny Morozov, factory automation, failed state, feminist movement, Frederick Winslow Taylor, friendly fire, game design, global village, Google bus, Google Glasses, Google X / Alphabet X, Googley, hive mind, impulse control, indoor plumbing, interchangeable parts, Internet Archive, invention of movable type, invention of the steam engine, invisible hand, Isaac Newton, Jeff Bezos, jimmy wales, Joan Didion, job automation, John Perry Barlow, Kevin Kelly, Larry Ellison, Lewis Mumford, lifelogging, lolcat, low skilled workers, machine readable, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Max Levchin, means of production, Menlo Park, mental accounting, natural language processing, Neal Stephenson, Network effects, new economy, Nicholas Carr, Nick Bostrom, Norman Mailer, off grid, oil shale / tar sands, Peter Thiel, plutocrats, profit motive, Ralph Waldo Emerson, Ray Kurzweil, recommendation engine, Republic of Letters, robot derives from the Czech word robota Czech, meaning slave, Ronald Reagan, scientific management, self-driving car, SETI@home, side project, Silicon Valley, Silicon Valley ideology, Singularitarianism, Snapchat, social graph, social web, speech recognition, Startup school, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, technoutopianism, TED Talk, the long tail, the medium is the message, theory of mind, Turing test, Tyler Cowen, Whole Earth Catalog, Y Combinator, Yochai Benkler

The information may take the form of personal messages or updates from friends or colleagues, broadcast messages from experts or celebrities whose opinions or observations we value, headlines and stories from writers or publications we like, alerts about the availability of various other sorts of content on favorite subjects, or suggestions from recommendation engines—but it all shares the quality of being tailored to our particular interests. It’s all needles. And modern filters don’t just organize that information for us; they push the information at us as alerts, updates, streams. We tend to point to spam as an example of information overload. But spam is just an annoyance.

Social media is a palliative for underemployment. 18. The philistine appears ideally suited to the role of cultural impresario online. 19. Television became more interesting when people started paying for it. 20. Instagram shows us what a world without art looks like. SECOND SERIES (2013) 21. Recommendation engines are the best cure for hubris. 22. Vines would be better if they were one second shorter. 23. Hell is other selfies. 24. Twitter has revealed that brevity and verbosity are not always antonyms. 25. Personalized ads provide a running critique of artificial intelligence. 26. Who you are is what you do between notifications. 27.


pages: 382 words: 105,819

Zucked: Waking Up to the Facebook Catastrophe by Roger McNamee

"Susan Fowler" uber, "World Economic Forum" Davos, 4chan, Albert Einstein, algorithmic trading, AltaVista, Amazon Web Services, Andy Rubin, barriers to entry, Bernie Sanders, Big Tech, Bill Atkinson, Black Lives Matter, Boycotts of Israel, Brexit referendum, Cambridge Analytica, carbon credits, Cass Sunstein, cloud computing, computer age, cross-subsidies, dark pattern, data is the new oil, data science, disinformation, Donald Trump, Douglas Engelbart, Douglas Engelbart, driverless car, Electric Kool-Aid Acid Test, Elon Musk, fake news, false flag, Filter Bubble, game design, growth hacking, Ian Bogost, income inequality, information security, Internet of things, It's morning again in America, Jaron Lanier, Jeff Bezos, John Markoff, laissez-faire capitalism, Lean Startup, light touch regulation, Lyft, machine readable, Marc Andreessen, Marc Benioff, Mark Zuckerberg, market bubble, Max Levchin, Menlo Park, messenger bag, Metcalfe’s law, minimum viable product, Mother of all demos, move fast and break things, Network effects, One Laptop per Child (OLPC), PalmPilot, paypal mafia, Peter Thiel, pets.com, post-work, profit maximization, profit motive, race to the bottom, recommendation engine, Robert Mercer, Ronald Reagan, Russian election interference, Sand Hill Road, self-driving car, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, Skype, Snapchat, social graph, software is eating the world, Stephen Hawking, Steve Bannon, Steve Jobs, Steven Levy, Stewart Brand, subscription business, TED Talk, The Chicago School, The future is already here, Tim Cook: Apple, two-sided market, Uber and Lyft, Uber for X, uber lyft, Upton Sinclair, vertical integration, WikiLeaks, Yom Kippur War

The ease with which like-minded extremists can find one another creates the illusion of legitimacy. Protected from real-world stigma, communication among extreme voices over internet platforms generally evolves to more dangerous language. Normalization lowers a barrier for the curious; algorithmic reinforcement leads some users to increasingly extreme positions. Recommendation engines can and do exploit that. For example, former YouTube algorithm engineer Guillaume Chaslot created a program to take snapshots of what YouTube would recommend to users. He learned that when a user watches a regular 9/11 news video, YouTube will then recommend 9/11 conspiracies; if a teenage girl watches a video on food dietary habits, YouTube will recommend videos that promote anorexia-related behaviors.

It is not for nothing that the industry jokes about YouTube’s “three degrees of Alex Jones,” referring to the notion that no matter where you start, YouTube’s algorithms will often surface a Jones conspiracy theory video within three recommendations. In an op-ed in Wired, my colleague Renée DiResta quoted YouTube chief product officer Neal Mohan as saying that 70 percent of the views on his platform are from recommendations. In the absence of a commitment to civic responsibility, the recommendation engine will be programmed to do the things that generate the most profit. Conspiracy theories cause users to spend more time on the site. Once a person identifies with an extreme position on an internet platform, he or she will be subject to both filter bubbles and human nature. A steady flow of ideas that confirm beliefs will lead many users to make choices that exclude other ideas both online and off.


pages: 428 words: 103,544

The Data Detective: Ten Easy Rules to Make Sense of Statistics by Tim Harford

Abraham Wald, access to a mobile phone, Ada Lovelace, affirmative action, algorithmic bias, Automated Insights, banking crisis, basic income, behavioural economics, Black Lives Matter, Black Swan, Bretton Woods, British Empire, business cycle, Cambridge Analytica, Capital in the Twenty-First Century by Thomas Piketty, Cass Sunstein, Charles Babbage, clean water, collapse of Lehman Brothers, contact tracing, coronavirus, correlation does not imply causation, COVID-19, cuban missile crisis, Daniel Kahneman / Amos Tversky, data science, David Attenborough, Diane Coyle, disinformation, Donald Trump, Estimating the Reproducibility of Psychological Science, experimental subject, fake news, financial innovation, Florence Nightingale: pie chart, Gini coefficient, Great Leap Forward, Hans Rosling, high-speed rail, income inequality, Isaac Newton, Jeremy Corbyn, job automation, Kickstarter, life extension, meta-analysis, microcredit, Milgram experiment, moral panic, Netflix Prize, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, opioid epidemic / opioid crisis, Paul Samuelson, Phillips curve, publication bias, publish or perish, random walk, randomized controlled trial, recommendation engine, replication crisis, Richard Feynman, Richard Thaler, rolodex, Ronald Reagan, selection bias, sentiment analysis, Silicon Valley, sorting algorithm, sparse data, statistical model, stem cell, Stephen Hawking, Steve Bannon, Steven Pinker, survivorship bias, systematic bias, TED Talk, universal basic income, W. E. B. Du Bois, When a measure becomes a target

What sort of accountability or transparency we want depends on what problem we are trying to solve. We might, for example, want to distinguish YouTube’s algorithm for recommending videos from Netflix’s algorithm for recommending movies. There is plenty of disturbing content on YouTube, and its recommendation engine has become notorious for its apparent tendency to suggest ever more fringy and conspiratorial videos. It’s not clear that the evidence supports the idea that YouTube is an engine of radicalization, but without more transparency it’s hard to be sure.36 Netflix illustrates a different issue: competition.

See health and medical data public opinion, 149, 220 public transportation, 47–49 publication bias, 113–16, 118–23, 125–27 publicity, 107 Puerto Rico, 197–98, 200 Puy de Dôme, France, 172 Quetelet, Adolphe, 219 racial data, 176–79, 206 Random Walk down Wall Street, A (Malkiel), 125 randomized clinical trials, 4n, 53, 125–26, 133, 180 randomness, 123–24 Rapid Safety Feedback, 170–71 Rayner, Derek, 205–8 Reaper Man (Pratchett), 87 recessions, 11 recommendation engines, 181 record-keeping practices, 220–21 refugees, 191 Reifler, Jason, 129 Reischauer, Robert, 187 Reiter, Jonathan, 108 reliability of data, 233–37 religious authority, 16 religious beliefs, 247–48 Remington Rand, 244 replication/reproducibility studies and problems, 107, 112–16, 120–22, 129–31 Republican Party, 34, 189n, 269, 270 résumé-sorting algorithms, 166 ridership data, 49–51 Riecken, Henry, 239 risk models, 71 Rivlin, Alice, 186–87, 188, 212 Robinson, Nicholas, 168, 169–70 Roman Catholic Church, 16 Rönnlund, Anna Rosling, 62, 63 Roosevelt, Franklin Delano, 143–44 rose diagrams, 215–16, 233–36, 234 Roser, Max, 89, 96 Rosling, Hans, 63, 185 Ross, Lee, 35 Royal Naval Reserve, 218 Royal Society, 13 Royal Statistical Society, 194, 214, 219, 233 Rozenblit, Leonid, 272 Ruge, Mari, 89 sampling techniques, 135–38, 142–51, 155 Samuelson, Paul, 239 sanitation advocacy, 225–26, 233–37 Santos, Alexander, 198 Say It with Charts (Zelazny), 228 scale of statistical data, 92, 93–95, 103 Scarr, Simon, 231–32 Schachter, Stanley, 239 Scheibehenne, Benjamin, 106, 111, 114, 120–21 Scientific American, 102 scientific curiosity, 268–69 scientific literacy, 34–35 scientific method, 173 Scott, James C., 201, 203 Scott Brown, Denise, 217 screen-use studies, 117–18 Scutari (Üsküdar, Istanbul) barracks hospital, 213–14, 220, 225, 233, 235 search algorithms, 156–57 Second World War, 4, 262 secrecy, 174–75 Seehofer, Horst, 191 Seeing Like a State (Scott), 201, 203 selection bias, 2, 245–46.


pages: 268 words: 75,850

The Formula: How Algorithms Solve All Our Problems-And Create More by Luke Dormehl

3D printing, algorithmic bias, algorithmic trading, Alvin Toffler, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, big data - Walmart - Pop Tarts, call centre, Cass Sunstein, classic study, Clayton Christensen, commoditize, computer age, death of newspapers, deferred acceptance, disruptive innovation, Edward Lorenz: Chaos theory, Erik Brynjolfsson, Evgeny Morozov, Filter Bubble, Flash crash, Florence Nightingale: pie chart, Ford Model T, Frank Levy and Richard Murnane: The New Division of Labor, fulfillment center, Google Earth, Google Glasses, High speed trading, Internet Archive, Isaac Newton, Jaron Lanier, Jeff Bezos, job automation, John Markoff, Kevin Kelly, Kodak vs Instagram, Lewis Mumford, lifelogging, machine readable, machine translation, Marshall McLuhan, means of production, Nate Silver, natural language processing, Netflix Prize, Panopticon Jeremy Bentham, Paradox of Choice, pattern recognition, price discrimination, recommendation engine, Richard Thaler, Rosa Parks, scientific management, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Slavoj Žižek, social graph, speech recognition, stable marriage problem, Steve Jobs, Steven Levy, Steven Pinker, Stewart Brand, technological determinism, technological solutionism, TED Talk, the long tail, the scientific method, The Signal and the Noise by Nate Silver, upwardly mobile, Wall-E, Watson beat the top human players on Jeopardy!, Y Combinator

Conversely, scores fall dramatically in situations where the task takes longer than expected.33 Decimated-Reality Aggregators Speaking in October 1944, during the rebuilding of the House of Commons, which had sustained heavy bombing damage during the Battle of Britain, former British prime minister Winston Churchill observed, “We shape our buildings; thereafter they shape us.”34 A similar sentiment might be said in the age of The Formula, in which users shape their online profiles, and from that point forward their online profiles begin to shape them—both in terms of what we see and, perhaps more crucially, what we don’t. Writing about a start-up called Nara, in the middle of 2013, I coined the phrase “decimated reality aggregators” to describe what the company was trying to do.35 Starting out as a restaurant recommender system by connecting together thousands of restaurants around the world, Nara’s ultimate goal was to become the recommender system for your life: drawing on what it knew about you from the restaurants you ate in, to suggest everything from hotels to clothes. Nara even incorporated the idea of upward mobility into its algorithm. Say, for example, you wanted to be a wine connoisseur two years down the line, but currently had no idea how to tell your Chardonnay from your Chianti.

“Neil was adamant that this should be based on science,” Carter says. Before eHarmony, the majority of dating websites took the form of searchable personal ads, of the kind that have been appearing in print since the 17th century.11 After eHarmony, the search engine model was replaced with a recommender system praised in press materials for its “scientific precision.” Instead of allowing users to scan through page after page of profiles, eHarmony simply required them to answer a series of questions—and then picked out the right option on their behalf. The website opened its virtual doors for the first time on August 22, 2000.

All a character has to do—as occurs during one scene in which the novel’s bumbling protagonist, Lenny Abramov, visits a Staten Island nightclub with his friends—is to set the “community parameters” of their iPhone-like device to a particular physical space and hit a button. At this point, every aspect of a person’s profile is revealed, including their “fuckability” and “personality” scores (both ranked on a scale of 800), along with their ranked “anal/oral/vaginal” preferences. There is even a recommender system incorporated, so that a user’s history of romantic relationships can be scrutinized for insights in much the same way that a person’s previous orders on Amazon might dictate what they will be interested in next. As one of Abramov’s friends notes, “This girl [has] a long multimedia thing on how her father abused her . . .


pages: 151 words: 39,757

Ten Arguments for Deleting Your Social Media Accounts Right Now by Jaron Lanier

4chan, Abraham Maslow, basic income, Big Tech, Black Lives Matter, Cambridge Analytica, cloud computing, context collapse, corporate governance, data science, disinformation, Donald Trump, en.wikipedia.org, fake news, Filter Bubble, gig economy, Internet of things, Jaron Lanier, life extension, Mark Zuckerberg, market bubble, Milgram experiment, move fast and break things, Network effects, peak TV, ransomware, Ray Kurzweil, recommendation engine, Silicon Valley, Skinner box, Snapchat, Stanford prison experiment, stem cell, Steve Jobs, Ted Nelson, theory of mind, WikiLeaks, you are the product, zero-sum game

The correlations are effectively theories about the nature of each person, and those theories are constantly measured and rated for how predictive they are. Like all well-managed theories, they improve over time through adaptive feedback. C is for Cramming content down people’s throats Algorithms choose what each person experiences through their devices. This component might be called a feed, a recommendation engine, or personalization. Component C means each person sees different things. The immediate motivation is to deliver stimuli for individualized behavior modification. BUMMER makes it harder to understand why others think and act the way they do. The effects of this component will be examined more in the arguments about how you are losing access to truth and the capacity for empathy.


pages: 163 words: 42,402

Machine Learning for Email by Drew Conway, John Myles White

call centre, correlation does not imply causation, data science, Debian, natural language processing, Netflix Prize, pattern recognition, recommendation engine, SpamAssassin, text mining

More likely, you have heard of something like a recommendation system, which implicitly produces a ranking of products. Even if you have not heard of a recommendation system, it’s almost certain that you have used or interacted with a recommendation system at some point. Some of the most successful e-commerce websites have benefitted from leveraging data on their users to generate recommendations for other products their users might be interested in. For example, if you have ever shopped at Amazon.com, then you have interacted with a recommendation system. The problem Amazon faces is simple: what items in their inventory are you most likely to buy?

Many hackers may be more comfortable thinking of problems in terms of the process by which a solution is attained, rather than the theoretical foundation from which the solution is derived. From this perspective, an alternative approach to teaching machine learning would be to use “cookbook” style examples. To understand how a recommendation system works, for example, we might provide sample training data and a version of the model, and show how the latter uses the former. There are many useful texts of this kind as well—Toby Segaran’s Programming Collective Intelligence is a recent example [Seg07]. Such a discussion would certainly address the how of a hacker’s method of learning, but perhaps less of the why.

The implication of that statement is that the items in Amazon’s inventory have an ordering specific to each user. Likewise, Netflix.com has a massive library of DVDs available to its customers to rent. In order for those customers to get the most out of the site, Netflix employs a sophisticated recommendation system to present people with rental suggestions. For both companies, these recommendations are based on two kinds of data. First, there is the data pertaining to the inventory itself. For Amazon, if the product is a television, this data might contain the type (i.e., plasma, LCD, LED), manufacturer, price, and so on.


pages: 410 words: 119,823

Radical Technologies: The Design of Everyday Life by Adam Greenfield

3D printing, Airbnb, algorithmic bias, algorithmic management, AlphaGo, augmented reality, autonomous vehicles, bank run, barriers to entry, basic income, bitcoin, Black Lives Matter, blockchain, Boston Dynamics, business intelligence, business process, Californian Ideology, call centre, cellular automata, centralized clearinghouse, centre right, Chuck Templeton: OpenTable:, circular economy, cloud computing, Cody Wilson, collective bargaining, combinatorial explosion, Computer Numeric Control, computer vision, Conway's Game of Life, CRISPR, cryptocurrency, David Graeber, deep learning, DeepMind, dematerialisation, digital map, disruptive innovation, distributed ledger, driverless car, drone strike, Elon Musk, Ethereum, ethereum blockchain, facts on the ground, fiat currency, fulfillment center, gentrification, global supply chain, global village, Goodhart's law, Google Glasses, Herman Kahn, Ian Bogost, IBM and the Holocaust, industrial robot, informal economy, information retrieval, Internet of things, Jacob Silverman, James Watt: steam engine, Jane Jacobs, Jeff Bezos, Jeff Hawkins, job automation, jobs below the API, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John Perry Barlow, John von Neumann, joint-stock company, Kevin Kelly, Kickstarter, Kiva Systems, late capitalism, Leo Hollis, license plate recognition, lifelogging, M-Pesa, Mark Zuckerberg, means of production, megacity, megastructure, minimum viable product, money: store of value / unit of account / medium of exchange, natural language processing, Network effects, New Urbanism, Nick Bostrom, Occupy movement, Oculus Rift, off-the-grid, PalmPilot, Pareto efficiency, pattern recognition, Pearl River Delta, performance metric, Peter Eisenman, Peter Thiel, planetary scale, Ponzi scheme, post scarcity, post-work, printed gun, proprietary trading, RAND corporation, recommendation engine, RFID, rolodex, Rutger Bregman, Satoshi Nakamoto, self-driving car, sentiment analysis, shareholder value, sharing economy, Shenzhen special economic zone, Sidewalk Labs, Silicon Valley, smart cities, smart contracts, social intelligence, sorting algorithm, special economic zone, speech recognition, stakhanovite, statistical model, stem cell, technoutopianism, Tesla Model S, the built environment, The Death and Life of Great American Cities, The Future of Employment, Tony Fadell, transaction costs, Uber for X, undersea cable, universal basic income, urban planning, urban sprawl, vertical integration, Vitalik Buterin, warehouse robotics, When a measure becomes a target, Whole Earth Review, WikiLeaks, women in the workforce

The equivalent of classification for unsupervised learning is clustering, in which an algorithm starts to develop a sense for what is significant in its environment via a process of accretion. A concrete example will help us understand how this works. At the end of the 1990s, two engineers named Tim Westergren and Will Glaser developed a rudimentary music-recommendation engine called the Music Genome Project that worked by rebuilding genre from the bottom up. (The engineers eventually founded the Pandora streaming service, and folded their recommendation engine into it.) Music Genome compared the acoustic signatures and other performance characteristics of the pieces of music it was offered, and from them built up associative maps, clustering together all the songs that had similar qualities; after many iterations, these clusters developed a strong resemblance to the musical categories we’re familiar with.
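The general idea of grouping songs by similar qualities over many iterations can be illustrated with a small clustering sketch. Everything here is an assumption made for illustration: the passage does not say which algorithm the Music Genome Project used, so plain k-means is swapped in, and the two song features and their values are invented.

```python
def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))

def kmeans(points, k, iters=10):
    """Plain k-means; the first k points serve as a naive starting guess."""
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iters):
        # Assign every song to its nearest cluster centre.
        labels = [nearest(p, centroids) for p in points]
        # Move each centre to the average of the songs assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical (tempo, distortion) features, each scaled to the range 0..1.
songs = [(0.90, 0.80), (0.20, 0.10), (0.85, 0.90),
         (0.15, 0.20), (0.80, 0.75), (0.10, 0.15)]

print(kmeans(songs, k=2))
```

No genre labels are supplied at any point; the two groups emerge only from the distances between feature vectors, which is the sense in which genre is rebuilt from the bottom up.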


pages: 161 words: 39,526

Applied Artificial Intelligence: A Handbook for Business Leaders by Mariya Yao, Adelyn Zhou, Marlene Jia

Airbnb, algorithmic bias, AlphaGo, Amazon Web Services, artificial general intelligence, autonomous vehicles, backpropagation, business intelligence, business process, call centre, chief data officer, cognitive load, computer vision, conceptual framework, data science, deep learning, DeepMind, en.wikipedia.org, fake news, future of work, Geoffrey Hinton, industrial robot, information security, Internet of things, iterative process, Jeff Bezos, job automation, machine translation, Marc Andreessen, natural language processing, new economy, OpenAI, pattern recognition, performance metric, price discrimination, randomized controlled trial, recommendation engine, robotic process automation, Salesforce, self-driving car, sentiment analysis, Silicon Valley, single source of truth, skunkworks, software is eating the world, source of truth, sparse data, speech recognition, statistical model, strong AI, subscription business, technological singularity, The future is already here

Extensions of this technology include applications such as Pinterest’s Lens and eBay’s ShopBot, which recognize items in pictures uploaded by consumers and make recommendations of similar items currently for sale. The next frontier in recommendation systems is the cold-start scenario, in which algorithms must be able to draw good inferences about users or items despite insufficient information. Layer 6 AI, recently acquired by TD Bank, has focused on making relatively accurate predictions on noisy data in a cold-start scenario. Customer personalization is like a recommendation system on steroids, delivering highly relevant content, experience, or products to consumers without their having to exert additional effort.
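A simple and widely used baseline for the cold-start scenario is to fall back on overall popularity when a user has too little history. The sketch below illustrates only that baseline; it is not a description of how Layer 6 AI or any named product works, and the data, the threshold `min_events`, and the co-occurrence scoring for established users are all hypothetical.

```python
from collections import Counter

def recommend(user, history, k=2, min_events=3):
    """Return up to k items for `user`.

    history: dict mapping each user to the list of items they interacted with.
    Users with fewer than `min_events` items are treated as cold-start.
    """
    seen = set(history.get(user, []))
    if len(seen) < min_events:
        # Cold start: too little is known, so suggest what is popular overall.
        popularity = Counter(i for items in history.values() for i in items)
        return [i for i, _ in popularity.most_common() if i not in seen][:k]
    # Warm user: score unseen items by how much their owners overlap with us.
    scores = Counter()
    for other, items in history.items():
        if other == user:
            continue
        overlap = len(seen & set(items))
        for item in items:
            if item not in seen:
                scores[item] += overlap
    return [i for i, _ in scores.most_common(k)]

history = {
    "ann": ["lamp", "desk", "chair"],
    "ben": ["lamp", "desk", "rug"],
    "cat": ["lamp", "rug", "vase"],
    "new": [],
}

print(recommend("new", history, k=1))  # cold-start path
print(recommend("ann", history, k=2))  # personalized path
```

The popularity fallback gives every new user the same suggestions, which is exactly the limitation that more sophisticated cold-start methods try to overcome.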

Many real-world datasets have noisy, incorrect labels or are missing labels entirely, meaning that inputs and outputs are paired incorrectly with each other or are not paired at all. Active learning, a special case of semi-supervised learning, occurs when an algorithm actively queries a user to discover the right output or label for a new input. Active learning is used to optimize recommendation systems, like the ones used to recommend movies on Netflix or products on Amazon. Reinforcement learning is learning by trial-and-error, in which a computer program is instructed to achieve a stated goal in a dynamic environment. The program learns by repeatedly taking actions, measuring the feedback from those actions, and iteratively improving its behavioral policy.

In machine learning, you can easily incur massive ongoing systems costs by failing to mitigate risks early in the development process.(84) Your most talented data scientists and machine learning engineers want to build new models. Few of them are dedicated to the unsexy tasks of maintaining existing models. However, the performance of your existing models will deteriorate as environmental conditions change over time. For example, as your e-commerce inventory changes, your recommender system will need to learn to suggest new products to shoppers. As more machine learning algorithms are put into production, you will also need to dedicate more resources to model maintenance—monitoring, validating, and updating the model. A myriad of dependencies lead to machine learning debt, with certain practices incurring more technical debt than others.


Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose

Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, sparse data, speech recognition, statistical model, William of Occam

A more general approach would be to consider persons and items again connected by the relation “person likes item.” This is the approach taken in the area of collaborative filtering (also called recommender systems) [3]. Assume that we have m persons and n items (e.g., books, songs, movies, web pages). We arrange them in an m × n matrix M, where each row is a person, each column is an item, and the cells represent the binary relation “likes.” Thus, if person i likes item j, then M(i, j) = 1; otherwise, M(i, j) = 0. The problem is that many cells are empty (i.e., we don’t know whether or not a person likes an item).
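The matrix M and its empty cells can be sketched in a few lines. This is an illustration under stated assumptions: `None` marks an unknown cell, the data is invented, and the fill-in rule (copy the value from the most similar person) is a simple nearest-neighbor stand-in rather than the method developed in the book.

```python
# M[i][j] is 1 if person i likes item j, 0 if not, None if unknown.
M = [
    [1, 0, 1, None],   # person 0: item 3 is unknown
    [1, 0, 1, 1],      # person 1
    [0, 1, None, 0],   # person 2: item 2 is unknown
]

def agreement(row_a, row_b):
    """Fraction of items known for both persons on which they agree."""
    both = [(a, b) for a, b in zip(row_a, row_b)
            if a is not None and b is not None]
    if not both:
        return 0.0
    return sum(a == b for a, b in both) / len(both)

def fill(M, i, j):
    """Guess M[i][j] from the most similar person whose value for j is known."""
    candidates = [(agreement(M[i], M[p]), M[p][j])
                  for p in range(len(M))
                  if p != i and M[p][j] is not None]
    if not candidates:
        return None  # nobody has a known value for this item
    return max(candidates, key=lambda c: c[0])[1]

print(fill(M, 0, 3))
```

Person 0 agrees with person 1 on every item both have rated and with person 2 on none, so the unknown cell M(0, 3) takes person 1's value.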

CONTENTS

PREFACE

PART I WEB STRUCTURE MINING

1 INFORMATION RETRIEVAL AND WEB SEARCH: Web Challenges; Web Search Engines; Topic Directories; Semantic Web; Crawling the Web; Web Basics; Web Crawlers; Indexing and Keyword Search; Document Representation; Implementation Considerations; Relevance Ranking; Advanced Text Search; Using the HTML Structure in Keyword Search; Evaluating Search Quality; Similarity Search; Cosine Similarity; Jaccard Similarity; Document Resemblance; References; Exercises

2 HYPERLINK-BASED RANKING: Introduction; Social Networks Analysis; PageRank; Authorities and Hubs; Link-Based Similarity Search; Enhanced Techniques for Page Ranking; References; Exercises

PART II WEB CONTENT MINING

3 CLUSTERING: Introduction; Hierarchical Agglomerative Clustering; k-Means Clustering; Probability-Based Clustering; Finite Mixture Problem; Classification Problem; Clustering Problem; Collaborative Filtering (Recommender Systems); References; Exercises

4 EVALUATING CLUSTERING: Approaches to Evaluating Clustering; Similarity-Based Criterion Functions; Probabilistic Criterion Functions; MDL-Based Model and Feature Evaluation; Minimum Description Length Principle; MDL-Based Model Evaluation; Feature Selection; Classes-to-Clusters Evaluation; Precision, Recall, and F-Measure; Entropy; References; Exercises

5 CLASSIFICATION: General Setting and Evaluation Techniques; Nearest-Neighbor Algorithm; Feature Selection; Naive Bayes Algorithm; Numerical Approaches; Relational Learning; References; Exercises

PART III WEB USAGE MINING

6 INTRODUCTION TO WEB USAGE MINING: Definition of Web Usage Mining; Cross-Industry Standard Process for Data Mining; Clickstream Analysis; Web Server Log Files; Remote Host Field; Date/Time Field; HTTP Request Field; Status Code Field; Transfer Volume (Bytes) Field; Common Log Format; Identification Field; Authuser Field; Extended Common Log Format; Referrer Field; User Agent Field; Example of a Web Log Record; Microsoft IIS Log Format; Auxiliary Information; References; Exercises

7 PREPROCESSING FOR WEB USAGE MINING: Need for Preprocessing the Data; Data Cleaning and Filtering; Page Extension Exploration and Filtering; De-Spidering the Web Log File; User Identification; Session Identification; Path Completion; Directories and the Basket Transformation; Further Data Preprocessing Steps; References; Exercises

8 EXPLORATORY DATA ANALYSIS FOR WEB USAGE MINING: Introduction; Number of Visit Actions; Session Duration; Relationship between Visit Actions and Session Duration; Average Time per Page; Duration for Individual Pages; References; Exercises

9 MODELING FOR WEB USAGE MINING: CLUSTERING, ASSOCIATION, AND CLASSIFICATION: Introduction; Modeling Methodology; Definition of Clustering; The BIRCH Clustering Algorithm; Affinity Analysis and the A Priori Algorithm; Discretizing the Numerical Variables: Binning; Applying the A Priori Algorithm to the CCSU Web Log Data; Classification and Regression Trees; The C4.5 Algorithm; References; Exercises

INDEX

PREFACE

DEFINING DATA MINING THE WEB

By data mining the Web, we refer to the application of data mining methodologies, techniques, and models to the variety of data forms, structures, and usage patterns that comprise the World Wide Web.

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage. By Zdravko Markov and Daniel T. Larose. Copyright © 2007 John Wiley & Sons, Inc.
CHAPTER 3 CLUSTERING
Sections: Introduction; Hierarchical Agglomerative Clustering; k-Means Clustering; Probability-Based Clustering; Collaborative Filtering (Recommender Systems)
INTRODUCTION
The most popular approach to learning is by example. Given a set of objects, each labeled with a class (category), the learning system builds a mapping between objects and classes which can then be used for classifying new (unlabeled) objects. As the labeling (categorization) of the initial (training) set of objects is done by an agent external to the system (teacher), this setting is called supervised learning.


pages: 579 words: 76,657

Data Science from Scratch: First Principles with Python by Joel Grus

backpropagation, confounding variable, correlation does not imply causation, data science, deep learning, Hacker News, higher-order functions, natural language processing, Netflix Prize, p-value, Paul Graham, recommendation engine, SpamAssassin, statistical model

principal component analysis, Dimensionality Reduction
probability, Probability-For Further Exploration, Mathematics
  Bayes's Theorem, Bayes’s Theorem
  central limit theorem, The Central Limit Theorem
  conditional, Conditional Probability
  continuous distributions, Continuous Distributions
  defined, Probability
  dependence and independence, Dependence and Independence
  normal distribution, The Normal Distribution
  random variables, Random Variables
probability density function, Continuous Distributions
programming languages for learning data science, From Scratch
Python, A Crash Course in Python-For Further Exploration
  args and kwargs, args and kwargs
  arithmetic, Arithmetic
  benefits of using for data science, From Scratch
  Booleans, Truthiness
  control flow, Control Flow
  Counter, Counter
  dictionaries, Dictionaries-defaultdict
  enumerate function, enumerate
  exceptions, Exceptions
  functional tools, Functional Tools
  functions, Functions
  generators and iterators, Generators and Iterators
  list comprehensions, List Comprehensions
  lists, Lists
  object-oriented programming, Object-Oriented Programming
  piping data through scripts using stdin and stdout, stdin and stdout
  random numbers, generating, Randomness
  regular expressions, Regular Expressions
  sets, Sets
  sorting in, The Not-So-Basics
  strings, Strings
  tuples, Tuples
  whitespace formatting, Whitespace Formatting
  zip function and argument unpacking, zip and Argument Unpacking
Q
quantile, computing, Central Tendencies
query optimization (SQL), Query Optimization
R
R (programming language), From Scratch, R
random forests, Random Forests
random module (Python), Randomness
random variables, Random Variables
  Bernoulli, The Central Limit Theorem
  binomial, The Central Limit Theorem
  conditioned on events, Random Variables
  expected value, Random Variables
  normal, The Normal Distribution-The Central Limit Theorem
  uniform, Continuous Distributions
range, Dispersion
range function (Python), Generators and Iterators
reading files (see files, reading)
recall, Correctness
recommendations, Recommender Systems
recommender systems, Recommender Systems-For Further Exploration
  Data Scientists You May Know (example), Data Scientists You May Know
  item-based collaborative filtering, Item-Based Collaborative Filtering-For Further Exploration
  manual curation, Manual Curation
  recommendations based on popularity, Recommending What’s Popular
  user-based collaborative filtering, User-Based Collaborative Filtering-User-Based Collaborative Filtering
reduce function (Python), Functional Tools
  using with vectors, Vectors
regression (see linear regression; logistic regression)
regression trees, What Is a Decision Tree?

For Further Exploration
There are many other notions of centrality besides the ones we used (although the ones we used are pretty much the most popular ones). NetworkX is a Python library for network analysis. It has functions for computing centralities and for visualizing graphs. Gephi is a love-it/hate-it GUI-based network-visualization tool.
Chapter 22. Recommender Systems
O nature, nature, why art thou so dishonest, as ever to send men with these false recommendations into the world! (Henry Fielding)
Another common data problem is producing recommendations of some sort. Netflix recommends movies you might want to watch. Amazon recommends products you might want to buy.

    = other_interest_id and similarity > 0]
    # Python 3 removed tuple-unpacking lambdas, so sort on the pair's second element
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

which suggests the following similar interests:

    [('Hadoop', 0.8164965809277261),
     ('Java', 0.6666666666666666),
     ('MapReduce', 0.5773502691896258),
     ('Spark', 0.5773502691896258),
     ('Storm', 0.5773502691896258),
     ('Cassandra', 0.4082482904638631),
     ('artificial intelligence', 0.4082482904638631),
     ('deep learning', 0.4082482904638631),
     ('neural networks', 0.4082482904638631),
     ('HBase', 0.3333333333333333)]

Now we can create recommendations for a user by summing up the similarities of the interests similar to his:

    from collections import defaultdict

    def item_based_suggestions(user_id, include_current_interests=False):
        # add up the similar interests
        suggestions = defaultdict(float)
        user_interest_vector = user_interest_matrix[user_id]
        for interest_id, is_interested in enumerate(user_interest_vector):
            if is_interested == 1:
                similar_interests = most_similar_interests_to(interest_id)
                for interest, similarity in similar_interests:
                    suggestions[interest] += similarity

        # sort them by weight (second element of each pair), highest first
        suggestions = sorted(suggestions.items(),
                             key=lambda pair: pair[1],
                             reverse=True)

        if include_current_interests:
            return suggestions
        else:
            # drop interests the user already has
            return [(suggestion, weight)
                    for suggestion, weight in suggestions
                    if suggestion not in users_interests[user_id]]

For user 0, this generates the following (seemingly reasonable) recommendations:

    [('MapReduce', 1.861807319565799),
     ('Postgres', 1.3164965809277263),
     ('MongoDB', 1.3164965809277263),
     ('NoSQL', 1.2844570503761732),
     ('programming languages', 0.5773502691896258),
     ('MySQL', 0.5773502691896258),
     ('Haskell', 0.5773502691896258),
     ('databases', 0.5773502691896258),
     ('neural networks', 0.4082482904638631),
     ('deep learning', 0.4082482904638631),
     ('C++', 0.4082482904638631),
     ('artificial intelligence', 0.4082482904638631),
     ('Python', 0.2886751345948129),
     ('R', 0.2886751345948129)]

For Further Exploration
Crab is a framework for building recommender systems in Python. Graphlab also has a recommender toolkit. The Netflix Prize was a somewhat famous competition to build a better system to recommend movies to Netflix users.
Chapter 23. Databases and SQL
Memory is man’s greatest friend and worst enemy. (Gilbert Parker)
The data you need will often live in databases, systems designed for efficiently storing and querying data.
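The item-based approach in the Chapter 22 excerpt above depends on helpers the excerpt does not show (`user_interest_matrix`, `users_interests`, the similarity table). The following is a minimal, self-contained Python 3 sketch of the same idea. The toy `users_interests` data is hypothetical, and the use of cosine similarity between interest columns is an assumption: the excerpt's similarity values are consistent with it, but the excerpt itself does not show the similarity function.

```python
import math
from collections import defaultdict

# Hypothetical toy data: each inner list is one user's interests.
users_interests = [
    ["Hadoop", "Java", "Spark"],
    ["Python", "R", "statistics"],
    ["Hadoop", "Spark", "Python"],
    ["Java", "Python"],
]

unique_interests = sorted({i for interests in users_interests for i in interests})

# One row per user, one column per interest: 1 if the user has it, else 0.
user_interest_matrix = [
    [1 if interest in interests else 0 for interest in unique_interests]
    for interests in users_interests
]

# Transpose: one row per interest, one column per user.
interest_user_matrix = [list(column) for column in zip(*user_interest_matrix)]


def cosine_similarity(v, w):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norm if norm else 0.0


def most_similar_interests_to(interest_id):
    """Other interests with nonzero similarity, most similar first."""
    target = interest_user_matrix[interest_id]
    pairs = [
        (unique_interests[other_id], cosine_similarity(target, other))
        for other_id, other in enumerate(interest_user_matrix)
        if other_id != interest_id
    ]
    pairs = [(name, sim) for name, sim in pairs if sim > 0]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def item_based_suggestions(user_id, include_current_interests=False):
    """Sum, per candidate interest, its similarity to each interest the user has."""
    suggestions = defaultdict(float)
    for interest_id, is_interested in enumerate(user_interest_matrix[user_id]):
        if is_interested == 1:
            for interest, similarity in most_similar_interests_to(interest_id):
                suggestions[interest] += similarity
    ranked = sorted(suggestions.items(), key=lambda pair: pair[1], reverse=True)
    if include_current_interests:
        return ranked
    return [(name, weight) for name, weight in ranked
            if name not in users_interests[user_id]]
```

Calling `item_based_suggestions(0)` on this toy data suggests only "Python", because it is the one interest that co-occurs with user 0's interests among other users and that user 0 does not already have.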


pages: 170 words: 49,193

The People vs Tech: How the Internet Is Killing Democracy (And How We Save It) by Jamie Bartlett

Ada Lovelace, Airbnb, AlphaGo, Amazon Mechanical Turk, Andrew Keen, autonomous vehicles, barriers to entry, basic income, Bernie Sanders, Big Tech, bitcoin, Black Lives Matter, blockchain, Boris Johnson, Californian Ideology, Cambridge Analytica, central bank independence, Chelsea Manning, cloud computing, computer vision, creative destruction, cryptocurrency, Daniel Kahneman / Amos Tversky, data science, deep learning, DeepMind, disinformation, Dominic Cummings, Donald Trump, driverless car, Edward Snowden, Elon Musk, Evgeny Morozov, fake news, Filter Bubble, future of work, general purpose technology, gig economy, global village, Google bus, Hans Moravec, hive mind, Howard Rheingold, information retrieval, initial coin offering, Internet of things, Jeff Bezos, Jeremy Corbyn, job automation, John Gilmore, John Maynard Keynes: technological unemployment, John Perry Barlow, Julian Assange, manufacturing employment, Mark Zuckerberg, Marshall McLuhan, Menlo Park, meta-analysis, mittelstand, move fast and break things, Network effects, Nicholas Carr, Nick Bostrom, off grid, Panopticon Jeremy Bentham, payday loans, Peter Thiel, post-truth, prediction markets, QR code, ransomware, Ray Kurzweil, recommendation engine, Renaissance Technologies, ride hailing / ride sharing, Robert Mercer, Ross Ulbricht, Sam Altman, Satoshi Nakamoto, Second Machine Age, sharing economy, Silicon Valley, Silicon Valley billionaire, Silicon Valley ideology, Silicon Valley startup, smart cities, smart contracts, smart meter, Snapchat, Stanford prison experiment, Steve Bannon, Steve Jobs, Steven Levy, strong AI, surveillance capitalism, TaskRabbit, tech worker, technological singularity, technoutopianism, Ted Kaczynski, TED Talk, the long tail, the medium is the message, the scientific method, The Spirit Level, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, too big to fail, ultimatum game, universal basic income, WikiLeaks, World Values Survey, Y Combinator, you are 
the product

These algorithms are designed to serve you content that you’re likely to click on, as that means the potential to sell more advertising alongside it. For example, YouTube’s ‘up next’ videos are statistically selected based on an unbelievably sophisticated analysis of what is most likely to keep a person hooked in. According to Guillaume Chaslot, an AI specialist who worked on the recommendation engine for YouTube, the algorithms aren’t there to optimise what is truthful or honest – but to optimise watch-time. ‘Everything else was considered a distraction,’ he recently told the Guardian.17 These non-decision decisions have huge implications, because even mild confirmation bias can set off a cycle of self-perpetuation.


pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos

Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, backpropagation, basic income, Bayesian statistics, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Charles Babbage, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, creative destruction, crowdsourcing, Danny Hillis, data is not the new oil, data is the new oil, data science, deep learning, DeepMind, double helix, Douglas Hofstadter, driverless car, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, Geoffrey Hinton, global village, Google Glasses, Gödel, Escher, Bach, Hans Moravec, incognito mode, information retrieval, Jeff Hawkins, job automation, John Markoff, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, large language model, lone genius, machine translation, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, Nick Bostrom, NP-complete, off grid, P = NP, PageRank, pattern recognition, phenotype, planetary scale, power law, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, scientific worldview, Second Machine Age, self-driving car, Silicon Valley, social intelligence, speech recognition, Stanford marshmallow experiment, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the long tail, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight, yottabyte, zero-sum game

Satellites, DNA sequencers, and particle accelerators probe nature in ever-finer detail, and learning algorithms turn the torrents of data into new scientific knowledge. Companies know their customers like never before. The candidate with the best voter models wins, like Obama against Romney. Unmanned vehicles pilot themselves across land, sea, and air. No one programmed your tastes into the Amazon recommendation system; a learning algorithm figured them out on its own, by generalizing from your past purchases. Google’s self-driving car taught itself how to stay on the road; no engineer wrote an algorithm instructing it, step-by-step, how to get from A to B. No one knows how to program a car to drive, and no one needs to, because a car equipped with a learning algorithm picks it up by observing what the driver does.

The Master Algorithm is the complete package. Applying it to vast amounts of patient and drug data, combined with knowledge mined from the biomedical literature, is how we will cure cancer. A universal learner is sorely needed in many other areas, from life-and-death to mundane situations. Picture the ideal recommender system, one that recommends the books, movies, and gadgets you would pick for yourself if you had the time to check them all out. Amazon’s algorithm is a very far cry from it. That’s partly because it doesn’t have enough data—mainly it just knows which items you previously bought from Amazon—but if you went hog wild and gave it access to your complete stream of consciousness from birth, it wouldn’t know what to do with it.

Using the k nearest neighbors instead of one is not the end of the story. Intuitively, the examples closest to the test example should count for more. This leads us to the weighted k-nearest-neighbor algorithm. In 1994, a team of researchers from the University of Minnesota and MIT built a recommendation system based on what they called “a deceptively simple idea”: people who agreed in the past are likely to agree again in the future. That notion led directly to the collaborative filtering systems that all self-respecting e-commerce sites have. Suppose that, like Netflix, you’ve gathered a database of movie ratings, with each user giving a rating of one to five stars to the movies he or she has seen.
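The weighted k-nearest-neighbor idea described above can be sketched on a small ratings database. This is an illustration only: the `ratings` data and the names `similarity` and `predict` are hypothetical, and the inverse-distance weighting is one simple choice, not a claim about what the 1994 system or Netflix actually uses.

```python
import math

# Hypothetical ratings database: user -> {movie: stars (1 to 5)}.
ratings = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 4, "Brazil": 5, "Casablanca": 2, "Dune": 5},
    "cat": {"Alien": 1, "Brazil": 2, "Casablanca": 5, "Dune": 1},
    "dan": {"Alien": 5, "Brazil": 5, "Dune": 4},
}


def similarity(u, v):
    """Inverse-distance similarity over the movies both users rated."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    dist = math.sqrt(sum((ratings[u][m] - ratings[v][m]) ** 2 for m in shared))
    return 1.0 / (1.0 + dist)


def predict(user, movie, k=2):
    """Weighted k-nearest-neighbor estimate of the user's rating for a movie.

    Only users who rated the movie can be neighbors; closer neighbors
    (higher similarity) count for more in the weighted average.
    """
    neighbors = [
        (similarity(user, other), ratings[other][movie])
        for other in ratings
        if other != user and movie in ratings[other]
    ]
    top = sorted(neighbors, key=lambda pair: pair[0], reverse=True)[:k]
    total_weight = sum(weight for weight, _ in top)
    if total_weight == 0:
        return None  # nobody comparable has rated this movie
    return sum(weight * stars for weight, stars in top) / total_weight
```

With this data, `predict("ann", "Dune")` averages the ratings of the two users who agreed most with ann in the past (dan and bob), weighting dan's rating more heavily because his past ratings are closer to hers.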


pages: 463 words: 105,197

Radical Markets: Uprooting Capitalism and Democracy for a Just Society by Eric Posner, E. Weyl

3D printing, activist fund / activist shareholder / activist investor, Affordable Care Act / Obamacare, Airbnb, Amazon Mechanical Turk, anti-communist, augmented reality, basic income, Berlin Wall, Bernie Sanders, Big Tech, Branko Milanovic, business process, buy and hold, carbon footprint, Cass Sunstein, Clayton Christensen, cloud computing, collective bargaining, commoditize, congestion pricing, Corn Laws, corporate governance, crowdsourcing, cryptocurrency, data science, deep learning, DeepMind, Donald Trump, Elon Musk, endowment effect, Erik Brynjolfsson, Ethereum, feminist movement, financial deregulation, Francis Fukuyama: the end of history, full employment, gamification, Garrett Hardin, George Akerlof, global macro, global supply chain, guest worker program, hydraulic fracturing, Hyperloop, illegal immigration, immigration reform, income inequality, income per capita, index fund, informal economy, information asymmetry, invisible hand, Jane Jacobs, Jaron Lanier, Jean Tirole, Jeremy Corbyn, Joseph Schumpeter, Kenneth Arrow, labor-force participation, laissez-faire capitalism, Landlord’s Game, liberal capitalism, low skilled workers, Lyft, market bubble, market design, market friction, market fundamentalism, mass immigration, negative equity, Network effects, obamacare, offshore financial centre, open borders, Pareto efficiency, passive investing, patent troll, Paul Samuelson, performance metric, plutocrats, pre–internet, radical decentralization, random walk, randomized controlled trial, Ray Kurzweil, recommendation engine, rent-seeking, Richard Thaler, ride hailing / ride sharing, risk tolerance, road to serfdom, Robert Shiller, Ronald Coase, Rory Sutherland, search costs, Second Machine Age, second-price auction, self-driving car, shareholder value, sharing economy, Silicon Valley, Skype, special economic zone, spectrum auction, speech recognition, statistical model, stem cell, telepresence, Thales and the olive presses, Thales of Miletus, The Death and 
Life of Great American Cities, The Future of Employment, The Market for Lemons, The Nature of the Firm, The Rise and Fall of American Growth, The Theory of the Leisure Class by Thorstein Veblen, The Wealth of Nations by Adam Smith, Thorstein Veblen, trade route, Tragedy of the Commons, transaction costs, trickle-down economics, Tyler Cowen, Uber and Lyft, uber lyft, universal basic income, urban planning, Vanguard fund, vertical integration, women in the workforce, Zipcar

Today, machines learn from the statistical patterns in human behavior, and may be able to use this information to distribute goods (and jobs) as well as, or possibly better than, people can choose goods (and jobs) themselves. We are very far from this point, but we can see the outlines of the route that we might travel. Let us start with an increasingly familiar phenomenon: machine learning–based recommendation systems drawing on existing market behavior. How does Netflix guess what movies you are likely to enjoy? Roughly, it finds people who are like you—who watch many of the movies you watch—and gives those movies ratings similar to your ratings. It then infers that you will enjoy movies you have not yet seen that your hidden doppelgangers have seen and rated highly.
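The rough description above (find viewers who watch what you watch, then surface what they rated highly and you have not seen) can be sketched as follows. Everything here is an assumption made for illustration: the `histories` data is invented, Jaccard overlap of watched sets stands in for "people who are like you", and the sketch ignores how closely the ratings themselves agree. It is not a description of Netflix's actual system.

```python
# Hypothetical viewing histories: user -> {movie: stars (1 to 5)}.
histories = {
    "you":   {"Heat": 5, "Ronin": 4, "Collateral": 5},
    "alice": {"Heat": 5, "Ronin": 5, "Collateral": 4, "Thief": 5, "Cats": 1},
    "bruno": {"Heat": 4, "Collateral": 5, "Manhunter": 4},
    "chloe": {"Cats": 5, "Grease": 4},
}


def overlap(a, b):
    """Jaccard overlap of the sets of movies two users have watched."""
    seen_a, seen_b = set(histories[a]), set(histories[b])
    return len(seen_a & seen_b) / len(seen_a | seen_b)


def recommend(user, min_stars=4):
    """Rank unseen movies by the overlap-weighted votes of similar viewers."""
    scores = {}
    for other in histories:
        if other == user:
            continue
        weight = overlap(user, other)
        if weight == 0:
            continue  # no shared viewing, so this user tells us nothing
        for movie, stars in histories[other].items():
            if movie not in histories[user] and stars >= min_stars:
                scores[movie] = scores.get(movie, 0.0) + weight
    return sorted(scores, key=lambda movie: scores[movie], reverse=True)
```

Here `recommend("you")` ranks "Thief" ahead of "Manhunter", since alice shares more viewing history with "you" than bruno does, and it never suggests chloe's favorites because she shares no viewing history at all.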

., 240 Amazon, 112, 230–31, 234, 239, 248, 288, 290–91 American Constitution, 86–87 American Federation of Musicians, 210 American Tobacco Company, 174 America OnLine (AOL), 210 Anderson, Chris, 212 antitrust: Clayton Act and, 176–77, 197, 311n25; landlords and, 201–2; monopolies and, 23, 48, 174–77, 180, 184–86, 191, 197–203, 242, 255, 262, 286; resale price maintenance and, 200–201; social media and, 202 Apple, 117, 239, 289 Arginoussai Islands, 83 aristocracy, 16–17, 22–23, 36–38, 84–85, 87, 90, 135–36 Aristotle, 172 Arrow, Kenneth, 92, 303n17 Articles of Confederation, 88 artificial intelligence (AI), 202, 257, 287; Alexa and, 248; algorithms and, 208, 214, 219, 221, 281–82, 289–93; automated video editing and, 208; Cortana and, 219; data capacities and, 236; Deep Blue and, 213; democratization of, 219; diminishing returns and, 229–30; facial recognition and, 208, 216–19; factories for thinking machines and, 213–20; Google Assistant and, 219; human-produced data for, 208–9; marginal value and, 224–28, 247; Microsoft and, 219; neural networks and, 214–19; payment systems for, 224–30; recommendation systems and, 289–90; siren servers and, 220–24, 230–41, 243; Siri and, 219, 248; technofeudalism and, 230–33; techno-optimists and, 254–55, 316n2; techno-pessimists and, 254–55, 316n2; worker replacement and, 223 Athens, 55, 83–84, 131 Atwood, Margaret, 18–19 auctions, xv–xxi, 49–51, 70–71, 97, 99, 147–49, 156–57, 300n34 au pair program, 154–55, 161 Australia, 10, 12, 13, 159, 162 Austrian school, 2 Autor, David, 240 Azar, José, 185, 189, 310n24 Bahrain, 158 banking industry, 182–84, 183, 190 Bank of America, 183, 184 Becker, Gary, 147 Beckford, William, 95 behavioral finance, 180–81 Bénabou, Roland, 236–37 Bentham, Jeremy, 4, 35, 95–96, 98, 132 Berle, Adolf, 177–78, 183, 193–94 Berlin Wall, 1, 140 Berners-Lee, Tim, 210 big data, 213, 226, 293 Bing, xxi BlackRock, 171, 181–84, 183, 187, 191 Brazil, xiii–xvii, 105, 135 Brin, Sergey, 211 broadcast spectrum, xxi, 50–51, 
71 Bush, George W., 78 Cabral, Luís, 202 Cadappster app, 31 Caesar, Julius, 84 Canada, 10, 13, 159, 182 capitalism, xvi; basic structure of, 24–25; competition and, 17 (see also competition); corporate planning and, 39–40; cultural consequences of, 270, 273; Engels on, 239–40; freedom and, 34–39; George on, 36–37; growth and, 3 (see also growth, economic); industrial revolution, 36, 255; inequality and, 3 (see also inequality); labor and, 136–37, 143, 159, 165, 211, 224, 231, 239–40, 316n4; laissez-faire, 45; liberalism and, 3, 17, 22–27; markets and, 278, 288, 304n36; Marx on, 239–40; monopolies and, 22–23, 34–39, 44, 46–49, 132, 136, 173, 177, 179, 199, 258, 262; monopsony and, 190, 199–201, 223, 234, 238–41, 255; ownership and, 34–36, 39, 45–49, 75, 78–79; property and, 34–36, 39, 45–49, 75, 78–79; Radical Markets and, 169, 180–85, 203, 273; regulations and, 262; Schumpeter on, 47; shareholders and, 118, 170, 178–84, 189, 193–95; technology and, 34, 203, 316n4; wealth and, 45, 75, 78–79, 136, 143, 239, 273 Capitalism and Freedom (Friedman), xiii Capitalism for the People, A (Luigi), 203 Capra, Frank, 17 Carroll, Lewis, 176 central planning: computers and, 277–85, 288–93; consumers and, 19; democracy and, 89; governance and, 19–20, 39–42, 46–48, 62, 89, 277–85, 288–90, 293; healthcare and, 290–91; liberalism and, 19–20; markets and, 277–85, 288–93; property and, 39–42, 46–48, 62; recommendation systems and, 289–90; socialism and, 39–42, 47, 277, 281 Chetty, Raj, 11 Chiang Kai-shek, 46 China, 15, 46, 56, 133–34, 138 Christensen, Clayton, 202 Chrysler, 193 Citigroup, 183, 184, 191 Clarke, Edward, 99, 102, 105 Clayton Act, 176–77, 197, 311n25 Clemens, Michael, 162 Coase, Ronald, 40, 48–51, 299n26 Cold War, xix, 25, 288 collective bargaining, 240–41 collective decisions: democracy and, 97–105, 110–11, 118–20, 122, 124, 273, 303n17, 304n36; manipulation of, 99; markets for, 97–105; public goods and, 98; Quadratic Voting (QV) and, 110–11, 118–20, 122, 124, 273, 303n17, 
304n36; Vickrey and, 99, 102, 105 colonialism, 8, 131 Coming of the Third Reich, The (Evans), 93 common ownership self-assessed tax (COST): broader application of, 273–76; cybersquatters and, 72; education and, 258–59; efficiency and, 256, 261; equality and, 258; globalization and, 269–70; growth and, 73, 256; human capital and, 258–61; immigrants and, 261, 269, 273; inequality and, 256–59; international trade and, 270; investment and, 258–59, 270; legal issues and, 275; markets and, 286; methodology of, 63–66; monopolies and, 256–61, 270, 300n43; objections to, 300n43; optimality and, 61, 73, 75–79, 317n18; personal possessions and, 301n47, 317n18; political effects of, 261–64; predatory outsiders and, 300n43; prices and, 62–63, 67–77, 256, 258, 263, 275, 300n43, 317n18; property and, 31, 61–79, 271–74, 300n43, 301n47; public goods and, 256; public leases and, 69–72; Quadratic Voting (QV) and, 123–25, 194, 261–63, 273, 275, 286; Radical Markets and, 79, 123–26, 257–58, 271–72, 286; taxes and, 61–69, 73–76, 258–61, 275, 317n18; technology and, 71–72, 257–59; true market economy and, 72–75; voting and, 263; wealth and, 256–57, 261–64, 269–70, 275, 286 communism, 19–20, 46–47, 93–94, 125, 278 competition: antitrust policies and, 23, 48, 174–77, 180, 184–86, 191, 197–203, 242, 255, 262, 286; auctions and, xv–xix, 49–51, 70–71, 97, 99, 147–49, 156–57; bargaining and, 240–41, 299n26; democracy and, 109, 119–20; by design, 49–55; elitism and, 25–28; equilibrium and, 305n40; eternal vigilance and, 204; horizontal concentration and, 175; imperfect, 304n36; indexing and, 185–91, 302n63; innovation and, 202–3; investment and, 196–97; labor and, 145, 158, 162–63, 220, 234, 236, 239, 243, 245, 256, 266; laissez-faire and, 253; liberalism and, 6, 17, 20–28; lobbyists and, 262; monopolies and, 174; monopsony and, 190, 199–201, 223, 234, 238–41, 255; ownership and, 20–21, 41, 49–55, 79; perfect, 6, 25–28, 109; prices and, 20–22, 25, 173, 175, 180, 185–90, 193, 200–201, 204, 244; 
property and, 41, 49–55, 79; Quadratic Voting (QV) and, 304n36; regulations and, 262; resale price maintenance and, 200–201; restoring, 191–92; Section 7 and, 196–97, 311n25; selfishness and, 109, 270–71; Smith on, 17; tragedy of the commons and, 44 complexity, 218–20, 226–28, 274–75, 279, 281, 284, 287, 313n15 “Computer and the Market, The” (Lange), 277 computers: algorithms and, 208, 214, 219, 221, 281–82, 289–93; automation of labor and, 222–23, 251, 254; central planning and, 277–85, 288–93; data and, 213–14, 218, 222, 233, 244, 260; Deep Blue, 213; distributed computing and, 282–86, 293; growth in poor countries and, 255; as intermediaries, 274; machine learning (ML) and, 214 (see also machine learning [ML]); markets and, 277, 280–93; Mises and, 281; Moore’s Law and, 286–87; Open-Trac and, 31–32; parallel processing and, 282–86; prices of, 21; recommendation systems and, 289–90 Condorcet, Marquis de, 4, 90–93, 303n15, 306n51 conspicuous consumption, 78 Consumer Reports magazine, 291 consumers: antitrust suits and, 175, 197–98; central planning and, 19; data from, 47, 220, 238, 242–44, 248, 289; drone delivery to, 220; as entrepreneurs, 256; goods and services for, 27, 92, 123, 130, 175, 280, 292; institutional investment and, 190–91; international culture for, 270; lobbyists and, 262; machine learning (ML) and, 238; monopolies and, 175, 186, 197–98; preferences of, 280, 288–93; prices and, 172 (see also prices); recommendation systems and, 289–90; robots and, 287; sharing economy and, 117; Soviet collapse and, 289; technology and, 287 cooperatives, 118, 126, 261, 267, 299n24 Corbyn, Jeremy, 12, 13 corruption, 3, 23, 27, 57, 93, 122, 126, 157, 262 Cortana, 219 cost-benefit analysis, 2, 244 “Counterspeculation, Auctions and Competitive Sealed Tenders” (Vickrey), xx–xxi Cramton, Peter, 52, 54–55, 57 crowdsourcing, 235 crytocurrencies, 117–18 cybersquatters, 72 data: algorithms and, 208, 214, 219, 221, 281–82, 289–93; big, 213, 226, 293; computers and, 213–14, 
218, 222, 233, 244, 260; consumer, 47, 220, 238, 242–44, 248, 289; diamond-water paradox and, 224–25; diminishing returns and, 226, 229–30; distribution of complexity and, 228; as entertainment, 233–39, 248–49; Facebook and, 28, 205–9, 212–13, 220–21, 231–48; feedback and, 114, 117, 233, 238, 245; free, 209, 211, 220, 224, 231–35, 239; Google and, 28, 202, 207–13, 219–20, 224, 231–36, 241–42, 246; investment in, 212, 224, 232, 244; labeled, 217–21, 227, 228, 230, 232, 234, 237; labor movement for, 241–43; Lanier and, 208, 220–24, 233, 237, 313n2, 315n48; marginal value and, 224–28, 247; network effects and, 211, 236, 238, 243; neural networks and, 214–19; online services and, 211, 235; overfitting and, 217–18; payment systems for, 210–13, 224–30; photographs and, 64, 214–15, 217, 219–21, 227–28, 291; programmers and, 163, 208–9, 214, 217, 219, 224; Radical Markets for, 246–49; reCAPTCHA and, 235–36; recommendation systems and, 289–90; rise of data work and, 209–13; sample complexity and, 217–18; siren servers and, 220–24, 230–41, 243; social networks and, 202, 212, 231, 233–36; technofeudalism and, 230–33; under-employment and, 256; value of, 243–45; venture capital and, 211, 224; virtual reality and, 206, 208, 229, 251, 253; women’s work and, 209, 313n4 Declaration of Independence, 86 Deep Blue, 213 DeFoe, Daniel, 132 Demanding Work (Gray and Suri), 233 democracy: 1p1v system and, 82–84, 94, 109, 119, 122–24, 304n36, 306n51; artificial intelligence (AI) and, 219; Athenians and, 55, 83–84, 131; auctions and, 97, 99; basic structure of, 24–25; central planning and, 89; check and balance systems and, 23, 25, 87, 92; collective decisions and, 97–105, 110–11, 118–20, 122, 124, 273, 303n17, 304n36; collective mediocrity and, 96; competition and, 109, 119–20; Declaration of Independence and, 86; efficiency and, 92, 110, 126; elections and, 22, 80, 93, 100, 115, 119–21, 124, 217–18, 296n20; elitism and, 89–91, 96, 124; Enlightenment and, 86, 95; Europe and, 90–96; France 
and, 90–95; governance and, 84, 117; gridlock and, 84, 88, 122–24, 261, 267; Hitler and, 93–94; House of Commons and, 84–85; House of Lords and, 85; impossibility theorem and, 92; inequality and, 123; Jury Theorem and, 90–92; liberalism and, 3–4, 25, 80, 86, 90; limits of, 85–86; majority rule and, 27, 83–89, 92–97, 100–101, 121, 306n51; markets and, 97–105, 262, 276; minorities and, 85–90, 93–97, 101, 106, 110; mixed constitution and, 84–85; multi-candidate, single-winner elections and, 119–20; origins of, 83–85; ownership and, 81–82, 89, 101, 105, 118, 124; public goods and, 28, 97–100, 107, 110, 120, 123, 126; Quadratic Voting (QV) and, 105–22; Radical Markets and, 82, 106, 123–26, 203; supermajorities and, 84–85, 88, 92; tyrannies and, 23, 25, 88, 96–100, 106, 108; United Kingdom and, 95–96; United States and, 86–90, 93, 95; voting and, 80–82, 85–93, 96, 99, 105, 108, 115–16, 119–20, 123–24, 303n14, 303n17, 303n20, 304n36, 305n39; wealth and, 83–84, 87, 95, 116 Demosthenes, 55 Denmark, 182 Department of Justice (DOJ), 176, 186, 191 deregulation, 3, 9, 24 Desmond, Matthew, 201–2 Dewey, John, 43 Dickens, Charles, 36 digital economy: data producers and, 208–9, 230–31; diamond-water paradox and, 224–25; as entertainment, 233–39; facial recognition and, 208, 216, 218–19; free access and, 211; Lanier and, 208, 220–24, 233, 237, 313n2, 315n48; machine learning (ML) and, 208–9, 213–14, 217–21, 226–31, 234–35, 238, 247, 289, 291, 315n48; payment systems for, 210–13, 221–30, 243–45; programmers and, 163, 208–9, 214, 217, 219, 224; rise of data work and, 209–13; siren servers and, 220–24, 230–41, 243; spam and, 210, 245; technofeudalism and, 230–33; virtual reality and, 206, 208, 229, 251, 253 diversification, 171–72, 180–81, 185, 191–92, 194–96, 310n22, 310n24 dot-com bubble, 211 double taxation, 65 Dupuit, Jules, 173 Durkheim, Émile, 297n23 Dworkin, Ronald, 305n40 dystopia, 18, 191, 273, 293 education, 114; common ownership self-assessed tax (COST) and, 258; data and, 
229, 232, 248; elitism and, 260; equality in, 89; financing, 276; free compulsory, 23; immigrants and, 14, 143–44, 148; labor and, 140, 143–44, 148, 150, 158, 170–71, 232, 248, 258–60; Mill on, 96; populist movements and, 14; Stolper-Samuelson Theorem and, 143 efficient capital markets hypothesis, 180 elections, 80; data and, 217–18; democracy and, 22, 93, 100, 115, 119–21, 124, 217–18, 296n20; gridlock and, 124; Hitler and, 93; multi-candidate, single-winner, 119–20; polls and, 13, 111; Quadratic Voting (QV) and, 115, 119–21, 268, 306n52; U.S. 2016, 93, 296n20 Elhauge, Einer, 176, 197 elitism: aristocracy and, 16–17, 22–23, 36–38, 84–85, 87, 90, 135–36; bourgeoisie and, 36; bureaucrats and, 267; democracy and, 89–91, 96, 124; education and, 260; feudalism and, 16, 34–35, 37, 41, 61, 68, 136, 230–33, 239; financial deregulation and, 3; immigrants and, 146, 166; liberalism and, 3, 15–16, 25–28; minorities and, 12, 14–15, 19, 23–27, 85–90, 93–97, 101, 106, 110, 181, 194, 273, 303n14, 304n36; monarchies and, 85–86, 91, 95, 160 Emergency Economic Stabilization Act, 121 eminent domain, 33, 62, 89 Empire State Building, 45 Engels, Friedrich, 78, 240 Enlightenment, 86, 95 entrepreneurs, xiv; immigrants and, 144–45, 159, 256; labor and, 129, 144–45, 159, 173, 177, 203, 209–12, 224, 226, 256; ownership and, 35, 39 equality: common ownership self-assessed tax (COST) and, 258; education and, 89; immigrants and, 257; labor and, 147, 166, 239, 257; liberalism and, 4, 8, 24, 29; living standards and, 3, 11, 13, 133, 135, 148, 153, 254, 257; Quadratic Voting (QV) and, 264; Radical Markets and, 262, 276; trickle down theories and, 9, 12 Espinosa, Alejandro, 30–32 Ethereum, 117 Europe, 177, 201; democracy and, 88, 90–95; European Union and, 15; fiefdoms in, 34; government utilities and, 48; income patterns in, 5; instability in, 88; labor and, 11, 130–31, 136–47, 165, 245; social democrats and, 24; unemployment rates in, 11 Evans, Richard, 93 Evicted (Desmond), 201–2 Ex Machina 
(film), 208 Facebook, xxi; advertising and, 50, 202; data and, 28, 205–9, 212–13, 220–21, 231–48; monetization by, 28; news service of, 289; Vickrey Commons and, 50 facial recognition, 208, 216–19 family reunification programs, 150, 152 farms, 17, 34–35, 37–38, 61, 72, 135, 142, 179, 283–85 Federal Communications Commission (FCC), 50, 71 Federal Trade Commission (FTC), 176, 186 feedback, 114, 117, 233, 238, 245 feudalism, 16, 34–35, 37, 41, 61, 68, 136, 230–33, 239 Fidelity, 171, 181–82, 184 financial crisis of 2008, 3, 121 Fitzgerald, F.

., 78 Cabral, Luís, 202 Cadappster app, 31 Caesar, Julius, 84 Canada, 10, 13, 159, 182 capitalism, xvi; basic structure of, 24–25; competition and, 17 (see also competition); corporate planning and, 39–40; cultural consequences of, 270, 273; Engels on, 239–40; freedom and, 34–39; George on, 36–37; growth and, 3 (see also growth, economic); industrial revolution, 36, 255; inequality and, 3 (see also inequality); labor and, 136–37, 143, 159, 165, 211, 224, 231, 239–40, 316n4; laissez-faire, 45; liberalism and, 3, 17, 22–27; markets and, 278, 288, 304n36; Marx on, 239–40; monopolies and, 22–23, 34–39, 44, 46–49, 132, 136, 173, 177, 179, 199, 258, 262; monopsony and, 190, 199–201, 223, 234, 238–41, 255; ownership and, 34–36, 39, 45–49, 75, 78–79; property and, 34–36, 39, 45–49, 75, 78–79; Radical Markets and, 169, 180–85, 203, 273; regulations and, 262; Schumpeter on, 47; shareholders and, 118, 170, 178–84, 189, 193–95; technology and, 34, 203, 316n4; wealth and, 45, 75, 78–79, 136, 143, 239, 273 Capitalism and Freedom (Friedman), xiii Capitalism for the People, A (Luigi), 203 Capra, Frank, 17 Carroll, Lewis, 176 central planning: computers and, 277–85, 288–93; consumers and, 19; democracy and, 89; governance and, 19–20, 39–42, 46–48, 62, 89, 277–85, 288–90, 293; healthcare and, 290–91; liberalism and, 19–20; markets and, 277–85, 288–93; property and, 39–42, 46–48, 62; recommendation systems and, 289–90; socialism and, 39–42, 47, 277, 281 Chetty, Raj, 11 Chiang Kai-shek, 46 China, 15, 46, 56, 133–34, 138 Christensen, Clayton, 202 Chrysler, 193 Citigroup, 183, 184, 191 Clarke, Edward, 99, 102, 105 Clayton Act, 176–77, 197, 311n25 Clemens, Michael, 162 Coase, Ronald, 40, 48–51, 299n26 Cold War, xix, 25, 288 collective bargaining, 240–41 collective decisions: democracy and, 97–105, 110–11, 118–20, 122, 124, 273, 303n17, 304n36; manipulation of, 99; markets for, 97–105; public goods and, 98; Quadratic Voting (QV) and, 110–11, 118–20, 122, 124, 273, 303n17, 304n36; Vickrey 
and, 99, 102, 105 colonialism, 8, 131 Coming of the Third Reich, The (Evans), 93 common ownership self-assessed tax (COST): broader application of, 273–76; cybersquatters and, 72; education and, 258–59; efficiency and, 256, 261; equality and, 258; globalization and, 269–70; growth and, 73, 256; human capital and, 258–61; immigrants and, 261, 269, 273; inequality and, 256–59; international trade and, 270; investment and, 258–59, 270; legal issues and, 275; markets and, 286; methodology of, 63–66; monopolies and, 256–61, 270, 300n43; objections to, 300n43; optimality and, 61, 73, 75–79, 317n18; personal possessions and, 301n47, 317n18; political effects of, 261–64; predatory outsiders and, 300n43; prices and, 62–63, 67–77, 256, 258, 263, 275, 300n43, 317n18; property and, 31, 61–79, 271–74, 300n43, 301n47; public goods and, 256; public leases and, 69–72; Quadratic Voting (QV) and, 123–25, 194, 261–63, 273, 275, 286; Radical Markets and, 79, 123–26, 257–58, 271–72, 286; taxes and, 61–69, 73–76, 258–61, 275, 317n18; technology and, 71–72, 257–59; true market economy and, 72–75; voting and, 263; wealth and, 256–57, 261–64, 269–70, 275, 286 communism, 19–20, 46–47, 93–94, 125, 278 competition: antitrust policies and, 23, 48, 174–77, 180, 184–86, 191, 197–203, 242, 255, 262, 286; auctions and, xv–xix, 49–51, 70–71, 97, 99, 147–49, 156–57; bargaining and, 240–41, 299n26; democracy and, 109, 119–20; by design, 49–55; elitism and, 25–28; equilibrium and, 305n40; eternal vigilance and, 204; horizontal concentration and, 175; imperfect, 304n36; indexing and, 185–91, 302n63; innovation and, 202–3; investment and, 196–97; labor and, 145, 158, 162–63, 220, 234, 236, 239, 243, 245, 256, 266; laissez-faire and, 253; liberalism and, 6, 17, 20–28; lobbyists and, 262; monopolies and, 174; monopsony and, 190, 199–201, 223, 234, 238–41, 255; ownership and, 20–21, 41, 49–55, 79; perfect, 6, 25–28, 109; prices and, 20–22, 25, 173, 175, 180, 185–90, 193, 200–201, 204, 244; property and, 
41, 49–55, 79; Quadratic Voting (QV) and, 304n36; regulations and, 262; resale price maintenance and, 200–201; restoring, 191–92; Section 7 and, 196–97, 311n25; selfishness and, 109, 270–71; Smith on, 17; tragedy of the commons and, 44 complexity, 218–20, 226–28, 274–75, 279, 281, 284, 287, 313n15 “Computer and the Market, The” (Lange), 277 computers: algorithms and, 208, 214, 219, 221, 281–82, 289–93; automation of labor and, 222–23, 251, 254; central planning and, 277–85, 288–93; data and, 213–14, 218, 222, 233, 244, 260; Deep Blue, 213; distributed computing and, 282–86, 293; growth in poor countries and, 255; as intermediaries, 274; machine learning (ML) and, 214 (see also machine learning [ML]); markets and, 277, 280–93; Mises and, 281; Moore’s Law and, 286–87; Open-Trac and, 31–32; parallel processing and, 282–86; prices of, 21; recommendation systems and, 289–90 Condorcet, Marquis de, 4, 90–93, 303n15, 306n51 conspicuous consumption, 78 Consumer Reports magazine, 291 consumers: antitrust suits and, 175, 197–98; central planning and, 19; data from, 47, 220, 238, 242–44, 248, 289; drone delivery to, 220; as entrepreneurs, 256; goods and services for, 27, 92, 123, 130, 175, 280, 292; institutional investment and, 190–91; international culture for, 270; lobbyists and, 262; machine learning (ML) and, 238; monopolies and, 175, 186, 197–98; preferences of, 280, 288–93; prices and, 172 (see also prices); recommendation systems and, 289–90; robots and, 287; sharing economy and, 117; Soviet collapse and, 289; technology and, 287 cooperatives, 118, 126, 261, 267, 299n24 Corbyn, Jeremy, 12, 13 corruption, 3, 23, 27, 57, 93, 122, 126, 157, 262 Cortana, 219 cost-benefit analysis, 2, 244 “Counterspeculation, Auctions and Competitive Sealed Tenders” (Vickrey), xx–xxi Cramton, Peter, 52, 54–55, 57 crowdsourcing, 235 cryptocurrencies, 117–18 cybersquatters, 72 data: algorithms and, 208, 214, 219, 221, 281–82, 289–93; big, 213, 226, 293; computers and, 213–14, 218, 222, 233,
244, 260; consumer, 47, 220, 238, 242–44, 248, 289; diamond-water paradox and, 224–25; diminishing returns and, 226, 229–30; distribution of complexity and, 228; as entertainment, 233–39, 248–49; Facebook and, 28, 205–9, 212–13, 220–21, 231–48; feedback and, 114, 117, 233, 238, 245; free, 209, 211, 220, 224, 231–35, 239; Google and, 28, 202, 207–13, 219–20, 224, 231–36, 241–42, 246; investment in, 212, 224, 232, 244; labeled, 217–21, 227, 228, 230, 232, 234, 237; labor movement for, 241–43; Lanier and, 208, 220–24, 233, 237, 313n2, 315n48; marginal value and, 224–28, 247; network effects and, 211, 236, 238, 243; neural networks and, 214–19; online services and, 211, 235; overfitting and, 217–18; payment systems for, 210–13, 224–30; photographs and, 64, 214–15, 217, 219–21, 227–28, 291; programmers and, 163, 208–9, 214, 217, 219, 224; Radical Markets for, 246–49; reCAPTCHA and, 235–36; recommendation systems and, 289–90; rise of data work and, 209–13; sample complexity and, 217–18; siren servers and, 220–24, 230–41, 243; social networks and, 202, 212, 231, 233–36; technofeudalism and, 230–33; under-employment and, 256; value of, 243–45; venture capital and, 211, 224; virtual reality and, 206, 208, 229, 251, 253; women’s work and, 209, 313n4 Declaration of Independence, 86 Deep Blue, 213 DeFoe, Daniel, 132 Demanding Work (Gray and Suri), 233 democracy: 1p1v system and, 82–84, 94, 109, 119, 122–24, 304n36, 306n51; artificial intelligence (AI) and, 219; Athenians and, 55, 83–84, 131; auctions and, 97, 99; basic structure of, 24–25; central planning and, 89; check and balance systems and, 23, 25, 87, 92; collective decisions and, 97–105, 110–11, 118–20, 122, 124, 273, 303n17, 304n36; collective mediocrity and, 96; competition and, 109, 119–20; Declaration of Independence and, 86; efficiency and, 92, 110, 126; elections and, 22, 80, 93, 100, 115, 119–21, 124, 217–18, 296n20; elitism and, 89–91, 96, 124; Enlightenment and, 86, 95; Europe and, 90–96; France and, 90–95; 
governance and, 84, 117; gridlock and, 84, 88, 122–24, 261, 267; Hitler and, 93–94; House of Commons and, 84–85; House of Lords and, 85; impossibility theorem and, 92; inequality and, 123; Jury Theorem and, 90–92; liberalism and, 3–4, 25, 80, 86, 90; limits of, 85–86; majority rule and, 27, 83–89, 92–97, 100–101, 121, 306n51; markets and, 97–105, 262, 276; minorities and, 85–90, 93–97, 101, 106, 110; mixed constitution and, 84–85; multi-candidate, single-winner elections and, 119–20; origins of, 83–85; ownership and, 81–82, 89, 101, 105, 118, 124; public goods and, 28, 97–100, 107, 110, 120, 123, 126; Quadratic Voting (QV) and, 105–22; Radical Markets and, 82, 106, 123–26, 203; supermajorities and, 84–85, 88, 92; tyrannies and, 23, 25, 88, 96–100, 106, 108; United Kingdom and, 95–96; United States and, 86–90, 93, 95; voting and, 80–82, 85–93, 96, 99, 105, 108, 115–16, 119–20, 123–24, 303n14, 303n17, 303n20, 304n36, 305n39; wealth and, 83–84, 87, 95, 116 Demosthenes, 55 Denmark, 182 Department of Justice (DOJ), 176, 186, 191 deregulation, 3, 9, 24 Desmond, Matthew, 201–2 Dewey, John, 43 Dickens, Charles, 36 digital economy: data producers and, 208–9, 230–31; diamond-water paradox and, 224–25; as entertainment, 233–39; facial recognition and, 208, 216, 218–19; free access and, 211; Lanier and, 208, 220–24, 233, 237, 313n2, 315n48; machine learning (ML) and, 208–9, 213–14, 217–21, 226–31, 234–35, 238, 247, 289, 291, 315n48; payment systems for, 210–13, 221–30, 243–45; programmers and, 163, 208–9, 214, 217, 219, 224; rise of data work and, 209–13; siren servers and, 220–24, 230–41, 243; spam and, 210, 245; technofeudalism and, 230–33; virtual reality and, 206, 208, 229, 251, 253 diversification, 171–72, 180–81, 185, 191–92, 194–96, 310n22, 310n24 dot-com bubble, 211 double taxation, 65 Dupuit, Jules, 173 Durkheim, Émile, 297n23 Dworkin, Ronald, 305n40 dystopia, 18, 191, 273, 293 education, 114; common ownership self-assessed tax (COST) and, 258; data and, 229, 232, 
248; elitism and, 260; equality in, 89; financing, 276; free compulsory, 23; immigrants and, 14, 143–44, 148; labor and, 140, 143–44, 148, 150, 158, 170–71, 232, 248, 258–60; Mill on, 96; populist movements and, 14; Stolper-Samuelson Theorem and, 143 efficient capital markets hypothesis, 180 elections, 80; data and, 217–18; democracy and, 22, 93, 100, 115, 119–21, 124, 217–18, 296n20; gridlock and, 124; Hitler and, 93; multi-candidate, single-winner, 119–20; polls and, 13, 111; Quadratic Voting (QV) and, 115, 119–21, 268, 306n52; U.S. 2016, 93, 296n20 Elhauge, Einer, 176, 197 elitism: aristocracy and, 16–17, 22–23, 36–38, 84–85, 87, 90, 135–36; bourgeoisie and, 36; bureaucrats and, 267; democracy and, 89–91, 96, 124; education and, 260; feudalism and, 16, 34–35, 37, 41, 61, 68, 136, 230–33, 239; financial deregulation and, 3; immigrants and, 146, 166; liberalism and, 3, 15–16, 25–28; minorities and, 12, 14–15, 19, 23–27, 85–90, 93–97, 101, 106, 110, 181, 194, 273, 303n14, 304n36; monarchies and, 85–86, 91, 95, 160 Emergency Economic Stabilization Act, 121 eminent domain, 33, 62, 89 Empire State Building, 45 Engels, Friedrich, 78, 240 Enlightenment, 86, 95 entrepreneurs, xiv; immigrants and, 144–45, 159, 256; labor and, 129, 144–45, 159, 173, 177, 203, 209–12, 224, 226, 256; ownership and, 35, 39 equality: common ownership self-assessed tax (COST) and, 258; education and, 89; immigrants and, 257; labor and, 147, 166, 239, 257; liberalism and, 4, 8, 24, 29; living standards and, 3, 11, 13, 133, 135, 148, 153, 254, 257; Quadratic Voting (QV) and, 264; Radical Markets and, 262, 276; trickle down theories and, 9, 12 Espinosa, Alejandro, 30–32 Ethereum, 117 Europe, 177, 201; democracy and, 88, 90–95; European Union and, 15; fiefdoms in, 34; government utilities and, 48; income patterns in, 5; instability in, 88; labor and, 11, 130–31, 136–47, 165, 245; social democrats and, 24; unemployment rates in, 11 Evans, Richard, 93 Evicted (Desmond), 201–2 Ex Machina (film), 208 
Facebook, xxi; advertising and, 50, 202; data and, 28, 205–9, 212–13, 220–21, 231–48; monetization by, 28; news service of, 289; Vickrey Commons and, 50 facial recognition, 208, 216–19 family reunification programs, 150, 152 farms, 17, 34–35, 37–38, 61, 72, 135, 142, 179, 283–85 Federal Communications Commission (FCC), 50, 71 Federal Trade Commission (FTC), 176, 186 feedback, 114, 117, 233, 238, 245 feudalism, 16, 34–35, 37, 41, 61, 68, 136, 230–33, 239 Fidelity, 171, 181–82, 184 financial crisis of 2008, 3, 121 Fitzgerald, F.


pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel

Alan Greenspan, Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apollo 11, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, butter production in bangladesh, call centre, Charles Lindbergh, commoditize, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, data science, driverless car, en.wikipedia.org, Erik Brynjolfsson, Everything should be made as simple as possible, experimental subject, Google Glasses, happiness index / gross national happiness, information security, job satisfaction, Johann Wolfgang von Goethe, lifelogging, machine readable, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mass immigration, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, risk-adjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, Shai Danziger, software as a service, SpaceShipOne, speech recognition, statistical model, Steven Levy, supply chain finance, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Davenport, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra, zero-sum game

Every important thing a person does is valuable to predict, namely: consume, think, work, quit, vote, love, procreate, divorce, mess up, lie, cheat, steal, kill, and die. Let’s explore some examples.2 People Consume: Hollywood studios predict the success of a screenplay if produced. Netflix awarded $1 million to a team of scientists who best improved their recommendation system’s ability to predict which movies you will like. Australian energy company Energex predicts electricity demand in order to decide where to build out its power grid, and Con Edison predicts system failure in the face of high levels of consumption. Wall Street predicts stock prices by observing how demand drives them up and down.

The product it hawked, pictured for all my fellow shoppers to see, had the potential to mortify. It was a coupon for Beano, a medication for flatulence. I’d developed mild lactose intolerance, but, before figuring that out, had been trying anything to address my symptom. Acting blindly on data, Walgreens’ recommendation system seemed to suggest that others not stand so close. Other clinical data holds a more serious and sensitive status than digestive woes. Once, when teaching a summer program for talented teenagers, I received data I felt would have been better kept away from me. The administrator took me aside to inform me that one of my students had a diagnosis of bipolar disorder.

Such a contest is a hard-nosed, objective bake-off—whoever can cook up the solution that best handles the predictive task at hand wins kudos and, usually, cash. Dark Horses: And so it was with our two Montrealers, Martin and Martin, who took the Netflix Prize by storm despite their lack of experience—or, perhaps, because of it. Neither had a background in statistics or analytics, let alone recommendation systems in particular. By day, the two worked in the telecommunications industry developing software. But by night, at home, the two-member team plugged away, for 10 to 20 hours per week apiece, racing ahead in the contest under the team name PragmaticTheory. The “pragmatic” approach proved groundbreaking.


pages: 181 words: 52,147

The Driver in the Driverless Car: How Our Technology Choices Will Create the Future by Vivek Wadhwa, Alex Salkever

23andMe, 3D printing, Airbnb, AlphaGo, artificial general intelligence, augmented reality, autonomous vehicles, barriers to entry, benefit corporation, Bernie Sanders, bitcoin, blockchain, clean water, correlation does not imply causation, CRISPR, deep learning, DeepMind, distributed ledger, Donald Trump, double helix, driverless car, Elon Musk, en.wikipedia.org, epigenetics, Erik Brynjolfsson, gigafactory, Google bus, Hyperloop, income inequality, information security, Internet of things, job automation, Kevin Kelly, Khan Academy, Kickstarter, Law of Accelerating Returns, license plate recognition, life extension, longitudinal study, Lyft, M-Pesa, Mary Meeker, Menlo Park, microbiome, military-industrial complex, mobile money, new economy, off-the-grid, One Laptop per Child (OLPC), personalized medicine, phenotype, precision agriculture, radical life extension, RAND corporation, Ray Kurzweil, recommendation engine, Ronald Reagan, Second Machine Age, self-driving car, seminal paper, Silicon Valley, Skype, smart grid, stem cell, Stephen Hawking, Steve Wozniak, Stuxnet, supercomputer in your pocket, synthetic biology, Tesla Model S, The future is already here, The Future of Employment, Thomas Davenport, Travis Kalanick, Turing test, Uber and Lyft, Uber for X, uber lyft, uranium enrichment, Watson beat the top human players on Jeopardy!, zero day

I couldn’t, for example, recall the winning and losing pitcher in every baseball game of the major leagues from the previous night. Narrow A.I. is now embedded in the fabric of our everyday lives. The humanoid phone trees that route calls to airlines’ support desks are all narrow A.I., as are recommendation engines in Amazon and Spotify. Google Maps’ astonishingly smart route suggestions (and mid-course modifications to avoid traffic) are classic narrow A.I. Narrow-A.I. systems are much better than humans are at accessing information stored in complex databases, but their capabilities are specific and limited, and exclude creative thought.


pages: 222 words: 53,317

Overcomplicated: Technology at the Limits of Comprehension by Samuel Arbesman

algorithmic trading, Anthropocene, Anton Chekhov, Apple II, Benoit Mandelbrot, Boeing 747, Chekhov's gun, citation needed, combinatorial explosion, Computing Machinery and Intelligence, Danny Hillis, data science, David Brooks, digital map, discovery of the americas, driverless car, en.wikipedia.org, Erik Brynjolfsson, Flash crash, friendly AI, game design, Google X / Alphabet X, Googley, Hans Moravec, HyperCard, Ian Bogost, Inbox Zero, Isaac Newton, iterative process, Kevin Kelly, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mandelbrot fractal, Minecraft, Neal Stephenson, Netflix Prize, Nicholas Carr, Nick Bostrom, Parkinson's law, power law, Ray Kurzweil, recommendation engine, Richard Feynman, Richard Feynman: Challenger O-ring, Second Machine Age, self-driving car, SimCity, software studies, statistical model, Steve Jobs, Steve Wozniak, Steven Pinker, Stewart Brand, superintelligent machines, synthetic biology, systems thinking, the long tail, Therac-25, Tyler Cowen, Tyler Cowen: Great Stagnation, urban planning, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, Y2K

The sophisticated machine learning techniques used in linguistics—employing probability and a large array of parameters rather than principled rules—are increasingly being used in numerous other areas, both in science and outside it, from criminal detection to medicine, as well as in the insurance industry. Even our aesthetic tastes are rather complicated, as Netflix discovered when it awarded a prize for improvements in its recommendation engine to a team whose solution was cobbled together from a variety of different statistical techniques. The contest seemed to demonstrate that no simple algorithm could provide a significant improvement in recommendation accuracy; the winners needed to use a more complex suite of methods in order to capture and predict our personal and quirky tastes in films.
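
The idea of a solution "cobbled together from a variety of different statistical techniques" can be illustrated with a tiny blend of predictors. This is a minimal sketch and not the prize-winning method: the three models, their ratings, and the equal weights are all invented for illustration.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def blend(predictions, weights):
    """Weighted average of several models' predictions, item by item."""
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, column)) / total
            for column in zip(*predictions)]

# Hypothetical true ratings (1-5 stars) and three hypothetical models' guesses.
actual = [4.0, 2.0, 5.0, 3.0]
model_a = [5.0, 3.0, 5.0, 4.0]   # tends to guess too high
model_b = [3.0, 1.0, 4.0, 2.0]   # tends to guess too low
model_c = [4.0, 2.0, 4.0, 4.0]   # right on some films, off on others

blended = blend([model_a, model_b, model_c], weights=[1.0, 1.0, 1.0])

for name, guess in [("a", model_a), ("b", model_b), ("c", model_c), ("blend", blended)]:
    print(name, round(rmse(guess, actual), 3))
```

In this made-up data the blend has a lower error than any single model because the individual models' mistakes partly cancel out, which is one plausible reading of why a mixed suite of methods did better than any simple algorithm.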


pages: 196 words: 54,339

Team Human by Douglas Rushkoff

1960s counterculture, Abraham Maslow, Adam Curtis, autonomous vehicles, basic income, Berlin Wall, big-box store, bitcoin, blockchain, Burning Man, carbon footprint, circular economy, clean water, clockwork universe, cloud computing, collective bargaining, Computing Machinery and Intelligence, corporate personhood, digital capitalism, disintermediation, Donald Trump, drone strike, European colonialism, fake news, Filter Bubble, full employment, future of work, game design, gamification, gig economy, Google bus, Gödel, Escher, Bach, hockey-stick growth, Internet of things, invention of the printing press, invention of writing, invisible hand, iterative process, John Perry Barlow, Kevin Kelly, Kevin Roose, knowledge economy, Larry Ellison, Lewis Mumford, life extension, lifelogging, Mark Zuckerberg, Marshall McLuhan, means of production, mirror neurons, multilevel marketing, new economy, patient HM, pattern recognition, peer-to-peer, Peter Thiel, planned obsolescence, power law, prosperity theology / prosperity gospel / gospel of success, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, Ronald Reagan, Ronald Reagan: Tear down this wall, shareholder value, sharing economy, Silicon Valley, Silicon Valley billionaire, social intelligence, sovereign wealth fund, Steve Jobs, Steven Pinker, Stewart Brand, tech billionaire, technoutopianism, TED Talk, theory of mind, trade route, Travis Kalanick, Turing test, universal basic income, Vannevar Bush, We are as Gods, winner-take-all economy, zero-sum game

Instead of retrieving the peer-to-peer marketplace, the digital economy exacerbates the division of wealth and paralyzes the social instincts for mutual aid that usually mitigate its effects. Digital platforms amplify the power law dynamics that determine winners and losers. While digital music platforms make space for many more performers to sell their music, their architecture and recommendation engines end up promoting many fewer artists than a diverse ecosystem of record stores or FM radio did. One or two superstars get all the plays, and everyone else sells almost nothing. It’s the same across the board. While the net creates more access for artists and businesses of all kinds, it allows fewer than ever to make any money.


pages: 554 words: 149,489

The Content Trap: A Strategist's Guide to Digital Change by Bharat Anand

Airbnb, Alan Greenspan, An Inconvenient Truth, AOL-Time Warner, Benjamin Mako Hill, Bernie Sanders, Clayton Christensen, cloud computing, commoditize, correlation does not imply causation, creative destruction, crowdsourcing, death of newspapers, disruptive innovation, Donald Trump, driverless car, electricity market, Eyjafjallajökull, fulfillment center, gamification, Google Glasses, Google X / Alphabet X, information asymmetry, Internet of things, inventory management, Jean Tirole, Jeff Bezos, John Markoff, Just-in-time delivery, Kaizen: continuous improvement, Khan Academy, Kickstarter, late fees, managed futures, Mark Zuckerberg, market design, Minecraft, multi-sided market, Network effects, post-work, price discrimination, publish or perish, QR code, recommendation engine, ride hailing / ride sharing, Salesforce, selection bias, self-driving car, shareholder value, Shenzhen special economic zone, Shenzhen was a fishing village, Silicon Valley, Silicon Valley startup, Skype, social graph, social web, special economic zone, Stephen Hawking, Steve Jobs, Steven Levy, Stuart Kauffman, the long tail, Thomas L Friedman, transaction costs, two-sided market, ubercab, vertical integration, WikiLeaks, winner-take-all economy, zero-sum game

Consider the intrinsic technology properties of networked products, or the word-of-mouth benefits that arise from seemingly unpredictable acts of sharing by interested individuals. It’s tempting to view these user connections as “acts of nature” over which managers have little control. But that’s not the case. By 2002 Amazon had spent more than five years creating a formidable advantage in e-commerce. That came not only from a user-friendly platform and recommendation engine—both features were adopted by other entrants—but from its warehousing and logistics operation. By building distribution centers across the country, investing in algorithms to optimize pick-time in the centers, and hiring operational wizards from Walmart and other competitors, Amazon could get products to customers anywhere in the United States faster and cheaper than anyone else.

Netflix’s queueing system, widely regarded as a tool to enhance user convenience, was instead really a powerful lever for demand forecasting: It told the company exactly what movies every customer in every part of the country wanted next, letting it tailor inventory in different warehouses to local preferences. The recommendation engine, also thought of as a means of increasing customer satisfaction, doubled as an inventory management tool: It let the company recommend not only movies a customer might like, but also those that were in stock! Netflix integrated its sorting machines with the U.S. Postal Service to make deliveries more efficient.
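
The point that a recommendation engine can double as an inventory tool can be sketched as a ranking step followed by a stock filter. This is only an illustration under stated assumptions: the function name `recommend_in_stock`, the scores, and the warehouse counts are hypothetical and do not describe Netflix's actual system.

```python
def recommend_in_stock(predicted_scores, stock, k=3):
    """Return up to k titles with the highest predicted score that are in stock."""
    available = [title for title in predicted_scores if stock.get(title, 0) > 0]
    return sorted(available, key=lambda t: predicted_scores[t], reverse=True)[:k]

# Hypothetical predicted enjoyment scores for one customer,
# and hypothetical copy counts at that customer's local warehouse.
scores = {"Movie A": 4.8, "Movie B": 4.5, "Movie C": 4.1, "Movie D": 3.9}
warehouse_stock = {"Movie A": 0, "Movie B": 2, "Movie C": 5, "Movie D": 1}

picks = recommend_in_stock(scores, warehouse_stock, k=2)
print(picks)  # ['Movie B', 'Movie C']
```

The customer's top-scoring title is skipped because no copy is on hand, so the recommendation steers demand toward what the warehouse can actually ship.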


How to Be a Liberal: The Story of Liberalism and the Fight for Its Life by Ian Dunt

4chan, Alan Greenspan, Alfred Russel Wallace, bank run, battle of ideas, Bear Stearns, Big bang: deregulation of the City of London, Boris Johnson, bounce rate, Brexit referendum, British Empire, Brixton riot, Cambridge Analytica, Carmen Reinhart, centre right, classic study, David Ricardo: comparative advantage, disinformation, Dominic Cummings, Donald Trump, eurozone crisis, experimental subject, fake news, feminist movement, Francis Fukuyama: the end of history, full employment, Glass-Steagall Act, Growth in a Time of Debt, illegal immigration, invisible hand, John Bercow, Kenneth Rogoff, liberal world order, low interest rates, Mark Zuckerberg, mass immigration, means of production, Mohammed Bouazizi, Northern Rock, old-boy network, Paul Samuelson, Peter Thiel, Phillips curve, price mechanism, profit motive, quantitative easing, recommendation engine, road to serfdom, Ronald Reagan, Saturday Night Live, Scientific racism, Silicon Valley, Silicon Valley billionaire, Steve Bannon, The Wealth of Nations by Adam Smith, too big to fail, upwardly mobile, Winter of Discontent, working poor, zero-sum game

Nowhere was this process more evident than on YouTube. It quickly became the most popular social network in the US, with far more users than there were viewers for cable news. And those users were subject to an algorithm that seemed to push them towards ever more extreme material for their political tribe. This was chiefly because of its recommendation engine, which presented a viewer with options for what they might want to watch after they finished a video. The YouTube algorithm was not based on how to make sure people came across alternate views so that it could preserve the health of liberal democracy. It was based, like that of other social media operations, purely on engagement.

It was based, like that of other social media operations, purely on engagement. Initially, the website grounded it in ‘clicks to watch,’ but it then pivoted to ‘watchtime.’ Whatever got people watching longer was what mattered. The political effect was potentially very far-reaching. If someone clicked on a left-wing video and watched it to the end, the recommendation engine would provide more left-wing videos. Out of the options, the user might pick one. Once they did so, their choices were again narrowed, on the basis that the algorithm presumed the user had made an active choice for more left-wing content. Videos which were more edgy or shocking, which triggered more of an emotional response, provoked more engagement and were therefore prioritised in recommendations.
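
The shift from ‘clicks to watch’ to ‘watchtime’ described here amounts to changing the metric that orders the recommendations. The following is a minimal sketch with invented video titles and numbers; it is not YouTube's algorithm, only a demonstration of how the same candidates rank differently under the two objectives.

```python
# Hypothetical engagement data for three candidate videos.
videos = [
    {"title": "calm explainer", "clicks": 900, "minutes_watched": 1800},
    {"title": "shocking rant", "clicks": 600, "minutes_watched": 4200},
    {"title": "clickbait clip", "clicks": 1200, "minutes_watched": 600},
]

def rank(candidates, metric):
    """Return titles ordered by the chosen engagement metric, highest first."""
    ordered = sorted(candidates, key=lambda v: v[metric], reverse=True)
    return [v["title"] for v in ordered]

by_clicks = rank(videos, "clicks")
by_watch_time = rank(videos, "minutes_watched")

print(by_clicks)      # ['clickbait clip', 'calm explainer', 'shocking rant']
print(by_watch_time)  # ['shocking rant', 'calm explainer', 'clickbait clip']
```

With these made-up figures, the video that holds attention longest moves from last place to first once watch time becomes the objective.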


Smart Mobs: The Next Social Revolution by Howard Rheingold

"hyperreality Baudrillard"~20 OR "Baudrillard hyperreality", A Pattern Language, Alvin Toffler, AOL-Time Warner, augmented reality, barriers to entry, battle of ideas, Brewster Kahle, Burning Man, business climate, citizen journalism, computer vision, conceptual framework, creative destruction, Dennis Ritchie, digital divide, disinformation, Douglas Engelbart, Douglas Engelbart, experimental economics, experimental subject, Extropian, Free Software Foundation, Garrett Hardin, Hacker Ethic, Hedy Lamarr / George Antheil, Herman Kahn, history of Unix, hockey-stick growth, Howard Rheingold, invention of the telephone, inventory management, Ivan Sutherland, John Markoff, John von Neumann, Joi Ito, Joseph Schumpeter, Ken Thompson, Kevin Kelly, Lewis Mumford, Metcalfe's law, Metcalfe’s law, more computing power than Apollo, move 37, Multics, New Urbanism, Norbert Wiener, packet switching, PalmPilot, Panopticon Jeremy Bentham, pattern recognition, peer-to-peer, peer-to-peer model, pez dispenser, planetary scale, pre–internet, prisoner's dilemma, radical decentralization, RAND corporation, recommendation engine, Renaissance Technologies, RFID, Richard Stallman, Robert Metcalfe, Robert X Cringely, Ronald Coase, Search for Extraterrestrial Intelligence, seminal paper, SETI@home, sharing economy, Silicon Valley, skunkworks, slashdot, social intelligence, spectrum auction, Steven Levy, Stewart Brand, the Cathedral and the Bazaar, the scientific method, Tragedy of the Commons, transaction costs, ultimatum game, urban planning, web of trust, Whole Earth Review, Yochai Benkler, zero-sum game

Slashdot and other self-organized online forums enable participants to rate the postings of other participants in discussions, causing the best writing to rise in prominence and most objectionable postings to sink. Amazon’s online recommendation system tells customers about books and records bought by people whose tastes are similar to their own. Google.com, the foremost Internet search engine, lists first those Web sites that have the most links pointing to them—an implicit form of recommendation system. Hordes of programmers who compete for bragging rights as well as paying work are already driving the evolution of the first-generation reputation systems toward more advanced forms.
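
Recommending items "bought by people whose tastes are similar" can be sketched with a simple overlap measure between purchase histories. This is a hedged illustration, not Amazon's method: the Jaccard similarity, the customer names, and the purchase data are all assumptions chosen to keep the example small.

```python
def jaccard(a, b):
    """Overlap between two sets of purchases, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(customer, purchases, k=2):
    """Suggest items bought by similar customers that this customer does not own."""
    own = purchases[customer]
    scores = {}
    for other, items in purchases.items():
        if other == customer:
            continue
        similarity = jaccard(own, items)
        # Each unowned item gains weight in proportion to the buyer's similarity.
        for item in items - own:
            scores[item] = scores.get(item, 0.0) + similarity
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:k]

# Hypothetical purchase histories.
purchases = {
    "ana": {"book1", "book2", "record1"},
    "ben": {"book1", "book2", "book3"},
    "cy": {"record9"},
    "dee": {"book2", "record1", "record2"},
}

suggestions = recommend("ana", purchases)
print(suggestions)  # ['book3', 'record2']
```

Items from the two customers who share purchases with "ana" rank ahead of the item from the customer with nothing in common, so no explicit rating from anyone is needed.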

Even simple instruments that enable groups to share knowledge online by recommending useful Web sites, without requiring any action by the participants beyond bookmarking them, can multiply the groups’ effectiveness. In 1997, Hui Guo, Thomas Kreifelts, and Angi Voss of the German National Research Center for Information Technology described their “SOaP” social filtering service designed to address several of the problems constraining recommender systems.10 Guo and his colleagues created software agents, programs that could search, query, gather information, report results, even negotiate and execute transactions with other programs. The SOaP agents could implicitly collect recommendation information by the members of a group and mediate among people, groups, and the Web.

Buyers searching for items can see the feedback scores of the sellers. Over time, consistently honest sellers build up substantial reputation scores, which are costly to discard, guarding against the temptation to cheat buyers and adopt a new reputation. Paul Resnick, whose GroupLens had been a pioneering recommender system in 1992, and Richard Zeckhauser performed empirical studies on “a large data set from 1999” that indicated that despite the lack of physical presence on eBay, “trust has emerged due to the feedback or reputation system.”29 Biological theories of cooperation and experiments in game theory point to the expectation of dealing with others in future interactions—the “shadow of the future” that influences behavior in the present.


pages: 176 words: 55,819

The Start-Up of You: Adapt to the Future, Invest in Yourself, and Transform Your Career by Reid Hoffman, Ben Casnocha

Airbnb, Andy Kessler, Apollo 13, Benchmark Capital, Black Swan, business intelligence, Cal Newport, Clayton Christensen, commoditize, David Brooks, Donald Trump, Dunbar number, en.wikipedia.org, fear of failure, follow your passion, future of work, game design, independent contractor, information security, Jeff Bezos, job automation, Joi Ito, late fees, lateral thinking, Marc Andreessen, Mark Zuckerberg, Max Levchin, Menlo Park, out of africa, PalmPilot, Paul Graham, paypal mafia, Peter Thiel, public intellectual, recommendation engine, Richard Bolles, risk tolerance, rolodex, Salesforce, shareholder value, Sheryl Sandberg, side project, Silicon Valley, Silicon Valley startup, social web, Steve Jobs, Steve Wozniak, the strength of weak ties, Tony Hsieh, transaction costs, Tyler Cowen

In 1999 he set up a meeting at Blockbuster’s headquarters in part to discuss possibly partnering on local distribution and faster fulfillment. Blockbuster was not impressed. “They just about laughed us out of their office,” Reed recalls.16 Reed and his team kept at it. They perfected their distribution center network so that more than 80 percent of customers received overnight delivery of movies.17 They developed an innovative recommendation engine that prompted users with movies they might like based on past purchases. By 2005 Netflix had a subscriber base four million strong, had fended off competition from imitations like Walmart’s online movie-by-mail effort, and became the king of online movie rentals. In 2010 Netflix made a profit of more than $160 million.


pages: 375 words: 88,306

The Sharing Economy: The End of Employment and the Rise of Crowd-Based Capitalism by Arun Sundararajan

"World Economic Forum" Davos, additive manufacturing, Airbnb, AltaVista, Amazon Mechanical Turk, asset light, autonomous vehicles, barriers to entry, basic income, benefit corporation, bike sharing, bitcoin, blockchain, book value, Burning Man, call centre, Carl Icahn, collaborative consumption, collaborative economy, collective bargaining, commoditize, commons-based peer production, corporate social responsibility, cryptocurrency, data science, David Graeber, distributed ledger, driverless car, Eben Moglen, employer provided health coverage, Erik Brynjolfsson, Ethereum, ethereum blockchain, Frank Levy and Richard Murnane: The New Division of Labor, future of work, general purpose technology, George Akerlof, gig economy, housing crisis, Howard Rheingold, independent contractor, information asymmetry, Internet of things, inventory management, invisible hand, job automation, job-hopping, John Zimmer (Lyft cofounder), Kickstarter, knowledge worker, Kula ring, Lyft, Marc Andreessen, Mary Meeker, megacity, minimum wage unemployment, moral hazard, moral panic, Network effects, new economy, Oculus Rift, off-the-grid, pattern recognition, peer-to-peer, peer-to-peer lending, peer-to-peer model, peer-to-peer rental, profit motive, public intellectual, purchasing power parity, race to the bottom, recommendation engine, regulatory arbitrage, rent control, Richard Florida, ride hailing / ride sharing, Robert Gordon, Ronald Coase, Ross Ulbricht, Second Machine Age, self-driving car, sharing economy, Silicon Valley, smart contracts, Snapchat, social software, supply-chain management, TaskRabbit, TED Talk, the long tail, The Nature of the Firm, total factor productivity, transaction costs, transportation-network company, two-sided market, Uber and Lyft, Uber for X, uber lyft, universal basic income, Vitalik Buterin, WeWork, Yochai Benkler, Zipcar

This point has been noted about digital markets more generally. While a conventional brick-and-mortar bookstore may hold 40,000 to 100,000 books, Amazon offers access to over 3 million books. The same expansion in variety holds true for music, movies, electronics, and myriad other products. Furthermore, since Amazon uses several recommender systems to help promote products, it is not just variety but “fit” that has increased.14 Capturing the economic impacts of enhanced variety and automated word-of-mouth promotions, however, is difficult, since once again, what has changed is primarily the quality of the consumer experience. As Erik Brynjolfsson, Yu (Jeffery) Hu, and Michael Smith argue in their study of consumer surplus in the digital economy, these benefits may be particularly difficult to measure because different consumers are impacted to varying degrees.

This effect will be especially beneficial to those consumers who live in remote areas.”15 Analogous increases in consumer surplus were documented by Anindya Ghose, Rahul Telang and Michael Smith in their 2005 study of electronic markets for used books.16 These effects are exacerbated by a wide variety of recommender systems that use machine learning algorithms to better direct consumer choice. As Alexander Tuzhilin and Gedas Adomavicius document, such systems are ubiquitous in digital markets.17 It is natural to expect similar challenges when, for example, trying to encompass the different economic impacts of increased variety and fit from Airbnb, or increased convenience from Lyft, or Dennis’s increased access to financing on the Isle of Gigha.

Smith, “Internet Exchanges for Used Books: An Empirical Analysis of Product Cannibalization and Welfare Impact,” Information Systems Research 17, 1 (2006): 3–9. http://pubsonline.informs.org/doi/abs/10.1287/isre.1050.0072. 17. Alexander Tuzhilin and Gedas Adomavicius, “Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions,” IEEE Transactions on Knowledge and Data Engineering 17, 6 (2006): 734–739. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1423975&tag=1. 18. Prasanna Tambe and Lorin M. Hitt, “Job Hopping, Information Technology Spillovers, and Productivity Growth,” Management Science 60, 2 (2013): 338–355. 19.


pages: 223 words: 60,909

Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech by Sara Wachter-Boettcher

"Susan Fowler" uber, Abraham Maslow, Airbnb, airport security, algorithmic bias, AltaVista, big data - Walmart - Pop Tarts, Big Tech, Black Lives Matter, data science, deep learning, Donald Trump, fake news, false flag, Ferguson, Missouri, Firefox, Grace Hopper, Greyball, Hacker News, hockey-stick growth, independent contractor, job automation, Kickstarter, lifelogging, lolcat, Marc Benioff, Mark Zuckerberg, Max Levchin, Menlo Park, meritocracy, microaggression, move fast and break things, natural language processing, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, off-the-grid, pattern recognition, Peter Thiel, real-name policy, recommendation engine, ride hailing / ride sharing, Salesforce, self-driving car, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, Snapchat, Steve Jobs, Tactical Technology Collective, TED Talk, Tim Cook: Apple, Travis Kalanick, upwardly mobile, Wayback Machine, women in the workforce, work culture , zero-sum game

In other words, if a system like Word2vec is fed data that reflects historical biases, then those biases will be reflected in the resulting word embeddings. The problem is that very few people have been talking about this—and meanwhile, because Google released Word2vec as an open-source technology, all kinds of companies are using it as the foundation for other products. These products include recommendation engines (the tools behind all those “you might also like . . .” features on websites), document classification, and search engines—all without considering the implications of relying on data that reflects historical biases and outdated norms to make future predictions. One of the most worrisome developments is this: using word embeddings to automatically review résumés.


pages: 229 words: 68,426

Everyware: The Dawning Age of Ubiquitous Computing by Adam Greenfield

"Hurricane Katrina" Superdome, augmented reality, business process, Charles Babbage, defense in depth, demand response, demographic transition, facts on the ground, game design, Howard Rheingold, Internet of things, James Dyson, knowledge worker, late capitalism, machine readable, Marshall McLuhan, new economy, Norbert Wiener, packet switching, pattern recognition, profit motive, QR code, recommendation engine, RFID, seminal paper, Steve Jobs, technoutopianism, the built environment, the scientific method, value engineering

It may well be that a full mug on my desk implies that I am also in the room, but this is not always going to be the case, and any system that correlates the two facts had better do so pretty loosely. Products and services based on such pattern-recognition already exist in the world—I think of Amazon's "collaborative filtering"–driven recommendation engine—but for the most part, their designers are only now beginning to recognize that they have significantly underestimated the difficulty of deriving meaning from those patterns. The better part of my Amazon recommendations turn out to be utterly worthless—and of all commercial pattern-recognition systems, that's among those with the largest pools of data to draw on.


pages: 233 words: 67,596

Competing on Analytics: The New Science of Winning by Thomas H. Davenport, Jeanne G. Harris

always be closing, Apollo 13, big data - Walmart - Pop Tarts, business intelligence, business logic, business process, call centre, commoditize, data acquisition, digital map, en.wikipedia.org, fulfillment center, global supply chain, Great Leap Forward, high net worth, if you build it, they will come, intangible asset, inventory management, iterative process, Jeff Bezos, job satisfaction, knapsack problem, late fees, linear programming, Moneyball by Michael Lewis explains big data, Netflix Prize, new economy, performance metric, personalized medicine, quantitative hedge fund, quantitative trading / quantitative finance, recommendation engine, RFID, search inside the book, shareholder value, six sigma, statistical model, supply-chain management, text mining, The future is already here, the long tail, the scientific method, traveling salesman, yield management

Customers watch their cinematic choices at their leisure; there are no late fees. When the DVDs are returned, customers select their next films. Besides the logistical expertise that Netflix needs to make this a profitable venture, Netflix employs analytics in two important ways, both driven by customer behavior and buying patterns. The first is a movie-recommendation “engine” called Cinematch that’s based on proprietary, algorithmically driven software. Netflix hired mathematicians with programming experience to write the algorithms and code to define clusters of movies, connect customer movie rankings to the clusters, evaluate thousands of ratings per second, and factor in current Web site behavior—all to ensure a personalized Web page for each visiting customer.


pages: 247 words: 69,593

The Creative Curve: How to Develop the Right Idea, at the Right Time by Allen Gannett

Alfred Russel Wallace, collective bargaining, content marketing, data science, David Brooks, deliberate practice, Desert Island Discs, Elon Musk, en.wikipedia.org, gentrification, glass ceiling, iterative process, lone genius, longitudinal study, Lyft, Mark Zuckerberg, McMansion, pattern recognition, profit motive, randomized controlled trial, recommendation engine, Richard Florida, ride hailing / ride sharing, Salesforce, Saturday Night Live, sentiment analysis, Silicon Valley, Silicon Valley startup, Skype, Snapchat, South of Market, San Francisco, Steve Jobs, TED Talk, too big to fail, uber lyft, work culture

Rather than doing his homework during his quiet shifts, Ted made a pact with himself that he would watch every single movie in the store. He wanted to learn everything he possibly could about films, and finally he had the best possible resource—a well-stocked video store—at his disposal. A few months later, after watching nearly every movie on the store shelves, Ted had morphed into a human recommendation engine. If you were a customer who liked Woody Allen films, Ted would suggest you try the movies of Albert Brooks, announcing that “what Woody Allen is to New York, Albert Brooks is to L.A.” Like a particular action movie? Ted had three other movie suggestions that would keep your blood flowing in just the same way.


pages: 593 words: 118,995

Relevant Search: With Examples Using Elasticsearch and Solr by Doug Turnbull, John Berryman

business logic, cognitive load, commoditize, crowdsourcing, data science, domain-specific language, Dr. Strangelove, fail fast, finite state, fudge factor, full text search, heat death of the universe, information retrieval, machine readable, natural language processing, premature optimization, recommendation engine, sentiment analysis, the long tail

But they often provide better results, because they employ a more holistic understanding of item-user relationships. To dive deeper into recommendation systems, we recommend Practical Recommender Systems by Kim Falk (Manning, 2016). And no matter the method you choose, keep in mind that the end result is a model that lets you quickly find the item-to-item or user-to-item affinities. This understanding is important as we explain how collaborative filtering results can be used in the context of search. 11.2.3. Tying user behavior information back to the search index In the previous section, we demonstrated how to build a simple recommendation system. But we’re supposed to be talking about personalized search!
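The "item-to-item affinities" mentioned in this excerpt can be illustrated with a simple co-occurrence count. The sketch below is a minimal example under stated assumptions: it is not the book's own code, the helper names (`item_affinities`, `similar_items`) are hypothetical, and the viewing histories are invented.

```python
from collections import defaultdict
from itertools import combinations


def item_affinities(baskets):
    """Count how often each pair of items appears in the same user's history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets:
        # De-duplicate and sort so each unordered pair is counted once per history.
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def similar_items(counts, item, top_n=3):
    """Return the items that co-occur most often with `item` (ties broken by name)."""
    related = counts.get(item, {})
    return sorted(related, key=lambda other: (-related[other], other))[:top_n]


# Hypothetical viewing histories, one list per user.
baskets = [
    ["alien", "aliens", "blade runner"],
    ["alien", "aliens"],
    ["alien", "blade runner", "brazil"],
]
affinities = item_affinities(baskets)
print(similar_items(affinities, "alien"))
```

Raw co-occurrence favors popular items; a real system would likely normalize the counts, but the lookup structure (item in, related items out) is the part that matters when feeding such affinities into a search index.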

In both cases, we start with relatively simple methods and then outline more sophisticated approaches using machine learning. In the process of laying out personalized search, we introduce recommendations. You can provide users with personalized content recommendations even before they’ve made a search. In addition, you’ll see that a search engine can be a powerful platform for building a recommendation system. Figure 11.1 shows recommendations side-by-side with search, implemented by a relevance engineer. Figure 11.1. By incorporating knowledge about the content and the user, search can be extended to tasks such as personalized search and recommendations. 11.1. Personalizing search based on user profiles Until now, we’ve defined relevance in terms of how well a search result matches a user’s immediate information need.

Particularly engaged users might even be willing to directly tell us about their interests. Item information—To make good recommendations, it’s important to be familiar with the items in the catalog. At a minimum, the items need to have useful textual content to match on. Items also need good metadata for boosting and filtering. In more advanced recommendation systems, you should also take advantage of the overall user behavior that gives you new information about how items in the catalog are interrelated. Recommendation context—To provide users with the best recommendations possible, you must consider their current context. Are they looking at an item details page?


pages: 252 words: 72,473

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil

Affordable Care Act / Obamacare, Alan Greenspan, algorithmic bias, Bernie Madoff, big data - Walmart - Pop Tarts, call centre, Cambridge Analytica, carried interest, cloud computing, collateralized debt obligation, correlation does not imply causation, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, data science, disinformation, electronic logging device, Emanuel Derman, financial engineering, Financial Modelers Manifesto, Glass-Steagall Act, housing crisis, I will remember that I didn’t make the world, and it doesn’t satisfy my equations, Ida Tarbell, illegal immigration, Internet of things, late fees, low interest rates, machine readable, mass incarceration, medical bankruptcy, Moneyball by Michael Lewis explains big data, new economy, obamacare, Occupy movement, offshore financial centre, payday loans, peer-to-peer lending, Peter Thiel, Ponzi scheme, prediction markets, price discrimination, quantitative hedge fund, Ralph Nader, RAND corporation, real-name policy, recommendation engine, Rubik’s Cube, Salesforce, Sharpe ratio, statistical model, tech worker, Tim Cook: Apple, too big to fail, Unsafe at Any Speed, Upton Sinclair, Watson beat the top human players on Jeopardy!, working poor

Well, an internal data scientist might say, no statistical system can be perfect. Those folks are collateral damage. And often, like Sarah Wysocki, they are deemed unworthy and expendable. Forget about them for a minute, they might say, and focus on all the people who get helpful suggestions from recommendation engines or who find music they love on Pandora, the ideal job on LinkedIn, or perhaps the love of their life on Match.com. Think of the astounding scale, and ignore the imperfections. Big Data has plenty of evangelists, but I’m not one of them. This book will focus sharply in the other direction, on the damage inflicted by WMDs and the injustice they perpetuate.


pages: 231 words: 71,248

Shipping Greatness by Chris Vander Mey

business logic, corporate raider, don't be evil, en.wikipedia.org, fudge factor, Google Chrome, Google Hangouts, Gordon Gekko, Jeff Bezos, Kickstarter, Lean Startup, minimum viable product, performance metric, recommendation engine, Skype, slashdot, sorting algorithm, source of truth, SQL injection, Steve Jobs, Superbowl ad, two-pizza team, web application

Using IMDb’s unique collection of movie data and Amazon’s ability to distribute digital content and proven personalization tools, we will uniquely solve the content discovery problem by integrating these technologies and building unique suggestion algorithms. Unlike competitors such as Netflix, who already have a recommendations engine, we’ll integrate across all video sources and use our richer data to provide more interesting in-viewing experiences and more accurate recommendations. We will deliver these in-viewing experiences through platforms that can expose contextually relevant data (e.g., the cast of a YouTube video), such as a browser plug-in for YouTube and mobile applications for phones.


pages: 229 words: 72,431

Shadow Work: The Unpaid, Unseen Jobs That Fill Your Day by Craig Lambert

airline deregulation, Asperger Syndrome, banking crisis, Barry Marshall: ulcers, big-box store, business cycle, carbon footprint, cashless society, Clayton Christensen, cognitive dissonance, collective bargaining, Community Supported Agriculture, corporate governance, crowdsourcing, data science, disintermediation, disruptive innovation, emotional labour, fake it until you make it, financial independence, Galaxy Zoo, ghettoisation, gig economy, global village, helicopter parent, IKEA effect, industrial robot, informal economy, Jeff Bezos, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, Mark Zuckerberg, new economy, off-the-grid, pattern recognition, plutocrats, pneumatic tube, recommendation engine, Schrödinger's Cat, Silicon Valley, single-payer health, statistical model, the strength of weak ties, The Theory of the Leisure Class by Thorstein Veblen, Thorstein Veblen, Turing test, unpaid internship, Vanguard fund, Vilfredo Pareto, you are the product, zero-sum game, Zipcar

To make online purchases, customers open accounts with bookstores, banks, newspapers, utilities, sports teams, apparel vendors, phone service providers, and so on. Everyone wants you to open an account. This means supplying contact and demographic data and then having all transactions tracked, building a personal profile for the vendor. That profile enables vendors to activate “recommendation engines.” Once its algorithms have examined your past purchases, Amazon can recommend books or desk lamps you might like, and Netflix can suggest movies to rent. On my computer, opening Amazon.com brings up thumbnails of books by Bill Bryson, an author whose works I have purchased, and books on pharmaceutical companies, a topic I’ve browsed.


pages: 326 words: 74,433

Do More Faster: TechStars Lessons to Accelerate Your Startup by Brad Feld, David Cohen

An Inconvenient Truth, augmented reality, computer vision, corporate governance, crowdsourcing, deal flow, disintermediation, fail fast, hiring and firing, hockey-stick growth, Inbox Zero, independent contractor, Jeff Bezos, Kickstarter, knowledge worker, Lean Startup, lolcat, Ray Kurzweil, recommendation engine, risk tolerance, Silicon Valley, Skype, slashdot, social web, SoftBank, software as a service, Steve Jobs, subscription business

—usingmiles.com TutorialTab (2010)—lets companies make their web site more learnable.—tutorialtab.com Usermojo (2010)—is an emotion analytics platform that tells you why users do what they do.—usermojo.com Vanilla (2009)—is open source forum software.—vanillaforums.com Villij (2007)—is a recommendation engine for people.—villij.com Vacation Rental Partner (2010)—makes it easy to generate revenue from a second home. We offer tools that eliminate the need for traditional property management companies.—vacationrentalpartner.com TechStars companies funded after publication are listed on the TechStars web site.


pages: 260 words: 76,223

Ctrl Alt Delete: Reboot Your Business. Reboot Your Life. Your Future Depends on It. by Mitch Joel

3D printing, Amazon Web Services, augmented reality, behavioural economics, call centre, clockwatching, cloud computing, content marketing, digital nomad, do what you love, Firefox, future of work, gamification, ghettoisation, Google Chrome, Google Glasses, Google Hangouts, Khan Academy, Kickstarter, Kodak vs Instagram, Lean Startup, Marc Andreessen, Marc Benioff, Mark Zuckerberg, Network effects, new economy, Occupy movement, place-making, prediction markets, pre–internet, QR code, recommendation engine, Richard Florida, risk tolerance, Salesforce, self-driving car, Silicon Valley, Silicon Valley startup, Skype, social graph, social web, Steve Jobs, Steve Wozniak, TechCrunch disrupt, TED Talk, the long tail, Thomas L Friedman, Tim Cook: Apple, Tony Hsieh, vertical integration, white picket fence, WikiLeaks, zero-sum game

THE REALITY OF CAREER CHOICES IN A CTRL ALT DELETE WORLD. You can contrast the fictional story above with the tale of a friend of mine. This individual was never really sure what she wanted to do. There was no clear desire or talent in a single area of interest. In her final years of high school, a guidance counselor recommended engineering or the sciences because she had above-average math grades. So my friend studied engineering through university and squeaked by. Never passionate about it, she got her diploma and entered the workforce. I had lunch with her a while back and she confessed that she was miserable because of her work but could not figure out why.


pages: 265 words: 74,000

The Numerati by Stephen Baker

Berlin Wall, Black Swan, business process, call centre, correlation does not imply causation, Drosophila, full employment, illegal immigration, index card, information security, Isaac Newton, job automation, job satisfaction, junk bonds, McMansion, Myron Scholes, natural language processing, off-the-grid, PageRank, personalized medicine, recommendation engine, RFID, Silicon Valley, Skype, statistical model, surveillance capitalism, Watson beat the top human players on Jeopardy!, workplace surveillance

It will be up to doctors and nurses to follow up, figuring out why someone is limping or swaying differently at the kitchen sink. But in time, these systems will have enough feedback from thousands of users that they should be able to point people—either doctors or patients—to the most probable cause. In this way, they will work like the recommendation engines on Netflix or Amazon.com, which point people toward books or movies that are popular among customers with similar patterns. (Amazon and Netflix, of course, don't always get it right, and neither will the analysis issuing from the magic carpet. It will only point caregivers toward statistically probable causes.)


pages: 499 words: 144,278

Coders: The Making of a New Tribe and the Remaking of the World by Clive Thompson

"Margaret Hamilton" Apollo, "Susan Fowler" uber, 2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 4chan, 8-hour work day, Aaron Swartz, Ada Lovelace, AI winter, air gap, Airbnb, algorithmic bias, AlphaGo, Amazon Web Services, Andy Rubin, Asperger Syndrome, augmented reality, Ayatollah Khomeini, backpropagation, barriers to entry, basic income, behavioural economics, Bernie Sanders, Big Tech, bitcoin, Bletchley Park, blockchain, blue-collar work, Brewster Kahle, Brian Krebs, Broken windows theory, call centre, Cambridge Analytica, cellular automata, Charles Babbage, Chelsea Manning, Citizen Lab, clean water, cloud computing, cognitive dissonance, computer vision, Conway's Game of Life, crisis actor, crowdsourcing, cryptocurrency, Danny Hillis, data science, David Heinemeier Hansson, deep learning, DeepMind, Demis Hassabis, disinformation, don't be evil, don't repeat yourself, Donald Trump, driverless car, dumpster diving, Edward Snowden, Elon Musk, Erik Brynjolfsson, Ernest Rutherford, Ethereum, ethereum blockchain, fake news, false flag, Firefox, Frederick Winslow Taylor, Free Software Foundation, Gabriella Coleman, game design, Geoffrey Hinton, glass ceiling, Golden Gate Park, Google Hangouts, Google X / Alphabet X, Grace Hopper, growth hacking, Guido van Rossum, Hacker Ethic, hockey-stick growth, HyperCard, Ian Bogost, illegal immigration, ImageNet competition, information security, Internet Archive, Internet of things, Jane Jacobs, John Markoff, Jony Ive, Julian Assange, Ken Thompson, Kickstarter, Larry Wall, lone genius, Lyft, Marc Andreessen, Mark Shuttleworth, Mark Zuckerberg, Max Levchin, Menlo Park, meritocracy, microdosing, microservices, Minecraft, move 37, move fast and break things, Nate Silver, Network effects, neurotypical, Nicholas Carr, Nick Bostrom, no silver bullet, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, Oculus Rift, off-the-grid, OpenAI, operational 
security, opioid epidemic / opioid crisis, PageRank, PalmPilot, paperclip maximiser, pattern recognition, Paul Graham, paypal mafia, Peter Thiel, pink-collar, planetary scale, profit motive, ransomware, recommendation engine, Richard Stallman, ride hailing / ride sharing, Rubik’s Cube, Ruby on Rails, Sam Altman, Satoshi Nakamoto, Saturday Night Live, scientific management, self-driving car, side project, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, single-payer health, Skype, smart contracts, Snapchat, social software, software is eating the world, sorting algorithm, South of Market, San Francisco, speech recognition, Steve Wozniak, Steven Levy, systems thinking, TaskRabbit, tech worker, techlash, TED Talk, the High Line, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, universal basic income, urban planning, Wall-E, Watson beat the top human players on Jeopardy!, WeWork, WikiLeaks, women in the workforce, Y Combinator, Zimmermann PGP, éminence grise

At Columbia University, the researcher Jonathan Albright experimentally searched on YouTube for the phrase “crisis actors,” in the wake of a major school shooting, and took the “next up” recommendation from the recommendation system. He quickly amassed 9,000 videos, a large percentage that seemed custom designed to shock, inflame, or mislead, ranging from “rape game jokes, shock reality social experiments, celebrity pedophilia, ‘false flag’ rants, and terror-related conspiracy theories,” as he wrote. Some of it, he figured, was driven by sheer profit motive: Post outrageous nonsense, get into the recommendation system, and reap the profit from the clicks. Recommender systems, in other words, may have a bias toward “inflammatory content,” as Tufekci notes.

They needed automation, an algorithm that would pick only posts you’d most likely find interesting. How does Facebook figure that out? It’s hard to know for sure. Social networks do not discuss their ranking systems with much detail, to prevent people from gaming their algorithms; spammers constantly try to suss out how recommendation systems work so they can produce spammy material that will get upranked. So few outside the firms truly know. But generally, the algorithms uprank the type of content you’d expect: posts and photos and videos that have amassed tons of likes or “faves” or attracted many comments, reposts, and retweets, with a particular bias toward recent activity.

Recommender systems, in other words, may have a bias toward “inflammatory content,” as Tufekci notes. Another academic, Renée DiResta, found the same problem with Facebook’s recommendation system for its “Groups.” People who read posts about vaccines were urged to join anti-vaccination groups, and thence to groups devoted to even more unhinged conspiracies like “chemtrails.” The recommendations, DiResta concluded, were “essentially creating this vortex in which conspiratorial ideas can just breed and multiply.” Certainly, big-tech firms keep quiet about how their systems work, for fear of being gamed. But since they seem to self-evidently favor high emotionality, it makes them pretty easy to manipulate, as Siva Vaidhyanathan, a media scholar and author of Antisocial Media, notes.


pages: 252 words: 74,167

Thinking Machines: The Inside Story of Artificial Intelligence and Our Race to Build the Future by Luke Dormehl

"World Economic Forum" Davos, Ada Lovelace, agricultural Revolution, AI winter, Albert Einstein, Alexey Pajitnov wrote Tetris, algorithmic management, algorithmic trading, AlphaGo, Amazon Mechanical Turk, Apple II, artificial general intelligence, Automated Insights, autonomous vehicles, backpropagation, Bletchley Park, book scanning, borderless world, call centre, cellular automata, Charles Babbage, Claude Shannon: information theory, cloud computing, computer vision, Computing Machinery and Intelligence, correlation does not imply causation, crowdsourcing, deep learning, DeepMind, driverless car, drone strike, Elon Musk, Flash crash, Ford Model T, friendly AI, game design, Geoffrey Hinton, global village, Google X / Alphabet X, Hans Moravec, hive mind, industrial robot, information retrieval, Internet of things, iterative process, Jaron Lanier, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, life extension, Loebner Prize, machine translation, Marc Andreessen, Mark Zuckerberg, Menlo Park, Mustafa Suleyman, natural language processing, Nick Bostrom, Norbert Wiener, out of africa, PageRank, paperclip maximiser, pattern recognition, radical life extension, Ray Kurzweil, recommendation engine, remote working, RFID, scientific management, self-driving car, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, social intelligence, speech recognition, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, tech billionaire, technological singularity, The Coming Technological Singularity, The Future of Employment, Tim Cook: Apple, Tony Fadell, too big to fail, traumatic brain injury, Turing machine, Turing test, Vernor Vinge, warehouse robotics, Watson beat the top human players on Jeopardy!

Much like laws continue to be followed after lawmakers have passed away, the idea of an expert system is that we ought to be able to continue drawing on an expert’s knowledge about a specialist subject after the person is no longer available to us. The concept failed, but the intention (and, for a while, the funding) was absolutely there. In some senses, the modern parallel of the expert system is the so-called ‘recommender system’. This subclass of information filtering system sets out to anticipate and predict what rating or selection a user is likely to give an item in a specific narrow domain. Everyone reading this will likely have come across the feature on Amazon or Netflix which suggests that, ‘You liked X, so you may also enjoy Y.’
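The ‘You liked X, so you may also enjoy Y’ behaviour described in this excerpt can be illustrated with a minimal co-occurrence sketch. This is not how Amazon or Netflix actually compute suggestions; the basket data and the function names `co_occurrence` and `also_enjoy` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy purchase histories; not data from any real service.
baskets = [
    {"X", "Y"},
    {"X", "Y", "Z"},
    {"X", "Z"},
    {"X", "Y"},
    {"W", "Z"},
]

def co_occurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in permutations(sorted(basket), 2):
            counts[(a, b)] += 1
    return counts

def also_enjoy(item, baskets, top_n=1):
    """Return the items most often seen alongside `item`, most frequent first."""
    counts = co_occurrence(baskets)
    partners = {b: n for (a, b), n in counts.items() if a == item}
    # Break ties alphabetically so the result is deterministic.
    return sorted(partners, key=lambda b: (-partners[b], b))[:top_n]

print(also_enjoy("X", baskets))  # → ['Y']
```

Raw co-occurrence counts tend to favour items that are popular with everyone, so a practical system would probably normalise them; the sketch leaves that out to stay short.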

‘Eventually we will entirely replace our brains using nanotechnology,’ he wrote. ‘Once delivered from the limitations of biology, we will be able to decide the length of our lives – with the option of immortality – and choose among other, unimagined capabilities as well.’

The Connectome

A complex recommender system ‘mindfile’ of the sort described by Marius Ursache and William Sims Bainbridge may go some way towards replicating us in software form. However, the only truly faithful means of making sure that a person is reconstructed in a form other than their original one would be to duplicate all of the cellular pathways in the brain – neuron by painstaking neuron.



pages: 561 words: 120,899

The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy by Sharon Bertsch McGrayne

Abraham Wald, Alan Greenspan, Bayesian statistics, bioinformatics, Bletchley Park, British Empire, classic study, Claude Shannon: information theory, Daniel Kahneman / Amos Tversky, data science, double helix, Dr. Strangelove, driverless car, Edmond Halley, Fellow of the Royal Society, full text search, government statistician, Henri Poincaré, Higgs boson, industrial research laboratory, Isaac Newton, Johannes Kepler, John Markoff, John Nash: game theory, John von Neumann, linear programming, longitudinal study, machine readable, machine translation, meta-analysis, Nate Silver, p-value, Pierre-Simon Laplace, placebo effect, prediction markets, RAND corporation, recommendation engine, Renaissance Technologies, Richard Feynman, Richard Feynman: Challenger O-ring, Robert Mercer, Ronald Reagan, seminal paper, speech recognition, statistical model, stochastic process, Suez canal 1869, Teledyne, the long tail, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, traveling salesman, Turing machine, Turing test, uranium enrichment, We are all Keynesians now, Yom Kippur War

In draft. Quatse JT, Najmi A. (2007) Empirical Bayesian targeting. Proceedings, 2007 World Congress in Computer Science, Computer Engineering, and Applied Computing, June 25–28, 2007. Schafer JB, Konstan J, Riedl J. (1999) Recommender systems in E-commerce. In ACM Conference on Electronic Commerce (EC-99) 158–66. Schafer JB, Konstan J, Riedl J. (2001) Recommender systems in E-commerce. Data Mining and Knowledge Discovery (5) 115–53. Schneider, Stephen H. (2005) The Patient from Hell. Perseus Books. Spolsky, Joel. (2005) (http://www.joelonsoftware.com/items/2005/10/17.html). Swinburne, Richard, ed. (2002) Bayes’s Theorem.

This use of Bayesian optimal classifiers is similar to the technique used by Frederick Mosteller and David Wallace to determine who wrote certain Federalist papers. Bayesian theory is firmly embedded in Microsoft’s Windows operating system. In addition, a variety of Bayesian techniques are involved in Microsoft’s handwriting recognition; recommender systems; the question-answering box in the upper right corner of a PC’s monitor screen; a datamining software package for tracking business sales; a program that infers the applications that users will want and preloads them before they are requested; and software to make traffic jam predictions for drivers to check before their commute.

The updating used in machine learning does not necessarily follow Bayes’ theorem formally but “shares its perspective.” A $1-million contest sponsored by Netflix.com illustrates the prominent role of Bayesian concepts in modern e-commerce and learning theory. In 2006 the online film-rental company launched a search for the best recommender system to improve its own algorithm. More than 50,000 contestants from 186 countries vied over the four years of the competition. The AT&T Labs team organized around Yehuda Koren, Christopher T. Volinsky, and Robert M. Bell won the prize in September 2009. Interestingly, although no contestants questioned Bayes as a legitimate method, almost none wrote a formal Bayesian model.


pages: 288 words: 86,995

Rule of the Robots: How Artificial Intelligence Will Transform Everything by Martin Ford

AI winter, Airbnb, algorithmic bias, algorithmic trading, Alignment Problem, AlphaGo, Amazon Mechanical Turk, Amazon Web Services, artificial general intelligence, Automated Insights, autonomous vehicles, backpropagation, basic income, Big Tech, big-box store, call centre, carbon footprint, Chris Urmson, Claude Shannon: information theory, clean water, cloud computing, commoditize, computer age, computer vision, Computing Machinery and Intelligence, coronavirus, correlation does not imply causation, COVID-19, crowdsourcing, data is the new oil, data science, deep learning, deepfake, DeepMind, Demis Hassabis, deskilling, disruptive innovation, Donald Trump, Elon Musk, factory automation, fake news, fulfillment center, full employment, future of work, general purpose technology, Geoffrey Hinton, George Floyd, gig economy, Gini coefficient, global pandemic, Googley, GPT-3, high-speed rail, hype cycle, ImageNet competition, income inequality, independent contractor, industrial robot, informal economy, information retrieval, Intergovernmental Panel on Climate Change (IPCC), Internet of things, Jeff Bezos, job automation, John Markoff, Kiva Systems, knowledge worker, labor-force participation, Law of Accelerating Returns, license plate recognition, low interest rates, low-wage service sector, Lyft, machine readable, machine translation, Mark Zuckerberg, Mitch Kapor, natural language processing, Nick Bostrom, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, Ocado, OpenAI, opioid epidemic / opioid crisis, passive income, pattern recognition, Peter Thiel, Phillips curve, post scarcity, public intellectual, Ray Kurzweil, recommendation engine, remote working, RFID, ride hailing / ride sharing, Robert Gordon, Rodney Brooks, Rubik’s Cube, Sam Altman, self-driving car, Silicon Valley, Silicon Valley startup, social distancing, SoftBank, South of Market, San Francisco, special economic zone, speech recognition, stealth mode startup, Stephen 
Hawking, superintelligent machines, TED Talk, The Future of Employment, The Rise and Fall of American Growth, the scientific method, Turing machine, Turing test, Tyler Cowen, Tyler Cowen: Great Stagnation, Uber and Lyft, uber lyft, universal basic income, very high income, warehouse automation, warehouse robotics, Watson beat the top human players on Jeopardy!, WikiLeaks, women in the workforce, Y Combinator

As always, competition between the cloud providers is a powerful driver of innovation, and Amazon’s deep learning tools for the AWS platform are likewise becoming easier to use. Along with the development tools, all the cloud services offer pre-built deep learning components that are ready to be used out of the box and incorporated into applications. Amazon, for example, offers packages for speech recognition and natural language processing and a “recommendation engine” that can make suggestions in the same way that online shoppers or movie watchers are shown alternatives that are likely to be of interest. The most controversial example of this kind of prepackaged capability is AWS’s Rekognition service, which makes it easy for developers to deploy facial recognition technology.


pages: 619 words: 177,548

Power and Progress: Our Thousand-Year Struggle Over Technology and Prosperity by Daron Acemoglu, Simon Johnson

"Friedman doctrine" OR "shareholder theory", "World Economic Forum" Davos, 4chan, agricultural Revolution, AI winter, Airbnb, airline deregulation, algorithmic bias, algorithmic management, Alignment Problem, AlphaGo, An Inconvenient Truth, artificial general intelligence, augmented reality, basic income, Bellingcat, Bernie Sanders, Big Tech, Bletchley Park, blue-collar work, British Empire, carbon footprint, carbon tax, carried interest, centre right, Charles Babbage, ChatGPT, Clayton Christensen, clean water, cloud computing, collapse of Lehman Brothers, collective bargaining, computer age, Computer Lib, Computing Machinery and Intelligence, conceptual framework, contact tracing, Corn Laws, Cornelius Vanderbilt, coronavirus, corporate social responsibility, correlation does not imply causation, cotton gin, COVID-19, creative destruction, declining real wages, deep learning, DeepMind, deindustrialization, Demis Hassabis, Deng Xiaoping, deskilling, discovery of the americas, disinformation, Donald Trump, Douglas Engelbart, Douglas Engelbart, Edward Snowden, Elon Musk, en.wikipedia.org, energy transition, Erik Brynjolfsson, European colonialism, everywhere but in the productivity statistics, factory automation, facts on the ground, fake news, Filter Bubble, financial innovation, Ford Model T, Ford paid five dollars a day, fulfillment center, full employment, future of work, gender pay gap, general purpose technology, Geoffrey Hinton, global supply chain, Gordon Gekko, GPT-3, Grace Hopper, Hacker Ethic, Ida Tarbell, illegal immigration, income inequality, indoor plumbing, industrial robot, interchangeable parts, invisible hand, Isaac Newton, Jacques de Vaucanson, James Watt: steam engine, Jaron Lanier, Jeff Bezos, job automation, Johannes Kepler, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph-Marie Jacquard, Kenneth Arrow, Kevin Roose, Kickstarter, knowledge economy, labor-force 
participation, land reform, land tenure, Les Trente Glorieuses, low skilled workers, low-wage service sector, M-Pesa, manufacturing employment, Marc Andreessen, Mark Zuckerberg, megacity, mobile money, Mother of all demos, move fast and break things, natural language processing, Neolithic agricultural revolution, Norbert Wiener, NSO Group, offshore financial centre, OpenAI, PageRank, Panopticon Jeremy Bentham, paperclip maximiser, pattern recognition, Paul Graham, Peter Thiel, Productivity paradox, profit maximization, profit motive, QAnon, Ralph Nader, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, Robert Bork, Robert Gordon, Robert Solow, robotic process automation, Ronald Reagan, scientific management, Second Machine Age, self-driving car, seminal paper, shareholder value, Sheryl Sandberg, Shoshana Zuboff, Silicon Valley, social intelligence, Social Responsibility of Business Is to Increase Its Profits, social web, South Sea Bubble, speech recognition, spice trade, statistical model, stem cell, Steve Jobs, Steve Wozniak, strikebreaker, subscription business, Suez canal 1869, Suez crisis 1956, supply-chain management, surveillance capitalism, tacit knowledge, tech billionaire, technoutopianism, Ted Nelson, TED Talk, The Future of Employment, The Rise and Fall of American Growth, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, theory of mind, Thomas Malthus, too big to fail, total factor productivity, trade route, transatlantic slave trade, trickle-down economics, Turing machine, Turing test, Twitter Arab Spring, Two Sigma, Tyler Cowen, Tyler Cowen: Great Stagnation, union organizing, universal basic income, Unsafe at Any Speed, Upton Sinclair, upwardly mobile, W. E. B. Du Bois, War on Poverty, WikiLeaks, wikimedia commons, working poor, working-age population

The World Wide Web, for instance, has become as much a platform for digital advertisement and propagation of misinformation as a source of useful information for people. Recommendation systems are often used for steering customers to specific products, depending on the platform’s financial incentives. Digital tools can provide information to managers not just for better decision making but also for the better monitoring of workers. Some of the AI-powered recommendation systems have incorporated and reintensified existing biases—for example, toward race in hiring or toward race in the justice system. Platforms for ride sharing and delivery have imposed exploitative arrangements on workers lacking protection or job security.

AI is the name given to the branch of computer science that develops “intelligent” machines, meaning machines and algorithms (instructions for solving problems) capable of exhibiting high-level capabilities. Modern intelligent machines perform tasks that many would have thought impossible a couple of decades ago. Examples include face-recognition software, search engines that guess what you want to find, and recommendation systems that match you to the products that you are most likely to enjoy or, at the very least, purchase. Many systems now use some form of natural-language processing to interface between human speech or written enquiries and computers. Apple’s Siri and Google’s search engine are examples of AI-based systems that are used widely around the world every day.

As the naked-streets experiment emphasized, driving in busy cities requires a tremendous amount of situational intelligence to adapt to changing circumstances, and even more social intelligence to respond to cues from other drivers and pedestrians. General AI Illusion The apogee of the current AI approach inspired by Turing’s ideas is the quest for general, human-level intelligence. Despite tremendous advances such as GPT-3 and recommendation systems, the current approach to AI is unlikely to soon crack human intelligence or even achieve very high levels of productivity in many of the decision-making tasks humans engage in. Tasks that involve social and situational aspects of human cognition will continue to pose formidable challenges for machine intelligence.


pages: 1,535 words: 337,071

Networks, Crowds, and Markets: Reasoning About a Highly Connected World by David Easley, Jon Kleinberg

Albert Einstein, AltaVista, AOL-Time Warner, Apollo 13, classic study, clean water, conceptual framework, Daniel Kahneman / Amos Tversky, Douglas Hofstadter, Dutch auction, Erdős number, experimental subject, first-price auction, fudge factor, Garrett Hardin, George Akerlof, Gerard Salton, Gödel, Escher, Bach, incomplete markets, information asymmetry, information retrieval, John Nash: game theory, Kenneth Arrow, longitudinal study, market clearing, market microstructure, moral hazard, Nash equilibrium, Network effects, Pareto efficiency, Paul Erdős, planetary scale, power law, prediction markets, price anchoring, price mechanism, prisoner's dilemma, random walk, recommendation engine, Richard Thaler, Ronald Coase, sealed-bid auction, search engine result page, second-price auction, second-price sealed-bid, seminal paper, Simon Singh, slashdot, social contagion, social web, Steve Jobs, Steve Jurvetson, stochastic process, Ted Nelson, the long tail, The Market for Lemons, the strength of weak ties, The Wisdom of Crowds, trade route, Tragedy of the Commons, transaction costs, two and twenty, ultimatum game, Vannevar Bush, Vickrey auction, Vilfredo Pareto, Yogi Berra, zero-sum game

Ideas from the theory of voting have been adopted in a number of recent on-line applications [139]. Different Web search engines produce different rankings of results; a line of work on meta-search has developed tools for combining these rankings into a single aggregate ranking. Recommendation systems for books, music, and other items — such as Amazon’s product-recommendation system — have employed related ideas for aggregating preferences. In this case, a recommendation system determines a set of users whose past history indicates tastes similar to yours, and then uses voting methods to combine the preferences of these other users to produce a ranked list of recommendations (or a single best recommendation) for you.
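The two steps in this excerpt (find users with similar histories, then combine their preferences by voting) can be sketched as follows. The excerpt does not say which similarity measure or voting rule any real system uses, so this sketch assumes Jaccard overlap for similarity and a Borda count for the vote; the data and the names `jaccard`, `similar_users`, and `borda_recommend` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each user's ranked list of liked items, best first.
rankings = {
    "you":   ["A", "B"],
    "ann":   ["A", "C", "B", "D"],
    "bob":   ["B", "A", "D", "C"],
    "carol": ["E", "F", "C"],
}

def jaccard(a, b):
    """Overlap between two item collections, from 0 (disjoint) to 1 (identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def similar_users(target, rankings, k=2):
    """Return the k users whose histories overlap most with the target's."""
    others = [u for u in rankings if u != target]
    others.sort(key=lambda u: jaccard(rankings[target], rankings[u]), reverse=True)
    return others[:k]

def borda_recommend(target, rankings, k=2):
    """Combine the neighbours' rankings with a Borda count over unseen items."""
    seen = set(rankings[target])
    scores = defaultdict(int)
    for user in similar_users(target, rankings, k):
        ranked = rankings[user]
        for position, item in enumerate(ranked):
            if item not in seen:
                # First place in an n-item list earns n points, last place earns 1.
                scores[item] += len(ranked) - position
    # Sort by score, breaking ties alphabetically so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))

print(borda_recommend("you", rankings))  # → ['C', 'D']
```

Here `ann` and `bob` are chosen as the neighbours because each shares both of the target's items, while `carol` shares none. Item C collects 3 + 1 = 4 Borda points and D collects 1 + 2 = 3, so C is ranked first.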

We discussed such systems in the context of structural balance in Chapter 5, and will see their role in providing information essential to the functioning of on-line markets in Chapter 22. Web 2.0 sites also make use of recommendation systems, to guide users toward items that they may not know about. In addition to serving as helpful features for a site’s users, such recommendation systems interact in complex but important ways with distributions of popularity and the long tail of niche content, as we will see in Chapter 18. The development of the current generation of Web search engines, led by Google, is sometimes seen as a crucial step in the pivot from the early days of the Web to the era of Web 2.0.

10.6 Advanced Material: A Proof of the Matching Theorem
10.7 Exercises

11 Network Models of Markets with Intermediaries
11.1 Price-Setting in Markets
11.2 A Model of Trade on Networks
11.3 Equilibria in Trading Networks
11.4 Further Equilibrium Phenomena: Auctions and Ripple Effects
11.5 Social Welfare in Trading Networks
11.6 Trader Profits
11.7 Reflections on Trade with Intermediaries
11.8 Exercises

12 Bargaining and Power in Networks
12.1 Power in Social Networks
12.2 Experimental Studies of Power and Exchange
12.3 Results of Network Exchange Experiments
12.4 A Connection to Buyer-Seller Networks
12.5 Modeling Two-Person Interaction: The Nash Bargaining Solution
12.6 Modeling Two-Person Interaction: The Ultimatum Game
12.7 Modeling Network Exchange: Stable Outcomes
12.8 Modeling Network Exchange: Balanced Outcomes
12.9 Advanced Material: A Game-Theoretic Approach to Bargaining
12.10 Exercises

IV Information Networks and the World Wide Web

13 The Structure of the Web
13.1 The World Wide Web
13.2 Information Networks, Hypertext, and Associative Memory
13.3 The Web as a Directed Graph
13.4 The Bow-Tie Structure of the Web
13.5 The Emergence of Web 2.0
13.6 Exercises

14 Link Analysis and Web Search
14.1 Searching the Web: The Problem of Ranking
14.2 Link Analysis using Hubs and Authorities
14.3 PageRank
14.4 Applying Link Analysis in Modern Web Search
14.5 Applications beyond the Web
14.6 Advanced Material: Spectral Analysis, Random Walks, and Web Search
14.7 Exercises

15 Sponsored Search Markets
15.1 Advertising Tied to Search Behavior
15.2 Advertising as a Matching Market
15.3 Encouraging Truthful Bidding in Matching Markets: The VCG Principle
15.4 Analyzing the VCG Procedure: Truth-Telling as a Dominant Strategy
15.5 The Generalized Second Price Auction
15.6 Equilibria of the Generalized Second Price Auction
15.7 Ad Quality
15.8 Complex Queries and Interactions Among Keywords
15.9 Advanced Material: VCG Prices and the Market-Clearing Property
15.10 Exercises

V Network Dynamics: Population Models

16 Information Cascades
16.1 Following the Crowd
16.2 A Simple Herding Experiment
16.3 Bayes’ Rule: A Model of Decision-Making Under Uncertainty
16.4 Bayes’ Rule in the Herding Experiment
16.5 A Simple, General Cascade Model
16.6 Sequential Decision-Making and Cascades
16.7 Lessons from Cascades
16.8 Exercises

17 Network Effects
17.1 The Economy Without Network Effects
17.2 The Economy with Network Effects
17.3 Stability, Instability, and Tipping Points
17.4 A Dynamic View of the Market
17.5 Industries with Network Goods
17.6 Mixing Individual Effects with Population-Level Effects
17.7 Advanced Material: Negative Externalities and The El Farol Bar Problem
17.8 Exercises

18 Power Laws and Rich-Get-Richer Phenomena
18.1 Popularity as a Network Phenomenon
18.2 Power Laws
18.3 Rich-Get-Richer Models
18.4 The Unpredictability of Rich-Get-Richer Effects
18.5 The Long Tail
18.6 The Effect of Search Tools and Recommendation Systems
18.7 Advanced Material: Analysis of Rich-Get-Richer Processes
18.8 Exercises

VI Network Dynamics: Structural Models

19 Cascading Behavior in Networks
19.1 Diffusion in Networks
19.2 Modeling Diffusion through a Network
19.3 Cascades and Clusters
19.4 Diffusion, Thresholds, and the Role of Weak Ties
19.5 Extensions of the Basic Cascade Model
19.6 Knowledge, Thresholds, and Collective Action
19.7 Advanced Material: The Cascade Capacity
19.8 Exercises

20 The Small-World Phenomenon
20.1 Six Degrees of Separation
20.2 Structure and Randomness
20.3 Decentralized Search
20.4 Modeling the Process of Decentralized Search
20.5 Empirical Analysis and Generalized Models
20.6 Core-Periphery Structures and Difficulties in Decentralized Search
20.7 Advanced Material: Analysis of Decentralized Search
20.8 Exercises

21 Epidemics
21.1 Diseases and the Networks that Transmit Them
21.2 Branching Processes
21.3 The SIR Epidemic Model
21.4 The SIS Epidemic Model
21.5 Synchronization
21.6 Transient Contacts and the Dangers of Concurrency
21.7 Genealogy, Genetic Inheritance, and Mitochondrial Eve
21.8 Advanced Material: Analysis of Branching and Coalescent Processes
21.9 Exercises

VII Institutions and Aggregate Behavior

22 Markets and Information
22.1 Markets with Exogenous Events
22.2 Horse Races, Betting, and Beliefs
22.3 Aggregate Beliefs and the “Wisdom of Crowds”
22.4 Prediction Markets and Stock Markets
22.5 Markets with Endogenous Events
22.6 The Market for Lemons
22.7 Asymmetric Information in Other Markets
22.8 Signaling Quality
22.9 Quality Uncertainty On-Line: Reputation Systems and Other Mechanisms
22.10 Advanced Material: Wealth Dynamics in Markets
22.11 Exercises

23 Voting
23.1 Voting for Group Decision-Making
23.2 Individual Preferences
23.3 Voting Systems: Majority Rule
23.4 Voting Systems: Positional Voting
23.5 Arrow’s Impossibility Theorem
23.6 Single-Peaked Preferences and the Median Voter Theorem
23.7 Voting as a Form of Information Aggregation
23.8 Insincere Voting for Information Aggregation
23.9 Jury Decisions and the Unanimity Rule
23.10 Sequential Voting and the Relation to Information Cascades
23.11 Advanced Material: A Proof of Arrow’s Impossibility Theorem
23.12 Exercises

24 Property Rights
24.1 Externalities and the Coase Theorem
24.2 The Tragedy of the Commons
24.3 Intellectual Property
24.4 Exercises

Chapter 1 Overview

Over the past decade there has been a growing public fascination with the complex “connectedness” of modern society.


Beautiful Visualization by Julie Steele

barriers to entry, correlation does not imply causation, data acquisition, data science, database schema, Drosophila, en.wikipedia.org, epigenetics, global pandemic, Hans Rosling, index card, information retrieval, iterative process, linked data, Mercator projection, meta-analysis, natural language processing, Netflix Prize, no-fly zone, pattern recognition, peer-to-peer, performance metric, power law, QR code, recommendation engine, semantic web, social bookmarking, social distancing, social graph, sorting algorithm, Steve Jobs, the long tail, web application, wikimedia commons, Yochai Benkler

Preference Similarity
A well-known measure of similarity used in many recommendation systems is cosine similarity. A practical introduction to this technique can be found in Linden, Smith, and York (2003). In the case of movies, intuitively, the measure indicates that two movies are similar if users who rated one highly rated the other highly or, conversely, users who rated one poorly rated the other poorly. We’ll use this similarity measure to generate similarity data for all 17,700 movies in the Netflix Prize dataset, then generate coordinates based on that data. If we were interested in building an actual movie recommender system, we might do so simply by recommending the movies that were similar to those a user had rated highly.
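As a rough illustration of the measure described in this excerpt, the sketch below computes cosine similarity between movies from their rating vectors. The toy `ratings` data and the function name are assumptions made for illustration; they are not taken from the book or from the Netflix Prize dataset.

```python
import math

# Hypothetical toy data: ratings[movie][user] = stars (1 to 5).
ratings = {
    "Movie A": {"u1": 5, "u2": 4, "u3": 1},
    "Movie B": {"u1": 4, "u2": 5, "u3": 2},
    "Movie C": {"u1": 1, "u2": 2, "u3": 5},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse rating vectors (dicts keyed by user)."""
    shared = set(a) & set(b)  # only users who rated both movies contribute to the dot product
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a movie with no ratings is treated as similar to nothing
    return dot / (norm_a * norm_b)

sim_ab = cosine_similarity(ratings["Movie A"], ratings["Movie B"])
sim_ac = cosine_similarity(ratings["Movie A"], ratings["Movie C"])
```

Here Movie A comes out closer to Movie B than to Movie C, because the same users liked both. This version works on raw star values; subtracting each user's average rating first is a common variant when low ratings should count as agreement too.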

If we were interested in building an actual movie recommender system, we might do so simply by recommending the movies that were similar to those a user had rated highly. However, the goal here is just to gain insight into the dynamics of such a recommender system.
Labeling
The YELLOWPAGES.COM visualization was easier to label than this Netflix Prize visualization for a number of reasons, including fewer nodes and shorter labels, but mostly because the nodes were more uniformly distributed. Although the Netflix Prize visualization has a large number of clusters, most of the movies are contained in only a small number of those clusters. This disparity is even more apparent when we look at only the movies with the most ratings.


pages: 201 words: 21,180

Designing for the Social Web by Joshua Porter

barriers to entry, classic study, en.wikipedia.org, endowment effect, fail fast, Howard Rheingold, late fees, Marc Andreessen, Mark Zuckerberg, Milgram experiment, Paradox of Choice, Paul Buchheit, Ralph Waldo Emerson, recommendation engine, social bookmarking, social software, social web, Steve Jobs, the Cathedral and the Bazaar, web application, Yochai Benkler, zero-sum game

Del.icio.us simply counts the number of bookmarks that people have saved in the last x hours and orders them from most popular to least popular, displaying them as a “most popular” list of bookmarks that people have saved recently. • Participant ranking. The Digg Top Diggers page was a ranking system that took into account measures of desired behavior to come up with an overall rank for each Digger. • Collaborative filtering. Netflix’s recommendation system relies on collaborative filtering to display recommended movies based on your previous ratings. • Relevance. Services like Google rely on a complex algorithm to determine what to display. Figuring out which content is relevant is a big deal to Google—it’s the core value of the entire service.
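The "count saves in the last x hours" idea from the first item can be sketched in a few lines. This is only an illustration under assumed inputs: the `(url, saved_at)` record shape, the function name, and the example URLs are hypothetical, and nothing here reflects how Del.icio.us actually implemented its list.

```python
from collections import Counter
from datetime import datetime, timedelta

def most_popular(saves, now, window_hours):
    """Rank URLs by how many times they were saved within the last `window_hours` hours.

    `saves` is an iterable of (url, saved_at) pairs; older saves are ignored.
    """
    cutoff = now - timedelta(hours=window_hours)
    counts = Counter(url for url, saved_at in saves if saved_at >= cutoff)
    return counts.most_common()  # list of (url, count), most saved first

now = datetime(2024, 1, 1, 12, 0)
saves = [
    ("a.example", now - timedelta(hours=1)),
    ("b.example", now - timedelta(hours=2)),
    ("a.example", now - timedelta(hours=3)),
    ("c.example", now - timedelta(hours=30)),  # outside a 24-hour window
    ("c.example", now - timedelta(hours=40)),
    ("c.example", now - timedelta(hours=50)),
]
popular_today = most_popular(saves, now, window_hours=24)
```

With a 24-hour window, `c.example` drops out even though it has the most saves overall, which is the point of ranking by recency rather than by all-time totals.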

See also Netflix Movies For You screen, Netflix, 105–106 MSN Groups, 122 MSNBC.com, 157–158 MusicLab study, 137–139 MySpace, 13, 16, 18, 119 N nature vs. nurture debate, 8 navigation, non-linear, 171–172 Neeleman, David, 61, 62 negative feedback, 57–62, 139. See also feedback Netflix collaborative filtering of ratings on, 136 as example of complex adaptive system, 128, 129 as example of successful social object, 32 goals/activities/tasks for, 27 “How It Works” graphic, 73–74 Movies For You screen, 105–106 primary activity for, 26 recommendation system, 136 Netvibes, 92–93 network value, 24 networked world, designing for, viii New York Times most-shared articles screen, 160–161 sharing call to action, 149, 150–151, 152 Newmark, Craig, 51, 54 news feed blowup, Facebook, 116–118 news sites, 17, 133, 136 Newsvine, 153 Nielsen/NetRatings, 20 Nike+, 17 non-interactive entertainment, vii–viii non-linear navigation, 171–172 Norman, Dan, 25 notifications feature, 104 nytimes.com, 149.


pages: 660 words: 141,595

Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking by Foster Provost, Tom Fawcett

Albert Einstein, Amazon Mechanical Turk, Apollo 13, big data - Walmart - Pop Tarts, bioinformatics, business process, call centre, chief data officer, Claude Shannon: information theory, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, data acquisition, data science, David Brooks, en.wikipedia.org, Erik Brynjolfsson, Gini coefficient, Helicobacter pylori, independent contractor, information retrieval, intangible asset, iterative process, Johann Wolfgang von Goethe, Louis Pasteur, Menlo Park, Nate Silver, Netflix Prize, new economy, p-value, pattern recognition, placebo effect, price discrimination, recommendation engine, Ronald Coase, selection bias, Silicon Valley, Skype, SoftBank, speech recognition, Steve Jobs, supply-chain management, systems thinking, Teledyne, text mining, the long tail, The Signal and the Noise by Nate Silver, Thomas Bayes, transaction costs, WikiLeaks

Definitions of data scientists (and advertisements for positions) specify not just areas of expertise but also specific programming languages and tools. It is common to see job advertisements mentioning data mining techniques (e.g., random forests, support vector machines), specific application areas (recommendation systems, ad placement optimization), alongside popular software tools for processing big data (Hadoop, MongoDB). There is often little distinction between the science and the technology for dealing with large datasets. We must point out that data science, like computer science, is a young field.

For example, analyzing purchase records from a supermarket may uncover that ground meat is purchased together with hot sauce much more frequently than we might expect. Deciding how to act upon this discovery might require some creativity, but it could suggest a special promotion, product display, or combination offer. Co-occurrence of products in purchases is a common type of grouping known as market-basket analysis. Some recommendation systems also perform a type of affinity grouping by finding, for example, pairs of books that are purchased frequently by the same people (“people who bought X also bought Y”). The result of co-occurrence grouping is a description of items that occur together. These descriptions usually include statistics on the frequency of the co-occurrence and an estimate of how surprising it is.
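A minimal sketch of co-occurrence grouping follows, assuming each purchase is a list of item names. It reports how often each pair occurs together and a simple "surprise" estimate, the lift: the observed count divided by the count expected if the two items were bought independently. The function name and the toy baskets are illustrative assumptions, and lift is one common choice of statistic, not necessarily the one the authors have in mind.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_lift(baskets):
    """Return {(item_x, item_y): (count_together, lift)} for every pair seen together."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the order so pairs are canonical
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    results = {}
    for (x, y), together in pair_counts.items():
        expected = item_counts[x] * item_counts[y] / n  # expected count under independence
        results[(x, y)] = (together, together / expected)
    return results

baskets = [
    ["ground meat", "hot sauce"],
    ["ground meat", "hot sauce", "milk"],
    ["milk", "bread"],
    ["bread"],
]
pairs = cooccurrence_lift(baskets)
```

A lift well above 1 marks a pair that shows up together more often than chance would suggest; a lift near 1 means the pairing is unremarkable.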

If that’s what our algorithm is doing, we’re using the wrong algorithm. For regression problems we have a directly analogous baseline: predict the average value over the population (usually the mean or median). In some applications there are multiple simple averages that one may want to combine. For example, when evaluating recommender systems that internally predict how many “stars” a particular customer would give to a particular movie, we have the average number of stars a movie gets across the population (how well liked it is) and the average number of stars a particular customer gives to movies (what that customer’s overall bias is).
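One way to combine the two averages mentioned here is to start from the overall mean rating and add the movie's offset and the customer's offset from that mean. The sketch below assumes ratings arrive as `(customer, movie, stars)` tuples; the additive combination, the names, and the data are illustrative assumptions rather than the authors' stated method.

```python
from statistics import mean

def baseline_predictor(ratings):
    """Build a predict(customer, movie) function from (customer, movie, stars) tuples."""
    global_mean = mean(stars for _, _, stars in ratings)
    by_movie, by_customer = {}, {}
    for customer, movie, stars in ratings:
        by_movie.setdefault(movie, []).append(stars)
        by_customer.setdefault(customer, []).append(stars)
    movie_mean = {m: mean(v) for m, v in by_movie.items()}
    customer_mean = {c: mean(v) for c, v in by_customer.items()}

    def predict(customer, movie):
        # Unknown movies or customers fall back to the global mean (offset of zero).
        movie_offset = movie_mean.get(movie, global_mean) - global_mean
        customer_offset = customer_mean.get(customer, global_mean) - global_mean
        return global_mean + movie_offset + customer_offset

    return predict

predict = baseline_predictor([
    ("ann", "X", 5), ("ann", "Y", 3),
    ("bob", "X", 4), ("bob", "Y", 2),
])
```

A model that cannot beat this kind of baseline on held-out ratings is adding little beyond "well-liked movie, generous rater".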


pages: 389 words: 87,758

No Ordinary Disruption: The Four Global Forces Breaking All the Trends by Richard Dobbs, James Manyika

2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, access to a mobile phone, additive manufacturing, Airbnb, Amazon Mechanical Turk, American Society of Civil Engineers: Report Card, asset light, autonomous vehicles, Bakken shale, barriers to entry, business cycle, business intelligence, carbon tax, Carmen Reinhart, central bank independence, circular economy, cloud computing, corporate governance, creative destruction, crowdsourcing, data science, demographic dividend, deskilling, digital capitalism, disintermediation, disruptive innovation, distributed generation, driverless car, Erik Brynjolfsson, financial innovation, first square of the chessboard, first square of the chessboard / second half of the chessboard, Gini coefficient, global supply chain, global village, high-speed rail, hydraulic fracturing, illegal immigration, income inequality, index fund, industrial robot, intangible asset, Intergovernmental Panel on Climate Change (IPCC), Internet of things, inventory management, job automation, Just-in-time delivery, Kenneth Rogoff, Kickstarter, knowledge worker, labor-force participation, low interest rates, low skilled workers, Lyft, M-Pesa, machine readable, mass immigration, megacity, megaproject, mobile money, Mohammed Bouazizi, Network effects, new economy, New Urbanism, ocean acidification, oil shale / tar sands, oil shock, old age dependency ratio, openstreetmap, peer-to-peer lending, pension reform, pension time bomb, private sector deleveraging, purchasing power parity, quantitative easing, recommendation engine, Report Card for America’s Infrastructure, RFID, ride hailing / ride sharing, Salesforce, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Silicon Valley startup, Skype, smart cities, Snapchat, sovereign wealth fund, spinning jenny, stem cell, Steve Jobs, subscription business, supply-chain management, synthetic biology, TaskRabbit, The Great Moderation, trade route, 
transaction costs, Travis Kalanick, uber lyft, urban sprawl, Watson beat the top human players on Jeopardy!, working-age population, Zipcar

To illustrate the scale of the opportunity, consider this change: on July 31, 2013, the US Bureau of Economic Analysis released GDP figures that for the first time categorized research and development and software into a new category of “intellectual property products.” We estimate that digital capital is now the source of roughly one-third of total global GDP growth, with intangible assets (think of the value of Google’s search algorithm or Amazon’s recommendation engine) being the main driver. For businesses and governments alike, failing to navigate today’s technological tide will mean losing out on a huge economic opportunity as well as increasing vulnerability to potential disruptions. Digitization and technological advances can transform industries in the blink of an eye, as BlackBerry has learned.


pages: 319 words: 89,477

The Power of Pull: How Small Moves, Smartly Made, Can Set Big Things in Motion by John Hagel III, John Seely Brown

Albert Einstein, Andrew Keen, barriers to entry, Black Swan, business process, call centre, Clayton Christensen, clean tech, cloud computing, commoditize, corporate governance, creative destruction, disruptive innovation, Elon Musk, en.wikipedia.org, future of work, game design, George Gilder, intangible asset, Isaac Newton, job satisfaction, Joi Ito, knowledge economy, knowledge worker, loose coupling, Louis Pasteur, Malcom McLean invented shipping containers, Marc Benioff, Maui Hawaii, medical residency, Network effects, old-boy network, packet switching, pattern recognition, peer-to-peer, pre–internet, profit motive, recommendation engine, Ronald Coase, Salesforce, shareholder value, Silicon Valley, Skype, smart transportation, software as a service, supply-chain management, tacit knowledge, The Nature of the Firm, the new new thing, the strength of weak ties, too big to fail, trade liberalization, transaction costs, TSMC, Yochai Benkler

Blurring Creation and Use
Pull platforms tend to allow us to perform the following activities, with a blurring of the boundaries between creation and use: • Find. Pull platforms allow us to find not just raw materials, products, and services, but also people with relevant skills and experience. Some of the tools and services that pull platforms use to help participants find relevant resources include search, recommendation engines, directories, agents, and reputation services. • Connect. Again, pull platforms connect us not just to raw materials, products, and services, but also to people with relevant skills and experiences. Performance fabrics are particularly helpful in establishing appropriate connections. The mobile Internet is dramatically extending our ability to connect wherever we are


pages: 713 words: 93,944

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Eric Redmond, Jim R. Wilson

AGPL, Amazon Web Services, business logic, create, read, update, delete, data is the new oil, database schema, Debian, domain-specific language, en.wikipedia.org, fault tolerance, full text search, general-purpose programming language, Kickstarter, Large Hadron Collider, linked data, MVC pattern, natural language processing, node package manager, random walk, recommendation engine, Ruby on Rails, seminal paper, Skype, social graph, sparse data, web application

Unlike other database styles that group collections of like objects into common buckets, graph databases are more free-form—queries consist of following edges shared by two nodes or, in other words, traversing nodes. As more projects use them, graph databases are growing beyond the straightforward social examples to occupy more nuanced use cases, such as recommendation engines, access control lists, and geographic data. Good For: Graph databases seem to be tailor-made for networking applications. The prototypical example is a social network, where nodes represent users who have various kinds of relationships to each other. Modeling this kind of data using any of the other styles is often a tough fit, but a graph database would accept it with relish.


pages: 294 words: 96,661

The Fourth Age: Smart Robots, Conscious Computers, and the Future of Humanity by Byron Reese

"World Economic Forum" Davos, agricultural Revolution, AI winter, Apollo 11, artificial general intelligence, basic income, bread and circuses, Buckminster Fuller, business cycle, business process, Charles Babbage, Claude Shannon: information theory, clean water, cognitive bias, computer age, CRISPR, crowdsourcing, dark matter, DeepMind, Edward Jenner, Elon Musk, Eratosthenes, estate planning, financial independence, first square of the chessboard, first square of the chessboard / second half of the chessboard, flying shuttle, full employment, Hans Moravec, Hans Rosling, income inequality, invention of agriculture, invention of movable type, invention of the printing press, invention of writing, Isaac Newton, Islamic Golden Age, James Hargreaves, job automation, Johannes Kepler, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, lateral thinking, life extension, Louis Pasteur, low interest rates, low skilled workers, manufacturing employment, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Mary Lou Jepsen, Moravec's paradox, Nick Bostrom, On the Revolutions of the Heavenly Spheres, OpenAI, pattern recognition, profit motive, quantum entanglement, radical life extension, Ray Kurzweil, recommendation engine, Rodney Brooks, Sam Altman, self-driving car, seminal paper, Silicon Valley, Skype, spinning jenny, Stephen Hawking, Steve Wozniak, Steven Pinker, strong AI, technological singularity, TED Talk, telepresence, telepresence robot, The Future of Employment, the scientific method, Timothy McVeigh, Turing machine, Turing test, universal basic income, Von Neumann architecture, Wall-E, warehouse robotics, Watson beat the top human players on Jeopardy!, women in the workforce, working poor, Works Progress Administration, Y Combinator

Reducing it down to ones and zeros is obviously possible, but equally obviously difficult for a device that can only manipulate abstract symbols in memory. One wrinkle with these sorts of perception problems is that we don’t have the training data to teach the robots. Amazon has a huge database of “people who bought this also bought that” with which to train its recommendation engine. But we don’t have all the tactile data of a million adults holding a million babies in a thousand situations. We could certainly collect the data by making a version of those CGI suits that people wear when making movies. Using upgraded sensors in the hands and fingers, we could get a thousand parents to wear them for a year to begin to collect that data.


pages: 339 words: 88,732

The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson, Andrew McAfee

2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, access to a mobile phone, additive manufacturing, Airbnb, Alan Greenspan, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, American Society of Civil Engineers: Report Card, Any sufficiently advanced technology is indistinguishable from magic, autonomous vehicles, barriers to entry, basic income, Baxter: Rethink Robotics, Boston Dynamics, British Empire, business cycle, business intelligence, business process, call centre, carbon tax, Charles Lindbergh, Chuck Templeton: OpenTable:, clean water, combinatorial explosion, computer age, computer vision, congestion charging, congestion pricing, corporate governance, cotton gin, creative destruction, crowdsourcing, data science, David Ricardo: comparative advantage, digital map, driverless car, employer provided health coverage, en.wikipedia.org, Erik Brynjolfsson, factory automation, Fairchild Semiconductor, falling living standards, Filter Bubble, first square of the chessboard / second half of the chessboard, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, full employment, G4S, game design, general purpose technology, global village, GPS: selective availability, Hans Moravec, happiness index / gross national happiness, illegal immigration, immigration reform, income inequality, income per capita, indoor plumbing, industrial robot, informal economy, intangible asset, inventory management, James Watt: steam engine, Jeff Bezos, Jevons paradox, jimmy wales, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, Khan Academy, Kiva Systems, knowledge worker, Kodak vs Instagram, law of one price, low skilled workers, Lyft, Mahatma Gandhi, manufacturing employment, Marc Andreessen, Mark Zuckerberg, Mars Rover, mass immigration, means of production, Narrative 
Science, Nate Silver, natural language processing, Network effects, new economy, New Urbanism, Nicholas Carr, Occupy movement, oil shale / tar sands, oil shock, One Laptop per Child (OLPC), pattern recognition, Paul Samuelson, payday loans, post-work, power law, price stability, Productivity paradox, profit maximization, Ralph Nader, Ray Kurzweil, recommendation engine, Report Card for America’s Infrastructure, Robert Gordon, Robert Solow, Rodney Brooks, Ronald Reagan, search costs, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Simon Kuznets, six sigma, Skype, software patent, sovereign wealth fund, speech recognition, statistical model, Steve Jobs, Steven Pinker, Stuxnet, supply-chain management, TaskRabbit, technological singularity, telepresence, The Bell Curve by Richard Herrnstein and Charles Murray, the Cathedral and the Bazaar, the long tail, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, total factor productivity, transaction costs, Tyler Cowen, Tyler Cowen: Great Stagnation, Vernor Vinge, warehouse robotics, Watson beat the top human players on Jeopardy!, winner-take-all economy, Y2K

When there are many small local markets, there can be a ‘best’ provider in each, and these local heroes frequently can all earn a good income. If these markets merge into a single global market, top performers have an opportunity to win more customers, while the next-best performers face harsher competition from all directions. A similar dynamic comes into play when technologies like Google or even Amazon’s recommendation engine reduce search costs. Suddenly second-rate producers can no longer count on consumer ignorance or geographic barriers to protect their margins. Digital technologies have aided the transition to winner-take-all markets, even for products we wouldn’t think would have superstar status. In a traditional camera store, cameras typically are not ranked number one versus number ten.


System Error by Rob Reich

"Friedman doctrine" OR "shareholder theory", "World Economic Forum" Davos, 2021 United States Capitol attack, A Declaration of the Independence of Cyberspace, Aaron Swartz, AI winter, Airbnb, airport security, Alan Greenspan, Albert Einstein, algorithmic bias, AlphaGo, AltaVista, artificial general intelligence, Automated Insights, autonomous vehicles, basic income, Ben Horowitz, Berlin Wall, Bernie Madoff, Big Tech, bitcoin, Blitzscaling, Cambridge Analytica, Cass Sunstein, clean water, cloud computing, computer vision, contact tracing, contact tracing app, coronavirus, corporate governance, COVID-19, creative destruction, CRISPR, crowdsourcing, data is the new oil, data science, decentralized internet, deep learning, deepfake, DeepMind, deplatforming, digital rights, disinformation, disruptive innovation, Donald Knuth, Donald Trump, driverless car, dual-use technology, Edward Snowden, Elon Musk, en.wikipedia.org, end-to-end encryption, Fairchild Semiconductor, fake news, Fall of the Berlin Wall, Filter Bubble, financial engineering, financial innovation, fulfillment center, future of work, gentrification, Geoffrey Hinton, George Floyd, gig economy, Goodhart's law, GPT-3, Hacker News, hockey-stick growth, income inequality, independent contractor, informal economy, information security, Jaron Lanier, Jeff Bezos, Jim Simons, jimmy wales, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John Perry Barlow, Lean Startup, linear programming, Lyft, Marc Andreessen, Mark Zuckerberg, meta-analysis, minimum wage unemployment, Monkeys Reject Unequal Pay, move fast and break things, Myron Scholes, Network effects, Nick Bostrom, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, NP-complete, Oculus Rift, OpenAI, Panopticon Jeremy Bentham, Parler "social media", pattern recognition, personalized medicine, Peter Thiel, Philippa Foot, premature optimization, profit 
motive, quantitative hedge fund, race to the bottom, randomized controlled trial, recommendation engine, Renaissance Technologies, Richard Thaler, ride hailing / ride sharing, Ronald Reagan, Sam Altman, Sand Hill Road, scientific management, self-driving car, shareholder value, Sheryl Sandberg, Shoshana Zuboff, side project, Silicon Valley, Snapchat, social distancing, Social Responsibility of Business Is to Increase Its Profits, software is eating the world, spectrum auction, speech recognition, stem cell, Steve Jobs, Steven Levy, strong AI, superintelligent machines, surveillance capitalism, Susan Wojcicki, tech billionaire, tech worker, techlash, technoutopianism, Telecommunications Act of 1996, telemarketer, The Future of Employment, TikTok, Tim Cook: Apple, traveling salesman, Triangle Shirtwaist Factory, trolley problem, Turing test, two-sided market, Uber and Lyft, uber lyft, ultimatum game, union organizing, universal basic income, washing machines reduced drudgery, Watson beat the top human players on Jeopardy!, When a measure becomes a target, winner-take-all economy, Y Combinator, you are the product

Facebook’s business model is to increase the time we spend on its platform and then sell access to our personalized profiles to advertisers and political operatives who seek to manipulate our behavior and dump the by-product of that manipulation onto our personal lives and democratic institutions. YouTube’s recommendation systems and default autoplay setting keep users watching videos on its platform while pushing people into echo chambers and feeding them more extreme content, thereby undermining our democracies, which rely on facts and trust. And Uber’s and Waymo’s push for automated vehicles may increase productivity but leave displaced and unemployed workers at the mercy of the government’s feeble social safety net.

It requires us to be explicit about the values we want to promote and how we trade off among them, because those values are encoded in some way into the objective functions that are optimized. Technology is also an amplifier because it can often enable the execution of a particular policy to reach a goal far more efficiently than a human can. It can power an autonomous vehicle to drive more safely than your neighbor does or be the basis of a recommendation system that keeps you watching online videos far longer than you intended. Even well-meaning policies can easily become objectionable when technology enables their hyper-efficient automation. With current GPS and mapping technology it would be possible to produce vehicles that would automatically issue a speeding ticket every time the driver exceeded the speed limit—and would eventually stop the car from moving and issue a warrant for the driver’s arrest when he or she had accumulated enough speeding tickets.

We admit that it’s a strange time to be mounting a defense of democracy and civic empowerment as the antidote to big tech’s current predicaments. The public’s faith in our governing institutions is at historic lows. Yet we must also remember that the distrust in democracy is partly a product of the rise of technologists. The recommendation systems and algorithmic curation of the private platforms that constitute the infrastructure of our digital public sphere have contributed to polarization and supercharged the spread of misinformation. And the tech industry has contributed to a winner-take-all economy, which has in turn widened wealth and income inequality, phenomena that social scientists have repeatedly demonstrated undermine confidence in democratic institutions.


pages: 326 words: 103,170

The Seventh Sense: Power, Fortune, and Survival in the Age of Networks by Joshua Cooper Ramo

air gap, Airbnb, Alan Greenspan, Albert Einstein, algorithmic trading, barriers to entry, Berlin Wall, bitcoin, Bletchley Park, British Empire, cloud computing, Computing Machinery and Intelligence, crowdsourcing, Danny Hillis, data science, deep learning, defense in depth, Deng Xiaoping, drone strike, Edward Snowden, Fairchild Semiconductor, Fall of the Berlin Wall, financial engineering, Firefox, Google Chrome, growth hacking, Herman Kahn, income inequality, information security, Isaac Newton, Jeff Bezos, job automation, Joi Ito, Laura Poitras, machine translation, market bubble, Menlo Park, Metcalfe’s law, Mitch Kapor, Morris worm, natural language processing, Neal Stephenson, Network effects, Nick Bostrom, Norbert Wiener, Oculus Rift, off-the-grid, packet switching, paperclip maximiser, Paul Graham, power law, price stability, quantitative easing, RAND corporation, reality distortion field, Recombinant DNA, recommendation engine, Republic of Letters, Richard Feynman, road to serfdom, Robert Metcalfe, Sand Hill Road, secular stagnation, self-driving car, Silicon Valley, Skype, Snapchat, Snow Crash, social web, sovereign wealth fund, Steve Jobs, Steve Wozniak, Stewart Brand, Stuxnet, superintelligent machines, systems thinking, technological singularity, The Coming Technological Singularity, The Wealth of Nations by Adam Smith, too big to fail, Vernor Vinge, zero day

And, well, you had liked that film. This seemed magic, just the sort of data-meets-human question that showcased a machine learning and thinking. An honestly artificial intelligence. Maes hoped to design a computer that could predict what movies or music or books you or I might enjoy. (And, of course, buy.) A recommendation engine. We all know how sputtering our own suggestion motors can be. Think of that primitive analog exchange known as the First Date: Oh, you like Radiohead? Do you know Sigur Rós? Pause. Hate them. Can you really predict what albums or novels even your closest friend will enjoy? You might offer an occasional lucky suggestion.


pages: 323 words: 95,939

Present Shock: When Everything Happens Now by Douglas Rushkoff

"Hurricane Katrina" Superdome, algorithmic trading, Alvin Toffler, Andrew Keen, bank run, behavioural economics, Benoit Mandelbrot, big-box store, Black Swan, British Empire, Buckminster Fuller, business cycle, cashless society, citizen journalism, clockwork universe, cognitive dissonance, Credit Default Swap, crowdsourcing, Danny Hillis, disintermediation, Donald Trump, double helix, East Village, Elliott wave, European colonialism, Extropian, facts on the ground, Flash crash, Future Shock, game design, global pandemic, global supply chain, global village, Howard Rheingold, hypertext link, Inbox Zero, invention of agriculture, invention of hypertext, invisible hand, iterative process, James Bridle, John Nash: game theory, Kevin Kelly, laissez-faire capitalism, lateral thinking, Law of Accelerating Returns, Lewis Mumford, loss aversion, mandelbrot fractal, Marshall McLuhan, Merlin Mann, messenger bag, Milgram experiment, mirror neurons, mutually assured destruction, negative equity, Network effects, New Urbanism, Nicholas Carr, Norbert Wiener, Occupy movement, off-the-grid, passive investing, pattern recognition, peak oil, Peter Pan Syndrome, price mechanism, prisoner's dilemma, Ralph Nelson Elliott, RAND corporation, Ray Kurzweil, recommendation engine, scientific management, selective serotonin reuptake inhibitor (SSRI), Silicon Valley, SimCity, Skype, social graph, South Sea Bubble, Steve Jobs, Steve Wozniak, Steven Pinker, Stewart Brand, supply-chain management, technological determinism, the medium is the message, The Wisdom of Crowds, theory of mind, Tragedy of the Commons, Turing test, upwardly mobile, Whole Earth Catalog, WikiLeaks, Y2K, zero-sum game

That adds up to millions of unskilled, untrained, unpaid, unknown ‘journalists’—a thousandfold growth between 1996 and 2006—spewing their (mis)information out in the cyberworld.” More sanguine voices, such as City University of New York journalism professor and BuzzMachine blogger Jeff Jarvis, argue that the market—amplified by search results and recommendation engines—will eventually allow the better journalism to rise to the top of the pile. But even market mechanisms may have a hard time functioning as we consumers of all this media lose our ability to distinguish between facts, informed opinions, and wild assertions. Our impatient disgust with politics as usual combined with our newfound faith in our own gut sensibilities drives us to take matters into our own hands—in journalism and beyond.


Mindf*ck: Cambridge Analytica and the Plot to Break America by Christopher Wylie

4chan, affirmative action, Affordable Care Act / Obamacare, air gap, availability heuristic, Berlin Wall, Bernie Sanders, Big Tech, big-box store, Boris Johnson, Brexit referendum, British Empire, call centre, Cambridge Analytica, Chelsea Manning, chief data officer, cognitive bias, cognitive dissonance, colonial rule, computer vision, conceptual framework, cryptocurrency, Daniel Kahneman / Amos Tversky, dark pattern, dark triade / dark tetrad, data science, deep learning, desegregation, disinformation, Dominic Cummings, Donald Trump, Downton Abbey, Edward Snowden, Elon Musk, emotional labour, Etonian, fake news, first-past-the-post, gamification, gentleman farmer, Google Earth, growth hacking, housing crisis, income inequality, indoor plumbing, information asymmetry, Internet of things, Julian Assange, Lyft, Marc Andreessen, Mark Zuckerberg, Menlo Park, move fast and break things, Network effects, new economy, obamacare, Peter Thiel, Potemkin village, recommendation engine, Renaissance Technologies, Robert Mercer, Ronald Reagan, Rosa Parks, Sand Hill Road, Scientific racism, Shoshana Zuboff, side project, Silicon Valley, Skype, Stephen Fry, Steve Bannon, surveillance capitalism, tech bro, uber lyft, unpaid internship, Valery Gerasimov, web application, WikiLeaks, zero-sum game

Cambridge Analytica did this because of a specific feature of Facebook’s algorithm at the time. When someone follows pages of generic brands like Walmart or some prime-time sitcom, nothing much changes in his newsfeed. But liking an extreme group, such as the Proud Boys or the Incel Liberation Army, marks the user as distinct from others in such a way that a recommendation engine will prioritize these topics for personalization. Which means the site’s algorithm will start to funnel the user similar stories and pages—all to increase engagement. For Facebook, rising engagement is the only metric that matters, as more engagement means more screen time to be exposed to advertisements.


pages: 332 words: 100,245

Mine!: How the Hidden Rules of Ownership Control Our Lives by Michael A. Heller, James Salzman

23andMe, Airbnb, behavioural economics, Berlin Wall, Big Tech, British Empire, Cass Sunstein, clean water, collaborative consumption, Cornelius Vanderbilt, coronavirus, COVID-19, CRISPR, crowdsourcing, Donald Trump, Downton Abbey, Elon Musk, endowment effect, estate planning, facts on the ground, Fall of the Berlin Wall, Firefox, Garrett Hardin, gig economy, Hernando de Soto, Internet of things, land tenure, Mason jar, Neil Armstrong, new economy, North Sea oil, offshore financial centre, oil rush, planetary scale, race to the bottom, recommendation engine, rent control, Richard Thaler, Ronald Coase, sharing economy, Shoshana Zuboff, Silicon Valley, Silicon Valley startup, social distancing, South China Sea, sovereign wealth fund, stem cell, surveillance capitalism, TaskRabbit, The future is already here, Tim Cook: Apple, Tony Fadell, Tragedy of the Commons, you are the product, Zipcar

We quickly learn to tune out unpleasant ownership details—in part because the digital economy brings so much immediate gratification. There’s a reason streaming services are replacing home bookshelves. While some may be nostalgic for their wall of treasured CDs, many prefer the vast library and song-recommendation engine available with a click on Spotify—both old favorites and new discoveries. We also benefit as consumers because licensing the stick can be cheaper than owning the bundle. Companies can maximize revenue by offering us just what we want right that minute. We may feel we own more, but we really don’t.


Forward: Notes on the Future of Our Democracy by Andrew Yang

2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, Affordable Care Act / Obamacare, Amazon Web Services, American Society of Civil Engineers: Report Card, basic income, benefit corporation, Bernie Sanders, blockchain, blue-collar work, call centre, centre right, clean water, contact tracing, coronavirus, correlation does not imply causation, COVID-19, data is the new oil, data science, deepfake, disinformation, Donald Trump, facts on the ground, fake news, forensic accounting, future of work, George Floyd, gig economy, global pandemic, income inequality, independent contractor, Jaron Lanier, Jeff Bezos, job automation, Kevin Roose, labor-force participation, Marc Benioff, Mark Zuckerberg, medical bankruptcy, new economy, obamacare, opioid epidemic / opioid crisis, pez dispenser, QAnon, recommendation engine, risk tolerance, rolodex, Ronald Reagan, Rutger Bregman, Sam Altman, Saturday Night Live, shareholder value, Shoshana Zuboff, Silicon Valley, Simon Kuznets, single-payer health, Snapchat, social distancing, SoftBank, surveillance capitalism, systematic bias, tech billionaire, TED Talk, The Day the Music Died, the long tail, TikTok, universal basic income, winner-take-all economy, working poor

On my social media platforms, the algorithms that determine which content I see are constantly suggesting social media posts to amplify; many of them express sentiments of outrage and hostility toward someone or something. I ignore most of them. Due to the insidious nature of these platforms’ recommendation engines, however, that’s hard to do. You might be watching something relatively benign on YouTube—for example, a news documentary about the 9/11 attacks. In the list of suggested links next to the video you’re watching, however, there is often something far more inflammatory, such as a video espousing conspiracy theories.


pages: 265 words: 69,310

What's Yours Is Mine: Against the Sharing Economy by Tom Slee

4chan, Airbnb, Amazon Mechanical Turk, asset-backed security, barriers to entry, Benchmark Capital, benefit corporation, Berlin Wall, big-box store, bike sharing, bitcoin, blockchain, Californian Ideology, citizen journalism, collaborative consumption, commons-based peer production, congestion charging, Credit Default Swap, crowdsourcing, data acquisition, data science, David Brooks, democratizing finance, do well by doing good, don't be evil, Dr. Strangelove, emotional labour, Evgeny Morozov, gentrification, gig economy, Hacker Ethic, impact investing, income inequality, independent contractor, informal economy, invisible hand, Jacob Appelbaum, Jane Jacobs, Jeff Bezos, John Zimmer (Lyft cofounder), Kevin Roose, Khan Academy, Kibera, Kickstarter, license plate recognition, Lyft, machine readable, Marc Andreessen, Mark Zuckerberg, Max Levchin, move fast and break things, natural language processing, Netflix Prize, Network effects, new economy, Occupy movement, openstreetmap, Paul Graham, peer-to-peer, peer-to-peer lending, Peter Thiel, pre–internet, principal–agent problem, profit motive, race to the bottom, Ray Kurzweil, recommendation engine, rent control, ride hailing / ride sharing, sharing economy, Silicon Valley, Snapchat, software is eating the world, South of Market, San Francisco, TaskRabbit, TED Talk, the Cathedral and the Bazaar, the long tail, The Nature of the Firm, Thomas L Friedman, transportation-network company, Travis Kalanick, Tyler Cowen, Uber and Lyft, Uber for X, uber lyft, ultimatum game, urban planning, WeWork, WikiLeaks, winner-take-all economy, Y Combinator, Yochai Benkler, Zipcar

People in the Airbnb economy don’t have the option of trusting each other on the basis of institutional affiliations, so they do it on the basis of online signaling and peer evaluations.” 3 Sharing Economy companies are not the first to use ratings and algorithms to guide behavior. Their trust systems build on the rating and recommendation systems used by Amazon, Netflix, eBay, Yelp, TripAdvisor, iTunes, the App Store and many others. Each takes individual ratings as their input and transforms them into some form of recommendation. As rating systems have become ubiquitous their usefulness has become a matter of faith in the world of software development.

For Anderson, Amazon represents the return of variety and diversity after decades of homogenous blockbusters: “We are turning from a mass market back into a niche nation, defined not by geography but by interests.” 19 In a Long Tail world there is no need for formal gatekeepers who select or restrict the works that can find their public; instead, Web 2.0 platforms will do it for us using crowdsourced consumer reviews and recommender systems: “By combining infinite shelf space with real-time information about buying trends and public opinion . . . unlimited selection is revealing truths about what consumers want and how they want to get it.” 20 Amazon and Airbnb are similar in many ways. Both are, at least in part, software companies whose inventory is simply a set of entries in a database, accessed via a web site.


Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data by Dipanjan Sarkar

bioinformatics, business intelligence, business logic, computer vision, continuous integration, data science, deep learning, Dr. Strangelove, en.wikipedia.org, functional programming, general-purpose programming language, Guido van Rossum, information retrieval, Internet of things, invention of the printing press, iterative process, language acquisition, machine readable, machine translation, natural language processing, out of africa, performance metric, premature optimization, recommendation engine, self-driving car, semantic web, sentiment analysis, speech recognition, statistical model, text mining, Turing test, web application

Keyphrase extraction, also known as terminology extraction, is defined as the process or technique of extracting important and relevant terms or phrases from a body of unstructured text such that the core topics or themes of the text document(s) are captured in these key phrases. This technique falls under the broad umbrella of information retrieval and extraction. Keyphrase extraction finds its uses in many areas, including the following: semantic web; query-based search engines and crawlers; recommendation systems; tagging systems; document similarity; translation. Keyphrase extraction is often the starting point for carrying out more complex tasks in text analytics or NLP, and the output from this can itself act as features for more complex systems. There are various approaches for keyphrase extraction.
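To make the idea concrete, here is a minimal sketch of one naive approach: count the stopword-free word runs in a text and keep the most frequent ones. It is not taken from the book. The function name `extract_keyphrases`, the tiny `STOPWORDS` set, and the ranking rule are all assumptions chosen for illustration, and the sketch ignores sentence boundaries.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use far larger lists).
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "on",
             "for", "to", "is", "are", "with", "by"}

def extract_keyphrases(text, max_len=2, top_n=3):
    """Return the top_n most frequent stopword-free phrases of up to max_len words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Split the token stream into runs of consecutive non-stopwords.
    runs, current = [], []
    for tok in tokens:
        if tok in STOPWORDS:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        runs.append(current)
    # Count every 1..max_len word phrase inside each run.
    counts = Counter()
    for run in runs:
        for n in range(1, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[" ".join(run[i:i + n])] += 1
    # Rank by frequency, then prefer longer phrases, then alphabetical order.
    ranked = sorted(counts, key=lambda p: (-counts[p], -len(p.split()), p))
    return ranked[:top_n]

sample = ("Recommendation systems and tagging systems use keyphrase extraction. "
          "Keyphrase extraction is the starting point for recommendation systems.")
print(extract_keyphrases(sample))
```

Pure frequency counting is the crudest possible scoring; it is used here only because it needs nothing beyond the standard library.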

Web sites and pages contain further links embedded in them, which link to more pages with more links, and this continues across the Internet. This can be represented as a graph-based model where vertices indicate the web pages, and edges indicate links among them. This can be used to form a voting or recommendation system such that when one vertex links to another one in the graph, it is basically casting a vote. Vertex importance is decided not only by the number of votes or edges a vertex receives but also by the importance of the vertices that link to it. This helps in determining the score or rank for each vertex or page.
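The voting scheme described above can be sketched as a PageRank-style power iteration. The excerpt does not name the algorithm, so treating it as PageRank is an assumption, as are the function name `rank_pages`, the damping factor of 0.85, and the fixed iteration count.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by link votes; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of votes.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to,
                # so a vote from an important page is worth more.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = rank_pages(links)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph, page `c` collects three votes and ranks first, while `a` outranks `b` and `d` because its single vote comes from the highly ranked `c`.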

This should be enough for you to get started with analyzing document similarity and clustering, and you can even start combining various techniques from the chapters covered so far. (Hint: Topic models with clustering, building classifiers by combining supervised and unsupervised learning, and augmenting recommendation systems using document clusters—just to name a few!)
7. Semantic and Sentiment Analysis
Natural language understanding has gained significant importance in the last decade with the advent of machine learning (ML) and further advances like deep learning and artificial intelligence.


Hands-On Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurelien Geron

AlphaGo, Amazon Mechanical Turk, Bayesian statistics, centre right, combinatorial explosion, constrained optimization, correlation coefficient, crowdsourcing, data science, deep learning, DeepMind, duck typing, en.wikipedia.org, Geoffrey Hinton, iterative process, Netflix Prize, NP-complete, optical character recognition, P = NP, p-value, pattern recognition, performance metric, recommendation engine, self-driving car, SpamAssassin, speech recognition, statistical model

In Chapter 8, we looked at the most common unsupervised learning task: dimensionality reduction. In this chapter, we will look at a few more unsupervised learning tasks and algorithms: Clustering: the goal is to group similar instances together into clusters. This is a great tool for data analysis, customer segmentation, recommender systems, search engines, image segmentation, semi-supervised learning, dimensionality reduction, and more. Anomaly detection: the objective is to learn what “normal” data looks like, and use this to detect abnormal instances, such as defective items on a production line or a new trend in a time series.

[Figure: classification (left) versus clustering (right)]
Clustering is used in a wide variety of applications, including:
For customer segmentation: you can cluster your customers based on their purchases, their activity on your website, and so on. This is useful to understand who your customers are and what they need, so you can adapt your products and marketing campaigns to each segment. For example, this can be useful in recommender systems to suggest content that other users in the same cluster enjoyed.
For data analysis: when analyzing a new dataset, it is often useful to first discover clusters of similar instances, as it is often easier to analyze clusters separately.
As a dimensionality reduction technique: once a dataset has been clustered, it is usually possible to measure each instance’s affinity with each cluster (affinity is any measure of how well an instance fits into a cluster).
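The recommender use case mentioned above, suggesting content that other users in the same cluster enjoyed, can be sketched in plain Python. This is a toy illustration under stated assumptions, not the book's code (which relies on Scikit-Learn): the helper names `kmeans` and `recommend`, the two viewing-hours features, the hand-picked starting centroids, and all user data are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on lists of floats; returns one cluster label per point."""
    centroids = [list(c) for c in centroids]  # copy so the caller's list is untouched
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def recommend(user, users, labels, liked):
    """Suggest items liked by the user's cluster-mates that the user has not liked."""
    own_label = labels[users.index(user)]
    suggestions = set()
    for other, lab in zip(users, labels):
        if other != user and lab == own_label:
            suggestions |= liked[other] - liked[user]
    return sorted(suggestions)

# Hypothetical data: weekly hours of (comedy, documentary) viewing per user.
users = ["ann", "bob", "cy", "dee"]
hours = [[9.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 8.0]]
liked = {
    "ann": {"sitcom A"},
    "bob": {"sitcom A", "sitcom B"},
    "cy": {"doc X"},
    "dee": {"doc X", "doc Y"},
}
labels = kmeans(hours, centroids=[hours[0], hours[2]])
print(recommend("ann", users, labels, liked))  # ['sitcom B']
```

Starting from fixed centroids keeps the example deterministic; a real system would typically use several random initializations and many more features.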


pages: 416 words: 108,370

Hit Makers: The Science of Popularity in an Age of Distraction by Derek Thompson

Airbnb, Albert Einstein, Alexey Pajitnov wrote Tetris, always be closing, augmented reality, Clayton Christensen, data science, Donald Trump, Downton Abbey, Ford Model T, full employment, game design, Golden age of television, Gordon Gekko, hindsight bias, hype cycle, indoor plumbing, industrial cluster, information trail, invention of the printing press, invention of the telegraph, Jeff Bezos, John Snow's cholera map, Kevin Roose, Kodak vs Instagram, linear programming, lock screen, Lyft, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Mary Meeker, Menlo Park, Metcalfe’s law, Minecraft, Nate Silver, Network effects, Nicholas Carr, out of africa, planned obsolescence, power law, prosperity theology / prosperity gospel / gospel of success, randomized controlled trial, recommendation engine, Robert Gordon, Ronald Reagan, Savings and loan crisis, Silicon Valley, Skype, Snapchat, social contagion, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, Steven Pinker, subscription business, TED Talk, telemarketer, the medium is the message, The Rise and Fall of American Growth, Tyler Cowen, Uber and Lyft, Uber for X, uber lyft, Vilfredo Pareto, Vincenzo Peruggia: Mona Lisa, women in the workforce

I recently visited Spotify, the large online streaming music company, to talk to Matt Ogle, the lead engineer on a new hit product called Discover Weekly, a personalized list of thirty songs delivered every Monday to tens of millions of users. For about a decade, Ogle had worked for several music companies to design the perfect music recommendation engine. His philosophy of music was that most people enjoy new songs, but they don’t enjoy the effort that it takes to find them. They want effortless, frictionless musical revelations, a series of achievable challenges. In the design of Discover Weekly, “every decision we made was shaped by the notion that this should feel like a friend giving you a mix tape,” he said.


pages: 364 words: 99,897

The Industries of the Future by Alec Ross

"World Economic Forum" Davos, 23andMe, 3D printing, Airbnb, Alan Greenspan, algorithmic bias, algorithmic trading, AltaVista, Anne Wojcicki, autonomous vehicles, banking crisis, barriers to entry, Bernie Madoff, bioinformatics, bitcoin, Black Lives Matter, blockchain, Boston Dynamics, Brian Krebs, British Empire, business intelligence, call centre, carbon footprint, clean tech, cloud computing, collaborative consumption, connected car, corporate governance, Credit Default Swap, cryptocurrency, data science, David Brooks, DeepMind, Demis Hassabis, disintermediation, Dissolution of the Soviet Union, distributed ledger, driverless car, Edward Glaeser, Edward Snowden, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, fiat currency, future of work, General Motors Futurama, global supply chain, Google X / Alphabet X, Gregor Mendel, industrial robot, information security, Internet of things, invention of the printing press, Jaron Lanier, Jeff Bezos, job automation, John Markoff, Joi Ito, Kevin Roose, Kickstarter, knowledge economy, knowledge worker, lifelogging, litecoin, low interest rates, M-Pesa, machine translation, Marc Andreessen, Mark Zuckerberg, Max Levchin, Mikhail Gorbachev, military-industrial complex, mobile money, money: store of value / unit of account / medium of exchange, Nelson Mandela, new economy, off-the-grid, offshore financial centre, open economy, Parag Khanna, paypal mafia, peer-to-peer, peer-to-peer lending, personalized medicine, Peter Thiel, precision agriculture, pre–internet, RAND corporation, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, Rubik’s Cube, Satoshi Nakamoto, selective serotonin reuptake inhibitor (SSRI), self-driving car, sharing economy, Silicon Valley, Silicon Valley startup, Skype, smart cities, social graph, software as a service, special economic zone, supply-chain management, supply-chain management software, technoutopianism, TED Talk, The Future of Employment, Travis Kalanick, underbanked, unit 8200, 
Vernor Vinge, Watson beat the top human players on Jeopardy!, women in the workforce, work culture, Y Combinator, young professional

Academics have likened it to both a microscope and telescope—a tool that allows us to both examine smaller details than could previously be observed and to see data at a larger scale, revealing correlations that were previously too distant for us to notice. The story of big data’s real-world impact to this point has been largely about logistics and persuasion. It has been great for supply chains, elections, and advertising because these tend to be fields with lots of small, repeated, and quantifiable actions—hence the “recommendation engines” used by Amazon and Netflix that help make more precise recommendations to customers. But these fields are just the beginning, and by the time my kids enter the workforce, big data won’t be a buzz phrase any longer. It will have permeated parts of our lives that we do not think of today as being rooted in analytics.


pages: 421 words: 110,406

Platform Revolution: How Networked Markets Are Transforming the Economy--And How to Make Them Work for You by Sangeet Paul Choudary, Marshall W. van Alstyne, Geoffrey G. Parker

3D printing, Affordable Care Act / Obamacare, Airbnb, Alvin Roth, Amazon Mechanical Turk, Amazon Web Services, Andrei Shleifer, Apple's 1984 Super Bowl advert, autonomous vehicles, barriers to entry, Benchmark Capital, big data - Walmart - Pop Tarts, bitcoin, blockchain, business cycle, business logic, business process, buy low sell high, chief data officer, Chuck Templeton: OpenTable:, clean water, cloud computing, connected car, corporate governance, crowdsourcing, data acquisition, data is the new oil, data science, digital map, discounted cash flows, disintermediation, driverless car, Edward Glaeser, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, financial innovation, Free Software Foundation, gigafactory, growth hacking, Haber-Bosch Process, High speed trading, independent contractor, information asymmetry, Internet of things, inventory management, invisible hand, Jean Tirole, Jeff Bezos, jimmy wales, John Markoff, Kevin Roose, Khan Academy, Kickstarter, Lean Startup, Lyft, Marc Andreessen, market design, Max Levchin, Metcalfe’s law, multi-sided market, Network effects, new economy, PalmPilot, payday loans, peer-to-peer lending, Peter Thiel, pets.com, pre–internet, price mechanism, recommendation engine, RFID, Richard Stallman, ride hailing / ride sharing, Robert Metcalfe, Ronald Coase, Salesforce, Satoshi Nakamoto, search costs, self-driving car, shareholder value, sharing economy, side project, Silicon Valley, Skype, smart contracts, smart grid, Snapchat, social bookmarking, social contagion, software is eating the world, Steve Jobs, TaskRabbit, The Chicago School, the long tail, the payments system, Tim Cook: Apple, transaction costs, Travis Kalanick, two-sided market, Uber and Lyft, Uber for X, uber lyft, vertical integration, winner-take-all economy, zero-sum game, Zipcar

Many firms—both platform businesses and others—track consumers’ web usage, financial interactions, magazine subscriptions, political and charitable contributions, and much more to create highly detailed individual profiles. In the aggregate, such data can be used for cross-marketing to people who share profiles, as when a recommendation engine on a shopping site tells you, “People like you who bought product A often enjoy product B, too!” The anonymity of this process renders it unobjectionable to most people. But the same underlying data can be, and is, sold to prospective employers, government agencies, health care providers, and marketers of all kinds.


pages: 412 words: 116,685

The Metaverse: And How It Will Revolutionize Everything by Matthew Ball

"hyperreality Baudrillard"~20 OR "Baudrillard hyperreality", 3D printing, Airbnb, Albert Einstein, Amazon Web Services, Apple Newton, augmented reality, Big Tech, bitcoin, blockchain, business process, call centre, cloud computing, commoditize, computer vision, COVID-19, cryptocurrency, deepfake, digital divide, digital twin, disintermediation, don't be evil, Elon Musk, en.wikipedia.org, Ethereum, ethereum blockchain, game design, gig economy, Google Chrome, Google Earth, Google Glasses, hype cycle, intermodal, Internet Archive, Internet of things, iterative process, Jeff Bezos, John Gruber, Kevin Roose, Kickstarter, lockdown, Mark Zuckerberg, Metcalfe’s law, Minecraft, minimum viable product, Neal Stephenson, Network effects, new economy, non-fungible token, open economy, openstreetmap, pattern recognition, peer-to-peer, peer-to-peer model, Planet Labs, pre–internet, QR code, recommendation engine, rent control, rent-seeking, ride hailing / ride sharing, Robinhood: mobile stock trading app, satellite internet, self-driving car, SETI@home, Silicon Valley, skeuomorphism, Skype, smart contracts, Snapchat, Snow Crash, social graph, social web, SpaceX Starlink, Steve Ballmer, Steve Jobs, thinkpad, TikTok, Tim Cook: Apple, TSMC, undersea cable, Vannevar Bush, vertical integration, Vitalik Buterin, Wayback Machine, Y2K

See Google
Altberg, Ebbe, 110
Amazon, xiv: Amazon content via the Apple App Store, 184–85, 197; business model, 164; Fire OS, 213; Fire Phone, 143; gaming and, 178–79, 278, 281n; investment in AR/VR hardware, 143, 277–78; market capitalization of, 166; positioning for the Metaverse, 274, 277–78; recommendation engine, 288; see also Bezos, Jeff
Amazon Game Studios, 277
Amazon GameSparks, 107–8, 117
Amazon Go, 157
Amazon Lumberyard, 278
Amazon Luna, 96, 131, 277–78, 282
Amazon Music, 197, 277
Amazon Prime, 179, 185, 197, 277–78
Amazon Prime Video, 185, 277
Amazon Web Services (AWS), 84, 99, 277–78
AMC Entertainment, 28
American Cancer Society, 9
American Express, 172, 188
American Tower, 243, 244
America Online (AOL), 13, 15, 61, 130, 165, 273, 283
Andreessen Horowitz, 233
Android, 25, 61, 143, 212–14: Amazon Fire Phone, 143; backwards “pinch-to-zoom” concept, 149–50, 151; game development for, 131; gaming and, 32, 92, 133; Google Cardboard viewer for, 142; Google’s approach to, 184, 212–15, 275; progressive closure of, 213; Samsung’s approach to, 213; the 30% standard, 188, 190–91, 204–5
Animal Crossing: New Horizons, 30–32, 247
AOL Instant Messenger, 61
Apple: Audio Interchange File Format (AIFF), 122; dominance of, 189; investment in AR/VR hardware, 143–44; lawsuit from Epic Games, 14n, 22–23, 32n, 134, 186, 284; lawsuit from the European Union, 184; market capitalization of, 166, 186–87; moral stance on pornography, 261; patents of, 143–44, 150; “There’s an app for that” ad campaign, 26, 150, 243
Apple App Store, 26, 132, 165, 309: categories of apps in, 183, 185–87; control over competing browsers, 194–95; control over payment rails, 201–4, 243–44; economics of, 186; as hindering the development of the Metaverse, 192–95, 197–99, 243–44, 309; policies on blockchain, crypto mining, and cryptocurrency trading apps, 200–201; the 30% standard, 120, 172–80, 183–84, 186–92, 197, 201, 203–4, 286; user identity and control, 299
Apple iOS, 60–61: Animoji, 159; “App Tracking Transparency” (ATT), 204–5; AssistiveTouch, 153; control over its NFC chip, 199–200; Face ID authentication system, 159; FaceTime, 65, 83; the home button, 148–49, 244; iCloud storage, 124, 200; “iPad Natives,” 13, 249; iPads, xi, 294; iPhones, 64, 131, 146, 242–44; Metal, 142, 175, 196; multitasking, 149, 244; Newton tablet, 145; “pinch-to-zoom” concept, 149–50, 151; Safari, 194–96, 209; Siri queries to Apple’s servers, 161; “slide-to-unlock” feature, 150–51; WebKit, 39, 194
Apple Music, 184, 197, 255
Apple News, 256
Apple Watch, 152, 161
application programming interfaces (APIs): authentication, 138; Discord APIs, 135; Instagram’s Twitter integration API, 287, 300; proprietary APIs and gaming consoles, 174–77, 287; in United States v.


pages: 340 words: 94,464

Randomistas: How Radical Researchers Changed Our World by Andrew Leigh

Albert Einstein, Amazon Mechanical Turk, Anton Chekhov, Atul Gawande, basic income, behavioural economics, Black Swan, correlation does not imply causation, crowdsourcing, data science, David Brooks, Donald Trump, ending welfare as we know it, Estimating the Reproducibility of Psychological Science, experimental economics, Flynn Effect, germ theory of disease, Ignaz Semmelweis: hand washing, Indoor air pollution, Isaac Newton, It's morning again in America, Kickstarter, longitudinal study, loss aversion, Lyft, Marshall McLuhan, meta-analysis, microcredit, Netflix Prize, nudge unit, offshore financial centre, p-value, Paradox of Choice, placebo effect, price mechanism, publication bias, RAND corporation, randomized controlled trial, recommendation engine, Richard Feynman, ride hailing / ride sharing, Robert Metcalfe, Ronald Reagan, Sheryl Sandberg, statistical model, Steven Pinker, sugar pill, TED Talk, uber lyft, universal basic income, War on Poverty

Landon ended up with just 8 of the 531 electoral college votes.
61 Huizhi Xie & Juliette Aurisset, ‘Improving the sensitivity of online controlled experiments: Case studies at Netflix’. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 645–54. ACM, 2016.
62 Carlos A. Gomez-Uribe & Neil Hunt, ‘The Netflix recommender system: Algorithms, business value, and innovation’, ACM Transactions on Management Information Systems (TMIS), vol. 6, no. 4, 2016, p. 13.
63 Gomez-Uribe & Hunt, ‘The Netflix recommender system’, p. 13.
64 Adam D.I. Kramer, Jamie E. Guillory & Jeffrey T. Hancock, ‘Experimental evidence of massive-scale emotional contagion through social networks’, Proceedings of the National Academy of Sciences, vol. 111, no. 24, 2014, pp. 8788–90.
65 Because 22.4 per cent of Facebook posts contained negative words, and 46.8 per cent contained positive words, the study also had two control groups: one of which randomly omitted 2.24 per cent of all posts, and another that randomly omitted 4.68 per cent of all posts.
66 Oddly, some commentators seem unaware of the finding, continuing to make claims like ‘Facebook makes us feel inadequate, so we try to compete, putting a positive spin and a pretty filter on an ordinary moment – prompting someone else to do the same . . . when you sign up to Facebook you put yourself under pressure to appear popular, fun and loved, regardless of your reality’: Daisy Buchanan, ‘Facebook bragging’s route to divorce’, Australian Financial Review, 27 August 2016.
67 Kate Bullen & John Oates, ‘Facebook’s ‘experiment’ was socially irresponsible’, Guardian, 2 July 2014.
68 Quoted in David Goldman, ‘Facebook still won’t say “sorry” for mind games experiment’, CNNMoney, 2 July 2014.
9 TESTING THEORIES IN POLITICS AND PHILANTHROPY
1 Julian Jamison & Dean Karlan, ‘Candy elasticity: Halloween experiments on public political statements’, Economic Inquiry, vol. 54, no. 1, 2016, pp. 543–7.
2 This experiment is outlined in detail in Dan Siroker, ‘How Obama raised $60 million by running a simple experiment’, Optimizely blog, 29 November 2010.
3 Quoted in Brian Christian, ‘The A/B test: Inside the technology that’s changing the rules of business’, Wired, 25 April 2012.
4 Alan S.


pages: 442 words: 94,734

The Art of Statistics: Learning From Data by David Spiegelhalter

Abraham Wald, algorithmic bias, Anthropocene, Antoine Gombaud: Chevalier de Méré, Bayesian statistics, Brexit referendum, Carmen Reinhart, Charles Babbage, complexity theory, computer vision, confounding variable, correlation coefficient, correlation does not imply causation, dark matter, data science, deep learning, DeepMind, Edmond Halley, Estimating the Reproducibility of Psychological Science, government statistician, Gregor Mendel, Hans Rosling, Higgs boson, Kenneth Rogoff, meta-analysis, Nate Silver, Netflix Prize, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, p-value, placebo effect, probability theory / Blaise Pascal / Pierre de Fermat, publication bias, randomized controlled trial, recommendation engine, replication crisis, self-driving car, seminal paper, sparse data, speech recognition, statistical model, sugar pill, systematic bias, TED Talk, The Design of Experiments, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Malthus, Two Sigma

First, we are in an age of data science, in which large and complex data sets are collected from routine sources such as traffic monitors, social media posts and internet purchases, and used as a basis for technological innovations such as optimizing travel routes, targeted advertising or purchase recommendation systems – we shall look at algorithms based on ‘big data’ in Chapter 6. Statistical training is increasingly seen as just one necessary component of being a data scientist, together with skills in data management, programming and algorithm development, as well as proper knowledge of the subject matter.

A major problem is that these algorithms tend to be inscrutable black boxes – they come up with a prediction, but it is almost impossible to work out what is going on inside. This has three negative aspects. First, extreme complexity makes implementation and upgrading a great effort: when Netflix offered a $1m prize for prediction recommendation systems, the winner was so complicated that Netflix ended up not using it. The second negative feature is that we do not know how the conclusion was arrived at, or what confidence we should have in it: we just have to take it or leave it. Simpler algorithms can better explain themselves. Finally, if we do not know how an algorithm is producing its answer, we cannot investigate it for implicit but systematic biases against some members of the community – a point I expand on below.


The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin

Bayesian statistics, business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, data science, discrete time, disruptive innovation, George Gilder, Google Earth, hype cycle, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, Large Hadron Collider, late capitalism, lifelogging, linked data, longitudinal study, machine readable, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, SimCity, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, technological solutionism, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

Discovering correlations between certain items led to new product placements and alterations to shelf space management and a 16 per cent increase in revenue per shopping cart in the first month’s trial. There was no hypothesis that Product A was often bought with Product H that was then tested. The data were simply queried to discover what relationships existed that might have previously been unnoticed. Similarly, Amazon’s recommendation system produces suggestions for other items a shopper might be interested in without knowing anything about the culture and conventions of books and reading; it simply identifies patterns of purchasing across customers in order to determine whether, if Person A likes Book X, they are also likely to like Book Y given their own and others’ consumption patterns.
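The pattern-finding described in this excerpt can be illustrated with a small co-purchase counter. This is only a sketch under stated assumptions: the basket data and the function names `co_purchase_counts` and `also_bought` are hypothetical, and plain pair counting stands in for whatever Amazon actually runs, which the excerpt does not describe.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each unordered pair of items shares a basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes repeats and gives each pair one canonical order
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts


def also_bought(item, baskets, top_n=3):
    """Rank other items by how often they were bought together with `item`."""
    scores = Counter()
    for (a, b), n in co_purchase_counts(baskets).items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]


# Hypothetical purchase histories, one list per customer.
baskets = [
    ["Book X", "Book Y"],
    ["Book X", "Book Y", "Book Z"],
    ["Book X", "Book Z"],
    ["Book Y", "Book W"],
    ["Book X", "Book Y"],
]
print(also_bought("Book X", baskets))  # ['Book Y', 'Book Z']
```

Nothing in the sketch knows what a book is; the ranking comes purely from which items appear together, which is the point the excerpt makes.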

Popper (1979, cited in Callebaut 2012: 74) thus suggests that all science adopts a searchlight approach to scientific discovery, with the focus of light guided by previous findings, theories and training; by speculation that is grounded in experience and knowledge. The same is true for Amazon, Hunch, Ayasdi, and Google. How Amazon constructed its recommendation system was based on scientific reasoning, underpinned by a guiding model and accompanied by empirical testing designed to improve the performance of the algorithms it uses. Likewise, Google undertakes extensive research and development, it works in partnership with scientists and it buys scientific knowledge, either funding research within universities or by buying the IP of other companies, to refine and extend the utility of how it organises, presents and extracts value from data.


pages: 404 words: 92,713

The Art of Statistics: How to Learn From Data by David Spiegelhalter

Abraham Wald, algorithmic bias, Antoine Gombaud: Chevalier de Méré, Bayesian statistics, Brexit referendum, Carmen Reinhart, Charles Babbage, complexity theory, computer vision, confounding variable, correlation coefficient, correlation does not imply causation, dark matter, data science, deep learning, DeepMind, Edmond Halley, Estimating the Reproducibility of Psychological Science, government statistician, Gregor Mendel, Hans Rosling, Higgs boson, Kenneth Rogoff, meta-analysis, Nate Silver, Netflix Prize, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, p-value, placebo effect, probability theory / Blaise Pascal / Pierre de Fermat, publication bias, randomized controlled trial, recommendation engine, replication crisis, self-driving car, seminal paper, sparse data, speech recognition, statistical model, sugar pill, systematic bias, TED Talk, The Design of Experiments, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Malthus, Two Sigma

First, we are in an age of data science, in which large and complex data sets are collected from routine sources such as traffic monitors, social media posts and internet purchases, and used as a basis for technological innovations such as optimizing travel routes, targeted advertising or purchase recommendation systems—we shall look at algorithms based on ‘big data’ in Chapter 6. Statistical training is increasingly seen as just one necessary component of being a data scientist, together with skills in data management, programming and algorithm development, as well as proper knowledge of the subject matter.

A major problem is that these algorithms tend to be inscrutable black boxes—they come up with a prediction, but it is almost impossible to work out what is going on inside. This has three negative aspects. First, extreme complexity makes implementation and upgrading a great effort: when Netflix offered a $1m prize for prediction recommendation systems, the winner was so complicated that Netflix ended up not using it. The second negative feature is that we do not know how the conclusion was arrived at, or what confidence we should have in it: we just have to take it or leave it. Simpler algorithms can better explain themselves. Finally, if we do not know how an algorithm is producing its answer, we cannot investigate it for implicit but systematic biases against some members of the community—a point I expand on below


pages: 480 words: 123,979

Dawn of the New Everything: Encounters With Reality and Virtual Reality by Jaron Lanier

4chan, air gap, augmented reality, back-to-the-land, Big Tech, Bill Atkinson, Buckminster Fuller, Burning Man, carbon footprint, cloud computing, collaborative editing, commoditize, Computer Lib, cosmological constant, creative destruction, crowdsourcing, deep learning, Donald Trump, Douglas Engelbart, Douglas Hofstadter, El Camino Real, Elon Musk, fake news, Firefox, game design, general-purpose programming language, gig economy, Google Glasses, Grace Hopper, Gödel, Escher, Bach, Hacker Ethic, Hans Moravec, Howard Rheingold, hype cycle, impulse control, information asymmetry, intentional community, invisible hand, Ivan Sutherland, Jaron Lanier, John Gilmore, John Perry Barlow, John von Neumann, Kevin Kelly, Kickstarter, Kuiper Belt, lifelogging, mandelbrot fractal, Mark Zuckerberg, Marshall McLuhan, Menlo Park, military-industrial complex, Minecraft, Mitch Kapor, Mondo 2000, Mother of all demos, Murray Gell-Mann, Neal Stephenson, Netflix Prize, Network effects, new economy, Nick Bostrom, Norbert Wiener, Oculus Rift, pattern recognition, Paul Erdős, peak TV, Plato's cave, profit motive, Project Xanadu, quantum cryptography, Ray Kurzweil, reality distortion field, recommendation engine, Richard Feynman, Richard Stallman, Ronald Reagan, self-driving car, Silicon Valley, Silicon Valley startup, Skinner box, Skype, Snapchat, stem cell, Stephen Hawking, Steve Bannon, Steve Jobs, Steven Levy, Stewart Brand, systems thinking, technoutopianism, Ted Nelson, telemarketer, telepresence, telepresence robot, Thorstein Veblen, Turing test, Vernor Vinge, Whole Earth Catalog, Whole Earth Review, WikiLeaks, wikimedia commons

The company even offered a million-dollar prize for ideas to make the algorithm smarter. The thing about Netflix, though, is that it doesn’t offer a comprehensive catalog, especially of recent, hot releases. If you think of any particular movie, it might not be available for streaming. The recommendation engine is a magician’s misdirection, distracting you from the fact that not everything is available. So is the algorithm intelligent, or are people making themselves somewhat blind and silly in order to make the algorithm seem intelligent? What Netflix has done is admirable, because the whole point of Netflix is to deliver theatrical illusions to you.


pages: 382 words: 120,064

Bank 3.0: Why Banking Is No Longer Somewhere You Go but Something You Do by Brett King

3D printing, Abraham Maslow, additive manufacturing, Airbus A320, Albert Einstein, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, Apollo 11, Apollo 13, Apollo Guidance Computer, asset-backed security, augmented reality, barriers to entry, behavioural economics, bitcoin, bounce rate, business intelligence, business process, business process outsourcing, call centre, capital controls, citizen journalism, Clayton Christensen, cloud computing, credit crunch, crowdsourcing, disintermediation, en.wikipedia.org, fixed income, George Gilder, Google Glasses, high net worth, I think there is a world market for maybe five computers, Infrastructure as a Service, invention of the printing press, Jeff Bezos, jimmy wales, Kickstarter, London Interbank Offered Rate, low interest rates, M-Pesa, Mark Zuckerberg, mass affluent, Metcalfe’s law, microcredit, mobile money, more computing power than Apollo, Northern Rock, Occupy movement, operational security, optical character recognition, peer-to-peer, performance metric, Pingit, platform as a service, QR code, QWERTY keyboard, Ray Kurzweil, recommendation engine, RFID, risk tolerance, Robert Metcalfe, self-driving car, Skype, speech recognition, stem cell, telepresence, the long tail, Tim Cook: Apple, transaction costs, underbanked, US Airways Flight 1549, web application, world market for maybe five computers

In Siri’s patent application, various possibilities are hinted at, including being a voice agent providing assistance for “automated teller machines”.4 In fact, SRI (the creator of Siri™) and BBVA recently announced a collaboration to introduce Lola5, a Siri-like technology, to customers through the Internet and via voice. Siri’s near-term capabilities include:
1. Being able to make simple online purchases, such as “Purchase Bank 3.0 from Amazon Kindle”
2. Serving as a recommendation engine or intelligent automated assistant, an “agent avatar”, as it has sometimes been labelled
However, there are some challenges in having customers talk into their phones for customer support, or replacing an IVR system with technologies such as Lola, as a recent New York Times article pointed out when it called Siri “the latest public nuisance in the cell phone revolution”.


The Deep Learning Revolution (The MIT Press) by Terrence J. Sejnowski

AI winter, Albert Einstein, algorithmic bias, algorithmic trading, AlphaGo, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, autonomous vehicles, backpropagation, Baxter: Rethink Robotics, behavioural economics, bioinformatics, cellular automata, Claude Shannon: information theory, cloud computing, complexity theory, computer vision, conceptual framework, constrained optimization, Conway's Game of Life, correlation does not imply causation, crowdsourcing, Danny Hillis, data science, deep learning, DeepMind, delayed gratification, Demis Hassabis, Dennis Ritchie, discovery of DNA, Donald Trump, Douglas Engelbart, driverless car, Drosophila, Elon Musk, en.wikipedia.org, epigenetics, Flynn Effect, Frank Gehry, future of work, Geoffrey Hinton, Google Glasses, Google X / Alphabet X, Guggenheim Bilbao, Gödel, Escher, Bach, haute couture, Henri Poincaré, I think there is a world market for maybe five computers, industrial robot, informal economy, Internet of things, Isaac Newton, Jim Simons, John Conway, John Markoff, John von Neumann, language acquisition, Large Hadron Collider, machine readable, Mark Zuckerberg, Minecraft, natural language processing, Neil Armstrong, Netflix Prize, Norbert Wiener, OpenAI, orbital mechanics / astrodynamics, PageRank, pattern recognition, pneumatic tube, prediction markets, randomized controlled trial, Recombinant DNA, recommendation engine, Renaissance Technologies, Rodney Brooks, self-driving car, Silicon Valley, Silicon Valley startup, Socratic dialogue, speech recognition, statistical model, Stephen Hawking, Stuart Kauffman, theory of mind, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, traveling salesman, Turing machine, Von Neumann architecture, Watson beat the top human players on Jeopardy!, world market for maybe five computers, X Prize, Yogi Berra

As a consequence, there are fewer parameters to train on each epoch, and the resulting network has fewer dependencies between units than would be the case if the same large network were trained on every epoch. Dropout decreases the error rate in deep learning networks by 10 percent, which is a large improvement. In 2009, Netflix conducted an open competition, offering a prize of $1 million to the first person who could reduce the error of their recommender system by 10 percent.16 Almost every graduate student in machine learning entered the competition. Netflix probably inspired $10 million of research for the cost of the prize. And deep networks are now a core technology for online streaming.17 Intriguingly, cortical synapses drop out at a high rate.
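A minimal sketch of the dropout idea mentioned in this excerpt: during training, each unit is switched off at random, so only part of the network is trained on any given pass. The function name `dropout`, the rescaling of surviving units ("inverted" dropout), and the sample values are assumptions made for illustration, not details taken from the book.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob during training.

    Surviving values are divided by the keep probability so that the
    expected activation is unchanged (the "inverted dropout" convention,
    assumed here). At evaluation time the input passes through untouched.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)  # seeded so the run is repeatable
print(dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, rng=rng))
print(dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, rng=rng, training=False))
```

Because a different random subset of units is silenced on each call, no unit can rely on a particular neighbour being present, which is one way to read the excerpt's remark about fewer dependencies between units.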

Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” Journal of Machine Learning Research 15 (2014):1929–1958. 16. “Netflix Prize,” Wikipedia, last modified August 23, 2017, https://en.wikipedia.org/wiki/Netflix_Prize. 17. Carlos A. Gomez-Uribe, Neil Hunt, “The Netflix Recommender System: Algorithms,” ACM Transactions on Management Information Systems 6, no. 4 (2016), article no. 13. 18. T. M. Bartol Jr., C. Bromer, J. Kinney, M. A. Chirillo, J. N. Bourne, K. M. Harris, and T. J. Sejnowski, “Nanoconnectomic Upper Bound on the Variability of Synaptic Plasticity,” eLife, 4:e10778, 2015, doi:10.7554/eLife.10778. 19.


pages: 540 words: 103,101

Building Microservices by Sam Newman

airport security, Amazon Web Services, anti-pattern, business logic, business process, call centre, continuous integration, Conway's law, create, read, update, delete, defense in depth, don't repeat yourself, Edward Snowden, fail fast, fallacies of distributed computing, fault tolerance, index card, information retrieval, Infrastructure as a Service, inventory management, job automation, Kubernetes, load shedding, loose coupling, microservices, MITM: man-in-the-middle, platform as a service, premature optimization, pull request, recommendation engine, Salesforce, SimCity, social graph, software as a service, source of truth, sunk-cost fallacy, systems thinking, the built environment, the long tail, two-pizza team, web application, WebSocket

Let’s imagine that initially we identify four contexts we think our monolithic backend covers:
Catalog: Everything to do with metadata about the items we offer for sale
Finance: Reporting for accounts, payments, refunds, etc.
Warehouse: Dispatching and returning of customer orders, managing inventory levels, etc.
Recommendation: Our patent-pending, revolutionary recommendation system, which is highly complex code written by a team with more PhDs than the average science lab
The first thing to do is to create packages representing these contexts, and then move the existing code into them. With modern IDEs, code movement can be done automatically via refactorings, and can be done incrementally while we are doing other things.

Currently, all of this is handled by the finance-related code. If we split this service out, we can provide additional protections to this individual service in terms of monitoring, protection of data at transit, and protection of data at rest — ideas we’ll look at in more detail in Chapter 9. Technology The team looking after our recommendation system has been spiking out some new algorithms using a logic programming library in the language Clojure. The team thinks this could benefit our customers by improving what we offer them. If we could split out the recommendation code into a separate service, it would be easy to consider building an alternative implementation that we could test against.
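One way to picture why splitting out the recommendation code makes an alternative implementation easy to test is to put both versions behind the same small interface. The sketch below is hypothetical throughout: the `Recommender` protocol, both classes, and `compare` are illustrative names, not code from the book, and Python stands in for whatever languages the services would really use.

```python
from typing import Protocol


class Recommender(Protocol):
    """The seam: anything that can recommend items for a customer."""

    def recommend(self, customer_id: str, limit: int) -> list[str]: ...


class ExistingRecommender:
    """Stand-in for the recommendation code that lived in the monolith."""

    def recommend(self, customer_id: str, limit: int) -> list[str]:
        return ["bestseller-1", "bestseller-2", "bestseller-3"][:limit]


class ExperimentalRecommender:
    """Stand-in for an alternative implementation, e.g. a separate service."""

    def __init__(self, history: dict[str, list[str]]) -> None:
        self._history = history

    def recommend(self, customer_id: str, limit: int) -> list[str]:
        return self._history.get(customer_id, [])[:limit]


def compare(a: Recommender, b: Recommender, customer_id: str, limit: int = 3) -> list[str]:
    """Return the items both implementations suggest, in a's order."""
    second = set(b.recommend(customer_id, limit))
    return [item for item in a.recommend(customer_id, limit) if item in second]


experimental = ExperimentalRecommender({"c1": ["bestseller-2", "niche-9"]})
print(compare(ExistingRecommender(), experimental, "c1"))  # ['bestseller-2']
```

Callers depend only on `Recommender`, so swapping or comparing implementations does not touch the catalog, finance, or warehouse code.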


pages: 575 words: 140,384

It's Not TV: The Spectacular Rise, Revolution, and Future of HBO by Felix Gillette, John Koblin

activist fund / activist shareholder / activist investor, Airbnb, Amazon Web Services, AOL-Time Warner, Apollo 13, Big Tech, bike sharing, Black Lives Matter, Burning Man, business cycle, call centre, cloud computing, coronavirus, corporate governance, COVID-19, data science, disruptive innovation, Dissolution of the Soviet Union, Donald Trump, Elon Musk, Erlich Bachman, Exxon Valdez, fake news, George Floyd, Jeff Bezos, Keith Raniere, lockdown, Menlo Park, multilevel marketing, Nelson Mandela, Netflix Prize, out of africa, payday loans, peak TV, period drama, recommendation engine, Richard Hendricks, ride hailing / ride sharing, risk tolerance, Robert Durst, Ronald Reagan, Saturday Night Live, self-driving car, shareholder value, Sheryl Sandberg, side hustle, Silicon Valley, Silicon Valley startup, Stephen Hawking, Steve Jobs, subscription business, tech billionaire, TechCrunch disrupt, TikTok, Tim Cook: Apple, traveling salesman, unpaid internship, upwardly mobile, urban decay, WeWork

The company was expanding into foreign markets, offering the service for the first time in Canada, Latin America, and the Caribbean. And its technology kept improving. Just the year before, in 2009, Netflix had handed out a $1-million award to an international team of machine-learning experts for developing an algorithm that was able to beat the accuracy of the company’s in-house recommendation engine, Cinematch, by 10 percent. Netflix’s brand name was growing synonymous with a new, better way of watching commercial-free Hollywood entertainment at home. Wall Street was smitten. Netflix’s stock price was shooting up, and media outlets were fawning over Netflix’s future. In the fall of 2010, Fortune magazine, which was owned by HBO’s parent company Time Warner, named Netflix’s CEO Hastings as its Businessperson of the Year.


pages: 364 words: 119,398

Men Who Hate Women: From Incels to Pickup Artists, the Truth About Extreme Misogyny and How It Affects Us All by Laura Bates

"World Economic Forum" Davos, 4chan, Ada Lovelace, anti-bias training, autism spectrum disorder, Bellingcat, Black Lives Matter, Boris Johnson, Brexit referendum, Cambridge Analytica, cognitive dissonance, coherent worldview, deplatforming, Dominic Cummings, Donald Trump, fake news, feminist movement, Filter Bubble, gender pay gap, George Floyd, glass ceiling, Grace Hopper, job satisfaction, Kickstarter, off grid, Overton Window, recommendation engine, ride hailing / ride sharing, Snapchat, Social Justice Warrior, Steve Bannon, tech bro, young professional

Chaslot told the Daily Beast he very quickly realised that ‘YouTube’s recommendation was putting people into filter bubbles… There was no way out.’ In a 2019 New York Times interview, YouTube’s chief product officer, Neal Mohan, denied that the platform created a ‘rabbit hole’ effect, saying that it offered a full spectrum of content and opinion, and that watch time was not the only feature used by the site’s recommendation systems. He acknowledged that the algorithm might queue up more extreme videos, but claimed it might also offer ‘other videos that skew in the opposite direction’.12 But that didn’t seem to be the case in my own experiments, or those of other writers who have documented this phenomenon. This doesn’t mean that YouTube is deliberately setting out to promote and support these extreme racist and misogynist viewpoints.

Just like the facilitation of manosphere radicalisation on the platform, the problem may have been completely unintentional, but the outcome was horrifying. What matters is that, once YouTube was alerted to the issue, it was given a clear solution. Researchers suggested that the platform simply turn off its recommendation system on videos of children. It was a change that could have been implemented automatically and with ease. And it would have stopped the exploitation in its tracks. But YouTube declined to put it into practice. Why? Because recommendations are its biggest traffic driver, it told the New York Times, so turning them off ‘would hurt “creators” who rely on those clicks’.


pages: 451 words: 115,720

Green Tyranny: Exposing the Totalitarian Roots of the Climate Industrial Complex by Rupert Darwall

1960s counterculture, active measures, Affordable Care Act / Obamacare, Albert Einstein, Bakken shale, Berlin Wall, Bernie Sanders, California energy crisis, carbon credits, carbon footprint, centre right, clean tech, collapse of Lehman Brothers, creative destruction, decarbonisation, deindustrialization, dematerialisation, disinformation, Donald Trump, electricity market, Elon Musk, energy security, energy transition, facts on the ground, Fall of the Berlin Wall, Garrett Hardin, gigafactory, Gunnar Myrdal, Herbert Marcuse, hydraulic fracturing, Intergovernmental Panel on Climate Change (IPCC), invisible hand, it's over 9,000, James Watt: steam engine, John Elkington, Joseph Schumpeter, Kenneth Rogoff, Kickstarter, liberal capitalism, market design, means of production, megaproject, Mikhail Gorbachev, mittelstand, Murray Bookchin, Neil Armstrong, nuclear winter, obamacare, oil shale / tar sands, Paris climate accords, Peace of Westphalia, peak oil, plutocrats, postindustrial economy, precautionary principle, pre–internet, recommendation engine, renewable energy transition, rent-seeking, road to serfdom, rolling blackouts, Ronald Reagan, shareholder value, Silicon Valley, Silicon Valley billionaire, Solyndra, Strategic Defense Initiative, subprime mortgage crisis, tech baron, tech billionaire, The Wealth of Nations by Adam Smith, Tragedy of the Commons, women in the workforce, young professional

Joachim Israel, an exceptionally fortunate German Jew who managed to settle in Sweden just as the door was closing, wrote in his memoir, The Nazis were quite willing to fulfil these demands and surely understood that the Swedish and Swiss requests were confirmation of the correctness of their anti-Semitic policies.21 On September 9, 1938, Sweden introduced the Gränsrekomendationssystemet (Border Recommendation System), a bureaucratic term designed to sanitize its intent, making it virtually impossible for Jewish citizens of the German Reich to enter Sweden. Israel writes in his memoir that the year he had the J stamped in his passport was the same year “in which the responsible Swedish minister, the brother of the prime minister, issued secret orders that Swedish border guards should turn away all unauthorized Jews who tried to cross the border and send them back to Germany.”22 Solutions to Sweden’s population crisis—the one ostensibly identified by the Myrdals four years earlier—were presenting themselves in growing numbers at its borders.

Stalin, Joseph Five Year Plans Stanford University State Institute for Racial Biology (Sweden) Staudenmaier, Peter Stern, Todd Steyer, Tom Stockholm Conference Stockholm Declaration Principle 21 Principle 22 Stockholm Environment Institute “Design to Win” (2007) Streicher, Julius Strong, Maurice Students for a Democratic Society (SDS) Port Huron Statement sulfur dioxide Sunday Times Sussman, Bob Sustainable Markets Foundation Supplemental Poverty Measure Svenska Dagbladet Svensson, Göte Svensson, Ulf Sweden 1812 Policy Border Recommendation System (1938) Defense Staff Environmental Protection Agency foreign aid program of Gothenburg Hårsfjärden incident (1982) National Environmental Board State Forestry Agency Stockholm U-137 Incident (1981) Uppsala Swedish Committee for Vietnam Swedish Energy Research Commission Swedish Radio Swiss Re Switzerland Basle Geneva Kaieraugust Zurich Syria Taylor, Kat TechNet Tesla, Nikola Thatcher, Margaret administration of Theutenberg, Bo Diaries from the Foreign Ministry Third Reich (1933–1945) Architects and Engineers Association Hitler Youth Nationalsozialistische Bibliotek public health policies of Reich Ministry of Economic Affairs Reich Ministry of Finance Reich Windpower Study Group symbolism of Thirty Years’ War (1618–1648) Thomas, David Thomas, Lewis Thompson, Starley Thoreau, Henry David Three Mile Island incident (1979) Threshold Foundation Thunberg, Anders Swedish International Secretary Tides Foundation founding of (1976) Time Tinbergen, Jan Tinbergen Rule Tocqueville, Alexis de Tolba, Mustafa visit to Stockholm (1982) Tooze, Adam Toronto Conference (1988) role of NGOs in Tretyakov, Sergei Trittin, Jürgen Trudeau, Pierre Trumka, Richard Trump, Donald TTAPS Twitter Tyndall, John Uganda Entebbe Ukraine Chernobyl Disaster (1986) Ulbricht, Walter Ulrich, Bernard Undén, Östen Union of Concerned Scientists formation of (1968) United Arab Emirates (UAE) Abu Dhabi United Kingdom (UK) Climate Change Act Cumbria Department for 
Energy and Climate Change Department for International Development Department for Transport domestic electricity prices in Drax power station electricity grid infrastructure in Foreign and Commonwealth Office London National Grid United Nations (UN) Charter of Economic Commission for Europe (UNECE) Educational, Scientific and Cultural Organization (UNESCO) Environment Programme (UNEP) Framework Convention on Climate Change (1992) General Assembly Geneva Convention on Long-Range Transboundary Air Pollution (1979) Global Impact Resolution (1998) Security Council United States Ahoskie, NC Air Quality Agreement (1991) American Clean Energy and Security Act (Waxman-Markey Bill) (2009) Buffalo, NY California Global Warming Solutions Act (2006) California Renewable Portfolio Standard Program Californian Coastal Commission Californian Energy Crisis (2001–2002) Central Intelligence Agency (CIA) Chicago, IL Clean Air Act (1970) Clean Power Plan (2015) Congress Constitution of Court of Appeals Declaration of Independence (1776) Department for Justice Department of Health Department of the Interior Energy Independence and Security Act (2007) Environmental Protection Agency (EPA) Federal Bureau of Investigation (FBI) Freedom of Information Act House of Representatives Ithaca, NY Los Angeles, CA National Acid Precipitation Assessment Program (NAPAP) natural gas and oil output of New Deal New York Office of Management and Budget Office of Science and Technology Policy oil reserves of Palo Alto, CA Pentagon Phoenix, AZ Proposition 23 (2010) Sacramento, CA Safe Drinking Water Act (2005) San Francisco, CA Santa Barbara oil spill (1969) Senate Senate Committee on Environment and Public Works (EPW) Senate Foreign Relations Committee Senate Judiciary Committee Silicon Valley State Department subprime mortgage crisis (2007–2009) Supreme Court Washington, D.C.


Four Battlegrounds by Paul Scharre

2021 United States Capitol attack, 3D printing, active measures, activist lawyer, AI winter, AlphaGo, amateurs talk tactics, professionals talk logistics, artificial general intelligence, ASML, augmented reality, Automated Insights, autonomous vehicles, barriers to entry, Berlin Wall, Big Tech, bitcoin, Black Lives Matter, Boeing 737 MAX, Boris Johnson, Brexit referendum, business continuity plan, business process, carbon footprint, chief data officer, Citizen Lab, clean water, cloud computing, commoditize, computer vision, coronavirus, COVID-19, crisis actor, crowdsourcing, DALL-E, data is not the new oil, data is the new oil, data science, deep learning, deepfake, DeepMind, Demis Hassabis, Deng Xiaoping, digital map, digital rights, disinformation, Donald Trump, drone strike, dual-use technology, Elon Musk, en.wikipedia.org, endowment effect, fake news, Francis Fukuyama: the end of history, future of journalism, future of work, game design, general purpose technology, Geoffrey Hinton, geopolitical risk, George Floyd, global supply chain, GPT-3, Great Leap Forward, hive mind, hustle culture, ImageNet competition, immigration reform, income per capita, interchangeable parts, Internet Archive, Internet of things, iterative process, Jeff Bezos, job automation, Kevin Kelly, Kevin Roose, large language model, lockdown, Mark Zuckerberg, military-industrial complex, move fast and break things, Nate Silver, natural language processing, new economy, Nick Bostrom, one-China policy, Open Library, OpenAI, PalmPilot, Parler "social media", pattern recognition, phenotype, post-truth, purchasing power parity, QAnon, QR code, race to the bottom, RAND corporation, recommendation engine, reshoring, ride hailing / ride sharing, robotic process automation, Rodney Brooks, Rubik’s Cube, self-driving car, Shoshana Zuboff, side project, Silicon Valley, slashdot, smart cities, smart meter, Snapchat, social software, sorting algorithm, South China Sea, sparse data, speech recognition, 
Steve Bannon, Steven Levy, Stuxnet, supply-chain attack, surveillance capitalism, systems thinking, tech worker, techlash, telemarketer, The Brussels Effect, The Signal and the Noise by Nate Silver, TikTok, trade route, TSMC

YouTube’s algorithm for recommending videos to watch next has come under fire for promoting harmful content, from conspiracy theory videos to extremist content. YouTube executives have stated that over 70 percent of viewing hours are driven by the algorithm. Google engineers described the deep learning algorithm as “one of the largest-scale and most sophisticated industrial recommendation systems in existence.” Yet multiple independent researchers, journalists, and even a former Google engineer claimed in 2018 the algorithm was biased toward more extreme and incendiary content, leading viewers video-by-video down a “rabbit hole” of conspiracy theories and misinformation. Critics have speculated that the effect was not intentional, but rather that the algorithm was responding to increased viewer engagement with more sensational material in a “feedback loop” that trained the machine learning system to provide viewers with more inflammatory content.

The Complete Guide,” Hootsuite Blog, June 21, 2021, https://blog.hootsuite.com/how-the-youtube-algorithm-works/; Paige Cooper, “How the Facebook Algorithm Works in 2021 and How to Make It Work for You,” Hootsuite Blog, February 10, 2021, https://blog.hootsuite.com/facebook-algorithm/. 144 more sophisticated algorithm: Eric Meyerson, “YouTube Now: Why We Focus on Watch Time,” YouTube Official Blog, August 10, 2012, https://blog.youtube/news-and-events/youtube-now-why-we-focus-on-watch-time. 144 deep learning to improve their algorithms: Koumchatzky and Andryeyev, “Using Deep Learning at Scale in Twitter’s Timelines.” 144 9.3 million problematic videos: “YouTube Community Guidelines Enforcement,” Google Transparency Report, June 2021, https://transparencyreport.google.com/youtube-policy/removals. 145 algorithm for recommending videos to watch next: Paul Lewis, “‘Fiction Is Outperforming Reality’: How YouTube’s Algorithm Distorts Truth,” The Guardian, February 2, 2018, https://www.theguardian.com/technology/2018/feb/02/how-youtubes-algorithm-distorts-truth; Zeynep Tufekci, “YouTube, the Great Radicalizer,” New York Times, March 10, 2018, https://www.nytimes.com/2018/03/10/opinion/sunday/youtube-politics-radical.html; Sam Levin, “Las Vegas Survivors Furious as YouTube Promotes Clips Calling Shooting a Hoax,” The Guardian, October 4, 2017, https://www.theguardian.com/us-news/2017/oct/04/las-vegas-shooting-youtube-hoax-conspiracy-theories; Clive Thompson, “YouTube’s Plot to Silence Conspiracy Theories,” Wired, September 18, 2020, https://www.wired.com/story/youtube-algorithm-silence-conspiracy-theories/. 145 over 70 percent of viewing hours are driven by the algorithm: Joan E. Solsman, “YouTube’s AI Is the Puppet Master over Most of What You Watch,” CNET, January 10, 2018, https://www.cnet.com/news/youtube-ces-2018-neal-mohan/.
145 “one of the largest-scale and most sophisticated industrial recommendation systems”: Paul Covington, Jay Adams, and Emre Sargin, Deep Neural Networks for YouTube Recommendations (Google, 2016), https://research.google.com/pubs/archive/45530.pdf. 145 even a former Google engineer: Lewis, “‘Fiction Is Outperforming Reality’”; Guillaume Chaslot, “The Toxic Potential of YouTube’s Feedback Loop,” Wired, July 13, 2019, https://www.wired.com/story/the-toxic-potential-of-youtubes-feedback-loop/. 145 more extreme and incendiary content: Lewis, “‘Fiction Is Outperforming Reality’”; Tufekci, “YouTube, the Great Radicalizer”; Nicas, “How YouTube Drives People to the Internet’s Darkest Corners,” Wall Street Journal, February 7, 2018, https://www.wsj.com/articles/how-youtube-drives-viewers-to-the-internets-darkest-corners-1518020478. 145 “rabbit hole” of conspiracy theories: Kevin Roose, “The Making of a YouTube Radical,” New York Times, June 8, 2019, https://www.nytimes.com/interactive/2019/06/08/technology/youtube-radical.html; Tufekci, “YouTube, the Great Radicalizer”; Max Fisher and Amanda Taub, “How YouTube Radicalized Brazil,” New York Times, August 11, 2019, https://www.nytimes.com/2019/08/11/world/americas/youtube-brazil.html; Thompson, “YouTube’s Plot to Silence Conspiracy Theories.” 145 responding to increased viewer engagement: Chaslot, “The Toxic Potential of YouTube’s Feedback Loop.” 145 denied that a “rabbit hole” effect exists: Kevin Roose, “YouTube’s Product Chief on Online Radicalization and Algorithmic Rabbit Holes,” New York Times, March 29, 2019, https://www.nytimes.com/2019/03/29/technology/youtube-online-extremism.html. 145 opacity of machine learning algorithms: Chico Q.

in Proceedings of the 32nd International Conference on Machine Learning (Lille, France, 2015), https://arxiv.org/pdf/1804.07933.pdf; ilmoi, “Poisoning Attacks on Machine Learning,” towards data science, July 14, 2019, https://towardsdatascience.com/poisoning-attacks-on-machine-learning-1ff247c254db. 245 recommendation algorithms: Hai Huang, “Data Poisoning Attacks to Deep Learning Based Recommender Systems,” (paper, Network and Distributed Systems Security (NDSS) Symposium 2021, February 21–25, 2021), https://arxiv.org/pdf/2101.02644.pdf. 245 poison a medical AI model: Matthew Jagielski et al., Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning (arXiv.org, September 28, 2021), https://arxiv.org/pdf/1804.00308.pdf. 245 manipulate real-world data: Ricky Laishram and Vir Virander Phoha, Curie: A method for protecting SVM Classifier from Poisoning Attack (arXiv.org, June 7, 2016), https://arxiv.org/pdf/1606.01584.pdf; Zhao et al., “Efficient Label Contamination Attacks.” 245 data from external sources: Zhao et al., “Efficient Label Contamination Attacks.” 245 alter the data or even just the label: Zhao et al., “Efficient Label Contamination Attacks.” 245 insert adversarial noise into the training data: Adrien Chan-Hon-Tong, “An Algorithm for Generating Invisible Data Poisoning Using Adversarial Noise That Breaks Image Classification Deep Learning,” Machine Learning & Knowledge Extraction 1, no. 1 (November 9, 2018), https://doi.org/10.3390/make1010011; Ali Shafahi et al., “Poison Frogs!


pages: 493 words: 139,845

Women Leaders at Work: Untold Tales of Women Achieving Their Ambitions by Elizabeth Ghaffari

"World Economic Forum" Davos, Albert Einstein, AltaVista, Bear Stearns, business cycle, business process, cloud computing, Columbine, compensation consultant, corporate governance, corporate social responsibility, dark matter, deal flow, do what you love, family office, Fellow of the Royal Society, financial independence, follow your passion, glass ceiling, Grace Hopper, high net worth, John Elkington, knowledge worker, Larry Ellison, Long Term Capital Management, longitudinal study, Oklahoma City bombing, performance metric, pink-collar, profit maximization, profit motive, recommendation engine, Ronald Reagan, Savings and loan crisis, shareholder value, Silicon Valley, Silicon Valley startup, Steve Ballmer, Steve Jobs, thinkpad, trickle-down economics, urban planning, women in the workforce, young professional

Kate just came back from rural India, studying the ways people there use technology. These are intriguing issues. I find it especially interesting to bring such people together with more mathematical people like me. I have worked on models of social networks and recommendation systems that exist in social networks. When I talk to danah, I'm trying to understand what people are seeking through recommendation systems. When you merge qualitative and quantitative skill sets, it takes a while for each to adapt to the other because there are language barriers and differences in what we're trying to achieve. When we finally do achieve something jointly, I find that it's usually very good and very deep. (Footnote 3: The lowercase spelling of danah boyd is “how she chooses to identify” herself.)


pages: 527 words: 147,690

Terms of Service: Social Media and the Price of Constant Connection by Jacob Silverman

"World Economic Forum" Davos, 23andMe, 4chan, A Declaration of the Independence of Cyberspace, Aaron Swartz, Airbnb, airport security, Amazon Mechanical Turk, augmented reality, basic income, Big Tech, Brian Krebs, California gold rush, Californian Ideology, call centre, cloud computing, cognitive dissonance, commoditize, company town, context collapse, correlation does not imply causation, Credit Default Swap, crowdsourcing, data science, deep learning, digital capitalism, disinformation, don't be evil, driverless car, drone strike, Edward Snowden, Evgeny Morozov, fake it until you make it, feminist movement, Filter Bubble, Firefox, Flash crash, game design, global village, Google Chrome, Google Glasses, Higgs boson, hive mind, Ian Bogost, income inequality, independent contractor, informal economy, information retrieval, Internet of things, Jacob Silverman, Jaron Lanier, jimmy wales, John Perry Barlow, Kevin Kelly, Kevin Roose, Kickstarter, knowledge economy, knowledge worker, Larry Ellison, late capitalism, Laura Poitras, license plate recognition, life extension, lifelogging, lock screen, Lyft, machine readable, Mark Zuckerberg, Mars Rover, Marshall McLuhan, mass incarceration, meta-analysis, Minecraft, move fast and break things, national security letter, Network effects, new economy, Nicholas Carr, Occupy movement, off-the-grid, optical character recognition, payday loans, Peter Thiel, planned obsolescence, postindustrial economy, prediction markets, pre–internet, price discrimination, price stability, profit motive, quantitative hedge fund, race to the bottom, Ray Kurzweil, real-name policy, recommendation engine, rent control, rent stabilization, RFID, ride hailing / ride sharing, Salesforce, self-driving car, sentiment analysis, shareholder value, sharing economy, Sheryl Sandberg, Silicon Valley, Silicon Valley ideology, Snapchat, social bookmarking, social graph, social intelligence, social web, sorting algorithm, Steve Ballmer, Steve Jobs, Steven Levy, 
systems thinking, TaskRabbit, technological determinism, technological solutionism, technoutopianism, TED Talk, telemarketer, transportation-network company, Travis Kalanick, Turing test, Uber and Lyft, Uber for X, uber lyft, universal basic income, unpaid internship, women in the workforce, Y Combinator, yottabyte, you are the product, Zipcar

Even so, companies remain extraordinarily reliant on these reviews. A 2011 Harvard Business School study found that, on Yelp, “an extra star is worth an extra 5 to 9 percent in revenue.” The result of all this reviewing has been the atrophying of the critical culture, with professional critics seen as dispensable, nothing more than recommendation engines who can be replaced with algorithms and free, crowdsourced reviews. (Even so, some prominent cultural critics remain, though with less influence than they used to hold, and a smattering of publications, from the actuarially precise Consumer Reports to the liberal humanist New York Review of Books, continue to thrive.)


Mastering Machine Learning With Scikit-Learn by Gavin Hackeling

backpropagation, computer vision, constrained optimization, correlation coefficient, data science, Debian, deep learning, distributed generation, iterative process, natural language processing, Occam's razor, optical character recognition, performance metric, recommendation engine

For example, assume that your training data consists of the samples plotted in the following figure: Clustering might reveal the following two groups, indicated by squares and circles: Clustering could also reveal the following four groups: Clustering is commonly used to explore a dataset. Social networks can be clustered to identify communities and to suggest missing connections between people. In biology, clustering is used to find groups of genes with similar expression patterns. Recommendation systems sometimes employ clustering to identify products or media that might appeal to a user. In marketing, clustering is used to find segments of similar consumers. In the following sections, we will work through an example of using the K-Means algorithm to cluster a dataset. Clustering with the K-Means algorithm The K-Means algorithm is a clustering method that is popular because of its speed and scalability.
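To make the idea in this excerpt concrete, here is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It is not taken from the book, which works with scikit-learn; the function names, the sample points, and the choice of k=2 are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Assumption: initialise centroids by sampling k distinct input points.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # Converged: no point changed cluster.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = mean(members)
    return centroids, assignments


# Hypothetical data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(labels)
```

The cluster indices themselves are arbitrary; what matters is that the first three points share one label and the last three share the other. A recommender could apply the same grouping to user or item feature vectors, which is the use the excerpt alludes to.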


pages: 170 words: 51,205

Information Doesn't Want to Be Free: Laws for the Internet Age by Cory Doctorow, Amanda Palmer, Neil Gaiman

Airbnb, barriers to entry, Big Tech, Brewster Kahle, cloud computing, Dean Kamen, Edward Snowden, game design, general purpose technology, Internet Archive, John von Neumann, Kickstarter, Large Hadron Collider, machine readable, MITM: man-in-the-middle, optical character recognition, plutocrats, pre–internet, profit maximization, recommendation engine, rent-seeking, Saturday Night Live, Skype, Steve Jobs, Steve Wozniak, Stewart Brand, Streisand effect, technological determinism, transfer pricing, Whole Earth Catalog, winner-take-all economy

Customers don’t necessarily deliver themselves to “stores”—virtual or physical—and when they do, the titles on offer are rarely the neatly curated, finite, and browsable selections that once dominated. The shelves, instead, are nearly infinite. Browsing has been augmented by search algorithms and automated recommendation systems. And the number of ways for customers to discover new work has exploded. Word of mouth has always been a creator’s best friend. Recommendations from personally trusted sources were a surefire way to sell products. When I worked in a bookstore, one of the most reliable indicators of an imminent sale was two friends entering the store together, and one of them picking up a book and handing it to the other with the words “Oh, you’ve got to read this; you’ll love it.”


pages: 606 words: 157,120

To Save Everything, Click Here: The Folly of Technological Solutionism by Evgeny Morozov

"World Economic Forum" Davos, 3D printing, algorithmic bias, algorithmic trading, Amazon Mechanical Turk, An Inconvenient Truth, Andrew Keen, augmented reality, Automated Insights, behavioural economics, Berlin Wall, big data - Walmart - Pop Tarts, Buckminster Fuller, call centre, carbon footprint, Cass Sunstein, choice architecture, citizen journalism, classic study, cloud computing, cognitive bias, creative destruction, crowdsourcing, data acquisition, Dava Sobel, digital divide, disintermediation, Donald Shoup, driverless car, East Village, en.wikipedia.org, Evgeny Morozov, Fall of the Berlin Wall, Filter Bubble, Firefox, Francis Fukuyama: the end of history, frictionless, future of journalism, game design, gamification, Gary Taubes, Google Glasses, Ian Bogost, illegal immigration, income inequality, invention of the printing press, Jane Jacobs, Jean Tirole, Jeff Bezos, jimmy wales, Julian Assange, Kevin Kelly, Kickstarter, license plate recognition, lifelogging, lolcat, lone genius, Louis Pasteur, machine readable, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, moral panic, Narrative Science, Nelson Mandela, Nicholas Carr, packet switching, PageRank, Parag Khanna, Paul Graham, peer-to-peer, Peter Singer: altruism, Peter Thiel, pets.com, placebo effect, pre–internet, public intellectual, Ray Kurzweil, recommendation engine, Richard Thaler, Ronald Coase, Rosa Parks, self-driving car, Sheryl Sandberg, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, Skype, Slavoj Žižek, smart meter, social graph, social web, stakhanovite, Steve Jobs, Steven Levy, Stuxnet, surveillance capitalism, systems thinking, technoutopianism, TED Talk, the built environment, The Chicago School, The Death and Life of Great American Cities, the medium is the message, The Nature of the Firm, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas L Friedman, transaction costs, Twitter Arab Spring, urban decay, urban 
planning, urban sprawl, Vannevar Bush, warehouse robotics, WikiLeaks, work culture, Yochai Benkler

Ruck.us then calculates your “political DNA” in order to match you with similar users and encourage you to join relevant “rucks” (according to the site, “the word comes from rugby, where players form a ruck when they loosely come together to fight the other team for possession of the ball.”). Ruck.us is like Netflix for politics, with its cause-recommendation engine essentially encouraging you to, say, check out a campaign to ban abortion if you have expressed strong opposition to gun control, much in the way that Netflix would recommend that you check out Rambo if you liked Rocky. Once in a “ruck,” members can simply follow news posted by other members or be more proactive and share information themselves: links to relevant petitions, organizations, and events are particularly encouraged.


pages: 625 words: 167,349

The Alignment Problem: Machine Learning and Human Values by Brian Christian

Albert Einstein, algorithmic bias, Alignment Problem, AlphaGo, Amazon Mechanical Turk, artificial general intelligence, augmented reality, autonomous vehicles, backpropagation, butterfly effect, Cambridge Analytica, Cass Sunstein, Claude Shannon: information theory, computer vision, Computing Machinery and Intelligence, data science, deep learning, DeepMind, Donald Knuth, Douglas Hofstadter, effective altruism, Elaine Herzberg, Elon Musk, Frances Oldham Kelsey, game design, gamification, Geoffrey Hinton, Goodhart's law, Google Chrome, Google Glasses, Google X / Alphabet X, Gödel, Escher, Bach, Hans Moravec, hedonic treadmill, ImageNet competition, industrial robot, Internet Archive, John von Neumann, Joi Ito, Kenneth Arrow, language acquisition, longitudinal study, machine translation, mandatory minimum, mass incarceration, multi-armed bandit, natural language processing, Nick Bostrom, Norbert Wiener, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, OpenAI, Panopticon Jeremy Bentham, pattern recognition, Peter Singer: altruism, Peter Thiel, precautionary principle, premature optimization, RAND corporation, recommendation engine, Richard Feynman, Rodney Brooks, Saturday Night Live, selection bias, self-driving car, seminal paper, side project, Silicon Valley, Skinner box, sparse data, speech recognition, Stanislav Petrov, statistical model, Steve Jobs, strong AI, the map is not the territory, theory of mind, Tim Cook: Apple, W. E. B. Du Bois, Wayback Machine, zero-sum game

In this sense they will be like butlers who are paid on commission; they will never help us without at least implicitly wanting something in return. They will make astute inferences we don’t necessarily want them to make. And we will come to realize that we are now—already, in the present—almost never acting alone. A friend of mine is in recovery from an alcohol addiction. The ad recommendation engines of their social media accounts know all too much. Their feed is infested with ads for alcohol. Now here’s a person, their preference model says, who LOVES alcohol. As the British writer Iris Murdoch wrote: “Self-knowledge will lead us to avoid occasions of temptation rather than rely on naked strength to overcome them.”54 For any addiction or compulsion, the better part of wisdom tells us—in the case of alcohol, say—that it’s better to throw out every last drop in our home than it is to have it around and not drink it.


pages: 245 words: 64,288

Robots Will Steal Your Job, But That's OK: How to Survive the Economic Collapse and Be Happy by Pistono, Federico

3D printing, Albert Einstein, autonomous vehicles, bioinformatics, Buckminster Fuller, cloud computing, computer vision, correlation does not imply causation, en.wikipedia.org, epigenetics, Erik Brynjolfsson, Firefox, future of work, gamification, George Santayana, global village, Google Chrome, happiness index / gross national happiness, hedonic treadmill, illegal immigration, income inequality, information retrieval, Internet of things, invention of the printing press, Jeff Hawkins, jimmy wales, job automation, John Markoff, Kevin Kelly, Khan Academy, Kickstarter, Kiva Systems, knowledge worker, labor-force participation, Lao Tzu, Law of Accelerating Returns, life extension, Loebner Prize, longitudinal study, means of production, Narrative Science, natural language processing, new economy, Occupy movement, patent troll, pattern recognition, peak oil, post scarcity, QR code, quantum entanglement, race to the bottom, Ray Kurzweil, recommendation engine, RFID, Rodney Brooks, selection bias, self-driving car, seminal paper, slashdot, smart cities, software as a service, software is eating the world, speech recognition, Steven Pinker, strong AI, synthetic biology, technological singularity, TED Talk, Turing test, Vernor Vinge, warehouse automation, warehouse robotics, women in the workforce

The classical “Turing test approach” has been largely abandoned as a realistic research goal, and is now just an intellectual curiosity (the annual Loebner prize for realistic chatbots), but helped spawn the two dominant themes of modern cognition and artificial intelligence: calculating probabilities and producing complex behaviour from the interaction of many small, simple processes. As of today (2012), we believe these represent more closely what the human brain does, and they have been used in a variety of real-world applications: Google’s autonomous cars, search results, recommendation systems, automated language translation, personal assistants, cybernetic computational search engines, and IBM’s newest super brain Watson. Natural language processing was believed to be a task that only humans could accomplish. A word can have different meanings depending on the context, and a phrase might not mean what it says if it is a joke or a pun.


pages: 247 words: 71,698

Avogadro Corp by William Hertling

Any sufficiently advanced technology is indistinguishable from magic, cloud computing, crowdsourcing, Hacker Ethic, hive mind, invisible hand, messenger bag, natural language processing, Netflix Prize, off-the-grid, private military company, Ray Kurzweil, Recombinant DNA, recommendation engine, Richard Stallman, Ruby on Rails, standardized shipping container, tech worker, technological singularity, Turing test, web application, WikiLeaks

“I wish I could find something,” he finally said, “but I don’t know what. There’s this brilliant self-taught Serbian kid who is doing some stuff with artificial intelligence algorithms, and he’s doing it all on his home PC. I’ve been reading his blog, and it sounds like he has some really novel approaches to recommendation systems. But I don’t see any way we could duplicate what he’s doing before the end of the week.” Mike was really grasping at straws. Thin straws at that. He hated to bring bad news to David. “Maybe we can turn down the accuracy of the system. If we use fewer language-goal clusters, we can run with less memory and fewer processor cycles.


pages: 270 words: 64,235

Effective Programming: More Than Writing Code by Jeff Atwood

AltaVista, Amazon Web Services, barriers to entry, cloud computing, endowment effect, fail fast, Firefox, fizzbuzz, Ford Model T, future of work, game design, gamification, Google Chrome, gravity well, Hacker News, job satisfaction, Khan Academy, Kickstarter, loss aversion, Marc Andreessen, Mark Zuckerberg, Merlin Mann, Minecraft, Paul Buchheit, Paul Graham, price anchoring, race to the bottom, recommendation engine, science of happiness, Skype, social software, Steve Jobs, systems thinking, TED Talk, Tragedy of the Commons, web application, Y Combinator, zero-sum game

Here’s the one bit that struck me as most essential: We’ve sometimes referred to the Netflix software architecture in AWS as our Rambo Architecture. Each system has to be able to succeed, no matter what, even all on its own. We’re designing each distributed system to expect and tolerate failure from other systems on which it depends. If our recommendations system is down, we degrade the quality of our responses to our customers, but we still respond. We’ll show popular titles instead of personalized picks. If our search system is intolerably slow, streaming should still work perfectly fine. One of the first systems our engineers built in AWS is called the Chaos Monkey.
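The degradation described in this excerpt can be sketched as a simple fallback wrapper. This is an illustrative sketch only, not Netflix's code: the exception type, function names, and title lists below are hypothetical.

```python
# Hypothetical non-personalized fallback list.
POPULAR_TITLES = ["Popular Title A", "Popular Title B", "Popular Title C"]


class ServiceUnavailable(Exception):
    """Raised by a dependency that cannot answer right now (assumed name)."""


def titles_for_user(user_id, recommender, fallback=POPULAR_TITLES):
    """Always return a response, degrading quality if the recommender fails."""
    try:
        return {"titles": recommender(user_id), "personalized": True}
    except ServiceUnavailable:
        # Degrade instead of failing: serve popular titles to everyone.
        return {"titles": list(fallback), "personalized": False}


def healthy_recommender(user_id):
    """Stand-in for a working recommendations service."""
    return [f"Pick for user {user_id} #1", f"Pick for user {user_id} #2"]


def broken_recommender(user_id):
    """Stand-in for a recommendations service that is down."""
    raise ServiceUnavailable("recommendations system is down")


print(titles_for_user(42, healthy_recommender))
print(titles_for_user(42, broken_recommender))
```

The caller gets a usable answer in both cases, and the `personalized` flag records that the second response was degraded.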


Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport

Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, commoditize, data acquisition, data science, disruptive innovation, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, lifelogging, Mark Zuckerberg, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, self-driving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining, Thomas Davenport, three-martini lunch

For instance, LinkedIn has used some of its own internal data to predict which companies will buy LinkedIn products, and even who in those firms has the highest likelihood of buying. This work led to an internal recommendation system for salespeople that makes it much easier for them to get the data in one place, and has improved conversion rates by several hundred percent. LinkedIn’s cofounder, Reid Hoffman, is a strong advocate for big data: Because of Web 2.0 [the explosion of social networks and consumer participation in the web] and the increasing number of sensors, there’s all this data.


pages: 204 words: 67,922

Elsewhere, U.S.A: How We Got From the Company Man, Family Dinners, and the Affluent Society to the Home Office, BlackBerry Moms, and Economic Anxiety by Dalton Conley

Alan Greenspan, assortative mating, call centre, clean water, commoditize, company town, dematerialisation, demographic transition, Edward Glaeser, extreme commuting, feminist movement, financial independence, Firefox, Frank Levy and Richard Murnane: The New Division of Labor, Home mortgage interest deduction, income inequality, informal economy, insecure affluence, It's morning again in America, Jane Jacobs, Joan Didion, John Maynard Keynes: Economic Possibilities for our Grandchildren, knowledge economy, knowledge worker, labor-force participation, late capitalism, low interest rates, low skilled workers, manufacturing employment, mass immigration, McMansion, Michael Shellenberger, mortgage tax deduction, new economy, off grid, oil shock, PageRank, Paradox of Choice, Ponzi scheme, positional goods, post-industrial society, post-materialism, principal–agent problem, recommendation engine, Richard Florida, rolodex, Ronald Reagan, Silicon Valley, Skype, statistical model, Ted Nordhaus, The Death and Life of Great American Cities, The Great Moderation, the long tail, the strength of weak ties, The Wealth of Nations by Adam Smith, Thomas Malthus, Thorstein Veblen, Tragedy of the Commons, transaction costs, women in the workforce, Yom Kippur War

Not only would my local video store not have been able to afford the shelf space to stock Ring of Bright Water, but the issue more germane to the present discussion is that I would have never even known to ask for it. In fact, short of some chance encounter of a recommendation at a dinner party, I would have never even known that this 1969 British film existed. The fact that I now know it exists can be attributed to the network basis of the Netflix recommendation system. The connected economy, then, does not merely facilitate sameness and the diffusion of hits. It can encourage niche consumption (as Chris Anderson celebrates in The Long Tail). But as wonderful as it is to have a computer recommend a sleeper film that even the slacker clerks at my neighborhood video store wouldn’t be able to name, there is a subtle cost to this form of knowledge diffusion.


pages: 210 words: 65,833

This Is Not Normal: The Collapse of Liberal Britain by William Davies

Airbnb, basic income, Bernie Sanders, Big bang: deregulation of the City of London, Black Lives Matter, Boris Johnson, Cambridge Analytica, central bank independence, centre right, Chelsea Manning, coronavirus, corporate governance, COVID-19, credit crunch, data science, deindustrialization, disinformation, Dominic Cummings, Donald Trump, double entry bookkeeping, Edward Snowden, fake news, family office, Filter Bubble, Francis Fukuyama: the end of history, ghettoisation, gig economy, global pandemic, global village, illegal immigration, Internet of things, Jeremy Corbyn, late capitalism, Leo Hollis, liberal capitalism, loadsamoney, London Interbank Offered Rate, mass immigration, moral hazard, Neil Kinnock, Northern Rock, old-boy network, post-truth, postnationalism / post nation state, precariat, prediction markets, quantitative easing, recommendation engine, Robert Mercer, Ronald Reagan, sentiment analysis, sharing economy, Silicon Valley, Slavoj Žižek, statistical model, Steve Bannon, Steven Pinker, surveillance capitalism, technoutopianism, The Chicago School, Thorstein Veblen, transaction costs, universal basic income, W. E. B. Du Bois, web of trust, WikiLeaks, Yochai Benkler

Distrust and audit culture work in a vicious circle, generating a spiral of surveillance and paranoia. Once suspicions are cast on others – be they public officials, teachers or other members of our community – no amount of data will be sufficient to alleviate them. The platform economy drives this into everyday life. Reputation and recommendations systems were originally unveiled with the promise of establishing trust between strangers, for instance on eBay. But Airbnb is now increasingly plagued by the phenomenon of sellers installing secret cameras around their homes, to seek additional proof of a buyer’s honesty. The authority of language is downgraded in the process.


pages: 296 words: 66,815

The AI-First Company by Ash Fontana

23andMe, Amazon Mechanical Turk, Amazon Web Services, autonomous vehicles, barriers to entry, blockchain, business intelligence, business process, business process outsourcing, call centre, Charles Babbage, chief data officer, Clayton Christensen, cloud computing, combinatorial explosion, computer vision, crowdsourcing, data acquisition, data science, deep learning, DevOps, en.wikipedia.org, Geoffrey Hinton, independent contractor, industrial robot, inventory management, John Conway, knowledge economy, Kubernetes, Lean Startup, machine readable, minimum viable product, natural language processing, Network effects, optical character recognition, Pareto efficiency, performance metric, price discrimination, recommendation engine, Ronald Coase, Salesforce, single source of truth, software as a service, source of truth, speech recognition, the scientific method, transaction costs, vertical integration, yield management

First, it gathered a great deal of data on products and helped customers make better buying decisions by putting all of that data in the product listings, providing comparison tables with structured product information. More information meant better comparisons and decisions. Then Amazon invested in a team to build machine learned search and recommendation systems: A9. This team effectively got that product data and matched it with purchase data to learn which products customers want to buy so that Amazon could recommend similar products to those customers in listing pages and search results. Gathering a lot of data started the entry-level network effect: Amazon was the most useful shopping website to consumers because it had the most product information.


pages: 234 words: 67,589

Internet for the People: The Fight for Our Digital Future by Ben Tarnoff

4chan, A Declaration of the Independence of Cyberspace, accounting loophole / creative accounting, Alan Greenspan, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic management, AltaVista, Amazon Web Services, barriers to entry, Bernie Sanders, Big Tech, Black Lives Matter, blue-collar work, business logic, call centre, Charles Babbage, cloud computing, computer vision, coronavirus, COVID-19, decentralized internet, deep learning, defund the police, deindustrialization, desegregation, digital divide, disinformation, Edward Snowden, electricity market, fake news, Filter Bubble, financial intermediation, future of work, gamification, General Magic, gig economy, God and Mammon, green new deal, independent contractor, information asymmetry, Internet of things, Jeff Bezos, Jessica Bruder, John Markoff, John Perry Barlow, Kevin Roose, Kickstarter, Leo Hollis, lockdown, lone genius, low interest rates, Lyft, Mark Zuckerberg, means of production, Menlo Park, natural language processing, Network effects, Nicholas Carr, packet switching, PageRank, pattern recognition, pets.com, profit maximization, profit motive, QAnon, recommendation engine, rent-seeking, ride hailing / ride sharing, Sheryl Sandberg, Shoshana Zuboff, side project, Silicon Valley, single-payer health, smart grid, social distancing, Steven Levy, stock buybacks, supply-chain management, surveillance capitalism, techlash, Telecommunications Act of 1996, TikTok, transportation-network company, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, undersea cable, UUNET, vertical integration, Victor Gruen, web application, working poor, Yochai Benkler

This messiness is manifest in online spaces, contrary to the “filter bubbles” thesis—which, like the theory that polarization is produced by social media, has scant evidence to support it. People can and do find like-minded interlocutors on the internet, and the algorithms that underpin social media feeds and recommendation systems can contribute to these clusterings. But the conversations that ensue rarely resemble an echo chamber, with everyone parroting the same party line. When the researchers P. M. Krafft and Joan Donovan examined the origins of one campaign to spread false information on 4chan, a message board popular with the far Right, they found “widespread heterogeneity of beliefs and contestation of the claims.”


pages: 757 words: 193,541

The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2 by Thomas A. Limoncelli, Strata R. Chalup, Christina J. Hogan

active measures, Amazon Web Services, anti-pattern, barriers to entry, business process, cloud computing, commoditize, continuous integration, correlation coefficient, database schema, Debian, defense in depth, delayed gratification, DevOps, domain-specific language, en.wikipedia.org, fault tolerance, finite state, Firefox, functional programming, Google Glasses, information asymmetry, Infrastructure as a Service, intermodal, Internet of things, job automation, job satisfaction, Ken Thompson, Kickstarter, level 1 cache, load shedding, longitudinal study, loose coupling, machine readable, Malcom McLean invented shipping containers, Marc Andreessen, place-making, platform as a service, premature optimization, recommendation engine, revision control, risk tolerance, Salesforce, scientific management, seminal paper, side project, Silicon Valley, software as a service, sorting algorithm, standardized shipping container, statistical model, Steven Levy, supply-chain management, systems thinking, The future is already here, Toyota Production System, vertical integration, web application, Yogi Berra

It could not be installed by users because the framework does not permit Python libraries that include portions written in compiled languages. PaaS provides many high-level services including storage services, database services, and many of the same services available in IaaS offerings. Some offer more esoteric services such as Google’s Machine Learning service, which can be used to build a recommendation engine. Additional services are announced periodically. 3.1.3 Software as a Service SaaS is what we used to call a web site before the marketing department decided adding “as a service” made it more appealing. SaaS is a web-accessible application. The application is the service, and you interact with it as you would any web site.


pages: 236 words: 77,098

I Live in the Future & Here's How It Works: Why Your World, Work, and Brain Are Being Creatively Disrupted by Nick Bilton

3D printing, 4chan, Albert Einstein, augmented reality, barriers to entry, Cass Sunstein, death of newspapers, en.wikipedia.org, Internet of things, Joan Didion, John Gruber, John Markoff, Marshall McLuhan, Nicholas Carr, QR code, recommendation engine, RFID, Saturday Night Live, Steve Jobs, Steven Pinker, Stewart Brand, TED Talk, The future is already here

Here are three different ways people, especially young ones, may evaluate whether something is worth purchasing. Bad = Free My friend Mike loves music. In fact, Mike is a music fanatic. In every spare moment he has, Mike scours the Web and his social networks, searching for new music to listen to and potentially purchase. Like most of his friends, Mike uses his recommendation systems and social networks to find the music he’s interested in. He’ll preview a few songs, and if he decides the content is good, he’ll follow through with a purchase. He rarely buys entire albums because he believes most albums contain only one or two good songs. Mike also follows a handful of bands and immediately buys their entire albums on release day.


pages: 706 words: 202,591

Facebook: The Inside Story by Steven Levy

active measures, Airbnb, Airbus A320, Amazon Mechanical Turk, AOL-Time Warner, Apple's 1984 Super Bowl advert, augmented reality, Ben Horowitz, Benchmark Capital, Big Tech, Black Lives Matter, Blitzscaling, blockchain, Burning Man, business intelligence, Cambridge Analytica, cloud computing, company town, computer vision, crowdsourcing, cryptocurrency, data science, deep learning, disinformation, don't be evil, Donald Trump, Dunbar number, East Village, Edward Snowden, El Camino Real, Elon Musk, end-to-end encryption, fake news, Firefox, Frank Gehry, Geoffrey Hinton, glass ceiling, GPS: selective availability, growth hacking, imposter syndrome, indoor plumbing, information security, Jeff Bezos, John Markoff, Jony Ive, Kevin Kelly, Kickstarter, lock screen, Lyft, machine translation, Mahatma Gandhi, Marc Andreessen, Marc Benioff, Mark Zuckerberg, Max Levchin, Menlo Park, Metcalfe’s law, MITM: man-in-the-middle, move fast and break things, natural language processing, Network effects, Oculus Rift, operational security, PageRank, Paul Buchheit, paypal mafia, Peter Thiel, pets.com, post-work, Ray Kurzweil, recommendation engine, Robert Mercer, Robert Metcalfe, rolodex, Russian election interference, Salesforce, Sam Altman, Sand Hill Road, self-driving car, sexual politics, Sheryl Sandberg, Shoshana Zuboff, side project, Silicon Valley, Silicon Valley startup, skeuomorphism, slashdot, Snapchat, social contagion, social graph, social software, South of Market, San Francisco, Startup school, Steve Ballmer, Steve Bannon, Steve Jobs, Steven Levy, Steven Pinker, surveillance capitalism, tech billionaire, techlash, Tim Cook: Apple, Tragedy of the Commons, web application, WeWork, WikiLeaks, women in the workforce, Y Combinator, Y2K, you are the product

Facebook, she felt, had built an engine to push propaganda. She managed to get a meeting with a News Feed director, who conceded that some groups were problematic but that the company did not want to hamper free expression. “I wasn’t asking for suppression,” DiResta says. “I was saying your recommendation engine was growing this community!” * * * • • • IN FACT, HALFWAY around the world, there was terrifying proof of those fears. In the Philippines. By 2015, nearly all inhabitants of that Pacific island country of 10 million had been on Facebook for several years. A major factor in making this happen was the Internet.org Facebook program—hatched from the Growth team—known as Free Basics.


Writing Effective Use Cases by Alistair Cockburn

business process, c2.com, create, read, update, delete, finite state, index card, information retrieval, iterative process, operational security, recommendation engine, Silicon Valley, web application, work culture

System recalls the selected solution. 26c4. Continue at step 26 26d. Shopper wants to finance products in the shopping cart with available Finance Plans: 26d1. Shopper chooses to finance products in the shopping cart 26d2. System will present a series of questions that are dependent on previous answers to determine finance plan recommendations. System interfaces with Finance System to obtain credit rating approval. Initiate Obtain Finance Rating. 26d3. Shopper will select a finance plan 26d4. System will present a series of questions based on previous answers to determine details of the selected finance plan. 26d5. Shopper will view financial plan details and chooses to go with the plan. 26d6.


pages: 411 words: 80,925

What's Mine Is Yours: How Collaborative Consumption Is Changing the Way We Live by Rachel Botsman, Roo Rogers

"World Economic Forum" Davos, Abraham Maslow, Airbnb, Apollo 13, barriers to entry, behavioural economics, Bernie Madoff, bike sharing, Buckminster Fuller, business logic, buy and hold, carbon footprint, Cass Sunstein, collaborative consumption, collaborative economy, commoditize, Community Supported Agriculture, credit crunch, crowdsourcing, dematerialisation, disintermediation, en.wikipedia.org, experimental economics, Ford Model T, Garrett Hardin, George Akerlof, global village, hedonic treadmill, Hugh Fearnley-Whittingstall, information retrieval, intentional community, iterative process, Kevin Kelly, Kickstarter, late fees, Mark Zuckerberg, market design, Menlo Park, Network effects, new economy, new new economy, out of africa, Paradox of Choice, Parkinson's law, peer-to-peer, peer-to-peer lending, peer-to-peer rental, planned obsolescence, Ponzi scheme, pre–internet, public intellectual, recommendation engine, RFID, Richard Stallman, ride hailing / ride sharing, Robert Shiller, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, Simon Kuznets, Skype, slashdot, smart grid, South of Market, San Francisco, Stewart Brand, systems thinking, TED Talk, the long tail, The Nature of the Firm, The Spirit Level, the strength of weak ties, The Theory of the Leisure Class by Thorstein Veblen, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thorstein Veblen, Torches of Freedom, Tragedy of the Commons, transaction costs, traveling salesman, ultimatum game, Victor Gruen, web of trust, women in the workforce, work culture, Yochai Benkler, Zipcar

Collective Wisdom of Members At the same time, Netflix has built a sophisticated platform to foster a community among members, and to tailor recommendations to individual tastes. Talk to anyone who has ever used Netflix and they will tell you about how they “discovered releases,” “learned about classics,” and “found rare gems” they never would have found on their own at a store. Approximately 60 percent of members base their selections on Netflix’s Cinematch recommendations system. Early on, people’s willingness to share and rate the films they had watched and to make suggestions to “friends” surprised the founders. The user community itself adopted the ethos of “Millions of members helping you.” Impressively, there are now more than 2 billion ratings from members, and the average member has evaluated approximately two hundred movies.


pages: 239 words: 80,319

Lurking: How a Person Became a User by Joanne McNeil

"World Economic Forum" Davos, 4chan, A Declaration of the Independence of Cyberspace, Ada Lovelace, Adam Curtis, Airbnb, AltaVista, Amazon Mechanical Turk, Andy Rubin, benefit corporation, Big Tech, Black Lives Matter, Burning Man, Cambridge Analytica, Chelsea Manning, Chris Wanstrath, citation needed, cloud computing, context collapse, crowdsourcing, data science, deal flow, decentralized internet, delayed gratification, dematerialisation, disinformation, don't be evil, Donald Trump, drone strike, Edward Snowden, Elon Musk, eternal september, fake news, feminist movement, Firefox, gentrification, Google Earth, Google Glasses, Google Hangouts, green new deal, helicopter parent, holacracy, Internet Archive, invention of the telephone, Jeff Bezos, jimmy wales, John Perry Barlow, Jon Ronson, Julie Ann Horvath, Kim Stanley Robinson, l'esprit de l'escalier, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Max Levchin, means of production, Menlo Park, Mondo 2000, moral panic, move fast and break things, Neal Stephenson, Network effects, packet switching, PageRank, pre–internet, profit motive, Project Xanadu, QAnon, real-name policy, recommendation engine, Salesforce, Saturday Night Live, Sheryl Sandberg, Shoshana Zuboff, Silicon Valley, slashdot, Snapchat, social graph, Social Justice Warrior, Stephen Hawking, Steve Jobs, Steven Levy, Stewart Brand, subscription business, surveillance capitalism, tech worker, techlash, technoutopianism, Ted Nelson, TED Talk, Tim Cook: Apple, trade route, Turing complete, Wayback Machine, We are the 99%, web application, white flight, Whole Earth Catalog, you are the product

Eric Schmidt called multiple results a “bug” in an interview with Charlie Rose in 2005, which is further considered in a Washington Post piece by Gregory Ferenstein (“Google, Competition and the Perfect Result,” January 4, 2013). Nitasha Tiku has reported on activism at Google (“Why Tech Worker Dissent Is Going Viral,” Wired, June 29, 2018). An interview with Guillaume Chaslot, one of the engineers who worked on the recommendation system, in The Guardian (“‘Fiction is outperforming reality’: how YouTube’s algorithm distorts truth,” February 2, 2018) provides more information on how hateful content and misinformation spreads on the platform. Safiya Umoja Noble’s book Algorithms of Oppression (NYU Press, 2018) is a definitive look at Google’s bias.


Know Thyself by Stephen M Fleming

Abraham Wald, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, AlphaGo, autism spectrum disorder, autonomous vehicles, availability heuristic, backpropagation, citation needed, computer vision, confounding variable, data science, deep learning, DeepMind, Demis Hassabis, Douglas Hofstadter, Dunning–Kruger effect, Elon Musk, Estimating the Reproducibility of Psychological Science, fake news, global pandemic, higher-order functions, index card, Jeff Bezos, l'esprit de l'escalier, Lao Tzu, lifelogging, longitudinal study, meta-analysis, mutually assured destruction, Network effects, patient HM, Pierre-Simon Laplace, power law, prediction markets, QWERTY keyboard, recommendation engine, replication crisis, self-driving car, side project, Skype, Stanislav Petrov, statistical model, theory of mind, Thomas Bayes, traumatic brain injury

The remarkable developments in artificial intelligence have not yet been accompanied by comparable developments in artificial self-awareness. In fact, as technology gets smarter, the relevance of our self-awareness might also diminish. A powerful combination of data and machine learning may end up knowing what we want or need better than we know ourselves. The Amazon and Netflix recommendation systems offer up the next movie to watch; dating algorithms take on the job of finding our perfect match; virtual assistants book hair appointments before we are aware that we need them; online personal shoppers send us clothes that we didn’t even know we wanted. As human consumers in such a world, we may no longer need to know how we are solving problems or making decisions, because these tasks have become outsourced to AI assistants.


Off the Edge: Flat Earthers, Conspiracy Culture, and Why People Will Believe Anything by Kelly Weill

4chan, Albert Einstein, Alfred Russel Wallace, algorithmic bias, anti-communist, Apollo 11, Big Tech, bitcoin, Comet Ping Pong, coronavirus, COVID-19, crisis actor, cryptocurrency, disinformation, Donald Trump, Elon Musk, fake news, false flag, income inequality, Internet Archive, Isaac Newton, Johannes Kepler, Kevin Roose, Kickstarter, lockdown, Mark Zuckerberg, Mars Society, mass immigration, medical malpractice, moral panic, off-the-grid, QAnon, recommendation engine, side project, Silicon Valley, Silicon Valley startup, Skype, tech worker, Tesla Model S, TikTok, Timothy McVeigh, Wayback Machine, Y2K

The issue wasn’t just that people were being racist online (a problem as old as the internet). It was that Facebook’s own recommendation algorithm was driving users to those groups. “64% of all extremist group joins are due to our recommendation tools,” an internal Facebook presentation on the study said, namely the “Groups You Should Join” and “Discover” algorithms. “Our recommendation systems grow the problem.” Facebook’s recommendations actively cross-pollinated the conspiracy world, luring truthers over the lines that once demarcated their individual theories. The result was a conspiratorial melting pot: QAnon followers preaching their gospel on pages for people who believed airplanes were spraying mind-control drugs, bogus miracle cures being sold in anti-vaccination groups, and nearly every popular conspiracy theory finding its way onto Flat Earth pages, which saw skeptics of all stripes gather to share notes.


How to Stand Up to a Dictator by Maria Ressa

2021 United States Capitol attack, activist lawyer, affirmative action, Affordable Care Act / Obamacare, airport security, anti-communist, Asian financial crisis, Big Tech, Brexit referendum, business process, business process outsourcing, call centre, Cambridge Analytica, citizen journalism, cognitive bias, colonial rule, commoditize, contact tracing, coronavirus, COVID-19, crowdsourcing, delayed gratification, disinformation, Donald Trump, fake news, future of journalism, iterative process, James Bridle, Kevin Roose, lockdown, lone genius, Mahatma Gandhi, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Milgram experiment, move fast and break things, natural language processing, Nelson Mandela, Network effects, obamacare, performance metric, QAnon, recommendation engine, Sheryl Sandberg, Shoshana Zuboff, Silicon Valley, Steven Levy, surveillance capitalism, the medium is the message, The Wisdom of Crowds, TikTok, Twitter Arab Spring, work culture

Instead of making the platform more transparent, as Mark claimed to be doing, the company made sure that no one but Facebook had the data to see the whole picture.15 Even when the company produced its own disturbing internal research findings, its executives refused to act. A 2016 internal presentation about Germany detailed that “64% of all extremist group joins are due to our recommendation tools,” such as algorithms driving “Groups You Should Join” and “Discover.” The report made a very clear statement: “Our recommendation systems grow the problem.”16 Facebook has a staggering ability to determine the fates of news organizations—of journalism itself, even. Today it has an internal ranking for news that is supposedly determined by algorithms; however, not only did a human code those algorithms, but Facebook decides whether a given user is fed more hate or more facts.


pages: 669 words: 210,153

Tools of Titans: The Tactics, Routines, and Habits of Billionaires, Icons, and World-Class Performers by Timothy Ferriss

Abraham Maslow, Adam Curtis, Airbnb, Alexander Shulgin, Alvin Toffler, An Inconvenient Truth, artificial general intelligence, asset allocation, Atul Gawande, augmented reality, back-to-the-land, Ben Horowitz, Bernie Madoff, Bertrand Russell: In Praise of Idleness, Beryl Markham, billion-dollar mistake, Black Swan, Blue Bottle Coffee, Blue Ocean Strategy, blue-collar work, book value, Boris Johnson, Buckminster Fuller, business process, Cal Newport, call centre, caloric restriction, Carl Icahn, Charles Lindbergh, Checklist Manifesto, cognitive bias, cognitive dissonance, Colonization of Mars, Columbine, commoditize, correlation does not imply causation, CRISPR, David Brooks, David Graeber, deal flow, digital rights, diversification, diversified portfolio, do what you love, Donald Trump, effective altruism, Elon Musk, fail fast, fake it until you make it, fault tolerance, fear of failure, Firefox, follow your passion, fulfillment center, future of work, Future Shock, Girl Boss, Google X / Alphabet X, growth hacking, Howard Zinn, Hugh Fearnley-Whittingstall, Jeff Bezos, job satisfaction, Johann Wolfgang von Goethe, John Markoff, Kevin Kelly, Kickstarter, Lao Tzu, lateral thinking, life extension, lifelogging, Mahatma Gandhi, Marc Andreessen, Mark Zuckerberg, Mason jar, Menlo Park, microdosing, Mikhail Gorbachev, MITM: man-in-the-middle, Neal Stephenson, Nelson Mandela, Nicholas Carr, Nick Bostrom, off-the-grid, optical character recognition, PageRank, Paradox of Choice, passive income, pattern recognition, Paul Graham, peer-to-peer, Peter H. Diamandis: Planetary Resources, Peter Singer: altruism, Peter Thiel, phenotype, PIHKAL and TIHKAL, post scarcity, post-work, power law, premature optimization, private spaceflight, QWERTY keyboard, Ralph Waldo Emerson, Ray Kurzweil, recommendation engine, rent-seeking, Richard Feynman, risk tolerance, Ronald Reagan, Salesforce, selection bias, sharing economy, side project, Silicon Valley, skunkworks, Skype, Snapchat, Snow Crash, social graph, software as a service, software is eating the world, stem cell, Stephen Hawking, Steve Jobs, Stewart Brand, superintelligent machines, TED Talk, Tesla Model S, The future is already here, the long tail, The Wisdom of Crowds, Thomas L Friedman, traumatic brain injury, trolley problem, vertical integration, Wall-E, Washington Consensus, We are as Gods, Whole Earth Catalog, Y Combinator, zero-sum game

Chris Anderson (my successor at Wired) named this effect “the Long Tail,” for the visually graphed shape of the sales distribution curve: a low, nearly interminable line of items selling only a few copies per year that form a long “tail” for the abrupt vertical beast of a few bestsellers. But the area of the tail was as big as the head. With that insight, the aggregators had great incentive to encourage audiences to click on the obscure items. They invented recommendation engines and other algorithms to channel attention to the rare creations in the long tail. Even web search companies like Google, Bing, and Baidu found it in their interests to reward searchers with the obscure because they could sell ads in the long tail as well. The result was that the most obscure became less obscure.


RDF Database Systems: Triples Storage and SPARQL Query Processing by Olivier Cure, Guillaume Blin

Amazon Web Services, bioinformatics, business intelligence, cloud computing, database schema, fault tolerance, folksonomy, full text search, functional programming, information retrieval, Internet Archive, Internet of things, linked data, machine readable, NP-complete, peer-to-peer, performance metric, power law, random walk, recommendation engine, RFID, semantic web, Silicon Valley, social intelligence, software as a service, SPARQL, sparse data, web application

This points to two distinct nodes corresponding to blog entries. For each of them, the system will navigate through the category edge and will only retain those with a Science value—that is, in the figure, only blog2 matches to our search. Typical use cases of graph databases are social and e-commerce domains, as well as recommendation systems. 2.2.4 MapReduce In the previous section, we emphasized on solutions that enable us to store data on cluster commodity machines. To apprehend the full potential of this approach, this also has Database Management Systems to come with methods to process this data efficiently—that is, to perform the processing on the servers and to limit the transfer of data between machines to its minimum.


pages: 337 words: 103,522

The Creativity Code: How AI Is Learning to Write, Paint and Think by Marcus Du Sautoy

3D printing, Ada Lovelace, Albert Einstein, algorithmic bias, AlphaGo, Alvin Roth, Andrew Wiles, Automated Insights, Benoit Mandelbrot, Bletchley Park, Cambridge Analytica, Charles Babbage, Claude Shannon: information theory, computer vision, Computing Machinery and Intelligence, correlation does not imply causation, crowdsourcing, data is the new oil, data science, deep learning, DeepMind, Demis Hassabis, Donald Trump, double helix, Douglas Hofstadter, driverless car, Elon Musk, Erik Brynjolfsson, Fellow of the Royal Society, Flash crash, Gödel, Escher, Bach, Henri Poincaré, Jacquard loom, John Conway, Kickstarter, Loebner Prize, machine translation, mandelbrot fractal, Minecraft, move 37, music of the spheres, Mustafa Suleyman, Narrative Science, natural language processing, Netflix Prize, PageRank, pattern recognition, Paul Erdős, Peter Thiel, random walk, Ray Kurzweil, recommendation engine, Rubik’s Cube, Second Machine Age, Silicon Valley, speech recognition, stable marriage problem, Turing test, Watson beat the top human players on Jeopardy!, wikimedia commons

., ‘Teaching Machines to Read and Comprehend’, in Advances in Neural Information Processing Systems, NIPS Proceedings (2015) Ilyas, Andrew, et al., ‘Query-Efficient Black-Box Adversarial Examples’, arXiv:1712.07113 (2017) Khalifa, Ahmed, Gabriella A. B. Barros and Julian Togelius, ‘DeepTingle’, arXiv:1705.03557 (2017) Koren, Yehuda, Robert M. Bell and Chris Volinsky, ‘Matrix Factorization Techniques for Recommender Systems’, Computer Journal, vol. 42(8), 30–37 (2009) Li, Boyang and Mark O. Riedl, ‘Scheherazade: Crowd-Powered Interactive Narrative Generation’, 29th AAAI Conference on Artificial Intelligence (2015) Llano, Maria Teresa, et al., ‘What If a Fish Got Drunk? Exploring the Plausibility of Machine-Generated Fictions’, in Proceedings of the Seventh International Conference on Computational Creativity (2016) Loos, Sarah, et al., ‘Deep Network Guided Proof Search’, arXiv: 1701.06972v1 (2017) Mahendran, Aravindh and Andrea Vedaldi, ‘Understanding Deep Image Representations by Inverting Them’, Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5188–96 (2015) Mathewson, Kory Wallace and Piotr W.


Rockonomics: A Backstage Tour of What the Music Industry Can Teach Us About Economics and Life by Alan B. Krueger

"Friedman doctrine" OR "shareholder theory", accounting loophole / creative accounting, Affordable Care Act / Obamacare, Airbnb, Alan Greenspan, autonomous vehicles, bank run, behavioural economics, Berlin Wall, bitcoin, Bob Geldof, butterfly effect, buy and hold, congestion pricing, creative destruction, crowdsourcing, digital rights, disintermediation, diversified portfolio, Donald Trump, endogenous growth, Gary Kildall, George Akerlof, gig economy, income inequality, independent contractor, index fund, invisible hand, Jeff Bezos, John Maynard Keynes: Economic Possibilities for our Grandchildren, Kenneth Arrow, Kickstarter, Larry Ellison, Live Aid, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, moral hazard, Multics, Network effects, obamacare, offshore financial centre, opioid epidemic / opioid crisis, Paul Samuelson, personalized medicine, power law, pre–internet, price discrimination, profit maximization, random walk, recommendation engine, rent-seeking, Richard Thaler, ride hailing / ride sharing, Saturday Night Live, Skype, Steve Jobs, the long tail, The Wealth of Nations by Adam Smith, TikTok, too big to fail, transaction costs, traumatic brain injury, Tyler Cowen, ultimatum game, winner-take-all economy, women in the workforce, Y Combinator, zero-sum game

In fact, the number of musical choices facing individuals greatly expanded—and became even more bewildering—with the advent of streaming services, which is likely to lead us to rely even more on our social networks for clues in selecting songs and artists. And the rapidly growing set of curated recommendation systems that use Big Data to help us discover new music is also likely to reinforce network effects, unless there is a surge in demand for curation systems that recommend songs that are both unpopular and likely to stay that way. Gloria Estefan: Music of the Heart Gloria Estefan is the most successful crossover artist of all time.


pages: 346 words: 97,330

Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass by Mary L. Gray, Siddharth Suri

"World Economic Forum" Davos, Affordable Care Act / Obamacare, AlphaGo, Amazon Mechanical Turk, Apollo 13, augmented reality, autonomous vehicles, barriers to entry, basic income, benefit corporation, Big Tech, big-box store, bitcoin, blue-collar work, business process, business process outsourcing, call centre, Capital in the Twenty-First Century by Thomas Piketty, cloud computing, cognitive load, collaborative consumption, collective bargaining, computer vision, corporate social responsibility, cotton gin, crowdsourcing, data is the new oil, data science, deep learning, DeepMind, deindustrialization, deskilling, digital divide, do well by doing good, do what you love, don't be evil, Donald Trump, Elon Musk, employer provided health coverage, en.wikipedia.org, equal pay for equal work, Erik Brynjolfsson, fake news, financial independence, Frank Levy and Richard Murnane: The New Division of Labor, fulfillment center, future of work, gig economy, glass ceiling, global supply chain, hiring and firing, ImageNet competition, independent contractor, industrial robot, informal economy, information asymmetry, Jeff Bezos, job automation, knowledge economy, low skilled workers, low-wage service sector, machine translation, market friction, Mars Rover, natural language processing, new economy, operational security, passive income, pattern recognition, post-materialism, post-work, power law, race to the bottom, Rana Plaza, recommendation engine, ride hailing / ride sharing, Ronald Coase, scientific management, search costs, Second Machine Age, sentiment analysis, sharing economy, Shoshana Zuboff, side project, Silicon Valley, Silicon Valley startup, Skype, software as a service, speech recognition, spinning jenny, Stephen Hawking, TED Talk, The Future of Employment, The Nature of the Firm, Tragedy of the Commons, transaction costs, two-sided market, union organizing, universal basic income, Vilfredo Pareto, Wayback Machine, women in the workforce, work culture, Works Progress Administration, Y Combinator, Yochai Benkler

There is no easy, free alternative, unless everyone decides to delete their social media accounts. FIX 5: RÉSUMÉ 2.0 AND PORTABLE REPUTATION SYSTEMS Since requesters can seamlessly enter and exit the market, independent workers are often at a disadvantage when it comes to getting a rating or recommendation after they finish a task. On-demand workers will need reputation and recommendation systems that help them navigate finding their next income opportunity and manage the risk of such an uncertain future. They might establish a rapport with a requester for months, like Joan or Riyaz, only to find that requester leave the market or begin looking for workers with different skills. Successful workers will need to adapt to this dynamic environment quickly.


pages: 350 words: 98,077

Artificial Intelligence: A Guide for Thinking Humans by Melanie Mitchell

Ada Lovelace, AI winter, Alignment Problem, AlphaGo, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, artificial general intelligence, autonomous vehicles, backpropagation, Bernie Sanders, Big Tech, Boston Dynamics, Cambridge Analytica, Charles Babbage, Claude Shannon: information theory, cognitive dissonance, computer age, computer vision, Computing Machinery and Intelligence, dark matter, deep learning, DeepMind, Demis Hassabis, Douglas Hofstadter, driverless car, Elon Musk, en.wikipedia.org, folksonomy, Geoffrey Hinton, Gödel, Escher, Bach, I think there is a world market for maybe five computers, ImageNet competition, Jaron Lanier, job automation, John Markoff, John von Neumann, Kevin Kelly, Kickstarter, license plate recognition, machine translation, Mark Zuckerberg, natural language processing, Nick Bostrom, Norbert Wiener, ought to be enough for anybody, paperclip maximiser, pattern recognition, performance metric, RAND corporation, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, Rodney Brooks, self-driving car, sentiment analysis, Silicon Valley, Singularitarianism, Skype, speech recognition, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, tacit knowledge, tail risk, TED Talk, the long tail, theory of mind, There's no reason for any individual to have a computer in his home - Ken Olsen, trolley problem, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, world market for maybe five computers

On the one hand, deep neural networks, trained via supervised learning, perform remarkably well (though still far from perfectly) on many problems in computer vision, as well as in other domains such as speech recognition and language translation. Because of their impressive abilities, these networks are rapidly being taken from research settings and employed in real-world applications such as web search, self-driving cars, face recognition, virtual assistants, and recommendation systems, and it’s getting hard to imagine life without these AI tools. On the other hand, it’s misleading to say that deep networks “learn on their own” or that their training is “similar to human learning.” Recognition of the success of these networks must be tempered with a realization that they can fail in unexpected ways because of overfitting to their training data, long-tail effects, and vulnerability to hacking.


Artificial Whiteness by Yarden Katz

affirmative action, AI winter, algorithmic bias, AlphaGo, Amazon Mechanical Turk, autonomous vehicles, benefit corporation, Black Lives Matter, blue-collar work, Californian Ideology, Cambridge Analytica, cellular automata, Charles Babbage, cloud computing, colonial rule, computer vision, conceptual framework, Danny Hillis, data science, David Graeber, deep learning, DeepMind, desegregation, Donald Trump, Dr. Strangelove, driverless car, Edward Snowden, Elon Musk, Erik Brynjolfsson, European colonialism, fake news, Ferguson, Missouri, general purpose technology, gentrification, Hans Moravec, housing crisis, income inequality, information retrieval, invisible hand, Jeff Bezos, Kevin Kelly, knowledge worker, machine readable, Mark Zuckerberg, mass incarceration, Menlo Park, military-industrial complex, Nate Silver, natural language processing, Nick Bostrom, Norbert Wiener, pattern recognition, phenotype, Philip Mirowski, RAND corporation, recommendation engine, rent control, Rodney Brooks, Ronald Reagan, Salesforce, Seymour Hersh, Shoshana Zuboff, Silicon Valley, Silicon Valley billionaire, Silicon Valley ideology, Skype, speech recognition, statistical model, Stephen Hawking, Stewart Brand, Strategic Defense Initiative, surveillance capitalism, talking drums, telemarketer, The Signal and the Noise by Nate Silver, W. E. B. Du Bois, Whole Earth Catalog, WikiLeaks

As Jordan writes, “By the turn of the century forward-looking companies such as Amazon were already using ML [machine-learning] throughout their business, solving mission-critical, back-end problems in fraud detection and supply-chain prediction, and building innovative consumer-facing services such as recommendation systems.” His main point is that these technical ideas and the disciplines that produced them weren’t part of an attempt to “imitate” human intelligence and thus should not be labeled “AI.” In a familiar turn, Jordan offers to switch the letters and call it instead “Intelligence Augmentation (IA).”


pages: 788 words: 223,004

Merchants of Truth: The Business of News and the Fight for Facts by Jill Abramson

"World Economic Forum" Davos, 23andMe, 4chan, Affordable Care Act / Obamacare, Alexander Shulgin, Apple's 1984 Super Bowl advert, barriers to entry, Bernie Madoff, Bernie Sanders, Big Tech, Black Lives Matter, Cambridge Analytica, Charles Lindbergh, Charlie Hebdo massacre, Chelsea Manning, citizen journalism, cloud computing, commoditize, content marketing, corporate governance, creative destruction, crowdsourcing, data science, death of newspapers, digital twin, diversified portfolio, Donald Trump, East Village, Edward Snowden, fake news, Ferguson, Missouri, Filter Bubble, future of journalism, glass ceiling, Google Glasses, haute couture, hive mind, income inequality, information asymmetry, invisible hand, Jeff Bezos, Joseph Schumpeter, Khyber Pass, late capitalism, Laura Poitras, Marc Andreessen, Mark Zuckerberg, move fast and break things, Nate Silver, new economy, obamacare, Occupy movement, Paris climate accords, performance metric, Peter Thiel, phenotype, pre–internet, race to the bottom, recommendation engine, Robert Mercer, Ronald Reagan, Saturday Night Live, self-driving car, sentiment analysis, Sheryl Sandberg, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, skunkworks, Snapchat, social contagion, social intelligence, social web, SoftBank, Steve Bannon, Steve Jobs, Steven Levy, tech billionaire, technoutopianism, telemarketer, the scientific method, The Wisdom of Crowds, Tim Cook: Apple, too big to fail, vertical integration, WeWork, WikiLeaks, work culture, Yochai Benkler, you are the product

These stories fit right in the feed, matching the tone and topical matter of Facebook at large, as if they came from readers’ family or friends. It was a breakthrough in making news personal and connecting with readers on their terms, right there in the streamlined scroll that encapsulated their social lives. Like Amazon’s recommendation engine (displaying the products that “customers who bought this item also bought”), BuzzFeed’s empire was built on computer processes that, with as little human input as possible, could pull off the illusion of “getting you.” By 2016 Facebook was far larger than any nation-state, the biggest and most centralized congregation of people—friends, readers, consumers, voters—that the world had ever seen.


pages: 380 words: 109,724

Don't Be Evil: How Big Tech Betrayed Its Founding Principles--And All of US by Rana Foroohar

"Susan Fowler" uber, "World Economic Forum" Davos, accounting loophole / creative accounting, Airbnb, Alan Greenspan, algorithmic bias, algorithmic management, AltaVista, Andy Rubin, autonomous vehicles, banking crisis, barriers to entry, behavioural economics, Bernie Madoff, Bernie Sanders, Big Tech, bitcoin, Black Lives Matter, book scanning, Brewster Kahle, Burning Man, call centre, Cambridge Analytica, cashless society, clean tech, cloud computing, cognitive dissonance, Colonization of Mars, computer age, corporate governance, creative destruction, Credit Default Swap, cryptocurrency, data is the new oil, data science, deal flow, death of newspapers, decentralized internet, Deng Xiaoping, digital divide, digital rights, disinformation, disintermediation, don't be evil, Donald Trump, drone strike, Edward Snowden, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Etonian, Evgeny Morozov, fake news, Filter Bubble, financial engineering, future of work, Future Shock, game design, gig economy, global supply chain, Gordon Gekko, Great Leap Forward, greed is good, income inequality, independent contractor, informal economy, information asymmetry, intangible asset, Internet Archive, Internet of things, invisible hand, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, junk bonds, Kenneth Rogoff, life extension, light touch regulation, low interest rates, Lyft, Mark Zuckerberg, Marshall McLuhan, Martin Wolf, Menlo Park, military-industrial complex, move fast and break things, Network effects, new economy, offshore financial centre, PageRank, patent troll, Paul Volcker talking about ATMs, paypal mafia, Peter Thiel, pets.com, price discrimination, profit maximization, race to the bottom, recommendation engine, ride hailing / ride sharing, Robert Bork, Sand Hill Road, search engine result page, self-driving car, shareholder value, sharing economy, Sheryl Sandberg, Shoshana Zuboff, side hustle, Sidewalk Labs, Silicon Valley, Silicon Valley startup, smart cities, 
Snapchat, SoftBank, South China Sea, sovereign wealth fund, Steve Bannon, Steve Jobs, Steven Levy, stock buybacks, subscription business, supply-chain management, surveillance capitalism, TaskRabbit, tech billionaire, tech worker, TED Talk, Telecommunications Act of 1996, The Chicago School, the long tail, the new new thing, Tim Cook: Apple, too big to fail, Travis Kalanick, trickle-down economics, Uber and Lyft, Uber for X, uber lyft, Upton Sinclair, warehouse robotics, WeWork, WikiLeaks, zero-sum game

This was a culture in which the metrics were always right. The company was simply serving users, even if that meant knowingly monetizing content that was undermining the fabric of democracy.3 A spokesperson at YouTube, which doesn’t contradict the basic facts of Chaslot’s account, told me in 2018 that the company’s recommendation system has “changed substantially over time” and now includes other metrics beyond watch time, including consumer surveys and the number of shares and likes. And, as this book goes to press in the summer of 2019, YouTube is, in the wake of the FTC investigations along with numerous reports of pedophiles using the platform to find and share videos of children,4 considering whether to shift children’s content into an entirely separate app to avoid such problems.5 But as anyone who uses the site knows, you are, at this moment, still served up more of whatever you have spent the most time with—whether that’s videos of cats playing the piano or conspiracy theories.


pages: 416 words: 112,268

Human Compatible: Artificial Intelligence and the Problem of Control by Stuart Russell

3D printing, Ada Lovelace, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Alfred Russel Wallace, algorithmic bias, AlphaGo, Andrew Wiles, artificial general intelligence, Asilomar, Asilomar Conference on Recombinant DNA, augmented reality, autonomous vehicles, basic income, behavioural economics, Bletchley Park, blockchain, Boston Dynamics, brain emulation, Cass Sunstein, Charles Babbage, Claude Shannon: information theory, complexity theory, computer vision, Computing Machinery and Intelligence, connected car, CRISPR, crowdsourcing, Daniel Kahneman / Amos Tversky, data science, deep learning, deepfake, DeepMind, delayed gratification, Demis Hassabis, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Ernest Rutherford, fake news, Flash crash, full employment, future of work, Garrett Hardin, Geoffrey Hinton, Gerolamo Cardano, Goodhart's law, Hans Moravec, ImageNet competition, Intergovernmental Panel on Climate Change (IPCC), Internet of things, invention of the wheel, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John Nash: game theory, John von Neumann, Kenneth Arrow, Kevin Kelly, Law of Accelerating Returns, luminiferous ether, machine readable, machine translation, Mark Zuckerberg, multi-armed bandit, Nash equilibrium, Nick Bostrom, Norbert Wiener, NP-complete, OpenAI, openstreetmap, P = NP, paperclip maximiser, Pareto efficiency, Paul Samuelson, Pierre-Simon Laplace, positional goods, probability theory / Blaise Pascal / Pierre de Fermat, profit maximization, RAND corporation, random walk, Ray Kurzweil, Recombinant DNA, recommendation engine, RFID, Richard Thaler, ride hailing / ride sharing, Robert Shiller, robotic process automation, Rodney Brooks, Second Machine Age, self-driving car, Shoshana Zuboff, Silicon Valley, smart cities, smart contracts, social intelligence, speech recognition, Stephen Hawking, Steven Pinker, 
superintelligent machines, surveillance capitalism, Thales of Miletus, The Future of Employment, The Theory of the Leisure Class by Thorstein Veblen, Thomas Bayes, Thorstein Veblen, Tragedy of the Commons, transport as a service, trolley problem, Turing machine, Turing test, universal basic income, uranium enrichment, vertical integration, Von Neumann architecture, Wall-E, warehouse robotics, Watson beat the top human players on Jeopardy!, web application, zero-sum game

A new word, softbot, was coined to describe software “robots” that operate entirely in a software environment such as the Web. Softbots, or bots as they later became known, perceive Web pages and act by emitting sequences of characters, URLs, and so on. AI companies mushroomed during the dot-com boom (1997–2000), providing core capabilities for search and e-commerce, including link analysis, recommendation systems, reputation systems, comparison shopping, and product categorization. In the early 2000s, the widespread adoption of mobile phones with microphones, cameras, accelerometers, and GPS provided new access for AI systems to people’s daily lives; “smart speakers” such as the Amazon Echo, Google Home, and Apple HomePod have completed this process.


pages: 903 words: 235,753

The Stack: On Software and Sovereignty by Benjamin H. Bratton

1960s counterculture, 3D printing, 4chan, Ada Lovelace, Adam Curtis, additive manufacturing, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Amazon Mechanical Turk, Amazon Robotics, Amazon Web Services, Andy Rubin, Anthropocene, augmented reality, autonomous vehicles, basic income, Benevolent Dictator For Life (BDFL), Berlin Wall, bioinformatics, Biosphere 2, bitcoin, blockchain, Buckminster Fuller, Burning Man, call centre, capitalist realism, carbon credits, carbon footprint, carbon tax, carbon-based life, Cass Sunstein, Celebration, Florida, Charles Babbage, charter city, clean water, cloud computing, company town, congestion pricing, connected car, Conway's law, corporate governance, crowdsourcing, cryptocurrency, dark matter, David Graeber, deglobalization, dematerialisation, digital capitalism, digital divide, disintermediation, distributed generation, don't be evil, Douglas Engelbart, driverless car, Edward Snowden, Elon Musk, en.wikipedia.org, Eratosthenes, Ethereum, ethereum blockchain, Evgeny Morozov, facts on the ground, Flash crash, Frank Gehry, Frederick Winslow Taylor, fulfillment center, functional programming, future of work, Georg Cantor, gig economy, global supply chain, Google Earth, Google Glasses, Guggenheim Bilbao, High speed trading, high-speed rail, Hyperloop, Ian Bogost, illegal immigration, industrial robot, information retrieval, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invisible hand, Jacob Appelbaum, James Bridle, Jaron Lanier, Joan Didion, John Markoff, John Perry Barlow, Joi Ito, Jony Ive, Julian Assange, Khan Academy, Kim Stanley Robinson, Kiva Systems, Laura Poitras, liberal capitalism, lifelogging, linked data, lolcat, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, McMansion, means of production, megacity, megaproject, megastructure, Menlo Park, Minecraft, MITM: man-in-the-middle, Monroe Doctrine, Neal Stephenson, Network effects, new economy, Nick Bostrom, ocean acidification, off-the-grid, offshore financial centre, oil shale / tar sands, Oklahoma City bombing, OSI model, packet switching, PageRank, pattern recognition, peak oil, peer-to-peer, performance metric, personalized medicine, Peter Eisenman, Peter Thiel, phenotype, Philip Mirowski, Pierre-Simon Laplace, place-making, planetary scale, pneumatic tube, post-Fordism, precautionary principle, RAND corporation, recommendation engine, reserve currency, rewilding, RFID, Robert Bork, Sand Hill Road, scientific management, self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley ideology, skeuomorphism, Slavoj Žižek, smart cities, smart grid, smart meter, Snow Crash, social graph, software studies, South China Sea, sovereign wealth fund, special economic zone, spectrum auction, Startup school, statistical arbitrage, Steve Jobs, Steven Levy, Stewart Brand, Stuxnet, Superbowl ad, supply-chain management, supply-chain management software, synthetic biology, TaskRabbit, technological determinism, TED Talk, the built environment, The Chicago School, the long tail, the scientific method, Torches of Freedom, transaction costs, Turing complete, Turing machine, Turing test, undersea cable, universal basic income, urban planning, Vernor Vinge, vertical integration, warehouse automation, warehouse robotics, Washington Consensus, web application, Westphalian system, WikiLeaks, working poor, Y Combinator, yottabyte

As discussed in the City layer chapter, there is then a kind of programmatic blending between the urban situation through which a User moves and the interactions he may be having with a specific App and Cloud service. A mall becomes a game board, a sidewalk becomes a banking center, a restaurant becomes the scene of a crime in a crowd-sourced recommendation engine, birds are angry and enemies are identified, and the experience of these may be very different for different people and purposes. At any given moment, multiple Users interacting with different Apps in the same place may have brought their shared location into contrasting Cloud dramas; one may be ensconced in a first-person shooter game and the other in measuring his carbon footprint, further fragmenting any apparent solidarity of the crowd.


pages: 396 words: 117,897

Making the Modern World: Materials and Dematerialization by Vaclav Smil

2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, additive manufacturing, American Society of Civil Engineers: Report Card, Apollo 11, Apollo Guidance Computer, Boeing 747, British Empire, decarbonisation, degrowth, deindustrialization, dematerialisation, Deng Xiaoping, energy transition, Fellow of the Royal Society, flying shuttle, Ford Model T, global pandemic, Haber-Bosch Process, happiness index / gross national happiness, hydraulic fracturing, income inequality, indoor plumbing, Intergovernmental Panel on Climate Change (IPCC), James Watt: steam engine, megacity, megastructure, microplastics / micro fibres, oil shale / tar sands, peak oil, post-industrial society, Post-Keynesian economics, purchasing power parity, recommendation engine, rolodex, X Prize

Even for economies with good historical statistics, all pre-World War II GDP estimates are less reliable than their post-1950 counterparts, and for many modernizing economies they are simply unavailable, or amount to nothing but rough estimates: these realities make reliable long-term international comparisons questionable. Moreover, recent GDPs, calculated according to a UN-recommended System of National Accounts, exclude all black market (underground economy) transactions whose addition would boost the total by 10–15% even in the most law-abiding countries, and could double the economy's size in the most lawless settings. But, once again, the most important bias comes from conversion.


pages: 424 words: 114,905

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again by Eric Topol

"World Economic Forum" Davos, 23andMe, Affordable Care Act / Obamacare, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic bias, AlphaGo, Apollo 11, artificial general intelligence, augmented reality, autism spectrum disorder, autonomous vehicles, backpropagation, Big Tech, bioinformatics, blockchain, Cambridge Analytica, cloud computing, cognitive bias, Colonization of Mars, computer age, computer vision, Computing Machinery and Intelligence, conceptual framework, creative destruction, CRISPR, crowdsourcing, Daniel Kahneman / Amos Tversky, dark matter, data science, David Brooks, deep learning, DeepMind, Demis Hassabis, digital twin, driverless car, Elon Musk, en.wikipedia.org, epigenetics, Erik Brynjolfsson, fake news, fault tolerance, gamification, general purpose technology, Geoffrey Hinton, George Santayana, Google Glasses, ImageNet competition, Jeff Bezos, job automation, job satisfaction, Joi Ito, machine translation, Mark Zuckerberg, medical residency, meta-analysis, microbiome, move 37, natural language processing, new economy, Nicholas Carr, Nick Bostrom, nudge unit, OpenAI, opioid epidemic / opioid crisis, pattern recognition, performance metric, personalized medicine, phenotype, placebo effect, post-truth, randomized controlled trial, recommendation engine, Rubik’s Cube, Sam Altman, self-driving car, Silicon Valley, Skinner box, speech recognition, Stephen Hawking, techlash, TED Talk, text mining, the scientific method, Tim Cook: Apple, traumatic brain injury, trolley problem, War on Poverty, Watson beat the top human players on Jeopardy!, working-age population

., “Deal Struck to Mine Cancer Patient Database for New Treatment Insights,” Stat News. 2017. 32. Muoio, D., “Machine Learning App Migraine Alert Warns Patients of Oncoming Episodes,” MobiHealthNews. 2017. 33. Comstock, J., “New ResApp Data Shows ~90 Percent Accuracy When Diagnosing Range of Respiratory Conditions,” MobiHealthNews. 2017. 34. Han, Q., et al., A Hybrid Recommender System for Patient-Doctor Matchmaking in Primary Care. arXiv, 2018. 35. Razzaki, S., et al., A Comparative Study of Artificial Intelligence and Human Doctors for the Purpose of Triage and Diagnosis. arXiv, 2018; Olson, P., “This AI Just Beat Human Doctors on a Clinical Exam,” Forbes. 2018. 36. Foley, K.


Super Thinking: The Big Book of Mental Models by Gabriel Weinberg, Lauren McCann

Abraham Maslow, Abraham Wald, affirmative action, Affordable Care Act / Obamacare, Airbnb, Albert Einstein, anti-pattern, Anton Chekhov, Apollo 13, Apple Newton, autonomous vehicles, bank run, barriers to entry, Bayesian statistics, Bernie Madoff, Bernie Sanders, Black Swan, Broken windows theory, business process, butterfly effect, Cal Newport, Clayton Christensen, cognitive dissonance, commoditize, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, dark pattern, David Attenborough, delayed gratification, deliberate practice, discounted cash flows, disruptive innovation, Donald Trump, Douglas Hofstadter, Dunning–Kruger effect, Edward Lorenz: Chaos theory, Edward Snowden, effective altruism, Elon Musk, en.wikipedia.org, experimental subject, fake news, fear of failure, feminist movement, Filter Bubble, framing effect, friendly fire, fundamental attribution error, Goodhart's law, Gödel, Escher, Bach, heat death of the universe, hindsight bias, housing crisis, if you see hoof prints, think horses—not zebras, Ignaz Semmelweis: hand washing, illegal immigration, imposter syndrome, incognito mode, income inequality, information asymmetry, Isaac Newton, Jeff Bezos, John Nash: game theory, karōshi / gwarosa / guolaosi, lateral thinking, loss aversion, Louis Pasteur, LuLaRoe, Lyft, mail merge, Mark Zuckerberg, meta-analysis, Metcalfe’s law, Milgram experiment, minimum viable product, moral hazard, mutually assured destruction, Nash equilibrium, Network effects, nocebo, nuclear winter, offshore financial centre, p-value, Paradox of Choice, Parkinson's law, Paul Graham, peak oil, Peter Thiel, phenotype, Pierre-Simon Laplace, placebo effect, Potemkin village, power law, precautionary principle, prediction markets, premature optimization, price anchoring, principal–agent problem, publication bias, recommendation engine, remote working, replication crisis, Richard Feynman, Richard Feynman: Challenger O-ring, Richard Thaler, ride hailing / ride sharing, Robert Metcalfe, Ronald Coase, Ronald Reagan, Salesforce, school choice, Schrödinger's Cat, selection bias, Shai Danziger, side project, Silicon Valley, Silicon Valley startup, speech recognition, statistical model, Steve Jobs, Steve Wozniak, Steven Pinker, Streisand effect, sunk-cost fallacy, survivorship bias, systems thinking, The future is already here, The last Blockbuster video rental store is in Bend, Oregon, The Present Situation in Quantum Mechanics, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Tragedy of the Commons, transaction costs, uber lyft, ultimatum game, uranium enrichment, urban planning, vertical integration, Vilfredo Pareto, warehouse robotics, WarGames: Global Thermonuclear War, When a measure becomes a target, wikimedia commons

Many algorithms operate as black boxes, which means they require very little understanding by the user of how they work. You don’t care how you got the best seats, you just want the best seats! You can think of each algorithm as a box where inputs go in and outputs come out, but outside it is painted black so you can’t tell what is going on inside. Common examples of black box algorithms include recommendation systems on Netflix or Amazon, matching on online dating sites, and content moderation on social media. Physical tools can also be black boxes. Two sayings, “The skill is built into the tool” and “The craftsmanship is the workbench itself,” suggest that the more sophisticated tools get, the fewer skills are required to operate them.


Enriching the Earth: Fritz Haber, Carl Bosch, and the Transformation of World Food Production by Vaclav Smil

agricultural Revolution, Albert Einstein, demographic transition, Deng Xiaoping, Great Leap Forward, Haber-Bosch Process, invention of gunpowder, Louis Pasteur, military-industrial complex, Pearl River Delta, precision agriculture, recommendation engine, The Design of Experiments

Boca Raton, Fla.: Lewis Publishing; Trenkel, M. A. 1997. Improving Fertilizer Use Efficiency. Paris: IFA. 11. Havlin, J. L., et al., eds. 1994. Soil Testing: Prospects for Improving Nutrient Recommendations. Madison, Wis.: Soil Science Society of America; MacKenzie, G. H., and J.-C. Taureau. 1997. Recommendation Systems for Nitrogen—A Review. York: Fertiliser Society. Periodic testing for major macronutrients has been common in high-income nations for decades, but testing for micronutrient deficiencies (ranging from boron and copper in many crops to molybdenum and cobalt needed by nitrogenase in leguminous species) has been much less frequent.


pages: 476 words: 132,042

What Technology Wants by Kevin Kelly

Albert Einstein, Alfred Russel Wallace, Apollo 13, Boeing 747, Buckminster Fuller, c2.com, carbon-based life, Cass Sunstein, charter city, classic study, Clayton Christensen, cloud computing, computer vision, cotton gin, Danny Hillis, dematerialisation, demographic transition, digital divide, double entry bookkeeping, Douglas Engelbart, Edward Jenner, en.wikipedia.org, Exxon Valdez, Fairchild Semiconductor, Ford Model T, George Gilder, gravity well, Great Leap Forward, Gregor Mendel, hive mind, Howard Rheingold, interchangeable parts, invention of air conditioning, invention of writing, Isaac Newton, Jaron Lanier, Joan Didion, John Conway, John Markoff, John von Neumann, Kevin Kelly, knowledge economy, Lao Tzu, life extension, Louis Daguerre, Marshall McLuhan, megacity, meta-analysis, new economy, off grid, off-the-grid, out of africa, Paradox of Choice, performance metric, personalized medicine, phenotype, Picturephone, planetary scale, precautionary principle, quantum entanglement, RAND corporation, random walk, Ray Kurzweil, recommendation engine, refrigerator car, rewilding, Richard Florida, Rubik’s Cube, Silicon Valley, silicon-based life, skeuomorphism, Skype, speech recognition, Stephen Hawking, Steve Jobs, Stewart Brand, Stuart Kauffman, technological determinism, Ted Kaczynski, the built environment, the long tail, the scientific method, Thomas Malthus, Vernor Vinge, wealth creators, Whole Earth Catalog, Y2K, yottabyte

As always, the solution to the problems that technology brings, such as an overwhelming diversity of choices, is better technologies. The solution to ultradiversity will be choice-assist technologies. These better tools will aid humans in making choices among bewildering options. That is what search engines, recommendation systems, tagging, and a lot of social media are all about. Diversity, in fact, will produce tools to handle diversity. (Diversity-taming tools will be among the wildly diversity-making 821 million patents that current rates predict will have been filed in the U.S. Patent Office by 2060!) We are already discovering how to use computers to augment our choices with information and web pages (Google is one such tool), but it will take additional learning and technologies to do this with tangible stuff and idiosyncratic media.


pages: 1,829 words: 135,521

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney

Bear Stearns, business process, data science, Debian, duck typing, Firefox, general-purpose programming language, Google Chrome, Guido van Rossum, index card, p-value, quantitative trading / quantitative finance, random walk, recommendation engine, sentiment analysis, side project, sorting algorithm, statistical model, Two Sigma, type inference

Percentage Windows and non-Windows users in top-occurring time zones

We could have computed the normalized sum more efficiently by using the transform method with groupby:

In [66]: g = count_subset.groupby('tz')
In [67]: results2 = count_subset.total / g.total.transform('sum')

14.2 MovieLens 1M Dataset

GroupLens Research provides a number of collections of movie ratings data collected from users of MovieLens in the late 1990s and early 2000s. The data provide movie ratings, movie metadata (genres and year), and demographic data about the users (age, zip code, gender identification, and occupation). Such data is often of interest in the development of recommendation systems based on machine learning algorithms. While we do not explore machine learning techniques in detail in this book, I will show you how to slice and dice datasets like these into the exact form you need. The MovieLens 1M dataset contains 1 million ratings collected from 6,000 users on 4,000 movies.
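The groupby and transform normalization quoted in this excerpt can be tried on a small stand-in table. This is only a sketch: the rows below are invented for illustration and are not the book's data. Only the column names `tz` and `total` follow the excerpt; the `os` column and the name `normed_total` are assumptions made for this example.

```python
import pandas as pd

# Hypothetical stand-in for the excerpt's count_subset table:
# one row per (time zone, operating system) pair with a user count.
count_subset = pd.DataFrame(
    {
        "tz": ["America/New_York", "America/New_York",
               "Europe/London", "Europe/London"],
        "os": ["Windows", "Not Windows", "Windows", "Not Windows"],
        "total": [30.0, 10.0, 5.0, 15.0],
    }
)

# Group rows by time zone, as in the excerpt.
g = count_subset.groupby("tz")

# transform("sum") returns a Series aligned with the original rows,
# where each row holds the sum of `total` for its own time zone.
# Dividing by it gives each row's share within its time zone.
normed_total = count_subset["total"] / g["total"].transform("sum")

print(count_subset.assign(normed_total=normed_total))
```

Because `transform` keeps the original row order and index, the division lines up row by row without an explicit merge back onto the table.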


pages: 460 words: 131,579

Masters of Management: How the Business Gurus and Their Ideas Have Changed the World—for Better and for Worse by Adrian Wooldridge

"Friedman doctrine" OR "shareholder theory", "World Economic Forum" Davos, affirmative action, Alan Greenspan, barriers to entry, behavioural economics, Black Swan, blood diamond, borderless world, business climate, business cycle, business intelligence, business process, carbon footprint, Cass Sunstein, Clayton Christensen, clean tech, cloud computing, collaborative consumption, collapse of Lehman Brothers, collateralized debt obligation, commoditize, company town, corporate governance, corporate social responsibility, creative destruction, credit crunch, crowdsourcing, David Brooks, David Ricardo: comparative advantage, disintermediation, disruptive innovation, do well by doing good, don't be evil, Donald Trump, Edward Glaeser, Exxon Valdez, financial deregulation, Ford Model T, Frederick Winslow Taylor, future of work, George Gilder, global supply chain, Golden arches theory, hobby farmer, industrial cluster, intangible asset, It's morning again in America, job satisfaction, job-hopping, joint-stock company, Joseph Schumpeter, junk bonds, Just-in-time delivery, Kickstarter, knowledge economy, knowledge worker, lake wobegon effect, Long Term Capital Management, low skilled workers, Mark Zuckerberg, McMansion, means of production, Menlo Park, meritocracy, Michael Milken, military-industrial complex, mobile money, Naomi Klein, Netflix Prize, Network effects, new economy, Nick Leeson, Norman Macrae, open immigration, patent troll, Ponzi scheme, popular capitalism, post-industrial society, profit motive, purchasing power parity, radical decentralization, Ralph Nader, recommendation engine, Richard Florida, Richard Thaler, risk tolerance, Ronald Reagan, science of happiness, scientific management, shareholder value, Silicon Valley, Silicon Valley startup, Skype, Social Responsibility of Business Is to Increase Its Profits, Steve Jobs, Steven Levy, supply-chain management, tacit knowledge, technoutopianism, the long tail, The Soul of a New Machine, The Wealth of 
Nations by Adam Smith, Thomas Davenport, Tony Hsieh, too big to fail, vertical integration, wealth creators, women in the workforce, young professional, Zipcar

They also discover that the crowds don’t always have their best interests at heart: when Justin Bieber, a Canadian teenage pop star, asked his fans for suggestions as to what country he should visit next, the most popular answer was North Korea.9 One popular solution to the problem of oversupply is to use prizes to give crowdsourcing a focus and structure. The value of prizes being offered by corporations has more than tripled over the past decade, to $375 million.10 Netflix offers a $1 million prize to anyone who can improve its film recommendation system by 10 percent. Frito-Lay offers prizes to people who can come up with new TV ads for its products. Indeed, prizes have become businesses in their own right: InnoCentive has created a network of 170,000 scientists who stand ready to solve R&D problems for a price. Regular users include some of the world’s biggest companies, such as Eli Lilly, which helped to found the network in 2001; Boeing; DuPont; and P&G.


pages: 752 words: 131,533

Python for Data Analysis by Wes McKinney

Alignment Problem, backtesting, Bear Stearns, cognitive dissonance, crowdsourcing, data science, Debian, duck typing, Firefox, functional programming, Google Chrome, Guido van Rossum, index card, machine readable, random walk, recommendation engine, revision control, sentiment analysis, Sharpe ratio, side project, sorting algorithm, statistical model, type inference

MovieLens 1M Data Set

GroupLens Research (http://www.grouplens.org/node/73) provides a number of collections of movie ratings data collected from users of MovieLens in the late 1990s and early 2000s. The data provide movie ratings, movie metadata (genres and year), and demographic data about the users (age, zip code, gender, and occupation). Such data is often of interest in the development of recommendation systems based on machine learning algorithms. While I will not be exploring machine learning techniques in great detail in this book, I will show you how to slice and dice data sets like these into the exact form you need. The MovieLens 1M data set contains 1 million ratings collected from 6000 users on 4000 movies.
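As a rough illustration of the kind of slicing this excerpt describes, the sketch below merges three tiny tables shaped like users, ratings, and movies data and computes the mean rating per title and gender. Every row is made up, and the column names (`user_id`, `movie_id`, `rating`, `gender`, `title`) are assumptions for this example rather than a statement of the data set's exact schema.

```python
import pandas as pd

# Hypothetical miniature tables standing in for the three kinds of data
# the excerpt mentions: user demographics, ratings, and movie metadata.
users = pd.DataFrame({"user_id": [1, 2, 3], "gender": ["F", "M", "F"]})
movies = pd.DataFrame({"movie_id": [10, 20], "title": ["Movie A", "Movie B"]})
ratings = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3],
        "movie_id": [10, 20, 10, 20, 10],
        "rating": [5, 3, 4, 2, 3],
    }
)

# Join ratings to users and movies on their shared key columns.
data = ratings.merge(users, on="user_id").merge(movies, on="movie_id")

# Reshape: one row per title, one column per gender, mean rating in each cell.
mean_ratings = data.pivot_table(
    values="rating", index="title", columns="gender", aggfunc="mean"
)

print(mean_ratings)
```

Merging first and pivoting afterwards keeps each step inspectable: `data` is a flat table with one row per rating, which can be filtered or grouped in other ways before any reshaping.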


Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth by Stuart Ritchie

Albert Einstein, anesthesia awareness, autism spectrum disorder, Bayesian statistics, Black Lives Matter, Carmen Reinhart, Cass Sunstein, Charles Babbage, citation needed, Climatic Research Unit, cognitive dissonance, complexity theory, coronavirus, correlation does not imply causation, COVID-19, crowdsourcing, data science, deindustrialization, Donald Trump, double helix, en.wikipedia.org, epigenetics, Estimating the Reproducibility of Psychological Science, fake news, Goodhart's law, Growth in a Time of Debt, Helicobacter pylori, Higgs boson, hype cycle, Kenneth Rogoff, l'esprit de l'escalier, Large Hadron Collider, meta-analysis, microbiome, Milgram experiment, mouse model, New Journalism, ocean acidification, p-value, phenotype, placebo effect, profit motive, publication bias, publish or perish, quantum entanglement, race to the bottom, randomized controlled trial, recommendation engine, rent-seeking, replication crisis, Richard Thaler, risk tolerance, Ronald Reagan, Scientific racism, selection bias, Silicon Valley, Silicon Valley startup, social distancing, Stanford prison experiment, statistical model, stem cell, Steven Pinker, TED Talk, Thomas Bayes, twin studies, Tyler Cowen, University of East Anglia, Wayback Machine

Even worse: of that seven, fully six were redundant compared to much simpler methods that had been known about for years before these new algorithms appeared. Maurizio Ferrari Dacrema et al., ‘Are We Really Making Much Progress?: A Worrying Analysis of Recent Neural Recommendation Approaches’, in Proceedings of the 13th ACM Conference on Recommender Systems – RecSys 2019 (Copenhagen, Denmark: ACM Press, 2019): pp. 101–9; https://doi.org/10.1145/3298689.3347058. See also this report from computer science, which hints that new researchers are having trouble reproducing the performance of several classic algorithms – something of a ticking time bomb, since ‘young researchers don’t want to be seen as criticising senior researchers’ by publishing failures to reproduce the performance of algorithms the senior researchers had developed and on which they’d staked their reputations: Matthew Hutson, ‘Artificial Intelligence Faces Reproducibility Crisis’, Science 359, no. 6377 (16 Feb. 2018): pp. 725–26; https://doi.org/10.1126/science.359.6377.725, p. 726.


pages: 502 words: 132,062

Ways of Being: Beyond Human Intelligence by James Bridle

Ada Lovelace, Airbnb, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Anthropocene, Any sufficiently advanced technology is indistinguishable from magic, autonomous vehicles, behavioural economics, Benoit Mandelbrot, Berlin Wall, Big Tech, Black Lives Matter, blockchain, Californian Ideology, Cambridge Analytica, carbon tax, Charles Babbage, cloud computing, coastline paradox / Richardson effect, Computing Machinery and Intelligence, corporate personhood, COVID-19, cryptocurrency, DeepMind, Donald Trump, Douglas Hofstadter, Elon Musk, experimental subject, factory automation, fake news, friendly AI, gig economy, global pandemic, Gödel, Escher, Bach, impulse control, James Bridle, James Webb Space Telescope, John von Neumann, Kickstarter, Kim Stanley Robinson, language acquisition, life extension, mandelbrot fractal, Marshall McLuhan, microbiome, music of the spheres, negative emissions, Nick Bostrom, Norbert Wiener, paperclip maximiser, pattern recognition, peer-to-peer, planetary scale, RAND corporation, random walk, recommendation engine, self-driving car, SETI@home, shareholder value, Silicon Valley, Silicon Valley ideology, speech recognition, statistical model, surveillance capitalism, techno-determinism, technological determinism, technoutopianism, the long tail, the scientific method, The Soul of a New Machine, theory of mind, traveling salesman, trolley problem, Turing complete, Turing machine, Turing test, UNCLOS, undersea cable, urban planning, Von Neumann architecture, wikimedia commons, zero-sum game

Google and others’ stated mission is to reduce this vast complexity. Their less trumpeted goal is to profit from it, at the expense of our own potential for random encounters, and thereby for our own evolution. So many of our tools are designed to reduce randomness in a similar fashion: from algorithmic recommendation systems to dating apps, from GPS navigation to weather forecasting. Each of these technologies – with the best of intentions – attempts to draw clear lines through a complex environment and provides us with a route to our desires free from obstructions, diversions and the vagaries of chance and unforeseen encounters.


pages: 1,202 words: 144,667

The Linux kernel primer: a top-down approach for x86 and PowerPC architectures by Claudia Salzberg Rodriguez, Gordon Fischer, Steven Smolski

Debian, Dennis Ritchie, domain-specific language, en.wikipedia.org, Free Software Foundation, G4S, history of Unix, Ken Thompson, level 1 cache, Multics, recommendation engine, Richard Stallman

Many of the C library routines available to user mode programs, such as the fork() function in Figure 3.9, bundle code and one or more system calls to accomplish a single function. When a user process calls one of these functions, certain values are placed into the appropriate processor registers and a software interrupt is generated. This software interrupt then calls the kernel entry point. Although not recommended, system calls (syscalls) can also be accessed from kernel code. From where a syscall should be accessed is the source of some discussion because syscalls called from the kernel can have an improvement in performance. This improvement in performance is weighed against the added complexity and maintainability of the code.
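The wrapper relationship described in this excerpt can be observed from user space. The following is a minimal sketch, assuming a Unix-like system where `ctypes.CDLL(None)` exposes the C library already loaded into the process; the helper name `pid_via_libc` is hypothetical, and `getpid()` is used instead of `fork()` only because it has no side effects.

```python
import ctypes
import os

# Assumption: a Unix-like system. CDLL(None) opens the symbols already
# linked into this process, which includes the C library.
libc = ctypes.CDLL(None)


def pid_via_libc():
    """Call the C library's getpid() routine, which wraps the getpid syscall."""
    return libc.getpid()


# Python's os.getpid() goes through the same C library routine, so the
# two values should agree.
print(pid_via_libc(), os.getpid())
```

Both calls end up in the same kernel service; the C library routine only hides the register setup and the trap into the kernel that the excerpt describes.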


pages: 598 words: 134,339

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World by Bruce Schneier

23andMe, Airbnb, airport security, AltaVista, Anne Wojcicki, AOL-Time Warner, augmented reality, behavioural economics, Benjamin Mako Hill, Black Swan, Boris Johnson, Brewster Kahle, Brian Krebs, call centre, Cass Sunstein, Chelsea Manning, citizen journalism, Citizen Lab, cloud computing, congestion charging, data science, digital rights, disintermediation, drone strike, Eben Moglen, Edward Snowden, end-to-end encryption, Evgeny Morozov, experimental subject, failed state, fault tolerance, Ferguson, Missouri, Filter Bubble, Firefox, friendly fire, Google Chrome, Google Glasses, heat death of the universe, hindsight bias, informal economy, information security, Internet Archive, Internet of things, Jacob Appelbaum, James Bridle, Jaron Lanier, John Gilmore, John Markoff, Julian Assange, Kevin Kelly, Laura Poitras, license plate recognition, lifelogging, linked data, Lyft, Mark Zuckerberg, moral panic, Nash equilibrium, Nate Silver, national security letter, Network effects, Occupy movement, operational security, Panopticon Jeremy Bentham, payday loans, pre–internet, price discrimination, profit motive, race to the bottom, RAND corporation, real-name policy, recommendation engine, RFID, Ross Ulbricht, satellite internet, self-driving car, Shoshana Zuboff, Silicon Valley, Skype, smart cities, smart grid, Snapchat, social graph, software as a service, South China Sea, sparse data, stealth mode startup, Steven Levy, Stuxnet, TaskRabbit, technological determinism, telemarketer, Tim Cook: Apple, transaction costs, Uber and Lyft, uber lyft, undersea cable, unit 8200, urban planning, Wayback Machine, WikiLeaks, workplace surveillance , Yochai Benkler, yottabyte, zero day

So, for example, Bruce Schneier might be 608429. They were surprised when researchers were able to attach names to numbers by correlating different items in individuals’ search history. In 2008, Netflix published 10 million movie rankings by 500,000 anonymized customers, as part of a challenge for people to come up with better recommendation systems than the one the company was using at that time. Researchers were able to de-anonymize people by comparing rankings and time stamps with public rankings and time stamps in the Internet Movie Database. These might seem like special cases, but correlation opportunities pop up more frequently than you might think.
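The correlation step described here can be illustrated with a toy linkage sketch. It is not the researchers' actual method: the record layout, the `link_records` function, the thresholds, and all data below are hypothetical, chosen only to show how matching ratings and nearby time stamps can tie an anonymous ID to a public name.

```python
from collections import Counter
from datetime import date


def link_records(anonymous, public, max_days=3, min_matches=2):
    """Guess a public name for each anonymous ID from overlapping ratings.

    Both inputs are lists of (identity, title, rating, day) tuples.
    A pair counts as a hit when title and rating agree and the two
    dates are at most max_days apart.
    """
    scores = {}
    for anon_id, title, rating, day in anonymous:
        for name, pub_title, pub_rating, pub_day in public:
            same_item = title == pub_title and rating == pub_rating
            close_in_time = abs((day - pub_day).days) <= max_days
            if same_item and close_in_time:
                scores.setdefault(anon_id, Counter())[name] += 1

    guesses = {}
    for anon_id, counter in scores.items():
        name, hits = counter.most_common(1)[0]
        if hits >= min_matches:  # require several hits before guessing
            guesses[anon_id] = name
    return guesses


# Hypothetical toy data: anonymized rankings and public rankings.
anonymous = [
    (1001, "Film A", 5, date(2005, 3, 1)),
    (1001, "Film B", 2, date(2005, 3, 4)),
    (1002, "Film A", 3, date(2005, 3, 2)),
]
public = [
    ("user_x", "Film A", 5, date(2005, 3, 2)),
    ("user_x", "Film B", 2, date(2005, 3, 4)),
    ("user_y", "Film A", 3, date(2005, 6, 1)),
]

print(link_records(anonymous, public))
```

In this toy data only ID 1001 is linked, because two of its ratings line up with `user_x` in both score and date; ID 1002 shares a rating with `user_y` but the dates are months apart.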


pages: 1,331 words: 163,200

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

AlphaGo, Amazon Mechanical Turk, Anton Chekhov, backpropagation, combinatorial explosion, computer vision, constrained optimization, correlation coefficient, crowdsourcing, data science, deep learning, DeepMind, don't repeat yourself, duck typing, Elon Musk, en.wikipedia.org, friendly AI, Geoffrey Hinton, ImageNet competition, information retrieval, iterative process, John von Neumann, Kickstarter, machine translation, natural language processing, Netflix Prize, NP-complete, OpenAI, optical character recognition, P = NP, p-value, pattern recognition, pull request, recommendation engine, self-driving car, sentiment analysis, SpamAssassin, speech recognition, stochastic process

Going forward, my best advice to you is to practice and practice: try going through all the exercises if you have not done so already, play with the Jupyter notebooks, join Kaggle.com or some other ML community, watch ML courses, read papers, attend conferences, meet experts. You may also want to study some topics that we did not cover in this book, including recommender systems, clustering algorithms, anomaly detection algorithms, and genetic algorithms. My greatest hope is that this book will inspire you to build a wonderful ML application that will benefit all of us! What will it be? Aurélien Géron, November 26th, 2016 1 For more details, be sure to check out Richard Sutton and Andrew Barto’s book on RL, Reinforcement Learning: An Introduction (MIT Press), or David Silver’s free online RL course at University College London. 2 “Playing Atari with Deep Reinforcement Learning,” V.


pages: 680 words: 157,865

Beautiful Architecture: Leading Thinkers Reveal the Hidden Beauty in Software Design by Diomidis Spinellis, Georgios Gousios

Albert Einstein, barriers to entry, business intelligence, business logic, business process, call centre, continuous integration, corporate governance, database schema, Debian, domain-specific language, don't repeat yourself, Donald Knuth, duck typing, en.wikipedia.org, fail fast, fault tolerance, financial engineering, Firefox, Free Software Foundation, functional programming, general-purpose programming language, higher-order functions, iterative process, linked data, locality of reference, loose coupling, meta-analysis, MVC pattern, Neal Stephenson, no silver bullet, peer-to-peer, premature optimization, recommendation engine, Richard Stallman, Ruby on Rails, semantic web, smart cities, social graph, social web, SPARQL, Steve Jobs, Stewart Brand, Strategic Defense Initiative, systems thinking, the Cathedral and the Bazaar, traveling salesman, Turing complete, type inference, web application, zero-coupon bond

The real magic, however, is the explicit linkage between publicly available information, what that linkage represents, and the ease with which we can create windows into this underlying content. There is no starting point, and there is no end in sight. As long as we know what to ask for, we can usually get to it. Several technologies have emerged to help us know what to ask for, either through search engines or some manner of recommendation system. We like giving names to things because we are fundamentally name-oriented beings; we use names to disambiguate “that thing” from “that other thing.” One of our earliest communication acts as children is to name and point to the subjects that interest us and to ask for them. In many ways, the Web is the application of this childlike wonder to our collective wisdom and folly.


pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies by Nick Bostrom

agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, Anthropocene, anti-communist, artificial general intelligence, autism spectrum disorder, autonomous vehicles, backpropagation, barriers to entry, Bayesian statistics, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, Computing Machinery and Intelligence, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, Demis Hassabis, demographic transition, different worldview, Donald Knuth, Douglas Hofstadter, driverless car, Drosophila, Elon Musk, en.wikipedia.org, endogenous growth, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, general purpose technology, Geoffrey Hinton, Gödel, Escher, Bach, hallucination problem, Hans Moravec, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John Markoff, John von Neumann, knowledge worker, Large Hadron Collider, longitudinal study, machine translation, megaproject, Menlo Park, meta-analysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Nick Bostrom, Norbert Wiener, NP-complete, nuclear winter, operational security, optical character recognition, paperclip maximiser, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, search costs, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, Strategic Defense Initiative, strong AI, superintelligent machines, supervolcano, synthetic biology, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, time dilation, Tragedy of the Commons, transaction costs, trolley problem, Turing 
machine, Vernor Vinge, WarGames: Global Thermonuclear War, Watson beat the top human players on Jeopardy!, World Values Survey, zero-sum game

Then the entire system was overthrown by the heliocentric theory of Copernicus, which was simpler and—though only after further elaboration by Kepler—more predictively accurate.63 Artificial intelligence methods are now used in more areas than it would make sense to review here, but mentioning a sampling of them will give an idea of the breadth of applications. Aside from the game AIs listed in Table 1, there are hearing aids with algorithms that filter out ambient noise; route-finders that display maps and offer navigation advice to drivers; recommender systems that suggest books and music albums based on a user’s previous purchases and ratings; and medical decision support systems that help doctors diagnose breast cancer, recommend treatment plans, and aid in the interpretation of electrocardiograms. There are robotic pets and cleaning robots, lawn-mowing robots, rescue robots, surgical robots, and over a million industrial robots.64 The world population of robots exceeds 10 million.65 Modern speech recognition, based on statistical techniques such as hidden Markov models, has become sufficiently accurate for practical use (some fragments of this book were drafted with the help of a speech recognition program).


pages: 666 words: 181,495

In the Plex: How Google Thinks, Works, and Shapes Our Lives by Steven Levy

"World Economic Forum" Davos, 23andMe, AltaVista, Andy Rubin, Anne Wojcicki, Apple's 1984 Super Bowl advert, autonomous vehicles, Bill Atkinson, book scanning, Brewster Kahle, Burning Man, business process, clean water, cloud computing, crowdsourcing, Dean Kamen, discounted cash flows, don't be evil, Donald Knuth, Douglas Engelbart, Douglas Engelbart, Dutch auction, El Camino Real, Evgeny Morozov, fault tolerance, Firefox, General Magic , Gerard Salton, Gerard Salton, Google bus, Google Chrome, Google Earth, Googley, high-speed rail, HyperCard, hypertext link, IBM and the Holocaust, informal economy, information retrieval, Internet Archive, Jeff Bezos, John Markoff, Ken Thompson, Kevin Kelly, Kickstarter, large language model, machine translation, Mark Zuckerberg, Menlo Park, one-China policy, optical character recognition, PageRank, PalmPilot, Paul Buchheit, Potemkin village, prediction markets, Project Xanadu, recommendation engine, risk tolerance, Rubik’s Cube, Sand Hill Road, Saturday Night Live, search inside the book, second-price auction, selection bias, Sheryl Sandberg, Silicon Valley, SimCity, skunkworks, Skype, slashdot, social graph, social software, social web, spectrum auction, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, subscription business, Susan Wojcicki, Ted Nelson, telemarketer, The future is already here, the long tail, trade route, traveling salesman, turn-by-turn navigation, undersea cable, Vannevar Bush, web application, WikiLeaks, Y Combinator

While he put the pieces of YouTube together, though, he always kept in mind that he was documenting a traditional media system on the verge of collapse. He had to deal with the music world as it was but also plan for the way it would be after disruptions, which Google and YouTube were accelerating. Kamangar had some specific ideas for improvement of YouTube. He urged a simpler user interface and a smarter recommendation system to point users to other videos they might enjoy. He urged more flexibility with producers of professional video so YouTube would get more commercial content. He also emphasized how some of Google’s key attributes—notably speed—had a huge impact on the overall experience. If Google could reliably deliver videos with almost no latency, he reasoned, users might not balk so much at the “preroll” ads that come before the actual content, especially if the video was one of a series that users subscribed to and so were already eager to see what was coming.


pages: 586 words: 186,548

Architects of Intelligence by Martin Ford

3D printing, agricultural Revolution, AI winter, algorithmic bias, Alignment Problem, AlphaGo, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, backpropagation, barriers to entry, basic income, Baxter: Rethink Robotics, Bayesian statistics, Big Tech, bitcoin, Boeing 747, Boston Dynamics, business intelligence, business process, call centre, Cambridge Analytica, cloud computing, cognitive bias, Colonization of Mars, computer vision, Computing Machinery and Intelligence, correlation does not imply causation, CRISPR, crowdsourcing, DARPA: Urban Challenge, data science, deep learning, DeepMind, Demis Hassabis, deskilling, disruptive innovation, Donald Trump, Douglas Hofstadter, driverless car, Elon Musk, Erik Brynjolfsson, Ernest Rutherford, fake news, Fellow of the Royal Society, Flash crash, future of work, general purpose technology, Geoffrey Hinton, gig economy, Google X / Alphabet X, Gödel, Escher, Bach, Hans Moravec, Hans Rosling, hype cycle, ImageNet competition, income inequality, industrial research laboratory, industrial robot, information retrieval, job automation, John von Neumann, Large Hadron Collider, Law of Accelerating Returns, life extension, Loebner Prize, machine translation, Mark Zuckerberg, Mars Rover, means of production, Mitch Kapor, Mustafa Suleyman, natural language processing, new economy, Nick Bostrom, OpenAI, opioid epidemic / opioid crisis, optical character recognition, paperclip maximiser, pattern recognition, phenotype, Productivity paradox, radical life extension, Ray Kurzweil, recommendation engine, Robert Gordon, Rodney Brooks, Sam Altman, self-driving car, seminal paper, sensor fusion, sentiment analysis, Silicon Valley, smart cities, social intelligence, sparse data, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, synthetic biology, systems thinking, Ted Kaczynski, TED 
Talk, The Rise and Fall of American Growth, theory of mind, Thomas Bayes, Travis Kalanick, Turing test, universal basic income, Wall-E, Watson beat the top human players on Jeopardy!, women in the workforce, working-age population, workplace surveillance, zero-sum game, Zipcar

All over the world people are interacting with AI today through machine translation, image analysis, and computer vision. DeepMind has started working on quite a few things, like optimizing the energy being used in Google’s data centers. We’ve worked on WaveNet, the very human-like text-to-speech system that’s now in the Google Assistant in all Android-powered phones. We use AI in recommendation systems, in Google Play, and even on behind-the-scenes elements like saving battery life on your Android phone. Things that everyone uses every single day. We’re finding that because they’re general algorithms, they’re coming up all over the place, so I think that’s just the beginning. What I’m hoping will come through next are the collaborations we have in healthcare.


pages: 834 words: 180,700

The Architecture of Open Source Applications by Amy Brown, Greg Wilson

8-hour work day, anti-pattern, bioinformatics, business logic, c2.com, cloud computing, cognitive load, collaborative editing, combinatorial explosion, computer vision, continuous integration, Conway's law, create, read, update, delete, David Heinemeier Hansson, Debian, domain-specific language, Donald Knuth, en.wikipedia.org, fault tolerance, finite state, Firefox, Free Software Foundation, friendly fire, functional programming, Guido van Rossum, Ken Thompson, linked data, load shedding, locality of reference, loose coupling, Mars Rover, MITM: man-in-the-middle, MVC pattern, One Laptop per Child (OLPC), peer-to-peer, Perl 6, premature optimization, recommendation engine, revision control, Ruby on Rails, side project, Skype, slashdot, social web, speech recognition, the scientific method, The Wisdom of Crowds, web application, WebSocket

To cater to a broader set of users, including many who do not have programming expertise, it provides a series of operations and user interfaces that simplify workflow design and use [FSC+06], including the ability to create and refine workflows by analogy, to query workflows by example, and to suggest workflow completions as users interactively construct their workflows using a recommendation system [SVK+07]. We have also developed a new framework that allows the creation of custom applications that can be more easily deployed to (non-expert) end users. The extensibility of VisTrails comes from an infrastructure that makes it simple for users to integrate tools and libraries, as well as to quickly prototype new functions.


pages: 775 words: 208,604

The Great Leveler: Violence and the History of Inequality From the Stone Age to the Twenty-First Century by Walter Scheidel

agricultural Revolution, assortative mating, basic income, Berlin Wall, Bernie Sanders, Branko Milanovic, British Empire, capital controls, Capital in the Twenty-First Century by Thomas Piketty, classic study, collective bargaining, colonial rule, Columbian Exchange, conceptual framework, confounding variable, corporate governance, cosmological principle, CRISPR, crony capitalism, dark matter, declining real wages, democratizing finance, demographic transition, Dissolution of the Soviet Union, Downton Abbey, Edward Glaeser, failed state, Fall of the Berlin Wall, financial deregulation, fixed income, Francisco Pizarro, full employment, Gini coefficient, global pandemic, Great Leap Forward, guns versus butter model, hiring and firing, income inequality, John Markoff, knowledge worker, land reform, land tenure, low skilled workers, means of production, mega-rich, Network effects, nuclear winter, offshore financial centre, plutocrats, race to the bottom, recommendation engine, rent control, rent-seeking, road to serfdom, Robert Gordon, Ronald Reagan, Second Machine Age, Simon Kuznets, synthetic biology, The Future of Employment, The Rise and Fall of American Growth, The Wealth of Nations by Adam Smith, Thomas Malthus, transaction costs, transatlantic slave trade, universal basic income, very high income, working-age population, zero-sum game

The wealthy either held office themselves or were linked to those who did, and state service and connections to those who performed it in turn generated more personal wealth.8 These dynamics both favored and constrained familial continuity in wealth holding. On the one hand, the sons of high officials were more likely to follow in their footsteps. They and other junior relatives were automatically entitled to enter officialdom and benefited disproportionately from the recommendation system employed to fill governmental positions. We hear of officials among whose brothers and sons six or seven—in one case, no fewer than thirteen sons—also came to serve as imperial administrators. On the other hand, the same predatory and capricious exercise of political power that turned civil servants into plutocrats also undermined their success.


pages: 933 words: 205,691

Hadoop: The Definitive Guide by Tom White

Amazon Web Services, bioinformatics, business intelligence, business logic, combinatorial explosion, data science, database schema, Debian, domain-specific language, en.wikipedia.org, exponential backoff, fallacies of distributed computing, fault tolerance, full text search, functional programming, Grace Hopper, information retrieval, Internet Archive, Kickstarter, Large Hadron Collider, linked data, loose coupling, openstreetmap, recommendation engine, RFID, SETI@home, social graph, sparse data, web application

When processing the received data, we distinguish between a track listen submitted by a user (the first source above, referred to as a scrobble from here on) and a track listened to on the Last.fm radio (the second source, mentioned earlier, referred to as a radio listen from here on). This distinction is very important in order to prevent a feedback loop in the Last.fm recommendation system, which is based only on scrobbles. One of the most fundamental Hadoop jobs at Last.fm takes the incoming listening data and summarizes it into a format that can be used for display purposes on the Last.fm website as well as for input to other Hadoop programs. This is achieved by the Track Statistics program, which is the example described in the following sections.
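The distinction between scrobbles and radio listens can be sketched in plain Python, without Hadoop. This is only an illustration of the idea, not the Track Statistics program itself: the event layout, the `track_statistics` function, and the choice of summary fields are assumptions.

```python
from collections import defaultdict


def track_statistics(events):
    """Summarize (user, track, source) events into per-track counts.

    source is "scrobble" for a listen submitted by a user, or "radio"
    for a track played on the radio. The two are counted separately so
    that a recommender can read the scrobble count alone.
    """
    raw = defaultdict(lambda: {"scrobbles": 0, "radio_listens": 0, "users": set()})
    for user, track, source in events:
        entry = raw[track]
        entry["users"].add(user)
        if source == "scrobble":
            entry["scrobbles"] += 1
        elif source == "radio":
            entry["radio_listens"] += 1

    return {
        track: {
            "scrobbles": entry["scrobbles"],
            "radio_listens": entry["radio_listens"],
            "unique_listeners": len(entry["users"]),
        }
        for track, entry in raw.items()
    }


# Hypothetical listening events.
events = [
    ("u1", "track_a", "scrobble"),
    ("u2", "track_a", "radio"),
    ("u1", "track_a", "scrobble"),
    ("u3", "track_b", "radio"),
]

print(track_statistics(events))
```

Keeping the radio count in a separate field is what breaks the feedback loop: tracks that the radio itself chose to play do not inflate the signal that drives future recommendations.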


pages: 678 words: 216,204

The Wealth of Networks: How Social Production Transforms Markets and Freedom by Yochai Benkler

affirmative action, AOL-Time Warner, barriers to entry, bioinformatics, Brownian motion, business logic, call centre, Cass Sunstein, centre right, clean water, commoditize, commons-based peer production, dark matter, desegregation, digital divide, East Village, Eben Moglen, fear of failure, Firefox, Free Software Foundation, game design, George Gilder, hiring and firing, Howard Rheingold, informal economy, information asymmetry, information security, invention of radio, Isaac Newton, iterative process, Jean Tirole, jimmy wales, John Markoff, John Perry Barlow, Kenneth Arrow, Lewis Mumford, longitudinal study, machine readable, Mahbub ul Haq, market bubble, market clearing, Marshall McLuhan, Mitch Kapor, New Journalism, optical character recognition, pattern recognition, peer-to-peer, power law, precautionary principle, pre–internet, price discrimination, profit maximization, profit motive, public intellectual, radical decentralization, random walk, Recombinant DNA, recommendation engine, regulatory arbitrage, rent-seeking, RFID, Richard Stallman, Ronald Coase, scientific management, search costs, Search for Extraterrestrial Intelligence, SETI@home, shareholder value, Silicon Valley, Skype, slashdot, social software, software patent, spectrum auction, subscription business, tacit knowledge, technological determinism, technoutopianism, The Fortune at the Bottom of the Pyramid, the long tail, The Nature of the Firm, the strength of weak ties, Timothy McVeigh, transaction costs, vertical integration, Vilfredo Pareto, work culture , Yochai Benkler

Without one of these noncompetitive infrastructure owners, the home user has no broadband access to the Internet. In Amazon's case, the consumer outrage when the practice was revealed focused on the lack of transparency. Users had little objection to clearly demarcated advertisement. The resistance was to the nontransparent manipulation of the recommendation system aimed at causing the consumers to act in ways consistent with Amazon's goals, rather than their own. In that case, however, there were alternatives. There are many different places from which to find book reviews and recommendations, and [pg 157] at the time, barnesandnoble.com was already available as an online bookseller--and had not significantly adopted similar practices.


Engineering Security by Peter Gutmann

active measures, address space layout randomization, air gap, algorithmic trading, Amazon Web Services, Asperger Syndrome, bank run, barriers to entry, bitcoin, Brian Krebs, business process, call centre, card file, cloud computing, cognitive bias, cognitive dissonance, cognitive load, combinatorial explosion, Credit Default Swap, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, Debian, domain-specific language, Donald Davies, Donald Knuth, double helix, Dr. Strangelove, Dunning–Kruger effect, en.wikipedia.org, endowment effect, false flag, fault tolerance, Firefox, fundamental attribution error, George Akerlof, glass ceiling, GnuPG, Google Chrome, Hacker News, information security, iterative process, Jacob Appelbaum, Jane Jacobs, Jeff Bezos, John Conway, John Gilmore, John Markoff, John von Neumann, Ken Thompson, Kickstarter, lake wobegon effect, Laplace demon, linear programming, litecoin, load shedding, MITM: man-in-the-middle, Multics, Network effects, nocebo, operational security, Paradox of Choice, Parkinson's law, pattern recognition, peer-to-peer, Pierre-Simon Laplace, place-making, post-materialism, QR code, quantum cryptography, race to the bottom, random walk, recommendation engine, RFID, risk tolerance, Robert Metcalfe, rolling blackouts, Ruby on Rails, Sapir-Whorf hypothesis, Satoshi Nakamoto, security theater, semantic web, seminal paper, Skype, slashdot, smart meter, social intelligence, speech recognition, SQL injection, statistical model, Steve Jobs, Steven Pinker, Stuxnet, sunk-cost fallacy, supply-chain attack, telemarketer, text mining, the built environment, The Death and Life of Great American Cities, The Market for Lemons, the payments system, Therac-25, too big to fail, Tragedy of the Commons, Turing complete, Turing machine, Turing test, Wayback Machine, web application, web of trust, x509 certificate, Y2K, zero day, Zimmermann PGP

The same applies for many of the other results of psychology research mentioned above — you can scoff at them, but that won’t change the fact that they work when applied in the field (although admittedly trying to apply the Sapir-Whorf hypothesis to security messages may be going a bit far [20]). You can use social validation in your user interface to guide users in their decisionmaking, and in fact a similar technique has already been applied to the problem of making computer error messages more useful, using a social recommendation system to tune the error messages to make them comprehensible to larger numbers of users [21]. For example when you’re asking the user to make a security-related decision you can prompt them that “most users would do xyz” or “for most users, xyz is the best action”, where xyz is the safest and most appropriate choice.

, Marc Conrad, Tim French, Wei Huang and Carsten Maple, Proceedings of the 1st International Conference on Availability, Reliability and Security (ARES’06), April 2006, p.482. [191] “Graphical Representations of Authorization Policies for Weighted Credentials”, Isaac Agudo, Javier Lopez and Jose Montenegro, Proceedings of the 11th Australasian Conference on Information Security and Privacy (ACISP’06), Springer-Verlag LNCS No.4058, July 2006, p.87. [192] “Vulnerability analysis of certificate graphs”, Eunjin Jung and Mohamed Gouda, International Journal of Security and Networks, Vol.1, No.1/2 (2006), p.13. [193] “Towards a Precise Semantics for Authenticity and Trust”, Reto Kohlas, Jacek Jonczy and Rolf Haenni, Proceedings of the 2006 International Conference on Privacy, Security and Trust: Bridge the Gap Between PST Technologies and Business Services, October 2006, Article No.18. [194] “A Hybrid Trust Model for Enhancing Security in Distributed Systems”, Ching Lin and Vijay Varadharajan, Proceedings of the 2nd International Conference on Availability, Reliability and Security (ARES’07), April 2007, p.35. [195] “A Probabilistic Trust Model for GnuPG”, Jacek Jonczy, Markus Wüthrich and Rolf Haenni, presentation at the 23rd Chaos Communication Congress (23C3), December 2006, https://events.ccc.de/congress/2006/Fahrplan/attachments/1101-JWH06.pdf. [196] “Trust-Based Recommendation Systems: an Axiomatic Approach”, Reid Andersen, Christian Borgs, Jennifer Chayes, Uriel Feige, Abraham Flaxman, Adam Kalai, Vahab Mirrokni and Moshe Tennenholtz, Proceedings of the 17th World Wide Web Conference (WWW’08), April 2008, p.199. [197] “An Adaptive Probabilistic Trust Model and Its Evaluation”, Chung-Wei Hang, Yonghong Wang and Munindar Singh, Proceedings of the 7th Conference on Autonomous Agents and Multiagent Systems (AAMAS’08), May 2008, p.1485.
[198] “Trust*: Using Local Guarantees to Extend the Reach of Trust”, Stephen Clarke, Bruce Christianson and Hannan Xiao, Proceedings of the 17th Security Protocols Workshop (Protocols’09), Springer-Verlag LNCS No.7028, April 2009, p.189. [199] “Trust Is in the Eye of the Beholder”, Dimitri DeFigueiredo, Earl Barr and S.Felix Wu, Proceedings of the Conference on Computational Science and Engineering (CSE’09), August 2009, Vol.3, p.100.


pages: 918 words: 257,605

The Age of Surveillance Capitalism by Shoshana Zuboff

"World Economic Forum" Davos, algorithmic bias, Amazon Web Services, Andrew Keen, augmented reality, autonomous vehicles, barriers to entry, Bartolomé de las Casas, behavioural economics, Berlin Wall, Big Tech, bitcoin, blockchain, blue-collar work, book scanning, Broken windows theory, California gold rush, call centre, Cambridge Analytica, Capital in the Twenty-First Century by Thomas Piketty, Cass Sunstein, choice architecture, citizen journalism, Citizen Lab, classic study, cloud computing, collective bargaining, Computer Numeric Control, computer vision, connected car, context collapse, corporate governance, corporate personhood, creative destruction, cryptocurrency, data science, deep learning, digital capitalism, disinformation, dogs of the Dow, don't be evil, Donald Trump, Dr. Strangelove, driverless car, Easter island, Edward Snowden, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, facts on the ground, fake news, Ford Model T, Ford paid five dollars a day, future of work, game design, gamification, Google Earth, Google Glasses, Google X / Alphabet X, Herman Kahn, hive mind, Ian Bogost, impulse control, income inequality, information security, Internet of things, invention of the printing press, invisible hand, Jean Tirole, job automation, Johann Wolfgang von Goethe, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, Kevin Roose, knowledge economy, Lewis Mumford, linked data, longitudinal study, low skilled workers, Mark Zuckerberg, market bubble, means of production, multi-sided market, Naomi Klein, natural language processing, Network effects, new economy, Occupy movement, off grid, off-the-grid, PageRank, Panopticon Jeremy Bentham, pattern recognition, Paul Buchheit, performance metric, Philip Mirowski, precision agriculture, price mechanism, profit maximization, profit motive, public intellectual, recommendation engine, refrigerator car, 
RFID, Richard Thaler, ride hailing / ride sharing, Robert Bork, Robert Mercer, Salesforce, Second Machine Age, self-driving car, sentiment analysis, shareholder value, Sheryl Sandberg, Shoshana Zuboff, Sidewalk Labs, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, slashdot, smart cities, Snapchat, social contagion, social distancing, social graph, social web, software as a service, speech recognition, statistical model, Steve Bannon, Steve Jobs, Steven Levy, structural adjustment programs, surveillance capitalism, technological determinism, TED Talk, The Future of Employment, The Wealth of Nations by Adam Smith, Tim Cook: Apple, two-sided market, union organizing, vertical integration, Watson beat the top human players on Jeopardy!, winner-take-all economy, Wolfgang Streeck, work culture, Yochai Benkler, you are the product

Zuckerberg had described the corporation’s decision to unilaterally release users’ personal information, declaring, “We decided that these would be the social norms now, and we just went for it.”55 Despite their misgivings, the authors went on to suggest the relevance of their findings for “marketing,” “user interface design,” and recommender systems.56 In 2013 another provocative study by Kosinski, Stillwell, and Microsoft’s Thore Graepel revealed that Facebook “likes” could “automatically and accurately estimate a wide range of personal attributes that people would typically assume to be private,” including sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender.57 The authors appeared increasingly ambivalent about the social implications of their work.