recommendation engine

80 results

pages: 23 words: 5,264

Designing Great Data Products by Jeremy Howard, Mike Loukides, Margit Zwemer


AltaVista, Filter Bubble, PageRank, pattern recognition, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, text mining

One of the authors of this paper was explaining an iterative optimization technique, and the host says, “So, in a sense Jeremy, your approach was like that of doing a startup, which is just get something out there and iterate and iterate and iterate.” The takeaway, whether you are a tiny startup or a giant insurance company, is that we unconsciously use optimization whenever we decide how to get to where we want to go.

Drivetrain Approach to recommender systems

Let’s look at how we could apply this process to another industry: marketing. We begin by applying the Drivetrain Approach to a familiar example, recommendation engines, and then building this up into an entire optimized marketing strategy. Recommendation engines are a familiar example of a data product based on well-built predictive models that do not achieve an optimal objective. The current algorithms predict what products a customer will like, based on purchase history and the histories of similar customers. A company like Amazon represents every purchase that has ever been made as a giant sparse matrix, with customers as the rows and products as the columns.
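The customers-by-products matrix described above can be sketched in plain Python by storing only the purchases that actually happened. This is a minimal illustration, not Amazon's implementation; the customer and product names and both helper functions are made up for the example.

```python
from collections import defaultdict

# Hypothetical purchase log: one (customer, product) pair per purchase.
purchases = [
    ("alice", "book_a"),
    ("alice", "book_b"),
    ("bob", "book_b"),
    ("carol", "book_c"),
]


def build_sparse_matrix(pairs):
    """Map each customer (row) to the set of products (columns) they bought."""
    rows = defaultdict(set)
    for customer, product in pairs:
        rows[customer].add(product)
    return dict(rows)


def density(rows):
    """Fraction of customer/product cells that hold a purchase."""
    products = set().union(*rows.values()) if rows else set()
    cells = len(rows) * len(products)
    filled = sum(len(bought) for bought in rows.values())
    return filled / cells if cells else 0.0


matrix = build_sparse_matrix(purchases)
print(matrix["alice"], density(matrix))
```

Even in this toy log most cells are empty, which is why a row-to-set mapping (or a dedicated sparse-matrix type) is a more natural fit than a dense table.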

These models are good at predicting whether a customer will like a given product, but they often suggest products that the customer already knows about or has already decided not to buy. Amazon’s recommendation engine is probably the best one out there, but it’s easy to get it to show its warts. Search for the latest book in Terry Pratchett’s “Discworld” series, and the “Customers Who Bought This Item Also Bought” feed on Amazon fills with other books in the same series, yet it’s a good assumption that a customer who searched for “Terry Pratchett” is already aware of these books. There may be some unexpected recommendations on pages 2 through 14 of the feed, but how many customers are going to bother clicking through? Instead, let’s design an improved recommendation engine using the Drivetrain Approach, starting by reconsidering our objective. The objective of a recommendation engine is to drive additional sales by surprising and delighting the customer with books he or she would not have purchased without the recommendation.
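One way to read this objective is to rank books by how much a recommendation changes the chance of a purchase, rather than by how much the customer is predicted to like them. The sketch below assumes two probability estimates per book already exist; the function name, the book labels, and every number are hypothetical.

```python
def rank_by_uplift(candidates, top_n=3):
    """Rank items by the estimated gain in purchase probability from recommending them.

    `candidates` maps an item to (p_with, p_without): the assumed probability
    of purchase with and without the recommendation being shown.
    """
    uplift = {
        item: p_with - p_without
        for item, (p_with, p_without) in candidates.items()
    }
    ranked = sorted(uplift, key=uplift.get, reverse=True)
    return ranked[:top_n]


# Hypothetical estimates for one customer who searched for a Discworld book.
candidates = {
    "next_discworld_book": (0.90, 0.88),        # would be bought anyway
    "unfamiliar_fantasy_author": (0.30, 0.05),  # recommendation matters most
    "popular_science_title": (0.12, 0.02),
}
top = rank_by_uplift(candidates, top_n=2)
print(top)
```

Under these made-up numbers the book the customer likes most ranks last, because recommending it adds almost nothing to sales.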

O'Reilly Media

Chapter 1. Designing Great Data Products

By Jeremy Howard, Margit Zwemer, and Mike Loukides

In the past few years, we’ve seen many data products based on predictive modeling. These products range from weather forecasting to recommendation engines to services that predict airline flight times more accurately than the airline itself. But these products are still just making predictions, rather than asking what action they want someone to take as a result of a prediction. Prediction technology can be interesting and mathematically elegant, but we need to take the next step. The technology exists to build data products that can revolutionize entire industries.

pages: 519 words: 102,669

Programming Collective Intelligence by Toby Segaran


correlation coefficient, Debian, Firefox, full text search, information retrieval, PageRank, prediction markets, recommendation engine, slashdot, web application

To find a set of links similar to one that you found particularly interesting, you can try:

>>> url=recommendations.getRecommendations(delusers,user)[0][1]
>>> recommendations.topMatches(recommendations.transformPrefs(delusers),url)
[(0.312, u''), (0.312, u''), (0.266, u''), (0.254, u''), (0.254, u'')]

That's it! You've successfully added a recommendation engine to the site. There's a lot more that could be done here. Since the site supports searching by tags, you can look for tags that are similar to each other. You can even search for people trying to manipulate the "popular" pages by posting the same links with multiple accounts.

Item-Based Filtering

The way the recommendation engine has been implemented so far requires the use of all the rankings from every user in order to create a dataset. This will probably work well for a few thousand people or items, but a very large site like Amazon has millions of customers and products, and comparing a user with every other user and then comparing every product each user has rated can be very slow.
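The item-based alternative can be sketched as follows: flip the ratings so that items become the keys, then precompute each item's nearest neighbours once instead of comparing users at request time. The functions below are written for this illustration and only echo the names used in the session above; the Euclidean-distance similarity is one possible choice, and the ratings are invented.

```python
from math import sqrt


def transform_prefs(prefs):
    """Flip {user: {item: rating}} into {item: {user: rating}}."""
    flipped = {}
    for user, ratings in prefs.items():
        for item, rating in ratings.items():
            flipped.setdefault(item, {})[user] = rating
    return flipped


def sim_distance(prefs, a, b):
    """Similarity in (0, 1] from Euclidean distance over shared keys; 0.0 if none."""
    shared = [key for key in prefs[a] if key in prefs[b]]
    if not shared:
        return 0.0
    dist = sqrt(sum((prefs[a][key] - prefs[b][key]) ** 2 for key in shared))
    return 1.0 / (1.0 + dist)


def top_matches(prefs, target, n=5):
    """Return the n entries most similar to `target` as (score, name) pairs."""
    scores = [(sim_distance(prefs, target, other), other)
              for other in prefs if other != target]
    scores.sort(reverse=True)
    return scores[:n]


def build_item_table(prefs, n=5):
    """Precompute, for every item, its most similar items."""
    item_prefs = transform_prefs(prefs)
    return {item: top_matches(item_prefs, item, n) for item in item_prefs}


# Invented ratings: items A and B are always rated alike, C the opposite way.
ratings = {
    "ann": {"A": 5.0, "B": 5.0, "C": 1.0},
    "ben": {"A": 4.0, "B": 4.0, "C": 2.0},
    "cat": {"A": 1.0, "B": 1.0, "C": 5.0},
}
similar_items = build_item_table(ratings)
print(similar_items["A"])
```

Because item-to-item relationships tend to change more slowly than any one user's ratings, a table like `similar_items` can be rebuilt periodically and looked up cheaply in between.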

Introduction to Collective Intelligence

Netflix is an online DVD rental company that lets people choose movies to be sent to their homes, and makes recommendations based on the movies that customers have previously rented. In late 2006 it announced a prize of $1 million to the first person to improve the accuracy of its recommendation system by 10 percent, along with progress prizes of $50,000 to the current leader each year for as long as the contest runs. Thousands of teams from all over the world entered and, as of April 2007, the leading team has managed to score an improvement of 7 percent. By using data about which movies each customer enjoyed, Netflix is able to recommend movies to other customers that they may never have even heard of and keep them coming back for more. Any way to improve its recommendation system is worth a lot of money to Netflix. The search engine Google was started in 1998, at a time when there were already several big search engines, and many assumed that a new player would never be able to take on the giants.
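To make "improve the accuracy by 10 percent" concrete: the contest scored entries by root-mean-square error (RMSE) between predicted and actual star ratings, so a 10 percent improvement means an RMSE 10 percent lower than the baseline. The sketch below shows the arithmetic; the ratings and predictions are invented and the function names are only illustrative.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(actual))


def percent_improvement(baseline_rmse, new_rmse):
    """How much lower the new error is, as a percentage of the baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Invented star ratings and two sets of predictions for the same four rentals.
actual = [4, 3, 5, 2]
baseline_predictions = [3, 3, 4, 4]
candidate_predictions = [4, 3, 4, 3]

baseline_error = rmse(baseline_predictions, actual)
candidate_error = rmse(candidate_predictions, actual)
print(percent_improvement(baseline_error, candidate_error))
```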

Google is likely the largest effort: it not only uses web links to rank pages, but it constantly gathers information on when advertisements are clicked by different users, which allows Google to target the advertising more effectively. In Chapter 4 you'll learn about search engines and the PageRank algorithm, an important part of Google's ranking system. Other examples include web sites with recommendation systems. Sites like Amazon and Netflix use information about the things people buy or rent to determine which people or items are similar to one another, and then make recommendations based on purchase history. Other sites like Pandora use your ratings of different bands and songs to create custom radio stations with music they think you will enjoy. Chapter 2 covers ways to build recommendation systems. Prediction markets are also a form of collective intelligence. One of the most well known of these is the Hollywood Stock Exchange, where people trade stocks on movies and movie stars.

pages: 1,085 words: 219,144

Solr in Action by Trey Grainger, Timothy Potter


business intelligence, cloud computing, conceptual framework, crowdsourcing, data acquisition, failed state, fault tolerance, finite state, full text search, glass ceiling, information retrieval, natural language processing, performance metric, premature optimization, recommendation engine, web application

Instead of thinking of Solr as a text search engine, it can be mentally freeing to think of Solr as a “matching engine that happens to be able to match on parsed text.” Whether the search is manual or automated is of no consequence to Solr. In fact, several organizations have successfully built recommender systems directly on top of Solr using this thinking. The following sections will cover how to build your own Solr-powered recommendation engine and ultimately how to merge the concepts of a user-driven search experience and an automated recommendation system to provide a powerful, personalized search experience. In particular, we will discuss several content-based recommendation approaches including attribute-based matching, hierarchical-classification-based matching, matching based upon extracted interesting terms (More Like This), concept-based matching, and geographical matching.

This shifts the paradigm completely, because it requires software systems to be intelligent enough to recommend information to users as opposed to having them explicitly search for it. Although organizations such as Netflix and Amazon are well known for their recommender systems and have spent millions of dollars developing them, it’s both possible and easy to develop such systems yourself, particularly on top of Solr, to drastically improve the relevancy of your application.

16.5.1. Search vs. recommendations

When one thinks of a search engine, the vision of a keyword box (and sometimes a separate location box) typically comes to mind. Likewise, when one thinks of a recommendation engine, the vision of a magical algorithm which automatically suggests information based upon past behavior and preferences likely comes to mind. In reality, both search and recommendations are just related forms of matching, with search engines generally matching keywords and locations in a query to keywords and locations in a document, and recommendation engines typically matching behavior of users to documents for which other users exhibited similar behaviors or matching content of one document to the content of another document.
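Seen this way, a recommendation can be an ordinary query that software writes on the user's behalf. The sketch below turns a stored user profile into a Lucene-style query string, using the `field:(a OR b)` grouping and `^` boost syntax of the standard query parser. The field names, the profile, and the helper itself are hypothetical, and nothing is sent to a Solr server here.

```python
def build_match_query(profile, boosts=None):
    """Turn {field: [values]} into a query string of OR'ed field clauses.

    Fields are emitted in sorted order so the output is deterministic.
    `boosts` optionally maps a field name to a numeric boost factor.
    """
    boosts = boosts or {}
    clauses = []
    for field in sorted(profile):
        values = profile[field]
        if not values:
            continue  # skip fields with nothing to match on
        # Quote each value as a phrase, escaping embedded double quotes.
        quoted = " OR ".join('"{}"'.format(v.replace('"', '\\"')) for v in values)
        clause = "{}:({})".format(field, quoted)
        if field in boosts:
            clause += "^{}".format(boosts[field])
        clauses.append(clause)
    return " OR ".join(clauses)


# Hypothetical profile derived from a user's past behavior.
profile = {"genre": ["jazz", "blues"], "city": ["Atlanta"]}
query = build_match_query(profile, boosts={"genre": 2})
print(query)
```

The same string could be typed into a search box by hand, which is the point of the passage: only the source of the query differs.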

The beauty of collaborative filtering, regardless of the implementation, is that it’s able to work without any knowledge about the content of your documents. Therefore, you could build a recommendation engine based upon Solr with documents containing nothing more than document IDs and users, and you should still see quality recommendations as long as you have enough users linking your documents together. If you don’t put any text content, attributes, or classifications into Solr, however, you will not be able to use the content-based techniques at all. The next section will discuss why you may want to consider combining multiple techniques to achieve optimal relevancy in your recommendation system.

16.5.8. Hybrid approaches

Throughout this chapter, you have seen multiple different recommendation approaches, each with its own strengths and weaknesses.
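The claim that document IDs and users are enough can be illustrated with a small co-occurrence sketch in plain Python (not Solr): documents are scored by how often they were touched by users who share at least one document with the target user. The interaction data and the `recommend` helper are invented for the example.

```python
from collections import Counter, defaultdict


def recommend(interactions, user, top_n=3):
    """Suggest unseen documents based only on (user_id, doc_id) pairs."""
    docs_by_user = defaultdict(set)
    users_by_doc = defaultdict(set)
    for user_id, doc_id in interactions:
        docs_by_user[user_id].add(doc_id)
        users_by_doc[doc_id].add(user_id)

    seen = docs_by_user.get(user, set())
    scores = Counter()
    for doc in seen:
        for neighbor in users_by_doc[doc]:
            if neighbor == user:
                continue
            # Credit every document the neighbor has that the user lacks.
            for other_doc in docs_by_user[neighbor] - seen:
                scores[other_doc] += 1
    return [doc for doc, _ in scores.most_common(top_n)]


# Invented interactions: no titles, text, or attributes, only IDs.
interactions = [
    ("u1", "d1"), ("u1", "d2"),
    ("u2", "d1"), ("u2", "d2"), ("u2", "d3"),
    ("u3", "d2"), ("u3", "d3"), ("u3", "d4"),
    ("u4", "d5"),
]
print(recommend(interactions, "u1"))
```

Note that `d5` is never suggested to `u1`: no user links it to anything `u1` has seen, which is the "enough users linking your documents together" condition in miniature.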

pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future by Kevin Kelly


3D printing, A Declaration of the Independence of Cyberspace, AI winter, Airbnb, Albert Einstein, Amazon Web Services, augmented reality, bank run, barriers to entry, Baxter: Rethink Robotics, bitcoin, blockchain, book scanning, Brewster Kahle, Burning Man, cloud computing, computer age, connected car, crowdsourcing, dark matter, dematerialisation, Downton Abbey, Edward Snowden, Elon Musk, Filter Bubble, Freestyle chess, game design, Google Glasses, hive mind, Howard Rheingold, index card, indoor plumbing, industrial robot, Internet Archive, Internet of things, invention of movable type, invisible hand, Jaron Lanier, Jeff Bezos, job automation, Kevin Kelly, Kickstarter, linked data, Lyft, M-Pesa, Marshall McLuhan, means of production, megacity, Minecraft, multi-sided market, natural language processing, Netflix Prize, Network effects, new economy, Nicholas Carr, peer-to-peer lending, personalized medicine, placebo effect, planetary scale, postindustrial economy, recommendation engine, RFID, ride hailing / ride sharing, Rodney Brooks, self-driving car, sharing economy, Silicon Valley, slashdot, Snapchat, social graph, social web, software is eating the world, speech recognition, Stephen Hawking, Steven Levy, Ted Nelson, the scientific method, transport as a service, two-sided market, Uber for X, Watson beat the top human players on Jeopardy!, Whole Earth Review

And I’ll make it personal. How would I like to choose what I give my attention to next? First I’d like to be delivered more of what I know I like. This personal filter already exists. It’s called a recommendation engine. It is in wide use at Amazon, Netflix, Twitter, LinkedIn, Spotify, Beats, and Pandora, among other aggregators. Twitter uses a recommendation system to suggest who I should follow based on whom I already follow. Pandora uses a similar system to recommend what new music I’ll like based on what I already like. Over half of the connections made on LinkedIn arise from their follower recommender. Amazon’s recommendation engine is responsible for the well-known banner that “others who like this item also liked this next item.” Netflix uses the same to recommend movies for me. Clever algorithms churn through a massive history of everyone’s behavior in order to closely predict my own behavior.

Amazon’s greatest asset is not its Prime delivery service but the millions of reader reviews it has accumulated over decades. Readers will pay for Amazon’s all-you-can-read ebook service, Kindle Unlimited, even though they will be able to find ebooks for free elsewhere, because Amazon’s reviews will guide them to books they want to read. Ditto for Netflix. Movie fans will pay Netflix because their recommendation engine finds gems they would not otherwise discover. They may be free somewhere else, but they are essentially lost and buried. In these examples, you are not paying for the copies, you are paying for the findability. • • • These eight qualities require a new skill set for creators. Success no longer derives from mastering distribution. Distribution is nearly automatic; it’s all streams. The Great Copy Machine in the Sky takes care of that.


pages: 348 words: 39,850

Data Scientists at Work by Sebastian Gutierrez


Albert Einstein, algorithmic trading, bioinformatics, bitcoin, business intelligence, chief data officer, clean water, cloud computing, computer vision, continuous integration, correlation does not imply causation, crowdsourcing, data is the new oil, DevOps, domain-specific language, follow your passion, full text search, informal economy, information retrieval, Infrastructure as a Service, inventory management, iterative process, linked data, Mark Zuckerberg, microbiome, Moneyball by Michael Lewis explains big data, move fast and break things, natural language processing, Network effects, nuclear winter, optical character recognition, pattern recognition, Paul Graham, personalized medicine, Peter Thiel, pre–internet, quantitative hedge fund, quantitative trading / quantitative finance, recommendation engine, Renaissance Technologies, Richard Feynman, self-driving car, side project, Silicon Valley, Skype, software as a service, speech recognition, statistical model, Steve Jobs, stochastic process, technology bubble, text mining, the scientific method, web application

If you were looking for more of an individual output thing, I’m probably most proud of some work I did at Intuit prior to it acquiring Mint. We had a scrappy little team of four people doing an internal startup-like project. I had the chance to lead the creation of a personalization system. It was Mint-like in that we were using a recommendation engine to match a couple hundred advertisers we signed up and who had coupons to people based on people’s spending behaviors. It was super exciting to build a whole recommendation system from scratch that actually worked quite well. It contributed to Intuit’s decision to acquire Mint, because the project was sort of a proof of concept that we could do it and make it work. Gutierrez: What is a typical Netflix day for you and your team? Smallwood: It would be quite different for me versus my team, so I’ll talk about my team.

It’s not about celebrating the material part of it, but it’s about you wanting to look good and have a great night. And everyone should be able to do that. Gutierrez: How do you pick projects to work on? Smith: Interest and ability to persuade others that it’s a good project. A great deal of my work here has been in support for other people’s projects. For instance, one thing I’ve worked on is research into the recommendations system. They built the recommendation system and it’s been running. Now I am doing the research into how it’s actually working and if it’s actually working. Many of the projects end up being formulated this way. I think of an idea or a different hypothesis or assumption than what we are currently doing, and I go and test it. Then I present the data and we discuss the findings. From there we can figure out where to go next.

So as a data scientist, even if I don’t have the domain expertise I can learn it, and can work on any problem that can be quantitatively described. I can almost guarantee that I won’t be in fashion retail in my forties, but I’m sure I’ll be working on something that relies on data and using similar techniques and methodologies. Gutierrez: How would you describe your work to a data scientist? Shellman: I build the recommendation engines like the ones you’re used to seeing all over the web, and sometimes I do it with really unique data, like transactions involving personal stylists in our brick-and-mortar stores or color trends from fabrics. Gutierrez: What have you been working on recently? Shellman: Over the last year and a half I’ve mostly worked on Recommendo, building new algorithms and the real-time scorer. For the last couple months we’ve been working on a follow-up to Recommendo that will offer customer segmentation as a service.

pages: 302 words: 73,581

Platform Scale: How an Emerging Business Model Helps Startups Build Large Empires With Minimum Investment by Sangeet Paul Choudary


3D printing, Airbnb, Amazon Web Services, barriers to entry, bitcoin, blockchain, business process, Clayton Christensen, collaborative economy, crowdsourcing, cryptocurrency, data acquisition, frictionless, game design, hive mind, Internet of things, invisible hand, Kickstarter, Lean Startup, Lyft, M-Pesa, Mark Zuckerberg, means of production, multi-sided market, Network effects, new economy, Paul Graham, recommendation engine, ride hailing / ride sharing, shareholder value, sharing economy, Silicon Valley, Skype, Snapchat, social graph, social software, software as a service, software is eating the world, Spread Networks laid a new fibre optics cable between New York and Chicago, TaskRabbit, the payments system, too big to fail, transport as a service, two-sided market, Uber and Lyft, Uber for X, Wave and Pay

Context may be static or dynamic. Many Web 1.0 era filters were created based on long sign-up forms that the user filled out. Today, filters are created based on data captured on an ongoing basis through a user’s actions. Filters may be standalone or collaborative. Amazon’s “People who purchased this product also purchased this product” feature is based on a collaborative filter. Many recommendation platforms allow users to filter results based on a “people like you” parameter. This, again, is a collaborative filter. The most important innovation in recent times that has led to the spread of collaborative filters is the implementation of Facebook’s social graph. Through the social graph, third-party platforms like TripAdvisor serve reviews based on a collaborative filter of people who are close to you on the graph.
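A collaborative filter over a social graph can be sketched as a breadth-first search: find everyone within a few hops of the user, then keep only reviews written by those people, closest first. The graph, the reviews, and both function names below are made up; this does not describe how any particular platform implements it.

```python
from collections import deque


def within_distance(graph, start, max_hops):
    """Return {person: hops} for everyone reachable in at most max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    return hops


def reviews_from_network(graph, reviews, user, max_hops=2):
    """Keep reviews by people near `user` on the graph, nearest first."""
    hops = within_distance(graph, user, max_hops)
    nearby = [r for r in reviews if r["author"] in hops and r["author"] != user]
    return sorted(nearby, key=lambda r: hops[r["author"]])


# Hypothetical friendship graph and hotel reviews.
graph = {
    "you": ["ana"],
    "ana": ["you", "bo"],
    "bo": ["ana", "cy"],
    "cy": ["bo"],
}
reviews = [
    {"author": "cy", "text": "Too far from the beach."},
    {"author": "bo", "text": "Quiet rooms."},
    {"author": "zed", "text": "Great breakfast."},
    {"author": "ana", "text": "Friendly staff."},
]
shown = reviews_from_network(graph, reviews, "you", max_hops=2)
print([r["author"] for r in shown])
```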

pages: 353 words: 104,146

European Founders at Work by Pedro Gairifo Santos


business intelligence, cloud computing, crowdsourcing, fear of failure, full text search, information retrieval, inventory management, iterative process, Jeff Bezos, Lean Startup, Mark Zuckerberg, natural language processing, pattern recognition, pre–internet, recommendation engine, Richard Stallman, Silicon Valley, Skype, slashdot, Steve Jobs, Steve Wozniak, subscription business, technology bubble, web application, Y Combinator

All this time, we were mainly concerned with keeping the site afloat, keeping it fast, scaling up properly, and this sort of scrobbling data and radio. The recommendation engine wasn't brilliant to begin with. And then, we finally decided we needed to hire somebody who knows what they're doing, who's going to work on this full-time. We e-mailed some mailing lists. We e-mailed the ISMIR (International Society for Music Information Retrieval) mailing list. They're a group who meet every year about music recommendations and information retrieval in music. We ended up hiring a guy called Norman, who was both a great scientist and understood all the algorithms and captive audience sort of things, but also an excellent programmer who was able to implement all these ideas. So we got really lucky. The first person we hired was great and he just took over. He chucked out all of our crappy recommendation systems we had and built something good, and then improved it constantly for the next several years. So we had some A/B testing, split testing systems in there for the radio so they could try out new tweaks to the algorithms and see what was performing better.
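The split-testing idea mentioned at the end can be sketched as follows: hash each listener into a stable bucket so they always hear the same algorithm variant, then compare an engagement metric per bucket. The variant names, the "minutes listened" metric, and both functions are assumptions made for this illustration, not a description of the system in the interview.

```python
import hashlib
from collections import defaultdict


def assign_variant(user_id, experiment, variants=("control", "tweak")):
    """Deterministically map a user to one variant of an experiment."""
    key = "{}:{}".format(experiment, user_id).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def summarize(events, experiment):
    """Mean listening minutes per variant from (user_id, minutes) events."""
    totals = defaultdict(lambda: [0.0, 0])  # variant -> [sum, count]
    for user_id, minutes in events:
        bucket = totals[assign_variant(user_id, experiment)]
        bucket[0] += minutes
        bucket[1] += 1
    return {variant: total / count for variant, (total, count) in totals.items()}


# Invented listening sessions for a hypothetical "radio_v2" experiment.
events = [("user{}".format(i), 10.0 + (i % 7)) for i in range(200)]
print(summarize(events, "radio_v2"))
```

Hashing on the experiment name as well as the user ID means the same listener can land in different buckets for different experiments, which keeps tests from being correlated with one another.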

They weren't even interested in recommendations at that point. I didn't really have a good recommender system for a long time. From your listening stats, you could click on an artist, and see who else had been listening to them. You could then see the listening stats of the other fans of artists you like. Just that system of connecting all the listening tastes proved to be really quite addictive. It spread by word of mouth. And then toward the end of my degree, I started working on some collaborative filtering recommendation stuff. Obviously that all tapped into some latent interest that people have in stats on their music listening. So I knew that recommendations weren't necessarily the main focus at that point. Not for a couple years after that did we have a really good recommender system. Music recommendation never really was my field, but I had a go at it, and then later on we hired somebody who knew what they were doing.

Santos: Did you ever have any court problems with any of the copyright holders? Jones: Nothing substantial, really. I think sometimes rights holders, especially in the music industry, will use court action or the threat of court action as a sort of negotiating position. But, no. I think we managed to avoid anything serious in that regard. Santos: From the technical point of view, the actual recommendation engine and statistics, how does that actually work? How hard was it to develop it and tweak it? Did you change the approach many times? Did you have a clear idea on how to do it from the start? Jones: So initially when I was building it, we tried all sorts of stuff. I think what I was using for a long time in the beginning was just to use Lucene, a document indexing system. We just created fake documents of people's profiles.
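The "fake documents" trick can be illustrated without Lucene: treat each listener's profile as a document whose words are artist names, weight the terms with TF-IDF, and compare profiles by cosine similarity. This is a sketch of the general idea only. Lucene's actual scoring differs, the interview does not say how the documents were built or weighted, and the profiles below are invented.

```python
from collections import Counter
from math import log, sqrt


def tfidf_vectors(profiles):
    """profiles: {user: [artist, ...]}, one entry per play. Returns weighted vectors."""
    n = len(profiles)
    doc_freq = Counter()
    for artists in profiles.values():
        doc_freq.update(set(artists))  # count each artist once per profile
    vectors = {}
    for user, artists in profiles.items():
        counts = Counter(artists)
        vectors[user] = {
            artist: plays * log(1 + n / doc_freq[artist])
            for artist, plays in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def most_similar(profiles, user):
    """Return the other user whose profile document is closest to `user`'s."""
    vectors = tfidf_vectors(profiles)
    others = [(cosine(vectors[user], vectors[o]), o) for o in profiles if o != user]
    return max(others)[1] if others else None


# Invented listening profiles.
profiles = {
    "amy": ["radiohead", "radiohead", "bjork"],
    "raj": ["radiohead", "bjork", "bjork"],
    "lee": ["metallica", "slayer"],
}
print(most_similar(profiles, "amy"))
```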

pages: 368 words: 96,825

Bold: How to Go Big, Create Wealth and Impact the World by Peter H. Diamandis, Steven Kotler


3D printing, additive manufacturing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, cloud computing, crowdsourcing, Daniel Kahneman / Amos Tversky, dematerialisation, deskilling, Elon Musk, Exxon Valdez, fear of failure, Firefox, Galaxy Zoo, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, industrial robot, Internet of things, Jeff Bezos, John Harrison: Longitude, Jono Bacon, Just-in-time delivery, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, loss aversion, Louis Pasteur, Mahatma Gandhi, Mark Zuckerberg, Mars Rover, meta-analysis, microbiome, minimum viable product, move fast and break things, Narrative Science, Netflix Prize, Network effects, Oculus Rift, optical character recognition, packet switching, PageRank, pattern recognition, performance metric, Peter H. Diamandis: Planetary Resources, Peter Thiel, pre–internet, Ray Kurzweil, recommendation engine, Richard Feynman, ride hailing / ride sharing, risk tolerance, rolodex, self-driving car, sentiment analysis, shareholder value, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart grid, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, telepresence, telepresence robot, Turing test, urban renewal, web application, X Prize, Y Combinator

Thus, if you could create an incentive prize that harnessed this competitive love of coding and this argumentative love of movies and tied them together—meaning design a prize around the intrinsic motivations at the core of coder culture—what might be possible? Well, in the case of Netflix, a better movie recommendation engine. A movie recommendation engine is a bit of software that tells you what movie you might want to watch next based on movies you’ve already watched and rated (on a scale of one to five stars). Netflix’s original recommendation engine, Cinematch, was created back in 2000 and quickly proved to be a wild success. Within a few years, nearly two-thirds of their rental business was being driven by their recommendation engine. Thus the obvious corollary: the better their recommendation engine, the better their business. And that was the problem. By the middle 2000s, Netflix engineers had plucked all the low-hanging fruit and the rate of Cinematch optimization had slowed to a crawl.

The prize hunters, even the leaders, are startlingly open about the methods they’re using, acting more like academics huddled over a knotty problem than entrepreneurs jostling for a $1 million payday. In December 2006, a competitor called ‘simonfunk’ posted a complete description of his algorithm—which at the time was tied for third place—giving everyone else the opportunity to piggyback on his progress. ‘We had no idea the extent to which people would collaborate with each other,’ says Jim Bennett, vice president for recommendation systems at Netflix.”16 And this isn’t an aberration. Over the course of the eight XPRIZEs launched to date, there has been an extraordinary amount of cooperation. We’ve seen teams providing unsolicited advice, teams merging, teams acquiring and sharing technology and experts. When the prize is driven by an MTP, while a team’s primary purpose is to win, a close second is their desire to see the primary objective achieved; thus teams exhibit a much higher willingness to share.

pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier


23andMe, Affordable Care Act / Obamacare, airport security, AltaVista, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, IBM and the Holocaust, index card, informal economy, Internet of things, invention of the printing press, Jeff Bezos, Louis Pasteur, Mark Zuckerberg, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, performance metric, Peter Thiel, post-materialism, random walk, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, speech recognition, Steve Jobs, Steven Levy, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Watson beat the top human players on Jeopardy!

In fact, the company approached its business model in that order, which is the inverse of the norm. It initially only had the idea for its celebrated recommendation system. Its stock market prospectus in 1997 described “collaborative filtering” before Amazon knew how it would work in practice or had enough data to make it useful. Both Google and Amazon span the categories, but their strategies differ. When Google first sets out to collect any sort of data, it has secondary uses in mind. Its Street View cars, as we have seen, collected GPS information not just for its map service but also to train self-driving cars. By contrast, Amazon is more focused on the primary use of data and only taps the secondary uses as a marginal bonus. Its recommendation system, for example, relies on clickstream data as a signal, but the company hasn’t used the information to do extraordinary things like predict the state of the economy or flu outbreaks.

Companies that have failed to appreciate the importance of data’s reuse have learned their lesson the hard way. For example, in Amazon’s early days it signed a deal with AOL to run the technology behind AOL’s e-commerce site. To most people, it looked like an ordinary outsourcing deal. But what really interested Amazon, explains Andreas Weigend, Amazon’s former chief scientist, was getting hold of data on what AOL users were looking at and buying, which would improve the performance of its recommendation engine. Poor AOL never realized this. It only saw the data’s value in terms of its primary purpose—sales. Clever Amazon knew it could reap benefits by putting the data to a secondary use. Or take the case of Google’s entry into speech recognition with GOOG-411 for local search listings, which ran from 2007 to 2010. The search giant didn’t have its own speech-recognition technology so needed to license it.

Buy a book on Poland and you’d be bombarded with Eastern European fare. Purchase one about babies and you’d be inundated with more of the same. “They tended to offer you tiny variations on your previous purchase, ad infinitum,” recalled James Marcus, an Amazon book reviewer from 1996 to 2001, in his memoir, Amazonia. “It felt as if you had gone shopping with the village idiot.” Greg Linden saw a solution. He realized that the recommendation system didn’t actually need to compare people with other people, a task that was technically cumbersome. All it needed to do was find associations among products themselves. In 1998 Linden and his colleagues applied for a patent on “item-to-item” collaborative filtering, as the technique is known. The shift in approach made a big difference. Because the calculations could be done ahead of time, the recommendations were lightning fast.

pages: 377 words: 97,144

Singularity Rising: Surviving and Thriving in a Smarter, Richer, and More Dangerous World by James D. Miller


23andMe, affirmative action, Albert Einstein, artificial general intelligence, Asperger Syndrome, barriers to entry, brain emulation, cloud computing, cognitive bias, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, David Brooks, David Ricardo: comparative advantage, Deng Xiaoping, feminist movement, Flynn Effect, friendly AI, hive mind, impulse control, indoor plumbing, invention of agriculture, Isaac Newton, John von Neumann, knowledge worker, Long Term Capital Management, low skilled workers, Netflix Prize, neurotypical, pattern recognition, Peter Thiel, phenotype, placebo effect, prisoner's dilemma, profit maximization, Ray Kurzweil, recommendation engine, reversible computing, Richard Feynman, Rodney Brooks, Silicon Valley, Singularitarianism, Skype, statistical model, Stephen Hawking, Steve Jobs, supervolcano, technological singularity, The Coming Technological Singularity, the scientific method, Thomas Malthus, transaction costs, Turing test, Vernor Vinge, Von Neumann architecture

A big part of our brain is devoted to processing visual inputs. Hence, a good recommendation system would necessarily have powerful insights into a significant chunk of our brains. 3. Measurable Incremental Progress—Think of AI as a destination a thousand miles away with the entire pathway hidden by fog. To reach our destination, we need to take many small steps, and for each step we need a way to determine if we have gone in the right direction. A video recommendation system provides this corrective by gathering continuous feedback on how many users liked the recommended videos. 4. Profitable with Every Step—Businesses are more motivated to invest in a type of innovation if they can continually increase revenue with each small improvement. Consequently, an application such as a video recommendation engine in which each improvement increases consumer satisfaction is (all else being equal) more likely to attract large corporate investment than an application that would have value only if it achieved near-human-level intelligence. 5. Amenable to Parallel Processing—Imagine we want to move a heavy object from point A to point B.

Fortunately, with video recommendations, many challenges, such as finding what type of cat video a certain set of users might enjoy, can be worked on independently for reasonably long periods of time. 6. Free Labor from Customers—A recommendation system would rely on millions of people to freely help train the system by picking which videos to watch, rating some of the videos they see, writing reviews of videos, and labeling in words the content they upload. 7. Help from Advertisers and Political Consultants—Salesmen would eagerly seek to learn what types of messages appealed to different factions of the population. The recommendation system could piggyback on these salesmen’s attempts to understand their clientele and use their insights to improve recommendation software. 8. AI and Human Recommenders Could Productively Work Together—Unlike what YouTube currently does, an effective AI recommendation system could make use of human evaluators. When my son was four, he enjoyed watching YouTube videos of supernovas and children’s cartoons.

For example, if 90 percent of people who had some unusual allele or brain microstructure enjoyed a certain cat video, then the AI recommender would suggest the video to all other viewers who had that trait. 12. Amenable to Crowdsourcing—Netflix, the rent-by-mail and streaming video distributor, offered (and eventually paid) a $1 million prize to whichever group improved its recommendation system the most, so long as at least one group improved the system by at least 10 percent. This “crowdsourcing,” which occurs when a problem is thrown open to anyone, helps a company by allowing them to draw on the talents of strangers, while only paying the strangers if they help the firm. This kind of crowdsourcing works only if, as with a video recommendation system, there is an easy and objective way of measuring progress toward the crowdsourced goal. 13. Potential Improvement All the Way Up to Superhuman Artificial General Intelligence—A recommendation AI could slowly morph into a content creator.
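The "easy and objective way of measuring progress" in the crowdsourcing point can be made concrete. The Netflix Prize scored entries by root-mean-square error (RMSE) on held-out ratings, and the 10 percent threshold was a relative reduction in that error. The sketch below computes both quantities; the rating values and the names `rmse` and `relative_improvement` are illustrative assumptions, not contest data or contest code.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline_rmse, candidate_rmse):
    """Fraction by which the candidate reduces the baseline error."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Made-up ratings on a 1-5 scale, for illustration only.
actual = [4, 3, 5, 2]
baseline_predictions = [3, 3, 4, 3]
candidate_predictions = [4, 3, 4.5, 2.5]

base = rmse(baseline_predictions, actual)
cand = rmse(candidate_predictions, actual)
print(f"baseline RMSE {base:.3f}, candidate RMSE {cand:.3f}")
print("clears 10% threshold:", relative_improvement(base, cand) >= 0.10)
```

Because the score is a single number computed the same way for every entrant, strangers can be ranked without anyone judging their methods.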

Remix: Making Art and Commerce Thrive in the Hybrid Economy by Lawrence Lessig


Amazon Web Services, Andrew Keen, Benjamin Mako Hill, Berlin Wall, Bernie Sanders, Brewster Kahle, Cass Sunstein, collaborative editing, disintermediation, don't be evil, Erik Brynjolfsson, Internet Archive, invisible hand, Jeff Bezos, jimmy wales, Kevin Kelly, late fees, Netflix Prize, Network effects, new economy, optical character recognition, PageRank, recommendation engine, revision control, Richard Stallman, Ronald Coase, Saturday Night Live, SETI@home, sharing economy, Silicon Valley, Skype, slashdot, Steve Jobs, The Nature of the Firm, thinkpad, transaction costs, VA Linux

And so increasingly, we must ask how these different norms might be made to coexist. Jeff Jarvis, journalist and blogger, suggests companies “pay dividends back to [the] crowd” and avoid trying too hard “to control [the gathered] wisdom, and limit its use and the sharing of it.”19 Tapscott and Williams make the same recommendation: “platforms for participation will only remain viable for as long as all the stakeholders are adequately and appropriately compensated for their contributions—don’t expect a free ride forever.”20 The key word here is “appropriately.” Obviously, there must be adequate compensation. But the kind of compensation is the puzzle. Once again, the “sharing economy” of two lovers is one in which both need to be concerned that the other is “adequately and appropriately compensated for [his or her] contribution.”

pages: 398 words: 86,855

Bad Data Handbook by Q. Ethan McCallum


Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, cloud computing, cognitive dissonance, combinatorial explosion, conceptual framework, database schema, Firefox, Flash crash, Gini coefficient, illegal immigration, iterative process, labor-force participation, loose coupling, natural language processing, Netflix Prize, quantitative trading / quantitative finance, recommendation engine, sentiment analysis, statistical model, supply-chain management, text mining, too big to fail, web application

Facebook is powered by its Open Graph, the “people and the connections they have to everything they care about.”[68] Facebook provides an API to access this social network and make it available for integration into other networked datasets. On Twitter, the network structure resulting from friends and followers leads to recommendations of “Who to follow.” On LinkedIn, network-based recommendations include “Jobs you may be interested in” and “Groups you may like.” The recommendation engine is built on a “Taste Graph” that “uses signals from around the Web to map members with their predicted affinity for products, services, other people, websites, or just about anything, and customizes recommended topics for them.”[69] A search on Google can be considered a type of recommendation about which of possibly millions of search hits are most relevant for a particular query.

Springer-Verlag New York, Inc., New York, NY, USA. [63] [64] [65] [66] [67] Ted G. Lewis. 2009. Network Science: Theory and Applications. Wiley Publishing. [68] [69] “eBay Acquires Recommendation Engine,” [70] Brin, S.; Page, L. 1998. “The anatomy of a large-scale hypertextual Web search engine.” Computer Networks and ISDN Systems 30: 107–117 Chapter 14. Myths of Cloud Computing Steve Francia Myths are an important and natural part of the emergence of any new technology, product, or idea as identified by the hype cycle.

I’ve written code to process accelerometer and hydrophone signals for analysis of dams and other large structures (as an undergraduate student in Engineering at Harvey Mudd College), analyzed recordings of calls from various species of bats (as a graduate student in Electrical Engineering at the University of Washington), built systems to visualize imaging sonar data (as a Graduate Research Assistant at the Applied Physics Lab), used large amounts of crawled web content to build content filtering systems (as the co-founder and CTO of N2H2, Inc.), designed intranet search systems for portal software (at DataChannel), and combined multiple sets of directory assistance data into a searchable website (as CTO at For the past five years or so, I’ve spent most of my time at Demand Media using a wide variety of data sources to build optimization systems for advertising and content recommendation systems, with various side excursions into large-scale data-driven search engine optimization (SEO) and search engine marketing (SEM). Most of my examples will be related to work I’ve done in Ad Optimization, Content Recommendation, SEO, and SEM. These areas, as with most, have their own terminology, so a few term definitions may be helpful.
Table 2-1. Term Definitions
Term: Definition
PPC: Pay Per Click—Internet advertising model used to drive traffic to websites with a payment model based on clicks on advertisements.

pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem


Amazon Web Services, anti-pattern, bioinformatics, corporate governance, create, read, update, delete, data acquisition, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, web application

As in the social use case, making an effective recommendation depends on understanding the connections between things, as well as the quality and strength of those connections—all of which are best expressed as a property graph. Queries are primarily graph local, in that they start with one or more identifiable subjects, whether people or resources, and thereafter discover surrounding portions of the graph. Taken together, social networks and recommendation engines provide key differentiating capabilities in the areas of retail, recruitment, sentiment analysis, search, and knowledge management. Graphs are a good fit for the densely connected data structures germane to each of these areas; storing and querying this data using a graph database allows an application to surface end-user realtime results that reflect recent changes to the data, rather than pre-calculated, stale results.

• Foreign key constraints add additional development and maintenance overhead just to make the database work.
• Sparse tables with nullable columns require special checking in code, despite the presence of a schema.
• Several expensive joins are needed just to discover what a customer bought.
• Reciprocal queries are even more costly. “What products did a customer buy?” is relatively cheap compared to “which customers bought this product?”, which is the basis of recommendation systems. We could introduce an index, but even with an index, recursive questions such as “which customers bought this product who also bought that product?” quickly become prohibitively expensive as the degree of recursion increases.
Relational databases struggle with highly-connected domains. To understand the cost of performing connected queries in a relational database, we’ll look at some simple and not-so-simple queries in a social network domain.
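The difference between the forward question ("what did this customer buy?") and the reciprocal one ("who bought this product?") can be sketched with a tiny in-memory structure that stores each purchase as an edge reachable from both ends. This is only an illustration of the access pattern, not how a graph database or the book's examples are implemented; `PurchaseGraph` and its methods are hypothetical names.

```python
from collections import defaultdict


class PurchaseGraph:
    """Purchases as edges that can be followed from either endpoint."""

    def __init__(self):
        self._bought = defaultdict(set)   # customer -> products
        self._buyers = defaultdict(set)   # product  -> customers

    def add_purchase(self, customer, product):
        # Record the edge in both directions so neither query needs a scan.
        self._bought[customer].add(product)
        self._buyers[product].add(customer)

    def products_of(self, customer):
        """Forward query: what did this customer buy?"""
        return set(self._bought.get(customer, ()))

    def buyers_of(self, product):
        """Reciprocal query: which customers bought this product?"""
        return set(self._buyers.get(product, ()))

    def buyers_of_all(self, *products):
        """Customers who bought every listed product."""
        if not products:
            return set()
        result = self.buyers_of(products[0])
        for product in products[1:]:
            result &= self._buyers.get(product, set())
        return result


graph = PurchaseGraph()
for customer, product in [("alice", "book"), ("alice", "lamp"),
                          ("bob", "book"), ("carol", "lamp"),
                          ("carol", "book")]:
    graph.add_purchase(customer, product)

print(sorted(graph.buyers_of("book")))
print(sorted(graph.buyers_of_all("book", "lamp")))
```

Each query starts from a known node and follows only its own edges, so its cost depends on the local neighbourhood rather than on the total number of purchases.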

pages: 320 words: 87,853

The Black Box Society: The Secret Algorithms That Control Money and Information by Frank Pasquale


Affordable Care Act / Obamacare, algorithmic trading, Amazon Mechanical Turk, asset-backed security, Atul Gawande, bank run, barriers to entry, Berlin Wall, Bernie Madoff, Black Swan, bonus culture, Brian Krebs, call centre, Capital in the Twenty-First Century by Thomas Piketty, Chelsea Manning, cloud computing, collateralized debt obligation, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, cryptocurrency, Debian, don't be evil, Edward Snowden, Fall of the Berlin Wall, Filter Bubble, financial innovation, Flash crash, full employment, Goldman Sachs: Vampire Squid, Google Earth, Hernando de Soto, High speed trading, hiring and firing, housing crisis, informal economy, information retrieval, interest rate swap, Internet of things, invisible hand, Jaron Lanier, Jeff Bezos, job automation, Julian Assange, Kevin Kelly, knowledge worker, Kodak vs Instagram, kremlinology, late fees, London Interbank Offered Rate, London Whale, Mark Zuckerberg, mobile money, moral hazard, new economy, Nicholas Carr, offshore financial centre, PageRank, pattern recognition, precariat, profit maximization, profit motive, quantitative easing, race to the bottom, recommendation engine, regulatory arbitrage, risk-adjusted returns, search engine result page, shareholder value, Silicon Valley, Snapchat, Spread Networks laid a new fibre optics cable between New York and Chicago, statistical arbitrage, statistical model, Steven Levy, the scientific method, too big to fail, transaction costs, two-sided market, universal basic income, Upton Sinclair, value at risk, WikiLeaks

INTRODUCTION—THE NEED TO KNOW
But what do we know about them? A bad credit score may cost a borrower hundreds of thousands of dollars, but he will never understand exactly how it was calculated. A predictive analytics firm may score someone as a “high cost” or “unreliable” worker, yet never tell her about the decision. More benignly, perhaps, these companies influence the choices we make ourselves. Recommendation engines at Amazon and YouTube affect an automated familiarity, gently suggesting offerings they think we’ll like. But don’t discount the significance of that “perhaps.” The economic, political, and cultural agendas behind their suggestions are hard to unravel. As middlemen, they specialize in shifting alliances, sometimes advancing the interests of customers, sometimes suppliers: all to orchestrate an online world that maximizes their own profits.

In short, they improve the quality of our daily lives in ways both noticeable and not. But where do we call a halt? Similar protocols also influence—invisibly—not only the route we take to a new restaurant, but which restaurant Google, Yelp, OpenTable, or Siri recommends to us. They might help us find reviews of the car we drive. Yet choosing a car, or even a restaurant, is not as straightforward as optimizing an engine or routing a drive. Does the recommendation engine take into account, say, whether the restaurant or car company gives its workers health benefits or maternity leave? Could we prompt it to do so? In their race for the most profitable methods of mapping social reality, the data scientists of Silicon Valley and Wall Street tend to treat recommendations as purely technical problems. The values and prerogatives that the encoded rules enact are hidden within black boxes.23 The most obvious question is: Are these algorithmic applications fair?

Even if it is the former, we should note that Google’s autosuggest feature may have automatically entered the word “bomb” after “pressure cooker” while he was typing—certainly many people would have done the search in the days after the Boston bombing merely to learn just how lethal such an attack could be. The police had no way of knowing whether Catalano had actually typed “bomb” himself, or accidentally clicked on it thanks to Google’s increasingly aggressive recommendation engines. See also Philip Bump, “Update: Now We Know Why Googling ‘Pressure Cookers’ Gets a Visit from the Cops,” The Wire, August 1, 2013, /national /2013/08/government-knocking -doors-because-google-searches/67864 /#.UfqCSAXy7zQ.facebook. 10. Martin Kuhn, Federal Dataveillance: Implications for Constitutional Privacy Protections (New York: LFB Scholarly Publishing, 2007), 178. 11.

pages: 481 words: 125,946

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence by John Brockman


3D printing, agricultural Revolution, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, artificial general intelligence, augmented reality, autonomous vehicles, bitcoin, blockchain, clean water, cognitive dissonance, Colonization of Mars, complexity theory, computer age, computer vision, constrained optimization, corporate personhood, cosmological principle, cryptocurrency, cuban missile crisis, Danny Hillis, dark matter, discrete time, Elon Musk, Emanuel Derman, endowment effect, epigenetics, Ernest Rutherford, experimental economics, Flash crash, friendly AI, Google Glasses, hive mind, income inequality, information trail, Internet of things, invention of writing, iterative process, Jaron Lanier, job automation, John von Neumann, Kevin Kelly, knowledge worker, loose coupling, microbiome, Moneyball by Michael Lewis explains big data, natural language processing, Network effects, Norbert Wiener, pattern recognition, Peter Singer: altruism, phenotype, planetary scale, Ray Kurzweil, recommendation engine, Republic of Letters, RFID, Richard Thaler, Rory Sutherland, Search for Extraterrestrial Intelligence, self-driving car, sharing economy, Silicon Valley, Skype, smart contracts, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, superintelligent machines, supervolcano, the scientific method, The Wisdom of Crowds, theory of mind, Thorstein Veblen, too big to fail, Turing machine, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!, Y2K

Is it possible to create an artificial mentor for each student? We already have recommender systems on the Internet that tell us, “If you liked X, you might also like Y,” based on data of many others with similar patterns of preference. Someday the mind of each student may be tracked from childhood by a personalized deep-learning system. To achieve this level of understanding of a human mind is beyond the capabilities of current technology, but there are already efforts at Facebook to use their vast social database of friends, photos, and likes to create a Theory of Mind for every person on the planet. So my prediction is that as more and more cognitive appliances, like chess-playing programs and recommender systems are devised, humans will become smarter and more capable. SHALLOW LEARNING SETH LLOYD Professor of quantum mechanical engineering, MIT; author, Programming the Universe Pity the poor folks at the National Security Agency: They’re spying on everyone (quelle surprise!)

Conceptually, autonomous or artificial intelligence systems can develop in two ways: either as an extension of human thinking or as radically new thinking. Call the first “Humanoid Thinking,” or Humanoid AI, and the second “Alien Thinking,” or Alien AI. Almost all AI today is Humanoid Thinking. We use AI to solve problems too difficult, time-consuming, or boring for our limited brains to process: electrical-grid balancing, recommendation engines, self-driving cars, face recognition, trading algorithms, and the like. These artificial agents work in narrow domains with clear goals their human creators specify. Such AI aims to accomplish human objectives—often better, with fewer cognitive errors, distractions, outbursts of bad temper, or processing limitations. In a couple of decades, AI agents might serve as virtual insurance sellers, doctors, psychotherapists, and maybe even virtual spouses and children.

He implies that the Age of the Thinking Machine is resulting in ossification rather than renewal. As our lives become increasingly recorded, archived, and accessed, we have become cannibals driven to consume our history and terrified of transgressing its established norms. To some extent, the future is blocked to us; we’re stuck in stasis; we’re stuck with a version of ourselves that’s becoming increasingly narrow. No thanks to recent tools such as “recommender systems,” we’re lodged in a seemingly endless feedback loop of “If you liked that, you’ll love this.” As we might become increasingly stuck in Curtis’s idea of the “you-loop,” so the nature of what it means to be human might be compromised by job-hogging machines that will render many of us obsolete. This Edge Question points to the next chapter in human history/evolution; we’re facing the beginning of a new definition of man, a new civilization.

pages: 406 words: 88,820

Television disrupted: the transition from network to networked TV by Shelly Palmer


barriers to entry, call centre, disintermediation, hypertext link, interchangeable parts, invention of movable type, James Watt: steam engine, linear programming, market design, pattern recognition, recommendation engine, Saturday Night Live, shareholder value, Skype, spectrum auction, Steve Jobs, subscription business, Telecommunications Act of 1996, Vickrey auction, yield management

We could probably list dozens of reasons why a person might choose to be his or her own program director. The key problem with on-demand technology is not desire; it is complexity. It’s just too hard for the average person to do. Now, making a playlist in iTunes could not be simpler. But, putting your iPod in shuffle mode is actually easier, and it is also the path of least resistance. There are other factors that help with playlist creation. Recommendation engines and collaborative filtering like Amazon’s “if you like this … you might also like …” are good ways to help people pick the right stuff for their playlists. Consumers can also skew shuffle modes, setting them to play the content they manually play the most more often than the content they play less often. Of course, all of this technology requires consumers to collect all of their media into one place.
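The skewed shuffle described above, where content played most often comes up more often, can be sketched as a weighted random ordering. This is an assumption about how such a mode might work, not a description of iTunes or the iPod; `weighted_shuffle` and the sample play counts are hypothetical.

```python
import random


def weighted_shuffle(play_counts, rng=None):
    """Order every track at random, favouring tracks with higher play counts."""
    rng = rng or random.Random()
    remaining = dict(play_counts)
    order = []
    while remaining:
        tracks = list(remaining)
        # Add 1 so tracks that were never played manually can still be picked.
        weights = [remaining[track] + 1 for track in tracks]
        pick = rng.choices(tracks, weights=weights, k=1)[0]
        order.append(pick)
        del remaining[pick]
    return order


# Hypothetical manual play counts per track.
play_counts = {"favourite_song": 50, "occasional_song": 1, "never_played": 0}
print(weighted_shuffle(play_counts, random.Random(7)))
```

Every track still appears exactly once per pass, so the listener keeps the low effort of shuffle mode while the order leans toward what they already choose by hand.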

You can (and should) ask the same question about high traffic Web sites like Google, Yahoo!, MSN, Amazon, eBay, and of course, about every existing broadcast and cable network. A trip to the video section of the Apple Music Store through iTunes is a very interesting experience, particularly when you see how the interface handles show branding vs. network branding. Social Search Solution Another probable future is Tim Halle’s vision of a “social search,” a recommendation system that will emerge from social networking sites. Of course, the biggest social networking sites like or are also big brands, so this may be just another permutation of branded search. (See “Folksonomy” in Chapter 6.) Copyright © 2006, Shelly Palmer. All rights reserved.

pages: 283 words: 85,824

The People's Platform: Taking Back Power and Culture in the Digital Age by Astra Taylor


A Declaration of the Independence of Cyberspace, Andrew Keen, barriers to entry, Berlin Wall, big-box store, Brewster Kahle, citizen journalism, cloud computing, collateralized debt obligation, Community Supported Agriculture, conceptual framework, corporate social responsibility, cross-subsidies, crowdsourcing, David Brooks, digital Maoism, disintermediation, don't be evil, Donald Trump, Edward Snowden, Fall of the Berlin Wall, Filter Bubble, future of journalism, George Gilder, Google Chrome, Google Glasses, hive mind, income inequality, informal economy, Internet Archive, Internet of things, invisible hand, Jane Jacobs, Jaron Lanier, Jeff Bezos, job automation, Julian Assange, Kevin Kelly, Kickstarter, knowledge worker, Mark Zuckerberg, means of production, Naomi Klein, Narrative Science, Network effects, new economy, New Journalism, New Urbanism, Nicholas Carr, oil rush, Peter Thiel, Plutocrats, plutocrats, pre–internet, profit motive, recommendation engine, Richard Florida, Richard Stallman, self-driving car, shareholder value, sharing economy, Silicon Valley, Silicon Valley ideology, slashdot, Slavoj Žižek, Snapchat, social graph, Steve Jobs, Stewart Brand, technoutopianism, trade route, Whole Earth Catalog, WikiLeaks, winner-take-all economy, Works Progress Administration, young professional

A more democratic culture is one where previously excluded populations are given the material means to fully engage. To create a culture that is more diverse and inclusive, we have to pioneer ways of addressing discrimination and bias head-on, despite the difficulties of applying traditional methods of mitigating prejudice to digital networks. We have to shape our tools of discovery, the recommendation engines and personalization filters, so they do more than reinforce our prior choices and private bubbles. Finally, if we want a culture that is more resistant to the short-term expectations of corporate shareholders and the whims of marketers, we have to invest in noncommercial enterprises. There is no shortage of good ideas. By not experimenting, we court disillusionment. The Internet was supposed to be free and ubiquitous, but a cable cartel would rather rake in profits than provide universal service.

,” Wired, blog post, November 15, 2008, 35. Fang Wu and Bernardo A. Huberman, “The Persistence Paradox,” First Monday 15, nos. 1–4 (January 2010). 36. James Evans, “Electronic Publication and the Narrowing of Science and Scholarship,” Science 321, no. 5887 (July 18, 2008): 395–99. 37. Daniel M. Fleder and Kartik Hosanagar, “Blockbuster Culture’s Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity,” Management Science 55, no. 5 (May 2009): 697–712. 38. Evan Hughes, “Here’s How Amazon Self-Destructs,” Salon, July 19, 2013. 39. Gary Flake et al., “Winners Don’t Take All: Characterizing the Competition for Links on the Web,” Proceedings of the National Academy of Sciences 99, no. 8 (April 16, 2002). 40. Eli Pariser, The Filter Bubble: What the Internet Is Hiding from You (New York: Penguin Press, 2011), 128. 41.

pages: 247 words: 81,135

The Great Fragmentation: And Why the Future of All Business Is Small by Steve Sammartino


3D printing, additive manufacturing, Airbnb, augmented reality, barriers to entry, Bill Gates: Altair 8800, bitcoin, BRICs, Buckminster Fuller, citizen journalism, collaborative consumption, cryptocurrency, Elon Musk, fiat currency, Frederick Winslow Taylor, game design, Google X / Alphabet X, haute couture, helicopter parent, illegal immigration, index fund, Jeff Bezos, jimmy wales, Kickstarter, knowledge economy, Law of Accelerating Returns, market design, Metcalfe's law, Minecraft, minimum viable product, Network effects, new economy, post scarcity, prediction markets, pre–internet, profit motive, race to the bottom, random walk, Ray Kurzweil, recommendation engine, remote working, RFID, self-driving car, sharing economy, side project, Silicon Valley, Silicon Valley startup, skunkworks, Skype, social graph, social web, software is eating the world, Steve Jobs, too big to fail, web application

Creative types Collaboration, creative orientation and counter intuition Note
Chapter 6: Demographics is history: moving on from predictive marketing How to get profiled The price of pop culture The best average The weapon of choice Don’t fence me in How do you define a teenager? Stealing music or connecting? Marketing 1.0 Marketing revised The new intersection Social + interests = intention The story of cities Do I know you? The interest graph in action The anti-demographic recommendation engine
Chapter 7: The truth about pricing: technology and omnipresent deflation Technology deflation Real-world technology deflation The free super computer The crux is human It’s getting quicker Technology curve jumping Technology stacking Omnipresent deflation Consumer price index trickery Connections and the impact on prices Economic border hopping The new minimum wage Notes
Chapter 8: A zero-barrier world: how access to knowledge is breaking down barriers So what’s changed?

They focused on direct connection, one new fan at a time. They didn’t try to build an audience. They helped a person, which is a very different approach. It seems old-school BMXers are a little bit smarter than old-school marketers. What a great way to build a community; one that I’m now a part of. While everyone gets enamoured with ‘big data’, there’s probably a lot more we can do with ‘little data’. The anti-demographic recommendation engine A lot of e-commerce platforms and social-media engines seem to be able to do what mainstream marketers could never quite pull off. Every day, I’m exposed to products and services that I have zero interest in ever purchasing, mainly due to the laziness of the marketers who allocate the budget behind them. But occasionally I’m utterly inspired and thankful when great marketers (with permission) introduce me to things that are just perfectly suited.

Twitter is terrific at this with its who-to-follow recommendations. But the best example has to be Amazon’s ‘Recommended for you’ books. It’s always spot on, sitting perfectly in the centre of my personal interest graph, based on the simplicity of what I’ve bought, looked at, wish listed and what others have in their list when there are overlaps. For me personally, it’s very accurate indeed. What’s interesting is that this recommendation engine is what I’d coin an ‘anti-demographic’ profiler: It doesn’t care what sex I am. It doesn’t care where I live. It doesn’t care or know how much I earn. It doesn’t care if I finished school. None of this matters. What matters is the direct connection and the reality of my interests based on my digital footprint. It’s the type of efficiency that mass can never achieve. The smart marketing money now lives in a node-by-node approach.

pages: 215 words: 55,212

The Mesh: Why the Future of Business Is Sharing by Lisa Gansky


Airbnb, Amazon Mechanical Turk, Amazon Web Services, banking crisis, barriers to entry, carbon footprint, cloud computing, credit crunch, crowdsourcing, diversification, Firefox, Google Earth, Internet of things, Kickstarter, late fees, Network effects, new economy, peer-to-peer lending, recommendation engine, RFID, Richard Florida, Richard Thaler, ride hailing / ride sharing, sharing economy, Silicon Valley, smart grid, social web, software as a service, TaskRabbit, the built environment, walkable city, yield management, young professional, Zipcar

As the service developed, the company added layers of information to inform a user’s choices, such as reviews from people in the network whose profile of selections and ratings were similar. Recently, it sponsored a contest awarding a million dollars to anyone who could significantly improve the movie recommendation service. Thousands of teams from more than a hundred nations competed. Netflix’s “recommendation engine” relies on algorithms culled from masses of data collected on the Web, including that provided directly by customers. The lesson learned from the contest, according to the New York Times, was the power of collaboration, as winning teams began sharing ideas and information: “The formula for success was to bring together people with complementary skills and combine different methods of problem solving.”
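Surfacing reviews from people whose selections and ratings are similar to yours implies some measure of similarity between rating profiles. One common choice is cosine similarity over the ratings two people have given; the sketch below uses it as an assumption and does not describe Netflix's actual algorithm. The users, films, ratings and function names are hypothetical.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine of the angle between two users' rating vectors (0.0 if no overlap)."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[title] * ratings_b[title] for title in shared)
    norm_a = math.sqrt(sum(value * value for value in ratings_a.values()))
    norm_b = math.sqrt(sum(value * value for value in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, all_ratings, k=2):
    """Return the k users whose rating profiles are closest to the target's."""
    scored = [
        (cosine_similarity(all_ratings[target], ratings), user)
        for user, ratings in all_ratings.items()
        if user != target
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in scored[:k]]


# Made-up ratings on a 1-5 scale, for illustration only.
all_ratings = {
    "ana": {"Alien": 5, "Amelie": 1},
    "ben": {"Alien": 4, "Amelie": 1, "Heat": 5},
    "cy": {"Alien": 1, "Amelie": 5},
}
print(most_similar("ana", all_ratings, k=1))
```

Reviews written by the users this returns would be the ones shown first to the target user.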

pages: 58 words: 12,386

Big Data Glossary by Pete Warden


business intelligence, crowdsourcing, fault tolerance, information retrieval, linked data, natural language processing, recommendation engine, web application

To achieve that scalability, most of the code is written as parallelizable jobs on top of Hadoop. It comes with algorithms to perform a lot of common tasks, like clustering and classifying objects into groups, recommending items based on other users’ behaviors, and spotting attributes that occur together a lot. In practical terms, the framework makes it easy to use analysis techniques to implement features such as Amazon’s “People who bought this also bought” recommendation engine on your own site. It’s a heavily used project with an active community of developers and users, and it’s well worth trying if you have any significant amount of transaction or similar data that you’d like to get more value out of.

Introducing Mahout
Using Mahout with Cassandra

scikits.learn

It’s hard to find good off-the-shelf tools for practical machine learning. Many of the projects are aimed at students and researchers who want access to the inner workings of the algorithms, which can be off-putting when you’re looking for more of a black box to solve a particular problem.
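As a rough illustration of the “People who bought this also bought” idea mentioned in the Mahout excerpt, the sketch below counts how often items share a basket. It is plain Python rather than Mahout or Hadoop code, and the function names (`also_bought`, `recommend`) and the transaction data are invented for this example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def also_bought(baskets):
    """Count, for every item, how often each other item shares a basket with it."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # set() drops duplicate items so each pair is counted once per basket.
        for a, b in permutations(set(basket), 2):
            co_counts[a][b] += 1
    return co_counts


def recommend(co_counts, item, k=3):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co_counts[item].most_common(k)]


# Hypothetical transaction data, invented for illustration.
baskets = [
    ["tent", "stove", "lantern"],
    ["tent", "stove"],
    ["tent", "sleeping bag"],
    ["stove", "fuel"],
]
co_counts = also_bought(baskets)
print(recommend(co_counts, "tent", k=1))  # the item most often bought with a tent
```

A production system would have to spread this counting over many machines, which is the part a framework like the one described handles; the counting logic itself stays this simple.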

pages: 274 words: 75,846

The Filter Bubble: What the Internet Is Hiding From You by Eli Pariser


A Declaration of the Independence of Cyberspace, A Pattern Language, Amazon Web Services, augmented reality, back-to-the-land, Black Swan, borderless world, Build a better mousetrap, Cass Sunstein, citizen journalism, cloud computing, cognitive dissonance, crowdsourcing, Danny Hillis, data acquisition, disintermediation, don't be evil, Filter Bubble, Flash crash, fundamental attribution error, global village, Haight Ashbury, Internet of things, Isaac Newton, Jaron Lanier, Jeff Bezos, jimmy wales, Kevin Kelly, knowledge worker, Mark Zuckerberg, Marshall McLuhan, megacity, Netflix Prize, new economy, PageRank, paypal mafia, Peter Thiel, recommendation engine, RFID, sentiment analysis, shareholder value, Silicon Valley, Silicon Valley startup, social graph, social software, social web, speech recognition, Startup school, statistical model, stem cell, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, the scientific method, urban planning, Whole Earth Catalog, WikiLeaks, Y Combinator

In a memo for fellow progressives, Mark Steitz, one of the primary Democratic data gurus, recently wrote that “targeting too often returns to a bombing metaphor—dropping message from planes. Yet the best data tools help build relationships based on observed contacts with people. Someone at the door finds out someone is interested in education; we get back to that person and others like him or her with more information. Amazon’s recommendation engine is the direction we need to head.” The trend is clear: We’re moving from swing states to swing people. Consider this scenario: It’s 2016, and the race is on for the presidency of the United States. Or is it? It depends on who you are, really. If the data says you vote frequently and that you may have been a swing voter in the past, the race is a maelstrom. You’re besieged with ads, calls, and invitations from friends.

Quora Forum, accessed Dec. 17, 2010.
151 “against the cruise line industry”: Hollis Thomases, “Google Drops Anti-Cruise Line Ads from AdWords,” Web Ad.vantage, Feb. 13, 2004, accessed Dec. 17, 2010.
151–52 identify who was persuadable: “How Rove Targeted the Republican Vote,” Frontline, accessed Feb. 8, 2011.
152 “Amazon’s recommendation engine is the direction”: Mark Steitz and Laura Quinn, “An Introduction to Microtargeting in Politics,” accessed Dec. 17, 2010.
153 round-the-clock “war room”: “Google’s War Room for the Home Stretch of Campaign 2010,” e.politics, Sept. 24, 2010, accessed Feb. 9, 2011.
155 “campaign wanted to spend on Facebook”: Vincent R.

Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei


bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, web application

If consumers follow a system recommendation but then do not end up liking the product, they are less likely to use the recommender system again. As with classification systems, recommender systems can make two types of errors: false negatives and false positives. Here, false negatives are products that the system fails to recommend, although the consumer would like them. False positives are products that are recommended, but which the consumer does not like. False positives are less desirable because they can annoy or anger consumers. Content-based recommender systems are limited by the features used to describe the items they recommend. Another challenge for both content-based and collaborative recommender systems is how to deal with new users for whom a buying history is not yet available. Hybrid approaches integrate both content-based and collaborative methods to further improve recommendations.
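The two error types defined above can be made concrete with a small sketch. The function name and the sample data are invented for illustration. Precision and recall are standard summary measures added here and are not part of the excerpt.

```python
def recommendation_errors(recommended, liked):
    """Compare what was recommended with what the consumer actually likes."""
    recommended, liked = set(recommended), set(liked)
    hits = recommended & liked
    return {
        # Recommended, but the consumer does not like them.
        "false_positives": recommended - liked,
        # Liked by the consumer, but never recommended.
        "false_negatives": liked - recommended,
        "precision": len(hits) / len(recommended) if recommended else 0.0,
        "recall": len(hits) / len(liked) if liked else 0.0,
    }


# Hypothetical data, invented for illustration.
result = recommendation_errors(
    recommended=["A", "B", "C"],
    liked=["B", "C", "D", "E"],
)
print(result["false_positives"], result["false_negatives"])
```

Because the excerpt treats false positives as the more damaging error, a system tuned along these lines would favour precision over recall.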

In summary, computer systems are at continual risk of breaks in security. Data mining technology can be used to develop strong intrusion detection and prevention systems, which may employ signature-based or anomaly-based detection. 13.3.5. Data Mining and Recommender Systems Today's consumers are faced with millions of goods and services when shopping online. Recommender systems help consumers by making product recommendations that are likely to be of interest to the user such as books, CDs, movies, restaurants, online news articles, and other services. Recommender systems may use either a content-based approach, a collaborative approach, or a hybrid approach that combines both content-based and collaborative methods. The content-based approach recommends items that are similar to items the user preferred or queried in the past.
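A minimal sketch of the content-based approach follows. It assumes each item is described by a set of keywords and that similarity is measured with the Jaccard coefficient. Both choices, along with the item names and keywords, are assumptions made for illustration and are not taken from the excerpt.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_based_recommend(item_keywords, liked_items, k=2):
    """Rank items the user has not seen by keyword overlap with items they liked."""
    # The user profile is simply the union of keywords of the liked items.
    profile = set().union(*(item_keywords[i] for i in liked_items))
    candidates = [i for i in item_keywords if i not in liked_items]
    ranked = sorted(
        candidates,
        key=lambda i: jaccard(profile, item_keywords[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical catalogue, invented for illustration.
item_keywords = {
    "item_a": {"fantasy", "humour", "dragons"},
    "item_b": {"fantasy", "humour", "wizards"},
    "item_c": {"cooking", "baking"},
    "item_d": {"fantasy", "war"},
}
print(content_based_recommend(item_keywords, liked_items=["item_a"]))
```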

They make use of keywords (describing the items) and user profiles that contain information about users' tastes and needs. Such profiles may be obtained explicitly (e.g., through questionnaires) or learned from users' transactional behavior over time. A collaborative recommender system tries to predict the utility of items for a user, u, based on items previously rated by other users who are similar to u. For example, when recommending books, a collaborative recommender system tries to find other users who have a history of agreeing with u (e.g., they tend to buy similar books, or give similar ratings for books). Collaborative recommender systems can be either memory (or heuristic) based or model based. Memory-based methods essentially use heuristics to make rating predictions based on the entire collection of items previously rated by users. That is, the unknown rating of an item–user combination can be estimated as an aggregate of ratings of the most similar users for the same item.
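The memory-based estimate described above, an aggregate of the ratings given by the most similar users, might be sketched as follows. The excerpt does not fix a similarity measure or an aggregation rule. Cosine similarity over co-rated items and a similarity-weighted average are assumptions chosen here for brevity, and all names and ratings are hypothetical.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item, k=2):
    """Estimate user's rating of item from the k most similar users who rated it."""
    neighbours = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    top = sorted(neighbours, key=lambda pair: pair[0], reverse=True)[:k]
    total_similarity = sum(sim for sim, _ in top)
    if total_similarity == 0:
        return None  # no usable neighbours, so no estimate
    return sum(sim * rating for sim, rating in top) / total_similarity


# Hypothetical ratings on a 1-5 scale, invented for illustration.
ratings = {
    "u":   {"book1": 5, "book2": 1},
    "ann": {"book1": 5, "book2": 1, "book3": 4},
    "bob": {"book1": 1, "book2": 5, "book3": 2},
    "cy":  {"book1": 4, "book2": 2, "book3": 5},
}
prediction = predict_rating(ratings, "u", "book3", k=2)
print(round(prediction, 2))
```

Here `ann` and `cy` agree with `u` far more than `bob` does, so the estimate for `book3` falls between their two ratings and `bob`'s low rating is ignored.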

pages: 380 words: 118,675

The Everything Store: Jeff Bezos and the Age of Amazon by Brad Stone


3D printing, airport security, AltaVista, Amazon Mechanical Turk, Amazon Web Services, bank run, Bernie Madoff, big-box store, Black Swan, book scanning, Brewster Kahle, call centre, centre right, Clayton Christensen, cloud computing, collapse of Lehman Brothers, crowdsourcing, cuban missile crisis, Danny Hillis, Douglas Hofstadter, Elon Musk, facts on the ground, game design, housing crisis, invention of movable type, inventory management, James Dyson, Jeff Bezos, Kevin Kelly, Kodak vs Instagram, late fees, loose coupling, low skilled workers, Maui Hawaii, Menlo Park, Network effects, new economy, optical character recognition, Ponzi scheme, quantitative hedge fund, recommendation engine, Renaissance Technologies, RFID, Rodney Brooks, search inside the book, shareholder value, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, Skype, statistical arbitrage, Steve Ballmer, Steve Jobs, Steven Levy, Stewart Brand, Thomas L Friedman, Tony Hsieh, Whole Earth Catalog, why are manhole covers round?

Once again, Amazon’s lawyers caught wind of this and renamed the program Vendor Realignment. Over the next year, Miller tangled with the European divisions of Random House, Hachette, and Bloomsbury, the publisher of the Harry Potter series. “I did everything I could to screw with their performance,” he says. He took selections of their catalog to full price and yanked their books from Amazon’s recommendation engine; with some titles, like travel books, he promoted comparable books from competitors. Miller’s constant search for new points of leverage exploited the anxieties of neurotic authors who obsessively tracked sales rank—the number on Amazon that showed an author how well his or her book was doing compared to other products on the site. “We would constantly meet with authors, so we’d know who would be watching their rankings.”

“Lyn was our ambassador. I credit her for maintaining these relationships.” Amazon approached large publishers aggressively. It demanded accommodations like steeper discounts on bulk purchases, longer periods to pay its bills, and shipping arrangements that leveraged Amazon’s discounts with UPS. To publishers that didn’t comply, Amazon threatened to pull their books out of its automated personalization and recommendation systems, meaning that they would no longer be suggested to customers. “Publishers didn’t really understand Amazon. They were very naïve about what was going on with their back catalog,” says Goss. “Most didn’t know their sales were up because their backlist was getting such visibility.” Amazon had an easy way to demonstrate its market power. When a publisher did not capitulate and the company shut off the recommendation algorithms for its books, the publisher’s sales usually fell by as much as 40 percent.

pages: 268 words: 75,850

The Formula: How Algorithms Solve All Our Problems-And Create More by Luke Dormehl


3D printing, algorithmic trading, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, big data - Walmart - Pop Tarts, call centre, Cass Sunstein, Clayton Christensen, computer age, death of newspapers, deferred acceptance, Edward Lorenz: Chaos theory, Erik Brynjolfsson, Filter Bubble, Flash crash, Florence Nightingale: pie chart, Frank Levy and Richard Murnane: The New Division of Labor, Google Earth, Google Glasses, High speed trading, Internet Archive, Isaac Newton, Jaron Lanier, Jeff Bezos, job automation, Kevin Kelly, Kodak vs Instagram, Marshall McLuhan, means of production, Nate Silver, natural language processing, Netflix Prize, pattern recognition, price discrimination, recommendation engine, Richard Thaler, Rosa Parks, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Slavoj Žižek, social graph, speech recognition, Steve Jobs, Steven Levy, Steven Pinker, Stewart Brand, the scientific method, The Signal and the Noise by Nate Silver, upwardly mobile, Wall-E, Watson beat the top human players on Jeopardy!, Y Combinator

Conversely, scores fall dramatically in situations where the task takes longer than expected.33

Decimated-Reality Aggregators

Speaking in October 1944, during the rebuilding of the House of Commons, which had sustained heavy bombing damage during the Battle of Britain, former British prime minister Winston Churchill observed, “We shape our buildings; thereafter they shape us.”34 A similar sentiment might be said in the age of The Formula, in which users shape their online profiles, and from that point forward their online profiles begin to shape them—both in terms of what we see and, perhaps more crucially, what we don’t. Writing about a start-up called Nara, in the middle of 2013, I coined the phrase “decimated reality aggregators” to describe what the company was trying to do.35 Starting out as a restaurant recommender system by connecting together thousands of restaurants around the world, Nara’s ultimate goal was to become the recommender system for your life: drawing on what it knew about you from the restaurants you ate in, to suggest everything from hotels to clothes. Nara even incorporated the idea of upward mobility into its algorithm. Say, for example, you wanted to be a wine connoisseur two years down the line, but currently had no idea how to tell your Chardonnay from your Chianti.

In all, eHarmony’s arrival represented more than just another addition to an already crowded field of Internet dating websites—but a qualitative change in the way that Internet dating was carried out. “Neil was adamant that this should be based on science,” Carter says. Before eHarmony, the majority of dating websites took the form of searchable personal ads, of the kind that have been appearing in print since the 17th century.11 After eHarmony, the search engine model was replaced with a recommender system praised in press materials for its “scientific precision.” Instead of allowing users to scan through page after page of profiles, eHarmony simply required them to answer a series of questions—and then picked out the right option on their behalf. The website opened its virtual doors for the first time on August 22, 2000. There were a few initial teething problems. “Some people were critical of the matches they were getting,” Warren admits.

All a character has to do—as occurs during one scene in which the novel’s bumbling protagonist, Lenny Abramov, visits a Staten Island nightclub with his friends—is to set the “community parameters” of their iPhone-like device to a particular physical space and hit a button. At this point, every aspect of a person’s profile is revealed, including their “fuckability” and “personality” scores (both ranked on a scale of 800), along with their ranked “anal/oral/vaginal” preferences. There is even a recommender system incorporated, so that a user’s history of romantic relationships can be scrutinized for insights in much the same way that a person’s previous orders on Amazon might dictate what they will be interested in next. As one of Abramov’s friends notes, “This girl [has] a long multimedia thing on how her father abused her . . . Like, you’ve dated a lot of abused girls, so it knows you’re into that shit.”24 The world presented by Super Sad True Love Story is, in many ways, closer than you might think.

pages: 94 words: 26,453

The End of Nice: How to Be Human in a World Run by Robots (Kindle Single) by Richard Newton


3D printing, Black Swan, British Empire, Buckminster Fuller, Clayton Christensen, crowdsourcing, deliberate practice, fear of failure, Filter Bubble, future of work, Google Glasses, Isaac Newton, James Dyson, Jaron Lanier, Jeff Bezos, job automation, Lean Startup, low skilled workers, Mark Zuckerberg, move fast and break things, Paul Erdős, Paul Graham, recommendation engine, rising living standards, Robert Shiller, Silicon Valley, Silicon Valley startup, skunkworks, Steve Ballmer, Steve Jobs, Y Combinator

Just as the sirens of legend sang sweet songs to lure sailors to crash on the rocky shore of their island, so Lanier thinks we must be wary of the attractions of the siren servers. They don’t want to make your life more complicated. They are there to make everything frictionless: “Leave it to me”, they sing. “I’ll find you new music you might like, books you’ll want to read, videos you want to watch and friends you should like.” We’re sort of used to the idea that recommendation engines work like this. We know that ads now follow us around the web and that books will be unhelpfully recommended to us by Amazon. But search results are also tailored to you. And that’s more of a concern. The search results you get will be different to the results for an identical search made by me. In fact, so much insight can be derived from your online behaviour that Google and other organisations can ensure you get news that makes you happy… or even angry the way you like to be angry.

pages: 163 words: 42,402

Machine Learning for Email by Drew Conway, John Myles White


call centre, correlation does not imply causation, Debian, natural language processing, Netflix Prize, pattern recognition, recommendation engine, SpamAssassin, text mining

Generating rules for ranking a list of items is an increasingly common task in machine learning, yet you may not have thought of it in these terms. More likely, you have heard of something like a recommendation system, which implicitly produces a ranking of products. Even if you have not heard of a recommendation system, it’s almost certain that you have used or interacted with one at some point. Some of the most successful e-commerce websites have benefitted from leveraging data on their users to generate recommendations for other products their users might be interested in. For example, if you have ever shopped at Amazon, then you have interacted with a recommendation system. The problem Amazon faces is simple: what items in their inventory are you most likely to buy? The implication of that statement is that the items in Amazon’s inventory have an ordering specific to each user.

There are many excellent books that focus on the fundamentals, the seminal work being Hastie, Tibshirani, and Friedman’s The Elements of Statistical Learning [HTF09].[1] But another important part of the hacker mantra is to learn by doing. Many hackers may be more comfortable thinking of problems in terms of the process by which a solution is attained, rather than the theoretical foundation from which the solution is derived. From this perspective, an alternative approach to teaching machine learning would be to use “cookbook” style examples. To understand how a recommendation system works, for example, we might provide sample training data and a version of the model, and show how the latter uses the former. There are many useful texts of this kind as well—Toby Segaran’s Programming Collective Intelligence is a recent example [Seg07]. Such a discussion would certainly address the how of a hacker’s method of learning, but perhaps less of the why. Along with understanding the mechanics of a method, we may also want to learn why it is used in a certain context or to address a specific problem.

The implication of that statement is that the items in Amazon’s inventory have an ordering specific to each user. Likewise, Netflix has a massive library of DVDs available to its customers to rent. In order for those customers to get the most out of the site, Netflix employs a sophisticated recommendation system to present people with rental suggestions. For both companies, these recommendations are based on two kinds of data. First, there is the data pertaining to the inventory itself. For Amazon, if the product is a television, this data might contain the type (e.g., plasma, LCD, LED), manufacturer, price, and so on. For Netflix, this data might be the genre of a film, its cast, director, running time, etc. Second, there is the data related to the browsing and purchasing behavior of the customers. This sort of data can help Amazon understand what accessories most people look for when shopping for a new plasma TV and can help Netflix understand which romantic comedies George A.

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose


Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, speech recognition, statistical model, William of Occam

COLLABORATIVE FILTERING (RECOMMENDER SYSTEMS) So far we have discussed approaches to content-based retrieval and clustering of documents, where the basic relation that is used in the document description is “document contains term.” At some point we looked into the role of web users as a source of feedback to improve the document ranking. However, we may consider web users as entities in a relation such as the document–term relation. This may, for example, be “web user likes web page.” Then we can build a user–document matrix and use documents to describe users in terms of web pages they like. A more general approach would be to consider persons and items again connected by the relation “person likes item.” This is the approach taken in the area of collaborative filtering (also called recommender systems) [3]. Assume that we have m persons and n items (e.g., books, songs, movies, web pages).
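The “person likes item” relation described above can be laid out as a binary matrix with one row per person and one column per item. The sketch below assumes the likes arrive as a dictionary of sets. The function name and the data are invented for illustration.

```python
def build_like_matrix(likes):
    """Turn {person: set of liked items} into an m x n matrix of 0s and 1s."""
    persons = sorted(likes)
    items = sorted({item for liked in likes.values() for item in liked})
    matrix = [
        [1 if item in likes[person] else 0 for item in items]
        for person in persons
    ]
    return persons, items, matrix


# Hypothetical "web user likes web page" data, invented for illustration.
likes = {
    "ann": {"page1", "page2"},
    "bob": {"page2"},
    "cy": {"page1", "page3"},
}
persons, items, matrix = build_like_matrix(likes)
for person, row in zip(persons, matrix):
    print(person, row)
```

Each row then describes a person in terms of the pages they like, in the same way a document–term row describes a document in terms of the terms it contains.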

CONTENTS

PREFACE

PART I WEB STRUCTURE MINING
1 INFORMATION RETRIEVAL AND WEB SEARCH: Web Challenges; Web Search Engines; Topic Directories; Semantic Web; Crawling the Web; Web Basics; Web Crawlers; Indexing and Keyword Search; Document Representation; Implementation Considerations; Relevance Ranking; Advanced Text Search; Using the HTML Structure in Keyword Search; Evaluating Search Quality; Similarity Search; Cosine Similarity; Jaccard Similarity; Document Resemblance; References; Exercises
2 HYPERLINK-BASED RANKING: Introduction; Social Networks Analysis; PageRank; Authorities and Hubs; Link-Based Similarity Search; Enhanced Techniques for Page Ranking; References; Exercises

PART II WEB CONTENT MINING
3 CLUSTERING: Introduction; Hierarchical Agglomerative Clustering; k-Means Clustering; Probability-Based Clustering; Finite Mixture Problem; Classification Problem; Clustering Problem; Collaborative Filtering (Recommender Systems); References; Exercises
4 EVALUATING CLUSTERING: Approaches to Evaluating Clustering; Similarity-Based Criterion Functions; Probabilistic Criterion Functions; MDL-Based Model and Feature Evaluation; Minimum Description Length Principle; MDL-Based Model Evaluation; Feature Selection; Classes-to-Clusters Evaluation; Precision, Recall, and F-Measure; Entropy; References; Exercises
5 CLASSIFICATION: General Setting and Evaluation Techniques; Nearest-Neighbor Algorithm; Feature Selection; Naive Bayes Algorithm; Numerical Approaches; Relational Learning; References; Exercises

PART III WEB USAGE MINING
6 INTRODUCTION TO WEB USAGE MINING: Definition of Web Usage Mining; Cross-Industry Standard Process for Data Mining; Clickstream Analysis; Web Server Log Files; Remote Host Field; Date/Time Field; HTTP Request Field; Status Code Field; Transfer Volume (Bytes) Field; Common Log Format; Identification Field; Authuser Field; Extended Common Log Format; Referrer Field; User Agent Field; Example of a Web Log Record; Microsoft IIS Log Format; Auxiliary Information; References; Exercises
7 PREPROCESSING FOR WEB USAGE MINING: Need for Preprocessing the Data; Data Cleaning and Filtering; Page Extension Exploration and Filtering; De-Spidering the Web Log File; User Identification; Session Identification; Path Completion; Directories and the Basket Transformation; Further Data Preprocessing Steps; References; Exercises
8 EXPLORATORY DATA ANALYSIS FOR WEB USAGE MINING: Introduction; Number of Visit Actions; Session Duration; Relationship between Visit Actions and Session Duration; Average Time per Page; Duration for Individual Pages; References; Exercises
9 MODELING FOR WEB USAGE MINING: CLUSTERING, ASSOCIATION, AND CLASSIFICATION: Introduction; Modeling Methodology; Definition of Clustering; The BIRCH Clustering Algorithm; Affinity Analysis and the A Priori Algorithm; Discretizing the Numerical Variables: Binning; Applying the A Priori Algorithm to the CCSU Web Log Data; Classification and Regression Trees; The C4.5 Algorithm; References; Exercises

INDEX

PREFACE

DEFINING DATA MINING THE WEB

By data mining the Web, we refer to the application of data mining methodologies, techniques, and models to the variety of data forms, structures, and usage patterns that comprise the World Wide Web.

Concept learning methods can also be used to generate explicit descriptions of sets of web documents, which can then be applied to categorization of new documents or to better understand the document area or topic.

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, by Zdravko Markov and Daniel T. Larose. Copyright © 2007 John Wiley & Sons, Inc.

CHAPTER 3 CLUSTERING

INTRODUCTION; HIERARCHICAL AGGLOMERATIVE CLUSTERING; k-MEANS CLUSTERING; PROBABILITY-BASED CLUSTERING; COLLABORATIVE FILTERING (RECOMMENDER SYSTEMS)

INTRODUCTION

The most popular approach to learning is by example. Given a set of objects, each labeled with a class (category), the learning system builds a mapping between objects and classes which can then be used for classifying new (unlabeled) objects. As the labeling (categorization) of the initial (training) set of objects is done by an agent external to the system (teacher), this setting is called supervised learning.

pages: 308 words: 84,713

The Glass Cage: Automation and Us by Nicholas Carr


Airbnb, Andy Kessler, Atul Gawande, autonomous vehicles, business process, call centre, Captain Sullenberger Hudson, Checklist Manifesto, cloud computing, David Brooks, deliberate practice, deskilling, Elon Musk, Erik Brynjolfsson, Flash crash, Frank Gehry, Frank Levy and Richard Murnane: The New Division of Labor, Frederick Winslow Taylor, future of work, global supply chain, Google Glasses, Google Hangouts, High speed trading, indoor plumbing, industrial robot, Internet of things, Jacquard loom, James Watt: steam engine, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Kevin Kelly, knowledge worker, Lyft, Mark Zuckerberg, means of production, natural language processing, new economy, Nicholas Carr, Norbert Wiener, Oculus Rift, pattern recognition, Peter Thiel, place-making, Plutocrats, profit motive, Ralph Waldo Emerson, RAND corporation, randomized controlled trial, Ray Kurzweil, recommendation engine, robot derives from the Czech word robota, meaning slave, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley ideology, software is eating the world, Stephen Hawking, Steve Jobs, TaskRabbit, technoutopianism, The Wealth of Nations by Adam Smith, Watson beat the top human players on Jeopardy!

Thanks to the proliferation of smartphones, tablets, and other small, affordable, and even wearable computers, we now depend on software to carry out many of our daily chores and pastimes. We launch apps to aid us in shopping, cooking, exercising, even finding a mate and raising a child. We follow turn-by-turn GPS instructions to get from one place to the next. We use social networks to maintain friendships and express our feelings. We seek advice from recommendation engines on what to watch, read, and listen to. We look to Google, or to Apple’s Siri, to answer our questions and solve our problems. The computer is becoming our all-purpose tool for navigating, manipulating, and understanding the world, in both its physical and its social manifestations. Just think what happens these days when people misplace their smartphones or lose their connections to the net.

Like all analytical programs, they have a bias toward criteria that lend themselves to statistical analysis, downplaying those that entail the exercise of taste or other subjective judgments. Automated essay-grading algorithms encourage in students a rote mastery of the mechanics of writing. The programs are deaf to tone, uninterested in knowledge’s nuances, and actively resistant to creative expression. The deliberate breaking of a grammatical rule may delight a reader, but it’s anathema to a computer. Recommendation engines, whether suggesting a movie or a potential love interest, cater to our established desires rather than challenging us with the new and unexpected. They assume we prefer custom to adventure, predictability to whimsy. The technologies of home automation, which allow things like lighting, heating, cooking, and entertainment to be meticulously programmed, impose a Taylorist mentality on domestic life.

pages: 292 words: 85,151

Exponential Organizations: Why New Organizations Are Ten Times Better, Faster, and Cheaper Than Yours (And What to Do About It) by Salim Ismail, Yuri van Geest


23andMe, 3D printing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, bioinformatics, bitcoin, Black Swan, blockchain, Burning Man, business intelligence, business process, call centre, chief data officer, Clayton Christensen, clean water, cloud computing, cognitive bias, collaborative consumption, collaborative economy, corporate social responsibility, cross-subsidies, crowdsourcing, cryptocurrency, dark matter, Dean Kamen, dematerialisation, discounted cash flows, distributed ledger, Edward Snowden, Elon Musk, ethereum blockchain, Galaxy Zoo, game design, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, hiring and firing, Hyperloop, industrial robot, Innovator's Dilemma, Internet of things, Iridium satellite, Isaac Newton, Jeff Bezos, Kevin Kelly, Kickstarter, knowledge worker, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, loose coupling, loss aversion, Lyft, Mark Zuckerberg, market design, means of production, minimum viable product, natural language processing, Netflix Prize, Network effects, new economy, Oculus Rift, offshore financial centre, p-value, PageRank, pattern recognition, Paul Graham, Peter H. Diamandis: Planetary Resources, Peter Thiel, prediction markets, profit motive, publish or perish, Ray Kurzweil, recommendation engine, RFID, ride hailing / ride sharing, risk tolerance, Ronald Coase, Second Machine Age, self-driving car, sharing economy, Silicon Valley, skunkworks, Skype, smart contracts, Snapchat, social software, software is eating the world, speech recognition, stealth mode startup, Stephen Hawking, Steve Jobs, subscription business, supply-chain management, TaskRabbit, telepresence, telepresence robot, Tony Hsieh, transaction costs, Tyler Cowen: Great Stagnation, urban planning, WikiLeaks, winner-take-all economy, X Prize, Y Combinator

Ten years later, its revenues had jumped 125x and the company was generating a half-billion dollars every three days. At the heart of this staggering growth was the PageRank algorithm, which ranks the popularity of web pages. (Google doesn’t gauge which page is better from a human perspective; its algorithms simply respond to the pages that deliver the most clicks.) Google isn’t alone. Today, the world is pretty much run on algorithms. From automotive anti-lock braking to Amazon’s recommendation engine; from dynamic pricing for airlines to predicting the success of upcoming Hollywood blockbusters; from writing news posts to air traffic control; from credit card fraud detection to the 2 percent of posts that Facebook shows a typical user—algorithms are everywhere in modern life. Recently, McKinsey estimated that of the seven hundred end-to-end bank processes (opening an account or getting a car loan, for example), about half can be fully automated.

Not only has he made that rare transition from founder to large-company CEO, but he has also consistently avoided the short-term thinking that so often comes with running a public company—what Joi Ito calls “nowism.” Amazon regularly makes long bets (e.g., Amazon Web Services, Kindle, and now Fire smartphones and delivery drones), views new products as if they are seedlings needing careful tending for a five-to-seven-year period, is maniacal about growth over profits and ignores the short-term view of Wall Street analysts. Its pioneering initiatives include its Affiliate Program, its recommendation engine (collaborative filtering) and the Mechanical Turk project. As Bezos says, “If you’re competitor-focused, you have to wait until there is a competitor doing something. Being customer-focused allows you to be more pioneering.” Not only has Amazon built ExOs on its edges (such as AWS), it also has had the courage to cannibalize its own products (e.g., Kindle). In addition, after realizing that Amazon’s culture wasn’t a perfect fit with the outstanding service he wanted to offer, Bezos spent $1.2 billion in 2009 to acquire Zappos.

pages: 391 words: 105,382

Utopia Is Creepy: And Other Provocations by Nicholas Carr


Air France Flight 447, Airbnb, AltaVista, Amazon Mechanical Turk, augmented reality, autonomous vehicles, Bernie Sanders, book scanning, Brewster Kahle, Buckminster Fuller, Burning Man, Captain Sullenberger Hudson, centralized clearinghouse, cloud computing, cognitive bias, collaborative consumption, computer age, corporate governance, crowdsourcing, Danny Hillis, deskilling, Donald Trump, Elon Musk, factory automation, failed state, feminist movement, Frederick Winslow Taylor, friendly fire, game design, global village, Google bus, Google Glasses, Google X / Alphabet X, Googley, hive mind, impulse control, indoor plumbing, interchangeable parts, Internet Archive, invention of movable type, invention of the steam engine, invisible hand, Isaac Newton, Jeff Bezos, jimmy wales, job automation, Kevin Kelly, low skilled workers, Mark Zuckerberg, Marshall McLuhan, means of production, Menlo Park, mental accounting, natural language processing, Network effects, new economy, Nicholas Carr, oil shale / tar sands, Peter Thiel, Plutocrats, profit motive, Ralph Waldo Emerson, Ray Kurzweil, recommendation engine, Republic of Letters, robot derives from the Czech word robota, meaning slave, Ronald Reagan, self-driving car, SETI@home, side project, Silicon Valley, Silicon Valley ideology, Singularitarianism, Snapchat, social graph, social web, speech recognition, Startup school, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, technoutopianism, the medium is the message, theory of mind, Turing test, Whole Earth Catalog, Y Combinator

The great power of modern digital filters lies in their ability to make information that is of inherent interest to us immediately visible to us. The information may take the form of personal messages or updates from friends or colleagues, broadcast messages from experts or celebrities whose opinions or observations we value, headlines and stories from writers or publications we like, alerts about the availability of various other sorts of content on favorite subjects, or suggestions from recommendation engines—but it all shares the quality of being tailored to our particular interests. It’s all needles. And modern filters don’t just organize that information for us; they push the information at us as alerts, updates, streams. We tend to point to spam as an example of information overload. But spam is just an annoyance. The real source of information overload, at least of the ambient sort, is the stuff we like, the stuff we want.

To thine own image be true.
16. No great work of literature could have been written in hypertext.
17. Social media is a palliative for underemployment.
18. The philistine appears ideally suited to the role of cultural impresario online.
19. Television became more interesting when people started paying for it.
20. Instagram shows us what a world without art looks like.
SECOND SERIES (2013)
21. Recommendation engines are the best cure for hubris.
22. Vines would be better if they were one second shorter.
23. Hell is other selfies.
24. Twitter has revealed that brevity and verbosity are not always antonyms.
25. Personalized ads provide a running critique of artificial intelligence.
26. Who you are is what you do between notifications.
27. Online is to offline as a swimming pool to a pond.
28. People in love leave the sparsest data trails.
29.

pages: 366 words: 94,209

Throwing Rocks at the Google Bus: How Growth Became the Enemy of Prosperity by Douglas Rushkoff


3D printing, Airbnb, algorithmic trading, Amazon Mechanical Turk, Andrew Keen, bank run, banking crisis, barriers to entry, bitcoin, blockchain, Burning Man, business process, buy low sell high, California gold rush, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, centralized clearinghouse, citizen journalism, clean water, cloud computing, collaborative economy, collective bargaining, colonial exploitation, Community Supported Agriculture, corporate personhood, crowdsourcing, cryptocurrency, disintermediation, diversified portfolio, Elon Musk, Erik Brynjolfsson, ethereum blockchain, fiat currency, Firefox, Flash crash, full employment, future of work, gig economy, Gini coefficient, global supply chain, global village, Google bus, Howard Rheingold, IBM and the Holocaust, impulse control, income inequality, index fund, iterative process, Jaron Lanier, Jeff Bezos, jimmy wales, job automation, Joseph Schumpeter, Kickstarter, loss aversion, Lyft, Mark Zuckerberg, market bubble, market fundamentalism, Marshall McLuhan, means of production, medical bankruptcy, minimum viable product, Naomi Klein, Network effects, new economy, Norbert Wiener, Oculus Rift, passive investing, payday loans, peer-to-peer lending, Peter Thiel, post-industrial society, profit motive, quantitative easing, race to the bottom, recommendation engine, reserve currency, RFID, Richard Stallman, ride hailing / ride sharing, Ronald Reagan, Satoshi Nakamoto, Second Machine Age, shareholder value, sharing economy, Silicon Valley, Snapchat, social graph, software patent, Steve Jobs, TaskRabbit, trade route, transportation-network company, Turing test, Uber and Lyft, Uber for X, unpaid internship, Y Combinator, young professional, Zipcar

., became one of the first publicly traded Internet giants, responsible (or to blame) for not only the first e-commerce Web sites but also the first banner ad.6 Matthew was likely just as surprised by where this all went as I was. The information superhighway morphed into an interactive strip mall; digital technology’s ability to connect people to products, facilitate payments, and track behaviors led to all sorts of new marketing and sales innovations. “Buy” buttons triggered the impulse for instant gratification, while recommendation engines personalized marketing pitches. It was commerce on crack. With a few notable exceptions—such as eBay and Etsy—we didn’t really get a return of the many-to-many marketplace or digital bazaar. No, in online commerce it’s mostly a few companies selling to many, and many people selling to the very few—if anyone at all. Take music. The best part of an online music catalogue is that it is unlimited in size.

Amazon then leveraged its monopoly in books and free shipping to develop monopolies in other verticals, beginning with home electronics (bankrupting Circuit City and Best Buy), and then every other link in the physical and virtual fulfillment chain, from shoes and food to music and videos. Finally, Amazon flips into personhood by reversing the traditional relationship between people and machines. Amazon’s patented recommendation engines attempt to drive our human selection process. Amazon Mechanical Turks gave computers the ability to mete out repetitive tasks to legions of human drones. The computers did the thinking and choosing; the people pointed and clicked as they were instructed or induced to do. Neither Amazon nor its founder, Jeff Bezos, is slipping to new lows here. The company is simply operating true to the core program of corporatism, expressed through new digital means.

pages: 579 words: 76,657

Data Science from Scratch: First Principles with Python by Joel Grus


correlation does not imply causation, natural language processing, Netflix Prize, p-value, Paul Graham, recommendation engine, SpamAssassin, statistical model

principal component analysis, Dimensionality Reduction
probability, Probability-For Further Exploration, Mathematics
    Bayes’s Theorem, Bayes’s Theorem
    central limit theorem, The Central Limit Theorem
    conditional, Conditional Probability
    continuous distributions, Continuous Distributions
    defined, Probability
    dependence and independence, Dependence and Independence
    normal distribution, The Normal Distribution
    random variables, Random Variables
probability density function, Continuous Distributions
programming languages for learning data science, From Scratch
Python, A Crash Course in Python-For Further Exploration
    args and kwargs, args and kwargs
    arithmetic, Arithmetic
    benefits of using for data science, From Scratch
    Booleans, Truthiness
    control flow, Control Flow
    Counter, Counter
    dictionaries, Dictionaries-defaultdict
    enumerate function, enumerate
    exceptions, Exceptions
    functional tools, Functional Tools
    functions, Functions
    generators and iterators, Generators and Iterators
    list comprehensions, List Comprehensions
    lists, Lists
    object-oriented programming, Object-Oriented Programming
    piping data through scripts using stdin and stdout, stdin and stdout
    random numbers, generating, Randomness
    regular expressions, Regular Expressions
    sets, Sets
    sorting in, The Not-So-Basics
    strings, Strings
    tuples, Tuples
    whitespace formatting, Whitespace Formatting
    zip function and argument unpacking, zip and Argument Unpacking
Q
quantile, computing, Central Tendencies
query optimization (SQL), Query Optimization
R
R (programming language), From Scratch, R
random forests, Random Forests
random module (Python), Randomness
random variables, Random Variables
    Bernoulli, The Central Limit Theorem
    binomial, The Central Limit Theorem
    conditioned on events, Random Variables
    expected value, Random Variables
    normal, The Normal Distribution-The Central Limit Theorem
    uniform, Continuous Distributions
range, Dispersion
range function (Python), Generators and Iterators
reading files (see files, reading)
recall, Correctness
recommendations, Recommender Systems
recommender systems, Recommender Systems-For Further Exploration
    Data Scientists You May Know (example), Data Scientists You May Know
    item-based collaborative filtering, Item-Based Collaborative Filtering-For Further Exploration
    manual curation, Manual Curation
    recommendations based on popularity, Recommending What’s Popular
    user-based collaborative filtering, User-Based Collaborative Filtering-User-Based Collaborative Filtering
reduce function (Python), Functional Tools
    using with vectors, Vectors
regression (see linear regression; logistic regression)
regression trees, What Is a Decision Tree?

Additionally, both of his endorsers endorsed only him, which means that he doesn’t have to divide their rank with anyone else.
For Further Exploration
There are many other notions of centrality besides the ones we used (although the ones we used are pretty much the most popular ones). NetworkX is a Python library for network analysis. It has functions for computing centralities and for visualizing graphs. Gephi is a love-it/hate-it GUI-based network-visualization tool.
Chapter 22. Recommender Systems
O nature, nature, why art thou so dishonest, as ever to send men with these false recommendations into the world!
Henry Fielding
Another common data problem is producing recommendations of some sort. Netflix recommends movies you might want to watch. Amazon recommends products you might want to buy. Twitter recommends users you might want to follow. In this chapter, we’ll look at several ways to use data to make recommendations.

        = other_interest_id and similarity > 0]
        # sort by similarity (the second element of each pair), highest first
        return sorted(pairs, key=lambda pair: pair[1], reverse=True)

which suggests the following similar interests:

    [('Hadoop', 0.8164965809277261),
     ('Java', 0.6666666666666666),
     ('MapReduce', 0.5773502691896258),
     ('Spark', 0.5773502691896258),
     ('Storm', 0.5773502691896258),
     ('Cassandra', 0.4082482904638631),
     ('artificial intelligence', 0.4082482904638631),
     ('deep learning', 0.4082482904638631),
     ('neural networks', 0.4082482904638631),
     ('HBase', 0.3333333333333333)]

Now we can create recommendations for a user by summing up the similarities of the interests similar to his:

    from collections import defaultdict

    # relies on user_interest_matrix, users_interests, and
    # most_similar_interests_to, all defined earlier in the chapter
    def item_based_suggestions(user_id, include_current_interests=False):
        # add up the similar interests
        suggestions = defaultdict(float)
        user_interest_vector = user_interest_matrix[user_id]
        for interest_id, is_interested in enumerate(user_interest_vector):
            if is_interested == 1:
                similar_interests = most_similar_interests_to(interest_id)
                for interest, similarity in similar_interests:
                    suggestions[interest] += similarity

        # sort them by weight
        suggestions = sorted(suggestions.items(),
                             key=lambda pair: pair[1],
                             reverse=True)

        if include_current_interests:
            return suggestions
        else:
            return [(suggestion, weight)
                    for suggestion, weight in suggestions
                    if suggestion not in users_interests[user_id]]

For user 0, this generates the following (seemingly reasonable) recommendations:

    [('MapReduce', 1.861807319565799),
     ('Postgres', 1.3164965809277263),
     ('MongoDB', 1.3164965809277263),
     ('NoSQL', 1.2844570503761732),
     ('programming languages', 0.5773502691896258),
     ('MySQL', 0.5773502691896258),
     ('Haskell', 0.5773502691896258),
     ('databases', 0.5773502691896258),
     ('neural networks', 0.4082482904638631),
     ('deep learning', 0.4082482904638631),
     ('C++', 0.4082482904638631),
     ('artificial intelligence', 0.4082482904638631),
     ('Python', 0.2886751345948129),
     ('R', 0.2886751345948129)]

For Further Exploration
Crab is a framework for building recommender systems in Python. Graphlab also has a recommender toolkit. The Netflix Prize was a somewhat famous competition to build a better system to recommend movies to Netflix users.
Chapter 23. Databases and SQL
Memory is man’s greatest friend and worst enemy.
Gilbert Parker
The data you need will often live in databases, systems designed for efficiently storing and querying data. The bulk of these are relational databases, such as Oracle, MySQL, and SQL Server, which store data in tables and are typically queried using Structured Query Language (SQL), a declarative language for manipulating data.

pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel


Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, call centre, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, Erik Brynjolfsson, experimental subject, Google Glasses, happiness index / gross national happiness, job satisfaction, Johann Wolfgang von Goethe, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, risk-adjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, software as a service, speech recognition, statistical model, Steven Levy, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra

I Knew You Were Going to Do That
With this power at hand, what do we want to predict? Every important thing a person does is valuable to predict, namely: consume, think, work, quit, vote, love, procreate, divorce, mess up, lie, cheat, steal, kill, and die. Let’s explore some examples.2
People Consume
Hollywood studios predict the success of a screenplay if produced. Netflix awarded $1 million to a team of scientists who best improved their recommendation system’s ability to predict which movies you will like. Australian energy company Energex predicts electricity demand in order to decide where to build out its power grid, and Con Edison predicts system failure in the face of high levels of consumption. Wall Street predicts stock prices by observing how demand drives them up and down. The firms AlphaGenius and Derwent Capital drive hedge fund trading by following trends across the general public’s activities on Twitter.

I was at Walgreens a few years ago, and upon checkout an attractive, colorful coupon spit out of the machine. The product it hawked, pictured for all my fellow shoppers to see, had the potential to mortify. It was a coupon for Beano, a medication for flatulence. I’d developed mild lactose intolerance, but, before figuring that out, had been trying anything to address my symptom. Acting blindly on data, Walgreens’ recommendation system seemed to suggest that others not stand so close. Other clinical data holds a more serious and sensitive status than digestive woes. Once, when teaching a summer program for talented teenagers, I received data I felt would have been better kept away from me. The administrator took me aside to inform me that one of my students had a diagnosis of bipolar disorder. I wasn’t trained in psychology.

Such a contest is a hard-nosed, objective bake-off—whoever can cook up the solution that best handles the predictive task at hand wins kudos and, usually, cash.
Dark Horses
And so it was with our two Montrealers, Martin and Martin, who took the Netflix Prize by storm despite their lack of experience—or, perhaps, because of it. Neither had a background in statistics or analytics, let alone recommendation systems in particular. By day, the two worked in the telecommunications industry developing software. But by night, at home, the two-member team plugged away, for 10 to 20 hours per week apiece, racing ahead in the contest under the team name PragmaticTheory. The “pragmatic” approach proved groundbreaking. The team wavered in and out of the number one slot; during the final months of the competition, the team was often in the top echelons.

pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos


3D printing, Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, crowdsourcing, Danny Hillis, data is the new oil, double helix, Douglas Hofstadter, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, global village, Google Glasses, Gödel, Escher, Bach, information retrieval, job automation, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, lone genius, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, NP-complete, P = NP, PageRank, pattern recognition, phenotype, planetary scale, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, Second Machine Age, self-driving car, Silicon Valley, speech recognition, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight

Satellites, DNA sequencers, and particle accelerators probe nature in ever-finer detail, and learning algorithms turn the torrents of data into new scientific knowledge. Companies know their customers like never before. The candidate with the best voter models wins, like Obama against Romney. Unmanned vehicles pilot themselves across land, sea, and air. No one programmed your tastes into the Amazon recommendation system; a learning algorithm figured them out on its own, by generalizing from your past purchases. Google’s self-driving car taught itself how to stay on the road; no engineer wrote an algorithm instructing it, step-by-step, how to get from A to B. No one knows how to program a car to drive, and no one needs to, because a car equipped with a learning algorithm picks it up by observing what the driver does.

It’s an ideal job for machine learning, and yet today’s learners aren’t up to it. Each has some of the needed capabilities but is missing others. The Master Algorithm is the complete package. Applying it to vast amounts of patient and drug data, combined with knowledge mined from the biomedical literature, is how we will cure cancer. A universal learner is sorely needed in many other areas, from life-and-death to mundane situations. Picture the ideal recommender system, one that recommends the books, movies, and gadgets you would pick for yourself if you had the time to check them all out. Amazon’s algorithm is a very far cry from it. That’s partly because it doesn’t have enough data—mainly it just knows which items you previously bought from Amazon—but if you went hog wild and gave it access to your complete stream of consciousness from birth, it wouldn’t know what to do with it.

The price, of course, is that its vision is blurrier: fine details of the frontier get washed away by the voting. When k goes up, variance decreases, but bias increases. Using the k nearest neighbors instead of one is not the end of the story. Intuitively, the examples closest to the test example should count for more. This leads us to the weighted k-nearest-neighbor algorithm. In 1994, a team of researchers from the University of Minnesota and MIT built a recommendation system based on what they called “a deceptively simple idea”: people who agreed in the past are likely to agree again in the future. That notion led directly to the collaborative filtering systems that all self-respecting e-commerce sites have. Suppose that, like Netflix, you’ve gathered a database of movie ratings, with each user giving a rating of one to five stars to the movies he or she has seen.
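The idea that people who agreed in the past are likely to agree again can be sketched as a weighted k-nearest-neighbor prediction over a small ratings table. This is only an illustration, not the Minnesota/MIT system or anything Netflix runs: the ratings, the user names, the function names `similarity` and `predict_rating`, and the choice of cosine similarity over co-rated movies are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings (1 to 5 stars); names and numbers are invented.
ratings = {
    "ann":  {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob":  {"Alien": 5, "Brazil": 5, "Casablanca": 2, "Dune": 5},
    "cara": {"Alien": 1, "Brazil": 2, "Casablanca": 5, "Dune": 1},
    "dev":  {"Alien": 4, "Casablanca": 2, "Dune": 4},
}

def similarity(a, b):
    """Cosine similarity over the movies both users rated (0.0 if none)."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return 0.0
    dot = sum(ratings[a][m] * ratings[b][m] for m in shared)
    norm_a = sqrt(sum(ratings[a][m] ** 2 for m in shared))
    norm_b = sqrt(sum(ratings[b][m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict_rating(user, movie, k=2):
    """Similarity-weighted average of the k most similar users' ratings."""
    neighbors = sorted(
        ((similarity(user, other), user_ratings[movie])
         for other, user_ratings in ratings.items()
         if other != user and movie in user_ratings),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # nobody comparable has rated this movie
    return sum(sim * rating for sim, rating in neighbors) / total

print(predict_rating("ann", "Dune"))
```

Here the two users whose past ratings look most like ann's (bob and dev) both liked "Dune", so the prediction lands between their ratings of 4 and 5. Raising `k` pulls in less similar users such as cara, which is the bias and variance trade-off the excerpt describes.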

pages: 375 words: 88,306

The Sharing Economy: The End of Employment and the Rise of Crowd-Based Capitalism by Arun Sundararajan


3D printing, additive manufacturing, Airbnb, Amazon Mechanical Turk, autonomous vehicles, barriers to entry, bitcoin, blockchain, Burning Man, call centre, collaborative consumption, collaborative economy, collective bargaining, corporate social responsibility, cryptocurrency, David Graeber, distributed ledger, employer provided health coverage, Erik Brynjolfsson, ethereum blockchain, Frank Levy and Richard Murnane: The New Division of Labor, future of work, George Akerlof, gig economy, housing crisis, Howard Rheingold, Internet of things, inventory management, invisible hand, job automation, job-hopping, Kickstarter, knowledge worker, Kula ring, Lyft, megacity, minimum wage unemployment, moral hazard, Network effects, new economy, Oculus Rift, pattern recognition, peer-to-peer lending, profit motive, purchasing power parity, race to the bottom, recommendation engine, regulatory arbitrage, rent control, Richard Florida, ride hailing / ride sharing, Robert Gordon, Ronald Coase, Second Machine Age, self-driving car, sharing economy, Silicon Valley, smart contracts, Snapchat, social software, supply-chain management, TaskRabbit, The Nature of the Firm, total factor productivity, transaction costs, transportation-network company, two-sided market, Uber and Lyft, Uber for X, universal basic income, Zipcar

Thus, a big fraction of Google’s impact on the economy isn’t captured since changes in consumer surplus are not reflected in the GDP. This point has been noted about digital markets more generally. While a conventional brick-and-mortar bookstore may hold 40,000 to 100,000 books, Amazon offers access to over 3 million books. The same expansion in variety holds true for music, movies, electronics, and myriad other products. Furthermore, since Amazon uses several recommender systems to help promote products, it is not just variety but “fit” that has increased.14 Capturing the economic impacts of enhanced variety and automated word-of-mouth promotions, however, is difficult, since once again, what has changed is primarily the quality of the consumer experience. As Erik Brynjolfsson, Yu (Jeffery) Hu, and Michael Smith argue in their study of consumer surplus in the digital economy, these benefits may be particularly difficult to measure because different consumers are impacted to varying degrees.

This improves the welfare of these consumers by allowing them to locate and buy specialty products they otherwise would not have purchased due to high transaction costs or low product awareness. This effect will be especially beneficial to those consumers who live in remote areas.”15 Analogous increases in consumer surplus were documented by Anindya Ghose, Rahul Telang and Michael Smith in their 2005 study of electronic markets for used books.16 These effects are exacerbated by a wide variety of recommender systems that use machine learning algorithms to better direct consumer choice. As Alexander Tuzhilin and Gedas Adomavicius document, such systems are ubiquitous in digital markets.17 It is natural to expect similar challenges when, for example, trying to encompass the different economic impacts of increased variety and fit from Airbnb, or increased convenience from Lyft, or Dennis’s increased access to financing on the Isle of Gigha.

Smith, “Consumer Surplus in the Digital Economy: Estimating the Value of Increased Product Variety at Online Booksellers,” Management Science 49, 11 (2003): 1580–1596, 1581.
16. Anindya Ghose, Rahul Telang and Michael D. Smith, “Internet Exchanges for Used Books: An Empirical Analysis of Product Cannibalization and Welfare Impact,” Information Systems Research 17, 1 (2006): 3–9.
17. Alexander Tuzhilin and Gedas Adomavicius, “Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions,” IEEE Transactions on Knowledge and Data Engineering 17, 6 (2006): 734–739.
18. Prasanna Tambe and Lorin M. Hitt, “Job Hopping, Information Technology Spillovers, and Productivity Growth,” Management Science 60, 2 (2013): 338–355.
19. One might instead consider using the term “efficiency” of capital or “productivity” of capital.

pages: 176 words: 55,819

The Start-Up of You by Reid Hoffman


Airbnb, Andy Kessler, Black Swan, business intelligence, Cal Newport, Clayton Christensen, David Brooks, Donald Trump, fear of failure, follow your passion, future of work, game design, Jeff Bezos, job automation, late fees, Mark Zuckerberg, Menlo Park, out of africa, Paul Graham, Peter Thiel, recommendation engine, Richard Bolles, risk tolerance, rolodex, shareholder value, side project, Silicon Valley, Silicon Valley startup, social web, Steve Jobs, Steve Wozniak, Tony Hsieh, transaction costs

In 1999 he set up a meeting at Blockbuster’s headquarters in part to discuss possibly partnering on local distribution and faster fulfillment. Blockbuster was not impressed. “They just about laughed us out of their office,” Reed recalls.16 Reed and his team kept at it. They perfected their distribution center network so that more than 80 percent of customers received overnight delivery of movies.17 They developed an innovative recommendation engine that prompted users with movies they might like based on past purchases. By 2005 Netflix had a subscriber base four million strong, had fended off competition from imitations like Walmart’s online movie-by-mail effort, and became the king of online movie rentals. In 2010 Netflix made a profit of more than $160 million. Blockbuster, in comparison, failed to adapt to the Internet era. That year it filed for bankruptcy.18 Netflix is not resting.

pages: 222 words: 53,317

Overcomplicated: Technology at the Limits of Comprehension by Samuel Arbesman


3D printing, algorithmic trading, Anton Chekhov, Apple II, Benoit Mandelbrot, citation needed, combinatorial explosion, Danny Hillis, David Brooks, discovery of the americas, Erik Brynjolfsson, Flash crash, friendly AI, game design, Google X / Alphabet X, Googley, HyperCard, Inbox Zero, Isaac Newton, iterative process, Kevin Kelly, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mandelbrot fractal, Minecraft, Netflix Prize, Nicholas Carr, Parkinson's law, Ray Kurzweil, recommendation engine, Richard Feynman, Richard Feynman: Challenger O-ring, Second Machine Age, self-driving car, software studies, statistical model, Steve Jobs, Steve Wozniak, Steven Pinker, Stewart Brand, superintelligent machines, Therac-25, Tyler Cowen: Great Stagnation, urban planning, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, Y2K

The sophisticated machine learning techniques used in linguistics—employing probability and a large array of parameters rather than principled rules—are increasingly being used in numerous other areas, both in science and outside it, from criminal detection to medicine, as well as in the insurance industry. Even our aesthetic tastes are rather complicated, as Netflix discovered when it awarded a prize for improvements in its recommendation engine to a team whose solution was cobbled together from a variety of different statistical techniques. The contest seemed to demonstrate that no simple algorithm could provide a significant improvement in recommendation accuracy; the winners needed to use a more complex suite of methods in order to capture and predict our personal and quirky tastes in films. This phenomenon occurs in all types of technology.

pages: 593 words: 118,995

Relevant Search: With Examples Using Elasticsearch and Solr by Doug Turnbull, John Berryman


crowdsourcing, domain-specific language, finite state, fudge factor, full text search, information retrieval, natural language processing, premature optimization, recommendation engine, sentiment analysis

These methods are less intuitive than the simple co-occurrence counting method presented here, and they tend to be more challenging to implement. But they often provide better results, because they employ a more holistic understanding of item-user relationships. To dive deeper into recommendation systems, we recommend Practical Recommender Systems by Kim Falk (Manning, 2016). And no matter the method you choose, keep in mind that the end result is a model that lets you quickly find the item-to-item or user-to-item affinities. This understanding is important as we explain how collaborative filtering results can be used in the context of search.
11.2.3. Tying user behavior information back to the search index
In the previous section, we demonstrated how to build a simple recommendation system. But we’re supposed to be talking about personalized search! In this section, we return to search and explain how the output of collaborative filtering can be used to build a more personalized search experience.
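The simple co-occurrence counting method this excerpt refers to can be sketched in a few lines of Python. This is an illustration only, not the book's Elasticsearch or Solr implementation: the purchase histories, the item names, and the function names `cooccurrence_counts` and `related_items` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase histories: user -> set of items (invented data).
histories = {
    "u1": {"tent", "stove", "lantern"},
    "u2": {"tent", "stove"},
    "u3": {"tent", "lantern"},
    "u4": {"stove", "kettle"},
}

def cooccurrence_counts(histories):
    """For each item, count how many users also have each other item."""
    counts = defaultdict(Counter)
    for items in histories.values():
        # every ordered pair of distinct items in one user's history
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1
    return counts

def related_items(item, counts, n=3):
    """Return up to n items seen most often alongside `item`."""
    return [other for other, _ in counts[item].most_common(n)]

counts = cooccurrence_counts(histories)
print(related_items("stove", counts, n=1))
```

The resulting table is the item-to-item affinity model the excerpt mentions. A user-to-item affinity could be derived from it by summing the counts over everything in one user's history, which is one reasonable design among several.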

In both cases, we start with relatively simple methods and then outline more sophisticated approaches using machine learning. In the process of laying out personalized search, we introduce recommendations. You can provide users with personalized content recommendations even before they’ve made a search. In addition, you’ll see that a search engine can be a powerful platform for building a recommendation system. Figure 11.1 shows recommendations side-by-side with search, implemented by a relevance engineer.
Figure 11.1. By incorporating knowledge about the content and the user, search can be extended to tasks such as personalized search and recommendations.
11.1. Personalizing search based on user profiles
Until now, we’ve defined relevance in terms of how well a search result matches a user’s immediate information need.

Here, information comes in three flavors: information about the users, about the items in the catalog, and about the current context of recommendation:
User information: As users interact with the application, you can identify patterns in their behavior and learn about their interests and tastes. Particularly engaged users might even be willing to directly tell us about their interests.
Item information: To make good recommendations, it’s important to be familiar with the items in the catalog. At a minimum, the items need to have useful textual content to match on. Items also need good metadata for boosting and filtering. In more advanced recommendation systems, you should also take advantage of the overall user behavior that gives you new information about how items in the catalog are interrelated.
Recommendation context: To provide users with the best recommendations possible, you must consider their current context. Are they looking at an item details page? Then you should make recommendations for related items in case they aren’t sold on this one.

pages: 561 words: 120,899

The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy by Sharon Bertsch McGrayne


bioinformatics, British Empire, Claude Shannon: information theory, Daniel Kahneman / Amos Tversky, double helix, Edmond Halley, Fellow of the Royal Society, full text search, Henri Poincaré, Isaac Newton, John Nash: game theory, John von Neumann, linear programming, meta analysis, meta-analysis, Nate Silver, p-value, placebo effect, prediction markets, RAND corporation, recommendation engine, Renaissance Technologies, Richard Feynman, Richard Feynman: Challenger O-ring, Ronald Reagan, speech recognition, statistical model, stochastic process, Thomas Kuhn: the structure of scientific revolutions, traveling salesman, Turing machine, Turing test, uranium enrichment, Yom Kippur War

Pouget A et al. (2009) Neural Computations as Laplacian (or is it Bayesian?) probabilistic inference. In draft. Quatse JT, Najmi A. (2007) Empirical Bayesian targeting. Proceedings, 2007 World Congress in Computer Science, Computer Engineering, and Applied Computing, June 25–28, 2007. Schafer JB, Konstan J, Riedl J. (1999) Recommender systems in E-commerce. In ACM Conference on Electronic Commerce (EC-99) 158–66. Schafer JB, Konstan J, Riedl J. (2001) Recommender systems in E-commerce. Data Mining and Knowledge Discovery (5) 115–53. Schneider, Stephen H. (2005) The Patient from Hell. Perseus Books. Spolsky, Joel. (2005) ( Swinburne, Richard, ed. (2002) Bayes’s Theorem. Oxford University Press. Taylor BL et al. (2000) Incorporating uncertainty into management models for marine mammals.

Users refine their own filters by reading low-scoring messages and either keeping them or sending them to trash and junk files. This use of Bayesian optimal classifiers is similar to the technique used by Frederick Mosteller and David Wallace to determine who wrote certain Federalist papers. Bayesian theory is firmly embedded in Microsoft’s Windows operating system. In addition, a variety of Bayesian techniques are involved in Microsoft’s handwriting recognition; recommender systems; the question-answering box in the upper right corner of a PC’s monitor screen; a datamining software package for tracking business sales; a program that infers the applications that users will want and preloads them before they are requested; and software to make traffic jam predictions for drivers to check before their commute. Bayes was blamed—unfairly, say Heckerman and Horwitz—for Microsoft’s memorably annoying paperclip, Clippy.

As the e-commerce refrain goes, “If you liked this book/song/movie, you’ll like that one too.” The updating used in machine learning does not necessarily follow Bayes’ theorem formally but “shares its perspective.” A $1-million contest sponsored by Netflix illustrates the prominent role of Bayesian concepts in modern e-commerce and learning theory. In 2006 the online film-rental company launched a search for the best recommender system to improve its own algorithm. More than 50,000 contestants from 186 countries vied over the four years of the competition. The AT&T Labs team organized around Yehuda Koren, Christopher T. Volinsky, and Robert M. Bell won the prize in September 2009. Interestingly, although no contestants questioned Bayes as a legitimate method, almost none wrote a formal Bayesian model. The winning group relied on empirical Bayes but estimated the initial priors according to their frequencies.
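To make the phrase "estimated the initial priors according to their frequencies" more concrete, here is a small shrinkage sketch in the empirical Bayes spirit: the prior mean is taken from the observed ratings themselves, and each film's average is pulled toward it. This is an illustration only and not the winning team's model; the function name `shrunk_means`, the `prior_strength` value, and the ratings are all assumptions.

```python
def shrunk_means(ratings, prior_strength=5.0):
    """Shrink each item's mean rating toward the global mean.

    The prior mean is estimated from the data (the global average), and
    prior_strength acts like that many pseudo-ratings at the prior mean.
    Both choices are illustrative assumptions.
    """
    all_ratings = [r for rs in ratings.values() for r in rs]
    global_mean = sum(all_ratings) / len(all_ratings)
    estimates = {
        item: (sum(rs) + prior_strength * global_mean) / (len(rs) + prior_strength)
        for item, rs in ratings.items()
    }
    return global_mean, estimates


# Hypothetical data: film_a has two perfect ratings, film_b has eight solid ones.
ratings = {
    "film_a": [5, 5],
    "film_b": [4, 4, 4, 4, 4, 4, 4, 4],
}

global_mean, estimates = shrunk_means(ratings)
for film, value in estimates.items():
    print(film, round(value, 3))
```

A film with only a couple of ratings moves a long way toward the global average, while a film with many ratings barely moves, which keeps sparse evidence from dominating a ranking.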

pages: 229 words: 68,426

Everyware: The Dawning Age of Ubiquitous Computing by Adam Greenfield


augmented reality, business process, defense in depth, demand response, demographic transition, facts on the ground, game design, Howard Rheingold, Internet of things, James Dyson, knowledge worker, late capitalism, Marshall McLuhan, new economy, Norbert Wiener, packet switching, pattern recognition, profit motive, recommendation engine, RFID, Steve Jobs, technoutopianism, the built environment, the scientific method

But the word "hint" is well-chosen here, because that's really all the cup will be able to communicate. It may well be that a full mug on my desk implies that I am also in the room, but this is not always going to be the case, and any system that correlates the two facts had better do so pretty loosely. Products and services based on such pattern-recognition already exist in the world—I think of Amazon's "collaborative filtering"–driven recommendation engine—but for the most part, their designers are only now beginning to recognize that they have significantly underestimated the difficulty of deriving meaning from those patterns. The better part of my Amazon recommendations turn out to be utterly worthless—and of all commercial pattern-recognition systems, that's among those with the largest pools of data to draw on. Lest we forget: "simple" is hard.

pages: 265 words: 74,000

The Numerati by Stephen Baker


Berlin Wall, Black Swan, business process, call centre, correlation does not imply causation, Drosophila, full employment, illegal immigration, index card, Isaac Newton, job automation, job satisfaction, McMansion, natural language processing, PageRank, personalized medicine, recommendation engine, RFID, Silicon Valley, Skype, statistical model, Watson beat the top human players on Jeopardy!

It will simply issue alerts when it detects changes in patterns and perhaps urge the user to schedule a medical appointment. It will be up to doctors and nurses to follow up, figuring out why someone is limping or swaying differently at the kitchen sink. But in time, these systems will have enough feedback from thousands of users that they should be able to point people—either doctors or patients—to the most probable cause. In this way, they will work like the recommendation engines on Netflix or Amazon, which point people toward books or movies that are popular among customers with similar patterns. (Amazon and Netflix, of course, don't always get it right, and neither will the analysis issuing from the magic carpet. It will only point caregivers toward statistically probable causes.) Dishman's team has installed magic carpets in the homes of people with neurological disorders or a history of falling.

pages: 231 words: 71,248

Shipping Greatness by Chris Vander Mey


don't be evil, fudge factor, Google Chrome, Google Hangouts, Gordon Gekko, Jeff Bezos, Kickstarter, Lean Startup, minimum viable product, performance metric, recommendation engine, Skype, slashdot, sorting algorithm, Steve Jobs, Superbowl ad, web application

We chose to focus initially on professionals because while teens and tweens have time to spend on Facebook and YouTube, professionals have less time but also have rich networks and strong opinions—not to mention disposable capital to spend on content. Using IMDb’s unique collection of movie data and Amazon’s ability to distribute digital content and proven personalization tools, we will uniquely solve the content discovery problem by integrating these technologies and building unique suggestion algorithms. Unlike competitors such as Netflix, who already have a recommendations engine, we’ll integrate across all video sources and use our richer data to provide more interesting in-viewing experiences and more accurate recommendations. We will deliver these in-viewing experiences through platforms that can expose contextually relevant data (e.g., the cast of a YouTube video), such as a browser plug-in for YouTube and mobile applications for phones. We can also enlighten viewers by providing rich information about the content they are consuming, and prompt for feedback—creating a virtuous cycle in which all users benefit.

pages: 260 words: 76,223

Ctrl Alt Delete: Reboot Your Business. Reboot Your Life. Your Future Depends on It. by Mitch Joel


3D printing, Amazon Web Services, augmented reality, call centre, clockwatching, cloud computing, Firefox, future of work, ghettoisation, Google Chrome, Google Glasses, Google Hangouts, Khan Academy, Kickstarter, Kodak vs Instagram, Lean Startup, Mark Zuckerberg, Network effects, new economy, Occupy movement, place-making, prediction markets, pre–internet, recommendation engine, Richard Florida, risk tolerance, self-driving car, Silicon Valley, Silicon Valley startup, Skype, social graph, social web, Steve Jobs, Steve Wozniak, Thomas L Friedman, Tim Cook: Apple, Tony Hsieh, WikiLeaks

In fact, it’s actually very squiggly. Always bear that in mind. Embrace the squiggle. THE REALITY OF CAREER CHOICES IN A CTRL ALT DELETE WORLD. You can contrast the fictional story above with the tale of a friend of mine. This individual was never really sure what she wanted to do. There was no clear desire or talent in a single area of interest. In her final years of high school, a guidance counselor recommended engineering or the sciences because she had above-average math grades. So my friend studied engineering through university and squeaked by. Never passionate about it, she got her diploma and entered the workforce. I had lunch with her a while back and she confessed that she was miserable because of her work but could not figure out why. She had followed all the rules; she did okay in school, she advanced in a field that typically enables you to be both employable and well paid.

pages: 326 words: 74,433

Do More Faster: TechStars Lessons to Accelerate Your Startup by Brad Feld, David Cohen


augmented reality, computer vision, corporate governance, crowdsourcing, disintermediation, hiring and firing, Inbox Zero, Jeff Bezos, knowledge worker, Lean Startup, Ray Kurzweil, recommendation engine, risk tolerance, Silicon Valley, Skype, slashdot, social web, software as a service, Steve Jobs

— Travelfli (2008)—Now UsingMiles, helps frequent flyers maximize the full potential of their loyalty programs.
— TutorialTab (2010)—lets companies make their web site more learnable.
— Usermojo (2010)—is an emotion analytics platform that tells you why users do what they do.
— Vanilla (2009)—is open source forum software.
— Villij (2007)—is a recommendation engine for people.
— Vacation Rental Partner (2010)—makes it easy to generate revenue from a second home. We offer tools that eliminate the need for traditional property management companies.
— TechStars companies funded after publication are listed on the TechStars web site.
About the Authors
Brad Feld is a co-founder and managing director at Foundry Group, an early stage venture capital firm, and a co-founder of TechStars.

pages: 252 words: 72,473

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil


Affordable Care Act / Obamacare, Bernie Madoff, big data - Walmart - Pop Tarts, call centre, carried interest, cloud computing, collateralized debt obligation, correlation does not imply causation, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, Emanuel Derman, housing crisis, illegal immigration, Internet of things, late fees, medical bankruptcy, Moneyball by Michael Lewis explains big data, new economy, obamacare, Occupy movement, offshore financial centre, payday loans, peer-to-peer lending, Peter Thiel, Ponzi scheme, prediction markets, price discrimination, quantitative hedge fund, Ralph Nader, RAND corporation, recommendation engine, Sharpe ratio, statistical model, Tim Cook: Apple, too big to fail, Unsafe at Any Speed, Upton Sinclair, Watson beat the top human players on Jeopardy!, working poor

Investors, of course, feast on these returns and shower WMD companies with more money. And the victims? Well, an internal data scientist might say, no statistical system can be perfect. Those folks are collateral damage. And often, like Sarah Wysocki, they are deemed unworthy and expendable. Forget about them for a minute, they might say, and focus on all the people who get helpful suggestions from recommendation engines or who find music they love on Pandora, the ideal job on LinkedIn, or perhaps the love of their life on Match.com. Think of the astounding scale, and ignore the imperfections. Big Data has plenty of evangelists, but I’m not one of them. This book will focus sharply in the other direction, on the damage inflicted by WMDs and the injustice they perpetuate. We will explore harmful examples that affect people at critical life moments: going to college, borrowing money, getting sentenced to prison, or finding and holding a job.

pages: 265 words: 69,310

What's Yours Is Mine: Against the Sharing Economy by Tom Slee


4chan, Airbnb, Amazon Mechanical Turk, asset-backed security, barriers to entry, Berlin Wall, big-box store, bitcoin, blockchain, citizen journalism, collaborative consumption, congestion charging, Credit Default Swap, crowdsourcing, data acquisition, David Brooks, don't be evil, gig economy, Hacker Ethic, income inequality, informal economy, invisible hand, Jacob Appelbaum, Jane Jacobs, Jeff Bezos, Khan Academy, Kibera, Kickstarter, license plate recognition, Lyft, Mark Zuckerberg, move fast and break things, natural language processing, Netflix Prize, Network effects, new economy, Occupy movement, openstreetmap, Paul Graham, peer-to-peer lending, Peter Thiel, pre–internet, principal–agent problem, profit motive, race to the bottom, Ray Kurzweil, recommendation engine, rent control, ride hailing / ride sharing, sharing economy, Silicon Valley, Snapchat, software is eating the world, South of Market, San Francisco, TaskRabbit, The Nature of the Firm, Thomas L Friedman, transportation-network company, Uber and Lyft, Uber for X, ultimatum game, urban planning, WikiLeaks, winner-take-all economy, Y Combinator, Zipcar

This meant everyone using the system would pretty quickly develop a relevant ‘reputation’ visible to everyone else in the system.” 2 Friedman was writing just a couple of weeks after his New York Times stablemate David Brooks described “How Airbnb and Lyft Finally Got Americans to Trust Each Other”: “Companies like Airbnb establish trust through ratings mechanisms . . . People in the Airbnb economy don’t have the option of trusting each other on the basis of institutional affiliations, so they do it on the basis of online signaling and peer evaluations.” 3 Sharing Economy companies are not the first to use ratings and algorithms to guide behavior. Their trust systems build on the rating and recommendation systems used by Amazon, Netflix, eBay, Yelp, TripAdvisor, iTunes, the App Store and many others. Each takes individual ratings as their input and transforms them into some form of recommendation. As rating systems have become ubiquitous their usefulness has become a matter of faith in the world of software development. The Sharing Economy is at the cutting edge of a push for “algorithmic regulation” in which rules protecting consumers are replaced by ratings and software algorithms.

For Anderson, Amazon represents the return of variety and diversity after decades of homogenous blockbusters: “We are turning from a mass market back into a niche nation, defined not by geography but by interests.” 19 In a Long Tail world there is no need for formal gatekeepers who select or restrict the works that can find their public; instead, Web 2.0 platforms will do it for us using crowdsourced consumer reviews and recommender systems: “By combining infinite shelf space with real-time information about buying trends and public opinion . . . unlimited selection is revealing truths about what consumers want and how they want to get it.” 20 Amazon and Airbnb are similar in many ways. Both are, at least in part, software companies whose inventory is simply a set of entries in a database, accessed via a web site. Anything can go into the database: for Amazon’s books it might be Harry Potter or a self-published obscurity, or anything in between.

pages: 319 words: 89,477

The Power of Pull: How Small Moves, Smartly Made, Can Set Big Things in Motion by John Hagel Iii, John Seely Brown


Albert Einstein, Andrew Keen, barriers to entry, Black Swan, business process, call centre, Clayton Christensen, cleantech, cloud computing, corporate governance, Elon Musk, future of work, game design, George Gilder, Isaac Newton, job satisfaction, knowledge economy, knowledge worker, loose coupling, Louis Pasteur, Malcom McLean invented shipping containers, Maui Hawaii, medical residency, Network effects, packet switching, pattern recognition, pre–internet, profit motive, recommendation engine, Ronald Coase, shareholder value, Silicon Valley, Skype, smart transportation, software as a service, supply-chain management, The Nature of the Firm, too big to fail, trade liberalization, transaction costs

Blurring Creation and Use Pull platforms tend to allow us to perform the following activities, with a blurring of the boundaries between creation and use: • Find. Pull platforms allow us to find not just raw materials, products, and services, but also people with relevant skills and experience. Some of the tools and services that pull platforms use to help participants find relevant resources include search, recommendation engines, directories, agents, and reputation services. • Connect. Again, pull platforms connect us not just to raw materials, products, and services, but also to people with relevant skills and experiences. Performance fabrics5 are particularly helpful in establishing appropriate connections. The mobile Internet is dramatically extending our ability to connect wherever we are. • Innovate. Pull platforms provide much more flexible environments for participants to innovate with the resources made available to them.

pages: 713 words: 93,944

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Eric Redmond, Jim Wilson, Jim R. Wilson


Amazon Web Services, create, read, update, delete, data is the new oil, database schema, Debian, domain-specific language, fault tolerance, full text search, general-purpose programming language, linked data, MVC pattern, natural language processing, node package manager, random walk, recommendation engine, Skype, social graph, web application

Neo4j, as our open source example, is growing in popularity for many social network applications. Unlike other database styles that group collections of like objects into common buckets, graph databases are more free-form: queries consist of following the edges shared by two nodes, that is, traversing nodes. As more projects use them, graph databases are growing beyond the straightforward social examples to occupy more nuanced use cases, such as recommendation engines, access control lists, and geographic data. Good For: Graph databases seem to be tailor-made for networking applications. The prototypical example is a social network, where nodes represent users who have various kinds of relationships to each other. Modeling this kind of data using any of the other styles is often a tough fit, but a graph database would accept it with relish. They are also perfect matches for an object-oriented system.
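The traversal idea behind a graph-based recommendation engine can be sketched without any database at all. The example below uses plain Python dictionaries as a stand-in for a graph store; it does not use Neo4j or its query language, and the edge sets (`follows`, `likes`), the `recommend` function, and the sample users are hypothetical.

```python
from collections import Counter

# Hypothetical toy graph: "follows" edges between users, "likes" edges to items.
follows = {
    "ann": {"ben", "cat"},
    "ben": {"cat"},
    "cat": set(),
}
likes = {
    "ann": {"jazz"},
    "ben": {"jazz", "blues"},
    "cat": {"blues", "folk"},
}


def recommend(user):
    """Walk user -> followed user -> liked item, and rank items the user lacks."""
    already_liked = likes.get(user, set())
    scores = Counter()
    for friend in follows.get(user, set()):
        for item in likes.get(friend, set()) - already_liked:
            scores[item] += 1  # one vote per path that reaches the item
    # Most-reached items first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))


print(recommend("ann"))
```

Each recommendation corresponds to a two-hop path through the graph, which is the kind of edge-following query the excerpt describes; a graph database would express the same walk declaratively and run it over far larger data.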

pages: 339 words: 88,732

The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson, Andrew McAfee


2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, access to a mobile phone, additive manufacturing, Airbnb, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, American Society of Civil Engineers: Report Card, Any sufficiently advanced technology is indistinguishable from magic, autonomous vehicles, barriers to entry, Baxter: Rethink Robotics, British Empire, business intelligence, business process, call centre, clean water, combinatorial explosion, computer age, computer vision, congestion charging, corporate governance, crowdsourcing, David Ricardo: comparative advantage, employer provided health coverage, Erik Brynjolfsson, factory automation, falling living standards, Filter Bubble, first square of the chessboard / second half of the chessboard, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, full employment, game design, global village, happiness index / gross national happiness, illegal immigration, immigration reform, income inequality, income per capita, indoor plumbing, industrial robot, informal economy, inventory management, James Watt: steam engine, Jeff Bezos, jimmy wales, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, Khan Academy, knowledge worker, Kodak vs Instagram, law of one price, low skilled workers, Lyft, Mahatma Gandhi, manufacturing employment, Mark Zuckerberg, Mars Rover, means of production, Narrative Science, Nate Silver, natural language processing, Network effects, new economy, New Urbanism, Nicholas Carr, Occupy movement, oil shale / tar sands, oil shock, pattern recognition, payday loans, price stability, Productivity paradox, profit maximization, Ralph Nader, Ray Kurzweil, recommendation engine, Report Card for America’s Infrastructure, Robert Gordon, Rodney Brooks, Ronald Reagan, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Simon Kuznets, six sigma, Skype, software patent, sovereign wealth fund, speech recognition, statistical model, Steve Jobs, Steven Pinker, Stuxnet, supply-chain management, TaskRabbit, technological singularity, telepresence, The Bell Curve by Richard Herrnstein and Charles Murray, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, total factor productivity, transaction costs, Tyler Cowen: Great Stagnation, Vernor Vinge, Watson beat the top human players on Jeopardy!, winner-take-all economy, Y2K

When there are many small local markets, there can be a ‘best’ provider in each, and these local heroes frequently can all earn a good income. If these markets merge into a single global market, top performers have an opportunity to win more customers, while the next-best performers face harsher competition from all directions. A similar dynamic comes into play when technologies like Google or even Amazon’s recommendation engine reduce search costs. Suddenly second-rate producers can no longer count on consumer ignorance or geographic barriers to protect their margins. Digital technologies have aided the transition to winner-take-all markets, even for products we wouldn’t think would have superstar status. In a traditional camera store, cameras typically are not ranked number one versus number ten. But online retailers make it easy to list products in rank order by customer ratings, or to filter results to include only products with every conceivable desirable feature.

pages: 323 words: 95,939

Present Shock: When Everything Happens Now by Douglas Rushkoff


algorithmic trading, Andrew Keen, bank run, Benoit Mandelbrot, big-box store, Black Swan, British Empire, Buckminster Fuller, cashless society, citizen journalism, clockwork universe, cognitive dissonance, Credit Default Swap, crowdsourcing, Danny Hillis, disintermediation, Donald Trump, double helix, East Village, Elliott wave, European colonialism, Extropian, facts on the ground, Flash crash, game design, global supply chain, global village, Howard Rheingold, hypertext link, Inbox Zero, invention of agriculture, invention of hypertext, invisible hand, iterative process, John Nash: game theory, Kevin Kelly, laissez-faire capitalism, Law of Accelerating Returns, loss aversion, mandelbrot fractal, Marshall McLuhan, Merlin Mann, Milgram experiment, mutually assured destruction, Network effects, New Urbanism, Nicholas Carr, Norbert Wiener, Occupy movement, passive investing, pattern recognition, peak oil, price mechanism, prisoner's dilemma, Ralph Nelson Elliott, RAND corporation, Ray Kurzweil, recommendation engine, Silicon Valley, Skype, social graph, South Sea Bubble, Steve Jobs, Steve Wozniak, Steven Pinker, Stewart Brand, supply-chain management, the medium is the message, The Wisdom of Crowds, theory of mind, Turing test, upwardly mobile, Whole Earth Catalog, WikiLeaks, Y2K

Today’s most vocal critic of this trend, The Cult of the Amateur author Andrew Keen, explains, “According to a June 2006 study by the Pew Internet and American Life Project, 34 percent of the 12 million bloggers in America consider their online ‘work’ to be a form of journalism. That adds up to millions of unskilled, untrained, unpaid, unknown ‘journalists’—a thousandfold growth between 1996 and 2006—spewing their (mis)information out in the cyberworld.” More sanguine voices, such as City University of New York journalism professor and BuzzMachine blogger Jeff Jarvis, argue that the market—amplified by search results and recommendation engines—will eventually allow the better journalism to rise to the top of the pile. But even market mechanisms may have a hard time functioning as we consumers of all this media lose our ability to distinguish between facts, informed opinions, and wild assertions. Our impatient disgust with politics as usual combined with our newfound faith in our own gut sensibilities drives us to take matters into our own hands—in journalism and beyond.

pages: 364 words: 99,897

The Industries of the Future by Alec Ross


23andMe, 3D printing, Airbnb, algorithmic trading, AltaVista, Anne Wojcicki, autonomous vehicles, banking crisis, barriers to entry, Bernie Madoff, bioinformatics, bitcoin, blockchain, Brian Krebs, British Empire, business intelligence, call centre, carbon footprint, cloud computing, collaborative consumption, connected car, corporate governance, Credit Default Swap, cryptocurrency, David Brooks, disintermediation, Dissolution of the Soviet Union, distributed ledger, Edward Glaeser, Edward Snowden, Erik Brynjolfsson, fiat currency, future of work, global supply chain, Google X / Alphabet X, industrial robot, Internet of things, invention of the printing press, Jaron Lanier, Jeff Bezos, job automation, knowledge economy, knowledge worker, litecoin, M-Pesa, Mark Zuckerberg, Mikhail Gorbachev, mobile money, money: store of value / unit of account / medium of exchange, new economy, offshore financial centre, open economy, peer-to-peer lending, personalized medicine, Peter Thiel, precision agriculture, pre–internet, RAND corporation, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, Satoshi Nakamoto, self-driving car, sharing economy, Silicon Valley, Silicon Valley startup, Skype, smart cities, social graph, software as a service, special economic zone, supply-chain management, supply-chain management software, technoutopianism, underbanked, Vernor Vinge, Watson beat the top human players on Jeopardy!, women in the workforce, Y Combinator, young professional

Academics have likened it to both a microscope and telescope—a tool that allows us to both examine smaller details than could previously be observed and to see data at a larger scale, revealing correlations that were previously too distant for us to notice. The story of big data’s real-world impact to this point has been largely about logistics and persuasion. It has been great for supply chains, elections, and advertising because these tend to be fields with lots of small, repeated, and quantifiable actions—hence the “recommendation engines” used by Amazon and Netflix that help make more precise recommendations to customers. But these fields are just the beginning, and by the time my kids enter the workforce, big data won’t be a buzz phrase any longer. It will have permeated parts of our lives that we do not think of today as being rooted in analytics. It will change what we eat, how we speak, and where we draw the line between our public and private personas.

pages: 421 words: 110,406

Platform Revolution: How Networked Markets Are Transforming the Economy--And How to Make Them Work for You by Sangeet Paul Choudary, Marshall W. van Alstyne, Geoffrey G. Parker


3D printing, Affordable Care Act / Obamacare, Airbnb, Amazon Mechanical Turk, Amazon Web Services, Andrei Shleifer, Apple's 1984 Super Bowl advert, autonomous vehicles, barriers to entry, big data - Walmart - Pop Tarts, bitcoin, blockchain, business process, buy low sell high, chief data officer, clean water, cloud computing, connected car, corporate governance, crowdsourcing, data acquisition, data is the new oil, discounted cash flows, disintermediation, Edward Glaeser, Elon Musk, Erik Brynjolfsson, financial innovation, Haber-Bosch Process, High speed trading, Internet of things, inventory management, invisible hand, Jean Tirole, Jeff Bezos, jimmy wales, Khan Academy, Kickstarter, Lean Startup, Lyft, market design, multi-sided market, Network effects, new economy, payday loans, peer-to-peer lending, Peter Thiel, pre–internet, price mechanism, recommendation engine, RFID, Richard Stallman, ride hailing / ride sharing, Ronald Coase, Satoshi Nakamoto, self-driving car, shareholder value, sharing economy, side project, Silicon Valley, Skype, smart contracts, smart grid, Snapchat, software is eating the world, Steve Jobs, TaskRabbit, The Chicago School, the payments system, Tim Cook: Apple, transaction costs, two-sided market, Uber and Lyft, Uber for X, winner-take-all economy, Zipcar

Even more unsettling are some of the less obvious ways in which personal data are used. Many firms—both platform businesses and others—track consumers’ web usage, financial interactions, magazine subscriptions, political and charitable contributions, and much more to create highly detailed individual profiles. In the aggregate, such data can be used for cross-marketing to people who share profiles, as when a recommendation engine on a shopping site tells you, “People like you who bought product A often enjoy product B, too!” The anonymity of this process renders it unobjectionable to most people. But the same underlying data can be, and is, sold to prospective employers, government agencies, health care providers, and marketers of all kinds. Individually identifiable data about sensitive topics such as sexual orientation, prescription drug use, alcoholism, and personal travel (tracked through cell phone location data) can be purchased through data broker firms such as Acxiom.32 Consumer concern over the practices of the data broker industry has led to a number of investigations, including a major FTC inquiry that resulted in a report titled “Data Brokers: A Call for Transparency and Accountability.”33 But very little has actually changed to prevent practices that many find objectionable.34 Skeptics say that, in reality, citizen concerns about data privacy are superficial.

pages: 326 words: 103,170

The Seventh Sense: Power, Fortune, and Survival in the Age of Networks by Joshua Cooper Ramo


Airbnb, Albert Einstein, algorithmic trading, barriers to entry, Berlin Wall, bitcoin, British Empire, cloud computing, crowdsourcing, Danny Hillis, defense in depth, Deng Xiaoping, Edward Snowden, Fall of the Berlin Wall, Firefox, Google Chrome, income inequality, Isaac Newton, Jeff Bezos, job automation, market bubble, Menlo Park, natural language processing, Network effects, Norbert Wiener, Oculus Rift, packet switching, Paul Graham, price stability, quantitative easing, RAND corporation, recommendation engine, Republic of Letters, Richard Feynman, road to serfdom, Sand Hill Road, secular stagnation, self-driving car, Silicon Valley, Skype, Snapchat, social web, sovereign wealth fund, Steve Jobs, Steve Wozniak, Stewart Brand, Stuxnet, superintelligent machines, technological singularity, The Coming Technological Singularity, The Wealth of Nations by Adam Smith, too big to fail, Vernor Vinge, zero day

And then the machine would spit back some films you might enjoy. The Paul Newman classic Cool Hand Luke, for instance. And, well, you had liked that film. This seemed magic, just the sort of data-meets-human question that showcased a machine learning and thinking. An honestly artificial intelligence. Maes hoped to design a computer that could predict what movies or music or books you or I might enjoy. (And, of course, buy.) A recommendation engine. We all know how sputtering our own suggestion motors can be. Think of that primitive analog exchange known as the First Date: Oh, you like Radiohead? Do you know Sigur Rós? Pause. Hate them. Can you really predict what albums or novels even your closest friend will enjoy? You might offer an occasional lucky suggestion. But to confidently bridge your knowledge of a friend’s taste and the nearly endless library of movies and songs and books?

pages: 540 words: 103,101

Building Microservices by Sam Newman


airport security, Amazon Web Services, anti-pattern, business process, call centre, continuous integration, create, read, update, delete, defense in depth, Edward Snowden, fault tolerance, index card, information retrieval, Infrastructure as a Service, inventory management, job automation, load shedding, loose coupling, platform as a service, premature optimization, pull request, recommendation engine, social graph, software as a service, the built environment, web application, WebSocket, x509 certificate

Then we want to try to understand what bounded contexts the monolith maps to. Let’s imagine that initially we identify four contexts we think our monolithic backend covers:
Catalog: Everything to do with metadata about the items we offer for sale
Finance: Reporting for accounts, payments, refunds, etc.
Warehouse: Dispatching and returning of customer orders, managing inventory levels, etc.
Recommendation: Our patent-pending, revolutionary recommendation system, which is highly complex code written by a team with more PhDs than the average science lab
The first thing to do is to create packages representing these contexts, and then move the existing code into them. With modern IDEs, code movement can be done automatically via refactorings, and can be done incrementally while we are doing other things. You’ll still need tests to catch any breakages made by moving code, however, especially if you’re using a dynamically typed language where the IDEs have a harder time of performing refactoring.

Security MusicCorp has had a security audit, and has decided to tighten up its protection of sensitive information. Currently, all of this is handled by the finance-related code. If we split this service out, we can provide additional protections to this individual service in terms of monitoring, protection of data at transit, and protection of data at rest — ideas we’ll look at in more detail in Chapter 9. Technology The team looking after our recommendation system has been spiking out some new algorithms using a logic programming library in the language Clojure. The team thinks this could benefit our customers by improving what we offer them. If we could split out the recommendation code into a separate service, it would be easy to consider building an alternative implementation that we could test against. Tangled Dependencies The other point to consider when you’ve identified a couple of seams to separate is how entangled that code is with the rest of the system.
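The idea of splitting the recommendation code out so that an alternative implementation can be tested against the existing one can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Recommender` interface, both classes, the album names, and the `pick_recommender` routing helper are hypothetical and do not come from the book.

```python
import hashlib
from typing import Protocol


class Recommender(Protocol):
    """Hypothetical seam: anything that can recommend items for a customer."""

    def recommend(self, customer_id: str) -> list[str]: ...


class ExistingRecommender:
    """Stand-in for the recommendation code currently inside the monolith."""

    def recommend(self, customer_id: str) -> list[str]:
        return ["album-a", "album-b"]


class ExperimentalRecommender:
    """Stand-in for an alternative implementation running as its own service."""

    def recommend(self, customer_id: str) -> list[str]:
        return ["album-c", "album-a"]


def pick_recommender(customer_id: str, existing: Recommender,
                     experimental: Recommender,
                     experiment_share: float = 0.1) -> Recommender:
    """Route a stable fraction of customers to the experimental implementation.

    The customer id is hashed so the same customer always lands in the same
    bucket, which keeps the comparison between implementations consistent.
    """
    bucket = hashlib.sha256(customer_id.encode("utf-8")).digest()[0] / 256
    return experimental if bucket < experiment_share else existing


existing = ExistingRecommender()
experimental = ExperimentalRecommender()
chosen = pick_recommender("customer-42", existing, experimental)
print(chosen.recommend("customer-42"))
```

Because callers depend only on the `recommend` method, either implementation can be swapped in without touching the rest of the system, which is the property the seam is meant to provide.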

The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin


business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, discrete time, George Gilder, Google Earth, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, late capitalism, linked data, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

Discovering correlations between certain items led to new product placements and alterations to shelf space management and a 16 per cent increase in revenue per shopping cart in the first month’s trial. There was no hypothesis that Product A was often bought with Product H that was then tested. The data were simply queried to discover what relationships existed that might have previously been unnoticed. Similarly, Amazon’s recommendation system produces suggestions for other items a shopper might be interested in without knowing anything about the culture and conventions of books and reading; it simply identifies patterns of purchasing across customers in order to determine whether, if Person A likes Book X, they are also likely to like Book Y given their own and others’ consumption patterns. Dyche’s contention is that this open, rather than directed, approach to discovery is more likely to reveal unknown, underlying patterns with respect to customer behaviours, product affinities, and financial risks, that can then be exploited.
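The pattern-based approach described above, finding items that tend to be bought by the same customers without any prior hypothesis, can be illustrated with a minimal co-purchase count. This is a sketch under stated assumptions, not Amazon's actual method: the function names and the purchase histories are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same customer's history."""
    counts = defaultdict(Counter)
    for items in baskets.values():
        # Deduplicate so a repeat purchase by one customer counts once per pair.
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_bought(item, baskets, top_n=3):
    """Return the items most often bought by customers who also bought `item`."""
    counts = co_purchase_counts(baskets)
    return [other for other, _ in counts[item].most_common(top_n)]


# Hypothetical purchase histories, keyed by customer.
baskets = {
    "ann": ["Book X", "Book Y", "Book Z"],
    "bob": ["Book X", "Book Y"],
    "cat": ["Book X", "Book W"],
}

print(also_bought("Book X", baskets, top_n=1))
```

Nothing in the code knows what a book is; it only sees which items co-occur across customers, which is the point the passage makes.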

In fact, both deductive and inductive reasoning are always discursively framed and do not arise out of nowhere. Popper (1979, cited in Callebaut 2012: 74) thus suggests that all science adopts a searchlight approach to scientific discovery, with the focus of light guided by previous findings, theories and training; by speculation that is grounded in experience and knowledge. The same is true for Amazon, Hunch, Ayasdi, and Google. How Amazon constructed its recommendation system was based on scientific reasoning, underpinned by a guiding model and accompanied by empirical testing designed to improve the performance of the algorithms it uses. Likewise, Google undertakes extensive research and development, it works in partnership with scientists and it buys scientific knowledge, either funding research within universities or by buying the IP of other companies, to refine and extend the utility of how it organises, presents and extracts value from data.

pages: 493 words: 139,845

Women Leaders at Work: Untold Tales of Women Achieving Their Ambitions by Elizabeth Ghaffari


Albert Einstein, AltaVista, business process, cloud computing, Columbine, corporate governance, corporate social responsibility, dark matter, family office, Fellow of the Royal Society, financial independence, follow your passion, glass ceiling, Grace Hopper, high net worth, knowledge worker, Long Term Capital Management, performance metric, pink-collar, profit maximization, profit motive, recommendation engine, Ronald Reagan, shareholder value, Silicon Valley, Silicon Valley startup, Steve Ballmer, Steve Jobs, thinkpad, trickle-down economics, urban planning, women in the workforce, young professional

You begin to see both similarities and huge cultural differences. Kate just came back from rural India, studying the ways people there use technology. These are intriguing issues. I find it especially interesting to bring such people together with more mathematical people like me. I have worked on models of social networks and recommendation systems that exist in social networks. When I talk to danah, I'm trying to understand what people are seeking through recommendation systems. When you merge qualitative and quantitative skill sets, it takes a while for each to adapt to the other because there are language barriers and differences in what we're trying to achieve. When we finally do achieve something jointly, I find that it's usually very good and very deep.
3 The lower case spelling of danah boyd is “how she chooses to identify” herself.

pages: 170 words: 51,205

Information Doesn't Want to Be Free: Laws for the Internet Age by Cory Doctorow, Amanda Palmer, Neil Gaiman


Airbnb, barriers to entry, Brewster Kahle, cloud computing, Dean Kamen, Edward Snowden, game design, Internet Archive, John von Neumann, Kickstarter, optical character recognition, Plutocrats, pre–internet, profit maximization, recommendation engine, rent-seeking, Saturday Night Live, Skype, Steve Jobs, Steve Wozniak, Stewart Brand, transfer pricing, Whole Earth Catalog, winner-take-all economy

But all these sectors are in sharp decline, and in many cases the most significant channel for creative work is now the Internet. Customers don’t necessarily deliver themselves to “stores”—virtual or physical—and when they do, the titles on offer are rarely the neatly curated, finite, and browsable selections that once dominated. The shelves, instead, are nearly infinite. Browsing has been augmented by search algorithms and automated recommendation systems. And the number of ways for customers to discover new work has exploded. Word of mouth has always been a creator’s best friend. Recommendations from personally trusted sources were a surefire way to sell products. When I worked in a bookstore, one of the most reliable indicators of an imminent sale was two friends entering the store together, and one of them picking up a book and handing it to the other with the words “Oh, you’ve got to read this; you’ll love it.”

pages: 606 words: 157,120

To Save Everything, Click Here: The Folly of Technological Solutionism by Evgeny Morozov


3D printing, algorithmic trading, Amazon Mechanical Turk, Andrew Keen, augmented reality, Automated Insights, Berlin Wall, big data - Walmart - Pop Tarts, Buckminster Fuller, call centre, carbon footprint, Cass Sunstein, choice architecture, citizen journalism, cloud computing, cognitive bias, crowdsourcing, data acquisition, Dava Sobel, disintermediation, East Village, Fall of the Berlin Wall, Filter Bubble, Firefox, Francis Fukuyama: the end of history, frictionless, future of journalism, game design, Gary Taubes, Google Glasses, illegal immigration, income inequality, invention of the printing press, Jane Jacobs, Jean Tirole, Jeff Bezos, jimmy wales, Julian Assange, Kevin Kelly, Kickstarter, license plate recognition, lone genius, Louis Pasteur, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Narrative Science, Nicholas Carr, packet switching, PageRank, Paul Graham, Peter Singer: altruism, Peter Thiel, placebo effect, pre–internet, Ray Kurzweil, recommendation engine, Richard Thaler, Ronald Coase, Rosa Parks, self-driving car, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, Skype, Slavoj Žižek, smart meter, social graph, social web, stakhanovite, Steve Jobs, Steven Levy, Stuxnet, technoutopianism, the built environment, The Chicago School, The Death and Life of Great American Cities, the medium is the message, The Nature of the Firm, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas L Friedman, transaction costs, urban decay, urban planning, urban sprawl, Vannevar Bush, WikiLeaks

., do you think the government has a role to play in education?). then calculates your “political DNA” in order to match you with similar users and encourage you to join relevant “rucks” (according to the site, “the word comes from rugby, where players form a ruck when they loosely come together to fight the other team for possession of the ball.”). is like Netflix for politics, with its cause-recommendation engine essentially encouraging you to, say, check out a campaign to ban abortion if you have expressed strong opposition to gun control, much in the way that Netflix would recommend that you check out Rambo if you liked Rocky. Once in a “ruck,” members can simply follow news posted by other members or be more proactive and share information themselves: links to relevant petitions, organizations, and events are particularly encouraged.

pages: 382 words: 120,064

Bank 3.0: Why Banking Is No Longer Somewhere You Go but Something You Do by Brett King


3D printing, additive manufacturing, Albert Einstein, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, asset-backed security, augmented reality, barriers to entry, bitcoin, bounce rate, business intelligence, business process, business process outsourcing, call centre, capital controls, citizen journalism, Clayton Christensen, cloud computing, credit crunch, crowdsourcing, disintermediation, George Gilder, Google Glasses, high net worth, I think there is a world market for maybe five computers, Infrastructure as a Service, invention of the printing press, Jeff Bezos, jimmy wales, London Interbank Offered Rate, M-Pesa, Mark Zuckerberg, mass affluent, microcredit, mobile money, more computing power than Apollo, Northern Rock, Occupy movement, optical character recognition, performance metric, platform as a service, QWERTY keyboard, Ray Kurzweil, recommendation engine, RFID, risk tolerance, self-driving car, Skype, speech recognition, stem cell, telepresence, Tim Cook: Apple, transaction costs, underbanked, web application

In Siri’s patent application, various possibilities are hinted at, including being a voice agent providing assistance for “automated teller machines”.4 In fact, SRI (the creator of Siri™) and BBVA recently announced a collaboration to introduce Lola5, a Siri-like technology, to customers through the Internet and via voice. Siri’s near-term capabilities include: 1. Being able to make simple online purchases, such as “Purchase Bank 3.0 from Amazon Kindle” 2. Serving as a recommendation engine or intelligent automated assistant—an “agent avatar”, as it has sometimes been labelled However, there are some challenges in having customers talk into their phones for customer support, or replacing an IVR system with technologies such as Lola, as a recent New York Times article pointed out when it called Siri “the latest public nuisance in the cell phone revolution”. It outlined several scenarios of people using Siri in less than desirable situations (e.g. public transportation) for things as mundane as sending an SMS message wishing a friend a happy birthday.

pages: 527 words: 147,690

Terms of Service: Social Media and the Price of Constant Connection by Jacob Silverman


23andMe, 4chan, A Declaration of the Independence of Cyberspace, Airbnb, airport security, Amazon Mechanical Turk, augmented reality, Brian Krebs, California gold rush, call centre, cloud computing, cognitive dissonance, correlation does not imply causation, Credit Default Swap, crowdsourcing, don't be evil, Edward Snowden, feminist movement, Filter Bubble, Firefox, Flash crash, game design, global village, Google Chrome, Google Glasses, hive mind, income inequality, informal economy, information retrieval, Internet of things, Jaron Lanier, jimmy wales, Kevin Kelly, Kickstarter, knowledge economy, knowledge worker, late capitalism, license plate recognition, life extension, Lyft, Mark Zuckerberg, Mars Rover, Marshall McLuhan, meta analysis, meta-analysis, Minecraft, move fast and break things, national security letter, Network effects, new economy, Nicholas Carr, Occupy movement, optical character recognition, payday loans, Peter Thiel, postindustrial economy, prediction markets, pre–internet, price discrimination, price stability, profit motive, quantitative hedge fund, race to the bottom, Ray Kurzweil, recommendation engine, rent control, RFID, ride hailing / ride sharing, self-driving car, sentiment analysis, shareholder value, sharing economy, Silicon Valley, Silicon Valley ideology, Snapchat, social graph, social web, sorting algorithm, Steve Ballmer, Steve Jobs, Steven Levy, TaskRabbit, technoutopianism, telemarketer, transportation-network company, Turing test, Uber and Lyft, Uber for X, universal basic income, unpaid internship, women in the workforce, Y Combinator, Zipcar

Negative reviews proliferate as acts of revenge against scorned rivals or as ways to push one’s own rating ahead of a competitor. Even so, companies remain extraordinarily reliant on these reviews. A 2011 Harvard Business School study found that, on Yelp, “an extra star is worth an extra 5 to 9 percent in revenue.” The result of all this reviewing has been the atrophying of the critical culture, with professional critics seen as dispensable, nothing more than recommendation engines who can be replaced with algorithms and free, crowdsourced reviews. (Even so, some prominent cultural critics remain, though with less influence than they used to hold, and a smattering of publications, from the actuarially precise Consumer Reports to the liberal humanist New York Review of Books, continue to thrive.) It’s also expanded the idea of what should be reviewed, with everything now potentially susceptible to, if not a star rating, then the kind of up-or-down judgment we perform all the time when we choose to like things.

pages: 236 words: 77,098

I Live in the Future & Here's How It Works: Why Your World, Work, and Brain Are Being Creatively Disrupted by Nick Bilton


3D printing, 4chan, Albert Einstein, augmented reality, barriers to entry, book scanning, Cass Sunstein, death of newspapers, Internet of things, John Gruber, Marshall McLuhan, Nicholas Carr, recommendation engine, RFID, Saturday Night Live, Steve Jobs, Steven Pinker, Stewart Brand

In short, I base my choice on the overall experience and what I want at that particular time. Here are three different ways people, especially young ones, may evaluate whether something is worth purchasing. Bad = Free My friend Mike loves music. In fact, Mike is a music fanatic. In every spare moment he has, Mike scours the Web and his social networks, searching for new music to listen to and potentially purchase. Like most of his friends, Mike uses his recommendation systems and social networks to find the music he’s interested in. He’ll preview a few songs, and if he decides the content is good, he’ll follow through with a purchase. He rarely buys entire albums because he believes most albums contain only one or two good songs. Mike also follows a handful of bands and immediately buys their entire albums on release day. But Mike steals music, too. He doesn’t steal music because he can’t afford it or to take a stand against media moguls and corporations, and he definitely doesn’t do it for the thrill.

pages: 204 words: 67,922

Elsewhere, U.S.A: How We Got From the Company Man, Family Dinners, and the Affluent Society to the Home Office, BlackBerry Moms, and Economic Anxiety by Dalton Conley


3D printing, call centre, clean water, dematerialisation, demographic transition, Edward Glaeser, extreme commuting, feminist movement, financial independence, Firefox, Frank Levy and Richard Murnane: The New Division of Labor, Home mortgage interest deduction, income inequality, informal economy, Jane Jacobs, John Maynard Keynes: Economic Possibilities for our Grandchildren, knowledge economy, knowledge worker, labor-force participation, late capitalism, low skilled workers, manufacturing employment, McMansion, mortgage tax deduction, new economy, oil shock, PageRank, Ponzi scheme, positional goods, post-industrial society, Post-materialism, principal–agent problem, recommendation engine, Richard Florida, rolodex, Ronald Reagan, Silicon Valley, Skype, statistical model, The Death and Life of Great American Cities, The Great Moderation, The Wealth of Nations by Adam Smith, Thomas Malthus, Thorstein Veblen, transaction costs, women in the workforce, Yom Kippur War

Not only would my local video store not have been able to afford the shelf space to stock Ring of Bright Water, but the issue more germane to the present discussion is that I would have never even known to ask for it. In fact, short of some chance encounter of a recommendation at a dinner party, I would have never even known that this 1969 British film existed. The fact that I now know it exists can be attributed to the network basis of the Netflix recommendation system. The connected economy, then, does not merely facilitate sameness and the diffusion of hits. It can encourage niche consumption (as Chris Anderson celebrates in The Long Tail). But as wonderful as it is to have a computer recommend a sleeper film that even the slacker clerks at my neighborhood video store wouldn’t be able to name, there is a subtle cost to this form of knowledge diffusion.

pages: 411 words: 80,925

What's Mine Is Yours: How Collaborative Consumption Is Changing the Way We Live by Rachel Botsman, Roo Rogers


Airbnb, barriers to entry, Bernie Madoff, bike sharing scheme, Buckminster Fuller, carbon footprint, Cass Sunstein, collaborative consumption, collaborative economy, Community Supported Agriculture, credit crunch, crowdsourcing, dematerialisation, disintermediation, experimental economics, George Akerlof, global village, Hugh Fearnley-Whittingstall, information retrieval, iterative process, Kevin Kelly, Kickstarter, late fees, Mark Zuckerberg, market design, Menlo Park, Network effects, new economy, new new economy, out of africa, Parkinson's law, peer-to-peer lending, Ponzi scheme, pre–internet, recommendation engine, RFID, Richard Stallman, ride hailing / ride sharing, Robert Shiller, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, Simon Kuznets, Skype, slashdot, smart grid, South of Market, San Francisco, Stewart Brand, The Nature of the Firm, The Spirit Level, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thorstein Veblen, Torches of Freedom, transaction costs, traveling salesman, ultimatum game, Victor Gruen, web of trust, women in the workforce, Zipcar

Collective Wisdom of Members At the same time, Netflix has built a sophisticated platform to foster a community among members, and to tailor recommendations to individual tastes. Talk to anyone who has ever used Netflix and they will tell you about how they “discovered releases,” “learned about classics,” and “found rare gems” they never would have found on their own at a store. Approximately 60 percent of members base their selections on Netflix’s Cinematch recommendations system. Early on, people’s willingness to share and rate the films they had watched and to make suggestions to “friends” surprised the founders. The user community itself adopted the ethos of “Millions of members helping you.” Impressively, there are now more than 2 billion ratings from members, and the average member has evaluated approximately two hundred movies. The result is an invaluable collective wisdom impossible to replicate elsewhere.

pages: 247 words: 71,698

Avogadro Corp by William Hertling


Any sufficiently advanced technology is indistinguishable from magic, cloud computing, crowdsourcing, Hacker Ethic, hive mind, invisible hand, natural language processing, Netflix Prize, private military company, Ray Kurzweil, recommendation engine, Richard Stallman, technological singularity, Turing test, web application

If there was one thing that drove Mike crazy about David, it was his tendency to become uncommunicative exactly when the stakes were highest. Another minute passed, and Mike started to mentally squirm. “I wish I could find something,” he finally said, “but I don’t know what. There’s this brilliant self-taught Serbian kid who is doing some stuff with artificial intelligence algorithms, and he’s doing it all on his home PC. I’ve been reading his blog, and it sounds like he has some really novel approaches to recommendation systems. But I don’t see any way we could duplicate what he’s doing before the end of the week.” Mike was really grasping at straws. Thin straws at that. He hated to bring bad news to David. “Maybe we can turn down the accuracy of the system. If we use fewer language-goal clusters, we can run with less memory and fewer processor cycles. Maybe...” “No, don’t do that.” David’s soft voice floated up out of the dim light, startling Mike.

pages: 270 words: 64,235

Effective Programming: More Than Writing Code by Jeff Atwood


AltaVista, Amazon Web Services, barriers to entry, cloud computing, endowment effect, Firefox, future of work, game design, Google Chrome, gravity well, job satisfaction, Khan Academy, Kickstarter, loss aversion, Mark Zuckerberg, Merlin Mann, Minecraft, Paul Buchheit, Paul Graham, price anchoring, race to the bottom, recommendation engine, science of happiness, Skype, social software, Steve Jobs, web application, Y Combinator

AWS is, of course, the preeminent provider of so-called “cloud computing,” so this can essentially be read as key advice for any website considering a move to the cloud. And it’s great advice, too. Here’s the one bit that struck me as most essential: We’ve sometimes referred to the Netflix software architecture in AWS as our Rambo Architecture. Each system has to be able to succeed, no matter what, even all on its own. We’re designing each distributed system to expect and tolerate failure from other systems on which it depends. If our recommendations system is down, we degrade the quality of our responses to our customers, but we still respond. We’ll show popular titles instead of personalized picks. If our search system is intolerably slow, streaming should still work perfectly fine. One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture.
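The degrade-but-still-respond behavior quoted above (show popular titles when personalized picks are unavailable) can be sketched as a simple fallback wrapper. This is a minimal illustration, not Netflix's code: `recommend_with_fallback`, `broken_personalizer`, and the title list are hypothetical names invented here.

```python
def recommend_with_fallback(user_id, personalized, popular_titles, limit=3):
    """Return personalized picks, falling back to popular titles on failure.

    `personalized` is any callable taking a user id and returning a list of
    titles; it stands in for a call to a separate recommendations system.
    """
    try:
        picks = personalized(user_id)
    except Exception:
        # Broad catch is deliberate in this sketch: any failure in the
        # dependency should degrade the response rather than break it.
        picks = []
    if not picks:
        picks = popular_titles
    return list(picks)[:limit]


def broken_personalizer(user_id):
    """Simulates the recommendations system being down."""
    raise ConnectionError("recommendations system unavailable")


# Hypothetical fallback list of popular titles.
popular = ["Title A", "Title B", "Title C", "Title D"]

print(recommend_with_fallback("user-1", broken_personalizer, popular))
```

The caller always gets a usable list, so a failure in one system lowers the quality of the response instead of preventing it.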

Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport


Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, data acquisition, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, Mark Zuckerberg, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, self-driving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining

LinkedIn also employs big data for internal processes, including sales and marketing campaigns. For instance, LinkedIn has used some of its own internal data to predict which companies will buy LinkedIn products, and even who in those firms has the highest likelihood of buying. This work led to an internal recommendation system for salespeople that makes it much easier for them to get the data in one place, and has improved conversion rates by several hundred percent. LinkedIn’s cofounder, Reid Hoffman, is a strong advocate for big data: Because of Web 2.0 [the explosion of social networks and consumer participation in the web] and the increasing number of sensors, there’s all this data. With these massive amounts of highly semantically indexed data that’s indexed around people and places and all the things that matter to us and our lives, I believe there are going to be a ton of interesting apps that come out of that . . . the way our products and services are constituted, how we determine our strategy and maintain a competitive edge against other folks—if data is a very strong element of each of these, and you’re not doing anything, it’s like trying to run a business without business intelligence.

Writing Effective Use Cases by Alistair Cockburn


business process, create, read, update, delete, finite state, index card, information retrieval, iterative process, recommendation engine, Silicon Valley, web application

System presents a list of saved solutions for this Shopper
26c2. Shopper selects the solution they wish to recall
26c3. System recalls the selected solution.
26c4. Continue at step 26
26d. Shopper wants to finance products in the shopping cart with available Finance Plans:
26d1. Shopper chooses to finance products in the shopping cart
26d2. System will present a series of questions that are dependent on previous answers to determine finance plan recommendations. System interfaces with Finance System to obtain credit rating approval. Initiate Obtain Finance Rating.
26d3. Shopper will select a finance plan
26d4. System will present a series of questions based on previous answers to determine details of the selected finance plan.
26d5. Shopper will view financial plan details and chooses to go with the plan.
26d6. System will place the finance plan order with the Finance System initiate Place Finance order.

pages: 903 words: 235,753

The Stack: On Software and Sovereignty by Benjamin H. Bratton


1960s counterculture, 3D printing, 4chan, Ada Lovelace, additive manufacturing, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, Berlin Wall, bioinformatics, bitcoin, blockchain, Buckminster Fuller, Burning Man, call centre, carbon footprint, carbon-based life, Cass Sunstein, Celebration, Florida, charter city, clean water, cloud computing, connected car, corporate governance, crowdsourcing, cryptocurrency, dark matter, David Graeber, deglobalization, dematerialisation, disintermediation, distributed generation, don't be evil, Douglas Engelbart, Edward Snowden, Elon Musk, Eratosthenes, ethereum blockchain, facts on the ground, Flash crash, Frank Gehry, Frederick Winslow Taylor, future of work, Georg Cantor, gig economy, global supply chain, Google Earth, Google Glasses, Guggenheim Bilbao, High speed trading, Hyperloop, illegal immigration, industrial robot, information retrieval, intermodal, Internet of things, invisible hand, Jacob Appelbaum, Jaron Lanier, Jony Ive, Julian Assange, Khan Academy, linked data, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, McMansion, means of production, megacity, megastructure, Menlo Park, Minecraft, Monroe Doctrine, Network effects, new economy, offshore financial centre, oil shale / tar sands, packet switching, PageRank, pattern recognition, peak oil, performance metric, personalized medicine, Peter Thiel, phenotype, place-making, planetary scale, RAND corporation, recommendation engine, reserve currency, RFID, Sand Hill Road, self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley ideology, Slavoj Žižek, smart cities, smart grid, smart meter, social graph, software studies, South China Sea, sovereign wealth fund, special economic zone, spectrum auction, Startup school, statistical arbitrage, Steve Jobs, Steven Levy, Stewart Brand, Stuxnet, Superbowl ad, supply-chain management, supply-chain management software, TaskRabbit, the built environment, The Chicago School, the scientific method, Torches of Freedom, transaction costs, Turing complete, Turing machine, Turing test, universal basic income, urban planning, Vernor Vinge, Washington Consensus, web application, WikiLeaks, working poor, Y Combinator

In that the service is provided to a device-User that is in motion, moving through the City layer and encountering different contexts on the go, the App platform provides that provisional link between a preexisting physical spatial context and this User-directed overlay of a Cloud service onto immediate circumstances. As discussed in the City layer chapter, there is then a kind of programmatic blending between the urban situation through which a User moves and the interactions he may be having with a specific App and Cloud service. A mall becomes a game board, a sidewalk becomes a banking center, a restaurant becomes the scene of a crime in a crowd-sourced recommendation engine, birds are angry and enemies are identified, and the experience of these may be very different for different people and purposes. At any given moment, multiple Users interacting with different Apps in the same place may have brought their shared location into contrasting Cloud dramas; one may be ensconced in a first-person shooter game and the other in measuring his carbon footprint, further fragmenting any apparent solidarity of the crowd.

pages: 669 words: 210,153

Tools of Titans: The Tactics, Routines, and Habits of Billionaires, Icons, and World-Class Performers by Timothy Ferriss


Airbnb, artificial general intelligence, asset allocation, Atul Gawande, augmented reality, back-to-the-land, Bernie Madoff, Bertrand Russell: In Praise of Idleness, Black Swan, blue-collar work, Buckminster Fuller, business process, Cal Newport, call centre, Checklist Manifesto, cognitive bias, cognitive dissonance, Colonization of Mars, Columbine, correlation does not imply causation, David Brooks, David Graeber, diversification, diversified portfolio, Donald Trump, effective altruism, Elon Musk, fault tolerance, fear of failure, Firefox, follow your passion, future of work, Google X / Alphabet X, Howard Zinn, Hugh Fearnley-Whittingstall, Jeff Bezos, job satisfaction, Johann Wolfgang von Goethe, Kevin Kelly, Kickstarter, Lao Tzu, life extension, Mahatma Gandhi, Mark Zuckerberg, Mason jar, Menlo Park, Mikhail Gorbachev, Nicholas Carr, optical character recognition, PageRank, passive income, pattern recognition, Paul Graham, Peter H. Diamandis: Planetary Resources, Peter Singer: altruism, Peter Thiel, phenotype, post scarcity, premature optimization, QWERTY keyboard, Ralph Waldo Emerson, Ray Kurzweil, recommendation engine, rent-seeking, Richard Feynman, risk tolerance, Ronald Reagan, sharing economy, side project, Silicon Valley, skunkworks, Skype, Snapchat, social graph, software as a service, software is eating the world, stem cell, Stephen Hawking, Steve Jobs, Stewart Brand, superintelligent machines, Tesla Model S, The Wisdom of Crowds, Thomas L Friedman, Wall-E, Washington Consensus, Whole Earth Catalog, Y Combinator

Chris Anderson (my successor at Wired) named this effect “the Long Tail,” for the visually graphed shape of the sales distribution curve: a low, nearly interminable line of items selling only a few copies per year that form a long “tail” for the abrupt vertical beast of a few bestsellers. But the area of the tail was as big as the head. With that insight, the aggregators had great incentive to encourage audiences to click on the obscure items. They invented recommendation engines and other algorithms to channel attention to the rare creations in the long tail. Even web search companies like Google, Bing, and Baidu found it in their interests to reward searchers with the obscure because they could sell ads in the long tail as well. The result was that the most obscure became less obscure. If you live in any of the 2 million small towns on Earth, you might be the only one in your town to crave death metal music, or get turned on by whispering, or want a left-handed fishing reel.

pages: 476 words: 132,042

What Technology Wants by Kevin Kelly


Albert Einstein, Alfred Russel Wallace, Buckminster Fuller, carbon-based life, Cass Sunstein, charter city, Clayton Christensen, cloud computing, computer vision, Danny Hillis, dematerialisation, demographic transition, double entry bookkeeping, Exxon Valdez, George Gilder, gravity well, hive mind, Howard Rheingold, interchangeable parts, invention of air conditioning, invention of writing, Isaac Newton, Jaron Lanier, John Conway, John von Neumann, Kevin Kelly, knowledge economy, Lao Tzu, life extension, Louis Daguerre, Marshall McLuhan, megacity, meta analysis, meta-analysis, new economy, out of africa, performance metric, personalized medicine, phenotype, Picturephone, planetary scale, RAND corporation, random walk, Ray Kurzweil, recommendation engine, refrigerator car, Richard Florida, Silicon Valley, silicon-based life, Skype, speech recognition, Stephen Hawking, Steve Jobs, Stewart Brand, Ted Kaczynski, the built environment, the scientific method, Thomas Malthus, Vernor Vinge, Whole Earth Catalog, Y2K

It is true that too many choices may induce regret, but “no choice” is a far worse option. Civilization is a steady migration away from “no choice.” As always, the solution to the problems that technology brings, such as an overwhelming diversity of choices, is better technologies. The solution to ultradiversity will be choice-assist technologies. These better tools will aid humans in making choices among bewildering options. That is what search engines, recommendation systems, tagging, and a lot of social media are all about. Diversity, in fact, will produce tools to handle diversity. (Diversity-taming tools will be among the wildly diversity-making 821 million patents that current rates predict will have been filed in the U.S. Patent Office by 2060!) We are already discovering how to use computers to augment our choices with information and web pages (Google is one such tool), but it will take additional learning and technologies to do this with tangible stuff and idiosyncratic media.

pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies by Nick Bostrom


agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, anti-communist, artificial general intelligence, autonomous vehicles, barriers to entry, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, demographic transition, Douglas Hofstadter, Drosophila, Elon Musk, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, Gödel, Escher, Bach, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John von Neumann, knowledge worker, Menlo Park, meta analysis, meta-analysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Norbert Wiener, NP-complete, nuclear winter, optical character recognition, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, strong AI, superintelligent machines, supervolcano, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, transaction costs, Turing machine, Vernor Vinge, Watson beat the top human players on Jeopardy!, World Values Survey

Then the entire system was overthrown by the heliocentric theory of Copernicus, which was simpler and—though only after further elaboration by Kepler—more predictively accurate.63 Artificial intelligence methods are now used in more areas than it would make sense to review here, but mentioning a sampling of them will give an idea of the breadth of applications. Aside from the game AIs listed in Table 1, there are hearing aids with algorithms that filter out ambient noise; route-finders that display maps and offer navigation advice to drivers; recommender systems that suggest books and music albums based on a user’s previous purchases and ratings; and medical decision support systems that help doctors diagnose breast cancer, recommend treatment plans, and aid in the interpretation of electrocardiograms. There are robotic pets and cleaning robots, lawn-mowing robots, rescue robots, surgical robots, and over a million industrial robots.64 The world population of robots exceeds 10 million.65 Modern speech recognition, based on statistical techniques such as hidden Markov models, has become sufficiently accurate for practical use (some fragments of this book were drafted with the help of a speech recognition program).

pages: 598 words: 134,339

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World by Bruce Schneier


23andMe, Airbnb, airport security, AltaVista, Anne Wojcicki, augmented reality, Benjamin Mako Hill, Black Swan, Brewster Kahle, Brian Krebs, call centre, Cass Sunstein, Chelsea Manning, citizen journalism, cloud computing, congestion charging, disintermediation, Edward Snowden, experimental subject, failed state, fault tolerance, Ferguson, Missouri, Filter Bubble, Firefox, friendly fire, Google Chrome, Google Glasses, hindsight bias, informal economy, Internet Archive, Internet of things, Jacob Appelbaum, Jaron Lanier, Julian Assange, Kevin Kelly, license plate recognition, linked data, Lyft, Mark Zuckerberg, Nash equilibrium, Nate Silver, national security letter, Network effects, Occupy movement, payday loans, pre–internet, price discrimination, profit motive, race to the bottom, RAND corporation, recommendation engine, RFID, self-driving car, Silicon Valley, Skype, smart cities, smart grid, Snapchat, social graph, software as a service, South China Sea, stealth mode startup, Steven Levy, Stuxnet, TaskRabbit, telemarketer, Tim Cook: Apple, transaction costs, Uber and Lyft, urban planning, WikiLeaks, zero day

The idea was that it would be useful for researchers; to protect people’s identity, they replaced names with numbers. So, for example, Bruce Schneier might be 608429. They were surprised when researchers were able to attach names to numbers by correlating different items in individuals’ search history. In 2008, Netflix published 10 million movie rankings by 500,000 anonymized customers, as part of a challenge for people to come up with better recommendation systems than the one the company was using at that time. Researchers were able to de-anonymize people by comparing rankings and time stamps with public rankings and time stamps in the Internet Movie Database. These might seem like special cases, but correlation opportunities pop up more frequently than you might think. Someone with access to an anonymous data set of telephone records, for example, might partially de-anonymize it by correlating it with a catalog merchant’s telephone order database.
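The correlation step described above can be sketched in a few lines of Python. This is a toy illustration only, not the researchers' actual method: the records, the names `alice` and `bob`, the `link` function, and the `max_days` and `min_matches` thresholds are all hypothetical. An anonymized ID is linked to a public profile when enough (movie, rating) pairs agree and their dates fall close together.

```python
from collections import Counter
from datetime import date

# Hypothetical "anonymized" release: (numeric id, movie, rating, date).
anonymized = [
    (608429, "Movie A", 5, date(2005, 3, 1)),
    (608429, "Movie B", 2, date(2005, 3, 9)),
    (608429, "Movie C", 4, date(2005, 4, 2)),
    (731002, "Movie A", 3, date(2005, 1, 15)),
    (731002, "Movie D", 4, date(2005, 2, 20)),
]

# Hypothetical public ratings posted under real names.
public = [
    ("alice", "Movie A", 5, date(2005, 3, 2)),
    ("alice", "Movie B", 2, date(2005, 3, 9)),
    ("bob", "Movie A", 3, date(2005, 6, 30)),
    ("bob", "Movie D", 1, date(2005, 2, 21)),
]


def link(anonymized, public, max_days=3, min_matches=2):
    """Map anonymized ids to public names when enough records line up."""
    matches = Counter()
    for anon_id, a_title, a_rating, a_day in anonymized:
        for name, p_title, p_rating, p_day in public:
            same_item = a_title == p_title and a_rating == p_rating
            close_in_time = abs((a_day - p_day).days) <= max_days
            if same_item and close_in_time:
                matches[(anon_id, name)] += 1

    # Keep, for each anonymized id, the name with the most matching records.
    best = {}
    for (anon_id, name), count in matches.items():
        if count >= min_matches and count > best.get(anon_id, (None, 0))[1]:
            best[anon_id] = (name, count)
    return {anon_id: name for anon_id, (name, _) in best.items()}


linked = link(anonymized, public)
print(linked)
```

Even in this tiny example, two coinciding ratings are enough to tie one numeric ID to a name, while the second ID stays unlinked because its ratings or dates do not line up.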

pages: 752 words: 131,533

Python for Data Analysis by Wes McKinney


backtesting, cognitive dissonance, crowdsourcing, Debian, Firefox, Google Chrome, index card, random walk, recommendation engine, revision control, sentiment analysis, Sharpe ratio, side project, sorting algorithm, statistical model, type inference

MovieLens 1M Data Set GroupLens Research provides a number of collections of movie ratings data collected from users of MovieLens in the late 1990s and early 2000s. The data provide movie ratings, movie metadata (genres and year), and demographic data about the users (age, zip code, gender, and occupation). Such data is often of interest in the development of recommendation systems based on machine learning algorithms. While I will not be exploring machine learning techniques in great detail in this book, I will show you how to slice and dice data sets like these into the exact form you need. The MovieLens 1M data set contains 1 million ratings collected from 6000 users on 4000 movies. It’s spread across 3 tables: ratings, user information, and movie information.
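As a minimal sketch of how three such tables might be combined with pandas, the example below builds tiny stand-in frames in memory rather than reading the real files. The column names and every value are assumptions made for illustration; they are not taken from the data set itself.

```python
import pandas as pd

# Tiny stand-ins for the three tables; all names and values are made up.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "gender": ["F", "M", "F"],
    "age": [25, 35, 18],
    "occupation": [10, 16, 4],
    "zip": ["48067", "70072", "55117"],
})
movies = pd.DataFrame({
    "movie_id": [10, 20],
    "title": ["Movie A (1995)", "Movie B (1999)"],
    "genres": ["Comedy", "Drama|Romance"],
})
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "movie_id": [10, 20, 10, 20],
    "rating": [5, 3, 4, 2],
    "timestamp": [978300760, 978302109, 978301968, 978300275],
})

# merge() joins on the columns the two frames share:
# user_id for ratings/users, then movie_id for the result and movies.
data = pd.merge(pd.merge(ratings, users), movies)

# One row per title, one column per gender, mean rating in each cell.
mean_ratings = data.pivot_table(
    values="rating", index="title", columns="gender", aggfunc="mean"
)
print(mean_ratings)
```

Merging first and reshaping afterwards keeps every rating row attached to its user and movie attributes, so later group-by or pivot steps can slice on any of them.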

Enriching the Earth: Fritz Haber, Carl Bosch, and the Transformation of World Food Production by Vaclav Smil


agricultural Revolution, Albert Einstein, demographic transition, Deng Xiaoping, Haber-Bosch Process, invention of gunpowder, Louis Pasteur, precision agriculture, recommendation engine, The Design of Experiments

Power. 1997. Soil Fertility Management for Sustainable Agriculture. Boca Raton, Fla.: Lewis Publishing; Trenkel, M. A. 1997. Improving Fertilizer Use Efficiency. Paris: IFA. 11. Havlin, J. L., et al., eds. 1994. Soil Testing: Prospects for Improving Nutrient Recommendations. Madison, Wis.: Soil Science Society of America; MacKenzie, G. H., and J.-C. Taureau. 1997. Recommendation Systems for Nitrogen—A Review. York: Fertiliser Society. Periodic testing for major macronutrients has been common in high-income nations for decades, but testing for micronutrient deficiencies (ranging from boron and copper in many crops to molybdenum and cobalt needed by nitrogenase in leguminous species) has been much less frequent. 12. Cassman, K. G., et al. 1993. Nitrogen use efficiency of rice reconsidered: what are the key constraints?

pages: 1,202 words: 144,667

The Linux kernel primer: a top-down approach for x86 and PowerPC architectures by Claudia Salzberg Rodriguez, Gordon Fischer, Steven Smolski


Debian, domain-specific language, recommendation engine, Richard Stallman

Many of the C library routines available to user mode programs, such as the fork() function in Figure 3.9, bundle code and one or more system calls to accomplish a single function. When a user process calls one of these functions, certain values are placed into the appropriate processor registers and a software interrupt is generated. This software interrupt then calls the kernel entry point. Although not recommended, system calls (syscalls) can also be accessed from kernel code. Where a syscall should be called from is a matter of some debate, because syscalls called from the kernel can perform better. This performance gain must be weighed against the added complexity and reduced maintainability of the code. In this section, we explore the "traditional" syscall implementation where syscalls are called from user space.
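The user-space side of this path can be seen from Python, whose `os.fork()` is a thin wrapper over the C library's fork(). The sketch below is illustrative and Unix-only, and the helper name `run_child` is hypothetical. It shows a single library call hiding the register setup and the trap into the kernel described above.

```python
import os


def run_child(exit_code):
    """Fork a child that exits at once; return the exit code the parent sees."""
    pid = os.fork()  # library wrapper; the kernel does the actual process copy
    if pid == 0:
        # Child process: leave immediately without running cleanup handlers.
        os._exit(exit_code)
    # Parent process: block until the child has terminated.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


print(run_child(7))
```

Both `os.fork()` and `os.waitpid()` cross into the kernel, yet the calling code never touches registers or interrupt numbers, which is the convenience the library routines provide.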

pages: 666 words: 181,495

In the Plex: How Google Thinks, Works, and Shapes Our Lives by Steven Levy


23andMe, AltaVista, Anne Wojcicki, Apple's 1984 Super Bowl advert, autonomous vehicles, book scanning, Brewster Kahle, Burning Man, business process, clean water, cloud computing, crowdsourcing, Dean Kamen, discounted cash flows, don't be evil, Douglas Engelbart, El Camino Real, fault tolerance, Firefox, Gerard Salton, Google bus, Google Chrome, Google Earth, Googley, HyperCard, hypertext link, IBM and the Holocaust, informal economy, information retrieval, Internet Archive, Jeff Bezos, Kevin Kelly, Mark Zuckerberg, Menlo Park, optical character recognition, PageRank, Paul Buchheit, Potemkin village, prediction markets, recommendation engine, risk tolerance, Sand Hill Road, Saturday Night Live, search inside the book, second-price auction, Silicon Valley, skunkworks, Skype, slashdot, social graph, social software, social web, spectrum auction, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, Ted Nelson, telemarketer, trade route, traveling salesman, Vannevar Bush, web application, WikiLeaks, Y Combinator

While he put the pieces of YouTube together, though, he always kept in mind that he was documenting a traditional media system on the verge of collapse. He had to deal with the music world as it was but also plan for the way it would be after disruptions, which Google and YouTube were accelerating. Kamangar had some specific ideas for improvement of YouTube. He urged a simpler user interface and a smarter recommendation system to point users to other videos they might enjoy. He urged more flexibility with producers of professional video so YouTube would get more commercial content. He also emphasized how some of Google’s key attributes—notably speed—had a huge impact on the overall experience. If Google could reliably deliver videos with almost no latency, he reasoned, users might not balk so much at the “preroll” ads that come before the actual content, especially if the video was one of a series that users subscribed to and so were already eager to see what was coming.

pages: 933 words: 205,691

Hadoop: The Definitive Guide by Tom White


Amazon Web Services, bioinformatics, business intelligence, combinatorial explosion, database schema, Debian, domain-specific language, fault tolerance, full text search, Grace Hopper, information retrieval, Internet Archive, linked data, loose coupling, openstreetmap, recommendation engine, RFID, SETI@home, social graph, web application

The player or website can be used to access these streams and extra functionality is made available to the user, allowing her to love, skip, or ban each track that she listens to. When processing the received data, we distinguish between a track listen submitted by a user (the first source above, referred to as a scrobble from here on) and a track listened to on the radio (the second source, mentioned earlier, referred to as a radio listen from here on). This distinction is very important in order to prevent a feedback loop in the recommendation system, which is based only on scrobbles. One of the most fundamental Hadoop jobs takes the incoming listening data and summarizes it into a format that can be used for display purposes on the website as well as for input to other Hadoop programs. This is achieved by the Track Statistics program, which is the example described in the following sections. The Track Statistics Program When track listening data is submitted, it undergoes a validation and conversion phase, the end result of which is a number of space-delimited text files containing the user ID, the track ID, the number of times the track was scrobbled, the number of times the track was listened to on the radio, and the number of times it was skipped.
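To make the record format concrete, here is a single-process Python sketch of the kind of per-track summary such a job could produce. It is not the Hadoop program itself: the column order is assumed to follow the listing above, and the sample lines, function names, and output keys are hypothetical.

```python
from collections import defaultdict


def parse_line(line):
    """Split one space-delimited record into its five assumed fields."""
    user_id, track_id, scrobbles, radio, skips = line.split()
    return user_id, track_id, int(scrobbles), int(radio), int(skips)


def summarize(lines):
    """Aggregate unique listeners and listen counts per track."""
    listeners = defaultdict(set)             # track id -> set of user ids
    totals = defaultdict(lambda: [0, 0, 0])  # track id -> scrobbles, radio, skips
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        user_id, track_id, scrobbles, radio, skips = parse_line(line)
        listeners[track_id].add(user_id)
        counts = totals[track_id]
        counts[0] += scrobbles
        counts[1] += radio
        counts[2] += skips
    return {
        track_id: {
            "listeners": len(listeners[track_id]),
            "scrobbles": counts[0],
            "radio": counts[1],
            "skips": counts[2],
        }
        for track_id, counts in totals.items()
    }


# Made-up records: user ID, track ID, scrobbles, radio listens, skips.
sample = [
    "111115 222 0 1 0",
    "111113 225 1 0 0",
    "111117 223 0 1 1",
    "111115 225 2 0 0",
]
stats = summarize(sample)
print(stats["225"])
```

Keeping scrobbles and radio listens in separate counters preserves the distinction the text calls important, so a downstream recommender can read only the scrobble column.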

pages: 678 words: 216,204

The Wealth of Networks: How Social Production Transforms Markets and Freedom by Yochai Benkler


affirmative action, barriers to entry, bioinformatics, Brownian motion, call centre, Cass Sunstein, centre right, clean water, dark matter, desegregation, East Village, fear of failure, Firefox, game design, George Gilder, hiring and firing, Howard Rheingold, informal economy, invention of radio, Isaac Newton, iterative process, Jean Tirole, jimmy wales, market bubble, market clearing, Marshall McLuhan, New Journalism, optical character recognition, pattern recognition, pre–internet, price discrimination, profit maximization, profit motive, random walk, recommendation engine, regulatory arbitrage, rent-seeking, RFID, Richard Stallman, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, shareholder value, Silicon Valley, Skype, slashdot, social software, software patent, spectrum auction, technoutopianism, The Fortune at the Bottom of the Pyramid, The Nature of the Firm, transaction costs

Cable broadband covers roughly two-thirds of the home market, in many places without alternative; and where there is an alternative, there is only one--the incumbent telephone company. Without one of these noncompetitive infrastructure owners, the home user has no broadband access to the Internet. In Amazon's case, the consumer outrage when the practice was revealed focused on the lack of transparency. Users had little objection to clearly demarcated advertisement. The resistance was to the nontransparent manipulation of the recommendation system aimed at causing the consumers to act in ways consistent with Amazon's goals, rather than their own. In that case, however, there were alternatives. There are many different places from which to find book reviews and recommendations, and at the time, a competing online bookseller was already available--and had not significantly adopted similar practices. The exaction was therefore less significant.

pages: 647 words: 43,757

Types and Programming Languages by Benjamin C. Pierce


Albert Einstein, combinatorial explosion, experimental subject, finite state, Henri Poincaré, recommendation engine, sorting algorithm, Turing complete, Turing machine, type inference, Y Combinator

In particular, the proofs of type preservation and progress are straightforward extensions of the ones we saw in Chapter 9. 23.5.1 Theorem [Preservation]: If Γ ⊢ t : T and t → t′, then Γ ⊢ t′ : T. Proof: Exercise [Recommended, ★★★]. 23.5.2 Theorem [Progress]: If t is a closed, well-typed term, then either t is a value or else there is some t′ with t → t′. Proof: Exercise [Recommended, ★★★]. System F also shares with λ→ the property of normalization: the fact that the evaluation of every well-typed program terminates. 2 Unlike the type safety theorems above, normalization is quite difficult to prove (indeed, it is somewhat astonishing that it holds at all, considering that we can code things like sorting functions in the pure language, as we did in Exercise 23.4.12, without resorting to fix).

pages: 834 words: 180,700

The Architecture of Open Source Applications by Amy Brown, Greg Wilson


8-hour work day, anti-pattern, bioinformatics, cloud computing, collaborative editing, combinatorial explosion, computer vision, continuous integration, create, read, update, delete, Debian, domain-specific language, fault tolerance, finite state, Firefox, friendly fire, linked data, load shedding, locality of reference, loose coupling, Mars Rover, MVC pattern, premature optimization, recommendation engine, revision control, side project, Skype, slashdot, social web, speech recognition, the scientific method, The Wisdom of Crowds, web application, WebSocket

VisTrails addresses important usability issues that have hampered a wider adoption of workflow and visualization systems. To cater to a broader set of users, including many who do not have programming expertise, it provides a series of operations and user interfaces that simplify workflow design and use [FSC+06], including the ability to create and refine workflows by analogy, to query workflows by example, and to suggest workflow completions as users interactively construct their workflows using a recommendation system [SVK+07]. We have also developed a new framework that allows the creation of custom applications that can be more easily deployed to (non-expert) end users. The extensibility of VisTrails comes from an infrastructure that makes it simple for users to integrate tools and libraries, as well as to quickly prototype new functions. This has been instrumental in enabling the use of the system in a wide range of application areas, including environmental sciences, psychiatry, astronomy, cosmology, high-energy physics, quantum physics, and molecular modeling.