natural language processing



pages: 502 words: 107,510

Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs

Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, game design, information retrieval, iterative process, natural language processing, pattern recognition, performance metric, sentiment analysis, social web, speech recognition, statistical model, text mining

., Maximum Entropy Classifiers
maximum a posteriori (MAP) hypothesis, Naïve Bayes Learning
sentiment classification, Sentiment classification
Named Entities (NEs), The Annotation Development Cycle, Adding Named Entities, Inline Annotation, Example 3: Extent Annotations—Named Entities
  as extent tags, Example 3: Extent Annotations—Named Entities
  and inline tagging, Inline Annotation
  and models, Adding Named Entities
Simple Named Entity Guidelines V6.5, Example 3: Extent Annotations—Named Entities
Narrative Containers, Narrative Containers–Narrative Containers
natural language processing, What Is Natural Language Processing?–What Is Natural Language Processing? (see NLP (natural language processing))
Natural Language Processing with Python (Bird, Klein, and Loper), What Is Natural Language Processing?, Collecting Data from the Internet, Training: Machine Learning, Gender Identification–Gender Identification
  gender identification problem in, Gender Identification–Gender Identification
NCSU, TempEval-2 system, TempEval-2: System Summaries
neg-content-term, Decision Tree Learning
Netflix, Film Genre Classification, Example 2: Multiple Labels—Film Genres
New York Times, Building the Corpus
NIST TREC Tracks, NLP Challenges
NLP (natural language processing), The Importance of Language Annotation–The Importance of Language Annotation, The Layers of Linguistic Description–The Layers of Linguistic Description, What Is Natural Language Processing?

In Proceedings of the 5th International Workshop on Semantic Evaluation.
Madnani, Nitin. 2007. “Getting Started on Natural Language Processing with Python.” ACM Crossroads 13(4). Updated version available at http://www.desilinguist.org/. Accessed May 16, 2012.
Madnani, Nitin, and Jimmy Lin. Natural Language Processing with Hadoop and Python. http://www.cloudera.com/blog/2010/03/natural-language-processing-with-hadoopand-python/. Posted March 16, 2010.
Mani, Inderjeet, Marc Verhagen, Ben Wellner, Chong Min Lee, and James Pustejovsky. 2006. Proceedings of Machine Learning of Temporal Relations. ACL 2006, Sydney, Australia.
Manning, Chris, and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press.
Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. 2008.

Pragmatics
The study of how the context of text affects the meaning of an expression, and what information is necessary to infer a hidden or presupposed meaning.

Text structure analysis
The study of how narratives and other textual styles are constructed to make larger textual compositions.

Throughout this book we will present examples of annotation projects that make use of various combinations of the different concepts outlined in the preceding list.

What Is Natural Language Processing?

Natural Language Processing (NLP) is a field of computer science and engineering that has developed from the study of language and computational linguistics within the field of Artificial Intelligence. The goals of NLP are to design and build applications that facilitate human interaction with machines and other devices through the use of natural language. Some of the major areas of NLP include:

Question Answering Systems (QAS)
Imagine being able to actually ask your computer or your phone what time your favorite restaurant in New York stops serving dinner on Friday nights.


Natural Language Processing with Python and spaCy by Yuli Vasiliev

Bayesian statistics, computer vision, database schema, en.wikipedia.org, loose coupling, natural language processing, Skype, statistical model

BRIEF CONTENTS

Introduction
Chapter 1: How Natural Language Processing Works
Chapter 2: The Text-Processing Pipeline
Chapter 3: Working with Container Objects and Customizing spaCy
Chapter 4: Extracting and Using Linguistic Features
Chapter 5: Working with Word Vectors
Chapter 6: Finding Patterns and Walking Dependency Trees
Chapter 7: Visualizations
Chapter 8: Intent Recognition
Chapter 9: Storing User Input in a Database
Chapter 10: Training Models
Chapter 11: Deploying Your Own Chatbot
Chapter 12: Implementing Web Data and Processing Images
Appendix: Linguistic Primer
Index

CONTENTS IN DETAIL

INTRODUCTION
Using Python for Natural Language Processing
The spaCy Library
Who Should Read This Book?
What’s in the Book?

1 HOW NATURAL LANGUAGE PROCESSING WORKS
How Can Computers Understand Language?

NATURAL LANGUAGE PROCESSING WITH PYTHON AND SPACY
A Practical Introduction
by Yuli Vasiliev
San Francisco

NATURAL LANGUAGE PROCESSING WITH PYTHON AND SPACY. Copyright © 2020 by Yuli Vasiliev. All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.

ISBN-10: 1-7185-0052-1
ISBN-13: 978-1-7185-0052-5

Publisher: William Pollock
Production Editors: Kassie Andreadis and Laurel Chun
Cover Illustration: Gina Redman
Photography: Igor Shabalin
Developmental Editor: Frances Saux
Technical Reviewers: Ivan Brigida and Geoff Bacon
Copyeditor: Anne Marie Walker
Compositor: Happenstance Type-O-Rama
Proofreader: James Fraleigh
Indexer: Beth Nauman-Montana

For information on distribution, translations, or bulk sales, please contact No Starch Press, Inc. directly: No Starch Press, Inc., 245 8th Street, San Francisco, CA 94103; phone: 1.415.863.9900; info@nostarch.com; www.nostarch.com. A catalog record of this book is available from the Library of Congress.

1 HOW NATURAL LANGUAGE PROCESSING WORKS
How Can Computers Understand Language? Mapping Words and Numbers with Word Embedding Using Machine Learning for Natural Language Processing Why Use Machine Learning for Natural Language Processing? What Is a Statistical Model in NLP? Neural Network Models Convolutional Neural Networks for NLP What Is Still on You Keywords Context Meaning Transition Summary

2 THE TEXT-PROCESSING PIPELINE
Setting Up Your Working Environment Installing Statistical Models for spaCy Basic NLP Operations with spaCy Tokenization Lemmatization Applying Lemmatization for Meaning Recognition Part-of-Speech Tagging Using Part-of-Speech Tags to Find Relevant Verbs Context Is Important Syntactic Relations Try This Named Entity Recognition Summary

3 WORKING WITH CONTAINER OBJECTS AND CUSTOMIZING SPACY
spaCy’s Container Objects Getting the Index of a Token in a Doc Object Iterating over a Token’s Syntactic Children The doc.sents Container The doc.noun_chunks Container Try This The Span Object Try This Customizing the Text-Processing Pipeline Disabling Pipeline Components Loading a Model Step by Step Customizing the Pipeline Components Using spaCy’s C-Level Data Structures How It Works Preparing Your Working Environment and Getting Text Files Your Cython Script Building a Cython Module Testing the Module Summary

4 EXTRACTING AND USING LINGUISTIC FEATURES
Extracting and Generating Text with Part-of-Speech Tags Numeric, Symbolic, and Punctuation Tags Extracting Descriptions of Money Try This Turning Statements into Questions Try This Using Syntactic Dependency Labels in Text Processing Distinguishing Subjects from Objects Deciding What Question a Chatbot Should Ask Try This Summary

5 WORKING WITH WORD VECTORS
Understanding Word Vectors Defining Meaning with Coordinates Using Dimensions to Represent Meaning The Similarity Method Choosing Keywords for Semantic Similarity Calculations Installing Word Vectors Taking Advantage of Word Vectors That Come with spaCy Models Using Third-Party Word Vectors Comparing spaCy Objects Using Semantic Similarity for Categorization Tasks Extracting Nouns as a Preprocessing Step Try This Extracting and Comparing Named Entities Summary

6 FINDING PATTERNS AND WALKING DEPENDENCY TREES
Word Sequence Patterns Finding Patterns Based on Linguistic Features Try This Checking an Utterance for a Pattern Using spaCy’s Matcher to Find Word Sequence Patterns Applying Several Patterns Creating Patterns Based on Customized Features Choosing Which Patterns to Apply Using Word Sequence Patterns in Chatbots to Generate Statements Try This Extracting Keywords from Syntactic Dependency Trees Walking a Dependency Tree for Information Extraction Iterating over the Heads of Tokens Condensing a Text Using Dependency Trees Try This Using Context to Improve the Ticket-Booking Chatbot Making a Smarter Chatbot by Finding Proper Modifiers Summary

7 VISUALIZATIONS
Getting Started with spaCy’s Built-In Visualizers displaCy Dependency Visualizer displaCy Named Entity Visualizer Visualizing from Within spaCy Visualizing Dependency Parsing Try This Sentence-by-Sentence Visualizations Customizing Your Visualizations with the Options Argument Using Dependency Visualizer Options Try This Using Named Entity Visualizer Options Exporting a Visualization to a File Using displaCy to Manually Render Data Formatting the Data Try This Summary

8 INTENT RECOGNITION
Extracting the Transitive Verb and Direct Object for Intent Recognition Obtaining the Transitive Verb/Direct Object Pair Extracting Multiple Intents with token.conjuncts Try This Using Word Lists to Extract the Intent Finding the Meanings of Words Using Synonyms and Semantic Similarity Recognizing Synonyms Using Predefined Lists Try This Recognizing Implied Intents Using Semantic Similarity Try This Extracting Intent from a Sequence of Sentences Walking the Dependency Structures of a Discourse Replacing Proforms with Their Antecedents Try This Summary

9 STORING USER INPUT IN A DATABASE
Converting Unstructured Data into Structured Data Extracting Data into Interchange Formats Moving Application Logic to the Database Building a Database-Powered Chatbot Gathering the Data and Building a JSON Object Converting Number Words to Numbers Preparing Your Database Environment Sending Data to the Underlying Database When a User’s Request Doesn’t Contain Enough Information Try This Summary

10 TRAINING MODELS
Training a Model’s Pipeline Component Training the Entity Recognizer Deciding Whether You Need to Train the Entity Recognizer Creating Training Examples Automating the Example Creation Process Disabling the Other Pipeline Components The Training Process Evaluating the Updated Recognizer Creating a New Dependency Parser Custom Syntactic Parsing to Understand User Input Deciding on Types of Semantic Relations to Use Creating Training Examples Training the Parser Testing Your Custom Parser Try This Summary

11 DEPLOYING YOUR OWN CHATBOT
How Implementing and Deploying a Chatbot Works Using Telegram as a Platform for Your Bot Creating a Telegram Account and Authorizing Your Bot Getting Started with the python-telegram-bot Library Using the telegram.ext Objects Creating a Telegram Chatbot That Uses spaCy Expanding the Chatbot Holding the State of the Current Chat Putting All the Pieces Together Try This Summary

12 IMPLEMENTING WEB DATA AND PROCESSING IMAGES
How It Works Making Your Bot Find Answers to Questions from Wikipedia Determining What the Question Is About Try This Using Wikipedia to Answer User Questions Try This Reacting to Images Sent in a Chat Generating Descriptive Tags for Images Using Clarifai Using Tags to Generate Text Responses to Images Putting All the Pieces Together in a Telegram Bot Importing the Libraries Writing the Helper Functions Writing the Callback and main() Functions Testing the Bot Try This Summary

LINGUISTIC PRIMER
Dependency Grammars vs.


pages: 174 words: 56,405

Machine Translation by Thierry Poibeau

AltaVista, augmented reality, call centre, Claude Shannon: information theory, cloud computing, combinatorial explosion, crowdsourcing, easy for humans, difficult for computers, en.wikipedia.org, Google Glasses, information retrieval, Internet of things, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, natural language processing, Necker cube, Norbert Wiener, RAND corporation, Robert Mercer, Skype, speech recognition, statistical model, technological singularity, Turing test, wikimedia commons

Processing natural languages (as opposed to processing formal languages, such as the programming languages used by computers) is difficult in itself, mainly because at the heart of natural language lie vagueness and ambiguity.

Natural Languages and Ambiguity

Ever since the creation of computers, linguists as well as computer scientists have been interested in natural language processing, a field also called computational linguistics. Natural language processing is difficult because, by default, computers do not have any knowledge of what a language is. It is thus necessary to specify the definition of a word, a phrase, and a sentence. So far, things may not seem too difficult (however, think about expressions like “isn’t it,” “won’t,” “U.S.,” or “$80”: it is not always clear what counts as a word and how many words are involved in such expressions) and not so different from formal languages, which are also made of words.
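The word-segmentation problem raised by expressions like “isn’t it,” “U.S.,” and “$80” can be made concrete with a small sketch in Python; the sample sentence and the regular expression below are my own illustration, not an example from Poibeau’s book:

```python
import re

text = "Isn't it odd that the U.S. budget line says $80?"

# Naive approach: a "word" is whatever whitespace separates.
naive = text.split()

# A slightly smarter tokenizer: keep currency amounts and
# abbreviations intact, and treat punctuation as its own token.
token_pattern = re.compile(r"""
    \$\d+(?:\.\d+)?      # currency amounts like $80
  | (?:[A-Za-z]\.)+      # abbreviations like U.S.
  | \w+(?:'\w+)?         # words, optionally with an apostrophe
  | [^\w\s]              # any remaining punctuation mark
""", re.VERBOSE)
smarter = token_pattern.findall(text)

print(naive)    # "$80?" and "U.S." ride along with punctuation
print(smarter)  # "$80", "U.S.", and "?" come out as separate decisions
```

Neither answer is "correct" in the abstract: is "Isn't" one word or two ("is" + "n't")? Different corpora and tools make different choices, which is exactly the point of the passage above.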

However, the memorandum underestimated the problem of ambiguity. Weaver wrote: “Ambiguity, moreover, attaches primarily to nouns, verbs, and adjectives; and actually (at least so I suppose) to relatively few nouns, verbs, and adjectives.” We now know that ambiguity is the most pervasive problem in natural language processing and applies to nearly all kinds of words, which makes ambiguity a much bigger problem than initially thought. The second principle was based on work done in logic and had a profound influence on the concept of formal grammar, which is used for analyzing artificial languages (particularly programming languages) as well as natural languages.

The First Evaluation Campaigns

Since the beginnings of machine translation, evaluation has been perceived as necessary, more so than in other fields of natural language processing, probably because machine translation was seen from the beginning as an applied field from which very concrete results were expected. We have seen in this regard that the ALPAC report was very negative and rather skeptical about the quality that could be hoped for from such systems (see chapter 6). At the beginning of the 1990s, with the renewal of research based on the statistical approach originally proposed by IBM, the need to evaluate machine translation systems was felt again. As is often the case in the field of natural language processing, it was an American funding agency, the Advanced Research Projects Agency (ARPA, later known as DARPA1), that initiated research in this area.


pages: 350 words: 98,077

Artificial Intelligence: A Guide for Thinking Humans by Melanie Mitchell

Ada Lovelace, AI winter, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, artificial general intelligence, autonomous vehicles, Bernie Sanders, Claude Shannon: information theory, cognitive dissonance, computer age, computer vision, dark matter, Douglas Hofstadter, Elon Musk, en.wikipedia.org, Gödel, Escher, Bach, I think there is a world market for maybe five computers, ImageNet competition, Jaron Lanier, job automation, John Markoff, John von Neumann, Kevin Kelly, Kickstarter, license plate recognition, Mark Zuckerberg, natural language processing, Norbert Wiener, ought to be enough for anybody, pattern recognition, performance metric, RAND corporation, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, Rodney Brooks, self-driving car, sentiment analysis, Silicon Valley, Singularitarianism, Skype, speech recognition, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, theory of mind, There's no reason for any individual to have a computer in his home - Ken Olsen, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!

It’s easy, at least for us as humans, to read between the lines. After all, understanding language—including the parts that are left unsaid—is a fundamental part of human intelligence. It’s no accident that Alan Turing framed his famous “imitation game” as a contest involving the generation and understanding of language. This part of the book deals with natural-language processing, which means “getting computers to deal with human language.” (In AI-speak, “natural” means “human.”) Natural-language processing (abbreviated NLP) includes topics such as speech recognition, web search, automated question answering, and machine translation. Similar to what we’ve seen in previous chapters, deep learning has been the driving force behind most of the recent advances in NLP. I’ll describe some of these advances, using the “Restaurant” story to illustrate a few of the major challenges machines face when it comes to using and understanding human language.

If you pour all the water out of a bottle, the bottle thereby becomes empty. Language also relies on commonsense knowledge of the other people with whom we communicate. A person who asks for a hamburger cooked rare but gets a burned one instead will not be happy. If someone says that a movie is “too dark for my taste,” then the person didn’t like it. While natural-language processing by machines has come a long way, I don’t believe that machines will be able to fully understand human language until they have humanlike common sense. This being said, natural-language processing systems are becoming ever more ubiquitous in our lives—transcribing our words, analyzing our sentiments, translating our documents, and answering our questions. Does the lack of humanlike understanding in such systems, however sophisticated their performance, inevitably result in their being brittle, unreliable, and vulnerable to attack?

; IBM Watson’s match jobs, see unemployment Johnson, George Johnson, Mark

K
Kapor, Mitchell Karpathy, Andrej Kasparov, Garry Kelly, Kevin Kreye, Andrian Krizhevsky, Alex Kurzweil, Ray

L
Lakoff, George Landecker, Will LeCun, Yann Lee, Sedol Legg, Shane Lenat, Douglas LeNet Levesque, Hector Li, Fei-Fei Lickel, Charles Long Bets long short-term memory long tail; see also long-tail problem long-tail problem Lovelace, Ada LSTM, see long short-term memory

M
machine learning; adversarial, see adversarial learning; bias in, see bias; interpretable, see explainable AI; overfitting in, see overfitting; transfer learning in, see transfer learning machine morality, see moral AI machine translation; comparison between humans and machines; evaluating; neural; statistical; see also Google Translate Manning, Christopher Marcus, Gary Markoff, John Marshall, James McCarthy, John McClelland, James Mechanical Turk, see Amazon Mechanical Turk Metacat metacognition metaphors Metaphors We Live By (book) Miller, George Minsky, Marvin Monte Carlo method Monte Carlo tree search; roll-outs Moore, Gordon Moore’s law moral AI Morgenstern, Leora Mullainathan, Sendhil Müller, Vincent multilayer neural networks; Minsky and Papert’s speculations on; see also neural networks Musk, Elon MYCIN

N
narrow AI natural-language processing: adversarial attacks on; challenges for; definition of; rule-based approaches to; statistical approaches to; see also machine translation; question answering; reading comprehension; sentiment classification; speech recognition; word vectors neocognitron network neural engineering neural machine translation; see also Google Translate; machine translation neural networks: activations in; classification in; convolutional, see convolutional neural networks; deep, see deep learning; depth of; hidden layers; learning in; multilayer; recurrent, 199–200; units in; see also back-propagation; deep learning Newell, Allen Ng, Andrew NLP, see natural-language processing

O
object recognition; in the brain; comparing ConvNets and humans on; see also ImageNet; PASCAL Visual Object Classes competition Olsen, Ken one-hot encoding operant conditioning overfitting

P
Page, Larry Papert, Seymour Partnership on AI PASCAL Visual Object Classes competition perceptron learning algorithm perceptrons; analogy with neurons; compared with multilayer neural networks; for handwritten digit recognition; inputs; learning algorithm; limitations of; threshold; as subsymbolic AI approach; weights Perceptrons (book) Pew Research Center Pinker, Steven privacy

Q
Q-learning; see also deep Q-learning Q-table question answering, 214–15; adversarial attacks on; see also IBM Watson; reading comprehension; Stanford Question-Answering Dataset; Winograd schemas

R
reading comprehension recurrent neural networks, 199–200 regulation reinforcement learning; actions of agent in; contrast with supervised learning; deep Q-learning, see deep Q-learning; discounting in; episode; epsilon-greedy method for; exploration versus exploitation; Q-learning; Q-table; rewards in; state of agent in; value of action robot soccer Rochester, Nathaniel Rose, Charlie Rosenblatt, Frank Rota, Gian-Carlo Rumelhart, David Rutter, Brad

S
Samuel, Arthur Samuel’s checkers-playing program; alpha-beta pruning in; evaluation function Sander, Emmanuel Searle, John self-driving cars, 117–18, 267–71; adversarial examples for; benefits of; ethics for; geofencing for; levels of autonomy for; partial versus full autonomy for; safety drivers for; training data for semantic space of words sentiment classification Shannon, Claude Sharpless, Ned Show and Tell (image-captioning system) Simon, Herbert Sims, Karl Singularity Singularity University Situate program Skinner, B.


pages: 504 words: 89,238

Natural Language Processing with Python by Steven Bird, Ewan Klein, Edward Loper

bioinformatics, business intelligence, conceptual framework, Donald Knuth, elephant in my pajamas, en.wikipedia.org, finite state, Firefox, Guido van Rossum, information retrieval, Menlo Park, natural language processing, P = NP, search inside the book, speech recognition, statistical model, text mining, Turing test

Natural Language Processing with Python
Steven Bird, Ewan Klein, and Edward Loper
Beijing • Cambridge • Farnham • Köln • Sebastopol • Taipei • Tokyo

Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper. Copyright © 2009 Steven Bird, Ewan Klein, and Edward Loper. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Julie Steele
Production Editor: Loranah Dimant
Copyeditor: Genevieve d’Entremont
Proofreader: Loranah Dimant
Indexer: Ellen Troutman Zaig
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History: June 2009: First Edition.

Managing Linguistic Data . . . 407
11.1 Corpus Structure: A Case Study 407
11.2 The Life Cycle of a Corpus 412
11.3 Acquiring Data 416
11.4 Working with XML 425
11.5 Working with Toolbox Data 431
11.6 Describing Language Resources Using OLAC Metadata 435
11.7 Summary 437
11.8 Further Reading 437
11.9 Exercises 438
Afterword: The Language Challenge . . . 441
Bibliography . . . 449
NLTK Index . . . 459
General Index . . . 463

Preface

This is a book about Natural Language Processing. By “natural language” we mean a language that is used for everyday communication by humans; languages such as English, Hindi, or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules. We will take Natural Language Processing—or NLP for short—in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles.
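The "counting word frequencies" extreme that the preface mentions fits in a few lines of plain Python; the two sample sentences below are invented for illustration (NLTK itself provides FreqDist for the same idea):

```python
from collections import Counter

def word_freq(text):
    """Lowercase, strip surrounding punctuation, and count words."""
    words = [w.strip(".,;:!?\"'") for w in text.lower().split()]
    return Counter(w for w in words if w)

style_a = "The sea was calm. The sea was grey and the sky was grey."
style_b = "Rain. Rain again, and again the rain."

freq_a = word_freq(style_a)
freq_b = word_freq(style_b)

# Even tiny samples reveal stylistic fingerprints: one author
# repeats "the"/"was", the other circles around "rain"/"again".
print(freq_a.most_common(3))
print(freq_b.most_common(3))
```

Comparing such frequency profiles across authors or genres is one of the oldest and simplest forms of stylometry, and a gentle on-ramp to the richer analyses the book builds up to.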

[Church and Patil, 1982] Kenneth Church and Ramesh Patil. Coping with syntactic ambiguity or how to put the block in the box on the table. American Journal of Computational Linguistics, 8:139–149, 1982.
[Cohen and Hunter, 2004] K. Bretonnel Cohen and Lawrence Hunter. Natural language processing and systems biology. In Werner Dubitzky and Francisco Azuaje, editors, Artificial Intelligence Methods and Tools for Systems Biology, pages 147–174. Springer Verlag, 2004.
[Cole, 1997] Ronald Cole, editor. Survey of the State of the Art in Human Language Technology. Studies in Natural Language Processing. Cambridge University Press, 1997.
[Copestake, 2002] Ann Copestake. Implementing Typed Feature Structure Grammars. CSLI Publications, Stanford, CA, 2002.
[Corbett, 2006] Greville G. Corbett. Agreement. Cambridge University Press, 2006.


pages: 424 words: 114,905

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again by Eric Topol

23andMe, Affordable Care Act / Obamacare, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, artificial general intelligence, augmented reality, autonomous vehicles, bioinformatics, blockchain, cloud computing, cognitive bias, Colonization of Mars, computer age, computer vision, conceptual framework, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, dark matter, David Brooks, digital twin, Elon Musk, en.wikipedia.org, epigenetics, Erik Brynjolfsson, fault tolerance, George Santayana, Google Glasses, ImageNet competition, Jeff Bezos, job automation, job satisfaction, Joi Ito, Mark Zuckerberg, medical residency, meta analysis, meta-analysis, microbiome, natural language processing, new economy, Nicholas Carr, nudge unit, pattern recognition, performance metric, personalized medicine, phenotype, placebo effect, randomized controlled trial, recommendation engine, Rubik’s Cube, Sam Altman, self-driving car, Silicon Valley, speech recognition, Stephen Hawking, text mining, the scientific method, Tim Cook: Apple, War on Poverty, Watson beat the top human players on Jeopardy!, working-age population

Numerous potent drugs failed to reduce the seizures; in fact, they were getting even more pronounced. The infant’s prognosis, including both brain damage and death, was bleak. A blood sample was sent to Rady’s Genomic Institute for a rapid whole-genome sequencing. The sequence encompassed 125 gigabytes of data, including nearly 5 million locations where the child’s genome differed from the most common one. It took twenty seconds for a form of AI called natural-language processing to ingest the boy’s electronic medical record and determine eighty-eight phenotype features (almost twenty times more than the doctors had summarized in their problem list). Machine-learning algorithms quickly sifted the approximately 5 million genetic variants to find the roughly 700,000 rare ones. Of those, 962 are known to cause diseases. Combining that information with the boy’s phenotypic data, the system identified one, in a gene called ALDH7A1, as the most likely culprit.

Shantanu Nundy, an internist, is optimistic.26 He was seeing a woman in her thirties with stiffness and joint pain in her hands. He was unsure of the diagnosis of rheumatoid arthritis, so he posted on the Human Dx app “35F with pain and joint stiffness in L/R hands X 6 months, suspected rheumatoid arthritis.” He also uploaded a picture of her inflamed hands. Within hours, multiple rheumatologists confirmed the diagnosis. Human Dx intends to recruit at least 100,000 doctors by 2022 and increase the use of natural-language-processing algorithms to direct the key data to the appropriate specialists, combining AI tools with doctor crowdsourcing. An alternative model for crowdsourcing to improve diagnosis incorporates citizen science. Developed by CrowdMed, the platform sets up a financially incentivized competition among doctors and lay people to crack difficult diagnostic cases. The use of non-clinicians for this purpose is quite novel and has already led to unexpected outcomes: as Jared Heyman, the company’s founder and CEO told me, the lay participants have a higher rate of accurate diagnosis than the participating doctors do.

It’s also useful to think of algorithms as existing on a continuum from those that are entirely human guided to those that are entirely machine guided, with deep learning at the far machine end of the scale.12

Artificial Intelligence—the science and engineering of creating intelligent machines that have the ability to achieve goals like humans via a constellation of technologies

Neural Network (NN)—software constructions modeled after the way adaptable neurons in the brain were understood to work, instead of human-guided rigid instructions

Deep Learning—a type of neural network, the subset of machine learning composed of algorithms that permit software to train itself to perform tasks by processing multilayered networks of data

Machine Learning—computers’ ability to learn without being explicitly programmed, with more than fifteen different approaches like Random Forest, Bayesian networks, and Support Vector machines; uses computer algorithms to learn from examples and experiences (datasets) rather than predefined, hard rules-based methods

Supervised Learning—an optimization, trial-and-error process based on labeled data, with the algorithm comparing outputs with the correct outputs during training

Unsupervised Learning—the training samples are not labeled; the algorithm just looks for patterns and teaches itself

Convolutional Neural Network—using the principle of convolution, a mathematical operation that basically takes two functions to produce a third one; instead of feeding in the entire dataset, it is broken into overlapping tiles with small neural networks and max-pooling, used especially for images

Natural-Language Processing—a machine’s attempt to “understand” speech or written language like humans

Generative Adversarial Networks—a pair of jointly trained neural networks, one generative and the other discriminative, whereby the former generates fake images and the latter tries to distinguish them from real images

Reinforcement Learning—a type of machine learning that shifts the focus to an abstract goal or decision making, a technology for learning and executing actions in the real world

Recurrent Neural Network—for tasks that involve sequential inputs, like speech or language, this neural network processes an input sequence one element at a time

Backpropagation—an algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer, passing values backward through the network; how the synapses get updated over time; signals are automatically sent back through the network to update and adjust the weighting values

Representation Learning—a set of methods that allows a machine with raw data to automatically discover the representations needed for detection or classification

Transfer Learning—the ability of an AI to learn from different tasks and apply its precedent knowledge to a completely new task

General Artificial Intelligence—the ability to perform a wide range of tasks, including any human task, without being explicitly programmed

TABLE 4.1: Glossary.
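The glossary's supervised-learning entry, "an optimization process based on labeled data," can be illustrated with perhaps the simplest labeled-data algorithm there is, a one-nearest-neighbor classifier. The toy dataset and labels below are invented for illustration and are not from Topol's book:

```python
import math

# Toy labeled dataset: (feature vector, label). In supervised
# learning the algorithm sees inputs paired with correct outputs.
training = [
    ((1.0, 1.0), "benign"),
    ((1.2, 0.8), "benign"),
    ((4.0, 4.2), "malignant"),
    ((4.5, 3.9), "malignant"),
]

def classify(point):
    """1-nearest-neighbor: return the label of the closest example."""
    _, label = min(training, key=lambda ex: math.dist(point, ex[0]))
    return label

print(classify((1.1, 0.9)))   # falls near the "benign" cluster
print(classify((4.2, 4.0)))   # falls near the "malignant" cluster
```

Deep learning replaces this distance lookup with millions of learned parameters, but the contract is the same: labeled examples in, a mapping from new inputs to predicted labels out.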


pages: 523 words: 61,179

Human + Machine: Reimagining Work in the Age of AI by Paul R. Daugherty, H. James Wilson

3D printing, AI winter, algorithmic trading, Amazon Mechanical Turk, augmented reality, autonomous vehicles, blockchain, business process, call centre, carbon footprint, cloud computing, computer vision, correlation does not imply causation, crowdsourcing, digital twin, disintermediation, Douglas Hofstadter, en.wikipedia.org, Erik Brynjolfsson, friendly AI, future of work, industrial robot, Internet of things, inventory management, iterative process, Jeff Bezos, job automation, job satisfaction, knowledge worker, Lyft, natural language processing, personalized medicine, precision agriculture, Ray Kurzweil, recommendation engine, RFID, ride hailing / ride sharing, risk tolerance, Rodney Brooks, Second Machine Age, self-driving car, sensor fusion, sentiment analysis, Shoshana Zuboff, Silicon Valley, software as a service, speech recognition, telepresence, telepresence robot, text mining, the scientific method, uber lyft

It also inspired entirely new areas of research in the decades that followed. For instance, Minsky, with Seymour Papert, wrote what was considered the foundational book on the scope and limitations of neural networks, a kind of AI that uses biological neurons as its model. Other ideas like expert systems—wherein a computer contained deep stores of “knowledge” for specific domains like architecture or medical diagnosis—and natural language processing, computer vision, and mobile robotics can also be traced back to the event. One conference participant was Arthur Samuel, an engineer at IBM who was building a computer program to play checkers. His program would assess the current state of a checkers board and calculate the probability that a given position could lead to a win. In 1959, Samuel coined the term “machine learning”: the field of study that gives computers the ability to learn without being explicitly programmed.

Because the read-sort-route process is clearly defined, it is in some ways an excellent example of a process ripe for automation. But because the incoming information is text-based and is considered “unstructured” in the eyes of software systems, parsing could have been difficult for a less advanced system. Enter AI. Virgin Trains has now installed a machine-learning platform, inSTREAM, with natural-language processing capabilities that can recognize patterns in unstructured data by analyzing a corpus of similar examples—in this case, complaints—and by tracking how customer service representatives interact with incoming text. Now when a complaint arrives at Virgin Trains, it’s automatically read, sorted, and packaged into a case-ready file that an employee can quickly review and process. The most common complaints get appropriate, automated responses.
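A minimal sketch of the kind of pattern recognition described here is a naive Bayes text classifier trained on labeled examples. This is not the inSTREAM platform itself (whose internals aren't public); the categories and complaint texts below are invented for illustration.

```python
from collections import Counter, defaultdict
import math

# Tiny invented corpus of (complaint text, category) pairs, standing in for
# the historical complaints such a system would learn from.
TRAINING = [
    ("my train was delayed by an hour", "delay"),
    ("the service was late again this morning", "delay"),
    ("I was charged twice for one ticket", "refund"),
    ("please refund my unused ticket", "refund"),
    ("the carriage was dirty and the toilet broken", "onboard"),
    ("no seats and the coach smelled awful", "onboard"),
]

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Multinomial naive Bayes over bag-of-words counts."""

    def __init__(self, examples):
        self.word_counts = defaultdict(Counter)  # per-category word frequencies
        self.cat_counts = Counter()              # how often each category occurs
        self.vocab = set()
        for text, cat in examples:
            self.cat_counts[cat] += 1
            for w in tokenize(text):
                self.word_counts[cat][w] += 1
                self.vocab.add(w)

    def classify(self, text):
        total = sum(self.cat_counts.values())
        best_cat, best_score = None, float("-inf")
        for cat in self.cat_counts:
            # log prior + sum of log likelihoods with add-one smoothing
            score = math.log(self.cat_counts[cat] / total)
            denom = sum(self.word_counts[cat].values()) + len(self.vocab)
            for w in tokenize(text):
                score += math.log((self.word_counts[cat][w] + 1) / denom)
            if score > best_score:
                best_cat, best_score = cat, score
        return best_cat

clf = NaiveBayes(TRAINING)
print(clf.classify("train delayed for ages"))  # -> delay
```

Once a new complaint is assigned a category, routing it to the right queue or attaching a templated response becomes ordinary automation; the hard, "unstructured" part is exactly this classification step.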

These human mentors supervise learning and performance and identify new ways to apply the technology for customer service.6 We discuss this type of human-machine collaboration in greater detail in chapter 5. Aida is showing that automated natural-language customer communications are possible in large and complex business environments. As natural-language techniques improve and interfaces advance, they will continue spreading throughout different business functions in various industries. In chapter 4 we’ll discuss how various natural-language processing chatbots like Amazon’s Alexa are becoming the new front-office faces of companies. Redefining an Entire Industry As AI becomes increasingly capable of adding intelligence to middle- and back-office processes, the technology could potentially redefine entire industries. In IT security, for instance, a growing number of security firms are combining machine-learning approaches to build ultra-smart, continually evolving defenses against malicious software.


pages: 161 words: 39,526

Applied Artificial Intelligence: A Handbook for Business Leaders by Mariya Yao, Adelyn Zhou, Marlene Jia

Airbnb, Amazon Web Services, artificial general intelligence, autonomous vehicles, business intelligence, business process, call centre, chief data officer, computer vision, conceptual framework, en.wikipedia.org, future of work, industrial robot, Internet of things, iterative process, Jeff Bezos, job automation, Marc Andreessen, natural language processing, new economy, pattern recognition, performance metric, price discrimination, randomized controlled trial, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, skunkworks, software is eating the world, source of truth, speech recognition, statistical model, strong AI, technological singularity

The outpouring of responses provoked a government response and led UNICEF and Liberia’s Minister of Education to collaborate on a plan to stop the abuse of authority. In many parts of the world, citizens can’t utilize the feature-rich but data-intensive mobile apps that many of us enjoy due to bandwidth limitations and limited access to phones with up-to-date features. Being limited to voice calls and SMS means that technologies like natural language processing (NLP), dialog systems, and conversational bots become critically important to delivering value. Medical Diagnosis AI can dramatically streamline and improve medical care and our overall health and wellbeing. The fields of pathology and radiology, both of which rely largely on trained human eyes to spot anomalies, are being revolutionized by advancements in computer vision. Pathology is especially subjective, with studies showing that two pathologists assessing the same slide of biopsied tissue will only agree about 60 percent of the time.(25) Researchers at Houston Methodist Research Institute in Texas announced an AI system for diagnosing breast cancer that utilizes computer vision techniques optimized for medical image recognition,(26) which interpreted patient records with a 99 percent accuracy rate.(27) In radiology, 12.1 million mammograms are performed annually in the United States, but half yield false positive results, which means that one in two healthy women may be wrongly diagnosed with cancer.

Retrieved from http://ureport.in/story/194/ (25) Study Finds Computers Surpass Pathologists in Predicting Lung Cancer Type, Severity. (2016). The ASCO Post. Retrieved from http://www.ascopost.com/News/43849 (26) Patel, T. A., Puppala, M., Ogunti, R. O., Ensor, J. E., He, T., Shewale, J. B., & Chang, J. C. (2016). Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods. Cancer, 123(1), 114-121. doi:10.1002/cncr.30245 (27) As validated against a gold standard review conducted on a sample of records by the study’s co-authors, which required 50 to 70 hours. (28) Csail, A. C. (2017, October 16). Using artificial intelligence to improve early breast cancer detection. MIT News. Retrieved from http://news.mit.edu/2017/artificial-intelligence-early-breast-cancer-detection-1017 4.

In most cases, having and using a fantastic machine learning algorithm is less important than deploying a well-designed user experience (UX) for your products. Thoughtful UX design that delights users will drive up engagement, which in turn increases the interactions you can capture for future data and analysis. Thoughtful UX compensates for areas where AI capabilities may be lacking, such as in natural language processing (NLP) for open-domain conversation. In order to develop “thoughtful UX,” you’ll need strong product development and engineering talent as well as partners who have domain expertise and business acumen. A common pattern observed in both academia and industry engineering teams is their propensity to optimize for tactical wins over strategic initiatives. While brilliant minds worry about achieving marginal improvements in competitive benchmarks, the nitty-gritty issues of productizing and operationalizing AI for real-world use cases are often ignored.


pages: 301 words: 85,126

AIQ: How People and Machines Are Smarter Together by Nick Polson, James Scott

Air France Flight 447, Albert Einstein, Amazon Web Services, Atul Gawande, autonomous vehicles, availability heuristic, basic income, Bayesian statistics, business cycle, Cepheid variable, Checklist Manifesto, cloud computing, combinatorial explosion, computer age, computer vision, Daniel Kahneman / Amos Tversky, Donald Trump, Douglas Hofstadter, Edward Charles Pickering, Elon Musk, epigenetics, Flash crash, Grace Hopper, Gödel, Escher, Bach, Harvard Computers: women astronomers, index fund, Isaac Newton, John von Neumann, late fees, low earth orbit, Lyft, Magellanic Cloud, mass incarceration, Moneyball by Michael Lewis explains big data, Moravec's paradox, more computing power than Apollo, natural language processing, Netflix Prize, North Sea oil, p-value, pattern recognition, Pierre-Simon Laplace, ransomware, recommendation engine, Ronald Reagan, self-driving car, sentiment analysis, side project, Silicon Valley, Skype, smart cities, speech recognition, statistical model, survivorship bias, the scientific method, Thomas Bayes, Uber for X, uber lyft, universal basic income, Watson beat the top human players on Jeopardy!, young professional

See also artificial intelligence (AI); Bayes’s rule; neural networks Netflix and mammograms Manhattan Project market dominance mathematics computer science and conditional probability “math skill” Newton’s worst mathematical mistake Nightingale, Florence, and pattern recognition and principle of least squares square-root rule (de Moivre’s equation) suggestion engines and twenty questions game and word vectors and See also Bayes’s rule; neural networks; prediction rules maximum heart rate equations for Mayor’s Office of Data Analytics (MODA) medicine. See health care and medicine Medtronic Menger, Karl Microsoft Microsoft Azure modeling assumptions and deep-learning models imputation and Inception latent feature massive models missing data and model rust natural language processing and prediction rules as reality versus rules-based (top-down) models training the model Moneyball Moore’s law Moravec paradox Morgenstern, Oskar Musk, Elon natural language processing (NLP) ambiguity and bottom-up approach chatbots digital assistants future trends Google Translate growth of statistical NLP knowing how versus knowing that natural language revolution “New Deal” for human-machine linguistic interaction prediction rules and programing language revolution robustness and rule bloat and speech recognition top-down approach word co-location statistics word vectors naturally occurring radioactive materials (NORM) Netflix Crown, The (series) data scientists history of House of Cards (series) Netflix Prize for recommender system personalization recommender systems neural networks deep learning and Friends new episodes and Inception model prediction rules and New England Patriots Newton, Isaac Nightingale, Florence coxcomb diagram (1858) Crimean War and early years and training evidence-based medicine legacy of “lady with the lamp” medical statistics legacy of nursing reform legacy of Nvidia Obama, Barack Office of Scientific Research and Development parallax pattern recognition 
cucumber sorting input and output learning a pattern maximum heart rate and prediction rules and toilet paper theft and See also prediction rules PayPal personalization conditional probability and latent feature models and Netflix and Wald’s survivability recommendations for aircraft and See also recommender systems; suggestion engines philosophy Pickering, Edward C.

Back in 2009, for example, Secretary of State Hillary Clinton made an elaborate show of presenting the Russian foreign minister with a gift: a big red button that was meant to say “Reset” in both English and Russian, to symbolize the Obama administration’s policy of “pressing the reset button” on relations with Russia. The policy didn’t work out so well, though—and neither did the gift, which didn’t say “Reset” in Russian after all, but “Overcharge.” The second thing to keep in mind is that machines are getting better at language—fast. (You must admit that “wang bang” is a creative piece of boxing commentary.) Experts in AI use the term “natural language processing,” or NLP, to describe how we get computers to work with language. Over the last few years, you’ve been living through a period of tremendous growth in successful NLP systems: • Digital assistants like Amazon’s Echo and Google Home are far better than the clunky speech-to-text programs of just a few years ago. They can schedule appointments, make a grocery list, choose a song, or rack up charges on your credit card—all by voice, at a level of transcription accuracy that until recently would have seemed like science fiction

Harpy seemed to suggest that, with better rules and faster computers, human-level performance might be just around the corner.21 Yet these hoped-for improvements in speech recognition never materialized. In later tests involving real-world conditions, Harpy’s word-level accuracy fell to 37%. After five years, the U.S. government cut funding for the project. And today, pure rules-based systems for natural language processing have become vanishingly rare. In the end, they were never able to overcome three basic problems: rule bloat, robustness, and ambiguity. Problem 1: Rule Bloat First, it’s really hard to write down all the rules for natural languages. There are way too many of them, vastly more than any programming language. Although you may not know it, you can actually learn a lot of Python in a day.


Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data by Dipanjan Sarkar

bioinformatics, business intelligence, computer vision, continuous integration, en.wikipedia.org, general-purpose programming language, Guido van Rossum, information retrieval, Internet of things, invention of the printing press, iterative process, natural language processing, out of africa, performance metric, premature optimization, recommendation engine, self-driving car, semantic web, sentiment analysis, speech recognition, statistical model, text mining, Turing test, web application

Besides these popular corpora, there are a vast number of text corpora available that you can check and access with the nltk.corpus module. Thus, you can see how easy it is to access and use data from any text corpus with the help of Python and NLTK. This brings us to the end of our discussion about text corpora. The following sections cover some ground regarding NLP and text analytics. Natural Language Processing I’ve mentioned the term natural language processing (NLP) several times in this chapter. By now, you may have formed some idea about what NLP means. NLP is defined as a specialized field of computer science and engineering and artificial intelligence with roots in computational linguistics. It is primarily concerned with designing and building applications and systems that enable interaction between machines and natural languages evolved for use by humans.

Analytics, data science, and more recently text analytics came much later, perhaps around four or five years ago when the hype about Big Data and Analytics was getting bigger and crazier. Personally I think a lot of it is over-hyped, but a lot of it is also exciting and presents huge possibilities with regard to new jobs, new discoveries, and solving problems that were previously deemed impossible to solve. Natural Language Processing (NLP) has always caught my eye because the human brain and our cognitive abilities are really fascinating. The ability to communicate information, complex thoughts, and emotions with such little effort is staggering once you think about trying to replicate that ability in machines. Of course, we are advancing by leaps and bounds with regard to cognitive computing and artificial intelligence (AI), but we are not there yet.

Contents Chapter 1: Natural Language Basics Natural Language What Is Natural Language? The Philosophy of Language Language Acquisition and Usage Linguistics Language Syntax and Structure Words Phrases Clauses Grammar Word Order Typology Language Semantics Lexical Semantic Relations Semantic Networks and Models Representation of Semantics Text Corpora Corpora Annotation and Utilities Popular Corpora Accessing Text Corpora Natural Language Processing Machine Translation Speech Recognition Systems Question Answering Systems Contextual Recognition and Resolution Text Summarization Text Categorization Text Analytics Summary Chapter 2: Python Refresher Getting to Know Python The Zen of Python Applications: When Should You Use Python? Drawbacks: When Should You Not Use Python? Python Implementations and Versions Installation and Setup Which Python Version?


pages: 541 words: 109,698

Mining the Social Web: Finding Needles in the Social Haystack by Matthew A. Russell

Climategate, cloud computing, crowdsourcing, en.wikipedia.org, fault tolerance, Firefox, full text search, Georg Cantor, Google Earth, information retrieval, Mark Zuckerberg, natural language processing, NP-complete, Saturday Night Live, semantic web, Silicon Valley, slashdot, social graph, social web, statistical model, Steve Jobs, supply-chain management, text mining, traveling salesman, Turing test, web application

(Chapter 8 introduces a fundamental paradigm shift away from the tools in this chapter and should make the differences more pronounced than they may seem if you haven’t read that material yet.) If you’d like to try applying the techniques from this chapter to the Web (in general), you might want to check out Scrapy, an easy-to-use and mature web scraping and crawling framework. Chapter 8. Blogs et al.: Natural Language Processing (and Beyond) This chapter is a modest attempt to introduce Natural Language Processing (NLP) and apply it to the unstructured data in blogs. In the spirit of the prior chapters, it attempts to present the minimal level of detail required to empower you with a solid general understanding of an inherently complex topic, while also providing enough of a technical drill-down that you’ll be able to immediately get to work mining some data.

plotting geo data via microform.at and Google Maps, Plotting geo data via microform.at and Google Maps hRecipe, Slicing and Dicing Recipes (for the Health of It), Slicing and Dicing Recipes (for the Health of It) hReview data for recipe reviews, Collecting Restaurant Reviews, Collecting Restaurant Reviews popular, for embedding structured data into web pages, XFN and Friends semantic markup, XFN and Friends XFN, XFN and Friends, Exploring Social Connections with XFN, Brief analysis of breadth-first techniques using to explore social connections, Exploring Social Connections with XFN, Brief analysis of breadth-first techniques multiquery (FQL), Slicing and dicing data with FQL N n-gram similarity, Common Similarity Metrics for Clustering n-grams, Common Similarity Metrics for Clustering, Buzzing on Bigrams defined, Common Similarity Metrics for Clustering n-squared problem, Motivation for Clustering natural language processing, Frequency Analysis and Lexical Diversity (see NLP) Natural Language Toolkit, Frequency Analysis and Lexical Diversity (see NLTK) natural numbers, Elementary Set Operations nested query (FQL), Slicing and dicing data with FQL NetworkX, Installing Python Development Tools, Installing Python Development Tools, Extracting relationships from the tweets, Extracting relationships from the tweets, Constructing Friendship Graphs, Clique Detection and Analysis building graph describing retweet data, Extracting relationships from the tweets, Extracting relationships from the tweets exporting Redis friend/follower data to for graph analysis, Constructing Friendship Graphs finding cliques in Twitter friendship data, Clique Detection and Analysis installing, Installing Python Development Tools using to create graph of nodes and edges, Installing Python Development Tools *nix (Linux/Unix) environment, Or Not to Read This Book? 
NLP (natural language processing), Blogs et al.: Natural Language Processing (and Beyond), Closing Remarks, NLP: A Pareto-Like Introduction, A Brief Thought Exercise, A Typical NLP Pipeline with NLTK, A Typical NLP Pipeline with NLTK, Sentence Detection in Blogs with NLTK, Sentence Detection in Blogs with NLTK, Summarizing Documents, Analysis of Luhn’s Summarization Algorithm, Analysis of Luhn’s Summarization Algorithm, Entity-Centric Analysis: A Deeper Understanding of the Data, Quality of Analytics, Quality of Analytics entity-centric analysis, Entity-Centric Analysis: A Deeper Understanding of the Data, Quality of Analytics, Quality of Analytics quality of analytics, Quality of Analytics sentence detection in blogs with NLTK, Sentence Detection in Blogs with NLTK, Sentence Detection in Blogs with NLTK summarizing documents, Summarizing Documents, Analysis of Luhn’s Summarization Algorithm, Analysis of Luhn’s Summarization Algorithm analysis of Luhn’s algorithm, Analysis of Luhn’s Summarization Algorithm syntax and semantics, NLP: A Pareto-Like Introduction thought exercise, A Brief Thought Exercise typical NLP pipeline with NLTK, A Typical NLP Pipeline with NLTK, A Typical NLP Pipeline with NLTK NLTK (Natural Language Toolkit), Frequency Analysis and Lexical Diversity, Frequency Analysis and Lexical Diversity, What are people talking about right now?

Although these are not difficult to compute, we’d be better off installing a tool that offers a built-in frequency distribution and many other tools for text analysis. The Natural Language Toolkit (NLTK) is a popular module we’ll use throughout this book: it delivers a vast amount of tools for various kinds of text analytics, including the calculation of common metrics, information extraction, and natural language processing (NLP). Although NLTK isn’t necessarily state-of-the-art as compared to ongoing efforts in the commercial space and academia, it nonetheless provides a solid and broad foundation—especially if this is your first experience trying to process natural language. If your project is sufficiently sophisticated that the quality or efficiency that NLTK provides isn’t adequate for your needs, you have approximately three options, depending on the amount of time and money you are willing to put in: scour the open source space for a suitable alternative by running comparative experiments and benchmarks, churn through whitepapers and prototype your own toolkit, or license a commercial product.
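The built-in frequency distribution the paragraph refers to (NLTK's FreqDist) behaves much like the standard library's collections.Counter. A stdlib-only sketch of a frequency distribution and a lexical-diversity metric, assuming simple whitespace tokenization and an invented stand-in corpus:

```python
from collections import Counter

# A stand-in corpus; with NLTK this would come from a real text via FreqDist,
# which behaves much like Counter with extra helpers (plotting, hapaxes, etc.).
words = "the quick brown fox jumps over the lazy dog the fox".split()

freq = Counter(words)                       # frequency distribution
lexical_diversity = len(freq) / len(words)  # unique words / total words

print(freq.most_common(2))          # -> [('the', 3), ('fox', 2)]
print(round(lexical_diversity, 2))  # -> 0.73
```

Metrics like these are trivial to compute by hand, which is the paragraph's point: NLTK's value lies less in any single metric than in bundling them with tokenizers, corpora, and information-extraction tools.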


pages: 122 words: 29,286

Learning Scikit-Learn: Machine Learning in Python by Raúl Garreta, Guillermo Moncecchi

computer vision, Debian, Everything should be made as simple as possible, natural language processing, Occam's razor, Silicon Valley

Also, I would like to give a special mention to the open source Python and scikit-learn community for their dedication and professionalism in developing these beautiful tools. Guillermo Moncecchi is a Natural Language Processing researcher at the Universidad de la República of Uruguay. He received a PhD in Informatics from the Universidad de la República, Uruguay, and a PhD in Language Sciences from the Université Paris Ouest, France. He has participated in several international projects on NLP. He has almost 15 years of teaching experience in Automata Theory, Natural Language Processing, and Machine Learning. He also works as Head Developer at the Montevideo Council and has led the development of several public services for the council, particularly in the Geographical Information Systems area.

ISBN 978-1-78328-193-0 www.packtpub.com Cover Image by Faiz Fattohi (<faizfattohi@gmail.com>) Credits Authors Raúl Garreta Guillermo Moncecchi Reviewers Andreas Hjortgaard Danielsen Noel Dawe Gavin Hackeling Acquisition Editors Kunal Parikh Owen Roberts Commissioning Editor Deepika Singh Technical Editors Shashank Desai Iram Malik Copy Editors Sarang Chari Janbal Dharmaraj Aditya Nair Project Coordinator Aboli Ambardekar Proofreader Katherine Tarr Indexer Monica Ajmera Mehta Graphics Abhinash Sahu Production Co-ordinator Pooja Chiplunkar Cover Work Pooja Chiplunkar About the Authors Raúl Garreta is a Computer Engineer with extensive experience in the theory and application of Artificial Intelligence (AI), specializing in Machine Learning and Natural Language Processing (NLP). He has an entrepreneurial profile, with a strong interest in applying science, technology, and innovation to the Internet industry and startups. He has worked in many software companies, handling everything from video games to implantable medical devices. In 2009, he co-founded Tryolabs with the objective of applying AI to the development of intelligent software products, where he serves as CTO and Product Manager of the company.

Supervised Learning In Chapter 1, Machine Learning – A Gentle Introduction, we sketched the general idea of a supervised learning algorithm. We have the training data where each instance has an input (a set of attributes) and a desired output (a target class). Then we use this data to train a model that will predict the same target class for new unseen instances. Supervised learning methods are nowadays a standard tool in a wide range of disciplines, from medical diagnosis to natural language processing, image recognition, and searching for new particles at the Large Hadron Collider (LHC). In this chapter we will present several methods applied to several real-world examples by using some of the many algorithms implemented in scikit-learn. This chapter does not intend to substitute the scikit-learn reference, but is an introduction to the main supervised learning techniques and shows how they can be used to solve practical problems.
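The train-then-predict loop described above can be shown with a toy one-nearest-neighbour classifier on invented data. This is a bare-bones illustration of the supervised-learning idea, not scikit-learn's API (which wraps the same idea in fit/predict estimator objects):

```python
import math

# Toy labeled training set: (attributes, target class). Data is invented.
train = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((6.0, 6.5), "large"),
    ((5.5, 7.0), "large"),
]

def predict(x):
    """1-nearest-neighbour: assign the class of the closest training instance."""
    _, label = min(train, key=lambda item: math.dist(item[0], x))
    return label

# Predict the target class for new, unseen instances.
print(predict((1.1, 0.9)))  # -> small
print(predict((6.2, 6.8)))  # -> large
```

Every supervised method in the chapter follows this same contract — learn from (input, target) pairs, then map unseen inputs to targets — differing only in how the mapping is represented and fitted.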


pages: 347 words: 97,721

Only Humans Need Apply: Winners and Losers in the Age of Smart Machines by Thomas H. Davenport, Julia Kirby

AI winter, Andy Kessler, artificial general intelligence, asset allocation, Automated Insights, autonomous vehicles, basic income, Baxter: Rethink Robotics, business intelligence, business process, call centre, carbon-based life, Clayton Christensen, clockwork universe, commoditize, conceptual framework, dark matter, David Brooks, deliberate practice, deskilling, digital map, disruptive innovation, Douglas Engelbart, Edward Lloyd's coffeehouse, Elon Musk, Erik Brynjolfsson, estate planning, fixed income, follow your passion, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, game design, general-purpose programming language, global pandemic, Google Glasses, Hans Lippershey, haute cuisine, income inequality, index fund, industrial robot, information retrieval, intermodal, Internet of things, inventory management, Isaac Newton, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joi Ito, Khan Academy, knowledge worker, labor-force participation, lifelogging, longitudinal study, loss aversion, Mark Zuckerberg, Narrative Science, natural language processing, Norbert Wiener, nuclear winter, pattern recognition, performance metric, Peter Thiel, precariat, quantitative trading / quantitative finance, Ray Kurzweil, Richard Feynman, risk tolerance, Robert Shiller, Robert Shiller, Rodney Brooks, Second Machine Age, self-driving car, Silicon Valley, six sigma, Skype, social intelligence, speech recognition, spinning jenny, statistical model, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, superintelligent machines, supply-chain management, transaction costs, Tyler Cowen: Great Stagnation, Watson beat the top human players on Jeopardy!, Works Progress Administration, Zipcar

Finally, people who are interested in programming in this context should be interested in and knowledgeable about some aspect of this field’s key movements: artificial intelligence, natural language processing (NLP), machine learning, deep-learning neural networks, statistical analysis and data mining, and so forth. If you have a basic grounding in computer science and programming, it is possible to develop a sufficient understanding of these automation-oriented tools well into your career. Today there are many online courses related to this field. Stanford professors, for example, have created online courses with companies like Coursera and Udacity in such highly relevant fields as machine learning, natural language processing, algorithms, and robotics. You have to be pretty motivated to finish such courses, but it can be done. And as the Watson jobs we list above suggest, there are also plenty of IT-oriented jobs that don’t just involve programming.

The friend is an independent consultant, so it was slightly surprising to learn, by being cc’d on an email, that he employed an assistant, “Amy.” He wrote: Hi Amy, Would you please send an invite for Tom and me for Friday 9/19 at 9:30A.M. at Hi-Rise Cafe in Cambridge, MA. We will be meeting in person. Thanks, Judah Curiosity getting the best of him, Tom looked up the company in Amy’s email extension, @x.ai. It turns out X.ai is a company that uses “natural language processing” software to interpret text and schedule meetings via email. “Amy,” in other words, is automated. Meanwhile, other tools such as email and voice mail, word processing, online travel sites, and Internet search applications have been chipping away the rest of what used to be a secretarial job. Era Two automation doesn’t only affect office workers. It washes across the entire services-based economy that arose after massive productivity gains wiped out jobs in agriculture, then manufacturing.

Our observation is that the experts engaging in the current debate about knowledge work automation tend to fall into two camps—those who say we are heading inexorably toward permanent high levels of unemployment and those who are certain new job types will spring up to replace all the ones that go by the wayside—but that neither camp suggests to workers that there is much they can do personally about the situation. Our main mission in the next couple hundred pages is to persuade you, our knowledge worker reader, that you remain in charge of your destiny. You should be feeling a sense of agency and making decisions for yourself as to how you will deal with advancing automation. Over the past few years, even as every week brings news of some breakthrough in machine learning or natural language processing or visual image recognition, we’ve been learning from knowledge workers who are thriving. They’re redefining what it means to be more capable than computers, and doubling down on their very human strengths. As you’ll find in the chapters to come, these are not superhumans who can somehow process information more quickly than artificial intelligence or perform repetitive tasks as flawlessly as robots.


pages: 1,331 words: 163,200

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

Amazon Mechanical Turk, Anton Chekhov, combinatorial explosion, computer vision, constrained optimization, correlation coefficient, crowdsourcing, don't repeat yourself, Elon Musk, en.wikipedia.org, friendly AI, ImageNet competition, information retrieval, iterative process, John von Neumann, Kickstarter, natural language processing, Netflix Prize, NP-complete, optical character recognition, P = NP, p-value, pattern recognition, pull request, recommendation engine, self-driving car, sentiment analysis, SpamAssassin, speech recognition, stochastic process

Pac-Man Using Deep Q-Learning min_after_dequeue, RandomShuffleQueue MNIST dataset, MNIST-MNIST model parallelism, Model Parallelism-Model Parallelism model parameters, Gradient Descent, Batch Gradient Descent, Early Stopping, Under the Hood, Quadratic Programming, Creating Your First Graph and Running It in a Session, Construction Phase, Training RNNsdefining, Model-based learning model selection, Model-based learning model zoos, Model Zoos model-based learning, Model-based learning-Model-based learning modelsanalyzing, Analyze the Best Models and Their Errors-Analyze the Best Models and Their Errors evaluating on test set, Evaluate Your System on the Test Set-Evaluate Your System on the Test Set moments, Adam Optimization Momentum optimization, Momentum optimization-Momentum optimization Monte Carlo tree search, Policy Gradients Multi-Layer Perceptrons (MLP), Introduction to Artificial Neural Networks, The Perceptron-Multi-Layer Perceptron and Backpropagation, Neural Network Policiestraining with TF.Learn, Training an MLP with TensorFlow’s High-Level API multiclass classifiers, Multiclass Classification-Multiclass Classification Multidimensional Scaling (MDS), Other Dimensionality Reduction Techniques multilabel classifiers, Multilabel Classification-Multilabel Classification Multinomial Logistic Regression (see Softmax Regression) multinomial(), Neural Network Policies multioutput classifiers, Multioutput Classification-Multioutput Classification MultiRNNCell, Distributing a Deep RNN Across Multiple GPUs multithreaded readers, Multithreaded readers using a Coordinator and a QueueRunner-Multithreaded readers using a Coordinator and a QueueRunner multivariate regression, Frame the Problem N naive Bayes classifiers, Multiclass Classification name scopes, Name Scopes natural language processing (NLP), Recurrent Neural Networks, Natural Language Processing-An Encoder–Decoder Network for Machine Translationencoder-decoder network for machine translation, An 
Encoder–Decoder Network for Machine Translation-An Encoder–Decoder Network for Machine Translation TensorFlow tutorials, Natural Language Processing, An Encoder–Decoder Network for Machine Translation word embeddings, Word Embeddings-Word Embeddings Nesterov Accelerated Gradient (NAG), Nesterov Accelerated Gradient-Nesterov Accelerated Gradient Nesterov momentum optimization, Nesterov Accelerated Gradient-Nesterov Accelerated Gradient network topology, Fine-Tuning Neural Network Hyperparameters neural network hyperparameters, Fine-Tuning Neural Network Hyperparameters-Activation Functionsactivation functions, Activation Functions neurons per hidden layer, Number of Neurons per Hidden Layer number of hidden layers, Number of Hidden Layers-Number of Hidden Layers neural network policies, Neural Network Policies-Neural Network Policies neuronsbiological, From Biological to Artificial Neurons-Biological Neurons logical computations with, Logical Computations with Neurons neuron_layer(), Construction Phase next_batch(), Execution Phase No Free Lunch theorem, Testing and Validating node edges, Visualizing the Graph and Training Curves Using TensorBoard nonlinear dimensionality reduction (NLDR), LLE(see also Kernel PCA; LLE (Locally Linear Embedding)) nonlinear SVM classification, Nonlinear SVM Classification-Computational Complexitycomputational complexity, Computational Complexity Gaussian RBF kernel, Gaussian RBF Kernel-Gaussian RBF Kernel with polynomial features, Nonlinear SVM Classification-Polynomial Kernel polynomial kernel, Polynomial Kernel-Polynomial Kernel similarity features, adding, Adding Similarity Features-Adding Similarity Features nonparametric models, Regularization Hyperparameters nonresponse bias, Nonrepresentative Training Data nonsaturating activation functions, Nonsaturating Activation Functions-Nonsaturating Activation Functions normal distribution (see Gaussian distribution) Normal Equation, The Normal Equation-Computational Complexity 
normalization, Feature Scaling normalized exponential, Softmax Regression norms, Select a Performance Measure notations, Select a Performance Measure-Select a Performance Measure NP-Complete problems, The CART Training Algorithm null hypothesis, Regularization Hyperparameters numerical differentiation, Numerical Differentiation NumPy, Create the Workspace NumPy arrays, Handling Text and Categorical Attributes NVidia Compute Capability, Installation nvidia-smi, Managing the GPU RAM n_components, Choosing the Right Number of Dimensions O observation space, Neural Network Policies off-policy algorithm, Temporal Difference Learning and Q-Learning offline learning, Batch learning one-hot encoding, Handling Text and Categorical Attributes one-versus-all (OvA) strategy, Multiclass Classification, Softmax Regression, Exercises one-versus-one (OvO) strategy, Multiclass Classification online learning, Online learning-Online learning online SVMs, Online SVMs-Online SVMs OpenAI Gym, Introduction to OpenAI Gym-Introduction to OpenAI Gym operation_timeout_in_ms, In-Graph Versus Between-Graph Replication Optical Character Recognition (OCR), The Machine Learning Landscape optimal state value, Markov Decision Processes optimizers, Faster Optimizers-Learning Rate SchedulingAdaGrad, AdaGrad-AdaGrad Adam optimization, Faster Optimizers, Adam Optimization-Adam Optimization Gradient Descent (see Gradient Descent optimizer) learning rate scheduling, Learning Rate Scheduling-Learning Rate Scheduling Momentum optimization, Momentum optimization-Momentum optimization Nesterov Accelerated Gradient (NAG), Nesterov Accelerated Gradient-Nesterov Accelerated Gradient RMSProp, RMSProp out-of-bag evaluation, Out-of-Bag Evaluation-Out-of-Bag Evaluation out-of-core learning, Online learning out-of-memory (OOM) errors, Static Unrolling Through Time out-of-sample error, Testing and Validating OutOfRangeError, Reading the training data directly from the graph, Multithreaded readers using a Coordinator 
and a QueueRunner output gate, LSTM Cell output layer, Multi-Layer Perceptron and Backpropagation OutputProjectionWrapper, Training to Predict Time Series-Training to Predict Time Series output_put_keep_prob, Applying Dropout overcomplete autoencoder, Unsupervised Pretraining Using Stacked Autoencoders overfitting, Overfitting the Training Data-Overfitting the Training Data, Create a Test Set, Soft Margin Classification, Gaussian RBF Kernel, Regularization Hyperparameters, Regression, Number of Neurons per Hidden Layeravoiding through regularization, Avoiding Overfitting Through Regularization-Data Augmentation P p-value, Regularization Hyperparameters PaddingFIFOQueue, PaddingFifoQueue Pandas, Create the Workspace, Download the Datascatter_matrix, Looking for Correlations-Looking for Correlations parallel distributed computing, Distributing TensorFlow Across Devices and Servers-Exercisesdata parallelism, Data Parallelism-TensorFlow implementation in-graph versus between-graph replication, In-Graph Versus Between-Graph Replication-Model Parallelism model parallelism, Model Parallelism-Model Parallelism multiple devices across multiple servers, Multiple Devices Across Multiple Servers-Other convenience functionsasynchronous communication using queues, Asynchronous Communication Using TensorFlow Queues-PaddingFifoQueue loading training data, Loading Data Directly from the Graph-Other convenience functions master and worker services, The Master and Worker Services opening a session, Opening a Session pinning operations across tasks, Pinning Operations Across Tasks sharding variables, Sharding Variables Across Multiple Parameter Servers sharing state across sessions, Sharing State Across Sessions Using Resource Containers-Sharing State Across Sessions Using Resource Containers multiple devices on a single machine, Multiple Devices on a Single Machine-Control Dependenciescontrol dependencies, Control Dependencies installation, Installation-Installation managing the GPU 
RAM, Managing the GPU RAM-Managing the GPU RAM parallel execution, Parallel Execution-Parallel Execution placing operations on devices, Placing Operations on Devices-Soft placement one neural network per device, One Neural Network per Device-One Neural Network per Device parameter efficiency, Number of Hidden Layers parameter matrix, Softmax Regression parameter server (ps), Multiple Devices Across Multiple Servers parameter space, Gradient Descent parameter vector, Linear Regression, Gradient Descent, Training and Cost Function, Softmax Regression parametric models, Regularization Hyperparameters partial derivative, Batch Gradient Descent partial_fit(), Incremental PCA Pearson's r, Looking for Correlations peephole connections, Peephole Connections penalties (see rewards, in RL) percentiles, Take a Quick Look at the Data Structure Perceptron convergence theorem, The Perceptron Perceptrons, The Perceptron-Multi-Layer Perceptron and Backpropagationversus Logistic Regression, The Perceptron training, The Perceptron-The Perceptron performance measures, Select a Performance Measure-Select a Performance Measureconfusion matrix, Confusion Matrix-Confusion Matrix cross-validation, Measuring Accuracy Using Cross-Validation-Measuring Accuracy Using Cross-Validation precision and recall, Precision and Recall-Precision/Recall Tradeoff ROC (receiver operating characteristic) curve, The ROC Curve-The ROC Curve performance scheduling, Learning Rate Scheduling permutation(), Create a Test Set PG algorithms, Policy Gradients photo-hosting services, Semisupervised learning pinning operations, Pinning Operations Across Tasks pip, Create the Workspace Pipeline constructor, Transformation Pipelines-Select and Train a Model pipelines, Frame the Problem placeholder nodes, Feeding Data to the Training Algorithm placers (see simple placer; dynamic placer) policy, Policy Search policy gradients, Policy Search (see PG algorithms) policy space, Policy Search polynomial features, adding, 
Nonlinear SVM Classification-Polynomial Kernel polynomial kernel, Polynomial Kernel-Polynomial Kernel, Kernelized SVM Polynomial Regression, Training Models, Polynomial Regression-Polynomial Regressionlearning curves in, Learning Curves-Learning Curves pooling kernel, Pooling Layer pooling layer, Pooling Layer-Pooling Layer power scheduling, Learning Rate Scheduling precision, Confusion Matrix precision and recall, Precision and Recall-Precision/Recall TradeoffF-1 score, Precision and Recall-Precision and Recall precision/recall (PR) curve, The ROC Curve precision/recall tradeoff, Precision/Recall Tradeoff-Precision/Recall Tradeoff predetermined piecewise constant learning rate, Learning Rate Scheduling predict(), Data Cleaning predicted class, Confusion Matrix predictions, Confusion Matrix-Confusion Matrix, Decision Function and Predictions-Decision Function and Predictions, Making Predictions-Estimating Class Probabilities predictors, Supervised learning, Data Cleaning preloading training data, Preload the data into a variable PReLU (parametric leaky ReLU), Nonsaturating Activation Functions preprocessed attributes, Take a Quick Look at the Data Structure pretrained layers reuse, Reusing Pretrained Layers-Pretraining on an Auxiliary Taskauxiliary task, Pretraining on an Auxiliary Task-Pretraining on an Auxiliary Task caching frozen layers, Caching the Frozen Layers freezing lower layers, Freezing the Lower Layers model zoos, Model Zoos other frameworks, Reusing Models from Other Frameworks TensorFlow model, Reusing a TensorFlow Model-Reusing a TensorFlow Model unsupervised pretraining, Unsupervised Pretraining-Unsupervised Pretraining upper layers, Tweaking, Dropping, or Replacing the Upper Layers Pretty Tensor, Up and Running with TensorFlow primal problem, The Dual Problem principal component, Principal Components Principal Component Analysis (PCA), PCA-Randomized PCAexplained variance ratios, Explained Variance Ratio finding principal components, Principal 
Components-Principal Components for compression, PCA for Compression-Incremental PCA Incremental PCA, Incremental PCA-Randomized PCA Kernel PCA (kPCA), Kernel PCA-Selecting a Kernel and Tuning Hyperparameters projecting down to d dimensions, Projecting Down to d Dimensions Randomized PCA, Randomized PCA Scikit Learn for, Using Scikit-Learn variance, preserving, Preserving the Variance-Preserving the Variance probabilistic autoencoders, Variational Autoencoders probabilities, estimating, Estimating Probabilities-Estimating Probabilities, Estimating Class Probabilities producer functions, Other convenience functions projection, Projection-Projection propositional logic, From Biological to Artificial Neurons pruning, Regularization Hyperparameters, Symbolic Differentiation Pythonisolated environment in, Create the Workspace-Create the Workspace notebooks in, Create the Workspace-Download the Data pickle, Better Evaluation Using Cross-Validation pip, Create the Workspace Q Q-Learning algorithm, Temporal Difference Learning and Q-Learning-Learning to Play Ms.

Equation 14-4 summarizes how to compute the cell’s state at each time step for a single instance.

Equation 14-4. GRU computations

Creating a GRU cell in TensorFlow is trivial:

gru_cell = tf.contrib.rnn.GRUCell(num_units=n_neurons)

LSTM or GRU cells are one of the main reasons behind the success of RNNs in recent years, in particular for applications in natural language processing (NLP).

Natural Language Processing

Most of the state-of-the-art NLP applications, such as machine translation, automatic summarization, parsing, sentiment analysis, and more, are now based (at least in part) on RNNs. In this last section, we will take a quick look at what a machine translation model looks like. This topic is very well covered by TensorFlow’s awesome Word2Vec and Seq2Seq tutorials, so you should definitely check them out.
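The body of Equation 14-4 did not survive in this excerpt, but the standard GRU update it describes can be sketched for a single scalar unit in plain Python. This is our own illustrative sketch, not the book's code: the weight names W, U, b and the reduction to scalars are assumptions made to keep it dependency-free (a real cell uses weight matrices), and the interpolation convention h = z·h_prev + (1−z)·g follows the book's formulation (some papers swap the roles of z and 1−z).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, W, U, b):
    """One GRU time step for a single scalar input and state.

    W, U, b are dicts keyed by 'z' (update gate), 'r' (reset gate),
    and 'g' (candidate state); all weights are plain floats here.
    """
    z = sigmoid(W['z'] * x + U['z'] * h_prev + b['z'])          # update gate
    r = sigmoid(W['r'] * x + U['r'] * h_prev + b['r'])          # reset gate
    g = math.tanh(W['g'] * x + U['g'] * (r * h_prev) + b['g'])  # candidate state
    # New state interpolates between the previous state and the candidate:
    # the update gate decides how much of the old state to keep.
    return z * h_prev + (1.0 - z) * g
```

With all weights at zero, both gates sit at 0.5 and the candidate is 0, so the state simply halves at each step, which is a quick sanity check on the gating arithmetic.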

Pac-Man Using Deep Q-Learning R Radial Basis Function (RBF), Adding Similarity Features Random Forests, Better Evaluation Using Cross-Validation-Grid Search, Multiclass Classification, Decision Trees, Instability, Ensemble Learning and Random Forests, Random Forests-Feature ImportanceExtra-Trees, Extra-Trees feature importance, Feature Importance-Feature Importance random initialization, Gradient Descent, Batch Gradient Descent, Stochastic Gradient Descent, Vanishing/Exploding Gradients Problems Random Patches and Random Subspaces, Random Patches and Random Subspaces randomized leaky ReLU (RReLU), Nonsaturating Activation Functions Randomized PCA, Randomized PCA randomized search, Randomized Search, Fine-Tuning Neural Network Hyperparameters RandomShuffleQueue, RandomShuffleQueue, Reading the training data directly from the graph random_uniform(), Manually Computing the Gradients reader operations, Reading the training data directly from the graph recall, Confusion Matrix recognition network, Efficient Data Representations reconstruction error, PCA for Compression reconstruction loss, Efficient Data Representations, TensorFlow Implementation, Variational Autoencoders reconstruction pre-image, Selecting a Kernel and Tuning Hyperparameters reconstructions, Efficient Data Representations recurrent neural networks (RNNs), Recurrent Neural Networks-Exercisesdeep RNNs, Deep RNNs-The Difficulty of Training over Many Time Steps exploration policies, Exploration Policies GRU cell, GRU Cell-GRU Cell input and output sequences, Input and Output Sequences-Input and Output Sequences LSTM cell, LSTM Cell-GRU Cell natural language processing (NLP), Natural Language Processing-An Encoder–Decoder Network for Machine Translation in TensorFlow, Basic RNNs in TensorFlow-Handling Variable-Length Output Sequencesdynamic unrolling through time, Dynamic Unrolling Through Time static unrolling through time, Static Unrolling Through Time-Static Unrolling Through Time variable length input 
sequences, Handling Variable Length Input Sequences variable length output sequences, Handling Variable-Length Output Sequences training, Training RNNs-Creative RNNbackpropagation through time (BPTT), Training RNNs creative sequences, Creative RNN sequence classifiers, Training a Sequence Classifier-Training a Sequence Classifier time series predictions, Training to Predict Time Series-Training to Predict Time Series recurrent neurons, Recurrent Neurons-Input and Output Sequencesmemory cells, Memory Cells reduce_mean(), Construction Phase reduce_sum(), TensorFlow Implementation-TensorFlow Implementation, Variational Autoencoders, Learning to Play Ms.


pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel

Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, butter production in bangladesh, call centre, Charles Lindbergh, commoditize, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, en.wikipedia.org, Erik Brynjolfsson, Everything should be made as simple as possible, experimental subject, Google Glasses, happiness index / gross national happiness, job satisfaction, Johann Wolfgang von Goethe, lifelogging, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mass immigration, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, risk-adjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, Shai Danziger, software as a service, speech recognition, statistical model, Steven Levy, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Davenport, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra, zero-sum game

self-driving cars spam filtering Google Adwords Google Flu Trends Google Glass Google Page Rank government data storage by fraud detection for invoices PA for public access to data GPS data grades, predicting Granger, Clive grant awards, predicting Greenspan, Alan Grockit Groundhog Day (film) Grundhoefer, Michael H hackers, predicting Halder, Gitali HAL (intelligent computer) Hansell, Saul happiness, social effect and Harbor Sweets Harcourt, Bernard Harrah’s Las Vegas Harris, Jeanne Harvard Medical School Harvard University Hastings, Reed healthcare death predictions in health risks, predicting hospital admissions, predicting influenza, predicting medical research, predicting in medical treatments, risks for wrong predictions in medical treatments, testing persuasion in PA for personalized medicine, uplift modeling applications for health insurance companies, PA for Hebrew University Heisenberg, Werner Karl Helle, Eva Helsinki Brain Research Centre Hennessey, Kathleen Heraclitus Heritage Health Prize Heritage Provider Network Hewlett Foundation Hewlett-Packard (HP) employee data used by financial savings and benefits of PA Global Business Services (GBS) quitting and Flight Risks, predicting sales leads, predicting turnover rates at warranty claims and fraud detection High Anxiety (film) HIV progression, predicting HIV treatments, uplift modeling for Hollifield, Stephen Holmes, Sherlock hormone replacement, coronary disease and hospital admissions, predicting Hotmail.com House (TV show) “How Companies Learn Your Secrets” (Duhigg) Howe, Jeff HP. See Hewlett-Packard (HP) Hubbard, Douglas human behavior collective intelligence consumer behavior insights emotions and mood prediction mistakes, predicting social effect and human genome human language inappropriate comments, predicting mood predictions and natural language processing (NLP) PA for persuasion and influence in human resources. 
See employees and staff I IBM corporate roll-ups Deep Blue computer DeepQA project Iambic IBM AI mind-reading technology natural language processing research sales leads, predicting student performance PA contest T. J. Watson Research Center value of See also Watson computer ID3 impact modeling. See uplift modeling Imperium incremental impact modeling. See uplift modeling incremental response modeling. See uplift modeling India Indiana University Induction Effect, The induction vs. deduction inductive bias infidelity, predicting Infinity Insurance influence.

2001: A Space Odyssey’s smart and talkative computer, HAL, bears a legendary, disputed connection in nomenclature to IBM (just take each letter back one position in the alphabet); however, author Arthur C. Clarke has strenuously denied that this was intentional. Ask IBM researchers whether their question answering Watson system is anything like HAL, which goes famously rogue in the film, and they’ll quickly reroute your comparison toward the obedient computers of Star Trek. The field of research that develops technology to work with human language is natural language processing (NLP, aka computational linguistics). In commercial application, it’s known as text analytics. These fields develop analytical methods especially designed to operate across the written word. If data is all Earth’s water, textual data is the part known as “the ocean.” Often said to compose 80 percent of all data, it’s everything we the human race know that we’ve bothered to write down.

They were tackling the breadth of human language that stretches beyond the phrasing of each question to include a sea of textual sources, from which the answer to each question must be extracted. With this ambition, IBM had truly doubled down. I would have thought success impossible. After witnessing the world’s best researchers attempting to tackle the task through the 1990s (during which I spent six years in natural language processing research, as well as a summer at the same IBM Research center that bore Watson), I was ready to throw up my hands. Language is so tough that it seemed virtually impossible even to program a computer to answer questions within a limited domain of knowledge such as movies or wines. Yet IBM had taken on the unconstrained, open field of questions across any domain. Meeting this challenge would demonstrate such a great leap toward humanlike capabilities that it invokes the “I” word: intelligence.


pages: 586 words: 186,548

Architects of Intelligence by Martin Ford

3D printing, agricultural Revolution, AI winter, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, barriers to entry, basic income, Baxter: Rethink Robotics, Bayesian statistics, bitcoin, business intelligence, business process, call centre, cloud computing, cognitive bias, Colonization of Mars, computer vision, correlation does not imply causation, crowdsourcing, DARPA: Urban Challenge, deskilling, disruptive innovation, Donald Trump, Douglas Hofstadter, Elon Musk, Erik Brynjolfsson, Ernest Rutherford, Fellow of the Royal Society, Flash crash, future of work, gig economy, Google X / Alphabet X, Gödel, Escher, Bach, Hans Rosling, ImageNet competition, income inequality, industrial robot, information retrieval, job automation, John von Neumann, Law of Accelerating Returns, life extension, Loebner Prize, Mark Zuckerberg, Mars Rover, means of production, Mitch Kapor, natural language processing, new economy, optical character recognition, pattern recognition, phenotype, Productivity paradox, Ray Kurzweil, recommendation engine, Robert Gordon, Rodney Brooks, Sam Altman, self-driving car, sensor fusion, sentiment analysis, Silicon Valley, smart cities, social intelligence, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, Ted Kaczynski, The Rise and Fall of American Growth, theory of mind, Thomas Bayes, Travis Kalanick, Turing test, universal basic income, Wall-E, Watson beat the top human players on Jeopardy!, women in the workforce, working-age population, zero-sum game, Zipcar

From that insight, I came to realize that as human beings we don’t in general ever speak in a sequence of isolated utterances, but that there’s always a larger structure, much like there is for a journal article, a newspaper article, a textbook, even for this book, and that we can model that structure. This was my first major contribution to natural-language processing and AI. MARTIN FORD: You’ve touched on one of the natural language breakthroughs that you’re most known for: an effort to somehow model a conversation. The idea that a conversation can be computed, and that there’s some structure within a conversation that can be represented mathematically. I assume that this has become very important, because we’ve seen a lot of progress in the field. Maybe you could talk about some of the work you’ve done there and how things have progressed. Has it astonished you where things are at now in terms of natural language processing, compared to where they were back when you started your research? BARBARA GROSZ: It absolutely has astonished me.

Because scientists are inundated with more and more publications, we realize that scientists, just like all of us when we’re experiencing information overload, really need help in cutting through that clutter; and that’s what Semantic Scholar does. It uses machine learning and natural language processing, along with various AI techniques, to help scientists figure out what they want to read and how to locate results within papers. MARTIN FORD: Does Mosaic involve symbolic logic? I know there was an older project called Cyc that was a very labor-intensive process, where people would try to write down all the logical rules, such as how objects related, and I think it became kind of unwieldy. Is that the kind of thing you’re doing with Mosaic? OREN ETZIONI: The problem with the Cyc project is that, over 35 years in, it’s really been a struggle for them, for exactly the reasons you said. But in our case, we’re hoping to leverage more modern AI techniques—crowdsourcing, natural language processing, machine learning, and machine vision—in order to acquire knowledge in a different way.

MARTIN FORD: Is Facebook working on building systems that can actually carry out a conversation? YANN LECUN: What I’ve mentioned so far are the fundamental topics of research, but there are a whole bunch of application areas. Facebook is very active in computer vision, and I think we can claim to have the best computer vision research group in the world. It’s a mature group and there are a lot of really cool activities there. We’re putting quite a lot of work into natural language processing, and that includes translation, summarization, text categorization—figuring out what topic a text talks about, as well as dialog systems. Actually, dialog systems are a very important area of research for virtual assistants, question and answering systems, and so on. MARTIN FORD: Do you anticipate the creation of an AI that someday could pass the Turing test? YANN LECUN: It’s going to happen at some point, but the Turing test is not actually an interesting test.


pages: 294 words: 81,292

Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat

AI winter, AltaVista, Amazon Web Services, artificial general intelligence, Asilomar, Automated Insights, Bayesian statistics, Bernie Madoff, Bill Joy: nanobots, brain emulation, cellular automata, Chuck Templeton: OpenTable:, cloud computing, cognitive bias, commoditize, computer vision, cuban missile crisis, Daniel Kahneman / Amos Tversky, Danny Hillis, data acquisition, don't be evil, drone strike, Extropian, finite state, Flash crash, friendly AI, friendly fire, Google Glasses, Google X / Alphabet X, Isaac Newton, Jaron Lanier, John Markoff, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, Loebner Prize, lone genius, mutually assured destruction, natural language processing, Nicholas Carr, optical character recognition, PageRank, pattern recognition, Peter Thiel, prisoner's dilemma, Ray Kurzweil, Rodney Brooks, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Silicon Valley, Singularitarianism, Skype, smart grid, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, Stuxnet, superintelligent machines, technological singularity, The Coming Technological Singularity, Thomas Bayes, traveling salesman, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, zero day

When I questioned him at an AGI conference, Google’s Director of Research Peter Norvig, coauthor of the classic AI textbook, Artificial Intelligence: A Modern Approach, said Google wasn’t looking into AGI. He compared the quest to NASA’s plan for manned interplanetary travel. It doesn’t have one. But it will continue to develop the component sciences of traveling in space—rocketry, robotics, astronomy, et cetera—and one day all the pieces will come together, and a shot at Mars will look feasible. Likewise, narrow AI projects do lots of intelligent jobs like search, voice recognition, natural language processing, visual perception, data mining, and much more. Separately they are well-funded, powerful tools, dramatically improving each year. Together they advance the computer sciences that will benefit AGI systems. However, Norvig told me, no AGI program for Google exists. But compare that statement to what his boss, Google cofounder Larry Page said at a London conference called Zeitgeist ’06: People always make the assumption that we’re done with search.

Cyc’s inference engine understands queries and generates answers from its vast knowledge database. Created by AI pioneer Douglas Lenat, Cyc is the largest AI project in history, and probably the best funded, with $50 million in grants from government agencies, including DARPA, since 1984. Cyc’s creators continue to improve its database and inference engine so it can better process “natural language,” or everyday written language. Once it has acquired a sufficient natural language processing (NLP) capability, its creators will start it reading, and comprehending, all the Web pages on the Internet. Another contender for most knowledgeable knowledge database is already doing that. Carnegie Mellon University’s NELL, the Never-Ending Language Learning system, knows more than 390,000 facts about the world. Operating 24/7, NELL—a beneficiary of DARPA funding—scans hundreds of millions of Web pages for patterns of text so it can learn even more.

Many know that DARPA (then called ARPA) funded the research that invented the Internet (initially called ARPANET), as well as the researchers who developed the now ubiquitous GUI, or Graphical User Interface, a version of which you probably see every time you use a computer or smart phone. But the agency was also a major backer of parallel processing hardware and software, distributed computing, computer vision, and natural language processing (NLP). These contributions to the foundations of computer science are as important to AI as the results-oriented funding that characterizes DARPA today. How is DARPA spending its money? A recent annual budget allocates $61.3 million to a category called Machine Learning, and $49.3 million to Cognitive Computing. But AI projects are also funded under Information and Communication Technology, $400.5 million, and Classified Programs, $107.2 million.


pages: 398 words: 86,855

Bad Data Handbook by Q. Ethan McCallum

Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, Chuck Templeton: OpenTable:, cloud computing, cognitive dissonance, combinatorial explosion, commoditize, conceptual framework, database schema, DevOps, en.wikipedia.org, Firefox, Flash crash, Gini coefficient, illegal immigration, iterative process, labor-force participation, loose coupling, natural language processing, Netflix Prize, quantitative trading / quantitative finance, recommendation engine, selection bias, sentiment analysis, statistical model, supply-chain management, survivorship bias, text mining, too big to fail, web application

Online business reviews are one of the major input signals we use to determine these classifications. Reviews can tell us the positive or negative sentiment of the reviewer, as well as what they specifically care about, such as quality of service, ambience, and value. When we aggregate reviews, we can learn what’s popular about the place and why people like or dislike it. We use many other signals besides reviews, but with the proper application of natural language processing,[9] reviews are a rich source of significant information. Getting Reviews To get reviews, we use APIs where possible, but most reviews are found using good old-fashioned web scraping. If you can use an API like CityGrid[10] to get the data you need, it will make your life much easier, because while scraping isn’t necessarily difficult, it can be very frustrating. Website HTML can change without notice, and only the simplest or most advanced scraping logic will remain unaffected.
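The fragility of "good old-fashioned web scraping" described above comes from hardcoding assumptions about a site's markup. A minimal sketch using only Python's standard-library html.parser shows the idea; the `<div class="review">` markup and the sample page are hypothetical, since every site marks up its reviews differently (which is exactly why scrapers break without notice).

```python
from html.parser import HTMLParser

class ReviewExtractor(HTMLParser):
    """Collect the text of every <div class="review"> element.

    The class name "review" is an assumed example, not any real
    site's markup: change the site's HTML and this scraper breaks.
    """
    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "review") in attrs:
            self.in_review = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_review = False

    def handle_data(self, data):
        if self.in_review and data.strip():
            self.reviews.append(data.strip())

# A stand-in for a fetched page; a real scraper would download this.
page = ('<html><body>'
        '<div class="review">Great service!</div>'
        '<div class="ad">Buy now</div>'
        '<div class="review">Too noisy.</div>'
        '</body></html>')

parser = ReviewExtractor()
parser.feed(page)
print(parser.reviews)  # ['Great service!', 'Too noisy.']
```

An API such as CityGrid sidesteps all of this by returning structured data, which is why the authors prefer it whenever one is available.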

For sentiment analysis, a feature set is a piece of text, like a review, and the possible labels can be pos for positive text, and neg for negative text. Such a sentiment classifier could be run over a business’s reviews in order to calculate an overall sentiment, and to make up for any missing rating information. Sentiment Classification NLTK,[12] Python’s Natural Language ToolKit, is a very useful programming library for doing natural language processing and text classification.[13] It also comes with many corpora that you can use for training and testing. One of these is the movie_reviews corpus,[14] and if you’re just learning how to do sentiment classification, this is a good corpus to start with. It is organized into two directories, pos and neg. In each directory is a set of files containing movie reviews, with every review separated by a blank line.
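The pos/neg classification workflow described above can be sketched without NLTK (whose movie_reviews corpus requires a separate download) using a small bag-of-words Naive Bayes classifier written from scratch. The toy training reviews below are invented stand-ins for the corpus's pos and neg directories; the function names are ours.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    """Count label frequencies and per-label word frequencies.

    labeled_docs: list of (list_of_words, label) pairs, mimicking
    the pos/neg layout of NLTK's movie_reviews corpus.
    """
    label_counts = Counter(label for _, label in labeled_docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def classify(words, label_counts, word_counts, vocab):
    """Return the label with the highest log-posterior score."""
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        # log prior + log likelihoods with add-one (Laplace) smoothing
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train = [("great fun great acting".split(), "pos"),
         ("wonderful story".split(), "pos"),
         ("boring plot awful acting".split(), "neg"),
         ("awful waste".split(), "neg")]
model = train_nb(train)
print(classify("great story".split(), *model))   # pos
print(classify("boring awful".split(), *model))  # neg
```

NLTK's NaiveBayesClassifier packages the same idea behind a training interface and works directly with its bundled corpora, so in practice you would start there.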

If every other signal is mostly positive, then showing negative reviews is a disservice to our users and results in a poor experience. By choosing to show only positive reviews, the data, design, and user experience are all congruent, helping our users choose from the best options available based on their own preferences, without having to do any mental filtering of negative opinions. Lessons Learned One important lesson for machine learning and statistical natural language processing enthusiasts: it’s very important to train your own models on your own data. If I had used classifiers trained on the standard movie_reviews corpus, I would never have gotten these results. Movie reviews are simply different than local business reviews. In fact, it might be the case that you’d get even better results by segmenting businesses by type, and creating classifiers for each type of business.


pages: 579 words: 76,657

Data Science from Scratch: First Principles with Python by Joel Grus

correlation does not imply causation, natural language processing, Netflix Prize, p-value, Paul Graham, recommendation engine, SpamAssassin, statistical model

Three bottom-up clusters using max distance For Further Exploration scikit-learn has an entire module sklearn.cluster that contains several clustering algorithms including KMeans and the Ward hierarchical clustering algorithm (which uses a different criterion for merging clusters than ours did). SciPy has two clustering models scipy.cluster.vq (which does k-means) and scipy.cluster.hierarchy (which has a variety of hierarchical clustering algorithms). Chapter 20. Natural Language Processing They have been at a great feast of languages, and stolen the scraps. William Shakespeare Natural language processing (NLP) refers to computational techniques involving language. It’s a broad field, but we’ll look at a few techniques both simple and not simple. Word Clouds In Chapter 1, we computed word counts of users’ interests. One approach to visualizing words and counts is word clouds, which artistically lay out the words with sizes proportional to their counts.
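The proportional-sizing idea behind word clouds can be sketched in a few lines; this is an illustration of the approach, not the book's own code.

```python
def font_size(count, counts, min_size=10, max_size=60):
    """Linearly scale a word's font size with its count."""
    lo, hi = min(counts), max(counts)
    if hi == lo:
        return max_size
    return min_size + (max_size - min_size) * (count - lo) / (hi - lo)

# Invented word counts for illustration:
word_counts = {"data": 50, "science": 30, "python": 10}
sizes = {word: font_size(n, word_counts.values())
         for word, n in word_counts.items()}
print(sizes)  # {'data': 60.0, 'science': 35.0, 'python': 10.0}
```

A rendering layer (HTML, matplotlib, etc.) would then place each word at its computed size.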

modules (Python), Modules multiple assignment (Python), Tuples N n-gram models, n-gram Models-n-gram Modelsbigram, n-gram Models trigrams, n-gram Models n-grams, n-gram Models Naive Bayes algorithm, Naive Bayes-For Further Explorationexample, filtering spam, A Really Dumb Spam Filter-A More Sophisticated Spam Filter implementation, Implementation natural language processing (NLP), Natural Language Processing-For Further Explorationgrammars, Grammars-Grammars topic modeling, Topic Modeling-Topic Modeling topics of interest, finding, Topics of Interest word clouds, Word Clouds-Word Clouds nearest neighbors classification, k-Nearest Neighbors-For Further Explorationcurse of dimensionality, The Curse of Dimensionality-The Curse of Dimensionality example, favorite programming languages, Example: Favorite Languages-Example: Favorite Languages model, The Model network analysis, Network Analysis-For Further Explorationbetweenness centrality, Betweenness Centrality-Betweenness Centrality closeness centrality, Betweenness Centrality degree centrality, Finding Key Connectors, Betweenness Centrality directed graphs and PageRank, Directed Graphs and PageRank-Directed Graphs and PageRank eigenvector centrality, Eigenvector Centrality-Centrality networks, Network Analysis neural networks, Neural Networks-For Further Explorationbackpropagation, Backpropagation example, defeating a CAPTCHA, Example: Defeating a CAPTCHA-Example: Defeating a CAPTCHA feed-forward, Feed-Forward Neural Networks perceptrons, Perceptrons neurons, Neural Networks NLP (see natural language processing) nodes, Network Analysis noise, Rescalingin machine learning, Overfitting and Underfitting None (Python), Truthiness normal distribution, The Normal Distributionand p-value computation, Example: Flipping a Coin central limit theorem and, The Central Limit Theorem in coin flip example, Example: Flipping a Coin standard, The Normal Distribution normalized tables, JOIN NoSQL databases, NoSQL NotQuiteABase, 
Databases and SQL null hypothesis, Statistical Hypothesis Testingtesting in A/B test, Example: Running an A/B Test NumPy, NumPy O one-sided tests, Example: Flipping a Coin ORDER BY statement (SQL), ORDER BY overfitting, Overfitting and Underfitting, The Bias-Variance Trade-off P p-hacking, P-hacking p-values, Example: Flipping a Coin PageRank algorithm, Directed Graphs and PageRank paid accounts, predicting, Paid Accounts pandas, For Further Exploration, For Further Exploration, pandas parameterized models, What Is Machine Learning?



pages: 252 words: 74,167

Thinking Machines: The Inside Story of Artificial Intelligence and Our Race to Build the Future by Luke Dormehl

Ada Lovelace, agricultural Revolution, AI winter, Albert Einstein, Alexey Pajitnov wrote Tetris, algorithmic trading, Amazon Mechanical Turk, Apple II, artificial general intelligence, Automated Insights, autonomous vehicles, book scanning, borderless world, call centre, cellular automata, Claude Shannon: information theory, cloud computing, computer vision, correlation does not imply causation, crowdsourcing, drone strike, Elon Musk, Flash crash, friendly AI, game design, global village, Google X / Alphabet X, hive mind, industrial robot, information retrieval, Internet of things, iterative process, Jaron Lanier, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, life extension, Loebner Prize, Marc Andreessen, Mark Zuckerberg, Menlo Park, natural language processing, Norbert Wiener, out of africa, PageRank, pattern recognition, Ray Kurzweil, recommendation engine, remote working, RFID, self-driving car, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, social intelligence, speech recognition, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, technological singularity, The Coming Technological Singularity, The Future of Employment, Tim Cook: Apple, too big to fail, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!

Overenthusiasm meant that impressive, if incremental, advances were often written up as though truly smart machines were already here. For example, one heavily hyped project was a 1960s robot called SHAKEY, described as the world’s first general-purpose robot capable of reasoning about its own actions. In doing so, it set benchmarks in fields like pattern recognition, information representation, problem solving and natural language processing. That alone should have been enough to make SHAKEY exciting, but journalists couldn’t resist a bit of embellishment. As such, when SHAKEY appeared in Life magazine in 1970, he was hailed not as a promising combination of several important research topics, but as the world’s ‘first electronic person’. Tying SHAKEY into the space mania still carrying over from the previous year’s Moon landing, Life’s reporter went so far as to claim SHAKEY could ‘travel about the Moon for months at a time without a single beep of direction from the earth’.

In other cases, Siri’s reasoning allows it to extract the relevant concepts from our sentences and connect these with web-based services and data, applying its ever-growing knowledge about you to a series of rules, concepts and contexts. The result is a way of turning requests into actions. ‘I want to eat in the same restaurant I ate in last week,’ is a straightforward enough sentence, but to make it into something useful, an AI assistant such as Siri must not only use natural language processing to understand the concept you are talking about, but also use context to find the right rule in its programming to follow. The speech recognition used in Siri is the creation of Nuance Communications, arguably the most advanced speech recognition company in the world. ‘Our job is to figure out the logical assertions inherent in the question that is being asked, or the command that is being given,’ Nuance’s Distinguished Scientist Ron Kaplan tells me.

This intelligent system should be able to automatically learn new skills and abilities by watching and interacting with its users. DARPA approached the non-profit research institute SRI International about creating a five-year, 500-person investigation, which was, at the time, the largest AI project in history. It brought together experts from a range of AI disciplines, including machine learning, knowledge representation and natural language processing. DARPA’s project was called CALO, standing for Cognitive Assistant that Learns and Organises. The name was inspired by the Latin word ‘calonis’, meaning ‘soldier’s servant’. After half a decade of research, SRI International made the decision to spin-off a consumer-facing version of the technology. In homage to SRI, they called it ‘Siri’, a word that also happens to be Norwegian for ‘beautiful woman who leads you to victory’.


pages: 392 words: 108,745

Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think by James Vlahos

Albert Einstein, AltaVista, Amazon Mechanical Turk, Amazon Web Services, augmented reality, Automated Insights, autonomous vehicles, Chuck Templeton: OpenTable:, cloud computing, computer age, Donald Trump, Elon Musk, information retrieval, Internet of things, Jacques de Vaucanson, Jeff Bezos, lateral thinking, Loebner Prize, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Mark Zuckerberg, Menlo Park, natural language processing, PageRank, pattern recognition, Ponzi scheme, randomized controlled trial, Ray Kurzweil, Ronald Reagan, Rubik’s Cube, self-driving car, sentiment analysis, Silicon Valley, Skype, Snapchat, speech recognition, statistical model, Steve Jobs, Steve Wozniak, Steven Levy, Turing test, Watson beat the top human players on Jeopardy!

Apple was also better positioned than Amazon to pioneer a voice-only device. The company made some of the world’s most beloved consumer electronic devices and had a huge head start on conversational AI with Siri, who had just been unveiled. Amazon didn’t have a substantial track record with consumer products; there was only the Kindle e-reader. And the company didn’t employ legions of experts in speech recognition and natural-language processing. The number of people at Amazon with experience in those fields came to a grand total of two. The company was starting from scratch, and Hart had to suspend his own disbelief. “If we could build it—and I didn’t know the answer to the ‘if’ part—that would be an amazing product,” Hart remembers thinking. Assembling a voice-computing team was especially arduous because Amazon was desperate to keep the project a secret.

In September 2011 Amazon acquired Yap, a North Carolina–based company that specialized in cloud-based speech recognition. Engineers at Lab126—the company’s hardware skunk works in Sunnyvale, California, where the Kindle had been created—worked on designing the device itself. In 2012 Doppler added an office in Boston, which, thanks to all of the city’s academic institutions, was a hotbed of natural-language-processing talent. In October 2012 Amazon acquired a Cambridge, UK–based company called Evi, which specialized in automatically answering spoken questions. And in January 2013 Doppler bought out Ivona, a Polish company that produced synthetic computer voices. Big picture, the problems that the Doppler team had to solve could be divided into two categories. The first group of challenges were those that required engineering—speech recognition and language understanding, for example.

Wherever you were in a room, and whatever else was happening acoustically—music playing, baby crying, Klingons attacking—the device should be able to hear you. “Far-field speech recognition did not exist in any commercial product when we started on this project,” Hart says. “We didn’t know if we could solve it.” Rohit Prasad, a scientist whom Amazon hired in April 2013 to oversee Doppler’s natural-language processing, was uniquely qualified to help out. In the 1990s Prasad had done far-field research for the U.S. military, which wanted a system that could transcribe what everyone was saying in a meeting. Prasad helped to engineer technology that was twice as accurate as anything that had been previously developed. It was still a long way from perfect, making transcription mistakes on three out of every ten spoken words.


Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport

Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, commoditize, data acquisition, disruptive innovation, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, lifelogging, Mark Zuckerberg, move fast and break things, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, self-driving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining, Thomas Davenport

Health Care In health care, for example, there will be much more structured data from the many electronic medical record systems that hospitals and outpatient clinics are installing. In addition, there have always been voluminous amounts of text in the clinical setting, primarily from physicians’ and nurses’ notes. This text can increasingly be captured and classified through the use of natural language processing technology. Insurance firms have huge amounts of medical claims data, but it’s not integrated with the data from healthcare providers. If all of that data could be integrated, categorized, and analyzed, we’d know a lot more about patient conditions. Image data from CAT scans and MRIs is another huge source; thus far doctors only look at it but don’t analyze it in any systematic fashion.

Many companies have used small data analytics to measure and analyze this important factor, but a lot of the data about how customers feel is unstructured—in particular, sitting in recorded voice files from customer calls to call centers. The level of customer satisfaction is increasingly important to health insurers because it is being monitored by state and federal government groups and published by organizations such as Consumers Union. In the past, that valuable data from calls couldn’t be analyzed. Now, however, United is turning it into text and then analyzing it with natural language processing software (a way to extract meaning from text). The analysis process can identify—though it’s not easy, given the vagaries of the English language—customers who use terms suggesting strong dissatisfaction. The insurer can then make some sort of intervention—perhaps a call exploring the source of the dissatisfaction. The decision is the same as in the past—how to identify a dissatisfied customer—but the tools are different.
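The kind of term spotting described can be sketched as follows; the term list and threshold here are invented for illustration and are not the insurer's actual rules.

```python
# Hypothetical list of strong-dissatisfaction terms:
DISSATISFACTION_TERMS = {"cancel", "lawyer", "unacceptable", "furious", "complaint"}

def flag_for_intervention(call_transcript, threshold=2):
    """Flag a transcript containing enough strong-dissatisfaction terms
    to warrant a follow-up call."""
    words = call_transcript.lower().split()
    hits = sum(1 for w in words if w.strip(".,!?") in DISSATISFACTION_TERMS)
    return hits >= threshold

print(flag_for_intervention("This is unacceptable, I will cancel my plan!"))  # True
print(flag_for_intervention("Thanks, the agent resolved my question."))       # False
```

Real systems weigh context rather than counting keywords, which is precisely where "the vagaries of the English language" make the job hard.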

In any case, many organizations that work with big data employ specialists in machine learning. Big data often involves the processing of unstructured data types like text, images, and video. It is probably impossible for a data scientist to be familiar with the analysis of all of these data types, but a knowledge of analytical approaches to one of them would be very useful. For example, natural language processing (NLP) is a set of approaches to extracting meaning from text. It may involve counting, classifying, translating, or otherwise analyzing words. It’s quite commonly used, for example, in understanding what customers are saying about a product or company. Virtually every large firm that is interested in big data should have someone available with NLP skills, but one or two experts will probably be sufficient.


pages: 315 words: 89,861

The Simulation Hypothesis by Rizwan Virk

3D printing, Albert Einstein, Apple II, artificial general intelligence, augmented reality, Benoit Mandelbrot, bioinformatics, butterfly effect, discovery of DNA, Dmitri Mendeleev, Elon Musk, en.wikipedia.org, Ernest Rutherford, game design, Google Glasses, Isaac Newton, John von Neumann, Kickstarter, mandelbrot fractal, Marc Andreessen, Minecraft, natural language processing, Pierre-Simon Laplace, Ralph Waldo Emerson, Ray Kurzweil, Richard Feynman, Schrödinger's Cat, Search for Extraterrestrial Intelligence, Silicon Valley, Stephen Hawking, Steve Jobs, Steve Wozniak, technological singularity, Turing test, Vernor Vinge, Zeno's paradox

Some of the chat-bots use very simplistic pattern matching, while others are starting to incorporate more complicated natural language processing. Different kinds of AI techniques had to be developed in order for a computer to have a chance at passing the “Turing Test.” In the early 21st century, digital assistants like Siri, Alexa, and Google Assistant are much better at processing either text or voice than any of the video games that we have covered thus far. But just as video games drove early graphics technology, you can expect that simulated characters will drive more sophisticated AI in the future. Figure 15: Eliza was an early digital psychiatrist that used simple matching. NLP, AI, and the Quest to Pass the Turing Test Of critical importance to passing the Turing Test is NLP, or Natural Language Processing. NLP is the ability of a computer to read (or listen to) and understand the meaning of natural language.
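An ELIZA-style exchange of the sort pictured above can be sketched with a few regular-expression rules; the rules below are invented for illustration of the simple-matching idea.

```python
import re

# Each rule pairs a pattern with a response template that reuses
# the captured text, mimicking ELIZA's reflective style.
RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
]

def respond(utterance):
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1))
    return "Please tell me more."

print(respond("I feel anxious about work"))  # Why do you feel anxious about work?
print(respond("I am tired"))                 # How long have you been tired?
print(respond("Hello"))                      # Please tell me more.
```

There is no understanding here at all, which is exactly the gap modern NLP in assistants like Siri and Alexa tries to close.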

See MMORPGs (massively multiplayer online roleplaying games) Masterson, Andrew, 251 Matera Laser Ranging Observatory (MLRO), 253–54 Mathematica software, 18 The Matrix, 7–8, 16–17, 25–26, 53, 72–74, 76, 196, 230, 251, 257, 276–77 Max Headroom, 92 Maxwell, James, 125–26 maya (illusion), 5, 14, 186–87, 191, 203 Mazurenko, Roman, 101–2 measurement, future vs. the past, 146–47 Mécanique celeste (Laplace), 125 Mendeleev, Dmitri, 190 metaphysical experiments and consciousness, 249–250 Microsoft, 60 Microsoft Hololens, 62 Miller, Laura, 80 mind interfaces mind reading, 75–77 mind-broadcast technology, 74–75 overview, 72–77 types of, 74 “mind lamp,” 76 mind reading, 75–77 mind-broadcast technology, 74–75 Minecraft, 50, 70–71 minimax algorithm, 154–55, 155f MIT, 6, 13, 32, 38–39, 85, 154, 165–66, 219 MIT Media Lab, 68 MIT Technology Review, 236 MMORPGs (massively multiplayer online roleplaying games) 3D rendering and virtual worlds, 42–44, 56 as 3D world, 94 and 3D world rendering, 136–37 augmented reality (AR), 63 as development to Simulation Point, 49–52 features of, 208–11 game evolution to, 4, 31 and Great Simulation, 53–54 Great Simulation as, 20, 279 quest engines of, 213–14 and realistic 3D models and graphics, 83 vs. simulated reality, 216 world as game state, 41 MMORPGs development 3D avatars, 49 big, graphically rendered 3D world to explore, 49 individual quests, 50–51 multiple online players, 50 persistent world state, 49–50 physics engines vs. 
rendering engine, 51 procedurally generated world, 51 storage of player’s state outside of rendered world, 49 user-generated content, 50 moksha, 203 Monroe, Robert, 242 Moody, Raymond, 228–29 Moorjani, Anita, 241 Morgan, Richard, 103–4 motion capture, 64 MUDs (multiuser dungeons), 44 Muhammad, 190, 226 multiple lives, 36 doctrines of reincarnation, 201–3 in video games, 200–201 multiple online players, 50 multiple possible futures, 147–48, 148f multiuser dungeons (MUDs), 44 multiverse and parallel worlds, 148–150 Musk, Elon, 5–6, 24–25, 87, 98, 139–140, 275 MWI (many worlds interpretation), 142–43, 149 My Big TOE (Campbell, 2003), 156–57, 173–74 N Natural Language Processing (NLP), 89–92 NDEs (near-death experiences), 15–16, 219, 228–231 near-death experiences (NDEs). See NDEs (near-death experiences) Netscape, 287 Neumann, John von, 100, 260 Neurable, 76 Neurolink, 76 A New Kind of Science (Wolfram, 2002), 266 New York Times, 232 Newton, Isaac, 13, 36, 124–26, 161, 166, 220–21 Niels Bohr Institute, 132 Nintendo Entertainment System (NES), 38–39 nirvana, 203 NLP (Natural Language Processing), 89–92 No Man’s Sky, 46–47, 51, 236 Noack, Marcus, 246 nonhuman earth-based lifeforms, 275 non-player characters (NPCs), 30–31, 39, 82, 280–81 non-player characters (NPCs), graphical, 41–42 non-simulated beings, 114 NPCs (non-player characters), 30–31, 39, 53, 82 NPCs and Turing Test, 115 O OASIS, 56–57, 71 OBEs (out-of-body experiences), 219, 241–42 “object” definition, 70 observation, particle collapse as, 131 Oculus VR, 59–60 OpenAI, 87, 94 optimization, 159–160 optimization techniques, computer graphics, 34, 157 Owhadi, Houman, 254–55 P Pac-Man, 1, 34, 82, 208, 273 parallel lives and future selves, 150–52 parallel universes and simulation hypothesis, 159–160 parallel worlds and Fringe, 152–53 parallel worlds and the multiverse, 148–150 parallel worlds, need for computation, 157–59 Paramahansa Yogananda, 183, 200 particle “local” nature, 127 particles and pixels on screen, 
162–64 particle-wave duality, 127–134, 254–55 Pauli, Wolfgang, 121, 125–26 Pauli Exclusion Principle, 126 PCs (player characters), 82 PCs vs.

Reaching Stage 9 Getting back to the road to the simulation point, what developments would we need to complete Stage 9? Quite simply, we would need to produce characters in a simulation so realistic that they would pass the Turing Test. If we are in a giant video game, how do we know if we are interacting with real players or NPCs? The components that would need to be developed for AI/NPCs to pass this test in a fully immersive simulation like our reality include: Natural Language Processing. The first requirement would be that AI could accept natural language as an input. This would initially be typed responses not unlike Alan Turing’s idea. The AI would need to understand the input well enough to consider an appropriate series of responses. Natural Language Response. The AI would then need to give a response back that showed an understanding of what was input in a way that mimicked how a human might respond.


pages: 309 words: 114,984

The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age by Robert Wachter

"Robert Solow", activist fund / activist shareholder / activist investor, Affordable Care Act / Obamacare, AI winter, Airbnb, Atul Gawande, Captain Sullenberger Hudson, Checklist Manifesto, Chuck Templeton: OpenTable:, Clayton Christensen, collapse of Lehman Brothers, computer age, creative destruction, crowdsourcing, deskilling, disruptive innovation, en.wikipedia.org, Erik Brynjolfsson, everywhere but in the productivity statistics, Firefox, Frank Levy and Richard Murnane: The New Division of Labor, Google Glasses, Ignaz Semmelweis: hand washing, Internet of things, job satisfaction, Joseph Schumpeter, Kickstarter, knowledge worker, lifelogging, medical malpractice, medical residency, Menlo Park, minimum viable product, natural language processing, Network effects, Nicholas Carr, obamacare, pattern recognition, peer-to-peer, personalized medicine, pets.com, Productivity paradox, Ralph Nader, RAND corporation, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, Skype, Snapchat, software as a service, Steve Jobs, Steven Levy, the payments system, The Wisdom of Crowds, Thomas Bayes, Toyota Production System, Uber for X, US Airways Flight 1549, Watson beat the top human players on Jeopardy!, Yogi Berra

Even technophiles admit that the quest to replace doctors with computers—or even the more modest ambition of providing them with useful guidance at the point of care—has been overhyped and unproductive. But times have changed. The growing prevalence of electronic health records offers grist for the AI and big-data mills, grist that wasn’t available when the records were on paper. And in this, the Age of Watson, we have new techniques, like natural language processing and machine learning, at our disposal. Perhaps this is our “gradually, then suddenly” moment. The public worships dynamic, innovative surgeons like Michael DeBakey; passionate, insightful researchers like Jonas Salk; and telegenic show horses like Mehmet Oz. But we seldom hear about those doctors whom other physicians tend to hold in the highest esteem: the great medical diagnosticians.

As if this weren’t complicated enough for the poor IBM engineer gearing up to retool Watson from answering questions about “Potent Potables” to diagnosing sick patients, there’s more. While the EHR at least offers a fighting chance for computerized diagnosis (older medical AI programs, built in the pen-and-paper era, required busy physicians to write their notes and then reenter all the key data), parsing an electronic medical record is far from straightforward. Natural language processing is getting much better, but it still has real problems with negation (“the patient has no history of chest pain or cough”) and with family history (“there is a history of arthritis in the patient’s sister, but his mother is well”), to name just a couple of issues. Certain terms have multiple meanings: when written by a psychiatrist, the term depression is likely to refer to a mood disorder, while when it appears in a cardiologist’s note (“there was no evidence of ST-depression”) it probably refers to a dip in the EKG tracing that is often a clue to coronary disease.
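A toy example makes the negation problem concrete: a naive keyword match would flag "no history of chest pain" as a positive mention, while even a crude negation window changes the answer. This sketch is illustrative only and is nowhere near a real clinical NLP system.

```python
# A handful of illustrative negation cues:
NEGATORS = {"no", "not", "denies", "without"}

def mentions_positively(note, term):
    """True if `term` appears with no negator in the 3 words before it."""
    words = note.lower().replace(",", " ").split()
    term_words = term.lower().split()
    n = len(term_words)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == term_words:
            window = words[max(0, i - 3):i]
            if not NEGATORS & set(window):
                return True
    return False

print(mentions_positively("the patient has no history of chest pain", "chest pain"))  # False
print(mentions_positively("the patient reports chest pain", "chest pain"))            # True
```

Family history and sense ambiguity (the psychiatrist's "depression" versus the cardiologist's "ST-depression") resist even this kind of windowing, since the cue words may be arbitrarily far from the term, or absent entirely.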

The scruffies are the pragmatists, the hackers, the crazy ones; they believe that problems should be attacked through whatever means work, and that modeling the behavior of experts or the scientific truth of a situation isn’t all that important. IBM’s breakthrough was to figure out that a combination of neat and scruffy—programming in some of the core rules of the game, but then folding in the fruits of machine learning and natural language processing—could solve truly complicated problems. When he was asked about the difference between human thinking and Watson’s method, Eric Brown, who runs IBM’s Watson Technologies group, gave a careful answer (note the shout-out to the humans, the bit players who made it all possible): A lot of the way that Watson works is motivated by the way that humans analyze problems and go about trying to find solutions, especially when it comes to dealing with complex problems where there are a number of intermediate steps to get you to the final answer.


pages: 340 words: 97,723

The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity by Amy Webb

Ada Lovelace, AI winter, Airbnb, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, artificial general intelligence, Asilomar, autonomous vehicles, Bayesian statistics, Bernie Sanders, bioinformatics, blockchain, Bretton Woods, business intelligence, Cass Sunstein, Claude Shannon: information theory, cloud computing, cognitive bias, complexity theory, computer vision, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, Deng Xiaoping, distributed ledger, don't be evil, Donald Trump, Elon Musk, Filter Bubble, Flynn Effect, gig economy, Google Glasses, Grace Hopper, Gödel, Escher, Bach, Inbox Zero, Internet of things, Jacques de Vaucanson, Jeff Bezos, Joan Didion, job automation, John von Neumann, knowledge worker, Lyft, Mark Zuckerberg, Menlo Park, move fast and break things, move fast and break things, natural language processing, New Urbanism, one-China policy, optical character recognition, packet switching, pattern recognition, personalized medicine, RAND corporation, Ray Kurzweil, ride hailing / ride sharing, Rodney Brooks, Rubik’s Cube, Sand Hill Road, Second Machine Age, self-driving car, SETI@home, side project, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart cities, South China Sea, sovereign wealth fund, speech recognition, Stephen Hawking, strong AI, superintelligent machines, technological singularity, The Coming Technological Singularity, theory of mind, Tim Cook: Apple, trade route, Turing machine, Turing test, uber lyft, Von Neumann architecture, Watson beat the top human players on Jeopardy!, zero day

Thus the first ultraintelligent machine is the last invention that man need ever make.”22 A woman did finally enter the mix, at least in name. At MIT, computer scientist Joseph Weizenbaum wrote an early AI system called ELIZA, a chat program named after the ingenue in George Bernard Shaw’s play Pygmalion.23 This development was important for neural networks and AI because it was an early attempt at natural language processing, and the program accessed various prewritten scripts in order to have conversations with real people. The most famous script was called DOCTOR,24 and it mimicked an empathetic psychologist using pattern recognition to respond with strikingly humanistic responses. The Dartmouth workshop had now generated international attention, as did its researchers, who’d unexpectedly found themselves in the limelight.

These universities are home to active academic research groups with strong industry ties. Tribes typically observe rules and rituals, so let’s explore the rites of initiation for AI’s tribes. It begins with a rigorous university education. In North America, the emphasis within universities has centered on hard skills—like mastery of the R and Python programming languages, competency in natural language processing and applied statistics, and exposure to computer vision, computational biology, and game theory. It’s frowned upon to take classes outside the tribe, such as a course on the philosophy of mind, Muslim women in literature, or colonialism. If we’re trying to build thinking machines capable of thinking like humans do, it would seem counterintuitive to exclude learning about the human condition.

Microsoft had actually launched its own digital assistant earlier in the year—its name was Cortana—but the system just hadn’t caught on among Windows users. Although Microsoft was the indispensable—if invisible—productivity layer that no business could operate without, executives and shareholders were feeling antsy. It isn’t as though Microsoft didn’t see AI coming. In fact, the company had, for more than a decade, been working across multiple fronts: computer vision, natural language processing, machine reading comprehension, AI apps in its Azure cloud, and even edge computing. The problem was misalignment within the organization and the lack of a shared vision among all cross-functional teams. This resulted in bursts of incredible breakthroughs in AI, published papers, and lots of patents created by supernetworks working on individual projects. One example is an experimental research project that Microsoft released in partnership with Tencent and a Chinese Twitter knockoff called Weibo.


pages: 339 words: 88,732

The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson, Andrew McAfee

"Robert Solow", 2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, access to a mobile phone, additive manufacturing, Airbnb, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, American Society of Civil Engineers: Report Card, Any sufficiently advanced technology is indistinguishable from magic, autonomous vehicles, barriers to entry, basic income, Baxter: Rethink Robotics, British Empire, business cycle, business intelligence, business process, call centre, Charles Lindbergh, Chuck Templeton: OpenTable:, clean water, combinatorial explosion, computer age, computer vision, congestion charging, corporate governance, creative destruction, crowdsourcing, David Ricardo: comparative advantage, digital map, employer provided health coverage, en.wikipedia.org, Erik Brynjolfsson, factory automation, falling living standards, Filter Bubble, first square of the chessboard / second half of the chessboard, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, full employment, G4S, game design, global village, happiness index / gross national happiness, illegal immigration, immigration reform, income inequality, income per capita, indoor plumbing, industrial robot, informal economy, intangible asset, inventory management, James Watt: steam engine, Jeff Bezos, jimmy wales, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, Khan Academy, knowledge worker, Kodak vs Instagram, law of one price, low skilled workers, Lyft, Mahatma Gandhi, manufacturing employment, Marc Andreessen, Mark Zuckerberg, Mars Rover, mass immigration, means of production, Narrative Science, Nate Silver, natural language processing, Network effects, new economy, New Urbanism, Nicholas Carr, Occupy movement, oil shale / tar sands, oil shock, pattern recognition, Paul Samuelson, payday loans, 
post-work, price stability, Productivity paradox, profit maximization, Ralph Nader, Ray Kurzweil, recommendation engine, Report Card for America’s Infrastructure, Robert Gordon, Rodney Brooks, Ronald Reagan, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Simon Kuznets, six sigma, Skype, software patent, sovereign wealth fund, speech recognition, statistical model, Steve Jobs, Steven Pinker, Stuxnet, supply-chain management, TaskRabbit, technological singularity, telepresence, The Bell Curve by Richard Herrnstein and Charles Murray, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, total factor productivity, transaction costs, Tyler Cowen: Great Stagnation, Vernor Vinge, Watson beat the top human players on Jeopardy!, winner-take-all economy, Y2K

A 2004 review of the previous half-century’s research in automatic speech recognition (a critical part of natural language processing) opened with the admission that “Human-level speech recognition has proved to be an elusive goal,” but less than a decade later major elements of that goal have been reached. Apple and other companies have made robust natural language processing technology available to hundreds of millions of people via their mobile phones.10 As noted by Tom Mitchell, who heads the machine-learning department at Carnegie Mellon University: “We’re at the beginning of a ten-year period where we’re going to transition from computers that can’t understand language to a point where computers can understand quite a bit about language.”11

Digital Fluency: The Babel Fish Goes to Work

Natural language processing software is still far from perfect, and computers are not yet as good as people at complex communication, but they’re getting better all the time.

Their hundreds of person-years of accumulated experience and expertise seemed like an insurmountable advantage over a bunch of novices. They needn’t have worried. Many of the ‘novices’ drawn to the challenge outperformed all of the testing companies in the essay competition. The surprises continued when Kaggle investigated who the top performers were. In both competitions, none of the top three finishers had any previous significant experience with either essay grading or natural language processing. And in the second competition, none of the top three finishers had any formal training in artificial intelligence beyond a free online course offered by Stanford AI faculty and open to anyone in the world who wanted to take it. People all over the world did, and evidently they learned a lot. The top three individual finishers were from, respectively, the United States, Slovenia, and Singapore.

Thinking Machines, Available Now

Machines that can complete cognitive tasks are even more important than machines that can accomplish physical ones. And thanks to modern AI we now have them. Our digital machines have escaped their narrow confines and started to demonstrate broad abilities in pattern recognition, complex communication, and other domains that used to be exclusively human. We’ve also recently seen great progress in natural language processing, machine learning (the ability of a computer to automatically refine its methods and improve its results as it gets more data), computer vision, simultaneous localization and mapping, and many of the other fundamental challenges of the discipline. We’re going to see artificial intelligence do more and more, and as this happens costs will go down, outcomes will improve, and our lives will get better.


pages: 307 words: 88,180

AI Superpowers: China, Silicon Valley, and the New World Order by Kai-Fu Lee

AI winter, Airbnb, Albert Einstein, algorithmic trading, artificial general intelligence, autonomous vehicles, barriers to entry, basic income, business cycle, cloud computing, commoditize, computer vision, corporate social responsibility, creative destruction, crony capitalism, Deng Xiaoping, deskilling, Donald Trump, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, full employment, future of work, gig economy, Google Chrome, happiness index / gross national happiness, if you build it, they will come, ImageNet competition, income inequality, informal economy, Internet of things, invention of the telegraph, Jeff Bezos, job automation, John Markoff, Kickstarter, knowledge worker, Lean Startup, low skilled workers, Lyft, mandatory minimum, Mark Zuckerberg, Menlo Park, minimum viable product, natural language processing, new economy, pattern recognition, pirate software, profit maximization, QR code, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, risk tolerance, Robert Mercer, Rodney Brooks, Rubik’s Cube, Sam Altman, Second Machine Age, self-driving car, sentiment analysis, sharing economy, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, Skype, special economic zone, speech recognition, Stephen Hawking, Steve Jobs, strong AI, The Future of Employment, Travis Kalanick, Uber and Lyft, uber lyft, universal basic income, urban planning, Y Combinator

What do all of you think?” iFlyTek might say the same to its own competitors. The Chinese company has racked up victories at a series of prestigious international AI competitions for speech recognition, speech synthesis, image recognition, and machine translation. Even in the company’s “second language” of English, iFlyTek often beats teams from Google, DeepMind, Facebook, and IBM Watson in natural-language processing—that is, the ability of AI to decipher overall meaning rather than just words. This success didn’t come overnight. Back in 1999, when I started Microsoft Research Asia, my top-choice recruit was a brilliant young Ph.D. named Liu Qingfeng. He had been one of the students I saw filing out of the dorms to study under streetlights after my lecture in Hefei. Liu was both hardworking and creative in tackling research questions; he was one of China’s most promising young researchers.

China’s leader in this category is Jinri Toutiao (meaning “today’s headlines”; English name: “ByteDance”). Founded in 2012, Toutiao is sometimes called “the BuzzFeed of China” because both sites serve as hubs for timely viral stories. But virality is where the similarities stop. BuzzFeed is built on a staff of young editors with a knack for cooking up original content. Toutiao’s “editors” are algorithms. Toutiao’s AI engines trawl the internet for content, using natural-language processing and computer vision to digest articles and videos from a vast network of partner sites and commissioned contributors. It then uses the past behavior of its users—their clicks, reads, views, comments, and so on—to curate a highly personalized newsfeed tailored to each person’s interests. The app’s algorithms even rewrite headlines to optimize for user clicks. And the more those users click, the better Toutiao becomes at recommending precisely the content they want to see.
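The click-driven curation described here can be illustrated with a toy ranking function. This is a minimal sketch under invented data, not Toutiao's actual algorithm: each article is reduced to a set of topic tags, and a user's "interest profile" is simply a tally of tags from articles they previously clicked.

```python
from collections import Counter

def build_profile(clicked_articles):
    """Aggregate tag counts over a user's click history."""
    profile = Counter()
    for tags in clicked_articles:
        profile.update(tags)
    return profile

def rank(candidates, profile):
    """Order candidate articles by how well their tags match the profile."""
    score = lambda tags: sum(profile[t] for t in tags)
    return sorted(candidates, key=score, reverse=True)

# Hypothetical click history and candidate pool.
clicks = [{"tech", "ai"}, {"ai", "startups"}]
profile = build_profile(clicks)
feed = rank([{"sports"}, {"ai", "chips"}, {"cooking"}], profile)
print(feed[0])  # the AI-tagged article ranks first
```

Real systems learn far richer signals (reads, dwell time, comments), but the feedback loop is the same: every click updates the profile, which in turn reshapes the next feed.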

JUDGING THE JUDGES

Similar principles are now being applied to China’s legal system, another sprawling bureaucracy with highly uneven levels of expertise across regions. iFlyTek has taken the lead in applying AI to the courtroom, building tools and executing a Shanghai-based pilot program that uses data from past cases to advise judges on both evidence and sentencing. An evidence cross-reference system uses speech recognition and natural-language processing to compare all evidence presented—testimony, documents, and background material—and seek out contradictory fact patterns. It then alerts the judge to these disputes, allowing for further investigation and clarification by court officers. Once a ruling is handed down, the judge can turn to yet another AI tool for advice on sentencing. The sentencing assistant starts with the fact pattern—defendant’s criminal record, age, damages incurred, and so on—then its algorithms scan millions of court records for similar cases.


pages: 285 words: 86,853

What Algorithms Want: Imagination in the Age of Computing by Ed Finn

Airbnb, Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Amazon Web Services, bitcoin, blockchain, Chuck Templeton: OpenTable:, Claude Shannon: information theory, commoditize, Credit Default Swap, crowdsourcing, cryptocurrency, disruptive innovation, Donald Knuth, Douglas Engelbart, Elon Musk, factory automation, fiat currency, Filter Bubble, Flash crash, game design, Google Glasses, Google X / Alphabet X, High speed trading, hiring and firing, invisible hand, Isaac Newton, iterative process, Jaron Lanier, Jeff Bezos, job automation, John Conway, John Markoff, Just-in-time delivery, Kickstarter, late fees, lifelogging, Loebner Prize, Lyft, Mother of all demos, Nate Silver, natural language processing, Netflix Prize, new economy, Nicholas Carr, Norbert Wiener, PageRank, peer-to-peer, Peter Thiel, Ray Kurzweil, recommendation engine, Republic of Letters, ride hailing / ride sharing, Satoshi Nakamoto, self-driving car, sharing economy, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, social graph, software studies, speech recognition, statistical model, Steve Jobs, Steven Levy, Stewart Brand, supply-chain management, TaskRabbit, technological singularity, technoutopianism, The Coming Technological Singularity, the scientific method, The Signal and the Noise by Nate Silver, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, transaction costs, traveling salesman, Turing machine, Turing test, Uber and Lyft, Uber for X, uber lyft, urban planning, Vannevar Bush, Vernor Vinge, wage slave

Many complex systems demonstrate computational features or appear to be computable. If complex systems are themselves computational Turing Machines, they are therefore equivalent: weather systems, human cognition, and most provocatively the universe itself.24 The grand problems of the cosmos (the origins thereof, the relationship of time and space) and the less grand problems of culture (box office returns, intelligent web searching, natural language processing) are irreducible but also calculable: they are not complicated problems with simple answers but rather simple problems (or rule-sets) that generate complicated answers. These assumptions open the door to a mathesis universalis, a language of science that the philosophers Gottfried Wilhelm Leibniz, René Descartes, and others presaged as a way to achieve perfect understanding of the natural world.25 This perfect language would exactly describe the universe through its grammar and vocabulary, becoming a new kind of rational magic for scientists that would effectively describe and be the world.
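The claim that simple rule-sets generate complicated answers has a classic concrete instance: an elementary cellular automaton. The sketch below is a standard Rule 30 implementation, chosen here purely as an illustration (it does not appear in the book): one fixed local rule, applied repeatedly, yields a famously irregular pattern.

```python
def rule30_step(cells):
    """Advance one row of 0/1 cells by a single Rule 30 step (edges = 0).

    Rule 30: each new cell is left XOR (center OR right).
    """
    padded = [0] + cells + [0]
    return [padded[i - 1] ^ (padded[i] | padded[i + 1])
            for i in range(1, len(padded) - 1)]

row = [0] * 7 + [1] + [0] * 7  # start from a single live cell
for _ in range(6):
    print("".join(".#"[c] for c in row))
    row = rule30_step(row)
```

Each row follows deterministically from the last, yet the triangle of cells it prints never settles into an obvious repeating pattern: a simple problem generating a complicated answer.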

As critics like media scholar Siva Vaidhyanathan have pointed out, the price of nonparticipation is significant but also difficult to pin down, and the gravitational pull of algorithmic culture gradually inculcates the rituals of participation, of obeisance, to particular computational altars.12

The Magic of Ontology

The call and response of Siri’s communication is central to the cultural understanding of intelligent assistants as a kind of useful demon—entities with specific, constrained abilities—but the nature of this achievement unveils a deeper technical being that exists beyond its utility to users. The vital element in Siri’s effectiveness as a culture machine is the achievement of a minimum viable threshold for speedy, topical responses to questions. Siri’s ability to interpret real-world commands depends on two key factors: natural language processing (NLP) and semantic interpretation. As any user who has tried to use Siri without a data connection knows, the software cannot operate without a link to Apple’s servers. Each time a user speaks to Siri the sound file is sent to a data center for analysis and storage, a service of the leading speech technology company Nuance.13 The major breakthroughs in algorithmic speech analysis have come by abandoning deep linguistic structure—efforts to thoroughly map grammar and semantics—in favor of treating speech as a statistical, probabilistic challenge.14 Given this audio signal, what text strings are most likely associated with each word?
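The question this passage ends on is the noisy-channel view of speech recognition: among acoustically similar transcriptions, choose the one maximizing P(words) × P(audio | words), where the first factor comes from a language model and the second from an acoustic model. A toy sketch with invented probabilities:

```python
def best_transcription(candidates):
    """Pick the transcription maximizing prior * acoustic likelihood.

    candidates: dict mapping a word string to a
    (language_model_prior, acoustic_likelihood) pair.
    """
    return max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])

# "recognize speech" vs. "wreck a nice beach": nearly identical audio,
# very different language-model priors (numbers invented for illustration).
confusable = {
    "recognize speech":   (0.020, 0.30),
    "wreck a nice beach": (0.0001, 0.35),
}
print(best_transcription(confusable))  # -> recognize speech
```

Even though the second phrase fits the audio slightly better here, the statistics of English make the first overwhelmingly more probable, which is exactly the trade the passage describes: probability over deep linguistic structure.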

Index Abortion, 64 Abstraction, 10 aesthetics and, 83, 87–112 arbitrage and, 161 Bogost and, 49, 92–95 capitalism and, 165 context and, 24 cryptocurrency and, 160–180 culture machines and, 54 (see also Culture machines) cybernetics and, 28, 30, 34 desire for answer and, 25 discarded information and, 50 effective computability and, 28, 33 ethos of information and, 159 high frequency trading (HFT) and imagination and, 185, 189, 192, 194 interfaces and, 52, 54, 92, 96, 103, 108, 110–111 ladder of, 82–83 language and, 2, 24 Marxism and, 165 meaning and, 36 money and, 153, 159, 161, 165–167, 171–175 Netflix and, 87–112, 205n36 politics of, 45 pragmatist approach and, 19–21 process and, 2, 52, 54 reality and, 205n36 Siri and, 64–65, 82–84 Turing Machine and, 23 (see also Turing Machine) Uber and, 124–126, 129 Wiener and, 28–29, 30 work of algorithms and, 113, 120, 123–136, 139–149 Adams, Douglas, 123 Adams, Henry, 80–81 Adaptive systems, 50, 63, 72, 92, 174, 176, 186, 191 Addiction, 114–115, 118–119, 121–122, 176 AdSense, 158–159 Advent of the Algorithm, The (Berlinski), 9, 24 Advertisements AdSense and, 158–159 algorithmic arbitrage and, 111, 161 Apple and, 65 cultural calculus of waiting and, 34 as cultural latency, 159 emotional appeals of, 148 Facebook and, 113–114 feedback systems and, 145–148 Google and, 66, 74, 156, 158–160 Habermas on, 175 Netflix and, 98, 100, 102, 104, 107–110 Uber and, 125 Aesthetics abstraction and, 83, 87–112 arbitrage and, 109–112, 175 culture machines and, 55 House of Cards and, 92, 98–112 Netflix Quantum Theory and, 91–97 personalization and, 11, 97–103 of production, 12 work of algorithms and, 123, 129, 131, 138–147 Agre, Philip, 178–179 Airbnb, 124, 127 Algebra, 17 Algorithmic reading, 52–56 Algorithmic trading, 12, 20, 99, 155 Algorithms abstraction and, 2 (see also Abstraction) arbitrage and, 12, 51, 97, 110–112, 119, 121, 124, 127, 130–134, 140, 151, 160, 162, 169, 171, 176 Berlinski on, 9, 24, 30, 36, 181 Bitcoin and, 160–180 black 
boxes and, 7, 15–16, 47–48, 51, 55, 64, 72, 92–93, 96, 136, 138, 146–147, 153, 162, 169–171, 179 blockchains and, 163–168, 171, 177, 179 Bogost and, 16, 33, 49 Church-Turing thesis and, 23–26, 39–41, 73 consciousness and, 2, 4, 8, 22–23, 36–37, 40, 76–79, 154, 176, 178, 182, 184 DARPA and, 11, 57–58, 87 desire and, 21–26, 37, 41, 47, 49, 52, 79–82, 93–96, 121, 159, 189–192 effective computability and, 10, 13, 21–29, 33–37, 40–49, 52–54, 58, 62, 64, 72–76, 81, 93, 192–193 Elliptic Curve Digital Signature Algorithm and, 163 embodiment and, 26–32 encryption, 153, 162–163 enframing and, 118–119 Enlightenment and, 27, 30, 38, 45, 68–71, 73 experimental humanities and, 192–196 Facebook and, 20 (see also Facebook) faith and, 7–9, 12, 16, 78, 80, 152, 162, 166, 168 gamification and, 12, 114–116, 120, 123–127, 133 ghost in the machine and, 55, 95 halting states and, 41–46 high frequency trading (HFT) and, 151–158, 168–169, 177 how to think about, 36–41 ideology and, 7, 9, 18, 20–23, 26, 33, 38, 42, 46–47, 54, 64, 69, 130, 144, 155, 160–162, 167, 169, 194 imagination and, 11, 55–56, 181–196 implementation and, 47–52 intelligent assistants and, 11, 57, 62, 64–65, 77 intimacy and, 4, 11, 35, 54, 65, 74–78, 82–85, 97, 102, 107, 128–130, 172, 176, 185–189 Knuth and, 17–18 language and, 24–28, 33–41, 44, 51, 54–55 machine learning and, 2, 15, 28, 42, 62, 66, 71, 85, 90, 112, 181–184, 191 mathematical logic and, 2 meaning and, 35–36, 38, 44–45, 50, 54–55 metaphor and, 32–36 Netflix Prize and, 87–91 neural networks and, 28, 31, 39, 182–183, 185 one-way functions and, 162–163 pragmatist approach and, 18–25, 42, 58, 62 process and, 41–46 programmable culture and, 169–175 quest for perfect knowledge and, 13, 65, 71, 73, 190 rise of culture machines and, 15–21 (see also Culture machines) Siri and, 59 (see also Siri) traveling salesman problem and Turing Machine and, 9 (see also Turing Machine) as vehicle of computation, 5 wants of, 81–85 Weizenbaum and, 33–40 work of, 113–149 worship 
of, 192 Al-Khwārizmī, Abū ‘Abdullāh Muhammad ibn Mūsā, 17 Alphabet Corporation, 66, 155 AlphaGo, 182, 191 Amazon algorithmic arbitrage and, 124 artificial intelligence (AI) and, 135–145 Bezos and, 174 Bitcoin and, 169 business model of, 20–21, 93–94 cloud warehouses and, 131–132, 135–145 disruptive technologies and, 124 effective computability and, 42 efficiency algorithms and, 134 interface economy and, 124 Kindle and, 195 Kiva Systems and, 134 Mechanical Turk and, 135–145 personalization and, 97 physical logistics of, 13, 131 pickers and, 132–134 pragmatic approach and, 18 product improvement and, 42 robotics and, 134 simplification ethos and, 97 worker conditions and, 132–134, 139–140 Android, 59 Anonymous, 112, 186 AOL, 75 Apple, 81 augmenting imagination and, 186 black box of, 169 cloud warehouse of, 131 company value of, 158 effective computability and, 42 efficiency algorithms and, 134 Foxconn and, 133–134 global computation infrastructure of, 131 iOS App Store and, 59{tab} iTunes and, 161 massive infrastructure of, 131 ontology and, 62–63, 65 physical logistics of, 131 pragmatist approach and, 18 product improvement and, 42 programmable culture and, 169 search and, 87 Siri and, 57 (see also Siri) software and, 59, 62 SRI International and, 57, 59 Application Program Interfaces (APIs), 7, 113 Apps culture machines and, 15 Facebook and, 9, 113–115, 149 Her and, 83 identity and, 6 interfaces and, 8, 124, 145 iOS App Store and, 59 Lyft and, 128, 145 Netflix and, 91, 94, 102 third-party, 114–115 Uber and, 124, 145 Arab Spring, 111, 186 Arbesman, Samuel, 188–189 Arbitrage algorithmic, 12, 51, 97, 110–112, 119, 121, 124, 127, 130–134, 140, 151, 160, 162, 169, 171, 176 Bitcoin and, 51, 169–171, 175–179 cultural, 12, 94, 121, 134, 152, 159 differing values and, 121–122 Facebook and, 111 Google and, 111 high frequency trading (HFT) and, 151–158, 168–169, 177 interface economy and, 123–131, 139–140, 145, 147 labor and, 97, 112, 123–145 market issues and, 152, 161 
mining value and, 176–177 money and, 151–152, 155–163, 169–171, 175–179 Netflix and, 94, 97, 109–112 PageRank and, 159 pricing, 12 real-time, 12 trumping content and, 13 valuing culture and, 155–160 Archimedes, 18 Artificial intelligence (AI) adaptive systems and, 50, 63, 72, 92, 174, 176, 186, 191 Amazon and, 135–145 anthropomorphism and, 83, 181 anticipation and, 73–74 artificial, 135–141 automata and, 135–138 DARPA and, 11, 57–58, 87 Deep Blue and, 135–138 DeepMind and, 28, 66, 181–182 desire and, 79–82 ELIZA and, 34 ghost in the machine and, 55, 95 HAL and, 181 homeostat and, 199n42 human brain and, 29 intellectual history of, 61 intelligent assistants and, 11, 57, 62, 64–65, 77 intimacy and, 75–76 job elimination and, 133 McCulloch-Pitts Neuron and, 28, 39 machine learning and, 2, 15, 28, 42, 62, 66, 71, 85, 90, 112, 181–186 Mechanical Turk and, 12, 135–145 natural language processing (NLP) and, 62–63 neural networks and, 28, 31, 39, 182–183, 185 OS One (Her) and, 77 renegade independent, 191 Samantha (Her) and, 77–85, 154, 181 Siri and, 57, 61 (see also Siri) Turing test and, 43, 79–82, 87, 138, 142, 182 Art of Computer Programming, The (Knuth), 17 Ashby, Ross, 199n42 Asimov, Isaac, 45 Atlantic, The (magazine), 7, 92, 170 Automation, 122, 134, 144, 188 Autopoiesis, 28–30 Babbage, Charles, 8 Banks, Iain, 191 Barnet, Belinda, 43–44 Bayesian analysis, 182 BBC, 170 BellKor’s Pragmatic Chaos (Netflix), 89–90 Berlinski, David, 9, 24, 30, 36, 181, 184 Bezos, Jeff, 174 Big data, 11, 15–16, 62–63, 90, 110 Biology, 2, 4, 26–33, 36–37, 80, 133, 139, 185 Bitcoin, 12–13 arbitrage and, 51, 169–171, 175–179 blockchains and, 163–168, 171–172, 177, 179 computationalist approach and cultural processing and, 178 eliminating vulnerability and, 161–162 Elliptic Curve Digital Signature Algorithm and, 163 encryption and, 162–163 as glass box, 162 intrinsic value and, 165 labor and, 164, 178 legitimacy and, 178 market issues and, 163–180 miners and, 164–168, 171–172, 175–179 
Nakamoto and, 161–162, 165–167 one-way functions and, 162–163 programmable culture and, 169–175 transaction fees and, 164–165 transparency and, 160–164, 168, 171, 177–178 trust and, 166–168 Blockbuster, 99 Blockchains, 163–168, 171–172, 177, 179 Blogs early web curation and, 156 Facebook algorithms and, 178 Gawker Media and, 170–175 journalistic principles and, 173, 175 mining value and, 175, 178 Netflix and, 91–92 turker job conditions and, 139 Uber and, 130 Bloom, Harold, 175 Bogost, Ian abstraction and, 92–95 algorithms and, 16, 33, 49 cathedral of computation and, 6–8, 27, 33, 49, 51 computation and, 6–10, 16 Cow Clicker and, 12, 116–123 Enlightenment and, 8 gamification and, 12, 114–116, 120, 123–127, 133 Netflix and, 92–95 Boolean conjunctions, 51 Bosker, Bianca, 58 Bostrom, Nick, 45 Bowker, Geoffrey, 28, 110 Boxley Abbey, 137 Brain Pickings (Popova), 175 Brain plasticity, 38, 191 Brand, Stewart, 3, 29 Brazil (film), 142 Breaking Bad (TV series), 101 Brin, Sergei, 57, 155–156 Buffett, Warren, 174 Burr, Raymond, 95 Bush, Vannevar, 18, 186–189, 195 Business models Amazon and, 20–21, 93–94, 96 cryptocurrency and, 160–180 Facebook and, 20 FarmVille and, 115 Google and, 20–21, 71–72, 93–94, 96, 155, 159 Netflix and, 87–88 Uber and, 54, 93–94, 96 Business of Enlightenment, The (Darnton) 68, 68 Calculus, 24, 26, 30, 34, 44–45, 98, 148, 186 CALO, 57–58, 63, 65, 67, 79, 81 Campbell, Joseph, 94 Campbell, Murray, 138 Capitalism, 12, 105 cryptocurrency and, 160, 165–168, 170–175 faking it and, 146–147 Gawker Media and, 170–175 identity and, 146–147 interface economy and, 127, 133 labor and, 165 public sphere and, 172–173 venture, 9, 124, 174 Captology, 113 Carr, Nicholas, 38 Carruth, Allison, 131 Castronova, Edward, 121 Cathedral and the Bazaar, The (Raymond), 6 Cathedral of computation, 6–10, 27, 33, 49, 51 Chess, 135–138, 144–145 Chun, Wendy Hui Kyong, 3, 16, 33, 35–36, 42, 104 Church, Alonzo, 23– 24, 42 Church-Turing thesis, 23–26, 39–41 Cinematch (Netflix), 88–90, 95 
Citizens United case, 174 Clark, Andy, 37, 39–40 Cloud warehouses Amazon and, 135–145 interface economy and, 131–145 Mechanical Turk and, 135–145 worker conditions and, 132–134, 139–140 CNN, 170 Code.


pages: 58 words: 12,386

Big Data Glossary by Pete Warden

business intelligence, crowdsourcing, fault tolerance, information retrieval, linked data, natural language processing, recommendation engine, web application

If you’re dealing with graph data, Tinkerpop will give you some high-level interfaces that can be much more convenient to deal with than raw graph databases.

Chapter 7. NLP

Natural language processing (NLP) is a subset of data processing that’s so crucial, it earned its own section. Its focus is taking messy, human-created text and extracting meaningful information. As you can imagine, this chaotic problem domain has spawned a large variety of approaches, with each tool most useful for particular kinds of text. There’s no magic bullet that will understand written information as well as a human, but if you’re prepared to adapt your use of the results to handle some errors and don’t expect miracles, you can pull out some powerful insights.

Natural Language Toolkit

The NLTK is a collection of Python modules and datasets that implement common natural language processing techniques. It offers the building blocks that you need to build more complex algorithms for specific problems.
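As a taste of the kind of building block NLTK packages up, here is a stdlib-only sketch of two of the most common ones, tokenization and frequency counting. NLTK's own versions (its tokenizers and FreqDist, for example) are far more robust; this just shows the idea.

```python
import re
from collections import Counter

def tokenize(text: str):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def freq_dist(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)

tokens = tokenize("The cat sat on the mat. The mat was flat.")
print(freq_dist(tokens).most_common(2))  # -> [('the', 3), ('mat', 2)]
```

Almost every higher-level NLP task, from keyword extraction to sentiment analysis, starts from primitives like these, which is why a toolkit of prebuilt, tested versions is so useful.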


pages: 245 words: 64,288

Robots Will Steal Your Job, But That's OK: How to Survive the Economic Collapse and Be Happy by Pistono, Federico

3D printing, Albert Einstein, autonomous vehicles, bioinformatics, Buckminster Fuller, cloud computing, computer vision, correlation does not imply causation, en.wikipedia.org, epigenetics, Erik Brynjolfsson, Firefox, future of work, George Santayana, global village, Google Chrome, happiness index / gross national happiness, hedonic treadmill, illegal immigration, income inequality, information retrieval, Internet of things, invention of the printing press, jimmy wales, job automation, John Markoff, Kevin Kelly, Khan Academy, Kickstarter, knowledge worker, labor-force participation, Lao Tzu, Law of Accelerating Returns, life extension, Loebner Prize, longitudinal study, means of production, Narrative Science, natural language processing, new economy, Occupy movement, patent troll, pattern recognition, peak oil, post scarcity, QR code, race to the bottom, Ray Kurzweil, recommendation engine, RFID, Rodney Brooks, selection bias, self-driving car, slashdot, smart cities, software as a service, software is eating the world, speech recognition, Steven Pinker, strong AI, technological singularity, Turing test, Vernor Vinge, women in the workforce

When I chose the title of this book, Robots will steal your job, I was not completely honest with you. Robots will eventually steal your job, but before them something else is going to jump in. In fact, it already has, in a much more pervasive way than any physical machine ever could. I am of course talking about computer programs in general. Automated Planning and Scheduling, Machine Learning, Natural Language Processing, Machine Perception, Computer Vision, Speech Recognition, Affective Computing, Computational Creativity, these are all fields of Artificial Intelligence that do not have to face the cumbersome issues that Robotics has to. It is much easier to enhance an algorithm than it is to build a better robot. A more accurate title for the book would have been “Machine intelligence and computer algorithms are already stealing your job, and they will do so ever more in the future” – but that was not exactly a catchy title.

The classical “Turing test approach” has been largely abandoned as a realistic research goal and is now mainly an intellectual curiosity (the annual Loebner Prize for realistic chatterbots81), but it helped spawn the two dominant themes of modern cognition and artificial intelligence: calculating probabilities and producing complex behaviour from the interaction of many small, simple processes. As of today (2012), we believe these represent more closely what the human brain does, and they have been used in a variety of real-world applications: Google’s autonomous cars, search results, recommendation systems, automated language translation, personal assistants, computational search engines, and IBM’s newest super brain, Watson. Natural language processing was believed to be a task that only humans could accomplish. A word can have different meanings depending on context, and a phrase may not mean what it says if it is a joke or a pun. One may infer a subtext implicitly, or make cultural references specific to a geographical or cultural area; the possibilities are truly endless. A game that captures pretty well the intricacies and the nuances of the English language is Jeopardy!
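The context-dependence of word meaning described here is what NLP calls word-sense disambiguation, and the probabilistic approach reduces it to counting: pick the sense whose typical context words best overlap the sentence at hand. A toy sketch with invented sense signatures:

```python
# Sense "signatures" invented for illustration only.
SENSES = {
    "financial institution": {"money", "loan", "deposit", "account"},
    "river edge": {"river", "water", "fishing", "shore"},
}

def disambiguate(context_words):
    """Return the sense of "bank" with the largest context-word overlap."""
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context_words))

context = {"she", "opened", "an", "account", "at", "the", "bank"}
print(disambiguate(context))  # -> financial institution
```

Jokes, puns, and cultural references defeat this kind of shallow counting, which is why open-domain language understanding (Jeopardy! included) remained so hard for so long.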

While our brains will stay pretty much the same for the next 20 years, computers’ efficiency and computational power will have doubled about twenty times. That is a million-fold increase. So, for the same $3 million you will have a computer a million times more powerful than Watson, or you could have a Watson-equivalent computer for $3. Watson’s computational power and exceptional skills of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, Machine Learning, and open domain question answering are already being put to better use than showing off at a TV contest. IBM and Nuance Communications Inc. are partnering for the research project to develop a commercial product during the next 18 to 24 months that will exploit Watson’s capabilities as a clinical decision support system to aid the diagnosis and treatment of patients.86 Recall the example of automated radiologists we mentioned earlier.


pages: 268 words: 109,447

The Cultural Logic of Computation by David Golumbia

Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, American ideology, Benoit Mandelbrot, borderless world, business process, cellular automata, citizen journalism, Claude Shannon: information theory, computer age, corporate governance, creative destruction, en.wikipedia.org, finite state, future of work, Google Earth, Howard Zinn, IBM and the Holocaust, iterative process, Jaron Lanier, jimmy wales, John von Neumann, Joseph Schumpeter, late capitalism, means of production, natural language processing, Norbert Wiener, packet switching, RAND corporation, Ray Kurzweil, RFID, Richard Stallman, semantic web, Shoshana Zuboff, Slavoj Žižek, social web, stem cell, Stephen Hawking, Steve Ballmer, Stewart Brand, strong AI, supply-chain management, supply-chain management software, Ted Nelson, telemarketer, The Wisdom of Crowds, theory of mind, Turing machine, Turing test, Vannevar Bush, web application

Our human problem, according to this view, is that language has become corrupted due to ambiguity, polysemy, and polyvocality, and computers can bring language back to us, straighten it out, and eliminate the problems that are to blame not just for communicative difficulties but for the “simplicity and power” that would bring about significant political change. Despite Weaver’s assessment, few linguists of note contributed to the 1955 volume (the only practicing linguist among them was Victor Yngve, an MIT Germanicist who is most famous for work in CL and natural language processing, referred to as NLP). In an “historical introduction” provided by the editors, the history of MT begins abruptly in 1946, as if questions of the formal nature of language had never been addressed before. Rather than surveying the intellectual background and history of this topic, the editors cover only the history of machines built at MIT for the express purpose of MT. The book itself begins with Weaver’s famous “memorandum” of 1949, here published as “Translation”; until then it had circulated privately among many computer scientists of the time, some of whom dissented from its conclusions even then. At the time Weaver was president of the Rockefeller Foundation, and tried unsuccessfully to enlist major figures like Norbert Wiener, C.

Cambridge, MA: The MIT Press. Badiou, Alain. 2001. Ethics: An Essay on the Understanding of Evil. New York: Verso. Baran, Paul A., and Paul M. Sweezy. 1966. Monopoly Capital: An Essay on the American Economic and Social Order. New York: Monthly Review Press. Barsky, Robert F. 1997. Noam Chomsky: A Life of Dissent. Cambridge, MA: The MIT Press. Bates, Madeleine, and Ralph M. Weischedel, eds. 1993. Challenges in Natural Language Processing. New York: Cambridge University Press. Bauerlein, Mark. 2008. The Dumbest Generation: How the Digital Age Stupefies Young Americans and Jeopardizes Our Future (Or, Don’t Trust Anyone Under 30). New York: Penguin. Bechtel, William, and Adele Abrahamsen. 2002. Connectionism and the Mind: Parallel Processing, Dynamics, and Evolution in Networks. Second edition. Malden, MA: Blackwell. Beniger, James R. 1986.

New York: Cambridge University Press. Crystal, David. 2001. Language and the Internet. New York: Cambridge University Press. ———. 2004. The Language Revolution. New York: Polity Press. Dahlberg, Lincoln, and Eugenia Siapera, eds. 2007. Radical Democracy and the Internet: Interrogating Theory and Practice. New York: Palgrave. Dale, Robert, Hermann Moisl, and Harold Somers, eds. 2000. Handbook of Natural Language Processing. New York: Marcel Dekker. Darnell, Rick. 1997. “A Brief History of SGML.” In HTML Unleashed 4. Indianapolis, IN: Sams Publishing. §3.2. http://www.webreference.com/. Davenport, David. 2000. “Computationalism: The Very Idea.” Conceptus-Studien 14, 121–137. Davidson, Matthew C., Dima Amso, Loren Cruess Anderson, and Adele Diamond. 2006. “Development of Cognitive Control and Executive Functions from 4 to 13 years: Evidence from Manipulations of Memory, Inhibition, and Task Switching.”


pages: 346 words: 97,330

Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass by Mary L. Gray, Siddharth Suri

Affordable Care Act / Obamacare, Amazon Mechanical Turk, augmented reality, autonomous vehicles, barriers to entry, basic income, big-box store, bitcoin, blue-collar work, business process, business process outsourcing, call centre, Capital in the Twenty-First Century by Thomas Piketty, cloud computing, collaborative consumption, collective bargaining, computer vision, corporate social responsibility, crowdsourcing, data is the new oil, deindustrialization, deskilling, don't be evil, Donald Trump, Elon Musk, employer provided health coverage, en.wikipedia.org, equal pay for equal work, Erik Brynjolfsson, financial independence, Frank Levy and Richard Murnane: The New Division of Labor, future of work, gig economy, glass ceiling, global supply chain, hiring and firing, ImageNet competition, industrial robot, informal economy, information asymmetry, Jeff Bezos, job automation, knowledge economy, low skilled workers, low-wage service sector, market friction, Mars Rover, natural language processing, new economy, passive income, pattern recognition, post-materialism, post-work, race to the bottom, Rana Plaza, recommendation engine, ride hailing / ride sharing, Ronald Coase, Second Machine Age, sentiment analysis, sharing economy, Shoshana Zuboff, side project, Silicon Valley, Silicon Valley startup, Skype, software as a service, speech recognition, spinning jenny, Stephen Hawking, The Future of Employment, The Nature of the Firm, transaction costs, two-sided market, union organizing, universal basic income, Vilfredo Pareto, women in the workforce, Works Progress Administration, Y Combinator

It’s a great program.” You could say that it’s as easy as playing a video game. Automatically recognizing and translating language looks easy in some ways because people are accustomed to the everyday nature of tools like Siri, Cortana, and Alexa. Automating human speech recognition and translation is a fundamental part of artificial intelligence that grew into a field called natural language processing. Natural language processing was helped immensely by the internet’s capacity to amass tons of examples of people writing and speaking in various languages. Yet capturing dialogue in video, particularly action scenes that change the mood and meaning of an actor’s words, remains a difficult task for a computer program to understand, let alone translate into different languages. In fairness to the computers, it takes a team of people to achieve this, too.

SEMI-AUTOMATED FUTURE The days of large enterprises with full-time employees working on-site are numbered as more and more projects rely on an off-site workforce available on demand, around the globe. Our employment classification systems, won in the 1930s to make full-time assembly line work sustainable, were not built for this future. As machines get more powerful and algorithms take over more and more problems, we know from past advances in natural language processing and image recognition that industries will continue to identify new problems to tackle. Thus, there is an ever-moving frontier between what machines can and can’t solve. We call this the paradox of automation’s last mile: as machines progress, the opportunity to automate something else appears on the horizon. This process constantly repeats, resulting in the expansion of automation through the perpetual creation and destruction of labor markets for new types of human labor.

See application programming interface (API) Apollo 13, 52 application programming interface (API) circumventing, 74 collaboration, 178–80 definition, xiv growth of, 169 head count on, 103–4 hiring via, 4–6 improvements to, 138–39 inequality of power in, 91–93 limitations of, 170–71, 174 logistics of, xiv, 62 networking, 127 thoughtlessness of, 67–68 training and trust, 71–72 articulation work, 238 n1 artificial intelligence (AI), 231 n41 advancement of, 176–77 humans, dependency on, ix–x, xviii–xxiii, 231 n41 misconceptions about, 191–92 natural language processing, 30 rise of, 6–8 training, xxiii, 6–8, 16, 170, 222 n11 Asra, 106–8 assembly lines, 41–42 automation cost shifts from, 173–77 human labor and, xviii–xxiii, 58–59, 176–77 machinery use in Industrial Revolution, 42–45 paradox of automation, 170 projections for, 243 n5 autonomy vs isolation, 80–84 Avendano, Pablo, 142, 143, 145 Ayesha, 81, 219 n8 B B Corps, 147, 164 bait-and-switch strategy, 83 Bangalore, xi, 17, 76, 219 n5, 238 n7 Bangladesh, 193–94 Beckett, Samuel, 29 benefits APIs, 171 at Caviar, 142 at CrowdFlower, 35 disappearance of, 98, 156 at DoorDash, 157–58, 162 full-time employment, 47, 48, 49, 60 at LeadGenius, 159–60 permatemps (Microsoft), 56–57 recommendations for, 189–92 statistics on, xxiii Uber lawsuit, 146 as worker cost, 32 See also employment, reasons for Bezos, Jeff, 2–3, 90, 135–37, 222 n5 Biewald, Lukas, 35 Bing, xii Blight, David, 226 n2 blue collar work.


pages: 477 words: 75,408

The Economic Singularity: Artificial Intelligence and the Death of Capitalism by Calum Chace

3D printing, additive manufacturing, agricultural Revolution, AI winter, Airbnb, artificial general intelligence, augmented reality, autonomous vehicles, banking crisis, basic income, Baxter: Rethink Robotics, Berlin Wall, Bernie Sanders, bitcoin, blockchain, call centre, Chris Urmson, congestion charging, credit crunch, David Ricardo: comparative advantage, Douglas Engelbart, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Flynn Effect, full employment, future of work, gender pay gap, gig economy, Google Glasses, Google X / Alphabet X, ImageNet competition, income inequality, industrial robot, Internet of things, invention of the telephone, invisible hand, James Watt: steam engine, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, knowledge worker, lifelogging, lump of labour, Lyft, Marc Andreessen, Mark Zuckerberg, Martin Wolf, McJob, means of production, Milgram experiment, Narrative Science, natural language processing, new economy, Occupy movement, Oculus Rift, PageRank, pattern recognition, post scarcity, post-industrial society, post-work, precariat, prediction markets, QWERTY keyboard, railway mania, RAND corporation, Ray Kurzweil, RFID, Rodney Brooks, Sam Altman, Satoshi Nakamoto, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Skype, software is eating the world, speech recognition, Stephen Hawking, Steve Jobs, TaskRabbit, technological singularity, The Future of Employment, Thomas Malthus, transaction costs, Tyler Cowen: Great Stagnation, Uber for X, uber lyft, universal basic income, Vernor Vinge, working-age population, Y Combinator, young professional

This algorithm, while ingenious, was not itself an example of artificial intelligence. Over time, Google Search has become unquestionably AI-powered. In August 2013, Google executed a major update of its search function by introducing Hummingbird, which enables the service to respond appropriately to questions phrased in natural language, such as, “what's the quickest route to Australia?”[lxxix] It combines AI techniques of natural language processing with colossal information resources (including Google's own Knowledge Graph, and of course Wikipedia) to analyse the context of the search query and make the response more relevant. PageRank wasn't dropped, but instead became just one of the 200 or so techniques that are now deployed to provide answers. Like IBM Watson, this is an example of how AI systems are often agglomerations of numerous approaches.
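For readers unfamiliar with PageRank, a minimal power-iteration sketch over a toy link graph illustrates the algorithm the passage refers to. The four-page graph below is invented for illustration, and, as the text notes, real deployments combine this score with many other signals:

```python
# Minimal PageRank by power iteration on a toy 4-page link graph.
def pagerank(links, damping=0.85, iters=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # every page gets the "teleport" share, then inbound link shares
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy)
print(max(ranks, key=ranks.get))  # C -- it collects the most inbound weight
```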

[civ] The software was initially licensed for single machines only, so even very well resourced organisations weren’t able to replicate the functionality that Google enjoys, but the move was significant. In April 2016 that restriction was lifted.[cv] In October 2015, Facebook announced that it would follow suit by open sourcing the designs for Big Sur, the server which runs the company's latest AI algorithms.[cvi] Then in May 2016 Google open sourced a natural language processing programme playfully called Parsey McParseFace, and SyntaxNet, an associated software toolkit. Google claims that in the kinds of sentences it can be used with, Parsey’s accuracy is 94%, almost as good as the 95% score achieved by human linguists.[cvii] Open sourcing confers a number of advantages. One is a level of goodwill among the AI community. More importantly, researchers in academia and elsewhere will learn the systems, and be able to work closely with Google and Facebook – and indeed be hired by them.

It was the year when our media caught on to the idea that AI presents enormous opportunity and enormous risk. This was thanks in no small part to the publication the previous year of Nick Bostrom's book “Superintelligence”. It was also the year when cutting-edge AI systems used deep learning and other techniques to demonstrate human-level capabilities in image recognition, speech recognition and natural language processing. In hindsight, 2015 may well be seen as a tipping point. Machines don't have to make everybody unemployed to bring about an economic singularity. If a majority of people – or even just a large minority – can never get hired again, we will need a different type of economy. Furthermore, we don't have to be absolutely certain of this outcome to make it worthwhile to monitor developments and make contingency plans.


Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data by Leslie Sikos

AGPL, Amazon Web Services, bioinformatics, business process, cloud computing, create, read, update, delete, Debian, en.wikipedia.org, fault tolerance, Firefox, Google Chrome, Google Earth, information retrieval, Infrastructure as a Service, Internet of things, linked data, natural language processing, openstreetmap, optical character recognition, platform as a service, search engine result page, semantic web, Silicon Valley, social graph, software as a service, SPARQL, text mining, Watson beat the top human players on Jeopardy!, web application, wikimedia commons

To address this, Apache Any23 provides the Validator classes to implement a Rule precondition, which, when matched, will trigger the Fix method to correct the code. Owing to its comprehensive features, Any23 is implemented in major Semantic Web applications, such as Sindice. 85 Chapter 4 ■ Semantic Web Development Tools General Architecture for Text Engineering (GATE) The General Architecture for Text Engineering (GATE), an open source text processor tool developed by the University of Sheffield, uses Natural Language Processing (NLP) methods to generate RDF from text files [8]. GATE’s Ontology plug-in provides an API for manipulating OWL-Lite ontologies that can be serialized as RDF and RDFS. If you work with OWL-DL ontologies, classes that are subclasses of restrictions supported in OWL-Lite are usually shown, but the classes that are subclasses of other restrictions will not be displayed. Similarly, plain RDF/RDFS files will not be shown correctly, because there is no way for the API to represent many constructs that are allowed in RDF but not allowed in OWL-Lite.
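As a rough illustration of the text-to-RDF idea that GATE implements, the sketch below extracts capitalized entity candidates with a regex and emits them as N-Triples. The namespace and the entity heuristic are invented for illustration and bear no resemblance to GATE's actual NLP pipeline:

```python
import re

# Hypothetical namespace for the extracted entities (not a real vocabulary).
EX = "http://example.org/entity/"

def text_to_ntriples(text):
    """Naively treat runs of capitalized words as entities; emit N-Triples."""
    entities = re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b", text)
    triples = []
    for name in sorted(set(entities)):
        uri = EX + name.replace(" ", "_")
        triples.append(
            f'<{uri}> <http://www.w3.org/2000/01/rdf-schema#label> "{name}" .'
        )
    return triples

for t in text_to_ntriples("GATE was developed at the University of Sheffield."):
    print(t)
```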

Linked Open Data plays a crucial role in the DeepQA architecture, not only in generating candidate answers, but also to score the answers while considering multiple points of view, such as type coercion and geographic proximity [7]. The data extracted from DBpedia also supports entity disambiguation and relation detection. YAGO is used for entity type identification, wherein disjointness properties are manually assigned to higher level types in the YAGO taxonomy. To be able to answer questions from a variety of domains, Watson implements relation detection and entity recognition on top of the Natural Language Processing (NLP) algorithms that process factual data from Wikipedia [8]. BBC’s Dynamic Semantic Publishing The British Broadcasting Corporation (BBC) has implemented RDF since 2010 [9] in web sites such as the World Cup 2010 web site [10] and the London 2012 Olympics web site [11]. Today, BBC News [12], BBC Sport [13], and many other web sites across the BBC are authored and published using Semantic Web technologies.

Index „„         A AllegroGraph ACID implementation, 151 client installation, 156 editions, 151 graph algorithms, 152 Gruff, 160 high-performance storage, 213 Java API connection() method, 157 create method, 157 indexing, 158 RDF statement, 159 read only mode, 158 showTriples method, 159 SparqlSelect query, 159 Triplestore information, 159 quintuplestore, 151 server installation RPM package, 152 TAR Archive, 155 virtual machine image file, 155 text indexing, 151 WebView, 152 Apache Jena, 94, 99 Apache Marmotta, 111 Apache Stanbol, 91 Arachnophilia, 80 Atomic process, 221 Atomicity, Consistency, Isolation, and Durability (ACID), 151, 161 „„         B BBEdit, 80 Big Data applications BBC’s Dynamic Semantic Publishing, 212 Google Knowledge Graph data resources, 200 Google Knowledge Carousel, 201–202 Google Knowledge Panel, 200, 202 JSON-LD annotation, 203 LocalBusiness annotation, 205 SERPs, 200 Google Knowledge Vault, 202 high-performance storage, 214 IBM Watson, 212 Library of Congress Linked Data Service, 213 social media applications (see Social media applications) variety, 199 velocity, 199 veracity, 199 volume, 199 Blazegraph, 171 BlueFish editor, 80 British Broadcasting Corporation (BBC), 212 Business Process Execution Language (BPEL), 140 „„         C Callimachus, 112 Contexts and Dependency Injection (CDI), 111 createDefaultModel() method, 94 CubicWeb, 109 Cypher Query Language (CQL), 188 „„         D D2R server, 193 DBpedia, 63 DBpedia mobile, 116 query Eisenach query, 225 SPARQL endpoint, 64, 225 resources, 63, 64 Spotlight, 84 DeepQA system, 212 227 ■ index Development tools advanced text editors, 79 application development Apache Jena, 94, 99 Sesame (see Sesame) browsers DBpedia Mobile, 116 facet-based (faceted) browsing, 113 IsaViz, 116 marbles, 114 ODE, 114 pivoting (rotation), 113 RelFinder, 117 Tabulator, 113 IDEs (see Integrated Development Environments (IDEs)) linked data software Apache Marmotta, 111 Callimachus, 112 LODStats, 113 Neologism, 112 
sameAs.org, 112 Sindice, 110 ontology editors Apache Stanbol, 91 development stages, 86 Fluent Editor, 91 Protégé (see Protégé) SemanticWorks, 89 SML, 92 TopBraid Composer, 90 ZOOMA, 91 RDFizers Apache Any23, 85 GATE, 86 OpenRefine, 86 reasoners ABOX reasoning, 92 FaCT++, 94 HermiT, 92 OWL API, 92 OWLLink support, 92 Pellet, 93 RACER, 94 semantic annotators and converters DBpedia Spotlight, 84 Google Structured Data Testing Tool, 84 RDFa 1.1 distiller and parser, 82 RDFa Play, 82 RDF distiller, 83 Direct graph, 218 Direct mapping, 218 228 „„         E Eclipse Apache Jena set up, 99 JDK installation, 98, 99 Sesame set up, 103 EditPlus, 80 „„         F Facebook Graph API current state representation, 207 Facebook Module, 210 Graph API Explorer Android, 209 fields of node, 208 FOAF profile augmentation, 210 HTTP GET requests, 208 identifier and user name, 207 iOS, 209 JavaScript, 209 JSON-Turtle conversions, 210 Linked Data, 210 PHP, 209 RDF triples, 209 RDF/Turtle output, 210 Turtle translation, 209 JSON, 207 RESTful JSON API, 207 unique identifier, 207 Facebook Module of Apache Marmotta’s LDClient library, 210 Fast Classification of Terminologies (FaCT++), 94 Fluent Editor, 91 4Store application process, 169 RDF file, 169 rest-client installation, 170 SPARQL query, 170 SPARQL server, 169, 195 Fuseki, 192 „„         G General Architecture for Text Engineering (GATE), 86 GeoNames, 65 Gleaning Resource Descriptions from Dialects of Languages (GRDDL), 39 Google Knowledge Graph data resources, 200 Google Knowledge Carousel, 201–202 Google Knowledge Panel, 200, 202 ■ Index JSON-LD annotation Band in Markup, 203 product description, 203 product offering, 204 LocalBusiness annotation, 205 SERPs, 200 Google Knowledge Panel, 200 Graph databases 4Store process, 169 RDF file, 169 rest-client installation, 170 SPARQL query, 170 advantages, 146, 149 AllegroGraph (see AllegroGraph) Blazegraph, 171 definition, 145 features, 146 index-free adjacency, 145 named graph, 149–150 Neo4j 
(see Neo4j) Oracle, 171 processing engine, 145 quadstore, 149 storage, 145 triplestores, 149 Graphical User Interface (GUI), 86–87 Gruff, 160 „„         H Hadoop Distributed File System (HDFS), 171 „„         I IBM Watson Developers Cloud, 212 Integrated Development Environments (IDEs) CubicWeb, 109 Eclipse Apache Jena set up, 99 Java Development Kit installation, 99 Sesame set up, 103 NetBeans, 108 Internationalized Domain Names (IDN), 9 International Standard Book Number (ISBN), 16 Internet Reasoning Service (IRS), 141 IsaViz, 116 „„         J Java Development Kit (JDK), 99 Java Runtime Environment (JRE), 99 JavaScript Object Notation for Linked Data (JSON-LD), 37 Java Virtual Machine (JVM), 99 „„         K Knowledge representation standards GRDDL, 39 HTML5 microdata attributes, 35 microdata DOM API, 37 JSON-LD, 37 machine-readable annotation formats, 23 microformats drafts and future, 32 hCalendar, 25 hCard, 26 h-event, 26 rel=“license”, 28 rel=“nofollow”, 29 rel=“tag”, 30 URI profile, 25 vote links, 30 XFN, 30 XMDP, 31 OWL classes, 51 description logic, 46 properties, 50 syntaxes, 49 variants, 48 parsers, 54 R2RML, 40 RDF, 18 RDFa, 32 RDFS classes, 42 domains and ranges, 44 instance, 42 properties, 44 subclasses, 42 reasoning, 54 RIF, 53 SKOS, 53 vocabularies and ontologies books, 16 DOAP, 17 e-commerce, 16 FOAF, 13 licensing, 17 media ontologies, 18 metadata, 15 online communities, 18 person vocabularies, 15 PRISM, 16 publications, 16 schema.org, 14 Komodo Edit, 80 229 ■ index „„         L „„         P LinkedGeoData, 66 Linked Open Data (LOD) cloud diagram, 67 collections, 67 creation interlinking, 72 licenses, 71 RDF statements, 72 RDF structure, 70 your dataset, 74 DBpedia, 63 five-star rating system, 60 GeoNames, 65 LinkedGeoData, 66 principles, 59 RDF crawling, 62 RDF dumps, 62 SPARQL endpoints, 62 visualization, 75 Wikidata, 65 YAGO, 67 LODStats, 113 Pellet, 93 Persistent Uniform Resource Locators (PURLs), 9 Process model, 129 Protégé Active Ontology tab, 
88 application, 86 class hierarchies, 88 command line, 86 GUI, 87 HermiT reasoner, 93 Individuals tab, 88 Learning Health System, 86 Object Properties and Data Properties tabs, 88 OntoGraf tab, 88 OWLViz, 88 SPARQL Query tab, 89 URIs, 88 PublishMyData, 195 „„         Q „„         M Quadstores, 149 MAchine-Readable Cataloging (MARC), 213 MicroWSMO, 137 „„         R „„         N Named graph, 149 Natural Language Processing (NLP) methods, 86 Neo4j, 161 Cypher commands, 163 graph style sheet, 163 Java API database installation, 165 Eclipse, 164, 168 node method, 166 main method, 166 RDF statement, 167 shut down method, 167 WEBSITE_OF method, 166 server installation, 161 web interface, 162 Neologism, 112 NetBeans, 108 Notepad++, 80 „„         O OpenLink Data Explorer (ODE), 114 OpenLink Virtuoso, 190 OpenRefine, 86 Oracle, 171 230 RACER, 94 RDB2RML (R2RML), 40 RDB to RDF direct mapping employee database table, 217 employee_project database table, 217 project database table, 218 source code, 218 Red Hat Package Manager (RPM) package, 152 Relational database (RDB), 217 RelFinder, 117 Renamed ABox and Concept Expression Reasoner (Racer), 94 rep.initialize() method, 104 Resource Description Framework (RDF), 217 attributes, 32 crawling, 62 dumps, 62 graph, 20, 145 R2RML, 40 statements, 72 structure creation, 70 triples/statements, 19, 220 turtle, 20 vocabulary, 18 RESTful JSON API, 207 Rule Interchange Format (RIF), 53 ■ Index „„         S sameAs.org, 112 Search engine optimization (SEO), 79 Search Engine Result Pages (SERPs), 84, 200 Semantic Annotations for Web Service Description Language (SAWSDL), 127 Semantic Automated Discovery and Integration (SADI), 142 Semantic Measures Library (SML), 92 Semantic search engines, 189 Semantic Web technology, 1 Big Data (see Big Data applications) components AI, 5 controlled vocabularies, 5 inference, 7 ontologies, 6 taxonomies, 5 features, 8 structured data, 2 web evolution, 2 Semantic Web Services OWL-S (see Web Ontology Language 
for Services (OWL-S)) process, 121 properties, 122 SOAP fault structure, 124 message structure, 122 software IRS, 141 SADI, 142 WSMT, 141 WSMX, 141 UDDI, 142 WS-BPEL (see Web Services Business Process Execution Language (WS-BPEL)) WSDL (see Web Service Description Language (WSDL)) WSML (see Web Service Modeling Language (WSML)) WSMO (see Web Service Modeling Ontology (WSMO)) SemanticWorks, 89 Service profile, 129 Sesame Alibaba, 96 Eclipse, 103 empty graph creation, 98 Graph API, 97 local repository, 96 RDF Model API, 97 RDF triplestore, 96 RemoteRepositoryManager, 97 Repository API, 96 SAIL, 97 triple support, 98 default ValueFactory implementation, 97 Sesame RDF Query Language (SeRQL), 186 Simple Knowledge Organization System (SKOS), 53 Simple Object Access Protocol (SOAP) binding interface, 127 fault structure, 124 message structure, 122 Sindice, 85, 110 SOAPssage, 123 Social media applications Facebook Social Graph Facebook Graph API (see Facebook Graph API) friends recommendation, 206–207 node and edge, 206 Open Graph Protocol, 211 Twitter Cards, 211 Software as a Service (SaaS), 195 SPARQL endpoint 4store’s HTTP server, 195 callback function, 196 D2R configuration file, 193 D2R server installation, 193 Fuseki, 192 jQuery request data, 195 JSON-P request data, 196 OpenLink Virtuoso process, 190–191 PublishMyData request data, 195–196 URL encoding PublishMyData, 195 SPARQL queries ASK query, 179 CONSTRUCT query, 180 core types, 176 CQL, 188 default namespace, 174 DESCRIBE query, 180 existence checking function, 177 federated query, 181 graph management operations ADD operation, 185–186 COPY DEFAULT TO operation, 184 default graph, 184 MOVE DEFAULT TO operation, 185 graph patterns, 176 graph update operations DELETE DATA operation, 183 INSERT DATA operation, 182–183 language checking function, 177 LOD datasets, 189 multiple variable match, 176 namespace declaration, 173 one variable match, 176 231 ■ index SPARQL queries (cont.) 
property path, 177 public SPARQL endpoints, 190 query engine remove graph property value, 187 Sesame Graph API, 187 Sesame Repository API, 186 RDF graph, 174 RDF triple matching, 176 REASON query, 181 SELECT query, 178–179 solution modifiers, 178 SPARQL 1.0 core types, 175 SPARQL 1.1 aggregation, 175 entailment regimes, 175 service description, 175 Uniform HTTP Protocol, 175 Update language, 175 SPARQL endpoint (see SPARQL endpoint) structure, 174 triple patterns, 176 URI syntax, 173 Storage And Inference Layer API (SAIL), 97 „„         T TextWrangler, 80 TopBraid Composer, 90 Triples map, 218 Twitter Cards, 211 „„         U, V Uniform Resource Identifier (URI), 9 Uniform Resource Names (URNs), 9 Universal Description, Discovery and Integration (UDDI), 142 US Library of Congress, 213 „„         W Web Ontology Language (OWL), 129 classes, 51 description logic, 46 properties, 50 syntaxes, 49 variants, 48 Web Ontology Language for Services (OWL-S) atomic process, 221 output and effect conditions, 132 parameter class, 130 232 precondition process, 131 process agents, 131 properties, 130 service process, 131 situation calculus, 129 SWRL variable, 130 URI value, 131 Web resource identifiers, 8 Web Services Business Process Execution Language (WS-BPEL), 140 Web Services Description Language (WSDL) data types, 126 elements, 124 endpoint element, 127 HTTP binding interface, 126 interface element, 126 namespace declaration, 125 SAWSDL annotation file, 128 modelReference, 127 skeleton document, 125 SOAP binding interface, 127 Web Service Modeling eXecution environment (WSMX), 141 Web Service Modeling Language (WSML) importsOntology, 139 IRI quotes, 139 mediator, 140 namespace declarations, 139 nonfunctional property, 139 syntaxes, 138 XML schema data types, 138 Web Service Modeling Ontology (WSMO) choreography and orchestration expresses, 137 class capability, 137 components, 133 definition, 133 entity set definitions, 135 function, 136 goal class, 137 mediators, 133–134 
MicroWSMO, 137 nonfunctional properties, 134 ontology instance, 136 post-condition creditcard service, 224 pre-condition creditcard service, 224 relations definition, 135 service class, 134 service goal definition, 223 travel agency modeling, 223 WSMO-lite, 137 ■ Index Web Services Modeling Toolkit (WSMT), 141 Wikidata, 65 WSMO-Lite, 137 „„         X XHTML Friends Network (XFN), 30 XHTML MetaData Profiles (XMDP), 31 „„         Y Yet Another Great Ontology (YAGO), 67 „„         Z ZOOMA, 91 233 Mastering Structured Data on the Semantic Web From HTML5 Microdata to Linked Open Data Leslie F.


pages: 321

Finding Alphas: A Quantitative Approach to Building Trading Strategies by Igor Tulchinsky

algorithmic trading, asset allocation, automated trading system, backtesting, barriers to entry, business cycle, buy and hold, capital asset pricing model, constrained optimization, corporate governance, correlation coefficient, credit crunch, Credit Default Swap, discounted cash flows, discrete time, diversification, diversified portfolio, Eugene Fama: efficient market hypothesis, financial intermediation, Flash crash, implied volatility, index arbitrage, index fund, intangible asset, iterative process, Long Term Capital Management, loss aversion, market design, market microstructure, merger arbitrage, natural language processing, passive investing, pattern recognition, performance metric, popular capitalism, prediction markets, price discovery process, profit motive, quantitative trading / quantitative finance, random walk, Renaissance Technologies, risk tolerance, risk-adjusted returns, risk/return, selection bias, sentiment analysis, shareholder value, Sharpe ratio, short selling, Silicon Valley, speech recognition, statistical arbitrage, statistical model, stochastic process, survivorship bias, systematic trading, text mining, transaction costs, Vanguard fund, yield curve

There are many vendors that specialize in collecting, parsing, processing, and delivering data. If the data is simple, vendors may provide only the raw data they have collected, such as price and volume. Sometimes vendors do parsing and processing before providing data to their clients; fundamental data is an example. For unstructured yet sophisticated data, such as news, Twitter posts, and so on, vendors typically apply natural language processing techniques to analyze the content of the raw data. They provide machine-readable data to their clients instead of raw data that is only human-readable. Some vendors even sell alpha models directly – this means the data itself is the output of alpha models. The clients need only to load the data and trade according to it. Such alpha models are risky, however, because they may be overfit and/or overcrowded, with many clients of the same vendor trading the same model.

The name comes from Canadian scientist Geoffrey Hinton, who created an unsupervised method known as the restricted Boltzmann machine (RBM) for pretraining NNs with a large number of neuron layers. That was meant to improve on the backpropagation training method, but there is no strong evidence that it really was an improvement. Another direction in deep learning is recurrent neural networks (RNNs) and natural language processing. One problem that arises in calibrating RNNs is that the changes in the weights from step to step can become too small or too large. This is called the vanishing gradient problem. These days, the words “deep learning” more often refer to convolutional neural networks (CNNs). The architecture of CNNs was introduced by computer scientists Kunihiko Fukushima, who developed the 126 Finding Alphas neocognitron model (feed-forward NN), and Yann LeCun, who modified the backpropagation algorithm for neocognitron training.
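The vanishing gradient problem mentioned above can be shown numerically: backpropagating through T time steps multiplies the gradient by the recurrent weight (times the activation derivative) at every step, so it either shrinks toward zero or blows up. A minimal sketch, with 0.9 and 1.1 as illustrative per-step factors:

```python
# Repeated multiplication through T time steps of a recurrent net.
T = 50
grad_small = 1.0
grad_large = 1.0
for _ in range(T):
    grad_small *= 0.9   # |w * f'| < 1 -> gradient vanishes
    grad_large *= 1.1   # |w * f'| > 1 -> gradient explodes
print(f"after {T} steps: {grad_small:.6f} vs {grad_large:.1f}")
# the small gradient is ~0.005 (useless for learning long-range
# dependencies), while the large one has grown past 100
```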

Developing momentum alphas on liquid universes (sets of more efficient stocks) is a particular challenge, which requires deeper exploration. 22 The Impact of News and Social Media on Stock Returns By Wancheng Zhang INTRODUCTION Stock prices naturally respond to news. But in recent years, news and sentiment seen on social media have grown increasingly significant as potential predictors of stock prices. However, it is challenging to make alphas using news. As unstructured data that often includes text and multimedia content, news cannot be understood directly by a computer. We can use natural language processing (NLP) and machine learning methods to classify and score raw news content, and we can measure additional properties of the news, such as novelty, relevance, and category, to better describe the sentiment of the news. Similar techniques can be applied to social media data to generate alphas, though we should bear in mind that social media has much more volume and is much noisier than conventional news media.
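As a toy illustration of scoring raw news content, the sketch below uses a hand-made bag-of-words lexicon. The word lists, headlines, and scoring rule are invented stand-ins for the NLP and machine learning pipelines the chapter describes:

```python
import re

# Tiny illustrative sentiment lexicon (not a real financial word list).
POSITIVE = {"beat", "beats", "growth", "upgrade", "record", "profit"}
NEGATIVE = {"miss", "misses", "lawsuit", "downgrade", "recall", "loss"}

def sentiment(headline):
    """Return (#positive - #negative) / #tokens, a score in [-1, 1]."""
    tokens = re.findall(r"[a-z]+", headline.lower())
    if not tokens:
        return 0.0
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return score / len(tokens)

print(sentiment("Acme beats estimates, raises profit outlook"))  # positive (> 0)
print(sentiment("Regulator hits Acme with recall and lawsuit"))  # negative (< 0)
```

A real pipeline would also weight the score by the novelty and relevance measures the text mentions, since stale or tangential news carries less signal.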


pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline by Cathy O'Neil, Rachel Schutt

Amazon Mechanical Turk, augmented reality, Augustin-Louis Cauchy, barriers to entry, Bayesian statistics, bioinformatics, computer vision, correlation does not imply causation, crowdsourcing, distributed generation, Edward Snowden, Emanuel Derman, fault tolerance, Filter Bubble, finite state, Firefox, game design, Google Glasses, index card, information retrieval, iterative process, John Harrison: Longitude, Khan Academy, Kickstarter, Mars Rover, Nate Silver, natural language processing, Netflix Prize, p-value, pattern recognition, performance metric, personalized medicine, pull request, recommendation engine, rent-seeking, selection bias, Silicon Valley, speech recognition, statistical model, stochastic process, text mining, the scientific method, The Wisdom of Crowds, Watson beat the top human players on Jeopardy!, X Prize

format=json&query=nytd_section_facet: [%s]&fields=url,title,body&rank=newest&offset=%s &api-key=Your_Key_Here"

# create an empty list to hold 3 result sets
resultsSports <- vector("list", 3)
## loop through 0, 1 and 2 to call the API for each value
for(i in 0:2) {
  # first build the query string, replacing the first %s with Sports
  # and the second %s with the current value of i
  tempCall <- sprintf(theCall, "Sports", i)
  # make the query and get the json response
  tempJson <- fromJSON(file=tempCall)
  # convert the json into a 10x3 data.frame and save it to the list
  resultsSports[[i + 1]] <- ldply(tempJson$results, as.data.frame)
}
# convert the list into a data.frame
resultsDFSports <- ldply(resultsSports)
# make a new column indicating this comes from Sports
resultsDFSports$Section <- "Sports"

## repeat that whole business for Arts
## ideally you would do this in a more eloquent manner, but this is just for illustration
resultsArts <- vector("list", 3)
for(i in 0:2) {
  tempCall <- sprintf(theCall, "Arts", i)
  tempJson <- fromJSON(file=tempCall)
  resultsArts[[i + 1]] <- ldply(tempJson$results, as.data.frame)
}
resultsDFArts <- ldply(resultsArts)
resultsDFArts$Section <- "Arts"

# combine them both into one data.frame
resultBig <- rbind(resultsDFArts, resultsDFSports)
dim(resultBig)
View(resultBig)

## now time for tokenizing
# create the document-term matrix in English, removing numbers
# and stop words and stemming words
doc_matrix <- create_matrix(resultBig$body, language="english",
                            removeNumbers=TRUE, removeStopwords=TRUE,
                            stemWords=TRUE)
doc_matrix
View(as.matrix(doc_matrix))

# create a training and testing set
theOrder <- sample(60)
container <- create_container(matrix=doc_matrix, labels=resultBig$Section,
                              trainSize=theOrder[1:40],
                              testSize=theOrder[41:60], virgin=FALSE)

Historical Context: Natural Language Processing

The example in this chapter where the raw data is text is just the tip of the iceberg of a whole field of research in computer science called natural language processing (NLP). The types of problems that can be solved with NLP include machine translation, where given text in one language, the algorithm can translate the text to another language; semantic analysis; part-of-speech tagging; and document classification (of which spam filtering is an example). Research in these areas dates back to the 1950s.

Figure 9-7 shows an example of a display box, which is designed to convey a retro vibe. Each box has an embedded Linux processor running Python, and a sound card that makes various sounds—clicking, typing, waves—depending on what scene is playing. Figure 9-7. Display box for Moveable Type The data is collected via text from New York Times articles, blogs, and search engine activity. Every sentence is parsed using Stanford natural language processing techniques, which diagram sentences. Altogether there are about 15 scenes so far, and it’s written in code so one can keep adding to it. Here’s a YouTube interview with Mark and Ben about the exhibit. Project Cascade: Lives on a Screen Mark next told us about Cascade, which was a joint work with Jer Thorp—data artist-in-residence at the New York Times—in partnership with bit.ly.

To understand the pictures, you imagine there’s a force, like a wind, which sends the nodes (blogs) out to the edge, but then there’s a counteracting force, namely the links between blogs, which attach them together. Figure 10-1 shows an example of the Arabic blogosphere. Figure 10-1. Example of the Arabic blogosphere The different colors represent countries and clusters of blogs. The size of each dot is centrality through degree, i.e., the number of links to other blogs in the network. The physical structure of the blogosphere can give us insight. If we analyze text using natural language processing (NLP), thinking of the blog posts as a pile of text or a river of text, then we see the micro or macro picture only—we lose the most important story. What’s missing there is social network analysis (SNA), which helps us map and analyze the patterns of interaction. The 12 different international blogospheres, for example, look different. We can infer that different societies have different interests, which give rise to different patterns.
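The "centrality through degree" measure mentioned above can be sketched in a few lines (the blog names and links here are invented for illustration): in an undirected link graph, a node's degree is simply how many other blogs it is connected to.

```python
# Toy undirected link graph: each pair is a link between two blogs.
links = {
    ("blogA", "blogB"),
    ("blogA", "blogC"),
    ("blogB", "blogC"),
    ("blogA", "blogD"),
}

# Degree centrality: count the links touching each node.
degree = {}
for a, b in links:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1

print(max(degree, key=degree.get))  # blogA is the most central (largest dot)
```

In the blogosphere maps, this count directly sets the size of each dot, while the clustering of the layout comes from the attracting force of the links themselves.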


Mastering Machine Learning With Scikit-Learn by Gavin Hackeling

computer vision, constrained optimization, correlation coefficient, Debian, distributed generation, iterative process, natural language processing, Occam's razor, optical character recognition, performance metric, recommendation engine

Prior to joining Reonomy, Sarah earned a Master's degree from the University of Michigan School of Information. Mikhail Korobov is a software developer at ScrapingHub Inc., where he works on web scraping, information extraction, natural language processing, machine learning, and web development tasks. He is an NLTK team member, Scrapy team member, and an author or contributor to many other open source projects. I'd like to thank my wife, Aleksandra, for her support and patience and for the cookies. Aman Madaan is currently pursuing his Master's in Computer Science and Engineering. His interests span across machine learning, information extraction, natural language processing, and distributed computing. More details about his skills, interests, and experience can be found at http://www.amanmadaan.in. www.PacktPub.com Support files, eBooks, discount offers, and more You might want to visit www.PacktPub.com for support files and downloads related to your book.

Data sets with even a modest number of features can result in mapped feature spaces with massive dimensions. scikit-learn provides several commonly used kernels, including the polynomial, sigmoid, Gaussian, and linear kernels. Polynomial kernels are given by the following equation:

K(x, x′) = (1 + x · x′)^k

Quadratic kernels, or polynomial kernels where k is equal to 2, are commonly used in natural language processing. The sigmoid kernel is given by the following equation, where γ and r are hyperparameters that can be tuned through cross-validation:

K(x, x′) = tanh(γ(x · x′) + r)

The Gaussian kernel is a good first choice for problems requiring nonlinear models. The Gaussian kernel is a radial basis function. A decision boundary that is a hyperplane in the mapped feature space is similar to a decision boundary that is a hypersphere in the original space.
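The point of the kernel trick is that K(x, x′) = (1 + x · x′)^k equals the dot product of explicitly mapped feature vectors without ever constructing them. A minimal from-scratch check for the quadratic case in two dimensions (the explicit feature map shown is one standard choice, written out here for illustration):

```python
import math

def poly_kernel(x, y, k=2):
    # K(x, x') = (1 + x . x')^k, computed directly in the input space
    return (1 + sum(a * b for a, b in zip(x, y))) ** k

def phi(x):
    # Explicit quadratic feature map for 2-D inputs:
    # phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, sqrt(2)x1x2, x2^2)
    x1, x2 = x
    r2 = math.sqrt(2)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 0.5)
direct = poly_kernel(x, y)                            # kernel trick: no mapping
mapped = sum(a * b for a, b in zip(phi(x), phi(y)))   # map first, then dot
print(direct, mapped)                                 # both equal 25.0
```

The two numbers agree, yet the direct computation never touches the six-dimensional mapped space; for high-degree kernels on many features, that mapped space would be enormous.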


pages: 125 words: 27,675

Applied Text Analysis With Python: Enabling Language-Aware Data Products With Machine Learning by Benjamin Bengfort, Rebecca Bilbro, Tony Ojeda

full text search, natural language processing, quantitative easing, sentiment analysis, statistical model

Machine Learning on Text As discussed in [Link to Come], natural language is flexible, evolves over time, and depends on context. Computation and analysis on language must also be flexible, therefore the primary computational technique for text analytics is machine learning. Learning techniques give data scientists the ability to train models in a specific context on a specific corpus, make predictions on new data, and adapt over time as the corpus grows and changes. In fact, most natural language processing uses machine learning in one form or another, from tokenization and part of speech tagging, as we saw in the previous chapter, to named entity recognition, entailment, and parsing. More recently, textual machine learning has enabled applications that utilize sentiment analysis, word sense disambiguation, automatic translation and tagging, scene recognition, captioning, chatbots, and more!

This allows us to compare models without bias and prevents overfit, indicating that the model is able to generalize to unseen inputs. However, even when data is split into training and test sets, there is a potential that certain chunks of the data will have more variance than others. To handle this case, we shuffle our dataset, and divide into k train and test splits, averaging the scores for each split. Note that if our natural language processing application didn’t have to do any machine learning work, the CorpusReader would be enough; after preprocessing, the text could go directly into a transformer. However, if we use the sklearn.cross_validation.train_test_split function directly on the reader, the data would be loaded into memory all at once, leaving us precious little RAM for computation if any at all. Although eventually we have to load a matrix representation of our entire corpus into memory, at that point documents will be described as numeric sparse matrices, which are much more compact!
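The shuffle-and-split procedure described above can be written from scratch in a few lines (a sketch, not the sklearn implementation; `train_and_score` is a hypothetical callback that fits on the training indices and returns a score on the test indices):

```python
import random

def kfold_score(n_samples, k, train_and_score, seed=42):
    # Shuffle the indices once, split into k roughly equal folds,
    # and average the score over the k train/test splits.
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k

# Dummy scorer for demonstration: "accuracy" is the fraction of
# even-numbered indices in the held-out fold.
score = kfold_score(100, 5, lambda train, test: sum(j % 2 == 0 for j in test) / len(test))
print(score)
```

Because each sample appears in exactly one test fold, averaging over folds smooths out the chunk-to-chunk variance that a single train/test split would suffer from.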


Speaking Code: Coding as Aesthetic and Political Expression by Geoff Cox, Alex McLean

4chan, Amazon Mechanical Turk, augmented reality, bash_history, bitcoin, cloud computing, computer age, computer vision, crowdsourcing, dematerialisation, Donald Knuth, Douglas Hofstadter, en.wikipedia.org, Everything should be made as simple as possible, finite state, Gödel, Escher, Bach, Jacques de Vaucanson, Larry Wall, late capitalism, means of production, natural language processing, new economy, Norbert Wiener, Occupy movement, packet switching, peer-to-peer, Richard Stallman, Ronald Coase, Slavoj Žižek, social software, social web, software studies, speech recognition, stem cell, Stewart Brand, The Nature of the Firm, Turing machine, Turing test, Vilfredo Pareto, We are Anonymous. We are Legion, We are the 99%, WikiLeaks

The title makes reference to the Greek myth in which Pygmalion, a sculptor, falls in love with a statue he carves, and Venus grants it the breath of life.47 The Pygmalion myth stands as a useful analogy for the idea of breathing life into machines. The chatterbot Eliza, produced by Joseph Weizenbaum between 1964 and 1966 and named after Eliza Doolittle, is a disarmingly simple example based on similar motivations: to simulate a believable exchange with a human conversant. It uses primitive natural-language processing to simulate a conversation with a therapist, producing human-like responses by implementing a simple script based on key words and language patterns through which responses are generated. Here is an example: I am the psychotherapist. Please, describe your problems. > Hello world. Why do you say hello world? > It's the orthodox way to begin when using a new computer language. You have your mind on computers, it seems. > Sometimes I think I am a computer.

Maybe your life has something to do with this. Without involving complex algorithms, it can generate responses that appear to make some degree of sense. Yet it is interesting to note how the illusion of conversation follows an extremely reductive model of human expression, and the fantasies of machine intelligence seem to be similarly founded on reductive logic. At the same time, natural-language processing programs and other chatterbots offer good examples of the speechlike procedures mentioned thus far, as well as the apparent impossibility of duplicating actual speech. Intelligence To demonstrate believability, a machine would be required to possess some kind of intelligence that reflects the capacity for human reasoning, in parallel to turning mere voice sounds into proper speech that expresses human gentility.
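The "simple script based on key words and language patterns" can be sketched in a few lines (a toy in the spirit of ELIZA; these patterns and canned replies are invented for illustration, not Weizenbaum's actual script):

```python
import re

# Hypothetical keyword rules, tried in order; matched groups are echoed back.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bhello (.+)", re.IGNORECASE), "Why do you say hello {0}?"),
    (re.compile(r"\bcomputer", re.IGNORECASE), "You have your mind on computers, it seems."),
]

def respond(utterance):
    text = utterance.strip().rstrip(".!?")     # drop trailing punctuation
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return "Please, go on."                    # default when no keyword fires

print(respond("Hello world."))   # Why do you say hello world?
```

Even this ten-line version reproduces the flavor of the transcript above, which makes the point vividly: the illusion of conversation rests on a handful of pattern-matching rules, not on any model of meaning.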

The choice of the song “Daisy Bell” is explained on the site: “originally written by Harry Dacre in 1892, was made famous in 1962 by John Kelly, Max Mathews, and Carol Lockbaum as the first example of musical speech synthesis. In contrast to the 1962 version, Bicycle Built for 2,000 was synthesized with a distributed system of human voices from all over the world.” 120. 2001: A Space Odyssey (1968, dir. Stanley Kubrick, Metro-Goldwyn-Mayer). HAL is a computer capable of speech, speech recognition, facial recognition, natural language processing, lip reading, art appreciation, interpreting and reproducing emotional behaviors, reasoning, and playing chess. 121. The full story was on the Forumwarz blog but is no longer available. See http://en.wikipedia.org/wiki/Forumwarz. Thanks to Robert Jackson for identifying this example. 122. Ibid. 123. Berardi, The Soul at Work, 89. 124. Ibid., 207. 125. This is something that Berardi also identifies in the article “An Introduction to Therapoetry: The Voice Against the Image / Poetry Against Semiocapital,” in Geoff Cox, Nav Haq, and Tom Trevor, eds., “Art, Activism and Recuperation,” Concept Store Journal 3 (Bristol: Arnolfini, 2010). 3 Coding Publics 1.


pages: 134 words: 29,488

Python Requests Essentials by Rakesh Vidya Chandra, Bala Subrahmanyam Varanasi

create, read, update, delete, en.wikipedia.org, Kickstarter, MITM: man-in-the-middle, MVC pattern, natural language processing, RFC: Request For Comment, RFID, supply-chain management, web application

Interacting with Social Media Using Requests In this contemporary world, our lives are woven with a lot of interactions and collaborations with social media. The information that is available on the web is very valuable and is used by abundant resources. For instance, the news that is trending in the world can be spotted easily from a Twitter hashtag, and this can be achieved by interacting with the Twitter API. Using natural language processing, we can classify the emotion of a person by grabbing the Facebook status of an account. All this can be accomplished easily with the help of Requests using the concerned APIs. Requests is a perfect module if we want to reach out to an API frequently, as it supports pretty much everything, like caching, redirection, proxies, and so on. We will cover the following topics in this chapter: • Interacting with Twitter • Interacting with Facebook • Interacting with reddit API introduction Before diving into details, let us have a quick look at what exactly an Application Programming Interface (API) is.
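A minimal sketch of building a Twitter API call with Requests (the endpoint path and bearer token below are placeholders: a real call needs valid Twitter credentials, and the API's versions and paths change over time; preparing the request without sending it lets us inspect the final URL):

```python
import requests

# Placeholder credentials -- a real application must supply its own token.
session = requests.Session()
session.headers["Authorization"] = "Bearer YOUR_ACCESS_TOKEN"

# Build, but do not send, a search request for a trending hashtag.
req = requests.Request(
    "GET",
    "https://api.twitter.com/1.1/search/tweets.json",
    params={"q": "#python", "count": 10},
)
prepared = session.prepare_request(req)
print(prepared.url)  # session.send(prepared) would actually execute the call
```

Requests handles the URL encoding (note the hashtag becomes %23), header management, and, on a real send, redirects and connection pooling, which is exactly why it suits frequent API work.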

They contain information about the date of birth, gender, place, income, and so on, of the people of a country. Unstructured data In contrast to structured data, unstructured data either misses out on a standard format or stays unorganized even though a specific format is imposed on it. For this reason, it becomes difficult and tedious to deal with different parts of the data. To handle unstructured data, different techniques such as text analytics, Natural Language Processing (NLP), and data mining are used. Images, scientific data, and text-heavy content (such as newspapers, health records, and so on) come under the unstructured data type. Semistructured data Semistructured data is a type of data that follows an irregular trend or has a structure that changes rapidly. This data can be self-describing; it uses tags and other markers to establish a semantic relationship among the elements of the data.


pages: 118 words: 35,663

Smart Machines: IBM's Watson and the Era of Cognitive Computing (Columbia Business School Publishing) by John E. Kelly Iii

AI winter, call centre, carbon footprint, crowdsourcing, demand response, discovery of DNA, disruptive innovation, Erik Brynjolfsson, future of work, Geoffrey West, Santa Fe Institute, global supply chain, Internet of things, John von Neumann, Mars Rover, natural language processing, optical character recognition, pattern recognition, planetary scale, RAND corporation, RFID, Richard Feynman, smart grid, smart meter, speech recognition, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!

Together, we can drive the exploration and invention that will shape society, the economy, and business for the next fifty years.

1 A NEW ERA OF COMPUTING

IBM’s Watson computer created a sensation when it bested two past grand champions on the TV quiz show Jeopardy! Tens of millions of people suddenly understood how “smart” a computer could be. This was no mere parlor trick; the scientists who designed Watson built upon decades of research in the fields of artificial intelligence and natural-language processing and produced a series of breakthroughs. Their ingenuity made it possible for a system to excel at a game that requires both encyclopedic knowledge and lightning-quick recall. In preparation for the match, the machine ingested millions of pages of information. On the TV show, first broadcast in February 2011, the system was able to search that vast storehouse in response to questions, size up its confidence level, and, when sufficiently confident, beat the humans to the buzzer.

As it acquires answers, it will build a collection of learned axioms that strengthen its command of given domains. Other improvements to Watson have come. People are now able to view the logic and evidence upon which Watson presents options. Watson is now able to digest not just textual information but also structured statistical data, such as electronic medical records. A different group at IBM is working on natural-language-processing technology that will allow people to engage in spoken conversations with Watson. At the highest level, many of the changes are aimed at moving Watson from answering specific questions to dealing with complex and incomplete problem scenarios—the way humans experience things. In fact, as people in particular professions and industries experiment with Watson, they find that the basic question-and-answer capabilities, while useful, are not the most valuable aspects of the systems.


pages: 370 words: 107,983

Rage Inside the Machine: The Prejudice of Algorithms, and How to Stop the Internet Making Bigots of Us All by Robert Elliott Smith

Ada Lovelace, affirmative action, AI winter, Alfred Russel Wallace, Amazon Mechanical Turk, animal electricity, autonomous vehicles, Black Swan, British Empire, cellular automata, citizen journalism, Claude Shannon: information theory, combinatorial explosion, corporate personhood, correlation coefficient, crowdsourcing, Daniel Kahneman / Amos Tversky, desegregation, discovery of DNA, Douglas Hofstadter, Elon Musk, Fellow of the Royal Society, feminist movement, Filter Bubble, Flash crash, Gerolamo Cardano, gig economy, Gödel, Escher, Bach, invention of the wheel, invisible hand, Jacquard loom, Jacques de Vaucanson, John Harrison: Longitude, John von Neumann, Kenneth Arrow, low skilled workers, Mark Zuckerberg, mass immigration, meta analysis, meta-analysis, mutually assured destruction, natural language processing, new economy, On the Economy of Machinery and Manufactures, p-value, pattern recognition, Paul Samuelson, performance metric, Pierre-Simon Laplace, precariat, profit maximization, profit motive, Silicon Valley, social intelligence, statistical model, Stephen Hawking, stochastic process, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, The Future of Employment, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, Thomas Bayes, Thomas Malthus, traveling salesman, Turing machine, Turing test, twin studies, Vilfredo Pareto, Von Neumann architecture, women in the workforce

., here Milbanke, Anne Isabella, here Mill, John Stuart, here, here, here, here Minsky, Marvin, here, here, here modelling, economic, here Morel, Edmund, here motivated reasoners, here, here, here, here MTurk, here Murray, Charles, here Musk, Elon, here, here MYCIN, here, here, here, here mythemes, here natural language processing. See NLP natural selection, here, here, here, here, here, here, here, here Nautical Almanac, here, here neural networks, here, here, here, here, here, here Newell, Allen, here Newton, Sir Issac, here, here Nietzsche, Friedrich, here NLP (natural language processing), here, here, here, here Normal Distribution, here, here Ovadya, Aviv, here Oxford Martin jobs study, here Page, Larry, here Papert, Seymour, here, here Pareto, Vilfredo, here Pascal, Blaise, here, here Pascaline, here, here, here, here Pasteur, Louis, here PCA (principle component analysis), here, here, here, here, here, here Pearson, Egon, here Pearson, Joel, here Pearson, Karl, here, here, here, here Peirce, Charles Sanders, here, here, here perceptron, here, here, here, here, here Perceptron Learning Algorithm.

Algorithms may start with simplified features, punch-card photos, but they process them in manners that people simply can’t fully understand. Sometimes that means their potentially offensive, potentially dangerous outputs are both unpredictable and irreparable. This is why the award-winning Google AI engineer Ali Rahimi said in 2017 that ‘Machine learning has become alchemy’.6 Machine-learning algorithms are now used in everything from image recognition, to natural language processing, to medical diagnosis, and virtually every other modern AI application. They are the core of big data analysis, and the bedrock of virtually all modern AI, the technology that draws its frames and atoms from big data to overcome the old problems of expert systems design. Yet the implementations of these algorithms, the actual programs doing the classification and generalization, have become so opaque that it is comparable to medieval pre-science.

The systems they create are certainly complex enough to demonstrate emergent behaviours, outcomes of complex networks that cannot be predicted from their individual parts. However, the emergent, spontaneous order these agents help to create may not be the one many people would desire. 9 Defining Terms Human communication cannot be reduced to information. Science-fiction author URSULA K. LE GUIN, 20041 Just like image recognition, natural language processing (NLP) is at the vanguard of AI today, and is exploding in its use, employing deep networks, machine-learning algorithms and the explosion of big data on the Internet (in the form of documents, websites, blogs, posts, tweets, etc., estimated to be enough text to fill 1011 A4 pages, with a large fraction of that text changing daily).2 NLP researchers and algorithm engineers face the daily challenge of getting computers to process, analyse and ‘comprehend’ some fraction of this text, as well as the challenges of recognizing human speech (audio and video content are also exploding online), and even generating some natural-sounding language in print and audio form.


pages: 255 words: 78,207

Web Scraping With Python: Collecting Data From the Modern Web by Ryan Mitchell

AltaVista, Amazon Web Services, cloud computing, en.wikipedia.org, Firefox, Guido van Rossum, meta analysis, meta-analysis, natural language processing, optical character recognition, random walk, self-driving car, Turing test, web application

Although you might not think that text analysis has anything to do with your project, understanding the concepts behind it can be extremely useful for all sorts of machine learning, as well as the more general ability to model real-world problems in probabilistic and algorithmic terms. 1 Although many of the techniques described in this chapter can be applied to all or most languages, it’s okay for now to focus on natural language processing in English only. Tools such as Python’s Natural Language Toolkit, for example, focus on English. Fifty-six percent of the Internet is still in English (with German following at a mere 6%, according to http://w3techs.com/technologies/overview/content_language/all). But who knows? English’s hold on the majority of the Internet will almost certainly change in the future, and further updates may be necessary in the next few years. For instance, the Shazam music service can identify audio as containing a certain song recording, even if that audio contains ambient noise or distortion.

I hope that the coverage here will inspire you to think beyond conventional web scraping, or at least give some initial direction about where to begin when undertaking a project that requires natural language analysis. There are many excellent resources on introductory language processing and Python’s Natural Language Toolkit. In particular, Steven Bird, Ewan Klein, and Edward Loper’s book Natural Language Processing with Python presents both a comprehensive and introductory approach to the topic. In addition, James Pustejovsky and Amber Stubbs’ Natural Language Annotation for Machine Learning provides a slightly more advanced theoretical guide. You’ll need a knowledge of Python to implement the lessons; the topics covered work perfectly with Python’s Natural Language Toolkit. CHAPTER 9 Crawling Through Forms and Logins One of the first questions that comes up when you start to move beyond the basics of web scraping is: “How do I access information behind a login screen?”

Hamidi, 227 intellectual property, 217-219 234 internal links crawling an entire site, 35-40 crawling with Scrapy, 45-48 traversing a single domain, 31-35 Internet about, 213-216 cautions downloading files from, 74 crawling across, 40-45 moving forward, 206 IP address blocking, avoiding, 199-200 ISO character sets, 96-98 is_displayed function, 186 Item object, 46, 48 items.py file, 46 | Index lambda expressions, 28, 74 legalities of web scraping, 217-230 lexicographical analysis with NLTK, 132-136 libraries bundling with projects, 7 OCR support, 161-164 logging with Scrapy, 48 logins about, 137 handling, 142-143 troubleshooting, 187 lxml library, 29 M machine learning, 135, 180 machine training, 135, 171-174 Markov text generators, 123-129 media files, storing, 71-74 Mersenne Twister algorithm, 34 methods (HTTP), 51 Microsoft SQL Server, 76 Microsoft Word, 102-105 MIME (Multipurpose Internet Mail Exten‐ sions) protocol, 90 MIMEText object, 90 MySQL about, 76 basic commands, 79-82 database techniques, 85-87 installing, 77-79 integrating with Python, 82-85 Wikipedia example, 87-89 N name attribute, 140 natural language processing about, 119 additional resources, 136 Markov models, 123-129 Natural Language Toolkit, 129-136 summarizing data, 120-123 Natural Language Toolkit (NLTK) about, 129 installation and setup, 129 lexicographical analysis, 132-136 statistical analysis, 130-132 NavigableString object, 18 navigating trees, 18-22 network connections about, 3-5 connecting reliably, 9-11 security considerations, 181 next_siblings() function, 21 ngrams module, 132 n-grams, 109-112, 120 NLTK (Natural Language Toolkit) about, 129 installation and setup, 129 lexicographical analysis, 132-136 statistical analysis, 130-132 NLTK Downloader interface, 130 NLTK module, 129 None object, 10 normalizing data, 112-113 NumPy library, 164 O OAuth authentication, 57 OCR (optical character recognition) about, 161 library support, 162-164 OpenRefine Expression Language (GREL), 116 
OpenRefine tool about, 114 cleaning data, 116-118 filtering data, 115-116 installing, 114 usage considerations, 114 optical character recognition (OCR) about, 161 library support, 162-164 Oracle DBMS, 76 OrderedDict object, 112 os module, 74 P page load times, 154, 182 parentheses (), 25 parents (tags), 20, 22 parsing HTML pages (see HTML parsing) parsing JSON, 63 patents, 217 pay-per-hour computing instances, 205 PDF files, 100-102 PDFMiner3K library, 101 Penn Treebank Project, 133 period (.), 25 Peters, Tim, 211 PhantomJS tool, 152-155, 203 PIL (Python Imaging Library), 162 Pillow library about, 162 processing well-formatted text, 165-169 pipe (|), 25 plus sign (+), 25 POST method (HTTP) about, 51 tracking requests, 140 troubleshooting, 186 variable names and, 138 viewing form parameters, 140 Index | 235 previous_siblings() function, 21 primary keys in tables, 85 programming languages, regular expressions and, 27 projects, bundling with libraries, 7 pseudorandom number generators, 34 PUT method (HTTP), 51 PyMySQL library, 82-85 PySocks module, 202 Python Imaging Library (PIL), 162 Python language, installing, 209-211 Q query time versus database size, 86 quotation marks ("), 17 R random number generators, 34 random seeds, 34 rate limits about, 52 Google APIs, 60 Twitter API, 55 reading documents document encoding, 93 Microsoft Word, 102-105 PDF files, 100 text files, 94-98 recursion limit, 38, 89 redirects, 44, 158 Referrer header, 179 RegexPal website, 24 regular expressions about, 22-27 BeautifulSoup example, 27 commonly used symbols, 25 programming languages and, 27 relational data, 77 remote hosting running from a website hosting account, 203 running from the cloud, 204 remote servers avoiding IP address blocking, 199-200 extensibility and, 200 portability and, 200 PySocks and, 202 Tor and, 201-202 Requests library 236 | Index about, 137 auth module, 144 installing, 138, 179 submitting forms, 138 tracking cookies, 142-143 requests module, 179-181 responses, 
API calls and, 52 Robots Exclusion Standard, 223 robots.txt file, 138, 167, 222-225, 229 S safe harbor protection, 219, 230 Scrapy library, 45-48 screenshots, 197 script tag, 147 search engine optimization (SEO), 222 searching text data, 135 security considerations copyright law and, 219 forms and, 183-186 handling cookies, 181 SELECT statement, 79, 81 Selenium library about, 143 elements and, 153, 194 executing JavaScript, 152-156 handling redirects, 158 security considerations, 185 testing example, 193-198 Tor support, 203 semicolon (;), 210 SEO (search engine optimization), 222 server-side processing handling redirects, 44, 158 scripting languages and, 147 sets, 67 siblings (tags), 21 Simple Mail Transfer Protocol (SMTP), 90 site maps, 36 Six Degrees of Wikipedia, 31-35 SMTP (Simple Mail Transfer Protocol), 90 smtplib package, 90 sorted function, 112 span tag, 15 Spitler, Daniel, 227 SQL Server (Microsoft), 76 square brackets [], 25 src attribute, 28, 72, 74 StaleElementReferenceException, 158 statistical analysis with NLTK, 130-132 storing data (see data management) StringIO object, 99 strings, regular expressions and, 22-28 stylesheets about, 14, 216 dynamic HTML and, 151 hidden fields and, 184 Surface Web, 36 trademarks, 218 traversing the Web (see web crawlers) tree navigation, 18-22 trespass to chattels, 219-220, 226 trigrams module, 132 try...finally statement, 85 Twitov app, 123 Twitter API, 55-59 T underscore (_), 17 undirected graph problems, 127 Unicode standard, 83, 95-98, 110 unit tests, 190, 197 United States v.


pages: 402 words: 110,972

Nerds on Wall Street: Math, Machines and Wired Markets by David J. Leinweber

AI winter, algorithmic trading, asset allocation, banking crisis, barriers to entry, Big bang: deregulation of the City of London, business cycle, butter production in bangladesh, butterfly effect, buttonwood tree, buy and hold, buy low sell high, capital asset pricing model, citizen journalism, collateralized debt obligation, corporate governance, Craig Reynolds: boids flock, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Danny Hillis, demand response, disintermediation, distributed generation, diversification, diversified portfolio, Emanuel Derman, en.wikipedia.org, experimental economics, financial innovation, fixed income, Gordon Gekko, implied volatility, index arbitrage, index fund, information retrieval, intangible asset, Internet Archive, John Nash: game theory, Kenneth Arrow, load shedding, Long Term Capital Management, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, market fragmentation, market microstructure, Mars Rover, Metcalfe’s law, moral hazard, mutually assured destruction, Myron Scholes, natural language processing, negative equity, Network effects, optical character recognition, paper trading, passive investing, pez dispenser, phenotype, prediction markets, quantitative hedge fund, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Renaissance Technologies, risk tolerance, risk-adjusted returns, risk/return, Robert Metcalfe, Ronald Reagan, Rubik’s Cube, semantic web, Sharpe ratio, short selling, Silicon Valley, Small Order Execution System, smart grid, smart meter, social web, South Sea Bubble, statistical arbitrage, statistical model, Steve Jobs, Steven Levy, Tacoma Narrows Bridge, the scientific method, The Wisdom of Crowds, time value of money, too big to fail, transaction costs, Turing machine, Upton Sinclair, value at risk, Vernor Vinge, yield curve, Yogi Berra, your tax dollars at work

No aspect of financial life is untouched: research, risk management, trading, and investor communication. We are much more adept at using structured and quantitative information on the Internet than textual and qualitative information; we are only starting to learn how to use this kind of information effectively. This area is driven by new Internet technologies such as XML (extensible markup language) and RSS (an XML dialect) and by advances in natural language processing. The new kid on the block, expected to take these ideas to new levels, is the Resource Description Framework (RDF), promoted by Web inventor Tim Berners-Lee. RDF does for relationships between tagged data elements what XML tagging itself did for moving from formatting HTML tags like “Bold” to meaningful XML tags like “Price.”

Hits and Misses: Rational and Irrational Technology Exuberance

Peter Bernstein’s book Capital Ideas (Free Press, 1993) tells the story of Bill Sharpe, who wandered Wall Street looking for enough computer time to run a simple capital asset pricing model (CAPM) portfolio optimization, while being regarded as something of a crackpot for doing so.

Karl Sims’s MIT video is here: www.youtube.com/watch?v=F0OHycypSG8. […] on genetically adaptive strategies and well funded, but vanished, and few of the principals are still keen on genetic algorithms. After sending the GA to the back of the breakthrough line in the previous chapter, in Chapter 9 we get to “The Text Frontier,” using IA, natural language processing, and Web technologies to extract and make sense of qualitative written information from news and a variety of disintermediated sources. In Chapter 6, “Stupid Data Miner Tricks,” we saw how you could fool yourself with data. When you collect data that people have put on the Web, they can try to fool you as well. Chapter 10 on Collective Intelligence and Chapter 11 on market manipulations include some remarkable and egregious examples.

What Gelernter’s thesis means for investing is that we can look inside that shoebox with a new set of technologies to develop a new form of research. Grabbing more and more data, and doing more and more searches, will quickly overwhelm us, leading to advanced cases of carpal tunnel syndrome and a shelf full of unread books with “Information Explosion” somewhere in the title. Collectively, the new alphabet soup of technologies—AI, IA, NLP, and IR (artificial intelligence, intelligence amplification, natural language processing, and information retrieval, for those with a bigger soup bowl)—provides a means to make sense of patterns in the data collected in enterprise and global search. These means are molecular search; persistent software agents, so you don’t have to keep doing the same thing all the time; the semantic Web, which uses the information associated with data at the point of origin so there is less guessing about the meaning of what you find; and modern user interfaces and visualizations, so you can prioritize what you find and focus on the important and the valuable in a timely way.


pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier

23andMe, Affordable Care Act / Obamacare, airport security, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, IBM and the Holocaust, index card, informal economy, intangible asset, Internet of things, invention of the printing press, Jeff Bezos, Joi Ito, lifelogging, Louis Pasteur, Mark Zuckerberg, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, paypal mafia, performance metric, Peter Thiel, post-materialism, random walk, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, speech recognition, Steve Jobs, Steven Levy, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Thomas Davenport, Turing test, Watson beat the top human players on Jeopardy!

In fact, endgames when six or fewer pieces are left on the chessboard have been completely analyzed and all possible moves (N=all) have been represented in a massive table that when uncompressed fills more than a terabyte of data. This enables chess computers to play the endgame flawlessly. No human will ever be able to outplay the system. The degree to which more data trumps better algorithms has been powerfully demonstrated in the area of natural language processing: the way computers learn how to parse words as we use them in everyday speech. Around 2000, Microsoft researchers Michele Banko and Eric Brill were looking for a method to improve the grammar checker that is part of the company’s Word program. They weren’t sure whether it would be more useful to put their effort into improving existing algorithms, finding new techniques, or adding more sophisticated features.
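The experiments behind this story involved confusion-set disambiguation: choosing between commonly confused words (e.g., their/there) from the surrounding context, with accuracy improving as training data grew. A minimal count-based sketch of that task, using invented toy training pairs rather than their corpus:

```python
from collections import Counter

# Hypothetical toy training data for one confusion set {"their", "there"}.
# Banko and Brill trained on corpora up to a billion words; the idea is
# the same: pick the candidate seen most often with the surrounding words.
training = [
    ("over", "there"), ("go", "there"), ("sit", "there"),
    ("in", "their"), ("lost", "their"), ("in", "their"),
]
counts = Counter(training)

def disambiguate(prev_word, candidates=("their", "there")):
    """Choose the candidate that most often follows prev_word in training."""
    return max(candidates, key=lambda c: counts[(prev_word, c)])

print(disambiguate("in"))    # 'their'
print(disambiguate("over"))  # 'there'
```

More data helps here simply because rarer contexts accumulate enough counts to decide; the learner itself stays trivial.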

The trillion-word corpus Google released in 2006 was compiled from the flotsam and jetsam of Internet content—“data in the wild,” so to speak. This was the “training set” by which the system could calculate the probability that, for example, one word in English follows another. It was a far cry from the grandfather in the field, the famous Brown Corpus of the 1960s, which totaled one million English words. Using the larger dataset enabled great strides in natural-language processing, upon which systems for tasks like voice recognition and computer translation are based. “Simple models and a lot of data trump more elaborate models based on less data,” wrote Google’s artificial-intelligence guru Peter Norvig and colleagues in a paper entitled “The Unreasonable Effectiveness of Data.” As Norvig and his co-authors explained, messiness was the key: “In some ways this corpus is a step backwards from the Brown Corpus: it’s taken from unfiltered Web pages and thus contains incomplete sentences, spelling errors, grammatical errors, and all sorts of other errors.
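A minimal sketch of the kind of estimate such a corpus supports: the probability that one word follows another, computed from raw bigram counts. The toy corpus below is invented; the trillion-word version works the same way, just with vastly more counts.

```python
from collections import Counter

# Toy stand-in for a large training corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigrams and the unigram contexts they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def p_next(word, nxt):
    """Maximum-likelihood estimate of P(nxt | word) from raw counts."""
    return bigrams[(word, nxt)] / contexts[word]

print(p_next("the", "cat"))  # 0.5: "the" is followed by "cat" 2 times out of 4
```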

See imprecision MetaCrawler, [>] metadata: in datafication, [>]–[>] metric system, [>] Microsoft, [>], [>], [>] Amalga software, [>]–[>], [>] and data-valuation, [>] and language translation, [>] Word spell-checking system, [>]–[>] Minority Report [film], [>]–[>], [>] Moneyball [film], [>], [>]–[>], [>], [>] Moneyball (Lewis), [>] Moore’s Law, [>] Mydex, [>] nanotechnology: and qualitative changes, [>] Nash, Bruce, [>] nations: big data and competitive advantage among, [>]–[>] natural language processing, [>] navigation, marine: correlation analysis in, [>]–[>] Maury revolutionizes, [>]–[>], [>], [>], [>], [>], [>], [>], [>], [>], [>] Negroponte, Nicholas: Being Digital, [>] Netbot, [>] Netflix, [>] collaborative filtering at, [>] data-reuse by, [>] releases personal data, [>] Netherlands: comprehensive civil records in, [>]–[>] network analysis, [>] network theory, [>] big data in, [>]–[>] New York City: exploding manhole covers in, [>]–[>], [>]–[>], [>], [>] government data-reuse in, [>]–[>] New York Times, [>]–[>] Next Jump, [>] Neyman, Jerzy: on statistical sampling, [>] Ng, Andrew, [>] 1984 (Orwell), [>], [>] Norvig, Peter, [>] “The Unreasonable Effectiveness of Data,” [>] Nuance: fails to understand data-reuse, [>]–[>] numerical systems: history of, [>]–[>] Oakland Athletics, [>]–[>] Obama, Barack: on open data, [>] Och, Franz Josef, [>] Ohm, Paul: on privacy, [>] oil refining: big data in, [>] ombudsmen, [>] Omidyar, Pierre, [>] open data.


When Computers Can Think: The Artificial Intelligence Singularity by Anthony Berglas, William Black, Samantha Thalind, Max Scratchmann, Michelle Estes

3D printing, AI winter, anthropic principle, artificial general intelligence, Asilomar, augmented reality, Automated Insights, autonomous vehicles, availability heuristic, blue-collar work, brain emulation, call centre, cognitive bias, combinatorial explosion, computer vision, create, read, update, delete, cuban missile crisis, David Attenborough, Elon Musk, en.wikipedia.org, epigenetics, Ernest Rutherford, factory automation, feminist movement, finite state, Flynn Effect, friendly AI, general-purpose programming language, Google Glasses, Google X / Alphabet X, Gödel, Escher, Bach, industrial robot, Isaac Newton, job automation, John von Neumann, Law of Accelerating Returns, license plate recognition, Mahatma Gandhi, mandelbrot fractal, natural language processing, Parkinson's law, patent troll, patient HM, pattern recognition, phenotype, ransomware, Ray Kurzweil, self-driving car, semantic web, Silicon Valley, Singularitarianism, Skype, sorting algorithm, speech recognition, statistical model, stem cell, Stephen Hawking, Stuxnet, superintelligent machines, technological singularity, Thomas Malthus, Turing machine, Turing test, uranium enrichment, Von Neumann architecture, Watson beat the top human players on Jeopardy!, wikimedia commons, zero day

Operators would give RA high-level goals and it would plan the low level actions required to meet them. These plans could then be quickly adjusted if things did not turn out as expected or if faults were discovered. This capacity becomes important for missions to the outer planets where communication delays are significant. By 2001, speech understanding had also improved to the point of being practical. People could and sometimes did talk to computers on a regular basis. Natural language processing was also quite capable of understanding requests such as “How many Klingons are there in sector five?” or “Open the pod bay doors”. The Remote Agent did not process speech or natural language largely because there was no one to talk to on the spacecraft. Human astronauts have been obsolete technology since the mid 1970s. The film confuses these abilities with artificial general intelligence, which was certainly not possible by 2001.

Perhaps more interestingly, some description logics use Is-A inheritance within semantic networks to provide a better-behaved approach to the problem of default reasoning. This author has published papers showing how semantic networks and description logics can be used to structure complex expert system rule bases.

Ontologies and databases

Ontologies provide a hierarchical framework for the terms used in an information system. One simple ontology is Wordnet, which is widely used to assist in natural language processing. It contains the definitions of some 150,000 words, or more specifically, synsets, which are collections of words with the same meaning. Thus “engine” the machine is in a different synset from “engine” to cause (e.g. “the engine of change”). For each synset, Wordnet contains a list of hyponyms, or subtypes, so for “engine” that includes “aircraft engine” and “generator”. It also contains super-type hierarchies such as “machine”, “artefact”, and “physical object”.
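The synset and hyponym structure described above can be sketched with a hand-built mini-ontology. The entries and identifiers below are invented in WordNet’s style (the real resource is normally accessed through libraries such as NLTK); the sketch shows how Is-A links chain upward through the super-type hierarchy.

```python
# Hypothetical mini-ontology in WordNet's shape: each synset has a gloss,
# hyponyms (subtypes) and hypernyms (supertypes).
synsets = {
    "engine.n.01": {"gloss": "motor that converts energy into motion",
                    "hyponyms": ["aircraft_engine.n.01", "generator.n.01"],
                    "hypernyms": ["machine.n.01"]},
    "engine.v.01": {"gloss": "to cause or drive ('the engine of change')",
                    "hyponyms": [], "hypernyms": []},
    "machine.n.01": {"gloss": "a device with moving parts",
                     "hyponyms": ["engine.n.01"],
                     "hypernyms": ["artefact.n.01"]},
    "artefact.n.01": {"gloss": "a man-made object",
                      "hyponyms": ["machine.n.01"],
                      "hypernyms": ["physical_object.n.01"]},
    "physical_object.n.01": {"gloss": "a tangible thing",
                             "hyponyms": ["artefact.n.01"],
                             "hypernyms": []},
}

def supertype_chain(synset_id):
    """Walk the Is-A (hypernym) links up to the root."""
    chain = []
    current = synsets[synset_id]["hypernyms"]
    while current:
        chain.append(current[0])
        current = synsets[current[0]]["hypernyms"]
    return chain

print(supertype_chain("engine.n.01"))
# ['machine.n.01', 'artefact.n.01', 'physical_object.n.01']
```

Note that the two senses of “engine” live in separate synsets, so the noun’s machine hierarchy never mixes with the verb sense.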

And when it feels “sure” enough, it decides to buzz. This is all an instant, intuitive process for a human Jeopardy! player, but I felt convinced that under the hood my brain was doing more or less the same thing. IBM is targeting Watson for use in some types of medical applications. However, Watson is a completely different type of system from an expert system such as MYCIN. It may be just the natural language processing that is being utilized; otherwise, it would be concerning if treatment options were being decided by a trivia engine. Alternatively, IBM may be exploiting the general lack of understanding about artificial intelligence to use the word “Watson” to refer to any vaguely intelligent application that it is building. Ferrucci’s 2010 overview of DeepQA is one of the very few non-marketing technical papers on Watson.


pages: 713 words: 93,944

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Eric Redmond, Jim R. Wilson

AGPL, Amazon Web Services, create, read, update, delete, data is the new oil, database schema, Debian, domain-specific language, en.wikipedia.org, fault tolerance, full text search, general-purpose programming language, Kickstarter, linked data, MVC pattern, natural language processing, node package manager, random walk, recommendation engine, Ruby on Rails, Skype, social graph, web application

SELECT *
FROM movies
WHERE title % 'Avatre';

  title
---------
 Avatar

Trigrams are an excellent choice for accepting user input, without weighing them down with wildcard complexity.

Full-Text Fun

Next, we want to allow users to perform full-text searches based on matching words, even if they’re pluralized. If a user wants to search for certain words in a movie title but can remember only some of them, Postgres supports simple natural-language processing.

TSVector and TSQuery

Let’s look for a movie that contains the words night and day. This is a perfect job for text search using the @@ full-text query operator.

SELECT title
FROM movies
WHERE title @@ 'night & day';

             title
-------------------------------
 A Hard Day’s Night
 Six Days Seven Nights
 Long Day’s Journey Into Night

The query returns titles like A Hard Day’s Night, despite the word Day being in possessive form, and the two words are out of order in the query.

Compare these two vectors:

SELECT to_tsvector('english', 'A Hard Day''s Night');

        to_tsvector
----------------------------
 'day':3 'hard':2 'night':5

SELECT to_tsvector('simple', 'A Hard Day''s Night');

              to_tsvector
----------------------------------------
 'a':1 'day':3 'hard':2 'night':5 's':4

With simple, you can retrieve any movie containing the lexeme a.

Other Languages

Since Postgres is doing some natural-language processing here, it only makes sense that different configurations would be used for different languages. All of the installed configurations can be viewed with this command:

book=# \dF

Dictionaries are part of what Postgres uses to generate tsvector lexemes (along with stop words and other tokenizing rules we haven’t covered, called parsers and templates). You can view your system’s list here:

book=# \dFd

You can test any dictionary outright by calling the ts_lexize function.

This also explains why HBase is often employed at big companies to back logging and search systems. 4.1 Introducing HBase HBase is a column-oriented database that prides itself on consistency and scaling out. It is based on BigTable, a high-performance, proprietary database developed by Google and described in the 2006 white paper “Bigtable: A Distributed Storage System for Structured Data.”[26] Initially created for natural-language processing, HBase started life as a contrib package for Apache Hadoop. Since then, it has become a top-level Apache project. On the architecture front, HBase is designed to be fault tolerant. Hardware failures may be uncommon for individual machines, but in a large cluster, node failure is the norm. By using write-ahead logging and distributed configuration, HBase can quickly recover from individual server failures.


The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin

Bayesian statistics, business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, discrete time, disruptive innovation, George Gilder, Google Earth, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, late capitalism, lifelogging, linked data, longitudinal study, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

It is premised on the notion that all massive datasets hold meaningful information that is non-random, valid, novel, useful and ultimately understandable (Han et al. 2011). As such, it uses supervised and unsupervised machine learning to detect, classify and segment meaningful relationships, associations and trends between variables. It does this using a series of different techniques, including natural language processing, neural networks, decision trees, and statistical (non-parametric and parametric) methods. The selection of method varies with the type of data (structured, unstructured or semi-structured) and the purpose of the analysis (see Table 6.1). Source: Miller and Han (2009: 7). Most of the techniques listed in Table 6.1 relate to structured data as found in relational databases. For example, segmentation models might be applied to a retail database of customers and their purchases to segment them into different profiles based on their characteristics and patterns of behaviour in order to offer each group different services/offers.

In detecting associations, a variety of regression models might be used to compute correlations between variables and thus reveal hidden patterns that can then be leveraged into commercial gain (for example, identifying which goods are bought together and reorganising a store to promote purchasing) (see Chapter 7). Unstructured data in the form of language, images and sounds raise particular data-mining challenges. Natural language processing techniques seek to analyse human language as expressed through the written and spoken word. They use semantics and taxonomies to recognise patterns and extract information from documents. Examples include entity extraction, which automatically extracts metadata from text by searching for particular types of text and phrasing, such as person names, locations, dates, specialised terms and product terminology, and entity relation extraction, which automatically identifies the relationships between semantic entities, linking them together (e.g., person name to birth date or location, or an opinion to an item) (McCreary 2009).
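The pattern-matching flavour of entity extraction described here can be sketched with plain regular expressions. The patterns and sample sentence below are invented; real systems combine taxonomies, gazetteers and statistical models rather than regexes alone.

```python
import re

text = "John Smith visited Dublin on 12 March 2013 and met Mary O'Brien."

# Crude, hypothetical surface patterns for two entity types.
patterns = {
    "DATE": (r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
             r"August|September|October|November|December) \d{4}\b"),
    "PERSON": r"\b[A-Z][a-z]+ [A-Z][A-Za-z']+\b",
}

# Scan the text with each pattern and tag what it finds.
entities = [(m.group(), label)
            for label, pat in patterns.items()
            for m in re.finditer(pat, text)]
print(entities)
# [('12 March 2013', 'DATE'), ('John Smith', 'PERSON'), ("Mary O'Brien", 'PERSON')]
```

The brittleness is easy to see: the PERSON pattern misses single-word names and would happily tag any pair of capitalised words, which is exactly why production systems layer gazetteers and learned models on top.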

Index A/B testing 112 abduction 133, 137, 138–139, 148 accountability 34, 44, 49, 55, 63, 66, 113, 116, 165, 171, 180 address e-mail 42 IP 8, 167, 171 place 8, 32, 42, 45, 52, 93, 171 Web 105 administration 17, 30, 34, 40, 42, 56, 64, 67, 87, 89, 114–115, 116, 124, 174, 180, 182 aggregation 8, 14, 101, 140, 169, 171 algorithm 5, 9, 21, 45, 76, 77, 83, 85, 89, 101, 102, 103, 106, 109, 111, 112, 118, 119, 122, 125, 127, 130, 131, 134, 136, 142, 146, 154, 160, 172, 177, 179, 181, 187 Amazon 72, 96, 131, 134 Anderson, C. 130, 135 Andrejevic, M. 133, 167, 178 animation 106, 107 anonymity 57, 63, 79, 90, 92, 116, 167, 170, 171, 172, 178 apophenia 158, 159 Application Programming Interfaces (APIs) 57, 95, 152, 154 apps 34, 59, 62, 64, 65, 78, 86, 89, 90, 95, 97, 125, 151, 170, 174, 177 archive 21, 22, 24, 25, 29–41, 48, 68, 95, 151, 153, 185 archiving 23, 29–31, 64, 65, 141 artificial intelligence 101, 103 Acxiom 43, 44 astronomy 34, 41, 72, 97 ATM 92, 116 audio 74, 77, 83 automatic meter reading (AMR) 89 automatic number plate recognition (ANPR) 85, 89 automation 32, 51, 83, 85, 87, 89–90, 98, 99, 102, 103, 118, 127, 136, 141, 146, 180 Ayasdi 132, 134 backup 29, 31, 40, 64, 163 barcode 74, 85, 92, Bates, J. 56, 61, 62, 182 Batty, M. 90, 111, 112, 140 Berry, D. 134, 141 bias 13, 14, 19, 28, 45, 101, 134–136, 153, 154, 155, 160 Big Brother 126, 180 big data xv, xvi, xvii, 2, 6, 13, 16, 20, 21, 27–29, 42, 46, 67–183, 186, 187, 188, 190, 191, 192 analysis 100–112 characteristics 27–29, 67–79 enablers 80–87 epistemology 128–148 ethical issues 165–183 etymology 67 organisational issues 160–163 rationale 113–127 sources 87–99 technical issues 149–160 biological sciences 128–129, 137 biometric data 8, 84, 115 DNA 8, 71, 84 face 85, 88, 105 fingerprints 8, 9, 84, 87, 88, 115 gait 85, 88 iris 8, 84, 88 bit-rot 20 blog 6, 95, 170 Bonferroni principle 159 born digital 32, 46, 141 Bowker, G. 2, 19, 20, 22, 24 Borgman, C. 2, 7, 10, 20, 30, 37, 40, 41 boyd, D. 
68, 75, 151, 152, 156, 158, 160, 182 Brooks, D. 130, 145 business 1, 16, 42, 45, 56, 61, 62, 67, 79, 110, 113–127, 130, 137, 149, 152, 161, 166, 172, 173, 187 calculative practices 115–116 Campbell’s Law 63, 127 camera 6, 81, 83, 87, 88, 89, 90, 107, 116, 124, 167, 178, 180 capitalism 15, 16, 21, 59, 61, 62, 86, 95, 114, 119–123, 126, 136, 161, 184, 186 capta 2 categorization 6, 8, 12, 19, 20, 102, 106, 176 causation 130, 132, 135, 147 CCTV 87, 88, 180 census 17, 18, 19, 22, 24, 27, 30, 43, 54, 68, 74, 75, 76, 77, 87, 102, 115, 157, 176 Centro De Operações Prefeitura Do Rio 124–125, 182 CERN 72, 82 citizen science 97–99, 155 citizens xvi, 45, 57, 58, 61, 63, 71, 88, 114, 115, 116, 126, 127, 165, 166, 167, 174, 176, 179, 187 citizenship 55, 115, 170, 174 classification 6, 10, 11, 23, 28, 104, 105, 157, 176 clickstream 43, 92, 94, 120, 122, 154, 176 clustering 103, 104, 105, 106, 110, 122 Codd, E. 31 competitiveness xvi, 16, 114, computation 2, 4, 5, 6, 29, 32, 68, 80, 81–82, 83, 84, 86, 98, 100, 101, 102, 110, 129, 136, 139–147, 181 computational social science xiv, 139–147, 152, 186 computing cloud xv, 81, 86 distributed xv, 37, 78, 81, 83, 98 mobile xv, 44, 78, 80, 81, 83, 85, 139 pervasive 81, 83–84, 98, 124 ubiquitous 80, 81, 83–84, 98, 100, 124, 126 confidence level 14, 37, 133, 153, 160 confidentiality 8, 169, 175 control creep 126, 166, 178–179 cookies 92, 119, 171 copyright 16, 30, 40, 49, 51, 54, 96 correlation 105, 110, 130, 131, 132, 135, 145, 147, 157, 159 cost xv, 6, 11, 16, 27, 31, 32, 37, 38, 39, 40, 44, 52, 54, 57, 58, 59, 61, 66, 80, 81, 83, 85, 93, 96, 100, 116, 117, 118, 120, 127, 150 Crawford, K. 68, 75, 135, 151, 152, 155, 156, 158, 160, 182 credit cards 8, 13, 42, 44, 45, 85, 92, 167, 171, 176 risk 42, 63, 75, 120, 176, 177 crime 55, 115, 116, 123, 175, 179 crowdsourcing 37, 73, 93, 96–97, 155, 160 Cukier, K. 
68, 71, 72, 91, 114, 128, 153, 154, 161, 174 customer relationship management (CRM) 42, 99, 117–118, 120, 122, 176 cyber-infrastructure 33, 34, 35, 41, 186 dashboard 106, 107, 108 data accuracy 12, 14, 110, 153, 154, 171 administrative 84–85, 89, 115, 116, 125, 150, 178 aggregators see data brokers amplification 8, 76, 99, 102, 167 analogue 1, 3, 32, 83, 88, 140, 141 analytics 42, 43, 63, 73, 80, 100–112, 116, 118, 119, 120, 124, 125, 129, 132, 134, 137, 139, 140, 145, 146, 149, 151, 159, 160, 161, 176, 179, 186, 191 archive see archive assemblage xvi, xvii, 2, 17, 22, 24–26, 66, 80, 83, 99, 117, 135, 139, 183, 184–192 attribute 4, 8–9, 31, 115, 150 auditing 33, 40, 64, 163 authenticity 12, 153 automated see automation bias see bias big see big data binary 1, 4, 32, 69 biometric see biometric data body 177–178, 187 boosterism xvi, 67, 127, 187, 192 brokers 42–45, 46, 57, 74, 75, 167, 183, 186, 187, 188, 191 calibration 13, 20 catalogue 32, 33, 35 clean 12, 40, 64, 86, 100, 101, 102, 152, 153, 154, 156 clearing house 33 commodity xvi, 4, 10, 12, 15, 16, 41, 42–45, 56, 161 commons 16, 42 consolidators see data brokers cooked 20, 21 corruption 19, 30 curation 9, 29, 30, 34, 36, 57, 141 definition 1, 2–4 deluge xv, 28, 73, 79, 100, 112, 130, 147, 149–151, 157, 168, 175 derived 1, 2, 3, 6–7, 8, 31, 32, 37, 42, 43, 44, 45, 62, 86, 178 deserts xvi, 28, 80, 147, 149–151, 161 determinism 45, 135 digital 1, 15, 31, 32, 67, 69, 71, 77, 82, 85, 86, 90, 137 directories 33, 35 dirty 29, 154, 163 dive 64–65, 188 documentation 20, 30, 31, 40, 64, 163 dredging 135, 147, 158, 159 dump 64, 150, 163 dynamic see dynamic data enrichment 102 error 13, 14, 44, 45, 101, 110, 153, 154, 156, 169, 175, 180 etymology 2–3, 67 exhaust 6–7, 29, 80, 90 fidelity 34, 40, 55, 79, 152–156 fishing see data dredging formats xvi, 3, 5, 6, 9, 22, 25, 30, 33, 34, 40, 51, 52, 54, 65, 77, 102, 153, 156, 157, 174 framing 12–26, 133–136, 185–188 gamed 154 holding 33, 35, 64 infrastructure xv, xvi, xvii, 2, 
21–24, 25, 27–47, 52, 64, 102, 112, 113, 128, 129, 136, 140, 143, 147, 148, 149, 150, 156, 160, 161, 162, 163, 166, 184, 185, 186, 188, 189, 190, 191, 192 integration 42, 149, 156–157 integrity 12, 30, 33, 34, 37, 40, 51, 154, 157, 171 interaction 43, 72, 75, 85, 92–93, 94, 111, 167 interoperability 9, 23, 24, 34, 40, 52, 64, 66, 156–157, 163, 184 interval 5, 110 licensing see licensing lineage 9, 152–156 linked see linked data lost 5, 30, 31, 39, 56, 150 markets xvi, 8, 15, 25, 42-45, 56, 59, 75, 167, 178 materiality see materiality meta see metadata mining 5, 77, 101, 103, 104–106, 109, 110, 112, 129, 132, 138, 159, 188 minimisation 45, 171, 178, 180 nominal 5, 110 ordinal 5, 110 open see open data ontology 12, 28, 54, 150 operational 3 ownership 16, 40, 96, 156, 166 preparation 40, 41, 54, 101–102 philosophy of 1, 2, 14, 17–21, 22, 25, 128–148, 185–188 policy 14, 23, 30, 33, 34, 37, 40, 48, 64, 160, 163, 170, 172, 173, 178 portals 24, 33, 34, 35 primary 3, 7–8, 9, 50, 90 preservation 30, 31, 34, 36, 39, 40, 64, 163 protection 15, 16, 17, 20, 23, 28, 40, 45, 62, 63, 64, 167, 168–174, 175, 178, 188 protocols 23, 25, 30, 34, 37 provenance 9, 30, 40, 79, 153, 156, 179 qualitative 4–5, 6, 14, 146, 191 quantitative 4–5, 14, 109, 127, 136, 144, 145, 191 quality 12, 13, 14, 34, 37, 40, 45, 52, 55, 57, 58, 64, 79, 102, 149, 151, 152–156, 157, 158 raw 1, 2, 6, 9, 20, 86, 185 ratio 5, 110 real-time 65, 68, 71, 73, 76, 88, 89, 91, 99, 102, 106, 107, 116, 118, 121, 124, 125, 139, 151, 181 reduction 5, 101–102 representative 4, 8, 13, 19, 21, 28 relational 3, 8, 28, 44, 68, 74–76, 79, 84, 85, 87, 88, 99, 100, 119, 140, 156, 166, 167, 184 reliability 12, 13–14, 52, 135, 155 resellers see data brokers resolution 7, 26, 27, 28, 68, 72, 73–74, 79, 84, 85, 89, 92, 133–134, 139, 140, 150, 180 reuse 7, 27, 29, 30, 31, 32, 39, 40, 41, 42, 46, 48, 49–50, 52, 56, 59, 61, 64, 102, 113, 163 scaled xvi, xvii 32, 100, 101, 112, 138, 149, 150, 163, 186 scarcity xv, xvi, 28, 80, 149–151, 161 
science xvi, 100–112, 130, 137–139, 148, 151, 158, 160–163, 164, 191 secondary 3, 7–8 security see security selection 101, 176 semi-structured 4, 5–6, 77, 100, 105 sensitive 15, 16, 45, 63, 64, 137, 151, 167, 168, 171, 173, 174 shadow 166–168, 177, 179, 180 sharing 9, 11, 20, 21, 23, 24, 27, 29–41, 48–66, 80, 82, 95, 113, 141, 151, 174, 186 small see small data social construction 19–24 spatial 17, 52, 63, 68, 73, 75, 84–85, 88–89 standards xvi, 9, 14, 19, 22, 23, 24, 25, 31, 33, 34, 38, 40, 52, 53, 64, 102, 153, 156, 157 storage see storage stranded 156 structures 4, 5–6, 12, 21, 23, 30, 31, 40, 51, 68, 77, 86, 103, 106, 156 structured 4, 5–6, 11, 32, 52, 68, 71, 75, 77, 79, 86, 88, 105, 112, 163 tertiary 7–8, 9, 27, 74 time-series 68, 102, 106, 110 transient 6–7, 72, 150 transactional 42, 43, 71, 72, 74, 75, 85, 92, 93–94, 120, 122, 131, 167, 175, 176, 177 uncertainty see uncertainty unstructured 4, 5–6, 32, 52, 68, 71, 75, 77, 86, 100, 105, 112, 140, 153, 157 validity 12, 40, 72, 102, 135, 138, 154, 156, 158 variety 26, 28, 43, 44, 46, 68, 77, 79, 86, 139, 140, 166, 184 velocity 26, 28, 29, 68, 76–77, 78, 79, 86, 88, 102, 106, 112. 
117, 140, 150, 153, 156, 184 veracity 13, 79, 102, 135, 152–156, 157, 163 volume 7, 26, 27, 28, 29, 32, 46, 67, 68, 69–72, 74, 76, 77, 78, 79, 86, 102, 106, 110, 125, 130, 135, 140, 141, 150, 156, 166, 184 volunteered 87, 93–98, 99, 155 databank 29, 34, 43 database NoSQL 6, 32, 77, 78, 86–87 relational 5, 6, 8, 32–33, 43, 74–75, 77, 78, 86, 100, 105 data-driven science 133, 137–139, 186 data-ism 130 datafication 181 dataveillance 15, 116, 126, 157, 166–168, 180, 181, 182, 184 decision tree 104, 111, 122, 159, deconstruction 24, 98, 126, 189–190 decontextualisation 22 deduction 132, 133, 134, 137, 138, 139, 148 deidentification 171, 172, 178 democracy 48, 55, 62, 63, 96, 117, 170 description 9, 101, 104, 109, 143, 147, 151, 190 designated community 30–31, 33, 46 digital devices 13, 25, 80, 81, 83, 84, 87, 90–91, 167, 174, 175 humanities xvi, 139–147, 152, 186 object identifier 8, 74 serendipity 134 discourse 15, 20, 55, 113–114, 117, 122, 127, 192 discursive regime 15, 20, 24, 56, 98, 113–114, 116, 123, 126, 127, 190 disruptive innovation xv, 68, 147, 184, 192 distributed computing xv, 37, 78, 81, 83, 98 sensors 124, 139, 160 storage 34, 37, 68, 78, 80, 81, 85–87, 97 division of labour 16 Dodge, M. 
2, 21, 68, 73, 74, 76, 83, 84, 85, 89, 90, 92, 93, 96, 113, 115, 116, 124, 154, 155, 167, 177, 178, 179, 180, 189 driver’s licence 45, 87, 171 drone 88, Dublin Core 9 dynamic data xv, xvi, 76–77, 86, 106, 112 pricing 16, 120, 123, 177 eBureau 43, 44 ecological fallacy 14, 102, 135, 149, 158–160 Economist, The 58, 67, 69, 70, 72, 128 efficiency 16, 38, 55, 56, 59, 66, 77, 93, 102, 111, 114, 116, 118, 119, 174, 176 e-mail 71, 72–73, 82, 85, 90, 93, 116, 174, 190 empiricism 129, 130–137, 141, 186 empowerment 61, 62–63, 93, 115, 126, 165 encryption 171, 175 Enlightenment 114 Enterprise Resource Planning (ERP) 99, 117, 120 entity extraction 105 epistemology 3, 12, 19, 73, 79, 112, 128–148, 149, 185, 186 Epsilon 43 ethics 12, 14–15, 16, 19, 26, 30, 31, 40, 41, 64, 73, 99, 128, 144, 151, 163, 165–183, 186 ethnography 78, 189, 190, 191 European Union 31, 38, 45, 49, 58, 59, 70, 157, 168, 173, 178 everyware 83 exhaustive 13, 27, 28, 68, 72–73, 79, 83, 88, 100, 110, 118, 133–134, 140, 150, 153, 166, 184 explanation 101, 109, 132, 133, 134, 137, 151 extensionality 67, 78, 140, 184 experiment 2, 3, 6, 34, 75, 78, 118, 129, 131, 137, 146, 150, 160 Facebook 6, 28, 43, 71, 72, 77, 78, 85, 94, 119, 154, 170 facts 3, 4, 9, 10, 52, 140, 159 Fair Information Practice Principles 170–171, 172 false positive 159 Federal Trade Commission (FTC) 45, 173 flexibility 27, 28, 68, 77–78, 79, 86, 140, 157, 184 Flickr 95, 170 Flightradar 107 Floridi, L. 3, 4, 9, 10, 11, 73, 112, 130, 151 Foucault, M. 16, 113, 114, 189 Fourth paradigm 129–139 Franks, B. 6, 111, 154 freedom of information 48 freemium service 60 funding 15, 28, 29, 31, 34, 37, 38, 40, 41, 46, 48, 52, 54–55, 56, 57–58, 59, 60, 61, 65, 67, 75, 119, 143, 189 geographic information systems 147 genealogy 98, 127, 189–190 Gitelman, L. 
2, 19, 20, 21, 22 Global Positioning System (GPS) 58, 59, 73, 85, 88, 90, 121, 154, 169 Google 32, 71, 73, 78, 86, 106, 109, 134, 170 governance 15, 21, 22, 23, 38, 40, 55, 63, 64, 66, 85, 87, 89, 117, 124, 126, 136, 168, 170, 178–182, 186, 187, 189 anticipatory 126, 166, 178–179 technocratic 126, 179–182 governmentality xvi, 15, 23, 25, 40, 87, 115, 127, 168, 185, 191 Gray, J. 129–130 Guardian, The 49 Gurstein, M. 52, 62, 63 hacking 45, 154, 174, 175 hackathon 64–65, 96, 97, 188, 191 Hadoop 87 hardware 32, 34, 40, 63, 78, 83, 84, 124, 143, 160 human resourcing 112, 160–163 hype cycle 67 hypothesis 129, 131, 132, 133, 137, 191 IBM 70, 123, 124, 143, 162, 182 identification 8, 44, 68, 73, 74, 77, 84–85, 87, 90, 92, 115, 169, 171, 172 ideology 4, 14, 25, 61, 113, 126, 128, 130, 134, 140, 144, 185, 190 immutable mobiles 22 independence 3, 19, 20, 24, 100 indexical 4, 8–9, 32, 44, 68, 73–74, 79, 81, 84–85, 88, 91, 98, 115, 150, 156, 167, 184 indicator 13, 62, 76, 102, 127 induction 133, 134, 137, 138, 148 information xvii, 1, 3, 4, 6, 9–12, 13, 23, 26, 31, 33, 42, 44, 45, 48, 53, 67, 70, 74, 75, 77, 92, 93, 94, 95, 96, 100, 101, 104, 105, 109, 110, 119, 125, 130, 138, 140, 151, 154, 158, 161, 168, 169, 171, 174, 175, 184, 192 amplification effect 76 freedom of 48 management 80, 100 overload xvi public sector 48 system 34, 65, 85, 117, 181 visualisation 109 information and communication technologies (ICTs) xvi, 37, 80, 83–84, 92, 93, 123, 124 Innocentive 96, 97 INSPIRE 157 instrumental rationality 181 internet 9, 32, 42, 49, 52, 53, 66, 70, 74, 80, 81, 82, 83, 86, 92, 94, 96, 116, 125, 167 of things xv, xvi, 71, 84, 92, 175 intellectual property rights xvi, 11, 12, 16, 25, 30, 31, 40, 41, 49, 50, 56, 62, 152, 166 Intelius 43, 44 intelligent transportation systems (ITS) 89, 124 interoperability 9, 23, 24, 34, 40, 52, 64, 66, 149, 156–157, 163, 184 interpellation 165, 180, 188 interviews 13, 15, 19, 78, 155, 190 Issenberg, S. 
75, 76, 78, 119 jurisdiction 17, 25, 51, 56, 57, 74, 114, 116 Kafka 180 knowledge xvii, 1, 3, 9–12, 19, 20, 22, 25, 48, 53, 55, 58, 63, 67, 93, 96, 110, 111, 118, 128, 130, 134, 136, 138, 142, 159, 160, 161, 162, 187, 192 contextual 48, 64, 132, 136–137, 143, 144, 187 discovery techniques 77, 138 driven science 139 economy 16, 38, 49 production of 16, 20, 21, 24, 26, 37, 41, 112, 117, 134, 137, 144, 184, 185 pyramid 9–10, 12, situated 16, 20, 28, 135, 137, 189 Latour, B. 22, 133 Lauriault, T.P. 15, 16, 17, 23, 24, 30, 31, 33, 37, 38, 40, 153 law of telecosm 82 legal issues xvi, 1, 23, 25, 30, 31, 115, 165–179, 182, 183, 187, 188 levels of measurement 4, 5 libraries 31, 32, 52, 71, 141, 142 licensing 14, 25, 40, 42, 48, 49, 51, 53, 57, 73, 96, 151 LIDAR 88, 89, 139 linked data xvii, 52–54, 66, 156 longitudinal study 13, 76, 140, 149, 150, 160 Lyon, D. 44, 74, 87, 167, 178, 180 machine learning 5, 6, 101, 102–104, 106, 111, 136, 188 readable 6, 52, 54, 81, 84–85, 90, 92, 98 vision 106 management 62, 88, 117–119, 120, 121, 124, 125, 131, 162, 181 Manovich, L. 141, 146, 152, 155 Manyika, J. 6, 16, 70, 71, 72, 104, 116, 118, 119, 120, 121, 122, 161 map 5, 22, 24, 34, 48, 54, 56, 73, 85, 88, 93, 96, 106, 107, 109, 115, 143, 144, 147, 154, 155–156, 157, 190 MapReduce 86, 87 marginal cost 11, 32, 57, 58, 59, 66, 151 marketing 8, 44, 58, 73, 117, 119, 120–123, 131, 176 marketisation 56, 61–62, 182 materiality 4, 19, 21, 24, 25, 66, 183, 185, 186, 189, 190 Mattern, S. 137, 181 Mayer-Schonberger, V. 68, 71, 72, 91, 114, 153, 154, 174 measurement 1, 3, 5, 6, 10, 12, 13, 15, 19, 23, 69, 97, 98, 115, 128, 166 metadata xvi, 1, 3, 4, 6, 8–9, 13, 22, 24, 29, 30, 31, 33, 35, 40, 43, 50, 54, 64, 71, 72, 74, 78, 85, 91, 93, 102, 105, 153, 155, 156 methodology 145, 158, 185 middleware 34 military intelligence 71, 116, 175 Miller, H.J. xvi, 27, 100, 101, 103, 104, 138, 139, 159 Minelli, M. 
101, 120, 137, 168, 170, 171, 172, 174, 176 mixed methods 147, 191 mobile apps 78 computing xv, 44, 78, 80, 81, 83, 85, 139 mapping 88 phones 76, 81, 83, 90, 93, 151, 168, 170, 175 storage 85 mode of production 16 model 7, 11, 12, 24, 32, 37, 44, 57, 72, 73, 101, 103, 105, 106, 109, 110–112, 119, 125, 129, 130, 131, 132, 133, 134, 137, 139, 140, 144, 145, 147, 158–159, 166, 181 agent-based model 111, business 30, 54, 57–60, 61, 95, 118, 119, 121 environmental 139, 166 meteorological 72 time-space 73 transportation 7 modernity 3 Moore’s Law 81, moral philosophy 14 Moretti, F. 141–142 museum 31, 32, 137 NASA 7 National Archives and Records Administration (NARA) 67 National Security Agency (NSA) 45, 116 natural language processing 104, 105 near-field communication 89, 91 neoliberalism 56, 61–62, 126, 182 neural networks 104, 105, 111 New Public Management 62, non-governmental organisations xvi, 43, 55, 56, 73, 117 non-excludable 11, 151 non-rivalrous 11, 57, 151 normality 100, 101 normative thinking 12, 15, 19, 66, 99, 127, 144, 182, 183, 187, 192 Obama, B. 
53, 75–76, 78, 118–119 objectivity 2, 17, 19, 20, 62, 135, 146, 185 observant participation 191 oligopticon 133, 167, 180 ontology 3, 12, 17–21, 22, 28, 54, 79, 128, 138, 150, 156, 177, 178, 184, 185 open data xv, xvi, xvii, 2, 12, 16, 21, 25, 48–66, 97, 114, 124, 128, 129, 140, 149, 151, 163, 164, 167, 186, 187, 188, 190, 191, 192 critique of 61–66 economics of 57–60 rationale 54–56 Open Definition 50 OpenGovData 50, 51 Open Knowledge Foundation 49, 52, 55, 58, 189, 190 open science 48, 72, 98 source 48, 56, 60, 87, 96 OpenStreetMap 73, 93, 96, 154, 155–156 optimisation 101, 104, 110–112, 120, 121, 122, 123 Ordnance Survey 54, 57 Organization for Economic Cooperation and Development (OECD) 49, 50, 59 overlearning 158, 159 panoptic 133, 167, 180 paradigm 112, 128–129, 130, 138, 147, 148, 186 participant observation 190, 191 participation 48, 49, 55, 66, 82, 94, 95, 96, 97–98, 126, 155, 165, 180 passport 8, 45, 84, 87, 88, 115 patent 13, 16, 41, 51 pattern recognition 101, 104–106, 134, 135 personally identifiable information 171 philanthropy 32, 38, 58 philosophy of science 112, 128–148, 185–188 phishing 174, 175 phone hacking 45 photography 6, 43, 71, 72, 74, 77, 86, 87, 88, 93, 94, 95, 105, 115, 116, 141, 155, 170 policing 80, 88, 116, 124, 125, 179 political economy xvi, 15–16, 25, 42–45, 182, 185, 188, 191 Pollock, R. 
49, 54, 56, 57 58, 59 positivism 129, 136–137, 140, 141, 144, 145, 147 post-positivism 140, 144, 147 positionality 135, 190 power/knowledge 16, 22 predictive modelling 4, 7, 12, 34, 44, 45, 76, 101, 103, 104, 110–112, 118, 119, 120, 125, 132, 140, 147, 168, 179 profiling 110–112, 175–178, 179, 180 prescription 101 pre-analytical 2, 3, 19, 20, 185 pre-analytics 101–102, 112 pre-factual 3, 4, 19, 185 PRISM 45, 116 privacy 15, 28, 30, 40, 45, 51, 57, 63, 64, 96, 117, 163, 165, 166, 168–174, 175, 178, 182, 187 privacy by design 45, 173, 174 probability 14, 110, 153, 158 productivity xvi, 16, 39, 55, 66, 92, 114, 118 profiling 12, 42–45, 74, 75, 110–112, 119, 166, 168, 175–178, 179, 180, 187 propriety rights 48, 49, 54, 57, 62 prosumption 93 public good 4, 12, 16, 42, 52, 56, 58, 79, 97 –private partnerships 56, 59 sector information (PSI) 12, 48, 54, 56, 59, 61, 62 quantified self 95 redlining 176, 182 reductionism 73, 136, 140, 142, 143, 145 regression 102, 104, 105, 110, 111, 122 regulation xvi, 15, 16, 23, 25, 40, 44, 46, 83, 85, 87, 89–90, 114, 115, 123, 124, 126, 168, 174, 178, 180, 181–182, 187, 192 research design 7, 13, 14, 77–78, 98, 137–138, 153, 158 Renaissance xvi, 129, 141 repository 29, 33, 34, 41 representativeness 13, 14, 19, 21 Resource Description Framework (RDF) 53, 54 remote sensing 73–74, 105 RFID 74, 85, 90, 91, 169 rhetorical 3, 4, 185 right to be forgotten 45, 172, 187 information (RTI) 48, 62 risk 16, 44, 58, 63, 118, 120, 123, 132, 158, 174, 176–177, 178, 179, 180 Rosenberg, D. 1, 3 Ruppert, E. 
22, 112, 157, 163, 187 sampling 13, 14, 27, 28, 46, 68, 72, 73, 77, 78, 88, 100, 101, 102, 120, 126, 133, 138, 139, 146, 149–150, 152, 153, 154, 156, 159 scale of economy 37 scanners 6, 25, 29, 32, 83, 85, 88, 89, 90, 91, 92, 175, 177, 180 science xvi, 1, 2, 3, 19, 20, 29, 31, 34, 37, 46, 65, 67, 71, 72, 73, 78, 79, 97, 98, 100, 101, 103, 111, 112, 128–139, 140, 147, 148, 150, 158, 161, 165, 166, 181, 184, 186 scientific method 129, 130, 133, 134, 136, 137–138, 140, 147, 148, 186 security data 28, 33, 34, 40, 45, 46, 51, 57, 126, 157, 166, 169, 171, 173, 174–175, 182, 187 national 42, 71, 88, 116–117, 172, 176, 178, 179 private 99, 115, 118, 151 social 8, 32, 45, 87, 115, 171 segmentation 104, 105, 110, 119, 120, 121, 122, 176 semantic information 9, 10, 11, 105, 157 Web 49, 52, 53, 66 sensors xv, 6, 7, 19, 20, 24, 25, 28, 34, 71, 76, 83, 84, 91–92, 95, 124, 139, 150, 160 sentiment analysis 105, 106, 121, Siegel, E. 103, 110, 111, 114, 120, 132, 158, 176, 179 signal 9, 151, 159 Silver, N. 136, 151, 158 simulation 4, 32, 37, 101, 104, 110–112, 119, 129, 133, 137, 139, 140 skills 37, 48, 52, 53, 57, 63, 94, 97, 98, 112, 149, 160–163, 164 small data 21, 27–47, 68, 72, 75, 76, 77, 79, 100, 103, 110, 112, 146, 147, 148, 150, 156, 160, 166, 184, 186, 188, 191 smart cards 90 cities 91, 92, 99, 124–125, 181–182 devices 83 metering 89, 123, 174 phones 81, 82, 83, 84, 90, 94, 107, 121, 155, 170, 174 SmartSantander 91 social computing xvi determinism 144 media xv, 13, 42, 43, 76, 78, 90, 93, 94–95, 96, 105, 119, 121, 140, 150, 151, 152, 154, 155, 160, 167, 176, 180 physics 144 security number 8, 32, 45, 87, 115, 171 sorting 126, 166, 168, 175–178, 182 sociotechnical systems 21–24, 47, 66, 183, 185, 188 software 6, 20, 32, 34, 40, 48, 53, 54, 56, 63, 80, 83, 84, 86, 88, 96, 132, 143, 160, 161, 163, 166, 170, 172, 175, 177, 180, 189 Solove, D. 
116, 120, 168, 169, 170, 172, 176, 178, 180 solutionism 181 sousveillance 95–96 spatial autocorrelation 146 data infrastructure 34, 35, 38 processes 136, 144 resolution 149 statistics 110 video 88 spatiality 17, 157 Star, S.L. 19, 20, 23, 24 stationarity 100 statistical agencies 8, 30, 34, 35, 115 geography 17, 74, 157 statistics 4, 8, 13, 14, 24, 48, 77, 100, 101, 102, 104, 105, 109–110, 111, 129, 132, 134, 135, 136, 140, 142, 143, 145, 147, 159 descriptive 4, 106, 109, 147 inferential 4, 110, 147 non-parametric 105, 110 parametric 105, 110 probablistic 110 radical 147 spatial 110 storage 31–32, 68, 72, 73, 78, 80, 85–87, 88, 100, 118, 161, 171 analogue 85, 86 digital 85–87 media 20, 86 store loyalty cards 42, 45, 165 Sunlight Foundation 49 supervised learning 103 Supply Chain Management (SCM) 74, 99, 117–118, 119, 120, 121 surveillance 15, 71, 80, 83, 87–90, 95, 115, 116, 117, 123, 124, 151, 165, 167, 168, 169, 180 survey 6, 17, 19, 22, 28, 42, 68, 75, 77, 87, 115, 120 sustainability 16, 33, 34, 57, 58, 59, 61, 64–66, 87, 114, 123–124, 126, 155 synchronicity 14, 95, 102 technological handshake 84, 153 lock-in 166, 179–182 temporality 17, 21, 27, 28, 32, 37, 68, 75, 111, 114, 157, 160, 186 terrorism 116, 165, 179 territory 16, 38, 74, 85, 167 Tesco 71, 120 Thrift, N. 
83, 113, 133, 167, 176 TopCoder 96 trading funds 54–55, 56, 57 transparency 19, 38, 44, 45, 48–49, 55, 61, 62, 63, 113, 115, 117, 118, 121, 126, 165, 173, 178, 180 trust 8, 30, 33, 34, 40, 44, 55, 84, 117, 152–156, 163, 175 trusted digital repository 33–34 Twitter 6, 71, 78, 94, 106, 107, 133, 143, 144, 146, 152, 154, 155, 170 uncertainty 10, 13, 14, 100, 102, 110, 156, 158 uneven development 16 Uniform Resource Identifiers (URIs) 53, 54 United Nations Development Programme (UNDP) 49 universalism 20, 23, 133, 140, 144, 154, 190 unsupervised learning 103 utility 1, 28, 53, 54, 55, 61, 63, 64–66, 100, 101, 114, 115, 134, 147, 163, 185 venture capital 25, 59 video 6, 43, 71, 74, 77, 83, 88, 90, 93, 94, 106, 141, 146, 170 visual analytics 106–109 visualisation 5, 10, 34, 77, 101, 102, 104, 106–109, 112, 125, 132, 141, 143 Walmart 28, 71, 99, 120 Web 2.0 81, 94–95 Weinberger, D. 9, 10, 11, 96, 97, 132, 133 White House 48 Wikipedia 93, 96, 106, 107, 143, 154, 155 Wired 69, 130 wisdom 9–12, 114, 161 XML 6, 53 Zikopoulos, P.C. 6, 16, 68, 70, 73, 76, 119, 151


pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzić

Albert Einstein, bioinformatics, business cycle, business intelligence, business process, butter production in bangladesh, combinatorial explosion, computer vision, conceptual framework, correlation coefficient, correlation does not imply causation, data acquisition, discrete time, El Camino Real, fault tolerance, finite state, Gini coefficient, information retrieval, Internet Archive, inventory management, iterative process, knowledge worker, linked data, loose coupling, Menlo Park, natural language processing, Netflix Prize, NP-complete, PageRank, pattern recognition, peer-to-peer, phenotype, random walk, RFID, semantic web, speech recognition, statistical model, Telecommunications Act of 1996, telemarketer, text mining, traveling salesman, web application

As the most natural form of storing information is text, text mining is believed to have a commercial potential even higher than that of traditional data mining with structured data. In fact, recent studies indicate that 80% of a company’s information is contained in text documents. Text mining, however, is also a much more complex task than traditional data mining, as it involves dealing with unstructured text data that are inherently ambiguous. Text mining is a multidisciplinary field involving information retrieval (IR), text analysis, information extraction, natural language processing, clustering, categorization, visualization, machine learning, and other methodologies already included in the data-mining “menu”; even some recently developed techniques for semi-structured data can be included in this field. Market research, business-intelligence gathering, e-mail management, claim analysis, e-procurement, and automated help desks are only a few of the possible applications where text mining can be deployed successfully.

In the case of the vector approach, no match would be found between a query using the term “altruistic” and a document using the word “benevolent,” though the meanings are quite similar. On the other hand, polysemes are words that have multiple meanings. The term “bank” could mean a financial institution, to rely upon, or a type of basketball shot. All of these lead to very different types of documents, which can be problematic for document comparisons. LSA attempts to solve these problems, not with extensive dictionaries and natural language processing engines, but by using mathematical patterns within the data itself to uncover these relationships. It does this by reducing the number of dimensions used to represent a document with a matrix operation called singular value decomposition (SVD). Let us take a look at an example data set. This very simple data set consists of five documents. We will show the dimension-reduction steps of LSA on the first four documents, which will make up our training data.
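The SVD-based reduction described in this excerpt can be sketched with NumPy. The toy term-document counts below are invented for illustration; they are not the book's five-document data set.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows = terms, columns = documents).
# The counts are made up for illustration only.
A = np.array([
    [2.0, 1.0, 1.0, 0.0],   # "bank"
    [1.0, 2.0, 0.0, 0.0],   # "money"
    [1.0, 0.0, 2.0, 0.0],   # "river"
    [0.0, 0.0, 0.0, 2.0],   # "shot"
])

# Singular value decomposition: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The LSA dimension-reduction step: keep only the k largest singular
# values, giving the best rank-k approximation of A in the least-squares
# sense (Eckart-Young theorem).
k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Documents (and terms) can now be compared in the reduced k-dimensional
# space, where co-occurrence patterns stand in for dictionary knowledge.
print(np.round(A_k, 2))
```

Here `k = 2` is an arbitrary choice for the sketch; in practice the number of retained dimensions is tuned to the corpus.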

For example, with the MM shown in Figure 12.21, the probability that the MM takes the horizontal path from the starting node to S2 is 0.4 × 0.7 = 0.28. Figure 12.21. A simple Markov Model. The MM is built on the memoryless assumption: given the current state of the system, its future evolution is independent of its history. MMs have been used widely in speech recognition and natural language processing. The Hidden Markov Model (HMM) is an extension of the MM. Like an MM, an HMM consists of a set of states and transition probabilities. In a regular MM, the states are visible to the observer, and the state-transition probabilities are the only parameters. In an HMM, each state is additionally associated with a probability distribution over the observations it can emit. For example, assume that we were given a sequence of events in a coin toss: O = (HTTHTHH), where H = Head and T = Tail.
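The path-probability arithmetic in this excerpt (0.4 × 0.7 = 0.28) can be sketched as a product of transition probabilities. Only the 0.4 and 0.7 values come from the book's Figure 12.21; the state names and the rest of the chain structure are assumptions.

```python
# Toy transition probabilities; only 0.4 and 0.7 are taken from the text.
transitions = {
    ("start", "S1"): 0.4,
    ("S1", "S2"): 0.7,
}

def path_probability(path, transitions):
    """Multiply transition probabilities along a path.

    Valid under the memoryless assumption: each step depends only on the
    current state, so the path probability factorises into a product.
    """
    p = 1.0
    for a, b in zip(path, path[1:]):
        p *= transitions[(a, b)]
    return p

# 0.4 * 0.7 = 0.28, matching the horizontal-path example in the text.
print(round(path_probability(["start", "S1", "S2"], transitions), 2))
```

In an HMM the same factorisation applies, but each state additionally emits an observation (e.g. H or T in the coin-toss example), so the observed sequence no longer identifies the state sequence directly.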


pages: 391 words: 105,382

Utopia Is Creepy: And Other Provocations by Nicholas Carr

Air France Flight 447, Airbnb, Airbus A320, AltaVista, Amazon Mechanical Turk, augmented reality, autonomous vehicles, Bernie Sanders, book scanning, Brewster Kahle, Buckminster Fuller, Burning Man, Captain Sullenberger Hudson, centralized clearinghouse, Charles Lindbergh, cloud computing, cognitive bias, collaborative consumption, computer age, corporate governance, crowdsourcing, Danny Hillis, deskilling, digital map, disruptive innovation, Donald Trump, Electric Kool-Aid Acid Test, Elon Musk, factory automation, failed state, feminist movement, Frederick Winslow Taylor, friendly fire, game design, global village, Google bus, Google Glasses, Google X / Alphabet X, Googley, hive mind, impulse control, indoor plumbing, interchangeable parts, Internet Archive, invention of movable type, invention of the steam engine, invisible hand, Isaac Newton, Jeff Bezos, jimmy wales, Joan Didion, job automation, Kevin Kelly, lifelogging, low skilled workers, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, means of production, Menlo Park, mental accounting, natural language processing, Network effects, new economy, Nicholas Carr, Norman Mailer, off grid, oil shale / tar sands, Peter Thiel, plutocrats, Plutocrats, profit motive, Ralph Waldo Emerson, Ray Kurzweil, recommendation engine, Republic of Letters, robot derives from the Czech word robota Czech, meaning slave, Ronald Reagan, self-driving car, SETI@home, side project, Silicon Valley, Silicon Valley ideology, Singularitarianism, Snapchat, social graph, social web, speech recognition, Startup school, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, technoutopianism, the medium is the message, theory of mind, Turing test, Whole Earth Catalog, Y Combinator

That’s much less the case now. Google’s conception of searching has changed since those early days, and that means our own idea of what it means to search is changing as well. Google’s goal is no longer to read the web. It’s to read us. Ray Kurzweil, the inventor and AI speculator, recently joined the company as a director of engineering. His general focus will be on machine learning and natural language processing. But his particular concern will entail reconfiguring the company’s search engine to focus not outwardly on the world but inwardly on the user. “I envision some years from now that the majority of search queries will be answered without you actually asking,” he recently explained. “It’ll just know this is something that you’re going to want to see.” This has actually been Google’s great aspiration for a while now.

They shape speech to the needs of the computer network—and the computer network’s owner. “The speaking of language is part of an activity, or of a form of life,” wrote Wittgenstein in Philosophical Investigations. If human language is bound up in living, if it is an expression of both sense and sensibility, then computers, being nonliving, having no sensibility, will have a very difficult time mastering “natural-language processing” beyond a certain rudimentary level. The best solution, if you have a need to get computers to “understand” human communication, may be to avoid the problem altogether. Instead of figuring out how to get computers to understand natural language, you get people to speak artificial language, the language of computers. A good way to start is with Like buttons and other standardized messaging protocols.

., 226 video games and, 94–97 Merholz, Peter, 21 Merleau-Ponty, Maurice, 300 Merton, Robert, 12–13 message-automation service, 167 Meyer, Stephenie, 50 Meyerowitz, Joanne, 338 microfilm, microphotography, 267 Microsoft, 108, 168, 205, 284 military technology, 331–32 Miller, Perry, xvii mindfulness, 162 Minima Moralia (Adorno), 153–54 mirrors, 138–39 Mitchell, Joni, 128 Mollie (video poker player), 218–19 monitoring: corporate control through, 163–65 of thoughts, 214–15 through wearable behavior-modification devices, 168–69 Montaigne, Michel de, 247, 249, 252, 254 Moore, Geoffrey, 209 Morlocks, 114, 186 “Morphological Basis of the Arm-to-Wing Transition, The” (Poore), 329–30 Morrison, Ewan, 288 Morrison, Jim, 126 Morse code, 34 “Most of It, The” (Frost), 145–46 motor skills, video games and, 93–94 “Mowing” (Frost), 296–300, 302, 304–5 MP3 players, 122, 123, 124, 216, 218, 293 multitasking, media, 96–97 Mumford, Lewis, 138–39, 235 Murdoch, Rupert and Wendi, 131 music: bundling of, 41–46 commercial use of, 244–45 copying and sharing technologies for, 121–26, 314 digital revolution in, 293–95 fidelity of, 124 listening vs. interface in, 216–18, 293 in participatory games, 71–72 streamed and curated, 207, 217–18 music piracy, 121–26 Musings on Human Metamorphoses (Leary), 171 Musk, Elon, 172 Musset, Alfred de, xxiii Muzak, 208, 244 MySpace, xvi, 10–11, 30–31 “Names of the Hare, The,” 201 nanotechnology, 69 Napster, 122, 123 narcissism, 138–39 Twitter and, 34–36 narrative emotions, 250 natural-language processing, 215 Negroponte, Nicholas, xx neobehavioralism, 212–13 Netflix, 92 neural networks, 136–37 neuroengineering, 332–33 New Critics, 249 News Feed, 320 news media, 318–20 newspapers: evolution of, 79, 237 online archives of, 47–48, 190–92 online vs. 
printed, 289 Newton, Isaac, 66 New York Public Library, 269 New York Times, 8, 71, 83, 133, 152–53, 195, 237, 283, 314, 342 erroneous information revived by, 47–48 on Twitter, 35 Nielsen Company, 80–81 Nietzsche, Friedrich, 126, 234–35, 237 Nightingale, Paul, 335 Nixon, Richard, 317 noise pollution, 243–46 Nook, 257 North of Boston (Frost), 297 nostalgia, 202, 204, 312 in music, 292–95 Now You See It (Davidson), 94 Oates, Warren, 203 Oatley, Keith, 248–50 Obama, Barack, 314 obsession, 218–19 OCLC, 276 “off grid,” 52 Olds, James, 235 O’Neill, Gerard, 171 One Infinite Loop, 76 Ong, Walter, 129 online aggregation, 192 On Photography (Sontag), xx open networks, profiteering from, 83–85 open-source projects, 5–7, 26 Oracle, 17 orchises, 305 O’Reilly, Tim, 3–5, 7 organ donation and transplantation, 115 ornithopters, 239 orphan books, 276, 277 Overture, 279–80 Owad, Tom, 256 Oxford Junior Dictionary, 201–2 Oxford University, library of, 269 Page, Larry, 23, 160, 172, 239, 268–69, 270, 279, 281–85 personal style of, 16–17, 281–82, 285 paint-by-number kits, 71–72 Paley, William, 43 Palfrey, John, 272–74, 277 Palmisano, Sam, 26 “pancake people,” 242 paper, invention and uses of, 286–89 Paper: An Elegy (Sansom), 287 Papert, Seymour, 134 Paradise within the Reach of All Men, The (Etzler), xvi–xvii paradox of time, 203–4 parenting: automation of, 181 of virtual child, 73–75 Parker, Sarah Jessica, 131 participation: “cognitive surplus” in, 59 as content and performance, 184 inclusionists vs. 
deletionists in, 18–20 internet, 28–29 isolation and, 35–36, 184 limits and flaws of, 5–7, 62 Paul, Rand, 314 Pendragon, Caliandras (avatar), 25 Pentland, Alex, 212–13 perception, spiritual awakening of, 300–301 personalization, 11 of ads, 168, 225, 264 isolation and, 29 loss of autonomy in, 264–66 manipulation through, 258–59 in message automation, 167 in searches, 145–46, 264–66 of streamed music, 207–9, 245 tailoring in, 92, 224 as threat to privacy, 255 Phenomenology of Perception (Merleau-Ponty), 300 Philosophical Investigations (Wittgenstein), 215 phonograph, phonograph records, 41–46, 133, 287 photography, technological advancement in, 311–12 Pichai, Sundar, 181 Pilgrims, 172 Pinterest, 119, 186 playlists, 314 PlayStation, 260 “poetic faith,” 251 poetry, 296–313 polarization, 7 politics, transformed by technology, 314–20 Politics (Aristotle), 307–8 Poore, Samuel O., 329–30 pop culture, fact-mongering in, 58–62 pop music, 44–45, 63–64, 224 copying technologies for, 121–26 dead idols of, 126 industrialization of, 208–9 as retrospective and revivalist, 292–95 positivism, 211 Potter, Dean, 341–42 power looms, 178 Presley, Elvis, 11, 126 Prim Revolution, 26 Principles of Psychology (James), 203 Principles of Scientific Management, The (Taylor), 238 printing press: consequences of, 102–3, 234, 240–41, 271 development of, 53, 286–87 privacy: devaluation of, 258 from electronic surveillance, 52 family cohesion vs., 229 free flow of information vs. 
right to, 190–94 internet threat to, 184, 255–59, 265, 285 safeguarding of, 258–59, 283 vanity vs., 107 proactive cognitive control, 96 Prochnik, George, 243–46 “Productivity Future Vision (2011),” 108–9 Project Gutenberg, 278 prosperity, technologies of, 118, 119–20 prosumerism, 64 protest movements, 61 Proust and the Squid (Wolf), 234 proximal clues, 303 public-domain books, 277–78 “public library,” debate over use of term, 272–74 punch-card tabulator, 188 punk music, 63–64 Quantified Self Global Conference, 163 Quantified Self (QS) movement, 163–65 Quarter-of-a-Second Rule, 205 racecars, 195, 196 radio: in education, 134 evolution of, 77, 79, 159, 288 as music medium, 45, 121–22, 207 political use of, 315–16, 317, 319 Radosh, Daniel, 71 Rapp, Jen, 341–42 reactive cognitive control, 96 Readers’ Guide to Periodical Literature, 91 reading: brain function in, 247–54, 289–90 and invention of paper, 286–87 monitoring of, 257 video gaming vs., 261–62 see also books reading skills, changes in, 232–34, 240–41 Read Write Web (blog), 30 Reagan, Ronald, 315 real world: digital media intrusion in, 127–30 perceived as boring and ugly, 157–58 as source of knowledge, 313 virtual world vs., xx–xxi, 36, 62, 127–30, 303–4 reconstructive surgery, 239 record albums: copying of, 121–22 jackets for, 122, 224 technology of, 41–46 Redding, Otis, 126 Red Light Center, 39 Reichelt, Franz, 341 Reid, Rob, 122–25 relativists, 20 religion: internet perceived as, 3–4, 238 for McLuhan, 105 technology viewed as, xvi–xvii Republic of Letters, 271 reputations, tarnishing of, 47–48, 190–94 Resident Evil, 260–61 resource sharing, 148–49 resurrection, 69–70, 126 retinal implants, 332 Retromania (Reynolds), 217, 292–95 Reuters, Adam, 26 Reuters’ SL bureau, 26 revivification machine, 69–70 Reynolds, Simon, 217–18, 292–95 Rice, Isaac, 244 Rice, Julia Barnett, 243–44 Richards, Keith, 42 “right to be forgotten” lawsuit, 190–94 Ritalin, 304 robots: control of, 303 creepy quality of, 108 human beings 
compared to, 242 human beings replaced by, 112, 174, 176, 195, 197, 306–7, 310 limitations of, 323 predictions about, xvii, 177, 331 replaced by humans, 323 threat from, 226, 309 Rogers, Roo, 83–84 Rolling Stones, 42–43 Roosevelt, Franklin, 315 Rosen, Nick, 52 Rubio, Marco, 314 Rumsey, Abby Smith, 325–27 Ryan, Amy, 273 Sandel, Michael J., 340 Sanders, Bernie, 314, 316 Sansom, Ian, 287 Savage, Jon, 63 scatology, 147 Schachter, Joshua, 195 Schivelbusch, Wolfgang, 229 Schmidt, Eric, 13, 16, 238, 239, 257, 284 Schneier, Bruce, 258–59 Schüll, Natasha Dow, 218 science fiction, 106, 115, 116, 150, 309, 335 scientific management, 164–65, 237–38 Scrapbook in American Life, The, 185 scrapbooks, social media compared to, 185–86 “Scrapbooks as Cultural Texts” (Katriel and Farrell), 186 scythes, 302, 304–6 search-engine-optimization (SEO), 47–48 search engines: allusions sought through, 86 blogging, 66–67 in centralization of internet, 66–69 changing use of, 284 customizing by, 264–66 erroneous or outdated stories revived by, 47–48, 190–94 in filtering, 91 placement of results by, 47–48, 68 searching vs., 144–46 targeting information through, 13–14 writing tailored to, 89 see also Google searching, ontological connotations of, 144–46 Seasteading Institute, 172 Second Life, 25–27 second nature, 179 self, technologies of the, 118, 119–20 self-actualization, 120, 340 monitoring and quantification of, 163–65 selfies, 224 self-knowledge, 297–99 self-reconstruction, 339 self-tracking, 163–65 Selinger, Evan, 153 serendipity, internet as engine of, 12–15 SETI@Home, 149 sexbots, 55 Sex Pistols, 63 sex-reassignment procedures, 337–38 sexuality, 10–11 virtual, 39 Shakur, Tupac, 126 sharecropping, as metaphor for social media, 30–31 Shelley, Percy Bysshe, 88 Shirky, Clay, 59–61, 90, 241 Shop Class as Soulcraft (Crawford), 265 Shuster, Brian, 39 sickles, 302 silence, 246 Silicon Valley: American culture transformed by, xv–xxii, 148, 155–59, 171–73, 181, 241, 257, 309 commercial interests 
of, 162, 172, 214–15 informality eschewed by, 197–98, 215 wealthy lifestyle of, 16–17, 195 Simonite, Tom, 136–37 simulation, see virtual world Singer, Peter, 267 Singularity, Singularitarians, 69, 147 sitcoms, 59 situational overload, 90–92 skimming, 233 “Slaves to the Smartphone,” 308–9 Slee, Tom, 61, 84 SLExchange, 26 slot machines, 218–19 smart bra, 168–69 smartphones, xix, 82, 136, 145, 150, 158, 168, 170, 183–84, 219, 274, 283, 287, 308–9, 315 Smith, Adam, 175, 177 Smith, William, 204 Snapchat, 166, 205, 225, 316 social activism, 61–62 social media, 224 biases reinforced by, 319–20 as deceptively reflective, 138–39 documenting one’s children on, 74–75 economic value of content on, 20–21, 53–54, 132 emotionalism of, 316–17 evolution of, xvi language altered by, 215 loom as metaphor for, 178 maintaining one’s microcelebrity on, 166–67 paradox of, 35–36, 159 personal information collected and monitored through, 257 politics transformed by, 314–20 scrapbooks compared to, 185–86 self-validation through, 36, 73 traditional media slow to adapt to, 316–19 as ubiquitous, 205 see also specific sites social organization, technologies of, 118, 119 Social Physics (Pentland), 213 Society for the Suppression of Unnecessary Noise, 243–44 sociology, technology and, 210–13 Socrates, 240 software: autonomous, 187–89 smart, 112–13 solitude, media intrusion on, 127–30, 253 Songza, 207 Sontag, Susan, xx SoundCloud, 217 sound-management devices, 245 soundscapes, 244–45 space travel, 115, 172 spam, 92 Sparrow, Betsy, 98 Special Operations Command, U.S., 332 speech recognition, 137 spermatic, as term applied to reading, 247, 248, 250, 254 Spinoza, Baruch, 300–301 Spotify, 293, 314 “Sprite Sips” (app), 54 Squarciafico, Hieronimo, 240–41 Srinivasan, Balaji, 172 Stanford Encyclopedia of Philosophy, 68 Starr, Karla, 217–18 Star Trek, 26, 32, 313 Stengel, Rick, 28 Stephenson, Neal, 116 Sterling, Bruce, 113 Stevens, Wallace, 158 Street View, 137, 283 Stroop test, 98–99 Strummer, Joe, 63–64 
Studies in Classic American Literature (Lawrence), xxiii Such Stuff as Dreams (Oatley), 248–49 suicide rate, 304 Sullenberger, Sully, 322 Sullivan, Andrew, xvi Sun Microsystems, 257 “surf cams,” 56–57 surfing, internet, 14–15 surveillance, 52, 163–65, 188–89 surveillance-personalization loop, 157 survival, technologies of, 118, 119 Swing, Edward, 95 Talking Heads, 136 talk radio, 319 Tan, Chade-Meng, 162 Tapscott, Don, 84 tattoos, 336–37, 340 Taylor, Frederick Winslow, 164, 237–38 Taylorism, 164, 238 Tebbel, John, 275 Technics and Civilization (Mumford), 138, 235 technology: agricultural, 305–6 American culture transformed by, xv–xxii, 148, 155–59, 174–77, 214–15, 229–30, 296–313, 329–42 apparatus vs. artifact in, 216–19 brain function affected by, 231–42 duality of, 240–41 election campaigns transformed by, 314–20 ethical hazards of, 304–11 evanescence and obsolescence of, 327 human aspiration and, 329–42 human beings eclipsed by, 108–9 language of, 201–2, 214–15 limits of, 341–42 master-slave metaphor for, 307–9 military, 331–32 need for critical thinking about, 311–13 opt-in society run by, 172–73 progress in, 77–78, 188–89, 229–30 risks of, 341–42 sociology and, 210–13 time perception affected by, 203–6 as tool of knowledge and perception, 299–304 as transcendent, 179–80 Technorati, 66 telegrams, 79 telegraph, Twitter compared to, 34 telephones, 103–4, 159, 288 television: age of, 60–62, 79, 93, 233 and attention disorders, 95 in education, 134 Facebook ads on, 155–56 introduction of, 103–4, 159, 288 news coverage on, 318 paying for, 224 political use of, 315–16, 317 technological adaptation of, 237 viewing habits for, 80–81 Teller, Astro, 195 textbooks, 290 texting, 34, 73, 75, 154, 186, 196, 205, 233 Thackeray, William, 318 “theory of mind,” 251–52 Thiel, Peter, 116–17, 172, 310 “Things That Connect Us, The” (ad campaign), 155–58 30 Days of Night (film), 50 Thompson, Clive, 232 thought-sharing, 214–15 “Three Princes of Serendip, The,” 12 Thurston, Baratunde, 
153–54 time: memory vs., 226 perception of, 203–6 Time, covers of, 28 Time Machine, The (Wells), 114 tools: blurred line between users and, 333 ethical choice and, 305 gaining knowledge and perception through, 299–304 hand vs. computer, 306 Home and Away blurred by, 159 human agency removed from, 77 innovation in, 118 media vs., 226 slave metaphor for, 307–8 symbiosis with, 101 Tosh, Peter, 126 Toyota Motor Company, 323 Toyota Prius, 16–17 train disasters, 323–24 transhumanism, 330–40 critics of, 339–40 transparency, downside of, 56–57 transsexuals, 337–38 Travels and Adventures of Serendipity, The (Merton and Barber), 12–13 Trends in Biochemistry (Nightingale and Martin), 335 TripAdvisor, 31 trolls, 315 Trump, Donald, 314–18 “Tuft of Flowers, A” (Frost), 305 tugboats, noise restrictions on, 243–44 Tumblr, 166, 185, 186 Turing, Alan, 236 Turing Test, 55, 137 Twain, Mark, 243 tweets, tweeting, 75, 131, 315, 319 language of, 34–36 theses in form of, 223–26 “tweetstorm,” xvii 20/20, 16 Twilight Saga, The (Meyer), 50 Twitter, 34–36, 64, 91, 119, 166, 186, 197, 205, 223, 224, 257, 284 political use of, 315, 317–20 2001: A Space Odyssey (film), 231, 242 Two-Lane Blacktop (film), 203 “Two Tramps in Mud Time” (Frost), 247–48 typewriters, writing skills and, 234–35, 237 Uber, 148 Ubisoft, 261 Understanding Media (McLuhan), 102–3, 106 underwearables, 168–69 unemployment: job displacement in, 164–65, 174, 310 in traditional media, 8 universal online library, 267–78 legal, commercial, and political obstacles to, 268–71, 274–78 universe, as memory, 326 Urban Dictionary, 145 utopia, predictions of, xvii–xviii, xx, 4, 108–9, 172–73 Uzanne, Octave, 286–87, 290 Vaidhyanathan, Siva, 277 vampires, internet giants compared to, 50–51 Vampires (game), 50 Vanguardia, La, 190–91 Van Kekerix, Marvin, 134 vice, virtual, 39–40 video games, 223, 245, 303 as addictive, 260–61 cognitive effects of, 93–97 crafting of, 261–62 violent, 260–62 videos, viewing of, 80–81 virtual child, tips for 
raising a, 73–75 virtual world, xviii commercial aspects of, 26–27 conflict enacted in, 25–27 language of, 201–2 “playlaborers” of, 113–14 psychological and physical health affected by, 304 real world vs., xx–xxi, 36, 62, 127–30 as restrictive, 303–4 vice in, 39–40 von Furstenberg, Diane, 131 Wales, Jimmy, 192 Wallerstein, Edward, 43–44 Wall Street, automation of, 187–88 Wall Street Journal, 8, 16, 86, 122, 163, 333 Walpole, Horace, 12 Walters, Barbara, 16 Ward, Adrian, 200 Warhol, Andy, 72 Warren, Earl, 255, 257 “Waste Land, The” (Eliot), 86, 87 Watson (IBM computer), 147 Wealth of Networks, The (Benkler), xviii “We Are the Web” (Kelly), xxi, 4, 8–9 Web 1.0, 3, 5, 9 Web 2.0, xvi, xvii, xxi, 33, 58 amorality of, 3–9, 10 culturally transformative power of, 28–29 Twitter and, 34–35 “web log,” 21 Wegner, Daniel, 98, 200 Weinberger, David, 41–45, 277 Weizenbaum, Joseph, 236 Wells, H.


pages: 223 words: 60,909

Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech by Sara Wachter-Boettcher

Airbnb, airport security, AltaVista, big data - Walmart - Pop Tarts, Donald Trump, Ferguson, Missouri, Firefox, Grace Hopper, job automation, Kickstarter, lifelogging, Mark Zuckerberg, Menlo Park, move fast and break things, natural language processing, pattern recognition, Peter Thiel, recommendation engine, ride hailing / ride sharing, self-driving car, Silicon Valley, Silicon Valley startup, Snapchat, Steve Jobs, Tim Cook: Apple, Travis Kalanick, upwardly mobile, women in the workforce, zero-sum game

In 2013, Google researchers trained a system to comb through Google News articles, parsing huge amounts of text and identifying patterns in how words are used within them. The result is Word2vec, a neural network made up of 3 million word embeddings, or semantic relationships between words. What Word2vec does is essentially reconstruct the way words work linguistically, in order to improve capabilities for natural language processing: the practice of teaching machines to understand human language as it’s spoken or written day to day—the kind of thing that allows Siri or a search engine to understand what you mean and provide an answer. Word2vec and other similar word-embedding systems do this by looking at how frequently pairs of words appear in the same text, and how near each other they appear. Over time, these patterns allow a system to understand semantic meaning and accurately complete analogies like “man is to woman as king is to _____” or “Paris is to France as Tokyo is to _____.” 26 That’s all well and good, but the system also returns other kinds of relationships—like “man is to woman as computer programmer is to homemaker.”
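The analogy completion the excerpt describes ("man is to woman as king is to _____") is usually computed by plain vector arithmetic over the embeddings. A minimal sketch, assuming hand-made three-dimensional vectors as stand-ins for the hundreds-of-dimensions vectors a trained Word2vec model would supply:

```python
from math import sqrt

# Toy embeddings: illustrative values only, not trained Word2vec vectors.
embeddings = {
    "man":   (1.0, 0.0, 0.1),
    "woman": (1.0, 1.0, 0.1),
    "king":  (1.0, 0.0, 0.9),
    "queen": (1.0, 1.0, 0.9),
    "paris": (0.0, 0.5, 0.5),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' by the nearest neighbor of b - a + c."""
    va, vb, vc = embeddings[a], embeddings[b], embeddings[c]
    target = tuple(y - x + z for x, y, z in zip(va, vb, vc))
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

print(analogy("man", "woman", "king"))  # -> queen (with these toy vectors)
```

The same nearest-neighbor search over real embeddings is what returns the biased completions the author goes on to discuss: the arithmetic is neutral, but the vectors encode whatever co-occurrence patterns the news corpus contained.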

See also marginalized populations and companies’ collection of gender information, 62–64 and companies’ name policies, 54–55, 58 and edge cases, 38 and Etsy, 32–33 importance of tech to, 195–197 and normalizing TV programming, 48 and same-sex marriage, 196–198 and Milos Yiannopoulos, 153 Lil Miss Hot Mess, 55 location tracking, 105–108 Lone Hill, Dana, 54 McAdoo, Greg, 175 McBride, Sarah, 175 machine-learning products, 121, 128, 132, 135, 136, 140, 146 Mack, Arien, 95 McKesson, DeRay, 81 Mad Money (TV show), 158 MailChimp, 89–90 marginalized populations and default settings, 37, 66 and digital forms, 51, 61, 72, 75 and digital products’ personal data collection, 116–117 importance of tech to, 195–197 market negging, and opt-ins, 91–92, 97 Martin, Erik, 162–163 Martin, Trayvon, 141 Martinez, Chris, 30 Maslow, Abraham, 3 maternity policies, 16 MAUs (monthly active users) metric, 74, 97–98 May, Rob, 139 Mayer, Marissa, 143 Medium publishing platform, 87–88, 180 menstrual cycle tracking apps, 28–33 Mental Models (Young), 46 meritocracy and ethics, 176, 189 tech industry as, 173–177, 180 Uber as, 180 Messer, Madeline, 35, 37 metadata from emails, 102 Meyer, Eric, 4–5, 40, 64, 79, 82, 89, 96 Meyer, Rebecca, 4–5, 5 microaggressions, 70–73 Microsoft, 6, 36–37 Miley, Leslie, 158 misplaced celebrations and humor, 78–85, 87–90, 114–115, 200 Moments Facebook feature, 85, 97 monoculture, tech industry as, 188–189 monthly active users (MAUs) metric, 74, 97–98 Mosseri, Adam, 168 Mozilla, 102 multiracial populations, and form field design, 60–62 mystification of tech, 9, 11–12, 26, 143, 188, 191–193, 199 National Public Radio (NPR), 1, 40–44 National Security Agency (NSA), 102 National Suicide Prevention Lifeline, 6 Native Americans, Facebook’s rejection of names of, 53–57 natural language processing, 138 negging, 91–92, 97 Neighbors for Racial Justice, 69 Netflix, 144 neural networks, 131–133, 138 News Feed Facebook feature, 144, 168–169 Nextdoor app, 67–71, 71, 73–75 Noble, 
Safiya, 10, 113 non-binary people. See LGBTQ community Northpointe, 120, 125–127 Note to Self podcast, 130, 171 Nye, Bill, 1 Ohanian, Alexis, 161, 164 O’Neil, Cathy, 112, 126 online time, growth of Americans, 1–3 On This Day Facebook feature, 83–84, 97 opt-in pop-ups, 90–92, 97 oversight, tech industry’s desire to avoid, 187–189, 199 Page, Shirley, 133 Palantir, 199–200 Pancake, Beth, 57 Pao, Ellen, 162 Parker, Bernard, 119–120 PayPal, 175 Penny, Laurie, 153 personal data and algorithmic systems, 145 collected during mobile usage, 116–117 and data brokers, 101–104 digital products designed to collect, 105–117 tech industry’s responsibility for, 146 value of, 96–98 personalization of online content, 86–90, 99 personal names, digital forms’ problems with, 40, 52–59, 71–72, 75 personas, 27–33, 29, 44–47, 110 Phillips, Katherine W., 184–186 photo autotagging, 129–130, 129, 130, 132–133, 135–138, 145 pickup artist (PUA) community, 91–92 Pinterest, 42 political bias, and Trending Facebook feature, 165–167, 169 Practical Empathy (Young), 46 privacy and digital products’ collection of personal data, 115, 117 and Facebook, 108–109 and Google, 109 and Uber, 107–108 ProPublica, 103, 112–113, 120, 126–127 proxy data, 109–114 PUA (pickup artist) community, 91–92 PureGym, 6 push notifications, 198 Quantified Self movement, 28 queer people.


pages: 239 words: 70,206

Data-Ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else by Steve Lohr

"Robert Solow", 23andMe, Affordable Care Act / Obamacare, Albert Einstein, big data - Walmart - Pop Tarts, bioinformatics, business cycle, business intelligence, call centre, cloud computing, computer age, conceptual framework, Credit Default Swap, crowdsourcing, Daniel Kahneman / Amos Tversky, Danny Hillis, data is the new oil, David Brooks, East Village, Edward Snowden, Emanuel Derman, Erik Brynjolfsson, everywhere but in the productivity statistics, Frederick Winslow Taylor, Google Glasses, impulse control, income inequality, indoor plumbing, industrial robot, informal economy, Internet of things, invention of writing, Johannes Kepler, John Markoff, John von Neumann, lifelogging, Mark Zuckerberg, market bubble, meta analysis, meta-analysis, money market fund, natural language processing, obamacare, pattern recognition, payday loans, personalized medicine, precision agriculture, pre–internet, Productivity paradox, RAND corporation, rising living standards, Robert Gordon, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, speech recognition, statistical model, Steve Jobs, Steven Levy, The Design of Experiments, the scientific method, Thomas Kuhn: the structure of scientific revolutions, unbanked and underbanked, underbanked, Von Neumann architecture, Watson beat the top human players on Jeopardy!

Our notions of “knowledge,” “meaning,” and “understanding” don’t really apply to how this technology works. Humans understand things in large part because of their experience of the real world. Computers lack that advantage. Advances in artificial intelligence mean that machines can increasingly see, read, listen, and speak, in their way. And a very different way, it is. As Frederick Jelinek, a pioneer in speech recognition and natural-language processing at IBM, once explained by way of analogy: “Airplanes don’t flap their wings.” To get a sense of how computers build knowledge, let’s look at Carnegie Mellon University’s Never-Ending Language Learning system, or NELL. Since 2010, NELL has been steadily scanning hundreds of millions of Web pages for text patterns that it uses to learn facts, more than 2.3 million so far, with an estimated accuracy of 87 percent.
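The core of the text-pattern approach the excerpt attributes to NELL can be sketched in a few lines. This is a deliberately minimal illustration, not NELL itself (which couples many extractors and bootstraps new patterns from the facts it learns); the single "X such as Y" pattern and the sample sentence are invented for the example:

```python
import re

# One lexical pattern of the "CATEGORY such as Instance" family.
PATTERN = re.compile(r"(\w+),? such as ([A-Z]\w+)")

def extract_facts(text):
    """Return (instance, category) candidate facts found via the pattern."""
    return [(instance, category.lower())
            for category, instance in PATTERN.findall(text)]

text = ("The survey covered cities such as Boston and "
        "programming languages, such as Python.")
print(extract_facts(text))  # -> [('Boston', 'cities'), ('Python', 'languages')]
```

Run over hundreds of millions of pages, even crude patterns like this accumulate large fact bases, which is why the system's accuracy is reported statistically (87 percent) rather than guaranteed per fact.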

Decades ago, the main focus of artificial intelligence research was to develop knowledge rules and relationships to make so-called expert systems. But those systems proved extremely difficult to build. So knowledge systems gave way to the data-driven path: mine vast amounts of data to make predictions, based on statistical probabilities and patterns. Data-fueled artificial intelligence, Ferrucci says, has been “incredibly powerful” for tasks like natural-language processing—a central technology, for example, behind Google’s search and Watson’s question-answering. “But in a purely data-driven approach, there is no real understanding,” he says. “People are so enamored with the data-driven approach that they believe correlation is enough.” For a broad swath of commercial decisions, as we’ve seen, correlation is sufficient, as long as the outcome is a winner.


pages: 49 words: 12,968

Industrial Internet by Jon Bruner

autonomous vehicles, barriers to entry, commoditize, computer vision, data acquisition, demand response, en.wikipedia.org, factory automation, Google X / Alphabet X, industrial robot, Internet of things, job automation, loose coupling, natural language processing, performance metric, Silicon Valley, slashdot, smart grid, smart meter, statistical model, web application

“Now you’ve got what you might call a rain API — two machines talking, mediated by a human being,” says Prasad. It could alert other cars to the presence of rain, perhaps switching on headlights automatically or changing the assumptions that nearby cars make about road traction. The human in this case becomes part of an API in situ — the software, integrated with hardware, is able to detect a strong signal from a human without relying on extractive tools like natural-language processing that are often used to divine human preferences. Connected to networks through easy procedural mechanisms like If This Then That (IFTTT)[29], human operators even at the consumer level can identify significant signals and make their machines react to them. “I’m a car guy, so I’m talking about cars, but imagine the number of machines out there that are being turned on and off. In each case, the fact that a human is turning it on and off tells you something very interesting; it’s human-annotated data,” says Prasad.
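The "If This Then That" style of procedural rule Prasad describes, where a human-generated signal (switching on the wipers) triggers reactions in other machines, can be sketched as a toy event registry. The event names and car actions below are invented for illustration; this is not the IFTTT service's actual API:

```python
# Toy "if this then that" rule engine: reactions register against an
# event name, and emitting the event runs every registered reaction.
rules = []

def on(event_name):
    """Decorator registering a function as a reaction to event_name."""
    def register(action):
        rules.append((event_name, action))
        return action
    return register

def emit(event_name, payload):
    """Fire an event; return the result of each matching reaction."""
    return [action(payload) for name, action in rules if name == event_name]

@on("wipers_on")
def switch_headlights(car):
    return f"{car}: headlights on"

@on("wipers_on")
def reduce_traction_estimate(car):
    return f"{car}: assuming wet road"

print(emit("wipers_on", "car42"))
# -> ['car42: headlights on', 'car42: assuming wet road']
```

The point of the passage survives the simplification: the trigger itself is human-annotated data (a driver judged it was raining), and the machinery only propagates that judgment.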


pages: 481 words: 125,946

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence by John Brockman

agricultural Revolution, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, artificial general intelligence, augmented reality, autonomous vehicles, basic income, bitcoin, blockchain, clean water, cognitive dissonance, Colonization of Mars, complexity theory, computer age, computer vision, constrained optimization, corporate personhood, cosmological principle, cryptocurrency, cuban missile crisis, Danny Hillis, dark matter, discrete time, Douglas Engelbart, Elon Musk, Emanuel Derman, endowment effect, epigenetics, Ernest Rutherford, experimental economics, Flash crash, friendly AI, functional fixedness, global pandemic, Google Glasses, hive mind, income inequality, information trail, Internet of things, invention of writing, iterative process, Jaron Lanier, job automation, Johannes Kepler, John Markoff, John von Neumann, Kevin Kelly, knowledge worker, loose coupling, microbiome, Moneyball by Michael Lewis explains big data, natural language processing, Network effects, Norbert Wiener, pattern recognition, Peter Singer: altruism, phenotype, planetary scale, Ray Kurzweil, recommendation engine, Republic of Letters, RFID, Richard Thaler, Rory Sutherland, Satyajit Das, Search for Extraterrestrial Intelligence, self-driving car, sharing economy, Silicon Valley, Skype, smart contracts, social intelligence, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, superintelligent machines, supervolcano, the scientific method, The Wisdom of Crowds, theory of mind, Thorstein Veblen, too big to fail, Turing machine, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!, Y2K

After thirty years of research, a million-times improvement in computer power, and vast data sets from the Internet, we now know the answer to this question: Neural networks scaled up to twelve layers deep, with billions of connections, are outperforming the best algorithms in computer vision for object recognition and have revolutionized speech recognition. It’s rare for any algorithm to scale this well, which suggests that they may soon be able to solve even more difficult problems. Recent breakthroughs have been made that allow the application of deep learning to natural-language processing. Deep recurrent networks with short-term memory were trained to translate English sentences into French sentences at high levels of performance. Other deep-learning networks could create English captions for the content of images with surprising and sometimes amusing acumen. Supervised learning using deep networks is a step forward, but still far from achieving general intelligence. The functions they perform are analogous to some capabilities of the cerebral cortex, which has also been scaled up by evolution, but to solve complex cognitive problems the cortex interacts with many other brain regions.

Brain-machine interfaces continue to be improved, initially for physically impaired people but eventually to provide a seamless boundary between people and the monitoring network. And virtual-reality-style interfaces will continue to become more realistic and immersive. Why won’t a stand-alone sentient brain come sooner? The amazing progress in spoken-language recognition—unthinkable ten years ago—derives in large part from having access to huge amounts of data and huge amounts of storage and fast networks. The improvements we see in natural-language processing are based on mimicking what people do, not understanding or even simulating it. It’s not owing to breakthroughs in understanding human cognition or even significantly different algorithms. But eGaia is already partly here, at least in the developed world. This distributed nerve-center network, an interplay among the minds of people and their monitoring electronics, will give rise to a distributed technical-social mental system the likes of which has not been experienced before.

To be sure, there have been exponential advances in narrow-engineering applications of artificial intelligence, such as playing chess, calculating travel routes, or translating texts in rough fashion, but there’s been scarcely more than linear progress in five decades of working toward strong AI. For example, the different flavors of intelligent personal assistants available on your smartphone are only modestly better than Eliza, an early example of primitive natural-language processing from the mid-1960s. We still have no machine that can, for instance, read all that the Web has to say about war and plot a decent campaign, nor do we even have an open-ended AI system that can figure out how to write an essay to pass a freshman composition class or an eighth-grade science exam. Why so little progress, despite the spectacular increases in memory and CPU power? When Marvin Minsky and Gerald Sussman attempted the construction of a visual system in 1966, did they envision superclusters or gigabytes that would sit in your pocket?


pages: 492 words: 118,882

The Blockchain Alternative: Rethinking Macroeconomic Policy and Economic Theory by Kariappa Bheemaiah

accounting loophole / creative accounting, Ada Lovelace, Airbnb, algorithmic trading, asset allocation, autonomous vehicles, balance sheet recession, bank run, banks create money, Basel III, basic income, Ben Bernanke: helicopter money, bitcoin, blockchain, Bretton Woods, business cycle, business process, call centre, capital controls, Capital in the Twenty-First Century by Thomas Piketty, cashless society, cellular automata, central bank independence, Claude Shannon: information theory, cloud computing, cognitive dissonance, collateralized debt obligation, commoditize, complexity theory, constrained optimization, corporate governance, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, cryptocurrency, David Graeber, deskilling, Diane Coyle, discrete time, disruptive innovation, distributed ledger, diversification, double entry bookkeeping, Ethereum, ethereum blockchain, fiat currency, financial innovation, financial intermediation, Flash crash, floating exchange rates, Fractional reserve banking, full employment, George Akerlof, illegal immigration, income inequality, income per capita, inflation targeting, information asymmetry, interest rate derivative, inventory management, invisible hand, John Maynard Keynes: technological unemployment, John von Neumann, joint-stock company, Joseph Schumpeter, Kenneth Arrow, Kenneth Rogoff, Kevin Kelly, knowledge economy, large denomination, liquidity trap, London Whale, low skilled workers, M-Pesa, Marc Andreessen, market bubble, market fundamentalism, Mexican peso crisis / tequila crisis, MITM: man-in-the-middle, money market fund, money: store of value / unit of account / medium of exchange, mortgage debt, natural language processing, Network effects, new economy, Nikolai Kondratiev, offshore financial centre, packet switching, Pareto efficiency, pattern recognition, peer-to-peer lending, Ponzi scheme, precariat, pre–internet, price mechanism, 
price stability, private sector deleveraging, profit maximization, QR code, quantitative easing, quantitative trading / quantitative finance, Ray Kurzweil, Real Time Gross Settlement, rent control, rent-seeking, Satoshi Nakamoto, Satyajit Das, savings glut, seigniorage, Silicon Valley, Skype, smart contracts, software as a service, software is eating the world, speech recognition, statistical model, Stephen Hawking, supply-chain management, technology bubble, The Chicago School, The Future of Employment, The Great Moderation, the market place, The Nature of the Firm, the payments system, the scientific method, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, too big to fail, trade liberalization, transaction costs, Turing machine, Turing test, universal basic income, Von Neumann architecture, Washington Consensus

Advantages: easier and faster access to funds, less red tape, transparency, reputation awareness, and appropriate matching of risk based on client segment diversity. Risks: reputational risks (right to be forgotten), unestablished standards, regulation, and data privacy. 3. Investment Management Stance: Customer-facing Main technologies: Big Data, Machine Learning, Trading Algorithms, Social Media, Robo-Advisory, AI, Natural Language Processing (NLP), Cloud Computing. One of the most adverse outcomes of the crisis was its impact on wealth management: banks suffered a loss of trust, while potential clients now required higher amounts of capital in order to invest. As wages stagnated and employment slowed, it became increasingly difficult for new investors to invest smaller sums of money. Since 2008, a growing number of automated wealth management services (also known as robo-advisory) have arisen to provide low-cost, erudite alternatives to traditional wealth management.

So let’s look at one particular entity that is connected to all of these keywords and see how recent developments of this singular entity are linked to all the jargon being flung about today. The entity we will choose is Chatbots. A Chatbot is essentially a service, powered by rules and artificial intelligence (AI), that a user can interact with via a chat interface. The service could be anything ranging from functional to fun, and it could exist in any chat product (Facebook Messenger, Slack, Telegram, text messages, etc.). Recent advancements in Natural Language Processing (NLP) and Automatic Speech Recognition (ASR), coupled with crowdsourced data inputs and machine learning techniques, now allow AIs to not just understand groups of words but also submit a corresponding natural response to a grouping of words. That’s essentially the base definition of a conversation, except this conversation is with a “bot.” Does this mean that we’ll soon have technology that can pass the Turing test?
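The rule-driven half of the service described above can be sketched as keyword-matched intents. This is the simplest possible stand-in, with invented intents and canned responses; the chatbots the passage discusses pair statistical NLP and ASR models with large crowdsourced training sets rather than hand-written keyword lists:

```python
import re

# Invented example intents and responses for illustration only.
INTENTS = {
    "balance": ("balance", "account", "funds"),
    "hours":   ("hours", "open", "close"),
}

RESPONSES = {
    "balance": "Your balance is shown in the app under Accounts.",
    "hours":   "We are open 9am-5pm, Monday to Friday.",
    None:      "Sorry, I didn't understand. Could you rephrase?",
}

def classify(utterance):
    """Map an utterance to the first intent whose keywords appear in it."""
    words = re.findall(r"\w+", utterance.lower())
    for intent, keywords in INTENTS.items():
        if any(k in words for k in keywords):
            return intent
    return None

def reply(utterance):
    """Return the canned response for the classified intent."""
    return RESPONSES[classify(utterance)]

print(reply("What are your opening hours?"))
```

Swapping the keyword classifier for a learned model is exactly the advance the paragraph is pointing at: the conversation loop stays the same, but the understanding step stops being hand-coded.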

See also Debt and money capitalism, 22 cash obsession, 2 CRS report, 2 currencies, 3 floating exchange, 3 functions, 3 gold and silver, 3 history of money, 3 histroy, 2 real commodities, 3 transfer of, 4 types of, 3 withdrawn, 4 shadowbanking (see (Shadow banking and systemic risk)) utilitarian approach, 1 Multiple currencies, 130 Bitcoin Obituaries, 134 bitcoin price, 132 BTC/USD and USD/EUR volatility, 131 contractual money, 132 cryptocurrencies, 133 differences, 131 free banking, 135 Gresham’s law, 133 legal definition, 132 legal status, 132 private and government fiat, 134 private money, 130 quantitative model, 133 sovereign cash, 134 volatility, 131 „„         N Namecoin blockchain, 77 Namibia, 147 Natural Language Processing (NLP), 140 NemID, 79 Neo-Keynesian models, 169 Neuroplasticity, 220–221 New Keynesian models (NK models), 169 ■ INDEX „„         O Occupational Information Network (ONET), 89 Office of Scientific Research and Development (OSRD), 218 OpenID protocol, 76 Originate, repackage and sell model, 29 Originate-to-distribute model, 29 „„         P Paine, Thomas, 144 Palley, Thomas I., 28 Payment protection insurance (PPI), 32 Peer-to-peer (P2P), 46 Personal identification number (PIN), 79 Polycoin, 70 Popperian falsifiability, 163 Public Company Accounting Oversight Board (PCAOB), 153 Public-key certificate (PKC), 76 Public-key infrastructure (PKI), 76 „„         Q Quantitative easing (QE), 138 Quantitative model, 133 „„         R R3 CORDA™, 103 Rational expectations, 161–163 Rational expectations structural models, 221 Rational expectations theory (RET), 156 Rational expectations theory (RMT), 21 RBCmodels.


pages: 472 words: 117,093

Machine, Platform, Crowd: Harnessing Our Digital Future by Andrew McAfee, Erik Brynjolfsson

"Robert Solow", 3D printing, additive manufacturing, AI winter, Airbnb, airline deregulation, airport security, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, artificial general intelligence, augmented reality, autonomous vehicles, backtesting, barriers to entry, bitcoin, blockchain, British Empire, business cycle, business process, carbon footprint, Cass Sunstein, centralized clearinghouse, Chris Urmson, cloud computing, cognitive bias, commoditize, complexity theory, computer age, creative destruction, crony capitalism, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, Dean Kamen, discovery of DNA, disintermediation, disruptive innovation, distributed ledger, double helix, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Ethereum, ethereum blockchain, everywhere but in the productivity statistics, family office, fiat currency, financial innovation, George Akerlof, global supply chain, Hernando de Soto, hive mind, information asymmetry, Internet of things, inventory management, iterative process, Jean Tirole, Jeff Bezos, jimmy wales, John Markoff, joint-stock company, Joseph Schumpeter, Kickstarter, law of one price, longitudinal study, Lyft, Machine translation of "The spirit is willing, but the flesh is weak." 
to Russian and back, Marc Andreessen, Mark Zuckerberg, meta analysis, meta-analysis, Mitch Kapor, moral hazard, multi-sided market, Myron Scholes, natural language processing, Network effects, new economy, Norbert Wiener, Oculus Rift, PageRank, pattern recognition, peer-to-peer lending, performance metric, plutocrats, Plutocrats, precision agriculture, prediction markets, pre–internet, price stability, principal–agent problem, Ray Kurzweil, Renaissance Technologies, Richard Stallman, ride hailing / ride sharing, risk tolerance, Ronald Coase, Satoshi Nakamoto, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Skype, slashdot, smart contracts, Snapchat, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Pinker, supply-chain management, TaskRabbit, Ted Nelson, The Market for Lemons, The Nature of the Firm, Thomas Davenport, Thomas L Friedman, too big to fail, transaction costs, transportation-network company, traveling salesman, Travis Kalanick, two-sided market, Uber and Lyft, Uber for X, uber lyft, ubercab, Watson beat the top human players on Jeopardy!, winner-take-all economy, yield management, zero day

Much of the work of customer service, for example, consists of listening to people to understand what they want, then providing an answer or service to them. Modern technologies can take over the latter of these activities once they learn the rules of an interaction. But the hardest part of customer service to automate has not been finding an answer, but rather the initial step: listening and understanding. Speech recognition and other aspects of natural language processing have been tremendously difficult problems in artificial intelligence since the dawn of the field, for all of the reasons described earlier in this chapter. The previously dominant symbolic approaches have not worked well at all, but newer ones based on deep learning are making progress so quickly that it has surprised even the experts. In October of 2016, a team from Microsoft Research announced that a neural network they had built had achieved “human parity in conversational speech recognition,” as the title of their paper put it.

depth=1&hl=en&prev=search&rurl=translate.google.com&sl=ja&sp=nmt4&u=http://www.fukoku-life.co.jp/about/news/download/20161226.pdf. 84 In October of 2016: Allison Linn, “Historic Achievement: Microsoft Researchers Reach Human Parity in Conversational Speech Recognition,” Microsoft (blog), October 18, 2016, http://blogs.microsoft.com/next/2016/10/18/historic-achievement-microsoft-researchers-reach-human-parity-conversational-speech-recognition/#sm.0001d0t49dx0veqdsh21cccecz0e3. 84 “I must confess that I never thought”: Mark Liberman, “Human Parity in Conversational Speech Recognition,” Language Log (blog), October 18, 2016, http://languagelog.ldc.upenn.edu/nll/?p=28894. 84 “Every time I fire a linguist”: Julia Hirschberg, “ ‘Every Time I Fire a Linguist, My Performance Goes Up,’ and Other Myths of the Statistical Natural Language Processing Revolution” (speech, 15th National Conference on Artificial Intelligence, Madison, WI, July 29, 1998). 84 “AI-first world”: Julie Bort, “Salesforce CEO Marc Benioff Just Made a Bold Prediction about the Future of Tech,” Business Insider, May 18, 2016, http://www.businessinsider.com/salesforce-ceo-i-see-an-ai-first-world-2016-5. 85 “Many businesses still make important decisions”: Marc Benioff, “On the Cusp of an AI Revolution,” Project Syndicate, September 13, 2016, https://www.project-syndicate.org/commentary/artificial-intelligence-revolution-by-marc-benioff-2016-09.

Bertram’s Mind, The” (AI-generated prose), 121 MySpace, 170–71 Naam, Ramez, 258n Nakamoto, Satoshi, 279–85, 287, 296–97, 306, 312 Nakamoto Institute, 304 Nappez, Francis, 190 Napster, 144–45 NASA, 15 Nasdaq, 290–91 National Association of Realtors, 39 National Enquirer, 132 National Institutes of Health, 253 National Library of Australia, 274 Naturalis Historia (Pliny the Elder), 246 natural language processing, 83–84 “Nature of the Firm, The” (Coase), 309–10 Navy, US, 72 negative prices, 216 Nelson, Ted, 33 Nelson, Theodore, 229 Nesbitt, Richard, 45 Netflix, 187 Netscape Navigator, 34 network effects, 140–42 defined, 140 diffusion of platforms and, 205–6 O2O platforms and, 193 size of network and, 217 Stripe and, 174 Uber’s market value and, 219 networks, Cambrian Explosion and, 96 neural networks, 73–74, 78 neurons, 72–73 Newell, Allen, 69 Newmark, Craig, 138 New Republic, 133 news aggregators, 139–40 News Corp, 170, 171 newspapers ad revenue, 130, 132, 139 publishing articles directly on Facebook, 165 Newsweek, 133 New York City Postmates in, 185 taxi medallion prices before and after Uber, 201 UberPool in, 9 New York Times, 73, 130, 152 Ng, Andrew, 75, 96, 121, 186 Nielsen BookScan, 293, 294 99Degrees Custom, 333–34 99designs, 261 Nixon, Richard, 280n Nokia, 167–68, 203 noncredentialism, 241–42 Norman, Robert, 273–74 nugget ice, 11–14 Nuomi, 192 Nupedia, 246–48 Obama, Barack, election of 2012, 48–51 occupancy rates, 221–22 oDesk, 188 Office of Personnel Management, US, 32 oil rigs, 100 on-demand economy, future of companies in, 320 online discussion groups, 229–30 online payment services, 171–74 online reviews, 208–10 O2O (online to offline) platforms, 185–98 business-to-business, 188–90 consumer-oriented, 186–88 defined, 186 as engines of liquidity, 192–96 globalization of, 190–92 interdisciplinary insights from data compiled by, 194 for leveraging assets, 196–97 and machine learning, 194 Opal (ice maker), 13–14 Open Agriculture Initiative, 272 openness (crowd 
collaboration principle), 241 open platforms curation and, 165 downsides, 164 importance of, 163–65 as key to success, 169 open-source software; See also Linux Android as, 166–67 development by crowd, 240–45 operating systems, crowd-developed, 240–45 Oracle, 204 O’Reilly, Tim, 242 organizational dysfunction, 257 Oruna, 291 Osindero, Simon, 76 Osterman, Paul, 322 Ostrom, Elinor, 313 outcomes, clear (crowd collaboration principle), 243 outsiders in automated investing, 270 experts vs., 252–75 overall evaluation criterion, 51 Overstock.com, 290 Owen, Ivan, 273, 274 Owen, Jennifer, 274n ownership, contracts and, 314–15 Page, Larry, 233 PageRank, 233 Pahlka, Jennifer, 163 Painting Fool, The, 117 Papa John’s Pizza, 286 Papert, Seymour, 73 “Paperwork Mine,” 32 Paris, France, terrorist attack (2015), 55 Parker, Geoffrey, 148 parole, 39–40 Parse.ly, 10 Paulos, John Allen, 233 payments platforms, 171–74 peer reviews, 208–10 peer-to-peer lending, 263 peer-to-peer platforms, 144–45, 298 Peloton, 177n Penthouse magazine, 132 People Express, 181n, 182 Perceptron, 72–74 Perceptrons: An Introduction to Computational Geometry (Minsky and Papert), 73 perishing/perishable inventory and O2O platforms, 186 and revenue management, 181–84 risks in managing, 180–81 personal drones, 98 perspectives, differing, 258–59 persuasion, 322 per-transaction fees, 172–73 Pew Research Center, 18 p53 protein, 116–17 photography, 131 physical environments, experimentation in development of, 62–63 Pindyck, Robert, 196n Pinker, Steven, 68n piracy, of recorded music, 144–45 Plaice, Sean, 184 plastics, transition from molds to 3D printing, 104–7 Platform Revolution (Parker, Van Alstyne, and Choudary), 148 platforms; See also specific platforms business advantages of, 205–11 characteristics of successful, 168–74 competition between, 166–68 and complements, 151–68 connecting online and offline experience, 177–98; See also O2O (online to offline) platforms consumer loyalty and, 210–11 defined, 14, 137 
diffusion of, 205 economics of “free, perfect, instant” information goods, 135–37 effect on incumbents, 137–48, 200–204 elasticity of demand, 216–18 future of companies based on, 319–20 importance of being open, 163–65; See also open platforms and information asymmetries, 206–10 limits to disruption of incumbents, 221–24 multisided markets, 217–18 music industry disruption, 143–48 network effect, 140–42 for nondigital goods/services, 178–85; See also O2O (online to offline) platforms and perishing inventory, 180–81 preference for lower prices by, 211–21 pricing elasticities, 212–13 product as counterpart to, 15 and product maker prices, 220–21 proliferation of, 142–48 replacement of assets with, 6–10 for revenue management, 181–84 supply/demand curves and, 153–57 and unbundling, 145–48 user experience as strategic element, 169–74 Playboy magazine, 133 Pliny the Elder, 246 Polanyi, Michael, 3 Polanyi’s Paradox and AlphaGo, 4 defined, 3 and difficulty of comparing human judgment to mathematical models, 42 and failure of symbolic machine learning, 71–72 and machine language, 82 and problems with centrally planned economies, 236 and System 1/System 2 relationship, 45 Postmates, 173, 184–85, 205 Postmates Plus Unlimited, 185 Postrel, Virginia, 90 Pratt, Gil, 94–95, 97, 103–4 prediction data-driven, 59–60 experimentation and, 61–63 statistical vs. clinical, 41 “superforecasters” and, 60–61 prediction markets, 237–39 premium brands, 210–11 presidential elections, 48–51 Priceline, 61–62, 223–24 price/pricing data-driven, 47; See also revenue management demand curves and, 154 elasticities, 212–13 loss of traditional companies’ power over, 210–11 in market economies, 237 and prediction markets, 238–39 product makers and platform prices, 220 supply curves and, 154–56 in two-sided networks, 213–16 Principia Mathematica (Whitehead and Russell), 69 print media, ad revenue and, 130, 132, 139 production costs, markets vs. 
companies, 313–14 productivity, 16 products as counterpart to platforms, 15 loss of profits to platform providers, 202–4 pairing free apps with, 163 platforms’ effect on, 200–225 threats from platform prices, 220–21 profitability Apple, 204 excessive use of revenue management and, 184 programming, origins of, 66–67 Project Dreamcatcher, 114 Project Xanadu, 33 proof of work, 282, 284, 286–87 prose, AI-generated, 121 Proserpio, Davide, 223 Prosper, 263 protein p53, 116–17 public service, 162–63 Pullman, David, 131 Pullum, Geoffrey, 84 quantitative investing firms (quants), 266–70 Quantopian, 267–70 Quinn, Kevin, 40–41 race cars, automated design for, 114–16 racism, 40, 51–52, 209–10 radio stations as complements to recorded music, 148 in late 1990s, 130 revenue declines (2000–2010), 135 Ramos, Ismael, 12 Raspbian, 244 rationalization, 45 Raymond, Eric, 259 real-options pricing, 196 reasoning, See System 1/System 2 reasoning rebundling, 146–47 recommendations, e-commerce, 47 recorded music industry in late 1990s, 130–31 declining sales (1999-2015), 134, 143 disruption by platforms, 143–48 Recording Industry Association of America (RIAA), 144 redlining, 46–47 Redmond, Michael, 2 reengineering, business process, 32–35 Reengineering the Corporation (Hammer and Champy), 32, 34–35, 37 regulation financial services, 202 Uber, 201–2, 208 Reichman, Shachar, 39 reinforcement learning, 77, 80 Renaissance Technologies, 266, 267 Rent the Runway, 186–88 Replicator 2 (3D printer), 273 reputational systems, 209–10 research and development (R&D), crowd-assisted, 11 Research in Motion (RIM), 168 residual rights of control, 315–18 “Resolution of the Bitcoin Experiment, The” (Hearn), 306 resource utilization rate, 196–97 restaurants, robotics in, 87–89, 93–94 retail; See also e-commerce MUEs and, 62–63 Stripe and, 171–74 retail warehouses, robotics in, 102–3 Rethinking the MBA: Business Education at a Crossroads (Datar, Garvin, and Cullen), 37 revenue, defined, 212 revenue management 
defined, 47 downsides of, 184–85 O2O platforms and, 193 platforms for, 181–84 platform user experience and, 211 problems with, 183–84 Rent the Runway and, 187 revenue-maximizing price, 212–13 revenue opportunities, as benefit of open platforms, 164 revenue sharing, Spotify, 147 reviews, online, 208–10 Ricardo, David, 279 ride services, See BlaBlaCar; Lyft; Uber ride-sharing, 196–97, 201 Rio Tinto, 100 Robohand, 274 robotics, 87–108 conditions for rapid expansion of, 94–98 DANCE elements, 95–98 for dull, dirty, dangerous, dear work, 99–101 future developments, 104–7 humans and, 101–4 in restaurant industry, 87–89 3D printing, 105–7 Rocky Mountain News, 132 Romney, Mitt, 48, 49 Roosevelt, Teddy, 23 Rosenblatt, Frank, 72, 73 Rovio, 159n Roy, Deb, 122 Rubin, Andy, 166 Ruger, Ted, 40–41 rule-based artificial intelligence, 69–72, 81, 84 Russell, Bertrand, 69 Sagalyn, Raphael, 293n Saloner, Garth, 141n Samsung and Android, 166 and Linux, 241, 244 sales and earnings deterioration, 203–4 San Francisco, California Airbnb in, 9 Craigslist in, 138 Eatsa in, 87 Napster case, 144 Postmates in, 185 Uber in, 201 Sanger, Larry, 246–48 Sato, Kaz, 80 Satoshi Nakamoto Institute, 304 scaling, cloud and, 195–96 Schiller, Phil, 152 Schumpeter, Joseph, 129, 264, 279, 330 Scott, Brian, 101–2 second machine age origins of, 16 phase one, 16 phase two, 17–18 secular trends, 93 security lanes, automated, 89 Sedol, Lee, 5–6 self-checkout kiosks, 90 self-driving automobiles, 17, 81–82 self-justification, 45 self-organization, 244 self-selection, 91–92 self-service, at McDonald’s, 92 self-teaching machines, 17 Seychelles Trading Company, 291 Shanghai Tower, 118 Shapiro, Carl, 141n Shaw, David, 266 Shaw, J.


pages: 410 words: 119,823

Radical Technologies: The Design of Everyday Life by Adam Greenfield

3D printing, Airbnb, augmented reality, autonomous vehicles, bank run, barriers to entry, basic income, bitcoin, blockchain, business intelligence, business process, call centre, cellular automata, centralized clearinghouse, centre right, Chuck Templeton: OpenTable:, cloud computing, collective bargaining, combinatorial explosion, Computer Numeric Control, computer vision, Conway's Game of Life, cryptocurrency, David Graeber, dematerialisation, digital map, disruptive innovation, distributed ledger, drone strike, Elon Musk, Ethereum, ethereum blockchain, facts on the ground, fiat currency, global supply chain, global village, Google Glasses, IBM and the Holocaust, industrial robot, informal economy, information retrieval, Internet of things, James Watt: steam engine, Jane Jacobs, Jeff Bezos, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, joint-stock company, Kevin Kelly, Kickstarter, late capitalism, license plate recognition, lifelogging, M-Pesa, Mark Zuckerberg, means of production, megacity, megastructure, minimum viable product, money: store of value / unit of account / medium of exchange, natural language processing, Network effects, New Urbanism, Occupy movement, Oculus Rift, Pareto efficiency, pattern recognition, Pearl River Delta, performance metric, Peter Eisenman, Peter Thiel, planetary scale, Ponzi scheme, post scarcity, post-work, RAND corporation, recommendation engine, RFID, rolodex, Satoshi Nakamoto, self-driving car, sentiment analysis, shareholder value, sharing economy, Silicon Valley, smart cities, smart contracts, social intelligence, sorting algorithm, special economic zone, speech recognition, stakhanovite, statistical model, stem cell, technoutopianism, Tesla Model S, the built environment, The Death and Life of Great American Cities, The Future of Employment, transaction costs, Uber for X, undersea cable, 
universal basic income, urban planning, urban sprawl, Whole Earth Review, WikiLeaks, women in the workforce

At retail, “seamless” point-of-sale processes and the displacement of responsibility onto shoppers themselves via self-checkout slash the number of personnel it takes to run a storefront operation, though some staff will always be required to smooth out the inevitable fiascos; perhaps a few high-end boutiques performatively, conspicuously retain a significant floor presence. In customer service, appalling “cognitive agents” take the place of front-line staff.44 Equipped with speech recognition and natural-language processing capabilities, with synthetic virtual faces that unhesitatingly fold in every last kind of problematic assumption about gender and ethnicity, they’re so cheap that it’s hard to imagine demanding, hard-to-train human staff holding out against them for very long. Even jobs in so-called high-touch fields like childcare and home-health assistance, jobs that might be done and done well by people with no other qualification, face the prospect of elimination.

And this is true on many fronts. A test for machinic intelligence called the Winograd Schema, for example, asks candidate systems to resolve the problems of pronoun disambiguation that crop up constantly in everyday speech.11 Sentences of this type (“I plugged my phone into the wall because it needed to be recharged”) yield to common sense more or less immediately, but still tax the competence of the most advanced natural-language processing systems. Similarly, for all the swagger of their parent company, Uber’s nominally autonomous vehicles seem unable to cope with even so simple an element of the urban environment as a bike lane, swerving in front of cyclists on multiple occasions during the few days they were permitted to operate in San Francisco.12 In the light of results like this, fears that algorithmic systems might take over much of anything at all can easily seem wildly overblown.

Some reflexes are apparently immune to mockery. 19. An image of the brochure can be found at i1.wp.com/bobsullivan.net/wp-content/uploads/2014/09/incident-prevention-close-up-tight.png 20. Richard Kelley et al., “Context-Based Bayesian Intent Recognition,” IEEE Transactions on Autonomous Mental Development, Volume 4, Number 3, September 2012. 21. Richard Socher et al., “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank,” Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Stroudsburg, PA, October 2013, pp. 1631–42. 22. Bob Sullivan, “Police Sold on Snaptrends, Software That Claims to Stop Crime Before It Starts,” bobsullivan.net, September 4, 2014. 23. Ibid. 24. Leo Mirani, “Millions of Facebook Users Have No Idea They’re Using the Internet,” Quartz, February 9, 2015. 25. Ellen Huet, “Server and Protect: Predictive Policing Firm PredPol Promises to Map Crime Before It Happens,” Forbes, February 11, 2015. 26. Ibid. 27. Robert L.


pages: 265 words: 74,000

The Numerati by Stephen Baker

Berlin Wall, Black Swan, business process, call centre, correlation does not imply causation, Drosophila, full employment, illegal immigration, index card, Isaac Newton, job automation, job satisfaction, McMansion, Myron Scholes, natural language processing, PageRank, personalized medicine, recommendation engine, RFID, Silicon Valley, Skype, statistical model, Watson beat the top human players on Jeopardy!

I could come to lots of other conclusions about her passions, her love interests, and even what she likes to eat. This is all clear to me. But she's writing in my language. Practically every word makes sense. The bad news, from a data-mining perspective, is that it takes me a scandalous five minutes to read through her text. In that time, Umbria's computers work through 35,300 blog posts. This magic takes place within two domains of artificial intelligence: natural language processing and machine learning. The idea is simple enough. The machines churn through the words, using their statistical genius and formidable memory to make sense of them. To say that they "understand" the words is a stretch. It's like saying that a blind bat, which navigates by processing the geometry of sound waves, "sees" the open window it flies through. But no matter. If computers can draw correct conclusions from the words they plow through, they pass the language test.

See Social networks Names finding people by, [>], [>], [>], [>]–[>], [>] on phone prompts, [>] protection of, in data mining, [>] NASA, [>]–[>] National Cryptologic Museum, [>], [>], [>]–[>] National Science Foundation, [>] National Security Agency (NSA) data mining by, [>], [>]–[>] mathematicians working for, [>], [>], [>]–[>], [>]–[>], [>] social network interpretation by, [>], [>]–[>] Natural language processing, [>]–[>] "Negotiators" (personality type), [>]–[>], [>] Netflix, [>], [>], [>] "Neural network" programs, [>]–[>] Newton, Isaac, [>] New York Times, [>] Next Friend Analysis, [>]–[>], [>] Nicaragua, [>] Nicolov, Nicolas, [>]–[>], [>], [>]–[>] Nielsen BuzzMetrics (company), [>], [>] 9/11 terrorist attack, [>], [>]–[>], [>]–[>], [>], [>], [>] "Nodes" (in social networks), [>] "Noise," [>] No Place to Hide (O'Harrow), [>] NORA software, [>]–[>], [>] Norman (fistulated cow), [>]–[>], [>], [>] NSA.


pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang

AI winter, artificial general intelligence, bioinformatics, brain emulation, combinatorial explosion, complexity theory, computer vision, conceptual framework, correlation coefficient, epigenetics, friendly AI, G4S, information retrieval, Isaac Newton, John Conway, Loebner Prize, Menlo Park, natural language processing, Occam's razor, p-value, pattern recognition, performance metric, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K

These terms have a many-to-many mapping to the terms directly used in the system’s “native language”, Narsese, and this mapping corresponds to a symbolize relation in Narsese. The truth-value of a symbolizing statement indicates the frequency and confidence with which the word/phrase/sentence (in the natural language) is used as the symbol of the term (in Narsese), according to the experience of the system. In the language-understanding process, NARS will not have separate parsing and semantic-mapping phases, as many other natural language processing systems do. Instead, for an input sentence, the recognition of its syntactic structure and the recognition of its semantic structure will be carried out hand in hand. The process will start by checking whether the sentence can be understood as a whole, as is the case with proverbs and idioms. If unsuccessful, the sentence will be divided recursively into phrases and words, whose sequential relations will be tentatively mapped into the structures of compound terms, with components corresponding to the individual phrases and words.
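The whole-sentence-first strategy described here can be caricatured in a few lines of Python. This is only an illustrative sketch, not NARS code: the `IDIOMS` table, the `understand` function, and the naive midpoint split are hypothetical stand-ins for the system's experience-derived symbolize relations and its tentative phrase divisions.

```python
# Illustrative sketch (not actual NARS code) of the strategy described
# above: try to understand the sentence as a whole first, and only then
# divide it recursively into phrases and words.
IDIOMS = {
    "break a leg": ("good-luck-wish",),  # hypothetical whole-sentence symbol
}

def understand(sentence: str):
    """Map a sentence to a tuple of term symbols.

    First try to interpret the sentence as a whole (proverbs, idioms);
    if that fails, split it and recursively interpret the parts, so the
    syntactic split and the semantic mapping happen hand in hand.
    """
    text = sentence.strip().lower()
    if text in IDIOMS:                    # understood as a whole
        return IDIOMS[text]
    words = text.split()
    if len(words) == 1:                   # a single word symbolizes one term
        return (text,)
    mid = len(words) // 2                 # naive tentative phrase division
    left = understand(" ".join(words[:mid]))
    right = understand(" ".join(words[mid:]))
    return left + right                   # compound term from components

print(understand("break a leg"))       # matched as a whole: ('good-luck-wish',)
print(understand("cats chase mice"))   # decomposed: ('cats', 'chase', 'mice')
```

In the real system the division would be guided by learned sequential relations rather than a midpoint split, but the control flow (whole first, then recursive decomposition) is the same.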

However, due to inevitable differences in experience, the system will not always be able to use a natural language as a native speaker does. Even so, its proficiency in that language should be sufficient for many practical purposes. Being able to use any natural language is not a necessary condition for being intelligent. Since the aim of NARS is not to accurately duplicate human behaviors so as to pass the Turing Test [5], natural language processing is optional for the system. 3.3 Education NARS processes tasks using available knowledge, though the system is not designed with a ready-made knowledge base as a necessary part. Instead, all the knowledge, in principle, should come from the system’s experience. In other words, NARS as designed is like a baby that has great potential, but little instinct. P. Wang / From NARS to a Thinking Machine 85 For the system to serve any practical purpose, extensive education, or training, is needed, which means building a proper internal knowledge base (or call it a belief network, long-term memory, etc.) by feeding the system with certain (initial) experience.

To gracefully incorporate heuristics not explicitly based on probability theory, in cases where probability theory, at its current state of development, does not provide adequate pragmatic solutions. To provide “scalable” reasoning, in the sense of being able to carry out inferences involving at least billions of premises. Of course, when the number of premises is fewer, more intensive and accurate reasoning may be carried out. To easily accept input from, and send output to, natural language processing software systems. PLN implements a wide array of first-order and higher-order inference rules including (but not limited to) deduction, Bayes’ Rule, unification, intensional and extensional inference, belief revision, induction, and abduction. Each rule comes with uncertain truth-value formulas, calculating the truth-value of the conclusion from the truth-values of the premises. Inference is controlled by highly flexible forward and backward chaining processes able to take feedback from external processes and thus behave adaptively.
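To give a feel for such truth-value formulas, here is a sketch of the strength part of a deduction rule under a simple independence assumption. This is illustrative only: the actual PLN formulas also propagate confidence alongside strength, and the function name is my own.

```python
def deduction_strength(s_ab, s_bc, s_b, s_c):
    """Strength of A->C inferred from A->B and B->C.

    Uses an independence assumption: for the As that are Bs, follow B->C;
    for the rest, fall back on how common C is among non-Bs. s_b and s_c
    are the base-rate strengths of B and C. Illustrative sketch only; real
    PLN truth values carry confidence as well as strength.
    """
    if s_b >= 1.0:            # degenerate case: everything is a B
        return s_c
    return s_ab * s_bc + (1.0 - s_ab) * (s_c - s_b * s_bc) / (1.0 - s_b)

# If 80% of As are Bs, 90% of Bs are Cs, and the base rates are
# P(B) = 0.3 and P(C) = 0.4, the inferred strength of A->C is about 0.757.
print(round(deduction_strength(0.8, 0.9, 0.3, 0.4), 3))
```

The point of the exercise is that each inference rule is paired with an explicit numeric recipe like this, so uncertainty flows through a long chain of deductions rather than being discarded.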


pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts by Richard Susskind, Daniel Susskind

23andMe, 3D printing, additive manufacturing, AI winter, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, Andrew Keen, Atul Gawande, Automated Insights, autonomous vehicles, Big bang: deregulation of the City of London, big data - Walmart - Pop Tarts, Bill Joy: nanobots, business process, business process outsourcing, Cass Sunstein, Checklist Manifesto, Clapham omnibus, Clayton Christensen, clean water, cloud computing, commoditize, computer age, Computer Numeric Control, computer vision, conceptual framework, corporate governance, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, death of newspapers, disintermediation, Douglas Hofstadter, en.wikipedia.org, Erik Brynjolfsson, Filter Bubble, full employment, future of work, Google Glasses, Google X / Alphabet X, Hacker Ethic, industrial robot, informal economy, information retrieval, interchangeable parts, Internet of things, Isaac Newton, James Hargreaves, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Khan Academy, knowledge economy, lifelogging, lump of labour, Marshall McLuhan, Metcalfe’s law, Narrative Science, natural language processing, Network effects, optical character recognition, Paul Samuelson, personalized medicine, pre–internet, Ray Kurzweil, Richard Feynman, Second Machine Age, self-driving car, semantic web, Shoshana Zuboff, Skype, social web, speech recognition, spinning jenny, strong AI, supply-chain management, telepresence, The Future of Employment, the market place, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, transaction costs, Turing test, Watson beat the top human players on Jeopardy!, WikiLeaks, young professional

This is a computer system, effectively, answering questions on any topic under the sun, and doing so more accurately and quickly than the best human beings at this task. It is hard to overstate how impressive this is. For us, it represents the coming of the second wave of AI (section 4.9). Here is a system that undoubtedly performs tasks that we would normally think require human intelligence. The version of Watson that competed on Jeopardy! holds over 200 million pages of documents and implements a wide range of AI tools and techniques, including natural language processing, machine learning, speech synthesis, game-playing, information retrieval, intelligent search, knowledge processing and reasoning, and much more. This type of AI, we stress again, is radically different from the first wave of rule-based expert systems of the 1980s (see section 4.9). It is interesting to note, harking back again to the exponential growth of information technology, that the hardware on which Watson ran in 2011 was said to be about the size of the average bedroom.

This was an exciting time for AI, the heyday of what has since been called the era of GOFAI (good old-fashioned AI). The term ‘artificial intelligence’ was coined by John McCarthy in 1955, and in the thirty years or so that followed a wide range of systems, techniques, and technologies were brought under its umbrella (the terms used in the mid-1980s are included in parentheses): the processing and translation of natural language (natural language processing); the recognition of the spoken word (speech recognition); the playing of complex games such as chess (game-playing); the recognition of images and objects of the physical world (vision and perception); learning from examples and precedents (machine learning); computer programs that can themselves generate programs (automatic programming); the sophisticated education of human users (intelligent computer-aided instruction); the design and development of machines whose physical movements resembled those of human beings (robotics), and intelligent problem-solving and reasoning (intelligent knowledge-based systems or expert systems).103 Our project at the University of Oxford (1983–6) focused on theoretical and philosophical aspects of this last category—expert systems—as applied in the law.

We can imagine a day when machines will not just make coffee, but will write wonderful poetry, compose splendid symphonies, paint stunning landscapes, sing beautifully, and even dance with remarkable grace. We are likely to judge these contributions in two ways. On the one hand, we might take a view on their relative merits as machine-generated achievement, marvelling perhaps at the underpinning natural language processing or robotics. Our interest will be in comparing like with like—machine performance with machine performance. On the other hand, we might compare their output with the creative expressions of human beings. It may well be that we will concede that, in terms of outcomes, the machine is superior. Yet this will be to contrast apples with pears, so that this comparison may turn out to be wrong-headed.


pages: 71 words: 14,237

21 Recipes for Mining Twitter by Matthew A. Russell

en.wikipedia.org, Google Earth, natural language processing, NP-complete, social web, web application

Only the data needs to be written to a %s placeholder in the template. See Also http://labs.mudynamics.com/wp-content/uploads/2009/04/icouch.html, http://help.com/post/383276-anyone-knows-the-formula-for-font-s 1.12 Summarizing Link Targets Problem You want to summarize the text of a web page that’s indicated by a short URL in a tweet. Solution Extract the text from the web page, and then use a natural language processing (NLP) toolkit such as the Natural Language Toolkit (NLTK) to help you extract the most important sentences to create a machine-generated abstract. Discussion Summarizing web pages is a very powerful capability, and this is especially the case in the context of a tweet where you have a lot of additional metadata (or “reactions”) about the page from one or more tweets. Summarizing web pages is a particularly hard and messy problem, but you can bootstrap a reasonable solution with less effort than you might think.
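The recipe itself relies on NLTK, but the underlying idea (score each sentence by the document-wide frequency of its terms, then keep the top scorers) can be sketched in plain Python. The stopword list, `summarize` function, and toy document below are illustrative, not the book's code.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real summarizer would use a
# fuller one (e.g. NLTK's stopword corpus).
STOPWORDS = {"the", "a", "an", "of", "to", "is", "in", "and", "it", "that"}

def summarize(text: str, n: int = 2):
    """Return the n highest-scoring sentences, in original order.

    Score each sentence by the average document-wide frequency of its
    non-stopword terms, the core idea behind frequency-based extractive
    summarization.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        terms = [w for w in re.findall(r"[a-z']+", sentence.lower())
                 if w not in STOPWORDS]
        return sum(freq[t] for t in terms) / (len(terms) or 1)

    top = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in top]   # keep document order

doc = ("Python is popular. Python powers many NLP toolkits. "
       "Weather was nice today. NLP toolkits make summarization easy.")
print(summarize(doc, 2))   # the two Python/NLP sentences score highest
```

Swapping the regex tokenizer for NLTK's `sent_tokenize`/`word_tokenize` and its stopword corpus gets you most of the way to the recipe's version.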


pages: 308 words: 84,713

The Glass Cage: Automation and Us by Nicholas Carr

Airbnb, Airbus A320, Andy Kessler, Atul Gawande, autonomous vehicles, Bernard Ziegler, business process, call centre, Captain Sullenberger Hudson, Charles Lindbergh, Checklist Manifesto, cloud computing, computerized trading, David Brooks, deliberate practice, deskilling, digital map, Douglas Engelbart, drone strike, Elon Musk, Erik Brynjolfsson, Flash crash, Frank Gehry, Frank Levy and Richard Murnane: The New Division of Labor, Frederick Winslow Taylor, future of work, global supply chain, Google Glasses, Google Hangouts, High speed trading, indoor plumbing, industrial robot, Internet of things, Jacquard loom, James Watt: steam engine, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Kevin Kelly, knowledge worker, Lyft, Marc Andreessen, Mark Zuckerberg, means of production, natural language processing, new economy, Nicholas Carr, Norbert Wiener, Oculus Rift, pattern recognition, Peter Thiel, place-making, plutocrats, Plutocrats, profit motive, Ralph Waldo Emerson, RAND corporation, randomized controlled trial, Ray Kurzweil, recommendation engine, robot derives from the Czech word robota Czech, meaning slave, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley ideology, software is eating the world, Stephen Hawking, Steve Jobs, TaskRabbit, technoutopianism, The Wealth of Nations by Adam Smith, turn-by-turn navigation, US Airways Flight 1549, Watson beat the top human players on Jeopardy!, William Langewiesche

When doctors make diagnoses, they draw on their knowledge of a large body of specialized information, learned through years of rigorous education and apprenticeship as well as the ongoing study of medical journals and other relevant literature. Until recently, it was difficult, if not impossible, for computers to replicate such deep, specialized, and often tacit knowledge. But inexorable advances in processing speed, precipitous declines in data-storage and networking costs, and breakthroughs in artificial-intelligence methods such as natural language processing and pattern recognition have changed the equation. Computers have become much more adept at reviewing and interpreting vast amounts of text and other information. By spotting correlations in the data—traits or phenomena that tend to be found together or to occur simultaneously or sequentially—computers are often able to make accurate predictions, calculating, say, the probability that a patient displaying a set of symptoms has or will develop a particular disease or the odds that a patient with a certain disease will respond well to a particular drug or other treatment regimen.
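As a toy illustration of the kind of probability calculation involved, Bayes' rule turns a disease's base rate and symptom likelihoods into a post-test probability. The numbers below are hypothetical, chosen only to show the mechanics (and the surprising pull of the base rate); they are not drawn from the book.

```python
def posterior(prior, p_symptom_given_disease, p_symptom_given_healthy):
    """Bayes' rule: probability of disease given an observed symptom.

    prior is the base rate of the disease; the other arguments are the
    symptom's frequency among diseased and healthy patients.
    """
    p_symptom = (p_symptom_given_disease * prior
                 + p_symptom_given_healthy * (1 - prior))
    return p_symptom_given_disease * prior / p_symptom

# Hypothetical numbers: a 1% base rate, a symptom seen in 90% of patients
# with the disease but only 5% of those without it. Even so, the post-test
# probability is only about 15%, because healthy patients vastly outnumber
# sick ones.
print(round(posterior(0.01, 0.90, 0.05), 3))   # ~0.154
```

Real diagnostic systems chain many such conditional probabilities over large data sets, but each step is this same correlation-to-probability move.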

., 48–51, 215, 216 computer as metaphor and model for, 119 drawing and, 143, 144 imaginative work of, 25 unconscious, 83–84 Mindell, David, 60, 61 Missionaries and Cannibals, 75, 180 miswanting, 15, 228 MIT, 174, 175 Mitchell, William J., 138 mobile phones, 132–33 Moore’s Law, 40 Morozov, Evgeny, 205, 225 Moser, Edvard, 134–35 Moser, May-Britt, 134 motivation, 14, 17, 124 “Mowing” (Frost), 211–16, 218, 221–22 Murnane, Richard, 9, 10 Musk, Elon, 8 Nadin, Mihai, 80 NASA, 50, 55, 58 National Safety Council, 208 National Transportation Safety Board (NTSB), 44 natural language processing, 113 nature, 217, 220 Nature, 155 Nature Neuroscience, 134–35 navigation systems, 59, 68–71, 217 see also GPS Navy, U.S., 189 Nazi Germany, 35, 157 nervous system, 9–10, 36, 220–21 Networks of Power (Hughes), 196 neural networks, 113–14 neural processing, 119n neuroergonomic systems, 165 neurological studies, 9 neuromorphic microchips, 114, 119n neurons, 57, 133–34, 150, 219 neuroscience, neuroscientists, 74, 133–37, 140, 149 New Division of Labor, The (Levy and Murnane), 9 Nimwegen, Christof van, 75–76, 180 Noble, David, 173–74 Norman, Donald, 161 Noyes, Jan, 54–55 NSA, 120, 198 numerical control, 174–75 Oakeshott, Michael, 124 Obama, Barack, 94 Observer, 78–79 Oculus Rift, 201 Office of the Inspector General, 99 offices, 28, 108–9, 112, 222 automation complacency and, 69 Ofri, Danielle, 102 O’Keefe, John, 133–34 Old Dominion University, 91 “On Things Relating to the Surgery” (Hippocrates), 158 oracle machine, 119–20 “Outsourced Brain, The” (Brooks), 128 Pallasmaa, Juhani, 145 Parameswaran, Ashwin, 115 Parameters, 191 parametric design, 140–41 parametricism, 140–41 “Parametricism Manifesto” (Schumacher), 141 Parasuraman, Raja, 54, 67, 71, 166, 176 Parry, William Edward, 125 pattern recognition, 57, 58, 81, 83, 113 Pavlov, Ivan, 88 Pebble, 201 Pediatrics, 97 perception, 8, 121, 130, 131, 132, 133, 144, 148–51, 201, 214–18, 220, 226, 230 performance, Yerkes-Dodson law and, 96 
Phenomenology of Perception (Merleau-Ponty), 216 philosophers, 119, 143, 144, 148–51, 186, 224 photography, film vs. digital, 230 Piano, Renzo, 138, 141–42 pilots, 1, 2, 32, 43–63, 91, 153 attentional tunneling and, 200–201 capability of the plane vs., 60–61, 154 death of, 53 erosion of expertise of, 54–58, 62–63 human- vs. technology-centered automation and, 168–70, 172–73 income of, 59–60 see also autopilot place, 131–34, 137, 251n place cells, 133–34, 136, 219 Plato, 148 Player Piano (Vonnegut), 39 poetry, 211–16, 218, 221–22 Poirier, Richard, 214, 215 Politics (Aristotle), 224 Popular Science, 48 Post, Wiley, 48, 50, 53, 57, 62, 82, 169 power, 21, 37, 65, 151, 175, 204, 217 practice, 82–83 Predator drone, 188 premature fixation, 145 presence, power of, 200 Priestley, Joseph, 160 Prius, 6, 13, 154–55 privacy, 206 probability, 113–24 procedural (tacit) knowledge, 9–11, 83, 105, 113, 144 productivity, 18, 22, 29, 30, 37, 106, 160, 173, 175, 181, 218 professional work, incursion of computers into, 115 profit motive, 17 profits, 18, 22, 28, 30, 33, 95, 159, 171, 172–73, 175 progress, 21, 26, 29, 37, 40, 65, 196, 214 acceleration of, 26 scientific, 31, 123 social, 159–60, 228 progress (continued) technological, 29, 31, 34, 35, 48–49, 108–9, 159, 160, 161, 173, 174, 222, 223–24, 226, 228, 230 utopian vision of, 25, 26 prosperity, 20, 21, 107 proximal cues, 219–20 psychologists, psychology, 9, 11, 15, 54, 103, 119, 149, 158–59 animal studies, 87–92 cognitive, 72–76, 81, 129–30 psychomotor skills, 56, 57–58, 81, 120 quality of experience, 14–15 Race against the Machine (Brynjolfsson and McAfee), 28–29 RAND Corporation, 93–98 “Rationalism in Politics” (Oakeshott), 124 Rattner, Justin, 203 reading, learning of, 82 Reaper drone, 188 reasoning, reason, 120, 121, 124, 151 recession, 27, 28, 30, 32 Red Dead Redemption, 177–78 “Relation of Strength of Stimulus to Rapidity of Habit-Formation, The” (Yerkes and Dodson), 89 Renslow, Marvin, 43–44 Revit, 146, 147 Rifkin, Jeremy, 28 
Robert, David, 45, 169–70 Robert Frost (Poirier), 214 Roberts, J.


pages: 245 words: 83,272

Artificial Unintelligence: How Computers Misunderstand the World by Meredith Broussard

1960s counterculture, A Declaration of the Independence of Cyberspace, Ada Lovelace, AI winter, Airbnb, Amazon Web Services, autonomous vehicles, availability heuristic, barriers to entry, Bernie Sanders, bitcoin, Buckminster Fuller, Chris Urmson, Clayton Christensen, cloud computing, cognitive bias, complexity theory, computer vision, crowdsourcing, Danny Hillis, DARPA: Urban Challenge, digital map, disruptive innovation, Donald Trump, Douglas Engelbart, easy for humans, difficult for computers, Electric Kool-Aid Acid Test, Elon Musk, Firefox, gig economy, global supply chain, Google Glasses, Google X / Alphabet X, Hacker Ethic, Jaron Lanier, Jeff Bezos, John von Neumann, Joi Ito, Joseph-Marie Jacquard, life extension, Lyft, Mark Zuckerberg, mass incarceration, Minecraft, minimum viable product, Mother of all demos, move fast and break things, move fast and break things, Nate Silver, natural language processing, PageRank, payday loans, paypal mafia, performance metric, Peter Thiel, price discrimination, Ray Kurzweil, ride hailing / ride sharing, Ross Ulbricht, Saturday Night Live, school choice, self-driving car, Silicon Valley, speech recognition, statistical model, Steve Jobs, Steven Levy, Stewart Brand, Tesla Model S, the High Line, The Signal and the Noise by Nate Silver, theory of mind, Travis Kalanick, Turing test, Uber for X, uber lyft, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, women in the workforce

Artificial superintelligences, like on the TV show Person of Interest or Star Trek, are imaginary. Yes, they’re fun to imagine, and it can inspire wonderful creativity to think about the possibilities of robot domination and so on—but they aren’t real. This book hews closely to the real mathematical, cognitive, and computational concepts that are in the actual academic discipline of artificial intelligence: knowledge representation and reasoning, logic, machine learning, natural language processing, search, planning, mechanics, and ethics. In the first computational adventure (chapter 5), I investigate why, after two decades of education reform, schools still can’t get students to pass standardized tests. It’s not the students’ or the teachers’ fault. The problem is far bigger: the companies that create the most important state and local exams also publish textbooks that contain many of the answers, but low-income school districts can’t afford to buy the books.

They write: Eugene Wigner’s article “The Unreasonable Effectiveness of Mathematics in the Natural Sciences” examines why so much of physics can be neatly explained with simple mathematical formulas such as f=ma or e=mc2. Meanwhile, sciences that involve human beings rather than elementary particles have proven more resistant to elegant mathematics. Economists suffer from physics envy over their inability to neatly model human behavior. An informal, incomplete grammar of the English language runs over 1,700 pages. Perhaps when it comes to natural language processing and related fields, we’re doomed to complex theories that will never have the elegance of physics equations. But if that’s so, we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data.27 Data is unreasonably effective—seductively so, even. This explains why we can build a classifier that seems to predict with 97 percent accuracy whether a passenger survives the Titanic disaster and why a computer can defeat a human Go champion.


pages: 696 words: 143,736

The Age of Spiritual Machines: When Computers Exceed Human Intelligence by Ray Kurzweil

Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Any sufficiently advanced technology is indistinguishable from magic, Buckminster Fuller, call centre, cellular automata, combinatorial explosion, complexity theory, computer age, computer vision, cosmological constant, cosmological principle, Danny Hillis, double helix, Douglas Hofstadter, Everything should be made as simple as possible, first square of the chessboard / second half of the chessboard, fudge factor, George Gilder, Gödel, Escher, Bach, I think there is a world market for maybe five computers, information retrieval, invention of movable type, Isaac Newton, iterative process, Jacquard loom, John Markoff, John von Neumann, Lao Tzu, Law of Accelerating Returns, mandelbrot fractal, Marshall McLuhan, Menlo Park, natural language processing, Norbert Wiener, optical character recognition, ought to be enough for anybody, pattern recognition, phenotype, Ralph Waldo Emerson, Ray Kurzweil, Richard Feynman, Robert Metcalfe, Schrödinger's Cat, Search for Extraterrestrial Intelligence, self-driving car, Silicon Valley, social intelligence, speech recognition, Steven Pinker, Stewart Brand, stochastic process, technological singularity, Ted Kaczynski, telepresence, the medium is the message, There's no reason for any individual to have a computer in his home - Ken Olsen, traveling salesman, Turing machine, Turing test, Whole Earth Review, Y2K

He cites the following sentence: “What number of products of products of products of products of products of products of products of products was the number of products of products of products of products of products of products of products of products?” as having 1,430 × 1,430 = 2,044,900 interpretations. 4 These and other theoretical aspects of computational linguistics are covered in Mary D. Harris, Introduction to Natural Language Processing (Reston, VA: Reston Publishing Co., 1985). CHAPTER 6: BUILDING NEW BRAINS ... 1 Hans Moravec is likely to make this argument in his 1998 book Robot: Mere Machine to Transcendent Mind (Oxford University Press; not yet available as of this writing). 2 One hundred fifty million calculations per second for a 1998 personal computer doubling twenty-seven times by the year 2025 (this assumes doubling both the number of components, and the speed of each component every two years) equals about 20 million billion calculations per second.

New York: Dover Publications, 1961. ————. Ninth Bridgewater Treatise: A Fragment. London: Murray, 1838. Babbage, Henry Prevost. Babbage’s Calculating Engines: A Collection of Papers by Henry Prevost Babbage (Editor). Vol. 2. Los Angeles: Tomash, 1982. Bailey, James. After Thought: The Computer Challenge to Human Intelligence. New York: Basic Books, 1996. Bara, Bruno G. and Giovanni Guida. Computational Models of Natural Language Processing. Amsterdam: North Holland, 1984. Barnsley, Michael F. Fractals Everywhere. Boston: Academic Press Professional, 1993. Baron, Jonathan. Rationality and Intelligence. Cambridge: Cambridge University Press, 1985. Barrett, Paul H., ed. The Collected Papers of Charles Darwin. Vols. 1 and 2. Chicago: University of Chicago Press, 1977. Barrow, John. Theories of Everything. Oxford: Oxford University Press, 1991.

Global Mind Change: The New Age Revolution in the Way We Think. New York: Warner Books, 1988. Harmon, Paul and David King. Expert Systems: Artificial Intelligence in Business. New York: John Wiley and Sons, 1985. Harre, Rom, ed. American Behavioral Scientist: Computation and the Mind. Vol. 40, no. 6, May 1997. Harrington, Steven. Computer Graphics: A Programming Approach. New York: McGraw-Hill, 1987. Harris, Mary Dee. Introduction to Natural Language Processing. Reston, VA: Reston, 1985. Haugeland, John. Artificial Intelligence: The Very Idea. Cambridge, MA: MIT Press, 1985. ________, ed. Mind Design: Philosophy, Psychology, Artificial Intelligence. Cambridge, MA: MIT Press, 1981. ________, ed. Mind Design II: Philosophy, Psychology, Artificial Intelligence. Cambridge, MA: MIT Press, 1997. Hawking, Stephen W. A Brief History of Time: From the Big Bang to Black Holes.


pages: 1,085 words: 219,144

Solr in Action by Trey Grainger, Timothy Potter

business intelligence, cloud computing, commoditize, conceptual framework, crowdsourcing, data acquisition, en.wikipedia.org, failed state, fault tolerance, finite state, full text search, glass ceiling, information retrieval, natural language processing, openstreetmap, performance metric, premature optimization, recommendation engine, web application

log function LoggingHandler class logs, 2nd long queries <long> element LowerCaseFilter LowerCaseFilterFactory, 2nd, 3rd LRU (Least Recently Used) <lst> element, 2nd Lucene, 2nd lucene folder Lucene in Action <luceneMatchVersion> element LuceneQParserPlugin class lucene-solr/ folder LukeRequestHandler class, 2nd M map function MappingCharFilterFactory MapReduce master.replication.enabled parameter masterUrl parameter math functions <maxBufferedDocs> element maxdoc function maxMergeAtOnce parameter maxShardsPerNode parameter maxWarmingSearchers parameter <maxWarmingSearchers> element MBeans, 2nd mean reciprocal rank metric memcached memory RAM sorting and mentions, preserving in text mergeFactor parameter <mergeFactor> element MERGEINDEXES action <mergePolicy> element <mergeScheduler> element metadata microblog search application example, 2nd MinimalStem filter minimum match missing values, and sorting misspelled terms mm parameter MMapDirectory monitoring, external More Like This feature, 2nd, 3rd, 4th MoreLikeThisHandler class, 2nd ms function MS Office documents MS SQL Server multicore configuration multilingual search data-modeling features language identification dynamically assigning language analyzers dynamically mapping content overview update processors for language-specific field type configurations linguistic analysis scenarios field type for multiple languages multiple languages in one field separate fields per language separate indexes per language stemming dictionary-based (Hunspell) example KeywordMarkerFilterFactory language-specific analyzer chains vs. lemmatization StemmerOverrideFilterFactory multiselect faceting defined excludes keys multitenant search MultiTextField, 2nd MultiTextFieldAnalyzer MultiTextFieldLanguageIdentifierUpdate-Processor MultiTextFieldLanguageIdentifierUpdate-ProcessorFactory MultiTextFieldTokenizer, 2nd multiValued attribute murmur hash algorithm MySQL N Nagios, 2nd Natural Language Processing. See NLP. 
natural language, search using near real-time search. See NRT search. negated terms Nested query parser nesting function queries .NET Netflix newSearcher event n-grams NIOFSDirectory NLP (Natural Language Processing) node recovery process norm function normal commit Norwegian language NorwegianLightStemFilterFactory NoSQL (Not only SQL), 2nd, 3rd not function NOT operator, 2nd NRTCachingDirectory NRTCachingDirectoryFactory class numdocs function numeric fields overview precisionStep attribute numShards parameter, 2nd, 3rd Nutch O offsite backup for SolrCloud omitNorms attribute, 2nd, 3rd op parameter OpenOffice documents <openSearcher> element Optimize request, update handler optional terms, 2nd optimistic concurrency control OR operator, 2nd Oracle AS ord function outage types OutOfMemoryError P parameters dereferencing local params parameter substitutions <params> element parseArg() method parseFloat() method parseValueSource() method PatternReplaceCharFilterFactory, 2nd payload boosting PDF documents importing common formats indexing peer sync perception of relevancy permissions, document Persian language, 2nd persist parameter pf (phrase fields) parameters PHP, 2nd PHPResponseWriter class PHPSerializedResponseWriter class phrase searches, 2nd phrase slop parameters.

This probably makes you think of a person named John walking up to a particular kind of place: a financial institution. If the text instead read “After sailing for hours, John approached the bank,” you would likely be thinking about a person named John on a boat floating toward the shore. Both sentences state that “John approached the bank,” but the context plays a critical role in ensuring the text is properly understood. Due to advances in the field of Natural Language Processing (NLP), many important contextual clues can be identified in standard text. These can include identification of the language of unknown text, determination of the parts of speech, discovery or approximation of the root form of a word, understanding of synonyms and unimportant words, and discovery of relationships between words through their usage. You will notice that the best web search engines today go to great lengths to infer the meaning of your query.
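The way context resolves an ambiguous word like “bank” can be illustrated with a simplified Lesk-style overlap heuristic: pick the sense whose dictionary gloss shares the most words with the surrounding sentence. The sense glosses and tokenizer below are invented for illustration and are not part of Solr or any particular NLP library:

```python
# Naive Lesk-style word-sense disambiguation: choose the sense whose
# gloss shares the most words with the sentence's context.
SENSES = {
    "financial": "institution that accepts deposits and lends money",
    "river": "sloping land alongside a body of water such as a river shore",
}

def tokens(text):
    return set(text.lower().replace(",", " ").replace(".", " ").split())

def disambiguate(sentence, target="bank"):
    context = tokens(sentence) - {target}
    # Score each sense by the size of the gloss/context word overlap.
    return max(SENSES, key=lambda s: len(tokens(SENSES[s]) & context))

print(disambiguate("John approached the bank to deposit money"))  # financial
print(disambiguate("After sailing for hours John approached the bank near the shore"))  # river
```

Real systems use far richer context models, but the principle is the same: the words around “bank” carry the disambiguating signal.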

Apache UIMA includes integration with many tools to extract knowledge from within your content, and Solr provides connectors for Apache UIMA, so these may be worth looking into if you need to build sophisticated content analysis capabilities into your search application. Other clustering and data classification techniques can also be used to enrich your data, which can lead to a far superior search experience than keyword searching alone. Although implementing most of these capabilities is beyond the scope of this book, Grant Ingersoll, Thomas Morton, and Andrew Farris provide a great overview of how to implement these kinds of natural language processing techniques in Taming Text: How to Find, Organize, and Manipulate It (Manning, 2013), including a chapter on building a question-and-answer system similar to some of the previous examples. What Solr does provide out of the box, however, are the building blocks for these kinds of systems. This includes dozens of language-specific stemmers, a synonym filter, a stop words filter and language-specific stop word lists, character/accent normalization, query correction (spell-check) capabilities, and a language identifier.
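A toy version of the stemming and stop-word building blocks listed above can be sketched in a few lines; the suffix rules and stop list here are deliberately crude illustrations, not Solr's actual filter implementations:

```python
# Minimal analysis chain: lowercase, drop stop words, strip suffixes.
STOP = {"the", "a", "an", "of", "and", "to"}

def stem(word):
    # Crude suffix stripping; real stemmers (Porter, Hunspell) are far richer.
    for suffix in ("ing", "ers", "er", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word

def analyze(text):
    return [stem(w) for w in text.lower().split() if w not in STOP]

print(analyze("The runners running to the races"))  # ['runn', 'runn', 'rac']
```

Note how “runners” and “running” collapse to the same index term, which is exactly what lets a query for one match documents containing the other.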


Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose

Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, speech recognition, statistical model, William of Occam

The basic idea is to use a set of words (or terms) that the user specifies and retrieve documents that include (or do not include) those words. This is the keyword search approach, well known from the area of information retrieval (IR). In web search, further IR techniques are used to avoid terms that are too general and too specific and to take into account term distribution throughout the entire body of documents as well as to explore document similarity. Natural language processing approaches are also used to analyze term context or lexical information, or to combine several terms into phrases. After retrieving a set of documents ranked by their degree of matching the keyword query, they are further ranked by importance (popularity, authority), usually based on the web link structure. All these approaches are discussed further later in the book. Topic Directories Web pages are organized into hierarchical structures that reflect their meaning.
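The IR idea mentioned above — weighting terms by their distribution across the whole document collection so that overly general terms count for less — can be sketched with a minimal TF-IDF scorer. The toy corpus below is invented for illustration:

```python
import math

# Minimal TF-IDF: terms common across the collection (too general) get
# low weight; rare, discriminative terms get high weight.
docs = [
    "web search engines rank pages",
    "natural language processing of web text",
    "cooking pasta at home",
]

def tfidf(term, doc, corpus):
    tf = doc.split().count(term)
    df = sum(1 for d in corpus if term in d.split())
    if df == 0:
        return 0.0
    idf = math.log(len(corpus) / df)  # rarer term -> larger idf
    return tf * idf

def score(query, doc, corpus):
    return sum(tfidf(t, doc, corpus) for t in query.split())

ranked = sorted(docs, key=lambda d: score("web search", d, docs), reverse=True)
print(ranked[0])  # "web search engines rank pages"
```

Here “web” appears in two of the three documents and so contributes less to the ranking than the rarer “search”, which is the distribution effect the text describes.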

To have content-based access to these documents, we organize them in libraries, bibliography systems, and by other means. This process takes a lot of time and effort because it is done by people. There are attempts to use computers for this purpose, but the problem is that content-based access assumes understanding the meaning of documents, something that is still a research question, studied in the area of artificial intelligence and natural language processing in particular. One may argue that natural language texts are structured, which is true as far as the language syntax (grammatical structure) is concerned. However, the transition to meaning still requires semantic structuring or understanding. There exists a solution that avoids the problem of meaning but still provides some types of content-based access to unstructured data. This is the keyword search approach known from the area of information retrieval (IR).


pages: 533

Future Politics: Living Together in a World Transformed by Tech by Jamie Susskind

3D printing, additive manufacturing, affirmative action, agricultural Revolution, Airbnb, airport security, Andrew Keen, artificial general intelligence, augmented reality, automated trading system, autonomous vehicles, basic income, Bertrand Russell: In Praise of Idleness, bitcoin, blockchain, brain emulation, British Empire, business process, Capital in the Twenty-First Century by Thomas Piketty, cashless society, Cass Sunstein, cellular automata, cloud computing, computer age, computer vision, continuation of politics by other means, correlation does not imply causation, crowdsourcing, cryptocurrency, digital map, distributed ledger, Donald Trump, easy for humans, difficult for computers, Edward Snowden, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Ethereum, ethereum blockchain, Filter Bubble, future of work, Google bus, Google X / Alphabet X, Googley, industrial robot, informal economy, intangible asset, Internet of things, invention of the printing press, invention of writing, Isaac Newton, Jaron Lanier, John Markoff, Joseph Schumpeter, Kevin Kelly, knowledge economy, lifelogging, Metcalfe’s law, mittelstand, more computing power than Apollo, move fast and break things, natural language processing, Network effects, new economy, night-watchman state, Oculus Rift, Panopticon Jeremy Bentham, pattern recognition, payday loans, price discrimination, price mechanism, RAND corporation, ransomware, Ray Kurzweil, Richard Stallman, ride hailing / ride sharing, road to serfdom, Robert Mercer, Satoshi Nakamoto, Second Machine Age, selection bias, self-driving car, sexual politics, sharing economy, Silicon Valley, Silicon Valley startup, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart contracts, Snapchat, speech recognition, Steve Jobs, Steve Wozniak, Steven Levy, technological singularity, the built environment, The Structural Transformation of the Public Sphere, The Wisdom of Crowds, Thomas L Friedman, universal basic income, urban planning, Watson beat the top human players on Jeopardy!, working-age population

Krista Conger, ‘Computers Trounce Pathologists in Predicting Lung Cancer Type, Severity’, Stanford Medicine News Center, 16 August 2016 <http://med.stanford.edu/news/all-news/2016/08/computerstrounce-pathologists-in-predicting-lung-cancer-severity.html> (accessed 28 November 2017). See also Andre Esteva et al., ‘Dermatologist-level Classification of Skin Cancer with Deep Neural Networks’, Nature 542 (2 February 2017): 115–18. Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preotiuc, and Vasileios Lampos, ‘Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective’, PeerJ Computer Science 2, e93 (24 October 2016). Sarah A. Topol, ‘Attack of the Killer Robots’, BuzzFeed News, 26 August 2016 <https://www.buzzfeed.com/sarahatopol/how-tosave-mankind-from-the-new-breed-of-killer-robots?utm_term=.nm1GdWDBZ#.vaJzgW6va> (accessed 28 November 2017). Cade Metz, ‘Google’s AI Wins Fifth and Final Game Against Go’, Wired, 15 March 2016 <https://www.wired.com/2016/03/googlesai-wins-fifth-final-game-go-genius-lee-sedol/> (accessed 28 November 2017); Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (Oxford: Oxford University Press, 2014), 12–13.

Mark Bridge, ‘AI Can Identify Alzheimer’s Disease a Decade before Symptoms Appear’, The Times, 20 September 2017 <https://www.thetimes.co.uk/article/ai-can-identify-alzheimer-s-a-decade-beforesymptoms-appear-9b3qdrrf7> (accessed 1 December 2017). 23. Wendell Wallach and Colin Allen, Moral Machines: Teaching Robots Right from Wrong (Oxford: Oxford University Press, 2009), 27. 24. Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preotiuc, and Vasileios Lampos. ‘Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective’. PeerJ Computer Science 2, e93 (24 October 2016). See further Harry Surden, ‘Machine Learning and Law’, Washington Law Review 89, no. 1 (2014): 87–115. 25. Erik Brynjolfsson and Andrew McAfee, Machine, Platform, Crowd: Harnessing Our Digital Future (New York: W. W. Norton & Company, 2017), 41. 26. See Anthony J. Casey and Anthony Niblett, ‘The Death of Rules and Standards’, Indiana Law Journal 92, no. 4 (2017); Anthony J.

Medium, 6 May 2017 <https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a> (accessed 1 Dec. 2017). Ajunwa, Ifeoma, Kate Crawford, and Jason Schultz. ‘Limitless Worker Surveillance’. California Law Review 105, no. 3 (2017), 734–76. Aletras, Nikolaos, Dimitrios Tsarapatsanis, Daniel Preotiuc, and Vasileios Lampos. ‘Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective’. PeerJ Computer Science 2, e93 (24 Oct. 2016). Allen, Jonathan P. Technology and Inequality: Concentrated Wealth in a Digital World. Kindle Edition: Palgrave Macmillan, 2017. Ananny, Mike. ‘Toward an Ethics of Algorithms: Convening, Observation, Probability, and Timeliness’. Science, Technology, & Human Values 41, no. 1 (2016). Anderson, Berit and Brett Horvath. ‘The Rise of the Weaponized AI Propaganda Machine’.


Beautiful Visualization by Julie Steele

barriers to entry, correlation does not imply causation, data acquisition, database schema, Drosophila, en.wikipedia.org, epigenetics, global pandemic, Hans Rosling, index card, information retrieval, iterative process, linked data, Mercator projection, meta analysis, meta-analysis, natural language processing, Netflix Prize, pattern recognition, peer-to-peer, performance metric, QR code, recommendation engine, semantic web, social graph, sorting algorithm, Steve Jobs, web application, wikimedia commons

Nothing described here is impossible in other languages, using other libraries and frameworks, but Java’s strong support for Unicode text processing and 2D graphics (via the Java2D API) makes these things pretty straightforward. Text Analysis We’ll now take a step back and consider some of the fundamental assumptions that determine Wordle’s character. In particular, we have to examine what “text” is, as far as Wordle is concerned. While this kind of text analysis is crude compared to what’s required for some natural-language processing, it can still be tedious to implement. If you work in Java, you might find my cue.language library[13] useful for the kinds of tasks described in this section. It’s small enough, it’s fast enough, and thousands use it each day as part of Wordle. Remember that natural-language analysis is as much craft as science,[14] and even given state-of-the-art computational tools, you have to apply judgment and taste.

[9] See http://wordpress.org/extend/plugins/wp-cumulus/. [10] See http://en.wikipedia.org/wiki/Bin_packing_problem. [11] See http://levitated.net/daily/levEmotionFractal.html. [12] See http://www.cs.umd.edu/hcil/treemap-history/. [13] See http://github.com/vcl/cue.language. [14] For an illuminating demonstration of this craft, see Peter Norvig’s chapter on natural-language processing in the sister O’Reilly book Beautiful Data. [15] See http://researchweb.watson.ibm.com/visual/inaugurals/. [16] See http://www.alphaworks.ibm.com/tech/wordcloud. [17] See http://manyeyes.alphaworks.ibm.com/manyeyes/page/Visualization_Options.html. Chapter Four Color: The Cinderella of Data Visualization Michael Driscoll Avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm.


pages: 116 words: 31,356

Platform Capitalism by Nick Srnicek

3D printing, additive manufacturing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, Capital in the Twenty-First Century by Thomas Piketty, cloud computing, collaborative economy, collective bargaining, deindustrialization, deskilling, disintermediation, future of work, gig economy, Infrastructure as a Service, Internet of things, Jean Tirole, Jeff Bezos, knowledge economy, knowledge worker, liquidity trap, low skilled workers, Lyft, Mark Zuckerberg, means of production, mittelstand, multi-sided market, natural language processing, Network effects, new economy, Oculus Rift, offshore financial centre, pattern recognition, platform as a service, quantitative easing, RFID, ride hailing / ride sharing, Robert Gordon, self-driving car, sharing economy, Shoshana Zuboff, Silicon Valley, Silicon Valley startup, software as a service, TaskRabbit, the built environment, total factor productivity, two-sided market, Uber and Lyft, Uber for X, uber lyft, unconventional monetary instruments, unorthodox policies, Zipcar

If people move into apps or start searching on Amazon instead of Google, these are threats to Google’s basic business model. Every major platform company is increasingly positioning itself in the natural language interface market as well. In 2016 Facebook began a major push for ‘chatbots’ – that is, low-level AI programmes that would converse with users on Facebook’s platform. (This is also why Facebook – and numerous other companies – are investing heavily in AI and the natural language processing needed to enable chatbots.) The bet is that these chatbots will become the preferred way for users to interact with the internet. On this open platform, businesses would be given the tools to develop their own bots and create intuitive means for users to order food, buy a train ticket, or make a dinner reservation.24 Rather than using a separate app or website for accessing businesses and services, users would simply access them through Facebook’s platform, which would make Facebook’s chatbot platform the primary interface for commercial transactions online.


The Deep Learning Revolution (The MIT Press) by Terrence J. Sejnowski

AI winter, Albert Einstein, algorithmic trading, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, bioinformatics, cellular automata, Claude Shannon: information theory, cloud computing, complexity theory, computer vision, conceptual framework, constrained optimization, Conway's Game of Life, correlation does not imply causation, crowdsourcing, Danny Hillis, delayed gratification, discovery of DNA, Donald Trump, Douglas Engelbart, Drosophila, Elon Musk, en.wikipedia.org, epigenetics, Flynn Effect, Frank Gehry, future of work, Google Glasses, Google X / Alphabet X, Guggenheim Bilbao, Gödel, Escher, Bach, haute couture, Henri Poincaré, I think there is a world market for maybe five computers, industrial robot, informal economy, Internet of things, Isaac Newton, John Conway, John Markoff, John von Neumann, Mark Zuckerberg, Minecraft, natural language processing, Netflix Prize, Norbert Wiener, orbital mechanics / astrodynamics, PageRank, pattern recognition, prediction markets, randomized controlled trial, recommendation engine, Renaissance Technologies, Rodney Brooks, self-driving car, Silicon Valley, Silicon Valley startup, Socratic dialogue, speech recognition, statistical model, Stephen Hawking, theory of mind, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, traveling salesman, Turing machine, Von Neumann architecture, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra

Intelligence evolved in brains to control movements, and bodies evolved to interact with the world through that intelligence. Brooks departed from the traditional controllers used by roboticists and used behavior rather than computation as the metaphor for designing robots. As we learn more from building robots, it will become apparent that the body is a part of the mind. Nature Is Cleverer Than We Are 257 In “Why Natural Language Processing is Now Statistical Natural Language Processing,” Eugene Charniak explained that a basic part of grammar is to tag parts of speech in a sentence. This is something that humans can be trained to do much better than the extant parsing programs. The field of computational linguistics initially tried to apply the generative grammar approach pioneered by Noam Chomsky in the 1980s, but the results were disappointing.
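The statistical turn Charniak describes can be illustrated with a minimal unigram part-of-speech tagger, which simply assigns each word the tag it most often carried in training data; the toy tagged corpus below stands in for a real treebank:

```python
from collections import Counter, defaultdict

# Toy tagged corpus: (word, tag) pairs stand in for a real treebank.
train = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
         ("the", "DET"), ("cat", "NOUN"), ("runs", "VERB"),
         ("run", "VERB"), ("run", "NOUN")]

counts = defaultdict(Counter)
for word, t in train:
    counts[word][t] += 1

def tag(word):
    # Most frequent tag observed for the word; back off to NOUN for unknowns.
    return counts[word].most_common(1)[0][0] if word in counts else "NOUN"

print([tag(w) for w in ["the", "dog", "runs"]])  # ['DET', 'NOUN', 'VERB']
```

Even this trivial statistical baseline tags common words correctly, which is why corpus counts proved so competitive with hand-written grammar rules.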


pages: 407 words: 104,622

The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution by Gregory Zuckerman

affirmative action, Affordable Care Act / Obamacare, Albert Einstein, Andrew Wiles, automated trading system, backtesting, Bayesian statistics, beat the dealer, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, blockchain, Brownian motion, butter production in bangladesh, buy and hold, buy low sell high, Claude Shannon: information theory, computer age, computerized trading, Credit Default Swap, Daniel Kahneman / Amos Tversky, diversified portfolio, Donald Trump, Edward Thorp, Elon Musk, Emanuel Derman, endowment effect, Flash crash, George Gilder, Gordon Gekko, illegal immigration, index card, index fund, Isaac Newton, John Meriwether, John Nash: game theory, John von Neumann, Loma Prieta earthquake, Long Term Capital Management, loss aversion, Louis Bachelier, mandelbrot fractal, margin call, Mark Zuckerberg, More Guns, Less Crime, Myron Scholes, Naomi Klein, natural language processing, obamacare, p-value, pattern recognition, Peter Thiel, Ponzi scheme, prediction markets, quantitative hedge fund, quantitative trading / quantitative finance, random walk, Renaissance Technologies, Richard Thaler, Robert Mercer, Ronald Reagan, self-driving car, Sharpe ratio, Silicon Valley, sovereign wealth fund, speech recognition, statistical arbitrage, statistical model, Steve Jobs, stochastic process, the scientific method, Thomas Bayes, transaction costs, Turing machine

Once in a while, he’d issue statements that seemed aimed at getting a rise out of his lunch-mates, such as the time he declared that he thought he would live forever. Brown was more animated, approachable, and energetic, with thick, curly brown hair and an infectious charm. Unlike Mercer, Brown forged friendships within the group, several members of which appreciated his sneaky sense of humor. As the group struggled to make progress in natural-language processing, though, Brown showed impatience, directing special ire at an intern named Phil Resnik. A graduate student at the University of Pennsylvania who had earned a bachelor of arts in computer science at Harvard University and would later become a respected academic, Resnik hoped to combine mathematical tactics with linguistic principles. Brown had little patience for Resnik’s approach, mocking his younger colleague and jumping on his mistakes.

Stephen Miller, “Co-Inventor of Money-Market Account Helped Serve Small Investors’ Interest,” Wall Street Journal, August 16, 2008, https://www.wsj.com/articles/SB121884007790345601. 6. Feng-Hsiung Hsu, Behind Deep Blue: Building the Computer That Defeated the World Chess Champion (Princeton, NJ: Princeton University Press, 2002). Chapter Ten 1. Peter Brown and Robert Mercer, “Oh, Yes, Everything’s Right on Schedule, Fred” (lecture, Twenty Years of Bitext Workshop, Empirical Methods in Natural Language Processing Conference, Seattle, Washington, October 2013), http://cs.jhu.edu/~post/bitext. Chapter Eleven 1. Hal Lux, “The Secret World of Jim Simons,” Institutional Investor, November 1, 2000, https://www.institutionalinvestor.com/article/b151340bp779jn/the-secret-world-of-jim-simons. 2. Robert Mercer interviewed by Sharon McGrayne for her book, The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy (New Haven, CT: Yale University Press, 2011). 3.


pages: 419 words: 109,241

A World Without Work: Technology, Automation, and How We Should Respond by Daniel Susskind

3D printing, agricultural Revolution, AI winter, Airbnb, Albert Einstein, algorithmic trading, artificial general intelligence, autonomous vehicles, basic income, Bertrand Russell: In Praise of Idleness, blue-collar work, British Empire, Capital in the Twenty-First Century by Thomas Piketty, cloud computing, computer age, computer vision, computerized trading, creative destruction, David Graeber, David Ricardo: comparative advantage, demographic transition, deskilling, disruptive innovation, Donald Trump, Douglas Hofstadter, drone strike, Edward Glaeser, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, financial innovation, future of work, gig economy, Gini coefficient, Google Glasses, Gödel, Escher, Bach, income inequality, income per capita, industrial robot, interchangeable parts, invisible hand, Isaac Newton, Jacques de Vaucanson, James Hargreaves, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Joi Ito, Joseph Schumpeter, Kenneth Arrow, Khan Academy, Kickstarter, low skilled workers, lump of labour, Marc Andreessen, Mark Zuckerberg, means of production, Metcalfe’s law, natural language processing, Network effects, Occupy movement, offshore financial centre, Paul Samuelson, Peter Thiel, pink-collar, precariat, purchasing power parity, Ray Kurzweil, ride hailing / ride sharing, road to serfdom, Robert Gordon, Sam Altman, Second Machine Age, self-driving car, shareholder value, sharing economy, Silicon Valley, Snapchat, social intelligence, software is eating the world, sovereign wealth fund, spinning jenny, Stephen Hawking, Steve Jobs, strong AI, telemarketer, The Future of Employment, The Rise and Fall of American Growth, the scientific method, The Wealth of Nations by Adam Smith, Thorstein Veblen, Travis Kalanick, Turing test, Tyler Cowen: Great Stagnation, universal basic income, upwardly mobile, Watson beat the top human players on Jeopardy!, We are the 99%, wealth creators, working poor, working-age population, Y Combinator

Ruger, Pauline T. Kim, Andrew D. Martin, and Kevin M. Quinn, “The Supreme Court Forecasting Project: Legal and Political Science Approaches to Predicting Supreme Court Decisionmaking,” Columbia Law Review 104:4 (2004), 1150–1210. 39.  Nikolas Aletras, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro, and Vasileios Lampos, “Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective,” PeerJ Computer Science 2:93 (2016). 40.  Though by no means limited to diagnosis. See Eric Topol, “High-Performance Medicine: The Convergence of Human and Artificial Intelligence,” Nature 25 (2019), 44–56, for a broader overview of the uses of AI in medicine. 41.  Jeffrey De Fauw, Joseph Ledsam, Bernardino Romera-Paredes, et al., “Clinically Applicable Deep Learning for Diagnosis and Referral in Retinal Disease,” Nature Medicine 24 (2018), 1342–50. 42.  

Journal of Public Economics 88, nos. 9–10 (2004): 2009–42. Alesina, Alberto, Edward Glaeser, and Bruce Sacerdote. “Why Doesn’t the United States Have a European-Style Welfare State?” Brookings Papers on Economic Activity 2 (2001). Aletras, Nikolas, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro, and Vasileios Lampos. “Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective.” PeerJ Computer Science 2, no. 93 (2016). Allen, Robert. “The Industrial Revolution in Miniature: The Spinning Jenny in Britain, France, and India.” Oxford University Working Paper No. 375 (2017). Alstadsæter, Annette, Niels Johannesen, and Gabriel Zucman. “Tax Evasion and Inequality.” American Economic Review 109, no. 6 (2019): 2073–103. ________. “Who Owns the Wealth in Tax Havens?


pages: 413 words: 106,479

Because Internet: Understanding the New Rules of Language by Gretchen McCulloch

4chan, book scanning, British Empire, citation needed, Donald Trump, en.wikipedia.org, Firefox, Flynn Effect, Google Hangouts, Internet Archive, invention of the printing press, invention of the telephone, moral panic, multicultural london english, natural language processing, pre–internet, QWERTY keyboard, Ray Oldenburg, Silicon Valley, Skype, Snapchat, social web, Steven Pinker, telemarketer, The Great Good Place, upwardly mobile, Watson beat the top human players on Jeopardy!

The Ling Space blog. thelingspace.tumblr.com/post/138053815679/writing-in-texts-vs-twitter. favors a few elite languages and dialects: François Grosjean. 2010. Bilingual. Harvard University Press. One method of bridging: Su Lin Blodgett, Lisa Green, and Brendan O’Connor. 2016. “Demographic Dialectal Variation in Social Media: A Case Study of African-American English.” Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. pp. 1119–1130. arxiv.org/pdf/1608.08868v1.pdf. “15-year-old users”: Ivan Smirnov. 2017. “The Digital Flynn Effect: Complexity of Posts on Social Media Increases over Time.” Presented at the International Conference on Social Informatics, September 13–15, 2017, Oxford, UK. arxiv.org/abs/1707.05755. textisms might interfere: Michelle Drouin and Claire Davis. 2009. “R u txting? Is the Use of Text Speak Hurting Your Literacy?”

“‘Confectionary, confectionary’”: Maturin Murray Ballou. 1848. The Duke’s Prize; a Story of Art and Heart in Florence. (No publisher cited.) www.gutenberg.org/ebooks/4956. top twenty most lengthened words: Samuel Brody and Nicholas Diakopoulos. 2011. “Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs.” Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. pp. 562–570. expressive lengthening: Tyler Schnoebelen. January 8, 2013. “Aww, hmmm, ohh heyyy nooo omggg!” Corpus Linguistics. corplinguistics.wordpress.com/2013/01/08/aww-hmmm-ohh-heyyy-nooo-omggg/. Jen Doll. 2016. “Why Drag It Out?” The Atlantic. www.theatlantic.com/magazine/archive/2013/03/dragging-it-out/309220/. Jen Doll. February 1, 2013.
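The expressive lengthening catalogued in these citations (e.g. “heyyy”, “nooo”, “cooooooool”) can be detected with a short regular-expression sketch; the three-repeat threshold is an illustrative choice, not taken from the cited paper:

```python
import re

# A letter repeated three or more times almost never occurs in standard
# English spelling, so treat such runs as expressive lengthening.
LENGTHENED = re.compile(r"(.)\1{2,}")

def is_lengthened(word):
    return bool(LENGTHENED.search(word))

def normalize(text):
    # Collapse runs of 3+ identical characters down to a single one.
    return LENGTHENED.sub(r"\1", text)

print(is_lengthened("cooooool"))  # True
print(is_lengthened("coolly"))    # False (doubled letters are normal)
print(normalize("heyyy nooo"))    # "hey no"
```

Collapsing lengthened forms back to a canonical spelling is the usual first step before counting them as sentiment signals, since “cooool” and “coooooool” should map to the same token.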


pages: 470 words: 109,589

Apache Solr 3 Enterprise Search Server by Unknown

bioinformatics, continuous integration, database schema, en.wikipedia.org, fault tolerance, Firefox, full text search, information retrieval, natural language processing, performance metric, platform as a service, Ruby on Rails, web application

SignatureUpdateProcessorFactory: This generates a hash ID value based on other field values you specify. If you want to de-duplicate your data (that is, you don't want to accidentally add the same data twice), then this processor will do that for you. For further information see http://wiki.apache.org/solr/Deduplication.

UIMAUpdateProcessorFactory: This hands the document off to the Unstructured Information Management Architecture (UIMA), a Solr contrib module that enhances the document through natural language processing (NLP) techniques. For further information see http://wiki.apache.org/solr/SolrUIMA. Although it's nice to see an NLP integration option in Solr, beware that NLP processing tends to be computationally expensive. Instead of using UIMA in this way, consider performing this processing externally to Solr and caching the results to avoid re-computation as you adjust your indexing process.

LogUpdateProcessorFactory: This is the one responsible for writing the log messages you see when an update occurs.
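The idea behind signature-based de-duplication can be shown with a minimal Python sketch. The field names, the choice of MD5, and the in-memory index are illustrative assumptions, not Solr's actual implementation:

```python
import hashlib

def signature(doc, sig_fields):
    # Build a stable hash ID from the chosen field values, in the spirit of
    # SignatureUpdateProcessorFactory (a sketch, not Solr's code).
    h = hashlib.md5()
    for field in sig_fields:
        h.update(str(doc.get(field, "")).encode("utf-8"))
        h.update(b"\x00")  # field separator so ("ab","c") != ("a","bc")
    return h.hexdigest()

index = {}  # signature -> document: re-adding identical data overwrites

def add_document(doc, sig_fields=("title", "url")):
    doc_id = signature(doc, sig_fields)
    index[doc_id] = dict(doc, id=doc_id)

add_document({"title": "Solr 3 notes", "url": "http://example.com/a"})
add_document({"title": "Solr 3 notes", "url": "http://example.com/a"})  # duplicate
print(len(index))  # 1: the duplicate collapsed onto the same hash ID
```

Because the hash is derived only from the signature fields, a re-submitted document lands on the same ID and replaces the earlier copy instead of creating a second one.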

Indexing locations

You need raw location data in the form of a latitude and longitude to take advantage of Solr's geospatial capabilities. If you have named locations (for example, "Boston, MA"), then the data needs to be resolved to latitudes and longitudes using a gazetteer like Geonames—http://www.geonames.org. If all you have is free-form natural language text without the locations identified, then you'll have to perform a more difficult task that uses Natural Language Processing techniques to find the named locations. These approaches are outside the scope of this book. The principal field type in Solr for geospatial is LatLonType, which stores a single latitude-longitude pair. Under the hood, this field type copies the latitude and longitude into a pair of indexed fields using the provided field name suffix. In the following excerpt taken from Solr's example schema, given the field name store, there will be two additional fields named store_0_coordinate and store_1_coordinate, which you'll see in Solr's schema browser.
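The field-naming scheme can be sketched as follows. This is a toy illustration of the convention described above; the coordinate values and the `_coordinate` suffix are taken from Solr's example schema, but the code itself is not Solr's:

```python
def index_latlon(field_name, value, suffix="_coordinate"):
    # Mimic how LatLonType splits a "lat,lon" string into two indexed
    # fields, e.g. store -> store_0_coordinate and store_1_coordinate.
    lat, lon = (float(part) for part in value.split(","))
    return {
        f"{field_name}_0{suffix}": lat,
        f"{field_name}_1{suffix}": lon,
    }

fields = index_latlon("store", "45.17614,-93.87341")
print(fields)
# {'store_0_coordinate': 45.17614, 'store_1_coordinate': -93.87341}
```

The single user-facing field thus fans out into two numeric fields, one per coordinate, which is why both appear in the schema browser.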


pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos

Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, basic income, Bayesian statistics, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, creative destruction, crowdsourcing, Danny Hillis, data is the new oil, double helix, Douglas Hofstadter, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, global village, Google Glasses, Gödel, Escher, Bach, information retrieval, job automation, John Markoff, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, lone genius, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, NP-complete, off grid, P = NP, PageRank, pattern recognition, phenotype, planetary scale, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, scientific worldview, Second Machine Age, self-driving car, Silicon Valley, social intelligence, speech recognition, Stanford marshmallow experiment, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight, zero-sum game

A semantic network is a set of concepts (like planets and stars) and relations among those concepts (planets orbit stars). Alchemy learned over a million such patterns from facts extracted from the web (e.g., Earth orbits the sun). It discovered concepts like planet all by itself. The version we used was more advanced than the basic one I’ve described here, but the essential ideas are the same. Various research groups have used Alchemy or their own MLN implementations to solve problems in natural language processing, computer vision, activity recognition, social network analysis, molecular biology, and many other areas. Despite its successes, Alchemy has some significant shortcomings. It does not yet scale to truly big data, and someone without a PhD in machine learning will find it hard to use. Because of these problems, it’s not yet ready for prime time. But let’s see what we can do about them.

“Relevance weighting of search terms,”* by Stephen Robertson and Karen Sparck Jones (Journal of the American Society for Information Science, 1976), explains the use of Naïve Bayes–like methods in information retrieval. “First links in the Markov chain,” by Brian Hayes (American Scientist, 2013), recounts Markov’s invention of the eponymous chains. “Large language models in machine translation,”* by Thorsten Brants et al. (Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007), explains how Google Translate works. “The PageRank citation ranking: Bringing order to the Web,”* by Larry Page, Sergey Brin, Rajeev Motwani, and Terry Winograd (Stanford University technical report, 1998), describes the PageRank algorithm and its interpretation as a random walk over the web. Statistical Language Learning,* by Eugene Charniak (MIT Press, 1996), explains how hidden Markov models work.


pages: 163 words: 42,402

Machine Learning for Email by Drew Conway, John Myles White

call centre, correlation does not imply causation, Debian, natural language processing, Netflix Prize, pattern recognition, recommendation engine, SpamAssassin, text mining

Moreover, because we calculate conditional probabilities using products, if we assigned a zero probability to terms not in our training data, elementary arithmetic tells us that we would calculate zero as the probability of most messages, since we would be multiplying all the other probabilities by zero every time we encountered an unknown term. This would cause catastrophic results for our classifier, as many, or even all, messages would be incorrectly assigned a zero probability to be either spam or ham. Researchers have come up with many clever ways of trying to get around this problem, such as drawing a random probability from some distribution or using natural language processing (NLP) techniques to estimate the “spamminess” of a term given its context. For our purposes, we will use a very simple rule: assign a very small probability to terms that are not in the training set. This is, in fact, a common way of dealing with missing terms in simple text classifiers, and for our purposes it will serve just fine. In this exercise, by default we will set this probability to 0.0001%, or one-ten-thousandth of a percent, which is sufficiently small for this data set.
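The small-floor rule described above can be sketched in Python. The training data and helper names are invented for illustration; the floor of 0.0001% equals 1e-6, as in the text:

```python
import math

def train(messages):
    # Count term frequencies per class from (text, label) pairs.
    counts = {"spam": {}, "ham": {}}
    totals = {"spam": 0, "ham": 0}
    for text, label in messages:
        for term in text.lower().split():
            counts[label][term] = counts[label].get(term, 0) + 1
            totals[label] += 1
    return counts, totals

def log_prob(text, label, counts, totals, floor=1e-6):
    # Sum log term probabilities; unseen terms get a small floor
    # probability instead of zero, so one unknown word cannot
    # drive the whole message's probability to zero.
    score = 0.0
    for term in text.lower().split():
        seen = counts[label].get(term, 0)
        p = seen / totals[label] if totals[label] else 0.0
        score += math.log(p if p > 0 else floor)
    return score

train_data = [("win money now", "spam"), ("meeting at noon", "ham")]
counts, totals = train(train_data)
# "prize" never appears in training, yet the score stays finite:
print(log_prob("win prize", "spam", counts, totals) > float("-inf"))  # True
```

Working in log space also sidesteps numeric underflow from multiplying many tiny probabilities.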


pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst

algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application

In addition, the underlying system can resolve references by inferring new triples from the existing records using a rules set. This is a powerful alternative to joining relational tables to resolve references in a typical RDBMS, while also offering a more expressive way to model data than a key value store. One of the most powerful aspects of semantic technology comes from the world of linguistics and natural language processing, also known as entity extraction. This is a powerful mechanism to extract information from unstructured data and combine it with transactional data, enabling deep analytics by bringing these worlds closer together. Another method that brings structure to the unstructured is the text analytics tool, which is improving daily as scientists come up with new ways of making algorithms understand written text more accurately.
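Inferring new triples from existing records with a rules set can be illustrated with a toy forward-chaining sketch. The rule, predicates, and facts here are hypothetical examples, not any particular vendor's engine:

```python
def infer(triples, rules):
    # Naive forward chaining: apply each rule to the triple store
    # until no new triples appear (a fixed point is reached).
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_triples = set(rule(triples)) - triples
            if new_triples:
                triples |= new_triples
                changed = True
    return triples

def subsidiary_rule(triples):
    # worksFor(a, b) and subsidiaryOf(b, c)  =>  worksFor(a, c)
    for (a, p1, b) in triples:
        if p1 == "worksFor":
            for (b2, p2, c) in triples:
                if b2 == b and p2 == "subsidiaryOf":
                    yield (a, "worksFor", c)

facts = {("alice", "worksFor", "acme_labs"),
         ("acme_labs", "subsidiaryOf", "acme")}
result = infer(facts, [subsidiary_rule])
print(("alice", "worksFor", "acme") in result)  # True
```

The inferred triple never had to be stored explicitly, which is the alternative to resolving such references with relational joins that the passage describes.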


pages: 413 words: 119,587

Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots by John Markoff

"Robert Solow", A Declaration of the Independence of Cyberspace, AI winter, airport security, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, basic income, Baxter: Rethink Robotics, Bill Duvall, bioinformatics, Brewster Kahle, Burning Man, call centre, cellular automata, Chris Urmson, Claude Shannon: information theory, Clayton Christensen, clean water, cloud computing, collective bargaining, computer age, computer vision, crowdsourcing, Danny Hillis, DARPA: Urban Challenge, data acquisition, Dean Kamen, deskilling, don't be evil, Douglas Engelbart, Douglas Engelbart, Douglas Hofstadter, Dynabook, Edward Snowden, Elon Musk, Erik Brynjolfsson, factory automation, From Mathematics to the Technologies of Life and Death, future of work, Galaxy Zoo, Google Glasses, Google X / Alphabet X, Grace Hopper, Gunnar Myrdal, Gödel, Escher, Bach, Hacker Ethic, haute couture, hive mind, hypertext link, indoor plumbing, industrial robot, information retrieval, Internet Archive, Internet of things, invention of the wheel, Jacques de Vaucanson, Jaron Lanier, Jeff Bezos, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, knowledge worker, Kodak vs Instagram, labor-force participation, loose coupling, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, medical residency, Menlo Park, Mitch Kapor, Mother of all demos, natural language processing, new economy, Norbert Wiener, PageRank, pattern recognition, pre–internet, RAND corporation, Ray Kurzweil, Richard Stallman, Robert Gordon, Rodney Brooks, Sand Hill Road, Second Machine Age, self-driving car, semantic web, shareholder value, side project, Silicon Valley, Silicon Valley startup, Singularitarianism, skunkworks, Skype, social software, speech recognition, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, Steve Wozniak, Steven Levy, 
Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Nelson, telemarketer, telepresence, telepresence robot, Tenerife airport disaster, The Coming Technological Singularity, the medium is the message, Thorstein Veblen, Turing test, Vannevar Bush, Vernor Vinge, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, William Shockley: the traitorous eight, zero-sum game

Because it was faster to cast an erroneous line than correct it, typesetters would “run down” the rest of the line with easy-to-type nonsense, later removing the entire line after it had cooled down, or if they forgot, hope a proofreader caught it.9 He wasn’t concerned at the time about any ethical implications involved in building a natural language processing system that could “understand” and respond in a virtual world. In SHRDLU “understanding” meant that the program analyzed the structure of the typed questions and attempted to answer them and respond to the commands. It was an early effort at disambiguation, a thorny problem for natural language processing even today. For example, in the sentence “he put the glass on the table and it broke,” does “it” refer to the glass or the table? Without more context, neither a human nor an AI program could decide. Winograd’s system used its general knowledge of the microworld to answer and respond to various questions.


pages: 588 words: 131,025

The Patient Will See You Now: The Future of Medicine Is in Your Hands by Eric Topol

23andMe, 3D printing, Affordable Care Act / Obamacare, Anne Wojcicki, Atul Gawande, augmented reality, bioinformatics, call centre, Clayton Christensen, clean water, cloud computing, commoditize, computer vision, conceptual framework, connected car, correlation does not imply causation, creative destruction, crowdsourcing, dark matter, data acquisition, disintermediation, disruptive innovation, don't be evil, Edward Snowden, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Firefox, global village, Google Glasses, Google X / Alphabet X, Ignaz Semmelweis: hand washing, information asymmetry, interchangeable parts, Internet of things, Isaac Newton, job automation, Julian Assange, Kevin Kelly, license plate recognition, lifelogging, Lyft, Mark Zuckerberg, Marshall McLuhan, meta analysis, meta-analysis, microbiome, Nate Silver, natural language processing, Network effects, Nicholas Carr, obamacare, pattern recognition, personalized medicine, phenotype, placebo effect, RAND corporation, randomized controlled trial, Second Machine Age, self-driving car, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, Snapchat, social graph, speech recognition, stealth mode startup, Steve Jobs, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Uber for X, uber lyft, Watson beat the top human players on Jeopardy!, WikiLeaks, X Prize

When Sanofi and Regeneron were looking to expedite recruitment of patients with high cholesterol for their new, experimental drug alirocumab, an antibody against the PCSK9 protein, they turned to the American College of Cardiology registry.108 Another approach, developed by researchers at Case Western Reserve University, is a software tool known as “Trial Prospector,” which delves into clinical data systems to match patients with clinical trials.109 It combines artificial intelligence and natural language processing to automate the patient screening and enrollment process, often a rate-limiting step in developing new drugs. Automated clinical trial matching programs for specific conditions, such as the Alzheimer’s Association Trialmatch,107 are proliferating. Data mining to facilitate clinical trial recruitment is offered by a number of companies, such as Blue Chip Marketing Worldwide and Acurian.110 Ben Goldacre, the acclaimed author and one of the leading independent critics and innovators in pharma research, set up the tool “RandomiseMe,” which makes it “easy to run randomized clinical trials on yourself and your friends.”111 So although clinical trial participation is remarkably rare today, there are efforts on multiple fronts to change that in the future.

Cultural change is exceedingly difficult, but given the other forces in the iMedicine galaxy, especially the health care economic crisis that has engendered desperation, it may be possible to accomplish. An aggressive commitment to the education and training of practicing physicians to foster their use of the new tools would not only empower their patients, but also themselves. Eliminating the enormous burden of electronic charting or use of scribes by an all-out effort for natural language processing of voice during a visit would indeed be liberating. It’s long overdue for physicians and health professionals to be constantly cognizant of actual costs, eliminate unnecessary tests and procedures,75a and engage in exquisite electronic communication, which includes e-mail, and sharing notes and all data. If financial incentives are needed, they may be well worth the investment. Data Scientists Government and recalcitrant doctors are major potential impediments, but the biggest bottleneck to advancing the field is unquestionably dealing with data.


Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran, Jeff Hammerbacher

23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, database schema, double helix, en.wikipedia.org, epigenetics, fault tolerance, Firefox, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, longitudinal study, Mars Rover, natural language processing, openstreetmap, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social graph, SPARQL, speech recognition, statistical model, supply-chain management, text mining, Vernor Vinge, web application

Aside from R’s core functionality, some of the add-on packages we used include corrgram, flowCore, gclus, geneplotter, plyr, and pixmap. Good overviews of clustering, loess, and other machine learning techniques are in The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (Springer; 2008). The section on tags barely touches the surface of statistical language analysis. For more, see the chapters on corpus linguistics from Foundations of Statistical Natural Language Processing by Christopher Manning and Hinrich Schütze (MIT Press; 1999) and also Speech and Language Processing by Daniel Jurafsky and James H. Martin (Prentice Hall; 2008). There are many better ways for estimating confidence intervals for the attractiveness versus age analysis. One method is partial pooling; see pp. 252–258 of Andrew Gelman and Jennifer Hill’s Data Analysis Using Regression and Multilevel/Hierarchical Models (Cambridge University Press; 2006).

He writes and speaks regularly on the interface of web technology with science and the successful (and unsuccessful) application of generic and specially designed tools in the academic research environment. Peter Norvig is director of research at Google Inc. He is a Fellow of the AAAI and the ACM and coauthor of Artificial Intelligence: A Modern Approach (Prentice Hall), the leading textbook in the field. Previously he was head of computational sciences at NASA and a faculty member at USC and Berkeley. Brendan O’Connor is a researcher in machine learning and natural language processing. He is a scientific consultant at Dolores Labs and worked previously as a relevance engineer at Powerset. He received a BS and MS in symbolic systems from Stanford University, and is back to academia this fall as a graduate student at Carnegie Mellon University. His blog, “Artificial Intelligence and Social Science,” is at http://anyall.org/blog. David Poole is a member of the Statistics Research Department at AT&T Labs and was recently the secretary/treasurer of the Section on Statistical Computing of the American Statistical Association.


pages: 215 words: 59,188

Seriously Curious: The Facts and Figures That Turn Our World Upside Down by Tom Standage

agricultural Revolution, augmented reality, autonomous vehicles, blood diamonds, corporate governance, Deng Xiaoping, Donald Trump, Elon Musk, failed state, financial independence, gender pay gap, gig economy, Gini coefficient, high net worth, income inequality, index fund, industrial robot, Internet of things, invisible hand, job-hopping, Julian Assange, life extension, Lyft, M-Pesa, Mahatma Gandhi, manufacturing employment, mega-rich, megacity, Minecraft, mobile money, natural language processing, Nelson Mandela, plutocrats, Plutocrats, price mechanism, purchasing power parity, ransomware, reshoring, ride hailing / ride sharing, Ronald Coase, self-driving car, Silicon Valley, Snapchat, South China Sea, speech recognition, stem cell, supply-chain management, transaction costs, Uber and Lyft, uber lyft, undersea cable, US Airways Flight 1549, WikiLeaks

The original approach to getting computers to understand human language was to use sets of precise rules – for example, in translation, a set of grammar rules for breaking down the meaning of the source language, and another set for reproducing the meaning in the target language. But after a burst of optimism in the 1950s, such systems could not be made to work on complex new sentences; the rules-based approach would not scale up. Funding for so-called natural-language processing went into hibernation for decades, until a renaissance in the late 1980s. Then a new approach emerged, based on machine learning – a technique in which computers are trained using lots of examples, rather than being explicitly programmed. For speech recognition, computers are fed sound files on the one hand, and human-written transcriptions on the other. The system learns to predict which sounds should result in what transcriptions.


Data Mining: Concepts and Techniques: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

Mining knowledge in cube space can substantially enhance the power and flexibility of data mining.

■ Data mining—an interdisciplinary effort: The power of data mining can be substantially enhanced by integrating new methods from multiple disciplines. For example, to mine data with natural language text, it makes sense to fuse data mining methods with methods of information retrieval and natural language processing. As another example, consider the mining of software bugs in large programs. This form of mining, known as bug mining, benefits from the incorporation of software engineering knowledge into the data mining process.

■ Boosting the power of discovery in a networked environment: Most data objects reside in a linked or interconnected environment, whether it be the Web, database relations, files, or documents.

Semantic annotation of a frequent pattern Figure 7.12 shows an example of a semantic annotation for the pattern “{frequent, pattern}.” This dictionary-like annotation provides semantic information related to “{frequent, pattern},” consisting of its strongest context indicators, the most representative data transactions, and the most semantically similar patterns. This kind of semantic annotation is similar to natural language processing. The semantics of a word can be inferred from its context, and words sharing similar contexts tend to be semantically similar. The context indicators and the representative transactions provide a view of the context of the pattern from different angles to help users understand the pattern. The semantically similar patterns provide a more direct connection between the pattern and any other patterns already known to the users.
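The principle that items sharing similar contexts tend to be semantically similar can be illustrated with a small distributional-similarity sketch. The sentences, window size, and function names are illustrative assumptions, not the book's system:

```python
from collections import Counter
from math import sqrt

def context_vectors(sentences, window=2):
    # Build a bag-of-contexts vector for each word: the words that
    # appear within a fixed window around it.
    vectors = {}
    for sent in sentences:
        words = sent.lower().split()
        for i, w in enumerate(words):
            ctx = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            vectors.setdefault(w, Counter()).update(ctx)
    return vectors

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

sents = ["the cat sat on the mat",
         "the dog sat on the rug",
         "the cat chased the dog"]
v = context_vectors(sents)
# "cat" and "dog" occur in near-identical contexts, so they score
# higher than "cat" and "mat":
print(cosine(v["cat"], v["dog"]) > cosine(v["cat"], v["mat"]))  # True
```

The same comparison of context indicators is what lets the annotation rank "semantically similar patterns" for a frequent pattern.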

Artificial Intelligence (AAAI’10) Atlanta, GA. (July 2010), pp. 1671–1675. [RH01] Raman, V.; Hellerstein, J.M., Potter's wheel: An interactive data cleaning system, In: Proc. 2001 Int. Conf. Very Large Data Bases (VLDB’01) Rome, Italy. (Sept. 2001), pp. 381–390. [RH07] Rosenberg, A.; Hirschberg, J., V-measure: A conditional entropy-based external cluster evaluation measure, In: Proc. 2007 Joint Conf. Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL’07) Prague, Czech Republic. (June 2007), pp. 410–420. [RHS01] Roddick, J.F.; Hornsby, K.; Spiliopoulou, M., An updated bibliography of temporal, spatial, and spatio-temporal data mining research, In: (Editors: Roddick, J.F.; Hornsby, K.) Lecture Notes in Computer Science 2007 (2001) Springer, New York, pp. 147–163; TSDM 2000. [RHW86] Rumelhart, D.E.; Hinton, G.E.; Williams, R.J., Learning internal representations by error propagation, In: (Editors: Rumelhart, D.E.; McClelland, J.L.)


pages: 219 words: 63,495

50 Future Ideas You Really Need to Know by Richard Watson

23andMe, 3D printing, access to a mobile phone, Albert Einstein, artificial general intelligence, augmented reality, autonomous vehicles, BRICs, Buckminster Fuller, call centre, clean water, cloud computing, collaborative consumption, computer age, computer vision, crowdsourcing, dark matter, dematerialisation, digital Maoism, digital map, Elon Musk, energy security, failed state, future of work, Geoffrey West, Santa Fe Institute, germ theory of disease, global pandemic, happiness index / gross national happiness, hive mind, hydrogen economy, Internet of things, Jaron Lanier, life extension, Mark Shuttleworth, Marshall McLuhan, megacity, natural language processing, Network effects, new economy, oil shale / tar sands, pattern recognition, peak oil, personalized medicine, phenotype, precision agriculture, profit maximization, RAND corporation, Ray Kurzweil, RFID, Richard Florida, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Skype, smart cities, smart meter, smart transportation, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, supervolcano, telepresence, The Wisdom of Crowds, Thomas Malthus, Turing test, urban decay, Vernor Vinge, Watson beat the top human players on Jeopardy!, web application, women in the workforce, working-age population, young professional

the condensed idea: Thought control

timeline
2000 Electrode arrays implanted into owl monkeys
2001 Technology allows a monkey to operate a robotic arm via thought control
2006 Teenager plays Space Invaders using brain signals
2008 Scientists manage to extract images from a person’s mind
2009 Brain–Twitter interface
2017 Voice control replaces 70 percent of keyboards
2026 Google patents neural interface

33 Avatar assistants

Computer-based avatars are virtual recreations of real or fictional characters used in forms of computer gaming and in virtual online communities. In the near future they will become common as intelligent digital assistants or personal agents, controlled by forms of artificial intelligence such as natural language processing and accessed via mobile or fixed devices. “Everything is backward now, like out there is the true world, and in here is the dream.” Jake Sully in the movie Avatar

Apple’s iPhone 4S offers a tantalizing glimpse of the future in the form of Siri, an application that allows users to employ normal language to send messages or ask questions. But this is a very basic technology compared with what’s to come.


pages: 247 words: 71,698

Avogadro Corp by William Hertling

Any sufficiently advanced technology is indistinguishable from magic, cloud computing, crowdsourcing, Hacker Ethic, hive mind, invisible hand, natural language processing, Netflix Prize, private military company, Ray Kurzweil, recommendation engine, Richard Stallman, Ruby on Rails, standardized shipping container, technological singularity, Turing test, web application, WikiLeaks

David noticed that Rebecca Smith was standing in the doorway listening to the presentation. In a sharp tailored suit, and with her reputation hovering about her like an invisible aura, the Avogadro CEO made for an imposing presence. Only her warm smile left a welcoming space in which an ordinary guy like David could stand. She nodded to David as she came in and took her seat at the head of the table. Kenneth asked, “But what you’re describing, how does it work? Natural language processing ability of computers doesn’t even come close to being able to understand the semantics of human language. Have you had some miracle breakthrough?” “At the heart of how this works is the field of recommendation algorithms,” David explained. “Sean hired me not because I knew anything about language analysis but because I was a leading competitor in the Netflix competition. Netflix recommends movies that you’d enjoy watching.


pages: 244 words: 66,977

Subscribed: Why the Subscription Model Will Be Your Company's Future - and What to Do About It by Tien Tzuo, Gabe Weisert

3D printing, Airbnb, airport security, Amazon Web Services, augmented reality, autonomous vehicles, blockchain, Build a better mousetrap, business cycle, business intelligence, business process, call centre, cloud computing, cognitive dissonance, connected car, death of newspapers, digital twin, double entry bookkeeping, Elon Musk, factory automation, fiat currency, Internet of things, inventory management, iterative process, Jeff Bezos, Kevin Kelly, Lean Startup, Lyft, manufacturing employment, minimum viable product, natural language processing, Network effects, Nicholas Carr, nuclear winter, pets.com, profit maximization, race to the bottom, ride hailing / ride sharing, Sand Hill Road, shareholder value, Silicon Valley, skunkworks, smart meter, social graph, software as a service, spice trade, Steve Ballmer, Steve Jobs, subscription business, Tim Cook: Apple, transport as a service, Uber and Lyft, uber lyft, Y2K, Zipcar

This transformation is what allows GE to survive and remain on the Fortune 500 list. IBM was #61 on the Fortune 500 list in 1955, and it’s #32 on the list today. IBM originally sold commercial scales and punch card tabulators. Today it sells IT and quantum computing services. It has completely transformed from a product manufacturer into a business services giant. IBM is now working on Watson—a technology platform that uses natural language processing and machine learning to reveal insights from large amounts of unstructured data. It has Bob Dylan chatting with an artificial intelligence system in its advertisements. It is now in the business of cognitive services—a pretty exciting departure from where the company started. In fact, 12 percent of the companies on the 1955 Fortune 500 list are still on it today, and most of them have similarly transformed.


pages: 265 words: 69,310

What's Yours Is Mine: Against the Sharing Economy by Tom Slee

4chan, Airbnb, Amazon Mechanical Turk, asset-backed security, barriers to entry, Berlin Wall, big-box store, bitcoin, blockchain, citizen journalism, collaborative consumption, congestion charging, Credit Default Swap, crowdsourcing, data acquisition, David Brooks, don't be evil, gig economy, Hacker Ethic, income inequality, informal economy, invisible hand, Jacob Appelbaum, Jane Jacobs, Jeff Bezos, Khan Academy, Kibera, Kickstarter, license plate recognition, Lyft, Marc Andreessen, Mark Zuckerberg, move fast and break things, move fast and break things, natural language processing, Netflix Prize, Network effects, new economy, Occupy movement, openstreetmap, Paul Graham, peer-to-peer, peer-to-peer lending, Peter Thiel, pre–internet, principal–agent problem, profit motive, race to the bottom, Ray Kurzweil, recommendation engine, rent control, ride hailing / ride sharing, sharing economy, Silicon Valley, Snapchat, software is eating the world, South of Market, San Francisco, TaskRabbit, The Nature of the Firm, Thomas L Friedman, transportation-network company, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, ultimatum game, urban planning, WikiLeaks, winner-take-all economy, Y Combinator, Zipcar

In July 2014 Airbnb tried to encourage more critical reviews by holding back the publication of reviews until both parties had submitted a review of the other; neither the company nor researchers with access to the company’s data have commented on the success of the change. In another experiment, Airbnb staff are working with external researchers to test whether offering a reward to encourage reviews has any effect on the number of critical reviews that guests provide.24 Other efforts are trying to squeeze more critical information from what is already there. Airbnb is using natural language processing to parse critical comments from review texts.25 Researchers have shown that taking missing reviews into account can give a much more effective measure of seller quality.26 The problem with such efforts is that, if systems were changed so that missing reviews or passive-aggressive text comments were known to be recorded (and so became, implicitly, a negative review), customer behavior might change to avoid the threat of a negative (non-)review in return.


pages: 224 words: 13,238

Electronic and Algorithmic Trading Technology: The Complete Guide by Kendall Kim

algorithmic trading, automated trading system, backtesting, commoditize, computerized trading, corporate governance, Credit Default Swap, diversification, en.wikipedia.org, family office, financial innovation, fixed income, index arbitrage, index fund, interest rate swap, linked data, market fragmentation, money market fund, natural language processing, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, short selling, statistical arbitrage, Steven Levy, transaction costs, yield curve

At the moment, big strategic decisions such as which shares to buy or sell are made by human traders; algorithmic programs are then given the power to decide how to buy or sell shares, with the aim of hiding the client’s intentions. Executing algorithms are designed to be stealthy and create as little volatility as possible. The fact that they are designed to reduce the market impact of trades should in fact have a stabilizing effect in equity markets. Some day, advances in natural language processing and statistical analysis might lead to algorithms capable of analyzing news feeds, deciding which shares to buy and sell, and devising their own strategies. Broker dealers, software vendors, and now investment institutions are entering the algorithmic arms race. Since there are so many possible trading strategies, it is doubtful that there will turn out to be one single trading algorithm that outperforms all others.


pages: 222 words: 70,132

Move Fast and Break Things: How Facebook, Google, and Amazon Cornered Culture and Undermined Democracy by Jonathan Taplin

1960s counterculture, affirmative action, Affordable Care Act / Obamacare, Airbnb, Amazon Mechanical Turk, American Legislative Exchange Council, Apple's 1984 Super Bowl advert, back-to-the-land, barriers to entry, basic income, battle of ideas, big data - Walmart - Pop Tarts, bitcoin, Brewster Kahle, Buckminster Fuller, Burning Man, Clayton Christensen, commoditize, creative destruction, crony capitalism, crowdsourcing, data is the new oil, David Brooks, David Graeber, don't be evil, Donald Trump, Douglas Engelbart, Douglas Engelbart, Dynabook, Edward Snowden, Elon Musk, equal pay for equal work, Erik Brynjolfsson, future of journalism, future of work, George Akerlof, George Gilder, Google bus, Hacker Ethic, Howard Rheingold, income inequality, informal economy, information asymmetry, information retrieval, Internet Archive, Internet of things, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: technological unemployment, John von Neumann, Joseph Schumpeter, Kevin Kelly, Kickstarter, labor-force participation, life extension, Marc Andreessen, Mark Zuckerberg, Menlo Park, Metcalfe’s law, Mother of all demos, move fast and break things, move fast and break things, natural language processing, Network effects, new economy, Norbert Wiener, offshore financial centre, packet switching, Paul Graham, paypal mafia, Peter Thiel, plutocrats, Plutocrats, pre–internet, Ray Kurzweil, recommendation engine, rent-seeking, revision control, Robert Bork, Robert Gordon, Robert Metcalfe, Ronald Reagan, Ross Ulbricht, Sam Altman, Sand Hill Road, secular stagnation, self-driving car, sharing economy, Silicon Valley, Silicon Valley ideology, smart grid, Snapchat, software is eating the world, Steve Jobs, Stewart Brand, technoutopianism, The Chicago School, The Market for Lemons, The Rise and Fall of American Growth, Tim Cook: Apple, trade route, transfer pricing, Travis Kalanick, trickle-down economics, Tyler Cowen: Great Stagnation, universal 
basic income, unpaid internship, We wanted flying cars, instead we got 140 characters, web application, Whole Earth Catalog, winner-take-all economy, women in the workforce, Y Combinator

During the 2016 presidential campaign, Donald Trump regularly boasted about his ten million Twitter followers, even though (according to the site StatusPeople, which tracks how many Twitter accounts are bots, how many are inactive, and how many are real) only 21 percent of Trump’s Twitter followers are real, active users on the platform. Hillary Clinton didn’t fare much better, with only 30 percent of her followers classified as real. During the 2012 presidential race, the Annenberg Innovation Lab studied Twitter and politics, and what we found was pretty disturbing. We created a natural-language-processing computer model that read every tweet about every candidate and sorted them by sentiment. At the beginning I loved reading the dashboard of the twenty most positive and negative tweets of the previous hour. But within weeks the incredible amount of racist tweets directed at our president became too painful to look at. The anonymity that Twitter provides is a shield that brings out the worst in humans.
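A pipeline of that general shape can be sketched with a tiny word lexicon; the lexicon, sample tweets, and scoring scheme below are illustrative assumptions, not the Annenberg model:

```python
# Toy lexicon-based sentiment scorer: sum word polarities per tweet,
# then sort tweets from most positive to most negative.
LEXICON = {"love": 2, "great": 1, "good": 1, "bad": -1, "hate": -2, "awful": -2}

def sentiment(tweet: str) -> int:
    # Strip trailing punctuation and look each word up in the lexicon.
    return sum(LEXICON.get(w.strip(".,!?").lower(), 0) for w in tweet.split())

def rank_tweets(tweets):
    # Most positive first; Python's sort is stable, so ties keep input order.
    return sorted(tweets, key=sentiment, reverse=True)

tweets = ["I love this candidate!", "Such a bad debate.", "Great speech, good ideas."]
ranked = rank_tweets(tweets)  # most negative tweet ends up last
```

Real sentiment models weight context and negation; a bare lexicon like this is only the starting point.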


pages: 237 words: 64,411

Humans Need Not Apply: A Guide to Wealth and Work in the Age of Artificial Intelligence by Jerry Kaplan

Affordable Care Act / Obamacare, Amazon Web Services, asset allocation, autonomous vehicles, bank run, bitcoin, Bob Noyce, Brian Krebs, business cycle, buy low sell high, Capital in the Twenty-First Century by Thomas Piketty, combinatorial explosion, computer vision, corporate governance, crowdsourcing, en.wikipedia.org, Erik Brynjolfsson, estate planning, Flash crash, Gini coefficient, Goldman Sachs: Vampire Squid, haute couture, hiring and firing, income inequality, index card, industrial robot, information asymmetry, invention of agriculture, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, Loebner Prize, Mark Zuckerberg, mortgage debt, natural language processing, Own Your Own Home, pattern recognition, Satoshi Nakamoto, school choice, Schrödinger's Cat, Second Machine Age, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Skype, software as a service, The Chicago School, The Future of Employment, Turing test, Watson beat the top human players on Jeopardy!, winner-take-all economy, women in the workforce, working poor, Works Progress Administration

Jason Brewster, the company’s CEO, estimates that FairDocument reduces the time required to complete a straightforward estate plan from several hours to as little as fifteen to thirty minutes, not to mention that his company is doing the prospecting for new clients and delivering them to the attorneys. A more sophisticated example of synthetic intellects encroaching on legal expertise is the startup Judicata.34 The company uses machine learning and natural language processing techniques to convert ordinary text—such as legal principles or specific cases—into structured information that can be used for finding relevant case law. For instance, it could find all cases in which a male Hispanic gay employee successfully sued for wrongful termination by reading the actual text of court decisions, saving countless hours in a law library or using a more traditional electronic search tool.


pages: 1,331 words: 183,137

Programming Rust: Fast, Safe Systems Development by Jim Blandy, Jason Orendorff

bioinformatics, bitcoin, Donald Knuth, Elon Musk, Firefox, mandelbrot fractal, MVC pattern, natural language processing, side project, sorting algorithm, speech recognition, Turing test, type inference, WebSocket

The care Rust takes with references, mutability, and lifetimes is valuable enough in single-threaded programs, but it is in concurrent programming that the true significance of those rules becomes apparent. They make it possible to expand your toolbox, to hack multiple styles of multithreaded code quickly and correctly—without skepticism, without cynicism, without fear.

Fork-Join Parallelism

The simplest use cases for threads arise when we have several completely independent tasks that we’d like to do at once. For example, suppose we’re doing natural language processing on a large corpus of documents. We could write a loop:

fn process_files(filenames: Vec<String>) -> io::Result<()> {
    for document in filenames {
        let text = load(&document)?;      // read source file
        let results = process(text);      // compute statistics
        save(&document, results)?;        // write output file
    }
    Ok(())
}

The program would run as shown in Figure 19-1.

Figure 19-1. Single-threaded execution of process_files()

Since each document is processed separately, it’s relatively easy to speed this task up by splitting the corpus into chunks and processing each chunk on a separate thread, as shown in Figure 19-2.

A fork-join program is deterministic as long as the threads are really isolated, like the compute threads in the Mandelbrot program. The program always produces the same result, regardless of variations in thread speed. It’s a concurrency model without race conditions. The main disadvantage of fork-join is that it requires isolated units of work. Later in this chapter, we’ll consider some problems that don’t split up so cleanly. For now, let’s stick with the natural language processing example. We’ll show a few ways of applying the fork-join pattern to the process_files function.

spawn and join

The function std::thread::spawn starts a new thread:

spawn(|| {
    println!("hello from a child thread");
})

It takes one argument, a FnOnce closure or function. Rust starts a new thread to run the code of that closure or function. The new thread is a real operating system thread with its own stack, just like threads in C++, C#, and Java.
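The same fork-join shape is easy to sketch in Python with a thread pool; process here is a hypothetical stand-in for the per-document statistics step, not the book's function:

```python
from concurrent.futures import ThreadPoolExecutor

def process(text: str) -> int:
    # Stand-in "statistics": count the words in one document.
    return len(text.split())

def process_all(documents):
    # Fork: one task per document; join: collect results in input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(process, documents))

docs = ["a b c", "d e", "f"]
counts = process_all(docs)  # deterministic regardless of thread timing
```

Because each task touches only its own document, the result is the same on every run, which is exactly the determinism property of fork-join described above.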


pages: 931 words: 79,142

Concepts, Techniques, and Models of Computer Programming by Peter Van-Roy, Seif Haridi

computer age, Debian, discrete time, Donald Knuth, Eratosthenes, fault tolerance, G4S, general-purpose programming language, George Santayana, John von Neumann, Lao Tzu, Menlo Park, natural language processing, NP-complete, Paul Graham, premature optimization, sorting algorithm, Therac-25, Turing complete, Turing machine, type inference

declarative and relational computation models do logic programming. Sections 9.4 through 9.6 give large examples in three areas that are particularly well-suited to relational programming, namely natural language parsing, interpreters, and deductive databases. Section 9.7 gives an introduction to Prolog, a programming language based on relational programming. Prolog was originally designed for natural language processing, but has become one of the main programming languages in all areas that require symbolic programming.

9.1 The relational computation model

9.1.1 The choice and fail statements

The relational computation model extends the declarative model with two new statements, choice and fail: The choice statement groups together a set of alternative statements. Executing a choice statement provisionally picks one of these alternatives.
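The search semantics behind choice and fail can be sketched in Python with generators: yielding each alternative plays the role of choice, and a branch that yields nothing abandons that path, like fail. This is an illustration of the idea, not Oz syntax:

```python
def choice(alternatives):
    # 'choice': provisionally pick each alternative in turn.
    yield from alternatives

def pythagorean():
    # Enumerate all combinations of choices; any branch that never
    # reaches a yield silently 'fails' and the search backtracks.
    for a in choice(range(1, 10)):
        for b in choice(range(1, 10)):
            for c in choice(range(1, 10)):
                if a * a + b * b == c * c:
                    yield (a, b, c)

solutions = list(pythagorean())  # all single-digit Pythagorean triples
```

Listing all solutions corresponds to exhausting the search tree; taking just the first corresponds to committing to the first successful choice.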

For information on the history of Prolog and its implementation technology, see [45, 216]. Prolog is generally used in application areas in which complex symbolic manipulations are needed, such as expert systems, specialized language translators, program generation, data transformation, knowledge processing, deductive databases, and theorem proving. There are two application areas in which Prolog is still predominant over other languages: natural language processing and constraint programming. The latter in particular has matured from being a subfield of logic programming into being a field in its own right, with conferences, practical systems, and industrial applications. Prolog has many advantages for such applications. The bulk of programming can be done cleanly in its pure declarative subset. Programs are concise due to the expressiveness of unification and the term notation.

A Primer of Algol 60 Programming. Academic Press, 1962.
Edsger W. Dijkstra. Go To statement considered harmful. Communications of the ACM, 11(3):147–148, March 1968.
Denys Duchier. Loop support. Technical report, Mozart Consortium, 2003. Available at http://www.mozart-oz.org/.
[56] Denys Duchier, Claire Gardent, and Joachim Niehren. Concurrent constraint programming in Oz for natural language processing. Technical report, Saarland University, Saarbrücken, Germany, 1999. Available at http://www.ps.uni-sb.de/Papers/abstracts/oznlp.html.
[57] Denys Duchier, Leif Kornstaedt, and Christian Schulte. The Oz base environment. Technical report, Mozart Consortium, 2003. Available at http://www.mozart-oz.org/.
[58] Denys Duchier, Leif Kornstaedt, Christian Schulte, and Gert Smolka. A higher-order module discipline with separate compilation, dynamic linking, and pickling.


pages: 267 words: 72,552

Reinventing Capitalism in the Age of Big Data by Viktor Mayer-Schönberger, Thomas Ramge

accounting loophole / creative accounting, Air France Flight 447, Airbnb, Alvin Roth, Atul Gawande, augmented reality, banking crisis, basic income, Bayesian statistics, bitcoin, blockchain, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, Cass Sunstein, centralized clearinghouse, Checklist Manifesto, cloud computing, cognitive bias, conceptual framework, creative destruction, Daniel Kahneman / Amos Tversky, disruptive innovation, Donald Trump, double entry bookkeeping, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Ford paid five dollars a day, Frederick Winslow Taylor, fundamental attribution error, George Akerlof, gig economy, Google Glasses, information asymmetry, interchangeable parts, invention of the telegraph, inventory management, invisible hand, James Watt: steam engine, Jeff Bezos, job automation, job satisfaction, joint-stock company, Joseph Schumpeter, Kickstarter, knowledge worker, labor-force participation, land reform, lone genius, low cost airline, low cost carrier, Marc Andreessen, market bubble, market design, market fundamentalism, means of production, meta analysis, meta-analysis, Moneyball by Michael Lewis explains big data, multi-sided market, natural language processing, Network effects, Norbert Wiener, offshore financial centre, Parag Khanna, payday loans, peer-to-peer lending, Peter Thiel, Ponzi scheme, prediction markets, price anchoring, price mechanism, purchasing power parity, random walk, recommendation engine, Richard Thaler, ride hailing / ride sharing, Sam Altman, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, smart grid, smart meter, Snapchat, statistical model, Steve Jobs, technoutopianism, The Future of Employment, The Market for Lemons, The Nature of the Firm, transaction costs, universal basic income, William Langewiesche, Y Combinator

The system works in exactly the same way as the machine learning systems we described in Chapter 4: customers don’t have to make their needs and wants explicit, because the systems learn from how humans interact with the world around them. Feedback also plays a crucial role at Stitch Fix. To begin with, every item a customer returns generates data. But customers are strongly encouraged to comment on each item they receive and they can do so in plain English, which, with the help of natural-language processing software, further refines a customer’s preferences. Stitch Fix is also developing its own line of apparel, which uses preference data in the design process. Stitch Fix’s simple secret is that it understands data-rich markets and the crucial role data plays in customer satisfaction. As they put it: “Rich data on both sides of this ‘market’ enables Stitch Fix to be a matchmaker, connecting clients with styles they love (and never would’ve found on their own).”


pages: 589 words: 69,193

Mastering Pandas by Femi Anthony

Amazon Web Services, Bayesian statistics, correlation coefficient, correlation does not imply causation, Debian, en.wikipedia.org, Internet of things, natural language processing, p-value, random walk, side project, statistical model, Thomas Bayes

Among the characteristics that make Python popular for data science are its very user-friendly (human-readable) syntax, the fact that it is interpreted rather than compiled (leading to faster development time), and its very comprehensive library for parsing and analyzing data, as well as its capacity for doing numerical and statistical computations. Python has libraries that provide a complete toolkit for data science and analysis. The major ones are as follows:

NumPy: General-purpose array functionality with an emphasis on numeric computation
SciPy: Numerical computing
Matplotlib: Graphics
pandas: Series and data frames (1D and 2D array-like types)
Scikit-Learn: Machine learning
NLTK: Natural language processing
Statstool: Statistical analysis

For this book, we will be focusing on the fourth library in this list, pandas.

What is pandas?

pandas is a high-performance open source library for data analysis in Python, developed by Wes McKinney in 2008. Over the years, it has become the de facto standard library for data analysis using Python. The tool has seen great adoption, with a large community behind it (220+ contributors and 9000+ commits by 03/2014), rapid iteration, and features and enhancements continuously added.
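A minimal sketch of the Series and DataFrame types described above; the column names and values are made up:

```python
import pandas as pd

# A DataFrame is a 2D labeled table; each of its columns is a Series.
df = pd.DataFrame({"city": ["Oslo", "Lagos", "Lima"],
                   "temp_c": [4, 31, 18]})

hot = df[df["temp_c"] > 10]      # boolean-mask row selection
mean_temp = df["temp_c"].mean()  # column-wise aggregation
```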


pages: 268 words: 75,850

The Formula: How Algorithms Solve All Our Problems-And Create More by Luke Dormehl

3D printing, algorithmic trading, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, big data - Walmart - Pop Tarts, call centre, Cass Sunstein, Clayton Christensen, commoditize, computer age, death of newspapers, deferred acceptance, disruptive innovation, Edward Lorenz: Chaos theory, Erik Brynjolfsson, Filter Bubble, Flash crash, Florence Nightingale: pie chart, Frank Levy and Richard Murnane: The New Division of Labor, Google Earth, Google Glasses, High speed trading, Internet Archive, Isaac Newton, Jaron Lanier, Jeff Bezos, job automation, John Markoff, Kevin Kelly, Kodak vs Instagram, lifelogging, Marshall McLuhan, means of production, Nate Silver, natural language processing, Netflix Prize, Panopticon Jeremy Bentham, pattern recognition, price discrimination, recommendation engine, Richard Thaler, Rosa Parks, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Slavoj Žižek, social graph, speech recognition, Steve Jobs, Steven Levy, Steven Pinker, Stewart Brand, the scientific method, The Signal and the Noise by Nate Silver, upwardly mobile, Wall-E, Watson beat the top human players on Jeopardy!, Y Combinator

If such a tool was to be implemented within a future edition of MS Word or Google Docs, it is not inconceivable that users may one day finish typing a document and hit a single button—at which point it is auto-checked for spelling, punctuation, formatting and truthfulness. Already there is widespread use of algorithms in academia for sifting through submitted work and pulling up passages that may or may not be plagiarized. These will only become more widespread as natural language processing becomes more intuitive and able to move beyond simple passage comparison to detailed content and idea analysis. There is no one-size-fits-all answer to how best to deal with algorithms. In some cases, increased transparency would appear to be the answer. Where algorithms are used to enforce laws, for instance, releasing the source code to the general public would both protect against the dangers of unchecked government policy-making and make it possible to determine how specific decisions have been reached.
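Simple passage comparison of the kind described can be sketched as word n-gram overlap (Jaccard similarity); the sample sentences and the flagging threshold are illustrative assumptions:

```python
def ngrams(text: str, n: int = 3):
    # Set of overlapping word n-grams, lowercased for comparison.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: str, b: str, n: int = 3) -> float:
    # Shared n-grams divided by total distinct n-grams.
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

submitted = "the quick brown fox jumps over the lazy dog"
source = "a quick brown fox jumps over a sleeping dog"
score = jaccard(submitted, source)
flagged = score > 0.2  # illustrative threshold
```

Moving beyond this kind of surface comparison to content and idea analysis is exactly the harder step the paragraph anticipates.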


pages: 276 words: 81,153

Outnumbered: From Facebook and Google to Fake News and Filter-Bubbles – the Algorithms That Control Our Lives by David Sumpter

affirmative action, Bernie Sanders, correlation does not imply causation, crowdsourcing, don't be evil, Donald Trump, Elon Musk, Filter Bubble, Google Glasses, illegal immigration, Jeff Bezos, job automation, Kenneth Arrow, Loebner Prize, Mark Zuckerberg, meta analysis, meta-analysis, Minecraft, Nate Silver, natural language processing, Nelson Mandela, p-value, prediction markets, random walk, Ray Kurzweil, Robert Mercer, selection bias, self-driving car, Silicon Valley, Skype, Snapchat, speech recognition, statistical model, Stephen Hawking, Steven Pinker, The Signal and the Noise by Nate Silver, traveling salesman, Turing test

To satisfy our constant demand for the latest information, Google, Yahoo! and other Internet giants need to build systems that automatically track political changes, football transfer rumours and contestants in The Voice. The algorithms need to learn to understand new analogies and concepts by reading newspapers, checking Wikipedia and following social media. Jeffrey Pennington and his colleagues at the Stanford Natural Language Processing Group have found an elegant way of training an algorithm to learn about analogies from web pages. Their algorithm, known as GloVe (global vectors for word representation), learns by reading a very large amount of text. In a 2014 article, Jeffrey trained GloVe on the whole of Wikipedia, which at that point totalled 1.6 billion words and symbols, together with the fifth edition of Gigaword, which is a database of 4.3 billion words and symbols downloaded from news sites around the world.
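The analogy arithmetic that word vectors support can be sketched with made-up 3-dimensional vectors standing in for real GloVe embeddings (which have hundreds of dimensions); the words and values are illustrative:

```python
import math

# Made-up 3-d "embeddings"; real GloVe vectors are learned from text.
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.0, 0.1, 0.0],
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# king - man + woman should land nearest queen.
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
best = max((w for w in vecs if w not in ("king", "man", "woman")),
           key=lambda w: cosine(target, vecs[w]))
```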


pages: 301 words: 85,263

New Dark Age: Technology and the End of the Future by James Bridle

AI winter, Airbnb, Alfred Russel Wallace, Automated Insights, autonomous vehicles, back-to-the-land, Benoit Mandelbrot, Bernie Sanders, bitcoin, British Empire, Brownian motion, Buckminster Fuller, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, cognitive bias, cognitive dissonance, combinatorial explosion, computer vision, congestion charging, cryptocurrency, data is the new oil, Donald Trump, Douglas Engelbart, Douglas Engelbart, Douglas Hofstadter, drone strike, Edward Snowden, fear of failure, Flash crash, Google Earth, Haber-Bosch Process, hive mind, income inequality, informal economy, Internet of things, Isaac Newton, John von Neumann, Julian Assange, Kickstarter, late capitalism, lone genius, mandelbrot fractal, meta analysis, meta-analysis, Minecraft, mutually assured destruction, natural language processing, Network effects, oil shock, p-value, pattern recognition, peak oil, recommendation engine, road to serfdom, Robert Mercer, Ronald Reagan, self-driving car, Silicon Valley, Silicon Valley ideology, Skype, social graph, sorting algorithm, South China Sea, speech recognition, Spread Networks laid a new fibre optics cable between New York and Chicago, stem cell, Stuxnet, technoutopianism, the built environment, the scientific method, Uber for X, undersea cable, University of East Anglia, uranium enrichment, Vannevar Bush, WikiLeaks

‘HP cameras are racist’, YouTube video, username: wzamen01, December 10, 2009. 14.David Smith, ‘“Racism” of early colour photography explored in art exhibition’, Guardian, January 25, 2013, theguardian.com. 15.Phillip Martin, ‘How A Cambridge Woman’s Campaign Against Polaroid Weakened Apartheid’, WGBH News, December 9, 2013, news.wgbh.org. 16.Hewlett-Packard, ‘Global Citizenship Report 2009’, hp.com. 17.Trevor Paglen, ‘re:publica 2017 | Day 3 – Livestream Stage 1 – English’, YouTube video, username: re:publica, May 10, 2017. 18.Walter Benjamin, ‘Theses on the Philosophy of History’, in Walter Benjamin: Selected Writings, Volume 4: 1938–1940, Cambridge, MA: Harvard University Press, 2006. 19.PredPol, ‘5 Common Myths about Predictive Policing’, predpol.com. 20.G. O. Mohler, M. B. Short, P. J. Brantingham, et al., ‘Self-exciting point process modeling of crime’, JASA 106 (2011). 21.Daniel Jurafsky and James H. Martin, Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd edition, Upper Saddle River, NJ: Prentice Hall, 2009. 22.Walter Benjamin, ‘The Task of the Translator’, in Selected Writings Volume 1 1913–1926, Marcus Bullock and Michael W. Jennings, eds, Cambridge, MA and London: Belknap Press, 1996. 23.Murat Nemet-Nejat, ‘Translation: Contemplating Against the Grain’, Cipher, 1999, cipherjournal.com. 24.Tim Adams, ‘Can Google break the computer language barrier?’


pages: 292 words: 85,151

Exponential Organizations: Why New Organizations Are Ten Times Better, Faster, and Cheaper Than Yours (And What to Do About It) by Salim Ismail, Yuri van Geest

23andMe, 3D printing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, Ben Horowitz, bioinformatics, bitcoin, Black Swan, blockchain, Burning Man, business intelligence, business process, call centre, chief data officer, Chris Wanstrath, Clayton Christensen, clean water, cloud computing, cognitive bias, collaborative consumption, collaborative economy, commoditize, corporate social responsibility, cross-subsidies, crowdsourcing, cryptocurrency, dark matter, Dean Kamen, dematerialisation, discounted cash flows, disruptive innovation, distributed ledger, Edward Snowden, Elon Musk, en.wikipedia.org, Ethereum, ethereum blockchain, game design, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, hiring and firing, Hyperloop, industrial robot, Innovator's Dilemma, intangible asset, Internet of things, Iridium satellite, Isaac Newton, Jeff Bezos, Joi Ito, Kevin Kelly, Kickstarter, knowledge worker, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, lifelogging, loose coupling, loss aversion, low earth orbit, Lyft, Marc Andreessen, Mark Zuckerberg, market design, means of production, minimum viable product, natural language processing, Netflix Prize, NetJets, Network effects, new economy, Oculus Rift, offshore financial centre, PageRank, pattern recognition, Paul Graham, paypal mafia, peer-to-peer, peer-to-peer model, Peter H. 
Diamandis: Planetary Resources, Peter Thiel, prediction markets, profit motive, publish or perish, Ray Kurzweil, recommendation engine, RFID, ride hailing / ride sharing, risk tolerance, Ronald Coase, Second Machine Age, self-driving car, sharing economy, Silicon Valley, skunkworks, Skype, smart contracts, Snapchat, social software, software is eating the world, speech recognition, stealth mode startup, Stephen Hawking, Steve Jobs, subscription business, supply-chain management, TaskRabbit, telepresence, telepresence robot, Tony Hsieh, transaction costs, Travis Kalanick, Tyler Cowen: Great Stagnation, uber lyft, urban planning, WikiLeaks, winner-take-all economy, X Prize, Y Combinator, zero-sum game

And just as inevitably, within two weeks, complete newcomers to the field trounce their best results. For example, the Hewlett Foundation sponsored a 2012 competition to develop an automated scoring algorithm for student-written essays. Of the 155 teams competing, three were awarded a total of $100,000 in prize money. What was particularly interesting was the fact that none of the winners had prior experience with natural language processing (NLP). Nonetheless, they beat the experts, many of them with decades of experience in NLP under their belts. This can’t help but impact the current status quo. Raymond McCauley, Biotechnology & Bioinformatics Chair at Singularity University, has noticed that “When people want a biotech job in Silicon Valley, they hide their PhDs to avoid being seen as a narrow specialist.” So, if experts are suspect, where should we turn instead?


pages: 284 words: 84,169

Talk on the Wild Side by Lane Greene

Affordable Care Act / Obamacare, Albert Einstein, Boris Johnson, Donald Trump, ending welfare as we know it, experimental subject, facts on the ground, framing effect, Google Chrome, illegal immigration, invisible hand, meta analysis, meta-analysis, moral panic, natural language processing, obamacare, Ronald Reagan, Sapir-Whorf hypothesis, Snapchat, speech recognition, Steven Pinker, Turing test, Wall-E

Experience with language does most of the heavy lifting of teaching children (and adults) how to wield it well. The rules can be added on for the tricky cases, at the appropriate age, but we should never confuse an explicit knowledge of rules (“this is what a relative clause looks like”) with an ability to write. Lousy writing can be grammatical; good writing can have errors. Computer scientists who work in natural-language processing are exploring best-of-both-worlds systems, for translation, parsing and other applications. They are combining newer-fangled machine learning with explicit rule coding. Educators should do the same, researching which things are best learned by experience, and which are best learned by rule. But rules cannot be the be-all, end-all. Whether or not a child’s mind is a computer, it can’t be programmed like one.

4 Buxom, but never nice

IF COMPUTERS, WHICH CAN DO truly amazing things, struggle with language, then the human brain must be a fairly awesome machine to be able to handle it.


Programming Python by Mark Lutz

Benevolent Dictator For Life (BDFL), Build a better mousetrap, business process, cloud computing, Firefox, general-purpose programming language, Google Chrome, Guido van Rossum, iterative process, linear programming, loose coupling, MVC pattern, natural language processing, off grid, slashdot, sorting algorithm, web application

YAPPS creates LL(1) parsers, which are not as powerful as LALR parsers but are sufficient for many language tasks. For more on YAPPS, see http://theory.stanford.edu/~amitp/Yapps or search the Web at large.

Natural language processing

Even more demanding language analysis tasks require techniques developed in artificial intelligence research, such as semantic analysis and machine learning. For instance, the Natural Language Toolkit, or NLTK, is an open source suite of Python libraries and programs for symbolic and statistical natural language processing. It applies linguistic techniques to textual data, and it can be used in the development of natural language recognition software and systems. For much more on this subject, be sure to also see the O’Reilly book Natural Language Processing with Python, which explores, among other things, ways to use NLTK in Python. Not every system’s users will pose questions in a natural language, of course, but there are many applications which can make good use of such utility.

Strategies for Processing Text in Python

In the grand scheme of things, there are a variety of ways to handle text processing and language analysis in Python:

Expressions: Built-in string object expressions
Methods: Built-in string object method calls
Patterns: Regular expression pattern matching
Parsers (markup): XML and HTML text parsing
Parsers (grammars): Custom language parsers, both handcoded and generated
Embedding: Running Python code with eval and exec built-ins
And more: Natural language processing

For simpler tasks, Python’s built-in string object is often all we really need. Python strings can be indexed, concatenated, sliced, and processed with both string method calls and built-in functions. Our main emphasis in this chapter is mostly on higher-level tools and techniques for analyzing textual information and language, but we’ll briefly explore each of these techniques in turn.
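The first three of those strategies can be sketched side by side; the sample text is made up:

```python
import re

text = "Python 3.12 was released in October 2023."

# Expressions: indexing and slicing on the built-in str type
first_word = text[:6]

# Methods: built-in string method calls
words = text.lower().split()

# Patterns: regular-expression matching
numbers = re.findall(r"\d+(?:\.\d+)?", text)  # integers and decimals
```

Each step up the list trades simplicity for power; a regular expression recovers structure (the version and year) that slicing and splitting alone cannot.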

GIL and, A process-based alternative: multiprocessing (ahead) implementation, Implementation and usage rules IPC support, Interprocess Communication, IPC Tools: Pipes, Shared Memory, and Queues, Queues and subclassing launching GUIs as programs, Launching GUIs as programs other ways: multiprocessing, Launching GUIs as programs other ways: multiprocessing processes and locks, The Basics: Processes and Locks, Implementation and usage rules socket server portability and, Why multiprocessing doesn’t help with socket server portability, Why multiprocessing doesn’t help with socket server portability starting independent programs, Starting Independent Programs usage rules, Implementation and usage rules Musciano, Chuck, “Oh, What a Tangled Web We Weave” MVC (model-view-controller) structure, Python Internet Development Options mysql-python interface, Persistence Options in Python N name conventions, File name conventions, Installing CGI scripts CGI scripts, Installing CGI scripts files, File name conventions __name__ variable, Using Programs in Two Ways named pipes, Interprocess Communication, Anonymous Pipes, Named Pipes (Fifos), Named pipe basics, Named pipe basics, Named pipe use cases basic functionality, Named pipe basics, Named pipe basics creating, Named Pipes (Fifos) defined, Interprocess Communication, Anonymous Pipes use cases, Named pipe use cases namespaces, Running Code Strings with Results and Namespaces, Running Code Strings with Results and Namespaces, Running Strings in Dictionaries creating, Running Strings in Dictionaries running code strings with, Running Code Strings with Results and Namespaces, Running Code Strings with Results and Namespaces natural language processing, Advanced Language Tools nested structures, Nested structures, Uploading Local Trees, Uploading Local Trees, Pickled Objects, Pickling in Action dictionaries, Nested structures pickling, Pickled Objects, Pickling in Action uploading local trees, Uploading Local Trees, Uploading Local 
Trees Network News Transfer Protocol (NNTP), NNTP: Accessing Newsgroups, NNTP: Accessing Newsgroups, More Than One Way to Push Bits over the Net network scripting, Python Internet Development Options, Python Internet Development Options, The Socket Layer, Machine identifiers, The Protocol Layer, Protocol structures, Python’s Internet Library Modules, Python’s Internet Library Modules, Socket Programming, Binding reserved port servers, Handling Multiple Clients, Summary: Choosing a Server Scheme, Making Sockets Look Like Files and Streams, Sockets versus command pipes, A Simple Python File Server, Using a reusable form-layout class development options, Python Internet Development Options, Python Internet Development Options handling multiple clients, Handling Multiple Clients, Summary: Choosing a Server Scheme library modules and, Python’s Internet Library Modules, Python’s Internet Library Modules making sockets look like files/streams, Making Sockets Look Like Files and Streams, Sockets versus command pipes protocols and, The Protocol Layer, Protocol structures Python file server, A Simple Python File Server, Using a reusable form-layout class sockets and, The Socket Layer, Machine identifiers, Socket Programming, Binding reserved port servers newsgroups, NNTP: Accessing Newsgroups, NNTP: Accessing Newsgroups, Ideas for Improvement accessing, NNTP: Accessing Newsgroups, NNTP: Accessing Newsgroups handling messages, Ideas for Improvement NLTK suite, Advanced Language Tools NNTP (Network News Transfer Protocol), NNTP: Accessing Newsgroups, NNTP: Accessing Newsgroups, More Than One Way to Push Bits over the Net nntplib module, Python’s Internet Library Modules, NNTP: Accessing Newsgroups, NNTP: Accessing Newsgroups numeric tools, A Quick Geometry Lesson NumPy programming extension, A Quick Geometry Lesson, Extending and Embedding O object references, Deferring Calls with Lambdas and Object References, Deferring Calls with Lambdas and Object References, Reloading 
Callback Handlers Dynamically callback handlers as, Reloading Callback Handlers Dynamically deferring calls, Deferring Calls with Lambdas and Object References, Deferring Calls with Lambdas and Object References object relational mappers, Other Database Options (see ORMs) Object Request Broker (ORB), Python Internet Development Options object types, storing in shelves, Storing Built-in Object Types in Shelves object-oriented databases (OODBs), Persistence Options in Python object-oriented programming, Step 3: Stepping Up to OOP (see OOP) objects, Step 1: Sharing Objects Between Pages—A New Input Form, Step 1: Sharing Objects Between Pages—A New Input Form, Persistence Options in Python, Pickled Objects, Pickle Details: Protocols, Binary Modes, and _pickle, Pickled Objects, Changing Classes of Objects Stored in Shelves, Objects are unique only within a key, What Is Embedded Code?


pages: 343 words: 93,544

vN: The First Machine Dynasty (The Machine Dynasty Book 1) by Madeline Ashby

big-box store, iterative process, natural language processing, place-making, traveling salesman, urban planning

Her failsafe guaranteed that. The angel investor supporting the development of von Neumann humanoids was not a military contractor, or a tech firm, or even a design giant. It was a church. A global megachurch named New Eden Ministries, Inc., that believed firmly that the Rapture was coming any minute now. It collected donations, bought real estate, and put the proceeds into programmable matter, natural language processing, and affect detection – all for the benefit of the few pitiful humans regrettably left behind to deal with God's wrath. They would need companions, after all. Helpmeets. And those helpmeets couldn't ever hurt humans. That was the Horsemen's job. It all went to hell, of course. The pastor of New Eden Ministries, Jonah LeMarque, and many of his council members became the defendants in a class action suit brought by youth group members regarding the use of their bodies as models in a pornographic game.


pages: 319 words: 90,965

The End of College: Creating the Future of Learning and the University of Everywhere by Kevin Carey

Albert Einstein, barriers to entry, Bayesian statistics, Berlin Wall, business cycle, business intelligence, carbon-based life, Claude Shannon: information theory, complexity theory, David Heinemeier Hansson, declining real wages, deliberate practice, discrete time, disruptive innovation, double helix, Douglas Engelbart, Downton Abbey, Drosophila, Firefox, Frank Gehry, Google X / Alphabet X, informal economy, invention of the printing press, inventory management, John Markoff, Khan Academy, Kickstarter, low skilled workers, Lyft, Marc Andreessen, Mark Zuckerberg, meta analysis, meta-analysis, natural language processing, Network effects, open borders, pattern recognition, Peter Thiel, pez dispenser, ride hailing / ride sharing, Ronald Reagan, Ruby on Rails, Sand Hill Road, self-driving car, Silicon Valley, Silicon Valley startup, social web, South of Market, San Francisco, speech recognition, Steve Jobs, technoutopianism, transcontinental railway, uber lyft, Vannevar Bush

He and two coauthors recently name-checked a well-known article called “The Unreasonable Effectiveness of Mathematics in the Natural Sciences,” which “examines why so much of physics can be neatly explained with simple mathematical formulas such as F = ma or E = mc2. Meanwhile, sciences that involve human beings rather than elementary particles have proven more resistant to elegant mathematics.” “Perhaps when it comes to natural language processing and related fields,” they wrote, “we’re doomed to complex theories that will never have the elegance of physics equations. But if that’s so, we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data.” Learning and human cognition are definitely among the “related fields.”


pages: 332 words: 91,780

Starstruck: The Business of Celebrity by Currid

"Robert Solow", barriers to entry, Bernie Madoff, Donald Trump, income inequality, index card, industrial cluster, Mark Zuckerberg, Metcalfe’s law, natural language processing, place-making, Ponzi scheme, post-industrial society, prediction markets, Renaissance Technologies, Richard Florida, Robert Metcalfe, rolodex, shareholder value, Silicon Valley, slashdot, transaction costs, upwardly mobile, urban decay, Vilfredo Pareto, winner-take-all economy

Step one collected meta-information from the pictures in the Getty database. We then stored the meta-information in a MS-SQL relational database. In step two we identified the individuals in each photo. Instead of studying the photos themselves, we studied the caption information associated with the photos and cataloged an aggregate collection of this data. In order to identify the photographed objects, we used natural language processing (NLP). SQL-implemented association rules enabled us to clean the data. Our cataloging process collected the following information: names and occupations of individuals in each picture, the event and date when the photo was taken (e.g., Actress Angelina Jolie at the Oscars, February 22, 2007). In step three we used the database information to build a list of events and the celebrities photographed at them.
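The caption-cataloging step described above (name, occupation, event, date extracted from Getty-style captions) can be sketched with a toy rule-based parser. The pattern, occupation list, and field names below are illustrative assumptions, not the authors' actual NLP pipeline:

```python
import re

# Toy parser for Getty-style captions such as
# "Actress Angelina Jolie at the Oscars, February 22, 2007".
# A real pipeline would use an NLP toolkit plus association rules,
# as the excerpt describes; this regex only illustrates the idea.
CAPTION = re.compile(
    r"(?P<occupation>Actress|Actor|Singer|Director)\s+"
    r"(?P<name>[A-Z][a-z]+(?:\s[A-Z][a-z]+)+)\s+at\s+"
    r"(?P<event>.+?),\s+"
    r"(?P<date>\w+ \d{1,2}, \d{4})$"
)

def parse_caption(caption: str) -> dict:
    """Return the extracted fields, or an empty dict if the caption
    does not fit the template."""
    m = CAPTION.match(caption)
    return m.groupdict() if m else {}

record = parse_caption("Actress Angelina Jolie at the Oscars, February 22, 2007")
```

Here `record` carries the same fields the authors say they cataloged per photo, ready to be stored in a relational table keyed by event and date.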


High-Frequency Trading by David Easley, Marcos López de Prado, Maureen O'Hara

algorithmic trading, asset allocation, backtesting, Brownian motion, capital asset pricing model, computer vision, continuous double auction, dark matter, discrete time, finite state, fixed income, Flash crash, High speed trading, index arbitrage, information asymmetry, interest rate swap, latency arbitrage, margin call, market design, market fragmentation, market fundamentalism, market microstructure, martingale, natural language processing, offshore financial centre, pattern recognition, price discovery process, price discrimination, price stability, quantitative trading / quantitative finance, random walk, Sharpe ratio, statistical arbitrage, statistical model, stochastic process, Tobin tax, transaction costs, two-sided market, yield curve

What interpretation can be given for a single order placement in a massive stream of microstructure data, or to a snapshot of an intraday order book, especially considering the fact that any outstanding order can be cancelled by the submitting party any time prior to execution?2 95 i i i i i i “Easley” — 2013/10/8 — 11:31 — page 96 — #116 i i HIGH-FREQUENCY TRADING To offer an analogy, consider the now common application of machine learning to problems in natural language processing (NLP) and computer vision. Both of them remain very challenging domains. But, in NLP, it is at least clear that the basic unit of meaning in the data is the word, which is how digital documents are represented and processed. In contrast, digital images are represented at the pixel level, but this is certainly not the meaningful unit of information in vision applications – objects are – but algorithmically extracting objects from images remains a difficult problem.


pages: 382 words: 92,138

The Entrepreneurial State: Debunking Public vs. Private Sector Myths by Mariana Mazzucato

"Robert Solow", Apple II, banking crisis, barriers to entry, Bretton Woods, business cycle, California gold rush, call centre, carbon footprint, Carmen Reinhart, cleantech, computer age, creative destruction, credit crunch, David Ricardo: comparative advantage, demand response, deskilling, endogenous growth, energy security, energy transition, eurozone crisis, everywhere but in the productivity statistics, Financial Instability Hypothesis, full employment, G4S, Growth in a Time of Debt, Hyman Minsky, incomplete markets, information retrieval, intangible asset, invisible hand, Joseph Schumpeter, Kenneth Rogoff, Kickstarter, knowledge economy, knowledge worker, natural language processing, new economy, offshore financial centre, Philip Mirowski, popular electronics, profit maximization, Ralph Nader, renewable energy credits, rent-seeking, ride hailing / ride sharing, risk tolerance, shareholder value, Silicon Valley, Silicon Valley ideology, smart grid, Steve Jobs, Steve Wozniak, The Wealth of Nations by Adam Smith, Tim Cook: Apple, too big to fail, total factor productivity, trickle-down economics, Washington Consensus, William Shockley: the traitorous eight

This technology, as well as the infrastructure of the system, would have been impossible without the government taking the initiative and making the necessary financial commitment for such a highly complex system. Apple’s latest iPhone feature is a virtual personal assistant known as SIRI. And, like most of the other key technological features in Apple’s iOS products, SIRI has its roots in federal funding and research. SIRI is an artificial intelligence program consisting of machine learning, natural language processing and a Web search algorithm (Roush 2010). In 2000, DARPA asked the Stanford Research Institute (SRI) to take the lead on a project to develop a sort of ‘virtual office assistant’ to assist military personnel. SRI was put in charge of coordinating the ‘Cognitive Assistant that Learns and Organizes’ (CALO) project which included 20 universities all over the US collaborating to develop the necessary technology base.


Learn Algorithmic Trading by Sebastien Donadio

active measures, algorithmic trading, automated trading system, backtesting, Bayesian statistics, buy and hold, buy low sell high, cryptocurrency, DevOps, en.wikipedia.org, fixed income, Flash crash, Guido van Rossum, latency arbitrage, locking in a profit, market fundamentalism, market microstructure, martingale, natural language processing, p-value, paper trading, performance metric, prediction markets, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, Sharpe ratio, short selling, sorting algorithm, statistical arbitrage, statistical model, stochastic process, survivorship bias, transaction costs, type inference, WebSocket, zero-sum game

He specializes in statistical arbitrage market-making, and pairs trading strategies for the most liquid global futures contracts. He works as a Senior Quantitative Developer at a trading firm in Chicago. He holds a Masters in Computer Science from the University of Southern California. His areas of interest include Computer Architecture, FinTech, Probability Theory and Stochastic Processes, Statistical Learning and Inference Methods, and Natural Language Processing. About the reviewers Nataraj Dasgupta is the VP of Advanced Analytics at RxDataScience Inc. He has been in the IT industry for more than 19 years and has worked in the technical & analytics divisions of Philip Morris, IBM, UBS Investment Bank, and Purdue Pharma. He led the Data Science team at Purdue, where he developed the company's award-winning Big Data and Machine Learning platform.


pages: 327 words: 103,336

Everything Is Obvious: *Once You Know the Answer by Duncan J. Watts

active measures, affirmative action, Albert Einstein, Amazon Mechanical Turk, Black Swan, business cycle, butterfly effect, Carmen Reinhart, Cass Sunstein, clockwork universe, cognitive dissonance, coherent worldview, collapse of Lehman Brothers, complexity theory, correlation does not imply causation, crowdsourcing, death of newspapers, discovery of DNA, East Village, easy for humans, difficult for computers, edge city, en.wikipedia.org, Erik Brynjolfsson, framing effect, Geoffrey West, Santa Fe Institute, George Santayana, happiness index / gross national happiness, high batting average, hindsight bias, illegal immigration, industrial cluster, interest rate swap, invention of the printing press, invention of the telescope, invisible hand, Isaac Newton, Jane Jacobs, Jeff Bezos, Joseph Schumpeter, Kenneth Rogoff, lake wobegon effect, Laplace demon, Long Term Capital Management, loss aversion, medical malpractice, meta analysis, meta-analysis, Milgram experiment, natural language processing, Netflix Prize, Network effects, oil shock, packet switching, pattern recognition, performance metric, phenotype, Pierre-Simon Laplace, planetary scale, prediction markets, pre–internet, RAND corporation, random walk, RFID, school choice, Silicon Valley, social intelligence, statistical model, Steve Ballmer, Steve Jobs, Steve Wozniak, supply-chain management, The Death and Life of Great American Cities, the scientific method, The Wisdom of Crowds, too big to fail, Toyota Production System, ultimatum game, urban planning, Vincenzo Peruggia: Mona Lisa, Watson beat the top human players on Jeopardy!, X Prize

Small, Michael, Pengliang L. Shi, and Chi Kong Tse. 2004. “Plausible Models for Propagation of the SARS Virus.” IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences E87A (9):2379–86. Snow, Rion, Brendan O’Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. “Cheap and Fast—But Is It Good? Evaluating Non-Expert Annotations for Natural Language Tasks.” In Empirical Methods in Natural Language Processing. Honolulu, Hawaii: Association for Computational Linguistics. Somers, Margaret R. 1998. “ ‘We’re No Angels’: Realism, Rational Choice, and Relationality in Social Science.” American Journal of Sociology 104 (3):722–84. Sorkin, Andrew Ross (ed). 2008. “Steve & Barry’s Files for Bankruptcy.” New York Times, July 9. Sorkin, Andrew Ross. 2009a. Too Big to Fail: The Inside Story of How Wall Street and Washington Fought to Save the Financial System from Crisis—and Themselves.


pages: 193 words: 98,671

The Inmates Are Running the Asylum by Alan Cooper

Albert Einstein, business cycle, delayed gratification, Donald Trump, Howard Rheingold, informal economy, iterative process, Jeff Bezos, lateral thinking, Menlo Park, natural language processing, new economy, pets.com, Robert X Cringely, Silicon Valley, Silicon Valley startup, skunkworks, Steve Jobs, Steven Pinker, telemarketer, urban planning

Microsoft, in particular, is touting this false panacea. Microsoft says that interfaces will be easy to use as soon as it can perfect voice recognition and handwriting recognition. I think this is silly. Each new technology merely makes it possible to frustrate users with faster and more-powerful systems. A key to better interaction is to reduce the uncertainty between computers and users. Natural-language processing can never do that because meanings are so vague in human conversation. So much of our communication is based on nuance, gesture, and inflection that although it might be a year or two before computers can recognize our words, it might be decades—if ever—before computers can effectively interpret our meaning. Voice-recognition technology will certainly prove to be useful for many products.


pages: 341 words: 95,752

Word by Word: The Secret Life of Dictionaries by Kory Stamper

Affordable Care Act / Obamacare, index card, natural language processing, obamacare, Ronald Reagan, Steven Pinker, why are manhole covers round?

Or I might decide that it’s an important enough word that even though it’s still being glossed regularly, it deserves entry right away: words like “AIDS” and “SARS” will probably get entered into a dictionary fairly quickly after they first show up on the scene, because you can reason that the syndromes they name are significant enough health events that they are not going anywhere very soon. Those sorts of decisions are made on a human level; people with experience in the trenches of language change can make those decisions far better than natural-language processing programs currently can. Computers are, however, far quicker. Thinking about documenting language brings on a gurgle of dread deep in the editorial gut. The philosophy of citation gathering actually runs counter to how language forms. Because we live in a literate society with comparatively easy access to books and education, we tend to believe that the written word is more important and has more weight than the spoken word.


pages: 352 words: 96,532

Where Wizards Stay Up Late: The Origins of the Internet by Katie Hafner, Matthew Lyon

air freight, Bill Duvall, computer age, conceptual framework, Donald Davies, Douglas Engelbart, fault tolerance, Hush-A-Phone, information retrieval, John Markoff, Kevin Kelly, Leonard Kleinrock, Marc Andreessen, Menlo Park, natural language processing, packet switching, RAND corporation, RFC: Request For Comment, Robert Metcalfe, Ronald Reagan, Silicon Valley, speech recognition, Steve Crocker, Steven Levy

Walden later served as Heart’s boss, and Barker had gone on to run one of BBN’s divisions. The most conspicuous exception to this was Crowther, who had remained a programmer. For years Heart had been Crowther’s champion, lobbying for the company to let Crowther just be Crowther and think up ingenious ideas in his own dreamy way. In the years following the IMP project, Crowther pursued some unusual ideas about natural language processing, and worked extensively on high-speed packet-switching technology. Severo Ornstein had left BBN in the 1970s for Xerox PARC, and while there he started Computer Professionals for Social Responsibility. When he retired from Xerox, he and his wife moved into one of the remotest corners of the San Francisco Bay Area. For years Ornstein stayed off the Net, and for years he eschewed e-mail.


pages: 314 words: 101,034

Every Patient Tells a Story by Lisa Sanders

data acquisition, discovery of penicillin, high batting average, index card, medical residency, meta analysis, meta-analysis, natural language processing, pattern recognition, Pepto Bismol, randomized controlled trial, Ronald Reagan

Doctors using the diagnostic tool that Britto and Maude named Isabel can enter information using either key findings (like GIDEON) or whole-text entries, such as clinical descriptions that are cut-and-pasted from another program. Isabel also uses a novel search strategy to identify candidate diagnoses from the clinical findings. The program includes a thesaurus that facilitates recognition of a wide range of terms describing each finding. The program then uses natural language processing and search algorithms to compare these terms to those used in a selected reference library. For internal medicine cases, the library includes six key textbooks and forty-six major journals in general and subspecialty medicine and toxicology. The search domain and results are filtered to take into account the patient’s age, sex, geographic location, pregnancy status, and other clinical parameters that are either selected by the clinician or automatically entered if the system is integrated with the clinician’s electronic medical record.
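The matching strategy described, thesaurus-based normalization of findings followed by comparison against reference texts, can be sketched minimally. Every term, diagnosis, and score below is invented for illustration; this is not Isabel's actual algorithm or data:

```python
# Map variant clinical phrasings onto canonical terms, as the
# excerpt's thesaurus does (all entries are made up).
THESAURUS = {
    "elevated temperature": "fever",
    "pyrexia": "fever",
    "stiff neck": "nuchal rigidity",
}

# Hypothetical reference library: diagnosis -> terms found in its entries.
REFERENCE = {
    "meningitis": {"fever", "headache", "nuchal rigidity"},
    "influenza": {"fever", "headache", "cough"},
}

def normalize(term: str) -> str:
    """Reduce a finding to its canonical thesaurus form."""
    return THESAURUS.get(term.lower(), term.lower())

def rank(findings):
    """Order candidate diagnoses by how many normalized findings
    overlap with each diagnosis's reference terms."""
    terms = {normalize(t) for t in findings}
    scores = {dx: len(terms & ref) for dx, ref in REFERENCE.items()}
    return sorted(scores, key=scores.get, reverse=True)

ranked = rank(["Pyrexia", "headache", "stiff neck"])  # meningitis ranks first
```

The thesaurus step is what lets "pyrexia" and "elevated temperature" count as the same finding before any scoring happens.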


pages: 326 words: 103,170

The Seventh Sense: Power, Fortune, and Survival in the Age of Networks by Joshua Cooper Ramo

Airbnb, Albert Einstein, algorithmic trading, barriers to entry, Berlin Wall, bitcoin, British Empire, cloud computing, crowdsourcing, Danny Hillis, defense in depth, Deng Xiaoping, drone strike, Edward Snowden, Fall of the Berlin Wall, Firefox, Google Chrome, income inequality, Isaac Newton, Jeff Bezos, job automation, Joi Ito, market bubble, Menlo Park, Metcalfe’s law, Mitch Kapor, natural language processing, Network effects, Norbert Wiener, Oculus Rift, packet switching, Paul Graham, price stability, quantitative easing, RAND corporation, recommendation engine, Republic of Letters, Richard Feynman, road to serfdom, Robert Metcalfe, Sand Hill Road, secular stagnation, self-driving car, Silicon Valley, Skype, Snapchat, social web, sovereign wealth fund, Steve Jobs, Steve Wozniak, Stewart Brand, Stuxnet, superintelligent machines, technological singularity, The Coming Technological Singularity, The Wealth of Nations by Adam Smith, too big to fail, Vernor Vinge, zero day

CHAPTER SEVEN The New Caste In which we meet a powerful group defined, enabled, and enriched by their mastery of the networks. 1. In 1965, an MIT computer scientist named Joseph Weizenbaum found himself, somewhat unexpectedly, considering a problem with his computer and its users that he had not quite anticipated. Weizenbaum was in the midst of an experiment that started innocently enough. He’d written a program to perform what is now known as natural language processing, essentially a bit of code designed to translate what a human tells a machine into something the machine can actually work with. When someone asks a computer, What is the weather? the machine uses a special processing approach to turn that into an instruction set. Answering those sorts of queries demands a great deal of digital work before the computer can figure out what you mean and how to fill you in.


pages: 337 words: 103,522

The Creativity Code: How AI Is Learning to Write, Paint and Think by Marcus Du Sautoy

3D printing, Ada Lovelace, Albert Einstein, Alvin Roth, Andrew Wiles, Automated Insights, Benoit Mandelbrot, Claude Shannon: information theory, computer vision, correlation does not imply causation, crowdsourcing, data is the new oil, Donald Trump, double helix, Douglas Hofstadter, Elon Musk, Erik Brynjolfsson, Fellow of the Royal Society, Flash crash, Gödel, Escher, Bach, Henri Poincaré, Jacquard loom, John Conway, Kickstarter, Loebner Prize, mandelbrot fractal, Minecraft, music of the spheres, Narrative Science, natural language processing, Netflix Prize, PageRank, pattern recognition, Paul Erdős, Peter Thiel, random walk, Ray Kurzweil, recommendation engine, Rubik’s Cube, Second Machine Age, Silicon Valley, speech recognition, Turing test, Watson beat the top human players on Jeopardy!, wikimedia commons

When Watson handles a difficult question in its current applications, it comes back with a set of possible outcomes – but it is also able to ask clarifying questions. Most question-answering systems are programmed to deal with a defined set of question types – meaning you can only answer certain kinds of questions, phrased in certain ways, in order to obtain a response. Watson handles open-domain questions, meaning anything you can think of to ask it. It uses natural-language processing techniques to pick apart the words you give it, in order to understand the real question being asked, even when you ask it in an unusual way. IBM actually published a very useful FAQ about Watson and IBM’s DeepQA Project, a foundational technology utilised by Watson in generating hypotheses. The computer on Star Trek is a more suitable comparison. The fictional computer system can be seen as an interactive dialogue agent that could answer questions and provide precise info on any subject.


pages: 411 words: 98,128

Bezonomics: How Amazon Is Changing Our Lives and What the World's Best Companies Are Learning From It by Brian Dumaine

activist fund / activist shareholder / activist investor, AI winter, Airbnb, Amazon Web Services, Atul Gawande, autonomous vehicles, basic income, Bernie Sanders, Black Swan, call centre, Chris Urmson, cloud computing, corporate raider, creative destruction, Danny Hillis, Donald Trump, Elon Musk, Erik Brynjolfsson, future of work, gig economy, Google Glasses, Google X / Alphabet X, income inequality, industrial robot, Internet of things, Jeff Bezos, job automation, Joseph Schumpeter, Kevin Kelly, Lyft, Marc Andreessen, Mark Zuckerberg, money market fund, natural language processing, pets.com, plutocrats, Plutocrats, race to the bottom, ride hailing / ride sharing, Sand Hill Road, self-driving car, shareholder value, Silicon Valley, Silicon Valley startup, Snapchat, speech recognition, Steve Jobs, Stewart Brand, supply-chain management, Tim Cook: Apple, too big to fail, Travis Kalanick, Uber and Lyft, uber lyft, universal basic income, wealth creators, web application, Whole Earth Catalog

“When you’re looking to buy a coffee cup, it’s hard to describe what you want to a smart speaker.” Amazon does say it’s not overly fixated on the Echo as a shopping aid, especially given how the device ties in with the other services it offers through its Prime subscription, such as music and videos. Still, it holds out hope that the Amazon-optimized computers it has placed in customers’ homes will boost its retail business. Says Amazon’s Prasad, the natural-language-processing scientist, “If you want to buy double-A batteries, you don’t need to see them, and you don’t need to remember which ones. If you’ve never bought batteries before, we will suggest ones for you.” That suggestion, of course, often includes Amazon’s house brands. “Amazon is carpet-bombing America with these devices,” says Peter Hildick-Smith, president of the Codex-Group. “Behavioral change is the hardest thing, and companies hate to try to do it.


pages: 1,076 words: 67,364

Haskell Programming: From First Principles by Christopher Allen, Julie Moronuki

c2.com, en.wikipedia.org, natural language processing, spaced repetition, Turing complete, Turing machine, type inference, web application, Y Combinator

We met on Twitter and quickly became friends. As anyone who has encountered Chris–probably in any medium, but certainly on Twitter–knows, it doesn’t take long before he starts urging you to learn Haskell. I told him I had no interest in programming. I told him nothing and nobody had ever been able to interest me in programming before. When Chris learned of my background in linguistics, he thought I might be interested in natural language processing and exhorted me to learn Haskell for that purpose. I remained unconvinced. Then he tried a different approach. He was spending a lot of time gathering and evaluating resources for teaching Haskell and refining his pedagogical techniques, and he convinced me to try to learn Haskell so that he could gain the experience of teaching a code-neophyte. Finally, with an “anything for science” attitude, I gave in.

• use a parsing library to cover the basics of parsing; • demonstrate the awesome power of parser combinators; • marshall and unmarshall some JSON data; • talk about tokenization. 24.2 A few more words of introduction In this chapter, we will not look too deeply into the types of the parsing libraries we’re using, learn every sort of parser there is, or artisanally handcraft all of our parsing functions ourselves. These are thoroughly considered decisions. Parsing is a huge field of research in its own right with connections that span natural language processing, linguistics, and programming language theory. Just this topic could easily fill a book in itself (in fact, it has). The underlying types and typeclasses of the libraries we’ll be using are complicated. To be sure, if you enjoy parsing and expect to do it a lot, those are things you’d want to learn; they are simply out of the scope of this book. This chapter takes a different approach than previous chapters.


pages: 451 words: 103,606

Machine Learning for Hackers by Drew Conway, John Myles White

call centre, centre right, correlation does not imply causation, Debian, Erdős number, Nate Silver, natural language processing, Netflix Prize, p-value, pattern recognition, Paul Erdős, recommendation engine, social graph, SpamAssassin, statistical model, text mining, the scientific method, traveling salesman

Moreover, because we calculate conditional probabilities using products, if we assigned a zero probability to terms not in our training data, elementary arithmetic tells us that we would calculate zero as the probability of most messages, because we would be multiplying all the other probabilities by zero every time we encountered an unknown term. This would cause catastrophic results for our classifier because many, or even all, messages would be incorrectly assigned a zero probability of being either spam or ham. Researchers have come up with many clever ways of trying to get around this problem, such as drawing a random probability from some distribution or using natural language processing (NLP) techniques to estimate the “spamminess” of a term given its context. For our purposes, we will use a very simple rule: assign a very small probability to terms that are not in the training set. This is, in fact, a common way of dealing with missing terms in simple text classifiers, and for our purposes it will serve just fine. In this exercise, by default we will set this probability to 0.0001%, or one-ten-thousandth of a percent, which is sufficiently small for this data set.
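The floor-probability rule the excerpt settles on can be sketched in a few lines (the book itself works in R; this is a generic Python sketch with made-up training counts, using the quoted 0.0001% floor):

```python
from collections import Counter

# Toy spam-term counts standing in for a real training set.
spam_counts = Counter({"viagra": 10, "free": 8, "money": 5})
total_spam_terms = sum(spam_counts.values())

# 0.0001% — the tiny default probability assigned to unseen terms,
# so one unknown word cannot zero out the whole product.
FLOOR = 1e-6

def term_prob(term: str) -> float:
    if term in spam_counts:
        return spam_counts[term] / total_spam_terms
    return FLOOR

def message_prob(terms) -> float:
    """Naive Bayes-style product of per-term probabilities."""
    p = 1.0
    for t in terms:
        p *= term_prob(t)
    return p

# An unseen term now shrinks the probability instead of destroying it:
assert message_prob(["free", "unseenword"]) > 0.0
```

Without the floor, `term_prob` would return 0 for "unseenword" and the product, and hence the classifier's score for the whole message, would collapse to zero exactly as the excerpt warns.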


pages: 353 words: 104,146

European Founders at Work by Pedro Gairifo Santos

business intelligence, cloud computing, crowdsourcing, fear of failure, full text search, information retrieval, inventory management, iterative process, Jeff Bezos, Joi Ito, Lean Startup, Mark Zuckerberg, natural language processing, pattern recognition, pre–internet, recommendation engine, Richard Stallman, Silicon Valley, Skype, slashdot, Steve Jobs, Steve Wozniak, subscription business, technology bubble, web application, Y Combinator

Since I finished university, I've been with two start-ups. The first start-up I was with was a mobile internet start-up based in Stockholm, where I was the first employee on the business side. So I became VP of product management there and part of my job was to find complementary code to fit in with our product, essentially. I came across code that Peter Halacsy had done. Back then he was doing research in natural language processing and we were in need of that. This company also had a development office in Cluj, Romania. When you go to Cluj from Stockholm, you fly via Budapest. My parents are from Hungary actually. When I went to Cluj, I would stop for a day in Budapest and say hi. And that's what I did. I figured since I'm in Budapest I should try to actually meet this person who had done this interesting code. So I hunted him down and managed to meet him.


pages: 396 words: 107,814

Is That a Fish in Your Ear?: Translation and the Meaning of Everything by David Bellos

Clapham omnibus, Claude Shannon: information theory, Douglas Hofstadter, Etonian, European colonialism, haute cuisine, invention of the telephone, invention of writing, natural language processing, Republic of Letters, Sapir-Whorf hypothesis, speech recognition

But common sense appeals to our total experience of the nonlinguistic world as well as to our ability to find a way through the language maze: it is precisely the kind of fuzzy, vague, and informal knowledge that distinctive feature analysis seeks to overcome and replace. Despite the usefulness of binary decomposition for some kinds of linguistic description and (in far more complex form) in the “natural language processing” that computers can now perform, word meanings can never be fully specified by atomic distinctions alone. People are just too adept at using words to mean something else. Such quasi-mathematical computation of “meaning” is equally unable to solve an even more basic problem, which is how to identify the very units whose meaning is to be specified. To ask what a word means (and translators often are asked to say what this or that word means) is to suppose that you know what word you are asking about, and that in turn requires you to know what a word is.


pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future by Kevin Kelly

A Declaration of the Independence of Cyberspace, AI winter, Airbnb, Albert Einstein, Amazon Web Services, augmented reality, bank run, barriers to entry, Baxter: Rethink Robotics, bitcoin, blockchain, book scanning, Brewster Kahle, Burning Man, cloud computing, commoditize, computer age, connected car, crowdsourcing, dark matter, dematerialisation, Downton Abbey, Edward Snowden, Elon Musk, Filter Bubble, Freestyle chess, game design, Google Glasses, hive mind, Howard Rheingold, index card, indoor plumbing, industrial robot, Internet Archive, Internet of things, invention of movable type, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Markoff, Kevin Kelly, Kickstarter, lifelogging, linked data, Lyft, M-Pesa, Marc Andreessen, Marshall McLuhan, means of production, megacity, Minecraft, Mitch Kapor, multi-sided market, natural language processing, Netflix Prize, Network effects, new economy, Nicholas Carr, old-boy network, peer-to-peer, peer-to-peer lending, personalized medicine, placebo effect, planetary scale, postindustrial economy, recommendation engine, RFID, ride hailing / ride sharing, Rodney Brooks, self-driving car, sharing economy, Silicon Valley, slashdot, Snapchat, social graph, social web, software is eating the world, speech recognition, Stephen Hawking, Steven Levy, Ted Nelson, the scientific method, transport as a service, two-sided market, Uber for X, uber lyft, Watson beat the top human players on Jeopardy!, Whole Earth Review, zero-sum game

in-house AI research teams: Reed Albergotti, “Zuckerberg, Musk Invest in Artificial-Intelligence Company,” Wall Street Journal, March 21, 2014. purchased AI companies since 2014: Derrick Harris, “Pinterest, Yahoo, Dropbox and the (Kind of) Quiet Content-as-Data Revolution,” Gigaom, January 6, 2014; Derrick Harris “Twitter Acquires Deep Learning Startup Madbits,” Gigaom, July 29, 2014; Ingrid Lunden, “Intel Has Acquired Natural Language Processing Startup Indisys, Price ‘North’ of $26M, to Build Its AI Muscle,” TechCrunch, September 13, 2013; and Cooper Smith, “Social Networks Are Investing Big in Artificial Intelligence,” Business Insider, March 17, 2014. expanding 70 percent a year: Private analysis by Quid, Inc., 2014. taught an AI to learn to play: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, et al., “Human-Level Control Through Deep Reinforcement Learning,” Nature 518, no. 7540 (2015): 529–33.


Python Geospatial Development - Second Edition by Erik Westra

capital controls, database schema, Firefox, Golden Gate Park, Google Earth, Mercator projection, natural language processing, openstreetmap, Silicon Valley, web application

Winwaed specialize in geospatial tools and applications including web applications, and operate the http://www.mapping-tools.com website for tools and add-ins for Microsoft's MapPoint product. Richard also manages the technical aspects of the EcoMapCostaRica.com project for the Biology Department at the University of Dallas. This includes the website, online field maps, field surveys, and the creation and comparison of panoramic photographs. Richard is also active in the field of natural language processing, especially with Python's NLTK package. Will Cadell is a principal consultant with Sparkgeo.com. He builds next generation web mapping applications, primarily using Google Maps, geoDjango, and PostGIS. He has worked in academia, government, and natural resources but now mainly consults for the start-up community in Silicon Valley. His passion has always been the implementation of geographic technology and with over a billion smart, mobile devices in the world it's a great time to be working on the geoweb.


pages: 374 words: 111,284

The AI Economy: Work, Wealth and Welfare in the Robot Age by Roger Bootle

"Robert Solow", 3D printing, agricultural Revolution, AI winter, Albert Einstein, anti-work, autonomous vehicles, basic income, Ben Bernanke: helicopter money, Bernie Sanders, blockchain, call centre, Capital in the Twenty-First Century by Thomas Piketty, Chris Urmson, computer age, conceptual framework, corporate governance, correlation does not imply causation, creative destruction, David Ricardo: comparative advantage, deindustrialization, deskilling, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, everywhere but in the productivity statistics, facts on the ground, financial intermediation, full employment, future of work, income inequality, income per capita, industrial robot, Internet of things, invention of the wheel, Isaac Newton, James Watt: steam engine, Jeff Bezos, job automation, job satisfaction, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Joseph Schumpeter, Kevin Kelly, license plate recognition, Marc Andreessen, Mark Zuckerberg, market bubble, mega-rich, natural language processing, Network effects, new economy, Nicholas Carr, Paul Samuelson, Peter Thiel, positional goods, quantitative easing, RAND corporation, Ray Kurzweil, Richard Florida, ride hailing / ride sharing, rising living standards, road to serfdom, Robert Gordon, Robert Shiller, Robert Shiller, Second Machine Age, secular stagnation, self-driving car, Silicon Valley, Simon Kuznets, Skype, social intelligence, spinning jenny, Stanislav Petrov, Stephen Hawking, Steven Pinker, technological singularity, The Future of Employment, The Wealth of Nations by Adam Smith, Thomas Malthus, trade route, universal basic income, US Airways Flight 1549, Vernor Vinge, Watson beat the top human players on Jeopardy!, We wanted flying cars, instead we got 140 characters, wealth creators, winner-take-all economy, Y2K, Yogi Berra

Quite the contrary: it is likely to lead to more consultations with medical professionals and more treatment of some sort. As a result of the use of sensors that track patients’ heart rate and blood pressure, thereby facilitating earlier identification of problems and treatment at home rather than in hospital, one possible result is a reduction in the number of people having to spend time in hospital, thereby freeing up resources for critical cases. In addition, natural language processing technology enables doctors to transcribe and record meetings with patients with minimal effort and use of doctors’ time. A consultant labelling scans at Google’s offices said that labelling images for head and neck cancer “is a five or six hour job; usually doctors sit and do it after work.”32 Meanwhile, AI can help with triage in accident and emergency departments and help to reduce “traffic jams” in the flow of patients through different hospital departments.



pages: 401 words: 109,892

The Great Reversal: How America Gave Up on Free Markets by Thomas Philippon

airline deregulation, Amazon Mechanical Turk, Amazon Web Services, Andrei Shleifer, barriers to entry, bitcoin, blockchain, business cycle, business process, buy and hold, Carmen Reinhart, carried interest, central bank independence, commoditize, crack epidemic, cross-subsidies, disruptive innovation, Donald Trump, Erik Brynjolfsson, eurozone crisis, financial deregulation, financial innovation, financial intermediation, gig economy, income inequality, income per capita, index fund, intangible asset, inventory management, Jean Tirole, Jeff Bezos, Kenneth Rogoff, labor-force participation, law of one price, liquidity trap, low cost airline, manufacturing employment, Mark Zuckerberg, market bubble, minimum wage unemployment, money market fund, moral hazard, natural language processing, Network effects, new economy, offshore financial centre, Pareto efficiency, patent troll, Paul Samuelson, price discrimination, profit maximization, purchasing power parity, QWERTY keyboard, rent-seeking, ride hailing / ride sharing, risk-adjusted returns, Robert Bork, Robert Gordon, Ronald Reagan, Second Machine Age, self-driving car, Silicon Valley, Snapchat, spinning jenny, statistical model, Steve Jobs, supply-chain management, Telecommunications Act of 1996, The Chicago School, the payments system, The Rise and Fall of American Growth, The Wealth of Nations by Adam Smith, too big to fail, total factor productivity, transaction costs, Travis Kalanick, Vilfredo Pareto, zero-sum game

We would like to study if and how federal regulations affect industry dynamics. For that we need to build an index of regulations. How does one go about building an index of federal regulations? By using computers to read and classify the data! RegData is a relatively new database—introduced in Al-Ubaydli and McLaughlin (2017)—that aims to measure regulatory stringency at the industry level. It relies on machine learning and natural language processing techniques to count the number of restrictive words or phrases such as “shall,” “must,” and “may not” in each section of the Code of Federal Regulations and to assign them to industries. RegData represents a vast improvement over a simple measure of page counts.h Figure 5.8 shows that the decline in entry coincided with the rise of entry regulations, but this does not mean that regulations caused the decline in entry.
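The core of the counting step can be illustrated simply. This is a hypothetical sketch of the idea behind RegData, not its actual implementation: tally occurrences of restrictive terms such as “shall,” “must,” and “may not” in a section of regulatory text.

```python
import re

# Restrictive terms in the spirit of Al-Ubaydli and McLaughlin (2017).
RESTRICTIVE = ["shall", "must", "may not"]

def count_restrictions(text):
    # Word-boundary matching so "may not" is counted as a phrase
    # and "mustard" does not match "must".
    text = text.lower()
    return sum(len(re.findall(r"\b" + re.escape(term) + r"\b", text))
               for term in RESTRICTIVE)

section = ("The operator shall file a report annually. "
           "Records must be retained and may not be destroyed.")
count_restrictions(section)  # counts 3 restrictive phrases
```

The real database goes further, using machine learning to assign each section of the Code of Federal Regulations to industries; the snippet shows only the word-counting half of the method.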


pages: 918 words: 257,605

The Age of Surveillance Capitalism by Shoshana Zuboff