natural language processing

157 results


pages: 502 words: 107,510

Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs

Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, Free Software Foundation, game design, information retrieval, iterative process, language acquisition, machine readable, machine translation, natural language processing, pattern recognition, performance metric, power law, sentiment analysis, social web, sparse data, speech recognition, statistical model, text mining

., Maximum Entropy Classifiers maximum a posteriori (MAP) hypothesis, Naïve Bayes Learning sentiment classification, Sentiment classification Named Entities (NEs), The Annotation Development Cycle, Adding Named Entities, Inline Annotation, Example 3: Extent Annotations—Named Entities, Example 3: Extent Annotations—Named Entities as extent tags, Example 3: Extent Annotations—Named Entities and inline tagging, Inline Annotation and models, Adding Named Entities Simple Named Entity Guidelines V6.5, Example 3: Extent Annotations—Named Entities Narrative Containers, Narrative Containers–Narrative Containers natural language processing, What Is Natural Language Processing?–What Is Natural Language Processing? (see NLP (natural language processing)) Natural Language Processing with Python (Bird, Klein, and Loper), What Is Natural Language Processing?, Collecting Data from the Internet, Training: Machine Learning, Gender Identification–Gender Identification gender identification problem in, Gender Identification–Gender Identification NCSU, TempEval-2 system, TempEval-2: System Summaries neg-content-term, Decision Tree Learning Netflix, Film Genre Classification, Example 2: Multiple Labels—Film Genres New York Times, Building the Corpus NIST TREC Tracks, NLP Challenges NLP (natural language processing), The Importance of Language Annotation–The Importance of Language Annotation, The Layers of Linguistic Description–The Layers of Linguistic Description, What Is Natural Language Processing?

“TIPSem (English and Spanish): Evaluating CRFs and Semantic Roles in TempEval-2.” In Proceedings of the 5th International Workshop on Semantic Evaluation. Madnani, Nitin. 2007. “Getting Started on Natural Language Processing with Python.” ACM Crossroads 13(4). Updated version available at http://www.desilinguist.org/. Accessed May 16, 2012. Madnani, Nitin, and Jimmy Lin. Natural Language Processing with Hadoop and Python. http://www.cloudera.com/blog/2010/03/natural-language-processing-with-hadoopand-python/. Posted March 16, 2010. Mani, Inderjeet, Marc Verhagen, Ben Wellner, Chong Min Lee, and James Pustejovsky. 2006. Proceedings of Machine Learning of Temporal Relations.

NLP (natural language processing), What Is Natural Language Processing?–What Is Natural Language Processing?, A Brief History of Corpus Linguistics–Corpora Today, Corpora Today, Kinds of Annotation–Kinds of Annotation, Kinds of Annotation–Kinds of Annotation, Kinds of Annotation, Kinds of Annotation, Kinds of Annotation, Language Data and Machine Learning–Structured Pattern Induction, The Annotation Development Cycle–The Annotation Development Cycle, Model the Phenomenon, Training: Machine Learning–Matching Annotation to Algorithms, NLP Online and in the Cloud–Shared Language Applications annotations and, Kinds of Annotation–Kinds of Annotation Cloud computing and, NLP Online and in the Cloud–Shared Language Applications corpus linguistics, A Brief History of Corpus Linguistics–Corpora Today language annotation, The Importance of Language Annotation–The Importance of Language Annotation linguistic description, The Layers of Linguistic Description–The Layers of Linguistic Description machine learning, Language Data and Machine Learning–Structured Pattern Induction, Training: Machine Learning–Matching Annotation to Algorithms MATTER methodology, The Annotation Development Cycle–The Annotation Development Cycle multimodel annotation, Model the Phenomenon n-grams, Corpora Today ontology, Kinds of Annotation POS tagsets and, Kinds of Annotation–Kinds of Annotation semantic value, Kinds of Annotation 
syntactic bracketing, Kinds of Annotation nonconsuming tags, Annotate with the Specification O ontology, Kinds of Annotation overfit algorithms, The Importance of Language Annotation, Algorithm Fits the Development Data Too Well overlapping clustering, Clustering and Unsupervised Learning P parsing, Defining Our Learning Task part of speech (POS) tagsets, Kinds of Annotation–Kinds of Annotation Penn TreeBank corpus, A Brief History of Corpus Linguistics, What Is a Corpus?


Natural Language Processing with Python and spaCy by Yuli Vasiliev

Bayesian statistics, computer vision, data science, database schema, Easter island, en.wikipedia.org, loose coupling, natural language processing, Skype, statistical model

BRIEF CONTENTS
Introduction
Chapter 1: How Natural Language Processing Works
Chapter 2: The Text-Processing Pipeline
Chapter 3: Working with Container Objects and Customizing spaCy
Chapter 4: Extracting and Using Linguistic Features
Chapter 5: Working with Word Vectors
Chapter 6: Finding Patterns and Walking Dependency Trees
Chapter 7: Visualizations
Chapter 8: Intent Recognition
Chapter 9: Storing User Input in a Database
Chapter 10: Training Models
Chapter 11: Deploying Your Own Chatbot
Chapter 12: Implementing Web Data and Processing Images
Appendix: Linguistic Primer
Index

CONTENTS IN DETAIL

INTRODUCTION
Using Python for Natural Language Processing The spaCy Library Who Should Read This Book? What’s in the Book?

1 HOW NATURAL LANGUAGE PROCESSING WORKS
How Can Computers Understand Language? Mapping Words and Numbers with Word Embedding Using Machine Learning for Natural Language Processing Why Use Machine Learning for Natural Language Processing? What Is a Statistical Model in NLP? Neural Network Models Convolutional Neural Networks for NLP What Is Still on You Keywords Context Meaning Transition Summary

2 THE TEXT-PROCESSING PIPELINE
Setting Up Your Working Environment Installing Statistical Models for spaCy Basic NLP Operations with spaCy Tokenization Lemmatization Applying Lemmatization for Meaning Recognition Part-of-Speech Tagging Using Part-of-Speech Tags to Find Relevant Verbs Context Is Important Syntactic Relations Try This Named Entity Recognition Summary

3 WORKING WITH CONTAINER OBJECTS AND CUSTOMIZING SPACY
spaCy’s Container Objects Getting the Index of a Token in a Doc Object Iterating over a Token’s Syntactic Children The doc.sents Container The doc.noun_chunks Container Try This The Span Object Try This Customizing the Text-Processing Pipeline Disabling Pipeline Components Loading a Model Step by Step Customizing the Pipeline Components Using spaCy’s C-Level Data Structures How It Works Preparing Your Working Environment and Getting Text Files Your Cython Script Building a Cython Module Testing the Module Summary

4 EXTRACTING AND USING LINGUISTIC FEATURES
Extracting and Generating Text with Part-of-Speech Tags Numeric, Symbolic, and Punctuation Tags Extracting Descriptions of Money Try This Turning Statements into Questions Try This Using Syntactic Dependency Labels in Text Processing Distinguishing Subjects from Objects Deciding What Question a Chatbot Should Ask Try This Summary

5 WORKING WITH WORD VECTORS
Understanding Word Vectors Defining Meaning with Coordinates Using Dimensions to Represent Meaning The Similarity Method Choosing Keywords for Semantic Similarity Calculations Installing Word Vectors Taking Advantage of Word Vectors That Come with spaCy Models Using Third-Party Word Vectors Comparing spaCy Objects Using Semantic Similarity for Categorization Tasks Extracting Nouns as a Preprocessing Step Try This Extracting and Comparing Named Entities Summary

6 FINDING PATTERNS AND WALKING DEPENDENCY TREES
Word Sequence Patterns Finding Patterns Based on Linguistic Features Try This Checking an Utterance for a Pattern Using spaCy’s Matcher to Find Word Sequence Patterns Applying Several Patterns Creating Patterns Based on Customized Features Choosing Which Patterns to Apply Using Word Sequence Patterns in Chatbots to Generate Statements Try This Extracting Keywords from Syntactic Dependency Trees Walking a Dependency Tree for Information Extraction Iterating over the Heads of Tokens Condensing a Text Using Dependency Trees Try This Using Context to Improve the Ticket-Booking Chatbot Making a Smarter Chatbot by Finding Proper Modifiers Summary

7 VISUALIZATIONS
Getting Started with spaCy’s Built-In Visualizers displaCy Dependency Visualizer displaCy Named Entity Visualizer Visualizing from Within spaCy Visualizing Dependency Parsing Try This Sentence-by-Sentence Visualizations Customizing Your Visualizations with the Options Argument Using Dependency Visualizer Options Try This Using Named Entity Visualizer Options Exporting a Visualization to a File Using displaCy to Manually Render Data Formatting the Data Try This Summary

8 INTENT RECOGNITION
Extracting the Transitive Verb and Direct Object for Intent Recognition Obtaining the Transitive Verb/Direct Object Pair Extracting Multiple Intents with token.conjuncts Try This Using Word Lists to Extract the Intent Finding the Meanings of Words Using Synonyms and Semantic Similarity Recognizing Synonyms Using Predefined Lists Try This Recognizing Implied Intents Using Semantic Similarity Try This Extracting Intent from a Sequence of Sentences Walking the Dependency Structures of a Discourse Replacing Proforms with Their Antecedents Try This Summary

9 STORING USER INPUT IN A DATABASE
Converting Unstructured Data into Structured Data Extracting Data into Interchange Formats Moving Application Logic to the Database Building a Database-Powered Chatbot Gathering the Data and Building a JSON Object Converting Number Words to Numbers Preparing Your Database Environment Sending Data to the Underlying Database When a User’s Request Doesn’t Contain Enough Information Try This Summary

10 TRAINING MODELS
Training a Model’s Pipeline Component Training the Entity Recognizer Deciding Whether You Need to Train the Entity Recognizer Creating Training Examples Automating the Example Creation Process Disabling the Other Pipeline Components The Training Process Evaluating the Updated Recognizer Creating a New Dependency Parser Custom Syntactic Parsing to Understand User Input Deciding on Types of Semantic Relations to Use Creating Training Examples Training the Parser Testing Your Custom Parser Try This Summary

11 DEPLOYING YOUR OWN CHATBOT
How Implementing and Deploying a Chatbot Works Using Telegram as a Platform for Your Bot Creating a Telegram Account and Authorizing Your Bot Getting Started with the python-telegram-bot Library Using the telegram.ext Objects Creating a Telegram Chatbot That Uses spaCy Expanding the Chatbot Holding the State of the Current Chat Putting All the Pieces Together Try This Summary

12 IMPLEMENTING WEB DATA AND PROCESSING IMAGES
How It Works Making Your Bot Find Answers to Questions from Wikipedia Determining What the Question Is About Try This Using Wikipedia to Answer User Questions Try This Reacting to Images Sent in a Chat Generating Descriptive Tags for Images Using Clarifai Using Tags to Generate Text Responses to Images Putting All the Pieces Together in a Telegram Bot Importing the Libraries Writing the Helper Functions Writing the Callback and main() Functions Testing the Bot Try This Summary

LINGUISTIC PRIMER
Dependency Grammars vs.

NATURAL LANGUAGE PROCESSING WITH PYTHON AND SPACY A Practical Introduction by Yuli Vasiliev San Francisco NATURAL LANGUAGE PROCESSING WITH PYTHON AND SPACY. Copyright © 2020 by Yuli Vasiliev. All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher. ISBN-10: 1-7185-0052-1 ISBN-13: 978-1-7185-0052-5 Publisher: William Pollock Production Editors: Kassie Andreadis and Laurel Chun Cover Illustration: Gina Redman Photography: Igor Shabalin Developmental Editor: Frances Saux Technical Reviewers: Ivan Brigida and Geoff Bacon Copyeditor: Anne Marie Walker Compositor: Happenstance Type-O-Rama Proofreader: James Fraleigh Indexer: Beth Nauman-Montana For information on distribution, translations, or bulk sales, please contact No Starch Press, Inc. directly: No Starch Press, Inc. 245 8th Street, San Francisco, CA 94103 phone: 1.415.863.9900; info@nostarch.com www.nostarch.com A catalog record of this book is available from the Library of Congress.

Even so, very few people understand how these robots work or how they might use these technologies in their own projects. Natural language processing (NLP)—a branch of artificial intelligence that helps machines understand and respond to human language—is the key technology that lies at the heart of any digital assistant product. This book arms you with the skills you need to start creating your own NLP applications. By the end of this book, you’ll know how to apply NLP approaches to real-world problems, such as analyzing sentences, capturing the meaning of a text, composing original texts, and even building your own chatbot. Using Python for Natural Language Processing If you want to develop an NLP application, you can choose among a wide range of tools and technologies.


pages: 174 words: 56,405

Machine Translation by Thierry Poibeau

Alignment Problem, AlphaGo, AltaVista, augmented reality, call centre, Claude Shannon: information theory, cloud computing, combinatorial explosion, crowdsourcing, deep learning, DeepMind, easy for humans, difficult for computers, en.wikipedia.org, geopolitical risk, Google Glasses, information retrieval, Internet of things, language acquisition, machine readable, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, natural language processing, Necker cube, Norbert Wiener, RAND corporation, Robert Mercer, seminal paper, Skype, speech recognition, statistical model, technological singularity, Turing test, wikimedia commons

Processing natural languages (as opposed to processing formal languages, such as the programming languages used by computers) is difficult in itself, mainly because vagueness and ambiguity lie at the heart of natural language. Natural Languages and Ambiguity Ever since the creation of computers, linguists as well as computer scientists have been interested in natural language processing, a field also called computational linguistics. Natural language processing is difficult because, by default, computers do not have any knowledge of what a language is. It is thus necessary to specify what counts as a word, a phrase, and a sentence. So far, things may not seem too difficult (although consider expressions like “isn’t it,” “won’t,” “U.S.,” and “$80”: it is not always clear what counts as a word or how many words such expressions contain) and not so different from formal languages, which are also made of words.
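The tokenization puzzle in the parenthetical above can be made concrete. Here is a minimal sketch in plain Python (not any particular toolkit's algorithm) contrasting naive whitespace splitting with a rule that also handles contractions, dotted abbreviations, and currency amounts:

```python
import re

def naive_tokens(text):
    # Whitespace splitting: "U.S.," and "won't" each come out as one token.
    return text.split()

def smarter_tokens(text):
    # Alternatives, tried left to right:
    #   currency amounts ($80), dotted abbreviations (U.S.),
    #   contraction stems (wo|is before n't), the clitic n't itself,
    #   ordinary word characters, then any leftover punctuation.
    pattern = r"\$\d+(?:\.\d+)?|(?:[A-Za-z]\.)+|\w+(?=n't)|n't|\w+|[^\w\s]"
    return re.findall(pattern, text)

print(naive_tokens("He won't pay $80 in the U.S., isn't it odd?"))
print(smarter_tokens("He won't pay $80 in the U.S., isn't it odd?"))
```

The smarter rule yields "wo"/"n't" and keeps "$80" and "U.S." whole, which is roughly the Penn Treebank convention; the point is that every such choice is a design decision, not a given.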

Weaver wrote: “Ambiguity, moreover, attaches primarily to nouns, verbs, and adjectives; and actually (at least so I suppose) to relatively few nouns, verbs, and adjectives.” We now know that ambiguity is the most pervasive problem in natural language processing and applies to nearly all kinds of words, which makes ambiguity a much bigger problem than initially thought. The second principle was based on work done in logic and had a profound influence on the concept of formal grammar, which is used for analyzing artificial languages (particularly programming languages) as well as natural languages.

The First Evaluation Campaigns Since the beginnings of machine translation, evaluation has been perceived as necessary, more so than in other fields of natural language processing, probably because machine translation was seen from the beginning as an applied field from which very concrete results were expected. We have seen in this regard that the ALPAC report was very negative and rather skeptical about the quality that could be hoped for from such systems (see chapter 6). At the beginning of the 1990s, with the renewal of research based on the statistical approach originally proposed by IBM, the need to evaluate machine translation systems was again felt. As is often the case in the field of natural language processing, it was an American funding agency, the Advanced Research Projects Agency (ARPA, later known as DARPA1), that initiated research in this area.
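To make "evaluating a translation system" concrete: later automatic metrics in the BLEU family score a system output by its n-gram overlap with a human reference translation. The early ARPA campaigns described here relied largely on human judgments, so the following is only an illustrative sketch of the later automatic approach, reduced to modified unigram precision:

```python
from collections import Counter

def unigram_precision(candidate, reference):
    # Fraction of candidate words that also appear in the reference,
    # clipping each word's credit at its count in the reference
    # (so repeating "the" cannot inflate the score).
    cand = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    matched = sum(min(count, ref_counts[word])
                  for word, count in Counter(cand).items())
    return matched / len(cand)

print(unigram_precision("the cat sat on the mat",
                        "there is a cat on the mat"))
```

Real metrics combine precisions over several n-gram lengths and penalize overly short outputs; this sketch shows only the core overlap idea.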


pages: 350 words: 98,077

Artificial Intelligence: A Guide for Thinking Humans by Melanie Mitchell

Ada Lovelace, AI winter, Alignment Problem, AlphaGo, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, artificial general intelligence, autonomous vehicles, backpropagation, Bernie Sanders, Big Tech, Boston Dynamics, Cambridge Analytica, Charles Babbage, Claude Shannon: information theory, cognitive dissonance, computer age, computer vision, Computing Machinery and Intelligence, dark matter, deep learning, DeepMind, Demis Hassabis, Douglas Hofstadter, driverless car, Elon Musk, en.wikipedia.org, folksonomy, Geoffrey Hinton, Gödel, Escher, Bach, I think there is a world market for maybe five computers, ImageNet competition, Jaron Lanier, job automation, John Markoff, John von Neumann, Kevin Kelly, Kickstarter, license plate recognition, machine translation, Mark Zuckerberg, natural language processing, Nick Bostrom, Norbert Wiener, ought to be enough for anybody, paperclip maximiser, pattern recognition, performance metric, RAND corporation, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, Rodney Brooks, self-driving car, sentiment analysis, Silicon Valley, Singularitarianism, Skype, speech recognition, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, tacit knowledge, tail risk, TED Talk, the long tail, theory of mind, There's no reason for any individual to have a computer in his home - Ken Olsen, trolley problem, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, world market for maybe five computers

After all, understanding language—including the parts that are left unsaid—is a fundamental part of human intelligence. It’s no accident that Alan Turing framed his famous “imitation game” as a contest involving the generation and understanding of language. This part of the book deals with natural-language processing, which means “getting computers to deal with human language.” (In AI-speak, “natural” means “human.”) Natural-language processing (abbreviated NLP) includes topics such as speech recognition, web search, automated question answering, and machine translation. Similar to what we’ve seen in previous chapters, deep learning has been the driving force behind most of the recent advances in NLP.

A person who asks for a hamburger cooked rare but gets a burned one instead will not be happy. If someone says that a movie is “too dark for my taste,” then the person didn’t like it. While natural-language processing by machines has come a long way, I don’t believe that machines will be able to fully understand human language until they have humanlike common sense. This being said, natural-language processing systems are becoming ever more ubiquitous in our lives—transcribing our words, analyzing our sentiments, translating our documents, and answering our questions. Does the lack of humanlike understanding in such systems, however sophisticated their performance, inevitably result in their being brittle, unreliable, and vulnerable to attack?
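The "too dark for my taste" example hints at why surface-level sentiment analysis is fragile. A toy word-list scorer (a sketch with a made-up lexicon, not any production system) happens to get the movie complaint right but misreads a positive review that uses the same word:

```python
# Hypothetical sentiment lexicon: word -> polarity weight.
LEXICON = {"dark": -1.0, "gripping": 1.0, "boring": -1.0, "brilliant": 1.0}

def lexicon_score(text):
    # Sum the polarity of every known word; unknown words score zero.
    words = text.lower().split()
    return sum(LEXICON.get(w.strip(".,!?"), 0.0) for w in words)

print(lexicon_score("too dark for my taste"))      # negative, as intended
print(lexicon_score("a dark, gripping thriller"))  # "dark" wrongly cancels "gripping"
```

The second sentence is praise, yet the scorer nets out to zero because it has no notion of the context that makes "dark" a virtue in a thriller.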

Once at Dartmouth, McCarthy persuaded Minsky, Shannon, and Rochester to help him organize “a 2 month, 10 man study of artificial intelligence to be carried out during the summer of 1956.”1 The term artificial intelligence was McCarthy’s invention; he wanted to distinguish this field from a related effort called cybernetics.2 McCarthy later admitted that no one really liked the name—after all, the goal was genuine, not “artificial,” intelligence—but “I had to call it something, so I called it ‘Artificial Intelligence.’”3 The four organizers submitted a proposal to the Rockefeller Foundation asking for funding for the summer workshop. The proposed study was, they wrote, based on “the conjecture that every aspect of learning or any other feature of intelligence can be in principle so precisely described that a machine can be made to simulate it.”4 The proposal listed a set of topics to be discussed—natural-language processing, neural networks, machine learning, abstract concepts and reasoning, creativity—that have continued to define the field to the present day. Even though the most advanced computers in 1956 were about a million times slower than today’s smartphones, McCarthy and colleagues were optimistic that AI was in close reach: “We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.”5 Obstacles soon arose that would be familiar to anyone organizing a scientific workshop today.


pages: 504 words: 89,238

Natural language processing with Python by Steven Bird, Ewan Klein, Edward Loper

bioinformatics, business intelligence, business logic, Computing Machinery and Intelligence, conceptual framework, Donald Knuth, duck typing, elephant in my pajamas, en.wikipedia.org, finite state, Firefox, functional programming, Guido van Rossum, higher-order functions, information retrieval, language acquisition, lolcat, machine translation, Menlo Park, natural language processing, P = NP, search inside the book, sparse data, speech recognition, statistical model, text mining, Turing test, W. E. B. Du Bois

Natural Language Processing with Python Steven Bird, Ewan Klein, and Edward Loper Beijing • Cambridge • Farnham • Köln • Sebastopol • Taipei • Tokyo Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper Copyright © 2009 Steven Bird, Ewan Klein, and Edward Loper. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com).

Managing Linguistic Data
11.1 Corpus Structure: A Case Study
11.2 The Life Cycle of a Corpus
11.3 Acquiring Data
11.4 Working with XML
11.5 Working with Toolbox Data
11.6 Describing Language Resources Using OLAC Metadata
11.7 Summary
11.8 Further Reading
11.9 Exercises
Afterword: The Language Challenge
Bibliography
NLTK Index
General Index

Preface This is a book about Natural Language Processing. By “natural language” we mean a language that is used for everyday communication by humans; languages such as English, Hindi, or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules. We will take Natural Language Processing—or NLP for short—in a wide sense to cover any kind of computer manipulation of natural language.

Coping with syntactic ambiguity or how to put the block in the box on the table. American Journal of Computational Linguistics, 8:139–149, 1982. [Cohen and Hunter, 2004] K. Bretonnel Cohen and Lawrence Hunter. Natural language processing and systems biology. In Werner Dubitzky and Francisco Azuaje, editors, Artificial Intelligence Methods and Tools for Systems Biology, pages 147–174. Springer-Verlag, 2004. [Cole, 1997] Ronald Cole, editor. Survey of the State of the Art in Human Language Technology. Studies in Natural Language Processing. Cambridge University Press, 1997. [Copestake, 2002] Ann Copestake. Implementing Typed Feature Structure Grammars. CSLI Publications, Stanford, CA, 2002.


pages: 424 words: 114,905

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again by Eric Topol

"World Economic Forum" Davos, 23andMe, Affordable Care Act / Obamacare, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic bias, AlphaGo, Apollo 11, artificial general intelligence, augmented reality, autism spectrum disorder, autonomous vehicles, backpropagation, Big Tech, bioinformatics, blockchain, Cambridge Analytica, cloud computing, cognitive bias, Colonization of Mars, computer age, computer vision, Computing Machinery and Intelligence, conceptual framework, creative destruction, CRISPR, crowdsourcing, Daniel Kahneman / Amos Tversky, dark matter, data science, David Brooks, deep learning, DeepMind, Demis Hassabis, digital twin, driverless car, Elon Musk, en.wikipedia.org, epigenetics, Erik Brynjolfsson, fake news, fault tolerance, gamification, general purpose technology, Geoffrey Hinton, George Santayana, Google Glasses, ImageNet competition, Jeff Bezos, job automation, job satisfaction, Joi Ito, machine translation, Mark Zuckerberg, medical residency, meta-analysis, microbiome, move 37, natural language processing, new economy, Nicholas Carr, Nick Bostrom, nudge unit, OpenAI, opioid epidemic / opioid crisis, pattern recognition, performance metric, personalized medicine, phenotype, placebo effect, post-truth, randomized controlled trial, recommendation engine, Rubik’s Cube, Sam Altman, self-driving car, Silicon Valley, Skinner box, speech recognition, Stephen Hawking, techlash, TED Talk, text mining, the scientific method, Tim Cook: Apple, traumatic brain injury, trolley problem, War on Poverty, Watson beat the top human players on Jeopardy!, working-age population

The infant’s prognosis, including both brain damage and death, was bleak. A blood sample was sent to Rady’s Genomic Institute for a rapid whole-genome sequencing. The sequence encompassed 125 gigabytes of data, including nearly 5 million locations where the child’s genome differed from the most common one. It took twenty seconds for a form of AI called natural-language processing to ingest the boy’s electronic medical record and determine eighty-eight phenotype features (almost twenty times more than the doctors had summarized in their problem list). Machine-learning algorithms quickly sifted the approximately 5 million genetic variants to find the roughly 700,000 rare ones.
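The phenotype-extraction step can be pictured as mapping free-text mentions in the medical record to codes in an ontology such as the Human Phenotype Ontology. The dictionary lookup below is a deliberately naive sketch (the term list is hypothetical and tiny; Rady's actual system is far more sophisticated):

```python
# Illustrative mapping of clinical terms to phenotype codes
# (a real system would use a full ontology and handle negation,
# synonyms, and misspellings).
PHENOTYPE_TERMS = {
    "seizure": "HP:0001250",
    "hypoglycemia": "HP:0001943",
    "lethargy": "HP:0001254",
}

def extract_phenotypes(note):
    # Return the sorted codes for every term mentioned in the note.
    note = note.lower()
    return sorted(code for term, code in PHENOTYPE_TERMS.items()
                  if term in note)

print(extract_phenotypes(
    "Infant presents with seizure activity and persistent hypoglycemia."))
```

Even this crude matcher shows the payoff: structured codes extracted from free text can be fed directly into the variant-filtering step that follows.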

He was unsure of the diagnosis of rheumatoid arthritis, so he posted on the HumanDx app “35F with pain and joint stiffness in L/R hands X 6 months, suspected rheumatoid arthritis.” He also uploaded a picture of her inflamed hands. Within hours, multiple rheumatologists confirmed the diagnosis. Human Dx intends to recruit at least 100,000 doctors by 2022 and increase the use of natural-language-processing algorithms to direct the key data to the appropriate specialists, combining AI tools with doctor crowdsourcing. An alternative model for crowdsourcing to improve diagnosis incorporates citizen science. Developed by CrowdMed, the platform sets up a financially incentivized competition among doctors and lay people to crack difficult diagnostic cases.

It’s also useful to think of algorithms as existing on a continuum from those that are entirely human guided to those that are entirely machine guided, with deep learning at the far machine end of the scale.12
Artificial Intelligence—the science and engineering of creating intelligent machines that have the ability to achieve goals like humans via a constellation of technologies
Neural Network (NN)—software constructions modeled after the way adaptable neurons in the brain were understood to work, rather than following human-guided rigid instructions
Deep Learning—a type of neural network; the subset of machine learning composed of algorithms that permit software to train itself to perform tasks by processing multilayered networks of data
Machine Learning—computers’ ability to learn without being explicitly programmed, with more than fifteen different approaches such as Random Forest, Bayesian networks, and Support Vector Machines; uses computer algorithms to learn from examples and experiences (datasets) rather than predefined, hard rules-based methods
Supervised Learning—an optimization, trial-and-error process based on labeled data, with the algorithm comparing its outputs with the correct outputs during training
Unsupervised Learning—the training samples are not labeled; the algorithm just looks for patterns and teaches itself
Convolutional Neural Network—uses the principle of convolution, a mathematical operation that takes two functions to produce a third one; instead of feeding in the entire dataset, it is broken into overlapping tiles processed by small neural networks and max-pooling, used especially for images
Natural-Language Processing—a machine’s attempt to “understand” speech or written language like humans
Generative Adversarial Networks—a pair of jointly trained neural networks, one generative and the other discriminative, whereby the former generates fake images and the latter tries to distinguish them from real images
Reinforcement Learning—a type of machine learning that shifts the focus to an abstract goal or decision making; a technology for learning and executing actions in the real world
Recurrent Neural Network—for tasks that involve sequential inputs, like speech or language; this neural network processes an input sequence one element at a time
Backpropagation—an algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer, passing values backward through the network; it is how the synapses get updated over time, with signals automatically sent back through the network to update and adjust the weighting values
Representation Learning—a set of methods that allows a machine fed raw data to automatically discover the representations needed for detection or classification
Transfer Learning—the ability of an AI to learn from different tasks and apply its precedent knowledge to a completely new task
General Artificial Intelligence—the ability to perform a wide range of tasks, including any human task, without being explicitly programmed
TABLE 4.1: Glossary.
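Several of the glossary entries (supervised learning, backpropagation) come down to the same loop: compare an output with a label, pass an error signal backward, and nudge the weights. A one-parameter sketch (a single weight, not a real neural network) makes that loop visible:

```python
def train(pairs, lr=0.1, steps=200):
    # Fit y = w * x by gradient descent on squared error.
    w = 0.0
    for _ in range(steps):
        for x, y in pairs:
            pred = w * x
            grad = 2 * (pred - y) * x  # error signal "passed backward"
            w -= lr * grad             # nudge the weight against the gradient
    return w

# Labeled examples of y = 3x; the loop should recover w close to 3.
w = train([(1, 3), (2, 6), (3, 9)])
print(round(w, 3))
```

Backpropagation in a deep network is this same update applied layer by layer, with the chain rule carrying the error signal through each intermediate representation.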


pages: 523 words: 61,179

Human + Machine: Reimagining Work in the Age of AI by Paul R. Daugherty, H. James Wilson

3D printing, AI winter, algorithmic management, algorithmic trading, AlphaGo, Amazon Mechanical Turk, Amazon Robotics, augmented reality, autonomous vehicles, blockchain, business process, call centre, carbon footprint, circular economy, cloud computing, computer vision, correlation does not imply causation, crowdsourcing, data science, deep learning, DeepMind, digital twin, disintermediation, Douglas Hofstadter, driverless car, en.wikipedia.org, Erik Brynjolfsson, fail fast, friendly AI, fulfillment center, future of work, Geoffrey Hinton, Hans Moravec, industrial robot, Internet of things, inventory management, iterative process, Jeff Bezos, job automation, job satisfaction, knowledge worker, Lyft, machine translation, Marc Benioff, natural language processing, Neal Stephenson, personalized medicine, precision agriculture, Ray Kurzweil, recommendation engine, RFID, ride hailing / ride sharing, risk tolerance, robotic process automation, Rodney Brooks, Salesforce, Second Machine Age, self-driving car, sensor fusion, sentiment analysis, Shoshana Zuboff, Silicon Valley, Snow Crash, software as a service, speech recognition, tacit knowledge, telepresence, telepresence robot, text mining, the scientific method, uber lyft, warehouse automation, warehouse robotics

For instance, Minsky, with Seymour Papert, wrote what was considered the foundational book on scope and limitations of neural networks, a kind of AI that uses biological neurons as its model. Other ideas like expert systems—wherein a computer contained deep stores of “knowledge” for specific domains like architecture or medical diagnosis—and natural language processing, computer vision, and mobile robotics can also be traced back to the event. One conference participant was Arthur Samuel, an engineer at IBM who was building a computer program to play checkers. His program would assess the current state of a checkers board and calculate the probability that a given position could lead to a win.

Because the read-sort-route process is clearly defined, it is in some ways an excellent example of a process ripe for automation. But because the incoming information is text-based and is considered “unstructured” in the eyes of software systems, parsing could have been difficult for a less advanced system. Enter AI. Virgin Trains has now installed a machine-learning platform, inSTREAM, with natural-language processing capabilities that can recognize patterns in unstructured data by analyzing a corpus of similar examples—in this case, complaints—and by tracking how customer service representatives interact with incoming text. Now when a complaint arrives at Virgin Trains, it’s automatically read, sorted, and packaged into a case-ready file that an employee can quickly review and process.
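The sort-and-route step can be sketched even without a learned model. The snippet below is illustrative only, not Virgin Trains’ actual system: the keywords and team names are invented, and a platform like inSTREAM learns such patterns from a corpus of past complaints instead of hard-coding them.

```python
# Hypothetical routing rules; a real NLP system learns these from examples.
ROUTES = {
    "refund": "billing",
    "delay": "operations",
    "wifi": "onboard-services",
    "staff": "customer-care",
}

def route_complaint(text):
    """Return the first matching team for a complaint, else a human queue."""
    lowered = text.lower()
    for keyword, team in ROUTES.items():
        if keyword in lowered:
            return team
    return "manual-review"

print(route_complaint("My train was subject to a two-hour delay."))  # operations
```

The gap between this sketch and the real platform is the point: hand-written keywords break on paraphrase, which is exactly what learning from a corpus of labeled complaints fixes.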

Aida is showing that automated natural-language customer communications are possible in large and complex business environments. As natural-language techniques improve and interfaces advance, they will continue spreading throughout different business functions in various industries. In chapter 4 we’ll discuss how various natural-language processing chatbots like Amazon’s Alexa are becoming the new front-office faces of companies. Redefining an Entire Industry As AI becomes increasingly capable of adding intelligence to middle- and back-office processes, the technology could potentially redefine entire industries. In IT security, for instance, a growing number of security firms are combining machine-learning approaches to build ultra-smart, continually evolving defenses against malicious software.


pages: 161 words: 39,526

Applied Artificial Intelligence: A Handbook for Business Leaders by Mariya Yao, Adelyn Zhou, Marlene Jia

Airbnb, algorithmic bias, AlphaGo, Amazon Web Services, artificial general intelligence, autonomous vehicles, backpropagation, business intelligence, business process, call centre, chief data officer, cognitive load, computer vision, conceptual framework, data science, deep learning, DeepMind, en.wikipedia.org, fake news, future of work, Geoffrey Hinton, industrial robot, information security, Internet of things, iterative process, Jeff Bezos, job automation, machine translation, Marc Andreessen, natural language processing, new economy, OpenAI, pattern recognition, performance metric, price discrimination, randomized controlled trial, recommendation engine, robotic process automation, Salesforce, self-driving car, sentiment analysis, Silicon Valley, single source of truth, skunkworks, software is eating the world, source of truth, sparse data, speech recognition, statistical model, strong AI, subscription business, technological singularity, The future is already here

The outpouring of responses provoked a government response and led UNICEF and Liberia’s Minister of Education to collaborate on a plan to stop the abuse of authority. In many parts of the world, citizens can’t utilize the feature-rich but data-intensive mobile apps that many of us enjoy due to bandwidth limitations and limited access to phones with up-to-date features. Being limited to voice calls and SMS means that technologies like natural language processing (NLP), dialog systems, and conversational bots become critically important to delivering value. Medical Diagnosis AI can dramatically streamline and improve medical care and our overall health and wellbeing. The fields of pathology and radiology, both of which rely largely on trained human eyes to spot anomalies, are being revolutionized by advancements in computer vision.

Retrieved from http://ureport.in/story/194/ (25) Study Finds Computers Surpass Pathologists in Predicting Lung Cancer Type, Severity. (2016). The ASCO Post. Retrieved from http://www.ascopost.com/News/43849 (26) Patel, T. A., Puppala, M., Ogunti, R. O., Ensor, J. E., He, T., Shewale, J. B., & Chang, J. C. (2016). Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods. Cancer, 123(1), 114-121. doi:10.1002/cncr.30245 (27) As validated against a gold standard review conducted on a sample of records by the study’s co-authors, which required 50 to 70 hours. (28) Csail, A. C. (2017, October 16). Using artificial intelligence to improve early breast cancer detection.

In most cases, having and using a fantastic machine learning algorithm is less important than deploying a well-designed user experience (UX) for your products. Thoughtful UX design that delights users will drive up engagement, which in turn increases the interactions you can capture for future data and analysis. Thoughtful UX compensates for areas where AI capabilities may be lacking, such as in natural language processing (NLP) for open-domain conversation. In order to develop “thoughtful UX,” you’ll need both strong product development and engineering talent as well as partners who have domain expertise and business acumen. A common pattern observed in both academia and industry engineering teams is their propensity to optimize for tactical wins over strategic initiatives.


pages: 301 words: 85,126

AIQ: How People and Machines Are Smarter Together by Nick Polson, James Scott

Abraham Wald, Air France Flight 447, Albert Einstein, algorithmic bias, Amazon Web Services, Atul Gawande, autonomous vehicles, availability heuristic, basic income, Bayesian statistics, Big Tech, Black Lives Matter, Bletchley Park, business cycle, Cepheid variable, Checklist Manifesto, cloud computing, combinatorial explosion, computer age, computer vision, Daniel Kahneman / Amos Tversky, data science, deep learning, DeepMind, Donald Trump, Douglas Hofstadter, Edward Charles Pickering, Elon Musk, epigenetics, fake news, Flash crash, Grace Hopper, Gödel, Escher, Bach, Hans Moravec, Harvard Computers: women astronomers, Higgs boson, index fund, information security, Isaac Newton, John von Neumann, late fees, low earth orbit, Lyft, machine translation, Magellanic Cloud, mass incarceration, Moneyball by Michael Lewis explains big data, Moravec's paradox, more computing power than Apollo, natural language processing, Netflix Prize, North Sea oil, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, p-value, pattern recognition, Pierre-Simon Laplace, ransomware, recommendation engine, Ronald Reagan, Salesforce, self-driving car, sentiment analysis, side project, Silicon Valley, Skype, smart cities, speech recognition, statistical model, survivorship bias, systems thinking, the scientific method, Thomas Bayes, Uber for X, uber lyft, universal basic income, Watson beat the top human players on Jeopardy!, young professional

See health care and medicine Medtronic Menger, Karl Microsoft Microsoft Azure modeling assumptions and deep-learning models imputation and Inception latent feature massive models missing data and model rust natural language processing and prediction rules as reality versus rules-based (top-down) models training the model Moneyball Moore’s law Moravec paradox Morgenstern, Oskar Musk, Elon natural language processing (NLP) ambiguity and bottom-up approach chatbots digital assistants future trends Google Translate growth of statistical NLP knowing how versus knowing that natural language revolution “New Deal” for human-machine linguistic interaction prediction rules and programing language revolution robustness and rule bloat and speech recognition top-down approach word co-location statistics word vectors naturally occurring radioactive materials (NORM) Netflix Crown, The (series) data scientists history of House of Cards (series) Netflix Prize for recommender system personalization recommender systems neural networks deep learning and Friends new episodes and Inception model prediction rules and New England Patriots Newton, Isaac Nightingale, Florence coxcomb diagram (1858) Crimean War and early years and training evidence-based medicine legacy of “lady with the lamp” medical statistics legacy of nursing reform legacy of Nvidia Obama, Barack Office of Scientific Research and Development parallax pattern recognition cucumber sorting input and output learning a pattern maximum heart rate and prediction rules and toilet paper theft and See also prediction rules PayPal personalization conditional probability and latent feature models and Netflix and Wald’s survivability recommendations for aircraft and See also recommender systems; suggestion engines philosophy Pickering, Edward C.

The policy didn’t work out so well, though—and neither did the gift, which didn’t say “Reset” in Russian after all, but “Overcharge.” The second thing to keep in mind is that machines are getting better at language—fast. (You must admit that “wang bang” is a creative piece of boxing commentary.) Experts in AI use the term “natural language processing,” or NLP, to describe how we get computers to work with language. Over the last few years, you’ve been living through a period of tremendous growth in successful NLP systems: • Digital assistants like Amazon’s Echo and Google Home are far better than the clunky speech-to-text programs of just a few years ago.

Harpy seemed to suggest that, with better rules and faster computers, human-level performance might be just around the corner.21 Yet these hoped-for improvements in speech recognition never materialized. In later tests involving real-world conditions, Harpy’s word-level accuracy fell to 37%. After five years, the U.S. government cut funding for the project. And today, pure rules-based systems for natural language processing have become vanishingly rare. In the end, they were never able to overcome three basic problems: rule bloat, robustness, and ambiguity. Problem 1: Rule Bloat First, it’s really hard to write down all the rules for natural languages. There are way too many of them, vastly more than any programming language.
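The three failure modes named here are easy to reproduce. The toy rules-based recognizer below (hypothetical, and vastly simpler than Harpy) handles only the phrasings its author anticipated, which illustrates both rule bloat and the robustness problem:

```python
import re

# Hand-written rules for one intent: asking about the weather.
WEATHER_RULES = [
    re.compile(r"^what is the weather( today)?\?$"),
    re.compile(r"^how is the weather\?$"),
]

def is_weather_question(utterance):
    text = utterance.strip().lower()
    return any(rule.match(text) for rule in WEATHER_RULES)

print(is_weather_question("What is the weather today?"))        # True
print(is_weather_question("Weather looking rough out there?"))  # False: no rule
print(is_weather_question("whats the weather?"))                # False: one missing apostrophe
```

Covering every paraphrase means adding a rule per variant: rule bloat. Statistical NLP sidesteps this by learning from examples of how people actually phrase things.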


Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data by Dipanjan Sarkar

bioinformatics, business intelligence, business logic, computer vision, continuous integration, data science, deep learning, Dr. Strangelove, en.wikipedia.org, functional programming, general-purpose programming language, Guido van Rossum, information retrieval, Internet of things, invention of the printing press, iterative process, language acquisition, machine readable, machine translation, natural language processing, out of africa, performance metric, premature optimization, recommendation engine, self-driving car, semantic web, sentiment analysis, speech recognition, statistical model, text mining, Turing test, web application

Besides these popular corpora, there are a vast number of text corpora available that you can check and access with the nltk.corpus module. Thus, you can see how easy it is to access and use data from any text corpus with the help of Python and NLTK. This brings us to the end of our discussion about text corpora. The following sections cover some ground regarding NLP and text analytics. Natural Language Processing I’ve mentioned the term natural language processing (NLP) several times in this chapter. By now, you may have formed some idea about what NLP means. NLP is defined as a specialized field of computer science and engineering and artificial intelligence with roots in computational linguistics. It is primarily concerned with designing and building applications and systems that enable interaction between machines and natural languages evolved for use by humans.

Analytics, data science, and more recently text analytics came much later, perhaps around four or five years ago when the hype about Big Data and Analytics was getting bigger and crazier. Personally I think a lot of it is over-hyped, but a lot of it is also exciting and presents huge possibilities with regard to new jobs, new discoveries, and solving problems that were previously deemed impossible to solve. Natural Language Processing (NLP) has always caught my eye because the human brain and our cognitive abilities are really fascinating. The ability to communicate information, complex thoughts, and emotions with such little effort is staggering once you think about trying to replicate that ability in machines. Of course, we are advancing by leaps and bounds with regard to cognitive computing and artificial intelligence (AI), but we are not there yet.

The Philosophy of Language Language Acquisition and Usage Linguistics Language Syntax and Structure Words Phrases Clauses Grammar Word Order Typology Language Semantics Lexical Semantic Relations Semantic Networks and Models Representation of Semantics Text Corpora Corpora Annotation and Utilities Popular Corpora Accessing Text Corpora Natural Language Processing Machine Translation Speech Recognition Systems Question Answering Systems Contextual Recognition and Resolution Text Summarization Text Categorization Text Analytics Summary Chapter 2:​ Python Refresher Getting to Know Python The Zen of Python Applications:​ When Should You Use Python?​


pages: 541 words: 109,698

Mining the Social Web: Finding Needles in the Social Haystack by Matthew A. Russell

Andy Rubin, business logic, Climategate, cloud computing, crowdsourcing, data science, en.wikipedia.org, fault tolerance, Firefox, folksonomy, full text search, Georg Cantor, Google Earth, information retrieval, machine readable, Mark Zuckerberg, natural language processing, NP-complete, power law, Saturday Night Live, semantic web, Silicon Valley, slashdot, social graph, social web, sparse data, statistical model, Steve Jobs, supply-chain management, text mining, traveling salesman, Turing test, web application

(Chapter 8 introduces a fundamental paradigm shift away from the tools in this chapter and should make the differences more pronounced than they may seem if you haven’t read that material yet.) If you’d like to try applying the techniques from this chapter to the Web (in general), you might want to check out Scrapy, an easy-to-use and mature web scraping and crawling framework. Chapter 8. Blogs et al.: Natural Language Processing (and Beyond) This chapter is a modest attempt to introduce Natural Language Processing (NLP) and apply it to the unstructured data in blogs. In the spirit of the prior chapters, it attempts to present the minimal level of detail required to empower you with a solid general understanding of an inherently complex topic, while also providing enough of a technical drill-down that you’ll be able to immediately get to work mining some data.

plotting geo data via microform.at and Google Maps, Plotting geo data via microform.at and Google Maps hRecipe, Slicing and Dicing Recipes (for the Health of It), Slicing and Dicing Recipes (for the Health of It) hReview data for recipe reviews, Collecting Restaurant Reviews, Collecting Restaurant Reviews popular, for embedding structured data into web pages, XFN and Friends semantic markup, XFN and Friends XFN, XFN and Friends, Exploring Social Connections with XFN, Brief analysis of breadth-first techniques using to explore social connections, Exploring Social Connections with XFN, Brief analysis of breadth-first techniques multiquery (FQL), Slicing and dicing data with FQL N n-gram similarity, Common Similarity Metrics for Clustering n-grams, Common Similarity Metrics for Clustering, Buzzing on Bigrams defined, Common Similarity Metrics for Clustering n-squared problem, Motivation for Clustering natural language processing, Frequency Analysis and Lexical Diversity (see NLP) Natural Language Toolkit, Frequency Analysis and Lexical Diversity (see NLTK) natural numbers, Elementary Set Operations nested query (FQL), Slicing and dicing data with FQL NetworkX, Installing Python Development Tools, Installing Python Development Tools, Extracting relationships from the tweets, Extracting relationships from the tweets, Constructing Friendship Graphs, Clique Detection and Analysis building graph describing retweet data, Extracting relationships from the tweets, Extracting relationships from the tweets exporting Redis friend/follower data to for graph analysis, Constructing Friendship Graphs finding cliques in Twitter friendship data, Clique Detection and Analysis installing, Installing Python Development Tools using to create graph of nodes and edges, Installing Python Development Tools *nix (Linux/Unix) environment, Or Not to Read This Book? 
NLP (natural language processing), Blogs et al.: Natural Language Processing (and Beyond), Closing Remarks, NLP: A Pareto-Like Introduction, A Brief Thought Exercise, A Typical NLP Pipeline with NLTK, A Typical NLP Pipeline with NLTK, Sentence Detection in Blogs with NLTK, Sentence Detection in Blogs with NLTK, Summarizing Documents, Analysis of Luhn’s Summarization Algorithm, Analysis of Luhn’s Summarization Algorithm, Entity-Centric Analysis: A Deeper Understanding of the Data, Quality of Analytics, Quality of Analytics entity-centric analysis, Entity-Centric Analysis: A Deeper Understanding of the Data, Quality of Analytics, Quality of Analytics quality of analytics, Quality of Analytics sentence detection in blogs with NLTK, Sentence Detection in Blogs with NLTK, Sentence Detection in Blogs with NLTK summarizing documents, Summarizing Documents, Analysis of Luhn’s Summarization Algorithm, Analysis of Luhn’s Summarization Algorithm analysis of Luhn’s algorithm, Analysis of Luhn’s Summarization Algorithm syntax and semantics, NLP: A Pareto-Like Introduction thought exercise, A Brief Thought Exercise typical NLP pipeline with NLTK, A Typical NLP Pipeline with NLTK, A Typical NLP Pipeline with NLTK NLTK (Natural Language Toolkit), Frequency Analysis and Lexical Diversity, Frequency Analysis and Lexical Diversity, What are people talking about right now?

Although these are not difficult to compute, we’d be better off installing a tool that offers a built-in frequency distribution and many other tools for text analysis. The Natural Language Toolkit (NLTK) is a popular module we’ll use throughout this book: it delivers a vast amount of tools for various kinds of text analytics, including the calculation of common metrics, information extraction, and natural language processing (NLP). Although NLTK isn’t necessarily state-of-the-art as compared to ongoing efforts in the commercial space and academia, it nonetheless provides a solid and broad foundation—especially if this is your first experience trying to process natural language. If your project is sufficiently sophisticated that the quality or efficiency that NLTK provides isn’t adequate for your needs, you have approximately three options, depending on the amount of time and money you are willing to put in: scour the open source space for a suitable alternative by running comparative experiments and benchmarks, churn through whitepapers and prototype your own toolkit, or license a commercial product.
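The frequency distribution the passage mentions is indeed not difficult to compute by hand; a minimal sketch with the standard library’s `collections.Counter` shows the idea (NLTK’s `FreqDist` provides the same notion, plus far better tokenizers and many more metrics). The sample text is invented.

```python
from collections import Counter

def tokenize(text):
    """Naive whitespace tokenizer; NLTK ships far better ones."""
    return text.lower().split()

text = "the quick brown fox jumps over the lazy dog the fox"
tokens = tokenize(text)

freq = Counter(tokens)                            # frequency distribution
lexical_diversity = len(set(tokens)) / len(tokens)

print(freq.most_common(2))          # [('the', 3), ('fox', 2)]
print(round(lexical_diversity, 2))  # 0.73
```

Once metrics like these need to scale to real corpora, the book’s advice holds: reach for NLTK rather than reinventing the toolkit.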


pages: 122 words: 29,286

Learning Scikit-Learn: Machine Learning in Python by Raúl Garreta, Guillermo Moncecchi

computer vision, Debian, Everything should be made as simple as possible, Higgs boson, Large Hadron Collider, natural language processing, Occam's razor, Silicon Valley

Also, I would like to give a special mention to the open source Python and scikit-learn community for their dedication and professionalism in developing these beautiful tools. Guillermo Moncecchi is a Natural Language Processing researcher at the Universidad de la República of Uruguay. He received a PhD in Informatics from the Universidad de la República, Uruguay, and a PhD in Language Sciences from the Université Paris Ouest, France. He has participated in several international projects on NLP. He has almost 15 years of teaching experience in Automata Theory, Natural Language Processing, and Machine Learning. He also works as Head Developer at the Montevideo Council and has led the development of several public services for the council, particularly in the Geographical Information Systems area.

ISBN 978-1-78328-193-0 www.packtpub.com Cover Image by Faiz Fattohi (<faizfattohi@gmail.com>) Credits Authors Raúl Garreta Guillermo Moncecchi Reviewers Andreas Hjortgaard Danielsen Noel Dawe Gavin Hackeling Acquisition Editors Kunal Parikh Owen Roberts Commissioning Editor Deepika Singh Technical Editors Shashank Desai Iram Malik Copy Editors Sarang Chari Janbal Dharmaraj Aditya Nair Project Coordinator Aboli Ambardekar Proofreader Katherine Tarr Indexer Monica Ajmera Mehta Graphics Abhinash Sahu Production Co-ordinator Pooja Chiplunkar Cover Work Pooja Chiplunkar About the Authors Raúl Garreta is a Computer Engineer with much experience in the theory and application of Artificial Intelligence (AI), where he specialized in Machine Learning and Natural Language Processing (NLP). He has an entrepreneur profile with much interest in the application of science, technology, and innovation to the Internet industry and startups. He has worked in many software companies, handling everything from video games to implantable medical devices. In 2009, he co-founded Tryolabs with the objective to apply AI to the development of intelligent software products, where he performs as the CTO and Product Manager of the company.

We have the training data where each instance has an input (a set of attributes) and a desired output (a target class). Then we use this data to train a model that will predict the same target class for new unseen instances. Supervised learning methods are nowadays a standard tool in a wide range of disciplines, from medical diagnosis to natural language processing, image recognition, and searching for new particles at the Large Hadron Collider (LHC). In this chapter we will present several methods applied to several real-world examples by using some of the many algorithms implemented in scikit-learn. This chapter does not intend to substitute the scikit-learn reference, but is an introduction to the main supervised learning techniques and shows how they can be used to solve practical problems.
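The supervised-learning loop described here — labeled training instances in, predicted target classes out — can be sketched without scikit-learn at all. The snippet below is a minimal 1-nearest-neighbor classifier on made-up data, not one of the book’s examples:

```python
def nearest_neighbor(train, query):
    """Predict the label of the training instance closest to the query."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda item: dist2(item[0], query))
    return label

# Labeled training data: (attributes, target class).
train = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((8.0, 9.0), "large"),
    ((9.5, 8.5), "large"),
]

print(nearest_neighbor(train, (1.1, 0.9)))  # small
print(nearest_neighbor(train, (8.7, 9.1)))  # large
```

scikit-learn’s estimators follow the same contract behind a uniform `fit`/`predict` interface, which is what lets the book swap algorithms in and out across its examples.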


pages: 521 words: 118,183

The Wires of War: Technology and the Global Struggle for Power by Jacob Helberg

"World Economic Forum" Davos, 2021 United States Capitol attack, A Declaration of the Independence of Cyberspace, active measures, Affordable Care Act / Obamacare, air gap, Airbnb, algorithmic management, augmented reality, autonomous vehicles, Berlin Wall, Bernie Sanders, Big Tech, bike sharing, Black Lives Matter, blockchain, Boris Johnson, Brexit referendum, cable laying ship, call centre, Cambridge Analytica, Cass Sunstein, cloud computing, coronavirus, COVID-19, creative destruction, crisis actor, data is the new oil, data science, decentralized internet, deep learning, deepfake, deglobalization, deindustrialization, Deng Xiaoping, deplatforming, digital nomad, disinformation, don't be evil, Donald Trump, dual-use technology, Edward Snowden, Elon Musk, en.wikipedia.org, end-to-end encryption, fail fast, fake news, Filter Bubble, Francis Fukuyama: the end of history, geopolitical risk, glass ceiling, global pandemic, global supply chain, Google bus, Google Chrome, GPT-3, green new deal, information security, Internet of things, Jeff Bezos, Jeffrey Epstein, John Markoff, John Perry Barlow, knowledge economy, Larry Ellison, lockdown, Loma Prieta earthquake, low earth orbit, low skilled workers, Lyft, manufacturing employment, Marc Andreessen, Mark Zuckerberg, Mary Meeker, Mikhail Gorbachev, military-industrial complex, Mohammed Bouazizi, move fast and break things, Nate Silver, natural language processing, Network effects, new economy, one-China policy, open economy, OpenAI, Parler "social media", Peter Thiel, QAnon, QR code, race to the bottom, Ralph Nader, RAND corporation, reshoring, ride hailing / ride sharing, Ronald Reagan, Russian election interference, Salesforce, Sam Altman, satellite internet, self-driving car, Sheryl Sandberg, side project, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, Skype, smart grid, SoftBank, Solyndra, South China Sea, SpaceX Starlink, Steve Jobs, Steven Levy, Stuxnet, supply-chain attack, Susan Wojcicki, tech 
worker, techlash, technoutopianism, TikTok, Tim Cook: Apple, trade route, TSMC, Twitter Arab Spring, uber lyft, undersea cable, Unsafe at Any Speed, Valery Gerasimov, vertical integration, Wargames Reagan, Westphalian system, white picket fence, WikiLeaks, Y Combinator, zero-sum game

But that’s changing. We now face security risks stemming from unprecedented advances in “natural language processing”—basically, applying those deep learning neural networks to process or generate human-sounding speech. When you ask, “Hey Siri, what’s the weather today?” or when your wife says “Alexa, play Hamilton” for the 500th time during lockdown, your device’s natural language processing abilities are what enable it to interpret your voice and act on those commands. Rudimentary chatbots—AI programs that use natural language processing to analyze and reply to messages—already exist. You may have encountered them while raising a customer service issue with your bank or insurance company.
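A rudimentary chatbot of the kind described — match patterns in the incoming message, reply from a canned list, hand off to a human otherwise — fits in a few lines. This is a hypothetical bank-support sketch using only the standard library, not any vendor’s actual bot:

```python
import re

# (pattern, reply) pairs; real chatbots learn, this one just matches.
RESPONSES = [
    (re.compile(r"\bbalance\b", re.I),
     "Your current balance is available in the app."),
    (re.compile(r"\b(card|lost|stolen)\b", re.I),
     "I've frozen your card. An agent will call you."),
]
FALLBACK = "Sorry, I didn't understand. Connecting you to a human agent."

def reply(message):
    for pattern, answer in RESPONSES:
        if pattern.search(message):
            return answer
    return FALLBACK

print(reply("What's my account balance?"))
print(reply("I think my card was stolen"))
print(reply("Alexa, play Hamilton"))  # falls through to the human agent
```

The security worry in this chapter follows directly: once the canned-reply list is replaced by a generative language model, the same scaffolding can produce fluent, human-sounding text at scale.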

Before the 2019 Ukrainian elections, they tried paying Ukrainians to hand over their Facebook pages to Russian propagandists.134 A group of Stanford and Facebook researchers detected a similar Russian effort in late 2019 in a number of African countries—cooked up once again by Putin’s prolific chef Yevgeniy Prigozhin—with the Russians using locals or existing news outlets to produce posts praising Russia and criticizing the United States and France.135 Alex Stamos, Facebook’s former chief security officer, suspects that Russians are already doing this in the United States as well.136 Here’s another troubling trend. As counterintelligence analysts get smarter and natural language processing algorithms improve, malign actors are increasingly spreading their falsehoods via screenshots. Images are less searchable than text, making disinformation harder to detect and trace back to its source.137 Trolls are also working harder to appear authentic, dropping many of the traits that made them easy to identify.

What Smith and Browne delicately term “a small group of American pranksters” immediately hijacked Tay, who began spewing racist and sexist comments from the darkest recesses of the Internet—“I fucking hate feminists”; “Hitler was right”; “Bush did 9/11.” Within twenty-four hours, Microsoft pulled the plug on Tay.43 But these hiccups won’t hold back AI-powered language generation forever. Indeed, natural language processing is only getting more sophisticated, in ways that could be quite frightening. Better language abilities could make it easier for trolls to spread propaganda—and harder for us to identify them. In 2019, OpenAI fed an algorithm the words “Russia has declared war on the United States after Donald Trump accidentally…” The algorithm proceeded to generate the following realistic—and perilous—sentences: Russia has declared war on the United States after Donald Trump accidentally fired a missile in the air.


pages: 347 words: 97,721

Only Humans Need Apply: Winners and Losers in the Age of Smart Machines by Thomas H. Davenport, Julia Kirby

"World Economic Forum" Davos, AI winter, Amazon Robotics, Andy Kessler, Apollo Guidance Computer, artificial general intelligence, asset allocation, Automated Insights, autonomous vehicles, basic income, Baxter: Rethink Robotics, behavioural economics, business intelligence, business process, call centre, carbon-based life, Clayton Christensen, clockwork universe, commoditize, conceptual framework, content marketing, dark matter, data science, David Brooks, deep learning, deliberate practice, deskilling, digital map, disruptive innovation, Douglas Engelbart, driverless car, Edward Lloyd's coffeehouse, Elon Musk, Erik Brynjolfsson, estate planning, financial engineering, fixed income, flying shuttle, follow your passion, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, game design, general-purpose programming language, global pandemic, Google Glasses, Hans Lippershey, haute cuisine, income inequality, independent contractor, index fund, industrial robot, information retrieval, intermodal, Internet of things, inventory management, Isaac Newton, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joi Ito, Khan Academy, Kiva Systems, knowledge worker, labor-force participation, lifelogging, longitudinal study, loss aversion, machine translation, Mark Zuckerberg, Narrative Science, natural language processing, Nick Bostrom, Norbert Wiener, nuclear winter, off-the-grid, pattern recognition, performance metric, Peter Thiel, precariat, quantitative trading / quantitative finance, Ray Kurzweil, Richard Feynman, risk tolerance, Robert Shiller, robo advisor, robotic process automation, Rodney Brooks, Second Machine Age, self-driving car, Silicon Valley, six sigma, Skype, social intelligence, speech recognition, spinning jenny, statistical model, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, superintelligent machines, supply-chain management, tacit 
knowledge, tech worker, TED Talk, the long tail, transaction costs, Tyler Cowen, Tyler Cowen: Great Stagnation, Watson beat the top human players on Jeopardy!, Works Progress Administration, Zipcar

Finally, people who are interested in programming in this context should be interested in and knowledgeable about some aspect of this field’s key movements: artificial intelligence, natural language processing (NLP), machine learning, deep-learning neural networks, statistical analysis and data mining, and so forth. If you have a basic grounding in computer science and programming, it is possible to develop a sufficient understanding of these automation-oriented tools well into your career. Today there are many online courses related to this field. Stanford professors, for example, have created online courses with companies like Coursera and Udacity in such highly relevant fields as machine learning, natural language processing, algorithms, and robotics.

He wrote: Hi Amy, Would you please send an invite for Tom and me for Friday 9/19 at 9:30 A.M. at Hi-Rise Cafe in Cambridge, MA. We will be meeting in person. Thanks, Judah Curiosity getting the best of him, Tom looked up the company in Amy’s email extension, @x.ai. It turns out x.ai is a company that uses “natural language processing” software to interpret text and schedule meetings via email. “Amy,” in other words, is automated. Meanwhile, other tools such as email and voice mail, word processing, online travel sites, and Internet search applications have been chipping away at the rest of what used to be a secretarial job.

Our main mission in the next couple hundred pages is to persuade you, our knowledge worker reader, that you remain in charge of your destiny. You should be feeling a sense of agency and making decisions for yourself as to how you will deal with advancing automation. Over the past few years, even as every week brings news of some breakthrough in machine learning or natural language processing or visual image recognition, we’ve been learning from knowledge workers who are thriving. They’re redefining what it means to be more capable than computers, and doubling down on their very human strengths. As you’ll find in the chapters to come, these are not superhumans who can somehow process information more quickly than artificial intelligence or perform repetitive tasks as flawlessly as robots.


pages: 1,331 words: 163,200

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

AlphaGo, Amazon Mechanical Turk, Anton Chekhov, backpropagation, combinatorial explosion, computer vision, constrained optimization, correlation coefficient, crowdsourcing, data science, deep learning, DeepMind, don't repeat yourself, duck typing, Elon Musk, en.wikipedia.org, friendly AI, Geoffrey Hinton, ImageNet competition, information retrieval, iterative process, John von Neumann, Kickstarter, machine translation, natural language processing, Netflix Prize, NP-complete, OpenAI, optical character recognition, P = NP, p-value, pattern recognition, pull request, recommendation engine, self-driving car, sentiment analysis, SpamAssassin, speech recognition, stochastic process

Pac-Man Using Deep Q-Learning min_after_dequeue, RandomShuffleQueue MNIST dataset, MNIST-MNIST model parallelism, Model Parallelism-Model Parallelism model parameters, Gradient Descent, Batch Gradient Descent, Early Stopping, Under the Hood, Quadratic Programming, Creating Your First Graph and Running It in a Session, Construction Phase, Training RNNsdefining, Model-based learning model selection, Model-based learning model zoos, Model Zoos model-based learning, Model-based learning-Model-based learning modelsanalyzing, Analyze the Best Models and Their Errors-Analyze the Best Models and Their Errors evaluating on test set, Evaluate Your System on the Test Set-Evaluate Your System on the Test Set moments, Adam Optimization Momentum optimization, Momentum optimization-Momentum optimization Monte Carlo tree search, Policy Gradients Multi-Layer Perceptrons (MLP), Introduction to Artificial Neural Networks, The Perceptron-Multi-Layer Perceptron and Backpropagation, Neural Network Policiestraining with TF.Learn, Training an MLP with TensorFlow’s High-Level API multiclass classifiers, Multiclass Classification-Multiclass Classification Multidimensional Scaling (MDS), Other Dimensionality Reduction Techniques multilabel classifiers, Multilabel Classification-Multilabel Classification Multinomial Logistic Regression (see Softmax Regression) multinomial(), Neural Network Policies multioutput classifiers, Multioutput Classification-Multioutput Classification MultiRNNCell, Distributing a Deep RNN Across Multiple GPUs multithreaded readers, Multithreaded readers using a Coordinator and a QueueRunner-Multithreaded readers using a Coordinator and a QueueRunner multivariate regression, Frame the Problem N naive Bayes classifiers, Multiclass Classification name scopes, Name Scopes natural language processing (NLP), Recurrent Neural Networks, Natural Language Processing-An Encoder–Decoder Network for Machine Translationencoder-decoder network for machine translation, An 
Encoder–Decoder Network for Machine Translation-An Encoder–Decoder Network for Machine Translation TensorFlow tutorials, Natural Language Processing, An Encoder–Decoder Network for Machine Translation word embeddings, Word Embeddings-Word Embeddings Nesterov Accelerated Gradient (NAG), Nesterov Accelerated Gradient-Nesterov Accelerated Gradient Nesterov momentum optimization, Nesterov Accelerated Gradient-Nesterov Accelerated Gradient network topology, Fine-Tuning Neural Network Hyperparameters neural network hyperparameters, Fine-Tuning Neural Network Hyperparameters-Activation Functionsactivation functions, Activation Functions neurons per hidden layer, Number of Neurons per Hidden Layer number of hidden layers, Number of Hidden Layers-Number of Hidden Layers neural network policies, Neural Network Policies-Neural Network Policies neuronsbiological, From Biological to Artificial Neurons-Biological Neurons logical computations with, Logical Computations with Neurons neuron_layer(), Construction Phase next_batch(), Execution Phase No Free Lunch theorem, Testing and Validating node edges, Visualizing the Graph and Training Curves Using TensorBoard nonlinear dimensionality reduction (NLDR), LLE(see also Kernel PCA; LLE (Locally Linear Embedding)) nonlinear SVM classification, Nonlinear SVM Classification-Computational Complexitycomputational complexity, Computational Complexity Gaussian RBF kernel, Gaussian RBF Kernel-Gaussian RBF Kernel with polynomial features, Nonlinear SVM Classification-Polynomial Kernel polynomial kernel, Polynomial Kernel-Polynomial Kernel similarity features, adding, Adding Similarity Features-Adding Similarity Features nonparametric models, Regularization Hyperparameters nonresponse bias, Nonrepresentative Training Data nonsaturating activation functions, Nonsaturating Activation Functions-Nonsaturating Activation Functions normal distribution (see Gaussian distribution) Normal Equation, The Normal Equation-Computational Complexity 
normalization, Feature Scaling normalized exponential, Softmax Regression norms, Select a Performance Measure notations, Select a Performance Measure-Select a Performance Measure NP-Complete problems, The CART Training Algorithm null hypothesis, Regularization Hyperparameters numerical differentiation, Numerical Differentiation NumPy, Create the Workspace NumPy arrays, Handling Text and Categorical Attributes NVidia Compute Capability, Installation nvidia-smi, Managing the GPU RAM n_components, Choosing the Right Number of Dimensions O observation space, Neural Network Policies off-policy algorithm, Temporal Difference Learning and Q-Learning offline learning, Batch learning one-hot encoding, Handling Text and Categorical Attributes one-versus-all (OvA) strategy, Multiclass Classification, Softmax Regression, Exercises one-versus-one (OvO) strategy, Multiclass Classification online learning, Online learning-Online learning online SVMs, Online SVMs-Online SVMs OpenAI Gym, Introduction to OpenAI Gym-Introduction to OpenAI Gym operation_timeout_in_ms, In-Graph Versus Between-Graph Replication Optical Character Recognition (OCR), The Machine Learning Landscape optimal state value, Markov Decision Processes optimizers, Faster Optimizers-Learning Rate SchedulingAdaGrad, AdaGrad-AdaGrad Adam optimization, Faster Optimizers, Adam Optimization-Adam Optimization Gradient Descent (see Gradient Descent optimizer) learning rate scheduling, Learning Rate Scheduling-Learning Rate Scheduling Momentum optimization, Momentum optimization-Momentum optimization Nesterov Accelerated Gradient (NAG), Nesterov Accelerated Gradient-Nesterov Accelerated Gradient RMSProp, RMSProp out-of-bag evaluation, Out-of-Bag Evaluation-Out-of-Bag Evaluation out-of-core learning, Online learning out-of-memory (OOM) errors, Static Unrolling Through Time out-of-sample error, Testing and Validating OutOfRangeError, Reading the training data directly from the graph, Multithreaded readers using a Coordinator 
and a QueueRunner output gate, LSTM Cell output layer, Multi-Layer Perceptron and Backpropagation OutputProjectionWrapper, Training to Predict Time Series-Training to Predict Time Series output_put_keep_prob, Applying Dropout overcomplete autoencoder, Unsupervised Pretraining Using Stacked Autoencoders overfitting, Overfitting the Training Data-Overfitting the Training Data, Create a Test Set, Soft Margin Classification, Gaussian RBF Kernel, Regularization Hyperparameters, Regression, Number of Neurons per Hidden Layeravoiding through regularization, Avoiding Overfitting Through Regularization-Data Augmentation P p-value, Regularization Hyperparameters PaddingFIFOQueue, PaddingFifoQueue Pandas, Create the Workspace, Download the Datascatter_matrix, Looking for Correlations-Looking for Correlations parallel distributed computing, Distributing TensorFlow Across Devices and Servers-Exercisesdata parallelism, Data Parallelism-TensorFlow implementation in-graph versus between-graph replication, In-Graph Versus Between-Graph Replication-Model Parallelism model parallelism, Model Parallelism-Model Parallelism multiple devices across multiple servers, Multiple Devices Across Multiple Servers-Other convenience functionsasynchronous communication using queues, Asynchronous Communication Using TensorFlow Queues-PaddingFifoQueue loading training data, Loading Data Directly from the Graph-Other convenience functions master and worker services, The Master and Worker Services opening a session, Opening a Session pinning operations across tasks, Pinning Operations Across Tasks sharding variables, Sharding Variables Across Multiple Parameter Servers sharing state across sessions, Sharing State Across Sessions Using Resource Containers-Sharing State Across Sessions Using Resource Containers multiple devices on a single machine, Multiple Devices on a Single Machine-Control Dependenciescontrol dependencies, Control Dependencies installation, Installation-Installation managing the GPU 
RAM, Managing the GPU RAM-Managing the GPU RAM parallel execution, Parallel Execution-Parallel Execution placing operations on devices, Placing Operations on Devices-Soft placement one neural network per device, One Neural Network per Device-One Neural Network per Device parameter efficiency, Number of Hidden Layers parameter matrix, Softmax Regression parameter server (ps), Multiple Devices Across Multiple Servers parameter space, Gradient Descent parameter vector, Linear Regression, Gradient Descent, Training and Cost Function, Softmax Regression parametric models, Regularization Hyperparameters partial derivative, Batch Gradient Descent partial_fit(), Incremental PCA Pearson's r, Looking for Correlations peephole connections, Peephole Connections penalties (see rewards, in RL) percentiles, Take a Quick Look at the Data Structure Perceptron convergence theorem, The Perceptron Perceptrons, The Perceptron-Multi-Layer Perceptron and Backpropagationversus Logistic Regression, The Perceptron training, The Perceptron-The Perceptron performance measures, Select a Performance Measure-Select a Performance Measureconfusion matrix, Confusion Matrix-Confusion Matrix cross-validation, Measuring Accuracy Using Cross-Validation-Measuring Accuracy Using Cross-Validation precision and recall, Precision and Recall-Precision/Recall Tradeoff ROC (receiver operating characteristic) curve, The ROC Curve-The ROC Curve performance scheduling, Learning Rate Scheduling permutation(), Create a Test Set PG algorithms, Policy Gradients photo-hosting services, Semisupervised learning pinning operations, Pinning Operations Across Tasks pip, Create the Workspace Pipeline constructor, Transformation Pipelines-Select and Train a Model pipelines, Frame the Problem placeholder nodes, Feeding Data to the Training Algorithm placers (see simple placer; dynamic placer) policy, Policy Search policy gradients, Policy Search (see PG algorithms) policy space, Policy Search polynomial features, adding, 
Nonlinear SVM Classification-Polynomial Kernel polynomial kernel, Polynomial Kernel-Polynomial Kernel, Kernelized SVM Polynomial Regression, Training Models, Polynomial Regression-Polynomial Regressionlearning curves in, Learning Curves-Learning Curves pooling kernel, Pooling Layer pooling layer, Pooling Layer-Pooling Layer power scheduling, Learning Rate Scheduling precision, Confusion Matrix precision and recall, Precision and Recall-Precision/Recall TradeoffF-1 score, Precision and Recall-Precision and Recall precision/recall (PR) curve, The ROC Curve precision/recall tradeoff, Precision/Recall Tradeoff-Precision/Recall Tradeoff predetermined piecewise constant learning rate, Learning Rate Scheduling predict(), Data Cleaning predicted class, Confusion Matrix predictions, Confusion Matrix-Confusion Matrix, Decision Function and Predictions-Decision Function and Predictions, Making Predictions-Estimating Class Probabilities predictors, Supervised learning, Data Cleaning preloading training data, Preload the data into a variable PReLU (parametric leaky ReLU), Nonsaturating Activation Functions preprocessed attributes, Take a Quick Look at the Data Structure pretrained layers reuse, Reusing Pretrained Layers-Pretraining on an Auxiliary Taskauxiliary task, Pretraining on an Auxiliary Task-Pretraining on an Auxiliary Task caching frozen layers, Caching the Frozen Layers freezing lower layers, Freezing the Lower Layers model zoos, Model Zoos other frameworks, Reusing Models from Other Frameworks TensorFlow model, Reusing a TensorFlow Model-Reusing a TensorFlow Model unsupervised pretraining, Unsupervised Pretraining-Unsupervised Pretraining upper layers, Tweaking, Dropping, or Replacing the Upper Layers Pretty Tensor, Up and Running with TensorFlow primal problem, The Dual Problem principal component, Principal Components Principal Component Analysis (PCA), PCA-Randomized PCAexplained variance ratios, Explained Variance Ratio finding principal components, Principal 
Components-Principal Components for compression, PCA for Compression-Incremental PCA Incremental PCA, Incremental PCA-Randomized PCA Kernel PCA (kPCA), Kernel PCA-Selecting a Kernel and Tuning Hyperparameters projecting down to d dimensions, Projecting Down to d Dimensions Randomized PCA, Randomized PCA Scikit Learn for, Using Scikit-Learn variance, preserving, Preserving the Variance-Preserving the Variance probabilistic autoencoders, Variational Autoencoders probabilities, estimating, Estimating Probabilities-Estimating Probabilities, Estimating Class Probabilities producer functions, Other convenience functions projection, Projection-Projection propositional logic, From Biological to Artificial Neurons pruning, Regularization Hyperparameters, Symbolic Differentiation Pythonisolated environment in, Create the Workspace-Create the Workspace notebooks in, Create the Workspace-Download the Data pickle, Better Evaluation Using Cross-Validation pip, Create the Workspace Q Q-Learning algorithm, Temporal Difference Learning and Q-Learning-Learning to Play Ms.

Equation 14-4 summarizes how to compute the cell’s state at each time step for a single instance. Equation 14-4. GRU computations Creating a GRU cell in TensorFlow is trivial: gru_cell = tf.contrib.rnn.GRUCell(num_units=n_neurons) LSTM or GRU cells are one of the main reasons behind the success of RNNs in recent years, in particular for applications in natural language processing (NLP). Natural Language Processing Most of the state-of-the-art NLP applications, such as machine translation, automatic summarization, parsing, sentiment analysis, and more, are now based (at least in part) on RNNs. In this last section, we will take a quick look at what a machine translation model looks like.
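Since this excerpt names Equation 14-4 but does not reproduce it, here is a minimal NumPy sketch of the standard GRU update for a single instance, following the convention the book uses (new state = z ⊗ previous state + (1 − z) ⊗ candidate). All parameter names (`Wz`, `Uz`, `bz`, etc.) and the `gru_step` helper are invented for this sketch, not taken from the book's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU time step for a single instance.

    z: update gate, r: reset gate, g: candidate state.
    `params` maps hypothetical names to input weights (W*),
    recurrent weights (U*), and biases (b*).
    """
    z = sigmoid(params["Wz"] @ x + params["Uz"] @ h_prev + params["bz"])
    r = sigmoid(params["Wr"] @ x + params["Ur"] @ h_prev + params["br"])
    # The reset gate r controls how much of the previous state feeds the candidate.
    g = np.tanh(params["Wg"] @ x + params["Ug"] @ (r * h_prev) + params["bg"])
    # The update gate z blends the previous state with the candidate.
    return z * h_prev + (1.0 - z) * g

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
n_inputs, n_neurons = 3, 5
params = {name: rng.standard_normal(
              (n_neurons, n_inputs if name.startswith("W") else n_neurons))
          for name in ("Wz", "Wr", "Wg", "Uz", "Ur", "Ug")}
params.update({b: np.zeros(n_neurons) for b in ("bz", "br", "bg")})

h = gru_step(rng.standard_normal(n_inputs), np.zeros(n_neurons), params)
```

Because the candidate state passes through tanh and the gates stay in (0, 1), the resulting state components remain bounded, which is part of why GRU and LSTM cells train more stably than basic RNN cells over many time steps.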

Pac-Man Using Deep Q-Learning R Radial Basis Function (RBF), Adding Similarity Features Random Forests, Better Evaluation Using Cross-Validation-Grid Search, Multiclass Classification, Decision Trees, Instability, Ensemble Learning and Random Forests, Random Forests-Feature ImportanceExtra-Trees, Extra-Trees feature importance, Feature Importance-Feature Importance random initialization, Gradient Descent, Batch Gradient Descent, Stochastic Gradient Descent, Vanishing/Exploding Gradients Problems Random Patches and Random Subspaces, Random Patches and Random Subspaces randomized leaky ReLU (RReLU), Nonsaturating Activation Functions Randomized PCA, Randomized PCA randomized search, Randomized Search, Fine-Tuning Neural Network Hyperparameters RandomShuffleQueue, RandomShuffleQueue, Reading the training data directly from the graph random_uniform(), Manually Computing the Gradients reader operations, Reading the training data directly from the graph recall, Confusion Matrix recognition network, Efficient Data Representations reconstruction error, PCA for Compression reconstruction loss, Efficient Data Representations, TensorFlow Implementation, Variational Autoencoders reconstruction pre-image, Selecting a Kernel and Tuning Hyperparameters reconstructions, Efficient Data Representations recurrent neural networks (RNNs), Recurrent Neural Networks-Exercisesdeep RNNs, Deep RNNs-The Difficulty of Training over Many Time Steps exploration policies, Exploration Policies GRU cell, GRU Cell-GRU Cell input and output sequences, Input and Output Sequences-Input and Output Sequences LSTM cell, LSTM Cell-GRU Cell natural language processing (NLP), Natural Language Processing-An Encoder–Decoder Network for Machine Translation in TensorFlow, Basic RNNs in TensorFlow-Handling Variable-Length Output Sequencesdynamic unrolling through time, Dynamic Unrolling Through Time static unrolling through time, Static Unrolling Through Time-Static Unrolling Through Time variable length input 
sequences, Handling Variable Length Input Sequences variable length output sequences, Handling Variable-Length Output Sequences training, Training RNNs-Creative RNNbackpropagation through time (BPTT), Training RNNs creative sequences, Creative RNN sequence classifiers, Training a Sequence Classifier-Training a Sequence Classifier time series predictions, Training to Predict Time Series-Training to Predict Time Series recurrent neurons, Recurrent Neurons-Input and Output Sequencesmemory cells, Memory Cells reduce_mean(), Construction Phase reduce_sum(), TensorFlow Implementation-TensorFlow Implementation, Variational Autoencoders, Learning to Play Ms.


pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel

Alan Greenspan, Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apollo 11, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, butter production in bangladesh, call centre, Charles Lindbergh, commoditize, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, data science, driverless car, en.wikipedia.org, Erik Brynjolfsson, Everything should be made as simple as possible, experimental subject, Google Glasses, happiness index / gross national happiness, information security, job satisfaction, Johann Wolfgang von Goethe, lifelogging, machine readable, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mass immigration, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, risk-adjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, Shai Danziger, software as a service, SpaceShipOne, speech recognition, statistical model, Steven Levy, supply chain finance, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Davenport, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra, zero-sum game

See Hewlett-Packard (HP) Hubbard, Douglas human behavior collective intelligence consumer behavior insights emotions and mood prediction mistakes, predicting social effect and human genome human language inappropriate comments, predicting mood predictions and natural language processing (NLP) PA for persuasion and influence in human resources. See employees and staff I IBM corporate roll-ups Deep Blue computer DeepQA project Iambic IBM AI mind-reading technology natural language processing research sales leads, predicting student performance PA contest T. J. Watson Research Center value of See also Watson computer ID3 impact modeling. See uplift modeling Imperium incremental impact modeling.

Ask IBM researchers whether their question answering Watson system is anything like HAL, which goes famously rogue in the film, and they’ll quickly reroute your comparison toward the obedient computers of Star Trek. The field of research that develops technology to work with human language is natural language processing (NLP, aka computational linguistics). In commercial application, it’s known as text analytics. These fields develop analytical methods especially designed to operate across the written word. If data is all Earth’s water, textual data is the part known as “the ocean.” Often said to compose 80 percent of all data, it’s everything we the human race know that we’ve bothered to write down.
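As a concrete (if deliberately crude) illustration of the kind of analytical method these fields build for the written word, here is a minimal lexicon-based sentiment scorer. The word lists and the `sentiment_score` function are invented for this sketch; real text-analytics systems use far larger lexicons or learned models.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"great", "excellent", "love", "good", "amazing"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "awful"}

def sentiment_score(text):
    """Crude lexicon-based sentiment: +1 per positive word, -1 per negative."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    return (sum(words[w] for w in POSITIVE)
            - sum(words[w] for w in NEGATIVE))

print(sentiment_score("I love this product, it is excellent"))  # → 2
```

Even this toy example shows the defining move of text analytics: turning unstructured prose into countable features that downstream predictive models can consume.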

They were tackling the breadth of human language that stretches beyond the phrasing of each question to include a sea of textual sources, from which the answer to each question must be extracted. With this ambition, IBM had truly doubled down. I would have thought success impossible. After witnessing the world’s best researchers attempting to tackle the task through the 1990s (during which I spent six years in natural language processing research, as well as a summer at the same IBM Research center that bore Watson), I was ready to throw up my hands. Language is so tough that it seemed virtually impossible even to program a computer to answer questions within a limited domain of knowledge such as movies or wines. Yet IBM had taken on the unconstrained, open field of questions across any domain.


pages: 586 words: 186,548

Architects of Intelligence by Martin Ford

3D printing, agricultural Revolution, AI winter, algorithmic bias, Alignment Problem, AlphaGo, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, backpropagation, barriers to entry, basic income, Baxter: Rethink Robotics, Bayesian statistics, Big Tech, bitcoin, Boeing 747, Boston Dynamics, business intelligence, business process, call centre, Cambridge Analytica, cloud computing, cognitive bias, Colonization of Mars, computer vision, Computing Machinery and Intelligence, correlation does not imply causation, CRISPR, crowdsourcing, DARPA: Urban Challenge, data science, deep learning, DeepMind, Demis Hassabis, deskilling, disruptive innovation, Donald Trump, Douglas Hofstadter, driverless car, Elon Musk, Erik Brynjolfsson, Ernest Rutherford, fake news, Fellow of the Royal Society, Flash crash, future of work, general purpose technology, Geoffrey Hinton, gig economy, Google X / Alphabet X, Gödel, Escher, Bach, Hans Moravec, Hans Rosling, hype cycle, ImageNet competition, income inequality, industrial research laboratory, industrial robot, information retrieval, job automation, John von Neumann, Large Hadron Collider, Law of Accelerating Returns, life extension, Loebner Prize, machine translation, Mark Zuckerberg, Mars Rover, means of production, Mitch Kapor, Mustafa Suleyman, natural language processing, new economy, Nick Bostrom, OpenAI, opioid epidemic / opioid crisis, optical character recognition, paperclip maximiser, pattern recognition, phenotype, Productivity paradox, radical life extension, Ray Kurzweil, recommendation engine, Robert Gordon, Rodney Brooks, Sam Altman, self-driving car, seminal paper, sensor fusion, sentiment analysis, Silicon Valley, smart cities, social intelligence, sparse data, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, synthetic biology, systems thinking, Ted Kaczynski, TED 
Talk, The Rise and Fall of American Growth, theory of mind, Thomas Bayes, Travis Kalanick, Turing test, universal basic income, Wall-E, Watson beat the top human players on Jeopardy!, women in the workforce, working-age population, workplace surveillance , zero-sum game, Zipcar

YANN LECUN: What I’ve mentioned so far are the fundamental topics of research, but there are a whole bunch of application areas. Facebook is very active in computer vision, and I think we can claim to have the best computer vision research group in the world. It’s a mature group and there are a lot of really cool activities there. We’re putting quite a lot of work into natural language processing, and that includes translation, summarization, text categorization—figuring out what topic a text talks about, as well as dialog systems. Actually, dialog systems are a very important area of research for virtual assistants, question and answering systems, and so on. MARTIN FORD: Do you anticipate the creation of an AI that someday could pass the Turing test?

It’s too easy to trick it, and to some extent, the Turing test has already been and gone. We give a lot of importance to language as humans because we are used to discussing intelligent topics with other humans through language. However, language is sort of an epiphenomenon of intelligence, and when I say this, my colleagues who work on natural language processing disagree vehemently! Look at orangutans, who are essentially almost as smart as we are. They have a huge amount of common sense and very good models of the world, and they can build tools, just like humans. However, they don’t have language, they’re not social animals, and they barely interact with other members of the species outside the non-linguistic mother-and-child interaction.

We talk a lot about AI replacing humans in terms of a job scenario, but there are way more opportunities for AI to enhance humans and augment humans. The opportunities are much, much wider and I think we should advocate and invest in technology that is about collaboration and interaction between humans and machines. That’s robotics, natural language processing, human-centric design, and all that. The third component of human-centered AI recognizes that computer science alone cannot address all the AI opportunities and issues. It’s a deeply impactful technology to humanity, so we should be bringing in economists to talk about jobs, to talk about bigger organizations, to talk about finance.


AI 2041 by Kai-Fu Lee, Chen Qiufan

3D printing, Abraham Maslow, active measures, airport security, Albert Einstein, AlphaGo, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, augmented reality, autonomous vehicles, basic income, bitcoin, blockchain, blue-collar work, Cambridge Analytica, carbon footprint, Charles Babbage, computer vision, contact tracing, coronavirus, corporate governance, corporate social responsibility, COVID-19, CRISPR, cryptocurrency, DALL-E, data science, deep learning, deepfake, DeepMind, delayed gratification, dematerialisation, digital map, digital rights, digital twin, Elon Musk, fake news, fault tolerance, future of work, Future Shock, game design, general purpose technology, global pandemic, Google Glasses, Google X / Alphabet X, GPT-3, happiness index / gross national happiness, hedonic treadmill, hiring and firing, Hyperloop, information security, Internet of things, iterative process, job automation, language acquisition, low earth orbit, Lyft, Maslow's hierarchy, mass immigration, mirror neurons, money: store of value / unit of account / medium of exchange, mutually assured destruction, natural language processing, Neil Armstrong, Nelson Mandela, OpenAI, optical character recognition, pattern recognition, plutocrats, post scarcity, profit motive, QR code, quantitative easing, Richard Feynman, ride hailing / ride sharing, robotic process automation, Satoshi Nakamoto, self-driving car, seminal paper, Silicon Valley, smart cities, smart contracts, smart transportation, Snapchat, social distancing, speech recognition, Stephen Hawking, synthetic biology, telemarketer, Tesla Model S, The future is already here, trolley problem, Turing test, uber lyft, universal basic income, warehouse automation, warehouse robotics, zero-sum game

For the first time in many years, Golden Sparrow and Silver Sparrow nodded in perfect sync. ANALYSIS NATURAL LANGUAGE PROCESSING, SELF-SUPERVISED TRAINING, GPT-3, AGI AND CONSCIOUSNESS, AI EDUCATION “Twin Sparrows” introduces the idea of personal AI companions—in this case, companions whose primary function is to serve as tutors for the twins in the story. The AI companions, or vPals, as Fountainhead Academy’s program calls them, feature many AI technologies, but the one I want to highlight is natural language processing (NLP), or the ability for machines to process and understand human languages. What’s the chance of humans being able to form relationships with sophisticated AI companions like Atoman within twenty years?

I will then answer the natural question: When AI masters our language, will it have general intelligence? Lastly, we will explore the future of education in the AI era, including how AI will become a great complement to human teachers and significantly enhance the future of education. NATURAL LANGUAGE PROCESSING (NLP) Natural language processing is a subbranch of AI. Speech and language are central to human intelligence, communication, and cognitive processes, so understanding natural language is often viewed as the greatest AI challenge. “Natural language” refers to the language of humans—speech, writing, and nonverbal communication that may have an innate component and that people cultivate through social interactions and education.

Classification: LCC Q335 .L423 2021 (print) | LCC Q335 (ebook) | DDC 006.3—dc23 LC record available at https://lccn.loc.gov/​2021012928 LC ebook record available at https://lccn.loc.gov/​2021012929 International edition ISBN 9780593240717 Ebook ISBN 9780593238301 crownpublishing.com Book design by Edwin Vazquez, adapted for ebook Cover Design: Will Staehle ep_prh_5.7.1_c0_r0 Contents Cover Title Page Copyright Epigraph Introduction by Kai-Fu Lee: The Real Story of AI Introduction by Chen Qiufan: How We Can Learn to Stop Worrying and Embrace the Future with Imagination Chapter One: The Golden Elephant Analysis: Deep Learning, Big Data, Internet/Finance Applications, AI Externalities Chapter Two: Gods Behind the Masks Analysis: Computer Vision, Convolutional Neural Networks, Deepfakes, Generative Adversarial Networks (GANs), Biometrics, AI Security Chapter Three: Twin Sparrows Analysis: Natural Language Processing, Self-Supervised Training, GPT-3, AGI and Consciousness, AI Education Chapter Four: Contactless Love Analysis: AI Healthcare, AlphaFold, Robotic Applications, COVID Automation Acceleration Chapter Five: My Haunting Idol Analysis: Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR), Brain-Computer Interface (BCI), Ethical and Societal Issues Chapter Six: The Holy Driver Analysis: Autonomous Vehicles, Full Autonomy and Smart Cities, Ethical and Social Issues Chapter Seven: Quantum Genocide Analysis: Quantum Computers, Bitcoin Security, Autonomous Weapons and Existential Threat Chapter Eight: The Job Savior Analysis: AI Job Displacement, Universal Basic Income (UBI), What AI Cannot Do, 3Rs as a Solution to Displacement Chapter Nine: Isle of Happiness Analysis: AI and Happiness, General Data Protection Regulation (GDPR), Personal Data, Privacy Computing Using Federated Learning and Trusted Execution Environment (TEE) Chapter Ten: Dreaming of Plenitude Analysis: Plenitude, New Economic Models, the Future of Money, Singularity Acknowledgments 
Other Titles About the Authors What we want is a machine that can learn from experience.


pages: 294 words: 81,292

Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat

AI winter, air gap, AltaVista, Amazon Web Services, artificial general intelligence, Asilomar, Automated Insights, Bayesian statistics, Bernie Madoff, Bill Joy: nanobots, Bletchley Park, brain emulation, California energy crisis, cellular automata, Chuck Templeton: OpenTable:, cloud computing, cognitive bias, commoditize, computer vision, Computing Machinery and Intelligence, cuban missile crisis, Daniel Kahneman / Amos Tversky, Danny Hillis, data acquisition, don't be evil, drone strike, dual-use technology, Extropian, finite state, Flash crash, friendly AI, friendly fire, Google Glasses, Google X / Alphabet X, Hacker News, Hans Moravec, Isaac Newton, Jaron Lanier, Jeff Hawkins, John Markoff, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, Loebner Prize, lone genius, machine translation, mutually assured destruction, natural language processing, Neil Armstrong, Nicholas Carr, Nick Bostrom, optical character recognition, PageRank, PalmPilot, paperclip maximiser, pattern recognition, Peter Thiel, precautionary principle, prisoner's dilemma, Ray Kurzweil, Recombinant DNA, Rodney Brooks, rolling blackouts, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Silicon Valley, Singularitarianism, Skype, smart grid, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Jurvetson, Steve Wozniak, strong AI, Stuxnet, subprime mortgage crisis, superintelligent machines, technological singularity, The Coming Technological Singularity, Thomas Bayes, traveling salesman, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, zero day

But it will continue to develop the component sciences of traveling in space—rocketry, robotics, astronomy, et cetera—and one day all the pieces will come together, and a shot at Mars will look feasible. Likewise, narrow AI projects do lots of intelligent jobs like search, voice recognition, natural language processing, visual perception, data mining, and much more. Separately they are well-funded, powerful tools, dramatically improving each year. Together they advance the computer sciences that will benefit AGI systems. However, Norvig told me, no AGI program for Google exists. But compare that statement to what his boss, Google cofounder Larry Page said at a London conference called Zeitgeist ’06: People always make the assumption that we’re done with search.

Created by AI pioneer Douglas Lenat, Cyc is the largest AI project in history, and probably the best funded, with $50 million in grants from government agencies, including DARPA, since 1984. Cyc’s creators continue to improve its database and inference engine so it can better process “natural language,” or everyday written language. Once it has acquired a sufficient natural language processing (NLP) capability, its creators will start it reading, and comprehending, all the Web pages on the Internet. Another contender for most knowledgeable knowledge database is already doing that. Carnegie Mellon University’s NELL, the Never-Ending-Language-Learning system, knows more than 390,000 facts about the world.

Many know that DARPA (then called ARPA) funded the research that invented the Internet (initially called ARPANET), as well as the researchers who developed the now ubiquitous GUI, or Graphical User Interface, a version of which you probably see every time you use a computer or smart phone. But the agency was also a major backer of parallel processing hardware and software, distributed computing, computer vision, and natural language processing (NLP). These contributions to the foundations of computer science are as important to AI as the results-oriented funding that characterizes DARPA today. How is DARPA spending its money? A recent annual budget allocates $61.3 million to a category called Machine Learning, and $49.3 million to Cognitive Computing.


pages: 288 words: 86,995

Rule of the Robots: How Artificial Intelligence Will Transform Everything by Martin Ford

AI winter, Airbnb, algorithmic bias, algorithmic trading, Alignment Problem, AlphaGo, Amazon Mechanical Turk, Amazon Web Services, artificial general intelligence, Automated Insights, autonomous vehicles, backpropagation, basic income, Big Tech, big-box store, call centre, carbon footprint, Chris Urmson, Claude Shannon: information theory, clean water, cloud computing, commoditize, computer age, computer vision, Computing Machinery and Intelligence, coronavirus, correlation does not imply causation, COVID-19, crowdsourcing, data is the new oil, data science, deep learning, deepfake, DeepMind, Demis Hassabis, deskilling, disruptive innovation, Donald Trump, Elon Musk, factory automation, fake news, fulfillment center, full employment, future of work, general purpose technology, Geoffrey Hinton, George Floyd, gig economy, Gini coefficient, global pandemic, Googley, GPT-3, high-speed rail, hype cycle, ImageNet competition, income inequality, independent contractor, industrial robot, informal economy, information retrieval, Intergovernmental Panel on Climate Change (IPCC), Internet of things, Jeff Bezos, job automation, John Markoff, Kiva Systems, knowledge worker, labor-force participation, Law of Accelerating Returns, license plate recognition, low interest rates, low-wage service sector, Lyft, machine readable, machine translation, Mark Zuckerberg, Mitch Kapor, natural language processing, Nick Bostrom, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, Ocado, OpenAI, opioid epidemic / opioid crisis, passive income, pattern recognition, Peter Thiel, Phillips curve, post scarcity, public intellectual, Ray Kurzweil, recommendation engine, remote working, RFID, ride hailing / ride sharing, Robert Gordon, Rodney Brooks, Rubik’s Cube, Sam Altman, self-driving car, Silicon Valley, Silicon Valley startup, social distancing, SoftBank, South of Market, San Francisco, special economic zone, speech recognition, stealth mode startup, Stephen 
Hawking, superintelligent machines, TED Talk, The Future of Employment, The Rise and Fall of American Growth, the scientific method, Turing machine, Turing test, Tyler Cowen, Tyler Cowen: Great Stagnation, Uber and Lyft, uber lyft, universal basic income, very high income, warehouse automation, warehouse robotics, Watson beat the top human players on Jeopardy!, WikiLeaks, women in the workforce, Y Combinator

Tony Peng, “Yann LeCun Cake Analogy 2.0,” Synced Review, February 22, 2019, medium.com/syncedreview/yann-lecun-cake-analogy-2-0-a361da560dae. 23. Ford, Interview with Demis Hassabis, in Architects of Intelligence, pp. 172–173. 24. Jeremy Kahn, “A.I. breakthroughs in natural-language processing are big for business,” Fortune, January 20, 2020, fortune.com/2020/01/20/natural-language-processing-business/. 25. Ford, Interview with David Ferrucci, in Architects of Intelligence, p. 409. 26. Ibid. p. 414. 27. Do You Trust This Computer?, released April 5, 2018, Papercut Films, doyoutrustthiscomputer.org/. 28. Ford, Interview with David Ferrucci, in Architects of Intelligence, p. 414. 29.

As always, competition between the cloud providers is a powerful driver of innovation, and Amazon’s deep learning tools for the AWS platform are likewise becoming easier to use. Along with the development tools, all the cloud services offer pre-built deep learning components that are ready to be used out of the box and incorporated into applications. Amazon, for example, offers packages for speech recognition and natural language processing and a “recommendation engine” that can make suggestions in the same way that online shoppers or movie watchers are shown alternatives that are likely to be of interest.16 The most controversial example of this kind of prepackaged capability is AWS’s Rekognition service, which makes it easy for developers to deploy facial recognition technology.

There was an immediate backlash against the specter of the tech giant having access to NHS patient data despite the fact that Google claimed strict privacy policies were in place and the data was carefully anonymized.51 All this illustrates, once again, how factors beyond the capability of the technology itself—in this case, perceived privacy concerns—can act to significantly slow the deployment of artificial intelligence in the healthcare arena. Some of the most surprising successes with artificial intelligence in healthcare are occurring in the mental health arena. Woebot Labs, a Silicon Valley startup founded in 2017, has developed a chatbot powered by natural language processing technology similar to what is used in Alexa and Siri, combined with carefully scripted conversational elements developed by psychologists. Woebot’s approach is essentially to automate cognitive behavioral therapy, or CBT, a proven technique for helping people with depression or anxiety.


pages: 398 words: 86,855

Bad Data Handbook by Q. Ethan McCallum

Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, Chuck Templeton: OpenTable:, cloud computing, cognitive dissonance, combinatorial explosion, commoditize, conceptual framework, data science, database schema, DevOps, en.wikipedia.org, Firefox, Flash crash, functional programming, Gini coefficient, hype cycle, illegal immigration, iterative process, labor-force participation, loose coupling, machine readable, natural language processing, Netflix Prize, One Laptop per Child (OLPC), power law, quantitative trading / quantitative finance, recommendation engine, selection bias, sentiment analysis, SQL injection, statistical model, supply-chain management, survivorship bias, text mining, too big to fail, web application

Reviews can tell us the positive or negative sentiment of the reviewer, as well as what they specifically care about, such as quality of service, ambience, and value. When we aggregate reviews, we can learn what’s popular about the place and why people like or dislike it. We use many other signals besides reviews, but with the proper application of natural language processing,[9] reviews are a rich source of significant information. Getting Reviews To get reviews, we use APIs where possible, but most reviews are found using good old-fashioned web scraping. If you can use an API like CityGrid[10] to get the data you need, it will make your life much easier, because while scraping isn’t necessarily difficult, it can be very frustrating.

Such a sentiment classifier could be run over a business’s reviews in order to calculate an overall sentiment, and to make up for any missing rating information. Sentiment Classification NLTK,[12] Python’s Natural Language ToolKit, is a very useful programming library for doing natural language processing and text classification.[13] It also comes with many corpora that you can use for training and testing. One of these is the movie_reviews corpus,[14] and if you’re just learning how to do sentiment classification, this is a good corpus to start with. It is organized into two directories, pos and neg.

By choosing to show only positive reviews, the data, design, and user experience are all congruent, helping our users choose from the best options available based on their own preferences, without having to do any mental filtering of negative opinions. Lessons Learned One important lesson for machine learning and statistical natural language processing enthusiasts: it’s very important to train your own models on your own data. If I had used classifiers trained on the standard movie_reviews corpus, I would never have gotten these results. Movie reviews are simply different than local business reviews. In fact, it might be the case that you’d get even better results by segmenting businesses by type, and creating classifiers for each type of business.


pages: 252 words: 74,167

Thinking Machines: The Inside Story of Artificial Intelligence and Our Race to Build the Future by Luke Dormehl

"World Economic Forum" Davos, Ada Lovelace, agricultural Revolution, AI winter, Albert Einstein, Alexey Pajitnov wrote Tetris, algorithmic management, algorithmic trading, AlphaGo, Amazon Mechanical Turk, Apple II, artificial general intelligence, Automated Insights, autonomous vehicles, backpropagation, Bletchley Park, book scanning, borderless world, call centre, cellular automata, Charles Babbage, Claude Shannon: information theory, cloud computing, computer vision, Computing Machinery and Intelligence, correlation does not imply causation, crowdsourcing, deep learning, DeepMind, driverless car, drone strike, Elon Musk, Flash crash, Ford Model T, friendly AI, game design, Geoffrey Hinton, global village, Google X / Alphabet X, Hans Moravec, hive mind, industrial robot, information retrieval, Internet of things, iterative process, Jaron Lanier, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, life extension, Loebner Prize, machine translation, Marc Andreessen, Mark Zuckerberg, Menlo Park, Mustafa Suleyman, natural language processing, Nick Bostrom, Norbert Wiener, out of africa, PageRank, paperclip maximiser, pattern recognition, radical life extension, Ray Kurzweil, recommendation engine, remote working, RFID, scientific management, self-driving car, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, social intelligence, speech recognition, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, tech billionaire, technological singularity, The Coming Technological Singularity, The Future of Employment, Tim Cook: Apple, Tony Fadell, too big to fail, traumatic brain injury, Turing machine, Turing test, Vernor Vinge, warehouse robotics, Watson beat the top human players on Jeopardy!

Overenthusiasm meant that impressive, if incremental, advances were often written up as though truly smart machines were already here. For example, one heavily hyped project was a 1960s robot called SHAKEY, described as the world’s first general-purpose robot capable of reasoning about its own actions. In doing so, it set benchmarks in fields like pattern recognition, information representation, problem solving and natural language processing. That alone should have been enough to make SHAKEY exciting, but journalists couldn’t resist a bit of embellishment. As such, when SHAKEY appeared in Life magazine in 1970, he was hailed not as a promising combination of several important research topics, but as the world’s ‘first electronic person’.

In other cases, Siri’s reasoning allows it to extract the relevant concepts from our sentences and connect these with web-based services and data, applying its ever-growing knowledge about you to a series of rules, concepts and contexts. The result is a way of turning requests into actions. ‘I want to eat in the same restaurant I ate in last week,’ is a straightforward enough sentence, but to make it into something useful, an AI assistant such as Siri must not only use natural language processing to understand the concept you are talking about, but also use context to find the right rule in its programming to follow. The speech recognition used in Siri is the creation of Nuance Communications, arguably the most advanced speech recognition company in the world. ‘Our job is to figure out the logical assertions inherent in the question that is being asked, or the command that is being given,’ Nuance’s Distinguished Scientist Ron Kaplan tells me.

DARPA approached the non-profit research institute SRI International about creating a five-year, 500-person investigation, which was, at the time, the largest AI project in history. It brought together experts from a range of AI disciplines, including machine learning, knowledge representation and natural language processing. DARPA’s project was called CALO, standing for Cognitive Assistant that Learns and Organises. The name was inspired by the Latin word ‘calonis’, meaning ‘soldier’s servant’. After half a decade of research, SRI International made the decision to spin-off a consumer-facing version of the technology.


pages: 579 words: 76,657

Data Science from Scratch: First Principles with Python by Joel Grus

backpropagation, confounding variable, correlation does not imply causation, data science, deep learning, Hacker News, higher-order functions, natural language processing, Netflix Prize, p-value, Paul Graham, recommendation engine, SpamAssassin, statistical model

Three bottom-up clusters using max distance For Further Exploration scikit-learn has an entire module sklearn.cluster that contains several clustering algorithms including KMeans and the Ward hierarchical clustering algorithm (which uses a different criterion for merging clusters than ours did). SciPy has two clustering models scipy.cluster.vq (which does k-means) and scipy.cluster.hierarchy (which has a variety of hierarchical clustering algorithms). Chapter 20. Natural Language Processing They have been at a great feast of languages, and stolen the scraps. William Shakespeare Natural language processing (NLP) refers to computational techniques involving language. It’s a broad field, but we’ll look at a few techniques both simple and not simple. Word Clouds In Chapter 1, we computed word counts of users’ interests. One approach to visualizing words and counts is word clouds, which artistically lay out the words with sizes proportional to their counts.

modules (Python), Modules multiple assignment (Python), Tuples N n-gram models, n-gram Models-n-gram Modelsbigram, n-gram Models trigrams, n-gram Models n-grams, n-gram Models Naive Bayes algorithm, Naive Bayes-For Further Explorationexample, filtering spam, A Really Dumb Spam Filter-A More Sophisticated Spam Filter implementation, Implementation natural language processing (NLP), Natural Language Processing-For Further Explorationgrammars, Grammars-Grammars topic modeling, Topic Modeling-Topic Modeling topics of interest, finding, Topics of Interest word clouds, Word Clouds-Word Clouds nearest neighbors classification, k-Nearest Neighbors-For Further Explorationcurse of dimensionality, The Curse of Dimensionality-The Curse of Dimensionality example, favorite programming languages, Example: Favorite Languages-Example: Favorite Languages model, The Model network analysis, Network Analysis-For Further Explorationbetweenness centrality, Betweenness Centrality-Betweenness Centrality closeness centrality, Betweenness Centrality degree centrality, Finding Key Connectors, Betweenness Centrality directed graphs and PageRank, Directed Graphs and PageRank-Directed Graphs and PageRank eigenvector centrality, Eigenvector Centrality-Centrality networks, Network Analysis neural networks, Neural Networks-For Further Explorationbackpropagation, Backpropagation example, defeating a CAPTCHA, Example: Defeating a CAPTCHA-Example: Defeating a CAPTCHA feed-forward, Feed-Forward Neural Networks perceptrons, Perceptrons neurons, Neural Networks NLP (see natural language processing) nodes, Network Analysis noise, Rescalingin machine learning, Overfitting and Underfitting None (Python), Truthiness normal distribution, The Normal Distributionand p-value computation, Example: Flipping a Coin central limit theorem and, The Central Limit Theorem in coin flip example, Example: Flipping a Coin standard, The Normal Distribution normalized tables, JOIN NoSQL databases, NoSQL NotQuiteABase, 
Databases and SQL null hypothesis, Statistical Hypothesis Testingtesting in A/B test, Example: Running an A/B Test NumPy, NumPy O one-sided tests, Example: Flipping a Coin ORDER BY statement (SQL), ORDER BY overfitting, Overfitting and Underfitting, The Bias-Variance Trade-off P p-hacking, P-hacking p-values, Example: Flipping a Coin PageRank algorithm, Directed Graphs and PageRank paid accounts, predicting, Paid Accounts pandas, For Further Exploration, For Further Exploration, pandas parameterized models, What Is Machine Learning?

modules (Python), Modules multiple assignment (Python), Tuples N n-gram models, n-gram Models-n-gram Modelsbigram, n-gram Models trigrams, n-gram Models n-grams, n-gram Models Naive Bayes algorithm, Naive Bayes-For Further Explorationexample, filtering spam, A Really Dumb Spam Filter-A More Sophisticated Spam Filter implementation, Implementation natural language processing (NLP), Natural Language Processing-For Further Explorationgrammars, Grammars-Grammars topic modeling, Topic Modeling-Topic Modeling topics of interest, finding, Topics of Interest word clouds, Word Clouds-Word Clouds nearest neighbors classification, k-Nearest Neighbors-For Further Explorationcurse of dimensionality, The Curse of Dimensionality-The Curse of Dimensionality example, favorite programming languages, Example: Favorite Languages-Example: Favorite Languages model, The Model network analysis, Network Analysis-For Further Explorationbetweenness centrality, Betweenness Centrality-Betweenness Centrality closeness centrality, Betweenness Centrality degree centrality, Finding Key Connectors, Betweenness Centrality directed graphs and PageRank, Directed Graphs and PageRank-Directed Graphs and PageRank eigenvector centrality, Eigenvector Centrality-Centrality networks, Network Analysis neural networks, Neural Networks-For Further Explorationbackpropagation, Backpropagation example, defeating a CAPTCHA, Example: Defeating a CAPTCHA-Example: Defeating a CAPTCHA feed-forward, Feed-Forward Neural Networks perceptrons, Perceptrons neurons, Neural Networks NLP (see natural language processing) nodes, Network Analysis noise, Rescalingin machine learning, Overfitting and Underfitting None (Python), Truthiness normal distribution, The Normal Distributionand p-value computation, Example: Flipping a Coin central limit theorem and, The Central Limit Theorem in coin flip example, Example: Flipping a Coin standard, The Normal Distribution normalized tables, JOIN NoSQL databases, NoSQL NotQuiteABase, 
Databases and SQL null hypothesis, Statistical Hypothesis Testingtesting in A/B test, Example: Running an A/B Test NumPy, NumPy O one-sided tests, Example: Flipping a Coin ORDER BY statement (SQL), ORDER BY overfitting, Overfitting and Underfitting, The Bias-Variance Trade-off P p-hacking, P-hacking p-values, Example: Flipping a Coin PageRank algorithm, Directed Graphs and PageRank paid accounts, predicting, Paid Accounts pandas, For Further Exploration, For Further Exploration, pandas parameterized models, What Is Machine Learning?


pages: 370 words: 112,809

The Equality Machine: Harnessing Digital Technology for a Brighter, More Inclusive Future by Orly Lobel

2021 United States Capitol attack, 23andMe, Ada Lovelace, affirmative action, Airbnb, airport security, Albert Einstein, algorithmic bias, Amazon Mechanical Turk, augmented reality, barriers to entry, basic income, Big Tech, bioinformatics, Black Lives Matter, Boston Dynamics, Charles Babbage, choice architecture, computer vision, Computing Machinery and Intelligence, contact tracing, coronavirus, corporate social responsibility, correlation does not imply causation, COVID-19, crowdsourcing, data science, David Attenborough, David Heinemeier Hansson, deep learning, deepfake, digital divide, digital map, Elon Musk, emotional labour, equal pay for equal work, feminist movement, Filter Bubble, game design, gender pay gap, George Floyd, gig economy, glass ceiling, global pandemic, Google Chrome, Grace Hopper, income inequality, index fund, information asymmetry, Internet of things, invisible hand, it's over 9,000, iterative process, job automation, Lao Tzu, large language model, lockdown, machine readable, machine translation, Mark Zuckerberg, market bubble, microaggression, Moneyball by Michael Lewis explains big data, natural language processing, Netflix Prize, Network effects, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, occupational segregation, old-boy network, OpenAI, openstreetmap, paperclip maximiser, pattern recognition, performance metric, personalized medicine, price discrimination, publish or perish, QR code, randomized controlled trial, remote working, risk tolerance, robot derives from the Czech word robota Czech, meaning slave, Ronald Coase, Salesforce, self-driving car, sharing economy, Sheryl Sandberg, Silicon Valley, social distancing, social intelligence, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, surveillance capitalism, tech worker, TechCrunch disrupt, The Future of Employment, TikTok, Turing test, universal basic income, Wall-E, warehouse automation, women in 
the workforce, work culture , you are the product

A study by the World Bank recently used text-as-data methods to examine village assembly transcripts in rural villages in Tamil Nadu, India.9 The researchers used natural language processing to measure deliberative influence and found that women are at a disadvantage relative to men: they are less likely to speak, set the agenda, and receive a relevant response from state officials. The study also found that although the frequency of female attendees’ speech did not increase when there were gender quotas for village council presidents, female presidents did tend to be more responsive to female constituents. Natural language processing methods thus show promise to reveal, and perhaps rectify, patterns of unequal speech and influence that might otherwise be difficult to analyze across many local governments.

Machine learning is an application of AI that allows computers to autonomously improve through data and experience without being explicitly programmed on how to do so. One way that machine learning happens is through word embedding, a common research framework that represents text data as vectors used in many machine learning and natural language processing tasks. The machine teaches itself associations from all the text input it receives, and the algorithm learns about pairings. When presented with associations like “wheel: car, wing: ____,” a word-embedding algorithm learns to predict “plane.” Taking it a step further, “flowers” and “musical instruments,” for example, are far more likely to be associated with pleasant words than “insects” and “weapons” are.3 Just like our minds develop associations, an algorithm learns these connections from processing natural language fed to it.

Companies like Advanced Discovery, Nex, Botler, AwareHQ, and Emtrain (to name just a few) offer early warnings—a “smoke detector” AI—analyzing patterns of behavior that could lead to sexual harassment. These AI tools cross-reference thousands of legal documents and complaints related to harassment and use natural language processing to scan online conversations to predict whether a user’s experience might be unlawful. The AI bots analyze speech patterns, attachments, and the timing of messages being sent, tracking employee communication, flagging potentially problematic cases, and sending them for human investigation.


pages: 392 words: 108,745

Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think by James Vlahos

Albert Einstein, AltaVista, Amazon Mechanical Turk, Amazon Web Services, augmented reality, Automated Insights, autonomous vehicles, backpropagation, Big Tech, Cambridge Analytica, Chuck Templeton: OpenTable:, cloud computing, Colossal Cave Adventure, computer age, deep learning, DeepMind, Donald Trump, Elon Musk, fake news, Geoffrey Hinton, information retrieval, Internet of things, Jacques de Vaucanson, Jeff Bezos, lateral thinking, Loebner Prize, machine readable, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Mark Zuckerberg, Menlo Park, natural language processing, Neal Stephenson, Neil Armstrong, OpenAI, PageRank, pattern recognition, Ponzi scheme, randomized controlled trial, Ray Kurzweil, Ronald Reagan, Rubik’s Cube, self-driving car, sentiment analysis, Silicon Valley, Skype, Snapchat, speech recognition, statistical model, Steve Jobs, Steve Wozniak, Steven Levy, TechCrunch disrupt, Turing test, Watson beat the top human players on Jeopardy!

The company made some of the world’s most beloved consumer electronic devices and had a huge head start on conversational AI with Siri, who had just been unveiled. Amazon didn’t have a substantial track record with consumer products; there was only the Kindle e-reader. And the company didn’t employ legions of experts in speech recognition and natural-language processing. The number of people at Amazon with experience in those fields came to a grand total of two. The company was starting from scratch, and Hart had to suspend his own disbelief. “If we could build it—and I didn’t know the answer to the ‘if’ part—that would be an amazing product,” Hart remembers thinking.

In September 2011 Amazon acquired Yap, a North Carolina–based company that specialized in cloud-based speech recognition. Engineers at Lab126—the company’s hardware skunk works in Sunnyvale, California, where the Kindle had been created—worked on designing the device itself. In 2012 Doppler added an office in Boston, which, thanks to all of the city’s academic institutions, was a hotbed of natural-language-processing talent. In October 2012 Amazon acquired a Cambridge, UK–based company called Evi, which specialized in automatically answering spoken questions. And in January 2013 Doppler bought out Ivona, a Polish company that produced synthetic computer voices. Big picture, the problems that the Doppler team had to solve could be divided into two categories.

Wherever you were in a room, and whatever else was happening acoustically—music playing, baby crying, Klingons attacking—the device should be able to hear you. “Far-field speech recognition did not exist in any commercial product when we started on this project,” Hart says. “We didn’t know if we could solve it.” Rohit Prasad, a scientist whom Amazon hired in April 2013 to oversee Doppler’s natural-language processing, was uniquely qualified to help out. In the 1990s Prasad had done far-field research for the U.S. military, which wanted a system that could transcribe what everyone was saying in a meeting. Prasad helped to engineer technology that was twice as accurate as anything that had been previously developed.


Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport

Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, commoditize, data acquisition, data science, disruptive innovation, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, lifelogging, Mark Zuckerberg, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, self-driving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining, Thomas Davenport, three-martini lunch

In addition, there have Chapter_02.indd 44 03/12/13 11:42 AM How Big Data Will Change Your Job, Company, and Industry   45 always been voluminous amounts of text in the clinical setting, ­primarily from physicians’ and nurses’ notes. This text can increasingly be c­ aptured and classified through the use of natural language ­processing ­technology. Insurance firms have huge amounts of medical claims data, but it’s not integrated with the data from healthcare providers. If all of that data could be integrated, categorized, and analyzed, we’d know a lot more about patient conditions. Image data from CAT scans and MRIs is another huge source; thus far doctors only look at it but don’t analyze it in any systematic fashion.

The level of customer satisfaction is increasingly important to health insurers because it is being monitored by state and federal government groups and published by organizations such as Consumers Union. In the past, that valuable data from calls couldn’t be analyzed. Now, however, United is turning it into text and then analyzing it with natural language processing software (a way to extract meaning from text). The analysis process can identify—though it’s not easy, given the vagaries of the English language—customers who use terms suggesting strong dissatisfaction. The insurer can then make some sort of intervention—perhaps a call exploring the source of the dissatisfaction.

Big data often involves the processing of unstructured data types like text, images, and video. It is probably impossible for a data scientist to be familiar with the analysis of all of these data types, but a knowledge of analytical approaches to one of them would be very useful. For example, natural language processing (NLP) is a set of approaches to extracting meaning from text. It may involve counting, classifying, translating, or otherwise analyzing words. It’s quite commonly used, for example, in understanding what customers are saying about a product or company. Virtually every large firm that is interested in big data should have someone available with NLP skills, but one or two experts will probably be sufficient.
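The excerpt above describes NLP at its simplest as counting and classifying words, for example to understand what customers are saying about a product. A minimal sketch of that "counting" flavor, using only the Python standard library (the sample comments are invented for illustration):

```python
from collections import Counter
import re

def top_terms(comments, n=3):
    """Count word frequencies across customer comments -- a toy
    illustration of the word-counting flavor of NLP described above."""
    words = []
    for comment in comments:
        # Lowercase and split on non-letters: a deliberately crude tokenizer.
        words.extend(re.findall(r"[a-z']+", comment.lower()))
    return Counter(words).most_common(n)

comments = [
    "The battery life is terrible",
    "Terrible support, great screen",
    "Great battery, great price",
]
print(top_terms(comments))
```

Real systems would filter stopwords ("the", "is") and feed counts into a classifier, but even this toy surfaces the dominant sentiment terms ("great", "terrible") across the comments.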


pages: 237 words: 65,794

Mining Social Media: Finding Stories in Internet Data by Lam Thuy Vo

barriers to entry, correlation does not imply causation, data science, Donald Trump, en.wikipedia.org, Filter Bubble, Firefox, Google Chrome, Internet Archive, natural language processing, social web, web application

., 178 dropna() function, 155 E elements, 5 encode, 60 end tags, 5 engagement metrics, 152 error messages, 30–31 ethics, 80 expressions, 16 F Facebook, 64–79 filepaths, 44 filtering data, 114–117 find() function, 71–72, 90–91 floats, 16, 96 for loops, 22–23 formatting data, 106–109 formulas, 112–114 frontend languages CSS (Cascading Style Sheets), 6–12 HTML (HyperText Markup Language), 4–6 JavaScript, 12–13 functions, 20–22 append(), 73 apply(), 168 describe(), 160 DictWriter(), 74 dropna(), 155 find(), 71–72, 90–91 get_text(), 71–72 head(), 146 json.load(), 51 lambda, 168–169 len(), 20, 148 loads(), 49 make_csv(), 59–60 mean(), 160 open(), 49–50 print(), 20, 148 reusable, 58–61 set_index(), 172 sleep(), 96 sort_values(), 158 tail(), 147 writeheader(), 74 writer(), 50 writerow(), 50 G General Data Protection Regulation (GDPR), 64 get_text() function, 71–72 Google Chrome, 10 Sheets, 104–106, 121–122, 128–133 H head() function, 146 Heisler, Sofia, 178 hexadecimal colors, 7 home pages, 4 HTML (HyperText Markup Language), 4–6 I IDs, 8 if clauses, 23–25 =iferror() formula, 120 iloc[] method, 149 indentation, 5–6 inheritance of styles, 7 inline CSS, 7 integer-location-based indexing, 149 integers, 16, 96 internal style sheets, 8 Internet Archive, 145 IPython Notebooks, 136 iteration, 22–23 J JavaScript, 12–13 joining data sets, 117–121 JSON (JavaScript Object Notation) format, 30–37 json library, 47, 49 JSON objects, 34 json.load() function, 51 Jupyter Notebook, 136–142 K keys, 34 key-value pairs, 34 Klein, Ewan, 180 L lambda functions, 168–169 len()function, 20, 148 libraries, 46–48 beautifulsoup4 library, 47, 68–70 csv library, 47, 49–50, 68 datetime library, 47 importing, 68 json library, 47, 49 matplotlib library, 175–176 pandas library, 47, 142–149, 165 pip library, 47–48 requests library, 47, 49 scikit-learn library, 180 third-party, 46 Linder, Lindsey, 78–79 lists, 19–20 loads() function, 49 logical operators, 24–25 loops, 22–23 Loper, Edward, 180 Lytvynenko, 
Jane, 38 M machine learning, 179–180 macOS, xxi make_csv() function, 59–60 matplotlib library, 175–176 McKinney, Wes, 142 mean, 152 mean() function, 160 measures of central tendency, 152–153 median, 152 merging data sets, 117–121 minified code, 87 modifying and formatting data, 106–109 N Naked Statistics (Wheelan), 179 NaN values, 155–156 natural language processing (NLP), 179 Natural Language Processing with Python (Bird, Klein, and Loper), 180 nested elements, 5 nextPageToken key, 55–57 NLTK (Natural Language Toolkit), 179 null values, 154–156 numbers, 16 O one-dimensional data sets, 143–144 open() function, 49–50 opening tags, 5 operators, 16, 24–25 overloading a server, 82 P pagination, 55–57 pandas library, 47, 142–149, 165 panel data, 142 parameters, 29–30, 41 parsing, 69 part parameter, 30 paste special, 115 pie charts, 127 pip command, 68 pip library, 47–48 pivot tables, 110–111 placeholders, 154 plotting data, 175–176 population data, 153 print statements, 15 print()function, 20, 148 prompts, 15 properties, 7 pseudocoding, 46 PyPI (Python Package Index), 47 Python.

Chapter 10: Measuring the Twitter Activity of Political Actors Explains how to format data as timestamps, modify it more efficiently with lambda functions, and resample it temporally in pandas. Chapter 11: Where to Go from Here Lists resources for becoming a better Python coder, learning more about statistical analyses, and analyzing text using natural language processing and machine learning. Downloading and Installing Python To work through the exercises in this book, you’ll need to set up a number of tools on your computer. I’ll help you with most of these—including signing up for a Google account and installing Python libraries—in the relevant chapters.

Norton, 2014) that provides a great introduction to statistics with relatable—and often amusing—examples (https://books.wwnorton.com/books/Naked-Statistics/) “Tidy Data,” an academic paper by Hadley Wickham that lays out helpful approaches to “tidying” data, or restructuring it for more efficient data analyses (https://www.jstatsoft.org/article/view/v059i10) Other Kinds of Analyses Finally, there are some more-advanced kinds of analysis, particularly suited to social web data, that have resulted in some fantastic research over the past few years. One example is natural language processing (NLP), the process of turning text into data for analysis. Many NLP methods are available through Python libraries, including the Natural Language Toolkit, or NLTK (https://www.nltk.org/), and spaCy (https://spacy.io/). These libraries allow us to break text down into smaller parts—like words, word stems, sentences, or phrases—for further analysis.
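The libraries mentioned above (NLTK, spaCy) do this breakdown properly, but both require model or data downloads. As a dependency-free sketch of the two operations described—splitting text into sentences and reducing words to stems—here is a toy stand-in whose rules are invented for illustration, not taken from either library:

```python
import re

def sentences(text):
    """Naive sentence splitter (toy stand-in for NLTK's sent_tokenize):
    split after sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def stem(word):
    """Crude suffix-stripping stemmer (toy stand-in for a real stemmer
    such as NLTK's PorterStemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

text = "Reporters mined the data. Readers responded quickly!"
for s in sentences(text):
    print([stem(w) for w in re.findall(r"[a-z]+", s.lower())])
```

A real stemmer handles far more cases (this one mangles "mined" into "min"), which is exactly why the dedicated libraries exist.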


pages: 315 words: 89,861

The Simulation Hypothesis by Rizwan Virk

3D printing, Albert Einstein, AlphaGo, Apple II, artificial general intelligence, augmented reality, Benoit Mandelbrot, bioinformatics, butterfly effect, Colossal Cave Adventure, Computing Machinery and Intelligence, DeepMind, discovery of DNA, Dmitri Mendeleev, Elon Musk, en.wikipedia.org, Ernest Rutherford, game design, Google Glasses, Isaac Newton, John von Neumann, Kickstarter, mandelbrot fractal, Marc Andreessen, Minecraft, natural language processing, Nick Bostrom, OpenAI, Pierre-Simon Laplace, Plato's cave, quantum cryptography, quantum entanglement, Ralph Waldo Emerson, Ray Kurzweil, Richard Feynman, Schrödinger's Cat, Search for Extraterrestrial Intelligence, Silicon Valley, Stephen Hawking, Steve Jobs, Steve Wozniak, technological singularity, TED Talk, time dilation, Turing test, Vernor Vinge, Zeno's paradox

See MMORPGs (massively multiplayer online roleplaying games) Masterson, Andrew, 251 Matera Laser Ranging Observatory (MLRO), 253–54 Mathematica software, 18 The Matrix, 7–8, 16–17, 25–26, 53, 72–74, 76, 196, 230, 251, 257, 276–77 Max Headroom, 92 Maxwell, James, 125–26 maya (illusion), 5, 14, 186–87, 191, 203 Mazurenko, Roman, 101–2 measurement, future vs. the past, 146–47 Mécanique celeste (Laplace), 125 Mendeleev, Dmitri, 190 metaphysical experiments and consciousness, 249–250 Microsoft, 60 Microsoft Hololens, 62 Miller, Laura, 80 mind interfaces mind reading, 75–77 mind-broadcast technology, 74–75 overview, 72–77 types of, 74 “mind lamp,” 76 mind reading, 75–77 mind-broadcast technology, 74–75 Minecraft, 50, 70–71 minimax algorithm, 154–55, 155f MIT, 6, 13, 32, 38–39, 85, 154, 165–66, 219 MIT Media Lab, 68 MIT Technology Review, 236 MMORPGs (massively multiplayer online roleplaying games) 3D rendering and virtual worlds, 42–44, 56 as 3D world, 94 and 3D world rendering, 136–37 augmented reality (AR), 63 as development to Simulation Point, 49–52 features of, 208–11 game evolution to, 4, 31 and Great Simulation, 53–54 Great Simulation as, 20, 279 quest engines of, 213–14 and realistic 3D models and graphics, 83 vs. simulated reality, 216 world as game state, 41 MMORPGs development 3D avatars, 49 big, graphically rendered 3D world to explore, 49 individual quests, 50–51 multiple online players, 50 persistent world state, 49–50 physics engines vs. 
rendering engine, 51 procedurally generated world, 51 storage of player’s state outside of rendered world, 49 user-generated content, 50 moksha, 203 Monroe, Robert, 242 Moody, Raymond, 228–29 Moorjani, Anita, 241 Morgan, Richard, 103–4 motion capture, 64 MUDs (multiuser dungeons), 44 Muhammad, 190, 226 multiple lives, 36 doctrines of reincarnation, 201–3 in video games, 200–201 multiple online players, 50 multiple possible futures, 147–48, 148f multiuser dungeons (MUDs), 44 multiverse and parallel worlds, 148–150 Musk, Elon, 5–6, 24–25, 87, 98, 139–140, 275 MWI (many worlds interpretation), 142–43, 149 My Big TOE (Campbell, 2003), 156–57, 173–74 N Natural Language Processing (NLP), 89–92 NDEs (near-death experiences), 15–16, 219, 228–231 near-death experiences (NDEs). See NDEs (near-death experiences) Netscape, 287 Neumann, John von, 100, 260 Neurable, 76 Neurolink, 76 A New Kind of Science (Wolfram, 2002), 266 New York Times, 232 Newton, Isaac, 13, 36, 124–26, 161, 166, 220–21 Niels Bohr Institute, 132 Nintendo Entertainment System (NES), 38–39 nirvana, 203 NLP (Natural Language Processing), 89–92 No Man’s Sky, 46–47, 51, 236 Noack, Marcus, 246 nonhuman earth-based lifeforms, 275 non-player characters (NPCs), 30–31, 39, 82, 280–81 non-player characters (NPCs), graphical, 41–42 non-simulated beings, 114 NPCs (non-player characters), 30–31, 39, 53, 82 NPCs and Turing Test, 115 O OASIS, 56–57, 71 OBEs (out-of-body experiences), 219, 241–42 “object” definition, 70 observation, particle collapse as, 131 Oculus VR, 59–60 OpenAI, 87, 94 optimization, 159–160 optimization techniques, computer graphics, 34, 157 Owhadi, Houman, 254–55 P Pac-Man, 1, 34, 82, 208, 273 parallel lives and future selves, 150–52 parallel universes and simulation hypothesis, 159–160 parallel worlds and Fringe, 152–53 parallel worlds and the multiverse, 148–150 parallel worlds, need for computation, 157–59 Paramahansa Yogananda, 183, 200 particle “local” nature, 127 particles and pixels on screen, 
162–64 particle-wave duality, 127–134, 254–55 Pauli, Wolfgang, 121, 125–26 Pauli Exclusion Principle, 126 PCs (player characters), 82 PCs vs.

Still, given that it was developed in the mid-1960s, Eliza was a considerable achievement in the development of AI. In some ways, Eliza was the precedent for many of the NPCs in adventure games and a precedent to the chat-bots that we see in the early 21st century. Some of the chat-bots use very simplistic pattern matching, while others are starting to incorporate more complicated natural language processing. Different kinds of AI techniques had to be developed in order for a computer to have a chance at passing the “Turing Test.” In the early 21st century, digital assistants like Siri, Alexa, and Google Assistant are much better at processing either text or voice than any of the video games that we have covered thus far.

But just as video games drove early graphics technology, you can expect that simulated characters will drive more sophisticated AI in the future. Figure 15: Eliza was an early digital psychiatrist that used simple matching. NLP, AI, and the Quest to Pass the Turing Test Of critical importance to passing the Turing Test is NLP, or Natural Language Processing. NLP is the ability of a computer to read (or listen to) and understand the meaning of natural language. How could we tell if a computer program had “understood” a sentence? This is another difficult question to answer, so it comes down to what kind of response the program gives us. Early NLP systems were heuristic, meaning they were based on rules.
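The heuristic, rule-based approach the excerpt describes can be sketched in a few lines. The rules below are invented for illustration (they are not Weizenbaum's original ELIZA scripts), but they show the same mechanism: match an input against a pattern, then echo captured text back inside a canned template:

```python
import re

# Minimal Eliza-style rule set: (pattern, response template) pairs.
# These rules are made up for this sketch, not Weizenbaum's originals.
RULES = [
    (r"i am (.*)", "Why do you say you are {0}?"),
    (r"i feel (.*)", "How long have you felt {0}?"),
    (r"my (.*)", "Tell me more about your {0}."),
]

def respond(utterance):
    """Return the first matching rule's response, or a default prompt."""
    text = utterance.lower().strip(".!?")
    for pattern, template in RULES:
        match = re.fullmatch(pattern, text)
        if match:
            return template.format(*match.groups())
    return "Please go on."

print(respond("I am worried about exams."))
# → Why do you say you are worried about exams?
```

No understanding is involved—only surface pattern matching—which is precisely the limitation that pushed later NLP toward statistical methods.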


pages: 339 words: 92,785

I, Warbot: The Dawn of Artificially Intelligent Conflict by Kenneth Payne

Abraham Maslow, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, AlphaGo, anti-communist, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asperger Syndrome, augmented reality, Automated Insights, autonomous vehicles, backpropagation, Black Lives Matter, Bletchley Park, Boston Dynamics, classic study, combinatorial explosion, computer age, computer vision, Computing Machinery and Intelligence, coronavirus, COVID-19, CRISPR, cuban missile crisis, data science, deep learning, deepfake, DeepMind, delayed gratification, Demis Hassabis, disinformation, driverless car, drone strike, dual-use technology, Elon Musk, functional programming, Geoffrey Hinton, Google X / Alphabet X, Internet of things, job automation, John Nash: game theory, John von Neumann, Kickstarter, language acquisition, loss aversion, machine translation, military-industrial complex, move 37, mutually assured destruction, Nash equilibrium, natural language processing, Nick Bostrom, Norbert Wiener, nuclear taboo, nuclear winter, OpenAI, paperclip maximiser, pattern recognition, RAND corporation, ransomware, risk tolerance, Ronald Reagan, self-driving car, semantic web, side project, Silicon Valley, South China Sea, speech recognition, Stanislav Petrov, stem cell, Stephen Hawking, Steve Jobs, strong AI, Stuxnet, technological determinism, TED Talk, theory of mind, TikTok, Turing machine, Turing test, uranium enrichment, urban sprawl, V2 rocket, Von Neumann architecture, Wall-E, zero-sum game

The Pentagon was an enthusiastic sponsor of many AI research projects, forging links with centres of excellence at universities around the United States, including Shakey’s home department at Stanford, but also MIT and Carnegie Mellon: today these remain leading departments in the field. Alongside ARPA, military funders included the Office for Naval Research, the Army Signal Corps and the Rome Air Defense Center. And not just the Pentagon: the CIA was interested in image recognition and natural language processing, including one unsuccessful attempt at face recognition in the 1960s. Research also involved private enterprise. Famous names like Hughes, Bell Laboratories and IBM, along with more obscure outfits, like Panoramic Research and BBN, sometimes spun out by the university researchers themselves.

It is less bounded and less mechanistic. There’s no recipe book for success that you can train your neural net on. The cognitive challenge of war is far more complex than the cognitive challenge of battle. Strategy requires things neural nets aren’t good at, like imagination, creativity or intuition. Object recognition, natural language processing, robotics—none of these are particularly useful for the strategist. If anything, strategy plays more to the strengths of symbolic logic machines—if I do this, then they do that. Except that even there, the challenges are formidable. There are very many possible steps that each side can take at any moment—and the action isn’t iterative, like chess, it’s dynamic: Clausewitz likened it to a wrestling match.

And everywhere we go we leave traces of our DNA—useful for forensic analysis. DNA might even offer a place to store information—a vast biological hard drive. Thus are millions upon millions more bits of information added daily to an impossibly large pile of raw intelligence. Perhaps AI could help make sense of this torrent of information. Natural language processing and image recognition were, after all, a large part of the attraction for the defence funders of AI research back in the mid-twentieth century. And modern AI is spectacularly good at recognising patterns in huge data sets. It can transcribe spoken words into text, read even the most unruly handwriting, lip-read visual imagery.


pages: 309 words: 114,984

The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age by Robert Wachter

activist fund / activist shareholder / activist investor, Affordable Care Act / Obamacare, AI winter, Airbnb, Atul Gawande, Captain Sullenberger Hudson, Checklist Manifesto, Chuck Templeton: OpenTable:, Clayton Christensen, cognitive load, collapse of Lehman Brothers, computer age, creative destruction, crowdsourcing, deep learning, deskilling, disruptive innovation, driverless car, en.wikipedia.org, Erik Brynjolfsson, everywhere but in the productivity statistics, Firefox, Frank Levy and Richard Murnane: The New Division of Labor, general purpose technology, Google Glasses, human-factors engineering, hype cycle, Ignaz Semmelweis: hand washing, Internet of things, job satisfaction, Joseph Schumpeter, Kickstarter, knowledge worker, lifelogging, Marc Benioff, medical malpractice, medical residency, Menlo Park, minimum viable product, natural language processing, Network effects, Nicholas Carr, obamacare, pattern recognition, peer-to-peer, personalized medicine, pets.com, pneumatic tube, Productivity paradox, Ralph Nader, RAND corporation, Richard Hendricks, Robert Solow, Salesforce, Second Machine Age, self-driving car, seminal paper, Silicon Valley, Silicon Valley startup, six sigma, Skype, Snapchat, software as a service, Steve Jobs, Steven Levy, TED Talk, The future is already here, the payments system, The Wisdom of Crowds, Thomas Bayes, Toyota Production System, Uber for X, US Airways Flight 1549, Watson beat the top human players on Jeopardy!, Yogi Berra

Even technophiles admit that the quest to replace doctors with computers—or even the more modest ambition of providing them with useful guidance at the point of care—has been overhyped and unproductive. But times have changed. The growing prevalence of electronic health records offers grist for the AI and big-data mills, grist that wasn’t available when the records were on paper. And in this, the Age of Watson, we have new techniques, like natural language processing and machine learning, at our disposal. Perhaps this is our “gradually, then suddenly” moment. The public worships dynamic, innovative surgeons like Michael DeBakey; passionate, insightful researchers like Jonas Salk; and telegenic show horses like Mehmet Oz. But we seldom hear about those doctors whom other physicians tend to hold in the highest esteem: the great medical diagnosticians.

As if this weren’t complicated enough for the poor IBM engineer gearing up to retool Watson from answering questions about “Potent Potables” to diagnosing sick patients, there’s more. While the EHR at least offers a fighting chance for computerized diagnosis (older medical AI programs, built in the pen-and-paper era, required busy physicians to write their notes and then reenter all the key data), parsing an electronic medical record is far from straightforward. Natural language processing is getting much better, but it still has real problems with negation (“the patient has no history of chest pain or cough”) and with family history (“there is a history of arthritis in the patient’s sister, but his mother is well”), to name just a couple of issues. Certain terms have multiple meanings: when written by a psychiatrist, the term depression is likely to refer to a mood disorder, while when it appears in a cardiologist’s note (“there was no evidence of ST-depression”) it probably refers to a dip in the EKG tracing that is often a clue to coronary disease.

The scruffies are the pragmatists, the hackers, the crazy ones; they believe that problems should be attacked through whatever means work, and that modeling the behavior of experts or the scientific truth of a situation isn’t all that important. IBM’s breakthrough was to figure out that a combination of neat and scruffy—programming in some of the core rules of the game, but then folding in the fruits of machine learning and natural language processing—could solve truly complicated problems. When he was asked about the difference between human thinking and Watson’s method, Eric Brown, who runs IBM’s Watson Technologies group, gave a careful answer (note the shout-out to the humans, the bit players who made it all possible): A lot of the way that Watson works is motivated by the way that humans analyze problems and go about trying to find solutions, especially when it comes to dealing with complex problems where there are a number of intermediate steps to get you to the final answer.


pages: 340 words: 97,723

The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity by Amy Webb

"Friedman doctrine" OR "shareholder theory", Ada Lovelace, AI winter, air gap, Airbnb, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic bias, AlphaGo, Andy Rubin, artificial general intelligence, Asilomar, autonomous vehicles, backpropagation, Bayesian statistics, behavioural economics, Bernie Sanders, Big Tech, bioinformatics, Black Lives Matter, blockchain, Bretton Woods, business intelligence, Cambridge Analytica, Cass Sunstein, Charles Babbage, Claude Shannon: information theory, cloud computing, cognitive bias, complexity theory, computer vision, Computing Machinery and Intelligence, CRISPR, cross-border payments, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, data science, deep learning, DeepMind, Demis Hassabis, Deng Xiaoping, disinformation, distributed ledger, don't be evil, Donald Trump, Elon Musk, fail fast, fake news, Filter Bubble, Flynn Effect, Geoffrey Hinton, gig economy, Google Glasses, Grace Hopper, Gödel, Escher, Bach, Herman Kahn, high-speed rail, Inbox Zero, Internet of things, Jacques de Vaucanson, Jeff Bezos, Joan Didion, job automation, John von Neumann, knowledge worker, Lyft, machine translation, Mark Zuckerberg, Menlo Park, move fast and break things, Mustafa Suleyman, natural language processing, New Urbanism, Nick Bostrom, one-China policy, optical character recognition, packet switching, paperclip maximiser, pattern recognition, personalized medicine, RAND corporation, Ray Kurzweil, Recombinant DNA, ride hailing / ride sharing, Rodney Brooks, Rubik’s Cube, Salesforce, Sand Hill Road, Second Machine Age, self-driving car, seminal paper, SETI@home, side project, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart cities, South China Sea, sovereign wealth fund, speech recognition, Stephen Hawking, strong AI, superintelligent machines, surveillance capitalism, technological singularity, The Coming Technological Singularity, the long tail, theory 
of mind, Tim Cook: Apple, trade route, Turing machine, Turing test, uber lyft, Von Neumann architecture, Watson beat the top human players on Jeopardy!, zero day

Thus the first ultraintelligent machine is the last invention that man need ever make.”22 A woman did finally enter the mix, at least in name. At MIT, computer scientist Joseph Weizenbaum wrote an early AI system called ELIZA, a chat program named after the ingenue in George Bernard Shaw’s play Pygmalion.23 This development was important for neural networks and AI because it was an early attempt at natural language processing, and the program accessed various prewritten scripts in order to have conversations with real people. The most famous script was called DOCTOR,24 and it mimicked an empathetic psychologist using pattern recognition to respond with strikingly humanistic responses. The Dartmouth workshop had now generated international attention, as did its researchers, who’d unexpectedly found themselves in the limelight.


Tribes typically observe rules and rituals, so let’s explore the rites of initiation for AI’s tribes. It begins with a rigorous university education. In North America, the emphasis within universities has centered on hard skills—like mastery of the R and Python programming languages, competency in natural language processing and applied statistics, and exposure to computer vision, computational biology, and game theory. It’s frowned upon to take classes outside the tribe, such as a course on the philosophy of mind, Muslim women in literature, or colonialism. If we’re trying to build thinking machines capable of thinking like humans do, it would seem counterintuitive to exclude learning about the human condition.

Although Microsoft was the indispensable—if invisible—productivity layer that no business could operate without, executives and shareholders were feeling antsy. It isn’t as though Microsoft didn’t see AI coming. In fact, the company had, for more than a decade, been working across multiple fronts: computer vision, natural language processing, machine reading comprehension, AI apps in its Azure cloud, and even edge computing. The problem was misalignment within the organization and the lack of a shared vision among all cross-functional teams. This resulted in bursts of incredible breakthroughs in AI, published papers, and lots of patents created by supernetworks working on individual projects.


Succeeding With AI: How to Make AI Work for Your Business by Veljko Krunic

AI winter, Albert Einstein, algorithmic trading, AlphaGo, Amazon Web Services, anti-fragile, anti-pattern, artificial general intelligence, autonomous vehicles, Bayesian statistics, bioinformatics, Black Swan, Boeing 737 MAX, business process, cloud computing, commoditize, computer vision, correlation coefficient, data is the new oil, data science, deep learning, DeepMind, en.wikipedia.org, fail fast, Gini coefficient, high net worth, information retrieval, Internet of things, iterative process, job automation, Lean Startup, license plate recognition, minimum viable product, natural language processing, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, six sigma, smart cities, speech recognition, statistical model, strong AI, tail risk, The Design of Experiments, the scientific method, web application, zero-sum game

While pipelines are often customized, it frequently happens that ML pipelines used to address similar problems have a commonality in structure. ML pipelines might represent years of accumulated wisdom from your company (or the whole community) and be structured accordingly. For example, natural language processing (NLP) often uses the same general form of ML pipeline [84]. Can we be more formal with an architectural description? To help readers who are accustomed to the typical architectural presentation done in the industry today, I’m using a simple box and arrow style of architectural diagram, as opposed to the more formal architectural presentation methods (such as the 4+1 architectural view model [85,86]).

The techniques shown in the next section will help your data science team address that situation. 6.5.5 Dealing with complex profit curves Now let’s talk about the details needed to construct the more complex profit curves. This section describes the technical aspects of dealing with non-monotonic and nonunique profit curves. NOTE I assume in this section that the reader is already familiar with confusion matrices and F-scores in the context of natural language processing (NLP). You can find more information in Leon Derczynski’s paper [124]. Let’s first deal with how to recognize a non-unique profit curve. A non-unique profit curve happens when no unique mathematical relation exists between the business metric and the technical metric that you’re using.
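Since the passage assumes familiarity with confusion matrices and F-scores, here is a short refresher sketch (generic definitions, not code from the book): precision and recall are computed from true positives, false positives, and false negatives, and the F-beta score is their weighted harmonic mean.

```python
def f_score(tp, fp, fn, beta=1.0):
    """Compute precision, recall, and the F-beta score from
    confusion-matrix counts (true/false positives, false negatives)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

p, r, f1 = f_score(tp=80, fp=20, fn=40)
print(p, r, f1)  # 0.8, ~0.667, ~0.727
```

Linking such a technical metric to a business metric is exactly what the profit curves in this section do; the F-score alone says nothing about dollars.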

Here are some of those views, none of which is universally accepted as correct:  AI is a broader field than ML. While ML involves working with algorithms, just as AI does, AI is also heavily used with robotics [136].  AI deals with more modern algorithms and techniques than classical ML. Based on that idea, newer technologies (such as deep learning), when applied to visual recognition or natural language processing tasks, are considered part of AI. The relationship between AI and ML continues to change over time, and, as you can see, there’s no universally accepted definition of AI. Depending on what year we’re in, when we declare that some technique is part of AI or is no longer part of AI, AI’s meaning changes.


pages: 285 words: 86,853

What Algorithms Want: Imagination in the Age of Computing by Ed Finn

Airbnb, Albert Einstein, algorithmic bias, algorithmic management, algorithmic trading, AlphaGo, Amazon Mechanical Turk, Amazon Web Services, bitcoin, blockchain, business logic, Charles Babbage, Chuck Templeton: OpenTable:, Claude Shannon: information theory, commoditize, Computing Machinery and Intelligence, Credit Default Swap, crowdsourcing, cryptocurrency, data science, DeepMind, disruptive innovation, Donald Knuth, Donald Shoup, Douglas Engelbart, Douglas Engelbart, Elon Musk, Evgeny Morozov, factory automation, fiat currency, Filter Bubble, Flash crash, game design, gamification, Google Glasses, Google X / Alphabet X, Hacker Conference 1984, High speed trading, hiring and firing, Ian Bogost, industrial research laboratory, invisible hand, Isaac Newton, iterative process, Jaron Lanier, Jeff Bezos, job automation, John Conway, John Markoff, Just-in-time delivery, Kickstarter, Kiva Systems, late fees, lifelogging, Loebner Prize, lolcat, Lyft, machine readable, Mother of all demos, Nate Silver, natural language processing, Neal Stephenson, Netflix Prize, new economy, Nicholas Carr, Nick Bostrom, Norbert Wiener, PageRank, peer-to-peer, Peter Thiel, power law, Ray Kurzweil, recommendation engine, Republic of Letters, ride hailing / ride sharing, Satoshi Nakamoto, self-driving car, sharing economy, Silicon Valley, Silicon Valley billionaire, Silicon Valley ideology, Silicon Valley startup, SimCity, Skinner box, Snow Crash, social graph, software studies, speech recognition, statistical model, Steve Jobs, Steven Levy, Stewart Brand, supply-chain management, tacit knowledge, TaskRabbit, technological singularity, technological solutionism, technoutopianism, the Cathedral and the Bazaar, The Coming Technological Singularity, the scientific method, The Signal and the Noise by Nate Silver, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, transaction costs, traveling salesman, Turing machine, Turing test, Uber and Lyft, Uber for X, 
uber lyft, urban planning, Vannevar Bush, Vernor Vinge, wage slave

If complex systems are themselves computational Turing Machines, they are therefore equivalent: weather systems, human cognition, and most provocatively the universe itself.24 The grand problems of the cosmos (the origins thereof, the relationship of time and space) and the less grand problems of culture (box office returns, intelligent web searching, natural language processing) are irreducible but also calculable: they are not complicated problems with simple answers but rather simple problems (or rule-sets) that generate complicated answers. These assumptions open the door to a mathesis universalis, a language of science that the philosophers Gottfried Wilhelm Leibniz, René Descartes, and others presaged as a way to achieve perfect understanding of the natural world.25 This perfect language would exactly describe the universe through its grammar and vocabulary, becoming a new kind of rational magic for scientists that would effectively describe and be the world.

As critics like media scholar Siva Vaidhyanathan have pointed out, the price of nonparticipation is significant but also difficult to pin down, and the gravitational pull of algorithmic culture gradually inculcates the rituals of participation, of obeisance, to particular computational altars.12 The Magic of Ontology The call and response of Siri’s communication is central to the cultural understanding of intelligent assistants as a kind of useful demon—entities with specific, constrained abilities—but the nature of this achievement unveils a deeper technical being that exists beyond its utility to users. The vital element in Siri’s effectiveness as a culture machine is the achievement of a minimum viable threshold for speedy, topical responses to questions. Siri’s ability to interpret real-world commands depends on two key factors: natural language processing (NLP) and semantic interpretation. As any user who has tried to use Siri without a data connection knows, the software cannot operate without a link to Apple’s servers. Each time a user speaks to Siri the sound file is sent to a data center for analysis and storage, a service of the leading speech technology company Nuance.13 The major breakthroughs in algorithmic speech analysis have come by abandoning deep linguistic structure—efforts to thoroughly map grammar and semantics—in favor of treating speech as a statistical, probabilistic challenge.14 Given this audio signal, what text strings are most likely associated with each word?

Index Abortion, 64 Abstraction, 10 aesthetics and, 83, 87–112 arbitrage and, 161 Bogost and, 49, 92–95 capitalism and, 165 context and, 24 cryptocurrency and, 160–180 culture machines and, 54 (see also Culture machines) cybernetics and, 28, 30, 34 desire for answer and, 25 discarded information and, 50 effective computability and, 28, 33 ethos of information and, 159 high frequency trading (HFT) and imagination and, 185, 189, 192, 194 interfaces and, 52, 54, 92, 96, 103, 108, 110–111 ladder of, 82–83 language and, 2, 24 Marxism and, 165 meaning and, 36 money and, 153, 159, 161, 165–167, 171–175 Netflix and, 87–112, 205n36 politics of, 45 pragmatist approach and, 19–21 process and, 2, 52, 54 reality and, 205n36 Siri and, 64–65, 82–84 Turing Machine and, 23 (see also Turing Machine) Uber and, 124–126, 129 Wiener and, 28–29, 30 work of algorithms and, 113, 120, 123–136, 139–149 Adams, Douglas, 123 Adams, Henry, 80–81 Adaptive systems, 50, 63, 72, 92, 174, 176, 186, 191 Addiction, 114–115, 118–119, 121–122, 176 AdSense, 158–159 Advent of the Algorithm, The (Berlinski), 9, 24 Advertisements AdSense and, 158–159 algorithmic arbitrage and, 111, 161 Apple and, 65 cultural calculus of waiting and, 34 as cultural latency, 159 emotional appeals of, 148 Facebook and, 113–114 feedback systems and, 145–148 Google and, 66, 74, 156, 158–160 Habermas on, 175 Netflix and, 98, 100, 102, 104, 107–110 Uber and, 125 Aesthetics abstraction and, 83, 87–112 arbitrage and, 109–112, 175 culture machines and, 55 House of Cards and, 92, 98–112 Netflix Quantum Theory and, 91–97 personalization and, 11, 97–103 of production, 12 work of algorithms and, 123, 129, 131, 138–147 Agre, Philip, 178–179 Airbnb, 124, 127 Algebra, 17 Algorithmic reading, 52–56 Algorithmic trading, 12, 20, 99, 155 Algorithms abstraction and, 2 (see also Abstraction) arbitrage and, 12, 51, 97, 110–112, 119, 121, 124, 127, 130–134, 140, 151, 160, 162, 169, 171, 176 Berlinski on, 9, 24, 30, 36, 181 Bitcoin and, 160–180 black 
boxes and, 7, 15–16, 47–48, 51, 55, 64, 72, 92–93, 96, 136, 138, 146–147, 153, 162, 169–171, 179 blockchains and, 163–168, 171, 177, 179 Bogost and, 16, 33, 49 Church-Turing thesis and, 23–26, 39–41, 73 consciousness and, 2, 4, 8, 22–23, 36–37, 40, 76–79, 154, 176, 178, 182, 184 DARPA and, 11, 57–58, 87 desire and, 21–26, 37, 41, 47, 49, 52, 79–82, 93–96, 121, 159, 189–192 effective computability and, 10, 13, 21–29, 33–37, 40–49, 52–54, 58, 62, 64, 72–76, 81, 93, 192–193 Elliptic Curve Digital Signature Algorithm and, 163 embodiment and, 26–32 encryption, 153, 162–163 enframing and, 118–119 Enlightenment and, 27, 30, 38, 45, 68–71, 73 experimental humanities and, 192–196 Facebook and, 20 (see also Facebook) faith and, 7–9, 12, 16, 78, 80, 152, 162, 166, 168 gamification and, 12, 114–116, 120, 123–127, 133 ghost in the machine and, 55, 95 halting states and, 41–46 high frequency trading (HFT) and, 151–158, 168–169, 177 how to think about, 36–41 ideology and, 7, 9, 18, 20–23, 26, 33, 38, 42, 46–47, 54, 64, 69, 130, 144, 155, 160–162, 167, 169, 194 imagination and, 11, 55–56, 181–196 implementation and, 47–52 intelligent assistants and, 11, 57, 62, 64–65, 77 intimacy and, 4, 11, 35, 54, 65, 74–78, 82–85, 97, 102, 107, 128–130, 172, 176, 185–189 Knuth and, 17–18 language and, 24–28, 33–41, 44, 51, 54–55 machine learning and, 2, 15, 28, 42, 62, 66, 71, 85, 90, 112, 181–184, 191 mathematical logic and, 2 meaning and, 35–36, 38, 44–45, 50, 54–55 metaphor and, 32–36 Netflix Prize and, 87–91 neural networks and, 28, 31, 39, 182–183, 185 one-way functions and, 162–163 pragmatist approach and, 18–25, 42, 58, 62 process and, 41–46 programmable culture and, 169–175 quest for perfect knowledge and, 13, 65, 71, 73, 190 rise of culture machines and, 15–21 (see also Culture machines) Siri and, 59 (see also Siri) traveling salesman problem and Turing Machine and, 9 (see also Turing Machine) as vehicle of computation, 5 wants of, 81–85 Weizenbaum and, 33–40 work of, 113–149 worship 
of, 192 Al-Khwārizmī, Abū ‘Abdullāh Muhammad ibn Mūsā, 17 Alphabet Corporation, 66, 155 AlphaGo, 182, 191 Amazon algorithmic arbitrage and, 124 artificial intelligence (AI) and, 135–145 Bezos and, 174 Bitcoin and, 169 business model of, 20–21, 93–94 cloud warehouses and, 131–132, 135–145 disruptive technologies and, 124 effective computability and, 42 efficiency algorithms and, 134 interface economy and, 124 Kindle and, 195 Kiva Systems and, 134 Mechanical Turk and, 135–145 personalization and, 97 physical logistics of, 13, 131 pickers and, 132–134 pragmatic approach and, 18 product improvement and, 42 robotics and, 134 simplification ethos and, 97 worker conditions and, 132–134, 139–140 Android, 59 Anonymous, 112, 186 AOL, 75 Apple, 81 augmenting imagination and, 186 black box of, 169 cloud warehouse of, 131 company value of, 158 effective computability and, 42 efficiency algorithms and, 134 Foxconn and, 133–134 global computation infrastructure of, 131 iOS App Store and, 59{tab} iTunes and, 161 massive infrastructure of, 131 ontology and, 62–63, 65 physical logistics of, 131 pragmatist approach and, 18 product improvement and, 42 programmable culture and, 169 search and, 87 Siri and, 57 (see also Siri) software and, 59, 62 SRI International and, 57, 59 Application Program Interfaces (APIs), 7, 113 Apps culture machines and, 15 Facebook and, 9, 113–115, 149 Her and, 83 identity and, 6 interfaces and, 8, 124, 145 iOS App Store and, 59 Lyft and, 128, 145 Netflix and, 91, 94, 102 third-party, 114–115 Uber and, 124, 145 Arab Spring, 111, 186 Arbesman, Samuel, 188–189 Arbitrage algorithmic, 12, 51, 97, 110–112, 119, 121, 124, 127, 130–134, 140, 151, 160, 162, 169, 171, 176 Bitcoin and, 51, 169–171, 175–179 cultural, 12, 94, 121, 134, 152, 159 differing values and, 121–122 Facebook and, 111 Google and, 111 high frequency trading (HFT) and, 151–158, 168–169, 177 interface economy and, 123–131, 139–140, 145, 147 labor and, 97, 112, 123–145 market issues and, 152, 161 
mining value and, 176–177 money and, 151–152, 155–163, 169–171, 175–179 Netflix and, 94, 97, 109–112 PageRank and, 159 pricing, 12 real-time, 12 trumping content and, 13 valuing culture and, 155–160 Archimedes, 18 Artificial intelligence (AI) adaptive systems and, 50, 63, 72, 92, 174, 176, 186, 191 Amazon and, 135–145 anthropomorphism and, 83, 181 anticipation and, 73–74 artificial, 135–141 automata and, 135–138 DARPA and, 11, 57–58, 87 Deep Blue and, 135–138 DeepMind and, 28, 66, 181–182 desire and, 79–82 ELIZA and, 34 ghost in the machine and, 55, 95 HAL and, 181 homeostat and, 199n42 human brain and, 29 intellectual history of, 61 intelligent assistants and, 11, 57, 62, 64–65, 77 intimacy and, 75–76 job elimination and, 133 McCulloch-Pitts Neuron and, 28, 39 machine learning and, 2, 15, 28, 42, 62, 66, 71, 85, 90, 112, 181–186 Mechanical Turk and, 12, 135–145 natural language processing (NLP) and, 62–63 neural networks and, 28, 31, 39, 182–183, 185 OS One (Her) and, 77 renegade independent, 191 Samantha (Her) and, 77–85, 154, 181 Siri and, 57, 61 (see also Siri) Turing test and, 43, 79–82, 87, 138, 142, 182 Art of Computer Programming, The (Knuth), 17 Ashby, Ross, 199n42 Asimov, Isaac, 45 Atlantic, The (magazine), 7, 92, 170 Automation, 122, 134, 144, 188 Autopoiesis, 28–30 Babbage, Charles, 8 Banks, Iain, 191 Barnet, Belinda, 43–44 Bayesian analysis, 182 BBC, 170 BellKor’s Pragmatic Chaos (Netflix), 89–90 Berlinski, David, 9, 24, 30, 36, 181, 184 Bezos, Jeff, 174 Big data, 11, 15–16, 62–63, 90, 110 Biology, 2, 4, 26–33, 36–37, 80, 133, 139, 185 Bitcoin, 12–13 arbitrage and, 51, 169–171, 175–179 blockchains and, 163–168, 171–172, 177, 179 computationalist approach and cultural processing and, 178 eliminating vulnerability and, 161–162 Elliptic Curve Digital Signature Algorithm and, 163 encryption and, 162–163 as glass box, 162 intrinsic value and, 165 labor and, 164, 178 legitimacy and, 178 market issues and, 163–180 miners and, 164–168, 171–172, 175–179 
Nakamoto and, 161–162, 165–167 one-way functions and, 162–163 programmable culture and, 169–175 transaction fees and, 164–165 transparency and, 160–164, 168, 171, 177–178 trust and, 166–168 Blockbuster, 99 Blockchains, 163–168, 171–172, 177, 179 Blogs early web curation and, 156 Facebook algorithms and, 178 Gawker Media and, 170–175 journalistic principles and, 173, 175 mining value and, 175, 178 Netflix and, 91–92 turker job conditions and, 139 Uber and, 130 Bloom, Harold, 175 Bogost, Ian abstraction and, 92–95 algorithms and, 16, 33, 49 cathedral of computation and, 6–8, 27, 33, 49, 51 computation and, 6–10, 16 Cow Clicker and, 12, 116–123 Enlightenment and, 8 gamification and, 12, 114–116, 120, 123–127, 133 Netflix and, 92–95 Boolean conjunctions, 51 Bosker, Bianca, 58 Bostrom, Nick, 45 Bowker, Geoffrey, 28, 110 Boxley Abbey, 137 Brain Pickings (Popova), 175 Brain plasticity, 38, 191 Brand, Stewart, 3, 29 Brazil (film), 142 Breaking Bad (TV series), 101 Brin, Sergei, 57, 155–156 Buffett, Warren, 174 Burr, Raymond, 95 Bush, Vannevar, 18, 186–189, 195 Business models Amazon and, 20–21, 93–94, 96 cryptocurrency and, 160–180 Facebook and, 20 FarmVille and, 115 Google and, 20–21, 71–72, 93–94, 96, 155, 159 Netflix and, 87–88 Uber and, 54, 93–94, 96 Business of Enlightenment, The (Darnton) 68, 68 Calculus, 24, 26, 30, 34, 44–45, 98, 148, 186 CALO, 57–58, 63, 65, 67, 79, 81 Campbell, Joseph, 94 Campbell, Murray, 138 Capitalism, 12, 105 cryptocurrency and, 160, 165–168, 170–175 faking it and, 146–147 Gawker Media and, 170–175 identity and, 146–147 interface economy and, 127, 133 labor and, 165 public sphere and, 172–173 venture, 9, 124, 174 Captology, 113 Carr, Nicholas, 38 Carruth, Allison, 131 Castronova, Edward, 121 Cathedral and the Bazaar, The (Raymond), 6 Cathedral of computation, 6–10, 27, 33, 49, 51 Chess, 135–138, 144–145 Chun, Wendy Hui Kyong, 3, 16, 33, 35–36, 42, 104 Church, Alonzo, 23– 24, 42 Church-Turing thesis, 23–26, 39–41 Cinematch (Netflix), 88–90, 95 
Citizens United case, 174 Clark, Andy, 37, 39–40 Cloud warehouses Amazon and, 135–145 interface economy and, 131–145 Mechanical Turk and, 135–145 worker conditions and, 132–134, 139–140 CNN, 170 Code.


pages: 339 words: 88,732

The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson, Andrew McAfee

2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, access to a mobile phone, additive manufacturing, Airbnb, Alan Greenspan, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, American Society of Civil Engineers: Report Card, Any sufficiently advanced technology is indistinguishable from magic, autonomous vehicles, barriers to entry, basic income, Baxter: Rethink Robotics, Boston Dynamics, British Empire, business cycle, business intelligence, business process, call centre, carbon tax, Charles Lindbergh, Chuck Templeton: OpenTable:, clean water, combinatorial explosion, computer age, computer vision, congestion charging, congestion pricing, corporate governance, cotton gin, creative destruction, crowdsourcing, data science, David Ricardo: comparative advantage, digital map, driverless car, employer provided health coverage, en.wikipedia.org, Erik Brynjolfsson, factory automation, Fairchild Semiconductor, falling living standards, Filter Bubble, first square of the chessboard / second half of the chessboard, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, full employment, G4S, game design, general purpose technology, global village, GPS: selective availability, Hans Moravec, happiness index / gross national happiness, illegal immigration, immigration reform, income inequality, income per capita, indoor plumbing, industrial robot, informal economy, intangible asset, inventory management, James Watt: steam engine, Jeff Bezos, Jevons paradox, jimmy wales, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, Khan Academy, Kiva Systems, knowledge worker, Kodak vs Instagram, law of one price, low skilled workers, Lyft, Mahatma Gandhi, manufacturing employment, Marc Andreessen, Mark Zuckerberg, Mars Rover, mass immigration, means of production, Narrative 
Science, Nate Silver, natural language processing, Network effects, new economy, New Urbanism, Nicholas Carr, Occupy movement, oil shale / tar sands, oil shock, One Laptop per Child (OLPC), pattern recognition, Paul Samuelson, payday loans, post-work, power law, price stability, Productivity paradox, profit maximization, Ralph Nader, Ray Kurzweil, recommendation engine, Report Card for America’s Infrastructure, Robert Gordon, Robert Solow, Rodney Brooks, Ronald Reagan, search costs, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Simon Kuznets, six sigma, Skype, software patent, sovereign wealth fund, speech recognition, statistical model, Steve Jobs, Steven Pinker, Stuxnet, supply-chain management, TaskRabbit, technological singularity, telepresence, The Bell Curve by Richard Herrnstein and Charles Murray, the Cathedral and the Bazaar, the long tail, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, total factor productivity, transaction costs, Tyler Cowen, Tyler Cowen: Great Stagnation, Vernor Vinge, warehouse robotics, Watson beat the top human players on Jeopardy!, winner-take-all economy, Y2K

Siri provided exactly the natural language interaction we were looking for. A 2004 review of the previous half-century’s research in automatic speech recognition (a critical part of natural language processing) opened with the admission that “Human-level speech recognition has proved to be an elusive goal,” but less than a decade later major elements of that goal have been reached. Apple and other companies have made robust natural language processing technology available to hundreds of millions of people via their mobile phones.10 As noted by Tom Mitchell, who heads the machine-learning department at Carnegie Mellon University: “We’re at the beginning of a ten-year period where we’re going to transition from computers that can’t understand language to a point where computers can understand quite a bit about language.”11

Digital Fluency: The Babel Fish Goes to Work

Natural language processing software is still far from perfect, and computers are not yet as good as people at complex communication, but they’re getting better all the time.

Apple and other companies have made robust natural language processing technology available to hundreds of millions of people via their mobile phones.10 As noted by Tom Mitchell, who heads the machine-learning department at Carnegie Mellon University: “We’re at the beginning of a ten-year period where we’re going to transition from computers that can’t understand language to a point where computers can understand quite a bit about language.”11

Digital Fluency: The Babel Fish Goes to Work

Natural language processing software is still far from perfect, and computers are not yet as good as people at complex communication, but they’re getting better all the time. And in tasks like translation from one language to another, surprising developments are underway: while computers’ communication abilities are not as deep as those of the average human being, they’re much broader.

Many of the ‘novices’ drawn to the challenge outperformed all of the testing companies in the essay competition. The surprises continued when Kaggle investigated who the top performers were. In both competitions, none of the top three finishers had any previous significant experience with either essay grading or natural language processing. And in the second competition, none of the top three finishers had any formal training in artificial intelligence beyond a free online course offered by Stanford AI faculty and open to anyone in the world who wanted to take it. People all over the world did, and evidently they learned a lot.


pages: 307 words: 88,180

AI Superpowers: China, Silicon Valley, and the New World Order by Kai-Fu Lee

"World Economic Forum" Davos, AI winter, Airbnb, Albert Einstein, algorithmic bias, algorithmic trading, Alignment Problem, AlphaGo, artificial general intelligence, autonomous vehicles, barriers to entry, basic income, bike sharing, business cycle, Cambridge Analytica, cloud computing, commoditize, computer vision, corporate social responsibility, cotton gin, creative destruction, crony capitalism, data science, deep learning, DeepMind, Demis Hassabis, Deng Xiaoping, deskilling, Didi Chuxing, Donald Trump, driverless car, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, fake news, full employment, future of work, general purpose technology, Geoffrey Hinton, gig economy, Google Chrome, Hans Moravec, happiness index / gross national happiness, high-speed rail, if you build it, they will come, ImageNet competition, impact investing, income inequality, informal economy, Internet of things, invention of the telegraph, Jeff Bezos, job automation, John Markoff, Kickstarter, knowledge worker, Lean Startup, low skilled workers, Lyft, machine translation, mandatory minimum, Mark Zuckerberg, Menlo Park, minimum viable product, natural language processing, Neil Armstrong, new economy, Nick Bostrom, OpenAI, pattern recognition, pirate software, profit maximization, QR code, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, risk tolerance, Robert Mercer, Rodney Brooks, Rubik’s Cube, Sam Altman, Second Machine Age, self-driving car, sentiment analysis, sharing economy, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, Skype, SoftBank, Solyndra, special economic zone, speech recognition, Stephen Hawking, Steve Jobs, strong AI, TED Talk, The Future of Employment, Travis Kalanick, Uber and Lyft, uber lyft, universal basic income, urban planning, vertical integration, Vision Fund, warehouse robotics, Y Combinator

The Chinese company has racked up victories at a series of prestigious international AI competitions for speech recognition, speech synthesis, image recognition, and machine translation. Even in the company’s “second language” of English, iFlyTek often beats teams from Google, DeepMind, Facebook, and IBM Watson in natural-language processing—that is, the ability of AI to decipher overall meaning rather than just words. This success didn’t come overnight. Back in 1999, when I started Microsoft Research Asia, my top-choice recruit was a brilliant young Ph.D. named Liu Qingfeng. He had been one of the students I saw filing out of the dorms to study under streetlights after my lecture in Hefei.

Founded in 2012, Toutiao is sometimes called “the BuzzFeed of China” because both sites serve as hubs for timely viral stories. But virality is where the similarities stop. BuzzFeed is built on a staff of young editors with a knack for cooking up original content. Toutiao’s “editors” are algorithms. Toutiao’s AI engines trawl the internet for content, using natural-language processing and computer vision to digest articles and videos from a vast network of partner sites and commissioned contributors. It then uses the past behavior of its users—their clicks, reads, views, comments, and so on—to curate a highly personalized newsfeed tailored to each person’s interests.
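The curation loop described here (past clicks shaping the next feed) can be sketched as a simple content-based ranker. The articles, tags, and click history below are invented for illustration, not Toutiao's actual method:

```python
from collections import Counter

# Rank candidate articles by how well their topic tags match the tags the
# user has clicked on before. A crude stand-in for behavior-driven curation.

click_history = ["soccer", "soccer", "tech", "soccer", "phones"]
user_profile = Counter(click_history)  # tag -> click count

articles = {
    "Local derby ends in dramatic draw": {"soccer", "sports"},
    "New flagship phone announced": {"phones", "tech"},
    "Recipe: dumplings at home": {"food"},
}

def score(tags):
    """Sum the user's click counts over the article's tags."""
    return sum(user_profile[t] for t in tags)

feed = sorted(articles, key=lambda a: score(articles[a]), reverse=True)
print(feed[0])  # the soccer story ranks first for this soccer-heavy history
```

Each new click updates the profile, so the feed keeps drifting toward whatever the user engages with — the feedback dynamic the passage attributes to Toutiao's algorithmic "editors."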

JUDGING THE JUDGES

Similar principles are now being applied to China’s legal system, another sprawling bureaucracy with highly uneven levels of expertise across regions. iFlyTek has taken the lead in applying AI to the courtroom, building tools and executing a Shanghai-based pilot program that uses data from past cases to advise judges on both evidence and sentencing. An evidence cross-reference system uses speech recognition and natural-language processing to compare all evidence presented—testimony, documents, and background material—and seek out contradictory fact patterns. It then alerts the judge to these disputes, allowing for further investigation and clarification by court officers. Once a ruling is handed down, the judge can turn to yet another AI tool for advice on sentencing.


pages: 58 words: 12,386

Big Data Glossary by Pete Warden

business intelligence, business logic, crowdsourcing, fault tolerance, functional programming, information retrieval, linked data, machine readable, natural language processing, recommendation engine, web application

A bit like the LAMP stack for graph processing, they’re designing a set of services that work well together to perform common operations like interfacing to specialized graph databases, writing traversal queries, and exposing the whole system as a REST-based server. If you’re dealing with graph data, Tinkerpop will give you some high-level interfaces that can be much more convenient to deal with than raw graph databases.

Chapter 7. NLP

Natural language processing (NLP) is a subset of data processing that’s so crucial, it earned its own section. Its focus is taking messy, human-created text and extracting meaningful information. As you can imagine, this chaotic problem domain has spawned a large variety of approaches, with each tool most useful for particular kinds of text.

There’s no magic bullet that will understand written information as well as a human, but if you’re prepared to adapt your use of the results to handle some errors and don’t expect miracles, you can pull out some powerful insights.

Natural Language Toolkit

The NLTK is a collection of Python modules and datasets that implement common natural language processing techniques. It offers the building blocks that you need to build more complex algorithms for specific problems. For example, you can use it to break up texts into sentences, break sentences into words, stem words by removing common suffixes (like -ing from English verbs), or use machine-readable dictionaries to spot synonyms.
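The building blocks listed above can be illustrated with crude pure-Python stand-ins. NLTK's real tokenizers and the Porter stemmer are far more careful than these toy regexes and suffix rules; this only shows the shape of each step:

```python
import re

# Toy versions of three NLP building blocks: sentence splitting, word
# tokenization, and suffix-stripping stemming. Illustrative only.

def split_sentences(text):
    """Naively split on sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lowercase the sentence and pull out runs of word characters."""
    return re.findall(r"[a-z']+", sentence.lower())

def stem(word):
    """Strip a few common English suffixes (a crude cousin of Porter stemming)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

text = "Running fast helps. The runner stopped!"
sentences = split_sentences(text)
tokens = [stem(w) for w in tokenize(sentences[0])]
print(sentences)  # ['Running fast helps.', 'The runner stopped!']
print(tokens)     # ['runn', 'fast', 'help']
```

The over-aggressive stem of "running" to "runn" is exactly the kind of error the passage warns you to budget for when working with real text.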


pages: 245 words: 64,288

Robots Will Steal Your Job, But That's OK: How to Survive the Economic Collapse and Be Happy by Pistono, Federico

3D printing, Albert Einstein, autonomous vehicles, bioinformatics, Buckminster Fuller, cloud computing, computer vision, correlation does not imply causation, en.wikipedia.org, epigenetics, Erik Brynjolfsson, Firefox, future of work, gamification, George Santayana, global village, Google Chrome, happiness index / gross national happiness, hedonic treadmill, illegal immigration, income inequality, information retrieval, Internet of things, invention of the printing press, Jeff Hawkins, jimmy wales, job automation, John Markoff, Kevin Kelly, Khan Academy, Kickstarter, Kiva Systems, knowledge worker, labor-force participation, Lao Tzu, Law of Accelerating Returns, life extension, Loebner Prize, longitudinal study, means of production, Narrative Science, natural language processing, new economy, Occupy movement, patent troll, pattern recognition, peak oil, post scarcity, QR code, quantum entanglement, race to the bottom, Ray Kurzweil, recommendation engine, RFID, Rodney Brooks, selection bias, self-driving car, seminal paper, slashdot, smart cities, software as a service, software is eating the world, speech recognition, Steven Pinker, strong AI, synthetic biology, technological singularity, TED Talk, Turing test, Vernor Vinge, warehouse automation, warehouse robotics, women in the workforce

Robots will eventually steal your job, but before them something else is going to jump in. In fact, it already has, in a much more pervasive way than any physical machine ever could. I am of course talking about computer programs in general. Automated Planning and Scheduling, Machine Learning, Natural Language Processing, Machine Perception, Computer Vision, Speech Recognition, Affective Computing, Computational Creativity: these are all fields of Artificial Intelligence that do not have to face the cumbersome issues that Robotics has to. It is much easier to enhance an algorithm than it is to build a better robot.

As of today (2012), we believe these more closely represent what the human brain does, and they have been used in a variety of real-world applications: Google’s autonomous cars, search results, recommendation systems, automated language translation, personal assistants, cybernetic computational search engines, and IBM’s newest super brain Watson. Natural language processing was believed to be a task that only humans could accomplish. A word can have different meanings depending on the context; a phrase may not mean what it says if it is a joke or a pun. One may infer a subtext implicitly or make cultural references specific to a geographical or cultural area; the possibilities are truly endless.

While our brains will stay pretty much the same for the next 20 years, computers’ efficiency and computational power will have doubled about twenty times. That is a million-fold increase. So, for the same $3 million, you will have a computer a million times more powerful than Watson, or you could have a Watson-equivalent computer for $3. Watson’s computational power and exceptional skills of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, Machine Learning, and open domain question answering are already being put to better use than showing off at a TV contest. IBM and Nuance Communications Inc. are partnering for the research project to develop a commercial product during the next 18 to 24 months that will exploit Watson’s capabilities as a clinical decision support system to aid the diagnosis and treatment of patients.86 Recall the example of automated radiologists we mentioned earlier.
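The million-fold figure is just compounding: doubling price-performance roughly once a year for twenty years gives 2^20, which is about 1.05 million. Checking the passage's arithmetic:

```python
# Twenty doublings of price-performance, as the passage assumes.
doublings = 20
factor = 2 ** doublings
print(factor)  # 1048576: roughly a million-fold

# A $3,000,000 machine's equivalent cost after those doublings:
print(round(3_000_000 / factor, 2))  # 2.86: the book's "Watson for $3"
```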


pages: 268 words: 109,447

The Cultural Logic of Computation by David Golumbia

Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, American ideology, Benoit Mandelbrot, Bletchley Park, borderless world, business process, cellular automata, citizen journalism, Claude Shannon: information theory, computer age, Computing Machinery and Intelligence, corporate governance, creative destruction, digital capitalism, digital divide, en.wikipedia.org, finite state, folksonomy, future of work, Google Earth, Howard Zinn, IBM and the Holocaust, iterative process, Jaron Lanier, jimmy wales, John von Neumann, Joseph Schumpeter, late capitalism, Lewis Mumford, machine readable, machine translation, means of production, natural language processing, Norbert Wiener, One Laptop per Child (OLPC), packet switching, RAND corporation, Ray Kurzweil, RFID, Richard Stallman, semantic web, Shoshana Zuboff, Slavoj Žižek, social web, stem cell, Stephen Hawking, Steve Ballmer, Stewart Brand, strong AI, supply-chain management, supply-chain management software, technological determinism, Ted Nelson, telemarketer, The Wisdom of Crowds, theory of mind, Turing machine, Turing test, Vannevar Bush, web application, Yochai Benkler

Our human problem, according to this view, is that language has become corrupted due to ambiguity, polysemy, and polyvocality, and computers can bring language back to us, straighten it out, and eliminate the problems that are to blame not just for communicative difficulties but for the “simplicity and power” that would bring about significant political change. Despite Weaver’s assessment, few linguists of note contributed to the 1955 volume (the only practicing linguist among them is Victor Yngve, an MIT Germanicist who is most famous for work in CL and natural language processing, referred to as NLP). In an “historical introduction” provided by the editors, the history of MT begins abruptly in 1946, as if questions of the formal nature of language had never been addressed before. Rather than surveying the intellectual background and history of this topic, the editors cover only the history of machines built at MIT for the express purpose of MT.

Baran, Paul A., and Paul M. Sweezy. 1966. Monopoly Capital: An Essay on the American Economic and Social Order. New York: Monthly Review Press. Barsky, Robert F. 1997. Noam Chomsky: A Life of Dissent. Cambridge, MA: The MIT Press. Bates, Madeleine, and Ralph M. Weischedel, eds. 1993. Challenges in Natural Language Processing. New York: Cambridge University Press. Bauerlein, Mark. 2008. The Dumbest Generation: How the Digital Age Stupefies Young Americans and Jeopardizes Our Future (Or, Don’t Trust Anyone Under 30). New York: Penguin. Bechtel, William, and Adele Abrahamsen. 2002. Connectionism and the Mind: Parallel Processing, Dynamics, and Evolution in Networks.

New York: Cambridge University Press. ———. 2004. The Language Revolution. New York: Polity Press. Dahlberg, Lincoln, and Eugenia Siapera, eds. 2007. Radical Democracy and the Internet: Interrogating Theory and Practice. New York: Palgrave. Dale, Robert, Hermann Moisl, and Harold Somers, eds. 2000. Handbook of Natural Language Processing. New York: Marcel Dekker. Darnell, Rick. 1997. “A Brief History of SGML.” In HTML Unleashed 4. Indianapolis, IN: Sams Publishing. §3.2. http://www.webreference.com/. Davenport, David. 2000. “Computationalism: The Very Idea.” Conceptus-Studien 14, 121–137. Davidson, Matthew C., Dima Amso, Loren Cruess Anderson, and Adele Diamond. 2006.


pages: 346 words: 97,330

Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass by Mary L. Gray, Siddharth Suri

"World Economic Forum" Davos, Affordable Care Act / Obamacare, AlphaGo, Amazon Mechanical Turk, Apollo 13, augmented reality, autonomous vehicles, barriers to entry, basic income, benefit corporation, Big Tech, big-box store, bitcoin, blue-collar work, business process, business process outsourcing, call centre, Capital in the Twenty-First Century by Thomas Piketty, cloud computing, cognitive load, collaborative consumption, collective bargaining, computer vision, corporate social responsibility, cotton gin, crowdsourcing, data is the new oil, data science, deep learning, DeepMind, deindustrialization, deskilling, digital divide, do well by doing good, do what you love, don't be evil, Donald Trump, Elon Musk, employer provided health coverage, en.wikipedia.org, equal pay for equal work, Erik Brynjolfsson, fake news, financial independence, Frank Levy and Richard Murnane: The New Division of Labor, fulfillment center, future of work, gig economy, glass ceiling, global supply chain, hiring and firing, ImageNet competition, independent contractor, industrial robot, informal economy, information asymmetry, Jeff Bezos, job automation, knowledge economy, low skilled workers, low-wage service sector, machine translation, market friction, Mars Rover, natural language processing, new economy, operational security, passive income, pattern recognition, post-materialism, post-work, power law, race to the bottom, Rana Plaza, recommendation engine, ride hailing / ride sharing, Ronald Coase, scientific management, search costs, Second Machine Age, sentiment analysis, sharing economy, Shoshana Zuboff, side project, Silicon Valley, Silicon Valley startup, Skype, software as a service, speech recognition, spinning jenny, Stephen Hawking, TED Talk, The Future of Employment, The Nature of the Firm, Tragedy of the Commons, transaction costs, two-sided market, union organizing, universal basic income, Vilfredo Pareto, Wayback Machine, women in the workforce, work culture , Works 
Progress Administration, Y Combinator, Yochai Benkler

Automatically recognizing and translating language looks easy in some ways because people are accustomed to the everyday nature of tools like Siri, Cortana, and Alexa. Automating human speech recognition and translation is a fundamental part of artificial intelligence that grew into a field called natural language processing. Natural language processing was helped immensely by the internet’s capacity to amass tons of examples of people writing and speaking in various languages. Yet capturing dialogue in video, particularly action scenes that change the mood and meaning of an actor’s words, remains a difficult task for a computer program to understand, let alone translate into different languages.

SEMI-AUTOMATED FUTURE The days of large enterprises with full-time employees working on-site are numbered as more and more projects rely on an off-site workforce available on demand, around the globe. Our employment classification systems, won in the 1930s to make full-time assembly line work sustainable, were not built for this future. As machines get more powerful and algorithms take over more and more problems, we know from past advances in natural language processing and image recognition that industries will continue to identify new problems to tackle. Thus, there is an ever-moving frontier between what machines can and can’t solve. We call this the paradox of automation’s last mile: as machines progress, the opportunity to automate something else appears on the horizon.

See application programming interface (API) Apollo 13, 52 application programming interface (API) circumventing, 74 collaboration, 178–80 definition, xiv growth of, 169 head count on, 103–4 hiring via, 4–6 improvements to, 138–39 inequality of power in, 91–93 limitations of, 170–71, 174 logistics of, xiv, 62 networking, 127 thoughtlessness of, 67–68 training and trust, 71–72 articulation work, 238 n1 artificial intelligence (AI), 231 n41 advancement of, 176–77 humans, dependency on, ix–x, xviii–xxiii, 231 n41 misconceptions about, 191–92 natural language processing, 30 rise of, 6–8 training, xxiii, 6–8, 16, 170, 222 n11 Asra, 106–8 assembly lines, 41–42 automation cost shifts from, 173–77 human labor and, xviii–xxiii, 58–59, 176–77 machinery use in Industrial Revolution, 42–45 paradox of automation, 170 projections for, 243 n5 autonomy vs isolation, 80–84 Avendano, Pablo, 142, 143, 145 Ayesha, 81, 219 n8 B B Corps, 147, 164 bait-and-switch strategy, 83 Bangalore, xi, 17, 76, 219 n5, 238 n7 Bangladesh, 193–94 Beckett, Samuel, 29 benefits APIs, 171 at Caviar, 142 at CrowdFlower, 35 disappearance of, 98, 156 at DoorDash, 157–58, 162 full-time employment, 47, 48, 49, 60 at LeadGenius, 159–60 permatemps (Microsoft), 56–57 recommendations for, 189–92 statistics on, xxiii Uber lawsuit, 146 as worker cost, 32 See also employment, reasons for Bezos, Jeff, 2–3, 90, 135–37, 222 n5 Biewald, Lukas, 35 Bing, xii Blight, David, 226 n2 blue collar work.


pages: 477 words: 75,408

The Economic Singularity: Artificial Intelligence and the Death of Capitalism by Calum Chace

"World Economic Forum" Davos, 3D printing, additive manufacturing, agricultural Revolution, AI winter, Airbnb, AlphaGo, Alvin Toffler, Amazon Robotics, Andy Rubin, artificial general intelligence, augmented reality, autonomous vehicles, banking crisis, basic income, Baxter: Rethink Robotics, Berlin Wall, Bernie Sanders, bitcoin, blockchain, Boston Dynamics, bread and circuses, call centre, Chris Urmson, congestion charging, credit crunch, David Ricardo: comparative advantage, deep learning, DeepMind, Demis Hassabis, digital divide, Douglas Engelbart, Dr. Strangelove, driverless car, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Fairchild Semiconductor, Flynn Effect, full employment, future of work, Future Shock, gender pay gap, Geoffrey Hinton, gig economy, Google Glasses, Google X / Alphabet X, Hans Moravec, Herman Kahn, hype cycle, ImageNet competition, income inequality, industrial robot, Internet of things, invention of the telephone, invisible hand, James Watt: steam engine, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, Kiva Systems, knowledge worker, lifelogging, lump of labour, Lyft, machine translation, Marc Andreessen, Mark Zuckerberg, Martin Wolf, McJob, means of production, Milgram experiment, Narrative Science, natural language processing, Neil Armstrong, new economy, Nick Bostrom, Occupy movement, Oculus Rift, OpenAI, PageRank, pattern recognition, post scarcity, post-industrial society, post-work, precariat, prediction markets, QWERTY keyboard, railway mania, RAND corporation, Ray Kurzweil, RFID, Rodney Brooks, Sam Altman, Satoshi Nakamoto, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Skype, SoftBank, software is eating the world, speech recognition, Stephen Hawking, Steve Jobs, TaskRabbit, technological singularity, TED Talk, The future is already here, The Future of Employment, Thomas Malthus, transaction costs, Two Sigma, Tyler 
Cowen, Tyler Cowen: Great Stagnation, Uber for X, uber lyft, universal basic income, Vernor Vinge, warehouse automation, warehouse robotics, working-age population, Y Combinator, young professional

Over time, Google Search has become unquestionably AI-powered. In August 2013, Google executed a major update of its search function by introducing Hummingbird, which enables the service to respond appropriately to questions phrased in natural language, such as, “what's the quickest route to Australia?”[lxxix] It combines AI techniques of natural language processing with colossal information resources (including Google's own Knowledge Graph, and of course Wikipedia) to analyse the context of the search query and make the response more relevant. PageRank wasn't dropped, but instead became just one of the 200 or so techniques that are now deployed to provide answers.

[civ] The software was initially licensed for single machines only, so even very well resourced organisations weren’t able to replicate the functionality that Google enjoys, but the move was significant. In April 2016 that restriction was lifted.[cv] In October 2015, Facebook announced that it would follow suit by open sourcing the designs for Big Sur, the server which runs the company's latest AI algorithms.[cvi] Then in May 2016 Google open sourced a natural language processing programme playfully called Parsey McParseFace, and SyntaxNet, an associated software toolkit. Google claims that in the kinds of sentences it can be used with, Parsey’s accuracy is 94%, almost as good as the 95% score achieved by human linguists.[cvii] Open sourcing confers a number of advantages.

This was thanks in no small part to the publication the previous year of Nick Bostrom's book “Superintelligence”. It was also the year when cutting-edge AI systems used deep learning and other techniques to demonstrate human-level capabilities in image recognition, speech recognition and natural language processing. In hindsight, 2015 may well be seen as a tipping point. Machines don't have to make everybody unemployed to bring about an economic singularity. If a majority of people – or even just a large minority – can never get hired again, we will need a different type of economy. Furthermore, we don't have to be absolutely certain of this outcome to make it worthwhile to monitor developments and make contingency plans.


Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data by Leslie Sikos

AGPL, Amazon Web Services, bioinformatics, business process, cloud computing, create, read, update, delete, Debian, en.wikipedia.org, fault tolerance, Firefox, Google Chrome, Google Earth, information retrieval, Infrastructure as a Service, Internet of things, linked data, machine readable, machine translation, natural language processing, openstreetmap, optical character recognition, platform as a service, search engine result page, semantic web, Silicon Valley, social graph, software as a service, SPARQL, text mining, Watson beat the top human players on Jeopardy!, web application, Wikidata, wikimedia commons, Wikivoyage

Owing to its comprehensive features, Any23 is implemented in major Semantic Web applications, such as Sindice. The General Architecture for Text Engineering (GATE), an open source text-processing tool developed by the University of Sheffield, uses Natural Language Processing (NLP) methods to generate RDF from text files [8]. GATE’s Ontology plug-in provides an API for manipulating OWL-Lite ontologies that can be serialized as RDF and RDFS. If you work with OWL-DL ontologies, classes that are subclasses of restrictions supported in OWL-Lite are usually shown, but the classes that are subclasses of other restrictions will not be displayed.

The data extracted from DBpedia also supports entity disambiguation and relation detection. YAGO is used for entity type identification, wherein disjointness properties are manually assigned to higher level types in the YAGO taxonomy. To be able to answer questions from a variety of domains, Watson implements relation detection and entity recognition on top of the Natural Language Processing (NLP) algorithms that process factual data from Wikipedia [8]. BBC’s Dynamic Semantic Publishing The British Broadcasting Corporation (BBC) has implemented RDF since 2010 [9] in web sites such as the World Cup 2010 web site [10] and the London 2012 Olympics web site [11]. Today, BBC News [12], BBC Sport [13], and many other web sites across the BBC are authored and published using Semantic Web technologies.
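The entity-recognition step described above can be illustrated with a deliberately crude sketch. The regex heuristic and the example sentence below are invented for illustration; a system like Watson uses statistical NLP models, not capitalization rules:

```python
import re

# Naive entity spotter: treats runs of capitalized words as candidate named
# entities -- a deliberately crude stand-in for statistical NER, for
# illustration only.
def candidate_entities(sentence):
    return re.findall(r"(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)", sentence)

ents = candidate_entities(
    "Watson used data from Wikipedia and Freebase to answer questions on Jeopardy"
)
# ents -> ['Watson', 'Wikipedia', 'Freebase', 'Jeopardy']
```

Real entity recognizers also disambiguate the spotted spans (which "Watson"?) and type them against a taxonomy, which is exactly the role DBpedia and YAGO play in the pipeline described above.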

Index „„         A AllegroGraph ACID implementation, 151 client installation, 156 editions, 151 graph algorithms, 152 Gruff, 160 high-performance storage, 213 Java API connection() method, 157 create method, 157 indexing, 158 RDF statement, 159 read only mode, 158 showTriples method, 159 SparqlSelect query, 159 Triplestore information, 159 quintuplestore, 151 server installation RPM package, 152 TAR Archive, 155 virtual machine image file, 155 text indexing, 151 WebView, 152 Apache Jena, 94, 99 Apache Marmotta, 111 Apache Stanbol, 91 Arachnophilia, 80 Atomic process, 221 Atomicity, Consistency, Isolation, and Durability (ACID), 151, 161 „„         B BBEdit, 80 Big Data applications BBC’s Dynamic Semantic Publishing, 212 Google Knowledge Graph data resources, 200 Google Knowledge Carousel, 201–202 Google Knowledge Panel, 200, 202 JSON-LD annotation, 203 LocalBusiness annotation, 205 SERPs, 200 Google Knowledge Vault, 202 high-performance storage, 214 IBM Watson, 212 Library of Congress Linked Data Service, 213 social media applications (see Social media applications) variety, 199 velocity, 199 veracity, 199 volume, 199 Blazegraph, 171 BlueFish editor, 80 British Broadcasting Corporation (BBC), 212 Business Process Execution Language (BPEL), 140 „„         C Callimachus, 112 Contexts and Dependency Injection (CDI), 111 createDefaultModel() method, 94 CubicWeb, 109 Cypher Query Language (CQL), 188 „„         D D2R server, 193 DBpedia, 63 DBpedia mobile, 116 query Eisenach query, 225 SPARQL endpoint, 64, 225 resources, 63, 64 Spotlight, 84 DeepQA system, 212 227 ■ index Development tools advanced text editors, 79 application development Apache Jena, 94, 99 Sesame (see Sesame) browsers DBpedia Mobile, 116 facet-based (faceted) browsing, 113 IsaViz, 116 marbles, 114 ODE, 114 pivoting (rotation), 113 RelFinder, 117 Tabulator, 113 IDEs (see Integrated Development Environments (IDEs)) linked data software Apache Marmotta, 111 Callimachus, 112 LODStats, 113 Neologism, 112 
sameAs.org, 112 Sindice, 110 ontology editors Apache Stanbol, 91 development stages, 86 Fluent Editor, 91 Protégé (see Protégé) SemanticWorks, 89 SML, 92 TopBraid Composer, 90 ZOOMA, 91 RDFizers Apache Any23, 85 GATE, 86 OpenRefine, 86 reasoners ABOX reasoning, 92 FaCT++, 94 HermiT, 92 OWL API, 92 OWLLink support, 92 Pellet, 93 RACER, 94 semantic annotators and converters DBpedia Spotlight, 84 Google Structured Data Testing Tool, 84 RDFa 1.1 distiller and parser, 82 RDFa Play, 82 RDF distiller, 83 Direct graph, 218 Direct mapping, 218 228 „„         E Eclipse Apache Jena set up, 99 JDK installation, 98, 99 Sesame set up, 103 EditPlus, 80 „„         F Facebook Graph API current state representation, 207 Facebook Module, 210 Graph API Explorer Android, 209 fields of node, 208 FOAF profile augmentation, 210 HTTP GET requests, 208 identifier and user name, 207 iOS, 209 JavaScript, 209 JSON-Turtle conversions, 210 Linked Data, 210 PHP, 209 RDF triples, 209 RDF/Turtle output, 210 Turtle translation, 209 JSON, 207 RESTful JSON API, 207 unique identifier, 207 Facebook Module of Apache Marmotta’s LDClient library, 210 Fast Classification of Terminologies (FaCT++), 94 Fluent Editor, 91 4Store application process, 169 RDF file, 169 rest-client installation, 170 SPARQL query, 170 SPARQL server, 169, 195 Fuseki, 192 „„         G General Architecture for Text Engineering (GATE), 86 GeoNames, 65 Gleaning Resource Descriptions from Dialects of Languages (GRDDL), 39 Google Knowledge Graph data resources, 200 Google Knowledge Carousel, 201–202 Google Knowledge Panel, 200, 202 ■ Index JSON-LD annotation Band in Markup, 203 product description, 203 product offering, 204 LocalBusiness annotation, 205 SERPs, 200 Google Knowledge Panel, 200 Graph databases 4Store process, 169 RDF file, 169 rest-client installation, 170 SPARQL query, 170 advantages, 146, 149 AllegroGraph (see AllegroGraph) Blazegraph, 171 definition, 145 features, 146 index-free adjacency, 145 named graph, 149–150 Neo4j 
(see Neo4j) Oracle, 171 processing engine, 145 quadstore, 149 storage, 145 triplestores, 149 Graphical User Interface (GUI), 86–87 Gruff, 160 „„         H Hadoop Distributed File System (HDFS), 171 „„         I IBM Watson Developers Cloud, 212 Integrated Development Environments (IDEs) CubicWeb, 109 Eclipse Apache Jena set up, 99 Java Development Kit installation, 99 Sesame set up, 103 NetBeans, 108 Internationalized Domain Names (IDN), 9 International Standard Book Number (ISBN), 16 Internet Reasoning Service (IRS), 141 IsaViz, 116 „„         J Java Development Kit (JDK), 99 Java Runtime Environment (JRE), 99 JavaScript Object Notation for Linked Data (JSON-LD), 37 Java Virtual Machine (JVM), 99 „„         K Knowledge representation standards GRDDL, 39 HTML5 microdata attributes, 35 microdata DOM API, 37 JSON-LD, 37 machine-readable annotation formats, 23 microformats drafts and future, 32 hCalendar, 25 hCard, 26 h-event, 26 rel=“license”, 28 rel=“nofollow”, 29 rel=“tag”, 30 URI profile, 25 vote links, 30 XFN, 30 XMDP, 31 OWL classes, 51 description logic, 46 properties, 50 syntaxes, 49 variants, 48 parsers, 54 R2RML, 40 RDF, 18 RDFa, 32 RDFS classes, 42 domains and ranges, 44 instance, 42 properties, 44 subclasses, 42 reasoning, 54 RIF, 53 SKOS, 53 vocabularies and ontologies books, 16 DOAP, 17 e-commerce, 16 FOAF, 13 licensing, 17 media ontologies, 18 metadata, 15 online communities, 18 person vocabularies, 15 PRISM, 16 publications, 16 schema.org, 14 Komodo Edit, 80 229 ■ index „„         L „„         P LinkedGeoData, 66 Linked Open Data (LOD) cloud diagram, 67 collections, 67 creation interlinking, 72 licenses, 71 RDF statements, 72 RDF structure, 70 your dataset, 74 DBpedia, 63 five-star rating system, 60 GeoNames, 65 LinkedGeoData, 66 principles, 59 RDF crawling, 62 RDF dumps, 62 SPARQL endpoints, 62 visualization, 75 Wikidata, 65 YAGO, 67 LODStats, 113 Pellet, 93 Persistent Uniform Resource Locators (PURLs), 9 Process model, 129 Protégé Active Ontology tab, 
88 application, 86 class hierarchies, 88 command line, 86 GUI, 87 HermiT reasoner, 93 Individuals tab, 88 Learning Health System, 86 Object Properties and Data Properties tabs, 88 OntoGraf tab, 88 OWLViz, 88 SPARQL Query tab, 89 URIs, 88 PublishMyData, 195 „„         Q „„         M Quadstores, 149 MAchine-Readable Cataloging (MARC), 213 MicroWSMO, 137 „„         R „„         N Named graph, 149 Natural Language Processing (NLP) methods, 86 Neo4j, 161 Cypher commands, 163 graph style sheet, 163 Java API database installation, 165 Eclipse, 164, 168 node method, 166 main method, 166 RDF statement, 167 shut down method, 167 WEBSITE_OF method, 166 server installation, 161 web interface, 162 Neologism, 112 NetBeans, 108 Notepad++, 80 „„         O OpenLink Data Explorer (ODE), 114 OpenLink Virtuoso, 190 OpenRefine, 86 Oracle, 171 230 RACER, 94 RDB2RML (R2RML), 40 RDB to RDF direct mapping employee database table, 217 employee_project database table, 217 project database table, 218 source code, 218 Red Hat Package Manager (RPM) package, 152 Relational database (RDB), 217 RelFinder, 117 Renamed ABox and Concept Expression Reasoner (Racer), 94 rep.initialize() method, 104 Resource Description Framework (RDF), 217 attributes, 32 crawling, 62 dumps, 62 graph, 20, 145 R2RML, 40 statements, 72 structure creation, 70 triples/statements, 19, 220 turtle, 20 vocabulary, 18 RESTful JSON API, 207 Rule Interchange Format (RIF), 53 ■ Index „„         S sameAs.org, 112 Search engine optimization (SEO), 79 Search Engine Result Pages (SERPs), 84, 200 Semantic Annotations for Web Service Description Language (SAWSDL), 127 Semantic Automated Discovery and Integration (SADI), 142 Semantic Measures Library (SML), 92 Semantic search engines, 189 Semantic Web technology, 1 Big Data (see Big Data applications) components AI, 5 controlled vocabularies, 5 inference, 7 ontologies, 6 taxonomies, 5 features, 8 structured data, 2 web evolution, 2 Semantic Web Services OWL-S (see Web Ontology Language 
for Services (OWL-S)) process, 121 properties, 122 SOAP fault structure, 124 message structure, 122 software IRS, 141 SADI, 142 WSMT, 141 WSMX, 141 UDDI, 142 WS-BPEL (see Web Services Business Process Execution Language (WS-BPEL)) WSDL (see Web Service Description Language (WSDL)) WSML (see Web Service Modeling Language (WSML)) WSMO (see Web Service Modeling Ontology (WSMO)) SemanticWorks, 89 Service profile, 129 Sesame Alibaba, 96 Eclipse, 103 empty graph creation, 98 Graph API, 97 local repository, 96 RDF Model API, 97 RDF triplestore, 96 RemoteRepositoryManager, 97 Repository API, 96 SAIL, 97 triple support, 98 default ValueFactory implementation, 97 Sesame RDF Query Language (SeRQL), 186 Simple Knowledge Organization System (SKOS), 53 Simple Object Access Protocol (SOAP) binding interface, 127 fault structure, 124 message structure, 122 Sindice, 85, 110 SOAPssage, 123 Social media applications Facebook Social Graph Facebook Graph API (see Facebook Graph API) friends recommendation, 206–207 node and edge, 206 Open Graph Protocol, 211 Twitter Cards, 211 Software as a Service (SaaS), 195 SPARQL endpoint 4store’s HTTP server, 195 callback function, 196 D2R configuration file, 193 D2R server installation, 193 Fuseki, 192 jQuery request data, 195 JSON-P request data, 196 OpenLink Virtuoso process, 190–191 PublishMyData request data, 195–196 URL encoding PublishMyData, 195 SPARQL queries ASK query, 179 CONSTRUCT query, 180 core types, 176 CQL, 188 default namespace, 174 DESCRIBE query, 180 existence checking function, 177 federated query, 181 graph management operations ADD operation, 185–186 COPY DEFAULT TO operation, 184 default graph, 184 MOVE DEFAULT TO operation, 185 graph patterns, 176 graph update operations DELETE DATA operation, 183 INSERT DATA operation, 182–183 language checking function, 177 LOD datasets, 189 multiple variable match, 176 namespace declaration, 173 one variable match, 176 231 ■ index SPARQL queries (cont.) 
property path, 177 public SPARQL endpoints, 190 query engine remove graph property value, 187 Sesame Graph API, 187 Sesame Repository API, 186 RDF graph, 174 RDF triple matching, 176 REASON query, 181 SELECT query, 178–179 solution modifiers, 178 SPARQL 1.0 core types, 175 SPARQL 1.1 aggregation, 175 entailment regimes, 175 service description, 175 Uniform HTTP Protocol, 175 Update language, 175 SPARQL endpoint (see SPARQL endpoint) structure, 174 triple patterns, 176 URI syntax, 173 Storage And Inference Layer API (SAIL), 97 „„         T TextWrangler, 80 TopBraid Composer, 90 Triples map, 218 Twitter Cards, 211 „„         U, V Uniform Resource Identifier (URI), 9 Uniform Resource Names (URNs), 9 Universal Description, Discovery and Integration (UDDI), 142 US Library of Congress, 213 „„         W Web Ontology Language (OWL), 129 classes, 51 description logic, 46 properties, 50 syntaxes, 49 variants, 48 Web Ontology Language for Services (OWL-S) atomic process, 221 output and effect conditions, 132 parameter class, 130 232 precondition process, 131 process agents, 131 properties, 130 service process, 131 situation calculus, 129 SWRL variable, 130 URI value, 131 Web resource identifiers, 8 Web Services Business Process Execution Language (WS-BPEL), 140 Web Services Description Language (WSDL) data types, 126 elements, 124 endpoint element, 127 HTTP binding interface, 126 interface element, 126 namespace declaration, 125 SAWSDL annotation file, 128 modelReference, 127 skeleton document, 125 SOAP binding interface, 127 Web Service Modeling eXecution environment (WSMX), 141 Web Service Modeling Language (WSML) importsOntology, 139 IRI quotes, 139 mediator, 140 namespace declarations, 139 nonfunctional property, 139 syntaxes, 138 XML schema data types, 138 Web Service Modeling Ontology (WSMO) choreography and orchestration expresses, 137 class capability, 137 components, 133 definition, 133 entity set definitions, 135 function, 136 goal class, 137 mediators, 133–134 
MicroWSMO, 137 nonfunctional properties, 134 ontology instance, 136 post-condition creditcard service, 224 pre-condition creditcard service, 224 relations definition, 135 service class, 134 service goal definition, 223 travel agency modeling, 223 WSMO-lite, 137 ■ Index Web Services Modeling Toolkit (WSMT), 141 Wikidata, 65 WSMO-Lite, 137 „„         X XHTML Friends Network (XFN), 30 XHTML MetaData Profiles (XMDP), 31 „„         Y Yet Another Great Ontology (YAGO), 67 „„         Z ZOOMA, 91 233 Mastering Structured Data on the Semantic Web From HTML5 Microdata to Linked Open Data Leslie F.


pages: 321

Finding Alphas: A Quantitative Approach to Building Trading Strategies by Igor Tulchinsky

algorithmic trading, asset allocation, automated trading system, backpropagation, backtesting, barriers to entry, behavioural economics, book value, business cycle, buy and hold, capital asset pricing model, constrained optimization, corporate governance, correlation coefficient, credit crunch, Credit Default Swap, currency risk, data science, deep learning, discounted cash flows, discrete time, diversification, diversified portfolio, Eugene Fama: efficient market hypothesis, financial engineering, financial intermediation, Flash crash, Geoffrey Hinton, implied volatility, index arbitrage, index fund, intangible asset, iterative process, Long Term Capital Management, loss aversion, low interest rates, machine readable, market design, market microstructure, merger arbitrage, natural language processing, passive investing, pattern recognition, performance metric, Performance of Mutual Funds in the Period, popular capitalism, prediction markets, price discovery process, profit motive, proprietary trading, quantitative trading / quantitative finance, random walk, Reminiscences of a Stock Operator, Renaissance Technologies, risk free rate, risk tolerance, risk-adjusted returns, risk/return, selection bias, sentiment analysis, shareholder value, Sharpe ratio, short selling, Silicon Valley, speech recognition, statistical arbitrage, statistical model, stochastic process, survivorship bias, systematic bias, systematic trading, text mining, transaction costs, Vanguard fund, yield curve

If the data is simple, vendors may provide only the raw data they have collected, such as price and volume. Sometimes vendors do parsing and processing before providing data to their clients; fundamental data is an example. For unstructured yet sophisticated data, such as news, Twitter posts, and so on, vendors typically apply natural language processing techniques to analyze the content of the raw data. They provide machine-readable data to their clients instead of raw data that is only human-readable. Some vendors even sell alpha models directly – this means the data itself is the output of alpha models. The clients need only to load the data and trade according to it.
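The transformation such vendors perform can be sketched with a toy lexicon-based scorer that turns a raw, human-readable headline into a machine-readable record. The word lists and scoring rule are invented for illustration and are not any vendor's actual method:

```python
# Toy lexicon-based scorer: turns raw headlines into machine-readable records.
# The word lists and the scoring rule are illustrative assumptions.
POSITIVE = {"beats", "growth", "record", "upgrade", "surge"}
NEGATIVE = {"misses", "lawsuit", "recall", "downgrade", "slump"}

def score_headline(headline):
    """Return a machine-readable record: tokens plus a crude sentiment in [-1, 1]."""
    tokens = headline.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    hits = pos + neg
    sentiment = 0.0 if hits == 0 else (pos - neg) / hits
    return {"raw": headline, "tokens": tokens, "sentiment": sentiment}

record = score_headline("Acme beats estimates as cloud growth hits record")
# record["sentiment"] -> 1.0
```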

The name comes from Canadian scientist Geoffrey Hinton, who created an unsupervised method known as the restricted Boltzmann machine (RBM) for pretraining NNs with a large number of neuron layers. That was meant to improve on the backpropagation training method, but there is no strong evidence that it really was an improvement. Another direction in deep learning is recurrent neural networks (RNNs) and natural language processing. One problem that arises in calibrating RNNs is that the changes in the weights from step to step can become too small or too large. This is called the vanishing gradient problem. These days, the words “deep learning” more often refer to convolutional neural networks (CNNs). The architecture of CNNs was introduced by computer scientists Kunihiko Fukushima, who developed the neocognitron model (feed-forward NN), and Yann LeCun, who modified the backpropagation algorithm for neocognitron training.

But in recent years, news and sentiment seen on social media have grown increasingly significant as potential predictors of stock prices. However, it is challenging to make alphas using news. As unstructured data that often includes text and multimedia content, news cannot be understood directly by a computer. We can use natural language processing (NLP) and machine learning methods to classify and score raw news content, and we can measure additional properties of the news, such as novelty, relevance, and category, to better describe the sentiment of the news. Similar techniques can be applied to social media data to generate alphas, though we should bear in mind that social media has much more volume and is much noisier than conventional news media.
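The classify-and-score idea can be sketched with a tiny bag-of-words Naive Bayes classifier. The headlines and labels below are invented toy data; a production system would use far richer features, NLP preprocessing, and far more training text:

```python
from collections import Counter
import math

# Tiny bag-of-words Naive Bayes for news sentiment. Training headlines are
# invented toy data, used only to illustrate the classify-and-score idea.
TRAIN = [
    ("profits surge on strong quarterly earnings", "pos"),
    ("shares jump after upbeat guidance", "pos"),
    ("record revenue lifts outlook", "pos"),
    ("stock plunges on weak earnings", "neg"),
    ("lawsuit and recall hit shares", "neg"),
    ("outlook cut after weak guidance", "neg"),
]

def train(examples):
    counts = {"pos": Counter(), "neg": Counter()}
    totals = Counter()
    for text, lab in examples:
        for tok in text.split():
            counts[lab][tok] += 1
            totals[lab] += 1
    return counts, totals

def classify(text, counts, totals, alpha=1.0):
    vocab = set(counts["pos"]) | set(counts["neg"])
    scores = {}
    for lab in counts:
        logp = math.log(totals[lab] / sum(totals.values()))
        for tok in text.split():
            # Laplace smoothing keeps unseen words from zeroing the probability
            logp += math.log((counts[lab][tok] + alpha) /
                             (totals[lab] + alpha * len(vocab)))
        scores[lab] = logp
    return max(scores, key=scores.get)

counts, totals = train(TRAIN)
label = classify("earnings surge lifts shares", counts, totals)
# label -> "pos"
```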


pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline by Cathy O'Neil, Rachel Schutt

Amazon Mechanical Turk, augmented reality, Augustin-Louis Cauchy, barriers to entry, Bayesian statistics, bike sharing, bioinformatics, computer vision, confounding variable, correlation does not imply causation, crowdsourcing, data science, distributed generation, Dunning–Kruger effect, Edward Snowden, Emanuel Derman, fault tolerance, Filter Bubble, finite state, Firefox, game design, Google Glasses, index card, information retrieval, iterative process, John Harrison: Longitude, Khan Academy, Kickstarter, machine translation, Mars Rover, Nate Silver, natural language processing, Netflix Prize, p-value, pattern recognition, performance metric, personalized medicine, pull request, recommendation engine, rent-seeking, selection bias, Silicon Valley, speech recognition, statistical model, stochastic process, tacit knowledge, text mining, the scientific method, The Wisdom of Crowds, Watson beat the top human players on Jeopardy!, X Prize

format=json&query=nytd_section_facet: [%s]&fields=url,title,body&rank=newest&offset=%s &api-key=Your_Key_Here"

    # create an empty list to hold 3 result sets
    resultsSports <- vector("list", 3)

    ## loop through 0, 1 and 2 to call the API for each value
    for(i in 0:2) {
        # first build the query string replacing the first %s with Sport
        # and the second %s with the current value of i
        tempCall <- sprintf(theCall, "Sports", i)
        # make the query and get the json response
        tempJson <- fromJSON(file=tempCall)
        # convert the json into a 10x3 data.frame and save it to the list
        resultsSports[[i + 1]] <- ldply(tempJson$results, as.data.frame)
    }

    # convert the list into a data.frame
    resultsDFSports <- ldply(resultsSports)
    # make a new column indicating this comes from Sports
    resultsDFSports$Section <- "Sports"

    ## repeat that whole business for arts
    ## ideally you would do this in a more eloquent manner,
    ## but this is just for illustration
    resultsArts <- vector("list", 3)
    for(i in 0:2) {
        tempCall <- sprintf(theCall, "Arts", i)
        tempJson <- fromJSON(file=tempCall)
        resultsArts[[i + 1]] <- ldply(tempJson$results, as.data.frame)
    }
    resultsDFArts <- ldply(resultsArts)
    resultsDFArts$Section <- "Arts"

    # combine them both into one data.frame
    resultBig <- rbind(resultsDFArts, resultsDFSports)
    dim(resultBig)
    View(resultBig)

    ## now time for tokenizing
    # create the document-term matrix in english, removing numbers
    # and stop words and stemming words
    doc_matrix <- create_matrix(resultBig$body, language="english",
        removeNumbers=TRUE, removeStopwords=TRUE, stemWords=TRUE)
    doc_matrix
    View(as.matrix(doc_matrix))

    # create a training and testing set
    theOrder <- sample(60)
    container <- create_container(matrix=doc_matrix, labels=resultBig$Section,
        trainSize=theOrder[1:40], testSize=theOrder[41:60], virgin=FALSE)

Historical Context: Natural Language Processing

The example in this chapter where the raw data is text is just the tip of the iceberg of a whole field of research in computer science called natural language processing (NLP). The types of problems that can be solved with NLP include machine translation, where given text in one language, the algorithm can translate the text to another language; semantic analysis; part-of-speech tagging; and document classification (of which spam filtering is an example).

Each box has an embedded Linux processor running Python, and a sound card that makes various sounds—clicking, typing, waves—depending on what scene is playing. Figure 9-7. Display box for Moveable Type The data is collected via text from New York Times articles, blogs, and search engine activity. Every sentence is parsed using Stanford natural language processing techniques, which diagram sentences. Altogether there are about 15 scenes so far, and it’s written in code so one can keep adding to it. Here’s a YouTube interview with Mark and Ben about the exhibit. Project Cascade: Lives on a Screen Mark next told us about Cascade, which was a joint work with Jer Thorp—data artist-in-residence at the New York Times —in partnership with bit.ly.

Example of the Arabic blogosphere The different colors represent countries and clusters of blogs. The size of each dot is centrality through degree, i.e., the number of links to other blogs in the network. The physical structure of the blogosphere can give us insight. If we analyze text using natural language processing (NLP), thinking of the blog posts as a pile of text or a river of text, then we see the micro or macro picture only—we lose the most important story. What’s missing there is social network analysis (SNA), which helps us map and analyze the patterns of interaction. The 12 different international blogospheres, for example, look different.
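Degree centrality as defined here, the number of links a blog has to other blogs, is straightforward to compute. The five-blog link network below is invented for illustration; the normalization by n − 1 (the maximum possible degree) is the standard SNA convention:

```python
# Degree centrality on a toy blog-link network (blogs and links are invented).
links = [
    ("blogA", "blogB"), ("blogA", "blogC"), ("blogA", "blogD"),
    ("blogB", "blogC"), ("blogD", "blogE"),
]

def degree_centrality(edges):
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    # Normalize each node's link count by the maximum possible degree, n - 1
    return {node: d / (n - 1) for node, d in degree.items()}

centrality = degree_centrality(links)
# centrality["blogA"] -> 0.75, the hub of this toy network
```

In a visualization like the one described above, blogA would be drawn as the largest dot.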


Mastering Machine Learning With Scikit-Learn by Gavin Hackeling

backpropagation, computer vision, constrained optimization, correlation coefficient, data science, Debian, deep learning, distributed generation, iterative process, natural language processing, Occam's razor, optical character recognition, performance metric, recommendation engine

Mikhail Korobov is a software developer at ScrapingHub Inc., where he works on web scraping, information extraction, natural language processing, machine learning, and web development tasks. He is an NLTK team member, a Scrapy team member, and an author of or contributor to many other open source projects. I'd like to thank my wife, Aleksandra, for her support and patience and for the cookies. Aman Madaan is currently pursuing his Master's in Computer Science and Engineering. His interests span machine learning, information extraction, natural language processing, and distributed computing. More details about his skills, interests, and experience can be found at http://www.amanmadaan.in.

Data sets with even a modest number of features can result in mapped feature spaces with massive dimensions. scikit-learn provides several commonly used kernels, including the polynomial, sigmoid, Gaussian, and linear kernels. Polynomial kernels are given by the following equation:

K(x, x′) = (1 + ⟨x, x′⟩)^k

Quadratic kernels, or polynomial kernels where k is equal to 2, are commonly used in natural language processing. The sigmoid kernel is given by the following equation, where γ and r are hyperparameters that can be tuned through cross-validation:

K(x, x′) = tanh(γ⟨x, x′⟩ + r)

The Gaussian kernel is a good first choice for problems requiring nonlinear models. The Gaussian kernel is a radial basis function.
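The quadratic kernel can be checked numerically: for two-dimensional inputs, (1 + ⟨x, x′⟩)² equals an ordinary dot product in a six-dimensional mapped space, which is exactly why the kernel trick avoids building that space. A small self-contained check (the example vectors are arbitrary):

```python
import math

# Verify the kernel trick for the quadratic kernel: K(x, y) = (1 + <x, y>)^2
# equals a plain dot product after mapping 2-D inputs into 6 dimensions.
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def quadratic_kernel(x, y):
    # Polynomial kernel with k = 2
    return (1.0 + dot(x, y)) ** 2

def phi(x):
    # Explicit feature map whose inner product reproduces the quadratic kernel
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2]

x, y = [1.0, 2.0], [0.5, -1.0]
k_implicit = quadratic_kernel(x, y)   # kernel trick: no explicit mapping
k_explicit = dot(phi(x), phi(y))      # same value in the mapped space
```

For inputs with d features, the same map has O(d²) dimensions, which is why the implicit computation matters.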


pages: 326 words: 88,968

The Science and Technology of Growing Young: An Insider's Guide to the Breakthroughs That Will Dramatically Extend Our Lifespan . . . And What You Can Do Right Now by Sergey Young

23andMe, 3D printing, Albert Einstein, artificial general intelligence, augmented reality, basic income, Big Tech, bioinformatics, Biosphere 2, brain emulation, caloric restriction, caloric restriction, Charles Lindbergh, classic study, clean water, cloud computing, cognitive bias, computer vision, coronavirus, COVID-19, CRISPR, deep learning, digital twin, diversified portfolio, Doomsday Clock, double helix, Easter island, Elon Musk, en.wikipedia.org, epigenetics, European colonialism, game design, Gavin Belson, George Floyd, global pandemic, hockey-stick growth, impulse control, Internet of things, late capitalism, Law of Accelerating Returns, life extension, lockdown, Lyft, Mark Zuckerberg, meta-analysis, microbiome, microdosing, moral hazard, mouse model, natural language processing, personalized medicine, plant based meat, precision agriculture, radical life extension, Ralph Waldo Emerson, Ray Kurzweil, Richard Feynman, ride hailing / ride sharing, Ronald Reagan, self-driving car, seminal paper, Silicon Valley, stem cell, Steve Jobs, tech billionaire, TED Talk, uber lyft, ultra-processed food, universal basic income, Virgin Galactic, Vision Fund, X Prize

They then used the AI techniques of deep learning and computer vision to teach their algorithm to recognize DR, just like a qualified ophthalmologist would. Today, the shortage of doctors to monitor the retinal condition of diabetic patients is less of a problem, thanks to AI.

3. AI Case Study #3: Natural Language Processing and Taking AI Health Care to the Next Level

Using AI to analyze raw data and even images is one thing. But for AI technology to really make the kind of hyper-accurate, deeply personalized diagnoses that precision medicine is capable of, it must organize and assimilate a huge number of sources, including medical records of hundreds of millions of patients, literature on thousands of approved and experimental drugs, clinical journals, insurance claims, and even handwritten doctors’ notes and patient charts, in multiple languages.

It then needs to draw useful insights from those sources, make probabilistic calculations about a patient’s situation, and offer the best possible solution for any particular patient. One of the ways that computers can do that is through natural language processing, or NLP, a form of AI that makes sense of written information. Medical AI systems like CloudMedX use NLP to scan language-based data and determine the right care pathway for a patient. At present, individual symptoms such as heart pain and tingling in the fingers can be entered in order to deduce a diagnosis—something particularly useful for rare conditions that a physician is unlikely to have personal experience with.

Medical records can be monitored to identify patterns associated with negative health events like hospital-acquired infections, heart attacks, and so on. On a pretty narrow basis, NLP can already perform the detailed analysis I described above to enhance physician decision making. In time, AI will be able to combine computer vision, deep learning, natural language processing, and other techniques to provide extremely reliable diagnostic outcomes. It will take all of the guesswork and inconsistency out of medical care and make our old one-size-fits-all approach seem barbaric in retrospect. We have a long way to go, but within the Near Horizon of Longevity, precision medicine will become, without a great deal of hyperbole, perfect medicine.


The Internet Trap: How the Digital Economy Builds Monopolies and Undermines Democracy by Matthew Hindman

A Declaration of the Independence of Cyberspace, accounting loophole / creative accounting, activist fund / activist shareholder / activist investor, AltaVista, Amazon Web Services, barriers to entry, Benjamin Mako Hill, bounce rate, business logic, Cambridge Analytica, cloud computing, computer vision, creative destruction, crowdsourcing, David Ricardo: comparative advantage, death of newspapers, deep learning, DeepMind, digital divide, discovery of DNA, disinformation, Donald Trump, fake news, fault tolerance, Filter Bubble, Firefox, future of journalism, Ida Tarbell, incognito mode, informal economy, information retrieval, invention of the telescope, Jeff Bezos, John Perry Barlow, John von Neumann, Joseph Schumpeter, lake wobegon effect, large denomination, longitudinal study, loose coupling, machine translation, Marc Andreessen, Mark Zuckerberg, Metcalfe’s law, natural language processing, Netflix Prize, Network effects, New Economic Geography, New Journalism, pattern recognition, peer-to-peer, Pepsi Challenge, performance metric, power law, price discrimination, recommendation engine, Robert Metcalfe, search costs, selection bias, Silicon Valley, Skype, sparse data, speech recognition, Stewart Brand, surveillance capitalism, technoutopianism, Ted Nelson, The Chicago School, the long tail, The Soul of a New Machine, Thomas Malthus, web application, Whole Earth Catalog, Yochai Benkler

Google has even built new globally distributed database systems called Spanner and F1, in which operations across different data centers are synced using atomic clocks.22 The latest iteration of Borg, Google’s cluster management system, coordinates “hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.”23 In recent years Google’s data centers have expanded their capabilities in other ways, too. As Google has increasingly focused on problems like computer vision, speech recognition, and natural language processing, it has worked to deploy deep learning, a variant of neural network methods. Google’s investments in deep learning have been massive and multifaceted, including (among other things) major corporate acquisitions and the development of the TensorFlow high-level programming toolkit.24 But one critical component has been the development of a custom computer chip built specially for machine learning.

Jumps in accuracy involved taking that limited data and extracting new features, like temporal effects. The moral here is somewhat paradoxical. Netflix released a massive dataset to find the best algorithm, but the algorithms themselves proved less important than the data. Similar lessons have emerged in other, quite different realms of machine learning. In research on natural language processing, Microsoft researchers examined how accuracy improved across several different algorithms as the amount of training data increased. Although these algorithms showed dramatically different performance on tests of one million words, as the researchers scaled up the training set—to ten million, one hundred million, and finally one billion words—the algorithms’ performance became more and more similar.

Duplication of television viewing between and within channels. Journal of Marketing Research, 6 (2). Google. (2013). Efficiency: how we do it. Retrieved from http://www.google.com/about /datacenters/efficiency/internal/. Gorrell, G. (2006). Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing. In Proceedings of EACL, Trento, Italy (pp. 97–104). Gould, S. J. (2002). The structure of evolutionary theory. Cambridge, MA: Harvard University Press. Graves, L. (2010). Traffic jam: we’ll never agree about online audience size. Columbia Journalism Review. Retrieved from https://archives.cjr.org/reports/traffic_jam.php.


pages: 125 words: 27,675

Applied Text Analysis With Python: Enabling Language-Aware Data Products With Machine Learning by Benjamin Bengfort, Rebecca Bilbro, Tony Ojeda

data science, full text search, natural language processing, quantitative easing, sentiment analysis, statistical model, the long tail

Computation and analysis on language must also be flexible; therefore, the primary computational technique for text analytics is machine learning. Learning techniques give data scientists the ability to train models in a specific context on a specific corpus, make predictions on new data, and adapt over time as the corpus grows and changes. In fact, most natural language processing uses machine learning in one form or another, from tokenization and part-of-speech tagging, as we saw in the previous chapter, to named entity recognition, entailment, and parsing. More recently, textual machine learning has enabled applications that utilize sentiment analysis, word sense disambiguation, automatic translation and tagging, scene recognition, captioning, chatbots, and more!
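As a toy illustration of the first step mentioned above, tokenization can be sketched with a single regular expression (a rough approximation; libraries such as NLTK handle punctuation, clitics, and Unicode far more carefully):

```python
import re

def tokenize(text):
    # Lowercase, then pull out runs of letters/digits, keeping
    # an apostrophe-suffix (as in "don't") inside the token.
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())

print(tokenize("Don't panic: tokenization is step one."))
# ["don't", 'panic', 'tokenization', 'is', 'step', 'one']
```

Each downstream step (tagging, entity recognition, parsing) consumes a token stream like this one.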

However, even when data is split into training and test sets, there is a chance that certain chunks of the data will have more variance than others. To handle this case, we shuffle our dataset and divide it into k train and test splits, averaging the scores for each split. Note that if our natural language processing application didn’t have to do any machine learning work, the CorpusReader would be enough; after preprocessing, the text could go directly into a transformer. However, if we use the sklearn.cross_validation.train_test_split function directly on the reader, the data would be loaded into memory all at once, leaving us precious little RAM for computation, if any at all.
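The shuffle-and-average procedure described above can be sketched in plain Python (a simplified stand-in for scikit-learn's cross-validation helpers; the scoring function here is a placeholder):

```python
import random

def k_fold_scores(data, k, score_fn, seed=42):
    """Shuffle the data, split it into k folds, score each
    train/test split, and return the mean score."""
    items = list(data)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(score_fn(train, test))
    return sum(scores) / k

# Toy "score": total items seen per split (always the full dataset).
data = [("good", 1), ("bad", 0), ("fine", 1), ("poor", 0), ("great", 1), ("awful", 0)]
score = k_fold_scores(data, 3, lambda tr, te: len(tr) + len(te))
print(score)  # 6.0
```

In a real pipeline, `score_fn` would fit a model on `train` and evaluate it on `test`.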


Speaking Code: Coding as Aesthetic and Political Expression by Geoff Cox, Alex McLean

4chan, Amazon Mechanical Turk, augmented reality, bash_history, bitcoin, Charles Babbage, cloud computing, commons-based peer production, computer age, computer vision, Computing Machinery and Intelligence, crowdsourcing, dematerialisation, Donald Knuth, Douglas Hofstadter, en.wikipedia.org, Everything should be made as simple as possible, finite state, Free Software Foundation, Gabriella Coleman, Gödel, Escher, Bach, Hacker Conference 1984, Ian Bogost, Jacques de Vaucanson, language acquisition, Larry Wall, late capitalism, means of production, natural language processing, Neal Stephenson, new economy, Norbert Wiener, Occupy movement, packet switching, peer-to-peer, power law, Richard Stallman, Ronald Coase, Slavoj Žižek, social software, social web, software studies, speech recognition, SQL injection, stem cell, Stewart Brand, systems thinking, The Nature of the Firm, Turing machine, Turing test, Vilfredo Pareto, We are Anonymous. We are Legion, We are the 99%, WikiLeaks, Yochai Benkler

The title makes reference to the Greek myth in which Pygmalion, a sculptor, falls in love with a statue he carves, and Venus grants it the breath of life.47 The Pygmalion myth stands as a useful analogy for the idea of breathing life into machines. The chatterbot Eliza, produced by Joseph Weizenbaum between 1964 and 1966 and named after Eliza Doolittle, is a disarmingly simple example based on similar motivations: to simulate a believable exchange with a human conversant. It uses primitive natural-language processing to simulate a conversation with a therapist, producing human-like responses by implementing a simple script based on key words and language patterns through which responses are generated. Here is an example:

I am the psychotherapist. Please, describe your problems.
> Hello world.
Why do you say hello world?
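Eliza's mechanism, a script of keyword patterns mapped to templated responses, can be caricatured in a few lines (a toy sketch, not Weizenbaum's actual script):

```python
import re

# Toy pattern -> response script in the spirit of Eliza's keyword rules.
RULES = [
    (re.compile(r"\bi am ([^.!?]+)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel ([^.!?]+)", re.I), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.I), "Tell me more about your {0}."),
]

def respond(utterance):
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    # Fallback mirrors the input back, as Eliza often does.
    return "Why do you say {}?".format(utterance.strip(".!").lower())

print(respond("Hello world."))   # Why do you say hello world?
print(respond("I am anxious."))  # Why do you say you are anxious?
```

The illusion of conversation comes entirely from pattern matching and substitution; no meaning is represented anywhere.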

Without involving complex algorithms, it can generate responses that appear to make some degree of sense. Yet it is interesting to note how the illusion of conversation follows an extremely reductive model of human expression, and the fantasies of machine intelligence seem to be similarly founded on reductive logic. At the same time, natural-language processing programs and other chatterbots offer good examples of the speechlike procedures mentioned thus far, as well as the apparent impossibility of duplicating actual speech.

Intelligence

To demonstrate believability, a machine would be required to possess some kind of intelligence that reflects the capacity for human reasoning, in parallel to turning mere voice sounds into proper speech that expresses human gentility.

In contrast to the 1962 version, Bicycle Built for 2,000 was synthesized with a distributed system of human voices from all over the world.” 120. 2001: A Space Odyssey (1968, dir. Stanley Kubrick, Metro-Goldwyn-Mayer). HAL is a computer capable of speech, speech recognition, facial recognition, natural language processing, lip reading, art appreciation, interpreting and reproducing emotional behaviors, reasoning, and playing chess. 121. The full story was on the Forumwarz blog but is no longer available. See http://en.wikipedia.org/wiki/Forumwarz. Thanks to Robert Jackson for identifying this example. 122.


pages: 134 words: 29,488

Python Requests Essentials by Rakesh Vidya Chandra, Bala Subrahmanyam Varanasi

business logic, create, read, update, delete, en.wikipedia.org, Kickstarter, machine readable, MITM: man-in-the-middle, MVC pattern, natural language processing, RFC: Request For Comment, RFID, supply-chain management, web application

Interacting with Social Media Using Requests. In this contemporary world, our lives are woven with a lot of interactions and collaborations with social media. The information that is available on the web is very valuable, and it is drawn on by an abundance of services. For instance, the news that is trending in the world can be spotted easily from a Twitter hashtag, and this can be achieved by interacting with the Twitter API. Using natural language processing, we can classify a person's emotion by grabbing the Facebook status of an account. All this can be accomplished easily with the help of Requests using the relevant APIs. Requests is a perfect module if we want to reach an API frequently, as it supports pretty much everything, like caching, redirection, proxies, and so on.
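The pattern described here, calling an API and then mining the returned data, can be sketched without the network round-trip; the JSON payload below is a hypothetical stand-in for a real social media response, not the actual Twitter format:

```python
import json
from collections import Counter

# Hypothetical API payload, shaped loosely like a search response.
payload = json.loads("""
{"statuses": [
    {"text": "Loving the new release! #python"},
    {"text": "Conference day two #python #nlp"},
    {"text": "Rain again. #weather"}
]}
""")

def trending_hashtags(statuses, top=2):
    """Count hashtags across statuses and return the most common ones."""
    tags = Counter()
    for status in statuses:
        tags.update(w for w in status["text"].split() if w.startswith("#"))
    return [tag for tag, _ in tags.most_common(top)]

print(trending_hashtags(payload["statuses"]))  # ['#python', '#nlp']
```

With Requests, `payload` would instead come from something like `requests.get(url, params=...).json()`; everything after that line is the same.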

Unstructured data

In contrast to structured data, unstructured data either lacks a standard format or stays unorganized even when a specific format is imposed on it. For this reason, it becomes difficult and tedious to deal with different parts of the data. To handle unstructured data, different techniques such as text analytics, Natural Language Processing (NLP), and data mining are used. Images, scientific data, and text-heavy content (such as newspapers, health records, and so on) come under the unstructured data type.

Semistructured data

Semistructured data is a type of data that follows an irregular trend or has a structure which changes rapidly.


pages: 118 words: 35,663

Smart Machines: IBM's Watson and the Era of Cognitive Computing (Columbia Business School Publishing) by John E. Kelly Iii

AI winter, book value, call centre, carbon footprint, Computing Machinery and Intelligence, crowdsourcing, demand response, discovery of DNA, disruptive innovation, Erik Brynjolfsson, Fairchild Semiconductor, future of work, Geoffrey West, Santa Fe Institute, global supply chain, Great Leap Forward, Internet of things, John von Neumann, Large Hadron Collider, Mars Rover, natural language processing, optical character recognition, pattern recognition, planetary scale, RAND corporation, RFID, Richard Feynman, smart grid, smart meter, speech recognition, TED Talk, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!

Together, we can drive the exploration and invention that will shape society, the economy, and business for the next fifty years. 1 A NEW ERA OF COMPUTING IBM’s Watson computer created a sensation when it bested two past grand champions on the TV quiz show Jeopardy! Tens of millions of people suddenly understood how “smart” a computer could be. This was no mere parlor trick; the scientists who designed Watson built upon decades of research in the fields of artificial intelligence and natural-language processing and produced a series of breakthroughs. Their ingenuity made it possible for a system to excel at a game that requires both encyclopedic knowledge and lightning-quick recall. In preparation for the match, the machine ingested millions of pages of information. On the TV show, first broadcast in February 2011, the system was able to search that vast storehouse in response to questions, size up its confidence level, and, when sufficiently confident, beat the humans to the buzzer.

Other improvements to Watson have come. People are now able to view the logic and evidence upon which Watson presents options. Watson is now able to digest not just textual information but also structured statistical data, such as electronic medical records. A different group at IBM is working on natural-language-processing technology that will allow people to engage in spoken conversations with Watson. At the highest level, many of the changes are aimed at moving Watson from answering specific questions to dealing with complex and incomplete problem scenarios—the way humans experience things. In fact, as people in particular professions and industries experiment with Watson, they find that the basic question-and-answer capabilities, while useful, are not the most valuable aspects of the systems.


pages: 370 words: 107,983

Rage Inside the Machine: The Prejudice of Algorithms, and How to Stop the Internet Making Bigots of Us All by Robert Elliott Smith

"World Economic Forum" Davos, Ada Lovelace, adjacent possible, affirmative action, AI winter, Alfred Russel Wallace, algorithmic bias, algorithmic management, AlphaGo, Amazon Mechanical Turk, animal electricity, autonomous vehicles, behavioural economics, Black Swan, Brexit referendum, British Empire, Cambridge Analytica, cellular automata, Charles Babbage, citizen journalism, Claude Shannon: information theory, combinatorial explosion, Computing Machinery and Intelligence, corporate personhood, correlation coefficient, crowdsourcing, Daniel Kahneman / Amos Tversky, data science, deep learning, DeepMind, desegregation, discovery of DNA, disinformation, Douglas Hofstadter, Elon Musk, fake news, Fellow of the Royal Society, feminist movement, Filter Bubble, Flash crash, Geoffrey Hinton, Gerolamo Cardano, gig economy, Gödel, Escher, Bach, invention of the wheel, invisible hand, Jacquard loom, Jacques de Vaucanson, John Harrison: Longitude, John von Neumann, Kenneth Arrow, Linda problem, low skilled workers, Mark Zuckerberg, mass immigration, meta-analysis, mutually assured destruction, natural language processing, new economy, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, On the Economy of Machinery and Manufactures, p-value, pattern recognition, Paul Samuelson, performance metric, Pierre-Simon Laplace, post-truth, precariat, profit maximization, profit motive, Silicon Valley, social intelligence, statistical model, Stephen Hawking, stochastic process, Stuart Kauffman, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, The Future of Employment, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, Thomas Bayes, Thomas Malthus, traveling salesman, Turing machine, Turing test, twin studies, Vilfredo Pareto, Von Neumann architecture, warehouse robotics, women in the workforce, Yochai Benkler

., here Milbanke, Anne Isabella, here Mill, John Stuart, here, here, here, here Minsky, Marvin, here, here, here modelling, economic, here Morel, Edmund, here motivated reasoners, here, here, here, here MTurk, here Murray, Charles, here Musk, Elon, here, here MYCIN, here, here, here, here mythemes, here natural language processing. See NLP natural selection, here, here, here, here, here, here, here, here Nautical Almanac, here, here neural networks, here, here, here, here, here, here Newell, Allen, here Newton, Sir Issac, here, here Nietzsche, Friedrich, here NLP (natural language processing), here, here, here, here Normal Distribution, here, here Ovadya, Aviv, here Oxford Martin jobs study, here Page, Larry, here Papert, Seymour, here, here Pareto, Vilfredo, here Pascal, Blaise, here, here Pascaline, here, here, here, here Pasteur, Louis, here PCA (principle component analysis), here, here, here, here, here, here Pearson, Egon, here Pearson, Joel, here Pearson, Karl, here, here, here, here Peirce, Charles Sanders, here, here, here perceptron, here, here, here, here, here Perceptron Learning Algorithm.

Sometimes that means their potentially offensive, potentially dangerous outputs are both unpredictable and irreparable. This is why the award-winning Google AI engineer Ali Rahimi said in 2017 that ‘Machine learning has become alchemy’.6 Machine-learning algorithms are now used in everything from image recognition, to natural language processing, to medical diagnosis, and virtually every other modern AI application. They are the core of big data analysis, and the bedrock of virtually all modern AI, the technology that draws its frames and atoms from big data to overcome the old problems of expert systems design. Yet the implementations of these algorithms, the actual programs doing the classification and generalization, have become so opaque that it is comparable to medieval pre-science.

The systems they create are certainly complex enough to demonstrate emergent behaviours, outcomes of complex networks that cannot be predicted from their individual parts. However, the emergent, spontaneous order these agents help to create may not be the one many people would desire.

9 Defining Terms

Human communication cannot be reduced to information.
Science-fiction author URSULA K. LE GUIN, 2004

Just like image recognition, natural language processing (NLP) is at the vanguard of AI today, and is exploding in its use, employing deep networks, machine-learning algorithms and the explosion of big data on the Internet (in the form of documents, websites, blogs, posts, tweets, etc., estimated to be enough text to fill 10^11 A4 pages, with a large fraction of that text changing daily). NLP researchers and algorithm engineers face the daily challenge of getting computers to process, analyse and ‘comprehend’ some fraction of this text, as well as the challenges of recognizing human speech (audio and video content are also exploding online), and even generating some natural-sounding language in print and audio form.


pages: 255 words: 78,207

Web Scraping With Python: Collecting Data From the Modern Web by Ryan Mitchell

AltaVista, Amazon Web Services, Apollo 13, cloud computing, Computing Machinery and Intelligence, data science, en.wikipedia.org, Firefox, Guido van Rossum, information security, machine readable, meta-analysis, natural language processing, optical character recognition, random walk, self-driving car, Turing test, web application

Although you might not think that text analysis has anything to do with your project, understanding the concepts behind it can be extremely useful for all sorts of machine learning, as well as the more general ability to model real-world problems in probabilistic and algorithmic terms. 1 Although many of the techniques described in this chapter can be applied to all or most languages, it’s okay for now to focus on natural language processing in English only. Tools such as Python’s Natural Language Toolkit, for example, focus on English. Fifty-six percent of the Internet is still in English (with German following at a mere 6%, according to http://w3techs.com/technologies/overview/content_language/all). But who knows? English’s hold on the majority of the Internet will almost certainly change in the future, and further updates may be necessary in the next few years. For instance, the Shazam music service can identify audio as containing a certain song recording, even if that audio contains ambient noise or distortion.

I hope that the coverage here will inspire you to think beyond conventional web scraping, or at least give some initial direction about where to begin when undertaking a project that requires natural language analysis. There are many excellent resources on introductory language processing and Python’s Natural Language Toolkit. In particular, Steven Bird, Ewan Klein, and Edward Loper’s book Natural Language Processing with Python presents both a comprehensive and introductory approach to the topic. In addition, James Pustejovsky and Amber Stubbs’ Natural Language Annotation for Machine Learning provides a slightly more advanced theoretical guide. You’ll need a knowledge of Python to implement the lessons; the topics covered work perfectly with Python’s Natural Language Toolkit.

Crawling Through Forms and Logins. One of the first questions that comes up when you start to move beyond the basics of web scraping is: “How do I access information behind a login screen?”

Hamidi, 227 intellectual property, 217-219 234 internal links crawling an entire site, 35-40 crawling with Scrapy, 45-48 traversing a single domain, 31-35 Internet about, 213-216 cautions downloading files from, 74 crawling across, 40-45 moving forward, 206 IP address blocking, avoiding, 199-200 ISO character sets, 96-98 is_displayed function, 186 Item object, 46, 48 items.py file, 46 | Index lambda expressions, 28, 74 legalities of web scraping, 217-230 lexicographical analysis with NLTK, 132-136 libraries bundling with projects, 7 OCR support, 161-164 logging with Scrapy, 48 logins about, 137 handling, 142-143 troubleshooting, 187 lxml library, 29 M machine learning, 135, 180 machine training, 135, 171-174 Markov text generators, 123-129 media files, storing, 71-74 Mersenne Twister algorithm, 34 methods (HTTP), 51 Microsoft SQL Server, 76 Microsoft Word, 102-105 MIME (Multipurpose Internet Mail Exten‐ sions) protocol, 90 MIMEText object, 90 MySQL about, 76 basic commands, 79-82 database techniques, 85-87 installing, 77-79 integrating with Python, 82-85 Wikipedia example, 87-89 N name attribute, 140 natural language processing about, 119 additional resources, 136 Markov models, 123-129 Natural Language Toolkit, 129-136 summarizing data, 120-123 Natural Language Toolkit (NLTK) about, 129 installation and setup, 129 lexicographical analysis, 132-136 statistical analysis, 130-132 NavigableString object, 18 navigating trees, 18-22 network connections about, 3-5 connecting reliably, 9-11 security considerations, 181 next_siblings() function, 21 ngrams module, 132 n-grams, 109-112, 120 NLTK (Natural Language Toolkit) about, 129 installation and setup, 129 lexicographical analysis, 132-136 statistical analysis, 130-132 NLTK Downloader interface, 130 NLTK module, 129 None object, 10 normalizing data, 112-113 NumPy library, 164 O OAuth authentication, 57 OCR (optical character recognition) about, 161 library support, 162-164 OpenRefine Expression Language (GREL), 116 
OpenRefine tool about, 114 cleaning data, 116-118 filtering data, 115-116 installing, 114 usage considerations, 114 optical character recognition (OCR) about, 161 library support, 162-164 Oracle DBMS, 76 OrderedDict object, 112 os module, 74 P page load times, 154, 182 parentheses (), 25 parents (tags), 20, 22 parsing HTML pages (see HTML parsing) parsing JSON, 63 patents, 217 pay-per-hour computing instances, 205 PDF files, 100-102 PDFMiner3K library, 101 Penn Treebank Project, 133 period (.), 25 Peters, Tim, 211 PhantomJS tool, 152-155, 203 PIL (Python Imaging Library), 162 Pillow library about, 162 processing well-formatted text, 165-169 pipe (|), 25 plus sign (+), 25 POST method (HTTP) about, 51 tracking requests, 140 troubleshooting, 186 variable names and, 138 viewing form parameters, 140 Index | 235 previous_siblings() function, 21 primary keys in tables, 85 programming languages, regular expressions and, 27 projects, bundling with libraries, 7 pseudorandom number generators, 34 PUT method (HTTP), 51 PyMySQL library, 82-85 PySocks module, 202 Python Imaging Library (PIL), 162 Python language, installing, 209-211 Q query time versus database size, 86 quotation marks ("), 17 R random number generators, 34 random seeds, 34 rate limits about, 52 Google APIs, 60 Twitter API, 55 reading documents document encoding, 93 Microsoft Word, 102-105 PDF files, 100 text files, 94-98 recursion limit, 38, 89 redirects, 44, 158 Referrer header, 179 RegexPal website, 24 regular expressions about, 22-27 BeautifulSoup example, 27 commonly used symbols, 25 programming languages and, 27 relational data, 77 remote hosting running from a website hosting account, 203 running from the cloud, 204 remote servers avoiding IP address blocking, 199-200 extensibility and, 200 portability and, 200 PySocks and, 202 Tor and, 201-202 Requests library 236 | Index about, 137 auth module, 144 installing, 138, 179 submitting forms, 138 tracking cookies, 142-143 requests module, 179-181 responses, 
API calls and, 52 Robots Exclusion Standard, 223 robots.txt file, 138, 167, 222-225, 229 S safe harbor protection, 219, 230 Scrapy library, 45-48 screenshots, 197 script tag, 147 search engine optimization (SEO), 222 searching text data, 135 security considerations copyright law and, 219 forms and, 183-186 handling cookies, 181 SELECT statement, 79, 81 Selenium library about, 143 elements and, 153, 194 executing JavaScript, 152-156 handling redirects, 158 security considerations, 185 testing example, 193-198 Tor support, 203 semicolon (;), 210 SEO (search engine optimization), 222 server-side processing handling redirects, 44, 158 scripting languages and, 147 sets, 67 siblings (tags), 21 Simple Mail Transfer Protocol (SMTP), 90 site maps, 36 Six Degrees of Wikipedia, 31-35 SMTP (Simple Mail Transfer Protocol), 90 smtplib package, 90 sorted function, 112 span tag, 15 Spitler, Daniel, 227 SQL Server (Microsoft), 76 square brackets [], 25 src attribute, 28, 72, 74 StaleElementReferenceException, 158 statistical analysis with NLTK, 130-132 storing data (see data management) StringIO object, 99 strings, regular expressions and, 22-28 stylesheets about, 14, 216 dynamic HTML and, 151 hidden fields and, 184 Surface Web, 36 trademarks, 218 traversing the Web (see web crawlers) tree navigation, 18-22 trespass to chattels, 219-220, 226 trigrams module, 132 try...finally statement, 85 Twitov app, 123 Twitter API, 55-59 T underscore (_), 17 undirected graph problems, 127 Unicode standard, 83, 95-98, 110 unit tests, 190, 197 United States v.


pages: 402 words: 110,972

Nerds on Wall Street: Math, Machines and Wired Markets by David J. Leinweber

"World Economic Forum" Davos, AI winter, Alan Greenspan, algorithmic trading, AOL-Time Warner, Apollo 11, asset allocation, banking crisis, barriers to entry, Bear Stearns, Big bang: deregulation of the City of London, Bob Litterman, book value, business cycle, butter production in bangladesh, butterfly effect, buttonwood tree, buy and hold, buy low sell high, capital asset pricing model, Charles Babbage, citizen journalism, collateralized debt obligation, Cornelius Vanderbilt, corporate governance, Craig Reynolds: boids flock, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Danny Hillis, demand response, disintermediation, distributed generation, diversification, diversified portfolio, electricity market, Emanuel Derman, en.wikipedia.org, experimental economics, fake news, financial engineering, financial innovation, fixed income, Ford Model T, Gordon Gekko, Hans Moravec, Herman Kahn, implied volatility, index arbitrage, index fund, information retrieval, intangible asset, Internet Archive, Ivan Sutherland, Jim Simons, John Bogle, John Nash: game theory, Kenneth Arrow, load shedding, Long Term Capital Management, machine readable, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." 
to Russian and back, market fragmentation, market microstructure, Mars Rover, Metcalfe’s law, military-industrial complex, moral hazard, mutually assured destruction, Myron Scholes, natural language processing, negative equity, Network effects, optical character recognition, paper trading, passive investing, pez dispenser, phenotype, prediction markets, proprietary trading, quantitative hedge fund, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Reminiscences of a Stock Operator, Renaissance Technologies, risk free rate, risk tolerance, risk-adjusted returns, risk/return, Robert Metcalfe, Ronald Reagan, Rubik’s Cube, Savings and loan crisis, semantic web, Sharpe ratio, short selling, short squeeze, Silicon Valley, Small Order Execution System, smart grid, smart meter, social web, South Sea Bubble, statistical arbitrage, statistical model, Steve Jobs, Steven Levy, stock buybacks, Tacoma Narrows Bridge, the scientific method, The Wisdom of Crowds, time value of money, tontine, too big to fail, transaction costs, Turing machine, two and twenty, Upton Sinclair, value at risk, value engineering, Vernor Vinge, Wayback Machine, yield curve, Yogi Berra, your tax dollars at work

We are much more adept at using structured and quantitative information on the Internet than textual and qualitative information. We are just starting to learn how to effectively use this kind of information. This area is driven by new Internet technologies such as XML (extensible markup language) and RSS (an XML dialect) and by advances in natural language processing. The new kid on the block, expected to take these ideas to new levels, is the Resource Description Framework (RDF), promoted by Web inventor Berners-Lee. RDF does for relationships between tagged data elements what the XML tagging itself did for moving from format HTML tags like “Bold” to meaningful XML tags like “Price.” Hits and Misses: Rational and Irrational Technology Exuberance Peter Bernstein’s book Capital Ideas (Free Press, 1993) tells the story of Bill Sharpe, who wandered Wall Street looking for enough computer time to run a simple capital asset pricing model (CAPM) portfolio optimization, while being regarded as something of a crackpot for doing so.

on genetically adaptive strategies and well funded, but vanished, and few of the principals are still keen on genetic algorithms. After sending the GA to the back of the breakthrough line in the previous chapter, in Chapter 9 we get to “The Text Frontier,” using IA, natural language processing, and Web technologies to extract and make sense of qualitative written information from news and a variety of disintermediated sources. In Chapter 6, “Stupid Data Miner Tricks,” we saw how you could fool yourself with data. When you collect data that people have put on the Web, they can try to fool you as well.

Grabbing more and more data, and doing more and more searches, will quickly overwhelm us, leading to advanced cases of carpal tunnel syndrome, and a shelf full of unread books with “Information Explosion” somewhere in the title. Collectively, the new alphabet soup of technologies—AI, IA, NLP, and IR (artificial intelligence, intelligence amplification, natural language processing, and information retrieval, for those with a bigger soup bowl)—provides a means to make sense of patterns in the data collected in enterprise and global search. These means are molecular search, the use of persistent software agents so you don’t have to keep doing the same thing all the time; the semantic Web, using the information associated with data at the point of origin so there is less guessing about the meaning of what you find; and modern user interfaces and visualizations, so you can prioritize what you find, and focus on the important and the valuable in a timely way.


Four Battlegrounds by Paul Scharre

2021 United States Capitol attack, 3D printing, active measures, activist lawyer, AI winter, AlphaGo, amateurs talk tactics, professionals talk logistics, artificial general intelligence, ASML, augmented reality, Automated Insights, autonomous vehicles, barriers to entry, Berlin Wall, Big Tech, bitcoin, Black Lives Matter, Boeing 737 MAX, Boris Johnson, Brexit referendum, business continuity plan, business process, carbon footprint, chief data officer, Citizen Lab, clean water, cloud computing, commoditize, computer vision, coronavirus, COVID-19, crisis actor, crowdsourcing, DALL-E, data is not the new oil, data is the new oil, data science, deep learning, deepfake, DeepMind, Demis Hassabis, Deng Xiaoping, digital map, digital rights, disinformation, Donald Trump, drone strike, dual-use technology, Elon Musk, en.wikipedia.org, endowment effect, fake news, Francis Fukuyama: the end of history, future of journalism, future of work, game design, general purpose technology, Geoffrey Hinton, geopolitical risk, George Floyd, global supply chain, GPT-3, Great Leap Forward, hive mind, hustle culture, ImageNet competition, immigration reform, income per capita, interchangeable parts, Internet Archive, Internet of things, iterative process, Jeff Bezos, job automation, Kevin Kelly, Kevin Roose, large language model, lockdown, Mark Zuckerberg, military-industrial complex, move fast and break things, Nate Silver, natural language processing, new economy, Nick Bostrom, one-China policy, Open Library, OpenAI, PalmPilot, Parler "social media", pattern recognition, phenotype, post-truth, purchasing power parity, QAnon, QR code, race to the bottom, RAND corporation, recommendation engine, reshoring, ride hailing / ride sharing, robotic process automation, Rodney Brooks, Rubik’s Cube, self-driving car, Shoshana Zuboff, side project, Silicon Valley, slashdot, smart cities, smart meter, Snapchat, social software, sorting algorithm, South China Sea, sparse data, speech recognition, 
Steve Bannon, Steven Levy, Stuxnet, supply-chain attack, surveillance capitalism, systems thinking, tech worker, techlash, telemarketer, The Brussels Effect, The Signal and the Noise by Nate Silver, TikTok, trade route, TSMC

The recent explosion of interest (and money) in AI has been closely followed by the development of specialized hardware better suited for deep learning. For example, by 2022 an estimated three-fourths of all smartphones shipped—1.25 billion devices—will have an AI-specialized processor on board. These chips will improve the devices’ ability to perform facial recognition, image identification, natural language processing, and other AI tasks onboard the device. AI algorithms and software tools are widely available, with programming frameworks like TensorFlow, PyTorch, and Keras free online, yet the hardware ecosystem is concentrated among a small number of actors. A limited number of companies—and an even smaller number of countries—wield outsize influence over key chokepoints in global chip supply chains.

Their goal was to “enable machines to listen & speak, understand & think.” iFLYTEK had a voice-activated smart home system, like Amazon’s Alexa. Their translation services can translate between Chinese and fifty-eight different languages and can handle multiple Chinese dialects. In addition to voice-based AI, the firm was working on natural language processing. They were building a machine to pass the college entrance exam for mathematics, training it on thousands of exam papers. In education, they were building AI applications to automatically grade papers, personalize education resources, and customize study plans. They cheerily explained their AI would increase teaching efficiency.

Other AI researchers were incensed, not that OpenAI had developed and partially released a potentially dangerous tool, but that they had held back from releasing the full version online. Anima Anandkumar, director of machine learning at the chip company NVIDIA and a professor at Caltech, accused OpenAI of “fear-mongering” and “severely playing up the risks” of the model. Others agreed. Delip Rao, an expert on machine learning and natural language processing, told The Verge, “The words ‘too dangerous’ were casually thrown out here without a lot of thought or experimentation.” Others accused OpenAI of courting hype by giving prerelease copies to tech journalists. AI researchers woke up one Thursday morning to a spate of news headlines about a technical breakthrough for which they hadn’t yet seen the academic paper (it posted the same day).


pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier

23andMe, Affordable Care Act / Obamacare, airport security, Apollo 11, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, book value, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, data science, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, hype cycle, IBM and the Holocaust, index card, informal economy, intangible asset, Internet of things, invention of the printing press, Jeff Bezos, Joi Ito, lifelogging, Louis Pasteur, machine readable, machine translation, Marc Benioff, Mark Zuckerberg, Max Levchin, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, paypal mafia, performance metric, Peter Thiel, Plato's cave, post-materialism, random walk, recommendation engine, Salesforce, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, sparse data, speech recognition, Steve Jobs, Steven Levy, systematic bias, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Thomas Davenport, Turing test, vertical integration, Watson beat the top human players on Jeopardy!

In fact, endgames when six or fewer pieces are left on the chessboard have been completely analyzed and all possible moves (N=all) have been represented in a massive table that when uncompressed fills more than a terabyte of data. This enables chess computers to play the endgame flawlessly. No human will ever be able to outplay the system. The degree to which more data trumps better algorithms has been powerfully demonstrated in the area of natural language processing: the way computers learn how to parse words as we use them in everyday speech. Around 2000, Microsoft researchers Michele Banko and Eric Brill were looking for a method to improve the grammar checker that is part of the company’s Word program. They weren’t sure whether it would be more useful to put their effort into improving existing algorithms, finding new techniques, or adding more sophisticated features.

This was the “training set” by which the system could calculate the probability that, for example, one word in English follows another. It was a far cry from the grandfather in the field, the famous Brown Corpus of the 1960s, which totaled one million English words. Using the larger dataset enabled great strides in natural-language processing, upon which systems for tasks like voice recognition and computer translation are based. “Simple models and a lot of data trump more elaborate models based on less data,” wrote Google’s artificial-intelligence guru Peter Norvig and colleagues in a paper entitled “The Unreasonable Effectiveness of Data.”
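The core computation described above — estimating the probability that one English word follows another from a training corpus — can be sketched in a few lines of Python. The toy corpus below is invented for illustration; it stands in for the large datasets (and is not drawn from the Brown Corpus or any real training set) discussed in the excerpt:

```python
from collections import Counter, defaultdict

def bigram_probabilities(corpus):
    """Estimate P(next_word | word) from a whitespace-tokenized corpus."""
    words = corpus.lower().split()
    counts = defaultdict(Counter)
    # Count each adjacent word pair (bigram) in the corpus.
    for w1, w2 in zip(words, words[1:]):
        counts[w1][w2] += 1
    # Normalize counts into conditional probabilities.
    return {
        w1: {w2: n / sum(nexts.values()) for w2, n in nexts.items()}
        for w1, nexts in counts.items()
    }

corpus = "the cat sat on the mat the cat ran"
probs = bigram_probabilities(corpus)
print(probs["the"])  # P('cat'|'the') = 2/3, P('mat'|'the') = 1/3
```

With a corpus of millions (or, as in the excerpt, up to a billion) words instead of nine, the same relative-frequency estimates become the statistical backbone of spelling correction, speech recognition, and translation systems.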

See imprecision MetaCrawler, [>] metadata: in datafication, [>]–[>] metric system, [>] Microsoft, [>], [>], [>] Amalga software, [>]–[>], [>] and data-valuation, [>] and language translation, [>] Word spell-checking system, [>]–[>] Minority Report [film], [>]–[>], [>] Moneyball [film], [>], [>]–[>], [>], [>] Moneyball (Lewis), [>] Moore’s Law, [>] Mydex, [>] nanotechnology: and qualitative changes, [>] Nash, Bruce, [>] nations: big data and competitive advantage among, [>]–[>] natural language processing, [>] navigation, marine: correlation analysis in, [>]–[>] Maury revolutionizes, [>]–[>], [>], [>], [>], [>], [>], [>], [>], [>], [>] Negroponte, Nicholas: Being Digital, [>] Netbot, [>] Netflix, [>] collaborative filtering at, [>] data-reuse by, [>] releases personal data, [>] Netherlands: comprehensive civil records in, [>]–[>] network analysis, [>] network theory, [>] big data in, [>]–[>] New York City: exploding manhole covers in, [>]–[>], [>]–[>], [>], [>] government data-reuse in, [>]–[>] New York Times, [>]–[>] Next Jump, [>] Neyman, Jerzy: on statistical sampling, [>] Ng, Andrew, [>] 1984 (Orwell), [>], [>] Norvig, Peter, [>] “The Unreasonable Effectiveness of Data,” [>] Nuance: fails to understand data-reuse, [>]–[>] numerical systems: history of, [>]–[>] Oakland Athletics, [>]–[>] Obama, Barack: on open data, [>] Och, Franz Josef, [>] Ohm, Paul: on privacy, [>] oil refining: big data in, [>] ombudsmen, [>] Omidyar, Pierre, [>] open data.


When Computers Can Think: The Artificial Intelligence Singularity by Anthony Berglas, William Black, Samantha Thalind, Max Scratchmann, Michelle Estes

3D printing, Abraham Maslow, AI winter, air gap, anthropic principle, artificial general intelligence, Asilomar, augmented reality, Automated Insights, autonomous vehicles, availability heuristic, backpropagation, blue-collar work, Boston Dynamics, brain emulation, call centre, cognitive bias, combinatorial explosion, computer vision, Computing Machinery and Intelligence, create, read, update, delete, cuban missile crisis, David Attenborough, DeepMind, disinformation, driverless car, Elon Musk, en.wikipedia.org, epigenetics, Ernest Rutherford, factory automation, feminist movement, finite state, Flynn Effect, friendly AI, general-purpose programming language, Google Glasses, Google X / Alphabet X, Gödel, Escher, Bach, Hans Moravec, industrial robot, Isaac Newton, job automation, John von Neumann, Law of Accelerating Returns, license plate recognition, Mahatma Gandhi, mandelbrot fractal, natural language processing, Nick Bostrom, Parkinson's law, patent troll, patient HM, pattern recognition, phenotype, ransomware, Ray Kurzweil, Recombinant DNA, self-driving car, semantic web, Silicon Valley, Singularitarianism, Skype, sorting algorithm, speech recognition, statistical model, stem cell, Stephen Hawking, Stuxnet, superintelligent machines, technological singularity, Thomas Malthus, Turing machine, Turing test, uranium enrichment, Von Neumann architecture, Watson beat the top human players on Jeopardy!, wikimedia commons, zero day

These plans could then be quickly adjusted if things did not turn out as expected or if faults were discovered. This capacity becomes important for missions to the outer planets where communication delays are significant. By 2001, speech understanding had also improved to the point of being practical. People could and sometimes did talk to computers on a regular basis. Natural language processing was also quite capable of understanding requests such as “How many Klingons are there in sector five?” or “Open the pod bay doors”. The Remote Agent did not process speech or natural language largely because there was no one to talk to on the spacecraft. Human astronauts have been obsolete technology since the mid 1970s.

This author has published papers showing how semantic networks and description logics can be used to structure complex expert system rule bases. Ontologies and databases Ontologies provide a hierarchical framework for the terms used in an information system. One simple ontology is Wordnet, which is widely used to assist in natural language processing. It contains the definitions of some 150,000 words, or more specifically, synsets, which are collections of words with the same meaning. Thus “engine” the machine is in a different synset from “engine” to cause (e.g. “the engine of change”). For each synset Wordnet contains a list of hyponyms, or subtypes, so for “engine” that includes “aircraft engine” and “generator”.
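The synset-and-hyponym structure described above can be sketched as a plain Python dictionary. The entries and synset IDs below are made up for illustration and are not real WordNet data; a real lookup would go through an interface such as NLTK's WordNet corpus reader:

```python
# Toy WordNet-style ontology (hypothetical entries, not real WordNet data).
# Each synset groups words sharing one meaning and lists its hyponyms
# (subtypes), giving the hierarchical framework the text describes.
SYNSETS = {
    "engine.n.01": {"words": ["engine"],            # the machine
                    "hyponyms": ["aircraft_engine.n.01", "generator.n.01"]},
    "engine.v.01": {"words": ["engine"],            # to cause ("engine of change")
                    "hyponyms": []},
    "aircraft_engine.n.01": {"words": ["aircraft engine"], "hyponyms": []},
    "generator.n.01": {"words": ["generator"], "hyponyms": []},
}

def senses(word):
    """All synsets containing the word -- one per distinct meaning."""
    return [sid for sid, s in SYNSETS.items() if word in s["words"]]

def subtypes(synset_id):
    """The hyponyms (subtypes) recorded for a synset."""
    return SYNSETS[synset_id]["hyponyms"]

print(senses("engine"))         # two senses: the machine and the verb
print(subtypes("engine.n.01"))  # subtypes of the machine sense
```

The key design point, mirrored from the excerpt, is that the hierarchy hangs off synsets rather than surface words, which is what lets the two senses of “engine” carry different subtype lists.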

This is all an instant, intuitive process for a human Jeopardy! player, but I felt convinced that under the hood my brain was doing more or less the same thing. IBM is targeting Watson for use in some types of medical applications. However, Watson is a completely different type of system than an expert system such as MYCIN. It may be just the natural language processing that is being utilized; otherwise, it would be concerning if treatment options were being decided by a trivia engine. Alternatively, IBM may be exploiting the general lack of understanding about artificial intelligence to use the word “Watson” to refer to any vaguely intelligent application that it is building.


pages: 713 words: 93,944

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Eric Redmond, Jim Wilson, Jim R. Wilson

AGPL, Amazon Web Services, business logic, create, read, update, delete, data is the new oil, database schema, Debian, domain-specific language, en.wikipedia.org, fault tolerance, full text search, general-purpose programming language, Kickstarter, Large Hadron Collider, linked data, MVC pattern, natural language processing, node package manager, random walk, recommendation engine, Ruby on Rails, seminal paper, Skype, social graph, sparse data, web application

SELECT *
FROM movies
WHERE title % 'Avatre';

  title
---------
 Avatar

Trigrams are an excellent choice for accepting user input, without weighing them down with wildcard complexity. Full-Text Fun Next, we want to allow users to perform full-text searches based on matching words, even if they’re pluralized. If a user wants to search for certain words in a movie title but can remember only some of them, Postgres supports simple natural-language processing. TSVector and TSQuery Let’s look for a movie that contains the words night and day. This is a perfect job for text search using the @@ full-text query operator.

SELECT title
FROM movies
WHERE title @@ 'night & day';

             title
-------------------------------
 A Hard Day’s Night
 Six Days Seven Nights
 Long Day’s Journey Into Night

The query returns titles like A Hard Day’s Night, despite the word Day being in possessive form, and the two words are out of order in the query.

Compare these two vectors:

SELECT to_tsvector('english', 'A Hard Day''s Night');

        to_tsvector
----------------------------
 'day':3 'hard':2 'night':5

SELECT to_tsvector('simple', 'A Hard Day''s Night');

              to_tsvector
----------------------------------------
 'a':1 'day':3 'hard':2 'night':5 's':4

With simple, you can retrieve any movie containing the lexeme a. Other Languages Since Postgres is doing some natural-language processing here, it only makes sense that different configurations would be used for different languages. All of the installed configurations can be viewed with this command:

book=# \dF

Dictionaries are part of what Postgres uses to generate tsvector lexemes (along with stop words and other tokenizing rules we haven’t covered, called parsers and templates).
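The trigram matching behind the % operator shown earlier can be approximated in pure Python. This is a rough sketch of pg_trgm-style behavior (lowercasing plus space padding, similarity as shared trigrams over total distinct trigrams), not Postgres's exact implementation:

```python
def trigrams(s):
    """Character trigrams, roughly as pg_trgm extracts them:
    lowercase the string and pad it with spaces before slicing."""
    s = "  " + s.lower() + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def similarity(a, b):
    """Jaccard-style similarity of the two trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# The misspelling from the book's example still scores well:
print(round(similarity("Avatar", "Avatre"), 2))  # 0.4
```

Because a single typo disturbs only the few trigrams that overlap it, near-misses like 'Avatre' keep a substantial share of trigrams in common with 'Avatar', which is what makes the operator forgiving of user input.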

This also explains why HBase is often employed at big companies to back logging and search systems. 4.1 Introducing HBase HBase is a column-oriented database that prides itself on consistency and scaling out. It is based on BigTable, a high-performance, proprietary database developed by Google and described in the 2006 white paper “Bigtable: A Distributed Storage System for Structured Data.”[26] Initially created for natural-language processing, HBase started life as a contrib package for Apache Hadoop. Since then, it has become a top-level Apache project. On the architecture front, HBase is designed to be fault tolerant. Hardware failures may be uncommon for individual machines, but in a large cluster, node failure is the norm.


The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin

Bayesian statistics, business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, data science, discrete time, disruptive innovation, George Gilder, Google Earth, hype cycle, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, Large Hadron Collider, late capitalism, lifelogging, linked data, longitudinal study, machine readable, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, SimCity, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, technological solutionism, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

It is premised on the notion that all massive datasets hold meaningful information that is non-random, valid, novel, useful and ultimately understandable (Han et al. 2011). As such, it uses supervised and unsupervised machine learning to detect, classify and segment meaningful relationships, associations and trends between variables. It does this using a series of different techniques including natural language processing, neural networks, decision trees, and statistical (non-parametric and parametric) methods. The selection of method varies between the type of data (structured, unstructured or semistructured) and the purpose of the analysis (see Table 6.1). Source: Miller and Han (2009: 7). Most of the techniques listed in Table 6.1 relate to structured data as found in relational databases.

In detecting associations, a variety of regression models might be used to compute correlations between variables and thus reveal hidden patterns that can then be leveraged into commercial gain (for example, identifying what goods are bought with each other and reorganising a store to promote purchasing) (see Chapter 7). Unstructured data in the form of language, images and sounds raise particular data mining challenges. Natural language-processing techniques seek to analyse human language as expressed through the written and spoken word. They use semantics and taxonomies to recognise patterns and extract information from documents. Examples would include entity extraction that automatically extracts metadata from text by searching for particular types of text and phrasing, such as person names, locations, dates, specialised terms and product terminology, and entity relation extraction that automatically identifies the relationships between semantic entities, linking them together (e.g., person name to birth date or location, or an opinion to an item) (McCreary 2009).
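Rule-based entity extraction of the kind described above can be illustrated with simple regular expressions standing in for the semantic taxonomies a production system would use. The patterns and sample text below are invented for the example:

```python
import re

# Toy extraction rules (hypothetical, for illustration only): each label
# maps to a pattern for one entity type mentioned in the text, such as
# person names and dates.
PATTERNS = {
    "DATE": r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
            r"August|September|October|November|December) \d{4}\b",
    "PERSON": r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+\b",
    "MONEY": r"\$\d+(?:,\d{3})*(?:\.\d{2})?",
}

def extract_entities(text):
    """Return (label, matched span) pairs for every pattern hit."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text):
            found.append((label, m.group()))
    return found

text = "Dr. Smith paid $1,200.00 on 3 March 2013."
print(extract_entities(text))
```

Real entity extraction systems replace these hand-written patterns with trained statistical models and large gazetteers, but the output shape — labeled spans of text serving as metadata — is the same idea the excerpt describes.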

Index A/B testing 112 abduction 133, 137, 138–139, 148 accountability 34, 44, 49, 55, 63, 66, 113, 116, 165, 171, 180 address e-mail 42 IP 8, 167, 171 place 8, 32, 42, 45, 52, 93, 171 Web 105 administration 17, 30, 34, 40, 42, 56, 64, 67, 87, 89, 114–115, 116, 124, 174, 180, 182 aggregation 8, 14, 101, 140, 169, 171 algorithm 5, 9, 21, 45, 76, 77, 83, 85, 89, 101, 102, 103, 106, 109, 111, 112, 118, 119, 122, 125, 127, 130, 131, 134, 136, 142, 146, 154, 160, 172, 177, 179, 181, 187 Amazon 72, 96, 131, 134 Anderson, C. 130, 135 Andrejevic, M. 133, 167, 178 animation 106, 107 anonymity 57, 63, 79, 90, 92, 116, 167, 170, 171, 172, 178 apophenia 158, 159 Application Programming Interfaces (APIs) 57, 95, 152, 154 apps 34, 59, 62, 64, 65, 78, 86, 89, 90, 95, 97, 125, 151, 170, 174, 177 archive 21, 22, 24, 25, 29–41, 48, 68, 95, 151, 153, 185 archiving 23, 29–31, 64, 65, 141 artificial intelligence 101, 103 Acxiom 43, 44 astronomy 34, 41, 72, 97 ATM 92, 116 audio 74, 77, 83 automatic meter reading (AMR) 89 automatic number plate recognition (ANPR) 85, 89 automation 32, 51, 83, 85, 87, 89–90, 98, 99, 102, 103, 118, 127, 136, 141, 146, 180 Ayasdi 132, 134 backup 29, 31, 40, 64, 163 barcode 74, 85, 92, Bates, J. 56, 61, 62, 182 Batty, M. 90, 111, 112, 140 Berry, D. 134, 141 bias 13, 14, 19, 28, 45, 101, 134–136, 153, 154, 155, 160 Big Brother 126, 180 big data xv, xvi, xvii, 2, 6, 13, 16, 20, 21, 27–29, 42, 46, 67–183, 186, 187, 188, 190, 191, 192 analysis 100–112 characteristics 27–29, 67–79 enablers 80–87 epistemology 128–148 ethical issues 165–183 etymology 67 organisational issues 160–163 rationale 113–127 sources 87–99 technical issues 149–160 biological sciences 128–129, 137 biometric data 8, 84, 115 DNA 8, 71, 84 face 85, 88, 105 fingerprints 8, 9, 84, 87, 88, 115 gait 85, 88 iris 8, 84, 88 bit-rot 20 blog 6, 95, 170 Bonferroni principle 159 born digital 32, 46, 141 Bowker, G. 2, 19, 20, 22, 24 Borgman, C. 2, 7, 10, 20, 30, 37, 40, 41 boyd, D. 
68, 75, 151, 152, 156, 158, 160, 182 Brooks, D. 130, 145 business 1, 16, 42, 45, 56, 61, 62, 67, 79, 110, 113–127, 130, 137, 149, 152, 161, 166, 172, 173, 187 calculative practices 115–116 Campbell’s Law 63, 127 camera 6, 81, 83, 87, 88, 89, 90, 107, 116, 124, 167, 178, 180 capitalism 15, 16, 21, 59, 61, 62, 86, 95, 114, 119–123, 126, 136, 161, 184, 186 capta 2 categorization 6, 8, 12, 19, 20, 102, 106, 176 causation 130, 132, 135, 147 CCTV 87, 88, 180 census 17, 18, 19, 22, 24, 27, 30, 43, 54, 68, 74, 75, 76, 77, 87, 102, 115, 157, 176 Centro De Operações Prefeitura Do Rio 124–125, 182 CERN 72, 82 citizen science 97–99, 155 citizens xvi, 45, 57, 58, 61, 63, 71, 88, 114, 115, 116, 126, 127, 165, 166, 167, 174, 176, 179, 187 citizenship 55, 115, 170, 174 classification 6, 10, 11, 23, 28, 104, 105, 157, 176 clickstream 43, 92, 94, 120, 122, 154, 176 clustering 103, 104, 105, 106, 110, 122 Codd, E. 31 competitiveness xvi, 16, 114, computation 2, 4, 5, 6, 29, 32, 68, 80, 81–82, 83, 84, 86, 98, 100, 101, 102, 110, 129, 136, 139–147, 181 computational social science xiv, 139–147, 152, 186 computing cloud xv, 81, 86 distributed xv, 37, 78, 81, 83, 98 mobile xv, 44, 78, 80, 81, 83, 85, 139 pervasive 81, 83–84, 98, 124 ubiquitous 80, 81, 83–84, 98, 100, 124, 126 confidence level 14, 37, 133, 153, 160 confidentiality 8, 169, 175 control creep 126, 166, 178–179 cookies 92, 119, 171 copyright 16, 30, 40, 49, 51, 54, 96 correlation 105, 110, 130, 131, 132, 135, 145, 147, 157, 159 cost xv, 6, 11, 16, 27, 31, 32, 37, 38, 39, 40, 44, 52, 54, 57, 58, 59, 61, 66, 80, 81, 83, 85, 93, 96, 100, 116, 117, 118, 120, 127, 150 Crawford, K. 68, 75, 135, 151, 152, 155, 156, 158, 160, 182 credit cards 8, 13, 42, 44, 45, 85, 92, 167, 171, 176 risk 42, 63, 75, 120, 176, 177 crime 55, 115, 116, 123, 175, 179 crowdsourcing 37, 73, 93, 96–97, 155, 160 Cukier, K. 
68, 71, 72, 91, 114, 128, 153, 154, 161, 174 customer relationship management (CRM) 42, 99, 117–118, 120, 122, 176 cyber-infrastructure 33, 34, 35, 41, 186 dashboard 106, 107, 108 data accuracy 12, 14, 110, 153, 154, 171 administrative 84–85, 89, 115, 116, 125, 150, 178 aggregators see data brokers amplification 8, 76, 99, 102, 167 analogue 1, 3, 32, 83, 88, 140, 141 analytics 42, 43, 63, 73, 80, 100–112, 116, 118, 119, 120, 124, 125, 129, 132, 134, 137, 139, 140, 145, 146, 149, 151, 159, 160, 161, 176, 179, 186, 191 archive see archive assemblage xvi, xvii, 2, 17, 22, 24–26, 66, 80, 83, 99, 117, 135, 139, 183, 184–192 attribute 4, 8–9, 31, 115, 150 auditing 33, 40, 64, 163 authenticity 12, 153 automated see automation bias see bias big see big data binary 1, 4, 32, 69 biometric see biometric data body 177–178, 187 boosterism xvi, 67, 127, 187, 192 brokers 42–45, 46, 57, 74, 75, 167, 183, 186, 187, 188, 191 calibration 13, 20 catalogue 32, 33, 35 clean 12, 40, 64, 86, 100, 101, 102, 152, 153, 154, 156 clearing house 33 commodity xvi, 4, 10, 12, 15, 16, 41, 42–45, 56, 161 commons 16, 42 consolidators see data brokers cooked 20, 21 corruption 19, 30 curation 9, 29, 30, 34, 36, 57, 141 definition 1, 2–4 deluge xv, 28, 73, 79, 100, 112, 130, 147, 149–151, 157, 168, 175 derived 1, 2, 3, 6–7, 8, 31, 32, 37, 42, 43, 44, 45, 62, 86, 178 deserts xvi, 28, 80, 147, 149–151, 161 determinism 45, 135 digital 1, 15, 31, 32, 67, 69, 71, 77, 82, 85, 86, 90, 137 directories 33, 35 dirty 29, 154, 163 dive 64–65, 188 documentation 20, 30, 31, 40, 64, 163 dredging 135, 147, 158, 159 dump 64, 150, 163 dynamic see dynamic data enrichment 102 error 13, 14, 44, 45, 101, 110, 153, 154, 156, 169, 175, 180 etymology 2–3, 67 exhaust 6–7, 29, 80, 90 fidelity 34, 40, 55, 79, 152–156 fishing see data dredging formats xvi, 3, 5, 6, 9, 22, 25, 30, 33, 34, 40, 51, 52, 54, 65, 77, 102, 153, 156, 157, 174 framing 12–26, 133–136, 185–188 gamed 154 holding 33, 35, 64 infrastructure xv, xvi, xvii, 2, 
21–24, 25, 27–47, 52, 64, 102, 112, 113, 128, 129, 136, 140, 143, 147, 148, 149, 150, 156, 160, 161, 162, 163, 166, 184, 185, 186, 188, 189, 190, 191, 192 integration 42, 149, 156–157 integrity 12, 30, 33, 34, 37, 40, 51, 154, 157, 171 interaction 43, 72, 75, 85, 92–93, 94, 111, 167 interoperability 9, 23, 24, 34, 40, 52, 64, 66, 156–157, 163, 184 interval 5, 110 licensing see licensing lineage 9, 152–156 linked see linked data lost 5, 30, 31, 39, 56, 150 markets xvi, 8, 15, 25, 42-45, 56, 59, 75, 167, 178 materiality see materiality meta see metadata mining 5, 77, 101, 103, 104–106, 109, 110, 112, 129, 132, 138, 159, 188 minimisation 45, 171, 178, 180 nominal 5, 110 ordinal 5, 110 open see open data ontology 12, 28, 54, 150 operational 3 ownership 16, 40, 96, 156, 166 preparation 40, 41, 54, 101–102 philosophy of 1, 2, 14, 17–21, 22, 25, 128–148, 185–188 policy 14, 23, 30, 33, 34, 37, 40, 48, 64, 160, 163, 170, 172, 173, 178 portals 24, 33, 34, 35 primary 3, 7–8, 9, 50, 90 preservation 30, 31, 34, 36, 39, 40, 64, 163 protection 15, 16, 17, 20, 23, 28, 40, 45, 62, 63, 64, 167, 168–174, 175, 178, 188 protocols 23, 25, 30, 34, 37 provenance 9, 30, 40, 79, 153, 156, 179 qualitative 4–5, 6, 14, 146, 191 quantitative 4–5, 14, 109, 127, 136, 144, 145, 191 quality 12, 13, 14, 34, 37, 40, 45, 52, 55, 57, 58, 64, 79, 102, 149, 151, 152–156, 157, 158 raw 1, 2, 6, 9, 20, 86, 185 ratio 5, 110 real-time 65, 68, 71, 73, 76, 88, 89, 91, 99, 102, 106, 107, 116, 118, 121, 124, 125, 139, 151, 181 reduction 5, 101–102 representative 4, 8, 13, 19, 21, 28 relational 3, 8, 28, 44, 68, 74–76, 79, 84, 85, 87, 88, 99, 100, 119, 140, 156, 166, 167, 184 reliability 12, 13–14, 52, 135, 155 resellers see data brokers resolution 7, 26, 27, 28, 68, 72, 73–74, 79, 84, 85, 89, 92, 133–134, 139, 140, 150, 180 reuse 7, 27, 29, 30, 31, 32, 39, 40, 41, 42, 46, 48, 49–50, 52, 56, 59, 61, 64, 102, 113, 163 scaled xvi, xvii 32, 100, 101, 112, 138, 149, 150, 163, 186 scarcity xv, xvi, 28, 80, 149–151, 161 
science xvi, 100–112, 130, 137–139, 148, 151, 158, 160–163, 164, 191 secondary 3, 7–8 security see security selection 101, 176 semi-structured 4, 5–6, 77, 100, 105 sensitive 15, 16, 45, 63, 64, 137, 151, 167, 168, 171, 173, 174 shadow 166–168, 177, 179, 180 sharing 9, 11, 20, 21, 23, 24, 27, 29–41, 48–66, 80, 82, 95, 113, 141, 151, 174, 186 small see small data social construction 19–24 spatial 17, 52, 63, 68, 73, 75, 84–85, 88–89 standards xvi, 9, 14, 19, 22, 23, 24, 25, 31, 33, 34, 38, 40, 52, 53, 64, 102, 153, 156, 157 storage see storage stranded 156 structures 4, 5–6, 12, 21, 23, 30, 31, 40, 51, 68, 77, 86, 103, 106, 156 structured 4, 5–6, 11, 32, 52, 68, 71, 75, 77, 79, 86, 88, 105, 112, 163 tertiary 7–8, 9, 27, 74 time-series 68, 102, 106, 110 transient 6–7, 72, 150 transactional 42, 43, 71, 72, 74, 75, 85, 92, 93–94, 120, 122, 131, 167, 175, 176, 177 uncertainty see uncertainty unstructured 4, 5–6, 32, 52, 68, 71, 75, 77, 86, 100, 105, 112, 140, 153, 157 validity 12, 40, 72, 102, 135, 138, 154, 156, 158 variety 26, 28, 43, 44, 46, 68, 77, 79, 86, 139, 140, 166, 184 velocity 26, 28, 29, 68, 76–77, 78, 79, 86, 88, 102, 106, 112. 
117, 140, 150, 153, 156, 184 veracity 13, 79, 102, 135, 152–156, 157, 163 volume 7, 26, 27, 28, 29, 32, 46, 67, 68, 69–72, 74, 76, 77, 78, 79, 86, 102, 106, 110, 125, 130, 135, 140, 141, 150, 156, 166, 184 volunteered 87, 93–98, 99, 155 databank 29, 34, 43 database NoSQL 6, 32, 77, 78, 86–87 relational 5, 6, 8, 32–33, 43, 74–75, 77, 78, 86, 100, 105 data-driven science 133, 137–139, 186 data-ism 130 datafication 181 dataveillance 15, 116, 126, 157, 166–168, 180, 181, 182, 184 decision tree 104, 111, 122, 159 deconstruction 24, 98, 126, 189–190 decontextualisation 22 deduction 132, 133, 134, 137, 138, 139, 148 deidentification 171, 172, 178 democracy 48, 55, 62, 63, 96, 117, 170 description 9, 101, 104, 109, 143, 147, 151, 190 designated community 30–31, 33, 46 digital devices 13, 25, 80, 81, 83, 84, 87, 90–91, 167, 174, 175 humanities xvi, 139–147, 152, 186 object identifier 8, 74 serendipity 134 discourse 15, 20, 55, 113–114, 117, 122, 127, 192 discursive regime 15, 20, 24, 56, 98, 113–114, 116, 123, 126, 127, 190 disruptive innovation xv, 68, 147, 184, 192 distributed computing xv, 37, 78, 81, 83, 98 sensors 124, 139, 160 storage 34, 37, 68, 78, 80, 81, 85–87, 97 division of labour 16 Dodge, M.
2, 21, 68, 73, 74, 76, 83, 84, 85, 89, 90, 92, 93, 96, 113, 115, 116, 124, 154, 155, 167, 177, 178, 179, 180, 189 driver’s licence 45, 87, 171 drone 88 Dublin Core 9 dynamic data xv, xvi, 76–77, 86, 106, 112 pricing 16, 120, 123, 177 eBureau 43, 44 ecological fallacy 14, 102, 135, 149, 158–160 Economist, The 58, 67, 69, 70, 72, 128 efficiency 16, 38, 55, 56, 59, 66, 77, 93, 102, 111, 114, 116, 118, 119, 174, 176 e-mail 71, 72–73, 82, 85, 90, 93, 116, 174, 190 empiricism 129, 130–137, 141, 186 empowerment 61, 62–63, 93, 115, 126, 165 encryption 171, 175 Enlightenment 114 Enterprise Resource Planning (ERP) 99, 117, 120 entity extraction 105 epistemology 3, 12, 19, 73, 79, 112, 128–148, 149, 185, 186 Epsilon 43 ethics 12, 14–15, 16, 19, 26, 30, 31, 40, 41, 64, 73, 99, 128, 144, 151, 163, 165–183, 186 ethnography 78, 189, 190, 191 European Union 31, 38, 45, 49, 58, 59, 70, 157, 168, 173, 178 everyware 83 exhaustive 13, 27, 28, 68, 72–73, 79, 83, 88, 100, 110, 118, 133–134, 140, 150, 153, 166, 184 explanation 101, 109, 132, 133, 134, 137, 151 extensionality 67, 78, 140, 184 experiment 2, 3, 6, 34, 75, 78, 118, 129, 131, 137, 146, 150, 160 Facebook 6, 28, 43, 71, 72, 77, 78, 85, 94, 119, 154, 170 facts 3, 4, 9, 10, 52, 140, 159 Fair Information Practice Principles 170–171, 172 false positive 159 Federal Trade Commission (FTC) 45, 173 flexibility 27, 28, 68, 77–78, 79, 86, 140, 157, 184 Flickr 95, 170 Flightradar 107 Floridi, L. 3, 4, 9, 10, 11, 73, 112, 130, 151 Foucault, M. 16, 113, 114, 189 Fourth paradigm 129–139 Franks, B. 6, 111, 154 freedom of information 48 freemium service 60 funding 15, 28, 29, 31, 34, 37, 38, 40, 41, 46, 48, 52, 54–55, 56, 57–58, 59, 60, 61, 65, 67, 75, 119, 143, 189 geographic information systems 147 genealogy 98, 127, 189–190 Gitelman, L.
2, 19, 20, 21, 22 Global Positioning System (GPS) 58, 59, 73, 85, 88, 90, 121, 154, 169 Google 32, 71, 73, 78, 86, 106, 109, 134, 170 governance 15, 21, 22, 23, 38, 40, 55, 63, 64, 66, 85, 87, 89, 117, 124, 126, 136, 168, 170, 178–182, 186, 187, 189 anticipatory 126, 166, 178–179 technocratic 126, 179–182 governmentality xvi, 15, 23, 25, 40, 87, 115, 127, 168, 185, 191 Gray, J. 129–130 Guardian, The 49 Gurstein, M. 52, 62, 63 hacking 45, 154, 174, 175 hackathon 64–65, 96, 97, 188, 191 Hadoop 87 hardware 32, 34, 40, 63, 78, 83, 84, 124, 143, 160 human resourcing 112, 160–163 hype cycle 67 hypothesis 129, 131, 132, 133, 137, 191 IBM 70, 123, 124, 143, 162, 182 identification 8, 44, 68, 73, 74, 77, 84–85, 87, 90, 92, 115, 169, 171, 172 ideology 4, 14, 25, 61, 113, 126, 128, 130, 134, 140, 144, 185, 190 immutable mobiles 22 independence 3, 19, 20, 24, 100 indexical 4, 8–9, 32, 44, 68, 73–74, 79, 81, 84–85, 88, 91, 98, 115, 150, 156, 167, 184 indicator 13, 62, 76, 102, 127 induction 133, 134, 137, 138, 148 information xvii, 1, 3, 4, 6, 9–12, 13, 23, 26, 31, 33, 42, 44, 45, 48, 53, 67, 70, 74, 75, 77, 92, 93, 94, 95, 96, 100, 101, 104, 105, 109, 110, 119, 125, 130, 138, 140, 151, 154, 158, 161, 168, 169, 171, 174, 175, 184, 192 amplification effect 76 freedom of 48 management 80, 100 overload xvi public sector 48 system 34, 65, 85, 117, 181 visualisation 109 information and communication technologies (ICTs) xvi, 37, 80, 83–84, 92, 93, 123, 124 Innocentive 96, 97 INSPIRE 157 instrumental rationality 181 internet 9, 32, 42, 49, 52, 53, 66, 70, 74, 80, 81, 82, 83, 86, 92, 94, 96, 116, 125, 167 of things xv, xvi, 71, 84, 92, 175 intellectual property rights xvi, 11, 12, 16, 25, 30, 31, 40, 41, 49, 50, 56, 62, 152, 166 Intelius 43, 44 intelligent transportation systems (ITS) 89, 124 interoperability 9, 23, 24, 34, 40, 52, 64, 66, 149, 156–157, 163, 184 interpellation 165, 180, 188 interviews 13, 15, 19, 78, 155, 190 Issenberg, S. 
75, 76, 78, 119 jurisdiction 17, 25, 51, 56, 57, 74, 114, 116 Kafka 180 knowledge xvii, 1, 3, 9–12, 19, 20, 22, 25, 48, 53, 55, 58, 63, 67, 93, 96, 110, 111, 118, 128, 130, 134, 136, 138, 142, 159, 160, 161, 162, 187, 192 contextual 48, 64, 132, 136–137, 143, 144, 187 discovery techniques 77, 138 driven science 139 economy 16, 38, 49 production of 16, 20, 21, 24, 26, 37, 41, 112, 117, 134, 137, 144, 184, 185 pyramid 9–10, 12 situated 16, 20, 28, 135, 137, 189 Latour, B. 22, 133 Lauriault, T.P. 15, 16, 17, 23, 24, 30, 31, 33, 37, 38, 40, 153 law of telecosm 82 legal issues xvi, 1, 23, 25, 30, 31, 115, 165–179, 182, 183, 187, 188 levels of measurement 4, 5 libraries 31, 32, 52, 71, 141, 142 licensing 14, 25, 40, 42, 48, 49, 51, 53, 57, 73, 96, 151 LIDAR 88, 89, 139 linked data xvii, 52–54, 66, 156 longitudinal study 13, 76, 140, 149, 150, 160 Lyon, D. 44, 74, 87, 167, 178, 180 machine learning 5, 6, 101, 102–104, 106, 111, 136, 188 readable 6, 52, 54, 81, 84–85, 90, 92, 98 vision 106 management 62, 88, 117–119, 120, 121, 124, 125, 131, 162, 181 Manovich, L. 141, 146, 152, 155 Manyika, J. 6, 16, 70, 71, 72, 104, 116, 118, 119, 120, 121, 122, 161 map 5, 22, 24, 34, 48, 54, 56, 73, 85, 88, 93, 96, 106, 107, 109, 115, 143, 144, 147, 154, 155–156, 157, 190 MapReduce 86, 87 marginal cost 11, 32, 57, 58, 59, 66, 151 marketing 8, 44, 58, 73, 117, 119, 120–123, 131, 176 marketisation 56, 61–62, 182 materiality 4, 19, 21, 24, 25, 66, 183, 185, 186, 189, 190 Mattern, S. 137, 181 Mayer-Schonberger, V. 68, 71, 72, 91, 114, 153, 154, 174 measurement 1, 3, 5, 6, 10, 12, 13, 15, 19, 23, 69, 97, 98, 115, 128, 166 metadata xvi, 1, 3, 4, 6, 8–9, 13, 22, 24, 29, 30, 31, 33, 35, 40, 43, 50, 54, 64, 71, 72, 74, 78, 85, 91, 93, 102, 105, 153, 155, 156 methodology 145, 158, 185 middleware 34 military intelligence 71, 116, 175 Miller, H.J. xvi, 27, 100, 101, 103, 104, 138, 139, 159 Minelli, M.
101, 120, 137, 168, 170, 171, 172, 174, 176 mixed methods 147, 191 mobile apps 78 computing xv, 44, 78, 80, 81, 83, 85, 139 mapping 88 phones 76, 81, 83, 90, 93, 151, 168, 170, 175 storage 85 mode of production 16 model 7, 11, 12, 24, 32, 37, 44, 57, 72, 73, 101, 103, 105, 106, 109, 110–112, 119, 125, 129, 130, 131, 132, 133, 134, 137, 139, 140, 144, 145, 147, 158–159, 166, 181 agent-based model 111 business 30, 54, 57–60, 61, 95, 118, 119, 121 environmental 139, 166 meteorological 72 time-space 73 transportation 7 modernity 3 Moore’s Law 81 moral philosophy 14 Moretti, F. 141–142 museum 31, 32, 137 NASA 7 National Archives and Records Administration (NARA) 67 National Security Agency (NSA) 45, 116 natural language processing 104, 105 near-field communication 89, 91 neoliberalism 56, 61–62, 126, 182 neural networks 104, 105, 111 New Public Management 62 non-governmental organisations xvi, 43, 55, 56, 73, 117 non-excludable 11, 151 non-rivalrous 11, 57, 151 normality 100, 101 normative thinking 12, 15, 19, 66, 99, 127, 144, 182, 183, 187, 192 Obama, B.
53, 75–76, 78, 118–119 objectivity 2, 17, 19, 20, 62, 135, 146, 185 observant participation 191 oligopticon 133, 167, 180 ontology 3, 12, 17–21, 22, 28, 54, 79, 128, 138, 150, 156, 177, 178, 184, 185 open data xv, xvi, xvii, 2, 12, 16, 21, 25, 48–66, 97, 114, 124, 128, 129, 140, 149, 151, 163, 164, 167, 186, 187, 188, 190, 191, 192 critique of 61–66 economics of 57–60 rationale 54–56 Open Definition 50 OpenGovData 50, 51 Open Knowledge Foundation 49, 52, 55, 58, 189, 190 open science 48, 72, 98 source 48, 56, 60, 87, 96 OpenStreetMap 73, 93, 96, 154, 155–156 optimisation 101, 104, 110–112, 120, 121, 122, 123 Ordnance Survey 54, 57 Organization for Economic Cooperation and Development (OECD) 49, 50, 59 overlearning 158, 159 panoptic 133, 167, 180 paradigm 112, 128–129, 130, 138, 147, 148, 186 participant observation 190, 191 participation 48, 49, 55, 66, 82, 94, 95, 96, 97–98, 126, 155, 165, 180 passport 8, 45, 84, 87, 88, 115 patent 13, 16, 41, 51 pattern recognition 101, 104–106, 134, 135 personally identifiable information 171 philanthropy 32, 38, 58 philosophy of science 112, 128–148, 185–188 phishing 174, 175 phone hacking 45 photography 6, 43, 71, 72, 74, 77, 86, 87, 88, 93, 94, 95, 105, 115, 116, 141, 155, 170 policing 80, 88, 116, 124, 125, 179 political economy xvi, 15–16, 25, 42–45, 182, 185, 188, 191 Pollock, R. 
49, 54, 56, 57, 58, 59 positivism 129, 136–137, 140, 141, 144, 145, 147 post-positivism 140, 144, 147 positionality 135, 190 power/knowledge 16, 22 predictive modelling 4, 7, 12, 34, 44, 45, 76, 101, 103, 104, 110–112, 118, 119, 120, 125, 132, 140, 147, 168, 179 profiling 110–112, 175–178, 179, 180 prescription 101 pre-analytical 2, 3, 19, 20, 185 pre-analytics 101–102, 112 pre-factual 3, 4, 19, 185 PRISM 45, 116 privacy 15, 28, 30, 40, 45, 51, 57, 63, 64, 96, 117, 163, 165, 166, 168–174, 175, 178, 182, 187 privacy by design 45, 173, 174 probability 14, 110, 153, 158 productivity xvi, 16, 39, 55, 66, 92, 114, 118 profiling 12, 42–45, 74, 75, 110–112, 119, 166, 168, 175–178, 179, 180, 187 proprietary rights 48, 49, 54, 57, 62 prosumption 93 public good 4, 12, 16, 42, 52, 56, 58, 79, 97 –private partnerships 56, 59 sector information (PSI) 12, 48, 54, 56, 59, 61, 62 quantified self 95 redlining 176, 182 reductionism 73, 136, 140, 142, 143, 145 regression 102, 104, 105, 110, 111, 122 regulation xvi, 15, 16, 23, 25, 40, 44, 46, 83, 85, 87, 89–90, 114, 115, 123, 124, 126, 168, 174, 178, 180, 181–182, 187, 192 research design 7, 13, 14, 77–78, 98, 137–138, 153, 158 Renaissance xvi, 129, 141 repository 29, 33, 34, 41 representativeness 13, 14, 19, 21 Resource Description Framework (RDF) 53, 54 remote sensing 73–74, 105 RFID 74, 85, 90, 91, 169 rhetorical 3, 4, 185 right to be forgotten 45, 172, 187 information (RTI) 48, 62 risk 16, 44, 58, 63, 118, 120, 123, 132, 158, 174, 176–177, 178, 179, 180 Rosenberg, D. 1, 3 Ruppert, E.
22, 112, 157, 163, 187 sampling 13, 14, 27, 28, 46, 68, 72, 73, 77, 78, 88, 100, 101, 102, 120, 126, 133, 138, 139, 146, 149–150, 152, 153, 154, 156, 159 scale of economy 37 scanners 6, 25, 29, 32, 83, 85, 88, 89, 90, 91, 92, 175, 177, 180 science xvi, 1, 2, 3, 19, 20, 29, 31, 34, 37, 46, 65, 67, 71, 72, 73, 78, 79, 97, 98, 100, 101, 103, 111, 112, 128–139, 140, 147, 148, 150, 158, 161, 165, 166, 181, 184, 186 scientific method 129, 130, 133, 134, 136, 137–138, 140, 147, 148, 186 security data 28, 33, 34, 40, 45, 46, 51, 57, 126, 157, 166, 169, 171, 173, 174–175, 182, 187 national 42, 71, 88, 116–117, 172, 176, 178, 179 private 99, 115, 118, 151 social 8, 32, 45, 87, 115, 171 segmentation 104, 105, 110, 119, 120, 121, 122, 176 semantic information 9, 10, 11, 105, 157 Web 49, 52, 53, 66 sensors xv, 6, 7, 19, 20, 24, 25, 28, 34, 71, 76, 83, 84, 91–92, 95, 124, 139, 150, 160 sentiment analysis 105, 106, 121 Siegel, E. 103, 110, 111, 114, 120, 132, 158, 176, 179 signal 9, 151, 159 Silver, N. 136, 151, 158 simulation 4, 32, 37, 101, 104, 110–112, 119, 129, 133, 137, 139, 140 skills 37, 48, 52, 53, 57, 63, 94, 97, 98, 112, 149, 160–163, 164 small data 21, 27–47, 68, 72, 75, 76, 77, 79, 100, 103, 110, 112, 146, 147, 148, 150, 156, 160, 166, 184, 186, 188, 191 smart cards 90 cities 91, 92, 99, 124–125, 181–182 devices 83 metering 89, 123, 174 phones 81, 82, 83, 84, 90, 94, 107, 121, 155, 170, 174 SmartSantander 91 social computing xvi determinism 144 media xv, 13, 42, 43, 76, 78, 90, 93, 94–95, 96, 105, 119, 121, 140, 150, 151, 152, 154, 155, 160, 167, 176, 180 physics 144 security number 8, 32, 45, 87, 115, 171 sorting 126, 166, 168, 175–178, 182 sociotechnical systems 21–24, 47, 66, 183, 185, 188 software 6, 20, 32, 34, 40, 48, 53, 54, 56, 63, 80, 83, 84, 86, 88, 96, 132, 143, 160, 161, 163, 166, 170, 172, 175, 177, 180, 189 Solove, D.
116, 120, 168, 169, 170, 172, 176, 178, 180 solutionism 181 sousveillance 95–96 spatial autocorrelation 146 data infrastructure 34, 35, 38 processes 136, 144 resolution 149 statistics 110 video 88 spatiality 17, 157 Star, S.L. 19, 20, 23, 24 stationarity 100 statistical agencies 8, 30, 34, 35, 115 geography 17, 74, 157 statistics 4, 8, 13, 14, 24, 48, 77, 100, 101, 102, 104, 105, 109–110, 111, 129, 132, 134, 135, 136, 140, 142, 143, 145, 147, 159 descriptive 4, 106, 109, 147 inferential 4, 110, 147 non-parametric 105, 110 parametric 105, 110 probabilistic 110 radical 147 spatial 110 storage 31–32, 68, 72, 73, 78, 80, 85–87, 88, 100, 118, 161, 171 analogue 85, 86 digital 85–87 media 20, 86 store loyalty cards 42, 45, 165 Sunlight Foundation 49 supervised learning 103 Supply Chain Management (SCM) 74, 99, 117–118, 119, 120, 121 surveillance 15, 71, 80, 83, 87–90, 95, 115, 116, 117, 123, 124, 151, 165, 167, 168, 169, 180 survey 6, 17, 19, 22, 28, 42, 68, 75, 77, 87, 115, 120 sustainability 16, 33, 34, 57, 58, 59, 61, 64–66, 87, 114, 123–124, 126, 155 synchronicity 14, 95, 102 technological handshake 84, 153 lock-in 166, 179–182 temporality 17, 21, 27, 28, 32, 37, 68, 75, 111, 114, 157, 160, 186 terrorism 116, 165, 179 territory 16, 38, 74, 85, 167 Tesco 71, 120 Thrift, N.
83, 113, 133, 167, 176 TopCoder 96 trading funds 54–55, 56, 57 transparency 19, 38, 44, 45, 48–49, 55, 61, 62, 63, 113, 115, 117, 118, 121, 126, 165, 173, 178, 180 trust 8, 30, 33, 34, 40, 44, 55, 84, 117, 152–156, 163, 175 trusted digital repository 33–34 Twitter 6, 71, 78, 94, 106, 107, 133, 143, 144, 146, 152, 154, 155, 170 uncertainty 10, 13, 14, 100, 102, 110, 156, 158 uneven development 16 Uniform Resource Identifiers (URIs) 53, 54 United Nations Development Programme (UNDP) 49 universalism 20, 23, 133, 140, 144, 154, 190 unsupervised learning 103 utility 1, 28, 53, 54, 55, 61, 63, 64–66, 100, 101, 114, 115, 134, 147, 163, 185 venture capital 25, 59 video 6, 43, 71, 74, 77, 83, 88, 90, 93, 94, 106, 141, 146, 170 visual analytics 106–109 visualisation 5, 10, 34, 77, 101, 102, 104, 106–109, 112, 125, 132, 141, 143 Walmart 28, 71, 99, 120 Web 2.0 81, 94–95 Weinberger, D. 9, 10, 11, 96, 97, 132, 133 White House 48 Wikipedia 93, 96, 106, 107, 143, 154, 155 Wired 69, 130 wisdom 9–12, 114, 161 XML 6, 53 Zikopoulos, P.C. 6, 16, 68, 70, 73, 76, 119, 151


pages: 1,082 words: 87,792

Python for Algorithmic Trading: From Idea to Cloud Deployment by Yves Hilpisch

algorithmic trading, Amazon Web Services, automated trading system, backtesting, barriers to entry, bitcoin, Brownian motion, cloud computing, coronavirus, cryptocurrency, data science, deep learning, Edward Thorp, fiat currency, global macro, Gordon Gekko, Guido van Rossum, implied volatility, information retrieval, margin call, market microstructure, Myron Scholes, natural language processing, paper trading, passive investing, popular electronics, prediction markets, quantitative trading / quantitative finance, random walk, risk free rate, risk/return, Rubik’s Cube, seminal paper, Sharpe ratio, short selling, sorting algorithm, systematic trading, transaction costs, value at risk

Index A absolute maximum drawdown, Case Study AdaBoost algorithm, Vectorized Backtesting addition (+) operator, Data Types adjusted return appraisal ratio, Algorithmic Trading algorithmic trading (generally) advantages of, Algorithmic Trading basics, Algorithmic Trading-Algorithmic Trading strategies, Trading Strategies-Conclusions alpha seeking strategies, Trading Strategies alpha, defined, Algorithmic Trading anonymous functions, Python Idioms API key, for data sets, Working with Open Data Sources-Working with Open Data Sources Apple, Inc. intraday stock prices, Getting into the Basics reading stock price data from different sources, Reading Financial Data From Different Sources-Reading from Excel and JSON retrieving historical unstructured data about, Retrieving Historical Unstructured Data-Retrieving Historical Unstructured Data app_key, for Eikon Data API, Eikon Data API AQR Capital Management, pandas and the DataFrame Class arithmetic operations, Data Types array programming, Making Use of Vectorization (see also vectorization) automated trading operations, Automating Trading Operations-Strategy Monitoring capital management, Capital Management-Kelly Criterion for Stocks and Indices configuring Oanda account, Configuring Oanda Account hardware setup, Setting Up the Hardware infrastructure and deployment, Infrastructure and Deployment logging and monitoring, Logging and Monitoring-Logging and Monitoring ML-based trading strategy, ML-Based Trading Strategy-Persisting the Model Object online algorithm, Online Algorithm-Online Algorithm Python environment setup, Setting Up the Python Environment Python scripts for, Python Script-Strategy Monitoring real-time monitoring, Real-Time Monitoring running code, Running the Code uploading code, Uploading the Code visual step-by-step overview, Visual Step-by-Step Overview-Real-Time Monitoring B backtesting based on simple moving averages, Strategies Based on Simple Moving Averages-Generalizing the Approach Python scripts for
classification algorithm backtesting, Classification Algorithm Backtesting Class Python scripts for linear regression backtesting class, Linear Regression Backtesting Class vectorized (see vectorized backtesting) BacktestLongShort class, Long-Short Backtesting Class, Long-Short Backtesting Class bar charts, matplotlib bar plots (see Plotly; streaming bar plot) base class, for event-based backtesting, Backtesting Base Class-Backtesting Base Class, Backtesting Base Class Bash script, Building a Ubuntu and Python Docker Image for Droplet set-up, Script to Orchestrate the Droplet Set Up-Script to Orchestrate the Droplet Set Up for Python/Jupyter Lab installation, Installation Script for Python and Jupyter Lab-Installation Script for Python and Jupyter Lab Bitcoin, pandas and the DataFrame Class, Working with Open Data Sources Boolean operations NumPy, Boolean Operations pandas, Boolean Operations C callback functions, Retrieving Streaming Data capital management automated trading operations and, Capital Management-Kelly Criterion for Stocks and Indices Kelly criterion for stocks and indices, Kelly Criterion for Stocks and Indices-Kelly Criterion for Stocks and Indices Kelly criterion in binomial setting, Kelly Criterion in Binomial Setting-Kelly Criterion in Binomial Setting Carter, Graydon, FX Trading with FXCM CFD (contracts for difference) algorithmic trading risks, Logging and Monitoring defined, CFD Trading with Oanda risks of losses, Long-Short Backtesting Class risks of trading on margin, FX Trading with FXCM trading with Oanda, CFD Trading with Oanda-Python Script (see also Oanda) classification problems machine learning for, A Simple Classification Problem-A Simple Classification Problem neural networks for, The Simple Classification Problem Revisited-The Simple Classification Problem Revisited Python scripts for vectorized backtesting, Classification Algorithm Backtesting Class .close_all() method, Placing Orders cloud instances, Using Cloud Instances-Script to
Orchestrate the Droplet Set Up installation script for Python and Jupyter Lab, Installation Script for Python and Jupyter Lab-Installation Script for Python and Jupyter Lab Jupyter Notebook configuration file, Jupyter Notebook Configuration File RSA public/private keys, RSA Public and Private Keys script to orchestrate Droplet set-up, Script to Orchestrate the Droplet Set Up-Script to Orchestrate the Droplet Set Up Cocteau, Jean, Building Classes for Event-Based Backtesting comma separated value (CSV) files (see CSV files) conda as package manager, Conda as a Package Manager-Basic Operations with Conda as virtual environment manager, Conda as a Virtual Environment Manager-Conda as a Virtual Environment Manager basic operations, Basic Operations with Conda-Basic Operations with Conda installing Miniconda, Installing Miniconda-Installing Miniconda conda remove, Basic Operations with Conda configparser module, The Oanda API containers (see Docker containers) contracts for difference (see CFD) control structures, Control Structures CPython, Python for Finance, Python Infrastructure .create_market_buy_order() method, Placing Orders .create_order() method, Placing Market Orders-Placing Market Orders cross-sectional momentum strategies, Strategies Based on Momentum CSV files input-output operations, Input-Output Operations-Input-Output Operations reading from a CSV file with pandas, Reading from a CSV File with pandas reading from a CSV file with Python, Reading from a CSV File with Python-Reading from a CSV File with Python .cummax() method, Case Study currency pairs, Logging and Monitoring (see also EUR/USD exchange rate) algorithmic trading risks, Logging and Monitoring D data science stack, Python, NumPy, matplotlib, pandas data snooping, Data Snooping and Overfitting data storage SQLite3 for, Storing Data with SQLite3-Storing Data with SQLite3 storing data efficiently, Storing Financial Data Efficiently-Storing Data with SQLite3 storing DataFrame objects, Storing
DataFrame Objects-Storing DataFrame Objects TsTables package for, Using TsTables-Using TsTables data structures, Data Structures-Data Structures DataFrame class, pandas and the DataFrame Class-pandas and the DataFrame Class, Reading from a CSV File with pandas, DataFrame Class-DataFrame Class DataFrame objects creating, Vectorization with pandas storing, Storing DataFrame Objects-Storing DataFrame Objects dataism, Preface DatetimeIndex() constructor, Plotting with pandas decision tree classification algorithm, Vectorized Backtesting deep learning adding features to analysis, Adding Different Types of Features-Adding Different Types of Features classification problem, The Simple Classification Problem Revisited-The Simple Classification Problem Revisited deep neural networks for predicting market direction, Using Deep Neural Networks to Predict Market Direction-Adding Different Types of Features market movement prediction, Using Deep Learning for Market Movement Prediction-Adding Different Types of Features trading strategies and, Machine and Deep Learning deep neural networks, Using Deep Neural Networks to Predict Market Direction-Adding Different Types of Features delta hedging, Algorithmic Trading dense neural network (DNN), The Simple Classification Problem Revisited, Using Deep Neural Networks to Predict Market Direction dictionary (dict) objects, Reading from a CSV File with Python, Data Structures DigitalOcean cloud instances, Using Cloud Instances-Script to Orchestrate the Droplet Set Up droplet setup, Setting Up the Hardware DNN (dense neural network), The Simple Classification Problem Revisited, Using Deep Neural Networks to Predict Market Direction Docker containers, Using Docker Containers-Building a Ubuntu and Python Docker Image building a Ubuntu and Python Docker image, Building a Ubuntu and Python Docker Image-Building a Ubuntu and Python Docker Image defined, Docker Images and Containers Docker images versus, Docker Images and Containers Docker
images defined, Docker Images and Containers Docker containers versus, Docker Images and Containers Dockerfile, Building a Ubuntu and Python Docker Image-Building a Ubuntu and Python Docker Image Domingos, Pedro, Automating Trading Operations Droplet, Using Cloud Instances costs, Infrastructure and Deployment script to orchestrate set-up, Script to Orchestrate the Droplet Set Up-Script to Orchestrate the Droplet Set Up dynamic hedging, Algorithmic Trading E efficient market hypothesis, Predicting Market Movements with Machine Learning Eikon Data API, Eikon Data API-Retrieving Historical Unstructured Data retrieving historical structured data, Retrieving Historical Structured Data-Retrieving Historical Structured Data retrieving historical unstructured data, Retrieving Historical Unstructured Data-Retrieving Historical Unstructured Data Euler discretization, Python Versus Pseudo-Code EUR/USD exchange rate backtesting momentum strategy on minute bars, Backtesting a Momentum Strategy on Minute Bars-Backtesting a Momentum Strategy on Minute Bars evaluation of regression-based strategy, Generalizing the Approach factoring in leverage/margin, Factoring In Leverage and Margin-Factoring In Leverage and Margin gross performance versus deep learning-based strategy, Using Deep Neural Networks to Predict Market Direction-Using Deep Neural Networks to Predict Market Direction, Adding Different Types of Features-Adding Different Types of Features historical ask close prices, Retrieving Historical Data-Retrieving Historical Data historical candles data for, Retrieving Candles Data historical tick data for, Retrieving Tick Data implementing trading strategies in real time, Implementing Trading Strategies in Real Time-Implementing Trading Strategies in Real Time logistic regression-based strategies, Generalizing the Approach placing orders, Placing Orders-Placing Orders predicting, Predicting Index Levels-Predicting Index Levels predicting future returns, Predicting Future
Returns-Predicting Future Returns predicting index levels, Predicting Index Levels-Predicting Index Levels retrieving streaming data for, Retrieving Streaming Data retrieving trading account information, Retrieving Account Information-Retrieving Account Information SMA calculation, Getting into the Basics-Generalizing the Approach vectorized backtesting of ML-based trading strategy, Vectorized Backtesting-Vectorized Backtesting vectorized backtesting of regression-based strategy, Vectorized Backtesting of Regression-Based Strategy event-based backtesting, Building Classes for Event-Based Backtesting-Long-Short Backtesting Class advantages, Building Classes for Event-Based Backtesting base class, Backtesting Base Class-Backtesting Base Class, Backtesting Base Class building classes for, Building Classes for Event-Based Backtesting-Long-Short Backtesting Class long-only backtesting class, Long-Only Backtesting Class-Long-Only Backtesting Class, Long-Only Backtesting Class long-short backtesting class, Long-Short Backtesting Class-Long-Short Backtesting Class, Long-Short Backtesting Class Python scripts for, Backtesting Base Class-Long-Short Backtesting Class Excel exporting financial data to, Exporting to Excel and JSON reading financial data from, Reading from Excel and JSON F features adding different types, Adding Different Types of Features-Adding Different Types of Features lags and, Using Logistic Regression to Predict Market Direction financial data, working with, Working with Financial Data-Python Scripts data set for examples, The Data Set Eikon Data API, Eikon Data API-Retrieving Historical Unstructured Data exporting to Excel/JSON, Exporting to Excel and JSON open data sources, Working with Open Data Sources-Working with Open Data Sources reading data from different sources, Reading Financial Data From Different Sources-Reading from Excel and JSON reading data from Excel/JSON, Reading from Excel and JSON reading from a CSV file with pandas, Reading from a CSV
File with pandas reading from a CSV file with Python, Reading from a CSV File with Python-Reading from a CSV File with Python storing data efficiently, Storing Financial Data Efficiently-Storing Data with SQLite3 .flatten() method, matplotlib foreign exchange trading (see FX trading; FXCM) future returns, predicting, Predicting Future Returns-Predicting Future Returns FX trading, FX Trading with FXCM-References and Further Resources (see also EUR/USD exchange rate) FXCM FX trading, FX Trading with FXCM-References and Further Resources getting started, Getting Started placing orders, Placing Orders-Placing Orders retrieving account information, Account Information retrieving candles data, Retrieving Candles Data-Retrieving Candles Data retrieving data, Retrieving Data-Retrieving Candles Data retrieving historical data, Retrieving Historical Data-Retrieving Historical Data retrieving streaming data, Retrieving Streaming Data retrieving tick data, Retrieving Tick Data-Retrieving Tick Data working with the API, Working with the API-Account Information fxcmpy wrapper package callback functions, Retrieving Streaming Data installing, Getting Started tick data retrieval, Retrieving Tick Data fxTrade, CFD Trading with Oanda G GDX (VanEck Vectors Gold Miners ETF) logistic regression-based strategies, Generalizing the Approach mean-reversion strategies, Getting into the Basics-Generalizing the Approach regression-based strategies, Generalizing the Approach generate_sample_data(), Storing Financial Data Efficiently .get_account_summary() method, Retrieving Account Information .get_candles() method, Retrieving Historical Data .get_data() method, Backtesting Base Class, Retrieving Tick Data .get_date_price() method, Backtesting Base Class .get_instruments() method, Looking Up Instruments Available for Trading .get_last_price() method, Retrieving Streaming Data .get_raw_data() method, Retrieving Tick Data get_timeseries() function, Retrieving Historical Structured Data
.get_transactions() method, Retrieving Account Information GLD (SPDR Gold Shares) logistic regression-based strategies, Using Logistic Regression to Predict Market Direction-Using Logistic Regression to Predict Market Direction mean-reversion strategies, Getting into the Basics-Generalizing the Approach gold price mean-reversion strategies, Getting into the Basics-Getting into the Basics momentum strategy and, Getting into the Basics-Getting into the Basics, Generalizing the Approach-Generalizing the Approach Goldman Sachs, Python and Algorithmic Trading, Algorithmic Trading .go_long() method, Long-Short Backtesting Class H half Kelly criterion, Optimal Leverage Harari, Yuval Noah, Preface HDF5 binary storage library, Using TsTables-Using TsTables HDFStore wrapper, Storing DataFrame Objects-Storing DataFrame Objects high frequency trading (HFT), Algorithmic Trading histograms, matplotlib hit ratio, defined, Vectorized Backtesting I if-elif-else control structure, Python Idioms in-sample fitting, Generalizing the Approach index levels, predicting, Predicting Index Levels-Predicting Index Levels infrastructure (see Python infrastructure) installation script, Python/Jupyter Lab, Installation Script for Python and Jupyter Lab-Installation Script for Python and Jupyter Lab Intel Math Kernel Library, Basic Operations with Conda iterations, Control Structures J JSON exporting financial data to, Exporting to Excel and JSON reading financial data from, Reading from Excel and JSON Jupyter Lab installation script for, Installation Script for Python and Jupyter Lab-Installation Script for Python and Jupyter Lab RSA public/private keys for, RSA Public and Private Keys tools included, Using Cloud Instances Jupyter Notebook, Jupyter Notebook Configuration File K Kelly criterion in binomial setting, Kelly Criterion in Binomial Setting-Kelly Criterion in Binomial Setting optimal leverage, Optimal Leverage-Optimal Leverage stocks and indices, Kelly Criterion for Stocks and Indices-Kelly
Criterion for Stocks and Indices Keras, Using Deep Learning for Market Movement Prediction, Using Deep Neural Networks to Predict Market Direction, Adding Different Types of Features key-value stores, Data Structures keys, public/private, RSA Public and Private Keys L lags, The Basic Idea for Price Prediction, Using Logistic Regression to Predict Market Direction lambda functions, Python Idioms LaTeX, Python Versus Pseudo-Code leveraged trading, risks of, Factoring In Leverage and Margin, FX Trading with FXCM, Optimal Leverage linear regression generalizing the approach, Generalizing the Approach market movement prediction, Using Linear Regression for Market Movement Prediction-Generalizing the Approach predicting future market direction, Predicting Future Market Direction predicting future returns, Predicting Future Returns-Predicting Future Returns predicting index levels, Predicting Index Levels-Predicting Index Levels price prediction based on time series data, The Basic Idea for Price Prediction-The Basic Idea for Price Prediction review of, A Quick Review of Linear Regression scikit-learn and, Linear Regression with scikit-learn vectorized backtesting of regression-based strategy, Vectorized Backtesting of Regression-Based Strategy, Linear Regression Backtesting Class list comprehension, Python Idioms list constructor, Data Structures list objects, Reading from a CSV File with Python, Data Structures, Regular ndarray Object logging, of automated trading operations, Logging and Monitoring-Logging and Monitoring logistic regression generalizing the approach, Generalizing the Approach-Generalizing the Approach market direction prediction, Using Logistic Regression to Predict Market Direction-Using Logistic Regression to Predict Market Direction Python script for vectorized backtesting, Classification Algorithm Backtesting Class long-only backtesting class, Long-Only Backtesting Class-Long-Only Backtesting Class, Long-Only Backtesting Class long-short backtesting
class, Long-Short Backtesting Class-Long-Short Backtesting Class, Long-Short Backtesting Class longest drawdown period, Risk Analysis M machine learningclassification problem, A Simple Classification Problem-A Simple Classification Problem linear regression with scikit-learn, Linear Regression with scikit-learn market movement prediction, Using Machine Learning for Market Movement Prediction-Generalizing the Approach ML-based trading strategy, ML-Based Trading Strategy-Persisting the Model Object Python scripts, Linear Regression Backtesting Class trading strategies and, Machine and Deep Learning using logistic regression to predict market direction, Using Logistic Regression to Predict Market Direction-Using Logistic Regression to Predict Market Direction macro hedge funds, algorithmic trading and, Algorithmic Trading __main__ method, Backtesting Base Class margin trading, FX Trading with FXCM market direction prediction, Predicting Future Market Direction market movement predictiondeep learning for, Using Deep Learning for Market Movement Prediction-Adding Different Types of Features deep neural networks for, Using Deep Neural Networks to Predict Market Direction-Adding Different Types of Features linear regression for, Using Linear Regression for Market Movement Prediction-Generalizing the Approach linear regression with scikit-learn, Linear Regression with scikit-learn logistic regression to predict market direction, Using Logistic Regression to Predict Market Direction-Using Logistic Regression to Predict Market Direction machine learning for, Using Machine Learning for Market Movement Prediction-Generalizing the Approach predicting future market direction, Predicting Future Market Direction predicting future returns, Predicting Future Returns-Predicting Future Returns predicting index levels, Predicting Index Levels-Predicting Index Levels price prediction based on time series data, The Basic Idea for Price Prediction-The Basic Idea for Price Prediction 
vectorized backtesting of regression-based strategy, Vectorized Backtesting of Regression-Based Strategy market orders, placing, Placing Market Orders-Placing Market Orders math module, Data Types mathematical functions, Data Types matplotlib, matplotlib-matplotlib, Plotting with pandas-Plotting with pandas maximum drawdown, Risk Analysis, Case Study McKinney, Wes, pandas and the DataFrame Class mean-reversion strategies, NumPy and Vectorization, Strategies Based on Mean Reversion-Generalizing the Approachbasics, Getting into the Basics-Generalizing the Approach generalizing the approach, Generalizing the Approach Python code with a class for vectorized backtesting, Momentum Backtesting Class Miniconda, Installing Miniconda-Installing Miniconda mkl (Intel Math Kernel Library), Basic Operations with Conda ML-based strategies, ML-Based Trading Strategy-Persisting the Model Objectoptimal leverage, Optimal Leverage-Optimal Leverage persisting the model object, Persisting the Model Object Python script for, Automated Trading Strategy risk analysis, Risk Analysis-Risk Analysis vectorized backtesting, Vectorized Backtesting-Vectorized Backtesting MLPClassifier, The Simple Classification Problem Revisited MLTrader class, Online Algorithm-Online Algorithm momentum strategies, Momentumbacktesting on minute bars, Backtesting a Momentum Strategy on Minute Bars-Backtesting a Momentum Strategy on Minute Bars basics, Getting into the Basics-Getting into the Basics generalizing the approach, Generalizing the Approach Python code with a class for vectorized backtesting, Momentum Backtesting Class Python script for custom streaming class, Python Script Python script for momentum online algorithm, Momentum Online Algorithm vectorized backtesting of, Strategies Based on Momentum-Generalizing the Approach MomentumTrader class, Implementing Trading Strategies in Real Time-Implementing Trading Strategies in Real Time MomVectorBacktester class, Generalizing the Approach 
monitoringautomated trading operations, Logging and Monitoring-Logging and Monitoring, Real-Time Monitoring Python scripts for strategy monitoring, Strategy Monitoring Monte Carlo simulationsample tick data server, Sample Tick Data Server time series data based on, Python Scripts motives, for trading, Algorithmic Trading MRVectorBacktester class, Generalizing the Approach multi-layer perceptron, The Simple Classification Problem Revisited Musashi, Miyamoto, Python Infrastructure N natural language processing (NLP), Retrieving Historical Unstructured Data ndarray class, Vectorization with NumPy-Vectorization with NumPy ndarray objects, NumPy and Vectorization, ndarray Methods and NumPy Functions-ndarray Methods and NumPy Functionscreating, ndarray Creation linear regression and, A Quick Review of Linear Regression regular, Regular ndarray Object nested structures, Data Structures NLP (natural language processing), Retrieving Historical Unstructured Data np.arange(), ndarray Creation numbers, data typing of, Data Types numerical operations, pandas, Numerical Operations NumPy, NumPy and Vectorization-NumPy and Vectorization, NumPy-Random NumbersBoolean operations, Boolean Operations ndarray creation, ndarray Creation ndarray methods, ndarray Methods and NumPy Functions-ndarray Methods and NumPy Functions random numbers, Random Numbers regular ndarray object, Regular ndarray Object universal functions, ndarray Methods and NumPy Functions vectorization, Vectorization with NumPy-Vectorization with NumPy vectorized operations, Vectorized Operations numpy.random sub-package, Random Numbers NYSE Arca Gold Miners Index, Getting into the Basics O Oandaaccount configuration, Configuring Oanda Account account setup, Setting Up an Account API access, The Oanda API-The Oanda API backtesting momentum strategy on minute bars, Backtesting a Momentum Strategy on Minute Bars-Backtesting a Momentum Strategy on Minute Bars CFD trading, CFD Trading with Oanda-Python Script factoring in 
leverage/margin with historical data, Factoring In Leverage and Margin-Factoring In Leverage and Margin implementing trading strategies in real time, Implementing Trading Strategies in Real Time-Implementing Trading Strategies in Real Time looking up instruments available for trading, Looking Up Instruments Available for Trading placing market orders, Placing Market Orders-Placing Market Orders Python script for custom streaming class, Python Script retrieving account information, Retrieving Account Information-Retrieving Account Information retrieving historical data, Retrieving Historical Data-Factoring In Leverage and Margin working with streaming data, Working with Streaming Data Oanda v20 RESTful API, The Oanda API, ML-Based Trading Strategy-Persisting the Model Object, Vectorized Backtesting offline algorithmdefined, Signal Generation in Real Time transformation to online algorithm, Online Algorithm OLS (ordinary least squares) regression, matplotlib online algorithmautomated trading operations, Online Algorithm-Online Algorithm defined, Signal Generation in Real Time Python script for momentum online algorithm, Momentum Online Algorithm signal generation in real time, Signal Generation in Real Time-Signal Generation in Real Time transformation of offline algorithm to, Online Algorithm .on_success() method, Implementing Trading Strategies in Real Time, Online Algorithm open data sources, Working with Open Data Sources-Working with Open Data Sources ordinary least squares (OLS) regression, matplotlib out-of-sample evaluation, Generalizing the Approach overfitting, Data Snooping and Overfitting P package manager, conda as, Conda as a Package Manager-Basic Operations with Conda pandas, pandas and the DataFrame Class-pandas and the DataFrame Class, pandas-Input-Output OperationsBoolean operations, Boolean Operations case study, Case Study-Case Study data selection, Data Selection-Data Selection DataFrame class, DataFrame Class-DataFrame Class exporting financial 
data to Excel/JSON, Exporting to Excel and JSON input-output operations, Input-Output Operations-Input-Output Operations numerical operations, Numerical Operations plotting, Plotting with pandas-Plotting with pandas reading financial data from Excel/JSON, Reading from Excel and JSON reading from a CSV file, Reading from a CSV File with pandas storing DataFrame objects, Storing DataFrame Objects-Storing DataFrame Objects vectorization, Vectorization with pandas-Vectorization with pandas password protection, for Jupyter lab, Jupyter Notebook Configuration File .place_buy_order() method, Backtesting Base Class .place_sell_order() method, Backtesting Base Class Plotlybasics, The Basics multiple real-time streams for, Three Real-Time Streams multiple sub-plots for streams, Three Sub-Plots for Three Streams streaming data as bars, Streaming Data as Bars visualization of streaming data, Visualizing Streaming Data with Plotly-Streaming Data as Bars plotting, with pandas, Plotting with pandas-Plotting with pandas .plot_data() method, Backtesting Base Class polyfit()/polyval() convenience functions, matplotlib price prediction, based on time series data, The Basic Idea for Price Prediction-The Basic Idea for Price Prediction .print_balance() method, Backtesting Base Class .print_net_wealth() method, Backtesting Base Class .print_transactions() method, Retrieving Account Information pseudo-code, Python versus, Python Versus Pseudo-Code publisher-subscriber (PUB-SUB) pattern, Working with Real-Time Data and Sockets Python (generally)advantages of, Python for Algorithmic Trading basics, Python and Algorithmic Trading-References and Further Resources control structures, Control Structures data structures, Data Structures-Data Structures data types, Data Types-Data Types deployment difficulties, Python Infrastructure idioms, Python Idioms-Python Idioms NumPy and vectorization, NumPy and Vectorization-NumPy and Vectorization obstacles to adoption in financial industry, Python for 
Finance origins, Python for Finance pandas and DataFrame class, pandas and the DataFrame Class-pandas and the DataFrame Class pseudo-code versus, Python Versus Pseudo-Code reading from a CSV file, Reading from a CSV File with Python-Reading from a CSV File with Python Python infrastructure, Python Infrastructure-References and Further Resourcesconda as package manager, Conda as a Package Manager-Basic Operations with Conda conda as virtual environment manager, Conda as a Virtual Environment Manager-Conda as a Virtual Environment Manager Docker containers, Using Docker Containers-Building a Ubuntu and Python Docker Image using cloud instances, Using Cloud Instances-Script to Orchestrate the Droplet Set Up Python scriptsautomated trading operations, Running the Code, Python Script-Strategy Monitoring backtesting base class, Backtesting Base Class custom streaming class that trades a momentum strategy, Python Script linear regression backtesting class, Linear Regression Backtesting Class long-only backtesting class, Long-Only Backtesting Class long-short backtesting class, Long-Short Backtesting Class real-time data handling, Python Scripts-Sample Data Server for Bar Plot sample time series data set, Python Scripts strategy monitoring, Strategy Monitoring uploading for automated trading operations, Uploading the Code vectorized backtesting, Python Scripts-Mean Reversion Backtesting Class Q Quandlpremium data sets, Working with Open Data Sources working with open data sources, Working with Open Data Sources-Working with Open Data Sources R random numbers, Random Numbers random walk hypothesis, Predicting Index Levels range (iterator object), Control Structures read_csv() function, Reading from a CSV File with pandas real-time data, Working with Real-Time Data and Sockets-Sample Data Server for Bar PlotPython script for handling, Python Scripts-Sample Data Server for Bar Plot signal generation in real time, Signal Generation in Real Time-Signal Generation in Real Time 
tick data client for, Connecting a Simple Tick Data Client tick data server for, Running a Simple Tick Data Server-Running a Simple Tick Data Server, Sample Tick Data Server visualizing streaming data with Plotly, Visualizing Streaming Data with Plotly-Streaming Data as Bars real-time monitoring, Real-Time Monitoring Refinitiv, Eikon Data API relative maximum drawdown, Case Study returns, predicting future, Predicting Future Returns-Predicting Future Returns risk analysis, for ML-based trading strategy, Risk Analysis-Risk Analysis RSA public/private keys, RSA Public and Private Keys .run_mean_reversion_strategy() method, Long-Only Backtesting Class, Long-Short Backtesting Class .run_simulation() method, Kelly Criterion in Binomial Setting S S&P 500, Algorithmic Trading-Algorithmic Tradinglogistic regression-based strategies and, Generalizing the Approach momentum strategies, Getting into the Basics passive long position in, Kelly Criterion for Stocks and Indices-Kelly Criterion for Stocks and Indices scatter objects, Three Real-Time Streams scientific stack, NumPy and Vectorization, Python, NumPy, matplotlib, pandas scikit-learn, Linear Regression with scikit-learn ScikitBacktester class, Generalizing the Approach-Generalizing the Approach SciPy package project, NumPy and Vectorization seaborn library, matplotlib-matplotlib simple moving averages (SMAs), pandas and the DataFrame Class, Simple Moving Averagestrading strategies based on, Strategies Based on Simple Moving Averages-Generalizing the Approach visualization with price ticks, Three Real-Time Streams .simulate_value() method, Running a Simple Tick Data Server Singer, Paul, CFD Trading with Oanda sockets, real-time data and, Working with Real-Time Data and Sockets-Sample Data Server for Bar Plot sorting list objects, Data Structures SQLite3, Storing Data with SQLite3-Storing Data with SQLite3 SSL certificate, RSA Public and Private Keys storage (see data storage) streaming bar plots, Streaming Data as Bars, 
Sample Data Server for Bar Plot streaming dataOanda and, Working with Streaming Data visualization with Plotly, Visualizing Streaming Data with Plotly-Streaming Data as Bars string objects (str), Data Types-Data Types Swiss Franc event, CFD Trading with Oanda systematic macro hedge funds, Algorithmic Trading T TensorFlow, Using Deep Learning for Market Movement Prediction, Using Deep Neural Networks to Predict Market Direction Thomas, Rob, Working with Financial Data Thorp, Edward, Capital Management tick data client, Connecting a Simple Tick Data Client tick data server, Running a Simple Tick Data Server-Running a Simple Tick Data Server, Sample Tick Data Server time series data setspandas and vectorization, Vectorization with pandas price prediction based on, The Basic Idea for Price Prediction-The Basic Idea for Price Prediction Python script for generating sample set, Python Scripts SQLite3 for storage of, Storing Data with SQLite3-Storing Data with SQLite3 TsTables for storing, Using TsTables-Using TsTables time series momentum strategies, Strategies Based on Momentum(see also momentum strategies) .to_hdf() method, Storing DataFrame Objects tpqoa wrapper package, The Oanda API, Working with Streaming Data trading platforms, factors influencing choice of, CFD Trading with Oanda trading strategies, Trading Strategies-Conclusions(see also specific strategies) implementing in real time with Oanda, Implementing Trading Strategies in Real Time-Implementing Trading Strategies in Real Time machine learning/deep learning, Machine and Deep Learning mean-reversion, NumPy and Vectorization momentum, Momentum simple moving averages, Simple Moving Averages trading, motives for, Algorithmic Trading transaction costs, Long-Only Backtesting Class, Vectorized Backtesting TsTables package, Using TsTables-Using TsTables tuple objects, Data Structures U Ubuntu, Building a Ubuntu and Python Docker Image-Building a Ubuntu and Python Docker Image universal functions, NumPy, ndarray 
Methods and NumPy Functions V v20 wrapper package, The Oanda API, ML-Based Trading Strategy-Persisting the Model Object, Vectorized Backtesting value-at-risk (VAR), Risk Analysis-Risk Analysis vectorization, NumPy and Vectorization, Strategies Based on Mean Reversion-Generalizing the Approach vectorized backtestingdata snooping and overfitting, Data Snooping and Overfitting-Conclusions ML-based trading strategy, Vectorized Backtesting-Vectorized Backtesting momentum-based trading strategies, Strategies Based on Momentum-Generalizing the Approach potential shortcomings, Building Classes for Event-Based Backtesting Python code with a class for vectorized backtesting of mean-reversion trading strategies, Momentum Backtesting Class Python scripts for, Python Scripts-Mean Reversion Backtesting Class, Linear Regression Backtesting Class regression-based strategy, Vectorized Backtesting of Regression-Based Strategy trading strategies based on simple moving averages, Strategies Based on Simple Moving Averages-Generalizing the Approach vectorization with NumPy, Vectorization with NumPy-Vectorization with NumPy vectorization with pandas, Vectorization with pandas-Vectorization with pandas vectorized operations, Vectorized Operations virtual environment management, Conda as a Virtual Environment Manager-Conda as a Virtual Environment Manager W while loops, Control Structures Z ZeroMQ, Working with Real-Time Data and Sockets About the Author Dr.

The tick data is resampled to 30-second intervals (by taking the last value and the sum, respectively), which is reflected in the DatetimeIndex of the new DataFrame object. Retrieving Historical Unstructured Data A major strength of working with the Eikon API via Python is the easy retrieval of unstructured data, which can then be parsed and analyzed with Python packages for natural language processing (NLP). Such a procedure is as simple and straightforward as it is for financial time series data. The code that follows retrieves, for a fixed time interval, news headlines that include Apple Inc. as a company and “MacBook” as a keyword. At most, the five most recent hits are displayed: In [53]: headlines = ek.get_news_headlines(query='R:AAPL.O macbook', count=5, date_from='2020-4-1', date_to='2020-5-1') In [54]: headlines Out[54]: versionCreated \ 2020-04-20 21:33:37.332 2020-04-20 21:33:37.332000+00:00 2020-04-20 10:20:23.201 2020-04-20 10:20:23.201000+00:00 2020-04-20 02:32:27.721 2020-04-20 02:32:27.721000+00:00 2020-04-15 12:06:58.693 2020-04-15 12:06:58.693000+00:00 2020-04-09 21:34:08.671 2020-04-09 21:34:08.671000+00:00 text \ 2020-04-20 21:33:37.332 Apple said to launch new AirPods, MacBook Pro ... 2020-04-20 10:20:23.201 Apple might launch upgraded AirPods, 13-inch M... 2020-04-20 02:32:27.721 Apple to reportedly launch new AirPods alongsi... 2020-04-15 12:06:58.693 Apple files a patent for iPhones, MacBook indu... 2020-04-09 21:34:08.671 Apple rolls out new software update for MacBoo... 
storyId \ 2020-04-20 21:33:37.332 urn:newsml:reuters.com:20200420:nNRAble9rq:1 2020-04-20 10:20:23.201 urn:newsml:reuters.com:20200420:nNRAbl8eob:1 2020-04-20 02:32:27.721 urn:newsml:reuters.com:20200420:nNRAbl4mfz:1 2020-04-15 12:06:58.693 urn:newsml:reuters.com:20200415:nNRAbjvsix:1 2020-04-09 21:34:08.671 urn:newsml:reuters.com:20200409:nNRAbi2nbb:1 sourceCode 2020-04-20 21:33:37.332 NS:TIMIND 2020-04-20 10:20:23.201 NS:BUSSTA 2020-04-20 02:32:27.721 NS:HINDUT 2020-04-15 12:06:58.693 NS:HINDUT 2020-04-09 21:34:08.671 NS:TIMIND In [55]: story = headlines.iloc[0] In [56]: story Out[56]: versionCreated 2020-04-20 21:33:37.332000+00:00 text Apple said to launch new AirPods, MacBook Pro ... storyId urn:newsml:reuters.com:20200420:nNRAble9rq:1 sourceCode NS:TIMIND Name: 2020-04-20 21:33:37.332000, dtype: object In [57]: news_text = ek.get_news_story(story['storyId']) In [58]: from IPython.display import HTML In [59]: HTML(news_text) Out[59]: <IPython.core.display.HTML object> NEW DELHI: Apple recently launched its much-awaited affordable smartphone iPhone SE.
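The 30-second resampling mentioned above (last value for prices, sum for volumes) can be sketched with pandas. This is a minimal, self-contained illustration using made-up tick data, not the Eikon fields or the book's actual code:

```python
import pandas as pd

# Hypothetical tick data: irregularly spaced timestamps with a price and a volume.
idx = pd.to_datetime(
    ['2020-05-01 10:00:01', '2020-05-01 10:00:12', '2020-05-01 10:00:25',
     '2020-05-01 10:00:40', '2020-05-01 10:00:55'])
ticks = pd.DataFrame({'PRICE': [100.0, 100.5, 100.2, 100.8, 101.0],
                      'VOLUME': [10, 20, 15, 5, 25]}, index=idx)

# Resample to 30-second bars: last traded price per bar, summed volume per bar.
# The result has a regular DatetimeIndex with 30-second spacing.
bars = ticks.resample('30s').agg({'PRICE': 'last', 'VOLUME': 'sum'})
print(bars)
```

The aggregation dictionary passed to `.agg()` is what implements "the last value and the sum, respectively" for the two columns.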


pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzić

Albert Einstein, algorithmic bias, backpropagation, bioinformatics, business cycle, business intelligence, business process, butter production in bangladesh, combinatorial explosion, computer vision, conceptual framework, correlation coefficient, correlation does not imply causation, data acquisition, discrete time, El Camino Real, fault tolerance, finite state, Gini coefficient, information retrieval, Internet Archive, inventory management, iterative process, knowledge worker, linked data, loose coupling, Menlo Park, natural language processing, Netflix Prize, NP-complete, PageRank, pattern recognition, peer-to-peer, phenotype, random walk, RFID, semantic web, speech recognition, statistical model, Telecommunications Act of 1996, telemarketer, text mining, traveling salesman, web application

In fact, recent studies indicate that 80% of a company’s information is contained in text documents. Text mining, however, is also a much more complex task than traditional data mining, as it involves dealing with unstructured text data that are inherently ambiguous. Text mining is a multidisciplinary field involving information retrieval (IR), text analysis, information extraction, natural language processing, clustering, categorization, visualization, machine learning, and other methodologies already on the data-mining “menu”; even some recently developed techniques specific to semi-structured data can be included in this field. Market research, business-intelligence gathering, e-mail management, claim analysis, e-procurement, and automated help desks are only a few of the applications where text mining can be deployed successfully.

On the other hand, polysemes are words that have multiple meanings. The term “bank” could mean a financial system, to rely upon, or a type of basketball shot. All of these lead to very different types of documents, which can be problematic for document comparisons. LSA attempts to solve these problems, not with extensive dictionaries and natural language processing engines, but by using mathematical patterns within the data itself to uncover these relationships. We do this by reducing the number of dimensions used to represent a document using a mathematical matrix operation called singular value decomposition (SVD). Let us take a look at an example data set.
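The book's example data set is not reproduced in this excerpt, so here is a minimal sketch of the SVD step behind LSA, using a made-up 5-term × 4-document count matrix. The term names and topic split are invented for illustration:

```python
import numpy as np

# Hypothetical term-document count matrix (terms in rows, documents in columns).
# Documents 0-1 are about finance, 2-3 about basketball; "bank" occurs in both.
A = np.array([
    [1, 1, 1, 0],   # "bank"  (polyseme: appears in both topics)
    [2, 1, 0, 0],   # "money"
    [1, 2, 0, 0],   # "loan"
    [0, 0, 2, 1],   # "shot"
    [0, 0, 1, 2],   # "hoop"
], dtype=float)

# Singular value decomposition: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only k = 2 latent dimensions: each document becomes a 2-d vector.
k = 2
docs = np.diag(s[:k]) @ Vt[:k]          # shape (k, n_documents)

def cos(u, v):
    """Cosine similarity between two vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cos(docs[:, 0], docs[:, 1]))  # same topic: relatively high
print(cos(docs[:, 0], docs[:, 2]))  # different topic: relatively low
```

Comparing documents in the reduced space, rather than on raw counts, is what lets LSA group documents by latent topic even when they share ambiguous terms such as "bank".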

For example, with the MM shown in Figure 12.21, the probability that the MM takes the horizontal path from the starting node to S2 is 0.4 × 0.7 = 0.28. Figure 12.21. A simple Markov Model. An MM rests on the memoryless assumption, which states that, given the current state of the system, its future evolution is independent of its history. MMs have been used widely in speech recognition and natural language processing. The Hidden Markov Model (HMM) is an extension of the MM. Like an MM, an HMM consists of a set of states and transition probabilities. In a regular MM, the states are visible to the observer, and the state-transition probabilities are the only parameters. In an HMM, each state is additionally associated with a probability distribution over the possible observations.
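Under the memoryless assumption, the probability of a particular state sequence is simply the product of the one-step transition probabilities along it. Figure 12.21 itself is not reproduced here, so the transition matrix below is an assumption; only the values 0.4 and 0.7 (and their product 0.28) come from the text:

```python
import numpy as np

# Assumed 3-state chain: Start -> S1 -> S2 along the "horizontal path",
# with P(Start -> S1) = 0.4 and P(S1 -> S2) = 0.7 as in the text.
# The remaining probability mass in each row is distributed arbitrarily.
states = {'Start': 0, 'S1': 1, 'S2': 2}
P = np.array([
    [0.0, 0.4, 0.6],   # from Start
    [0.0, 0.3, 0.7],   # from S1
    [0.0, 0.0, 1.0],   # S2 absorbing (illustrative)
])

def path_prob(P, path):
    """Probability of a given state sequence: the product of the
    one-step transition probabilities (memoryless assumption)."""
    p = 1.0
    for a, b in zip(path[:-1], path[1:]):
        p *= P[a, b]
    return p

p = path_prob(P, [states['Start'], states['S1'], states['S2']])
print(round(p, 2))  # 0.28
```

In an HMM the same path probability would additionally be multiplied by the emission probability of the observed symbol at each visited state.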


pages: 391 words: 105,382

Utopia Is Creepy: And Other Provocations by Nicholas Carr

Abraham Maslow, Air France Flight 447, Airbnb, Airbus A320, AltaVista, Amazon Mechanical Turk, augmented reality, autonomous vehicles, Bernie Sanders, book scanning, Brewster Kahle, Buckminster Fuller, Burning Man, Captain Sullenberger Hudson, centralized clearinghouse, Charles Lindbergh, cloud computing, cognitive bias, collaborative consumption, computer age, corporate governance, CRISPR, crowdsourcing, Danny Hillis, data science, deskilling, digital capitalism, digital map, disruptive innovation, Donald Trump, driverless car, Electric Kool-Aid Acid Test, Elon Musk, Evgeny Morozov, factory automation, failed state, feminist movement, Frederick Winslow Taylor, friendly fire, game design, global village, Google bus, Google Glasses, Google X / Alphabet X, Googley, hive mind, impulse control, indoor plumbing, interchangeable parts, Internet Archive, invention of movable type, invention of the steam engine, invisible hand, Isaac Newton, Jeff Bezos, jimmy wales, Joan Didion, job automation, John Perry Barlow, Kevin Kelly, Larry Ellison, Lewis Mumford, lifelogging, lolcat, low skilled workers, machine readable, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Max Levchin, means of production, Menlo Park, mental accounting, natural language processing, Neal Stephenson, Network effects, new economy, Nicholas Carr, Nick Bostrom, Norman Mailer, off grid, oil shale / tar sands, Peter Thiel, plutocrats, profit motive, Ralph Waldo Emerson, Ray Kurzweil, recommendation engine, Republic of Letters, robot derives from the Czech word robota Czech, meaning slave, Ronald Reagan, scientific management, self-driving car, SETI@home, side project, Silicon Valley, Silicon Valley ideology, Singularitarianism, Snapchat, social graph, social web, speech recognition, Startup school, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, technoutopianism, TED Talk, the long tail, the medium is the message, theory of mind, Turing test, Tyler Cowen, Whole Earth Catalog, Y Combinator, Yochai 
Benkler

Google’s conception of searching has changed since those early days, and that means our own idea of what it means to search is changing as well. Google’s goal is no longer to read the web. It’s to read us. Ray Kurzweil, the inventor and AI speculator, recently joined the company as a director of engineering. His general focus will be on machine learning and natural language processing. But his particular concern will entail reconfiguring the company’s search engine to focus not outwardly on the world but inwardly on the user. “I envision some years from now that the majority of search queries will be answered without you actually asking,” he recently explained. “It’ll just know this is something that you’re going to want to see.”

“The speaking of language is part of an activity, or of a form of life,” wrote Wittgenstein in Philosophical Investigations. If human language is bound up in living, if it is an expression of both sense and sensibility, then computers, being nonliving, having no sensibility, will have a very difficult time mastering “natural-language processing” beyond a certain rudimentary level. The best solution, if you have a need to get computers to “understand” human communication, may be to avoid the problem altogether. Instead of figuring out how to get computers to understand natural language, you get people to speak artificial language, the language of computers.

., 226 video games and, 94–97 Merholz, Peter, 21 Merleau-Ponty, Maurice, 300 Merton, Robert, 12–13 message-automation service, 167 Meyer, Stephenie, 50 Meyerowitz, Joanne, 338 microfilm, microphotography, 267 Microsoft, 108, 168, 205, 284 military technology, 331–32 Miller, Perry, xvii mindfulness, 162 Minima Moralia (Adorno), 153–54 mirrors, 138–39 Mitchell, Joni, 128 Mollie (video poker player), 218–19 monitoring: corporate control through, 163–65 of thoughts, 214–15 through wearable behavior-modification devices, 168–69 Montaigne, Michel de, 247, 249, 252, 254 Moore, Geoffrey, 209 Morlocks, 114, 186 “Morphological Basis of the Arm-to-Wing Transition, The” (Poore), 329–30 Morrison, Ewan, 288 Morrison, Jim, 126 Morse code, 34 “Most of It, The” (Frost), 145–46 motor skills, video games and, 93–94 “Mowing” (Frost), 296–300, 302, 304–5 MP3 players, 122, 123, 124, 216, 218, 293 multitasking, media, 96–97 Mumford, Lewis, 138–39, 235 Murdoch, Rupert and Wendi, 131 music: bundling of, 41–46 commercial use of, 244–45 copying and sharing technologies for, 121–26, 314 digital revolution in, 293–95 fidelity of, 124 listening vs. interface in, 216–18, 293 in participatory games, 71–72 streamed and curated, 207, 217–18 music piracy, 121–26 Musings on Human Metamorphoses (Leary), 171 Musk, Elon, 172 Musset, Alfred de, xxiii Muzak, 208, 244 MySpace, xvi, 10–11, 30–31 “Names of the Hare, The,” 201 nanotechnology, 69 Napster, 122, 123 narcissism, 138–39 Twitter and, 34–36 narrative emotions, 250 natural-language processing, 215 Negroponte, Nicholas, xx neobehavioralism, 212–13 Netflix, 92 neural networks, 136–37 neuroengineering, 332–33 New Critics, 249 News Feed, 320 news media, 318–20 newspapers: evolution of, 79, 237 online archives of, 47–48, 190–92 online vs. 
printed, 289 Newton, Isaac, 66 New York Public Library, 269 New York Times, 8, 71, 83, 133, 152–53, 195, 237, 283, 314, 342 erroneous information revived by, 47–48 on Twitter, 35 Nielsen Company, 80–81 Nietzsche, Friedrich, 126, 234–35, 237 Nightingale, Paul, 335 Nixon, Richard, 317 noise pollution, 243–46 Nook, 257 North of Boston (Frost), 297 nostalgia, 202, 204, 312 in music, 292–95 Now You See It (Davidson), 94 Oates, Warren, 203 Oatley, Keith, 248–50 Obama, Barack, 314 obsession, 218–19 OCLC, 276 “off grid,” 52 Olds, James, 235 O’Neill, Gerard, 171 One Infinite Loop, 76 Ong, Walter, 129 online aggregation, 192 On Photography (Sontag), xx open networks, profiteering from, 83–85 open-source projects, 5–7, 26 Oracle, 17 orchises, 305 O’Reilly, Tim, 3–5, 7 organ donation and transplantation, 115 ornithopters, 239 orphan books, 276, 277 Overture, 279–80 Owad, Tom, 256 Oxford Junior Dictionary, 201–2 Oxford University, library of, 269 Page, Larry, 23, 160, 172, 239, 268–69, 270, 279, 281–85 personal style of, 16–17, 281–82, 285 paint-by-number kits, 71–72 Paley, William, 43 Palfrey, John, 272–74, 277 Palmisano, Sam, 26 “pancake people,” 242 paper, invention and uses of, 286–89 Paper: An Elegy (Sansom), 287 Papert, Seymour, 134 Paradise within the Reach of All Men, The (Etzler), xvi–xvii paradox of time, 203–4 parenting: automation of, 181 of virtual child, 73–75 Parker, Sarah Jessica, 131 participation: “cognitive surplus” in, 59 as content and performance, 184 inclusionists vs. 
deletionists in, 18–20 internet, 28–29 isolation and, 35–36, 184 limits and flaws of, 5–7, 62 Paul, Rand, 314 Pendragon, Caliandras (avatar), 25 Pentland, Alex, 212–13 perception, spiritual awakening of, 300–301 personalization, 11 of ads, 168, 225, 264 isolation and, 29 loss of autonomy in, 264–66 manipulation through, 258–59 in message automation, 167 in searches, 145–46, 264–66 of streamed music, 207–9, 245 tailoring in, 92, 224 as threat to privacy, 255 Phenomenology of Perception (Merleau-Ponty), 300 Philosophical Investigations (Wittgenstein), 215 phonograph, phonograph records, 41–46, 133, 287 photography, technological advancement in, 311–12 Pichai, Sundar, 181 Pilgrims, 172 Pinterest, 119, 186 playlists, 314 PlayStation, 260 “poetic faith,” 251 poetry, 296–313 polarization, 7 politics, transformed by technology, 314–20 Politics (Aristotle), 307–8 Poore, Samuel O., 329–30 pop culture, fact-mongering in, 58–62 pop music, 44–45, 63–64, 224 copying technologies for, 121–26 dead idols of, 126 industrialization of, 208–9 as retrospective and revivalist, 292–95 positivism, 211 Potter, Dean, 341–42 power looms, 178 Presley, Elvis, 11, 126 Prim Revolution, 26 Principles of Psychology (James), 203 Principles of Scientific Management, The (Taylor), 238 printing press: consequences of, 102–3, 234, 240–41, 271 development of, 53, 286–87 privacy: devaluation of, 258 from electronic surveillance, 52 family cohesion vs., 229 free flow of information vs. 
right to, 190–94 internet threat to, 184, 255–59, 265, 285 safeguarding of, 258–59, 283 vanity vs., 107 proactive cognitive control, 96 Prochnik, George, 243–46 “Productivity Future Vision (2011),” 108–9 Project Gutenberg, 278 prosperity, technologies of, 118, 119–20 prosumerism, 64 protest movements, 61 Proust and the Squid (Wolf), 234 proximal clues, 303 public-domain books, 277–78 “public library,” debate over use of term, 272–74 punch-card tabulator, 188 punk music, 63–64 Quantified Self Global Conference, 163 Quantified Self (QS) movement, 163–65 Quarter-of-a-Second Rule, 205 racecars, 195, 196 radio: in education, 134 evolution of, 77, 79, 159, 288 as music medium, 45, 121–22, 207 political use of, 315–16, 317, 319 Radosh, Daniel, 71 Rapp, Jen, 341–42 reactive cognitive control, 96 Readers’ Guide to Periodical Literature, 91 reading: brain function in, 247–54, 289–90 and invention of paper, 286–87 monitoring of, 257 video gaming vs., 261–62 see also books reading skills, changes in, 232–34, 240–41 Read Write Web (blog), 30 Reagan, Ronald, 315 real world: digital media intrusion in, 127–30 perceived as boring and ugly, 157–58 as source of knowledge, 313 virtual world vs., xx–xxi, 36, 62, 127–30, 303–4 reconstructive surgery, 239 record albums: copying of, 121–22 jackets for, 122, 224 technology of, 41–46 Redding, Otis, 126 Red Light Center, 39 Reichelt, Franz, 341 Reid, Rob, 122–25 relativists, 20 religion: internet perceived as, 3–4, 238 for McLuhan, 105 technology viewed as, xvi–xvii Republic of Letters, 271 reputations, tarnishing of, 47–48, 190–94 Resident Evil, 260–61 resource sharing, 148–49 resurrection, 69–70, 126 retinal implants, 332 Retromania (Reynolds), 217, 292–95 Reuters, Adam, 26 Reuters’ SL bureau, 26 revivification machine, 69–70 Reynolds, Simon, 217–18, 292–95 Rice, Isaac, 244 Rice, Julia Barnett, 243–44 Richards, Keith, 42 “right to be forgotten” lawsuit, 190–94 Ritalin, 304 robots: control of, 303 creepy quality of, 108 human beings 
compared to, 242 human beings replaced by, 112, 174, 176, 195, 197, 306–7, 310 limitations of, 323 predictions about, xvii, 177, 331 replaced by humans, 323 threat from, 226, 309 Rogers, Roo, 83–84 Rolling Stones, 42–43 Roosevelt, Franklin, 315 Rosen, Nick, 52 Rubio, Marco, 314 Rumsey, Abby Smith, 325–27 Ryan, Amy, 273 Sandel, Michael J., 340 Sanders, Bernie, 314, 316 Sansom, Ian, 287 Savage, Jon, 63 scatology, 147 Schachter, Joshua, 195 Schivelbusch, Wolfgang, 229 Schmidt, Eric, 13, 16, 238, 239, 257, 284 Schneier, Bruce, 258–59 Schüll, Natasha Dow, 218 science fiction, 106, 115, 116, 150, 309, 335 scientific management, 164–65, 237–38 Scrapbook in American Life, The, 185 scrapbooks, social media compared to, 185–86 “Scrapbooks as Cultural Texts” (Katriel and Farrell), 186 scythes, 302, 304–6 search-engine-optimization (SEO), 47–48 search engines: allusions sought through, 86 blogging, 66–67 in centralization of internet, 66–69 changing use of, 284 customizing by, 264–66 erroneous or outdated stories revived by, 47–48, 190–94 in filtering, 91 placement of results by, 47–48, 68 searching vs., 144–46 targeting information through, 13–14 writing tailored to, 89 see also Google searching, ontological connotations of, 144–46 Seasteading Institute, 172 Second Life, 25–27 second nature, 179 self, technologies of the, 118, 119–20 self-actualization, 120, 340 monitoring and quantification of, 163–65 selfies, 224 self-knowledge, 297–99 self-reconstruction, 339 self-tracking, 163–65 Selinger, Evan, 153 serendipity, internet as engine of, 12–15 SETI@Home, 149 sexbots, 55 Sex Pistols, 63 sex-reassignment procedures, 337–38 sexuality, 10–11 virtual, 39 Shakur, Tupac, 126 sharecropping, as metaphor for social media, 30–31 Shelley, Percy Bysshe, 88 Shirky, Clay, 59–61, 90, 241 Shop Class as Soulcraft (Crawford), 265 Shuster, Brian, 39 sickles, 302 silence, 246 Silicon Valley: American culture transformed by, xv–xxii, 148, 155–59, 171–73, 181, 241, 257, 309 commercial interests 
of, 162, 172, 214–15 informality eschewed by, 197–98, 215 wealthy lifestyle of, 16–17, 195 Simonite, Tom, 136–37 simulation, see virtual world Singer, Peter, 267 Singularity, Singularitarians, 69, 147 sitcoms, 59 situational overload, 90–92 skimming, 233 “Slaves to the Smartphone,” 308–9 Slee, Tom, 61, 84 SLExchange, 26 slot machines, 218–19 smart bra, 168–69 smartphones, xix, 82, 136, 145, 150, 158, 168, 170, 183–84, 219, 274, 283, 287, 308–9, 315 Smith, Adam, 175, 177 Smith, William, 204 Snapchat, 166, 205, 225, 316 social activism, 61–62 social media, 224 biases reinforced by, 319–20 as deceptively reflective, 138–39 documenting one’s children on, 74–75 economic value of content on, 20–21, 53–54, 132 emotionalism of, 316–17 evolution of, xvi language altered by, 215 loom as metaphor for, 178 maintaining one’s microcelebrity on, 166–67 paradox of, 35–36, 159 personal information collected and monitored through, 257 politics transformed by, 314–20 scrapbooks compared to, 185–86 self-validation through, 36, 73 traditional media slow to adapt to, 316–19 as ubiquitous, 205 see also specific sites social organization, technologies of, 118, 119 Social Physics (Pentland), 213 Society for the Suppression of Unnecessary Noise, 243–44 sociology, technology and, 210–13 Socrates, 240 software: autonomous, 187–89 smart, 112–13 solitude, media intrusion on, 127–30, 253 Songza, 207 Sontag, Susan, xx SoundCloud, 217 sound-management devices, 245 soundscapes, 244–45 space travel, 115, 172 spam, 92 Sparrow, Betsy, 98 Special Operations Command, U.S., 332 speech recognition, 137 spermatic, as term applied to reading, 247, 248, 250, 254 Spinoza, Baruch, 300–301 Spotify, 293, 314 “Sprite Sips” (app), 54 Squarciafico, Hieronimo, 240–41 Srinivasan, Balaji, 172 Stanford Encyclopedia of Philosophy, 68 Starr, Karla, 217–18 Star Trek, 26, 32, 313 Stengel, Rick, 28 Stephenson, Neal, 116 Sterling, Bruce, 113 Stevens, Wallace, 158 Street View, 137, 283 Stroop test, 98–99 Strummer, Joe, 63–64 
Studies in Classic American Literature (Lawrence), xxiii Such Stuff as Dreams (Oatley), 248–49 suicide rate, 304 Sullenberger, Sully, 322 Sullivan, Andrew, xvi Sun Microsystems, 257 “surf cams,” 56–57 surfing, internet, 14–15 surveillance, 52, 163–65, 188–89 surveillance-personalization loop, 157 survival, technologies of, 118, 119 Swing, Edward, 95 Talking Heads, 136 talk radio, 319 Tan, Chade-Meng, 162 Tapscott, Don, 84 tattoos, 336–37, 340 Taylor, Frederick Winslow, 164, 237–38 Taylorism, 164, 238 Tebbel, John, 275 Technics and Civilization (Mumford), 138, 235 technology: agricultural, 305–6 American culture transformed by, xv–xxii, 148, 155–59, 174–77, 214–15, 229–30, 296–313, 329–42 apparatus vs. artifact in, 216–19 brain function affected by, 231–42 duality of, 240–41 election campaigns transformed by, 314–20 ethical hazards of, 304–11 evanescence and obsolescence of, 327 human aspiration and, 329–42 human beings eclipsed by, 108–9 language of, 201–2, 214–15 limits of, 341–42 master-slave metaphor for, 307–9 military, 331–32 need for critical thinking about, 311–13 opt-in society run by, 172–73 progress in, 77–78, 188–89, 229–30 risks of, 341–42 sociology and, 210–13 time perception affected by, 203–6 as tool of knowledge and perception, 299–304 as transcendent, 179–80 Technorati, 66 telegrams, 79 telegraph, Twitter compared to, 34 telephones, 103–4, 159, 288 television: age of, 60–62, 79, 93, 233 and attention disorders, 95 in education, 134 Facebook ads on, 155–56 introduction of, 103–4, 159, 288 news coverage on, 318 paying for, 224 political use of, 315–16, 317 technological adaptation of, 237 viewing habits for, 80–81 Teller, Astro, 195 textbooks, 290 texting, 34, 73, 75, 154, 186, 196, 205, 233 Thackeray, William, 318 “theory of mind,” 251–52 Thiel, Peter, 116–17, 172, 310 “Things That Connect Us, The” (ad campaign), 155–58 30 Days of Night (film), 50 Thompson, Clive, 232 thought-sharing, 214–15 “Three Princes of Serendip, The,” 12 Thurston, Baratunde, 
153–54 time: memory vs., 226 perception of, 203–6 Time, covers of, 28 Time Machine, The (Wells), 114 tools: blurred line between users and, 333 ethical choice and, 305 gaining knowledge and perception through, 299–304 hand vs. computer, 306 Home and Away blurred by, 159 human agency removed from, 77 innovation in, 118 media vs., 226 slave metaphor for, 307–8 symbiosis with, 101 Tosh, Peter, 126 Toyota Motor Company, 323 Toyota Prius, 16–17 train disasters, 323–24 transhumanism, 330–40 critics of, 339–40 transparency, downside of, 56–57 transsexuals, 337–38 Travels and Adventures of Serendipity, The (Merton and Barber), 12–13 Trends in Biochemistry (Nightingale and Martin), 335 TripAdvisor, 31 trolls, 315 Trump, Donald, 314–18 “Tuft of Flowers, A” (Frost), 305 tugboats, noise restrictions on, 243–44 Tumblr, 166, 185, 186 Turing, Alan, 236 Turing Test, 55, 137 Twain, Mark, 243 tweets, tweeting, 75, 131, 315, 319 language of, 34–36 theses in form of, 223–26 “tweetstorm,” xvii 20/20, 16 Twilight Saga, The (Meyer), 50 Twitter, 34–36, 64, 91, 119, 166, 186, 197, 205, 223, 224, 257, 284 political use of, 315, 317–20 2001: A Space Odyssey (film), 231, 242 Two-Lane Blacktop (film), 203 “Two Tramps in Mud Time” (Frost), 247–48 typewriters, writing skills and, 234–35, 237 Uber, 148 Ubisoft, 261 Understanding Media (McLuhan), 102–3, 106 underwearables, 168–69 unemployment: job displacement in, 164–65, 174, 310 in traditional media, 8 universal online library, 267–78 legal, commercial, and political obstacles to, 268–71, 274–78 universe, as memory, 326 Urban Dictionary, 145 utopia, predictions of, xvii–xviii, xx, 4, 108–9, 172–73 Uzanne, Octave, 286–87, 290 Vaidhyanathan, Siva, 277 vampires, internet giants compared to, 50–51 Vampires (game), 50 Vanguardia, La, 190–91 Van Kekerix, Marvin, 134 vice, virtual, 39–40 video games, 223, 245, 303 as addictive, 260–61 cognitive effects of, 93–97 crafting of, 261–62 violent, 260–62 videos, viewing of, 80–81 virtual child, tips for 
raising a, 73–75 virtual world, xviii commercial aspects of, 26–27 conflict enacted in, 25–27 language of, 201–2 “playlaborers” of, 113–14 psychological and physical health affected by, 304 real world vs., xx–xxi, 36, 62, 127–30 as restrictive, 303–4 vice in, 39–40 von Furstenberg, Diane, 131 Wales, Jimmy, 192 Wallerstein, Edward, 43–44 Wall Street, automation of, 187–88 Wall Street Journal, 8, 16, 86, 122, 163, 333 Walpole, Horace, 12 Walters, Barbara, 16 Ward, Adrian, 200 Warhol, Andy, 72 Warren, Earl, 255, 257 “Waste Land, The” (Eliot), 86, 87 Watson (IBM computer), 147 Wealth of Networks, The (Benkler), xviii “We Are the Web” (Kelly), xxi, 4, 8–9 Web 1.0, 3, 5, 9 Web 2.0, xvi, xvii, xxi, 33, 58 amorality of, 3–9, 10 culturally transformative power of, 28–29 Twitter and, 34–35 “web log,” 21 Wegner, Daniel, 98, 200 Weinberger, David, 41–45, 277 Weizenbaum, Joseph, 236 Wells, H.


pages: 223 words: 60,909

Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech by Sara Wachter-Boettcher

"Susan Fowler" uber, Abraham Maslow, Airbnb, airport security, algorithmic bias, AltaVista, big data - Walmart - Pop Tarts, Big Tech, Black Lives Matter, data science, deep learning, Donald Trump, fake news, false flag, Ferguson, Missouri, Firefox, Grace Hopper, Greyball, Hacker News, hockey-stick growth, independent contractor, job automation, Kickstarter, lifelogging, lolcat, Marc Benioff, Mark Zuckerberg, Max Levchin, Menlo Park, meritocracy, microaggression, move fast and break things, natural language processing, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, off-the-grid, pattern recognition, Peter Thiel, real-name policy, recommendation engine, ride hailing / ride sharing, Salesforce, self-driving car, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, Snapchat, Steve Jobs, Tactical Technology Collective, TED Talk, Tim Cook: Apple, Travis Kalanick, upwardly mobile, Wayback Machine, women in the workforce, work culture, zero-sum game

In 2013, Google researchers trained a system to comb through Google News articles, parsing huge amounts of text and identifying patterns in how words are used within them. The result is Word2vec, a neural network made up of 3 million word embeddings, or semantic relationships between words. What Word2vec does is essentially reconstruct the way words work linguistically, in order to improve capabilities for natural language processing: the practice of teaching machines to understand human language as it’s spoken or written day to day—the kind of thing that allows Siri or a search engine to understand what you mean and provide an answer. Word2vec and other similar word-embedding systems do this by looking at how frequently pairs of words appear in the same text, and how near each other they appear.
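The co-occurrence signal the passage describes — how often word pairs appear near each other — can be counted directly. A minimal sketch in Python (the toy sentence and window size are illustrative assumptions, not from the book; Word2vec compresses counts like these into dense vectors rather than storing them raw):

```python
from collections import Counter

def cooccurrence_counts(tokens, window=2):
    """Count how often each pair of words appears within `window`
    positions of each other -- the raw signal that word-embedding
    systems such as Word2vec compress into dense vectors."""
    counts = Counter()
    for i, word in enumerate(tokens):
        # Only look forward; sorting the pair makes it order-independent.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pair = tuple(sorted((word, tokens[j])))
            counts[pair] += 1
    return counts

text = "the king rules the land and the queen rules the court"
counts = cooccurrence_counts(text.split(), window=2)
print(counts[("rules", "the")])  # → 4
```

Words that share many such contexts ("king" and "queen", say) end up with similar vectors, which is what lets downstream systems treat them as semantically related.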

See also marginalized populations and companies’ collection of gender information, 62–64 and companies’ name policies, 54–55, 58 and edge cases, 38 and Etsy, 32–33 importance of tech to, 195–197 and normalizing TV programming, 48 and same-sex marriage, 196–198 and Milos Yiannopoulos, 153 Lil Miss Hot Mess, 55 location tracking, 105–108 Lone Hill, Dana, 54 McAdoo, Greg, 175 McBride, Sarah, 175 machine-learning products, 121, 128, 132, 135, 136, 140, 146 Mack, Arien, 95 McKesson, DeRay, 81 Mad Money (TV show), 158 MailChimp, 89–90 marginalized populations and default settings, 37, 66 and digital forms, 51, 61, 72, 75 and digital products’ personal data collection, 116–117 importance of tech to, 195–197 market negging, and opt-ins, 91–92, 97 Martin, Erik, 162–163 Martin, Trayvon, 141 Martinez, Chris, 30 Maslow, Abraham, 3 maternity policies, 16 MAUs (monthly active users) metric, 74, 97–98 May, Rob, 139 Mayer, Marissa, 143 Medium publishing platform, 87–88, 180 menstrual cycle tracking apps, 28–33 Mental Models (Young), 46 meritocracy and ethics, 176, 189 tech industry as, 173–177, 180 Uber as, 180 Messer, Madeline, 35, 37 metadata from emails, 102 Meyer, Eric, 4–5, 40, 64, 79, 82, 89, 96 Meyer, Rebecca, 4–5, 5 microaggressions, 70–73 Microsoft, 6, 36–37 Miley, Leslie, 158 misplaced celebrations and humor, 78–85, 87–90, 114–115, 200 Moments Facebook feature, 85, 97 monoculture, tech industry as, 188–189 monthly active users (MAUs) metric, 74, 97–98 Mosseri, Adam, 168 Mozilla, 102 multiracial populations, and form field design, 60–62 mystification of tech, 9, 11–12, 26, 143, 188, 191–193, 199 National Public Radio (NPR), 1, 40–44 National Security Agency (NSA), 102 National Suicide Prevention Lifeline, 6 Native Americans, Facebook’s rejection of names of, 53–57 natural language processing, 138 negging, 91–92, 97 Neighbors for Racial Justice, 69 Netflix, 144 neural networks, 131–133, 138 News Feed Facebook feature, 144, 168–169 Nextdoor app, 67–71, 71, 73–75 Noble, 
Safiya, 10, 113 non-binary people. See LGBTQ community Northpointe, 120, 125–127 Note to Self podcast, 130, 171 Nye, Bill, 1 Ohanian, Alexis, 161, 164 O’Neil, Cathy, 112, 126 online time, growth of Americans, 1–3 On This Day Facebook feature, 83–84, 97 opt-in pop-ups, 90–92, 97 oversight, tech industry’s desire to avoid, 187–189, 199 Page, Shirley, 133 Palantir, 199–200 Pancake, Beth, 57 Pao, Ellen, 162 Parker, Bernard, 119–120 PayPal, 175 Penny, Laurie, 153 personal data and algorithmic systems, 145 collected during mobile usage, 116–117 and data brokers, 101–104 digital products designed to collect, 105–117 tech industry’s responsibility for, 146 value of, 96–98 personalization of online content, 86–90, 99 personal names, digital forms’ problems with, 40, 52–59, 71–72, 75 personas, 27–33, 29, 44–47, 110 Phillips, Katherine W., 184–186 photo autotagging, 129–130, 129, 130, 132–133, 135–138, 145 pickup artist (PUA) community, 91–92 Pinterest, 42 political bias, and Trending Facebook feature, 165–167, 169 Practical Empathy (Young), 46 privacy and digital products’ collection of personal data, 115, 117 and Facebook, 108–109 and Google, 109 and Uber, 107–108 ProPublica, 103, 112–113, 120, 126–127 proxy data, 109–114 PUA (pickup artist) community, 91–92 PureGym, 6 push notifications, 198 Quantified Self movement, 28 queer people.


pages: 282 words: 63,385

Attention Factory: The Story of TikTok and China's ByteDance by Matthew Brennan

Airbnb, AltaVista, augmented reality, Benchmark Capital, Big Tech, business logic, Cambridge Analytica, computer vision, coronavirus, COVID-19, deep learning, Didi Chuxing, Donald Trump, en.wikipedia.org, fail fast, Google X / Alphabet X, growth hacking, ImageNet competition, income inequality, invisible hand, Kickstarter, Mark Zuckerberg, Menlo Park, natural language processing, Netflix Prize, Network effects, paypal mafia, Pearl River Delta, pre–internet, recommendation engine, ride hailing / ride sharing, Sheryl Sandberg, Silicon Valley, Snapchat, social graph, Steve Jobs, TikTok, Travis Kalanick, WeWork, Y Combinator

His presentation informs much of what we cover below. 96 ByteDance’s system centers around three profiles: the content profile, the user profile, and the environment profile. For the content profile, Cao gave the example of a written news article about an English Premier League football match between Liverpool and Manchester United. Keywords would be extracted from the article using natural language processing, in this case, “Liverpool Football Club,” “Manchester United Football Club,” “English Premier League,” and names of several key players from the game such as “David de Gea.” Relevance values are then assigned to the keywords. In the example, “Manchester United Football Club” was 0.9835, and “David de Gea” was 0.9973, both very high, as is to be expected.
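The profile-matching idea can be illustrated with a toy scoring function. The two relevance values below come from Cao's example; the third content keyword's score and the user-profile weights are invented for illustration, and ByteDance's production ranking model is of course far more sophisticated:

```python
def match_score(content_keywords, user_interests):
    """Toy recommendation score: sum over the keywords the content
    profile and the user profile share, weighted by both sides'
    scores. Illustrates how keyword relevance values can feed a
    content-to-user match; not ByteDance's actual model."""
    return sum(relevance * user_interests.get(keyword, 0.0)
               for keyword, relevance in content_keywords.items())

# Content profile: keywords with model-assigned relevance values.
content = {
    "Manchester United Football Club": 0.9835,
    "David de Gea": 0.9973,
    "English Premier League": 0.9,      # assumed value
}
# User profile: interest weights learned from past behavior (invented).
user = {"English Premier League": 0.8, "David de Gea": 0.5}

print(match_score(content, user))  # → 1.21865
```

A fan of de Gea and the Premier League scores highly against this article, while a user with no overlapping interests scores zero — the skeleton of content-based recommendation.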

The system still relied heavily on human labor, an army of staff conducting basic repetitive tasks such as tagging articles and manually reviewing content, which aided the machine learning. Reliably extracting key terms was critical for accurate recommendation, but technologies such as natural language processing could only get you so far. Regardless, no matter how accurate their recommendation, merely having a better product than competitors was also not enough. To rapidly grow the Toutiao user base and scale the company to unicorn valuations, the team had to master the darker arts of growth hacking.


pages: 492 words: 118,882

The Blockchain Alternative: Rethinking Macroeconomic Policy and Economic Theory by Kariappa Bheemaiah

"World Economic Forum" Davos, accounting loophole / creative accounting, Ada Lovelace, Adam Curtis, Airbnb, Alan Greenspan, algorithmic trading, asset allocation, autonomous vehicles, balance sheet recession, bank run, banks create money, Basel III, basic income, behavioural economics, Ben Bernanke: helicopter money, bitcoin, Bletchley Park, blockchain, Bretton Woods, Brexit referendum, business cycle, business process, call centre, capital controls, Capital in the Twenty-First Century by Thomas Piketty, cashless society, cellular automata, central bank independence, Charles Babbage, Claude Shannon: information theory, cloud computing, cognitive dissonance, collateralized debt obligation, commoditize, complexity theory, constrained optimization, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, cross-border payments, crowdsourcing, cryptocurrency, data science, David Graeber, deep learning, deskilling, Diane Coyle, discrete time, disruptive innovation, distributed ledger, diversification, double entry bookkeeping, Ethereum, ethereum blockchain, fiat currency, financial engineering, financial innovation, financial intermediation, Flash crash, floating exchange rates, Fractional reserve banking, full employment, George Akerlof, Glass-Steagall Act, Higgs boson, illegal immigration, income inequality, income per capita, inflation targeting, information asymmetry, interest rate derivative, inventory management, invisible hand, John Maynard Keynes: technological unemployment, John von Neumann, joint-stock company, Joseph Schumpeter, junk bonds, Kenneth Arrow, Kenneth Rogoff, Kevin Kelly, knowledge economy, large denomination, Large Hadron Collider, Lewis Mumford, liquidity trap, London Whale, low interest rates, low skilled workers, M-Pesa, machine readable, Marc Andreessen, market bubble, market fundamentalism, Mexican peso crisis / tequila crisis, Michael Milken, MITM: man-in-the-middle, Money creation, 
money market fund, money: store of value / unit of account / medium of exchange, mortgage debt, natural language processing, Network effects, new economy, Nikolai Kondratiev, offshore financial centre, packet switching, Pareto efficiency, pattern recognition, peer-to-peer lending, Ponzi scheme, power law, precariat, pre–internet, price mechanism, price stability, private sector deleveraging, profit maximization, QR code, quantitative easing, quantitative trading / quantitative finance, Ray Kurzweil, Real Time Gross Settlement, rent control, rent-seeking, robo advisor, Satoshi Nakamoto, Satyajit Das, Savings and loan crisis, savings glut, seigniorage, seminal paper, Silicon Valley, Skype, smart contracts, software as a service, software is eating the world, speech recognition, statistical model, Stephen Hawking, Stuart Kauffman, supply-chain management, technology bubble, The Chicago School, The Future of Employment, The Great Moderation, the market place, The Nature of the Firm, the payments system, the scientific method, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, too big to fail, trade liberalization, transaction costs, Turing machine, Turing test, universal basic income, Vitalik Buterin, Von Neumann architecture, Washington Consensus

Advantages: easier and faster access to funds, less red tape, transparency, reputation awareness, and appropriate matching of risk based on client segment diversity. Risks: reputational risks (right to be forgotten), unestablished standards, regulation, and data privacy. 3. Investment Management Stance: Customer-facing Main technologies: Big Data, Machine Learning, Trading Algorithms, Social Media, Robo-Advisory, AI, Natural Language Processing (NLP), Cloud Computing. One of the most adverse outcomes of the crisis was its impact on wealth management: banks suffered a loss of trust, while potential clients now required higher amounts of capital in order to invest. As wages stagnated and employment slowed, it became increasingly difficult for new investors to invest smaller sums of money.

A Chatbot is essentially a service, powered by rules and artificial intelligence (AI), that a user can interact with via a chat interface. The service could be anything ranging from functional to fun, and it could exist in any chat product (Facebook Messenger, Slack, Telegram, text messages, etc.). Recent advancements in Natural Language Processing (NLP) and Automatic Speech Recognition (ASR), coupled with crowdsourced data inputs and machine learning techniques, now allow AIs to not just understand groups of words but also submit a corresponding natural response to a grouping of words. That’s essentially the base definition of a conversation, except this conversation is with a “bot.”
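Stripped to its rule-based core, a chatbot is keyword matching over the incoming message. A toy sketch (the rules and canned responses are invented; real services layer NLP and ASR on top of this skeleton):

```python
def reply(message):
    """Tiny rule-based chatbot: match keywords in the user's message
    and return a canned response. Purely illustrative -- production
    bots use trained NLP models rather than hand-written rules."""
    msg = message.lower()
    if "balance" in msg:
        return "Your account balance is available in the app."
    if "hello" in msg or "hi" in msg:
        return "Hi! How can I help you today?"
    # Fallback when no rule fires.
    return "Sorry, I didn't understand that."

print(reply("Hello there"))  # → Hi! How can I help you today?
```

The jump from this to a modern bot is replacing the `if` ladder with a model that maps free-form text to intents, which is exactly where the NLP advances the passage describes come in.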

relative industry shares risk innovation CDOs, CLOs and CDSs non-financial firms originate, repackage and sell model originate-to-distribute model originate-to-hold model principal component production and exchange sharding Blockchain FinTech transformation global Fintech financing activity private sector skeleton keys AI-led high frequency trading amalgamation Blockchain fragmentation process information asymmetries Kabbage KYC/AML procedures KYC process machine learning P2P lending sector payments and remittances sector physical barriers rehypothecation robo-advisors SWIFT and ACH transferwise solution pathways digital identity and KYC private and public utilization scalability TBTF See(Too Big to Fail (TBTF)) television advertisement Financialization SeeFragmentation Financial Stability Oversight Committee (FSOC) Financial system Financial Technology (FinTech) capital markets Carney, Mark CHIPS financial services financing activities histroy insurance sector investment/wealth management lending platforms payments Foreign direct investment (FDI) Fractional Reserve banking base and broad money capital requirements central banks commercial banks exchanging currency fractional banking governments monetary policies monetary policy objectives Tier 1, Tier 2, and Tier 3 capital value of a currency Fragmentation concept of current economic malaise dial-up Internet access evolutionary biology Haldane, Andy information asymmetry limitations problem-solving approaches regulatory-centric approach systemic risk TBTF US telecoms industry G Genetic algorithm (GA) Gramm-Leach-Bliley Financial Modernization Act Greenspan, Alan Gresham’s law Guardtime H Haldane, Andy Heterogenous interacting agents High-frequency trading (HFT) Human uncertainty principle HYPR I Implicit contracts Information and communication technologies (ICTs) Institute for New Economical Thinking (INET) Insurance sector InterLedger Protocol (ILP) Internal Revenue Service (IRS) iSignthis J Junk bonds K 
Kashkari, Neel Kelton, Stephanie Kim-Markowitz Portfolio Insurers Model Know Your Business (KYB) Know Your Customer (KYC) advantage Atlantic model concept of contextual scenario development of documents empirical approach Government digital identity programs identity identity and KYC/AML services Kabbage KYC-Chain manifestations merchant processor multidimensional attributes multiple sources Namecoin blockchain OpenID protocol procedural system regulatory institutions tokenized identity transactional systems value exchange platforms vast-ranging subject Zooko’s triangle kompany.com L Large hadron collider (LHC) Living Will Review process M Macroeconomic models types cellular automata (CA) equilibrium business-cycle models genetic algorithm (GA) neural networks rational expectations structural models traditional structural models vector autoregression (VAR) models Macroeconomic theories Man-in-the-middle (MITM) Marketing money cashless system crime and taxation economy IRS money Seigniorage tax evasion Mathematical game theory McFadden Act Mincome, Canada Minority Game (MG) Money anddebt See alsoDebt and money capitalism cash obsession CRS report currencies floating exchange functions gold and silver history of money histroy real commodities transfer of types of withdrawn shadowbanking See(Shadow banking and systemic risk) utilitarian approach Multiple currencies Bitcoin Obituaries bitcoin price BTC/USD and USD/EUR volatility contractual money cryptocurrencies differences free banking Gresham’s law legal definition legal status private and government fiat private money quantitative model sovereign cash volatility N Namecoin blockchain Namibia Natural Language Processing (NLP) NemID Neo-Keynesian models Neuroplasticity New Keynesian models (NK models) O Occupational Information Network (ONET) Office of Scientific Research and Development (OSRD) OpenID protocol Originate, repackage and sell model Originate-to-distribute model P Paine, Thomas Palley, Thomas I.


Human Frontiers: The Future of Big Ideas in an Age of Small Thinking by Michael Bhaskar

"Margaret Hamilton" Apollo, 3D printing, additive manufacturing, AI winter, Albert Einstein, algorithmic trading, AlphaGo, Anthropocene, artificial general intelligence, augmented reality, autonomous vehicles, backpropagation, barriers to entry, basic income, behavioural economics, Benoit Mandelbrot, Berlin Wall, Big bang: deregulation of the City of London, Big Tech, Bletchley Park, blockchain, Boeing 747, brain emulation, Brexit referendum, call centre, carbon tax, charter city, citizen journalism, Claude Shannon: information theory, Clayton Christensen, clean tech, clean water, cognitive load, Columbian Exchange, coronavirus, cosmic microwave background, COVID-19, creative destruction, CRISPR, crony capitalism, cyber-physical system, dark matter, David Graeber, deep learning, DeepMind, deindustrialization, dematerialisation, Demis Hassabis, demographic dividend, Deng Xiaoping, deplatforming, discovery of penicillin, disruptive innovation, Donald Trump, double entry bookkeeping, Easter island, Edward Jenner, Edward Lorenz: Chaos theory, Elon Musk, en.wikipedia.org, endogenous growth, energy security, energy transition, epigenetics, Eratosthenes, Ernest Rutherford, Eroom's law, fail fast, false flag, Fellow of the Royal Society, flying shuttle, Ford Model T, Francis Fukuyama: the end of history, general purpose technology, germ theory of disease, glass ceiling, global pandemic, Goodhart's law, Google Glasses, Google X / Alphabet X, GPT-3, Haber-Bosch Process, hedonic treadmill, Herman Kahn, Higgs boson, hive mind, hype cycle, Hyperloop, Ignaz Semmelweis: hand washing, Innovator's Dilemma, intangible asset, interchangeable parts, Internet of things, invention of agriculture, invention of the printing press, invention of the steam engine, invention of the telegraph, invisible hand, Isaac Newton, ITER tokamak, James Watt: steam engine, James Webb Space Telescope, Jeff Bezos, jimmy wales, job automation, Johannes Kepler, John von Neumann, Joseph Schumpeter, Kenneth 
Arrow, Kevin Kelly, Kickstarter, knowledge economy, knowledge worker, Large Hadron Collider, liberation theology, lockdown, lone genius, loss aversion, Louis Pasteur, Mark Zuckerberg, Martin Wolf, megacity, megastructure, Menlo Park, Minecraft, minimum viable product, mittelstand, Modern Monetary Theory, Mont Pelerin Society, Murray Gell-Mann, Mustafa Suleyman, natural language processing, Neal Stephenson, nuclear winter, nudge unit, oil shale / tar sands, open economy, OpenAI, opioid epidemic / opioid crisis, PageRank, patent troll, Peter Thiel, plutocrats, post scarcity, post-truth, precautionary principle, public intellectual, publish or perish, purchasing power parity, quantum entanglement, Ray Kurzweil, remote working, rent-seeking, Republic of Letters, Richard Feynman, Robert Gordon, Robert Solow, secular stagnation, shareholder value, Silicon Valley, Silicon Valley ideology, Simon Kuznets, skunkworks, Slavoj Žižek, sovereign wealth fund, spinning jenny, statistical model, stem cell, Steve Jobs, Stuart Kauffman, synthetic biology, techlash, TED Talk, The Rise and Fall of American Growth, the scientific method, The Wealth of Nations by Adam Smith, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, Thomas Malthus, TikTok, total factor productivity, transcontinental railway, Two Sigma, Tyler Cowen, Tyler Cowen: Great Stagnation, universal basic income, uranium enrichment, We wanted flying cars, instead we got 140 characters, When a measure becomes a target, X Prize, Y Combinator

Parallel processing chips have boosted computational capacity. Machine learning needs vast amounts of ‘training’ data: these technical advances have come just as big datasets exploded. Business and government piled investment into R&D. Rapid improvements in areas like image recognition, natural language processing, translation, game playing and autonomous driving transformed services and generated hyperbolic headlines. The history of AI is one of crests of hype followed by troughs – the so-called AI winter set in from the 1980s with the broad failure of ‘symbolic’ approaches. But by the 2010s a new spring had arrived, and DeepMind was in the vanguard.

The authors of the original Eroom's Law paper now believe the era of stagnation may be coming to an end thanks to the prevalence and new-found effectiveness of machine learning in the discovery of drugs.23 AI is moving to the front lines of the battle against cancer and a paper in Cell illustrates that ML can use molecular structure to predict the effectiveness of antibacterials (the researchers behind the AI even called the resulting antibacterial ‘halicin’ after HAL, the AI in 2001: A Space Odyssey).24 We need things like this to beat future pandemics. Fusion scientists are optimistic that the application of AI could bring decisive advances in the coming years, and in general the field is now focused on ML approaches to core problems.25 Breakthroughs in natural language processing are coming at pace: the parameters of OpenAI's eye-catching GPT language prediction system grew from hundreds of millions to hundreds of billions in just a few years with some spectacular results, enabling it to write convincing text at length on any subject.26 GPT-3 can take a portion of writing and then continue it with at times shocking plausibility.

China spends $500 billion or more per year on R&D, significantly more than the EU and closing on the US (which it will have likely overtaken by the time you read this).12 Between 2000 and 2016 China's share of global scientific output quadrupled; since 2000 it alone has accounted for a third of global growth in R&D and has seen the second greatest increase in R&D intensity after South Korea. 2016 was a litmus year: for the first time China published more scientific papers than the USA to become the world's leading producer of scientific knowledge (in quantity – but quality, as measured by citations, also improved four-fold).13 A study from Elsevier and Nikkei found that in twenty-three out of thirty ‘hot’ fields, Chinese researchers published the most highly cited papers.14 It has a larger share of global patents and of the global STEM workforce than the former leader, the US; indeed almost 50 per cent of global patent families are going to Chinese inventors.15 That workforce is better educated than ever: between 1990 and 2010 the number of college graduates grew tenfold to 8 million per year, while the number of Chinese postgraduate degree holders grew fifteen-fold in the same period, again surpassing the US totals, even as Chinese universities shot up the world rankings.16 This boils down to specific advances beyond biotech. Andrew Ng, the AI pioneer, argues that the complexity of the Chinese language, and its level of investment, have pushed AI natural language processing ahead of the West, while Eric Schmidt, former CEO of Google, expects China to overtake the US in AI in the near future.17 Whereas the UK is putting at most £150 million towards the development of quantum computing, China has invested $15 billion and rising – its National Laboratory for Quantum Information Sciences is the largest anywhere in the world.


pages: 239 words: 70,206

Data-Ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else by Steve Lohr

"World Economic Forum" Davos, 23andMe, Abraham Maslow, Affordable Care Act / Obamacare, Albert Einstein, Alvin Toffler, Bear Stearns, behavioural economics, big data - Walmart - Pop Tarts, bioinformatics, business cycle, business intelligence, call centre, Carl Icahn, classic study, cloud computing, computer age, conceptual framework, Credit Default Swap, crowdsourcing, Daniel Kahneman / Amos Tversky, Danny Hillis, data is the new oil, data science, David Brooks, driverless car, East Village, Edward Snowden, Emanuel Derman, Erik Brynjolfsson, everywhere but in the productivity statistics, financial engineering, Frederick Winslow Taylor, Future Shock, Google Glasses, Ida Tarbell, impulse control, income inequality, indoor plumbing, industrial robot, informal economy, Internet of things, invention of writing, Johannes Kepler, John Markoff, John von Neumann, lifelogging, machine translation, Mark Zuckerberg, market bubble, meta-analysis, money market fund, natural language processing, obamacare, pattern recognition, payday loans, personalized medicine, planned obsolescence, precision agriculture, pre–internet, Productivity paradox, RAND corporation, rising living standards, Robert Gordon, Robert Solow, Salesforce, scientific management, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, SimCity, six sigma, skunkworks, speech recognition, statistical model, Steve Jobs, Steven Levy, The Design of Experiments, the scientific method, Thomas Kuhn: the structure of scientific revolutions, Tony Fadell, unbanked and underbanked, underbanked, Von Neumann architecture, Watson beat the top human players on Jeopardy!, yottabyte

Humans understand things in large part because of their experience of the real world. Computers lack that advantage. Advances in artificial intelligence mean that machines can increasingly see, read, listen, and speak, in their way. And a very different way, it is. As Frederick Jelinek, a pioneer in speech recognition and natural-language processing at IBM, once explained by way of analogy: “Airplanes don’t flap their wings.” To get a sense of how computers build knowledge, let’s look at Carnegie Mellon University’s Never-Ending Language Learning system, or NELL. Since 2010, NELL has been steadily scanning hundreds of millions of Web pages for text patterns that it uses to learn facts, more than 2.3 million so far, with an estimated accuracy of 87 percent.
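NELL's core move, matching recurring textual patterns to harvest candidate (category, instance) facts, can be sketched in a few lines. This is a minimal hand-written version: the real system bootstraps its patterns from seed examples and scores each candidate fact, which is where the quoted 87 percent accuracy comes from. The patterns and example text below are illustrative, not NELL's own.

```python
import re

# Hearst-style lexical patterns: each match yields a candidate
# (category, instance) fact. NELL learns and weights such patterns
# automatically; these two are hand-written stand-ins.
PATTERNS = [
    re.compile(r"(\w+) such as (\w+)", re.I),
    re.compile(r"(\w+) like (\w+)", re.I),
]

def extract_facts(text):
    facts = set()
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            category, instance = match.groups()
            facts.add((category.lower(), instance.lower()))
    return facts

text = "Cities such as Pittsburgh attract companies like Duolingo."
print(sorted(extract_facts(text)))
# [('cities', 'pittsburgh'), ('companies', 'duolingo')]
```

Run over hundreds of millions of pages, even noisy rules like these accumulate millions of facts; the hard part NELL adds is deciding which candidates to believe.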

But those systems proved extremely difficult to build. So knowledge systems gave way to the data-driven path: mine vast amounts of data to make predictions, based on statistical probabilities and patterns. Data-fueled artificial intelligence, Ferrucci says, has been “incredibly powerful” for tasks like natural-language processing—a central technology, for example, behind Google’s search and Watson’s question-answering. “But in a purely data-driven approach, there is no real understanding,” he says. “People are so enamored with the data-driven approach that they believe correlation is enough.” For a broad swath of commercial decisions, as we’ve seen, correlation is sufficient, as long as the outcome is a winner.


pages: 49 words: 12,968

Industrial Internet by Jon Bruner

air gap, autonomous vehicles, barriers to entry, Boeing 747, commoditize, computer vision, data acquisition, demand response, electricity market, en.wikipedia.org, factory automation, Google X / Alphabet X, industrial robot, Internet of things, job automation, loose coupling, natural language processing, performance metric, Silicon Valley, slashdot, smart grid, smart meter, statistical model, the Cathedral and the Bazaar, web application

It could alert other cars to the presence of rain, perhaps switching on headlights automatically or changing the assumptions that nearby cars make about road traction. The human in this case becomes part of an API in situ — the software, integrated with hardware, is able to detect a strong signal from a human without relying on extractive tools like natural-language processing that are often used to divine human preferences. Connected to networks through easy procedural mechanisms like If This Then That (IFTTT)[29], human operators even at the consumer level can identify significant signals and make their machines react to them. “I’m a car guy, so I’m talking about cars, but imagine the number of machines out there that are being turned on and off.
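The "easy procedural mechanisms" in question are trigger-action rules: if this signal appears, then that reaction fires. A minimal sketch of the pattern, using the rain-sensing car example from the text (all names are illustrative, and this is not IFTTT's actual API):

```python
# Trigger-action rules in the IFTTT spirit: a predicate watches for a
# signal, and an action fires whenever the predicate matches an event.
rules = []

def when(trigger, action):
    """Register a rule: 'if this (trigger) then that (action)'."""
    rules.append((trigger, action))

def publish(event):
    """Run every rule whose trigger matches the event; collect the actions."""
    return [action(event) for trigger, action in rules if trigger(event)]

# One human-detectable signal (wipers switched on hard) fans out
# to several machine reactions, as in the passage above.
when(lambda e: e.get("wiper_speed", 0) > 0.5, lambda e: "headlights_on")
when(lambda e: e.get("wiper_speed", 0) > 0.5, lambda e: "warn_nearby_cars")

print(publish({"wiper_speed": 0.8}))  # ['headlights_on', 'warn_nearby_cars']
print(publish({"wiper_speed": 0.1}))  # []
```

The point of the passage survives in the sketch: no natural-language processing is needed, because the human's intent arrives as a clean, discrete signal.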


pages: 472 words: 117,093

Machine, Platform, Crowd: Harnessing Our Digital Future by Andrew McAfee, Erik Brynjolfsson

"World Economic Forum" Davos, 3D printing, additive manufacturing, AI winter, Airbnb, airline deregulation, airport security, Albert Einstein, algorithmic bias, AlphaGo, Amazon Mechanical Turk, Amazon Web Services, Andy Rubin, AOL-Time Warner, artificial general intelligence, asset light, augmented reality, autism spectrum disorder, autonomous vehicles, backpropagation, backtesting, barriers to entry, behavioural economics, bitcoin, blockchain, blood diamond, British Empire, business cycle, business process, carbon footprint, Cass Sunstein, centralized clearinghouse, Chris Urmson, cloud computing, cognitive bias, commoditize, complexity theory, computer age, creative destruction, CRISPR, crony capitalism, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, data science, Dean Kamen, deep learning, DeepMind, Demis Hassabis, discovery of DNA, disintermediation, disruptive innovation, distributed ledger, double helix, driverless car, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Ethereum, ethereum blockchain, everywhere but in the productivity statistics, Evgeny Morozov, fake news, family office, fiat currency, financial innovation, general purpose technology, Geoffrey Hinton, George Akerlof, global supply chain, Great Leap Forward, Gregor Mendel, Hernando de Soto, hive mind, independent contractor, information asymmetry, Internet of things, inventory management, iterative process, Jean Tirole, Jeff Bezos, Jim Simons, jimmy wales, John Markoff, joint-stock company, Joseph Schumpeter, Kickstarter, Kiva Systems, law of one price, longitudinal study, low interest rates, Lyft, Machine translation of "The spirit is willing, but the flesh is weak." 
to Russian and back, Marc Andreessen, Marc Benioff, Mark Zuckerberg, meta-analysis, Mitch Kapor, moral hazard, multi-sided market, Mustafa Suleyman, Myron Scholes, natural language processing, Network effects, new economy, Norbert Wiener, Oculus Rift, PageRank, pattern recognition, peer-to-peer lending, performance metric, plutocrats, precision agriculture, prediction markets, pre–internet, price stability, principal–agent problem, Project Xanadu, radical decentralization, Ray Kurzweil, Renaissance Technologies, Richard Stallman, ride hailing / ride sharing, risk tolerance, Robert Solow, Ronald Coase, Salesforce, Satoshi Nakamoto, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Skype, slashdot, smart contracts, Snapchat, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Pinker, supply-chain management, synthetic biology, tacit knowledge, TaskRabbit, Ted Nelson, TED Talk, the Cathedral and the Bazaar, The Market for Lemons, The Nature of the Firm, the strength of weak ties, Thomas Davenport, Thomas L Friedman, too big to fail, transaction costs, transportation-network company, traveling salesman, Travis Kalanick, Two Sigma, two-sided market, Tyler Cowen, Uber and Lyft, Uber for X, uber lyft, ubercab, Vitalik Buterin, warehouse robotics, Watson beat the top human players on Jeopardy!, winner-take-all economy, yield management, zero day

Modern technologies can take over the latter of these activities once they learn the rules of an interaction. But the hardest part of customer service to automate has not been finding an answer, but rather the initial step: listening and understanding. Speech recognition and other aspects of natural language processing have been tremendously difficult problems in artificial intelligence since the dawn of the field, for all of the reasons described earlier in this chapter. The previously dominant symbolic approaches have not worked well at all, but newer ones based on deep learning are making progress so quickly that it has surprised even the experts.

depth=1&hl=en&prev=search&rurl=translate.google.com&sl=ja&sp=nmt4&u=http://www.fukoku-life.co.jp/about/news/download/20161226.pdf. 84 In October of 2016: Allison Linn, “Historic Achievement: Microsoft Researchers Reach Human Parity in Conversational Speech Recognition,” Microsoft (blog), October 18, 2016, http://blogs.microsoft.com/next/2016/10/18/historic-achievement-microsoft-researchers-reach-human-parity-conversational-speech-recognition/#sm.0001d0t49dx0veqdsh21cccecz0e3. 84 “I must confess that I never thought”: Mark Liberman, “Human Parity in Conversational Speech Recognition,” Language Log (blog), October 18, 2016, http://languagelog.ldc.upenn.edu/nll/?p=28894. 84 “Every time I fire a linguist”: Julia Hirschberg, “ ‘Every Time I Fire a Linguist, My Performance Goes Up,’ and Other Myths of the Statistical Natural Language Processing Revolution” (speech, 15th National Conference on Artificial Intelligence, Madison, WI, July 29, 1998). 84 “AI-first world”: Julie Bort, “Salesforce CEO Marc Benioff Just Made a Bold Prediction about the Future of Tech,” Business Insider, May 18, 2016, http://www.businessinsider.com/salesforce-ceo-i-see-an-ai-first-world-2016-5. 85 “Many businesses still make important decisions”: Marc Benioff, “On the Cusp of an AI Revolution,” Project Syndicate, September 13, 2016, https://www.project-syndicate.org/commentary/artificial-intelligence-revolution-by-marc-benioff-2016-09.

Bertram’s Mind, The” (AI-generated prose), 121 MySpace, 170–71 Naam, Ramez, 258n Nakamoto, Satoshi, 279–85, 287, 296–97, 306, 312 Nakamoto Institute, 304 Nappez, Francis, 190 Napster, 144–45 NASA, 15 Nasdaq, 290–91 National Association of Realtors, 39 National Enquirer, 132 National Institutes of Health, 253 National Library of Australia, 274 Naturalis Historia (Pliny the Elder), 246 natural language processing, 83–84 “Nature of the Firm, The” (Coase), 309–10 Navy, US, 72 negative prices, 216 Nelson, Ted, 33 Nelson, Theodore, 229 Nesbitt, Richard, 45 Netflix, 187 Netscape Navigator, 34 network effects, 140–42 defined, 140 diffusion of platforms and, 205–6 O2O platforms and, 193 size of network and, 217 Stripe and, 174 Uber’s market value and, 219 networks, Cambrian Explosion and, 96 neural networks, 73–74, 78 neurons, 72–73 Newell, Allen, 69 Newmark, Craig, 138 New Republic, 133 news aggregators, 139–40 News Corp, 170, 171 newspapers ad revenue, 130, 132, 139 publishing articles directly on Facebook, 165 Newsweek, 133 New York City Postmates in, 185 taxi medallion prices before and after Uber, 201 UberPool in, 9 New York Times, 73, 130, 152 Ng, Andrew, 75, 96, 121, 186 Nielsen BookScan, 293, 294 99Degrees Custom, 333–34 99designs, 261 Nixon, Richard, 280n Nokia, 167–68, 203 noncredentialism, 241–42 Norman, Robert, 273–74 nugget ice, 11–14 Nuomi, 192 Nupedia, 246–48 Obama, Barack, election of 2012, 48–51 occupancy rates, 221–22 oDesk, 188 Office of Personnel Management, US, 32 oil rigs, 100 on-demand economy, future of companies in, 320 online discussion groups, 229–30 online payment services, 171–74 online reviews, 208–10 O2O (online to offline) platforms, 185–98 business-to-business, 188–90 consumer-oriented, 186–88 defined, 186 as engines of liquidity, 192–96 globalization of, 190–92 interdisciplinary insights from data compiled by, 194 for leveraging assets, 196–97 and machine learning, 194 Opal (ice maker), 13–14 Open Agriculture Initiative, 272 openness (crowd 
collaboration principle), 241 open platforms curation and, 165 downsides, 164 importance of, 163–65 as key to success, 169 open-source software; See also Linux Android as, 166–67 development by crowd, 240–45 operating systems, crowd-developed, 240–45 Oracle, 204 O’Reilly, Tim, 242 organizational dysfunction, 257 Oruna, 291 Osindero, Simon, 76 Osterman, Paul, 322 Ostrom, Elinor, 313 outcomes, clear (crowd collaboration principle), 243 outsiders in automated investing, 270 experts vs., 252–75 overall evaluation criterion, 51 Overstock.com, 290 Owen, Ivan, 273, 274 Owen, Jennifer, 274n ownership, contracts and, 314–15 Page, Larry, 233 PageRank, 233 Pahlka, Jennifer, 163 Painting Fool, The, 117 Papa John’s Pizza, 286 Papert, Seymour, 73 “Paperwork Mine,” 32 Paris, France, terrorist attack (2015), 55 Parker, Geoffrey, 148 parole, 39–40 Parse.ly, 10 Paulos, John Allen, 233 payments platforms, 171–74 peer reviews, 208–10 peer-to-peer lending, 263 peer-to-peer platforms, 144–45, 298 Peloton, 177n Penthouse magazine, 132 People Express, 181n, 182 Perceptron, 72–74 Perceptrons: An Introduction to Computational Geometry (Minsky and Papert), 73 perishing/perishable inventory and O2O platforms, 186 and revenue management, 181–84 risks in managing, 180–81 personal drones, 98 perspectives, differing, 258–59 persuasion, 322 per-transaction fees, 172–73 Pew Research Center, 18 p53 protein, 116–17 photography, 131 physical environments, experimentation in development of, 62–63 Pindyck, Robert, 196n Pinker, Steven, 68n piracy, of recorded music, 144–45 Plaice, Sean, 184 plastics, transition from molds to 3D printing, 104–7 Platform Revolution (Parker, Van Alstyne, and Choudary), 148 platforms; See also specific platforms business advantages of, 205–11 characteristics of successful, 168–74 competition between, 166–68 and complements, 151–68 connecting online and offline experience, 177–98; See also O2O (online to offline) platforms consumer loyalty and, 210–11 defined, 14, 137 
diffusion of, 205 economics of “free, perfect, instant” information goods, 135–37 effect on incumbents, 137–48, 200–204 elasticity of demand, 216–18 future of companies based on, 319–20 importance of being open, 163–65; See also open platforms and information asymmetries, 206–10 limits to disruption of incumbents, 221–24 multisided markets, 217–18 music industry disruption, 143–48 network effect, 140–42 for nondigital goods/services, 178–85; See also O2O (online to offline) platforms and perishing inventory, 180–81 preference for lower prices by, 211–21 pricing elasticities, 212–13 product as counterpart to, 15 and product maker prices, 220–21 proliferation of, 142–48 replacement of assets with, 6–10 for revenue management, 181–84 supply/demand curves and, 153–57 and unbundling, 145–48 user experience as strategic element, 169–74 Playboy magazine, 133 Pliny the Elder, 246 Polanyi, Michael, 3 Polanyi’s Paradox and AlphaGo, 4 defined, 3 and difficulty of comparing human judgment to mathematical models, 42 and failure of symbolic machine learning, 71–72 and machine language, 82 and problems with centrally planned economies, 236 and System 1/System 2 relationship, 45 Postmates, 173, 184–85, 205 Postmates Plus Unlimited, 185 Postrel, Virginia, 90 Pratt, Gil, 94–95, 97, 103–4 prediction data-driven, 59–60 experimentation and, 61–63 statistical vs. clinical, 41 “superforecasters” and, 60–61 prediction markets, 237–39 premium brands, 210–11 presidential elections, 48–51 Priceline, 61–62, 223–24 price/pricing data-driven, 47; See also revenue management demand curves and, 154 elasticities, 212–13 loss of traditional companies’ power over, 210–11 in market economies, 237 and prediction markets, 238–39 product makers and platform prices, 220 supply curves and, 154–56 in two-sided networks, 213–16 Principia Mathematica (Whitehead and Russell), 69 print media, ad revenue and, 130, 132, 139 production costs, markets vs. 
companies, 313–14 productivity, 16 products as counterpart to platforms, 15 loss of profits to platform providers, 202–4 pairing free apps with, 163 platforms’ effect on, 200–225 threats from platform prices, 220–21 profitability Apple, 204 excessive use of revenue management and, 184 programming, origins of, 66–67 Project Dreamcatcher, 114 Project Xanadu, 33 proof of work, 282, 284, 286–87 prose, AI-generated, 121 Proserpio, Davide, 223 Prosper, 263 protein p53, 116–17 public service, 162–63 Pullman, David, 131 Pullum, Geoffrey, 84 quantitative investing firms (quants), 266–70 Quantopian, 267–70 Quinn, Kevin, 40–41 race cars, automated design for, 114–16 racism, 40, 51–52, 209–10 radio stations as complements to recorded music, 148 in late 1990s, 130 revenue declines (2000–2010), 135 Ramos, Ismael, 12 Raspbian, 244 rationalization, 45 Raymond, Eric, 259 real-options pricing, 196 reasoning, See System 1/System 2 reasoning rebundling, 146–47 recommendations, e-commerce, 47 recorded music industry in late 1990s, 130–31 declining sales (1999-2015), 134, 143 disruption by platforms, 143–48 Recording Industry Association of America (RIAA), 144 redlining, 46–47 Redmond, Michael, 2 reengineering, business process, 32–35 Reengineering the Corporation (Hammer and Champy), 32, 34–35, 37 regulation financial services, 202 Uber, 201–2, 208 Reichman, Shachar, 39 reinforcement learning, 77, 80 Renaissance Technologies, 266, 267 Rent the Runway, 186–88 Replicator 2 (3D printer), 273 reputational systems, 209–10 research and development (R&D), crowd-assisted, 11 Research in Motion (RIM), 168 residual rights of control, 315–18 “Resolution of the Bitcoin Experiment, The” (Hearn), 306 resource utilization rate, 196–97 restaurants, robotics in, 87–89, 93–94 retail; See also e-commerce MUEs and, 62–63 Stripe and, 171–74 retail warehouses, robotics in, 102–3 Rethinking the MBA: Business Education at a Crossroads (Datar, Garvin, and Cullen), 37 revenue, defined, 212 revenue management 
defined, 47 downsides of, 184–85 O2O platforms and, 193 platforms for, 181–84 platform user experience and, 211 problems with, 183–84 Rent the Runway and, 187 revenue-maximizing price, 212–13 revenue opportunities, as benefit of open platforms, 164 revenue sharing, Spotify, 147 reviews, online, 208–10 Ricardo, David, 279 ride services, See BlaBlaCar; Lyft; Uber ride-sharing, 196–97, 201 Rio Tinto, 100 Robohand, 274 robotics, 87–108 conditions for rapid expansion of, 94–98 DANCE elements, 95–98 for dull, dirty, dangerous, dear work, 99–101 future developments, 104–7 humans and, 101–4 in restaurant industry, 87–89 3D printing, 105–7 Rocky Mountain News, 132 Romney, Mitt, 48, 49 Roosevelt, Teddy, 23 Rosenblatt, Frank, 72, 73 Rovio, 159n Roy, Deb, 122 Rubin, Andy, 166 Ruger, Ted, 40–41 rule-based artificial intelligence, 69–72, 81, 84 Russell, Bertrand, 69 Sagalyn, Raphael, 293n Saloner, Garth, 141n Samsung and Android, 166 and Linux, 241, 244 sales and earnings deterioration, 203–4 San Francisco, California Airbnb in, 9 Craigslist in, 138 Eatsa in, 87 Napster case, 144 Postmates in, 185 Uber in, 201 Sanger, Larry, 246–48 Sato, Kaz, 80 Satoshi Nakamoto Institute, 304 scaling, cloud and, 195–96 Schiller, Phil, 152 Schumpeter, Joseph, 129, 264, 279, 330 Scott, Brian, 101–2 second machine age origins of, 16 phase one, 16 phase two, 17–18 secular trends, 93 security lanes, automated, 89 Sedol, Lee, 5–6 self-checkout kiosks, 90 self-driving automobiles, 17, 81–82 self-justification, 45 self-organization, 244 self-selection, 91–92 self-service, at McDonald’s, 92 self-teaching machines, 17 Seychelles Trading Company, 291 Shanghai Tower, 118 Shapiro, Carl, 141n Shaw, David, 266 Shaw, J.


pages: 410 words: 119,823

Radical Technologies: The Design of Everyday Life by Adam Greenfield

3D printing, Airbnb, algorithmic bias, algorithmic management, AlphaGo, augmented reality, autonomous vehicles, bank run, barriers to entry, basic income, bitcoin, Black Lives Matter, blockchain, Boston Dynamics, business intelligence, business process, Californian Ideology, call centre, cellular automata, centralized clearinghouse, centre right, Chuck Templeton: OpenTable:, circular economy, cloud computing, Cody Wilson, collective bargaining, combinatorial explosion, Computer Numeric Control, computer vision, Conway's Game of Life, CRISPR, cryptocurrency, David Graeber, deep learning, DeepMind, dematerialisation, digital map, disruptive innovation, distributed ledger, driverless car, drone strike, Elon Musk, Ethereum, ethereum blockchain, facts on the ground, fiat currency, fulfillment center, gentrification, global supply chain, global village, Goodhart's law, Google Glasses, Herman Kahn, Ian Bogost, IBM and the Holocaust, industrial robot, informal economy, information retrieval, Internet of things, Jacob Silverman, James Watt: steam engine, Jane Jacobs, Jeff Bezos, Jeff Hawkins, job automation, jobs below the API, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John Perry Barlow, John von Neumann, joint-stock company, Kevin Kelly, Kickstarter, Kiva Systems, late capitalism, Leo Hollis, license plate recognition, lifelogging, M-Pesa, Mark Zuckerberg, means of production, megacity, megastructure, minimum viable product, money: store of value / unit of account / medium of exchange, natural language processing, Network effects, New Urbanism, Nick Bostrom, Occupy movement, Oculus Rift, off-the-grid, PalmPilot, Pareto efficiency, pattern recognition, Pearl River Delta, performance metric, Peter Eisenman, Peter Thiel, planetary scale, Ponzi scheme, post scarcity, post-work, printed gun, proprietary trading, RAND corporation, recommendation engine, RFID, rolodex, Rutger 
Bregman, Satoshi Nakamoto, self-driving car, sentiment analysis, shareholder value, sharing economy, Shenzhen special economic zone , Sidewalk Labs, Silicon Valley, smart cities, smart contracts, social intelligence, sorting algorithm, special economic zone, speech recognition, stakhanovite, statistical model, stem cell, technoutopianism, Tesla Model S, the built environment, The Death and Life of Great American Cities, The Future of Employment, Tony Fadell, transaction costs, Uber for X, undersea cable, universal basic income, urban planning, urban sprawl, vertical integration, Vitalik Buterin, warehouse robotics, When a measure becomes a target, Whole Earth Review, WikiLeaks, women in the workforce

At retail, “seamless” point-of-sale processes and the displacement of responsibility onto the shopper themselves via self-checkout slash the number of personnel it takes to run a storefront operation, though some staff will always be required to smooth out the inevitable fiascos; perhaps a few high-end boutiques performatively, conspicuously retain a significant floor presence. In customer service, appalling “cognitive agents” take the place of front-line staff.44 Equipped with speech recognition and natural-language processing capabilities, with synthetic virtual faces that unhesitatingly fold in every last kind of problematic assumption about gender and ethnicity, they’re so cheap that it’s hard to imagine demanding, hard-to-train human staff holding out against them for very long. Even jobs in so-called high-touch fields like childcare and home-health assistance, work that might be done and done well by people with no other qualification, face the prospect of elimination.

A test for machinic intelligence called the Winograd Schema, for example, asks candidate systems to resolve the problems of pronoun disambiguation that crop up constantly in everyday speech.11 Sentences of this type (“I plugged my phone into the wall because it needed to be recharged”) yield to common sense more or less immediately, but still tax the competence of the most advanced natural-language processing systems. Similarly, for all the swagger of their parent company, Uber’s nominally autonomous vehicles seem unable to cope with even so simple an element of the urban environment as a bike lane, swerving in front of cyclists on multiple occasions during the few days they were permitted to operate in San Francisco.12 In the light of results like this, fears that algorithmic systems might take over much of anything at all can easily seem wildly overblown.
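A Winograd schema comes in twin sentences that differ by one word, which flips the pronoun's correct referent. Any resolver that leans on surface cues rather than common sense answers both twins the same way, and so must miss one. A sketch with the classic trophy/suitcase pair (the hypothetical "first-mentioned noun" heuristic below stands in for a statistical system):

```python
# Twin sentences: one word changes ("big" -> "small") and the correct
# antecedent of "it" flips with it.
schema_pair = [
    ("The trophy didn't fit in the suitcase because it was too big",
     "trophy"),
    ("The trophy didn't fit in the suitcase because it was too small",
     "suitcase"),
]
candidates = ("trophy", "suitcase")

def surface_resolver(sentence):
    # Surface heuristic: pick the first-mentioned candidate (the subject).
    # It cannot distinguish the twins, so it necessarily fails on one.
    return min(candidates, key=sentence.index)

results = [surface_resolver(s) == answer for s, answer in schema_pair]
print(results)  # [True, False]
```

Getting both twins right requires knowing that large things do not fit inside small ones, which is exactly the real-world knowledge the test is designed to probe.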

., “Context-Based Bayesian Intent Recognition,” IEEE Transactions on Autonomous Mental Development, Volume 4, Number 3, September 2012. 21.Richard Socher et al., “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank,” Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Stroudsburg, PA, October 2013, pp. 1631–42. 22.Bob Sullivan, “Police Sold on Snaptrends, Software That Claims to Stop Crime Before It Starts,” bobsullivan.net, September 4, 2014. 23.Ibid. 24.Leo Mirani, “Millions of Facebook Users Have No Idea They’re Using the Internet,” Quartz, February 9, 2015. 25.Ellen Huet, “Server and Protect: Predictive Policing Firm PredPol Promises to Map Crime Before It Happens,” Forbes, February 11, 2015. 26.Ibid. 27.Robert L.


pages: 481 words: 125,946

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence by John Brockman

Adam Curtis, agricultural Revolution, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Anthropocene, artificial general intelligence, augmented reality, autism spectrum disorder, autonomous vehicles, backpropagation, basic income, behavioural economics, bitcoin, blockchain, bread and circuses, Charles Babbage, clean water, cognitive dissonance, Colonization of Mars, complexity theory, computer age, computer vision, constrained optimization, corporate personhood, cosmological principle, cryptocurrency, cuban missile crisis, Danny Hillis, dark matter, data science, deep learning, DeepMind, Demis Hassabis, digital capitalism, digital divide, digital rights, discrete time, Douglas Engelbart, driverless car, Elon Musk, Emanuel Derman, endowment effect, epigenetics, Ernest Rutherford, experimental economics, financial engineering, Flash crash, friendly AI, functional fixedness, global pandemic, Google Glasses, Great Leap Forward, Hans Moravec, hive mind, Ian Bogost, income inequality, information trail, Internet of things, invention of writing, iterative process, James Webb Space Telescope, Jaron Lanier, job automation, Johannes Kepler, John Markoff, John von Neumann, Kevin Kelly, knowledge worker, Large Hadron Collider, lolcat, loose coupling, machine translation, microbiome, mirror neurons, Moneyball by Michael Lewis explains big data, Mustafa Suleyman, natural language processing, Network effects, Nick Bostrom, Norbert Wiener, paperclip maximiser, pattern recognition, Peter Singer: altruism, phenotype, planetary scale, Ray Kurzweil, Recombinant DNA, recommendation engine, Republic of Letters, RFID, Richard Thaler, Rory Sutherland, Satyajit Das, Search for Extraterrestrial Intelligence, self-driving car, sharing economy, Silicon Valley, Skype, smart contracts, social intelligence, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, 
Stuxnet, superintelligent machines, supervolcano, synthetic biology, systems thinking, tacit knowledge, TED Talk, the scientific method, The Wisdom of Crowds, theory of mind, Thorstein Veblen, too big to fail, Turing machine, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!, We are as Gods, Y2K

After thirty years of research, a million-times improvement in computer power, and vast data sets from the Internet, we now know the answer to this question: Neural networks scaled up to twelve layers deep, with billions of connections, are outperforming the best algorithms in computer vision for object recognition and have revolutionized speech recognition. It’s rare for any algorithm to scale this well, which suggests that they may soon be able to solve even more difficult problems. Recent breakthroughs have been made that allow the application of deep learning to natural-language processing. Deep recurrent networks with short-term memory were trained to translate English sentences into French sentences at high levels of performance. Other deep-learning networks could create English captions for the content of images with surprising and sometimes amusing acumen. Supervised learning using deep networks is a step forward, but still far from achieving general intelligence.

And virtual-reality-style interfaces will continue to become more realistic and immersive. Why won’t a stand-alone sentient brain come sooner? The amazing progress in spoken-language recognition—unthinkable ten years ago—derives in large part from having access to huge amounts of data and huge amounts of storage and fast networks. The improvements we see in natural-language processing are based on mimicking what people do, not understanding or even simulating it. It’s not owing to breakthroughs in understanding human cognition or even significantly different algorithms. But eGaia is already partly here, at least in the developed world. This distributed nerve-center network, an interplay among the minds of people and their monitoring electronics, will give rise to a distributed technical-social mental system the likes of which has not been experienced before.

To be sure, there have been exponential advances in narrow-engineering applications of artificial intelligence, such as playing chess, calculating travel routes, or translating texts in rough fashion, but there’s been scarcely more than linear progress in five decades of working toward strong AI. For example, the different flavors of intelligent personal assistants available on your smartphone are only modestly better than Eliza, an early example of primitive natural-language processing from the mid-1960s. We still have no machine that can, for instance, read all that the Web has to say about war and plot a decent campaign, nor do we even have an open-ended AI system that can figure out how to write an essay to pass a freshman composition class or an eighth-grade science exam.
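Eliza's "primitive" natural-language processing amounted to shallow pattern matching and substitution, with no model of meaning anywhere, which is what makes the comparison with modern assistants pointed. A few illustrative rules in the spirit of Weizenbaum's 1966 program (these are not its actual script):

```python
import re

# Eliza-style rules: a regex plus a response template. There is no
# understanding here, only string surgery on the user's words.
RULES = [
    (re.compile(r"\bi need (.+)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.I), "Tell me more about your {0}."),
]

def eliza(utterance):
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1))
    return "Please go on."  # catch-all that keeps the illusion alive

print(eliza("I am worried about strong AI"))
# How long have you been worried about strong AI?
```

A handful of such rules can sustain a surprisingly convincing conversation, which is the author's point: fluency of this kind is cheap, while open-ended understanding remains out of reach.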


pages: 619 words: 177,548

Power and Progress: Our Thousand-Year Struggle Over Technology and Prosperity by Daron Acemoglu, Simon Johnson

"Friedman doctrine" OR "shareholder theory", "World Economic Forum" Davos, 4chan, agricultural Revolution, AI winter, Airbnb, airline deregulation, algorithmic bias, algorithmic management, Alignment Problem, AlphaGo, An Inconvenient Truth, artificial general intelligence, augmented reality, basic income, Bellingcat, Bernie Sanders, Big Tech, Bletchley Park, blue-collar work, British Empire, carbon footprint, carbon tax, carried interest, centre right, Charles Babbage, ChatGPT, Clayton Christensen, clean water, cloud computing, collapse of Lehman Brothers, collective bargaining, computer age, Computer Lib, Computing Machinery and Intelligence, conceptual framework, contact tracing, Corn Laws, Cornelius Vanderbilt, coronavirus, corporate social responsibility, correlation does not imply causation, cotton gin, COVID-19, creative destruction, declining real wages, deep learning, DeepMind, deindustrialization, Demis Hassabis, Deng Xiaoping, deskilling, discovery of the americas, disinformation, Donald Trump, Douglas Engelbart, Douglas Engelbart, Edward Snowden, Elon Musk, en.wikipedia.org, energy transition, Erik Brynjolfsson, European colonialism, everywhere but in the productivity statistics, factory automation, facts on the ground, fake news, Filter Bubble, financial innovation, Ford Model T, Ford paid five dollars a day, fulfillment center, full employment, future of work, gender pay gap, general purpose technology, Geoffrey Hinton, global supply chain, Gordon Gekko, GPT-3, Grace Hopper, Hacker Ethic, Ida Tarbell, illegal immigration, income inequality, indoor plumbing, industrial robot, interchangeable parts, invisible hand, Isaac Newton, Jacques de Vaucanson, James Watt: steam engine, Jaron Lanier, Jeff Bezos, job automation, Johannes Kepler, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph-Marie Jacquard, Kenneth Arrow, Kevin Roose, Kickstarter, knowledge economy, labor-force 
participation, land reform, land tenure, Les Trente Glorieuses, low skilled workers, low-wage service sector, M-Pesa, manufacturing employment, Marc Andreessen, Mark Zuckerberg, megacity, mobile money, Mother of all demos, move fast and break things, natural language processing, Neolithic agricultural revolution, Norbert Wiener, NSO Group, offshore financial centre, OpenAI, PageRank, Panopticon Jeremy Bentham, paperclip maximiser, pattern recognition, Paul Graham, Peter Thiel, Productivity paradox, profit maximization, profit motive, QAnon, Ralph Nader, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, Robert Bork, Robert Gordon, Robert Solow, robotic process automation, Ronald Reagan, scientific management, Second Machine Age, self-driving car, seminal paper, shareholder value, Sheryl Sandberg, Shoshana Zuboff, Silicon Valley, social intelligence, Social Responsibility of Business Is to Increase Its Profits, social web, South Sea Bubble, speech recognition, spice trade, statistical model, stem cell, Steve Jobs, Steve Wozniak, strikebreaker, subscription business, Suez canal 1869, Suez crisis 1956, supply-chain management, surveillance capitalism, tacit knowledge, tech billionaire, technoutopianism, Ted Nelson, TED Talk, The Future of Employment, The Rise and Fall of American Growth, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, theory of mind, Thomas Malthus, too big to fail, total factor productivity, trade route, transatlantic slave trade, trickle-down economics, Turing machine, Turing test, Twitter Arab Spring, Two Sigma, Tyler Cowen, Tyler Cowen: Great Stagnation, union organizing, universal basic income, Unsafe at Any Speed, Upton Sinclair, upwardly mobile, W. E. B. Du Bois, War on Poverty, WikiLeaks, wikimedia commons, working poor, working-age population

Modern intelligent machines perform tasks that many would have thought impossible a couple of decades ago. Examples include face-recognition software, search engines that guess what you want to find, and recommendation systems that match you to the products that you are most likely to enjoy or, at the very least, purchase. Many systems now use some form of natural-language processing to interface between human speech or written enquiries and computers. Apple’s Siri and Google’s search engine are examples of AI-based systems that are used widely around the world every day. AI enthusiasts also point to some impressive achievements. AI programs can recognize thousands of different objects and images and provide some basic translation among more than a hundred languages.

New machine capabilities can massively expand the things we do and can transform many aspects of our lives for the better. And there have also been tremendous advances. For example, the Generative Pre-trained Transformer 3 (GPT-3), released in 2020 by OpenAI, and ChatGPT released in 2022 by the same company, are natural-language processing systems with remarkable capabilities. Already trained and optimized on massive amounts of text data from the internet, these programs can generate almost human-like articles, including poetry; communicate in typical human language; and, most impressively, turn natural-language instructions into computer code.

Although talk of intelligent machines has been around for two decades, these technologies started spreading only after 2015. The takeoff is visible in the amount that firms spend on AI-related activities and in the number of job postings for workers with specialized AI skills (including machine learning, machine vision, deep learning, image recognition, natural-language processing, neural networks, support vector machines, and latent semantic analysis). Tracking this indelible footprint, we can see that AI investments and the hiring of AI specialists concentrate in organizations that rely on tasks that can be performed by these technologies, such as actuarial and accounting functions, procurement and purchasing analysis, and various other clerical jobs that involve pattern recognition, computation, and basic speech recognition.


pages: 265 words: 74,000

The Numerati by Stephen Baker

Berlin Wall, Black Swan, business process, call centre, correlation does not imply causation, Drosophila, full employment, illegal immigration, index card, information security, Isaac Newton, job automation, job satisfaction, junk bonds, McMansion, Myron Scholes, natural language processing, off-the-grid, PageRank, personalized medicine, recommendation engine, RFID, Silicon Valley, Skype, statistical model, surveillance capitalism, Watson beat the top human players on Jeopardy!, workplace surveillance

Practically every word makes sense. The bad news, from a data-mining perspective, is that it takes me a scandalous five minutes to read through her text. In that time, Umbria's computers work through 35,300 blog posts. This magic takes place within two domains of artificial intelligence: natural language processing and machine learning. The idea is simple enough. The machines churn through the words, using their statistical genius and formidable memory to make sense of them. To say that they "understand" the words is a stretch. It's like saying that a blind bat, which navigates by processing the geometry of sound waves, "sees" the open window it flies through.
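The excerpt doesn't reveal Umbria's actual algorithms, but the core statistical idea of scoring text by counting opinion-bearing words can be sketched in a few lines. The tiny lexicon below is an illustrative assumption, not Umbria's system; a real classifier would learn its word weights from labeled data.

```python
# Minimal lexicon-based sentiment sketch. The word lists are invented
# for illustration; production systems learn weights from labeled posts.
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"terrible", "hate", "awful", "sad"}

def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: net fraction of opinion words that are positive."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total
```

A scorer like this can churn through thousands of posts per second, which is exactly why the machines outpace a human reader by such a margin.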

See Social networks Names finding people by, [>], [>], [>], [>]–[>], [>] on phone prompts, [>] protection of, in data mining, [>] NASA, [>]–[>] National Cryptologic Museum, [>], [>], [>]–[>] National Science Foundation, [>] National Security Agency (NSA) data mining by, [>], [>]–[>] mathematicians working for, [>], [>], [>]–[>], [>]–[>], [>] social network interpretation by, [>], [>]–[>] Natural language processing, [>]–[>] "Negotiators" (personality type), [>]–[>], [>] Netflix, [>], [>], [>] "Neural network" programs, [>]–[>] Newton, Isaac, [>] New York Times, [>] Next Friend Analysis, [>]–[>], [>] Nicaragua, [>] Nicolov, Nicolas, [>]–[>], [>], [>]–[>] Nielsen BuzzMetrics (company), [>], [>] 9/11 terrorist attack, [>], [>]–[>], [>]–[>], [>], [>], [>] "Nodes" (in social networks), [>] "Noise," [>] No Place to Hide (O'Harrow), [>] NORA software, [>]–[>], [>] Norman (fistulated cow), [>]–[>], [>], [>] NSA.


pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts by Richard Susskind, Daniel Susskind

23andMe, 3D printing, Abraham Maslow, additive manufacturing, AI winter, Albert Einstein, Amazon Mechanical Turk, Amazon Robotics, Amazon Web Services, Andrew Keen, Atul Gawande, Automated Insights, autonomous vehicles, Big bang: deregulation of the City of London, big data - Walmart - Pop Tarts, Bill Joy: nanobots, Blue Ocean Strategy, business process, business process outsourcing, Cass Sunstein, Checklist Manifesto, Clapham omnibus, Clayton Christensen, clean water, cloud computing, commoditize, computer age, Computer Numeric Control, computer vision, Computing Machinery and Intelligence, conceptual framework, corporate governance, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, data science, death of newspapers, disintermediation, Douglas Hofstadter, driverless car, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, Filter Bubble, full employment, future of work, Garrett Hardin, Google Glasses, Google X / Alphabet X, Hacker Ethic, industrial robot, informal economy, information retrieval, interchangeable parts, Internet of things, Isaac Newton, James Hargreaves, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Khan Academy, knowledge economy, Large Hadron Collider, lifelogging, lump of labour, machine translation, Marshall McLuhan, Metcalfe’s law, Narrative Science, natural language processing, Network effects, Nick Bostrom, optical character recognition, Paul Samuelson, personalized medicine, planned obsolescence, pre–internet, Ray Kurzweil, Richard Feynman, Second Machine Age, self-driving car, semantic web, Shoshana Zuboff, Skype, social web, speech recognition, spinning jenny, strong AI, supply-chain management, Susan Wojcicki, tacit knowledge, TED Talk, telepresence, The Future of Employment, the market place, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Tragedy of the Commons, transaction costs, Turing test, Two Sigma, warehouse 
robotics, Watson beat the top human players on Jeopardy!, WikiLeaks, world market for maybe five computers, Yochai Benkler, young professional

For us, it represents the coming of the second wave of AI (section 4.9). Here is a system that undoubtedly performs tasks that we would normally think require human intelligence. The version of Watson that competed on Jeopardy! holds over 200 million pages of documents and implements a wide range of AI tools and techniques, including natural language processing, machine learning, speech synthesis, game-playing, information retrieval, intelligent search, knowledge processing and reasoning, and much more. This type of AI, we stress again, is radically different from the first wave of rule-based expert systems of the 1980s (see section 4.9). It is interesting to note, harking back again to the exponential growth of information technology, that the hardware on which Watson ran in 2011 was said to be about the size of the average bedroom.

The term ‘artificial intelligence’ was coined by John McCarthy in 1955, and in the thirty years or so that followed a wide range of systems, techniques, and technologies were brought under its umbrella (the terms used in the mid-1980s are included in parentheses): the processing and translation of natural language (natural language processing); the recognition of the spoken word (speech recognition); the playing of complex games such as chess (game-playing); the recognition of images and objects of the physical world (vision and perception); learning from examples and precedents (machine learning); computer programs that can themselves generate programs (automatic programming); the sophisticated education of human users (intelligent computer-aided instruction); the design and development of machines whose physical movements resembled those of human beings (robotics), and intelligent problem-solving and reasoning (intelligent knowledge-based systems or expert systems).103 Our project at the University of Oxford (1983–6) focused on theoretical and philosophical aspects of this last category—expert systems—as applied in the law.

We can imagine a day when machines will not just make coffee, but will write wonderful poetry, compose splendid symphonies, paint stunning landscapes, sing beautifully, and even dance with remarkable grace. We are likely to judge these contributions in two ways. On the one hand, we might take a view on their relative merits as machine-generated achievement, marvelling perhaps at the underpinning natural language processing or robotics. Our interest will be in comparing like with like—machine performance with machine performance. On the other hand, we might compare their output with the creative expressions of human beings. It may well be that we will concede that, in terms of outcomes, the machine is superior.


pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang

AI winter, artificial general intelligence, backpropagation, bioinformatics, brain emulation, classic study, combinatorial explosion, complexity theory, computer vision, Computing Machinery and Intelligence, conceptual framework, correlation coefficient, epigenetics, friendly AI, functional programming, G4S, higher-order functions, information retrieval, Isaac Newton, Jeff Hawkins, John Conway, Loebner Prize, Menlo Park, natural language processing, Nick Bostrom, Occam's razor, p-value, pattern recognition, performance metric, precautionary principle, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K

The truth-value of a symbolizing statement indicates the frequency and confidence for the word/phrase/sentence (in the natural language) to be used as the symbol of the term (in Narsese), according to the experience of the system. In the language-understanding process, NARS will not have separate parsing and semantic-mapping phases, as many other natural language processing systems do. Instead, for an input sentence, the recognition of its syntactic structure and the recognition of its semantic structure will be carried out hand in hand. The process will start by checking whether the sentence can be understood as a whole, as in the case of proverbs and idioms. If unsuccessful, the sentence will be divided recursively into phrases and words, whose sequential relations will be tentatively mapped into the structures of compound terms, with components corresponding to the individual phrases and words.
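The whole-first, divide-recursively strategy described above can be sketched as follows. The idiom table and the midpoint split are illustrative assumptions only; the actual NARS grammar maps phrases into Narsese compound terms rather than nested tuples.

```python
# Hypothetical idiom table: phrases understood "as a whole" before any division.
IDIOMS = {"kick the bucket": "die"}

def interpret(phrase: str):
    """Try the phrase as a known unit first; otherwise divide it
    recursively and interpret the parts (a toy midpoint split stands in
    for real phrase-structure recognition)."""
    if phrase in IDIOMS:
        return IDIOMS[phrase]
    words = phrase.split()
    if len(words) == 1:
        return phrase
    mid = len(words) // 2
    return (interpret(" ".join(words[:mid])), interpret(" ".join(words[mid:])))
```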

Even so, its proficiency in that language should be sufficient for many practical purposes. Being able to use any natural language is not a necessary condition for being intelligent. Since the aim of NARS is not to accurately duplicate human behaviors so as to pass the Turing Test [5], natural language processing is optional for the system. 3.3 Education NARS processes tasks using available knowledge, though the system is not designed with a ready-made knowledge base as a necessary part. Instead, all the knowledge, in principle, should come from the system’s experience. In other words, NARS as designed is like a baby that has great potential, but little instinct.

To gracefully incorporate heuristics not explicitly based on probability theory, in cases where probability theory, at its current state of development, does not provide adequate pragmatic solutions. To provide “scalable” reasoning, in the sense of being able to carry out inferences involving at least billions of premises. Of course, when the number of premises is fewer, more intensive and accurate reasoning may be carried out. To easily accept input from, and send input to, natural language processing software systems. PLN implements a wide array of first-order and higher-order inference rules including (but not limited to) deduction, Bayes’ Rule, unification, intensional and extensional inference, belief revision, induction, and abduction. Each rule comes with uncertain truth-value formulas, calculating the truth-value of the conclusion from the truth-values of the premises.
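As a concrete example of such a truth-value formula, PLN's deduction rule chains A→B and B→C into A→C under an independence assumption, using the strengths of the two premises plus the term probabilities of B and C. The exact form below follows the commonly published PLN deduction strength formula; treat it as a sketch rather than the definitive implementation.

```python
def deduction_strength(s_ab: float, s_bc: float, s_b: float, s_c: float) -> float:
    """Estimate the strength of A->C from the strengths of A->B (s_ab)
    and B->C (s_bc) and the probabilities of terms B (s_b) and C (s_c),
    assuming independence between A and C outside of B."""
    if s_b >= 1.0:
        return s_bc  # B is certain, so A->C inherits B->C's strength
    return s_ab * s_bc + (1 - s_ab) * (s_c - s_b * s_bc) / (1 - s_b)
```

A full PLN rule would also compute a confidence component from the premises' evidence counts; only the strength part is shown here.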


pages: 71 words: 14,237

21 Recipes for Mining Twitter by Matthew A. Russell

en.wikipedia.org, Google Earth, natural language processing, NP-complete, social web, web application

See Also: http://labs.mudynamics.com/wp-content/uploads/2009/04/icouch.html, http://help.com/post/383276-anyone-knows-the-formula-for-font-s

1.12 Summarizing Link Targets

Problem: You want to summarize the text of a web page that’s indicated by a short URL in a tweet.

Solution: Extract the text from the web page, and then use a natural language processing (NLP) toolkit such as the Natural Language Toolkit (NLTK) to help you extract the most important sentences to create a machine-generated abstract.

Discussion: Summarizing web pages is a very powerful capability, and this is especially the case in the context of a tweet where you have a lot of additional metadata (or “reactions”) about the page from one or more tweets.
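The recipe's solution uses NLTK, but the underlying idea — score each sentence by the frequency of its content words and keep the top scorers — can be sketched without any dependencies. The stopword list and the regex tokenizer below are simplified assumptions standing in for NLTK's tokenizers and corpora.

```python
import re
from collections import Counter

# Toy stopword list; NLTK ships a much larger one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that"}

def summarize(text: str, n_sentences: int = 2) -> list[str]:
    """Frequency-based extractive summary: rank sentences by the summed
    corpus frequency of their non-stopword terms, then return the top-n
    sentences in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in re.findall(r"[a-z']+", sentences[i].lower())),
    )
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

Returning the kept sentences in document order (the final `sorted`) is what makes the output read as an abstract rather than a ranked list.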


pages: 308 words: 84,713

The Glass Cage: Automation and Us by Nicholas Carr

Airbnb, Airbus A320, Andy Kessler, Atul Gawande, autonomous vehicles, Bernard Ziegler, business process, call centre, Captain Sullenberger Hudson, Charles Lindbergh, Checklist Manifesto, cloud computing, cognitive load, computerized trading, David Brooks, deep learning, deliberate practice, deskilling, digital map, Douglas Engelbart, driverless car, drone strike, Elon Musk, Erik Brynjolfsson, Evgeny Morozov, Flash crash, Frank Gehry, Frank Levy and Richard Murnane: The New Division of Labor, Frederick Winslow Taylor, future of work, gamification, global supply chain, Google Glasses, Google Hangouts, High speed trading, human-factors engineering, indoor plumbing, industrial robot, Internet of things, Ivan Sutherland, Jacquard loom, James Watt: steam engine, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Kevin Kelly, knowledge worker, low interest rates, Lyft, machine readable, Marc Andreessen, Mark Zuckerberg, means of production, natural language processing, new economy, Nicholas Carr, Norbert Wiener, Oculus Rift, pattern recognition, Peter Thiel, place-making, plutocrats, profit motive, Ralph Waldo Emerson, RAND corporation, randomized controlled trial, Ray Kurzweil, recommendation engine, robot derives from the Czech word robota Czech, meaning slave, scientific management, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley ideology, software is eating the world, Stephen Hawking, Steve Jobs, systems thinking, tacit knowledge, TaskRabbit, technological determinism, technological solutionism, technoutopianism, TED Talk, The Wealth of Nations by Adam Smith, turn-by-turn navigation, Tyler Cowen, US Airways Flight 1549, Watson beat the top human players on Jeopardy!, William Langewiesche

Until recently, it was difficult, if not impossible, for computers to replicate such deep, specialized, and often tacit knowledge. But inexorable advances in processing speed, precipitous declines in data-storage and networking costs, and breakthroughs in artificial-intelligence methods such as natural language processing and pattern recognition have changed the equation. Computers have become much more adept at reviewing and interpreting vast amounts of text and other information. By spotting correlations in the data—traits or phenomena that tend to be found together or to occur simultaneously or sequentially—computers are often able to make accurate predictions, calculating, say, the probability that a patient displaying a set of symptoms has or will develop a particular disease or the odds that a patient with a certain disease will respond well to a particular drug or other treatment regimen.
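The kind of calculation described — the probability that a patient has a disease given an observed symptom — is classically done with Bayes' rule, combining the disease's base rate with the test's or symptom's hit and false-alarm rates. The numbers in the usage note are invented for illustration.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Bayes' rule: P(disease | symptom) from the base rate of the
    disease (prior), P(symptom | disease) (sensitivity), and
    P(symptom | no disease) (false_positive_rate)."""
    evidence_given_disease = sensitivity * prior
    evidence_given_healthy = false_positive_rate * (1 - prior)
    return evidence_given_disease / (evidence_given_disease + evidence_given_healthy)
```

With an assumed 1% base rate, 90% sensitivity, and a 5% false-positive rate, `posterior(0.01, 0.9, 0.05)` comes out near 0.15 — a reminder of why rare-disease predictions stay uncertain even with a sensitive signal.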

., 138 mobile phones, 132–33 Moore’s Law, 40 Morozov, Evgeny, 205, 225 Moser, Edvard, 134–35 Moser, May-Britt, 134 motivation, 14, 17, 124 “Mowing” (Frost), 211–16, 218, 221–22 Murnane, Richard, 9, 10 Musk, Elon, 8 Nadin, Mihai, 80 NASA, 50, 55, 58 National Safety Council, 208 National Transportation Safety Board (NTSB), 44 natural language processing, 113 nature, 217, 220 Nature, 155 Nature Neuroscience, 134–35 navigation systems, 59, 68–71, 217 see also GPS Navy, U.S., 189 Nazi Germany, 35, 157 nervous system, 9–10, 36, 220–21 Networks of Power (Hughes), 196 neural networks, 113–14 neural processing, 119n neuroergonomic systems, 165 neurological studies, 9 neuromorphic microchips, 114, 119n neurons, 57, 133–34, 150, 219 neuroscience, neuroscientists, 74, 133–37, 140, 149 New Division of Labor, The (Levy and Murnane), 9 Nimwegen, Christof van, 75–76, 180 Noble, David, 173–74 Norman, Donald, 161 Noyes, Jan, 54–55 NSA, 120, 198 numerical control, 174–75 Oakeshott, Michael, 124 Obama, Barack, 94 Observer, 78–79 Oculus Rift, 201 Office of the Inspector General, 99 offices, 28, 108–9, 112, 222 automation complacency and, 69 Ofri, Danielle, 102 O’Keefe, John, 133–34 Old Dominion University, 91 “On Things Relating to the Surgery” (Hippocrates), 158 oracle machine, 119–20 “Outsourced Brain, The” (Brooks), 128 Pallasmaa, Juhani, 145 Parameswaran, Ashwin, 115 Parameters, 191 parametric design, 140–41 parametricism, 140–41 “Parametricism Manifesto” (Schumacher), 141 Parasuraman, Raja, 54, 67, 71, 166, 176 Parry, William Edward, 125 pattern recognition, 57, 58, 81, 83, 113 Pavlov, Ivan, 88 Pebble, 201 Pediatrics, 97 perception, 8, 121, 130, 131, 132, 133, 144, 148–51, 201, 214–18, 220, 226, 230 performance, Yerkes-Dodson law and, 96 Phenomenology of Perception (Merleau-Ponty), 216 philosophers, 119, 143, 144, 148–51, 186, 224 photography, film vs. 
digital, 230 Piano, Renzo, 138, 141–42 pilots, 1, 2, 32, 43–63, 91, 153 attentional tunneling and, 200–201 capability of the plane vs., 60–61, 154 death of, 53 erosion of expertise of, 54–58, 62–63 human- vs. technology-centered automation and, 168–70, 172–73 income of, 59–60 see also autopilot place, 131–34, 137, 251n place cells, 133–34, 136, 219 Plato, 148 Player Piano (Vonnegut), 39 poetry, 211–16, 218, 221–22 Poirier, Richard, 214, 215 Politics (Aristotle), 224 Popular Science, 48 Post, Wiley, 48, 50, 53, 57, 62, 82, 169 power, 21, 37, 65, 151, 175, 204, 217 practice, 82–83 Predator drone, 188 premature fixation, 145 presence, power of, 200 Priestley, Joseph, 160 Prius, 6, 13, 154–55 privacy, 206 probability, 113–24 procedural (tacit) knowledge, 9–11, 83, 105, 113, 144 productivity, 18, 22, 29, 30, 37, 106, 160, 173, 175, 181, 218 professional work, incursion of computers into, 115 profit motive, 17 profits, 18, 22, 28, 30, 33, 95, 159, 171, 172–73, 175 progress, 21, 26, 29, 37, 40, 65, 196, 214 acceleration of, 26 scientific, 31, 123 social, 159–60, 228 progress (continued) technological, 29, 31, 34, 35, 48–49, 108–9, 159, 160, 161, 173, 174, 222, 223–24, 226, 228, 230 utopian vision of, 25, 26 prosperity, 20, 21, 107 proximal cues, 219–20 psychologists, psychology, 9, 11, 15, 54, 103, 119, 149, 158–59 animal studies, 87–92 cognitive, 72–76, 81, 129–30 psychomotor skills, 56, 57–58, 81, 120 quality of experience, 14–15 Race against the Machine (Brynjolfsson and McAfee), 28–29 RAND Corporation, 93–98 “Rationalism in Politics” (Oakeshott), 124 Rattner, Justin, 203 reading, learning of, 82 Reaper drone, 188 reasoning, reason, 120, 121, 124, 151 recession, 27, 28, 30, 32 Red Dead Redemption, 177–78 “Relation of Strength of Stimulus to Rapidity of Habit-Formation, The” (Yerkes and Dodson), 89 Renslow, Marvin, 43–44 Revit, 146, 147 Rifkin, Jeremy, 28 Robert, David, 45, 169–70 Robert Frost (Poirier), 214 Roberts, J.


pages: 245 words: 83,272

Artificial Unintelligence: How Computers Misunderstand the World by Meredith Broussard

"Susan Fowler" uber, 1960s counterculture, A Declaration of the Independence of Cyberspace, Ada Lovelace, AI winter, Airbnb, algorithmic bias, AlphaGo, Amazon Web Services, autonomous vehicles, availability heuristic, barriers to entry, Bernie Sanders, Big Tech, bitcoin, Buckminster Fuller, Charles Babbage, Chris Urmson, Clayton Christensen, cloud computing, cognitive bias, complexity theory, computer vision, Computing Machinery and Intelligence, crowdsourcing, Danny Hillis, DARPA: Urban Challenge, data science, deep learning, Dennis Ritchie, digital map, disruptive innovation, Donald Trump, Douglas Engelbart, driverless car, easy for humans, difficult for computers, Electric Kool-Aid Acid Test, Elon Musk, fake news, Firefox, gamification, gig economy, global supply chain, Google Glasses, Google X / Alphabet X, Greyball, Hacker Ethic, independent contractor, Jaron Lanier, Jeff Bezos, Jeremy Corbyn, John Perry Barlow, John von Neumann, Joi Ito, Joseph-Marie Jacquard, life extension, Lyft, machine translation, Mark Zuckerberg, mass incarceration, Minecraft, minimum viable product, Mother of all demos, move fast and break things, Nate Silver, natural language processing, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, One Laptop per Child (OLPC), opioid epidemic / opioid crisis, PageRank, Paradox of Choice, payday loans, paypal mafia, performance metric, Peter Thiel, price discrimination, Ray Kurzweil, ride hailing / ride sharing, Ross Ulbricht, Saturday Night Live, school choice, self-driving car, Silicon Valley, Silicon Valley billionaire, speech recognition, statistical model, Steve Jobs, Steven Levy, Stewart Brand, TechCrunch disrupt, Tesla Model S, the High Line, The Signal and the Noise by Nate Silver, theory of mind, traumatic brain injury, Travis Kalanick, trolley problem, Turing test, Uber for X, uber lyft, Watson beat the top human players on Jeopardy!, We are as Gods, Whole Earth Catalog, women in the workforce, work 
culture, yottabyte

Yes, they’re fun to imagine, and it can inspire wonderful creativity to think about the possibilities of robot domination and so on—but they aren’t real. This book hews closely to the real mathematical, cognitive, and computational concepts that are in the actual academic discipline of artificial intelligence: knowledge representation and reasoning, logic, machine learning, natural language processing, search, planning, mechanics, and ethics. In the first computational adventure (chapter 5), I investigate why, after two decades of education reform, schools still can’t get students to pass standardized tests. It’s not the students’ or the teachers’ fault. The problem is far bigger: the companies that create the most important state and local exams also publish textbooks that contain many of the answers, but low-income school districts can’t afford to buy the books.

Meanwhile, sciences that involve human beings rather than elementary particles have proven more resistant to elegant mathematics. Economists suffer from physics envy over their inability to neatly model human behavior. An informal, incomplete grammar of the English language runs over 1,700 pages. Perhaps when it comes to natural language processing and related fields, we’re doomed to complex theories that will never have the elegance of physics equations. But if that’s so, we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data.27 Data is unreasonably effective—seductively so, even.


pages: 696 words: 143,736

The Age of Spiritual Machines: When Computers Exceed Human Intelligence by Ray Kurzweil

Ada Lovelace, Alan Greenspan, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Alvin Toffler, Any sufficiently advanced technology is indistinguishable from magic, backpropagation, Buckminster Fuller, call centre, cellular automata, Charles Babbage, classic study, combinatorial explosion, complexity theory, computer age, computer vision, Computing Machinery and Intelligence, cosmological constant, cosmological principle, Danny Hillis, double helix, Douglas Hofstadter, Everything should be made as simple as possible, financial engineering, first square of the chessboard / second half of the chessboard, flying shuttle, fudge factor, functional programming, George Gilder, Gödel, Escher, Bach, Hans Moravec, I think there is a world market for maybe five computers, information retrieval, invention of movable type, Isaac Newton, iterative process, Jacquard loom, John Gilmore, John Markoff, John von Neumann, Lao Tzu, Law of Accelerating Returns, mandelbrot fractal, Marshall McLuhan, Menlo Park, natural language processing, Norbert Wiener, optical character recognition, ought to be enough for anybody, pattern recognition, phenotype, punch-card reader, quantum entanglement, Ralph Waldo Emerson, Ray Kurzweil, Richard Feynman, Robert Metcalfe, Schrödinger's Cat, Search for Extraterrestrial Intelligence, self-driving car, Silicon Valley, social intelligence, speech recognition, Steven Pinker, Stewart Brand, stochastic process, Stuart Kauffman, technological singularity, Ted Kaczynski, telepresence, the medium is the message, The Soul of a New Machine, There's no reason for any individual to have a computer in his home - Ken Olsen, traveling salesman, Turing machine, Turing test, Whole Earth Review, world market for maybe five computers, Y2K

He cites the following sentence:“What number of products of products of products of products of products of products of products of products was the number of products of products of products of products of products of products of products of products?” as having 1,430 X 1,430 = 2,044,900 interpretations. 4 These and other theoretical aspects of computational linguistics are covered in Mary D. Harris, Introduction to Natural Language Processing (Reston, VA: Reston Publishing Co., 1985). CHAPTER 6: BUILDING NEW BRAINS ... 1 Hans Moravec is likely to make this argument in his 1998 book Robot: Mere Machine to Transcendent Mind (Oxford University Press; not yet available as of this writing). 2 One hundred fifty million calculations per second for a 1998 personal computer doubling twenty-seven times by the year 2025 (this assumes doubling both the number of components, and the speed of each component every two years) equals about 20 million billion calculations per second.

Babbage’s Calculating Engines: A Collection of Papers by Henry Prevost Babbage (Editor). Vol. 2. Los Angeles: Tomash, 1982. Bailey, James. After Thought: The Computer Challenge to Human Intelligence. New York: Basic Books, 1996. Bara, Bruno G. and Giovanni Guida. Computational Models of Natural Language Processing. Amsterdam: North Holland, 1984. Barnsley, Michael F. Fractals Everywhere. Boston: Academic Press Professional, 1993. Baron, Jonathan. Rationality and Intelligence. Cambridge: Cambridge University Press, 1985. Barrett, Paul H., ed. The Collected Papers of Charles Darwin. Vols. 1 and 2.

Expert Systems: Artificial Intelligence in Business. New York: John Wiley and Sons, 1985. Harre, Rom, ed. American Behaviorial Scientist: Computation and the Mind. Vol. 40, no. 6, May 1997. Harrington, Steven. Computer Graphics: A Programming Approach. New York: McGraw-Hill, 1987. Harris, Mary Dee. Introduction to Natural Language Processing. Reston, VA: Reston, 1985. Haugeland, John. Artificial Intelligence: The Very Idea. Cambridge, MA: MIT Press, 1985. ________, ed. Mind Design: Philosophy, Psychology, Artificial Intelligence. Cambridge, MA: MIT Press, 1981. ________, ed. Mind Design II: Philosophy, Psychology, Artificial Intelligence.


pages: 1,085 words: 219,144

Solr in Action by Trey Grainger, Timothy Potter

business intelligence, cloud computing, commoditize, conceptual framework, crowdsourcing, data acquisition, data science, en.wikipedia.org, failed state, fault tolerance, finite state, full text search, functional programming, glass ceiling, information retrieval, machine readable, natural language processing, openstreetmap, performance metric, premature optimization, recommendation engine, web application

log function LoggingHandler class logs, 2nd long queries <long> element LowerCaseFilter LowerCaseFilterFactory, 2nd, 3rd LRU (Least Recently Used) <lst> element, 2nd Lucene, 2nd lucene folder Lucene in Action <luceneMatchVersion> element LuceneQParserPlugin class lucene-solr/ folder LukeRequestHandler class, 2nd M map function MappingCharFilterFactory MapReduce master.replication.enabled parameter masterUrl parameter math functions <maxBufferedDocs> element maxdoc function maxMergeAtOnce parameter maxShardsPerNode parameter maxWarmingSearchers parameter <maxWarmingSearchers> element MBeans, 2nd mean reciprocal rank metric memcached memory RAM sorting and mentions, preserving in text mergeFactor parameter <mergeFactor> element MERGEINDEXES action <mergePolicy> element <mergeScheduler> element metadata microblog search application example, 2nd MinimalStem filter minimum match missing values, and sorting misspelled terms mm parameter MMapDirectory monitoring, external More Like This feature, 2nd, 3rd, 4th MoreLikeThisHandler class, 2nd ms function MS Office documents MS SQL Server multicore configuration multilingual search data-modeling features language identification dynamically assigning language analyzers dynamically mapping content overview update processors for language-specific field type configurations linguistic analysis scenarios field type for multiple languages multiple languages in one field separate fields per language separate indexes per language stemming dictionary-based (Hunspell) example KeywordMarkerFilterFactory language-specific analyzer chains vs. lemmatization StemmerOverrideFilterFactory multiselect faceting defined excludes keys multitenant search MultiTextField, 2nd MultiTextFieldAnalyzer MultiTextFieldLanguageIdentifierUpdate-Processor MultiTextFieldLanguageIdentifierUpdate-ProcessorFactory MultiTextFieldTokenizer, 2nd multiValued attribute murmur hash algorithm MySQL N Nagios, 2nd Natural Language Processing. See NLP. 
natural language, search using near real-time search. See NRT search. negated terms Nested query parser nesting function queries .NET Netflix newSearcher event n-grams NIOFSDirectory NLP (Natural Language Processing) node recovery process norm function normal commit Norwegian language NorwegianLightStemFilterFactory NoSQL (Not only SQL), 2nd, 3rd not function NOT operator, 2nd NRTCachingDirectory NRTCachingDirectoryFactory class numdocs function numeric fields overview precisionStep attribute numShards parameter, 2nd, 3rd Nutch O offsite backup for SolrCloud omitNorms attribute, 2nd, 3rd op parameter OpenOffice documents <openSearcher> element Optimize request, update handler optional terms, 2nd optimistic concurrency control OR operator, 2nd Oracle AS ord function outage types OutOfMemoryError P parameters dereferencing local params parameter substitutions <params> element parseArg() method parseFloat() method parseValueSource() method PatternReplaceCharFilterFactory, 2nd payload boosting PDF documents importing common formats indexing peer sync perception of relevancy permissions, document Persian language, 2nd persist parameter pf (phrase fields) parameters PHP, 2nd PHPResponseWriter class PHPSerializedResponseWriter class phrase searches, 2nd phrase slop parameters.

If the text instead read “After sailing for hours, John approached the bank,” you would likely be thinking about a person named John on a boat floating toward the shore. Both sentences state that “John approached the bank,” but the context plays a critical role in ensuring the text is properly understood. Due to advances in the field of Natural Language Processing (NLP), many important contextual clues can be identified in standard text. These can include identification of the language of unknown text, determination of the parts of speech, discovery or approximation of the root form of a word, understanding of synonyms and unimportant words, and discovery of relationships between words through their usage.
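A couple of the steps listed above can be approximated even with naive code. The sketch below is purely illustrative, not a real NLP pipeline: the stopword list and suffix-stripping rules are invented for the example, whereas production systems use trained models and proper stemmers.

```python
# Toy normalization pass: lowercase, drop unimportant words, crudely
# approximate root forms. Stopword list and suffix rules are illustrative only.
STOPWORDS = {"the", "a", "an", "of", "for", "to", "and", "in"}
SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order

def crude_stem(word):
    """Strip a common suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def normalize(text):
    tokens = [t.lower() for t in text.split()]
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]

print(normalize("John approached the banks sailing for hours"))
# → ['john', 'approach', 'bank', 'sail', 'hour']
```

Mapping "banks" and "bank" (or "sailing" and "sail") to the same root is what lets a search engine match documents that use different surface forms of a word.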

Other clustering and data classification techniques can also be used to enrich your data, which can lead to a search experience far superior to keyword searching alone. Although implementing most of these capabilities is beyond the scope of this book, Grant Ingersoll, Thomas Morton, and Andrew Farris provide a great overview of how to implement these kinds of natural language processing techniques in Taming Text: How to Find, Organize, and Manipulate It (Manning, 2013), including a chapter on building a question-and-answer system similar to some of the previous examples. What Solr does provide out of the box, however, are the building blocks for these kinds of systems.


Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose

Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, sparse data, speech recognition, statistical model, William of Occam

This is the keyword search approach, well known from the area of information retrieval (IR). In web search, further IR techniques are used to avoid terms that are too general or too specific, to take into account term distribution throughout the entire body of documents, and to explore document similarity. Natural language processing approaches are also used to analyze term context or lexical information, or to combine several terms into phrases. After a set of documents is retrieved and ranked by how well each matches the keyword query, the documents are further ranked by importance (popularity, authority), usually based on the web link structure.
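The term-distribution idea mentioned above is commonly formalized as TF-IDF: a term's weight in a document grows with its frequency there but shrinks when it appears across many documents. A minimal sketch, with an invented three-document corpus and no link-based ranking:

```python
import math
from collections import Counter

# Tiny illustrative corpus; a real engine indexes millions of documents
# and combines this score with link-structure signals such as PageRank.
docs = [
    "john approached the bank to deposit money",
    "the river bank was muddy after the rain",
    "the bank raised interest rates",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

def idf(term):
    # Terms occurring in every document (like "the") get weight zero.
    df = sum(1 for doc in tokenized if term in doc)
    return math.log(N / df) if df else 0.0

def score(query, doc):
    tf = Counter(doc)
    return sum(tf[t] * idf(t) for t in query.split())

ranked = sorted(range(N), key=lambda i: score("deposit money", tokenized[i]),
                reverse=True)
print(ranked[0])  # → 0: the first document matches the query best
```

Note how `idf("the")` is zero here because "the" occurs in all three documents, so an overly general term contributes nothing to the ranking.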

This process takes a lot of time and effort because it is done by people. There are attempts to use computers for this purpose, but the problem is that content-based access assumes understanding the meaning of documents, something that is still a research question, studied in the area of artificial intelligence and natural language processing in particular. One may argue that natural language texts are structured, which is true as far as the language syntax (grammatical structure) is concerned. However, the transition to meaning still requires semantic structuring or understanding. There exists a solution that avoids the problem of meaning but still provides some types of content-based access to unstructured data.


pages: 292 words: 94,660

The Loop: How Technology Is Creating a World Without Choices and How to Fight Back by Jacob Ward

2021 United States Capitol attack, 4chan, Abraham Wald, AI winter, Albert Einstein, Albert Michelson, Amazon Mechanical Turk, assortative mating, autonomous vehicles, availability heuristic, barriers to entry, Bayesian statistics, Benoit Mandelbrot, Big Tech, bitcoin, Black Lives Matter, Black Swan, blockchain, Broken windows theory, call centre, Cass Sunstein, cloud computing, contact tracing, coronavirus, COVID-19, crowdsourcing, cuban missile crisis, Daniel Kahneman / Amos Tversky, dark matter, data science, deep learning, Donald Trump, drone strike, endowment effect, George Akerlof, George Floyd, hindsight bias, invisible hand, Isaac Newton, Jeffrey Epstein, license plate recognition, lockdown, longitudinal study, Lyft, mandelbrot fractal, Mark Zuckerberg, meta-analysis, natural language processing, non-fungible token, nudge unit, OpenAI, opioid epidemic / opioid crisis, pattern recognition, QAnon, RAND corporation, Richard Thaler, Robert Shiller, selection bias, self-driving car, seminal paper, shareholder value, smart cities, social contagion, social distancing, Steven Levy, survivorship bias, TikTok, Turing test

The ELIZA program knew not only how to reflect back the last sentiment expressed; it could also hang on to certain notions and circle back to them—a simple piece of programming that happens to impersonate one of the best traits of a good listener. My father is afraid of everybody. WHAT ELSE COMES TO MIND WHEN YOU THINK OF YOUR FATHER Bullies. DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE Weizenbaum wrote later that he built ELIZA to learn what he could about natural-language processing, but wound up discovering far more about human tendencies. And in the end, those tendencies caused him to flee the field. He spent the rest of his life critiquing the sort of work he’d been doing, and he died in 2008 after a long period of environmental activism. In his 1976 book, Computer Power and Human Reason, he described a series of shocks that had him rethinking his project and recoiling from its implications.
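Reflection of this kind takes surprisingly little machinery: pattern-match the input, swap first-person pronouns to second person, and slot the captured phrase into a canned response. The sketch below is in the spirit of ELIZA; the two rules are invented for illustration and are not Weizenbaum's actual script.

```python
import re

# Each rule pairs a regex with a response template that reflects
# the captured phrase back at the speaker.
PRONOUNS = {"my": "your", "i": "you", "me": "you", "am": "are"}
RULES = [
    (re.compile(r"my (.+) is afraid of (.+)", re.I),
     "WHAT ELSE COMES TO MIND WHEN YOU THINK OF YOUR {0}"),
    (re.compile(r"i (?:feel|am) (.+)", re.I),
     "DO YOU OFTEN FEEL {0}"),
]

def reflect(phrase):
    """Swap first-person words for second-person ones."""
    return " ".join(PRONOUNS.get(w, w) for w in phrase.lower().split())

def respond(line):
    for pattern, template in RULES:
        m = pattern.search(line)
        if m:
            return template.format(*(reflect(g) for g in m.groups())).upper()
    return "TELL ME MORE"  # generic fallback when nothing matches

print(respond("My father is afraid of everybody."))
# → WHAT ELSE COMES TO MIND WHEN YOU THINK OF YOUR FATHER
```

The program has no model of fathers or fear at all, which is exactly the gap between its shallow mechanics and the understanding users projected onto it.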

Todd is the most successful of that tiny fraction of a percent. Allen Lau, the CEO of Wattpad, told me that machine learning is the key to finding authors like Todd. “Machines are very, very good at analyzing massive, massive amounts of data. And this is what we have.” And with the rise of natural-language processing, in which an algorithm can glean bits of meaning from what people write, Wattpad can scan not just the writing itself, but what’s being written about the writing. “We are seeing hundreds of millions of comments on those stories every single month. So, a lot of the insights, a lot of the emotion, is actually contained in the comments,” Lau said.


pages: 533

Future Politics: Living Together in a World Transformed by Tech by Jamie Susskind

3D printing, additive manufacturing, affirmative action, agricultural Revolution, Airbnb, airport security, algorithmic bias, AlphaGo, Amazon Robotics, Andrew Keen, Apollo Guidance Computer, artificial general intelligence, augmented reality, automated trading system, autonomous vehicles, basic income, Bertrand Russell: In Praise of Idleness, Big Tech, bitcoin, Bletchley Park, blockchain, Boeing 747, brain emulation, Brexit referendum, British Empire, business process, Cambridge Analytica, Capital in the Twenty-First Century by Thomas Piketty, cashless society, Cass Sunstein, cellular automata, Citizen Lab, cloud computing, commons-based peer production, computer age, computer vision, continuation of politics by other means, correlation does not imply causation, CRISPR, crowdsourcing, cryptocurrency, data science, deep learning, DeepMind, digital divide, digital map, disinformation, distributed ledger, Donald Trump, driverless car, easy for humans, difficult for computers, Edward Snowden, Elon Musk, en.wikipedia.org, end-to-end encryption, Erik Brynjolfsson, Ethereum, ethereum blockchain, Evgeny Morozov, fake news, Filter Bubble, future of work, Future Shock, Gabriella Coleman, Google bus, Google X / Alphabet X, Googley, industrial robot, informal economy, intangible asset, Internet of things, invention of the printing press, invention of writing, Isaac Newton, Jaron Lanier, John Markoff, Joseph Schumpeter, Kevin Kelly, knowledge economy, Large Hadron Collider, Lewis Mumford, lifelogging, machine translation, Metcalfe’s law, mittelstand, more computing power than Apollo, move fast and break things, natural language processing, Neil Armstrong, Network effects, new economy, Nick Bostrom, night-watchman state, Oculus Rift, Panopticon Jeremy Bentham, pattern recognition, payday loans, Philippa Foot, post-truth, power law, price discrimination, price mechanism, RAND corporation, ransomware, Ray Kurzweil, Richard Stallman, ride hailing / ride sharing, road to serfdom, 
Robert Mercer, Satoshi Nakamoto, Second Machine Age, selection bias, self-driving car, sexual politics, sharing economy, Silicon Valley, Silicon Valley startup, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart contracts, Snapchat, speech recognition, Steve Bannon, Steve Jobs, Steve Wozniak, Steven Levy, tech bro, technological determinism, technological singularity, technological solutionism, the built environment, the Cathedral and the Bazaar, The Structural Transformation of the Public Sphere, The Wisdom of Crowds, Thomas L Friedman, Tragedy of the Commons, trolley problem, universal basic income, urban planning, Watson beat the top human players on Jeopardy!, work culture , working-age population, Yochai Benkler

See also Andre Esteva et al., ‘Dermatologist-level Classification of Skin Cancer with Deep Neural Networks’, Nature 542 (2 February 2017): 115–18. Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preotiuc, and Vasileios Lampos, ‘Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective’, PeerJ Computer Science 2, e93 (24 October 2016). Sarah A. Topol, ‘Attack of the Killer Robots’, BuzzFeed News, 26 August 2016 <https://www.buzzfeed.com/sarahatopol/how-tosave-mankind-from-the-new-breed-of-killer-robots?utm_term=. nm1GdWDBZ#.vaJzgW6va> (accessed 28 November 2017).

Mark Bridge, ‘AI Can Identify Alzheimer’s Disease a Decade before Symptoms Appear’, The Times, 20 September 2017 <https://www.thetimes.co.uk/article/ai-can-identify-alzheimer-s-a-decade-beforesymptoms-appear-9b3qdrrf7> (accessed 1 December 2017). 23. Wendell Wallach and Colin Allen, Moral Machines: Teaching Robots Right from Wrong (Oxford: Oxford University Press, 2009), 27. 24. Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preotiuc, and Vasileios Lampos. ‘Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective’. PeerJ Computer Science 2, e93 (24 October 2016). See further Harry Surden, ‘Machine Learning and Law’, Washington Law Review 89, no. 1 (2014): 87–115. 25. Erik Brynjolfsson and Andrew McAfee, Machine, Platform, Crowd: Harnessing Our Digital Future (New York: W. W. Norton & Company, 2017), 41. 26.

Ajunwa, Ifeoma, Kate Crawford, and Jason Schultz. ‘Limitless Worker Surveillance’. California Law Review 105, no. 3 (2017), 734–76. Aletras, Nikolaos, Dimitrios Tsarapatsanis, Daniel Preotiuc, and Vasileios Lampos. ‘Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective’. PeerJ Computer Science 2, e93 (24 Oct. 2016). Allen, Jonathan P. Technology and Inequality: Concentrated Wealth in a Digital World. Kindle Edition: Palgrave Macmillan, 2017. Ananny, Mike. ‘Toward an Ethics of Algorithms: Convening, Observation, Probability, and Timeliness’.


pages: 625 words: 167,349

The Alignment Problem: Machine Learning and Human Values by Brian Christian

Albert Einstein, algorithmic bias, Alignment Problem, AlphaGo, Amazon Mechanical Turk, artificial general intelligence, augmented reality, autonomous vehicles, backpropagation, butterfly effect, Cambridge Analytica, Cass Sunstein, Claude Shannon: information theory, computer vision, Computing Machinery and Intelligence, data science, deep learning, DeepMind, Donald Knuth, Douglas Hofstadter, effective altruism, Elaine Herzberg, Elon Musk, Frances Oldham Kelsey, game design, gamification, Geoffrey Hinton, Goodhart's law, Google Chrome, Google Glasses, Google X / Alphabet X, Gödel, Escher, Bach, Hans Moravec, hedonic treadmill, ImageNet competition, industrial robot, Internet Archive, John von Neumann, Joi Ito, Kenneth Arrow, language acquisition, longitudinal study, machine translation, mandatory minimum, mass incarceration, multi-armed bandit, natural language processing, Nick Bostrom, Norbert Wiener, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, OpenAI, Panopticon Jeremy Bentham, pattern recognition, Peter Singer: altruism, Peter Thiel, precautionary principle, premature optimization, RAND corporation, recommendation engine, Richard Feynman, Rodney Brooks, Saturday Night Live, selection bias, self-driving car, seminal paper, side project, Silicon Valley, Skinner box, sparse data, speech recognition, Stanislav Petrov, statistical model, Steve Jobs, strong AI, the map is not the territory, theory of mind, Tim Cook: Apple, W. E. B. Du Bois, Wayback Machine, zero-sum game

It would be the foundation for a completely new field: the project to actually build mechanisms out of these simplified versions of neurons, and see just what such “mechanical brains” could do.9 INTRODUCTION In the summer of 2013, an innocuous post appeared on Google’s open-source blog titled “Learning the Meaning Behind Words.”1 “Today computers aren’t very good at understanding human language,” it began. “While state-of-the-art technology is still a ways from this goal, we’re making significant progress using the latest machine learning and natural language processing techniques.” Google had fed enormous datasets of human language, mined from newspapers and the internet—in fact, thousands of times more text than had ever been successfully used before—into a biologically inspired “neural network,” and let the system pore over the sentences for correlations and connections between the terms.

Shannon, “A Mathematical Theory of Communication.” 56. See Jelinek and Mercer, “Interpolated Estimation of Markov Source Parameters from Sparse Data,” and Katz, “Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer”; for an overview, see Manning and Schütze, Foundations of Statistical Natural Language Processing. 57. This famous phrase originated in Bellman, Dynamic Programming. 58. See Hinton, “Learning Distributed Representations of Concepts,” and “Connectionist Learning Procedures,” and Rumelhart and McClelland, Parallel Distributed Processing. 59. See, for instance, latent semantic analysis (see Landauer, Foltz, and Laham, “An Introduction to Latent Semantic Analysis”), the multiple cause mixture model (see Saund, “A Multiple Cause Mixture Model for Unsupervised Learning” and Sahami, Hearst, and Saund, “Applying the Multiple Cause Mixture Model to Text Categorization”), and latent Dirichlet allocation (see Blei, Ng, and Jordan, “Latent Dirichlet Allocation”). 60.

“Categorizing Variants of Goodhart’s Law.” arXiv Preprint arXiv:1803.04585, 2019. Manning, Christopher. “Lecture 2: Word Vector Representations: Word2vec,” April 3, 2017. https://www.youtube.com/watch?v=ERibwqs9p38. Manning, Christopher D., and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999. Marewski, Julian N., and Gerd Gigerenzer. “Heuristic Decision Making in Medicine.” Dialogues in Clinical Neuroscience 14, no. 1 (2012): 77–89. Marks, Michelle A., Mark J. Sabella, C. Shawn Burke, and Stephen J. Zaccaro. “The Impact of Cross-Training on Team Effectiveness.”


Beautiful Visualization by Julie Steele

barriers to entry, correlation does not imply causation, data acquisition, data science, database schema, Drosophila, en.wikipedia.org, epigenetics, global pandemic, Hans Rosling, index card, information retrieval, iterative process, linked data, Mercator projection, meta-analysis, natural language processing, Netflix Prize, no-fly zone, pattern recognition, peer-to-peer, performance metric, power law, QR code, recommendation engine, semantic web, social bookmarking, social distancing, social graph, sorting algorithm, Steve Jobs, the long tail, web application, wikimedia commons, Yochai Benkler

Text Analysis We’ll now take a step back and consider some of the fundamental assumptions that determine Wordle’s character. In particular, we have to examine what “text” is, as far as Wordle is concerned. While this kind of text analysis is crude compared to what’s required for some natural-language processing, it can still be tedious to implement. If you work in Java, you might find my cue.language library[13] useful for the kinds of tasks described in this section. It’s small enough, it’s fast enough, and thousands use it each day as part of Wordle. Remember that natural-language analysis is as much craft as science,[14] and even given state-of-the-art computational tools, you have to apply judgment and taste.
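The core of the "what is text?" question for a tag cloud is deciding what counts as a word and which words to drop. The sketch below is not Wordle's cue.language library; the token pattern and stopword list are invented choices that illustrate the kind of judgment calls the passage describes.

```python
import re
from collections import Counter

STOPWORDS = {"the", "and", "of", "a", "to", "in", "is", "on"}
# One possible definition of "word": a run of letters, optionally with an
# internal apostrophe (so "cat's" survives as a single token).
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")

def word_weights(text, top=5):
    """Return the most frequent non-stopword tokens and their counts."""
    tokens = TOKEN.findall(text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top)

text = "The cat sat on the mat and the cat's hat sat near the cat"
print(word_weights(text))
```

The resulting counts are exactly the weights a cloud layout would map to font sizes; changing the token regex or the stopword list visibly changes the picture, which is why the author calls this craft as much as science.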

[11] See http://levitated.net/daily/levEmotionFractal.html. [12] See http://www.cs.umd.edu/hcil/treemap-history/. [13] See http://github.com/vcl/cue.language. [14] For an illuminating demonstration of this craft, see Peter Norvig’s chapter on natural-language processing in the sister O’Reilly book Beautiful Data. [15] See http://researchweb.watson.ibm.com/visual/inaugurals/. [16] See http://www.alphaworks.ibm.com/tech/wordcloud. [17] See http://manyeyes.alphaworks.ibm.com/manyeyes/page/Visualization_Options.html.


Artificial Whiteness by Yarden Katz

affirmative action, AI winter, algorithmic bias, AlphaGo, Amazon Mechanical Turk, autonomous vehicles, benefit corporation, Black Lives Matter, blue-collar work, Californian Ideology, Cambridge Analytica, cellular automata, Charles Babbage, cloud computing, colonial rule, computer vision, conceptual framework, Danny Hillis, data science, David Graeber, deep learning, DeepMind, desegregation, Donald Trump, Dr. Strangelove, driverless car, Edward Snowden, Elon Musk, Erik Brynjolfsson, European colonialism, fake news, Ferguson, Missouri, general purpose technology, gentrification, Hans Moravec, housing crisis, income inequality, information retrieval, invisible hand, Jeff Bezos, Kevin Kelly, knowledge worker, machine readable, Mark Zuckerberg, mass incarceration, Menlo Park, military-industrial complex, Nate Silver, natural language processing, Nick Bostrom, Norbert Wiener, pattern recognition, phenotype, Philip Mirowski, RAND corporation, recommendation engine, rent control, Rodney Brooks, Ronald Reagan, Salesforce, Seymour Hersh, Shoshana Zuboff, Silicon Valley, Silicon Valley billionaire, Silicon Valley ideology, Skype, speech recognition, statistical model, Stephen Hawking, Stewart Brand, Strategic Defense Initiative, surveillance capitalism, talking drums, telemarketer, The Signal and the Noise by Nate Silver, W. E. B. Du Bois, Whole Earth Catalog, WikiLeaks

But absent that, some doubted whether AI deserved the dedicated funding it was receiving from DARPA. Practitioners then had to justify AI’s utility to the agency: DARPA had requested a “road map” that would summarize AI’s past accomplishments and set milestones for its future. In documents like the road map, AI practitioners carve the field into tracks (such as vision, natural language processing, and symbolic reasoning) and explain how each could advance patrons’ aims. But this activity raises difficulties. For one, dividing AI into areas is challenging; it foregrounds the endeavor’s murky and contested boundaries. Furthermore, DARPA expected researchers to commit to tangible goals and schedules, which can expose gaps between AI’s image and its reality.

These practitioners were influenced not just by phenomenology but also by heterodox conceptions of the relationship between organisms and environments developed in biology and cybernetics—lines of inquiry where a disembodied “analytic” epistemology has not reigned supreme.18 PUTTING PHENOMENOLOGY TO PRACTICE Terry Winograd and Fernando Flores made a bold attempt to reframe AI in their book Understanding Computers and Cognition (1986), a generic title that doesn’t do justice to the unusual nature of their project.19 Weaving strands from phenomenology, biology, linguistics, and their own experiences in building computing systems, Winograd and Flores developed a wide-ranging critique of AI and cognitive science that had the technical authority (and more inviting tone) Dreyfus lacked. They also tried to lay a tangible alternative path for practitioners. Winograd, a computer scientist at Stanford University, had previously worked in the area of natural language processing. His past work fit within mainstream AI (one of his computer programs was even critiqued by Dreyfus). From the start, however, Winograd was uneasy with the assumptions made by his peers.20 He met Flores in California, where Flores would later pursue a doctorate in philosophy with Dreyfus at the University of California, Berkeley.


pages: 116 words: 31,356

Platform Capitalism by Nick Srnicek

"World Economic Forum" Davos, 3D printing, additive manufacturing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, Big Tech, Californian Ideology, Capital in the Twenty-First Century by Thomas Piketty, cloud computing, collaborative economy, collective bargaining, data science, deindustrialization, deskilling, Didi Chuxing, digital capitalism, digital divide, disintermediation, driverless car, Ford Model T, future of work, gig economy, independent contractor, Infrastructure as a Service, Internet of things, Jean Tirole, Jeff Bezos, knowledge economy, knowledge worker, liquidity trap, low interest rates, low skilled workers, Lyft, Mark Zuckerberg, means of production, mittelstand, multi-sided market, natural language processing, Network effects, new economy, Oculus Rift, offshore financial centre, pattern recognition, platform as a service, quantitative easing, RFID, ride hailing / ride sharing, Robert Gordon, Salesforce, self-driving car, sharing economy, Shoshana Zuboff, Silicon Valley, Silicon Valley startup, software as a service, surveillance capitalism, TaskRabbit, the built environment, total factor productivity, two-sided market, Uber and Lyft, Uber for X, uber lyft, unconventional monetary instruments, unorthodox policies, vertical integration, warehouse robotics, Zipcar

Every major platform company is increasingly positioning itself in the natural language interface market as well. In 2016 Facebook began a major push for ‘chatbots’ – that is, low-level AI programmes that would converse with users on Facebook’s platform. (This is also why Facebook – and numerous other companies – are investing heavily in AI and the natural language processing needed to enable chatbots.) The bet is that these chatbots will become the preferred way for users to interact with the internet. On this open platform, businesses would be given the tools to develop their own bots and create intuitive means for users to order food, buy a train ticket, or make a dinner reservation.24 Rather than using a separate app or website for accessing businesses and services, users would simply access them through Facebook’s platform, which would make Facebook’s chatbot platform the primary interface for commercial transactions online.


The Deep Learning Revolution (The MIT Press) by Terrence J. Sejnowski

AI winter, Albert Einstein, algorithmic bias, algorithmic trading, AlphaGo, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, autonomous vehicles, backpropagation, Baxter: Rethink Robotics, behavioural economics, bioinformatics, cellular automata, Claude Shannon: information theory, cloud computing, complexity theory, computer vision, conceptual framework, constrained optimization, Conway's Game of Life, correlation does not imply causation, crowdsourcing, Danny Hillis, data science, deep learning, DeepMind, delayed gratification, Demis Hassabis, Dennis Ritchie, discovery of DNA, Donald Trump, Douglas Engelbart, driverless car, Drosophila, Elon Musk, en.wikipedia.org, epigenetics, Flynn Effect, Frank Gehry, future of work, Geoffrey Hinton, Google Glasses, Google X / Alphabet X, Guggenheim Bilbao, Gödel, Escher, Bach, haute couture, Henri Poincaré, I think there is a world market for maybe five computers, industrial robot, informal economy, Internet of things, Isaac Newton, Jim Simons, John Conway, John Markoff, John von Neumann, language acquisition, Large Hadron Collider, machine readable, Mark Zuckerberg, Minecraft, natural language processing, Neil Armstrong, Netflix Prize, Norbert Wiener, OpenAI, orbital mechanics / astrodynamics, PageRank, pattern recognition, pneumatic tube, prediction markets, randomized controlled trial, Recombinant DNA, recommendation engine, Renaissance Technologies, Rodney Brooks, self-driving car, Silicon Valley, Silicon Valley startup, Socratic dialogue, speech recognition, statistical model, Stephen Hawking, Stuart Kauffman, theory of mind, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, traveling salesman, Turing machine, Von Neumann architecture, Watson beat the top human players on Jeopardy!, world market for maybe five computers, X Prize, Yogi Berra

Brooks departed from the traditional controllers used by roboticists and used behavior rather than computation as the metaphor for designing robots. As we learn more from building robots, it will become apparent that the body is a part of the mind. Nature Is Cleverer Than We Are 257 In “Why Natural Language Processing is Now Statistical Natural Language Processing,” Eugene Charniak explained that a basic part of grammar is to tag parts of speech in a sentence. This is something that humans can be trained to do much better than the extant parsing programs. The field of computational linguistics initially tried to apply the generative grammar approach pioneered by Noam Chomsky in the 1980s, but the results were disappointing.
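The statistical tagging Charniak describes can be illustrated by its simplest baseline: label each word with whichever tag it carried most often in training data. The tiny tagged corpus below is invented for the example; real taggers train on large treebanks and model context as well.

```python
from collections import Counter, defaultdict

# Toy training data of (word, tag) pairs. Note "run" is ambiguous:
# seen once as a noun and twice as a verb.
training = [
    ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
    ("the", "DET"), ("run", "NOUN"), ("was", "VERB"), ("long", "ADJ"),
    ("dogs", "NOUN"), ("run", "VERB"), ("run", "VERB"),
]

counts = defaultdict(Counter)
for word, pos in training:
    counts[word][pos] += 1

def tag(sentence):
    # Most-frequent-tag baseline; unseen words default to NOUN,
    # a common heuristic since open-class words are usually nouns.
    return [(w, counts[w].most_common(1)[0][0] if w in counts else "NOUN")
            for w in sentence.split()]

print(tag("the dogs run"))
# → [('the', 'DET'), ('dogs', 'NOUN'), ('run', 'VERB')]
```

Even this context-free baseline tags most tokens correctly on real text, which is part of why counting won out over hand-built grammar rules; full statistical taggers add context (e.g., tag-sequence probabilities) to resolve the cases it gets wrong.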


pages: 413 words: 106,479

Because Internet: Understanding the New Rules of Language by Gretchen McCulloch

4chan, Black Lives Matter, book scanning, British Empire, Cambridge Analytica, citation needed, context collapse, Day of the Dead, DeepMind, digital divide, disinformation, Donald Trump, emotional labour, en.wikipedia.org, eternal september, Firefox, Flynn Effect, Google Hangouts, Ian Bogost, Internet Archive, invention of the printing press, invention of the telephone, lolcat, machine translation, moral panic, multicultural london english, natural language processing, Neal Stephenson, off-the-grid, pre–internet, QWERTY keyboard, Ray Oldenburg, Silicon Valley, Skype, Snapchat, Snow Crash, social bookmarking, social web, SoftBank, Steven Pinker, tech worker, TED Talk, telemarketer, The Great Good Place, the strength of weak ties, Twitter Arab Spring, upwardly mobile, Watson beat the top human players on Jeopardy!, Wayback Machine

favors a few elite languages and dialects: François Grosjean. 2010. Bilingual. Harvard University Press. One method of bridging: Su Lin Blodgett, Lisa Green, and Brendan O’Connor. 2016. “Demographic Dialectal Variation in Social Media: A Case Study of African-American English.” Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. pp. 1119–1130. arxiv.org/pdf/1608.08868v1.pdf. “15-year-old users”: Ivan Smirnov. 2017. “The Digital Flynn Effect: Complexity of Posts on Social Media Increases over Time.” Presented at the International Conference on Social Informatics, September 13–15, 2017, Oxford, UK. arxiv.org/abs/1707.05755.

(No publisher cited.) www.gutenberg.org/ebooks/4956. top twenty most lengthened words: Samuel Brody and Nicholas Diakopoulos. 2011. “Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs.” Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. pp. 562–570. expressive lengthening: Tyler Schnoebelen. January 8, 2013. “Aww, hmmm, ohh heyyy nooo omggg!” Corpus Linguistics. corplinguistics.wordpress.com/2013/01/08/aww-hmmm-ohh-heyyy-nooo-omggg/. Jen Doll. 2016. “Why Drag It Out?” The Atlantic. www.theatlantic.com/magazine/archive/2013/03/dragging-it-out/309220/.


pages: 407 words: 104,622

The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution by Gregory Zuckerman

affirmative action, Affordable Care Act / Obamacare, Alan Greenspan, Albert Einstein, Andrew Wiles, automated trading system, backtesting, Bayesian statistics, Bear Stearns, beat the dealer, behavioural economics, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, Black Monday: stock market crash in 1987, blockchain, book value, Brownian motion, butter production in bangladesh, buy and hold, buy low sell high, Cambridge Analytica, Carl Icahn, Claude Shannon: information theory, computer age, computerized trading, Credit Default Swap, Daniel Kahneman / Amos Tversky, data science, diversified portfolio, Donald Trump, Edward Thorp, Elon Musk, Emanuel Derman, endowment effect, financial engineering, Flash crash, George Gilder, Gordon Gekko, illegal immigration, index card, index fund, Isaac Newton, Jim Simons, John Meriwether, John Nash: game theory, John von Neumann, junk bonds, Loma Prieta earthquake, Long Term Capital Management, loss aversion, Louis Bachelier, mandelbrot fractal, margin call, Mark Zuckerberg, Michael Milken, Monty Hall problem, More Guns, Less Crime, Myron Scholes, Naomi Klein, natural language processing, Neil Armstrong, obamacare, off-the-grid, p-value, pattern recognition, Peter Thiel, Ponzi scheme, prediction markets, proprietary trading, quantitative hedge fund, quantitative trading / quantitative finance, random walk, Renaissance Technologies, Richard Thaler, Robert Mercer, Ronald Reagan, self-driving car, Sharpe ratio, Silicon Valley, sovereign wealth fund, speech recognition, statistical arbitrage, statistical model, Steve Bannon, Steve Jobs, stochastic process, the scientific method, Thomas Bayes, transaction costs, Turing machine, Two Sigma

Once in a while, he’d issue statements that seemed aimed at getting a rise out of his lunch-mates, such as the time he declared that he thought he would live forever. Brown was more animated, approachable, and energetic, with thick, curly brown hair and an infectious charm. Unlike Mercer, Brown forged friendships within the group, several members of which appreciated his sneaky sense of humor. As the group struggled to make progress in natural-language processing, though, Brown showed impatience, directing special ire at an intern named Phil Resnik. A graduate student at the University of Pennsylvania who had earned a bachelor of arts in computer science at Harvard University and would later become a respected academic, Resnik hoped to combine mathematical tactics with linguistic principles.

Feng-Hsiung Hsu, Behind Deep Blue: Building the Computer That Defeated the World Chess Champion (Princeton, NJ: Princeton University Press, 2002). Chapter Ten 1. Peter Brown and Robert Mercer, “Oh, Yes, Everything’s Right on Schedule, Fred” (lecture, Twenty Years of Bitext Workshop, Empirical Methods in Natural Language Processing Conference, Seattle, Washington, October 2013), http://cs.jhu.edu/~post/bitext. Chapter Eleven 1. Hal Lux, “The Secret World of Jim Simons,” Institutional Investor, November 1, 2000, https://www.institutionalinvestor.com/article/b151340bp779jn/the-secret-world-of-jim-simons. 2.


pages: 419 words: 109,241

A World Without Work: Technology, Automation, and How We Should Respond by Daniel Susskind

"World Economic Forum" Davos, 3D printing, agricultural Revolution, AI winter, Airbnb, Albert Einstein, algorithmic trading, AlphaGo, artificial general intelligence, autonomous vehicles, basic income, Bertrand Russell: In Praise of Idleness, Big Tech, blue-collar work, Boston Dynamics, British Empire, Capital in the Twenty-First Century by Thomas Piketty, cloud computing, computer age, computer vision, computerized trading, creative destruction, David Graeber, David Ricardo: comparative advantage, deep learning, DeepMind, Demis Hassabis, demographic transition, deskilling, disruptive innovation, Donald Trump, Douglas Hofstadter, driverless car, drone strike, Edward Glaeser, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, fake news, financial innovation, flying shuttle, Ford Model T, fulfillment center, future of work, gig economy, Gini coefficient, Google Glasses, Gödel, Escher, Bach, Hans Moravec, income inequality, income per capita, industrial robot, interchangeable parts, invisible hand, Isaac Newton, Jacques de Vaucanson, James Hargreaves, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Joi Ito, Joseph Schumpeter, Kenneth Arrow, Kevin Roose, Khan Academy, Kickstarter, Larry Ellison, low skilled workers, lump of labour, machine translation, Marc Andreessen, Mark Zuckerberg, means of production, Metcalfe’s law, natural language processing, Neil Armstrong, Network effects, Nick Bostrom, Occupy movement, offshore financial centre, Paul Samuelson, Peter Thiel, pink-collar, precariat, purchasing power parity, Ray Kurzweil, ride hailing / ride sharing, road to serfdom, Robert Gordon, Sam Altman, Second Machine Age, self-driving car, shareholder value, sharing economy, Silicon Valley, Snapchat, social intelligence, software is eating the world, sovereign wealth fund, spinning jenny, Stephen Hawking, Steve Jobs, strong AI, tacit knowledge, technological 
solutionism, TED Talk, telemarketer, The Future of Employment, The Rise and Fall of American Growth, the scientific method, The Theory of the Leisure Class by Thorstein Veblen, The Wealth of Nations by Adam Smith, Thorstein Veblen, Travis Kalanick, Turing test, Two Sigma, Tyler Cowen, Tyler Cowen: Great Stagnation, universal basic income, upwardly mobile, warehouse robotics, Watson beat the top human players on Jeopardy!, We are the 99%, wealth creators, working poor, working-age population, Y Combinator

Quinn, “The Supreme Court Forecasting Project: Legal and Political Science Approaches to Predicting Supreme Court Decisionmaking,” Columbia Law Review 104:4 (2004), 1150–1210. 39.  Nikolas Aletras, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro, and Vasileios Lampos, “Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective,” PeerJ Computer Science 2:93 (2016). 40.  Though by no means limited to diagnosis. See Eric Topol, “High-Performance Medicine: The Convergence of Human and Artificial Intelligence,” Nature 25 (2019), 44–56, for a broader overview of the uses of AI in medicine. 41.  Jeffrey De Fauw, Joseph Ledsam, Bernardino Romera-Paredes, et al., “Clinically Applicable Deep Learning for Diagnosis and Referral in Retinal Disease,” Nature Medicine 24 (2018), 1342–50. 42.  

Alesina, Alberto, Edward Glaeser, and Bruce Sacerdote. “Why Doesn’t the United States Have a European-Style Welfare State?” Brookings Papers on Economic Activity 2 (2001). Aletras, Nikolas, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro, and Vasileios Lampos. “Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective.” PeerJ Computer Science 2, no. 93 (2016). Allen, Robert. “The Industrial Revolution in Miniature: The Spinning Jenny in Britain, France, and India.” Oxford University Working Paper No. 375 (2017). Alstadsæter, Annette, Niels Johannesen, and Gabriel Zucman. “Tax Evasion and Inequality.”


Reset by Ronald J. Deibert

23andMe, active measures, air gap, Airbnb, Amazon Web Services, Anthropocene, augmented reality, availability heuristic, behavioural economics, Bellingcat, Big Tech, bitcoin, blockchain, blood diamond, Brexit referendum, Buckminster Fuller, business intelligence, Cal Newport, call centre, Cambridge Analytica, carbon footprint, cashless society, Citizen Lab, clean water, cloud computing, computer vision, confounding variable, contact tracing, contact tracing app, content marketing, coronavirus, corporate social responsibility, COVID-19, crowdsourcing, data acquisition, data is the new oil, decarbonisation, deep learning, deepfake, Deng Xiaoping, disinformation, Donald Trump, Doomsday Clock, dual-use technology, Edward Snowden, Elon Musk, en.wikipedia.org, end-to-end encryption, Evgeny Morozov, failed state, fake news, Future Shock, game design, gig economy, global pandemic, global supply chain, global village, Google Hangouts, Great Leap Forward, high-speed rail, income inequality, information retrieval, information security, Internet of things, Jaron Lanier, Jeff Bezos, John Markoff, Lewis Mumford, liberal capitalism, license plate recognition, lockdown, longitudinal study, Mark Zuckerberg, Marshall McLuhan, mass immigration, megastructure, meta-analysis, military-industrial complex, move fast and break things, Naomi Klein, natural language processing, New Journalism, NSO Group, off-the-grid, Peter Thiel, planetary scale, planned obsolescence, post-truth, proprietary trading, QAnon, ransomware, Robert Mercer, Sheryl Sandberg, Shoshana Zuboff, Silicon Valley, single source of truth, Skype, Snapchat, social distancing, sorting algorithm, source of truth, sovereign wealth fund, sparse data, speech recognition, Steve Bannon, Steve Jobs, Stuxnet, surveillance capitalism, techlash, technological solutionism, the long tail, the medium is the message, The Structural Transformation of the Public Sphere, TikTok, TSMC, undersea cable, unit 8200, Vannevar Bush, WikiLeaks, zero 
day, zero-sum game

We were assured by those companies that the device’s audio capture system would activate a connection to their cloud system only when a user would utter the exact phrase that matched a file stored locally on the device itself. No one was actually “eavesdropping.” Rather, the company’s artificial intelligence and natural language processing algorithms analyzed the audio to improve the quality of the responses provided to users’ questions. And then, one after another, news reports showed that those assurances were misleading. In fact, Amazon, Google, Microsoft, and Apple all retain human contractors to listen in on audio recordings to transcribe what’s being said in order to improve the companies’ AI systems.75 Moreover, an investigation undertaken by the Belgian news organization VRT showed that some of Google’s audio recordings were activated without the trigger words being uttered, apparently by accident.76 VRT obtained and listened to more than one thousand recordings shared with them by an outside contractor used by Google, confirming that the recordings started without the activation words and included highly sensitive personal information, discussions about finances, conversations involving minor children, and even what sounded to the journalists like physical violence and distress.

Central Asian countries like Uzbekistan and Kazakhstan have even gone so far as to advertise for Bitcoin mining operations to be hosted in their jurisdictions because of cheap and plentiful coal and other fossil-fuelled energy sources.349 Some estimates put electric energy consumption associated with Bitcoin mining at around 83.67 terawatt-hours per year, more than that of the entire country of Finland, with carbon emissions estimated at 33.82 megatons, roughly equivalent to those of Denmark.350 To put it another way, the Cambridge Centre for Alternative Finance says that the electricity consumed by the Bitcoin network in one year could power all the teakettles used to boil water in the entire United Kingdom for nineteen years.351 A similar energy-sucking dynamic underlies other cutting-edge technologies, like “deep learning.” The latter refers to the complex artificial intelligence systems used to undertake the fine-grained, real-time calculations associated with the range of social media experiences, such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, and so on. Research undertaken at the University of Massachusetts, Amherst, in which the researchers performed a life-cycle assessment for training several common large AI models, found that training a single AI model can emit more than 626,000 pounds of carbon dioxide equivalent — or nearly five times the lifetime emissions of the average American car (including its manufacturing).352 It’s become common to hear that “data is the new oil,” usually meaning that it is a valuable resource.


pages: 408 words: 105,715

Kingdom of Characters: The Language Revolution That Made China Modern by Jing Tsu

affirmative action, British Empire, computer age, Deng Xiaoping, Frederick Winslow Taylor, Great Leap Forward, information retrieval, invention of movable type, machine readable, machine translation, Menlo Park, natural language processing, Norbert Wiener, QWERTY keyboard, scientific management, Silicon Valley, smart cities, South China Sea, transcontinental railway

Instead of laying down rubber-wrapped cables, China is manufacturing fiber-optic cables and investing in space satellites, building a worldwide network of economic influence—overland, underwater, and into space. More than a century’s effort at learning how to standardize and transform its language into a modern technology has landed China here, at the beginning—not the end—of becoming a standard setter, from artificial intelligence to quantum natural language processing, automation to machine translation. The Chinese script has completely turned around its position in relation to the Western alphabetic script. There are currently more than 900 million internet users in China. As each of them searches through Chinese language websites, uses Chinese input methods, posts on social media, and buys and sells on Chinese websites every single day, they are making the Chinese internet smarter, faster, and ever more rich in data.

The Chinese script revolution has scaled up that statement for contemporary relevance. The country’s most decisive push will come in the next two decades, as it aims to take the lead in artificial intelligence by 2035. Deep neural networks are being trained on China’s ever-growing volume of data. Chinese tech giant Baidu has become a leader in machine translation and natural language processing, while Tencent sits on a wealth of data gathered through WeChat and its video gaming platforms. From health care to smart cities, education to social control, the Chinese state’s priority under its current leadership is to implement, if not to perfect, its vision of global governance. The country now enjoys a level of confidence it did not have for two centuries.


pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos

Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, backpropagation, basic income, Bayesian statistics, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Charles Babbage, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, creative destruction, crowdsourcing, Danny Hillis, data is not the new oil, data is the new oil, data science, deep learning, DeepMind, double helix, Douglas Hofstadter, driverless car, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, Geoffrey Hinton, global village, Google Glasses, Gödel, Escher, Bach, Hans Moravec, incognito mode, information retrieval, Jeff Hawkins, job automation, John Markoff, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, large language model, lone genius, machine translation, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, Nick Bostrom, NP-complete, off grid, P = NP, PageRank, pattern recognition, phenotype, planetary scale, power law, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, scientific worldview, Second Machine Age, self-driving car, Silicon Valley, social intelligence, speech recognition, Stanford marshmallow experiment, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the long tail, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight, yottabyte, zero-sum game

, Earth orbits the sun). It discovered concepts like planet all by itself. The version we used was more advanced than the basic one I’ve described here, but the essential ideas are the same. Various research groups have used Alchemy or their own MLN implementations to solve problems in natural language processing, computer vision, activity recognition, social network analysis, molecular biology, and many other areas. Despite its successes, Alchemy has some significant shortcomings. It does not yet scale to truly big data, and someone without a PhD in machine learning will find it hard to use. Because of these problems, it’s not yet ready for prime time.

“Relevance weighting of search terms,”* by Stephen Robertson and Karen Sparck Jones (Journal of the American Society for Information Science, 1976), explains the use of Naïve Bayes–like methods in information retrieval. “First links in the Markov chain,” by Brian Hayes (American Scientist, 2013), recounts Markov’s invention of the eponymous chains. “Large language models in machine translation,”* by Thorsten Brants et al. (Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007), explains how Google Translate works. “The PageRank citation ranking: Bringing order to the Web,”* by Larry Page, Sergey Brin, Rajeev Motwani, and Terry Winograd (Stanford University technical report, 1998), describes the PageRank algorithm and its interpretation as a random walk over the web.


pages: 470 words: 109,589

Apache Solr 3 Enterprise Search Server by Unknown

bioinformatics, business logic, continuous integration, database schema, en.wikipedia.org, fault tolerance, Firefox, full text search, functional programming, information retrieval, natural language processing, performance metric, platform as a service, Ruby on Rails, SQL injection, Wayback Machine, web application

If you want to de-duplicate your data (that is, you don't want to accidentally add the same data twice) then this will do that for you. For further information see http://wiki.apache.org/solr/Deduplication. UIMAUpdateProcessorFactory: This hands the document off to the Unstructured Information Management Architecture (UIMA), a Solr contrib module that enhances the document through natural language processing (NLP) techniques. For further information see http://wiki.apache.org/solr/SolrUIMA. Although it's nice to see an NLP integration option in Solr, beware that NLP processing tends to be computationally expensive. Instead of using UIMA in this way, consider performing this processing externally to Solr and caching the results to avoid re-computation as you adjust your indexing process.

If you have named locations (for example, "Boston, MA") then the data needs to be resolved to latitudes and longitudes using a gazetteer like Geonames—http://www.geonames.org. If all you have is free-form natural language text without the locations identified, then you'll have to perform a more difficult task that uses Natural Language Processing techniques to find the named locations. These approaches are out of scope of this book. The principal field type in Solr for geospatial is LatLonType, which stores a single latitude-longitude pair. Under the hood, this field type copies the latitude and longitude into a pair of indexed fields using the provided field name suffix.
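The LatLonType mechanics described above can be sketched as a schema fragment; the field names here ("store", "*_coordinate") are illustrative, not from the excerpt:

```xml
<!-- Hypothetical schema.xml fragment for Solr 3 geospatial fields -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>

<!-- A single lat,lon pair, e.g. "42.3601,-71.0589" -->
<field name="store" type="location" indexed="true" stored="true"/>

<!-- LatLonType copies the latitude and longitude into hidden indexed
     fields named with the configured suffix, e.g. store_0_coordinate
     and store_1_coordinate -->
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
```

The `subFieldSuffix` attribute is what produces the "pair of indexed fields using the provided field name suffix" the excerpt refers to.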


pages: 1,172 words: 114,305

New Laws of Robotics: Defending Human Expertise in the Age of AI by Frank Pasquale

affirmative action, Affordable Care Act / Obamacare, Airbnb, algorithmic bias, Amazon Mechanical Turk, Anthropocene, augmented reality, Automated Insights, autonomous vehicles, basic income, battle of ideas, Bernie Sanders, Big Tech, Bill Joy: nanobots, bitcoin, blockchain, Brexit referendum, call centre, Cambridge Analytica, carbon tax, citizen journalism, Clayton Christensen, collective bargaining, commoditize, computer vision, conceptual framework, contact tracing, coronavirus, corporate social responsibility, correlation does not imply causation, COVID-19, critical race theory, cryptocurrency, data is the new oil, data science, decarbonisation, deep learning, deepfake, deskilling, digital divide, digital twin, disinformation, disruptive innovation, don't be evil, Donald Trump, Douglas Engelbart, driverless car, effective altruism, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, fake news, Filter Bubble, finite state, Flash crash, future of work, gamification, general purpose technology, Google Chrome, Google Glasses, Great Leap Forward, green new deal, guns versus butter model, Hans Moravec, high net worth, hiring and firing, holacracy, Ian Bogost, independent contractor, informal economy, information asymmetry, information retrieval, interchangeable parts, invisible hand, James Bridle, Jaron Lanier, job automation, John Markoff, Joi Ito, Khan Academy, knowledge economy, late capitalism, lockdown, machine readable, Marc Andreessen, Mark Zuckerberg, means of production, medical malpractice, megaproject, meta-analysis, military-industrial complex, Modern Monetary Theory, Money creation, move fast and break things, mutually assured destruction, natural language processing, new economy, Nicholas Carr, Nick Bostrom, Norbert Wiener, nuclear winter, obamacare, One Laptop per Child (OLPC), open immigration, OpenAI, opioid epidemic / opioid crisis, paperclip maximiser, paradox of thrift, pattern recognition, payday loans, personalized medicine, Peter 
Singer: altruism, Philip Mirowski, pink-collar, plutocrats, post-truth, pre–internet, profit motive, public intellectual, QR code, quantitative easing, race to the bottom, RAND corporation, Ray Kurzweil, recommendation engine, regulatory arbitrage, Robert Shiller, Rodney Brooks, Ronald Reagan, self-driving car, sentiment analysis, Shoshana Zuboff, Silicon Valley, Singularitarianism, smart cities, smart contracts, software is eating the world, South China Sea, Steve Bannon, Strategic Defense Initiative, surveillance capitalism, Susan Wojcicki, tacit knowledge, TaskRabbit, technological solutionism, technoutopianism, TED Talk, telepresence, telerobotics, The Future of Employment, The Turner Diaries, Therac-25, Thorstein Veblen, too big to fail, Turing test, universal basic income, unorthodox policies, wage slave, Watson beat the top human players on Jeopardy!, working poor, workplace surveillance , Works Progress Administration, zero day

Unions that tend toward a path of professionalization—empowering their members to protect those they serve—should have an important role in shaping the AI revolution. Sometimes it will be difficult to demonstrate that a human-centered process is better than an automated one. Crude monetary metrics crowd out complex critical standards. For example, machine learning programs may soon predict, based on brute-force natural language processing, whether one book proposal is more likely than another to be a best seller. From a purely economic perspective, such programs may be better than editors or directors at picking manuscripts or film scripts. Nevertheless, those in creative industries should stand up for their connoisseurship.

But rather than being universal and objective, it produces knowledge that is irrevocably entangled with specific computational mechanisms & the data used for training.”34 All of these shortcomings support a larger critique of many opaque forms of machine judgment; being unexplained (or unexplainable), they stand or fall based on the representativeness of training data.35 For example, imagine an overwhelmed court that uses natural-language processing to determine which of its present complaints are most like complaints that succeeded in the past and then prioritizes those complaints as it triages its workflow. To the extent that the past complaints reflect past conditions that no longer hold, they cannot be a good guide to which current claims are actually meritorious.36 A more explainable system, which identified why it isolated certain words or phrases as indicating a particularly grave or valid claim, would be more useful.


pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst

algorithmic trading, bioinformatics, business intelligence, business logic, business process, call centre, cloud computing, create, read, update, delete, data acquisition, data science, DevOps, extractivism, fault tolerance, information security, Large Hadron Collider, linked data, machine readable, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, warehouse automation, Watson beat the top human players on Jeopardy!, web application

In addition, the underlying system can resolve references by inferring new triples from the existing records using a rules set. This is a powerful alternative to joining relational tables to resolve references in a typical RDBMS, while also offering a more expressive way to model data than a key value store. One of the most powerful aspects of semantic technology comes from the world of linguistics and natural language processing, also known as entity extraction. This is a powerful mechanism to extract information from unstructured data and combine it with transactional data, enabling deep analytics by bringing these worlds closer together. Another method that brings structure to the unstructured is the text analytics tool, which is improving daily as scientists come up with new ways of making algorithms understand written text more accurately.
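The rule-based triple inference the excerpt describes can be sketched in a few lines; this is a toy illustration (the rule and entity names are invented, not any particular product's API), applying a transitivity rule to a set of subject-predicate-object triples until no new facts emerge:

```python
def infer(triples):
    """Derive new triples by applying a transitivity rule for the
    'locatedIn' predicate until a fixed point is reached."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        for (s1, p1, o1) in list(triples):
            for (s2, p2, o2) in list(triples):
                # Rule: locatedIn is transitive.
                if p1 == p2 == "locatedIn" and o1 == s2:
                    new = (s1, "locatedIn", o2)
                    if new not in triples:
                        triples.add(new)
                        changed = True
    return triples

facts = {("Acme HQ", "locatedIn", "Boston"),
         ("Boston", "locatedIn", "Massachusetts")}
inferred = infer(facts)
print(("Acme HQ", "locatedIn", "Massachusetts") in inferred)  # True
```

This is the "expressive" advantage over a plain key-value store: the new fact is derived by a rule rather than by joining relational tables.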


pages: 163 words: 42,402

Machine Learning for Email by Drew Conway, John Myles White

call centre, correlation does not imply causation, data science, Debian, natural language processing, Netflix Prize, pattern recognition, recommendation engine, SpamAssassin, text mining

This would cause catastrophic results for our classifier, as many, or even all, messages would incorrectly be assigned a zero probability of being either spam or ham. Researchers have come up with many clever ways of trying to get around this problem, such as drawing a random probability from some distribution or using natural language processing (NLP) techniques to estimate the “spamminess” of a term given its context. For our purposes, we will use a very simple rule: assign a very small probability to terms that are not in the training set. This is, in fact, a common way of dealing with missing terms in simple text classifiers, and for our purposes it will serve just fine.
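The simple rule above can be sketched as follows (a minimal illustration with made-up term probabilities, not the book's actual code, which is in R): unseen terms get a tiny fixed probability instead of zero, so a single novel word no longer wipes out a message's score for both classes.

```python
import math

def log_posterior(terms, term_probs, unseen_prob=1e-7):
    """Sum log-probabilities of terms, assigning a tiny fixed
    probability to terms absent from the training vocabulary."""
    total = 0.0
    for t in terms:
        total += math.log(term_probs.get(t, unseen_prob))
    return total

# Toy per-term probability estimates from a hypothetical training set.
spam_probs = {"viagra": 0.30, "free": 0.20}
ham_probs = {"meeting": 0.25, "report": 0.15}

msg = ["free", "viagra", "zebra"]  # "zebra" never appeared in training
spam_score = log_posterior(msg, spam_probs)
ham_score = log_posterior(msg, ham_probs)
print(spam_score > ham_score)  # True: the unseen term penalizes both classes equally
```

Working in log space also avoids the numerical underflow that multiplying many small probabilities would cause.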


pages: 413 words: 119,587

Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots by John Markoff

A Declaration of the Independence of Cyberspace, AI winter, airport security, Andy Rubin, Apollo 11, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, backpropagation, basic income, Baxter: Rethink Robotics, Bill Atkinson, Bill Duvall, bioinformatics, Boston Dynamics, Brewster Kahle, Burning Man, call centre, cellular automata, Charles Babbage, Chris Urmson, Claude Shannon: information theory, Clayton Christensen, clean water, cloud computing, cognitive load, collective bargaining, computer age, Computer Lib, computer vision, crowdsourcing, Danny Hillis, DARPA: Urban Challenge, data acquisition, Dean Kamen, deep learning, DeepMind, deskilling, Do you want to sell sugared water for the rest of your life?, don't be evil, Douglas Engelbart, Douglas Engelbart, Douglas Hofstadter, Dr. Strangelove, driverless car, dual-use technology, Dynabook, Edward Snowden, Elon Musk, Erik Brynjolfsson, Evgeny Morozov, factory automation, Fairchild Semiconductor, Fillmore Auditorium, San Francisco, From Mathematics to the Technologies of Life and Death, future of work, Galaxy Zoo, General Magic , Geoffrey Hinton, Google Glasses, Google X / Alphabet X, Grace Hopper, Gunnar Myrdal, Gödel, Escher, Bach, Hacker Ethic, Hans Moravec, haute couture, Herbert Marcuse, hive mind, hype cycle, hypertext link, indoor plumbing, industrial robot, information retrieval, Internet Archive, Internet of things, invention of the wheel, Ivan Sutherland, Jacques de Vaucanson, Jaron Lanier, Jeff Bezos, Jeff Hawkins, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John Perry Barlow, John von Neumann, Kaizen: continuous improvement, Kevin Kelly, Kiva Systems, knowledge worker, Kodak vs Instagram, labor-force participation, loose coupling, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, medical residency, Menlo Park, military-industrial complex, Mitch 
Kapor, Mother of all demos, natural language processing, Neil Armstrong, new economy, Norbert Wiener, PageRank, PalmPilot, pattern recognition, Philippa Foot, pre–internet, RAND corporation, Ray Kurzweil, reality distortion field, Recombinant DNA, Richard Stallman, Robert Gordon, Robert Solow, Rodney Brooks, Sand Hill Road, Second Machine Age, self-driving car, semantic web, Seymour Hersh, shareholder value, side project, Silicon Valley, Silicon Valley startup, Singularitarianism, skunkworks, Skype, social software, speech recognition, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Strategic Defense Initiative, strong AI, superintelligent machines, tech worker, technological singularity, Ted Nelson, TED Talk, telemarketer, telepresence, telepresence robot, Tenerife airport disaster, The Coming Technological Singularity, the medium is the message, Thorstein Veblen, Tony Fadell, trolley problem, Turing test, Vannevar Bush, Vernor Vinge, warehouse automation, warehouse robotics, Watson beat the top human players on Jeopardy!, We are as Gods, Whole Earth Catalog, William Shockley: the traitorous eight, zero-sum game

Because it was faster to cast an erroneous line than correct it, typesetters would “run down” the rest of the line with easy-to-type nonsense, later removing the entire line after it had cooled down, or if they forgot, hope a proofreader caught it.9 He wasn’t concerned at the time about any ethical implications involved in building a natural language processing system that could “understand” and respond in a virtual world. In SHRDLU “understanding” meant that the program analyzed the structure of the typed questions and attempted to answer them and respond to the commands. It was an early effort at disambiguation, a thorny problem for natural language processing even today. For example, in the sentence “he put the glass on the table and it broke,” does “it” refer to the glass or the table? Without more context, neither a human nor an AI program could decide.


pages: 588 words: 131,025

The Patient Will See You Now: The Future of Medicine Is in Your Hands by Eric Topol

23andMe, 3D printing, Affordable Care Act / Obamacare, Anne Wojcicki, Atul Gawande, augmented reality, Big Tech, bioinformatics, call centre, Clayton Christensen, clean water, cloud computing, commoditize, computer vision, conceptual framework, connected car, correlation does not imply causation, creative destruction, crowdsourcing, dark matter, data acquisition, data science, deep learning, digital divide, disintermediation, disruptive innovation, don't be evil, driverless car, Edward Snowden, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Firefox, gamification, global village, Google Glasses, Google X / Alphabet X, Ignaz Semmelweis: hand washing, information asymmetry, interchangeable parts, Internet of things, Isaac Newton, it's over 9,000, job automation, Julian Assange, Kevin Kelly, license plate recognition, lifelogging, Lyft, Mark Zuckerberg, Marshall McLuhan, meta-analysis, microbiome, Nate Silver, natural language processing, Network effects, Nicholas Carr, obamacare, pattern recognition, personalized medicine, phenotype, placebo effect, quantum cryptography, RAND corporation, randomized controlled trial, Salesforce, Second Machine Age, self-driving car, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, Snapchat, social graph, speech recognition, stealth mode startup, Steve Jobs, synthetic biology, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, traumatic brain injury, Turing test, Uber for X, uber lyft, Watson beat the top human players on Jeopardy!, WikiLeaks, X Prize

When Sanofi and Regeneron were looking to expedite recruitment of patients with high cholesterol for their new, experimental drug alirocumab, an antibody against the PCSK9 protein, they turned to the American College of Cardiology registry.108 Another approach, developed by researchers at Case Western Reserve University, is a software tool known as “Trial Prospector,” which delves into clinical data systems to match patients with clinical trials.109 It combines artificial intelligence and natural language processing to automate the patient screening and enrollment process, often a rate-limiting step in developing new drugs. Automated clinical trial matching programs for specific conditions, such as the Alzheimer’s Association Trialmatch,107 are proliferating. Data mining to facilitate clinical trial recruitment is offered by a number of companies, such as Blue Chip Marketing Worldwide and Acurian.110 Ben Goldacre, the acclaimed author and one of the leading independent critics and innovators in pharma research, set up the tool “RandomiseMe,” which makes it “easy to run randomized clinical trials on yourself and your friends.”111 So although clinical trial participation is remarkably rare today, there are efforts on multiple fronts to change that in the future.

Cultural change is exceedingly difficult, but given the other forces in the iMedicine galaxy, especially the health care economic crisis that has engendered desperation, it may be possible to accomplish. An aggressive commitment to the education and training of practicing physicians to foster their use of the new tools would not only empower their patients, but also themselves. Eliminating the enormous burden of electronic charting or use of scribes by an all-out effort for natural language processing of voice during a visit would indeed be liberating. It’s long overdue for physicians and health professionals to be constantly cognizant of actual costs, eliminate unnecessary tests and procedures,75a and engage in exquisite electronic communication, which includes e-mail, and sharing notes and all data.


Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps by Valliappa Lakshmanan, Sara Robinson, Michael Munn

A Pattern Language, Airbnb, algorithmic trading, automated trading system, business intelligence, business logic, business process, combinatorial explosion, computer vision, continuous integration, COVID-19, data science, deep learning, DevOps, discrete time, en.wikipedia.org, Hacker News, industrial research laboratory, iterative process, Kubernetes, machine translation, microservices, mobile money, natural language processing, Netflix Prize, optical character recognition, pattern recognition, performance metric, recommendation engine, ride hailing / ride sharing, selection bias, self-driving car, sentiment analysis, speech recognition, statistical model, the payments system, web application

Research scientists, data analysts, and developers may also build and use AI models, but these job roles are not a focus audience for this book. Research scientists focus primarily on finding and developing new algorithms to advance the discipline of ML. This could include a variety of subfields within machine learning, like model architectures, natural language processing, computer vision, hyperparameter tuning, model interpretability, and more. Unlike the other roles discussed here, research scientists spend most of their time prototyping and evaluating new approaches to ML, rather than building out production ML systems. Data analysts evaluate and gather insights from data, then summarize these insights for other teams within their organization.

If we have an array where an item can occur only once (for example, of languages a person speaks), or if the feature just indicates presence and not count (such as whether the mother has ever had a Cesarean operation), then the count at each position is 0 or 1, and this is called multi-hot encoding. To avoid large numbers, the relative frequency can be used instead of the count. The representation for our example would be [0.5, 0.25, 0.25] instead of [2, 1, 1]. Empty arrays (first-born babies with no previous siblings) are represented as [0, 0, 0]. In natural language processing, the relative frequency of a word overall is normalized by the relative frequency of documents that contain the word to yield TF-IDF (short for term frequency–inverse document frequency). TF-IDF reflects how unique a word is to a document. If the array is ordered in a specific way (e.g., in order of time), another option is to represent the input array by its last three items.
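These encodings are simple enough to compute directly. The sketch below is illustrative only (the function names are my own, not code from the book): multi-hot encoding, relative-frequency encoding, and a bare-bones TF-IDF.

```python
from collections import Counter
import math

def multi_hot(items, vocab):
    """1 at each vocabulary position that occurs in the array, else 0."""
    present = set(items)
    return [1 if v in present else 0 for v in vocab]

def relative_frequency(items, vocab):
    """Counts at each vocabulary position, normalized to sum to 1."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:  # empty array, e.g., a first-born baby's siblings
        return [0.0] * len(vocab)
    return [counts[v] / total for v in vocab]

def tf_idf(term, document, corpus):
    """Term frequency in one document, scaled by how rare the term is
    across the corpus (assumes the term occurs in at least one document)."""
    tf = document.count(term) / len(document)
    docs_with_term = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf
```

With a three-item vocabulary, an array containing the first item twice and the others once maps to [0.5, 0.25, 0.25], matching the example above; a term that appears in every document gets an IDF of log(1) = 0, reflecting that it is not unique to any document.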


Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran, Jeff Hammerbacher

23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, data science, database schema, double helix, en.wikipedia.org, epigenetics, fault tolerance, Firefox, Gregor Mendel, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, Large Hadron Collider, longitudinal study, machine readable, machine translation, Mars Rover, natural language processing, openstreetmap, Paradox of Choice, power law, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social bookmarking, social graph, SPARQL, sparse data, speech recognition, statistical model, supply-chain management, systematic bias, TED Talk, text mining, the long tail, Vernor Vinge, web application

Good overviews of clustering, loess, and other machine learning techniques are in The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (Springer; 2008). The section on tags barely touches the surface of statistical language analysis. For more, see the chapters on corpus linguistics from Foundations of Statistical Natural Language Processing by Christopher Manning and Hinrich Schütze (MIT Press; 1999) and also Speech and Language Processing by Daniel Jurafsky and James H. Martin (Prentice Hall; 2008). There are many better ways for estimating confidence intervals for the attractiveness versus age analysis. One method is partial pooling; see pp. 252–258 of Andrew Gelman and Jennifer Hill’s Data Analysis Using Regression and Multilevel/Hierarchical Models (Cambridge University Press; 2006).

He is a Fellow of the AAAI and the ACM and coauthor of Artificial Intelligence: A Modern Approach (Prentice Hall), the leading textbook in the field. Previously he was head of computational sciences at NASA and a faculty member at USC and Berkeley. Brendan O’Connor is a researcher in machine learning and natural language processing. He is a scientific consultant at Dolores Labs and worked previously as a relevance engineer at Powerset. He received a BS and MS in symbolic systems from Stanford University, and is back to academia this fall as a graduate student at Carnegie Mellon University. His blog, “Artificial Intelligence and Social Science,” is at http://anyall.org/blog.


pages: 661 words: 156,009

Your Computer Is on Fire by Thomas S. Mullaney, Benjamin Peters, Mar Hicks, Kavita Philip

"Susan Fowler" uber, 2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, A Declaration of the Independence of Cyberspace, affirmative action, Airbnb, algorithmic bias, AlphaGo, AltaVista, Amazon Mechanical Turk, Amazon Web Services, American Society of Civil Engineers: Report Card, An Inconvenient Truth, Asilomar, autonomous vehicles, Big Tech, bitcoin, Bletchley Park, blockchain, Boeing 737 MAX, book value, British Empire, business cycle, business process, Californian Ideology, call centre, Cambridge Analytica, carbon footprint, Charles Babbage, cloud computing, collective bargaining, computer age, computer vision, connected car, corporate governance, corporate social responsibility, COVID-19, creative destruction, cryptocurrency, dark matter, data science, Dennis Ritchie, deskilling, digital divide, digital map, don't be evil, Donald Davies, Donald Trump, Edward Snowden, en.wikipedia.org, European colonialism, fake news, financial innovation, Ford Model T, fulfillment center, game design, gentrification, George Floyd, glass ceiling, global pandemic, global supply chain, Grace Hopper, hiring and firing, IBM and the Holocaust, industrial robot, informal economy, Internet Archive, Internet of things, Jeff Bezos, job automation, John Perry Barlow, Julian Assange, Ken Thompson, Kevin Kelly, Kickstarter, knowledge economy, Landlord’s Game, Lewis Mumford, low-wage service sector, M-Pesa, Mark Zuckerberg, mass incarceration, Menlo Park, meta-analysis, mobile money, moral panic, move fast and break things, Multics, mutually assured destruction, natural language processing, Neal Stephenson, new economy, Norbert Wiener, off-the-grid, old-boy network, On the Economy of Machinery and Manufactures, One Laptop per Child (OLPC), packet switching, pattern recognition, Paul Graham, pink-collar, pneumatic tube, postindustrial economy, profit motive, public intellectual, QWERTY keyboard, Ray Kurzweil, Reflections on Trusting Trust, Report Card for America’s Infrastructure, Salesforce, sentiment analysis, Sheryl Sandberg, Silicon Valley, Silicon Valley ideology, smart cities, Snapchat, speech recognition, SQL injection, statistical model, Steve Jobs, Stewart Brand, tacit knowledge, tech worker, techlash, technoutopianism, telepresence, the built environment, the map is not the territory, Thomas L Friedman, TikTok, Triangle Shirtwaist Factory, undersea cable, union organizing, vertical integration, warehouse robotics, WikiLeaks, wikimedia commons, women in the workforce, Y2K

These include but are not limited to:

• A variety of what I would describe as “first-order,” more rudimentary, blunt tools that are long-standing and widely adopted, such as keyword ban lists for content and user profiles, URL and content filtering, IP blocking, and other user-identifying mechanisms;13
• More sophisticated automated tools such as hashing technologies used in products like PhotoDNA (used to automate the identification and removal of child sexual exploitation content; other engines based on this same technology do the same with regard to terroristic material, the definitions of which are the province of the system’s owners);14
• Higher-order AI tools and strategies for content moderation and management at scale, examples of which might include:
  ◦ Sentiment analysis and forecasting tools based on natural language processing that can identify when a comment thread has gone bad or, even more impressive, when it is in danger of doing so;15
  ◦ AI speech-recognition technology that provides automatic, automated captioning of video content;16
  ◦ Pixel analysis (to identify, for example, when an image or a video likely contains nudity);17
  ◦ Machine learning and computer vision-based tools deployed toward a variety of other predictive outcomes (such as judging potential for virality or recognizing and predicting potentially inappropriate content).18

Computer vision was in its infancy when I began my research on commercial content moderation.

For an examination of early Chinese contributions to predictive text, see Mullaney, The Chinese Typewriter. 16. Recently, moreover, this process has actually entered the cloud. So-called “cloud input” IMEs, released by companies like Sogou, Baidu, QQ, Tencent, Microsoft, Google, and others, have begun to harness enormous Chinese-language text corpora and ever more sophisticated natural-language-processing algorithms. 17. One additional factor that helps explain the sheer number of input systems at this time is that, whether knowingly or not, these inventors, linguists, developers, and hobbyists were in fact recycling methods that were first invented in China during the 1910s, ’20s, and ’30s—an era well before computing, of course, but one in which Chinese-language reform and educational reform circles were in the grips of what was then called the “character retrieval crisis” (jianzifa wenti), in which various parties debated over which among a wide variety of experimental new methods was the best way to recategorize and reorganize Chinese characters in such contexts as dictionaries, phone books, filing cabinets, and library card catalogs, among others.


pages: 215 words: 59,188

Seriously Curious: The Facts and Figures That Turn Our World Upside Down by Tom Standage

"World Economic Forum" Davos, agricultural Revolution, augmented reality, autonomous vehicles, Big Tech, blood diamond, business logic, corporate governance, CRISPR, deep learning, Deng Xiaoping, Donald Trump, Dr. Strangelove, driverless car, Elon Musk, failed state, financial independence, gender pay gap, gig economy, Gini coefficient, high net worth, high-speed rail, income inequality, index fund, industrial robot, Internet of things, invisible hand, it's over 9,000, job-hopping, Julian Assange, life extension, Lyft, M-Pesa, Mahatma Gandhi, manufacturing employment, mega-rich, megacity, Minecraft, mobile money, natural language processing, Nelson Mandela, plutocrats, post-truth, price mechanism, private spaceflight, prosperity theology / prosperity gospel / gospel of success, purchasing power parity, ransomware, reshoring, ride hailing / ride sharing, Ronald Coase, self-driving car, Silicon Valley, Snapchat, South China Sea, speech recognition, stem cell, supply-chain management, transaction costs, Uber and Lyft, uber lyft, undersea cable, US Airways Flight 1549, WikiLeaks, zoonotic diseases

The original approach to getting computers to understand human language was to use sets of precise rules – for example, in translation, a set of grammar rules for breaking down the meaning of the source language, and another set for reproducing the meaning in the target language. But after a burst of optimism in the 1950s, such systems could not be made to work on complex new sentences; the rules-based approach would not scale up. Funding for so-called natural-language processing went into hibernation for decades, until a renaissance in the late 1980s. Then a new approach emerged, based on machine learning – a technique in which computers are trained using lots of examples, rather than being explicitly programmed. For speech recognition, computers are fed sound files on the one hand, and human-written transcriptions on the other.


Data Mining: Concepts and Techniques: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

backpropagation, bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, disinformation, distributed generation, finite state, industrial research laboratory, information retrieval, information security, iterative process, knowledge worker, linked data, machine readable, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, power law, random walk, recommendation engine, RFID, search costs, semantic web, seminal paper, sentiment analysis, sparse data, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

■ Data mining—an interdisciplinary effort: The power of data mining can be substantially enhanced by integrating new methods from multiple disciplines. For example, to mine data with natural language text, it makes sense to fuse data mining methods with methods of information retrieval and natural language processing. As another example, consider the mining of software bugs in large programs. This form of mining, known as bug mining, benefits from the incorporation of software engineering knowledge into the data mining process.

■ Boosting the power of discovery in a networked environment: Most data objects reside in a linked or interconnected environment, whether it be the Web, database relations, files, or documents.

Semantic annotation of a frequent pattern Figure 7.12 shows an example of a semantic annotation for the pattern “{frequent, pattern}.” This dictionary-like annotation provides semantic information related to “{frequent, pattern},” consisting of its strongest context indicators, the most representative data transactions, and the most semantically similar patterns. This kind of semantic annotation is similar to natural language processing. The semantics of a word can be inferred from its context, and words sharing similar contexts tend to be semantically similar. The context indicators and the representative transactions provide a view of the context of the pattern from different angles to help users understand the pattern.
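The distributional idea invoked here, that words sharing similar contexts tend to be semantically similar, can be sketched with simple co-occurrence counts. This is a toy illustration of the principle, not the annotation system the book describes:

```python
from collections import Counter
import math

def context_vector(word, corpus, window=2):
    """Count the words appearing within `window` positions of `word`."""
    vec = Counter()
    for sentence in corpus:
        for i, w in enumerate(sentence):
            if w == word:
                lo = max(0, i - window)
                neighbors = sentence[lo:i] + sentence[i + 1:i + window + 1]
                vec.update(neighbors)
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0
```

In sentences like "drink fresh milk" and "drink fresh juice", the words "milk" and "juice" end up with identical context vectors (cosine similarity 1.0), while "milk" and "book" share no context at all.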

., Potter's wheel: An interactive data cleaning system, In: Proc. 2001 Int. Conf. Very Large Data Bases (VLDB’01) Rome, Italy. (Sept. 2001), pp. 381–390. [RH07] Rosenberg, A.; Hirschberg, J., V-measure: A conditional entropy-based external cluster evaluation measure, In: Proc. 2007 Joint Conf. Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL’07) Prague, Czech Republic. (June 2007), pp. 410–420. [RHS01] Roddick, J.F.; Hornsby, K.; Spiliopoulou, M., An updated bibliography of temporal, spatial, and spatio-temporal data mining research, In: (Editors: Roddick, J.F.; Hornsby, K.)


pages: 219 words: 63,495

50 Future Ideas You Really Need to Know by Richard Watson

23andMe, 3D printing, access to a mobile phone, Albert Einstein, Alvin Toffler, artificial general intelligence, augmented reality, autonomous vehicles, BRICs, Buckminster Fuller, call centre, carbon credits, Charles Babbage, clean water, cloud computing, collaborative consumption, computer age, computer vision, crowdsourcing, dark matter, dematerialisation, Dennis Tito, digital Maoism, digital map, digital nomad, driverless car, Elon Musk, energy security, Eyjafjallajökull, failed state, Ford Model T, future of work, Future Shock, gamification, Geoffrey West, Santa Fe Institute, germ theory of disease, global pandemic, happiness index / gross national happiness, Higgs boson, high-speed rail, hive mind, hydrogen economy, Internet of things, Jaron Lanier, life extension, Mark Shuttleworth, Marshall McLuhan, megacity, natural language processing, Neil Armstrong, Network effects, new economy, ocean acidification, oil shale / tar sands, pattern recognition, peak oil, personalized medicine, phenotype, precision agriculture, private spaceflight, profit maximization, RAND corporation, Ray Kurzweil, RFID, Richard Florida, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Skype, smart cities, smart meter, smart transportation, space junk, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, supervolcano, synthetic biology, tech billionaire, telepresence, The Wisdom of Crowds, Thomas Malthus, Turing test, urban decay, Vernor Vinge, Virgin Galactic, Watson beat the top human players on Jeopardy!, web application, women in the workforce, working-age population, young professional

the condensed idea: Thought control

timeline
2000 Electrode arrays implanted into owl monkeys
2001 Technology allows a monkey to operate a robotic arm via thought control
2006 Teenager plays Space Invaders using brain signals
2008 Scientists manage to extract images from a person’s mind
2009 Brain–Twitter interface
2017 Voice control replaces 70 percent of keyboards
2026 Google patents neural interface

33 Avatar assistants

Computer-based avatars are virtual recreations of real or fictional characters used in forms of computer gaming and in virtual online communities. In the near future they will become common as intelligent digital assistants or personal agents, controlled by forms of artificial intelligence such as natural language processing and accessed via mobile or fixed devices.

“Everything is backward now, like out there is the true world, and in here is the dream.”
Jake Sully in the movie Avatar

Apple’s iPhone 4S offers a tantalizing glimpse of the future in the form of Siri, an application that allows users to employ normal language to send messages or ask questions.


pages: 223 words: 60,936

Remote Work Revolution: Succeeding From Anywhere by Tsedal Neeley

Airbnb, Boycotts of Israel, call centre, cloud computing, coronavirus, COVID-19, cryptocurrency, discrete time, Donald Trump, future of work, global pandemic, iterative process, job satisfaction, knowledge worker, Lean Startup, lockdown, mass immigration, natural language processing, remote work: asynchronous communication, remote working, Silicon Valley, social distancing

Assess your team’s output to date. (See Sample Response)

Results | Met Expectations? (Yes or No) | Exceeded Expectations (Yes or No) | Explain
New Web App Tool | Yes | Yes | We met the client’s basic needs to share project data, and then also created a dynamic user-friendly interface and added natural language processing functionalities, to go the extra mile.
Sales Goals | No | No | Goals were 16 percent under target

2. How can remote work enhance your individual growth on the team?

3. Evaluate your team’s cohesion. Describe any changes you have observed over time, and list potential next steps. (Sample Response)

Evidence of Team Cohesion | Impact on Productivity | Next Steps
We doubled the amount of small group virtual meetings. | |


pages: 247 words: 71,698

Avogadro Corp by William Hertling

Any sufficiently advanced technology is indistinguishable from magic, cloud computing, crowdsourcing, Hacker Ethic, hive mind, invisible hand, messenger bag, natural language processing, Netflix Prize, off-the-grid, private military company, Ray Kurzweil, Recombinant DNA, recommendation engine, Richard Stallman, Ruby on Rails, standardized shipping container, tech worker, technological singularity, Turing test, web application, WikiLeaks

In a sharp tailored suit, and with her reputation hovering about her like an invisible aura, the Avogadro CEO made for an imposing presence. Only her warm smile left a welcoming space in which an ordinary guy like David could stand. She nodded to David as she came in and took her seat at the head of the table. Kenneth asked, “But what you’re describing, how does it work? Natural language processing ability of computers doesn’t even come close to being able to understand the semantics of human language. Have you had some miracle breakthrough?” “At the heart of how this works is the field of recommendation algorithms,” David explained. “Sean hired me not because I knew anything about language analysis but because I was a leading competitor in the Netflix competition.


pages: 237 words: 64,411

Humans Need Not Apply: A Guide to Wealth and Work in the Age of Artificial Intelligence by Jerry Kaplan

Affordable Care Act / Obamacare, Amazon Web Services, asset allocation, autonomous vehicles, bank run, bitcoin, Bob Noyce, Brian Krebs, business cycle, buy low sell high, Capital in the Twenty-First Century by Thomas Piketty, combinatorial explosion, computer vision, Computing Machinery and Intelligence, corporate governance, crowdsourcing, driverless car, drop ship, Easter island, en.wikipedia.org, Erik Brynjolfsson, estate planning, Fairchild Semiconductor, Flash crash, Gini coefficient, Goldman Sachs: Vampire Squid, haute couture, hiring and firing, income inequality, index card, industrial robot, information asymmetry, invention of agriculture, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, Kiva Systems, Larry Ellison, Loebner Prize, Mark Zuckerberg, mortgage debt, natural language processing, Nick Bostrom, Own Your Own Home, pattern recognition, Satoshi Nakamoto, school choice, Schrödinger's Cat, Second Machine Age, self-driving car, sentiment analysis, short squeeze, Silicon Valley, Silicon Valley startup, Skype, software as a service, The Chicago School, The Future of Employment, Turing test, Vitalik Buterin, Watson beat the top human players on Jeopardy!, winner-take-all economy, women in the workforce, working poor, Works Progress Administration

Jason Brewster, the company’s CEO, estimates that FairDocument reduces the time required to complete a straightforward estate plan from several hours to as little as fifteen to thirty minutes, not to mention that his company is doing the prospecting for new clients and delivering them to the attorneys. A more sophisticated example of synthetic intellects encroaching on legal expertise is the startup Judicata.34 The company uses machine learning and natural language processing techniques to convert ordinary text—such as legal principles or specific cases— into structured information that can be used for finding relevant case law. For instance, it could find all cases in which a male Hispanic gay employee successfully sued for wrongful termination by reading the actual text of court decisions, saving countless hours in a law library or using a more traditional electronic search tool.


pages: 222 words: 70,132

Move Fast and Break Things: How Facebook, Google, and Amazon Cornered Culture and Undermined Democracy by Jonathan Taplin

"Friedman doctrine" OR "shareholder theory", "there is no alternative" (TINA), 1960s counterculture, affirmative action, Affordable Care Act / Obamacare, Airbnb, AlphaGo, Amazon Mechanical Turk, American Legislative Exchange Council, AOL-Time Warner, Apple's 1984 Super Bowl advert, back-to-the-land, barriers to entry, basic income, battle of ideas, big data - Walmart - Pop Tarts, Big Tech, bitcoin, Brewster Kahle, Buckminster Fuller, Burning Man, Clayton Christensen, Cody Wilson, commoditize, content marketing, creative destruction, crony capitalism, crowdsourcing, data is the new oil, data science, David Brooks, David Graeber, decentralized internet, don't be evil, Donald Trump, Douglas Engelbart, Douglas Engelbart, Dynabook, Edward Snowden, Elon Musk, equal pay for equal work, Erik Brynjolfsson, Fairchild Semiconductor, fake news, future of journalism, future of work, George Akerlof, George Gilder, Golden age of television, Google bus, Hacker Ethic, Herbert Marcuse, Howard Rheingold, income inequality, informal economy, information asymmetry, information retrieval, Internet Archive, Internet of things, invisible hand, Jacob Silverman, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: technological unemployment, John Perry Barlow, John von Neumann, Joseph Schumpeter, Kevin Kelly, Kickstarter, labor-force participation, Larry Ellison, life extension, Marc Andreessen, Mark Zuckerberg, Max Levchin, Menlo Park, Metcalfe’s law, military-industrial complex, Mother of all demos, move fast and break things, natural language processing, Network effects, new economy, Norbert Wiener, offshore financial centre, packet switching, PalmPilot, Paul Graham, paypal mafia, Peter Thiel, plutocrats, pre–internet, Ray Kurzweil, reality distortion field, recommendation engine, rent-seeking, revision control, Robert Bork, Robert Gordon, Robert Metcalfe, Ronald Reagan, Ross Ulbricht, Sam Altman, Sand Hill Road, secular stagnation, self-driving car, sharing economy, Silicon Valley, Silicon Valley ideology, Skinner box, smart grid, Snapchat, Social Justice Warrior, software is eating the world, Steve Bannon, Steve Jobs, Stewart Brand, tech billionaire, techno-determinism, technoutopianism, TED Talk, The Chicago School, the long tail, The Market for Lemons, The Rise and Fall of American Growth, Tim Cook: Apple, trade route, Tragedy of the Commons, transfer pricing, Travis Kalanick, trickle-down economics, Tyler Cowen, Tyler Cowen: Great Stagnation, universal basic income, unpaid internship, vertical integration, We are as Gods, We wanted flying cars, instead we got 140 characters, web application, Whole Earth Catalog, winner-take-all economy, women in the workforce, Y Combinator, you are the product

During the 2016 presidential campaign, Donald Trump regularly boasted about his ten million Twitter followers, even though (according to the site StatusPeople, which tracks how many Twitter accounts are bots, how many are inactive, and how many are real) only 21 percent of Trump’s Twitter followers are real, active users on the platform. Hillary Clinton didn’t fare much better, with only 30 percent of her followers classified as real. During the 2012 presidential race, the Annenberg Innovation Lab studied Twitter and politics, and what we found was pretty disturbing. We created a natural-language-processing computer model that read every tweet about every candidate and sorted them by sentiment. At the beginning I loved reading the dashboard of the twenty most positive and negative tweets of the previous hour. But within weeks the incredible amount of racist tweets directed at our president became too painful to look at.
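One simple way to sort text by sentiment is a lexicon-based scorer. The toy below (with invented word lists and function names, nothing like the Annenberg lab's actual model) just counts positive and negative words:

```python
# Tiny, invented sentiment lexicons; real systems use far larger ones.
POSITIVE = {"great", "love", "win", "strong"}
NEGATIVE = {"terrible", "hate", "lose", "weak"}

def sentiment(tweet):
    """Positive-word count minus negative-word count."""
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def most_positive(tweets, n=20):
    """The n highest-scoring tweets, like a dashboard's hourly top twenty."""
    return sorted(tweets, key=sentiment, reverse=True)[:n]
```

Running every tweet about a candidate through such a scorer and keeping the extremes is, in miniature, the kind of dashboard the passage describes.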


pages: 224 words: 13,238

Electronic and Algorithmic Trading Technology: The Complete Guide by Kendall Kim

algorithmic trading, automated trading system, backtesting, Bear Stearns, business logic, commoditize, computerized trading, corporate governance, Credit Default Swap, diversification, en.wikipedia.org, family office, financial engineering, financial innovation, fixed income, index arbitrage, index fund, interest rate swap, linked data, market fragmentation, money market fund, natural language processing, proprietary trading, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, short selling, statistical arbitrage, Steven Levy, transaction costs, yield curve

At the moment, big strategic decisions such as which shares to buy or sell are made by human traders; algorithmic programs are then given the power to decide how to buy or sell shares, with the aim of hiding the client’s intentions. Executing algorithms are designed to be stealthy and create as little volatility as possible. The fact that they are designed to reduce the market impact of trades should in fact have a stabilizing effect in equity markets. Some day, advances in natural language processing and statistical analysis might lead to algorithms capable of analyzing news feeds, deciding which shares to buy and sell, and devising their own strategies. Broker dealers, software vendors, and now investment institutions are entering the algorithmic arms race. Since there are so many possible trading strategies, it is doubtful that there will turn out to be one single trading algorithm that outperforms all others.


pages: 265 words: 69,310

What's Yours Is Mine: Against the Sharing Economy by Tom Slee

4chan, Airbnb, Amazon Mechanical Turk, asset-backed security, barriers to entry, Benchmark Capital, benefit corporation, Berlin Wall, big-box store, bike sharing, bitcoin, blockchain, Californian Ideology, citizen journalism, collaborative consumption, commons-based peer production, congestion charging, Credit Default Swap, crowdsourcing, data acquisition, data science, David Brooks, democratizing finance, do well by doing good, don't be evil, Dr. Strangelove, emotional labour, Evgeny Morozov, gentrification, gig economy, Hacker Ethic, impact investing, income inequality, independent contractor, informal economy, invisible hand, Jacob Appelbaum, Jane Jacobs, Jeff Bezos, John Zimmer (Lyft cofounder), Kevin Roose, Khan Academy, Kibera, Kickstarter, license plate recognition, Lyft, machine readable, Marc Andreessen, Mark Zuckerberg, Max Levchin, move fast and break things, natural language processing, Netflix Prize, Network effects, new economy, Occupy movement, openstreetmap, Paul Graham, peer-to-peer, peer-to-peer lending, Peter Thiel, pre–internet, principal–agent problem, profit motive, race to the bottom, Ray Kurzweil, recommendation engine, rent control, ride hailing / ride sharing, sharing economy, Silicon Valley, Snapchat, software is eating the world, South of Market, San Francisco, TaskRabbit, TED Talk, the Cathedral and the Bazaar, the long tail, The Nature of the Firm, Thomas L Friedman, transportation-network company, Travis Kalanick, Tyler Cowen, Uber and Lyft, Uber for X, uber lyft, ultimatum game, urban planning, WeWork, WikiLeaks, winner-take-all economy, Y Combinator, Yochai Benkler, Zipcar

In another experiment, Airbnb staff are working with external researchers to test whether offering a reward to encourage reviews has any effect on the number of critical reviews that guests provide.24 Other efforts are trying to squeeze more critical information from what is already there. Airbnb is using natural language processing to parse critical comments from review texts.25 Researchers have shown that taking missing reviews into account can give a much more effective measure of seller quality.26 The problem with such efforts is that, if systems were changed so that missing reviews or passive-aggressive text comments were known to be recorded (and so became, implicitly, a negative review), customer behavior might change to avoid the threat of a negative (non-)review in return.


pages: 244 words: 66,977

Subscribed: Why the Subscription Model Will Be Your Company's Future - and What to Do About It by Tien Tzuo, Gabe Weisert

3D printing, Airbnb, airport security, Amazon Web Services, augmented reality, autonomous vehicles, Big Tech, bike sharing, blockchain, Brexit referendum, Build a better mousetrap, business cycle, business intelligence, business process, call centre, cloud computing, cognitive dissonance, connected car, data science, death of newspapers, digital nomad, digital rights, digital twin, double entry bookkeeping, Elon Musk, factory automation, fake news, fiat currency, Ford Model T, fulfillment center, growth hacking, hockey-stick growth, Internet of things, inventory management, iterative process, Jeff Bezos, John Zimmer (Lyft cofounder), Kevin Kelly, Lean Startup, Lyft, manufacturing employment, Marc Benioff, Mary Meeker, megaproject, minimum viable product, natural language processing, Network effects, Nicholas Carr, nuclear winter, pets.com, planned obsolescence, pneumatic tube, profit maximization, race to the bottom, ride hailing / ride sharing, Salesforce, Sand Hill Road, shareholder value, Silicon Valley, skunkworks, smart meter, social graph, software as a service, spice trade, Steve Ballmer, Steve Jobs, subscription business, systems thinking, tech worker, TED Talk, Tim Cook: Apple, transport as a service, Uber and Lyft, uber lyft, WeWork, Y2K, Zipcar

IBM was #61 on the Fortune 500 list in 1955, and it’s #32 on the list today. IBM originally sold commercial scales and punch card tabulators. Today it sells IT and quantum computing services. It has completely transformed from a product manufacturer into a business services giant. IBM is now working on Watson—a technology platform that uses natural language processing and machine learning to reveal insights from large amounts of unstructured data. It has Bob Dylan chatting with an artificial intelligence system in its advertisements. It is now in the business of cognitive services—a pretty exciting departure from where the company started. In fact, 12 percent of the companies on the 1955 Fortune 500 list are still on it today, and most of them have similarly transformed.


pages: 296 words: 66,815

The AI-First Company by Ash Fontana

23andMe, Amazon Mechanical Turk, Amazon Web Services, autonomous vehicles, barriers to entry, blockchain, business intelligence, business process, business process outsourcing, call centre, Charles Babbage, chief data officer, Clayton Christensen, cloud computing, combinatorial explosion, computer vision, crowdsourcing, data acquisition, data science, deep learning, DevOps, en.wikipedia.org, Geoffrey Hinton, independent contractor, industrial robot, inventory management, John Conway, knowledge economy, Kubernetes, Lean Startup, machine readable, minimum viable product, natural language processing, Network effects, optical character recognition, Pareto efficiency, performance metric, price discrimination, recommendation engine, Ronald Coase, Salesforce, single source of truth, software as a service, source of truth, speech recognition, the scientific method, transaction costs, vertical integration, yield management

The factory manager’s job is to find efficiencies along the production line.

Tools

Labeling often requires engineers to clean data before applying the labels. For example, it is very hard for a machine to learn patterns across the text in millions of customer service emails. In this case, an engineer may use a natural language processing technique—an area of ML focused on understanding text—to locate the segments of these emails where customers mention specific products, and then cluster the text to build up categories of complaints about those products. These categories are then used to label all the new emails that come in, so that the machine can learn how to respond to complaints in different categories.
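The pipeline described here (cluster past complaints into categories, then use those categories to label incoming email) can be sketched minimally. This is not the author's or any vendor's actual system: the `vectorize`, `cosine`, and `label` helpers and the two toy categories are invented for illustration, and a real pipeline would use learned clusters (for example, k-means over TF-IDF vectors) rather than hand-built centroids.

```python
import math
from collections import Counter

def vectorize(text):
    # bag-of-words term counts as a sparse vector
    return Counter(text.lower().split())

def cosine(a, b):
    # cosine similarity between two sparse count vectors
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# hypothetical category "centroids", standing in for clusters
# learned from past complaint emails
categories = {
    "shipping": vectorize("package arrived late missing box delivery slow"),
    "billing": vectorize("charged twice refund invoice payment wrong amount"),
}

def label(email):
    # route a new email to the closest category
    return max(categories, key=lambda c: cosine(categories[c], vectorize(email)))

print(label("my card was charged twice and I want a refund"))  # billing
```

Once the categories exist, labeling each incoming email is a nearest-centroid lookup, which is what makes the downstream learning step tractable.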


pages: 234 words: 67,589

Internet for the People: The Fight for Our Digital Future by Ben Tarnoff

4chan, A Declaration of the Independence of Cyberspace, accounting loophole / creative accounting, Alan Greenspan, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic management, AltaVista, Amazon Web Services, barriers to entry, Bernie Sanders, Big Tech, Black Lives Matter, blue-collar work, business logic, call centre, Charles Babbage, cloud computing, computer vision, coronavirus, COVID-19, decentralized internet, deep learning, defund the police, deindustrialization, desegregation, digital divide, disinformation, Edward Snowden, electricity market, fake news, Filter Bubble, financial intermediation, future of work, gamification, General Magic , gig economy, God and Mammon, green new deal, independent contractor, information asymmetry, Internet of things, Jeff Bezos, Jessica Bruder, John Markoff, John Perry Barlow, Kevin Roose, Kickstarter, Leo Hollis, lockdown, lone genius, low interest rates, Lyft, Mark Zuckerberg, means of production, Menlo Park, natural language processing, Network effects, Nicholas Carr, packet switching, PageRank, pattern recognition, pets.com, profit maximization, profit motive, QAnon, recommendation engine, rent-seeking, ride hailing / ride sharing, Sheryl Sandberg, Shoshana Zuboff, side project, Silicon Valley, single-payer health, smart grid, social distancing, Steven Levy, stock buybacks, supply-chain management, surveillance capitalism, techlash, Telecommunications Act of 1996, TikTok, transportation-network company, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, undersea cable, UUNET, vertical integration, Victor Gruen, web application, working poor, Yochai Benkler

This revival was made possible by a number of factors, foremost among them advances in computing power and the abundance of training data that could be sourced from the internet. Deep learning is the paradigm that underlies much of what is currently known as “artificial intelligence,” and has centrally contributed to significant breakthroughs in computer vision and natural language processing. See Andrey Kurenkov, “A Brief History of Neural Nets and Deep Learning,” Skynet Today, September 27, 2020, and Alex Hanna et al., “Lines of Sight,” Logic, December 20, 2020. 109, The sophistication of these systems … “Data imperative”: Marion Fourcade and Kieran Healy, “Seeing Like a Market,” Socio-Economic Review 15, no. 1 (2017): 9–29. 110, The same individual … Smartphone usage: “Mobile Fact Sheet,” April 7, 2021, Pew Research Center.


pages: 1,331 words: 183,137

Programming Rust: Fast, Safe Systems Development by Jim Blandy, Jason Orendorff

bioinformatics, bitcoin, Donald Knuth, duck typing, Elon Musk, Firefox, fizzbuzz, functional programming, mandelbrot fractal, Morris worm, MVC pattern, natural language processing, reproducible builds, side project, sorting algorithm, speech recognition, Turing test, type inference, WebSocket

They make it possible to expand your toolbox, to hack multiple styles of multithreaded code quickly and correctly—without skepticism, without cynicism, without fear.

Fork-Join Parallelism

The simplest use cases for threads arise when we have several completely independent tasks that we’d like to do at once. For example, suppose we’re doing natural language processing on a large corpus of documents. We could write a loop:

fn process_files(filenames: Vec<String>) -> io::Result<()> {
    for document in filenames {
        let text = load(&document)?;      // read source file
        let results = process(text);      // compute statistics
        save(&document, results)?;        // write output file
    }
    Ok(())
}

The program would run as shown in Figure 19-1.

The program always produces the same result, regardless of variations in thread speed. It’s a concurrency model without race conditions. The main disadvantage of fork-join is that it requires isolated units of work. Later in this chapter, we’ll consider some problems that don’t split up so cleanly. For now, let’s stick with the natural language processing example. We’ll show a few ways of applying the fork-join pattern to the process_files function.

spawn and join

The function std::thread::spawn starts a new thread:

spawn(|| {
    println!("hello from a child thread");
})

It takes one argument, a FnOnce closure or function. Rust starts a new thread to run the code of that closure or function.


pages: 931 words: 79,142

Concepts, Techniques, and Models of Computer Programming by Peter Van-Roy, Seif Haridi

computer age, Debian, discrete time, Donald Knuth, Eratosthenes, fault tolerance, functional programming, G4S, general-purpose programming language, George Santayana, John von Neumann, Lao Tzu, Menlo Park, natural language processing, NP-complete, Paul Graham, premature optimization, sorting algorithm, the Cathedral and the Bazaar, Therac-25, Turing complete, Turing machine, type inference

Sections 9.4 through 9.6 give large examples in three areas that are particularly well-suited to relational programming, namely natural language parsing, interpreters, and deductive databases. Section 9.7 gives an introduction to Prolog, a programming language based on relational programming. Prolog was originally designed for natural language processing, but has become one of the main programming languages in all areas that require symbolic programming.

9.1 The relational computation model

9.1.1 The choice and fail statements

The relational computation model extends the declarative model with two new statements, choice and fail: The choice statement groups together a set of alternative statements.

Prolog is generally used in application areas in which complex symbolic manipulations are needed, such as expert systems, specialized language translators, program generation, data transformation, knowledge processing, deductive databases, and theorem proving. There are two application areas in which Prolog is still predominant over other languages: natural language processing and constraint programming. The latter in particular has matured from being a subfield of logic programming into being a field in its own right, with conferences, practical systems, and industrial applications. Prolog has many advantages for such applications. The bulk of programming can be done cleanly in its pure declarative subset.

Go To statement considered harmful. Communications of the ACM, 11(3):147–148, March 1968. Denys Duchier. Loop support. Technical report, Mozart Consortium, 2003. Available at http://www.mozart-oz.org/. [56] Denys Duchier, Claire Gardent, and Joachim Niehren. Concurrent constraint programming in Oz for natural language processing. Technical report, Saarland University, Saarbrücken, Germany, 1999. Available at http://www.ps.uni-sb.de/Papers/abstracts/oznlp.html. [57] Denys Duchier, Leif Kornstaedt, and Christian Schulte. The Oz base environment. Technical report, Mozart Consortium, 2003. Available at http://www.mozart-oz.org/


pages: 267 words: 72,552

Reinventing Capitalism in the Age of Big Data by Viktor Mayer-Schönberger, Thomas Ramge

accounting loophole / creative accounting, Air France Flight 447, Airbnb, Alvin Roth, Apollo 11, Atul Gawande, augmented reality, banking crisis, basic income, Bayesian statistics, Bear Stearns, behavioural economics, bitcoin, blockchain, book value, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, Cass Sunstein, centralized clearinghouse, Checklist Manifesto, cloud computing, cognitive bias, cognitive load, conceptual framework, creative destruction, Daniel Kahneman / Amos Tversky, data science, Didi Chuxing, disruptive innovation, Donald Trump, double entry bookkeeping, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, flying shuttle, Ford Model T, Ford paid five dollars a day, Frederick Winslow Taylor, fundamental attribution error, George Akerlof, gig economy, Google Glasses, Higgs boson, information asymmetry, interchangeable parts, invention of the telegraph, inventory management, invisible hand, James Watt: steam engine, Jeff Bezos, job automation, job satisfaction, joint-stock company, Joseph Schumpeter, Kickstarter, knowledge worker, labor-force participation, land reform, Large Hadron Collider, lone genius, low cost airline, low interest rates, Marc Andreessen, market bubble, market design, market fundamentalism, means of production, meta-analysis, Moneyball by Michael Lewis explains big data, multi-sided market, natural language processing, Neil Armstrong, Network effects, Nick Bostrom, Norbert Wiener, offshore financial centre, Parag Khanna, payday loans, peer-to-peer lending, Peter Thiel, Ponzi scheme, prediction markets, price anchoring, price mechanism, purchasing power parity, radical decentralization, random walk, recommendation engine, Richard Thaler, ride hailing / ride sharing, Robinhood: mobile stock trading app, Sam Altman, scientific management, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, smart grid, smart meter, Snapchat, statistical model, Steve Jobs, subprime mortgage crisis, Suez canal 1869, tacit knowledge, technoutopianism, The Future of Employment, The Market for Lemons, The Nature of the Firm, transaction costs, universal basic income, vertical integration, William Langewiesche, Y Combinator

The system works in exactly the same way as the machine learning systems we described in Chapter 4: customers don’t have to make their needs and wants explicit, because the systems learn from how humans interact with the world around them. Feedback also plays a crucial role at Stitch Fix. To begin with, every item a customer returns generates data. But customers are strongly encouraged to comment on each item they receive, and they can do so in plain English, which, with the help of natural-language processing software, further refines the company’s model of a customer’s preferences. Stitch Fix is also developing its own line of apparel, which uses preference data in the design process. Stitch Fix’s simple secret is that it understands data-rich markets and the crucial role data plays in customer satisfaction. As they put it: “Rich data on both sides of this ‘market’ enables Stitch Fix to be a matchmaker, connecting clients with styles they love (and never would’ve found on their own).”


pages: 589 words: 69,193

Mastering Pandas by Femi Anthony

Amazon Web Services, Bayesian statistics, correlation coefficient, correlation does not imply causation, data science, Debian, en.wikipedia.org, Internet of things, Large Hadron Collider, natural language processing, p-value, power law, random walk, side project, sparse data, statistical model, Thomas Bayes

Python has libraries that provide a complete toolkit for data science and analysis. The major ones are as follows:

NumPy: General-purpose array functionality with an emphasis on numeric computation
SciPy: Numerical computing
Matplotlib: Graphics
pandas: Series and data frames (1D and 2D array-like types)
Scikit-Learn: Machine learning
NLTK: Natural language processing
Statstool: Statistical analysis

For this book, we will be focusing on the fourth library in the preceding list: pandas.

What is pandas?

pandas is a high-performance open source library for data analysis in Python, developed by Wes McKinney in 2008. Over the years, it has become the de facto standard library for data analysis using Python.


pages: 268 words: 75,850

The Formula: How Algorithms Solve All Our Problems-And Create More by Luke Dormehl

3D printing, algorithmic bias, algorithmic trading, Alvin Toffler, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, big data - Walmart - Pop Tarts, call centre, Cass Sunstein, classic study, Clayton Christensen, commoditize, computer age, death of newspapers, deferred acceptance, disruptive innovation, Edward Lorenz: Chaos theory, Erik Brynjolfsson, Evgeny Morozov, Filter Bubble, Flash crash, Florence Nightingale: pie chart, Ford Model T, Frank Levy and Richard Murnane: The New Division of Labor, fulfillment center, Google Earth, Google Glasses, High speed trading, Internet Archive, Isaac Newton, Jaron Lanier, Jeff Bezos, job automation, John Markoff, Kevin Kelly, Kodak vs Instagram, Lewis Mumford, lifelogging, machine readable, machine translation, Marshall McLuhan, means of production, Nate Silver, natural language processing, Netflix Prize, Panopticon Jeremy Bentham, Paradox of Choice, pattern recognition, price discrimination, recommendation engine, Richard Thaler, Rosa Parks, scientific management, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Slavoj Žižek, social graph, speech recognition, stable marriage problem, Steve Jobs, Steven Levy, Steven Pinker, Stewart Brand, technological determinism, technological solutionism, TED Talk, the long tail, the scientific method, The Signal and the Noise by Nate Silver, upwardly mobile, Wall-E, Watson beat the top human players on Jeopardy!, Y Combinator

If such a tool were to be implemented within a future edition of MS Word or Google Docs, it is not inconceivable that users may one day finish typing a document and hit a single button—at which point it is auto-checked for spelling, punctuation, formatting and truthfulness. Already there is widespread use of algorithms in academia for sifting through submitted work and pulling up passages that may or may not be plagiarized. These will only become more widespread as natural language processing becomes more intuitive and able to move beyond simple passage comparison to detailed content and idea analysis. There is no one-size-fits-all answer to how best to deal with algorithms. In some cases, increased transparency would appear to be the answer. Where algorithms are used to enforce laws, for instance, releasing the source code to the general public would both protect against the dangers of unchecked government policy-making and make it possible to determine how specific decisions have been reached.
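The "simple passage comparison" that current plagiarism detectors rely on can be illustrated with word n-gram overlap. This is a hedged sketch, not any real product's algorithm: `ngrams` and `overlap` are hypothetical helpers, and Jaccard similarity over word trigrams is just one common baseline.

```python
def ngrams(text, n=3):
    # the set of word n-grams in a lightly normalized text
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(doc_a, doc_b, n=3):
    # Jaccard similarity of the two documents' n-gram sets
    a, b = ngrams(doc_a, n), ngrams(doc_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

source = "the quick brown fox jumps over the lazy dog near the river"
suspect = "a quick brown fox jumps over the lazy dog by the water"
print(round(overlap(source, suspect), 2))  # 0.43
```

A paraphrase that reuses no three-word sequence scores near zero here, which is exactly the limitation the passage describes: moving beyond it requires analysing content and ideas, not just surface strings.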


pages: 256 words: 73,068

12 Bytes: How We Got Here. Where We Might Go Next by Jeanette Winterson

"Margaret Hamilton" Apollo, "World Economic Forum" Davos, 3D printing, Ada Lovelace, Airbnb, Albert Einstein, Alignment Problem, Amazon Mechanical Turk, Anthropocene, Apollo 11, Apple's 1984 Super Bowl advert, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, basic income, Big Tech, bitcoin, Bletchley Park, blockchain, Boston Dynamics, call centre, Cambridge Analytica, Capital in the Twenty-First Century by Thomas Piketty, cashless society, Charles Babbage, computer age, Computing Machinery and Intelligence, coronavirus, COVID-19, CRISPR, cryptocurrency, dark matter, Dava Sobel, David Graeber, deep learning, deskilling, digital rights, discovery of DNA, Dominic Cummings, Donald Trump, double helix, driverless car, Elon Musk, fake news, flying shuttle, friendly AI, gender pay gap, global village, Grace Hopper, Gregor Mendel, hive mind, housing crisis, Internet of things, Isaac Newton, Jacquard loom, James Hargreaves, Jeff Bezos, Johannes Kepler, John von Neumann, Joseph-Marie Jacquard, Kickstarter, Large Hadron Collider, life extension, lockdown, lone genius, Mark Zuckerberg, means of production, microdosing, more computing power than Apollo, move fast and break things, natural language processing, Nick Bostrom, Norbert Wiener, off grid, OpenAI, operation paperclip, packet switching, Peter Thiel, pink-collar, Plato's cave, public intellectual, QAnon, QWERTY keyboard, Ray Kurzweil, rewilding, ride hailing / ride sharing, Rutger Bregman, Sam Altman, self-driving car, sharing economy, Sheryl Sandberg, Shoshana Zuboff, Silicon Valley, Skype, Snapchat, SoftBank, SpaceX Starlink, speech recognition, spinning jenny, stem cell, Stephen Hawking, Steve Bannon, Steve Jobs, Steven Levy, Steven Pinker, superintelligent machines, surveillance capitalism, synthetic biology, systems thinking, tech billionaire, tech worker, TED Talk, telepresence, telepresence robot, TikTok, trade route, Turing test, universal basic income, Virgin Galactic, Watson 
beat the top human players on Jeopardy!, women in the workforce, Y Combinator

Chatbots – software applications designed to mimic human interaction, either via speech or text – are ubiquitous. Usually we encounter them on response messages, asking us what’s the problem with our washing machine, or that our parcel is on the back porch, or how did we rate Pavel who just delivered a pizza? Chatbots use Natural Language Processing (NLP) to communicate with humans in a specific and limited way. These speech-recognition systems attempt to work out what it is you want. How can I help you? Problems start when we humans try to explain what it is we want. For instance: ‘Do you sell black shoes?’ is fine. But if you type, ‘Do you have black shoes?’


pages: 263 words: 77,786

Tomorrow's Capitalist: My Search for the Soul of Business by Alan Murray

"Friedman doctrine" OR "shareholder theory", "World Economic Forum" Davos, activist fund / activist shareholder / activist investor, Airbnb, Alan Greenspan, Alvin Toffler, Berlin Wall, Bernie Sanders, Big Tech, Black Lives Matter, blockchain, Boris Johnson, call centre, carbon footprint, commoditize, coronavirus, corporate governance, corporate raider, corporate social responsibility, COVID-19, creative destruction, Credit Default Swap, decarbonisation, digital divide, disinformation, disruptive innovation, do well by doing good, don't be evil, Donald Trump, Ferguson, Missouri, financial innovation, Francis Fukuyama: the end of history, Frederick Winslow Taylor, future of work, gentrification, George Floyd, global pandemic, Greta Thunberg, gun show loophole, impact investing, income inequality, intangible asset, invisible hand, Jeff Bezos, job automation, knowledge worker, lockdown, London Whale, low interest rates, Marc Benioff, Mark Zuckerberg, market fundamentalism, means of production, minimum wage unemployment, natural language processing, new economy, old-boy network, price mechanism, profit maximization, remote working, risk-adjusted returns, Ronald Reagan, Salesforce, scientific management, shareholder value, side hustle, Silicon Valley, social distancing, Social Responsibility of Business Is to Increase Its Profits, The Future of Employment, the payments system, The Wealth of Nations by Adam Smith, Tim Cook: Apple, Washington Consensus, women in the workforce, work culture , working poor, zero-sum game

“When people went through the pandemic, the first thing they invested in was resiliency and making sure people could work remotely,” he told me.22 Shortage of good AI talent was also a restrictive factor. But the slowdown may just be a pause before the breakthrough. “Almost all companies recognize AI is going to improve the customer experience, automation inside the enterprise, and everything around natural language processing. Those are the three areas where we see huge amounts of interest. People were pausing because they were squeezing ten years of digital transformation into two years.” But now that that’s done, AI looks to be the second wave. “Significant investments are planned,” Krishna says. The world is watching to see how this story develops.


pages: 301 words: 85,263

New Dark Age: Technology and the End of the Future by James Bridle

AI winter, Airbnb, Alfred Russel Wallace, AlphaGo, Anthropocene, Automated Insights, autonomous vehicles, back-to-the-land, Benoit Mandelbrot, Bernie Sanders, bitcoin, Boeing 747, British Empire, Brownian motion, Buckminster Fuller, Cambridge Analytica, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, coastline paradox / Richardson effect, cognitive bias, cognitive dissonance, combinatorial explosion, computer vision, congestion charging, cryptocurrency, data is the new oil, disinformation, Donald Trump, Douglas Engelbart, Douglas Engelbart, Douglas Hofstadter, Dr. Strangelove, drone strike, Edward Snowden, Eyjafjallajökull, Fairchild Semiconductor, fake news, fear of failure, Flash crash, fulfillment center, Google Earth, Greyball, Haber-Bosch Process, Higgs boson, hive mind, income inequality, informal economy, Internet of things, Isaac Newton, ITER tokamak, James Bridle, John von Neumann, Julian Assange, Kickstarter, Kim Stanley Robinson, Large Hadron Collider, late capitalism, Laura Poitras, Leo Hollis, lone genius, machine translation, mandelbrot fractal, meta-analysis, Minecraft, mutually assured destruction, natural language processing, Network effects, oil shock, p-value, pattern recognition, peak oil, recommendation engine, road to serfdom, Robert Mercer, Ronald Reagan, security theater, self-driving car, Seymour Hersh, Silicon Valley, Silicon Valley ideology, Skype, social graph, sorting algorithm, South China Sea, speech recognition, Spread Networks laid a new fibre optics cable between New York and Chicago, stem cell, Stuxnet, technoutopianism, the built environment, the scientific method, Uber for X, undersea cable, University of East Anglia, uranium enrichment, Vannevar Bush, warehouse robotics, WikiLeaks

‘HP cameras are racist’, YouTube video, username: wzamen01, December 10, 2009. 14.David Smith, ‘“Racism” of early colour photography explored in art exhibition’, Guardian, January 25, 2013, theguardian.com. 15.Phillip Martin, ‘How A Cambridge Woman’s Campaign Against Polaroid Weakened Apartheid’, WGBH News, December 9, 2013, news.wgbh.org. 16.Hewlett-Packard, ‘Global Citizenship Report 2009’, hp.com. 17.Trevor Paglen, ‘re:publica 2017 | Day 3 – Livestream Stage 1 – English’, YouTube video, username: re:publica, May 10, 2017. 18.Walter Benjamin, ‘Theses on the Philosophy of History’, in Walter Benjamin: Selected Writings, Volume 4: 1938–1940, Cambridge, MA: Harvard University Press, 2006. 19.PredPol, ‘5 Common Myths about Predictive Policing’, predpol.com. 20.G. O. Mohler, M. B. Short, P. J. Brantingham, et al., ‘Self-exciting point process modeling of crime’, JASA 106 (2011). 21.Daniel Jurafsky and James H. Martin, Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd edition, Upper Saddle River, NJ: Prentice Hall, 2009. 22.Walter Benjamin, ‘The Task of the Translator’, in Selected Writings Volume 1 1913–1926, Marcus Bullock and Michael W. Jennings, eds, Cambridge, MA and London: Belknap Press, 1996. 23.Murat Nemet-Nejat, ‘Translation: Contemplating Against the Grain’, Cipher, 1999, cipherjournal.com. 24.Tim Adams, ‘Can Google break the computer language barrier?’


pages: 284 words: 84,169

Talk on the Wild Side by Lane Greene

Affordable Care Act / Obamacare, Albert Einstein, Boris Johnson, deep learning, Donald Trump, ending welfare as we know it, experimental subject, facts on the ground, fake news, framing effect, Google Chrome, Higgs boson, illegal immigration, invisible hand, language acquisition, Large Hadron Collider, machine translation, meta-analysis, Money creation, moral panic, natural language processing, obamacare, public intellectual, Ronald Reagan, Sapir-Whorf hypothesis, Snapchat, sparse data, speech recognition, Steven Pinker, TED Talk, Turing test, Wall-E

The rules can be added on for the tricky cases, at the appropriate age, but we should never confuse an explicit knowledge of rules (“this is what a relative clause looks like”) with an ability to write. Lousy writing can be grammatical; good writing can have errors. Computer scientists who work in natural-language processing are exploring best-of-both-worlds systems, for translation, parsing and other applications. They are combining newer-fangled machine learning with explicit rule coding. Educators should do the same, researching which things are best learned by experience, and which are best learned by rule.


pages: 276 words: 81,153

Outnumbered: From Facebook and Google to Fake News and Filter-Bubbles – the Algorithms That Control Our Lives by David Sumpter

affirmative action, algorithmic bias, AlphaGo, Bernie Sanders, Brexit referendum, Cambridge Analytica, classic study, cognitive load, Computing Machinery and Intelligence, correlation does not imply causation, crowdsourcing, data science, DeepMind, Demis Hassabis, disinformation, don't be evil, Donald Trump, Elon Musk, fake news, Filter Bubble, Geoffrey Hinton, Google Glasses, illegal immigration, James Webb Space Telescope, Jeff Bezos, job automation, Kenneth Arrow, Loebner Prize, Mark Zuckerberg, meta-analysis, Minecraft, Nate Silver, natural language processing, Nelson Mandela, Nick Bostrom, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, p-value, post-truth, power law, prediction markets, random walk, Ray Kurzweil, Robert Mercer, selection bias, self-driving car, Silicon Valley, Skype, Snapchat, social contagion, speech recognition, statistical model, Stephen Hawking, Steve Bannon, Steven Pinker, TED Talk, The Signal and the Noise by Nate Silver, traveling salesman, Turing test

and other Internet giants need to build systems that automatically track political changes, football transfer rumours and contestants in The Voice. The algorithms need to learn to understand new analogies and concepts by reading newspapers, checking Wikipedia and following social media. Jeffrey Pennington and his colleagues at the Stanford Natural Language Processing Group have found an elegant way of training an algorithm to learn about analogies from web pages. Their algorithm, known as GloVe (global vectors for word representation), learns by reading a very large amount of text. In a 2014 article, Jeffrey trained GloVe on the whole of Wikipedia, which at that point totalled 1.6 billion words and symbols, together with the fifth edition of Gigaword, which is a database of 4.3 billion words and symbols downloaded from news sites around the world.
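The analogy arithmetic that word vectors like GloVe's enable (vec("king") - vec("man") + vec("woman") lands near vec("queen")) can be demonstrated with toy vectors. The 3-dimensional vectors below are invented for illustration; real GloVe embeddings are learned from corpus co-occurrence statistics and have tens to hundreds of dimensions. The nearest-neighbour lookup, however, is the standard way such analogies are answered.

```python
import math

# toy 3-dimensional word vectors, invented for illustration only
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

def cosine(a, b):
    # cosine similarity between two dense vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def analogy(a, b, c):
    # solve a : b :: c : ?  by finding the word whose vector is
    # closest to vec(b) - vec(a) + vec(c)
    target = [vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    candidates = [w for w in vecs if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vecs[w], target))

print(analogy("man", "king", "woman"))  # queen
```

With trained vectors, the same function recovers analogies it was never explicitly taught, which is what the text means by learning analogies from raw web pages.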


pages: 292 words: 85,151

Exponential Organizations: Why New Organizations Are Ten Times Better, Faster, and Cheaper Than Yours (And What to Do About It) by Salim Ismail, Yuri van Geest

23andMe, 3D printing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, anti-fragile, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, behavioural economics, Ben Horowitz, bike sharing, bioinformatics, bitcoin, Black Swan, blockchain, Blue Ocean Strategy, book value, Burning Man, business intelligence, business process, call centre, chief data officer, Chris Wanstrath, circular economy, Clayton Christensen, clean water, cloud computing, cognitive bias, collaborative consumption, collaborative economy, commoditize, corporate social responsibility, cross-subsidies, crowdsourcing, cryptocurrency, dark matter, data science, Dean Kamen, deep learning, DeepMind, dematerialisation, discounted cash flows, disruptive innovation, distributed ledger, driverless car, Edward Snowden, Elon Musk, en.wikipedia.org, Ethereum, ethereum blockchain, fail fast, game design, gamification, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, hiring and firing, holacracy, Hyperloop, industrial robot, Innovator's Dilemma, intangible asset, Internet of things, Iridium satellite, Isaac Newton, Jeff Bezos, Joi Ito, Kevin Kelly, Kickstarter, knowledge worker, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, lifelogging, loose coupling, loss aversion, low earth orbit, Lyft, Marc Andreessen, Mark Zuckerberg, market design, Max Levchin, means of production, Michael Milken, minimum viable product, natural language processing, Netflix Prize, NetJets, Network effects, new economy, Oculus Rift, offshore financial centre, PageRank, pattern recognition, Paul Graham, paypal mafia, peer-to-peer, peer-to-peer model, Peter H. Diamandis: Planetary Resources, Peter Thiel, Planet Labs, prediction markets, profit motive, publish or perish, radical decentralization, Ray Kurzweil, recommendation engine, RFID, ride hailing / ride sharing, risk tolerance, Ronald Coase, Rutger Bregman, Salesforce, Second Machine Age, self-driving car, sharing economy, Silicon Valley, skunkworks, Skype, smart contracts, Snapchat, social software, software is eating the world, SpaceShipOne, speech recognition, stealth mode startup, Stephen Hawking, Steve Jobs, Steve Jurvetson, subscription business, supply-chain management, synthetic biology, TaskRabbit, TED Talk, telepresence, telepresence robot, the long tail, Tony Hsieh, transaction costs, Travis Kalanick, Tyler Cowen, Tyler Cowen: Great Stagnation, uber lyft, urban planning, Virgin Galactic, WikiLeaks, winner-take-all economy, X Prize, Y Combinator, zero-sum game

For example, the Hewlett Foundation sponsored a 2012 competition to develop an automated scoring algorithm for student-written essays. Of the 155 teams competing, three were awarded a total of $100,000 in prize money. What was particularly interesting was the fact that none of the winners had prior experience with natural language processing (NLP). Nonetheless, they beat the experts, many of them with decades of experience in NLP under their belts. This can’t help but impact the current status quo. Raymond McCauley, Biotechnology & Bioinformatics Chair at Singularity University, has noticed that “When people want a biotech job in Silicon Valley, they hide their PhDs to avoid being seen as a narrow specialist.”


pages: 251 words: 80,831

Super Founders: What Data Reveals About Billion-Dollar Startups by Ali Tamaseb

"World Economic Forum" Davos, 23andMe, additive manufacturing, Affordable Care Act / Obamacare, Airbnb, Anne Wojcicki, asset light, barriers to entry, Ben Horowitz, Benchmark Capital, bitcoin, business intelligence, buy and hold, Chris Wanstrath, clean water, cloud computing, coronavirus, corporate governance, correlation does not imply causation, COVID-19, cryptocurrency, data science, discounted cash flows, diversified portfolio, Elon Musk, Fairchild Semiconductor, game design, General Magic, gig economy, high net worth, hiring and firing, index fund, Internet Archive, Jeff Bezos, John Zimmer (Lyft cofounder), Kickstarter, late fees, lockdown, Lyft, Marc Andreessen, Marc Benioff, Mark Zuckerberg, Max Levchin, Mitch Kapor, natural language processing, Network effects, nuclear winter, PageRank, PalmPilot, Parker Conrad, Paul Buchheit, Paul Graham, peer-to-peer lending, Peter Thiel, Planet Labs, power law, QR code, Recombinant DNA, remote working, ride hailing / ride sharing, robotic process automation, rolodex, Ruby on Rails, Salesforce, Sam Altman, Sand Hill Road, self-driving car, shareholder value, sharing economy, side hustle, side project, Silicon Valley, Silicon Valley startup, Skype, Snapchat, SoftBank, software as a service, software is eating the world, sovereign wealth fund, Startup school, Steve Jobs, Steve Wozniak, survivorship bias, TaskRabbit, telepresence, the payments system, TikTok, Tony Fadell, Tony Hsieh, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, ubercab, web application, WeWork, work culture, Y Combinator

In this book, data from billion-dollar companies is compared to data from a baseline group of startups that did not become unicorns. Between 2005 and the end of 2018—the time period of my dataset—about twenty thousand startups were founded that raised at least $3 million each in funding. In the future, it might be possible to use artificial intelligence and natural language processing to automatically collect data on each of these companies. For now, though, it mostly had to be done by hand—for example, to determine competitors or defensibility factors. I manually collected the majority of the data on each of the startups in my study. This required a combination of judgement and pretty extensive research, so collecting data on all twenty thousand would have proven impractical.


Programming Python by Mark Lutz

Benevolent Dictator For Life (BDFL), Build a better mousetrap, business logic, business process, cloud computing, Firefox, general-purpose programming language, Google Chrome, Guido van Rossum, iterative process, linear programming, loose coupling, machine readable, MVC pattern, natural language processing, off grid, slashdot, sorting algorithm, web application

For more on YAPPS, see http://theory.stanford.edu/~amitp/Yapps or search the Web at large.

Natural language processing: Even more demanding language analysis tasks require techniques developed in artificial intelligence research, such as semantic analysis and machine learning. For instance, the Natural Language Toolkit, or NLTK, is an open source suite of Python libraries and programs for symbolic and statistical natural language processing. It applies linguistic techniques to textual data, and it can be used in the development of natural language recognition software and systems. For much more on this subject, be sure to also see the O’Reilly book Natural Language Processing with Python, which explores, among other things, ways to use NLTK in Python.

Strategies for Processing Text in Python

In the grand scheme of things, there are a variety of ways to handle text processing and language analysis in Python:

- Expressions: built-in string object expressions
- Methods: built-in string object method calls
- Patterns: regular expression pattern matching
- Parsers, markup: XML and HTML text parsing
- Parsers, grammars: custom language parsers, both handcoded and generated
- Embedding: running Python code with the eval and exec built-ins
- And more: natural language processing

For simpler tasks, Python’s built-in string object is often all we really need. Python strings can be indexed, concatenated, sliced, and processed with both string method calls and built-in functions. Our main emphasis in this chapter is mostly on higher-level tools and techniques for analyzing textual information and language, but we’ll briefly explore each of these techniques in turn.
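Three of the simpler strategies named here — built-in string methods, regular expression matching, and stdlib markup parsing — can be sketched in a few standard-library lines (the sample text and HTML snippet are illustrative, not from the book):

```python
import re
from html.parser import HTMLParser

text = "Python strings can be indexed, concatenated, and sliced."

# Built-in string object expressions and methods: often all we need.
upper_first = text[:6].upper()   # slicing plus a method call -> 'PYTHON'
word_count = len(text.split())   # -> 8

# Regular expression pattern matching: find the words ending in '-ed'.
past_forms = re.findall(r"\b\w+ed\b", text)

# Markup parsing: pull plain text out of HTML with the stdlib parser.
class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

extractor = TextExtractor()
extractor.feed("<p>Hello <b>parsed</b> world</p>")
plain = "".join(extractor.chunks)

print(upper_first, word_count, past_forms, plain)
```

Each step up the list trades simplicity for power: string methods handle fixed text, regular expressions handle flexible patterns, and full parsers handle nested structure that patterns alone cannot.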

GIL and, A process-based alternative: multiprocessing (ahead) implementation, Implementation and usage rules IPC support, Interprocess Communication, IPC Tools: Pipes, Shared Memory, and Queues, Queues and subclassing launching GUIs as programs, Launching GUIs as programs other ways: multiprocessing, Launching GUIs as programs other ways: multiprocessing processes and locks, The Basics: Processes and Locks, Implementation and usage rules socket server portability and, Why multiprocessing doesn’t help with socket server portability, Why multiprocessing doesn’t help with socket server portability starting independent programs, Starting Independent Programs usage rules, Implementation and usage rules Musciano, Chuck, “Oh, What a Tangled Web We Weave” MVC (model-view-controller) structure, Python Internet Development Options mysql-python interface, Persistence Options in Python N name conventions, File name conventions, Installing CGI scripts CGI scripts, Installing CGI scripts files, File name conventions __name__ variable, Using Programs in Two Ways named pipes, Interprocess Communication, Anonymous Pipes, Named Pipes (Fifos), Named pipe basics, Named pipe basics, Named pipe use cases basic functionality, Named pipe basics, Named pipe basics creating, Named Pipes (Fifos) defined, Interprocess Communication, Anonymous Pipes use cases, Named pipe use cases namespaces, Running Code Strings with Results and Namespaces, Running Code Strings with Results and Namespaces, Running Strings in Dictionaries creating, Running Strings in Dictionaries running code strings with, Running Code Strings with Results and Namespaces, Running Code Strings with Results and Namespaces natural language processing, Advanced Language Tools nested structures, Nested structures, Uploading Local Trees, Uploading Local Trees, Pickled Objects, Pickling in Action dictionaries, Nested structures pickling, Pickled Objects, Pickling in Action uploading local trees, Uploading Local Trees, Uploading Local 
Trees Network News Transfer Protocol (NNTP), NNTP: Accessing Newsgroups, NNTP: Accessing Newsgroups, More Than One Way to Push Bits over the Net network scripting, Python Internet Development Options, Python Internet Development Options, The Socket Layer, Machine identifiers, The Protocol Layer, Protocol structures, Python’s Internet Library Modules, Python’s Internet Library Modules, Socket Programming, Binding reserved port servers, Handling Multiple Clients, Summary: Choosing a Server Scheme, Making Sockets Look Like Files and Streams, Sockets versus command pipes, A Simple Python File Server, Using a reusable form-layout class development options, Python Internet Development Options, Python Internet Development Options handling multiple clients, Handling Multiple Clients, Summary: Choosing a Server Scheme library modules and, Python’s Internet Library Modules, Python’s Internet Library Modules making sockets look like files/streams, Making Sockets Look Like Files and Streams, Sockets versus command pipes protocols and, The Protocol Layer, Protocol structures Python file server, A Simple Python File Server, Using a reusable form-layout class sockets and, The Socket Layer, Machine identifiers, Socket Programming, Binding reserved port servers newsgroups, NNTP: Accessing Newsgroups, NNTP: Accessing Newsgroups, Ideas for Improvement accessing, NNTP: Accessing Newsgroups, NNTP: Accessing Newsgroups handling messages, Ideas for Improvement NLTK suite, Advanced Language Tools NNTP (Network News Transfer Protocol), NNTP: Accessing Newsgroups, NNTP: Accessing Newsgroups, More Than One Way to Push Bits over the Net nntplib module, Python’s Internet Library Modules, NNTP: Accessing Newsgroups, NNTP: Accessing Newsgroups numeric tools, A Quick Geometry Lesson NumPy programming extension, A Quick Geometry Lesson, Extending and Embedding O object references, Deferring Calls with Lambdas and Object References, Deferring Calls with Lambdas and Object References, Reloading 
Callback Handlers Dynamically callback handlers as, Reloading Callback Handlers Dynamically deferring calls, Deferring Calls with Lambdas and Object References, Deferring Calls with Lambdas and Object References object relational mappers, Other Database Options (see ORMs) Object Request Broker (ORB), Python Internet Development Options object types, storing in shelves, Storing Built-in Object Types in Shelves object-oriented databases (OODBs), Persistence Options in Python object-oriented programming, Step 3: Stepping Up to OOP (see OOP) objects, Step 1: Sharing Objects Between Pages—A New Input Form, Step 1: Sharing Objects Between Pages—A New Input Form, Persistence Options in Python, Pickled Objects, Pickle Details: Protocols, Binary Modes, and _pickle, Pickled Objects, Changing Classes of Objects Stored in Shelves, Objects are unique only within a key, What Is Embedded Code?


pages: 708 words: 223,211

The Friendly Orange Glow: The Untold Story of the PLATO System and the Dawn of Cyberculture by Brian Dear

air traffic controllers' union, AltaVista, Alvin Toffler, Apple II, Apple Newton, Buckminster Fuller, Charles Babbage, cloud computing, complexity theory, computer age, Computer Lib, conceptual framework, corporate social responsibility, disruptive innovation, Douglas Engelbart, Douglas Engelbart, Dynabook, Elon Musk, en.wikipedia.org, Fairchild Semiconductor, finite state, Future Shock, game design, Hacker News, Howard Rheingold, Ivan Sutherland, John Markoff, lateral thinking, linear programming, machine readable, Marc Andreessen, Marshall McLuhan, Menlo Park, Metcalfe’s law, Mitch Kapor, Mother of all demos, natural language processing, Neal Stephenson, Palm Treo, Plato's cave, pre–internet, publish or perish, Ralph Nader, Robert Metcalfe, Ronald Reagan, Silicon Valley, Silicon Valley startup, Skinner box, Skype, software is eating the world, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Ted Nelson, the medium is the message, The Soul of a New Machine, three-martini lunch, Watson beat the top human players on Jeopardy!, Whole Earth Catalog

Practitioners from other institutions attempting to show off their own computer creations were often frustrated with PLATO, because the crowds instantly gravitated to the Friendly Orange Glow. John Seely Brown, who in the early 1970s at the University of Michigan was developing an “intelligent tutoring system” called SOPHIE (short for “SOPHisticated Instructional Environment”) that used aspects of artificial intelligence and natural-language processing to enhance the dialogue between the electronic tutor and the human student, vividly remembers PLATO stealing SOPHIE’s thunder. “I had progressed in the first six months of the SOPHIE contract and I had to go out to Lowry Air Force Base to have a contract review. And, there was a PLATO terminal.

“I knew who he was,” says Green. “Of course I knew who he was. The only time I talked to him, he was the only person around, and he seemed to have a free minute, and I asked him a question.” Years later he’s no longer sure what newbie question he posed to Bitzer, but it might have been about natural language processing. Perhaps if CERL considered this or that approach? “And he said,” says Green, “ ‘That’s one of the questions that we’re working on around here.’ And I said, ‘Well, I thought…you know, maybe if this…’ and he said something to the effect of, ‘Hmm, well, maybe you’ll be the one to figure it out!’


pages: 332 words: 91,780

Starstruck: The Business of Celebrity by Currid

barriers to entry, Bernie Madoff, Big Tech, Donald Trump, income inequality, index card, industrial cluster, Mark Zuckerberg, Metcalfe’s law, natural language processing, place-making, Ponzi scheme, post-industrial society, power law, prediction markets, public intellectual, Renaissance Technologies, Richard Florida, Robert Metcalfe, Robert Solow, rolodex, search costs, shareholder value, Silicon Valley, slashdot, Stephen Fry, the long tail, The Theory of the Leisure Class by Thorstein Veblen, transaction costs, Tyler Cowen, upwardly mobile, urban decay, Vilfredo Pareto, Virgin Galactic, winner-take-all economy

We then stored the meta-information in a MS-SQL relational database. In step two we identified the individuals in each photo. Instead of studying the photos themselves, we studied the caption information associated with the photos and cataloged an aggregate collection of this data. In order to identify the photographed objects, we used natural language processing (NLP). SQL-implemented association rules enabled us to clean the data. Our cataloging process collected the following information: names and occupations of individuals in each picture, the event and date when the photo was taken (e.g., Actress Angelina Jolie at the Oscars, February 22, 2007).
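The authors do not publish their actual extraction rules, but a toy version of this caption-based cataloging can be sketched with a single pattern; the caption format, field names, and `parse_caption` helper below are all hypothetical, fitted only to the example caption the passage gives:

```python
import re

# Hypothetical caption shape: "<Occupation> <First Last> at the <Event>, <Date>".
CAPTION_RE = re.compile(
    r"(?P<occupation>[A-Z][a-z]+)\s+"
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+)\s+at the\s+"
    r"(?P<event>[A-Z][\w ]+?),\s+"
    r"(?P<date>[A-Z][a-z]+ \d{1,2}, \d{4})"
)

def parse_caption(caption):
    """Return a dict of (occupation, name, event, date) from one caption, or None."""
    match = CAPTION_RE.search(caption)
    return match.groupdict() if match else None

record = parse_caption("Actress Angelina Jolie at the Oscars, February 22, 2007")
print(record)
```

A production pipeline would replace the fixed pattern with a trained named-entity recognizer, since real captions vary far more than one regular expression can absorb.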


pages: 343 words: 93,544

vN: The First Machine Dynasty (The Machine Dynasty Book 1) by Madeline Ashby

big-box store, company town, iterative process, natural language processing, place-making, retail therapy, synthetic biology, traveling salesman, urban planning

The angel investor supporting the development of von Neumann humanoids was not a military contractor, or a tech firm, or even a design giant. It was a church. A global megachurch named New Eden Ministries, Inc., that believed firmly that the Rapture was coming any minute now. It collected donations, bought real estate, and put the proceeds into programmable matter, natural language processing, and affect detection – all for the benefit of the few pitiful humans regrettably left behind to deal with God's wrath. They would need companions, after all. Helpmeets. And those helpmeets couldn't ever hurt humans. That was the Horsemen's job. It all went to hell, of course. The pastor of New Eden Ministries, Jonah LeMarque, and many of his council members became the defendants in a class action suit brought by youth group members regarding the use of their bodies as models in a pornographic game.


High-Frequency Trading by David Easley, Marcos López de Prado, Maureen O'Hara

algorithmic trading, asset allocation, backtesting, Bear Stearns, Brownian motion, capital asset pricing model, computer vision, continuous double auction, dark matter, discrete time, finite state, fixed income, Flash crash, High speed trading, index arbitrage, information asymmetry, interest rate swap, Large Hadron Collider, latency arbitrage, margin call, market design, market fragmentation, market fundamentalism, market microstructure, martingale, National best bid and offer, natural language processing, offshore financial centre, pattern recognition, power law, price discovery process, price discrimination, price stability, proprietary trading, quantitative trading / quantitative finance, random walk, Sharpe ratio, statistical arbitrage, statistical model, stochastic process, Tobin tax, transaction costs, two-sided market, yield curve

What interpretation can be given for a single order placement in a massive stream of microstructure data, or to a snapshot of an intraday order book, especially considering the fact that any outstanding order can be cancelled by the submitting party any time prior to execution? To offer an analogy, consider the now common application of machine learning to problems in natural language processing (NLP) and computer vision. Both of them remain very challenging domains. But, in NLP, it is at least clear that the basic unit of meaning in the data is the word, which is how digital documents are represented and processed. In contrast, digital images are represented at the pixel level, but this is certainly not the meaningful unit of information in vision applications – objects are – but algorithmically extracting objects from images remains a difficult problem.


pages: 319 words: 90,965

The End of College: Creating the Future of Learning and the University of Everywhere by Kevin Carey

Albert Einstein, barriers to entry, Bayesian statistics, behavioural economics, Berlin Wall, Blue Ocean Strategy, business cycle, business intelligence, carbon-based life, classic study, Claude Shannon: information theory, complexity theory, data science, David Heinemeier Hansson, declining real wages, deliberate practice, discrete time, disruptive innovation, double helix, Douglas Engelbart, Douglas Engelbart, Downton Abbey, Drosophila, Fairchild Semiconductor, Firefox, Frank Gehry, Google X / Alphabet X, Gregor Mendel, informal economy, invention of the printing press, inventory management, John Markoff, Khan Academy, Kickstarter, low skilled workers, Lyft, Marc Andreessen, Mark Zuckerberg, meta-analysis, natural language processing, Network effects, open borders, pattern recognition, Peter Thiel, pez dispenser, Recombinant DNA, ride hailing / ride sharing, Ronald Reagan, Ruby on Rails, Sand Hill Road, self-driving car, Silicon Valley, Silicon Valley startup, social web, South of Market, San Francisco, speech recognition, Steve Jobs, technoutopianism, transcontinental railway, uber lyft, Vannevar Bush

He and two coauthors recently name-checked a well-known article called “The Unreasonable Effectiveness of Mathematics in the Natural Sciences,” which “examines why so much of physics can be neatly explained with simple mathematical formulas such as F = ma or E = mc2. Meanwhile, sciences that involve human beings rather than elementary particles have proven more resistant to elegant mathematics.” “Perhaps when it comes to natural language processing and related fields,” they wrote, “we’re doomed to complex theories that will never have the elegance of physics equations. But if that’s so, we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data.”


pages: 382 words: 92,138

The Entrepreneurial State: Debunking Public vs. Private Sector Myths by Mariana Mazzucato

Apple II, banking crisis, barriers to entry, Bretton Woods, business cycle, California gold rush, call centre, carbon footprint, carbon tax, Carmen Reinhart, circular economy, clean tech, computer age, creative destruction, credit crunch, David Ricardo: comparative advantage, demand response, deskilling, dual-use technology, endogenous growth, energy security, energy transition, eurozone crisis, everywhere but in the productivity statistics, Fairchild Semiconductor, Financial Instability Hypothesis, full employment, G4S, general purpose technology, green transition, Growth in a Time of Debt, Hyman Minsky, incomplete markets, information retrieval, intangible asset, invisible hand, Joseph Schumpeter, Kenneth Rogoff, Kickstarter, knowledge economy, knowledge worker, linear model of innovation, natural language processing, new economy, offshore financial centre, Philip Mirowski, popular electronics, Post-Keynesian economics, profit maximization, Ralph Nader, renewable energy credits, rent-seeking, ride hailing / ride sharing, risk tolerance, Robert Solow, shareholder value, Silicon Valley, Silicon Valley ideology, smart grid, Solyndra, Steve Jobs, Steve Wozniak, The Wealth of Nations by Adam Smith, Tim Cook: Apple, Tony Fadell, too big to fail, total factor productivity, trickle-down economics, vertical integration, Washington Consensus, William Shockley: the traitorous eight

This technology, as well as the infrastructure of the system, would have been impossible without the government taking the initiative and making the necessary financial commitment for such a highly complex system. Apple’s latest iPhone feature is a virtual personal assistant known as SIRI. And, like most of the other key technological features in Apple’s iOS products, SIRI has its roots in federal funding and research. SIRI is an artificial intelligence program consisting of machine learning, natural language processing and a Web search algorithm (Roush 2010). In 2000, DARPA asked the Stanford Research Institute (SRI) to take the lead on a project to develop a sort of ‘virtual office assistant’ to assist military personnel. SRI was put in charge of coordinating the ‘Cognitive Assistant that Learns and Organizes’ (CALO) project which included 20 universities all over the US collaborating to develop the necessary technology base.


Learn Algorithmic Trading by Sebastien Donadio

active measures, algorithmic trading, automated trading system, backtesting, Bayesian statistics, behavioural economics, buy and hold, buy low sell high, cryptocurrency, data science, deep learning, DevOps, en.wikipedia.org, fixed income, Flash crash, Guido van Rossum, latency arbitrage, locking in a profit, market fundamentalism, market microstructure, martingale, natural language processing, OpenAI, p-value, paper trading, performance metric, prediction markets, proprietary trading, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, Sharpe ratio, short selling, sorting algorithm, statistical arbitrage, statistical model, stochastic process, survivorship bias, transaction costs, type inference, WebSocket, zero-sum game

He works as a Senior Quantitative Developer at a trading firm in Chicago. He holds a Masters in Computer Science from the University of Southern California. His areas of interest include Computer Architecture, FinTech, Probability Theory and Stochastic Processes, Statistical Learning and Inference Methods, and Natural Language Processing. About the reviewers Nataraj Dasgupta is the VP of Advanced Analytics at RxDataScience Inc. He has been in the IT industry for more than 19 years and has worked in the technical & analytics divisions of Philip Morris, IBM, UBS Investment Bank, and Purdue Pharma. He led the Data Science team at Purdue, where he developed the company's award-winning Big Data and Machine Learning platform.


pages: 290 words: 90,057

Billion Dollar Brand Club: How Dollar Shave Club, Warby Parker, and Other Disruptors Are Remaking What We Buy by Lawrence Ingrassia

air freight, Airbnb, airport security, Amazon Robotics, augmented reality, barriers to entry, call centre, commoditize, computer vision, data science, fake news, fulfillment center, global supply chain, Hacker News, industrial robot, Jeff Bezos, Kickstarter, Kiva Systems, Lyft, Mark Zuckerberg, minimum viable product, natural language processing, Netflix Prize, rolodex, San Francisco homelessness, side project, Silicon Valley, Silicon Valley startup, Snapchat, Steve Jobs, supply-chain management, Uber and Lyft, uber lyft, warehouse automation, warehouse robotics, WeWork

Accurate reviews are a constant challenge for Amazon, as some sellers, especially Chinese companies, try to skew results by generating fake five-star reviews. But AIMEE focuses more on bad reviews. If shoppers complain about the quality, or don’t like the features, or even express an interest in different colors or sizes, that presents a potential opportunity for a new brand. “We use natural language processing to parse through thousands of reviews to identify any pain points customers have,” Sarig explains. If not for AIMEE, it wouldn’t have occurred to Mohawk to consider a line of small home appliances. The category surfaced in a “top product model” feature Mohawk software engineers added to AIMEE to spotlight bigger-ticket items worth looking at.
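AIMEE itself is proprietary, but the idea of parsing reviews to surface pain points can be illustrated with a toy keyword pass; the complaint categories, phrase lists, and sample reviews below are invented, and a real system would rely on statistical NLP rather than fixed patterns:

```python
import re
from collections import Counter

# Invented complaint categories and trigger phrases, for illustration only.
PAIN_PATTERNS = {
    "quality": r"\b(broke|flimsy|cheap|fell apart)\b",
    "size":    r"\b(too (?:small|big|large)|wrong size)\b",
    "color":   r"\b(wish it came in|another colou?r)\b",
}

def pain_points(reviews):
    """Count how many reviews mention each pain-point category, most common first."""
    counts = Counter()
    for review in reviews:
        text = review.lower()
        for label, pattern in PAIN_PATTERNS.items():
            if re.search(pattern, text):
                counts[label] += 1
    return counts.most_common()

reviews = [
    "The handle broke after a week, very cheap plastic.",
    "Nice blender but too small for a family of four.",
    "I wish it came in black.",
    "Flimsy lid, and too small as well.",
]
ranked = pain_points(reviews)
print(ranked)
```

Ranking categories by how many reviews mention them is the key design choice: a complaint voiced across many reviews signals a market gap, while a one-off gripe does not.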


How to Stand Up to a Dictator by Maria Ressa

2021 United States Capitol attack, activist lawyer, affirmative action, Affordable Care Act / Obamacare, airport security, anti-communist, Asian financial crisis, Big Tech, Brexit referendum, business process, business process outsourcing, call centre, Cambridge Analytica, citizen journalism, cognitive bias, colonial rule, commoditize, contact tracing, coronavirus, COVID-19, crowdsourcing, delayed gratification, disinformation, Donald Trump, fake news, future of journalism, iterative process, James Bridle, Kevin Roose, lockdown, lone genius, Mahatma Gandhi, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Milgram experiment, move fast and break things, natural language processing, Nelson Mandela, Network effects, obamacare, performance metric, QAnon, recommendation engine, Sheryl Sandberg, Shoshana Zuboff, Silicon Valley, Steven Levy, surveillance capitalism, the medium is the message, The Wisdom of Crowds, TikTok, Twitter Arab Spring, work culture

The Duterte administration immediately complained.21 Our first step in the fact-checking process was to find the lies. As mentioned previously, the best lies are half-truths that work to support a metanarrative, like “Duterte is the best leader” or “Journalists are criminals.” Step two was to use natural language processing, using computers to process large amounts of text to pull out the consistent messages of networks of disinformation. Doing that led us to step three, which was identifying the websites and other digital assets associated with those networks, including those profiting off the enterprise.22 Duterte had consolidated power and polarized the society by often using asymmetrical warfare, with small groups like us trying to stand up for the facts against the disinformation that was more likely to travel over pro-Duterte and pro-Marcos disinformation networks.
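The passage does not detail Rappler's actual tooling, but step two — using computers to pull the consistent messages out of large volumes of text — can be sketched with a toy script that flags near-identical wording posted by many distinct accounts; the sample posts, the threshold, and the `consistent_messages` helper are all invented for illustration:

```python
import re
from collections import defaultdict

def normalize(text):
    # Collapse case and punctuation so trivially varied copies compare equal.
    return re.sub(r"\W+", " ", text.lower()).strip()

def consistent_messages(posts, min_accounts=3):
    """posts: iterable of (account, text) pairs.
    Return {normalized message: account count} for widely repeated messages."""
    accounts_by_msg = defaultdict(set)
    for account, text in posts:
        accounts_by_msg[normalize(text)].add(account)
    return {msg: len(accts) for msg, accts in accounts_by_msg.items()
            if len(accts) >= min_accounts}

posts = [
    ("a1", "Journalists are criminals!"),
    ("a2", "journalists are CRIMINALS"),
    ("a3", "Journalists are criminals."),
    ("a4", "Lovely weather today"),
]
flagged = consistent_messages(posts)
print(flagged)
```

Counting distinct accounts rather than raw posts is what distinguishes a coordinated network pushing one message from a single prolific user.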


pages: 314 words: 101,034

Every Patient Tells a Story by Lisa Sanders

classic study, data acquisition, discovery of penicillin, high batting average, index card, medical residency, meta-analysis, natural language processing, pattern recognition, Pepto Bismol, randomized controlled trial, Ronald Reagan

Doctors using the diagnostic tool that Britto and Maude named Isabel can enter information using either key findings (like GIDEON) or whole-text entries, such as clinical descriptions that are cut-and-pasted from another program. Isabel also uses a novel search strategy to identify candidate diagnoses from the clinical findings. The program includes a thesaurus that facilitates recognition of a wide range of terms describing each finding. The program then uses natural language processing and search algorithms to compare these terms to those used in a selected reference library. For internal medicine cases, the library includes six key textbooks and forty-six major journals in general and subspecialty medicine and toxicology. The search domain and results are filtered to take into account the patient’s age, sex, geographic location, pregnancy status, and other clinical parameters that are either selected by the clinician or automatically entered if the system is integrated with the clinician’s electronic medical record.


pages: 352 words: 96,532

Where Wizards Stay Up Late: The Origins of the Internet by Katie Hafner, Matthew Lyon

air freight, Bill Duvall, Charles Babbage, Compatible Time-Sharing System, computer age, conceptual framework, Donald Davies, Douglas Engelbart, Douglas Engelbart, fault tolerance, Hush-A-Phone, information retrieval, Ivan Sutherland, John Markoff, Kevin Kelly, Leonard Kleinrock, Marc Andreessen, Menlo Park, military-industrial complex, Multics, natural language processing, OSI model, packet switching, RAND corporation, RFC: Request For Comment, Robert Metcalfe, Ronald Reagan, seminal paper, Silicon Valley, Skinner box, speech recognition, Steve Crocker, Steven Levy, The Soul of a New Machine

The most conspicuous exception to this was Crowther, who had remained a programmer. For years Heart had been Crowther’s champion, lobbying for the company to let Crowther just be Crowther and think up ingenious ideas in his own dreamy way. In the years following the IMP project, Crowther pursued some unusual ideas about natural language processing, and worked extensively on high-speed packet-switching technology. Severo Ornstein had left BBN in the 1970s for Xerox PARC, and while there he started Computer Professionals for Social Responsibility. When he retired from Xerox, he and his wife moved into one of the remotest corners of the San Francisco Bay Area.


pages: 326 words: 103,170

The Seventh Sense: Power, Fortune, and Survival in the Age of Networks by Joshua Cooper Ramo

air gap, Airbnb, Alan Greenspan, Albert Einstein, algorithmic trading, barriers to entry, Berlin Wall, bitcoin, Bletchley Park, British Empire, cloud computing, Computing Machinery and Intelligence, crowdsourcing, Danny Hillis, data science, deep learning, defense in depth, Deng Xiaoping, drone strike, Edward Snowden, Fairchild Semiconductor, Fall of the Berlin Wall, financial engineering, Firefox, Google Chrome, growth hacking, Herman Kahn, income inequality, information security, Isaac Newton, Jeff Bezos, job automation, Joi Ito, Laura Poitras, machine translation, market bubble, Menlo Park, Metcalfe’s law, Mitch Kapor, Morris worm, natural language processing, Neal Stephenson, Network effects, Nick Bostrom, Norbert Wiener, Oculus Rift, off-the-grid, packet switching, paperclip maximiser, Paul Graham, power law, price stability, quantitative easing, RAND corporation, reality distortion field, Recombinant DNA, recommendation engine, Republic of Letters, Richard Feynman, road to serfdom, Robert Metcalfe, Sand Hill Road, secular stagnation, self-driving car, Silicon Valley, Skype, Snapchat, Snow Crash, social web, sovereign wealth fund, Steve Jobs, Steve Wozniak, Stewart Brand, Stuxnet, superintelligent machines, systems thinking, technological singularity, The Coming Technological Singularity, The Wealth of Nations by Adam Smith, too big to fail, Vernor Vinge, zero day

In 1965, an MIT computer scientist named Joseph Weizenbaum found himself, somewhat unexpectedly, considering a problem with his computer and its users that he had not quite anticipated. Weizenbaum was in the midst of an experiment that started innocently enough. He’d written a program to perform what is now known as natural language processing, essentially a bit of code designed to translate what a human tells a machine into something the machine can actually work with. When someone asks a computer, What is the weather? the machine uses a special processing approach to turn that into an instruction set. Answering those sorts of queries demands a great deal of digital work before the computer can figure out what you mean and how to fill you in.


pages: 327 words: 103,336

Everything Is Obvious: *Once You Know the Answer by Duncan J. Watts

"World Economic Forum" Davos, active measures, affirmative action, Albert Einstein, Amazon Mechanical Turk, AOL-Time Warner, Bear Stearns, behavioural economics, Black Swan, business cycle, butterfly effect, carbon credits, Carmen Reinhart, Cass Sunstein, clockwork universe, cognitive dissonance, coherent worldview, collapse of Lehman Brothers, complexity theory, correlation does not imply causation, crowdsourcing, death of newspapers, discovery of DNA, East Village, easy for humans, difficult for computers, edge city, en.wikipedia.org, Erik Brynjolfsson, framing effect, Future Shock, Geoffrey West, Santa Fe Institute, George Santayana, happiness index / gross national happiness, Herman Kahn, high batting average, hindsight bias, illegal immigration, industrial cluster, interest rate swap, invention of the printing press, invention of the telescope, invisible hand, Isaac Newton, Jane Jacobs, Jeff Bezos, Joseph Schumpeter, Kenneth Rogoff, lake wobegon effect, Laplace demon, Long Term Capital Management, loss aversion, medical malpractice, meta-analysis, Milgram experiment, natural language processing, Netflix Prize, Network effects, oil shock, packet switching, pattern recognition, performance metric, phenotype, Pierre-Simon Laplace, planetary scale, prediction markets, pre–internet, RAND corporation, random walk, RFID, school choice, Silicon Valley, social contagion, social intelligence, statistical model, Steve Ballmer, Steve Jobs, Steve Wozniak, supply-chain management, tacit knowledge, The Death and Life of Great American Cities, the scientific method, The Wisdom of Crowds, too big to fail, Toyota Production System, Tragedy of the Commons, ultimatum game, urban planning, Vincenzo Peruggia: Mona Lisa, Watson beat the top human players on Jeopardy!, X Prize

IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences E87A (9):2379–86. Snow, Rion, Brendan O’Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. “Cheap and Fast—But Is It Good? Evaluating Non-Expert Annotations for Natural Language Tasks.” In Empirical Methods in Natural Language Processing. Honolulu, Hawaii: Association for Computational Linguistics. Somers, Margaret R. 1998. “ ‘We’re No Angels’: Realism, Rational Choice, and Relationality in Social Science.” American Journal of Sociology 104 (3):722–84. Sorkin, Andrew Ross (ed). 2008. “Steve & Barry’s Files for Bankruptcy.”


pages: 341 words: 95,752

Word by Word: The Secret Life of Dictionaries by Kory Stamper

Affordable Care Act / Obamacare, company town, index card, microaggression, natural language processing, obamacare, Ronald Reagan, Steven Pinker, why are manhole covers round?

Or I might decide that it’s an important enough word that even though it’s still being glossed regularly, it deserves entry right away: words like “AIDS” and “SARS” will probably get entered into a dictionary fairly quickly after they first show up on the scene, because you can reason that the syndromes they name are significant enough health events that they are not going anywhere very soon. Those sorts of decisions are made on a human level; people with experience in the trenches of language change can make those decisions far better than natural-language processing programs currently can. Computers are, however, far quicker. Thinking about documenting language brings on a gurgle of dread deep in the editorial gut. The philosophy of citation gathering actually runs counter to how language forms. Because we live in a literate society with comparatively easy access to books and education, we tend to believe that the written word is more important and has more weight than the spoken word.


pages: 337 words: 103,522

The Creativity Code: How AI Is Learning to Write, Paint and Think by Marcus Du Sautoy

3D printing, Ada Lovelace, Albert Einstein, algorithmic bias, AlphaGo, Alvin Roth, Andrew Wiles, Automated Insights, Benoit Mandelbrot, Bletchley Park, Cambridge Analytica, Charles Babbage, Claude Shannon: information theory, computer vision, Computing Machinery and Intelligence, correlation does not imply causation, crowdsourcing, data is the new oil, data science, deep learning, DeepMind, Demis Hassabis, Donald Trump, double helix, Douglas Hofstadter, driverless car, Elon Musk, Erik Brynjolfsson, Fellow of the Royal Society, Flash crash, Gödel, Escher, Bach, Henri Poincaré, Jacquard loom, John Conway, Kickstarter, Loebner Prize, machine translation, mandelbrot fractal, Minecraft, move 37, music of the spheres, Mustafa Suleyman, Narrative Science, natural language processing, Netflix Prize, PageRank, pattern recognition, Paul Erdős, Peter Thiel, random walk, Ray Kurzweil, recommendation engine, Rubik’s Cube, Second Machine Age, Silicon Valley, speech recognition, stable marriage problem, Turing test, Watson beat the top human players on Jeopardy!, wikimedia commons

Most question-answering systems are programmed to deal with a defined set of question types – meaning you can ask only certain kinds of questions, phrased in certain ways, in order to obtain a response. Watson handles open-domain questions, meaning anything you can think of to ask it. It uses natural-language processing techniques to pick apart the words you give it, in order to understand the real question being asked, even when you ask it in an unusual way. IBM actually published a very useful FAQ about Watson and IBM’s DeepQA Project, a foundational technology utilised by Watson in generating hypotheses.


pages: 193 words: 98,671

The Inmates Are Running the Asylum by Alan Cooper

Albert Einstein, Apple Newton, Bill Atkinson, business cycle, delayed gratification, Donald Trump, Gary Kildall, General Magic , Howard Rheingold, informal economy, iterative process, Jeff Bezos, lateral thinking, Menlo Park, natural language processing, new economy, PalmPilot, pets.com, Robert X Cringely, Silicon Valley, Silicon Valley startup, skunkworks, Steve Jobs, Steven Pinker, telemarketer, urban planning

Microsoft says that interfaces will be easy to use as soon as it can perfect voice recognition and handwriting recognition. I think this is silly. Each new technology merely makes it possible to frustrate users with faster and more-powerful systems. A key to better interaction is to reduce the uncertainty between computers and users. Natural-language processing can never do that because meanings are so vague in human conversation. So much of our communication is based on nuance, gesture, and inflection that although it might be a year or two before computers can recognize our words, it might be decades—if ever—before computers can effectively interpret our meaning.


pages: 411 words: 98,128

Bezonomics: How Amazon Is Changing Our Lives and What the World's Best Companies Are Learning From It by Brian Dumaine

activist fund / activist shareholder / activist investor, AI winter, Airbnb, Amazon Robotics, Amazon Web Services, Atul Gawande, autonomous vehicles, basic income, Bernie Sanders, Big Tech, Black Swan, call centre, Cambridge Analytica, carbon tax, Carl Icahn, Chris Urmson, cloud computing, corporate raider, creative destruction, Danny Hillis, data science, deep learning, Donald Trump, Elon Musk, Erik Brynjolfsson, Fairchild Semiconductor, fake news, fulfillment center, future of work, gig economy, Glass-Steagall Act, Google Glasses, Google X / Alphabet X, income inequality, independent contractor, industrial robot, Internet of things, Jeff Bezos, job automation, Joseph Schumpeter, Kevin Kelly, Kevin Roose, Lyft, Marc Andreessen, Mark Zuckerberg, military-industrial complex, money market fund, natural language processing, no-fly zone, Ocado, pets.com, plutocrats, race to the bottom, ride hailing / ride sharing, Salesforce, Sand Hill Road, self-driving car, shareholder value, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, Snapchat, speech recognition, Steve Jobs, Stewart Brand, supply-chain management, TED Talk, Tim Cook: Apple, too big to fail, Travis Kalanick, two-pizza team, Uber and Lyft, uber lyft, universal basic income, warehouse automation, warehouse robotics, wealth creators, web application, Whole Earth Catalog, work culture

Amazon does say it’s not overly fixated on the Echo as a shopping aid, especially given how the device ties in with the other services it offers through its Prime subscription, such as music and videos. Still, it holds out hope that the Amazon-optimized computers it has placed in customers’ homes will boost its retail business. Says Amazon’s Prasad, the natural-language-processing scientist, “If you want to buy double-A batteries, you don’t need to see them, and you don’t need to remember which ones. If you’ve never bought batteries before, we will suggest ones for you.” That suggestion, of course, often includes Amazon’s house brands. “Amazon is carpet-bombing America with these devices,” says Peter Hildick-Smith, president of the Codex-Group.


pages: 372 words: 100,947

An Ugly Truth: Inside Facebook's Battle for Domination by Sheera Frenkel, Cecilia Kang

"World Economic Forum" Davos, 2021 United States Capitol attack, affirmative action, augmented reality, autonomous vehicles, Ben Horowitz, Bernie Sanders, Big Tech, Black Lives Matter, blockchain, Cambridge Analytica, clean water, coronavirus, COVID-19, data science, disinformation, don't be evil, Donald Trump, Edward Snowden, end-to-end encryption, fake news, George Floyd, global pandemic, green new deal, hockey-stick growth, Ian Bogost, illegal immigration, immigration reform, independent contractor, information security, Jeff Bezos, Kevin Roose, Marc Andreessen, Marc Benioff, Mark Zuckerberg, Menlo Park, natural language processing, offshore financial centre, Parler "social media", Peter Thiel, QAnon, RAND corporation, ride hailing / ride sharing, Robert Mercer, Russian election interference, Salesforce, Sam Altman, Saturday Night Live, Sheryl Sandberg, Shoshana Zuboff, Silicon Valley, Snapchat, social web, Steve Bannon, Steve Jobs, Steven Levy, subscription business, surveillance capitalism, TechCrunch disrupt, TikTok, Travis Kalanick, WikiLeaks

He needed someone who could help him develop an algorithm that could rank what users wanted to see. He turned to Ruchi Sanghvi, one of his earliest employees and engineers, to anchor the technical work. Overseeing the project was a group of managers Zuckerberg had recently hired, most notably Chris Cox. Cox had been plucked from a graduate program at Stanford studying natural language processing, a field of linguistics that looked at how artificial intelligence could help computers process and analyze the way people spoke. With his buzz cut and perpetual tan, he looked like a surfer, but he sounded like a technologist. Cox was known as a top student at Stanford; his departure for a small start-up that was competing against the much larger and better-funded Myspace and Friendster confounded his professors and classmates.


The Myth of Artificial Intelligence: Why Computers Can't Think the Way We Do by Erik J. Larson

AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Alignment Problem, AlphaGo, Amazon Mechanical Turk, artificial general intelligence, autonomous vehicles, Big Tech, Black Swan, Bletchley Park, Boeing 737 MAX, business intelligence, Charles Babbage, Claude Shannon: information theory, Computing Machinery and Intelligence, conceptual framework, correlation does not imply causation, data science, deep learning, DeepMind, driverless car, Elon Musk, Ernest Rutherford, Filter Bubble, Geoffrey Hinton, Georg Cantor, Higgs boson, hive mind, ImageNet competition, information retrieval, invention of the printing press, invention of the wheel, Isaac Newton, Jaron Lanier, Jeff Hawkins, John von Neumann, Kevin Kelly, Large Hadron Collider, Law of Accelerating Returns, Lewis Mumford, Loebner Prize, machine readable, machine translation, Nate Silver, natural language processing, Nick Bostrom, Norbert Wiener, PageRank, PalmPilot, paperclip maximiser, pattern recognition, Peter Thiel, public intellectual, Ray Kurzweil, retrograde motion, self-driving car, semantic web, Silicon Valley, social intelligence, speech recognition, statistical model, Stephen Hawking, superintelligent machines, tacit knowledge, technological singularity, TED Talk, The Coming Technological Singularity, the long tail, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, theory of mind, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, Yochai Benkler

Title: The myth of artificial intelligence : why computers can't think the way we do / Erik J. Larson. Description: Cambridge, Massachusetts : The Belknap Press of Harvard University Press, 2021. | Includes bibliographical references and index. Identifiers: LCCN 2020050249 | ISBN 9780674983519 (cloth) Subjects: LCSH: Artificial intelligence. | Intellect. | Inference. | Logic. | Natural language processing (Computer science) | Neurosciences. Classification: LCC Q335.L37 2021 | DDC 006.3—dc23 LC record available at https://lccn.loc.gov/2020050249 To Brooke and Ben CONTENTS Introduction 1 Part I: THE SIMPLIFIED WORLD 7 1 The Intelligence Error 9 2 Turing at Bletchley 19 3 The Superintelligence Error 33 4 The Singularity, Then and Now 44 5 Natural Language Understanding 50 6 AI as Technological Kitsch 60 7 Simplifications and Mysteries 68 Part II: THE PROBLEM OF INFERENCE 87 8 Don't Calculate, Analyze 89 9 The Puzzle of Peirce (and Peirce's Puzzle) 95 10 Problems with Deduction and Induction 106 11 Machine Learning and Big Data 133 12 Abductive Inference 157 13 Inference and Language I 191 14 Inference and Language II 204 Part III: THE FUTURE OF THE MYTH 235 15 Myths and Heroes 237 16 AI Mythology Invades Neuroscience 245 17 Neocortical Theories of Human Intelligence 263 18 The End of Science?


pages: 328 words: 96,678

MegaThreats: Ten Dangerous Trends That Imperil Our Future, and How to Survive Them by Nouriel Roubini

"World Economic Forum" Davos, 2021 United States Capitol attack, 3D printing, 9 dash line, AI winter, AlphaGo, artificial general intelligence, asset allocation, assortative mating, autonomous vehicles, bank run, banking crisis, basic income, Bear Stearns, Big Tech, bitcoin, Bletchley Park, blockchain, Boston Dynamics, Bretton Woods, British Empire, business cycle, business process, call centre, carbon tax, Carmen Reinhart, cashless society, central bank independence, collateralized debt obligation, Computing Machinery and Intelligence, coronavirus, COVID-19, creative destruction, credit crunch, crony capitalism, cryptocurrency, currency manipulation / currency intervention, currency peg, data is the new oil, David Ricardo: comparative advantage, debt deflation, decarbonisation, deep learning, DeepMind, deglobalization, Demis Hassabis, democratizing finance, Deng Xiaoping, disintermediation, Dogecoin, Donald Trump, Elon Musk, en.wikipedia.org, energy security, energy transition, Erik Brynjolfsson, Ethereum, ethereum blockchain, eurozone crisis, failed state, fake news, family office, fiat currency, financial deregulation, financial innovation, financial repression, fixed income, floating exchange rates, forward guidance, Fractional reserve banking, Francis Fukuyama: the end of history, full employment, future of work, game design, geopolitical risk, George Santayana, Gini coefficient, global pandemic, global reserve currency, global supply chain, GPS: selective availability, green transition, Greensill Capital, Greenspan put, Herbert Marcuse, high-speed rail, Hyman Minsky, income inequality, inflation targeting, initial coin offering, Intergovernmental Panel on Climate Change (IPCC), Internet of things, invention of movable type, Isaac Newton, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, junk bonds, Kenneth Rogoff, knowledge worker, Long Term Capital Management, low interest 
rates, low skilled workers, low-wage service sector, M-Pesa, margin call, market bubble, Martin Wolf, mass immigration, means of production, meme stock, Michael Milken, middle-income trap, Mikhail Gorbachev, Minsky moment, Modern Monetary Theory, money market fund, money: store of value / unit of account / medium of exchange, moral hazard, mortgage debt, Mustafa Suleyman, Nash equilibrium, natural language processing, negative equity, Nick Bostrom, non-fungible token, non-tariff barriers, ocean acidification, oil shale / tar sands, oil shock, paradox of thrift, pets.com, Phillips curve, planetary scale, Ponzi scheme, precariat, price mechanism, price stability, public intellectual, purchasing power parity, quantitative easing, race to the bottom, Ralph Waldo Emerson, ransomware, Ray Kurzweil, regulatory arbitrage, reserve currency, reshoring, Robert Shiller, Ronald Reagan, Salesforce, Satoshi Nakamoto, Savings and loan crisis, Second Machine Age, short selling, Silicon Valley, smart contracts, South China Sea, sovereign wealth fund, Stephen Hawking, TED Talk, The Great Moderation, the payments system, Thomas L Friedman, TikTok, too big to fail, Turing test, universal basic income, War on Poverty, warehouse robotics, Washington Consensus, Watson beat the top human players on Jeopardy!, working-age population, Yogi Berra, Yom Kippur War, zero-sum game, zoonotic diseases

Complacency this time—the assumption that once again, the Luddites will be wrong—looks like a fatal mistake. AI encroaches on more jobs than in prior revolutions. It affects jobs across many industries, and it affects knowledge workers just as much as blue-collar workers. Machine learning has cleared one of the long-term hurdles holding back AI: natural-language processing. By scanning vast corpora of text and doing their own pattern analyses, machines have learned how to translate between languages with remarkable success, and how to generate new texts with remarkable authenticity. This subtle grasp of language crosses one of the last obstacles en route to passing the Turing Test.


pages: 1,076 words: 67,364

Haskell Programming: From First Principles by Christopher Allen, Julie Moronuki

book value, c2.com, en.wikipedia.org, fail fast, fizzbuzz, functional programming, heat death of the universe, higher-order functions, natural language processing, spaced repetition, tiling window manager, Turing complete, Turing machine, type inference, web application, Y Combinator

As anyone who has encountered Chris–probably in any medium, but certainly on Twitter–knows, it doesn’t take long before he starts urging you to learn Haskell. I told him I had no interest in programming. I told him nothing and nobody had ever been able to interest me in programming before. When Chris learned of my background in linguistics, he thought I might be interested in natural language processing and exhorted me to learn Haskell for that purpose. I remained unconvinced. Then he tried a different approach. He was spending a lot of time gathering and evaluating resources for teaching Haskell and refining his pedagogical techniques, and he convinced me to try to learn Haskell so that he could gain the experience of teaching a code-neophyte.

PARSER COMBINATORS

• use a parsing library to cover the basics of parsing;
• demonstrate the awesome power of parser combinators;
• marshal and unmarshal some JSON data;
• talk about tokenization.

24.2 A few more words of introduction

In this chapter, we will not look too deeply into the types of the parsing libraries we’re using, learn every sort of parser there is, or artisanally handcraft all of our parsing functions ourselves. These are thoroughly considered decisions. Parsing is a huge field of research in its own right, with connections that span natural language processing, linguistics, and programming language theory. This topic alone could easily fill a book (in fact, it has). The underlying types and typeclasses of the libraries we’ll be using are complicated. To be sure, if you enjoy parsing and expect to do it a lot, those are things you’d want to learn; they are simply out of the scope of this book.
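The combinator idea previewed here can be sketched outside Haskell as well. Below is a minimal, hand-rolled illustration in Python; the book itself works with Haskell parsing libraries, and the names `char`, `many`, and `seq` are toy names chosen for this sketch, not any library's API. A parser is just a function from an input string to either `(value, remaining_input)` or `None` on failure, and combinators build bigger parsers out of smaller ones.

```python
# Toy parser combinators: a parser maps a string to (value, rest) or None.
# Illustrative sketch only; the book's Haskell libraries differ in detail.

def char(c):
    """Parser matching one expected character."""
    def parse(s):
        if s and s[0] == c:
            return (c, s[1:])
        return None
    return parse

def many(p):
    """Apply parser p zero or more times, collecting the results."""
    def parse(s):
        results = []
        while True:
            out = p(s)
            if out is None:
                return (results, s)
            value, s = out
            results.append(value)
    return parse

def seq(*parsers):
    """Run parsers left to right; fail if any one of them fails."""
    def parse(s):
        results = []
        for p in parsers:
            out = p(s)
            if out is None:
                return None
            value, s = out
            results.append(value)
        return (results, s)
    return parse

# "a", then "b", then any number of "c"s:
ab_then_cs = seq(char("a"), char("b"), many(char("c")))
```

The point of the style is exactly what the excerpt claims: small parsers compose into larger ones without any hand-written parsing loop at the top level.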


pages: 396 words: 107,814

Is That a Fish in Your Ear?: Translation and the Meaning of Everything by David Bellos

Bletchley Park, Clapham omnibus, Claude Shannon: information theory, Douglas Hofstadter, Dr. Strangelove, Etonian, European colonialism, Great Leap Forward, haute cuisine, high-speed rail, invention of the telephone, invention of writing, language acquisition, machine readable, machine translation, natural language processing, Republic of Letters, Sapir-Whorf hypothesis, speech recognition

But common sense appeals to our total experience of the nonlinguistic world as well as to our ability to find a way through the language maze: it is precisely the kind of fuzzy, vague, and informal knowledge that distinctive feature analysis seeks to overcome and replace. Despite the usefulness of binary decomposition for some kinds of linguistic description and (in far more complex form) in the “natural language processing” that computers can now perform, word meanings can never be fully specified by atomic distinctions alone. People are just too adept at using words to mean something else. Such quasi-mathematical computation of “meaning” is equally unable to solve an even more basic problem, which is how to identify the very units whose meaning is to be specified.


pages: 451 words: 103,606

Machine Learning for Hackers by Drew Conway, John Myles White

call centre, centre right, correlation does not imply causation, data science, Debian, Erdős number, Nate Silver, natural language processing, Netflix Prize, off-by-one error, p-value, pattern recognition, Paul Erdős, recommendation engine, social graph, SpamAssassin, statistical model, text mining, the scientific method, traveling salesman

This would cause catastrophic results for our classifier because many, or even all, messages would be incorrectly assigned a zero probability of being either spam or ham. Researchers have come up with many clever ways of trying to get around this problem, such as drawing a random probability from some distribution or using natural language processing (NLP) techniques to estimate the “spamminess” of a term given its context. For our purposes, we will use a very simple rule: assign a very small probability to terms that are not in the training set. This is, in fact, a common way of dealing with missing terms in simple text classifiers, and for our purposes it will serve just fine.
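The rule just described (a fixed small probability for any term missing from the training set) can be sketched in a few lines. This is illustrative Python rather than the book's R code, and `SMALL_PROB` and the toy per-class probabilities are assumed values, not the book's:

```python
import math

# Assumed constant: the "very small probability" assigned to unseen terms.
SMALL_PROB = 1e-6

def log_score(message_terms, term_probs):
    """Sum log-probabilities so one unseen term cannot zero out the product."""
    return sum(math.log(term_probs.get(t, SMALL_PROB)) for t in message_terms)

# Toy per-class term probabilities standing in for trained estimates.
spam_probs = {"viagra": 0.30, "free": 0.20}
ham_probs = {"meeting": 0.25, "report": 0.20}

def classify(message_terms):
    """Label a message by whichever class gives it the higher score."""
    spam = log_score(message_terms, spam_probs)
    ham = log_score(message_terms, ham_probs)
    return "spam" if spam > ham else "ham"
```

Working in log space also avoids the numerical underflow that multiplying many small probabilities would cause; the small constant only penalizes, rather than vetoes, terms a class has never seen.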


pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future by Kevin Kelly

A Declaration of the Independence of Cyberspace, Aaron Swartz, AI winter, Airbnb, Albert Einstein, Alvin Toffler, Amazon Web Services, augmented reality, bank run, barriers to entry, Baxter: Rethink Robotics, bitcoin, blockchain, book scanning, Brewster Kahle, Burning Man, cloud computing, commoditize, computer age, Computer Lib, connected car, crowdsourcing, dark matter, data science, deep learning, DeepMind, dematerialisation, Downton Abbey, driverless car, Edward Snowden, Elon Musk, Filter Bubble, Freestyle chess, Gabriella Coleman, game design, Geoffrey Hinton, Google Glasses, hive mind, Howard Rheingold, index card, indoor plumbing, industrial robot, Internet Archive, Internet of things, invention of movable type, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Perry Barlow, Kevin Kelly, Kickstarter, lifelogging, linked data, Lyft, M-Pesa, machine readable, machine translation, Marc Andreessen, Marshall McLuhan, Mary Meeker, means of production, megacity, Minecraft, Mitch Kapor, multi-sided market, natural language processing, Netflix Prize, Network effects, new economy, Nicholas Carr, off-the-grid, old-boy network, peer-to-peer, peer-to-peer lending, personalized medicine, placebo effect, planetary scale, postindustrial economy, Project Xanadu, recommendation engine, RFID, ride hailing / ride sharing, robo advisor, Rodney Brooks, self-driving car, sharing economy, Silicon Valley, slashdot, Snapchat, social graph, social web, software is eating the world, speech recognition, Stephen Hawking, Steven Levy, Ted Nelson, TED Talk, The future is already here, the long tail, the scientific method, transport as a service, two-sided market, Uber for X, uber lyft, value engineering, Watson beat the top human players on Jeopardy!, WeWork, Whole Earth Review, Yochai Benkler, yottabyte, zero-sum game

in-house AI research teams: Reed Albergotti, “Zuckerberg, Musk Invest in Artificial-Intelligence Company,” Wall Street Journal, March 21, 2014. purchased AI companies since 2014: Derrick Harris, “Pinterest, Yahoo, Dropbox and the (Kind of) Quiet Content-as-Data Revolution,” Gigaom, January 6, 2014; Derrick Harris, “Twitter Acquires Deep Learning Startup Madbits,” Gigaom, July 29, 2014; Ingrid Lunden, “Intel Has Acquired Natural Language Processing Startup Indisys, Price ‘North’ of $26M, to Build Its AI Muscle,” TechCrunch, September 13, 2013; and Cooper Smith, “Social Networks Are Investing Big in Artificial Intelligence,” Business Insider, March 17, 2014. expanding 70 percent a year: Private analysis by Quid, Inc., 2014. taught an AI to learn to play: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, et al., “Human-Level Control Through Deep Reinforcement Learning,” Nature 518, no. 7540 (2015): 529–33.


pages: 353 words: 104,146

European Founders at Work by Pedro Gairifo Santos

business intelligence, clean tech, cloud computing, crowdsourcing, deal flow, do what you love, fail fast, fear of failure, full text search, Hacker News, hockey-stick growth, information retrieval, inventory management, iterative process, Jeff Bezos, Joi Ito, Lean Startup, Mark Zuckerberg, Multics, natural language processing, pattern recognition, pre–internet, recommendation engine, Richard Stallman, Salesforce, Silicon Valley, Skype, slashdot, SoftBank, Steve Jobs, Steve Wozniak, subscription business, technology bubble, TED Talk, web application, Y Combinator

The first start-up I was with was a mobile internet start-up based in Stockholm, where I was the first employee on the business side. So I became VP of product management there and part of my job was to find complementary code to fit in with our product, essentially. I came across code that Peter Halacsy had done. Back then he was doing research in natural language processing and we were in need of that. This company also had a development office in Cluj, Romania. When you go to Cluj from Stockholm, you fly via Budapest. My parents are from Hungary actually. When I went to Cluj, I would stop for a day in Budapest and say hi. And that's what I did. I figured since I'm in Budapest I should try to actually meet this person who had done this interesting code.


Python Geospatial Development - Second Edition by Erik Westra

business logic, capital controls, database schema, Firefox, functional programming, Golden Gate Park, Google Earth, Mercator projection, natural language processing, openstreetmap, Silicon Valley, systems thinking, web application

Richard also manages the technical aspects of the EcoMapCostaRica.com project for the Biology Department at the University of Dallas. This includes the website, online field maps, field surveys, and the creation and comparison of panoramic photographs. Richard is also active in the field of natural language processing, especially with Python's NLTK package. Will Cadell is a principal consultant with Sparkgeo.com. He builds next generation web mapping applications, primarily using Google Maps, geoDjango, and PostGIS. He has worked in academia, government, and natural resources but now mainly consults for the start-up community in Silicon Valley.


pages: 401 words: 109,892

The Great Reversal: How America Gave Up on Free Markets by Thomas Philippon

airline deregulation, Amazon Mechanical Turk, Amazon Web Services, Andrei Shleifer, barriers to entry, Big Tech, bitcoin, blockchain, book value, business cycle, business process, buy and hold, Cambridge Analytica, carbon tax, Carmen Reinhart, carried interest, central bank independence, commoditize, crack epidemic, cross-subsidies, disruptive innovation, Donald Trump, driverless car, Erik Brynjolfsson, eurozone crisis, financial deregulation, financial innovation, financial intermediation, flag carrier, Ford Model T, gig economy, Glass-Steagall Act, income inequality, income per capita, index fund, intangible asset, inventory management, Jean Tirole, Jeff Bezos, Kenneth Rogoff, labor-force participation, law of one price, liquidity trap, low cost airline, manufacturing employment, Mark Zuckerberg, market bubble, minimum wage unemployment, money market fund, moral hazard, natural language processing, Network effects, new economy, offshore financial centre, opioid epidemic / opioid crisis, Pareto efficiency, patent troll, Paul Samuelson, price discrimination, profit maximization, purchasing power parity, QWERTY keyboard, rent-seeking, ride hailing / ride sharing, risk-adjusted returns, Robert Bork, Robert Gordon, robo advisor, Ronald Reagan, search costs, Second Machine Age, self-driving car, Silicon Valley, Snapchat, spinning jenny, statistical model, Steve Jobs, stock buybacks, supply-chain management, Telecommunications Act of 1996, The Chicago School, the payments system, The Rise and Fall of American Growth, The Wealth of Nations by Adam Smith, too big to fail, total factor productivity, transaction costs, Travis Kalanick, vertical integration, Vilfredo Pareto, warehouse automation, zero-sum game

How does one go about building an index of federal regulations? By using computers to read and classify the data! RegData is a relatively new database—introduced in Al-Ubaydli and McLaughlin (2017)—that aims to measure regulatory stringency at the industry level. It relies on machine learning and natural language processing techniques to count the number of restrictive words or phrases such as “shall,” “must,” and “may not” in each section of the Code of Federal Regulations and to assign them to industries. RegData represents a vast improvement over a simple measure of page counts.h Figure 5.8 shows that the decline in entry coincided with the rise of entry regulations, but this does not mean that regulations caused the decline in entry.
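The counting step described above can be sketched in a few lines of Python. This is a toy illustration, not RegData's actual pipeline (which also assigns sections to industries), and it uses only the three phrases quoted in the text:

```python
import re

# The restrictive phrases named in the text; RegData's full list is longer.
RESTRICTIVE = ["shall", "must", "may not"]

def count_restrictions(text):
    """Tally whole-word occurrences of each restrictive phrase."""
    text = text.lower()
    return {
        # \b word boundaries so "must" does not match inside "mustard"
        phrase: len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))
        for phrase in RESTRICTIVE
    }

sample = ("Operators shall file annual reports and may not omit data. "
          "Each filing must be signed; a mustard exemption does not apply.")
```

Here `count_restrictions(sample)` counts one occurrence each of “shall,” “must,” and “may not”; the word boundaries keep “mustard” from inflating the “must” tally.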


pages: 374 words: 111,284

The AI Economy: Work, Wealth and Welfare in the Robot Age by Roger Bootle

"World Economic Forum" Davos, 3D printing, agricultural Revolution, AI winter, Albert Einstein, AlphaGo, Alvin Toffler, anti-work, antiwork, autonomous vehicles, basic income, Ben Bernanke: helicopter money, Bernie Sanders, Bletchley Park, blockchain, call centre, Cambridge Analytica, Capital in the Twenty-First Century by Thomas Piketty, Carl Icahn, Chris Urmson, computer age, Computing Machinery and Intelligence, conceptual framework, corporate governance, correlation does not imply causation, creative destruction, David Ricardo: comparative advantage, deep learning, DeepMind, deindustrialization, Demis Hassabis, deskilling, Dr. Strangelove, driverless car, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, everywhere but in the productivity statistics, facts on the ground, fake news, financial intermediation, full employment, future of work, Future Shock, general purpose technology, Great Leap Forward, Hans Moravec, income inequality, income per capita, industrial robot, Internet of things, invention of the wheel, Isaac Newton, James Watt: steam engine, Jeff Bezos, Jeremy Corbyn, job automation, job satisfaction, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Joseph Schumpeter, Kevin Kelly, license plate recognition, low interest rates, machine translation, Marc Andreessen, Mark Zuckerberg, market bubble, mega-rich, natural language processing, Network effects, new economy, Nicholas Carr, Ocado, Paul Samuelson, Peter Thiel, Phillips curve, positional goods, quantitative easing, RAND corporation, Ray Kurzweil, Richard Florida, ride hailing / ride sharing, rising living standards, road to serfdom, Robert Gordon, Robert Shiller, Robert Solow, Rutger Bregman, Second Machine Age, secular stagnation, self-driving car, seminal paper, Silicon Valley, Silicon Valley billionaire, Simon Kuznets, Skype, social intelligence, spinning jenny, Stanislav Petrov, Stephen Hawking, 
Steven Pinker, synthetic biology, technological singularity, The Future of Employment, The Wealth of Nations by Adam Smith, Thomas Malthus, trade route, universal basic income, US Airways Flight 1549, Vernor Vinge, warehouse automation, warehouse robotics, Watson beat the top human players on Jeopardy!, We wanted flying cars, instead we got 140 characters, wealth creators, winner-take-all economy, world market for maybe five computers, Y2K, Yogi Berra

The use of sensors that track patients’ heart rate and blood pressure facilitates earlier identification of problems and treatment at home rather than in hospital; one possible result is a reduction in the number of people having to spend time in hospital, freeing up resources for critical cases. In addition, natural-language-processing technology enables doctors to transcribe and record meetings with patients with minimal effort and use of doctors’ time. A consultant labelling scans at Google’s offices said that labelling images for head and neck cancer “is a five or six hour job; usually doctors sit and do it after work.”32 Meanwhile, AI can help with triage in accident and emergency departments and help to reduce “traffic jams” in the flow of patients through different hospital departments.


The Smart Wife: Why Siri, Alexa, and Other Smart Home Devices Need a Feminist Reboot by Yolande Strengers, Jenny Kennedy

active measures, Amazon Robotics, Anthropocene, autonomous vehicles, Big Tech, Boston Dynamics, cloud computing, cognitive load, computer vision, Computing Machinery and Intelligence, crowdsourcing, cyber-physical system, data science, deepfake, Donald Trump, emotional labour, en.wikipedia.org, Evgeny Morozov, fake news, feminist movement, game design, gender pay gap, Grace Hopper, hive mind, Ian Bogost, Intergovernmental Panel on Climate Change (IPCC), Internet of things, Jeff Bezos, John Markoff, Kitchen Debate, knowledge economy, Masayoshi Son, Milgram experiment, Minecraft, natural language processing, Network effects, new economy, pattern recognition, planned obsolescence, precautionary principle, robot derives from the Czech word robota Czech, meaning slave, self-driving car, Shoshana Zuboff, side hustle, side project, Silicon Valley, smart grid, smart meter, social intelligence, SoftBank, Steve Jobs, surveillance capitalism, systems thinking, technological solutionism, technoutopianism, TED Talk, Turing test, Wall-E, Wayback Machine, women in the workforce

We know from research carried out in the fields of robotics, human-computer interaction, and psychology that humans assign emotional as well as personal traits to computers.75 A smart wife precedent for this was set in 1966, when founding computer scientist Joseph Weizenbaum created the first chatbot, named ELIZA. This fembot, which performed natural language processing, was cast in the role of psychiatrist and worked by posing questions based on Rogerian psychotherapy back to her “clients” (such as, “And how does that make you feel?”).76 Weizenbaum was surprised and later dismayed to discover how intimately his colleagues related to ELIZA, and the emotional connections they quickly formed with this artificial therapist.77 So deep are these ties to inanimate objects that some people even marry them, like Erika Eiffel, née LaBrie, who married the Eiffel Tower.78 Indeed, according to the late Clifford Nass and his collaborator Corina Yen, experts in the fields of human-computer interaction and user experience design, the success and failure of interactive computer systems depends on whether we like them, and how well they treat us.79 This is partly because people have a tendency to humanize devices and assign them genders, even when they don’t have one.
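The excerpt does not reproduce Weizenbaum's original script, but the Rogerian pattern-and-reflection idea behind ELIZA can be sketched in a few lines; the rules and pronoun table below are illustrative stand-ins, not the 1966 program.

```python
import re

# Illustrative Rogerian-style rules (not Weizenbaum's original script):
# each pattern captures a fragment and reflects it back as a question.
RULES = [
    (re.compile(r"\bi feel (.+)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bmy (.+)", re.I), "Tell me more about your {0}."),
]

# Swap first person for second person so the reflection reads naturally.
PRONOUNS = {"my": "your", "me": "you", "i": "you", "am": "are"}

def reflect(fragment):
    return " ".join(PRONOUNS.get(w.lower(), w) for w in fragment.split())

def respond(utterance):
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(reflect(match.group(1)))
    # Default Rogerian prompt when no rule matches.
    return "And how does that make you feel?"

print(respond("I feel ignored by my family"))
# → Why do you feel ignored by your family?
```

The shallowness of the trick is the point: a handful of regular expressions was enough to elicit the intimate reactions Weizenbaum observed.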


pages: 363 words: 109,077

The Raging 2020s: Companies, Countries, People - and the Fight for Our Future by Alec Ross

"Friedman doctrine" OR "shareholder theory", "World Economic Forum" Davos, Affordable Care Act / Obamacare, air gap, air traffic controllers' union, Airbnb, Albert Einstein, An Inconvenient Truth, autonomous vehicles, barriers to entry, benefit corporation, Bernie Sanders, Big Tech, big-box store, British Empire, call centre, capital controls, clean water, collective bargaining, computer vision, coronavirus, corporate governance, corporate raider, COVID-19, deep learning, Deng Xiaoping, Didi Chuxing, disinformation, Dissolution of the Soviet Union, Donald Trump, Double Irish / Dutch Sandwich, drone strike, dumpster diving, employer provided health coverage, Francis Fukuyama: the end of history, future of work, general purpose technology, gig economy, Gini coefficient, global supply chain, Goldman Sachs: Vampire Squid, Gordon Gekko, greed is good, high-speed rail, hiring and firing, income inequality, independent contractor, information security, intangible asset, invisible hand, Jeff Bezos, knowledge worker, late capitalism, low skilled workers, Lyft, Marc Andreessen, Marc Benioff, mass immigration, megacity, military-industrial complex, minimum wage unemployment, mittelstand, mortgage tax deduction, natural language processing, Oculus Rift, off-the-grid, offshore financial centre, open economy, OpenAI, Parag Khanna, Paris climate accords, profit motive, race to the bottom, RAND corporation, ride hailing / ride sharing, Robert Bork, rolodex, Ronald Reagan, Salesforce, self-driving car, shareholder value, side hustle, side project, Silicon Valley, smart cities, Social Responsibility of Business Is to Increase Its Profits, sovereign wealth fund, sparse data, special economic zone, Steven Levy, stock buybacks, strikebreaker, TaskRabbit, tech bro, tech worker, transcontinental railway, transfer pricing, Travis Kalanick, trickle-down economics, Uber and Lyft, uber lyft, union organizing, Upton Sinclair, vertical integration, working poor

Local governments started pouring funds into AI start-ups, and industry partnerships began to form. The following month, the government drafted a “national AI team,” selecting four domestic companies to take the lead in strategic AI fields including autonomous vehicles (Baidu), medical imaging (Tencent), natural language processing (iFLYTEK), and smart city technology (Alibaba). By August 2019, the team had expanded to fifteen members, each with its own area of expertise. These national champions are granted special access to government funds and databases. They collaborate with one another in a manner that does not exist in a serious way in Silicon Valley, sharing research insights and setting standards for the Chinese AI ecosystem.


pages: 363 words: 109,834

The Crux by Richard Rumelt

activist fund / activist shareholder / activist investor, air gap, Airbnb, AltaVista, AOL-Time Warner, Bayesian statistics, behavioural economics, biodiversity loss, Blue Ocean Strategy, Boeing 737 MAX, Boeing 747, Charles Lindbergh, Clayton Christensen, cloud computing, cognitive bias, commoditize, coronavirus, corporate raider, COVID-19, creative destruction, crossover SUV, Crossrail, deep learning, Deng Xiaoping, diversified portfolio, double entry bookkeeping, drop ship, Elon Musk, en.wikipedia.org, financial engineering, Ford Model T, Herman Kahn, income inequality, index card, Internet of things, Jeff Bezos, Just-in-time delivery, Larry Ellison, linear programming, lockdown, low cost airline, low earth orbit, Lyft, Marc Benioff, Mark Zuckerberg, Masayoshi Son, meta-analysis, Myron Scholes, natural language processing, Neil Armstrong, Network effects, packet switching, PageRank, performance metric, precision agriculture, RAND corporation, ride hailing / ride sharing, Salesforce, San Francisco homelessness, search costs, selection bias, self-driving car, shareholder value, sharing economy, Silicon Valley, Skype, Snapchat, social distancing, SoftBank, software as a service, statistical model, Steve Ballmer, Steve Jobs, stochastic process, Teledyne, telemarketer, TSMC, uber lyft, undersea cable, union organizing, vertical integration, WeWork

Alphabet Acquisitions in 2016

Company        Business                           Complement to
BandPage       Platform for musicians             YouTube
Pie            Business communications            Spaces
Synergyse      Interactive tutorials              Google Docs
Webpass        Internet service provider          Google Fiber
Moodstocks     Image recognition                  Google Photos
Anvato         Cloud-based video services         Google Cloud Platform
Kifi           Link management                    Spaces
LaunchKit      Mobile tool maker                  Firebase
Orbitera       Cloud software                     Google Cloud Platform
Apigee         API mgmt and predictive analytics  Google Cloud Platform
Urban Engines  Location-based analytics           Google Maps
API.AI         Natural language processing        Google Assistant
FameBit        Branded content                    YouTube
Eyefluence     Eye tracking, virtual reality      Google VR
LeapDroid      Android emulator                   Android
Qwiklabs       Cloud-based training platform      Google Cloud Platform
Cronologics    Smartwatches                       Android Wear

Source: https://en.wikipedia.org/wiki/List_of_mergers_and_acquisitions_by_Alphabet, reproduced via Creative Commons license https://creativecommons.org/licenses/by-sa/3.0

INGREDIENT 5: DON’T OVERPAY

One reason so many research studies keep showing negative returns to acquiring firms is that acquirers are overpaying for what they get.


pages: 918 words: 257,605

The Age of Surveillance Capitalism by Shoshana Zuboff

"World Economic Forum" Davos, algorithmic bias, Amazon Web Services, Andrew Keen, augmented reality, autonomous vehicles, barriers to entry, Bartolomé de las Casas, behavioural economics, Berlin Wall, Big Tech, bitcoin, blockchain, blue-collar work, book scanning, Broken windows theory, California gold rush, call centre, Cambridge Analytica, Capital in the Twenty-First Century by Thomas Piketty, Cass Sunstein, choice architecture, citizen journalism, Citizen Lab, classic study, cloud computing, collective bargaining, Computer Numeric Control, computer vision, connected car, context collapse, corporate governance, corporate personhood, creative destruction, cryptocurrency, data science, deep learning, digital capitalism, disinformation, dogs of the Dow, don't be evil, Donald Trump, Dr. Strangelove, driverless car, Easter island, Edward Snowden, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, facts on the ground, fake news, Ford Model T, Ford paid five dollars a day, future of work, game design, gamification, Google Earth, Google Glasses, Google X / Alphabet X, Herman Kahn, hive mind, Ian Bogost, impulse control, income inequality, information security, Internet of things, invention of the printing press, invisible hand, Jean Tirole, job automation, Johann Wolfgang von Goethe, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, Kevin Roose, knowledge economy, Lewis Mumford, linked data, longitudinal study, low skilled workers, Mark Zuckerberg, market bubble, means of production, multi-sided market, Naomi Klein, natural language processing, Network effects, new economy, Occupy movement, off grid, off-the-grid, PageRank, Panopticon Jeremy Bentham, pattern recognition, Paul Buchheit, performance metric, Philip Mirowski, precision agriculture, price mechanism, profit maximization, profit motive, public intellectual, recommendation engine, refrigerator car, 
RFID, Richard Thaler, ride hailing / ride sharing, Robert Bork, Robert Mercer, Salesforce, Second Machine Age, self-driving car, sentiment analysis, shareholder value, Sheryl Sandberg, Shoshana Zuboff, Sidewalk Labs, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, slashdot, smart cities, Snapchat, social contagion, social distancing, social graph, social web, software as a service, speech recognition, statistical model, Steve Bannon, Steve Jobs, Steven Levy, structural adjustment programs, surveillance capitalism, technological determinism, TED Talk, The Future of Employment, The Wealth of Nations by Adam Smith, Tim Cook: Apple, two-sided market, union organizing, vertical integration, Watson beat the top human players on Jeopardy!, winner-take-all economy, Wolfgang Streeck, work culture , Yochai Benkler, you are the product

As the Wall Street Journal reports, new startups such as Affirm, LendUp, and ZestFinance “use data from sources such as social media, online behavior and data brokers to determine the creditworthiness of tens of thousands of U.S. consumers who don’t have access to loans,” more evidence that decision rights and the privacy they enable have become luxuries that too many people cannot afford.152 Another example of surveillance-as-a-service is a firm that sells deep vetting of potential employees and tenants to employers and landlords. For instance, a prospective tenant receives a demand from her potential landlord that requires her to grant full access to all social media profiles. The service then “scrapes your site activity,” including entire conversation threads and private messages, runs it through natural language processing and other analytic software, and finally spits out a report that catalogues everything from your personality to your “financial stress level,” including exposing protected status information such as pregnancy and age. There is no opportunity for affected individuals to view or contest information.

See IBM-Acxiom, “Improving Consumer Consumption Preference Prediction Accuracy with Personality Insights,” March 2016, https://www.ibm.com/watson/developercloud/doc/personality-insights/applied.shtml. 72. IBM-Acxiom, “Improving Consumer Consumption Preference Prediction Accuracy.” 73. “Social Media Analytics,” Xerox Research Center Europe, April 3, 2017, http://www.xrce.xerox.com/Our-Research/Natural-Language-Processing/Social-Media-Analytics; Amy Webb, “8 Tech Trends to Watch in 2016,” Harvard Business Review, December 8, 2015, https://hbr.org/2015/12/8-tech-trends-to-watch-in-2016; Christina Crowell, “Machines That Talk to Us May Soon Sense Our Feelings, Too,” Scientific American, June 24, 2016, https://www.scientificamerican.com/article/machines-that-talk-to-us-may-soon-sense-our-feelings-too; R.


pages: 405 words: 117,219

In Our Own Image: Savior or Destroyer? The History and Future of Artificial Intelligence by George Zarkadakis

3D printing, Ada Lovelace, agricultural Revolution, Airbnb, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, animal electricity, anthropic principle, Asperger Syndrome, autonomous vehicles, barriers to entry, battle of ideas, Berlin Wall, bioinformatics, Bletchley Park, British Empire, business process, carbon-based life, cellular automata, Charles Babbage, Claude Shannon: information theory, combinatorial explosion, complexity theory, Computing Machinery and Intelligence, continuous integration, Conway's Game of Life, cosmological principle, dark matter, data science, deep learning, DeepMind, dematerialisation, double helix, Douglas Hofstadter, driverless car, Edward Snowden, epigenetics, Flash crash, Google Glasses, Gödel, Escher, Bach, Hans Moravec, income inequality, index card, industrial robot, intentional community, Internet of things, invention of agriculture, invention of the steam engine, invisible hand, Isaac Newton, Jacquard loom, Jacques de Vaucanson, James Watt: steam engine, job automation, John von Neumann, Joseph-Marie Jacquard, Kickstarter, liberal capitalism, lifelogging, machine translation, millennium bug, mirror neurons, Moravec's paradox, natural language processing, Nick Bostrom, Norbert Wiener, off grid, On the Economy of Machinery and Manufactures, packet switching, pattern recognition, Paul Erdős, Plato's cave, post-industrial society, power law, precautionary principle, prediction markets, Ray Kurzweil, Recombinant DNA, Rodney Brooks, Second Machine Age, self-driving car, seminal paper, Silicon Valley, social intelligence, speech recognition, stem cell, Stephen Hawking, Steven Pinker, Strategic Defense Initiative, strong AI, Stuart Kauffman, synthetic biology, systems thinking, technological singularity, The Coming Technological Singularity, The Future of Employment, the scientific method, theory of mind, Turing complete, Turing machine, Turing test, Tyler Cowen, Tyler Cowen: Great Stagnation, Vernor 
Vinge, Von Neumann architecture, Watson beat the top human players on Jeopardy!, Y2K

For the purpose of the TV quiz, the engineers at IBM loaded Watson with 200 million pages of data, including dictionaries, encyclopaedias and literary articles. Moreover, Watson communicated in natural language. You asked it a question, it understood it, and returned an answer. For this to happen, Watson’s designers exploited the whole arsenal of AI tools and techniques, including machine learning, natural language processing and knowledge representation. What the success of their creation demonstrated was that brute computing force could overcome the obstacles that the AI pioneers faced in the 1960s and early 1970s. Bigger, stronger, faster were very meaningful words when it came to increasing machine intelligence.


pages: 479 words: 113,510

Fed Up: An Insider's Take on Why the Federal Reserve Is Bad for America by Danielle Dimartino Booth

Affordable Care Act / Obamacare, Alan Greenspan, asset-backed security, bank run, barriers to entry, Basel III, Bear Stearns, Bernie Sanders, Black Monday: stock market crash in 1987, break the buck, Bretton Woods, business cycle, central bank independence, collateralized debt obligation, corporate raider, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, Donald Trump, financial deregulation, financial engineering, financial innovation, fixed income, Flash crash, forward guidance, full employment, George Akerlof, Glass-Steagall Act, greed is good, Greenspan put, high net worth, housing crisis, income inequality, index fund, inflation targeting, interest rate swap, invisible hand, John Meriwether, Joseph Schumpeter, junk bonds, liquidity trap, London Whale, Long Term Capital Management, low interest rates, margin call, market bubble, Mexican peso crisis / tequila crisis, money market fund, moral hazard, Myron Scholes, natural language processing, Navinder Sarao, negative equity, new economy, Northern Rock, obamacare, Phillips curve, price stability, proprietary trading, pushing on a string, quantitative easing, regulatory arbitrage, Robert Shiller, Ronald Reagan, selection bias, short selling, side project, Silicon Valley, stock buybacks, tail risk, The Great Moderation, The Wealth of Nations by Adam Smith, too big to fail, trickle-down economics, yield curve

When I ran up to the break room to watch CNBC’s Steve Liesman read the FOMC statement, I was on tenterhooks, wondering which words had prevailed. I fully grasped the ridiculous pageantry because I knew the markets would parse every single word. A Fed “computational linguistics” study of FOMC statements released in 2015 concluded: “natural language processing can strip away false impressions and uncover hidden truths about complex communications such as those of the Federal Reserve.” The Street had it right all along. Depressingly, the option Fisher preferred was rarely the one that came out of Liesman’s mouth. The doves always seemed to have the upper hand.
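Dictionary-based scoring is one simple way computational-linguistics studies of this kind quantify central-bank language; the source does not describe the Fed study's actual method, so the word lists and scoring rule below are purely illustrative.

```python
# A toy sketch of dictionary-based tone scoring for a policy statement:
# count hawkish vs. dovish terms and report the balance. The word lists
# here are made up for illustration, not taken from the Fed study.

HAWKISH = {"tighten", "restrictive", "raise", "inflation"}
DOVISH = {"accommodative", "stimulus", "ease", "patient"}

def tone(statement):
    # Crude tokenization: lowercase and strip common punctuation.
    words = statement.lower().replace(",", " ").replace(".", " ").split()
    score = sum(w in HAWKISH for w in words) - sum(w in DOVISH for w in words)
    return "hawkish" if score > 0 else "dovish" if score < 0 else "neutral"

print(tone("The Committee will remain patient as it maintains an accommodative stance."))
# → dovish
```

Even a scheme this crude shows why markets parse every word: swapping a single term flips the score.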


pages: 426 words: 117,027

Mind in Motion: How Action Shapes Thought by Barbara Tversky

Apple's 1984 Super Bowl advert, Asperger Syndrome, augmented reality, clean water, cognitive load, continuous integration, double helix, en.wikipedia.org, fundamental attribution error, Hans Rosling, Intergovernmental Panel on Climate Change (IPCC), John Snow's cholera map, Lao Tzu, meta-analysis, mirror neurons, natural language processing, neurotypical, patient HM, Richard Feynman, Steven Pinker, TED Talk, the new new thing, theory of mind, urban planning

Structure of boxes and speech balloons in comics Groensteen, T. (2007). The system of comics. Translated by B. Beaty & N. Nguyen. Jackson: University Press of Mississippi. Adding information to words and images Clark, H. H. (1975). Bridging. In Proceedings of the 1975 Workshop on Theoretical Issues in Natural Language Processing (pp. 169–174). Cambridge, MA: Association for Computational Linguistics. Intraub, H., Bender, R. S., & Mangels, J. A. (1992). Looking at pictures but remembering scenes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(1), 180. Segmenting events and stories McCloud, S. (1993).


pages: 501 words: 114,888

The Future Is Faster Than You Think: How Converging Technologies Are Transforming Business, Industries, and Our Lives by Peter H. Diamandis, Steven Kotler

Ada Lovelace, additive manufacturing, Airbnb, Albert Einstein, AlphaGo, Amazon Mechanical Turk, Amazon Robotics, augmented reality, autonomous vehicles, barriers to entry, Big Tech, biodiversity loss, bitcoin, blockchain, blood diamond, Boston Dynamics, Burning Man, call centre, cashless society, Charles Babbage, Charles Lindbergh, Clayton Christensen, clean water, cloud computing, Colonization of Mars, computer vision, creative destruction, CRISPR, crowdsourcing, cryptocurrency, data science, Dean Kamen, deep learning, deepfake, DeepMind, delayed gratification, dematerialisation, digital twin, disruptive innovation, Donald Shoup, driverless car, Easter island, Edward Glaeser, Edward Lloyd's coffeehouse, Elon Musk, en.wikipedia.org, epigenetics, Erik Brynjolfsson, Ethereum, ethereum blockchain, experimental economics, fake news, food miles, Ford Model T, fulfillment center, game design, Geoffrey West, Santa Fe Institute, gig economy, gigafactory, Google X / Alphabet X, gravity well, hive mind, housing crisis, Hyperloop, impact investing, indoor plumbing, industrial robot, informal economy, initial coin offering, intentional community, Intergovernmental Panel on Climate Change (IPCC), Internet of things, invention of the telegraph, Isaac Newton, Jaron Lanier, Jeff Bezos, job automation, Joseph Schumpeter, Kevin Kelly, Kickstarter, Kiva Systems, late fees, Law of Accelerating Returns, life extension, lifelogging, loss aversion, Lyft, M-Pesa, Mary Lou Jepsen, Masayoshi Son, mass immigration, megacity, meta-analysis, microbiome, microdosing, mobile money, multiplanetary species, Narrative Science, natural language processing, Neal Stephenson, Neil Armstrong, Network effects, new economy, New Urbanism, Nick Bostrom, Oculus Rift, One Laptop per Child (OLPC), out of africa, packet switching, peer-to-peer lending, Peter H. 
Diamandis: Planetary Resources, Peter Thiel, planned obsolescence, QR code, RAND corporation, Ray Kurzweil, RFID, Richard Feynman, Richard Florida, ride hailing / ride sharing, risk tolerance, robo advisor, Satoshi Nakamoto, Second Machine Age, self-driving car, Sidewalk Labs, Silicon Valley, Skype, smart cities, smart contracts, smart grid, Snapchat, SoftBank, sovereign wealth fund, special economic zone, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Jurvetson, Steven Pinker, Stewart Brand, supercomputer in your pocket, supply-chain management, tech billionaire, technoutopianism, TED Talk, Tesla Model S, Tim Cook: Apple, transaction costs, Uber and Lyft, uber lyft, unbanked and underbanked, underbanked, urban planning, Vision Fund, VTOL, warehouse robotics, Watson beat the top human players on Jeopardy!, We wanted flying cars, instead we got 140 characters, X Prize

Starship Technologies, for example, a startup created by Skype founders Janus Friis and Ahti Heinla, has a general-purpose home delivery robot. Right now, the system is an array of cameras and GPS sensors, but soon models will include microphones, speakers, and the ability—via AI-driven natural language processing—to communicate with customers. Since 2016, Starship has carried out fifty thousand deliveries in over one hundred cities in twenty countries. Along similar lines, Nuro, the company cofounded by Jiajun Zhu, one of the engineers who helped Google develop their self-driving car, has a miniature self-driving car of their own.


The Future of Technology by Tom Standage

air freight, Alan Greenspan, barriers to entry, business process, business process outsourcing, call centre, Clayton Christensen, computer vision, connected car, corporate governance, creative destruction, disintermediation, disruptive innovation, distributed generation, double helix, experimental economics, financial engineering, Ford Model T, full employment, hydrogen economy, hype cycle, industrial robot, informal economy, information asymmetry, information security, interchangeable parts, job satisfaction, labour market flexibility, Larry Ellison, Marc Andreessen, Marc Benioff, market design, Menlo Park, millennium bug, moral hazard, natural language processing, Network effects, new economy, Nicholas Carr, optical character recognition, PalmPilot, railway mania, rent-seeking, RFID, Salesforce, seminal paper, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, six sigma, Skype, smart grid, software as a service, spectrum auction, speech recognition, stem cell, Steve Ballmer, Steve Jurvetson, technological determinism, technology bubble, telemarketer, transcontinental railway, vertical integration, Y2K

“It can make our customers frightened.” This seems odd, because the firm’s search technology uses a classic AI technique, applying multiple algorithms to the same data and then evaluating the results to see which approach was most effective. Even so, the firm prefers to use such terms as “natural language processing” and “machine learning”. Perhaps the biggest change in AI’s fortunes is simply down to the change of date. The film A.I. was based on an idea by the late director Stanley Kubrick, who also dealt with the topic in another film, 2001: A Space Odyssey, which was released in 1968. 2001 featured an intelligent computer called HAL 9000 with a hypnotic speaking voice.
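The evaluate-several-algorithms approach described above can be sketched concretely: run each candidate over the same labelled data and keep whichever agrees best with the labels. The scoring functions and data here are invented for illustration, not the firm's actual technology.

```python
# Two toy relevance scorers applied to the same data; pick_best keeps
# whichever one best matches the human relevance labels.

def exact_match(query, doc):
    return 1.0 if query in doc else 0.0

def word_overlap(query, doc):
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / len(q) if q else 0.0

def pick_best(algorithms, labelled_pairs):
    """Return the algorithm whose judgments best track the labels."""
    def fitness(algo):
        return sum(1 for query, doc, relevant in labelled_pairs
                   if (algo(query, doc) > 0.5) == relevant)
    return max(algorithms, key=fitness)

data = [
    ("machine learning", "intro to machine learning", True),
    ("machine learning", "cooking with gas", False),
    ("language processing", "processing of natural language", True),
]
best = pick_best([exact_match, word_overlap], data)
print(best.__name__)
# → word_overlap
```

Exact matching misses the reordered third query, so the overlap scorer wins on this data; with different data the verdict could flip, which is the point of evaluating rather than committing to one algorithm.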


pages: 351 words: 123,876

Beautiful Testing: Leading Professionals Reveal How They Improve Software (Theory in Practice) by Adam Goucher, Tim Riley

Albert Einstein, barriers to entry, Black Swan, business logic, call centre, continuous integration, Debian, Donald Knuth, en.wikipedia.org, Firefox, Grace Hopper, index card, Isaac Newton, natural language processing, off-by-one error, p-value, performance metric, revision control, six sigma, software as a service, software patent, SQL injection, the scientific method, Therac-25, Valgrind, web application

Since 2001 he has been involved in several free software projects, including Debian and Battle for Wesnoth. He, along with other partners, founded Warp Networks in 2004. Warp Networks is the open source–oriented software company from which eBox Technologies was later spun off. His other interests are artificial intelligence and natural language processing. JOHN D. COOK is a very applied mathematician. After receiving a Ph.D. from the University of Texas, he taught mathematics at Vanderbilt University. He then left academia to work as a software developer and consultant. He currently works as a research statistician at M. D. Anderson Cancer Center.


pages: 593 words: 118,995

Relevant Search: With Examples Using Elasticsearch and Solr by Doug Turnbull, John Berryman

business logic, cognitive load, commoditize, crowdsourcing, data science, domain-specific language, Dr. Strangelove, fail fast, finite state, fudge factor, full text search, heat death of the universe, information retrieval, machine readable, natural language processing, premature optimization, recommendation engine, sentiment analysis, the long tail

Unfortunately, it’s often not until well after a search system is deployed into production that organizations begin to realize the gap between out-of-the-box relevancy defaults and true domain-driven, personalized matching. Not only that, but the skillsets needed to think about relevancy (domain expertise, feature engineering, machine learning, ontologies, user testing, natural language processing) are very different from those needed to build and maintain scalable infrastructure (distributed systems, data structures, performance and concurrency, hardware utilization, network calls and communication). The role of a relevance engineer is almost entirely lacking in many organizations, leaving so much potential untapped for building a search experience that truly delights users and significantly moves a company forward.


pages: 481 words: 121,669

The Invisible Web: Uncovering Information Sources Search Engines Can't See by Gary Price, Chris Sherman, Danny Sullivan

AltaVista, American Society of Civil Engineers: Report Card, Bill Atkinson, bioinformatics, Brewster Kahle, business intelligence, dark matter, Donald Davies, Douglas Engelbart, Douglas Engelbart, full text search, HyperCard, hypertext link, information retrieval, Internet Archive, it's over 9,000, joint-stock company, knowledge worker, machine readable, machine translation, natural language processing, pre–internet, profit motive, Project Xanadu, publish or perish, search engine result page, side project, Silicon Valley, speech recognition, stealth mode startup, Ted Nelson, Vannevar Bush, web application

A search engine that simultaneously searches other search engines and aggregates the results into a single result list. Metasearch engines typically do not maintain their own indices of Web pages.

natural language. Entering a search query exactly as if the question were being written or spoken. Natural Language Processing (NLP) is a technique used by search engines to break up or “parse” the search into a query the engine can understand.

“on the fly.” Dynamic Web pages that are assembled in real time, as opposed to static HTML pages. An example could be your MyYahoo.Com page that contains the information (news, sports, weather, etc.) that you select.
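The “parse” step the glossary mentions can be illustrated with a toy query normalizer that strips question scaffolding, leaving terms an engine can match; the stopword list is illustrative, not any real engine's.

```python
# Toy natural-language query parsing: lowercase the question, drop
# trailing punctuation, and remove stopwords. The stopword list is a
# made-up illustration of the idea, not a production list.

STOPWORDS = {"what", "is", "the", "a", "an", "of", "how", "do", "i",
             "who", "where", "in", "to", "for"}

def parse_query(question):
    words = question.lower().strip("?!. ").split()
    return [w for w in words if w not in STOPWORDS]

print(parse_query("What is the capital of France?"))
# → ['capital', 'france']
```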


pages: 482 words: 121,173

Tools and Weapons: The Promise and the Peril of the Digital Age by Brad Smith, Carol Ann Browne

"World Economic Forum" Davos, Affordable Care Act / Obamacare, AI winter, air gap, airport security, Alan Greenspan, Albert Einstein, algorithmic bias, augmented reality, autonomous vehicles, barriers to entry, Berlin Wall, Big Tech, Bletchley Park, Blitzscaling, Boeing 737 MAX, business process, call centre, Cambridge Analytica, Celtic Tiger, Charlie Hebdo massacre, chief data officer, cloud computing, computer vision, corporate social responsibility, data science, deep learning, digital divide, disinformation, Donald Trump, Eben Moglen, Edward Snowden, en.wikipedia.org, Hacker News, immigration reform, income inequality, Internet of things, invention of movable type, invention of the telephone, Jeff Bezos, Kevin Roose, Laura Poitras, machine readable, Mark Zuckerberg, minimum viable product, national security letter, natural language processing, Network effects, new economy, Nick Bostrom, off-the-grid, operational security, opioid epidemic / opioid crisis, pattern recognition, precision agriculture, race to the bottom, ransomware, Ronald Reagan, Rubik’s Cube, Salesforce, school vouchers, self-driving car, Sheryl Sandberg, Shoshana Zuboff, Silicon Valley, Skype, speech recognition, Steve Ballmer, Steve Jobs, surveillance capitalism, tech worker, The Rise and Fall of American Growth, Tim Cook: Apple, Wargames Reagan, WikiLeaks, women in the workforce

Given the nature and role of academic research, universities have begun to set up data depositories, where data can be shared for multiple uses. Microsoft Research is pursuing this data-sharing approach too, making available a collection of free data sets to advance research in areas such as natural language processing and computer vision, as well as in the physical and social sciences. It was this ability to share data that inspired Matthew Trunnell. He recognized that the best way to accelerate the race to cure cancer is to enable multiple research organizations to share their data in new ways. While this sounds simple in theory, its execution is complicated.


Producing Open Source Software: How to Run a Successful Free Software Project by Karl Fogel

active measures, AGPL, barriers to entry, Benjamin Mako Hill, collaborative editing, continuous integration, Contributor License Agreement, corporate governance, Debian, Donald Knuth, en.wikipedia.org, experimental subject, Firefox, Free Software Foundation, GnuPG, Hacker Ethic, Hacker News, intentional community, Internet Archive, iterative process, Kickstarter, natural language processing, off-by-one error, patent troll, peer-to-peer, pull request, revision control, Richard Stallman, selection bias, slashdot, software as a service, software patent, SpamAssassin, the Cathedral and the Bazaar, Wayback Machine, web application, zero-sum game

References:
===========

CVE-2015-892346: Scanley stack overflow in queries

Vulnerability:
==============

The server can be made to run arbitrary commands if the server's locale is misconfigured and the client sends a malformed query.

Severity:
=========

Very severe; can involve arbitrary code execution on the server.

Workarounds:
============

Setting the 'natural-language-processing' option to 'off' in scanley.conf closes this vulnerability.

Patch:
======

The patch below applies to Scanley 3.0, 3.1, and 3.2. A new public release (Scanley 3.2.1) will be made on or just before May 19th, so that it is available at the same time as this vulnerability is made public. You can patch now, or just wait for the public release.


pages: 416 words: 129,308

The One Device: The Secret History of the iPhone by Brian Merchant

Airbnb, animal electricity, Apollo Guidance Computer, Apple II, Apple's 1984 Super Bowl advert, Black Lives Matter, Charles Babbage, citizen journalism, Citizen Lab, Claude Shannon: information theory, computer vision, Computing Machinery and Intelligence, conceptual framework, cotton gin, deep learning, DeepMind, Douglas Engelbart, Dynabook, Edward Snowden, Elon Musk, Ford paid five dollars a day, Frank Gehry, gigafactory, global supply chain, Google Earth, Google Hangouts, Higgs boson, Huaqiangbei: the electronics market of Shenzhen, China, information security, Internet of things, Jacquard loom, John Gruber, John Markoff, Jony Ive, Large Hadron Collider, Lyft, M-Pesa, MITM: man-in-the-middle, more computing power than Apollo, Mother of all demos, natural language processing, new economy, New Journalism, Norbert Wiener, offshore financial centre, oil shock, pattern recognition, peak oil, pirate software, profit motive, QWERTY keyboard, reality distortion field, ride hailing / ride sharing, rolodex, Shenzhen special economic zone , Silicon Valley, Silicon Valley startup, skeuomorphism, skunkworks, Skype, Snapchat, special economic zone, speech recognition, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, Steve Wozniak, Steven Levy, TED Talk, Tim Cook: Apple, Tony Fadell, TSMC, Turing test, uber lyft, Upton Sinclair, Vannevar Bush, zero day

Siri is really a constellation of features—speech-recognition software, a natural-language user interface, and an artificially intelligent personal assistant. When you ask Siri a question, here’s what happens: Your voice is digitized and transmitted to an Apple server in the Cloud while a local voice recognizer scans it right on your iPhone. Speech-recognition software translates your speech into text. Natural-language processing parses it. Siri consults what tech writer Steven Levy calls the iBrain—around 200 megabytes of data about your preferences, the way you speak, and other details. If your question can be answered by the phone itself (“Would you set my alarm for eight a.m.?”), the Cloud request is canceled.
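The dispatch step in that pipeline (answer on-device when possible, otherwise wait for the server) can be sketched abstractly. This is a toy model: the class, the intent names, and the handler below are all invented for illustration and are not Apple's implementation.

```python
class CloudRequest:
    """Stand-in for an in-flight request to a remote speech server."""
    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True

    def result(self):
        return "answered by server"


# Intents the device can answer without the server (illustrative list).
LOCAL_INTENTS = {"set_alarm", "set_timer"}


def handle_query(intent, request):
    # Device-answerable intents cancel the cloud round trip, as the
    # excerpt describes for "set my alarm for eight a.m."
    if intent in LOCAL_INTENTS:
        request.cancel()
        return "handled locally: " + intent
    return request.result()


req = CloudRequest()
answer = handle_query("set_alarm", req)
```

Running the same handler with an intent outside `LOCAL_INTENTS` leaves the cloud request alive and returns the server's answer instead.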


pages: 474 words: 130,575

Surveillance Valley: The Rise of the Military-Digital Complex by Yasha Levine

23andMe, activist fund / activist shareholder / activist investor, Adam Curtis, Airbnb, AltaVista, Amazon Web Services, Anne Wojcicki, anti-communist, AOL-Time Warner, Apple's 1984 Super Bowl advert, bitcoin, Black Lives Matter, borderless world, Boston Dynamics, British Empire, Californian Ideology, call centre, Charles Babbage, Chelsea Manning, cloud computing, collaborative editing, colonial rule, company town, computer age, computerized markets, corporate governance, crowdsourcing, cryptocurrency, data science, digital map, disinformation, don't be evil, Donald Trump, Douglas Engelbart, Douglas Engelbart, Dr. Strangelove, drone strike, dual-use technology, Edward Snowden, El Camino Real, Electric Kool-Aid Acid Test, Elon Musk, end-to-end encryption, fake news, fault tolerance, gentrification, George Gilder, ghettoisation, global village, Google Chrome, Google Earth, Google Hangouts, Greyball, Hacker Conference 1984, Howard Zinn, hypertext link, IBM and the Holocaust, index card, Jacob Appelbaum, Jeff Bezos, jimmy wales, John Gilmore, John Markoff, John Perry Barlow, John von Neumann, Julian Assange, Kevin Kelly, Kickstarter, Laura Poitras, life extension, Lyft, machine readable, Mark Zuckerberg, market bubble, Menlo Park, military-industrial complex, Mitch Kapor, natural language processing, Neal Stephenson, Network effects, new economy, Norbert Wiener, off-the-grid, One Laptop per Child (OLPC), packet switching, PageRank, Paul Buchheit, peer-to-peer, Peter Thiel, Philip Mirowski, plutocrats, private military company, RAND corporation, Ronald Reagan, Ross Ulbricht, Satoshi Nakamoto, self-driving car, sentiment analysis, shareholder value, Sheryl Sandberg, side project, Silicon Valley, Silicon Valley startup, Skype, slashdot, Snapchat, Snow Crash, SoftBank, speech recognition, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Susan Wojcicki, Telecommunications Act of 1996, telepresence, telepresence robot, The Bell Curve by Richard Herrnstein and Charles Murray, The Hackers Conference, Tony Fadell, uber lyft, vertical integration, Whole Earth Catalog, Whole Earth Review, WikiLeaks

As I say, however, hopefully, many of the problems will be essentially the same, and essentially as important, in the research context as in the military context.53 On a fundamental level, the computer technology required to power active military operations was no different from the tech scientists and researchers used to do their work. Collaboration, real-time collection and sharing of data, predictive modeling, image analysis, natural language processing, intuitive controls and displays, and computer graphics—if the tools developed by ARPA contractors worked for them and their academic buddies, they would also work for the military with only slight modifications. Today’s military takes this for granted: computer technology is always “dual use,” to be used in both commercial and military applications.


pages: 483 words: 141,836

Red-Blooded Risk: The Secret History of Wall Street by Aaron Brown, Eric Kim

Abraham Wald, activist fund / activist shareholder / activist investor, Albert Einstein, algorithmic trading, Asian financial crisis, Atul Gawande, backtesting, Basel III, Bayesian statistics, Bear Stearns, beat the dealer, Benoit Mandelbrot, Bernie Madoff, Black Swan, book value, business cycle, capital asset pricing model, carbon tax, central bank independence, Checklist Manifesto, corporate governance, creative destruction, credit crunch, Credit Default Swap, currency risk, disintermediation, distributed generation, diversification, diversified portfolio, Edward Thorp, Emanuel Derman, Eugene Fama: efficient market hypothesis, experimental subject, fail fast, fear index, financial engineering, financial innovation, global macro, illegal immigration, implied volatility, independent contractor, index fund, John Bogle, junk bonds, Long Term Capital Management, loss aversion, low interest rates, managed futures, margin call, market clearing, market fundamentalism, market microstructure, Money creation, money market fund, money: store of value / unit of account / medium of exchange, moral hazard, Myron Scholes, natural language processing, open economy, Pierre-Simon Laplace, power law, pre–internet, proprietary trading, quantitative trading / quantitative finance, random walk, Richard Thaler, risk free rate, risk tolerance, risk-adjusted returns, risk/return, road to serfdom, Robert Shiller, shareholder value, Sharpe ratio, special drawing rights, statistical arbitrage, stochastic volatility, stock buybacks, stocks for the long run, tail risk, The Myth of the Rational Market, Thomas Bayes, too big to fail, transaction costs, value at risk, yield curve

At the other extreme, serious researchers sometimes try to apply the precise mechanisms or equations from one field to another. The first approach is almost never fruitful and often crazy. The second is called econophysics when applied to finance, even if the scientific field lending techniques is not physics. Borrowings from signal processing and natural language processing have been spectacularly successful in finance, but the jury is still out on whether other fields can produce worthwhile insights. Econophysics offers a lot of promise and has had some minor successes, but more frustration and failure than progress. I’m doing something different. The idea that risk can be analyzed using probability distributions and utility functions is embedded deeply in economics.


pages: 496 words: 131,938

The Future Is Asian by Parag Khanna

3D printing, Admiral Zheng, affirmative action, Airbnb, Amazon Web Services, anti-communist, Asian financial crisis, asset-backed security, augmented reality, autonomous vehicles, Ayatollah Khomeini, barriers to entry, Basel III, bike sharing, birth tourism, blockchain, Boycotts of Israel, Branko Milanovic, British Empire, call centre, capital controls, carbon footprint, cashless society, clean tech, clean water, cloud computing, colonial rule, commodity super cycle, computer vision, connected car, corporate governance, CRISPR, crony capitalism, cross-border payments, currency peg, death from overwork, deindustrialization, Deng Xiaoping, Didi Chuxing, Dissolution of the Soviet Union, Donald Trump, driverless car, dual-use technology, energy security, European colonialism, factory automation, failed state, fake news, falling living standards, family office, financial engineering, fixed income, flex fuel, gig economy, global reserve currency, global supply chain, Great Leap Forward, green transition, haute couture, haute cuisine, illegal immigration, impact investing, income inequality, industrial robot, informal economy, initial coin offering, Internet of things, karōshi / gwarosa / guolaosi, Kevin Kelly, Kickstarter, knowledge worker, light touch regulation, low cost airline, low skilled workers, Lyft, machine translation, Malacca Straits, Marc Benioff, Mark Zuckerberg, Masayoshi Son, megacity, megaproject, middle-income trap, Mikhail Gorbachev, money market fund, Monroe Doctrine, mortgage debt, natural language processing, Netflix Prize, new economy, off grid, oil shale / tar sands, open economy, Parag Khanna, payday loans, Pearl River Delta, prediction markets, purchasing power parity, race to the bottom, RAND corporation, rent-seeking, reserve currency, ride hailing / ride sharing, Ronald Reagan, Salesforce, Scramble for Africa, self-driving car, Shenzhen special economic zone, Silicon Valley, smart cities, SoftBank, South China Sea, sovereign wealth fund, special economic zone, stem cell, Steve Jobs, Steven Pinker, supply-chain management, sustainable-tourism, synthetic biology, systems thinking, tech billionaire, tech worker, trade liberalization, trade route, transaction costs, Travis Kalanick, uber lyft, upwardly mobile, urban planning, Vision Fund, warehouse robotics, Washington Consensus, working-age population, Yom Kippur War

Autonomous vehicles, energy-efficient power grids, and urban surveillance systems all rest on breakthroughs in AI such as neural networks, which Asians have developed at least a year ahead of their Western counterparts. Andrew Ng, a cofounder of Google Brain and Coursera who then became chief scientist at Baidu, argues that the complexities of Chinese characters and tones pushed Baidu toward advances in natural language processing (NLP) and voice recognition faster than its Western peers. Google’s AI was built on text collected from computers, whereas Baidu from the start focused on location-based data and images collected from mobile devices. Large data sets are the fuel that powers the AI rocket. Alibaba has its customers’ e-commerce and banking transaction data, while Tencent’s data has expanded with its range of customer services while also integrating voice and facial recognition, a field in which the Beijing-based SenseTime is a global leader.


Virtual Competition by Ariel Ezrachi, Maurice E. Stucke

"World Economic Forum" Davos, Airbnb, Alan Greenspan, Albert Einstein, algorithmic management, algorithmic trading, Arthur D. Levinson, barriers to entry, behavioural economics, cloud computing, collaborative economy, commoditize, confounding variable, corporate governance, crony capitalism, crowdsourcing, Daniel Kahneman / Amos Tversky, David Graeber, deep learning, demand response, Didi Chuxing, digital capitalism, disintermediation, disruptive innovation, double helix, Downton Abbey, driverless car, electricity market, Erik Brynjolfsson, Evgeny Morozov, experimental economics, Firefox, framing effect, Google Chrome, independent contractor, index arbitrage, information asymmetry, interest rate derivative, Internet of things, invisible hand, Jean Tirole, John Markoff, Joseph Schumpeter, Kenneth Arrow, light touch regulation, linked data, loss aversion, Lyft, Mark Zuckerberg, market clearing, market friction, Milgram experiment, multi-sided market, natural language processing, Network effects, new economy, nowcasting, offshore financial centre, pattern recognition, power law, prediction markets, price discrimination, price elasticity of demand, price stability, profit maximization, profit motive, race to the bottom, rent-seeking, Richard Thaler, ride hailing / ride sharing, road to serfdom, Robert Bork, Ronald Reagan, search costs, self-driving car, sharing economy, Silicon Valley, Skype, smart cities, smart meter, Snapchat, social graph, Steve Jobs, sunk-cost fallacy, supply-chain management, telemarketer, The Chicago School, The Myth of the Rational Market, The Wealth of Nations by Adam Smith, too big to fail, transaction costs, Travis Kalanick, turn-by-turn navigation, two-sided market, Uber and Lyft, Uber for X, uber lyft, vertical integration, Watson beat the top human players on Jeopardy!, women in the workforce, yield management

Another example concerns the combination of smart algorithms with Facebook’s vast user base, to improve targeting of ads and promotions. In its annual developer conference in 2016, the company discussed the way artificial intelligence (AI) could interact with the rich flow of data from its users. Facebook CEO Mark Zuckerberg noted how “with AI and natural language processing combined with human help, people will be able to talk to Messenger bots just like they talk to friends.”55 David Marcus, VP of messaging products, reported how the company is “testing if business bots can re-engage people on threads with sponsored messages.”56 Not surprisingly, Apple, Amazon, Google and Microsoft are also investing in voice-activated digital assistants that “learn” to make decisions rather than simply follow instructions.57 The future of instant and online communications will heavily rely on the mutually reinforcing relationship between Big Data and Big Analytics.


pages: 475 words: 134,707

The Hype Machine: How Social Media Disrupts Our Elections, Our Economy, and Our Health--And How We Must Adapt by Sinan Aral

Airbnb, Albert Einstein, algorithmic bias, AlphaGo, Any sufficiently advanced technology is indistinguishable from magic, AOL-Time Warner, augmented reality, behavioural economics, Bernie Sanders, Big Tech, bitcoin, Black Lives Matter, Cambridge Analytica, carbon footprint, Cass Sunstein, computer vision, contact tracing, coronavirus, correlation does not imply causation, COVID-19, crowdsourcing, cryptocurrency, data science, death of newspapers, deep learning, deepfake, digital divide, digital nomad, disinformation, disintermediation, Donald Trump, Drosophila, Edward Snowden, Elon Musk, en.wikipedia.org, end-to-end encryption, Erik Brynjolfsson, experimental subject, facts on the ground, fake news, Filter Bubble, George Floyd, global pandemic, hive mind, illegal immigration, income inequality, Kickstarter, knowledge worker, lockdown, longitudinal study, low skilled workers, Lyft, Mahatma Gandhi, Mark Zuckerberg, Menlo Park, meta-analysis, Metcalfe’s law, mobile money, move fast and break things, multi-sided market, Nate Silver, natural language processing, Neal Stephenson, Network effects, performance metric, phenotype, recommendation engine, Robert Bork, Robert Shiller, Russian election interference, Second Machine Age, seminal paper, sentiment analysis, shareholder value, Sheryl Sandberg, skunkworks, Snapchat, social contagion, social distancing, social graph, social intelligence, social software, social web, statistical model, stem cell, Stephen Hawking, Steve Bannon, Steve Jobs, Steve Jurvetson, surveillance capitalism, Susan Wojcicki, Telecommunications Act of 1996, The Chicago School, the strength of weak ties, The Wisdom of Crowds, theory of mind, TikTok, Tim Cook: Apple, Uber and Lyft, uber lyft, WikiLeaks, work culture , Yogi Berra

Observed activity gives the platform a sense of the popularity of a topic, while the difference between the observed activity and the expected activity for a topic gives the platform a sense of the topic’s novelty. Timeliness is then captured by measuring popularity and novelty in the most recent time periods. But how do the platforms identify topics to begin with? Machine learning and natural language processing can analyze the free-form text posted to social media, but it’s computationally challenging and inefficient to analyze the growing volume of user-generated content without some guidance. So the platforms have widely adopted hashtags as labels signifying topics. This takes the engineering burden off them and harnesses the crowd of users to label topics themselves.
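The popularity/novelty split described above can be sketched in a few lines: treat popularity as the observed count of posts carrying a hashtag in the current window, and novelty as the gap between that count and a historically expected count. All names here (`trending_scores`, `expected_counts`) are illustrative and do not correspond to any platform's actual API.

```python
from collections import Counter


def trending_scores(recent_posts, expected_counts):
    """Score hashtags by popularity (raw count in the recent window) and
    novelty (count minus historical expectation). Computing both over the
    most recent window is what captures timeliness."""
    observed = Counter(tag for post in recent_posts for tag in post["tags"])
    scores = {}
    for tag, count in observed.items():
        novelty = count - expected_counts.get(tag, 0)
        scores[tag] = {"popularity": count, "novelty": novelty}
    return scores


posts = [
    {"tags": ["#election"]},
    {"tags": ["#election", "#debate"]},
    {"tags": ["#election"]},
]
scores = trending_scores(posts, expected_counts={"#election": 3, "#debate": 0})
# "#election" is more popular, but only "#debate" exceeds its expected level
```

Note how the hashtag labels do the topic-identification work for free, exactly the engineering shortcut the excerpt describes.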


pages: 528 words: 146,459

Computer: A History of the Information Machine by Martin Campbell-Kelly, William Aspray, Nathan L. Ensmenger, Jeffrey R. Yost

Ada Lovelace, air freight, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Apple's 1984 Super Bowl advert, barriers to entry, Bill Gates: Altair 8800, Bletchley Park, borderless world, Buckminster Fuller, Build a better mousetrap, Byte Shop, card file, cashless society, Charles Babbage, cloud computing, combinatorial explosion, Compatible Time-Sharing System, computer age, Computer Lib, deskilling, don't be evil, Donald Davies, Douglas Engelbart, Douglas Engelbart, Dynabook, Edward Jenner, Evgeny Morozov, Fairchild Semiconductor, fault tolerance, Fellow of the Royal Society, financial independence, Frederick Winslow Taylor, game design, garden city movement, Gary Kildall, Grace Hopper, Herman Kahn, hockey-stick growth, Ian Bogost, industrial research laboratory, informal economy, interchangeable parts, invention of the wheel, Ivan Sutherland, Jacquard loom, Jeff Bezos, jimmy wales, John Markoff, John Perry Barlow, John von Neumann, Ken Thompson, Kickstarter, light touch regulation, linked data, machine readable, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Mitch Kapor, Multics, natural language processing, Network effects, New Journalism, Norbert Wiener, Occupy movement, optical character recognition, packet switching, PageRank, PalmPilot, pattern recognition, Pierre-Simon Laplace, pirate software, popular electronics, prediction markets, pre–internet, QWERTY keyboard, RAND corporation, Robert X Cringely, Salesforce, scientific management, Silicon Valley, Silicon Valley startup, Steve Jobs, Steven Levy, Stewart Brand, Ted Nelson, the market place, Turing machine, Twitter Arab Spring, Vannevar Bush, vertical integration, Von Neumann architecture, Whole Earth Catalog, William Shockley: the traitorous eight, women in the workforce, young professional

was already well established when two other Stanford University doctoral students, Larry Page and Sergey Brin, began work on the Stanford Digital Library Project (funded in part by the National Science Foundation)—research that would not only forever change the process of finding things on the Internet but also, in time, lead to an unprecedentedly successful web advertising model. Page became interested in a dissertation project on the mathematical properties of the web, and found strong support from his adviser Terry Winograd, a pioneer of artificial intelligence research on natural language processing. Using a “web crawler” to gather back-link data (that is, the websites that linked to a particular site), Page, now teamed up with Brin, created their “PageRank” algorithm based on back-links ranked by importance—the more prominent the linking site, the more influence it would have on the linked site’s page rank.
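The back-link weighting the excerpt describes can be illustrated with a toy power-iteration version of PageRank. This is the textbook formulation, not Google's production algorithm, and the three-page graph is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration. `links` maps each page to the
    pages it links to; a page's rank is spread evenly over its out-links,
    so a link from a prominent page carries more weight."""
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_ranks = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank everywhere
                for p in pages:
                    new_ranks[p] += damping * ranks[page] / n
            else:
                share = damping * ranks[page] / len(outlinks)
                for target in outlinks:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# "c" is linked by both other pages, so it ends up with the highest rank.
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Because each page's rank is fully redistributed on every pass, the scores stay a probability distribution, which is the "importance flows along links" intuition behind back-link ranking.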


pages: 523 words: 143,139

Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian, Tom Griffiths

4chan, Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, algorithmic bias, algorithmic trading, anthropic principle, asset allocation, autonomous vehicles, Bayesian statistics, behavioural economics, Berlin Wall, Big Tech, Bill Duvall, bitcoin, Boeing 747, Charles Babbage, cognitive load, Community Supported Agriculture, complexity theory, constrained optimization, cosmological principle, cryptocurrency, Danny Hillis, data science, David Heinemeier Hansson, David Sedaris, delayed gratification, dematerialisation, diversification, Donald Knuth, Donald Shoup, double helix, Dutch auction, Elon Musk, exponential backoff, fault tolerance, Fellow of the Royal Society, Firefox, first-price auction, Flash crash, Frederick Winslow Taylor, fulfillment center, Garrett Hardin, Geoffrey Hinton, George Akerlof, global supply chain, Google Chrome, heat death of the universe, Henri Poincaré, information retrieval, Internet Archive, Jeff Bezos, Johannes Kepler, John Nash: game theory, John von Neumann, Kickstarter, knapsack problem, Lao Tzu, Leonard Kleinrock, level 1 cache, linear programming, martingale, multi-armed bandit, Nash equilibrium, natural language processing, NP-complete, P = NP, packet switching, Pierre-Simon Laplace, power law, prediction markets, race to the bottom, RAND corporation, RFC: Request For Comment, Robert X Cringely, Sam Altman, scientific management, sealed-bid auction, second-price auction, self-driving car, Silicon Valley, Skype, sorting algorithm, spectrum auction, Stanford marshmallow experiment, Steve Jobs, stochastic process, Thomas Bayes, Thomas Malthus, Tragedy of the Commons, traveling salesman, Turing machine, urban planning, Vickrey auction, Vilfredo Pareto, Walter Mischel, Y Combinator, zero-sum game

Computers, Environment and Urban Systems 32, no. 6 (2008): 431–439. Berezovsky, Boris, and Alexander V. Gnedin. Problems of Best Choice (in Russian). Moscow: Akademia Nauk, 1984. Berg-Kirkpatrick, Taylor, and Dan Klein. “Decipherment with a Million Random Restarts.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing (2013): 874–878. Bernardo, Antonio E., and Ivo Welch. “On the Evolution of Overconfidence and Entrepreneurs.” Journal of Economics & Management Strategy 10, no. 3 (2001): 301–330. Berry, Donald A. “A Bernoulli Two-Armed Bandit.” Annals of Mathematical Statistics 43 (1972): 871–897. ______.


Beginning R: The Statistical Programming Language by Mark Gardener

correlation coefficient, distributed generation, natural language processing, New Urbanism, p-value, statistical model

Table 1-1: Task Views and Their Uses

Bayesian: Bayesian Inference
ChemPhys: Chemometrics and Computational Physics
ClinicalTrials: Clinical Trial Design, Monitoring, and Analysis
Cluster: Cluster Analysis & Finite Mixture Models
Distributions: Probability Distributions
Econometrics: Computational Econometrics
Environmetrics: Analysis of Ecological and Environmental Data
ExperimentalDesign: Design of Experiments (DoE) & Analysis of Experimental Data
Finance: Empirical Finance
Genetics: Statistical Genetics
Graphics: Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization
gR: gRaphical Models in R
HighPerformanceComputing: High-Performance and Parallel Computing with R
MachineLearning: Machine Learning & Statistical Learning
MedicalImaging: Medical Image Analysis
Multivariate: Multivariate Statistics
NaturalLanguageProcessing: Natural Language Processing
OfficialStatistics: Official Statistics & Survey Methodology
Optimization: Optimization and Mathematical Programming
Pharmacokinetics: Analysis of Pharmacokinetic Data
Phylogenetics: Phylogenetics, Especially Comparative Methods
Psychometrics: Psychometric Models and Methods
ReproducibleResearch: Reproducible Research
Robust: Robust Statistical Methods
SocialSciences: Statistics for the Social Sciences
Spatial: Analysis of Spatial Data
Survival: Survival Analysis
TimeSeries: Time Series Analysis

Alternatively, you can search the Internet for your topic and you will likely find quite a few hits that mention appropriate R packages.


Sorting Things Out: Classification and Its Consequences (Inside Technology) by Geoffrey C. Bowker

affirmative action, business process, classic study, corporate governance, Drosophila, government statistician, information retrieval, loose coupling, Menlo Park, Mitch Kapor, natural language processing, Occam's razor, QWERTY keyboard, Scientific racism, scientific worldview, sexual politics, statistical model, Stephen Hawking, Stewart Brand, tacit knowledge, the built environment, the medium is the message, the strength of weak ties, transaction costs, William of Occam

A technical issue about how to code process can become a challenge to organizational theory and its database. A defense of process can become an attack on the scientific world-view. Susan Grobe, a nursing informatician, has made one of the chief attacks on the NIC scheme. She believes that rather than standardized nursing language, computer scientists should develop natural language processing tools so that nurse narratives can be interpreted. Grobe argues for the abandonment of any goal of producing "a single coherent account of the pattern of action and beliefs in science" (Grobe 1992, 92). She goes on to say that "philosophers of science have long acknowledged the value of a multiplicity of scientific views" (92).


pages: 517 words: 147,591

Small Wars, Big Data: The Information Revolution in Modern Conflict by Eli Berman, Joseph H. Felter, Jacob N. Shapiro, Vestal Mcintyre

basic income, call centre, centre right, classic study, clean water, confounding variable, crowdsourcing, data science, demand response, drone strike, experimental economics, failed state, George Akerlof, Google Earth, guns versus butter model, HESCO bastion, income inequality, income per capita, information asymmetry, Internet of things, iterative process, land reform, mandatory minimum, minimum wage unemployment, moral hazard, natural language processing, operational security, RAND corporation, randomized controlled trial, Ronald Reagan, school vouchers, statistical model, the scientific method, trade route, Twitter Arab Spring, unemployed young men, WikiLeaks, World Values Survey

This is the same type of instrumental variables approach as Jake and coauthors’ work on civilian casualties and informing in Afghanistan that we discussed in chapter 7. 67. Vanden Eynde, “Targets of Violence,” 1–2. 68. Thiemo Fetzer, “Social Insurance and Conflict: Evidence from India” (EOPP Working Paper No. 53, 2014). One innovative aspect of this research is the use of natural language processing to code incident reports in order to extract perpetrator and victim types. 69. Eli Berman, Jacob N. Shapiro, and Joseph H. Felter, “Can Hearts and Minds Be Bought? The Economics of Counterinsurgency in Iraq,” Journal of Political Economy 119, no. 4 (2011): 766–819, appendix A, p. 813. 70.


pages: 688 words: 147,571

Robot Rules: Regulating Artificial Intelligence by Jacob Turner

"World Economic Forum" Davos, Ada Lovelace, Affordable Care Act / Obamacare, AI winter, algorithmic bias, algorithmic trading, AlphaGo, artificial general intelligence, Asilomar, Asilomar Conference on Recombinant DNA, autonomous vehicles, backpropagation, Basel III, bitcoin, Black Monday: stock market crash in 1987, blockchain, brain emulation, Brexit referendum, Cambridge Analytica, Charles Babbage, Clapham omnibus, cognitive dissonance, Computing Machinery and Intelligence, corporate governance, corporate social responsibility, correlation does not imply causation, crowdsourcing, data science, deep learning, DeepMind, Demis Hassabis, distributed ledger, don't be evil, Donald Trump, driverless car, easy for humans, difficult for computers, effective altruism, Elon Musk, financial exclusion, financial innovation, friendly fire, future of work, hallucination problem, hive mind, Internet of things, iterative process, job automation, John Markoff, John von Neumann, Loebner Prize, machine readable, machine translation, medical malpractice, Nate Silver, natural language processing, Nick Bostrom, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, nudge unit, obamacare, off grid, OpenAI, paperclip maximiser, pattern recognition, Peace of Westphalia, Philippa Foot, race to the bottom, Ray Kurzweil, Recombinant DNA, Rodney Brooks, self-driving car, Silicon Valley, Stanislav Petrov, Stephen Hawking, Steve Wozniak, strong AI, technological singularity, Tesla Model S, The Coming Technological Singularity, The Future of Employment, The Signal and the Noise by Nate Silver, trolley problem, Turing test, Vernor Vinge

Therefore, before it is possible to demonstrate the spreading influence of AI or the need for legal controls, we must first set out what we mean by this term.

2 Narrow and General AI

It is helpful at the outset to distinguish two classifications for AI: narrow and general.18 Narrow (sometimes referred to as “weak”) AI denotes the ability of a system to achieve a certain stipulated goal or set of goals, in a manner or using techniques which qualify as intelligent (the meaning of “intelligence” is addressed below). These limited goals might include natural language processing functions like translation, or navigating through an unfamiliar physical environment. A narrow AI system is suited only to the task for which it is designed. The great majority of AI systems in the world today are closer to this narrow and limited type. General (or “strong”) AI is the ability to achieve an unlimited range of goals, and even to set new goals independently, including in situations of uncertainty or vagueness.


Data Wrangling With Python: Tips and Tools to Make Your Life Easier by Jacqueline Kazil

Amazon Web Services, bash_history, business logic, cloud computing, correlation coefficient, crowdsourcing, data acquisition, data science, database schema, Debian, en.wikipedia.org, Fairphone, Firefox, Global Witness, Google Chrome, Hacker News, job automation, machine readable, Nate Silver, natural language processing, pull request, Ronald Reagan, Ruby on Rails, selection bias, social web, statistical model, web application, WikiLeaks

Fuzzy Matching

If you are using more than one dataset or unclean, unstandardized data, you might use fuzzy matching to find and combine duplicates. Fuzzy matching allows you to determine if two items (usually strings) are “the same.” While not as in-depth as using natural language processing or machine learning to determine a match with big datasets on language, fuzzy matching can help us relate “My dog & I” and “me and my dog” as having similar meaning. There are many ways to go about fuzzy matching. One Python library, developed by SeatGeek, uses some pretty cool methods internally to match tickets being sold online for different events.
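A minimal illustration of the idea, using Python's standard-library difflib rather than the SeatGeek library the excerpt alludes to. The token-sorting trick shown here (normalize, sort the words, then compare the rebuilt strings) is one common way fuzzy matchers cope with reordered phrases; the function name is my own.

```python
import difflib
import re


def token_sort_ratio(a, b):
    """Similarity score in [0, 1] that ignores word order: lowercase,
    strip punctuation, sort the tokens, then compare the rebuilt strings
    with difflib's SequenceMatcher."""
    def normalize(s):
        return " ".join(sorted(re.findall(r"[a-z0-9]+", s.lower())))
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


score = token_sort_ratio("My dog & I", "me and my dog")
# loosely worded, reordered phrases still score as close matches
```

Sorting the tokens first is what lets "My dog & I" and "me and my dog" line up at all; a plain character-level comparison of the raw strings would score them much lower.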


pages: 565 words: 151,129

The Zero Marginal Cost Society: The Internet of Things, the Collaborative Commons, and the Eclipse of Capitalism by Jeremy Rifkin

3D printing, active measures, additive manufacturing, Airbnb, autonomous vehicles, back-to-the-land, benefit corporation, big-box store, bike sharing, bioinformatics, bitcoin, business logic, business process, Chris Urmson, circular economy, clean tech, clean water, cloud computing, collaborative consumption, collaborative economy, commons-based peer production, Community Supported Agriculture, Computer Numeric Control, computer vision, crowdsourcing, demographic transition, distributed generation, DIY culture, driverless car, Eben Moglen, electricity market, en.wikipedia.org, Frederick Winslow Taylor, Free Software Foundation, Garrett Hardin, general purpose technology, global supply chain, global village, Hacker Conference 1984, Hacker Ethic, industrial robot, informal economy, information security, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invisible hand, Isaac Newton, James Watt: steam engine, job automation, John Elkington, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Julian Assange, Kickstarter, knowledge worker, longitudinal study, low interest rates, machine translation, Mahatma Gandhi, manufacturing employment, Mark Zuckerberg, market design, mass immigration, means of production, meta-analysis, Michael Milken, mirror neurons, natural language processing, new economy, New Urbanism, nuclear winter, Occupy movement, off grid, off-the-grid, oil shale / tar sands, pattern recognition, peer-to-peer, peer-to-peer lending, personalized medicine, phenotype, planetary scale, price discrimination, profit motive, QR code, RAND corporation, randomized controlled trial, Ray Kurzweil, rewilding, RFID, Richard Stallman, risk/return, Robert Solow, Rochdale Principles, Ronald Coase, scientific management, search inside the book, self-driving car, shareholder value, sharing economy, Silicon Valley, Skype, smart cities, smart grid, smart meter, social web, software as a service, spectrum auction, Steve Jobs, Stewart Brand, the built environment, the Cathedral and the Bazaar, the long tail, The Nature of the Firm, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas L Friedman, too big to fail, Tragedy of the Commons, transaction costs, urban planning, vertical integration, warehouse automation, Watson beat the top human players on Jeopardy!, web application, Whole Earth Catalog, Whole Earth Review, WikiLeaks, working poor, Yochai Benkler, zero-sum game, Zipcar

The Big Ten Network uses algorithms to create original pieces posted just seconds after games, eliminating human copywriters.37 Artificial intelligence took a big leap into the future in 2011 when an IBM computer, Watson—named after IBM’s past chairman—took on Ken Jennings, who held the record of 74 wins on the popular TV show Jeopardy, and defeated him. The showdown, which netted a $1 million prize for IBM, blew away TV viewers as they watched their Jeopardy hero crumble in the presence of the “all-knowing” Watson. Watson is a cognitive system that is able to integrate “natural language processing, machine learning, and hypothesis generation and evaluation,” says its proud IBM parent, allowing it to think and respond to questions and problems.38 Watson is already being put to work. IBM Healthcare Analytics will use Watson to assist physicians in making quick and accurate diagnoses by analyzing Big Data stored in the electronic health records of millions of patients, as well as in medical journals.39 IBM’s plans for Watson go far beyond serving the specialized needs of the research industry and the back-office tasks of managing Big Data.


pages: 543 words: 153,550

Model Thinker: What You Need to Know to Make Data Work for You by Scott E. Page

Airbnb, Albert Einstein, Alfred Russel Wallace, algorithmic trading, Alvin Roth, assortative mating, behavioural economics, Bernie Madoff, bitcoin, Black Swan, blockchain, business cycle, Capital in the Twenty-First Century by Thomas Piketty, Checklist Manifesto, computer age, corporate governance, correlation does not imply causation, cuban missile crisis, data science, deep learning, deliberate practice, discrete time, distributed ledger, Easter island, en.wikipedia.org, Estimating the Reproducibility of Psychological Science, Everything should be made as simple as possible, experimental economics, first-price auction, Flash crash, Ford Model T, Geoffrey West, Santa Fe Institute, germ theory of disease, Gini coefficient, Higgs boson, High speed trading, impulse control, income inequality, Isaac Newton, John von Neumann, Kenneth Rogoff, knowledge economy, knowledge worker, Long Term Capital Management, loss aversion, low skilled workers, Mark Zuckerberg, market design, meta-analysis, money market fund, multi-armed bandit, Nash equilibrium, natural language processing, Network effects, opioid epidemic / opioid crisis, p-value, Pareto efficiency, pattern recognition, Paul Erdős, Paul Samuelson, phenotype, Phillips curve, power law, pre–internet, prisoner's dilemma, race to the bottom, random walk, randomized controlled trial, Richard Feynman, Richard Thaler, Robert Solow, school choice, scientific management, sealed-bid auction, second-price auction, selection bias, six sigma, social graph, spectrum auction, statistical model, Stephen Hawking, Supply of New York City Cabdrivers, systems thinking, tacit knowledge, The Bell Curve by Richard Herrnstein and Charles Murray, The Great Moderation, the long tail, The Rise and Fall of American Growth, the rule of 72, the scientific method, The Spirit Level, the strength of weak ties, The Wisdom of Crowds, Thomas Malthus, Thorstein Veblen, Tragedy of the Commons, urban sprawl, value at risk, web application, winner-take-all 
economy, zero-sum game

Student academic performance data now includes scores on every homework, paper, quiz, and exam, as opposed to semester-end summary grades. In the past, a farmer might mention dry ground at a monthly Grange meeting. Now, tractors transmit instantaneous data on soil conditions and moisture levels in square-foot increments. Investment firms track dozens of ratios and trends for thousands of stocks and use natural-language processing tools to parse documents. Doctors can pull up page upon page of individual patient records that can include relevant genetic markers. A mere twenty-five years ago, most of us had access to little more than a few bookshelves’ worth of knowledge. Perhaps your place of work had a small reference library, or at home you had a collection of encyclopedias and a few dozen reference books.


pages: 579 words: 160,351

Breaking News: The Remaking of Journalism and Why It Matters Now by Alan Rusbridger

"World Economic Forum" Davos, accounting loophole / creative accounting, Airbnb, Andy Carvin, banking crisis, Bellingcat, Bernie Sanders, Bletchley Park, Boris Johnson, Brexit referendum, Cambridge Analytica, centre right, Chelsea Manning, citizen journalism, country house hotel, cross-subsidies, crowdsourcing, data science, David Attenborough, David Brooks, death of newspapers, Donald Trump, Doomsday Book, Double Irish / Dutch Sandwich, Downton Abbey, Edward Snowden, Etonian, Evgeny Morozov, fake news, Filter Bubble, folksonomy, forensic accounting, Frank Gehry, future of journalism, G4S, high net worth, information security, invention of movable type, invention of the printing press, Jeff Bezos, jimmy wales, Julian Assange, Large Hadron Collider, Laura Poitras, Mark Zuckerberg, Mary Meeker, Menlo Park, natural language processing, New Journalism, offshore financial centre, oil shale / tar sands, open borders, packet switching, Panopticon Jeremy Bentham, post-truth, pre–internet, ransomware, recommendation engine, Ruby on Rails, sexual politics, Silicon Valley, Skype, Snapchat, social web, Socratic dialogue, sovereign wealth fund, speech recognition, Steve Bannon, Steve Jobs, the long tail, The Wisdom of Crowds, Tim Cook: Apple, traveling salesman, upwardly mobile, WikiLeaks, Yochai Benkler

Joanna Geary,10 whom we’d hired in 2011 to look after social media, posted on Facebook in late 2017: About 10 years ago I thought I might need to learn Ruby on Rails [to build web apps] to understand what’s going on in journalism. Then, about 5 years after that, I thought I might need an MBA. Now, the qualifications I need are probably in: Computer Science, Data Science, Natural Language Processing, Graph Analysis, Advanced Critical Thinking, Anthropology, Behavioural Sciences, Product Management, Business Administration, Social Psychology, Coaching & People Development, Change Management. I think I need to lie down . . . We had recruited two stars of the digital news universe – Wolfgang Blau from Die Zeit in Germany and Aron Pilhofer from the New York Times11 – and relaunched the website in a design that worked much better over desktop, tablet and mobile.


We Are the Nerds: The Birth and Tumultuous Life of Reddit, the Internet's Culture Laboratory by Christine Lagorio-Chafkin

"Friedman doctrine" OR "shareholder theory", 4chan, Aaron Swartz, Airbnb, Amazon Web Services, Bernie Sanders, big-box store, bitcoin, blockchain, Brewster Kahle, Burning Man, compensation consultant, crowdsourcing, cryptocurrency, data science, David Heinemeier Hansson, digital rights, disinformation, Donald Trump, East Village, eternal september, fake news, game design, Golden Gate Park, growth hacking, Hacker News, hiring and firing, independent contractor, Internet Archive, Jacob Appelbaum, Jeff Bezos, jimmy wales, Joi Ito, Justin.tv, Kickstarter, Large Hadron Collider, Lean Startup, lolcat, Lyft, Marc Andreessen, Mark Zuckerberg, medical residency, minimum viable product, natural language processing, Palm Treo, Paul Buchheit, Paul Graham, paypal mafia, Peter Thiel, plutocrats, QR code, r/findbostonbombers, recommendation engine, RFID, rolodex, Ruby on Rails, Sam Altman, Sand Hill Road, Saturday Night Live, self-driving car, semantic web, Sheryl Sandberg, side project, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, slashdot, Snapchat, Social Justice Warrior, social web, South of Market, San Francisco, Startup school, Stephen Hawking, Steve Bannon, Steve Jobs, Steve Wozniak, Streisand effect, technoutopianism, uber lyft, Wayback Machine, web application, WeWork, WikiLeaks, Y Combinator

Still, Slowe had one foot firmly in startup land and did not want to extract it. He’d developed a daily routine of waking early and working all day in the lab, only taking a break to train for a half-marathon over lunch. In the evening, he’d put in another five or six hours brainstorming about natural-language processing for his startup. He lived on caffeine and adrenaline, and he loved it. For Slowe, by year four of his graduate work, physics had become a job in the most meh sense of the word. He wasn’t dissatisfied, and he still assumed that his lab life—stuck in windowless rooms modeling out hypotheses on stodgy computers, experimenting only a tiny fraction of the time—resembled his future postdoctoral life.


pages: 626 words: 167,836

The Technology Trap: Capital, Labor, and Power in the Age of Automation by Carl Benedikt Frey

3D printing, AlphaGo, Alvin Toffler, autonomous vehicles, basic income, Bernie Sanders, Branko Milanovic, British Empire, business cycle, business process, call centre, Cambridge Analytica, Capital in the Twenty-First Century by Thomas Piketty, Charles Babbage, Clayton Christensen, collective bargaining, computer age, computer vision, Corn Laws, Cornelius Vanderbilt, creative destruction, data science, David Graeber, David Ricardo: comparative advantage, deep learning, DeepMind, deindustrialization, demographic transition, desegregation, deskilling, Donald Trump, driverless car, easy for humans, difficult for computers, Edward Glaeser, Elon Musk, Erik Brynjolfsson, everywhere but in the productivity statistics, factory automation, Fairchild Semiconductor, falling living standards, first square of the chessboard / second half of the chessboard, Ford Model T, Ford paid five dollars a day, Frank Levy and Richard Murnane: The New Division of Labor, full employment, future of work, game design, general purpose technology, Gini coefficient, Great Leap Forward, Hans Moravec, high-speed rail, Hyperloop, income inequality, income per capita, independent contractor, industrial cluster, industrial robot, intangible asset, interchangeable parts, Internet of things, invention of agriculture, invention of movable type, invention of the steam engine, invention of the wheel, Isaac Newton, James Hargreaves, James Watt: steam engine, Jeremy Corbyn, job automation, job satisfaction, job-hopping, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kickstarter, Kiva Systems, knowledge economy, knowledge worker, labor-force participation, labour mobility, Lewis Mumford, Loebner Prize, low skilled workers, machine translation, Malcom McLean invented shipping containers, manufacturing employment, mass immigration, means of production, Menlo Park, minimum wage unemployment, natural language processing, new 
economy, New Urbanism, Nick Bostrom, Norbert Wiener, nowcasting, oil shock, On the Economy of Machinery and Manufactures, OpenAI, opioid epidemic / opioid crisis, Pareto efficiency, pattern recognition, pink-collar, Productivity paradox, profit maximization, Renaissance Technologies, rent-seeking, rising living standards, Robert Gordon, Robert Solow, robot derives from the Czech word robota Czech, meaning slave, safety bicycle, Second Machine Age, secular stagnation, self-driving car, seminal paper, Silicon Valley, Simon Kuznets, social intelligence, sparse data, speech recognition, spinning jenny, Stephen Hawking, tacit knowledge, The Future of Employment, The Rise and Fall of American Growth, The Wealth of Nations by Adam Smith, Thomas Malthus, total factor productivity, trade route, Triangle Shirtwaist Factory, Turing test, union organizing, universal basic income, warehouse automation, washing machines reduced drudgery, wealth creators, women in the workforce, working poor, zero-sum game

And in August 2017, a research paper published by Microsoft’s AI team revealed additional improvements, reducing the error rate from 6 percent to 5 percent.17 And just as image recognition technology promises to replace doctors in diagnostic tasks, advances in speech recognition and user interfaces promise to replace workers in some interactive tasks. As we all know, Apple’s Siri, Google Assistant, and Amazon’s Alexa rely on natural user interfaces to recognize spoken words, interpret their meanings, and respond to them accordingly. Using speech recognition technology and natural language processing, a company called Clinc is now developing a new AI voice assistant to be used in drive-through windows of fast-food restaurants like McDonald’s and Taco Bell.18 And in 2018, Google announced that it is building AI technology to replace workers in call centers. Virtual agents will answer the phone when a customer calls.


pages: 505 words: 161,581

The Founders: The Story of Paypal and the Entrepreneurs Who Shaped Silicon Valley by Jimmy Soni

activist fund / activist shareholder / activist investor, Ada Lovelace, AltaVista, Apple Newton, barriers to entry, Big Tech, bitcoin, Blitzscaling, book value, business logic, butterfly effect, call centre, Carl Icahn, Claude Shannon: information theory, cloud computing, Colonization of Mars, Computing Machinery and Intelligence, corporate governance, COVID-19, crack epidemic, cryptocurrency, currency manipulation / currency intervention, digital map, disinformation, disintermediation, drop ship, dumpster diving, Elon Musk, Fairchild Semiconductor, fear of failure, fixed income, General Magic , general-purpose programming language, Glass-Steagall Act, global macro, global pandemic, income inequality, index card, index fund, information security, intangible asset, Internet Archive, iterative process, Jeff Bezos, Jeff Hawkins, John Markoff, Kwajalein Atoll, Lyft, Marc Andreessen, Mark Zuckerberg, Mary Meeker, Max Levchin, Menlo Park, Metcalfe’s law, mobile money, money market fund, multilevel marketing, mutually assured destruction, natural language processing, Network effects, off-the-grid, optical character recognition, PalmPilot, pattern recognition, paypal mafia, Peter Thiel, pets.com, Potemkin village, public intellectual, publish or perish, Richard Feynman, road to serfdom, Robert Metcalfe, Robert X Cringely, rolodex, Sand Hill Road, Satoshi Nakamoto, seigniorage, shareholder value, side hustle, Silicon Valley, Silicon Valley startup, slashdot, SoftBank, software as a service, Startup school, Steve Ballmer, Steve Jobs, Steve Jurvetson, Steve Wozniak, technoutopianism, the payments system, transaction costs, Turing test, uber lyft, Vanguard fund, winner-take-all economy, Y Combinator, Y2K

After prison, Stephen used his programming talents to run a software consultancy, then built a start-up whose logistics technology helped schools, companies, and other venues remain open during the COVID-19 pandemic. Stephen even earned a patent—US10417204B2, “Method and system for creation and delivery of dynamic communications”—for his work on natural language processing. Chris wasn’t far behind. He started two businesses, wrote a widely praised book, and launched a second career as a globe-trotting artist—a remarkable post-prison trajectory culminating with an appearance on The Daily Show with Trevor Noah to promote his book, The Master Plan. Both Chris and Stephen live life with a rare sense of urgency—an urgency that comes from a keen awareness of life’s preciousness.


pages: 725 words: 168,262

API Design Patterns by Jj Geewax

Amazon Web Services, anti-pattern, bitcoin, blockchain, business logic, cognitive load, continuous integration, COVID-19, database schema, en.wikipedia.org, exponential backoff, imposter syndrome, Internet of things, Kubernetes, lateral thinking, loose coupling, machine readable, microservices, natural language processing, Paradox of Choice, ride hailing / ride sharing, social graph, sorting algorithm

This context is a huge asset generally, but in this case it’s more of a liability: it makes us bad at naming things. For example, the term topic is often used in the context of asynchronous messaging (e.g., Apache Kafka or RabbitMQ); however, it’s also used in a specific area of machine learning and natural language processing called topic modeling. If you were to use the term topic in your machine learning API, it wouldn’t be all that surprising that users might be confused about which type of topic you’re referring to. If that’s a real possibility (perhaps your API uses both asynchronous messaging and topic modeling), you might want to choose a more expressive name than topic, such as model_topic or messaging_topic to prevent user confusion.

3.2.2 Simple

While an expressive name is certainly important, it can also become burdensome if the name is excessively long without adding additional clarity.
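The topic disambiguation described above can be sketched in code. This is a hypothetical illustration, not an API from the book: the request type and the field names model_topic and messaging_topic are assumed, and the point is only that each name says which sense of "topic" it carries.

```python
# Hypothetical request type for an API that touches both topic modeling
# and asynchronous messaging. Disambiguated field names make clear which
# kind of "topic" each value refers to.
from dataclasses import dataclass


@dataclass
class ClassifyDocumentRequest:
    document: str
    model_topic: str      # a topic in the topic-modeling sense
    messaging_topic: str  # the messaging topic (queue) to publish results to


req = ClassifyDocumentRequest(
    document="quarterly earnings report",
    model_topic="finance",
    messaging_topic="classification-results",
)
print(req.model_topic)      # prints "finance"
print(req.messaging_topic)  # prints "classification-results"
```

A reader of this request never has to guess which "topic" is meant, at the cost of slightly longer names.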


pages: 584 words: 187,436

More Money Than God: Hedge Funds and the Making of a New Elite by Sebastian Mallaby

Alan Greenspan, Andrei Shleifer, Asian financial crisis, asset-backed security, automated trading system, bank run, barriers to entry, Bear Stearns, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, Big bang: deregulation of the City of London, Bonfire of the Vanities, book value, Bretton Woods, business cycle, buy and hold, capital controls, Carmen Reinhart, collapse of Lehman Brothers, collateralized debt obligation, computerized trading, corporate raider, Credit Default Swap, credit default swaps / collateralized debt obligations, crony capitalism, currency manipulation / currency intervention, currency peg, deal flow, do well by doing good, Elliott wave, Eugene Fama: efficient market hypothesis, failed state, Fall of the Berlin Wall, financial deregulation, financial engineering, financial innovation, financial intermediation, fixed income, full employment, German hyperinflation, High speed trading, index fund, Jim Simons, John Bogle, John Meriwether, junk bonds, Kenneth Rogoff, Kickstarter, Long Term Capital Management, low interest rates, machine translation, margin call, market bubble, market clearing, market fundamentalism, Market Wizards by Jack D. 
Schwager, Mary Meeker, merger arbitrage, Michael Milken, money market fund, moral hazard, Myron Scholes, natural language processing, Network effects, new economy, Nikolai Kondratiev, operational security, pattern recognition, Paul Samuelson, pre–internet, proprietary trading, public intellectual, quantitative hedge fund, quantitative trading / quantitative finance, random walk, Renaissance Technologies, Richard Thaler, risk-adjusted returns, risk/return, Robert Mercer, rolodex, Savings and loan crisis, Sharpe ratio, short selling, short squeeze, Silicon Valley, South Sea Bubble, sovereign wealth fund, statistical arbitrage, statistical model, survivorship bias, tail risk, technology bubble, The Great Moderation, The Myth of the Rational Market, the new new thing, too big to fail, transaction costs, two and twenty, uptick rule

It is also interesting that Brown and Mercer’s coauthors who followed them to Renaissance, Stephen and Vincent Della Pietra, explicitly presented their experience with statistical machine translation as relevant to finding order in other types of data, including financial data. See Adam L. Berger, Stephen A. Della Pietra, and Vincent J. Della Pietra, “A Maximum Entropy Approach to Natural Language Processing,” Computational Linguistics 22, no. 1 (March 1996): pp. 39–71. 30. To manage the potential linguistic chaos resulting from this permissiveness, neologisms had to be submitted to a review. Mercer interview. 31. The Russian employees were Pavel Volfbeyn and Alexander Belopolsky. The firm that they defected to was Millennium.


pages: 661 words: 187,613

The Language Instinct: How the Mind Creates Language by Steven Pinker

Albert Einstein, Boeing 747, cloud computing, Computing Machinery and Intelligence, David Attenborough, double helix, Drosophila, elephant in my pajamas, finite state, Gregor Mendel, illegal immigration, Joan Didion, language acquisition, Loebner Prize, mass immigration, Maui Hawaii, meta-analysis, MITM: man-in-the-middle, natural language processing, out of africa, phenotype, rolodex, Ronald Reagan, Sapir-Whorf hypothesis, Saturday Night Live, speech recognition, Steven Pinker, Strategic Defense Initiative, tacit knowledge, theory of mind, transatlantic slave trade, Turing machine, Turing test, twin studies, Yogi Berra

.), 1964, Samuel Johnson’s Dictionary: A modern selection. New York: Pantheon. Joos, M. (Ed.) 1957. Readings in linguistics: The development of descriptive linguistics in America since 1925. Washington, D.C.: American Council of Learned Societies. Jordan, M. I., & Rosenbaum, D. 1989. Action. In Posner, 1989. Joshi, A. K. 1991. Natural language processing. Science, 253, 1242–1249. Kaplan, R. 1972. Augmented transition networks as psychological models of sentence comprehension. Artificial Intelligence, 3, 77–100. Kaplan, S. 1992. Environmental preference in a knowledge-seeking, knowledge-using organism. In Barkow, Cosmides, & Tooby, 1992.


pages: 651 words: 186,130

This Is How They Tell Me the World Ends: The Cyberweapons Arms Race by Nicole Perlroth

4chan, active measures, activist lawyer, air gap, Airbnb, Albert Einstein, Apollo 11, barriers to entry, Benchmark Capital, Bernie Sanders, Big Tech, bitcoin, Black Lives Matter, blood diamond, Boeing 737 MAX, Brexit referendum, Brian Krebs, Citizen Lab, cloud computing, commoditize, company town, coronavirus, COVID-19, crony capitalism, crowdsourcing, cryptocurrency, dark matter, David Vincenzetti, defense in depth, digital rights, disinformation, don't be evil, Donald Trump, driverless car, drone strike, dual-use technology, Edward Snowden, end-to-end encryption, failed state, fake news, false flag, Ferguson, Missouri, Firefox, gender pay gap, George Floyd, global pandemic, global supply chain, Hacker News, index card, information security, Internet of things, invisible hand, Jacob Appelbaum, Jeff Bezos, John Markoff, Ken Thompson, Kevin Roose, Laura Poitras, lockdown, Marc Andreessen, Mark Zuckerberg, mass immigration, Menlo Park, MITM: man-in-the-middle, moral hazard, Morris worm, move fast and break things, mutually assured destruction, natural language processing, NSO Group, off-the-grid, offshore financial centre, open borders, operational security, Parler "social media", pirate software, purchasing power parity, race to the bottom, RAND corporation, ransomware, Reflections on Trusting Trust, rolodex, Rubik’s Cube, Russian election interference, Sand Hill Road, Seymour Hersh, Sheryl Sandberg, side project, Silicon Valley, Skype, smart cities, smart grid, South China Sea, Steve Ballmer, Steve Bannon, Steve Jobs, Steven Levy, Stuxnet, supply-chain attack, TED Talk, the long tail, the scientific method, TikTok, Tim Cook: Apple, undersea cable, unit 8200, uranium enrichment, web application, WikiLeaks, zero day, Zimmermann PGP

, besting the game’s all-time human champions and proving that machines were now capable of understanding questions and answering them in natural languages. A short eight months later, Apple introduced the world to Siri, our new voice assistant, whose high-quality voice recognition and natural language processing let us send emails and texts and set reminders and playlists. The resulting mix of mass mobility, connectivity, storage, processing, and computational power gave NSA unprecedented openings and capabilities to track every last person and sensor on earth. Over the next decade the NSA continued to probe every last pore of this new digital dimension for exploitation, surveillance, and future attack.


pages: 677 words: 206,548

Future Crimes: Everything Is Connected, Everyone Is Vulnerable and What We Can Do About It by Marc Goodman

23andMe, 3D printing, active measures, additive manufacturing, Affordable Care Act / Obamacare, Airbnb, airport security, Albert Einstein, algorithmic trading, Alvin Toffler, Apollo 11, Apollo 13, artificial general intelligence, Asilomar, Asilomar Conference on Recombinant DNA, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, Bill Joy: nanobots, bitcoin, Black Swan, blockchain, borderless world, Boston Dynamics, Brian Krebs, business process, butterfly effect, call centre, Charles Lindbergh, Chelsea Manning, Citizen Lab, cloud computing, Cody Wilson, cognitive dissonance, computer vision, connected car, corporate governance, crowdsourcing, cryptocurrency, data acquisition, data is the new oil, data science, Dean Kamen, deep learning, DeepMind, digital rights, disinformation, disintermediation, Dogecoin, don't be evil, double helix, Downton Abbey, driverless car, drone strike, Edward Snowden, Elon Musk, Erik Brynjolfsson, Evgeny Morozov, Filter Bubble, Firefox, Flash crash, Free Software Foundation, future of work, game design, gamification, global pandemic, Google Chrome, Google Earth, Google Glasses, Gordon Gekko, Hacker News, high net worth, High speed trading, hive mind, Howard Rheingold, hypertext link, illegal immigration, impulse control, industrial robot, information security, Intergovernmental Panel on Climate Change (IPCC), Internet of things, Jaron Lanier, Jeff Bezos, job automation, John Harrison: Longitude, John Markoff, Joi Ito, Jony Ive, Julian Assange, Kevin Kelly, Khan Academy, Kickstarter, Kiva Systems, knowledge worker, Kuwabatake Sanjuro: assassination market, Large Hadron Collider, Larry Ellison, Laura Poitras, Law of Accelerating Returns, Lean Startup, license plate recognition, lifelogging, litecoin, low earth orbit, M-Pesa, machine translation, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Metcalfe’s law, MITM: man-in-the-middle, mobile money, more computing power than Apollo, move fast and break things, Nate Silver, 
national security letter, natural language processing, Nick Bostrom, obamacare, Occupy movement, Oculus Rift, off grid, off-the-grid, offshore financial centre, operational security, optical character recognition, Parag Khanna, pattern recognition, peer-to-peer, personalized medicine, Peter H. Diamandis: Planetary Resources, Peter Thiel, pre–internet, printed gun, RAND corporation, ransomware, Ray Kurzweil, Recombinant DNA, refrigerator car, RFID, ride hailing / ride sharing, Rodney Brooks, Ross Ulbricht, Russell Brand, Salesforce, Satoshi Nakamoto, Second Machine Age, security theater, self-driving car, shareholder value, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, SimCity, Skype, smart cities, smart grid, smart meter, Snapchat, social graph, SoftBank, software as a service, speech recognition, stealth mode startup, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, Stuxnet, subscription business, supply-chain management, synthetic biology, tech worker, technological singularity, TED Talk, telepresence, telepresence robot, Tesla Model S, The future is already here, The Future of Employment, the long tail, The Wisdom of Crowds, Tim Cook: Apple, trade route, uranium enrichment, Virgin Galactic, Wall-E, warehouse robotics, Watson beat the top human players on Jeopardy!, Wave and Pay, We are Anonymous. We are Legion, web application, Westphalian system, WikiLeaks, Y Combinator, you are the product, zero day

Follow that out further to, say, 2045, we will have multiplied the intelligence, the human biological machine intelligence of our civilization a billion-fold. RAY KURZWEIL In 2011, we all watched with awe when IBM’s Watson supercomputer beat the world champions on the television game show Jeopardy! Using artificial intelligence and natural language processing, Watson digested over 200 million pages of structured and unstructured data, which it processed at a rate of eighty teraflops—that’s eighty trillion operations per second. In doing so, it handily defeated Ken Jennings, a human Jeopardy! contestant who had won seventy-four games in a row.


pages: 706 words: 202,591

Facebook: The Inside Story by Steven Levy

active measures, Airbnb, Airbus A320, Amazon Mechanical Turk, AOL-Time Warner, Apple's 1984 Super Bowl advert, augmented reality, Ben Horowitz, Benchmark Capital, Big Tech, Black Lives Matter, Blitzscaling, blockchain, Burning Man, business intelligence, Cambridge Analytica, cloud computing, company town, computer vision, crowdsourcing, cryptocurrency, data science, deep learning, disinformation, don't be evil, Donald Trump, Dunbar number, East Village, Edward Snowden, El Camino Real, Elon Musk, end-to-end encryption, fake news, Firefox, Frank Gehry, Geoffrey Hinton, glass ceiling, GPS: selective availability, growth hacking, imposter syndrome, indoor plumbing, information security, Jeff Bezos, John Markoff, Jony Ive, Kevin Kelly, Kickstarter, lock screen, Lyft, machine translation, Mahatma Gandhi, Marc Andreessen, Marc Benioff, Mark Zuckerberg, Max Levchin, Menlo Park, Metcalfe’s law, MITM: man-in-the-middle, move fast and break things, natural language processing, Network effects, Oculus Rift, operational security, PageRank, Paul Buchheit, paypal mafia, Peter Thiel, pets.com, post-work, Ray Kurzweil, recommendation engine, Robert Mercer, Robert Metcalfe, rolodex, Russian election interference, Salesforce, Sam Altman, Sand Hill Road, self-driving car, sexual politics, Sheryl Sandberg, Shoshana Zuboff, side project, Silicon Valley, Silicon Valley startup, skeuomorphism, slashdot, Snapchat, social contagion, social graph, social software, South of Market, San Francisco, Startup school, Steve Ballmer, Steve Bannon, Steve Jobs, Steven Levy, Steven Pinker, surveillance capitalism, tech billionaire, techlash, Tim Cook: Apple, Tragedy of the Commons, web application, WeWork, WikiLeaks, women in the workforce, Y Combinator, Y2K, you are the product

(Cox had moved into the “Truckin’” house.) Also living there was Ezra Callahan, who’d moved there from the Los Altos house. Every day Callahan would come home and tell Cox how amazing Facebook was, and that he should come work there. Cox would say he wasn’t interested. Why would a Stanford AI graduate with dreams of solving natural language processing work for a silly company with posts and pokes? Callahan ultimately convinced him to come in, and he interviewed with Moskovitz, Jeff Rothschild, and Adam D’Angelo. Moskovitz explained to him that Facebook was the seed of a collaboratively created directory of people with one authentic representation for every individual.


pages: 1,409 words: 205,237

Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale by Jan Kunigk, Ian Buss, Paul Wilkinson, Lars George

Amazon Web Services, barriers to entry, bitcoin, business intelligence, business logic, business process, cloud computing, commoditize, computer vision, continuous integration, create, read, update, delete, data science, database schema, Debian, deep learning, DevOps, domain-specific language, fault tolerance, Firefox, FOSDEM, functional programming, Google Chrome, Induced demand, information security, Infrastructure as a Service, Internet of things, job automation, Kickstarter, Kubernetes, level 1 cache, loose coupling, microservices, natural language processing, Network effects, platform as a service, single source of truth, source of truth, statistical model, vertical integration, web application

While certainly a hyped term, machine learning goes beyond classic statistics, with more advanced algorithms that predict an outcome by learning from the data—often without explicitly being programmed. The most advanced methods in machine learning, referred to as deep learning, are able to automatically discover the relevant data features for learning, which essentially enables use cases like computer vision, natural language processing, or fraud detection for any corporation. Many machine learning algorithms (even fairly simple ones) benefit from big data in a disproportionate, even unreasonable way, an effect that was described as early as 2001.2 As big data becomes readily available in more and more organizations, machine learning becomes a defining movement in the overall IT industry to take advantage of this effect.


Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, billion-dollar mistake, bitcoin, blockchain, business intelligence, business logic, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, data science, database schema, deep learning, DevOps, distributed ledger, Donald Knuth, Edward Snowden, end-to-end encryption, Ethereum, ethereum blockchain, exponential backoff, fake news, fault tolerance, finite state, Flash crash, Free Software Foundation, full text search, functional programming, general-purpose programming language, Hacker News, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, Ken Thompson, Kubernetes, Large Hadron Collider, level 1 cache, loose coupling, machine readable, machine translation, Marc Andreessen, microservices, natural language processing, Network effects, no silver bullet, operational security, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, SQL injection, statistical model, surveillance capitalism, systematic bias, systems thinking, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

For example:

• A secondary index is a kind of derived dataset with a straightforward transformation function: for each row or document in the base table, it picks out the values in the columns or fields being indexed, and sorts by those values (assuming a B-tree or SSTable index, which are sorted by key, as discussed in Chapter 3).

• A full-text search index is created by applying various natural language processing functions such as language detection, word segmentation, stemming or lemmatization, spelling correction, and synonym identification, followed by building a data structure for efficient lookups (such as an inverted index).

• In a machine learning system, we can consider the model as being derived from the training data by applying various feature extraction and statistical analysis functions.
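The full-text indexing pipeline in the second bullet can be sketched as a minimal Python example. The regex tokenizer and the crude suffix-stripping "stemmer" below are toy stand-ins for real language-processing components, not the book's implementation:

```python
import re
from collections import defaultdict

def tokenize(text):
    # Word segmentation: lowercase and keep alphabetic runs.
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    # Toy stand-in for stemming/lemmatization: strip a few common suffixes.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def build_inverted_index(docs):
    # Derived dataset: term -> sorted list of IDs of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[stem(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "Indexing documents for searching",
    2: "The search index maps terms to documents",
}
index = build_inverted_index(docs)
print(index["search"])  # [1, 2] -- "searching" and "search" share a stem
```

The index is purely derived: rerunning `build_inverted_index` over the base documents reproduces it exactly, which is the property the chapter is drawing attention to.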


pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil

additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, backpropagation, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business cycle, business intelligence, c2.com, call centre, carbon-based life, cellular automata, Charles Babbage, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, coronavirus, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, digital divide, disintermediation, double helix, Douglas Hofstadter, en.wikipedia.org, epigenetics, factory automation, friendly AI, functional programming, George Gilder, Gödel, Escher, Bach, Hans Moravec, hype cycle, informal economy, information retrieval, information security, invention of the telephone, invention of the telescope, invention of writing, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Marshall McLuhan, Mikhail Gorbachev, Mitch Kapor, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Nick Bostrom, Norbert Wiener, oil shale / tar sands, optical character recognition, PalmPilot, pattern recognition, phenotype, power law, precautionary principle, premature optimization, punch-card reader, quantum cryptography, quantum entanglement, radical life extension, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Robert Metcalfe, Rodney Brooks, scientific worldview, Search for Extraterrestrial Intelligence, selection bias, semantic web, seminal paper, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, Stuart Kauffman, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, Thomas Bayes, transaction costs, Turing machine, Turing test, two and twenty, Vernor Vinge, Y2K, Yogi Berra

No simple tricks, short of fully mastering the principles of human intelligence, will allow a computerized system to convincingly emulate human conversation, even if restricted to just text messages. This was Turing's enduring insight in designing his eponymous test based entirely on written language. Although not yet at human levels, natural language-processing systems are making solid progress. Search engines have become so popular that "Google" has gone from a proper noun to a common verb, and its technology has revolutionized research and access to knowledge. Google and other search engines use AI-based statistical-learning methods and logical inference to determine the ranking of links.


pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, billion-dollar mistake, bitcoin, blockchain, business intelligence, business logic, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, data science, database schema, deep learning, DevOps, distributed ledger, Donald Knuth, Edward Snowden, end-to-end encryption, Ethereum, ethereum blockchain, exponential backoff, fake news, fault tolerance, finite state, Flash crash, Free Software Foundation, full text search, functional programming, general-purpose programming language, Hacker News, informal economy, information retrieval, Infrastructure as a Service, Internet of things, iterative process, John von Neumann, Ken Thompson, Kubernetes, Large Hadron Collider, level 1 cache, loose coupling, machine readable, machine translation, Marc Andreessen, microservices, natural language processing, Network effects, no silver bullet, operational security, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, SQL injection, statistical model, surveillance capitalism, systematic bias, systems thinking, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

For example:

• A secondary index is a kind of derived dataset with a straightforward transformation function: for each row or document in the base table, it picks out the values in the columns or fields being indexed, and sorts by those values (assuming a B-tree or SSTable index, which are sorted by key, as discussed in Chapter 3).

• A full-text search index is created by applying various natural language processing functions such as language detection, word segmentation, stemming or lemmatization, spelling correction, and synonym identification, followed by building a data structure for efficient lookups (such as an inverted index).

• In a machine learning system, we can consider the model as being derived from the training data by applying various feature extraction and statistical analysis functions.
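The secondary-index transformation in the first bullet—pick out the indexed column's value from each row, then sort entries by that value—fits in a few lines of Python. The `users` table and `city` column here are invented for illustration; a real B-tree or SSTable would store the same (value, key) pairs on disk in sorted order:

```python
def build_secondary_index(rows, column):
    # For each row in the base table, pick out the indexed column's value,
    # then sort the (value, primary_key) entries by that value.
    entries = [(row[column], key) for key, row in rows.items()]
    return sorted(entries)

users = {
    "u1": {"name": "Ada", "city": "London"},
    "u2": {"name": "Grace", "city": "Arlington"},
    "u3": {"name": "Alan", "city": "London"},
}
index = build_secondary_index(users, "city")
print(index)
# [('Arlington', 'u2'), ('London', 'u1'), ('London', 'u3')]
```

Because the index is a pure function of the base table, it can always be rebuilt from scratch, which is exactly what makes it a derived dataset in the chapter's sense.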