sentiment analysis

61 results

Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data by Dipanjan Sarkar

bioinformatics, business intelligence, computer vision, continuous integration, general-purpose programming language, Guido van Rossum, information retrieval, Internet of things, invention of the printing press, iterative process, natural language processing, out of africa, performance metric, premature optimization, recommendation engine, self-driving car, semantic web, sentiment analysis, speech recognition, statistical model, text mining, Turing test, web application

I encourage you to experiment with more propositions and FOL expressions by building your own assumptions, domain, and rules.
Sentiment Analysis
We will now discuss several concepts, techniques, and examples with regard to our second major topic in this chapter, sentiment analysis. Textual data, even though unstructured, mainly has two broad types of data points: fact based (objective) and opinion based (subjective). We briefly talked about these two categories at the beginning of this chapter when I introduced the concept of sentiment analysis and how it works best on text that has a subjective context. In general, social media, surveys, and feedback data are all heavily opinionated and express the beliefs, judgement, emotion, and feelings of human beings. Sentiment analysis, also popularly known as opinion analysis/mining, is defined as the process of using techniques like NLP, lexical resources, linguistics, and machine learning (ML) to extract subjective and opinion-related information such as emotions, attitude, mood, and modality, and then using these to compute the polarity expressed by a text document.

Automated Text Classification Text Classification Blueprint Text Normalization Feature Extraction Bag of Words Model TF-IDF Model Advanced Word Vectorization Models Classification Algorithms Multinomial Naïve Bayes Support Vector Machines Evaluating Classification Models Building a Multi-Class Classification System Applications and Uses Summary Chapter 5:​ Text Summarization Text Summarization and Information Extraction Important Concepts Documents Text Normalization Feature Extraction Feature Matrix Singular Value Decomposition Text Normalization Feature Extraction Keyphrase Extraction Collocations Weighted Tag–Based Phrase Extraction Topic Modeling Latent Semantic Indexing Latent Dirichlet Allocation Non-negative Matrix Factorization Extracting Topics from Product Reviews Automated Document Summarization Latent Semantic Analysis TextRank Summarizing a Product Description Summary Chapter 6:​ Text Similarity and Clustering Important Concepts Information Retrieval (IR) Feature Engineering Similarity Measures Unsupervised Machine Learning Algorithms Text Normalization Feature Extraction Text Similarity Analyzing Term Similarity Hamming Distance Manhattan Distance Euclidean Distance Levenshtein Edit Distance Cosine Distance and Similarity Analyzing Document Similarity Cosine Similarity Hellinger-Bhattacharya Distance Okapi BM25 Ranking Document Clustering Clustering Greatest Movies of All Time K-means Clustering Affinity Propagation Ward’s Agglomerative Hierarchical Clustering Summary Chapter 7:​ Semantic and Sentiment Analysis Semantic Analysis Exploring WordNet Understanding Synsets Analyzing Lexical Semantic Relations Word Sense Disambiguation Named Entity Recognition Analyzing Semantic Representations Propositional Logic First Order Logic Sentiment Analysis Sentiment Analysis of IMDb Movie Reviews Setting Up Dependencies Preparing Datasets Supervised Machine Learning Technique Unsupervised Lexicon-based Techniques Comparing Model Performances Summary Index 
Contents at a Glance About the Author About the Technical Reviewer Acknowledgments Introduction Chapter 1:​ Natural Language Basics Chapter 2:​ Python Refresher Chapter 3:​ Processing and Understanding Text Chapter 4:​ Text Classification Chapter 5:​ Text Summarization Chapter 6:​ Text Similarity and Clustering Chapter 7:​ Semantic and Sentiment Analysis Index About the Author and About the Technical Reviewer About the Author Dipanjan Sarkar is a data scientist at Intel, the world’s largest silicon company, which is on a mission to make the world more connected and productive.

Sentiment analysis is perhaps the most popular application of text analytics, with a vast number of tutorials, web sites, and applications that focus on analyzing the sentiment of various text resources ranging from corporate surveys to movie reviews. The key aspect of sentiment analysis is to analyze a body of text to understand the opinion it expresses, along with other factors like mood and modality. Usually sentiment analysis works better on text that has a subjective context than on text with only an objective context. This is because when a body of text has an objective context or perspective to it, the text usually depicts some normal statements or facts without expressing any emotion, feelings, or mood. Subjective text is usually written by a human expressing typical moods, emotions, and feelings. Sentiment analysis is widely used, especially as a part of social media analysis for any domain, be it a business, a recent movie, or a product launch, to understand its reception by the people and what they think of it based on their opinions or, you guessed it, sentiment.

pages: 527 words: 147,690

Terms of Service: Social Media and the Price of Constant Connection by Jacob Silverman

23andMe, 4chan, A Declaration of the Independence of Cyberspace, Airbnb, airport security, Amazon Mechanical Turk, augmented reality, basic income, Brian Krebs, California gold rush, call centre, cloud computing, cognitive dissonance, commoditize, correlation does not imply causation, Credit Default Swap, crowdsourcing, don't be evil, drone strike, Edward Snowden, feminist movement, Filter Bubble, Firefox, Flash crash, game design, global village, Google Chrome, Google Glasses, hive mind, income inequality, informal economy, information retrieval, Internet of things, Jaron Lanier, jimmy wales, Kevin Kelly, Kickstarter, knowledge economy, knowledge worker, late capitalism, license plate recognition, life extension, lifelogging, Lyft, Mark Zuckerberg, Mars Rover, Marshall McLuhan, mass incarceration, meta analysis, meta-analysis, Minecraft, move fast and break things, national security letter, Network effects, new economy, Nicholas Carr, Occupy movement, optical character recognition, payday loans, Peter Thiel, postindustrial economy, prediction markets, pre–internet, price discrimination, price stability, profit motive, quantitative hedge fund, race to the bottom, Ray Kurzweil, recommendation engine, rent control, RFID, ride hailing / ride sharing, self-driving car, sentiment analysis, shareholder value, sharing economy, Silicon Valley, Silicon Valley ideology, Snapchat, social graph, social intelligence, social web, sorting algorithm, Steve Ballmer, Steve Jobs, Steven Levy, TaskRabbit, technoutopianism, telemarketer, transportation-network company, Travis Kalanick, Turing test, Uber and Lyft, Uber for X, uber lyft, universal basic income, unpaid internship, women in the workforce, Y Combinator, Zipcar

BehaviorMatrix says that its program examined cancer blogs and discovered that cancer patients are most optimistic just after receiving their diagnosis. This insight might be useful for therapists, doctors, and public health professionals, but the company’s CEO told the Wall Street Journal that he drew on this information to advise drug companies in their ad targeting. The most likely application of sentiment analysis, then, is to give a slight edge to hedge funds and advertisers. At the very least, a gaggle of digital media consultants are pulling down hefty fees selling these services to deep-pocketed corporate clients. But what happens when sentiment analysis is not just spilling out reports for an executive’s consumption but is actually linked to potentially vital systems? And what happens then if a network becomes seeded with misinformation? You might just crash the stock market. On April 23, 2013, the Associated Press’s official Twitter account sent out the following tweet: “Breaking: Two Explosions in the White House and Barack Obama is injured.”

To become part of the social web, then, is to join the networks of surveillance, tracking, and data circulation that now support a vast informational economy and increasingly shape our social and cultural lives. Few aspects of contemporary life have gone unaffected by this shift, by the ability to publish immediately, freely, and to a massive audience. Shareability, and the drive to rack up likes and other metrics, guides the agendas of magazine editors and the budgets of marketers. Sentiment analysis—the mining of social-network data to determine the attitudes of individuals or whole populations—helps intelligence analysts learn where potential extremists are becoming radicalized. Advertisers collect social-media data and form consumer profiles with tens of thousands of pieces of information. Large corporations use social media to befriend customers, offer personalized customer service, and churn out friendly propaganda.

These companies will tinker with policies, especially after every public outrage and class-action lawsuit, but the end point remains the same: to retain rights over your data and expressions, and to make the transition from a status update to a related, paid advertisement as smooth as possible. CONVERTING EMOTIONS INTO PROFITABLE DATA Like buttons and taggable emotions are just two features of what has become a like economy, which depends on the growth of sentiment analysis, the examination of huge data sets to find out how people are reacting to news, products, or the events of their own lives. Retailers and advertisers want to know what individual consumers are thinking and buying, but they, along with investors, banks, consultants, and others, also want to be able to take the pulse of public opinion. To do this, they try to tap into the welter of data we produce on social media and blogs and also in traditional news media, review sites, message boards, and interviews with corporate executives.

pages: 337 words: 86,320

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz

affirmative action, AltaVista, Amazon Mechanical Turk, Asian financial crisis, Bernie Sanders, big data - Walmart - Pop Tarts, Cass Sunstein, computer vision, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, desegregation, Donald Trump, Edward Glaeser, Filter Bubble, game design, happiness index / gross national happiness, income inequality, Jeff Bezos, John Snow's cholera map, longitudinal study, Mark Zuckerberg, Nate Silver, peer-to-peer lending, Peter Thiel, price discrimination, quantitative hedge fund, Ronald Reagan, Rosa Parks, sentiment analysis, Silicon Valley, statistical model, Steve Jobs, Steven Levy, Steven Pinker, TaskRabbit, The Signal and the Noise by Nate Silver, working poor

[Figure: DRINK.WORK.PRAY; panels: 19- to 22-year-olds, 23- to 29-year-olds, 30- to 65-year-olds]
A powerful new tool for analyzing text is something called sentiment analysis. Scientists can now estimate how happy or sad a particular passage of text is. How? Teams of scientists have asked large numbers of people to code tens of thousands of words in the English language as positive or negative. The most positive words, according to this methodology, include “happy,” “love,” and “awesome.” The most negative words include “sad,” “death,” and “depression.” They thus have built an index of the mood of a huge set of words. Using this index, they can measure the average mood of words in a passage of text. If someone writes “I am happy and in love and feeling awesome,” sentiment analysis would code that as extremely happy text. If someone writes “I am sad thinking about all the world’s death and depression,” sentiment analysis would code that as extremely sad text.
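The word-index approach described in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six-word lexicon and its +1/-1 scores are hypothetical stand-ins for the human-coded index the excerpt mentions, and the function name average_mood is invented here.

```python
import re

# Hypothetical mini-lexicon. The words come from the excerpt; the +1/-1
# scores are illustrative assumptions, not values from a published index.
MOOD_LEXICON = {
    "happy": 1.0, "love": 1.0, "awesome": 1.0,
    "sad": -1.0, "death": -1.0, "depression": -1.0,
}


def average_mood(text, lexicon=MOOD_LEXICON):
    """Return the mean score of the rated words in text, or None if none are rated."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return None  # no rated words, so no mood estimate
    return sum(scores) / len(scores)
```

Averaging only over words that appear in the lexicon is one possible design choice; unrated words such as "I" or "and" are simply ignored rather than counted as neutral.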

These days, when people sit down to read, most of the time it is to peruse status updates on Facebook. But, once upon a time, not so long ago, human beings read stories, sometimes in books. Sentiment analysis can teach us a lot here, too. A team of scientists, led by Andy Reagan, now at the University of California at Berkeley School of Information, downloaded the text of thousands of books and movie scripts. They could then code how happy or sad each point of the story was. Consider, for example, the book Harry Potter and the Deathly Hallows. Here, from that team of scientists, is how the mood of the story changes, along with a description of key plot points. Note that the many rises and falls in mood that the sentiment analysis detects correspond to key events. Most stories have simpler structures. Take, for example, Shakespeare’s tragedy King John.

Finally, John is poisoned by a disgruntled monk. And here is the sentiment analysis as the play progresses. In other words, just from the words, the computer was able to detect that things go from bad to worse to worst. Or consider the movie 127 Hours. A basic plot summary of this movie is as follows: A mountaineer goes to Utah’s Canyonlands National Park to hike. He befriends other hikers but then parts ways with them. Suddenly, he slips and knocks loose a boulder, which traps his hand and wrist. He attempts various escapes, but each one fails. He becomes depressed. Finally, he amputates his arm and escapes. He gets married, starts a family, and continues climbing, although now he makes sure to leave a note whenever he goes off. And here is the sentiment analysis as the movie progresses, again by Reagan’s team of scientists.

pages: 161 words: 39,526

Applied Artificial Intelligence: A Handbook for Business Leaders by Mariya Yao, Adelyn Zhou, Marlene Jia

Airbnb, Amazon Web Services, artificial general intelligence, autonomous vehicles, business intelligence, business process, call centre, chief data officer, computer vision, conceptual framework, future of work, industrial robot, Internet of things, iterative process, Jeff Bezos, job automation, Marc Andreessen, natural language processing, new economy, pattern recognition, performance metric, price discrimination, randomized controlled trial, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, skunkworks, software is eating the world, source of truth, speech recognition, statistical model, strong AI, technological singularity

These 29 gorgeous images created by Google’s AI raised almost $100,000 at auction. Business Insider. Retrieved from
(15) Goleman, D. (2008, March 24). When Emotional Intelligence Does Not Matter More Than IQ. Retrieved from
(16) Sentiment analysis. (n.d.). In Wikipedia. Retrieved on November 17, 2017, from
(17) Knight, W. (2016, June 13). Emotional intelligence might be a virtual assistant’s secret weapon. MIT Technology Review. Retrieved from
(18) Talbot, D. (2014, September 19). Apps for Autism. MIT Technology Review.

In 2016, Google hosted an exhibition of AI-generated art that collectively sold for $97,605.(14) Systems That Relate Daniel Goleman, a psychologist and author of the book Emotional Intelligence, believes that our emotional intelligence quotient (EQ) is more important than our intelligence quotient (IQ) in determining our success and happiness.(15) As human employees increasingly collaborate with AI tools at work and digital assistants like Apple’s Siri and Amazon Echo’s Alexa permeate our personal lives, machines will also need emotional intelligence to succeed in our society. Sentiment analysis, also known as opinion mining or emotion AI, extracts and quantifies emotional states from our text, voice, facial expressions, and body language.(16) Knowing a user’s affective state enables computers to respond empathetically and dynamically, as our friends do. The applications to digital assistants are obvious, and companies like Amazon are already prioritizing emotional recognition for voice products like the Echo.(17) Emotional awareness can also improve interpersonal business functions such as sales, marketing, and communications.

Data Connection Does the prospective product offer seamless connections with the other enterprise tools on which you depend, such as your data and analytics provider or CRM system? Is the integration built-in, and if so, is it offered via an application programming interface (API) or platform? If not, will it require custom development? Language Support If you’re working on a consumer-facing global product, such as a conversational agent or sentiment analysis, your solution may need to support additional languages. How many languages and types of voices does the prospective product support? Professional Support Most AI systems will need to be continually trained and updated. How accessible and competent is the vendor’s professional services team to help onboard and maintain your AI system? Particularly for large enterprises, does the vendor have the capability to support the scale of service that you require?

pages: 349 words: 98,868

Nervous States: Democracy and the Decline of Reason by William Davies

active measures, Affordable Care Act / Obamacare, Amazon Web Services, bank run, banking crisis, basic income, business cycle, Capital in the Twenty-First Century by Thomas Piketty, citizen journalism, Climategate, Climatic Research Unit, Colonization of Mars, continuation of politics by other means, creative destruction, credit crunch, decarbonisation, deindustrialization, discovery of penicillin, Dominic Cummings, Donald Trump, drone strike, Elon Musk, failed state, Filter Bubble, first-past-the-post, Frank Gehry, gig economy, housing crisis, income inequality, Isaac Newton, Jeff Bezos, Johannes Kepler, Joseph Schumpeter, knowledge economy, loss aversion, low skilled workers, Mahatma Gandhi, Mark Zuckerberg, mass immigration, meta analysis, meta-analysis, Mont Pelerin Society, mutually assured destruction, Northern Rock, obamacare, Occupy movement, pattern recognition, Peace of Westphalia, Peter Thiel, Philip Mirowski, planetary scale, post-industrial society, quantitative easing, RAND corporation, Ray Kurzweil, Richard Florida, road to serfdom, Robert Mercer, Ronald Reagan, sentiment analysis, Silicon Valley, Silicon Valley startup, smart cities, statistical model, Steve Jobs, the scientific method, Turing machine, Uber for X, universal basic income, University of East Anglia, Valery Gerasimov, We are the 99%, WikiLeaks, women in the workforce, zero-sum game

We have a wide vocabulary for naming and expressing these feelings. We communicate them physically in our facial expressions and body language. They tell us important things about our relationships, lifestyles, desires and identities. Feelings of this sort present themselves to our minds, such that we actually notice them, even if we can’t control them. Emotions can now be captured and algorithmically analyzed (“sentiment analysis”) thanks to the behavioral data that digital technologies collect. And yet feelings of this sort are not welcome everywhere. In public life, an accusation of being “emotional” traditionally carries the implication that someone has lost objectivity and given way to irrational forces. Feelings are how we orient ourselves, while also providing a reminder of shared humanity. Our capacity to feel pain and love is fundamental to how and why we care about each other.

Our contemporary notion of “viral marketing” (which subtly targets influential people, rather than communicating to the public all at once) is an example of systematically employed contagions. As more of our behavior and communication is digitally captured, and with rapid advances in “emotional artificial intelligence” (or “affective computing”), it is becoming possible to study the movement of emotions and sentiments through crowds with increasing scientific precision. Techniques of digital “sentiment analysis,” algorithmically trained upon social media content, facial movements, and other bodily cues, are taking Le Bon’s biological approach to psychology and turning it into a whole industry of market research. The emotional content of a tweet, eye movement, or tone of voice can now be captured and analyzed. Faces in crowds can be recognized by smart cameras, which have been put to work in a pilot surveillance project by security services in Chongqing, China.

This is the crux of the problem. Unlike statisticians or social scientists, Silicon Valley is not seeking to create an accurate portrait of society, but to provide the infrastructure on which we all depend, which will then capture our movements and sentiments with the utmost sensitivity. Advances in machine-learning techniques have improved sensitivity beyond that of human consciousness. “Sentiment analysis” involves training algorithms to detect different types of emotion in a given sentence, and can be used to monitor the emotions being expressed on Twitter, Facebook, email, or (due to voice-recognition technology) phones. “Facial analytics” does something similar to detect how someone might be feeling from the movements in their face, and can now apparently be used to detect a person’s sexuality.11 The entire field of “affective computing,” which is transforming market research, uses machine learning to enable computers to identify emotions by means of body language and behavior.

pages: 688 words: 107,867

Python Data Analytics: With Pandas, NumPy, and Matplotlib by Fabio Nelli

Amazon Web Services, centre right, computer vision, Debian, DevOps, Google Earth, Guido van Rossum, Internet of things, optical character recognition, pattern recognition, sentiment analysis, speech recognition, statistical model, web application

doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN" "">\r\n<html>\r\n<hea'
Now, however, the conversion into an NLTK corpus requires an additional library, bs4 (BeautifulSoup), which provides you with suitable parsers that can recognize HTML tags and extract the text contained in them.
    import nltk
    from bs4 import BeautifulSoup
    # html is assumed to hold the page source whose beginning is shown above
    raw = BeautifulSoup(html, "lxml").get_text()  # strip the tags, keep the text
    tokens = nltk.word_tokenize(raw)              # split the text into word tokens
    text = nltk.Text(tokens)                      # wrap the tokens as an NLTK text
Now you also have a corpus in this case, even if you often have to perform more complex cleaning operations than in the previous case to eliminate the words that do not interest you.
Sentimental Analysis
Sentimental analysis is a new field of research that has developed very recently in order to evaluate people’s opinions about a particular topic. This discipline is based on different techniques that use text analysis, and its field of work is the world of social media and forums (opinion mining). Thanks to comments and reviews by users, sentimental analysis algorithms can evaluate the degree of appreciation or evaluation based on certain keywords. This degree of appreciation is called opinion and has three possible values: positive, neutral, or negative. The assessment of this opinion thus becomes a form of classification. So many sentimental analysis techniques are actually classification algorithms similar to those you saw in previous chapters covering machine learning and deep learning (see Chapters 8 and 9).
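To make the three-valued opinion concrete, here is a minimal keyword-based sketch. It is not the trained classifier the book goes on to describe: the keyword sets and the function name classify_opinion are hypothetical, chosen only to show how a comment can be mapped to positive, neutral, or negative.

```python
import re

# Hypothetical keyword sets, chosen only for illustration.
POSITIVE_WORDS = {"good", "great", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate"}


def classify_opinion(comment):
    """Label a comment 'positive', 'neutral', or 'negative' by keyword balance."""
    words = re.findall(r"[a-z]+", comment.lower())
    balance = (sum(w in POSITIVE_WORDS for w in words)
               - sum(w in NEGATIVE_WORDS for w in words))
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"  # no keywords, or positives and negatives cancel out
```

A fixed keyword rule like this ignores negation and context ("not good" still counts as positive), which is one reason classification algorithms trained on labeled examples are often preferred in practice.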

Population in 2014 Conclusions Chapter 12: Recognizing Handwritten Digits Handwriting Recognition Recognizing Handwritten Digits with scikit-learn The Digits Dataset Learning and Predicting Recognizing Handwritten Digits with TensorFlow Learning and Predicting Conclusions Chapter 13: Textual Data Analysis with NLTK Text Analysis Techniques The Natural Language Toolkit (NLTK) Import the NLTK Library and the NLTK Downloader Tool Search for a Word with NLTK Analyze the Frequency of Words Selection of Words from Text Bigrams and Collocations Use Text on the Network Extract the Text from the HTML Pages Sentimental Analysis Conclusions Chapter 14: Image Analysis and Computer Vision with OpenCV Image Analysis and Computer Vision OpenCV and Python OpenCV and Deep Learning Installing OpenCV First Approaches to Image Processing and Analysis Before Starting Load and Display an Image Working with Images Save the New Image Elementary Operations on Images Image Blending Image Analysis Edge Detection and Image Gradient Analysis Edge Detection The Image Gradient Theory A Practical Example of Edge Detection with the Image Gradient Analysis A Deep Learning Example: The Face Detection Conclusions Appendix A: Writing Mathematical Expressions with LaTeX With matplotlib With IPython Notebook in a Markdown Cell With IPython Notebook in a Python 2 Cell Subscripts and Superscripts Fractions, Binomials, and Stacked Numbers Radicals Fonts Accents Appendix B: Open Data Sources Political and Government Data Health Data Social Data Miscellaneous and Public Data Sets Financial Data Climatic Data Sports Data Publications, Newspapers, and Books Musical Data Index About the Author and About the Technical Reviewer About the Author Fabio Nelli is a data scientist and Python consultant, designing and developing Python applications for data analysis and visualization.

Comments on social networks and chats can also be a great source of data, especially to understand the degree of approval or disapproval of a particular topic. Analyzing these texts has therefore become a source of enormous interest, and there are many techniques that have been introduced for this purpose, creating a real discipline in itself. Some of the more important techniques are the following:
Analysis of the frequency distribution of words
Pattern recognition
Tagging
Analysis of links and associations
Sentiment analysis
The Natural Language Toolkit (NLTK)
If you program in Python and want to analyze data in text form, one of the most commonly used tools at the moment is the Python Natural Language Toolkit (NLTK). NLTK is nothing more than a Python library containing many tools specialized in processing and analyzing text data. NLTK was created in 2001 for educational purposes, then over time it developed to such an extent that it became a real analysis tool.

pages: 317 words: 87,566

The Happiness Industry: How the Government and Big Business Sold Us Well-Being by William Davies

1960s counterculture, Airbnb, business intelligence, corporate governance, dematerialisation, experimental subject, Exxon Valdez, Frederick Winslow Taylor, Gini coefficient, income inequality, intangible asset, invisible hand, joint-stock company, lifelogging, market bubble, mental accounting, nudge unit, Panopticon Jeremy Bentham, Philip Mirowski, profit maximization, randomized controlled trial, Richard Thaler, road to serfdom, Ronald Coase, Ronald Reagan, science of happiness, selective serotonin reuptake inhibitor (SSRI), sentiment analysis, sharing economy, Slavoj Žižek, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, social intelligence, Social Responsibility of Business Is to Increase Its Profits, Steve Jobs, The Chicago School, The Spirit Level, theory of mind, urban planning, Vilfredo Pareto

This book shares much of that disquiet. There are surely ample political and material problems to deal with right now, before we divert quite so much attention towards the mental and neural conditions through which we individually experience them. There is also a sense that when the doyens of the World Economic Forum seize an agenda with so much gusto, there is at least some cause for suspicion. The mood-tracking technologies, sentiment analysis algorithms and stress-busting meditation techniques are put to work in the service of certain political and economic interests. They are not simply gifted to us for our own Aristotelian flourishing. Positive psychology, which repeats the mantra that happiness is a personal ‘choice’, is as a result largely unable to provide the exit from consumerism and egocentricity that its gurus sense many people are seeking.

Companies such as Nike are now exploring ways in which health and fitness products can be sold alongside quantified self apps, which will allow individuals to make constant reports of their behaviour (such as jogging), generating new data sets for the company in the process. There is a third development, the political and philosophical implications of which are potentially the most radical of all. This concerns the capability to ‘teach’ computers how to interpret human behaviour in terms of the emotions that are conveyed. For example, the field of ‘sentiment analysis’ involves the design of algorithms to interpret the sentiment that is expressed in a given sentence, for example, a single tweet. The MIT Affective Computing research centre is dedicated to exploring new ways in which computers might read people’s moods through evaluating their facial expressions, or might carry out ‘emotionally intelligent’ conversations with people, to provide them with therapeutic support or friendship.

There are those who possess the power of algorithmic analysis and data mining to navigate a world in which there are too many pieces of data to be studied individually. These include market research agencies, social media platforms and the security services. But for the rest of us, impulse and emotion have become how we orientate and simplify our decisions. Hence the importance of fMRI and sentiment analysis in the digital age: tools which visualize, measure and codify our feelings become the main conduit between an esoteric, expert discourse of mathematics and facts, and a layperson’s discourse of mood, mystical belief and feeling. ‘We’ simply feel our way around, while ‘they’ observe and algorithmically analyse the results. Two separate languages are at work. The terminal dystopia of Benthamism, as touched on in Chapter 7, is of a social world that has been rendered totally objective, to the point where the distinction between the objective and the subjective is overcome.

pages: 321

Finding Alphas: A Quantitative Approach to Building Trading Strategies by Igor Tulchinsky

algorithmic trading, asset allocation, automated trading system, backtesting, barriers to entry, business cycle, buy and hold, capital asset pricing model, constrained optimization, corporate governance, correlation coefficient, credit crunch, Credit Default Swap, discounted cash flows, discrete time, diversification, diversified portfolio, Eugene Fama: efficient market hypothesis, financial intermediation, Flash crash, implied volatility, index arbitrage, index fund, intangible asset, iterative process, Long Term Capital Management, loss aversion, market design, market microstructure, merger arbitrage, natural language processing, passive investing, pattern recognition, performance metric, popular capitalism, prediction markets, price discovery process, profit motive, quantitative trading / quantitative finance, random walk, Renaissance Technologies, risk tolerance, risk-adjusted returns, risk/return, selection bias, sentiment analysis, shareholder value, Sharpe ratio, short selling, Silicon Valley, speech recognition, statistical arbitrage, statistical model, stochastic process, survivorship bias, systematic trading, text mining, transaction costs, Vanguard fund, yield curve

Models should be tested for consistency between the vendor’s in-sample period (before it began delivering the data in real time) and the subsequent out-of-sample period.
Look-Ahead Bias in Machine Learning
Researchers using machine learning techniques also can introduce look-ahead bias. In particular, they may tune some hyper-parameters on the entire data sample and then use those parameters in the backtest. Hyper-parameters should always be tuned using only backward-looking data. Similarly, in the area of sentiment analysis, researchers should take note of vendor-supplied sentiment dictionaries that may have been trained on forward-looking data.
Data Mining
A researcher may data mine a signal by tinkering with its construction until it has favorable in-sample performance; this is commonly called overfitting. The standard approach to controlling data mining involves a holdout, which withholds data in the simulation and takes one of two broad forms: a time-series holdout or an asset holdout.
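The rule that hyper-parameters be tuned only on backward-looking data can be sketched as a walk-forward split, a common form of time-series holdout. The function names walk_forward_splits and tune_on_past, and the idea of passing in a caller-supplied score function, are assumptions made for this illustration; they are not taken from the book.

```python
def walk_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, test_idx) ranges where training always precedes testing."""
    if n_folds < 1 or min_train < 1:
        raise ValueError("n_folds and min_train must be positive")
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        # Training uses everything before train_end; testing uses what follows.
        yield range(0, train_end), range(train_end, train_end + fold_size)


def tune_on_past(series, train_idx, candidates, score):
    """Pick the candidate hyper-parameter that scores best on past data only."""
    past = [series[i] for i in train_idx]  # never touches later observations
    return max(candidates, key=lambda c: score(past, c))
```

Calling tune_on_past once per fold, with that fold's train_idx, means each test window is evaluated with parameters chosen before the window began. Tuning once on the whole series would instead let later observations influence earlier decisions.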

Though financial statements do not directly reflect all the information that indicates a company’s potential, they do contribute a key piece of the investment puzzle. In the pursuit of alphas, the meaningful interpretation and analysis of financial statements can be a solid basis for informed investment decisions. 20 Fundamental Analysis and Alpha Research By Xinye Tang and Kailin Qi Along with techniques such as pairs trading, momentum investing, event-driven investing, and news sentiment analysis, fundamental analysis is an important tool used in designing quantitative alphas. By examining relevant economic and financial factors, fundamental analysts attempt to reveal a security’s value and determine whether it is undervalued or overvalued. A potentially profitable portfolio can then be constructed by going long the relatively undervalued securities and/or going short the overvalued ones.

Since then, key research areas include the prediction power of various forms of social media; social media applied to individual stocks; the discussion of noise in social media; finding valuable tweets by observing retweets and tweets from celebrities; and social media sentiment with long-term firm value. SENTIMENT Simply speaking, sentiment measures the quality of news. The most basic definition of sentiment is the polarity of the news: good, bad, or neutral. Advanced sentiment analysis can express more sophisticated emotional details, such as “anger,” “surprise,” or “beyond expectations.” The construction of news sentiment usually involves natural language processing and statistical/machine learning algorithms (for example, naive Bayes and support vector machines). The recent explosion of deep learning techniques has enabled rapid progress in understanding news.
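The most basic notion of polarity (good, bad, or neutral) can be illustrated with a word-list scorer. This is a deliberately simpler technique than the naive Bayes or support vector machine models the passage mentions, and the two word lists below are invented for the example, not taken from any vendor dictionary.

```python
# Hypothetical mini-lexicons; real sentiment dictionaries are far larger.
POSITIVE = {"beat", "growth", "strong", "upgrade"}
NEGATIVE = {"miss", "loss", "weak", "downgrade"}

def polarity(headline):
    """Label a headline good, bad, or neutral by counting lexicon hits."""
    words = headline.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "good"
    if score < 0:
        return "bad"
    return "neutral"

print(polarity("Strong growth after earnings beat"))         # good
print(polarity("Analysts downgrade stock on weak outlook"))  # bad
print(polarity("Company schedules annual meeting"))          # neutral
```

A scorer like this ignores word order and context entirely, which is one reason statistical and machine learning models are usually preferred in practice.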

pages: 398 words: 86,855

Bad Data Handbook by Q. Ethan McCallum

Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, Chuck Templeton: OpenTable:, cloud computing, cognitive dissonance, combinatorial explosion, commoditize, conceptual framework, database schema, DevOps, Firefox, Flash crash, Gini coefficient, illegal immigration, iterative process, labor-force participation, loose coupling, natural language processing, Netflix Prize, quantitative trading / quantitative finance, recommendation engine, selection bias, sentiment analysis, statistical model, supply-chain management, survivorship bias, text mining, too big to fail, web application

I knew unethical people would lie in online reviews in order to inflate ratings or attack competitors, but what I didn’t know, and only learned by accident, is that individuals will sometimes write reviews that completely contradict their associated rating, without any regard to how it affects a business’s online reputation. And often this is for businesses that an individual likes. How did I learn this? By using ratings and reviews to create a sentiment corpus, I trained a sentiment analysis classifier that could reliably determine the sentiment of a review. While evaluating this classifier, I discovered that it could also detect discrepancies between the review sentiment and the corresponding rating, thereby finding liars and confused reviewers. Here’s the whole story of how I used text classification to identify an unexpected source of bad data... Weotta At my company, Weotta,[8] we produce applications and APIs for navigating local data in ways that people actually care about, so we can answer questions like: Is there a kid-friendly restaurant nearby?
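The discrepancy check described here (a review whose text sentiment contradicts its star rating) might look roughly like the sketch below. The record layout, the field names `rating` and `predicted`, and the cut-offs of 4 stars and 2 stars are assumptions made for illustration; the excerpt does not say how the author implemented it.

```python
def find_discrepancies(reviews, high=4, low=2):
    """Return reviews whose classifier label contradicts the star rating.

    Each review is a dict with 'rating' (1-5) and 'predicted' ('pos' or 'neg').
    The thresholds `high` and `low` are illustrative assumptions.
    """
    flagged = []
    for review in reviews:
        looks_positive = review["rating"] >= high
        looks_negative = review["rating"] <= low
        if looks_positive and review["predicted"] == "neg":
            flagged.append(review)
        elif looks_negative and review["predicted"] == "pos":
            flagged.append(review)
    return flagged

# Invented sample data: 'predicted' stands in for a classifier's output.
sample = [
    {"id": 1, "rating": 5, "predicted": "pos"},
    {"id": 2, "rating": 5, "predicted": "neg"},  # glowing rating, negative text
    {"id": 3, "rating": 1, "predicted": "pos"},  # poor rating, positive text
    {"id": 4, "rating": 3, "predicted": "neg"},  # middling rating, not flagged
]
suspicious = find_discrepancies(sample)
print([r["id"] for r in suspicious])  # [2, 3]
```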

So how can you accurately calculate an average rating? We wanted to do this for our data, as well as aggregate the overall positive sentiment from all the reviews for a business, independent of any average rating. With that in mind, I figured I could create a sentiment classifier,[11] using rated reviews as a training corpus. A classifier works by taking a feature set and determining a label. For sentiment analysis, a feature set is a piece of text, like a review, and the possible labels can be pos for positive text, and neg for negative text. Such a sentiment classifier could be run over a business’s reviews in order to calculate an overall sentiment, and to make up for any missing rating information. Sentiment Classification NLTK,[12] Python’s Natural Language ToolKit, is a very useful programming library for doing natural language processing and text classification.[13] It also comes with many corpora that you can use for training and testing.
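To make "taking a feature set and determining a label" concrete, here is a small bag-of-words naive Bayes classifier written with only the standard library. It is a simplified stand-in, not NLTK's classifier and not the author's code; the four training sentences are invented, and add-one smoothing is an assumption of this sketch.

```python
import math
from collections import Counter, defaultdict

def features(text):
    """Feature set for a piece of text: the set of lowercase words it contains."""
    return set(text.lower().split())

class TinyNaiveBayes:
    """Simplified naive Bayes over word-presence features with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of training texts
        self.vocab = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            self.label_counts[label] += 1
            for word in features(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in features(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy corpus with the two labels used in the text: pos and neg.
clf = TinyNaiveBayes()
clf.train([
    ("great food and friendly service", "pos"),
    ("loved the great atmosphere", "pos"),
    ("terrible food and rude service", "neg"),
    ("awful slow service", "neg"),
])
print(clf.classify("friendly staff and great food"))  # pos
print(clf.classify("rude and slow"))                  # neg
```

Running a classifier like this over every review for a business and tallying the labels is one way to get the overall sentiment the passage describes.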

So in a 5-star rating system, 3.5 stars and higher reviews went into the pos directory, while 2.5 stars and lower reviews went into the neg directory. The assumption behind this is that high rated reviews will have positive language, and low rated reviews will have more negative language. Polarized language is ideal for text classification, because the classifier can learn much more precisely those words that indicate pos and those words that indicate neg. Because I needed sentiment analysis for local businesses, not movies, I used a similar method to create my own sentiment training corpus for local business reviews. From a selection of businesses, I produced a corpus where the pos text came from 5 star reviews, and the neg text came from 1 star reviews. I actually started by using both 4 and 5 star reviews for pos, and 1 and 2 star reviews for neg, but after a number of training experiments, it was clear that the 2 and 4 star reviews had less polarizing language, and therefore introduced too much noise, decreasing the accuracy of the classifier.
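The corpus construction described above, keeping only the most polarized ratings, can be sketched as follows. The function name, the `(text, stars)` tuple layout, and the sample reviews are hypothetical; only the rule itself (5 stars to pos, 1 star to neg, everything in between dropped) comes from the passage.

```python
def build_corpus(rated_reviews, pos_ratings=(5,), neg_ratings=(1,)):
    """Split (text, stars) pairs into pos/neg training text, dropping other ratings."""
    corpus = {"pos": [], "neg": []}
    for text, stars in rated_reviews:
        if stars in pos_ratings:
            corpus["pos"].append(text)
        elif stars in neg_ratings:
            corpus["neg"].append(text)
        # 2, 3 and 4 star reviews are skipped: their language is less polarized
    return corpus

# Invented sample reviews.
reviews = [
    ("Best tacos in town", 5),
    ("Pretty good, a bit pricey", 4),
    ("It was fine", 3),
    ("Not great", 2),
    ("Never coming back", 1),
]
corpus = build_corpus(reviews)
print(corpus)

# The author's first attempt, which proved noisier, corresponds to wider buckets:
noisy = build_corpus(reviews, pos_ratings=(4, 5), neg_ratings=(1, 2))
```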

pages: 123 words: 32,382

Grouped: How Small Groups of Friends Are the Key to Influence on the Social Web by Paul Adams

Airbnb, Cass Sunstein, cognitive dissonance, David Brooks, information retrieval, invention of the telegraph, planetary scale, race to the bottom, Richard Thaler, sentiment analysis, social web, statistical model, The Wisdom of Crowds, web application, white flight

The good news is that research has shown that when businesses are transparent about what data they have on people, and people have control over that data, they tell advertisers more about themselves.10 If trustworthiness and expertise are requirements for credibility, then transparency is becoming increasingly critical for building trustworthiness. Why negative comments are good for your brand The emergence of the social web means that more people are talking openly about businesses, and many businesses are nervous about any negative commentary. Most want sentiment analysis in the advertising products they use so they can hide the negative comments and only promote the positive comments. But this is the wrong approach. People can easily differentiate between a natural conversation and something that is controlled, and they won’t react well to the latter. Hiding negative comments is not transparent; it will dramatically decrease credibility. If people perceive that a source of information is fair and unbiased, it increases credibility.

It’s based on permission, and on highlighting new things about people’s friends. * * * Quick Tips Building credibility with a business is similar to building trust with someone you just met. It is a slow process, often taking months and even years, and marketers need to be patient. There is no quick solution to creating a credible brand. One way to fast-track it is to be recommended by people’s friends. Don’t use sentiment analysis to filter out negative comments, and don’t delete negative comments on your Facebook page. Look at it as an opportunity to learn and respond. If people have something negative to say, it’s because they had a poor experience with your brand. This is something you should want to rectify rather than hide. * * * Summary There are two main problems with interruption marketing, both of which are getting worse.

See social networks New York Times 19 News Feed 134, 135 Nickerson, Raymond 127 nonconscious brain 107–111 decision making by 103–104, 107, 109–110, 148 processing capacity of 107, 108 Nordgren, Loran 115 Nudge (Thaler and Sunstein) 97 O On Intelligence (Hawkins) 114 100 Things Every Designer Needs to Know About People (Weinschenk) 114 overconfidence 96 Owyang, Jeremiah 69, 144 P Pahl, Ray 52, 55, 66, 67 passive sharing 138 patterns 105, 110, 114 Pedigree community 122 Penenberg, Adam 67, 144 permission marketing 12, 14, 133–138 friends and 137–138, 143 word of mouth and 135–137 Permission Marketing (Godin) 14, 143 personal information 139–140 Persuasive Technology (Fogg) 98 photos, Facebook 3, 4 Politics of Happiness, The (Bok) 27 polls, business 22 Predictably Irrational (Ariely) 98, 128, 144 predictions 105 preferential attachment 32 priming 125 problem-solving 105 Proctor & Gamble 109, 121 public ratings 26 push marketing 137 R rational thinking 102–104 reductive thinking 102 relationships changes in 66 patterns of 55–58 strong ties 53, 54, 59–62 types of 52–54 uniqueness of 52 weak ties 53, 54, 62–65 relevance 138 reputation management 17 Rethinking Friendships (Spencer and Pahl) 67 S Salganik, Matthew 98 Science of Influence, The (Hogan) 115, 144 Searching for a Corporate Savior (Khurana) 82 sentiment analysis 140, 142 Sephora marketing campaign 18 serendipitous audience 25 Sernovitz, Andy 68 sharing feelings 19 information 41, 146 passive 138 similarity bias 118 Simon, Herbert 98 Simonson, Itamar 126, 128 six degrees of separation 43–44, 73 Six Degrees (Watts) 49, 82 Smart Lists 32 Social Animal, The (Brooks) 48 social behavior 150 social bonds 16–17, 18 social cognitive theory 128 social networks communication patterns on 23–24 consumer behavior and 106 decision making using 90–93 evolution of 31–32 groups connected through 39 historical overview of 9, 146 importance of understanding 150 influence within 94–95 information communicated on 24–26 pattern of 
connections in 33–35, 47 strong ties on 23, 60–61 structure of 30–35, 42–46, 81, 147–148 social norms 88 social proof 86–89 social web future of 149–151 how to think of 8 importance of 11–12 next great challenge on 93 summary points about 146–149 society, influence of 87–88 soulmates 53 Spencer, Liz 52, 55, 66, 67 Sponsored Stories 142 status updates 16–17 Strangers to Ourselves (Wilson) 99 strong ties 53, 54, 59–62 average number of 60 buying decisions and 61–62 communications with 60–61 disproportionate influence of 61, 147 importance of having 59 structure of social networks 30–35 connection patterns and 33–35 homophily principle and 32, 45–46 idea spreading and 76, 147–148 influence related to 42–46 laws governing 31–32 Stumbling on Happiness (Gilbert) 115 Sunstein, Cass 97 Surowiecki, James 92, 98 survival mechanism 16 sympathy group 34 T tagging photos 3 Target, poll example 22 targeted ads 80, 138, 139 technology human behavior and 9–10 interruption marketing and 130 Tetlock, Philip 95, 99 Thaler, Richard 97 Think Outside In blog 153 thinking rational 102–104 understanding of 151 three degrees of separation 43, 45, 46, 94 Ticketmaster 35 Tipping Point, The (Gladwell) 11, 14, 72, 82 transparency 139–141 trust building 139–142 levels of 91, 93 marketing and 131, 137 Twitter 73 U useful contacts 52 user ratings/reviews 137 V Viral Loop (Penenberg) 67, 144 visibility of products 21 W Watts, Duncan 10, 49, 73, 76, 82, 87, 98 weak ties 53, 54, 62–65 interactions with 62–64 sourcing information from 64–65 web, the how it’s changing 2–8 people-based rebuilding of 7, 8 phases of development 8 why it’s changing 9–10 See also social web Web Strategy blog 69 Weinschenk, Susan 114 Western cultures 88 Wikipedia 34, 90 Wilson, Timothy 99 Winning Decisions (Russo and Schoemaker) 99 word of mouth 135–137 Word of Mouth Marketing (Sernovitz) 68 Z Zynga games 2–3

Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport

Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, commoditize, data acquisition, disruptive innovation, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, lifelogging, Mark Zuckerberg, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, self-driving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining, Thomas Davenport

However, with big data, the data resembles not so much a pool as an ongoing, fast-flowing stream. Therefore, a more continuous approach to sampling, analyzing, and acting on data is necessary. This is particularly at issue for applications involving ongoing monitoring of data, as in social media sentiment analysis. Sentiment analysis allows an organization to assess whether the comments about its brands and products in blogs, tweets, and Facebook pages are positive or negative on balance. One potential problem with such monitoring applications is the tendency for managers to view a continuing stream of analysis and reports without making any decisions or taking any action. “Sentiment is up . . . no, it’s down . . . hooray, it’s back up again!”

It’s important then, to have clear criteria for what decisions to make and what actions to take based on big data analyses—particularly in fast-changing domains like social analytics. Sometimes it’s important to admit that the data and analyses are not definitive. I’ve already talked about the HunchWorks project at the United Nations, which seeks to identify trends and hunches at an early stage in order to decide whether they merit further attention. This could also be the right approach for social sentiment analysis—to use it as a tipoff that further investigation is required, rather than a specific action. If you’re a little more certain—but not entirely—that something important is going on based on your big data analysis, you might consider an automated recommendation. If necessary, a human could override it. That’s the approach that some health-care organizations are planning to take with the recommendations of IBM’s Watson system, for example.

., 195 cloud-based computing, 55, 89, 117, 163, 169, 192, 200, 208 Cloudera Hadoop, 115 commitment, culture of, 148 communication skills, 88, 92, 93, 99, 102–103 Competing on Analytics (Davenport and Harris), 2, 43 Compute Engine, 163 Concept 2, 12 conservative approach to big data adoption, 80, 81 consultants, data scientists as, 81, 98–99, 103–104, 112, 209 consumer products companies, 42, 42t, 43, 46, 54, 71, 82 Consumers Union, 67 Corporate Insight, 109 cost-reduction, 21, 60–63, 145 Coursera, 41 cows, data from, 11–12 credit card data, 37, 38, 42, 42t, 46, 164 culture for big data in organizations, 147–149, 152 customer relationship management (CRM), 54, 129f customers banking industry and, 9, 44, 49, 133 big data’s effect on relationships with, 26–27 business-to-business (B2B) firms and, 43, 45–46 business-to-business-to-consumer (B2B2C) firms and, 43, 46 data-based products and services for, 16, 23–24, 26, 66, 106, 155, 195 as focus of big data efforts, 16 future scenario of big data’s effect on relationships with, 35–38, 41–42, 58 identification of dissatisfaction and possible attrition of, 23, 48, 67, 68, 72, 78, 96, 179, 180, 181, 191 intermediaries reporting information about, 46 managers’ attention to, 21 marketing efforts targeted to, 27, 55, 63–64, 65, 67, 72, 79, 107, 108–109, 128, 142, 144, 179, 180, 197 media and entertainment firms and, 48, 49 multichannel relationships with, 51, 67, 177, 186 Netflix Prize’s focus on, 16, 22, 66 overachievers and, 42, 42t, 46 regulatory environment for data from, 27 research on website behavior of, 164 sentiment analysis of, 17, 27, 107, 118, 123 service transaction histories from, 23 sharing data with, 167–168 social media and, 48, 50–51, 107 travel industry and, 75–76 underachievers and, 42t, 43–44 unstructured data from, 51, 67, 68, 69, 180, 186 volume of data warehoused from, 116–117, 168 Cutting, Doug, 157 CycleOps, 12 dashboards, 109, 128, 129, 130, 137, 167, 
185, 198 data in big data stack, 119t, 121–122 success of big data initiatives and, 136–138 data disadvantaged organizations, 42t, 43 data discovery process big data strategy and, 70–72, 74–75, 75f, 84 enterprise orientation for, 139 focus of architecture on, 20, 201 GE’s experience with, 75 leadership and, 140 management orientation toward, 18–19 model generation for, 64 moderately aggressive approach to big data and, 82 objectives and, 75, 75f, 84 research on, 3 responsibility locus for, 76–77, 77f technical platform for, 131, 201 Data Lab product, 160 data mining, 122–123, 128, 183, 184 data production process big data strategy and, 70, 72–75, 75f, 84 data scientists and teams and, 201 enterprise orientation for, 139 GE’s experience with, 74–75 highly ambitious approach to big data and, 83 moderately aggressive approach to big data and, 82 objectives and, 75, 75f, 84 responsibility locus for, 76–77, 77f technical platform for, 74, 127, 129–130, 132, 133, 201 Data Science Central, 97 data scientists activities performed by, 15, 137–138, 148, 159–160, 199 analysts differentiated from, 15 background to, 86–87, 196–197 business expert traits of, 88 classic model of, 87–97 collaboration by, 165–167, 173, 176 development of products and services and, 16, 18, 20, 24, 61–62, 65, 66, 71, 79–80, 106, 161 education and training of, 14, 91, 92, 104, 184, 209 future for, 110–111 hacker traits of, 88–91 horizontal versus vertical, 97–99 job growth for, 111, 111f, 184–185 in large companies, 201 LinkedIn’s use of, 158, 160, 161 motivation of, 106 organizational structure with, 16, 61, 82, 140, 141, 142, 152, 153, 158, 173, 180, 187, 202, 207, 209 quantitative analyst traits of, 88, 93–97 research on, 3 retention of, 104–106, 112, 161 role of, 14, 209 scientist traits of, 88, 91–92 skills of, 71, 79, 88, 145, 147, 182–184, 185 sources of, for hiring, 101–105 start-ups using, 16, 157–158 team approach using, 99–101, 165–167, 181, 201, 209 traits of, 87, 88 trusted 
adviser traits of, 88, 92–93 data visualization, 124–125, 125f Davis, Jim, 163–164 DB2, 183.

pages: 410 words: 119,823

Radical Technologies: The Design of Everyday Life by Adam Greenfield

3D printing, Airbnb, augmented reality, autonomous vehicles, bank run, barriers to entry, basic income, bitcoin, blockchain, business intelligence, business process, call centre, cellular automata, centralized clearinghouse, centre right, Chuck Templeton: OpenTable:, cloud computing, collective bargaining, combinatorial explosion, Computer Numeric Control, computer vision, Conway's Game of Life, cryptocurrency, David Graeber, dematerialisation, digital map, disruptive innovation, distributed ledger, drone strike, Elon Musk, Ethereum, ethereum blockchain, facts on the ground, fiat currency, global supply chain, global village, Google Glasses, IBM and the Holocaust, industrial robot, informal economy, information retrieval, Internet of things, James Watt: steam engine, Jane Jacobs, Jeff Bezos, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, joint-stock company, Kevin Kelly, Kickstarter, late capitalism, license plate recognition, lifelogging, M-Pesa, Mark Zuckerberg, means of production, megacity, megastructure, minimum viable product, money: store of value / unit of account / medium of exchange, natural language processing, Network effects, New Urbanism, Occupy movement, Oculus Rift, Pareto efficiency, pattern recognition, Pearl River Delta, performance metric, Peter Eisenman, Peter Thiel, planetary scale, Ponzi scheme, post scarcity, post-work, RAND corporation, recommendation engine, RFID, rolodex, Satoshi Nakamoto, self-driving car, sentiment analysis, shareholder value, sharing economy, Silicon Valley, smart cities, smart contracts, social intelligence, sorting algorithm, special economic zone, speech recognition, stakhanovite, statistical model, stem cell, technoutopianism, Tesla Model S, the built environment, The Death and Life of Great American Cities, The Future of Employment, transaction costs, Uber for X, undersea cable, 
universal basic income, urban planning, urban sprawl, Whole Earth Review, WikiLeaks, women in the workforce

As the capacity to detect and characterize emotional states has grown, these reasonably traditional, Taylorist notions of time-and-motion efficiency have been supplemented by a concern for the worker’s affective performance.38 Japan’s Keikyu Corporation, for example, began measuring the quality of its frontline employees’ smiles in 2009, scanning their “eye movements, lip curves and wrinkles,” and rating them on a 0-100 scale.39 As intrusive as this may seem, smiling is at least something under an employee’s conscious control, which cannot be said for all of the measurements of “body posture, facial expressions, physiology, semantics [and] who a person talks to and when” that the management consultancy Accenture recommends to ensure employees are “exhibiting effective social behaviors.”40 Such subconscious tells are picked up by the People Analytics suite the “emotion-aware sentiment analysis company” Kanjoya offers, which uses unstructured voice and text data to calibrate an employee’s “Attrition Risk” and “Workplace Value,” in addition to the expected “Performance.” The concern for retention implies something that a review of similar sentiment analysis systems makes entirely explicit: the demand that inner states be measured and used to determine the conditions of labor now applies to the white-collar workforce every bit as much as it does to checkout clerks or line workers. As well, in a theme that we’ll be taking up repeatedly, what is salient is not so much whether these tools actually perform as advertised, but whether users can be induced to believe that they do.

A Snaptrends brochure for prospective customers in the law enforcement sector makes the proposition explicit: “From angry Facebook posts to suggestive Instagram uploads, today’s would-be criminals often leave A STRING OF CLUES across social media,” and a public-safety agency made aware of those CLUES can deploy its resources in time to preempt the commission of crime.19 Such tools use sentiment analysis, a facet of the emerging pseudoscience of “intent recognition,” to extract actionable intelligence from utterances.20 But it’s astonishing that anyone takes sentiment analysis seriously in any but the most trivial applications, let alone what is all too often the life-or-death context of a police stop. The algorithms involved are notoriously crude and simple-minded, stumbling when confronted with sarcasm and other common modes of expression. They have trouble with word order, double negatives, ambiguous qualifiers and inverted sentence structures.21 In short, they simply cannot be relied upon to distinguish even the most obvious snark from a genuine CLUE.

As we’ve seen, however, the ability to detect behavioral anomalies and departures from acceptable performance profiles algorithmically and remotely is already well advanced. Though they presently stumble at precisely the kind of coded speech that the marginalized have always used to establish and maintain spaces free from oversight—Verlan, Cockney rhyming slang, Polari, 3arabizi—it would be foolish to assume that sentiment analysis and intent recognition will not develop further in the years ahead. And of course totalizing systems like the Chinese social-credit scheme now under active development propose to weave a net capable of capturing, characterizing and punishing all such insurgent acts and utterances, whether public or private; whether or not, indeed, they are conscious at all. And that leaves the possibility, however slim, of working to enact progressive social change within the technosocial frameworks that are now available to us.

pages: 502 words: 107,510

Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs

Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, game design, information retrieval, iterative process, natural language processing, pattern recognition, performance metric, sentiment analysis, social web, speech recognition, statistical model, text mining

When is one set or type of feature better than another? This depends on what the target function is for the learning task. This is where annotation comes in. The point of linguistic annotation is to identify textual components of your document that can be associated with particular features for the phenomena for which you want to develop learning algorithms. Let’s take some examples beyond the spam-ham distinction. Consider sentiment analysis applied to movie reviews or hotel ratings. The most expedient method for classifying movie reviews is to set up the learning problem with n-gram features. The words in the reviews are taken as independent features (lexical clues), and thrown into a description of the target function. While this works remarkably well in general, this approach will fail to capture properties that show up as nonlocal dependencies, such as the ways that negation and modality are often expressed in language.
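The limitation with negation is easy to see in a small sketch. The two review fragments below are invented; they carry opposite sentiment but contain exactly the same words, so independent single-word features cannot tell them apart. Adjacent word pairs happen to separate this particular pair, though they would still miss dependencies that span more than two words.

```python
from collections import Counter

def ngrams(text, n):
    """Count the n-grams (as tuples of lowercase tokens) in a whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

a = "the plot was good not bad"   # positive
b = "the plot was bad not good"   # negative

# Unigram (single-word) features are identical for the two sentences.
print(ngrams(a, 1) == ngrams(b, 1))  # True

# Bigram features differ, e.g. ('not', 'bad') versus ('not', 'good').
print(ngrams(a, 2) == ngrams(b, 2))  # False
```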

They have been applied to both simple and quite complex classification tasks (Manning et al. 2008). The main idea behind SVMs is to find the best-fitting decision boundary between two classes, one that is maximally far from any point in the training data. Nonlinearly separable data can be handled elegantly by using a technique called the kernel trick, which maps the data into a higher dimension where it behaves in a linear fashion. SVMs have been applied very successfully to sentiment analysis (Pang et al. 2002). We won’t be going into detail about these, however; other books on machine learning (see the list at the start of the chapter) provide excellent guides for how these classifiers work, and the ones we’ve already discussed are enough to get you started in training algorithms on your annotated data. Micro Versus Macro Classifiers are evaluated using the results of a simple table that sums up how often the tags were correctly assigned.

Classification algorithms are used to apply the most likely label (or classification) to a collection. They can be applied at a document, sentence, phrase, word, or any other level of language that is appropriate for your task. Using n-gram features is the simplest way to start with a classification system, but structure-dependent features and annotation-dependent features will help with more complex tasks such as event recognition or sentiment analysis. Decision trees are a type of ML algorithm that essentially ask “20 questions” of a corpus to determine what label should be applied to each item. The hierarchy of the tree determines the order in which the classifications are applied. The “questions” asked at each branch of a decision tree can be structure-dependent, annotation-dependent, or any other type of feature that can be discovered about the data.
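The "20 questions" picture of a decision tree can be made concrete with a hand-written toy tree. In practice the tree would be learned from annotated data; the two yes/no features here (`has_negation`, `has_positive_word`) and the labels at the leaves are hypothetical choices made only to show how the hierarchy fixes the order of the questions.

```python
# Each internal node asks one yes/no question about an item; leaves are labels.
# This tree is written by hand for illustration, not learned from a corpus.
TREE = {
    "question": "has_negation",
    "yes": {"question": "has_positive_word", "yes": "neg", "no": "pos"},
    "no": {"question": "has_positive_word", "yes": "pos", "no": "neg"},
}

def classify(item, node=TREE):
    """Walk from the root, following the answer to each question until a leaf label is reached."""
    while isinstance(node, dict):
        answer = item[node["question"]]
        node = node["yes"] if answer else node["no"]
    return node

# "not good": negation present, positive word present
print(classify({"has_negation": True, "has_positive_word": True}))    # neg
# "good": no negation, positive word present
print(classify({"has_negation": False, "has_positive_word": True}))   # pos
```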

pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier

23andMe, Affordable Care Act / Obamacare, airport security, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, IBM and the Holocaust, index card, informal economy, intangible asset, Internet of things, invention of the printing press, Jeff Bezos, Joi Ito, lifelogging, Louis Pasteur, Mark Zuckerberg, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, paypal mafia, performance metric, Peter Thiel, post-materialism, random walk, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, speech recognition, Steve Jobs, Steven Levy, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Thomas Davenport, Turing test, Watson beat the top human players on Jeopardy!

Messiness can also refer to the inconsistency of formatting, for which the data needs to be “cleaned” before being processed. There are a myriad of ways to refer to IBM, notes the big-data expert DJ Patil, from I.B.M. to T. J. Watson Labs, to International Business Machines. And messiness can arise when we extract or process the data, since in doing so we are transforming it, turning it into something else, such as when we perform sentiment analysis on Twitter messages to predict Hollywood box office receipts. Messiness itself is messy. Suppose we need to measure the temperature in a vineyard. If we have only one temperature sensor for the whole plot of land, we must make sure it’s accurate and working at all times: no messiness allowed. In contrast, if we have a sensor for every one of the hundreds of vines, we can use cheaper, less sophisticated sensors (as long as they do not introduce a systematic bias).

And, in fact, they’re often just that. Yet the company enables the datafication of people’s thoughts, moods, and interactions, which could never be captured previously. Twitter has struck deals with two firms, DataSift and Gnip, to sell access to the data. (Although all tweets are public, access to the “firehose” comes at a cost.) Many businesses parse tweets, sometimes using a technique called sentiment analysis, to garner aggregate customer feedback or judge the impact of marketing campaigns. Two hedge funds, Derwent Capital in London and MarketPsych in California, started analyzing the datafied text of tweets as signals for investments in the stock market. (Their actual trading strategies were kept secret: rather than investing in firms that were ballyhooed, they may have bet against them.) Both firms now sell the information to traders.

The biologist Marcel Salathé of Penn State University and the software engineer Shashank Khandelwal analyzed tweets to find that people’s attitudes about vaccinations matched their likelihood of actually getting flu shots. Importantly, their study used the metadata of who was connected to whom among Twitter followers to go a step further still. They noticed that subgroups of unvaccinated people may exist. What marks this research as particularly special is that where other studies, such as Google Flu Trends, used aggregated data to consider the state of individuals’ health, the sentiment analysis performed by Salathé actually predicted health behaviors. These early findings indicate where datafication will surely go next. Like Google, a gaggle of social media networks such as Facebook, Twitter, LinkedIn, Foursquare, and others sit on an enormous treasure chest of datafied information that, once analyzed, will shed light on social dynamics at all levels, from the individual to society at large.

pages: 397 words: 110,130

Smarter Than You Think: How Technology Is Changing Our Minds for the Better by Clive Thompson

4chan, A Declaration of the Independence of Cyberspace, augmented reality, barriers to entry, Benjamin Mako Hill, butterfly effect, citizen journalism, Claude Shannon: information theory, conceptual framework, corporate governance, crowdsourcing, Deng Xiaoping, discovery of penicillin, disruptive innovation, Douglas Engelbart, drone strike, Edward Glaeser, Edward Thorp, experimental subject, Filter Bubble, Freestyle chess, Galaxy Zoo, Google Earth, Google Glasses, Gunnar Myrdal, Henri Poincaré, hindsight bias, hive mind, Howard Rheingold, information retrieval, iterative process, jimmy wales, Kevin Kelly, Khan Academy, knowledge worker, lifelogging, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Netflix Prize, Nicholas Carr, Panopticon Jeremy Bentham, patent troll, pattern recognition, pre–internet, Richard Feynman, Ronald Coase, Ronald Reagan, Rubik’s Cube, sentiment analysis, Silicon Valley, Skype, Snapchat, Socratic dialogue, spaced repetition, superconnector, telepresence, telepresence robot, The Nature of the Firm, the scientific method, The Wisdom of Crowds, theory of mind, transaction costs, Vannevar Bush, Watson beat the top human players on Jeopardy!, WikiLeaks, X Prize, éminence grise

p=2092#more-2092; Dale Lane, “Has Today Been a Good Day?” Dale Lane (blog), April 16, 2012, accessed March 22, 2013, analyzed the color usage in Van Gogh’s major paintings: Cory Doctorow, “Van Gogh Pie-Charts,” Boing Boing, January 29, 2011, accessed March 23, 2013, a “sentiment analysis” of the Bible: “Applying Sentiment Analysis to the Bible,”, October 10, 2011, accessed March 22, 2013, how characters interact in Hamlet: Richard Beck, “Hamlet and the Region of Death,” Boston Globe, May 29, 2011, accessed March 23, 2013, Tufte analyzed 217 data graphics: Edward R. Tufte, The Cognitive Style of Power Point (Cheshire, CT: Graphics Press, 2003), 4–5.

Even Gurrin admits to me that he rarely searches for anything at all in his massive archive. He’s waiting for better search tools to emerge. Mind you, he’s confident they will. As he points out, fifteen years ago you couldn’t find much on the Web because the search engines were dreadful. “And the first MP3 players were horrendous for finding songs,” he adds. The most promising trends in search algorithms include everything from “sentiment analysis” (you could hunt for a memory based on how happy or sad it is) to sophisticated ways of analyzing pictures, many of which are already emerging in everyday life: detecting faces and locations or snippets of text in pictures, allowing you to hunt down hard-to-track images by starting with a vague piece of half recall, the way we interrogate our own minds. The app Evernote has already become popular because of its ability to search for text, even bent or sideways, within photos and documents

If you want proof that data visualization is entering the mainstream, it’s there in online pop culture. Some of the biggest viral hits in recent years have been witty data crunches from odd, unexpected sources. Arthur Buxton, a young British Web designer, analyzed the color usage in Van Gogh’s major paintings and transformed them into pie charts, challenging viewers to figure out which was which. A group of Christian data nerds did a “sentiment analysis” of the Bible, using algorithms that determine whether a piece of text contains positive or negative language. (“Things start off well with creation, turn negative with Job and the patriarchs, improve again with Moses. . . . In the New Testament, things start off fine with Jesus, then quickly turn negative as opposition to his message grows.”) A professor of English used network-mapping software to analyze how characters interact in Hamlet and produced a map that uncovered some revealing patterns.

pages: 125 words: 27,675

Applied Text Analysis With Python: Enabling Language-Aware Data Products With Machine Learning by Benjamin Bengfort, Rebecca Bilbro, Tony Ojeda

full text search, natural language processing, quantitative easing, sentiment analysis, statistical model

In the example below, we see that is the website, or the collection of pages, and each of the individual documents listed below it (courses, projects, etc.) represents an individual web page.
├── /
├── /courses
├── /projects
├── /corporate-offerings
├── /about
└──
|   ├── /an-introduction-to-machine-learning-with-python
|   ├── /the-age-of-the-data-product
|   └── /building-a-classifier-from-census-data
|   └── /modern-methods-for-sentiment-analysis
...
The predictability of a common domain name makes systematic data collection simpler and more convenient. However, most ingested HTML does not arrive clean, ordered, and ready for analysis. For one thing, a raw HTML document collected from the web will include much that is not text: advertisements, headers and footers, navigation bars, etc. Because of its loose schema, HTML makes the systematic extraction of the text from the non-text challenging.
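
As a rough illustration of separating text from non-text in raw HTML, the sketch below uses only Python's standard-library `html.parser`. The `SKIP_TAGS` set, the `TextExtractor` class, and the sample markup are assumptions made for this example; they are not taken from the book, and real pages usually need more careful handling.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as non-text boilerplate (an assumed list).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the text of an HTML string with skipped regions removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page with a style block, a navigation bar, and a footer.
sample = """
<html><head><style>p {color: red}</style></head>
<body><nav><a href="/courses">Courses</a></nav>
<p>Sentiment analysis is hard.</p>
<footer>Copyright notice</footer></body></html>
"""
print(extract_text(sample))
```

A tag-based skip list like this is only a first pass: advertisements and sidebars are often plain `div` elements, which is part of why the loose schema makes extraction hard.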

Learning techniques give data scientists the ability to train models in a specific context on a specific corpus, make predictions on new data, and adapt over time as the corpus grows and changes. In fact, most natural language processing uses machine learning in one form or another, from tokenization and part of speech tagging, as we saw in the previous chapter, to named entity recognition, entailment, and parsing. More recently, textual machine learning has enabled applications that utilize sentiment analysis, word sense disambiguation, automatic translation and tagging, scene recognition, captioning, chatbots, and more! Because of Python’s unique role in data science, it is rich in third party machine learning tools, from Scikit-Learn to TensorFlow, as well as language processing tools like NLTK and Gensim. In the last chapter we constructed a corpus of preprocessed documents from HTML ingested via RSS feeds, saving them as a pickled list of lists of (token, tag) tuples.
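
The storage format mentioned above can be pictured with a small round trip through the standard `pickle` module. The two tagged sentences and the variable names are invented for illustration, and the exact nesting used in the book may differ from this sketch.

```python
import pickle

# A hypothetical preprocessed document: a list of sentences,
# each a list of (token, part-of-speech tag) tuples.
document = [
    [("Sentiment", "NN"), ("analysis", "NN"), ("is", "VBZ"), ("hard", "JJ"), (".", ".")],
    [("Try", "VB"), ("it", "PRP"), (".", ".")],
]

# Serialize to bytes and read it back, as one would with a file on disk.
payload = pickle.dumps(document, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)

# Flatten to plain tokens, dropping the tags.
tokens = [token for sentence in restored for token, tag in sentence]
print(tokens)
```

Pickles should only be loaded from trusted sources, since unpickling can execute arbitrary code.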

pages: 274 words: 75,846

The Filter Bubble: What the Internet Is Hiding From You by Eli Pariser

A Declaration of the Independence of Cyberspace, A Pattern Language, Amazon Web Services, augmented reality, back-to-the-land, Black Swan, borderless world, Build a better mousetrap, Cass Sunstein, citizen journalism, cloud computing, cognitive dissonance, crowdsourcing, Danny Hillis, data acquisition, disintermediation, don't be evil, Filter Bubble, Flash crash, fundamental attribution error, global village, Haight Ashbury, Internet of things, Isaac Newton, Jaron Lanier, Jeff Bezos, jimmy wales, Kevin Kelly, knowledge worker, Mark Zuckerberg, Marshall McLuhan, megacity, Metcalfe’s law, Netflix Prize, new economy, PageRank, paypal mafia, Peter Thiel, recommendation engine, RFID, Robert Metcalfe, sentiment analysis, shareholder value, Silicon Valley, Silicon Valley startup, social graph, social software, social web, speech recognition, Startup school, statistical model, stem cell, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, the scientific method, urban planning, Whole Earth Catalog, WikiLeaks, Y Combinator

There’s plenty of good that could emerge from persuasion profiling, Eckles believes. He points to DirectLife, a wearable coaching device by Philips that figures out which arguments get people eating more healthily and exercising more regularly. But he told me he’s troubled by some of the possibilities. Knowing what kinds of appeals specific people respond to gives you power to manipulate them on an individual basis. With new methods of “sentiment analysis,” it’s now possible to guess what mood someone is in. People use substantially more positive words when they’re feeling up; by analyzing enough of your text messages, Facebook posts, and e-mails, it’s possible to tell good days from bad ones, sober messages from drunk ones (lots of typos, for a start). At best, this can be used to provide content that’s suited to your mood: On an awful day in the near future, Pandora might know to preload Pretty Hate Machine for you when you arrive.

But his dream is quite clear: Rendon wants to see a world where television “can drive the policy process,” where “border patrols [are] replaced by beaming patrols,” and where “you can win without fighting.” Given all that, I was a bit surprised when the first weapon he referred me to was a very quotidian one: a thesaurus. The key to changing public opinion, Rendon said, is finding different ways to say the same thing. He described a matrix, with extreme language or opinion on one side and mild opinion on the other. By using sentiment analysis to figure out how people in a country felt about an event—say, a new arms deal with the United States—and identify the right synonyms to move them toward approval, you could “gradually nudge a debate.” “It’s a lot easier to be close to what reality is” and push it in the right direction, he said, than to make up a new reality entirely. Rendon had seen me talk about personalization at an event we both attended.

PayPal PeekYou persuasion profiling Phantom Public, The (Lippmann) Philby, Kim Phorm Piaget, Jean Picasa Picasso, Pablo PK List Management Plato politics electoral districts and partisans and programmers and voting Popper, Karl postmaterialism predictions present bias priming effect privacy Facebook and facial recognition and genetic Procter & Gamble product recommendations Proulx, Travis Pulitzer, Joseph push technology and pull technology Putnam, Robert Qiang, Xiao Rapleaf Rather, Dan Raz, Guy reality augmented Reality Hunger (Shields) Reddit Rendon, John (Sunstein) retargeting RFID chips robots Rodriguez de Montalvo, Garci Rolling Stone Roombas Rotenberg, Marc Rothstein, Mark Rove, Karl Royal Caribbean Rubel, Steve Rubicon Project Rumsfeld, Donald Rushkoff, Douglas Salam, Reihan Sandberg, Sheryl schemata Schmidt, Eric Schudson, Michael Schulz, Kathryn science Scientific American Scorpion sentiment analysis Sentry serendipity Shields, David Shirky, Clay Siegel, Lee signals click Simonton, Dean Singhal, Amit Sleepwalkers, The (Koestler) smart devices Smith, J. 
Walker social capital social graph Social Graph Symposium Social Network, The Solove, Daniel solution horizon Startup School Steitz, Mark stereotyping Stewart, Neal Stryker, Charlie Sullivan, Danny Sunstein, Cass systematization Taleb, Nassim Nicholas Tapestry TargusInfo Taylor, Bret technodeterminism technology television advertising on mean world syndrome and Tetlock, Philip Thiel, Peter This American Life Thompson, Clive Time Tocqueville, Alexis de Torvalds, Linus town hall meetings traffic transparency Trotsky, Leon Turner, Fred Twitter Facebook compared with Últimas Noticias Unabomber uncanny valley Upshot Vaidhyanathan, Siva video games Wales, Jimmy Wall Street Journal Walmart Washington Post Web site morphing Westen, Drew Where Good Ideas Come From (Johnson) Whole Earth Catalog WikiLeaks Wikipedia Winer, Dave Winner, Langdon Winograd, Terry Wired Wiseman, Richard Woolworth, Andy Wright, David Wu, Tim Yahoo News Upshot Y Combinator Yeager, Sam Yelp You Tube LeanBack Zittrain, Jonathan Zuckerberg, Mark Table of Contents Title Page Copyright Page Dedication Introduction Chapter 1 - The Race for Relevance Chapter 2 - The User Is the Content Chapter 3 - The Adderall Society Chapter 4 - The You Loop Chapter 5 - The Public Is Irrelevant Chapter 6 - Hello, World!

pages: 244 words: 81,334

Picnic Comma Lightning: In Search of a New Reality by Laurence Scott

4chan, Airbnb, airport security, augmented reality, Berlin Wall, Bernie Sanders, Boris Johnson, clean water, colonial rule, cryptocurrency, dematerialisation, Donald Trump, Elon Musk, housing crisis, Internet of things, Joan Didion, job automation, late capitalism, Mark Zuckerberg, Narrative Science, Productivity paradox, QR code, ride hailing / ride sharing, Saturday Night Live, sentiment analysis, Silicon Valley, Skype, Slavoj Žižek, Snapchat, Y2K

Big Emotion is working on a large canvas. It seeks to decipher our feelings in all their modes of expression: in how we communicate online, in our external body language and, even more intimately, in the codes to our emotions that we keep inside our bodies. On social media, our words can often be the only evidence of our feelings. As a result, coders are busy improving their software’s capacity for ‘sentiment analysis’. Also known as opinion-mining, this genre of computer program attempts to discern our moods and feelings in the linguistic patterns of our social-media content. One of the obvious problems is that we don’t always say what we mean. Lotem Peled and Roi Reichart, two researchers in this burgeoning field, have given themselves ‘the novel task of sarcasm interpretation’. Their work so far involves gathering tweets that include ‘#sarcasm’ and then designing an algorithm that can create an accurately sincere version of the sarcastic tweet.

A concentrated frown might trigger an alert to human attendants, telling them to swarm on this contemplative consumer. Thermal-imaging cameras can analyse our heart rates, detecting that flutter of excitement at the verge of a purchase. There are countless commercial applications to these technologies, and it has been estimated that the business of detecting and interpreting emotion will be worth more than $36 billion by 2021. But Big Emotion is not satisfied with the remote scrutiny of sentiment analysis and facial-recognition cameras. The branch of this industry that deals in wearables is devoted to a new kind of empathy, an intimate exchange of information between the human body and the biosensors voluntarily strapped to it. One goal of wearable technologies is to judge our moods from quantifiable physiological responses. Marcus Mustafa, the Global Head of User Experience at the global marketing and technology agency DigitasLBi, has described the biometric data from wearables as the ‘glue’ that binds the more easily trackable and analysable digital data – our browsing and online purchase histories – to our feelings.

DigitasLBi’s Marcus Mustafa, who – you might recall – defines emotional data as ‘all the things in between the lines’, believes that ‘the “in-between” is rapidly disappearing’. As a result, in addition to all the relevant offers streaming into our wearables, Mustafa wonders whether, in this disappearing space between our feelings and our actions, ‘we might become more aware of ourselves, and hopefully more tolerant to others’. There is no doubt that small personal voices, silenced for so long, are rightfully being heard, and that the culture that brings us sentiment analysis has also enabled a more coordinated, sustained and empathetic movement towards social justice to mobilise in response to these voices. With this vanishing in-between, some of the shelters for unambiguous immorality and clear abuses of power are being torn down. When BBC Radio’s ethical debate programme, The Moral Maze, discussed the early days of the Harvey Weinstein scandal, the journalist Tim Dowling remarked: ‘Where’s the maze?’

pages: 23 words: 5,264

Designing Great Data Products by Jeremy Howard, Mike Loukides, Margit Zwemer

AltaVista, Filter Bubble, PageRank, pattern recognition, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, text mining

In another area where objective-based data products have the power to change lives, the CMU extension in Silicon Valley has an active project for building data products to help first responders after natural or man-made disasters. Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses. These disaster applications are a particularly good example of why data products need simple, well-designed interfaces that produce concrete recommendations. In an emergency, a data product that just produces more data is of little use.

pages: 350 words: 98,077

Artificial Intelligence: A Guide for Thinking Humans by Melanie Mitchell

Ada Lovelace, AI winter, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, artificial general intelligence, autonomous vehicles, Bernie Sanders, Claude Shannon: information theory, cognitive dissonance, computer age, computer vision, dark matter, Douglas Hofstadter, Elon Musk, Gödel, Escher, Bach, I think there is a world market for maybe five computers, ImageNet competition, Jaron Lanier, job automation, John Markoff, John von Neumann, Kevin Kelly, Kickstarter, license plate recognition, Mark Zuckerberg, natural language processing, Norbert Wiener, ought to be enough for anybody, pattern recognition, performance metric, RAND corporation, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, Rodney Brooks, self-driving car, sentiment analysis, Silicon Valley, Singularitarianism, Skype, speech recognition, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, theory of mind, There's no reason for any individual to have a computer in his home - Ken Olsen, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!

What’s more, such information may have predictive power about other aspects of a person’s life, such as likely voting patterns and responsiveness to certain types of news stories or political ads. Furthermore, there have been several efforts, with varying success, to apply “sentiment mining” of, say, economics-related tweets on Twitter to predict stock prices and election outcomes. Putting aside the ethics of these applications of sentiment analysis, let’s focus on how AI systems might be able to classify the sentiment of sentences like the ones above. While it’s quite easy for humans to see that these mini-reviews are all negative, getting a program to do this kind of classification in a general way is much harder than it might seem at first glance. Some early NLP systems looked for the presence of individual words or short sequences of words as indications of the sentiment of a text.
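
A word-spotting approach of the kind described can be sketched in a few lines of Python. The two word lists, the function name `keyword_sentiment`, and the example sentences are all made up for illustration; they are not drawn from any real lexicon or from the book.

```python
import re

# Tiny hand-made word lists, purely illustrative (not a real lexicon).
POSITIVE = {"good", "great", "terrific", "wonderful", "moving"}
NEGATIVE = {"bad", "awful", "dull", "boring", "terrible"}


def keyword_sentiment(text):
    """Label text by counting positive and negative keywords."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(keyword_sentiment("The plot was dull and the acting was awful."))
print(keyword_sentiment("A terrific, moving film."))
# Negation is invisible to a word counter, so this one is mislabeled.
print(keyword_sentiment("Not good at all."))
```

The last call shows the weakness: counting words in isolation ignores context such as negation, so "Not good at all." comes out as positive.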

“I was a little too young to see this terrific movie when it first came out.” “If you don’t see it, you’ll be missing out!” Looking at single words or short sequences in isolation is generally not sufficient to glean the overall sentiment; it’s necessary to capture the semantics of words in the context of the whole sentence. Soon after deep networks started to excel in computer vision and speech recognition, NLP practitioners experimented with applying them to sentiment analysis. As usual, the idea is to train the network on many human-labeled examples of sentences with both positive and negative sentiment and have the network itself learn useful features that allow it to output a classification confidence for “positive” or “negative” on a new sentence. But first, how can we get a neural network to process a sentence? Recurrent Neural Networks Processing a sentence or passage requires a different type of neural network from those I have described in previous chapters.

On the other hand, the “black hat” attackers—hackers who are actually trying to fool deployed systems for nefarious purposes—don’t publish the tricks they have come up with, so there might be many additional kinds of vulnerabilities of these systems of which we’re not yet aware. As far as I know, to date there has not been a real-world attack of these kinds on deep-learning systems, but I’d say it’s only a matter of time until we hear about such attacks. While deep learning has produced some very significant advances in speech recognition, language translation, sentiment analysis, and other areas of NLP, human-level language processing remains a distant goal. Christopher Manning, a Stanford professor and NLP luminary, noted this in 2017: “So far, problems in higher-level language processing have not seen the dramatic error rate reductions from deep learning that have been seen in speech recognition and in object recognition in vision.… The really dramatic gains may only have been possible on true signal processing tasks.” It seems to me to be extremely unlikely that machines could ever reach the level of humans on translation, reading comprehension, and the like by learning exclusively from online data, with essentially no real understanding of the language they process.

pages: 346 words: 97,330

Ghost Work: How to Stop Silicon Valley From Building a New Global Underclass by Mary L. Gray, Siddharth Suri

Affordable Care Act / Obamacare, Amazon Mechanical Turk, augmented reality, autonomous vehicles, barriers to entry, basic income, big-box store, bitcoin, blue-collar work, business process, business process outsourcing, call centre, Capital in the Twenty-First Century by Thomas Piketty, cloud computing, collaborative consumption, collective bargaining, computer vision, corporate social responsibility, crowdsourcing, data is the new oil, deindustrialization, deskilling, don't be evil, Donald Trump, Elon Musk, employer provided health coverage, equal pay for equal work, Erik Brynjolfsson, financial independence, Frank Levy and Richard Murnane: The New Division of Labor, future of work, gig economy, glass ceiling, global supply chain, hiring and firing, ImageNet competition, industrial robot, informal economy, information asymmetry, Jeff Bezos, job automation, knowledge economy, low skilled workers, low-wage service sector, market friction, Mars Rover, natural language processing, new economy, passive income, pattern recognition, post-materialism, post-work, race to the bottom, Rana Plaza, recommendation engine, ride hailing / ride sharing, Ronald Coase, Second Machine Age, sentiment analysis, sharing economy, Shoshana Zuboff, side project, Silicon Valley, Silicon Valley startup, Skype, software as a service, speech recognition, spinning jenny, Stephen Hawking, The Future of Employment, The Nature of the Firm, transaction costs, two-sided market, union organizing, universal basic income, Vilfredo Pareto, women in the workforce, Works Progress Administration, Y Combinator

Microsoft’s strength in speech recognition and machine translation comes from the ghost work of people training algorithms with accurate data sets. They create them by listening to short audio recordings of one sentence in one language, typically English, and entering the translation of the sentence in their mother tongue in an Excel file. Other common types of work on UHRS are market surveys—often restricted by demographics like age, gender, and location—and a task called “sentiment analysis.” In sentiment analysis, workers may look at a series of words, selfies, videos, or audio files and add a word to each data point that describes their sense of the mood of the word, person, action, or sound in front of them. These human insights become the training data for algorithms later shown the same materials. Back at home, Kala often turns to her sons for help completing categorization tasks, especially ones that require knowledge of American colloquialisms.

., 19 robots, xviii–xxiii Romney, Mitt, xii Rosie the Riveter, 47 S S&P Global Market Intelligence, 62 safety, workplace algorithmic cruelty, 86 Bangladesh Accord, 193–94 for full-time employment, 60, 97 Good Work Code, 157 industrial era, 45–46 unraveling of, xxiii–xxiv workspaces, 190 safety net, for workers, 189–92 Sanjay, 128–29 Sanjeev, 126 scaffolding technique, 149–50, 164, 240 n11 scams, 104, 122, 125 scheduling 80/20 rule, 103, 118 always-on workers, 104, 105, 126, 150–51, 158–59, 170, 190 control over, 96, 99–100, 108, 157 employer control over, xxvi, 48 experimentalists, 104, 126, 150–51 just-in-time scheduling, 100, 235 n11 MTurk, 5, 79 as priority, 147, 150, 155, 164 Treaty of Detroit, 48 Sears, Mark, 141, 143, 149 self-improvement, 100, 110–13 sentiment analysis, 19 Service Employees International Union, 158–59, 191 service jobs, growth of, 97 Shah, Palak, 157 shared workspaces, 180–81 Singh, Manmohan, 55 skilled work, 39, 51, 97 skills, learning, 100, 110–13 skills gap, 230 n26 Skype, 23, 132, 179 slavery, 40–41, 226 n2 Smart Glasses, 167–68 Smith, Aaron, 219 n2, 242 n2 Smith, Adam, 58 social consequences, algorithmic cruelty, 68–69 social entrepreneurship, 147–55 social environment forums as, 132–33, 164, 239 n8 job validation, 95 need for, 178–80, 233 n6 requesters on, 73–74 in workplaces, 121–23, 173–74 See also collaboration Software Technology Parks of India (STPI), 55 SpaceX, xviii Sparrow Cycling, 142 speech recognition, 30 spinning jenny, 43, 173 Star, Susan Leigh, 238 n2 Starbucks, 28, 100 Stern, Andy, 191 Strauss, Anselm, 238 n1 strikes, 47, 48 subcontracting, Industrial Revolution, 41–42 success, changing definition of, 97–98 Suchman, Lucy, 238 n3 support collaboration, 121–23, 133–37 for on-demand work, 105 as requirement, 162 of workers, 21, 140–43, 149, 240 n11 See also double bottom line; forums Suri, Siddharth, xxvii–xxix, 221 n23 surveys LeadGenius, 224 n27 market surveys, 3, 19 on payment, 90–91 as task, 87, 116, 219 n2, 242 n2 
worker motivation, 100 T Taft, Robert A., 48 Taft-Hartley Act, 48–49, 54, 228 n20 Taste of the World, 14 Taylor, Frederick, 227 n6 Team Genius, 88–90 teamwork, 24, 28, 160–61, 164, 182–83 technology AI. see artificial intelligence (AI) APIs. see application programming interface (API) automation, xviii–xxiii, 173–77, 176–77, 243 n5 computers. see computers machinery, 42, 43–44, 58–59, 227 n5 paradox of automation, xxii, 36, 170, 173, 175 Technology, Entertainment and Design (TED).

pages: 1,331 words: 163,200

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

Amazon Mechanical Turk, Anton Chekhov, combinatorial explosion, computer vision, constrained optimization, correlation coefficient, crowdsourcing, don't repeat yourself, Elon Musk, friendly AI, ImageNet competition, information retrieval, iterative process, John von Neumann, Kickstarter, natural language processing, Netflix Prize, NP-complete, optical character recognition, P = NP, p-value, pattern recognition, pull request, recommendation engine, self-driving car, sentiment analysis, SpamAssassin, speech recognition, stochastic process

In autonomous driving systems, they can anticipate car trajectories and help avoid accidents. More generally, they can work on sequences of arbitrary lengths, rather than on fixed-sized inputs like all the nets we have discussed so far. For example, they can take sentences, documents, or audio samples as input, making them extremely useful for natural language processing (NLP) systems such as automatic translation, speech-to-text, or sentiment analysis (e.g., reading movie reviews and extracting the rater’s feeling about the movie). Moreover, RNNs’ ability to anticipate also makes them capable of surprising creativity. You can ask them to predict which are the most likely next notes in a melody, then randomly pick one of these notes and play it. Then ask the net for the next most likely notes, play it, and repeat the process again and again.

Besides the long training time, a second problem faced by long-running RNNs is the fact that the memory of the first inputs gradually fades away. Indeed, due to the transformations that the data goes through when traversing an RNN, some information is lost after each time step. After a while, the RNN’s state contains virtually no trace of the first inputs. This can be a showstopper. For example, say you want to perform sentiment analysis on a long review that starts with the four words “I loved this movie,” but the rest of the review lists the many things that could have made the movie even better. If the RNN gradually forgets the first four words, it will completely misinterpret the review. To solve this problem, various types of cells with long-term memory have been introduced. They have proved so successful that the basic cells are not much used anymore.
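
The fading of early inputs can be seen even in a toy recurrence. The sketch below is not an RNN from the book: it assumes a single scalar state updated as h = tanh(w_h * h + w_x * x), with hand-picked weights `w_h` and `w_x` and no training, and it measures how much the final state changes when only the first input is flipped.

```python
import math


def final_state(inputs, w_h=0.5, w_x=1.0):
    """Run a scalar recurrence h = tanh(w_h * h + w_x * x) over the inputs."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)
    return h


def first_input_gap(length):
    """Change in the final state when only the first input flips sign."""
    rest = [0.0] * (length - 1)  # every later input is identical (zero)
    return abs(final_state([1.0] + rest) - final_state([-1.0] + rest))


for n in (1, 5, 20):
    print(n, first_input_gap(n))
```

With these assumed weights the gap shrinks by at least half at every step, so after a few dozen steps the final state carries almost no trace of the first input. Cells with long-term memory are designed to counter this kind of decay.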

GRU computations Creating a GRU cell in TensorFlow is trivial: gru_cell = tf.contrib.rnn.GRUCell(num_units=n_neurons) LSTM or GRU cells are one of the main reasons behind the success of RNNs in recent years, in particular for applications in natural language processing (NLP). Natural Language Processing Most of the state-of-the-art NLP applications, such as machine translation, automatic summarization, parsing, sentiment analysis, and more, are now based (at least in part) on RNNs. In this last section, we will take a quick look at what a machine translation model looks like. This topic is very well covered by TensorFlow’s awesome Word2Vec and Seq2Seq tutorials, so you should definitely check them out. Word Embeddings Before we start, we need to choose a word representation. One option could be to represent each word using a one-hot vector.
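
A one-hot representation can be sketched in plain Python as follows. The helper names `build_vocab` and `one_hot` and the sample sentence are illustrative assumptions, not code from the book.

```python
def build_vocab(tokens):
    """Map each distinct token to an index, in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def one_hot(word, vocab):
    """Return a vector of zeros with a single 1 at the word's index."""
    vector = [0] * len(vocab)
    vector[vocab[word]] = 1
    return vector


tokens = "i loved this movie i loved it".split()
vocab = build_vocab(tokens)
print(vocab)
print(one_hot("movie", vocab))
```

Each vector is as long as the vocabulary, so with a realistic vocabulary of tens of thousands of words these vectors become very large and very sparse.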

pages: 391 words: 123,597

Targeted: The Cambridge Analytica Whistleblower's Inside Story of How Big Data, Trump, and Facebook Broke Democracy and How It Can Happen Again by Brittany Kaiser

Albert Einstein, Amazon Mechanical Turk, Asian financial crisis, Bernie Sanders, bitcoin, blockchain, Boris Johnson, Burning Man, call centre, centre right, Chelsea Manning, clean water, cognitive dissonance, crony capitalism, Dominic Cummings, Donald Trump, Edward Snowden, Etonian, haute couture, illegal immigration, Julian Assange, Mark Zuckerberg, Menlo Park, Nelson Mandela, off grid, open borders, Renaissance Technologies, Robert Mercer, rolodex, sentiment analysis, Silicon Valley, Silicon Valley startup, Skype, Snapchat, statistical model, the High Line, the scientific method, WikiLeaks, young professional

And testing done by the data scientists and digital strategists, such as putting money behind a controlled set of ads versus targeted issues, could show (by measuring everything from the percentage increase in viewers’ favorability for Donald Trump to the percentage increase in the viewers’ intention to vote for him) if that campaign was working to convert impressions into votes. Besides Molly’s sets of dashboards, the team had access to data from “sentiment analysis platforms” such as Synthesio and Crimson Hexagon, which measured the effect, positive or negative, that all the campaign’s tweets, including Trump’s, were having.2 For example, if the campaign put out a video of Hillary calling Trump supporters “deplorables,” it could put money behind a few different versions of the ad and watch its performance in real time to determine how many people were watching, whether they paused the video, and whether they finished watching the video.

I needed either to find a way to stay at the company and make use of data for the common good, which had been my intention in joining SCL in the first place, or get out somehow. In this tumultuous moment, I pursued the former idea quietly, by reaching out to social justice and human rights contacts. I saw more clearly than ever that CA might be able to use Big Data to help diplomats manage crises in conflict zones. I brainstormed ways that AI and new language recognition and sentiment analysis could assist us in processing massive amounts of war crimes testimony, finding patterns in it. Perhaps psychographic modeling, which had been deployed on the U.S. population—to, I felt, disastrous effect—could be used to create regime change where it was most needed. I worked with Robert Murtfeld to reach out to Fatou Bensouda, the prosecutor of the International Criminal Court, and the U.S. ambassador-at-large for war crimes, Stephen Rapp, and we began to explore some options.

Trump Make America Great Again; Understanding the Voting Electorate,” PowerPoint presentation, Cambridge Analytica office, New York, December 7, 2016. 5.Lauren Etter, Vernon Silver, and Sarah Frier, “How Facebook’s Political Unit Enables the Dark Art of Digital Propaganda,”, December 21, 2017,–12–21/inside-the-facebook-team-helping-regimes-that-reach-out-and-crack-down. 6.Nancy Scola, “How Facebook, Google, and Twitter ‘Embeds’ Helped Trump in 2016,” Politico, October 26, 2017, 11: BREXIT BRITTANY 1.Jeremy Herron and Anna-Louise Jackson, “World Markets Roiled by Brexit as Stocks, Pound Drop; Gold Soars,”, June 23, 2016,–06–23/pound-surge-builds-as-polls-show-u-k-to-remain-in-eu-yen-slips. 2.Aaron Wherry, “Canadian Company Linked to Data Scandal Pushes Back at Whistleblower’s Claims: AggregateIQ Denies Links to Scandal-Plagued Cambridge Analytica,” CBC, April 24, 2018, 13: POSTMORTEM 1. Nancy Scola, “How Facebook, Google, and Twitter ‘Embeds’ Helped Trump in 2016,” Politico, October 26, 2017, 2. Sentiment analysis has its roots, interestingly enough, in the innovations Robert Mercer pioneered years before at IBM. For the campaign, it measured not only if people liked tweets or retweeted them but something more nuanced: whether tweeters were feeling positive or negative when composing their tweets. 3.Glenn Kessler, “Did Michelle Obama Throw Shade at Hillary Clinton?” Washington Post, November 1, 2016,

pages: 271 words: 77,448

Humans Are Underrated: What High Achievers Know That Brilliant Machines Never Will by Geoff Colvin

Ada Lovelace, autonomous vehicles, Baxter: Rethink Robotics, Black Swan, call centre, capital asset pricing model, commoditize, computer age, corporate governance, creative destruction, deskilling, Freestyle chess, future of work, Google Glasses, Grace Hopper, industrial cluster, industrial robot, interchangeable parts, job automation, knowledge worker, low skilled workers, Marc Andreessen, meta analysis, meta-analysis, Narrative Science, new economy, rising living standards, self-driving car, sentiment analysis, Silicon Valley, Skype, social intelligence, Steve Jobs, Steve Wozniak, Steven Levy, Steven Pinker, theory of mind, Tim Cook: Apple, transaction costs

You’ve noticed that even the camera in your phone can detect faces and put little boxes around them. More advanced software can examine those faces and spot the muscle movements from Ekman’s system. The possibilities of such technology prompted six PhDs at the University of California at San Diego to form Emotient and to recruit Ekman to their advisory board. Point a video camera at any person’s face, and the company’s Sentiment Analysis software can tell you that person’s overall sentiment (positive, negative, neutral) plus display a continually updating bar chart showing levels of seven primary emotions—joy, surprise, sadness, fear, disgust, contempt, anger—and two advanced emotions, frustration and confusion (advanced because they’re combinations of other emotions). Point the camera at a group of people and it analyzes all their emotions and gives you a composite readout.

Incorporate the software into Google Glass, as the company has done, and the emotion readouts for anyone you’re looking at appear before your eyes (and yes, several people quickly noted that the emotion you may very well detect is contempt for you because you’re wearing Google Glass). Emotient’s initial target market for selling the Sentiment Analysis system was retailers, but the possibilities are obviously much broader. Affectiva, a spin-off from MIT’s Media Lab, also uses Ekman’s research to analyze facial expressions, selling its software to marketers and advertisers so they can conduct consumer research online using webcams. No need to get your research subjects into a focus group and guess what they’re thinking; just have them talk to you online and let their faces tell the story.

pages: 297 words: 83,651

The Twittering Machine by Richard Seymour

4chan, anti-communist, augmented reality, Bernie Sanders, Cal Newport, Cass Sunstein, Chelsea Manning, citizen journalism, colonial rule, correlation does not imply causation, credit crunch, crowdsourcing, don't be evil, Donald Trump, Elon Musk, Erik Brynjolfsson, Filter Bubble, Google Chrome, Google Earth, hive mind, informal economy, Internet of things, invention of movable type, invention of writing, Jaron Lanier, Jony Ive, Kevin Kelly, knowledge economy, late capitalism, liberal capitalism, Mark Zuckerberg, Marshall McLuhan, meta analysis, meta-analysis, Mohammed Bouazizi, moral panic, move fast and break things, Network effects, new economy, packet switching, patent troll, Philip Mirowski, post scarcity, post-industrial society, RAND corporation, Rat Park, rent-seeking, replication crisis, sentiment analysis, Shoshana Zuboff, Silicon Valley, Silicon Valley ideology, smart cities, Snapchat, Steve Jobs, Stewart Brand, Stuxnet, TaskRabbit, technoutopianism, the scientific method, Tim Cook: Apple, undersea cable, upwardly mobile, white flight, Whole Earth Catalog, WikiLeaks

Moreover, the conceptual schema from which tools are generated can be transferred to new contexts, thus generating new types of relationship. To talk about technologies is to talk about societies. This is about a social industry. As an industry it is able, through the production and harvesting of data, to objectify and quantify social life in numerical form. As William Davies has argued, its unique innovation is to make social interactions visible and susceptible to data analytics and sentiment analysis.6 This makes social life eminently susceptible to manipulation on the part of governments, parties and companies who buy data services. But more than that, it produces social life; it programmes it. This is what it means when we spend more hours tapping on the screen than talking to anyone face to face; that our social life is governed by algorithm and protocol. When Theodor Adorno wrote of the ‘culture industry’, arguing that culture was being universally commodified and homogenized, it was arguably an elitist simplification.

It is for cultural reasons, external to the logic of the platform, that such content can pose a threat, by inviting government regulation or encouraging users to disconnect. Even then, there is little the platforms can do without upsetting the ecologies of attention and data creation. For example, Facebook’s efforts to demonstrate conscientious engagement include changing the content of someone’s feed if the machine’s sentiment analysis discloses that they might be at risk of suicide. A page offering help for suicidal people might appear in the feed. Friends of the possible suicide might see an enlarged ‘report post’ button. But what if there are perverse incentives that arise from features that are intrinsic to the profit model? What if Conrad’s ‘demon of perverse inspiration’ now works by algorithm?40 In 2017, for example, a young woman from Ohio was sent to prison for nine months, after she live-streamed the rape of her friend by an older man.41 Marina Lonina was eighteen years old, her friend was seventeen and the rapist, Raymond Gates, was twenty-nine.

The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin

Bayesian statistics, business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, discrete time, disruptive innovation, George Gilder, Google Earth, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, late capitalism, lifelogging, linked data, longitudinal study, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

Examples would include entity extraction that automatically extracts metadata from text by searching for particular types of text and phrasing, such as person names, locations, dates, specialised terms and product terminology, and entity relation extraction that automatically identifies the relationships between semantic entities, linking them together (e.g., person name to birth date or location, or an opinion to an item) (McCreary 2009). A typical application of such techniques is sentiment analysis which seeks to determine the general nature and strength of opinions about an issue, for example, what people are saying about a product on social media. By using placemark metadata it is also possible to track where such sentiment is expressed (Graham et al. 2013) and to mine the dissemination of information within social media, for example, how widely Web addresses are favourited and shared between multiple users (Ohlhorst 2013).
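The idea of determining the "general nature and strength" of opinions can be illustrated with a minimal lexicon-based sketch. This is only one simple approach and is not taken from the excerpt: the word weights in `LEXICON`, the `NEGATORS` set, and the function names `score`, `label`, and `summarize` are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration, not from the source):
# each word maps to a polarity weight; the sign is the nature, the size the strength.
LEXICON = {
    "love": 2.0, "great": 1.5, "good": 1.0,
    "bad": -1.0, "awful": -2.0, "hate": -2.0,
}
# Words that flip the polarity of the word immediately after them.
NEGATORS = {"not", "never", "no"}


def score(text):
    """Return a signed opinion score for one post (0.0 if no lexicon word occurs)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        weight = LEXICON.get(tok)
        if weight is not None:
            total += -weight if negate else weight
        # Negation only reaches the very next token, so reset it here.
        negate = False
    return total


def label(value):
    """Map a signed score to a coarse polarity label."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


def summarize(posts):
    """Count polarity labels and compute the mean score across posts."""
    scores = [score(p) for p in posts]
    counts = Counter(label(s) for s in scores)
    mean = sum(scores) / len(scores) if scores else 0.0
    return counts, mean


if __name__ == "__main__":
    posts = ["I love this phone", "not good", "It arrived on Tuesday"]
    counts, mean = summarize(posts)
    print(dict(counts), round(mean, 2))
```

A hand-built lexicon like this misses sarcasm, context, and most vocabulary; production systems typically rely on much larger lexical resources or trained classifiers.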


pages: 918 words: 257,605

The Age of Surveillance Capitalism by Shoshana Zuboff

Amazon Web Services, Andrew Keen, augmented reality, autonomous vehicles, barriers to entry, Bartolomé de las Casas, Berlin Wall, bitcoin, blockchain, blue-collar work, book scanning, Broken windows theory, California gold rush, call centre, Capital in the Twenty-First Century by Thomas Piketty, Cass Sunstein, choice architecture, citizen journalism, cloud computing, collective bargaining, Computer Numeric Control, computer vision, connected car, corporate governance, corporate personhood, creative destruction, cryptocurrency, dogs of the Dow, don't be evil, Donald Trump, Edward Snowden, Erik Brynjolfsson, facts on the ground, Ford paid five dollars a day, future of work, game design, Google Earth, Google Glasses, Google X / Alphabet X, hive mind, impulse control, income inequality, Internet of things, invention of the printing press, invisible hand, Jean Tirole, job automation, Johann Wolfgang von Goethe, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, knowledge economy, linked data, longitudinal study, low skilled workers, Mark Zuckerberg, market bubble, means of production, multi-sided market, Naomi Klein, natural language processing, Network effects, new economy, Occupy movement, off grid, PageRank, Panopticon Jeremy Bentham, pattern recognition, Paul Buchheit, performance metric, Philip Mirowski, precision agriculture, price mechanism, profit maximization, profit motive, recommendation engine, refrigerator car, RFID, Richard Thaler, ride hailing / ride sharing, Robert Bork, Robert Mercer, Second Machine Age, self-driving car, sentiment analysis, shareholder value, Shoshana Zuboff, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, slashdot, smart cities, Snapchat, social graph, social web, software as a service, speech recognition, statistical model, Steve Jobs, Steven Levy, structural adjustment programs, The Future of Employment, The Wealth of Nations by Adam Smith, Tim Cook: Apple, two-sided market, union organizing, Watson beat the top human players on Jeopardy!, winner-take-all economy, Wolfgang Streeck

Machine Emotion In 2015 an eight-year-old startup named Realeyes won a 3.6 million euro grant from the European Commission for a project code-named “SEWA: Automatic Sentiment Analysis in the Wild.” The aim was “to develop automated technology that will be able to read a person’s emotion when they view content and then establish how this relates to how much they liked the content.” The director of video at AOL International called the project “a huge leap forward in video ad tech” and “the Holy Grail of video marketing.”86 Just a year later, Realeyes won the commission’s Horizon 2020 innovation prize thanks to its “machine learning-based tools that help market researchers analyze the impact of their advertising and make it more relevant.”87 The SEWA project is a window on a burgeoning new domain of rendition and behavioral surplus supply operations known as “affective computing,” “emotion analytics,” and “sentiment analysis.” The personalization project descends deeper toward the ocean floor with these new tools, where they lay claim to yet a new frontier of rendition trained not only on your personality but also on your emotional life.

Patrick Mannion, “Facial-Recognition Sensors Adapt to Track Emotions, Mood, and Stress,” EDN, March 3, 2016; “Marketers, Welcome to the World of Emotional Analytics,” MarTech Today, January 12, 2016; Ben Virdee-Chapman, “5 Companies Using Facial Recognition to Change the World,” Kairos, May 26, 2016; “Affectiva Announces New Facial Coding Solution for Qualitative Research,” Affectiva, May 7, 2014; Ahmad Jalal, Shaharyar Kamal, and Daijin Kim, “Human Depth Sensors-Based Activity Recognition Using Spatiotemporal Features and Hidden Markov Model for Smart Environments,” Journal of Computer Networks and Communications (2016); M. Kakarla and G. R. M. Reddy, “A Real Time Facial Emotion Recognition Using Depth Sensor and Interfacing with Second Life Based Virtual 3D Avatar,” in International Conference on Recent Advances and Innovations in Engineering (ICRAIE-2014), 2014, 1–7. 90. “Sewa Project: Automatic Sentiment Analysis in the Wild,” SEWA, April 25, 2017. 91. Mihkel Jäätma, “Realeyes—Emotion Measurement,” Realeyes Data Services, 2016. 92. Mihkel Jäätma, “Realeyes—Emotion Measurement.” 93. Alex Browne, “Realeyes—Play Your Audience Emotions to Stay on Top of the Game,” Realeyes, February 21, 2017. 94.

See also social comparison self-presentation, 462, 464, 472 self-regulation, by firms, 108, 110, 113, 147, 248–249, 305, 341 self-regulation, human, 307–308 Selvaggio, Leo, 489–490 Senate Committee on Commerce, Science, and Transportation, 169 Senate Subcommittee on Constitutional Rights, 320, 322, 323–325 Sense Networks, 425 sensors: for analysis of social relations (sociometer), 420, 423–424; and behavioral modification, 293–294; and emotion analytics, 283; in wearable technologies, 247–248 sensors, ubiquitous, 207–209, 240. See also internet of things; “smart” products; wearable technologies September 11, 2001 (9/11) attacks, 9–10, 101, 112–115, 193–194, 341 Sequoia Capital, 68, 72 Sesame Credit, 390, 391–392, 393 SEWA: Automatic Sentiment Analysis in the Wild, 282–284 shadow text: access to, 483–485; as digital dossier, 393; Instagram’s use of, 457–458; need for rejection of, 344; as pathological division of learning, 186–187, 327–328; and reality business, 202; as reversion to pre-Gutenberg order, 190. See also uncertainty Shaffer, Howard, 450, 451 shareholder value maximization, 38–39, 41, 175, 181–182, 370, 499 shock and awe approach (speed as violence), 344, 346, 400, 406 Short, Jodi, 107–108 Shorten, Richard, 359 Sidewalk Labs, 228–232 signal blocking, 489 Silicon Valley, business environment in, 72–73 Simitis, Spiros, 191 Singularity University, 426 Siri (Apple digital assistant), 269 “A 61-Million-Person Experiment in Social Influence and Political Mobilization” (Bond et al.), 299 Skinner, B.

pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst

algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application

CHALLENGES REMAIN Locating the right talent to analyze data is the biggest hurdle in building a team. Such talent is in high demand, and the need for data analysts and data scientists continues to grow at an almost exponential rate. Finding this talent means that organizations will have to focus on data science and hire statistical modelers and text data–mining professionals as well as people who specialize in sentiment analysis. Success with Big Data analytics requires solid data models, statistical predictive models, and text analytic models, since these will be the core applications needed to do Big Data. Locating the appropriate talent takes more than just a typical IT job placement; the skills required for a good return on investment are not simple and are not solely technology oriented. Some organizations may turn to consulting firms to meet the need for talent; however, many consulting firms also have trouble finding the experts that can make Big Data pay off.

pages: 593 words: 118,995

Relevant Search: With Examples Using Elasticsearch and Solr by Doug Turnbull, John Berryman

commoditize, crowdsourcing, domain-specific language, finite state, fudge factor, full text search, information retrieval, natural language processing, premature optimization, recommendation engine, sentiment analysis

Otherwise, users might not find a document because it contains a misspelling of the query term. Or they may find 20 duplicates of the same document, which would have the effect of pushing other relevant documents off the end of the search results page. Second, often the existing data can be post-processed to augment the features already there. For instance, machine-learning techniques can be used to classify or cluster documents. Or sentiment analysis can be used to determine whether the text in a document is more positive or negative in tone. The possibilities are endless. After this new metadata is attached to the documents, it can serve as a valuable feature for users to search upon. Finally, new information can be merged into the documents from external sources. For instance, in e-commerce the products being sold often come from external vendors.

relevance-blind enterprise, 2nd relevance-centered enterprise business and domain awareness content curation risk of miscommunication with content curator role of content curator feedback learning to rank paired relevance tuning test-driven relevance using with user behavioral data user-focused culture vs. data-driven culture relevance-focused search application deploying designing combine and balance signals combining and balancing signals defining and modeling signals user experience improving information and requirements gathering business needs required and available information users and information needs law of diminishing returns monitoring requests library reranking rescoring response page retail_analyzer filter retail_syn_filter filter retention reweighting boosts S salient features scale variable scorable units score boost, 2nd score shaping boosting additive, with Boolean queries multiplicative, with function queries, 2nd signals defined filtering Solr strategies for achieving users’ recency goals capturing general-quality metrics combining function queries high-value tiers scored with function queries ignoring TF × IDF modeling boosting signals ranking scored documents scoring tiers, 2nd script scoring, 2nd search content exploring providing to search engine searching document search and retrieval aggregations Boolean search facets filtering Lucene-based search positional and phrase matching ranked results relevance sorting documents inverted index data structure analysis enrichment extraction indexing search antipattern search completion choosing method for from documents being searched from user input via specialized search indexes search engineer search relevance collaboration and curation and defined difficulty of class of search and lack of single solution feedback and gaining skills of relevance engineer information retrieval research into systematic approach for improving search-as-you-type searchable data semantic expansion sentiment analysis 
sentinel tokens, 2nd sharding short-tail application SHOULD clause, 2nd, 3rd, 4th, 5th signal construction signal discordance, 2nd avoiding combining fields into custom all fields mechanics of solving with cross_fields search signal measuring signal modeling best_fields calibrating controlling field preference in results more-precise signals field synchronicity and most_fields, 2nd boosting in when additional matches don’t matter signals boosting, 2nd combining and balancing behavior of signal weights building queries for related signals combining subqueries tuning and testing overall search tuning relevance parameters concept defined defining and modeling implementing source data model silli token similarity simple constants SimpleText data structure, 2nd snippet highlighting Solr analyzers analysis and mapping features building custom field mappings boosting additive, with Boolean queries boosting feature mappings multiplicative, with function queries feedback faceted browsing field collapsing match phrase prefix relevance feedback feature mappings suggestion and highlighting components multifield search all fields cross_fields search ergonomics query differences between Solr and Elasticsearch query feature mappings term-centric and field-centric search with edismax query parser sorting source data model span queries specificity, modeling with paths with synonyms standard analyzer, 2nd, 3rd, 4th standard filter, 2nd standard tokenizer, 2nd, 3rd, 4th, 5th standard_clone analyzer stemming stop filter stop words, 2nd, 3rd stored fields storing metadata string types subdivided text subobjects subquadrants suggest clause suggest endpoint, 2nd suggestion field sum_other_doc_count synonyms augmenting content with modeling specificity with overview, 2nd T term dictionary, 2nd term filter term frequency.

pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem

Amazon Web Services, anti-pattern, bioinformatics, commoditize, corporate governance, create, read, update, delete, data acquisition, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, web application

As in the social use case, making an effective recommendation depends on understanding the connections between things, as well as the quality and strength of those connections, all of which are best expressed as a property graph. Queries are primarily graph local, in that they start with one or more identifiable subjects, whether people or resources, and thereafter discover surrounding portions of the graph. Taken together, social networks and recommendation engines provide key differentiating capabilities in the areas of retail, recruitment, sentiment analysis, search, and knowledge management. Graphs are a good fit for the densely connected data structures germane to each of these areas; storing and querying this data using a graph database allows an application to surface end-user real-time results that reflect recent changes to the data, rather than pre-calculated, stale results. Geo Geospatial is the original graph use case: Euler solved the Seven Bridges of Königsberg problem by positing a mathematical theorem which later came to form the basis of graph theory.

pages: 207 words: 59,298

The Gig Economy: A Critical Introduction by Jamie Woodcock, Mark Graham

Airbnb, Amazon Mechanical Turk, autonomous vehicles, barriers to entry, British Empire, business process, business process outsourcing, call centre, collective bargaining, commoditize, corporate social responsibility, crowdsourcing, David Graeber, deindustrialization, disintermediation, full employment, future of work, gender pay gap, gig economy, global value chain, informal economy, information asymmetry, inventory management, Jaron Lanier, Jeff Bezos, job automation, knowledge economy, Lyft, mass immigration, means of production, Network effects, new economy, Panopticon Jeremy Bentham, planetary scale, precariat, rent-seeking, RFID, ride hailing / ride sharing, Ronald Reagan, self-driving car, sentiment analysis, sharing economy, Silicon Valley, Silicon Valley ideology, TaskRabbit, The Future of Employment, transaction costs, Travis Kalanick, two-sided market, Uber and Lyft, Uber for X, uber lyft, union organizing, women in the workforce, working poor, young professional

[Figure 3(a): The availability of cloudwork. Figure 3(b): The location of cloudworkers on the five largest English-language platforms.] Amazon’s Mechanical Turk – the world’s most well-known microwork platform – refers to these tasks as ‘artificial artificial intelligence’. These are tasks that usually rely on a distinctly human ability to interpret things (for instance image recognition or sentiment analysis). These are tasks that might, in theory, be performed by AI, but are cheaper and/or quicker to simply outsource to human workers. For some types of task, it may not be a simple case of humans or artificial intelligence, but rather human microworkers embedded into otherwise automated systems through application programming interfaces (APIs). Here, workers are essentially treated as part of software, algorithms and ‘automated’ processes.

pages: 204 words: 58,565

Keeping Up With the Quants: Your Guide to Understanding and Using Analytics by Thomas H. Davenport, Jinho Kim

Black-Scholes formula, business intelligence, business process, call centre, computer age, correlation coefficient, correlation does not imply causation, Credit Default Swap, feminist movement, Florence Nightingale: pie chart, forensic accounting, global supply chain, Hans Rosling, hypertext link, invention of the telescope, inventory management, Jeff Bezos, Johannes Kepler, longitudinal study, margin call, Moneyball by Michael Lewis explains big data, Myron Scholes, Netflix Prize, p-value, performance metric, publish or perish, quantitative hedge fund, random walk, Renaissance Technologies, Robert Shiller, self-driving car, sentiment analysis, six sigma, Skype, statistical model, supply-chain management, text mining, the scientific method, Thomas Davenport

More important is the fact that the map or polygon is fully structured and small in size, even though the original prints were not. While unstructured prints are an input to the process, the actual analysis to match them up doesn’t use the unstructured images, but rather structured information extracted from them. An example everyone will appreciate is the analysis of text. Let’s consider the now popular approach of social media sentiment analysis. Are tweets, Facebook postings, and other social comments directly analyzed to determine their sentiment? Not really. The text is parsed into words or phrases. Then, those words and phrases are flagged as good or bad. In a simple example, perhaps a “good” word gets a 1, a “bad” word gets a –1, and a “neutral” word gets a 0. The sentiment of the posting is determined by the sum of the individual word or phrase scores.
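The word-level scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny `LEXICON`, the `tokenize` helper, and the function names are all hypothetical, and a practical system would also have to handle phrases, negation, and sarcasm.

```python
import re

# Hypothetical toy lexicon: +1 for "good" words, -1 for "bad" words.
# Any word that is not listed is treated as neutral (score 0).
LEXICON = {
    "great": 1, "love": 1, "good": 1,
    "bad": -1, "terrible": -1, "hate": -1,
}


def tokenize(text):
    """Parse a posting into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text, lexicon=LEXICON):
    """Sum the per-word scores; the sign of the sum is the posting's sentiment."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def sentiment_label(score):
    """Map a numeric score onto a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in (
        "I love this phone, the camera is great",
        "Terrible battery and bad support",
        "The package arrived on Tuesday",
    ):
        score = sentiment_score(post)
        print(post, "->", score, sentiment_label(score))
```

The point the passage makes is visible in the code: the analysis never operates on the raw text itself, only on the structured list of scores extracted from it.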

Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

., parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database). This is followed by deriving patterns within the structured data, and evaluation and interpretation of the output. “High quality” in text mining usually refers to a combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity-relation modeling (i.e., learning relations between named entities). Other examples include multilingual data mining, multidimensional text analysis, contextual text mining, and trust and evolution analysis in text data, as well as text mining applications in security, biomedical literature analysis, online media analysis, and analytical customer relationship management.

Cormack [BCC10]; Manning, Raghavan, and Schutze [MRS08]; Grossman and Frieder [GR04]; Baeza-Yates and Riberio-Neto [BYRN11]; Zhai [Zha08]; Feldman and Sanger [FS06]; Berry [Ber03]; and Weiss, Indurkhya, Zhang, and Damerau [WIZD04]. Text mining is a fast-developing field with numerous papers published in recent years, covering many topics such as topic models (e.g., Blei and Lafferty [BL09]); sentiment analysis (e.g., Pang and Lee [PL07]); and contextual text mining (e.g., Mei and Zhai [MZ06]). Web mining is another focused theme, with books like Chakrabarti [Cha03a], Liu [Liu06] and Berry [Ber03]. Web mining has substantially improved search engines with a few influential milestone works, such as Brin and Page [BP98]; Kleinberg [Kle99]; Chakrabarti, Dom, Kumar, et al. [CDK+99]; and Kleinberg and Tomkins [KT99].

., SPOOK: A system for probabilistic object-oriented knowledge representation, In: Proc. 15th Annual Conf. Uncertainty in Artificial Intelligence (UAI’99) Stockholm, Sweden. (1999), pp. 541–550. [PKZT01] Papadias, D.; Kalnis, P.; Zhang, J.; Tao, Y., Efficient OLAP operations in spatial data warehouses, In: Proc. 2001 Int. Symp. Spatial and Temporal Databases (SSTD’01) Redondo Beach, CA. (July 2001), pp. 443–459. [PL07] Pang, B.; Lee, L., Opinion mining and sentiment analysis, Foundations and Trends in Information Retrieval 2 (2007) 1–135. [Pla98] Platt, J.C., Fast training of support vector machines using sequential minimal optimization, In: (Editors: Schölkopf, B.; Burges, C.J.C.; Smola, A.) Advances in Kernel Methods—Support Vector Learning (1998) MIT Press, Cambridge, MA, pp. 185–208. [PP07] Patcha, A.; Park, J.-M., An overview of anomaly detection techniques: Existing solutions and latest technological trends, Computer Networks 51 (12) (2007) 3448–3470.

pages: 523 words: 61,179

Human + Machine: Reimagining Work in the Age of AI by Paul R. Daugherty, H. James Wilson

3D printing, AI winter, algorithmic trading, Amazon Mechanical Turk, augmented reality, autonomous vehicles, blockchain, business process, call centre, carbon footprint, cloud computing, computer vision, correlation does not imply causation, crowdsourcing, digital twin, disintermediation, Douglas Hofstadter, Erik Brynjolfsson, friendly AI, future of work, industrial robot, Internet of things, inventory management, iterative process, Jeff Bezos, job automation, job satisfaction, knowledge worker, Lyft, natural language processing, personalized medicine, precision agriculture, Ray Kurzweil, recommendation engine, RFID, ride hailing / ride sharing, risk tolerance, Rodney Brooks, Second Machine Age, self-driving car, sensor fusion, sentiment analysis, Shoshana Zuboff, Silicon Valley, software as a service, speech recognition, telepresence, telepresence robot, text mining, the scientific method, uber lyft

Applications include computational speech, and audio and audiovisual processing. Speech to text. Neural networks that convert audio signals to text signals in a variety of languages. Applications include translation, voice command and control, audio transcription, and more. Natural language processing (NLP). A field in which computers process human (natural) languages. Applications include speech recognition, machine translation, and sentiment analysis. AI Applications Component Intelligent agents. Agents that interact with humans via natural language. They can be used to augment human workers working in customer service, human resources, training, and other areas of business to handle FAQ-type inquiries. Collaborative robotics (cobots). Robots that operate at slower speeds and are fitted with sensors to enable safe collaboration with human workers.

pages: 237 words: 64,411

Humans Need Not Apply: A Guide to Wealth and Work in the Age of Artificial Intelligence by Jerry Kaplan

Affordable Care Act / Obamacare, Amazon Web Services, asset allocation, autonomous vehicles, bank run, bitcoin, Bob Noyce, Brian Krebs, business cycle, buy low sell high, Capital in the Twenty-First Century by Thomas Piketty, combinatorial explosion, computer vision, corporate governance, crowdsourcing, Erik Brynjolfsson, estate planning, Flash crash, Gini coefficient, Goldman Sachs: Vampire Squid, haute couture, hiring and firing, income inequality, index card, industrial robot, information asymmetry, invention of agriculture, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, Loebner Prize, Mark Zuckerberg, mortgage debt, natural language processing, Own Your Own Home, pattern recognition, Satoshi Nakamoto, school choice, Schrödinger's Cat, Second Machine Age, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Skype, software as a service, The Chicago School, The Future of Employment, Turing test, Watson beat the top human players on Jeopardy!, winner-take-all economy, women in the workforce, working poor, Works Progress Administration

Human traders endeavor to become expert in these matters, but no one comes close to the ability of a synthetic intellect to observe broad or subtle patterns. One of my favorite examples is that the number of prepaid cell phone cards purchased is an indicator of the size of certain crops in Africa, because the individual farmers, watching their crops grow, are preparing to contact potential buyers. The more optimistic they are, the more they spend on talk minutes. The latest foray in this arena uses what’s called “sentiment analysis.” Yes, that kind of sentiment— programs at investment banks scour the Internet for positive or negative comments about products and companies, then trade on the information. The typical justification proffered for doing all this is that HFT programs are providing a service to society. They are simply cleaning up inefficiencies in the markets. But this whitewashes a darker truth. Yes, they make the financial markets nice and tidy, but they obscure a deeper cost.

pages: 268 words: 75,850

The Formula: How Algorithms Solve All Our Problems-And Create More by Luke Dormehl

3D printing, algorithmic trading, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, big data - Walmart - Pop Tarts, call centre, Cass Sunstein, Clayton Christensen, commoditize, computer age, death of newspapers, deferred acceptance, disruptive innovation, Edward Lorenz: Chaos theory, Erik Brynjolfsson, Filter Bubble, Flash crash, Florence Nightingale: pie chart, Frank Levy and Richard Murnane: The New Division of Labor, Google Earth, Google Glasses, High speed trading, Internet Archive, Isaac Newton, Jaron Lanier, Jeff Bezos, job automation, John Markoff, Kevin Kelly, Kodak vs Instagram, lifelogging, Marshall McLuhan, means of production, Nate Silver, natural language processing, Netflix Prize, Panopticon Jeremy Bentham, pattern recognition, price discrimination, recommendation engine, Richard Thaler, Rosa Parks, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Slavoj Žižek, social graph, speech recognition, Steve Jobs, Steven Levy, Steven Pinker, Stewart Brand, the scientific method, The Signal and the Noise by Nate Silver, upwardly mobile, Wall-E, Watson beat the top human players on Jeopardy!, Y Combinator

Gild additionally looks at where individuals spend time online, since this has been shown to be a strong predictor of workplace skills. “If you spend a lot of time blogging it suggests that you’re not quite as good a programmer as someone who spends their time on Quora,” Ming says, referring to the question-and-answer website founded by two former Facebook employees. Even Twitter feeds are mined for their insights, using semantic and sentiment analysis. At the end, factors are combined to give prospective employees a “Gild Score” out of 100. “It’s very cool if you’re geeky about algorithms, but the really important take-away is that what we end up with is truly independent dimensions for describing people out in the world,” she says. “We’re talking about algorithms whose entire intent and purpose is to aggregate across your entire life to build up a very accurate representation of who you are.”

Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman

cloud computing, crowdsourcing, first-price auction, G4S, information retrieval, John Snow's cholera map, Netflix Prize, NP-complete, PageRank, pattern recognition, random walk, recommendation engine, second-price auction, sentiment analysis, social graph, statistical model, web application

In some applications we can use the stars that people use to rate products or services on sites like Amazon or Yelp. For example, we might want to estimate the number of stars that would be assigned to reviews or tweets about a product, even if those reviews do not have star ratings. If we use star-labeled reviews as a training set, we can deduce the words that are most commonly associated with positive and negative reviews (called sentiment analysis). The presence of these words in other reviews can tell us the sentiment of those reviews. 12.1.5 Exercises for Section 12.1 EXERCISE 12.1.1 Redo Example 12.2 for the following different forms of f(x). (a) Require f(x) = ax; i.e., a straight line through the origin. Is the line that we discussed in the example optimal? (b) Require f(x) to be a quadratic, i.e., f(x) = ax^2 + bx + c. 12.2 Perceptrons A perceptron is a linear binary classifier.
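The idea of deducing sentiment-bearing words from star-labeled reviews can be illustrated with a small Python sketch. Everything here is an assumption made for the example rather than the book's method: the star thresholds, the add-one smoothed log ratio used as a word weight, and the helper names are hypothetical, and the sketch ignores class imbalance for simplicity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a review into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def learn_word_weights(labeled_reviews, pos_min=4, neg_max=2):
    """Learn a weight per word from (stars, text) pairs.

    Reviews with stars >= pos_min count as positive and stars <= neg_max
    as negative; anything in between is ignored. The weight is an add-one
    smoothed log ratio of positive to negative document counts, so it is
    above zero for words seen mostly in positive reviews.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for stars, text in labeled_reviews:
        words = set(tokenize(text))  # count each word once per review
        if stars >= pos_min:
            pos_counts.update(words)
        elif stars <= neg_max:
            neg_counts.update(words)
    vocabulary = set(pos_counts) | set(neg_counts)
    return {
        word: math.log((pos_counts[word] + 1) / (neg_counts[word] + 1))
        for word in vocabulary
    }


def predict_sentiment(text, weights):
    """Score an unlabeled review by summing the weights of its known words."""
    return sum(weights.get(word, 0.0) for word in set(tokenize(text)))


TRAINING = [
    (5, "great phone, excellent battery"),
    (4, "excellent screen and great sound"),
    (1, "awful battery, broke quickly"),
    (2, "awful screen, poor sound"),
]

if __name__ == "__main__":
    weights = learn_word_weights(TRAINING)
    print(predict_sentiment("excellent camera, great value", weights))
    print(predict_sentiment("awful, poor quality", weights))
```

Words such as "battery" that appear equally often in both classes end up with a weight of zero, which is why they contribute nothing to the score of an unlabeled review.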

., 67 S-curve, 84, 93 Saberi, A., 291 Salihoglu, S., 66 Sample, 215, 218, 221, 223, 242, 249, 253 Sampling, 127, 141 Savasere, A., 226 SCC, see Strongly connected component Schapire, R.E., 458 Schema, 30 Schutze, H., 18 Score, 105 Search ad, 268 Search engine, 166, 181 Search query, 125, 155, 176, 268, 285 Second-price auction, 279 Secondary storage, see Disk Selection, 31, 33 Sensor, 124 Sentiment analysis, 422 Set, 76, 112, see also Itemset Set difference, see Difference Shankar, S., 67 Shawe-Taylor, J., 458 Shi, J., 383 Shim, K., 266 Shingle, 72, 85, 109 Shivakumar, N., 226 Shopping cart, 193 Shortest paths, 42 Siddharth, J., 122 Signature, 75, 78, 85 Signature matrix, 78, 83 Silberschatz, A., 153 Silberstein, A., 67 Similarity, 4, 15, 69, 191, 299, 306 Similarity join, 52, 58 Simrank, 357 Singleton, R.C., 153 Singular value, 397, 401, 402 Singular-value decomposition, 312, 384, 397, 406 Six degrees of separation, 369 Sketch, 100 Skew, 26 Sliding window, 126, 142, 148, 257 Smart transitive closure, 372 Smith, B., 324 SNAP, 382 Social Graph, 326 Social network, 16, 325, 326, 384 SON Algorithm, 217 Source, 367 Space, 87, 228 Spam, see also Term spam, see also Link spam, 328, 421 Spam farm, 178, 180 Spam mass, 180, 181 Sparse matrix, 28, 76, 77, 168, 293 Spectral partitioning, 343 Spider trap, 161, 164, 184 Splitting clusters, 255 SQL, 19, 30, 66 Squares, 366 Srikant, R., 226 Srivastava, U., 67 Standard deviation, 245, 247 Standing query, 125 Stanford Network Analysis Platform, see SNAP Star join, 50 Stata, R., 18, 190 Statistical model, 1 Status, 287 Steinbach, M., 18 Stochastic gradient descent, 320, 445 Stochastic matrix, 158, 385 Stop clustering, 234, 238, 240 Stop words, 7, 74, 110, 194, 298 Stream, see Data stream Strength of membership, 355 String, 112 Striping, 29, 168, 170 Strong edge, 328 Strongly connected component, 159, 374 Strongly connected graph, 158, 368 Substochastic matrix, 161 Suffix length, 116 Summarization, 3 Summation, 147 Sun, J., 414 
Supercomputer, 19 Superimposed code, see Bloom filter, 152 Supermarket, 193, 214 Superstep, 43 Supervised learning, 415, 417 Support, 192, 216, 218, 221 Support vector, 437 Support-vector machine, 17, 415, 419, 436, 455 Supporting page, 178 Suri, S., 383 Surprise number, 137 SVD, see Singular-value decomposition SVM, see Support-vector machine Swami, A., 226 Symmetric matrix, 346, 384 Szegedy, M., 152 Tag, 298, 329 Tail, 372 Tail length, 135, 376 Tan, P.

pages: 255 words: 76,495

The Facebook era: tapping online social networks to build better products, reach new audiences, and sell more stuff by Clara Shih

business process, call centre, Clayton Christensen, cloud computing, commoditize, conceptual framework, corporate governance, crowdsourcing, glass ceiling, jimmy wales, Mark Zuckerberg, Metcalfe’s law, Network effects, pre–internet, rolodex, semantic web, sentiment analysis, Silicon Valley, Silicon Valley startup, social graph, social web, software as a service, Tony Hsieh, web application

Lexicon tracks the frequency and sentiment of a particular keyword (like your brand) across Facebook wall posts, status messages, and comments (see Figure 8.4). Figure 8.4 Lexicon is a keyword frequency and sentiment analysis tool that looks across all Facebook wall posts, status messages, and comments. Lexicon measures frequency in three ways: the number of posts in which the keyword appears, the number of Facebook members who have referenced the keyword in a post, and what percentage of all members reference this keyword. Sentiment reflects the percentage of posts referencing a keyword that are positive versus negative.
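The three frequency measures and the sentiment split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Lexicon's actual implementation: the `Post` record, the `keyword_metrics` function, the sample posts, and the idea that each post already carries a positive/negative/neutral label are all hypothetical, and the sentiment percentage is assumed to be taken over posts that are either positive or negative.

```python
from dataclasses import dataclass


@dataclass
class Post:
    member_id: str  # who wrote the post (hypothetical field)
    text: str       # body of the wall post, status message, or comment
    label: str      # "positive", "negative", or "neutral"; assumed pre-labeled


def keyword_metrics(posts, keyword, total_members):
    """Compute three frequency measures and a sentiment split for one keyword."""
    kw = keyword.lower()
    # Simple case-insensitive substring match; a real tool would tokenize.
    matching = [p for p in posts if kw in p.text.lower()]
    members = {p.member_id for p in matching}
    positive = sum(1 for p in matching if p.label == "positive")
    negative = sum(1 for p in matching if p.label == "negative")
    polar = positive + negative  # neutral posts are left out of the split
    return {
        "posts": len(matching),                  # posts mentioning the keyword
        "members": len(members),                 # distinct members mentioning it
        "member_share": len(members) / total_members if total_members else 0.0,
        "positive_pct": 100.0 * positive / polar if polar else 0.0,
        "negative_pct": 100.0 * negative / polar if polar else 0.0,
    }


posts = [
    Post("alice", "I love Acme coffee", "positive"),
    Post("alice", "Acme was closed today", "negative"),
    Post("bob", "Acme has the best muffins", "positive"),
    Post("carol", "Nice weather", "neutral"),
]
metrics = keyword_metrics(posts, "acme", total_members=4)
print(metrics)
```

In this toy data the keyword appears in three posts written by two of the four members, so the member share is one half, and two of the three polar posts are positive.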

pages: 301 words: 85,126

AIQ: How People and Machines Are Smarter Together by Nick Polson, James Scott

Air France Flight 447, Albert Einstein, Amazon Web Services, Atul Gawande, autonomous vehicles, availability heuristic, basic income, Bayesian statistics, business cycle, Cepheid variable, Checklist Manifesto, cloud computing, combinatorial explosion, computer age, computer vision, Daniel Kahneman / Amos Tversky, Donald Trump, Douglas Hofstadter, Edward Charles Pickering, Elon Musk, epigenetics, Flash crash, Grace Hopper, Gödel, Escher, Bach, Harvard Computers: women astronomers, index fund, Isaac Newton, John von Neumann, late fees, low earth orbit, Lyft, Magellanic Cloud, mass incarceration, Moneyball by Michael Lewis explains big data, Moravec's paradox, more computing power than Apollo, natural language processing, Netflix Prize, North Sea oil, p-value, pattern recognition, Pierre-Simon Laplace, ransomware, recommendation engine, Ronald Reagan, self-driving car, sentiment analysis, side project, Silicon Valley, Skype, smart cities, speech recognition, statistical model, survivorship bias, the scientific method, Thomas Bayes, Uber for X, uber lyft, universal basic income, Watson beat the top human players on Jeopardy!, young professional

Researchers in NLP simply didn’t have enough data to construct models that were sufficiently complicated to describe human language without overfitting what little data they had. As a result, by the 2000s, speech recognition again hit a plateau, at about 75–80% word-level accuracy. For nearly a decade, progress was discouragingly slow—and not just for speech recognition but also for other tasks in natural language processing that were hampered by a lack of data, from machine translation to sentiment analysis. Post 2010: The Natural Language Revolution Around 2010, everything started to change—slowly at first, then at a startling pace. What drove this change was a massive infusion of data. Jorge Luis Borges once wrote a story called “The Library of Babel,” about a library whose books contained all possible works of prose: that is, all possible orderings of the letters of the alphabet and the basic punctuation marks.

pages: 322 words: 84,752

Pax Technica: How the Internet of Things May Set Us Free or Lock Us Up by Philip N. Howard

Affordable Care Act / Obamacare, Berlin Wall, bitcoin, blood diamonds, Bretton Woods, Brian Krebs, British Empire, butter production in bangladesh, call centre, Chelsea Manning, citizen journalism, clean water, cloud computing, corporate social responsibility, creative destruction, crowdsourcing, digital map, Edward Snowden, failed state, Fall of the Berlin Wall, feminist movement, Filter Bubble, Firefox, Francis Fukuyama: the end of history, Google Earth, Howard Rheingold, income inequality, informal economy, Internet of things, Julian Assange, Kibera, Kickstarter, land reform, M-Pesa, Marshall McLuhan, megacity, Mikhail Gorbachev, mobile money, Mohammed Bouazizi, national security letter, Nelson Mandela, Network effects, obamacare, Occupy movement, packet switching, pension reform, prediction markets, sentiment analysis, Silicon Valley, Skype, spectrum auction, statistical model, Stuxnet, trade route, undersea cable, uranium enrichment, WikiLeaks, zero day

Liu Yazhou, political commissar of the University of National Defense, published an article in the People’s Liberation Army Daily arguing that today’s internet has become the main battlefield for ideological struggle. “Entering the new century,” he wrote recently, “whoever controls the internet, especially micro-blog resources, will have the right to control opinions.”44 The Party is aware that political conversations over social media have real-world consequences and can provide a metric of public opinion. Senior officials get exclusive access to social media sentiment analysis through the Party’s media research team. One Chinese pollster attributes a 10 percent drop in confidence in the Party to the rapid spread of microblogs.45 When moderates and ideologues are given equal access to digital media, people tend to use social media to marginalize extremism, hate speech, and radical ideas. In part, this is because digital networks are ultimately social networks. On a personal level, we often don’t like experiencing “socialization” because it can mean embarrassing correctives to our bad behavior.

pages: 305 words: 79,303

The Four: How Amazon, Apple, Facebook, and Google Divided and Conquered the World by Scott Galloway

activist fund / activist shareholder / activist investor, additive manufacturing, Affordable Care Act / Obamacare, Airbnb, Amazon Web Services, Apple II, autonomous vehicles, barriers to entry, Ben Horowitz, Bernie Sanders, big-box store, Bob Noyce, Brewster Kahle, business intelligence, California gold rush, cloud computing, commoditize, cuban missile crisis, David Brooks, disintermediation, don't be evil, Donald Trump, Elon Musk, follow your passion, future of journalism, future of work, global supply chain, Google Earth, Google Glasses, Google X / Alphabet X, Internet Archive, invisible hand, Jeff Bezos, Jony Ive, Khan Academy, longitudinal study, Lyft, Mark Zuckerberg, meta analysis, meta-analysis, Network effects, new economy, obamacare, Oculus Rift, offshore financial centre, passive income, Peter Thiel, profit motive, race to the bottom, RAND corporation, ride hailing / ride sharing, risk tolerance, Robert Mercer, Robert Shiller, Robert Shiller, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, shareholder value, Silicon Valley, Snapchat, software is eating the world, speech recognition, Stephen Hawking, Steve Ballmer, Steve Jobs, Steve Wozniak, Stewart Brand, supercomputer in your pocket, Tesla Model S, Tim Cook: Apple, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, undersea cable, Whole Earth Catalog, winner-take-all economy, working poor, young professional

Word of someone’s changed status can race through the network, reaching distant nodes that person doesn’t know exist. Facebook analyzes any resulting behavioral changes on the network whenever a customer switches his or her relationship information. As the following graph shows, single people communicate more on Facebook. It’s part of the preening of courtship. But once they enter a relationship, communication plummets. The Facebook machine tracks this and runs it through a process called “sentiment analysis”—categorizing positive and negative opinions, in words and photos, of each person’s level of happiness. And as you might expect, coupling significantly increases happiness (though there appears to be a dip following the initial euphoria).13 Meyer, Robinson. “When You Fall in Love This Is What Facebook Sees.” The Atlantic. It’s easy to be skeptical about Facebook, especially with all of the self-promotion, fake news, and groupthink spread on the platform.

pages: 307 words: 88,180

AI Superpowers: China, Silicon Valley, and the New World Order by Kai-Fu Lee

AI winter, Airbnb, Albert Einstein, algorithmic trading, artificial general intelligence, autonomous vehicles, barriers to entry, basic income, business cycle, cloud computing, commoditize, computer vision, corporate social responsibility, creative destruction, crony capitalism, Deng Xiaoping, deskilling, Donald Trump, Elon Musk, Erik Brynjolfsson, full employment, future of work, gig economy, Google Chrome, happiness index / gross national happiness, if you build it, they will come, ImageNet competition, income inequality, informal economy, Internet of things, invention of the telegraph, Jeff Bezos, job automation, John Markoff, Kickstarter, knowledge worker, Lean Startup, low skilled workers, Lyft, mandatory minimum, Mark Zuckerberg, Menlo Park, minimum viable product, natural language processing, new economy, pattern recognition, pirate software, profit maximization, QR code, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, risk tolerance, Robert Mercer, Rodney Brooks, Rubik’s Cube, Sam Altman, Second Machine Age, self-driving car, sentiment analysis, sharing economy, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, Skype, special economic zone, speech recognition, Stephen Hawking, Steve Jobs, strong AI, The Future of Employment, Travis Kalanick, Uber and Lyft, uber lyft, universal basic income, urban planning, Y Combinator

Finally, for students who are falling behind, the AI-powered student profile will notify parents of their child’s situation, giving a clear and detailed explanation of what concepts the student is struggling with. The parents can use this information to enlist a remote tutor through services such as VIPKid, which connects American teachers with Chinese students for online English classes. Remote tutoring has been around for some time, but perception AI now allows these platforms to continuously gather data on student engagement through expression and sentiment analysis. That data continually feeds into a student’s profile, helping the platforms filter for the kinds of teachers that keep students engaged. Almost all of the tools described here already exist, and many are being implemented in different classrooms across China. Taken together, they constitute a new AI-powered paradigm for education, one that merges the online and offline worlds to create a learning experience tailored to the needs and abilities of each student.

pages: 313 words: 92,053

Places of the Heart: The Psychogeography of Everyday Life by Colin Ellard

augmented reality, Benoit Mandelbrot, Berlin Wall, Broken windows theory, Buckminster Fuller, carbon footprint, commoditize, crowdsourcing, Frank Gehry, Google Glasses, Guggenheim Bilbao, haute couture, Howard Rheingold, Internet of things, Jaron Lanier, mandelbrot fractal, Marshall McLuhan, Masdar, mass immigration, megastructure, more computing power than Apollo, Oculus Rift, Peter Eisenman, RFID, Richard Florida, risk tolerance, sentiment analysis, smart cities, starchitect, the built environment, theory of mind, urban decay, urban planning, urban sprawl, Victor Gruen

But tweets can also be geocoded, which means that they could be used to map the frequency of use of emotion words in different locations. Theoretically, it’s possible to tag the location at which a tweet occurred with city block precision, but this depends on the privacy settings set by the user of the application. It’s more common for tweets to be coded only to the home city of the tweeter. Nevertheless, the possibilities for using sentiment analysis, or even intention analysis, where text information is mined for clues as to what you plan to do next, will probably play an increasing role in the uses of social media to probe our inner states. With geographical variables added into the mix, this will make available to a wide range of commercial and institutional interests access to the emotional fabric of places. Ground Control to Ground Control Mobile phones have transformed our relationships with places.

pages: 368 words: 96,825

Bold: How to Go Big, Create Wealth and Impact the World by Peter H. Diamandis, Steven Kotler

3D printing, additive manufacturing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, Charles Lindbergh, cloud computing, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, dematerialisation, deskilling, disruptive innovation, Elon Musk, Exxon Valdez, fear of failure, Firefox, Galaxy Zoo, Google Glasses, Google Hangouts, gravity well, ImageNet competition, industrial robot, Internet of things, Jeff Bezos, John Harrison: Longitude, John Markoff, Jono Bacon, Just-in-time delivery, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, loss aversion, Louis Pasteur, low earth orbit, Mahatma Gandhi, Marc Andreessen, Mark Zuckerberg, Mars Rover, meta analysis, meta-analysis, microbiome, minimum viable product, move fast and break things, Narrative Science, Netflix Prize, Network effects, Oculus Rift, optical character recognition, packet switching, PageRank, pattern recognition, performance metric, Peter H. Diamandis: Planetary Resources, Peter Thiel, pre–internet, Ray Kurzweil, recommendation engine, Richard Feynman, ride hailing / ride sharing, risk tolerance, rolodex, self-driving car, sentiment analysis, shareholder value, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart grid, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, Stewart Brand, superconnector, technoutopianism, telepresence, telepresence robot, Turing test, urban renewal, web application, X Prize, Y Combinator, zero-sum game

By offering $0.05 per categorization, I got the entire 65 years’ worth of issues, roughly 3,000 in total, done for under $200. I used Amazon’s site Mechanical Turk to get those magazine covers analyzed. While MTURK isn’t all that useful for more complicated jobs, it is where to go to get simple, quick tasks done fast. Aggregation and classification jobs tend to be popular uses. Aggregate photographs of red trucks, for example, or write product descriptions, or perform sentiment analysis exercises on thousands of Tweets. Requesters (you) post tasks known as HITs (human intelligence tasks) while workers (called providers) browse among existing tasks and complete them for a monetary payment.16 Another microtask site that I’ve previously relied upon (and with great result) is Fiverr, an online marketplace offering microtasks starting at $5. Typical services include voiceovers, animations, crafts, promotional videos, and art.

Data and the City by Rob Kitchin,Tracey P. Lauriault,Gavin McArdle

A Declaration of the Independence of Cyberspace, bike sharing scheme, bitcoin, blockchain, Bretton Woods, Chelsea Manning, citizen journalism, Claude Shannon: information theory, clean water, cloud computing, complexity theory, conceptual framework, corporate governance, correlation does not imply causation, create, read, update, delete, crowdsourcing, cryptocurrency, dematerialisation, digital map, distributed ledger, fault tolerance, fiat currency, Filter Bubble, floating exchange rates, global value chain, Google Earth, hive mind, Internet of things, Kickstarter, knowledge economy, lifelogging, linked data, loose coupling, new economy, New Urbanism, Nicholas Carr, open economy, openstreetmap, packet switching, pattern recognition, performance metric, place-making, RAND corporation, RFID, Richard Florida, ride hailing / ride sharing, semantic web, sentiment analysis, sharing economy, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart contracts, smart grid, smart meter, social graph, software studies, statistical model, TaskRabbit, text mining, The Chicago School, The Death and Life of Great American Cities, the market place, the medium is the message, the scientific method, Toyota Production System, urban planning, urban sprawl, web application

The stickiness of social media data resists the operationalization in automatic pipelines for knowledge extraction and manifests itself in false positives that can only be identified and resolved by a close reading of the source. This has consequences for the use of big data in urban governance, urban operation centres and predictive policing – applications that often rely on decontextualized data and reductive modes of analysis, such as text mining based on trigger words or dictionary-based sentiment analysis. Ignoring stickiness of context can lead to cases where a terrorism suspect identified by unsupervised text analysis turns out to be the journalist who reported on the issue (Currier et al. 2015). In this sense, stickiness points to issues of privacy even within the realm of publicly accessible data sources. As social media scholar Judith Donath notes, privacy fails when something that is intended for a particular context gets shown in another where it acquires a different meaning (2014: 212).

pages: 392 words: 108,745

Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think by James Vlahos

Albert Einstein, AltaVista, Amazon Mechanical Turk, Amazon Web Services, augmented reality, Automated Insights, autonomous vehicles, Chuck Templeton: OpenTable:, cloud computing, computer age, Donald Trump, Elon Musk, information retrieval, Internet of things, Jacques de Vaucanson, Jeff Bezos, lateral thinking, Loebner Prize, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Mark Zuckerberg, Menlo Park, natural language processing, PageRank, pattern recognition, Ponzi scheme, randomized controlled trial, Ray Kurzweil, Ronald Reagan, Rubik’s Cube, self-driving car, sentiment analysis, Silicon Valley, Skype, Snapchat, speech recognition, statistical model, Steve Jobs, Steve Wozniak, Steven Levy, Turing test, Watson beat the top human players on Jeopardy!

In a provocative paper, “The Crying Shame of Robot Nannies,” University of Sheffield professors Amanda and Noel Sharkey examined some of the dystopian implications for childhood development. Advances in natural-language processing “could lead to superficially convincing conversations between robots and children in the near future,” the Sharkeys wrote. But there is a vast gulf between “superficially convincing” responses and those of a good human caregiver capable of true understanding and compassion. Affective computing—sentiment analysis from facial expressions, word choice, and tone—would bolster the quality of interaction but only to a limited degree. “A good carer’s response is based on grasping the cause of emotions rather than simply acting on the emotions displayed,” the Sharkeys wrote. “We should respond differently to a child crying because she has lost her toy than because she has been abused.” The notion of using AIs to monitor children may seem far-fetched.

pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel

Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, butter production in bangladesh, call centre, Charles Lindbergh, commoditize, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, Erik Brynjolfsson, Everything should be made as simple as possible, experimental subject, Google Glasses, happiness index / gross national happiness, job satisfaction, Johann Wolfgang von Goethe, lifelogging, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mass immigration, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, risk-adjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, Shai Danziger, software as a service, speech recognition, statistical model, Steven Levy, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Davenport, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra, zero-sum game

Eric Gilbert and Karrie Karahalios released the data, code, and models for this research: Eric Gilbert, “Update: Widespread Worry and the Stock Market,” Social.CS.UIUC.EDU, March 13, 2010. Predicting by social media: Sitaram Asur and Bernardo A. Huberman, “Predicting the Future with Social Media,” Cornell University Library, March 29, 2010, arXiv:1003.5699. Anshul Mittal and Arpit Goel, “Stock Prediction Using Twitter Sentiment Analysis,” Stanford University Libraries, December 16, 2011. Allison Aubrey, “Happiness: It Really Is Contagious,” NPR News, All Things Considered, December 5, 2008. Shea Bennett, “Can Twitter Beat the Stock Market? Tweet Sentiment Trading API Bets That It Can,” Mediabistro, July 5, 2012.

pages: 461 words: 106,027

Zero to Sold: How to Start, Run, and Sell a Bootstrapped Business by Arvid Kahl

"side hustle", business process, centre right, Chuck Templeton: OpenTable:, continuous integration, coronavirus, COVID-19, Covid-19, crowdsourcing, domain-specific language, financial independence, Google Chrome, if you build it, they will come, information asymmetry, information retrieval, inventory management, Jeff Bezos, job automation, Kubernetes, minimum viable product, Network effects, performance metric, post-work, premature optimization, risk tolerance, Ruby on Rails, sentiment analysis, Silicon Valley, software as a service, source of truth, statistical model, subscription business, supply-chain management, trickle-down economics, web application

And sometimes, it won't work; they'll give up or move to a more straightforward product. At that point, you should stay in close contact with them and see what they need, and if they find something that helps them. Then, learn how you can enable your own product to do that. Misalignment could be caused by something simple, like the wording of your messaging. For example, do your customers understand the phrase "heuristic-based statistical sentiment analysis," or would "find the tone of a message" be clearer? You don't need to dumb it down, but you also shouldn't overcomplicate it. As an engineer, I feel that I need to be as precise as possible. Customers don't necessarily value this as much as you think. Maybe your product is confusing. Your customers don't want to be confused. They don't want to be surprised by your product. And they definitely don't want to learn anything new to be able to solve a problem they already had to learn a product or even a manual solution for in the past.

pages: 298 words: 43,745

Understanding Sponsored Search: Core Elements of Keyword Advertising by Jim Jansen

AltaVista, barriers to entry, Black Swan, bounce rate, business intelligence, butterfly effect, call centre, Claude Shannon: information theory, complexity theory, correlation does not imply causation, first-price auction, information asymmetry, information retrieval, intangible asset, inventory management, life extension, linear programming, longitudinal study, megacity, Nash equilibrium, Network effects, PageRank, place-making, price mechanism, psychological pricing, random walk, Schrödinger's Cat, sealed-bid auction, search engine result page, second-price auction, second-price sealed-bid, sentiment analysis, social web, software as a service, stochastic process, telemarketer, the market place, The Present Situation in Quantum Mechanics, the scientific method, The Wisdom of Crowds, Vickrey auction, Vilfredo Pareto, yield management

Sponsored-search analytics. With the increased use of check-in and mobile apps, one would expect to see geo-location-based metrics to measure the increase in foot traffic to brick-and-mortar stores based on sponsored-search advertisements, similar to click-to-call metrics now. Certainly, given the increased availability of consumer data, the future will hold sponsored-search metrics beyond impressions, clicks, and conversions. For example, the increasingly social aspects of Web sites, such as reviews and consumer comments, will likely lead to sentiment-analysis metrics that measure the tone of consumer comments about a brand or ad. This data can potentially affect how quality score is calculated. Already, sponsored-search platforms are offering searchers and consumers the ability to rate ads, so integration of reviews from other sites cannot be far behind. With the increase in tracking devices and use of the Web via many devices such as mobile phones, televisions, and navigation systems, advertisers will have simpler ways to measure the combined reach of television, Web, radio, and mobile advertising in an integrated marketing communication (IMC) approach.

pages: 525 words: 116,295

The New Digital Age: Transforming Nations, Businesses, and Our Lives by Eric Schmidt, Jared Cohen

access to a mobile phone, additive manufacturing, airport security, Amazon Mechanical Turk, Amazon Web Services, anti-communist, augmented reality, Ayatollah Khomeini, barriers to entry, bitcoin, borderless world, call centre, Chelsea Manning, citizen journalism, clean water, cloud computing, crowdsourcing, data acquisition, Dean Kamen, drone strike, Elon Musk, failed state, fear of failure, Filter Bubble, Google Earth, Google Glasses, hive mind, income inequality, information trail, invention of the printing press, job automation, John Markoff, Julian Assange, Khan Academy, Kickstarter, knowledge economy, Law of Accelerating Returns, market fundamentalism, means of production, MITM: man-in-the-middle, mobile money, mutually assured destruction, Naomi Klein, Nelson Mandela, offshore financial centre, Parag Khanna, peer-to-peer, peer-to-peer lending, personalized medicine, Peter Singer: altruism, Ray Kurzweil, RFID, Robert Bork, self-driving car, sentiment analysis, Silicon Valley, Skype, Snapchat, social graph, speech recognition, Steve Jobs, Steven Pinker, Stewart Brand, Stuxnet, The Wisdom of Crowds, upwardly mobile, Whole Earth Catalog, WikiLeaks, young professional, zero day

Typically, governments put restrictions on the gateway routers that connect the country and on DNS (domain name system) servers. This allows them to either block a website altogether (e.g., YouTube in Iran) or process web content through “deep-packet inspection.” With deep-packet inspection, special software allows the router to look inside the packets of data that pass through it and check for forbidden words, among other things (the use of sentiment-analysis software to screen out negative statements about politicians, for example), which it can then block. Neither technique is foolproof; users can access blocked sites with circumvention technologies like proxy servers (which trick the routers) or by using secure https encryption protocols (which enable private Internet communication that, at least in theory, cannot be read by anyone other than your computer and the website you are accessing), and deep-packet inspection rarely catches every instance of banned content.

pages: 476 words: 125,219

Digital Disconnect: How Capitalism Is Turning the Internet Against Democracy by Robert W. McChesney

2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, access to a mobile phone, Albert Einstein, American Legislative Exchange Council, American Society of Civil Engineers: Report Card, Automated Insights, barriers to entry, Berlin Wall, business cycle, Cass Sunstein, citizen journalism, cloud computing, collaborative consumption, collective bargaining, creative destruction, crony capitalism, David Brooks, death of newspapers, declining real wages, Double Irish / Dutch Sandwich, Erik Brynjolfsson, failed state, Filter Bubble, full employment, future of journalism, George Gilder, Gini coefficient, Google Earth, income inequality, informal economy, intangible asset, invention of agriculture, invisible hand, Jaron Lanier, Jeff Bezos, jimmy wales, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, Joseph Schumpeter, Julian Assange, Kickstarter, Mark Zuckerberg, Marshall McLuhan, means of production, Metcalfe’s law, mutually assured destruction, national security letter, Nelson Mandela, Network effects, new economy, New Journalism, Nicholas Carr, Occupy movement, offshore financial centre, patent troll, Peter Thiel, plutocrats, Plutocrats, post scarcity, price mechanism, profit maximization, profit motive, QWERTY keyboard, Ralph Nader, Richard Stallman, road to serfdom, Robert Metcalfe, Saturday Night Live, sentiment analysis, Silicon Valley, single-payer health, Skype, spectrum auction, Steve Jobs, Steve Wozniak, Steven Levy, Steven Pinker, Stewart Brand, Telecommunications Act of 1996, the medium is the message, The Spirit Level, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, Thorstein Veblen, too big to fail, transfer pricing, Upton Sinclair, WikiLeaks, winner-take-all economy, yellow journalism

“Information about users is what really matters.”186 Turow concludes, “The emerging trajectory suggests that apart from a relatively few elite-oriented publishers (New York Times, Atlantic, and the like), the pressure to bring personalization synced to marketing goals will be difficult for companies to avoid if they want to survive.”187 This should not really be a surprise; advertisers always supported media for opportunistic reasons, because they had no better options. Now they have better options, and consequently much of the media can get thrown overboard. The profit motive pushes this process into new and dangerous frontiers quickly. Increasingly, research—“persuasion profiling”—determines what types of sales pitches are most effective with each individual, and ads are tailored accordingly. Moreover, researchers are now working on “sentiment analysis,” to see what mood a person is in at a particular moment and what products and sales pitches would be most effective.188 Advertisers are at work developing emotional analysis software so webcams can monitor how one’s face responds to what is on the screen. “One way to persuade internet users to grant access to their images,” The Economist notes, “would be to offer them discounts or subscriptions to websites.”189 Pariser chronicles a range of developments on the horizon, including making machines more “human.”

pages: 752 words: 131,533

Python for Data Analysis by Wes McKinney

backtesting, cognitive dissonance, crowdsourcing, Debian, Firefox, Google Chrome, Guido van Rossum, index card, random walk, recommendation engine, revision control, sentiment analysis, Sharpe ratio, side project, sorting algorithm, statistical model, type inference

This includes most kinds of data commonly stored in relational databases or tab- or comma-delimited text files. Multiple tables of data interrelated by key columns (what would be primary or foreign keys for a SQL user). Evenly or unevenly spaced time series. This is by no means a complete list. Even though it may not always be obvious, a large percentage of data sets can be transformed into a structured form that is more suitable for analysis and modeling. If not, it may be possible to extract features from a data set into a structured form. As an example, a collection of news articles could be processed into a word frequency table, which could then be used to perform sentiment analysis. Most users of spreadsheet programs like Microsoft Excel, perhaps the most widely used data analysis tool in the world, will not be strangers to these kinds of data. Why Python for Data Analysis? For many people (myself among them), the Python language is easy to fall in love with. Since its first appearance in 1991, Python has become one of the most popular dynamic programming languages, along with Perl, Ruby, and others.
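To make the idea of turning news articles into a word frequency table concrete, here is a minimal sketch that uses only the Python standard library. It is an illustration, not the book's own code: the `tokenize` and `word_frequency_table` helpers, the regular expression, and the two sample headlines are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequency_table(articles):
    """Return (vocabulary, rows); rows[i][j] counts vocabulary[j] in article i."""
    counts = [Counter(tokenize(article)) for article in articles]
    # Iterating a Counter yields its keys, so union collects every distinct word.
    vocabulary = sorted(set().union(*counts))
    rows = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, rows


articles = [
    "Markets rally as profits rise",
    "Profits fall and markets fall",
]
vocabulary, rows = word_frequency_table(articles)

# Print the table with one row per article.
print(vocabulary)
for row in rows:
    print(row)
```

Each row is one article and each column one word, which is the structured, tabular form the passage has in mind; the same rows could be loaded into a pandas `DataFrame` with `vocabulary` as the column labels.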

pages: 1,829 words: 135,521

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney

business process, Debian, Firefox, general-purpose programming language, Google Chrome, Guido van Rossum, index card, p-value, quantitative trading / quantitative finance, random walk, recommendation engine, sentiment analysis, side project, sorting algorithm, statistical model, type inference

This is by no means a complete list. Even though it may not always be obvious, a large percentage of datasets can be transformed into a structured form that is more suitable for analysis and modeling. If not, it may be possible to extract features from a dataset into a structured form. As an example, a collection of news articles could be processed into a word frequency table, which could then be used to perform sentiment analysis. Most users of spreadsheet programs like Microsoft Excel, perhaps the most widely used data analysis tool in the world, will not be strangers to these kinds of data.
1.2 Why Python for Data Analysis?
For many people, the Python programming language has strong appeal. Since its first appearance in 1991, Python has become one of the most popular interpreted programming languages, along with Perl, Ruby, and others.

pages: 567 words: 122,311

Lean Analytics: Use Data to Build a Better Startup Faster by Alistair Croll, Benjamin Yoskovitz

Airbnb, Amazon Mechanical Turk, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, barriers to entry, Bay Area Rapid Transit, Ben Horowitz, bounce rate, business intelligence, call centre, cloud computing, cognitive bias, commoditize, constrained optimization, Firefox, Frederick Winslow Taylor, frictionless, frictionless market, game design, Google X / Alphabet X, Infrastructure as a Service, Internet of things, inventory management, Kickstarter, lateral thinking, Lean Startup, lifelogging, longitudinal study, Marshall McLuhan, minimum viable product, Network effects, pattern recognition, Paul Graham, performance metric, place-making, platform as a service, recommendation engine, ride hailing / ride sharing, rolodex, sentiment analysis, skunkworks, Skype, social graph, social software, software as a service, Steve Jobs, subscription business, telemarketer, transaction costs, two-sided market, Uber for X, web application, Y Combinator

We’ll share more details later in the book on the key metrics that matter based on your type of business, as well as benchmarks you can aim for.
Table 5-2. Lean Canvas and relevant metrics (Lean Canvas box: some relevant metrics)
Problem: Respondents who have this need, respondents who are aware of having the need
Solution: Respondents who try the MVP, engagement, churn, most-used/least-used features, people willing to pay
Unique value proposition: Feedback scores, independent ratings, sentiment analysis, customer-worded descriptions, surveys, search, and competitive analysis
Customer segments: How easy it is to find groups of prospects, unique keyword segments, targeted funnel traffic from a particular source
Channels: Leads and customers per channel, viral coefficient and cycle, net promoter score, open rate, affiliate margins, click-through rate, PageRank, message reach
Unfair advantage: Respondents’ understanding of the UVP (Unique Value Proposition), patents, brand equity, barriers to entry, number of new entrants, exclusivity of relationships
Revenue streams: Lifetime customer value, average revenue per user, conversion rate, shopping cart size, click-through rate
Cost structure: Fixed costs, cost of customer acquisition, cost of servicing the nth customer, support costs, keyword costs
Sean Ellis’s Startup Growth Pyramid
Sean Ellis is a well-known entrepreneur and marketer.

pages: 474 words: 130,575

Surveillance Valley: The Rise of the Military-Digital Complex by Yasha Levine

23andMe, activist fund / activist shareholder / activist investor, Airbnb, AltaVista, Amazon Web Services, Anne Wojcicki, anti-communist, Apple's 1984 Super Bowl advert, bitcoin, borderless world, British Empire, call centre, Chelsea Manning, cloud computing, collaborative editing, colonial rule, computer age, computerized markets, corporate governance, crowdsourcing, cryptocurrency, digital map, don't be evil, Donald Trump, Douglas Engelbart, drone strike, Edward Snowden, El Camino Real, Electric Kool-Aid Acid Test, Elon Musk, fault tolerance, George Gilder, ghettoisation, global village, Google Chrome, Google Earth, Google Hangouts, Howard Zinn, hypertext link, IBM and the Holocaust, index card, Jacob Appelbaum, Jeff Bezos, jimmy wales, John Markoff, John von Neumann, Julian Assange, Kevin Kelly, Kickstarter, life extension, Lyft, Mark Zuckerberg, market bubble, Menlo Park, Mitch Kapor, natural language processing, Network effects, new economy, Norbert Wiener, packet switching, PageRank, Paul Buchheit, peer-to-peer, Peter Thiel, Philip Mirowski, plutocrats, private military company, RAND corporation, Ronald Reagan, Ross Ulbricht, Satoshi Nakamoto, self-driving car, sentiment analysis, shareholder value, side project, Silicon Valley, Silicon Valley startup, Skype, slashdot, Snapchat, speech recognition, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Telecommunications Act of 1996, telepresence, telepresence robot, The Bell Curve by Richard Herrnstein and Charles Murray, The Hackers Conference, uber lyft, Whole Earth Catalog, Whole Earth Review, WikiLeaks

The US Air Force had a “Social Radar” initiative to tap intelligence coming in from the Internet, a system explicitly patterned after the early warning radar systems used to track enemy airplanes.16 The Intelligence Advanced Research Projects Activity, run by the Office of the Director of National Intelligence, had multiple “anticipatory intelligence” research programs involving everything from mining YouTube videos for terrorist threats to predicting instability by scanning Twitter feeds and blogs and monitoring the Internet to predict future cyberattacks.17 DARPA ran a human radar project as well: the World-Wide Integrated Crisis Early Warning System, or ICEWS, which is pronounced as “IQs.” Started in 2007 and built by Lockheed Martin, the system ultimately grew into a full-fledged operational military prediction machine that had modules ingesting all sorts of open source network data (news wires, blogs, social media and Facebook posts, various Internet chatter, and “other sources of information”) and routing it through “sentiment analysis” in an attempt to predict military conflicts, insurgencies, civil wars, coups, and revolutions.18 DARPA’s ICEWS proved to be a success. Its core technology was spun off into a classified, operational version of the same system called ISPAN and absorbed into the US Strategic Command.19 The dream of building a global computer system that could watch the world and predict the future had a long and storied history in military circles.

pages: 475 words: 134,707

The Hype Machine: How Social Media Disrupts Our Elections, Our Economy, and Our Health--And How We Must Adapt by Sinan Aral

Airbnb, Albert Einstein, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, Bernie Sanders, bitcoin, carbon footprint, Cass Sunstein, computer vision, coronavirus, correlation does not imply causation, COVID-19, crowdsourcing, cryptocurrency, death of newspapers, disintermediation, Donald Trump, Drosophila, Edward Snowden, Elon Musk, Erik Brynjolfsson, experimental subject, facts on the ground, Filter Bubble, global pandemic, hive mind, illegal immigration, income inequality, Kickstarter, knowledge worker, longitudinal study, low skilled workers, Lyft, Mahatma Gandhi, Mark Zuckerberg, Menlo Park, meta analysis, meta-analysis, Metcalfe’s law, mobile money, move fast and break things, multi-sided market, Nate Silver, natural language processing, Network effects, performance metric, phenotype, recommendation engine, Robert Bork, Robert Shiller, Second Machine Age, sentiment analysis, shareholder value, skunkworks, Snapchat, social graph, social intelligence, social software, social web, statistical model, stem cell, Stephen Hawking, Steve Jobs, Telecommunications Act of 1996, The Chicago School, The Wisdom of Crowds, theory of mind, Tim Cook: Apple, Uber and Lyft, uber lyft, WikiLeaks, Yogi Berra

The main goal is to understand, second by second, what’s in a video, what it’s about, its context, feelings, and sentiment, and to compare the presence or absence of these elements to key performance indicators (KPIs) like video view-throughs, retention, drop-off rates, clicks, engagement, brand recognition, and satisfaction. By closing the loop of video production, analytics, optimization, and publishing, VidMob can improve its clients’ return on marketing investment. ACS automatically extracts video metadata and performs sentiment analysis. It uses deep learning and computer vision to identify the emotions, objects, logos, people, and words in videos; it can detect facial expressions like delight, surprise, or disgust. It then analyzes how each of these elements corresponds, for instance, to moments when viewers are dropping off from watching the video, and it recommends (and automates) editing that improves retention. The object, people, language, and emotion tagging also enables clients to organize and search their video assets by visual and language attributes.

Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran, Jeff Hammerbacher

23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, database schema, double helix, epigenetics, fault tolerance, Firefox, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, longitudinal study, Mars Rover, natural language processing, openstreetmap, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social graph, SPARQL, speech recognition, statistical model, supply-chain management, text mining, Vernor Vinge, web application

This would let us determine which industries donate the most to which political parties. Figure 20-1 shows a couple of pie charts I made demonstrating this particular data mashup.
Figure 20-1. Pie charts resulting from a data mashup of SEC industry data and Center for Responsive Politics political contribution data. (See Color Plate 70.)
I haven’t even touched things like linking stock prices to sentiment analysis of message boards, trying to tie together genetics and drug data, or determining whether restaurants in low-income neighborhoods are dirtier (according to the health inspector), but this should give you just a small taste of what’s possible when different data sources are connected. Unfortunately, the difficulty of automatically connecting sets ranges from nontrivial to nearly impossible.

pages: 586 words: 186,548

Architects of Intelligence by Martin Ford

3D printing, agricultural Revolution, AI winter, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, barriers to entry, basic income, Baxter: Rethink Robotics, Bayesian statistics, bitcoin, business intelligence, business process, call centre, cloud computing, cognitive bias, Colonization of Mars, computer vision, correlation does not imply causation, crowdsourcing, DARPA: Urban Challenge, deskilling, disruptive innovation, Donald Trump, Douglas Hofstadter, Elon Musk, Erik Brynjolfsson, Ernest Rutherford, Fellow of the Royal Society, Flash crash, future of work, gig economy, Google X / Alphabet X, Gödel, Escher, Bach, Hans Rosling, ImageNet competition, income inequality, industrial robot, information retrieval, job automation, John von Neumann, Law of Accelerating Returns, life extension, Loebner Prize, Mark Zuckerberg, Mars Rover, means of production, Mitch Kapor, natural language processing, new economy, optical character recognition, pattern recognition, phenotype, Productivity paradox, Ray Kurzweil, recommendation engine, Robert Gordon, Rodney Brooks, Sam Altman, self-driving car, sensor fusion, sentiment analysis, Silicon Valley, smart cities, social intelligence, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, Ted Kaczynski, The Rise and Fall of American Growth, theory of mind, Thomas Bayes, Travis Kalanick, Turing test, universal basic income, Wall-E, Watson beat the top human players on Jeopardy!, women in the workforce, working-age population, zero-sum game, Zipcar

People are pretty good at monitoring the mental states of the people around them, and we know that about 55% of the signals we use are in facial expression and your gestures, while about 38% of the signal we respond to is from tone of voice. So how fast someone is speaking, the pitch, and how much energy is in the voice. Only 7% of the signal is in the text and the actual choice of words that someone uses! Now when you think of the entire industry of sentiment analysis, the multi-billion-dollar industry of people listening to tweets and analyzing text messages and all that, it only accounts for 7% of how humans communicate. What I like to think about what we’re doing here, is trying to capture the other 93% of non-verbal communication. So, back to your questions: about eighteen months ago I started a speech team that looks at these prosodic paralinguistic features.

Seeking SRE: Conversations About Running Production Systems at Scale by David N. Blank-Edelman

Affordable Care Act / Obamacare, algorithmic trading, Amazon Web Services, bounce rate, business continuity plan, business process, cloud computing, cognitive bias, cognitive dissonance, commoditize, continuous integration, crowdsourcing, dark matter, database schema, Debian, defense in depth, DevOps, domain-specific language, fault tolerance, fear of failure, friendly fire, game design, Grace Hopper, information retrieval, Infrastructure as a Service, Internet of things, invisible hand, iterative process, Kubernetes, loose coupling, Lyft, Marc Andreessen, microservices, minimum viable product, MVC pattern, performance metric, platform as a service, pull request, RAND corporation, remote working, Richard Feynman, risk tolerance, Ruby on Rails, search engine result page, self-driving car, sentiment analysis, Silicon Valley, single page application, Snapchat, software as a service, software is eating the world, source of truth, the scientific method, Toyota Production System, web application, WebSocket, zero day

Here is a list of some operational challenges that we would like to solve in order to improve the SRE function:
Automate noise reduction to filter out a specific stream.
Look for outliers with anomaly detection (for example, cluster malfunctions).
Automate workflows around “situations,” not individual alerts.
Automate ticket categorization based on patterns of behavior.
Forecast short-term for service levels and long-term for capacity planning.
In addition to these are other existing solutions, for example, for text analysis like spam filtering, sentiment analysis, and information extraction. All of these will hopefully reduce toil and alerts for humans by letting the machine do its job.
The Awakening of Applied AI
As senior site reliability engineer at my organization, I tend to search for long-term solutions that make the machine do the work for us, which is the best path to reach durable automation. This is the story of an investigation that is still ongoing.
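One of the challenges listed in this excerpt, looking for outliers with anomaly detection, can be illustrated with a small Python sketch. The excerpt does not name a technique, so the choice here (a modified z-score based on the median absolute deviation) is an assumption, as are the function name `mad_outliers`, the threshold of 3.5, and the made-up latency samples.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. It is less sensitive to extreme values than
    a mean/standard-deviation z-score.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # At least half the values are identical; this score is undefined.
        return []
    return [i for i, dev in enumerate(abs_dev)
            if 0.6745 * dev / mad > threshold]


# Hypothetical per-node request latencies in milliseconds.
latencies_ms = [102, 98, 101, 99, 100, 97, 103, 480, 100, 99]
outliers = mad_outliers(latencies_ms)  # flags index 7 (the 480 ms sample)
```

A median-based score is used in this sketch because a single malfunctioning node can inflate the mean and standard deviation enough to hide itself; production alerting would typically also account for seasonality and trends, which this example does not attempt.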

pages: 788 words: 223,004

Merchants of Truth: The Business of News and the Fight for Facts by Jill Abramson

23andMe, 4chan, Affordable Care Act / Obamacare, Alexander Shulgin, Apple's 1984 Super Bowl advert, barriers to entry, Bernie Madoff, Bernie Sanders, Charles Lindbergh, Chelsea Manning, citizen journalism, cloud computing, commoditize, corporate governance, creative destruction, crowdsourcing, death of newspapers, digital twin, diversified portfolio, Donald Trump, East Village, Edward Snowden, Ferguson, Missouri, Filter Bubble, future of journalism, glass ceiling, Google Glasses, haute couture, hive mind, income inequality, information asymmetry, invisible hand, Jeff Bezos, Joseph Schumpeter, Khyber Pass, late capitalism, Marc Andreessen, Mark Zuckerberg, move fast and break things, Nate Silver, new economy, obamacare, Occupy movement, performance metric, Peter Thiel, phenotype, pre–internet, race to the bottom, recommendation engine, Robert Mercer, Ronald Reagan, Saturday Night Live, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, skunkworks, Snapchat, social intelligence, social web, Steve Jobs, Steven Levy, technoutopianism, telemarketer, the scientific method, The Wisdom of Crowds, Tim Cook: Apple, too big to fail, WikiLeaks

Perhaps too powerful, thought some critics, who saw the collusion between platform and publisher as potentially destructive. They feared it would give BuzzFeed an unfair advantage, not only in reporting on how voters felt but in keeping that information to themselves and using it, as it did CrowdTangle, to determine what content would perform best with certain audiences. They warned that this might signal the end of the journalistic ideal of objective distance. Despite the flimsy science of “sentiment analysis,” BuzzFeed would read Facebook’s data as an earnest map of American political sentiment and assume that any clues they gleaned reflected truths about the electorate, when what they were actually observing was likely to be a fold in the fabric of Facebook’s own apparatus, a self-fulfilling reflection of its own hand in the conversation. BuzzFeed threatened to amplify and reinforce the unspoken influence that Facebook’s supposedly neutral network exerted over public opinion.

The Art of SEO by Eric Enge, Stephan Spencer, Jessie Stricchiola, Rand Fishkin

AltaVista, barriers to entry, bounce rate, Build a better mousetrap, business intelligence, cloud computing, dark matter, Firefox, Google Chrome, Google Earth, hypertext link, index card, information retrieval, Internet Archive, Law of Accelerating Returns, linked data, mass immigration, Metcalfe’s law, Network effects, optical character recognition, PageRank, performance metric, risk tolerance, search engine result page, self-driving car, sentiment analysis, social web, sorting algorithm, speech recognition, Steven Levy, text mining, web application, wikimedia commons

bitly: Excellent for tracking click-throughs on content from any source on any device or medium. Given the nonreporting of many desktop and mobile clients, bitly’s tracking has become a must for those seeking accurate analytics on the pages they share.
Radian6: Probably the best known of the social media monitoring tools, Radian6 is geared toward enterprises and large budgets and has impressive social tracking, sentiment analysis, and reporting capabilities.
Klout: Measures an author’s authority by tracking activity related to many social accounts, including Twitter, Google+, LinkedIn, and others.
BackType: Another fantastic tool for tracking social metrics, which was acquired by Twitter in 2011.
Social Mention: Enables Google Alerts–like updates from social media sources (Twitter in particular), and offers several plug-ins and search functions.