sentiment analysis

25 results


pages: 527 words: 147,690

Terms of Service: Social Media and the Price of Constant Connection by Jacob Silverman

23andMe, 4chan, A Declaration of the Independence of Cyberspace, Airbnb, airport security, Amazon Mechanical Turk, augmented reality, Brian Krebs, California gold rush, call centre, cloud computing, cognitive dissonance, correlation does not imply causation, Credit Default Swap, crowdsourcing, don't be evil, Edward Snowden, feminist movement, Filter Bubble, Firefox, Flash crash, game design, global village, Google Chrome, Google Glasses, hive mind, income inequality, informal economy, information retrieval, Internet of things, Jaron Lanier, Jimmy Wales, Kevin Kelly, Kickstarter, knowledge economy, knowledge worker, late capitalism, license plate recognition, life extension, Lyft, Mark Zuckerberg, Mars Rover, Marshall McLuhan, meta analysis, meta-analysis, Minecraft, move fast and break things, national security letter, Network effects, new economy, Nicholas Carr, Occupy movement, optical character recognition, payday loans, Peter Thiel, postindustrial economy, prediction markets, pre–internet, price discrimination, price stability, profit motive, quantitative hedge fund, race to the bottom, Ray Kurzweil, recommendation engine, rent control, RFID, ride hailing / ride sharing, self-driving car, sentiment analysis, shareholder value, sharing economy, Silicon Valley, Silicon Valley ideology, Snapchat, social graph, social web, sorting algorithm, Steve Ballmer, Steve Jobs, Steven Levy, TaskRabbit, technoutopianism, telemarketer, transportation-network company, Turing test, Uber and Lyft, Uber for X, universal basic income, unpaid internship, women in the workforce, Y Combinator, Zipcar

BehaviorMatrix says that its program examined cancer blogs and discovered that cancer patients are most optimistic just after receiving their diagnosis. This insight might be useful for therapists, doctors, and public health professionals, but the company’s CEO told the Wall Street Journal that he drew on this information to advise drug companies in their ad targeting. The most likely application of sentiment analysis, then, is to give a slight edge to hedge funds and advertisers. At the very least, a gaggle of digital media consultants are pulling down hefty fees selling these services to deep-pocketed corporate clients. But what happens when sentiment analysis is not just spilling out reports for an executive’s consumption but is actually linked to potentially vital systems? And what happens then if a network becomes seeded with misinformation? You might just crash the stock market. On April 23, 2013, the Associated Press’s official Twitter account sent out the following tweet: “Breaking: Two Explosions in the White House and Barack Obama is injured.”

To become part of the social web, then, is to join the networks of surveillance, tracking, and data circulation that now support a vast informational economy and increasingly shape our social and cultural lives. Few aspects of contemporary life have gone unaffected by this shift, by the ability to publish immediately, freely, and to a massive audience. Shareability, and the drive to rack up likes and other metrics, guides the agendas of magazine editors and the budgets of marketers. Sentiment analysis—the mining of social-network data to determine the attitudes of individuals or whole populations—helps intelligence analysts learn where potential extremists are becoming radicalized. Advertisers collect social-media data and form consumer profiles with tens of thousands of pieces of information. Large corporations use social media to befriend customers, offer personalized customer service, and churn out friendly propaganda.

These companies will tinker with policies, especially after every public outrage and class-action lawsuit, but the end point remains the same: to retain rights over your data and expressions, and to make the transition from a status update to a related, paid advertisement as smooth as possible.

CONVERTING EMOTIONS INTO PROFITABLE DATA

Like buttons and taggable emotions are just two features of what has become a like economy, which depends on the growth of sentiment analysis, the examination of huge data sets to find out how people are reacting to news, products, or the events of their own lives. Retailers and advertisers want to know what individual consumers are thinking and buying, but they, along with investors, banks, consultants, and others, also want to be able to take the pulse of public opinion. To do this, they try to tap into the welter of data we produce on social media and blogs and also in traditional news media, review sites, message boards, and interviews with corporate executives.
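To make "take the pulse of public opinion" concrete, here is a minimal lexicon-based sketch in Python. The word list, its +1/-1 weights, and the function names `score_post` and `pulse` are hypothetical. Production systems typically rely on much larger curated lexicons or trained classifiers, and this is not how any particular company does it.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {
    "love": 1, "great": 1, "happy": 1, "excellent": 1,
    "hate": -1, "awful": -1, "broken": -1, "disappointed": -1,
}


def score_post(text: str) -> int:
    """Sum the lexicon polarity of every word in one post."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)


def pulse(posts: list[str]) -> dict[str, float]:
    """Share of posts that score positive, negative, or neutral."""
    scores = [score_post(post) for post in posts]
    if not scores:
        return {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    n = len(scores)
    return {
        "positive": sum(s > 0 for s in scores) / n,
        "negative": sum(s < 0 for s in scores) / n,
        "neutral": sum(s == 0 for s in scores) / n,
    }


posts = [
    "I love the new phone, great camera",
    "Screen arrived broken, so disappointed",
    "Picked mine up today",
    "Happy with it so far",
]
print(pulse(posts))  # {'positive': 0.5, 'negative': 0.25, 'neutral': 0.25}
```

Scoring each post separately and only then aggregating keeps one very long rant from outweighing many short posts. The trade-off is that intensity within a post is discarded.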


pages: 398 words: 86,855

Bad Data Handbook by Q. Ethan McCallum

Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, cloud computing, cognitive dissonance, combinatorial explosion, conceptual framework, database schema, en.wikipedia.org, Firefox, Flash crash, Gini coefficient, illegal immigration, iterative process, labor-force participation, loose coupling, natural language processing, Netflix Prize, quantitative trading / quantitative finance, recommendation engine, sentiment analysis, statistical model, supply-chain management, text mining, too big to fail, web application

I knew unethical people would lie in online reviews in order to inflate ratings or attack competitors, but what I didn’t know, and only learned by accident, is that individuals will sometimes write reviews that completely contradict their associated rating, without any regard to how it affects a business’s online reputation. And often this is for businesses that an individual likes. How did I learn this? By using ratings and reviews to create a sentiment corpus, I trained a sentiment analysis classifier that could reliably determine the sentiment of a review. While evaluating this classifier, I discovered that it could also detect discrepancies between the review sentiment and the corresponding rating, thereby finding liars and confused reviewers. Here’s the whole story of how I used text classification to identify an unexpected source of bad data...

Weotta

At my company, Weotta,[8] we produce applications and APIs for navigating local data in ways that people actually care about, so we can answer questions like: Is there a kid-friendly restaurant nearby?
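The rating-versus-text check described above can be sketched as follows. This is not the author's code. A tiny hypothetical word list (`predict_sentiment`) stands in for the trained classifier. The cut-offs that map stars to an expected label (4 and above is pos, 2 and below is neg, 3 is skipped) are also assumptions made for illustration.

```python
import re

# Hypothetical stand-in for a trained classifier: a tiny word list.
POSITIVE = {"love", "great", "delicious", "friendly", "amazing"}
NEGATIVE = {"rude", "cold", "terrible", "dirty", "slow"}


def predict_sentiment(text):
    """Return 'pos', 'neg', or None when the text gives no clear signal."""
    tokens = re.findall(r"[a-z]+", text.lower())
    balance = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if balance > 0:
        return "pos"
    if balance < 0:
        return "neg"
    return None


def rating_label(stars):
    """Map a 1-5 star rating to the label its text 'should' have (assumed cut-offs)."""
    if stars >= 4:
        return "pos"
    if stars <= 2:
        return "neg"
    return None  # 3-star reviews are treated as ambiguous and skipped


def find_discrepancies(reviews):
    """Return (stars, text, predicted) for reviews whose text contradicts the rating."""
    flagged = []
    for stars, text in reviews:
        predicted, expected = predict_sentiment(text), rating_label(stars)
        if predicted and expected and predicted != expected:
            flagged.append((stars, text, predicted))
    return flagged


reviews = [
    (5, "Great food and friendly staff"),
    (1, "Love this place, the tacos are amazing"),  # glowing text, 1 star
    (4, "Service was slow and the soup was cold"),  # negative text, 4 stars
    (3, "It was fine"),
]
for stars, text, predicted in find_discrepancies(reviews):
    print(f"{stars} stars but reads {predicted}: {text}")
```

Reviews where either side is ambiguous are left alone, so only clear contradictions are flagged.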

So how can you accurately calculate an average rating? We wanted to do this for our data, as well as aggregate the overall positive sentiment from all the reviews for a business, independent of any average rating. With that in mind, I figured I could create a sentiment classifier,[11] using rated reviews as a training corpus. A classifier works by taking a feature set and determining a label. For sentiment analysis, a feature set is a piece of text, like a review, and the possible labels can be pos for positive text, and neg for negative text. Such a sentiment classifier could be run over a business’s reviews in order to calculate an overall sentiment, and to make up for any missing rating information.

Sentiment Classification

NLTK,[12] Python’s Natural Language Toolkit, is a very useful programming library for doing natural language processing and text classification.[13] It also comes with many corpora that you can use for training and testing.
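To show the feature-set-to-label idea without an NLTK dependency, the sketch below implements a multinomial Naive Bayes classifier with add-one smoothing in plain Python. It is a stand-in for NLTK's classifier, not a copy of it. The class name and the four training sentences are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def train(self, labeled_texts):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        for text, label in labeled_texts:
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log P(label) + sum over words of log P(word | label)
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [
    ("great food, friendly staff", "pos"),
    ("amazing tacos, will be back", "pos"),
    ("rude waiter and cold soup", "neg"),
    ("terrible service, never again", "neg"),
]
clf = NaiveBayesSentiment().train(train)
print(clf.classify("friendly staff and great tacos"))  # pos
print(clf.classify("cold food and rude service"))      # neg
```

Working in log space avoids underflow when many small word probabilities are multiplied. Add-one smoothing keeps a word seen under only one label from zeroing out the other label's score.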

So in a 5-star rating system, reviews of 3.5 stars and higher went into the pos directory, while reviews of 2.5 stars and lower went into the neg directory. The assumption behind this is that highly rated reviews will have positive language, and low-rated reviews will have more negative language. Polarized language is ideal for text classification, because the classifier can learn much more precisely which words indicate pos and which indicate neg. Because I needed sentiment analysis for local businesses, not movies, I used a similar method to create my own sentiment training corpus for local business reviews. From a selection of businesses, I produced a corpus where the pos text came from 5-star reviews, and the neg text came from 1-star reviews. I actually started by using both 4- and 5-star reviews for pos, and 1- and 2-star reviews for neg, but after a number of training experiments, it was clear that the 2- and 4-star reviews had less polarizing language, and therefore introduced too much noise, decreasing the accuracy of the classifier.
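A minimal sketch of the corpus-building step, assuming reviews arrive as `(stars, text)` pairs. The function name and the in-memory dict are hypothetical, since the text describes writing files into `pos` and `neg` directories. The defaults follow the author's final choice of 5-star reviews for pos and 1-star reviews for neg. Passing wider sets reproduces the earlier 4-and-5 / 1-and-2 experiment.

```python
def split_by_rating(reviews, pos_stars=frozenset({5}), neg_stars=frozenset({1})):
    """Bucket (stars, text) pairs into 'pos' and 'neg' lists; other ratings are dropped."""
    corpus = {"pos": [], "neg": []}
    for stars, text in reviews:
        if stars in pos_stars:
            corpus["pos"].append(text)
        elif stars in neg_stars:
            corpus["neg"].append(text)
    return corpus


reviews = [(5, "superb"), (4, "pretty good"), (3, "okay"), (2, "meh"), (1, "awful")]

strict = split_by_rating(reviews)  # the author's final 5 / 1 split
loose = split_by_rating(reviews, pos_stars={4, 5}, neg_stars={1, 2})  # earlier, noisier split
print(strict)  # {'pos': ['superb'], 'neg': ['awful']}
print(loose)   # {'pos': ['superb', 'pretty good'], 'neg': ['meh', 'awful']}
```

Dropping the middle ratings shrinks the corpus, but it keeps mildly worded reviews from blurring the boundary the classifier has to learn.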


pages: 317 words: 87,566

The Happiness Industry: How the Government and Big Business Sold Us Well-Being by William Davies

1960s counterculture, Airbnb, business intelligence, Cass Sunstein, corporate governance, dematerialisation, experimental subject, Exxon Valdez, Frederick Winslow Taylor, Gini coefficient, income inequality, invisible hand, joint-stock company, market bubble, mental accounting, nudge unit, profit maximization, randomized controlled trial, Richard Thaler, road to serfdom, Ronald Coase, Ronald Reagan, science of happiness, sentiment analysis, sharing economy, Slavoj Žižek, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, Steve Jobs, The Chicago School, The Spirit Level, theory of mind, urban planning

This book shares much of that disquiet. There are surely ample political and material problems to deal with right now, before we divert quite so much attention towards the mental and neural conditions through which we individually experience them. There is also a sense that when the doyens of the World Economic Forum seize an agenda with so much gusto, there is at least some cause for suspicion. The mood-tracking technologies, sentiment analysis algorithms and stress-busting meditation techniques are put to work in the service of certain political and economic interests. They are not simply gifted to us for our own Aristotelian flourishing. Positive psychology, which repeats the mantra that happiness is a personal ‘choice’, is as a result largely unable to provide the exit from consumerism and egocentricity that its gurus sense many people are seeking.

Companies such as Nike are now exploring ways in which health and fitness products can be sold alongside quantified self apps, which will allow individuals to make constant reports of their behaviour (such as jogging), generating new data sets for the company in the process. There is a third development, the political and philosophical implications of which are potentially the most radical of all. This concerns the capability to ‘teach’ computers how to interpret human behaviour in terms of the emotions that are conveyed. For example, the field of ‘sentiment analysis’ involves the design of algorithms to interpret the sentiment that is expressed in a given sentence, such as a single tweet. The MIT Affective Computing research centre is dedicated to exploring new ways in which computers might read people’s moods through evaluating their facial expressions, or might carry out ‘emotionally intelligent’ conversations with people, to provide them with therapeutic support or friendship.

There are those who possess the power of algorithmic analysis and data mining to navigate a world in which there are too many pieces of data to be studied individually. These include market research agencies, social media platforms and the security services. But for the rest of us, impulse and emotion have become how we orientate and simplify our decisions. Hence the importance of fMRI and sentiment analysis in the digital age: tools which visualize, measure and codify our feelings become the main conduit between an esoteric, expert discourse of mathematics and facts, and a layperson’s discourse of mood, mystical belief and feeling. ‘We’ simply feel our way around, while ‘they’ observe and algorithmically analyse the results. Two separate languages are at work. The terminal dystopia of Benthamism, as touched on in Chapter 7, is of a social world that has been rendered totally objective, to the point where the distinction between the objective and the subjective is overcome.


pages: 123 words: 32,382

Grouped: How Small Groups of Friends Are the Key to Influence on the Social Web by Paul Adams

Airbnb, Cass Sunstein, cognitive dissonance, David Brooks, information retrieval, invention of the telegraph, planetary scale, race to the bottom, Richard Thaler, sentiment analysis, social web, statistical model, The Wisdom of Crowds, web application, white flight

The good news is that research has shown that when businesses are transparent about what data they have on people, and people have control over that data, they tell advertisers more about themselves.[10] If trustworthiness and expertise are requirements for credibility, then transparency is becoming increasingly critical for building trustworthiness.

Why negative comments are good for your brand

The emergence of the social web means that more people are talking openly about businesses, and many businesses are nervous about any negative commentary. Most want sentiment analysis in the advertising products they use so they can hide the negative comments and only promote the positive comments. But this is the wrong approach. People can easily differentiate between a natural conversation and something that is controlled, and they won’t react well to the latter. Hiding negative comments is not transparent; it will dramatically decrease credibility. If people perceive that a source of information is fair and unbiased, it increases credibility.

It’s based on permission, and on highlighting new things about people’s friends.

* * *

Quick Tips

Building credibility with a business is similar to building trust with someone you just met. It is a slow process, often taking months and even years, and marketers need to be patient. There is no quick solution to creating a credible brand. One way to fast-track it is to be recommended by people’s friends. Don’t use sentiment analysis to filter out negative comments, and don’t delete negative comments on your Facebook page. Look at it as an opportunity to learn and respond. If people have something negative to say, it’s because they had a poor experience with your brand. This is something you should want to rectify rather than hide.

* * *

Summary

There are two main problems with interruption marketing, both of which are getting worse.

See social networks New York Times 19 News Feed 134, 135 Nickerson, Raymond 127 nonconscious brain 107–111 decision making by 103–104, 107, 109–110, 148 processing capacity of 107, 108 Nordgren, Loran 115 Nudge (Thaler and Sunstein) 97 O On Intelligence (Hawkins) 114 100 Things Every Designer Needs to Know About People (Weinschenk) 114 overconfidence 96 Owyang, Jeremiah 69, 144 P Pahl, Ray 52, 55, 66, 67 passive sharing 138 patterns 105, 110, 114 Pedigree community 122 Penenberg, Adam 67, 144 permission marketing 12, 14, 133–138 friends and 137–138, 143 word of mouth and 135–137 Permission Marketing (Godin) 14, 143 personal information 139–140 Persuasive Technology (Fogg) 98 photos, Facebook 3, 4 Politics of Happiness, The (Bok) 27 polls, business 22 Predictably Irrational (Ariely) 98, 128, 144 predictions 105 preferential attachment 32 priming 125 problem-solving 105 Procter & Gamble 109, 121 public ratings 26 push marketing 137 R rational thinking 102–104 reductive thinking 102 relationships changes in 66 patterns of 55–58 strong ties 53, 54, 59–62 types of 52–54 uniqueness of 52 weak ties 53, 54, 62–65 relevance 138 reputation management 17 Rethinking Friendships (Spencer and Pahl) 67 S Salganik, Matthew 98 Science of Influence, The (Hogan) 115, 144 Searching for a Corporate Savior (Khurana) 82 sentiment analysis 140, 142 Sephora marketing campaign 18 serendipitous audience 25 Sernovitz, Andy 68 sharing feelings 19 information 41, 146 passive 138 similarity bias 118 Simon, Herbert 98 Simonson, Itamar 126, 128 six degrees of separation 43–44, 73 Six Degrees (Watts) 49, 82 Smart Lists 32 Social Animal, The (Brooks) 48 social behavior 150 social bonds 16–17, 18 social cognitive theory 128 social networks communication patterns on 23–24 consumer behavior and 106 decision making using 90–93 evolution of 31–32 groups connected through 39 historical overview of 9, 146 importance of understanding 150 influence within 94–95 information communicated on 24–26 pattern of connections in 33–35, 47 strong ties on 23, 60–61 structure of 30–35, 42–46, 81, 147–148 social norms 88 social proof 86–89 social web future of 149–151 how to think of 8 importance of 11–12 next great challenge on 93 summary points about 146–149 society, influence of 87–88 soulmates 53 Spencer, Liz 52, 55, 66, 67 Sponsored Stories 142 status updates 16–17 Strangers to Ourselves (Wilson) 99 strong ties 53, 54, 59–62 average number of 60 buying decisions and 61–62 communications with 60–61 disproportionate influence of 61, 147 importance of having 59 structure of social networks 30–35 connection patterns and 33–35 homophily principle and 32, 45–46 idea spreading and 76, 147–148 influence related to 42–46 laws governing 31–32 Stumbling on Happiness (Gilbert) 115 Sunstein, Cass 97 Surowiecki, James 92, 98 survival mechanism 16 sympathy group 34 T tagging photos 3 Target, poll example 22 targeted ads 80, 138, 139 technology human behavior and 9–10 interruption marketing and 130 Tetlock, Philip 95, 99 Thaler, Richard 97 Think Outside In blog 153 thinking rational 102–104 understanding of 151 three degrees of separation 43, 45, 46, 94 Ticketmaster 35 Tipping Point, The (Gladwell) 11, 14, 72, 82 transparency 139–141 trust building 139–142 levels of 91, 93 marketing and 131, 137 Twitter 73 U useful contacts 52 user ratings/reviews 137 V Viral Loop (Penenberg) 67, 144 visibility of products 21 W Watts, Duncan 10, 49, 73, 76, 82, 87, 98 weak ties 53, 54, 62–65 interactions with 62–64 sourcing information from 64–65 web, the how it’s changing 2–8 people-based rebuilding of 7, 8 phases of development 8 why it’s changing 9–10 See also social web Web Strategy blog 69 Weinschenk, Susan 114 Western cultures 88 Wikipedia 34, 90 Wilson, Timothy 99 Winning Decisions (Russo and Schoemaker) 99 word of mouth 135–137 Word of Mouth Marketing (Sernovitz) 68 Z Zynga games 2–3


pages: 502 words: 107,510

Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs

Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, game design, information retrieval, iterative process, natural language processing, pattern recognition, performance metric, sentiment analysis, social web, speech recognition, statistical model, text mining

When is one set or type of feature better than another? This depends on what the target function is for the learning task. This is where annotation comes in. The point of linguistic annotation is to identify textual components of your document that can be associated with particular features for the phenomena for which you want to develop learning algorithms. Let’s take some examples beyond the spam-ham distinction. Consider sentiment analysis applied to movie reviews or hotel ratings. The most expedient method for classifying movie reviews is to set up the learning problem with n-gram features. The words in the reviews are taken as independent features (lexical clues), and thrown into a description of the target function. While this works remarkably well in general, this approach will fail to capture properties that show up as nonlocal dependencies, such as the ways that negation and modality are often expressed in language.
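The negation problem can be seen in a small sketch. Plain n-gram features treat "good" in "not good" the same as any other "good". A widely used heuristic appends a marker such as `_NEG` to every word between a negation and the next punctuation mark, so that "good_NEG" becomes a feature of its own. This is a rough local patch, not the structure-dependent annotation the authors go on to advocate, and it does nothing for modality. The negation list, the tokenizer, and the function names are all assumptions.

```python
import re

PUNCTUATION = {".", ",", "!", "?", ";", ":"}


def tokenize(text):
    """Lowercased words (keeping contractions such as "didn't" whole) plus punctuation."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?|[.,!?;:]", text.lower())


def is_negation(token):
    return token in {"not", "no", "never"} or token.endswith("n't")


def mark_negation(tokens):
    """Suffix _NEG to every word after a negation, up to the next punctuation mark."""
    marked, negating = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negating = False
            marked.append(token)
        elif is_negation(token):
            negating = True
            marked.append(token)
        else:
            marked.append(token + "_NEG" if negating else token)
    return marked


def ngram_features(tokens, n_max=2):
    """Bag of 1..n_max-grams as {ngram: count}, ignoring punctuation tokens."""
    words = [t for t in tokens if t not in PUNCTUATION]
    features = {}
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            features[gram] = features.get(gram, 0) + 1
    return features


tokens = mark_negation(tokenize("The plot was not good, but the acting was great."))
print(tokens)
# ['the', 'plot', 'was', 'not', 'good_NEG', ',', 'but', 'the', 'acting', 'was', 'great', '.']
feats = ngram_features(tokens)
```

Cutting the negation scope at punctuation is a crude stand-in for syntax. It fails on, for example, "not only good but great", which is exactly the kind of nonlocal case the paragraph warns about.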

They have been applied to both simple and quite complex classification tasks (Manning et al. 2008). The main idea behind SVMs is to find the best-fitting decision boundary between two classes, one that is maximally far from any point in the training data. Nonlinearly separable data can be handled elegantly by using a technique called the kernel trick, which maps the data into a higher dimension where it behaves in a linear fashion. SVMs have been applied very successfully to sentiment analysis (Pang et al. 2002). We won’t be going into detail about these, however; other books on machine learning (see the list at the start of the chapter) provide excellent guides for how these classifiers work, and the ones we’ve already discussed are enough to get you started in training algorithms on your annotated data.

Micro Versus Macro

Classifiers are evaluated using the results of a simple table that sums up how often the tags were correctly assigned.
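The excerpt stops just as it reaches micro- versus macro-averaging. As these terms are commonly defined, macro-averaging computes a score such as F1 for each class and takes the unweighted mean. Micro-averaging instead pools the true-positive, false-positive, and false-negative counts across classes before computing a single score, so large classes dominate it. The counts below are invented to make the difference visible.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_macro_f1(counts):
    """counts maps each label to (tp, fp, fn); returns (micro_f1, macro_f1)."""
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(prf(*c)[2] for c in counts.values()) / len(counts)
    # Micro: pool the counts across classes first, then compute one F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)[2], macro


# Hypothetical counts: a large, well-handled pos class and a small, poorly handled neg class.
counts = {"pos": (90, 10, 10), "neg": (5, 10, 10)}
micro, macro = micro_macro_f1(counts)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")  # micro F1 = 0.826, macro F1 = 0.617
```

The gap between the two numbers is the point: micro-averaging mostly reports how well the common class is handled, while macro-averaging exposes the weak minority class.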

Classification algorithms are used to apply the most likely label (or classification) to a collection. They can be applied at a document, sentence, phrase, word, or any other level of language that is appropriate for your task. Using n-gram features is the simplest way to start with a classification system, but structure-dependent features and annotation-dependent features will help with more complex tasks such as event recognition or sentiment analysis. Decision trees are a type of ML algorithm that essentially ask “20 questions” of a corpus to determine what label should be applied to each item. The hierarchy of the tree determines the order in which the classifications are applied. The “questions” asked at each branch of a decision tree can be structure-dependent, annotation-dependent, or any other type of feature that can be discovered about the data.
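The "20 questions" idea can be sketched as a small ID3-style learner over word-presence features. At each node it asks the yes/no question ("does the text contain this word?") with the highest information gain, then recurses on each answer. ID3 and information gain are one common way to choose the questions, not necessarily the one the authors have in mind, and the training data are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(examples, feature):
    """Entropy reduction from asking 'is this feature present?'."""
    labels = [label for _, label in examples]
    yes = [label for feats, label in examples if feature in feats]
    no = [label for feats, label in examples if feature not in feats]
    remainder = sum(len(part) / len(examples) * entropy(part) for part in (yes, no) if part)
    return entropy(labels) - remainder


def build_tree(examples, features):
    """Return a label (leaf) or a (question, yes_subtree, no_subtree) tuple."""
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]
    # Only questions that actually divide the current examples are worth asking.
    useful = [f for f in features
              if 0 < sum(f in feats for feats, _ in examples) < len(examples)]
    if len(set(labels)) == 1 or not useful:
        return majority
    question = max(useful, key=lambda f: information_gain(examples, f))
    rest = [f for f in features if f != question]
    yes = [ex for ex in examples if question in ex[0]]
    no = [ex for ex in examples if question not in ex[0]]
    return (question, build_tree(yes, rest), build_tree(no, rest))


def classify(tree, feats):
    """Walk from the root, answering one yes/no question per level."""
    while isinstance(tree, tuple):
        question, yes_branch, no_branch = tree
        tree = yes_branch if question in feats else no_branch
    return tree


# Hypothetical training data: each review is reduced to the set of words it contains.
examples = [
    ({"great", "food"}, "pos"),
    ({"great", "staff"}, "pos"),
    ({"great", "tacos"}, "pos"),
    ({"not", "great"}, "neg"),
    ({"slow", "food"}, "neg"),
    ({"rude", "staff"}, "neg"),
]
features = ["great", "not", "food", "staff", "slow", "rude", "tacos"]
tree = build_tree(examples, features)
print(tree)  # ('great', ('not', 'neg', 'pos'), 'neg')
print(classify(tree, {"not", "great", "service"}))  # neg
```

The learned hierarchy asks about "great" first and only then about "not", which illustrates the text's point that the tree's structure fixes the order in which questions are applied.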


pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier

23andMe, Affordable Care Act / Obamacare, airport security, AltaVista, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, IBM and the Holocaust, index card, informal economy, Internet of things, invention of the printing press, Jeff Bezos, Louis Pasteur, Mark Zuckerberg, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, performance metric, Peter Thiel, Post-materialism, post-materialism, random walk, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, speech recognition, Steve Jobs, Steven Levy, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Watson beat the top human players on Jeopardy!

Messiness can also refer to the inconsistency of formatting, for which the data needs to be “cleaned” before being processed. There are a myriad of ways to refer to IBM, notes the big-data expert DJ Patil, from I.B.M. to T. J. Watson Labs, to International Business Machines. And messiness can arise when we extract or process the data, since in doing so we are transforming it, turning it into something else, such as when we perform sentiment analysis on Twitter messages to predict Hollywood box office receipts. Messiness itself is messy. Suppose we need to measure the temperature in a vineyard. If we have only one temperature sensor for the whole plot of land, we must make sure it’s accurate and working at all times: no messiness allowed. In contrast, if we have a sensor for every one of the hundreds of vines, we can use cheaper, less sophisticated sensors (as long as they do not introduce a systematic bias).
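The vineyard argument can be checked with a quick simulation. Assume each cheap sensor reads the true temperature plus independent Gaussian noise, with a standard deviation of 2 °C (an invented figure). Averaging n such sensors shrinks the random error roughly as σ/√n. A bias shared by every sensor, however, passes straight through the average, which is why the caveat about systematic bias matters.

```python
import random
import statistics


def field_estimate(true_temp, n_sensors, noise_sd, bias=0.0, seed=0):
    """Average of n simulated sensors, each reading true_temp + bias + Gaussian noise."""
    rng = random.Random(seed)
    readings = [true_temp + bias + rng.gauss(0, noise_sd) for _ in range(n_sensors)]
    return statistics.mean(readings)


TRUE_TEMP = 20.0  # hypothetical vineyard temperature, degrees C
many_cheap = field_estimate(TRUE_TEMP, n_sensors=400, noise_sd=2.0)
biased = field_estimate(TRUE_TEMP, n_sensors=400, noise_sd=2.0, bias=1.5)

# Random error shrinks roughly as noise_sd / sqrt(n) = 2.0 / 20 = 0.1 degrees...
print(round(many_cheap - TRUE_TEMP, 2))
# ...but a shared bias survives averaging untouched.
print(round(biased - TRUE_TEMP, 2))
```

Both runs use the same seed, so they draw identical noise and differ only by the 1.5-degree bias. This isolates the effect the paragraph describes.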

And, in fact, they’re often just that. Yet the company enables the datafication of people’s thoughts, moods, and interactions, which could never be captured previously. Twitter has struck deals with two firms, DataSift and Gnip, to sell access to the data. (Although all tweets are public, access to the “firehose” comes at a cost.) Many businesses parse tweets, sometimes using a technique called sentiment analysis, to garner aggregate customer feedback or judge the impact of marketing campaigns. Two hedge funds, Derwent Capital in London and MarketPsych in California, started analyzing the datafied text of tweets as signals for investments in the stock market. (Their actual trading strategies were kept secret: rather than investing in firms that were ballyhooed, they may have bet against them.) Both firms now sell the information to traders.
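Judging a campaign's impact with sentiment analysis often comes down to comparing aggregate sentiment before and after a launch. Below is a minimal sketch, assuming the tweets have already been labeled pos, neg, or neu by some classifier. The net-sentiment formula (positives minus negatives, over the total) is one common convention, not necessarily the one any firm named above uses. A before/after shift is also only a correlation, not proof that the campaign caused it.

```python
from datetime import date


def net_sentiment(labels):
    """(positives - negatives) / total, in [-1, 1]; labels are 'pos', 'neg', or 'neu'."""
    if not labels:
        return 0.0
    return (labels.count("pos") - labels.count("neg")) / len(labels)


def campaign_shift(labeled_tweets, launch):
    """Change in net sentiment from before the launch date to on/after it."""
    before = [label for day, label in labeled_tweets if day < launch]
    after = [label for day, label in labeled_tweets if day >= launch]
    return net_sentiment(after) - net_sentiment(before)


# Hypothetical tweets about a brand, already labeled by some sentiment classifier.
tweets = [
    (date(2013, 3, 1), "pos"), (date(2013, 3, 2), "neg"),
    (date(2013, 3, 3), "neg"), (date(2013, 3, 4), "neu"),
    (date(2013, 3, 10), "pos"), (date(2013, 3, 11), "pos"),
    (date(2013, 3, 12), "neg"), (date(2013, 3, 13), "neu"),
]
print(campaign_shift(tweets, launch=date(2013, 3, 8)))  # 0.5
```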

The biologist Marcel Salathé of Penn State University and the software engineer Shashank Khandelwal analyzed tweets to find that people’s attitudes about vaccinations matched their likelihood of actually getting flu shots. Importantly, their study used the metadata of who was connected to whom among Twitter followers to go a step further still. They noticed that subgroups of unvaccinated people may exist. What marks this research as particularly special is that where other studies, such as Google Flu Trends, used aggregated data to consider the state of individuals’ health, the sentiment analysis performed by Salathé actually predicted health behaviors. These early findings indicate where datafication will surely go next. Like Google, a gaggle of social media networks such as Facebook, Twitter, LinkedIn, Foursquare, and others sit on an enormous treasure chest of datafied information that, once analyzed, will shed light on social dynamics at all levels, from the individual to society at large.
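Follower metadata can go "a step further" than sentiment alone. A very loose illustration is to look for connected clusters of users who express negative sentiment about vaccination. This is not a reconstruction of Salathé and Khandelwal's method. The follow edges and sentiment labels are hypothetical inputs, and edges are treated as undirected.

```python
from collections import defaultdict


def negative_clusters(follows, sentiment, min_size=2):
    """Connected groups of negative-sentiment users linked by follow edges.

    follows: iterable of (follower, followee) pairs, treated as undirected links.
    sentiment: {user: 'pos' | 'neg'} produced by some classifier (hypothetical input).
    """
    neg = {u for u, s in sentiment.items() if s == "neg"}
    adj = defaultdict(set)
    for a, b in follows:
        if a in neg and b in neg:  # keep only edges inside the negative subgraph
            adj[a].add(b)
            adj[b].add(a)
    seen, clusters = set(), []
    for start in sorted(neg):
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:  # iterative depth-first search over the negative subgraph
            user = stack.pop()
            component.add(user)
            for nbr in adj[user]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        if len(component) >= min_size:
            clusters.append(component)
    return clusters


follows = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f"), ("g", "h")]
sentiment = {"a": "neg", "b": "neg", "c": "pos", "d": "neg",
             "e": "neg", "f": "neg", "g": "neg", "h": "pos"}
print(negative_clusters(follows, sentiment))  # [{'a', 'b'}, {'e', 'f'}] (set order may vary)
```

Isolated negative users (here `d` and `g`) are dropped by `min_size`, so only groups that could reinforce one another are reported.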

Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport

Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, data acquisition, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, Mark Zuckerberg, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, self-driving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining

However, with big data, the data resembles not so much a pool as an ongoing, fast-flowing stream. Therefore, a more continuous approach to sampling, analyzing, and acting on data is necessary. This is particularly at issue for applications involving ongoing monitoring of data, as in social media sentiment analysis. Sentiment analysis allows an organization to assess whether the comments about its brands and products in blogs, tweets, and Facebook pages are positive or negative on balance. One potential problem with such monitoring applications is the tendency for managers to view a continuing stream of analysis and reports without making any decisions or taking any action. “Sentiment is up . . . no, it’s down . . . hooray, it’s back up again!”

It’s important, then, to have clear criteria for what decisions to make and what actions to take based on big data analyses—particularly in fast-changing domains like social analytics. Sometimes it’s important to admit that the data and analyses are not definitive. I’ve already talked about the HunchWorks project at the United Nations, which seeks to identify trends and hunches at an early stage in order to decide whether they merit further attention. This could also be the right approach for social sentiment analysis—to use it as a tipoff that further investigation is required, rather than a specific action. If you’re a little more certain—but not entirely—that something important is going on based on your big data analysis, you might consider an automated recommendation. If necessary, a human could override it. That’s the approach that some health-care organizations are planning to take with the recommendations of IBM’s Watson system, for example.
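One way to turn "Sentiment is up . . . no, it's down" into a decision rule is to smooth the stream and attach an explicit action to a threshold. As the text suggests, the result then serves as a tipoff for human investigation rather than an automatic response. The window length, threshold, and action names below are hypothetical choices for illustration.

```python
from collections import deque


def monitor(scores, window=3, alert_below=-0.2):
    """Yield (index, rolling_mean, action) for a stream of per-period sentiment scores.

    Hypothetical decision rule: when the rolling mean drops below alert_below,
    flag the period for human investigation instead of acting automatically.
    """
    recent = deque(maxlen=window)
    for i, score in enumerate(scores):
        recent.append(score)
        if len(recent) < window:
            continue  # not enough history yet; avoid reacting to single readings
        mean = sum(recent) / window
        action = "investigate" if mean < alert_below else "no action"
        yield i, mean, action


scores = [0.1, -0.3, 0.2, -0.4, -0.5, 0.0, 0.3]
for i, mean, action in monitor(scores):
    print(f"period {i}: rolling mean {mean:+.2f} -> {action}")
```

The single dip at period 1 never triggers anything on its own. Only the sustained run of negative periods crosses the threshold, which is the kind of pre-agreed criterion the paragraph calls for.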

., 195 cloud-based computing, 55, 89, 117, 163, 169, 192, 200, 208 Cloudera Hadoop, 115 commitment, culture of, 148 communication skills, 88, 92, 93, 99, 102–103 Competing on Analytics (Davenport and Harris), 2, 43 Compute Engine, 163 Concept 2, 12 conservative approach to big data adoption, 80, 81 consultants, data scientists as, 81, 98–99, 103–104, 112, 209 consumer products companies, 42, 42t, 43, 46, 54, 71, 82 Consumers Union, 67 Corporate Insight, 109 cost-reduction, 21, 60–63, 145 Coursera, 41 cows, data from, 11–12 credit card data, 37, 38, 42, 42t, 46, 164 culture for big data in organizations, 147–149, 152 customer relationship management (CRM), 54, 129f customers banking industry and, 9, 44, 49, 133 big data’s effect on relationships with, 26–27 business-to-business (B2B) firms and, 43, 45–46 business-to-business-to-consumer (B2B2C) firms and, 43, 46 data-based products and services for, 16, 23–24, 26, 66, 106, 155, 195 as focus of big data efforts, 16 future scenario of big data’s effect on relationships with, 35–38, 41–42, 58 identification of dissatisfaction and possible attrition of, 23, 48, 67, 68, 72, 78, 96, 179, 180, 181, 191 intermediaries reporting information about, 46 managers’ attention to, 21 marketing efforts targeted to, 27, 55, 63–64, 65, 67, 72, 79, 107, 108–109, 128, 142, 144, 179, 180, 197 media and entertainment firms and, 48, 49 multichannel relationships with, 51, 67, 177, 186 Netflix Prize’s focus on, 16, 22, 66 overachievers and, 42, 42t, 46 regulatory environment for data from, 27 research on website behavior of, 164 sentiment analysis of, 17, 27, 107, 118, 123 service transaction histories from, 23 sharing data with, 167–168 social media and, 48, 50–51, 107 travel industry and, 75–76 underachievers and, 42t, 43–44 unstructured data from, 51, 67, 68, 69, 180, 186 volume of data warehoused from, 116–117, 168 Cutting, Doug, 157 CycleOps, 12 dashboards, 109, 128, 129, 130, 137, 167, 185, 198 data in big data stack, 119t, 121–122 success of big data initiatives and, 136–138 data disadvantaged organizations, 42t, 43 data discovery process big data strategy and, 70–72, 74–75, 75f, 84 enterprise orientation for, 139 focus of architecture on, 20, 201 GE’s experience with, 75 leadership and, 140 management orientation toward, 18–19 model generation for, 64 moderately aggressive approach to big data and, 82 objectives and, 75, 75f, 84 research on, 3 responsibility locus for, 76–77, 77f technical platform for, 131, 201 Data Lab product, 160 data mining, 122–123, 128, 183, 184 data production process big data strategy and, 70, 72–75, 75f, 84 data scientists and teams and, 201 enterprise orientation for, 139 GE’s experience with, 74–75 highly ambitious approach to big data and, 83 moderately aggressive approach to big data and, 82 objectives and, 75, 75f, 84 responsibility locus for, 76–77, 77f technical platform for, 74, 127, 129–130, 132, 133, 201 Data Science Central, 97 data scientists activities performed by, 15, 137–138, 148, 159–160, 199 analysts differentiated from, 15 background to, 86–87, 196–197 business expert traits of, 88 classic model of, 87–97 collaboration by, 165–167, 173, 176 development of products and services and, 16, 18, 20, 24, 61–62, 65, 66, 71, 79–80, 106, 161 education and training of, 14, 91, 92, 104, 184, 209 future for, 110–111 hacker traits of, 88–91 horizontal versus vertical, 97–99 job growth for, 111, 111f, 184–185 in large companies, 201 LinkedIn’s use of, 158, 160, 161 motivation of, 106 organizational structure with, 16, 61, 82, 140, 141, 142, 152, 153, 158, 173, 180, 187, 202, 207, 209 quantitative analyst traits of, 88, 93–97 research on, 3 retention of, 104–106, 112, 161 role of, 14, 209 scientist traits of, 88, 91–92 skills of, 71, 79, 88, 145, 147, 182–184, 185 sources of, for hiring, 101–105 start-ups using, 16, 157–158 team approach using, 99–101, 165–167, 181, 201, 209 traits of, 87, 88 trusted adviser traits of, 88, 92–93 data visualization, 124–125, 125f Davis, Jim, 163–164 DB2, 183.


pages: 397 words: 110,130

Smarter Than You Think: How Technology Is Changing Our Minds for the Better by Clive Thompson

3D printing, 4chan, A Declaration of the Independence of Cyberspace, augmented reality, barriers to entry, Benjamin Mako Hill, butterfly effect, citizen journalism, Claude Shannon: information theory, conceptual framework, corporate governance, crowdsourcing, Deng Xiaoping, discovery of penicillin, Douglas Engelbart, Edward Glaeser, en.wikipedia.org, experimental subject, Filter Bubble, Freestyle chess, Galaxy Zoo, Google Earth, Google Glasses, Henri Poincaré, hindsight bias, hive mind, Howard Rheingold, information retrieval, iterative process, jimmy wales, Kevin Kelly, Khan Academy, knowledge worker, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Netflix Prize, Nicholas Carr, patent troll, pattern recognition, pre–internet, Richard Feynman, Ronald Coase, Ronald Reagan, sentiment analysis, Silicon Valley, Skype, Snapchat, Socratic dialogue, spaced repetition, telepresence, telepresence robot, The Nature of the Firm, the scientific method, The Wisdom of Crowds, theory of mind, transaction costs, Vannevar Bush, Watson beat the top human players on Jeopardy!, WikiLeaks, X Prize, éminence grise

p=2092#more-2092; Dale Lane, “Has Today Been a Good Day?” Dale Lane (blog), April 16, 2012, accessed March 22, 2013, dalelane.co.uk/blog/?p=2125. analyzed the color usage in Van Gogh’s major paintings: Cory Doctorow, “Van Gogh Pie-Charts,” Boing Boing, January 29, 2011, accessed March 23, 2013, boingboing.net/2011/01/29/van-gogh-pie-charts.html. a “sentiment analysis” of the Bible: “Applying Sentiment Analysis to the Bible,” OpenBible.info, October 10, 2011, accessed March 22, 2013, www.openbible.info/blog/2011/10/applying-sentiment-analysis-to-the-bible/. how characters interact in Hamlet: Richard Beck, “Hamlet and the Region of Death,” Boston Globe, May 29, 2011, accessed March 23, 2013, www.boston.com/bostonglobe/ideas/articles/2011/05/29/hamlet_and_the_region_of_death/. Tufte analyzed 217 data graphics: Edward R. Tufte, The Cognitive Style of PowerPoint (Cheshire, CT: Graphics Press, 2003), 4–5.

Even Gurrin admits to me that he rarely searches for anything at all in his massive archive. He’s waiting for better search tools to emerge. Mind you, he’s confident they will. As he points out, fifteen years ago you couldn’t find much on the Web because the search engines were dreadful. “And the first MP3 players were horrendous for finding songs,” he adds. The most promising trends in search algorithms include everything from “sentiment analysis” (you could hunt for a memory based on how happy or sad it is) to sophisticated ways of analyzing pictures, many of which are already emerging in everyday life: detecting faces and locations or snippets of text in pictures, allowing you to hunt down hard-to-track images by starting with a vague piece of half recall, the way we interrogate our own minds. The app Evernote has already become popular because of its ability to search for text, even bent or sideways, within photos and documents.

If you want proof that data visualization is entering the mainstream, it’s there in online pop culture. Some of the biggest viral hits in recent years have been witty data crunches from odd, unexpected sources. Arthur Buxton, a young British Web designer, analyzed the color usage in Van Gogh’s major paintings and transformed them into pie charts, challenging viewers to figure out which was which. A group of Christian data nerds did a “sentiment analysis” of the Bible, using algorithms that determine whether a piece of text contains positive or negative language. (“Things start off well with creation, turn negative with Job and the patriarchs, improve again with Moses. . . . In the New Testament, things start off fine with Jesus, then quickly turn negative as opposition to his message grows.”) A professor of English used network-mapping software to analyze how characters interact in Hamlet and produced a map that uncovered some revealing patterns.
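The Bible study described above scores passages by whether they contain positive or negative language. A minimal lexicon-based sketch of that idea follows; the two word lists, the `polarity` and `trajectory` helpers, and the normalization by passage length are illustrative assumptions, not the method OpenBible.info actually used.

```python
import re

# Hypothetical toy lexicons; real studies use large, curated word lists.
POSITIVE = {"good", "love", "joy", "peace", "blessed", "rejoice", "light"}
NEGATIVE = {"evil", "sorrow", "death", "wrath", "fear", "curse", "darkness"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def polarity(text: str) -> float:
    """(positive hits - negative hits) / token count; 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def trajectory(sections: dict[str, str]) -> list[tuple[str, float]]:
    """Score sections in the order given, producing a sentiment arc."""
    return [(name, polarity(text)) for name, text in sections.items()]


arc = trajectory({
    "section A": "a good day full of joy and light",
    "section B": "fear and sorrow in the darkness",
})
print(arc)  # [('section A', 0.375), ('section B', -0.5)]
```

Plotting such scores in canonical order would give the kind of rising-and-falling arc the quoted summary describes; a real analysis would also need a far larger lexicon and some handling of negation ("not good").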


pages: 274 words: 75,846

The Filter Bubble: What the Internet Is Hiding From You by Eli Pariser

A Declaration of the Independence of Cyberspace, A Pattern Language, Amazon Web Services, augmented reality, back-to-the-land, Black Swan, borderless world, Build a better mousetrap, Cass Sunstein, citizen journalism, cloud computing, cognitive dissonance, crowdsourcing, Danny Hillis, data acquisition, disintermediation, don't be evil, Filter Bubble, Flash crash, fundamental attribution error, global village, Haight Ashbury, Internet of things, Isaac Newton, Jaron Lanier, Jeff Bezos, jimmy wales, Kevin Kelly, knowledge worker, Mark Zuckerberg, Marshall McLuhan, megacity, Netflix Prize, new economy, PageRank, paypal mafia, Peter Thiel, recommendation engine, RFID, sentiment analysis, shareholder value, Silicon Valley, Silicon Valley startup, social graph, social software, social web, speech recognition, Startup school, statistical model, stem cell, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, the scientific method, urban planning, Whole Earth Catalog, WikiLeaks, Y Combinator

There’s plenty of good that could emerge from persuasion profiling, Eckles believes. He points to DirectLife, a wearable coaching device by Philips that figures out which arguments get people eating more healthily and exercising more regularly. But he told me he’s troubled by some of the possibilities. Knowing what kinds of appeals specific people respond to gives you power to manipulate them on an individual basis. With new methods of “sentiment analysis,” it’s now possible to guess what mood someone is in. People use substantially more positive words when they’re feeling up; by analyzing enough of your text messages, Facebook posts, and e-mails, it’s possible to tell good days from bad ones, sober messages from drunk ones (lots of typos, for a start). At best, this can be used to provide content that’s suited to your mood: On an awful day in the near future, Pandora might know to preload Pretty Hate Machine for you when you arrive.
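To make the mood-and-typos idea concrete, here is a hypothetical sketch that groups messages by day, measures the share of positive words, and treats words missing from a small vocabulary as a crude typo signal. The word lists and the cut-off values `mood_cut` and `typo_cut` are invented for illustration; a real system would rely on large lexicons, spell-checkers, and per-person baselines.

```python
import re
from collections import defaultdict

# Hypothetical word lists; a real system needs a full lexicon and dictionary.
POSITIVE = {"great", "happy", "love", "fun", "awesome", "thanks"}
VOCABULARY = POSITIVE | {
    "i", "a", "at", "the", "had", "day", "so", "long", "work", "tired",
    "see", "you", "tonight",
}


def daily_signals(messages):
    """messages: iterable of (day, text).

    Returns {day: (positive_rate, unknown_rate)}, where unknown_rate is the
    share of tokens missing from VOCABULARY (a crude stand-in for typos).
    """
    counts = defaultdict(lambda: [0, 0, 0])  # [tokens, positive, unknown]
    for day, text in messages:
        for token in re.findall(r"[a-z]+", text.lower()):
            c = counts[day]
            c[0] += 1
            c[1] += token in POSITIVE
            c[2] += token not in VOCABULARY
    return {day: (pos / total, unk / total) for day, (total, pos, unk) in counts.items()}


def label(positive_rate, unknown_rate, mood_cut=0.1, typo_cut=0.3):
    """Assumed thresholds: split good and bad days, then flag heavy typo use."""
    mood = "good day" if positive_rate >= mood_cut else "bad day"
    return mood + (", many typos" if unknown_rate >= typo_cut else "")


signals = daily_signals([
    ("mon", "I had a great day, so happy"),
    ("tue", "Long day at work, so tired"),
    ("sat", "seee yuo tonite"),
])
for day, (pos, unk) in signals.items():
    print(day, label(pos, unk))
```

Counting unknown words is only a rough proxy: slang, names, and emoji would all register as "typos" here, which is one reason such inferences about a person's state deserve caution.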

But his dream is quite clear: Rendon wants to see a world where television “can drive the policy process,” where “border patrols [are] replaced by beaming patrols,” and where “you can win without fighting.” Given all that, I was a bit surprised when the first weapon he referred me to was a very quotidian one: a thesaurus. The key to changing public opinion, Rendon said, is finding different ways to say the same thing. He described a matrix, with extreme language or opinion on one side and mild opinion on the other. By using sentiment analysis to figure out how people in a country felt about an event—say, a new arms deal with the United States—and identify the right synonyms to move them toward approval, you could “gradually nudge a debate.” “It’s a lot easier to be close to what reality is” and push it in the right direction, he said, than to make up a new reality entirely. Rendon had seen me talk about personalization at an event we both attended.

PayPal PeekYou persuasion profiling Phantom Public, The (Lippmann) Philby, Kim Phorm Piaget, Jean Picasa Picasso, Pablo PK List Management Plato politics electoral districts and partisans and programmers and voting Popper, Karl postmaterialism predictions present bias priming effect privacy Facebook and facial recognition and genetic Procter & Gamble product recommendations Proulx, Travis Pulitzer, Joseph push technology and pull technology Putnam, Robert Qiang, Xiao Rapleaf Rather, Dan Raz, Guy reality augmented Reality Hunger (Shields) Reddit Rendon, John Republic.com (Sunstein) retargeting RFID chips robots Rodriguez de Montalvo, Garci Rolling Stone Roombas Rotenberg, Marc Rothstein, Mark Rove, Karl Royal Caribbean Rubel, Steve Rubicon Project Rumsfeld, Donald Rushkoff, Douglas Salam, Reihan Sandberg, Sheryl schemata Schmidt, Eric Schudson, Michael Schulz, Kathryn science Scientific American Scorpion sentiment analysis Sentry serendipity Shields, David Shirky, Clay Siegel, Lee signals click Simonton, Dean Singhal, Amit Sleepwalkers, The (Koestler) smart devices Smith, J. Walker
social capital social graph Social Graph Symposium Social Network, The Solove, Daniel solution horizon Startup School Steitz, Mark stereotyping Stewart, Neal Stryker, Charlie Sullivan, Danny Sunstein, Cass systematization Taleb, Nassim Nicholas Tapestry TargusInfo Taylor, Bret technodeterminism technology television advertising on mean world syndrome and Tetlock, Philip Thiel, Peter This American Life Thompson, Clive Time Tocqueville, Alexis de Torvalds, Linus town hall meetings traffic transparency Trotsky, Leon Turner, Fred Twitter Facebook compared with Últimas Noticias Unabomber uncanny valley Upshot Vaidhyanathan, Siva video games Wales, Jimmy Wall Street Journal Walmart Washington Post Web site morphing Westen, Drew Where Good Ideas Come From (Johnson) Whole Earth Catalog WikiLeaks Wikipedia Winer, Dave Winner, Langdon Winograd, Terry Wired Wiseman, Richard Woolworth, Andy Wright, David Wu, Tim Yahoo News Upshot Y Combinator Yeager, Sam Yelp YouTube LeanBack Zittrain, Jonathan Zuckerberg, Mark


pages: 23 words: 5,264

Designing Great Data Products by Jeremy Howard, Mike Loukides, Margit Zwemer

AltaVista, Filter Bubble, PageRank, pattern recognition, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, text mining

In another area where objective-based data products have the power to change lives, the CMU extension in Silicon Valley has an active project for building data products to help first responders after natural or man-made disasters. Jeannie Stamberger of Carnegie Mellon University Silicon Valley explained to us many of the possible applications of predictive algorithms to disaster response, from text-mining and sentiment analysis of tweets to determine the extent of the damage, to swarms of autonomous robots for reconnaissance and rescue, to logistic optimization tools that help multiple jurisdictions coordinate their responses. These disaster applications are a particularly good example of why data products need simple, well-designed interfaces that produce concrete recommendations. In an emergency, a data product that just produces more data is of little use.
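The point that a disaster product should emit recommendations rather than more data can be illustrated with a short sketch. It assumes each tweet already carries an area label and uses a hypothetical distress-word list as a stand-in for proper text mining and sentiment analysis; the output is a ranked list of actions, not a table of scores.

```python
import re
from collections import Counter

# Hypothetical distress vocabulary standing in for a trained sentiment model.
DISTRESS = {"collapsed", "trapped", "flooded", "injured", "fire", "help"}


def priority_areas(tweets, top_n=3):
    """tweets: iterable of (area, text) pairs, with area assumed known.

    Counts tweets containing at least one distress word per area and returns
    a short ranked list of actions rather than the raw counts.
    """
    hits = Counter()
    for area, text in tweets:
        if DISTRESS.intersection(re.findall(r"[a-z]+", text.lower())):
            hits[area] += 1
    return [f"Send teams to {area} (distress reports: {n})"
            for area, n in hits.most_common(top_n)]


print(priority_areas([
    ("Eastside", "Building collapsed, people trapped"),
    ("Eastside", "Need help on Elm St"),
    ("Riverside", "Streets flooded near the bridge"),
    ("Hilltop", "All fine here, nice weather"),
]))
# ['Send teams to Eastside (distress reports: 2)', 'Send teams to Riverside (distress reports: 1)']
```

Keeping the output to a handful of ranked actions is the design choice the passage argues for: a responder under pressure reads "send teams to Eastside" faster than a per-area score table.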


pages: 271 words: 77,448

Humans Are Underrated: What High Achievers Know That Brilliant Machines Never Will by Geoff Colvin

Ada Lovelace, autonomous vehicles, Baxter: Rethink Robotics, Black Swan, call centre, capital asset pricing model, computer age, corporate governance, deskilling, en.wikipedia.org, Freestyle chess, future of work, Google Glasses, Grace Hopper, industrial robot, interchangeable parts, job automation, knowledge worker, low skilled workers, meta analysis, meta-analysis, Narrative Science, new economy, rising living standards, self-driving car, sentiment analysis, Silicon Valley, Skype, Steve Jobs, Steve Wozniak, Steven Levy, Steven Pinker, theory of mind, Tim Cook: Apple, transaction costs

You’ve noticed that even the camera in your phone can detect faces and put little boxes around them. More advanced software can examine those faces and spot the muscle movements from Ekman’s system. The possibilities of such technology prompted six PhDs at the University of California at San Diego to form Emotient and to recruit Ekman to their advisory board. Point a video camera at any person’s face, and the company’s Sentiment Analysis software can tell you that person’s overall sentiment (positive, negative, neutral) plus display a continually updating bar chart showing levels of seven primary emotions—joy, surprise, sadness, fear, disgust, contempt, anger—and two advanced emotions, frustration and confusion (advanced because they’re combinations of other emotions). Point the camera at a group of people and it analyzes all their emotions and gives you a composite readout.
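A composite group readout of the kind described could, under simple assumptions, be an average of per-face emotion scores. The sketch below assumes a face-analysis step has already produced a score between 0 and 1 for each of the nine emotions named above; the valence rule (joy minus the strongest negative emotion) and the `margin` threshold are hypothetical and are not Emotient's algorithm.

```python
from statistics import mean

# The nine channels named in the passage; per-face scores in [0, 1] are assumed inputs.
EMOTIONS = ["joy", "surprise", "sadness", "fear", "disgust", "contempt", "anger",
            "frustration", "confusion"]
NEGATIVE = {"sadness", "fear", "disgust", "contempt", "anger", "frustration"}


def composite_readout(faces, margin=0.1):
    """faces: non-empty list of {emotion: score} dicts, one per detected face.

    Returns (group_means, sentiment) where sentiment is "positive",
    "negative" or "neutral" under an assumed valence rule.
    """
    if not faces:
        raise ValueError("need at least one face")
    group = {e: mean(face.get(e, 0.0) for face in faces) for e in EMOTIONS}
    # Hypothetical rule: compare joy with the strongest negative emotion.
    valence = group["joy"] - max(group[e] for e in NEGATIVE)
    if valence > margin:
        sentiment = "positive"
    elif valence < -margin:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return group, sentiment


group, sentiment = composite_readout([
    {"joy": 0.8, "surprise": 0.2},
    {"joy": 0.6, "contempt": 0.1},
])
print(sentiment)  # positive
```

Averaging hides disagreement within the group; a fuller readout might also report the spread per emotion so that one visibly angry person is not washed out by several neutral ones.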

Incorporate the software into Google Glass, as the company has done, and the emotion readouts for anyone you’re looking at appear before your eyes (and yes, several people quickly noted that the emotion you may very well detect is contempt for you because you’re wearing Google Glass). Emotient’s initial target market for selling the Sentiment Analysis system was retailers, but the possibilities are obviously much broader. Affectiva, a spin-off from MIT’s Media Lab, also uses Ekman’s research to analyze facial expressions, selling its software to marketers and advertisers so they can conduct consumer research online using webcams. No need to get your research subjects into a focus group and guess what they’re thinking; just have them talk to you online and let their faces tell the story.

The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin

business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, discrete time, George Gilder, Google Earth, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, late capitalism, linked data, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

Examples would include entity extraction that automatically extracts metadata from text by searching for particular types of text and phrasing, such as person names, locations, dates, specialised terms and product terminology, and entity relation extraction that automatically identifies the relationships between semantic entities, linking them together (e.g., person name to birth date or location, or an opinion to an item) (McCreary 2009). A typical application of such techniques is sentiment analysis which seeks to determine the general nature and strength of opinions about an issue, for example, what people are saying about a product on social media. By using placemark metadata it is also possible to track where such sentiment is expressed (Graham et al. 2013) and to mine the dissemination of information within social media, for example, how widely Web addresses are favourited and shared between multiple users (Ohlhorst 2013).
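The combination of entity extraction, sentiment analysis, and placemark metadata can be sketched in a few lines. In this hypothetical example, a fixed list of product names stands in for entity extraction, two tiny word lists stand in for an opinion lexicon, and each post is assumed to arrive with a place name taken from its metadata; none of this reproduces the specific tools cited by McCreary (2009) or Graham et al. (2013).

```python
import re
from collections import defaultdict

# Hypothetical gazetteer and opinion lexicons, for illustration only.
PRODUCTS = {"phonex", "tabz"}
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"hate", "broken", "slow"}


def opinions_by_place(posts):
    """posts: iterable of (place, text), place taken from placemark metadata.

    Returns {(product, place): net opinion score}, where each post adds
    (positive words - negative words) to every product it mentions.
    """
    scores = defaultdict(int)
    for place, text in posts:
        tokens = re.findall(r"[a-z]+", text.lower())
        net = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
        # Entity extraction reduced to a gazetteer lookup.
        for product in PRODUCTS.intersection(tokens):
            scores[(product, place)] += net
    return dict(scores)


print(opinions_by_place([
    ("Dublin", "Love my PhoneX, so fast"),
    ("Dublin", "PhoneX screen broken again"),
    ("Cork", "Tabz is slow, hate it"),
]))
# {('phonex', 'Dublin'): 1, ('tabz', 'Cork'): -2}
```

Attributing a whole post's opinion to every product it names is a deliberate simplification; the entity relation extraction described in the passage (linking an opinion to a specific item) is what a real system would use instead.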

Index A/B testing 112 abduction 133, 137, 138–139, 148 accountability 34, 44, 49, 55, 63, 66, 113, 116, 165, 171, 180 address e-mail 42 IP 8, 167, 171 place 8, 32, 42, 45, 52, 93, 171 Web 105 administration 17, 30, 34, 40, 42, 56, 64, 67, 87, 89, 114–115, 116, 124, 174, 180, 182 aggregation 8, 14, 101, 140, 169, 171 algorithm 5, 9, 21, 45, 76, 77, 83, 85, 89, 101, 102, 103, 106, 109, 111, 112, 118, 119, 122, 125, 127, 130, 131, 134, 136, 142, 146, 154, 160, 172, 177, 179, 181, 187 Amazon 72, 96, 131, 134 Anderson, C. 130, 135 Andrejevic, M. 133, 167, 178 animation 106, 107 anonymity 57, 63, 79, 90, 92, 116, 167, 170, 171, 172, 178 apophenia 158, 159 Application Programming Interfaces (APIs) 57, 95, 152, 154 apps 34, 59, 62, 64, 65, 78, 86, 89, 90, 95, 97, 125, 151, 170, 174, 177 archive 21, 22, 24, 25, 29–41, 48, 68, 95, 151, 153, 185 archiving 23, 29–31, 64, 65, 141 artificial intelligence 101, 103 Acxiom 43, 44 astronomy 34, 41, 72, 97 ATM 92, 116 audio 74, 77, 83 automatic meter reading (AMR) 89 automatic number plate recognition (ANPR) 85, 89 automation 32, 51, 83, 85, 87, 89–90, 98, 99, 102, 103, 118, 127, 136, 141, 146, 180 Ayasdi 132, 134 backup 29, 31, 40, 64, 163 barcode 74, 85, 92, Bates, J. 56, 61, 62, 182 Batty, M. 90, 111, 112, 140 Berry, D. 134, 141 bias 13, 14, 19, 28, 45, 101, 134–136, 153, 154, 155, 160 Big Brother 126, 180 big data xv, xvi, xvii, 2, 6, 13, 16, 20, 21, 27–29, 42, 46, 67–183, 186, 187, 188, 190, 191, 192 analysis 100–112 characteristics 27–29, 67–79 enablers 80–87 epistemology 128–148 ethical issues 165–183 etymology 67 organisational issues 160–163 rationale 113–127 sources 87–99 technical issues 149–160 biological sciences 128–129, 137 biometric data 8, 84, 115 DNA 8, 71, 84 face 85, 88, 105 fingerprints 8, 9, 84, 87, 88, 115 gait 85, 88 iris 8, 84, 88 bit-rot 20 blog 6, 95, 170 Bonferroni principle 159 born digital 32, 46, 141 Bowker, G. 2, 19, 20, 22, 24 Borgman, C. 2, 7, 10, 20, 30, 37, 40, 41 boyd, D. 
68, 75, 151, 152, 156, 158, 160, 182 Brooks, D. 130, 145 business 1, 16, 42, 45, 56, 61, 62, 67, 79, 110, 113–127, 130, 137, 149, 152, 161, 166, 172, 173, 187 calculative practices 115–116 Campbell’s Law 63, 127 camera 6, 81, 83, 87, 88, 89, 90, 107, 116, 124, 167, 178, 180 capitalism 15, 16, 21, 59, 61, 62, 86, 95, 114, 119–123, 126, 136, 161, 184, 186 capta 2 categorization 6, 8, 12, 19, 20, 102, 106, 176 causation 130, 132, 135, 147 CCTV 87, 88, 180 census 17, 18, 19, 22, 24, 27, 30, 43, 54, 68, 74, 75, 76, 77, 87, 102, 115, 157, 176 Centro De Operações Prefeitura Do Rio 124–125, 182 CERN 72, 82 citizen science 97–99, 155 citizens xvi, 45, 57, 58, 61, 63, 71, 88, 114, 115, 116, 126, 127, 165, 166, 167, 174, 176, 179, 187 citizenship 55, 115, 170, 174 classification 6, 10, 11, 23, 28, 104, 105, 157, 176 clickstream 43, 92, 94, 120, 122, 154, 176 clustering 103, 104, 105, 106, 110, 122 Codd, E. 31 competitiveness xvi, 16, 114, computation 2, 4, 5, 6, 29, 32, 68, 80, 81–82, 83, 84, 86, 98, 100, 101, 102, 110, 129, 136, 139–147, 181 computational social science xiv, 139–147, 152, 186 computing cloud xv, 81, 86 distributed xv, 37, 78, 81, 83, 98 mobile xv, 44, 78, 80, 81, 83, 85, 139 pervasive 81, 83–84, 98, 124 ubiquitous 80, 81, 83–84, 98, 100, 124, 126 confidence level 14, 37, 133, 153, 160 confidentiality 8, 169, 175 control creep 126, 166, 178–179 cookies 92, 119, 171 copyright 16, 30, 40, 49, 51, 54, 96 correlation 105, 110, 130, 131, 132, 135, 145, 147, 157, 159 cost xv, 6, 11, 16, 27, 31, 32, 37, 38, 39, 40, 44, 52, 54, 57, 58, 59, 61, 66, 80, 81, 83, 85, 93, 96, 100, 116, 117, 118, 120, 127, 150 Crawford, K. 68, 75, 135, 151, 152, 155, 156, 158, 160, 182 credit cards 8, 13, 42, 44, 45, 85, 92, 167, 171, 176 risk 42, 63, 75, 120, 176, 177 crime 55, 115, 116, 123, 175, 179 crowdsourcing 37, 73, 93, 96–97, 155, 160 Cukier, K. 
68, 71, 72, 91, 114, 128, 153, 154, 161, 174 customer relationship management (CRM) 42, 99, 117–118, 120, 122, 176 cyber-infrastructure 33, 34, 35, 41, 186 dashboard 106, 107, 108 data accuracy 12, 14, 110, 153, 154, 171 administrative 84–85, 89, 115, 116, 125, 150, 178 aggregators see data brokers amplification 8, 76, 99, 102, 167 analogue 1, 3, 32, 83, 88, 140, 141 analytics 42, 43, 63, 73, 80, 100–112, 116, 118, 119, 120, 124, 125, 129, 132, 134, 137, 139, 140, 145, 146, 149, 151, 159, 160, 161, 176, 179, 186, 191 archive see archive assemblage xvi, xvii, 2, 17, 22, 24–26, 66, 80, 83, 99, 117, 135, 139, 183, 184–192 attribute 4, 8–9, 31, 115, 150 auditing 33, 40, 64, 163 authenticity 12, 153 automated see automation bias see bias big see big data binary 1, 4, 32, 69 biometric see biometric data body 177–178, 187 boosterism xvi, 67, 127, 187, 192 brokers 42–45, 46, 57, 74, 75, 167, 183, 186, 187, 188, 191 calibration 13, 20 catalogue 32, 33, 35 clean 12, 40, 64, 86, 100, 101, 102, 152, 153, 154, 156 clearing house 33 commodity xvi, 4, 10, 12, 15, 16, 41, 42–45, 56, 161 commons 16, 42 consolidators see data brokers cooked 20, 21 corruption 19, 30 curation 9, 29, 30, 34, 36, 57, 141 definition 1, 2–4 deluge xv, 28, 73, 79, 100, 112, 130, 147, 149–151, 157, 168, 175 derived 1, 2, 3, 6–7, 8, 31, 32, 37, 42, 43, 44, 45, 62, 86, 178 deserts xvi, 28, 80, 147, 149–151, 161 determinism 45, 135 digital 1, 15, 31, 32, 67, 69, 71, 77, 82, 85, 86, 90, 137 directories 33, 35 dirty 29, 154, 163 dive 64–65, 188 documentation 20, 30, 31, 40, 64, 163 dredging 135, 147, 158, 159 dump 64, 150, 163 dynamic see dynamic data enrichment 102 error 13, 14, 44, 45, 101, 110, 153, 154, 156, 169, 175, 180 etymology 2–3, 67 exhaust 6–7, 29, 80, 90 fidelity 34, 40, 55, 79, 152–156 fishing see data dredging formats xvi, 3, 5, 6, 9, 22, 25, 30, 33, 34, 40, 51, 52, 54, 65, 77, 102, 153, 156, 157, 174 framing 12–26, 133–136, 185–188 gamed 154 holding 33, 35, 64 infrastructure xv, xvi, xvii, 2, 
21–24, 25, 27–47, 52, 64, 102, 112, 113, 128, 129, 136, 140, 143, 147, 148, 149, 150, 156, 160, 161, 162, 163, 166, 184, 185, 186, 188, 189, 190, 191, 192 integration 42, 149, 156–157 integrity 12, 30, 33, 34, 37, 40, 51, 154, 157, 171 interaction 43, 72, 75, 85, 92–93, 94, 111, 167 interoperability 9, 23, 24, 34, 40, 52, 64, 66, 156–157, 163, 184 interval 5, 110 licensing see licensing lineage 9, 152–156 linked see linked data lost 5, 30, 31, 39, 56, 150 markets xvi, 8, 15, 25, 42–45, 56, 59, 75, 167, 178 materiality see materiality meta see metadata mining 5, 77, 101, 103, 104–106, 109, 110, 112, 129, 132, 138, 159, 188 minimisation 45, 171, 178, 180 nominal 5, 110 ordinal 5, 110 open see open data ontology 12, 28, 54, 150 operational 3 ownership 16, 40, 96, 156, 166 preparation 40, 41, 54, 101–102 philosophy of 1, 2, 14, 17–21, 22, 25, 128–148, 185–188 policy 14, 23, 30, 33, 34, 37, 40, 48, 64, 160, 163, 170, 172, 173, 178 portals 24, 33, 34, 35 primary 3, 7–8, 9, 50, 90 preservation 30, 31, 34, 36, 39, 40, 64, 163 protection 15, 16, 17, 20, 23, 28, 40, 45, 62, 63, 64, 167, 168–174, 175, 178, 188 protocols 23, 25, 30, 34, 37 provenance 9, 30, 40, 79, 153, 156, 179 qualitative 4–5, 6, 14, 146, 191 quantitative 4–5, 14, 109, 127, 136, 144, 145, 191 quality 12, 13, 14, 34, 37, 40, 45, 52, 55, 57, 58, 64, 79, 102, 149, 151, 152–156, 157, 158 raw 1, 2, 6, 9, 20, 86, 185 ratio 5, 110 real-time 65, 68, 71, 73, 76, 88, 89, 91, 99, 102, 106, 107, 116, 118, 121, 124, 125, 139, 151, 181 reduction 5, 101–102 representative 4, 8, 13, 19, 21, 28 relational 3, 8, 28, 44, 68, 74–76, 79, 84, 85, 87, 88, 99, 100, 119, 140, 156, 166, 167, 184 reliability 12, 13–14, 52, 135, 155 resellers see data brokers resolution 7, 26, 27, 28, 68, 72, 73–74, 79, 84, 85, 89, 92, 133–134, 139, 140, 150, 180 reuse 7, 27, 29, 30, 31, 32, 39, 40, 41, 42, 46, 48, 49–50, 52, 56, 59, 61, 64, 102, 113, 163 scaled xvi, xvii, 32, 100, 101, 112, 138, 149, 150, 163, 186 scarcity xv, xvi, 28, 80, 149–151, 161 
science xvi, 100–112, 130, 137–139, 148, 151, 158, 160–163, 164, 191 secondary 3, 7–8 security see security selection 101, 176 semi-structured 4, 5–6, 77, 100, 105 sensitive 15, 16, 45, 63, 64, 137, 151, 167, 168, 171, 173, 174 shadow 166–168, 177, 179, 180 sharing 9, 11, 20, 21, 23, 24, 27, 29–41, 48–66, 80, 82, 95, 113, 141, 151, 174, 186 small see small data social construction 19–24 spatial 17, 52, 63, 68, 73, 75, 84–85, 88–89 standards xvi, 9, 14, 19, 22, 23, 24, 25, 31, 33, 34, 38, 40, 52, 53, 64, 102, 153, 156, 157 storage see storage stranded 156 structures 4, 5–6, 12, 21, 23, 30, 31, 40, 51, 68, 77, 86, 103, 106, 156 structured 4, 5–6, 11, 32, 52, 68, 71, 75, 77, 79, 86, 88, 105, 112, 163 tertiary 7–8, 9, 27, 74 time-series 68, 102, 106, 110 transient 6–7, 72, 150 transactional 42, 43, 71, 72, 74, 75, 85, 92, 93–94, 120, 122, 131, 167, 175, 176, 177 uncertainty see uncertainty unstructured 4, 5–6, 32, 52, 68, 71, 75, 77, 86, 100, 105, 112, 140, 153, 157 validity 12, 40, 72, 102, 135, 138, 154, 156, 158 variety 26, 28, 43, 44, 46, 68, 77, 79, 86, 139, 140, 166, 184 velocity 26, 28, 29, 68, 76–77, 78, 79, 86, 88, 102, 106, 112, 
117, 140, 150, 153, 156, 184 veracity 13, 79, 102, 135, 152–156, 157, 163 volume 7, 26, 27, 28, 29, 32, 46, 67, 68, 69–72, 74, 76, 77, 78, 79, 86, 102, 106, 110, 125, 130, 135, 140, 141, 150, 156, 166, 184 volunteered 87, 93–98, 99, 155 databank 29, 34, 43 database NoSQL 6, 32, 77, 78, 86–87 relational 5, 6, 8, 32–33, 43, 74–75, 77, 78, 86, 100, 105 data-driven science 133, 137–139, 186 data-ism 130 datafication 181 dataveillance 15, 116, 126, 157, 166–168, 180, 181, 182, 184 decision tree 104, 111, 122, 159, deconstruction 24, 98, 126, 189–190 decontextualisation 22 deduction 132, 133, 134, 137, 138, 139, 148 deidentification 171, 172, 178 democracy 48, 55, 62, 63, 96, 117, 170 description 9, 101, 104, 109, 143, 147, 151, 190 designated community 30–31, 33, 46 digital devices 13, 25, 80, 81, 83, 84, 87, 90–91, 167, 174, 175 humanities xvi, 139–147, 152, 186 object identifier 8, 74 serendipity 134 discourse 15, 20, 55, 113–114, 117, 122, 127, 192 discursive regime 15, 20, 24, 56, 98, 113–114, 116, 123, 126, 127, 190 disruptive innovation xv, 68, 147, 184, 192 distributed computing xv, 37, 78, 81, 83, 98 sensors 124, 139, 160 storage 34, 37, 68, 78, 80, 81, 85–87, 97 division of labour 16 Dodge, M. 
2, 21, 68, 73, 74, 76, 83, 84, 85, 89, 90, 92, 93, 96, 113, 115, 116, 124, 154, 155, 167, 177, 178, 179, 180, 189 driver’s licence 45, 87, 171 drone 88, Dublin Core 9 dynamic data xv, xvi, 76–77, 86, 106, 112 pricing 16, 120, 123, 177 eBureau 43, 44 ecological fallacy 14, 102, 135, 149, 158–160 Economist, The 58, 67, 69, 70, 72, 128 efficiency 16, 38, 55, 56, 59, 66, 77, 93, 102, 111, 114, 116, 118, 119, 174, 176 e-mail 71, 72–73, 82, 85, 90, 93, 116, 174, 190 empiricism 129, 130–137, 141, 186 empowerment 61, 62–63, 93, 115, 126, 165 encryption 171, 175 Enlightenment 114 Enterprise Resource Planning (ERP) 99, 117, 120 entity extraction 105 epistemology 3, 12, 19, 73, 79, 112, 128–148, 149, 185, 186 Epsilon 43 ethics 12, 14–15, 16, 19, 26, 30, 31, 40, 41, 64, 73, 99, 128, 144, 151, 163, 165–183, 186 ethnography 78, 189, 190, 191 European Union 31, 38, 45, 49, 58, 59, 70, 157, 168, 173, 178 everyware 83 exhaustive 13, 27, 28, 68, 72–73, 79, 83, 88, 100, 110, 118, 133–134, 140, 150, 153, 166, 184 explanation 101, 109, 132, 133, 134, 137, 151 extensionality 67, 78, 140, 184 experiment 2, 3, 6, 34, 75, 78, 118, 129, 131, 137, 146, 150, 160 Facebook 6, 28, 43, 71, 72, 77, 78, 85, 94, 119, 154, 170 facts 3, 4, 9, 10, 52, 140, 159 Fair Information Practice Principles 170–171, 172 false positive 159 Federal Trade Commission (FTC) 45, 173 flexibility 27, 28, 68, 77–78, 79, 86, 140, 157, 184 Flickr 95, 170 Flightradar 107 Floridi, L. 3, 4, 9, 10, 11, 73, 112, 130, 151 Foucault, M. 16, 113, 114, 189 Fourth paradigm 129–139 Franks, B. 6, 111, 154 freedom of information 48 freemium service 60 funding 15, 28, 29, 31, 34, 37, 38, 40, 41, 46, 48, 52, 54–55, 56, 57–58, 59, 60, 61, 65, 67, 75, 119, 143, 189 geographic information systems 147 genealogy 98, 127, 189–190 Gitelman, L. 
2, 19, 20, 21, 22 Global Positioning System (GPS) 58, 59, 73, 85, 88, 90, 121, 154, 169 Google 32, 71, 73, 78, 86, 106, 109, 134, 170 governance 15, 21, 22, 23, 38, 40, 55, 63, 64, 66, 85, 87, 89, 117, 124, 126, 136, 168, 170, 178–182, 186, 187, 189 anticipatory 126, 166, 178–179 technocratic 126, 179–182 governmentality xvi, 15, 23, 25, 40, 87, 115, 127, 168, 185, 191 Gray, J. 129–130 Guardian, The 49 Gurstein, M. 52, 62, 63 hacking 45, 154, 174, 175 hackathon 64–65, 96, 97, 188, 191 Hadoop 87 hardware 32, 34, 40, 63, 78, 83, 84, 124, 143, 160 human resourcing 112, 160–163 hype cycle 67 hypothesis 129, 131, 132, 133, 137, 191 IBM 70, 123, 124, 143, 162, 182 identification 8, 44, 68, 73, 74, 77, 84–85, 87, 90, 92, 115, 169, 171, 172 ideology 4, 14, 25, 61, 113, 126, 128, 130, 134, 140, 144, 185, 190 immutable mobiles 22 independence 3, 19, 20, 24, 100 indexical 4, 8–9, 32, 44, 68, 73–74, 79, 81, 84–85, 88, 91, 98, 115, 150, 156, 167, 184 indicator 13, 62, 76, 102, 127 induction 133, 134, 137, 138, 148 information xvii, 1, 3, 4, 6, 9–12, 13, 23, 26, 31, 33, 42, 44, 45, 48, 53, 67, 70, 74, 75, 77, 92, 93, 94, 95, 96, 100, 101, 104, 105, 109, 110, 119, 125, 130, 138, 140, 151, 154, 158, 161, 168, 169, 171, 174, 175, 184, 192 amplification effect 76 freedom of 48 management 80, 100 overload xvi public sector 48 system 34, 65, 85, 117, 181 visualisation 109 information and communication technologies (ICTs) xvi, 37, 80, 83–84, 92, 93, 123, 124 Innocentive 96, 97 INSPIRE 157 instrumental rationality 181 internet 9, 32, 42, 49, 52, 53, 66, 70, 74, 80, 81, 82, 83, 86, 92, 94, 96, 116, 125, 167 of things xv, xvi, 71, 84, 92, 175 intellectual property rights xvi, 11, 12, 16, 25, 30, 31, 40, 41, 49, 50, 56, 62, 152, 166 Intelius 43, 44 intelligent transportation systems (ITS) 89, 124 interoperability 9, 23, 24, 34, 40, 52, 64, 66, 149, 156–157, 163, 184 interpellation 165, 180, 188 interviews 13, 15, 19, 78, 155, 190 Issenberg, S. 
75, 76, 78, 119 jurisdiction 17, 25, 51, 56, 57, 74, 114, 116 Kafka 180 knowledge xvii, 1, 3, 9–12, 19, 20, 22, 25, 48, 53, 55, 58, 63, 67, 93, 96, 110, 111, 118, 128, 130, 134, 136, 138, 142, 159, 160, 161, 162, 187, 192 contextual 48, 64, 132, 136–137, 143, 144, 187 discovery techniques 77, 138 driven science 139 economy 16, 38, 49 production of 16, 20, 21, 24, 26, 37, 41, 112, 117, 134, 137, 144, 184, 185 pyramid 9–10, 12, situated 16, 20, 28, 135, 137, 189 Latour, B. 22, 133 Lauriault, T.P. 15, 16, 17, 23, 24, 30, 31, 33, 37, 38, 40, 153 law of telecosm 82 legal issues xvi, 1, 23, 25, 30, 31, 115, 165–179, 182, 183, 187, 188 levels of measurement 4, 5 libraries 31, 32, 52, 71, 141, 142 licensing 14, 25, 40, 42, 48, 49, 51, 53, 57, 73, 96, 151 LIDAR 88, 89, 139 linked data xvii, 52–54, 66, 156 longitudinal study 13, 76, 140, 149, 150, 160 Lyon, D. 44, 74, 87, 167, 178, 180 machine learning 5, 6, 101, 102–104, 106, 111, 136, 188 readable 6, 52, 54, 81, 84–85, 90, 92, 98 vision 106 management 62, 88, 117–119, 120, 121, 124, 125, 131, 162, 181 Manovich, L. 141, 146, 152, 155 Manyika, J. 6, 16, 70, 71, 72, 104, 116, 118, 119, 120, 121, 122, 161 map 5, 22, 24, 34, 48, 54, 56, 73, 85, 88, 93, 96, 106, 107, 109, 115, 143, 144, 147, 154, 155–156, 157, 190 MapReduce 86, 87 marginal cost 11, 32, 57, 58, 59, 66, 151 marketing 8, 44, 58, 73, 117, 119, 120–123, 131, 176 marketisation 56, 61–62, 182 materiality 4, 19, 21, 24, 25, 66, 183, 185, 186, 189, 190 Mattern, S. 137, 181 Mayer-Schonberger, V. 68, 71, 72, 91, 114, 153, 154, 174 measurement 1, 3, 5, 6, 10, 12, 13, 15, 19, 23, 69, 97, 98, 115, 128, 166 metadata xvi, 1, 3, 4, 6, 8–9, 13, 22, 24, 29, 30, 31, 33, 35, 40, 43, 50, 54, 64, 71, 72, 74, 78, 85, 91, 93, 102, 105, 153, 155, 156 methodology 145, 158, 185 middleware 34 military intelligence 71, 116, 175 Miller, H.J. xvi, 27, 100, 101, 103, 104, 138, 139, 159 Minelli, M. 
101, 120, 137, 168, 170, 171, 172, 174, 176 mixed methods 147, 191 mobile apps 78 computing xv, 44, 78, 80, 81, 83, 85, 139 mapping 88 phones 76, 81, 83, 90, 93, 151, 168, 170, 175 storage 85 mode of production 16 model 7, 11, 12, 24, 32, 37, 44, 57, 72, 73, 101, 103, 105, 106, 109, 110–112, 119, 125, 129, 130, 131, 132, 133, 134, 137, 139, 140, 144, 145, 147, 158–159, 166, 181 agent-based model 111, business 30, 54, 57–60, 61, 95, 118, 119, 121 environmental 139, 166 meteorological 72 time-space 73 transportation 7 modernity 3 Moore’s Law 81, moral philosophy 14 Moretti, F. 141–142 museum 31, 32, 137 NASA 7 National Archives and Records Administration (NARA) 67 National Security Agency (NSA) 45, 116 natural language processing 104, 105 near-field communication 89, 91 neoliberalism 56, 61–62, 126, 182 neural networks 104, 105, 111 New Public Management 62, non-governmental organisations xvi, 43, 55, 56, 73, 117 non-excludable 11, 151 non-rivalrous 11, 57, 151 normality 100, 101 normative thinking 12, 15, 19, 66, 99, 127, 144, 182, 183, 187, 192 Obama, B. 
53, 75–76, 78, 118–119 objectivity 2, 17, 19, 20, 62, 135, 146, 185 observant participation 191 oligopticon 133, 167, 180 ontology 3, 12, 17–21, 22, 28, 54, 79, 128, 138, 150, 156, 177, 178, 184, 185 open data xv, xvi, xvii, 2, 12, 16, 21, 25, 48–66, 97, 114, 124, 128, 129, 140, 149, 151, 163, 164, 167, 186, 187, 188, 190, 191, 192 critique of 61–66 economics of 57–60 rationale 54–56 Open Definition 50 OpenGovData 50, 51 Open Knowledge Foundation 49, 52, 55, 58, 189, 190 open science 48, 72, 98 source 48, 56, 60, 87, 96 OpenStreetMap 73, 93, 96, 154, 155–156 optimisation 101, 104, 110–112, 120, 121, 122, 123 Ordnance Survey 54, 57 Organization for Economic Cooperation and Development (OECD) 49, 50, 59 overlearning 158, 159 panoptic 133, 167, 180 paradigm 112, 128–129, 130, 138, 147, 148, 186 participant observation 190, 191 participation 48, 49, 55, 66, 82, 94, 95, 96, 97–98, 126, 155, 165, 180 passport 8, 45, 84, 87, 88, 115 patent 13, 16, 41, 51 pattern recognition 101, 104–106, 134, 135 personally identifiable information 171 philanthropy 32, 38, 58 philosophy of science 112, 128–148, 185–188 phishing 174, 175 phone hacking 45 photography 6, 43, 71, 72, 74, 77, 86, 87, 88, 93, 94, 95, 105, 115, 116, 141, 155, 170 policing 80, 88, 116, 124, 125, 179 political economy xvi, 15–16, 25, 42–45, 182, 185, 188, 191 Pollock, R. 
49, 54, 56, 57 58, 59 positivism 129, 136–137, 140, 141, 144, 145, 147 post-positivism 140, 144, 147 positionality 135, 190 power/knowledge 16, 22 predictive modelling 4, 7, 12, 34, 44, 45, 76, 101, 103, 104, 110–112, 118, 119, 120, 125, 132, 140, 147, 168, 179 profiling 110–112, 175–178, 179, 180 prescription 101 pre-analytical 2, 3, 19, 20, 185 pre-analytics 101–102, 112 pre-factual 3, 4, 19, 185 PRISM 45, 116 privacy 15, 28, 30, 40, 45, 51, 57, 63, 64, 96, 117, 163, 165, 166, 168–174, 175, 178, 182, 187 privacy by design 45, 173, 174 probability 14, 110, 153, 158 productivity xvi, 16, 39, 55, 66, 92, 114, 118 profiling 12, 42–45, 74, 75, 110–112, 119, 166, 168, 175–178, 179, 180, 187 propriety rights 48, 49, 54, 57, 62 prosumption 93 public good 4, 12, 16, 42, 52, 56, 58, 79, 97 –private partnerships 56, 59 sector information (PSI) 12, 48, 54, 56, 59, 61, 62 quantified self 95 redlining 176, 182 reductionism 73, 136, 140, 142, 143, 145 regression 102, 104, 105, 110, 111, 122 regulation xvi, 15, 16, 23, 25, 40, 44, 46, 83, 85, 87, 89–90, 114, 115, 123, 124, 126, 168, 174, 178, 180, 181–182, 187, 192 research design 7, 13, 14, 77–78, 98, 137–138, 153, 158 Renaissance xvi, 129, 141 repository 29, 33, 34, 41 representativeness 13, 14, 19, 21 Resource Description Framework (RDF) 53, 54 remote sensing 73–74, 105 RFID 74, 85, 90, 91, 169 rhetorical 3, 4, 185 right to be forgotten 45, 172, 187 information (RTI) 48, 62 risk 16, 44, 58, 63, 118, 120, 123, 132, 158, 174, 176–177, 178, 179, 180 Rosenberg, D. 1, 3 Ruppert, E. 
22, 112, 157, 163, 187 sampling 13, 14, 27, 28, 46, 68, 72, 73, 77, 78, 88, 100, 101, 102, 120, 126, 133, 138, 139, 146, 149–150, 152, 153, 154, 156, 159 scale of economy 37 scanners 6, 25, 29, 32, 83, 85, 88, 89, 90, 91, 92, 175, 177, 180 science xvi, 1, 2, 3, 19, 20, 29, 31, 34, 37, 46, 65, 67, 71, 72, 73, 78, 79, 97, 98, 100, 101, 103, 111, 112, 128–139, 140, 147, 148, 150, 158, 161, 165, 166, 181, 184, 186 scientific method 129, 130, 133, 134, 136, 137–138, 140, 147, 148, 186 security data 28, 33, 34, 40, 45, 46, 51, 57, 126, 157, 166, 169, 171, 173, 174–175, 182, 187 national 42, 71, 88, 116–117, 172, 176, 178, 179 private 99, 115, 118, 151 social 8, 32, 45, 87, 115, 171 segmentation 104, 105, 110, 119, 120, 121, 122, 176 semantic information 9, 10, 11, 105, 157 Web 49, 52, 53, 66 sensors xv, 6, 7, 19, 20, 24, 25, 28, 34, 71, 76, 83, 84, 91–92, 95, 124, 139, 150, 160 sentiment analysis 105, 106, 121, Siegel, E. 103, 110, 111, 114, 120, 132, 158, 176, 179 signal 9, 151, 159 Silver, N. 136, 151, 158 simulation 4, 32, 37, 101, 104, 110–112, 119, 129, 133, 137, 139, 140 skills 37, 48, 52, 53, 57, 63, 94, 97, 98, 112, 149, 160–163, 164 small data 21, 27–47, 68, 72, 75, 76, 77, 79, 100, 103, 110, 112, 146, 147, 148, 150, 156, 160, 166, 184, 186, 188, 191 smart cards 90 cities 91, 92, 99, 124–125, 181–182 devices 83 metering 89, 123, 174 phones 81, 82, 83, 84, 90, 94, 107, 121, 155, 170, 174 SmartSantander 91 social computing xvi determinism 144 media xv, 13, 42, 43, 76, 78, 90, 93, 94–95, 96, 105, 119, 121, 140, 150, 151, 152, 154, 155, 160, 167, 176, 180 physics 144 security number 8, 32, 45, 87, 115, 171 sorting 126, 166, 168, 175–178, 182 sociotechnical systems 21–24, 47, 66, 183, 185, 188 software 6, 20, 32, 34, 40, 48, 53, 54, 56, 63, 80, 83, 84, 86, 88, 96, 132, 143, 160, 161, 163, 166, 170, 172, 175, 177, 180, 189 Solove, D. 
116, 120, 168, 169, 170, 172, 176, 178, 180 solutionism 181 sousveillance 95–96 spatial autocorrelation 146 data infrastructure 34, 35, 38 processes 136, 144 resolution 149 statistics 110 video 88 spatiality 17, 157 Star, S.L. 19, 20, 23, 24 stationarity 100 statistical agencies 8, 30, 34, 35, 115 geography 17, 74, 157 statistics 4, 8, 13, 14, 24, 48, 77, 100, 101, 102, 104, 105, 109–110, 111, 129, 132, 134, 135, 136, 140, 142, 143, 145, 147, 159 descriptive 4, 106, 109, 147 inferential 4, 110, 147 non-parametric 105, 110 parametric 105, 110 probablistic 110 radical 147 spatial 110 storage 31–32, 68, 72, 73, 78, 80, 85–87, 88, 100, 118, 161, 171 analogue 85, 86 digital 85–87 media 20, 86 store loyalty cards 42, 45, 165 Sunlight Foundation 49 supervised learning 103 Supply Chain Management (SCM) 74, 99, 117–118, 119, 120, 121 surveillance 15, 71, 80, 83, 87–90, 95, 115, 116, 117, 123, 124, 151, 165, 167, 168, 169, 180 survey 6, 17, 19, 22, 28, 42, 68, 75, 77, 87, 115, 120 sustainability 16, 33, 34, 57, 58, 59, 61, 64–66, 87, 114, 123–124, 126, 155 synchronicity 14, 95, 102 technological handshake 84, 153 lock-in 166, 179–182 temporality 17, 21, 27, 28, 32, 37, 68, 75, 111, 114, 157, 160, 186 terrorism 116, 165, 179 territory 16, 38, 74, 85, 167 Tesco 71, 120 Thrift, N. 
83, 113, 133, 167, 176 TopCoder 96 trading funds 54–55, 56, 57 transparency 19, 38, 44, 45, 48–49, 55, 61, 62, 63, 113, 115, 117, 118, 121, 126, 165, 173, 178, 180 trust 8, 30, 33, 34, 40, 44, 55, 84, 117, 152–156, 163, 175 trusted digital repository 33–34 Twitter 6, 71, 78, 94, 106, 107, 133, 143, 144, 146, 152, 154, 155, 170 uncertainty 10, 13, 14, 100, 102, 110, 156, 158 uneven development 16 Uniform Resource Identifiers (URIs) 53, 54 United Nations Development Programme (UNDP) 49 universalism 20, 23, 133, 140, 144, 154, 190 unsupervised learning 103 utility 1, 28, 53, 54, 55, 61, 63, 64–66, 100, 101, 114, 115, 134, 147, 163, 185 venture capital 25, 59 video 6, 43, 71, 74, 77, 83, 88, 90, 93, 94, 106, 141, 146, 170 visual analytics 106–109 visualisation 5, 10, 34, 77, 101, 102, 104, 106–109, 112, 125, 132, 141, 143 Walmart 28, 71, 99, 120 Web 2.0 81, 94–95 Weinberger, D. 9, 10, 11, 96, 97, 132, 133 White House 48 Wikipedia 93, 96, 106, 107, 143, 154, 155 Wired 69, 130 wisdom 9–12, 114, 161 XML 6, 53 Zikopoulos, P.C. 6, 16, 68, 70, 73, 76, 119, 151


pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel

Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, call centre, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, en.wikipedia.org, Erik Brynjolfsson, experimental subject, Google Glasses, happiness index / gross national happiness, job satisfaction, Johann Wolfgang von Goethe, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, risk-adjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, software as a service, speech recognition, statistical model, Steven Levy, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra

Eric Gilbert and Karrie Karahalios released the data, code, and models for this research: Eric Gilbert, “Update: Widespread Worry and the Stock Market,” Social.CS.UIUC.EDU, March 13, 2010. http://social.cs.uiuc.edu/people/gilbert/38. Predicting by social media: Sitaram Asur and Bernardo A. Huberman, “Predicting the Future with Social Media,” Cornell University Library, March 29, 2010, arXiv.org, arXiv:1003.5699. http://arxiv.org/abs/1003.5699/. Anshul Mittal and Arpit Goel, “Stock Prediction Using Twitter Sentiment Analysis,” Stanford University Libraries, December 16, 2011. http://cs229.stanford.edu/proj2011/GoelMittal-StockMarketPredictionUsingTwitterSentimentAnalysis.pdf. Allison Aubrey, “Happiness: It Really Is Contagious,” NPR News, All Things Considered, December 5, 2008. www.npr.org/templates/story/story.php?storyId=97831171. Shea Bennett, “Can Twitter Beat the Stock Market? Tweet Sentiment Trading API Bets That It Can,” Mediabistro, July 5, 2012. www.mediabistro.com/alltwitter/twitter-trading-api_b24992.


pages: 593 words: 118,995

Relevant Search: With Examples Using Elasticsearch and Solr by Doug Turnbull, John Berryman

crowdsourcing, domain-specific language, finite state, fudge factor, full text search, information retrieval, natural language processing, premature optimization, recommendation engine, sentiment analysis

Otherwise, users might not find a document because it contains a misspelling of the query term. Or they may find 20 duplicates of the same document, which would have the effect of pushing other relevant documents off the end of the search results page. Second, often the existing data can be post-processed to augment the features already there. For instance, machine-learning techniques can be used to classify or cluster documents. Or sentiment analysis can be used to determine whether the text in a document is more positive or negative in tone. The possibilities are endless. After this new metadata is attached to the documents, it can serve as a valuable feature for users to search upon. Finally, new information can be merged into the documents from external sources. For instance, in e-commerce the products being sold often come from external vendors.
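As a rough illustration of the enrichment step this excerpt describes, the Python sketch below attaches a derived `sentiment` field to each document before it would be indexed. The toy word lists, the `tone` and `enrich` helpers, and the field name are all hypothetical. The sketch does not call any Elasticsearch or Solr API; in practice the enriched documents would be passed to the engine's own indexing interface, and a trained classifier would likely replace the word lists.

```python
# Hypothetical toy lexicon; a real system would likely use a trained model.
POSITIVE = {"great", "love", "excellent", "reliable"}
NEGATIVE = {"broken", "poor", "terrible", "slow"}


def tone(text: str) -> str:
    """Label text positive, negative, or neutral by counting lexicon hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def enrich(docs: list[dict]) -> list[dict]:
    """Return copies of the docs with a derived 'sentiment' field added."""
    return [{**doc, "sentiment": tone(doc["body"])} for doc in docs]


docs = [
    {"id": 1, "body": "Great blender, excellent value."},
    {"id": 2, "body": "Arrived broken and support was slow."},
    {"id": 3, "body": "Ships in a cardboard box."},
]
enriched = enrich(docs)

# Once attached, the new field can be filtered or faceted on like other metadata.
positive_ids = [d["id"] for d in enriched if d["sentiment"] == "positive"]
print(positive_ids)  # [1]
```

Computing the label once at index time, rather than at query time, keeps searches cheap. The trade-off is that the stored label goes stale if the scoring method changes, in which case documents would need reindexing.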

relevance-blind enterprise, 2nd relevance-centered enterprise business and domain awareness content curation risk of miscommunication with content curator role of content curator feedback learning to rank paired relevance tuning test-driven relevance using with user behavioral data user-focused culture vs. data-driven culture relevance-focused search application deploying designing combine and balance signals combining and balancing signals defining and modeling signals user experience improving information and requirements gathering business needs required and available information users and information needs law of diminishing returns monitoring requests library reranking rescoring response page retail_analyzer filter retail_syn_filter filter retention reweighting boosts S salient features scale variable scorable units score boost, 2nd score shaping boosting additive, with Boolean queries multiplicative, with function queries, 2nd signals defined filtering Solr strategies for achieving users’ recency goals capturing general-quality metrics combining function queries high-value tiers scored with function queries ignoring TF × IDF modeling boosting signals ranking scored documents scoring tiers, 2nd script scoring, 2nd search content exploring providing to search engine searching document search and retrieval aggregations Boolean search facets filtering Lucene-based search positional and phrase matching ranked results relevance sorting documents inverted index data structure analysis enrichment extraction indexing search antipattern search completion choosing method for from documents being searched from user input via specialized search indexes search engineer search relevance collaboration and curation and defined difficulty of class of search and lack of single solution feedback and gaining skills of relevance engineer information retrieval research into systematic approach for improving search-as-you-type searchable data semantic expansion sentiment analysis 
sentinel tokens, 2nd sharding short-tail application SHOULD clause, 2nd, 3rd, 4th, 5th signal construction signal discordance, 2nd avoiding combining fields into custom all fields mechanics of solving with cross_fields search signal measuring signal modeling best_fields calibrating controlling field preference in results more-precise signals field synchronicity and most_fields, 2nd boosting in when additional matches don’t matter signals boosting, 2nd combining and balancing behavior of signal weights building queries for related signals combining subqueries tuning and testing overall search tuning relevance parameters concept defined defining and modeling implementing source data model silli token similarity simple constants SimpleText data structure, 2nd snippet highlighting Solr analyzers analysis and mapping features building custom field mappings boosting additive, with Boolean queries boosting feature mappings multiplicative, with function queries feedback faceted browsing field collapsing match phrase prefix relevance feedback feature mappings suggestion and highlighting components multifield search all fields cross_fields search ergonomics query differences between Solr and Elasticsearch query feature mappings term-centric and field-centric search with edismax query parser sorting source data model span queries specificity, modeling with paths with synonyms standard analyzer, 2nd, 3rd, 4th standard filter, 2nd standard tokenizer, 2nd, 3rd, 4th, 5th standard_clone analyzer stemming stop filter stop words, 2nd, 3rd stored fields storing metadata string types subdivided text subobjects subquadrants suggest clause suggest endpoint, 2nd suggestion field sum_other_doc_count synonyms augmenting content with modeling specificity with overview, 2nd T term dictionary, 2nd term filter term frequency.


pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem

Amazon Web Services, anti-pattern, bioinformatics, corporate governance, create, read, update, delete, data acquisition, en.wikipedia.org, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, web application

As in the social use case, making an effective recommendation depends on understanding the connections between things, as well as the quality and strength of those connections, all of which are best expressed as a property graph. Queries are primarily graph local, in that they start with one or more identifiable subjects, whether people or resources, and thereafter discover surrounding portions of the graph. Taken together, social networks and recommendation engines provide key differentiating capabilities in the areas of retail, recruitment, sentiment analysis, search, and knowledge management. Graphs are a good fit for the densely connected data structures germane to each of these areas; storing and querying this data using a graph database allows an application to give end users real-time results that reflect recent changes to the data, rather than pre-calculated, stale results.

Geo

Geospatial is the original graph use case: Euler solved the Seven Bridges of Königsberg problem by positing a mathematical theorem which later came to form the basis of graph theory.
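The "graph local" recommendation query described in this excerpt can be sketched in plain Python. The sketch assumes a tiny in-memory graph held in dictionaries as a stand-in for a graph database. It is not Neo4j or Cypher code, and the users, items, and `recommend` helper are hypothetical. It starts from one known node and looks only at that node's immediate neighbourhood.

```python
from collections import Counter

# Hypothetical property graph as adjacency lists: who follows whom, who bought what.
FOLLOWS = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": [],
}
BOUGHT = {
    "alice": {"tent"},
    "bob": {"tent", "stove"},
    "carol": {"stove", "lantern"},
}


def recommend(user: str, k: int = 2) -> list[str]:
    """Graph-local query: start at one node and explore only its neighbours.

    Items bought by people the user follows are ranked by how many of those
    neighbours bought them; items the user already owns are skipped.
    """
    owned = BOUGHT.get(user, set())
    counts = Counter()
    for friend in FOLLOWS.get(user, []):
        for item in BOUGHT.get(friend, set()):
            if item not in owned:
                counts[item] += 1
    return [item for item, _ in counts.most_common(k)]


print(recommend("alice"))  # ['stove', 'lantern']
```

The cost of such a query depends on the size of the starting node's neighbourhood rather than on the size of the whole graph. That is one reason this access pattern can return fresh results at request time instead of relying on precomputed ones.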


pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst

algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application

CHALLENGES REMAIN Locating the right talent to analyze data is the biggest hurdle in building a team. Such talent is in high demand, and the need for data analysts and data scientists continues to grow at an almost exponential rate. Finding this talent means that organizations will have to focus on data science and hire statistical modelers and text data–mining professionals as well as people who specialize in sentiment analysis. Success with Big Data analytics requires solid data models, statistical predictive models, and test analytic models, since these will be the core applications needed to do Big Data. Locating the appropriate talent takes more than just a typical IT job placement; the skills required for a good return on investment are not simple and are not solely technology oriented. Some organizations may turn to consulting firms to meet the need for talent; however, many consulting firms also have trouble finding the experts that can make Big Data pay off.


pages: 204 words: 58,565

Keeping Up With the Quants: Your Guide to Understanding and Using Analytics by Thomas H. Davenport, Jinho Kim

Black-Scholes formula, business intelligence, business process, call centre, computer age, correlation coefficient, correlation does not imply causation, Credit Default Swap, en.wikipedia.org, feminist movement, Florence Nightingale: pie chart, forensic accounting, global supply chain, Hans Rosling, hypertext link, invention of the telescope, inventory management, Jeff Bezos, margin call, Moneyball by Michael Lewis explains big data, Netflix Prize, p-value, performance metric, publish or perish, quantitative hedge fund, random walk, Renaissance Technologies, Robert Shiller, Robert Shiller, self-driving car, sentiment analysis, six sigma, Skype, statistical model, supply-chain management, text mining, the scientific method

More important is the fact that the map or polygon is fully structured and small in size, even though the original prints were not. While unstructured prints are an input to the process, the actual analysis to match them up doesn’t use the unstructured images, but rather structured information extracted from them. An example everyone will appreciate is the analysis of text. Let’s consider the now popular approach of social media sentiment analysis. Are tweets, Facebook postings, and other social comments directly analyzed to determine their sentiment? Not really. The text is parsed into words or phrases. Then, those words and phrases are flagged as good or bad. In a simple example, perhaps a “good” word gets a 1, a “bad” word gets a –1, and a “neutral” word gets a 0. The sentiment of the posting is determined by the sum of the individual word or phrase scores.
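The word-scoring scheme in this excerpt (good word = 1, bad word = –1, neutral word = 0, posting score = sum) fits in a few lines of Python. The scheme follows the text. The specific words in `LEXICON` and the regex tokenizer are assumptions made for this sketch; they are not a published lexicon.

```python
import re

# Illustrative word scores using the +1 / -1 / 0 scheme; the words are made up.
LEXICON = {
    "love": 1, "great": 1, "happy": 1,
    "hate": -1, "awful": -1, "late": -1,
}


def tokenize(text: str) -> list[str]:
    """Parse the text into lowercase words (a crude stand-in for real parsing)."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text: str) -> int:
    """Sum the per-word scores; words not in the lexicon count as neutral (0)."""
    return sum(LEXICON.get(word, 0) for word in tokenize(text))


print(sentiment("Love the new phone, great camera!"))  # 2
print(sentiment("Flight was late again. Awful."))      # -2
print(sentiment("Love the food, hate the wait."))      # 0
```

The last example shows a limit of pure summation: a post that is mixed comes out as neutral. Negation is also invisible to this method, so "not great" scores +1.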


pages: 237 words: 64,411

Humans Need Not Apply: A Guide to Wealth and Work in the Age of Artificial Intelligence by Jerry Kaplan

Affordable Care Act / Obamacare, Amazon Web Services, asset allocation, autonomous vehicles, bank run, bitcoin, Brian Krebs, buy low sell high, Capital in the Twenty-First Century by Thomas Piketty, combinatorial explosion, computer vision, corporate governance, crowdsourcing, en.wikipedia.org, Erik Brynjolfsson, estate planning, Flash crash, Gini coefficient, Goldman Sachs: Vampire Squid, haute couture, hiring and firing, income inequality, index card, industrial robot, invention of agriculture, Jaron Lanier, Jeff Bezos, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, Loebner Prize, Mark Zuckerberg, mortgage debt, natural language processing, Own Your Own Home, pattern recognition, Satoshi Nakamoto, school choice, Schrödinger's Cat, Second Machine Age, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Skype, software as a service, The Chicago School, Turing test, Watson beat the top human players on Jeopardy!, winner-take-all economy, women in the workforce, working poor, Works Progress Administration

Human traders endeavor to become expert in these matters, but no one comes close to the ability of a synthetic intellect to observe broad or subtle patterns. One of my favorite examples is that the number of prepaid cell phone cards purchased is an indicator of the size of certain crops in Africa, because the individual farmers, watching their crops grow, are preparing to contact potential buyers. The more optimistic they are, the more they spend on talk minutes. The latest foray in this arena uses what’s called “sentiment analysis.” Yes, that kind of sentiment: programs at investment banks scour the Internet for positive or negative comments about products and companies, then trade on the information. The typical justification proffered for doing all this is that HFT programs are providing a service to society. They are simply cleaning up inefficiencies in the markets. But this whitewashes a darker truth. Yes, they make the financial markets nice and tidy, but they obscure a deeper cost.
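The excerpt does not say how investment banks turn scraped comments into trades. The Python sketch below shows only one simple possibility: average the per-comment sentiment for each company and compare the average with fixed thresholds. It assumes scores in [-1, 1] have already been computed for each comment. The thresholds, tickers, and `signal` helper are hypothetical, and this is a toy illustration, not any firm's method.

```python
from statistics import mean

# Hypothetical thresholds; a real system would calibrate these and model costs.
BUY_THRESHOLD = 0.2
SELL_THRESHOLD = -0.2


def signal(scores: list[float]) -> str:
    """Turn per-comment sentiment scores in [-1, 1] into a toy trading signal."""
    if not scores:
        return "hold"  # no chatter, no view
    avg = mean(scores)
    if avg >= BUY_THRESHOLD:
        return "buy"
    if avg <= SELL_THRESHOLD:
        return "sell"
    return "hold"


# Hypothetical scores for three made-up tickers.
chatter = {
    "AAA": [0.8, 0.4, 0.1],
    "BBB": [-0.6, -0.3, 0.2],
    "CCC": [0.1, -0.1],
}
for ticker, scores in chatter.items():
    print(ticker, signal(scores))
```

The "hold" band between the two thresholds keeps weak or noisy sentiment from triggering trades. It also illustrates the risk raised elsewhere in this collection: a burst of false but confident chatter could push the average past a threshold.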


pages: 268 words: 75,850

The Formula: How Algorithms Solve All Our Problems-And Create More by Luke Dormehl

3D printing, algorithmic trading, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, big data - Walmart - Pop Tarts, call centre, Cass Sunstein, Clayton Christensen, computer age, death of newspapers, deferred acceptance, Edward Lorenz: Chaos theory, Erik Brynjolfsson, Filter Bubble, Flash crash, Florence Nightingale: pie chart, Frank Levy and Richard Murnane: The New Division of Labor, Google Earth, Google Glasses, High speed trading, Internet Archive, Isaac Newton, Jaron Lanier, Jeff Bezos, job automation, Kevin Kelly, Kodak vs Instagram, Marshall McLuhan, means of production, Nate Silver, natural language processing, Netflix Prize, pattern recognition, price discrimination, recommendation engine, Richard Thaler, Rosa Parks, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Slavoj Žižek, social graph, speech recognition, Steve Jobs, Steven Levy, Steven Pinker, Stewart Brand, the scientific method, The Signal and the Noise by Nate Silver, upwardly mobile, Wall-E, Watson beat the top human players on Jeopardy!, Y Combinator

Gild additionally looks at where individuals spend time online, since this has been shown to be a strong predictor of workplace skills. “If you spend a lot of time blogging it suggests that you’re not quite as good a programmer as someone who spends their time on Quora,” Ming says, referring to the question-and-answer website founded by two former Facebook employees. Even Twitter feeds are mined for their insights, using semantic and sentiment analysis. At the end, factors are combined to give prospective employees a “Gild Score” out of 100. “It’s very cool if you’re geeky about algorithms, but the really important take-away is that what we end up with is truly independent dimensions for describing people out in the world,” she says. “We’re talking about algorithms whose entire intent and purpose is to aggregate across your entire life to build up a very accurate representation of who you are.”

Data Mining: Concepts and Techniques: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, web application

., parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database). This is followed by deriving patterns within the structured data, and evaluation and interpretation of the output. “High quality” in text mining usually refers to a combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity-relation modeling (i.e., learning relations between named entities). Other examples include multilingual data mining, multidimensional text analysis, contextual text mining, and trust and evolution analysis in text data, as well as text mining applications in security, biomedical literature analysis, online media analysis, and analytical customer relationship management.

Cormack [BCC10]; Manning, Raghavan, and Schutze [MRS08]; Grossman and Frieder [GR04]; Baeza-Yates and Ribeiro-Neto [BYRN11]; Zhai [Zha08]; Feldman and Sanger [FS06]; Berry [Ber03]; and Weiss, Indurkhya, Zhang, and Damerau [WIZD04]. Text mining is a fast-developing field with numerous papers published in recent years, covering many topics such as topic models (e.g., Blei and Lafferty [BL09]); sentiment analysis (e.g., Pang and Lee [PL07]); and contextual text mining (e.g., Mei and Zhai [MZ06]). Web mining is another focused theme, with books like Chakrabarti [Cha03a], Liu [Liu06] and Berry [Ber03]. Web mining has substantially improved search engines with a few influential milestone works, such as Brin and Page [BP98]; Kleinberg [Kle99]; Chakrabarti, Dom, Kumar, et al. [CDK+99]; and Kleinberg and Tomkins [KT99].

., SPOOK: A system for probabilistic object-oriented knowledge representation, In: Proc. 15th Annual Conf. Uncertainty in Artificial Intelligence (UAI’99) Stockholm, Sweden. (1999), pp. 541–550. [PKZT01] Papadias, D.; Kalnis, P.; Zhang, J.; Tao, Y., Efficient OLAP operations in spatial data warehouses, In: Proc. 2001 Int. Symp. Spatial and Temporal Databases (SSTD’01) Redondo Beach, CA. (July 2001), pp. 443–459. [PL07] Pang, B.; Lee, L., Opinion mining and sentiment analysis, Foundations and Trends in Information Retrieval 2 (2007) 1–135. [Pla98] Platt, J.C., Fast training of support vector machines using sequential minimal optimization, In: (Editors: Schölkopf, B.; Burges, C.J.C.; Smola, A.) Advances in Kernel Methods—Support Vector Learning (1998) MIT Press, Cambridge, MA, pp. 185–208. [PP07] Patcha, A.; Park, J.-M., An overview of anomaly detection techniques: Existing solutions and latest technological trends, Computer Networks 51 (12) (2007) 3448–3470.


pages: 368 words: 96,825

Bold: How to Go Big, Create Wealth and Impact the World by Peter H. Diamandis, Steven Kotler

3D printing, additive manufacturing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, cloud computing, crowdsourcing, Daniel Kahneman / Amos Tversky, dematerialisation, deskilling, Elon Musk, en.wikipedia.org, Exxon Valdez, fear of failure, Firefox, Galaxy Zoo, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, industrial robot, Internet of things, Jeff Bezos, John Harrison: Longitude, Jono Bacon, Just-in-time delivery, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, loss aversion, Louis Pasteur, Mahatma Gandhi, Mark Zuckerberg, Mars Rover, meta analysis, meta-analysis, microbiome, minimum viable product, move fast and break things, Narrative Science, Netflix Prize, Network effects, Oculus Rift, optical character recognition, packet switching, PageRank, pattern recognition, performance metric, Peter H. Diamandis: Planetary Resources, Peter Thiel, pre–internet, Ray Kurzweil, recommendation engine, Richard Feynman, Richard Feynman, ride hailing / ride sharing, risk tolerance, rolodex, self-driving car, sentiment analysis, shareholder value, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart grid, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, telepresence, telepresence robot, Turing test, urban renewal, web application, X Prize, Y Combinator

By offering $0.05 per categorization, I got the entire 65 years’ worth of issues, roughly 3,000 in total, done for under $200. I used Amazon’s site Mechanical Turk (www.mturk.com) to get those magazine covers analyzed. While MTURK isn’t all that useful for more complicated jobs, it is where to go to get simple, quick tasks done fast. Aggregation and classification jobs tend to be popular uses. Aggregate photographs of red trucks, for example, or write product descriptions, or perform sentiment analysis exercises on thousands of Tweets. Requesters (you) post tasks known as HITs (human intelligence tasks) while workers (called providers) browse among existing tasks and complete them for a monetary payment.16 Another microtask site that I’ve previously relied upon (and with great result) is Fiverr (www.fiverr.com), an online marketplace offering microtasks starting at $5. Typical services include voiceovers, animations, crafts, promotional videos, and art.


pages: 322 words: 84,752

Pax Technica: How the Internet of Things May Set Us Free or Lock Us Up by Philip N. Howard

Affordable Care Act / Obamacare, Berlin Wall, bitcoin, blood diamonds, Bretton Woods, Brian Krebs, British Empire, call centre, Chelsea Manning, citizen journalism, clean water, cloud computing, corporate social responsibility, crowdsourcing, Edward Snowden, en.wikipedia.org, failed state, Fall of the Berlin Wall, feminist movement, Filter Bubble, Firefox, Francis Fukuyama: the end of history, Google Earth, Howard Rheingold, income inequality, informal economy, Internet of things, Julian Assange, Kibera, Kickstarter, land reform, M-Pesa, Marshall McLuhan, megacity, Mikhail Gorbachev, mobile money, Mohammed Bouazizi, national security letter, Network effects, obamacare, Occupy movement, packet switching, pension reform, prediction markets, sentiment analysis, Silicon Valley, Skype, spectrum auction, statistical model, Stuxnet, trade route, uranium enrichment, WikiLeaks, zero day

Liu Yazhou, political commissar of the University of National Defense, published an article in the People’s Liberation Army Daily arguing that today’s internet has become the main battlefield for ideological struggle. “Entering the new century,” he wrote recently, “whoever controls the internet, especially micro-blog resources, will have the right to control opinions.”44 The Party is aware that political conversations over social media have real-world consequences and can provide a metric of public opinion. Senior officials get exclusive access to social media sentiment analysis through the Party’s media research team. One Chinese pollster blames a 10 percent drop in confidence in the Party on the rapid spread of microblogs.45 When moderates and ideologues are given equal access to digital media, people tend to use social media to marginalize extremism, hate speech, and radical ideas. In part, this is because digital networks are ultimately social networks. On a personal level, we often don’t like experiencing “socialization” because it can mean embarrassing correctives to our bad behavior.


pages: 298 words: 43,745

Understanding Sponsored Search: Core Elements of Keyword Advertising by Jim Jansen

AltaVista, barriers to entry, Black Swan, bounce rate, business intelligence, butterfly effect, call centre, Claude Shannon: information theory, complexity theory, correlation does not imply causation, en.wikipedia.org, first-price auction, information retrieval, inventory management, life extension, linear programming, megacity, Nash equilibrium, Network effects, PageRank, place-making, price mechanism, psychological pricing, random walk, Schrödinger's Cat, sealed-bid auction, search engine result page, second-price auction, second-price sealed-bid, sentiment analysis, social web, software as a service, stochastic process, telemarketer, the market place, The Present Situation in Quantum Mechanics, the scientific method, The Wisdom of Crowds, Vickrey auction, yield management

Sponsored-search analytics. With the increased use of check-in and mobile apps, one would expect to see geo-location-based metrics that measure the increase in foot traffic to brick-and-mortar stores driven by sponsored-search advertisements, much as click-to-call metrics do now. Certainly, given the increased availability of consumer data, the future will hold sponsored-search metrics beyond impressions, clicks, and conversions. For example, the increasingly social aspects of Web sites, such as reviews and consumer comments, will likely lead to sentiment-analysis metrics that measure the tone of consumer comments about a brand or ad. Such data could eventually affect how quality scores are calculated. Already, sponsored-search platforms are offering searchers and consumers the ability to rate ads, so integration of reviews from other sites cannot be far behind. With the spread of tracking devices and of Web access through many devices such as mobile phones, televisions, and navigation systems, advertisers will have simpler ways to measure the combined reach of television, Web, radio, and mobile advertising in an integrated marketing communication (IMC) approach.
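To make the idea of a tone metric for consumer comments concrete, here is a minimal sketch, assuming a tiny hypothetical word-score lexicon (TONE_LEXICON) and simple averaging. It is not how any ad platform actually computes sentiment or quality score; the function names are illustrative only.

```python
import re
from statistics import mean

# Hypothetical tone lexicon; production systems use far larger lexicons or trained models.
TONE_LEXICON = {
    "great": 1.0, "love": 1.0, "helpful": 0.5,
    "slow": -0.5, "broken": -1.0, "scam": -1.0,
}

def comment_tone(comment: str) -> float:
    """Average lexicon score of the known words in one comment (0.0 if none match)."""
    words = re.findall(r"[a-z']+", comment.lower())
    scores = [TONE_LEXICON[w] for w in words if w in TONE_LEXICON]
    return mean(scores) if scores else 0.0

def ad_tone_metric(comments: list[str]) -> float:
    """Mean tone across all comments attached to one brand or ad (0.0 if no comments)."""
    if not comments:
        return 0.0
    return mean(comment_tone(c) for c in comments)

if __name__ == "__main__":
    comments = ["Love this shop, great prices", "Checkout was slow", "Helpful staff"]
    print(round(ad_tone_metric(comments), 3))
```

For the three sample comments the per-comment tones are 1.0, -0.5, and 0.5, so the ad-level metric is their mean, about 0.333. How a platform might weight such a number inside a quality score is left open here, as it is in the text.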


pages: 752 words: 131,533

Python for Data Analysis by Wes McKinney

backtesting, cognitive dissonance, crowdsourcing, Debian, Firefox, Google Chrome, index card, random walk, recommendation engine, revision control, sentiment analysis, Sharpe ratio, side project, sorting algorithm, statistical model, type inference

This includes most kinds of data commonly stored in relational databases or tab- or comma-delimited text files.
- Multiple tables of data interrelated by key columns (what would be primary or foreign keys for a SQL user).
- Evenly or unevenly spaced time series.

This is by no means a complete list. Even though it may not always be obvious, a large percentage of data sets can be transformed into a structured form that is more suitable for analysis and modeling. If not, it may be possible to extract features from a data set into a structured form. As an example, a collection of news articles could be processed into a word frequency table, which could then be used to perform sentiment analysis. Most users of spreadsheet programs like Microsoft Excel, perhaps the most widely used data analysis tool in the world, will not be strangers to these kinds of data.

Why Python for Data Analysis?

For many people (myself among them), the Python language is easy to fall in love with. Since its first appearance in 1991, Python has become one of the most popular dynamic programming languages, along with Perl, Ruby, and others.
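The news-articles example in this excerpt (unstructured text turned into a word frequency table, then used for sentiment analysis) can be sketched with the standard library alone. This is a minimal illustration, not the book's code: the POSITIVE and NEGATIVE word sets are hypothetical, and the score (positive minus negative counts, divided by total words) is one simple choice among many.

```python
import re
from collections import Counter

# Hypothetical word lists; a real analysis would use a curated lexicon or a trained model.
POSITIVE = {"gain", "growth", "strong", "win"}
NEGATIVE = {"loss", "decline", "weak", "crisis"}

def word_frequencies(text: str) -> Counter:
    """Turn unstructured article text into a word -> count table."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def frequency_table(articles: dict[str, str]) -> dict[str, Counter]:
    """One word-frequency table per article, keyed by article name."""
    return {name: word_frequencies(body) for name, body in articles.items()}

def sentiment_score(freqs: Counter) -> float:
    """(positive count - negative count) / total words, so the result lies in [-1, 1]."""
    total = sum(freqs.values())
    if total == 0:
        return 0.0
    pos = sum(freqs[w] for w in POSITIVE)  # Counter returns 0 for missing words
    neg = sum(freqs[w] for w in NEGATIVE)
    return (pos - neg) / total

if __name__ == "__main__":
    table = frequency_table({
        "a": "Strong growth and a big win",
        "b": "Crisis deepens as loss mounts",
    })
    for name, freqs in table.items():
        print(name, sentiment_score(freqs))
```

Article "a" has six words, three of them positive, giving 0.5; article "b" has five words, two of them negative, giving -0.4. With pandas, the same per-article counts could be laid out as a word-by-article DataFrame, which is the kind of structured form the excerpt describes.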


pages: 525 words: 116,295

The New Digital Age: Transforming Nations, Businesses, and Our Lives by Eric Schmidt, Jared Cohen

3D printing, access to a mobile phone, additive manufacturing, airport security, Amazon Mechanical Turk, Amazon Web Services, anti-communist, augmented reality, Ayatollah Khomeini, barriers to entry, bitcoin, borderless world, call centre, Chelsea Manning, citizen journalism, clean water, cloud computing, crowdsourcing, data acquisition, Dean Kamen, Elon Musk, failed state, fear of failure, Filter Bubble, Google Earth, Google Glasses, hive mind, income inequality, information trail, invention of the printing press, job automation, Julian Assange, Khan Academy, Kickstarter, knowledge economy, Law of Accelerating Returns, market fundamentalism, means of production, mobile money, mutually assured destruction, Naomi Klein, offshore financial centre, peer-to-peer lending, personalized medicine, Peter Singer: altruism, Ray Kurzweil, RFID, self-driving car, sentiment analysis, Silicon Valley, Skype, Snapchat, social graph, speech recognition, Steve Jobs, Steven Pinker, Stewart Brand, Stuxnet, The Wisdom of Crowds, upwardly mobile, Whole Earth Catalog, WikiLeaks, young professional, zero day

Typically, governments put restrictions on the gateway routers that connect the country and on DNS (domain name system) servers. This allows them to either block a website altogether (e.g., YouTube in Iran) or process web content through “deep-packet inspection.” With deep-packet inspection, special software allows the router to look inside the packets of data that pass through it and check for forbidden words, among other things (the use of sentiment-analysis software to screen out negative statements about politicians, for example), which it can then block. Neither technique is foolproof; users can access blocked sites with circumvention technologies like proxy servers (which trick the routers) or by using secure https encryption protocols (which enable private Internet communication that, at least in theory, cannot be read by anyone other than your computer and the website you are accessing), and deep-packet inspection rarely catches every instance of banned content.
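The claim that deep-packet inspection "rarely catches every instance of banned content" can be illustrated with a toy per-packet keyword check. This sketch covers only the forbidden-words part of the description, not sentiment screening, and it assumes a hypothetical BANNED list and stateless matching of each payload on its own. Real inspection systems may reassemble streams, and encrypted (https) payloads would not expose readable keywords at all.

```python
# Hypothetical list of forbidden terms, stored as lowercase bytes.
BANNED = [b"forbidden", b"banned"]

def inspect_packet(payload: bytes) -> bool:
    """Return True if this single payload, examined on its own, contains a banned term."""
    lowered = payload.lower()
    return any(term in lowered for term in BANNED)

def filter_stream(packets: list[bytes]) -> list[bytes]:
    """Drop packets that match; pass the rest through unchanged."""
    return [p for p in packets if not inspect_packet(p)]

if __name__ == "__main__":
    packets = [b"hello", b"this is FORBIDDEN text", b"forbid", b"den topic"]
    passed = filter_stream(packets)
    print(passed)
    # The term split across two packets slips through per-packet matching.
    print(b"forbidden" in b"".join(passed))
```

The second packet is dropped, but "forbid" and "den topic" each pass the check individually, and once reassembled they spell out the banned word. That gap is one simple reason keyword-based inspection misses content.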