linked data

51 results


Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data by Leslie Sikos

AGPL, Amazon Web Services, bioinformatics, business process, cloud computing, create, read, update, delete, Debian, en.wikipedia.org, fault tolerance, Firefox, Google Chrome, Google Earth, information retrieval, Infrastructure as a Service, Internet of things, linked data, natural language processing, openstreetmap, optical character recognition, platform as a service, search engine result page, semantic web, Silicon Valley, social graph, software as a service, SPARQL, text mining, Watson beat the top human players on Jeopardy!, web application, wikimedia commons

For example, tabular data in HTML with RDFa annotation using URIs and semantic properties is five-star data, offering maximum reusability and machine-interpretability. The expression of rights provided by licensing makes free data reuse possible. Linked Data without an explicit open license (e.g., a public domain license) cannot be reused freely, but the quality of Linked Data is independent of licensing. When the specified criteria are met, all five ratings can apply both to Linked Data (Linked Data without an explicit open license) and to Linked Open Data (Linked Data with an explicit open license). As a consequence, the five-star rating system can be depicted so that the criteria can be read either with or without the open license.
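As a rough sketch of what such five-star tabular markup might look like, here is an HTML table annotated with RDFa using the schema.org vocabulary (the URIs and cell values are illustrative, not taken from the book):

<table vocab="http://schema.org/" typeof="Book" resource="http://example.org/book/structured-data">
  <tr>
    <td property="name">Mastering Structured Data on the Semantic Web</td>
    <td property="author" typeof="Person">
      <span property="name">Leslie Sikos</span>
    </td>
  </tr>
</table>

Because every cell is annotated with a URI-backed vocabulary term, a generic RDFa parser can extract machine-interpretable triples from what is otherwise ordinary HTML.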

More and more universities provide information about staff members, departments, facilities, courses, grants, and publications as Linked Data and RDF dumps, such as the University of Florida (http://vivo.ufl.edu) and Ghent University (http://data.mmlab.be/mmlab). Libraries such as the Princeton University Library (http://findingaids.princeton.edu) publish bibliographic information as Linked Data. Part of the National Digital Data Archive of Hungary is available as Linked Data at http://lod.sztaki.hu. Even Project Gutenberg is available as Linked Data (http://wifo5-03.informatik.uni-mannheim.de/gutendata/). Museums such as the British Museum publish some of their records as Linked Data (http://collection.britishmuseum.org).

Twitter Card Annotation in the Markup

<meta name="twitter:card" content="summary" />
<meta name="twitter:site" content="@lesliesikos" />
<meta name="twitter:creator" content="@lesliesikos" />
<meta property="og:url" content="http://www.lesliesikos.com/linked-data-platform-1-0-standardized/" />
<meta property="og:title" content="Linked Data Platform 1.0 Standardized" />
<meta property="og:description" content="The Linked Data Platform 1.0 is now a W3C Recommendation, covering a set of rules for HTTP operations on Web resources, including RDF-based Linked Data, to provide an architecture for read-write Linked Data on the Semantic Web." />
<meta property="og:image" content="http://www.lesliesikos.com/img/LOD.svg" />

IBM Watson

IBM Watson's DeepQA system is a question-answering system originally designed to compete with contestants of the Jeopardy!


pages: 315 words: 70,044

Learning SPARQL by Bob Ducharme

database schema, Donald Knuth, en.wikipedia.org, G4S, linked data, semantic web, SPARQL, web application

For example, simply knowing that “spouse” is a symmetric term made it possible to find out the identity of Cindy’s spouse, even though this fact was not part of the dataset. Linked Data The idea of Linked Data is newer than that of the semantic web, but sometimes it’s easier to think of the semantic web as building on the ideas behind Linked Data. Linked Data is not a specification, but a set of best practices for providing a data infrastructure that makes it easier to share data across the web. You can then use semantic web technologies such as RDFS, OWL, and SPARQL to build applications around that data. Tim Berners-Lee came up with these four principles of Linked Data in 2006 (I’ve bolded his wording and added my own commentary): Use URIs as names for things.
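A minimal Turtle sketch of how that inference works; the names and namespace are invented for illustration and are not the book's actual dataset:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/> .

ex:spouse a owl:SymmetricProperty .   # "spouse" is declared symmetric
ex:richard ex:spouse ex:cindy .       # the only spouse fact stated in the data

# An OWL-aware reasoner can now infer the unstated triple:
# ex:cindy ex:spouse ex:richard .

A plain query for Cindy's spouse finds nothing in the raw data; with the symmetric-property declaration, a reasoner supplies the missing triple.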



pages: 511 words: 111,423

Learning SPARQL by Bob Ducharme

Donald Knuth, en.wikipedia.org, G4S, hypertext link, linked data, place-making, semantic web, SPARQL, web application

We’ll learn more about RDFS and OWL in Chapter 9. Linked Data The idea of Linked Data is newer than that of the semantic web, but sometimes it’s easier to think of the semantic web as building on the ideas behind Linked Data. Linked Data is not a specification, but a set of best practices for providing a data infrastructure that makes it easier to share data across the Web. You can then use semantic web technologies such as RDFS, OWL, and SPARQL to build applications around that data. Tim Berners-Lee came up with these four principles of Linked Data in 2006 (I’ve bolded his wording and added my own commentary): Use URIs as names for things.
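The first principle can be seen in action by dereferencing such a URI and using content negotiation to ask for RDF instead of HTML. A sketch with curl, using DBpedia only as a familiar public example:

$ curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/Tim_Berners-Lee

The -L flag matters because many Linked Data servers answer with a 303 redirect, pointing the client from the URI of the thing itself to a document describing it.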


This means that a good understanding of the role of URIs gives you greater control over your queries.

Note: The URIs that identify RDF resources are like the unique ID fields of relational database tables, except that they're universally unique, which lets you link data from different sources around the world instead of just linking data from different tables in the same database.

The Resource Description Framework (RDF)

In Chapter 1, we learned the following about the Resource Description Framework: It's a data model in which the basic unit of information is known as a triple. A triple consists of a subject, a predicate, and an object.
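In Turtle notation, a triple is simply those three parts followed by a period; a small invented example:

@prefix ex: <http://example.org/> .

# subject            predicate        object
ex:LearningSPARQL    ex:author        "Bob DuCharme" .
ex:LearningSPARQL    ex:publishedBy   ex:OReilly .     # the object may itself be a resource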


The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin

Bayesian statistics, business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, discrete time, disruptive innovation, George Gilder, Google Earth, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, late capitalism, lifelogging, linked data, longitudinal study, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

Conclusion

At one level, the case for open and linked data is commonsensical – open data create transparency and accountability; participation, choice and social innovation; efficiency, productivity and enhanced governance; economic innovation and wealth creation. Linked data convert information across the Internet into a semantic web from which data can be machine-read and linked together. Open and linked data thus hold much promise and value as a venture. However, the case for open and linked data is more complex, and their economic underpinnings are not at all straightforward. Open and linked data might seem to have marginal costs, but their production and the technical and institutional apparatus needed to facilitate and maintain them have real costs in terms of labour, equipment, and resources.

When documents are published in this way, information on the Internet can be rendered and repackaged as data and can be linked in an infinite number of ways depending on purpose. However, as P. Miller (2010) notes, ‘linked data may be open, and open data may be linked, but it is equally possible for linked data to carry licensing or other restrictions that prevent it being considered open’, or for open data to be made available in ways that do not easily enable linking. In general, any linked documents that are not on an intranet or behind a pay wall are also open in nature. For Berners-Lee (2009), open and linked data should ideally be synonymous and he sets out five levels of such data, each with progressively more utility and value (see Table 3.3).

Since the late 2000s the movement has noticeably gained prominence and traction, initially with the Guardian newspaper’s campaign in the UK to ‘Free Our Data’ (www.theguardian.com/technology/free-our-data), the Organization for Economic Cooperation and Development (OECD)’s call for member governments to open up their data in 2008, the launch in 2009 by the US government of data.gov, a website designed to provide access to non-sensitive and historical datasets held by US state and federal agencies, and the development of linked data and the promotion of the ‘Semantic Web’ as a standard element of future Internet technologies, in which open and linked data are often discursively conjoined (Berners-Lee 2009). Since 2010 dozens of countries and international organisations (e.g., the European Union [EU] and the United Nations Development Programme [UNDP]) have followed suit, making thousands of previously restricted datasets open in nature for non-commercial and commercial use (see DataRemixed 2013).


pages: 369 words: 80,355

Too Big to Know: Rethinking Knowledge Now That the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room by David Weinberger

airport security, Alfred Russel Wallace, Amazon Mechanical Turk, Berlin Wall, Black Swan, book scanning, Cass Sunstein, commoditize, corporate social responsibility, crowdsourcing, Danny Hillis, David Brooks, Debian, double entry bookkeeping, double helix, en.wikipedia.org, Exxon Valdez, Fall of the Berlin Wall, future of journalism, Galaxy Zoo, Hacker Ethic, Haight Ashbury, hive mind, Howard Rheingold, invention of the telegraph, jimmy wales, Johannes Kepler, John Harrison: Longitude, Kevin Kelly, linked data, Netflix Prize, New Journalism, Nicholas Carr, Norbert Wiener, openstreetmap, P = NP, Pluto: dwarf planet, profit motive, Ralph Waldo Emerson, RAND corporation, Ray Kurzweil, Republic of Letters, RFID, Richard Feynman, Ronald Reagan, semantic web, slashdot, social graph, Steven Pinker, Stewart Brand, technological singularity, Ted Nelson, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas Malthus, Whole Earth Catalog, X Prize

The rise of Linked Data encapsulates the transformation of knowledge we have explored throughout this book. While the original Semantic Web emphasized building ontologies that are “knowledge representations” of the world, it turns out that if we go straight to unleashing an abundance of linked but imperfect data, making it widely and openly available in standardized form, the Net becomes a dramatically improved infrastructure for knowledge. Linked Data is nevertheless itself only an example of a more expansive practice: Create metadata so your information can be reused. Linked Data is usable because it points beyond itself to information about the information.

For example, when an article in the journal Public Library of Science Medicine examines “the predictors of live birth” in in vitro fertilization by analyzing 144,018 attempts, it links to the UK open government site where the source data—“the world’s oldest and most comprehensive database of fertility treatment in the UK”—is available. The new default is: If you’re going to cite the data, you might as well link to it. Networked facts point to where they came from and, sometimes, where they lead to. Indeed, a new standard called Linked Data is making it easier to make the facts presented in one site useful to other sites in unanticipated ways—enabling an ad hoc worldwide data commons. Key to Linked Data is the ability for a computer program not only to get the fact but to ask the resource for a link to more information about the context of the fact. Facts have become networked because our new information infrastructure happens also to be a hyperlinked publishing system.

We used to need trust because paper-based publishing breaks knowledge off from its source. Now, however, science—which has always had a network of inter-cited publications—occurs within a network of links. We create these links by hand, computers prowl the Web suggesting new links, and the surge of interest in the Linked Data format is making it easier than ever to create clouds of linked data just waiting for new uses. In this hyperlinked environment, we will continue to tell science’s stories, but those stories will be embedded within a system of connections. We will click to see the data. We will click to have our computers compare disparate datasets, surfacing the anomalies and disagreements that will never be entirely driven out from the data of science or from its stories.


RDF Database Systems: Triples Storage and SPARQL Query Processing by Olivier Cure, Guillaume Blin

Amazon Web Services, bioinformatics, business intelligence, cloud computing, database schema, fault tolerance, full text search, functional programming, information retrieval, Internet Archive, Internet of things, linked data, NP-complete, peer-to-peer, performance metric, random walk, recommendation engine, RFID, semantic web, Silicon Valley, social intelligence, software as a service, SPARQL, web application


The main advantages of JSON are its simplicity, flexibility (it’s schemaless), and native processing support for most Web applications due to a tight integration with the JavaScript programming language. But RDF is not without assets. For example, as a semi-structured data model, RDF data sets can be described with expressive schema languages, such as RDF Schema (RDFS) or the Web Ontology Language (OWL), and can be linked to other documents present on the Web, forming the Linked Data movement. With the emergence of Linked Data, a pattern for hyperlinking machine-readable data sets that extensively uses RDF, URIs, and HTTP, we can expect that more and more data will be directly produced in or transformed into RDF. As of 2013, the linked open data (LOD), a set of RDF data produced from open data sources, was considered to contain over 50 billion triples on domains as diverse as medicine, culture, and science, just to name a few.
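A short Turtle sketch of that linking idea, with a local record pointing into another dataset (the local URIs are invented; the DBpedia URI is real but used purely as an example):

@prefix ex:   <http://example.org/drugs/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:aspirin rdfs:label "Aspirin" ;
    owl:sameAs <http://dbpedia.org/resource/Aspirin> .   # link out into the LOD cloud

The owl:sameAs link is what stitches an isolated data set into the larger web of data.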

The FedBench Benchmark (http://fedbench.fluidops.net/) uses several data sets (around 10, among which are DBpedia subsets, New York Times, LinkedMDB, and Drugbank) covering cross-domain and life-science data (news, movies, music, drugs, etc.). The major aim of FedBench is to test the efficiency and effectiveness of federated query processing. Other benchmarks, such as the Linked Data Integration Benchmark (LODIB) or JustBench, are designed to evaluate other properties of related systems, such as handling linked data with real-world heterogeneities or the OWL capabilities of reasoners.

3.8 BUILDING SEMANTIC WEB APPLICATIONS

Jena (http://jena.apache.org/) is an open-source Semantic Web framework for Java and is widely used in the Java community.


pages: 245 words: 68,420

Content Everywhere: Strategy and Structure for Future-Ready Content by Sara Wachter-Boettcher

crowdsourcing, John Gruber, Kickstarter, linked data, search engine result page, semantic web, Silicon Valley

But a more semantic Web seems closer than ever with the recent advent of linked data, which is made possible through structured content and markup. Coined by Tim Berners-Lee—yes, the guy who invented the World Wide Web—in 2006, linked data means exactly what it sounds like: bits of information that are linked to other, equivalent sets of data elsewhere on the Internet (often referred to as “in the cloud”), as illustrated in Figure 6.1. The idea is that, as opposed to HTML links, which link one document (e.g., a page) to another, linked data connects the things those pages are about by connecting the actual data behind those two pages instead.

This gives both databases access to the information in the other, and that information then becomes more useful to both people and machines.

FIGURE 6.1 Linked data connects content from different places, like between your website and Wikipedia, based on shared content attributes—and it’s getting more and more useful for connecting content across sources.

For example, consider The New York Times. Since the 19th century, it’s been maintaining a tremendous index of people, organizations, places, and descriptors in the news. Starting in 1913, it began publishing that data first in a quarterly index, and later an annual one. Now that its collection has been digitized, the Times has opened it up as linked data at http://data.nytimes.com, making this extensive list of topics—well over 10,000 as of this writing, with plans to continually add more—accessible to anyone who wants it.

And all these pages of content are built automatically, using the content’s underlying structure to dictate what’s contextually relevant where. Finally, remember our introduction to linked data in Chapter 6, “Understanding Markup”? Well, the BBC is making use of that, too. Rather than, say, hiring writers to craft overviews of every animal the BBC has video footage about, the organization relies on content from other sources, accessible via linked data. That is, by structuring content along the same lines as sources like Wikipedia, the BBC can automatically pull in the content it doesn’t have—and isn’t invested enough in to create—from an external source.
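A hedged sketch of the kind of SPARQL query that supports this pattern, asking DBpedia (the Linked Data version of Wikipedia) for a short description of a topic; the specific resource and property are illustrative:

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Lion> dbo:abstract ?abstract .
  FILTER (langMatches(lang(?abstract), "en"))
}
LIMIT 1

Run against a public endpoint such as http://dbpedia.org/sparql, a query along these lines returns prose that a site like the BBC’s could display alongside its own footage.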


pages: 223 words: 52,808

Intertwingled: The Work and Influence of Ted Nelson (History of Computing) by Douglas R. Dechow

3D printing, Apple II, Bill Duvall, Brewster Kahle, Buckminster Fuller, Claude Shannon: information theory, cognitive dissonance, computer age, conceptual framework, Douglas Engelbart, Douglas Engelbart, Dynabook, Edward Snowden, game design, HyperCard, hypertext link, Ian Bogost, information retrieval, Internet Archive, Jaron Lanier, knowledge worker, linked data, Marc Andreessen, Marshall McLuhan, Menlo Park, Mother of all demos, pre–internet, RAND corporation, semantic web, Silicon Valley, software studies, Steve Jobs, Steve Wozniak, Stewart Brand, Ted Nelson, the medium is the message, Vannevar Bush, Wall-E, Whole Earth Catalog

So for me this really was a seminal conference with so many truly groundbreaking ideas emerging at the same time, apparently orthogonal to each other but actually all the same thing, as time has confirmed, since the Google Knowledge Graph is the Semantic Web or ZigZag by another name. It’s all about linking data. This is a much quieter revolution than that initiated by the document Web, but it will be much more far reaching. Linked data will become an integral part of the development of data-driven systems architectures that will revolutionize the way we build and maintain information management systems. Linked data architectures will supersede relational databases, make websites easier to build and unify the worlds of hypertext, document management, and databases to create rich interlinked knowledge-based systems as envisaged by pioneers such as Ted and Doug over 50 years ago.

But the linked data revolution was very slow to take off—largely because it’s hard to explain the key concepts to people and what the benefits are. In 2004, it seemed to have completely stalled. Analyzing why this was the case is a much longer story than I have time to tell here, but as a by-product of doing this analysis at the time, Tim, Nigel Shadbolt, Danny Weitzner, and I started to look back at the factors that made the web of linked documents take off in order to try and understand why the web of linked data wasn’t. We realized that to understand the ecosystem that is the Web we have to take a socio-technical approach.



The Art of SEO by Eric Enge, Stephan Spencer, Jessie Stricchiola, Rand Fishkin

AltaVista, barriers to entry, bounce rate, Build a better mousetrap, business intelligence, cloud computing, dark matter, en.wikipedia.org, Firefox, Google Chrome, Google Earth, hypertext link, index card, information retrieval, Internet Archive, Law of Accelerating Returns, linked data, mass immigration, Metcalfe’s law, Network effects, optical character recognition, PageRank, performance metric, risk tolerance, search engine result page, self-driving car, sentiment analysis, social web, sorting algorithm, speech recognition, Steven Levy, text mining, web application, wikimedia commons

Figure 10-51 and Figure 10-52 depict some example graphs showing the rate of new external links (and in the last two instances, pages) created over time, with some speculation as to what the trends might indicate.

Figure 10-51. Interpreting new external link data
Figure 10-52. More link data speculation

These assumptions do not necessarily hold true for every site or instance, but the graphs make it easy to see how the engines can use temporal link and content growth information to make guesses about the relevance or worthiness of a particular site. Figure 10-53 shows some guesstimates of a few real sites and how these trends have affected them.

Figure 10-53. Wikipedia link data guesstimates

As you can see in Figure 10-53, Wikipedia has had tremendous growth in both pages and links from 2007 through 2011.

Google and Bing Webmaster Tools

As mentioned earlier, other valuable sources of data include Google Webmaster Tools and Bing Webmaster Tools. We cover these extensively in Using Search Engine–Supplied SEO Tools. From a planning perspective, you will want to get these tools in place as soon as possible. Both tools provide valuable insight into how the search engines see your site. This includes things such as external link data, internal link data, crawl errors, high-volume search terms, and much, much more.

Note: Some companies will not want to set up these tools because they do not want to share their data with the search engines, but this is a nonissue, as the tools do not provide the search engines with any more data about your website; rather, they let you see some of the data the search engines already have.

This plug-in provides basic link data on the fly with just a couple of mouse clicks. Figure 10-23 shows the menu you’ll see with regard to backlinks. Notice also in the figure that the SearchStatus plug-in offers an option for highlighting NoFollow links, as well as many other capabilities. It is a great tool that allows you to pull numbers such as these much more quickly than would otherwise be possible.

Figure 10-23. Firefox SearchStatus plug-in

Third-party link-measuring tools

Here is a look at some of the better-known advanced third-party tools for gathering link data.

Open Site Explorer

Open Site Explorer was developed based on crawl data obtained by SEOmoz, plus data from a variety of parties engaged by SEOmoz.


Beautiful Visualization by Julie Steele

barriers to entry, correlation does not imply causation, data acquisition, database schema, Drosophila, en.wikipedia.org, epigenetics, global pandemic, Hans Rosling, index card, information retrieval, iterative process, linked data, Mercator projection, meta-analysis, natural language processing, Netflix Prize, pattern recognition, peer-to-peer, performance metric, QR code, recommendation engine, semantic web, social graph, sorting algorithm, Steve Jobs, web application, wikimedia commons, Yochai Benkler

However, choosing an effective presentation is challenging, as not all information visualizations are created equally. Not all information visualizations highlight the patterns, gaps, and outliers important to analysts’ tasks, and furthermore, not all information visualizations “force us to notice what we never expected to see” (Tukey 1977). A growing trend in data analysis is to make sense of linked data as networks. Rather than looking solely at attributes of data, network analysts also focus on the connections between data and the resulting structures. My research focuses on understanding these networks because they are topical, emergent, and inherently challenging for analysts. Networks are difficult to visualize and navigate, and, most problematically, it is difficult to find task-relevant patterns.

If we’re starting from a graph representation of the database, as defined in Figure 14-2, this is a simple task. All we need is a nodeset and an edgeset, which can be easily produced from a relational set of tables; it might even come for free if the database is available in the form of an RDF dump (Freebase 2009) or as Linked Data (Bizer, Heath, and Berners-Lee 2009). From there, we can easily produce a node-link diagram using a graph drawing program such as Cytoscape (Shannon et al. 2003)—an open source application that has its roots in the biological networks scientific community. The resulting diagram, shown in Figure 14-3, depicts the given data model in a similar way as a regular Entity-Relationship (E-R) data structure diagram (Chen 1976), enriched with some quantitative information about the actual data.
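A minimal Python sketch of the same workflow using the networkx library, with an invented nodeset and edgeset standing in for what would come out of the database:

import networkx as nx
import matplotlib.pyplot as plt

# A tiny nodeset and edgeset, as they might be dumped from a relational store
nodes = ["Person", "Document", "Location"]
edges = [("Person", "Document"), ("Document", "Location")]

G = nx.Graph()
G.add_nodes_from(nodes)
G.add_edges_from(edges)

nx.draw(G, with_labels=True)   # render a simple node-link diagram
plt.show()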

The CENSUS data model as a weighted node-link diagram

The heterogeneity of node and link type frequency evidenced in Figure 14-3 is not restricted to our example. It is observable in many datasets, including research databases (Schich and Ebert-Schifferer 2009), large bibliographies (Schich et al. 2009), Freebase, and the Linked Data cloud, regardless of whether the number of types is predefined or expandable by the curators. In all cases that I have seen so far, both the number of nodes per node type and the number of links per link type exhibit right-skewed diminishing distributions, which are widely known as long tails (Anderson 2006, Newman 2005), and lack a shared average as found in a normal Gaussian distribution.


pages: 100 words: 15,500

Getting Started with D3 by Mike Dewar

Firefox, Google Chrome, linked data

First, we lay out the circles and edges:

var width = 1500,
    height = 1500;

var svg = d3.select("body")
    .append("svg")
    .attr("width", width)
    .attr("height", height);

var node = svg.selectAll("circle.node")
    .data(data.nodes)
    .enter()
    .append("circle")
    .attr("class", "node")
    .attr("r", 12);

var link = svg.selectAll("line.link")
    .data(data.links)
    .enter().append("line")
    .style("stroke", "black");

This populates the web page with the appropriate elements; we just need to lay them out. The force layout applies a force-directed algorithm to decide the position of each node. Here, each node feels a repulsive force from every other node, but is constrained by the edges that keep nodes connected together.

This can result in an organic layout that looks wonderfully inviting as it unfolds. D3 makes it easy; first we instantiate the algorithm:

var force = d3.layout.force()
    .charge(-120)
    .linkDistance(30)
    .size([width, height])
    .nodes(data.nodes)
    .links(data.links)
    .start();

These methods are all custom methods for the algorithm that detail the various parameters and references the algorithm needs to compute how the position of the nodes and edges should change. We then use it to modify the appropriate attributes of our lines and circles:

force.on("tick", function() {
    link.attr("x1", function(d) { return d.source.x; })
        .attr("y1", function(d) { return d.source.y; })
        .attr("x2", function(d) { return d.target.x; })
        .attr("y2", function(d) { return d.target.y; });
    node.attr("cx", function(d) { return d.x; })
        .attr("cy", function(d) { return d.y; });
});

The layout algorithm generates a tick event, which corresponds to a single step of the layout algorithm.


pages: 713 words: 93,944

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Eric Redmond, Jim Wilson, Jim R. Wilson

AGPL, Amazon Web Services, create, read, update, delete, data is the new oil, database schema, Debian, domain-specific language, en.wikipedia.org, fault tolerance, full text search, general-purpose programming language, Kickstarter, linked data, MVC pattern, natural language processing, node package manager, random walk, recommendation engine, Ruby on Rails, Skype, social graph, web application

For example, if the text of the article on Star Wars contains the string "[[Yoda|jedi master]]", we want to store that relationship twice—once as an outgoing link from Star Wars and once as an incoming link to Yoda. Storing the relationship twice means that it’s fast to look up both a page’s outgoing links and its incoming links. To store this additional link data, we’ll create a new table. Head over to the shell and enter this:

hbase> create 'links', {
  NAME => 'to', VERSIONS => 1, BLOOMFILTER => 'ROWCOL'
},{
  NAME => 'from', VERSIONS => 1, BLOOMFILTER => 'ROWCOL'
}

In principle, we could have chosen to shove the link data into an existing column family or merely added one or more additional column families to the wiki table, rather than create a new one. Creating a separate table has the advantage that the tables have separate regions.

The real strength of graph databases is traversing through the nodes by following relationships. In Chapter 7, Neo4J, we discuss the most popular graph database today, Neo4J.

Neo4J

One operation where other databases often fall flat is crawling through self-referential or otherwise intricately linked data. This is exactly where Neo4J shines. The benefit of using a graph database is the ability to quickly traverse nodes and relationships to find relevant data. Often found in social networking applications, graph databases are gaining traction for their flexibility, with Neo4j as a pinnacle implementation.
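To make “traversing through the nodes by following relationships” concrete, here is a sketch in Cypher, Neo4j’s query language (the labels and data are invented, and this is not necessarily the notation the book itself uses):

MATCH (me:Person {name: "Alice"})-[:FRIEND]->()-[:FRIEND]->(foaf)
WHERE foaf <> me
RETURN DISTINCT foaf.name

The query walks two FRIEND relationships outward from one node, the friend-of-a-friend pattern that relational joins handle far less gracefully.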

$ curl -X PUT http://localhost:8091/riak/cages/2 \
  -H "Content-Type: application/json" \
  -H "Link: </riak/animals/ace>; riaktag=\"contains\", </riak/cages/1>; riaktag=\"next_to\"" \
  -d '{"room" : 101}'

What makes Links special in Riak is link walking (and a more powerful variant, linked mapreduce queries, which we investigate tomorrow). Getting the linked data is achieved by appending a link spec to the URL that is structured like this: /_,_,_. The underscores (_) in the URL represent wildcards to each of the link criteria: bucket, tag, keep. We’ll explain those terms shortly. First let’s retrieve all links from cage 1.

$ curl http://localhost:8091/riak/cages/1/_,_,_
--4PYi9DW8iJK5aCvQQrrP7mh7jZs
Content-Type: multipart/mixed; boundary=Av1fawIA4WjypRlz5gHJtrRqklD

--Av1fawIA4WjypRlz5gHJtrRqklD
X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fvrde/U5gymRMY+VwZw35gRfFgA=
Location: /riak/animals/polly
Content-Type: application/json
Link: </riak/animals>; rel="up"
Etag: VD0ZAfOTsIHsgG5PM3YZW
Last-Modified: Tue, 13 Dec 2011 17:53:59 GMT

{"nickname" : "Sweet Polly Purebred", "breed" : "Purebred"}
--Av1fawIA4WjypRlz5gHJtrRqklD--

--4PYi9DW8iJK5aCvQQrrP7mh7jZs--

It returns a multipart/mixed dump of headers plus bodies of all linked keys/values.


Cataloging the World: Paul Otlet and the Birth of the Information Age by Alex Wright

1960s counterculture, Ada Lovelace, barriers to entry, British Empire, business climate, business intelligence, Cape to Cairo, card file, centralized clearinghouse, corporate governance, crowdsourcing, Danny Hillis, Deng Xiaoping, don't be evil, Douglas Engelbart, Douglas Engelbart, Electric Kool-Aid Acid Test, European colonialism, Frederick Winslow Taylor, hive mind, Howard Rheingold, index card, information retrieval, invention of movable type, invention of the printing press, Jane Jacobs, John Markoff, Kevin Kelly, knowledge worker, Law of Accelerating Returns, linked data, Livingstone, I presume, lone genius, Menlo Park, Mother of all demos, Norman Mailer, out of africa, packet switching, profit motive, RAND corporation, Ray Kurzweil, Scramble for Africa, self-driving car, semantic web, Silicon Valley, speech recognition, Steve Jobs, Stewart Brand, Ted Nelson, The Death and Life of Great American Cities, the scientific method, Thomas L Friedman, urban planning, Vannevar Bush, Whole Earth Catalog

One year after writing that essay, he established a company called MetaWeb that created Freebase, which he characterized as an “open, shared database of the world’s knowledge.” In 2010, he sold the company to Google, where its structured snippets now often complement traditional keyword-based search results. In recent years, the Linked Data movement has to some extent subsumed the Semantic Web initiative. Linked Data proposes more of a middle ground, in which ontologies might be derived programmatically from analyzing large data sets, rather than manually created by teams of experts. This middle way approach might incorporate some of Otlet’s ideas: a topical structure further refined by automated discovery, bidirectional linking, and the ability to extract content from static documents, then synthesize and interpolate it in new ways. In a widely circulated 2005 essay, “Ontology Is Overrated,” Clay Shirky argues that projects like the Semantic Web were doomed to failure in the Internet age.



pages: 430 words: 68,225

Blockchain Basics: A Non-Technical Introduction in 25 Steps by Daniel Drescher

bitcoin, blockchain, business process, central bank independence, collaborative editing, cryptocurrency, disintermediation, disruptive innovation, distributed ledger, Ethereum, ethereum blockchain, fiat currency, job automation, linked data, peer-to-peer, place-making, Satoshi Nakamoto, smart contracts, transaction costs

Since broken hash references serve as evidence that data were changed after the reference was created, the whole construct stores data in a change-sensitive manner.

How It Works

There are two classical patterns of using hash references in order to store data in a change-sensitive manner:

• The chain
• The tree

The Chain

A chain of linked data, also called a linked list, is formed when each piece of data also contains a hash reference to another piece of data. Such a structure is useful for storing and linking together data that are not fully available at one given point in time but instead arrive step by step in an ongoing fashion. Figure 11-4 illustrates this idea by using the symbols introduced above. The creation of such a chain starts with the piece of data labeled Data 1 and the creation of the hash reference R1.
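A minimal Python sketch of such a chain (illustrative only, not the book’s own code):

import hashlib

def hash_ref(data: bytes) -> str:
    # A hash reference is just a fingerprint of the data it points to
    return hashlib.sha256(data).hexdigest()

chain = []
ref = ""                               # the first piece of data points at nothing
for payload in ["Data 1", "Data 2", "Data 3"]:
    block = (payload + ref).encode()   # each piece embeds the previous reference
    chain.append(block)
    ref = hash_ref(block)              # reference to embed in the next piece

# Altering any earlier block changes its hash, breaking every later
# reference: exactly the change-sensitivity described above.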

Architecture and its underlying concepts

Consensus Logic

Since all the nodes of the distributed system maintain their history of transaction data independently, their content can differ due to delays or other adversities of passing messages through a network. As a result, the data store that was meant to form a straight line of linked data blocks actually forms a tree-shaped data structure, where each branch represents a conflicting version of the transaction history. The consensus logic as depicted in Figure 21-6 makes all nodes of the system eventually consistent by making them choose the identical version of the transaction history that unites the most collective effort.


pages: 193 words: 19,478

Memory Machines: The Evolution of Hypertext by Belinda Barnet

augmented reality, Benoit Mandelbrot, Bill Duvall, British Empire, Buckminster Fuller, Claude Shannon: information theory, collateralized debt obligation, computer age, conceptual framework, Douglas Engelbart, Douglas Engelbart, game design, hiring and firing, Howard Rheingold, HyperCard, hypertext link, Ian Bogost, information retrieval, Internet Archive, John Markoff, linked data, mandelbrot fractal, Marshall McLuhan, Menlo Park, nonsequential writing, Norbert Wiener, publish or perish, Robert Metcalfe, semantic web, Steve Jobs, Stewart Brand, technoutopianism, Ted Nelson, the scientific method, Vannevar Bush, wikimedia commons

They would later have a profound influence over hypertext theory and criticism, and also the Storyspace system. From the outset, the nodes in Storyspace were called ‘writing spaces’, and it worked explicitly with topographic metaphors, incorporating a graphic ‘map view’ of the link data structure from the first version, along with a tree and an outline view (which are also visual representations of the data). ‘The tree’, Bolter tells us in Turing’s Man, ‘is a remarkably useful way of representing logical relations in spatial terms’ (Bolter 1984, 86). Also in line with the topographic metaphor, writing spaces in Storyspace acted (and still act) as containers for other writing spaces; an author literally ‘builds’ the space as she traverses it, zooming in and out to view details of the work, the map making the territory.

‘You’d tab a text and then you’d be able to associate notes with any particular word or phrase in the text […] an automated version of classical texts with notes’ (Bolter 2011). It wasn’t clickable because the IBM PC wasn’t clickable at the time; the user would move the cursor over the word and select it. This link data structure formed the basis for their future experiments ‘only in the sense that it had this quality of one text leading to another’ (Bolter 2011). In his well-researched chapter on afternoon, Matthew Kirschenbaum suggests that Storyspace has ‘significant grounding in a hierarchical data model’ (Kirschenbaum 2008, 173) that has its origins in the tree structures of ‘interactive fictions of the Adventure type’ (Kirschenbaum 2008, 175).

Guard fields are a powerful device, and one that Joyce deploys to full effect in afternoon. According to the Markle Report, Joyce ‘agitated’ for them to be included in the design of Storyspace from the outset, and Bolter quickly obliged in their fledgling program: It was just a matter of putting a field into the link data structure that would contain the guard, and then just checking that field […] against what the user did before they were allowed to follow the link […] It was [that] idea you know and it was Michael’s. (Bolter 2011) Guard fields, along with the topographic ‘spatial’ writing style, have remained integral to the Storyspace program for 30 years hence.
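A tiny Python sketch of the guard-field idea (entirely illustrative; it only mirrors the description above, not Storyspace’s actual code):

from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class Link:
    target: str
    guard: Optional[str] = None   # a node the reader must already have visited

def can_follow(link: Link, visited: Set[str]) -> bool:
    # Check the guard field against what the user did before allowing the link
    return link.guard is None or link.guard in visited

link = Link(target="poetry", guard="winter")
print(can_follow(link, {"begin"}))             # False: the guard blocks the link
print(can_follow(link, {"begin", "winter"}))   # True: the condition is satisfied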


pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem

Amazon Web Services, anti-pattern, bioinformatics, commoditize, corporate governance, create, read, update, delete, data acquisition, en.wikipedia.org, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, the strength of weak ties, web application

Triple stores typically provide SPARQL capabilities to reason about stored RDF data. RDF—the lingua franca of triple stores and the Semantic Web—can be serialized several ways. The following RDF/XML encoding of a simple three-node graph shows how triples come together to form linked data.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.example.org/ter
  <rdf:Description rdf:about="http://www.example.org/ginger">
    <name>Ginger Rogers</name>
    <occupation>dancer</occupation>
    <partner rdf:resource="http://www.example.org/fred"/>
  </rdf:Description>

  <rdf:Description rdf:about="http://www.example.org/fred">
    <name>Fred Astaire</name>
    <occupation>dancer</occupation>
    <likes rdf:resource="http://www.example.org/ice-cream"/>
  </rdf:Description>
</rdf:RDF>

W3C support

That they produce logical representations of triples doesn’t mean triple stores necessarily have triple-like internal implementations. Most triple stores, however, are unified by their support for Semantic Web technology such as RDF and SPARQL. While there’s nothing particularly special about RDF as a means of serializing linked data, it is endorsed by the W3C and therefore benefits from being widely understood and well documented. The query language SPARQL benefits from similar W3C patronage. In the graph database space there is a similar abundance of innovation around graph serialization formats (e.g., GEOFF) and inferencing query languages (e.g., the Cypher query language that we use throughout this book). The key difference is that at this point these innovations do not enjoy the patronage of a well-regarded body like the W3C, though they do benefit from strong engagement within their user and vendor communities.


pages: 458 words: 116,832

The Costs of Connection: How Data Is Colonizing Human Life and Appropriating It for Capitalism by Nick Couldry, Ulises A. Mejias

"side hustle", 23andMe, Airbnb, Amazon Mechanical Turk, Amazon Web Services, British Empire, call centre, Cass Sunstein, choice architecture, cloud computing, colonial rule, computer vision, corporate governance, dark matter, data acquisition, data is the new oil, different worldview, discovery of the americas, disinformation, diversification, Edward Snowden, en.wikipedia.org, European colonialism, gig economy, global supply chain, Google Chrome, Google Earth, hiring and firing, income inequality, independent contractor, information asymmetry, Infrastructure as a Service, intangible asset, Internet of things, Jaron Lanier, job automation, Kevin Kelly, late capitalism, lifelogging, linked data, Marc Andreessen, Mark Zuckerberg, means of production, move fast and break things, move fast and break things, multi-sided market, Naomi Klein, Network effects, new economy, New Urbanism, PageRank, pattern recognition, payday loans, Philip Mirowski, profit maximization, Ray Kurzweil, RFID, Richard Stallman, Richard Thaler, Scientific racism, Second Machine Age, sharing economy, Shoshana Zuboff, Silicon Valley, Slavoj Žižek, smart cities, Snapchat, social graph, social intelligence, software studies, sovereign wealth fund, surveillance capitalism, The Future of Employment, the scientific method, Thomas Davenport, Tim Cook: Apple, trade liberalization, trade route, undersea cable, urban planning, wages for housework

In chapter 1 we noted the social credit system seen by the Chinese government as its route to “the modernization of social governance.”110 Meanwhile in India, the Aadhaar identity-card system is being made a requirement for access to welfare services, tax dealings, and even the online booking of train tickets.111 Through the operation of social caching, we are increasingly becoming data subjects whose responsiveness to data signals is expected, even taken as virtuous. IoT = LAC? (Operationalizing Life’s Annexation to Capital) The business opportunities from innovative extensions of social caching are multiplying, often in alliance with the state. Consider the cameras with linked data analytics now offered in the United States by Axon AI (formerly Taser) to replace law enforcement officers’ crime-scene reports; as one investor said, “Taser wants to be the Tesla or Apple of law enforcement.”112 Even in formal democracies, resource-strapped states will take advantage of these apparently risk-free methods for delegating their knowledge of hard-to-reach areas of the social world to algorithms.

To the transparent networks that slowly occlude the flow of all those aspects of nature and character that distinguish humans from elevator buttons and doorbells. . . . Haven’t you felt it? The loss of autonomy. The sense of being virtualized. All the coded impulses you depend on to guide you. All the sensors in the room that are watching you, listening to you, tracking your habits, measuring your capabilities. All the linked data designed to incorporate you into the megadata.37 Something, in other words, is going wrong with human autonomy. But, you might ask, isn’t the notion of autonomy (the self’s ability to govern its own life, deriving from the Greek words autos for self and nomos for law or rule) itself problematic?

We argued that underlying these was something even more fundamental: the drive to capitalize human life itself in all its aspects and build through this a new social and economic order that installs capitalist management as the privileged mode for governing every aspect of life. Put another way, and updating Marx for the Big Data age, human life becomes a direct factor in capitalist production. This annexation of human capital is what links data colonialism to the further expansion of capitalism. This is the fundamental cost of connection, and it is a cost being paid all over the world, in societies in which connection is increasingly imposed as the basis for participating in everyday life. The resulting order has important similarities whether we are discussing the United States, China, Europe, or Latin America.


pages: 58 words: 12,386

Big Data Glossary by Pete Warden

business intelligence, crowdsourcing, fault tolerance, functional programming, information retrieval, linked data, natural language processing, recommendation engine, web application

It has been designed to make it easy to correct the most common errors you'll encounter in human-created datasets. For example, it's easy to spot and correct common problems like typos or inconsistencies in text values and to change cells from one format to another. There's also rich support for linking data by calling APIs with the data contained in existing rows to augment the spreadsheet with information from external sources. Refine doesn't let you do anything you can't do with other tools, but its power comes from how well it supports a typical extract and transform workflow. It feels like a good step up in abstraction, packaging processes that would typically take multiple steps in a scripting language or spreadsheet package into single operations with sensible defaults.
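Outside Refine, the augment-by-API workflow the excerpt describes takes only a few lines. A hedged Python sketch, in which the endpoint, its parameters, and the field names are all hypothetical stand-ins for whatever service a Refine user would actually call:

    import requests

    rows = [{"city": "Dublin"}, {"city": "Ghent"}]

    for row in rows:
        # Call an external API with a value from an existing column.
        resp = requests.get(
            "https://api.example.com/geocode",  # assumed URL, not a real service
            params={"q": row["city"]},
            timeout=10,
        )
        resp.raise_for_status()
        record = resp.json()
        # Augment the row with new "columns" from the response.
        row["lat"] = record.get("lat")
        row["lon"] = record.get("lon")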


Data and the City by Rob Kitchin, Tracey P. Lauriault, Gavin McArdle

A Declaration of the Independence of Cyberspace, bike sharing scheme, bitcoin, blockchain, Bretton Woods, Chelsea Manning, citizen journalism, Claude Shannon: information theory, clean water, cloud computing, complexity theory, conceptual framework, corporate governance, correlation does not imply causation, create, read, update, delete, crowdsourcing, cryptocurrency, dematerialisation, digital map, distributed ledger, fault tolerance, fiat currency, Filter Bubble, floating exchange rates, functional programming, global value chain, Google Earth, hive mind, Internet of things, Kickstarter, knowledge economy, lifelogging, linked data, loose coupling, new economy, New Urbanism, Nicholas Carr, open economy, openstreetmap, packet switching, pattern recognition, performance metric, place-making, RAND corporation, RFID, Richard Florida, ride hailing / ride sharing, semantic web, sentiment analysis, sharing economy, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart contracts, smart grid, smart meter, social graph, software studies, statistical model, TaskRabbit, text mining, The Chicago School, The Death and Life of Great American Cities, the market place, the medium is the message, the scientific method, Toyota Production System, urban planning, urban sprawl, web application

London: Macmillan and Co. Cosgrove, D. (2001) Apollo’s Eye: A Cartographic Genealogy of the Earth in the Western Imagination. Baltimore, MD: Johns Hopkins University Press. Debruyne, C., Clinton, É., McNerney, L., Lavin, P. and O’Sullivan, D. (2017) ‘On the construction for a linked data platform for Ireland’s authoritative geospatial linked data’, available from: www.osi.ie/wp-content/uploads/2017/01/osi-eswc-2017-preprint.pdf [accessed 10 February 2017]. Dodge, M., Kitchin, R. and Perkins, C. (eds) (2009) Rethinking Maps: New Frontiers in Cartographic Theory. London: Routledge. Foucault, M. (2003) The Essential Foucault: Selections from Essential Works of Foucault, 1954–1984.


pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst

algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application

In both cases, the overarching principle is real-time data integration, in which data changes—whether originating from a MapReduce job or from a transactional system—are reflected instantly in a data warehouse, and downstream analytics maintain an accurate, timely view of reality. Others are turning to linked data and semantics, where data sets are created using linking methodologies that focus on the semantics of the data. This fits well into the broader notion of pointing at external sources from within a data set, which has been around for quite a long time. That ability to point to unstructured data (whether residing in the file system or some external source) merely becomes an extension of the given capabilities, in which the ability to store and process XML and XQuery natively within an RDBMS enables the combination of different degrees of structure while searching and analyzing the underlying data.


Virtual Competition by Ariel Ezrachi, Maurice E. Stucke

Airbnb, Albert Einstein, algorithmic trading, barriers to entry, cloud computing, collaborative economy, commoditize, corporate governance, crony capitalism, crowdsourcing, Daniel Kahneman / Amos Tversky, David Graeber, demand response, disintermediation, disruptive innovation, double helix, Downton Abbey, Erik Brynjolfsson, experimental economics, Firefox, framing effect, Google Chrome, independent contractor, index arbitrage, information asymmetry, interest rate derivative, Internet of things, invisible hand, Jean Tirole, John Markoff, Joseph Schumpeter, Kenneth Arrow, light touch regulation, linked data, loss aversion, Lyft, Mark Zuckerberg, market clearing, market friction, Milgram experiment, multi-sided market, natural language processing, Network effects, new economy, offshore financial centre, pattern recognition, prediction markets, price discrimination, price stability, profit maximization, profit motive, race to the bottom, rent-seeking, Richard Thaler, ride hailing / ride sharing, road to serfdom, Robert Bork, Ronald Reagan, self-driving car, sharing economy, Silicon Valley, Skype, smart cities, smart meter, Snapchat, social graph, Steve Jobs, sunk-cost fallacy, supply-chain management, telemarketer, The Chicago School, The Myth of the Rational Market, The Wealth of Nations by Adam Smith, too big to fail, transaction costs, Travis Kalanick, turn-by-turn navigation, two-sided market, Uber and Lyft, Uber for X, uber lyft, Watson beat the top human players on Jeopardy!, women in the workforce, yield management

One possibility may be to focus on commercially sensitive information that, although publicly available, is of little or no value to customers but helps the competitors arrive at a supracompetitive price.37 Here the focus is on “cheap talk,” that is, data exchanges that facilitate conscious parallelism but are of limited use to customers. One problem, however, is in identifying such information. Part of the value of Big Data is data fusion, whereby computers link data sets, from which new insights emerge.38 Moreover, the data for some applications—such as customers sharing their inventory data with suppliers—can promote efficiency even while raising antitrust concerns.39 Even if the customers seek to limit what information can be shared, the algorithms—by analyzing a variety of data—could fill in the gaps.

cote=DSTI/ICCP(2012)9/FINAL&docLanguage=En, observing that “In some cases, big data is defined by the capacity to analyse a variety of mostly unstructured data sets from sources as diverse as web logs, social media, mobile communications, sensors and financial transactions. This requires the capability to link data sets; this can be essential as information is highly context-dependent and may not be of value out of the right context. It also requires the capability to extract information from unstructured data, i.e. data that lack a predefined (explicit or implicit) model.” 39. Stanford Graduate School of Business Staff, “Sharing Information to Boost the Bottom Line,” Insights by Stanford Business (March 1, 1999), http://www.gsb.stanford.edu/insights/sharing-information-boost-bottom-line.


pages: 262 words: 60,248

Python Tricks: The Book by Dan Bader

anti-pattern, domain-specific language, don't repeat yourself, functional programming, linked data, pattern recognition, performance metric

But before we jump in, let’s cover some of the basics first. How do arrays work, and what are they used for? Arrays consist of fixed-size data records that allow each element to be efficiently located based on its index. Because arrays store information in adjoining blocks of memory, they’re considered contiguous data structures (as opposed to linked data structures like linked lists, for example). A real-world analogy for an array data structure is a parking lot: You can look at the parking lot as a whole and treat it as a single object, but inside the lot there are parking spots indexed by a unique number. Parking spots are containers for vehicles—each parking spot can either be empty or have a car, a motorbike, or some other vehicle parked on it.
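To make the contrast concrete, here is a minimal sketch in the book’s own language, Python: indexing into a contiguous list is a single O(1) jump, while reaching the i-th element of a hand-rolled linked list means following i links.

    # Contiguous: elements sit in adjoining slots, so indexing is O(1).
    spots = ["car", None, "motorbike", "car"]
    print(spots[2])  # direct jump to the third parking spot

    # Linked: each node stores a value plus a reference to the next node,
    # so reaching position i means following i links -- O(i).
    class Node:
        def __init__(self, value, next=None):
            self.value = value
            self.next = next

    head = Node("car", Node(None, Node("motorbike", Node("car"))))

    def nth(head, i):
        node = head
        for _ in range(i):
            node = node.next
        return node.value

    print(nth(head, 2))  # walks two links to reach "motorbike"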


pages: 680 words: 157,865

Beautiful Architecture: Leading Thinkers Reveal the Hidden Beauty in Software Design by Diomidis Spinellis, Georgios Gousios

Albert Einstein, barriers to entry, business intelligence, business process, call centre, continuous integration, corporate governance, database schema, Debian, domain-specific language, don't repeat yourself, Donald Knuth, en.wikipedia.org, fault tolerance, Firefox, functional programming, general-purpose programming language, iterative process, linked data, locality of reference, loose coupling, meta-analysis, MVC pattern, peer-to-peer, premature optimization, recommendation engine, Richard Stallman, Ruby on Rails, semantic web, smart cities, social graph, social web, SPARQL, Steve Jobs, Stewart Brand, traveling salesman, Turing complete, type inference, web application, zero-coupon bond

There is no central coordination, and we are free to document our wandering by republishing our stories, thoughts, and journeys as we go. We think of the Web as a series of one-way links between documents (see Figure 5-1). Figure 5-1. Conventional notion of the Web Linked documents are only part of the picture, however. The vision for the Web always included the idea of linked data as well. This content can be consumed through a rendered view or directly referenced and manipulated in preferred forms in different contexts. You can imagine a middle-tier layer asking for information as an XML document while the presentation tier prefers a JSON object via an AJAX call. The same name refers to the same data in different forms.
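One name serving XML to one tier and JSON to another is what HTTP content negotiation provides. A minimal sketch, assuming Python’s requests library and a hypothetical resource URI:

    import requests

    # Hypothetical linked-data resource: one name, several representations.
    URI = "https://example.org/resource/42"

    # A middle-tier service asks for XML...
    xml_doc = requests.get(URI, headers={"Accept": "application/xml"}).text

    # ...while the presentation tier asks the same URI for JSON via AJAX;
    # the server picks the representation based on the Accept header.
    json_obj = requests.get(URI, headers={"Accept": "application/json"}).json()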

For the more difficult aspects of establishing the correctness of a design or implementation, the advantage of the functional approach is not so clear. For example, proving that a recursive definition has specific properties and terminates requires the equivalent of a loop invariant and variant. It is also unlikely that efficient functional programs can afford to renounce programmer-visible linked data structures, with all the resulting problems such as aliasing, which are challenging regardless of the underlying programming model. If functional programming fails to bring a significant simplification to the task of establishing correctness, there remains a major practical argument: referential transparency.


The Data Journalism Handbook by Jonathan Gray, Lucy Chambers, Liliana Bounegru

Amazon Web Services, barriers to entry, bioinformatics, business intelligence, carbon footprint, citizen journalism, correlation does not imply causation, crowdsourcing, David Heinemeier Hansson, eurozone crisis, Firefox, Florence Nightingale: pie chart, game design, Google Earth, Hans Rosling, information asymmetry, Internet Archive, John Snow's cholera map, Julian Assange, linked data, moral hazard, MVC pattern, New Journalism, openstreetmap, Ronald Reagan, Ruby on Rails, Silicon Valley, social graph, SPARQL, text mining, web application, WikiLeaks

While we are all either a journalist, designer, or developer “first,” we continue to work hard to increase our understanding and proficiency in each other’s areas of expertise. The core products for exploring data are Excel, Google Docs, and Fusion Tables. The team has also, but to a lesser extent, used MySQL, Access databases, and Solr to explore larger datasets; and used RDF and SPARQL to begin looking at ways in which we can model events using Linked Data technologies. Developers will also use their programming language of choice, whether that’s ActionScript, Python, or Perl, to match, parse, or generally pick apart a dataset we might be working on. Perl is used for some of the publishing. We use Google, Bing Maps, and Google Earth, along with Esri’s ArcMAP, for exploring and visualizing geographical data.


pages: 224 words: 13,238

Electronic and Algorithmic Trading Technology: The Complete Guide by Kendall Kim

algorithmic trading, automated trading system, backtesting, Bear Stearns, commoditize, computerized trading, corporate governance, Credit Default Swap, diversification, en.wikipedia.org, family office, financial innovation, fixed income, index arbitrage, index fund, interest rate swap, linked data, market fragmentation, money market fund, natural language processing, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, short selling, statistical arbitrage, Steven Levy, transaction costs, yield curve

However, most financial services institutions do not have the ability to reach an optimal infrastructure because resources for most of a brokerage firm’s cost center have fallen victim to applying discretionary funds within the profit center such as the trading area of the business. It is clearly evident that budgets for data infrastructure have been reduced in the past years when the need for enhancing performance and technology has never been greater. Presumably, this will change in the future, though, when linking data to trading profitability becomes more evident. 8.5 Impact on Operations and Technology Real-time transaction processing and electronic trading can result in a great deal of automation for operations. Real-time transactions move more quickly, tend to be more accurate, have fewer problems, and need less attention than manually engaged transactions.


Algorithms in C++ Part 5: Graph Algorithms by Robert Sedgewick

Erdős number, functional programming, linear programming, linked data, NP-complete, reversible computing, sorting algorithm, traveling salesman

Indeed, the first algorithms that we considered in detail, the union-find algorithms in Chapter 1, are prime examples of graph algorithms. We also used graphs in Chapter 3 as an illustration of applications of two-dimensional arrays and linked lists, and in Chapter 5 to illustrate the relationship between recursive programs and fundamental data structures. Any linked data structure is a representation of a graph, and some familiar algorithms for processing trees and other linked structures are special cases of graph algorithms. The purpose of this chapter is to provide a context for developing an understanding of graph algorithms ranging from the simple ones in Part 1 to the sophisticated ones in Chapters 18 through 22.

The primary disadvantage is that testing for the existence of specific edges can take time proportional to V, as opposed to constant time in the adjacency matrix. These differences trace, essentially, to the difference between using linked lists and vectors to represent the set of vertices incident on each vertex. Thus, we see again that an understanding of the basic properties of linked data structures and vectors is critical if we are to develop efficient graph ADT implementations. Our interest in these performance differences is that we want to avoid implementations that are inappropriately inefficient under unexpected circumstances when a wide range of operations is to be demanded of the ADT.
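The trade-off reads clearly side by side. A small sketch of both representations for the same undirected graph (in Python rather than the book’s C++): the matrix answers an edge test in constant time, while the adjacency list must scan the vertices incident on a vertex.

    V = 4
    edges = [(0, 1), (1, 2), (0, 3)]

    # Adjacency matrix: V*V cells, O(1) edge test.
    matrix = [[False] * V for _ in range(V)]
    for u, v in edges:
        matrix[u][v] = matrix[v][u] = True   # undirected

    # Adjacency lists: space proportional to V + E, but an edge test
    # scans one vertex's list -- O(degree(v)), up to O(V) in the worst case.
    adj = [[] for _ in range(V)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    print(matrix[0][3])   # True, constant time
    print(3 in adj[0])    # True, linear scan of vertex 0's list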


pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzić

Albert Einstein, algorithmic bias, backpropagation, bioinformatics, business cycle, business intelligence, business process, butter production in bangladesh, combinatorial explosion, computer vision, conceptual framework, correlation coefficient, correlation does not imply causation, data acquisition, discrete time, El Camino Real, fault tolerance, finite state, Gini coefficient, information retrieval, Internet Archive, inventory management, iterative process, knowledge worker, linked data, loose coupling, Menlo Park, natural language processing, Netflix Prize, NP-complete, PageRank, pattern recognition, peer-to-peer, phenotype, random walk, RFID, semantic web, speech recognition, statistical model, Telecommunications Act of 1996, telemarketer, text mining, traveling salesman, web application

Examples of heterogeneous networks include those in medical domains describing patients, diseases, treatments, and contacts, or in bibliographic domains describing publications, authors, and venues. Graph-mining techniques explicitly consider these links when building predictive or descriptive models of the linked data. The requirement of different applications with graph-based data sets is not very uniform. Thus, graph models and mining algorithms that work well in one domain may not work well in another. For example, chemical data is often represented as graphs in which the nodes correspond to atoms, and the links correspond to bonds between the atoms.

Therefore, a labeled graph G consists of three sets of information: G(N,L,V), where the new component V = {v1, v2, … , vt} is a set of values attached to links. An example of a directed graph is given in Figure 12.2b, while the graph in Figure 12.2c is a labeled graph. Different applications use different types of graphs in modeling linked data. In this chapter the primary focus is on undirected and unlabeled graphs although the reader still has to be aware that there are numerous graph-mining algorithms for directed and/or labeled graphs. Besides a graphical representation, each graph may be presented in the form of the incidence matrix I(G) where nodes are indexing rows and links are indexing columns.
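As a worked example of that definition, the following sketch (Python with NumPy; the three-node graph is invented for illustration) builds the incidence matrix I(G) just described:

    import numpy as np

    nodes = ["n1", "n2", "n3"]
    links = [("n1", "n2"), ("n2", "n3"), ("n1", "n3")]

    # I(G): rows indexed by nodes, columns indexed by links;
    # an entry is 1 when the node is an endpoint of the link.
    I = np.zeros((len(nodes), len(links)), dtype=int)
    for col, (u, v) in enumerate(links):
        I[nodes.index(u), col] = 1
        I[nodes.index(v), col] = 1

    print(I)
    # [[1 0 1]
    #  [1 1 0]
    #  [0 1 1]]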


pages: 288 words: 85,073

Factfulness: Ten Reasons We're Wrong About the World – and Why Things Are Better Than You Think by Hans Rosling, Ola Rosling, Anna Rosling Rönnlund

animal electricity, clean water, colonial rule, en.wikipedia.org, energy transition, first square of the chessboard, first square of the chessboard / second half of the chessboard, global pandemic, Hans Rosling, illegal immigration, income inequality, income per capita, Intergovernmental Panel on Climate Change (IPCC), jimmy wales, linked data, lone genius, microcredit, purchasing power parity, Stanford marshmallow experiment, Steven Pinker, Thomas L Friedman, Walter Mischel

We presented at the ceremony for their new Open Data platform in May 2010, and since then the World Bank has become the main access point for reliable global statistics; see gapm.io/x6. This was all possible thanks to Tim Berners-Lee and other early visionaries of the free internet. Sometime after he had invented the World Wide Web, Tim Berners-Lee contacted us, asking to borrow a slide show that showed how a web of linked data sources could flourish (using an image of pretty flowers). We share all of our content for free, so of course we said yes. Tim used this “flower-powerpoint” in his 2009 TED talk—see gapm.io/x6—to help people see the beauty of “The Next Web,” and he uses Gapminder as an example of what happens when data from multiple sources come together; see Berners-Lee (2009).


pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists by Gary Marcus, Jeremy Freeman

23andMe, Albert Einstein, backpropagation, bioinformatics, bitcoin, brain emulation, cloud computing, complexity theory, computer age, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data acquisition, Drosophila, epigenetics, global pandemic, Google Glasses, iterative process, linked data, mouse model, optical character recognition, pattern recognition, personalized medicine, phenotype, race to the bottom, Richard Feynman, Ronald Reagan, semantic web, speech recognition, stem cell, Steven Pinker, supply-chain management, Turing machine, twin studies, web application

While efforts to map the brain have begun as public, government-funded projects, this does not mean that private entities will not enter the arena and seek to compete with those projects. Although initial efforts to map the brain may be fueled by public funds, the issue of how “fine-tuned” information that can be used to determine risk factors or emerging disease states in individuals’ brains, which will require linking data to genetic databases, health records, and health databases, will be handled merits discussion now. What rules will govern the sharing of detailed scans or maps about each individual’s brain? Can data be linked from a brain scan to a genome to a database without an individual’s express consent if that person’s identity is not 100 percent secure?


pages: 374 words: 94,508

Infonomics: How to Monetize, Manage, and Measure Information as an Asset for Competitive Advantage by Douglas B. Laney

3D printing, Affordable Care Act / Obamacare, banking crisis, blockchain, business climate, business intelligence, business process, call centre, chief data officer, Claude Shannon: information theory, commoditize, conceptual framework, crowdsourcing, dark matter, data acquisition, digital twin, discounted cash flows, disintermediation, diversification, en.wikipedia.org, endowment effect, Erik Brynjolfsson, full employment, informal economy, intangible asset, Internet of things, linked data, Lyft, Nash equilibrium, Network effects, new economy, obamacare, performance metric, profit motive, recommendation engine, RFID, semantic web, smart meter, Snapchat, software as a service, source of truth, supply-chain management, text mining, uber lyft, Y2K, yield curve

• Information accessibility
• User request turnaround time
• User satisfaction survey

Agility: The ability to respond to external influences, and the ability to respond to marketplace changes to gain or maintain competitive advantage. SCOR agility metrics include flexibility and adaptability.
• Utility of information for a range of purposes
• Linked data, metadata, and master data measures
• Ease of integrating new types of data or changing dimensions

Costs: The cost of operating the supply chain processes. This includes labor costs, material costs, management, and transportation costs. A typical cost metric is cost of goods sold.
• Data acquisition cost
• Data management costs
• Data delivery costs (each includes labor and technology-related costs)

Asset Management Efficiency (Assets): The ability to efficiently utilize assets.


Future Files: A Brief History of the Next 50 Years by Richard Watson

Albert Einstein, bank run, banking crisis, battle of ideas, Black Swan, call centre, carbon footprint, cashless society, citizen journalism, commoditize, computer age, computer vision, congestion charging, corporate governance, corporate social responsibility, deglobalization, digital Maoism, disintermediation, epigenetics, failed state, financial innovation, Firefox, food miles, future of work, global pandemic, global supply chain, global village, hive mind, industrial robot, invention of the telegraph, Jaron Lanier, Jeff Bezos, knowledge economy, lateral thinking, linked data, low cost airline, low skilled workers, M-Pesa, mass immigration, Northern Rock, peak oil, pensions crisis, precision agriculture, prediction markets, Ralph Nader, Ray Kurzweil, rent control, RFID, Richard Florida, self-driving car, speech recognition, telepresence, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Turing test, Victor Gruen, white flight, women in the workforce, Zipcar

5 trends that will transform transport: Embedded intelligence. Cars can already be opened or started using fingerprint and iris recognition, so we’ll see more technologies linking vehicle security to user identification. We will also see mood-sensitive vehicles that adjust their behavior according to the mood of the driver or occupants. Cars will also become mobile technology platforms linking data to other services such as healthcare. For example, if your car regularly detects an abnormal heartbeat or high levels of stress, this information could be sent wirelessly to your doctor. Obviously privacy issues abound, but cars could become useful data-collection and delivery points. Remote monitoring. Electronic data recorders are little black boxes that already sit covertly inside some cars and monitor your speed, acceleration and braking.


pages: 356 words: 102,224

Pale Blue Dot: A Vision of the Human Future in Space by Carl Sagan

Albert Einstein, anthropic principle, cosmological principle, dark matter, Dava Sobel, Francis Fukuyama: the end of history, germ theory of disease, invention of the telescope, Isaac Newton, Johannes Kepler, Kuiper Belt, linked data, low earth orbit, nuclear winter, planetary scale, profit motive, scientific worldview, Search for Extraterrestrial Intelligence, Stephen Hawking, telepresence

You reach out your arm to pick up something shiny in the soil, and the robot arm does likewise. The sands of Mars trickle through your fingers. The only difficulty with this remote reality technology is that all this must occur in tedious slow motion: The round-trip travel time of the up-link commands from Earth to Mars and the down-link data returned from Mars to Earth might take half an hour or more. But this is something we can learn to do. We can learn to contain our exploratory impatience if that's the price of exploring Mars. The rover can be made smart enough to deal with routine contingencies. Anything more challenging, and it makes a dead stop, puts itself into a safeguard mode, and radios for a very patient human controller to take over.


pages: 313 words: 101,403

My Life as a Quant: Reflections on Physics and Finance by Emanuel Derman

Bear Stearns, Berlin Wall, bioinformatics, Black-Scholes formula, Brownian motion, buy and hold, capital asset pricing model, Claude Shannon: information theory, Donald Knuth, Emanuel Derman, fixed income, Gödel, Escher, Bach, haute couture, hiring and firing, implied volatility, interest rate derivative, Jeff Bezos, John Meriwether, John von Neumann, law of one price, linked data, Long Term Capital Management, moral hazard, Murray Gell-Mann, Myron Scholes, Paul Samuelson, pre–internet, publish or perish, quantitative trading / quantitative finance, Sharpe ratio, statistical arbitrage, statistical model, Stephen Hawking, Steve Jobs, stochastic volatility, technology bubble, the new new thing, transaction costs, volatility smile, Y2K, yield curve, zero-coupon bond, zero-sum game

While I was away on a two-week beach vacation at Fire Island with my family, Ed suddenly threw himself into redesigning and then rewriting the entire system—without giving me advance notice. I returned to a fait accompli, a completely new, enhanced, and almost unrecognizable APL-flavored version of the language. Ed's version now incorporated vastly complex dynamically linked data structures, whose details I knew I would not live long enough to master. Ed had also cleverly modified HEQS so that, once you had used it interactively to develop and solve a financial model, you could then use it to generate a C program that would solve your equations many times faster. Programming came naturally to Ed in a way it never would to me, and his proficiency daunted me.


pages: 348 words: 97,277

The Truth Machine: The Blockchain and the Future of Everything by Paul Vigna, Michael J. Casey

3D printing, additive manufacturing, Airbnb, altcoin, Amazon Web Services, barriers to entry, basic income, Berlin Wall, Bernie Madoff, bitcoin, blockchain, blood diamonds, Blythe Masters, business process, buy and hold, carbon footprint, cashless society, cloud computing, computer age, computerized trading, conceptual framework, Credit Default Swap, crowdsourcing, cryptocurrency, cyber-physical system, dematerialisation, disinformation, disintermediation, distributed ledger, Donald Trump, double entry bookkeeping, Edward Snowden, Elon Musk, Ethereum, ethereum blockchain, failed state, fault tolerance, fiat currency, financial innovation, financial intermediation, Garrett Hardin, global supply chain, Hernando de Soto, hive mind, informal economy, intangible asset, Internet of things, Joi Ito, Kickstarter, linked data, litecoin, longitudinal study, Lyft, M-Pesa, Marc Andreessen, market clearing, mobile money, money: store of value / unit of account / medium of exchange, Network effects, off grid, pets.com, prediction markets, pre–internet, price mechanism, profit maximization, profit motive, ransomware, rent-seeking, RFID, ride hailing / ride sharing, Ross Ulbricht, Satoshi Nakamoto, self-driving car, sharing economy, Silicon Valley, smart contracts, smart meter, Snapchat, social web, software is eating the world, supply-chain management, Ted Nelson, the market place, too big to fail, trade route, Tragedy of the Commons, transaction costs, Travis Kalanick, Turing complete, Uber and Lyft, uber lyft, unbanked and underbanked, underbanked, universal basic income, web of trust, zero-sum game

You could say these “cloud” services are much truer to that name than those of Amazon Web Services, Google, Dropbox, IBM, Oracle, Microsoft, and Apple, the providers with which most people associate that word. But even bigger changes are being considered, including projects to entirely re-architect the Web itself. There’s Solid, which stands for Social Linked Data, a new protocol for data storage that puts data back in the hands of the people to whom it belongs. The core idea is that we will store our data in Pods (Personalized Online Data Stores) and distribute it to applications via permissions we control. Solid is the brainchild of none other than Tim Berners-Lee, the computer scientist who perfected HTTP and gave us the World Wide Web.


pages: 352 words: 98,561

The City by Tony Norfield

accounting loophole / creative accounting, anti-communist, Asian financial crisis, asset-backed security, bank run, banks create money, Basel III, Berlin Wall, Big bang: deregulation of the City of London, Bretton Woods, BRICs, British Empire, capital controls, central bank independence, colonial exploitation, colonial rule, continuation of politics by other means, dark matter, Edward Snowden, Fall of the Berlin Wall, financial innovation, financial intermediation, foreign exchange controls, Francis Fukuyama: the end of history, G4S, global value chain, Goldman Sachs: Vampire Squid, interest rate derivative, interest rate swap, Irish property bubble, linked data, London Interbank Offered Rate, London Whale, Mark Zuckerberg, Martin Wolf, means of production, Money creation, money market fund, mortgage debt, North Sea oil, Northern Rock, Occupy movement, offshore financial centre, Plutocrats, plutocrats, purchasing power parity, quantitative easing, Real Time Gross Settlement, regulatory arbitrage, reserve currency, Ronald Reagan, seigniorage, Sharpe ratio, sovereign wealth fund, The Great Moderation, transaction costs, transfer pricing, zero-sum game

The City’s status as a major dealing centre is solidly based on its connections with the rest of the world and its ability to act as an intermediary for global flows of money-capital and credit. Major flows of finance in the form of deposits, loans, and the purchase and sale of securities between UK-based banks and the rest of the world are intermediated by banks outside the UK, but many of these are UK-linked. Data from the Bank of England enable these links to be examined in some detail, and they highlight a key role of the UK banking system, one that has not been analysed before. These data are shown in Table 8.6.22 The figures are in US dollars, since this is the main currency used in the transactions, and they measure the outstanding valuations of bank assets and liabilities.


pages: 350 words: 109,521

Our 50-State Border Crisis: How the Mexican Border Fuels the Drug Epidemic Across America by Howard G. Buffett

airport security, clean water, collective bargaining, defense in depth, Donald Trump, illegal immigration, immigration reform, linked data, low skilled workers, moral panic

Anderson’s work directly, but now we support it through a nonprofit called the Colibri Center for Human Rights that works with the medical examiner’s office to identify these remains and provide closure for families regardless of the origins of the deceased. For example, we funded an international geographic information system (GIS) initiative in Pima County to link data from missing person reports to postmortem reports. We agree with Anderson and Colibri that respect for the dead is one measure of a civilized society. Is it civilized to view the “mortal danger” of the desert as a deterrent? Should it give us pause that before Operation Gatekeeper funneled immigrants to the desert, there were only about twelve bodies per year recovered along the border?


pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future by Kevin Kelly

A Declaration of the Independence of Cyberspace, AI winter, Airbnb, Albert Einstein, Amazon Web Services, augmented reality, bank run, barriers to entry, Baxter: Rethink Robotics, bitcoin, blockchain, book scanning, Brewster Kahle, Burning Man, cloud computing, commoditize, computer age, connected car, crowdsourcing, dark matter, dematerialisation, Downton Abbey, Edward Snowden, Elon Musk, Filter Bubble, Freestyle chess, game design, Google Glasses, hive mind, Howard Rheingold, index card, indoor plumbing, industrial robot, Internet Archive, Internet of things, invention of movable type, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Markoff, Kevin Kelly, Kickstarter, lifelogging, linked data, Lyft, M-Pesa, Marc Andreessen, Marshall McLuhan, means of production, megacity, Minecraft, Mitch Kapor, multi-sided market, natural language processing, Netflix Prize, Network effects, new economy, Nicholas Carr, old-boy network, peer-to-peer, peer-to-peer lending, personalized medicine, placebo effect, planetary scale, postindustrial economy, recommendation engine, RFID, ride hailing / ride sharing, Rodney Brooks, self-driving car, sharing economy, Silicon Valley, slashdot, Snapchat, social graph, social web, software is eating the world, speech recognition, Stephen Hawking, Steven Levy, Ted Nelson, The future is already here, the scientific method, transport as a service, two-sided market, Uber for X, uber lyft, Watson beat the top human players on Jeopardy!, WeWork, Whole Earth Review, Yochai Benkler, zero-sum game

Slowly but surely Amazon’s cloud and Google’s cloud and Facebook’s cloud and all the other enterprise clouds are intertwining into one massive cloud that acts as a single cloud—The Cloud—to the average user or company. A counterforce resisting this merger is that an intercloud requires commercial clouds to share their data (a cloud is a network of linked data), and right now data tends to be hoarded like gold. Data hoards are seen as a competitive advantage, and sharing data freely is hampered by laws, so it will be many years (decades?) before companies learn how to share their data creatively, productively, and responsibly. There is one final step in the inexorable march toward decentralized access.


Remix by John Courtenay Grimwood

clean water, delayed gratification, double helix, fear of failure, haute couture, Herbert Marcuse, Kickstarter, linked data

But Lady Clare had insisted, reeling off a list that began with the Antiguan Absolutists and ended with Zebediah Nouveau. Mind you, he didn’t hate standing inside that circle as much as he hated being there at all. But Lady Clare had insisted on that as well. Keeping her good side to the main CySat camera, Lady Clare smiled. It was amazing how much clout you carried when you’d linked data credits to gold reserves to keep the senior officers loyal, welcomed the UN Pax Force with open arms, arranged for Paris to be the first European city overflown with the new ‘dote and put some backbone into the Prince Imperial. This was the General’s payback, and as far as Lady Clare was concerned it was a small price.


pages: 404 words: 43,442

The Art of R Programming by Norman Matloff

Debian, discrete time, Donald Knuth, functional programming, general-purpose programming language, linked data, sorting algorithm, statistical model

If implemented in C, a tree node would be represented by a C struct, similar to an R list, whose contents are the stored value, a pointer to the left child, and a pointer to the right child. But since R lacks pointer variables, what can we do? Our solution is to go back to the basics. In the old prepointer days in FORTRAN, linked data structures were implemented in long arrays. A pointer, which in C is a memory address, was an array index instead. Specifically, we’ll represent each node by a row in a three-column matrix. The node’s stored value will be in the third element of that row, while the first and second elements will be the left and right links.
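A sketch of that matrix representation (in Python rather than the book’s R, with invented values): each row holds the left link, the right link, and the stored value, and 0 stands in for a null “pointer.”

    # Each row: [left-link, right-link, value]; 0 plays the role of NULL,
    # and row numbers serve as the "pointers" the text describes.
    tree = [
        [2, 3, 8],   # row 1: root, value 8
        [0, 0, 5],   # row 2: left child of root
        [0, 0, 12],  # row 3: right child of root
    ]

    def search(tree, key, row=1):
        """Binary-search-tree lookup using row indices as links."""
        while row != 0:
            left, right, value = tree[row - 1]
            if key == value:
                return row
            row = left if key < value else right
        return 0  # not found

    print(search(tree, 12))  # 3
    print(search(tree, 7))   # 0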


pages: 400 words: 121,988

Trading at the Speed of Light: How Ultrafast Algorithms Are Transforming Financial Markets by Donald MacKenzie

algorithmic trading, automated trading system, banking crisis, barriers to entry, bitcoin, blockchain, Bonfire of the Vanities, Bretton Woods, centralized clearinghouse, Claude Shannon: information theory, coronavirus, Covid-19, COVID-19, cryptocurrency, disintermediation, diversification, en.wikipedia.org, Ethereum, ethereum blockchain, family office, financial intermediation, fixed income, Flash crash, Google Earth, Hacker Ethic, Hibernia Atlantic: Project Express, interest rate derivative, interest rate swap, inventory management, light touch regulation, linked data, low earth orbit, market design, market microstructure, Martin Wolf, Renaissance Technologies, Satoshi Nakamoto, Small Order Execution System, Spread Networks laid a new fibre optics cable between New York and Chicago, statistical arbitrage, statistical model, Steven Levy, The Great Moderation, transaction costs, zero-sum game

Interviewee CV gave, as an example of this highly demanding form of trading—“taking in the world’s information and being able to translate that to predict the next tick [price movement]”—an algorithm trading 10-year US Treasury futures in the Chicago Mercantile Exchange’s datacenter. The algorithm will take into account the pattern of bids, offers, and trades in those futures, as well as patterns in the trading of the other Treasury and interest-rate futures also traded in that datacenter. The algorithm will receive, via microwave links, data on the buying and selling of the underlying Treasurys, which are traded in the two datacenters in New Jersey shown in the map in figure 4.1. Via Hibernia Atlantic’s ultrafast transatlantic cable, it will receive data on the trading of futures on UK sovereign bonds (these futures are traded in a datacenter just outside of London) and the equivalent German futures, traded in a datacenter in Frankfurt called FR2.


The Art of Computer Programming: Fundamental Algorithms by Donald E. Knuth

discrete time, distributed generation, Donald Knuth, fear of failure, Fermat's Last Theorem, G4S, Gerard Salton, Isaac Newton, Jacquard loom, Johannes Kepler, John von Neumann, linear programming, linked data, Menlo Park, probability theory / Blaise Pascal / Pierre de Fermat, sorting algorithm, stochastic process, Turing machine

The proper way to design a library is heavily dependent upon the computer used and the applications to be handled. Large modern computers require an entirely different approach to subroutine libraries. But this is a nice exercise anyway, because it involves interesting manipulations on both sequential and linked data.) The problem in this exercise is to design an algorithm for the stated task. Your allocator may transform the tape directory in any way as it prepares its answer, since the tape directory can be read in anew by the subroutine allocator on its next assignment, and the tape directory is not needed by other parts of the loading routine. 27. [25] Write a MIX program for the subroutine allocation algorithm of exercise 26. 28. [40] The following construction shows how to "solve" a fairly general type of two- person game, including chess, nim, and many simpler games: Consider a finite set of nodes, each of which represents a possible position in the game.

The first algorithm we require is one that builds the Data Table in such a form. Note the flexibility in choice of level numbers that is allowed by the COBOL rules; the left structure in (4) is completely equivalent to

1 A
  2 B
    3 C
    3 D
  2 E
  2 F
    3 G

because level numbers do not have to be sequential.

[Table: Symbol Table entries (with LINK fields) and Data Table entries for A through H, showing PREV, PARENT, NAME, CHILD, and SIB links; empty boxes indicate additional information not relevant here.]


pages: 505 words: 133,661

Who Owns England?: How We Lost Our Green and Pleasant Land, and How to Take It Back by Guy Shrubsole

back-to-the-land, Beeching cuts, Boris Johnson, Capital in the Twenty-First Century by Thomas Piketty, centre right, congestion charging, deindustrialization, digital map, do-ocracy, Downton Abbey, financial deregulation, fixed income, Garrett Hardin, Goldman Sachs: Vampire Squid, Google Earth, housing crisis, James Dyson, Kickstarter, land reform, land tenure, land value tax, linked data, loadsamoney, mega-rich, mutually assured destruction, new economy, Occupy movement, offshore financial centre, oil shale / tar sands, openstreetmap, place-making, Plutocrats, plutocrats, profit motive, rent-seeking, Right to Buy, Ronald Reagan, sceptred isle, Stewart Brand, the built environment, the map is not the territory, The Wealth of Nations by Adam Smith, Tragedy of the Commons, trickle-down economics, urban sprawl, web of trust, Yom Kippur War, zero-sum game

Part of the problem is that the data on what companies own still isn’t good enough to prove whether or not land banking is occurring. Anna has tried to map the land owned by housing developers, but has been thwarted by the lack in the Land Registry’s corporate dataset of the necessary information to link data on who owns a site with digital maps of that area. That makes it very hard to assess, for example, whether a piece of land owned by a housebuilder for decades is a prime site accruing in value or a leftover fragment of ground from a past development. Second, the scope of Letwin’s review was drawn too narrowly to examine the wider problem of land banking by landowners beyond the major housebuilders.


pages: 494 words: 142,285

The Future of Ideas: The Fate of the Commons in a Connected World by Lawrence Lessig

AltaVista, Andy Kessler, barriers to entry, Bill Atkinson, business process, Cass Sunstein, commoditize, computer age, creative destruction, dark matter, disintermediation, disruptive innovation, Donald Davies, Erik Brynjolfsson, Garrett Hardin, George Gilder, Hacker Ethic, Hedy Lamarr / George Antheil, Howard Rheingold, Hush-A-Phone, HyperCard, hypertext link, Innovator's Dilemma, invention of hypertext, inventory management, invisible hand, Jean Tirole, Jeff Bezos, Joseph Schumpeter, Kenneth Arrow, Larry Wall, Leonard Kleinrock, linked data, Marc Andreessen, Menlo Park, Mitch Kapor, Network effects, new economy, packet switching, peer-to-peer, peer-to-peer model, price mechanism, profit maximization, RAND corporation, rent control, rent-seeking, RFC: Request For Comment, Richard Stallman, Richard Thaler, Robert Bork, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, Silicon Valley, smart grid, software patent, spectrum auction, Steve Crocker, Steven Levy, Stewart Brand, Ted Nelson, Telecommunications Act of 1996, The Chicago School, Tragedy of the Commons, transaction costs, Yochai Benkler, zero-sum game

For a time, one could find an extraordinary range of songs archived throughout the Web. Slowly these services have migrated to commercial sites. This migration means the commercial sites can support the costs of developing and maintaining this information. And in some cases, with some databases, the Internet provided a simple way to collect and link data about music in particular.8 Here the CDDB—or “CD database”—is the most famous example. As MP3 equipment became common, people needed a simple way to get information about CD titles and tracks onto the MP3 device. Of course, one could type in that information, but why should everyone have to type in that information?


pages: 528 words: 146,459

Computer: A History of the Information Machine by Martin Campbell-Kelly, William Aspray, Nathan L. Ensmenger, Jeffrey R. Yost

Ada Lovelace, air freight, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Apple's 1984 Super Bowl advert, barriers to entry, Bill Gates: Altair 8800, borderless world, Buckminster Fuller, Build a better mousetrap, Byte Shop, card file, cashless society, cloud computing, combinatorial explosion, computer age, deskilling, don't be evil, Donald Davies, Douglas Engelbart, Douglas Engelbart, Dynabook, fault tolerance, Fellow of the Royal Society, financial independence, Frederick Winslow Taylor, game design, garden city movement, Grace Hopper, hockey-stick growth, Ian Bogost, informal economy, interchangeable parts, invention of the wheel, Jacquard loom, Jeff Bezos, jimmy wales, John Markoff, John von Neumann, Kickstarter, light touch regulation, linked data, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Mitch Kapor, natural language processing, Network effects, New Journalism, Norbert Wiener, Occupy movement, optical character recognition, packet switching, PageRank, pattern recognition, Pierre-Simon Laplace, pirate software, popular electronics, prediction markets, pre–internet, QWERTY keyboard, RAND corporation, Robert X Cringely, Silicon Valley, Silicon Valley startup, Steve Jobs, Steven Levy, Stewart Brand, Ted Nelson, the market place, Turing machine, Vannevar Bush, Von Neumann architecture, Whole Earth Catalog, William Shockley: the traitorous eight, women in the workforce, young professional

was already well established when two other Stanford University doctoral students, Larry Page and Sergey Brin, began work on the Stanford Digital Library Project (funded in part by the National Science Foundation)—research that would not only forever change the process of finding things on the Internet but also, in time, lead to an unprecedentedly successful web advertising model. Page became interested in a dissertation project on the mathematical properties of the web, and found strong support from his adviser Terry Winograd, a pioneer of artificial intelligence research on natural language processing. Using a “web crawler” to gather back-link data (that is, the websites that linked to a particular site), Page, now teamed up with Brin, created their “PageRank” algorithm based on back-links ranked by importance—the more prominent the linking site, the more influence it would have on the linked site’s page rank. They insightfully reasoned that this would provide the basis for more useful web searches than any existing tools and, moreover, that there would be no need to hire a corps of indexing staff.
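The idea that back-links count for more when they come from prominent pages can be captured in a few lines. The following is a simplified power-iteration sketch, not Google’s production algorithm; the three-page web is invented for illustration:

    def pagerank(links, damping=0.85, iters=50):
        """links: dict mapping each page to the pages it links to."""
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iters):
            new = {p: (1 - damping) / len(pages) for p in pages}
            for p, outs in links.items():
                for q in outs:
                    # A page passes its rank, split evenly, to its out-links:
                    # back-links from prominent pages therefore count for more.
                    new[q] += damping * rank[p] / len(outs)
            rank = new
        return rank

    web = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
    print(pagerank(web))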


pages: 598 words: 134,339

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World by Bruce Schneier

23andMe, Airbnb, airport security, AltaVista, Anne Wojcicki, augmented reality, Benjamin Mako Hill, Black Swan, Boris Johnson, Brewster Kahle, Brian Krebs, call centre, Cass Sunstein, Chelsea Manning, citizen journalism, cloud computing, congestion charging, disintermediation, drone strike, Edward Snowden, experimental subject, failed state, fault tolerance, Ferguson, Missouri, Filter Bubble, Firefox, friendly fire, Google Chrome, Google Glasses, hindsight bias, informal economy, Internet Archive, Internet of things, Jacob Appelbaum, Jaron Lanier, John Markoff, Julian Assange, Kevin Kelly, license plate recognition, lifelogging, linked data, Lyft, Mark Zuckerberg, moral panic, Nash equilibrium, Nate Silver, national security letter, Network effects, Occupy movement, Panopticon Jeremy Bentham, payday loans, pre–internet, price discrimination, profit motive, race to the bottom, RAND corporation, recommendation engine, RFID, Ross Ulbricht, self-driving car, Shoshana Zuboff, Silicon Valley, Skype, smart cities, smart grid, Snapchat, social graph, software as a service, South China Sea, stealth mode startup, Steven Levy, Stuxnet, TaskRabbit, telemarketer, Tim Cook: Apple, transaction costs, Uber and Lyft, uber lyft, undersea cable, urban planning, WikiLeaks, Yochai Benkler, zero day

., 160 fiduciary responsibility, data collection and, 204–5 50 Cent Party, 114 FileVault, 215 filter bubble, 114–15 FinFisher, 81 First Unitarian Church of Los Angeles, 91 FISA (Foreign Intelligence Surveillance Act; 1978), 273 FISA Amendments Act (2008), 171, 273, 275–76 Section 702 of, 65–66, 173, 174–75, 261 FISA Court, 122, 171 NSA misrepresentations to, 172, 337 secret warrants of, 174, 175–76, 177 transparency needed in, 177 fishing expeditions, 92, 93 Fitbit, 16, 112 Five Eyes, 76 Flame, 72 FlashBlock, 49 flash cookies, 49 Ford Motor Company, GPS data collected by, 29 Foreign Intelligence Surveillance Act (FISA; 1978), 273 see also FISA Amendments Act Forrester Research, 122 Fortinet, 82 Fox-IT, 72 France, government surveillance in, 79 France Télécom, 79 free association, government surveillance and, 2, 39, 96 freedom, see liberty Freeh, Louis, 314 free services: overvaluing of, 50 surveillance exchanged for, 4, 49–51, 58–59, 60–61, 226, 235 free speech: as constitutional right, 189, 344 government surveillance and, 6, 94–95, 96, 97–99 Internet and, 189 frequent flyer miles, 219 Froomkin, Michael, 198 FTC, see Federal Trade Commission, US fusion centers, 69, 104 gag orders, 100, 122 Gamma Group, 81 Gandy, Oscar, 111 Gates, Bill, 128 gay rights, 97 GCHQ, see Government Communications Headquarters Geer, Dan, 205 genetic data, 36 geofencing, 39–40 geopolitical conflicts, and need for surveillance, 219–20 Georgia, Republic of, cyberattacks on, 75 Germany: Internet control and, 188 NSA surveillance of, 76, 77, 122–23, 151, 160–61, 183, 184 surveillance of citizens by, 350 US relations with, 151, 234 Ghafoor, Asim, 103 GhostNet, 72 Gill, Faisal, 103 Gmail, 31, 38, 50, 58, 219 context-sensitive advertising in, 129–30, 142–43 encryption of, 215, 216 government surveillance of, 62, 83, 148 GoldenShores Technologies, 46–47 Goldsmith, Jack, 165, 228 Google, 15, 27, 44, 48, 54, 221, 235, 272 customer loyalty to, 58 data mining by, 38 data storage capacity of, 18 government demands for data from, 208 impermissible search ad policy of, 55 increased encryption by, 208 as information middleman, 57 linked data sets of, 50 NSA hacking of, 85, 208 PageRank algorithm of, 196 paid search results on, 113–14 search data collected by, 22–23, 31, 123, 202 transparency reports of, 207 see also Gmail Google Analytics, 31, 48, 233 Google Calendar, 58 Google Docs, 58 Google Glass, 16, 27, 41 Google Plus, 50 real name policy of, 49 surveillance by, 48 Google stalking, 230 Gore, Al, 53 government: checks and balances in, 100, 175 surveillance by, see mass surveillance, government Government Accountability Office, 30 Government Communications Headquarters (GCHQ): cyberattacks by, 149 encryption programs and, 85 location data used by, 3 mass surveillance by, 69, 79, 175, 182, 234 government databases, hacking of, 73, 117, 313 GPS: automobile companies’ use of, 29–30 FBI use of, 26, 95 police use of, 26 in smart phones, 3, 14 Grayson, Alan, 172 Great Firewall (Golden Shield), 94, 95, 150–51, 187, 237 Greece, wiretapping of government cell phones in, 148 greenhouse gas emissions, 17 Greenwald, Glenn, 20 Grindr, 259 Guardian, Snowden documents published by, 20, 67, 149 habeas corpus, 229 hackers, hacking, 42–43, 71–74, 216, 313 of government databases, 73, 117, 313 by NSA, 85 privately-made technology for, 73, 81 see also cyberwarfare Hacking Team, 73, 81, 149–50 HAPPYFOOT, 3 Harris Corporation, 68 Harris Poll, 96 Hayden, Michael, 23, 147, 162 health: effect of constant surveillance on, 127 mass surveillance and, 16, 
41–42 healthcare data, privacy of, 193 HelloSpy, 3, 245 Hewlett-Packard, 112 Hill, Raquel, 44 hindsight bias, 322 Hobbes, Thomas, 210 Home Depot, 110, 116 homosexuality, 97 Hoover, J.


The Art of Computer Programming: Sorting and Searching by Donald Ervin Knuth

card file, Claude Shannon: information theory, complexity theory, correlation coefficient, Donald Knuth, double entry bookkeeping, Eratosthenes, Fermat's Last Theorem, G4S, information retrieval, iterative process, John von Neumann, linked data, locality of reference, Menlo Park, Norbert Wiener, NP-complete, p-value, Paul Erdős, RAND corporation, refrigerator car, sorting algorithm, Vilfredo Pareto, Yogi Berra, Zipf's Law

Example of Wheeler's tree insertion scheme. Modifying the structure slightly with "two-way insertion" cuts the number of moves down to about N²/8. Shellsort cuts the number of comparisons and moves to about N^(7/6), for N in a practical range; as N → ∞ this number can be lowered to order N(log N)². Another way to improve on Algorithm S, using a linked data structure, gave us the list insertion method, which does about N²/4 comparisons, 0 moves, and 2N changes of links. Is it possible to marry the best features of these methods, reducing the number of comparisons to order N log N as in binary insertion, yet reducing the number of moves as in list insertion?

An alert, "modern" reader will note, however, that the whole idea of making digit counts for the storage allocation is tied to old-fashioned ideas about sequential data representation. We know that linked allocation is specifically designed to handle a set of tables of variable size, so it is natural to choose a linked data structure for radix sorting. Since we traverse each pile serially, all we need is a single link from each item to its successor.

Table 1. RADIX SORTING
Input area contents: 503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
Counts for units digit distribution: 1 1 2 3 1 2 1 3 1 1
Storage allocations based on these counts: 1 2 4 7 8 10 11 14 15 16
Auxiliary area contents: 170 061 512 612 503 653 703 154 275 765 426 087 897 677 908 509
Counts for tens digit distribution: 4 2 1 0 0 2 2 3 1 1
Storage allocations based on these counts: 4 6 7 7 7 9 11 14 15 16
Input area contents: 503 703 908 509 512 612 426 653 154 061 765 170 275 677 087 897
Counts for hundreds digit distribution: 2 2 1 0 1 3 3 2 1 1
Storage allocations based on these counts: 2 4 5 5 6 9 12 14 15 16
Auxiliary area contents: 061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908
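The linked alternative the passage argues for replaces the counting and storage-allocation passes with per-digit piles that are simply hooked together between passes. A short illustrative sketch of that idea, with Python queues standing in for the linked piles (this shows the distribution-sorting idea, not Knuth's program):

from collections import deque

def linked_radix_sort(keys, digits=3, base=10):
    # 'order' plays the role of the linked list of records; each pass
    # deals the records into 'base' piles by the current digit, then
    # concatenates the piles in digit order (the linked version would
    # just point each pile's tail at the next pile's head).
    order = list(range(len(keys)))
    for d in range(digits):                  # units, tens, hundreds, ...
        piles = [deque() for _ in range(base)]
        for i in order:                      # traverse the current list serially
            piles[(keys[i] // base ** d) % base].append(i)
        order = [i for pile in piles for i in pile]
    return [keys[i] for i in order]

data = [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
print(linked_radix_sort(data))  # 61 87 154 170 275 426 503 509 512 612 653 677 703 765 897 908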


In the Age of the Smart Machine by Shoshana Zuboff

affirmative action, American ideology, blue-collar work, collective bargaining, computer age, Computer Numeric Control, conceptual framework, data acquisition, demand response, deskilling, factory automation, Ford paid five dollars a day, fudge factor, future of work, industrial robot, information retrieval, interchangeable parts, job automation, lateral thinking, linked data, Marshall McLuhan, means of production, old-boy network, optical character recognition, Panopticon Jeremy Bentham, post-industrial society, RAND corporation, Shoshana Zuboff, social web, The Wealth of Nations by Adam Smith, Thorstein Veblen, union organizing, zero-sum game

Ironically, it means creating a doubly abstract world, where the reference function of the electronic symbols becomes less problematic because of yet another layer of abstractions (mental images) called up to serve as referents. Operators did not appear equally adept at generating an inward image.7 Many seemed unable to link data on the screen to a referential reality. Their interactions with the data were confined to the two-dimensional space of the terminal screen; the electronic symbols were deciphered according to the varying patterns in which they were arrayed. Typically, when asked what the data on the screen meant, these operators would point to distinct data elements and discuss them in terms of their spatial relationships on the screen, as if there were no external referents.


pages: 834 words: 180,700

The Architecture of Open Source Applications by Amy Brown, Greg Wilson

8-hour work day, anti-pattern, bioinformatics, c2.com, cloud computing, collaborative editing, combinatorial explosion, computer vision, continuous integration, create, read, update, delete, David Heinemeier Hansson, Debian, domain-specific language, Donald Knuth, en.wikipedia.org, fault tolerance, finite state, Firefox, friendly fire, functional programming, Guido van Rossum, linked data, load shedding, locality of reference, loose coupling, Mars Rover, MITM: man-in-the-middle, MVC pattern, peer-to-peer, Perl 6, premature optimization, recommendation engine, revision control, Ruby on Rails, side project, Skype, slashdot, social web, speech recognition, the scientific method, The Wisdom of Crowds, web application, WebSocket

If the data changes from one execution to another, a new version is checked in to the repository. Thus, the (uuid, version) tuple is a compound identifier to retrieve the data in any state. In addition, we store the hash of the data as well as the signature of the upstream portion of the workflow that generated it (if it is not an input). This allows one to link data that might be identified differently as well as reuse data when the same computation is run again. The main concern when designing this package was the way users were able to select and retrieve their data. Also, we wished to keep all data in the same repository, regardless of whether it is used as input, output, or intermediate data (an output of one workflow might be used as the input of another).
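As an illustration of the retrieval scheme the excerpt describes (not VisTrails' actual API; all names here are hypothetical), a toy store keyed by the (uuid, version) tuple, with a content-hash index that lets a rerun producing identical data be linked to the existing entry rather than stored again:

import hashlib
import uuid

class DataRepository:
    """Versioned store keyed by (uuid, version), with a hash index for reuse."""
    def __init__(self):
        self.blobs = {}      # (uuid, version) -> bytes
        self.by_hash = {}    # content digest -> (uuid, version)

    def check_in(self, uid, data: bytes):
        digest = hashlib.sha1(data).hexdigest()
        if digest in self.by_hash:           # identical content already checked in:
            return self.by_hash[digest]      # link to the existing entry
        version = 1 + max((v for (u, v) in self.blobs if u == uid), default=0)
        self.blobs[(uid, version)] = data    # new state: new version under same uuid
        self.by_hash[digest] = (uid, version)
        return (uid, version)

    def retrieve(self, uid, version):
        return self.blobs[(uid, version)]    # compound identifier fetches any state

repo = DataRepository()
uid = uuid.uuid4()
key1 = repo.check_in(uid, b"run 1 output")
key2 = repo.check_in(uid, b"run 1 output")   # identical rerun: reuses the entry
assert key1 == key2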


pages: 933 words: 205,691

Hadoop: The Definitive Guide by Tom White

Amazon Web Services, bioinformatics, business intelligence, combinatorial explosion, database schema, Debian, domain-specific language, en.wikipedia.org, fault tolerance, full text search, functional programming, Grace Hopper, information retrieval, Internet Archive, Kickstarter, linked data, loose coupling, openstreetmap, recommendation engine, RFID, SETI@home, social graph, web application

This information is not readily available when crawling. Also, the indexing process benefits from taking into account the anchor text on inlinks so that this text may semantically enrich the text of the current page. As mentioned earlier, Nutch collects the outlink information and then uses this data to build a LinkDb, which contains this reversed link data in the form of inlinks and anchor text. This section presents a rough outline of the implementation of the LinkDb tool—many details have been omitted (such as URL normalization and filtering) in order to present a clear picture of the process. What’s left gives a classical example of why the MapReduce paradigm fits so well with the key data transformation processes required to run a search engine.
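The inversion outlined here maps naturally onto MapReduce: the map step emits each outlink keyed by its target URL, together with the source URL and anchor text, and the reduce step groups by target to yield the inlinks. A single-process Python sketch of that data flow (not Nutch's LinkDb code; URL normalization and filtering are omitted, as in the text):

from collections import defaultdict

# pages: source URL -> list of (outlink URL, anchor text), as a crawler records them
pages = {
    "http://a.example/": [("http://c.example/", "see C"), ("http://b.example/", "B here")],
    "http://b.example/": [("http://c.example/", "C again")],
}

def map_phase(pages):
    for source, outlinks in pages.items():
        for target, anchor in outlinks:
            yield target, (source, anchor)    # invert: key by the target URL

def reduce_phase(pairs):
    linkdb = defaultdict(list)
    for target, inlink in pairs:
        linkdb[target].append(inlink)         # group all inlinks per page
    return dict(linkdb)

linkdb = reduce_phase(map_phase(pages))
# linkdb["http://c.example/"] == [("http://a.example/", "see C"),
#                                 ("http://b.example/", "C again")]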


pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil

additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, backpropagation, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business cycle, business intelligence, c2.com, call centre, carbon-based life, cellular automata, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, coronavirus, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, disintermediation, double helix, Douglas Hofstadter, en.wikipedia.org, epigenetics, factory automation, friendly AI, functional programming, George Gilder, Gödel, Escher, Bach, informal economy, information retrieval, invention of the telephone, invention of the telescope, invention of writing, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Marshall McLuhan, Mikhail Gorbachev, Mitch Kapor, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Norbert Wiener, oil shale / tar sands, optical character recognition, pattern recognition, phenotype, premature optimization, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Robert Metcalfe, Rodney Brooks, scientific worldview, Search for Extraterrestrial Intelligence, selection bias, semantic web, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, Thomas Bayes, transaction costs, Turing machine, Turing test, two and twenty, Vernor Vinge, Y2K, Yogi Berra

Resources and Contact Information

Singularity.com
New developments in the diverse fields discussed in this book are accumulating at an accelerating pace. To help you keep pace, I invite you to visit Singularity.com, where you will find:
·Recent news stories
·A compilation of thousands of relevant news stories going back to 2001 from KurzweilAI.net (see below)
·Hundreds of articles on related topics from KurzweilAI.net
·Research links
·Data and citation for all graphs
·Material about this book
·Excerpts from this book
·Online endnotes

KurzweilAI.net
You are also invited to visit our award-winning Web site, KurzweilAI.net, which includes over six hundred articles by over one hundred "big thinkers" (many of whom are cited in this book), thousands of news articles, listings of events, and other features.


pages: 897 words: 242,580

The Temporal Void by Peter F. Hamilton

corporate governance, dark matter, forensic accounting, linked data, megacity, place-making, trade route

The Yenisey couldn’t even get an accurate quantum signature scan to determine what kind of drive it used. ‘Admiral,’ Lucian called urgently. ‘We can’t—’ The unknown ship fired. ‘What the fuck was that!’ Gore yelled as the secure link abruptly vanished. Kazimir took a second to review the TD link data, he was so surprised. His tactical staff had produced a number of scenarios, mostly incorporating the Ocisens utilizing weapons technology they’d procured from a more advanced species. This hadn’t been a remote consideration. ‘I don’t recognize that design at all,’ Ilanthe said. ‘Do we have any spherical ship on the Navy’s intelligence registry?’


pages: 903 words: 235,753

The Stack: On Software and Sovereignty by Benjamin H. Bratton

1960s counterculture, 3D printing, 4chan, Ada Lovelace, additive manufacturing, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, basic income, Benevolent Dictator For Life (BDFL), Berlin Wall, bioinformatics, Biosphere 2, bitcoin, blockchain, Buckminster Fuller, Burning Man, call centre, carbon footprint, carbon-based life, Cass Sunstein, Celebration, Florida, charter city, clean water, cloud computing, connected car, corporate governance, crowdsourcing, cryptocurrency, dark matter, David Graeber, deglobalization, dematerialisation, disintermediation, distributed generation, don't be evil, Douglas Engelbart, Douglas Engelbart, Edward Snowden, Elon Musk, en.wikipedia.org, Eratosthenes, Ethereum, ethereum blockchain, facts on the ground, Flash crash, Frank Gehry, Frederick Winslow Taylor, functional programming, future of work, Georg Cantor, gig economy, global supply chain, Google Earth, Google Glasses, Guggenheim Bilbao, High speed trading, Hyperloop, Ian Bogost, illegal immigration, industrial robot, information retrieval, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invisible hand, Jacob Appelbaum, Jaron Lanier, Joan Didion, John Markoff, Joi Ito, Jony Ive, Julian Assange, Khan Academy, Kim Stanley Robinson, liberal capitalism, lifelogging, linked data, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, McMansion, means of production, megacity, megastructure, Menlo Park, Minecraft, MITM: man-in-the-middle, Monroe Doctrine, Network effects, new economy, offshore financial centre, oil shale / tar sands, packet switching, PageRank, pattern recognition, peak oil, peer-to-peer, performance metric, personalized medicine, Peter Eisenman, Peter Thiel, phenotype, Philip Mirowski, Pierre-Simon Laplace, place-making, planetary scale, RAND corporation, recommendation engine, reserve currency, RFID, Robert Bork, Sand Hill Road, self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley ideology, Slavoj Žižek, smart cities, smart grid, smart meter, social graph, software studies, South China Sea, sovereign wealth fund, special economic zone, spectrum auction, Startup school, statistical arbitrage, Steve Jobs, Steven Levy, Stewart Brand, Stuxnet, Superbowl ad, supply-chain management, supply-chain management software, TaskRabbit, the built environment, The Chicago School, the scientific method, Torches of Freedom, transaction costs, Turing complete, Turing machine, Turing test, undersea cable, universal basic income, urban planning, Vernor Vinge, Washington Consensus, web application, Westphalian system, WikiLeaks, working poor, Y Combinator

Through various combinations of open or proprietary exigetics of data, and perhaps a sequence of application programming interfaces (APIs), a query entered as “book me a ticket to New York” can activate a series of secondary inquiries to calendars, banks, flight schedules, airline databases, bank accounts, and so on and, through this, initiate the cascading programming resulting in that booking. For this, to search is also to program. Such tidy consumer use cases require enormously difficult standardizations of interoperability between competitive services (not to mention beyond-Esperanto level standardization of all Users’ conceptual taxonomies). The goal of linking data into semantically relevant and accessible structures so that “search” would also provide more actionable results, and in turn allowing queries to program those results for specific ends, remains compelling for search engines, if less so for individual down-service-stream providers, such as airlines and banks, which see their business absorbed into a handful of search platforms.20 By comparison, physical search may be based on a similar tissue of interrelation between addressable entities—in this case, a mix of physical things and data of interest—and might be a necessary condition of a really viable Internet of Things or SPIME space.


Data Mining: Concepts and Techniques: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

backpropagation, bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, disinformation, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

.; Ronkainen, P.; Toivonen, H.; Verkamo, A.I., Finding interesting rules from large sets of discovered association rules, In: Proc. 3rd Int. Conf. Information and Knowledge Management Gaithersburg, MD. (Nov. 1994), pp. 401–408. [KMS03] Kubica, J.; Moore, A.; Schneider, J., Tractable group detection on large link data sets, In: Proc. 2003 Int. Conf. Data Mining (ICDM’03) Melbourne, FL. (Nov. 2003), pp. 573–576. [KN97] Knorr, E.; Ng, R., A unified notion of outliers: Properties and computation, In: Proc. 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD’97) Newport Beach, CA. (Aug. 1997), pp. 219–222. [KNNL04] Kutner, M.H.; Nachtsheim, C.J.; Neter, J.; Li, W., Applied Linear Statistical Models with Student CD. (2004) Irwin .


pages: 918 words: 257,605

The Age of Surveillance Capitalism by Shoshana Zuboff

algorithmic bias, Amazon Web Services, Andrew Keen, augmented reality, autonomous vehicles, barriers to entry, Bartolomé de las Casas, Berlin Wall, bitcoin, blockchain, blue-collar work, book scanning, Broken windows theory, California gold rush, call centre, Capital in the Twenty-First Century by Thomas Piketty, Cass Sunstein, choice architecture, citizen journalism, cloud computing, collective bargaining, Computer Numeric Control, computer vision, connected car, corporate governance, corporate personhood, creative destruction, cryptocurrency, disinformation, dogs of the Dow, don't be evil, Donald Trump, Edward Snowden, en.wikipedia.org, Erik Brynjolfsson, facts on the ground, Ford paid five dollars a day, future of work, game design, Google Earth, Google Glasses, Google X / Alphabet X, hive mind, Ian Bogost, impulse control, income inequality, Internet of things, invention of the printing press, invisible hand, Jean Tirole, job automation, Johann Wolfgang von Goethe, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, knowledge economy, linked data, longitudinal study, low skilled workers, Mark Zuckerberg, market bubble, means of production, multi-sided market, Naomi Klein, natural language processing, Network effects, new economy, Occupy movement, off grid, PageRank, Panopticon Jeremy Bentham, pattern recognition, Paul Buchheit, performance metric, Philip Mirowski, precision agriculture, price mechanism, profit maximization, profit motive, recommendation engine, refrigerator car, RFID, Richard Thaler, ride hailing / ride sharing, Robert Bork, Robert Mercer, Second Machine Age, self-driving car, sentiment analysis, shareholder value, Shoshana Zuboff, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, slashdot, smart cities, Snapchat, social graph, social web, software as a service, speech recognition, statistical model, Steve Bannon, Steve Jobs, Steven Levy, structural adjustment programs, surveillance capitalism, The Future of Employment, The Wealth of Nations by Adam Smith, Tim Cook: Apple, two-sided market, union organizing, Watson beat the top human players on Jeopardy!, winner-take-all economy, Wolfgang Streeck, Yochai Benkler, you are the product

Conlee, “How Automation and Analytics Are Changing Customer Care,” Conduent Blog, July 18, 2016, https://www.blogs.conduent.com/2016/07/18/how-automation-and-analytics-are-changing-customer-care; Ryan Knutson, “Call Centers May Know a Surprising Amount About You,” Wall Street Journal, January 6, 2017, http://www.wsj.com/articles/that-anonymous-voice-at-the-call-center-they-may-know-a-lot-about-you-1483698608. 74. Nicholas Confessore and Danny Hakim, “Bold Promises Fade to Doubts for a Trump-Linked Data Firm,” New York Times, March 6, 2017, https://www.nytimes.com/2017/03/06/us/politics/cambridge-analytica.html; Mary-Ann Russon, “Political Revolution: How Big Data Won the US Presidency for Donald Trump,” International Business Times UK, January 20, 2017, http://www.ibtimes.co.uk/political-revolution-how-big-data-won-us-presidency-donald-trump-1602269; Grassegger and Krogerus, “The Data That Turned the World Upside Down”; Carole Cadwalladr, “Revealed: How US Billionaire Helped to Back Brexit,” Guardian, February 25, 2017, https://www.theguardian.com/politics/2017/feb/26/us-billionaire-mercer-helped-back-brexit; Paul-Olivier Dehaye, “The (Dis)Information Mercenaries Now Controlling Trump’s Databases,” Medium, January 3, 2017, https://medium.com/personaldata-io/the-dis-information-mercenaries-now-controlling-trumps-databases-4f6a20d4f3e7; Harry Davies, “Ted Cruz Using Firm That Harvested Data on Millions of Unwitting Facebook Users,” Guardian, December 11, 2015, https://www.theguardian.com/us-news/2015/dec/11/senator-ted-cruz-president-campaign-facebook-user-data. 75.