linked data

28 results

pages: 315 words: 70,044

Learning SPARQL by Bob DuCharme


database schema, linked data, semantic web, SPARQL, web application

For example, simply knowing that “spouse” is a symmetric term made it possible to find out the identity of Cindy’s spouse, even though this fact was not part of the dataset. Linked Data The idea of Linked Data is newer than that of the semantic web, but sometimes it’s easier to think of the semantic web as building on the ideas behind Linked Data. Linked Data is not a specification, but a set of best practices for providing a data infrastructure that makes it easier to share data across the web. You can then use semantic web technologies such as RDFS, OWL, and SPARQL to build applications around that data. Tim Berners-Lee came up with these four principles of Linked Data in 2006 (I’ve bolded his wording and added my own commentary): Use URIs as names for things. URIs are the best way available to uniquely identify things, and therefore to identify connections between things.
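The symmetric-property example at the start of this excerpt can be illustrated with a minimal Python sketch. It assumes triples are plain tuples and that "spouse" has been declared symmetric; the name `richard`, the email value, and the helper `infer_symmetric` are hypothetical and not taken from the book. A real system would more likely declare the property with OWL and let a reasoner do this work.

```python
# Hypothetical data: the book only says Cindy's spouse can be inferred,
# so the names and the helper below are illustrative assumptions.
SYMMETRIC_PROPERTIES = {"spouse"}


def infer_symmetric(triples, symmetric_properties):
    """Return the input triples plus the reverse of every symmetric one."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in symmetric_properties:
            # If A spouse B holds and spouse is symmetric, B spouse A holds too.
            inferred.add((obj, predicate, subject))
    return inferred


data = {
    ("richard", "spouse", "cindy"),
    ("richard", "email", "richard@example.org"),
}
closed = infer_symmetric(data, SYMMETRIC_PROPERTIES)
```

After the call, `closed` also contains the triple stating that Cindy's spouse is Richard, although that fact was never stored explicitly.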

., Checking, Adding, and Removing Spoken Language Tags langMatches(), Checking, Adding, and Removing Spoken Language Tags language codes, Making RDF More Readable with Language Tags and Labels, Using the Labels Provided by DBpedia, Checking, Adding, and Removing Spoken Language Tags, Checking, Adding, and Removing Spoken Language Tags checking, adding, and removing, Checking, Adding, and Removing Spoken Language Tags, Checking, Adding, and Removing Spoken Language Tags filtering on, Using the Labels Provided by DBpedia LCASE(), String Functions LIMIT, Retrieving a Specific Number of Results, Federated Queries: Searching Multiple Datasets with One Query Linked Data, What Exactly Is the “Semantic Web”?, Linked Data, Linked Data, Linked Data, Public Endpoints, Private Endpoints, Public Endpoints, Private Endpoints, Glossary intranets and, Public Endpoints, Private Endpoints Linked Open Data, Linked Data, Public Endpoints, Private Endpoints Linked Movie Database, SPARQL and Web Application Development, SPARQL and Web Application Development literal, Data Typing, Glossary LOAD, Adding Data to a Dataset local name, URLs, URIs, IRIs, and Namespaces, Glossary M MAX(), Finding the Smallest, the Biggest, the Count, the Average...

o as variable names, Searching for Strings [], Blank Nodes and Why They’re Useful (see square braces) ^ in property paths, Searching Further in the Data ^^ datatype indicator, Datatypes and Queries _ in blank node names, Blank Nodes and Why They’re Useful | in property paths, Searching Further in the Data || in boolean expressions, Program Logic Functions “"” to delimit strings in Turtle and SPARQL, Representing Strings A a (“a”) as keyword, Reusing and Creating Vocabularies: RDF Schema and OWL abs(), Numeric Functions addition, Comparing Values and Doing Arithmetic AGROVOC thesaurus, Datatypes and Queries APIs, SPARQL, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT arithmetic, Comparing Values and Doing Arithmetic, Comparing Values and Doing Arithmetic ARQ SPARQL processor, Querying the Data, Standalone Processors application development and, Standalone Processors AS, Combining Values and Assigning Values to Variables ASK, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT, Defining Rules with SPARQL, Defining Rules with SPARQL SPARQL rules and, Defining Rules with SPARQL, Defining Rules with SPARQL asterisk, Searching for Strings, Searching Further in the Data in property paths, Searching Further in the Data in SELECT expression, Searching for Strings AVG(), Finding the Smallest, the Biggest, the Count, the Average..., Grouping Data and Finding Aggregate Values within Groups B bad data, finding, Finding Bad Data, Using Existing SPARQL Rules Vocabularies BASE, Node Type Conversion Functions Berners-Lee, Tim, Why Learn SPARQL?, What Exactly Is the “Semantic Web”?, Linked Data Linked Data and, Linked Data biggest value, finding, Finding the Smallest, the Biggest, the Count, the Average..., Finding the Smallest, the Biggest, the Count, the Average... 
BIND, Combining Values and Assigning Values to Variables, Creating New Data, Comparing Values and Doing Arithmetic in CONSTRUCT queries, Creating New Data binding, More Realistic Data and Matching on Multiple Triples, Glossary, Glossary blank nodes, Blank Nodes and Why They’re Useful, Blank Nodes and Why They’re Useful, Blank Nodes and Why They’re Useful, Searching with Blank Nodes, Using Existing SPARQL Rules Vocabularies, Node Type Conversion Functions, Glossary searching with, Searching with Blank Nodes square braces to represent, Using Existing SPARQL Rules Vocabularies bnode, Blank Nodes and Why They’re Useful (see blank nodes) boolean datatype, Datatypes and Queries bound(), Finding Data That Doesn’t Meet Certain Conditions, Node Type and Datatype Checking Functions C cast, Glossary casting, Functions ceil(), Numeric Functions CGI scripts, SPARQL and Web Application Development classes, Reusing and Creating Vocabularies: RDF Schema and OWL, Reusing and Creating Vocabularies: RDF Schema and OWL, Creating New Data subclasses and, Reusing and Creating Vocabularies: RDF Schema and OWL CLEAR, Deleting Data COALESCE(), Program Logic Functions comma, Storing RDF in Files, Converting Data CONSTRUCT queries and, Converting Data in N3 and Turtle, Storing RDF in Files comma separated values, Standalone Processors comments (in Turtle and SPARQL), The Data to Query CONCAT(), Program Logic Functions CONSTRUCT, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT, Copying Data, Converting Data, Changing Existing Data prototyping update commands with, Changing Existing Data CONTAINS(), String Functions, String Functions, Extension Functions converting data, Converting Data, Converting Data copying data, Copying Data, Copying Data COUNT(), Finding the Smallest, the Biggest, the Count, the Average..., Grouping Data and Finding Aggregate Values within Groups CSS, SPARQL and Web Application Development curl utility, SPARQL and Web Application Development D D2RQ, Querying a Remote 
SPARQL Service, Middleware SPARQL Support data cleanup, FILTERing Data Based on Conditions data typing, Data Typing, Data Typing datatype(), Defining Rules with SPARQL, Node Type and Datatype Checking Functions datatypes, Datatypes and Queries, Datatype Conversion, Datatype Conversion converting, Datatype Conversion, Datatype Conversion custom, Datatypes and Queries date datatype, Datatypes and Queries date ranges in queries, Comparing Values and Doing Arithmetic dateTime datatype, Datatypes and Queries day(), Date and Time Functions DBpedia, Querying a Public Data Source, Using the Labels Provided by DBpedia, SPARQL and Web Application Development querying, Querying a Public Data Source decimal datatype, Datatypes and Queries default graph, Querying Named Graphs, Glossary DELETE, Deleting Data DELETE DATA, Deleting Data, Deleting Data DELETE vs., Deleting Data DESC(), Sorting Data DESCRIBE, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT, Asking for a Description of a Resource DISTINCT, Eliminating Redundant Output, Eliminating Redundant Output, Querying Named Graphs division, Comparing Values and Doing Arithmetic double precision datatype, Datatypes and Queries DROP, Dropping Graphs Dublin Core, URLs, URIs, IRIs, and Namespaces, Changing Existing Data, Glossary E ENCODE_FOR_URI(), String Functions entailment, The SPARQL Specifications, Glossary F FILTER, Searching for Strings, FILTERing Data Based on Conditions, FILTERing Data Based on Conditions float datatype, Datatypes and Queries floor(), Numeric Functions FOAF (Friend of a Friend), URLs, URIs, IRIs, and Namespaces, Storing RDF in Files, Converting Data, Hash Functions, Glossary hash functions in, Hash Functions Freebase, SPARQL and Web Application Development FROM, Querying the Data, Querying Named Graphs, Copying Data in CONSTRUCT queries, Copying Data FROM NAMED, Querying Named Graphs Fuseki, Getting Started with Fuseki, Getting Started with Fuseki, Adding Data to a Dataset loading data into, Adding Data 
to a Dataset shutting down, Getting Started with Fuseki starting up, Getting Started with Fuseki G GRAPH, Querying Named Graphs, Querying Named Graphs, Querying Named Graphs, Copying Data, Named Graphs in CONSTRUCT queries, Copying Data in update queries, Named Graphs referencing graphs not named in FROM NAMED clause, Querying Named Graphs variables with, Querying Named Graphs graph pattern, More Realistic Data and Matching on Multiple Triples, Glossary graphs (RDF), More Realistic Data and Matching on Multiple Triples, Glossary GROUP BY, Grouping Data and Finding Aggregate Values within Groups GROUP_CONCAT(), Finding the Smallest, the Biggest, the Count, the Average...


The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin


business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, discrete time, George Gilder, Google Earth, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, late capitalism, linked data, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

The key to avoiding the creation of such a negative cycle is to ensure that any initiative focuses as much on the demand-side as the supply-side, providing users with interoperable data and analytic tools and other services that facilitate use and add value to the data, rather than simply linking to files. Conclusion At one level, the case for open and linked data is commonsensical – open data create transparency and accountability; participation, choice and social innovation; efficiency, productivity and enhanced governance; economic innovation and wealth creation. Linked data convert information across the Internet into a semantic web from which data can be machine-read and linked together. Open and linked data thus hold much promise and value as a venture. However, the case for open and linked data is more complex, and their economic underpinnings are not at all straightforward. Open and linked data might seem to have marginal costs, but their production and the technical and institutional apparatus needed to facilitate and maintain them has real cost in terms of labour, equipment, and resources.

When documents are published in this way, information on the Internet can be rendered and repackaged as data and can be linked in an infinite number of ways depending on purpose. However, as P. Miller (2010) notes, ‘linked data may be open, and open data may be linked, but it is equally possible for linked data to carry licensing or other restrictions that prevent it being considered open’, or for open data to be made available in ways that do not easily enable linking. In general, any linked documents that are not on an intranet or behind a pay wall are also open in nature. For Berners-Lee (2009), open and linked data should ideally be synonymous and he sets out five levels of such data, each with progressively more utility and value (see Table 3.3). His aspiration is for what he terms five-star (level five) data – a fully operational semantic Web.

Since the late 2000s the movement has noticeably gained prominence and traction, initially with the Guardian newspaper’s campaign in the UK to ‘Free Our Data’ (, the Organization for Economic Cooperation and Development (OECD)’s call for member governments to open up their data in 2008, the launch in 2009 by the US government of, a website designed to provide access to non-sensitive and historical datasets held by US state and federal agencies, and the development of linked data and the promotion of the ‘Semantic Web’ as a standard element of future Internet technologies, in which open and linked data are often discursively conjoined (Berners-Lee 2009). Since 2010 dozens of countries and international organisations (e.g., the European Union [EU] and the United Nations Development Programme [UNDP]) have followed suit, making thousands of previously restricted datasets open in nature for non-commercial and commercial use (see DataRemixed 2013).


pages: 511 words: 111,423

Learning SPARQL by Bob Ducharme

Amazon:, hypertext link, linked data, place-making, semantic web, SPARQL, web application

For example, simply knowing that “spouse” is a symmetric term made it possible to find out the identity of Cindy’s spouse, even though this fact was not part of the dataset. We’ll learn more about RDFS and OWL in Chapter 9. Linked Data The idea of Linked Data is newer than that of the semantic web, but sometimes it’s easier to think of the semantic web as building on the ideas behind Linked Data. Linked Data is not a specification, but a set of best practices for providing a data infrastructure that makes it easier to share data across the Web. You can then use semantic web technologies such as RDFS, OWL, and SPARQL to build applications around that data. Tim Berners-Lee came up with these four principles of Linked Data in 2006 (I’ve bolded his wording and added my own commentary): Use URIs as names for things. URIs are the best way available to uniquely identify things, and therefore to identify connections between things.

., Checking, Adding, and Removing Spoken Language Tags langMatches(), Checking, Adding, and Removing Spoken Language Tags language codes, Making RDF More Readable with Language Tags and Labels, Checking, Adding, and Removing Spoken Language Tags–Checking, Adding, and Removing Spoken Language Tags adding, Checking, Adding, and Removing Spoken Language Tags checking, Checking, Adding, and Removing Spoken Language Tags filtering on, Using the Labels Provided by DBpedia removing, Checking, Adding, and Removing Spoken Language Tags LCASE(), String Functions, Discussion LIMIT, Retrieving a Specific Number of Results, Federated Queries: Searching Multiple Datasets with One Query Linked Data, What Exactly Is the “Semantic Web”?, Linked DataLinked Data, Problem, Glossary intranets and, Public Endpoints, Private Endpoints Linked Open Data, Linked Data, Public Endpoints, Private Endpoints Linked Movie Database, SPARQL and Web Application Development, SPARQL and Web Application Development Linked Open Data, Discussion List All Triples query, Named Graphs literal, Data Typing, Glossary LOAD, Adding Data to a Dataset local name, URLs, URIs, IRIs, and Namespaces, Extension Functions, Glossary M magic properties (see property functions) materialization of triples, Inferred Triples and Your Query MAX(), Finding the Smallest, the Biggest, the Count, the Average...

If one airline redesigns their website, the developer must update his screen-scraping program to account for these differences. Berners-Lee came up with the idea of Linked Data as a set of best practices for sharing data across the web infrastructure so that applications can more easily retrieve data from public sites with no need for screen scraping—for example, to let your calendar program get flight information from multiple airline websites in a common, machine-readable format. These best practices recommend the use of URIs to name things and the use of standards such as RDF and SPARQL. They provide excellent guidelines for the creation of an infrastructure for the semantic web. and the semantics of that data The idea of “semantics” is often defined as “the meaning of words.” Linked Data principles and the related standards make it easier to share data, and the use of URIs can provide a bit of semantics by providing the context of a term.


pages: 369 words: 80,355

Too Big to Know: Rethinking Knowledge Now That the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room by David Weinberger


airport security, Alfred Russel Wallace, Amazon Mechanical Turk, Berlin Wall, Black Swan, book scanning, Cass Sunstein, corporate social responsibility, crowdsourcing, Danny Hillis, David Brooks, Debian, double entry bookkeeping, double helix, Exxon Valdez, Fall of the Berlin Wall, future of journalism, Galaxy Zoo, Hacker Ethic, Haight Ashbury, hive mind, Howard Rheingold, invention of the telegraph, jimmy wales, John Harrison: Longitude, Kevin Kelly, linked data, Netflix Prize, New Journalism, Nicholas Carr, Norbert Wiener, openstreetmap, P = NP, Pluto: dwarf planet, profit motive, Ralph Waldo Emerson, RAND corporation, Ray Kurzweil, Republic of Letters, RFID, Richard Feynman, Ronald Reagan, semantic web, slashdot, social graph, Steven Pinker, Stewart Brand, technological singularity, Ted Nelson, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas Malthus, Whole Earth Catalog, X Prize

This approach may be messy and imperfect, but it is 100 percent better than not releasing data because you haven’t figured out how to get the metadata perfectly right. The rise of Linked Data encapsulates the transformation of knowledge we have explored throughout this book. While the original Semantic Web emphasized building ontologies that are “knowledge representations” of the world, it turns out that if we go straight to unleashing an abundance of linked but imperfect data, making it widely and openly available in standardized form, the Net becomes a dramatically improved infrastructure for knowledge. Linked Data is nevertheless itself only an example of a more expansive practice: Create metadata so your information can be reused. Linked Data is usable because it points beyond itself to information about the information. That’s how a “triple” about mercury can be identified as being about the chemical, the planet, or the Roman god.

For example, when an article in the journal Public Library of Science Medicine 43 examines “the predictors of live birth” in in vitro fertilization by analyzing 144,018 attempts, it links to the UK open government site where the source data—“the world’s oldest and most comprehensive database of fertility treatment in the UK”—is available.44 The new default is: If you’re going to cite the data, you might as well link to it. Networked facts point to where they came from and, sometimes, where they lead to. Indeed, a new standard called Linked Data is making it easier to make the facts presented in one site useful to other sites in unanticipated ways—enabling an ad hoc worldwide data commons. Key to Linked Data is the ability for a computer program not only to get the fact but to ask the resource for a link to more information about the context of the fact.45 Facts have become networked because our new information infrastructure happens also to be a hyperlinked publishing system. If you’re going to make a fact visible, it’s so easy to link it to its source that you’ll need some special justification not to do so.

Rather, it is that “[t]rust should have no part in science.” We used to need trust because paper-based publishing breaks knowledge off from its source. Now, however, science—which has always had a network of inter-cited publications—occurs within a network of links. We create these links by hand, computers prowl the Web suggesting new links, and the surge of interest in the Linked Data format is making it easier than ever to create clouds of linked data just waiting for new uses. In this hyperlinked environment, we will continue to tell science’s stories, but those stories will be embedded within a system of connections. We will click to see the data. We will click to have our computers compare disparate datasets, surfacing the anomalies and disagreements that will never be entirely driven out from the data of science or from its stories.


pages: 223 words: 52,808

Intertwingled: The Work and Influence of Ted Nelson (History of Computing) by Douglas R. Dechow


3D printing, Apple II, Bill Duvall, Brewster Kahle, Buckminster Fuller, Claude Shannon: information theory, cognitive dissonance, computer age, conceptual framework, Douglas Engelbart, Dynabook, Edward Snowden, game design, HyperCard, hypertext link, information retrieval, Internet Archive, Jaron Lanier, knowledge worker, linked data, Marshall McLuhan, Menlo Park, Mother of all demos, pre–internet, RAND corporation, semantic web, Silicon Valley, software studies, Steve Jobs, Steve Wozniak, Stewart Brand, Ted Nelson, the medium is the message, Vannevar Bush, Wall-E, Whole Earth Catalog

So for me this really was a seminal conference with so many truly ground breaking ideas emerging at the same time, apparently orthogonal to each other but actually all the same thing as time has confirmed, since the Google Knowledge Graph is the Semantic Web or ZigZag by another name. It’s all about linking data. This is a much quieter revolution than that initiated by the document Web but it will be much more far reaching. Linked data will become an integral part of the development of data-driven systems architectures that will revolutionize the way we build and maintain information management systems. Linked data architectures will supersede relational databases, make websites easier to build and unify the worlds of hypertext, document management, and databases to create rich interlinked knowledge-based systems as envisaged by the pioneers such as Ted and Doug over 50 years ago. But the linked data revolution was very slow to take off—largely because it’s hard to explain the key concepts to people and what the benefits are.

Open Access This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

References

1. Agosti M, Ferro N (2007) A formal model of annotations of digital content. ACM Trans Inf Syst 26(1). doi:10.1145/1292591.1292594
2. Baca M (1998) Introduction to metadata: pathways to digital information. Getty Information Institute, Los Angeles
3. Bechhofer S, Buchan I, De Roure D, Missier P, Ainsworth J, Bhagat J, Goble C et al (2013) Why linked data is not enough for scientists. Futur Gener Comput Syst 29(2). Special section: Recent advances in e-Science: 599–611. doi:10.1016/j.future.2011.08.004
4. Bechhofer S, De Roure D, Gamble M, Goble C, Buchan I (2010) Research objects: towards exchange and reuse of digital knowledge. Nat Proc. doi:10.1038/npre.2010.4626.1
5. Bell G (2001) A personal digital store. Commun ACM 44(1):86–91. doi:10.1145/357489.357513

Three things happened at that conference as I recall. Tim started talking about the Semantic Web again in his keynote for the conference. He had talked about it at the first WWW conference in 1994 [1] and the idea of making links on data in the information management proposal he wrote in 1989. As far as he was concerned in 1998, the web of linked documents was beginning to emerge but his vision wasn’t complete until it was also a web of linked data, and so he started to re-educate the community about this at the Brisbane conference. Ted was also at the Brisbane conference to pick up a special award. I remember him demoing ZigZag to us in the bar one night at that conference. He was so excited, and we were all mesmerized. So I had heard Tim talk about the Semantic Web and I saw Ted demo ZigZag at the same conference, and I didn’t fully appreciate either of them at the time.


pages: 100 words: 15,500

Getting Started with D3 by Mike Dewar


Firefox, Google Chrome, linked data, margin call

First, we lay out the circles and edges:

    var width = 1500,
        height = 1500;

    // Create the SVG canvas inside the page body.
    var svg = d3.select("body")
        .append("svg")
        .attr("width", width)
        .attr("height", height);

    // One circle per node in the data.
    var node = svg.selectAll("circle.node")
        .data(data.nodes)
        .enter()
        .append("circle")
        .attr("class", "node")
        .attr("r", 12);

    // One line per link; the selector matches the elements appended below.
    var link = svg.selectAll("line")
        .data(data.links)
        .enter().append("line")
        .style("stroke", "black");

This populates the web page with the appropriate elements; now we just need to lay them out. The force layout applies a force-directed algorithm to decide the position of each node. Here, each node feels a repulsive force from every other node, but is constrained by the edges that keep nodes connected together. This can result in an organic layout that looks wonderfully inviting as it unfolds. D3 makes it easy; first we instantiate the algorithm:

    var force = d3.layout.force()
        .charge(-120)
        .linkDistance(30)
        .size([width, height])
        .nodes(data.nodes)
        .links(data.links)
        .start();

These methods are all custom methods for the algorithm that detail the various parameters and references the algorithm needs to compute how the position of the nodes and edges should change.
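To make the idea of a force-directed layout more concrete, here is a minimal Python sketch of the general technique: pairwise repulsion plus spring-like links. It is not D3's actual algorithm, and the function name `force_layout`, its parameters, and the constants are assumptions chosen for illustration.

```python
import math
import random


def force_layout(nodes, links, width, height, iterations=200,
                 repulsion=500.0, link_distance=30.0, step=0.05):
    """Return {node: [x, y]} after a simple force-directed relaxation."""
    rng = random.Random(0)  # fixed seed so repeated runs give the same layout
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}

    for _ in range(iterations):
        push = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes repels, more strongly when close together.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-6
                force = repulsion / (dist * dist)
                push[a][0] += dx / dist * force
                push[a][1] += dy / dist * force
                push[b][0] -= dx / dist * force
                push[b][1] -= dy / dist * force

        # Each link acts like a spring pulling its ends toward link_distance.
        for a, b in links:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-6
            stretch = dist - link_distance
            push[a][0] += dx / dist * stretch
            push[a][1] += dy / dist * stretch
            push[b][0] -= dx / dist * stretch
            push[b][1] -= dy / dist * stretch

        # Move every node a small step along its accumulated force.
        for n in nodes:
            pos[n][0] += step * push[n][0]
            pos[n][1] += step * push[n][1]

    return pos


layout = force_layout(["a", "b"], [("a", "b")], width=100, height=100)
gap = math.hypot(layout["a"][0] - layout["b"][0],
                 layout["a"][1] - layout["b"][1])
```

For two linked nodes the layout settles where the spring's pull balances the repulsion, so `gap` ends up slightly above `link_distance`.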


pages: 713 words: 93,944

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Eric Redmond, Jim Wilson, Jim R. Wilson


Amazon Web Services, create, read, update, delete, data is the new oil, database schema, Debian, domain-specific language, fault tolerance, full text search, general-purpose programming language, linked data, MVC pattern, natural language processing, node package manager, random walk, recommendation engine, Skype, social graph, web application

For example, if the text of the article on Star Wars contains the string "[[Yoda|jedi master]]", we want to store that relationship twice: once as an outgoing link from Star Wars and once as an incoming link to Yoda. Storing the relationship twice means that it’s fast to look up both a page’s outgoing links and its incoming links. To store this additional link data, we’ll create a new table. Head over to the shell and enter this:

    hbase> create 'links', {
      NAME => 'to', VERSIONS => 1, BLOOMFILTER => 'ROWCOL'
    },{
      NAME => 'from', VERSIONS => 1, BLOOMFILTER => 'ROWCOL'
    }

In principle, we could have chosen to shove the link data into an existing column family or merely added one or more additional column families to the wiki table, rather than create a new one. Creating a separate table has the advantage that the tables have separate regions. This means that the cluster can more effectively split regions as necessary.
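The "store it twice" idea can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the two dictionaries stand in for the 'to' and 'from' column families, and the regular expression and the helper `index_links` are hypothetical, not code from the book.

```python
import re
from collections import defaultdict

# Matches wiki links such as [[Yoda|jedi master]] or [[Yoda]].
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")


def index_links(title, text, outgoing, incoming):
    """Record every wiki link in text once per direction."""
    for match in WIKI_LINK.finditer(text):
        target = match.group(1)
        label = match.group(2) or target
        outgoing[title][target] = label   # plays the role of the 'to' family
        incoming[target][title] = label   # plays the role of the 'from' family


outgoing = defaultdict(dict)
incoming = defaultdict(dict)
index_links("Star Wars", "... [[Yoda|jedi master]] ...", outgoing, incoming)
```

Looking up `outgoing["Star Wars"]` or `incoming["Yoda"]` is then a single key access in either direction, which is the benefit the duplicated storage buys.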

A graph database consists of nodes and relationships between nodes. Both nodes and relationships can have properties (key-value pairs) that store data. The real strength of graph databases is traversing through the nodes by following relationships. In Chapter 7, Neo4J, we discuss the most popular graph database today, Neo4J.

Neo4J

One operation where other databases often fall flat is crawling through self-referential or otherwise intricately linked data. This is exactly where Neo4J shines. The benefit of using a graph database is the ability to quickly traverse nodes and relationships to find relevant data. Often found in social networking applications, graph databases are gaining traction for their flexibility, with Neo4j as a pinnacle implementation.

Polyglot

In the wild, databases are often used alongside other databases. It’s still common to find a lone relational database, but over time it is becoming popular to use several databases together, leveraging their strengths to create an ecosystem that is more powerful, capable, and robust than the sum of its parts.
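The node-and-relationship model described above can be sketched with plain Python data structures. This is not how Neo4J stores or queries data; the sample people, the relationship types, and the helper `traverse` are all hypothetical.

```python
from collections import deque

# Nodes and relationships both carry properties (key-value pairs).
nodes = {
    "alice": {"name": "Alice"},
    "bob": {"name": "Bob"},
    "carol": {"name": "Carol"},
    "dave": {"name": "Dave"},
}
relationships = [
    ("alice", "FRIEND", "bob", {"since": 2010}),
    ("bob", "FRIEND", "carol", {"since": 2012}),
    ("carol", "FRIEND", "alice", {"since": 2013}),  # closes a cycle
    ("alice", "WORKS_WITH", "dave", {}),
]


def traverse(start, rel_type, max_depth):
    """Breadth-first walk along one relationship type, up to max_depth hops."""
    adjacency = {}
    for source, kind, target, _props in relationships:
        if kind == rel_type:
            adjacency.setdefault(source, []).append(target)

    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:  # guards against self-referential loops
                seen.add(neighbour)
                reached.append((neighbour, depth + 1))
                queue.append((neighbour, depth + 1))
    return reached


friends_of_friends = traverse("alice", "FRIEND", 2)
```

The `seen` set is what keeps the walk from looping forever on the cyclic friendship data.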

We’ll put Ace in cage 2 and also point to cage 1 tagged with next_to so we know that it’s nearby.

    $ curl -X PUT http://localhost:8091/riak/cages/2 \
      -H "Content-Type: application/json" \
      -H "Link:</riak/animals/ace>;riaktag=\"contains\",
        </riak/cages/1>;riaktag=\"next_to\"" \
      -d '{"room" : 101}'

What makes Links special in Riak is link walking (and a more powerful variant, linked mapreduce queries, which we investigate tomorrow). Getting the linked data is achieved by appending a link spec to the URL that is structured like this: /_,_,_. The underscores (_) in the URL represent wildcards to each of the link criteria: bucket, tag, keep. We’ll explain those terms shortly. First let’s retrieve all links from cage 1.

    $ curl http://localhost:8091/riak/cages/1/_,_,_
    --4PYi9DW8iJK5aCvQQrrP7mh7jZs
    Content-Type: multipart/mixed; boundary=Av1fawIA4WjypRlz5gHJtrRqklD

    --Av1fawIA4WjypRlz5gHJtrRqklD
    X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fvrde/U5gymRMY+VwZw35gRfFgA=
    Location: /riak/animals/polly
    Content-Type: application/json
    Link: </riak/animals>; rel="up"
    Etag: VD0ZAfOTsIHsgG5PM3YZW
    Last-Modified: Tue, 13 Dec 2011 17:53:59 GMT

    {"nickname" : "Sweet Polly Purebred", "breed" : "Purebred"}
    --Av1fawIA4WjypRlz5gHJtrRqklD--

    --4PYi9DW8iJK5aCvQQrrP7mh7jZs--

It returns a multipart/mixed dump of headers plus bodies of all linked keys/values.
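The link spec format above (bucket, tag, keep, with underscores as wildcards) is easy to assemble programmatically. The following Python sketch only builds the URL string and does not contact a server; the helper `link_walk_url` is hypothetical, and the host and port are the ones used in the excerpt.

```python
def link_walk_url(base, bucket, key, steps):
    """Build a link-walking URL.

    steps is a list of (bucket, tag, keep) tuples; None means the
    wildcard "_" for that criterion.
    """
    def part(value):
        return "_" if value is None else str(value)

    spec = "/".join(",".join(part(v) for v in step) for step in steps)
    return "{}/riak/{}/{}/{}".format(base, bucket, key, spec)


all_links = link_walk_url("http://localhost:8091", "cages", "1",
                          [(None, None, None)])
contained = link_walk_url("http://localhost:8091", "cages", "2",
                          [("animals", "contains", None)])
```

The first call reproduces the URL from the curl example; the second narrows the walk to links into the animals bucket that carry the contains tag.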


pages: 193 words: 19,478

Memory Machines: The Evolution of Hypertext by Belinda Barnet


augmented reality, Benoit Mandelbrot, Bill Duvall, British Empire, Buckminster Fuller, Claude Shannon: information theory, collateralized debt obligation, computer age, conceptual framework, Douglas Engelbart, game design, hiring and firing, Howard Rheingold, HyperCard, hypertext link, information retrieval, Internet Archive, linked data, mandelbrot fractal, Marshall McLuhan, Menlo Park, nonsequential writing, Norbert Wiener, publish or perish, semantic web, Steve Jobs, Stewart Brand, technoutopianism, Ted Nelson, the scientific method, Vannevar Bush, wikimedia commons

(Bolter 1984, 163) Bolter’s ideas around ‘topographic writing’ were nascent when he started collaborating with Joyce in September 1983 (Bolter and Joyce 1986, 10). They would later have a profound influence over hypertext theory and criticism, and also the Storyspace system. From the outset, the nodes in Storyspace were called ‘writing spaces’, and it worked explicitly with topographic metaphors, incorporating a graphic ‘map view’ of the link data structure from the first version, along with a tree and an outline view (which are also visual representations of the data). ‘The tree’, Bolter tells us in Turing’s Man, ‘is a remarkably useful way of representing logical relations in spatial terms’ (Bolter 1984, 86). Also in line with the topographic metaphor, writing spaces in Storyspace acted (and still act) as containers for other writing spaces; an author literally ‘builds’ the space as she traverses it, zooming in and out to view details of the work, the map making the territory.

GLOSSA was a basic implementation in this sense, based on Bolter’s experience with classical texts. ‘You’d tab a text and then you’d be able to associate notes with any particular word or phrase in the text […] an automated version of classical texts with notes’ (Bolter 2011). It wasn’t clickable because the IBM PC wasn’t clickable at the time; the user would move the cursor over the word and select it. This link data structure formed the basis for their future experiments ‘only in the sense that it had this quality of one text leading to another’ (Bolter 2011). In his well-researched chapter on afternoon, Matthew Kirschenbaum suggests that Storyspace has ‘significant grounding in a hierarchical data model’ (Kirschenbaum 2008, 173) that has its origins in the tree structures of ‘interactive fictions of the Adventure type’ (Kirschenbaum 2008, 175).

Hypertext critic Jane Yellowlees Douglas (Joyce’s favourite reader, whose dissertation was on afternoon) argues this node ‘completes’ the work for her, but it is accessible only after the reader has seen a certain sequence of other nodes; ‘a succession of guard fields ensures that it is reached only after a lengthy visitation of fifty-seven narrative places’ (Yellowlees Douglas 2004, 106). Guard fields are a powerful device, and one that Joyce deploys to full effect in afternoon. According to the Markle Report, Joyce ‘agitated’ for them to be included in the design of Storyspace from the outset, and Bolter quickly obliged in their fledgling program: It was just a matter of putting a field into the link data structure that would contain the guard, and then just checking that field […] against what the user did before they were allowed to follow the link […] It was [that] idea you know and it was Michael’s. (Bolter 2011) Guard fields, along with the topographic ‘spatial’ writing style, have remained integral to the Storyspace program for 30 years hence. In 1985 Bolter became involved with an interdisciplinary research group at UNC directed by a colleague from computer science, John B.
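Bolter's description of guard fields (a field in the link data structure that is checked against what the reader has already done) can be sketched as follows. This is an illustrative Python model only, not Storyspace's implementation; the class `Link`, the helpers `visited_at_least` and `can_follow`, and the node names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Link:
    source: str
    target: str
    # Optional guard: a predicate over the reader's history of visited spaces.
    guard: Optional[Callable[[List[str]], bool]] = None


def visited_at_least(count):
    """Guard that passes once the reader has seen `count` distinct spaces."""
    def guard(history):
        return len(set(history)) >= count
    return guard


def can_follow(link, history):
    """A link is followable if it has no guard or its guard accepts the history."""
    return link.guard is None or link.guard(history)


open_link = Link("start", "middle")
guarded_link = Link("middle", "ending", guard=visited_at_least(3))
```

With this model, `guarded_link` stays closed until the reading history contains three distinct spaces, a small-scale analogue of the long sequence of visits that Yellowlees Douglas describes.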


pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem


Amazon Web Services, anti-pattern, bioinformatics, corporate governance, create, read, update, delete, data acquisition, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, web application

Individually, single triples are semantically rather poor, but en masse they provide a rich dataset from which to harvest knowledge and infer connections. Triple stores typically provide SPARQL capabilities to reason about stored RDF data. RDF, the lingua franca of triple stores and the Semantic Web, can be serialized several ways. RDF encoding of a simple three-node graph shows the RDF/XML format. Here we see how triples come together to form linked data. RDF encoding of a simple three-node graph. <rdf:RDF xmlns:rdf="" xmlns=""> <rdf:Description rdf:about=""> <name>Ginger Rogers</name> <occupation>dancer</occupation> <partner rdf:resource=""/> </rdf:Description> <rdf:Description rdf:about=""> <name>Fred Astaire</name> <occupation>dancer</occupation> <likes rdf:resource=""/> </rdf:Description> </rdf:RDF> W3C support. That they produce logical representations of triples doesn’t mean triple stores necessarily have triple-like internal implementations.
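To make concrete how individually poor triples combine into something queryable, here is a minimal in-memory sketch. It is not how a real triple store or SPARQL engine works; the `ex:` identifiers are hypothetical stand-ins for the URIs elided in the excerpt, and the `likes` triple is left out because its target is not given.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical identifiers; the excerpt's real URIs are not shown.
GINGER = "ex:ginger"
FRED = "ex:fred"

triples: List[Triple] = [
    (GINGER, "name", "Ginger Rogers"),
    (GINGER, "occupation", "dancer"),
    (GINGER, "partner", FRED),
    (FRED, "name", "Fred Astaire"),
    (FRED, "occupation", "dancer"),
]


def match(store: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def partner_names(store: List[Triple]) -> List[Tuple[str, str]]:
    """Join three patterns: who partners whom, reported by name."""
    pairs = []
    for subj, _, partner in match(store, p="partner"):
        for _, _, subj_name in match(store, s=subj, p="name"):
            for _, _, partner_name in match(store, s=partner, p="name"):
                pairs.append((subj_name, partner_name))
    return pairs
```

No single triple says that Ginger Rogers partners Fred Astaire by name; that answer only appears once the `partner` triple is joined with the two `name` triples, which is the kind of pattern matching a SPARQL query expresses declaratively.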

Most triple stores, however, are unified by their support for Semantic Web technology such as RDF and SPARQL. While there’s nothing particularly special about RDF as a means of serializing linked data, it is endorsed by the W3C and therefore benefits from being widely understood and well documented. The query language SPARQL benefits from similar W3C patronage. In the graph database space there is a similar abundance of innovation around graph serialization formats (e.g. GEOFF) and inferencing query languages (e.g. the Cypher query language that we use throughout this book). The key difference is that at this point these innovations do not enjoy the patronage of a well-regarded body like the W3C, though they do benefit from strong engagement within their user and vendor communities.


pages: 58 words: 12,386

Big Data Glossary by Pete Warden


business intelligence, crowdsourcing, fault tolerance, information retrieval, linked data, natural language processing, recommendation engine, web application

Google Refine Google Refine is an update to the Freebase Gridworks tool for cleaning up large, messy spreadsheets. It has been designed to make it easy to correct the most common errors you’ll encounter in human-created datasets. For example, it’s easy to spot and correct common problems like typos or inconsistencies in text values and to change cells from one format to another. There’s also rich support for linking data by calling APIs with the data contained in existing rows to augment the spreadsheet with information from external sources. Refine doesn’t let you do anything you can’t with other tools, but its power comes from how well it supports a typical extract and transform workflow. It feels like a good step up in abstraction, packaging processes that would typically take multiple steps in a scripting language or spreadsheet package into single operations with sensible defaults.


pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst


algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application

Some tools combine this capability with in-place transformation at the target database as well, taking advantage of the computing capabilities of engineered machines and using change data capture to synchronize source and target, again without the overhead of a middle tier. In both cases, the overarching principle is real-time data integration, which reflects data changes instantly in a data warehouse, whether they originate from a MapReduce job or from a transactional system, and creates downstream analytics that have an accurate, timely view of reality. Others are turning to linked data and semantics, where data sets are created using linking methodologies that focus on the semantics of the data. This fits well into the broader notion of pointing at external sources from within a data set, which has been around for quite a long time. That ability to point to unstructured data (whether residing in the file system or some external source) merely becomes an extension of the given capabilities, in which the ability to store and process XML and XQuery natively within an RDBMS enables the combination of different degrees of structure while searching and analyzing the underlying data.


pages: 224 words: 13,238

Electronic and Algorithmic Trading Technology: The Complete Guide by Kendall Kim


algorithmic trading, automated trading system, backtesting, corporate governance, Credit Default Swap, diversification, family office, financial innovation, fixed income, index arbitrage, index fund, interest rate swap, linked data, market fragmentation, natural language processing, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, short selling, statistical arbitrage, Steven Levy, transaction costs, yield curve

However, most financial services institutions do not have the ability to reach an optimal infrastructure because resources for most of a brokerage firm’s cost center have fallen victim to applying discretionary funds within the profit center such as the trading area of the business. It is clearly evident that budgets for data infrastructure have been reduced in the past years when the need for enhancing performance and technology has never been greater. Presumably, this will change in the future, though, when linking data to trading profitability becomes more evident. 8.5 Impact on Operations and Technology. Real-time transaction processing and electronic trading can result in a great deal of automation for operations. Real-time transactions move more quickly, tend to be more accurate, have fewer problems, and need less attention than manually engaged transactions. According to the TABB Group, 60% of trades were processed manually over seven years ago.


pages: 356 words: 102,224

Pale Blue Dot: A Vision of the Human Future in Space by Carl Sagan


Albert Einstein, anthropic principle, cosmological principle, dark matter, Dava Sobel, Francis Fukuyama: the end of history, germ theory of disease, invention of the telescope, Isaac Newton, Kuiper Belt, linked data, nuclear winter, planetary scale, profit motive, Search for Extraterrestrial Intelligence, Stephen Hawking, telepresence

You take a step forward, and the rover walks forward. You reach out your arm to pick up something shiny in the soil, and the robot arm does likewise. The sands of Mars trickle through your fingers. The only difficulty with this remote reality technology is that all this must occur in tedious slow motion: The round-trip travel time of the up-link commands from Earth to Mars and the down-link data returned from Mars to Earth might take half an hour or more. But this is something we can learn to do. We can learn to contain our exploratory impatience if that's the price of exploring Mars. The rover can be made smart enough to deal with routine contingencies. Anything more challenging, and it makes a dead stop, puts itself into a safeguard mode, and radios for a very patient human controller to take over.


pages: 313 words: 101,403

My Life as a Quant: Reflections on Physics and Finance by Emanuel Derman


Berlin Wall, bioinformatics, Black-Scholes formula, Brownian motion, capital asset pricing model, Claude Shannon: information theory, Emanuel Derman, fixed income, Gödel, Escher, Bach, haute couture, hiring and firing, implied volatility, interest rate derivative, Jeff Bezos, John von Neumann, law of one price, linked data, Long Term Capital Management, moral hazard, Murray Gell-Mann, pre–internet, publish or perish, quantitative trading / quantitative finance, Richard Feynman, Sharpe ratio, statistical arbitrage, statistical model, Stephen Hawking, Steve Jobs, stochastic volatility, technology bubble, transaction costs, value at risk, volatility smile, Y2K, yield curve, zero-coupon bond

A colleague, Ed Sheppard, was assigned to work with me, and we planned to rewrite the system to incorporate multidimensional array variables in order to represent more general financial time series. While I was away on a two-week beach vacation at Fire Island with my family, Ed suddenly threw himself into redesigning and then rewriting the entire system, without giving me advance notice. I returned to a fait accompli, a completely new, enhanced, and almost unrecognizable APL-flavored version of the language. Ed's version now incorporated vastly complex dynamically linked data structures, whose details I knew I would not live long enough to master. Ed had also cleverly modified HEQS so that, once you had used it interactively to develop and solve a financial model, you could then use it to generate a C program that would solve your equations many times faster. Programming came naturally to Ed in a way it never would to me, and his proficiency daunted me. Sometime in late 1984 he left to join Asymetrix, a Seattle-based company founded by Paul Allen.


pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists by Gary Marcus, Jeremy Freeman


23andMe, Albert Einstein, bioinformatics, bitcoin, brain emulation, cloud computing, complexity theory, computer age, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data acquisition, Drosophila, epigenetics, Google Glasses, iterative process, linked data, mouse model, optical character recognition, pattern recognition, personalized medicine, phenotype, race to the bottom, Richard Feynman, Ronald Reagan, semantic web, speech recognition, stem cell, Steven Pinker, supply-chain management, Turing machine, web application

While efforts to map the brain have begun as public, government-funded projects, this does not mean that private entities will not enter the arena and seek to compete with those projects. Although initial efforts to map the brain may be fueled by public funds, the issue of how “fine-tuned” information that can be used to determine risk factors or emerging disease states in individuals’ brains, which will require linking data to genetic databases, health records, and health databases, will be handled merits discussion now. What rules will govern the sharing of detailed scans or maps about each individual’s brain? Can data be linked from a brain scan to a genome to a database without an individual’s express consent if that person’s identity is not 100 percent secure? What information about the brain can be patented?


Future Files: A Brief History of the Next 50 Years by Richard Watson


Albert Einstein, bank run, banking crisis, battle of ideas, Black Swan, call centre, carbon footprint, cashless society, citizen journalism, computer age, computer vision, congestion charging, corporate governance, corporate social responsibility, deglobalization, digital Maoism, disintermediation, epigenetics, failed state, financial innovation, Firefox, food miles, future of work, global supply chain, global village, hive mind, industrial robot, invention of the telegraph, Jaron Lanier, Jeff Bezos, knowledge economy, linked data, low skilled workers, M-Pesa, Northern Rock, peak oil, pensions crisis, precision agriculture, prediction markets, Ralph Nader, Ray Kurzweil, rent control, RFID, Richard Florida, self-driving car, speech recognition, telepresence, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Turing test, Victor Gruen, white flight, women in the workforce, Zipcar

5 trends that will transform transport. Embedded intelligence: Cars can already be opened or started using fingerprint and iris recognition, so we’ll see more technologies linking vehicle security to user identification. We will also see mood-sensitive vehicles that adjust their behavior according to the mood of the driver or occupants. Cars will also become mobile technology platforms linking data to other services such as healthcare. For example, if your car regularly detects an abnormal heartbeat or high levels of stress, this information could be sent wirelessly to your doctor. Obviously privacy issues abound, but cars could become useful data-collection and delivery points. Remote monitoring: Electronic data recorders are little black boxes that already sit covertly inside some cars and monitor your speed, acceleration and braking.


pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future by Kevin Kelly


3D printing, A Declaration of the Independence of Cyberspace, AI winter, Airbnb, Albert Einstein, Amazon Web Services, augmented reality, bank run, barriers to entry, Baxter: Rethink Robotics, bitcoin, blockchain, book scanning, Brewster Kahle, Burning Man, cloud computing, computer age, connected car, crowdsourcing, dark matter, dematerialisation, Downton Abbey, Edward Snowden, Elon Musk, Filter Bubble, Freestyle chess, game design, Google Glasses, hive mind, Howard Rheingold, index card, indoor plumbing, industrial robot, Internet Archive, Internet of things, invention of movable type, invisible hand, Jaron Lanier, Jeff Bezos, job automation, Kevin Kelly, Kickstarter, linked data, Lyft, M-Pesa, Marshall McLuhan, means of production, megacity, Minecraft, multi-sided market, natural language processing, Netflix Prize, Network effects, new economy, Nicholas Carr, peer-to-peer lending, personalized medicine, placebo effect, planetary scale, postindustrial economy, recommendation engine, RFID, ride hailing / ride sharing, Rodney Brooks, self-driving car, sharing economy, Silicon Valley, slashdot, Snapchat, social graph, social web, software is eating the world, speech recognition, Stephen Hawking, Steven Levy, Ted Nelson, the scientific method, transport as a service, two-sided market, Uber for X, Watson beat the top human players on Jeopardy!, Whole Earth Review

Just as the internet is the network of networks, the intercloud is the cloud of clouds. Slowly but surely Amazon’s cloud and Google’s cloud and Facebook’s cloud and all the other enterprise clouds are intertwining into one massive cloud that acts as a single cloud—The Cloud—to the average user or company. A counterforce resisting this merger is that an intercloud requires commercial clouds to share their data (a cloud is a network of linked data), and right now data tends to be hoarded like gold. Data hoards are seen as a competitive advantage, and sharing data freely is hampered by laws, so it will be many years (decades?) before companies learn how to share their data creatively, productively, and responsibly. There is one final step in the inexorable march toward decentralized access. At the same time we are moving to an intercloud we will also move toward one that is fully decentralized and peer to peer.


The Art of Computer Programming: Fundamental Algorithms by Donald E. Knuth


discrete time, distributed generation, fear of failure, Fermat's Last Theorem, Isaac Newton, Jacquard loom, John von Neumann, linear programming, linked data, Menlo Park, probability theory / Blaise Pascal / Pierre de Fermat, Richard Feynman, sorting algorithm, stochastic process, Turing machine

One possible answer for the example above would be

        BASE    SUB
X[1]:   2400    1002
X[2]:   2430    1010
X[3]:   2450    1006
X[4]:   2510    1000
X[5]:   2530    1003
X[6]:   2730    0

The last entry contains the first unused memory address. (Clearly, this is not the only way to treat a library of subroutines. The proper way to design a library is heavily dependent upon the computer used and the applications to be handled. Large modern computers require an entirely different approach to subroutine libraries. But this is a nice exercise anyway, because it involves interesting manipulations on both sequential and linked data.) The problem in this exercise is to design an algorithm for the stated task. Your allocator may transform the tape directory in any way as it prepares its answer, since the tape directory can be read in anew by the subroutine allocator on its next assignment, and the tape directory is not needed by other parts of the loading routine.
27. [25] Write a MIX program for the subroutine allocation algorithm of exercise 26.
28. [40] The following construction shows how to "solve" a fairly general type of two-person game, including chess, nim, and many simpler games: Consider a finite set of nodes, each of which represents a possible position in the game.

The LINK field of each Symbol Table entry points to the most recently encountered Data Table entry for the symbolic name in question. The first algorithm we require is one that builds the Data Table in such a form. Note the flexibility in choice of level numbers that is allowed by the COBOL rules; the left structure in (4) is completely equivalent to 1 A 2 B 3 C 3 D 2 E 2 F 3 G because level numbers do not have to be sequential.
[Figure (5): the Symbol Table, with its LINK field, beside the Data Table, with fields PREV, PARENT, NAME, CHILD, and SIB. Empty boxes indicate additional information not relevant here.]
Some sequences of level numbers are illegal, however; for example, if the level number of D in (4) were changed to " (in either place) we would have a meaningless data configuration, violating the rule that all items of a group must have the same number.


pages: 404 words: 43,442

The Art of R Programming by Norman Matloff


Debian, discrete time, general-purpose programming language, linked data, sorting algorithm, statistical model

In our example tree, where the root node contains 8, all of the values in the left subtree—5, 2 and 6—are less than 8, while 20 is greater than 8. If implemented in C, a tree node would be represented by a C struct, similar to an R list, whose contents are the stored value, a pointer to the left child, and a pointer to the right child. But since R lacks pointer variables, what can we do? Our solution is to go back to the basics. In the old prepointer days in FORTRAN, linked data structures were implemented in long arrays. A pointer, which in C is a memory address, was an array index instead. Specifically, we’ll represent each node by a row in a three-column matrix. The node’s stored value will be in the third element of that row, while the first and second elements will be the left and right links. For instance, if the first element in a row is 29, it means that this node’s left link points to the node stored in row 29 of the matrix.
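The pointer-free layout described above can be sketched in a few lines. The book works in R with a three-column matrix; this sketch uses Python lists instead, so row indices start at 0 rather than 1 and `None` marks a missing link. Those choices, and the function names, are assumptions for illustration rather than the book's code.

```python
from typing import List, Optional

# Each row is [left link, right link, stored value]; links are row indices.
Row = List[Optional[int]]


def new_tree(value: int) -> List[Row]:
    """Create a tree whose root (row 0) holds the given value."""
    return [[None, None, value]]


def insert(tree: List[Row], value: int) -> None:
    """Walk down from the root and attach a new row as a leaf."""
    row = 0
    while True:
        # Column 0 is the left link, column 1 the right link.
        side = 0 if value <= tree[row][2] else 1
        nxt = tree[row][side]
        if nxt is None:
            tree.append([None, None, value])
            tree[row][side] = len(tree) - 1  # the "pointer" is a row index
            return
        row = nxt


def in_order(tree: List[Row], row: Optional[int] = 0) -> List[int]:
    """Return the stored values in sorted order."""
    if row is None:
        return []
    left, right, value = tree[row]
    return in_order(tree, left) + [value] + in_order(tree, right)
```

Building the example tree by inserting 8, then 5, 20, 2 and 6 leaves the root row as `[1, 2, 8]`: its left link points at the row holding 5 and its right link at the row holding 20.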


pages: 598 words: 134,339

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World by Bruce Schneier


23andMe, Airbnb, airport security, AltaVista, Anne Wojcicki, augmented reality, Benjamin Mako Hill, Black Swan, Brewster Kahle, Brian Krebs, call centre, Cass Sunstein, Chelsea Manning, citizen journalism, cloud computing, congestion charging, disintermediation, Edward Snowden, experimental subject, failed state, fault tolerance, Ferguson, Missouri, Filter Bubble, Firefox, friendly fire, Google Chrome, Google Glasses, hindsight bias, informal economy, Internet Archive, Internet of things, Jacob Appelbaum, Jaron Lanier, Julian Assange, Kevin Kelly, license plate recognition, linked data, Lyft, Mark Zuckerberg, Nash equilibrium, Nate Silver, national security letter, Network effects, Occupy movement, payday loans, pre–internet, price discrimination, profit motive, race to the bottom, RAND corporation, recommendation engine, RFID, self-driving car, Silicon Valley, Skype, smart cities, smart grid, Snapchat, social graph, software as a service, South China Sea, stealth mode startup, Steven Levy, Stuxnet, TaskRabbit, telemarketer, Tim Cook: Apple, transaction costs, Uber and Lyft, urban planning, WikiLeaks, zero day

., 160 fiduciary responsibility, data collection and, 204–5 50 Cent Party, 114 FileVault, 215 filter bubble, 114–15 FinFisher, 81 First Unitarian Church of Los Angeles, 91 FISA (Foreign Intelligence Surveillance Act; 1978), 273 FISA Amendments Act (2008), 171, 273, 275–76 Section 702 of, 65–66, 173, 174–75, 261 FISA Court, 122, 171 NSA misrepresentations to, 172, 337 secret warrants of, 174, 175–76, 177 transparency needed in, 177 fishing expeditions, 92, 93 Fitbit, 16, 112 Five Eyes, 76 Flame, 72 FlashBlock, 49 flash cookies, 49 Ford Motor Company, GPS data collected by, 29 Foreign Intelligence Surveillance Act (FISA; 1978), 273 see also FISA Amendments Act Forrester Research, 122 Fortinet, 82 Fox-IT, 72 France, government surveillance in, 79 France Télécom, 79 free association, government surveillance and, 2, 39, 96 freedom, see liberty Freeh, Louis, 314 free services: overvaluing of, 50 surveillance exchanged for, 4, 49–51, 58–59, 60–61, 226, 235 free speech: as constitutional right, 189, 344 government surveillance and, 6, 94–95, 96, 97–99 Internet and, 189 frequent flyer miles, 219 Froomkin, Michael, 198 FTC, see Federal Trade Commission, US fusion centers, 69, 104 gag orders, 100, 122 Gamma Group, 81 Gandy, Oscar, 111 Gates, Bill, 128 gay rights, 97 GCHQ, see Government Communications Headquarters Geer, Dan, 205 genetic data, 36 geofencing, 39–40 geopolitical conflicts, and need for surveillance, 219–20 Georgia, Republic of, cyberattacks on, 75 Germany: Internet control and, 188 NSA surveillance of, 76, 77, 122–23, 151, 160–61, 183, 184 surveillance of citizens by, 350 US relations with, 151, 234 Ghafoor, Asim, 103 GhostNet, 72 Gill, Faisal, 103 Gmail, 31, 38, 50, 58, 219 context-sensitive advertising in, 129–30, 142–43 encryption of, 215, 216 government surveillance of, 62, 83, 148 GoldenShores Technologies, 46–47 Goldsmith, Jack, 165, 228 Google, 15, 27, 44, 48, 54, 221, 235, 272 customer loyalty to, 58 data mining by, 38 data storage capacity of, 18 
government demands for data from, 208 impermissible search ad policy of, 55 increased encryption by, 208 as information middleman, 57 linked data sets of, 50 NSA hacking of, 85, 208 PageRank algorithm of, 196 paid search results on, 113–14 search data collected by, 22–23, 31, 123, 202 transparency reports of, 207 see also Gmail Google Analytics, 31, 48, 233 Google Calendar, 58 Google Docs, 58 Google Glass, 16, 27, 41 Google Plus, 50 real name policy of, 49 surveillance by, 48 Google stalking, 230 Gore, Al, 53 government: checks and balances in, 100, 175 surveillance by, see mass surveillance, government Government Accountability Office, 30 Government Communications Headquarters (GCHQ): cyberattacks by, 149 encryption programs and, 85 location data used by, 3 mass surveillance by, 69, 79, 175, 182, 234 government databases, hacking of, 73, 117, 313 GPS: automobile companies’ use of, 29–30 FBI use of, 26, 95 police use of, 26 in smart phones, 3, 14 Grayson, Alan, 172 Great Firewall (Golden Shield), 94, 95, 150–51, 187, 237 Greece, wiretapping of government cell phones in, 148 greenhouse gas emissions, 17 Greenwald, Glenn, 20 Grindr, 259 Guardian, Snowden documents published by, 20, 67, 149 habeas corpus, 229 hackers, hacking, 42–43, 71–74, 216, 313 of government databases, 73, 117, 313 by NSA, 85 privately-made technology for, 73, 81 see also cyberwarfare Hacking Team, 73, 81, 149–50 HAPPYFOOT, 3 Harris Corporation, 68 Harris Poll, 96 Hayden, Michael, 23, 147, 162 health: effect of constant surveillance on, 127 mass surveillance and, 16, 41–42 healthcare data, privacy of, 193 HelloSpy, 3, 245 Hewlett-Packard, 112 Hill, Raquel, 44 hindsight bias, 322 Hobbes, Thomas, 210 Home Depot, 110, 116 homosexuality, 97 Hoover, J.


pages: 494 words: 142,285

The Future of Ideas: The Fate of the Commons in a Connected World by Lawrence Lessig


AltaVista, Andy Kessler, barriers to entry, business process, Cass Sunstein, computer age, dark matter, disintermediation, Erik Brynjolfsson, George Gilder, Hacker Ethic, Hedy Lamarr / George Antheil, Howard Rheingold, Hush-A-Phone, HyperCard, hypertext link, Innovator's Dilemma, invention of hypertext, inventory management, invisible hand, Jean Tirole, Jeff Bezos, Joseph Schumpeter, linked data, Menlo Park, Network effects, new economy, packet switching, price mechanism, profit maximization, RAND corporation, rent control, rent-seeking, RFC: Request For Comment, Richard Stallman, Richard Thaler, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, Silicon Valley, smart grid, software patent, spectrum auction, Steve Crocker, Steven Levy, Stewart Brand, Ted Nelson, Telecommunications Act of 1996, The Chicago School, transaction costs

And these thousands produced a far better, more complete, and richer database of culture than commercial sites had produced. For a time, one could find an extraordinary range of songs archived throughout the Web. Slowly these services have migrated to commercial sites. This migration means the commercial sites can support the costs of developing and maintaining this information. And in some cases, with some databases, the Internet provided a simple way to collect and link data about music in particular. Here the CDDB—or “CD database”—is the most famous example. As MP3 equipment became common, people needed a simple way to get information about CD titles and tracks onto the MP3 device. Of course, one could type in that information, but why should everyone have to type in that information? Many MP3 services thus enabled a cooperative process. When a user installed a CD, the system queried the central database to see whether that CD had been cataloged thus far.


pages: 528 words: 146,459

Computer: A History of the Information Machine by Martin Campbell-Kelly, William Aspray, Nathan L. Ensmenger, Jeffrey R. Yost


Ada Lovelace, air freight, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Apple's 1984 Super Bowl advert, barriers to entry, Bill Gates: Altair 8800, borderless world, Buckminster Fuller, Build a better mousetrap, Byte Shop, card file, cashless society, cloud computing, combinatorial explosion, computer age, deskilling, don't be evil, Douglas Engelbart, Dynabook, fault tolerance, Fellow of the Royal Society, financial independence, Frederick Winslow Taylor, game design, garden city movement, Grace Hopper, informal economy, interchangeable parts, invention of the wheel, Jacquard loom, Jeff Bezos, jimmy wales, John von Neumann, linked data, Mark Zuckerberg, Marshall McLuhan, Menlo Park, natural language processing, Network effects, New Journalism, Norbert Wiener, Occupy movement, optical character recognition, packet switching, PageRank, pattern recognition, pirate software, popular electronics, prediction markets, pre–internet, QWERTY keyboard, RAND corporation, Robert X Cringely, Silicon Valley, Silicon Valley startup, Steve Jobs, Steven Levy, Stewart Brand, Ted Nelson, the market place, Turing machine, Vannevar Bush, Von Neumann architecture, Whole Earth Catalog, William Shockley: the traitorous eight, women in the workforce, young professional

was already well established when two other Stanford University doctoral students, Larry Page and Sergey Brin, began work on the Stanford Digital Library Project (funded in part by the National Science Foundation)—research that would not only forever change the process of finding things on the Internet but also, in time, lead to an unprecedentedly successful web advertising model. Page became interested in a dissertation project on the mathematical properties of the web, and found strong support from his adviser Terry Winograd, a pioneer of artificial intelligence research on natural language processing. Using a “web crawler” to gather back-link data (that is, the websites that linked to a particular site), Page, now teamed up with Brin, created their “PageRank” algorithm based on back-links ranked by importance—the more prominent the linking site, the more influence it would have on the linked site’s page rank. They insightfully reasoned that this would provide the basis for more useful web searches than any existing tools and, moreover, that there would be no need to hire a corps of indexing staff.
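The idea that a back-link counts for more when the linking site is itself prominent can be sketched as a simple iterative calculation. This follows the commonly published textbook formulation of PageRank, not necessarily what Page and Brin first built; the damping factor of 0.85, the fixed iteration count, and the handling of pages with no outgoing links are assumptions.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]],
             damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Estimate page ranks from a map of page -> pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split among its outlinks, so
                # a prominent linking page confers more than an obscure one.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed treatment of pages without outlinks: spread evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

With `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page `c` ends up ranked highest because two pages link to it, `a` comes next because its single back-link comes from the highly ranked `c`, and `b`, with no back-links, comes last.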


The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil


additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, augmented reality, autonomous vehicles, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business intelligence, call centre, carbon-based life, cellular automata, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, disintermediation, double helix, Douglas Hofstadter, epigenetics, factory automation, friendly AI, George Gilder, Gödel, Escher, Bach, informal economy, information retrieval, invention of the telephone, invention of the telescope, invention of writing, Isaac Newton, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Mikhail Gorbachev, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Norbert Wiener, oil shale / tar sands, optical character recognition, pattern recognition, phenotype, premature optimization, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Rodney Brooks, Search for Extraterrestrial Intelligence, semantic web, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, transaction costs, Turing machine, Turing test, Vernor Vinge, Y2K, Yogi Berra

Resources and Contact Information
New developments in the diverse fields discussed in this book are accumulating at an accelerating pace. To help you keep pace, I invite you to visit, where you will find
· Recent news stories
· A compilation of thousands of relevant news stories going back to 2001 from (see below)
· Hundreds of articles on related topics from
· Research links
· Data and citation for all graphs
· Material about this book
· Excerpts from this book
· Online endnotes
You are also invited to visit our award-winning Web site, which includes over six hundred articles by over one hundred "big thinkers" (many of whom are cited in this book), thousands of news articles, listings of events, and other features. Over the past six months, we have had more than one million readers.


pages: 933 words: 205,691

Hadoop: The Definitive Guide by Tom White


Amazon Web Services, bioinformatics, business intelligence, combinatorial explosion, database schema, Debian, domain-specific language, fault tolerance, full text search, Grace Hopper, information retrieval, Internet Archive, linked data, loose coupling, openstreetmap, recommendation engine, RFID, SETI@home, social graph, web application

Link inversion However, most algorithms for calculating a page’s importance (or quality) need the opposite information, that is, what pages contain outlinks that point to the current page. This information is not readily available when crawling. Also, the indexing process benefits from taking into account the anchor text on inlinks so that this text may semantically enrich the text of the current page. As mentioned earlier, Nutch collects the outlink information and then uses this data to build a LinkDb, which contains this reversed link data in the form of inlinks and anchor text. This section presents a rough outline of the implementation of the LinkDb tool—many details have been omitted (such as URL normalization and filtering) in order to present a clear picture of the process. What’s left gives a classical example of why the MapReduce paradigm fits so well with the key data transformation processes required to run a search engine.
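The inversion described above can be sketched in a few lines of Python. This is only an illustration of the map-and-reduce idea, not Nutch's actual LinkDb code: the function names, the `(target_url, anchor_text)` record shape, and the in-memory dictionary standing in for the crawl data are all assumptions, and URL normalization and filtering are left out just as in the excerpt.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shapes, chosen for this sketch only.
Outlink = Tuple[str, str]  # (target_url, anchor_text)
Inlink = Tuple[str, str]   # (source_url, anchor_text)


def map_outlinks(source_url: str, outlinks: Iterable[Outlink]) -> Iterator[Tuple[str, Inlink]]:
    """Map step: emit one (target, (source, anchor)) pair per outlink."""
    for target_url, anchor in outlinks:
        yield target_url, (source_url, anchor)


def reduce_inlinks(pairs: Iterable[Tuple[str, Inlink]]) -> Dict[str, List[Inlink]]:
    """Shuffle and reduce step: group the emitted pairs by target URL."""
    inlinks: Dict[str, List[Inlink]] = defaultdict(list)
    for target_url, inlink in pairs:
        inlinks[target_url].append(inlink)
    return dict(inlinks)


def invert_links(crawl: Dict[str, List[Outlink]]) -> Dict[str, List[Inlink]]:
    """Turn a page -> outlinks mapping into a page -> inlinks mapping."""
    pairs = (
        pair
        for source_url, outlinks in crawl.items()
        for pair in map_outlinks(source_url, outlinks)
    )
    return reduce_inlinks(pairs)
```

Calling `invert_links` on a mapping from each crawled page to its outlinks returns, for every target page, the pages that link to it together with their anchor text. In a real MapReduce job the grouping by key is done by the framework's shuffle phase rather than by a dictionary held in memory.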


pages: 834 words: 180,700

The Architecture of Open Source Applications by Amy Brown, Greg Wilson


8-hour work day, anti-pattern, bioinformatics, cloud computing, collaborative editing, combinatorial explosion, computer vision, continuous integration, create, read, update, delete, Debian, domain-specific language, fault tolerance, finite state, Firefox, friendly fire, linked data, load shedding, locality of reference, loose coupling, Mars Rover, MVC pattern, premature optimization, recommendation engine, revision control, side project, Skype, slashdot, social web, speech recognition, the scientific method, The Wisdom of Crowds, web application, WebSocket

We use universally unique identifiers (UUIDs) to identify data, and commit hashes from git to reference versions. If the data changes from one execution to another, a new version is checked in to the repository. Thus, the (uuid, version) tuple is a compound identifier to retrieve the data in any state. In addition, we store the hash of the data as well as the signature of the upstream portion of the workflow that generated it (if it is not an input). This allows one to link data that might be identified differently as well as reuse data when the same computation is run again. The main concern when designing this package was the way users were able to select and retrieve their data. Also, we wished to keep all data in the same repository, regardless of whether it is used as input, output, or intermediate data (an output of one workflow might be used as the input of another).
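A minimal sketch of this identification scheme follows, under loud assumptions: the `DataStore` class and its method names are hypothetical, data is kept in an in-memory dictionary rather than a git repository, and a truncated SHA-1 of the content stands in for the git commit hash that the excerpt uses as the version.

```python
import hashlib
import uuid
from typing import Dict, List, Optional, Tuple

Identifier = Tuple[str, str]  # (uuid, version)


class DataStore:
    """Hypothetical in-memory stand-in for the repository described above."""

    def __init__(self) -> None:
        self._objects: Dict[Identifier, bytes] = {}
        self._by_hash: Dict[str, List[Identifier]] = {}
        self._by_signature: Dict[str, Identifier] = {}

    def put(self, data: bytes, data_id: Optional[str] = None,
            upstream_signature: Optional[str] = None) -> Identifier:
        """Store data and return its (uuid, version) compound identifier."""
        data_id = data_id or str(uuid.uuid4())
        content_hash = hashlib.sha1(data).hexdigest()
        version = content_hash[:12]  # assumption: stands in for a git commit hash
        identifier = (data_id, version)
        self._objects[identifier] = data
        same = self._by_hash.setdefault(content_hash, [])
        if identifier not in same:
            same.append(identifier)
        if upstream_signature is not None:
            # Remember which workflow prefix produced this data.
            self._by_signature[upstream_signature] = identifier
        return identifier

    def get(self, data_id: str, version: str) -> bytes:
        """Retrieve the data in the state named by (uuid, version)."""
        return self._objects[(data_id, version)]

    def same_content(self, data: bytes) -> List[Identifier]:
        """Link data that is identified differently but has the same hash."""
        return list(self._by_hash.get(hashlib.sha1(data).hexdigest(), []))

    def cached_output(self, upstream_signature: str) -> Optional[Identifier]:
        """Reuse data when the same upstream computation is run again."""
        return self._by_signature.get(upstream_signature)
```

Storing a changed payload under an existing UUID yields a new version while the old `(uuid, version)` pair still resolves, which mirrors checking a new version in to the repository. The two lookup tables correspond to the two uses named in the excerpt: the content hash links differently identified data, and the upstream signature lets a repeated computation find its earlier output.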


pages: 348 words: 39,850

Data Scientists at Work by Sebastian Gutierrez


Albert Einstein, algorithmic trading, bioinformatics, bitcoin, business intelligence, chief data officer, clean water, cloud computing, computer vision, continuous integration, correlation does not imply causation, crowdsourcing, data is the new oil, DevOps, domain-specific language, follow your passion, full text search, informal economy, information retrieval, Infrastructure as a Service, inventory management, iterative process, linked data, Mark Zuckerberg, microbiome, Moneyball by Michael Lewis explains big data, move fast and break things, natural language processing, Network effects, nuclear winter, optical character recognition, pattern recognition, Paul Graham, personalized medicine, Peter Thiel, pre–internet, quantitative hedge fund, quantitative trading / quantitative finance, recommendation engine, Renaissance Technologies, Richard Feynman, self-driving car, side project, Silicon Valley, Skype, software as a service, speech recognition, statistical model, Steve Jobs, stochastic process, technology bubble, text mining, the scientific method, web application

This meant the system had to understand that, in the past, the cause and effect happened because Ankara is the capital of Turkey. Once it understood that, then it would apply this function to what was going on now. Now the system would say, “Earthquake in Australia. Red Cross help sent to Canberra.” I looked for these types of relations—capital and country—and others using not only Wikipedia, but hundreds of other data sets from a project called Linked Data. The biggest one of these connected data sets was, of course, Wikipedia, which is just structured information from Wikipedia. With this causality graph I could now ask it anything I wanted. An interesting example I used to give is what it taught me when I wanted to buy an iPad. I asked the system, “How much does an iPad cost? Tell me what’s going to be.” The system then told me that prices were going to go up.


pages: 903 words: 235,753

The Stack: On Software and Sovereignty by Benjamin H. Bratton


1960s counterculture, 3D printing, 4chan, Ada Lovelace, additive manufacturing, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, Berlin Wall, bioinformatics, bitcoin, blockchain, Buckminster Fuller, Burning Man, call centre, carbon footprint, carbon-based life, Cass Sunstein, Celebration, Florida, charter city, clean water, cloud computing, connected car, corporate governance, crowdsourcing, cryptocurrency, dark matter, David Graeber, deglobalization, dematerialisation, disintermediation, distributed generation, don't be evil, Douglas Engelbart, Edward Snowden, Elon Musk, Eratosthenes, ethereum blockchain, facts on the ground, Flash crash, Frank Gehry, Frederick Winslow Taylor, future of work, Georg Cantor, gig economy, global supply chain, Google Earth, Google Glasses, Guggenheim Bilbao, High speed trading, Hyperloop, illegal immigration, industrial robot, information retrieval, intermodal, Internet of things, invisible hand, Jacob Appelbaum, Jaron Lanier, Jony Ive, Julian Assange, Khan Academy, linked data, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, McMansion, means of production, megacity, megastructure, Menlo Park, Minecraft, Monroe Doctrine, Network effects, new economy, offshore financial centre, oil shale / tar sands, packet switching, PageRank, pattern recognition, peak oil, performance metric, personalized medicine, Peter Thiel, phenotype, place-making, planetary scale, RAND corporation, recommendation engine, reserve currency, RFID, Sand Hill Road, self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley ideology, Slavoj Žižek, smart cities, smart grid, smart meter, social graph, software studies, South China Sea, sovereign wealth fund, special economic zone, spectrum auction, Startup school, statistical arbitrage, Steve Jobs, Steven Levy, Stewart Brand, Stuxnet, Superbowl ad, supply-chain management, supply-chain management software, TaskRabbit, the built environment, The Chicago School, the scientific method, Torches of Freedom, transaction costs, Turing complete, Turing machine, Turing test, universal basic income, urban planning, Vernor Vinge, Washington Consensus, web application, WikiLeaks, working poor, Y Combinator

Through various combinations of open or proprietary exigetics of data, and perhaps a sequence of application programming interfaces (APIs), a query entered as “book me a ticket to New York” can activate a series of secondary inquiries to calendars, banks, flight schedules, airline databases, bank accounts, and so on and, through this, initiate the cascading programming resulting in that booking. For this, to search is also to program. Such tidy consumer use cases require enormously difficult standardizations of interoperability between competitive services (not to mention beyond-Esperanto level standardization of all Users’ conceptual taxonomies). The goal of linking data into semantically relevant and accessible structures so that “search” would also provide more actionable results, and in turn allowing queries to program those results for specific ends, remains compelling for search engines, if less so for individual down-service-stream providers, such as airlines and banks, which see their business absorbed into a handful of search platforms.20 By comparison, physical search may be based on a similar tissue of interrelation between addressable entities—in this case, a mix of physical things and data of interest—and might be a necessary condition of a really viable Internet of Things or SPIME space.


Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei


bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, web application

Pattern Analysis and Machine Intelligence (PAMI) 24 (2002) 881–892. [KMR+94] Klemettinen, M.; Mannila, H.; Ronkainen, P.; Toivonen, H.; Verkamo, A.I., Finding interesting rules from large sets of discovered association rules, In: Proc. 3rd Int. Conf. Information and Knowledge Management Gaithersburg, MD. (Nov. 1994), pp. 401–408. [KMS03] Kubica, J.; Moore, A.; Schneider, J., Tractable group detection on large link data sets, In: Proc. 2003 Int. Conf. Data Mining (ICDM’03) Melbourne, FL. (Nov. 2003), pp. 573–576. [KN97] Knorr, E.; Ng, R., A unified notion of outliers: Properties and computation, In: Proc. 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD’97) Newport Beach, CA. (Aug. 1997), pp. 219–222. [KNNL04] Kutner, M.H.; Nachtsheim, C.J.; Neter, J.; Li, W., Applied Linear Statistical Models with Student CD. (2004) Irwin .