linked data



Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data by Leslie Sikos

AGPL, Amazon Web Services, bioinformatics, business process, cloud computing, create, read, update, delete, Debian, en.wikipedia.org, fault tolerance, Firefox, Google Chrome, Google Earth, information retrieval, Infrastructure as a Service, Internet of things, linked data, natural language processing, openstreetmap, optical character recognition, platform as a service, search engine result page, semantic web, Silicon Valley, social graph, software as a service, SPARQL, text mining, Watson beat the top human players on Jeopardy!, web application, wikimedia commons

The expression of rights provided by licensing makes free data reuse possible. Linked Data without an explicit open license (e.g., a public domain license) cannot be reused freely, but the quality of Linked Data is independent of licensing. When the specified criteria are met, all five ratings can be used both for Linked Data (Linked Data without an explicit open license) and for Linked Open Data (Linked Data with an explicit open license). As a consequence, the five-star rating system can be depicted so that the criteria can be read with or without the open license. For example, the Linked Open Data mug can be read with both green labels for five-star Linked Open Data, or with neither label for five-star Linked Data, as shown in Figure 3-1. Likewise, Linked Data available as machine-readable structured data is two-star Linked Data, while the same data with an open license is two-star Linked Open Data.
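For readers who want the criteria side by side, here is a minimal Python sketch of the five deployment levels as they are commonly paraphrased from Berners-Lee's scheme; the one-line wording of each level is my own summary rather than a quotation from the book, and the helper simply reproduces the two-star example above.

    # Paraphrase of the five-star deployment scheme; each level includes the ones below it.
    # An explicit open license is what turns N-star Linked Data into N-star Linked Open Data.
    FIVE_STAR_SCHEME = {
        1: "published on the Web, in any format",
        2: "available as machine-readable structured data",
        3: "available in a non-proprietary format",
        4: "uses URIs to identify things, so others can point at them",
        5: "linked to other data to provide context",
    }

    def rating_label(stars: int, open_license: bool) -> str:
        """Return a label such as '2-star Linked Open Data' or '2-star Linked Data'."""
        kind = "Linked Open Data" if open_license else "Linked Data"
        return f"{stars}-star {kind}"

    print(rating_label(2, open_license=True))   # 2-star Linked Open Data
    print(rating_label(2, open_license=False))  # 2-star Linked Data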

More and more universities provide information about staff members, departments, facilities, courses, grants, and publications as Linked Data and RDF dumps, such as the University of Florida (http://vivo.ufl.edu) and Ghent University (http://data.mmlab.be/mmlab). Libraries such as the Princeton University Library (http://findingaids.princeton.edu) publish bibliographic information as Linked Data. Part of the National Digital Data Archive of Hungary is available as Linked Data at http://lod.sztaki.hu. Even Project Gutenberg is available as Linked Data (http://wifo5-03.informatik.uni-mannheim.de/gutendata/). Museums such as the British Museum publish some of their records as Linked Data (http://collection.britishmuseum.org). News and media giants publish subject headings as Linked Data, as for example the New York Times at http://data.nytimes.com. MusicBrainz (http://dbtune.org/musicbrainz/) provides data about music artists and their albums, served as Linked Data and available through a SPARQL endpoint.

When RDF links are set to other data resources, users can navigate the Web of Data as a whole by following RDF links. The benefits of Linked Data are recognized by more and more organizations, businesses, and individuals. Some industrial giants that already have LOD implementations are Amazon.com, BBC, Facebook, Flickr, Google, Thomson Reuters, The New York Times Company, and Yahoo!, just to name a few.

The Five-Star Deployment Scheme for Linked Data

Publishing Linked Data (following the Linked Data principles) does not guarantee data quality. For example, the documents the URIs in LOD datasets point to might be difficult to reuse. Pointing to a fully machine-interpretable RDF file is not the same as pointing to a PDF file containing a table as a scanned image. A five-star rating system is used for expressing the quality of both Linked Data (which is not necessarily open) and Linked Open Data (data that is open and linked at the same time) [4].


pages: 315 words: 70,044

Learning SPARQL by Bob Ducharme

database schema, Donald Knuth, en.wikipedia.org, G4S, linked data, semantic web, SPARQL, web application

For example, simply knowing that “spouse” is a symmetric term made it possible to find out the identity of Cindy’s spouse, even though this fact was not part of the dataset.

Linked Data

The idea of Linked Data is newer than that of the semantic web, but sometimes it’s easier to think of the semantic web as building on the ideas behind Linked Data. Linked Data is not a specification, but a set of best practices for providing a data infrastructure that makes it easier to share data across the web. You can then use semantic web technologies such as RDFS, OWL, and SPARQL to build applications around that data. Tim Berners-Lee came up with these four principles of Linked Data in 2006 (I’ve bolded his wording and added my own commentary):

Use URIs as names for things.

URIs are the best way available to uniquely identify things, and therefore to identify connections between things.
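To make the first principle concrete, here is a minimal Python sketch of looking up a Linked Data URI and asking for RDF through HTTP content negotiation. The DBpedia URI and the exact media type the server returns are assumptions for illustration; any server following the Linked Data practices should respond in a similar way.

    import requests  # third-party HTTP library

    # A URI that names a thing (an assumed DBpedia resource identifier).
    uri = "http://dbpedia.org/resource/Tim_Berners-Lee"

    # Ask for RDF (Turtle) instead of HTML; a Linked Data server is expected to
    # content-negotiate, typically redirecting to a machine-readable description.
    response = requests.get(uri, headers={"Accept": "text/turtle"})

    print(response.status_code)                   # 200 if the lookup succeeded
    print(response.headers.get("Content-Type"))   # e.g. text/turtle
    print(response.text[:500])                    # the first few statements about the resource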

., Checking, Adding, and Removing Spoken Language Tags langMatches(), Checking, Adding, and Removing Spoken Language Tags language codes, Making RDF More Readable with Language Tags and Labels, Using the Labels Provided by DBpedia, Checking, Adding, and Removing Spoken Language Tags, Checking, Adding, and Removing Spoken Language Tags checking, adding, and removing, Checking, Adding, and Removing Spoken Language Tags, Checking, Adding, and Removing Spoken Language Tags filtering on, Using the Labels Provided by DBpedia LCASE(), String Functions LIMIT, Retrieving a Specific Number of Results, Federated Queries: Searching Multiple Datasets with One Query Linked Data, What Exactly Is the “Semantic Web”?, Linked Data, Linked Data, Linked Data, Public Endpoints, Private Endpoints, Public Endpoints, Private Endpoints, Glossary intranets and, Public Endpoints, Private Endpoints Linked Open Data, Linked Data, Public Endpoints, Private Endpoints Linked Movie Database, SPARQL and Web Application Development, SPARQL and Web Application Development literal, Data Typing, Glossary LOAD, Adding Data to a Dataset local name, URLs, URIs, IRIs, and Namespaces, Glossary M MAX(), Finding the Smallest, the Biggest, the Count, the Average...

o as variable names, Searching for Strings [], Blank Nodes and Why They’re Useful (see square braces) ^ in property paths, Searching Further in the Data ^^ datatype indicator, Datatypes and Queries _ in blank node names, Blank Nodes and Why They’re Useful | in property paths, Searching Further in the Data || in boolean expressions, Program Logic Functions “"” to delimit strings in Turtle and SPARQL, Representing Strings A a (“a”) as keyword, Reusing and Creating Vocabularies: RDF Schema and OWL abs(), Numeric Functions addition, Comparing Values and Doing Arithmetic AGROVOC thesaurus, Datatypes and Queries APIs, SPARQL, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT arithmetic, Comparing Values and Doing Arithmetic, Comparing Values and Doing Arithmetic ARQ SPARQL processor, Querying the Data, Standalone Processors application development and, Standalone Processors AS, Combining Values and Assigning Values to Variables ASK, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT, Defining Rules with SPARQL, Defining Rules with SPARQL SPARQL rules and, Defining Rules with SPARQL, Defining Rules with SPARQL asterisk, Searching for Strings, Searching Further in the Data in property paths, Searching Further in the Data in SELECT expression, Searching for Strings AVG(), Finding the Smallest, the Biggest, the Count, the Average..., Grouping Data and Finding Aggregate Values within Groups B bad data, finding, Finding Bad Data, Using Existing SPARQL Rules Vocabularies BASE, Node Type Conversion Functions Berners-Lee, Tim, Why Learn SPARQL?, What Exactly Is the “Semantic Web”?, Linked Data Linked Data and, Linked Data biggest value, finding, Finding the Smallest, the Biggest, the Count, the Average..., Finding the Smallest, the Biggest, the Count, the Average... 
BIND, Combining Values and Assigning Values to Variables, Creating New Data, Comparing Values and Doing Arithmetic in CONSTRUCT queries, Creating New Data binding, More Realistic Data and Matching on Multiple Triples, Glossary, Glossary blank nodes, Blank Nodes and Why They’re Useful, Blank Nodes and Why They’re Useful, Blank Nodes and Why They’re Useful, Searching with Blank Nodes, Using Existing SPARQL Rules Vocabularies, Node Type Conversion Functions, Glossary searching with, Searching with Blank Nodes square braces to represent, Using Existing SPARQL Rules Vocabularies bnode, Blank Nodes and Why They’re Useful (see blank nodes) boolean datatype, Datatypes and Queries bound(), Finding Data That Doesn’t Meet Certain Conditions, Node Type and Datatype Checking Functions C cast, Glossary casting, Functions ceil(), Numeric Functions CGI scripts, SPARQL and Web Application Development classes, Reusing and Creating Vocabularies: RDF Schema and OWL, Reusing and Creating Vocabularies: RDF Schema and OWL, Creating New Data subclasses and, Reusing and Creating Vocabularies: RDF Schema and OWL CLEAR, Deleting Data COALESCE(), Program Logic Functions comma, Storing RDF in Files, Converting Data CONSTRUCT queries and, Converting Data in N3 and Turtle, Storing RDF in Files comma separated values, Standalone Processors comments (in Turtle and SPARQL), The Data to Query CONCAT(), Program Logic Functions CONSTRUCT, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT, Copying Data, Converting Data, Changing Existing Data prototyping update commands with, Changing Existing Data CONTAINS(), String Functions, String Functions, Extension Functions converting data, Converting Data, Converting Data copying data, Copying Data, Copying Data COUNT(), Finding the Smallest, the Biggest, the Count, the Average..., Grouping Data and Finding Aggregate Values within Groups CSS, SPARQL and Web Application Development curl utility, SPARQL and Web Application Development D D2RQ, Querying a Remote SPARQL Service, Middleware SPARQL Support data cleanup, FILTERing Data Based on Conditions data typing, Data Typing, Data Typing datatype(), Defining Rules with SPARQL, Node Type and Datatype Checking Functions datatypes, Datatypes and Queries, Datatype Conversion, Datatype Conversion converting, Datatype Conversion, Datatype Conversion custom, Datatypes and Queries date datatype, Datatypes and Queries date ranges in queries, Comparing Values and Doing Arithmetic dateTime datatype, Datatypes and Queries day(), Date and Time Functions DBpedia, Querying a Public Data Source, Using the Labels Provided by DBpedia, SPARQL and Web Application Development querying, Querying a Public Data Source decimal datatype, Datatypes and Queries default graph, Querying Named Graphs, Glossary DELETE, Deleting Data DELETE DATA, Deleting Data, Deleting Data DELETE vs., Deleting Data DESC(), Sorting Data DESCRIBE, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT, Asking for a Description of a Resource DISTINCT, Eliminating Redundant Output, Eliminating Redundant Output, Querying Named Graphs division, Comparing Values and Doing Arithmetic double precision datatype, Datatypes and Queries DROP, Dropping Graphs Dublin Core, URLs, URIs, IRIs, and Namespaces, Changing Existing Data, Glossary E ENCODE_FOR_URI(), String Functions entailment, The SPARQL Specifications, Glossary F FILTER, Searching for Strings, FILTERing Data Based on Conditions, FILTERing Data Based on Conditions float datatype, Datatypes and Queries floor(), Numeric Functions FOAF (Friend of 
a Friend), URLs, URIs, IRIs, and Namespaces, Storing RDF in Files, Converting Data, Hash Functions, Glossary hash functions in, Hash Functions Freebase, SPARQL and Web Application Development FROM, Querying the Data, Querying Named Graphs, Copying Data in CONSTRUCT queries, Copying Data FROM NAMED, Querying Named Graphs Fuseki, Getting Started with Fuseki, Getting Started with Fuseki, Adding Data to a Dataset loading data into, Adding Data to a Dataset shutting down, Getting Started with Fuseki starting up, Getting Started with Fuseki G GRAPH, Querying Named Graphs, Querying Named Graphs, Querying Named Graphs, Copying Data, Named Graphs in CONSTRUCT queries, Copying Data in update queries, Named Graphs referencing graphs not named in FROM NAMED clause, Querying Named Graphs variables with, Querying Named Graphs graph pattern, More Realistic Data and Matching on Multiple Triples, Glossary graphs (RDF), More Realistic Data and Matching on Multiple Triples, Glossary GROUP BY, Grouping Data and Finding Aggregate Values within Groups GROUP_CONCAT(), Finding the Smallest, the Biggest, the Count, the Average...


pages: 511 words: 111,423

Learning SPARQL by Bob Ducharme

Donald Knuth, en.wikipedia.org, G4S, hypertext link, linked data, place-making, semantic web, SPARQL, web application

For example, simply knowing that “spouse” is a symmetric term made it possible to find out the identity of Cindy’s spouse, even though this fact was not part of the dataset. We’ll learn more about RDFS and OWL in Chapter 9.

Linked Data

The idea of Linked Data is newer than that of the semantic web, but sometimes it’s easier to think of the semantic web as building on the ideas behind Linked Data. Linked Data is not a specification, but a set of best practices for providing a data infrastructure that makes it easier to share data across the Web. You can then use semantic web technologies such as RDFS, OWL, and SPARQL to build applications around that data. Tim Berners-Lee came up with these four principles of Linked Data in 2006 (I’ve bolded his wording and added my own commentary):

Use URIs as names for things.

URIs are the best way available to uniquely identify things, and therefore to identify connections between things.

., Checking, Adding, and Removing Spoken Language Tags langMatches(), Checking, Adding, and Removing Spoken Language Tags language codes, Making RDF More Readable with Language Tags and Labels, Checking, Adding, and Removing Spoken Language Tags–Checking, Adding, and Removing Spoken Language Tags adding, Checking, Adding, and Removing Spoken Language Tags checking, Checking, Adding, and Removing Spoken Language Tags filtering on, Using the Labels Provided by DBpedia removing, Checking, Adding, and Removing Spoken Language Tags LCASE(), String Functions, Discussion LIMIT, Retrieving a Specific Number of Results, Federated Queries: Searching Multiple Datasets with One Query Linked Data, What Exactly Is the “Semantic Web”?, Linked DataLinked Data, Problem, Glossary intranets and, Public Endpoints, Private Endpoints Linked Open Data, Linked Data, Public Endpoints, Private Endpoints Linked Movie Database, SPARQL and Web Application Development, SPARQL and Web Application Development Linked Open Data, Discussion List All Triples query, Named Graphs literal, Data Typing, Glossary LOAD, Adding Data to a Dataset local name, URLs, URIs, IRIs, and Namespaces, Extension Functions, Glossary M magic properties (see property functions) materialization of triples, Inferred Triples and Your Query MAX(), Finding the Smallest, the Biggest, the Count, the Average...

If one airline redesigns their website, the developer must update his screen-scraping program to account for these differences. Berners-Lee came up with the idea of Linked Data as a set of best practices for sharing data across the web infrastructure so that applications can more easily retrieve data from public sites with no need for screen scraping—for example, to let your calendar program get flight information from multiple airline websites in a common, machine-readable format. These best practices recommend the use of URIs to name things and the use of standards such as RDF and SPARQL. They provide excellent guidelines for the creation of an infrastructure for the semantic web.

and the semantics of that data

The idea of “semantics” is often defined as “the meaning of words.” Linked Data principles and the related standards make it easier to share data, and the use of URIs can provide a bit of semantics by providing the context of a term.
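As a hedged illustration of the "no screen scraping" point, the sketch below asks a public SPARQL endpoint for structured data using the SPARQLWrapper library. The endpoint URL, the dbo:Airline class, and the small result limit are assumptions chosen to echo the airline example; they are not taken from the book.

    from SPARQLWrapper import SPARQLWrapper, JSON  # third-party SPARQL client

    # Query a public endpoint for machine-readable data instead of scraping HTML pages.
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")  # assumed public endpoint
    sparql.setQuery("""
        PREFIX dbo:  <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?airline ?name WHERE {
          ?airline a dbo:Airline ;
                   rdfs:label ?name .
          FILTER (lang(?name) = "en")
        }
        LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["airline"]["value"], "-", row["name"]["value"])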


The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin

Bayesian statistics, business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, discrete time, disruptive innovation, George Gilder, Google Earth, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, late capitalism, lifelogging, linked data, longitudinal study, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

The key to avoiding the creation of such a negative cycle is to ensure that any initiative focuses as much on the demand side as on the supply side, providing users with interoperable data and analytic tools and other services that facilitate use and add value to the data, rather than simply linking to files.

Conclusion

At one level, the case for open and linked data is commonsensical – open data create transparency and accountability; participation, choice and social innovation; efficiency, productivity and enhanced governance; economic innovation and wealth creation. Linked data convert information across the Internet into a semantic web from which data can be machine-read and linked together. Open and linked data thus hold much promise and value as a venture. However, the case for open and linked data is more complex, and their economic underpinnings are not at all straightforward. Open and linked data might seem to have marginal costs, but their production and the technical and institutional apparatus needed to facilitate and maintain them have real costs in terms of labour, equipment, and resources.

When documents are published in this way, information on the Internet can be rendered and repackaged as data and can be linked in an infinite number of ways depending on purpose. However, as P. Miller (2010) notes, ‘linked data may be open, and open data may be linked, but it is equally possible for linked data to carry licensing or other restrictions that prevent it being considered open’, or for open data to be made available in ways that do not easily enable linking. In general, any linked documents that are not on an intranet or behind a pay wall are also open in nature. For Berners-Lee (2009), open and linked data should ideally be synonymous and he sets out five levels of such data, each with progressively more utility and value (see Table 3.3). His aspiration is for what he terms five-star (level five) data – a fully operational semantic Web.

Since the late 2000s the movement has noticeably gained prominence and traction, initially with the Guardian newspaper’s campaign in the UK to ‘Free Our Data’ (www.theguardian.com/technology/free-our-data), the Organization for Economic Cooperation and Development (OECD)’s call for member governments to open up their data in 2008, the launch in 2009 by the US government of data.gov, a website designed to provide access to non-sensitive and historical datasets held by US state and federal agencies, and the development of linked data and the promotion of the ‘Semantic Web’ as a standard element of future Internet technologies, in which open and linked data are often discursively conjoined (Berners-Lee 2009). Since 2010 dozens of countries and international organisations (e.g., the European Union [EU] and the United Nations Development Programme [UNDP]) have followed suit, making thousands of previously restricted datasets open in nature for non-commercial and commercial use (see DataRemixed 2013).


pages: 369 words: 80,355

Too Big to Know: Rethinking Knowledge Now That the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room by David Weinberger

airport security, Alfred Russel Wallace, Amazon Mechanical Turk, Berlin Wall, Black Swan, book scanning, Cass Sunstein, commoditize, corporate social responsibility, crowdsourcing, Danny Hillis, David Brooks, Debian, double entry bookkeeping, double helix, en.wikipedia.org, Exxon Valdez, Fall of the Berlin Wall, future of journalism, Galaxy Zoo, Hacker Ethic, Haight Ashbury, hive mind, Howard Rheingold, invention of the telegraph, jimmy wales, Johannes Kepler, John Harrison: Longitude, Kevin Kelly, linked data, Netflix Prize, New Journalism, Nicholas Carr, Norbert Wiener, openstreetmap, P = NP, Pluto: dwarf planet, profit motive, Ralph Waldo Emerson, RAND corporation, Ray Kurzweil, Republic of Letters, RFID, Richard Feynman, Ronald Reagan, semantic web, slashdot, social graph, Steven Pinker, Stewart Brand, technological singularity, Ted Nelson, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas Malthus, Whole Earth Catalog, X Prize

This approach may be messy and imperfect, but it is 100 percent better than not releasing data because you haven’t figured out how to get the metadata perfectly right. The rise of Linked Data encapsulates the transformation of knowledge we have explored throughout this book. While the original Semantic Web emphasized building ontologies that are “knowledge representations” of the world, it turns out that if we go straight to unleashing an abundance of linked but imperfect data, making it widely and openly available in standardized form, the Net becomes a dramatically improved infrastructure for knowledge. Linked Data is nevertheless itself only an example of a more expansive practice: Create metadata so your information can be reused. Linked Data is usable because it points beyond itself to information about the information. That’s how a “triple” about mercury can be identified as being about the chemical, the planet, or the Roman god.
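A small sketch, using the rdflib library, of how distinct URIs keep the three "mercury" readings apart; the example.org identifiers are invented for illustration.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDFS

    EX = Namespace("http://example.org/")  # hypothetical namespace for illustration
    g = Graph()

    # Three different subjects, one ambiguous English word.
    g.add((EX.Mercury_element, RDFS.label, Literal("mercury (chemical element)", lang="en")))
    g.add((EX.Mercury_planet,  RDFS.label, Literal("Mercury (planet)", lang="en")))
    g.add((EX.Mercury_god,     RDFS.label, Literal("Mercury (Roman god)", lang="en")))

    # Because each statement hangs off a distinct URI, a program can follow the URI
    # for more context instead of guessing which "mercury" a triple is about.
    for subject, _, label in g.triples((None, RDFS.label, None)):
        print(subject, "->", label)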

For example, when an article in the journal Public Library of Science Medicine examines “the predictors of live birth” in in vitro fertilization by analyzing 144,018 attempts, it links to the UK open government site where the source data—“the world’s oldest and most comprehensive database of fertility treatment in the UK”—is available. The new default is: If you’re going to cite the data, you might as well link to it. Networked facts point to where they came from and, sometimes, where they lead to. Indeed, a new standard called Linked Data is making it easier to make the facts presented in one site useful to other sites in unanticipated ways—enabling an ad hoc worldwide data commons. Key to Linked Data is the ability for a computer program not only to get the fact but to ask the resource for a link to more information about the context of the fact. Facts have become networked because our new information infrastructure happens also to be a hyperlinked publishing system. If you’re going to make a fact visible, it’s so easy to link it to its source that you’ll need some special justification not to do so.

Rather, it is that “[t]rust should have no part in science.” We used to need trust because paper-based publishing breaks knowledge off from its source. Now, however, science—which has always had a network of inter-cited publications—occurs within a network of links. We create these links by hand, computers prowl the Web suggesting new links, and the surge of interest in the Linked Data format is making it easier than ever to create clouds of linked data just waiting for new uses. In this hyperlinked environment, we will continue to tell science’s stories, but those stories will be embedded within a system of connections. We will click to see the data. We will click to have our computers compare disparate datasets, surfacing the anomalies and disagreements that will never be entirely driven out from the data of science or from its stories.


RDF Database Systems: Triples Storage and SPARQL Query Processing by Olivier Cure, Guillaume Blin

Amazon Web Services, bioinformatics, business intelligence, cloud computing, database schema, fault tolerance, full text search, information retrieval, Internet Archive, Internet of things, linked data, NP-complete, peer-to-peer, performance metric, random walk, recommendation engine, RFID, semantic web, Silicon Valley, social intelligence, software as a service, SPARQL, web application

IJCAI, 2668–2673. Ladwig, G., Harth, A., 2011. CumulusRDF: Linked data management on nested key-value stores. In: Proceedings of the 7th International Workshop on Scalable Semantic Web Knowledge Base Systems at the 10th International Semantic Web Conference (ISWC2011), Springer Berlin Heidelberg, pp. 30–42. Ladwig, G., Tran, T., 2010. Linked data query processing strategies. In: International Semantic Web Conference, vol. 1, pp. 453–469. Larsson, N.J., Moffat, A., 1999. Offline dictionary-based compression. In: Data Compression Conference, DCC 1999, Snowbird, Utah, USA, March 29–31, 1999, pp. 296–305. Le-Phuoc, D., Dao-Tran, M., Parreira, J.X., Hauswirth, M., 2011. A native and adaptive approach for unified processing of linked streams and linked data. In: Proceedings of the 10th International Conference on the Semantic Web.

See JavaScript Object Notation (JSON) JustBench, 78 K KBMS. See Knowledge base management system (KBMS) Key-value stores, 27 Knowledge base management system (KBMS), 192 Kowari system, 112 L Last-to-front mapping property, 91 Lehigh University benchmark (LUBM), 77 data set, 98, 123 Linked Data Integration Benchmark (LODIB), 78 Linked data movement, 181 LinkedIn, 99 LinkedMDb, 168 Linked open data (LOD), 3 Literal, full text search, 99 analyzing text, 100 Load scalability, 170 LOD. See Linked open data (LOD) LODIB. See Linked Data Integration Benchmark (LODIB) LUBM. See Lehigh University benchmark (LUBM) Lucene software library, 176 Lucene documents, 100 M MAAN. See Multi-attribute addressable network (MAAN) MapReduce -based cluster, 32, 109 MapReduce decompression approach, 103 MapReduce processing, 9 MapReduce programming model, 102 MapReduce tasks, 37 MariaDB system, 23 MarkLogic system, 32, 138, 179 MaRVIN system, 217 Maximum-weight independent sets problem, 153 Mediator-based information systems, 187 Membase system, 137 Memcached system, 29 Memory mapping, 81 MemSQL system, 38 Message passing interface (MPI) approach, 216 Microformats, 52 Model checking, 193 MonetDB, 19 MongoDB system, 28, 32, 137 MPI approach.

The main advantages of JSON are its simplicity, flexibility (it’s schemaless), and native processing support for most Web applications due to a tight integration with the JavaScript programming language. But RDF is not without assets. For example, as a semi-structured data model, RDF data sets can be described with expressive schema languages, such as RDF Schema (RDFS) or Web Ontology Language (OWL), and can be linked to other documents present on the Web, forming the Linked Data movement. With the emergence of Linked Data, a pattern for hyperlinking machine-readable data sets that extensively uses RDF, URIs, and HTTP, we can consider that more and more data will be directly produced in or transformed into RDF. In 2013, the linked open data (LOD), a set of RDF data produced from open data sources, was considered to contain over 50 billion triples on domains as diverse as medicine, culture, and science, just to name a few.
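A brief sketch of that RDF/JSON relationship using rdflib: a few linked-data style triples are parsed from Turtle and re-serialized as JSON-LD for a JSON-minded web application. The URIs are illustrative only, and the JSON-LD step assumes rdflib 6 or newer, where that serializer ships with the library.

    from rdflib import Graph

    # A handful of linked-data style triples in Turtle (identifiers are illustrative).
    turtle_doc = """
    @prefix ex:   <http://example.org/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    ex:alice a foaf:Person ;
        foaf:name "Alice" ;
        foaf:knows ex:bob .

    ex:bob a foaf:Person ;
        foaf:name "Bob" .
    """

    g = Graph()
    g.parse(data=turtle_doc, format="turtle")
    print(len(g), "triples parsed")

    # The same graph rendered for JSON-oriented tooling
    # (assumes rdflib 6+, where the JSON-LD serializer is built in).
    print(g.serialize(format="json-ld"))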


pages: 245 words: 68,420

Content Everywhere: Strategy and Structure for Future-Ready Content by Sara Wachter-Boettcher

crowdsourcing, John Gruber, Kickstarter, linked data, search engine result page, semantic web, Silicon Valley

But a more semantic Web seems closer than ever with the recent advent of linked data, which is made possible through structured content and markup. Coined by Tim Berners-Lee—yes, the guy who invented the World Wide Web—in 2006, linked data means exactly what it sounds like: bits of information that are linked to other, equivalent sets of data elsewhere on the Internet (often referred to as “in the cloud”), as illustrated in Figure 6.1. The idea is that, as opposed to HTML links, which link one document (e.g., a page) to another, linked data connects the things those pages are about by connecting the actual data behind those two pages instead. This gives both databases access to the information in the other, and that information then becomes more useful to both people and machines. FIGURE 6.1 Linked data connects content from different places, like between your website and Wikipedia, based on shared content attributes—and it’s getting more and more useful for connecting content across sources.

By clicking through to the broadleaf forest page, you can see what a broadleaf forest means, as well as what other sorts of animals live there. And all these pages of content are built automatically, using the content’s underlying structure to dictate what’s contextually relevant where. Finally, remember our introduction to linked data in Chapter 6, “Understanding Markup”? Well, the BBC is making use of that, too. Rather than, say, hiring writers to craft overviews of every animal the BBC has video footage about, the organization relies on content from other sources, accessible via linked data. That is, by structuring content along the same lines as sources like Wikipedia, the BBC can automatically pull in the content it doesn’t have—and isn’t invested enough in to create—from an external source. FIGURE 8.3 BBC Wildlife’s collection of cheetah-related content and links to other animals, habitats, and more—all pulled together automatically based on their content’s structure.

Because of this, it forms the basis for a number of the other markup approaches listed next, but this also gives XML plenty of critics who say it’s too clunky and difficult to write and use, as well as those who say it’s too generic to be a standard.

RDF

RDF, which stands for Resource Description Framework, is a generic method used to describe concepts—specifically, to describe things and their relationships with other things. It can be written using a variety of other languages, including both XML and JSON. It’s the glue that holds linked data together, providing a language for describing data by using three elements to form a machine-readable statement: a subject, a predicate, and an object—such as stating that the Declaration of Independence has an author of Thomas Jefferson. However, in practice, RDF is currently used relatively little.

OWL

Web Ontology Language, somewhat confusingly abbreviated as OWL, is built on RDF and expressed using XML.
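The author statement from the RDF paragraph above, expressed as a machine-readable triple with rdflib; the resource URIs are invented, and Dublin Core's creator term stands in for "has an author of".

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import DCTERMS, FOAF

    EX = Namespace("http://example.org/")  # hypothetical identifiers, for illustration only
    g = Graph()

    # Subject: the Declaration of Independence
    # Predicate: "has an author of" (Dublin Core's dcterms:creator is used here)
    # Object: Thomas Jefferson
    g.add((EX.DeclarationOfIndependence, DCTERMS.creator, EX.ThomasJefferson))
    g.add((EX.ThomasJefferson, FOAF.name, Literal("Thomas Jefferson")))

    # One machine-readable statement per line, serialized as Turtle.
    print(g.serialize(format="turtle"))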


pages: 223 words: 52,808

Intertwingled: The Work and Influence of Ted Nelson (History of Computing) by Douglas R. Dechow

3D printing, Apple II, Bill Duvall, Brewster Kahle, Buckminster Fuller, Claude Shannon: information theory, cognitive dissonance, computer age, conceptual framework, Douglas Engelbart, Douglas Engelbart, Dynabook, Edward Snowden, game design, HyperCard, hypertext link, information retrieval, Internet Archive, Jaron Lanier, knowledge worker, linked data, Marc Andreessen, Marshall McLuhan, Menlo Park, Mother of all demos, pre–internet, RAND corporation, semantic web, Silicon Valley, software studies, Steve Jobs, Steve Wozniak, Stewart Brand, Ted Nelson, the medium is the message, Vannevar Bush, Wall-E, Whole Earth Catalog

So for me this really was a seminal conference with so many truly ground breaking ideas emerging at the same time, apparently orthogonal to each other but actually all the same thing as time has confirmed, since the Google Knowledge Graph is the Semantic Web or ZigZag by another name. It’s all about linking data. This is a much quieter revolution than that initiated by the document Web but it will be much more far reaching. Linked data will become an integral part of the development of data-driven systems architectures that will revolutionize the way we build and maintain information management systems. Linked data architectures will supersede relational databases, make websites easier to build and unify the worlds of hypertext, document management, and databases to create rich interlinked knowledge-based systems as envisaged by the pioneers such as Ted and Doug over 50 years ago. But the linked data revolution was very slow to take off—largely because it’s hard to explain the key concepts to people and what the benefits are.

Open Access

This chapter is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

References

1. Agosti M, Ferro N (2007) A formal model of annotations of digital content. ACM Trans Inf Syst 26(1). doi:10.1145/1292591.1292594
2. Baca M (1998) Introduction to metadata: pathways to digital information. Getty Information Institute, Los Angeles
3. Bechhofer S, Buchan I, De Roure D, Missier P, Ainsworth J, Bhagat J, Goble C et al (2013) Why linked data is not enough for scientists. Futur Gener Comput Syst 29(2). Special section: Recent advances in e-Science: 599–611. doi:10.1016/j.future.2011.08.004
4. Bechhofer S, De Roure D, Gamble M, Goble C, Buchan I (2010) Research objects: towards exchange and reuse of digital knowledge. Nat Proc. doi:10.1038/npre.2010.4626.1
5. Bell G (2001) A personal digital store. Commun ACM 44(1):86–91. doi:10.1145/357489.357513
6.

Three things happened at that conference as I recall. Tim started talking about the Semantic Web again in his keynote for the conference. He had talked about it at the first WWW conference in 1994 [1] and the idea of making links on data in the information management proposal he wrote in 1989. As far as he was concerned in 1998, the web of linked documents was beginning to emerge but his vision wasn’t complete until it was also a web of linked data, and so he started to re-educate the community about this at the Brisbane conference. Ted was also at the Brisbane conference to pick up a special award. I remember him demoing ZigZag to us in the bar one night at that conference. He was so excited, and we were all mesmerized. So I had heard Tim talk about the Semantic Web and I saw Ted demo ZigZag at the same conference, and I didn’t fully appreciate either of them at the time.


The Art of SEO by Eric Enge, Stephan Spencer, Jessie Stricchiola, Rand Fishkin

AltaVista, barriers to entry, bounce rate, Build a better mousetrap, business intelligence, cloud computing, dark matter, en.wikipedia.org, Firefox, Google Chrome, Google Earth, hypertext link, index card, information retrieval, Internet Archive, Law of Accelerating Returns, linked data, mass immigration, Metcalfe’s law, Network effects, optical character recognition, PageRank, performance metric, risk tolerance, search engine result page, self-driving car, sentiment analysis, social web, sorting algorithm, speech recognition, Steven Levy, text mining, web application, wikimedia commons

Figure 10-51 and Figure 10-52 depict some example graphs showing the rate of new external links (and in the last two instances, pages) created over time, with some speculation as to what the trends might indicate.

Figure 10-51. Interpreting new external link data
Figure 10-52. More link data speculation

These assumptions do not necessarily hold true for every site or instance, but the graphs make it easy to see how the engines can use temporal link and content growth information to make guesses about the relevance or worthiness of a particular site. Figure 10-53 shows some guesstimates of a few real sites and how these trends have affected them.

Figure 10-53. Wikipedia link data guesstimates

As you can see in Figure 10-53, Wikipedia has had tremendous growth in both pages and links from 2007 through 2011. This success manifests itself in the search engines, which reward Wikipedia’s massive link authority with high rankings for much of its content.

Google and Bing Webmaster Tools

As mentioned earlier, other valuable sources of data include Google Webmaster Tools and Bing Webmaster Tools. We cover these extensively in Using Search Engine–Supplied SEO Tools. From a planning perspective, you will want to get these tools in place as soon as possible. Both tools provide valuable insight into how the search engines see your site. This includes things such as external link data, internal link data, crawl errors, high-volume search terms, and much, much more. Note: Some companies will not want to set up these tools because they do not want to share their data with the search engines, but this is a nonissue, as the tools do not provide the search engines with any more data about your website; rather, they let you see some of the data the search engines already have.

Search Analytics

Search analytics is a new and emerging category of tools.

Bing report on external links

For quick and dirty link totals, you can use a Firefox plug-in known as SearchStatus. This plug-in provides basic link data on the fly with just a couple of mouse clicks. Figure 10-23 shows the menu you’ll see with regard to backlinks. Notice also in the figure that the SearchStatus plug-in offers an option for highlighting NoFollow links, as well as many other capabilities. It is a great tool that allows you to pull numbers such as these much more quickly than would otherwise be possible.

Figure 10-23. Firefox SearchStatus plug-in

Third-party link-measuring tools

Here is a look at some of the better-known advanced third-party tools for gathering link data.

Open Site Explorer

Open Site Explorer was developed based on crawl data obtained by SEOmoz, plus a variety of parties engaged by SEOmoz.


Beautiful Visualization by Julie Steele

barriers to entry, correlation does not imply causation, data acquisition, database schema, Drosophila, en.wikipedia.org, epigenetics, global pandemic, Hans Rosling, index card, information retrieval, iterative process, linked data, Mercator projection, meta analysis, meta-analysis, natural language processing, Netflix Prize, pattern recognition, peer-to-peer, performance metric, QR code, recommendation engine, semantic web, social graph, sorting algorithm, Steve Jobs, web application, wikimedia commons

However, choosing an effective presentation is challenging, as not all information visualizations are created equally. Not all information visualizations highlight the patterns, gaps, and outliers important to analysts’ tasks, and furthermore, not all information visualizations “force us to notice what we never expected to see” (Tukey 1977). A growing trend in data analysis is to make sense of linked data as networks. Rather than looking solely at attributes of data, network analysts also focus on the connections between data and the resulting structures. My research focuses on understanding these networks because they are topical, emergent, and inherently challenging for analysts. Networks are difficult to visualize and navigate, and, most problematically, it is difficult to find task-relevant patterns.

Data Model Definition Plus Emergence

To get an idea of the basic structure, the first thing we want to see in a database evaluation is the data model—if possible, including some indicators of how the actual data is distributed within the model. If we’re starting from a graph representation of the database, as defined in Figure 14-2, this is a simple task. All we need is a nodeset and an edgeset, which can be easily produced from a relational set of tables; it might even come for free if the database is available in the form of an RDF dump (Freebase 2009) or as Linked Data (Bizer, Heath, and Berners-Lee 2009). From there, we can easily produce a node-link diagram using a graph drawing program such as Cytoscape (Shannon et al. 2003)—an open source application that has its roots in the biological networks scientific community. The resulting diagram, shown in Figure 14-3, depicts the given data model in a similar way as a regular Entity-Relationship (E-R) data structure diagram (Chen 1976), enriched with some quantitative information about the actual data.
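As a rough sketch of handling such a nodeset and edgeset programmatically, the following uses the networkx library to build a small node-link structure from subject-link-object style edges and to count node and link type frequencies, the kind of quantitative enrichment mentioned above; the sample data is invented.

    from collections import Counter
    import networkx as nx

    # A toy edgeset in (source, link_type, target) form, e.g. extracted from an RDF dump.
    edges = [
        ("person:alice", "authored", "doc:report1"),
        ("person:bob",   "authored", "doc:report2"),
        ("person:alice", "cites",    "doc:report2"),
        ("doc:report1",  "cites",    "doc:report2"),
    ]

    g = nx.MultiDiGraph()
    for source, link_type, target in edges:
        g.add_edge(source, target, type=link_type)

    # Frequencies of node types and link types: the quantitative information
    # that can be layered onto the data-model diagram.
    node_types = Counter(node.split(":")[0] for node in g.nodes())
    link_types = Counter(data["type"] for _, _, data in g.edges(data=True))

    print(node_types)  # Counter({'person': 2, 'doc': 2})
    print(link_types)  # Counter({'authored': 2, 'cites': 2})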

Frequent node and link types occur way more often in reality than the majority of less frequent types—a fact that is usually not reflected in traditional E-R data structure diagrams, often leading to lengthy discussions about almost irrelevant areas of particular database models.

Figure 14-3. The CENSUS data model as a weighted node-link diagram

The heterogeneity of node and link type frequency evidenced in Figure 14-3 is not restricted to our example. It is observable in many datasets, including research databases (Schich and Ebert-Schifferer 2009), large bibliographies (Schich et al. 2009), Freebase, and the Linked Data cloud, regardless of whether the number of types is predefined or expandable by the curators. In all cases that I have seen so far, both the number of nodes per node type and the number of links per link type exhibit right-skewed diminishing distributions, which are widely known as long tails (Anderson 2006, Newman 2005), and lack a shared average as found in a normal Gaussian distribution. The comparable long-tail structure of hyperlinks in web pages—i.e., of a single link type in only one node type—has been well known for over a decade (Science 2009).


pages: 100 words: 15,500

Getting Started with D3 by Mike Dewar

Firefox, Google Chrome, linked data

First, we lay out the circles and edges:

    var width = 1500,
        height = 1500;

    var svg = d3.select("body")
      .append("svg")
      .attr("width", width)
      .attr("height", height);

    var node = svg.selectAll("circle.node")
      .data(data.nodes)
      .enter()
      .append("circle")
      .attr("class", "node")
      .attr("r", 12);

    var link = svg.selectAll("line.link")
      .data(data.links)
      .enter().append("line")
      .style("stroke", "black");

This populates the web page with the appropriate elements; we just need to lay them out. The force layout applies a force-directed algorithm to decide the position of each node. Here, each node feels a repulsive force from every other node, but is constrained by the edges that keep nodes connected together. This can result in an organic layout that looks wonderfully inviting as it unfolds. D3 makes it easy; first we instantiate the algorithm:

    var force = d3.layout.force()
      .charge(-120)
      .linkDistance(30)
      .size([width, height])
      .nodes(data.nodes)
      .links(data.links)
      .start();

These methods are all custom methods for the algorithm that detail the various parameters and references the algorithm needs to compute how the position of the nodes and edges should change.


pages: 713 words: 93,944

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Eric Redmond, Jim Wilson, Jim R. Wilson

AGPL, Amazon Web Services, create, read, update, delete, data is the new oil, database schema, Debian, domain-specific language, en.wikipedia.org, fault tolerance, full text search, general-purpose programming language, Kickstarter, linked data, MVC pattern, natural language processing, node package manager, random walk, recommendation engine, Ruby on Rails, Skype, social graph, web application

For example, if the text of the article on Star Wars contains the string "[[Yoda|jedi master]]", we want to store that relationship twice—once as an outgoing link from Star Wars and once as an incoming link to Yoda. Storing the relationship twice means that it’s fast to look up both a page’s outgoing links and its incoming links. To store this additional link data, we’ll create a new table. Head over to the shell and enter this:

    hbase> create 'links', {
      NAME => 'to', VERSIONS => 1, BLOOMFILTER => 'ROWCOL'
    },{
      NAME => 'from', VERSIONS => 1, BLOOMFILTER => 'ROWCOL'
    }

In principle, we could have chosen to shove the link data into an existing column family or merely added one or more additional column families to the wiki table, rather than create a new one. Creating a separate table has the advantage that the tables have separate regions. This means that the cluster can more effectively split regions as necessary.
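For comparison, a hedged sketch of the same double write from Python using the happybase client library; the connection details and stored values are assumptions, and the column families match the 'to' and 'from' families created above.

    import happybase  # third-party HBase client

    connection = happybase.Connection("localhost")  # assumed Thrift gateway
    links = connection.table("links")

    # Store the relationship twice, as described above: once as an outgoing
    # link from Star Wars and once as an incoming link to Yoda.
    links.put(b"Star Wars", {b"to:Yoda": b"jedi master"})
    links.put(b"Yoda", {b"from:Star Wars": b"jedi master"})

    # Looking up either direction is now a single row read.
    print(links.row(b"Star Wars"))  # {b'to:Yoda': b'jedi master'}
    print(links.row(b"Yoda"))       # {b'from:Star Wars': b'jedi master'}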

A graph database consists of nodes and relationships between nodes. Both nodes and relationships can have properties—key-value pairs—that store data. The real strength of graph databases is traversing through the nodes by following relationships. In Chapter 7, Neo4J, we discuss the most popular graph database today, Neo4J.

Neo4J

One operation where other databases often fall flat is crawling through self-referential or otherwise intricately linked data. This is exactly where Neo4J shines. The benefit of using a graph database is the ability to quickly traverse nodes and relationships to find relevant data. Often found in social networking applications, graph databases are gaining traction for their flexibility, with Neo4j as a pinnacle implementation.

Polyglot

In the wild, databases are often used alongside other databases. It’s still common to find a lone relational database, but over time it is becoming popular to use several databases together, leveraging their strengths to create an ecosystem that is more powerful, capable, and robust than the sum of its parts.

We’ll put Ace in cage 2 and also point to cage 1 tagged with next_to so we know that it’s nearby.

    $ curl -X PUT http://localhost:8091/riak/cages/2 \
      -H "Content-Type: application/json" \
      -H "Link:</riak/animals/ace>;riaktag=\"contains\",
        </riak/cages/1>;riaktag=\"next_to\"" \
      -d '{"room" : 101}'

What makes Links special in Riak is link walking (and a more powerful variant, linked mapreduce queries, which we investigate tomorrow). Getting the linked data is achieved by appending a link spec to the URL that is structured like this: /_,_,_. The underscores (_) in the URL represent wildcards to each of the link criteria: bucket, tag, keep. We’ll explain those terms shortly. First let’s retrieve all links from cage 1.

    $ curl http://localhost:8091/riak/cages/1/_,_,_
    --4PYi9DW8iJK5aCvQQrrP7mh7jZs
    Content-Type: multipart/mixed; boundary=Av1fawIA4WjypRlz5gHJtrRqklD

    --Av1fawIA4WjypRlz5gHJtrRqklD
    X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fvrde/U5gymRMY+VwZw35gRfFgA=
    Location: /riak/animals/polly
    Content-Type: application/json
    Link: </riak/animals>; rel="up"
    Etag: VD0ZAfOTsIHsgG5PM3YZW
    Last-Modified: Tue, 13 Dec 2011 17:53:59 GMT

    {"nickname" : "Sweet Polly Purebred", "breed" : "Purebred"}
    --Av1fawIA4WjypRlz5gHJtrRqklD--

    --4PYi9DW8iJK5aCvQQrrP7mh7jZs--

It returns a multipart/mixed dump of headers plus bodies of all linked keys/values.


Cataloging the World: Paul Otlet and the Birth of the Information Age by Alex Wright

1960s counterculture, Ada Lovelace, barriers to entry, British Empire, business climate, business intelligence, Cape to Cairo, card file, centralized clearinghouse, corporate governance, crowdsourcing, Danny Hillis, Deng Xiaoping, don't be evil, Douglas Engelbart, Douglas Engelbart, Electric Kool-Aid Acid Test, European colonialism, Frederick Winslow Taylor, hive mind, Howard Rheingold, index card, information retrieval, invention of movable type, invention of the printing press, Jane Jacobs, John Markoff, Kevin Kelly, knowledge worker, Law of Accelerating Returns, linked data, Livingstone, I presume, lone genius, Menlo Park, Mother of all demos, Norman Mailer, out of africa, packet switching, profit motive, RAND corporation, Ray Kurzweil, Scramble for Africa, self-driving car, semantic web, Silicon Valley, speech recognition, Steve Jobs, Stewart Brand, Ted Nelson, The Death and Life of Great American Cities, the scientific method, Thomas L Friedman, urban planning, Vannevar Bush, Whole Earth Catalog

“The knowledge web will make us all smarter.” Hillis’s optimism for the possibilities of the Semantic Web eventually paid off. One year after writing that essay, he established a company called MetaWeb that created Freebase, which he characterized as an “open, shared database of the world’s knowledge.” In 2010, he sold the company to Google, where its structured snippets now often complement traditional keyword-based search results. In recent years, the Linked Data movement has to some extent subsumed the Semantic Web initiative. Linked Data proposes more of a middle ground, in which ontologies might be derived programmatically from analyzing large data sets, rather than manually created by teams of experts. This middle way approach might incorporate some of Otlet’s ideas: a topical structure further refined by automated discovery, bidirectional linking, and the ability to extract content from static documents, then synthesize and interpolate it in new ways. In a widely circulated 2005 essay, “Ontology Is Overrated,” Clay Shirky argues that projects like the Semantic Web were doomed to failure in the Internet age.

See also Dewey Decimal System development of, 226–227 expanded use of, 40, 232 Josephinian Catalog, 33 playing cards, use of, 33 rejecting Universal Bibliography, 72 significance of, 33 standardized catalog cards, 105 supplies for, as business, 41–42 Library of Congress, 20, 29, 37 Licklider, J. C. R., 15, 248–250, 251, 258, 259 Limited Company for Useful Knowledge, 46 Limousin, Charles, 76 Linked Data movement, 278 Linotype, 89, 91, 92 Lippman, Walter, 162 Literary Machines (Nelson), 266 Lodge, Henry Cabot, 148, 165 Lovelace, Ada, 15 Lumière brothers, 62 Macintosh operating system, 260 Malware, 272 Man-Computer Symbiosis (Licklider), 248 Marburg, Theodore, 143 MARK II computer, 258 Markoff, John, 260 Marlowe, Christopher, 24 Marx, Karl, 59 The Master (Tóibín), 127 Masure, Louis, 158 Max, Adolphe, 105 Mazower, Mark, 67 Mechanical collective brain, 206, 218, 287 Mein Kampf (Hitler), 68 Memex, 217, 254, 256, 256–257 Mergenthaler, Otto, 89 Meta-bibliography, 242 MetaWeb (company), 278 Metric system, 30, 150 Microcosm project (England), 270 Microfilm, 100, 193, 200, 208, 210, 218, 250, 255, 274 Microphotic book, 101–107 Microphotography, 101, 208 Military origins of computers and Internet, 18, 248, 252, 258, 265 A Model Utopia (Wells), 211 Modernism, 179, 191 Mondotheque, 235, 238, 257, 296 Mons (Belgian city), 300 Morel, Edward, 54 Morgan, Pierpont, 125 Morris, William, 36 Morse, Samuel, 90 Motion pictures, possibilities of, 228–229 La Muette de Portici (The Mute Girl of Portici, opera), 44 Multimedia, envisioning of, 199 344 INDEX Mumford, Lewis, 115, 302 Mundaneum, 176–189 Berner-Lee’s views compared to, 274 compared to World Wide Web, 19, 234, 253–254, 277–278, 298 creation of, 5, 176–177, 179 design of, 177, 181–183, 182, 185–188, 187, 277 Encyclopedia Universalis Mundaneum (EUM) and, 193 goals of, 18, 177, 185, 292, 304–305 Google partnership with, 295–297 Le Corbusier’s role in, 181–188 Mons location of, 300–301 Nelson’s views compared to, 266 obscurity of, 11–12 Otlet’s description of, 18, 234–235, 238–239, 242, 243 role in utopian World City, 9, 303 World War II fate of, 9–11, 10 Mundaneum (Le Corbusier and Otlet pamphlet), 181, 182 Murray, James, 32 Musée d’Otlet (childhood display by Otlet), 46 Muséothèque (exhibition kit), 194 Museum for the Book (Brussels), 92–93 Museum of Society and Economy (Vienna), 194–195, 196–197 Museums and museum exhibits, role of, 102, 190–201, 227 Mussolini, 189 National Association for the Advancement of Colored People (NAACP), 168, 171 Nationalism, 144, 245 National Science Foundation, 252, 268 Nazis.

See Palais Mondial Worldstream, 291 World War I, 17, 18, 144–145 World War II Nazi book seizures and burnings, 4–5, 7, 12 Nazi occupation of Belgium, 18 Nazi persecution of Goldberg, 210 Nazi persecution of Zamenhof, 68 Otlet’s attempt to save Mundaneum, 10–11 Rosenberg Commission’s interest in Otlet, 4, 5, 7, 13, 245 World Wide Web. See also Internet flatness of, 285, 303 fundamental disorder of, 253–254, 282, 305 Knowledge Web, 276 Linked Data movement, 278 negatives of structure of, 272, 281, 289–291 ongoing development of, 280, 291 openness of, 271–272, 279, 281, 283, 285 origins of, 14, 15, 217, 252–253, 262, 270–275 Otlet’s prophetic vision of, 8, 14–15, 233–234 popularity of, 289 Semantic Web, 273–276, 278–279, 305 World Wide Web Consortium (W3C), 271, 273, 281 Wright, Frank Lloyd, 181, 262 Writers and economic chain of knowledge production, 231–232 WWW Consortium, 253 Xanadu, 264, 267 Xerox PARC (Palo Alto Research Center), 260 Young Friends of the World Palace, 202 Zamenhof, Ludwig, 67–68, 206 Zeiss Ikon camera company, 208, 210 Zero, Mr.


pages: 430 words: 68,225

Blockchain Basics: A Non-Technical Introduction in 25 Steps by Daniel Drescher

bitcoin, blockchain, business process, central bank independence, collaborative editing, cryptocurrency, disintermediation, disruptive innovation, distributed ledger, Ethereum, ethereum blockchain, fiat currency, job automation, linked data, peer-to-peer, place-making, Satoshi Nakamoto, smart contracts, transaction costs

Since broken hash references serve as evidence that data were changed after the reference was created, the whole construct stores data in a change-sensitive manner.

How It Works

There are two classical patterns of using hash references in order to store data in a change-sensitive manner:
• The chain
• The tree

The Chain

A chain of linked data, also called a linked list, is formed when each piece of data also contains a hash reference to another piece of data. Such a structure is useful for storing and linking data together that are not fully available at one given point in time but instead arrive step by step in an ongoing fashion. Figure 11-4 illustrates this idea by using the symbols introduced above. The creation of such a chain starts with the piece of data labeled Data 1 and the creation of the hash reference R1.
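A minimal Python sketch of the chain pattern: each new entry stores a hash reference to the previous one, so changing any earlier piece of data breaks every later reference. The hashing and data layout are illustrative, not the book's notation.

    import hashlib
    import json

    def hash_ref(entry: dict) -> str:
        """Create a hash reference to an entry by hashing it deterministically."""
        return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

    def append(chain: list, data) -> None:
        """Add a new entry whose hash reference points at the previous entry."""
        previous = hash_ref(chain[-1]) if chain else None
        chain.append({"data": data, "prev": previous})

    def verify(chain: list) -> bool:
        """Recompute every reference; any change to earlier data shows up as a break."""
        return all(chain[i]["prev"] == hash_ref(chain[i - 1]) for i in range(1, len(chain)))

    chain = []
    for item in ["Data 1", "Data 2", "Data 3"]:
        append(chain, item)

    print(verify(chain))            # True
    chain[0]["data"] = "tampered"
    print(verify(chain))            # False: the broken reference is the evidence of change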

The peers communicate with one another by utilizing a gossip-style message-passing protocol that ensures that eventually each peer will receive all of the information.

Figure 21-5. Architecture and its underlying concepts

Consensus Logic

Since all the nodes of the distributed system maintain their history of transaction data independently, their content can differ due to delays or other adversities of passing messages through a network. As a result, the data store that was meant to form a straight line of linked data blocks actually forms a tree-shaped data structure where each branch represents a conflicting version of the transaction history. The consensus logic as depicted in Figure 21-6 makes all nodes of the system eventually consistent by making them choose the identical version of the transaction history that unites the most collective effort.

Figure 21-6. Consensus logic and its underlying concepts

Gaining Abstraction

Abstraction is gained by identifying and distinguishing the components of the blockchain that are specific to the goal of managing ownership from those that are agnostic to the specific application goal.
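A toy sketch of that selection rule: among conflicting branches, pick the one representing the most accumulated effort. The per-block "difficulty" number is a stand-in assumption for real proof-of-work accounting.

    # Each branch is one conflicting version of the transaction history;
    # "difficulty" approximates the effort spent on each block.
    branch_a = [{"id": 1, "difficulty": 10}, {"id": 2, "difficulty": 12}]
    branch_b = [{"id": 1, "difficulty": 10}, {"id": 3, "difficulty": 11}, {"id": 4, "difficulty": 9}]

    def collective_effort(branch):
        return sum(block["difficulty"] for block in branch)

    # Every node applying the same rule eventually settles on the same history.
    chosen = max([branch_a, branch_b], key=collective_effort)
    print([block["id"] for block in chosen])  # [1, 3, 4]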


pages: 193 words: 19,478

Memory Machines: The Evolution of Hypertext by Belinda Barnet

augmented reality, Benoit Mandelbrot, Bill Duvall, British Empire, Buckminster Fuller, Claude Shannon: information theory, collateralized debt obligation, computer age, conceptual framework, Douglas Engelbart, Douglas Engelbart, game design, hiring and firing, Howard Rheingold, HyperCard, hypertext link, information retrieval, Internet Archive, John Markoff, linked data, mandelbrot fractal, Marshall McLuhan, Menlo Park, nonsequential writing, Norbert Wiener, publish or perish, Robert Metcalfe, semantic web, Steve Jobs, Stewart Brand, technoutopianism, Ted Nelson, the scientific method, Vannevar Bush, wikimedia commons

(Bolter 1984, 163) Bolter’s ideas around ‘topographic writing’ were nascent when he started collaborating with Joyce in September 1983 (Bolter and Joyce 1986, 10). They would later have a profound influence over hypertext theory and criticism, and also the Storyspace system. From the outset, the nodes in Storyspace were called ‘writing spaces’, and it worked explicitly with topographic metaphors, incorporating a graphic ‘map view’ of the link data structure from the first version, along with a tree and an outline view (which are also visual representations of the data). ‘The tree’, Bolter tells us in Turing’s Man, ‘is a remarkably useful way of representing logical relations in spatial terms’ (Bolter 1984, 86). Also in line with the topographic metaphor, writing spaces in Storyspace acted (and still act) as containers for other writing spaces; an author literally ‘builds’ the space as she traverses it, zooming in and out to view details of the work, the map making the territory.

GLOSSA was a basic implementation in this sense, based on Bolter’s experience with classical texts. ‘You’d tab a text and then you’d be able to associate notes with any particular word or phrase in the text […] an automated version of classical texts with notes’ (Bolter 2011). It wasn’t clickable because the IBM PC wasn’t clickable at the time; the user would move the cursor over the word and select it. This link data structure formed the basis for their future experiments ‘only in the sense that it had this quality of one text leading to another’ (Bolter 2011). In his well-researched chapter on afternoon, Matthew Kirschenbaum suggests that Storyspace has ‘significant grounding in a hierarchical data model’ (Kirschenbaum 2008, 173) that has its origins in the tree structures of ‘interactive fictions of the Adventure type’ (Kirschenbaum 2008, 175).

Hypertext critic Jane Yellowlees Douglas (Joyce’s favourite reader, whose dissertation was on afternoon) argues this node ‘completes’ the work for her, but it is accessible only after the reader has seen a certain sequence of other nodes; ‘a succession of guard fields ensures that it is reached only after a lengthy visitation of fifty-seven narrative places’ (Yellowlees Douglas 2004, 106). Guard fields are a powerful device, and one that Joyce deploys to full effect in afternoon. According to the Markle Report, Joyce ‘agitated’ for them to be included in the design of Storyspace from the outset, and Bolter quickly obliged in their fledgling program: It was just a matter of putting a field into the link data structure that would contain the guard, and then just checking that field […] against what the user did before they were allowed to follow the link […] It was [that] idea you know and it was Michael’s. (Bolter 2011) Guard fields, along with the topographic ‘spatial’ writing style, have remained integral to the Storyspace program for 30 years hence. In 1985 Bolter became involved with an interdisciplinary research group at UNC directed by a colleague from computer science, John B.
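The guard-field idea reads naturally as a small check attached to each link; the sketch below is a hypothetical illustration in Python (not Storyspace's actual data model), and the node names are made up:

    from dataclasses import dataclass, field
    from typing import Set

    @dataclass
    class Link:
        target: str
        guard: Set[str] = field(default_factory=set)   # nodes the reader must have visited

        def may_follow(self, visited: Set[str]) -> bool:
            # The guard field is checked against what the reader has already done.
            return self.guard.issubset(visited)

    ending = Link(target="closing-node", guard={"node-12", "node-57"})
    print(ending.may_follow({"node-1"}))                        # False: guard not yet satisfied
    print(ending.may_follow({"node-1", "node-12", "node-57"}))  # True: the link may be followed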


pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem

Amazon Web Services, anti-pattern, bioinformatics, commoditize, corporate governance, create, read, update, delete, data acquisition, en.wikipedia.org, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, web application

Individually, single triples are semantically rather poor, but en masse they provide a rich dataset from which to harvest knowledge and infer connections. Triple stores typically provide SPARQL capabilities to reason about stored RDF data.11 RDF—the lingua franca of triple stores and the Semantic Web—can be serialized several ways. The example “RDF encoding of a simple three-node graph” shows the RDF/XML format. Here we see how triples come together to form linked data.

RDF encoding of a simple three-node graph:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns="http://www.example.org/ter
      <rdf:Description rdf:about="http://www.example.org/ginger">
        <name>Ginger Rogers</name>
        <occupation>dancer</occupation>
        <partner rdf:resource="http://www.example.org/fred"/>
      </rdf:Description>
      <rdf:Description rdf:about="http://www.example.org/fred">
        <name>Fred Astaire</name>
        <occupation>dancer</occupation>
        <likes rdf:resource="http://www.example.org/ice-cream"/>
      </rdf:Description>
    </rdf:RDF>

10. http://www.w3.org/standards/semanticweb/
11. See http://www.w3.org/TR/rdf-sparql-query/ and http://www.w3.org/RDF/

W3C support. That they produce logical representations of triples doesn’t mean triple stores necessarily have triple-like internal implementations. Most triple stores, however, are unified by their support for Semantic Web technology such as RDF and SPARQL. While there’s nothing particularly special about RDF as a means of serializing linked data, it is endorsed by the W3C and therefore benefits from being widely understood and well documented. The query language SPARQL benefits from similar W3C patronage. In the graph database space there is a similar abundance of innovation around graph serialization formats (e.g. GEOFF) and inferencing query languages (e.g. the Cypher query language that we use throughout this book).12 The key difference is that at this point these innovations do not enjoy the patronage of a well-regarded body like the W3C, though they do benefit from strong engagement within their user and vendor communities.
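To make the triple/SPARQL idea concrete, here is a small hedged sketch using the rdflib Python library (an assumption; the book's own examples use Cypher, not this code). It asserts the Ginger/Fred triples programmatically and runs a one-line SPARQL query over them:

    from rdflib import Graph, Literal, Namespace   # third-party: pip install rdflib

    EX = Namespace("http://www.example.org/")
    g = Graph()

    # The same three-node graph as the RDF/XML example, asserted triple by triple.
    g.add((EX.ginger, EX.name, Literal("Ginger Rogers")))
    g.add((EX.ginger, EX.occupation, Literal("dancer")))
    g.add((EX.ginger, EX.partner, EX.fred))
    g.add((EX.fred, EX.name, Literal("Fred Astaire")))
    g.add((EX.fred, EX.occupation, Literal("dancer")))
    g.add((EX.fred, EX.likes, EX["ice-cream"]))

    # "Who is Ginger's partner, and what is that resource's name?"
    query = """
        PREFIX ex: <http://www.example.org/>
        SELECT ?name WHERE {
            ex:ginger ex:partner ?p .
            ?p ex:name ?name .
        }
    """
    for (name,) in g.query(query):
        print(name)   # -> Fred Astaire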


pages: 58 words: 12,386

Big Data Glossary by Pete Warden

business intelligence, crowdsourcing, fault tolerance, information retrieval, linked data, natural language processing, recommendation engine, web application

Google Refine: Google Refine is an update to the Freebase Gridworks tool for cleaning up large, messy spreadsheets. It has been designed to make it easy to correct the most common errors you’ll encounter in human-created datasets. For example, it’s easy to spot and correct common problems like typos or inconsistencies in text values and to change cells from one format to another. There’s also rich support for linking data by calling APIs with the data contained in existing rows to augment the spreadsheet with information from external sources. Refine doesn’t let you do anything you can’t do with other tools, but its power comes from how well it supports a typical extract and transform workflow. It feels like a good step up in abstraction, packaging processes that would typically take multiple steps in a scripting language or spreadsheet package into single operations with sensible defaults.


Data and the City by Rob Kitchin,Tracey P. Lauriault,Gavin McArdle

A Declaration of the Independence of Cyberspace, bike sharing scheme, bitcoin, blockchain, Bretton Woods, Chelsea Manning, citizen journalism, Claude Shannon: information theory, clean water, cloud computing, complexity theory, conceptual framework, corporate governance, correlation does not imply causation, create, read, update, delete, crowdsourcing, cryptocurrency, dematerialisation, digital map, distributed ledger, fault tolerance, fiat currency, Filter Bubble, floating exchange rates, global value chain, Google Earth, hive mind, Internet of things, Kickstarter, knowledge economy, lifelogging, linked data, loose coupling, new economy, New Urbanism, Nicholas Carr, open economy, openstreetmap, packet switching, pattern recognition, performance metric, place-making, RAND corporation, RFID, Richard Florida, ride hailing / ride sharing, semantic web, sentiment analysis, sharing economy, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart contracts, smart grid, smart meter, social graph, software studies, statistical model, TaskRabbit, text mining, The Chicago School, The Death and Life of Great American Cities, the market place, the medium is the message, the scientific method, Toyota Production System, urban planning, urban sprawl, web application

Pinch (eds), The Social Construction of Technological Systems: New Directions in the Sociology and History of Technology. Cambridge, MA: MIT Press, pp. 83–106. Carroll, L. (1893) Sylvie and Bruno Concluded. London: Macmillan and Co. Cosgrove, D. (2001) Apollo’s Eye: A Cartographic Genealogy of the Earth in the Western Imagination. Baltimore, MD: Johns Hopkins University Press. Debruyne, C., Clinton, É., McNerney, L., Lavin, P. and O’Sullivan, D. (2017) ‘On the construction for a linked data platform for Ireland’s authoritative geospatial linked data’, available from: www.osi.ie/wp-content/uploads/2017/01/osi-eswc-2017-preprint.pdf [accessed 10 February 2017]. Dodge, M., Kitchin, R. and Perkins, C. (eds) (2009) Rethinking Maps: New Frontiers in Cartographic Theory. London: Routledge. Foucault, M. (2003) The Essential Foucault: Selections from Essential Works of Foucault, 1954–1984. New York: The New Press.


pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst

algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application

Some tools combine this capability with in-place transformation at the target database as well, taking advantage of the computing capabilities of engineered machines and using change data capture to synchronize source and target, again without the overhead of a middle tier. In both cases, the overarching principle is real-time data integration, in which data changes—whether originating from a MapReduce job or from a transactional system—are reflected instantly in a data warehouse, creating downstream analytics that have an accurate, timely view of reality. Others are turning to linked data and semantics, where data sets are created using linking methodologies that focus on the semantics of the data. This fits well into the broader notion of pointing at external sources from within a data set, which has been around for quite a long time. That ability to point to unstructured data (whether residing in the file system or some external source) merely becomes an extension of the given capabilities, in which the ability to store and process XML and XQuery natively within an RDBMS enables the combination of different degrees of structure while searching and analyzing the underlying data.


Virtual Competition by Ariel Ezrachi, Maurice E. Stucke

Airbnb, Albert Einstein, algorithmic trading, barriers to entry, cloud computing, collaborative economy, commoditize, corporate governance, crony capitalism, crowdsourcing, Daniel Kahneman / Amos Tversky, David Graeber, demand response, disintermediation, disruptive innovation, double helix, Downton Abbey, Erik Brynjolfsson, experimental economics, Firefox, framing effect, Google Chrome, index arbitrage, information asymmetry, interest rate derivative, Internet of things, invisible hand, Jean Tirole, John Markoff, Joseph Schumpeter, Kenneth Arrow, light touch regulation, linked data, loss aversion, Lyft, Mark Zuckerberg, market clearing, market friction, Milgram experiment, multi-sided market, natural language processing, Network effects, new economy, offshore financial centre, pattern recognition, prediction markets, price discrimination, price stability, profit maximization, profit motive, race to the bottom, rent-seeking, Richard Thaler, ride hailing / ride sharing, road to serfdom, Robert Bork, Ronald Reagan, self-driving car, sharing economy, Silicon Valley, Skype, smart cities, smart meter, Snapchat, social graph, Steve Jobs, supply-chain management, telemarketer, The Chicago School, The Myth of the Rational Market, The Wealth of Nations by Adam Smith, too big to fail, transaction costs, Travis Kalanick, turn-by-turn navigation, two-sided market, Uber and Lyft, Uber for X, uber lyft, Watson beat the top human players on Jeopardy!, women in the workforce, yield management

One possibility may be to focus on commercially sensitive information that, although publicly available, is of little or no value to customers but helps the competitors arrive at a supracompetitive price.37 Here the focus is on “cheap talk,” that is, data exchanges that facilitate conscious parallelism but are of limited use to customers. One problem, however, is in identifying such information. Part of the value of Big Data is data fusion, whereby computers link data sets, from which new insights emerge.38 Moreover, the data for some applications—such as customers sharing their inventory data with suppliers—can promote efficiency even while raising antitrust concerns.39 Even if the customers seek to limit what information can be shared, the algorithms—by analyzing a variety of data—could fill in the gaps. So it would likely be difficult (and potentially welfare-reducing) for the government to specify what data the algorithms must ignore.

President’s Council of Advisors on Science and Technology, Big Data and Privacy: A Technological Perspective (Washington, DC: Executive Office of the President, May 2014), x, https://www.whitehouse.gov/sites/default/files/microsites/ostp/PCAST/pcast_big_data_and_privacy_-_may_2014.pdf; Organisation for Economic Co-operation and Development, Exploring Data-Driven Innovation as a New Source of Growth: Mapping the Policy Issues Raised by “Big Data” (Paris: Organisation for Economic Co-operation and Development, June 18, 2003), 12, http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=DSTI/ICCP(2012)9/FINAL&docLanguage=En, observing that “In some cases, big data is defined by the capacity to analyse a variety of mostly unstructured data sets from sources as diverse as web logs, social media, mobile communications, sensors and financial transactions. This requires the capability to link data sets; this can be essential as information is highly context-dependent and may not be of value out of the right context. It also requires the capability to extract information from unstructured data, i.e. data that lack a predefined (explicit or implicit) model.” 39. Stanford Graduate School of Business Staff, “Sharing Information to Boost the Bottom Line,” Insights by Stanford Business (March 1, 1999), http://www.gsb.stanford.edu/insights/sharing-information-boost-bottom-line.


pages: 262 words: 60,248

Python Tricks: The Book by Dan Bader

anti-pattern, domain-specific language, don't repeat yourself, linked data, pattern recognition, performance metric

You’ll see the strengths and weaknesses of each approach so you can decide which implementation is right for your use case. But before we jump in, let’s cover some of the basics first. How do arrays work, and what are they used for? Arrays consist of fixed-size data records that allow each element to be efficiently located based on its index. Because arrays store information in adjoining blocks of memory, they’re considered contiguous data structures (as opposed to linked data structures like linked lists, for example). A real-world analogy for an array data structure is a parking lot: You can look at the parking lot as a whole and treat it as a single object, but inside the lot there are parking spots indexed by a unique number. Parking spots are containers for vehicles—each parking spot can either be empty or have a car, a motorbike, or some other vehicle parked on it.
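A short illustrative sketch of the contrast the passage draws (contiguous, index-addressable storage versus a chain of linked nodes); the names are made up, and a Python list stands in for a fixed-size array:

    from dataclasses import dataclass
    from typing import Optional

    # Array-like: parking spots stored contiguously and addressed by index.
    parking_lot = ["car", None, "motorbike", "car"]
    print(parking_lot[2])                 # O(1) lookup by index -> "motorbike"

    # Linked structure: each node only knows about the next one.
    @dataclass
    class Node:
        value: str
        next: Optional["Node"] = None

    head = Node("car", Node("motorbike", Node("car")))

    def nth(node: Optional[Node], i: int) -> str:
        # Reaching position i means walking i links; there is no direct indexing.
        while i and node is not None:
            node, i = node.next, i - 1
        if node is None:
            raise IndexError(i)
        return node.value

    print(nth(head, 1))                   # -> "motorbike"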


pages: 680 words: 157,865

Beautiful Architecture: Leading Thinkers Reveal the Hidden Beauty in Software Design by Diomidis Spinellis, Georgios Gousios

Albert Einstein, barriers to entry, business intelligence, business process, call centre, continuous integration, corporate governance, database schema, Debian, domain-specific language, don't repeat yourself, Donald Knuth, en.wikipedia.org, fault tolerance, Firefox, general-purpose programming language, iterative process, linked data, locality of reference, loose coupling, meta analysis, meta-analysis, MVC pattern, peer-to-peer, premature optimization, recommendation engine, Richard Stallman, Ruby on Rails, semantic web, smart cities, social graph, social web, SPARQL, Steve Jobs, Stewart Brand, traveling salesman, Turing complete, type inference, web application, zero-coupon bond

As creatures with insatiable knowledge appetites, we simply decide what we are interested in and begin to ask for it. There is no central coordination, and we are free to document our wandering by republishing our stories, thoughts, and journeys as we go. We think of the Web as a series of one-way links between documents (see Figure 5-1). Figure 5-1. Conventional notion of the Web Linked documents are only part of the picture, however. The vision for the Web always included the idea of linked data as well. This content can be consumed through a rendered view or directly referenced and manipulated in preferred forms in different contexts. You can imagine a middle-tier layer asking for information as an XML document while the presentation tier prefers a JSON object via an AJAX call. The same name refers to the same data in different forms. By allowing the data to be addressed like this, it is easy to build layered applications that have consistent views, even if they are asking for different levels of detail or wish to have the data styled in a particular way.

This diminishes the main benefit expected of stateless programming: to facilitate mathematical reasoning about programs. For the more difficult aspects of establishing the correctness of a design or implementation, the advantage of the functional approach is not so clear. For example, proving that a recursive definition has specific properties and terminates requires the equivalent of a loop invariant and variant. It is also unlikely that efficient functional programs can afford to renounce programmer-visible linked data structures, with all the resulting problems such as aliasing, which are challenging regardless of the underlying programming model. If functional programming fails to bring a significant simplification to the task of establishing correctness, there remains a major practical argument: referential transparency. This is the notion of substitutivity of equals for equals: in mathematics, f (a) always means the same thing for given values of f and a.


The Data Journalism Handbook by Jonathan Gray, Lucy Chambers, Liliana Bounegru

Amazon Web Services, barriers to entry, bioinformatics, business intelligence, carbon footprint, citizen journalism, correlation does not imply causation, crowdsourcing, David Heinemeier Hansson, eurozone crisis, Firefox, Florence Nightingale: pie chart, game design, Google Earth, Hans Rosling, information asymmetry, Internet Archive, John Snow's cholera map, Julian Assange, linked data, moral hazard, MVC pattern, New Journalism, openstreetmap, Ronald Reagan, Ruby on Rails, Silicon Valley, social graph, SPARQL, text mining, web application, WikiLeaks

While we are all either a journalist, designer, or developer “first,” we continue to work hard to increase our understanding and proficiency in each other’s areas of expertise. The core products for exploring data are Excel, Google Docs, and Fusion Tables. The team has also, but to a lesser extent, used MySQL, Access databases, and Solr to explore larger datasets; and used RDF and SPARQL to begin looking at ways in which we can model events using Linked Data technologies. Developers will also use their programming language of choice, whether that’s ActionScript, Python, or Perl, to match, parse, or generally pick apart a dataset we might be working on. Perl is used for some of the publishing. We use Google, Bing Maps, and Google Earth, along with Esri’s ArcMAP, for exploring and visualizing geographical data. For graphics we use the Adobe Suite including After Effects, Illustrator, Photoshop, and Flash, although we would rarely publish Flash files on the site these days as JavaScript—particularly JQuery and other JavaScript libraries like Highcharts, Raphael and D3—increasingly meets our data visualization requirements


pages: 224 words: 13,238

Electronic and Algorithmic Trading Technology: The Complete Guide by Kendall Kim

algorithmic trading, automated trading system, backtesting, commoditize, computerized trading, corporate governance, Credit Default Swap, diversification, en.wikipedia.org, family office, financial innovation, fixed income, index arbitrage, index fund, interest rate swap, linked data, market fragmentation, money market fund, natural language processing, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, short selling, statistical arbitrage, Steven Levy, transaction costs, yield curve

However, most financial services institutions do not have the ability to reach an optimal infrastructure because resources for most of a brokerage firm’s cost center have fallen victim to applying discretionary funds within the profit center such as the trading area of the business. It is clearly evident that budgets for data infrastructure have been reduced in the past years when the need for enhancing performance and technology has never been greater. Presumably, this will change in the future, though, when linking data to trading profitability becomes more evident. 8.5 Impact on Operations and Technology Real-time transaction processing and electronic trading can result in a great deal of automation for operations. Real-time transactions move more Effective Data Management 89 quickly, tend to be more accurate, have fewer problems, and need less attention than manually engaged transactions. According to the TABB Group, 60% of trades were processed manually over seven years ago.


Algorithms in C++ Part 5: Graph Algorithms by Robert Sedgewick

Erdős number, linear programming, linked data, NP-complete, reversible computing, sorting algorithm, traveling salesman

We have already encountered graphs, briefly, in Part 1. Indeed, the first algorithms that we considered in detail, the union-find algorithms in Chapter 1, are prime examples of graph algorithms. We also used graphs in Chapter 3 as an illustration of applications of two-dimensional arrays and linked lists, and in Chapter 5 to illustrate the relationship between recursive programs and fundamental data structures. Any linked data structure is a representation of a graph, and some familiar algorithms for processing trees and other linked structures are special cases of graph algorithms. The purpose of this chapter is to provide a context for developing an understanding of graph algorithms ranging from the simple ones in Part 1 to the sophisticated ones in Chapters 18 through 22. As always, we are interested in knowing which are the most efficient algorithms that solve a particular problem.

The primary advantage of the adjacency-lists representation over the adjacency-matrix representation is that it always uses space proportional to E + V, as opposed to V² in the adjacency matrix. The primary disadvantage is that testing for the existence of specific edges can take time proportional to V, as opposed to constant time in the adjacency matrix. These differences trace, essentially, to the difference between using linked lists and vectors to represent the set of vertices incident on each vertex. Thus, we see again that an understanding of the basic properties of linked data structures and vectors is critical if we are to develop efficient graph ADT implementations. Our interest in these performance differences is that we want to avoid implementations that are inappropriately inefficient under unexpected circumstances when a wide range of operations is to be demanded of the ADT. In Section 17.5, we discuss the application of basic data structures to realize many of the theoretical benefits of both structures.
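A hedged sketch of the trade-off described above, in Python rather than the book's C++ ADTs; the small example graph is an assumption:

    V = 5
    edges = [(0, 1), (0, 4), (1, 2), (3, 4)]

    # Adjacency matrix: V*V space regardless of E, constant-time edge test.
    matrix = [[False] * V for _ in range(V)]
    for u, w in edges:
        matrix[u][w] = matrix[w][u] = True
    print(matrix[0][4])        # True, found in constant time

    # Adjacency lists: space proportional to E + V, but an edge test may scan a list.
    adj = [[] for _ in range(V)]
    for u, w in edges:
        adj[u].append(w)
        adj[w].append(u)
    print(4 in adj[0])         # True, after scanning vertex 0's list (up to V entries)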


pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzić

Albert Einstein, bioinformatics, business cycle, business intelligence, business process, butter production in bangladesh, combinatorial explosion, computer vision, conceptual framework, correlation coefficient, correlation does not imply causation, data acquisition, discrete time, El Camino Real, fault tolerance, finite state, Gini coefficient, information retrieval, Internet Archive, inventory management, iterative process, knowledge worker, linked data, loose coupling, Menlo Park, natural language processing, Netflix Prize, NP-complete, PageRank, pattern recognition, peer-to-peer, phenotype, random walk, RFID, semantic web, speech recognition, statistical model, Telecommunications Act of 1996, telemarketer, text mining, traveling salesman, web application

Examples of homogeneous networks include single-mode social networks, such as people connected by friendship links, or the World Wide Web (WWW), a collection of linked Web pages. Examples of heterogeneous networks include those in medical domains describing patients, diseases, treatments, and contacts, or in bibliographic domains describing publications, authors, and venues. Graph-mining techniques explicitly consider these links when building predictive or descriptive models of the linked data. The requirement of different applications with graph-based data sets is not very uniform. Thus, graph models and mining algorithms that work well in one domain may not work well in another. For example, chemical data is often represented as graphs in which the nodes correspond to atoms, and the links correspond to bonds between the atoms. The individual graphs are quite small although there are significant repetitions among the different nodes.

A labeled graph is a graph in which each link carries some value. Therefore, a labeled graph G consists of three sets of information: G(N,L,V), where the new component V = {v1, v2, … , vt} is a set of values attached to links. An example of a directed graph is given in Figure 12.2b, while the graph in Figure 12.2c is a labeled graph. Different applications use different types of graphs in modeling linked data. In this chapter the primary focus is on undirected and unlabeled graphs although the reader still has to be aware that there are numerous graph-mining algorithms for directed and/or labeled graphs. Besides a graphical representation, each graph may be presented in the form of the incidence matrix I(G), where nodes index the rows and links index the columns. The matrix entry in position (i,j) has value 1 if node ni is incident with the link lj.
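As a small illustration of the incidence matrix I(G) just described (assuming the usual 0/1 convention for an unlabeled, undirected graph; the node and link names are made up):

    nodes = ["n1", "n2", "n3", "n4"]
    links = [("n1", "n2"), ("n2", "n3"), ("n3", "n1"), ("n3", "n4")]

    # incidence[i][j] is 1 when node i is an endpoint of link j, else 0.
    incidence = [[1 if nodes[i] in links[j] else 0 for j in range(len(links))]
                 for i in range(len(nodes))]

    for name, row in zip(nodes, incidence):
        print(name, row)
    # n1 [1, 0, 1, 0]
    # n2 [1, 1, 0, 0]
    # n3 [0, 1, 1, 1]
    # n4 [0, 0, 0, 1]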


pages: 288 words: 85,073

Factfulness: Ten Reasons We're Wrong About the World – and Why Things Are Better Than You Think by Hans Rosling, Ola Rosling, Anna Rosling Rönnlund

animal electricity, clean water, colonial rule, en.wikipedia.org, energy transition, first square of the chessboard, first square of the chessboard / second half of the chessboard, global pandemic, Hans Rosling, illegal immigration, income inequality, income per capita, Intergovernmental Panel on Climate Change (IPCC), jimmy wales, linked data, lone genius, microcredit, purchasing power parity, Stanford marshmallow experiment, Steven Pinker, Thomas L Friedman, Walter Mischel

We presented at the ceremony for their new Open Data platform in May 2010, and since then the World Bank has become the main access point for reliable global statistics; see gapm.io/x6. This was all possible thanks to Tim Berners-Lee and other early visionaries of the free internet. Sometime after he had invented the World Wide Web, Tim Berners-Lee contacted us, asking to borrow a slide show that showed how a web of linked data sources could flourish (using an image of pretty flowers). We share all of our content for free, so of course we said yes. Tim used this “flower-powerpoint” in his 2009 TED talk—see gapm.io/x6—to help people see the beauty of “The Next Web,” and he uses Gapminder as an example of what happens when data from multiple sources come together; see Berners-Lee (2009). His vision is so bold, we have thus far seen only the early shoots!


pages: 374 words: 94,508

Infonomics: How to Monetize, Manage, and Measure Information as an Asset for Competitive Advantage by Douglas B. Laney

3D printing, Affordable Care Act / Obamacare, banking crisis, blockchain, business climate, business intelligence, business process, call centre, chief data officer, Claude Shannon: information theory, commoditize, conceptual framework, crowdsourcing, dark matter, data acquisition, digital twin, discounted cash flows, disintermediation, diversification, en.wikipedia.org, endowment effect, Erik Brynjolfsson, full employment, informal economy, intangible asset, Internet of things, linked data, Lyft, Nash equilibrium, Network effects, new economy, obamacare, performance metric, profit motive, recommendation engine, RFID, semantic web, smart meter, Snapchat, software as a service, source of truth, supply-chain management, text mining, uber lyft, Y2K, yield curve

The speed at which a supply chain provides products to the customer. Examples include cycle-time metrics.
• Information accessibility
• User request turnaround time
• User satisfaction survey
Agility: The ability to respond to external influences, and the ability to respond to marketplace changes to gain or maintain competitive advantage. SCOR agility metrics include flexibility and adaptability.
• Utility of information for a range of purposes
• Linked data, metadata, and master data measures
• Ease of integrating new types of data or changing dimensions
Costs: The cost of operating the supply chain processes. This includes labor costs, material costs, management, and transportation costs. A typical cost metric is cost of goods sold.
• Data acquisition cost
• Data management costs
• Data delivery costs (Each include labor and technology related costs)
Asset Management Efficiency (Assets): The ability to efficiently utilize assets.


pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists by Gary Marcus, Jeremy Freeman

23andMe, Albert Einstein, bioinformatics, bitcoin, brain emulation, cloud computing, complexity theory, computer age, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data acquisition, Drosophila, epigenetics, global pandemic, Google Glasses, iterative process, linked data, mouse model, optical character recognition, pattern recognition, personalized medicine, phenotype, race to the bottom, Richard Feynman, Ronald Reagan, semantic web, speech recognition, stem cell, Steven Pinker, supply-chain management, Turing machine, twin studies, web application

While efforts to map the brain have begun as public, government-funded projects, this does not mean that private entities will not enter the arena and seek to compete with those projects. Although initial efforts to map the brain may be fueled by public funds, the issue of how “fine-tuned” information that can be used to determine risk factors or emerging disease states in individuals’ brains, which will require linking data to genetic databases, health records, and health databases, will be handled merits discussion now. What rules will govern the sharing of detailed scans or maps about each individual’s brain? Can data be linked from a brain scan to a genome to a database without an individual’s express consent if that person’s identity is not 100 percent secure? What information about the brain can be patented?


pages: 356 words: 102,224

Pale Blue Dot: A Vision of the Human Future in Space by Carl Sagan

Albert Einstein, anthropic principle, cosmological principle, dark matter, Dava Sobel, Francis Fukuyama: the end of history, germ theory of disease, invention of the telescope, Isaac Newton, Johannes Kepler, Kuiper Belt, linked data, low earth orbit, nuclear winter, planetary scale, profit motive, scientific worldview, Search for Extraterrestrial Intelligence, Stephen Hawking, telepresence

You take a step forward, and the rover walks forward. You reach out your arm to pick up something shiny in the soil, and the robot arm does likewise. The sands of Mars trickle through your fingers. The only difficulty with this remote reality technology is that all this must occur in tedious slow motion: The round-trip travel time of the up-link commands from Earth to Mars and the down-link data returned from Mars to Earth might take half an hour or more. But this is something we can learn to do. We can learn to contain our exploratory impatience if that's the price of exploring Mars. The rover can be made smart enough to deal with routine contingencies. Anything more challenging, and it makes a dead stop, puts itself into a safeguard mode, and radios for a very patient human controller to take over.


pages: 313 words: 101,403

My Life as a Quant: Reflections on Physics and Finance by Emanuel Derman

Berlin Wall, bioinformatics, Black-Scholes formula, Brownian motion, buy and hold, capital asset pricing model, Claude Shannon: information theory, Donald Knuth, Emanuel Derman, fixed income, Gödel, Escher, Bach, haute couture, hiring and firing, implied volatility, interest rate derivative, Jeff Bezos, John Meriwether, John von Neumann, law of one price, linked data, Long Term Capital Management, moral hazard, Murray Gell-Mann, Myron Scholes, Paul Samuelson, pre–internet, publish or perish, quantitative trading / quantitative finance, Sharpe ratio, statistical arbitrage, statistical model, Stephen Hawking, Steve Jobs, stochastic volatility, technology bubble, the new new thing, transaction costs, volatility smile, Y2K, yield curve, zero-coupon bond, zero-sum game

A colleague, Ed Sheppard, was assigned to work with me, and we planned to rewrite the system to incorporate multidimensional array variables in order to represent more general financial time series. While I was away on a two-week beach vacation at Fire Island with my family, Ed suddenly threw himself into redesigning and then rewriting the entire system, without giving me advance notice. I returned to a fait accompli, a completely new, enhanced, and almost unrecognizable APL-flavored version of the language. Ed's version now incorporated vastly complex dynamically linked data structures, whose details I knew I would not live long enough to master. Ed had also cleverly modified HEQS so that, once you had used it interactively to develop and solve a financial model, you could then use it to generate a C program that would solve your equations many times faster. Programming came naturally to Ed in a way it never would to me, and his proficiency daunted me. Sometime in late 1984 he left to join Asymetrix, a Seattle-based company founded by Paul Allen.


pages: 348 words: 97,277

The Truth Machine: The Blockchain and the Future of Everything by Paul Vigna, Michael J. Casey

3D printing, additive manufacturing, Airbnb, altcoin, Amazon Web Services, barriers to entry, basic income, Berlin Wall, Bernie Madoff, bitcoin, blockchain, blood diamonds, Blythe Masters, business process, buy and hold, carbon footprint, cashless society, cloud computing, computer age, computerized trading, conceptual framework, Credit Default Swap, crowdsourcing, cryptocurrency, cyber-physical system, dematerialisation, disintermediation, distributed ledger, Donald Trump, double entry bookkeeping, Edward Snowden, Elon Musk, Ethereum, ethereum blockchain, failed state, fault tolerance, fiat currency, financial innovation, financial intermediation, global supply chain, Hernando de Soto, hive mind, informal economy, intangible asset, Internet of things, Joi Ito, Kickstarter, linked data, litecoin, longitudinal study, Lyft, M-Pesa, Marc Andreessen, market clearing, mobile money, money: store of value / unit of account / medium of exchange, Network effects, off grid, pets.com, prediction markets, pre–internet, price mechanism, profit maximization, profit motive, ransomware, rent-seeking, RFID, ride hailing / ride sharing, Ross Ulbricht, Satoshi Nakamoto, self-driving car, sharing economy, Silicon Valley, smart contracts, smart meter, Snapchat, social web, software is eating the world, supply-chain management, Ted Nelson, the market place, too big to fail, trade route, transaction costs, Travis Kalanick, Turing complete, Uber and Lyft, uber lyft, unbanked and underbanked, underbanked, universal basic income, web of trust, zero-sum game

With names such as Storj, Sia, and Maidsafe, these new platforms reward you with tokens if you offer up your spare hard-drive space to other computer users in a global network of users. You could say these “cloud” services are much truer to that name than those of Amazon Web Services, Google, Dropbox, IBM, Oracle, Microsoft, and Apple, the providers with which most people associate that word. But even bigger changes are being considered, including projects to entirely re-architect the Web itself. There’s Solid, which stands for Social Linked Data, a new protocol for data storage that puts data back in the hands of the people to whom it belongs. The core idea is that we will store our data in Pods (Personalized Online Data Stores) and distribute it to applications via permissions we control. Solid is the brainchild of none other than Tim Berners-Lee, the computer scientist who perfected HTTP and gave us the World Wide Web. Another one that gets a lot of people excited is the Interplanetary File System, designed by Juan Benet.


Future Files: A Brief History of the Next 50 Years by Richard Watson

Albert Einstein, bank run, banking crisis, battle of ideas, Black Swan, call centre, carbon footprint, cashless society, citizen journalism, commoditize, computer age, computer vision, congestion charging, corporate governance, corporate social responsibility, deglobalization, digital Maoism, disintermediation, epigenetics, failed state, financial innovation, Firefox, food miles, future of work, global pandemic, global supply chain, global village, hive mind, industrial robot, invention of the telegraph, Jaron Lanier, Jeff Bezos, knowledge economy, lateral thinking, linked data, low cost airline, low skilled workers, M-Pesa, mass immigration, Northern Rock, peak oil, pensions crisis, precision agriculture, prediction markets, Ralph Nader, Ray Kurzweil, rent control, RFID, Richard Florida, self-driving car, speech recognition, telepresence, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Turing test, Victor Gruen, white flight, women in the workforce, Zipcar

Trends that will transform transport: Embedded intelligence. Cars can already be opened or started using fingerprint and iris recognition, so we’ll see more technologies linking vehicle security to user identification. We will also see mood-sensitive vehicles that adjust their behavior according to the mood of the driver or occupants. Cars will also become mobile technology platforms linking data to other services such as healthcare. For example, if your car regularly detects an abnormal heartbeat or high levels of stress, this information could be sent wirelessly to your doctor. Obviously privacy issues abound, but cars could become useful data-collection and delivery points. Remote monitoring: Electronic data recorders are little black boxes that already sit covertly inside some cars and monitor your speed, acceleration and braking.


pages: 350 words: 109,521

Our 50-State Border Crisis: How the Mexican Border Fuels the Drug Epidemic Across America by Howard G. Buffett

airport security, clean water, collective bargaining, defense in depth, Donald Trump, illegal immigration, immigration reform, linked data, low skilled workers, moral panic

Originally, our foundation supported Dr. Anderson’s work directly, but now we support it through a nonprofit called the Colibri Center for Human Rights that works with the medical examiner’s office to identify these remains and provide closure for families regardless of the origins of the deceased. For example, we funded an international geographic information system (GIS) initiative in Pima County to link data from missing person reports to postmortem reports. We agree with Anderson and Colibri that respect for the dead is one measure of a civilized society. Is it civilized to view the “mortal danger” of the desert as a deterrent? Should it give us pause that before Operation Gatekeeper funneled immigrants to the desert, there were only about twelve bodies per year recovered along the border? People debate these questions.


pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future by Kevin Kelly

A Declaration of the Independence of Cyberspace, AI winter, Airbnb, Albert Einstein, Amazon Web Services, augmented reality, bank run, barriers to entry, Baxter: Rethink Robotics, bitcoin, blockchain, book scanning, Brewster Kahle, Burning Man, cloud computing, commoditize, computer age, connected car, crowdsourcing, dark matter, dematerialisation, Downton Abbey, Edward Snowden, Elon Musk, Filter Bubble, Freestyle chess, game design, Google Glasses, hive mind, Howard Rheingold, index card, indoor plumbing, industrial robot, Internet Archive, Internet of things, invention of movable type, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Markoff, Kevin Kelly, Kickstarter, lifelogging, linked data, Lyft, M-Pesa, Marc Andreessen, Marshall McLuhan, means of production, megacity, Minecraft, Mitch Kapor, multi-sided market, natural language processing, Netflix Prize, Network effects, new economy, Nicholas Carr, old-boy network, peer-to-peer, peer-to-peer lending, personalized medicine, placebo effect, planetary scale, postindustrial economy, recommendation engine, RFID, ride hailing / ride sharing, Rodney Brooks, self-driving car, sharing economy, Silicon Valley, slashdot, Snapchat, social graph, social web, software is eating the world, speech recognition, Stephen Hawking, Steven Levy, Ted Nelson, the scientific method, transport as a service, two-sided market, Uber for X, uber lyft, Watson beat the top human players on Jeopardy!, Whole Earth Review, zero-sum game

Just as the internet is the network of networks, the intercloud is the cloud of clouds. Slowly but surely Amazon’s cloud and Google’s cloud and Facebook’s cloud and all the other enterprise clouds are intertwining into one massive cloud that acts as a single cloud—The Cloud—to the average user or company. A counterforce resisting this merger is that an intercloud requires commercial clouds to share their data (a cloud is a network of linked data), and right now data tends to be hoarded like gold. Data hoards are seen as a competitive advantage, and sharing data freely is hampered by laws, so it will be many years (decades?) before companies learn how to share their data creatively, productively, and responsibly. There is one final step in the inexorable march toward decentralized access. At the same time we are moving to an intercloud we will also move toward one that is fully decentralized and peer to peer.


pages: 404 words: 43,442

The Art of R Programming by Norman Matloff

Debian, discrete time, Donald Knuth, general-purpose programming language, linked data, sorting algorithm, statistical model

In our example tree, where the root node contains 8, all of the values in the left subtree—5, 2 and 6—are less than 8, while 20 is greater than 8. If implemented in C, a tree node would be represented by a C struct, similar to an R list, whose contents are the stored value, a pointer to the left child, and a pointer to the right child. But since R lacks pointer variables, what can we do? Our solution is to go back to the basics. In the old prepointer days in FORTRAN, linked data structures were implemented in long arrays. A pointer, which in C is a memory address, was an array index instead. Specifically, we’ll represent each node by a row in a three-column matrix. The node’s stored value will be in the third element of that row, while the first and second elements will be the left and right links. For instance, if the first element in a row is 29, it means that this node’s left link points to the node stored in row 29 of the matrix.
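The book develops this in R; as a purely illustrative sketch of the same pre-pointer technique, here is a Python version in which the "pointers" are simply row indices into a table (the row layout is an assumption, not the book's code):

    # Each row is [left_row, right_row, value]; None plays the role of a null link.
    tree = [
        [1,    2,    8],    # row 0: root holding 8
        [3,    4,    5],    # row 1: left subtree root holding 5
        [None, None, 20],   # row 2: right child holding 20
        [None, None, 2],    # row 3
        [None, None, 6],    # row 4
    ]

    def contains(tree, row, key):
        # Standard binary-search-tree lookup, following row indices instead of pointers.
        while row is not None:
            left, right, value = tree[row]
            if key == value:
                return True
            row = left if key < value else right
        return False

    print(contains(tree, 0, 6))    # True
    print(contains(tree, 0, 7))    # False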


Remix by John Courtenay Grimwood

clean water, delayed gratification, double helix, fear of failure, haute couture, Kickstarter, linked data

The old man hated the lattice, mostly because he refused to believe anyone might want to kill him. But Lady Clare had insisted, reeling off a list that began with the Antiguan Absolutists and ended with Zebediah Nouveau. Mind you, he didn’t hate standing inside that circle as much as he hated being there at all. But Lady Clare had insisted on that as well. Keeping her good side to the main CySat camera, Lady Clare smiled. It was amazing how much clout you carried when you’d linked data credits to gold reserves to keep the senior officers loyal, welcomed the UN Pax Force with open arms, arranged for Paris to be the first European city overflown with the new ‘dote and put some backbone into the Prince Imperial. This was the General’s payback, and as far as Lady Clare was concerned it was a small price. As for her reward... The new Princess Imperial looked around her, eyes stopping briefly as they touched on a tired young Imperial Guard.


The Art of Computer Programming: Fundamental Algorithms by Donald E. Knuth

discrete time, distributed generation, Donald Knuth, fear of failure, Fermat's Last Theorem, G4S, Gerard Salton, Isaac Newton, Jacquard loom, Johannes Kepler, John von Neumann, linear programming, linked data, Menlo Park, probability theory / Blaise Pascal / Pierre de Fermat, sorting algorithm, stochastic process, Turing machine

One possible answer for the example above would be:
  X[1]: BASE 2400, SUB 1002
  X[2]: BASE 2430, SUB 1010
  X[3]: BASE 2450, SUB 1006
  X[4]: BASE 2510, SUB 1000
  X[5]: BASE 2530, SUB 1003
  X[6]: BASE 2730, SUB 0
The last entry contains the first unused memory address. (Clearly, this is not the only way to treat a library of subroutines. The proper way to design a library is heavily dependent upon the computer used and the applications to be handled. Large modern computers require an entirely different approach to subroutine libraries. But this is a nice exercise anyway, because it involves interesting manipulations on both sequential and linked data.) The problem in this exercise is to design an algorithm for the stated task. Your allocator may transform the tape directory in any way as it prepares its answer, since the tape directory can be read in anew by the subroutine allocator on its next assignment, and the tape directory is not needed by other parts of the loading routine. 27. [25] Write a MIX program for the subroutine allocation algorithm of exercise 26. 28. [40] The following construction shows how to "solve" a fairly general type of two-person game, including chess, nim, and many simpler games: Consider a finite set of nodes, each of which represents a possible position in the game.

The LINK field of each Symbol Table entry points to the most recently encountered Data Table entry for the symbolic name in question. The first algorithm we require is one that builds the Data Table in such a form. Note the flexibility in choice of level numbers that is allowed by the COBOL rules; the left structure in D) is completely equivalent to 1 A, 2 B, 3 C, 3 D, 2 E, 2 F, 3 G, because level numbers do not have to be sequential. [Illustration E): the Symbol Table (with its LINK fields) and the Data Table (with PREV, PARENT, NAME, CHILD, and SIB fields) for the entries A through H; empty boxes indicate additional information not relevant here. The tabular detail is not recoverable from this scan.] Some sequences of level numbers are illegal, however; for example, if the level number of D in D) were changed to " (in either place) we would have a meaningless data configuration, violating the rule that all items of a group must have the same number.


pages: 505 words: 133,661

Who Owns England?: How We Lost Our Green and Pleasant Land, and How to Take It Back by Guy Shrubsole

back-to-the-land, Beeching cuts, Boris Johnson, Capital in the Twenty-First Century by Thomas Piketty, centre right, congestion charging, deindustrialization, digital map, do-ocracy, Downton Abbey, financial deregulation, fixed income, Goldman Sachs: Vampire Squid, Google Earth, housing crisis, James Dyson, Kickstarter, land reform, land tenure, land value tax, linked data, loadsamoney, mega-rich, mutually assured destruction, new economy, Occupy movement, offshore financial centre, oil shale / tar sands, openstreetmap, place-making, plutocrats, Plutocrats, profit motive, rent-seeking, Right to Buy, Ronald Reagan, sceptred isle, Stewart Brand, the built environment, the map is not the territory, The Wealth of Nations by Adam Smith, trickle-down economics, urban sprawl, web of trust, Yom Kippur War, zero-sum game

‘I cannot find any evidence that the major housebuilders are financial investors of this kind,’ he stated, pointing the finger of blame instead at the rate at which new homes could be absorbed into the marketplace. Part of the problem is that the data on what companies own still isn’t good enough to prove whether or not land banking is occurring. Anna has tried to map the land owned by housing developers, but has been thwarted by the lack in the Land Registry’s corporate dataset of the necessary information to link data on who owns a site with digital maps of that area. That makes it very hard to assess, for example, whether a piece of land owned by a housebuilder for decades is a prime site accruing in value or a leftover fragment of ground from a past development. Second, the scope of Letwin’s review was drawn too narrowly to examine the wider problem of land banking by landowners beyond the major housebuilders.


pages: 494 words: 142,285

The Future of Ideas: The Fate of the Commons in a Connected World by Lawrence Lessig

AltaVista, Andy Kessler, barriers to entry, business process, Cass Sunstein, commoditize, computer age, creative destruction, dark matter, disintermediation, disruptive innovation, Donald Davies, Erik Brynjolfsson, George Gilder, Hacker Ethic, Hedy Lamarr / George Antheil, Howard Rheingold, Hush-A-Phone, HyperCard, hypertext link, Innovator's Dilemma, invention of hypertext, inventory management, invisible hand, Jean Tirole, Jeff Bezos, Joseph Schumpeter, Kenneth Arrow, Larry Wall, Leonard Kleinrock, linked data, Marc Andreessen, Menlo Park, Mitch Kapor, Network effects, new economy, packet switching, peer-to-peer, peer-to-peer model, price mechanism, profit maximization, RAND corporation, rent control, rent-seeking, RFC: Request For Comment, Richard Stallman, Richard Thaler, Robert Bork, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, Silicon Valley, smart grid, software patent, spectrum auction, Steve Crocker, Steven Levy, Stewart Brand, Ted Nelson, Telecommunications Act of 1996, The Chicago School, transaction costs, zero-sum game

And these thousands produced a far better, more complete, and richer database of culture than commercial sites had produced. For a time, one could find an extraordinary range of songs archived throughout the Web. Slowly these services have migrated to commercial sites. This migration means the commercial sites can support the costs of developing and maintaining this information. And in some cases, with some databases, the Internet provided a simple way to collect and link data about music in particular.8 Here the CDDB—or “CD database”—is the most famous example. As MP3 equipment became common, people needed a simple way to get information about CD titles and tracks onto the MP3 device. Of course, one could type in that information, but why should everyone have to type in that information? Many MP3 services thus enabled a cooperative process. When a user installed a CD, the system queried the central database to see whether that CD had been cataloged thus far.


pages: 598 words: 134,339

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World by Bruce Schneier

23andMe, Airbnb, airport security, AltaVista, Anne Wojcicki, augmented reality, Benjamin Mako Hill, Black Swan, Boris Johnson, Brewster Kahle, Brian Krebs, call centre, Cass Sunstein, Chelsea Manning, citizen journalism, cloud computing, congestion charging, disintermediation, drone strike, Edward Snowden, experimental subject, failed state, fault tolerance, Ferguson, Missouri, Filter Bubble, Firefox, friendly fire, Google Chrome, Google Glasses, hindsight bias, informal economy, Internet Archive, Internet of things, Jacob Appelbaum, Jaron Lanier, John Markoff, Julian Assange, Kevin Kelly, license plate recognition, lifelogging, linked data, Lyft, Mark Zuckerberg, moral panic, Nash equilibrium, Nate Silver, national security letter, Network effects, Occupy movement, Panopticon Jeremy Bentham, payday loans, pre–internet, price discrimination, profit motive, race to the bottom, RAND corporation, recommendation engine, RFID, Ross Ulbricht, self-driving car, Shoshana Zuboff, Silicon Valley, Skype, smart cities, smart grid, Snapchat, social graph, software as a service, South China Sea, stealth mode startup, Steven Levy, Stuxnet, TaskRabbit, telemarketer, Tim Cook: Apple, transaction costs, Uber and Lyft, uber lyft, undersea cable, urban planning, WikiLeaks, zero day

., 160 fiduciary responsibility, data collection and, 204–5 50 Cent Party, 114 FileVault, 215 filter bubble, 114–15 FinFisher, 81 First Unitarian Church of Los Angeles, 91 FISA (Foreign Intelligence Surveillance Act; 1978), 273 FISA Amendments Act (2008), 171, 273, 275–76 Section 702 of, 65–66, 173, 174–75, 261 FISA Court, 122, 171 NSA misrepresentations to, 172, 337 secret warrants of, 174, 175–76, 177 transparency needed in, 177 fishing expeditions, 92, 93 Fitbit, 16, 112 Five Eyes, 76 Flame, 72 FlashBlock, 49 flash cookies, 49 Ford Motor Company, GPS data collected by, 29 Foreign Intelligence Surveillance Act (FISA; 1978), 273 see also FISA Amendments Act Forrester Research, 122 Fortinet, 82 Fox-IT, 72 France, government surveillance in, 79 France Télécom, 79 free association, government surveillance and, 2, 39, 96 freedom, see liberty Freeh, Louis, 314 free services: overvaluing of, 50 surveillance exchanged for, 4, 49–51, 58–59, 60–61, 226, 235 free speech: as constitutional right, 189, 344 government surveillance and, 6, 94–95, 96, 97–99 Internet and, 189 frequent flyer miles, 219 Froomkin, Michael, 198 FTC, see Federal Trade Commission, US fusion centers, 69, 104 gag orders, 100, 122 Gamma Group, 81 Gandy, Oscar, 111 Gates, Bill, 128 gay rights, 97 GCHQ, see Government Communications Headquarters Geer, Dan, 205 genetic data, 36 geofencing, 39–40 geopolitical conflicts, and need for surveillance, 219–20 Georgia, Republic of, cyberattacks on, 75 Germany: Internet control and, 188 NSA surveillance of, 76, 77, 122–23, 151, 160–61, 183, 184 surveillance of citizens by, 350 US relations with, 151, 234 Ghafoor, Asim, 103 GhostNet, 72 Gill, Faisal, 103 Gmail, 31, 38, 50, 58, 219 context-sensitive advertising in, 129–30, 142–43 encryption of, 215, 216 government surveillance of, 62, 83, 148 GoldenShores Technologies, 46–47 Goldsmith, Jack, 165, 228 Google, 15, 27, 44, 48, 54, 221, 235, 272 customer loyalty to, 58 data mining by, 38 data storage capacity of, 18 government demands for data from, 208 impermissible search ad policy of, 55 increased encryption by, 208 as information middleman, 57 linked data sets of, 50 NSA hacking of, 85, 208 PageRank algorithm of, 196 paid search results on, 113–14 search data collected by, 22–23, 31, 123, 202 transparency reports of, 207 see also Gmail Google Analytics, 31, 48, 233 Google Calendar, 58 Google Docs, 58 Google Glass, 16, 27, 41 Google Plus, 50 real name policy of, 49 surveillance by, 48 Google stalking, 230 Gore, Al, 53 government: checks and balances in, 100, 175 surveillance by, see mass surveillance, government Government Accountability Office, 30 Government Communications Headquarters (GCHQ): cyberattacks by, 149 encryption programs and, 85 location data used by, 3 mass surveillance by, 69, 79, 175, 182, 234 government databases, hacking of, 73, 117, 313 GPS: automobile companies’ use of, 29–30 FBI use of, 26, 95 police use of, 26 in smart phones, 3, 14 Grayson, Alan, 172 Great Firewall (Golden Shield), 94, 95, 150–51, 187, 237 Greece, wiretapping of government cell phones in, 148 greenhouse gas emissions, 17 Greenwald, Glenn, 20 Grindr, 259 Guardian, Snowden documents published by, 20, 67, 149 habeas corpus, 229 hackers, hacking, 42–43, 71–74, 216, 313 of government databases, 73, 117, 313 by NSA, 85 privately-made technology for, 73, 81 see also cyberwarfare Hacking Team, 73, 81, 149–50 HAPPYFOOT, 3 Harris Corporation, 68 Harris Poll, 96 Hayden, Michael, 23, 147, 162 health: effect of constant surveillance on, 127 mass surveillance and, 16, 
41–42 healthcare data, privacy of, 193 HelloSpy, 3, 245 Hewlett-Packard, 112 Hill, Raquel, 44 hindsight bias, 322 Hobbes, Thomas, 210 Home Depot, 110, 116 homosexuality, 97 Hoover, J.


pages: 528 words: 146,459

Computer: A History of the Information Machine by Martin Campbell-Kelly, William Aspray, Nathan L. Ensmenger, Jeffrey R. Yost

Ada Lovelace, air freight, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Apple's 1984 Super Bowl advert, barriers to entry, Bill Gates: Altair 8800, borderless world, Buckminster Fuller, Build a better mousetrap, Byte Shop, card file, cashless society, cloud computing, combinatorial explosion, computer age, deskilling, don't be evil, Donald Davies, Douglas Engelbart, Douglas Engelbart, Dynabook, fault tolerance, Fellow of the Royal Society, financial independence, Frederick Winslow Taylor, game design, garden city movement, Grace Hopper, informal economy, interchangeable parts, invention of the wheel, Jacquard loom, Jeff Bezos, jimmy wales, John Markoff, John von Neumann, Kickstarter, light touch regulation, linked data, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Mitch Kapor, natural language processing, Network effects, New Journalism, Norbert Wiener, Occupy movement, optical character recognition, packet switching, PageRank, pattern recognition, Pierre-Simon Laplace, pirate software, popular electronics, prediction markets, pre–internet, QWERTY keyboard, RAND corporation, Robert X Cringely, Silicon Valley, Silicon Valley startup, Steve Jobs, Steven Levy, Stewart Brand, Ted Nelson, the market place, Turing machine, Vannevar Bush, Von Neumann architecture, Whole Earth Catalog, William Shockley: the traitorous eight, women in the workforce, young professional

was already well established when two other Stanford University doctoral students, Larry Page and Sergey Brin, began work on the Stanford Digital Library Project (funded in part by the National Science Foundation)—research that would not only forever change the process of finding things on the Internet but also, in time, lead to an unprecedentedly successful web advertising model. Page became interested in a dissertation project on the mathematical properties of the web, and found strong support from his adviser Terry Winograd, a pioneer of artificial intelligence research on natural language processing. Using a “web crawler” to gather back-link data (that is, the websites that linked to a particular site), Page, now teamed up with Brin, created their “PageRank” algorithm based on back-links ranked by importance—the more prominent the linking site, the more influence it would have on the linked site’s page rank. They insightfully reasoned that this would provide the basis for more useful web searches than any existing tools and, moreover, that there would be no need to hire a corps of indexing staff.
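The ranking recursion they arrived at can be sketched in a few lines: each page's rank is divided among its outlinks, so a link from a prominent page carries more weight than a link from an obscure one. The following Python sketch only illustrates that idea; the damping factor, iteration count, and toy graph are assumptions for the example, not details from the book.

```python
# Minimal PageRank sketch: rank flows along links, weighted by the
# prominence of the linking page. Values and graph are illustrative.
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to (its outlinks)."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:          # each outlink passes on a share of rank
                new_rank[target] += share   # prominent linkers contribute more
        rank = new_rank
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank(web))
```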


The Art of Computer Programming: Sorting and Searching by Donald Ervin Knuth

card file, Claude Shannon: information theory, complexity theory, correlation coefficient, Donald Knuth, double entry bookkeeping, Eratosthenes, Fermat's Last Theorem, G4S, information retrieval, iterative process, John von Neumann, linked data, locality of reference, Menlo Park, Norbert Wiener, NP-complete, p-value, Paul Erdős, RAND corporation, refrigerator car, sorting algorithm, Vilfredo Pareto, Yogi Berra, Zipf's Law

[Fig. 13. Example of Wheeler's tree insertion scheme.] Changing the data structure slightly with "two-way insertion" cuts the number of moves down to about ⅛N². Shellsort cuts the number of comparisons and moves to about N^(7/6), for N in a practical range; as N → ∞ this number can be lowered to order N(log N)². Another way to improve on Algorithm S, using a linked data structure, gave us the list insertion method, which does about ¼N² comparisons, 0 moves, and 2N changes of links. Is it possible to marry the best features of these methods, reducing the number of comparisons to order N log N as in binary insertion, yet reducing the number of moves as in list insertion? The answer is yes, by going to a tree-structured arrangement. This possibility was first explored about 1957 by D.
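The tree-structured arrangement can be pictured as a binary search tree: each new key is linked into the tree after roughly log N comparisons, no records are moved, and an in-order traversal reads the keys out in sorted order. The sketch below is an illustration of that idea only, not Knuth's Algorithm T, and it omits the balancing needed to guarantee N log N behavior on adversarial input.

```python
# Minimal tree insertion sort: each key is linked into a binary search tree
# (no record movement), and an in-order walk reads the keys out in order.
class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def tree_insertion_sort(keys):
    root = None
    for key in keys:
        node, parent = root, None
        while node is not None:                 # find the insertion point
            parent = node
            node = node.left if key < node.key else node.right
        if parent is None:
            root = Node(key)
        elif key < parent.key:
            parent.left = Node(key)
        else:
            parent.right = Node(key)
    out = []
    def walk(n):                                # in-order traversal
        if n is not None:
            walk(n.left); out.append(n.key); walk(n.right)
    walk(root)
    return out

print(tree_insertion_sort([503, 87, 512, 61, 908, 170, 897, 275]))
```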

Radix sorting is generally not useful for such small N, so a small example like this is intended to illustrate the sufficiency rather than the efficiency of the method. An alert, "modern" reader will note, however, that the whole idea of making digit counts for the storage allocation is tied to old-fashioned ideas about sequential data representation. We know that linked allocation is specifically designed to handle a set of tables of variable size, so it is natural to choose a linked data structure for radix sorting. Since we traverse each pile serially, all we need is a single link from each item to its successor.

Table 1. RADIX SORTING
Input area contents: 503 087 512 061 908 170 897 275 653 426 154 509 612 677 765 703
Counts for units digit distribution: 1 1 2 3 1 2 1 3 1 1
Storage allocations based on these counts: 1 2 4 7 8 10 11 14 15 16
Auxiliary area contents: 170 061 512 612 503 653 703 154 275 765 426 087 897 677 908 509
Counts for tens digit distribution: 4 2 1 0 0 2 2 3 1 1
Storage allocations based on these counts: 4 6 7 7 7 9 11 14 15 16
Input area contents: 503 703 908 509 512 612 426 653 154 061 765 170 275 677 087 897
Counts for hundreds digit distribution: 2 2 1 0 1 3 3 2 1 1
Storage allocations based on these counts: 2 4 5 5 6 9 12 14 15 16
Auxiliary area contents: 061 087 154 170 275 426 503 509 512 612 653 677 703 765 897 908
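As a rough illustration of the distribution idea (not Knuth's linked-list algorithm itself), the sketch below deals the same sixteen keys into ten piles per digit, least significant digit first, and then relinks the piles in order; Python lists stand in for the linked piles.

```python
# Least-significant-digit radix sort with piles: each pass deals the records
# into ten piles by one digit, then concatenates the piles back together.
def radix_sort(records, digits=3):
    for d in range(digits):
        piles = [[] for _ in range(10)]          # one pile per digit value 0..9
        divisor = 10 ** d
        for r in records:
            piles[(r // divisor) % 10].append(r)
        records = [r for pile in piles for r in pile]  # relink the piles serially
    return records

data = [503, 87, 512, 61, 908, 170, 897, 275, 653, 426, 154, 509, 612, 677, 765, 703]
print(radix_sort(data))  # 61, 87, 154, 170, ..., 897, 908
```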


pages: 834 words: 180,700

The Architecture of Open Source Applications by Amy Brown, Greg Wilson

8-hour work day, anti-pattern, bioinformatics, c2.com, cloud computing, collaborative editing, combinatorial explosion, computer vision, continuous integration, create, read, update, delete, David Heinemeier Hansson, Debian, domain-specific language, Donald Knuth, en.wikipedia.org, fault tolerance, finite state, Firefox, friendly fire, Guido van Rossum, linked data, load shedding, locality of reference, loose coupling, Mars Rover, MITM: man-in-the-middle, MVC pattern, peer-to-peer, Perl 6, premature optimization, recommendation engine, revision control, Ruby on Rails, side project, Skype, slashdot, social web, speech recognition, the scientific method, The Wisdom of Crowds, web application, WebSocket

We use universally unique identifiers (UUIDs) to identify data, and commit hashes from git to reference versions. If the data changes from one execution to another, a new version is checked in to the repository. Thus, the (uuid, version) tuple is a compound identifier to retrieve the data in any state. In addition, we store the hash of the data as well as the signature of the upstream portion of the workflow that generated it (if it is not an input). This allows one to link data that might be identified differently as well as reuse data when the same computation is run again. The main concern when designing this package was the way users were able to select and retrieve their data. Also, we wished to keep all data in the same repository, regardless of whether it is used as input, output, or intermediate data (an output of one workflow might be used as the input of another).
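A rough sketch of that identification scheme, using hypothetical names rather than the project's actual API: each checked-in state gets a new version number under the same UUID, together with a content hash and the signature of the upstream computation, so identical results can be recognized and reused.

```python
# Sketch of a (uuid, version) compound identifier with a content hash and an
# upstream-workflow signature; all names here are illustrative, not the real API.
import hashlib
import uuid

class ManagedData:
    def __init__(self):
        self.id = uuid.uuid4()      # stable identity across all versions
        self.versions = []          # one entry per checked-in state

    def check_in(self, content, upstream_signature=None):
        entry = {
            "hash": hashlib.sha1(content).hexdigest(),  # lets identical data be reused
            "signature": upstream_signature,            # provenance of the producer
            "content": content,
        }
        self.versions.append(entry)
        return (self.id, len(self.versions) - 1)        # the compound identifier

    def retrieve(self, version):
        return self.versions[version]["content"]

data = ManagedData()
ident = data.check_in(b"intermediate result, run 1", upstream_signature="wf-step-3")
print(ident, data.retrieve(ident[1]))
```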


In the Age of the Smart Machine by Shoshana Zuboff

affirmative action, American ideology, blue-collar work, collective bargaining, computer age, Computer Numeric Control, conceptual framework, data acquisition, demand response, deskilling, factory automation, Ford paid five dollars a day, fudge factor, future of work, industrial robot, information retrieval, interchangeable parts, job automation, lateral thinking, linked data, Marshall McLuhan, means of production, old-boy network, optical character recognition, Panopticon Jeremy Bentham, post-industrial society, RAND corporation, Shoshana Zuboff, social web, The Wealth of Nations by Adam Smith, Thorstein Veblen, union organizing, zero-sum game

The act of visualization brings internal resources to bear in order to soften the sense of distance, disconnection, and uncertainty that is created by the withdrawal from a three-dimensional action context. Ironically, it means creating a doubly abstract world, where the reference function of the electronic symbols becomes less problematic because of yet another layer of abstractions (mental images) called up to serve as referents. Operators did not appear equally adept at generating an inward image. Many seemed unable to link data on the screen to a referential reality. Their interactions with the data were confined to the two-dimensional space of the terminal screen; the electronic symbols were deciphered according to the varying patterns in which they were arrayed. Typically, when asked what the data on the screen meant, these operators would point to distinct data elements and discuss them in terms of their spatial relationships on the screen, as if there were no external referents.


pages: 933 words: 205,691

Hadoop: The Definitive Guide by Tom White

Amazon Web Services, bioinformatics, business intelligence, combinatorial explosion, database schema, Debian, domain-specific language, en.wikipedia.org, fault tolerance, full text search, Grace Hopper, information retrieval, Internet Archive, Kickstarter, linked data, loose coupling, openstreetmap, recommendation engine, RFID, SETI@home, social graph, web application

Link inversion

However, most algorithms for calculating a page’s importance (or quality) need the opposite information, that is, what pages contain outlinks that point to the current page. This information is not readily available when crawling. Also, the indexing process benefits from taking into account the anchor text on inlinks so that this text may semantically enrich the text of the current page. As mentioned earlier, Nutch collects the outlink information and then uses this data to build a LinkDb, which contains this reversed link data in the form of inlinks and anchor text. This section presents a rough outline of the implementation of the LinkDb tool—many details have been omitted (such as URL normalization and filtering) in order to present a clear picture of the process. What’s left gives a classical example of why the MapReduce paradigm fits so well with the key data transformation processes required to run a search engine.
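The inversion itself maps naturally onto MapReduce: the map phase emits one (target, (source, anchor text)) pair per outlink, and the reduce phase groups those pairs by target to produce each page's inlinks. The following is a small self-contained Python sketch of that data flow, not the actual Nutch or Hadoop code.

```python
# Link inversion as a map/reduce: outlinks (source -> target, anchor) become
# inlinks grouped by target page. Pure-Python sketch of the data flow.
from collections import defaultdict

def map_outlinks(source, outlinks):
    """Emit one (target, (source, anchor)) pair per outlink of a page."""
    for target, anchor in outlinks:
        yield target, (source, anchor)

def reduce_inlinks(pairs):
    """Group the emitted pairs by target, yielding each page's inlink list."""
    linkdb = defaultdict(list)
    for target, inlink in pairs:
        linkdb[target].append(inlink)
    return dict(linkdb)

crawl = {
    "a.html": [("b.html", "next page"), ("c.html", "contact")],
    "b.html": [("c.html", "contact us")],
}
pairs = [p for src, outs in crawl.items() for p in map_outlinks(src, outs)]
print(reduce_inlinks(pairs))   # c.html now lists its inlinks with anchor text
```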


pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil

additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business cycle, business intelligence, c2.com, call centre, carbon-based life, cellular automata, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, coronavirus, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, disintermediation, double helix, Douglas Hofstadter, en.wikipedia.org, epigenetics, factory automation, friendly AI, George Gilder, Gödel, Escher, Bach, informal economy, information retrieval, invention of the telephone, invention of the telescope, invention of writing, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Marshall McLuhan, Mikhail Gorbachev, Mitch Kapor, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Norbert Wiener, oil shale / tar sands, optical character recognition, pattern recognition, phenotype, premature optimization, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Robert Metcalfe, Rodney Brooks, scientific worldview, Search for Extraterrestrial Intelligence, selection bias, semantic web, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Y2K, Yogi Berra

Resources and Contact Information

Singularity.com
New developments in the diverse fields discussed in this book are accumulating at an accelerating pace. To help you keep pace, I invite you to visit Singularity.com, where you will find:
· Recent news stories
· A compilation of thousands of relevant news stories going back to 2001 from KurzweilAI.net (see below)
· Hundreds of articles on related topics from KurzweilAI.net
· Research links
· Data and citation for all graphs
· Material about this book
· Excerpts from this book
· Online endnotes

KurzweilAI.net
You are also invited to visit our award-winning Web site, KurzweilAI.net, which includes over six hundred articles by over one hundred "big thinkers" (many of whom are cited in this book), thousands of news articles, listings of events, and other features. Over the past six months, we have had more than one million readers.


pages: 897 words: 242,580

The Temporal Void by Peter F. Hamilton

corporate governance, dark matter, forensic accounting, linked data, megacity, place-making, trade route

A smooth spherical starship appeared from nowhere a kilometre ahead of the Starslayer. Its force fields were impenetrable. The Yenisey couldn’t even get an accurate quantum signature scan to determine what kind of drive it used. ‘Admiral,’ Lucian called urgently. ‘We can’t—’ The unknown ship fired. ‘What the fuck was that!’ Gore yelled as the secure link abruptly vanished. Kazimir took a second to review the TD link data, he was so surprised. His tactical staff had produced a number of scenarios, mostly incorporating the Ocisens utilizing weapons technology they’d procured from a more advanced species. This hadn’t been a remote consideration. ‘I don’t recognize that design at all,’ Ilanthe said. ‘Do we have any spherical ship on the Navy’s intelligence registry?’ ‘There are some species that utilize a sphere,’ Kazimir said slowly as his u-shadow supplied their most highly classified data.


pages: 903 words: 235,753

The Stack: On Software and Sovereignty by Benjamin H. Bratton

1960s counterculture, 3D printing, 4chan, Ada Lovelace, additive manufacturing, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, basic income, Benevolent Dictator For Life (BDFL), Berlin Wall, bioinformatics, bitcoin, blockchain, Buckminster Fuller, Burning Man, call centre, carbon footprint, carbon-based life, Cass Sunstein, Celebration, Florida, charter city, clean water, cloud computing, connected car, corporate governance, crowdsourcing, cryptocurrency, dark matter, David Graeber, deglobalization, dematerialisation, disintermediation, distributed generation, don't be evil, Douglas Engelbart, Douglas Engelbart, Edward Snowden, Elon Musk, en.wikipedia.org, Eratosthenes, Ethereum, ethereum blockchain, facts on the ground, Flash crash, Frank Gehry, Frederick Winslow Taylor, future of work, Georg Cantor, gig economy, global supply chain, Google Earth, Google Glasses, Guggenheim Bilbao, High speed trading, Hyperloop, illegal immigration, industrial robot, information retrieval, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invisible hand, Jacob Appelbaum, Jaron Lanier, Joan Didion, John Markoff, Joi Ito, Jony Ive, Julian Assange, Khan Academy, liberal capitalism, lifelogging, linked data, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, McMansion, means of production, megacity, megastructure, Menlo Park, Minecraft, MITM: man-in-the-middle, Monroe Doctrine, Network effects, new economy, offshore financial centre, oil shale / tar sands, packet switching, PageRank, pattern recognition, peak oil, peer-to-peer, performance metric, personalized medicine, Peter Eisenman, Peter Thiel, phenotype, Philip Mirowski, Pierre-Simon Laplace, place-making, planetary scale, RAND corporation, recommendation engine, reserve currency, RFID, Robert Bork, Sand Hill Road, self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley ideology, Slavoj Žižek, smart cities, smart grid, smart meter, social graph, software studies, South China Sea, sovereign wealth fund, special economic zone, spectrum auction, Startup school, statistical arbitrage, Steve Jobs, Steven Levy, Stewart Brand, Stuxnet, Superbowl ad, supply-chain management, supply-chain management software, TaskRabbit, the built environment, The Chicago School, the scientific method, Torches of Freedom, transaction costs, Turing complete, Turing machine, Turing test, undersea cable, universal basic income, urban planning, Vernor Vinge, Washington Consensus, web application, Westphalian system, WikiLeaks, working poor, Y Combinator

Through various combinations of open or proprietary exigetics of data, and perhaps a sequence of application programming interfaces (APIs), a query entered as “book me a ticket to New York” can activate a series of secondary inquiries to calendars, banks, flight schedules, airline databases, bank accounts, and so on and, through this, initiate the cascading programming resulting in that booking. For this, to search is also to program. Such tidy consumer use cases require enormously difficult standardizations of interoperability between competitive services (not to mention beyond-Esperanto level standardization of all Users’ conceptual taxonomies). The goal of linking data into semantically relevant and accessible structures so that “search” would also provide more actionable results, and in turn allowing queries to program those results for specific ends, remains compelling for search engines, if less so for individual down-service-stream providers, such as airlines and banks, which see their business absorbed into a handful of search platforms.20 By comparison, physical search may be based on a similar tissue of interrelation between addressable entities—in this case, a mix of physical things and data of interest—and might be a necessary condition of a really viable Internet of Things or SPIME space.
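To make "to search is also to program" concrete, the cascade can be pictured as a query handler that issues secondary calls and acts on their results. Every service and function in the sketch below is a hypothetical placeholder, not a real API from the book or from any platform.

```python
# Toy sketch of a query that programs a cascade of secondary API calls.
# All services and functions here are hypothetical stand-ins.
def calendar_free_slots(date_range):
    return ["2025-06-01", "2025-06-02"]                  # stand-in calendar service

def flight_search(destination, dates):
    return [{"flight": "XY123", "date": dates[0], "price": 420}]

def book(flight, account="checking"):
    return {"booked": flight["flight"], "charged_to": account}

def handle_query(query):
    if "book me a ticket to" in query:
        destination = query.rsplit("to", 1)[1].strip()
        dates = calendar_free_slots("next month")        # secondary inquiry 1
        options = flight_search(destination, dates)      # secondary inquiry 2
        return book(options[0])                          # cascaded action
    return None

print(handle_query("book me a ticket to New York"))
```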


Data Mining: Concepts and Techniques: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

Pattern Analysis and Machine Intelligence (PAMI) 24 (2002) 881–892. [KMR+94] Klemettinen, M.; Mannila, H.; Ronkainen, P.; Toivonen, H.; Verkamo, A.I., Finding interesting rules from large sets of discovered association rules, In: Proc. 3rd Int. Conf. Information and Knowledge Management Gaithersburg, MD. (Nov. 1994), pp. 401–408. [KMS03] Kubica, J.; Moore, A.; Schneider, J., Tractable group detection on large link data sets, In: Proc. 2003 Int. Conf. Data Mining (ICDM’03) Melbourne, FL. (Nov. 2003), pp. 573–576. [KN97] Knorr, E.; Ng, R., A unified notion of outliers: Properties and computation, In: Proc. 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD’97) Newport Beach, CA. (Aug. 1997), pp. 219–222. [KNNL04] Kutner, M.H.; Nachtsheim, C.J.; Neter, J.; Li, W., Applied Linear Statistical Models with Student CD. (2004) Irwin .


pages: 918 words: 257,605

The Age of Surveillance Capitalism by Shoshana Zuboff

Amazon Web Services, Andrew Keen, augmented reality, autonomous vehicles, barriers to entry, Bartolomé de las Casas, Berlin Wall, bitcoin, blockchain, blue-collar work, book scanning, Broken windows theory, California gold rush, call centre, Capital in the Twenty-First Century by Thomas Piketty, Cass Sunstein, choice architecture, citizen journalism, cloud computing, collective bargaining, Computer Numeric Control, computer vision, connected car, corporate governance, corporate personhood, creative destruction, cryptocurrency, dogs of the Dow, don't be evil, Donald Trump, Edward Snowden, en.wikipedia.org, Erik Brynjolfsson, facts on the ground, Ford paid five dollars a day, future of work, game design, Google Earth, Google Glasses, Google X / Alphabet X, hive mind, impulse control, income inequality, Internet of things, invention of the printing press, invisible hand, Jean Tirole, job automation, Johann Wolfgang von Goethe, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, knowledge economy, linked data, longitudinal study, low skilled workers, Mark Zuckerberg, market bubble, means of production, multi-sided market, Naomi Klein, natural language processing, Network effects, new economy, Occupy movement, off grid, PageRank, Panopticon Jeremy Bentham, pattern recognition, Paul Buchheit, performance metric, Philip Mirowski, precision agriculture, price mechanism, profit maximization, profit motive, recommendation engine, refrigerator car, RFID, Richard Thaler, ride hailing / ride sharing, Robert Bork, Robert Mercer, Second Machine Age, self-driving car, sentiment analysis, shareholder value, Shoshana Zuboff, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, slashdot, smart cities, Snapchat, social graph, social web, software as a service, speech recognition, statistical model, Steve Jobs, Steven Levy, structural adjustment programs, The Future of Employment, The Wealth of Nations by Adam Smith, Tim Cook: Apple, two-sided market, union organizing, Watson beat the top human players on Jeopardy!, winner-take-all economy, Wolfgang Streeck

Conlee, “How Automation and Analytics Are Changing Customer Care,” Conduent Blog, July 18, 2016, https://www.blogs.conduent.com/2016/07/18/how-automation-and-analytics-are-changing-customer-care; Ryan Knutson, “Call Centers May Know a Surprising Amount About You,” Wall Street Journal, January 6, 2017, http://www.wsj.com/articles/that-anonymous-voice-at-the-call-center-they-may-know-a-lot-about-you-1483698608. 74. Nicholas Confessore and Danny Hakim, “Bold Promises Fade to Doubts for a Trump-Linked Data Firm,” New York Times, March 6, 2017, https://www.nytimes.com/2017/03/06/us/politics/cambridge-analytica.html; Mary-Ann Russon, “Political Revolution: How Big Data Won the US Presidency for Donald Trump,” International Business Times UK, January 20, 2017, http://www.ibtimes.co.uk/political-revolution-how-big-data-won-us-presidency-donald-trump-1602269; Grassegger and Krogerus, “The Data That Turned the World Upside Down”; Carole Cadwalladr, “Revealed: How US Billionaire Helped to Back Brexit,” Guardian, February 25, 2017, https://www.theguardian.com/politics/2017/feb/26/us-billionaire-mercer-helped-back-brexit; Paul-Olivier Dehaye, “The (Dis)Information Mercenaries Now Controlling Trump’s Databases,” Medium, January 3, 2017, https://medium.com/personaldata-io/the-dis-information-mercenaries-now-controlling-trumps-databases-4f6a20d4f3e7; Harry Davies, “Ted Cruz Using Firm That Harvested Data on Millions of Unwitting Facebook Users,” Guardian, December 11, 2015, https://www.theguardian.com/us-news/2015/dec/11/senator-ted-cruz-president-campaign-facebook-user-data. 75.