SPARQL



pages: 511 words: 111,423

Learning SPARQL by Bob Ducharme


ex160.rq, Grouping Data and Finding Aggregate Values within Groups ex162.rq, Grouping Data and Finding Aggregate Values within Groups ex164.rq, Grouping Data and Finding Aggregate Values within Groups ex166.rq, Querying a Remote SPARQL Service ex167.rq, Querying a Remote SPARQL Service ex170.rq, Querying a Remote SPARQL Service ex172.rq, Federated Queries: Searching Multiple Datasets with One Query ex174.rq, Copying Data ex176.rq, Copying Data ex178.rq, Copying Data ex180.rq, Copying Data ex182.rq, Copying Data ex184.rq, Creating New Data ex185.rq, Creating New Data ex187.ttl, Creating New Data ex188.rq, Creating New Data ex190.rq, Creating New Data ex192.rq, Creating New Data ex193.ttl, Creating New Data ex194.rq, Converting Data ex196.rq, Converting Data ex198.ttl, Defining Rules with SPARQL ex199.rq, Defining Rules with SPARQL ex201.rq, Defining Rules with SPARQL, Datatypes and Queries ex202.rq, Defining Rules with SPARQL ex203.rq, Generating Data About Broken Rules ex205.rq, Generating Data About Broken Rules ex207.rq, Generating Data About Broken Rules ex209.rq, Generating Data About Broken Rules ex211.rq, Using Existing SPARQL Rules Vocabularies ex212.rq, Using Existing SPARQL Rules Vocabularies ex213.rq, Asking for a Description of a Resource ex215.rq, Asking for a Description of a Resource ex216.rq, Asking for a Description of a Resource ex217.ttl, Datatypes and Queries ex218.rq, Datatypes and Queries ex220.rq, Datatypes and Queries ex221.rq, Datatypes and Queries ex222.rq, Datatypes and Queries ex223.rq, Datatypes and Queries ex224.ttl, Representing Strings ex225.rq, Representing Strings ex227.ttl, Comparing Values and Doing Arithmetic ex228.rq, Comparing Values and Doing Arithmetic ex230.rq, Comparing Values and Doing Arithmetic ex232.rq, Comparing Values and Doing Arithmetic ex233.rq, Comparing Values and Doing Arithmetic ex235.rq, Program Logic Functions ex237.rq, Program Logic Functions ex239.rq, Program Logic Functions ex241.ttl, Node Type and 
Datatype Checking Functions ex242.rq, Node Type and Datatype Checking Functions ex244.rq, Node Type and Datatype Checking Functions ex246.rq, Node Type Conversion Functions ex248.rq, Node Type Conversion Functions ex249.ttl, Node Type Conversion Functions ex251.rq, Node Type Conversion Functions ex253.rq, Node Type Conversion Functions ex255.rq, Node Type Conversion Functions ex257.rq, Datatype Conversion ex259.ttl, Datatype Conversion ex260.rq, Datatype Conversion ex262.rq, Datatype Conversion ex264.rq, Datatype Conversion ex266.ttl, Datatype Conversion ex267.rq, Datatype Conversion ex268.txt, Datatype Conversion ex269.rq, Checking, Adding, and Removing Spoken Language Tags ex270.rq, Checking, Adding, and Removing Spoken Language Tags ex271.rq, Checking, Adding, and Removing Spoken Language Tags ex273.rq, Checking, Adding, and Removing Spoken Language Tags ex276.rq, Checking, Adding, and Removing Spoken Language Tags ex278.ttl, Checking, Adding, and Removing Spoken Language Tags ex279.rq, Checking, Adding, and Removing Spoken Language Tags ex281.ttl, Checking, Adding, and Removing Spoken Language Tags ex282.rq, Checking, Adding, and Removing Spoken Language Tags ex284.ttl, String Functions ex285.rq, String Functions ex287.rq, String Functions ex289.ttl, String Functions ex290.rq, String Functions ex292.ttl, Numeric Functions ex293.rq, Numeric Functions ex295.rq, Numeric Functions ex298.ttl, Date and Time Functions ex299.rq, Date and Time Functions ex301.rq, Date and Time Functions ex303.rq, Date and Time Functions ex305.rq, Hash Functions ex307.py, Hash Functions ex308.rq, Extension Functions ex311.rq, Adding Data to a Dataset ex312.ru, Adding Data to a Dataset ex313.ru, Adding Data to a Dataset ex314.rq, Adding Data to a Dataset ex316.ru, Adding Data to a Dataset ex324.ru, Deleting Data ex325.ru, Changing Existing Data ex326.rq, Changing Existing Data ex327.ttl, Changing Existing Data ex328.ttl, Changing Existing Data ex329.ru, Changing Existing Data ex330.ru, 
Named Graphs ex331.ru, Named Graphs ex332.rq, Named Graphs, SPARQL and HTTP ex333.ru, Named Graphs ex334.ru, Dropping Graphs ex336.ru, Dropping Graphs ex337.ru, Dropping Graphs, Solution ex338.ru, Dropping Graphs, SPARQL and HTTP ex339.ru, Dropping Graphs ex340.ru, Dropping Graphs ex341.rq, Dropping Graphs ex342.ru, Named Graph Syntax Shortcuts: WITH and USING ex343.ru, Named Graph Syntax Shortcuts: WITH and USING ex344.ru, Named Graph Syntax Shortcuts: WITH and USING ex345.ru, Named Graph Syntax Shortcuts: WITH and USING ex346.ru, Deleting and Replacing Triples in Named Graphs ex347.ru, Deleting and Replacing Triples in Named Graphs ex348.ru, Deleting and Replacing Triples in Named Graphs ex349.ru, Deleting and Replacing Triples in Named Graphs ex350.ru, Deleting and Replacing Triples in Named Graphs ex351.ru, Deleting and Replacing Triples in Named Graphs ex352.ru, Deleting and Replacing Triples in Named Graphs ex353.ru, Deleting and Replacing Triples in Named Graphs ex354.rq, Deleting and Replacing Triples in Named Graphs ex355.rq, SPARQL and Web Application Development ex358.py, SPARQL and Web Application Development ex360.pl, SPARQL and Web Application Development ex361.py, SPARQL and Web Application Development ex363.py, SPARQL and Web Application Development ex401.xml, SPARQL Query Results XML Format ex402.xsl, Processing XML Query Results ex403.xml, SPARQL Query Results XML Format ex404.js, SPARQL Query Results JSON Format ex405.js, SPARQL Query Results JSON Format ex406.rq, Working with SPARQL Query Result Formats ex407.js, Processing JSON Query Results ex408.rq, Working with SPARQL Query Result Formats ex409.ttl, Working with SPARQL Query Result Formats ex410.xml, SPARQL Query Results XML Format ex411.js, SPARQL Query Results JSON Format ex412.csv, SPARQL Query Results CSV and TSV Formats ex413.tsv, TSV Query Results ex414.txt, Working with SPARQL Query Result Formats ex415.rq, What Is Inferencing?

, Reusing and Creating Vocabularies: RDF Schema and OWL semicolon, More Readable Query Results connecting operations with, Named Graphs CONSTRUCT queries and, Converting Data in N3 and Turtle, Storing RDF in Files serialization, Storing RDF in Files, Glossary SERVICE, Querying a Remote SPARQL Service Sesame triplestore, Querying Named Graphs, Datatypes and Queries inferencing with, Inferred Triples and Your Query repositories, SPARQL and HTTP simple literal, Glossary SKOS, Making RDF More Readable with Language Tags and Labels creating, Checking, Adding, and Removing Spoken Language Tags custom datatypes and, Datatypes and Queries SKOS-XL, Changing Existing Data SNORQL, Querying a Public Data Source sorting, Sorting Data query efficiency and, Efficiency Outside the WHERE Clause space before SPARQL punctuation, The Data to Query SPARQL, Jumping Right In: Some Data and Some Queries, Glossary comments, The Data to Query endpoint, Querying a Remote SPARQL Service engine, Querying the Data Graph Store HTTP Protocol specification, Named Graphs processor, Querying the Data protocol, Jumping Right In: Some Data and Some Queries, The SPARQL Specifications query language, The SPARQL Specifications SPARQL 1.1, Updating Data with SPARQL specifications, The SPARQL Specifications triplestores and, Storing RDF in Databases uppercase keywords, Querying the Data SPARQL algebra, SPARQL Algebra SPARQL endpoint, Querying a Public Data Source, Public Endpoints, Private Endpoints–Public Endpoints, Private Endpoints, Glossary creating your own, Triplestore SPARQL Support identifier, SPARQL and Web Application Development Linked Data Cloud and, Problem retrieving triples from, Problem SERVICE keyword and, Federated Queries: Searching Multiple Datasets with One Query SPARQL processor, SPARQL Processors–Public Endpoints, Private Endpoints, Glossary SPARQL protocol, Glossary SPARQL Query Results CSV and TSV Formats, SPARQL Query Results CSV and TSV Formats SPARQL Query Results JSON Format, 
SPARQL Query Results JSON Format SPARQL Query Results XML Format, The SPARQL Specifications, SPARQL Query Results XML Format as ARQ output, Standalone Processors SPARQL rules, Defining Rules with SPARQL–Defining Rules with SPARQL SPIN (SPARQL Inferencing Notation), Using Existing SPARQL Rules Vocabularies, Using SPARQL to Do Your Inferencing spreadsheets, Checking, Adding, and Removing Spoken Language Tags, Using CSV Query Results SQL, Querying the Data, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT, Middleware SPARQL Support, Glossary habits, Querying the Data square braces, Blank Nodes and Why They’re Useful, Using Existing SPARQL Rules Vocabularies str(), Node Type Conversion Functions CSV format and, SPARQL Query Results CSV and TSV Formats STRDT(), Datatype Conversion STRENDS(), String Functions string converting to URI, Problem datatype, Datatypes and Queries, Representing Strings functions, String Functions–String Functions searching for substrings, Problem striping, Storing RDF in Files, Glossary STRLANG(), Checking, Adding, and Removing Spoken Language Tags STRLEN(), String Functions STRSTARTS(), String Functions subject (of triple), The Data to Query, The Resource Description Framework (RDF), Glossary namespaces and, URLs, URIs, IRIs, and Namespaces subqueries, Queries in Your Queries, Combining Values and Assigning Values to Variables, Federated Queries: Searching Multiple Datasets with One Query SUBSTR(), Creating New Data, String Functions subtraction, Comparing Values and Doing Arithmetic SUM(), Finding the Smallest, the Biggest, the Count, the Average...



pages: 315 words: 70,044

Learning SPARQL by Bob Ducharme


See Also Turtle, RDF/XML, N3

simple literal
  See literal

SPARQL
  The SPARQL Protocol and RDF Query Language, a set of W3C standards for querying and updating data conforming to the RDF model.

SPARQL endpoint
  An endpoint is a resource that a process can contact and use as a service; a SPARQL endpoint accepts SPARQL queries and returns the results using the SPARQL Protocol for RDF. One SPARQL service can provide multiple endpoints, each identified by its own URL.

SPARQL engine
  See SPARQL processor

SPARQL processor
  (Or SPARQL engine) A program that applies a SPARQL query against a dataset and returns the result. This can be a local or remote program.

SPARQL protocol
  The specification for how a program should pass SPARQL queries and updates to a SPARQL query processing service and how that service should return the results.

, Reusing and Creating Vocabularies: RDF Schema and OWL semicolon, Storing RDF in Files, More Readable Query Results, Converting Data, Named Graphs CONSTRUCT queries and, Converting Data in N3 and Turtle, Storing RDF in Files serialization, Storing RDF in Files, Glossary SERVICE, Querying a Remote SPARQL Service simple literal, Glossary SKOS, Making RDF More Readable with Language Tags and Labels, Datatypes and Queries, Checking, Adding, and Removing Spoken Language Tags creating, Checking, Adding, and Removing Spoken Language Tags custom datatypes and, Datatypes and Queries SKOS-XL, Changing Existing Data SNORQL, Querying a Public Data Source sorting data, Sorting Data space before SPARQL punctuation, The Data to Query SPARQL, Jumping Right In: Some Data and Some Queries, Jumping Right In: Some Data and Some Queries, The Data to Query, Querying the Data, Querying the Data, Querying the Data, Storing RDF in Databases, The SPARQL Specifications, The SPARQL Specifications, The SPARQL Specifications, Updating Data with SPARQL, Named Graphs, Glossary comments, The Data to Query engine, Querying the Data Graph Store HTTP Protocol specification, Named Graphs processor, Querying the Data protocol, Jumping Right In: Some Data and Some Queries, The SPARQL Specifications query language, The SPARQL Specifications SPARQL 1.1, Updating Data with SPARQL specifications, The SPARQL Specifications triplestores and, Storing RDF in Databases uppercase keywords, Querying the Data SPARQL endpoint, Querying a Public Data Source, SPARQL and Web Application Development, Triplestore SPARQL Support, Glossary creating your own, Triplestore SPARQL Support SPARQL processor, Glossary SPARQL protocol, Glossary SPARQL Query Results XML Format, The SPARQL Specifications, SPARQL Query Results XML Format, Standalone Processors as ARQ output, Standalone Processors SPARQL rules, Defining Rules with SPARQL, Defining Rules with SPARQL SPIN, Using Existing SPARQL Rules Vocabularies spreadsheets, 
Checking, Adding, and Removing Spoken Language Tags SQL, Querying the Data, Glossary square braces, Blank Nodes and Why They’re Useful, Using Existing SPARQL Rules Vocabularies str(), Node Type Conversion Functions STRDT(), Datatype Conversion STRENDS(), String Functions string datatype, Datatypes and Queries, Representing Strings striping, Storing RDF in Files, Glossary STRLANG(), Checking, Adding, and Removing Spoken Language Tags STRLEN(), String Functions STRSTARTS(), String Functions subject (of triple), The Data to Query, URLs, URIs, IRIs, and Namespaces, The Resource Description Framework (RDF), Glossary namespaces and, URLs, URIs, IRIs, and Namespaces subqueries, Queries in Your Queries, Combining Values and Assigning Values to Variables, Federated Queries: Searching Multiple Datasets with One Query SUBSTR(), Creating New Data, String Functions subtraction, Comparing Values and Doing Arithmetic SUM(), Finding the Smallest, the Biggest, the Count, the Average...

, Querying the Data, More Realistic Data and Matching on Multiple Triples, URLs, URIs, IRIs, and Namespaces, Storing RDF in Databases, Data That Might Not Be There, Searching Further in the Data, Querying a Remote SPARQL Service, Creating New Data, Using Existing SPARQL Rules Vocabularies, Deleting and Replacing Triples in Named Graphs, Middleware SPARQL Support join (SPARQL equivalent), Searching Further in the Data normalization and, Creating New Data outer join (SPARQL equivalent), Data That Might Not Be There row ID values and, More Realistic Data and Matching on Multiple Triples, URLs, URIs, IRIs, and Namespaces SPARQL middleware and, Middleware SPARQL Support SPARQL rules and, Using Existing SPARQL Rules Vocabularies SQL habits, Querying the Data remote SPARQL service, querying, Querying a Remote SPARQL Service, Querying a Remote SPARQL Service Resource Description Framework, The Data to Query (see RDF) round(), Numeric Functions S sample code, Using Code Examples schema, What Exactly Is the “Semantic Web”?


Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data by Leslie Sikos


Popular Public SPARQL Endpoints

  Service/Dataset          SPARQL Endpoint
  Datahub/CKAN             http://semantic.ckan.net/sparql
  DBpedia                  http://dbpedia.org/sparql/
  GeoNames                 http://geosparql.org/
  Linked Open Commerce     http://linkedopencommerce.com/sparql/
  Linked Open Data Cloud   http://lod.openlinksw.com/sparql
  LinkedGeoData            http://linkedgeodata.org/sparql
  Sindice                  http://sparql.sindice.com/
  URIBurner                http://uriburner.com/sparql

Setting Up Your Own SPARQL Endpoint

If you publish your LOD dataset on your own server, you might want to set up a dedicated SPARQL endpoint to provide easy access to it. A number of free, open source, and commercial products are available; not all of them offer full SPARQL 1.1 support, but most support SPARQL 1.0 completely. Some products are standalone SPARQL endpoints that you can install on your web server, while others are more comprehensive products that provide a SPARQL endpoint as one feature.

person .
}
WHERE {
  { SERVICE <http://examplegraph1.com/sparql> { :LeslieSikos foaf:knows ?person . } }
  UNION
  { SERVICE <http://examplegraph2.com/sparql> { :LeslieSikos foaf:knows ?person . } }
}

URL Encoding of SPARQL Queries

To give automated processes the option of making SPARQL queries, SPARQL can be used over HTTP, using the SPARQL Protocol (the Protocol that gives SPARQL its P). SPARQL endpoints can handle a SPARQL query passed as a parameter of an HTTP GET or POST request. The query is URL-encoded to escape special characters, and the encoded string becomes the value of the query parameter. The parameters are defined in the standardized SPARQL Protocol [12]. As an example, take a look at the default DBpedia SPARQL endpoint (http://dbpedia.org/sparql/) query shown in Listing 7-21.

Listing 7-21. A URL-Encoded SPARQL Query

http://dbpedia.org/sparql?
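The encoding step itself can be sketched with Python's standard library. This is only an illustration: the query below is made up, and only the query parameter from the standardized SPARQL Protocol is used.

```python
from urllib.parse import urlencode

# A SPARQL query to be passed to an endpoint via HTTP GET (illustrative).
query = 'SELECT ?name WHERE { ?s <http://xmlns.com/foaf/0.1/name> ?name } LIMIT 10'

# urlencode escapes spaces, angle brackets, question marks, and other
# special characters; the result becomes the value of the "query" parameter.
url = "http://dbpedia.org/sparql?" + urlencode({"query": query})

print(url)
```

The resulting URL can then be fetched by any HTTP client; POST is preferable for long queries, since URLs can hit length limits.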

In some cases, for example when the SPARQL endpoint used for the query is dedicated to the LOD dataset from which you want to retrieve data, the FROM clause is optional and can safely be omitted. The WHERE clause specifies the patterns used to extract the desired results. Query modifiers, such as ORDER BY or LIMIT, if present, appear in the last part of the query.

SPARQL 1.0 and SPARQL 1.1

The first version of SPARQL, SPARQL 1.0, was released in 2008 [2]. SPARQL 1.0 introduced the SPARQL grammar, the query syntax, RDF term constraints, graph patterns, solution sequences and solution modifiers, and the four core query types (SELECT, CONSTRUCT, ASK, and DESCRIBE). SPARQL 1.0 has been significantly extended with new features in SPARQL 1.1 [3]; for example, SPARQL 1.1 supports aggregation.
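That clause order can be sketched in a single query. The graph IRI and data here are invented for illustration, and the COUNT aggregate is one of the SPARQL 1.1 additions:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person (COUNT(?friend) AS ?friendCount)  # SPARQL 1.1 aggregation
FROM <http://example.org/social>                 # optional if the endpoint is dedicated to the dataset
WHERE {
  ?person foaf:knows ?friend .                   # the pattern to match
}
GROUP BY ?person
ORDER BY DESC(?friendCount)                      # query modifiers come last
LIMIT 10
```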


RDF Database Systems: Triples Storage and SPARQL Query Processing by Olivier Cure, Guillaume Blin


While the FROM keyword allows us to retrieve data (from local or distant graphs) and apply a query to it, SPARQL's SERVICE keyword provides a way to execute a query remotely on a SPARQL endpoint. A SPARQL endpoint is a web service allowing the execution of SPARQL queries (e.g., DBpedia, a central dataset of Linked Open Data). Using SERVICE, the query is sent to the SPARQL endpoint, which executes it and returns the result. The following code illustrates the SERVICE and FROM keywords. So far, the results of SPARQL queries have taken the form of a set of tuples, as in SQL, shaped by the SELECT template expression. In fact, SPARQL allows other representations when used with the ASK, CONSTRUCT, and DESCRIBE keywords. ASK obtains a Boolean answer to a SPARQL query, depending on whether the query pattern matches the dataset.
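The ASK form can be combined with SERVICE, as in this sketch (DBpedia's public endpoint and its dbo:City class are used here only as an example of a remote service):

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

# ASK returns a Boolean: does the remote dataset contain any match at all?
ASK {
  SERVICE <http://dbpedia.org/sparql> {
    ?city a dbo:City .
  }
}
```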

An important group of them tackles the join order of SPARQL BGP (basic graph pattern) triples, which is known to have a substantial impact on the overall performance of query execution. Determining the best join ordering is therefore generally considered a crucial issue in SPARQL query optimization. Note that this is also true for SQL queries, but it is exacerbated in a SPARQL context, where the number of joins tends to be much larger: queries with more than a dozen joins are common in SPARQL, while this is relatively rare in an RDBMS context. We then present some of the most popular solutions implemented in commercial and academic state-of-the-art systems.

6.4.1 SPARQL graphs

Most query optimization techniques for SPARQL use a BGP abstraction, which takes the form of a graph.

ARQ can be used as a standalone application or as a library integrated into a bigger application. ARQ supports both the SPARQL and SPARQL Update standards and can produce query results in several formats (XML, JSON, and column-separated text output). It can read and query both local and remote RDF data and SPARQL endpoints, and it supports custom filter functions, aggregation, GROUP BY, assignment, and federated queries. Fuseki is a SPARQL server built on ARQ that can serve RDF data and answer SPARQL queries over HTTP; its installation is easy and straightforward. Jena provides a rules engine and other inference algorithms to derive additional RDF assertions. The inference system is pluggable with other reasoners. It supports RDFS and OWL, which allow additional facts to be inferred from instance data and class descriptions. Several reasoners are included: a transitive reasoner (i.e., dealing with rdfs:subPropertyOf and rdfs:subClassOf), an RDFS rule reasoner, some OWL reasoners (partially implementing OWL Lite), and a generic rule reasoner (for user-defined rules).
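A minimal command-line sketch of this workflow, assuming ARQ and Fuseki are installed and that data.ttl and query.rq are files you provide:

```shell
# Run a query locally with ARQ, choosing an output format.
arq --data data.ttl --query query.rq --results JSON

# Serve the same data over HTTP with Fuseki; the dataset is then
# queryable at http://localhost:3030/ds/sparql
fuseki-server --file=data.ttl /ds
```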


pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann


To avoid potential confusion with http:// URLs, the examples in this section use non-resolvable URIs such as urn:example:within. Fortunately, you can just specify this prefix once at the top of the file, and then forget about it.

The SPARQL query language

SPARQL is a query language for triple-stores using the RDF data model [43]. (It is an acronym for SPARQL Protocol and RDF Query Language, pronounced “sparkle.”) It predates Cypher, and since Cypher’s pattern matching is borrowed from SPARQL, they look quite similar [37]. The same query as before—finding people who have moved from the US to Europe—is even more concise in SPARQL than it is in Cypher (see Example 2-9).

Example 2-9. The same query as Example 2-4, expressed in SPARQL

PREFIX : <urn:example:>

SELECT ?personName WHERE {
  ?person :name ?personName.
  ?person :bornIn  / :within* / :name "United States".
  ?person :livesIn / :within* / :name "Europe".
}

The structure is very similar.

, Glossaryabstractions for, Consistency and Consensus formalization in consensus, Fault-Tolerant Consensus-Limitations of consensususe of replication, Single-leader replication and consensus human fault tolerance, Philosophy of batch process outputs in batch processing, Bringing related data together in the same place, Philosophy of batch process outputs, Fault tolerance, Fault tolerance in log-based systems, Applying end-to-end thinking in data systems, Timeliness and Integrity-Correctness of dataflow systems in stream processing, Fault Tolerance-Rebuilding state after a failureatomic commit, Atomic commit revisited idempotence, Idempotence maintaining derived state, Maintaining derived state microbatching and checkpointing, Microbatching and checkpointing rebuilding state after a failure, Rebuilding state after a failure of distributed transactions, XA transactions-Limitations of distributed transactions transaction atomicity, Atomicity, Atomic Commit and Two-Phase Commit (2PC)-Exactly-once message processing faults, ReliabilityByzantine faults, Byzantine Faults-Weak forms of lying failures versus, Reliability handled by transactions, Transactions handling in supercomputers and cloud computing, Cloud Computing and Supercomputing hardware, Hardware Faults in batch processing versus distributed databases, Designing for frequent faults in distributed systems, Faults and Partial Failures-Cloud Computing and Supercomputing introducing deliberately, Reliability, Network Faults in Practice network faults, Network Faults in Practice-Detecting Faultsasymmetric faults, The Truth Is Defined by the Majority detecting, Detecting Faults tolerance of, in multi-leader replication, Multi-datacenter operation software errors, Software Errors tolerating (see fault tolerance) federated databases, The meta-database of everything fence (CPU instruction), Linearizability and network delays fencing (preventing split brain), Leader failure: Failover, The leader and the lock-Fencing 
tokensgenerating fencing tokens, Using total order broadcast, Membership and Coordination Services properties of fencing tokens, Correctness of an algorithm stream processors writing to databases, Idempotence, Exactly-once execution of an operation Fibre Channel (networks), MapReduce and Distributed Filesystems field tags (Thrift and Protocol Buffers), Thrift and Protocol Buffers-Field tags and schema evolution file descriptors (Unix), A uniform interface financial data, Advantages of immutable events Firebase (database), API support for change streams Flink (processing framework), Dataflow engines-Discussion of materializationdataflow APIs, High-Level APIs and Languages fault tolerance, Fault tolerance, Microbatching and checkpointing, Rebuilding state after a failure Gelly API (graph processing), The Pregel processing model integration of batch and stream processing, Batch and Stream Processing, Unifying batch and stream processing machine learning, Specialization for different domains query optimizer, The move toward declarative query languages stream processing, Stream analytics flow control, Network congestion and queueing, Messaging Systems, Glossary FLP result (on consensus), Distributed Transactions and Consensus FlumeJava (dataflow library), MapReduce workflows, High-Level APIs and Languages followers, Leaders and Followers, Glossary(see also leader-based replication) foreign keys, Comparison to document databases, Reduce-Side Joins and Grouping forward compatibility, Encoding and Evolution forward decay (algorithm), Describing Performance Fossil (version control system), Limitations of immutabilityshunning (deleting data), Limitations of immutability FoundationDB (database)serializable transactions, Serializable Snapshot Isolation (SSI), Performance of serializable snapshot isolation, Limitations of distributed transactions fractal trees, B-tree optimizations full table scans, Reduce-Side Joins and Grouping full-text search, Glossaryand fuzzy indexes, 
Full-text search and fuzzy indexes
  building search indexes, Building search indexes
  Lucene storage engine, Making an LSM-tree out of SSTables
functional reactive programming (FRP), Designing Applications Around Dataflow
functional requirements, Summary
futures (asynchronous operations), Current directions for RPC
fuzzy search (see similarity search)

G

garbage collection
  immutability and, Limitations of immutability
  process pauses for, Describing Performance, Process Pauses-Limiting the impact of garbage collection, The Truth Is Defined by the Majority
  (see also process pauses)
genome analysis, Summary, Specialization for different domains
geographically distributed datacenters, Distributed Data, Reading Your Own Writes, Unreliable Networks, The limits of total ordering
geospatial indexes, Multi-column indexes
Giraph (graph processing), The Pregel processing model
Git (version control system), Custom conflict resolution logic, The causal order is not a total order, Limitations of immutability
GitHub, postmortems, Leader failure: Failover, Leader failure: Failover, Mapping system models to the real world
global indexes (see term-partitioned indexes)
GlusterFS (distributed filesystem), MapReduce and Distributed Filesystems
GNU Coreutils (Linux), Sorting versus in-memory aggregation
GoldenGate (change data capture), Trigger-based replication, Multi-datacenter operation, Implementing change data capture (see also Oracle)
Google
  Bigtable (database)
    data model (see Bigtable data model)
    partitioning scheme, Partitioning, Partitioning by Key Range
    storage layout, Making an LSM-tree out of SSTables
  Chubby (lock service), Membership and Coordination Services
  Cloud Dataflow (stream processor), Stream analytics, Atomic commit revisited, Unifying batch and stream processing (see also Beam)
  Cloud Pub/Sub (messaging), Message brokers compared to databases, Using logs for message storage
  Docs (collaborative editor), Collaborative editing
  Dremel (query engine), The divergence between OLTP databases and data warehouses, Column-Oriented Storage
  FlumeJava (dataflow library), MapReduce workflows, High-Level APIs and Languages
  GFS (distributed file system), MapReduce and Distributed Filesystems
  gRPC (RPC framework), Current directions for RPC
  MapReduce (batch processing), Batch Processing (see also MapReduce)
    building search indexes, Building search indexes
    task preemption, Designing for frequent faults
  Pregel (graph processing), The Pregel processing model
  Spanner (see Spanner)
  TrueTime (clock API), Clock readings have a confidence interval
gossip protocol, Request Routing
government use of data, Data as assets and power
GPS (Global Positioning System)
  use for clock synchronization, Unreliable Clocks, Clock Synchronization and Accuracy, Clock readings have a confidence interval, Synchronized clocks for global snapshots
GraphChi (graph processing), Parallel execution
graphs, Glossary
  as data models, Graph-Like Data Models-The Foundation: Datalog
    example of graph-structured data, Graph-Like Data Models
    property graphs, Property Graphs
    RDF and triple-stores, Triple-Stores and SPARQL-The SPARQL query language
    versus the network model, The SPARQL query language
  processing and analysis, Graphs and Iterative Processing-Parallel execution
    fault tolerance, Fault tolerance
    Pregel processing model, The Pregel processing model
  query languages
    Cypher, The Cypher Query Language
    Datalog, The Foundation: Datalog-The Foundation: Datalog
    recursive SQL queries, Graph Queries in SQL
    SPARQL, The SPARQL query language-The SPARQL query language
Gremlin (graph query language), Graph-Like Data Models
grep (Unix tool), Simple Log Analysis
GROUP BY clause (SQL), GROUP BY
grouping records in MapReduce, GROUP BY
  handling skew, Handling skew

H

Hadoop (data infrastructure)
  comparison to distributed databases, Batch Processing
  comparison to MPP databases, Comparing Hadoop to Distributed Databases-Designing for frequent faults
  comparison to Unix, Philosophy of batch process outputs-Philosophy of batch process outputs, Unbundling Databases
  diverse processing models in ecosystem, Diversity of processing models
  HDFS distributed filesystem (see HDFS)
  higher-level tools, MapReduce workflows
  join algorithms, Reduce-Side Joins and Grouping-MapReduce workflows with map-side joins (see also MapReduce)
  MapReduce (see MapReduce)
  YARN (see YARN)
happens-before relationship, Ordering and Causality
  capturing, Capturing the happens-before relationship
  concurrency and, The “happens-before” relationship and concurrency
hard disks
  access patterns, Advantages of LSM-trees
  detecting corruption, The end-to-end argument, Don’t just blindly trust what they promise
  faults in, Hardware Faults, Durability
  sequential write throughput, Hash Indexes, Disk space usage
hardware faults, Hardware Faults
hash indexes, Hash Indexes-Hash Indexes
  broadcast hash joins, Broadcast hash joins
  partitioned hash joins, Partitioned hash joins
hash partitioning, Partitioning by Hash of Key-Partitioning by Hash of Key, Summary
  consistent hashing, Partitioning by Hash of Key
  problems with hash mod N, How not to do it: hash mod N
  range queries, Partitioning by Hash of Key
  suitable hash functions, Partitioning by Hash of Key
  with fixed number of partitions, Fixed number of partitions
HAWQ (database), Specialization for different domains
HBase (database)
  bug due to lack of fencing, The leader and the lock
  bulk loading, Key-value stores as batch process output
  column-family data model, Data locality for queries, Column Compression
  dynamic partitioning, Dynamic partitioning
  key-range partitioning, Partitioning by Key Range
  log-structured storage, Making an LSM-tree out of SSTables
  request routing, Request Routing
  size-tiered compaction, Performance optimizations
  use of HDFS, Diversity of processing models
  use of ZooKeeper, Membership and Coordination Services
HDFS (Hadoop Distributed File System), MapReduce and Distributed Filesystems-MapReduce and Distributed Filesystems (see also distributed filesystems)
  checking data integrity, Don’t just blindly trust what they promise
  decoupling from query engines, Diversity of processing models
  indiscriminately dumping data into, Diversity of storage
  metadata about datasets, MapReduce workflows with map-side joins
  NameNode, MapReduce and Distributed Filesystems
  use by Flink, Rebuilding state after a failure
  use by HBase, Dynamic partitioning
  use by MapReduce, MapReduce workflows
HdrHistogram (numerical library), Describing Performance
head (Unix tool), Simple Log Analysis
head vertex (property graphs), Property Graphs
head-of-line blocking, Describing Performance
heap files (databases), Storing values within the index
Helix (cluster manager), Request Routing
heterogeneous distributed transactions, Distributed Transactions in Practice, Limitations of distributed transactions
heuristic decisions (in 2PC), Recovering from coordinator failure
Hibernate (object-relational mapper), The Object-Relational Mismatch
hierarchical model, Are Document Databases Repeating History?



Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (O'Reilly, 2017)

active measures, Amazon Web Services, bitcoin, blockchain, business intelligence, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, database schema, DevOps, distributed ledger, Donald Knuth, Edward Snowden, Ethereum, ethereum blockchain, fault tolerance, finite state, Flash crash, full text search, general-purpose programming language, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, Kubernetes, loose coupling, Marc Andreessen, microservices, natural language processing, Network effects, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, statistical model, undersea cable, web application, WebSocket, wikimedia commons

Fortunately, you can just specify this prefix once at the top of the file, and then forget about it.

The SPARQL query language

SPARQL is a query language for triple-stores using the RDF data model [43]. (It is an acronym for SPARQL Protocol and RDF Query Language, pronounced “sparkle.”) It predates Cypher, and since Cypher’s pattern matching is borrowed from SPARQL, they look quite similar [37]. The same query as before—finding people who have moved from the US to Europe—is even more concise in SPARQL than it is in Cypher (see Example 2-9).

Example 2-9. The same query as Example 2-4, expressed in SPARQL

PREFIX : <urn:example:>

SELECT ?personName WHERE {
  ?person :name ?personName.
  ?person :bornIn  / :within* / :name "United States".
  ?person :livesIn / :within* / :name "Europe".
}

The structure is very similar.

The following two expressions are equivalent (variables start with a question mark in SPARQL):

(person) -[:BORN_IN]-> () -[:WITHIN*0..]-> (location)   # Cypher

?person :bornIn / :within* ?location.                   # SPARQL

Because RDF doesn’t distinguish between properties and edges but just uses predicates for both, you can use the same syntax for matching properties. In the following expression, the variable usa is bound to any vertex that has a name property whose value is the string "United States":

(usa {name:'United States'})   # Cypher

?usa :name "United States".    # SPARQL

SPARQL is a nice query language—even if the semantic web never happens, it can be a powerful tool for applications to use internally.

Graph Databases Compared to the Network Model

In “Are Document Databases Repeating History?”

In a graph database, vertices and edges are not ordered (you can only sort the results when making a query).

• In CODASYL, all queries were imperative, difficult to write and easily broken by changes in the schema. In a graph database, you can write your traversal in imperative code if you want to, but most graph databases also support high-level, declarative query languages such as Cypher or SPARQL.

The Foundation: Datalog

Datalog is a much older language than SPARQL or Cypher, having been studied extensively by academics in the 1980s [44, 45, 46]. It is less well known among software engineers, but it is nevertheless important, because it provides the foundation that later query languages build upon.

In practice, Datalog is used in a few data systems: for example, it is the query language of Datomic [40], and Cascalog [47] is a Datalog implementation for querying large datasets in Hadoop.


pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem

Amazon Web Services, anti-pattern, bioinformatics, commoditize, corporate governance, create, read, update, delete, data acquisition, en.wikipedia.org, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, web application

Individually, single triples are semantically rather poor, but en masse they provide a rich dataset from which to harvest knowledge and infer connections. Triple stores typically provide SPARQL capabilities to reason about stored RDF data.11

RDF—the lingua franca of triple stores and the Semantic Web—can be serialized several ways. The following RDF encoding of a simple three-node graph shows the RDF/XML format. Here we see how triples come together to form linked data.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://www.example.org/ter">
  <rdf:Description rdf:about="http://www.example.org/ginger">
    <name>Ginger Rogers</name>
    <occupation>dancer</occupation>
    <partner rdf:resource="http://www.example.org/fred"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.org/fred">
    <name>Fred Astaire</name>
    <occupation>dancer</occupation>
    <likes rdf:resource="http://www.example.org/ice-cream"/>
  </rdf:Description>
</rdf:RDF>

10. http://www.w3.org/standards/semanticweb/
11. See http://www.w3.org/TR/rdf-sparql-query/ and http://www.w3.org/RDF/

W3C support

That they produce logical representations of triples doesn’t mean triple stores necessarily have triple-like internal implementations.

Most triple stores, however, are unified by their support for Semantic Web technology such as RDF and SPARQL. While there’s nothing particularly special about RDF as a means of serializing linked data, it is endorsed by the W3C and therefore benefits from being widely understood and well documented. The query language SPARQL benefits from similar W3C patronage. In the graph database space there is a similar abundance of innovation around graph serialization formats (e.g. GEOFF) and inferencing query languages (e.g. the Cypher query language that we use throughout this book).12 The key difference is that at this point these innovations do not enjoy the patronage of a well-regarded body like the W3C, though they do benefit from strong engagement within their user and vendor communities.

Once you understand Cypher, it becomes very easy to branch out and learn other graph query languages.1 In the following sections we’ll take a brief tour through Cypher. This isn’t a reference document for Cypher, however—merely a friendly introduction so that we can explore more interesting graph query scenarios later on.2

Other query languages

Other graph databases have other means of querying data. Many, including Neo4j, support the RDF query language SPARQL and the imperative, path-based query language Gremlin.3 Our interest, however, is in the expressive power of a property graph combined with a declarative query language, and so in this book we focus almost exclusively on Cypher.

Cypher Philosophy

Cypher is designed to be easily read and understood by developers, database professionals, and business stakeholders. Its ease of use derives from the fact it accords with the way we intuitively describe graphs using diagrams.


pages: 377 words: 110,427

The Boy Who Could Change the World: The Writings of Aaron Swartz by Aaron Swartz, Lawrence Lessig

affirmative action, Alfred Russel Wallace, American Legislative Exchange Council, Benjamin Mako Hill, bitcoin, Bonfire of the Vanities, Brewster Kahle, Cass Sunstein, deliberate practice, Donald Knuth, Donald Trump, failed state, fear of failure, Firefox, full employment, Howard Zinn, index card, invisible hand, Joan Didion, John Gruber, Lean Startup, More Guns, Less Crime, peer-to-peer, post scarcity, Richard Feynman, Richard Stallman, Ronald Reagan, school vouchers, semantic web, single-payer health, SpamAssassin, SPARQL, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, the scientific method, Toyota Production System, unbiased observer, wage slave, Washington Consensus, web application, WikiLeaks, working poor, zero-sum game

(To engineers, this is absurd from the start—standards are things you write after you’ve got something working, not before!) And so the “Semantic Web Activity” at the Worldwide Web Consortium (W3C) has spent its time writing standard upon standard: the Extensible Markup Language (XML), the Resource Description Framework (RDF), the Web Ontology Language (OWL), tools for Gleaning Resource Descriptions from Dialects of Languages (GRDDL), the Simple Protocol And RDF Query Language (SPARQL) (as created by the RDF Data Access Working Group (DAWG)). Few have received any widespread use and those that have (XML) are uniformly scourges on the planet, offenses against hardworking programmers that have pushed out sensible formats (like JSON) in favor of overly complicated hairballs with no basis in reality (I’m not done yet!—more on this in chapter 5). Instead of getting existing systems to talk to each other and writing up the best practices, these self-appointed guarantors of the Semantic Web have spent their time creating their own little universe, complete with Semantic Web databases and programming languages.

And it’s led many who have been working on the Semantic Web, in the vain hope of actually building a world where software can communicate, to burn out and tune out and find more productive avenues for their attentions. For an example, look at Sean B. Palmer. In his influential piece, “Ditching the Semantic Web?,” he proclaims “It’s not prudent, perhaps even not moral (if that doesn’t sound too melodramatic), to work on RDF, OWL, SPARQL, RIF, the broken ideas of distributed trust, CWM, Tabulator, Dublin Core, FOAF, SIOC, and any of these kinds of things” and says not only will he “stop working on the Semantic Web” but “I will, moreover, actively dissuade anyone from working on the Semantic Web where it distracts them from working on” more practical projects. It would be only fair here to point out that I am not exactly an unbiased observer.


pages: 680 words: 157,865

Beautiful Architecture: Leading Thinkers Reveal the Hidden Beauty in Software Design by Diomidis Spinellis, Georgios Gousios

Albert Einstein, barriers to entry, business intelligence, business process, call centre, continuous integration, corporate governance, database schema, Debian, domain-specific language, don't repeat yourself, Donald Knuth, en.wikipedia.org, fault tolerance, Firefox, general-purpose programming language, iterative process, linked data, locality of reference, loose coupling, meta analysis, meta-analysis, MVC pattern, peer-to-peer, premature optimization, recommendation engine, Richard Stallman, Ruby on Rails, semantic web, smart cities, social graph, social web, SPARQL, Steve Jobs, Stewart Brand, traveling salesman, Turing complete, type inference, web application, zero-coupon bond

Not only do we now have the ability to use whatever terms we would like to, we can add new terms and relationships at any point in the future without affecting the existing relationships. This schemaless approach is tremendously appealing to anyone who has ever modified an XML or RDBMS schema. It also represents a data model that not only survives in the face of inevitable social, procedural, and technological changes, but also embraces them. This RDF would be stored in a triplestore or other database, where it could be queried through SPARQL or a similar language. Most semantically enabled containers support storing and querying RDF in this way now. Examples include the Mulgara Semantic Store,[20] the Sesame Engine,[21] the Talis Platform,[22] and even Oracle 10g and beyond. Nodes in the graph can be selected based on pattern-matching criteria, so we could ask questions of our resources such as “Who created this URL?”, “Show me everything that Brian has created,” or “Identify any Creative Commons–licensed material produced in the last six months.”

To some extent, these discussions are still going on, as the technologies involved are evolving along with Akonadi, and the current approaches have yet to be validated in production use. At the time of this writing, there are agents feeding into Nepomuk and Strigi; these are separate processes that use the same API to the store as all other clients and resources. Incoming search queries are expressed in either XESAM or SPARQL, the respective query languages of Strigi and Nepomuk, which are also implemented by other search engines (such as Beagle, for example) and forwarded to them via DBUS. This forwarding happens inside the Akonadi server process. The results come in via DBUS in the form of lists of identifiers, which Akonadi can then use to present the results as actual items from the store to the user. The store does not do any searching or content indexing itself at the moment.


The Data Journalism Handbook by Jonathan Gray, Lucy Chambers, Liliana Bounegru

Amazon Web Services, barriers to entry, bioinformatics, business intelligence, carbon footprint, citizen journalism, correlation does not imply causation, crowdsourcing, David Heinemeier Hansson, eurozone crisis, Firefox, Florence Nightingale: pie chart, game design, Google Earth, Hans Rosling, information asymmetry, Internet Archive, John Snow's cholera map, Julian Assange, linked data, moral hazard, MVC pattern, New Journalism, openstreetmap, Ronald Reagan, Ruby on Rails, Silicon Valley, social graph, SPARQL, text mining, web application, WikiLeaks

Central to any data projects are the technical skills and advice of our developers and the visualization skills of our designers. While we are all either a journalist, designer, or developer “first,” we continue to work hard to increase our understanding and proficiency in each other’s areas of expertise. The core products for exploring data are Excel, Google Docs, and Fusion Tables. The team has also, but to a lesser extent, used MySQL, Access databases, and Solr to explore larger datasets; and used RDF and SPARQL to begin looking at ways in which we can model events using Linked Data technologies. Developers will also use their programming language of choice, whether that’s ActionScript, Python, or Perl, to match, parse, or generally pick apart a dataset we might be working on. Perl is used for some of the publishing. We use Google, Bing Maps, and Google Earth, along with Esri’s ArcMAP, for exploring and visualizing geographical data.


Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran, Jeff Hammerbacher

23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, database schema, double helix, en.wikipedia.org, epigenetics, fault tolerance, Firefox, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, longitudinal study, Mars Rover, natural language processing, openstreetmap, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social graph, SPARQL, speech recognition, statistical model, supply-chain management, text mining, Vernor Vinge, web application

The fragment shown next uses the XML entity ons with the value http://spreadsheet.google.com/plwwufp30hfq0udnEmRD1aQ/onto# essentially as an alias to make the XML more readable (&ons;measurement179 is expanded to the full URL with “measurement179” appended):

<ons:Measurement RDF:about="&ons;measurement179">
  <ons:solubility>0.44244235315106</ons:solubility>
  <ons:solvent RDF:resource="&ons;solvent8"/>
  <ons:solute RDF:resource="&ons;solute26"/>
  <ons:experiment RDF:resource="&ons;experiment2"/>
</ons:Measurement>

These statements, or triples, can then be read or analyzed by any RDF engine and query systems such as SPARQL. By using appropriate namespaces, especially where they are agreed and shared, it is possible to generate datafiles that are essentially self-describing. A parser has been developed (http://github.com/egonw/onssolubility/tree/) to generate the full RDF document, available at http://github.com/egonw/onssolubility/tree/master/ons.solubility.RDF/ons.RDF. The Chemistry Development Kit (CDK; see http://cdk.sourceforge.net/) is used to derive molecular properties from the SMILES, including the InChI.