SPARQL

11 results


pages: 511 words: 111,423

Learning SPARQL by Bob DuCharme

business logic, Donald Knuth, en.wikipedia.org, G4S, hypertext link, linked data, machine readable, place-making, semantic web, SPARQL, web application

SPARQL: The SPARQL Protocol and RDF Query Language, a set of W3C standards for querying and updating data conforming to the RDF model.
SPARQL endpoint: An endpoint is a resource that a process can contact and use as a service; a SPARQL endpoint accepts SPARQL queries and returns the results using the SPARQL Protocol for RDF. One SPARQL service can provide multiple endpoints, each identified by its own URL.
SPARQL engine: See SPARQL processor.
SPARQL processor: (Or SPARQL engine) a program that applies a SPARQL query against a dataset and returns the result. This can be a local or remote program.
SPARQL protocol: The specification for how a program should pass SPARQL queries and updates to a SPARQL query processing service and how that service should return the results.

., Adding Data to a Dataset integer datatype, Datatypes and Queries IRI, Glossary IRI(), Node Type Conversion Functions, Solution isBlank(), Node Type and Datatype Checking Functions isIRI(), Node Type and Datatype Checking Functions isLiteral(), Node Type and Datatype Checking Functions isNumeric(), Node Type and Datatype Checking Functions isURI(), FILTERing Data Based on Conditions, Node Type and Datatype Checking Functions J Java, SPARQL and Web Application Development JavaScript, SPARQL Query Results JSON Format, SPARQL and Web Application Development Jena, Defining Rules with SPARQL, Getting Started with Fuseki, Getting Started with Fuseki, Standalone Processors join (SPARQL equivalent), Searching Further in the Data JSON, The SPARQL Specifications, SPARQL and Web Application Development ARQ and, Working with SPARQL Query Result Formats, Standalone Processors defined, SPARQL Query Results JSON Format query results, SPARQL Query Results JSON Format results from a SPARQL engine, SPARQL Query Results JSON Format K Knuth, Donald, Datatypes and Queries L lang(), Checking, Adding, and Removing Spoken Language Tags langMatches() vs., Checking, Adding, and Removing Spoken Language Tags langMatches(), Checking, Adding, and Removing Spoken Language Tags language codes, Making RDF More Readable with Language Tags and Labels, Checking, Adding, and Removing Spoken Language Tags–Checking, Adding, and Removing Spoken Language Tags adding, Checking, Adding, and Removing Spoken Language Tags checking, Checking, Adding, and Removing Spoken Language Tags filtering on, Using the Labels Provided by DBpedia removing, Checking, Adding, and Removing Spoken Language Tags LCASE(), String Functions, Discussion LIMIT, Retrieving a Specific Number of Results, Federated Queries: Searching Multiple Datasets with One Query Linked Data, What Exactly Is the “Semantic Web”?

, Storing RDF in Databases, Querying a Remote SPARQL Service, Deleting and Replacing Triples in Named Graphs (see also SQL) join (SPARQL equivalent), Searching Further in the Data normalization and, Creating New Data outer join (SPARQL equivalent), Data That Might Not Be There row ID values and, More Realistic Data and Matching on Multiple Triples, URLs, URIs, IRIs, and Namespaces SPARQL middleware and, Middleware SPARQL Support SPARQL rules and, Using Existing SPARQL Rules Vocabularies remote SPARQL service, querying, Querying a Remote SPARQL Service–Querying a Remote SPARQL Service Resource Description Framework (see RDF) REST, SPARQL and HTTP restriction classes, SPARQL and OWL Inferencing round(), Numeric Functions Ruby, SPARQL and Web Application Development rules, SPARQL (see SPARQL rules) S sameTerm(), Node Type and Datatype Checking Functions sample code, Using Code Examples schema, What Exactly Is the “Semantic Web”?


pages: 315 words: 70,044

Learning SPARQL by Bob DuCharme

database schema, Donald Knuth, en.wikipedia.org, G4S, linked data, machine readable, semantic web, SPARQL, web application

See also: Turtle, RDF/XML, N3
simple literal: See literal.
SPARQL: The SPARQL Protocol and RDF Query Language, a set of W3C standards for querying and updating data conforming to the RDF model.
SPARQL endpoint: An endpoint is a resource that a process can contact and use as a service; a SPARQL endpoint accepts SPARQL queries and returns the results using the SPARQL Protocol for RDF. One SPARQL service can provide multiple endpoints, each identified by its own URL.
SPARQL engine: See SPARQL processor.
SPARQL processor: (Or SPARQL engine) a program that applies a SPARQL query against a dataset and returns the result.

, Reusing and Creating Vocabularies: RDF Schema and OWL semicolon, Storing RDF in Files, More Readable Query Results, Converting Data, Named Graphs CONSTRUCT queries and, Converting Data in N3 and Turtle, Storing RDF in Files serialization, Storing RDF in Files, Glossary SERVICE, Querying a Remote SPARQL Service simple literal, Glossary SKOS, Making RDF More Readable with Language Tags and Labels, Datatypes and Queries, Checking, Adding, and Removing Spoken Language Tags creating, Checking, Adding, and Removing Spoken Language Tags custom datatypes and, Datatypes and Queries SKOS-XL, Changing Existing Data SNORQL, Querying a Public Data Source sorting data, Sorting Data space before SPARQL punctuation, The Data to Query SPARQL, Jumping Right In: Some Data and Some Queries, Jumping Right In: Some Data and Some Queries, The Data to Query, Querying the Data, Querying the Data, Querying the Data, Storing RDF in Databases, The SPARQL Specifications, The SPARQL Specifications, The SPARQL Specifications, Updating Data with SPARQL, Named Graphs, Glossary comments, The Data to Query engine, Querying the Data Graph Store HTTP Protocol specification, Named Graphs processor, Querying the Data protocol, Jumping Right In: Some Data and Some Queries, The SPARQL Specifications query language, The SPARQL Specifications SPARQL 1.1, Updating Data with SPARQL specifications, The SPARQL Specifications triplestores and, Storing RDF in Databases uppercase keywords, Querying the Data SPARQL endpoint, Querying a Public Data Source, SPARQL and Web Application Development, Triplestore SPARQL Support, Glossary creating your own, Triplestore SPARQL Support SPARQL processor, Glossary SPARQL protocol, Glossary SPARQL Query Results XML Format, The SPARQL Specifications, SPARQL Query Results XML Format, Standalone Processors as ARQ output, Standalone Processors SPARQL rules, Defining Rules with SPARQL, Defining Rules with SPARQL SPIN, Using Existing SPARQL Rules Vocabularies spreadsheets, Checking, Adding, and Removing Spoken Language Tags SQL, Querying the Data, Glossary square braces, Blank Nodes and Why They’re Useful, Using Existing SPARQL Rules Vocabularies str(), Node Type Conversion Functions STRDT(), Datatype Conversion STRENDS(), String Functions string datatype, Datatypes and Queries, Representing Strings striping, Storing RDF in Files, Glossary STRLANG(), Checking, Adding, and Removing Spoken Language Tags STRLEN(), String Functions STRSTARTS(), String Functions subject (of triple), The Data to Query, URLs, URIs, IRIs, and Namespaces, The Resource Description Format (RDF), Glossary namespaces and, URLs, URIs, IRIs, and Namespaces subqueries, Queries in Your Queries, Combining Values and Assigning Values to Variables, Federated Queries: Searching Multiple Datasets with One Query SUBSTR(), Creating New Data, String Functions subtraction, Comparing Values and Doing Arithmetic SUM(), Finding the Smallest, the Biggest, the Count, the Average...



Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data by Leslie Sikos

AGPL, Amazon Web Services, bioinformatics, business process, cloud computing, create, read, update, delete, Debian, en.wikipedia.org, fault tolerance, Firefox, Google Chrome, Google Earth, information retrieval, Infrastructure as a Service, Internet of things, linked data, machine readable, machine translation, natural language processing, openstreetmap, optical character recognition, platform as a service, search engine result page, semantic web, Silicon Valley, social graph, software as a service, SPARQL, text mining, Watson beat the top human players on Jeopardy!, web application, Wikidata, wikimedia commons, Wikivoyage

SPARQL 1.0 and SPARQL 1.1

The first version of SPARQL, SPARQL 1.0, was released in 2008 [2]. SPARQL 1.0 introduced the SPARQL grammar, the SPARQL query syntax, the RDF term constraints, the graph patterns, the solution sequences and solution modifiers, and the four core query types (SELECT, CONSTRUCT, ASK, and DESCRIBE). SPARQL 1.0 has been significantly extended with new features in SPARQL 1.1 [3]. For example, SPARQL 1.1 supports aggregation. To perform aggregation, you first segregate the results into groups based on the expression(s) in the GROUP BY clause, and then evaluate the projections and aggregate functions in the SELECT clause to get one result per group.
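As an illustration, here is a minimal SPARQL 1.1 aggregation query of our own (the foaf prefix is the standard FOAF namespace, but the data it assumes is hypothetical) that produces one acquaintance count per person:

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person (COUNT(?friend) AS ?friendCount)
    WHERE {
      ?person foaf:knows ?friend .
    }
    GROUP BY ?person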

y } => { ?y foaf:knows ?x } }
OVER { :LeslieSikos foaf:knows ?person . }
WHERE {
  { SERVICE <http://examplegraph1.com/sparql> { :LeslieSikos foaf:knows ?person . } }
  UNION
  { SERVICE <http://examplegraph2.com/sparql> { :LeslieSikos foaf:knows ?person . } }
}

URL Encoding of SPARQL Queries

To give automated processes the option of making SPARQL queries, SPARQL can be used over HTTP via the SPARQL Protocol (the P in the SPARQL acronym). SPARQL endpoints can accept a SPARQL query as a parameter of an HTTP GET or POST request. The query is URL-encoded to escape special characters, and the encoded text becomes the query string passed as the value of the query parameter.
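As a concrete sketch (using the public DBpedia endpoint from the list below; encoders differ slightly in which characters they escape), the query SELECT * WHERE { ?s ?p ?o } LIMIT 10 could be sent as:

    GET http://dbpedia.org/sparql?query=SELECT%20*%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D%20LIMIT%2010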

Popular Public SPARQL Endpoints

Datahub/CKAN: http://semantic.ckan.net/sparql
DBpedia: http://dbpedia.org/sparql/
GeoNames: http://geosparql.org/
Linked Open Commerce: http://linkedopencommerce.com/sparql/
Linked Open Data Cloud: http://lod.openlinksw.com/sparql
LinkedGeoData: http://linkedgeodata.org/sparql
Sindice: http://sparql.sindice.com/
URIBurner: http://uriburner.com/sparql

Setting Up Your Own SPARQL Endpoint

If you publish your LOD dataset on your own server, you might want to set up a dedicated SPARQL endpoint to provide easy access to it. A number of free, open source, and commercial products are available; not all of them offer full SPARQL 1.1 support, but most offer complete SPARQL 1.0 support.


RDF Database Systems: Triples Storage and SPARQL Query Processing by Olivier Curé, Guillaume Blin

Amazon Web Services, bioinformatics, business intelligence, cloud computing, database schema, fault tolerance, folksonomy, full text search, functional programming, information retrieval, Internet Archive, Internet of things, linked data, machine readable, NP-complete, peer-to-peer, performance metric, power law, random walk, recommendation engine, RFID, semantic web, Silicon Valley, social intelligence, software as a service, SPARQL, sparse data, web application

While the FROM keyword allows us to retrieve data (from local or distant graphs) and apply a query to it, SPARQL’s SERVICE keyword provides a way to remotely execute a query on a SPARQL endpoint. A SPARQL endpoint is a web service allowing the execution of SPARQL queries (e.g., DBpedia, a central dataset of Linked Open Data). Using SERVICE, the query will be sent to the SPARQL endpoint, which will execute it and return the result. The following code illustrates the SERVICE and FROM keywords. So far, the results of SPARQL queries have taken the form of a set of tuples, as in SQL, shaped by the SELECT template expression. In fact, SPARQL allows for other representations when used with the ASK, CONSTRUCT, and DESCRIBE keywords.
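The listing itself is not reproduced in this excerpt; here is a hedged sketch in the same spirit (the graph URL, endpoint URL, and prefix are illustrative), combining local data loaded via FROM with remote matches fetched via SERVICE:

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person ?name
    FROM <http://example.org/people.ttl>
    WHERE {
      ?person foaf:knows ?friend .
      SERVICE <http://dbpedia.org/sparql> {
        ?person foaf:name ?name .
      }
    }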

Accessors are defined in order to retrieve the subject, predicate, and object of any triple. ARQ is a SPARQL query engine for Jena that allows us to query and update RDF models through the SPARQL standards. ARQ can be used as a standalone application or as a library integrated into a bigger application. ARQ covers both the SPARQL and SPARQL Update standards. ARQ can produce query results in several formats (XML, JSON, and comma-separated output). ARQ can read and query both local and remote RDF data and SPARQL endpoints. ARQ supports custom filter functions, aggregation, GROUP BY, assignment, and federated queries. Fuseki is a SPARQL server built on ARQ that can serve RDF data and answer SPARQL queries over HTTP.
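As a small illustration of the Update side, here is a minimal SPARQL Update request of the kind ARQ can process (the prefix and triple are invented for the example):

    PREFIX ex: <http://example.org/>
    INSERT DATA {
      ex:book1 ex:title "RDF Database Systems" .
    }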

This is mainly due to the large join order search space associated with SPARQL queries. That is, the triple pattern matching approach of the SPARQL query language implies that practical queries involve many joins—for example, in the scientific domain, some queries can contain more than 50 joins. Therefore, the ordering of these joins has a major impact on the performance of query processing. These SPARQL queries generally follow two main patterns: star and path (a.k.a. chain) queries (Figure 6.1 shows a star query in (a) and a path, or chain, query in (b)). With a star query pattern, the BGP of a SPARQL query can be represented as a graph where a large number of relations are directly connected to a central node.
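To make the two shapes concrete, here are minimal sketches of our own (the empty prefix and all predicates are invented). A star query joins several triple patterns on one shared subject; a chain query links the object of each pattern to the subject of the next:

    PREFIX : <http://example.org/>

    # Star: every pattern shares the central node ?drug
    SELECT ?name ?formula ?target WHERE {
      ?drug :name    ?name .
      ?drug :formula ?formula .
      ?drug :target  ?target .
    }

    # Chain (path): the object of each pattern feeds the next
    SELECT ?city WHERE {
      ?person  :worksFor  ?company .
      ?company :locatedIn ?city .
    }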


pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, billion-dollar mistake, bitcoin, blockchain, business intelligence, business logic, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, data science, database schema, deep learning, DevOps, distributed ledger, Donald Knuth, Edward Snowden, end-to-end encryption, Ethereum, ethereum blockchain, exponential backoff, fake news, fault tolerance, finite state, Flash crash, Free Software Foundation, full text search, functional programming, general-purpose programming language, Hacker News, informal economy, information retrieval, Infrastructure as a Service, Internet of things, iterative process, John von Neumann, Ken Thompson, Kubernetes, Large Hadron Collider, level 1 cache, loose coupling, machine readable, machine translation, Marc Andreessen, microservices, natural language processing, Network effects, no silver bullet, operational security, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, SQL injection, statistical model, surveillance capitalism, systematic bias, systems thinking, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

Fortunately, you can just specify this prefix once at the top of the file, and then forget about it. The SPARQL query language SPARQL is a query language for triple-stores using the RDF data model [43]. (It is an acronym for SPARQL Protocol and RDF Query Language, pronounced “sparkle.”) It predates Cypher, and since Cypher’s pattern matching is borrowed from SPARQL, they look quite similar [37]. The same query as before—finding people who have moved from the US to Europe—is even more concise in SPARQL than it is in Cypher (see Example 2-9). Example 2-9. The same query as Example 2-4, expressed in SPARQL PREFIX : <urn:example:> SELECT ?personName WHERE { ?
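The excerpt cuts the query off; the remainder of Example 2-9 as published (reproduced here from memory, so treat it as approximate) matches people born somewhere within the US who live somewhere within Europe:

    PREFIX : <urn:example:>
    SELECT ?personName WHERE {
      ?person :name ?personName.
      ?person :bornIn  / :within* / :name "United States".
      ?person :livesIn / :within* / :name "Europe".
    }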

, Glossaryabstractions for, Consistency and Consensus formalization in consensus, Fault-Tolerant Consensus-Limitations of consensususe of replication, Single-leader replication and consensus human fault tolerance, Philosophy of batch process outputs in batch processing, Bringing related data together in the same place, Philosophy of batch process outputs, Fault tolerance, Fault tolerance in log-based systems, Applying end-to-end thinking in data systems, Timeliness and Integrity-Correctness of dataflow systems in stream processing, Fault Tolerance-Rebuilding state after a failureatomic commit, Atomic commit revisited idempotence, Idempotence maintaining derived state, Maintaining derived state microbatching and checkpointing, Microbatching and checkpointing rebuilding state after a failure, Rebuilding state after a failure of distributed transactions, XA transactions-Limitations of distributed transactions transaction atomicity, Atomicity, Atomic Commit and Two-Phase Commit (2PC)-Exactly-once message processing faults, ReliabilityByzantine faults, Byzantine Faults-Weak forms of lying failures versus, Reliability handled by transactions, Transactions handling in supercomputers and cloud computing, Cloud Computing and Supercomputing hardware, Hardware Faults in batch processing versus distributed databases, Designing for frequent faults in distributed systems, Faults and Partial Failures-Cloud Computing and Supercomputing introducing deliberately, Reliability, Network Faults in Practice network faults, Network Faults in Practice-Detecting Faultsasymmetric faults, The Truth Is Defined by the Majority detecting, Detecting Faults tolerance of, in multi-leader replication, Multi-datacenter operation software errors, Software Errors tolerating (see fault tolerance) federated databases, The meta-database of everything fence (CPU instruction), Linearizability and network delays fencing (preventing split brain), Leader failure: Failover, The leader and the lock-Fencing tokensgenerating fencing tokens, Using total order broadcast, Membership and Coordination Services properties of fencing tokens, Correctness of an algorithm stream processors writing to databases, Idempotence, Exactly-once execution of an operation Fibre Channel (networks), MapReduce and Distributed Filesystems field tags (Thrift and Protocol Buffers), Thrift and Protocol Buffers-Field tags and schema evolution file descriptors (Unix), A uniform interface financial data, Advantages of immutable events Firebase (database), API support for change streams Flink (processing framework), Dataflow engines-Discussion of materializationdataflow APIs, High-Level APIs and Languages fault tolerance, Fault tolerance, Microbatching and checkpointing, Rebuilding state after a failure Gelly API (graph processing), The Pregel processing model integration of batch and stream processing, Batch and Stream Processing, Unifying batch and stream processing machine learning, Specialization for different domains query optimizer, The move toward declarative query languages stream processing, Stream analytics flow control, Network congestion and queueing, Messaging Systems, Glossary FLP result (on consensus), Distributed Transactions and Consensus FlumeJava (dataflow library), MapReduce workflows, High-Level APIs and Languages followers, Leaders and Followers, Glossary(see also leader-based replication) foreign keys, Comparison to document databases, Reduce-Side Joins and Grouping forward compatibility, Encoding and Evolution forward decay (algorithm), 
Describing Performance Fossil (version control system), Limitations of immutabilityshunning (deleting data), Limitations of immutability FoundationDB (database)serializable transactions, Serializable Snapshot Isolation (SSI), Performance of serializable snapshot isolation, Limitations of distributed transactions fractal trees, B-tree optimizations full table scans, Reduce-Side Joins and Grouping full-text search, Glossaryand fuzzy indexes, Full-text search and fuzzy indexes building search indexes, Building search indexes Lucene storage engine, Making an LSM-tree out of SSTables functional reactive programming (FRP), Designing Applications Around Dataflow functional requirements, Summary futures (asynchronous operations), Current directions for RPC fuzzy search (see similarity search) G garbage collectionimmutability and, Limitations of immutability process pauses for, Describing Performance, Process Pauses-Limiting the impact of garbage collection, The Truth Is Defined by the Majority(see also process pauses) genome analysis, Summary, Specialization for different domains geographically distributed datacenters, Distributed Data, Reading Your Own Writes, Unreliable Networks, The limits of total ordering geospatial indexes, Multi-column indexes Giraph (graph processing), The Pregel processing model Git (version control system), Custom conflict resolution logic, The causal order is not a total order, Limitations of immutability GitHub, postmortems, Leader failure: Failover, Leader failure: Failover, Mapping system models to the real world global indexes (see term-partitioned indexes) GlusterFS (distributed filesystem), MapReduce and Distributed Filesystems GNU Coreutils (Linux), Sorting versus in-memory aggregation GoldenGate (change data capture), Trigger-based replication, Multi-datacenter operation, Implementing change data capture(see also Oracle) GoogleBigtable (database)data model (see Bigtable data model) partitioning scheme, Partitioning, Partitioning by Key Range storage layout, Making an LSM-tree out of SSTables Chubby (lock service), Membership and Coordination Services Cloud Dataflow (stream processor), Stream analytics, Atomic commit revisited, Unifying batch and stream processing(see also Beam) Cloud Pub/Sub (messaging), Message brokers compared to databases, Using logs for message storage Docs (collaborative editor), Collaborative editing Dremel (query engine), The divergence between OLTP databases and data warehouses, Column-Oriented Storage FlumeJava (dataflow library), MapReduce workflows, High-Level APIs and Languages GFS (distributed file system), MapReduce and Distributed Filesystems gRPC (RPC framework), Current directions for RPC MapReduce (batch processing), Batch Processing(see also MapReduce) building search indexes, Building search indexes task preemption, Designing for frequent faults Pregel (graph processing), The Pregel processing model Spanner (see Spanner) TrueTime (clock API), Clock readings have a confidence interval gossip protocol, Request Routing government use of data, Data as assets and power GPS (Global Positioning System)use for clock synchronization, Unreliable Clocks, Clock Synchronization and Accuracy, Clock readings have a confidence interval, Synchronized clocks for global snapshots GraphChi (graph processing), Parallel execution graphs, Glossaryas data models, Graph-Like Data Models-The Foundation: Datalogexample of graph-structured data, Graph-Like Data Models property graphs, Property Graphs RDF and triple-stores, Triple-Stores and SPARQL-The 
SPARQL query language versus the network model, The SPARQL query language processing and analysis, Graphs and Iterative Processing-Parallel executionfault tolerance, Fault tolerance Pregel processing model, The Pregel processing model query languagesCypher, The Cypher Query Language Datalog, The Foundation: Datalog-The Foundation: Datalog recursive SQL queries, Graph Queries in SQL SPARQL, The SPARQL query language-The SPARQL query language Gremlin (graph query language), Graph-Like Data Models grep (Unix tool), Simple Log Analysis GROUP BY clause (SQL), GROUP BY grouping records in MapReduce, GROUP BYhandling skew, Handling skew H Hadoop (data infrastructure)comparison to distributed databases, Batch Processing comparison to MPP databases, Comparing Hadoop to Distributed Databases-Designing for frequent faults comparison to Unix, Philosophy of batch process outputs-Philosophy of batch process outputs, Unbundling Databases diverse processing models in ecosystem, Diversity of processing models HDFS distributed filesystem (see HDFS) higher-level tools, MapReduce workflows join algorithms, Reduce-Side Joins and Grouping-MapReduce workflows with map-side joins(see also MapReduce) MapReduce (see MapReduce) YARN (see YARN) happens-before relationship, Ordering and Causalitycapturing, Capturing the happens-before relationship concurrency and, The “happens-before” relationship and concurrency hard disksaccess patterns, Advantages of LSM-trees detecting corruption, The end-to-end argument, Don’t just blindly trust what they promise faults in, Hardware Faults, Durability sequential write throughput, Hash Indexes, Disk space usage hardware faults, Hardware Faults hash indexes, Hash Indexes-Hash Indexesbroadcast hash joins, Broadcast hash joins partitioned hash joins, Partitioned hash joins hash partitioning, Partitioning by Hash of Key-Partitioning by Hash of Key, Summaryconsistent hashing, Partitioning by Hash of Key problems with hash mod N, How not to do it: hash mod N range queries, Partitioning by Hash of Key suitable hash functions, Partitioning by Hash of Key with fixed number of partitions, Fixed number of partitions HAWQ (database), Specialization for different domains HBase (database)bug due to lack of fencing, The leader and the lock bulk loading, Key-value stores as batch process output column-family data model, Data locality for queries, Column Compression dynamic partitioning, Dynamic partitioning key-range partitioning, Partitioning by Key Range log-structured storage, Making an LSM-tree out of SSTables request routing, Request Routing size-tiered compaction, Performance optimizations use of HDFS, Diversity of processing models use of ZooKeeper, Membership and Coordination Services HDFS (Hadoop Distributed File System), MapReduce and Distributed Filesystems-MapReduce and Distributed Filesystems(see also distributed filesystems) checking data integrity, Don’t just blindly trust what they promise decoupling from query engines, Diversity of processing models indiscriminately dumping data into, Diversity of storage metadata about datasets, MapReduce workflows with map-side joins NameNode, MapReduce and Distributed Filesystems use by Flink, Rebuilding state after a failure use by HBase, Dynamic partitioning use by MapReduce, MapReduce workflows HdrHistogram (numerical library), Describing Performance head (Unix tool), Simple Log Analysis head vertex (property graphs), Property Graphs head-of-line blocking, Describing Performance heap files (databases), Storing values within the index Helix 
(cluster manager), Request Routing heterogeneous distributed transactions, Distributed Transactions in Practice, Limitations of distributed transactions heuristic decisions (in 2PC), Recovering from coordinator failure Hibernate (object-relational mapper), The Object-Relational Mismatch hierarchical model, Are Document Databases Repeating History?

The following two expressions are equivalent (variables start with a question mark in SPARQL):

    (person) -[:BORN_IN]-> () -[:WITHIN*0..]-> (location)   # Cypher

    ?person :bornIn / :within* ?location.                    # SPARQL

Because RDF doesn’t distinguish between properties and edges but just uses predicates for both, you can use the same syntax for matching properties. In the following expression, the variable usa is bound to any vertex that has a name property whose value is the string "United States":

    (usa {name:'United States'})   # Cypher

    ?usa :name "United States".    # SPARQL

SPARQL is a nice query language—even if the semantic web never happens, it can be a powerful tool for applications to use internally.


Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, billion-dollar mistake, bitcoin, blockchain, business intelligence, business logic, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, data science, database schema, deep learning, DevOps, distributed ledger, Donald Knuth, Edward Snowden, end-to-end encryption, Ethereum, ethereum blockchain, exponential backoff, fake news, fault tolerance, finite state, Flash crash, Free Software Foundation, full text search, functional programming, general-purpose programming language, Hacker News, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, Ken Thompson, Kubernetes, Large Hadron Collider, level 1 cache, loose coupling, machine readable, machine translation, Marc Andreessen, microservices, natural language processing, Network effects, no silver bullet, operational security, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, SQL injection, statistical model, surveillance capitalism, systematic bias, systems thinking, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

Fortunately, you can just specify this prefix once at the top of the file, and then forget about it. The SPARQL query language SPARQL is a query language for triple-stores using the RDF data model [43]. (It is an acronym for SPARQL Protocol and RDF Query Language, pronounced “sparkle.”) It predates Cypher, and since Cypher’s pattern matching is borrowed from SPARQL, they look quite similar [37]. The same query as before—finding people who have moved from the US to Europe—is even more concise in SPARQL than it is in Cypher (see Example 2-9). Example 2-9. The same query as Example 2-4, expressed in SPARQL PREFIX : <urn:example:> SELECT ?personName WHERE { ?

The following two expressions are equivalent (variables start with a question mark in SPARQL):

    (person) -[:BORN_IN]-> () -[:WITHIN*0..]-> (location)   # Cypher

    ?person :bornIn / :within* ?location.                    # SPARQL

Because RDF doesn’t distinguish between properties and edges but just uses predicates for both, you can use the same syntax for matching properties. In the following expression, the variable usa is bound to any vertex that has a name property whose value is the string "United States":

    (usa {name:'United States'})   # Cypher

    ?usa :name "United States".    # SPARQL

SPARQL is a nice query language—even if the semantic web never happens, it can be a powerful tool for applications to use internally.

• In CODASYL, all queries were imperative, difficult to write and easily broken by changes in the schema. In a graph database, you can write your traversal in imperative code if you want to, but most graph databases also support high-level, declarative query languages such as Cypher or SPARQL. The Foundation: Datalog Datalog is a much older language than SPARQL or Cypher, having been studied extensively by academics in the 1980s [44, 45, 46]. It is less well known among software engineers, but it is nevertheless important, because it provides the foundation that later query languages build upon. In practice, Datalog is used in a few data systems: for example, it is the query language of Datomic [40], and Cascalog [47] is a Datalog implementation for querying large datasets in Hadoop.


pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem

Amazon Web Services, anti-pattern, bioinformatics, business logic, commoditize, corporate governance, create, read, update, delete, data acquisition, en.wikipedia.org, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, the strength of weak ties, web application

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://www.example.org/ter
  <rdf:Description rdf:about="http://www.example.org/ginger">
    <name>Ginger Rogers</name>
    <occupation>dancer</occupation>
    <partner rdf:resource="http://www.example.org/fred"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.org/fred">
    <name>Fred Astaire</name>
    <occupation>dancer</occupation>
    <likes rdf:resource="http://www.example.org/ice-cream"/>
  </rdf:Description>
</rdf:RDF>

10. http://www.w3.org/standards/semanticweb/
11. See http://www.w3.org/TR/rdf-sparql-query/ and http://www.w3.org/RDF/

W3C support

That they produce logical representations of triples doesn’t mean triple stores necessarily have triple-like internal implementations. Most triple stores, however, are unified by their support for Semantic Web technology such as RDF and SPARQL. While there’s nothing particularly special about RDF as a means of serializing linked data, it is endorsed by the W3C and therefore benefits from being widely understood and well documented.

Once you understand Cypher, it becomes very easy to branch out and learn other graph query languages.1 In the following sections we’ll take a brief tour through Cypher. This isn’t a reference document for Cypher, however—merely a friendly introduction so that we can explore more interesting graph query scenarios later on.2

Other query languages

Other graph databases have other means of querying data. Many, including Neo4j, support the RDF query language SPARQL and the imperative, path-based query language Gremlin.3 Our interest, however, is in the expressive power of a property graph combined with a declarative query language, and so in this book we focus almost exclusively on Cypher.

Cypher Philosophy

Cypher is designed to be easily read and understood by developers, database professionals, and business stakeholders.

Most of the examples will work with versions 1.8 and 1.9 of Neo4j. Where a particular language feature requires the latest version, we’ll point it out.

2. For reference documentation see http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html and http://www.neo4j.org/resources/cypher.
3. See http://www.w3.org/TR/rdf-sparql-query/ and https://github.com/tinkerpop/gremlin/wiki/

Cypher enables a user (or an application acting on behalf of a user) to ask the database to find data that matches a specific pattern. Colloquially, we ask the database to “find things like this”.


pages: 377 words: 110,427

The Boy Who Could Change the World: The Writings of Aaron Swartz by Aaron Swartz, Lawrence Lessig

Aaron Swartz, affirmative action, Alfred Russel Wallace, American Legislative Exchange Council, Benjamin Mako Hill, bitcoin, Bonfire of the Vanities, Brewster Kahle, Cass Sunstein, deliberate practice, do what you love, Donald Knuth, Donald Trump, failed state, fear of failure, Firefox, Free Software Foundation, full employment, functional programming, Hacker News, Howard Zinn, index card, invisible hand, Joan Didion, John Gruber, Lean Startup, low interest rates, More Guns, Less Crime, peer-to-peer, post scarcity, power law, Richard Feynman, Richard Stallman, Ronald Reagan, school vouchers, semantic web, single-payer health, SpamAssassin, SPARQL, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, the scientific method, Toyota Production System, unbiased observer, wage slave, Washington Consensus, web application, WikiLeaks, working poor, zero-sum game

And so the “Semantic Web Activity” at the Worldwide Web Consortium (W3C) has spent its time writing standard upon standard: the Extensible Markup Language (XML), the Resource Description Framework (RDF), the Web Ontology Language (OWL), tools for Gleaning Resource Descriptions from Dialects of Languages (GRDDL), the Simple Protocol And RDF Query Language (SPARQL) (as created by the RDF Data Access Working Group (DAWG)). Few have received any widespread use and those that have (XML) are uniformly scourges on the planet, offenses against hardworking programmers that have pushed out sensible formats (like JSON) in favor of overly complicated hairballs with no basis in reality (I’m not done yet!

And it’s led many who have been working on the Semantic Web, in the vain hope of actually building a world where software can communicate, to burn out and tune out and find more productive avenues for their attentions. For an example, look at Sean B. Palmer. In his influential piece, “Ditching the Semantic Web?,” he proclaims “It’s not prudent, perhaps even not moral (if that doesn’t sound too melodramatic), to work on RDF, OWL, SPARQL, RIF, the broken ideas of distributed trust, CWM, Tabulator, Dublin Core, FOAF, SIOC, and any of these kinds of things” and says not only will he “stop working on the Semantic Web” but “I will, moreover, actively dissuade anyone from working on the Semantic Web where it distracts them from working on” more practical projects.


pages: 680 words: 157,865

Beautiful Architecture: Leading Thinkers Reveal the Hidden Beauty in Software Design by Diomidis Spinellis, Georgios Gousios

Albert Einstein, barriers to entry, business intelligence, business logic, business process, call centre, continuous integration, corporate governance, database schema, Debian, domain-specific language, don't repeat yourself, Donald Knuth, duck typing, en.wikipedia.org, fail fast, fault tolerance, financial engineering, Firefox, Free Software Foundation, functional programming, general-purpose programming language, higher-order functions, iterative process, linked data, locality of reference, loose coupling, meta-analysis, MVC pattern, Neal Stephenson, no silver bullet, peer-to-peer, premature optimization, recommendation engine, Richard Stallman, Ruby on Rails, semantic web, smart cities, social graph, social web, SPARQL, Steve Jobs, Stewart Brand, Strategic Defense Initiative, systems thinking, the Cathedral and the Bazaar, traveling salesman, Turing complete, type inference, web application, zero-coupon bond

This schemaless approach is tremendously appealing to anyone who has ever modified an XML or RDBMS schema. It also represents a data model that not only survives in the face of inevitable social, procedural, and technological changes, but also embraces them. This RDF would be stored in a triplestore or other database, where it could be queried through SPARQL or a similar language. Most semantically enabled containers support storing and querying RDF in this way now. Examples include the Mulgara Semantic Store,[20] the Sesame Engine,[21] the Talis Platform,[22] and even Oracle 10g and beyond. Nodes in the graph can be selected based on pattern-matching criteria, so we could ask questions of our resources such as “Who created this URL?”
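A hedged sketch of such a query (assuming authorship is modeled with Dublin Core’s dc:creator property; the resource URL is illustrative):

    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?creator WHERE {
      <http://www.example.org/some-page> dc:creator ?creator .
    }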

To some extent, these discussions are still going on, as the technologies involved are evolving along with Akonadi, and the current approaches have yet to be validated in production use. At the time of this writing, there are agents feeding into Nepomuk and Strigi; these are separate processes that use the same API to the store as all other clients and resources. Incoming search queries are expressed in either XESAM or SPARQL, the respective query languages of Strigi and Nepomuk, which are also implemented by other search engines (such as Beagle, for example) and forwarded to them via DBUS. This forwarding happens inside the Akonadi server process. The results come in via DBUS in the form of lists of identifiers, which Akonadi can then use to present the results as actual items from the store to the user.


The Data Journalism Handbook by Jonathan Gray, Lucy Chambers, Liliana Bounegru

Amazon Web Services, barriers to entry, bioinformatics, business intelligence, carbon footprint, citizen journalism, correlation does not imply causation, crowdsourcing, data science, David Heinemeier Hansson, eurozone crisis, fail fast, Firefox, Florence Nightingale: pie chart, game design, Google Earth, Hans Rosling, high-speed rail, information asymmetry, Internet Archive, John Snow's cholera map, Julian Assange, linked data, machine readable, moral hazard, MVC pattern, New Journalism, openstreetmap, Ronald Reagan, Ruby on Rails, Silicon Valley, social graph, Solyndra, SPARQL, text mining, Wayback Machine, web application, WikiLeaks

While we are all either a journalist, designer, or developer “first,” we continue to work hard to increase our understanding and proficiency in each other’s areas of expertise. The core products for exploring data are Excel, Google Docs, and Fusion Tables. The team has also, but to a lesser extent, used MySQL, Access databases, and Solr to explore larger datasets; and used RDF and SPARQL to begin looking at ways in which we can model events using Linked Data technologies. Developers will also use their programming language of choice, whether that’s ActionScript, Python, or Perl, to match, parse, or generally pick apart a dataset we might be working on. Perl is used for some of the publishing.


Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran, Jeff Hammerbacher

23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, data science, database schema, double helix, en.wikipedia.org, epigenetics, fault tolerance, Firefox, Gregor Mendel, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, Large Hadron Collider, longitudinal study, machine readable, machine translation, Mars Rover, natural language processing, openstreetmap, Paradox of Choice, power law, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social bookmarking, social graph, SPARQL, sparse data, speech recognition, statistical model, supply-chain management, systematic bias, TED Talk, text mining, the long tail, Vernor Vinge, web application

The fragment shown next uses the XML entity ons with the value http://spreadsheet.google.com/plwwufp30hfq0udnEmRD1aQ/onto# essentially as an alias to make the XML more readable (&ons;measurement179 is expanded to the full URL with “measurement179” appended):

    <ons:Measurement rdf:about="&ons;measurement179">
      <ons:solubility>0.44244235315106</ons:solubility>
      <ons:solvent rdf:resource="&ons;solvent8"/>
      <ons:solute rdf:resource="&ons;solute26"/>
      <ons:experiment rdf:resource="&ons;experiment2"/>
    </ons:Measurement>

These statements, or triples, can then be read or analyzed by any RDF engine and query systems such as SPARQL. By using appropriate namespaces, especially where they are agreed and shared, it is possible to generate datafiles that are essentially self-describing. A parser has been developed (http://github.com/egonw/onssolubility/tree/) to generate the full RDF document, available at http://github.com/egonw/onssolubility/tree/master/ons.solubility.
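For instance, a minimal SPARQL query over these triples (our sketch, not from the source; the ons prefix binds to the ontology URL above) lists the solubility and solvent recorded for each measurement:

    PREFIX ons: <http://spreadsheet.google.com/plwwufp30hfq0udnEmRD1aQ/onto#>
    SELECT ?measurement ?solubility ?solvent
    WHERE {
      ?measurement a ons:Measurement ;
                   ons:solubility ?solubility ;
                   ons:solvent    ?solvent .
    }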