semantic web

68 results


Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data by Leslie Sikos

AGPL, Amazon Web Services, bioinformatics, business process, cloud computing, create, read, update, delete, Debian, en.wikipedia.org, fault tolerance, Firefox, Google Chrome, Google Earth, information retrieval, Infrastructure as a Service, Internet of things, linked data, natural language processing, openstreetmap, optical character recognition, platform as a service, search engine result page, semantic web, Silicon Valley, social graph, software as a service, SPARQL, text mining, Watson beat the top human players on Jeopardy!, web application, wikimedia commons

The mainstream XML-based standards for web service interoperability specify the syntax only, rather than the semantic meaning of messages. Semantic Web technologies can enhance service-oriented environments with well-defined, rich semantics. Semantic Web services leverage Semantic Web technologies to automate services and enable automatic service discovery, composition, and execution across heterogeneous users and domains.
Semantic Web Service Modeling
Web services are programs accessible programmatically over standard Internet protocols, using reusable components [1]. Web services are distributed and encapsulate discrete functionality. Semantic Web Services (SWS) make web service characteristics machine-interpretable via semantics. Semantic Web Services combine web services and Semantic Web technologies with the aim of automating service-related tasks, such as discovery and composition [2]. Semantic Web Services can address some of the limitations of conventional web services, such as purely syntactic descriptions and the need for manual inspection of web service usability, usage, and integration.
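To make the idea of automatic discovery concrete, here is a minimal sketch, not taken from the book. It assumes each service is annotated with the concept it returns and that a small subclass hierarchy is available. The service names, the `ex:` concepts, and the helper functions `is_subclass` and `discover` are all hypothetical.

```python
# Toy sketch of semantic service discovery (all names are hypothetical).
# Each service carries a machine-readable annotation of its output concept.

# A tiny concept hierarchy: child concept -> parent concept.
SUBCLASS_OF = {
    "ex:Hotel": "ex:Accommodation",
    "ex:Hostel": "ex:Accommodation",
    "ex:Accommodation": "ex:Thing",
}

SERVICES = [
    {"name": "HotelFinder", "input": "ex:City", "output": "ex:Hotel"},
    {"name": "HostelFinder", "input": "ex:City", "output": "ex:Hostel"},
    {"name": "WeatherReport", "input": "ex:City", "output": "ex:Forecast"},
]


def is_subclass(concept, ancestor):
    """Return True if concept equals ancestor or is a descendant of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)  # None once the top is reached
    return False


def discover(wanted_output, services):
    """Return names of services whose output concept satisfies the request."""
    return [s["name"] for s in services if is_subclass(s["output"], wanted_output)]


matches = discover("ex:Accommodation", SERVICES)
print(matches)
```

A purely syntactic match on the string "Accommodation" would find no service here, because no annotation contains that string. The hierarchy is what lets a program, rather than a person inspecting the descriptions, conclude that both finder services qualify.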

Web Semantics: Science, Services and Agents on the World Wide Web 2008, 6(1): 21–28. 11. Celma, Ò., Raimond, Y. ZemPod: A Semantic Web approach to podcasting. Web Semantics: Science, Services and Agents on the World Wide Web 2008, 6(2): 162–169. 12. Saleem, M., Kamdar, M. R., Iqbal, A., Sampath, S., Deus, H. F., Ngomo, A.-C. Big linked cancer data: Integrating linked TCGA and PubMed. Web Semantics: Science, Services and Agents on the World Wide Web 2014, http://dx.doi.org/10.1016/j.websem.2014.07.004. 13. Oinonen, K. (2005) On the road to business application of Semantic Web technology. Semantic Web in Business—How to proceed. In: Industrial Applications of Semantic Web: Proceedings of the 1st IFIP WG12.5 Working Conference on Industrial Applications of Semantic Web. International Federation for Information Processing. Springer Science+Business Media, Inc., New York.

Summary
In this chapter, you learned what Semantic Web Services are and how they can be described using WSDL, annotated using SAWSDL, and modeled using OWL-S, WSMO, MicroWSMO and WSMO-Lite, and WSML. You became familiar with Semantic Web Service software, such as the WSMX and IRS-III execution environments, as well as the WSMT Toolkit and the SADI Protégé plug-in. You learned about the UDDI service listing used to dynamically look up and discover services provided by external business partners and service providers. The next chapter will show you how to store triples and quads efficiently in purpose-built graph databases: triplestores and quadstores.
References
1. Domingue, J., Martin, D. (2008) Introduction to the Semantic Web Tutorial. The 7th International Semantic Web Conference, 26–30 October, 2008, Karlsruhe, Germany. 2.


RDF Database Systems: Triples Storage and SPARQL Query Processing by Olivier Cure, Guillaume Blin

Amazon Web Services, bioinformatics, business intelligence, cloud computing, database schema, fault tolerance, full text search, information retrieval, Internet Archive, Internet of things, linked data, NP-complete, peer-to-peer, performance metric, random walk, recommendation engine, RFID, semantic web, Silicon Valley, social intelligence, software as a service, SPARQL, web application

The ICS-forth RDF suite: managing voluminous RDF description bases. In: 2nd International Workshop on the Semantic Web, Hong Kong, pp. 1–13. Allemang, D., Hendler, J.A., 2011. Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL, 2nd Edition. Morgan Kaufmann, San Francisco. Angles, R., Gutierrez, C., 2005. Querying RDF data from a graph database perspective. In: Proceedings of the Second European Semantic Web Conference, pp. 346–360. Angles, R., Gutiérrez, C., 2008. Survey of graph database models. ACM Computing Surveys 40 (1), 1–39. Aranda, C.B., Hogan, A., Umbrich, J., Vandenbussche, P.-Y., 2013. SPARQL web-querying infrastructure: Ready for action? International Semantic Web Conference 2, 277–293. Arenas, M., Gutierrez, C., Perez, J., 2009. Foundations of RDF databases. In: Reasoning Web, 158–204.

, 2009. Efficient linked-list RDF indexing in Parliament. In: Proceedings of the 5th International Workshop on Scalable Semantic Web Knowledge Base Systems. IEEE Computer Society, Washington, DC, pp. 17–32. Krötzsch, M., 2011. Efficient rule-based inferencing for OWL EL. IJCAI, 2668–2673. Ladwig, G., Harth, A., 2011. Cumulus RDF: Linked data management on nested key-value stores. In: Proceedings of the 7th International Workshop on Scalable Semantic Web Knowledge Base Systems at the 10th International Semantic Web Conference (ISWC2011), Springer Berlin Heidelberg, pp. 30–42. Ladwig, G., Tran, T., 2010. Linked data query processing strategies. In: International Semantic Web Conference, vol. 1, pp. 453–469. Larsson, N.J., Moffat, A., 1999. Offline dictionary-based compression. In: Data Compression Conference, DCC 1999, Snowbird, Utah, USA, March 29–31, 1999, pp. 296–305.

QueryPIE: backward reasoning for OWL Horst over very large knowledge bases. In: International Semantic Web Conference, vol. 1, pp. 730–745. Vogels, W., 2009. Eventually consistent. Communications of the ACM 52 (1), 40–44. W3C, 2004a. RDF vocabulary description language 1.0: RDF schema. http://www.w3.org/TR/rdf-schema/. W3C, 2004b. RDF/XML syntax specification (revised). http://www.w3.org/TR/rdf-syntax-grammar/. Weaver, J., Hendler, J.A., 2009. Parallel materialization of the finite RDFS closure for hundreds of millions of triples. In: International Semantic Web Conference, Chantilly, VA, USA, pp. 682–697. Weiss, C., Bernstein, A., 2009. On-disk storage techniques for semantic web data: Are B-trees always the optimal solution? In: Proceedings of the 5th International Workshop on Scalable Semantic Web Knowledge Base Systems. IEEE Computer Society, Washington DC, pp. 49–64.


pages: 315 words: 70,044

Learning SPARQL by Bob Ducharme

database schema, Donald Knuth, en.wikipedia.org, G4S, linked data, semantic web, SPARQL, web application

Later chapters describe how to create more complex queries, how to modify data, how to build applications around your queries, and how it all fits into the semantic web, but if you can execute the queries shown in this chapter, you’re ready to put SPARQL to work for you. Chapter 2. The Semantic Web, RDF, and Linked Data (and SPARQL) SPARQL is a query language for data that follows a particular model, but the semantic web isn’t about the query language or about the model—it’s about the data. The booming amount of data becoming available on the semantic web is making great new kinds of applications possible, and as a well-implemented, mature standard designed with the semantic web in mind, SPARQL is the best way to get that data and put it to work in your applications. What Exactly Is the “Semantic Web”? As excitement over the semantic web grows, some vendors use the phrase to sell products with strong connections to the ideas behind the semantic web, and others use it to sell products with weaker connections.

, Querying the Data, More Realistic Data and Matching on Multiple Triples, URLs, URIs, IRIs, and Namespaces, Storing RDF in Databases, Data That Might Not Be There, Searching Further in the Data, Querying a Remote SPARQL Service, Creating New Data, Using Existing SPARQL Rules Vocabularies, Deleting and Replacing Triples in Named Graphs, Middleware SPARQL Support join (SPARQL equivalent), Searching Further in the Data normalization and, Creating New Data outer join (SPARQL equivalent), Data That Might Not Be There row ID values and, More Realistic Data and Matching on Multiple Triples, URLs, URIs, IRIs, and Namespaces SPARQL middleware and, Middleware SPARQL Support SPARQL rules and, Using Existing SPARQL Rules Vocabularies SQL habits, Querying the Data remote SPARQL service, querying, Querying a Remote SPARQL Service, Querying a Remote SPARQL Service Resource Description Format, The Data to Query (see RDF) round(), Numeric Functions S sample code, Using Code Examples schema, What Exactly Is the “Semantic Web”?, Glossary Schemarama, Using Existing SPARQL Rules Vocabularies screen scraping, What Exactly Is the “Semantic Web”?, Storing RDF in Files, Glossary searching for string, Searching for Strings SELECT, Querying the Data, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT semantic web, What Exactly Is the “Semantic Web”? 
semantics, What Exactly Is the “Semantic Web”?, Reusing and Creating Vocabularies: RDF Schema and OWL semicolon, Storing RDF in Files, More Readable Query Results, Converting Data, Named Graphs CONSTRUCT queries and, Converting Data in N3 and Turtle, Storing RDF in Files serialization, Storing RDF in Files, Glossary SERVICE, Querying a Remote SPARQL Service simple literal, Glossary SKOS, Making RDF More Readable with Language Tags and Labels, Datatypes and Queries, Checking, Adding, and Removing Spoken Language Tags creating, Checking, Adding, and Removing Spoken Language Tags custom datatypes and, Datatypes and Queries SKOS-XL, Changing Existing Data SNORQL, Querying a Public Data Source sorting data, Sorting Data space before SPARQL punctuation, The Data to Query SPARQL, Jumping Right In: Some Data and Some Queries, Jumping Right In: Some Data and Some Queries, The Data to Query, Querying the Data, Querying the Data, Querying the Data, Storing RDF in Databases, The SPARQL Specifications, The SPARQL Specifications, The SPARQL Specifications, Updating Data with SPARQL, Named Graphs, Glossary comments, The Data to Query engine, Querying the Data Graph Store HTTP Protocol specification, Named Graphs processor, Querying the Data protocol, Jumping Right In: Some Data and Some Queries, The SPARQL Specifications query language, The SPARQL Specifications SPARQL 1.1, Updating Data with SPARQL specifications, The SPARQL Specifications triplestores and, Storing RDF in Databases uppercase keywords, Querying the Data SPARQL endpoint, Querying a Public Data Source, SPARQL and Web Application Development, Triplestore SPARQL Support, Glossary creating your own, Triplestore SPARQL Support SPARQL processor, Glossary SPARQL protocol, Glossary SPARQL Query Results XML Format, The SPARQL Specifications, SPARQL Query Results XML Format, Standalone Processors as ARQ output, Standalone Processors SPARQL rules, Defining Rules with SPARQL, Defining Rules with SPARQL SPIN, Using Existing 
SPARQL Rules Vocabularies spreadsheets, Checking, Adding, and Removing Spoken Language Tags SQL, Querying the Data, Glossary square braces, Blank Nodes and Why They’re Useful, Using Existing SPARQL Rules Vocabularies str(), Node Type Conversion Functions STRDT(), Datatype Conversion STRENDS(), String Functions string datatype, Datatypes and Queries, Representing Strings striping, Storing RDF in Files, Glossary STRLANG(), Checking, Adding, and Removing Spoken Language Tags STRLEN(), String Functions STRSTARTS(), String Functions subject (of triple), The Data to Query, URLs, URIs, IRIs, and Namespaces, The Resource Description Format (RDF), Glossary namespaces and, URLs, URIs, IRIs, and Namespaces subqueries, Queries in Your Queries, Combining Values and Assigning Values to Variables, Federated Queries: Searching Multiple Datasets with One Query SUBSTR(), Creating New Data, String Functions subtraction, Comparing Values and Doing Arithmetic SUM(), Finding the Smallest, the Biggest, the Count, the Average...

By adding just a little bit of metadata (for example, the information about the ab:spouse, ab:patient, and ab:doctor properties above) to a small set of data (the information about Richard, Craig, and Cindy) we got more out of this dataset than we originally put into it. This is one of the great payoffs of semantic web technology. Tip The OWL 2 upgrade to the original OWL standard introduced several profiles, or subsets of OWL, that are specialized for certain kinds of applications. These profiles are easier to implement and use than attempting to take on all of OWL at once. If you’re thinking of doing some data modeling with OWL, look into OWL 2 RL, OWL 2 QL, and OWL 2 EL as possible starting points for your needs. Of all the W3C semantic web standards, OWL is the key one for putting the “semantic” in “semantic web.” The term “semantics” is sometimes defined as the meaning behind words, and those who doubt the value of semantic web technology like to question the viability of storing all the meaning of a word in a machine-readable way.
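The payoff described above can be sketched in a few lines. This is an illustrative toy, not the book's example: it assumes that ab:spouse is declared symmetric and that ab:patient and ab:doctor are declared inverses of each other, and the two starting facts about Richard, Craig, and Cindy are invented, since the excerpt does not show the original data. The function `infer` is a hypothetical helper, not part of any OWL library.

```python
# Toy inference sketch. The property metadata and the facts are assumptions
# made for illustration; the excerpt does not show the original dataset.

SYMMETRIC = {"ab:spouse"}                 # assumed: spouse is symmetric
INVERSE_OF = {                            # assumed: patient/doctor are inverses
    "ab:patient": "ab:doctor",
    "ab:doctor": "ab:patient",
}

data = {
    ("ab:richard", "ab:spouse", "ab:cindy"),
    ("ab:craig", "ab:patient", "ab:cindy"),
}


def infer(triples):
    """Apply the symmetric and inverse rules until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            candidates = []
            if p in SYMMETRIC:
                candidates.append((o, p, s))
            if p in INVERSE_OF:
                candidates.append((o, INVERSE_OF[p], s))
            for triple in candidates:
                if triple not in inferred:
                    inferred.add(triple)
                    changed = True
    return inferred


closure = infer(data)
new_triples = closure - data
for triple in sorted(new_triples):
    print(triple)
```

Two stated facts plus two lines of property metadata yield four facts in total. A real OWL reasoner handles far more than these two rules, which is part of why the restricted OWL 2 profiles mentioned in the tip are attractive starting points.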


pages: 511 words: 111,423

Learning SPARQL by Bob Ducharme

Donald Knuth, en.wikipedia.org, G4S, hypertext link, linked data, place-making, semantic web, SPARQL, web application

Summary In this chapter, we learned: What SPARQL is The basics of RDF The meaning and role of URIs The parts of a simple SPARQL query How to execute a SPARQL query with ARQ How the same variable in multiple triple patterns can connect up the data in different triples What can lead to a query returning nothing What SPARQL endpoints are and how to query the most popular one, DBpedia Later chapters describe how to create more complex queries, how to modify data, how to build applications around your queries, the potential role of inferencing, and the technology’s roots in the semantic web world, but if you can execute the queries shown in this chapter, you’re ready to put SPARQL to work for you. Chapter 2. The Semantic Web, RDF, and Linked Data (and SPARQL) The SPARQL query language is for data that follows a particular model, but the semantic web isn’t about the query language or about the model—it’s about the data. The booming amount of data becoming available on the semantic web is making great new kinds of applications possible, and as a well-implemented, mature standard designed with the semantic web in mind, SPARQL is the best way to get that data and put it to work in your applications. Note The flexibility of the RDF data model means that it’s being used more and more with projects that have nothing to do with the “semantic web” other than their use of technology that uses these standards—that’s why you’ll often see references to “semantic web technology.”
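Two points from the summary, that a shared variable connects data across triples and that a query can return nothing, can be illustrated with a small pattern matcher. This is a rough sketch in plain Python rather than SPARQL or ARQ. The `ex:` data and the functions `match_pattern` and `run_query` are hypothetical, and real SPARQL processors evaluate queries very differently.

```python
# Toy triple-pattern matcher (hypothetical data and function names).
# Terms starting with "?" are variables, as in SPARQL.

TRIPLES = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:globex"),
    ("ex:acme", "ex:locatedIn", "ex:paris"),
    ("ex:globex", "ex:locatedIn", "ex:berlin"),
]


def is_variable(term):
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Extend bindings so pattern matches triple, or return None on conflict."""
    new_bindings = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_variable(p_term):
            # A variable bound earlier must keep the same value.
            if p_term in new_bindings and new_bindings[p_term] != t_term:
                return None
            new_bindings[p_term] = t_term
        elif p_term != t_term:
            return None
    return new_bindings


def run_query(patterns, triples):
    """Return every set of bindings that satisfies all patterns together."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# ?company appears in both patterns, so it links the two triples.
QUERY = [
    ("?person", "ex:worksFor", "?company"),
    ("?company", "ex:locatedIn", "?city"),
]

results = run_query(QUERY, TRIPLES)
for row in results:
    print(row["?person"], row["?city"])

# No triple mentions ex:initech, so this query has no solutions.
empty = run_query([("?person", "ex:worksFor", "ex:initech")], TRIPLES)
```

The second query comes back empty for the most common reason: a term in the pattern, here a mistyped or nonexistent identifier, matches nothing in the data.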

What Exactly Is the “Semantic Web”? As excitement over the semantic web grows, some vendors use the phrase to sell products with strong connections to the ideas behind the semantic web, and others use it to sell products with weaker connections. This can be confusing for people trying to understand the semantic web landscape. I like to define the semantic web as a set of standards and best practices for sharing data and the semantics of that data over the Web for use by applications. Let’s look at this definition one or two phrases at a time, and then we’ll look at these issues in more detail. A set of standards Before Tim Berners-Lee invented the World Wide Web, more powerful hypertext systems were available, but he built his around simple specifications that he published as public standards.

, Storing RDF in Databases, Querying a Remote SPARQL Service, Deleting and Replacing Triples in Named Graphs (see also SQL) join (SPARQL equivalent), Searching Further in the Data normalization and, Creating New Data outer join (SPARQL equivalent), Data That Might Not Be There row ID values and, More Realistic Data and Matching on Multiple Triples, URLs, URIs, IRIs, and Namespaces SPARQL middleware and, Middleware SPARQL Support SPARQL rules and, Using Existing SPARQL Rules Vocabularies remote SPARQL service, querying, Querying a Remote SPARQL Service–Querying a Remote SPARQL Service Resource Description Framework (see RDF) REST, SPARQL and HTTP restriction classes, SPARQL and OWL Inferencing round(), Numeric Functions Ruby, SPARQL and Web Application Development rules, SPARQL (see SPARQL rules) S sameTerm(), Node Type and Datatype Checking Functions sample code, Using Code Examples schema, What Exactly Is the “Semantic Web”?, Glossary querying, Querying Schemas Schemarama, Using Existing SPARQL Rules Vocabularies Schematron, Finding Bad Data screen scraping, What Exactly Is the “Semantic Web”?, Storing RDF in Files, Glossary search space, Reduce the Search Space searching for string, Searching for Strings SELECT, Querying the Data, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT semantic web, What Exactly Is the “Semantic Web”?, Glossary semantics, What Exactly Is the “Semantic Web”?, Reusing and Creating Vocabularies: RDF Schema and OWL semicolon, More Readable Query Results connecting operations with, Named Graphs CONSTRUCT queries and, Converting Data in N3 and Turtle, Storing RDF in Files serialization, Storing RDF in Files, Glossary SERVICE, Querying a Remote SPARQL Service Sesame triplestore, Querying Named Graphs, Datatypes and Queries inferencing with, Inferred Triples and Your Query repositories, SPARQL and HTTP simple literal, Glossary SKOS, Making RDF More Readable with Language Tags and Labels creating, Checking, Adding, and Removing Spoken Language 
Tags custom datatypes and, Datatypes and Queries SKOS-XL, Changing Existing Data SNORQL, Querying a Public Data Source sorting, Sorting Data query efficiency and, Efficiency Outside the WHERE Clause space before SPARQL punctuation, The Data to Query SPARQL, Jumping Right In: Some Data and Some Queries, Glossary comments, The Data to Query endpoint, Querying a Remote SPARQL Service engine, Querying the Data Graph Store HTTP Protocol specification, Named Graphs processor, Querying the Data protocol, Jumping Right In: Some Data and Some Queries, The SPARQL Specifications query language, The SPARQL Specifications SPARQL 1.1, Updating Data with SPARQL specifications, The SPARQL Specifications triplestores and, Storing RDF in Databases uppercase keywords, Querying the Data SPARQL algebra, SPARQL Algebra SPARQL endpoint, Querying a Public Data Source, Public Endpoints, Private Endpoints–Public Endpoints, Private Endpoints, Glossary creating your own, Triplestore SPARQL Support identifier, SPARQL and Web Application Development Linked Data Cloud and, Problem retrieving triples from, Problem SERVICE keyword and, Federated Queries: Searching Multiple Datasets with One Query SPARQL processor, SPARQL Processors–Public Endpoints, Private Endpoints, Glossary SPARQL protocol, Glossary SPARQL Query Results CSV and TSV Formats, SPARQL Query Results CSV and TSV Formats SPARQL Query Results JSON Format, SPARQL Query Results JSON Format SPARQL Query Results XML Format, The SPARQL Specifications, SPARQL Query Results XML Format as ARQ output, Standalone Processors SPARQL rules, Defining Rules with SPARQL–Defining Rules with SPARQL SPIN (SPARQL Inferencing Notation), Using Existing SPARQL Rules Vocabularies, Using SPARQL to Do Your Inferencing spreadsheets, Checking, Adding, and Removing Spoken Language Tags, Using CSV Query Results SQL, Querying the Data, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT, Middleware SPARQL Support, Glossary habits, Querying the Data square braces, 
Blank Nodes and Why They’re Useful, Using Existing SPARQL Rules Vocabularies str(), Node Type Conversion Functions CSV format and, SPARQL Query Results CSV and TSV Formats STRDT(), Datatype Conversion STRENDS(), String Functions string converting to URI, Problem datatype, Datatypes and Queries, Representing Strings functions, String Functions–String Functions searching for substrings, Problem striping, Storing RDF in Files, Glossary STRLANG(), Checking, Adding, and Removing Spoken Language Tags STRLEN(), String Functions STRSTARTS(), String Functions subject (of triple), The Data to Query, The Resource Description Framework (RDF), Glossary namespaces and, URLs, URIs, IRIs, and Namespaces subqueries, Queries in Your Queries, Combining Values and Assigning Values to Variables, Federated Queries: Searching Multiple Datasets with One Query SUBSTR(), Creating New Data, String Functions subtraction, Comparing Values and Doing Arithmetic SUM(), Finding the Smallest, the Biggest, the Count, the Average...


pages: 541 words: 109,698

Mining the Social Web: Finding Needles in the Social Haystack by Matthew A. Russell

Climategate, cloud computing, crowdsourcing, en.wikipedia.org, fault tolerance, Firefox, full text search, Georg Cantor, Google Earth, information retrieval, Mark Zuckerberg, natural language processing, NP-complete, Saturday Night Live, semantic web, Silicon Valley, slashdot, social graph, social web, statistical model, Steve Jobs, supply-chain management, text mining, traveling salesman, Turing test, web application

Figure 9-9. A rotating tag cloud that’s highly customizable and requires very little effort to get up and running Chapter 10. The Semantic Web: A Cocktail Discussion While the previous chapters attempted to provide an overview of the social web and motivate you to get busy hacking on data, it seems appropriate to wrap up with a brief postscript on the semantic web. This short discussion makes no attempt to regurgitate the reams of interesting mailing list discussions, blog posts, and other sources of information that document the origin of the Web, how it has revolutionized just about everything in our lives in under two decades, and how the semantic web has always been a part of that vision. It does, however, aim to engage you in something akin to a cocktail discussion that, while glossing over a lot of the breadth and depth of these issues, hopefully excites you about the possibilities that lie ahead.

At present, there’s no real consensus about what Web 3.0 really means, but most discussions of the subject generally include the phrase “semantic web” and the notion of information being consumed and acted upon by machines in ways that are not yet possible at web scale. For example, it’s still very difficult for machines to extract and make inferences about the facts contained in documents available online. Keyword searching and heuristics can certainly provide listings of very relevant search results, but human intelligence is still required to interpret and synthesize the information in the documents themselves. Whether Web 3.0 and the semantic web are really the same thing is open for debate; however, it’s generally accepted that the term semantic web refers to a web that’s much like the one we already know and love, but that has evolved to the point where machines can extract and act on the information contained in documents at a granular level.

Various manifestations/eras of the Web and their virtues
Manifestation/era: Virtues
Internet: Application protocols such as SMTP, FTP, BitTorrent, HTTP, etc.
Web 1.0: Mostly static HTML pages and hyperlinks
Web 2.0: Platforms, collaboration, rich user experiences
Social web (Web 2.x ???): People and their virtual and real-world social connections and activities
Web 3.0 (the semantic web): Prolific amounts of machine-understandable content
* * *
[62] As defined in Programming the Semantic Web, by Toby Segaran, Jamie Taylor, and Colin Evans (O’Reilly).
[63] Inter-net literally implies “mutual or cooperating networks.”
Man Cannot Live on Facts Alone
The semantic web’s fundamental construct for representing knowledge is called a triple, which is a highly intuitive and very natural way of expressing a fact. As an example, the sentence we’ve considered on many previous occasions—“Mr. Green killed Colonel Mustard in the study with the candlestick”—expressed as a triple might be something like (Mr.


pages: 214 words: 14,382

Monadic Design Patterns for the Web by L.G. Meredith

barriers to entry, domain-specific language, don't repeat yourself, finite state, Georg Cantor, ghettoisation, John von Neumann, Kickstarter, semantic web, social graph, type inference, web application, WebSocket

9.5 Foundations
Chapter 10 The Semantic Web
Where are we; how did we get here; and where are we going?
Figure 10.1 · Chapter 10 map
10.1 Practice
10.2 Referential transparency
In the interest of complete transparency, it is important for me to be clear about my position on the current approach to the semantic web. As early as 2004 I appeared in print as stating a complete lack of confidence regarding meta-data, tags and ontology-based approaches.

., mn ), m0 ∈ [[c]], mi ∈ [[ci ]]}
LET: val x = c; d
SEQ: c; d
GROUP: { c }
PROBE: $\langle d \rangle c$, with $[\![\langle d \rangle c]\!] = \{ m \in L(m) \mid \exists m' \in [\![d]\!].\ m'(m) \to m'',\ m'' \in [\![c]\!] \}$
Other collection monads, other logics
Stateful collections
Other logical operations
EXPRESSION: $c, d ::= \ldots$ (PREVIOUS) $\mid \forall v.c$ (QUANTIFICATION) $\mid \mathrm{rec}\ X.c$ (FIXPT DEFN) $\mid X$ (FIXPT MENTION)
10.5 Searching for programs
A new foundation for search
Monad composition via distributive laws
Examples
10.6 Foundations
Figure 10.2 · Comprehensions and distributive maps: form { form : form1 <- data1, ..., formK <- dataK, constraint1, ..., constraintN }
Glossary
algebraic data type A type defined by providing several alternatives, each of which comes with its own constructor.

Overview
Contents
List of Figures
List of Tables
List of Listings
Acknowledgments
1. Motivation and Background
2. Toolbox
3. An I/O Monad for HTTP Streams
4. Parsing Requests, Monadically
5. The Domain Model as Abstract Syntax
6. Zippers and Contexts and URIs, Oh My!
7. A Review of Collections as Monads
8. Domain Model, Storage, and State
9. Putting it All Together
10. The Semantic Web
Glossary
Bibliography
About the Author
Contents
1 Motivation and Background
1.1 Where are we?
1.2 Where are we going?


Cataloging the World: Paul Otlet and the Birth of the Information Age by Alex Wright

1960s counterculture, Ada Lovelace, barriers to entry, British Empire, business climate, business intelligence, Cape to Cairo, card file, centralized clearinghouse, corporate governance, crowdsourcing, Danny Hillis, Deng Xiaoping, don't be evil, Douglas Engelbart, Douglas Engelbart, Electric Kool-Aid Acid Test, European colonialism, Frederick Winslow Taylor, hive mind, Howard Rheingold, index card, information retrieval, invention of movable type, invention of the printing press, Jane Jacobs, John Markoff, Kevin Kelly, knowledge worker, Law of Accelerating Returns, linked data, Livingstone, I presume, lone genius, Menlo Park, Mother of all demos, Norman Mailer, out of africa, packet switching, profit motive, RAND corporation, Ray Kurzweil, Scramble for Africa, self-driving car, semantic web, Silicon Valley, speech recognition, Steve Jobs, Stewart Brand, Ted Nelson, The Death and Life of Great American Cities, the scientific method, Thomas L Friedman, urban planning, Vannevar Bush, Whole Earth Catalog

Berners-Lee, “The World Wide Web and the ‘Web of Life.’” Church, “One Light, Many Windows.” Berners-Lee, “The World Wide Web and the ‘Web of Life.’” 6. Berners-Lee, “Keynote.” 7. Berners-Lee, Hendler, and Lassila, “The Semantic Web.” 8. Van den Heuvel, “Web 2.0 and the Semantic Web in Research from a Historical Perspective.” 9. Van den Heuvel, private correspondence. 10. Van den Heuvel, “Web 2.0 and the Semantic Web in Research from a Historical Perspective.” 11. Hillis, “Hillis Knowledge Web.” 12. Wright, “Data Streaming 2.0.” 13. Van den Heuvel, “Web 2.0 and the Semantic Web in Research from a Historical Perspective.” 14. Shirky, “Ontology Is Overrated.” 15. Weinberger, Everything Is Miscellaneous, 102. 16. Ibid., 100. 17. Ibid., 103. 18. Lee, “Citizendium Turns Five, but the Wikipedia Fork Is Dead in the Water.” 19.

As early as 1996 he appeared before the W3C and lamented the lack of two-way authoring in modern Web browsers.6 In 2000, he published a paper in Scientific American in which he laid out his vision for a more orderly version of the Web that would allow computers to exchange information with each other more easily. He dubbed the project the Semantic Web, an umbrella term for a collection of technologies aimed at making the Internet more useful by imposing more consistent structures on data that can then be exchanged automatically between machines. He envisioned a “Web of data” designed primarily to foster the automatic exchange of information between computers, to allow any number of applications to search, retrieve, and synthesize data drawn from disparate sources. “The Semantic Web will bring structure to the meaningful content of Web pages,” he wrote, “creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.”7 Substitute “bibliologists” for “software agents,” and that same description could just as easily apply to the Mundaneum.

Otlet also considered the possibility of engineering mechanical means of indexing and reassembling data from multiple sources, as in his microfilm experiments with Robert Goldschmidt. The environment that Otlet envisioned bears other similarities to the Semantic Web. By predicating its future on ontologies, handcrafted maps of topical relationships, the initiative shares the same spirit of expert knowledge systems that characterized Otlet’s work with the Universal Decimal Classification. Van den Heuvel, for one, argues that Otlet’s framework not only resembles the hyperlinked structure of the World Wide Web, but also presages some of the more advanced linking strategies of the Semantic Web.8 Otlet’s Monographic Principle provided a framework for breaking down documents and other forms of media into component parts, then recombining them into new formats using the multiple link dimensions afforded by the Universal Decimal Classification.


pages: 377 words: 110,427

The Boy Who Could Change the World: The Writings of Aaron Swartz by Aaron Swartz, Lawrence Lessig

affirmative action, Alfred Russel Wallace, American Legislative Exchange Council, Benjamin Mako Hill, bitcoin, Bonfire of the Vanities, Brewster Kahle, Cass Sunstein, deliberate practice, Donald Knuth, Donald Trump, failed state, fear of failure, Firefox, full employment, Howard Zinn, index card, invisible hand, Joan Didion, John Gruber, Lean Startup, More Guns, Less Crime, peer-to-peer, post scarcity, Richard Feynman, Richard Stallman, Ronald Reagan, school vouchers, semantic web, single-payer health, SpamAssassin, SPARQL, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, the scientific method, Toyota Production System, unbiased observer, wage slave, Washington Consensus, web application, WikiLeaks, working poor, zero-sum game

All of which has led “web engineers” (as this series’ title so cutely calls them) to tune out and go back to doing real work, not wanting to waste their time with things that don’t exist and, in all likelihood, never will. And it’s led many who have been working on the Semantic Web, in the vain hope of actually building a world where software can communicate, to burn out and tune out and find more productive avenues for their attentions. For an example, look at Sean B. Palmer. In his influential piece, “Ditching the Semantic Web?,” he proclaims “It’s not prudent, perhaps even not moral (if that doesn’t sound too melodramatic), to work on RDF, OWL, SPARQL, RIF, the broken ideas of distributed trust, CWM, Tabulator, Dublin Core, FOAF, SIOC, and any of these kinds of things” and says not only will he “stop working on the Semantic Web” but “I will, moreover, actively dissuade anyone from working on the Semantic Web where it distracts them from working on” more practical projects. It would be only fair here to point out that I am not exactly an unbiased observer.

The Techniques of Mass Collaboration: A Third Way Out
http://www.aaronsw.com/weblog/masscollab2
July 19, 2006
Age 19

I’m not the first to suggest that the Internet could be used for bringing users together to build grand databases. The most famous example is the Semantic Web project (where, in full disclosure, I worked for several years). The project, spearheaded by Tim Berners-Lee, inventor of the web, proposed to extend the working model of the web to more structured data, so that instead of simply publishing text web pages, users could publish their own databases, which could be aggregated by search engines like Google into major resources. The Semantic Web project has received an enormous amount of criticism, much (in my view) rooted in misunderstandings, but much legitimate as well. In the news today is just the most recent example, in which famed computer scientist turned Google executive Peter Norvig challenged Tim Berners-Lee on the subject at a conference.

But Wikipedia points to a different model, where all the users come to one website, where the interface for inputting data in the proper format is clear and unambiguous, and the users can work together to resolve any conflicts that may come up. Indeed, this method strikes me as so superior that I’m surprised I don’t see it discussed in this context more often. Ignorance doesn’t seem plausible; even if Wikipedia was a latecomer, sites like ChefMoz and MusicBrainz followed this model and were Semantic Web case studies. (Full disclosure: I worked on the Semantic Web portions of MusicBrainz.) Perhaps the reason is simply that both sides—W3C and Google—have the existing web as the foundation for their work, so it’s not surprising that they assume future work will follow from the same basic model. One possible criticism of the million-dollar-users proposal is that it’s somehow less free than the individualist approach. One site will end up being in charge of all the data and thus will be able to control its formation.


pages: 223 words: 52,808

Intertwingled: The Work and Influence of Ted Nelson (History of Computing) by Douglas R. Dechow

3D printing, Apple II, Bill Duvall, Brewster Kahle, Buckminster Fuller, Claude Shannon: information theory, cognitive dissonance, computer age, conceptual framework, Douglas Engelbart, Dynabook, Edward Snowden, game design, HyperCard, hypertext link, information retrieval, Internet Archive, Jaron Lanier, knowledge worker, linked data, Marc Andreessen, Marshall McLuhan, Menlo Park, Mother of all demos, pre–internet, RAND corporation, semantic web, Silicon Valley, software studies, Steve Jobs, Steve Wozniak, Stewart Brand, Ted Nelson, the medium is the message, Vannevar Bush, Wall-E, Whole Earth Catalog

Microcosm was an open hypermedia system in that all the links were stored in a database as first-class entities that could be reasoned about and applied to any document. Each link was a triple that consisted of a source, a destination and a description. Little did I know at the time how prescient of the Semantic Web these ideas would be. Of course, there are problems with automatically making a link on a word without knowing its precise semantic meaning. There are a lot of different people with the name Mountbatten in the Mountbatten archive, for example. So working out the context in which the link was being applied, and therefore the meaning of the word, became a key focus of our work: problems we are still dealing with as the Semantic Web develops today. We did also have specific links in Microcosm that were more like standard hypertext links because they were embedded in the documents and represented to the user through highlighted buttons, and you could trace them backwards through the link database, or linkbase as we called it.
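To make the triple idea concrete, here is a minimal sketch of a linkbase in Python. It is not Microcosm's actual data model: the `Link` type, the `doc:` identifiers and the sample entries are all invented for illustration. It shows links stored apart from documents, applied to any text by matching the source word, and traced backwards from a destination.

```python
from typing import NamedTuple


class Link(NamedTuple):
    source: str       # word the link starts from
    destination: str  # identifier of the target document (hypothetical scheme)
    description: str  # human-readable note about the link


# Hypothetical linkbase; every entry is invented for illustration.
LINKBASE = [
    Link("Mountbatten", "doc:person-a", "One person named Mountbatten"),
    Link("Mountbatten", "doc:person-b", "A different person named Mountbatten"),
    Link("hypertext", "doc:hypertext-intro", "Introduction to hypertext"),
]


def links_for(text, linkbase):
    """Return every link whose source word occurs in the text."""
    words = set(text.replace(",", " ").replace(".", " ").split())
    return [link for link in linkbase if link.source in words]


def backlinks(destination, linkbase):
    """Trace links backwards: which source words point at this destination?"""
    return [link.source for link in linkbase if link.destination == destination]


found = links_for("Lord Mountbatten visited the archive.", LINKBASE)
sources = backlinks("doc:hypertext-intro", LINKBASE)
```

Here `found` holds both Mountbatten links, which is exactly the ambiguity described above: a word match alone cannot say which person is meant without further context.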

Three things happened at that conference as I recall. Tim started talking about the Semantic Web again in his keynote for the conference. He had talked about it at the first WWW conference in 1994 [1] and the idea of making links on data in the information management proposal he wrote in 1989. As far as he was concerned in 1998, the web of linked documents was beginning to emerge but his vision wasn’t complete until it was also a web of linked data, and so he started to re-educate the community about this at the Brisbane conference. Ted was also at the Brisbane conference to pick up a special award. I remember him demoing ZigZag to us in the bar one night at that conference. He was so excited, and we were all mesmerized. So I had heard Tim talk about the Semantic Web and I saw Ted demo ZigZag at the same conference, and I didn’t fully appreciate either of them at the time.

I understood the principles, but I didn’t understand the detail. It’s taken me a long time to appreciate both the Semantic Web and ZigZag, but as my understanding of both of them has increased I now firmly believe what I suspected all along: there is a one-to-one correspondence between the two ideas, and that you can implement ZigZag in the RDF graph. Someday I’ll find the time to prove that. I need to get Ted involved in making that happen. I really believe that these two amazing people—Tim and Ted—have the same idea of how you can make links on data to create an incredibly rich hyper-structure for generating knowledge. Tim will never talk about it like that. His idea with the Semantic Web is that machines can, if you describe the data using a vocabulary like an ontology, make inferences about the information contained in the data that couldn’t be made in any other way.
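As a rough illustration of the kind of inference meant here, the sketch below applies one subclass rule to a handful of triples, in the spirit of RDFS-style reasoning. It is a toy under stated assumptions, not an RDF library: the predicate names `type` and `subClassOf` are plain strings, and the class names are invented.

```python
def infer_types(triples):
    """Apply until nothing changes: (x type C) and (C subClassOf D) gives (x type D)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = {(s, o) for s, p, o in facts if p == "subClassOf"}
        for s, p, o in list(facts):
            if p != "type":
                continue
            for child, parent in subclass:
                if child == o and (s, "type", parent) not in facts:
                    facts.add((s, "type", parent))
                    changed = True
    return facts


# Invented vocabulary: two subclass statements and one typed item.
DATA = [
    ("ZigZag", "type", "HypertextSystem"),
    ("HypertextSystem", "subClassOf", "InformationSystem"),
    ("InformationSystem", "subClassOf", "Software"),
]

facts = infer_types(DATA)
```

Nothing in `DATA` says directly that ZigZag is software; that statement appears in `facts` only because the vocabulary relates the classes to one another.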


pages: 268 words: 109,447

The Cultural Logic of Computation by David Golumbia

Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, American ideology, Benoit Mandelbrot, borderless world, business process, cellular automata, citizen journalism, Claude Shannon: information theory, computer age, corporate governance, creative destruction, en.wikipedia.org, finite state, future of work, Google Earth, Howard Zinn, IBM and the Holocaust, iterative process, Jaron Lanier, jimmy wales, John von Neumann, Joseph Schumpeter, late capitalism, means of production, natural language processing, Norbert Wiener, packet switching, RAND corporation, Ray Kurzweil, RFID, Richard Stallman, semantic web, Shoshana Zuboff, Slavoj Žižek, social web, stem cell, Stephen Hawking, Steve Ballmer, Stewart Brand, strong AI, supply-chain management, supply-chain management software, Ted Nelson, telemarketer, The Wisdom of Crowds, theory of mind, Turing machine, Turing test, Vannevar Bush, web application

The fact that this problem is so rarely discussed in the literature on Negroponte’s project shows the degree to which computers already have encoded into them a profound linguistic ideology, and the fact that few researchers if any are working on ways to solve this problem before “giving computers to everyone” shows exactly the values that propel much of the computer revolution. Like the Semantic Web, the fact that such resources are profoundly majoritarian is considered entirely secondary to the power they give to people who have so little and to the power such projects create. Of course disadvantaged people deserve such access, and of course the access to computer power will help them economically. The question is whether we should be focusing much more of our intellectual energy on making the computer infrastructure into an environment that is not majoritarian, rather than spending so much of our capacity on computerizing English, translating other languages into English, getting computers to speak, or, via projects like the Semantic Web, getting them to “understand” language for us.

Part Three: Cultural Computationalism

Chapter Six: Computation, Globalization, and Cultural Striation

In the late 1960s and early 1970s, Marxist economists outlined a theory that was received with a certain amount of surprise, one that has been largely pushed aside today.

While the presence of professional computational linguists in industry today means that companies who focus on language products make such claims less often than in the past, they can still be found with unexpected frequency outside professional CL circles. Yet the views of professional computational linguists have a surprisingly restricted sphere of influence. They are rarely referenced even by the computer scientists who are working to define future versions of the World Wide Web, especially in the recent project to get the web to “understand” meanings called the Semantic Web. They have little influence, either, over the widespread presumption among computationalists that the severe restrictions today’s computers impose on natural language diversity are irrelevant to the worldwide distribution of cultural power.

Computationalism and Digital Textuality

During the last fifteen years, a small body of writing has emerged that is concerned with an idea called in the literature the OHCO thesis, spelled out to mean that texts are Ordered Hierarchies of Content Objects.

It is specifically human creativity in text production that the word processor is enabling, and over time it has seemed that in fact most writers prefer text processors to be as formal as possible—that is, to interfere as little as possible with the creation of text content, and to make alterations to text appearance straightforward and consistent. This does not make it sound like even text producers have been waiting for the capability to generate text in terms of the semantic categories it harbors—after all, we already have terrific ways of manipulating those categories, which we call writing and thinking.

Language Ideologies of the Semantic Web

Ordinarily, a discourse like the one on the OHCO thesis might simply pass unnoticed, but in the present context it is interesting for several reasons. Not least among these is the clear way in which the OHCO writers are proceeding not (only) from the Platonic philosophical intuitions which they say drive them, but instead or also from certain clear prevailing tendencies in our own culture and world.


pages: 314 words: 94,600

Business Metadata: Capturing Enterprise Knowledge by William H. Inmon, Bonnie K. O'Neil, Lowell Fryman

affirmative action, bioinformatics, business cycle, business intelligence, business process, call centre, carbon-based life, continuous integration, corporate governance, create, read, update, delete, database schema, en.wikipedia.org, informal economy, knowledge economy, knowledge worker, semantic web, The Wisdom of Crowds, web application

After a brief survey of semantics and semantic technology, we will cover the relationship of semantics and business metadata.

11.2 The Vision of the Semantic Web

Tim Berners-Lee envisioned the idea of the “semantic web,” wherein intelligent agents would be truly intelligent. In his vision the computer would know exactly what “booking a restaurant reservation” meant, as well as all the underlying tasks associated with it. For example, you could ask the computer to book a reservation at an Indian restaurant on the way home from work, and the computer would find an Indian restaurant located directly on your way home, book a reservation for you, and put it automatically on your calendar, all without human intervention.

In the context of searching for documents, a semantic web would be able to understand what the documents contained. Today, we rely mostly on document titles and tagging. Tagging is usually done manually either by the document author, someone else charged with tagging after the fact, or through a folksonomy like del.icio.us. But a true semantic web could decipher document contents on its own. On a smaller scale, the semantic web means distinguishing between word senses: when there are two or more senses of a word, the user is asked, “Did you mean…?” For example, we have used the word “mole” throughout the book to illustrate word sense. Google can now distinguish between spelling variations and probable errors. However, if Google were semantically enabled, it would be able to distinguish between the different word senses of mole, and Google would either ask the user which sense he or she wanted or, better, would display results based on each sense.

Business Metadata

Praise for Business Metadata

“Despite the presence of some excellent books on what is essentially ‘technical’ metadata, up until now there has been a dearth of well-presented material to help address the growing need for interaction at the conceptual and semantic levels between data professionals and the business clients they support. In Business Metadata, Bill, Bonnie, and Lowell provide the means for bridging the gap between the sometimes ‘fuzzy’ human perception of data that fuels business processes and the rigid information management models used by business applications. Look to the future: next generation business intelligence, enterprise content management and search, the semantic web all will depend on business metadata. Read this book!” —David Loshin, President, Knowledge Integrity Incorporated

These authors have written a book that ventures into new territory for data and information management. There are several books about metadata, but this is the first to offer in-depth discussion of the important topic of business metadata. Business metadata is really about understanding the business – something that IT people have struggled with since the dawn of information technology.


pages: 245 words: 68,420

Content Everywhere: Strategy and Structure for Future-Ready Content by Sara Wachter-Boettcher

crowdsourcing, John Gruber, Kickstarter, linked data, search engine result page, semantic web, Silicon Valley

The more you know about how these systems work and what’s being used for what, the better you can evaluate your content’s needs against them and the more you can participate in conversations with those on the database end of the spectrum.

What About the Semantic Web?

Once you understand a bit about markup, and about making content machine-readable and interoperable, then it’s time to consider some of the exciting stuff that markup makes possible. One of those things is the Semantic Web: a Web where all content shares a common framework and can be shared, reused, and understood across systems—to the point where, say, machines know whether the term “blackberry” is referring to the fruit or the phone. A completely semantic Web is a lofty goal—one not without its detractors, I might note—and our path toward it is still meandering at best. But a more semantic Web seems closer than ever with the recent advent of linked data, which is made possible through structured content and markup.

Domain-driven design celebrates this and offers a graph structure where the connections between concepts are themselves informative. Probably the biggest mindset shift for UXers is to stop thinking about pages and page types, and instead think purely about the mental model of the subject you’re trying to represent.

How do linked data and Semantic Web fit in?

Where once we built ourselves silos on the Web, these days it pays to recognize that it’s really one Web and we’re in the business of stitching our content into that wider canvas. Initiatives like the Linked Open Data and Semantic Web projects are helping us do this by providing standardized methods of sharing data for both people and computers. For example, dbPedia and MusicBrainz provide free, crowd-sourced sources of content and business data you can use to enrich and enhance your own offerings, on a scale that few businesses would have the time and resources to replicate.

Yet we spend so much time thinking about content—full of powerful stories, important information, critical details—as just documents and pages, identical little units where each one is exactly like the others. We’ve failed to take its complexity into account when it comes to how we’ve published and displayed it. And as a more mobile, social, and user-centered Web has emerged, that failure has finally caught up with us. It’s time we right the ship. Wherever the world goes with markup, whatever happens with the Semantic Web and APIs and even big hairy problems like media revenue models, the truth remains: You’re going to need content that’s ready for multiple destinations—multiple potential destinies. In fact, if you even want to be part of your organization’s conversations about those big hairy problems, you’d be best served by understanding your content and all the ways it might get used and reused. To get there, you have to break down both your content and the organizational hurdles that are preventing your organization from change.


The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin

Bayesian statistics, business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, discrete time, disruptive innovation, George Gilder, Google Earth, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, late capitalism, lifelogging, linked data, longitudinal study, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

McClean, T. (2011) Not with a Bang but a Whimper: the Politics of Accountability and Open Data in the UK. Paper prepared for the American Political Science Association Annual Meeting. Seattle, Washington, 1–4 September 2011. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1899790 (last accessed 19 August 2013). McCreary, D. (2009) ‘Entity extraction and the semantic web’, Semantic Web, 12 January, http://semanticweb.com/entity-extraction-and-the-semantic-web_b10675 (last accessed 19 July 2013). McKeon, S.G. (2013) ‘Hacking the hackathon’, Shaunagm.net, 10 October, http://www.shaunagm.net/blog/2013/10/hacking-the-hackathon/ (last accessed 21 October 2013). McNay, L. (1994) Foucault: A Critical Introduction. Polity Press, Oxford. Miller, H.J. (2010) ‘The data avalanche is here. Shouldn’t we be digging?’, Journal of Regional Science, 50(1): 181–201.

Since the late 2000s the movement has noticeably gained prominence and traction, initially with the Guardian newspaper’s campaign in the UK to ‘Free Our Data’ (www.theguardian.com/technology/free-ourdata), the Organization for Economic Cooperation and Development (OECD)’s call for member governments to open up their data in 2008, the launch in 2009 by the US government of data.gov, a website designed to provide access to non-sensitive and historical datasets held by US state and federal agencies, and the development of linked data and the promotion of the ‘Semantic Web’ as a standard element of future Internet technologies, in which open and linked data are often discursively conjoined (Berners-Lee 2009). Since 2010 dozens of countries and international organisations (e.g., the European Union [EU] and the United Nations Development Programme [UNDP]) have followed suit, making thousands of previously restricted datasets open in nature for non-commercial and commercial use (see DataRemixed 2013).

Given that by their nature open data generate no or little income to fund such service arrangements, nor indeed the costs of opening data, while it is easy to agree that open data should be delivered as a service, in practice it might be an aspiration unless effective funding models are developed (as discussed more fully below).

Linked Data

The idea of linked data is to transform the Internet from a ‘web of documents’ to a ‘web of data’ through the creation of a semantic web (Berners-Lee 2009; P. Miller, 2010), or what Goddard and Byrne (2010) term a ‘machine-readable web’. Such a vision recognises that all of the information shared on the Web contains a rich diversity of data – names, addresses, product details, facts, figures, and so on. However, these data are not necessarily formally identified as such, nor are they formally structured in such a way as to be easily harvested and used.


pages: 287 words: 86,919

Protocol: how control exists after decentralization by Alexander R. Galloway

Ada Lovelace, airport security, Berlin Wall, bioinformatics, Bretton Woods, computer age, Craig Reynolds: boids flock, discovery of DNA, Donald Davies, double helix, Douglas Engelbart, easy for humans, difficult for computers, Fall of the Berlin Wall, Grace Hopper, Hacker Ethic, informal economy, John Conway, John Markoff, Kevin Kelly, Kickstarter, late capitalism, linear programming, Marshall McLuhan, means of production, Menlo Park, moral panic, mutually assured destruction, Norbert Wiener, old-boy network, packet switching, Panopticon Jeremy Bentham, phenotype, post-industrial society, profit motive, QWERTY keyboard, RAND corporation, Ray Kurzweil, RFC: Request For Comment, Richard Stallman, semantic web, SETI@home, stem cell, Steve Crocker, Steven Levy, Stewart Brand, Ted Nelson, telerobotics, the market place, theory of mind, urban planning, Vannevar Bush, Whole Earth Review, working poor

By making the descriptive protocols more complex, one is able to say more complex things about information, namely, that Galloway is my surname, and my given name is Alexander, and so on. The Semantic Web is simply the process of adding extra metalayers on top of information so that it can be parsed according to its semantic value. Why is this significant? Before this, protocol had very little to do with meaningful information. Protocol does not interface with content, with semantic value. It is, as I have said, against interpretation. But with Berners-Lee comes a new strain of protocol: protocol that cares about meaning. This is what he means by a Semantic Web. It is, as he says, “machine-understandable information.” Does the Semantic Web, then, contradict my earlier principle that protocol is against interpretation? I’m not so sure. Protocols can certainly say things about their contents.
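A small Python sketch may help show what such a metalayer adds. The property names (`surname`, `givenName`, `placeName`) and the identifiers are assumptions made up for this example, not terms from any particular standard; the point is only that the bare string carries no role, while the described data does.

```python
# A bare string: a program cannot tell what kind of thing it names.
bare = "Galloway"

# The same string with a descriptive layer on top, as (subject, property, value)
# statements. Identifiers and property names are invented for illustration.
DESCRIBED = [
    ("person:1", "surname", "Galloway"),
    ("person:1", "givenName", "Alexander"),
    ("place:1", "placeName", "Galloway"),
]


def roles_of(value, statements):
    """List the properties under which a value appears, in statement order."""
    return [prop for _, prop, obj in statements if obj == value]


def values_for(subject, statements):
    """Collect everything said about one subject as a property-to-value dict."""
    return {prop: obj for subj, prop, obj in statements if subj == subject}


roles = roles_of(bare, DESCRIBED)
person = values_for("person:1", DESCRIBED)
```

The program still does not know what a surname is. It can only tell that "Galloway" appears once as a surname and once as a place name, which is close to the distinction drawn here between describing contents and interpreting them.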

In many ways the core protocols of the Internet had their development heyday in the 1980s. But Web protocols are experiencing explosive growth today. Current growth is due to an evolution of the concept of the Web into what Berners-Lee calls the Semantic Web. In the Semantic Web, information is not simply interconnected on the Internet using links and graphical markup—what he calls “a space in which information could permanently exist and be referred to”41—but it is enriched using descriptive protocols that say what the information actually is. For example, the word “Galloway” is meaningless to a machine. It is just a piece of information that says nothing about what it is or what it means.

38. Berners-Lee, Weaving the Web, p. 36.
39. Berners-Lee, Weaving the Web, p. 71.
40. Berners-Lee, Weaving the Web, pp. 92, 94.

But do they actually know the meaning of their contents? So it is a matter of debate as to whether descriptive protocols actually add intelligence to information, or whether they are simply subjective descriptions (originally written by a human) that computers mimic but understand little about. Berners-Lee himself stresses that the Semantic Web is not an artificial intelligence machine.42 He calls it “well-defined” data, not interpreted data—and in reality those are two very different things. I promised in the introduction to skip all epistemological questions, and so I leave this one to be debated by others. As this survey of protocological institutionalization shows, the primary source materials for any protocological analysis of Internet standards are the RFC memos.

41. Berners-Lee, Weaving the Web, p. 18.
42. Tim Berners-Lee, “What the Semantic Web Can Represent,” available online at http://www.w3.org/DesignIssues/RDFnot.html.


Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose

Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, speech recognition, statistical model, William of Occam

There are also approaches to do this automatically by applying machine learning methods for classification and clustering. We look into these approaches in Part II.

Semantic Web

Semantic web is a recent initiative led by the web consortium (w3c.org). Its main objective is to bring formal knowledge representation techniques into the Web. Currently, web pages are designed basically for human readers. It is widely acknowledged that the Web is like a “fancy fax machine” used to send good-looking documents worldwide. The problem here is that the nice format of web pages is very difficult for computers to understand—something that we expect search engines to do. The main idea behind the semantic web is to add formal descriptive material to each web page that although invisible to people would make its content easily understandable by computers.

Thus, the Web would be organized and turned into the largest knowledge base in the world, which with the help of advanced reasoning techniques developed in the area of artificial intelligence would be able not just to provide ranked documents that match a keyword search query, but would also be able to answer questions and give explanations. The web consortium site (http://www.w3.org/2001/sw/) provides detailed information about the latest developments in the area of the semantic web. Although the semantic web is probably the future of the Web, our focus is on the former two approaches to bring semantics to the Web. The reason for this is that web search is the data mining approach to web semantics: extracting knowledge from web data. In contrast, the semantic web approach is about turning web pages into formal knowledge structures and extending the functionality of web browsers with knowledge manipulation and reasoning tools.

CRAWLING THE WEB

In this and later sections we use basic web terminology such as HTML, URL, web browsers, and servers.

Larose, Daniel T. II. Title. QA76.9.D343M38 2007. 005.74 – dc22. 2006025099. Printed in the United States of America.

For my children Teodora, Kalin, and Svetoslav – Z.M.
For my children Chantal, Ellyriane, Tristan, and Ravel – D.T.L.

CONTENTS

PREFACE

PART I WEB STRUCTURE MINING

1 INFORMATION RETRIEVAL AND WEB SEARCH: Web Challenges; Web Search Engines; Topic Directories; Semantic Web; Crawling the Web; Web Basics; Web Crawlers; Indexing and Keyword Search; Document Representation; Implementation Considerations; Relevance Ranking; Advanced Text Search; Using the HTML Structure in Keyword Search; Evaluating Search Quality; Similarity Search; Cosine Similarity; Jaccard Similarity; Document Resemblance; References; Exercises

2 HYPERLINK-BASED RANKING: Introduction; Social Networks Analysis; PageRank; Authorities and Hubs; Link-Based Similarity Search; Enhanced Techniques for Page Ranking; References; Exercises

PART II WEB CONTENT MINING

3 CLUSTERING: Introduction; Hierarchical Agglomerative Clustering; k-Means Clustering; Probability-Based Clustering; Finite Mixture Problem; Classification Problem; Clustering Problem; Collaborative Filtering (Recommender Systems); References; Exercises

4 EVALUATING CLUSTERING: Approaches to Evaluating Clustering; Similarity-Based Criterion Functions; Probabilistic Criterion Functions; MDL-Based Model and Feature Evaluation; Minimum Description Length Principle; MDL-Based Model Evaluation; Feature Selection; Classes-to-Clusters Evaluation; Precision, Recall, and F-Measure; Entropy; References; Exercises

5 CLASSIFICATION: General Setting and Evaluation Techniques; Nearest-Neighbor Algorithm; Feature Selection; Naive Bayes Algorithm; Numerical Approaches; Relational Learning; References; Exercises

PART III WEB USAGE MINING

6 INTRODUCTION TO WEB USAGE MINING: Definition of Web Usage Mining; Cross-Industry Standard Process for Data Mining; Clickstream Analysis; Web Server Log Files; Remote Host Field; Date/Time Field; HTTP Request Field; Status Code Field; Transfer Volume (Bytes) Field; Common Log Format; Identification Field; Authuser Field; Extended Common Log Format; Referrer Field; User Agent Field; Example of a Web Log Record; Microsoft IIS Log Format; Auxiliary Information; References; Exercises

7 PREPROCESSING FOR WEB USAGE MINING: Need for Preprocessing the Data; Data Cleaning and Filtering; Page Extension Exploration and Filtering; De-Spidering the Web Log File; User Identification; Session Identification; Path Completion; Directories and the Basket Transformation; Further Data Preprocessing Steps; References; Exercises

8 EXPLORATORY DATA ANALYSIS FOR WEB USAGE MINING: Introduction; Number of Visit Actions; Session Duration; Relationship between Visit Actions and Session Duration; Average Time per Page; Duration for Individual Pages; References; Exercises

9 MODELING FOR WEB USAGE MINING: CLUSTERING, ASSOCIATION, AND CLASSIFICATION: Introduction; Modeling Methodology; Definition of Clustering; The BIRCH Clustering Algorithm; Affinity Analysis and the A Priori Algorithm; Discretizing the Numerical Variables: Binning; Applying the A Priori Algorithm to the CCSU Web Log Data; Classification and Regression Trees; The C4.5 Algorithm; References; Exercises

INDEX

PREFACE

DEFINING DATA MINING THE WEB

By data mining the Web, we refer to the application of data mining methodologies, techniques, and models to the variety of data forms, structures, and usage patterns that comprise the World Wide Web.


pages: 369 words: 80,355

Too Big to Know: Rethinking Knowledge Now That the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room by David Weinberger

airport security, Alfred Russel Wallace, Amazon Mechanical Turk, Berlin Wall, Black Swan, book scanning, Cass Sunstein, commoditize, corporate social responsibility, crowdsourcing, Danny Hillis, David Brooks, Debian, double entry bookkeeping, double helix, en.wikipedia.org, Exxon Valdez, Fall of the Berlin Wall, future of journalism, Galaxy Zoo, Hacker Ethic, Haight Ashbury, hive mind, Howard Rheingold, invention of the telegraph, jimmy wales, Johannes Kepler, John Harrison: Longitude, Kevin Kelly, linked data, Netflix Prize, New Journalism, Nicholas Carr, Norbert Wiener, openstreetmap, P = NP, Pluto: dwarf planet, profit motive, Ralph Waldo Emerson, RAND corporation, Ray Kurzweil, Republic of Letters, RFID, Richard Feynman, Ronald Reagan, semantic web, slashdot, social graph, Steven Pinker, Stewart Brand, technological singularity, Ted Nelson, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas Malthus, Whole Earth Catalog, X Prize

But because each of these has a different way of identifying the book, there’s no easy way to write a program that will reliably pull all that information together. If each of these sites followed the conventions specified by the Semantic Web—initiated by Sir Tim Berners-Lee, the inventor of the World Wide Web, around the turn of the millennium—computer programs could far more easily know that these sites were referring to the same book. In fact, the Semantic Web would make it possible to share far more complex information from across multiple sites. Agreeing on how to encode metadata makes the Net capable of expressing more knowledge than was put into it. That is the very definition of a smart network. But creating that metadata can be difficult, especially since many Semantic Web adherents originally proceeded by trying to write large, complex, logical representations of domains of the world. Writing these ontologies, as they are called, can be difficult.

If you’re just trying to write a model of, say, knitting, it might not be too complex; you’d have to represent all the objects (needles, yarns, patterns, knitters, knit goods, etc.) and all their relationships (knit goods have knitters, knitters use needles, needles have sizes, etc.). But writing an ontology of financial markets would require agreeing on exactly what the required definitional elements of a “trade,” “bond,” “regulation,” and “report” are—as well as on every detail and every connection with other domains, such as law, economics, and politics. So, some supporters of the Semantic Web (including Tim Berners-Lee 8) decided that there would be faster and enormous benefits to making data accessible in standardized but imperfect form—as what is called “Linked Data”—without waiting for agreement about overarching ontologies. So, if you have a store of information about, say, chemical elements, you can make it available on the Web as a series of basic assertions that are called “triples” because they have the form of two objects joined by a relation: “Mercury is an element.”

Now any application that wants to understand your triple knows that the relationship is the one defined over on the Dublin Core site. This approach may be messy and imperfect, but it is 100 percent better than not releasing data because you haven’t figured out how to get the metadata perfectly right. The rise of Linked Data encapsulates the transformation of knowledge we have explored throughout this book. While the original Semantic Web emphasized building ontologies that are “knowledge representations” of the world, it turns out that if we go straight to unleashing an abundance of linked but imperfect data, making it widely and openly available in standardized form, the Net becomes a dramatically improved infrastructure for knowledge. Linked Data is nevertheless itself only an example of a more expansive practice: Create metadata so your information can be reused.


pages: 397 words: 102,910

The Idealist: Aaron Swartz and the Rise of Free Culture on the Internet by Justin Peters

4chan, activist lawyer, Any sufficiently advanced technology is indistinguishable from magic, Bayesian statistics, Brewster Kahle, buy low sell high, crowdsourcing, disintermediation, don't be evil, global village, Hacker Ethic, hypertext link, index card, informal economy, information retrieval, Internet Archive, invention of movable type, invention of writing, Isaac Newton, John Markoff, Joi Ito, Lean Startup, moral panic, Paul Buchheit, Paul Graham, profit motive, RAND corporation, Republic of Letters, Richard Stallman, selection bias, semantic web, Silicon Valley, social web, Steve Jobs, Steven Levy, Stewart Brand, strikebreaker, Vannevar Bush, Whole Earth Catalog, Y Combinator

. ,” Schoolyard Subversion, August 16, 2000, http://web.archive.org/web/20010514192627/http://swartzfam.com/aaron/school/2000/08/16/.
41 Aaron Swartz, “The Weight of School,” Schoolyard Subversion, October 8, 2000, http://web.archive.org/web/20010517235916/http://swartzfam.com/aaron/school/2000/10/08/.
42 Aaron Swartz, “Welcome to Unschooling,” Schoolyard Subversion, April 5, 2001, http://web.archive.org/web/20010502005216/http:/swartzfam.com/aaron/school/2001/04/05/.
43 Robert Swartz, interview.
44 Tim Berners-Lee, James Hendler, and Ora Lassila, “The Semantic Web,” Scientific American, May 17, 2001, http://www.scientificamerican.com/article/the-semantic-web/.
45 Aaron Swartz, “I think there is a,” Aaron Swartz: The Weblog, January 14, 2002, http://www.aaronsw.com/weblog/000111.
46 Felter, interview.
47 Wilcox-O’Hearn, “Part 1.”
48 Eldred, “Battle of the Books.”
49 Interview with Lisa Rein, January 2013.
50 Interview with Ben Adida, January 2013.
51 Rein, interview.
52 Aaron Swartz, “Emerging Technologies—Day 2,” Aaron Swartz: The Weblog, May 15, 2002, http://www.aaronsw.com/weblog/000254.
53 Aaron Swartz, “May 13, 2002: Visiting Google,” Google Weblog, May 13, 2002, http://google.blogspace.com/archives/000252.
54 Felter, interview.
55 Aaron Swartz, “Emerging Technologies—Day 3,” Aaron Swartz: The Weblog, May 16, 2002, http://www.aaronsw.com/weblog/000255.
56 Ibid.
57 Eric Eldred to Book People mailing list, October 19, 1998, http://onlinebooks.library.upenn.edu/webbin/bparchive?

,” a developer named Gabe Beged-Dov wrote to an online mailing list on July 3, 2000.38 Swartz responded: “I generally try not to mention my age, because I find that unfortunately some people immediately discredit me because of it. :-(, Thanks to everyone who is able to put aside their prejudices not only in age, but in all matters, so that work on standards like these can go ahead and we can build the Web of the future. I don’t know about all of you, but I get very excited when I think about the possibilities for the Semantic Web. The sooner we get standards, the better. It’s not hard—even an 8th grader can do it! :-) So let’s get moving.”39 Swartz attended a private school, North Shore Country Day, in Winnetka, Illinois, and he chafed at its rules and customs. After-school sports were mandatory, much to his dismay. (“I narrowly escaped another day of practice due to an awful migraine headache. I don’t know which is worse: the headache or practice,” he blogged in August 2000.)40 Students were burdened with too much homework and too many course requirements.

The W3C members were among the first to understand the potential value and power of metadata as a solution to the search-and-retrieval problems that plagued the Web. Just as a supermarket checkout machine scans a bar code to determine exactly what you’re buying and how much it costs, a computer reads a website’s metadata to acquire salient information about that site and the content therein. In a 2001 Scientific American article, Berners-Lee, James Hendler, and Ora Lassila made the case for a metadata-rich “Semantic Web” as one “in which information is given well-defined meaning, better enabling computers and people to work in cooperation. . . . In the near future, these developments will usher in significant new functionality as machines become much better able to process and ‘understand’ the data that they merely display at present.”44 The idea sounded great to Swartz. “The future will be made of thousands of small pieces—computers, protocols, programming languages and people—all working together,” he wrote in January 2002.


pages: 291 words: 77,596

Total Recall: How the E-Memory Revolution Will Change Everything by Gordon Bell, Jim Gemmell

airport security, Albert Einstein, book scanning, cloud computing, conceptual framework, Douglas Engelbart, full text search, information retrieval, invention of writing, inventory management, Isaac Newton, John Markoff, lifelogging, Menlo Park, optical character recognition, pattern recognition, performance metric, RAND corporation, RFID, semantic web, Silicon Valley, Skype, social web, statistical model, Stephen Hawking, Steve Ballmer, Ted Nelson, telepresence, Turing test, Vannevar Bush, web application

Bell, Gordon, and Jim Gemmell. 2002. “A Call for the Home Media Network.” Communications of the ACM 45, no. 7 (July): 71-75. Association for Computing Machinery, Inc. Montalbano, Elizabeth. 2008. “IBM Pledges $1 Billion to Unified Communications.” PC World (March 11). O’Reilly, Paul. 2009. “Managing Unified Communications Performance.” CRN (March 9). Semantic Web: Berners-Lee, T., and J. Hendler. 2001. “Scientific Publishing on the Semantic Web.” Nature (26 April). W3C Semantic Web Frequently Asked Questions. http://www.w3.org/RDF/FAQ British Library Digital Lives Project and conference: Digital Lives Research Project Web page. http://www.bl.uk/digital-lives First Digital Lives Research Conference: Personal Digital Archives for the 21st Century. British Library, St. Pancras, London, February 9-11, 2009. Randy Hahn helped us craft the story about him.

Translation software is required to preserve the correct meaning between systems. As anyone who has translated between languages knows, a word-for-word translation is inadequate; it gives us translations that turn “The spirit is willing but the flesh is weak” to “The alcohol is good but the meat is bad.” Likewise, it can be difficult to translate between storage formats, and a lot of work is yet to go into this effort. The Semantic Web, which aims to standardize transmission and translation of information, is an important effort in this area. There will also be a unification of networking in the sense that we will cease to have distinct networks for different types of data. Already we get telephone over our cable TV network and TV shows over our telephone’s DSL. Eventually, we will get a digital dial tone that carries anything and everything.

ScanMyPhotos.com scanners and digitizing books and file formats and implementation of Total Recall and memex and organization of data and origin of MyLifeBits pen scanners scanning services Schacter, Daniel scholarship science fiction scientific method scrapbooking screensavers Scripps Genomic Health Initiative searching data and associative memory and data analysis desktop search and e-books and implementation of Total Recall and lifelogging Second Life security of data and adaptation to lifelogging and education and encryption and ownership of health records and passwords and privacy self-awareness semantic memory Semantic Web SenseCam and CARPE and diet monitoring and memory aids origin of and summarization of data and travelogues sensory technology. See also biometric sensors The Seven Sins of Memory: How the Mind Forgets and Remembers (Schacter) sexual molestation memories sharing data sheet music shopping lists The Simpsons situational awareness Sixth Sense system Sky Server sleep data Slidescanning.com SmartDraw smartphones.


Beautiful Visualization by Julie Steele

barriers to entry, correlation does not imply causation, data acquisition, database schema, Drosophila, en.wikipedia.org, epigenetics, global pandemic, Hans Rosling, index card, information retrieval, iterative process, linked data, Mercator projection, meta analysis, meta-analysis, natural language processing, Netflix Prize, pattern recognition, peer-to-peer, performance metric, QR code, recommendation engine, semantic web, social graph, sorting algorithm, Steve Jobs, web application, wikimedia commons

The nodes in Figure 7-9 self-organize, in the graph space, by their ties to also bought books. This allows similar books to self-organize together to form clusters of like topics, which reveal the human communities of interest behind the book clusters. In Figure 7-9, two obvious groupings cling together by topic: The bottom-right grouping is all about programmers and programming. The grouping at the top of the graph is all about the Semantic Web. Although clusters emerge in Figure 7-9, they are not as obvious as some others that we will see later; these clusters are intermixed and overlap, especially around other books about modern programming methods and processes. Figure 7-9. The network neighborhood of books surrounding Beautiful Data In addition to clusters of like topics, in Figure 7-9 there are clusters around the publishers, designated by the node colors: red books connect to other red books and yellows connect to other yellows.

This measure reveals which nodes play a similar role in a network. Equivalent nodes may be substitutable for one another in the network. As an author, I would not like my book to be substitutable with many other books! As a reader, however, I would like equivalent choices. In Figure 7-9, the two books with the most similar link pattern to Beautiful Data are Cloud Application Architectures and Programming the Semantic Web. Another value-added service that Amazon provides is reader-submitted book reviews. A person considering the purchase of a particular book may be aided by the many reviews that accumulate. Unfortunately, the reviews can be skewed: an author with a large personal network can quickly get a dozen or more glowing reviews of his latest book posted to Amazon, and a reader with a grudge can do the opposite.

Our example is taken from the fields of art history and archaeology, as these are my trained areas of expertise. However, the findings I present here—namely, that it is possible to visualize the complex structures of databases—can also be demonstrated for many other structured data collections, including biological research databases and massive collaborative efforts such as DBpedia, Freebase, or the Semantic Web. All these data collections share a number of properties, which are not straightforward but are important if we want to make use of the recorded data or if we have to decide where and how our energies and funds should be spent in improving them. Curated databases in art history and archaeology come in a number of flavors, such as library catalogs and bibliographies, image archives, museum inventories, and more general research databases.


pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem

Amazon Web Services, anti-pattern, bioinformatics, commoditize, corporate governance, create, read, update, delete, data acquisition, en.wikipedia.org, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, web application

However, in situations where you need to capture meta-intent, effectively qualifying one relationship with another (e.g., I like the fact that you liked that car), hypergraphs typically require fewer primitives than property graphs.
Triples
Triple stores come from the Semantic Web movement, where researchers are interested in large-scale knowledge inference by adding semantic markup to the links that connect Web resources.10 To date, very little of the Web has been marked up in a useful fashion, so running queries across the semantic layer is uncommon. Instead, most effort in the Semantic Web appears to be invested in harvesting useful data and relationship information from the Web (or other more mundane data sources, such as applications) and depositing it in triple stores for querying. A triple is a subject-predicate-object data structure.

A triple is a subject-predicate-object data structure. Using triples, we can capture facts, such as “Ginger dances with Fred” and “Fred likes ice cream.” Individually, single triples are semantically rather poor, but en masse they provide a rich dataset from which to harvest knowledge and infer connections. Triple stores typically provide SPARQL capabilities to reason about stored RDF data.11 RDF, the lingua franca of triple stores and the Semantic Web, can be serialized several ways. RDF encoding of a simple three-node graph shows the RDF/XML format. Here we see how triples come together to form linked data.
RDF encoding of a simple three-node graph.
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://www.example.org/ter
      <rdf:Description rdf:about="http://www.example.org/ginger">
        <name>Ginger Rogers</name>
        <occupation>dancer</occupation>
        <partner rdf:resource="http://www.example.org/fred"/>
      </rdf:Description>
      <rdf:Description rdf:about="http://www.example.org/fred">
        <name>Fred Astaire</name>
        <occupation>dancer</occupation>
        <likes rdf:resource="http://www.example.org/ice-cream"/>
      </rdf:Description>
    </rdf:RDF>
10. http://www.w3.org/standards/semanticweb/
11. See http://www.w3.org/TR/rdf-sparql-query/ and http://www.w3.org/RDF/
W3C support
That they produce logical representations of triples doesn’t mean triple stores necessarily have triple-like internal implementations. Most triple stores, however, are unified by their support for Semantic Web technology such as RDF and SPARQL. While there’s nothing particularly special about RDF as a means of serializing linked data, it is endorsed by the W3C and therefore benefits from being widely understood and well documented. The query language SPARQL benefits from similar W3C patronage. In the graph database space there is a similar abundance of innovation around graph serialization formats (e.g.
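As a rough illustration of how the RDF/XML listing above decomposes into subject-predicate-object triples, here is a minimal Python sketch that uses only the standard library. The default namespace in the excerpt is truncated, so `VOCAB_NS` below is a hypothetical stand-in, and `rdfxml_to_triples` is an illustrative helper, not part of any RDF library. It handles only the flat `rdf:Description` shape shown here; a real application would use a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical stand-in: the default namespace in the excerpt is truncated.
VOCAB_NS = "http://example.invalid/vocab#"

RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{VOCAB_NS}">
  <rdf:Description rdf:about="http://www.example.org/ginger">
    <name>Ginger Rogers</name>
    <occupation>dancer</occupation>
    <partner rdf:resource="http://www.example.org/fred"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.example.org/fred">
    <name>Fred Astaire</name>
    <occupation>dancer</occupation>
    <likes rdf:resource="http://www.example.org/ice-cream"/>
  </rdf:Description>
</rdf:RDF>"""


def rdfxml_to_triples(text):
    """Flatten simple rdf:Description blocks into (subject, predicate, object) tuples."""
    triples = []
    root = ET.fromstring(text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # Strip the "{namespace}" prefix to keep only the local property name.
            predicate = prop.tag.split("}", 1)[1]
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # A property is either a link to another resource or a literal value.
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


triples = rdfxml_to_triples(RDF_XML)
for triple in triples:
    print(triple)
```

Each `rdf:Description` element supplies the subject, each child element supplies a predicate, and the object is either the `rdf:resource` link or the literal text, which is how the two descriptions together yield a small three-node graph.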


pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, bitcoin, blockchain, business intelligence, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, database schema, DevOps, distributed ledger, Donald Knuth, Edward Snowden, Ethereum, ethereum blockchain, fault tolerance, finite state, Flash crash, full text search, general-purpose programming language, informal economy, information retrieval, Infrastructure as a Service, Internet of things, iterative process, John von Neumann, Kubernetes, loose coupling, Marc Andreessen, microservices, natural language processing, Network effects, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, statistical model, undersea cable, web application, WebSocket, wikimedia commons

_:idaho a :Location; :name "Idaho"; :type "state"; :within _:usa.
_:usa a :Location; :name "United States"; :type "country"; :within _:namerica.
_:namerica a :Location; :name "North America"; :type "continent".
The semantic web
If you read more about triple-stores, you may get sucked into a maelstrom of articles written about the semantic web. The triple-store data model is completely independent of the semantic web; for example, Datomic [40] is a triple-store that does not claim to have anything to do with it.vii But since the two are so closely linked in many people’s minds, we should discuss them briefly. The semantic web is fundamentally a simple and reasonable idea: websites already publish information as text and pictures for humans to read, so why don’t they also publish information as machine-readable data for computers to read?

The Resource Description Framework (RDF) [41] was intended as a mechanism for different websites to publish data in a consistent format, allowing data from different websites to be automatically combined into a web of data, a kind of internet-wide “database of everything.” Unfortunately, the semantic web was overhyped in the early 2000s but so far hasn’t shown any sign of being realized in practice, which has made many people cynical about it. It has also suffered from a dizzying plethora of acronyms, overly complex standards proposals, and hubris. However, if you look past those failings, there is also a lot of good work that has come out of the semantic web project. Triples can be a good internal data model for applications, even if you have no interest in publishing RDF data on the semantic web.
The RDF data model
The Turtle language we used in Example 2-7 is a human-readable format for RDF data. Sometimes RDF is also written in an XML format, which does the same thing much more verbosely (see Example 2-8).
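To make the claim that triples can serve as an internal data model more concrete, the following Python sketch holds the location triples from the Turtle excerpt in a plain list and follows the `:within` edges upward. The helper names `objects` and `enclosing_locations` are hypothetical and chosen for illustration; this is a sketch of the idea, not how any particular triple-store is implemented.

```python
# Triples mirroring the Turtle excerpt: (subject, predicate, object).
TRIPLES = [
    ("_:idaho", "a", ":Location"),
    ("_:idaho", ":name", "Idaho"),
    ("_:idaho", ":type", "state"),
    ("_:idaho", ":within", "_:usa"),
    ("_:usa", "a", ":Location"),
    ("_:usa", ":name", "United States"),
    ("_:usa", ":type", "country"),
    ("_:usa", ":within", "_:namerica"),
    ("_:namerica", "a", ":Location"),
    ("_:namerica", ":name", "North America"),
    ("_:namerica", ":type", "continent"),
]


def objects(subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def enclosing_locations(node):
    """Follow :within edges upward and collect the name of each enclosing location."""
    names = []
    seen = {node}  # guards against accidental cycles in the data
    parents = objects(node, ":within")
    while parents:
        parent = parents[0]
        if parent in seen:
            break
        seen.add(parent)
        names.extend(objects(parent, ":name"))
        parents = objects(parent, ":within")
    return names


print(enclosing_locations("_:idaho"))  # ['United States', 'North America']
```

Because properties (`:name`, `:type`) and edges (`:within`) are stored the same way, one lookup function serves both, which is part of what makes the model convenient inside an application.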

in databases, Dataflow Through Databases-Archival storage
in message-passing, Distributed actor frameworks
in service calls, Data encoding and evolution for RPC
flexibility in document model, Schema flexibility in the document model
for analytics, Stars and Snowflakes: Schemas for Analytics-Stars and Snowflakes: Schemas for Analytics
for JSON and XML, JSON, XML, and Binary Variants
merits of, The Merits of Schemas
schema migration on railways, Reprocessing data for application evolution
Thrift and Protocol Buffers, Thrift and Protocol Buffers-Datatypes and schema evolution
schema evolution, Field tags and schema evolution
traditional approach to design, fallacy in, Deriving several views from the same event log
searches
building search indexes in batch processes, Building search indexes
k-nearest neighbors, Specialization for different domains
on streams, Search on streams
partitioned secondary indexes, Partitioning and Secondary Indexes
secondaries (see leader-based replication)
secondary indexes, Other Indexing Structures, Glossary
partitioning, Partitioning and Secondary Indexes-Partitioning Secondary Indexes by Term, Summary
document-partitioned, Partitioning Secondary Indexes by Document
index maintenance, Maintaining derived state
term-partitioned, Partitioning Secondary Indexes by Term
problems with dual writes, Keeping Systems in Sync, Reasoning about dataflows
updating, transaction isolation and, The need for multi-object transactions
secondary sorts, Sort-merge joins
sed (Unix tool), Simple Log Analysis
self-describing files, Code generation and dynamically typed languages
self-joins, Summary
self-validating systems, A culture of verification
semantic web, The semantic web
semi-synchronous replication, Synchronous Versus Asynchronous Replication
sequence number ordering, Sequence Number Ordering-Timestamp ordering is not sufficient
generators, Synchronized clocks for global snapshots, Noncausal sequence number generators
insufficiency for enforcing constraints, Timestamp ordering is not sufficient
Lamport timestamps, Lamport timestamps
use of timestamps, Timestamps for ordering events, Synchronized clocks for global snapshots, Noncausal sequence number generators
sequential consistency, Implementing linearizable storage using total order broadcast
serializability, Isolation, Weak Isolation Levels, Serializability-Performance of serializable snapshot isolation, Glossary
linearizability versus, What Makes a System Linearizable?


pages: 518 words: 49,555

Designing Social Interfaces by Christian Crumlish, Erin Malone

A Pattern Language, Amazon Mechanical Turk, anti-pattern, barriers to entry, c2.com, carbon footprint, cloud computing, collaborative editing, creative destruction, crowdsourcing, en.wikipedia.org, Firefox, game design, ghettoisation, Howard Rheingold, hypertext link, if you build it, they will come, Merlin Mann, Nate Silver, Network effects, Potemkin village, recommendation engine, RFC: Request For Comment, semantic web, SETI@home, Skype, slashdot, social graph, social software, social web, source of truth, stealth mode startup, Stewart Brand, telepresence, The Wisdom of Crowds, web application

Social Metadata and Future Uses
Today’s metadata and future uses
Much of the future lies in an idea that has been around for many years. This future is embracing the Semantic Web and similar tools, building on semantic information. Semantically relevant metadata improves relevance in information and media retrieval. The Semantic Web has had a chicken-and-egg problem, as it has the tools to do fantastic things with structured information, but it has been held back by the lack of that structured information at a scale that will make a difference. Today’s social semistructured information gives enough of a boost that Semantic Web tools can begin to provide their long-promised power. The Semantic Web is based on triples of information: subject, predicate, and object. Using what we have today, this turns into: Thomas uses the resource tag; Thomas’s resource tag points to web page X.

Social Metadata and Future Uses
Summary
In short, the social tools we are using today are letting us focus on what we care about, and through the use of lightweight connections and light form fields, are capturing and building a web of semistructured information. The web of semistructured information working as metadata provides enough of a foundation to be used as structured elements, which are the fodder for using Semantic Web tools. This use of the Semantic Web tools leads to better relevance and discernment, providing drastically better search to find exactly what the seeker wants, not just what is good enough. This also provides much better capability for aggregating information people care about and would like to keep closer to them. —Thomas Vander Wal, Principal & Sr. Consultant, InfoCloud Solutions (http://infocloudsolutions.com) See Chapter 17 for a further discussion of microformats and semantic markup in general.


pages: 400 words: 94,847

Reinventing Discovery: The New Era of Networked Science by Michael Nielsen

Albert Einstein, augmented reality, barriers to entry, bioinformatics, Cass Sunstein, Climategate, Climatic Research Unit, conceptual framework, dark matter, discovery of DNA, Donald Knuth, double helix, Douglas Engelbart, en.wikipedia.org, Erik Brynjolfsson, fault tolerance, Fellow of the Royal Society, Firefox, Freestyle chess, Galaxy Zoo, Internet Archive, invisible hand, Jane Jacobs, Jaron Lanier, Johannes Kepler, Kevin Kelly, Magellanic Cloud, means of production, medical residency, Nicholas Carr, P = NP, publish or perish, Richard Feynman, Richard Stallman, selection bias, semantic web, Silicon Valley, Silicon Valley startup, Simon Singh, Skype, slashdot, social intelligence, social web, statistical model, Stephen Hawking, Stewart Brand, Ted Nelson, The Death and Life of Great American Cities, The Nature of the Firm, The Wisdom of Crowds, University of East Anglia, Vannevar Bush, Vernor Vinge

[12] Yochai Benkler. Coase’s penguin, or, Linux and The Nature of the Firm. The Yale Law Journal, 112:369–446, 2002. [13] Yochai Benkler. The Wealth of Networks. New Haven: Yale University Press, 2006. [14] Tim Berners-Lee. Weaving the Web. New York: Harper Business, 2000. [15] Tim Berners-Lee and James Hendler. Publishing on the semantic web. Nature, 410:1023–1024, April 26, 2001. [16] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American, May 17, 2001. [17] Mario Biagioli. Galileo’s Instruments of credit: Telescopes, images, secrecy. Chicago: University of Chicago Press, 2006. [18] Peter Block. Community: The Structure of Belonging. San Francisco: Berrett Koehler, 2008. [19] Barry Boehm, Bradford Clark, Ellis Horowitz, Ray Madachy, Richard Shelby, and Chris Westland.

It was out of that mess of ideas that the first airplanes slowly emerged. In a similar way, today thousands of people and organizations have their own ideas about the best way to build the data web. All are aiming in roughly the same direction, but there are many differences in the details. Perhaps the best-known effort comes from academia, where many researchers are developing an approach called the semantic web. In the business world, the state of affairs is more fluid, as companies try out many different ways of sharing data. Because of these many approaches, there are passionate arguments about the best way to build the data web, often carried out with great conviction and certainty. But the data web is still in its infancy, and it’s too early to say which approach will succeed. For these reasons, I’ll use the term “data web” rather loosely to refer to all open data, taken together in aggregate.

Interestingly, Hydra has played and lost twice in games of correspondence chess, against correspondence chess grandmaster Arno Nickel. Nickel was, however, allowed to use computer chess programs in these games. A full record of Hydra’s games may be found at [40]. p 119: Chuck Hansen’s book is [92]. The story I recount about Hansen’s methodology is told in Richard Rhodes’s book How to Write, [182], page 61. p 120: On the semantic web, see [16, 15] and http://www.w3.org/standards/semanticweb/. A stimulating alternate point of view is [88]. p 120: For Obama’s memorandum on transparency and open government, see [158]. p 123: The beautiful summary of Einstein’s general theory of relativity, “Spacetime tells matter how to move; matter tells spacetime how to curve,” is due to John Wheeler [240]. p 125 these models have no understanding of the meaning of “hola” or “hello”: I use the term “understanding” here in its everyday sense.


Martin Kleppmann-Designing Data-Intensive Applications. The Big Ideas Behind Reliable, Scalable and Maintainable Systems-O’Reilly (2017) by Unknown

active measures, Amazon Web Services, bitcoin, blockchain, business intelligence, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, database schema, DevOps, distributed ledger, Donald Knuth, Edward Snowden, Ethereum, ethereum blockchain, fault tolerance, finite state, Flash crash, full text search, general-purpose programming language, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, Kubernetes, loose coupling, Marc Andreessen, microservices, natural language processing, Network effects, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, statistical model, undersea cable, web application, WebSocket, wikimedia commons

_:lucy a :Person; :name "Lucy"; :bornIn _:idaho.
_:idaho a :Location; :name "Idaho"; :type "state"; :within _:usa.
_:usa a :Location; :name "United States"; :type "country"; :within _:namerica.
_:namerica a :Location; :name "North America"; :type "continent".
The semantic web
If you read more about triple-stores, you may get sucked into a maelstrom of articles written about the semantic web. The triple-store data model is completely independent of the semantic web; for example, Datomic [40] is a triple-store that does not claim to have anything to do with it.vii But since the two are so closely linked in many people’s minds, we should discuss them briefly. The semantic web is fundamentally a simple and reasonable idea: websites already publish information as text and pictures for humans to read, so why don’t they also publish information as machine-readable data for computers to read?

The Resource Description Framework (RDF) [41] was intended as a mechanism for different websites to publish data in a consistent format, allowing data from different websites to be automatically combined into a web of data—a kind of internet-wide “database of everything.” Unfortunately, the semantic web was overhyped in the early 2000s but so far hasn’t shown any sign of being realized in practice, which has made many people cynical about it. It has also suffered from a dizzying plethora of acronyms, overly complex standards proposals, and hubris. However, if you look past those failings, there is also a lot of good work that has come out of the semantic web project. Triples can be a good internal data model for applications, even if you have no interest in publishing RDF data on the semantic web.
The RDF data model
The Turtle language we used in Example 2-7 is a human-readable format for RDF data. Sometimes RDF is also written in an XML format, which does the same thing much more verbosely—see Example 2-8.

Because RDF doesn’t distinguish between properties and edges but just uses predicates for both, you can use the same syntax for matching properties. In the following expression, the variable usa is bound to any vertex that has a name property whose value is the string "United States":
(usa {name:'United States'}) # Cypher
?usa :name "United States". # SPARQL
SPARQL is a nice query language—even if the semantic web never happens, it can be a powerful tool for applications to use internally.
Graph Databases Compared to the Network Model
In “Are Document Databases Repeating History?” on page 36 we discussed how CODASYL and the relational model competed to solve the problem of many-to-many relationships in IMS. At first glance, CODASYL’s network model looks similar to the graph model.
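The SPARQL pattern quoted in this excerpt, ?usa :name "United States", can be illustrated with a minimal Python sketch. This is not how a real SPARQL engine works; it assumes a plain in-memory list holding a subset of the Lucy/Idaho triples, and the names TRIPLES and match are hypothetical helpers introduced only for illustration. None plays the role of a query variable.

```python
# Illustrative only: a tiny in-memory triple list, not a real SPARQL engine.
# The data is a subset of the Lucy/Idaho triples from the excerpt.
TRIPLES = [
    ("_:lucy", ":name", "Lucy"),
    ("_:lucy", ":bornIn", "_:idaho"),
    ("_:idaho", ":name", "Idaho"),
    ("_:idaho", ":within", "_:usa"),
    ("_:usa", ":name", "United States"),
    ("_:usa", ":within", "_:namerica"),
    ("_:namerica", ":name", "North America"),
]


def match(pattern, triples=TRIPLES):
    """Return every triple that agrees with the pattern on its non-None slots."""
    return [
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# Roughly equivalent in spirit to:  ?usa :name "United States".
usa_vertices = [subject for subject, _, _ in match((None, ":name", "United States"))]
print(usa_vertices)  # prints ['_:usa']
```

Because the same three-slot pattern handles both properties (:name) and edges (:within), a call such as match(("_:idaho", ":within", None)) follows an edge with no extra syntax, which is the point the excerpt makes about predicates.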


pages: 255 words: 76,495

The Facebook era: tapping online social networks to build better products, reach new audiences, and sell more stuff by Clara Shih

business process, call centre, Clayton Christensen, cloud computing, commoditize, conceptual framework, corporate governance, crowdsourcing, glass ceiling, jimmy wales, Mark Zuckerberg, Metcalfe’s law, Network effects, pets.com, pre–internet, rolodex, semantic web, sentiment analysis, Silicon Valley, Silicon Valley startup, social graph, social web, software as a service, Tony Hsieh, web application

Recognizing this, both established media players like Thomson Reuters as well as Internet upstarts like Metaweb are investing in efforts to define a “semantic Web.” These initiatives seek to classify Web content in a way that is understandable by computers so that the tedious work of linking information on the Web can be automated. For example, say there is a semantic Web system for selling used books over the Internet. The first time someone visits the site, she will be asked to identify herself with information such as name, address, e-mail, and phone number. The data provided is stored in a Resource Description Framework (RDF) file to provide context about this person for future visits to this site and any other semantic Web site. Similarly, any data provided about a particular book, such as title, publisher, ISBN number, cover image, and description would be stored in an RDF file about the textbook to provide context for any future references to this textbook.



pages: 371 words: 93,570

Broad Band: The Untold Story of the Women Who Made the Internet by Claire L. Evans

"side hustle", 4chan, Ada Lovelace, Albert Einstein, British Empire, colonial rule, computer age, crowdsourcing, dark matter, dematerialisation, Doomsday Book, Douglas Engelbart, Douglas Engelbart, Douglas Hofstadter, East Village, Edward Charles Pickering, game design, glass ceiling, Grace Hopper, Gödel, Escher, Bach, Haight Ashbury, Harvard Computers: women astronomers, Honoré de Balzac, Howard Rheingold, HyperCard, hypertext link, index card, information retrieval, Internet Archive, Jacquard loom, John von Neumann, Joseph-Marie Jacquard, knowledge worker, Leonard Kleinrock, Mahatma Gandhi, Mark Zuckerberg, Menlo Park, Mother of all demos, Network effects, old-boy network, On the Economy of Machinery and Manufactures, packet switching, pets.com, rent control, RFC: Request For Comment, rolodex, semantic web, Silicon Valley, Skype, South of Market, San Francisco, Steve Jobs, Steven Levy, Stewart Brand, subscription business, technoutopianism, Ted Nelson, telepresence, Whole Earth Catalog, Whole Earth Review, women in the workforce, Works Progress Administration, Y2K

How and why that data are linked is becoming increasingly important, especially as we teach machines to interpret connections for us—in order for artificial intelligence to understand the Web, it will need an additional layer of machine-readable information on top of our documents, a kind of meta-Web that proponents call the Semantic Web. While humans might understand connections intuitively, and are willing to ignore when links rot or lead nowhere, computers require more consistent information about the source, the destination, and the meaning of every link. “That was the core of Microcosm,” Wendy says. When she began to participate in building the Semantic Web in the 2000s, it was “so exciting because I could see all my original research ideas coming to life in the Web world. We still couldn’t do all the things we could do in Microcosm in the ’90s, but we could see how effective our linkbases were.”


About the Author
CLAIRE L. EVANS is a contributor to VICE, The Guardian, WIRED, and Aeon, and is the founding editor of Terraform, VICE’s science-fiction vertical. She is the former futures editor of VICE’s technology website, Motherboard, has contributed to Grantland, and wrote National Geographic’s popular culture and science blog, Universe.


pages: 402 words: 110,972

Nerds on Wall Street: Math, Machines and Wired Markets by David J. Leinweber

AI winter, algorithmic trading, asset allocation, banking crisis, barriers to entry, Big bang: deregulation of the City of London, business cycle, butter production in bangladesh, butterfly effect, buttonwood tree, buy and hold, buy low sell high, capital asset pricing model, citizen journalism, collateralized debt obligation, corporate governance, Craig Reynolds: boids flock, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Danny Hillis, demand response, disintermediation, distributed generation, diversification, diversified portfolio, Emanuel Derman, en.wikipedia.org, experimental economics, financial innovation, fixed income, Gordon Gekko, implied volatility, index arbitrage, index fund, information retrieval, intangible asset, Internet Archive, John Nash: game theory, Kenneth Arrow, load shedding, Long Term Capital Management, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, market fragmentation, market microstructure, Mars Rover, Metcalfe’s law, moral hazard, mutually assured destruction, Myron Scholes, natural language processing, negative equity, Network effects, optical character recognition, paper trading, passive investing, pez dispenser, phenotype, prediction markets, quantitative hedge fund, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Renaissance Technologies, risk tolerance, risk-adjusted returns, risk/return, Robert Metcalfe, Ronald Reagan, Rubik’s Cube, semantic web, Sharpe ratio, short selling, Silicon Valley, Small Order Execution System, smart grid, smart meter, social web, South Sea Bubble, statistical arbitrage, statistical model, Steve Jobs, Steven Levy, Tacoma Narrows Bridge, the scientific method, The Wisdom of Crowds, time value of money, too big to fail, transaction costs, Turing machine, Upton Sinclair, value at risk, Vernor Vinge, yield curve, Yogi Berra, your tax 
dollars at work

The new so-called adaptive market hypothesis and a certain degree of common sense allow that some news (but not all) is news to everyone at the same time, and that someone can be the first to profit from it. This opens yet another front in the algo wars. In the past year, we have seen the major news providers, Dow Jones and Reuters, offering costly high-end, low-latency news feeds designed for machines. In addition to being faster, they include extensive XML tagging for a variety of stories. These semantic Web approaches allow clever algo warriors to extract the salient facts with much greater accuracy than they could achieve writing code to parse plaintext feeds designed for human readers. What kind of tags are they talking about? The Dow Jones product is described as over 150 macroeconomic indicators, in developed markets, and a wide range of news on publicly traded U.S. and Canadian firms, as well as some in the United Kingdom.
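To make the contrast with plaintext parsing concrete, here is a small Python sketch of reading one tagged story. It is an illustration under stated assumptions: the element and attribute names (story, headline, company, indicator) and the sample content are invented for this example and do not reflect the actual Dow Jones or Reuters feed schemas, which the excerpt does not describe.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged story. Every element and attribute name here is an
# assumption made for illustration, not a real vendor schema.
SAMPLE = """
<story id="example-1">
  <headline>Example Corp raises full-year guidance</headline>
  <company ticker="EXMP" country="US"/>
  <indicator name="guidance" direction="up"/>
</story>
"""


def extract_facts(xml_text):
    """Read the tagged fields of one story without parsing any free text."""
    root = ET.fromstring(xml_text.strip())
    company = root.find("company")
    indicator = root.find("indicator")
    return {
        "ticker": company.get("ticker"),
        "event": indicator.get("name"),
        "direction": indicator.get("direction"),
        "headline": root.findtext("headline"),
    }


facts = extract_facts(SAMPLE)
print(facts["ticker"], facts["event"], facts["direction"])  # prints EXMP guidance up
```

The salient facts come straight from attributes, so the code never has to guess whether "raises" in the headline refers to guidance, prices, or capital. That is the accuracy advantage the excerpt attributes to tagged feeds.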

Collectively, the new alphabet soup of technologies—AI, IA, NLP, and IR (artificial intelligence, intelligence amplification, natural language processing, and information retrieval, for those with a bigger soup bowl)—provides a means to make sense of patterns in the data collected in enterprise and global search. These means are molecular search, the use of persistent software agents so you don’t have to keep doing the same thing all the time; the semantic Web, which uses the information associated with data at the point of origin so there is less guessing about the meaning of what you find; and modern user interfaces and visualizations, so you can prioritize what you find and focus on the important and the valuable in a timely way.
The SEC: The Mother Lode of Pre-News
The Securities and Exchange Commission is a good place to start looking for pre-news. There are many reasons for this.

This came along at the same time as other Enron (and WorldCom and Tyco)-inspired reforms in the Sarbanes-Oxley Act. The elimination of the time disadvantage for ordinary investors, paying only with their taxes and using the SEC web site, is an overdue improvement in a system that (literally) delivered yesterday’s news for its first six years of existence. Other advances were slower in coming. The filings themselves remain unstructured text files, with no sign of the semantic Web and XML ideas that are used to deliver meaningful information in many other contexts. After years of lip service to modernizing EDGAR, SEC chairman Christopher Cox (who took office in 2005) made a serious effort to do so, replacing TRW with more Internet-savvy firms and actually demonstrating prototypes that allow extraction of specific content from SEC filings. A description of the agency’s plans for this “21st Century Disclosure Initiative” is now featured prominently on the home page of the SEC site.


pages: 394 words: 118,929

Dreaming in Code: Two Dozen Programmers, Three Years, 4,732 Bugs, and One Quest for Transcendent Software by Scott Rosenberg

A Pattern Language, Benevolent Dictator For Life (BDFL), Berlin Wall, c2.com, call centre, collaborative editing, conceptual framework, continuous integration, Donald Knuth, Douglas Engelbart, Douglas Hofstadter, Dynabook, en.wikipedia.org, Firefox, Ford paid five dollars a day, Francis Fukuyama: the end of history, George Santayana, Grace Hopper, Guido van Rossum, Gödel, Escher, Bach, Howard Rheingold, HyperCard, index card, Internet Archive, inventory management, Jaron Lanier, John Markoff, John von Neumann, knowledge worker, Larry Wall, life extension, Loma Prieta earthquake, Menlo Park, Merlin Mann, Mitch Kapor, new economy, Nicholas Carr, Norbert Wiener, pattern recognition, Paul Graham, Potemkin village, RAND corporation, Ray Kurzweil, Richard Stallman, Ronald Reagan, Ruby on Rails, semantic web, side project, Silicon Valley, Singularitarianism, slashdot, software studies, source of truth, South of Market, San Francisco, speech recognition, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Stewart Brand, Ted Nelson, Therac-25, thinkpad, Turing test, VA Linux, Vannevar Bush, Vernor Vinge, web application, Whole Earth Catalog, Y2K

You could model just about anything in a simple three-part format that looked something like the subject-verb-object arrangement of a simple English sentence:
<this> <has-relationship-with> <that>
Then they discovered that the answer they’d come up with had already been outlined and at least partially implemented by researchers led by Tim Berners-Lee, the scientist who had invented the World Wide Web a dozen years before. Berners-Lee had a dream he called the Semantic Web, an upgraded version of the existing Web that relied on smarter and more complex representations of data. The Semantic Web would be built on a technical foundation called RDF, for Resource Description Framework. RDF stores all information in “triples”—statements in three parts that declare relationships between things. This was very close to the structure Sagen had independently sketched out, with the advantage that a considerable amount of work over several years had already been put into codifying the details.

In meetings through November and December, culminating in a marathon right after the New Year, the Chandler team struggled toward an elusive consensus. The RDF-based Shimmer repository was something Morgen Sagen had built expressly as a prototype; it couldn’t simply be hitched onto the real Chandler. Besides, John Anderson had never gotten the RDF religion. The whole RDF enterprise had a reputation for academic complexity and impracticality. There were lots of papers about the Semantic Web, but not a lot of working software. As one programmer after another had a look at the world of RDF, each came to a similar conclusion: It was “scary.” Anderson knew how much work programming Chandler’s user interface would be. He had been there before and understood how critical it was to keep that job manageable; it was the area most likely to cause endless delay. His chief requirement for the repository was to make things easier for the front-end developers.

“We took the plan out”: From “Painful Birth: Creating New Software Was Agonizing Task for Mitch Kapor Firm” by Paul B. Carroll, Wall Street Journal, May 11, 1990. CHAPTER 3 PROTOTYPES AND PYTHON “a crew of twenty people”: Artist Chris Cobb’s project at the Adobe Bookstore in San Francisco is chronicled at the McSweeney’s Web site at http://www.mcsweeneys.net/links/events/chriscobb.htm. Information about the Semantic Web and RDF is at http://www.w3.org/2001/sw/. “plan to throw one away” and “promise to deliver a throwaway”: Frederick Brooks, The Mythical Man-Month (Addison Wesley, 1995), pp. 115–16. “The programmer, like the poet”: Ibid., p. 7. “The lunatic, the lover, and the poet”: William Shakespeare, A Midsummer Night’s Dream, Act V, sc. i. “The process of combining multiple”: The phrase is from Wikipedia’s definition of “Abstraction (computer science).”


pages: 570 words: 115,722

The Tangled Web: A Guide to Securing Modern Web Applications by Michal Zalewski

barriers to entry, business process, defense in depth, easy for humans, difficult for computers, fault tolerance, finite state, Firefox, Google Chrome, information retrieval, RFC: Request For Comment, semantic web, Steve Jobs, telemarketer, Turing test, Vannevar Bush, web application, WebRTC, WebSocket

In the traditional HTML parser in Firefox versions prior to 4, any occurrence of “--”, later followed by “>”, is also considered good enough. The Battle over Semantics The low-level syntax of the language aside, HTML is also the subject of a fascinating conceptual struggle: a clash between the ideology and the reality of the online world. Tim Berners-Lee always championed the vision of a semantic web, an interconnected system of documents in which every functional block, such as a citation, a snippet of code, a mailing address, or a heading, has its meaning explained by an appropriate machine-readable tag (say, <cite>, <code>, <address>, or <h1> to <h6>). This approach, he and other proponents argued, would make it easier for machines to crawl, analyze, and index the content in a meaningful way, and in the near future, it would enable computers to reason using the sum of human knowledge.
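As a rough illustration of what machine-readable tags buy an automated tool, the sketch below uses Python's standard html.parser module to collect text found inside the semantic tags named in this excerpt. The class name SemanticExtractor, the helper extract, and the sample page are assumptions made for this example; real crawlers are far more involved.

```python
from html.parser import HTMLParser

# Tags the excerpt cites as carrying meaning a machine can rely on.
SEMANTIC_TAGS = {"cite", "code", "address", "h1", "h2", "h3", "h4", "h5", "h6"}


class SemanticExtractor(HTMLParser):
    """Collect (tag, text) pairs for text that sits inside a semantic tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open semantic tags, innermost last
        self.found = []   # extracted (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if self.stack and text:
            self.found.append((self.stack[-1], text))


def extract(html):
    parser = SemanticExtractor()
    parser.feed(html)
    parser.close()
    return parser.found


page = (
    "<h1>Report</h1>"
    "<div>See <cite>RFC 2616</cite> and run <code>curl -I</code>.</div>"
    "<span>Heading-like text</span>"
)
print(extract(page))
# prints [('h1', 'Report'), ('cite', 'RFC 2616'), ('code', 'curl -I')]
```

The text inside the span is skipped because nothing in the markup says what it is. A page built only from generic containers gives an extractor like this nothing to work with.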

Although tags such as <font> have been successfully obsoleted and largely abandoned in favor of CSS, this is only because stylesheets offered more powerful and consistent visual controls. With the help of CSS, the developers simply started relying on a soup of semantically agnostic <span> and <div> tags to build everything from headings to user-clickable buttons, all in a manner completely opaque to any automated content extraction tools. Despite having had a lasting impact on the design of the language, in some ways, the idea of a semantic web may be becoming obsolete: Online content less frequently maps to the concept of a single, viewable document, and HTML is often reduced to providing a convenient drawing surface and graphic primitives for JavaScript applications to build their interfaces with. * * * [25] To process HTML documents, Internet Explorer uses the Trident engine (aka MSHTML); Firefox and some derived products use Gecko; Safari, Chrome, and several other browsers use WebKit; and Opera relies on Presto.



pages: 188 words: 9,226

Collaborative Futures by Mike Linksvayer, Michael Mandiberg, Mushon Zer-Aviv

4chan, AGPL, Benjamin Mako Hill, British Empire, citizen journalism, cloud computing, collaborative economy, corporate governance, crowdsourcing, Debian, en.wikipedia.org, Firefox, informal economy, jimmy wales, Kickstarter, late capitalism, loose coupling, Marshall McLuhan, means of production, Naomi Klein, Network effects, optical character recognition, packet switching, postnationalism / post nation state, prediction markets, Richard Stallman, semantic web, Silicon Valley, slashdot, Slavoj Žižek, stealth mode startup, technoutopianism, the medium is the message, The Wisdom of Crowds, web application, WikiLeaks

The announcement of Google Wave was probably the most ambitious vision for a decentralized collaborative protocol coming from Silicon Valley. It was launched with the same celebratory terminology propagated by the self-proclaimed social media gurus, only to be terminated a year later when the vision could not live up to the hype. Web 3.0 is also bullshit. The term was initially used to describe a web enhanced by Semantic Web technologies. However, these technologies have been developed painstakingly over essentially the entire history of the web and deployed increasingly in the latter part of the last decade. Many Open Source projects reject the arbitrary and counter-productive terminology of “dot releases”: the difference between the 2.9 release and the 3.0 release should not necessarily be more substantial than the one between 2.8 and 2.9.

Publishing the entire “research compendium” under appropriate terms (e.g. usually public domain for data, a free software license for software, and a liberal Creative Commons license for articles and other content) and in open formats has recently been called “reproducible research”; in computational fields, the publication of such a compendium gives other researchers all of the tools they need to build upon one’s work. Standards are also very important for enabling scientific collaboration, and not just coarse standards like RSS. The Semantic Web and in particular ontologies have sometimes been ridiculed by consumer web developers, but they are necessary for science. How can one treat the world's scientific literature as a database if it isn't possible to identify, for example, a specific chemical or gene, and agree on a name for the chemical or gene in question that different programs can use interoperably? The biological sciences have taken a lead in implementation of semantic technologies, from ontology development and semantic databases to in-line web page annotation using RDFa.


pages: 903 words: 235,753

The Stack: On Software and Sovereignty by Benjamin H. Bratton

1960s counterculture, 3D printing, 4chan, Ada Lovelace, additive manufacturing, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, basic income, Benevolent Dictator For Life (BDFL), Berlin Wall, bioinformatics, bitcoin, blockchain, Buckminster Fuller, Burning Man, call centre, carbon footprint, carbon-based life, Cass Sunstein, Celebration, Florida, charter city, clean water, cloud computing, connected car, corporate governance, crowdsourcing, cryptocurrency, dark matter, David Graeber, deglobalization, dematerialisation, disintermediation, distributed generation, don't be evil, Douglas Engelbart, Edward Snowden, Elon Musk, en.wikipedia.org, Eratosthenes, Ethereum, ethereum blockchain, facts on the ground, Flash crash, Frank Gehry, Frederick Winslow Taylor, future of work, Georg Cantor, gig economy, global supply chain, Google Earth, Google Glasses, Guggenheim Bilbao, High speed trading, Hyperloop, illegal immigration, industrial robot, information retrieval, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invisible hand, Jacob Appelbaum, Jaron Lanier, Joan Didion, John Markoff, Joi Ito, Jony Ive, Julian Assange, Khan Academy, liberal capitalism, lifelogging, linked data, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, McMansion, means of production, megacity, megastructure, Menlo Park, Minecraft, MITM: man-in-the-middle, Monroe Doctrine, Network effects, new economy, offshore financial centre, oil shale / tar sands, packet switching, PageRank, pattern recognition, peak oil, peer-to-peer, performance metric, personalized medicine, Peter Eisenman, Peter Thiel, phenotype, Philip Mirowski, Pierre-Simon Laplace, place-making, planetary scale, RAND corporation, recommendation engine, reserve currency, RFID, Robert Bork, Sand Hill Road,
self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley ideology, Slavoj Žižek, smart cities, smart grid, smart meter, social graph, software studies, South China Sea, sovereign wealth fund, special economic zone, spectrum auction, Startup school, statistical arbitrage, Steve Jobs, Steven Levy, Stewart Brand, Stuxnet, Superbowl ad, supply-chain management, supply-chain management software, TaskRabbit, the built environment, The Chicago School, the scientific method, Torches of Freedom, transaction costs, Turing complete, Turing machine, Turing test, undersea cable, universal basic income, urban planning, Vernor Vinge, Washington Consensus, web application, Westphalian system, WikiLeaks, working poor, Y Combinator

These range from the prosaic (Google “my car keys” to find them under the couch) to the barely fathomable (“search the contagion distribution of the RNA in the virus that laid me up”). Just as for today's web pages, search providers are eager to provide more direct services built directly into query results themselves by predictively interpreting the intention of the query and providing its likely solution along with tools for the User to accomplish that intention as part of the search result. These are techniques sometimes associated with the semantic web, for which structured data are linked and associated to allow instrumental relations with other data, making the web as a whole more programmable by Users. Through various combinations of open or proprietary exegetics of data, and perhaps a sequence of application programming interfaces (APIs), a query entered as “book me a ticket to New York” can activate a series of secondary inquiries to calendars, banks, flight schedules, airline databases, bank accounts, and so on and, through this, initiate the cascading programming resulting in that booking.

The designation of semantic relations between objects, according to some disinterested (or extremely interested and capitalized) graph of addresses and their interlocking sets, might reorganize what we take to be the natural proximities of one thing to another and introduce another map (even topology) of queryable association between them. This resulting platform might provide for the programming and counterprogramming of the resulting object landscapes and event graphs, putting them to direct use, as well as providing secondary metadata about their efficacy or accuracy. Just as most of the traffic on the Internet today is machine-to-machine, or at least machine generated, so too a semantic web of things [21] would be correlated less by the cognitive dispositions or instrumental intentions of human Users, but those of “objects” and other instances within the larger meta-assemblage all querying and programming one another without human intervention or supervision. In the hype, it's easy to forget that the Internet of Things is also an Internet for Things (or for any addressable entity, however immaterial).

Kerry Stevenson, “The 3D Printer Virus, Really?” Fabbaloo, April 7, 2010, http://www.fabbaloo.com/blog/2010/4/7/the-3d-printer-virus-really.html. 20. Cory Doctorow, “Metacrap: Putting the Torch to Seven Straw-men of the Meta-Utopia,” Well, August 26, 2011. 21. Payam Barnaghi, Cory Henson, Kerry Taylor, and Wei Wang, “Semantics for the Internet of Things: Early Progress and Back to the Future,” International Journal on Semantic Web and Information Systems 8, no. 1 (2012): 1–21, http://knoesis.org/library/download/IJSWIS_SemIoT.pdf. 22. Yann Moulier-Boutang, Cognitive Capitalism (London: Polity Press, 2012). 23. Open Internet of Things Assembly, “Bill of Rights” http://postscapes.com/open-internet-of-things-assembly. (July 17, 2012). 24. See, for example, Saul A. Kripke, Naming and Necessity (Cambridge, MA: Harvard University Press, 1980).


pages: 573 words: 163,302

Year's Best SF 15 by David G. Hartwell; Kathryn Cramer

air freight, Black Swan, disruptive innovation, experimental subject, Georg Cantor, gravity well, job automation, Kuiper Belt, phenotype, semantic web

I’d seen him trudging the porticoes in Turin, hunch-shouldered, slapping his feet, always looking sly and preoccupied. You only had to see the man to know that he had an agenda like no other writer in the world. “When Calvino finished his six lectures,” mused Massimo, “they carried him off to CERN in Geneva and they made him work on the ‘Semantic Web.’ The Semantic Web works beautifully, by the way. It’s not like your foul little Internet—so full of spam and crime.” He wiped the sausage knife on an oil-stained napkin. “I should qualify that remark. The Semantic Web works beautifully—in the Italian language. Because the Semantic Web was built by Italians. They had a little bit of help from a few French Oulipo writers.” “Can we leave this place now? And visit this Italy you boast so much about? And then drop by my Italy?” “That situation is complicated,” Massimo hedged, and stood up.


Digital Accounting: The Effects of the Internet and Erp on Accounting by Ashutosh Deshmukh

accounting loophole / creative accounting, AltaVista, business continuity plan, business intelligence, business process, call centre, computer age, conceptual framework, corporate governance, data acquisition, dumpster diving, fixed income, hypertext link, interest rate swap, inventory management, iterative process, late fees, money market fund, new economy, New Journalism, optical character recognition, packet switching, performance metric, profit maximization, semantic web, shareholder value, six sigma, statistical model, supply-chain management, supply-chain management software, telemarketer, transaction costs, value at risk, web application, Y2K

This formatted memo was then run through XML Spy and the associated style sheet was generated. This style sheet is based on XHTML and will be needed to display the formatted memo exactly on the Web.

• Web communication and services: Languages in this area handle communications in the client-server environment, define protocols for exchange of information and describe Web services.

• Semantic Web and RDF: XML also provides building blocks for the Semantic Web. The Semantic Web refers to the extension of the current Web in which information definition is standardized, enabling automated tools to process data. This standardization also leads to better linking of information and easier discovery, integration and reuse of data. Such a web will enable collaborative processing of data by humans and computers in a symbiotic fashion. The primary effort by W3C in this area is RDF.
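To make the idea of standardized, machine-processable statements concrete, here is a minimal sketch of the subject-predicate-object model that RDF is built on. It is an illustration only, not an RDF library: the `Triple` type, the `match` helper, and every `example.org` identifier are assumptions introduced for this example.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical accounting statements; all example.org URIs are placeholders.
TRIPLES = [
    Triple("http://example.org/invoice/1001",
           "http://example.org/terms/issuedBy",
           "http://example.org/vendor/acme"),
    Triple("http://example.org/invoice/1001",
           "http://example.org/terms/amount",
           "250.00"),
    Triple("http://example.org/vendor/acme",
           "http://example.org/terms/name",
           "Acme Supplies"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Everything stated about invoice 1001:
about_invoice = match(TRIPLES, subject="http://example.org/invoice/1001")
```

Because the vendor URI appears as the object of one statement and the subject of another, a tool can follow it from the invoice to the vendor's name without any invoice-specific code, which is the kind of linking and reuse the passage describes.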

Examples of XML supplementary technologies

Validation and linking technologies: XML DTD, XML Schema, XLink, XBase, XPath, XPointer, XFragment
Transformation technologies: XSL, XSLT, Canonical XML, XQuery, XInclude
Processor technologies: DOM, SAX
XML applications:
• Non-text applications: MathML, SMIL, SMIL Animation, SVG, VoiceXML/CCXML
• Publishing on the Web: XHTML, XFrames, XForms
• Web communication and services: CC/PP, SOAP/XMLP, WSDL/WSCL
• Semantic Web and Resource Description Framework (RDF): RDF, RDF Schema
Security applications: XML Signature, XKMS, P3P, Encrypted Data

increasingly popular and may become a dominant method. DTDs have a solid installed base, since DTDs are used in SGML, and are not likely to vanish in the short term. The popularity of the Web can be partially traced to its hyperlink capabilities. The user can jump from one page to another across Web sites and geographical boundaries.


pages: 680 words: 157,865

Beautiful Architecture: Leading Thinkers Reveal the Hidden Beauty in Software Design by Diomidis Spinellis, Georgios Gousios

Albert Einstein, barriers to entry, business intelligence, business process, call centre, continuous integration, corporate governance, database schema, Debian, domain-specific language, don't repeat yourself, Donald Knuth, en.wikipedia.org, fault tolerance, Firefox, general-purpose programming language, iterative process, linked data, locality of reference, loose coupling, meta analysis, meta-analysis, MVC pattern, peer-to-peer, premature optimization, recommendation engine, Richard Stallman, Ruby on Rails, semantic web, smart cities, social graph, social web, SPARQL, Steve Jobs, Stewart Brand, traveling salesman, Turing complete, type inference, web application, zero-coupon bond

He has spoken at conferences around the world and writes about web-oriented technologies for several online publications. His experience has spanned the defense, financial, and commercial domains. He has designed and built network matrix switch control systems, online games, 3D simulation/visualization environments, Internet-distributed computing platforms, P2P, and Semantic Web-based systems. He has a B.S. in computer science from the College of William and Mary and currently lives in Fairfax, Virginia. He is the president of Bosatsu Consulting, Inc., a professional services company focused on web architecture, resource-oriented computing, the Semantic Web, advanced user interfaces, scalable systems, security consulting, and other technologies of the late 20th and early 21st centuries. Diomidis Spinellis is an Associate Professor in the Department of Management Science and Technology at the Athens University of Economics and Business in Greece.

It was a forked version of Apache 1.0, written in C and reflecting the state of the art at the time.[28] It has been a steady piece of Internet infrastructure since then, but it was showing its age and needed modernization, particularly to support the W3C TAG’s 303 recommendation and higher volumes of use. Most of the data was accessible through web pages or ad hoc CGI-bin scripts because at the time, the browser seemed like the only real client to serve. As we started to realize the applicability of persistent, unambiguous identifiers for use in the Semantic Web, life sciences, publication, and similar communities, we knew that it was time to rethink the architecture to be more useful for both people and software. The PURL system was designed to mediate the tension between good names and resolvable names. Anyone who has been publishing content on the Web over time knows that links break when content gets moved around. The notion of a Persistent URL is one that has a good, logical name that maps to a resolvable location.
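The mapping from a good, stable name to a resolvable location can be sketched as a small lookup that answers with a redirect. This is a toy model under stated assumptions, not the PURL server's actual design: the `PurlResolver` class, its method names, and the example paths are hypothetical, and the choice of status 303 (See Other) simply follows the excerpt's mention of the W3C TAG's 303 recommendation.

```python
from typing import Dict, Optional, Tuple


class PurlResolver:
    """Toy persistent-URL table: stable path -> current location."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def register(self, path: str, target: str) -> None:
        # Re-registering a path repoints it; the public name never changes.
        self._targets[path] = target

    def resolve(self, path: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, Location header value or None)."""
        target = self._targets.get(path)
        if target is None:
            return 404, None
        return 303, target


resolver = PurlResolver()
resolver.register("/docs/spec", "http://old-host.example.org/spec.html")
# The content moves; only the mapping is updated, so existing links survive.
resolver.register("/docs/spec", "http://new-host.example.org/v2/spec.html")
status, location = resolver.resolve("/docs/spec")
```

The design point is that publishers hand out only the stable path, so moving content requires one change in the table rather than a fix to every inbound link.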

These cache policies can be set on a per-folder, per-account, or backend basis and are enforced by a component running in its own thread in the server with lower priority, regularly inspecting the database for data that can be purged according to all policies applicable to it. Among the major missing puzzle pieces that were identified in the architecture at the 2007 meeting was how to approach searching and semantic linking. The KDE 4 platform was gaining powerful solutions for pervasive indexing, rich metadata handling, and semantic webs with the Strigi and Nepomuk projects, which could yield very interesting possibilities when integrated with Akonadi. It was unclear whether a component feeding data into Strigi for full indexing could be implemented as an agent, a separate process operating on the notifications from the core, or would need to be integrated into the server application itself for performance reasons. Since at least the full text index information would be stored outside of Akonadi, a related question was how search queries would be split up, how and where results from Strigi and Akonadi itself would be aggregated, and how queries could be passed through to backend server systems capable of online searching, such as an LDAP server.


pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts by Richard Susskind, Daniel Susskind

23andMe, 3D printing, additive manufacturing, AI winter, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, Andrew Keen, Atul Gawande, Automated Insights, autonomous vehicles, Big bang: deregulation of the City of London, big data - Walmart - Pop Tarts, Bill Joy: nanobots, business process, business process outsourcing, Cass Sunstein, Checklist Manifesto, Clapham omnibus, Clayton Christensen, clean water, cloud computing, commoditize, computer age, Computer Numeric Control, computer vision, conceptual framework, corporate governance, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, death of newspapers, disintermediation, Douglas Hofstadter, en.wikipedia.org, Erik Brynjolfsson, Filter Bubble, full employment, future of work, Google Glasses, Google X / Alphabet X, Hacker Ethic, industrial robot, informal economy, information retrieval, interchangeable parts, Internet of things, Isaac Newton, James Hargreaves, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Khan Academy, knowledge economy, lifelogging, lump of labour, Marshall McLuhan, Metcalfe’s law, Narrative Science, natural language processing, Network effects, optical character recognition, Paul Samuelson, personalized medicine, pre–internet, Ray Kurzweil, Richard Feynman, Second Machine Age, self-driving car, semantic web, Shoshana Zuboff, Skype, social web, speech recognition, spinning jenny, strong AI, supply-chain management, telepresence, The Future of Employment, the market place, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, transaction costs, Turing test, Watson beat the top human players on Jeopardy!, WikiLeaks, young professional

Next, there are machines that can interact with apparent manual skill and dexterity in the physical world (robotics). Finally, there are systems that can detect and express emotions (affective computing). Volumes have already been written on each of these four subjects. We try to give an overview rather than make an academic assessment. We are not suggesting, incidentally, that these are the only important developments. We could also have added the ‘semantic web’, ‘search algorithms’, and ‘intelligent agents’ [33]. But to debate which technologies are primary distracts from the bigger point: that, exploiting various technologies, our machines will continue to become increasingly capable, and able to discharge more and more tasks that we used to think were the distinctive province of human beings.

Big Data

In 1988, foreshadowing much that is now claimed in the field of ‘Big Data’, Harvard’s Shoshana Zuboff made the following claim in her ground-breaking book In the Age of the Smart Machine: ‘Information technology not only produces action but also produces a voice that symbolically renders events, objects, and processes so that they become visible, knowable, and shareable in a new way.’ [34] In more homely terms, she was referring to the value of the great streams of information that are generated as a by-product of computerization.

Searle, John, Mind, Language and Society (London: Weidenfeld & Nicolson, 1999). Searle, John, ‘Watson Doesn’t Know it Won on “Jeopardy!”’, Wall Street Journal, 23 Feb. 2011 <http://www.wsj.com> (accessed 28 March 2015). Seidman, Dov, How (Hoboken, NJ: Wiley, 2007). Sennett, Richard, The Craftsman (London: Penguin Books, 2009). Sennett, Richard, Together (London: Allen Lane, 2012). Shadbolt, Nigel, Wendy Hall, and Tim Berners-Lee, ‘The Semantic Web Revisited’, IEEE Intelligent Systems, 21: 3 (2006), 96–101. Shanteau, James, ‘Cognitive Heuristics and Biases in Behavioral Auditing: Review, Comments, and Observations’, Accounting, Organizations, and Society, 14: 1 (1989), 165–77. Shapiro, Carl, and Hal Varian, Information Rules (Boston: Harvard Business School Press, 1999). Shapiro, Carl, and Hal Varian, ‘Versioning: The Smart Way to Sell Information’, in James Gilmore and Joseph Pine (eds.), Markets of One (Boston: Harvard Business School Press, 2000).

Wickenden, William, A Professional Guide for Young Engineers (New York: Engineers’ Council for Professional Development, 1949). Widdicombe, Lizzie, ‘From Mars’, New Yorker, 23 Sept. 2013. WikiHouse, ‘WikiHouse 4.0’ <http://www.wikihouse.cc/news-2/> (accessed 8 March 2015). Wikistrat, ‘Become an Analyst’ <http://www.wikistrat.com/become-an-analyst> (accessed 8 March 2015). Wilensky, Harold, ‘The Professionalization of Everyone?’, American Journal of Sociology, 70: 2 (1964), 137–58. Wilks, Yorick, ‘What is the Semantic Web and What Will it Do for eScience’, Research Report, No.12, Oxford Internet Institute, October 2006. Winner, Langdon, Autonomous Technology (Cambridge, Mass.: MIT Press, 1977). Winner, Langdon, ‘Technology Today: Utopia or Dystopia?’, in Technology and the Rest of Culture, ed. Arien Mack (Columbus, Ohio: Ohio State University Press, 2001). Winograd, Terry, Language as a Cognitive Process (Boston: Addison-Wesley, 1982).


pages: 66 words: 9,247

MongoDB and Python by Niall O’Higgins

cloud computing, Debian, fault tolerance, semantic web, web application

I’ve worked with most of the usual relational databases (MSSQL Server, MySQL, PostgreSQL) and with some very interesting nonrelational databases (Freebase.com’s Graphd/MQL, Berkeley DB, MongoDB). MongoDB is at this point the system I enjoy working with the most, and choose for most projects. It sits somewhere at a crossroads between the performance and pragmatism of a relational system and the flexibility and expressiveness of a semantic web database. It has been central to my success in building some quite complicated systems in a short period of time. I hope that after reading this book you will find MongoDB to be a pleasant database to work with, and one which doesn’t get in the way between you and the application you wish to build. Conventions Used in This Book The following typographical conventions are used in this book: Italic Indicates new terms, URLs, email addresses, filenames, and file extensions.


pages: 272 words: 83,378

Digital Barbarism: A Writer's Manifesto by Mark Helprin

Albert Einstein, anti-communist, Berlin Wall, carbon footprint, computer age, crowdsourcing, hive mind, invention of writing, Jacquard loom, lateral thinking, plutocrats, race to the bottom, semantic web, Silicon Valley, Silicon Valley ideology, the scientific method, Yogi Berra, zero-sum game

Had each Turkish soldier had to decide individually whether or not to make that winter ascent, they all might have thought harder and better about it in the absence of so many others carrying them and their orders along on an utterly worthless wave of quick-set belief. In the electronic culture, however, the decision has already been made in regard to such things. To quote Jeremy Carroll, chief product architect of Top Quadrant, discussing an aspect of his work: “Semantic Web technology…” will make possible “consensus instructions from many different sources, or instructions that other people have already found helpful (rather than back-breaking searches and comparisons).”23 It is the labor, care, and learning in making such comparisons that bring the benefits of experience, a sharp eye, and good judgment. As anyone who has ever used it knows, the internet is a magnificent (if often unreliable) research tool.

See also Taxes from copyright, 111 royalties, 47, 51, 74, 78, 113, 158 A River Runs Through It (Maclean), 164 Robinson Crusoe (DeFoe), 119 Roth, Philip, 114 Royalties, 47, 51, 74, 78, 113, 158 Rushdie, Salman, 75 Russia, copyright law in, 128 S Satie, Erik, 80 Schlesinger, Arthur, 89 Schumann, Robert, 80 SEC. See Securities and Exchange Commission Second World War, 192, 196 Securities and Exchange Commission (SEC), 29 “Semantic Web technology,” 64 Seward, William H., 59–60 Sex, 17 Shakespeare, William, 179, 194 Sharpton, Al, 166 Signet Society (Harvard), 183 Silent Spring (Carson), 105 Silicon Valley, xiii, 205 Sinatra, Frank, 24 Skull and Bones (Yale), 182 Smith, Kate, 52 Social contract, 173 Socialism, 168 Social Security, 81 Social theorists, 185 Software, piracy, 38, 214 A Soldier of the Great War (Helprin), 113 Sonny Bono Copyright Term Extension Act (1998), 120, 125–126, 127, 139, 140 Sports, 94–95 Star Wars, 164 Statute of Queen Anne (1709), 124, 127 Stevens, Justice, 115 Story, Joseph, 124 Sweden, copyright law in, 128 Switzerland, copyright law in, 128 T Tartakovsky, Joseph, 87 Taubman, Arnold, 24 Taxes, 81, 86, 87, 171–172.


pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists by Gary Marcus, Jeremy Freeman

23andMe, Albert Einstein, bioinformatics, bitcoin, brain emulation, cloud computing, complexity theory, computer age, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data acquisition, Drosophila, epigenetics, global pandemic, Google Glasses, iterative process, linked data, mouse model, optical character recognition, pattern recognition, personalized medicine, phenotype, race to the bottom, Richard Feynman, Ronald Reagan, semantic web, speech recognition, stem cell, Steven Pinker, supply-chain management, Turing machine, twin studies, web application

INCF has worked closely with partners, including the Allen Institute, University of Oslo, Duke University, University of Edinburgh, and others, to develop a standard coordinate space for mouse brain data, dubbed “Waxholm Space” and web services to facilitate translation between mouse brain atlases. In addition, in collaboration with the Neuroscience Information Framework (NIF, San Diego) it has produced community consensus ontologies and nomenclatures for neurons and brain structures, which have been placed in a public wiki (www.neurolex.org) employing the latest semantic web technologies. INCF supports working groups of experts from around the world to produce new standards, tools, services, and guidelines for the global community. With the advent of multiple large-scale brain initiatives around the world, INCF is well positioned to help coordinate standards and infrastructure between such projects at a global scale. INCF has agreed to coordinate some of the tools for brain atlases in the HBP Neuroinformatics Platform, and the HBP will build off of INCF infrastructures and adhere to INCF standards and guidelines whenever applicable.

Ontologies formalize the definitions of these structures and their names (and synonyms) so that the relationships between entities are explicit. Alternatively, by annotating the data with the spatial coordinates of where it was measured, it would be associated with the volume that has been named reticular nucleus of the thalamus. With careful curation of data and annotation using next-generation semantic web technologies and spatial coordinates, each piece of data will be part of a rich brain atlas integrated with a web of knowledge about the brain. The Neuroinformatics Platform, coordinated by groups from the École Polytechnique Fédérale de Lausanne (EPFL), Karolinska Institute, University of Oslo, Forschungszentrum Jülich, Universidad Politécnica de Madrid, and Radboud Universiteit Nijmegen, will provide the tools for organizing neuroscience data in atlases that bring together collections of data about the mouse and human brains from around the world.
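The role an ontology plays in tying names and synonyms to one explicit entity can be sketched as follows. This is a minimal illustration, not the NeuroLex or INCF data model: the `Ontology` class, the `EX:0001` identifier, and the synonym list are assumptions chosen for the example.

```python
from typing import Dict, Iterable, Optional


class Ontology:
    """Toy ontology: each term has one ID, a preferred label, and synonyms."""

    def __init__(self) -> None:
        self._labels: Dict[str, str] = {}  # term ID -> preferred label
        self._index: Dict[str, str] = {}   # normalized name -> term ID

    @staticmethod
    def _normalize(name: str) -> str:
        # Case-insensitive, whitespace-insensitive matching.
        return " ".join(name.lower().split())

    def add_term(self, term_id: str, label: str,
                 synonyms: Iterable[str] = ()) -> None:
        self._labels[term_id] = label
        for name in (label, *synonyms):
            self._index[self._normalize(name)] = term_id

    def resolve(self, name: str) -> Optional[str]:
        """Map any known name or synonym to its term ID, else None."""
        return self._index.get(self._normalize(name))

    def label(self, term_id: str) -> str:
        return self._labels[term_id]


onto = Ontology()
# Placeholder ID and illustrative synonyms for one brain structure.
onto.add_term("EX:0001", "reticular nucleus of the thalamus",
              synonyms=["thalamic reticular nucleus", "TRN"])
term = onto.resolve("Thalamic  Reticular Nucleus")
```

Two datasets that label the same structure differently both resolve to one identifier, so their records can be merged on the ID rather than on free text.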


pages: 597 words: 119,204

Website Optimization by Andrew B. King

AltaVista, bounce rate, don't be evil, en.wikipedia.org, Firefox, In Cold Blood by Truman Capote, information retrieval, iterative process, Kickstarter, medical malpractice, Network effects, performance metric, search engine result page, second-price auction, second-price sealed-bid, semantic web, Silicon Valley, slashdot, social graph, Steve Jobs, web application

So, instead of this:

http://www.example.com/index.php?cat=53

do this:

http://www.example.com/index?=photovoltaic+panels

Even better, remove all the variable query characters (?, $, and #):

http://www.example.com/photovoltaic+panels

By eliminating the suffix to URIs, you avoid broken links and messy mapping when changing technologies in the future. See Chapter 9 for details on URI rewriting. See also "Cool URIs for the Semantic Web," at http://www.w3.org/TR/cooluris/.

Write compelling summaries

In newspaper parlance, the description that goes with a headline is called a deck or a blurb. Great decks summarize the story in a couple of sentences, enticing the user to read the article. Include keywords describing the major theme of the article for search engines. Don't get too bogged down in the details of your story.
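Returning to the URI cleanup described earlier in this excerpt, the mapping from a query-string address to a keyword-based one can be sketched with Python's standard `urllib.parse` module. This is a hedged illustration, not the rewriting method from the book's Chapter 9: the `CATEGORY_NAMES` table and the `clean_uri` function are hypothetical, and a real site would normally do this in its web server's rewrite rules.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit

# Hypothetical lookup from numeric category IDs to descriptive keywords.
CATEGORY_NAMES = {"53": "photovoltaic panels"}


def clean_uri(old_url: str) -> str:
    """Turn .../index.php?cat=53 into .../photovoltaic+panels.

    URLs whose category is missing or unknown are returned unchanged.
    """
    parts = urlsplit(old_url)
    category = parse_qs(parts.query).get("cat", [None])[0]
    name = CATEGORY_NAMES.get(category)
    if name is None:
        return old_url
    # quote_plus encodes spaces as "+", matching the excerpt's example.
    return f"{parts.scheme}://{parts.netloc}/{quote_plus(name)}"


new_url = clean_uri("http://www.example.com/index.php?cat=53")
```

The resulting address carries no file suffix or query characters, so it can stay the same even if the site later moves away from PHP.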

The following short example shows how the statement mentioned previously could be encoded in a web page:

<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="http://www.oreilly.com/catalog/9780596515089"> <span property="dc:creator">Andy King</span> </div>

Soon, a significant amount of traffic from search engines will depend on the extent to which the underlying site makes useful structured data available. Things such as microformats and RDFa have been around in various forms for years, but now that search engines are noticing them, SEO practitioners are starting to take note, too.

* * *

[34] http://www.techcrunch.com/2008/03/13/yahoo-embraces-the-semantic-web-expect-the-web-to-organize-itself-in-a-hurry/ [35] http://www.microformats.org [36] http://gmpg.org/xfn/11 [37] http://microformats.org/wiki/hcard [38] http://www.ietf.org/rfc/rfc2426.txt [39] http://www.w3.org/RDF/ [40] http://www.w3.org/TR/rdfa-syntax/ [41] http://www.w3.org/TR/curie

Chapter 2. SEO Case Study: PhillyDentistry.com

In this chapter, we'll show you how to put into action the optimization techniques that you learned in Chapter 1 and the conversion techniques you'll learn in Chapter 5.
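Returning to the RDFa snippet in this excerpt, the sketch below shows roughly how a consumer could pull a statement back out of such markup, using only Python's standard `html.parser`. It is a deliberate simplification and not a conforming RDFa processor: the `RdfaLiteParser` class is hypothetical, it ignores nesting scope and prefix expansion, and it handles only the `about` and `property` attributes.

```python
from html.parser import HTMLParser

SNIPPET = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'about="http://www.oreilly.com/catalog/9780596515089"> '
    '<span property="dc:creator">Andy King</span> </div>'
)


class RdfaLiteParser(HTMLParser):
    """Collects (subject, property, literal) statements from simple markup."""

    def __init__(self) -> None:
        super().__init__()
        self.subject = None      # value of the most recent "about" attribute
        self._property = None    # property awaiting its text content
        self.statements = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "about" in attributes:
            self.subject = attributes["about"]
        if "property" in attributes:
            self._property = attributes["property"]

    def handle_data(self, data):
        text = data.strip()
        if self._property and text:
            self.statements.append((self.subject, self._property, text))
            self._property = None

    def handle_endtag(self, tag):
        # A property applies only to text inside its own element.
        self._property = None


parser = RdfaLiteParser()
parser.feed(SNIPPET)
statements = parser.statements
```

Here `statements` holds one entry linking the book's URI to the creator name via `dc:creator`, which is the structured fact a search engine would read from the page.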


Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data by Dipanjan Sarkar

bioinformatics, business intelligence, computer vision, continuous integration, en.wikipedia.org, general-purpose programming language, Guido van Rossum, information retrieval, Internet of things, invention of the printing press, iterative process, natural language processing, out of africa, performance metric, premature optimization, recommendation engine, self-driving car, semantic web, sentiment analysis, speech recognition, statistical model, text mining, Turing test, web application

Semantic network around the concept fish. In the network in Figure 1-17, we can see some of the concepts discussed earlier around fish and also specific types of fish like eel, salmon, shark, and so on, which can be hyponyms to the concept fish. These semantic networks are formally denoted and represented by semantic data models using graph structures, where concepts or entities are the nodes and the edges denote the relationships. The Semantic Web is an extension of the World Wide Web using semantic metadata annotations and embeddings using data-modeling techniques like Resource Description Framework (RDF) and Web Ontology Language (OWL). In linguistics, we have a rich lexical corpus and database called WordNet, which has an exhaustive list of different lexical entities grouped together based on semantic similarity (for example, synonyms) into synsets.
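The graph view described in the excerpt above, with concepts as nodes and relationships as labeled edges, can be sketched in a few lines of Python. This is only an illustrative toy: the SemanticNetwork class and the relation names is_a and has_hyponym are assumptions made for the example, not structures taken from the book or from WordNet.

```python
from collections import defaultdict


class SemanticNetwork:
    """Toy labeled graph: nodes are concepts, edges are named relations."""

    def __init__(self):
        # Maps (source concept, relation name) to the set of target concepts.
        self.edges = defaultdict(set)

    def add(self, source, relation, target):
        self.edges[(source, relation)].add(target)

    def related(self, source, relation):
        # .get avoids creating empty entries for unknown lookups.
        return sorted(self.edges.get((source, relation), set()))


net = SemanticNetwork()
for kind in ("eel", "salmon", "shark"):
    net.add(kind, "is_a", "fish")         # hypernym direction
    net.add("fish", "has_hyponym", kind)  # hyponym direction

print(net.related("fish", "has_hyponym"))
print(net.related("eel", "is_a"))
```

Storing each relation in both directions keeps lookups simple at the cost of some duplication; an RDF store would instead keep one triple and answer both questions with a query.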

Keyphrase extraction, also known as terminology extraction, is the process of extracting the most important and relevant terms or phrases from a body of unstructured text so that the core topics or themes of the text document(s) are captured in these key phrases. This technique falls under the broad umbrella of information retrieval and extraction. Keyphrase extraction is used in many areas, including the Semantic Web, query-based search engines and crawlers, recommendation systems, tagging systems, document similarity, and translation. Keyphrase extraction is often the starting point for carrying out more complex tasks in text analytics or NLP, and the output from this can itself act as features for more complex systems. There are various approaches for keyphrase extraction. We will be covering the following two techniques: collocations and weighted tag-based phrase extraction. An important thing to remember here is that we will be extracting phrases that are usually collections of words, though sometimes they can include a single word.
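To make the collocation idea from the excerpt above concrete, here is a minimal sketch that ranks adjacent word pairs by raw frequency after dropping pairs containing a stopword. It is not the book's implementation: the top_bigrams function, the tiny STOPWORDS set, and the sample text are all assumptions made for illustration, and real collocation finders usually score pairs with an association measure such as pointwise mutual information rather than plain counts.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on", "for"}


def top_bigrams(text, n=3):
    """Return the n most frequent adjacent word pairs without stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Note: pairs are formed across sentence boundaries in this sketch.
    pairs = [
        (first, second)
        for first, second in zip(tokens, tokens[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    ]
    return Counter(pairs).most_common(n)


sample = (
    "The semantic web extends the web. Search engines index the semantic web, "
    "and search engines reward structured data."
)

for pair, count in top_bigrams(sample, 2):
    print(" ".join(pair), count)
```

On this sample the two pairs that occur twice are "semantic web" and "search engines", which is the kind of multiword phrase a keyphrase extractor is meant to surface.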


pages: 532 words: 139,706

Googled: The End of the World as We Know It by Ken Auletta

23andMe, AltaVista, Anne Wojcicki, Apple's 1984 Super Bowl advert, Ben Horowitz, bioinformatics, Burning Man, carbon footprint, citizen journalism, Clayton Christensen, cloud computing, Colonization of Mars, commoditize, corporate social responsibility, creative destruction, death of newspapers, disintermediation, don't be evil, facts on the ground, Firefox, Frank Gehry, Google Earth, hypertext link, Innovator's Dilemma, Internet Archive, invention of the telephone, Jeff Bezos, jimmy wales, John Markoff, Kevin Kelly, knowledge worker, Long Term Capital Management, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Network effects, new economy, Nicholas Carr, PageRank, Paul Buchheit, Peter Thiel, Ralph Waldo Emerson, Richard Feynman, Sand Hill Road, Saturday Night Live, semantic web, sharing economy, Silicon Valley, Skype, slashdot, social graph, spectrum auction, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, strikebreaker, telemarketer, the scientific method, The Wisdom of Crowds, Upton Sinclair, X Prize, yield management, zero-sum game

Google will have competition from Microsoft’s renamed and reengineered search engine, Bing, launched in May 2009, which in July 2009 finally succeeded in merging with Yahoo search. One could argue that the ultimate vertical search would be provided by Artificial Intelligence (AI), computers that could infer what users actually sought. This has always been an obsession of Google’s founders, and they have recruited engineers who specialize in AI. The term is sometimes used synonymously with another, “the semantic Web,” which has long been championed by Tim Berners-Lee. This vision appears to be a long way from becoming real. Craig Silverstein, Google employee number 1, said a thinking machine is probably “hundreds of years away.” Marc Andreessen suggests that it is a pipe dream. “We are no closer to a computer that thinks like a person than we were fifty years ago,” he said. Sometimes lost in the excitement over the wonders of ever more relevant search is the potential social cost.

Davenport, “Reverse Engineering Google’s Innovation Machine,” Harvard Business Review, April 2008. 324 Its social network site: author interviews with Google executives in Russia, Jason Bush, “Where Google Isn’t Goliath,” BusinessWeek, June 26, 2008. 324 “These companies air kiss”: author interview with Andrew Lack, October 4, 2007. 324 Facebook had 200 million users: author interview with Sheryl Sandberg, March 30, 2009. 324 “Anybody that gets”: author interview with Bill Campbell, October 8, 2007. 325 Lee began with: author interview with Kwan Lee, February 10, 2009. 325 “lacks a social gene”: author interview with John Borthwick, April 28, 2008. 326 “If I were Google”: author interview with Danny Sullivan, August 27, 2007. 326 The problem with horizontal search: author interview with Jason Calacanis, September 21, 2007. 327 “the semantic web”: Katie Franklin, “Google May Be Displaced, Said World Wide Web Creator Tim Berners-Lee”, Daily Telegraph, March 3, 2008. 327 “hundreds of years away”: author interview with Craig Silverstein, September 17, 2007. 327 “We are no closer”: author interview with Marc Andreessen, March 27, 2008. 327 In his provocative book: Nicholas Carr, The Big Switch: Rewiring the World, from Edison to Google, W.


pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang

AI winter, artificial general intelligence, bioinformatics, brain emulation, combinatorial explosion, complexity theory, computer vision, conceptual framework, correlation coefficient, epigenetics, friendly AI, G4S, information retrieval, Isaac Newton, John Conway, Loebner Prize, Menlo Park, natural language processing, Occam's razor, p-value, pattern recognition, performance metric, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K

It also includes the biennial ECAI, the European Conference on Artificial Intelligence, proceedings volumes, and other ECCAI – the European Coordinating Committee on Artificial Intelligence – sponsored publications. An editorial panel of internationally well-known scholars is appointed to provide a high quality selection. Series Editors: J. Breuker, R. Dieng-Kuntz, N. Guarino, J.N. Kok, J. Liu, R. López de Mántaras, R. Mizoguchi, M. Musen and N. Zhong Volume 157 Recently published in this series Vol. 156. R.M. Colomb, Ontology and the Semantic Web Vol. 155. O. Vasilecas et al. (Eds.), Databases and Information Systems IV – Selected Papers from the Seventh International Baltic Conference DB&IS’2006 Vol. 154. M. Duží et al. (Eds.), Information Modelling and Knowledge Bases XVIII Vol. 153. Y. Vogiazou, Design for Emergence – Collaborative Social Play with Online and Location-Based Media Vol. 152. T.M. van Engers (Ed.), Legal Knowledge and Information Systems – JURIX 2006: The Nineteenth Annual Conference Vol. 151.

NARS can be connected to existing knowledge bases, such as Cyc (for commonsense knowledge), WordNet (for linguistic knowledge), Mizar (for mathematical knowledge), and so on. For each of them, a special interface module should be able to approximately translate knowledge from its original format into Narsese. • The Internet. It is possible for NARS to be equipped with additional modules, which use techniques like semantic web, information retrieval, and data mining, to directly acquire certain knowledge from the Internet, and put it into Narsese. • Natural language interface. After NARS has learned a natural language (as discussed previously), it should be able to accept knowledge from various sources in that language. Additionally, interactive tutoring will be necessary, which allows a human trainer to monitor the establishing of the knowledge base, to answer questions, to guide the system to form a proper goal structure and priority distributions among its concepts, tasks, and beliefs.


pages: 165 words: 50,798

Intertwingled: Information Changes Everything by Peter Morville

A Pattern Language, Airbnb, Albert Einstein, Arthur Eddington, augmented reality, Bernie Madoff, Black Swan, business process, Cass Sunstein, cognitive dissonance, collective bargaining, disruptive innovation, index card, information retrieval, Internet of things, Isaac Newton, iterative process, Jane Jacobs, John Markoff, Lean Startup, Lyft, minimum viable product, Mother of all demos, Nelson Mandela, Paul Graham, peer-to-peer, RFID, Richard Thaler, ride hailing / ride sharing, Schrödinger's Cat, self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley startup, source of truth, Steve Jobs, Stewart Brand, Ted Nelson, The Death and Life of Great American Cities, the scientific method, The Wisdom of Crowds, theory of mind, uber lyft, urban planning, urban sprawl, Vannevar Bush, zero-sum game

N-directional links on Twitter. If you look deeper, you’ll see triples – subject, predicate, object – defining semantic relations as precisely as possible. In ontological experiments, domain-specific models of entities, relationships, and attributes push the limits of information visualization and knowledge discovery. We’re on the verge of teaching systems to make links that uncover new questions. Figure 3-4. The Semantic Web is built on triples. Of course, links aren’t limited to digital networks. A book affords random access with its index and citations. A park links places with signs, paths, and bridges. And, off course, we may need a table of contents or a map or a metaphor, so we might know where we can go from where we are. As Richard Saul Wurman says “we only understand something relative to something we already understand,” and this only grows more important as we orchestrate cross-channel services.


Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran, Jeff Hammerbacher

23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, database schema, double helix, en.wikipedia.org, epigenetics, fault tolerance, Firefox, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, longitudinal study, Mars Rover, natural language processing, openstreetmap, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social graph, SPARQL, speech recognition, statistical model, supply-chain management, text mining, Vernor Vinge, web application

How can we tell that location when it’s a property on a restaurant means the same thing as address, but location when used to describe a gene sequence means its position on a chromosome? In practice, this is usually a manual process, but if we expect to build systems that can easily integrate hundreds or thousands of databases, we need to find ways to eliminate a lot of the manual work involved in such integrations. Various efforts to resolve these naming problems have been attempted. In the Semantic Web community an effort called “Linked Open Data” has emerged, wherein people are encouraging one another to refer to specific objects (like a movie, a person, or a restaurant) by a standard Uniform Resource Identifier (URI), so everyone knows when two people are talking about the same thing. There have also been several efforts to standardize on a set of ontologies, which describe what fields should be used to describe things like a restaurant or a movie in all cases.

His paper on the Birch clustering algorithm received the SIGMOD 10-Year Test-of-Time award, and he has written the widely used text Database Management Systems (with Johannes Gehrke; McGraw-Hill). He is Chair of ACM SIGMOD, and a Fellow of the ACM and IEEE. Toby Segaran is the author of two O’Reilly titles, the very popular Programming Collective Intelligence and the recently released Programming the Semantic Web. He currently works at Metaweb, where he develops large-scale reconciliation algorithms in an attempt to create a free database of shared keys for all other public databases. Prior to working at Metaweb, he started a biotech software company, which was acquired in 2003 by Genstruct, a systems biology company. Toby has a BS in computer science from MIT and lives in San Francisco with his wife, Brooke.


pages: 219 words: 63,495

50 Future Ideas You Really Need to Know by Richard Watson

23andMe, 3D printing, access to a mobile phone, Albert Einstein, artificial general intelligence, augmented reality, autonomous vehicles, BRICs, Buckminster Fuller, call centre, clean water, cloud computing, collaborative consumption, computer age, computer vision, crowdsourcing, dark matter, dematerialisation, digital Maoism, digital map, Elon Musk, energy security, failed state, future of work, Geoffrey West, Santa Fe Institute, germ theory of disease, global pandemic, happiness index / gross national happiness, hive mind, hydrogen economy, Internet of things, Jaron Lanier, life extension, Mark Shuttleworth, Marshall McLuhan, megacity, natural language processing, Network effects, new economy, oil shale / tar sands, pattern recognition, peak oil, personalized medicine, phenotype, precision agriculture, profit maximization, RAND corporation, Ray Kurzweil, RFID, Richard Florida, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Skype, smart cities, smart meter, smart transportation, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, supervolcano, telepresence, The Wisdom of Crowds, Thomas Malthus, Turing test, urban decay, Vernor Vinge, Watson beat the top human players on Jeopardy!, web application, women in the workforce, working-age population, young professional

Web 2.0 A term often used to describe Web applications that help individuals to share information online, examples being sites such as Facebook and YouTube. Sometimes referred to as the participatory or conversational Web. Web 3.0 The next stage of Web development, although the term causes much disagreement. Sometimes refers to the ability of search engines to answer complex questions. It can also refer to the personalized Web, semantic Web or the geo-tagging of information. Web 4.0 Like Web 3.0 but immersive.


pages: 226 words: 17,533

Programming Scala: tackle multicore complexity on the JVM by Venkat Subramaniam

augmented reality, continuous integration, domain-specific language, don't repeat yourself, loose coupling, semantic web, type inference, web application

You don’t have to throw away the time, money, and effort you’ve invested writing Java code. You can intermix Scala with Java libraries. You can build full applications entirely in Scala or intermix it to the extent you desire with Java and other languages on the JVM. So, your Scala code could be as small as a script or as large as a full-fledged enterprise application. Scala has been used to build applications in various domains including telecommunications, social networking, semantic web, and digital asset management. Apache Camel uses Scala for its DSL to create routing rules. Lift WebFramework is a powerful web development framework built using Scala. It takes full advantage of Scala features such as conciseness, expressiveness, pattern matching, and concurrency. 1.2 What’s Scala? Scala, short for Scalable Language, is a hybrid functional programming language. It was created by Martin Odersky3 and was first released in 2003.


Designing Search: UX Strategies for Ecommerce Success by Greg Nudelman, Pabini Gabriel-Petit

access to a mobile phone, Albert Einstein, AltaVista, augmented reality, barriers to entry, business intelligence, call centre, crowdsourcing, information retrieval, Internet of things, performance metric, QR code, recommendation engine, RFID, search engine result page, semantic web, Silicon Valley, social graph, social web, speech recognition, text mining, the map is not the territory, The Wisdom of Crowds, web application, zero-sum game, Zipcar

Here’s one example: Google search results that extract location breadcrumbs from Web sites and display them on results pages to help users select relevant pages from among the many possibilities. Perhaps a breadcrumb-sharing service will evolve, making it easier to share meta-information—such as a location within a hierarchy and other attributes—across Web sites. I am sure someone at some startup is working on something like this now, perhaps under the semantic Web banner. Breadcrumbs may evolve to become an integral part of key navigational structures—instead of their being a last-resort mechanism, as is common today. This is what this chapter focuses on. Breadcrumbs have started out as a simple way to show a single location or a single path—which reflects constraints from the physical world—but a powerful aspect of the virtual world is that an object can live in many places, and you can find it in many different ways at the same time.


pages: 276 words: 78,094

Design for Hackers: Reverse Engineering Beauty by David Kadavy

Airbnb, complexity theory, en.wikipedia.org, Firefox, Isaac Newton, John Gruber, Paul Graham, Ruby on Rails, semantic web, Silicon Valley, Silicon Valley startup, Steve Jobs, TaskRabbit, web application, wikimedia commons, Y Combinator

> Art Nouveau: Inspired by the Arts and Crafts Movement’s return to organic forms, and freed from the limitations of typesetting by stone lithographic technique, Parisian poster artists such as Alphonse Mucha (originally from Moravia, now part of the Czech Republic) integrated illustration and typography. Today’s fast pace of business and the technological limitations of the web make typography of this nature impractical. jQuery plug-ins such as Lettering.js are attempting to bring similar typographic control to the semantic web. (Categorization: display) > Futura: Paul Renner’s Futura broke down letters into the most basic geometric forms that it could. Typefaces with such intense geometric influence render poorly at body copy sizes on today’s screens. Pixels are relatively incompatible with perfectly circular forms. (Categorization: sans-serif, geometric) > Helvetica: This font, which was modeled off of Akzidenz Grotesk, was an instant hit and has achieved such ubiquity that there is an entire movie about it.


pages: 193 words: 19,478

Memory Machines: The Evolution of Hypertext by Belinda Barnet

augmented reality, Benoit Mandelbrot, Bill Duvall, British Empire, Buckminster Fuller, Claude Shannon: information theory, collateralized debt obligation, computer age, conceptual framework, Douglas Engelbart, game design, hiring and firing, Howard Rheingold, HyperCard, hypertext link, information retrieval, Internet Archive, John Markoff, linked data, mandelbrot fractal, Marshall McLuhan, Menlo Park, nonsequential writing, Norbert Wiener, publish or perish, Robert Metcalfe, semantic web, Steve Jobs, Stewart Brand, technoutopianism, Ted Nelson, the scientific method, Vannevar Bush, wikimedia commons

At least some of these cultures take part in open-source software development, including some for whom Xanadu is not just a vision in a dream. There are some grounds for hope. However poorly conceived the general infrastructure, however corrupt and benighted the superstructure, the society of networks does support, somewhat obscurely, a plurality of ideas. Even on what ostensibly counts as the ascendant side, there is room for Berners-Lee to envision a Semantic Web that aims to cast some light below our diving boards – and for great institutional innovators such as Wendy Hall of Southampton to extend the affordances of the Web through artful exploitations on the server side. Hypertext takes no single line. The concept itself arises from the idea of extension or complication – writing in a higher-dimensional space – so how could it be confined to one chain of transmission?


pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzić

Albert Einstein, bioinformatics, business cycle, business intelligence, business process, butter production in bangladesh, combinatorial explosion, computer vision, conceptual framework, correlation coefficient, correlation does not imply causation, data acquisition, discrete time, El Camino Real, fault tolerance, finite state, Gini coefficient, information retrieval, Internet Archive, inventory management, iterative process, knowledge worker, linked data, loose coupling, Menlo Park, natural language processing, Netflix Prize, NP-complete, PageRank, pattern recognition, peer-to-peer, phenotype, random walk, RFID, semantic web, speech recognition, statistical model, Telecommunications Act of 1996, telemarketer, text mining, traveling salesman, web application

Thirty years ago the lack of relevant law was understandable: The technologies were new; their capacity was largely unknown; and the types of legal issues they might raise were novel. Today, it is inexplicable and threatens to undermine both privacy and security. Hence, we must develop technical, legal, and policy foundations for transparency and accountability of large-scale mining across distributed heterogeneous data sources. Policy awareness is a property of the Semantic Web still in development that should provide users with accessible and understandable views of the policies associated with resources. The following issues related to privacy concerns may assist in individual privacy protection during a data-mining process, and should be a part of the best data-mining practices: Whether there is a clear description of a program’s collection of personal information, including how the collected information will serve the program’s purpose?

The book presents innovative, cutting-edge fuzzy techniques that highlight the relevance of fuzziness for huge data sets in the perspective of scalability issues, from both a theoretical and experimental point of view. It covers a wide scope of research areas including data representation, structuring and querying, as well as information retrieval and data mining. It encompasses different forms of databases, including data warehouses, data cubes, tabular or relational data, and many applications, among which are music warehouses, video mining, bioinformatics, semantic Web and data streams. Li, H. X., V. C. Yen, Fuzzy Sets and Fuzzy Decision-Making, CRC Press, Inc., Boca Raton, 1995. The book emphasizes the applications of fuzzy-set theory in the field of management science and decision science, introducing and formalizing the concept of fuzzy decision making. Many interesting methods of fuzzy decision making are developed and illustrated with examples. Pal, S.


pages: 743 words: 201,651

Free Speech: Ten Principles for a Connected World by Timothy Garton Ash

A Declaration of the Independence of Cyberspace, activist lawyer, Affordable Care Act / Obamacare, Andrew Keen, Apple II, Ayatollah Khomeini, battle of ideas, Berlin Wall, bitcoin, British Empire, Cass Sunstein, Chelsea Manning, citizen journalism, Clapham omnibus, colonial rule, crowdsourcing, David Attenborough, don't be evil, Donald Davies, Douglas Engelbart, Edward Snowden, Etonian, European colonialism, eurozone crisis, failed state, Fall of the Berlin Wall, Ferguson, Missouri, Filter Bubble, financial independence, Firefox, Galaxy Zoo, George Santayana, global village, index card, Internet Archive, invention of movable type, invention of writing, Jaron Lanier, jimmy wales, John Markoff, Julian Assange, Mark Zuckerberg, Marshall McLuhan, mass immigration, megacity, mutually assured destruction, national security letter, Nelson Mandela, Netflix Prize, Nicholas Carr, obamacare, Peace of Westphalia, Peter Thiel, pre–internet, profit motive, RAND corporation, Ray Kurzweil, Ronald Reagan, semantic web, Silicon Valley, Simon Singh, Snapchat, social graph, Stephen Hawking, Steve Jobs, Steve Wozniak, The Death and Life of Great American Cities, The Wisdom of Crowds, Turing test, We are Anonymous. We are Legion, WikiLeaks, World Values Survey, Yom Kippur War

The richest university in the world, Harvard, called on its scholars to make their work available in open-access journals, saying that its library could no longer afford the $3.5 million annual bill payable to the likes of Elsevier.36 The British government demanded that the results of any publicly funded research should be made freely available to the public and commissioned a report on the best way to cover the editorial, peer-review and production costs of academic publications.37 Tragically, this battle over intellectual property claimed the life of a brilliant young man. Aaron Swartz, an American computing prodigy, co-developed Reddit, an online bulletin board which by 2015 clocked more than 150 million unique monthly visitors viewing more than six billion pages. He was involved in pioneering the widely used RSS web feed, worked with Tim Berners-Lee to improve data sharing through the Semantic Web and with cyberlaw guru Lawrence Lessig on the Creative Commons licences. All this by age 26.38 Swartz believed passionately that data, information and knowledge should be freely accessible to all. So he obtained the book-cataloguing data kept by the Library of Congress, for which it usually charged, and posted it on something called the Open Library. He found his way into 19.9 million pages of electronic records of US court proceedings and uploaded them for all to see on theinfo.org.39 Using his computer skills and his Massachusetts Institute of Technology (MIT) guest access to the JSTOR online library of journal articles, for which most universities pay a hefty fee, he started downloading articles to a laptop hidden in a wiring cupboard at MIT.

., 195 search engine manipulation, 365 ‘search engine optimisation,’ 302 Second Life, 316 secrecy: C/S ratio, 324; guarding the guardians, 334–38; official, 324–27, 332–34, 337–38, 344–45; in wartime, 326; ‘well-placed sources,’ 341–45; whistleblowers and leakers, 339–41 section 295 of Indian/Pakistani penal code, 225, 254, 268, 275 secularism, 261, 265, 267, 273, 277–78, 281 security: executive oversight of, 335; versus freedom, 327–29; judiciary oversight of, 336–37; legislative oversight of, 335–36; national and personal, 321 sedition, 325 seditious libel, 331 Sedley, Stephen, 77, 131 Seinfeld, Jerry, 244 Selassie, Haile, 205 self-broadcasting/-publishing, 56–58 self-restraint, 213 Semantic Web, 164 Semprun, Jorgé, 304 Sen, Amartya, 78, 109, 193–94 Senegal, 243, 277 September 11, 2001 attacks, 64, 273, 322–24 Serbia, 133, 242 Serbo-Croat language, 123, 207 Serrano, Andres, 146 Serres, Michel, 25 sex, speech as, 89, 247–48 Shakarian, Hayastan, 349 Shakespeare, William, 156, 212 Shamikah, 313 Shamsie, Kamila, 90 ‘sharing,’ 166 Sharp, Gene, 148–49 Shaw, George Bernard, 17, 109 Shayegan, Daryush, 98 shield laws, 342 Shils, Edward, 99, 208–9 Shotoku (Prince), 109 Shrimsley, Robert, 142 Shteyngart, Gary, 13, 16 ‘Shunga’ art exhibition, 246–47 Sikhs, 131, 253, 262, 274 Siliconese, 50 Simone, Nina, 74, 78, 119, 212 Simpson, O.


pages: 294 words: 81,292

Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat

AI winter, AltaVista, Amazon Web Services, artificial general intelligence, Asilomar, Automated Insights, Bayesian statistics, Bernie Madoff, Bill Joy: nanobots, brain emulation, cellular automata, Chuck Templeton: OpenTable:, cloud computing, cognitive bias, commoditize, computer vision, cuban missile crisis, Daniel Kahneman / Amos Tversky, Danny Hillis, data acquisition, don't be evil, drone strike, Extropian, finite state, Flash crash, friendly AI, friendly fire, Google Glasses, Google X / Alphabet X, Isaac Newton, Jaron Lanier, John Markoff, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, Loebner Prize, lone genius, mutually assured destruction, natural language processing, Nicholas Carr, optical character recognition, PageRank, pattern recognition, Peter Thiel, prisoner's dilemma, Ray Kurzweil, Rodney Brooks, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Silicon Valley, Singularitarianism, Skype, smart grid, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, Stuxnet, superintelligent machines, technological singularity, The Coming Technological Singularity, Thomas Bayes, traveling salesman, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, zero day

But since AIXI is uncomputable, it would never be a candidate for an intelligence explosion anyway. AIXItl—a computable approximation of AIXI—is another matter. This is also probably not true of mind uploading, if such a thing ever comes to pass. Computer science-based researchers want to engineer AGI: The mind versus brain debate is too large to address here. with $50 million in grants: Lenat, Doug, “Doug Lenat on Cyc, a truly semantic Web, and artificial intelligence (AI),” developerWorks, September 16, 2008, http://www.ibm.com/developerworks/podcast/dwi/cm-int091608txt.html (accessed September 28, 2011). Carnegie Mellon University’s NELL: Lohr, Steve, “Aiming to Learn as We Do, a Machine Teaches Itself,” New York Times, sec. science, October 4, 2010, http://www.nytimes.com/2010/10/05/science/05compute.html?pagewanted=all (accessed September 28, 2011).


pages: 336 words: 90,749

How to Fix Copyright by William Patry

A Declaration of the Independence of Cyberspace, barriers to entry, big-box store, borderless world, business cycle, business intelligence, citizen journalism, cloud computing, commoditize, creative destruction, crowdsourcing, death of newspapers, en.wikipedia.org, facts on the ground, Frederick Winslow Taylor, George Akerlof, Gordon Gekko, haute cuisine, informal economy, invisible hand, Joseph Schumpeter, Kickstarter, knowledge economy, lone genius, means of production, moral panic, new economy, road to serfdom, Ronald Coase, Ronald Reagan, semantic web, shareholder value, Silicon Valley, The Chicago School, The Wealth of Nations by Adam Smith, trade route, transaction costs, trickle-down economics, winner-take-all economy, zero-sum game

To make copyright work on a technical system (like the Internet) you’d need to look at the system itself for the means to implement the law. 6. I have no idea. 7. Advances in technologies create problems that can only be solved by further advances in those same technologies. 8. The answer to beating a machine (say, at chess) is understanding how it works. 9. The semantic web is the answer to all potential problems of access, control, and copying online. In other words, the proliferation of metadata standards will solve the “problem” of the existing behavior of computers, and in particular, search engines. 10. The challenge to copyright that the machine has always posed historically—shifting production cost and thus power—can be met by building a response into the same machine.


pages: 374 words: 94,508

Infonomics: How to Monetize, Manage, and Measure Information as an Asset for Competitive Advantage by Douglas B. Laney

3D printing, Affordable Care Act / Obamacare, banking crisis, blockchain, business climate, business intelligence, business process, call centre, chief data officer, Claude Shannon: information theory, commoditize, conceptual framework, crowdsourcing, dark matter, data acquisition, digital twin, discounted cash flows, disintermediation, diversification, en.wikipedia.org, endowment effect, Erik Brynjolfsson, full employment, informal economy, intangible asset, Internet of things, linked data, Lyft, Nash equilibrium, Network effects, new economy, obamacare, performance metric, profit motive, recommendation engine, RFID, semantic web, smart meter, Snapchat, software as a service, source of truth, supply-chain management, text mining, uber lyft, Y2K, yield curve

This information can also have real commercial value—especially when mashed with other sources—to understand and act on local or global market conditions, population trends, and weather, for example. Public data even can be used to create new (ahem) high-value businesses such as Potbot, a virtual cannabis “budtender.” At its core is a recommendation engine that uses information on strains, cannabinoids, and medical applications aggregated via semantic web technology. Potbot also incorporates data from cannabis seed DNA scans along with recordings of brain activity in clinical tests. It monetizes this information, not just in the form of a consumer app, but also in helping growers improve their yields for the most popular or beneficial strains.1617 Public data is most monetizable when integrated with your own proprietary information. Consider the example of one of the largest brewers in the world.


pages: 407 words: 103,501

The Digital Divide: Arguments for and Against Facebook, Google, Texting, and the Age of Social Networking by Mark Bauerlein

Amazon Mechanical Turk, Andrew Keen, business cycle, centre right, citizen journalism, collaborative editing, computer age, computer vision, corporate governance, crowdsourcing, David Brooks, disintermediation, Frederick Winslow Taylor, Howard Rheingold, invention of movable type, invention of the steam engine, invention of the telephone, Jaron Lanier, Jeff Bezos, jimmy wales, Kevin Kelly, knowledge worker, late fees, Mark Zuckerberg, Marshall McLuhan, means of production, meta analysis, meta-analysis, moral panic, Network effects, new economy, Nicholas Carr, PageRank, peer-to-peer, pets.com, Results Only Work Environment, Saturday Night Live, search engine result page, semantic web, Silicon Valley, slashdot, social graph, social web, software as a service, speech recognition, Steve Jobs, Stewart Brand, technology bubble, Ted Nelson, The Wisdom of Crowds, Thorstein Veblen, web application

Hence our theme for this year: Web Squared. 1990–2004 was the match being struck; 2005–2009 was the fuse; and 2010 will be the explosion. Ever since we first introduced the term “Web 2.0,” people have been asking, “What’s next?” Assuming that Web 2.0 was meant to be a kind of software version number (rather than a statement about the second coming of the Web after the dot-com bust), we’re constantly asked about “Web 3.0.” Is it the semantic web? The sentient web? Is it the social web? The mobile web? Is it some form of virtual reality? It is all of those, and more. The Web is no longer a collection of static pages of HTML that describe something in the world. Increasingly, the Web is the world—everything and everyone in the world casts an “information shadow,” an aura of data which, when captured and processed intelligently, offers extraordinary opportunity and mind-bending implications.


Data and the City by Rob Kitchin,Tracey P. Lauriault,Gavin McArdle

A Declaration of the Independence of Cyberspace, bike sharing scheme, bitcoin, blockchain, Bretton Woods, Chelsea Manning, citizen journalism, Claude Shannon: information theory, clean water, cloud computing, complexity theory, conceptual framework, corporate governance, correlation does not imply causation, create, read, update, delete, crowdsourcing, cryptocurrency, dematerialisation, digital map, distributed ledger, fault tolerance, fiat currency, Filter Bubble, floating exchange rates, global value chain, Google Earth, hive mind, Internet of things, Kickstarter, knowledge economy, lifelogging, linked data, loose coupling, new economy, New Urbanism, Nicholas Carr, open economy, openstreetmap, packet switching, pattern recognition, performance metric, place-making, RAND corporation, RFID, Richard Florida, ride hailing / ride sharing, semantic web, sentiment analysis, sharing economy, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart contracts, smart grid, smart meter, social graph, software studies, statistical model, TaskRabbit, text mining, The Chicago School, The Death and Life of Great American Cities, the market place, the medium is the message, the scientific method, Toyota Production System, urban planning, urban sprawl, web application

It is only by looking at the model and how it came to be through database specifications and requirements, the observation of data production on-site in real time and in communication with database designers and managers, that attributes of an infrastructure’s assemblage can be observed in their state of play. What a cursory analysis shows is that the process of modelling is situated in the domain of object-oriented programming, the semantic web, GIScience, modelling software, taxonomies, the burgeoning database and GIS industry, modelling schemas, mathematics, consulting firms and offshore data re-engineering companies. Furthermore, data modelling requires a particular form of logical abstract thinking; in the case of the OSi and 1Spatial, those that were involved in the modelling exercise were very senior, experienced and renowned spatial data experts, all formally trained in spatial database design and maintenance as well as spatial


pages: 349 words: 102,827

The Infinite Machine: How an Army of Crypto-Hackers Is Building the Next Internet With Ethereum by Camila Russo

4chan, Airbnb, algorithmic trading, altcoin, always be closing, Any sufficiently advanced technology is indistinguishable from magic, Asian financial crisis, bitcoin, blockchain, Burning Man, crowdsourcing, cryptocurrency, distributed ledger, diversification, Donald Trump, East Village, Ethereum, ethereum blockchain, Flash crash, Google Glasses, Google Hangouts, hacker house, Internet of things, Mark Zuckerberg, Maui Hawaii, mobile money, new economy, peer-to-peer, Peter Thiel, pets.com, Ponzi scheme, prediction markets, QR code, reserve currency, RFC: Request For Comment, Richard Stallman, Robert Shiller, Sand Hill Road, Satoshi Nakamoto, semantic web, sharing economy, side project, Silicon Valley, Skype, slashdot, smart contracts, South of Market, San Francisco, the payments system, too big to fail, tulip mania, Turing complete, Uber for X

Web 1 is the internet of the 1990s, before user-generated content, indexed search, and social media platforms. It lived exclusively in desktop computers. Web 2 is the internet as we know it today, with user-generated content, streaming video and music, and location-based services. It thrives on mobile devices. Web 3 was first coined in a 2006 New York Times article referring to a third-generation internet. This new internet is made up of concepts including the “semantic web,” or a web of data that can be processed by machines, artificial intelligence, machine learning, and data mining. When algorithms decide what to recommend someone should purchase on Amazon, that’s a glimpse of Web 3. Besides all those features, Gavin’s version of Web 3 would allow people to interact without needing to trust each other. It should be a peer-to-peer network with no servers and no authorities to manage the flow of information.


Possiplex by Ted Nelson

Any sufficiently advanced technology is indistinguishable from magic, Bill Duvall, Brewster Kahle, Buckminster Fuller, cuban missile crisis, Donald Knuth, Douglas Engelbart, HyperCard, Jaron Lanier, John Markoff, Kevin Kelly, Marc Andreessen, Marshall McLuhan, Murray Gell-Mann, nonsequential writing, pattern recognition, post-work, RAND corporation, semantic web, Silicon Valley, Steve Jobs, Stewart Brand, Ted Nelson, Thomas Kuhn: the structure of scientific revolutions, Vannevar Bush, Zimmermann PGP

"Xanadu" is a registered trademark which I maintain at considerable cost, and I ask all parties to respect this by using the "®" or "(R)" symbol for the first use of the trademark "Xanadu" in each document. 7. Not "all the world's information", but all the world's documents. The concept of "information" is arguable, documents much less so. I believe Tim is finding his concept of pure information, the "Semantic Web", much more difficult to achieve than hypertext documents. 8. No, not a link; a transclusive pathway. The two mechanisms are entirely different. A link connects two things which are different. A transclusion connects two things which are the same. 9. Not authors, rightsholders. Sometimes the author is a rightsholder, sometimes not. A rightsholder is generally someone who has bought or contracted the rights from the author.


pages: 387 words: 105,250

The Caryatids by Bruce Sterling

carbon footprint, clean water, failed state, impulse control, negative equity, new economy, nuclear winter, semantic web, sexual politics, social software, starchitect, stem cell, supervolcano, urban renewal, Whole Earth Review

The results arrived in a blistering deluge of search hits. The results were ugly. They had hit on a subject that knowledgeable experts had been discussing for a hundred years. The most heavily trafficked tag was the strange coinage “Supervolcano.” Supervolcanoes had been a topic of mild intellectual interest for many years. Recently, people had talked much less about supervolcanoes, and with more pejoratives in their semantics. Web-semantic traffic showed that people were actively shunning the subject of supervolcanoes. That scientific news seemed to be rubbing people the wrong way. “So,” said Guillermo at last, “according to our best sources here, there are some giant … and I mean really giant magma plumes rising up and chewing at the West Coast of North America. Do we have a Family consensus about that issue?” Raph still wasn’t buying it.


pages: 373 words: 112,822

The Upstarts: How Uber, Airbnb, and the Killer Companies of the New Silicon Valley Are Changing the World by Brad Stone

Affordable Care Act / Obamacare, Airbnb, Amazon Web Services, Andy Kessler, autonomous vehicles, Ben Horowitz, Boris Johnson, Burning Man, call centre, Chuck Templeton: OpenTable:, collaborative consumption, East Village, fixed income, Google X / Alphabet X, housing crisis, inflight wifi, Jeff Bezos, Justin.tv, Kickstarter, Lyft, Marc Andreessen, Mark Zuckerberg, Menlo Park, Mitch Kapor, Necker cube, obamacare, Paul Graham, peer-to-peer, Peter Thiel, race to the bottom, rent control, ride hailing / ride sharing, Ruby on Rails, Sand Hill Road, self-driving car, semantic web, sharing economy, side project, Silicon Valley, Silicon Valley startup, Skype, South of Market, San Francisco, Startup school, Steve Jobs, TaskRabbit, Tony Hsieh, transportation-network company, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, ubercab, Y Combinator, Y2K, Zipcar

He got his undergraduate degree in 2001 and stayed at the university to pursue a master of science, finally leaving his comfortable nest after he turned twenty-two to move into a campus apartment with classmates. Camp met Geoff Smith, who would become his StumbleUpon co-founder, through one of his childhood friends and together they started the site as a way for users to share and find interesting things on the internet without having to search for them on Google. Camp was obsessed with collaborative information systems and the semantic web. He didn’t go out much back then, splitting his time between his graduate thesis and the company and immersing himself in dense academic papers about esoteric topics in computer science. By the time Camp finished his degree in 2005, StumbleUpon was starting to show promise. Camp and Smith met an angel investor that year who convinced them to move to San Francisco and raise capital. They incorporated the company in the United States, and over the next year, the number of users on StumbleUpon grew from five hundred thousand to two million.


pages: 413 words: 119,587

Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots by John Markoff

"Robert Solow", A Declaration of the Independence of Cyberspace, AI winter, airport security, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, basic income, Baxter: Rethink Robotics, Bill Duvall, bioinformatics, Brewster Kahle, Burning Man, call centre, cellular automata, Chris Urmson, Claude Shannon: information theory, Clayton Christensen, clean water, cloud computing, collective bargaining, computer age, computer vision, crowdsourcing, Danny Hillis, DARPA: Urban Challenge, data acquisition, Dean Kamen, deskilling, don't be evil, Douglas Engelbart, Douglas Engelbart, Douglas Hofstadter, Dynabook, Edward Snowden, Elon Musk, Erik Brynjolfsson, factory automation, From Mathematics to the Technologies of Life and Death, future of work, Galaxy Zoo, Google Glasses, Google X / Alphabet X, Grace Hopper, Gunnar Myrdal, Gödel, Escher, Bach, Hacker Ethic, haute couture, hive mind, hypertext link, indoor plumbing, industrial robot, information retrieval, Internet Archive, Internet of things, invention of the wheel, Jacques de Vaucanson, Jaron Lanier, Jeff Bezos, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, knowledge worker, Kodak vs Instagram, labor-force participation, loose coupling, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, medical residency, Menlo Park, Mitch Kapor, Mother of all demos, natural language processing, new economy, Norbert Wiener, PageRank, pattern recognition, pre–internet, RAND corporation, Ray Kurzweil, Richard Stallman, Robert Gordon, Rodney Brooks, Sand Hill Road, Second Machine Age, self-driving car, semantic web, shareholder value, side project, Silicon Valley, Silicon Valley startup, Singularitarianism, skunkworks, Skype, social software, speech recognition, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, Steve Wozniak, Steven Levy, 
Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Nelson, telemarketer, telepresence, telepresence robot, Tenerife airport disaster, The Coming Technological Singularity, the medium is the message, Thorstein Veblen, Turing test, Vannevar Bush, Vernor Vinge, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, William Shockley: the traitorous eight, zero-sum game

Critics, such as Ben Shneiderman, insisted that software assistants were both technically and ethically flawed. They argued for keeping human users in direct control rather than handing off decisions to a software valet. The Siri team did not shy away from the controversy, and it wasn’t long before they pulled back the curtain on their project, just a bit. By late spring 2009, Gruber was speaking obliquely about the new technology. During the summer of that year he appeared at a Semantic Web conference and described, point by point, how the futuristic technologies in the Knowledge Navigator were becoming a reality: there were now touch screens that enabled so-called gestural interfaces, there was a global network for information sharing and collaboration, developers were coding programs that interacted with humans, and engineers had started to finesse natural and continuous speech recognition.


When Computers Can Think: The Artificial Intelligence Singularity by Anthony Berglas, William Black, Samantha Thalind, Max Scratchmann, Michelle Estes

3D printing, AI winter, anthropic principle, artificial general intelligence, Asilomar, augmented reality, Automated Insights, autonomous vehicles, availability heuristic, blue-collar work, brain emulation, call centre, cognitive bias, combinatorial explosion, computer vision, create, read, update, delete, cuban missile crisis, David Attenborough, Elon Musk, en.wikipedia.org, epigenetics, Ernest Rutherford, factory automation, feminist movement, finite state, Flynn Effect, friendly AI, general-purpose programming language, Google Glasses, Google X / Alphabet X, Gödel, Escher, Bach, industrial robot, Isaac Newton, job automation, John von Neumann, Law of Accelerating Returns, license plate recognition, Mahatma Gandhi, mandelbrot fractal, natural language processing, Parkinson's law, patent troll, patient HM, pattern recognition, phenotype, ransomware, Ray Kurzweil, self-driving car, semantic web, Silicon Valley, Singularitarianism, Skype, sorting algorithm, speech recognition, statistical model, stem cell, Stephen Hawking, Stuxnet, superintelligent machines, technological singularity, Thomas Malthus, Turing machine, Turing test, uranium enrichment, Von Neumann architecture, Watson beat the top human players on Jeopardy!, wikimedia commons, zero day

However, natural language query systems are rarely used today because they tend to be brittle, so they require the user to know how to phrase questions that the system can answer. A better approach seems to be to present the structure of the data in a graphical user interface and then let the user specify the query directly in terms of the symbols that the computer does understand. As advances are made in commonsense reasoning this may change. Producing an effective natural language query processor is a major goal of the semantic web community. Eurisko and other early results One of the more commonly quoted early works is Eurisko, created by Douglas Lenat in 1976. It used various heuristics to generate short programs that could be interpreted as mathematical theorems. It also had heuristics for how to create new heuristics. It had some success, winning the Traveler ship building game against human competitors. The AGI community often reveres Eurisko as an example of a very powerful early computer system.


pages: 1,038 words: 137,468

JavaScript Cookbook by Shelley Powers

Firefox, Google Chrome, hypertext link, semantic web, web application, WebSocket

About the Author Shelley Powers has been working with and writing about web technologies—from the first release of JavaScript to the latest graphics and design tools—for more than 15 years. Her recent O’Reilly books have covered the semantic web, Ajax, JavaScript, and web graphics. She’s an avid amateur photographer and web development aficionado.
Colophon The animal on the cover of JavaScript Cookbook is a little (or lesser) egret ( Egretta garzetta). A small white heron, it is the old world counterpart to the very similar new world snowy egret. It is the smallest and most common egret in Singapore, and its original breeding distribution included the large inland and coastal wetlands in warm temperate parts of Europe, Asia, Africa, Taiwan, and Australia.


We Are the Nerds: The Birth and Tumultuous Life of Reddit, the Internet's Culture Laboratory by Christine Lagorio-Chafkin

4chan, Airbnb, Amazon Web Services, Bernie Sanders, big-box store, bitcoin, blockchain, Brewster Kahle, Burning Man, crowdsourcing, cryptocurrency, David Heinemeier Hansson, Donald Trump, East Village, game design, Golden Gate Park, hiring and firing, Internet Archive, Jacob Appelbaum, Jeff Bezos, jimmy wales, Joi Ito, Justin.tv, Kickstarter, Lean Startup, Lyft, Marc Andreessen, Mark Zuckerberg, medical residency, minimum viable product, natural language processing, Paul Buchheit, Paul Graham, paypal mafia, Peter Thiel, plutocrats, Plutocrats, QR code, recommendation engine, RFID, rolodex, Ruby on Rails, Sam Altman, Sand Hill Road, Saturday Night Live, self-driving car, semantic web, side project, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, slashdot, Snapchat, social web, South of Market, San Francisco, Startup school, Stephen Hawking, Steve Jobs, Steve Wozniak, technoutopianism, uber lyft, web application, WikiLeaks, Y Combinator

Davidson gave him the job. Huffman arrived at Image Matters to find five guys working on government security and emergency response technology. He later admitted he didn’t fully understand the scope of the work at the time, but one project helped layer locations of disaster responders over a map. Another was sort of a web-based assistant like Siri. What Huffman largely worked on was translating data for the “semantic web,” a layer of coding that helps computers understand and catalog a website’s contents. Image Matters didn’t behave how Huffman expected startups should. “It wasn’t very glamorous. All its money was from government contracts, so their projects were kind of boring,” he said. “No users, no scaling problems—none of that stuff.” In Silicon Valley, he suspected, there were cool, audacious upstarts that touched millions of individuals and could effect at least some measure of change in the world.


pages: 999 words: 194,942

Clojure Programming by Chas Emerick, Brian Carper, Christophe Grand

Amazon Web Services, Benoit Mandelbrot, cloud computing, continuous integration, database schema, domain-specific language, don't repeat yourself, en.wikipedia.org, failed state, finite state, Firefox, game design, general-purpose programming language, Guido van Rossum, Larry Wall, mandelbrot fractal, Paul Graham, platform as a service, premature optimization, random walk, Ruby on Rails, Schrödinger's Cat, semantic web, software as a service, sorting algorithm, Turing complete, type inference, web application

* * * [177] Note that metadata on keys of &env can’t be relied upon, in particular in the presence of local aliases. [178] See Testing Contextual Macros for our stab at an alternative macroexpansion function that does support this without the var-dereferencing line noise. [179] Or, returned by a previous expansion. [180] Triples are a term for subject-predicate-object expressions, as found in semantic web technologies like RDF. Specific representations and semantics of triples vary from implementation to implementation, but a simplified example of a vector triple might be ["Boston" :capital-of "Massachusetts"]. [181] refer is described in “refer”, and is also reused by use, described later in that chapter. [182] We describe type hints in Type Hinting for Performance. [183] This is due to an unfortunate implementation detail: special forms (like let, the outermost form in the expression returned by or) cannot be hinted.


pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil

additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business cycle, business intelligence, c2.com, call centre, carbon-based life, cellular automata, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, coronavirus, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, disintermediation, double helix, Douglas Hofstadter, en.wikipedia.org, epigenetics, factory automation, friendly AI, George Gilder, Gödel, Escher, Bach, informal economy, information retrieval, invention of the telephone, invention of the telescope, invention of writing, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Marshall McLuhan, Mikhail Gorbachev, Mitch Kapor, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Norbert Wiener, oil shale / tar sands, optical character recognition, pattern recognition, phenotype, premature optimization, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Robert Metcalfe, Rodney Brooks, scientific worldview, Search for Extraterrestrial Intelligence, selection bias, semantic web, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming 
Technological Singularity, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Y2K, Yogi Berra

Wikipedia, http://en.wikipedia.org/wiki/Epigenetics. 10. See note 57 in chapter 2 for an analysis of the information content in the genome, which I estimate to be 30 to 100 million bytes, therefore less than 10^9 bits. See the section "Human Memory Capacity" in chapter 3 (p. 126) for my analysis of the information in a human brain, estimated at 10^18 bits. 11. Marie Gustafsson and Christian Balkenius, "Using Semantic Web Techniques for Validation of Cognitive Models against Neuroscientific Data," AILS04 Workshop, SAIS/SSLS Workshop (Swedish Artificial Intelligence Society; Swedish Society for Learning Systems), April 15–16, 2004, Lund, Sweden, www.lucs.lu.se/People/Christian.Balkenius/PDF/Gustafsson.Balkenius.2004.pdf. 12. See discussion in chapter 3. In one useful reference, when modeling neuron by neuron, Tomaso Poggio and Christof Koch describe the neuron as similar to a chip with thousands of logical gates.


Data Mining: Concepts and Techniques: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

see Structural Clustering Algorithm for Networks core vertex 531 illustrated 532 scatter plots 54 2-D data set visualization with 59 3-D data set visualization with 60 correlations between attributes 54–56 illustrated 55 matrix 56, 59 schemas integration 94 snowflake 140–141 star 139–140 science applications 611–613 search engines 28 search space pruning 263, 301 second guess heuristic 369 selection dimensions 225 self-training 432 semantic annotations applications 317, 313, 320–321 with context modeling 316 from DBLP data set 316–317 effectiveness 317 example 314–315 of frequent patterns 313–317 mutual information 315–316 task definition 315 Semantic Web 597 semi-offline materialization 226 semi-supervised classification 432–433, 437 alternative approaches 433 cotraining 432–433 self-training 432 semi-supervised learning 25 outlier detection by 572 semi-supervised outlier detection 551 sensitivity analysis 408 sensitivity measure 367 sentiment classification 434 sequence data analysis 319 sequences 586 alignment 590 biological 586, 590–591 classification of 589–590 similarity searches 587 symbolic 586, 588–590 time-series 586, 587–588 sequential covering algorithm 359 general-to-specific search 360 greedy search 361 illustrated 359 rule induction with 359–361 sequential pattern mining 589 constraint-based 589 in symbolic sequences 588–589 shapelets method 590 shared dimensions 204 pruning 205 shared-sorts 193 shared-partitions 193 shell cubes 160 shell fragments 192, 235 approach 211–212 computation algorithm 212, 213 computation example 214–215 precomputing 210 shrinking diameter 592 sigmoid function 402 signature-based detection 614 significance levels 373 significance measure 312 significance tests 372–373, 386 silhouette coefficient 489–490 similarity asymmetric binary 71 cosine 77–78 measuring 65–78, 79 nominal attributes 70 similarity measures 447–448, 525–528 constraints on 533 geodesic distance 525–526 SimRank 526–528 similarity searches 587 in information networks 594 in multimedia data mining 596 simple random sample with replacement (SRSWR) 108 simple random sample without replacement (SRSWOR) 108 SimRank 526–528, 539 computation 527–528 random walk 526–528 structural context 528 simultaneous aggregation 195 single-dimensional association rules 17, 287 single-linkage algorithm 460, 461 singular value decomposition (SVD) 587 skewed data balanced 271 negatively 47 positively 47 wavelet transforms on 102 slice operation 148 small-world phenomenon 592 smoothing 112 by bin boundaries 89 by bin means 89 by bin medians 89 for data discretization 90 snowflake schema 140 example 141 illustrated 141 star schema versus 140 social networks 524–525, 526–528 densification power law 592 evolution of 594 mining 623 small-world phenomenon 592 see also networks social science/social studies data mining 613 soft clustering 501 soft constraints 534, 539 example 534 handling 536–537 space-filling curve 58 sparse data 102 sparse data cubes 190 sparsest cuts 539 sparsity coefficient 579 spatial data 14 spatial data mining 595 spatiotemporal data analysis 319 spatiotemporal data mining 595, 623–624 specialized SQL servers 165 specificity measure 367 spectral clustering 520–522, 539 effectiveness 522 framework 521 steps 520–522 speech recognition 430 speed, classification 369 spiral method 152 split-point 333, 340, 342 splitting attributes 333 splitting criterion 333, 342 splitting rules.


Engineering Security by Peter Gutmann

active measures, algorithmic trading, Amazon Web Services, Asperger Syndrome, bank run, barriers to entry, bitcoin, Brian Krebs, business process, call centre, card file, cloud computing, cognitive bias, cognitive dissonance, combinatorial explosion, Credit Default Swap, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, Debian, domain-specific language, Donald Davies, Donald Knuth, double helix, en.wikipedia.org, endowment effect, fault tolerance, Firefox, fundamental attribution error, George Akerlof, glass ceiling, GnuPG, Google Chrome, iterative process, Jacob Appelbaum, Jane Jacobs, Jeff Bezos, John Conway, John Markoff, John von Neumann, Kickstarter, lake wobegon effect, Laplace demon, linear programming, litecoin, load shedding, MITM: man-in-the-middle, Network effects, Parkinson's law, pattern recognition, peer-to-peer, Pierre-Simon Laplace, place-making, post-materialism, QR code, race to the bottom, random walk, recommendation engine, RFID, risk tolerance, Robert Metcalfe, Ruby on Rails, Sapir-Whorf hypothesis, Satoshi Nakamoto, security theater, semantic web, Skype, slashdot, smart meter, social intelligence, speech recognition, statistical model, Steve Jobs, Steven Pinker, Stuxnet, telemarketer, text mining, the built environment, The Death and Life of Great American Cities, The Market for Lemons, the payments system, Therac-25, too big to fail, Turing complete, Turing machine, Turing test, web application, web of trust, x509 certificate, Y2K, zero day, Zimmermann PGP

“Context-Aware Access Control—Making Access Control Decisions Based on Context Information”, Sven Lachmund, Thomas Walter, Laurent Bussard, Laurent Gomez and Eddy Olk, Proceedings of the International Workshop on Ubiquitous Access Control (IWUAC’06), July 2006, p.1. “A Semantic Context-Aware Access Control Framework for Secure Collaborations in Pervasive Computing Environments”, Alessandra Toninelli, Rebecca Montanari, Lalana Kagal and Ora Lassila, Proceedings of the 5th International Semantic Web Conference (ISWC’06), Springer-Verlag LNCS No.4273, November 2006, p.473. “Information Security Architecture-Context Aware Access Control Model for Educational Applications”, N. DuraiPandian, V. Shanmughaneethi and C. Chellappan, International Journal of Computer Science and Network Security, Vol.6, No.12 (December 2006), p.197. “Context-Aware Provisional Access Control”, Amir Masoumzadeh, Morteza Amini and Rasool Jalili, Proceedings of the 2nd International Conference on Information Systems Security (ICISS’06), Springer-Verlag LNCS No.4332, December 2006, p.132.