fault tolerance

pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, billion-dollar mistake, bitcoin, blockchain, business intelligence, business logic, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, data science, database schema, deep learning, DevOps, distributed ledger, Donald Knuth, Edward Snowden, end-to-end encryption, Ethereum, ethereum blockchain, exponential backoff, fake news, fault tolerance, finite state, Flash crash, Free Software Foundation, full text search, functional programming, general-purpose programming language, Hacker News, informal economy, information retrieval, Infrastructure as a Service, Internet of things, iterative process, John von Neumann, Ken Thompson, Kubernetes, Large Hadron Collider, level 1 cache, loose coupling, machine readable, machine translation, Marc Andreessen, microservices, natural language processing, Network effects, no silver bullet, operational security, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, SQL injection, statistical model, surveillance capitalism, systematic bias, systems thinking, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

EventSource (browser API), Pushing state changes to clients eventual consistency, Replication, Problems with Replication Lag, Safety and liveness, Consistency Guarantees(see also conflicts) and perpetual inconsistency, Timeliness and Integrity evolvability, Evolvability: Making Change Easy, Encoding and Evolutioncalling services, Data encoding and evolution for RPC graph-structured data, Property Graphs of databases, Schema flexibility in the document model, Dataflow Through Databases-Archival storage, Deriving several views from the same event log, Reprocessing data for application evolution of message-passing, Distributed actor frameworks reprocessing data, Reprocessing data for application evolution, Unifying batch and stream processing schema evolution in Avro, The writer’s schema and the reader’s schema schema evolution in Thrift and Protocol Buffers, Field tags and schema evolution schema-on-read, Schema flexibility in the document model, Encoding and Evolution, The Merits of Schemas exactly-once semantics, Exactly-once message processing, Fault Tolerance, Exactly-once execution of an operationparity with batch processors, Unifying batch and stream processing preservation of integrity, Correctness of dataflow systems exclusive mode (locks), Implementation of two-phase locking eXtended Architecture transactions (see XA transactions) extract-transform-load (see ETL) F FacebookPresto (query engine), The divergence between OLTP databases and data warehouses React, Flux, and Redux (user interface libraries), End-to-end event streams social graphs, Graph-Like Data Models Wormhole (change data capture), Implementing change data capture fact tables, Stars and Snowflakes: Schemas for Analytics failover, Leader failure: Failover, Glossary(see also leader-based replication) in leaderless replication, absence of, Writing to the Database When a Node Is Down leader election, The leader and the lock, Total Order Broadcast, Distributed Transactions and Consensus potential problems, Leader failure: Failover failuresamplification by distributed transactions, Limitations of distributed transactions, Maintaining derived state failure detection, Detecting Faultsautomatic rebalancing causing cascading failures, Operations: Automatic or Manual Rebalancing perfect failure detectors, Three-phase commit timeouts and unbounded delays, Timeouts and Unbounded Delays, Network congestion and queueing using ZooKeeper, Membership and Coordination Services faults versus, Reliability partial failures in distributed systems, Faults and Partial Failures-Cloud Computing and Supercomputing, Summary fan-out (messaging systems), Describing Load, Multiple consumers fault tolerance, Reliability-How Important Is Reliability?, Glossaryabstractions for, Consistency and Consensus formalization in consensus, Fault-Tolerant Consensus-Limitations of consensususe of replication, Single-leader replication and consensus human fault tolerance, Philosophy of batch process outputs in batch processing, Bringing related data together in the same place, Philosophy of batch process outputs, Fault tolerance, Fault tolerance in log-based systems, Applying end-to-end thinking in data systems, Timeliness and Integrity-Correctness of dataflow systems in stream processing, Fault Tolerance-Rebuilding state after a failureatomic commit, Atomic commit revisited idempotence, Idempotence maintaining derived state, Maintaining derived state microbatching and checkpointing, Microbatching and checkpointing rebuilding state after a failure, Rebuilding state after a failure of distributed transactions, XA transactions-Limitations of distributed transactions transaction atomicity, Atomicity, Atomic Commit and Two-Phase Commit (2PC)-Exactly-once message processing faults, ReliabilityByzantine faults, Byzantine Faults-Weak forms of lying failures versus, Reliability handled by transactions, Transactions handling in supercomputers and cloud computing, Cloud Computing and Supercomputing hardware, Hardware Faults in batch processing versus distributed databases, Designing for frequent faults in distributed systems, Faults and Partial Failures-Cloud Computing and Supercomputing introducing deliberately, Reliability, Network Faults in Practice network faults, Network Faults in Practice-Detecting Faultsasymmetric faults, The Truth Is Defined by the Majority detecting, Detecting Faults tolerance of, in multi-leader replication, Multi-datacenter operation software errors, Software Errors tolerating (see fault tolerance) federated databases, The meta-database of everything fence (CPU instruction), Linearizability and network delays fencing (preventing split brain), Leader failure: Failover, The leader and the lock-Fencing tokensgenerating fencing tokens, Using total order broadcast, Membership and Coordination Services properties of fencing tokens, Correctness of an algorithm stream processors writing to databases, Idempotence, Exactly-once execution of an operation Fibre Channel (networks), MapReduce and Distributed Filesystems field tags (Thrift and Protocol Buffers), Thrift and Protocol Buffers-Field tags and schema evolution file descriptors (Unix), A uniform interface financial data, Advantages of immutable events Firebase (database), API support for change streams Flink (processing framework), Dataflow engines-Discussion of materializationdataflow APIs, High-Level APIs and Languages fault tolerance, Fault tolerance, Microbatching and checkpointing, Rebuilding state after a failure Gelly API (graph processing), The Pregel processing model integration of batch and stream processing, Batch and Stream Processing, Unifying batch and stream processing machine learning, Specialization for different domains query optimizer, The move toward declarative query languages stream processing, Stream analytics flow control, Network congestion and queueing, Messaging Systems, Glossary FLP result (on consensus), Distributed Transactions and Consensus FlumeJava (dataflow library), MapReduce workflows, High-Level APIs and Languages followers, Leaders and Followers, Glossary(see also leader-based replication) foreign keys, Comparison to document databases, Reduce-Side Joins and Grouping forward compatibility, Encoding and Evolution forward decay (algorithm), Describing Performance Fossil (version control system), Limitations of immutabilityshunning (deleting data), Limitations of immutability FoundationDB (database)serializable transactions, Serializable Snapshot Isolation (SSI), Performance of serializable snapshot isolation, Limitations of distributed transactions fractal trees, B-tree optimizations full table scans, Reduce-Side Joins and Grouping full-text search, Glossaryand fuzzy indexes, Full-text search and fuzzy indexes building search indexes, Building search indexes Lucene storage engine, Making an LSM-tree out of SSTables functional reactive programming (FRP), Designing Applications Around Dataflow functional requirements, Summary futures (asynchronous operations), Current directions for RPC fuzzy search (see similarity search) G garbage collectionimmutability and, Limitations of immutability process pauses for, Describing Performance, Process Pauses-Limiting the impact of garbage collection, The Truth Is Defined by the Majority(see also process pauses) genome analysis, Summary, Specialization for different domains geographically distributed datacenters, Distributed Data, Reading Your Own Writes, Unreliable Networks, The limits of total ordering geospatial indexes, Multi-column indexes Giraph (graph processing), The Pregel processing model Git (version control system), Custom conflict resolution logic, The causal order is not a total order, Limitations of immutability GitHub, postmortems, Leader failure: Failover, Leader failure: Failover, Mapping system models to the real world global indexes (see term-partitioned indexes) GlusterFS (distributed filesystem), MapReduce and Distributed Filesystems GNU Coreutils (Linux), Sorting versus in-memory aggregation GoldenGate (change data capture), Trigger-based replication, Multi-datacenter operation, Implementing change data capture(see also Oracle) GoogleBigtable (database)data model (see Bigtable data model) partitioning scheme, Partitioning, Partitioning by Key Range storage layout, Making an LSM-tree out of SSTables Chubby (lock service), Membership and Coordination Services Cloud Dataflow (stream processor), Stream analytics, Atomic commit revisited, Unifying batch and stream processing(see also Beam) Cloud Pub/Sub (messaging), Message brokers compared to databases, Using logs for message storage Docs (collaborative editor), Collaborative editing Dremel (query engine), The divergence between OLTP databases and data warehouses, Column-Oriented Storage FlumeJava (dataflow library), MapReduce workflows, High-Level APIs and Languages GFS (distributed file system), MapReduce and Distributed Filesystems gRPC (RPC framework), Current directions for RPC MapReduce (batch processing), Batch Processing(see also MapReduce) building search indexes, Building search indexes task preemption, Designing for frequent faults Pregel (graph processing), The Pregel processing model Spanner (see Spanner) TrueTime (clock API), Clock readings have a confidence interval gossip protocol, Request Routing government use of data, Data as assets and power GPS (Global Positioning System)use for clock synchronization, Unreliable Clocks, Clock Synchronization and Accuracy, Clock readings have a confidence interval, Synchronized clocks for global snapshots GraphChi (graph processing), Parallel execution graphs, Glossaryas data models, Graph-Like Data Models-The Foundation: Datalogexample of graph-structured data, Graph-Like Data Models property graphs, Property Graphs RDF and triple-stores, Triple-Stores and SPARQL-The SPARQL query language versus the network model, The SPARQL query language processing and analysis, Graphs and Iterative Processing-Parallel executionfault tolerance, Fault tolerance Pregel processing model, The Pregel processing model query languagesCypher, The Cypher Query Language Datalog, The Foundation: Datalog-The Foundation: Datalog recursive SQL queries, Graph Queries in SQL SPARQL, The SPARQL query language-The SPARQL query language Gremlin (graph query language), Graph-Like Data Models grep (Unix tool), Simple Log Analysis GROUP BY clause (SQL), GROUP BY grouping records in MapReduce, GROUP BYhandling skew, Handling skew H Hadoop (data infrastructure)comparison to distributed databases, Batch Processing comparison to MPP databases, Comparing Hadoop to Distributed Databases-Designing for frequent faults comparison to Unix, Philosophy of batch process outputs-Philosophy of batch process outputs, Unbundling Databases diverse processing models in ecosystem, Diversity of processing models HDFS distributed filesystem (see HDFS) higher-level tools, MapReduce workflows join algorithms, Reduce-Side Joins and Grouping-MapReduce workflows with map-side joins(see also MapReduce) MapReduce (see MapReduce) YARN (see YARN) happens-before relationship, Ordering and Causalitycapturing, Capturing the happens-before relationship concurrency and, The “happens-before” relationship and concurrency hard disksaccess patterns, Advantages of LSM-trees detecting corruption, The end-to-end argument, Don’t just blindly trust what they promise faults in, Hardware Faults, Durability sequential write throughput, Hash Indexes, Disk space usage hardware faults, Hardware Faults hash indexes, Hash Indexes-Hash Indexesbroadcast hash joins, Broadcast hash joins partitioned hash joins, Partitioned hash joins hash partitioning, Partitioning by Hash of Key-Partitioning by Hash of Key, Summaryconsistent hashing, Partitioning by Hash of Key problems with hash mod N, How not to do it: hash mod N range queries, Partitioning by Hash of Key suitable hash functions, Partitioning by Hash of Key with fixed number of partitions, Fixed number of partitions HAWQ (database), Specialization for different domains HBase (database)bug due to lack of fencing, The leader and the lock bulk loading, Key-value stores as batch process output column-family data model, Data locality for queries, Column Compression dynamic partitioning, Dynamic partitioning key-range partitioning, Partitioning by Key Range log-structured storage, Making an LSM-tree out of SSTables request routing, Request Routing size-tiered compaction, Performance optimizations use of HDFS, Diversity of processing models use of ZooKeeper, Membership and Coordination Services HDFS (Hadoop Distributed File System), MapReduce and Distributed Filesystems-MapReduce and Distributed Filesystems(see also distributed filesystems) checking data integrity, Don’t just blindly trust what they promise decoupling from query engines, Diversity of processing models indiscriminately dumping data into, Diversity of storage metadata about datasets, MapReduce workflows with map-side joins NameNode, MapReduce and Distributed Filesystems use by Flink, Rebuilding state after a failure use by HBase, Dynamic partitioning use by MapReduce, MapReduce workflows HdrHistogram (numerical library), Describing Performance head (Unix tool), Simple Log Analysis head vertex (property graphs), Property Graphs head-of-line blocking, Describing Performance heap files (databases), Storing values within the index Helix (cluster manager), Request Routing heterogeneous distributed transactions, Distributed Transactions in Practice, Limitations of distributed transactions heuristic decisions (in 2PC), Recovering from coordinator failure Hibernate (object-relational mapper), The Object-Relational Mismatch hierarchical model, Are Document Databases Repeating History?

…

Protocols for making systems Byzantine fault-tolerant are quite complicated [84], and fault-tolerant embedded systems rely on support from the hardware level [81]. In most server-side data systems, the cost of deploying Byzantine fault-tolerant solutions makes them impractical. Web applications do need to expect arbitrary and malicious behavior of clients that are under end-user control, such as web browsers. This is why input validation, sanitization, and output escaping are so important: to prevent SQL injection and cross-site scripting, for example. However, we typically don’t use Byzantine fault-tolerant protocols here, but simply make the server the authority on deciding what client behavior is and isn’t allowed.

…

in derived data systems, Derived Data materialized views, Aggregation: Data Cubes and Materialized Views updating derived data, Single-Object and Multi-Object Operations, The need for multi-object transactions, Combining Specialized Tools by Deriving Data versus normalization, Deriving several views from the same event log derived data, Derived Data, Stream Processing, Glossaryfrom change data capture, Implementing change data capture in event sourcing, Deriving current state from the event log-Deriving current state from the event log maintaining derived state through logs, Databases and Streams-API support for change streams, State, Streams, and Immutability-Concurrency control observing, by subscribing to streams, End-to-end event streams outputs of batch and stream processing, Batch and Stream Processing through application code, Application code as a derivation function versus distributed transactions, Derived data versus distributed transactions deterministic operations, Pros and cons of stored procedures, Faults and Partial Failures, Glossaryaccidental nondeterminism, Fault tolerance and fault tolerance, Fault tolerance, Fault tolerance and idempotence, Idempotence, Reasoning about dataflows computing derived data, Maintaining derived state, Correctness of dataflow systems, Designing for auditability in state machine replication, Using total order broadcast, Databases and Streams, Deriving current state from the event log joins, Time-dependence of joins DevOps, The Unix Philosophy differential dataflow, What’s missing?

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, billion-dollar mistake, bitcoin, blockchain, business intelligence, business logic, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, data science, database schema, deep learning, DevOps, distributed ledger, Donald Knuth, Edward Snowden, end-to-end encryption, Ethereum, ethereum blockchain, exponential backoff, fake news, fault tolerance, finite state, Flash crash, Free Software Foundation, full text search, functional programming, general-purpose programming language, Hacker News, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, Ken Thompson, Kubernetes, Large Hadron Collider, level 1 cache, loose coupling, machine readable, machine translation, Marc Andreessen, microservices, natural language processing, Network effects, no silver bullet, operational security, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, SQL injection, statistical model, surveillance capitalism, systematic bias, systems thinking, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

Protocols for mak‐ ing systems Byzantine fault-tolerant are quite complicated [84], and fault-tolerant embedded systems rely on support from the hardware level [81]. In most server-side data systems, the cost of deploying Byzantine fault-tolerant solutions makes them impractical. Web applications do need to expect arbitrary and malicious behavior of clients that are under end-user control, such as web browsers. This is why input validation, sani‐ tization, and output escaping are so important: to prevent SQL injection and crosssite scripting, for example. However, we typically don’t use Byzantine fault-tolerant protocols here, but simply make the server the authority on deciding what client behavior is and isn’t allowed.

…

It is impossible to reduce the probability of a fault to zero; therefore it is usually best to design fault-tolerance mechanisms that prevent faults from causing failures. In this book we cover several techniques for building reliable systems from unreliable parts. Counterintuitively, in such fault-tolerant systems, it can make sense to increase the rate of faults by triggering them deliberately—for example, by randomly killing indi‐ vidual processes without warning. Many critical bugs are actually due to poor error handling [3]; by deliberately inducing faults, you ensure that the fault-tolerance machinery is continually exercised and tested, which can increase your confidence that faults will be handled correctly when they occur naturally.

…

However, we typically don’t use Byzantine fault-tolerant protocols here, but simply make the server the authority on deciding what client behavior is and isn’t allowed. In peer-to-peer networks, where there is no such cen‐ tral authority, Byzantine fault tolerance is more relevant. A bug in the software could be regarded as a Byzantine fault, but if you deploy the same software to all nodes, then a Byzantine fault-tolerant algorithm cannot save you. Most Byzantine fault-tolerant algorithms require a supermajority of more than twothirds of the nodes to be functioning correctly (i.e., if you have four nodes, at most one may malfunction). To use this approach against bugs, you would have to have four independent implementations of the same software and hope that a bug only appears in one of the four implementations.

Elixir in Action by Saša Jurić

demand response, en.wikipedia.org, fail fast, fault tolerance, finite state, functional programming, general-purpose programming language, higher-order functions, place-making, reproducible builds, Ruby on Rails, WebSocket

As I’ve explained, Erlang goes a long way toward making it possible to write fault-tolerant systems that can run for a long time with hardly any downtime. This is a big challenge and a specific focus of the Erlang platform. Although it’s admittedly unfortunate that the ecosystem isn’t as mature as it could be, my sentiment is that Erlang significantly helps with hard problems, even if simple problems can sometimes be more clumsy to solve. Of course, those difficult problems may not always be important. Perhaps you don’t expect a high load, or a system doesn’t need to run constantly and be extremely fault-tolerant. In such cases, you may want to consider some other technology stack with a more evolved ecosystem.

…

We’ll spend some time exploring BEAM concurrency, a feature that plays a central role in Elixir’s and Erlang’s support for scalability, fault-tolerance, and distribution. In this chapter, we’ll start our tour of BEAM concurrency by looking at basic techniques and tools. Before we explore the lower-level details, we’ll take a look at higher-level principles. 129 130 5.1 Chapter 5 Concurrency primitives Concurrency in BEAM Erlang is all about writing highly available systems — systems that run forever and are always able to meaningfully respond to client requests. To make your system highly available, you have to tackle the following challenges: ¡ Fault-tolerance — Minimize, isolate, and recover from the effects of runtime errors. ¡ Scalability — Handle a load increase by adding more hardware resources without changing or redeploying the code. ¡ Distribution — Run your system on multiple machines so that others can take over if one machine crashes.

…

Doing so promotes the scalability and fault-tolerance of the system. ¡ A process is internally sequential and handles requests one by one. A single process can thus keep its state consistent, but it can also cause a performance bottleneck if it serves many clients. ¡ Carefully consider calls versus casts. Calls are synchronous and therefore block the caller. If the response isn’t needed, casts may improve performance at the expense of reduced guarantees, because a client process doesn’t know the outcome. ¡ You can use mix projects to manage more involved systems that consist of multiple modules. 8 Fault-tolerance basics This chapter covers ¡ Runtime errors ¡ Errors in concurrent systems ¡ Supervisors Fault-tolerance is a first-class concept in BEAM.

pages: 419 words: 102,488

Chaos Engineering: System Resiliency in Practice by Casey Rosenthal, Nora Jones

Amazon Web Services, Asilomar, autonomous vehicles, barriers to entry, blockchain, business continuity plan, business intelligence, business logic, business process, cloud computing, cognitive load, complexity theory, continuous integration, cyber-physical system, database schema, DevOps, fail fast, fault tolerance, hindsight bias, human-factors engineering, information security, Kanban, Kubernetes, leftpad, linear programming, loose coupling, microservices, MITM: man-in-the-middle, no silver bullet, node package manager, operational security, OSI model, pull request, ransomware, risk tolerance, scientific management, Silicon Valley, six sigma, Skype, software as a service, statistical model, systems thinking, the scientific method, value engineering, WebSocket

Conclusion The discoveries made during these exercises and the improvements to Slack’s reliability they inspired were only possible because Disasterpiece Theater gave us a clear process for testing the fault tolerance of our production systems. Disasterpiece Theater exercises are meticulously planned failures that are introduced in the development environment and then, if that goes well, in the production environment by a group of experts all gathered together. It helps minimize the risk inherent in testing fault tolerance, especially when it’s based on assumptions made long ago in older systems that maybe weren’t originally designed to be so fault tolerant. The process is intended to motivate investment in development environments that faithfully match the production environment and to drive reliability improvements throughout complex systems.

…

Robustness and Stability To build users’ trust in a newly released distributed database like TiDB, where data is saved in multiple nodes that communicate with each other, data loss or damage must be prevented at any time. But in the real world, failures can happen any time, anywhere, in a way we can never expect. So how can we survive them? One common way is to make our system fault-tolerant. If one service crashes, another fallover service can take charge immediately without affecting online services. In practice we need to be wary that fault tolerance increases the complexity for a distributed system. How can we ensure that our fault tolerance is robust? Typical ways of testing our tolerance to failures include writing unit tests and integration tests. With the assistance of internal test generation tools, we have developed over 20 million unit test cases.

…

For completeness, I want to address monoliths briefly. There is no precise threshold a system crosses and becomes a monolith—it’s relative. Monolithic systems are not inherently more or less fault tolerant than service-oriented architectures. They may, though, be harder to retrofit because of the sheer surface area, difficulty in affecting incremental change, and difficulty limiting the blast radius of failures. Maybe you decide you’re going to break up your monolith, maybe you don’t. Fault tolerance is reachable via both roads. Design Patterns Common in Newer Systems By contrast, systems being designed today are likely to assume individual computers come and go frequently.

pages: 673 words: 164,804

Peer-to-Peer by Andy Oram

AltaVista, big-box store, c2.com, combinatorial explosion, commoditize, complexity theory, correlation coefficient, dark matter, Dennis Ritchie, fault tolerance, Free Software Foundation, Garrett Hardin, independent contractor, information retrieval, Kickstarter, Larry Wall, Marc Andreessen, moral hazard, Network effects, P = NP, P vs NP, p-value, packet switching, PalmPilot, peer-to-peer, peer-to-peer model, Ponzi scheme, power law, radical decentralization, rolodex, Ronald Coase, Search for Extraterrestrial Intelligence, semantic web, SETI@home, Silicon Valley, slashdot, statistical model, Tragedy of the Commons, UUNET, Vernor Vinge, web application, web of trust, Zimmermann PGP

Publius, Publius and other systems in this book when Freenet data is requested through Free Haven, One network with a thousand faces frequent correspondent lists, POWs not required for, Nonfungible micropayments FSF (Free Software Foundation), A success story: From free software to open source full collisions, Micropayment digital cash schemes fungible micropayments, Micropayment schemes, Fungible micropayments–Anonymous macropayment digital cash schemes amortized pairwise payment model, The difficulty of distributed systems: How to exchange micropayments among peers as accountability measure, Fungible payments for accountability pairwise payment model and, The difficulty of distributed systems: How to exchange micropayments among peers redeemable for real-world currencies, Fungible payments for accountability G Garay, Juan, Other considerations from the case study gateways advantages of, One network with a thousand faces between Freenet and Free Haven, One network with a thousand faces creating, problems with, Problems creating gateways–Free Haven existing projects, Existing projects implementing, Gateway implementation inserts, problems with, Problems with inserts interoperability through, Interoperability Through Gateways–Acknowledgments requests, problems with, Problems with requests–Free Haven Gaussians (signals), How SETI@home works gcc (GNU C Compiler), A success story: From free software to open source Gedye, David, SETI@home GeoCities, The writable Web Germany and legal status of ISPs, Precedents and parries GhostScript, A success story: From free software to open source GNU C Compiler (gcc), A success story: From free software to open source GNU Emacs, A success story: From free software to open source GNU General Public License (GPL), A success story: From free software to open source Gnutella Version 0.56, Gnutella’s first breath GNU tools, A success story: From free software to open source GnuCache, Host caches Gnut software for Unix, Host caches Gnutella, Some context and a definition, Gnutella–Gnutella’s effects accountability problem, solving, Purposes of micropayments and reputation systems analogues to, Gnutella’s analogues–Cultivating the Gnutella network anonymity properties of, An analysis of anonymity anonymous chat, Anonymous Gnutella chat bandwidth, using too much, Accountability blending client, server, and network, The client is the server is the network cellular telephony and, Cellular telephony and the Gnutella network creating an ad hoc backbone, Organizing Gnutella design deficiencies, Gnutella distributed intelligence system, Distributed intelligence–Distributed intelligence distributed search engines and, Distributed search engines dynamic routing, Dynamic routing Ethernet analogous to, Ethernet exchanging micropayments among peers, The difficulty of distributed systems: How to exchange micropayments among peers free riding, impact on, The impact of free riding freeloading on, Freeloading horizon concept, The Gnutella horizon host caches and, Host caches–Returning the network to its natural state how it began, A brief history impact on accountability, Peer-to-peer models and their impacts on accountability InfraSearch and, Distributed intelligence, Searching legal issues, Napster wars link distribution in, Fault tolerance and link distribution in Gnutella message-based routing, Message-based, application-level routing node failure due to random removal, Fault tolerance and link distribution in Gnutella due to targeted attack, Fault tolerance and link distribution in Gnutella vs. Freenet, Fault tolerance and link distribution in Gnutella OmniNetwork, what it can add to, Gnutella performance case study, Case study 2: Gnutella–Scalability placing nodes on the network, Placing nodes on the network private networks, Private Gnutella networks pseudoanonymity and, Gnutella pseudoanonymity–Anonymous Gnutella chat querying the network, Distributed intelligence–Distributed intelligence, Case study 2: Gnutella–Initial experiments reducing broadcasts causes big impact, Reducing broadcasts makes a significant impact–Reducing broadcasts makes a significant impact Reflectors, File sharing: Napster and successors, Cultivating the Gnutella network, Reducing broadcasts makes a significant impact routing techniques, Gnutella saved by open source developers, Open source to the rescue scalability and, Scalability simulating behavior over time, Initial experiments–Initial experiments fault tolerance, Fault tolerance and link distribution in Gnutella super peers and, File sharing: Napster and successors, Scalability TCP-based broadcasts, TCP broadcast traffic problems with, Gnutella’s traffic problems–Reducing broadcasts makes a significant impact transmission loss over TCP, Lossy transmission over reliable TCP trust issues and, Gnutella TTL (time-to-live) numbers, Message broadcasting vs. client/server model, Gnutella works like the real world–Client/server means control, and control means responsibility vs.

…

NAT (Network Address Translation), Message fanout IPv6 and, Technical solutions: Return to the old Internet Nautilus file manager, A success story: From free software to open source negative shilling, Attacks and adversaries Nerdherd web site, Open source to the rescue .NET, Some context and a definition, Conversations and peers, Evolving toward the ideal Netscape’s parallel download strategy, The TCP rate equation: Cooperative protocols Network Address Translation (see NAT) network congestion, The TCP rate equation: Cooperative protocols–The TCP rate equation: Cooperative protocols network model of Internet, The network model of the Internet explosion (1995-1999)–Asymmetric bandwidth Network News Transport Protocol (NNTP) and Usenet, Usenet networks, private (Gnutella), Private Gnutella networks neural networks and scoring algorithms, Scoring algorithms New, Darren, Strategic positioning and core competencies New-Member-Added delta message, The New-Member-Added delta message Newmarch, Jan, Codifying reputation on a wide scale: The PGP web of trust news.admin Usenet group, Usenet, Social solutions: Engineer polite behavior newsgroups (see Usenet) next-generation Web, evaluating it as a form of conversation, Conversations and peers NNTP (Network News Transport Protocol) and Usenet, Usenet node failure due to random removal in Freenet, Simulating fault tolerance in Gnutella, Fault tolerance and link distribution in Gnutella due to targeted attack in Freenet, Simulating fault tolerance in Gnutella, Fault tolerance and link distribution in Gnutella node-specific tickets, Micropayments in the Free Haven context nodes, remailer (see remailers) nonfungible micropayments, Nonfungible micropayments–Nonparallelizable work functions amortized pairwise payment model, The difficulty of distributed systems: How to exchange micropayments among peers extended types of, Extended types of nonfungible micropayments Free Haven works best with, Micropayments in the Free Haven context limitations of, Fungible micropayments pairwise payment model and, The difficulty of distributed systems: How to exchange micropayments among peers POWs and, Micropayment schemes nonparallelizable work functions, Nonparallelizable work functions notifications, buddy, Accountability and the buddy system Nullsoft, Gnutella’s first breath Nutella and Gnutella, Gnutella’s first breath O Odlyzko, A.M., General considerations in an economic analysis of micropayment design Olson, Mancur, Accountability OmniNetwork, Interoperability Through Gateways integrating all network types into, One network with a thousand faces networks and their roles in, Well-known networks and their roles–Free Haven and Publius Onion Routing (mix network), Communications channel, Mix networks open source meme map, A success story: From free software to open source–A success story: From free software to open source peer-to-peer and, File sharing: Napster and successors projects and trust metric, A reputation system that resists pseudospoofing: Advogato–A reputation system that resists pseudospoofing: Advogato saving Gnutella, Open source to the rescue software and trust issues, Open source software summit, Remaking the Peer-to-Peer Meme, A success story: From free software to open source Oram, Andy, Preface–We’d like to hear from you, Afterword–A clean sweep?

…

encrypting, Encryption and decryption, Encryption and decryption (see also cryptography) delta messages, Security characteristics of a shared space documents with Publius, System architecture–System architecture email messages, A simple example of remailers–How Type 2 remailers differ from Type 1 remailers, Other anonymity tools–Mix networks Freenet documents, Keys and redirects IP addresses (Red Rover), The hub–The subscribers micropayments, The difficulty of distributed systems: How to exchange micropayments among peers on the Web, Why secure email is a failure using PGP (Pretty Good Privacy), Signature verification encryption/decryption key pairs, Anatomy of a mutually-trusting shared space, Taxonomy of Groove keys end-to-end payment model, The difficulty of distributed systems: How to exchange micropayments among peers Enterprise Resource Planning (ERP) systems, Interface to the marketplace entities in reputation domains, Reputation domains, entities, and multidimensional reputations–Reputation domains, entities, and multidimensional reputations Reputation Server and, Identity as an element of reputation ERP (Enterprise Resource Planning) systems, Interface to the marketplace Eternity Service, Free Haven and Publius trust issues and, The Eternity Service Eternity Usenet, Eternity Usenet anonymity properties of, An analysis of anonymity Ethernet and Gnutella network, Ethernet EUROCRYPT conference, Anonymous macropayment digital cash schemes Evans, Philip, Long-term vision expiration dates of shares (Free Haven), Share expiration extraterrestrial signals (SETI@home), SETI@home, How SETI@home works F fanout, message, Message fanout–Message fanout Fast Fourier Transform (FFT) algorithm, Radio SETI fault tolerance, Performance simulating in Freenet, Simulating fault tolerance in Gnutella, Fault tolerance and link distribution in Gnutella Faybishenko, Yaroslav, Distributed intelligence Federrath, Hannes, Micropayments in the Free Haven context feedback soliciting from parties in transactions, Collecting ratings, Reputation–Summary system on eBay, Reputations worth real money: eBay trusting sources of, Credibility FFT (Fast Fourier Transform) algorithm, Radio SETI file sharing between Free Haven and Freenet, One network with a thousand faces DNS and, DNS Freenet and, Freenet–Conclusions next-generation peer-to-peer technologies, Next-generation peer-to-peer file-sharing technologies file tampering, detecting, Message digest functions file-sharing systems searching technologies for, Trust and search engines–Deniability trust issues and, File-sharing systems–Freenet unifying with an OmniNetwork, Interoperability Through Gateways Financial Cryptography conference, Anonymous macropayment digital cash schemes firewalls, Message fanout abuse of port 80, Abusing port 80 as obstacles to peer-to-peer, Firewalls, dynamic IP, NAT: The end of the open network making smarter, Technical solutions: Return to the old Internet no good if internal network is compromised, Groove versus email no longer a guarantee of protection, Groove versus email Fishburn, P.C., General considerations in an economic analysis of micropayment design flat-fee methods vs. pay-per-use methods, General considerations in an economic analysis of micropayment design floating-point operations and SETI@home, The world’s most powerful computer flooding attacks, Attacks on documents or the servnet common methods for dealing with, Common methods for dealing with flooding and DoS attacks–Active caching and mirroring foiled by fungible micropayments, Fungible payments for accountability protecting against, Active caching and mirroring protecting peer-to-peer from, Accountability Ford-Fulkerson algorithm and Advogato, A reputation system that resists pseudospoofing: Advogato forgery macropayment techniques for protecting against, Anonymous macropayment digital cash schemes preventing with micropayment schemes, Varieties of micropayments or digital cash thwarted by MicroMint, Micropayment digital cash schemes France and legal status of ISPs, Precedents and parries Frankel, Justin, Gnutella’s first breath Free Haven, Acknowledgments, Free Haven–Acknowledgments accountability, Free Haven buddy system and, Accountability and the buddy system case study, A case study: Accountability in Free Haven–Other considerations from the case study impact on, Peer-to-peer models and their impacts on accountability solving problem of, Purposes of micropayments and reputation systems, Moderating security levels: An accountability slider anonymity, Free Haven–Partial anonymity, An analysis of anonymity–An analysis of anonymity attacks on, Attacks on anonymity properties of, An analysis of anonymity attacks on, Attacks on Free Haven–Attacks on anonymity buddy system ) (see buddy system (Free Haven) choosing good algorithms, Reputation systems communications channel, Elements of the system, Communications channel, Micropayments in the Free Haven context design of, The design of Free Haven–Implementation status documents attacks on, Attacks on documents or the servnet not possible to revoke, Document revocation retrieving, Retrieval storing new, Storage efficiency problems, Future work flexibility, Free Haven freeloader problem, Freeloading, Moderating security levels: An accountability slider goals of, Free Haven introducers, The design of Free Haven, Introducers, Reputation systems micropayments in, Micropayments in the Free Haven context–Micropayments in the Free Haven context nonfungible micropayments work best with, Micropayments in the Free Haven context OmniNetwork, what it can add to, Free Haven and Publius persistence, Free Haven privacy in data-sharing systems, Privacy in data-sharing systems–Reliability with anonymity problems in design of, Future work–Conclusion pseudonyms on, Reliability with anonymity public keys, Elements of the system–Retrieval publication system, Elements of the system–Publication receipts, Trading –Receipts attacks on, Attacks on the reputation system reply blocks, Elements of the system reputation (see reputation, Free Haven) reputation referrals, Reputation systems routing techniques, Free Haven scores and ratings, Reputation systems sending broadcasts in batches, Micropayments in the Free Haven context servers adding/removing, Introducers as introducers, The design of Free Haven, Introducers, Reputation systems broadcasting referrals, Reputation systems contracts formed by, The design of Free Haven credibility/confidence values, Reputation systems punishing misbehaving, Reputation systems servnet, The design of Free Haven attacks on, Attacks on documents or the servnet dynamic nature of, Elements of the system introducing files into, Publication shares buddy system, Accountability and the buddy system expiration dates of, Share expiration hoarding, Attacks on documents or the servnet receipts and, Trading –Receipts, Attacks on the reputation system replicating not allowed, Accountability and the buddy system trading, The design of Free Haven, Trading trust issues and, Mojo Nation and Free Haven, Other considerations from the case study vs.

Principles of Protocol Design by Robin Sharp

accounting loophole / creative accounting, business process, discrete time, exponential backoff, fault tolerance, finite state, functional programming, Gödel, Escher, Bach, information retrieval, loose coupling, MITM: man-in-the-middle, OSI model, packet switching, quantum cryptography, RFC: Request For Comment, stochastic process

The first is a practical objection: Simple languages generally do not correspond to protocols which can tolerate faults, such as missing or duplicated messages. Protocols which are fault-tolerant often require the use of state machines with enormous numbers of states, or they may define context-dependent languages. A more radical objection is that classical analysis of the protocol language from a formal language point of view traditionally concerns itself with the problems of constructing a suitable recogniser, determining the internal states of the recogniser, and so on. This does not help us to analyse or check many of the properties which we may require the protocol to have, such as the properties of fault-tolerance mentioned above. To be able to investigate this we need analytical tools which can describe the parallel operation of all the parties which use the protocol to regulate their communication. 1.2 Protocols as Processes A radically different way of looking at things has therefore gained prominence within recent years.

…

If no value is received from a particular participant, the algorithm should supply some default, vde f . 5.4.1 Using unsigned messages Solutions to this problem depend quite critically on the assumptions made about the system. Initially, we shall assume the following: Degree of fault-tolerance: Out of the n participants, at most t are unreliable. This defines the degree of fault tolerance required of the system. We cannot expect the protocol to work correctly if this limit is overstepped. Network properties: Every message that is sent is delivered correctly, and the receiver of a message knows who sent it. These assumptions mean that an unreliable participant cannot interfere with the message traffic between the other participants.

…

Addressing: Hierarchical addressing. T-address formed by concatenating T-selector onto N-address. Fault tolerance: Loss or duplication of data (DT TPDUs) or acknowledgments (AK TPDUs). Whereas the ISO Class 0 protocol provides minimal functionality, and is therefore only suitable for use when the underlying network is comparatively reliable, the Class 4 protocol is designed to be resilient to a large range of potential disasters, including the arrival of spurious PDUs, PDU loss and PDU corruption. To ensure this degree of fault tolerance, the protocol uses a large number of timers, whose identifications and functions are summarised in Table 9.3.

pages: 371 words: 78,103

Webbots, Spiders, and Screen Scrapers by Michael Schrenk

Amazon Web Services, corporate governance, digital rights, fault tolerance, Firefox, machine readable, Marc Andreessen, new economy, pre–internet, SpamAssassin, The Hackers Conference, Turing test, web application

This action sounds silly, but it is exactly what a poorly programmed webbot may do if it is expecting an available seat and has no provision to act otherwise. Types of Webbot Fault Tolerance For a webbot, fault tolerance involves adapting to changes to URLs, HTML content (which affect parsing), forms, cookie use, and network outages and congestion). We'll examine each of these aspects of fault tolerance in the following sections. Adapting to Changes in URLs Possibly the most important type of webbot fault tolerance is URL tolerance, or a webbot's ability to make valid requests for web pages under changing conditions. URL tolerance ensures that your webbot does the following: Download pages that are available on the target site Follow header redirections to updated pages Use referer values to indicate that you followed a link from a page that is still on the website Avoid Making Requests for Pages That Don't Exist Before you determine that your webbot downloaded a valid web page, you should verify that you made a valid request.

…

Depending on what your webbot does and which website it targets, the identification of a webbot can lead to possible banishment from the website and the loss of a competitive advantage for your business. It's better to avoid these issues by designing fault-tolerant webbots that anticipate changes in the websites they target. Fault tolerance does not mean that everything will always work perfectly. Sometimes changes in a targeted website confuse even the most fault-tolerant webbot. In these cases, the proper thing for a webbot to do is to abort its task and report an error to its owner. Essentially, you want your webbot to fail in the same manner a person using a browser might fail.

…

In that regard, think about how and when people use browsers, and try to write webbots that mimic that activity. * * * [68] See Chapter 28 for more information about trespass to chattels. [69] You can find the owner of an IP address at http://www.arin.net. Chapter 25. WRITING FAULT-TOLERANT WEBBOTS The biggest complaint users have about webbots is their unreliability: Your webbots will suddenly and inexplicably fail if they are not fault tolerant, or able to adapt to the changing conditions of your target websites. This chapter is devoted to helping you write webbots that are tolerant to network outages and unexpected changes in the web pages you target.

Pragmatic.Programming.Erlang.Jul.2007 by Unknown

Debian, en.wikipedia.org, fault tolerance, finite state, full text search, functional programming, higher-order functions, Planet Labs, RFC: Request For Comment

The author kept on and on about concurrency and distribution and fault tolerance and about a method of programming called concurrency-oriented programming—whatever that might mean. But some of the examples looked like fun. That evening the programmer looked at the example chat program. It was pretty small and easy to understand, even if the syntax was a bit strange. Surely it couldn’t be that easy. The basic program was simple, and with a few more lines of code, file sharing and encrypted conversations became possible. The programmer started typing.... What’s This All About? It’s about concurrency. It’s about distribution. It’s about fault tolerance. It’s about functional programming.

…

@spec unlink(Pid) -> true This removes any link between the current process and the process Pid. @spec exit(Why) -> none() This causes the current process to terminate with reason Why. If the clause that executes this statement is not within the scope of 170 E RROR H ANDLING P RIMITIVES Joe Asks. . . How Can We Make a Fault-Tolerant System? To make something fault tolerant, we need at least two computers. One computer does the job, and another computer watches the first computer and must be ready to take over at a moment’s notice if the first computer fails. This is exactly how error recovery works in Erlang. One process does the job, and another process watches the first process and takes over if things go wrong.

…

In distributed Erlang, the process that does the job and the processes that monitor the process that does the job can be placed on physically different machines. Using this technique, we can start designing fault-tolerant software. This pattern is common. We call it the worker-supervisor model, and an entire section of the OTP libraries is devoted to building supervision trees that use this idea. The basic language primitive that makes all this possible is the link primitive. Once you understand how link works and get yourself access to two computers, then you’re well on your way to building your first fault-tolerant system. a catch statement, then the current process will broadcast an exit signal, with argument Why to all processes to which it is currently linked.

pages: 680 words: 157,865

Beautiful Architecture: Leading Thinkers Reveal the Hidden Beauty in Software Design by Diomidis Spinellis, Georgios Gousios

Albert Einstein, barriers to entry, business intelligence, business logic, business process, call centre, continuous integration, corporate governance, database schema, Debian, domain-specific language, don't repeat yourself, Donald Knuth, duck typing, en.wikipedia.org, fail fast, fault tolerance, financial engineering, Firefox, Free Software Foundation, functional programming, general-purpose programming language, higher-order functions, iterative process, linked data, locality of reference, loose coupling, meta-analysis, MVC pattern, Neal Stephenson, no silver bullet, peer-to-peer, premature optimization, recommendation engine, Richard Stallman, Ruby on Rails, semantic web, smart cities, social graph, social web, SPARQL, Steve Jobs, Stewart Brand, Strategic Defense Initiative, systems thinking, the Cathedral and the Bazaar, traveling salesman, Turing complete, type inference, web application, zero-coupon bond

Project Darkstar high-level architecture Unlike most replication schemes, the different copies of the game logic are not meant to process the same events. Instead, each copy can independently interact with the clients. Replication in this design is used primarily to allow scale rather than to ensure fault tolerance (although, as we will see later, fault tolerance is also achieved). Further, the game logic itself does not know or need to know that there are other copies of the server operating on other machines. The code written by the game programmer runs as if it were on a single machine, with coordination of the different copies done by the Project Darkstar infrastructure.

…

NIO image transfer Obviously, that leaves the problem of getting the images from the client to the server. One option we considered and rejected early was CIFS—Windows shared drives. Our main concern here was fault-tolerance, but transfer speed also worried us. These machines needed to move a lot of data back and forth, while photographers and customers were sitting around waiting. In our matrix of off-the-shelf options, nothing had the right mix of speed, parallelism, fault-tolerance, and information hiding. Reluctantly, we decided to build our own file transfer protocol, which led us into one of the most complex areas of Creation Center. Image transfer became a severe trial, but we emerged, at last, with one of the most robust features of the whole system.

…

Computers haven’t been around that long, of course, but here too there have been many examples of beautiful architectures in the past. As with buildings, the style doesn’t always persist, and in this chapter I describe one such architecture and consider why it had so little impact. Guardian is the operating system for Tandem’s fault-tolerant “NonStop” series of computers. It was designed in parallel with the hardware to provide fault tolerance with minimal overhead cost. This chapter describes the original Tandem machine, designed between 1974 and 1976 and shipped between 1976 and 1982. It was originally called “Tandem/16,” but after the introduction of its successor, “NonStop II,” it was retrospectively renamed “NonStop I.”

Mastering Blockchain, Second Edition by Imran Bashir

3D printing, altcoin, augmented reality, autonomous vehicles, bitcoin, blockchain, business logic, business process, carbon footprint, centralized clearinghouse, cloud computing, connected car, cryptocurrency, data acquisition, Debian, disintermediation, disruptive innovation, distributed ledger, Dogecoin, domain-specific language, en.wikipedia.org, Ethereum, ethereum blockchain, fault tolerance, fiat currency, Firefox, full stack developer, general-purpose programming language, gravity well, information security, initial coin offering, interest rate swap, Internet of things, litecoin, loose coupling, machine readable, MITM: man-in-the-middle, MVC pattern, Network effects, new economy, node package manager, Oculus Rift, peer-to-peer, platform as a service, prediction markets, QR code, RAND corporation, Real Time Gross Settlement, reversible computing, RFC: Request For Comment, RFID, ride hailing / ride sharing, Satoshi Nakamoto, seminal paper, single page application, smart cities, smart contracts, smart grid, smart meter, supply-chain management, transaction costs, Turing complete, Turing machine, Vitalik Buterin, web application, x509 certificate

Consensus is pluggable and currently, there are two types of ordering services available in Hyperledger Fabric: SOLO: This is a basic ordering service intended to be used for development and testing purposes. Kafka: This is an implementation of Apache Kafka, which provides ordering service. It should be noted that currently Kafka only provides crash fault tolerance but does not provide byzantine fault tolerance. This is acceptable in a permissioned network where chances of malicious actors are almost none. In addition to these mechanisms, the Simple Byzantine Fault Tolerance (SBFT) based mechanism is also under development, which will become available in the later releases of Hyperledger Fabric. Distributed ledger Blockchain and world state are two main elements of the distributed ledger.

…

Now, if the update is rejected by the node, that would result in loss of availability. In that case due to partition tolerance, both availability and consistency are unachievable. This is strange because somehow blockchain manages to achieve all of these properties—or does it? This will be explained shortly. To achieve fault tolerance, replication is used. This is a standard and widely-used method to achieve fault tolerance. Consistency is achieved using consensus algorithms in order to ensure that all nodes have the same copy of the data. This is also called state machine replication. The blockchain is a means for achieving state machine replication. In general, there are two types of faults that a node can experience.

…

As an analogy to distributed systems, the generals can be considered nodes, the traitors as Byzantine (malicious) nodes, and the messenger can be thought of as a channel of communication among the generals. This problem was solved in 1999 by Castro and Liskov who presented the Practical Byzantine Fault Tolerance (PBFT) algorithm, where consensus is reached after a certain number of messages are received containing the same signed content. This type of inconsistent behavior of Byzantine nodes can be intentionally malicious, which is detrimental to the operation of the network. Any unexpected behavior by a node on the network, whether malicious or not, can be categorized as Byzantine.

pages: 931 words: 79,142

Concepts, Techniques, and Models of Computer Programming by Peter Van-Roy, Seif Haridi

computer age, Debian, discrete time, Donald Knuth, Eratosthenes, fault tolerance, functional programming, G4S, general-purpose programming language, George Santayana, John von Neumann, Lao Tzu, Menlo Park, natural language processing, NP-complete, Paul Graham, premature optimization, sorting algorithm, the Cathedral and the Bazaar, Therac-25, Turing complete, Turing machine, type inference

This solution is harder to implement, but can make the program much simpler. Raising an exception corresponds to aborting the transaction. A third motivation is fault tolerance. Lightweight transactions are important for writing fault-tolerant applications. With respect to a component, e.g., an application doing a transaction, we deﬁne a fault as incorrect behavior in one of its subcomponents. Ideally, the application should continue to behave correctly when there are faults, i.e., it should be fault tolerant. When a fault occurs, a fault-tolerant application has to take three steps: (1) detect the fault, (2) contain the fault in a limited part of the application, and (3) repair any problems caused by the fault.

…

For this exercise, write an abstraction for a replicated server that hides all the fault-handling activities from the clients. 5. (advanced exercise) Fault tolerance and synchronous communication. Section 11.10 says that synchronous communication makes fault conﬁnement easier. Section 5.7 says that asynchronous communication helps keep concurrent components independent, which is important when building fault tolerance abstractions. For this exercise, reconcile these two principles by studying the architecture of fault tolerant applications. This page intentionally left blank 12 Constraint Programming by Peter Van Roy, Raphaël Collet, and Seif Haridi Plans within plans within plans within plans. – Dune, Frank Herbert (1920–1986) Constraint programming consists of a set of techniques for solving constraint satisfaction problems.

…

Its implementation, the Ericsson OTP (Open Telecom Platform), features ﬁne-grained concurrency (eﬃcient threads), extreme reliability (high-performance software fault tolerance), and hot code replacement ability (update software while the system is running). It is a high-level language that hides the internal representation of data and does automatic memory management. It has been used successfully in several Ericsson products. 5.7.1 Computation model The Erlang computation model has an elegant layered structure. We ﬁrst explain the model and then we show how it is extended for distribution and fault tolerance. The Erlang computation model consists of concurrent entities called “processes.”

pages: 496 words: 70,263

Erlang Programming by Francesco Cesarini

cloud computing, fault tolerance, finite state, functional programming, higher-order functions, loose coupling, revision control, RFC: Request For Comment, social bookmarking, sorting algorithm, Turing test, type inference, web application

This should record enough information to enable billing for the use of the phone. 138 | Chapter 5: Process Design Patterns CHAPTER 6 Process Error Handling Whatever the programming language, building distributed, fault-tolerant, and scalable systems with requirements for high availability is not for the faint of heart. Erlang’s reputation for handling the fault-tolerant and high-availability aspects of these systems has its foundations in the simple but powerful constructs built into the language’s concurrency model. These constructs allow processes to monitor each other’s behavior and to recover from software faults. They give Erlang a competitive advantage over other programming languages, as they facilitate development of the complex architecture that provides the required fault tolerance through isolating errors and ensuring nonstop operation.

…

In conjunction with these projects, the OTP framework was developed and released in 1996. OTP provides a framework to structure Erlang systems, offering robustness and fault tolerance together with a set of tools and libraries. The history of Erlang is important in understanding its philosophy. Although many languages were developed before finding their niche, Erlang was developed to solve the “time-to-market” requirements of distributed, fault-tolerant, massively concurrent, soft real-time systems. The fact that web services, retail and commercial banking, computer telephony, messaging systems, and enterprise integration, to mention but a few, happen to share the same requirements as telecom systems explains why Erlang is gaining headway in these sectors.

…

A typical example here is a web server: if you are planning a new release of a piece of software, or you are planning to stream video of a football match in real time, distributing the server across a number of machines will make this possible without failure. This performance is given by replication of a service—in this case a web server— which is often found in the architecture of a distributed system. • Replication also provides fault tolerance: if one of the replicated web servers fails or becomes unavailable for some reason, HTTP requests can still be served by the other servers, albeit at a slower rate. This fault tolerance allows the system to be more robust and reliable. • Distribution allows transparent access to remote resources, and building on this, it is possible to federate a collection of different systems to provide an overall user service.

pages: 570 words: 115,722

The Tangled Web: A Guide to Securing Modern Web Applications by Michal Zalewski

barriers to entry, business process, defense in depth, easy for humans, difficult for computers, fault tolerance, finite state, Firefox, Google Chrome, information retrieval, information security, machine readable, Multics, RFC: Request For Comment, semantic web, Steve Jobs, telemarketer, Tragedy of the Commons, Turing test, Vannevar Bush, web application, WebRTC, WebSocket

Vendors released their products with embedded programming languages such as JavaScript and Visual Basic, plug-ins to execute platform-independent Java or Flash applets on the user’s machine, and useful but tricky HTTP extensions such as cookies. Only a limited degree of superficial compatibility, sometimes hindered by patents and trademarks,[7] would be maintained. As the Web grew larger and more diverse, a sneaky disease spread across browser engines under the guise of fault tolerance. At first, the reasoning seemed to make perfect sense: If browser A could display a poorly designed, broken page but browser B refused to (for any reason), users would inevitably see browser B’s failure as a bug in that product and flock in droves to the seemingly more capable client, browser A.

…

One such example is the advice on parsing dates in certain HTTP headers, at the request of section 3.3 in RFC 1945. The resulting implementation (the prtime.c file in the Firefox codebase[118]) consists of close to 2,000 lines of extremely confusing and unreadable C code just to decipher the specified date, time, and time zone in a sufficiently fault-tolerant way (for uses such as deciding cache content expiration). Semicolon-Delimited Header Values Several HTTP headers, such as Cache-Control or Content-Disposition, use a semicolon-delimited syntax to cram several separate name=value pairs into a single line. The reason for allowing this nested notation is unclear, but it is probably driven by the belief that it will be a more efficient or a more intuitive approach that using several separate headers that would always have to go hand in hand.

…

Escape or substitute these values as appropriate. When building a new HTTP client, server, or proxy: Do not create a new implementation unless you absolutely have to. If you can’t help it, read this chapter thoroughly and aim to mimic an existing mainstream implementation closely. If possible, ignore the RFC-provided advice about fault tolerance and bail out if you encounter any syntax ambiguities. * * * [24] Public key cryptography relies on asymmetrical encryption algorithms to create a pair of keys: a private one, kept secret by the owner and required to decrypt messages, and a public one, broadcast to the world and useful only to encrypt traffic to that recipient, not to decrypt it.

pages: 1,085 words: 219,144

Solr in Action by Trey Grainger, Timothy Potter

business intelligence, cloud computing, commoditize, conceptual framework, crowdsourcing, data acquisition, data science, en.wikipedia.org, failed state, fault tolerance, finite state, full text search, functional programming, glass ceiling, information retrieval, machine readable, natural language processing, openstreetmap, performance metric, premature optimization, recommendation engine, web application

We only point this out because it’s a testament to the depth and breadth of automated testing in Lucene and Solr. If you have a nightly build off trunk in which all the automated tests pass, then you can be fairly confident that the core functionality is solid. We’ve touched on Solr’s approach to scalability and fault tolerance in sections 1.2.6 and 1.2.7. As an architect, you’re probably most curious about the limitations of Solr’s approach to scalability and fault tolerance. First, you should realize that the sharding and replication features in Solr have been improved in Solr 4 to be robust and easier to manage. The new approach to scaling is called SolrCloud. Under the covers, SolrCloud uses Apache ZooKeeper to distribute configurations across a cluster of Solr servers and to keep track of cluster state.

…

You may want to use replication either when you want to isolate indexing from searching operations to different servers within your cluster or when you need to increase available queries-per-second capacity. Fault tolerance It’s great that we can increase our overall query capacity by adding another server and replicating the index to that server, but what happens when one of our servers eventually crashes? When our application had only one server, the application clearly would have stopped. Now that multiple, redundant servers exist, one server dying will simply reduce our capacity back to the capacity of however many servers remain. If you want to build fault tolerance into your system, it’s a good idea to have additional resources (extra slave servers) in your cluster so that your system can continue functioning with enough capacity even if a single server fails.

…

Chapter 13 will demonstrate how you can delegate most of the fault tolerance and distributed query routing concerns to SolrCloud to make scaling Solr a more manageable process. Regardless of which scaling approach you take, it can be useful to understand how to interact with your Solr cores without having to restart Solr to make changes. The next section will dive into Solr’s Core Admin API, which provides rich features for managing Solr cores. 12.6. Solr core management In the previous section we discussed how to scale Solr using sharding (for large document sets) and replication (for fault tolerance and query load). Solr contains a suite of capabilities collectively called SolrCloud (covered in depth in the next chapter) that makes managing collections of documents through shards and replicas somewhat seamless.

pages: 161 words: 44,488

The Business Blockchain: Promise, Practice, and Application of the Next Internet Technology by William Mougayar

Airbnb, airport security, Albert Einstein, altcoin, Amazon Web Services, bitcoin, Black Swan, blockchain, business logic, business process, centralized clearinghouse, Clayton Christensen, cloud computing, cryptocurrency, decentralized internet, disintermediation, distributed ledger, Edward Snowden, en.wikipedia.org, Ethereum, ethereum blockchain, fault tolerance, fiat currency, fixed income, Ford Model T, global value chain, Innovator's Dilemma, Internet of things, Kevin Kelly, Kickstarter, market clearing, Network effects, new economy, peer-to-peer, peer-to-peer lending, prediction markets, pull request, QR code, ride hailing / ride sharing, Satoshi Nakamoto, sharing economy, smart contracts, social web, software as a service, too big to fail, Turing complete, Vitalik Buterin, web application, Yochai Benkler

Leslie Lamport, Robert Shostak, and Marshall Pease, The Byzantine Generals Problem. http://research.microsoft.com/en-us/um/people/lamport/pubs/byz.pdf. 6. IT Does not Matter, https://hbr.org/2003/05/it-doesnt-matter. 7. PayPal website, https://www.paypal.com/webapps/mpp/about. 8. Personal communication with Vitalik Buterin, February 2016. 9. Byzantine fault tolerance, https://en.wikipedia.org/wiki/Byzantine_fault_tolerance. 10. Proof-of-stake, https://en.wikipedia.org/wiki/Proof-of-stake. 2 HOW BLOCKCHAIN TRUST INFILTRATES “I cannot understand why people are frightened of new ideas. I’m frightened of the old ones.” –JOHN CAGE REACHING CONSENSUS is at the heart of a blockchain’s operations.

…

In part, the continuation of some of the trends in crypto 2.0, and particularly generalized protocols that provide both computational abstraction and privacy. But equally important is the current technological elephant in the room in the blockchain sphere: scalability. Currently, all existing blockchain protocols have the property that every computer in the network must process every transaction—a property that provides extreme degrees of fault tolerance and security, but at the cost of ensuring that the network's processing power is effectively bounded by the processing power of a single node. Crypto 3.0—at least in my mind—consists of approaches that move beyond this limitation, in one of various ways to create systems that break through this limitation and actually achieve the scale needed to support mainstream adoption (technically astute readers may have heard of “lightning networks,” “state channels,” and “sharding”).

…

Game theory is ‘the study of mathematical models of conflict and cooperation between intelligent rational decision-makers.”4 And this is related to the blockchain because the Bitcoin blockchain, originally conceived by Satoshi Nakamoto, had to solve a known game theory conundrum called the Byzantine Generals Problem.5 Solving that problem consists in mitigating any attempts by a small number of unethical Generals who would otherwise become traitors, and lie about coordinating their attack to guarantee victory. This is accomplished by enforcing a process for verifying the work that was put into crafting these messages, and time-limiting the requirement for seeing untampered messages in order to ensure their validity. Implementing a “Byzantine Fault Tolerance” is important because it starts with the assumption that you cannot trust anyone, and yet it delivers assurance that the transaction has traveled and arrived safely based on trusting the network during its journey, while surviving potential attacks. There are fundamental implications for this new method of reaching safety in the finality of a transaction, because it questions the existence and roles of current trusted intermediaries, who held the traditional authority on validating transactions.

pages: 434 words: 77,974

Mastering Blockchain: Unlocking the Power of Cryptocurrencies and Smart Contracts by Lorne Lantz, Daniel Cawrey

air gap, altcoin, Amazon Web Services, barriers to entry, bitcoin, blockchain, business logic, business process, call centre, capital controls, cloud computing, corporate governance, creative destruction, cross-border payments, cryptocurrency, currency peg, disinformation, disintermediation, distributed ledger, Dogecoin, Ethereum, ethereum blockchain, fault tolerance, fiat currency, Firefox, global reserve currency, information security, initial coin offering, Internet of things, Kubernetes, litecoin, low interest rates, Lyft, machine readable, margin call, MITM: man-in-the-middle, multilevel marketing, Network effects, offshore financial centre, OSI model, packet switching, peer-to-peer, Ponzi scheme, prediction markets, QR code, ransomware, regulatory arbitrage, rent-seeking, reserve currency, Robinhood: mobile stock trading app, Ross Ulbricht, Satoshi Nakamoto, Silicon Valley, Skype, smart contracts, software as a service, Steve Wozniak, tulip mania, uber lyft, unbanked and underbanked, underbanked, Vitalik Buterin, web application, WebSocket, WikiLeaks

The following are some of the companies involved, and their roles: Payments: PayU Technology: Facebook, FarFetch, Lyft, Spotify, Uber Telecom: Iliad Blockchain: Anchorage, BisonTrails, Coinbase, Xapo Venture capital: Andreessen Horowitz, Breakthrough Initiatives, Union Square Ventures, Ribbit Capital, Thrive Capital Nonprofits: Creative Destruction Lab, Kiva, Mercy Corps, Women’s World Banking Borrowing from Existing Blockchains The Libra Association intends to create an entirely new payments system on the internet by using a proof-of-stake consensus Byzantine fault-tolerant algorithm developed by VMware, known as HotStuff. The association’s members will be the validators of the system. HotStuff uses a lead validator. It accepts transactions from the clients and uses a voting mechanism for validation. It is fault tolerant because the other validators can take the lead’s place in case of error or downtime. Byzantine fault tolerance is used in other blockchain systems, most notably on some smaller open networks utilizing proof-of-stake. Figure 9-7 illustrates Libra’s consensus mechanism.

…

While early on Ripple was an open source competitor to Bitcoin, with third-party “gateways” that functioned as a method of anonymous exchange, in 2014 the company pivoted to supporting banks as a faster and cheaper settlement network with a cross-border focus. Instead of using traditional proof-of-work, Ripple introduced a new type of consensus known as the XRP Consensus Protocol. It uses Byzantine fault-tolerant agreement, which requires nodes to come to agreement on transactions. Ripple has hundreds of partnerships with various companies in the banking and payments sectors. The best-known strategic partnership is with the money remittances company MoneyGram, in which Ripple has made a $50 million equity investment.

…

VmWare With support for the EVM, DAML, and Hyperledger, VmWare Blockchain is a multiblockchain platform. Developers are also able to use VmWare’s cloud technology to set up various types of infrastructure implementations, including the option of hybrid cloud capabilities to increase security and privacy. It also uses a Byzantine fault-tolerant consensus engine to provide features of decentralization. Oracle Oracle’s Blockchain Platform is built on Hyperledger Fabric and supports multicloud implementations—hybrid, on-premise, or a mix of the two for greater flexibility. The purpose is to be able to configure specific environments depending on regulatory requirements.

Scala in Action by Nilanjan Raychaudhuri

business logic, continuous integration, create, read, update, delete, database schema, domain-specific language, don't repeat yourself, duck typing, en.wikipedia.org, failed state, fault tolerance, functional programming, general-purpose programming language, higher-order functions, index card, Kanban, MVC pattern, type inference, web application

So many things can go wrong in the concurrent/ parallel programming world. What if we get an IOException while reading the file? Let’s learn how to handle faults in an actor-based application. 9.3.4. Fault tolerance made easy with a supervisor Akka encourages nondefensive programming in which failure is a valid state in the lifecycle of an application. As a programmer you know you can’t prevent every error, so it’s better to prepare your application for the errors. You can easily do this through fault-tolerance support provided by Akka through the supervisor hierarchy. Think of this supervisor as an actor that links to supervised actors and restarts them when one dies.

…

You can have one supervisor linked to another supervisor. That way you can supervise a supervisor in case of a crash. It’s hard to build a fault-tolerant system with one box, so I recommend having your supervisor hierarchy spread across multiple machines. That way, if a node (machine) is down, you can restart an actor in a different box. Always remember to delegate the work so that if a crash occurs, another supervisor can recover. Now let’s look into the fault-tolerant strategies available in Akka. Supervision Strategies in Akka Akka comes with two restarting strategies: One-for-One and All-for-One.

…

First I’ll talk about the philosophy behind Akka so you understand the goal behind the Akka project and the problems it tries to solve. 12.1. The philosophy behind Akka The philosophy behind Akka is simple: make it easier for developers to build correct, concurrent, scalable, and fault-tolerant applications. To that end, Akka provides a higher level of abstractions to deal with concurrency, scalability, and faults. Figure 12.1 shows the three core modules provided by Akka for concurrency, scalability, and fault tolerance. Figure 12.1. Akka core modules The concurrency module provides options to solve concurrency-related problems. By now I’m sure you’re comfortable with actors (message-oriented concurrency).

pages: 250 words: 73,574

Nine Algorithms That Changed the Future: The Ingenious Ideas That Drive Today's Computers by John MacCormick, Chris Bishop

Ada Lovelace, AltaVista, Charles Babbage, Claude Shannon: information theory, Computing Machinery and Intelligence, fault tolerance, information retrieval, Menlo Park, PageRank, pattern recognition, Richard Feynman, Silicon Valley, Simon Singh, sorting algorithm, speech recognition, Stephen Hawking, Steve Jobs, Steve Wozniak, traveling salesman, Turing machine, Turing test, Vannevar Bush

At the time of writing, however, many of the systems that claim to be peer-to-peer in fact use central servers for some of their functionality and thus do not need to rely on distributed hash tables. The technique of “Byzantine fault tolerance” falls in the same category: a surprising and beautiful algorithm that can't yet be classed as great, due to lack of adoption. Byzantine fault tolerance allows certain computer systems to tolerate any type of error whatsoever (as long as there are not too many simultaneous errors). This contrasts with the more usual notion of fault tolerance, in which a system can survive more benign errors, such as the permanent failure of a disk drive or an operating system crash.

…

The to-do list trick also guarantees consistency in the face of failures. When combined with the prepare-then-commit trick for replicated databases, we are left with iron-clad consistency and durability for our data. The heroic triumph of databases over unreliable components, known by computer scientists as “fault-tolerance,” is the work of many researchers over many decades. But among the most important contributors was Jim Gray, a superb computer scientist who literally wrote the book on transaction processing. (The book is Transaction Processing: Concepts and Techniques, first published in 1992.) Sadly, Gray's career ended early: one day in 2007, he sailed his yacht out of San Francisco Bay, under the Golden Gate Bridge, and into the open ocean on a planned day trip to some nearby islands.

…

See also certification authority authority trick B-tree Babylonia backup bank; account number; balance; for keys; online banking; for signatures; transfer; as trusted third party base, in exponentiation Battelle, John Bell Telephone Company binary Bing biology biometric sensor Bishop, Christopher bit block cipher body, of a web page brain Brin, Sergey British government browser brute force bug Burrows, Mike Bush, Vannevar Businessweek Byzantine fault tolerance C++ programming language CA. See certification authority calculus Caltech Cambridge CanCrash.exe CanCrashWeird.exe Carnegie Mellon University CD cell phone. See phone certificate certification authority Charles Babbage Institute chat-bot checkbook checksum; in practice; simple; staircase.

pages: 422 words: 86,414

Hands-On RESTful API Design Patterns and Best Practices by Harihara Subramanian

blockchain, business logic, business process, cloud computing, continuous integration, create, read, update, delete, cyber-physical system, data science, database schema, DevOps, disruptive innovation, domain-specific language, fault tolerance, information security, Infrastructure as a Service, Internet of things, inventory management, job automation, Kickstarter, knowledge worker, Kubernetes, loose coupling, Lyft, machine readable, microservices, MITM: man-in-the-middle, MVC pattern, Salesforce, self-driving car, semantic web, single page application, smart cities, smart contracts, software as a service, SQL injection, supply-chain management, web application, WebSocket

Thus, API gateway clustering is important for continuously receiving and responding to service messages and the Load Balancer plays a vital role in fulfilling this, as illustrated in the following diagram: High availability and failover In the era of microservices, guaranteeing high available through fault tolerance, fault detection, and isolation is an important thing for architects. In the recent past, API gateway solutions emerged as a critical component for the microeconomic era. Microservice architecture is being touted as as the soul and savior for facilitating the mandated business adaptivity, process optimization, and automation.

…

The orchestration/choreography, brokerage, discovery, routing, enrichment, policy enforcement, governance, concierge jobs, and so on are performed by standardized API gateway solutions. On the other hand, API management adds additional capabilities such as analytics and life cycle management. In future, there will be attempts to meet QoS and NFRs such as availability, scalability, high performance/throughput, security, and reliability through replication and fault tolerance, through a combination of API gateways, cluster and orchestration platforms, service mesh solutions, and so on. API gateways for microservice-centric applications The unique contributions of API gateways for operationalizing microservices in a beneficial fashion are growing as days pass.

…

This strategically-sound transition ultimately enables them to be innately smart in their actions and reactions. And this grandiose and technology-inspired transformation of everyday elements and entities in our daily environments leads to the timely formulation and delivery of service-oriented, event-driven, insight-filled, cloud-enabled, fault-tolerant, mission-critical, multifaceted, and people-centric services. The role of the powerful RESTful paradigm in building and providing these kinds of advanced and next-generation services is steadily growing. This chapter is specially crafted to tell you all about the contributions of the RESTful services paradigm toward designing, developing, and deploying next-generation microservices-centric and enterprise-scale applications.

pages: 412 words: 104,864

Silence on the Wire: A Field Guide to Passive Reconnaissance and Indirect Attacks by Michal Zalewski

active measures, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, AltaVista, Charles Babbage, complexity theory, dark matter, data acquisition, Donald Knuth, fault tolerance, information security, MITM: man-in-the-middle, NP-complete, OSI model, Silicon Valley, speech recognition, Turing complete, Turing machine, Vannevar Bush

Spanning tree protocol (STP) Lets you build redundant network structures in which switches are interconnected in more than one location, in order to maintain fault tolerance. Traditionally, such a design could cause broadcast traffic and some other packets to loop forever while also causing network performance to deteriorate significantly, because the data received on one interface and forwarded to another in effect bounces back to the originator (see Figure 7-2, left). When designing a network, it is often difficult to avoid accidental broadcast loops. It is also sometimes desirable to design architectures with potential loops (in which one switch connects to two or more switches), because this type of design is much more fault tolerant and a single device or single link can be taken out without dividing the entire network into two separate islands.

…

A stateful NAT mechanism can be used, among other applications, to implement fault-tolerant setups in which a single, publicly accessible IP address is served by more than one internal server. Or to save address space and improve security, NAT can be implemented to allow the internal network to use a pool of private, not publicly accessible, addresses, while enabling hosts on the network to communicate with the Internet by “masquerading” as a single public IP machine. In the first scenario, NAT rewrites destination addresses on incoming packets to a number of private systems behind the firewall. This provides a fault-tolerant load-balancing setup, in which subsequent requests to a popular website (http://www.microsoft.com, perhaps) or other critical service can be distributed among an array of systems, and if any one fails, other systems can take over.

…

Based on the result of this election, a treelike traffic distribution hierarchy is built from this node down, and links that could cause a reverse propagation of broadcast traffic are temporarily disabled (see Figure 7-2, right). You can quickly change this simple self-organizing hierarchy when one of the nodes drops off and reactivate a link previously deemed unnecessary. Figure 7-2. Packet storm problem and STP election scheme; left side shows a fault-tolerant network with no STP, where some packets are bound to loop (almost) forever between switches; right side is the same network where one of the devices was automatically elected a master node using STP, and for which the logical topology was adjusted to eliminate loops. When one of the links fails, the network would be reconfigured to ensure proper operations

pages: 305 words: 89,103

Scarcity: The True Cost of Not Having Enough by Sendhil Mullainathan

American Society of Civil Engineers: Report Card, Andrei Shleifer, behavioural economics, Cass Sunstein, clean water, cognitive load, computer vision, delayed gratification, double entry bookkeeping, Exxon Valdez, fault tolerance, happiness index / gross national happiness, impulse control, indoor plumbing, inventory management, knowledge worker, late fees, linear programming, mental accounting, microcredit, p-value, payday loans, purchasing power parity, randomized controlled trial, Report Card for America’s Infrastructure, Richard Thaler, Saturday Night Live, Walter Mischel, Yogi Berra

Skipping class in a training program while you’re dealing with scarcity is not the same as playing hooky in middle school. Linear classes that must not be missed can work well for the full-time student; they do not make sense for the juggling poor. It is important to emphasize that fault tolerance is not a substitute for personal responsibility. On the contrary: fault tolerance is a way to ensure that when the poor do take it on themselves, they can improve—as so many do. Fault tolerance allows the opportunities people receive to match the effort they put in and the circumstances they face. It does not take away the need for hard work; rather, it allows hard work to yield better returns for those who are up for the challenge, just as improved levers in the cockpit allow the dedicated pilot to excel.

…

But why not look at the design of the cockpit rather than the workings of the pilot? Why not look at the structure of the programs rather than the failings of the clients? If we accept that pilots can fail and that cockpits need to be wisely structured so as to inhibit those failures, why can we not do the same with the poor? Why not design programs structured to be more fault tolerant? We could ask the same question of anti-poverty programs. Consider the training programs, where absenteeism is common and dropout rates are high. What happens when, loaded and depleted, a client misses a class? What happens when her mind wanders in class? The next class becomes a lot harder.

…

Now roll forward a few more weeks. By now you’ve missed another class. And when you go, you understand less than before. Eventually you decide it’s just too much right now; you’ll drop out and sign up another time, when your financial life is more together. The program you tried was not designed to be fault tolerant. It magnified your mistakes, which were predictable, and essentially pushed you out the door. But it need not be that way. Instead of insisting on no mistakes or for behavior to change, we can redesign the cockpit. Curricula can be altered, for example, so that there are modules, staggered to start at different times and to proceed in parallel.

pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline by Cathy O'Neil, Rachel Schutt

Amazon Mechanical Turk, augmented reality, Augustin-Louis Cauchy, barriers to entry, Bayesian statistics, bike sharing, bioinformatics, computer vision, confounding variable, correlation does not imply causation, crowdsourcing, data science, distributed generation, Dunning–Kruger effect, Edward Snowden, Emanuel Derman, fault tolerance, Filter Bubble, finite state, Firefox, game design, Google Glasses, index card, information retrieval, iterative process, John Harrison: Longitude, Khan Academy, Kickstarter, machine translation, Mars Rover, Nate Silver, natural language processing, Netflix Prize, p-value, pattern recognition, performance metric, personalized medicine, pull request, recommendation engine, rent-seeking, selection bias, Silicon Valley, speech recognition, statistical model, stochastic process, tacit knowledge, text mining, the scientific method, The Wisdom of Crowds, Watson beat the top human players on Jeopardy!, X Prize

Enter MapReduce In 2004 Jeff and Sanjay published their paper “MapReduce: Simplified Data Processing on Large Clusters” (and here’s another one on the underlying filesystem). MapReduce allows us to stop thinking about fault tolerance; it is a platform that does the fault tolerance work for us. Programming 1,000 computers is now easier than programming 100. It’s a library to do fancy things. To use MapReduce, you write two functions: a mapper function, and then a reducer function. It takes these functions and runs them on many machines that are local to your stored data. All of the fault tolerance is automatically done for you once you’ve placed the algorithm into the map/reduce framework. The mapper takes each data point and produces an ordered pair of the form (key, value).

…

If we denote by the variable that exhibits whether a given computer is working, so means it works and means it’s broken, then we can assume: But this means, when we have 1,000 computers, the chance that no computer is broken is which is generally pretty small even if is small. So if for each individual computer, then the probability that all 1,000 computers work is 0.37, less than even odds. This isn’t sufficiently robust. What to do? We address this problem by talking about fault tolerance for distributed work. This usually involves replicating the input (the default is to have three copies of everything), and making the different copies available to different machines, so if one blows, another one will still have the good data. We might also embed checksums in the data, so the data itself can be audited for errors, and we will automate monitoring by a controller machine (or maybe more than one?).

…

To add efficiency, when some machines finish, we should use the excess capacity to rerun work, again checking for errors. Note Q: Wait, I thought we were counting things?! This seems like some other awful rat’s nest we’ve gotten ourselves into. A: It’s always like this. You cannot reason about the efficiency of fault tolerance easily; everything is complicated. And note, efficiency is just as important as correctness, because a thousand computers are worth more than your salary. It’s like this: The first 10 computers are easy; The first 100 computers are hard; and The first 1,000 computers are impossible. There’s really no hope.

pages: 589 words: 147,053

The Age of Em: Work, Love and Life When Robots Rule the Earth by Robin Hanson

8-hour work day, artificial general intelligence, augmented reality, Berlin Wall, bitcoin, blockchain, brain emulation, business cycle, business process, Clayton Christensen, cloud computing, correlation does not imply causation, creative destruction, deep learning, demographic transition, Erik Brynjolfsson, Ethereum, ethereum blockchain, experimental subject, fault tolerance, financial intermediation, Flynn Effect, Future Shock, Herman Kahn, hindsight bias, information asymmetry, job automation, job satisfaction, John Markoff, Just-in-time delivery, lone genius, Machinery of Freedom by David Friedman, market design, megaproject, meta-analysis, Nash equilibrium, new economy, Nick Bostrom, pneumatic tube, power law, prediction markets, quantum cryptography, rent control, rent-seeking, reversible computing, risk tolerance, Silicon Valley, smart contracts, social distancing, statistical model, stem cell, Thomas Malthus, trade route, Turing test, Tyler Cowen, Vernor Vinge, William MacAskill

If emulation hardware is digital, then it could either be deterministic, so that the value and timing of output states are always exactly predictable, or it could be fault-prone and fault-tolerant in the sense of having and tolerating more frequent and larger logic errors and timing fluctuations. Most digital hardware today is deterministic, but large parallel systems are more often fault-tolerant. The design of fault-tolerant hardware and software is an active area of research today (Bogdan et al. 2007). As human brains are large, parallel, and have an intrinsically fault-tolerant design, brain emulation software is likely to need less special adaptation to run on fault-prone hardware.

…

As human brains are large, parallel, and have an intrinsically fault-tolerant design, brain emulation software is likely to need less special adaptation to run on fault-prone hardware. Such hardware is usually cheaper to design and construct, occupies less volume, and takes less energy to run. Thus em hardware is likely to often be fault-prone and fault-tolerant. Cosmic rays are high-energy particles that come from space and disrupt the operation of electronic devices. Hardware errors resulting from cosmic rays cause a higher rate of errors per operation in hardware that runs more slowly, with all else equal. Because of this, when ems run slower, with each operation taking more time, they either tend to tolerate fewer other errors, or they pay more for error correction.

…

United States Bureau of Labor Statistics USDL-12–1887, September 18. http://www.bls.gov/news.release/archives/tenure_09182012.pdf. Boehm, Christopher. 1999. Hierarchy in the Forest: The Evolution of Egalitarian Behavior. Harvard University Press, December 1. Bogdan, Paul, Tudor Dumitras, and Radu Marculescu. 2007. “Stochastic Communication: A New Paradigm for Fault-Tolerant Networks-on-Chip.” VLSI Design 2007: 95348. Boning, Brent, Casey Ichniowski, and Kathryn Shaw. 2007. “Opportunity Counts: Teams and the Effectiveness of Production Incentives.” Journal of Labor Economics 25(4): 613–650. Bonke, Jens. 2012. “Do Morning-Type People Earn More than Evening-Type People?

pages: 463 words: 118,936

Darwin Among the Machines by George Dyson

Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anti-communist, backpropagation, Bletchley Park, British Empire, carbon-based life, cellular automata, Charles Babbage, Claude Shannon: information theory, combinatorial explosion, computer age, Computing Machinery and Intelligence, Danny Hillis, Donald Davies, fault tolerance, Fellow of the Royal Society, finite state, IFF: identification friend or foe, independent contractor, invention of the telescope, invisible hand, Isaac Newton, Jacquard loom, James Watt: steam engine, John Nash: game theory, John von Neumann, launch on warning, low earth orbit, machine readable, Menlo Park, Nash equilibrium, Norbert Wiener, On the Economy of Machinery and Manufactures, packet switching, pattern recognition, phenotype, RAND corporation, Richard Feynman, spectrum auction, strong AI, synthetic biology, the scientific method, The Wealth of Nations by Adam Smith, Turing machine, Von Neumann architecture, zero-sum game

Von Neumann believed that entirely different logical foundations would be required to arrive at an understanding of even the simplest nervous system, let alone the human brain. His Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components (1956) explored the possibilities of parallel architecture and fault-tolerant neural nets. This approach would soon be superseded by a development that neither nature nor von Neumann had counted on: the integrated circuit, composed of logically intricate yet structurally monolithic microscopic parts. Serial architecture swept the stage. Probabilistic logics, along with vacuum tubes and acoustic delay-line memory, would scarcely be heard from again.

…

At one level, this language may appear to us to be money, especially the new, polymorphous E-money that circulates without reserve at the speed of light. E-money is, after all, simply a consensual definition of “electrons with meaning,” allowing other levels of meaning to freely evolve. Composed of discrete yet divisible and liquid units, digital currency resembles the pulse-frequency coding that has proved to be such a rugged and fault-tolerant characteristic of the nervous systems evolved by biology. Frequency-modulated signals that travel through the nerves are associated with chemical messages that are broadcast by diffusion through the fluid that bathes the brain. Money has a twofold nature that encompasses both kinds of behavior: it can be transmitted, like an electrical signal, from one place (or time) to another; or it can be diffused in any number of more chemical, hormonelike ways.

…

The packet chooses a channel that happens to be quiet at that instant and jumps to the next lamppost at the speed of light. The multiplexing of communications across the available network topology is extended to the multiplexing of network topology across the available frequency spectrum. Communication becomes more efficient, fault tolerant, and secure. The way the system works now (in a growing number of metropolitan areas—hence the name) is that you purchase or rent a small Ricochet modem, about the size of a large candy bar and transmitting at about two-thirds of a watt. Your modem establishes contact with the nearest pole-top lunch box or directly with any other modem of its species within range.

pages: 1,758 words: 342,766

Code Complete (Developer Best Practices) by Steve McConnell

Ada Lovelace, Albert Einstein, Buckminster Fuller, business logic, call centre, classic study, continuous integration, data acquisition, database schema, don't repeat yourself, Donald Knuth, fault tolerance, General Magic , global macro, Grace Hopper, haute cuisine, if you see hoof prints, think horses—not zebras, index card, inventory management, iterative process, Larry Wall, loose coupling, Menlo Park, no silver bullet, off-by-one error, Perl 6, place-making, premature optimization, revision control, Sapir-Whorf hypothesis, seminal paper, slashdot, sorting algorithm, SQL injection, statistical model, Tacoma Narrows Bridge, the Cathedral and the Bazaar, the scientific method, Thomas Kuhn: the structure of scientific revolutions, Turing machine, web application

The fact that an environment has a particular error-handling approach doesn't mean that it's the best approach for your requirements. Fault Tolerance The architecture should also indicate the kind of fault tolerance expected. Fault tolerance is a collection of techniques that increase a system's reliability by detecting errors, recovering from them if possible, and containing their bad effects if not. Further Reading For a good introduction to fault tolerance, see the July 2001 issue of IEEE Software. In addition to providing a good introduction, the articles cite many key books and key articles on the topic. For example, a system could make the computation of the square root of a number fault tolerant in any of several ways: The system might back up and try again when it detects a fault.

…

Each class computes the square root, and then the system compares the results. Depending on the kind of fault tolerance built into the system, it then uses the mean, the median, or the mode of the three results. The system might replace the erroneous value with a phony value that it knows to have a benign effect on the rest of the system. Other fault-tolerance approaches include having the system change to a state of partial operation or a state of degraded functionality when it detects an error. It can shut itself down or automatically restart itself. These examples are necessarily simplistic. Fault tolerance is a fascinating and complex subject—unfortunately, it's one that's outside the scope of this book.

…

Does the architecture set space and speed budgets for each class, subsystem, or functionality area? Does the architecture describe how scalability will be achieved? Does the architecture address interoperability? Is a strategy for internationalization/localization described? Is a coherent error-handling strategy provided? Is the approach to fault tolerance defined (if any is needed)? Has technical feasibility of all parts of the system been established? Is an approach to overengineering specified? Are necessary buy-vs.-build decisions included? Does the architecture describe how reused code will be made to conform to other architectural objectives?

pages: 480 words: 99,288

Mastering ElasticSearch by Rafal Kuc, Marek Rogozinski

Amazon Web Services, book value, business logic, create, read, update, delete, en.wikipedia.org, fault tolerance, finite state, full text search, information retrieval

This allows us to store various document types in one index and have different mappings for different document types. Node The single instance of the ElasticSearch server is called a node. A single node ElasticSearch deployment can be sufficient for many simple use cases, but when you have to think about fault tolerance or you have lots of data that cannot fit in a single server, you should think about multi-node ElasticSearch cluster. Cluster Cluster is a set of ElasticSearch nodes that work together to handle the load bigger than single instance can handle (both in terms of handling queries and documents).

…

In the next chapter, we'll look closely at what ElasticSearch offers us when it comes to shard control. We'll see how to choose the right amount of shards and replicas for our index, we'll manipulate shard placement and we will see when to create more shards than we actually need. We'll discuss how the shard allocator works. Finally, we'll use all the knowledge we've got so far to create fault tolerant and scalable clusters. Chapter 4. Index Distribution Architecture In the previous chapter, we've learned how to use different scoring formulas and how we can benefit from using them. We've also seen how to use different posting formats to change how the data is indexed. In addition to that, we now know how to handle near real-time searching and real-time get and what searcher reopening means for ElasticSearch.

…

Using our knowledge As we are slowly approaching the end of the fourth chapter we need to get something that is closer to what you can encounter during your everyday work. Because of that we have decided to divide the real-life example into two sections. In this section, you'll see how to combine the knowledge we've got so far to build a fault-tolerant and scalable cluster based on some assumptions. Because this chapter is mostly about configuration, we will concentrate on that. The mappings and your data may be different, but with similar amount data and queries hitting your cluster the following sections may be useful for you. Assumptions Before we go into the juicy configuration details let's make some basic assumptions with which using which we will configure our ElasticSearch cluster.

pages: 719 words: 181,090

Site Reliability Engineering: How Google Runs Production Systems by Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Richard Murphy

"Margaret Hamilton" Apollo, Abraham Maslow, Air France Flight 447, anti-pattern, barriers to entry, business intelligence, business logic, business process, Checklist Manifesto, cloud computing, cognitive load, combinatorial explosion, continuous integration, correlation does not imply causation, crowdsourcing, database schema, defense in depth, DevOps, en.wikipedia.org, exponential backoff, fail fast, fault tolerance, Flash crash, George Santayana, Google Chrome, Google Earth, if you see hoof prints, think horses—not zebras, information asymmetry, job automation, job satisfaction, Kubernetes, linear programming, load shedding, loose coupling, machine readable, meta-analysis, microservices, minimum viable product, MVC pattern, no silver bullet, OSI model, performance metric, platform as a service, proprietary trading, reproducible builds, revision control, risk tolerance, side project, six sigma, the long tail, the scientific method, Toyota Production System, trickle-down economics, warehouse automation, web application, zero day

The product developers have more visibility into the time and effort involved in writing and releasing their code, while the SREs have more visibility into the service’s reliability (and the state of production in general). These tensions often reflect themselves in different opinions about the level of effort that should be put into engineering practices. The following list presents some typical tensions: Software fault tolerance How hardened do we make the software to unexpected events? Too little, and we have a brittle, unusable product. Too much, and we have a product no one wants to use (but that runs very stably). Testing Again, not enough testing and you have embarrassing outages, privacy data leaks, or a number of other press-worthy events.

…

Deploying Distributed Consensus-Based Systems The most critical decisions system designers must make when deploying a consensus-based system concern the number of replicas to be deployed and the location of those replicas. Number of Replicas In general, consensus-based systems operate using majority quorums, i.e., a group of replicas may tolerate failures (if Byzantine fault tolerance, in which the system is resistant to replicas returning incorrect results, is required, then replicas may tolerate failures [Cas99]). For non-Byzantine failures, the minimum number of replicas that can be deployed is three—if two are deployed, then there is no tolerance for failure of any process.

…

[All15] J. Allspaw, “Trade-Offs Under Pressure: Heuristics and Observations of Teams Resolving Internet Service Outages”, MSc thesis, Lund University, 2015. [Ana07] S. Anantharaju, “Automating web application security testing”, blog post, July 2007. [Ana13] R. Ananatharayan et al., “Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams”, in SIGMOD ’13, 2013. [And05] A. Andrieux, K. Czajkowski, A. Dan, et al., “Web Services Agreement Specification (WS-Agreement)”, September 2005. [Bai13] P. Bailis and A. Ghodsi, “Eventual Consistency Today: Limitations, Extensions, and Beyond”, in ACM Queue, vol. 11, no. 3, 2013.

Reactive Messaging Patterns With the Actor Model: Applications and Integration in Scala and Akka by Vaughn Vernon

A Pattern Language, business intelligence, business logic, business process, cloud computing, cognitive dissonance, domain-specific language, en.wikipedia.org, fault tolerance, finite state, functional programming, Internet of things, Kickstarter, loose coupling, remote working, type inference, web application

• If scheduling tasks is difficult and error prone, leave the task scheduling to software that is best at that job, and focus on your system’s use cases instead. • If errors happen—and errors do happen—design your system to expect errors and react to errors by being fault tolerant. These are powerful assertions. Yet, is there a way to realize these sound concurrency design principles? Or have we just identified a panacea of wishful thinking? Can we actually use multithreaded software development techniques that enable us to reason about our systems, that react to changing conditions, that are scalable and fault tolerant, and that really work? How the Actor Model Helps A system of actors helps you leverage the simultaneous use of multiple processor cores.

…

Trying to take full advantage of contemporary hardware improvements such as increasing numbers of processors and cores and growing processor cache is seriously impeded by the very tools and patterns that should be helping us. Thus, implementing event-driven, scalable, resilient, and responsive applications is often deemed too difficult and risky and as a result is generally avoided. The Akka toolkit was created to address the failings of common multithreaded programming approaches, distributed computing, and fault tolerance. It does so by using the Actor model, which provides powerful abstractions that make creating solutions around concurrency and parallelism much easier to reason about and succeed in. This is not to say that Akka removes the need to think about concurrency. It doesn’t, and you must still design for parallelism, latency, and eventually consistent application state and think of how you will prevent your application from unnecessary blocking.

…

Akka clustering is useful not only for peak demand but also for failover. Even if you have five machines available, clustering can make more efficient use of resources by assigning extra work to the machines least under load and by rebalancing work between machines if a machine crashes unexpectedly. Akka clustering is designed to support a multinode, fault-tolerant, distributed system of actors. It does this by creating a cluster of nodes. A node must be an ActorSystem that is exposed on a TCP port so that it has a unique identifier. Every node must share the same ActorSystem name. Every node member must use a different port number within its host server hardware; no two node members may share a socket port number on the same physical machine.

pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzić

Albert Einstein, algorithmic bias, backpropagation, bioinformatics, business cycle, business intelligence, business process, butter production in bangladesh, combinatorial explosion, computer vision, conceptual framework, correlation coefficient, correlation does not imply causation, data acquisition, discrete time, El Camino Real, fault tolerance, finite state, Gini coefficient, information retrieval, Internet Archive, inventory management, iterative process, knowledge worker, linked data, loose coupling, Menlo Park, natural language processing, Netflix Prize, NP-complete, PageRank, pattern recognition, peer-to-peer, phenotype, random walk, RFID, semantic web, speech recognition, statistical model, Telecommunications Act of 1996, telemarketer, text mining, traveling salesman, web application

In the context of data classification, an ANN can be designed to provide information not only about which particular class to select for a given sample, but also about confidence in the decision made. This latter information may be used to reject ambiguous data, should they arise, and thereby improve the classification performance or performances of the other tasks modeled by the network. 5. Fault Tolerance. An ANN has the potential to be inherently fault-tolerant, or capable of robust computation. Its performances do not degrade significantly under adverse operating conditions such as disconnection of neurons, and noisy or missing data. There is some empirical evidence for robust computation, but usually it is uncontrolled. 6.

…

SOM applications. (a) Drugs binding to human cytochrome; (b) interest rate classification; (c) analysis of book-buying behavior. 7.8 REVIEW QUESTIONS AND PROBLEMS 1. Explain the fundamental differences between the design of an ANN and “classical” information-processing systems. 2. Why is fault-tolerance property one of the most important characteristics and capabilities of ANNs? 3. What are the basic components of the neuron’s model? 4. Why are continuous functions such as log-sigmoid or hyperbolic tangent considered common activation functions in real-world applications of ANNs? 5. Discuss the differences between feedforward and recurrent neural networks. 6.

…

Because of the massive amount of data and the speed of which the data are generated, many data-mining applications in sensor networks require in-network processing such as aggregation to reduce sample size and communication overhead. Online data mining in sensor networks offers many additional challenges, including: limited communication bandwidth, constraints on local computing resources, limited power supply, need for fault tolerance, and asynchronous nature of the network. Obviously, data-mining systems have evolved in a short period of time from stand-alone programs characterized by single algorithms with little support for the entire knowledge-discovery process to integrated systems incorporating several mining algorithms, multiple users, communications, and various and heterogeneous data formats and distributed data sources.

Industry 4.0: The Industrial Internet of Things by Alasdair Gilchrist

3D printing, additive manufacturing, air gap, AlphaGo, Amazon Web Services, augmented reality, autonomous vehicles, barriers to entry, business intelligence, business logic, business process, chief data officer, cloud computing, connected car, cyber-physical system, data science, deep learning, DeepMind, deindustrialization, DevOps, digital twin, fault tolerance, fulfillment center, global value chain, Google Glasses, hiring and firing, industrial robot, inflight wifi, Infrastructure as a Service, Internet of things, inventory management, job automation, low cost airline, low skilled workers, microservices, millennium bug, OSI model, pattern recognition, peer-to-peer, platform as a service, pre–internet, race to the bottom, RFID, Salesforce, Skype, smart cities, smart grid, smart meter, smart transportation, software as a service, stealth mode startup, supply-chain management, The future is already here, trade route, undersea cable, vertical integration, warehouse robotics, web application, WebRTC, Y2K

Therefore, we see the following delivery mechanisms: At most once delivery—This is commonly called fire and forget and rides on unreliable protocols such as UDP At least once delivery—This is reliable delivery such as TCP/IP where every message is delivered to the recipient Exactly once delivery—This technique is used in batch jobs as means of delivery that ensures late packets, delayed through excessive latency or delay or even jitter do not mess up the results Additionally, there are also many other factors that need to be taken into consideration such as lifespan, which relates to the IISs to discard old data packets, much like the time-to-live factor on IP packets. There is also fault tolerance, which ensures that there is fault survivability and alternative routes or hardware redundancy is available, which will guarantee availability and reliability. Similarly, there is the case of security, which we will discuss in detail in a later chapter. Industry 4.0 Key Functions of the Communication Layer The communication layer functions can deliver the data to the correct address and application.

…

There is also considerable interest in the production of IoT devices capable of energy harvesting solar, wind, or electromagnetic fields as a power source, as that can be a major technology advance in deploying remote M2M style mesh networking in rural areas. For example, in a smart agriculture scenario. Energy harvesting IoT devices would provide the means through mesh M2M networks for highly fault tolerant, unattended long-term solutions that require only minimal human intervention However, research and technology is not just focused on the technology. They are also keenly studying methods that would make application protocols and data formats far more efficient. For instance, low-power sources require that devices running on minimal power levels or are harvesting energy, again at subsistence levels, must communicate their data in a highly efficient and timely manner and this has serious implications for protocol design.

…

Therefore, contention ratios—the number of other customers you are sharing the bandwidth with—can be as high as 50:1 for residential use and 10:1 for business use. • SDH/Sonnet—This optic ring technology is typically deployed as the service provider’s transport core as it is provides high speed, high capacity, and highly reliable and fault-tolerant transport for data over sometimesvast geographical regions. However, for customers that require high-speed data links over a large geographical region, typically enterprises or large company's fiber optic 163 164 Chapter 11 | IIoT WAN Technologies and Protocols rings are high performance, highly reliable, and high cost.

pages: 757 words: 193,541

The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2 by Thomas A. Limoncelli, Strata R. Chalup, Christina J. Hogan

active measures, Amazon Web Services, anti-pattern, barriers to entry, business process, cloud computing, commoditize, continuous integration, correlation coefficient, database schema, Debian, defense in depth, delayed gratification, DevOps, domain-specific language, en.wikipedia.org, fault tolerance, finite state, Firefox, functional programming, Google Glasses, information asymmetry, Infrastructure as a Service, intermodal, Internet of things, job automation, job satisfaction, Ken Thompson, Kickstarter, level 1 cache, load shedding, longitudinal study, loose coupling, machine readable, Malcom McLean invented shipping containers, Marc Andreessen, place-making, platform as a service, premature optimization, recommendation engine, revision control, risk tolerance, Salesforce, scientific management, seminal paper, side project, Silicon Valley, software as a service, sorting algorithm, standardized shipping container, statistical model, Steven Levy, supply-chain management, systems thinking, The future is already here, Toyota Production System, vertical integration, web application, Yogi Berra

Sometimes services were also scaled by deploying servers for the application into several geographic regions, or business units, each of which would then use its local server. For example, when Tom first worked at AT&T, there was a different payroll processing center for each division of the company. High Availability Applications requiring high availability required “fault-tolerant” computers. These computers had multiple CPUs, error-correcting RAM, and other technologies that were extremely expensive at the time. Fault-tolerant systems were niche products. Generally only the military and Wall Street needed such systems. As a result they were usually priced out of the reach of typical companies. Costs During this era the Internet was not business-critical, and outages for internal business-critical systems could be scheduled because the customer base was a limited, known set of people.

…

Failure domains can be any size: a device, a computer, a rack, a datacenter, or even an entire company. The amount of capacity in a system is N + M, where N is the amount of capacity used to provide a service and M is the amount of spare capacity available, which can be used in the event of a failure. A system that is N + 1 fault tolerant can survive one unit of failure and remain operational. The most common way to route around failure is through replication of services. A service may be replicated one or more times per failure domain to provide resilience greater than the domain. Failures can also come from external sources that overload a system, and from human mistakes.

…

Originally based on applying Agile methodology to operations, the result is a streamlined set of principles and processes that can create reliable services. Appendix B will make the case that cloud or distributed computing was the inevitable result of the economics of hardware. DevOps is the inevitable result of needing to do efficient operations in such an environment. If hardware and software are sufficiently fault tolerant, the remaining problems are human. The seminal paper “Why Do Internet Services Fail, and What Can Be Done about It?” by Oppenheimer et al. (2003) raised awareness that if web services are to be a success in the future, operational aspects must improve: We find that (1) operator error is the largest single cause of failures in two of the three services, (2) operator errors often take a long time to repair, (3) configuration errors are the largest category of operator errors, (4) failures in custom-written front-end software are significant, and (5) more extensive online testing and more thoroughly exposing and detecting component failures would reduce failure rates in at least one service.

pages: 194 words: 49,310

Clock of the Long Now by Stewart Brand

Albert Einstein, Brewster Kahle, Buckminster Fuller, Charles Babbage, Colonization of Mars, complexity theory, Danny Hillis, Eratosthenes, Extropian, fault tolerance, George Santayana, Herman Kahn, Internet Archive, Jaron Lanier, Kevin Kelly, Kim Stanley Robinson, knowledge economy, Lewis Mumford, life extension, longitudinal study, low earth orbit, Metcalfe’s law, Mitch Kapor, nuclear winter, pensions crisis, phenotype, Ray Kurzweil, Robert Metcalfe, Stephen Hawking, Stewart Brand, technological singularity, Ted Kaczynski, Thomas Malthus, Tragedy of the Commons, Vernor Vinge, Whole Earth Catalog

Hasty opportunists will never get past the foothills because they only pay attention to the slope of the ground under their feet, climb quickly to the immediate hilltop, and get stuck there. Patient opportunists take the longer view to the distant peaks, and toil through many ups and downs on the long trek to the heights. There are two ways to make systems fault-tolerant: One is to make them small, so that correction is local and quick; the other is to make them slow, so that correction has time to permeate the system. When you proceed too rapidly with something mistakes cascade, whereas when you proceed slowly the mistakes instruct. Gradual, incremental projects engage the full power of learning and discovery, and they are able to back out of problems.

…

Diamond, Jared Digital information and core standards discontinuity of and immortality and megadata and migration preservation of Digital records, passive and active Discounting of value Drexler, Eric Drucker, Peter Dubos, René Dyson, Esther Dyson, Freeman Earth, view of from outer space Earth Day Easterbrook, Gregg Eaton Collection Eberling, Richard Ecological communities systems and change See also Environment Economic forecasting Ecotrust Egyptian civilization and time Ehrlich, Paul Electronic Frontier Foundation Eliade, Mircea Eno, Brian and ancient Egyptian woman and Clock of the Long Now ideas for participation in Clock/Library and tour of Big Ben Environment degradation of and peace, prosperity, and continuity reframing of problems of and technology See also Ecological Environmentalists and long-view Europe-America dialogue Event horizon Evolution of Cooperation, The “Experts Look Ahead, The” Extinction rate Extra-Terrestrial Intelligence programs and time-release services Extropians Family Tree Maker Fashion Fast and bad things Fault-tolerant systems Feedback and tuning of systems Feldman, Marcus Finite and Infinite Games Finite games Florescence Foresight Institute Freefall Free will Fuller, Buckminster Fundamental tracking Future configuration towards continuous of desire versus fate feeling of and nuclear armageddon one hundred years and present moment tree uses of and value Future of Industrial Man, The “Futurismists” Gabriel, Peter Galileo Galvin, Robert Gambling Games, finite and infinite Gender imbalance in Chinese babies Generations Gershenfeld, Neil Gibbon, Edward GI Bill Gibson, William Gilbert, Joseph Henry Global Business Network (GBN) Global collapse Global computer Global perspective Global warming Goebbels, Joseph Goethe, Johann Wolfgang von Goldberg, Avram “Goldberg rule, the” Goldsmith, Oliver Goodall, Jane Governance Governing the Commons Government and the long view Grand Canyon Great Year Greek tragedy Grove, Andy Hale-Bopp comet Hampden-Turner, Charles Hardware dependent digital experiences, preservation of Hawking, Stephen Hawthorne, Nathaniel Heinlein, Robert Herman, Arthur Hill climbing Hillis, Daniel definition of technology and design of Clock and digital discontinuity and digital preservation and extra-terrestrial intelligence programs ideas for participation in Clock/Library and Long Now Foundation and long-term responsibility and motivation to build linear Clock and the Singularity and sustained endeavors and types of time History and accessible data as a horror and warning how to apply intelligently Hitler, Adolf Holling, C.

RDF Database Systems: Triples Storage and SPARQL Query Processing by Olivier Cure, Guillaume Blin

Amazon Web Services, bioinformatics, business intelligence, cloud computing, database schema, fault tolerance, folksonomy, full text search, functional programming, information retrieval, Internet Archive, Internet of things, linked data, machine readable, NP-complete, peer-to-peer, performance metric, power law, random walk, recommendation engine, RFID, semantic web, Silicon Valley, social intelligence, software as a service, SPARQL, sparse data, web application

When writing these programs, one does not need to take care about the data distribution and parallelism aspects. In fact, the main contribution of MapReduce-based systems is to orchestrate the distribution and execution of these map and reduce operations on a cluster of machines over very large data sets. It also fault-tolerant, meaning that if a machine of the cluster fails during the execution of a process, its job will be given to another machine automatically.Therefore, most of the hard tasks from an end-user point of view are automatized and taken care of by the system: data partitioning, execution scheduling, handling machine failure, and managing intermachine communication.

…

Because these index lookups are defined procedurally, we can consider that any forms of optimization are quite difficult to process. This implies that the generated index lookups need to be optimal to ensure efficient query answering. We saw in Chapter 5 that many systems are using a MapReduce approach to benefit from a parallel-processing, fault-tolerant environment. PigSPARQL, presented in Schätzle et al. (2013), is a system that maps SPARQL queries to Pig Latin queries. In a nutshell, Pig is a data analysis platform developed by Yahoo! that runs on top of the Hadoop processing framework, and Latin is its query language that abstracts the creation of the map and reduce functions using a relational algebra–like approach.

…

These two mechanisms come with support for conflict resolution, i.e., detect whether an update has been correctly replicated to a subscriber. The second strategy is based on partitioning that is specified at the index level using a hash function on key parts. Each partition is replicated on different physical machines to ensure load balancing and fault tolerance. When triple updates are being performed, all copies are updated within the same transaction. The clustering approach of the Mark Logic system distinguishes between two kinds of nodes: data managers (denoted as D-nodes) and evaluators (denoted as E-nodes). The D-nodes are responsible for the management of a data subset, while the E-nodes handle the access to data and the query processing.

pages: 348 words: 97,277

The Truth Machine: The Blockchain and the Future of Everything by Paul Vigna, Michael J. Casey

3D printing, additive manufacturing, Airbnb, altcoin, Amazon Web Services, barriers to entry, basic income, Berlin Wall, Bernie Madoff, Big Tech, bitcoin, blockchain, blood diamond, Blythe Masters, business process, buy and hold, carbon credits, carbon footprint, cashless society, circular economy, cloud computing, computer age, computerized trading, conceptual framework, content marketing, Credit Default Swap, cross-border payments, crowdsourcing, cryptocurrency, cyber-physical system, decentralized internet, dematerialisation, disinformation, disintermediation, distributed ledger, Donald Trump, double entry bookkeeping, Dunbar number, Edward Snowden, Elon Musk, Ethereum, ethereum blockchain, failed state, fake news, fault tolerance, fiat currency, financial engineering, financial innovation, financial intermediation, Garrett Hardin, global supply chain, Hernando de Soto, hive mind, informal economy, information security, initial coin offering, intangible asset, Internet of things, Joi Ito, Kickstarter, linked data, litecoin, longitudinal study, Lyft, M-Pesa, Marc Andreessen, market clearing, mobile money, money: store of value / unit of account / medium of exchange, Network effects, off grid, pets.com, post-truth, prediction markets, pre–internet, price mechanism, profit maximization, profit motive, Project Xanadu, ransomware, rent-seeking, RFID, ride hailing / ride sharing, Ross Ulbricht, Satoshi Nakamoto, self-driving car, sharing economy, Silicon Valley, smart contracts, smart meter, Snapchat, social web, software is eating the world, supply-chain management, Ted Nelson, the market place, too big to fail, trade route, Tragedy of the Commons, transaction costs, Travis Kalanick, Turing complete, Uber and Lyft, uber lyft, unbanked and underbanked, underbanked, universal basic income, Vitalik Buterin, web of trust, work culture , zero-sum game

However, instead of its electricity-hungry “proof-of-work” consensus model, they drew upon older, pre-Bitcoin protocols that were more efficient but which couldn’t achieve the same level of security without putting a centralized entity in charge of identifying and authorizing participants. Predominantly, the bankers’ models used a consensus algorithm known as practical byzantine fault tolerance, or PBFT, a cryptographic solution invented in 1999. It gave all approved ledger-keepers in the network confidence that each other’s actions weren’t undermining the shared record even when there was no way of knowing whether one or more had malicious intent to defraud the others. With these consensus-building systems, the computers adopted each updated version of the ledger once certain thresholds of acceptance were demonstrated across the network.

…

See also R3 CEV Cosmos costs-per-impression measures (CPMs) Craigslist Creative Commons credit default swap (CDS) Crowdfunder crowdfunding crypto-asset analysts crypto-assets Crytpo Company cryptocurrency and criminality and Cypherpunk movement and decentralization and fair distribution and financial sector and Fourth Industrial Revolution hoarding investors and privacy and quantum computing and regulatory challenges See also Bitcoin cryptography and blockchain technology and data storage and financial sector hashes history of and identity and math Merkle Tree practical byzantine fault tolerance (PBFT) and registers and security and privacy signatures and supply chains and tokens triple-entry bookkeeping and trust crypto-impact-economics Cryptokernel (CK) crypto-libertarians cryptomoney Cryptonomos Cuende, Luis Iván Cuomo, Jerry cyber-attacks ransom attacks cybersecurity and decentralized trust model device identity model shared-secret model Cypherpunk manifesto Cypherpunk movement and community DAO, The (The Decentralized Autonomous Organization) Dapps.

…

MIT Media Lab MIT Media Lab’s Digital Currency Initiative Mizrahi, Alex MME Modi, Narendra Monax Monero monetary and banking systems central bank fiat digital currency and community connections and digital counterfeiting mobile money systems money laundering See also cryptocurrency; financial sector Moore’s law Mooti Morehead, Dan Mozilla M-Pesa Nakamoto, Satoshi (pseudonymous Bitcoin creator) Nasdaq Nelson, Ted New America Foundation New York Department of Financial Services Niederauer, Duncan North American Bitcoin Conference Norway Obama, Barack Occupy Wall Street Ocean Health Coin off-chain environment Olsen, Richard open protocols open-source systems and movement and art and innovation challenges of Cryptokernel (CK) and data storage and financial sector and health care sector and honest accounting Hyperledger and identity and permissioned systems and registries and ride-sharing and tokens See also Ethereum organized crime Pacioli, Luca Pantera Capital Parity Wallet peer-to-peer commerce and economy Pentland, Alex “Sandy” Perkins Coie permissioned (private) blockchains advantages of challenges of and cryptocurrency-less systems definition of and finance sector open-source development of scalability of and security and supply chains permissionless blockchains Bitcoin and Cypherpunks Ethereum financial sector and identity information mobile money systems and scalability and trusted computing Pink Army Cooperative Plasma Polkadot Polychain Capital Poon, Joseph practical byzantine fault tolerance (PBFT) pre-mining pre-selling private blockchains. See permissioned (private) blockchains Procivis proof-of-stake algorithm proof of work prosumers Protocol Labs Provenance public key infrastructure (PKI) Pureswaran, Veena R3 CEV consortium ransom attacks Ravikant, Naval Realini, Carol re-architecting record keeping and proof-of-stake algorithm and supply chains and trust See also ledger-keeping Reddit refugee camps Regenor, James reputation scoring Reuschel, Peter Rhodes, Yorke ride-sharing Commuterz Lyft reputation scoring Uber Ripple Labs Rivest Co.

pages: 1,409 words: 205,237

Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale by Jan Kunigk, Ian Buss, Paul Wilkinson, Lars George

Amazon Web Services, barriers to entry, bitcoin, business intelligence, business logic, business process, cloud computing, commoditize, computer vision, continuous integration, create, read, update, delete, data science, database schema, Debian, deep learning, DevOps, domain-specific language, fault tolerance, Firefox, FOSDEM, functional programming, Google Chrome, Induced demand, information security, Infrastructure as a Service, Internet of things, job automation, Kickstarter, Kubernetes, level 1 cache, loose coupling, microservices, natural language processing, Network effects, platform as a service, single source of truth, source of truth, statistical model, vertical integration, web application

Core Components The first set of projects are those that form the core of the Hadoop project itself or are key enabling technologies for the rest of the stack: HDFS, YARN, Apache ZooKeeper, and the Apache Hive Metastore. Together, these projects form the foundation on which most other frameworks, projects, and applications running on the cluster depend. HDFS The Hadoop Distributed File System (HDFS) is the scalable, fault-tolerant, and distributed filesystem for Hadoop. Based on the original use case of analytics over large-scale datasets, HDFS is optimized to store very large amounts of immutable data with files being typically accessed in long sequential scans. HDFS is the critical supporting technology for many of the other components in the stack.

…

The client then reads the data directly from the DataNodes, preferring replicas that are local or close, in network terms. The design of HDFS means that it does not allow in-place updates to the files it stores. This can initially seem quite restrictive until you realize that this immutability allows it to achieve the required horizontal scalability and resilience in a relatively simple way. HDFS is fault-tolerant because the failure of an individual disk, DataNode, or even rack does not imperil the safety of the data. In these situations, the NameNode simply directs one of the DataNodes that is maintaining a surviving replica to copy the block to another DataNode until the required replication factor is reasserted.

…

In Chapter 3, we discuss in detail how HDFS interacts with the servers on which its daemons run and how it uses the locally attached disks in these servers. In Chapter 4, we examine the options when putting a network plan together, and in Chapter 12, we cover how to make HDFS as highly available and fault-tolerant as possible. One final note before we move on. In this short description of HDFS, we glossed over the fact that Hadoop abstracts much of this detail from the client. The API that a client uses is actually a Hadoop-compatible filesystem, of which HDFS is just one implementation. We will come across other commonly used implementations in this book, such as cloud-based object storage offerings like Amazon S3.

Engineering Security by Peter Gutmann

active measures, address space layout randomization, air gap, algorithmic trading, Amazon Web Services, Asperger Syndrome, bank run, barriers to entry, bitcoin, Brian Krebs, business process, call centre, card file, cloud computing, cognitive bias, cognitive dissonance, cognitive load, combinatorial explosion, Credit Default Swap, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, Debian, domain-specific language, Donald Davies, Donald Knuth, double helix, Dr. Strangelove, Dunning–Kruger effect, en.wikipedia.org, endowment effect, false flag, fault tolerance, Firefox, fundamental attribution error, George Akerlof, glass ceiling, GnuPG, Google Chrome, Hacker News, information security, iterative process, Jacob Appelbaum, Jane Jacobs, Jeff Bezos, John Conway, John Gilmore, John Markoff, John von Neumann, Ken Thompson, Kickstarter, lake wobegon effect, Laplace demon, linear programming, litecoin, load shedding, MITM: man-in-the-middle, Multics, Network effects, nocebo, operational security, Paradox of Choice, Parkinson's law, pattern recognition, peer-to-peer, Pierre-Simon Laplace, place-making, post-materialism, QR code, quantum cryptography, race to the bottom, random walk, recommendation engine, RFID, risk tolerance, Robert Metcalfe, rolling blackouts, Ruby on Rails, Sapir-Whorf hypothesis, Satoshi Nakamoto, security theater, semantic web, seminal paper, Skype, slashdot, smart meter, social intelligence, speech recognition, SQL injection, statistical model, Steve Jobs, Steven Pinker, Stuxnet, sunk-cost fallacy, supply-chain attack, telemarketer, text mining, the built environment, The Death and Life of Great American Cities, The Market for Lemons, the payments system, Therac-25, too big to fail, Tragedy of the Commons, Turing complete, Turing machine, Turing test, Wayback Machine, web application, web of trust, x509 certificate, Y2K, zero day, Zimmermann PGP

For example a resolver could decide that although a particular entry may be stale, it came from an authoritative source and so it can still be used until newer information becomes available (the technical name for a resolver that provides this type of service on behalf of the user is “curated DNS”). What DNSSEC does is take the irregularity- and fault-tolerant behaviour of resolvers and turn any problem into a fatal error, since close-enough is no longer sufficient to satisfy a resolver that for security reasons can’t allow a single bit to be out of place. The DNSSEC documents describe in great detail the bits-on-the-wire representation of the packets that carry the data but say nothing about what happens to those bits once they’ve reached their destination [637]. As a result the implicit fault-tolerance of the DNS, which works because resolvers go to great lengths to tolerate any form of vaguely-acceptable (and in a number of cases unacceptable but present in widelydeployed implementations) responses [638], is seriously impacted when glitches are 390 Design no longer allowed to be tolerated.

…

Other Threat Analysis Techniques The discussion above has focused heavily on PSMs for threat analysis because that seems to be the most useful technique to apply to product development. Another 260 Threats threat analysis technique that you may run into is the use of attack trees or graphs [97][98][99][100][101][102][103][104][105][106][107][108][109][110][111][112] [113][114][115][116][117][118][119][120][121][122] which are derived from fault trees used in fault-tolerant computing and safety-critical systems [123][124] [125][126][127][128]. The general idea behind a fault tree is shown in Figure 70 and involves starting with the general high-level concept that “a failure occurred” and then iteratively breaking it down into more and more detailed failure classes.

…

The analysis process for these methods is a relatively straightforward modification of the existing FMEA one that involves identifying all of the system components that would be affected by a particular type of attack (typically a computer-based one rather than just a standard component failure) and then applying standard mitigation techniques used with fault-tolerant and safety-critical systems. So although FMEA and RA aren’t entirely useful for dealing with malicious rather than benign faults, they can at least be applied as a general tool to structuring the allocation of resources towards dealing with malicious faults. Another area where FMEA can be useful is in modelling the process of risk diversification that’s covered in “Security through Diversity” on page 315.

pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem

Amazon Web Services, anti-pattern, bioinformatics, business logic, commoditize, corporate governance, create, read, update, delete, data acquisition, en.wikipedia.org, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, the strength of weak ties, web application

Whatever the database, understanding the underlying storage and caching infrastruc‐ ture will help you construct idiomatic-- and hence, mechanically sympathetic—queries that maximise performance. Our final observation on availability is that scaling for cluster-wide replication has a positive impact, not just in terms of fault-tolerance, but also responsiveness. Since there are many machines available for a given workload, query latency is low and availability is maintained. But as we’ll now discuss, scale itself is more nuanced than simply the number of servers we deploy. Scale The topic of scale has become more important as data volumes have grown.

…

Document Stores | 173 Key-Value Stores Key-value stores are cousins of the document store family, but their lineage comes from Amazon’s Dynamo database. 3 They act like large, distributed hashmap data structures that store and retrieve opaque values by key. As shown in Figure A-3 the key space of the hashmap is spread across numerous buckets on the network. For fault-tolerance reasons each bucket is replicated onto several ma‐ chines. The formula for number of replicas required is given by R = 2F +1 where F is the number of failures we can tolerate. The replication algorithm seeks to ensure that machines aren’t exact copies of each other. This allows the system to load-balance while a machine and its buckets recover; it also helps avoid hotspots, which can cause inad‐ vertent self denial-of-service.

pages: 540 words: 103,101

Building Microservices by Sam Newman

airport security, Amazon Web Services, anti-pattern, business logic, business process, call centre, continuous integration, Conway's law, create, read, update, delete, defense in depth, don't repeat yourself, Edward Snowden, fail fast, fallacies of distributed computing, fault tolerance, index card, information retrieval, Infrastructure as a Service, inventory management, job automation, Kubernetes, load shedding, loose coupling, microservices, MITM: man-in-the-middle, platform as a service, premature optimization, pull request, recommendation engine, Salesforce, SimCity, social graph, software as a service, source of truth, sunk-cost fallacy, systems thinking, the built environment, the long tail, two-pizza team, web application, WebSocket

Netflix, for example, is especially concerned with aspects like fault tolerance, to ensure that the outage of one part of its system cannot take everything down. To handle this, a large amount of work has been done to ensure that there are client libraries on the JVM to provide teams with the tools they need to keep their services well behaved. Anyone introducing a new technology stack would mean having to reproduce all this effort. The main concern for Netflix is less about the duplicated effort, and more about the fact that it is so easy to get this wrong. The risk of a service getting newly implemented fault tolerance wrong is high if it could impact more of the system.

…

Consul also builds in other capabilities that you might find useful, such as the ability to perform health checks on nodes. This means that Consul could well overlap the capabilities provided by other dedicated monitoring tools, although you would more likely use Consul as a source of this information and then pull it into a more comprehensive dashboard or alerting system. Consul’s highly fault-tolerant design and focus on handling systems that make heavy use of ephemeral nodes does make me wonder, though, if it may end up replacing systems like Nagios and Sensu for some use cases. Consul uses a RESTful HTTP interface for everything from registering a service, querying the key/value store, or inserting health checks.

pages: 31 words: 9,168

Designing Reactive Systems: The Role of Actors in Distributed Architecture by Hugh McKee

Amazon Web Services, business logic, fault tolerance, Internet of things, microservices

These two features of the actor system directly impact the operational costs of your application system: you use the processing capacity that you have more efficiently and you use only the capacity that is needed at a given point in time. The main takeaways in this chapter are: Delegation of work through supervised workers allows for higher levels of concurrency and fault tolerance. Workers are asynchronous and run concurrently, never sitting idle as in synchronous systems. Efficient utilization of system resources (CPU, memory, and threads) results in reduced infrastructure costs. It’s simple to scale elastically at the actor level by increasing or decreasing workers as needed.

pages: 923 words: 516,602

The C++ Programming Language by Bjarne Stroustrup

combinatorial explosion, conceptual framework, database schema, Dennis Ritchie, distributed generation, Donald Knuth, fault tolerance, functional programming, general-purpose programming language, higher-order functions, index card, iterative process, job-hopping, L Peter Deutsch, locality of reference, Menlo Park, no silver bullet, Parkinson's law, premature optimization, sorting algorithm

Chapter 13 presents templates, that is, C++’s facilities for defining families of types and functions. It demonstrates the basic techniques used to provide containers, such as lists, and to support generic programming. Chapter 14 presents exception handling, discusses techniques for error handling, and presents strategies for fault tolerance. I assume that you either aren’t well acquainted with objectoriented programming and generic programming or could benefit from an explanation of how the main abstraction techniques are supported by C++. Thus, I don’t just present the language features supporting the abstraction techniques; I also explain the techniques themselves.

…

The exception-handling mechanism is a nonlocal control structure based on stack unwinding (§14.4) that can be seen as an alternative return mechanism. There are therefore legitimate uses of exceptions that have nothing to do with errors (§14.5). However, the primary aim of the exception-handling mechanism and the focus of this chapter is error handling and the support of fault tolerance. Standard C++ doesn’t have the notion of a thread or a process. Consequently, exceptional circumstances relating to concurrency are not discussed here. The concurrency facilities available on your system are described in its documentation. Here, I’ll just note that the C++ exception- The C++ Programming Language, Third Edition by Bjarne Stroustrup.

…

Exactly the same problem can occur in languages that do not support exception handling. For example, the standard C library function lloonnggjjm mpp() can cause the same problem. Even an ordinary rreettuurrnn-statement could exit uussee__ffiillee without closing ff. A first attempt to make uussee__ffiillee() to be fault-tolerant looks like this: vvooiidd uussee__ffiillee(ccoonnsstt cchhaarr* ffnn) { F FIIL LE E* f = ffooppeenn(ffnn,"rr"); ttrryy { // use f } The C++ Programming Language, Third Edition by Bjarne Stroustrup. Copyright ©1997 by AT&T. Published by Addison Wesley Longman, Inc. ISBN 0-201-88954-4. All rights reserved.

pages: 933 words: 205,691

Hadoop: The Definitive Guide by Tom White

Amazon Web Services, bioinformatics, business intelligence, business logic, combinatorial explosion, data science, database schema, Debian, domain-specific language, en.wikipedia.org, exponential backoff, fallacies of distributed computing, fault tolerance, full text search, functional programming, Grace Hopper, information retrieval, Internet Archive, Kickstarter, Large Hadron Collider, linked data, loose coupling, openstreetmap, recommendation engine, RFID, SETI@home, social graph, sparse data, web application

The storage subsystem deals with blocks, simplifying storage management (since blocks are a fixed size, it is easy to calculate how many can be stored on a given disk) and eliminating metadata concerns (blocks are just a chunk of data to be stored—file metadata such as permissions information does not need to be stored with the blocks, so another system can handle metadata separately). Furthermore, blocks fit well with replication for providing fault tolerance and availability. To insure against corrupted blocks and disk and machine failure, each block is replicated to a small number of physically separate machines (typically three). If a block becomes unavailable, a copy can be read from another location in a way that is transparent to the client.

…

There are many other interfaces to HDFS, but the command line is one of the simplest and, to many developers, the most familiar. We are going to run HDFS on one machine, so first follow the instructions for setting up Hadoop in pseudo-distributed mode in Appendix A. Later you’ll see how to run on a cluster of machines to give us scalability and fault tolerance. There are two properties that we set in the pseudo-distributed configuration that deserve further explanation. The first is fs.default.name, set to hdfs://localhost/, which is used to set a default filesystem for Hadoop. Filesystems are specified by a URI, and here we have used an hdfs URI to configure Hadoop to use HDFS by default.

…

In any case, you’ll need to begin to scale horizontally. You can attempt to build some type of partitioning on your largest tables, or look into some of the commercial solutions that provide multiple master capabilities. Countless applications, businesses, and websites have successfully achieved scalable, fault-tolerant, and distributed data systems built on top of RDBMSs and are likely using many of the previous strategies. But what you end up with is something that is no longer a true RDBMS, sacrificing features and conveniences for compromises and complexities. Any form of slave replication or external caching introduces weak consistency into your now denormalized data.

pages: 444 words: 118,393

The Nature of Software Development: Keep It Simple, Make It Valuable, Build It Piece by Piece by Ron Jeffries

Amazon Web Services, anti-pattern, bitcoin, business cycle, business intelligence, business logic, business process, c2.com, call centre, cloud computing, continuous integration, Conway's law, creative destruction, dark matter, data science, database schema, deep learning, DevOps, disinformation, duck typing, en.wikipedia.org, fail fast, fault tolerance, Firefox, Hacker News, industrial robot, information security, Infrastructure as a Service, Internet of things, Jeff Bezos, Kanban, Kubernetes, load shedding, loose coupling, machine readable, Mars Rover, microservices, Minecraft, minimum viable product, MITM: man-in-the-middle, Morris worm, move fast and break things, OSI model, peer-to-peer lending, platform as a service, power law, ransomware, revision control, Ruby on Rails, Schrödinger's Cat, Silicon Valley, six sigma, software is eating the world, source of truth, SQL injection, systems thinking, text mining, time value of money, transaction costs, Turing machine, two-pizza team, web application, zero day

The exhaustive brute-force approach is clearly impractical for anything but life-critical systems or Mars rovers. What if you actually have to deliver in this decade? Our community is divided about how to handle faults. One camp says we need to make systems fault-tolerant. We should catch exceptions, check error codes, and generally keep faults from turning into errors. The other camp says it’s futile to aim for fault tolerance. It’s like trying to make a fool-proof device: the universe will always deliver a better fool. No matter what faults you try to catch and recover from, something unexpected will always occur. This camp says “let it crash” so you can restart from a known good state.

…

The alternative, vertical scaling, means building bigger and bigger servers—adding core, memory, and storage to hosts. Vertical scaling has its place, but most of our interactive workload goes to horizontally scaled farms. If your system scales horizontally, then you will have load-balanced farms or clusters where each server runs the same applications. The multiplicity of machines provides you with fault tolerance through redundancy. A single machine or process can completely bonk while the remainder continues serving transactions. Still, even though horizontal clusters are not susceptible to single points of failure (except in the case of attacks of self-denial; see Self-Denial Attacks), they can exhibit a load-related failure mode.

pages: 58 words: 12,386

Big Data Glossary by Pete Warden

business intelligence, business logic, crowdsourcing, fault tolerance, functional programming, information retrieval, linked data, machine readable, natural language processing, recommendation engine, web application

This horizontal scaling approach tends to be cheaper as the number of operations and the size of the data increases, and the very largest data processing pipelines are all built on a horizontal model. There is a cost to this approach, though. Writing distributed data handling code is tricky and involves tradeoffs between speed, scalability, fault tolerance, and traditional database goals like atomicity and consistency. MapReduce MapReduce is an algorithm design pattern that originated in the functional programming world. It consists of three steps. First, you write a mapper function or script that goes through your input data and outputs a series of keys and values to use in calculating the results.

Applied Cryptography: Protocols, Algorithms, and Source Code in C by Bruce Schneier

active measures, cellular automata, Claude Shannon: information theory, complexity theory, dark matter, Donald Davies, Donald Knuth, dumpster diving, Dutch auction, end-to-end encryption, Exxon Valdez, fault tolerance, finite state, heat death of the universe, information security, invisible hand, John von Neumann, knapsack problem, MITM: man-in-the-middle, Multics, NP-complete, OSI model, P = NP, packet switching, quantum cryptography, RAND corporation, RFC: Request For Comment, seminal paper, software patent, telemarketer, traveling salesman, Turing machine, web of trust, Zimmermann PGP

Ciphertext is up to one block longer - Ciphertext is up to one block longer than the plaintext, due to padding. than the plaintext, not counting the IV. - No preprocessing is possible. - No preprocessing is possible. + Processing is parallelizable. +/- Encryptions not parallelizable; decryption is parallelizable and has a random-access property. Fault-tolerance: Fault-tolerance: - A ciphertext error affects one full - A ciphertext error affects one full block of plaintext. block of plaintext and the corresponding bit in the next block. - Synchronization error is - Synchronization error is unrecoverable. unrecoverable. CFB: OFB/Counter: Security: Security: + Plaintext patterns are concealed. + Plaintext patterns are concealed. + Input to the block cipher is + Input to the block cipher is randomized. randomized. + More than one message can be + More than one message can be encrypted with the same key provided encrypted with the same key, that a different IV is used. provided that a different IV is used. +/- Plaintext is somewhat difficult to - Plaintext is very easy to manipulate, manipulate;blocks can be removed any change in ciphertext directly from the beginning and end of the affects the plaintext. message, bits of the first block can be changed, and repetition allows some controlled changes.

…

. +/- Encryption is not parallelizable; decryption is parallelizable and has a random-access property. Fault-tolerance: - A ciphertext error affects the corresponding bit of plaintext and the next full block. +Synchronization errors of full block sizes are recoverable. 1-bit CFB can recover from the addition or loss of single bits. Efficiency: + Speed is the same as the block cipher. - Ciphertext is the same size as the plaintext, not counting the IV. + Processing is possible before the message is seen. -/+ OFB processing is not parallelizable; counter processing is parallelizable. Fault-tolerance: + A ciphertext error affects only the corresponding bit of plaintext.

…

There are other security considerations: Patterns in the plaintext should be concealed, input to the cipher should be randomized, manipulation of the plaintext by introducing errors in the ciphertext should be difficult, and encryption of more than one message with the same key should be possible. These will be discussed in detail in the next sections. Efficiency is another consideration. The mode should not be significantly less efficient than the underlying cipher. In some circumstances it is important that the ciphertext be the same size as the plaintext. A third consideration is fault-tolerance. Some applications need to parallelize encryption or decryption, while others need to be able to preprocess as much as possible. In still others it is important that the decrypting process be able to recover from bit errors in the ciphertext stream, or dropped or added bits. As we will see, different modes have different subsets of these characteristics. 9.1 Electronic Codebook Mode Electronic codebook (ECB) mode is the most obvious way to use a block cipher: A block of plaintext encrypts into a block of ciphertext.

pages: 232 words: 71,237

Kill It With Fire: Manage Aging Computer Systems by Marianne Bellotti

anti-pattern, barriers to entry, business logic, cloud computing, cognitive bias, computer age, continuous integration, create, read, update, delete, Daniel Kahneman / Amos Tversky, data science, database schema, Dennis Ritchie, DevOps, fault tolerance, fear of failure, Google Chrome, Hans Moravec, iterative process, Ken Thompson, loose coupling, microservices, minimum viable product, Multics, no silver bullet, off-by-one error, platform as a service, pull request, QWERTY keyboard, Richard Stallman, risk tolerance, Schrödinger's Cat, side project, software as a service, Steven Levy, systems thinking, web application, Y Combinator, Y2K

A quick trick when two capable engineers cannot seem to agree on a decision is to ask yourself what each one is optimizing for with their suggested approach. Remember, technology has a number of trade-offs where optimizing for one characteristic diminishes another important characteristic. Examples include security versus usability, coupling versus complexity, fault tolerance versus consistency, and so on, and so forth. If two engineers really can’t agree on a decision, it’s usually because they have different beliefs about where the ideal optimization between two such poles is. Looking for absolute truths in situations that are ambiguous and value-based is painful.

…

Networking issues are not subtle, and they are generally a product of misconfiguration rather than gremlins. The HTTP request solution is wrong in the correct way because migrating from an HTTP request between Service A and Service B to a message queue later is straightforward. While we are temporarily losing built-in fault tolerance and accepting a higher scaling burden, it creates a system that is easier for the current teams to maintain. The counterexample would be if we swapped the order of the HTTP request and had Service B poll Service A for new data. While this is also less complex than a message queue, it is unnecessarily resource-intensive.

pages: 834 words: 180,700

The Architecture of Open Source Applications by Amy Brown, Greg Wilson

8-hour work day, anti-pattern, bioinformatics, business logic, c2.com, cloud computing, cognitive load, collaborative editing, combinatorial explosion, computer vision, continuous integration, Conway's law, create, read, update, delete, David Heinemeier Hansson, Debian, domain-specific language, Donald Knuth, en.wikipedia.org, fault tolerance, finite state, Firefox, Free Software Foundation, friendly fire, functional programming, Guido van Rossum, Ken Thompson, linked data, load shedding, locality of reference, loose coupling, Mars Rover, MITM: man-in-the-middle, MVC pattern, One Laptop per Child (OLPC), peer-to-peer, Perl 6, premature optimization, recommendation engine, revision control, Ruby on Rails, side project, Skype, slashdot, social web, speech recognition, the scientific method, The Wisdom of Crowds, web application, WebSocket

The coordinator distributes requests to individual CouchDB instances based on the key of the document being requested. Twitter has built the notions of sharding and replication into a coordinating framework called Gizzard16. Gizzard takes standalone data stores of any type—you can build wrappers for SQL or NoSQL storage systems—and arranges them in trees of any depth to partition keys by key range. For fault tolerance, Gizzard can be configured to replicate data to multiple physical machines for the same key range. 13.4.3. Consistent Hash Rings Good hash functions distribute a set of keys in a uniform manner. This makes them a powerful tool for distributing key-value pairs among multiple servers. The academic literature on a technique called consistent hashing is extensive, and the first applications of the technique to data stores was in systems called distributed hash tables (DHTs).

…

With more complicated rebalancing schemes, finding the right node for a key becomes more difficult. Range partitioning requires the upfront cost of maintaining routing and configuration nodes, which can see heavy load and become central points of failure in the absence of relatively complex fault tolerance schemes. Done well, however, range-partitioned data can be load-balanced in small chunks which can be reassigned in high-load situations. If a server goes down, its assigned ranges can be distributed to many servers, rather than loading the server's immediate neighbors during downtime. 13.5.

…

., as a RFC 3279 Dsa-Sig-Value, created by algorithm 1.2.840.10040.4.3. The Architecture of Open Source Applications Amy Brown and Greg Wilson (eds.) ISBN 978-1-257-63801-7 License / Buy / Contribute Chapter 15. Riak and Erlang/OTP Francesco Cesarini, Andy Gross, and Justin Sheehy Riak is a distributed, fault tolerant, open source database that illustrates how to build large scale systems using Erlang/OTP. Thanks in large part to Erlang's support for massively scalable distributed systems, Riak offers features that are uncommon in databases, such as high-availability and linear scalability of both capacity and throughput.

pages: 319 words: 72,969

Nginx HTTP Server Second Edition by Clement Nedelcu

Debian, fault tolerance, Firefox, Google Chrome, Ruby on Rails, web application

Here is a list of the main features of the web branch, quoted from the official website www.nginx.org: • Handling of static files, index files, and autoindexing; open file descriptor cache. • Accelerated reverse proxying with caching; simple load balancing and fault tolerance. • Accelerated support with caching of remote FastCGI servers; simple load balancing and fault tolerance. • Modular architecture. Filters include Gzipping, byte ranges, chunked responses, XSLT, SSI, and image resizing filter. Multiple SSI inclusions within a single page can be processed in parallel if they are handled by FastCGI or proxied servers

pages: 66 words: 9,247

MongoDB and Python by Niall O’Higgins

cloud computing, Debian, fault tolerance, semantic web, web application

MongoDB ObjectIds have the nice property of being almost-certainly-unique upon generation, hence no central coordination is required. This contrasts sharply with the common RDBMS idiom of using auto-increment primary keys. Guaranteeing that an auto-increment key is not already in use usually requires consulting some centralized system. When the intention is to provide a horizontally scalable, de-centralized and fault-tolerant database—as is the case with MongoDB—auto-increment keys represent an ugly bottleneck. By employing ObjectId as your _id, you leave the door open to horizontal scaling via MongoDB’s sharding capabilities. While you can in fact supply your own value for the _id property if you wish—so long as it is globally unique—this is best avoided unless there is a strong reason to do otherwise.

pages: 355 words: 81,788

Monolith to Microservices: Evolutionary Patterns to Transform Your Monolith by Sam Newman

Airbnb, business logic, business process, continuous integration, Conway's law, database schema, DevOps, fail fast, fault tolerance, ghettoisation, inventory management, Jeff Bezos, Kubernetes, loose coupling, microservices, MVC pattern, price anchoring, pull request, single page application, single source of truth, software as a service, source of truth, sunk-cost fallacy, systems thinking, telepresence, two-pizza team, work culture

I often look back at the small part I played in this industry with a great deal of regret. It turns out not knowing what you’re doing and doing it anyway can have some pretty disastrous implications. 7 See Liming Chen and Algirdas Avizienis, “N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation,” published in the Twenty-Fifth International Symposium on Fault-Tolerant Computing (1995). Chapter 4. Decomposing the Database As we’ve already explored, there are a host of ways to extract functionality into microservices. However, we need to address the elephant in the room: namely, what do we do about our data?

Service Design Patterns: Fundamental Design Solutions for SOAP/WSDL and RESTful Web Services by Robert Daigneau

Amazon Web Services, business intelligence, business logic, business process, continuous integration, create, read, update, delete, en.wikipedia.org, fault tolerance, loose coupling, machine readable, MITM: man-in-the-middle, MVC pattern, OSI model, pull request, RFC: Request For Comment, Ruby on Rails, software as a service, web application

Once a task has completed, the request would be forwarded to the next background process to perform the next task (e.g., reserve hotel), and so on. The request is therefore processed much like a baton is passed from one runner to the next in a relay race. Web server scalability is promoted because the work is off-loaded from the web servers. This pattern also provides a relatively fault-tolerant way to conduct long-running business processes. However, it can be challenging to understand the entire business process at a macro level, and it can also be difﬁcult to change or debug control-ﬂow logic since these rules are typically buried within individual services, conﬁguration W ORKFLOW C ONNECTOR ﬁles, routing tables, and messages in transit.

…

These Process Snapshots provide several beneﬁts. One may query the database to determine the status of any process instance. If a process instance crashes, the database may be queried to determine the last task that completed successfully, and the process may be restarted from that step. This is one way Workﬂow Engines help to ensure fault tolerance. Complete Flight Reservation Issue Confirmation Callback Message Flight Reservation ID Process Variable Figure 5.4 Graphical workﬂow design tools let developers depict control ﬂow through UML activity diagrams and ﬂowcharts. Information may be mapped from one task to another through Process Variables. 159 Workﬂow Connector 160 Workﬂow Connector C HAPTER 5 W EB S ERVICE I MPLEMENTATION S TYLES The Workﬂow Connector pattern uses web services as a means to launch the business processes managed by workﬂow engines.

pages: 329 words: 95,309

Digital Bank: Strategies for Launching or Becoming a Digital Bank by Chris Skinner

algorithmic trading, AltaVista, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, bank run, Basel III, bitcoin, Bitcoin Ponzi scheme, business cycle, business intelligence, business process, business process outsourcing, buy and hold, call centre, cashless society, clean water, cloud computing, corporate social responsibility, credit crunch, cross-border payments, crowdsourcing, cryptocurrency, demand response, disintermediation, don't be evil, en.wikipedia.org, fault tolerance, fiat currency, financial innovation, gamification, Google Glasses, high net worth, informal economy, information security, Infrastructure as a Service, Internet of things, Jeff Bezos, Kevin Kelly, Kickstarter, M-Pesa, margin call, mass affluent, MITM: man-in-the-middle, mobile money, Mohammed Bouazizi, new economy, Northern Rock, Occupy movement, Pingit, platform as a service, Ponzi scheme, prediction markets, pre–internet, QR code, quantitative easing, ransomware, reserve currency, RFID, Salesforce, Satoshi Nakamoto, Silicon Valley, smart cities, social intelligence, software as a service, Steve Jobs, strong AI, Stuxnet, the long tail, trade route, unbanked and underbanked, underbanked, upwardly mobile, vertical integration, We are the 99%, web application, WikiLeaks, Y2K

It is far easier to change and add new front office systems – new trading desks, new channels or new customer service operations – than to replace core back office platforms – deposit account processing, post-trade services and payment systems. Why? Because the core processing needs to be highly resilient; 99.9999999999999999999999% and a few more 9’s fault tolerant; and running 24 by 7. In other words these systems are non-stop and would highly expose the bank to failure if they stop working. It is these systems that cause most of the challenges for a bank however. This is because, being a core system, they were often developed in the 1960s and 1970s. Back then, computing technologies were based upon lines of code fed into the machine through packs and packs of punched cards.

…

Add to this the regulatory regime change, which would force banks to respond more and more rapidly to new requirements, and the old technologies could not keep up. Finally, the technology had to change. This is why banks have been working hard to consolidate and replace their old infrastructures, and why we are seeing more and more glitches and failures. As soon as you upgrade an old, embedded, non-stop fault tolerant machine however, you are open to risk. The 99.9999+% non-stop machine suddenly has to stop. A competent bank derisks the risk of change by testing, testing and testing, whilst an incompetent bank may test but not enough. Luckily, most banks and exchanges are competent enough to test these things properly by planning correctly through roll forward and roll back cycles.

pages: 713 words: 93,944

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Eric Redmond, Jim Wilson, Jim R. Wilson

AGPL, Amazon Web Services, business logic, create, read, update, delete, data is the new oil, database schema, Debian, domain-specific language, en.wikipedia.org, fault tolerance, full text search, general-purpose programming language, Kickstarter, Large Hadron Collider, linked data, MVC pattern, natural language processing, node package manager, random walk, recommendation engine, Ruby on Rails, seminal paper, Skype, social graph, sparse data, web application

Each component is cheap and expendable, but when used right, it’s hard to find a simpler or stronger structure upon which to build a foundation. Riak is a distributed key-value database where values can be anything—from plain text, JSON, or XML to images or video clips—all accessible through a simple HTTP interface. Whatever data you have, Riak can store it. Riak is also fault-tolerant. Servers can go up or down at any moment with no single point of failure. Your cluster continues humming along as servers are added, removed, or (ideally not) crash. Riak won’t keep you up nights worrying about your cluster—a failed node is not an emergency, and you can wait to deal with it in the morning.

…

It is based on BigTable, a high-performance, proprietary database developed by Google and described in the 2006 white paper “Bigtable: A Distributed Storage System for Structured Data.”[26] Initially created for natural-language processing, HBase started life as a contrib package for Apache Hadoop. Since then, it has become a top-level Apache project. On the architecture front, HBase is designed to be fault tolerant. Hardware failures may be uncommon for individual machines, but in a large cluster, node failure is the norm. By using write-ahead logging and distributed configuration, HBase can quickly recover from individual server failures. Additionally, HBase lives in an ecosystem that has its own complementary benefits.

pages: 304 words: 91,566

Bitcoin Billionaires: A True Story of Genius, Betrayal, and Redemption by Ben Mezrich

airport security, Albert Einstein, bank run, Ben Horowitz, Big Tech, bitcoin, Bitcoin Ponzi scheme, blockchain, Burning Man, buttonwood tree, cryptocurrency, East Village, El Camino Real, Elon Musk, fake news, family office, fault tolerance, fiat currency, financial engineering, financial innovation, game design, information security, Isaac Newton, junk bonds, Marc Andreessen, Mark Zuckerberg, Max Levchin, Menlo Park, Metcalfe’s law, Michael Milken, new economy, offshore financial centre, paypal mafia, peer-to-peer, Peter Thiel, Ponzi scheme, proprietary trading, QR code, Ronald Reagan, Ross Ulbricht, Sand Hill Road, Satoshi Nakamoto, Savings and loan crisis, Schrödinger's Cat, self-driving car, Sheryl Sandberg, side hustle, side project, Silicon Valley, Skype, smart contracts, South of Market, San Francisco, Steve Jobs, Susan Wojcicki, transaction costs, Virgin Galactic, zero-sum game

Either way, it would be a logistical nightmare—Mission Impossible shit that only worked in the movies—to get ahold of the three shards that made up the bitcoin private key. Moreover, the twins had replicated this model four times across different geographic regions, to build redundancy into their system—removing the final single point of failure—and improving their overall fault tolerance. This way, if a natural disaster like a major tornado decimated the Midwest, there would still be other sets of alpha, bravo, and charlie spread across other regions in the country (the Northeast, Mid-Atlantic, West, etc.) that could be assembled to form the twins’ private key. If a mega tsunami—or hell, Godzilla—hit the eastern seaboard, or a meteor hit Los Angeles, the twins’ private key would still be safe.

…

Tyler corralled the security expert by the pool table, where McCaleb and Levchin were geeking out on god knows what. “Why all three?” Tyler asked. “Doesn’t one do the job?” Kaminsky shrugged. “The second one is to tell if the first one is broken. The third is to tell if the other two are lying.” It was exactly how Tyler should have expected a security engineer to think—in terms of systems and their fault tolerance and integrity. Over the next ten minutes, he interrogated Kaminsky about his hacking efforts; at first, the security expert had expected to be able to penetrate such a complex piece of code easily—the fact that it was so complex, so long, meant there should have been many weak spots to exploit.

pages: 1,201 words: 233,519

Coders at Work by Peter Seibel

Ada Lovelace, Bill Atkinson, bioinformatics, Bletchley Park, Charles Babbage, cloud computing, Compatible Time-Sharing System, Conway's Game of Life, Dennis Ritchie, domain-specific language, don't repeat yourself, Donald Knuth, fallacies of distributed computing, fault tolerance, Fermat's Last Theorem, Firefox, Free Software Foundation, functional programming, George Gilder, glass ceiling, Guido van Rossum, history of Unix, HyperCard, industrial research laboratory, information retrieval, Ken Thompson, L Peter Deutsch, Larry Wall, loose coupling, Marc Andreessen, Menlo Park, Metcalfe's law, Multics, no silver bullet, Perl 6, premature optimization, publish or perish, random walk, revision control, Richard Stallman, rolodex, Ruby on Rails, Saturday Night Live, side project, slashdot, speech recognition, systems thinking, the scientific method, Therac-25, Turing complete, Turing machine, Turing test, type inference, Valgrind, web application

When we first did Erlang and we went to conferences and said, “You should copy all your data.” And I think they accepted the arguments over fault tolerance—the reason you copy all your data is to make the system fault tolerant. They said, “It'll be terribly inefficient if you do that,” and we said, “Yeah, it will but it'll be fault tolerant.” The thing that is surprising is that it's more efficient in certain circumstances. What we did for the reasons of fault tolerance, turned out to be, in many circumstances, just as efficient or even more efficient than sharing. Then we asked the question, “Why is that?”

The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise by Martin L. Abbott, Michael T. Fisher

always be closing, anti-pattern, barriers to entry, Bernie Madoff, business climate, business continuity plan, business intelligence, business logic, business process, call centre, cloud computing, combinatorial explosion, commoditize, Computer Numeric Control, conceptual framework, database schema, discounted cash flows, Dunning–Kruger effect, en.wikipedia.org, fault tolerance, finite state, friendly fire, functional programming, hiring and firing, Infrastructure as a Service, inventory management, machine readable, new economy, OSI model, packet switching, performance metric, platform as a service, Ponzi scheme, power law, RFC: Request For Comment, risk tolerance, Rubik’s Cube, Search for Extraterrestrial Intelligence, SETI@home, shareholder value, Silicon Valley, six sigma, software as a service, the scientific method, transaction costs, Vilfredo Pareto, web application, Y2K

If we have a technology platform comprised of a number of noncommunicating services, we increase the number of airports or runways for which we are managing traffic; as a result, we can have many more “landings” or changes. If the services communicate asynchronously, we would have a few more concerns, but we are also likely more willing to take risks. On the other hand, if the services all communicate synchronously with each other, there isn’t much more fault tolerance than with a monolithic system (see Chapter 21, Creating Fault Isolative Architectural Structures) and we are back to managing a single runway at a single airport. The expected result of the change is important as we want to be able to verify later that the change was successful. For instance, if a change is being made to a Web server and that change is to allow more threads of execution in the Web server, we should state that as the expected result.

…

If availability and reliability are important to you and your customers, try to be an early majority or late majority adopter of those systems that are critical to the operations of your service, product, or platform. Asynchronous Design Whenever possible, systems should communicate in an asynchronous fashion. Asynchronous systems tend to be more fault tolerant to extreme load and do not easily fall prey to the multiplicative effects of failure that characterize synchronous systems. We will discuss the reasons for this in greater detail in the next section of this chapter. Stateless Systems Although some systems need state, state has a cost in terms of availability, scalability, and overall cost of your system.

…

The first factor to use in determining which services should be selected for stress testing is the criticality of each service to the overall system performance. If there is a central service such as a data abstract layer (DAL) or user authorization, this should be included as a candidate for stress testing because the stability of the entire application depends on this service. If you have architected your application into fault tolerant “swim lanes,” which will be discussed in Chapter 21, Creating Fault Isolative Architectural Structures, you still likely have core services that have been replicated across the lanes. The second consideration for determining services to stress test is the likelihood that a service affects performance.

pages: 102 words: 27,769

Rework by Jason Fried, David Heinemeier Hansson

call centre, Clayton Christensen, Dean Kamen, Exxon Valdez, fault tolerance, fixed-gear, James Dyson, Jeff Bezos, Ralph Nader, risk tolerance, Ruby on Rails, Steve Jobs, Tony Hsieh, Y Combinator

Rework is not just smart and succinct but grounded in the concreteness of doing rather than hard-to-apply philosophizing. This book inspired me to trust myself in defying the status quo.” —Penelope Trunk, author of Brazen Careerist: The New Rules for Success “[This book’s] assumption is that an organization is a piece of software. Editable. Malleable. Sharable. Fault-tolerant. Comfortable in Beta. Reworkable. The authors live by the credo ‘keep it simple, stupid’ and Rework possesses the same intelligence—and irreverence—of that simple adage.” —John Maeda, author of The Laws of Simplicity “Rework is like its authors: fast-moving, iconoclastic, and inspiring. It’s not just for startups.

pages: 400 words: 94,847

Reinventing Discovery: The New Era of Networked Science by Michael Nielsen

Albert Einstein, augmented reality, barriers to entry, bioinformatics, Cass Sunstein, Climategate, Climatic Research Unit, conceptual framework, dark matter, discovery of DNA, Donald Knuth, double helix, Douglas Engelbart, Douglas Engelbart, Easter island, en.wikipedia.org, Erik Brynjolfsson, fault tolerance, Fellow of the Royal Society, Firefox, Free Software Foundation, Freestyle chess, Galaxy Zoo, Higgs boson, Internet Archive, invisible hand, Jane Jacobs, Jaron Lanier, Johannes Kepler, Kevin Kelly, Large Hadron Collider, machine readable, machine translation, Magellanic Cloud, means of production, medical residency, Nicholas Carr, P = NP, P vs NP, publish or perish, Richard Feynman, Richard Stallman, selection bias, semantic web, Silicon Valley, Silicon Valley startup, Simon Singh, Skype, slashdot, social intelligence, social web, statistical model, Stephen Hawking, Stewart Brand, subscription business, tacit knowledge, Ted Nelson, the Cathedral and the Bazaar, The Death and Life of Great American Cities, The Nature of the Firm, The Wisdom of Crowds, University of East Anglia, Vannevar Bush, Vernor Vinge, Wayback Machine, Yochai Benkler

Nature, 442:981, August 31, 2006. [21] John Bohannon. Gamers unravel the secret life of protein. Wired, 17(5), April 20, 2009. http://www.wired.com/medtech/genetics/magazine/17-05/ff_protein?currentPage=all. [22] Parsa Bonderson, Sankar Das Sarma, Michael Freedman, and Chetan Nayak. A blueprint for a topologically fault-tolerant quantum computer. eprint arXiv:1003.2856, 2010. [23] Christine L. Borgman. Scholarship in the Digital Age. Cambrdge, MA: MIT Press, 2007. [24] Kirk D. Borne et al. Astroinformatics: A 21st century approach to astronomy. eprint arXiv: 0909.3892, 2009. Position paper for Astro2010 Decadal Survey State, available at http://arxiv.org/abs/0909.3892

…

Edge: The Third Culture, 2006. http://www.edge.org/3rd_culture/kelly06/kelly06_index.html. [109] Kevin Kelly. What Technology Wants. New York: Viking, 2010. [110] Richard A. Kerr. Recently discovered habitable world may not exist. Science Now, October 12, 2010. http://news.sciencemag.org/sciencenow/2010/10/recently-discovered-habitable-world.html. [111] A. Yu Kitaev. Fault-tolerant quantum computation by anyons. Annals of Physics, 303(1):2–30, 2003. [112] Helge Kragh. Max Planck: The reluctant revolutionary. Physics World, December 2000. http://physicsworld.com/cws/article/print/373. [113] Greg Kroah-Hartman. The Linux kernel. Online video from Google Tech Talks. http://www.youtube.com/watch?

pages: 178 words: 33,275

Ansible Playbook Essentials by Gourav Shah

Amazon Web Services, cloud computing, Debian, DevOps, fault tolerance, information security, web application

Kotian Copy Editors Pranjali Chury Neha Vyas Project Coordinator Suzanne Coutinho Proofreader Safis Editing Indexer Monica Ajmera Mehta Graphics Jason Monteiro Production Coordinator Nilesh R. Mohite Cover Work Nilesh R. Mohite About the Author Gourav Shah (www.gouravshah.com) has extensive experience in building and managing highly available, automated, fault-tolerant infrastructure and scaling it. He started his career as a passionate Linux and open source enthusiast, transformed himself into an operations engineer, and evolved to be a cloud and DevOps expert and trainer. In his previous avatar, Gourav headed IT operations for Efficient Frontier (now Adobe), India.

pages: 554 words: 108,035

Scala in Depth by Tom Kleenex, Joshua Suereth

discrete time, domain-specific language, duck typing, fault tolerance, functional programming, higher-order functions, MVC pattern, sorting algorithm, type inference

These aren’t discussed in the book, but can be found in Akka’s documentation at http://akka.io/docs/ This technique can be powerful when distributed and clustered. The Akka 2.0 framework is adding the ability to create actors inside a cluster and allow them to be dynamically moved around to machines as needed. 9.6. Summary Actors provide a simpler parallelization model than traditional locking and threading. A well-behaved actors system can be fault-tolerant and resistant to total system slowdown. Actors provide an excellent abstraction for designing high-performance servers, where throughput and uptime are of the utmost importance. For these systems, designing failure zones and failure handling behaviors can help keep a system running even in the event of critical failures.

…

So, while the Scala actors library is an excellent resource for creating actors applications, the Akka library provides the features and performance needed to make a production application. Akka also supports common features out of the box. Actors and actor-related system design is a rich subject. This chapter lightly covered a few of the key aspects to actor-related design. These should be enough to create a fault-tolerant high-performant actors system. Next let’s look into a topic of great interest: Java interoperability with Scala. Chapter 10. Integrating Scala with Java In this chapter The benefits of using interfaces for Scala-Java interaction The dangers of automatic implicit conversions of Java types The complications of Java serialization in Scala How to effectively use annotations in Scala for Java libraries One of the biggest advantages of the Scala language is its ability to seamlessly interact with existing Java libraries and applications.

HBase: The Definitive Guide by Lars George

Alignment Problem, Amazon Web Services, bioinformatics, create, read, update, delete, Debian, distributed revision control, domain-specific language, en.wikipedia.org, fail fast, fault tolerance, Firefox, FOSDEM, functional programming, Google Earth, information security, Kickstarter, place-making, revision control, smart grid, sparse data, web application

You may have a background in relational database theory or you want to start fresh and this “column-oriented thing” is something that seems to fit your bill. You also heard that HBase can scale without much effort, and that alone is reason enough to look at it since you are building the next web-scale system. I was at that point in late 2007 when I was facing the task of storing millions of documents in a system that needed to be fault-tolerant and scalable while still being maintainable by just me. I had decent skills in managing a MySQL database system, and was using the database to store data that would ultimately be served to our website users. This database was running on a single server, with another as a backup. The issue was that it would not be able to hold the amount of data I needed to store for this new project.

…

The question is, wouldn’t it be good to trade relational features permanently for performance? You could denormalize (see the next section) the data model and avoid waits and deadlocks by minimizing necessary locking. How about built-in horizontal scalability without the need to repartition as your data grows? Finally, throw in fault tolerance and data availability, using the same mechanisms that allow scalability, and what you get is a NoSQL solution—more specifically, one that matches what HBase has to offer. Database (De-)Normalization At scale, it is often a requirement that we design schema differently, and a good term to describe this principle is Denormalization, Duplication, and Intelligent Keys (DDI).[20] It is about rethinking how data is stored in Bigtable-like storage systems, and how to make use of it in an appropriate way.

…

HDFS is the most used and tested filesystem in production. Almost all production clusters use it as the underlying storage layer. It is proven stable and reliable, so deviating from it may impose its own risks and subsequent problems. The primary reason HDFS is so popular is its built-in replication, fault tolerance, and scalability. Choosing a different filesystem should provide the same guarantees, as HBase implicitly assumes that data is stored in a reliable manner by the filesystem. It has no added means to replicate data or even maintain copies of its own storage files. This functionality must be provided by the lower-level system.

pages: 541 words: 109,698

Mining the Social Web: Finding Needles in the Social Haystack by Matthew A. Russell

Andy Rubin, business logic, Climategate, cloud computing, crowdsourcing, data science, en.wikipedia.org, fault tolerance, Firefox, folksonomy, full text search, Georg Cantor, Google Earth, information retrieval, machine readable, Mark Zuckerberg, natural language processing, NP-complete, power law, Saturday Night Live, semantic web, Silicon Valley, slashdot, social graph, social web, sparse data, statistical model, Steve Jobs, supply-chain management, text mining, traveling salesman, Turing test, web application

Sorting by date seems like a good idea and opens the door to certain kinds of time-series analysis, so let’s start there and see what happens. But first, we’ll need to make a small configuration change so that we can write our map/reduce functions to perform this task in Python. CouchDB is especially intriguing in that it’s written in Erlang, a language engineered to support super-high concurrency[16] and fault tolerance. The de facto out-of-the-box language you use to query and transform your data via map/reduce functions is JavaScript. Note that we could certainly opt to write map/reduce functions in JavaScript and realize some benefits from built-in JavaScript functions CouchDB offers—such as _sum, _count, and _stats.

…

But before we get too pie-in-the-sky, let’s back up for just a moment and reflect on how we got to where we are right now. The Internet is just a network of networks,[63] and what’s very fascinating about it from a technical standpoint is how layers of increasingly higher-level protocols build on top of lower-level protocols to ultimately produce a fault-tolerant worldwide computing infrastructure. In our online activity, we rely on dozens of protocols every single day, without even thinking about it. However, there is one ubiquitous protocol that is hard not to think about explicitly from time to time: HTTP, the prefix of just about every URL that you type into your browser, the enabling protocol for the extensive universe of hypertext documents (HTML pages), and the links that glue them all together into what we know as the Web.

pages: 470 words: 109,589

Apache Solr 3 Enterprise Search Server by Unknown

bioinformatics, business logic, continuous integration, database schema, en.wikipedia.org, fault tolerance, Firefox, full text search, functional programming, information retrieval, natural language processing, performance metric, platform as a service, Ruby on Rails, SQL injection, Wayback Machine, web application

The distributed search of Solr doesn't adapt to real time changes in indexing or query load and doesn't provide any fail-over support. SolrCloud is an ongoing effort to build a fault tolerant, centrally managed support for clusters of Solr instances and is part of the trunk development path (Solr 4.0). SolrCloud introduces the idea that a logical collection of documents (otherwise known as an index) is distributed across a number of slices. Each slice is made up of shards, which are the physical pieces of the collection. In order to support fault tolerance, there may be multiple replicas of a shard distributed across different physical nodes. To keep all this data straight, Solr embeds Apache ZooKeeper as the centralized service for managing all configuration information for the cluster of Solr instances, including mapping which shards are available on which set of nodes of the cluster.

pages: 210 words: 42,271

Programming HTML5 Applications by Zachary Kessin

barriers to entry, continuous integration, fault tolerance, Firefox, functional programming, Google Chrome, higher-order functions, machine readable, mandelbrot fractal, QWERTY keyboard, SQL injection, web application, WebSocket

ws.onmessage { |msg| ws.send "Pong: #{msg}" } ws.onclose { puts "WebSocket closed" } end Erlang Yaws Erlang is a pretty rigorously functional language that was developed several decades ago for telephone switches and has found acceptance in many other areas where massive parallelism and strong robustness are desired. The language is concurrent, fault-tolerant, and very scalable. In recent years it has moved into the web space because all of the traits that make it useful in phone switches are very useful in a web server. The Erlang Yaws web server also supports web sockets right out of the box. The documentation can be found at the Web Sockets in Yaws web page, along with code for a simple echo server.

pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst

algorithmic trading, bioinformatics, business intelligence, business logic, business process, call centre, cloud computing, create, read, update, delete, data acquisition, data science, DevOps, extractivism, fault tolerance, information security, Large Hadron Collider, linked data, machine readable, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, warehouse automation, Watson beat the top human players on Jeopardy!, web application

Big Data analytics requires that organizations choose the data to analyze, consolidate them, and then apply aggregation methods before the data can be subjected to the ETL process. This has to occur with large volumes of data, which can be structured, unstructured, or from multiple sources, such as social networks, data logs, web sites, mobile devices, and sensors. Hadoop accomplishes that by incorporating pragmatic processes and considerations, such as a fault-tolerant clustered architecture, the ability to move computing power closer to the data, parallel and/or batch processing of large data sets, and an open ecosystem that supports enterprise architecture layers from data storage to analytics processes. Not all enterprises require what Big Data analytics has to offer; those that do must consider Hadoop’s ability to meet the challenge.

pages: 179 words: 42,081

DeFi and the Future of Finance by Campbell R. Harvey, Ashwin Ramachandran, Joey Santoro, Vitalik Buterin, Fred Ehrsam

activist fund / activist shareholder / activist investor, bank run, barriers to entry, bitcoin, blockchain, collateralized debt obligation, crowdsourcing, cryptocurrency, David Graeber, Ethereum, ethereum blockchain, fault tolerance, fiat currency, fixed income, Future Shock, initial coin offering, Jane Street, margin call, money: store of value / unit of account / medium of exchange, Network effects, non-fungible token, passive income, peer-to-peer, prediction markets, rent-seeking, RFID, risk tolerance, Robinhood: mobile stock trading app, Satoshi Nakamoto, seigniorage, smart contracts, transaction costs, Vitalik Buterin, yield curve, zero-coupon bond

VII RISKS As we have emphasized in previous sections, DeFi allows developers to create new types of financial products and services, expanding the possibilities of financial technology. While DeFi can eliminate counterparty risk – cutting out intermediators and allowing financial assets to be exchanged in a trustless way – all innovative technologies introduce a new set of risks. To provide users and institutions with a robust and fault-tolerant system capable of handling new financial applications at scale, we must confront and properly mitigate these risks; otherwise, DeFi will remain an exploratory technology, restricting its use, adoption, and appeal. The principal risks DeFi faces today are smart contract,governance, oracle, scaling, DEX custodial, environmental,and regulatory.

Seeking SRE: Conversations About Running Production Systems at Scale by David N. Blank-Edelman

Affordable Care Act / Obamacare, algorithmic trading, AlphaGo, Amazon Web Services, backpropagation, Black Lives Matter, Bletchley Park, bounce rate, business continuity plan, business logic, business process, cloud computing, cognitive bias, cognitive dissonance, cognitive load, commoditize, continuous integration, Conway's law, crowdsourcing, dark matter, data science, database schema, Debian, deep learning, DeepMind, defense in depth, DevOps, digital rights, domain-specific language, emotional labour, en.wikipedia.org, exponential backoff, fail fast, fallacies of distributed computing, fault tolerance, fear of failure, friendly fire, game design, Grace Hopper, imposter syndrome, information retrieval, Infrastructure as a Service, Internet of things, invisible hand, iterative process, Kaizen: continuous improvement, Kanban, Kubernetes, loose coupling, Lyft, machine readable, Marc Andreessen, Maslow's hierarchy, microaggression, microservices, minimum viable product, MVC pattern, performance metric, platform as a service, pull request, RAND corporation, remote working, Richard Feynman, risk tolerance, Ruby on Rails, Salesforce, scientific management, search engine result page, self-driving car, sentiment analysis, Silicon Valley, single page application, Snapchat, software as a service, software is eating the world, source of truth, systems thinking, the long tail, the scientific method, Toyota Production System, traumatic brain injury, value engineering, vertical integration, web application, WebSocket, zero day

As we close out, you should take the following points with you: Third parties are an extension of your stack, not ancillary. If it’s critical path, treat it like a service. Consider abandonment during the life cycle of an integration. The quality of your third-party integration depends on good communication. Contributor Bio Jonathan Mercereau has spent his career leading teams and architecting resilient, fault tolerant, and performant solutions working with the biggest players in DNS, CDN, Certificate Authority, and Synthetic Monitoring. Chances are, you’ve experienced the results of his work, from multi-CDN and optimizing streaming algorithms at Netflix to all multi-vendor solutions and performance enhancements at LinkedIn.

…

The result is that we now have systems that lack unique state. In such a world, reverting a software change can make the system take on a more familiar appearance, but it might not restore the world to the way it once was. Special Knowledge About Complex Systems The situation facing SREs is seldom simple. The fault-tolerance mechanisms built into the design of distributed systems and related automation handle most problems that arise. Because of this, incidents represent situations that fall outside of the “most problems” boundary. Reasoning about cause and effect here is often challenging. For example, simply observing that a process is failing does not necessarily mean that fixing that process will resolve the incident.

…

Operations groups often take on big projects to increase MTBF and decrease MTTR, usually at the level of hardware components, because this is the only level at which assumptions of rationality hold well enough for “mean time to anything” to be well defined. It’s certainly worthwhile for a team to optimize within one “accountability domain” like this. Even so, when you’re looking at the entire system, an increase in fault tolerance beyond “barely acceptable” tends to be immediately eaten up by another layer. Suppose that you have a distributed storage system that was deployed to tolerate three simultaneous disk failures in an array. Then the hardware team, full of gumption and wishing to be promoted, takes clever measures to “guarantee” that the array will never have more than one disk down.

pages: 377 words: 21,687

Digital Apollo: Human and Machine in Spaceflight by David A. Mindell

"Margaret Hamilton" Apollo, 1960s counterculture, Apollo 11, Apollo 13, Apollo Guidance Computer, Charles Lindbergh, computer age, deskilling, Dr. Strangelove, Fairchild Semiconductor, fault tolerance, Gene Kranz, interchangeable parts, Lewis Mumford, Mars Rover, more computing power than Apollo, Neil Armstrong, Norbert Wiener, Norman Mailer, orbital mechanics / astrodynamics, Silicon Valley, sparse data, Stewart Brand, systems thinking, tacit knowledge, telepresence, telerobotics, The Soul of a New Machine

Holliday, Will L., and Dale P. Hoffman. ‘‘Systems Approach to Flight Controls.’’ Astronautics (May 1962): 36–37, 74–80. Hong, Sungook. ‘‘Man and Machine in the 1960s.’’ Techne 7, no. 3 (2004): 49–77. Hopkins, Albert L. ‘‘A Fault-Tolerant Information Processing Concept for Space Vehicles.’’ Cambridge, Mass.: MIT Instrumentation Laboratory, 1970. Hopkins, Albert L. ‘‘A Fault-Tolerant Information Processing System for Advanced Control, Guidance, and Navigation.’’ Cambridge, Mass.: Charles Stark Draper Laboratories, 1970. Hopkins Jr., Albert L., Ramon Alonso, and Hugh Blair-Smith. ‘‘Logical Description for the Apollo Guidance Computer (AGC4).’’

pages: 271 words: 52,814

Blockchain: Blueprint for a New Economy by Melanie Swan

23andMe, Airbnb, altcoin, Amazon Web Services, asset allocation, banking crisis, basic income, bioinformatics, bitcoin, blockchain, capital controls, cellular automata, central bank independence, clean water, cloud computing, collaborative editing, Conway's Game of Life, crowdsourcing, cryptocurrency, data science, digital divide, disintermediation, Dogecoin, Edward Snowden, en.wikipedia.org, Ethereum, ethereum blockchain, fault tolerance, fiat currency, financial innovation, Firefox, friendly AI, Hernando de Soto, information security, intangible asset, Internet Archive, Internet of things, Khan Academy, Kickstarter, Large Hadron Collider, lifelogging, litecoin, Lyft, M-Pesa, microbiome, Neal Stephenson, Network effects, new economy, operational security, peer-to-peer, peer-to-peer lending, peer-to-peer model, personalized medicine, post scarcity, power law, prediction markets, QR code, ride hailing / ride sharing, Satoshi Nakamoto, Search for Extraterrestrial Intelligence, SETI@home, sharing economy, Skype, smart cities, smart contracts, smart grid, Snow Crash, software as a service, synthetic biology, technological singularity, the long tail, Turing complete, uber lyft, unbanked and underbanked, underbanked, Vitalik Buterin, Wayback Machine, web application, WikiLeaks

Consensus without mining is another area being explored, such as in Tendermint’s modified version of DLS (the solution to the Byzantine Generals’ Problem by Dwork, Lynch, and Stockmeyer), with bonded coins belonging to byzantine participants.184 Another idea for consensus without mining or proof of work is through a consensus algorithm such as Hyperledger’s, which is based on the Practical Byzantine Fault Tolerance algorithm. Only focus on the most recent or unspent outputs Many blockchain operations could be based on surface calculations of the most recent or unspent outputs, similar to how credit card transactions operate. “Thin wallets” operate this way, as opposed to querying a full Bitcoind node, and this is how Bitcoin ewallets work on cellular telephones.

Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran, Jeff Hammerbacher

23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, data science, database schema, double helix, en.wikipedia.org, epigenetics, fault tolerance, Firefox, Gregor Mendel, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, Large Hadron Collider, longitudinal study, machine readable, machine translation, Mars Rover, natural language processing, openstreetmap, Paradox of Choice, power law, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social bookmarking, social graph, SPARQL, sparse data, speech recognition, statistical model, supply-chain management, systematic bias, TED Talk, text mining, the long tail, Vernor Vinge, web application

Acknowledgments Thanks to Darius Bacon, Thorsten Brants, Andy Golding, Mark Paskin, Franco Salvetti, and Casey Whitelaw for comments, corrections, and code. 242 CHAPTER FOURTEEN Download at Boykma.Com Chapter 15 CHAPTER FIFTEEN Life in Data: The Story of DNA Matt Wood and Ben Blackburne DNA IS A BIOLOGICAL BUILDING BLOCK, A CONCISE, SCHEMA-LESS, FAULT-TOLERANT DATABASE OF AN organism’s chemical makeup, designed and implemented by a population over millions of years. Over the past 20 years, biologists have begun to move from the study of individual genes to whole genomes, with genomic approaches forming an increasingly large part of modern biomedical research.

…

It is written in the molecules of DNA, copies of which are stored in each cell of the human body (with a few exceptions). This pattern is repeated across nature, right down to the simplest forms of life. The information encoded within the genome contains the directions to build the proteins that make up the molecular machinery that runs the chemistry of the cell. Now that’s what I call fault-tolerant and redundant storage. 243 Download at Boykma.Com Almost every cell in your body contains a central data center, which stores these genomic databases, called the nucleus. Within this are the chromosomes. Like all humans, you are diploid, with two copies of each chromosome, one from your father and one from your mother.

Programming Android by Zigurd Mednieks, Laird Dornin, G. Blake Meike, Masumi Nakamura

anti-pattern, business process, conceptual framework, create, read, update, delete, database schema, Debian, domain-specific language, en.wikipedia.org, fault tolerance, general purpose technology, Google Earth, interchangeable parts, iterative process, loose coupling, MVC pattern, revision control, RFID, SQL injection, systems thinking, web application

SQLite Android uses the SQLite database engine, a self-contained, transactional database engine that requires no separate server process. Many applications and environments beyond Android make use of it, and a large open source community actively develops SQLite. In contrast to desktop-oriented or enterprise databases, which provide a plethora of features related to fault tolerance and concurrent access to data, SQLite aggressively strips out features that are not absolutely necessary in order to achieve a small footprint. For example, many database systems use static typing, but SQLite does not store database type information. Instead, it pushes the responsibility of keeping type information into high-level languages, such as Java, that map database structures into high-level types.

…

For a given transaction, SQLite does not modify the database until all statements in the transaction have completed successfully. Given the volatility of the Android mobile environment, we recommend that in addition to meeting the needs for consistency in your app, you also make liberal use of transactions to support fault tolerance in your application. Example Database Manipulation Using sqlite3 Now that you understand the basics of SQL as it pertains to SQLite, let’s have a look at a simple database for storing video metadata using the sqlite3 command-line tool and the Android debug shell, which you can start by using the adb command.

pages: 211 words: 58,677

Philosophy of Software Design by John Ousterhout

cognitive load, conceptual framework, fault tolerance, functional programming, iterative process, move fast and break things, MVC pattern, revision control, Silicon Valley

In a distributed system, network packets may be lost or delayed, servers may not respond in a timely fashion, or peers may communicate in unexpected ways. The code may detect bugs, internal inconsistencies, or situations it is not prepared to handle. Large systems have to deal with many exceptional conditions, particularly if they are distributed or need to be fault-tolerant. Exception handling can account for a significant fraction of all the code in a system. Exception handling code is inherently more difficult to write than normal-case code. An exception disrupts the normal flow of the code; it usually means that something didn’t work as expected. When an exception occurs, the programmer can deal with it in two ways, each of which can be complicated.

pages: 208 words: 57,602

Futureproof: 9 Rules for Humans in the Age of Automation by Kevin Roose

"World Economic Forum" Davos, adjacent possible, Airbnb, Albert Einstein, algorithmic bias, algorithmic management, Alvin Toffler, Amazon Web Services, Atul Gawande, augmented reality, automated trading system, basic income, Bayesian statistics, Big Tech, big-box store, Black Lives Matter, business process, call centre, choice architecture, coronavirus, COVID-19, data science, deep learning, deepfake, DeepMind, disinformation, Elon Musk, Erik Brynjolfsson, factory automation, fake news, fault tolerance, Frederick Winslow Taylor, Freestyle chess, future of work, Future Shock, Geoffrey Hinton, George Floyd, gig economy, Google Hangouts, GPT-3, hiring and firing, hustle culture, hype cycle, income inequality, industrial robot, Jeff Bezos, job automation, John Markoff, Kevin Roose, knowledge worker, Kodak vs Instagram, labor-force participation, lockdown, Lyft, mandatory minimum, Marc Andreessen, Mark Zuckerberg, meta-analysis, Narrative Science, new economy, Norbert Wiener, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, off-the-grid, OpenAI, pattern recognition, planetary scale, plutocrats, Productivity paradox, QAnon, recommendation engine, remote working, risk tolerance, robotic process automation, scientific management, Second Machine Age, self-driving car, Shoshana Zuboff, Silicon Valley, Silicon Valley startup, social distancing, Steve Jobs, Stuart Kauffman, surveillance capitalism, tech worker, The Future of Employment, The Wealth of Nations by Adam Smith, TikTok, Travis Kalanick, Uber and Lyft, uber lyft, universal basic income, warehouse robotics, Watson beat the top human players on Jeopardy!, work culture

Which means that people with unusual combinations of skills—like a zoologist with a math degree, or a graphic designer who knows everything there is to know about folk music—will have an upper hand against AI. Another type of scarce work that will be hard to automate is work that involves rare or high-stakes situations with low fault tolerance. Most AI learns in an iterative way—that is, it repeats a task over and over again, getting it a little more right each time. But in the real world, we don’t always have time to run a thousand tests, and we know, intuitively, that there are things that are too important to entrust to machines.

The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal by M. Mitchell Waldrop

Ada Lovelace, air freight, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anti-communist, Apple II, battle of ideas, Berlin Wall, Bill Atkinson, Bill Duvall, Bill Gates: Altair 8800, Bletchley Park, Boeing 747, Byte Shop, Charles Babbage, Claude Shannon: information theory, Compatible Time-Sharing System, computer age, Computing Machinery and Intelligence, conceptual framework, cuban missile crisis, Dennis Ritchie, do well by doing good, Donald Davies, double helix, Douglas Engelbart, Douglas Engelbart, Dynabook, experimental subject, Fairchild Semiconductor, fault tolerance, Frederick Winslow Taylor, friendly fire, From Mathematics to the Technologies of Life and Death, functional programming, Gary Kildall, Haight Ashbury, Howard Rheingold, information retrieval, invisible hand, Isaac Newton, Ivan Sutherland, James Watt: steam engine, Jeff Rulifson, John von Neumann, Ken Thompson, Leonard Kleinrock, machine translation, Marc Andreessen, Menlo Park, Multics, New Journalism, Norbert Wiener, packet switching, pink-collar, pneumatic tube, popular electronics, RAND corporation, RFC: Request For Comment, Robert Metcalfe, Silicon Valley, Skinner box, Steve Crocker, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Ted Nelson, The Soul of a New Machine, Turing machine, Turing test, Vannevar Bush, Von Neumann architecture, Wiener process, zero-sum game

He and his colleagues would have to give up every engineer's first instinct, which was to control things so that problems could not happen, and instead design a system that was guaranteed to fail-but that would keep running anyhow. Nowadays this is known as a fault-tolerant system, and designing one is still considered a cutting-edge challenge. It means giving the system some of the same quality possessed by a superbly trained military unit, or a talented football team, or, for that matter, any living organism-namely, an ability to react to the unexpected. But in the early 1960s, with CTSS, Corbato and his colleagues had to pioneer fault-tolerant design even as they were pioneering time-sharing itself: For example, among their early innovations were "firewalls," or software barriers that kept each user's area of computer memory isolated from its neighbors, so that a flameout in one program wouldn't necessarily consume the others.

…

., 197-98 electncal engmeenng, 82 Electncal Engmeermg, 113 electnc power networks, 25-26 electroencephalography (EEG), 11-12 electronIc commons Idea, 413-14,420 ElectronIc Discrete Vanable Automatic Computer (EDVAC),47, 100-101 von Neumann's report on, 59-65 Electronic News, 338 ElectronIc Numencal Integrator and Calculator (EN lAC), 43,45-47,87-88, 101, 102, 103,339 drawbacks of, 46-47 patent dispute over, 63 programmmg of, 46-47 electronIc office Idea, 363-64, 407 ElIas, Peter, 220 ELIZA, 229 Elkmd, Jerry, 110, 111, 152, 175-76, 194, 295, 345, 351, 354,368,371,399,438,444, 446,447 Ellenby, John, 382, 408 EllIs, Jim, 427 E-maIl, 231, 324-26, 384, 420, 465 Engelbart, Douglas, 210-17, 241-43,255,261,273,278, 285, 342, 358, 360n, 364, 406,465,470 at Fall JOInt Computer Confer- ence,5,287-94 EnglIsh, BIll, 242, 243, 289-90, 293-94,354,355,361-62, 365n,366,368 EN lAC, see ElectroniC Numeflcal Integrator and Calculator EnIgma machines, 80 entropy, 81 error-checking codes, 271 error-correcting codes, 79-80, 94n Ethernet, 5, 374-75, 382, 385, 386,439-40,452 Ethernet-AI to- RCG-S LOT (EARS), 385 EuclId, 137 Evans, David, 239, 261, 274, 282, 303, 343, 357, 358 Everett, Robert, 102-3, 108 expectation, 10 behavIOral theory, 74, 97 expert systems, 397-98, 406 ExtrapolatIOn, InterpolatlOn, and Smoothmg of StatIOnary Time Series (Wiener), 54 facsimile machines, 347-48 Fahlman, Scott, 438 FairchIld Semiconductor, 339 Fall JOInt Computer Conference, 5,287-94 Fano, Robert, 19, 75, 94-95, 107, 174,193,217-24,227-36, 243,244,249-51,252-53, 257, 281, 307, 310, 317, 453 FantasIa, 338 Farley, Belmont, 144 fault-tolerant systems, 234 Federal Research Internet Coor- dlnatmg Committee (FRICC), 462 feedback, 55-57, 92, 138 Feigenbaum, Edward, 210, 281, 396,397-98,403,405-6 Fiala, Ed, 346 file systems, hierarchical, 230 FIle Transfer Protocol (FTP), 301 firewalls, 234 "First Draft of a Report on the EDV AC" (von Neumann), 59-65,68,86,102 flat-panel displays, 359 Flegal, Bob, 345 FLEX machine, 358, 359, 361 Flexownter, 166, 188 flight simulators, 101-2 floppy disks, CP/M software and, 434 Ford Motor Company, 334, 335, 336, 337, 389 Forrester, Jay, 102-3, 113, 114-15, 117, 173,230-31 Fortran, 165, 168, 169, 171-72, 246 Fortune, 27, 93 Fossum, Bob, 418, 420 Foster, John, 278, 279, 330 Frankston, Bob, 315 Fredkin, Edward, 152-56, 179, 194,208,313-14,323,412 Freeman, Greydon, 457 Freyman, Monsieur, 83 FRICC (Federal Research Inter- net Coordlnatmg Commit- tee), 462 Fnck, Fredenck, 97, 128, 201-2, 203n Fublni, Gene, 202 Fuchs, Ira, 457 FUJI Xerox, 409 Fumblmg the Future (Smith and Alexander), 382n, 446 functions, 10 lIst processing, 169-70 Galanter, Eugene, 139 Galley, Stuart, 319-20 games, computer, 188, 320, 435 game theory, 85-86, 91 Garner, W.

pages: 247 words: 60,543

The Currency Cold War: Cash and Cryptography, Hash Rates and Hegemony by David G. W. Birch

"World Economic Forum" Davos, Alan Greenspan, algorithmic management, AlphaGo, bank run, Big Tech, bitcoin, blockchain, Bretton Woods, BRICs, British Empire, business cycle, capital controls, cashless society, central bank independence, COVID-19, cross-border payments, cryptocurrency, Diane Coyle, disintermediation, distributed ledger, Donald Trump, driverless car, Elon Musk, Ethereum, ethereum blockchain, facts on the ground, fault tolerance, fiat currency, financial exclusion, financial innovation, financial intermediation, floating exchange rates, forward guidance, Fractional reserve banking, global reserve currency, global supply chain, global village, Hyman Minsky, information security, initial coin offering, Internet of things, Jaron Lanier, Kenneth Rogoff, knowledge economy, M-Pesa, Mark Zuckerberg, market clearing, market design, Marshall McLuhan, mobile money, Money creation, money: store of value / unit of account / medium of exchange, moral hazard, Network effects, new economy, Northern Rock, one-China policy, Overton Window, PalmPilot, pattern recognition, Pingit, QR code, quantum cryptography, race to the bottom, railway mania, ransomware, Real Time Gross Settlement, reserve currency, Satoshi Nakamoto, seigniorage, Silicon Valley, smart contracts, social distancing, sovereign wealth fund, special drawing rights, subscription business, the payments system, too big to fail, transaction costs, Vitalik Buterin, Washington Consensus

This means that it can take a while to establish consensus, which can remain probabilistic for some time (with Bitcoin, for example, people generally wait for an hour or so in order to see which chain has ‘won’). Nevertheless, the science of consensus protocols is well known, highly developed and widely used to create alternatives (Kravchenko et al. 2018). In particular, cryptographers have been exploring what are known as Byzantine fault tolerant (BFT) protocols that use rounds of voting between participants to agree on the state of the ledger (or anything else). These protocols function as long as no more than a third of the participants are malicious. Thus, they work well when the number of participants is limited, so the voting overhead is not so great, although there are variations that allow for much larger groups of participants to interact, such as the Federated Byzantine Agreement (FBA) used in Stellar.

pages: 265 words: 60,880

The Docker Book by James Turnbull

Airbnb, continuous integration, Debian, DevOps, domain-specific language, false flag, fault tolerance, job automation, Kickstarter, Kubernetes, microservices, MVC pattern, platform as a service, pull request, Ruby on Rails, software as a service, standardized shipping container, web application

This provides the information needed, for example an IP address or port or both, to allow interaction between services. Our example service discovery tool, Consul, is a specialized datastore that uses consensus algorithms. Consul specifically uses the Raft consensus algorithm to require a quorum for writes. It also exposes a key value store and service catalog that is highly available, fault-tolerant, and maintains strong consistency guarantees. Services can register themselves with Consul and share that registration information in a highly-available and distributed manner. Consul is also interesting because it provides: A service catalog with an API instead of the traditional key=value store of most service discovery tools.

PostgreSQL Cookbook by Chitij Chauhan

database schema, Debian, fault tolerance, GnuPG, Google Glasses, index card, information security

High Availability and Replication In this chapter, we will cover the following recipes: Setting up hot streaming replication Replication using Slony-I Replication using Londiste Replication using Bucardo Replication using DRBD Setting up the Postgres-XC cluster Introduction The important components for any production database is to achieve fault tolerance, 24/7 availability, and redundancy. It is for this purpose that we have different high availability and replication solutions available for PostgreSQL. From a business perspective, it is important to ensure 24/7 data availability in the event of a disaster situation or a database crash due to disk or hardware failure.

pages: 391 words: 71,600

Hit Refresh: The Quest to Rediscover Microsoft's Soul and Imagine a Better Future for Everyone by Satya Nadella, Greg Shaw, Jill Tracie Nichols

3D printing, AlphaGo, Amazon Web Services, anti-globalists, artificial general intelligence, augmented reality, autonomous vehicles, basic income, Bretton Woods, business process, cashless society, charter city, cloud computing, complexity theory, computer age, computer vision, corporate social responsibility, crowdsourcing, data science, DeepMind, Deng Xiaoping, Donald Trump, Douglas Engelbart, driverless car, Edward Snowden, Elon Musk, en.wikipedia.org, equal pay for equal work, everywhere but in the productivity statistics, fault tolerance, fulfillment center, Gini coefficient, global supply chain, Google Glasses, Grace Hopper, growth hacking, hype cycle, industrial robot, Internet of things, Jeff Bezos, job automation, John Markoff, John von Neumann, knowledge worker, late capitalism, Mars Rover, Minecraft, Mother of all demos, Neal Stephenson, NP-complete, Oculus Rift, pattern recognition, place-making, Richard Feynman, Robert Gordon, Robert Solow, Ronald Reagan, Salesforce, Second Machine Age, self-driving car, side project, Silicon Valley, Skype, Snapchat, Snow Crash, special economic zone, speech recognition, Stephen Hawking, Steve Ballmer, Steve Jobs, subscription business, TED Talk, telepresence, telerobotics, The Rise and Fall of American Growth, The Soul of a New Machine, Tim Cook: Apple, trade liberalization, two-sided market, universal basic income, Wall-E, Watson beat the top human players on Jeopardy!, young professional, zero-sum game

The same can be said of a dozen other areas in which technology is “stuck”—high temperature superconductors, energy efficient fertilizer production, string theory. A quantum computer would allow a new look at our most compelling problems. Computer scientist Krysta Svore is at the heart of our quest to solve problems on a quantum computer. Krysta received her PhD from Columbia University focusing on fault tolerance and scalable quantum computing, and she spent a year at MIT working with an experimentalist designing the software needed to control a quantum computer. Her team is designing an exotic software architecture that assumes our math, physics, and superconducting experts succeed in building a quantum computer.

pages: 218 words: 68,648

Confessions of a Crypto Millionaire: My Unlikely Escape From Corporate America by Dan Conway

Affordable Care Act / Obamacare, Airbnb, bank run, basic income, Bear Stearns, Big Tech, bitcoin, blockchain, buy and hold, cloud computing, cognitive dissonance, corporate governance, crowdsourcing, cryptocurrency, disruptive innovation, distributed ledger, double entry bookkeeping, Ethereum, ethereum blockchain, fault tolerance, financial independence, gig economy, Gordon Gekko, Haight Ashbury, high net worth, holacracy, imposter syndrome, independent contractor, initial coin offering, job satisfaction, litecoin, Marc Andreessen, Mitch Kapor, obamacare, offshore financial centre, Ponzi scheme, prediction markets, rent control, reserve currency, Ronald Coase, Satoshi Nakamoto, Silicon Valley, Silicon Valley billionaire, smart contracts, Steve Jobs, supercomputer in your pocket, tech billionaire, tech bro, Tragedy of the Commons, Turing complete, Uber for X, universal basic income, upwardly mobile, Vitalik Buterin

It was a delightfully odd group, without a single big personality pushing business cards. The first question was something like this: “Vitalik, don’t you think that the Byzantine general’s dilemma could be exploited by the various geographic nodes in a proof of stake architecture? Is there a way to compile the blockchain that is fault tolerant and aligns incentives with the miners?” I had no idea what they were talking about. I especially didn’t understand Vitalik’s response, which he delivered in an even voice seasoned with small bursts of energy, as if he were connected to a gentle electrical current that gave his face a stutter step every so often.

pages: 205 words: 71,872

Whistleblower: My Journey to Silicon Valley and Fight for Justice at Uber by Susan Fowler

"Susan Fowler" uber, Airbnb, Albert Einstein, Big Tech, Burning Man, cloud computing, data science, deep learning, DevOps, Donald Trump, Elon Musk, end-to-end encryption, fault tolerance, Grace Hopper, Higgs boson, Large Hadron Collider, Lyft, Maui Hawaii, messenger bag, microservices, Mitch Kapor, Richard Feynman, ride hailing / ride sharing, self-driving car, Silicon Valley, TechCrunch disrupt, Travis Kalanick, Uber for X, uber lyft, work culture

During my Engucation classes, I tried to wrap my head around what the computing infrastructure underneath all of these applications actually looked like. It was that infrastructure—the servers, the operating systems, the networks, and all of the code that connected the applications together—that I would be working on, that I would need to make better, more reliable, and more fault-tolerant. After Engucation came more specialized training to prepare new hires for their particular roles within the company. New data scientists spent time with their data science teams, front-end developers learned how to work with the front-end code, and I would embed with one of the site reliability engineering (SRE) teams and learn the basics before I could join my permanent team.

pages: 227 words: 63,186

An Elegant Puzzle: Systems of Engineering Management by Will Larson

Ben Horowitz, Cass Sunstein, Clayton Christensen, data science, DevOps, en.wikipedia.org, fault tolerance, functional programming, Google Earth, hive mind, Innovator's Dilemma, iterative process, Kanban, Kickstarter, Kubernetes, loose coupling, microservices, MITM: man-in-the-middle, no silver bullet, pull request, Richard Thaler, seminal paper, Sheryl Sandberg, Silicon Valley, statistical model, systems thinking, the long tail, web application

Raft is used by etcd14 and influxDB15 among many others. “Paxos Made Simple” One of Leslie Lamport’s numerous influential papers, “Paxos Made Simple” is a gem, both in explaining the notoriously complex Paxos algorithm and because, even at its simplest, Paxos isn’t really that simple: The Paxos algorithm for implementing a fault-tolerant distributed system has been regarded as difficult to understand, perhaps because the original presentation was Greek to many readers. In fact, it is among the simplest and most obvious of distributed algorithms. At its heart is a consensus algorithm—the “synod” algorithm. The next section shows that this consensus algorithm follows almost unavoidably from the properties we want it to satisfy.

pages: 661 words: 185,701

The Future of Money: How the Digital Revolution Is Transforming Currencies and Finance by Eswar S. Prasad

access to a mobile phone, Adam Neumann (WeWork), Airbnb, algorithmic trading, altcoin, bank run, barriers to entry, Bear Stearns, Ben Bernanke: helicopter money, Bernie Madoff, Big Tech, bitcoin, Bitcoin Ponzi scheme, Bletchley Park, blockchain, Bretton Woods, business intelligence, buy and hold, capital controls, carbon footprint, cashless society, central bank independence, cloud computing, coronavirus, COVID-19, Credit Default Swap, cross-border payments, cryptocurrency, deglobalization, democratizing finance, disintermediation, distributed ledger, diversified portfolio, Dogecoin, Donald Trump, Elon Musk, Ethereum, ethereum blockchain, eurozone crisis, fault tolerance, fiat currency, financial engineering, financial independence, financial innovation, financial intermediation, Flash crash, floating exchange rates, full employment, gamification, gig economy, Glass-Steagall Act, global reserve currency, index fund, inflation targeting, informal economy, information asymmetry, initial coin offering, Internet Archive, Jeff Bezos, Kenneth Rogoff, Kickstarter, light touch regulation, liquidity trap, litecoin, lockdown, loose coupling, low interest rates, Lyft, M-Pesa, machine readable, Mark Zuckerberg, Masayoshi Son, mobile money, Money creation, money market fund, money: store of value / unit of account / medium of exchange, Network effects, new economy, offshore financial centre, open economy, opioid epidemic / opioid crisis, PalmPilot, passive investing, payday loans, peer-to-peer, peer-to-peer lending, Peter Thiel, Ponzi scheme, price anchoring, profit motive, QR code, quantitative easing, quantum cryptography, RAND corporation, random walk, Real Time Gross Settlement, regulatory arbitrage, rent-seeking, reserve currency, ride hailing / ride sharing, risk tolerance, risk/return, Robinhood: mobile stock trading app, robo advisor, Ross Ulbricht, Salesforce, Satoshi Nakamoto, seigniorage, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, smart contracts, SoftBank, special drawing rights, the payments system, too big to fail, transaction costs, uber lyft, unbanked and underbanked, underbanked, Vision Fund, Vitalik Buterin, Wayback Machine, WeWork, wikimedia commons, Y Combinator, zero-sum game

Moreover, the reserve would feature a “loss-absorbing capital buffer,” meaning that, to offset any doubts about the extent of the backing, the stablecoins would be matched more than one-to-one by the stock of fiat currencies held in reserve. The Libra project also includes some technical innovations. It employs a new programming language, Move, which is designed to keep the Libra blockchain secure while allowing for the use of specific types of smart contracts. The blockchain is Byzantine fault tolerant, which means that its integrity cannot be compromised by a small number of malicious nodes (other consensus mechanisms such as Proof of Work have this property as well). The consensus protocol ensures transaction finality, is more energy-efficient than Proof of Work, and allows the network to function properly even if nearly one-third of the validator nodes fail or are compromised.

…

Vitalik Buterin, cofounder of Ethereum DeFi relies on smart contract blockchains, of which Ethereum is by far the most widely used. The Bitcoin blockchain, as noted earlier, does not have smart contract capabilities. Vitalik Buterin, a wunderkind who is a cofounder of Ethereum (and is a college dropout, need you ask?), has argued that decentralization confers many advantages over traditional financial systems. One is fault tolerance—failure is less likely because such a system relies on many separate components. Another is attack resistance—there is no central point, such as a major financial institution or centralized exchange, that is vulnerable to attack. A third advantage is collusion resistance—it is difficult for participants in a large decentralized system to collude; corporations and governments, by contrast, have the power to act in ways that might not necessarily benefit common people.

pages: 237 words: 76,486

Mars Rover Curiosity: An Inside Account From Curiosity's Chief Engineer by Rob Manning, William L. Simon

Elon Musk, fault tolerance, fear of failure, James Webb Space Telescope, Kickstarter, Kuiper Belt, Mars Rover, Neil Armstrong

My first job was as an apprentice electronics tester, helping run tests on what would become the brains of the Galileo spacecraft. I quickly discovered that building spacecraft included many extremely tedious jobs. After Galileo, I worked on Magellan (to Venus) and Cassini (to Saturn), becoming expert in the design of spacecraft computers, computer memory, computer architectures, and fault-tolerant systems. In 1993, after thirteen years at JPL, my career took a sudden leap forward. Brian Muirhead, the most inspiring and level-headed spacecraft leader I have ever met, had recently been named spacecraft manager for a funky little mission to Mars called Pathfinder. We had a conversation in which he explained that he was a master of mechanical systems but had not had much experience with electronics.

pages: 313 words: 75,583

Ansible for DevOps: Server and Configuration Management for Humans by Jeff Geerling

Abraham Maslow, AGPL, Amazon Web Services, cloud computing, continuous integration, database schema, Debian, defense in depth, DevOps, fault tolerance, Firefox, full text search, Google Chrome, inventory management, loose coupling, microservices, Minecraft, MITM: man-in-the-middle, punch-card reader, Ruby on Rails, web application

Use a distributed file system, like Gluster, Lustre, Fraunhofer, or Ceph. Some options are easier to set up than others, and all have benefits—and drawbacks. Rsync, git, or NFS offer simple initial setup, and low impact on filesystem performance (in many scenarios). But if you need more flexibility and scalability, less network overhead, and greater fault tolerance, you will have to consider something that requires more configuration (e.g. a distributed file system) and/or more hardware (e.g. a SAN). GlusterFS is licensed under the AGPL license, has good documentation, and a fairly active support community (especially in the #gluster IRC channel). But to someone new to distributed file systems, it can be daunting to get set it up the first time.

Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data by Leslie Sikos

AGPL, Amazon Web Services, bioinformatics, business process, cloud computing, create, read, update, delete, Debian, en.wikipedia.org, fault tolerance, Firefox, Google Chrome, Google Earth, information retrieval, Infrastructure as a Service, Internet of things, linked data, machine readable, machine translation, natural language processing, openstreetmap, optical character recognition, platform as a service, search engine result page, semantic web, Silicon Valley, social graph, software as a service, SPARQL, text mining, Watson beat the top human players on Jeopardy!, web application, Wikidata, wikimedia commons, Wikivoyage

It is a highly scalable, open source storage and computing platform [11]. Suitable for Big Data applications and selected for the Wikidata Query Service, Blazegraph is specifically designed to support big graphs, offering Semantic Web (RDF/SPARQL) and graph database (tinkerpop, blueprints, vertex-centric) APIs. The robust, scalable, fault-tolerant, enterprise-class storage and query features are combined with high availability, online backup, failover, and self-healing. Blazegraph features an ultra-high performance RDF graph database that supports RDFS and OWL Lite reasoning, as well as SPARQL 1.1 querying. Designed for huge amounts of information, the Blazegraph RDF graph database can load 1 billion graph edges in less than an hour on a 15-node cluster.

pages: 328 words: 77,877

API Marketplace Engineering: Design, Build, and Run a Platform for External Developers by Rennay Dorasamy

Airbnb, Amazon Web Services, barriers to entry, business logic, business process, butterfly effect, continuous integration, DevOps, digital divide, disintermediation, fault tolerance, if you build it, they will come, information security, Infrastructure as a Service, Internet of things, Jeff Bezos, Kanban, Kubernetes, Lyft, market fragmentation, microservices, minimum viable product, MITM: man-in-the-middle, mobile money, optical character recognition, platform as a service, pull request, ride hailing / ride sharing, speech recognition, the payments system, transaction costs, two-pizza team, Uber and Lyft, uber lyft, underbanked, web application

I hope that this position inspires a reader out there to write a book on the “Dummy’s Guide to Persistent Storage on Kubernetes” and I promise to buy a copy. However, this decision was made after intense discussion and deliberation. The deciding factor that helped to settle the matter was that database management would never be a core function of our team. We could easily build a database container, but making it highly available and fault tolerant would make our container cluster configuration much more complex. Our Platform now uses an Enterprise provided service, and from observing the various areas of database management, I have no doubt that we made the right decision. To clarify, our team owns the data. The Enterprise service team owns the database.

pages: 283 words: 78,705

Principles of Web API Design: Delivering Value with APIs and Microservices by James Higginbotham

Amazon Web Services, anti-pattern, business intelligence, business logic, business process, Clayton Christensen, cognitive dissonance, cognitive load, collaborative editing, continuous integration, create, read, update, delete, database schema, DevOps, fallacies of distributed computing, fault tolerance, index card, Internet of things, inventory management, Kubernetes, linked data, loose coupling, machine readable, Metcalfe’s law, microservices, recommendation engine, semantic web, side project, single page application, Snapchat, software as a service, SQL injection, web application, WebSocket

Finally, don’t underestimate the effort required to separate a monolithic data store into a data store per service. Distributed Systems Challenges The journey toward microservices requires a deep understanding of distributed systems. Those not as familiar with the concepts of distributed tracing, observability, eventual consistency, fault tolerance, and failover will encounter a more difficult time with microservices. The eight fallacies of distributed computing, written in 1994 and still applicable today, must be understood by every developer. Additionally, many find that architectural oversight is required to initially decompose and subsequently integrate services into solutions.

pages: 619 words: 197,256

Apollo by Charles Murray, Catherine Bly Cox

Apollo 11, Apollo 13, cuban missile crisis, fault tolerance, Gene Kranz, index card, low earth orbit, military-industrial complex, Neil Armstrong, old-boy network, orbital mechanics / astrodynamics, pneumatic tube, Ted Sorensen, The Bell Curve by Richard Herrnstein and Charles Murray, War on Poverty, white flight

“There was only one engine bell, of course, and only one combustion chamber, but all the avionics that fed the signals to that engine and all the mechanical components that had to work, like the little valves that had to be pressurized to open the ball valves, and so forth, were at least single-fault tolerant and usually two-fault tolerant. . . . There were a heck of a lot of ways to start that engine.” And of course they had indeed checked it out carefully before the flight, but nothing they didn’t do for any other mission. All this was still correct as of Christmas Eve, 1968. And yet it ultimately didn’t make any difference to the way many of the people in Apollo felt.

pages: 1,380 words: 190,710

Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems by Heather Adkins, Betsy Beyer, Paul Blankinship, Ana Oprea, Piotr Lewandowski, Adam Stubblefield

air gap, anti-pattern, barriers to entry, bash_history, behavioural economics, business continuity plan, business logic, business process, Cass Sunstein, cloud computing, cognitive load, continuous integration, correlation does not imply causation, create, read, update, delete, cryptocurrency, cyber-physical system, database schema, Debian, defense in depth, DevOps, Edward Snowden, end-to-end encryption, exponential backoff, fault tolerance, fear of failure, general-purpose programming language, Google Chrome, if you see hoof prints, think horses—not zebras, information security, Internet of things, Kubernetes, load shedding, margin call, microservices, MITM: man-in-the-middle, NSO Group, nudge theory, operational security, performance metric, pull request, ransomware, reproducible builds, revision control, Richard Thaler, risk tolerance, self-driving car, single source of truth, Skype, slashdot, software as a service, source of truth, SQL injection, Stuxnet, the long tail, Turing test, undersea cable, uranium enrichment, Valgrind, web application, Y2K, zero day

Your risk assessment may vary depending on where your organization’s assets are located. For example, a site in Japan or Taiwan should account for typhoons, while a site in the Southeastern US should account for hurricanes. Risk ratings may also change as an organization matures and incorporates fault-tolerant systems, like redundant internet circuits and backup power supplies, into its systems. Large organizations should perform risk assessments on both global and per-site levels, and review and update these assessments periodically as the operating environment changes. Equipped with a risk assessment that identifies which systems need protection, you’re ready to create a response team prepared with tools, procedures, and training.

…

The IR team should have read access to logs for analysis and event reconstruction, as well as access to tools for analyzing data, sending reports, and conducting forensic examinations. Configuring Systems You can make a number of adjustments to systems before a disaster or incident to reduce an IR team’s initial response time. For example: Build fault tolerance into local systems and create failovers. For more information on this topic, see Chapters 8 and 9. Deploy forensic agents, such as GRR agents or EnCase Remote Agents, across the network with logs enabled. This will aid both your response and later forensic analysis. Be aware that security logs may require a lengthy retention period, as discussed in Chapter 15 (the industry average for detecting intrusions is approximately 200 days, and logs deleted before an incident is detected cannot be used to investigate it).

pages: 275 words: 84,980

Before Babylon, Beyond Bitcoin: From Money That We Understand to Money That Understands Us (Perspectives) by David Birch

"World Economic Forum" Davos, agricultural Revolution, Airbnb, Alan Greenspan, bank run, banks create money, bitcoin, blockchain, Bretton Woods, British Empire, Broken windows theory, Burning Man, business cycle, capital controls, cashless society, Clayton Christensen, clockwork universe, creative destruction, credit crunch, cross-border payments, cross-subsidies, crowdsourcing, cryptocurrency, David Graeber, dematerialisation, Diane Coyle, disruptive innovation, distributed ledger, Dogecoin, double entry bookkeeping, Ethereum, ethereum blockchain, facts on the ground, fake news, fault tolerance, fiat currency, financial exclusion, financial innovation, financial intermediation, floating exchange rates, Fractional reserve banking, index card, informal economy, Internet of things, invention of the printing press, invention of the telegraph, invention of the telephone, invisible hand, Irish bank strikes, Isaac Newton, Jane Jacobs, Kenneth Rogoff, knowledge economy, Kuwabatake Sanjuro: assassination market, land bank, large denomination, low interest rates, M-Pesa, market clearing, market fundamentalism, Marshall McLuhan, Martin Wolf, mobile money, Money creation, money: store of value / unit of account / medium of exchange, new economy, Northern Rock, Pingit, prediction markets, price stability, QR code, quantitative easing, railway mania, Ralph Waldo Emerson, Real Time Gross Settlement, reserve currency, Satoshi Nakamoto, seigniorage, Silicon Valley, smart contracts, social graph, special drawing rights, Suez canal 1869, technoutopianism, The future is already here, the payments system, The Wealth of Nations by Adam Smith, too big to fail, transaction costs, tulip mania, wage slave, Washington Consensus, wikimedia commons

Ripple After Bitcoin and Ethereum, the third biggest cryptocurrency is Ripple, which unlike those first two has its roots in local exchange trading systems (Peck 2013). It is a protocol for value exchange that uses a shared ledger but it does not use a Bitcoin-like blockchain, preferring another kind of what is known as a ‘Byzantine fault-tolerant consensus-forming process’. Ripple signs every transaction that parties submit to the network with a digital signature. Each user selects a list, called a ‘unique node list’, comprising other users that it trusts as what are known as ‘validating nodes’. Each validating node independently verifies every proposed transaction within its network to determine if it is valid.

pages: 362 words: 86,195

Fatal System Error: The Hunt for the New Crime Lords Who Are Bringing Down the Internet by Joseph Menn

Brian Krebs, dumpster diving, fault tolerance, Firefox, John Markoff, Menlo Park, offshore financial centre, pirate software, plutocrats, popular electronics, profit motive, RFID, Silicon Valley, zero day

Cerf, who has a generally upbeat tone about most things, gives the impression that he remains pleasantly surprised that the Internet has continued to function and thrive—even though, as he put it, “We never got to do the production engineering,” the version ready for prime time. Even after his years on the front line, Barrett found such statements amazing. “It’s incredibly disturbing,” he said. “The engine of the world economy is based on this really cool experiment that is not designed for security, it’s designed for fault-tolerance,” which is a system’s ability to withstand some failures. “You can reduce your risks, but the naughty truth is that the Net is just not a secure place for business or society.” Cerf listed a dozen things that could be done to make the Internet safer. Among them: encouraging research into “hardware-assisted security mechanisms,” limiting the enormous damage that Web browsers can wreak on operating systems, and hiring more and better trained federal cybercrime agents while pursuing international legal frameworks.

pages: 669 words: 210,153

Tools of Titans: The Tactics, Routines, and Habits of Billionaires, Icons, and World-Class Performers by Timothy Ferriss

Abraham Maslow, Adam Curtis, Airbnb, Alexander Shulgin, Alvin Toffler, An Inconvenient Truth, artificial general intelligence, asset allocation, Atul Gawande, augmented reality, back-to-the-land, Ben Horowitz, Bernie Madoff, Bertrand Russell: In Praise of Idleness, Beryl Markham, billion-dollar mistake, Black Swan, Blue Bottle Coffee, Blue Ocean Strategy, blue-collar work, book value, Boris Johnson, Buckminster Fuller, business process, Cal Newport, call centre, caloric restriction, caloric restriction, Carl Icahn, Charles Lindbergh, Checklist Manifesto, cognitive bias, cognitive dissonance, Colonization of Mars, Columbine, commoditize, correlation does not imply causation, CRISPR, David Brooks, David Graeber, deal flow, digital rights, diversification, diversified portfolio, do what you love, Donald Trump, effective altruism, Elon Musk, fail fast, fake it until you make it, fault tolerance, fear of failure, Firefox, follow your passion, fulfillment center, future of work, Future Shock, Girl Boss, Google X / Alphabet X, growth hacking, Howard Zinn, Hugh Fearnley-Whittingstall, Jeff Bezos, job satisfaction, Johann Wolfgang von Goethe, John Markoff, Kevin Kelly, Kickstarter, Lao Tzu, lateral thinking, life extension, lifelogging, Mahatma Gandhi, Marc Andreessen, Mark Zuckerberg, Mason jar, Menlo Park, microdosing, Mikhail Gorbachev, MITM: man-in-the-middle, Neal Stephenson, Nelson Mandela, Nicholas Carr, Nick Bostrom, off-the-grid, optical character recognition, PageRank, Paradox of Choice, passive income, pattern recognition, Paul Graham, peer-to-peer, Peter H. Diamandis: Planetary Resources, Peter Singer: altruism, Peter Thiel, phenotype, PIHKAL and TIHKAL, post scarcity, post-work, power law, premature optimization, private spaceflight, QWERTY keyboard, Ralph Waldo Emerson, Ray Kurzweil, recommendation engine, rent-seeking, Richard Feynman, risk tolerance, Ronald Reagan, Salesforce, selection bias, sharing economy, side project, Silicon Valley, skunkworks, Skype, Snapchat, Snow Crash, social graph, software as a service, software is eating the world, stem cell, Stephen Hawking, Steve Jobs, Stewart Brand, superintelligent machines, TED Talk, Tesla Model S, The future is already here, the long tail, The Wisdom of Crowds, Thomas L Friedman, traumatic brain injury, trolley problem, vertical integration, Wall-E, Washington Consensus, We are as Gods, Whole Earth Catalog, Y Combinator, zero-sum game

DEC was first in minicomputers. Many other computer companies (and their entrepreneurial owners) became rich and famous by following a simple principle: If you can’t be first in a category, set up a new category you can be first in. Tandem was first in fault-tolerant computers and built a $1.9 billion business. So Stratus stepped down with the first fault-tolerant minicomputer. Are the laws of marketing difficult? No, they are quite simple. Working things out in practice is another matter, however. Cray Research went over the top with the first supercomputer. So Convex put two and two together and launched the first mini supercomputer.

pages: 722 words: 90,903

Practical Vim: Edit Text at the Speed of Thought by Drew Neil

Bram Moolenaar, don't repeat yourself, en.wikipedia.org, fault tolerance, finite state, fizzbuzz, off-by-one error, place-making, QWERTY keyboard, web application

I’ve borrowed the expressions in series and in parallel from the field of electronics to differentiate between two techniques for executing a macro multiple times. The technique for executing a macro in series is brittle. Like cheap Christmas tree lights, it breaks easily. The technique for executing a macro in parallel is more fault tolerant. Execute the Macro in Series Picture a robotic arm and a conveyor belt containing a series of items for the robot to manipulate (Figure 4, Vim's macros make quick work of repetitive tasks). Recording a macro is like programming the robot to do a single unit of work. As a final step, we instruct the robot to move the conveyor belt and bring the next item within reach.

pages: 350 words: 90,898

A World Without Email: Reimagining Work in an Age of Communication Overload by Cal Newport

Cal Newport, call centre, Claude Shannon: information theory, cognitive dissonance, collaborative editing, Compatible Time-Sharing System, computer age, COVID-19, creative destruction, data science, David Heinemeier Hansson, fault tolerance, Ford Model T, Frederick Winslow Taylor, future of work, Garrett Hardin, hive mind, Inbox Zero, interchangeable parts, it's over 9,000, James Watt: steam engine, Jaron Lanier, John Markoff, John Nash: game theory, Joseph Schumpeter, Kanban, Kickstarter, knowledge worker, Marshall McLuhan, Nash equilibrium, passive income, Paul Graham, place-making, pneumatic tube, remote work: asynchronous communication, remote working, Richard Feynman, rolodex, Salesforce, Saturday Night Live, scientific management, Silicon Valley, Silicon Valley startup, Skype, social graph, stealth mode startup, Steve Jobs, supply-chain management, technological determinism, the medium is the message, the scientific method, Tragedy of the Commons, web application, work culture , Y Combinator

It made it clear that asynchronous communication complicates attempts to coordinate, and therefore, it’s almost always worth the extra cost required to introduce more synchrony. In the context of distributed systems, the added synchrony explored in the aftermath of this famous 1985 paper took several forms. One heavy-handed solution, used in some early fly-by-wire systems and fault-tolerant credit card transaction processing machines, was to connect the machines on a common electrical circuit, allowing them to operate at the same lockstep pace. This approach eliminates unpredictable communication delays and allows your application to immediately detect if a machine has crashed. Because these circuits were sometimes complicated to implement, software approaches to adding synchrony also became popular.

The Internet Trap: How the Digital Economy Builds Monopolies and Undermines Democracy by Matthew Hindman

A Declaration of the Independence of Cyberspace, accounting loophole / creative accounting, activist fund / activist shareholder / activist investor, AltaVista, Amazon Web Services, barriers to entry, Benjamin Mako Hill, bounce rate, business logic, Cambridge Analytica, cloud computing, computer vision, creative destruction, crowdsourcing, David Ricardo: comparative advantage, death of newspapers, deep learning, DeepMind, digital divide, discovery of DNA, disinformation, Donald Trump, fake news, fault tolerance, Filter Bubble, Firefox, future of journalism, Ida Tarbell, incognito mode, informal economy, information retrieval, invention of the telescope, Jeff Bezos, John Perry Barlow, John von Neumann, Joseph Schumpeter, lake wobegon effect, large denomination, longitudinal study, loose coupling, machine translation, Marc Andreessen, Mark Zuckerberg, Metcalfe’s law, natural language processing, Netflix Prize, Network effects, New Economic Geography, New Journalism, pattern recognition, peer-to-peer, Pepsi Challenge, performance metric, power law, price discrimination, recommendation engine, Robert Metcalfe, search costs, selection bias, Silicon Valley, Skype, sparse data, speech recognition, Stewart Brand, surveillance capitalism, technoutopianism, Ted Nelson, The Chicago School, the long tail, The Soul of a New Machine, Thomas Malthus, web application, Whole Earth Catalog, Yochai Benkler

Shoenfeld, Z. (2017, June). MTV News—and other sites—are frantically pivoting to video. It won’t work. Newsweek. Retrieved from http://www.newsweek.com/mtv -news-video-vocativ-media-ads-pivot-630223. Shute, J., Oancea, M., Ellner, S., Handy, B., Rollins, E., Samwel, B.,. . . , Jegerlehner, B., et al. (2012). F1: the fault-tolerant distributed RDBMS supporting Google’s ad business. In Proceedings of the 2012 International Conference on Management of Data, Scottsdale, AZ (pp. 777–78). ACM. Sifry, M. (2009, November). Critiquing Matthew Hindman’s “The Myth of Digital Democracy”. TechPresident. Retrieved from http://techpresident.com/blog-entry /critiquing-matthew-hindmans-myth-digital-democracy.

pages: 332 words: 93,672

Life After Google: The Fall of Big Data and the Rise of the Blockchain Economy by George Gilder

23andMe, Airbnb, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, AlphaGo, AltaVista, Amazon Web Services, AOL-Time Warner, Asilomar, augmented reality, Ben Horowitz, bitcoin, Bitcoin Ponzi scheme, Bletchley Park, blockchain, Bob Noyce, British Empire, Brownian motion, Burning Man, business process, butterfly effect, carbon footprint, cellular automata, Claude Shannon: information theory, Clayton Christensen, cloud computing, computer age, computer vision, crony capitalism, cross-subsidies, cryptocurrency, Danny Hillis, decentralized internet, deep learning, DeepMind, Demis Hassabis, disintermediation, distributed ledger, don't be evil, Donald Knuth, Donald Trump, double entry bookkeeping, driverless car, Elon Musk, Erik Brynjolfsson, Ethereum, ethereum blockchain, fake news, fault tolerance, fiat currency, Firefox, first square of the chessboard, first square of the chessboard / second half of the chessboard, floating exchange rates, Fractional reserve banking, game design, Geoffrey Hinton, George Gilder, Google Earth, Google Glasses, Google Hangouts, index fund, inflation targeting, informal economy, initial coin offering, Internet of things, Isaac Newton, iterative process, Jaron Lanier, Jeff Bezos, Jim Simons, Joan Didion, John Markoff, John von Neumann, Julian Assange, Kevin Kelly, Law of Accelerating Returns, machine translation, Marc Andreessen, Mark Zuckerberg, Mary Meeker, means of production, Menlo Park, Metcalfe’s law, Money creation, money: store of value / unit of account / medium of exchange, move fast and break things, Neal Stephenson, Network effects, new economy, Nick Bostrom, Norbert Wiener, Oculus Rift, OSI model, PageRank, pattern recognition, Paul Graham, peer-to-peer, Peter Thiel, Ponzi scheme, prediction markets, quantitative easing, random walk, ransomware, Ray Kurzweil, reality distortion field, Recombinant DNA, Renaissance Technologies, Robert Mercer, Robert Metcalfe, Ronald Coase, Ross Ulbricht, Ruby on Rails, Sand Hill Road, Satoshi Nakamoto, Search for Extraterrestrial Intelligence, self-driving car, sharing economy, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, Singularitarianism, Skype, smart contracts, Snapchat, Snow Crash, software is eating the world, sorting algorithm, South Sea Bubble, speech recognition, Stephen Hawking, Steve Jobs, Steven Levy, Stewart Brand, stochastic process, Susan Wojcicki, TED Talk, telepresence, Tesla Model S, The Soul of a New Machine, theory of mind, Tim Cook: Apple, transaction costs, tulip mania, Turing complete, Turing machine, Vernor Vinge, Vitalik Buterin, Von Neumann architecture, Watson beat the top human players on Jeopardy!, WikiLeaks, Y Combinator, zero-sum game

Nick Tredennick and Paul Wu, Transaction Security, Cryptochain, and Chip Level Identity (Cupertino: Jonetix, 2018). See also Tredennick and Wu, “Transaction Security Begins With Chip Level Identity,” Int’l Conference on Internet Computing and Internet of Things, ICOMP, 2017. 8. Leemon Baird, “The Swirlds Hashgraph Consensus Algorithm: Fair, Fast, Byzantine Fault Tolerance,” Swirlds Tech Report, May 31, 2016, revised February 16, 2018. 9. Leemon Baird, Mance Harmon, and Paul Madsen, “Hedera: A Governing Council & Public Hashgraph Network: The Trust Layer of the Internet,” white paper V1.0, March 13, 2018, 22. 10. Ibid., 19. Epilogue: The New System of the World 1.

Practical Vim, Second Edition (for Stefano Alcazi) by Drew Neil

Bram Moolenaar, don't repeat yourself, en.wikipedia.org, fault tolerance, finite state, fizzbuzz, off-by-one error, place-making, QWERTY keyboard, web application

I’ve borrowed the expressions in series and in parallel from the field of electronics to differentiate between two techniques for executing a macro multiple times. The technique for executing a macro in series is brittle. Like cheap Christmas tree lights, it breaks easily. The technique for executing a macro in parallel is more fault tolerant. Execute the Macro in Series Picture a robotic arm and a conveyor belt containing a series of items for the robot to manipulate. Recording a macro is like programming the robot to do a single unit of work. As a final step, we instruct the robot to move the conveyor belt and bring the next item within reach.

Practical Vim by Drew Neil

Bram Moolenaar, don't repeat yourself, en.wikipedia.org, fault tolerance, finite state, fizzbuzz, off-by-one error, place-making, QWERTY keyboard, web application

I’ve borrowed the expressions in series and in parallel from the field of electronics to differentiate between two techniques for executing a macro multiple times. The technique for executing a macro in series is brittle. Like cheap Christmas tree lights, it breaks easily. The technique for executing a macro in parallel is more fault tolerant. Execute the Macro in Series Picture a robotic arm and a conveyor belt containing a series of items for the robot to manipulate. Recording a macro is like programming the robot to do a single unit of work. As a final step, we instruct the robot to move the conveyor belt and bring the next item within reach.

pages: 352 words: 96,532

Where Wizards Stay Up Late: The Origins of the Internet by Katie Hafner, Matthew Lyon

air freight, Bill Duvall, Charles Babbage, Compatible Time-Sharing System, computer age, conceptual framework, Donald Davies, Douglas Engelbart, Douglas Engelbart, fault tolerance, Hush-A-Phone, information retrieval, Ivan Sutherland, John Markoff, Kevin Kelly, Leonard Kleinrock, Marc Andreessen, Menlo Park, military-industrial complex, Multics, natural language processing, OSI model, packet switching, RAND corporation, RFC: Request For Comment, Robert Metcalfe, Ronald Reagan, seminal paper, Silicon Valley, Skinner box, speech recognition, Steve Crocker, Steven Levy, The Soul of a New Machine

But imagine a local post office somewhere that decided to go it alone, making up its own rules for addressing, packaging, stamping, and sorting mail. Imagine if that rogue post office decided to invent its own set of ZIP codes. Imagine any number of post offices taking it upon themselves to invent new rules. Imagine widespread confusion. Mail handling begs for a certain amount of conformity, and because computers are less fault-tolerant than human beings, e-mail begs loudly. The early wrangling on the ARPANET over attempts to impose standard message headers was typical of other debates over computer industry standards that came later. But because the struggle over e-mail standards was one of the first sources of real tension in the community, it stood out.

Scratch Monkey by Stross, Charles

carbon-based life, defense in depth, fault tolerance, gravity well, heat death of the universe, Kuiper Belt, packet switching, phenotype, telepresence

I'm probably grinning like a corpse but I don't care -- she must know by now that blind people often smile. It's easier to grin than to frown; the facial muscles contract into a smirk more easily. Even when you're about to die. "It takes a lot of stress to unbalance a network processor the size of a small moon," she replies calmly; "it shows a remarkable degree of fault tolerance. As for physical assault, the automatic defences are still armed ... as they always have been. So If we want to take it for ourselves, we must overwhelm it by frontal assault, sending uploaded minds out into the simulation space until it overloads and drops into NP-stasis. They do that if you feed them faster than they can transfer capacity elsewhere, you know.

Data and the City by Rob Kitchin,Tracey P. Lauriault,Gavin McArdle

A Declaration of the Independence of Cyberspace, algorithmic management, bike sharing, bitcoin, blockchain, Bretton Woods, Chelsea Manning, citizen journalism, Claude Shannon: information theory, clean water, cloud computing, complexity theory, conceptual framework, corporate governance, correlation does not imply causation, create, read, update, delete, crowdsourcing, cryptocurrency, data science, dematerialisation, digital divide, digital map, digital rights, distributed ledger, Evgeny Morozov, fault tolerance, fiat currency, Filter Bubble, floating exchange rates, folksonomy, functional programming, global value chain, Google Earth, Hacker News, hive mind, information security, Internet of things, Kickstarter, knowledge economy, Lewis Mumford, lifelogging, linked data, loose coupling, machine readable, new economy, New Urbanism, Nicholas Carr, nowcasting, open economy, openstreetmap, OSI model, packet switching, pattern recognition, performance metric, place-making, power law, quantum entanglement, RAND corporation, RFID, Richard Florida, ride hailing / ride sharing, semantic web, sentiment analysis, sharing economy, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart contracts, smart grid, smart meter, social graph, software studies, statistical model, tacit knowledge, TaskRabbit, technological determinism, technological solutionism, text mining, The Chicago School, The Death and Life of Great American Cities, the long tail, the market place, the medium is the message, the scientific method, Toyota Production System, urban planning, urban sprawl, web application

Such data include public administrative records, operational management information, as well as that produced by sensors, transponders and cameras that make up the internet of things, smartphones, wearables, social media, loyalty cards and commercial sources. In many cases, cities are turning to big data technologies and their novel distributed computational infrastructure for the reliable and fault tolerant storage, analysis and dissemination of data from various sources. In such systems, processing is generally brought to the data, rather than bringing data to the processing. Since each organization uses different platforms, operating systems and software to generate and analyse data, data sharing mechanisms should ideally be provided as platform-independent services so that they can be utilized by various users for different purposes, for example, for research, business, improving existing services of city authorities and organizations, and for facilitating communication between people and policymakers.

pages: 406 words: 105,602

The Startup Way: Making Entrepreneurship a Fundamental Discipline of Every Enterprise by Eric Ries

activist fund / activist shareholder / activist investor, Affordable Care Act / Obamacare, Airbnb, AOL-Time Warner, autonomous vehicles, barriers to entry, basic income, Ben Horowitz, billion-dollar mistake, Black-Scholes formula, Blitzscaling, call centre, centralized clearinghouse, Clayton Christensen, cognitive dissonance, connected car, corporate governance, DevOps, Elon Musk, en.wikipedia.org, fault tolerance, financial engineering, Frederick Winslow Taylor, global supply chain, Great Leap Forward, hockey-stick growth, index card, Jeff Bezos, Kickstarter, Lean Startup, loss aversion, machine readable, Marc Andreessen, Mark Zuckerberg, means of production, minimum viable product, moral hazard, move fast and break things, obamacare, PalmPilot, peer-to-peer, place-making, rent-seeking, Richard Florida, Sam Altman, Sand Hill Road, scientific management, secular stagnation, shareholder value, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, Steve Jobs, TechCrunch disrupt, the scientific method, time value of money, Toyota Production System, two-pizza team, Uber for X, universal basic income, web of trust, Y Combinator

“We spent three or four weeks when the only visible thing we were doing was making everybody come to one place,” he recalls. “When things went wrong, we just went and found the person who was responsible.” In addition, the site architecture was so bad that the slightest problem had the potential to knock the whole thing out. There was no way to track issues, and none of the fault tolerance or resistance that such a massive system should have had in place, as a matter of course, existed. Faced with this quagmire, the team asked a single question: “Why is the site not working on October 22?” Then they worked backward, applying the management and technological practices that by now should sound familiar: small teams, rapid iteration, accountability metrics, and a culture of transparency without fear of recrimination.

pages: 648 words: 108,814

Solr 1.4 Enterprise Search Server by David Smiley, Eric Pugh

Amazon Web Services, bioinformatics, cloud computing, continuous integration, database schema, domain-specific language, en.wikipedia.org, fault tolerance, Firefox, information retrieval, Ruby on Rails, SQL injection, Wayback Machine, web application, Y Combinator

There has been a fair amount of discussion on Solr mailing lists about setting up distributed Solr on a robust foundation that adapts to changing environment. There has been some investigation regarding using Apache Hadoop, a platform for building reliable, distributing computing as a foundation for Solr that would provide a robust fault-tolerant filesystem. Another interesting sub project of Hadoop is ZooKeeper, which aims to be a service for centralizing the management required by distributed applications. There has been some development work on integrating ZooKeeper as the management interface for Solr. Keep an eye on the Hadoop homepage for more information about these efforts at http://hadoop.apache.org/ and Zookeeper at http://hadoop.apache.org/zookeeper/.

pages: 354 words: 26,550

High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems by Irene Aldridge

algorithmic trading, asset allocation, asset-backed security, automated trading system, backtesting, Black Swan, Brownian motion, business cycle, business process, buy and hold, capital asset pricing model, centralized clearinghouse, collapse of Lehman Brothers, collateralized debt obligation, collective bargaining, computerized trading, diversification, equity premium, fault tolerance, financial engineering, financial intermediation, fixed income, global macro, high net worth, implied volatility, index arbitrage, information asymmetry, interest rate swap, inventory management, Jim Simons, law of one price, Long Term Capital Management, Louis Bachelier, machine readable, margin call, market friction, market microstructure, martingale, Myron Scholes, New Journalism, p-value, paper trading, performance metric, Performance of Mutual Funds in the Period, pneumatic tube, profit motive, proprietary trading, purchasing power parity, quantitative trading / quantitative ﬁnance, random walk, Renaissance Technologies, risk free rate, risk tolerance, risk-adjusted returns, risk/return, Sharpe ratio, short selling, Small Order Execution System, statistical arbitrage, statistical model, stochastic process, stochastic volatility, systematic trading, tail risk, trade route, transaction costs, value at risk, yield curve, zero-sum game

New York–based MarketFactory provides a suite of software tools to help automated traders get an extra edge in the market, help their models scale, increase their fill ratios, reduce slippage, and thereby improve profitability (P&L). Chapter 18 discusses optimization of execution. Run-time risk management applications ensure that the system stays within prespecified behavioral and P&L bounds. Such applications may also be known as system-monitoring and fault-tolerance software. 26 HIGH-FREQUENCY TRADING r Mobile applications suitable for monitoring performance of highfrequency trading systems alert administration of any issues. r Real-time third-party research can stream advanced information and forecasts. Legal, Accounting, and Other Professional Services Like any business in the financial sector, high-frequency trading needs to make sure that “all i’s are dotted and all t’s are crossed” in the legal and accounting departments.

pages: 1,266 words: 278,632

Backup & Recovery by W. Curtis Preston

Berlin Wall, business intelligence, business process, database schema, Debian, dumpster diving, failed state, fault tolerance, full text search, job automation, Kickstarter, operational security, rolling blackouts, side project, Silicon Valley, systems thinking, web application

In addition to these dbcc tasks, you need to choose a transaction log archive strategy. If you follow these tasks, you will help maintain the database, keeping it running smoothly and ready for proper backups. dbcc: The Database Consistency Checker Even though Sybase’s dataserver products are very robust and much effort has gone into making them fault-tolerant, there is always the chance that a problem will occur. For very large tables, some of these problems might not show until very specific queries are run. This is one of the reasons for the database consistency checker, dbcc. This set of SQL commands can review all the database page allocations, linkages, and data pointers, finding problems and, in many cases, fixing them before they become insurmountable.

…

(As of this writing, the MySQL team is developing other ACID-compliant storage engines.) With PostgreSQL, all data is stored in an ACID-compliant fashion. PostgreSQL also offers sophisticated features such as point-in-time recovery, tablespaces, checkpoints, hot backups, and write ahead logging for fault tolerance. These are all very good things from a data-protection and data-integrity standpoint. PostgreSQL Architecture From a power-user standpoint, PostgreSQL is like any other database. The following terms mean essentially the same in PostgreSQL as they do in any other relational database: Database Table Index Row Attribute Extent Partition Transaction Clusters A PostgreSQL cluster is analogous to an instance in other RDBMSs, and each cluster works with one or more databases.

pages: 302 words: 82,233

Beautiful security by Andy Oram, John Viega

Albert Einstein, Amazon Web Services, An Inconvenient Truth, Bletchley Park, business intelligence, business process, call centre, cloud computing, corporate governance, credit crunch, crowdsourcing, defense in depth, do well by doing good, Donald Davies, en.wikipedia.org, fault tolerance, Firefox, information security, loose coupling, Marc Andreessen, market design, MITM: man-in-the-middle, Monroe Doctrine, new economy, Nicholas Carr, Nick Leeson, Norbert Wiener, operational security, optical character recognition, packet switching, peer-to-peer, performance metric, pirate software, Robert Bork, Search for Extraterrestrial Intelligence, security theater, SETI@home, Silicon Valley, Skype, software as a service, SQL injection, statistical model, Steven Levy, the long tail, The Wisdom of Crowds, Upton Sinclair, web application, web of trust, zero day, Zimmermann PGP

With this dynamic control and command infrastructure, the botnet owner can mobilize a massive amount of computing resources from one corner of the Internet to another within a matter of minutes. It should be noted that the control server itself might not be static. Botnets have evolved from a static control infrastructure to a peer-to-peer structure for the purposes of fault tolerance and evading detection. When one server is detected and blocked, other servers can step in and take over. It is also common for the control server to run on a compromised machine or by proxy, so that the botnet’s owner is unlikely to be identified. Botnets commonly communicate through the same method as their creators’ public IRC servers.

pages: 424 words: 114,905

Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again by Eric Topol

"World Economic Forum" Davos, 23andMe, Affordable Care Act / Obamacare, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic bias, AlphaGo, Apollo 11, artificial general intelligence, augmented reality, autism spectrum disorder, autonomous vehicles, backpropagation, Big Tech, bioinformatics, blockchain, Cambridge Analytica, cloud computing, cognitive bias, Colonization of Mars, computer age, computer vision, Computing Machinery and Intelligence, conceptual framework, creative destruction, CRISPR, crowdsourcing, Daniel Kahneman / Amos Tversky, dark matter, data science, David Brooks, deep learning, DeepMind, Demis Hassabis, digital twin, driverless car, Elon Musk, en.wikipedia.org, epigenetics, Erik Brynjolfsson, fake news, fault tolerance, gamification, general purpose technology, Geoffrey Hinton, George Santayana, Google Glasses, ImageNet competition, Jeff Bezos, job automation, job satisfaction, Joi Ito, machine translation, Mark Zuckerberg, medical residency, meta-analysis, microbiome, move 37, natural language processing, new economy, Nicholas Carr, Nick Bostrom, nudge unit, OpenAI, opioid epidemic / opioid crisis, pattern recognition, performance metric, personalized medicine, phenotype, placebo effect, post-truth, randomized controlled trial, recommendation engine, Rubik’s Cube, Sam Altman, self-driving car, Silicon Valley, Skinner box, speech recognition, Stephen Hawking, techlash, TED Talk, text mining, the scientific method, Tim Cook: Apple, traumatic brain injury, trolley problem, War on Poverty, Watson beat the top human players on Jeopardy!, working-age population

The K supercomputer in Japan, by contrast, requires about 10 megawatts of power and occupies more than 1.3 million liters.56 Where our brain’s estimated 100 billion neurons and 100 trillion connections give it a high tolerance for failure—not to mention its astonishing ability to learn both with and without a teacher, from very few examples—even the most powerful computers have poor fault tolerance for any lost circuitry, and they certainly require plenty of programming before they can begin to learn, and then only from millions of examples. Another major difference is that our brain is relatively slow, with computation speeds 10 million times slower than machines, so a machine can respond to a stimulus much faster than we can.

pages: 587 words: 117,894

Cybersecurity: What Everyone Needs to Know by P. W. Singer, Allan Friedman

4chan, A Declaration of the Independence of Cyberspace, air gap, Apple's 1984 Super Bowl advert, barriers to entry, Berlin Wall, bitcoin, blood diamond, borderless world, Brian Krebs, business continuity plan, Chelsea Manning, cloud computing, cognitive load, crowdsourcing, cuban missile crisis, data acquisition, do-ocracy, Dr. Strangelove, drone strike, Edward Snowden, energy security, failed state, fake news, Fall of the Berlin Wall, fault tolerance, Free Software Foundation, global supply chain, Google Earth, information security, Internet of things, invention of the telegraph, John Markoff, John Perry Barlow, Julian Assange, Khan Academy, M-Pesa, military-industrial complex, MITM: man-in-the-middle, mutually assured destruction, Network effects, packet switching, Peace of Westphalia, pre–internet, profit motive, RAND corporation, ransomware, RFC: Request For Comment, risk tolerance, rolodex, Seymour Hersh, Silicon Valley, Skype, smart grid, SQL injection, Steve Jobs, Stuxnet, Twitter Arab Spring, uranium enrichment, vertical integration, We are Anonymous. We are Legion, web application, WikiLeaks, Yochai Benkler, zero day, zero-sum game

One is the importance of building in “the intentional capacity to work under degraded conditions.” Beyond that, resilient systems must also recover quickly, and, finally, learn lessons to deal better with future threats. For decades, most major corporations have had business continuity plans for fires or natural disasters, while the electronics industry has measured what it thinks of as fault tolerance, and the communications industry has talked about reliability and redundancy in its operations. All of these fit into the idea of resilience, but most assume some natural disaster, accident, failure, or crisis rather than deliberate attack. This is where cybersecurity must go in a very different direction: if you are only thinking in terms of reliability, a network can be made resilient merely by creating redundancies.

pages: 461 words: 125,845

This Machine Kills Secrets: Julian Assange, the Cypherpunks, and Their Fight to Empower Whistleblowers by Andy Greenberg

air gap, Apple II, Ayatollah Khomeini, Berlin Wall, Bill Gates: Altair 8800, Bletchley Park, Burning Man, Chelsea Manning, computerized markets, crowdsourcing, cryptocurrency, disinformation, domain-specific language, driverless car, drone strike, en.wikipedia.org, Evgeny Morozov, Fairchild Semiconductor, fault tolerance, hive mind, information security, Jacob Appelbaum, John Gilmore, John Perry Barlow, Julian Assange, Lewis Mumford, Mahatma Gandhi, military-industrial complex, Mitch Kapor, MITM: man-in-the-middle, Mohammed Bouazizi, Mondo 2000, Neal Stephenson, nuclear winter, offshore financial centre, operational security, PalmPilot, pattern recognition, profit motive, Ralph Nader, real-name policy, reality distortion field, Richard Stallman, Robert Hanssen: Double agent, Silicon Valley, Silicon Valley ideology, Skype, social graph, SQL injection, statistical model, stem cell, Steve Jobs, Steve Wozniak, Steven Levy, Teledyne, three-masted sailing ship, undersea cable, Vernor Vinge, We are Anonymous. We are Legion, We are the 99%, WikiLeaks, X Prize, Zimmermann PGP

The geekery had gotten so thick that even some of Tor’s modern-day cypherpunks and volunteer coders, loath as they might have been to admit it, might just have gotten lost. Within minutes, Mathewson, wearing a sport jacket over a Tor T-shirt over a dwarfish potbelly, was delving into security issues like “epistemic attacks” and “Byzantine fault tolerances.” By the time he sat down, still grinning, a growing fraction of the room seemed baffled or possibly bored. Appelbaum’s presence, on the other hand, is as much guerrilla as geek. He’s Tor’s field researcher, unofficial revolutionary, and man on the ground in countries from Qatar to Brazil.

pages: 448 words: 117,325

Click Here to Kill Everybody: Security and Survival in a Hyper-Connected World by Bruce Schneier

23andMe, 3D printing, air gap, algorithmic bias, autonomous vehicles, barriers to entry, Big Tech, bitcoin, blockchain, Brian Krebs, business process, Citizen Lab, cloud computing, cognitive bias, computer vision, connected car, corporate governance, crowdsourcing, cryptocurrency, cuban missile crisis, Daniel Kahneman / Amos Tversky, David Heinemeier Hansson, disinformation, Donald Trump, driverless car, drone strike, Edward Snowden, Elon Musk, end-to-end encryption, fault tolerance, Firefox, Flash crash, George Akerlof, incognito mode, industrial robot, information asymmetry, information security, Internet of things, invention of radio, job automation, job satisfaction, John Gilmore, John Markoff, Kevin Kelly, license plate recognition, loose coupling, market design, medical malpractice, Minecraft, MITM: man-in-the-middle, move fast and break things, national security letter, Network effects, Nick Bostrom, NSO Group, pattern recognition, precautionary principle, printed gun, profit maximization, Ralph Nader, RAND corporation, ransomware, real-name policy, Rodney Brooks, Ross Ulbricht, security theater, self-driving car, Seymour Hersh, Shoshana Zuboff, Silicon Valley, smart cities, smart transportation, Snapchat, sparse data, Stanislav Petrov, Stephen Hawking, Stuxnet, supply-chain attack, surveillance capitalism, The Market for Lemons, Timothy McVeigh, too big to fail, Uber for X, Unsafe at Any Speed, uranium enrichment, Valery Gerasimov, Wayback Machine, web application, WikiLeaks, Yochai Benkler, zero day

In 2017, traffic to and from several major US ISPs was briefly routed to an obscure Russian Internet provider. And don’t think this kind of attack is limited to nation-states; a 2008 talk at the DefCon hackers conference showed how anyone can do it. When the Internet was developed, what security there was focused on physical attacks against the network. Its fault-tolerant architecture can handle servers and connections failing or being destroyed. What it can’t handle is systemic attacks against the underlying protocols. The base Internet protocols were developed without security in mind, and many of them remain insecure to this day. There’s no security in the “From” line of an e-mail: anyone can pretend to be anyone.

pages: 960 words: 125,049

Mastering Ethereum: Building Smart Contracts and DApps by Andreas M. Antonopoulos, Gavin Wood Ph. D.

air gap, Amazon Web Services, bitcoin, blockchain, business logic, continuous integration, cryptocurrency, Debian, digital divide, Dogecoin, domain-specific language, don't repeat yourself, Edward Snowden, en.wikipedia.org, Ethereum, ethereum blockchain, fault tolerance, fiat currency, Firefox, functional programming, Google Chrome, information security, initial coin offering, intangible asset, Internet of things, litecoin, machine readable, move fast and break things, node package manager, non-fungible token, peer-to-peer, Ponzi scheme, prediction markets, pull request, QR code, Ruby on Rails, Satoshi Nakamoto, sealed-bid auction, sharing economy, side project, smart contracts, transaction costs, Turing complete, Turing machine, Vickrey auction, Vitalik Buterin, web application, WebSocket

While providing high availability, auditability, transparency, and neutrality, it also reduces or eliminates censorship and reduces certain counterparty risks. Compared to Bitcoin Many people will come to Ethereum with some prior experience of cryptocurrencies, specifically Bitcoin. Ethereum shares many common elements with other open blockchains: a peer-to-peer network connecting participants, a Byzantine fault–tolerant consensus algorithm for synchronization of state updates (a proof-of-work blockchain), the use of cryptographic primitives such as digital signatures and hashes, and a digital currency (ether). Yet in many ways, both the purpose and construction of Ethereum are strikingly different from those of the open blockchains that preceded it, including Bitcoin.

pages: 571 words: 124,448

Building Habitats on the Moon: Engineering Approaches to Lunar Settlements by Haym Benaroya

3D printing, anti-fragile, Apollo 11, Apollo 13, biofilm, Black Swan, Brownian motion, Buckminster Fuller, carbon-based life, centre right, clean water, Colonization of Mars, Computer Numeric Control, conceptual framework, data acquisition, dual-use technology, Elon Musk, fault tolerance, Gene Kranz, gravity well, inventory management, Johannes Kepler, low earth orbit, Neil Armstrong, orbital mechanics / astrodynamics, performance metric, RAND corporation, restrictive zoning, risk tolerance, Ronald Reagan, stochastic process, tacit knowledge, telepresence, telerobotics, the scientific method, Two Sigma, urban planning, Virgin Galactic, X Prize, zero-sum game

“Despite being critical to the reliability of redundant systems, however, mediating systems cannot be redundant themselves, as then they would need mediating, leading to an infinite regress. ‘The daunting truth,’ to quote a 1993 report to the FAA, ‘is that some of the core [mediating] mechanisms in fault-tolerant systems are single points of failure: they just have to work correctly’.” The assumption of independence of elements in the system, whether for purposes of redundancy or as part of the system model, can also be a cause of failure. Interdependencies (correlations) exist in complex systems at the least because they are operating in, and are driven by, the same environment.

pages: 482 words: 125,973

Competition Demystified by Bruce C. Greenwald

additive manufacturing, airline deregulation, AltaVista, AOL-Time Warner, asset allocation, barriers to entry, book value, business cycle, creative destruction, cross-subsidies, deindustrialization, discounted cash flows, diversified portfolio, Do you want to sell sugared water for the rest of your life?, Everything should be made as simple as possible, fault tolerance, intangible asset, John Nash: game theory, Nash equilibrium, Network effects, new economy, oil shock, packet switching, PalmPilot, Pepsi Challenge, pets.com, price discrimination, price stability, revenue passenger mile, search costs, selective serotonin reuptake inhibitor (SSRI), shareholder value, Silicon Valley, six sigma, Steve Jobs, transaction costs, vertical integration, warehouse automation, yield management, zero-sum game

TABLE 6.1 Compaq and Dell, 1990 and 1995 ($ million, costs as a percentage of sales) FIGURE 6.4 Compaq’s return on invested capital and operating income margin, 1990–2001 For a time, the approach was successful, as the company combined strong sales growth with decent operating margins and high return on invested capital (figure 6.4).* But ingrained cultures are difficult to uproot. The engineering mentality and love of technology that was part of Compaq’s tradition did not disappear, even after Rod Canion left. In 1997 the company bought Tandem Computers, a firm that specialized in producing fault-tolerant machines designed for uninterruptible transaction processing. A year later it bought Digital Equipment Corporation, a former engineering star in the computing world which had fallen from grace as its minicomputer bastion was undermined by the personal computer revolution. At the time of the purchase, Compaq wanted DEC for its consulting business, its AltaVista Internet search engine, and some in-process research.

Autonomous Driving: How the Driverless Revolution Will Change the World by Andreas Herrmann, Walter Brenner, Rupert Stadler

Airbnb, Airbus A320, algorithmic bias, augmented reality, autonomous vehicles, blockchain, call centre, carbon footprint, clean tech, computer vision, conceptual framework, congestion pricing, connected car, crowdsourcing, cyber-physical system, DARPA: Urban Challenge, data acquisition, deep learning, demand response, digital map, disruptive innovation, driverless car, Elon Musk, fault tolerance, fear of failure, global supply chain, industrial cluster, intermodal, Internet of things, Jeff Bezos, John Zimmer (Lyft cofounder), Lyft, manufacturing employment, market fundamentalism, Mars Rover, Masdar, megacity, Pearl River Delta, peer-to-peer rental, precision agriculture, QWERTY keyboard, RAND corporation, ride hailing / ride sharing, self-driving car, sensor fusion, sharing economy, Silicon Valley, smart cities, smart grid, smart meter, Steve Jobs, Tesla Model S, Tim Cook: Apple, trolley problem, uber lyft, upwardly mobile, urban planning, Zipcar

The more trafﬁc situations these algorithms are exposed to, the better prepared they are to master a new situation. Designing this training process so that the accuracy demanded by Jen-Hsun Huang is obtained will be the crucial challenge in the development of autonomous vehicles. When discussing what fault tolerance might be acceptable, it should be borne in mind that people are more likely to forgive mistakes made by other people than mistakes made by machines. This also applies to driving errors, which are more likely to be overlooked if they were committed by a driver and not by a machine. This means that autonomous vehicles will only be accepted if they cause signiﬁcantly fewer errors than the drivers.

pages: 580 words: 125,129

Androids: The Team That Built the Android Operating System by Chet Haase

Andy Rubin, Apple II, Apple's 1984 Super Bowl advert, augmented reality, barriers to entry, Beos Apple "Steve Jobs" next macos , Big Tech, Bill Atkinson, commoditize, continuous integration, crowdsourcing, en.wikipedia.org, fault tolerance, Firefox, General Magic , Google Chrome, Ken Thompson, lock screen, machine readable, Menlo Park, PalmPilot, Parkinson's law, pull request, QWERTY keyboard, side project, Silicon Valley, skunkworks, speech recognition, stealth mode startup, Steve Ballmer, Steve Jobs, Steve Wozniak, Tony Fadell, turn-by-turn navigation, web application

Bob’s fix was to catch that failure condition and set the initial time on the phone to the day that he fixed the bug. Bob also tracked down a networking problem that was specific to mobile data. Android phones were experiencing severe outages that seemed like a problem with bad carrier network infrastructure. Networking protocols have built in fault-tolerance, because networks can go down, or packets of data can get lost or delayed. Android was using the congestion window approach in Linux that responds to an outage by halving the size of the data packet, and halving it again, and again, until it gets a response from the server that packets are going through.

pages: 474 words: 130,575

Surveillance Valley: The Rise of the Military-Digital Complex by Yasha Levine

23andMe, activist fund / activist shareholder / activist investor, Adam Curtis, Airbnb, AltaVista, Amazon Web Services, Anne Wojcicki, anti-communist, AOL-Time Warner, Apple's 1984 Super Bowl advert, bitcoin, Black Lives Matter, borderless world, Boston Dynamics, British Empire, Californian Ideology, call centre, Charles Babbage, Chelsea Manning, cloud computing, collaborative editing, colonial rule, company town, computer age, computerized markets, corporate governance, crowdsourcing, cryptocurrency, data science, digital map, disinformation, don't be evil, Donald Trump, Douglas Engelbart, Douglas Engelbart, Dr. Strangelove, drone strike, dual-use technology, Edward Snowden, El Camino Real, Electric Kool-Aid Acid Test, Elon Musk, end-to-end encryption, fake news, fault tolerance, gentrification, George Gilder, ghettoisation, global village, Google Chrome, Google Earth, Google Hangouts, Greyball, Hacker Conference 1984, Howard Zinn, hypertext link, IBM and the Holocaust, index card, Jacob Appelbaum, Jeff Bezos, jimmy wales, John Gilmore, John Markoff, John Perry Barlow, John von Neumann, Julian Assange, Kevin Kelly, Kickstarter, Laura Poitras, life extension, Lyft, machine readable, Mark Zuckerberg, market bubble, Menlo Park, military-industrial complex, Mitch Kapor, natural language processing, Neal Stephenson, Network effects, new economy, Norbert Wiener, off-the-grid, One Laptop per Child (OLPC), packet switching, PageRank, Paul Buchheit, peer-to-peer, Peter Thiel, Philip Mirowski, plutocrats, private military company, RAND corporation, Ronald Reagan, Ross Ulbricht, Satoshi Nakamoto, self-driving car, sentiment analysis, shareholder value, Sheryl Sandberg, side project, Silicon Valley, Silicon Valley startup, Skype, slashdot, Snapchat, Snow Crash, SoftBank, speech recognition, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Susan Wojcicki, Telecommunications Act of 1996, telepresence, telepresence robot, The Bell Curve by Richard Herrnstein and Charles Murray, The Hackers Conference, Tony Fadell, uber lyft, vertical integration, Whole Earth Catalog, Whole Earth Review, WikiLeaks

NRL Review was an in-house navy magazine that showcased all the cool gadgets cooked up by the lab over the previous year. D. M. Goldschlag, M. G. Reed, and P. F. Syverson, “Internet Communication Resistant to Traffic Analysis,” NRL Review, April 1997. 13. This last stage of development was funded by both the Office of Naval Research and DARPA under its Fault Tolerant Networks Program. The amount of the DARPA funding is unknown. “Onion Routing: Brief Selected History,” website formerly operated by the Center for High Assurance Computer Systems in the Information Technology Division of the US Naval Research Lab, 2005, accessed July 6, 2017, https://www.onion-router.net/History.html. 14.

pages: 448 words: 71,301

Programming Scala by Unknown

billion-dollar mistake, business logic, domain-specific language, duck typing, en.wikipedia.org, fault tolerance, functional programming, general-purpose programming language, higher-order functions, information security, loose coupling, type inference, web application

ScalaModules Scala DSL to ease OSGi development (http://code.google.com/p/scalamodules/). Configgy Managing configuration files and logging for “daemons” written in Scala (http://www.lag.net/configgy/). scouchdb Scala interface to CouchDB (http://code.google.com/p/scouchdb/). Akka A project to implement a platform for building fault-tolerant, distributed applications based on REST, Actors, etc. (http://akkasource.org/). scala-query A type-safe database query API for Scala (http://github.com/szeiger/scala-query/tree/master). We’ll discuss using Scala with several well-known Java libraries after we discuss Java interoperability, next. 368 | Chapter 14: Scala Tools, Libraries, and IDE Support Download at WoweBook.Com Java Interoperability Of all the alternative JVM languages, Scala’s interoperability with Java source code is among the most seamless.

pages: 458 words: 137,960

Ready Player One by Ernest Cline

Albert Einstein, call centre, dematerialisation, disinformation, escalation ladder, fault tolerance, financial independence, game design, late fees, Neal Stephenson, Pepsi Challenge, pre–internet, Rubik’s Cube, side project, telemarketer, walking around money, WarGames: Global Thermonuclear War

In addition to restricting the overall size of their virtual environments, earlier MMOs had been forced to limit their virtual populations, usually to a few thousand users per server. If too many people were logged in at the same time, the simulation would slow to a crawl and avatars would freeze in midstride as the system struggled to keep up. But the OASIS utilized a new kind of fault-tolerant server array that could draw additional processing power from every computer connected to it. At the time of its initial launch, the OASIS could handle up to five million simultaneous users, with no discernible latency and no chance of a system crash. A massive marketing campaign promoted the launch of the OASIS.

pages: 559 words: 130,949

Learn You a Haskell for Great Good!: A Beginner's Guide by Miran Lipovaca

fault tolerance, functional programming, higher-order functions, loose coupling, type inference

ghci> solveRPN "2.7 ln" 0.9932517730102834 ghci> solveRPN "10 10 10 10 sum 4 /" 10.0 ghci> solveRPN "10 10 10 10 10 sum 4 /" 12.5 ghci> solveRPN "10 2 ^" 100.0 I think that making a function that can calculate arbitrary floating-point RPN expressions and has the option to be easily extended in 10 lines is pretty awesome. Note This RPN calculation solution is not really fault tolerant. When given input that doesn’t make sense, it might result in a runtime error. But don’t worry, you’ll learn how to make this function more robust in Chapter 14. Heathrow to London Suppose that we’re on a business trip. Our plane has just landed in England, and we rent a car. We have a meeting really soon, and we need to get from Heathrow Airport to London as fast as we can (but safely!).

AI 2041 by Kai-Fu Lee, Chen Qiufan

3D printing, Abraham Maslow, active measures, airport security, Albert Einstein, AlphaGo, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, augmented reality, autonomous vehicles, basic income, bitcoin, blockchain, blue-collar work, Cambridge Analytica, carbon footprint, Charles Babbage, computer vision, contact tracing, coronavirus, corporate governance, corporate social responsibility, COVID-19, CRISPR, cryptocurrency, DALL-E, data science, deep learning, deepfake, DeepMind, delayed gratification, dematerialisation, digital map, digital rights, digital twin, Elon Musk, fake news, fault tolerance, future of work, Future Shock, game design, general purpose technology, global pandemic, Google Glasses, Google X / Alphabet X, GPT-3, happiness index / gross national happiness, hedonic treadmill, hiring and firing, Hyperloop, information security, Internet of things, iterative process, job automation, language acquisition, low earth orbit, Lyft, Maslow's hierarchy, mass immigration, mirror neurons, money: store of value / unit of account / medium of exchange, mutually assured destruction, natural language processing, Neil Armstrong, Nelson Mandela, OpenAI, optical character recognition, pattern recognition, plutocrats, post scarcity, profit motive, QR code, quantitative easing, Richard Feynman, ride hailing / ride sharing, robotic process automation, Satoshi Nakamoto, self-driving car, seminal paper, Silicon Valley, smart cities, smart contracts, smart transportation, Snapchat, social distancing, speech recognition, Stephen Hawking, synthetic biology, telemarketer, Tesla Model S, The future is already here, trolley problem, Turing test, uber lyft, universal basic income, warehouse automation, warehouse robotics, zero-sum game

The IBM researchers acknowledge that control of errors caused by decoherence will get much worse the more qubits are added. To deal with this challenge, complex and fragile equipment must be built with new technologies and precision engineering. Also, decoherence errors will require each logical qubit to be represented by many physical qubits to provide stability, error correction, and fault tolerance. It is estimated that a QC will likely need a million or more physical qubits in order to deliver the performance of a 4,000 logical qubit QC. And even when a useful quantum computer is successfully demonstrated, mass production is another matter. Finally, quantum computers are programmed completely differently from classical computers, so new algorithms will need to be invented, and new software tools will need to be built.

pages: 528 words: 146,459

Computer: A History of the Information Machine by Martin Campbell-Kelly, William Aspray, Nathan L. Ensmenger, Jeffrey R. Yost

Ada Lovelace, air freight, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Apple's 1984 Super Bowl advert, barriers to entry, Bill Gates: Altair 8800, Bletchley Park, borderless world, Buckminster Fuller, Build a better mousetrap, Byte Shop, card file, cashless society, Charles Babbage, cloud computing, combinatorial explosion, Compatible Time-Sharing System, computer age, Computer Lib, deskilling, don't be evil, Donald Davies, Douglas Engelbart, Douglas Engelbart, Dynabook, Edward Jenner, Evgeny Morozov, Fairchild Semiconductor, fault tolerance, Fellow of the Royal Society, financial independence, Frederick Winslow Taylor, game design, garden city movement, Gary Kildall, Grace Hopper, Herman Kahn, hockey-stick growth, Ian Bogost, industrial research laboratory, informal economy, interchangeable parts, invention of the wheel, Ivan Sutherland, Jacquard loom, Jeff Bezos, jimmy wales, John Markoff, John Perry Barlow, John von Neumann, Ken Thompson, Kickstarter, light touch regulation, linked data, machine readable, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Mitch Kapor, Multics, natural language processing, Network effects, New Journalism, Norbert Wiener, Occupy movement, optical character recognition, packet switching, PageRank, PalmPilot, pattern recognition, Pierre-Simon Laplace, pirate software, popular electronics, prediction markets, pre–internet, QWERTY keyboard, RAND corporation, Robert X Cringely, Salesforce, scientific management, Silicon Valley, Silicon Valley startup, Steve Jobs, Steven Levy, Stewart Brand, Ted Nelson, the market place, Turing machine, Twitter Arab Spring, Vannevar Bush, vertical integration, Von Neumann architecture, Whole Earth Catalog, William Shockley: the traitorous eight, women in the workforce, young professional

Although computer technology is at the heart of the Internet, its importance is economic and social: the Internet gives computer users the ability to communicate, to gain access to information sources, and to conduct business. I. From the World Brain to the World Wide Web The Internet sprang from a confluence of three desires, two that emerged in the 1960s and one that originated much further back in time. First, there was the rather utilitarian desire for an efficient, fault-tolerant networking technology, suitable for military communications, that would never break down. Second, there was a wish to unite the world’s computer networks into a single system. Just as the telephone would never have become the dominant person-to-person communications medium if users had been restricted to the network of their particular provider, so the world’s isolated computer networks would be far more useful if they were joined together.

pages: 523 words: 143,139

Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian, Tom Griffiths

4chan, Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, algorithmic bias, algorithmic trading, anthropic principle, asset allocation, autonomous vehicles, Bayesian statistics, behavioural economics, Berlin Wall, Big Tech, Bill Duvall, bitcoin, Boeing 747, Charles Babbage, cognitive load, Community Supported Agriculture, complexity theory, constrained optimization, cosmological principle, cryptocurrency, Danny Hillis, data science, David Heinemeier Hansson, David Sedaris, delayed gratification, dematerialisation, diversification, Donald Knuth, Donald Shoup, double helix, Dutch auction, Elon Musk, exponential backoff, fault tolerance, Fellow of the Royal Society, Firefox, first-price auction, Flash crash, Frederick Winslow Taylor, fulfillment center, Garrett Hardin, Geoffrey Hinton, George Akerlof, global supply chain, Google Chrome, heat death of the universe, Henri Poincaré, information retrieval, Internet Archive, Jeff Bezos, Johannes Kepler, John Nash: game theory, John von Neumann, Kickstarter, knapsack problem, Lao Tzu, Leonard Kleinrock, level 1 cache, linear programming, martingale, multi-armed bandit, Nash equilibrium, natural language processing, NP-complete, P = NP, packet switching, Pierre-Simon Laplace, power law, prediction markets, race to the bottom, RAND corporation, RFC: Request For Comment, Robert X Cringely, Sam Altman, scientific management, sealed-bid auction, second-price auction, self-driving car, Silicon Valley, Skype, sorting algorithm, spectrum auction, Stanford marshmallow experiment, Steve Jobs, stochastic process, Thomas Bayes, Thomas Malthus, Tragedy of the Commons, traveling salesman, Turing machine, urban planning, Vickrey auction, Vilfredo Pareto, Walter Mischel, Y Combinator, zero-sum game

In this algorithm, each item is compared to all the others, generating a tally of how many items it is bigger than. This number can then be used directly as the item’s rank. Since it compares all pairs, Comparison Counting Sort is a quadratic-time algorithm, like Bubble Sort. Thus it’s not a popular choice in traditional computer science applications, but it’s exceptionally fault-tolerant. This algorithm’s workings should sound familiar. Comparison Counting Sort operates exactly like a Round-Robin tournament. In other words, it strongly resembles a sports team’s regular season—playing every other team in the division and building up a win-loss record by which they are ranked.

pages: 470 words: 144,455

Secrets and Lies: Digital Security in a Networked World by Bruce Schneier

Ayatollah Khomeini, barriers to entry, Bletchley Park, business process, butterfly effect, cashless society, Columbine, defense in depth, double entry bookkeeping, drop ship, fault tolerance, game design, IFF: identification friend or foe, information security, John Gilmore, John von Neumann, knapsack problem, macro virus, Mary Meeker, MITM: man-in-the-middle, moral panic, Morris worm, Multics, multilevel marketing, mutually assured destruction, PalmPilot, pez dispenser, pirate software, profit motive, Richard Feynman, risk tolerance, Russell Brand, Silicon Valley, Simon Singh, slashdot, statistical model, Steve Ballmer, Steven Levy, systems thinking, the payments system, Timothy McVeigh, Y2K, Yogi Berra

These definitions have always struck me as being somewhat circular. We know intuitively what we mean by availability with respect to computers: We want the computer to work when we expect it to as we expect it to. Lots of software doesn’t work when and as we expect it to, and there are entire areas of computer science research in reliability and fault- tolerant computing and software quality ... none of which has anything to do with security. In the context of security, availability is about ensuring that an attacker can’t prevent legitimate users from having reasonable access to their systems. For example, availability is about ensuring that denial-of-service attacks are not possible.

pages: 590 words: 152,595

Army of None: Autonomous Weapons and the Future of War by Paul Scharre

"World Economic Forum" Davos, active measures, Air France Flight 447, air gap, algorithmic trading, AlphaGo, Apollo 13, artificial general intelligence, augmented reality, automated trading system, autonomous vehicles, basic income, Black Monday: stock market crash in 1987, brain emulation, Brian Krebs, cognitive bias, computer vision, cuban missile crisis, dark matter, DARPA: Urban Challenge, data science, deep learning, DeepMind, DevOps, Dr. Strangelove, drone strike, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, facts on the ground, fail fast, fault tolerance, Flash crash, Freestyle chess, friendly fire, Herman Kahn, IFF: identification friend or foe, ImageNet competition, information security, Internet of things, Jeff Hawkins, Johann Wolfgang von Goethe, John Markoff, Kevin Kelly, Korean Air Lines Flight 007, Loebner Prize, loose coupling, Mark Zuckerberg, military-industrial complex, moral hazard, move 37, mutually assured destruction, Nate Silver, Nick Bostrom, PalmPilot, paperclip maximiser, pattern recognition, Rodney Brooks, Rubik’s Cube, self-driving car, sensor fusion, South China Sea, speech recognition, Stanislav Petrov, Stephen Hawking, Steve Ballmer, Steve Wozniak, Strategic Defense Initiative, Stuxnet, superintelligent machines, Tesla Model S, The Signal and the Noise by Nate Silver, theory of mind, Turing test, Tyler Cowen, universal basic income, Valery Gerasimov, Wall-E, warehouse robotics, William Langewiesche, Y2K, zero day

Safety under these conditions requires something more than high-reliability organizations. It requires high-reliability fully autonomous complex machines, and there is no precedent for such systems. This would require a vastly different kind of machine from Aegis, one that was exceptionally predictable to the user but not to the enemy, and with a fault-tolerant design that defaulted to safe operations in the event of failures. Given the state of technology today, no one knows how to build a complex system that is 100 percent fail-safe. It is tempting to think that future systems will change this dynamic. The promise of “smarter” machines is seductive: they will be more advanced, more intelligent, and therefore able to account for more variables and avoid failures.

pages: 1,025 words: 150,187

ZeroMQ by Pieter Hintjens

AGPL, anti-pattern, behavioural economics, carbon footprint, cloud computing, Debian, distributed revision control, domain-specific language, eat what you kill, Eben Moglen, exponential backoff, factory automation, fail fast, fault tolerance, fear of failure, finite state, Internet of things, iterative process, no silver bullet, power law, premature optimization, profit motive, pull request, revision control, RFC: Request For Comment, Richard Stallman, Skype, smart transportation, software patent, Steve Jobs, Valgrind, WebSocket

Rob Gagnon’s Story “We use ØMQ to assist in aggregating thousands of events occurring every minute across our global network of telecommunications servers so that we can accurately report and monitor for situations that require our attention. ØMQ made the development of the system not only easier, but faster to develop and more robust and fault-tolerant than we had originally planned in our original design. “We’re able to easily add and remove clients from the network without the loss of any message. If we need to enhance the server portion of our system, we can stop and restart it as well, without having to worry about stopping all of the clients first.

pages: 598 words: 134,339

Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World by Bruce Schneier

23andMe, Airbnb, airport security, AltaVista, Anne Wojcicki, AOL-Time Warner, augmented reality, behavioural economics, Benjamin Mako Hill, Black Swan, Boris Johnson, Brewster Kahle, Brian Krebs, call centre, Cass Sunstein, Chelsea Manning, citizen journalism, Citizen Lab, cloud computing, congestion charging, data science, digital rights, disintermediation, drone strike, Eben Moglen, Edward Snowden, end-to-end encryption, Evgeny Morozov, experimental subject, failed state, fault tolerance, Ferguson, Missouri, Filter Bubble, Firefox, friendly fire, Google Chrome, Google Glasses, heat death of the universe, hindsight bias, informal economy, information security, Internet Archive, Internet of things, Jacob Appelbaum, James Bridle, Jaron Lanier, John Gilmore, John Markoff, Julian Assange, Kevin Kelly, Laura Poitras, license plate recognition, lifelogging, linked data, Lyft, Mark Zuckerberg, moral panic, Nash equilibrium, Nate Silver, national security letter, Network effects, Occupy movement, operational security, Panopticon Jeremy Bentham, payday loans, pre–internet, price discrimination, profit motive, race to the bottom, RAND corporation, real-name policy, recommendation engine, RFID, Ross Ulbricht, satellite internet, self-driving car, Shoshana Zuboff, Silicon Valley, Skype, smart cities, smart grid, Snapchat, social graph, software as a service, South China Sea, sparse data, stealth mode startup, Steven Levy, Stuxnet, TaskRabbit, technological determinism, telemarketer, Tim Cook: Apple, transaction costs, Uber and Lyft, uber lyft, undersea cable, unit 8200, urban planning, Wayback Machine, WikiLeaks, workplace surveillance , Yochai Benkler, yottabyte, zero day

If systemic imperfections are inevitable, we have to accept them—in laws, in government institutions, in corporations, in individuals, in society. We have to design systems that expect them and can work despite them. If something is going to fail or break, we need it to fail in a predictable way. That’s resilience. In systems design, resilience comes from a combination of elements: fault-tolerance, mitigation, redundancy, adaptability, recoverability, and survivability. It’s what we need in the complex and ever-changing threat landscape I’ve described in this book. I am advocating for several flavors of resilience for both our systems of surveillance and our systems that control surveillance: resilience to hardware and software failure, resilience to technological innovation, resilience to political change, and resilience to coercion.

pages: 496 words: 154,363

I'm Feeling Lucky: The Confessions of Google Employee Number 59 by Douglas Edwards

"World Economic Forum" Davos, Albert Einstein, AltaVista, Any sufficiently advanced technology is indistinguishable from magic, AOL-Time Warner, barriers to entry, book scanning, Build a better mousetrap, Burning Man, business intelligence, call centre, commoditize, crowdsourcing, don't be evil, Dutch auction, Elon Musk, fault tolerance, Googley, gravity well, invisible hand, Jeff Bezos, job-hopping, John Markoff, Kickstarter, machine translation, Marc Andreessen, Menlo Park, microcredit, music of the spheres, Network effects, PageRank, PalmPilot, performance metric, pets.com, Ralph Nader, risk tolerance, second-price auction, Sheryl Sandberg, side project, Silicon Valley, Silicon Valley startup, slashdot, stem cell, Superbowl ad, Susan Wojcicki, tech worker, The Turner Diaries, Y2K

"Build machines so cheap that we don't care if they fail. And if they fail, just ignore them until we get around to fixing them." That was Google's strategy, according to hardware designer Will Whitted, who joined the company in 2001. "That concept of using commodity parts and of being extremely fault tolerant, of writing the software in a way that the hardware didn't have to be very good, was just brilliant." But only if you could get the parts to fix the broken computers and keep adding new machines. Or if you could improve the machines' efficiency so you didn't need so many of them. The first batch of Google servers had been so hastily assembled that the solder points on the motherboards touched the metal of the trays beneath them, so the engineers added corkboard liners as insulation.

pages: 739 words: 174,990

The TypeScript Workshop: A Practical Guide to Confident, Effective TypeScript Programming by Ben Grynhaus, Jordan Hudgens, Rayon Hunte, Matthew Thomas Morgan, Wekoslav Stefanovski

Ada Lovelace, Albert Einstein, business logic, Charles Babbage, create, read, update, delete, don't repeat yourself, Donald Knuth, fault tolerance, Firefox, full stack developer, functional programming, Google Chrome, Hacker News, higher-order functions, inventory management, Kickstarter, loose coupling, node package manager, performance metric, QR code, Ruby on Rails, SQL injection, type inference, web application, WebSocket

; }; const secondaryFn = async () => { console.log('Aye aye!'); }; const asyncFn = async () => { try { await primaryFn(); } catch (e) { console.warn(e); secondaryFn(); } }; asyncFn(); In this case, we just throw a warning and fall back to the secondary system because this program was designed to be fault-tolerant. It's still a good idea to log the warning so that we can trace how our system is behaving. One more variation of this for now. Let's put our try and catch blocks at the top level and rewrite our program like this:export const errorFN = async () => { throw new Error('An error has occurred!')

pages: 552 words: 168,518

MacroWikinomics: Rebooting Business and the World by Don Tapscott, Anthony D. Williams

"World Economic Forum" Davos, accounting loophole / creative accounting, airport security, Andrew Keen, augmented reality, Ayatollah Khomeini, barriers to entry, Ben Horowitz, bioinformatics, blood diamond, Bretton Woods, business climate, business process, buy and hold, car-free, carbon footprint, carbon tax, Charles Lindbergh, citizen journalism, Clayton Christensen, clean water, Climategate, Climatic Research Unit, cloud computing, collaborative editing, collapse of Lehman Brothers, collateralized debt obligation, colonial rule, commoditize, corporate governance, corporate social responsibility, creative destruction, crowdsourcing, death of newspapers, demographic transition, digital capitalism, digital divide, disruptive innovation, distributed generation, do well by doing good, don't be evil, en.wikipedia.org, energy security, energy transition, Evgeny Morozov, Exxon Valdez, failed state, fault tolerance, financial innovation, Galaxy Zoo, game design, global village, Google Earth, Hans Rosling, hive mind, Home mortgage interest deduction, information asymmetry, interchangeable parts, Internet of things, invention of movable type, Isaac Newton, James Watt: steam engine, Jaron Lanier, jimmy wales, Joseph Schumpeter, Julian Assange, Kevin Kelly, Kickstarter, knowledge economy, knowledge worker, machine readable, Marc Andreessen, Marshall McLuhan, mass immigration, medical bankruptcy, megacity, military-industrial complex, mortgage tax deduction, Netflix Prize, new economy, Nicholas Carr, ocean acidification, off-the-grid, oil shock, old-boy network, online collectivism, open borders, open economy, pattern recognition, peer-to-peer lending, personalized medicine, radical decentralization, Ray Kurzweil, RFID, ride hailing / ride sharing, Ronald Reagan, Rubik’s Cube, scientific mainstream, shareholder value, Silicon Valley, Skype, smart grid, smart meter, social graph, social web, software patent, Steve Jobs, synthetic biology, systems thinking, text mining, the long tail, the scientific method, The Wisdom of Crowds, transaction costs, transfer pricing, University of East Anglia, urban sprawl, value at risk, WikiLeaks, X Prize, Yochai Benkler, young professional, Zipcar

To make it work, you’ll need to reveal your IP in an appropriate network, socializing it with participants and letting it spawn new knowledge and invention. You’ll need to stay plugged into the community so that you can leverage new contributions as they come in. You’ll also need to dedicate some resources to filtering and aggregating contributions. It can be a lot of work, but these types of collaborations can produce more robust, user-defined, fault-tolerant products in less time and for less expense than the conventional closed approach. 3. LET GO Leaders in business and society who are attempting to transform their organizations have many understandable concerns about moving forward. One of the biggest is a fear of losing control. I can’t open up, it’s too risky.

Turing's Cathedral by George Dyson

1919 Motor Transport Corps convoy, Abraham Wald, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anti-communist, Benoit Mandelbrot, Bletchley Park, British Empire, Brownian motion, cellular automata, Charles Babbage, cloud computing, computer age, Computing Machinery and Intelligence, Danny Hillis, dark matter, double helix, Dr. Strangelove, fault tolerance, Fellow of the Royal Society, finite state, Ford Model T, Georg Cantor, Henri Poincaré, Herman Kahn, housing crisis, IFF: identification friend or foe, indoor plumbing, Isaac Newton, Jacquard loom, John von Neumann, machine readable, mandelbrot fractal, Menlo Park, Murray Gell-Mann, Neal Stephenson, Norbert Wiener, Norman Macrae, packet switching, pattern recognition, Paul Erdős, Paul Samuelson, phenotype, planetary scale, RAND corporation, random walk, Richard Feynman, SETI@home, social graph, speech recognition, The Theory of the Leisure Class by Thorstein Veblen, Thorstein Veblen, Turing complete, Turing machine, Von Neumann architecture

“If the only demerit of the digital expansion system were its greater logical complexity, nature would not, for this reason alone, have rejected it,” von Neumann admitted in 1948.48 Search engines and social networks are analog computers of unprecedented scale. Information is being encoded (and operated upon) as continuous (and noise-tolerant) variables such as frequencies (of connection or occurrence) and the topology of what connects where, with location being increasingly defined by a fault-tolerant template rather than by an unforgiving numerical address. Pulse-frequency coding for the Internet is one way to describe the working architecture of a search engine, and PageRank for neurons is one way to describe the working architecture of the brain. These computational structures use digital components, but the analog computing being performed by the system as a whole exceeds the complexity of the digital code on which it runs.

pages: 604 words: 161,455

The Moral Animal: Evolutionary Psychology and Everyday Life by Robert Wright

agricultural Revolution, Andrei Shleifer, Apollo 13, Asian financial crisis, British Empire, centre right, cognitive dissonance, cotton gin, double entry bookkeeping, double helix, Easter island, fault tolerance, Francis Fukuyama: the end of history, Garrett Hardin, George Gilder, global village, Great Leap Forward, invention of gunpowder, invention of movable type, invention of the telegraph, invention of writing, invisible hand, John Nash: game theory, John von Neumann, Marshall McLuhan, Multics, Norbert Wiener, planetary scale, planned obsolescence, pre–internet, profit motive, Ralph Waldo Emerson, random walk, Richard Thaler, rising living standards, Robert Solow, Silicon Valley, social intelligence, social web, Steven Pinker, talking drums, technological determinism, the medium is the message, The Wealth of Nations by Adam Smith, trade route, Tragedy of the Commons, your tax dollars at work, zero-sum game

The iron horseshoe and the windpipe-friendly harness seem to have been invented in Asia and then to have leapt from person to person to person—maybe hitching a ride with nomads for a time—all the way to the Atlantic Ocean. One key to the resilience of this giant multicultural brain is its multiculturalness. No one culture is in charge, so no one culture controls the memes (though some try in vain). This decentralization makes epic social setbacks of reliably limited duration; the system is “fault-tolerant,” as computer engineers say. While Europe fell into its slough of despond, Byzantium and southern China stayed standing, India had ups and downs, and the newborn Islamic civilization flourished. These cultures performed two key services: inventing neat new things that would eventually spread into Europe (the spinning wheel probably arose somewhere in the Orient); and conserving useful old things that were now scarce in Europe (the astrolabe, a Greek invention, came to Europe via Islam, as did Ptolemy’s astronomy—which, though ultimately wrong, worked for navigational purposes).

pages: 666 words: 181,495

In the Plex: How Google Thinks, Works, and Shapes Our Lives by Steven Levy

"World Economic Forum" Davos, 23andMe, AltaVista, Andy Rubin, Anne Wojcicki, Apple's 1984 Super Bowl advert, autonomous vehicles, Bill Atkinson, book scanning, Brewster Kahle, Burning Man, business process, clean water, cloud computing, crowdsourcing, Dean Kamen, discounted cash flows, don't be evil, Donald Knuth, Douglas Engelbart, Douglas Engelbart, Dutch auction, El Camino Real, Evgeny Morozov, fault tolerance, Firefox, General Magic , Gerard Salton, Gerard Salton, Google bus, Google Chrome, Google Earth, Googley, high-speed rail, HyperCard, hypertext link, IBM and the Holocaust, informal economy, information retrieval, Internet Archive, Jeff Bezos, John Markoff, Ken Thompson, Kevin Kelly, Kickstarter, large language model, machine translation, Mark Zuckerberg, Menlo Park, one-China policy, optical character recognition, PageRank, PalmPilot, Paul Buchheit, Potemkin village, prediction markets, Project Xanadu, recommendation engine, risk tolerance, Rubik’s Cube, Sand Hill Road, Saturday Night Live, search inside the book, second-price auction, selection bias, Sheryl Sandberg, Silicon Valley, SimCity, skunkworks, Skype, slashdot, social graph, social software, social web, spectrum auction, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, subscription business, Susan Wojcicki, Ted Nelson, telemarketer, The future is already here, the long tail, trade route, traveling salesman, turn-by-turn navigation, undersea cable, Vannevar Bush, web application, WikiLeaks, Y Combinator

Google’s first CIO, Douglas Merrill, once noted that the disk drives Google purchased were “poorer quality than you would put into your kid’s computer at home.” But Google designed around the flaws. “We built capabilities into the software, the hardware, and the network—the way we hook them up, the load balancing, and so on—to build in redundancy, to make the system fault-tolerant,” says Reese. The Google File System, written by Jeff Dean and Sanjay Ghemawat, was invaluable in this process: it was designed to manage failure by “sharding” data, distributing it to multiple servers. If Google search called for certain information at one server and didn’t get a reply after a couple of milliseconds, there were two other Google servers that could fulfill the request.

pages: 778 words: 239,744

Gnomon by Nick Harkaway

"Margaret Hamilton" Apollo, Albert Einstein, back-to-the-land, banking crisis, behavioural economics, Burning Man, choice architecture, clean water, cognitive dissonance, false flag, fault tolerance, fear of failure, Future Shock, gravity well, Great Leap Forward, high net worth, impulse control, Isaac Newton, Khartoum Gordon, lifelogging, neurotypical, off-the-grid, pattern recognition, place-making, post-industrial society, Potemkin village, precautionary principle, Richard Feynman, Scramble for Africa, self-driving car, side project, Silicon Valley, skeuomorphism, skunkworks, the market place, trade route, Tragedy of the Commons, urban planning, urban sprawl

Effectively deployed bad practice under the System is a disaster. It would place the most absolute surveillance machine in history in the hands of villainous actors or mob instincts.’ ‘And you stop that from happening?’ ‘Oh no. Not us. The System itself, as designed by its original architects. Firespine is not a back door. It is a fault-tolerant architecture – a protocol of desperation. It adjusts where necessary, pushes people to vote when they are wise and not when they are foolish. It organises instants in time, perfect moments that unlock our better selves, serendipitous encounters to correct negative ones that make us less than we should be.

pages: 798 words: 240,182

The Transhumanist Reader by Max More, Natasha Vita-More

"World Economic Forum" Davos, 23andMe, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, augmented reality, Bill Joy: nanobots, bioinformatics, brain emulation, Buckminster Fuller, cellular automata, clean water, cloud computing, cognitive bias, cognitive dissonance, combinatorial explosion, Computing Machinery and Intelligence, conceptual framework, Conway's Game of Life, cosmological principle, data acquisition, discovery of DNA, Douglas Engelbart, Drosophila, en.wikipedia.org, endogenous growth, experimental subject, Extropian, fault tolerance, Flynn Effect, Francis Fukuyama: the end of history, Frank Gehry, friendly AI, Future Shock, game design, germ theory of disease, Hans Moravec, hypertext link, impulse control, index fund, John von Neumann, joint-stock company, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, Louis Pasteur, Menlo Park, meta-analysis, moral hazard, Network effects, Nick Bostrom, Norbert Wiener, pattern recognition, Pepto Bismol, phenotype, positional goods, power law, precautionary principle, prediction markets, presumed consent, Project Xanadu, public intellectual, radical life extension, Ray Kurzweil, reversible computing, RFID, Ronald Reagan, scientific worldview, silicon-based life, Singularitarianism, social intelligence, stem cell, stochastic process, superintelligent machines, supply-chain management, supply-chain management software, synthetic biology, systems thinking, technological determinism, technological singularity, Ted Nelson, telepresence, telepresence robot, telerobotics, the built environment, The Coming Technological Singularity, the scientific method, The Wisdom of Crowds, transaction costs, Turing machine, Turing test, Upton Sinclair, Vernor Vinge, Von Neumann architecture, VTOL, Whole Earth Review, women in the workforce, zero-sum game

The goal of substrate-independence is to continue personality, individual characteristics, a manner of experiencing, and a personal way of processing those experiences (Koene 2011a, 2011b). Your identity, your memories can then be embodied physically in many ways. They can also be backed up and operate robustly on fault-tolerant hardware with redundancy schemes. Achieving substrate-independence will allow us to optimize the operational framework, the hardware, to challenges posed by novel circumstances and different environments. Think, instead of sending extremophile bacteria to slowly terraform another world into a habitat, we ourselves can be extremophiles.

pages: 945 words: 292,893

Seveneves by Neal Stephenson

Apollo 13, Biosphere 2, clean water, Colonization of Mars, Danny Hillis, digital map, double helix, epigenetics, fault tolerance, Fellow of the Royal Society, Filipino sailors, gravity well, hydroponic farming, Isaac Newton, Jeff Bezos, kremlinology, Kuiper Belt, low earth orbit, machine readable, microbiome, military-industrial complex, Neal Stephenson, orbital mechanics / astrodynamics, phenotype, Potemkin village, pre–internet, random walk, remote working, selection bias, side project, Silicon Valley, Skype, Snow Crash, space junk, statistical model, Stewart Brand, supervolcano, tech billionaire, TED Talk, the scientific method, Tunguska event, VTOL, zero day, éminence grise

If the Cloud Ark survived, it would survive on a water-based economy. A hundred years from now everything in space would be cooled by circulating water systems. But for now they had to keep the ammonia-based equipment running as well. Further complications, as if any were wanted, came from the fact that the systems had to be fault tolerant. If one of them got bashed by a hurtling piece of moon shrapnel and began to leak, it needed to be isolated from the rest of the system before too much of the precious water, or ammonia, leaked into space. So, the system as a whole possessed vast hierarchies of check valves, crossover switches, and redundancies that had saturated even Ivy’s brain, normally an infinite sink for detail.

pages: 1,164 words: 309,327

Trading and Exchanges: Market Microstructure for Practitioners by Larry Harris

active measures, Andrei Shleifer, AOL-Time Warner, asset allocation, automated trading system, barriers to entry, Bernie Madoff, Bob Litterman, book value, business cycle, buttonwood tree, buy and hold, compound rate of return, computerized trading, corporate governance, correlation coefficient, data acquisition, diversified portfolio, equity risk premium, fault tolerance, financial engineering, financial innovation, financial intermediation, fixed income, floating exchange rates, High speed trading, index arbitrage, index fund, information asymmetry, information retrieval, information security, interest rate swap, invention of the telegraph, job automation, junk bonds, law of one price, London Interbank Offered Rate, Long Term Capital Management, margin call, market bubble, market clearing, market design, market fragmentation, market friction, market microstructure, money market fund, Myron Scholes, National best bid and offer, Nick Leeson, open economy, passive investing, pattern recognition, payment for order flow, Ponzi scheme, post-materialism, price discovery process, price discrimination, principal–agent problem, profit motive, proprietary trading, race to the bottom, random walk, Reminiscences of a Stock Operator, rent-seeking, risk free rate, risk tolerance, risk-adjusted returns, search costs, selection bias, shareholder value, short selling, short squeeze, Small Order Execution System, speech recognition, statistical arbitrage, statistical model, survivorship bias, the market place, transaction costs, two-sided market, vertical integration, winner-take-all economy, yield curve, zero-coupon bond, zero-sum game

They must eliminate all single points of failure. Since failures are inevitable, given current technologies, markets also must invest in systems that allow them to recover from service interruptions. Markets—as well as brokers and dealers—employ many of the following processes to create reliable trading systems: • They use fault-tolerant computer hardware. • They build redundant computer systems. • They build redundant network connections. * * * ▶ Some Examples of the Risks of Trading Through Unreliable Data Networks • A trader submits a limit order to an electronic market. After the order is accepted, but before it trades, the trader’s network connection fails.