DevOps

42 results


pages: 757 words: 193,541

The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2 by Thomas A. Limoncelli, Strata R. Chalup, Christina J. Hogan

active measures, Amazon Web Services, anti-pattern, barriers to entry, business process, cloud computing, commoditize, continuous integration, correlation coefficient, database schema, Debian, defense in depth, delayed gratification, DevOps, domain-specific language, en.wikipedia.org, fault tolerance, finite state, Firefox, functional programming, Google Glasses, information asymmetry, Infrastructure as a Service, intermodal, Internet of things, job automation, job satisfaction, Kickstarter, load shedding, longitudinal study, loose coupling, Malcom McLean invented shipping containers, Marc Andreessen, place-making, platform as a service, premature optimization, recommendation engine, revision control, risk tolerance, side project, Silicon Valley, software as a service, sorting algorithm, standardized shipping container, statistical model, Steven Levy, supply-chain management, The future is already here, Toyota Production System, web application, Yogi Berra

First printing, September 2014 Contents at a Glance Contents Preface About the Authors Introduction Part I Design: Building It Chapter 1 Designing in a Distributed World Chapter 2 Designing for Operations Chapter 3 Selecting a Service Platform Chapter 4 Application Architectures Chapter 5 Design Patterns for Scaling Chapter 6 Design Patterns for Resiliency Part II Operations: Running It Chapter 7 Operations in a Distributed World Chapter 8 DevOps Culture Chapter 9 Service Delivery: The Build Phase Chapter 10 Service Delivery: The Deployment Phase Chapter 11 Upgrading Live Services Chapter 12 Automation Chapter 13 Design Documents Chapter 14 Oncall Chapter 15 Disaster Preparedness Chapter 16 Monitoring Fundamentals Chapter 17 Monitoring Architecture and Practice Chapter 18 Capacity Planning Chapter 19 Creating KPIs Chapter 20 Operational Excellence Epilogue Part III Appendices Appendix A Assessments Appendix B The Origins and Future of Distributed Computing and Clouds Appendix C Scaling Terminology and Concepts Appendix D Templates and Examples Appendix E Recommended Reading Bibliography Index Contents Preface About the Authors Introduction Part I Design: Building It 1 Designing in a Distributed World 1.1 Visibility at Scale 1.2 The Importance of Simplicity 1.3 Composition 1.3.1 Load Balancer with Multiple Backend Replicas 1.3.2 Server with Multiple Backends 1.3.3 Server Tree 1.4 Distributed State 1.5 The CAP Principle 1.5.1 Consistency 1.5.2 Availability 1.5.3 Partition Tolerance 1.6 Loosely Coupled Systems 1.7 Speed 1.8 Summary Exercises 2 Designing for Operations 2.1 Operational Requirements 2.1.1 Configuration 2.1.2 Startup and Shutdown 2.1.3 Queue Draining 2.1.4 Software Upgrades 2.1.5 Backups and Restores 2.1.6 Redundancy 2.1.7 Replicated Databases 2.1.8 Hot Swaps 2.1.9 Toggles for Individual Features 2.1.10 Graceful Degradation 2.1.11 Access Controls and Rate Limits 2.1.12 Data Import Controls 2.1.13 Monitoring 2.1.14 Auditing 2.1.15 Debug Instrumentation 
2.1.16 Exception Collection 2.1.17 Documentation for Operations 2.2 Implementing Design for Operations 2.2.1 Build Features in from the Beginning 2.2.2 Request Features as They Are Identified 2.2.3 Write the Features Yourself 2.2.4 Work with a Third-Party Vendor 2.3 Improving the Model 2.4 Summary Exercises 3 Selecting a Service Platform 3.1 Level of Service Abstraction 3.1.1 Infrastructure as a Service 3.1.2 Platform as a Service 3.1.3 Software as a Service 3.2 Type of Machine 3.2.1 Physical Machines 3.2.2 Virtual Machines 3.2.3 Containers 3.3 Level of Resource Sharing 3.3.1 Compliance 3.3.2 Privacy 3.3.3 Cost 3.3.4 Control 3.4 Colocation 3.5 Selection Strategies 3.6 Summary Exercises 4 Application Architectures 4.1 Single-Machine Web Server 4.2 Three-Tier Web Service 4.2.1 Load Balancer Types 4.2.2 Load Balancing Methods 4.2.3 Load Balancing with Shared State 4.2.4 User Identity 4.2.5 Scaling 4.3 Four-Tier Web Service 4.3.1 Frontends 4.3.2 Application Servers 4.3.3 Configuration Options 4.4 Reverse Proxy Service 4.5 Cloud-Scale Service 4.5.1 Global Load Balancer 4.5.2 Global Load Balancing Methods 4.5.3 Global Load Balancing with User-Specific Data 4.5.4 Internal Backbone 4.6 Message Bus Architectures 4.6.1 Message Bus Designs 4.6.2 Message Bus Reliability 4.6.3 Example 1: Link-Shortening Site 4.6.4 Example 2: Employee Human Resources Data Updates 4.7 Service-Oriented Architecture 4.7.1 Flexibility 4.7.2 Support 4.7.3 Best Practices 4.8 Summary Exercises 5 Design Patterns for Scaling 5.1 General Strategy 5.1.1 Identify Bottlenecks 5.1.2 Reengineer Components 5.1.3 Measure Results 5.1.4 Be Proactive 5.2 Scaling Up 5.3 The AKF Scaling Cube 5.3.1 x: Horizontal Duplication 5.3.2 y: Functional or Service Splits 5.3.3 z: Lookup-Oriented Split 5.3.4 Combinations 5.4 Caching 5.4.1 Cache Effectiveness 5.4.2 Cache Placement 5.4.3 Cache Persistence 5.4.4 Cache Replacement Algorithms 5.4.5 Cache Entry Invalidation 5.4.6 Cache Size 5.5 Data Sharding 5.6 Threading 5.7 Queueing 
5.7.1 Benefits 5.7.2 Variations 5.8 Content Delivery Networks 5.9 Summary Exercises 6 Design Patterns for Resiliency 6.1 Software Resiliency Beats Hardware Reliability 6.2 Everything Malfunctions Eventually 6.2.1 MTBF in Distributed Systems 6.2.2 The Traditional Approach 6.2.3 The Distributed Computing Approach 6.3 Resiliency through Spare Capacity 6.3.1 How Much Spare Capacity 6.3.2 Load Sharing versus Hot Spares 6.4 Failure Domains 6.5 Software Failures 6.5.1 Software Crashes 6.5.2 Software Hangs 6.5.3 Query of Death 6.6 Physical Failures 6.6.1 Parts and Components 6.6.2 Machines 6.6.3 Load Balancers 6.6.4 Racks 6.6.5 Datacenters 6.7 Overload Failures 6.7.1 Traffic Surges 6.7.2 DoS and DDoS Attacks 6.7.3 Scraping Attacks 6.8 Human Error 6.9 Summary Exercises Part II Operations: Running It 7 Operations in a Distributed World 7.1 Distributed Systems Operations 7.1.1 SRE versus Traditional Enterprise IT 7.1.2 Change versus Stability 7.1.3 Defining SRE 7.1.4 Operations at Scale 7.2 Service Life Cycle 7.2.1 Service Launches 7.2.2 Service Decommissioning 7.3 Organizing Strategy for Operational Teams 7.3.1 Team Member Day Types 7.3.2 Other Strategies 7.4 Virtual Office 7.4.1 Communication Mechanisms 7.4.2 Communication Policies 7.5 Summary Exercises 8 DevOps Culture 8.1 What Is DevOps? 
8.1.1 The Traditional Approach 8.1.2 The DevOps Approach 8.2 The Three Ways of DevOps 8.2.1 The First Way: Workflow 8.2.2 The Second Way: Improve Feedback 8.2.3 The Third Way: Continual Experimentation and Learning 8.2.4 Small Batches Are Better 8.2.5 Adopting the Strategies 8.3 History of DevOps 8.3.1 Evolution 8.3.2 Site Reliability Engineering 8.4 DevOps Values and Principles 8.4.1 Relationships 8.4.2 Integration 8.4.3 Automation 8.4.4 Continuous Improvement 8.4.5 Common Nontechnical DevOps Practices 8.4.6 Common Technical DevOps Practices 8.4.7 Release Engineering DevOps Practices 8.5 Converting to DevOps 8.5.1 Getting Started 8.5.2 DevOps at the Business Level 8.6 Agile and Continuous Delivery 8.6.1 What Is Agile?

While traditionally change has been seen as a potential destabilizer, DevOps shows that infrastructure change can be done rapidly and frequently in a way that increases overall stability. DevOps is not a job title; you cannot hire a “DevOp.” It is not a product; you cannot purchase “DevOps software.” There are teams and organizations that exhibit DevOps culture and practices. Many of the practices are aided by one software package or another. But there is no box you can purchase, press the DevOps button, and magically “have” DevOps. Adam Jacob’s seminal “Choose Your Own Adventure” talk at Velocity 2010 (Jacob 2010) makes the case that DevOps is not a job description, but rather an inclusive movement that codifies a culture.

As a result, availability becomes the problem for the entire organization, not just for the system administrators. DevOps is not just about developers and system administrators. In his blog post “DevOps is not a technology problem. DevOps is a business problem,” Damon Edwards (2010) emphasizes that DevOps is about collaboration and optimization across the whole organization. DevOps expands to help the whole process, from idea to customer. It isn’t just about leveraging cool new tools. In fact, it’s not just about software. The organizational changes involved in creating a DevOps environment are best understood in contrast to the traditional software development approach. The DevOps approach evolved because of the drawbacks of such methods when developing custom web applications or cloud service offerings, and the need to meet the higher availability requirements of these environments.

8.1.1 The Traditional Approach

For software packages sold in shrink-wrapped packages at computer stores or downloaded over the Internet, the developer is finished when the software is complete and ships.


Seeking SRE: Conversations About Running Production Systems at Scale by David N. Blank-Edelman

Affordable Care Act / Obamacare, algorithmic trading, Amazon Web Services, backpropagation, bounce rate, business continuity plan, business process, cloud computing, cognitive bias, cognitive dissonance, commoditize, continuous integration, crowdsourcing, dark matter, database schema, Debian, defense in depth, DevOps, domain-specific language, en.wikipedia.org, fault tolerance, fear of failure, friendly fire, game design, Grace Hopper, information retrieval, Infrastructure as a Service, Internet of things, invisible hand, iterative process, Kubernetes, loose coupling, Lyft, Marc Andreessen, microaggression, microservices, minimum viable product, MVC pattern, performance metric, platform as a service, pull request, RAND corporation, remote working, Richard Feynman, risk tolerance, Ruby on Rails, search engine result page, self-driving car, sentiment analysis, Silicon Valley, single page application, Snapchat, software as a service, software is eating the world, source of truth, the scientific method, Toyota Production System, web application, WebSocket, zero day

When introducing toil limits, you will need to educate your people to socialize the concept of toil and to get them to understand why toil is destructive to both the individual and the organization. Leverage Existing Enthusiasm for DevOps Although DevOps was once the exclusive domain of web-scale startups, it has become an accepted ideal in most enterprises. Born in 2009, DevOps is a broad cultural and professional movement focused on “world-class quality, reliability, stability, and security at ever lower cost and effort; and accelerated flow and reliability throughout the technology value stream.”12 There is quite a bit of overlap between the goals of DevOps and SRE. There is also quite a bit of overlap between the theoretical underpinnings of DevOps and SRE.

Benjamin Treynor Sloss, the Google leader who first coined the term SRE and presided over the codification of Google’s SRE practices, sees a clear overlap between DevOps and SRE: One could view DevOps as a generalization of several core SRE principles to a wider range of organizations, management structures, and personnel. One could equivalently view SRE as a specific implementation of DevOps with some idiosyncratic extensions.13 Within the enterprise, DevOps has been applied most often to the limited scope that starts with software development and moves through the service delivery pipeline (from source code check-in to automated deployment). In these enterprises, the penetration of the DevOps transformation is minimal beyond deployment and the bulk of operations practices have remained unchanged.

There is overlap, where deployment/delivery to an operational site is a shared domain with SRE. There are also opposing ideals, where DevOps is integrated across the pipeline, SRE is only on the operational infrastructure, and would be considered a silo under strict DevOps philosophy. (Joaquin Menchaca, Senior DevOps engineer, NinjaPants Consulting)

◆ ◆ ◆

While many consider DevOps to be a single framework, it is in fact an umbrella for a pipeline of practices that span the organization’s value stream from concept to value creation. Most DevOps practices focus on the stages from development through deployment as Continuous Integration, Continuous Delivery and Continuous Deployment.


pages: 178 words: 33,275

Ansible Playbook Essentials by Gourav Shah

Amazon Web Services, cloud computing, Debian, DevOps, fault tolerance, web application

Our problem statement includes the following:

Create a devops user on all hosts. This user should be part of the devops group.
Install the "htop" utility. Htop is an improved version of top, an interactive system process monitor.
Add the Nginx repository to the web servers, install Nginx, and start it as a service.

Now, we will create our first playbook and save it as simple_playbook.yml containing the following code:

```yaml
---
- hosts: all
  remote_user: vagrant
  become: yes              # replaces the removed "sudo: yes" keyword
  tasks:
    - group:
        name: devops
        state: present
    - name: create devops user with admin privileges
      user:
        name: devops
        comment: "Devops User"
        uid: 2001
        group: devops
    - name: install htop package
      apt:
        name: htop
        state: present
        update_cache: yes

- hosts: www
  remote_user: vagrant
  become: yes
  tasks:
    - name: add official nginx repository
      apt_repository:
        repo: 'deb http://nginx.org/packages/ubuntu/ lucid nginx'
    - name: install nginx web server and ensure it's at the latest version
      apt:
        name: nginx
        state: latest
    - name: start nginx service
      service:
        name: nginx
        state: started
```

Our playbook contains two plays.
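The second play targets hosts: www, which only works if the inventory defines a www group. A minimal sketch of a YAML inventory that would satisfy both plays, assuming two hypothetical web servers named web1.example.com and web2.example.com (placeholders, not from the original) and the same vagrant login the playbook uses:

```yaml
# inventory.yml (hypothetical file name and hosts)
all:
  vars:
    ansible_user: vagrant   # same login the playbook uses
  children:
    www:                    # group targeted by the second play
      hosts:
        web1.example.com:
        web2.example.com:
```

It could then be run with ansible-playbook -i inventory.yml simple_playbook.yml. Because every host in www is also a member of all, the web servers receive the devops user and htop from the first play as well as Nginx from the second.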

Since we are only going to specify tasks, we just need one subdirectory inside the base role:

```shell
$ mkdir -p roles/base/tasks
```

Create the main.yml file inside roles/base/tasks to specify tasks for the base role. Edit the main.yml file and add the following code:

```yaml
---
# essential tasks. should run on all nodes
- name: creating devops group
  group: name=devops state=present
- name: create devops user
  user: name=devops comment="Devops User" uid=2001 group=devops
- name: install htop package
  apt: name=htop state=present update_cache=yes
```

Creating an Nginx role

We will now create a separate role for Nginx and move the previous code that we wrote in the simple_playbook.yml file to it, as follows:

Create the directory layout for the Nginx role:

```shell
$ mkdir roles/nginx
$ cd roles/nginx
$ mkdir tasks meta files
$ cd tasks
```

Create the install.yml file inside roles/nginx/tasks.
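The roles still need a playbook that applies them. A minimal sketch, assuming a top-level file named site.yml (a hypothetical name) placed next to the roles directory and reusing the host groups and vagrant login from simple_playbook.yml:

```yaml
---
# site.yml (hypothetical name): apply the base role to every host
# and the nginx role only to hosts in the www group.
- hosts: all
  remote_user: vagrant
  become: yes
  roles:
    - base

- hosts: www
  remote_user: vagrant
  become: yes
  roles:
    - nginx
```

Ansible runs a role's tasks/main.yml by default, so for the nginx role to do anything, roles/nginx/tasks/main.yml would need to pull in install.yml, for example with a single entry: - import_tasks: install.yml.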

Tasks are a sequence of actions performed against a group of hosts that match the pattern specified in a play. Each play typically contains multiple tasks that are run serially on each machine that matches the pattern. For example, take a look at the following code snippet:

```yaml
- group:
    name: devops
    state: present
- name: create devops user with admin privileges
  user:
    name: devops
    comment: "Devops User"
    uid: 2001
    group: devops
```

In the preceding example, we have two tasks. The first one creates a group, and the second creates a user and adds it to the group created earlier. Notice that the second task has an additional line, which starts with name:.


Terraform: Up and Running: Writing Infrastructure as Code by Yevgeniy Brikman

Amazon Web Services, cloud computing, DevOps, en.wikipedia.org, full stack developer, functional programming, general-purpose programming language, microservices, Ruby on Rails

It may still make sense to have a separate Dev team responsible for the application code and an Ops team responsible for the operational code, but it’s clear that Dev and Ops need to work more closely together. This is where the DevOps movement comes from. DevOps isn’t the name of a team or a job title or a particular technology. Instead, it’s a set of processes, ideas, and techniques. Everyone has a slightly different definition of DevOps, but for this book, I’m going to go with the following: The goal of DevOps is to make software delivery vastly more efficient. Instead of multi-day merge nightmares, you integrate code continuously and always keep it in a deployable state.

And instead of constant outages and downtime, you build resilient, self-healing systems, and use monitoring and alerting to catch problems that can’t be resolved automatically. The results from companies that have undergone DevOps transformations are astounding. For example, Nordstrom found that after applying DevOps practices to their organization, they were able to double the number of features they delivered per month, reduce defects by 50%, reduce lead times (the time from coming up with an idea to running code in production) by 60%, and reduce the number of production incidents by 60% to 90%. After HP’s LaserJet Firmware division began using DevOps practices, the amount of time their developers spent on developing new features went from 5% to 40% and overall development costs were reduced by 40%.

Etsy used DevOps practices to go from stressful, infrequent deployments that caused numerous outages to deploying 25-50 times per day.1 There are four core values in the DevOps movement: Culture, Automation, Measurement, and Sharing (sometimes abbreviated as the acronym CAMS).2 This book is not meant as a comprehensive overview of DevOps (check out ??? for recommended reading), so I will just focus on one of these values: automation. The goal is to automate as much of the software delivery process as possible. That means that you manage your infrastructure not by clicking around a webpage or manually executing shell commands, but through code.


pages: 313 words: 75,583

Ansible for DevOps: Server and Configuration Management for Humans by Jeff Geerling

AGPL, Amazon Web Services, cloud computing, continuous integration, database schema, Debian, defense in depth, DevOps, fault tolerance, Firefox, full text search, Google Chrome, inventory management, loose coupling, microservices, Minecraft, MITM: man-in-the-middle, Ruby on Rails, web application

Since we’re only going to run a simple example, we will create a playbook in Tower’s default projects directory located in /var/lib/awx/projects:

Log into the Tower VM: vagrant ssh
Switch to the awx user: sudo su - awx
Go to Tower’s default projects directory: cd /var/lib/awx/projects
Create a new project directory: mkdir ansible-for-devops && cd ansible-for-devops
Create a new playbook file, main.yml, within the new directory, with the following contents:

```yaml
---
- hosts: all
  gather_facts: no
  connection: local

  tasks:
    - name: Check the date on the server.
      command: date
```

Switch back to your web browser and get everything set up to run the test playbook inside Ansible Tower’s web UI:

Create a new Organization, called ‘Ansible for DevOps’.
Add a new User to the Organization, named John Doe, with the username johndoe and password johndoe1234.

Create a new Team, called ‘DevOps Engineers’, in the ‘Ansible for DevOps’ Organization. Under the Team’s Credentials section, add in SSH credentials by selecting ‘Machine’ for the Credential type, and setting ‘Name’ to Vagrant, ‘Type’ to Machine, ‘SSH Username’ to vagrant, and ‘SSH Password’ to vagrant. Under the Team’s Projects section, add a new Project. Set the ‘Name’ to Tower Test, ‘Organization’ to Ansible for DevOps, ‘SCM Type’ to Manual, and ‘Playbook Directory’ to ansible-for-devops (Tower automatically detects all folders placed inside /var/lib/awx/projects, but you could also use an alternate Project Base Path if you want to store projects elsewhere).

Removed ‘Variables’ chapter (variables will be covered in-depth elsewhere).
Added Appendix B - Ansible Best Practices and Conventions.
Started tagging code in Ansible for DevOps GitHub repository to match manuscript version (starting with this version, 0.50).
Fixed various layout issues.

Version 0.49 (2014-04-24)

Completed history of SSH in chapter 10.
Clarified definition of the word ‘DevOps’ in chapter 1.
Added section “Testing Ansible Playbooks” in chapter 14.
Added links to Ansible for DevOps GitHub repository in the introduction and chapter 4.

Version 0.47 (2014-04-13)

Added Apache Solr example in chapter 4.
Updated VM diagrams in chapter 4.


pages: 153 words: 45,721

Making Work Visible: Exposing Time Theft to Optimize Workflow by Dominica Degrandis, Tonianne Demaria

cloud computing, cognitive bias, DevOps, Elon Musk, en.wikipedia.org, informal economy, Jeff Bezos, loose coupling, microservices, Parkinson's law, sunk-cost fallacy, transaction costs


CONCLUSION: CALIBRATION

“Never let formal education get in the way of your learning.” (Mark Twain)

Mountain View, California, September 2011

Following the first-ever Kanban for DevOps class in Mountain View, California, a man sporting a kilt and long locks asked, “How do you integrate kanban with ticket systems without slowing down high-throughput Ops teams?” The man, whose name was Ben, wrote the question on a large, orange sticky note while standing at the back of the DevOps meetup room. Now, looking at the orange note, which I saved, I remember that I didn’t know how to answer the question at the time. It’s as valid a question today as it was in 2011.



pages: 355 words: 81,788

Monolith to Microservices: Evolutionary Patterns to Transform Your Monolith by Sam Newman

Airbnb, business process, continuous integration, database schema, DevOps, fault tolerance, ghettoisation, inventory management, Jeff Bezos, Kubernetes, loose coupling, microservices, MVC pattern, price anchoring, pull request, single page application, software as a service, source of truth, sunk-cost fallacy, telepresence

You can provide help and training, add new people to the team (perhaps by embedding people from the current operations team in delivery teams). No matter what change you want to bring about, just as with our software, you can make this happen in an incremental fashion. DevOps Doesn’t Mean NoOps! There is widespread confusion around DevOps, with some people assuming that it means that developers do all the operations, and that operations people are not needed. This is far from the case. Fundamentally, DevOps is a cultural movement based on the concept of breaking down barriers between development and operations. You may still want specialists in these roles, or you might not, but whatever you want to do, you want to promote common alignment and understanding across the people involved in delivering your software, no matter what their specific responsibilities are.

For more on this, I recommend Team Topologies,9 which explores DevOps organizational structures. Another excellent resource on this topic, albeit broader in scope, is The DevOps Handbook.10

Making a Change

So if you shouldn’t just copy someone else’s structure, where should you start? When working with organizations that are changing the role of delivery teams, I like to begin by explicitly listing all the activities and responsibilities that are involved in delivering software within that company.



Agile Project Management with Kanban (Developer Best Practices) by Eric Brechner

Amazon Web Services, cloud computing, continuous integration, crowdsourcing, DevOps, don't repeat yourself, en.wikipedia.org, index card, loose coupling, minimum viable product, pull request, software as a service

See also Williams, Laurie, and Robert Kessler. Pair Programming Illuminated. Reading, MA: Addison-Wesley, 2002.

DevOps

DevOps is the practice of developers collaborating closely with service operators to create and maintain software services together. When a service is being designed, service operators directly contribute. When there is a serious production issue, the developers who wrote the affected service are directly engaged. DevOps is often tied to testing in production (TIP) and continuous deployment. DevOps can be incorporated into Kanban as described earlier in the “Continuous deployment” section in Chapter 6.

For large projects (100+ people), the two roles are broken up, with a specialized PM, called a release manager, taking on the project-management responsibilities, and feature-team PMs acting mostly as analysts, but who are also responsible for reporting their team’s status to the release manager. PM is a tricky role at Microsoft—good PMs are highly valued. My current teams don’t have testers. Developers, automation, or partners validate improvements before they are used by larger customer audiences, basically running a DevOps model (see Chapter 9, “Further resources and beyond,” for details on DevOps). We still have a Validate step because that is real work that must be tracked for each item, even though many of the people performing that step might not be on our team. The more you use Kanban, the more you focus on the smooth flow of work rather than getting caught up in the people assigned.

In addition, the Validate step may be performed by developers instead of by testers. For a DevOps team, the example of the Validate done rule from the Kanban quick-start guide (Chapter 2) is particularly pertinent: “The work is deployed to production and tried by a significant subset of real customers. All issues found are resolved.” See also Humble, Jez, and David Farley. Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation. Upper Saddle River, NJ: Addison-Wesley, 2010. All of my current and past Xbox teams use DevOps. Many of my teams use TDD and refactoring, and some have used pair programming.


pages: 561 words: 157,589

WTF?: What's the Future and Why It's Up to Us by Tim O'Reilly

4chan, Affordable Care Act / Obamacare, Airbnb, Alvin Roth, Amazon Mechanical Turk, Amazon Web Services, artificial general intelligence, augmented reality, autonomous vehicles, barriers to entry, basic income, Bernie Madoff, Bernie Sanders, Bill Joy: nanobots, bitcoin, blockchain, Bretton Woods, Brewster Kahle, British Empire, business process, call centre, Capital in the Twenty-First Century by Thomas Piketty, Captain Sullenberger Hudson, Chuck Templeton: OpenTable:, Clayton Christensen, clean water, cloud computing, cognitive dissonance, collateralized debt obligation, commoditize, computer vision, corporate governance, corporate raider, creative destruction, crowdsourcing, Danny Hillis, data acquisition, deskilling, DevOps, disinformation, Donald Davies, Donald Trump, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Filter Bubble, Firefox, Flash crash, full employment, future of work, George Akerlof, gig economy, glass ceiling, Google Glasses, Gordon Gekko, gravity well, greed is good, Guido van Rossum, High speed trading, hiring and firing, Home mortgage interest deduction, Hyperloop, income inequality, independent contractor, index fund, informal economy, information asymmetry, Internet Archive, Internet of things, invention of movable type, invisible hand, iterative process, Jaron Lanier, Jeff Bezos, jitney, job automation, job satisfaction, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Kevin Kelly, Khan Academy, Kickstarter, Kim Stanley Robinson, knowledge worker, Kodak vs Instagram, Lao Tzu, Larry Wall, Lean Startup, Leonard Kleinrock, Lyft, Marc Andreessen, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, McMansion, microbiome, microservices, minimum viable product, mortgage tax deduction, move fast and break things, Network effects, new economy, Nicholas Carr, obamacare, Oculus Rift, packet switching, PageRank, pattern recognition, Paul Buchheit,
peer-to-peer, peer-to-peer model, Ponzi scheme, race to the bottom, Ralph Nader, randomized controlled trial, RFC: Request For Comment, Richard Feynman, Richard Stallman, ride hailing / ride sharing, Robert Gordon, Robert Metcalfe, Ronald Coase, Sam Altman, school choice, Second Machine Age, secular stagnation, self-driving car, SETI@home, shareholder value, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart contracts, Snapchat, Social Responsibility of Business Is to Increase Its Profits, social web, software as a service, software patent, spectrum auction, speech recognition, Stephen Hawking, Steve Ballmer, Steve Jobs, Steven Levy, Stewart Brand, strong AI, TaskRabbit, telepresence, the built environment, The future is already here, The Future of Employment, the map is not the territory, The Nature of the Firm, The Rise and Fall of American Growth, The Wealth of Nations by Adam Smith, Thomas Davenport, Tragedy of the Commons, transaction costs, transcontinental railway, transportation-network company, Travis Kalanick, trickle-down economics, Uber and Lyft, Uber for X, uber lyft, ubercab, universal basic income, US Airways Flight 1549, VA Linux, Watson beat the top human players on Jeopardy!, We are the 99%, web application, Whole Earth Catalog, winner-take-all economy, women in the workforce, Y Combinator, yellow journalism, zero-sum game, Zipcar

We organized a summit to host the leaders of the emerging field of web operations, and soon thereafter launched our Velocity Conference to host the growing number of professionals who worked behind the scenes to make Internet sites run faster and more effectively. The Velocity Conference brought together a community working on a new discipline that came to be called DevOps, a portmanteau word combining software development and operations. (The term was coined a few months after the first Velocity Conference by Patrick Debois and Andrew “Clay” Shafer, who ran a series of what they called “DevOps Days” in Belgium.) The primary insight of DevOps is that there were traditionally two separate groups responsible for the technical infrastructure of modern web applications: the developers who build the software, and the IT operations staff who manage the servers and network infrastructure on which it runs.

And those two groups typically didn’t talk to each other, leading to unforeseen problems once the software was actually deployed at scale. DevOps is a way of seeing the entire software life cycle as analogous to the lean manufacturing processes that Toyota pioneered. DevOps takes the software life cycle and workflow of an Internet application and turns it into the workflow of the organization, building in measurement, identifying key choke points, and clarifying the network of essential communication. In an appendix to The Phoenix Project, a novelized tutorial on DevOps created by Gene Kim, Kevin Behr, and George Spafford as homage to The Goal, the famous novel about the principles of lean manufacturing, Gene Kim notes that speed is one of the key competitive advantages that DevOps brings to an organization.

“Just as mass production changed the way products were assembled and continuous improvement changed how manufacturing was done,” he writes, “continuous experimentation . . . improve[s] the way we optimize business processes in our organizations.” But DevOps also brings higher reliability and better responsiveness to customers. Gene Kim characterizes what happens in a high-performance DevOps organization: “Instead of upstream Development groups causing chaos for those in the downstream work centers (e.g., QA, IT operations, and Infosec), Development is spending twenty percent of its time helping ensure that work flows smoothly through the entire value stream, speeding up automated tests, improving deployment infrastructure, and ensuring that all applications create useful production telemetry.”


pages: 1,409 words: 205,237

Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale by Jan Kunigk, Ian Buss, Paul Wilkinson, Lars George

Amazon Web Services, barriers to entry, bitcoin, business intelligence, business process, cloud computing, commoditize, computer vision, continuous integration, create, read, update, delete, database schema, Debian, DevOps, domain-specific language, fault tolerance, Firefox, functional programming, Google Chrome, Induced demand, Infrastructure as a Service, Internet of things, job automation, Kickstarter, Kubernetes, loose coupling, microservices, natural language processing, Network effects, platform as a service, source of truth, statistical model, web application

Note: When you run in the public cloud, as you will see in Chapter 16, you can also choose PaaS/SaaS solutions, which reduce the footprint of roles (as well as your flexibility and control over infrastructure components).
Do I Need DevOps?
In a sense, the rise of the term DevOps mirrors the rise of corporate distributed computing. In some definitions, DevOps signals a merging of the developer and operator roles into one, often skipping any compartmentalization as described earlier. Others simply define it as operators who automate as much as possible in code, which is certainly beneficial in the world of distributed systems.

skill profile, Big data engineer split responsibilities with other team roles, Split Responsibilities bigdata-interop project (Google), Hadoop integration binary releases of Hadoop, Installation Choices bind (LDAP), LDAP Authentication BitLocker, Encryption in Microsoft Azure bits per second (bps), translating to bytes per second (B/s), Measuring throughput blob storage (Azure), Azure storage options, Azure storage options, Blob storage; encryption in, Encryption in Microsoft Azure in Hadoop, Azure storage options integration with Hadoop, Azure Blob storage block blobs (Azure), Azure storage options block encryption key (BEK), Encryption in Microsoft Azure block I/O prioritization, Linux kernel cgroups and, Requirements for Multitenancy block locality, Erasure Coding Versus Replication; locality optimization, Locality optimization block reports, HDFS blocks, HDFS; block size, The Linux Page Cache block-level filesystems, Filesystems-Filesystems different meanings of, Sequential I/O performance mlocked by the DataNode, Short-Circuit and Zero-Copy Reads placement of, using replication, Erasure Coding Versus Replication replication of, Replication bring your own key (BYOK), Options for Encryption in the Cloud, BYOK, Recommendations and Summary for Cloud Encryption brokers (Kafka), Kafka bucket policies (Amazon S3), Amazon Simple Storage Service buckets; in Amazon S3, AWS storage options, Caveats and service limits in Google Cloud Storage (GCS), Storage options business continuity team, Policies and Objectives business intelligence project case study, Case Study: A Typical Business Intelligence Project-Do I Need a Center of Excellence/Competence?; center for excellence/competence in solution with Hadoop, Do I Need a Center of Excellence/Competence? DevOps and solution with Hadoop, Do I Need DevOps? new team setup for solution with Hadoop, New Team Setup solution overview with Hadoop, Solution Overview with Hadoop split responsibilities in solution with Hadoop, Split Responsibilities traditional solution approach, The Traditional Approach-The Traditional Approach typical team setup, Typical Team Setup-Systems engineer C C++, Apache Impala, Everything Is Java, Web UIs; Impala and Kudu implementations, The role of the x86 architecture server certificate verification, Application Integration X.509 certificate format, Converting Certificates cabling; cross-cabling racks, Network in stacked networks, Stacked network cabling considerations using high-speed cables in network Layer 1, Layer 1 Recommendations caches, Commodity Servers; cache coherence, Commodity Servers disk cache, Disk cache; enabled and disabled, throughput testing, Disk cache HDFS, implementation of, Important System Calls Kerberos tickets stored in, Kerberos Clients L3 cache size and core count, CPU Specifications Linux page cache, The Linux Page Cache Linux, access for filesystem I/O, User Space simulating a cache miss, Sequential I/O performance storage controller cache, Controller cache; guidelines for, Guidelines read-ahead caching, Read-ahead caching throughput testing, Disk cache write-back caching, Write-back caching caching, Hadoop and the Linux Storage Stack; enabling name service caching, OS Configuration for Hadoop HDFS, Short-Circuit and Zero-Copy Reads; cache administration commands, Short-Circuit and Zero-Copy Reads instructing Linux to minimize, The Linux Page Cache of Sentry roles and permissions by services, Impala Canonical Name (CNAME) records (DNS), DNS round robin CAP theorem, Quorum spanning with two datacenters catalog, Impala daemons catalog server, Catalog server categories (cable), Layer 1 Recommendations center of excellence or competence, Do I Need a Center of Excellence/Competence?

in the public cloud, security of, Assessing the Risk ingest and intercluster connectivity, Ingest and Intercluster Connectivity-Hardware; hardware, Hardware software, Software replacements and repair, Replacements and Repair; operational procedures, Operational Procedures space and racking constraints, Space and Racking Constraints typical pitfalls, Typical Pitfalls DataNode, HDFS, The Linux Page Cache; checksumming, opting out of, Short-Circuit and Zero-Copy Reads failure during consistency operations, Disk cache Datasets, Apache Spark dd tool, Disk cache, Single writes and reads; using to measure sequential I/O performance for disks, Sequential I/O performance-Sequential I/O performance Debian, OS Choices; obtaining sysbench, Validation approaches package management, Installation Process dedicated subnets, Layer 3 Recommendations dedicated switches, Layer 1 Recommendations deep learning, Data scientist default group, Hue delegation tokens, Delegation Tokens, Impersonation, Security, Deployment considerations, Temporary security credentials; persistent store for, in Hive high availability, Deployment considerations dependencies; high availability for Kafka dependencies, Deployment considerations in Hadoop stack, A Tour of the Landscape deployment, Hadoop Deployment-Summary; cloud deployment for Hadoop, Network Architecture deploying HBase for high availability, Deployment considerations deploying HDFS for high availability, Deployment recommendations deploying Hue for high availability, Deployment options deploying Impala for high availability, Deployment considerations deploying Kafka for high availability, Deployment considerations deploying KMS for high availability, Deployment considerations deploying Oozie for high efficiency, Deployment considerations deploying Solr for high availability, Deployment considerations deploying YARN for high availability, Deployment recommendations deploying ZooKeeper for high availability, Deployment considerations deployment risks in the cloud, Deployment Risks; mitigation, Mitigation Hadoop distribution architecture, Distribution Architecture-Distribution Architecture Hadoop distributions, Hadoop Distributions-Hadoop Distributions installation choices for Hadoop distributions, Installation Choices-Installation Choices installation process for Hadoop platform, Installation Process-Summary of long-lived clusters in the cloud, Configuration and Templating-Post-install tasks; one-click deployment, One-Click Deployments developer keys (GCP Cloud Storage), GCP Cloud Storage developers, Software developer development; data replication for software development, Replication for Software Development multiple clusters for software development, Multiple Clusters for Software Development; variations in cluster sizing, Variation in cluster sizing DevOps, Do I Need DevOps? digest scheme (ZooKeeper), ZooKeeper direct bind (LDAP), LDAP Authentication disaster recovery, Multiple Clusters for Resiliency, Data Replication, Alternative solutions (see also backups and disaster recovery) operational procedures for, in datacenters, Operational Procedures disaster tolerance, cluster spanning used for, Cluster Spanning disks; data encryption on, reasons for, At-Rest Encryption dedicated disks in ZooKeeper deployment, Deployment considerations disk and network tests with TeraGen, Disk and network tests disk layer storage, Disk Layer-Disk cache; characteristics of hard disk drive types, Disk Layer disk cache, Disk cache disk sizes, Disk sizes SAS, SATA, and Nearline SAS drives, SAS, Nearline SAS, or SATA (or SSDs)?


pages: 241 words: 43,073

Puppet 3 Beginner's Guide by John Arundel

cloud computing, Debian, DevOps, job automation, job satisfaction, Lao Tzu, Larry Wall, Network effects, SpamAssassin

Developers, who now often build applications, services, and businesses by themselves, couldn't do what they do without knowing how to set up and fix servers. The term "devops" has begun to be used to describe the growing overlap between these skill sets. It can mean sysadmins who happily turn their hand to writing code when needed, or developers who don't fear the command line, or it can simply mean the people for whom the distinction is no longer useful. Devops write code, herd servers, build apps, scale systems, analyze outages, and fix bugs. With the advent of CM systems, devs and ops are now all just people who work with code.

Introduction to Puppet The problem Configuration management A day in the life of a sysadmin Keeping the configuration synchronized Repeating changes across many servers Self-updating documentation Coping with different platforms Version control and history Solving the problem Reinventing the wheel A waste of effort Transferable skills Configuration management tools Infrastructure as code Dawn of the devop Job satisfaction The Puppet advantage Welcome aboard The Puppet way Growing your network Cloud scaling What is Puppet? The Puppet language Resources and attributes Summary Configuration management What Puppet does The Puppet advantage Scaling The Puppet language 2. First steps with Puppet What you'll need Time for action – preparing for Puppet Time for action – installing Puppet Your first manifest How it works Applying the manifest What just happened?

We can adopt the tools and techniques that regular programmers (who write code in Ruby or Java, for example) have used for years: powerful editing and refactoring tools, version control, tests, pair programming, and code reviews. This can make us more agile and flexible as system administrators, able to deal with fast-changing requirements and deliver things quickly to the business. We can also produce higher-quality, more reliable work.
Dawn of the devop
Some of the benefits are more subtle, organizational, and psychological. There is often a divide between "devs", who wrangle code, and "ops", who wrangle configuration. Traditionally, the skill sets of the two groups haven't overlapped much. It was common until recently for system administrators not to write complex programs, and for developers to have little or no experience in building and managing servers.


pages: 419 words: 102,488

Chaos Engineering: System Resiliency in Practice by Casey Rosenthal, Nora Jones

Amazon Web Services, Asilomar, autonomous vehicles, barriers to entry, blockchain, business continuity plan, business intelligence, business process, cloud computing, complexity theory, continuous integration, cyber-physical system, database schema, DevOps, fault tolerance, hindsight bias, Kubernetes, linear programming, loose coupling, microservices, MITM: man-in-the-middle, node package manager, pull request, ransomware, risk tolerance, Silicon Valley, six sigma, Skype, software as a service, statistical model, the scientific method, WebSocket

About the Author
John Allspaw has worked in software systems engineering and operations for over twenty years in many different environments. John’s publications include The Art of Capacity Planning and Web Operations (both O’Reilly) as well as the foreword to The DevOps Handbook (IT Revolution Press). His 2009 Velocity talk with Paul Hammond, “10+ Deploys Per Day: Dev and Ops Cooperation,” helped start the DevOps movement. John served as CTO at Etsy, and holds an MSc in Human Factors and Systems Safety from Lund University.
1 David D. Woods, STELLA: Report from the SNAFUcatchers Workshop on Coping with Complexity (Columbus, OH: The Ohio State University, 2017).
2 Principles of Chaos Engineering, retrieved June 10, 2019, from https://principlesofchaos.org.
3 Deploying a change in software with its execution path gated “on” only for some portion of users is known as a “dark” deploy.
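The “dark” deploy described in footnote 3 can be illustrated with a short Python sketch. It assumes users are bucketed by a stable hash of a user ID and that the gated fraction is a plain percentage; the function names and the `new-checkout` flag are hypothetical, and real feature-flag services typically add more (targeting rules, kill switches) on top of this idea.

```python
import hashlib


def in_rollout(user_id: str, flag_name: str, percent: float) -> bool:
    """Return True if user_id falls in the gated-on slice for flag_name.

    Hashing "flag_name:user_id" keeps each user's assignment stable across
    requests, while different flags get independent slices of users.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # percent * 100 buckets out of 10,000 are gated "on".
    return bucket < percent * 100


def handle_request(user_id: str) -> str:
    # Hypothetical flag: the new code is deployed to every server, but its
    # execution path only runs for about 5% of users.
    if in_rollout(user_id, "new-checkout", 5):
        return "new path"
    return "old path"
```

Because each user's bucket is fixed for a given flag, raising the percentage only adds users to the gated-on slice; nobody already in it drops out.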

Adoption can be split into four considerations: who bought into the idea, how much of the organization participates, the prerequisites, and the obstacles.
Who Bought into Chaos Engineering
Early in the adoption cycle, the individual contributors (ICs) who are closest to the repercussions of an outage or security incident are the most likely to adopt Chaos Engineering or seek out the discipline, for obvious reasons. This is often followed by internal championing, with advocacy often found in DevOps, SRE, and Incident Management teams. In more traditional organizations, this often falls to Operations or IT. These are the teams that understand the pressure of being paged for an availability incident. Of course, the urgency around getting the system back online sets up obstacles for learning.

Since Chaos Engineering can be applied to infrastructure, inter-application, security, and other levels of the sociotechnical system, there are many opportunities for the practice to spread to different parts of the organization. Ultimate adoption occurs when practicing Chaos Engineering is accepted as a responsibility for all individual contributors throughout the organization, at every level of the hierarchy, even if a centralized team provides tooling to make it easier. This is similar to how, in a DevOps-activated organization, every team is responsible for the operational properties of their software, even though centralized teams may specifically contribute to improving some of those properties.
Prerequisites
There are fewer prerequisites for Chaos Engineering than most people think. The first question to ask an organization considering adopting the practice is whether or not they know when they are in a degraded state.


pages: 282 words: 85,658

Ask Your Developer: How to Harness the Power of Software Developers and Win in the 21st Century by Jeff Lawson

Airbnb, AltaVista, Amazon Web Services, barriers to entry, big data - Walmart - Pop Tarts, big-box store, bitcoin, business process, call centre, Chuck Templeton: OpenTable:, cloud computing, coronavirus, Covid-19, COVID-19, create, read, update, delete, cryptocurrency, David Heinemeier Hansson, DevOps, Elon Musk, financial independence, global pandemic, global supply chain, Internet of things, Jeff Bezos, Lean Startup, loose coupling, Lyft, Marc Andreessen, Mark Zuckerberg, microservices, minimum viable product, Mitch Kapor, move fast and break things, move fast and break things, Paul Graham, peer-to-peer, ride hailing / ride sharing, risk tolerance, Ruby on Rails, side project, Silicon Valley, Silicon Valley startup, Skype, software as a service, software is eating the world, sorting algorithm, Startup school, Steve Ballmer, Steve Jobs, Telecommunications Act of 1996, Toyota Production System, transaction costs, transfer pricing, Uber and Lyft, uber lyft, ubercab, web application, Y Combinator

But beneath his easygoing manner there’s an intensity and discipline that he learned way back in Marine boot camp. This combination was crucial as we started to embrace a methodology called DevOps in building our developer platform. Even if you don’t work directly in technology you might have heard the term DevOps without really understanding what it is. A cynic might say DevOps has become kind of the “flavor of the month” for software development, the way Agile and Lean Startup did before. Amazon lists more than a thousand books on the topic. You could spend years learning everything about DevOps, but for our purposes I’m going to provide an extremely simplified explanation, which goes like this: Once upon a time, software development organizations broke the process of producing a piece of code into multiple roles.

At each step there could be delays as a developer waited for a test engineer or release engineer to finish other projects and then get to theirs. Multiply all those potential delays by the number of steps, and you can see how things could get bogged down. DevOps, first conceived about a decade ago, represents an attempt to speed things up by having one developer handle all of the steps. The concept is reflected in the name itself: instead of having “developers” who write code and “operators” who do everything else, you combine all of the duties in one person. In a DevOps environment, the same developer writes the code, tests the code, packages it, monitors it, and remains responsible for it after it goes into production.

Yet if every team has to become domain experts, and build their own automation for each of these categories, it would take forever. That’s where Jason’s team comes in. Jason defines his job, and that of the platform team—a group of about one hundred engineers across thirteen small teams—as “to provide software that will enable a traditional software developer to be successful in a DevOps culture without having a deep background in all of these specialized disciplines.” They don’t develop software that ships to customers. They make software that developers use to write, test, deploy, and monitor software. If anything about our process resembles an assembly line, this is probably the closest thing.


pages: 409 words: 112,055

The Fifth Domain: Defending Our Country, Our Companies, and Ourselves in the Age of Cyber Threats by Richard A. Clarke, Robert K. Knake

A Declaration of the Independence of Cyberspace, Affordable Care Act / Obamacare, Airbnb, Albert Einstein, Amazon Web Services, autonomous vehicles, barriers to entry, bitcoin, Black Swan, blockchain, borderless world, business cycle, business intelligence, call centre, Cass Sunstein, cloud computing, cognitive bias, commoditize, computer vision, corporate governance, cryptocurrency, data acquisition, DevOps, disinformation, don't be evil, Donald Trump, Edward Snowden, Exxon Valdez, global village, immigration reform, Infrastructure as a Service, Internet of things, Jeff Bezos, Julian Assange, Kubernetes, Mark Zuckerberg, Metcalfe’s law, MITM: man-in-the-middle, move fast and break things, move fast and break things, Network effects, open borders, platform as a service, Ponzi scheme, ransomware, Richard Thaler, Sand Hill Road, Schrödinger's Cat, self-driving car, shareholder value, Silicon Valley, Silicon Valley startup, Skype, smart cities, Snapchat, software as a service, Steven Levy, Stuxnet, technoutopianism, The future is already here, Tim Cook: Apple, undersea cable, WikiLeaks, Y2K, zero day

a concept borrowed from the military: For a thorough discussion of the OODA loop, see Daniel Ford, Vision So Noble: John Boyd, the OODA Loop, and America’s War on Terror (n.p.: CreateSpace Independent Publishing Platform, 2010). DevOps, short for “development and operations”: For a kind and gentle explanation of DevOps (in novel form) see Gene Kim, Kevin Behr, and George Spafford, The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win (Glenside, Penn.: IT Revolution Press, 2013). According to data from Spamhaus: The “Spamhaus Botnet Threat Report 2017” put Amazon at number two on its list, behind the French hosting provider OVH.

In short, these technologies are being designed with security built in. While Yu notes that since the 1980s, security and IT have been diverging and CISOs and CIOs are increasingly reporting to different leaders (and at one another’s throats), he sees trends such as DevOps, bring your own device, and the ever-present specter of shadow IT bringing them back together. DevOps, short for “development and operations,” shortens the software development life cycle by bringing the development team and the operations team in closer alignment so they can rapidly push out new versions of software. The fact that employees tend to prefer to carry around one device and not two has forced most companies to allow work to be done on personal devices.

But some companies are starting to embrace these trends despite the apprehension of their security teams. If carried out with security in mind, they are finding, doing so has security benefits. Yu argues that by embracing trends in technology rather than fighting against them, security can harness the speed of modern businesses as a weapon to be wielded against malicious cyber actors. With DevOps, companies may be releasing updated versions of their software dozens of times a day. That means that when bugs are discovered, they can be fixed immediately. It also means that bugs may be eliminated in rewrites before an attacker can identify and exploit them. The concept of chaos engineering pioneered at Netflix has corporations running a constant stream of experiments to test the resilience of their systems.


pages: 232 words: 71,237

Kill It With Fire: Manage Aging Computer Systems by Marianne Bellotti

anti-pattern, barriers to entry, cloud computing, cognitive bias, computer age, continuous integration, create, read, update, delete, Daniel Kahneman / Amos Tversky, database schema, DevOps, fault tolerance, fear of failure, Google Chrome, iterative process, loose coupling, microservices, minimum viable product, platform as a service, pull request, QWERTY keyboard, Richard Stallman, risk tolerance, Schrödinger's Cat, side project, software as a service, Steven Levy, web application, Y Combinator, Y2K

The problem with seeing Conway’s law as prescriptive is that technology is filled with little shifts in perception like this. The technology in our example has not fundamentally changed, but our groupings of what belongs with what have changed. We could tell the same story in reverse: what if we want to transition away from a traditional operations team to a DevOps model? Do our operations people now get moved to the product engineering teams? Do backend engineers learn the DevOps tools with operations acting as an oversight authority? Do we keep operations where it is and just ask them to automate?
Reorgs Are Traumatic
The reorg is the matching misused tool of the full rewrite. As the software engineer gravitates toward throwing everything out and starting over to project confidence and certainty, so too does the software engineer’s manager gravitate toward the reorg to fix all manner of institutional ills.

One of the reasons the DevOps and SRE movements have had such a beneficial effect on software development is that they seek to re-establish accountability. If product engineering teams play a role in running and maintaining their own infrastructure, they are the ones who feel the impact of their own decisions. When they build something that doesn’t scale, they are the ones who are awakened at 3 am with a page. Making software engineers responsible for the health of their infrastructure instead of a separate operations team unmutes the feedback loop. But anyone who has ever tried to run an SRE or DevOps team will tell you that maintaining the expectation that product engineering teams should be responsible for their infrastructure is easier said than done.

See failure drills chief information officer (CIO), 15 CloudFlare, 204 Code Yellow, 116–122, 156, 193 Collins Aerospace, 204 column width, 18 commercial cloud, 3, 15, 69, 86 Committee on Data Systems Languages (CODASYL), 29 compiler design, 71 complexity, 41, 46–50, 61, 103, 108, 137, 146, 173, 207 compliance, 90 configuration management, 65 containerization, 65 continuous integration, 72, 183 contract testing, 110 control flow graphs, 72 conventions, 106 Conway, Melvin, 140, 144 Conway’s law, 98, 140–141, 149–152, 156, 159 costs, 9 coupling, 46–50, 56, 64, 66, 85, 101, 103, 173 cross-compatibility, 64, 69 D databases, 36 data contracts, 102–110, 171 data flow graphs, 72 Deep Impact probe, the, 198 Dekker, Sidney, 145, 167 delays, 211 Department of Justice, 24 Department of Treasury, 15 Department of Veterans Affairs, 68 dependencies, 68, 111, 115 graphs, 71 management, 64 deprecations, 179 development environments, 72 development view, 173 DevOps, 150, 218 diagnosis-policies-actions, 184–187 drift, 145–146 E ECMA Office Open XML specification, 61 encoding, 20 enterprise architects, 77 enterprise service buses (ESB), 7–8 Etsy, 166 Excel, 61 F FAA (Federal Aviation Administration), 204 Facebook, 114 failovers, 55 failure drills, 114, 153, 172, 178 Falcone, Rino, 168 Feathers, Michael, 55 feature parity, 79 Federal Aviation Administration (FAA), 204 feedback loops, 210–211, 218–219 filetypes PDF, 67 fixed-point, 70 Flickr, 102 floating-point, 70 flows, 210 Fog Creek Software, 33 Ford, Neal, 105 formal methods, 109 Alloy, 110 Petri nets, 110 TLA+, 110 formal specification, 109–110 Fowler, Chad, 33 frameworks Angular.js, 150 Node.js, 36, 68 React.js, 36, 150 Vue.js, 150 G garbage collection, 44, 206 Gawker Media, 204 Glidden, Carlos, 19 GNU, 25 Google, 113, 117–118, 169, 205, 207 Chrome, 119 GPS, 202–204 Groupon, 102 H Hadoop, 15 hard cutoff, 57 hardware lifecycles, 196 Harvard Business Review, 140 Harvard’s Kennedy School for Government, 162 Hölzle, Urs, 119 hooks, 65 HTTPS (HyperText Transfer Protocol Secure), 114 human factors, 145 I IBM, 19, 140, 198 Simon, 5 incentives, 34, 122, 140–144, 148–156, 163–165 incident commander, 121 incident response, 109, 188 InsightMaker, 212 Instagram, 204 International Telegraph Alphabet No. 1.


pages: 395 words: 110,994

The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win by Gene Kim, Kevin Behr, George Spafford

air freight, anti-work, business intelligence, business process, centre right, cloud computing, continuous integration, dark matter, database schema, DevOps, friendly fire, index card, inventory management, Lean Startup, shareholder value, Toyota Production System

He says, “I want you to write a book, describing the Three Ways and how other people can replicate the transformation you’ve made here at Parts Unlimited. Call it The DevOps Cookbook and show how IT can regain the trust of the business and end decades of intertribal warfare. Can you do that for me?” Write a book? He can’t be serious. I reply, “I’m not a writer. I’ve never written a book before. In fact, I haven’t written anything longer than an e-mail in a decade.” Unamused, he says sternly, “Learn.” Shaking my head for a moment, I finally say, “Of course. It would be an honor and a privilege to write The DevOps Cookbook for you while I embark on what will probably be the most challenging three years of my entire career.”

With my drink in hand, I ponder how far we’ve come. During the Phoenix launch, I doubt anyone in this group could have imagined being part of a super-tribe that was bigger than just Dev or Ops or Security. There’s a term that we’re hearing more lately: something called “DevOps.” Maybe everyone attending this party is a form of DevOps, but I suspect it’s something much more than that. It’s Product Management, Development, IT Operations, and even Information Security all working together and supporting one another. Even Steve is a part of this super-tribe. In that moment, I let myself feel how incredibly proud I am of everyone in this room.

Old instincts kicking in, I urgently look around the room for Patty who is making a beeline toward me, her phone already in her hand. “First off, congratulations, boss,” she says, with a half smile on her face. “You want the bad news or the good news first?” Turning to her, I say with a sense of calm and inner peace, “What have we got, Patty?” To access more free resources on IT, DevOps, and helping your business win, visit: http://itrevolution.com/next Join us in spreading the word by leaving a review on Amazon or GoodReads, writing a blog post or telling a friend! Acknowledgements First and foremost, I want to acknowledge all the support from my loving wife, who put up with far more than I promised, Margueritte, and my sons, Reid, Parker, and Grant.


pages: 719 words: 181,090

Site Reliability Engineering: How Google Runs Production Systems by Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Richard Murphy

Air France Flight 447, anti-pattern, barriers to entry, business intelligence, business process, Checklist Manifesto, cloud computing, combinatorial explosion, continuous integration, correlation does not imply causation, crowdsourcing, database schema, defense in depth, DevOps, en.wikipedia.org, fault tolerance, Flash crash, George Santayana, Google Chrome, Google Earth, information asymmetry, job automation, job satisfaction, Kubernetes, linear programming, load shedding, loose coupling, meta-analysis, microservices, minimum viable product, MVC pattern, performance metric, platform as a service, revision control, risk tolerance, side project, six sigma, the scientific method, Toyota Production System, trickle-down economics, web application, zero day

For example, the decision to stop releases for the remainder of the quarter once an error budget is depleted might not be embraced by a product development team unless mandated by their management. DevOps or SRE? The term “DevOps” emerged in industry in late 2008 and as of this writing (early 2016) is still in a state of flux. Its core principles—involvement of the IT function in each phase of a system’s design and development, heavy reliance on automation versus human effort, the application of engineering practices and tools to operations tasks—are consistent with many of SRE’s principles and practices. One could view DevOps as a generalization of several core SRE principles to a wider range of organizations, management structures, and personnel.
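The error-budget policy mentioned above comes down to simple arithmetic. The sketch below is a hedged illustration, not Google's tooling. It assumes an availability SLO measured as the fraction of successful requests over a window. The function names, the 99.9% target, and the "freeze releases once the budget is spent" gate are assumptions chosen for the example.

```python
def error_budget(slo: float, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates over the measurement window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return (1.0 - slo) * total_requests


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo, total_requests)
    return (budget - failed_requests) / budget


def releases_allowed(slo: float, total_requests: int, failed_requests: int) -> bool:
    """Hypothetical release gate: stop releasing once the budget is exhausted."""
    return budget_remaining(slo, total_requests, failed_requests) > 0.0


# A 99.9% SLO over 1,000,000 requests tolerates about 1,000 failed requests.
print(round(error_budget(0.999, 1_000_000)))       # 1000
print(releases_allowed(0.999, 1_000_000, 400))     # True
print(releases_allowed(0.999, 1_000_000, 1_200))   # False
```

The same arithmetic works for a time-based SLO. For example, 0.1% of a 30-day window is 43.2 minutes of allowed unavailability.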

If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights. 978-1-491-92912-4 [LSI] Foreword Google’s story is a story of scaling up. It is one of the great success stories of the computing industry, marking a shift towards IT-centric business. Google was one of the first companies to define what business-IT alignment meant in practice, and went on to inform the concept of DevOps for a wider IT community. This book has been written by a broad cross-section of the very people who made that transition a reality. Google grew at a time when the traditional role of the system administrator was being transformed. It questioned system administration, as if to say: we can’t afford to hold tradition as an authority, we have to think anew, and we don’t have time to wait for everyone else to catch up.

., [Gla02] for more details. 2 For our purposes, reliability is “The probability that [a system] will perform a required function without failure under stated conditions for a stated period of time,” following the definition in [Oco12]. 3 The software systems we’re concerned with are largely websites and similar services; we do not discuss the reliability concerns that face software intended for nuclear power plants, aircraft, medical equipment, or other safety-critical systems. We do, however, compare our approaches with those used in other industries in Chapter 33. 4 In this, we are distinct from the industry term DevOps, because although we definitely regard infrastructure as code, we have reliability as our main focus. Additionally, we are strongly oriented toward removing the necessity for operations—see Chapter 7 for more details. 5 In addition to this great story, she also has a substantial claim to popularizing the term “software engineering.”


Mastering Ansible by Jesse Keating

cloud computing, Debian, DevOps, don't repeat yourself, microservices, remote working

He has worked on both start-ups and big established companies. His interests include SDN, NFV, Network Automation, DevOps, and Cloud technologies. He also likes to try out and follow open source projects in these areas. You can find him on his blog at https://sreeninet.wordpress.com/. Tim Rupp has been working in various fields of computing for the last 10 years. He has held positions in computer security, software engineering, and most recently, in the fields of Cloud computing and DevOps. He was first introduced to Ansible while at Rackspace. As part of the Cloud engineering team, he made extensive use of the tool to deploy new capacity for the Rackspace Public Cloud.

Sawant Shah is a passionate and experienced full-stack application developer with a formal degree in computer science. Being a software engineer, he has focused on developing web and mobile applications for the last 9 years. From building frontend interfaces and programming application backend as a developer to managing and automating service delivery as a DevOps engineer, he has worked at all stages of an application and project's lifecycle. He is currently spearheading the web and mobile projects division at the Express Media Group—one of the country's largest media houses. His previous experience includes leading teams and developing solutions at a software house, a BPO, a non-profit organization, and an Internet startup.


pages: 425 words: 112,220

The Messy Middle: Finding Your Way Through the Hardest and Most Crucial Part of Any Bold Venture by Scott Belsky

23andMe, 3D printing, Airbnb, Albert Einstein, Anne Wojcicki, augmented reality, autonomous vehicles, Ben Horowitz, bitcoin, blockchain, Chuck Templeton: OpenTable:, commoditize, correlation does not imply causation, cryptocurrency, delayed gratification, DevOps, Donald Trump, Elon Musk, endowment effect, hiring and firing, Inbox Zero, iterative process, Jeff Bezos, knowledge worker, Lean Startup, Lyft, Mark Zuckerberg, Marshall McLuhan, minimum viable product, move fast and break things, NetJets, Network effects, new economy, old-boy network, pattern recognition, Paul Graham, ride hailing / ride sharing, Silicon Valley, slashdot, Snapchat, Steve Jobs, subscription business, TaskRabbit, the medium is the message, Travis Kalanick, Uber for X, uber lyft, WeWork, Y Combinator, young professional

He had only a small amount of engineering experience before he joined us, but the team fell for him after his very first interview. Malcolm joined us as the third member of our dev-ops team, which is a group of engineers dedicated to the infrastructure, stability, and security of Behance’s platform. The dev-ops team is at the front line of every nightmare situation: spam problems, security breaches, latency in the speed of millions of portfolios loading for millions of visitors every day, and, when the site goes down, the dev-ops team diagnoses the problem and fixes it. Putting out fires all day—and trying to make the company flame retardant—is a stressful job, compounded by the constant battery of questions and concerns coming from all corners.

On paper, Malcolm wasn’t the perfect fit for the job based on his past experience. But his level of enthusiasm and willingness to take on any responsibility and master it helped him not only succeed but also elevate the dev-ops culture more broadly. Malcolm transformed the team and became a leader we all admired. Skills may be shared, but sheer initiative (and the energy and enthusiasm that comes along with it) helps the culture and spreads like wildfire—the good kind of fire. HOW DO YOU HIRE FOR INITIATIVE? Past initiative is the best indicator of future initiative.

This doesn’t make for a good politician, at least not in difficult moments. But over time, people come to respect candidness and directness. When you feel like a problem is being obfuscated by disclaimers, delicateness, or a lack of intellectual honesty, try to simplify it and compartmentalize the issues. One of my most frequent questions to our dev-ops team at Behance—the folks responsible for keeping our services up and running for millions of people to use every day—was “What’s keeping you up at night right now?” I was always trying to get beneath the surface of the progress we were making to unearth the real vulnerabilities. When you’re proposing a solution to a problem and meet resistance, take a step back to make sure everyone understands the problem first.


pages: 234 words: 57,267

Python Network Programming Cookbook by M. Omar Faruque Sarker

business intelligence, cloud computing, Debian, DevOps, Firefox, inflight wifi, RFID, web application

So, this book covers less theory, but it's packed with practical materials. This book is written with a "devops" mindset where a developer is also more or less in charge of operations, that is, deploying the application and managing various aspects of it, such as remote server administration, monitoring, scaling-up, and optimizing for better performance. This book introduces you to a bunch of open-source, third-party Python libraries, which are awesome to use in various use cases. I use many of these libraries on a daily basis to enjoy automating my devops tasks. For example, I use Fabric for automating software deployment tasks and other libraries for other purposes, such as searching for things on the Internet, screen-scraping, or sending an e-mail from a Python script.
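The excerpt only names the task of sending an e-mail from a Python script. As a hedged illustration that is not taken from the book, the standard library's `email.message` and `smtplib` modules are enough. The addresses, the SMTP host, and the port below are placeholder assumptions.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text e-mail message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to an SMTP relay (host and port are assumptions)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    alert = build_alert("ops@example.com", "oncall@example.com",
                        "Deployment finished", "Release deployed to all web nodes.")
    print(alert["Subject"])
    # send(alert)  # uncomment once an SMTP relay is reachable
```

This sketch keeps message construction separate from delivery, so the message can be built and inspected without a mail server.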

Faruque Sarker Reviewers Ahmed Soliman Farghal Vishrut Mehta Tom Stephens Deepak Thukral Acquisition Editors Aarthi Kumarswamy Owen Roberts Content Development Editor Arun Nadar Technical Editors Manan Badani Shashank Desai Copy Editors Janbal Dharmaraj Deepa Nambiar Karuna Narayanan Project Coordinator Sanchita Mandal Proofreaders Faye Coulman Paul Hindle Joanna McMahon Indexer Mehreen Deshmukh Production Coordinator Nilesh R. Mohite Cover Work Nilesh R. Mohite About the Author Dr. M. O. Faruque Sarker is a software architect and DevOps engineer who's currently working at University College London (UCL), United Kingdom. In recent years, he has been leading a number of Python software development projects, including the implementation of an interactive web-based scientific computing framework using the IPython Notebook service at UCL.


Learn Algorithmic Trading by Sebastien Donadio

active measures, algorithmic trading, automated trading system, backtesting, Bayesian statistics, buy and hold, buy low sell high, cryptocurrency, DevOps, en.wikipedia.org, fixed income, Flash crash, Guido van Rossum, latency arbitrage, locking in a profit, market fundamentalism, market microstructure, martingale, natural language processing, p-value, paper trading, performance metric, prediction markets, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, Sharpe ratio, short selling, sorting algorithm, statistical arbitrage, statistical model, stochastic process, survivorship bias, transaction costs, type inference, WebSocket, zero-sum game

Despite all of these precautions, software implementation bugs do slip into live trading markets, so we should always be aware and cautious because software is never perfect and the cost of mistakes/bugs is very high in the algorithmic trading business, and even higher in the HFT business. DevOps risk DevOps risk is the term that is used to describe the potential risk when algorithmic trading strategies are deployed to live markets. This involves building and deploying correct trading strategies; setting the configuration, the signal parameters, and the trading parameters; and starting, stopping, and monitoring them.
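One small part of managing DevOps risk is refusing to start a strategy whose parameters are obviously wrong. The sketch below is a hypothetical pre-deployment check, not the book's code. The `StrategyConfig` fields and the rules in `validate` are assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyConfig:
    """Hypothetical trading parameters for one strategy instance."""
    symbol: str
    max_position: int      # absolute position limit, in shares/contracts
    max_order_size: int    # largest single order allowed
    stop_loss: float       # loss (in account currency) that stops the strategy


def validate(cfg: StrategyConfig) -> list[str]:
    """Return a list of problems; an empty list means the config may be deployed."""
    problems = []
    if not cfg.symbol:
        problems.append("symbol is empty")
    if cfg.max_position <= 0:
        problems.append("max_position must be positive")
    if not 0 < cfg.max_order_size <= cfg.max_position:
        problems.append("max_order_size must be in (0, max_position]")
    if cfg.stop_loss <= 0:
        problems.append("stop_loss must be positive")
    return problems


good = StrategyConfig("XYZ", max_position=10, max_order_size=2, stop_loss=5_000.0)
bad = StrategyConfig("XYZ", max_position=10, max_order_size=50, stop_loss=5_000.0)
print(validate(good))  # []
print(validate(bad))   # ['max_order_size must be in (0, max_position]']
```

Returning every problem at once, instead of stopping at the first failure, lets the person deploying fix the whole configuration in one pass.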

Choice of IDE – PyCharm or Notebook Our first algorithmic trading (buy when the price is low, and sell when the price is high) Setting up your workspace PyCharm 101 Getting the data Preparing the data – signal Signal visualization Backtesting Summary Section 2: Trading Signal Generation and Strategies Deciphering the Markets with Technical Analysis Designing a trading strategy based on trend- and momentum-based indicators Support and resistance indicators Creating trading signals based on fundamental technical analysis Simple moving average Implementation of the simple moving average Exponential moving average Implementation of the exponential moving average Absolute price oscillator Implementation of the absolute price oscillator Moving average convergence divergence Implementation of the moving average convergence divergence Bollinger bands Implementation of Bollinger bands Relative strength indicator Implementation of the relative strength indicator Standard deviation Implementing standard deviation Momentum Implementation of momentum Implementing advanced concepts, such as seasonality, in trading instruments Summary Predicting the Markets with Basic Machine Learning Understanding the terminology and notations Exploring our financial dataset Creating predictive models using linear regression methods Ordinary Least Squares Regularization and shrinkage – LASSO and Ridge regression Decision tree regression Creating predictive models using linear classification methods K-nearest neighbors Support vector machine Logistic regression Summary Section 3: Algorithmic Trading Strategies Classical Trading Strategies Driven by Human Intuition Creating a trading strategy based on momentum and trend following Examples of momentum strategies Python implementation Dual moving average Naive trading strategy Turtle strategy Creating a trading strategy that works for markets with reversion behavior Examples of reversion strategies Creating trading strategies that operate on 
linearly correlated groups of trading instruments Summary Sophisticated Algorithmic Strategies Creating a trading strategy that adjusts for trading instrument volatility Adjusting for trading instrument volatility in technical indicators Adjusting for trading instrument volatility in trading strategies Volatility adjusted mean reversion trading strategies Mean reversion strategy using the absolute price oscillator trading signal Mean reversion strategy that dynamically adjusts for changing volatility Trend-following strategy using absolute price oscillator trading signal Trend-following strategy that dynamically adjusts for changing volatility Creating a trading strategy for economic events Economic releases Economic release format Electronic economic release services Economic releases in trading Understanding and implementing basic statistical arbitrage trading strategies Basics of StatArb Lead-lag in StatArb Adjusting portfolio composition and relationships Infrastructure expenses in StatArb StatArb trading strategy in Python StatArb data set Defining StatArb signal parameters Defining StatArb trading parameters Quantifying and computing StatArb trading signals StatArb execution logic StatArb signal and strategy performance analysis Summary Managing the Risk of Algorithmic Strategies Differentiating between the types of risk and risk factors Risk of trading losses Regulation violation risks Spoofing Quote stuffing Banging the close Sources of risk Software implementation risk DevOps risk Market risk Quantifying the risk The severity of risk violations Differentiating the measures of risk Stop-loss Max drawdown Position limits Position holding time Variance of PnLs Sharpe ratio Maximum executions per period Maximum trade size Volume limits Making a risk management algorithm Realistically adjusting risk Summary  Section 4: Building a Trading System Building a Trading System in Python Understanding the trading system Gateways Order book management Strategy Order 
management system  Critical components Non-critical components Command and control Services Building a trading system in Python LiquidityProvider class Strategy class OrderManager class MarketSimulator class TestTradingSimulation class Designing a limit order book Summary Connecting to Trading Exchanges Making a trading system trade with exchanges Reviewing the Communication API Network basics Trading protocols FIX communication protocols Price updates Orders Receiving price updates Initiator code example Price updates Sending orders and receiving a market response Acceptor code example Market Data request handling Order Other trading APIs Summary Creating a Backtester in Python Learning how to build a backtester  In-sample versus out-of-sample data Paper trading (forward testing) Naive data storage HDF5 file Databases Relational databases Non-relational databases Learning how to choose the correct assumptions For-loop backtest systems Advantages Disadvantages Event-driven backtest systems Advantages Disadvantages Evaluating what the value of time is Backtesting the dual-moving average trading strategy For-loop backtester Event-based backtester Summary Section 5: Challenges in Algorithmic Trading Adapting to Market Participants and Conditions Strategy performance in backtester versus live markets Impact of backtester dislocations Signal validation Strategy validation Risk estimates Risk management system Choice of strategies for deployment Expected performance Causes of simulation dislocations Slippage Fees Operational issues Market data issues Latency variance Place-in-line estimates Market impact Tweaking backtesting and strategies in response to live trading Historical market data accuracy Measuring and modeling latencies Improving backtesting sophistication Adjusting expected performance for backtester bias Analytics on live trading strategies Continued profitability in algorithmic trading Profit decay in algorithmic trading strategies Signal decay due to lack 
of optimization Signal decay due to absence of leading participants Signal discovery by other participants Profit decay due to exit of losing participants Profit decay due to discovery by other participants Profit decay due to changes in underlying assumptions/relationships Seasonal profit decay Adapting to market conditions and changing participants Building a trading signals dictionary/database Optimizing trading signals Optimizing prediction models Optimizing trading strategy parameters Researching new trading signals Expanding to new trading strategies Portfolio optimization Uniform risk allocation PnL-based risk allocation PnL-sharpe-based risk allocation Markowitz allocation Regime Predictive allocation Incorporating technological advances Summary Final words Other Books You May Enjoy Leave a review - let other readers know what you think Preface In modern times, it is increasingly difficult to gain a significant competitive edge just by being faster than others, which means relying on sophisticated trading signals, predictive models, and strategies.

Most modern trading firms trade markets electronically almost 23 hours a day, and they have a large number of staff whose only job is to keep an eye on the automated algorithmic trading strategies that are deployed to live markets to ensure they are behaving as expected and no erroneous behavior goes uninvestigated. They are known as the Trading Desk, or TradeOps or DevOps. These people have a decent understanding of software development, trading rules, and the risk monitoring interfaces provided by exchanges. Often, when software implementation bugs end up going to live markets, they are the final line of defense, and it is their job to monitor the systems, detect issues, safely pause or stop the algorithms, and contact and resolve the issues that have emerged.
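The monitor, pause, and stop duties described above can be sketched as a watchdog that escalates from running to paused to stopped when limits are breached. This is a hypothetical illustration, not the book's design. The drawdown and position thresholds, the state names, and the choice to keep a pause in place until a human resumes the strategy are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"    # no new orders; a human must resume
    STOPPED = "stopped"  # terminal; a real system would also flatten positions


@dataclass
class RiskMonitor:
    """Hypothetical watchdog for one strategy; limits are illustrative assumptions."""
    max_drawdown: float
    max_position: int
    state: State = State.RUNNING
    peak_pnl: float = 0.0
    log: list[str] = field(default_factory=list)

    def on_update(self, pnl: float, position: int) -> State:
        """Check the latest PnL and position, escalating the state on a breach."""
        if self.state is State.STOPPED:
            return self.state
        self.peak_pnl = max(self.peak_pnl, pnl)
        drawdown = self.peak_pnl - pnl
        if drawdown > self.max_drawdown:
            self.state = State.STOPPED
            self.log.append(f"stop: drawdown {drawdown:.2f} > {self.max_drawdown:.2f}")
        elif abs(position) > self.max_position:
            self.state = State.PAUSED
            self.log.append(f"pause: |position| {abs(position)} > {self.max_position}")
        # A PAUSED state is sticky: it never returns to RUNNING by itself.
        return self.state


mon = RiskMonitor(max_drawdown=1_000.0, max_position=100)
print(mon.on_update(pnl=500.0, position=40))   # State.RUNNING
print(mon.on_update(pnl=200.0, position=150))  # State.PAUSED (position limit)
print(mon.on_update(pnl=-600.0, position=0))   # State.STOPPED (drawdown 1100 from peak 500)
```

Because the watchdog never restarts trading on its own, the people on the desk stay the final line of defense, as the excerpt describes.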


pages: 234 words: 63,522

Puppet Essentials by Felix Frank

cloud computing, Debian, DevOps, domain-specific language, Infrastructure as a Service, platform as a service, web application

Thomas Dao has spent over two decades playing around with various Unix flavors as a Unix administrator, build and release engineer, and configuration manager. He is passionate about open source software and tools, so Puppet was something he naturally gravitated toward. Currently employed in the telecommunications industry as a configuration analyst, he also devotes some of his time to working as a technical editor at devops.ninja. I would like to thank my lovely wife, whose patience with me while I'm glued to my monitor gives me the inspiration to pursue my passions, and my dog, Bento, who is always by my side, keeping me company. Brian Moore is a senior product engineer, a father of two, and a quintessential hacker.

Table of Contents Preface Chapter 1: Writing Your First Manifests Getting started Introducing resources and properties Interpreting the output of the puppet apply command Dry-testing your manifest Adding control structures in manifests Using variables Variable types Controlling the order of evaluation Declaring dependencies Error propagation Avoiding circular dependencies Implementing resource interaction Examining the most notable resource types The user and group types The exec resource type The cron resource type The mount resource type Summary Chapter 2: The Master and Its Agents The Puppet master Setting up the master machine Creating the master manifest Inspecting the configuration settings Setting up the Puppet agent The agent's life cycle Renewing an agent's certificate Running the agent from cron Performance considerations Switching to Phusion Passenger Using Passenger with Nginx Basic tuning Troubleshooting SSL issues Summary Chapter 3: A Peek Under the Hood – Facts, Types, and Providers Summarizing systems with Facter Accessing and using fact values Extending Facter with custom facts Simplifying things using external facts Goals of Facter Understanding the type system The resource type's life cycle on the agent side Substantiating the model with providers Providerless resource types Summarizing types and providers Putting it all together Summary Chapter 4: Modularizing Manifests with Classes and Defined Types Introducing classes and defined types Defining and declaring classes Creating and using defined types Understanding and leveraging the differences Structured design patterns Writing comprehensive classes Writing component classes Using defined types as resource wrappers Using defined types as resource multiplexers Using defined types as macros Exploiting array values using defined types Including classes from defined types Nesting definitions in classes
Establishing relationships among containers Passing events between classes and defined types Ordering containers Limitations Performance implications of container relationships Mitigating the limitations The anchor pattern The contain function Making classes more flexible through parameters Caveats of parameterized classes Preferring the include keyword Summary Chapter 5: Extending Your Puppet Infrastructure with Modules An overview of Puppet's modules Parts of a module How the content of each module is structured Documentation in modules Maintaining environments Configuring environment locations Obtaining and installing modules Modules' best practices Putting everything in modules Avoiding generalization Testing your modules Safe testing with environments Building a specific module Naming your module Making your module available to Puppet Implementing the basic module functionality Creating utilities for derived manifests Adding configuration items Allowing customization Removing unwanted configuration items Dealing with complexity Enhancing the agent through plugins Replacing a defined type with a native type Enhancing Puppet's system knowledge through facts Refining the interface of your module through custom functions Making your module portable across platforms Finding helpful Forge modules Identifying modules' characteristics Summary Chapter 6: Leveraging the Full Toolset of the Language Templating dynamic configuration files Learning the template syntax Using templates in practice Avoiding performance bottlenecks from templates Creating virtual resources Realizing resources more flexibly using collectors Exporting resources to other agents Exporting and importing resources Configuring the master to store exported resources Exporting SSH host keys Managing hosts files locally Automating custom configuration items Simplifying the Nagios configuration Maintaining your central firewall Overriding resource parameters Making classes more flexible through inheritance Understanding class inheritance in Puppet Naming an inheriting class Making parameters safer through inheritance Saving redundancy using resource defaults Avoiding antipatterns Summary Chapter 7: Separating Data from Code Using Hiera Understanding the need for separate data storage Consequences of defining data in the manifest Structuring configuration data in a hierarchy Configuring Hiera Storing Hiera data Choosing your backends Retrieving and using Hiera values in manifests Working with simple values Binding class parameter values automatically Handling hashes and arrays Converting resources to data Choosing between manifest and Hiera designs Using Hiera in different contexts A practical example Debugging Hiera lookups Summary Chapter 8: Configuring Your Cloud Application with Puppet Typical scopes of Puppet Common data center use – roles and profiles Taking Puppet to the cloud Initializing agents in the cloud Using Puppet's cloud-provisioner module Building manifests for the cloud Mapping functionalities to nodes Choosing certificate names Creating a distributed catalog Composing arbitrary configuration files Handling instance deletions Preparing for autoscaling Managing certificates Limiting round trip times Ensuring successful provisioning Adding necessary relationships Testing the manifests Summary Index
Preface The software industry is changing and so are its related fields.
Old paradigms are slowly giving way to new roles and shifting views on what the different professions should bring to the table. The DevOps trend pervades ever more workflows. Developers set up and maintain their own environments, and operations raise automation to new levels and translate whole infrastructures to code. A steady stream of new technologies allows for more efficient organizational principles. One of these newcomers is Puppet.

What you have learned will most likely satisfy your immediate requirements. For information beyond these lessons, don't hesitate to look up the excellent online documentation at https://docs.puppetlabs.com/ or join the community and ask your questions on chat or in the mailing list. Thanks for reading, and have lots of fun with Puppet and its family of DevOps tools.
Index A agents initializing, in cloud 185 resources, exporting to 141 anchor pattern about 90 URL 91 antipatterns avoiding 154, 155 apt-get command 8 arrays 15 autorequire feature 125 autoscaling feature about 198 certificates, managing 198-200 round trip times, limiting 200-202 autosigning URL 200 autosigning script 198 B backends selecting 165 URL, for online documentation 165 beaker about 105 URL 105 before metaparameter 19, 21, 24 C classes about 66 component classes, writing 73, 74 comprehensive classes, writing 71, 72 creating, with parameters 92 declaring 66, 67 defining 66, 67 definitions, nesting 82 differentiating, with defined types 69, 70 include keyword, preferring 93 parameterized classes, consequences 92, 93 class inheritance 149 cloud agents, initializing in 185 manifests, building for 187 cloud-provisioner module using 186 collectors used, for realizing resources 140, 141 component classes writing 73, 74 composite design 71 comprehensive classes writing 71, 72 configuration data structuring, in hierarchy 161, 162 containers events, passing between classes and defined types 83-85 limitations 86-89 limitations, mitigating 90 ordering 86 relationships, establishing among 83 containers, limitations anchor pattern 90 contain function 91 control structures adding, in manifest 13, 14 creates parameter 28 cron resource type 29 custom attribute 191 custom facts about 53 Facter, extending with 53-55 custom functions about 96 used, for refining custom module interface 126-128 custom module building 105 enhancing, through facts 125 implementing 106-109 interface, refining through custom
functions 126-128 making, portable across platforms 128, 129 naming 106 using 106 utilities, creating for derived manifests 110 D data resources, converting to 172-174 data, defining in manifest consequences 159, 160 defined types about 66 creating 67-69 differentiating, with classes 69, 70 used, for exploiting array values 78-81 using 67-69 using, as macros 77, 78 using, as resource multiplexers 76 using, as resource wrappers 74, 75 dependency 20 documentation, modules 98, 99 domain-specific language (DSL) 8 dynamic configuration files templating 134 dynamic scoping 154 E enabled property 10 ensure property 10 environment.conf file 100 environment locations configuring 100, 101 environments maintaining 99, 100 modules, installing 101, 102 modules, obtaining 101, 102 used, for testing modules 104, 105 evaluation order circular dependencies, avoiding 21, 22 controlling 16 dependencies, declaring 17-20 error propagation 20 events about 23 passing, between classes and defined types 83-85 exec resource type 27 external facts using 55, 56 External Node Classifiers (ENCs) 174 F Faces 186 Facter example 62 extending, with custom facts 53-55 goals 57 systems, summarizing with 50, 51 facts URL, for documentation 125 used, for enhancing custom module 125 fact values accessing 52, 53 using 52, 53 flexibility, providing to classes about 148 class inheritance 149 inheriting class, naming 151 parameters, making safer through inheritance 151 Forge modules' characteristics, identifying 130 URL 130 used, for searching modules 130 fqdn_rand function 41 fully qualified domain name (FQDN) 52 G group resource type 26 H hashes 14 Hiera arrays, handling 170-172 class parameter values, binding 167-169 configuring 163 data, storing 164 hashes, handling 170-172 lookups, defining 179 practical example 177, 178 using, in different contexts 175, 176 values, retrieving 165 values, using in manifest 165 working with simple values 166, 167 hiera_array function 170
hiera_hash function 171 hierarchy configuration data, structuring in 161, 162 I immutability, variables 14 include keyword preferring 93 Infrastructure as a Service (IaaS) 184 Infrastructure as Code paradigm 105 inheriting class naming 151 installation, modules 101, 102 instances method 123 M manifest about 182 control structures, adding in 13, 14 dry-testing 12 structure 9 manifest, and Hiera designs selecting between 175 manifest, building for cloud about 187 arbitrary configuration files, composing 194-196 certificate names, selecting 190, 191 distributed catalog, creating 191-194 functionality, mapping to nodes 187-189 instance deletions, handling 197, 198 metaparameters 18 model substantiating, with providers 59, 60 modules about 96 agent, enhancing through plugins 116, 117 best practices 102 content structure 97, 98 documentation 98, 99 generalization, avoiding 103 identifying, in Forge 130 important parts 96 installing 101, 102 manifest files, gathering 102, 103 obtaining 101, 102 searching, in Forge 130 testing 104 testing, with environments 104, 105 URL, for publishing 98 monolithic implementation 71 mount resource type 29, 30 N Nginx about 45 Phusion Passenger, using with 45, 46 nodes file 100 Notice keyword 20 O operatingsystemrelease fact 53 output interpreting, of puppet apply command 11, 12 P parameterized classes consequences 92, 93 parameters versus properties 10 parser functions 96 performance bottlenecks avoiding, from templates 136 performance considerations about 42 basic tuning 46 Passenger, using with Nginx 45 switching, to Phusion Passenger 43, 44 Phusion Passenger switching to 43, 44 URL, for installation instructions 45 using, with Nginx 45, 46 Platform as a Service (PaaS) 184 plugins about 116 custom types, creating 118 custom types, naming 118 management commands, declaring 121 provider, adding 121 provider, allowing to prefetch existing
resources 123, 124 provider functionality, implementing 122, 123 resource names, using 120 resource type interface, creating 119 sensible parameter hooks, designing 120 types, making robust 125 used, for enhancing modules agent 116, 117 plugins, types custom facts 116 parser functions 116 providers 116 types 116 processorcount fact 52 properties about 10 versus parameters 10 providerless resource types 61 provider parameter 10 providers model, substantiating with 59, 60 summarizing 61 Puppet about 182 installing 8 modules 96 typical scopes 182 URL 182 Puppet agent certificate, renewing 40 life cycle 38, 39 running, from cron 41 setting up 35-37 puppet apply command about 9, 31 output, interpreting of 11, 12 PuppetBoard 186 Puppet Dashboard 186 Puppet Explorer 186 Puppet Labs URL 8 URL, for advanced approaches 43 URL, for core resource types 61 URL, for style guide 52 URL, for system installation information 32 URL, for Troubleshooting section 47 puppetlabs-strings module URL 99 Puppet master about 31 configuration settings, inspecting 35 master machine, setting up 32 master manifest, creating 33, 34 tasks 32 puppetmaster system service 33 puppet module install command 101 Puppet support, for SSL CSR attributes URL 199 [ 210 ] Puppet, taking to cloud about 184 agents, initializing 185 cloud-provisioner module, using 186 Puppet toolchain 46 rspec-puppet module about 105 URL 105 R separate data storage need for 158 singletons 135 site manifest 33 SSL troubleshooting 47, 48 stdlib module 101 strings 15 subscribe metaparameter 23 successful provisioning, ensuring about 202 manifests, testing 204, 205 necessary relationships, adding 203 systems summarizing, with Facter 50, 51 S realize function 138, 139 redundancy saving, resource defaults used 152, 153 relationships, containers performance implications 89 require metaparameter 19 resource chaining 17 resource defaults used, for saving redundancy 152, 153 resource interaction implementing 22-24 resource parameters 
overriding 147, 148 resources about 10 converting, to data 172-174 exporting 142 exporting, to agents 141 importing 142 realizing, collectors used 140, 141 resources, exporting about 141 central firewall, maintaining 146 custom configuration, automating 144 hosts files, managing 144 master configuration, for storing exported resources 142 Nagios configuration, simplifying 145, 146 SSH host keys, exporting 143 resource type life cycle, agent side 58, 59 resource types cron 29 examining 25, 26 exec 27, 28 group 26 mount 29, 30 user 26 revocation 39 Roles and Profiles pattern 183 T templates performance bottlenecks, avoiding from 136 using 135, 136 template syntax learning 134, 135 transaction 57 Trusted Facts 189 types about 117 summarizing 61 type system 57 typical scopes, Puppet about 182 profiles 183, 184 roles 183, 184 U user resource type 26 utilities, custom module complexity, dealing 115, 116 configuration items, adding 111, 112 creating, for derived manifests 110 customization, allowing 113 unwanted configuration items, removing 114, 115 W Warning keyword 20 V Y Vagrant 182 variables using 14 variable types about 14 arrays 15 hashes 14 strings 15 virtual resources creating 137, 138 yum command 8 Thank you for buying Puppet Essentials About Packt Publishing Packt, pronounced 'packed', published its first book "Mastering phpMyAdmin for Effective MySQL Management" in April 2004 and subsequently continued to specialize in publishing highly focused books on specific technologies and solutions.


pages: 1,380 words: 190,710

Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems by Heather Adkins, Betsy Beyer, Paul Blankinship, Ana Oprea, Piotr Lewandowski, Adam Stubblefield

anti-pattern, barriers to entry, bash_history, business continuity plan, business process, Cass Sunstein, cloud computing, continuous integration, correlation does not imply causation, create, read, update, delete, cryptocurrency, cyber-physical system, database schema, Debian, defense in depth, DevOps, Edward Snowden, fault tolerance, fear of failure, general-purpose programming language, Google Chrome, Internet of things, Kubernetes, load shedding, margin call, microservices, MITM: man-in-the-middle, performance metric, pull request, ransomware, revision control, Richard Thaler, risk tolerance, self-driving car, Skype, slashdot, software as a service, source of truth, Stuxnet, Turing test, undersea cable, uranium enrichment, Valgrind, web application, Y2K, zero day

For many years, my colleagues and I have argued that security should be a first-class and embedded quality of software. I believe that embracing an SRE-inspired approach is a logical step in that direction. Since arriving at Google, I’ve learned more about how the SRE model was established here, how SRE implements DevOps philosophies, and how SRE and DevOps have evolved. Meanwhile, I’ve been translating my IT security experience in the financial services industry to the technical and programmatic security capabilities at Google. These two sectors are not unrelated, but each has its own history worth understanding. At the same time, enterprises are at a critical point where cloud computing, various forms of machine learning, and a complicated cybersecurity landscape are together determining where an increasingly digital world is going, how quickly it will get there, and what risks are involved.

Ever since I began working in the tech industry, across organizations of varying sizes, I’ve seen people struggling with the question of how security should be organized: Should it be centralized or federated? Independent or embedded? Operational or consultative? Technical or governing? The list goes on…. When the SRE model, and SRE-like versions of DevOps, became popular, I noticed that the problem space SRE tackles exhibits similar dynamics to security problems. Some organizations have combined these two disciplines into an approach called “DevSecOps.” Both SRE and security have strong dependencies on classic software engineering teams. Yet both differ from classic software engineering teams in fundamental ways: Site Reliability Engineers (SREs) and security engineers tend to break and fix, as well as build.

In my previous roles, I looked for a more formal exploration of these questions; I hope that a variety of teams inside and outside of security organizations find this discussion useful as approaches and tools evolve. This project has reinforced my belief that the topics it covers are worth discussing and promoting in the industry, particularly as more organizations adopt DevOps, DevSecOps, SRE, and hybrid cloud architectures along with their associated operating models. At a minimum, this book is another step in the evolution and enhancement of system and data security in an increasingly digital world.

Royal Hansen, Vice President, Security Engineering

Foreword by Michael Wildpaner

At their core, both Site Reliability Engineering and Security Engineering are concerned with keeping a system usable.


pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst

algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application

Training is a prerequisite for understanding the paradigm shift that Big Data offers. Without that insider knowledge, it becomes difficult to explain and communicate the value of data, especially when the data are public in nature. Next on the list is the integration of development and operations teams (known as DevOps), the people most likely to deal with the burdens of storing and transforming the data into something usable. Much of the process of moving forward will lie with the business executives and decision makers, who will also need to be brought up to speed on the value of Big Data. The advantages must be explained in a fashion that makes sense to the business operations, which in turn means that IT pros are going to have to do some legwork.

See Business intelligence (BI) Big Data and Big Data analytics analysis categories application platforms best practices business case development challenges classifications components defined evolution of examples of 4Vs of goal setting introduction investment in path to phases of potential of privacy issues processing role of security (See Security) sources of storage team development technologies (See Technologies) value of visualizations Big Science BigSheets Bigtable Bioinformatics Biomedical industry Blekko Business analytics (BA) Business case best practices data collection and storage options elements of introduction Business intelligence (BI) as Big Data analytics foundation Big Data analytics team incorporation Big Data impact defined extract, transform, and load (ETL) information technology and in-memory processing limitations of marketing campaigns risk analysis storage capacity issues unstructured data visualizations Business leads Business logic Business objectives Business rules C Capacity of storage systems Cassandra Census data CERN Citi Classification of data Cleaning Click-stream data Cloud computing Cloudera Combs, Nick Commodity hardware Common Crawl Corpus Communication Competition Compliance Computer security officers (CSOs) Consulting firms Core capabilities, data analytics team Costs Counterintelligence mind-set CRUD (create, retrieve, update, delete) applications Cryptographic keys Culture, corporate Customer needs Cutting, Doug D Data defined growth in volume of value of See also Big Data and Big Data analytics Data analysis categories challenges complexity of as critical skill for team members data accuracy evolution of importance of process technologies Database design Data classification Data discovery Data extraction Data integration technologies value creation Data interpretation Data manipulation Data migration Data mining components as critical skill for team members defined examples methods technologies Data modeling Data 
protection. See Security Data retention Data scientists Data sources growth of identification of importation of data into platform public information Data visualization Data warehouses DevOps Discovery of data Disk cloning Disruptive technologies Distributed file systems. See also Hadoop Dynamo E e-commerce Economist e-discovery Education 80Legs Electronic medical records compliance data errors data extraction privacy issues trends Electronic transactions EMC Corporation Employees data analytics team membership monitoring of training Encryption Entertainment industry Entity extraction Entity relation extraction Errors Event-driven data distribution Evidence-based medicine Evolution of Big Data algorithms current issues future developments modern era origins of Expectations Expediency-accuracy tradeoff External data Extract, transform, and load (ETL) Extractiv F Facebook Filters Financial controllers Financial sector Financial transactions Flexibility of storage systems 4Vs of Big Data G Gartner General Electric (GE) Gephi Goal setting Google Google Books Ngrams Google Refine Governance Government agencies Grep H Hadoop advantages and disadvantages of design and function of event-processing framework future origins of vendor support Yahoo’s use HANA HBase HDFS Health care Big Data analytics opportunities Big Data trends compliance evolution of Big Data See also Electronic medical records Hibernate High-value opportunities History.


Learning Puppet 4: A Guide to Configuration Management and Automation by Jo Rhett

Amazon Web Services, Debian, DevOps, Golden Gate Park, pull request

Most important of all, this book will cover how to scale your Puppet installation to handle thousands of nodes. You’ll learn multiple strategies for handling diverse and heterogeneous environments, and the reasons why each of these approaches may or may not be appropriate for your needs.

Who this book is for

This book is primarily aimed at System Administrators and Operations or DevOps Engineers. If you are responsible for development or production nodes, this book will provide you with immediately useful tools to make your job easier than ever before. If you run a high-uptime production environment, you’re going to learn how Puppet can enforce your existing standards throughout the implementation.

I owe a drink and many thanks to the many people who provided input and feedback on the book during the writing process, including but definitely not limited to the technical reviewers:

• Chris Barbour, Taos Mountain

And finally, I’d like to thank my O’Reilly editor, Brian Anderson, who gave me excellent guidance on the book and was a pleasure to work with.

Jo Rhett, DevOps Architect, Net Consonance

Introduction

What is Puppet? Puppet brings computer systems into compliance with a policy you design. Puppet manages configuration data on these systems, including users, packages, processes, and services: any component of the system you can define.


pages: 274 words: 58,675

Puppet 3 Cookbook by John Arundel

Amazon Web Services, cloud computing, continuous integration, Debian, defense in depth, DevOps, don't repeat yourself, GnuPG, Larry Wall, place-making, Ruby on Rails, web application

Herman Carlos Nilton Araújo Corrêa Daniele Sluijters Dao Thomas Acquisition Editor Kartikey Pandey Lead Technical Editor Madhuja Chaudhari Technical Editors Anita Nayak Larissa Pinto Indexers Hemangini Bari Monica Ajmera Mehta Graphics Ronak Dhruv Production Coordinator Kyle Albuquerque Cover Work Kyle Albuquerque About the Author John Arundel is a devops consultant, which means he solves difficult problems for a living. (He doesn't get called in for easy problems.) He has worked in the tech industry for 20 years, and during that time has done wrong (or seen done wrong) almost everything that you can do wrong with computers. That comprehensive knowledge of what not to do, he feels, is one of his greatest assets as a consultant.

You'll find links and references to more information on every topic, so that you can explore further for yourself. Whatever your level of Puppet experience, there's something for you, from simple workflow tips to advanced, high-performance Puppet architectures. I've tried hard to write the kind of book that would be useful to me in my day-to-day work as a devops consultant. I hope it will inspire you to learn, to experiment, and to come up with your own new ideas in this exciting and fast-moving field. Preface What this book covers You'll find the following chapters in this book: Chapter 1, Puppet Infrastructure, shows how to set up Puppet for the first time, including instructions on installing Puppet, creating your first manifests, using version control with Puppet, building a distributed Puppet architecture based on Git, writing a script to apply Puppet manifests, running Puppet automatically, using Rake to bootstrap machines and deploy changes, and using Git hooks to automatically syntax-check your manifests.


pages: 56 words: 16,788

The New Kingmakers by Stephen O'Grady

AltaVista, Amazon Web Services, barriers to entry, cloud computing, correlation does not imply causation, crowdsourcing, David Heinemeier Hansson, DevOps, Jeff Bezos, Khan Academy, Kickstarter, Marc Andreessen, Mark Zuckerberg, Netflix Prize, Paul Graham, Ruby on Rails, Silicon Valley, Skype, software as a service, software is eating the world, Steve Ballmer, Steve Jobs, The future is already here, Tim Cook: Apple, Y Combinator

Internally, Netflix oriented its business around its developers. As cloud architect Adrian Cockcroft put it: The typical environment you have for developers is this image that they can write code that works on a perfect machine that will always work, and operations will figure out how to create this perfect machine for them. That’s the traditional dev-ops, developer versus operations contract. But then of course machines aren’t perfect and code isn’t perfect, so everything breaks and everyone complains to each other. So we got rid of the operations piece of that and just have the developers, so you can’t depend on everybody and you have to assume that all the other developers are writing broken code that isn’t properly deployed.


Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps by Valliappa Lakshmanan, Sara Robinson, Michael Munn

A Pattern Language, Airbnb, algorithmic trading, automated trading system, business intelligence, business process, combinatorial explosion, computer vision, continuous integration, Covid-19, COVID-19, DevOps, discrete time, en.wikipedia.org, iterative process, Kubernetes, microservices, mobile money, natural language processing, Netflix Prize, optical character recognition, pattern recognition, performance metric, recommendation engine, ride hailing / ride sharing, selection bias, self-driving car, sentiment analysis, speech recognition, statistical model, the payments system, web application

The full code can be found in our GitHub repository. We strongly encourage you to peruse the code as you read the pattern description. Machine Learning Terminology Because machine learning practitioners today may have different areas of primary expertise—software engineering, data analysis, DevOps, or statistics—there can be subtle differences in the way that different practitioners use certain terms. In this section, we define terminology that we use throughout the book. Models and Frameworks At its core, machine learning is a process of building models that learn from data. This is in contrast to traditional programming where we write explicit rules that tell programs how to behave.
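A toy illustration of that contrast (not from the book): `rule_based` is an explicit rule a programmer wrote by hand, while `learn_threshold` derives its cutoff from labeled examples. The one-feature threshold model and every function name here are illustrative assumptions; real models have many parameters and are fit with far more sophisticated training procedures.

```python
from typing import Sequence


def rule_based(x: float) -> bool:
    # Traditional programming: the programmer hard-codes the rule.
    return x > 5.0


def learn_threshold(xs: Sequence[float], ys: Sequence[bool]) -> float:
    # Toy "machine learning": the rule's parameter is chosen from data.
    # Each observed value is tried as a threshold; the one that misclassifies
    # the fewest training examples wins (the first one wins on ties).
    if not xs or len(xs) != len(ys):
        raise ValueError("need equally sized, non-empty inputs and labels")
    best_t, best_err = xs[0], len(xs) + 1
    for t in sorted(set(xs)):
        err = sum((x > t) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def predict(threshold: float, x: float) -> bool:
    # Apply the learned rule to a new input.
    return x > threshold
```

With inputs `[1, 2, 3, 7, 8, 9]` labeled `False` for the first three and `True` for the rest, the learned threshold is `3`, so an input of `4` is classified `True`, whereas the hand-written rule says `False`. The behavior came from the data, not from a rule someone typed in.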

Using a machine with more powerful GPUs, for example, typically helps to improve the performance of deep learning models. Choosing a machine with multiple accelerators and/or threads helps improve the number of requests per second. Using an autoscaling cluster of machines can help lower cost on spiky workloads. These kinds of tweaks are often done by the ML/DevOps team; some are ML-specific, some are not. Language-neutral Every modern programming language can speak REST, and a discovery service is provided to autogenerate the necessary HTTP stubs. Thus, Python clients can invoke the REST API as follows. Note that there is nothing framework specific in the code below.
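Here is a minimal sketch of such a client. It uses only Python's standard library rather than discovery-generated stubs. The endpoint URL, the bearer-token header, and the `{"instances": [...]}` request body are assumptions modeled on common prediction-service conventions, not details taken from the book. Nothing in it depends on the ML framework that produced the model.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_predict_request(endpoint_url: str, instances: List[Dict[str, Any]],
                          token: Optional[str] = None) -> urllib.request.Request:
    """Wrap input rows in a JSON body and return a ready-to-send POST request."""
    body = json.dumps({"instances": instances}).encode("utf-8")  # assumed payload shape
    headers = {"Content-Type": "application/json"}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"  # assumed auth scheme
    return urllib.request.Request(endpoint_url, data=body, headers=headers, method="POST")


def predict(endpoint_url: str, instances: List[Dict[str, Any]],
            token: Optional[str] = None, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response; plain HTTP, no ML libraries."""
    request = build_predict_request(endpoint_url, instances, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `predict("https://example.com/v1/models/demo:predict", [{"sqft": 1200}])` (a hypothetical endpoint and feature name) would POST one instance and return the decoded JSON reply. Building the request separately from sending it keeps the payload easy to inspect and test without a live server.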

Powerful ecosystem Because web application frameworks are so widely used, there is a lot of tooling available to measure, monitor, and manage web applications. If we deploy the ML model to a web application framework, the model can be monitored and throttled using tools that site reliability engineers (SREs), IT administrators, and DevOps personnel are familiar with. They do not have to know anything about machine learning. Similarly, your business development colleagues know how to meter and monetize web applications using API gateways. They can carry over that knowledge and apply it to metering and monetizing machine learning models.


pages: 761 words: 80,914

Ansible: Up and Running: Automating Configuration Management and Deployment the Easy Way by Lorin Hochstein

Amazon Web Services, cloud computing, continuous integration, Debian, DevOps, domain-specific language, don't repeat yourself, general-purpose programming language, Infrastructure as a Service, job automation, MITM: man-in-the-middle, pull request, side project, smart transportation, web application

I didn’t know what else to recommend, so I decided to write something to fill the gap — and here it is. Alas, this book comes too late for him, but I hope you’ll find it useful. Who Should Read This Book This book is for anyone who needs to deal with Linux or Unix-like servers. If you’ve ever used the terms systems administration, operations, deployment, configuration management, or (sigh) DevOps, then you should find some value here. Although I have managed my share of Linux servers, my background is in software engineering. This means that the examples in this book tend toward the deployment end of the spectrum, although I’m in agreement with Andrew Clay Shafer ([webops]) that the distinction between deployment and configuration is unresolved.

Declarative A type of programming language where the programmer describes the desired output, not the procedure for how to compute the output. Ansible’s playbooks are declarative. SQL is another example of a declarative language. Contrast with procedural languages, such as Java and Python. Deployment The process of bringing software up onto a live system. DevOps IT buzzword that gained popularity in the mid-2010s. Dry run See Check mode. DSL Domain-specific language. In systems that use DSLs, the user interacts with the system by writing text files in the domain-specific language and then runs those files through the system. DSLs are not as powerful as general-purpose programming languages, but (if designed well) they are easier to read and write than general-purpose programming languages.
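To make the declarative idea and check mode concrete, here is a toy sketch. It is not Ansible's implementation. The caller declares the desired state of some packages, and the code works out which changes, if any, are needed. Python is procedural, so the procedure lives inside `plan`/`converge`, and only the input is declarative. The package names, the `present`/`absent` vocabulary, and both helpers are hypothetical. A real tool would read the current state from the system rather than from an in-memory set.

```python
from typing import Dict, List, Set, Tuple

Action = Tuple[str, str]  # (verb, package name)


def plan(desired: Dict[str, str], installed: Set[str]) -> List[Action]:
    """Diff the declared state against the current state without changing anything."""
    actions: List[Action] = []
    for pkg, state in sorted(desired.items()):
        if state not in ("present", "absent"):
            raise ValueError(f"unknown state {state!r} for {pkg}")
        if state == "present" and pkg not in installed:
            actions.append(("install", pkg))
        elif state == "absent" and pkg in installed:
            actions.append(("remove", pkg))
    return actions


def converge(desired: Dict[str, str], installed: Set[str],
             check_mode: bool = False) -> List[Action]:
    """Bring `installed` in line with `desired`; in check mode, only report."""
    actions = plan(desired, installed)
    if not check_mode:
        for verb, pkg in actions:
            if verb == "install":
                installed.add(pkg)
            else:
                installed.discard(pkg)
    return actions
```

With `installed = {"telnet"}` and `desired = {"nginx": "present", "telnet": "absent"}`, a check-mode run reports an install of `nginx` and a removal of `telnet` but changes nothing. A normal run applies both changes, and a second normal run returns an empty list. That last property, idempotence, is what lets the same declaration be applied repeatedly and safely.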


pages: 309 words: 81,975

Brave New Work: Are You Ready to Reinvent Your Organization? by Aaron Dignan

"side hustle", activist fund / activist shareholder / activist investor, Airbnb, Albert Einstein, autonomous vehicles, basic income, Bertrand Russell: In Praise of Idleness, bitcoin, Black Swan, blockchain, Buckminster Fuller, Burning Man, butterfly effect, cashless society, Clayton Christensen, clean water, cognitive bias, cognitive dissonance, corporate governance, corporate social responsibility, correlation does not imply causation, creative destruction, crony capitalism, crowdsourcing, cryptocurrency, David Heinemeier Hansson, deliberate practice, DevOps, disruptive innovation, don't be evil, Elon Musk, endowment effect, Ethereum, ethereum blockchain, Frederick Winslow Taylor, future of work, gender pay gap, Geoffrey West, Santa Fe Institute, gig economy, Google X / Alphabet X, hiring and firing, hive mind, impact investing, income inequality, information asymmetry, Internet of things, Jeff Bezos, job satisfaction, Kevin Kelly, Kickstarter, Lean Startup, loose coupling, loss aversion, Lyft, Marc Andreessen, Mark Zuckerberg, minimum viable product, new economy, Paul Graham, race to the bottom, remote working, Richard Thaler, shareholder value, Silicon Valley, six sigma, smart contracts, Social Responsibility of Business Is to Increase Its Profits, software is eating the world, source of truth, Stanford marshmallow experiment, Steve Jobs, TaskRabbit, The future is already here, the High Line, too big to fail, Toyota Production System, Tragedy of the Commons, uber lyft, universal basic income, WeWork, Y Combinator, zero-sum game

We need to ensure that we’ve thought about, and made space for, evolution at all levels. Thought Starters Innovation Everywhere. Because of our obsession with efficiency, most organizations like to keep innovation and operations separate. They build things and they run things. But within the last decade, DevOps has emerged as a software engineering culture and practice that has completely upended that notion. As teams began releasing software faster and more frequently, the interactions between development, quality assurance, and operations came under strain. Developers want change. Testers want risk reduction. And operators want stability.

(Catmull), 191 criticality, 193, 216–18 Crunchbase, 253 cultural differences, 258 culture, company, 180–81, 190 hiring and, 142–43 Dalio, Ray, 152, 153–54 David, Joshua, 188 de Blok, Jos, 34–36, 105, 144 debt, 27 organizational, 27–29, 91 decentralized autonomous organizations (DAOs), 250–51 Deci, Edward, 41–42 decision making, 67, 121, 132, 152 advice process in, 70, 72–73 consent in, 70–73, 195 decision stack in, 72–73 discipline about, 69 emotion in, 174–75 information and, 136 Integrative Decision Making, 71–73 waterline principle and, 69–70, 72 defaults vs. standards, 106–7 degradation, graceful, 29–34 Deming, W. Edwards, 53, 87, 165 DeSteno, David, 236 DevOps, 104 Doerr, John, 87 Donovan, William J., 6–7 Dunbar’s number, 197 Dweck, Carol, 154 dynamic networks, 77–78 dynamic teaming, 81 economy, economics, 27, 246–47, 248 Edmondson, Amy, 221 education, 257–58 Einstein, Albert, 247 email, 131, 133–35 Emergent Inc., 183–85, 195–96, 199, 206, 212, 222, 234–35, 237–38, 239 Emerson, Harrington, 20, 25 emotions, 174–75 empowerment, 136 enabling constraints, 46 Endenburg, Gerard, 70–71 Enspiral, 99–100, 133 Ernst & Young, 35 essential intent, 62 Essentialism (McKeown), 62 Etsy, 157 eudaemonic purpose, 59 even over statements, 88–90 Everlane, 130–31, 259 Everyone Culture, An (Kegan and Lahey), 153 Evolutionary Organizations, 13, 14, 16, 21, 34–37, 48, 53, 244, 248 Complexity Conscious mindset in, see Complexity Conscious mindset list of, 267–70 operating system for, see Operating System Canvas People Positive mindset in, see People Positive mindset exaptation, 103, 104 experiments, conducting, 213–16 Facebook, 62–63, 84, 88, 235, 252, 268 facilitators, 122–23 failure, 68, 74 FAVI, 16, 37–38, 42–43 Fayol, Henri, 24 fear, 141, 222 Federal Reserve System, 252 Fifth Discipline, The (Senge), 153, 202 Firms of Endearment (Sisodia, Sheth, and Wolfe), 60 Fitzgerald, F.


pages: 344 words: 96,020

Hacking Growth: How Today's Fastest-Growing Companies Drive Breakout Success by Sean Ellis, Morgan Brown

Airbnb, Amazon Web Services, barriers to entry, Ben Horowitz, bounce rate, business intelligence, business process, correlation does not imply causation, crowdsourcing, DevOps, disruptive innovation, Elon Musk, game design, Google Glasses, Internet of things, inventory management, iterative process, Jeff Bezos, Khan Academy, Kickstarter, Lean Startup, Lyft, Mark Zuckerberg, market design, minimum viable product, Network effects, Paul Graham, Peter Thiel, Ponzi scheme, recommendation engine, ride hailing / ride sharing, side project, Silicon Valley, Silicon Valley startup, Skype, Snapchat, software as a service, Steve Jobs, subscription business, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, working poor, Y Combinator, young professional

If you’re just starting to form a growth team, then bringing over one or two individuals from different departments to get the team started may be a good way to get the ball rolling, and the size of the team can grow over time. In some cases, as the process is learned, additional teams can be formed. At IBM, for example, a growth team was formed to work specifically on growing the adoption of its Bluemix DevOps product, a software development package for engineers, by assigning five engineers and five other staff, from business operations and marketing, to make up the team. At Inman, Morgan built his growth team from a data scientist, three marketers, and their Web developer to start the growth hacking process.

If you are the head of a small team and want to give the process a try, it’s best to set your team up for success by getting buy-in first, even if it is just with a few peers and a supervisor. You will make mistakes, experiments will fail, webpages will break—it’s an inevitable part of the experimentation process. Having the support of higher-ups can alleviate the blowback from such eventualities. Lauren Schaefer, who was the growth hacking lead on the Bluemix DevOps team at IBM, launched a test early in the process of experimenting with growth hacking that crippled the product’s home page. But her boss was a supporter of the effort, and she and her growth team got past that stumble.14 It’s just as important that the growth team machine not be put into drive too early.


pages: 139 words: 35,022

Roads and Bridges by Nadia Eghbal

AGPL, Airbnb, Amazon Web Services, barriers to entry, Benevolent Dictator For Life (BDFL), corporate social responsibility, crowdsourcing, cryptocurrency, David Heinemeier Hansson, Debian, DevOps, en.wikipedia.org, Firefox, GnuPG, Guido van Rossum, Khan Academy, Kickstarter, Marc Andreessen, market design, Network effects, platform as a service, pull request, Richard Stallman, Ruby on Rails, side project, Silicon Valley, Skype, software is eating the world, Tragedy of the Commons, Y Combinator

The enormous social contributions of today’s digital infrastructure cannot be ignored or argued away, as has happened with other, equally important debates about data and privacy, net neutrality, or private versus public interests. This makes it easier to shift the conversation to solutions. Secondly, there are already engaged, thriving open source communities to work with. Many developers identify with the programming language they use (such as Python or JavaScript), the function they provide (such as data science or devops), or a prominent project (such as Node.js or Rails). These are strong, vocal, and enthusiastic communities. The builders of our digital infrastructure are connected to each other, aware of their needs, and technically talented. They already built our city; we just need to help keep the lights on so they can continue doing what they do best.


pages: 514 words: 111,012

The Art of Monitoring by James Turnbull

Amazon Web Services, anti-pattern, cloud computing, continuous integration, correlation does not imply causation, Debian, DevOps, domain-specific language, failed state, functional programming, Kickstarter, Kubernetes, microservices, performance metric, pull request, Ruby on Rails, software as a service, source of truth, web application, WebSocket

Instrument schemas Time and the observer effect Metrics Application metrics Business metrics Monitoring patterns, or where to put your metrics The utility pattern The external pattern Building metrics into a sample application Logging Adding our own structured log entries Adding structured logging to our sample application Working with your existing logs Health checks, endpoints, and external monitoring Checking an internal endpoint Deployments Adding deployment notifications to our sample application Working with our deployment events Tracing Summary Notifications Our current notifications Updating expired event configuration Upgrading our email notifications Formatting the email subject Formatting the email body Adding graphs to notifications Defining our data source Defining our query parameters Defining our graph panels and rows Rendering the dashboard Adding our dashboard to the Riemann notification Some sample scripted dashboards Other context Adding Slack as a destination Adding PagerDuty as a destination Maintenance and downtime Learning from your notifications Other alerting tools Summary Monitoring Tornado: a capstone The Tornado application Application architecture Monitoring strategy Tagging our Tornado events Monitoring Tornado — Web tier Monitoring HAProxy Monitoring Nginx Addressing the Web tier monitoring concerns Setting up the Tornado checks in Riemann The webtier function Adding Tornado checks to Riemann Summary Monitoring Tornado: Application Tier Monitoring the Application tier JVM Configuring collectd for JMX Collecting our Application tier JVM logs Monitoring the Tornado API application Addressing the Tornado Application tier monitoring concerns Summary Monitoring Tornado: Data tier Monitoring the Data tier MySQL server Using MySQL data for metrics Query timing Monitoring the Data tier's Redis server Addressing the Tornado Data tier monitoring concerns The Tornado dashboard Expanding monitoring beyond Tornado Summary An Introduction to 
Clojure and Functional Programming A brief introduction to Clojure Installing Leiningen Clojure syntax and types Clojure functions Lists Vectors Sets Maps Strings Creating our own functions Creating variables Creating named functions Learning more Clojure The Art of Monitoring Who is this book for? This book is for engineers, developers, sysadmins, operations staff, and those with an interest in monitoring and DevOps. It provides a simple, hands-on introduction to the art of modern application and infrastructure monitoring. There is an expectation that the reader has basic Unix/Linux skills and is familiar with the command line, editing files, installing packages, managing services, and basic networking. Credits and Acknowledgments Ruth Brown, who continues to be the most amazing person in my life.

Monitoring provides data that measures quality of service and provides data that helps IT justify budgets, costs, or new projects. Much of this data is provided directly to business units, application teams, and other relevant parties via dashboards and reports. This is typical in web-centric organizations and many mature startups. This type of approach is also commonly espoused by organizations that have adopted a DevOps culture/methodology. Monitoring will still largely be managed by an operations team, but responsibility for ensuring new applications and services are monitored may be delegated to application developers. Products will not be considered feature complete or ready for deployment without monitoring.


pages: 779 words: 116,439

Test-Driven Development With Python by Harry J. W. Percival

continuous integration, database schema, Debian, DevOps, don't repeat yourself, Firefox, loose coupling, MVC pattern, platform as a service, pull request, web application, WebSocket

Try and leave yourself in a position where you can freely make changes to the design and layout, without having to go back and adjust tests all the time. CHAPTER 8 Testing Deployment Using a Staging Site Is all fun and game until you are need of put it in production. — Devops Borat It’s time to deploy the first version of our site and make it public. They say that if you wait until you feel ready to ship, then you’ve waited too long. Is our site usable? Is it better than nothing? Can we make lists on it? Yes, yes, yes. No, you can’t log in yet. No, you can’t mark tasks as completed.

This is one of the chapters I’m most pleased with, and it’s one that people often write to me saying they were really glad they stuck through it. If you’ve never done a server deployment before, it will demystify a whole world for you, and there’s nothing like the feeling of seeing your site live on the actual Internet. Give it a buzzword name like “DevOps” if that’s what it takes to convince you it’s worth it. Why not ping me a note once your site is live on the web, and send me the URL? It always gives me a warm and fuzzy feeling … obeythetestinggoat@gmail.com. TDD and the Danger Areas of Deployment Deploying a site to a live web server can be a tricky topic.


pages: 192 words: 44,789

Vagrant: Up and Running by Mitchell Hashimoto

Amazon Web Services, barriers to entry, Debian, DevOps, remote working, software as a service, web application

about, Preface, An Introduction to Vagrant alternatives to, Alternatives to Vagrant setting up, Setting Up Vagrant–Conflicting RubyGems Installation common mistakes, Common Mistakes installation, Installing Vagrant–Linux VirtualBox, Installing VirtualBox Vagrantfile about, The Vagrantfile defaults, Setting Vagrantfile Defaults VAGRANT_CWD, VAGRANT_CWD VAGRANT_HOME, VAGRANT_HOME VAGRANT_LOG, VAGRANT_LOG VAGRANT_NO_PLUGINS, VAGRANT_NO_PLUGINS VAGRANT_VAGRANTFILE, VAGRANT_VAGRANTFILE validation, plug-in configuration, Validation version 1 plug-ins, Plug-In Definition version control, Up versions, Installing Vagrant virtual machine, plug-in custom commands, Working with the Virtual Machine–Parsing Command-Line Options VirtualBox export, Box Format installation, Installing VirtualBox installing guest additions, Installing VirtualBox Guest Additions machine, Creating the VirtualBox Machine using Vagrant without, Using Vagrant Without VirtualBox virtualization, Preface, Plain Desktop Virtualization W Windows environmental variable, Troubleshooting and Debugging installing Vagrant, Windows working directory, Hiera Data About the Author Mitchell Hashimoto is a passionate engineer, professional speaker, and entrepreneur. Mitchell has been creating and contributing to open source software for almost a decade. He has spoken at dozens of conferences about his work, such as VelocityConf, OSCON, FOSDEM, and more. Mitchell is the founder of HashiCorp, a company whose goal is to make the best DevOps tools in the world, including Vagrant. Prior to HashiCorp, Mitchell spent five years as a web developer and another four as an operations engineer. Colophon The animal on the cover of Vagrant: Up and Running is a blue rock pigeon (Columba livia). The cover image is from Wood’s Animate Creations.


pages: 255 words: 55,018

Architecting For Scale by Lee Atchison

Amazon Web Services, business process, cloud computing, continuous integration, DevOps, Internet of things, microservices, platform as a service, risk tolerance, software as a service, web application

Once you’ve mastered these skills, your applications will be able to reliably handle huge quantities of traffic as well as huge variability in traffic without affecting the quality your customers expect. Who Should Read This Book This book is intended for software engineers, architects, engineering managers, and directors who build and operate large-scale applications and systems. If you manage software developers, system reliability engineers, or DevOps engineers, or you run an organization that contains large-scale applications and systems, the suggestions and guidance provided in this book will help you make your applications run smoother and more reliably. If your application started small and has seen incredible growth (and is now suffering from some of the growing pains associated with that growth), you might be suffering from reduced reliability and reduced availability.


pages: 590 words: 152,595

Army of None: Autonomous Weapons and the Future of War by Paul Scharre

active measures, Air France Flight 447, algorithmic trading, artificial general intelligence, augmented reality, automated trading system, autonomous vehicles, basic income, brain emulation, Brian Krebs, cognitive bias, computer vision, cuban missile crisis, dark matter, DARPA: Urban Challenge, DevOps, drone strike, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, facts on the ground, fault tolerance, Flash crash, Freestyle chess, friendly fire, IFF: identification friend or foe, ImageNet competition, Internet of things, Johann Wolfgang von Goethe, John Markoff, Kevin Kelly, Loebner Prize, loose coupling, Mark Zuckerberg, moral hazard, mutually assured destruction, Nate Silver, pattern recognition, Rodney Brooks, Rubik’s Cube, self-driving car, sensor fusion, South China Sea, speech recognition, Stanislav Petrov, Stephen Hawking, Steve Ballmer, Steve Wozniak, Stuxnet, superintelligent machines, Tesla Model S, The Signal and the Noise by Nate Silver, theory of mind, Turing test, universal basic income, Valery Gerasimov, Wall-E, William Langewiesche, Y2K, zero day

id=100706&ver=0. 201 speeds measured in microseconds: Michael Lewis, Flash Boys: A Wall Street Revolt (New York: W. W. Norton, 2015), 63, 69, 74, 81. 201 shortest route for their cables: Ibid, 62–63. 201 optimizing every part of their hardware for speed: Ibid, 63–64. 201 test them against actual stock market data: D7, “Knightmare: A DevOps Cautionary Tale,” Doug Seven, April 17, 2014, https://dougseven.com/2014/04/17/knightmare-a-devops-cautionary-tale/. 202 “The Science of Trading, the Standard of Trust”: Jeff Cox, “ ‘Knight-Mare’: Trading Glitches May Just Get Worse,” August 2, 2012, http://www.cnbc.com/id/48464725. 202 Knight’s trading system began flooding the market: “Knight Shows How to Lose $440 Million in 30 Minutes,” Bloomberg.com, August 2, 2012, https://www.bloomberg.com/news/articles/2012-08-02/knight-shows-how-to-lose-440-million-in-30-minutes. 202 neglected to install a “kill switch”: D7, “Knightmare.” 202 executed 4 million trades: “How the Robots Lost: High-Frequency Trading’s Rise and Fall,” Bloomberg.com, June 7, 2013, https://www.bloomberg.com/news/articles/2013-06-06/how-the-robots-lost-high-frequency-tradings-rise-and-fall. 202 Knight was bankrupt: D7, “Knightmare.” 202 “Knightmare on Wall Street”: For a theory on what happened, see Nanex Research, “03-Aug-2012—The Knightmare Explained,” http://www.nanex.net/aqck2/3525.html. 203 Waddell & Reed: Waddell & Reed was not named in the official SEC and CFTC report, which referred only to a “large fundamental trader (a mutual fund complex).”


Learning Ansible 2 - Second Edition by Fabio Alessandro Locati

Amazon Web Services, anti-pattern, cloud computing, continuous integration, Debian, DevOps, don't repeat yourself, Infrastructure as a Service, inventory management, Kickstarter, revision control, source of truth, web application

Since Ansible is an open source project, I thank all the companies that decided to invest in it, as well as all the people who decided to volunteer their time to the project. About the Reviewer Tim Rupp has been working in various fields of computing for the last 10 years. He has held positions in computer security, software engineering, and, most recently, in the fields of cloud computing and DevOps. He was first introduced to Ansible while at Rackspace. As part of the cloud engineering team, he made extensive use of the tool to deploy new capacity for the Rackspace public cloud. Since then, he has contributed patches, provided support for, and presented on Ansible topics at local meetups. Tim is currently a senior software engineer at F5 Networks, where he works on data plane programmability.


Learning Flask Framework by Matt Copperwaite, Charles Leifer

create, read, update, delete, database schema, Debian, DevOps, don't repeat yourself, full text search, place-making, Skype, web application

ISBN 978-1-78398-336-0 www.packtpub.com Credits: Authors: Matt Copperwaite, Charles Leifer; Reviewers: Abhishek Gahlot, Burhan Khalid; Commissioning Editor: Ashwin Nair; Acquisition Editor: Subho Gupta; Content Development Editor: Mamata Walkar; Technical Editors: Siddhesh Ghadi, Siddhesh Patil; Copy Editor: Sonia Mathur; Project Coordinator: Shipra Chawhan; Proofreaders: Stephen Copestake, Safis Editing; Indexer: Mariammal Chettiyar; Production Coordinator: Conidon Miranda; Cover Work: Conidon Miranda. About the Authors Matt Copperwaite graduated from the University of Plymouth in 2008 with a bachelor of science (Hons) degree in computer systems and networks. Since then, he has worked in various private and public sectors in the UK. Matt is currently working as a Python software developer and DevOps engineer for the UK Government, focusing mainly on Django. However, his first love is Flask, with which he has built several products under the General Public License (GPL). Matt is also a trustee of South London Makerspace, a hackerspace community in South London; a cohost of The Dick Turpin Road Show, a podcast for free and open source software; and LUG Master of Greater London Linux User Group.


pages: 333 words: 64,581

Clean Agile: Back to Basics by Robert C. Martin

Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, c2.com, continuous integration, DevOps, disinformation, double entry bookkeeping, en.wikipedia.org, failed state, Frederick Winslow Taylor, index card, iterative process, Kubernetes, loose coupling, microservices, remote working, revision control, Turing machine

I was careful to isolate each of these concepts into their own modules and to use those names exclusively throughout the application. Those names were my Ubiquitous Language. The Ubiquitous Language is used in all parts of the project. The business uses it. The developers use it. QA uses it. Ops/Devops use it. Even the customers use those parts of it that are appropriate. It supports the business case, the requirements, the design, the architecture, and the acceptance tests. It is a thread of consistency that interconnects the entire project during every phase of its lifecycle.4 4. “It’s an energy field created by all living things.


pages: 296 words: 66,815

The AI-First Company by Ash Fontana

23andMe, Amazon Mechanical Turk, Amazon Web Services, autonomous vehicles, barriers to entry, blockchain, business intelligence, business process, business process outsourcing, call centre, chief data officer, Clayton Christensen, cloud computing, combinatorial explosion, computer vision, crowdsourcing, data acquisition, DevOps, en.wikipedia.org, independent contractor, industrial robot, inventory management, John Conway, knowledge economy, Kubernetes, Lean Startup, minimum viable product, natural language processing, Network effects, optical character recognition, Pareto efficiency, performance metric, price discrimination, recommendation engine, Ronald Coase, software as a service, source of truth, speech recognition, the scientific method, transaction costs, yield management

You don’t need to get this whole loop running in order to get started; you can be successful with just part of it in motion. This chapter aims to increase your awareness of what could go wrong, and when, so that you can stay one step ahead of potential problems. Here we’ll relate model management to concepts familiar to those in the technology industry such as agile development, DevOps, and statistical process control (SPC), while also introducing novel ideas specific to managing intelligent systems. We have two diagrams to guide you through the two parts to this chapter: Implementation and Management. STEPS TO ACCEPTANCE IMPLEMENTATION Taking a model from the lab to live typically involves lots of people, processes, and pieces of software.


pages: 266 words: 79,297

Forge Your Future with Open Source by VM (Vicky) Brasseur

AGPL, anti-pattern, Benevolent Dictator For Life (BDFL), call centre, continuous integration, Debian, DevOps, don't repeat yourself, en.wikipedia.org, Firefox, Guido van Rossum, Internet Archive, Larry Wall, microservices, Perl 6, premature optimization, pull request, Richard Stallman, risk tolerance, Turing machine

Second Edition A single dramatic software failure can cost a company millions of dollars—but can be avoided with simple changes to design and architecture. This new edition of the best-selling industry standard shows you how to create systems that run longer, with fewer failures, and recover better when bad things happen. New coverage includes DevOps, microservices, and cloud-native architecture. Stability antipatterns have grown to include systemic problems in large-scale systems. This is a must-have pragmatic guide to engineering for production systems. Michael Nygard (376 pages) ISBN: 9781680502398 $47.95 Your Code as a Crime Scene Jack the Ripper and legacy codebases have more in common than you’d think.


pages: 291 words: 90,771

Upscale: What It Takes to Scale a Startup. By the People Who've Done It. by James Silver

Airbnb, augmented reality, Ben Horowitz, blockchain, business process, call centre, credit crunch, crowdsourcing, DevOps, family office, future of work, Google Hangouts, high net worth, hiring and firing, Jeff Bezos, Kickstarter, Lean Startup, Lyft, Mark Zuckerberg, minimum viable product, Network effects, pattern recognition, ride hailing / ride sharing, Silicon Valley, Skype, Snapchat, software as a service, Uber and Lyft, uber lyft, WeWork, women in the workforce, Y Combinator

‘If you don’t have good people taking control of those areas, who are able to step up and be accountable for key issues like hiring, product and growth, then you won’t really know where to turn as a founder.’ ‘Communication and ongoing dialogue are critical.’ As a founder, you need to make sure that your teams - right down to specialists, if you’re a software company, such as your back-end and front-end developers, data scientists and DevOps person - are talking to one another, and that people are pointed in the same direction. Obviously the more people you have in the organisation, the more challenging that becomes to manage and orchestrate. ‘That’s where things like strategy, culture, goals and objectives become really important because the larger, the more complex the organisation becomes, the more you need things that really bind people together.’


Industry 4.0: The Industrial Internet of Things by Alasdair Gilchrist

3D printing, additive manufacturing, Amazon Web Services, augmented reality, autonomous vehicles, barriers to entry, business intelligence, business process, chief data officer, cloud computing, connected car, cyber-physical system, deindustrialization, DevOps, digital twin, fault tolerance, global value chain, Google Glasses, hiring and firing, industrial robot, inflight wifi, Infrastructure as a Service, Internet of things, inventory management, job automation, low cost airline, low skilled workers, microservices, millennium bug, pattern recognition, peer-to-peer, platform as a service, pre–internet, race to the bottom, RFID, Skype, smart cities, smart grid, smart meter, smart transportation, software as a service, stealth mode startup, supply-chain management, The future is already here, trade route, undersea cable, web application, WebRTC, Y2K

However, they are correct to stress the importance of operational efficiency, as it is paramount to all businesses, and Industry 4.0 lends itself to increased productivity, efficiency, and customer engagement. Merge OT with IT The biggest problem with merging OT (operational technology) with IT (information technology) is that they have completely different goals and aspirations. It is actually similar to merging operations and development into DevOps. In reality, OT is about manufacturing, and OT workers and technicians have evolved via a different mindset. OT workers have come through the industrial workforce, where employees are labor-oriented and expect that the job they do is vital to the manufacturing of the product. OT staff work hard in difficult conditions, work to meet production targets, and work closely with the factory workforce as part of a team.


pages: 398 words: 86,855

Bad Data Handbook by Q. Ethan McCallum

Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, Chuck Templeton: OpenTable:, cloud computing, cognitive dissonance, combinatorial explosion, commoditize, conceptual framework, database schema, DevOps, en.wikipedia.org, Firefox, Flash crash, functional programming, Gini coefficient, illegal immigration, iterative process, labor-force participation, loose coupling, natural language processing, Netflix Prize, quantitative trading / quantitative finance, recommendation engine, selection bias, sentiment analysis, statistical model, supply-chain management, survivorship bias, text mining, too big to fail, web application

Using a Production Environment for Ad-Hoc Analysis The use cases of performing exploratory analysis or any other data R&D effort are very different from the use cases for running production analytics processes. Generally, the design of production systems specifies that they have to meet certain service level agreements (SLAs), such as for uptime (availability) and speed. These systems are maintained by operations or DevOps teams, and are usually locked down, have very tight user space quotas, and may be located in self-contained environments for protection. The production processes that run on these systems are clearly defined, consistent, repeatable, and reliable. In contrast, the process of performing ad-hoc analytical tasks is nonlinear, error-prone, and usually requires tools that are in varying states of development, especially when using open source software.


Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, bitcoin, blockchain, business intelligence, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, database schema, DevOps, distributed ledger, Donald Knuth, Edward Snowden, Ethereum, ethereum blockchain, fault tolerance, finite state, Flash crash, full text search, functional programming, general-purpose programming language, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, Kubernetes, loose coupling, Marc Andreessen, microservices, natural language processing, Network effects, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, statistical model, surveillance capitalism, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you’ve finished using them. This approach (automation, rapid prototyping, incremental iteration, being friendly to experimentation, and breaking down large projects into manageable chunks) sounds remarkably like the Agile and DevOps movements of today. Surprisingly little has changed in four decades. The sort tool is a great example of a program that does one thing well. It is arguably a better sorting implementation than most programming languages have in their standard libraries (which do not spill to disk and do not use multiple threads, even when that would be beneficial).
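Spilling to disk, as mentioned above, is usually done with an external merge sort: sort memory-sized chunks, write each sorted chunk (a "run") to a temporary file, then lazily merge the runs. The following is a minimal Python sketch of that idea, not a description of how any particular `sort` implementation works. It assumes newline-free text lines, plain code-point ordering (roughly what `LC_ALL=C sort` gives), and a single thread; `external_sort` and its `chunk_size` parameter are hypothetical names chosen for illustration.

```python
import heapq
import tempfile
from typing import Iterable, Iterator, List, TextIO


def _spill(chunk: List[str]) -> TextIO:
    """Sort one in-memory chunk and write it to a temporary file (a sorted run)."""
    chunk.sort()
    run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
    run.writelines(line + "\n" for line in chunk)
    run.seek(0)  # flushes the write buffer and rewinds for the merge phase
    return run


def _read_run(run: TextIO) -> Iterator[str]:
    """Yield the lines of a sorted run with the trailing newline stripped."""
    for line in run:
        yield line.rstrip("\n")


def external_sort(lines: Iterable[str], chunk_size: int = 100_000) -> Iterator[str]:
    """Sort a stream of newline-free lines while holding at most chunk_size in memory.

    Phase 1: buffer up to chunk_size lines, sort them, and spill them to disk.
    Phase 2: k-way merge all sorted runs lazily with heapq.merge.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    runs: List[TextIO] = []
    chunk: List[str] = []
    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            runs.append(_spill(chunk))
            chunk = []
    if chunk:
        runs.append(_spill(chunk))
    try:
        # heapq.merge keeps only one pending line per run in memory.
        yield from heapq.merge(*(_read_run(r) for r in runs))
    finally:
        for r in runs:
            r.close()  # temporary files are deleted when closed


if __name__ == "__main__":
    words = ["pear", "apple", "fig", "banana", "cherry"]
    print(list(external_sort(words, chunk_size=2)))
    # → ['apple', 'banana', 'cherry', 'fig', 'pear']
```

Memory use stays bounded by `chunk_size` lines during the first phase and by one buffered line per run during the merge. A production tool would also cap how many runs it merges at once, because each run holds an open file handle.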

The opposite of bounded. Index A aborts (transactions), 222, 224 in two-phase commit, 356 performance of optimistic concurrency control, 266 retrying aborted transactions, 231 abstraction, 21, 27, 222, 266, 321 access path (in network model), 37, 60 accidental complexity, removing, 21 accountability, 535 ACID properties (transactions), 90, 223 atomicity, 223, 228 consistency, 224, 529 durability, 226 isolation, 225, 228 acknowledgements (messaging), 445 active/active replication (see multi-leader replication) active/passive replication (see leader-based replication) ActiveMQ (messaging), 137, 444 distributed transaction support, 361 ActiveRecord (object-relational mapper), 30, 232 actor model, 138 (see also message-passing) comparison to Pregel model, 425 comparison to stream processing, 468 Advanced Message Queuing Protocol (see AMQP) aerospace systems, 6, 10, 305, 372 aggregation data cubes and materialized views, 101 in batch processes, 406 in stream processes, 466 aggregation pipeline query language, 48 Agile, 22 minimizing irreversibility, 414, 497 moving faster with confidence, 532 Unix philosophy, 394 agreement, 365 (see also consensus) Airflow (workflow scheduler), 402 Ajax, 131 Akka (actor framework), 139 algorithms algorithm correctness, 308 B-trees, 79-83 for distributed systems, 306 hash indexes, 72-75 mergesort, 76, 402, 405 red-black trees, 78 SSTables and LSM-trees, 76-79 all-to-all replication topologies, 175 AllegroGraph (database), 50 ALTER TABLE statement (SQL), 40, 111 Amazon Dynamo (database), 177 Amazon Web Services (AWS), 8 Kinesis Streams (messaging), 448 network reliability, 279 postmortems, 9 RedShift (database), 93 S3 (object storage), 398 checking data integrity, 530 amplification of bias, 534 of failures, 364, 495 of tail latency, 16, 207 write amplification, 84 AMQP (Advanced Message Queuing Protocol), 444 (see also messaging systems) comparison to log-based messaging, 448, 451 message ordering, 446
analytics, 90 comparison to transaction processing, 91 data warehousing (see data warehousing) parallel query execution in MPP databases, 415 predictive (see predictive analytics) relation to batch processing, 411 schemas for, 93-95 snapshot isolation for queries, 238 stream analytics, 466 using MapReduce, analysis of user activity events (example), 404 anti-caching (in-memory databases), 89 anti-entropy, 178 Apache ActiveMQ (see ActiveMQ) Apache Avro (see Avro) Apache Beam (see Beam) Apache BookKeeper (see BookKeeper) Apache Cassandra (see Cassandra) Apache CouchDB (see CouchDB) Apache Curator (see Curator) Apache Drill (see Drill) Apache Flink (see Flink) Apache Giraph (see Giraph) Apache Hadoop (see Hadoop) Apache HAWQ (see HAWQ) Apache HBase (see HBase) Apache Helix (see Helix) Apache Hive (see Hive) Apache Impala (see Impala) Apache Jena (see Jena) Apache Kafka (see Kafka) Apache Lucene (see Lucene) Apache MADlib (see MADlib) Apache Mahout (see Mahout) Apache Oozie (see Oozie) Apache Parquet (see Parquet) Apache Qpid (see Qpid) Apache Samza (see Samza) Apache Solr (see Solr) Apache Storm (see Storm) Apache Tajo (see Tajo) Apache Tez (see Tez) Apache Thrift (see Thrift) Apache ZooKeeper (see ZooKeeper) Apama (stream analytics), 466 append-only B-trees, 82, 242 append-only files (see logs) Application Programming Interfaces (APIs), 5, 27 for batch processing, 403 for change streams, 456 for distributed transactions, 361 for graph processing, 425 for services, 131-136 (see also services) evolvability, 136 RESTful, 133 SOAP, 133 application state (see state) approximate search (see similarity search) archival storage, data from databases, 131 arcs (see edges) arithmetic mean, 14 ASCII text, 119, 395 ASN.1 (schema language), 127 asynchronous networks, 278, 553 comparison to synchronous networks, 284 formal model, 307 asynchronous replication, 154, 553 conflict detection, 172 data loss on failover, 157 reads from asynchronous
follower, 162 Asynchronous Transfer Mode (ATM), 285 atomic broadcast (see total order broadcast) atomic clocks (caesium clocks), 294, 295 (see also clocks) atomicity (concurrency), 553 atomic increment-and-get, 351 compare-and-set, 245, 327 (see also compare-and-set operations) replicated operations, 246 write operations, 243 atomicity (transactions), 223, 228, 553 atomic commit, 353 avoiding, 523, 528 blocking and nonblocking, 359 in stream processing, 360, 477 maintaining derived data, 453 for multi-object transactions, 229 for single-object writes, 230 auditability, 528-533 designing for, 531 self-auditing systems, 530 through immutability, 460 tools for auditable data systems, 532 availability, 8 (see also fault tolerance) in CAP theorem, 337 in service level agreements (SLAs), 15 Avro (data format), 122-127 code generation, 127 dynamically generated schemas, 126 object container files, 125, 131, 414 reader determining writer’s schema, 125 schema evolution, 123 use in Hadoop, 414 awk (Unix tool), 391 AWS (see Amazon Web Services) Azure (see Microsoft) B B-trees (indexes), 79-83 append-only/copy-on-write variants, 82, 242 branching factor, 81 comparison to LSM-trees, 83-85 crash recovery, 82 growing by splitting a page, 81 optimizations, 82 similarity to dynamic partitioning, 212 backpressure, 441, 553 in TCP, 282 backups database snapshot for replication, 156 integrity of, 530 snapshot isolation for, 238 use for ETL processes, 405 backward compatibility, 112 BASE, contrast to ACID, 223 bash shell (Unix), 70, 395, 503 batch processing, 28, 389-431, 553 combining with stream processing lambda architecture, 497 unifying technologies, 498 comparison to MPP databases, 414-418 comparison to stream processing, 464 comparison to Unix, 413-414 dataflow engines, 421-423 fault tolerance, 406, 414, 422, 442 for data integration, 494-498 graphs and iterative processing, 424-426 high-level APIs and languages, 403, 426-429 log-based messaging and, 451 maintaining derived 
state, 495 MapReduce and distributed filesystems, 397-413 (see also MapReduce) measuring performance, 13, 390 outputs, 411-413 key-value stores, 412 search indexes, 411 using Unix tools (example), 391-394 Bayou (database), 522 Beam (dataflow library), 498 bias, 534 big ball of mud, 20 Bigtable data model, 41, 99 binary data encodings, 115-128 Avro, 122-127 MessagePack, 116-117 Thrift and Protocol Buffers, 117-121 binary encoding based on schemas, 127 by network drivers, 128 binary strings, lack of support in JSON and XML, 114 BinaryProtocol encoding (Thrift), 118 Bitcask (storage engine), 72 crash recovery, 74 Bitcoin (cryptocurrency), 532 Byzantine fault tolerance, 305 concurrency bugs in exchanges, 233 bitmap indexes, 97 blockchains, 532 Byzantine fault tolerance, 305 blocking atomic commit, 359 Bloom (programming language), 504 Bloom filter (algorithm), 79, 466 BookKeeper (replicated log), 372 Bottled Water (change data capture), 455 bounded datasets, 430, 439, 553 (see also batch processing) bounded delays, 553 in networks, 285 process pauses, 298 broadcast hash joins, 409 brokerless messaging, 442 Brubeck (metrics aggregator), 442 BTM (transaction coordinator), 356 bulk synchronous parallel (BSP) model, 425 bursty network traffic patterns, 285 business data processing, 28, 90, 390 byte sequence, encoding data in, 112 Byzantine faults, 304-306, 307, 553 Byzantine fault-tolerant systems, 305, 532 Byzantine Generals Problem, 304 consensus algorithms and, 366 C caches, 89, 553 and materialized views, 101 as derived data, 386, 499-504 database as cache of transaction log, 460 in CPUs, 99, 338, 428 invalidation and maintenance, 452, 467 linearizability, 324 CAP theorem, 336-338, 554 Cascading (batch processing), 419, 427 hash joins, 409 workflows, 403 cascading failures, 9, 214, 281 Cascalog (batch processing), 60 Cassandra (database) column-family data model, 41, 99 compaction strategy, 79 compound primary key, 204 gossip protocol, 216 hash
partitioning, 203-205 last-write-wins conflict resolution, 186, 292 leaderless replication, 177 linearizability, lack of, 335 log-structured storage, 78 multi-datacenter support, 184 partitioning scheme, 213 secondary indexes, 207 sloppy quorums, 184 cat (Unix tool), 391 causal context, 191 (see also causal dependencies) causal dependencies, 186-191 capturing, 191, 342, 494, 514 by total ordering, 493 causal ordering, 339 in transactions, 262 sending message to friends (example), 494 causality, 554 causal ordering, 339-343 linearizability and, 342 total order consistent with, 344, 345 consistency with, 344-347 consistent snapshots, 340 happens-before relationship, 186 in serializable transactions, 262-265 mismatch with clocks, 292 ordering events to capture, 493 violations of, 165, 176, 292, 340 with synchronized clocks, 294 CEP (see complex event processing) certificate transparency, 532 chain replication, 155 linearizable reads, 351 change data capture, 160, 454 API support for change streams, 456 comparison to event sourcing, 457 implementing, 454 initial snapshot, 455 log compaction, 456 changelogs, 460 change data capture, 454 for operator state, 479 generating with triggers, 455 in stream joins, 474 log compaction, 456 maintaining derived state, 452 Chaos Monkey, 7, 280 checkpointing in batch processors, 422, 426 in high-performance computing, 275 in stream processors, 477, 523 chronicle data model, 458 circuit-switched networks, 284 circular buffers, 450 circular replication topologies, 175 clickstream data, analysis of, 404 clients calling services, 131 pushing state changes to, 512 request routing, 214 stateful and offline-capable, 170, 511 clocks, 287-299 atomic (caesium) clocks, 294, 295 confidence interval, 293-295 for global snapshots, 294 logical (see logical clocks) skew, 291-294, 334 slewing, 289 synchronization and accuracy, 289-291 synchronization using GPS, 287, 290, 294, 295 time-of-day versus monotonic clocks, 288 timestamping
events, 471 cloud computing, 146, 275 need for service discovery, 372 network glitches, 279 shared resources, 284 single-machine reliability, 8 Cloudera Impala (see Impala) clustered indexes, 86 CODASYL model, 36 (see also network model) code generation with Avro, 127 with Thrift and Protocol Buffers, 118 with WSDL, 133 collaborative editing multi-leader replication and, 170 column families (Bigtable), 41, 99 column-oriented storage, 95-101 column compression, 97 distinction between column families and, 99 in batch processors, 428 Parquet, 96, 131, 414 sort order in, 99-100 vectorized processing, 99, 428 writing to, 101 comma-separated values (see CSV) command query responsibility segregation (CQRS), 462 commands (event sourcing), 459 commits (transactions), 222 atomic commit, 354-355 (see also atomicity; transactions) read committed isolation, 234 three-phase commit (3PC), 359 two-phase commit (2PC), 355-359 commutative operations, 246 compaction of changelogs, 456 (see also log compaction) for stream operator state, 479 of log-structured storage, 73 issues with, 84 size-tiered and leveled approaches, 79 CompactProtocol encoding (Thrift), 119 compare-and-set operations, 245, 327 implementing locks, 370 implementing uniqueness constraints, 331 implementing with total order broadcast, 350 relation to consensus, 335, 350, 352, 374 relation to transactions, 230 compatibility, 112, 128 calling services, 136 properties of encoding formats, 139 using databases, 129-131 using message-passing, 138 compensating transactions, 355, 461, 526 complex event processing (CEP), 465 complexity distilling in theoretical models, 310 hiding using abstraction, 27 of software systems, managing, 20 composing data systems (see unbundling databases) compute-intensive applications, 3, 275 concatenated indexes, 87 in Cassandra, 204 Concord (stream processor), 466 concurrency actor programming model, 138, 468 (see also message-passing) bugs from weak transaction isolation, 233 conflict
resolution, 171, 174 detecting concurrent writes, 184-191 dual writes, problems with, 453 happens-before relationship, 186 in replicated systems, 161-191, 324-338 lost updates, 243 multi-version concurrency control (MVCC), 239 optimistic concurrency control, 261 ordering of operations, 326, 341 reducing, through event logs, 351, 462, 507 time and relativity, 187 transaction isolation, 225 write skew (transaction isolation), 246-251 conflict-free replicated datatypes (CRDTs), 174 conflicts conflict detection, 172 causal dependencies, 186, 342 in consensus algorithms, 368 in leaderless replication, 184 in log-based systems, 351, 521 in nonlinearizable systems, 343 in serializable snapshot isolation (SSI), 264 in two-phase commit, 357, 364 conflict resolution automatic conflict resolution, 174 by aborting transactions, 261 by apologizing, 527 convergence, 172-174 in leaderless systems, 190 last write wins (LWW), 186, 292 using atomic operations, 246 using custom logic, 173 determining what is a conflict, 174, 522 in multi-leader replication, 171-175 avoiding conflicts, 172 lost updates, 242-246 materializing, 251 relation to operation ordering, 339 write skew (transaction isolation), 246-251 congestion (networks) avoidance, 282 limiting accuracy of clocks, 293 queueing delays, 282 consensus, 321, 364-375, 554 algorithms, 366-368 preventing split brain, 367 safety and liveness properties, 365 using linearizable operations, 351 cost of, 369 distributed transactions, 352-375 in practice, 360-364 two-phase commit, 354-359 XA transactions, 361-364 impossibility of, 353 membership and coordination services, 370-373 relation to compare-and-set, 335, 350, 352, 374 relation to replication, 155, 349 relation to uniqueness constraints, 521 consistency, 224, 524 across different databases, 157, 452, 462, 492 causal, 339-348, 493 consistent prefix reads, 165-167 consistent snapshots, 156, 237-242, 294, 455, 500 (see also snapshots) crash recovery, 82
enforcing constraints (see constraints) eventual, 162, 322 (see also eventual consistency) in ACID transactions, 224, 529 in CAP theorem, 337 linearizability, 324-338 meanings of, 224 monotonic reads, 164-165 of secondary indexes, 231, 241, 354, 491, 500 ordering guarantees, 339-352 read-after-write, 162-164 sequential, 351 strong (see linearizability) timeliness and integrity, 524 using quorums, 181, 334 consistent hashing, 204 consistent prefix reads, 165 constraints (databases), 225, 248 asynchronously checked, 526 coordination avoidance, 527 ensuring idempotence, 519 in log-based systems, 521-524 across multiple partitions, 522 in two-phase commit, 355, 357 relation to consensus, 374, 521 relation to event ordering, 347 requiring linearizability, 330 Consul (service discovery), 372 consumers (message streams), 137, 440 backpressure, 441 consumer offsets in logs, 449 failures, 445, 449 fan-out, 11, 445, 448 load balancing, 444, 448 not keeping up with producers, 441, 450, 502 context switches, 14, 297 convergence (conflict resolution), 172-174, 322 coordination avoidance, 527 cross-datacenter, 168, 493 cross-partition ordering, 256, 294, 348, 523 services, 330, 370-373 coordinator (in 2PC), 356 failure, 358 in XA transactions, 361-364 recovery, 363 copy-on-write (B-trees), 82, 242 CORBA (Common Object Request Broker Architecture), 134 correctness, 6 auditability, 528-533 Byzantine fault tolerance, 305, 532 dealing with partial failures, 274 in log-based systems, 521-524 of algorithm within system model, 308 of compensating transactions, 355 of consensus, 368 of derived data, 497, 531 of immutable data, 461 of personal data, 535, 540 of time, 176, 289-295 of transactions, 225, 515, 529 timeliness and integrity, 524-528 corruption of data detecting, 519, 530-533 due to pathological memory access, 529 due to radiation, 305 due to split brain, 158, 302 due to weak transaction isolation, 233 formalization in consensus, 366 integrity as absence of, 524 network 
packets, 306 on disks, 227 preventing using write-ahead logs, 82 recovering from, 414, 460 Couchbase (database) durability, 89 hash partitioning, 203-204, 211 rebalancing, 213 request routing, 216 CouchDB (database) B-tree storage, 242 change feed, 456 document data model, 31 join support, 34 MapReduce support, 46, 400 replication, 170, 173 covering indexes, 86 CPUs cache coherence and memory barriers, 338 caching and pipelining, 99, 428 increasing parallelism, 43 CRDTs (see conflict-free replicated datatypes) CREATE INDEX statement (SQL), 85, 500 credit rating agencies, 535 Crunch (batch processing), 419, 427 hash joins, 409 sharded joins, 408 workflows, 403 cryptography defense against attackers, 306 end-to-end encryption and authentication, 519, 543 proving integrity of data, 532 CSS (Cascading Style Sheets), 44 CSV (comma-separated values), 70, 114, 396 Curator (ZooKeeper recipes), 330, 371 curl (Unix tool), 135, 397 cursor stability, 243 Cypher (query language), 52 comparison to SPARQL, 59 D data corruption (see corruption of data) data cubes, 102 data formats (see encoding) data integration, 490-498, 543 batch and stream processing, 494-498 lambda architecture, 497 maintaining derived state, 495 reprocessing data, 496 unifying, 498 by unbundling databases, 499-515 comparison to federated databases, 501 combining tools by deriving data, 490-494 derived data versus distributed transactions, 492 limits of total ordering, 493 ordering events to capture causality, 493 reasoning about dataflows, 491 need for, 385 data lakes, 415 data locality (see locality) data models, 27-64 graph-like models, 49-63 Datalog language, 60-63 property graphs, 50 RDF and triple-stores, 55-59 query languages, 42-48 relational model versus document model, 28-42 data protection regulations, 542 data systems, 3 about, 4 concerns when designing, 5 future of, 489-544 correctness, constraints, and integrity, 515-533 data integration, 490-498 unbundling databases, 499-515
heterogeneous, keeping in sync, 452 maintainability, 18-22 possible faults in, 221 reliability, 6-10 hardware faults, 7 human errors, 9 importance of, 10 software errors, 8 scalability, 10-18 unreliable clocks, 287-299 data warehousing, 91-95, 554 comparison to data lakes, 415 ETL (extract-transform-load), 92, 416, 452 keeping data systems in sync, 452 schema design, 93 slowly changing dimension (SCD), 476 data-intensive applications, 3 database triggers (see triggers) database-internal distributed transactions, 360, 364, 477 databases archival storage, 131 comparison of message brokers to, 443 dataflow through, 129 end-to-end argument for, 519-520 checking integrity, 531 inside-out, 504 (see also unbundling databases) output from batch workflows, 412 relation to event streams, 451-464 (see also changelogs) API support for change streams, 456, 506 change data capture, 454-457 event sourcing, 457-459 keeping systems in sync, 452-453 philosophy of immutable events, 459-464 unbundling, 499-515 composing data storage technologies, 499-504 designing applications around dataflow, 504-509 observing derived state, 509-515 datacenters geographically distributed, 145, 164, 278, 493 multi-tenancy and shared resources, 284 network architecture, 276 network faults, 279 replication across multiple, 169 leaderless replication, 184 multi-leader replication, 168, 335 dataflow, 128-139, 504-509 correctness of dataflow systems, 525 differential, 504 message-passing, 136-139 reasoning about, 491 through databases, 129 through services, 131-136 dataflow engines, 421-423 comparison to stream processing, 464 directed acyclic graphs (DAG), 424 partitioning, approach to, 429 support for declarative queries, 427 Datalog (query language), 60-63 datatypes binary strings in XML and JSON, 114 conflict-free, 174 in Avro encodings, 122 in Thrift and Protocol Buffers, 121 numbers in XML and JSON, 114 Datomic (database) B-tree storage, 242 data model, 50, 57 Datalog query language, 60
excision (deleting data), 463 languages for transactions, 255 serial execution of transactions, 253 deadlocks detection, in two-phase commit (2PC), 364 in two-phase locking (2PL), 258 Debezium (change data capture), 455 declarative languages, 42, 554 Bloom, 504 CSS and XSL, 44 Cypher, 52 Datalog, 60 for batch processing, 427 recursive SQL queries, 53 relational algebra and SQL, 42 SPARQL, 59 delays bounded network delays, 285 bounded process pauses, 298 unbounded network delays, 282 unbounded process pauses, 296 deleting data, 463 denormalization (data representation), 34, 554 costs, 39 in derived data systems, 386 materialized views, 101 updating derived data, 228, 231, 490 versus normalization, 462 derived data, 386, 439, 554 from change data capture, 454 in event sourcing, 458-458 maintaining derived state through logs, 452-457, 459-463 observing, by subscribing to streams, 512 outputs of batch and stream processing, 495 through application code, 505 versus distributed transactions, 492 deterministic operations, 255, 274, 554 accidental nondeterminism, 423 and fault tolerance, 423, 426 and idempotence, 478, 492 computing derived data, 495, 526, 531 in state machine replication, 349, 452, 458 joins, 476 DevOps, 394 differential dataflow, 504 dimension tables, 94 dimensional modeling (see star schemas) directed acyclic graphs (DAGs), 424 dirty reads (transaction isolation), 234 dirty writes (transaction isolation), 235 discrimination, 534 disks (see hard disks) distributed actor frameworks, 138 distributed filesystems, 398-399 decoupling from query engines, 417 indiscriminately dumping data into, 415 use by MapReduce, 402 distributed systems, 273-312, 554 Byzantine faults, 304-306 cloud versus supercomputing, 275 detecting network faults, 280 faults and partial failures, 274-277 formalization of consensus, 365 impossibility results, 338, 353 issues with failover, 157 limitations of distributed transactions, 363 multi-datacenter, 169, 335 network problems, 277-286 
quorums, relying on, 301 reasons for using, 145, 151 synchronized clocks, relying on, 291-295 system models, 306-310 use of clocks and time, 287 distributed transactions (see transactions) Django (web framework), 232 DNS (Domain Name System), 216, 372 Docker (container manager), 506 document data model, 30-42 comparison to relational model, 38-42 document references, 38, 403 document-oriented databases, 31 many-to-many relationships and joins, 36 multi-object transactions, need for, 231 versus relational model convergence of models, 41 data locality, 41 document-partitioned indexes, 206, 217, 411 domain-driven design (DDD), 457 DRBD (Distributed Replicated Block Device), 153 drift (clocks), 289 Drill (query engine), 93 Druid (database), 461 Dryad (dataflow engine), 421 dual writes, problems with, 452, 507 duplicates, suppression of, 517 (see also idempotence) using a unique ID, 518, 522 durability (transactions), 226, 554 duration (time), 287 measurement with monotonic clocks, 288 dynamic partitioning, 212 dynamically typed languages analogy to schema-on-read, 40 code generation and, 127 Dynamo-style databases (see leaderless replication) E edges (in graphs), 49, 403 property graph model, 50 edit distance (full-text search), 88 effectively-once semantics, 476, 516 (see also exactly-once semantics) preservation of integrity, 525 elastic systems, 17 Elasticsearch (search server) document-partitioned indexes, 207 partition rebalancing, 211 percolator (stream search), 467 usage example, 4 use of Lucene, 79 ElephantDB (database), 413 Elm (programming language), 504, 512 encodings (data formats), 111-128 Avro, 122-127 binary variants of JSON and XML, 115 compatibility, 112 calling services, 136 using databases, 129-131 using message-passing, 138 defined, 113 JSON, XML, and CSV, 114 language-specific formats, 113 merits of schemas, 127 representations of data, 112 Thrift and Protocol Buffers, 117-121 end-to-end argument, 277, 519-520 checking integrity, 531
publish/subscribe streams, 512 enrichment (stream), 473 Enterprise JavaBeans (EJB), 134 entities (see vertices) epoch (consensus algorithms), 368 epoch (Unix timestamps), 288 equi-joins, 403 erasure coding (error correction), 398 Erlang OTP (actor framework), 139 error handling for network faults, 280 in transactions, 231 error-correcting codes, 277, 398 Esper (CEP engine), 466 etcd (coordination service), 370-373 linearizable operations, 333 locks and leader election, 330 quorum reads, 351 service discovery, 372 use of Raft algorithm, 349, 353 Ethereum (blockchain), 532 Ethernet (networks), 276, 278, 285 packet checksums, 306, 519 Etherpad (collaborative editor), 170 ethics, 533-543 code of ethics and professional practice, 533 legislation and self-regulation, 542 predictive analytics, 533-536 amplifying bias, 534 feedback loops, 536 privacy and tracking, 536-543 consent and freedom of choice, 538 data as assets and power, 540 meaning of privacy, 539 surveillance, 537 respect, dignity, and agency, 543, 544 unintended consequences, 533, 536 ETL (extract-transform-load), 92, 405, 452, 554 use of Hadoop for, 416 event sourcing, 457-459 commands and events, 459 comparison to change data capture, 457 comparison to lambda architecture, 497 deriving current state from event log, 458 immutability and auditability, 459, 531 large, reliable data systems, 519, 526 Event Store (database), 458 event streams (see streams) events, 440 deciding on total order of, 493 deriving views from event log, 461 difference to commands, 459 event time versus processing time, 469, 477, 498 immutable, advantages of, 460, 531 ordering to capture causality, 493 reads as, 513 stragglers, 470, 498 timestamp of, in stream processing, 471 EventSource (browser API), 512 eventual consistency, 152, 162, 308, 322 (see also conflicts) and perpetual inconsistency, 525 evolvability, 21, 111 calling services, 136 graph-structured data, 52 of databases, 40, 129-131, 461, 497 of message-passing,
138 reprocessing data, 496, 498 schema evolution in Avro, 123 schema evolution in Thrift and Protocol Buffers, 120 schema-on-read, 39, 111, 128 exactly-once semantics, 360, 476, 516 parity with batch processors, 498 preservation of integrity, 525 exclusive mode (locks), 258 eXtended Architecture transactions (see XA transactions) extract-transform-load (see ETL) F Facebook Presto (query engine), 93 React, Flux, and Redux (user interface libraries), 512 social graphs, 49 Wormhole (change data capture), 455 fact tables, 93 failover, 157, 554 (see also leader-based replication) in leaderless replication, absence of, 178 leader election, 301, 348, 352 potential problems, 157 failures amplification by distributed transactions, 364, 495 failure detection, 280 automatic rebalancing causing cascading failures, 214 perfect failure detectors, 359 timeouts and unbounded delays, 282, 284 using ZooKeeper, 371 faults versus, 7 partial failures in distributed systems, 275-277, 310 fan-out (messaging systems), 11, 445 fault tolerance, 6-10, 555 abstractions for, 321 formalization in consensus, 365-369 use of replication, 367 human fault tolerance, 414 in batch processing, 406, 414, 422, 425 in log-based systems, 520, 524-526 in stream processing, 476-479 atomic commit, 477 idempotence, 478 maintaining derived state, 495 microbatching and checkpointing, 477 rebuilding state after a failure, 478 of distributed transactions, 362-364 transaction atomicity, 223, 354-361 faults, 6 Byzantine faults, 304-306 failures versus, 7 handled by transactions, 221 handling in supercomputers and cloud computing, 275 hardware, 7 in batch processing versus distributed databases, 417 in distributed systems, 274-277 introducing deliberately, 7, 280 network faults, 279-281 asymmetric faults, 300 detecting, 280 tolerance of, in multi-leader replication, 169 software errors, 8 tolerating (see fault tolerance) federated databases, 501 fence (CPU instruction), 338 fencing (preventing split brain), 158,
302-304 generating fencing tokens, 349, 370 properties of fencing tokens, 308 stream processors writing to databases, 478, 517 Fibre Channel (networks), 398 field tags (Thrift and Protocol Buffers), 119-121 file descriptors (Unix), 395 financial data, 460 Firebase (database), 456 Flink (processing framework), 421-423 dataflow APIs, 427 fault tolerance, 422, 477, 479 Gelly API (graph processing), 425 integration of batch and stream processing, 495, 498 machine learning, 428 query optimizer, 427 stream processing, 466 flow control, 282, 441, 555 FLP result (on consensus), 353 FlumeJava (dataflow library), 403, 427 followers, 152, 555 (see also leader-based replication) foreign keys, 38, 403 forward compatibility, 112 forward decay (algorithm), 16 Fossil (version control system), 463 shunning (deleting data), 463 FoundationDB (database) serializable transactions, 261, 265, 364 fractal trees, 83 full table scans, 403 full-text search, 555 and fuzzy indexes, 88 building search indexes, 411 Lucene storage engine, 79 functional reactive programming (FRP), 504 functional requirements, 22 futures (asynchronous operations), 135 fuzzy search (see similarity search) G garbage collection immutability and, 463 process pauses for, 14, 296-299, 301 (see also process pauses) genome analysis, 63, 429 geographically distributed datacenters, 145, 164, 278, 493 geospatial indexes, 87 Giraph (graph processing), 425 Git (version control system), 174, 342, 463 GitHub, postmortems, 157, 158, 309 global indexes (see term-partitioned indexes) GlusterFS (distributed filesystem), 398 GNU Coreutils (Linux), 394 GoldenGate (change data capture), 161, 170, 455 (see also Oracle) Google Bigtable (database) data model (see Bigtable data model) partitioning scheme, 199, 202 storage layout, 78 Chubby (lock service), 370 Cloud Dataflow (stream processor), 466, 477, 498 (see also Beam) Cloud Pub/Sub (messaging), 444, 448 Docs (collaborative editor), 170 Dremel (query engine), 93, 96
FlumeJava (dataflow library), 403, 427 GFS (distributed file system), 398 gRPC (RPC framework), 135 MapReduce (batch processing), 390 (see also MapReduce) building search indexes, 411 task preemption, 418 Pregel (graph processing), 425 Spanner (see Spanner) TrueTime (clock API), 294 gossip protocol, 216 government use of data, 541 GPS (Global Positioning System) use for clock synchronization, 287, 290, 294, 295 GraphChi (graph processing), 426 graphs, 555 as data models, 49-63 example of graph-structured data, 49 property graphs, 50 RDF and triple-stores, 55-59 versus the network model, 60 processing and analysis, 424-426 fault tolerance, 425 Pregel processing model, 425 query languages Cypher, 52 Datalog, 60-63 recursive SQL queries, 53 SPARQL, 59-59 Gremlin (graph query language), 50 grep (Unix tool), 392 GROUP BY clause (SQL), 406 grouping records in MapReduce, 406 handling skew, 407 H Hadoop (data infrastructure) comparison to distributed databases, 390 comparison to MPP databases, 414-418 comparison to Unix, 413-414, 499 diverse processing models in ecosystem, 417 HDFS distributed filesystem (see HDFS) higher-level tools, 403 join algorithms, 403-410 (see also MapReduce) MapReduce (see MapReduce) YARN (see YARN) happens-before relationship, 340 capturing, 187 concurrency and, 186 hard disks access patterns, 84 detecting corruption, 519, 530 faults in, 7, 227 sequential write throughput, 75, 450 hardware faults, 7 hash indexes, 72-75 broadcast hash joins, 409 partitioned hash joins, 409 hash partitioning, 203-205, 217 consistent hashing, 204 problems with hash mod N, 210 range queries, 204 suitable hash functions, 203 with fixed number of partitions, 210 HAWQ (database), 428 HBase (database) bug due to lack of fencing, 302 bulk loading, 413 column-family data model, 41, 99 dynamic partitioning, 212 key-range partitioning, 202 log-structured storage, 78 request routing, 216 size-tiered compaction, 79 use of HDFS, 417 use of ZooKeeper, 370 HDFS
(Hadoop Distributed File System), 398-399 (see also distributed filesystems) checking data integrity, 530 decoupling from query engines, 417 indiscriminately dumping data into, 415 metadata about datasets, 410 NameNode, 398 use by Flink, 479 use by HBase, 212 use by MapReduce, 402 HdrHistogram (numerical library), 16 head (Unix tool), 392 head vertex (property graphs), 51 head-of-line blocking, 15 heap files (databases), 86 Helix (cluster manager), 216 heterogeneous distributed transactions, 360, 364 heuristic decisions (in 2PC), 363 Hibernate (object-relational mapper), 30 hierarchical model, 36 high availability (see fault tolerance) high-frequency trading, 290, 299 high-performance computing (HPC), 275 hinted handoff, 183 histograms, 16 Hive (query engine), 419, 427 for data warehouses, 93 HCatalog and metastore, 410 map-side joins, 409 query optimizer, 427 skewed joins, 408 workflows, 403 Hollerith machines, 390 hopping windows (stream processing), 472 (see also windows) horizontal scaling (see scaling out) HornetQ (messaging), 137, 444 distributed transaction support, 361 hot spots, 201 due to celebrities, 205 for time-series data, 203 in batch processing, 407 relieving, 205 hot standbys (see leader-based replication) HTTP, use in APIs (see services) human errors, 9, 279, 414 HyperDex (database), 88 HyperLogLog (algorithm), 466 I I/O operations, waiting for, 297 IBM DB2 (database) distributed transaction support, 361 recursive query support, 54 serializable isolation, 242, 257 XML and JSON support, 30, 42 electromechanical card-sorting machines, 390 IMS (database), 36 imperative query APIs, 46 InfoSphere Streams (CEP engine), 466 MQ (messaging), 444 distributed transaction support, 361 System R (database), 222 WebSphere (messaging), 137 idempotence, 134, 478, 555 by giving operations unique IDs, 518, 522 idempotent operations, 517 immutability advantages of, 460, 531 deriving state from event log, 459-464 for crash recovery, 75 in B-trees, 82, 242
in event sourcing, 457 inputs to Unix commands, 397 limitations of, 463 Impala (query engine) for data warehouses, 93 hash joins, 409 native code generation, 428 use of HDFS, 417 impedance mismatch, 29 imperative languages, 42 setting element styles (example), 45 in doubt (transaction status), 358 holding locks, 362 orphaned transactions, 363 in-memory databases, 88 durability, 227 serial transaction execution, 253 incidents cascading failures, 9 crashes due to leap seconds, 290 data corruption and financial losses due to concurrency bugs, 233 data corruption on hard disks, 227 data loss due to last-write-wins, 173, 292 data on disks unreadable, 309 deleted items reappearing, 174 disclosure of sensitive data due to primary key reuse, 157 errors in transaction serializability, 529 gigabit network interface with 1 Kb/s throughput, 311 network faults, 279 network interface dropping only inbound packets, 279 network partitions and whole-datacenter failures, 275 poor handling of network faults, 280 sending message to ex-partner, 494 sharks biting undersea cables, 279 split brain due to 1-minute packet delay, 158, 279 vibrations in server rack, 14 violation of uniqueness constraint, 529 indexes, 71, 555 and snapshot isolation, 241 as derived data, 386, 499-504 B-trees, 79-83 building in batch processes, 411 clustered, 86 comparison of B-trees and LSM-trees, 83-85 concatenated, 87 covering (with included columns), 86 creating, 500 full-text search, 88 geospatial, 87 hash, 72-75 index-range locking, 260 multi-column, 87 partitioning and secondary indexes, 206-209, 217 secondary, 85 (see also secondary indexes) problems with dual writes, 452, 491 SSTables and LSM-trees, 76-79 updating when data changes, 452, 467 Industrial Revolution, 541 InfiniBand (networks), 285 InfiniteGraph (database), 50 InnoDB (storage engine) clustered index on primary key, 86 not preventing lost updates, 245 preventing write skew, 248, 257 serializable isolation, 257 snapshot isolation
support, 239 inside-out databases, 504 (see also unbundling databases) integrating different data systems (see data integration) integrity, 524 coordination-avoiding data systems, 528 correctness of dataflow systems, 525 in consensus formalization, 365 integrity checks, 530 (see also auditing) end-to-end, 519, 531 use of snapshot isolation, 238 maintaining despite software bugs, 529 Interface Definition Language (IDL), 117, 122 intermediate state, materialization of, 420-423 internet services, systems for implementing, 275 invariants, 225 (see also constraints) inversion of control, 396 IP (Internet Protocol) unreliability of, 277 ISDN (Integrated Services Digital Network), 284 isolation (in transactions), 225, 228, 555 correctness and, 515 for single-object writes, 230 serializability, 251-266 actual serial execution, 252-256 serializable snapshot isolation (SSI), 261-266 two-phase locking (2PL), 257-261 violating, 228 weak isolation levels, 233-251 preventing lost updates, 242-246 read committed, 234-237 snapshot isolation, 237-242 iterative processing, 424-426 J Java Database Connectivity (JDBC) distributed transaction support, 361 network drivers, 128 Java Enterprise Edition (EE), 134, 356, 361 Java Message Service (JMS), 444 (see also messaging systems) comparison to log-based messaging, 448, 451 distributed transaction support, 361 message ordering, 446 Java Transaction API (JTA), 355, 361 Java Virtual Machine (JVM) bytecode generation, 428 garbage collection pauses, 296 process reuse in batch processors, 422 JavaScript in MapReduce querying, 46 setting element styles (example), 45 use in advanced queries, 48 Jena (RDF framework), 57 Jepsen (fault tolerance testing), 515 jitter (network delay), 284 joins, 555 by index lookup, 403 expressing as relational operators, 427 in relational and document databases, 34 MapReduce map-side joins, 408-410 broadcast hash joins, 409 merge joins, 410 partitioned hash joins, 409 MapReduce reduce-side joins, 403-408 handling 
skew, 407 sort-merge joins, 405 parallel execution of, 415 secondary indexes and, 85 stream joins, 472-476 stream-stream join, 473 stream-table join, 473 table-table join, 474 time-dependence of, 475 support in document databases, 42 JOTM (transaction coordinator), 356 JSON Avro schema representation, 122 binary variants, 115 for application data, issues with, 114 in relational databases, 30, 42 representing a résumé (example), 31 Juttle (query language), 504 K k-nearest neighbors, 429 Kafka (messaging), 137, 448 Kafka Connect (database integration), 457, 461 Kafka Streams (stream processor), 466, 467 fault tolerance, 479 leader-based replication, 153 log compaction, 456, 467 message offsets, 447, 478 request routing, 216 transaction support, 477 usage example, 4 Ketama (partitioning library), 213 key-value stores, 70 as batch process output, 412 hash indexes, 72-75 in-memory, 89 partitioning, 201-205 by hash of key, 203, 217 by key range, 202, 217 dynamic partitioning, 212 skew and hot spots, 205 Kryo (Java), 113 Kubernetes (cluster manager), 418, 506 L lambda architecture, 497 Lamport timestamps, 345 Large Hadron Collider (LHC), 64 last write wins (LWW), 173, 334 discarding concurrent writes, 186 problems with, 292 prone to lost updates, 246 late binding, 396 latency instability under two-phase locking, 259 network latency and resource utilization, 286 response time versus, 14 tail latency, 15, 207 leader-based replication, 152-161 (see also replication) failover, 157, 301 handling node outages, 156 implementation of replication logs change data capture, 454-457 (see also changelogs) statement-based, 158 trigger-based replication, 161 write-ahead log (WAL) shipping, 159 linearizability of operations, 333 locking and leader election, 330 log sequence number, 156, 449 read-scaling architecture, 161 relation to consensus, 367 setting up new followers, 155 synchronous versus asynchronous, 153-155 leaderless replication, 177-191 (see also replication)
detecting concurrent writes, 184-191 capturing happens-before relationship, 187 happens-before relationship and concurrency, 186 last write wins, 186 merging concurrently written values, 190 version vectors, 191 multi-datacenter, 184 quorums, 179-182 consistency limitations, 181-183, 334 sloppy quorums and hinted handoff, 183 read repair and anti-entropy, 178 leap seconds, 8, 290 in time-of-day clocks, 288 leases, 295 implementation with ZooKeeper, 370 need for fencing, 302 ledgers, 460 distributed ledger technologies, 532 legacy systems, maintenance of, 18 less (Unix tool), 397 LevelDB (storage engine), 78 leveled compaction, 79 Levenshtein automata, 88 limping (partial failure), 311 linearizability, 324-338, 555 cost of, 335-338 CAP theorem, 336 memory on multi-core CPUs, 338 definition, 325-329 implementing with total order broadcast, 350 in ZooKeeper, 370 of derived data systems, 492, 524 avoiding coordination, 527 of different replication methods, 332-335 using quorums, 334 relying on, 330-332 constraints and uniqueness, 330 cross-channel timing dependencies, 331 locking and leader election, 330 stronger than causal consistency, 342 using to implement total order broadcast, 351 versus serializability, 329 LinkedIn Azkaban (workflow scheduler), 402 Databus (change data capture), 161, 455 Espresso (database), 31, 126, 130, 153, 216 Helix (cluster manager) (see Helix) profile (example), 30 reference to company entity (example), 34 Rest.li (RPC framework), 135 Voldemort (database) (see Voldemort) Linux, leap second bug, 8, 290 liveness properties, 308 LMDB (storage engine), 82, 242 load approaches to coping with, 17 describing, 11 load testing, 16 load balancing (messaging), 444 local indexes (see document-partitioned indexes) locality (data access), 32, 41, 555 in batch processing, 400, 405, 421 in stateful clients, 170, 511 in stream processing, 474, 478, 508, 522 location transparency, 134 in the actor model, 138 locks, 556 deadlock, 258
distributed locking, 301-304, 330 fencing tokens, 303 implementation with ZooKeeper, 370 relation to consensus, 374 for transaction isolation in snapshot isolation, 239 in two-phase locking (2PL), 257-261 making operations atomic, 243 performance, 258 preventing dirty writes, 236 preventing phantoms with index-range locks, 260, 265 read locks (shared mode), 236, 258 shared mode and exclusive mode, 258 in two-phase commit (2PC) deadlock detection, 364 in-doubt transactions holding locks, 362 materializing conflicts with, 251 preventing lost updates by explicit locking, 244 log sequence number, 156, 449 logic programming languages, 504 logical clocks, 293, 343, 494 for read-after-write consistency, 164 logical logs, 160 logs (data structure), 71, 556 advantages of immutability, 460 compaction, 73, 79, 456, 460 for stream operator state, 479 creating using total order broadcast, 349 implementing uniqueness constraints, 522 log-based messaging, 446-451 comparison to traditional messaging, 448, 451 consumer offsets, 449 disk space usage, 450 replaying old messages, 451, 496, 498 slow consumers, 450 using logs for message storage, 447 log-structured storage, 71-79 log-structured merge tree (see LSM-trees) replication, 152, 158-161 change data capture, 454-457 (see also changelogs) coordination with snapshot, 156 logical (row-based) replication, 160 statement-based replication, 158 trigger-based replication, 161 write-ahead log (WAL) shipping, 159 scalability limits, 493 loose coupling, 396, 419, 502 lost updates (see updates) LSM-trees (indexes), 78-79 comparison to B-trees, 83-85 Lucene (storage engine), 79 building indexes in batch processes, 411 similarity search, 88 Luigi (workflow scheduler), 402 LWW (see last write wins) M machine learning ethical considerations, 534 (see also ethics) iterative processing, 424 models derived from training data, 505 statistical and numerical algorithms, 428 MADlib (machine learning toolkit), 428 magic scaling sauce, 18 Mahout
(machine learning toolkit), 428 maintainability, 18-22, 489 defined, 23 design principles for software systems, 19 evolvability (see evolvability) operability, 19 simplicity and managing complexity, 20 many-to-many relationships in document model versus relational model, 39 modeling as graphs, 49 many-to-one and many-to-many relationships, 33-36 many-to-one relationships, 34 MapReduce (batch processing), 390, 399-400 accessing external services within job, 404, 412 comparison to distributed databases designing for frequent faults, 417 diversity of processing models, 416 diversity of storage, 415 comparison to stream processing, 464 comparison to Unix, 413-414 disadvantages and limitations of, 419 fault tolerance, 406, 414, 422 higher-level tools, 403, 426 implementation in Hadoop, 400-403 the shuffle, 402 implementation in MongoDB, 46-48 machine learning, 428 map-side processing, 408-410 broadcast hash joins, 409 merge joins, 410 partitioned hash joins, 409 mapper and reducer functions, 399 materialization of intermediate state, 419-423 output of batch workflows, 411-413 building search indexes, 411 key-value stores, 412 reduce-side processing, 403-408 analysis of user activity events (example), 404 grouping records by same key, 406 handling skew, 407 sort-merge joins, 405 workflows, 402 marshalling (see encoding) massively parallel processing (MPP), 216 comparison to composing storage technologies, 502 comparison to Hadoop, 414-418, 428 master-master replication (see multi-leader replication) master-slave replication (see leader-based replication) materialization, 556 aggregate values, 101 conflicts, 251 intermediate state (batch processing), 420-423 materialized views, 101 as derived data, 386, 499-504 maintaining, using stream processing, 467, 475 Maven (Java build tool), 428 Maxwell (change data capture), 455 mean, 14 media monitoring, 467 median, 14 meeting room booking (example), 249, 259, 521 membership services, 372 Memcached
(caching server), 4, 89 memory in-memory databases, 88 durability, 227 serial transaction execution, 253 in-memory representation of data, 112 random bit-flips in, 529 use by indexes, 72, 77 memory barrier (CPU instruction), 338 MemSQL (database) in-memory storage, 89 read committed isolation, 236 memtable (in LSM-trees), 78 Mercurial (version control system), 463 merge joins, MapReduce map-side, 410 mergeable persistent data structures, 174 merging sorted files, 76, 402, 405 Merkle trees, 532 Mesos (cluster manager), 418, 506 message brokers (see messaging systems) message-passing, 136-139 advantages over direct RPC, 137 distributed actor frameworks, 138 evolvability, 138 MessagePack (encoding format), 116 messages exactly-once semantics, 360, 476 loss of, 442 using total order broadcast, 348 messaging systems, 440-451 (see also streams) backpressure, buffering, or dropping messages, 441 brokerless messaging, 442 event logs, 446-451 comparison to traditional messaging, 448, 451 consumer offsets, 449 replaying old messages, 451, 496, 498 slow consumers, 450 message brokers, 443-446 acknowledgements and redelivery, 445 comparison to event logs, 448, 451 multiple consumers of same topic, 444 reliability, 442 uniqueness in log-based messaging, 522 Meteor (web framework), 456 microbatching, 477, 495 microservices, 132 (see also services) causal dependencies across services, 493 loose coupling, 502 relation to batch/stream processors, 389, 508 Microsoft Azure Service Bus (messaging), 444 Azure Storage, 155, 398 Azure Stream Analytics, 466 DCOM (Distributed Component Object Model), 134 MSDTC (transaction coordinator), 356 Orleans (see Orleans) SQL Server (see SQL Server) migrating (rewriting) data, 40, 130, 461, 497 modulus operator (%), 210 MongoDB (database) aggregation pipeline, 48 atomic operations, 243 BSON, 41 document data model, 31 hash partitioning (sharding), 203-204 key-range partitioning, 202 lack of join support, 34, 42 leader-based replication, 153
MapReduce support, 46, 400 oplog parsing, 455, 456 partition splitting, 212 request routing, 216 secondary indexes, 207 Mongoriver (change data capture), 455 monitoring, 10, 19 monotonic clocks, 288 monotonic reads, 164 MPP (see massively parallel processing) MSMQ (messaging), 361 multi-column indexes, 87 multi-leader replication, 168-177 (see also replication) handling write conflicts, 171 conflict avoidance, 172 converging toward a consistent state, 172 custom conflict resolution logic, 173 determining what is a conflict, 174 linearizability, lack of, 333 replication topologies, 175-177 use cases, 168 clients with offline operation, 170 collaborative editing, 170 multi-datacenter replication, 168, 335 multi-object transactions, 228 need for, 231 Multi-Paxos (total order broadcast), 367 multi-table index cluster tables (Oracle), 41 multi-tenancy, 284 multi-version concurrency control (MVCC), 239, 266 detecting stale MVCC reads, 263 indexes and snapshot isolation, 241 mutual exclusion, 261 (see also locks) MySQL (database) binlog coordinates, 156 binlog parsing for change data capture, 455 circular replication topology, 175 consistent snapshots, 156 distributed transaction support, 361 InnoDB storage engine (see InnoDB) JSON support, 30, 42 leader-based replication, 153 performance of XA transactions, 360 row-based replication, 160 schema changes in, 40 snapshot isolation support, 242 (see also InnoDB) statement-based replication, 159 Tungsten Replicator (multi-leader replication), 170 conflict detection, 177 N nanomsg (messaging library), 442 Narayana (transaction coordinator), 356 NATS (messaging), 137 near-real-time (nearline) processing, 390 (see also stream processing) Neo4j (database) Cypher query language, 52 graph data model, 50 Nephele (dataflow engine), 421 netcat (Unix tool), 397 Netflix Chaos Monkey, 7, 280 Network Attached Storage (NAS), 146, 398 network model, 36 graph databases versus, 60 imperative query APIs, 46 Network Time Protocol 
(see NTP) networks congestion and queueing, 282 datacenter network topologies, 276 faults (see faults) linearizability and network delays, 338 network partitions, 279, 337 timeouts and unbounded delays, 281 next-key locking, 260 nodes (in graphs) (see vertices) nodes (processes), 556 handling outages in leader-based replication, 156 system models for failure, 307 noisy neighbors, 284 nonblocking atomic commit, 359 nondeterministic operations accidental nondeterminism, 423 partial failures in distributed systems, 275 nonfunctional requirements, 22 nonrepeatable reads, 238 (see also read skew) normalization (data representation), 33, 556 executing joins, 39, 42, 403 foreign key references, 231 in systems of record, 386 versus denormalization, 462 NoSQL, 29, 499 transactions and, 223 Notation3 (N3), 56 npm (package manager), 428 NTP (Network Time Protocol), 287 accuracy, 289, 293 adjustments to monotonic clocks, 289 multiple server addresses, 306 numbers, in XML and JSON encodings, 114 O object-relational mapping (ORM) frameworks, 30 error handling and aborted transactions, 232 unsafe read-modify-write cycle code, 244 object-relational mismatch, 29 observer pattern, 506 offline systems, 390 (see also batch processing) stateful, offline-capable clients, 170, 511 offline-first applications, 511 offsets consumer offsets in partitioned logs, 449 messages in partitioned logs, 447 OLAP (online analytic processing), 91, 556 data cubes, 102 OLTP (online transaction processing), 90, 556 analytics queries versus, 411 workload characteristics, 253 one-to-many relationships, 30 JSON representation, 32 online systems, 389 (see also services) Oozie (workflow scheduler), 402 OpenAPI (service definition format), 133 OpenStack Nova (cloud infrastructure) use of ZooKeeper, 370 Swift (object storage), 398 operability, 19 operating systems versus databases, 499 operation identifiers, 518, 522 operational transformation, 174 operators, 421 flow of data between, 424 in stream 
processing, 464 optimistic concurrency control, 261 Oracle (database) distributed transaction support, 361 GoldenGate (change data capture), 161, 170, 455 lack of serializability, 226 leader-based replication, 153 multi-table index cluster tables, 41 not preventing write skew, 248 partitioned indexes, 209 PL/SQL language, 255 preventing lost updates, 245 read committed isolation, 236 Real Application Clusters (RAC), 330 recursive query support, 54 snapshot isolation support, 239, 242 TimesTen (in-memory database), 89 WAL-based replication, 160 XML support, 30 ordering, 339-352 by sequence numbers, 343-348 causal ordering, 339-343 partial order, 341 limits of total ordering, 493 total order broadcast, 348-352 Orleans (actor framework), 139 outliers (response time), 14 Oz (programming language), 504 P package managers, 428, 505 packet switching, 285 packets corruption of, 306 sending via UDP, 442 PageRank (algorithm), 49, 424 paging (see virtual memory) ParAccel (database), 93 parallel databases (see massively parallel pro‐ cessing) parallel execution of graph analysis algorithms, 426 queries in MPP databases, 216 Parquet (data format), 96, 131 (see also column-oriented storage) use in Hadoop, 414 partial failures, 275, 310 limping, 311 partial order, 341 partitioning, 199-218, 556 and replication, 200 in batch processing, 429 multi-partition operations, 514 enforcing constraints, 522 secondary index maintenance, 495 of key-value data, 201-205 by key range, 202 skew and hot spots, 205 rebalancing partitions, 209-214 automatic or manual rebalancing, 213 problems with hash mod N, 210 using dynamic partitioning, 212 using fixed number of partitions, 210 using N partitions per node, 212 replication and, 147 request routing, 214-216 secondary indexes, 206-209 document-based partitioning, 206 term-based partitioning, 208 serial execution of transactions and, 255 Paxos (consensus algorithm), 366 ballot number, 368 Multi-Paxos (total order broadcast), 367 percentiles, 14, 
556 calculating efficiently, 16 importance of high percentiles, 16 use in service level agreements (SLAs), 15 Percona XtraBackup (MySQL tool), 156 performance describing, 13 of distributed transactions, 360 of in-memory databases, 89 of linearizability, 338 of multi-leader replication, 169 perpetual inconsistency, 525 pessimistic concurrency control, 261 phantoms (transaction isolation), 250 materializing conflicts, 251 preventing, in serializability, 259 physical clocks (see clocks) pickle (Python), 113 Pig (dataflow language), 419, 427 replicated joins, 409 skewed joins, 407 workflows, 403 Pinball (workflow scheduler), 402 pipelined execution, 423 in Unix, 394 point in time, 287 polyglot persistence, 29 polystores, 501 PostgreSQL (database) BDR (multi-leader replication), 170 causal ordering of writes, 177 Bottled Water (change data capture), 455 Bucardo (trigger-based replication), 161, 173 distributed transaction support, 361 foreign data wrappers, 501 full text search support, 490 leader-based replication, 153 log sequence number, 156 MVCC implementation, 239, 241 PL/pgSQL language, 255 PostGIS geospatial indexes, 87 preventing lost updates, 245 preventing write skew, 248, 261 read committed isolation, 236 recursive query support, 54 representing graphs, 51 serializable snapshot isolation (SSI), 261 snapshot isolation support, 239, 242 WAL-based replication, 160 XML and JSON support, 30, 42 pre-splitting, 212 Precision Time Protocol (PTP), 290 predicate locks, 259 predictive analytics, 533-536 amplifying bias, 534 ethics of (see ethics) feedback loops, 536 preemption of datacenter resources, 418 of threads, 298 Pregel processing model, 425 primary keys, 85, 556 compound primary key (Cassandra), 204 primary-secondary replication (see leader-based replication) privacy, 536-543 consent and freedom of choice, 538 data as assets and power, 540 deleting data, 463 ethical considerations (see ethics) legislation and self-regulation, 542 meaning of, 539 
surveillance, 537 tracking behavioral data, 536 probabilistic algorithms, 16, 466 process pauses, 295-299 processing time (of events), 469 producers (message streams), 440 programming languages dataflow languages, 504 for stored procedures, 255 functional reactive programming (FRP), 504 logic programming, 504 Prolog (language), 61 (see also Datalog) promises (asynchronous operations), 135 property graphs, 50 Cypher query language, 52 Protocol Buffers (data format), 117-121 field tags and schema evolution, 120 provenance of data, 531 publish/subscribe model, 441 publishers (message streams), 440 punch card tabulating machines, 390 pure functions, 48 putting computation near data, 400 Q Qpid (messaging), 444 quality of service (QoS), 285 Quantcast File System (distributed filesystem), 398 query languages, 42-48 aggregation pipeline, 48 CSS and XSL, 44 Cypher, 52 Datalog, 60 Juttle, 504 MapReduce querying, 46-48 recursive SQL queries, 53 relational algebra and SQL, 42 SPARQL, 59 query optimizers, 37, 427 queueing delays (networks), 282 head-of-line blocking, 15 latency and response time, 14 queues (messaging), 137 quorums, 179-182, 556 for leaderless replication, 179 in consensus algorithms, 368 limitations of consistency, 181-183, 334 making decisions in distributed systems, 301 monitoring staleness, 182 multi-datacenter replication, 184 relying on durability, 309 sloppy quorums and hinted handoff, 183 R R-trees (indexes), 87 RabbitMQ (messaging), 137, 444 leader-based replication, 153 race conditions, 225 (see also concurrency) avoiding with linearizability, 331 caused by dual writes, 452 dirty writes, 235 in counter increments, 235 lost updates, 242-246 preventing with event logs, 462, 507 preventing with serializable isolation, 252 write skew, 246-251 Raft (consensus algorithm), 366 sensitivity to network problems, 369 term number, 368 use in etcd, 353 RAID (Redundant Array of Independent Disks), 7, 398 railways, schema migration on, 496 RAMCloud 
(in-memory storage), 89 ranking algorithms, 424 RDF (Resource Description Framework), 57 querying with SPARQL, 59 RDMA (Remote Direct Memory Access), 276 read committed isolation level, 234-237 implementing, 236 multi-version concurrency control (MVCC), 239 no dirty reads, 234 no dirty writes, 235 read path (derived data), 509 read repair (leaderless replication), 178 for linearizability, 335 read replicas (see leader-based replication) read skew (transaction isolation), 238, 266 as violation of causality, 340 read-after-write consistency, 163, 524 cross-device, 164 read-modify-write cycle, 243 read-scaling architecture, 161 reads as events, 513 real-time collaborative editing, 170 near-real-time processing, 390 (see also stream processing) publish/subscribe dataflow, 513 response time guarantees, 298 time-of-day clocks, 288 rebalancing partitions, 209-214, 556 (see also partitioning) automatic or manual rebalancing, 213 dynamic partitioning, 212 fixed number of partitions, 210 fixed number of partitions per node, 212 problems with hash mod N, 210 recency guarantee, 324 recommendation engines batch process outputs, 412 batch workflows, 403, 420 iterative processing, 424 statistical and numerical algorithms, 428 records, 399 events in stream processing, 440 recursive common table expressions (SQL), 54 redelivery (messaging), 445 Redis (database) atomic operations, 243 durability, 89 Lua scripting, 255 single-threaded execution, 253 usage example, 4 redundancy hardware components, 7 of derived data, 386 (see also derived data) Reed–Solomon codes (error correction), 398 refactoring, 22 (see also evolvability) regions (partitioning), 199 register (data structure), 325 relational data model, 28-42 comparison to document model, 38-42 graph queries in SQL, 53 in-memory databases with, 89 many-to-one and many-to-many relationships, 33 multi-object transactions, need for, 231 NoSQL as alternative to, 29 object-relational mismatch, 29 relational algebra and SQL, 42 versus 
document model convergence of models, 41 data locality, 41 relational databases eventual consistency, 162 history, 28 leader-based replication, 153 logical logs, 160 philosophy compared to Unix, 499, 501 schema changes, 40, 111, 130 statement-based replication, 158 use of B-tree indexes, 80 relationships (see edges) reliability, 6-10, 489 building a reliable system from unreliable components, 276 defined, 6, 22 hardware faults, 7 human errors, 9 importance of, 10 of messaging systems, 442 software errors, 8 Remote Method Invocation (Java RMI), 134 remote procedure calls (RPCs), 134-136 (see also services) based on futures, 135 data encoding and evolution, 136 issues with, 134 using Avro, 126, 135 using Thrift, 135 versus message brokers, 137 repeatable reads (transaction isolation), 242 replicas, 152 replication, 151-193, 556 and durability, 227 chain replication, 155 conflict resolution and, 246 consistency properties, 161-167 consistent prefix reads, 165 monotonic reads, 164 reading your own writes, 162 in distributed filesystems, 398 leaderless, 177-191 detecting concurrent writes, 184-191 limitations of quorum consistency, 181-183, 334 sloppy quorums and hinted handoff, 183 monitoring staleness, 182 multi-leader, 168-177 across multiple datacenters, 168, 335 handling write conflicts, 171-175 replication topologies, 175-177 partitioning and, 147, 200 reasons for using, 145, 151 single-leader, 152-161 failover, 157 implementation of replication logs, 158-161 relation to consensus, 367 setting up new followers, 155 synchronous versus asynchronous, 153-155 state machine replication, 349, 452 using erasure coding, 398 with heterogeneous data systems, 453 replication logs (see logs) reprocessing data, 496, 498 (see also evolvability) from log-based messaging, 451 request routing, 214-216 approaches to, 214 parallel query execution, 216 resilient systems, 6 (see also fault tolerance) response time as performance metric for services, 13, 389 
guarantees on, 298 latency versus, 14 mean and percentiles, 14 user experience, 15 responsibility and accountability, 535 REST (Representational State Transfer), 133 (see also services) RethinkDB (database) document data model, 31 dynamic partitioning, 212 join support, 34, 42 key-range partitioning, 202 leader-based replication, 153 subscribing to changes, 456 Riak (database) Bitcask storage engine, 72 CRDTs, 174, 191 dotted version vectors, 191 gossip protocol, 216 hash partitioning, 203-204, 211 last-write-wins conflict resolution, 186 leaderless replication, 177 LevelDB storage engine, 78 linearizability, lack of, 335 multi-datacenter support, 184 preventing lost updates across replicas, 246 rebalancing, 213 search feature, 209 secondary indexes, 207 siblings (concurrently written values), 190 sloppy quorums, 184 ring buffers, 450 Ripple (cryptocurrency), 532 rockets, 10, 36, 305 RocksDB (storage engine), 78 leveled compaction, 79 rollbacks (transactions), 222 rolling upgrades, 8, 112 routing (see request routing) row-oriented storage, 96 row-based replication, 160 rowhammer (memory corruption), 529 RPCs (see remote procedure calls) Rubygems (package manager), 428 rules (Datalog), 61 S safety and liveness properties, 308 in consensus algorithms, 366 in transactions, 222 sagas (see compensating transactions) Samza (stream processor), 466, 467 fault tolerance, 479 streaming SQL support, 466 sandboxes, 9 SAP HANA (database), 93 scalability, 10-18, 489 approaches for coping with load, 17 defined, 22 describing load, 11 describing performance, 13 partitioning and, 199 replication and, 161 scaling up versus scaling out, 146 scaling out, 17, 146 (see also shared-nothing architecture) scaling up, 17, 146 scatter/gather approach, querying partitioned databases, 207 SCD (slowly changing dimension), 476 schema-on-read, 39 comparison to evolvable schema, 128 in distributed filesystems, 415 schema-on-write, 39 schemaless databases (see schema-on-read) schemas, 557 Avro, 
122-127 reader determining writer’s schema, 125 schema evolution, 123 dynamically generated, 126 evolution of, 496 affecting application code, 111 compatibility checking, 126 in databases, 129-131 in message-passing, 138 in service calls, 136 flexibility in document model, 39 for analytics, 93-95 for JSON and XML, 115 merits of, 127 schema migration on railways, 496 Thrift and Protocol Buffers, 117-121 schema evolution, 120 traditional approach to design, fallacy in, 462 searches building search indexes in batch processes, 411 k-nearest neighbors, 429 on streams, 467 partitioned secondary indexes, 206 secondaries (see leader-based replication) secondary indexes, 85, 557 partitioning, 206-209, 217 document-partitioned, 206 index maintenance, 495 term-partitioned, 208 problems with dual writes, 452, 491 updating, transaction isolation and, 231 secondary sorts, 405 sed (Unix tool), 392 self-describing files, 127 self-joins, 480 self-validating systems, 530 semantic web, 57 semi-synchronous replication, 154 sequence number ordering, 343-348 generators, 294, 344 insufficiency for enforcing constraints, 347 Lamport timestamps, 345 use of timestamps, 291, 295, 345 sequential consistency, 351 serializability, 225, 233, 251-266, 557 linearizability versus, 329 pessimistic versus optimistic concurrency control, 261 serial execution, 252-256 partitioning, 255 using stored procedures, 253, 349 serializable snapshot isolation (SSI), 261-266 detecting stale MVCC reads, 263 detecting writes that affect prior reads, 264 distributed execution, 265, 364 performance of SSI, 265 preventing write skew, 262-265 two-phase locking (2PL), 257-261 index-range locks, 260 performance, 258 Serializable (Java), 113 serialization, 113 (see also encoding) service discovery, 135, 214, 372 using DNS, 216, 372 service level agreements (SLAs), 15 service-oriented architecture (SOA), 132 (see also services) services, 131-136 microservices, 132 causal dependencies across services, 493 loose 
coupling, 502 relation to batch/stream processors, 389, 508 remote procedure calls (RPCs), 134-136 issues with, 134 similarity to databases, 132 web services, 132, 135 session windows (stream processing), 472 (see also windows) sessionization, 407 sharding (see partitioning) shared mode (locks), 258 shared-disk architecture, 146, 398 shared-memory architecture, 146 shared-nothing architecture, 17, 146-147, 557 (see also replication) distributed filesystems, 398 (see also distributed filesystems) partitioning, 199 use of network, 277 sharks biting undersea cables, 279 counting (example), 46-48 finding (example), 42 website about (example), 44 shredding (in relational model), 38 siblings (concurrent values), 190, 246 (see also conflicts) similarity search edit distance, 88 genome data, 63 k-nearest neighbors, 429 single-leader replication (see leader-based replication) single-threaded execution, 243, 252 in batch processing, 406, 421, 426 in stream processing, 448, 463, 522 size-tiered compaction, 79 skew, 557 clock skew, 291-294, 334 in transaction isolation read skew, 238, 266 write skew, 246-251, 262-265 (see also write skew) meanings of, 238 unbalanced workload, 201 compensating for, 205 due to celebrities, 205 for time-series data, 203 in batch processing, 407 slaves (see leader-based replication) sliding windows (stream processing), 472 (see also windows) sloppy quorums, 183 (see also quorums) lack of linearizability, 334 slowly changing dimension (data warehouses), 476 smearing (leap seconds adjustments), 290 snapshots (databases) causal consistency, 340 computing derived data, 500 in change data capture, 455 serializable snapshot isolation (SSI), 261-266, 329 setting up a new replica, 156 snapshot isolation and repeatable read, 237-242 implementing with MVCC, 239 indexes and MVCC, 241 visibility rules, 240 synchronized clocks for global snapshots, 294 snowflake schemas, 95 SOAP, 133 (see also services) evolvability, 136 software bugs, 8 
maintaining integrity, 529 solid state drives (SSDs) access patterns, 84 detecting corruption, 519, 530 faults in, 227 sequential write throughput, 75 Solr (search server) building indexes in batch processes, 411 document-partitioned indexes, 207 request routing, 216 usage example, 4 use of Lucene, 79 sort (Unix tool), 392, 394, 395 sort-merge joins (MapReduce), 405 Sorted String Tables (see SSTables) sorting sort order in column storage, 99 source of truth (see systems of record) Spanner (database) data locality, 41 snapshot isolation using clocks, 295 TrueTime API, 294 Spark (processing framework), 421-423 bytecode generation, 428 dataflow APIs, 427 fault tolerance, 422 for data warehouses, 93 GraphX API (graph processing), 425 machine learning, 428 query optimizer, 427 Spark Streaming, 466 microbatching, 477 stream processing on top of batch processing, 495 SPARQL (query language), 59 spatial algorithms, 429 split brain, 158, 557 in consensus algorithms, 352, 367 preventing, 322, 333 using fencing tokens to avoid, 302-304 spreadsheets, dataflow programming capabilities, 504 SQL (Structured Query Language), 21, 28, 43 advantages and limitations of, 416 distributed query execution, 48 graph queries in, 53 isolation levels standard, issues with, 242 query execution on Hadoop, 416 résumé (example), 30 SQL injection vulnerability, 305 SQL on Hadoop, 93 statement-based replication, 158 stored procedures, 255 SQL Server (database) data warehousing support, 93 distributed transaction support, 361 leader-based replication, 153 preventing lost updates, 245 preventing write skew, 248, 257 read committed isolation, 236 recursive query support, 54 serializable isolation, 257 snapshot isolation support, 239 T-SQL language, 255 XML support, 30 SQLstream (stream analytics), 466 SSDs (see solid state drives) SSTables (storage format), 76-79 advantages over hash indexes, 76 concatenated index, 204 constructing and maintaining, 78 making LSM-Tree from, 78 staleness (old data), 
162 cross-channel timing dependencies, 331 in leaderless databases, 178 in multi-version concurrency control, 263 monitoring for, 182 of client state, 512 versus linearizability, 324 versus timeliness, 524 standbys (see leader-based replication) star replication topologies, 175 star schemas, 93-95 similarity to event sourcing, 458 Star Wars analogy (event time versus processing time), 469 state derived from log of immutable events, 459 deriving current state from the event log, 458 interplay between state changes and application code, 507 maintaining derived state, 495 maintenance by stream processor in stream-stream joins, 473 observing derived state, 509-515 rebuilding after stream processor failure, 478 separation of application code and, 505 state machine replication, 349, 452 statement-based replication, 158 statically typed languages analogy to schema-on-write, 40 code generation and, 127 statistical and numerical algorithms, 428 StatsD (metrics aggregator), 442 stdin, stdout, 395, 396 Stellar (cryptocurrency), 532 stock market feeds, 442 STONITH (Shoot The Other Node In The Head), 158 stop-the-world (see garbage collection) storage composing data storage technologies, 499-504 diversity of, in MapReduce, 415 Storage Area Network (SAN), 146, 398 storage engines, 69-104 column-oriented, 95-101 column compression, 97-99 defined, 96 distinction between column families and, 99 Parquet, 96, 131 sort order in, 99-100 writing to, 101 comparing requirements for transaction processing and analytics, 90-96 in-memory storage, 88 durability, 227 row-oriented, 70-90 B-trees, 79-83 comparing B-trees and LSM-trees, 83-85 defined, 96 log-structured, 72-79 stored procedures, 161, 253-255, 557 and total order broadcast, 349 pros and cons of, 255 similarity to stream processors, 505 Storm (stream processor), 466 distributed RPC, 468, 514 Trident state handling, 478 straggler events, 470, 498 stream processing, 464-481, 557 accessing external services within job, 
474, 477, 478, 517 combining with batch processing lambda architecture, 497 unifying technologies, 498 comparison to batch processing, 464 complex event processing (CEP), 465 fault tolerance, 476-479 atomic commit, 477 idempotence, 478 microbatching and checkpointing, 477 rebuilding state after a failure, 478 for data integration, 494-498 maintaining derived state, 495 maintenance of materialized views, 467 messaging systems (see messaging systems) reasoning about time, 468-472 event time versus processing time, 469, 477, 498 knowing when window is ready, 470 types of windows, 472 relation to databases (see streams) relation to services, 508 search on streams, 467 single-threaded execution, 448, 463 stream analytics, 466 stream joins, 472-476 stream-stream join, 473 stream-table join, 473 table-table join, 474 time-dependence of, 475 streams, 440-451 end-to-end, pushing events to clients, 512 messaging systems (see messaging systems) processing (see stream processing) relation to databases, 451-464 (see also changelogs) API support for change streams, 456 change data capture, 454-457 derivative of state by time, 460 event sourcing, 457-459 keeping systems in sync, 452-453 philosophy of immutable events, 459-464 topics, 440 strict serializability, 329 strong consistency (see linearizability) strong one-copy serializability, 329 subjects, predicates, and objects (in triplestores), 55 subscribers (message streams), 440 (see also consumers) supercomputers, 275 surveillance, 537 (see also privacy) Swagger (service definition format), 133 swapping to disk (see virtual memory) synchronous networks, 285, 557 comparison to asynchronous networks, 284 formal model, 307 synchronous replication, 154, 557 chain replication, 155 conflict detection, 172 system models, 300, 306-310 assumptions in, 528 correctness of algorithms, 308 mapping to the real world, 309 safety and liveness, 308 systems of record, 386, 557 change data capture, 454, 491 treating event log as, 460 
systems thinking, 536 T t-digest (algorithm), 16 table-table joins, 474 Tableau (data visualization software), 416 tail (Unix tool), 447 tail vertex (property graphs), 51 Tajo (query engine), 93 Tandem NonStop SQL (database), 200 TCP (Transmission Control Protocol), 277 comparison to circuit switching, 285 comparison to UDP, 283 connection failures, 280 flow control, 282, 441 packet checksums, 306, 519, 529 reliability and duplicate suppression, 517 retransmission timeouts, 284 use for transaction sessions, 229 telemetry (see monitoring) Teradata (database), 93, 200 term-partitioned indexes, 208, 217 termination (consensus), 365 Terrapin (database), 413 Tez (dataflow engine), 421-423 fault tolerance, 422 support by higher-level tools, 427 thrashing (out of memory), 297 threads (concurrency) actor model, 138, 468 (see also message-passing) atomic operations, 223 background threads, 73, 85 execution pauses, 286, 296-298 memory barriers, 338 preemption, 298 single (see single-threaded execution) three-phase commit, 359 Thrift (data format), 117-121 BinaryProtocol, 118 CompactProtocol, 119 field tags and schema evolution, 120 throughput, 13, 390 TIBCO, 137 Enterprise Message Service, 444 StreamBase (stream analytics), 466 time concurrency and, 187 cross-channel timing dependencies, 331 in distributed systems, 287-299 (see also clocks) clock synchronization and accuracy, 289 relying on synchronized clocks, 291-295 process pauses, 295-299 reasoning about, in stream processors, 468-472 event time versus processing time, 469, 477, 498 knowing when window is ready, 470 timestamp of events, 471 types of windows, 472 system models for distributed systems, 307 time-dependence in stream joins, 475 time-of-day clocks, 288 timeliness, 524 coordination-avoiding data systems, 528 correctness of dataflow systems, 525 timeouts, 279, 557 dynamic configuration of, 284 for failover, 158 length of, 281 timestamps, 343 assigning to events in stream processing, 471 for read-after-write 
consistency, 163 for transaction ordering, 295 insufficiency for enforcing constraints, 347 key range partitioning by, 203 Lamport, 345 logical, 494 ordering events, 291, 345 Titan (database), 50 tombstones, 74, 191, 456 topics (messaging), 137, 440 total order, 341, 557 limits of, 493 sequence numbers or timestamps, 344 total order broadcast, 348-352, 493, 522 consensus algorithms and, 366-368 implementation in ZooKeeper and etcd, 370 implementing with linearizable storage, 351 using, 349 using to implement linearizable storage, 350 tracking behavioral data, 536 (see also privacy) transaction coordinator (see coordinator) transaction manager (see coordinator) transaction processing, 28, 90-95 comparison to analytics, 91 comparison to data warehousing, 93 transactions, 221-267, 558 ACID properties of, 223 atomicity, 223 consistency, 224 durability, 226 isolation, 225 compensating (see compensating transactions) concept of, 222 distributed transactions, 352-364 avoiding, 492, 502, 521-528 failure amplification, 364, 495 in doubt/uncertain status, 358, 362 two-phase commit, 354-359 use of, 360-361 XA transactions, 361-364 OLTP versus analytics queries, 411 purpose of, 222 serializability, 251-266 actual serial execution, 252-256 pessimistic versus optimistic concurrency control, 261 serializable snapshot isolation (SSI), 261-266 two-phase locking (2PL), 257-261 single-object and multi-object, 228-232 handling errors and aborts, 231 need for multi-object transactions, 231 single-object writes, 230 snapshot isolation (see snapshots) weak isolation levels, 233-251 preventing lost updates, 242-246 read committed, 234-238 transitive closure (graph algorithm), 424 trie (data structure), 88 triggers (databases), 161, 441 implementing change data capture, 455 implementing replication, 161 triple-stores, 55-59 SPARQL query language, 59 tumbling windows (stream processing), 472 (see also windows) in microbatching, 477 tuple spaces (programming 
model), 507 Turtle (RDF data format), 56 Twitter constructing home timelines (example), 11, 462, 474, 511 DistributedLog (event log), 448 Finagle (RPC framework), 135 Snowflake (sequence number generator), 294 Summingbird (processing library), 497 two-phase commit (2PC), 353, 355-359, 558 confusion with two-phase locking, 356 coordinator failure, 358 coordinator recovery, 363 how it works, 357 issues in practice, 363 performance cost, 360 transactions holding locks, 362 two-phase locking (2PL), 257-261, 329, 558 confusion with two-phase commit, 356 index-range locks, 260 performance of, 258 type checking, dynamic versus static, 40 U UDP (User Datagram Protocol) comparison to TCP, 283 multicast, 442 unbounded datasets, 439, 558 (see also streams) unbounded delays, 558 in networks, 282 process pauses, 296 unbundling databases, 499-515 composing data storage technologies, 499-504 federation versus unbundling, 501 need for high-level language, 503 designing applications around dataflow, 504-509 observing derived state, 509-515 materialized views and caching, 510 multi-partition data processing, 514 pushing state changes to clients, 512 uncertain (transaction status) (see in doubt) uniform consensus, 365 (see also consensus) uniform interfaces, 395 union type (in Avro), 125 uniq (Unix tool), 392 uniqueness constraints asynchronously checked, 526 requiring consensus, 521 requiring linearizability, 330 uniqueness in log-based messaging, 522 Unix philosophy, 394-397 command-line batch processing, 391-394 Unix pipes versus dataflow engines, 423 comparison to Hadoop, 413-414 comparison to relational databases, 499, 501 comparison to stream processing, 464 composability and uniform interfaces, 395 loose coupling, 396 pipes, 394 relation to Hadoop, 499 UPDATE statement (SQL), 40 updates preventing lost updates, 242-246 atomic write operations, 243 automatically detecting lost updates, 245 compare-and-set operations, 245 conflict resolution and replication, 246 using explicit 
locking, 244 preventing write skew, 246-251 V validity (consensus), 365 vBuckets (partitioning), 199 vector clocks, 191 (see also version vectors) vectorized processing, 99, 428 verification, 528-533 avoiding blind trust, 530 culture of, 530 designing for auditability, 531 end-to-end integrity checks, 531 tools for auditable data systems, 532 version control systems, reliance on immutable data, 463 version vectors, 177, 191 capturing causal dependencies, 343 versus vector clocks, 191 Vertica (database), 93 handling writes, 101 replicas using different sort orders, 100 vertical scaling (see scaling up) vertices (in graphs), 49 property graph model, 50 Viewstamped Replication (consensus algo‐ rithm), 366 view number, 368 virtual machines, 146 (see also cloud computing) context switches, 297 network performance, 282 noisy neighbors, 284 reliability in cloud services, 8 virtualized clocks in, 290 virtual memory process pauses due to page faults, 14, 297 versus memory management by databases, 89 VisiCalc (spreadsheets), 504 vnodes (partitioning), 199 Voice over IP (VoIP), 283 Voldemort (database) building read-only stores in batch processes, 413 hash partitioning, 203-204, 211 leaderless replication, 177 multi-datacenter support, 184 rebalancing, 213 reliance on read repair, 179 sloppy quorums, 184 VoltDB (database) cross-partition serializability, 256 deterministic stored procedures, 255 in-memory storage, 89 output streams, 456 secondary indexes, 207 serial execution of transactions, 253 statement-based replication, 159, 479 transactions in stream processing, 477 W WAL (write-ahead log), 82 web services (see services) Web Services Description Language (WSDL), 133 webhooks, 443 webMethods (messaging), 137 WebSocket (protocol), 512 Index | 589 windows (stream processing), 466, 468-472 infinite windows for changelogs, 467, 474 knowing when all events have arrived, 470 stream joins within a window, 473 types of windows, 472 winners (conflict resolution), 173 WITH 
RECURSIVE syntax (SQL), 54 workflows (MapReduce), 402 outputs, 411-414 key-value stores, 412 search indexes, 411 with map-side joins, 410 working set, 393 write amplification, 84 write path (derived data), 509 write skew (transaction isolation), 246-251 characterizing, 246-251, 262 examples of, 247, 249 materializing conflicts, 251 occurrence in practice, 529 phantoms, 250 preventing in snapshot isolation, 262-265 in two-phase locking, 259-261 options for, 248 write-ahead log (WAL), 82, 159 writes (database) atomic write operations, 243 detecting writes affecting prior reads, 264 preventing dirty writes with read committed, 235 WS-* framework, 133 (see also services) WS-AtomicTransaction (2PC), 355 X XA transactions, 355, 361-364 heuristic decisions, 363 limitations of, 363 xargs (Unix tool), 392, 396 XML binary variants, 115 encoding RDF data, 57 for application data, issues with, 114 in relational databases, 30, 41 XSL/XPath, 45 Y Yahoo!


pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, bitcoin, blockchain, business intelligence, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, database schema, DevOps, distributed ledger, Donald Knuth, Edward Snowden, Ethereum, ethereum blockchain, fault tolerance, finite state, Flash crash, full text search, functional programming, general-purpose programming language, informal economy, information retrieval, Infrastructure as a Service, Internet of things, iterative process, John von Neumann, Kubernetes, loose coupling, Marc Andreessen, microservices, natural language processing, Network effects, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, statistical model, surveillance capitalism, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you’ve finished using them. This approach—automation, rapid prototyping, incremental iteration, being friendly to experimentation, and breaking down large projects into manageable chunks—sounds remarkably like the Agile and DevOps movements of today. Surprisingly little has changed in four decades. The sort tool is a great example of a program that does one thing well. It is arguably a better sorting implementation than most programming languages have in their standard libraries (which do not spill to disk and do not use multiple threads, even when that would be beneficial).

in derived data systems, Derived Data materialized views, Aggregation: Data Cubes and Materialized Views updating derived data, Single-Object and Multi-Object Operations, The need for multi-object transactions, Combining Specialized Tools by Deriving Data versus normalization, Deriving several views from the same event log derived data, Derived Data, Stream Processing, Glossary from change data capture, Implementing change data capture in event sourcing, Deriving current state from the event log-Deriving current state from the event log maintaining derived state through logs, Databases and Streams-API support for change streams, State, Streams, and Immutability-Concurrency control observing, by subscribing to streams, End-to-end event streams outputs of batch and stream processing, Batch and Stream Processing through application code, Application code as a derivation function versus distributed transactions, Derived data versus distributed transactions deterministic operations, Pros and cons of stored procedures, Faults and Partial Failures, Glossary accidental nondeterminism, Fault tolerance and fault tolerance, Fault tolerance, Fault tolerance and idempotence, Idempotence, Reasoning about dataflows computing derived data, Maintaining derived state, Correctness of dataflow systems, Designing for auditability in state machine replication, Using total order broadcast, Databases and Streams, Deriving current state from the event log joins, Time-dependence of joins DevOps, The Unix Philosophy differential dataflow, What’s missing? 
dimension tables, Stars and Snowflakes: Schemas for Analytics dimensional modeling (see star schemas) directed acyclic graphs (DAGs), Graphs and Iterative Processing dirty reads (transaction isolation), No dirty reads dirty writes (transaction isolation), No dirty writes discrimination, Bias and discrimination disks (see hard disks) distributed actor frameworks, Distributed actor frameworks distributed filesystems, MapReduce and Distributed Filesystems-MapReduce and Distributed Filesystems decoupling from query engines, Diversity of processing models indiscriminately dumping data into, Diversity of storage use by MapReduce, MapReduce workflows distributed systems, The Trouble with Distributed Systems-Summary, Glossary Byzantine faults, Byzantine Faults-Weak forms of lying cloud versus supercomputing, Cloud Computing and Supercomputing detecting network faults, Detecting Faults faults and partial failures, Faults and Partial Failures-Cloud Computing and Supercomputing formalization of consensus, Fault-Tolerant Consensus impossibility results, The CAP theorem, Distributed Transactions and Consensus issues with failover, Leader failure: Failover limitations of distributed transactions, Limitations of distributed transactions multi-datacenter, Multi-datacenter operation, The Cost of Linearizability network problems, Unreliable Networks-Can we not simply make network delays predictable?


pages: 406 words: 105,602

The Startup Way: Making Entrepreneurship a Fundamental Discipline of Every Enterprise by Eric Ries

activist fund / activist shareholder / activist investor, Affordable Care Act / Obamacare, Airbnb, autonomous vehicles, barriers to entry, basic income, Ben Horowitz, Black-Scholes formula, call centre, centralized clearinghouse, Clayton Christensen, cognitive dissonance, connected car, corporate governance, DevOps, Elon Musk, en.wikipedia.org, fault tolerance, Frederick Winslow Taylor, global supply chain, hockey-stick growth, index card, Jeff Bezos, Kickstarter, Lean Startup, loss aversion, Marc Andreessen, Mark Zuckerberg, means of production, minimum viable product, moral hazard, move fast and break things, obamacare, peer-to-peer, place-making, rent-seeking, Richard Florida, Sam Altman, Sand Hill Road, secular stagnation, shareholder value, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, Steve Jobs, the scientific method, time value of money, Toyota Production System, Uber for X, universal basic income, web of trust, Y Combinator

In government alone, projects like RFP-EZ (Request for Proposal-EZ), one of the first Presidential Innovation Fellows projects, which created an online marketplace where small businesses could bid for government work, and the Agile Blanket Purchase Agreement (Agile BPA), which gives the entire government access to contractors and vendors who provide agile delivery services like DevOps, user-centered design, and agile software development, have both cut the requirements and time needed for purchasing, leading to faster resolution of critical problems. But that’s not all.

Even the Nuclear Codes Need Procurement Reform

It may seem highly improbable that procurement reform, which many consider inherently uninteresting, could be connected to something as critical and sensitive as generating nuclear codes.


pages: 688 words: 107,867

Python Data Analytics: With Pandas, NumPy, and Matplotlib by Fabio Nelli

Amazon Web Services, backpropagation, centre right, computer vision, Debian, DevOps, functional programming, Google Earth, Guido van Rossum, Internet of things, optical character recognition, pattern recognition, sentiment analysis, speech recognition, statistical model, web application

About the Technical Reviewer Raul Samayoa is a senior software developer and machine learning specialist with many years of experience in the financial industry. An MSc graduate from the Georgia Institute of Technology, he’s never met a neural network or dataset he did not like. He’s fond of evangelizing the use of DevOps tools for data science and software development. Raul enjoys the energy of his hometown of Toronto, Canada, where he runs marathons, volunteers as a technology instructor with the University of Toronto coders, and likes to work with data in Python and R. © Fabio Nelli 2018. Fabio Nelli, Python Data Analytics, https://doi.org/10.1007/978-1-4842-3913-1_1


pages: 455 words: 133,719

Overwhelmed: Work, Love, and Play When No One Has the Time by Brigid Schulte

8-hour work day, affirmative action, Bertrand Russell: In Praise of Idleness, blue-collar work, Burning Man, business cycle, call centre, cognitive dissonance, David Brooks, deliberate practice, desegregation, DevOps, East Village, Edward Glaeser, epigenetics, fear of failure, feminist movement, financial independence, game design, gender pay gap, glass ceiling, helicopter parent, hiring and firing, income inequality, job satisfaction, John Maynard Keynes: Economic Possibilities for our Grandchildren, knowledge economy, knowledge worker, labor-force participation, meta-analysis, new economy, profit maximization, Results Only Work Environment, Richard Feynman, Ronald Reagan, Saturday Night Live, sensible shoes, sexual politics, Silicon Valley, Skype, Steve Jobs, The Theory of the Leisure Class by Thorstein Veblen, Thorstein Veblen, women in the workforce, working poor, Zipcar, éminence grise

Robinson based some of her conclusions on a white paper written by her computer game designer husband, Evan Robinson: “Why Crunch Mode Doesn’t Work: Six Lessons,” International Game Developers Association, www.igda.org/why-crunch-modes-doesnt-work-six-lessons. 16. Christopher P. Landrigan et al., “Effect of Reducing Interns’ Work Hours on Serious Medical Errors in Intensive Care Units,” New England Journal of Medicine 351 (2004): 1838–48, doi: 10.1056/NEJMoa041406. 17. Klint Finley, “What Research Says About Working Long Hours,” Devops Angle, April 18, 2012, http://devopsangle.com/2012/04/18/what-research-says-about-working-long-hours/. 18. www.businessinsider.com/best-buy-ending-work-from-home-2013-3. 19. U.S. Department of Commerce, Economics and Statistics Administration, “Women-Owned Businesses in the 21st Century,” White House Council on Women and Girls, October 2010, www.esa.doc.gov/sites/default/files/reports/documents/women-owned-businesses.pdf.


pages: 821 words: 178,631

The Rust Programming Language by Steve Klabnik, Carol Nichols

anti-pattern, bioinformatics, business process, cryptocurrency, DevOps, Firefox, functional programming, Internet of things, iterative process, pull request, Ruby on Rails, type inference

Through efforts such as this book, the Rust teams want to make systems concepts more accessible to more people, especially those new to programming. Companies Hundreds of companies, large and small, use Rust in production for a variety of tasks. Those tasks include command line tools, web services, DevOps tooling, embedded devices, audio and video analysis and transcoding, cryptocurrencies, bioinformatics, search engines, Internet of Things applications, machine learning, and even major parts of the Firefox web browser. Open Source Developers Rust is for people who want to build the Rust programming language, community, developer tools, and libraries.


pages: 394 words: 110,352

The Art of Community: Building the New Age of Participation by Jono Bacon

barriers to entry, Benevolent Dictator For Life (BDFL), collaborative editing, crowdsourcing, Debian, DevOps, do-ocracy, en.wikipedia.org, Firefox, game design, Guido van Rossum, Johann Wolfgang von Goethe, Jono Bacon, Kickstarter, Larry Wall, Mark Shuttleworth, Mark Zuckerberg, openstreetmap, Richard Stallman, side project, Silicon Valley, Skype, slashdot, social graph, software as a service, telemarketer, union organizing, VA Linux, web application

As such, we needed to pick which events we wanted him to attend, and pick wisely. With this in mind I asked Jorge to put together a spreadsheet that listed all the events that could be interesting for us to attend. The focus of this list was clear: these need to be cloud events and oriented around technology (as opposed to business events) and DevOps (the audience we were focusing on). I asked Jorge to gather this list of events and to determine the following characteristics for each one: Location and venue Date(s) of the event Typical attendance size Number of sessions and average talk audience size Team priority Each of these pieces of information helped to provide an overview of each event and its respective details.