performance metric

76 results

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling by Ralph Kimball, Margy Ross


active measures, Albert Einstein, business intelligence, business process, call centre, cloud computing, data acquisition, discrete time, inventory management, iterative process, job automation, knowledge worker, performance metric, platform as a service, side project, supply-chain management, zero-sum game

Multiple study groups can be defined and derivative study groups can be created with intersections, unions, and set differences. (Chapter 8, Customer Relationship Management, p. 249)

Aggregated Facts as Dimension Attributes: Business users are often interested in constraining the customer dimension based on aggregated performance metrics, such as filtering on all customers who spent over a certain dollar amount during last year or perhaps over the customer’s lifetime. Selected aggregated facts can be placed in a dimension as targets for constraining and as row labels for reporting. The metrics are often presented as banded ranges in the dimension table. Dimension attributes representing aggregated performance metrics add burden to the ETL processing, but ease the analytic burden in the BI layer. (Chapter 8, Customer Relationship Management, p. 239)

Dynamic Value Bands: A dynamic value banding report is organized as a series of report row headers that define a progressive set of varying-sized ranges of a target numeric fact.

These en masse changes are prime candidates because business users often want the ability to analyze performance metrics using either the pre- or post-hierarchy reorganization for a period of time. With type 3 changes, the prior column is labeled to distinctly represent the prechanged grouping, such as 2012 department or premerger department. These column names provide clarity, but there may be unwanted ripples in the BI layer. Finally, if the type 3 attribute represents a hierarchical rollup level within the dimension, then as discussed with type 1, the type 3 update and additional column would likely cause OLAP cubes to be reprocessed. (Chapter 5)

Multiple Type 3 Attributes: If a dimension attribute changes with a predictable rhythm, sometimes the business wants to summarize performance metrics based on any of the historic attribute values.

The bulk of the document centers on the business processes; for each process, describe why business users want to analyze the process’s performance metrics, what capabilities they want, their current limitations, and potential benefits or impact. Commentary about the feasibility of tackling each process is also important. As described in Chapter 4 and illustrated in Figure 4-11, the processes are sometimes unveiled in an opportunity/stakeholder matrix to convey the impact across the organization. In this case, the rows of the opportunity matrix identify business processes, just like a bus matrix. However, in the opportunity matrix, the columns identify the organizational groups or functions. Surprisingly, this matrix is usually quite dense because many groups want access to the same core performance metrics.

Prioritizing Requirements: The consolidated findings document serves as the basis for presentations back to senior management and other requirements participants.


pages: 263 words: 75,455

Quantitative Value: A Practitioner's Guide to Automating Intelligent Investment and Eliminating Behavioral Errors by Wesley R. Gray, Tobias E. Carlisle


activist fund / activist shareholder / activist investor, Albert Einstein, Andrei Shleifer, asset allocation, Atul Gawande, backtesting, beat the dealer, Black Swan, capital asset pricing model, Checklist Manifesto, cognitive bias, compound rate of return, corporate governance, correlation coefficient, credit crunch, Daniel Kahneman / Amos Tversky, discounted cash flows, Edward Thorp, Eugene Fama: efficient market hypothesis, forensic accounting, hindsight bias, intangible asset, Louis Bachelier, p-value, passive investing, performance metric, quantitative hedge fund, random walk, Richard Thaler, risk-adjusted returns, Robert Shiller, shareholder value, Sharpe ratio, short selling, statistical model, survivorship bias, systematic trading, The Myth of the Rational Market, time value of money, transaction costs

When we examine the price ratios on a factor-adjusted basis using CAPM alpha, we again find that the EBIT enterprise multiple is a top-performing metric, showing statistically and economically significant alpha of 5.23 percent for the top decile stocks. Here, the alternative EBITDA enterprise yield, earnings yield, and gross profits yield also perform well. BM and the free cash flow yield show smaller alphas than the other metrics. The EBIT enterprise multiple shines on a risk-adjusted basis using the Sharpe and Sortino ratios. The EBIT enterprise multiple shows a Sharpe ratio, which measures risk-to-reward by comparing excess return against volatility, of 0.58. When we examine the metric's risk/reward using the Sortino ratio, which ignores upside volatility and measures only excess return against downside volatility, we again find the augmented enterprise multiple to be the best-performing metric, with a Sortino ratio of 0.89.
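As a concrete companion to the two ratios discussed above, here is a minimal Python sketch (not from the book; the monthly excess-return figures are invented, and the downside-deviation formulation is one common convention):

    import numpy as np

    def sharpe_ratio(excess_returns):
        # Excess return (over the risk-free rate) per unit of total
        # volatility, upside and downside alike.
        return np.mean(excess_returns) / np.std(excess_returns, ddof=1)

    def sortino_ratio(excess_returns):
        # Same numerator, but the denominator penalizes only downside moves:
        # root-mean-square of negative returns, with gains clipped to zero.
        downside_dev = np.sqrt(np.mean(np.minimum(excess_returns, 0.0) ** 2))
        return np.mean(excess_returns) / downside_dev

    monthly_excess = np.array([0.021, -0.013, 0.034, 0.008, -0.027, 0.019])
    print(sharpe_ratio(monthly_excess), sortino_ratio(monthly_excess))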

Figure 1.1 sets out a brief graphical overview of the performance of the cheapest stocks according to common fundamental price ratios, such as the price-to-earnings (P/E) ratio, the price-to-book (P/B) ratio, and the EBITDA enterprise multiple (total enterprise value divided by earnings before interest, taxes, depreciation, and amortization, or TEV/EBITDA).

[Figure 1.1: Cumulative Returns to Common Price Ratios]

As Figure 1.1 illustrates, value investing according to simple fundamental price ratios has cumulatively beaten the S&P 500 over almost 50 years. Table 1.1 shows some additional performance metrics for the price ratios. The numbers illustrate that value strategies have been very successful (Chapter 7 has a detailed discussion of our investment simulation procedures).

[Table 1.1: Long-Term Performance of Common Price Ratios (1964 to 2011)]

The counterargument to the empirical outperformance of value stocks is that these stocks are inherently more risky. In this instance, risk is defined as the additional volatility of the value stocks.

To this end, we focus our quantitative metrics on long-term averages for a set of simple measures. We have chosen eight years as our “long term” for two reasons: First, eight years likely captures a boom-and-bust cycle for the typical stock, and, second, there are sufficient stocks with eight years of historical data that we can identify a sufficiently large universe of stocks.9 We analyze three long-term, high-return operating performance metrics and rank these variables against the entire universe of stocks: long-term free cash flow on assets, long-term geometric return on assets, and long-term geometric return on capital, discussed next. The first measure is long-term free cash flow on assets (CFOA), defined as the sum of eight years of free cash flow divided by total assets. The measure can be expressed more formally as follows: CFOA = Sum (Eight Years Free Cash Flow) / Total Assets We define free cash flow as net income + depreciation and amortization − changes in working capital − capital expenditures.
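A minimal sketch of the CFOA calculation exactly as the passage defines it (all figures are placeholders, not data from the book):

    def free_cash_flow(net_income, dep_amort, change_in_wc, capex):
        # FCF = net income + depreciation and amortization
        #       - changes in working capital - capital expenditures
        return net_income + dep_amort - change_in_wc - capex

    def cfoa(eight_years_fcf, total_assets):
        # CFOA = Sum(Eight Years Free Cash Flow) / Total Assets
        return sum(eight_years_fcf) / total_assets

    # Hypothetical firm: eight annual FCF figures and current total assets.
    fcf_history = [free_cash_flow(120, 30, 10, 45) for _ in range(8)]
    print(cfoa(fcf_history, total_assets=1500))  # -> ~0.51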


pages: 372 words: 67,140

Jenkins Continuous Integration Cookbook by Alan Berg


anti-pattern, continuous integration, Debian, don't repeat yourself, en.wikipedia.org, Firefox, job automation, performance metric, revision control, web application, x509 certificate

Consider storing User-Agents and other browser headers in a text file, and then picking the values up for HTTP requests through the CSV Data Set Config element. This is useful if resources returned to your web browser, such as JavaScript or images, depend on the User-Agents. JMeter can then loop through the User-Agents, asserting that the resources exist. See also: Reporting JMeter performance metrics; Functional testing using JMeter assertions.

Reporting JMeter performance metrics: In this recipe, you will be shown how to configure Jenkins to run a JMeter test plan, and then collect and report the results. The passing of variables from an Ant script to JMeter will also be explained.

Getting ready: It is assumed that you have run through the last recipe, Creating JMeter test plans. You will also need to install the Jenkins performance plugin (https://wiki.jenkins-ci.org/display/JENKINS/Performance+Plugin).

See also: Looking for "smelly" code through code coverage; Activating more PMD rulesets; Interpreting JavaNCSS

Chapter 6. Testing Remotely

In this chapter, we will cover the following recipes:
• Deploying a WAR file from Jenkins to Tomcat
• Creating multiple Jenkins nodes
• Testing with Fitnesse
• Activating Fitnesse HtmlUnit Fixtures
• Running Selenium IDE tests
• Triggering failsafe integration tests with Selenium Webdriver
• Creating JMeter test plans
• Reporting JMeter performance metrics
• Functional testing using JMeter assertions
• Enabling Sakai web services
• Writing test plans with SoapUI
• Reporting SoapUI test results

Introduction: By the end of this chapter, you will have run performance and functional tests against web applications and web services. Two typical setup recipes are included. The first is the deployment of a WAR file through Jenkins to an application server.

This allows JMeter to fail Jenkins builds based on a range of JMeter tests. This approach is especially important when starting from an HTML mockup of a web application, whose underlying code is changing rapidly. The test plan logs in and out of your local instance of Jenkins, checking size, duration, and text found in the login response.

Getting ready: We assume that you have already performed the Creating JMeter test plans and Reporting JMeter performance metrics recipes. The recipe requires the creation of a user tester1 in Jenkins. Feel free to change the username and password. Remember to delete the test user once it is no longer needed.

How to do it...
1. Create a user in Jenkins named tester1 with password testtest.
2. Run JMeter.
3. In the Test Plan element, change Name to LoginLogoutPlan, and add the following details for User Defined Variables: Name: USER, Value: tester1; Name: PASS, Value: testtest.
4. Right-click on Test Plan, then select Add | Config Element | HTTP Cookie Manager.


pages: 597 words: 119,204

Website Optimization by Andrew B. King


AltaVista, bounce rate, don't be evil, en.wikipedia.org, Firefox, In Cold Blood by Truman Capote, information retrieval, iterative process, medical malpractice, Network effects, performance metric, search engine result page, second-price auction, second-price sealed-bid, semantic web, Silicon Valley, slashdot, social graph, Steve Jobs, web application

You can then retrieve revenue information about your conversions by running a report in the AdWords interface that opts to include value information for conversion columns.

Tracking and Metrics: You should track the success of all PPC elements through website analytics and conversion tracking. Google offers a free analytics program called Google Analytics. With it you can track multiple campaigns and get separate data for organic and paid listings. Whatever tracking program you use, you have to be careful to track performance metrics correctly. The first step in optimizing a PPC campaign is to use appropriate metrics. Profitable campaigns with equally valued conversions might be optimized to:
• Reduce the CPC given the same (or greater) click volume and conversion rates.
• Increase the CTR given the same (or a greater) number of impressions and the same (or better) conversion rates.
• Increase conversion rates given the same (or a greater) number of clicks.
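A small sketch of the ratios involved (hypothetical campaign figures; the definitions are the standard ones, not code from the book):

    def ppc_metrics(cost, impressions, clicks, conversions):
        # The core ratios used to optimize a paid-search campaign.
        return {
            "CPC": cost / clicks,                    # cost per click
            "CTR": clicks / impressions,             # click-through rate
            "conversion_rate": conversions / clicks,
            "cost_per_conversion": cost / conversions,
        }

    print(ppc_metrics(cost=500.0, impressions=40_000, clicks=800, conversions=32))
    # CPC 0.625, CTR 2%, conversion rate 4%, cost per conversion 15.625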

ComScore, http://www.comscore.com/request/cookie_deletion_white_paper.pdf (accessed February 5, 2008). According to the study, "Approximately 31 percent of U.S. computer users clear their first-party cookies in a month." Under these conditions, a server-centric measurement would overestimate unique visitors by 150%. [166] PathLoss is a metric developed by Paul Holstein of CableOrganizer.com.

Web Performance Metrics: At first glance, measuring the speed of a web page seems straightforward. Start a timer. Load up the page. Click Stop when the web page is "ready." Write down the time. For users, however, "ready" varies across different browsers on different connection speeds (dial-up, DSL, cable, LAN) at different locations (Washington, DC, versus Mountain View, California, versus Bangalore, India) at different times of the day (peak versus off-peak times) and from different browse paths (fresh from search results or accessed from a home page).
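The naive stopwatch approach described above looks roughly like this in Python (a sketch, assuming a reachable URL; it times only the base HTML fetch, not images, scripts, or rendering, which is exactly why "ready" is so slippery):

    import time
    import urllib.request

    def naive_load_time(url):
        # Start a timer, fetch the page, stop when the response is fully read.
        start = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            body = response.read()
        return time.perf_counter() - start, len(body)

    elapsed, size = naive_load_time("https://example.com/")
    print(f"{elapsed:.3f} s for {size} bytes")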

Tip: If you have a machine dedicated to performance analysis, use about:blank as your home page.

IBM Page Detailer: IBM Page Detailer is a Windows tool that sits quietly in the background as you browse. It captures snapshots of how objects are loading on the page behind the scenes. Download it from http://www.alphaworks.ibm.com/tech/pagedetailer/download. IBM Page Detailer captures three basic performance metrics: load time, bytes, and items. These correlate to the Document Complete, kilobytes received, and number of requests metrics we are tracking. We recommend capturing three to five page loads and averaging the metrics to ensure that no anomalies, such as a larger ad, impacted the performance data. It is important, however, to note the occurrence and work to mitigate such anomalies. Table 10-2 shows our averaged results.


pages: 502 words: 107,510

Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs


Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, game design, information retrieval, iterative process, natural language processing, pattern recognition, performance metric, sentiment analysis, social web, speech recognition, statistical model, text mining

Again, using some of the features that are identified in Natural Language Processing with Python, we have:[2]
F1: last_letter = “a”
F2: last_letter = “k”
F3: last_letter = “f”
F4: last_letter = “r”
F5: last_letter = “y”
F6: last_2_letters = “yn”

Choose a learning algorithm to infer the target function from the experience you provide it with. We will start with the decision tree method.

Evaluate the results according to the performance metric you have chosen. We will use accuracy over the resultant classifications as a performance metric.

But, now, where do we start? That is, which feature do we use to start building our tree? When using a decision tree to partition your data, this is one of the most difficult questions to answer. Fortunately, there is a very nice way to assess the impact of choosing one feature over another. It is called information gain and is based on the notion of entropy from information theory.
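To make the entropy/information-gain idea concrete, here is a minimal Python sketch (the toy name/gender data is invented for illustration, echoing the last_letter features above):

    import math
    from collections import Counter

    def entropy(labels):
        # H = -sum(p * log2(p)) over the class proportions in `labels`.
        total = len(labels)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(labels).values())

    def information_gain(labels, feature_values):
        # Entropy of the full set minus the weighted entropy of the
        # partitions induced by the candidate feature.
        total = len(labels)
        partitions = {}
        for label, value in zip(labels, feature_values):
            partitions.setdefault(value, []).append(label)
        remainder = sum(len(part) / total * entropy(part)
                        for part in partitions.values())
        return entropy(labels) - remainder

    genders = ["F", "F", "F", "M", "M", "M"]
    ends_in_a = [True, True, False, False, False, False]  # F1: last_letter = "a"
    print(information_gain(genders, ends_in_a))  # ~0.459 bits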

Choose how to represent the target function. We will assume that the target function is represented as the MAP of the Bayesian classifier over the features.

Choose a learning algorithm to infer the target function from the experience you provide it with. This is tied to the way we chose to represent the function, namely the MAP decision rule shown below.

Evaluate the results according to the performance metric you have chosen. We will use accuracy over the resultant classifications as a performance metric.

Sentiment classification: Now let’s look at some classification tasks where different feature sets resulting from richer annotation have proved to be helpful for improving results. We begin with sentiment or opinion classification of texts. This is really two classification tasks: first, distinguishing fact from opinion in language; and second, if a text is an opinion, determining the sentiment conveyed by the opinion holder, and what object it is directed toward.
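The formula itself does not survive in this excerpt; the standard MAP formulation of a naive Bayes classifier over features f_1, ..., f_n, which is presumably what the original shows, is:

    \hat{y} = \operatorname*{argmax}_{y \in Y} \; P(y) \prod_{i=1}^{n} P(f_i \mid y)

Accuracy is then simply the fraction of evaluation instances for which \hat{y} matches the annotated label.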

We will learn when to use each of these classes, as well as which algorithms are most appropriate for each feature type. In particular, we will answer the following question: when does annotation actually help in a learning algorithm? Defining Our Learning Task To develop an algorithm, we need to have a precise representation of what we are trying to learn. We’ll start with Tom Mitchell’s [1] definition of a learning task: Learning involves improving on a task, T, with respect to a performance metric, P, based on experience, E. Given this statement of the problem (inspired by Simon’s concise phrasing shown earlier), Mitchell then discusses the five steps involved in the design of a learning system. Consider what the role of a specification and the associated annotated data will be for each of the following steps for designing a learning system: Choose the “training experience.” For our purposes, this is the corpus that you just built.


pages: 354 words: 26,550

High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems by Irene Aldridge


algorithmic trading, asset allocation, asset-backed security, automated trading system, backtesting, Black Swan, Brownian motion, business process, capital asset pricing model, centralized clearinghouse, collapse of Lehman Brothers, collateralized debt obligation, collective bargaining, computerized trading, diversification, equity premium, fault tolerance, financial intermediation, fixed income, high net worth, implied volatility, index arbitrage, information asymmetry, interest rate swap, inventory management, law of one price, Long Term Capital Management, Louis Bachelier, margin call, market friction, market microstructure, martingale, Myron Scholes, New Journalism, p-value, paper trading, performance metric, profit motive, purchasing power parity, quantitative trading / quantitative finance, random walk, Renaissance Technologies, risk tolerance, risk-adjusted returns, risk/return, Sharpe ratio, short selling, Small Order Execution System, statistical arbitrage, statistical model, stochastic process, stochastic volatility, systematic trading, trade route, transaction costs, value at risk, yield curve, zero-sum game

Kurtosis indicates whether the tails of the distribution are normal; high kurtosis signifies “fat tails,” a higher than normal probability of extreme positive or negative events.

COMPARATIVE RATIOS: While average return, standard deviation, and maximum drawdown present a picture of the performance of a particular trading strategy, the measures do not lend themselves to an easy point comparison among two or more strategies. Several comparative performance metrics have been developed in an attempt to summarize mean, variance, and tail risk in a single number that can be used to compare different trading strategies. Table 5.1 summarizes the most popular point measures. The first generation of point performance measures were developed in the 1960s and include the Sharpe ratio, Jensen’s alpha, and the Treynor ratio. The Sharpe ratio is probably the most widely used measure in comparative performance evaluation; it incorporates three desirable metrics—average return, standard deviation, and the cost of capital.

A companion measure to VaR, the conditional VaR (CVaR), also known as expected loss (EL), measures the average value of return within the cut-off tail. Of course, the original VaR assumes normal distributions of returns, whereas the returns are known to be fat-tailed. To address this issue, a modified VaR (MVaR) measure was proposed by Gregoriou and Gueyie (2003) that takes into account deviations from normality. Gregoriou and Gueyie (2003) also suggest using MVaR in place of standard deviation in Sharpe ratio calculations. How do these performance metrics stack up against each other? It turns out that all metrics deliver comparable rankings of trading strategies. Eling and Schuhmacher (2007) compare hedge fund ranking performance of the 13 measures listed and conclude that the Sharpe ratio is an adequate measure for hedge fund performance.

PERFORMANCE ATTRIBUTION: Performance attribution analysis, often referred to as “benchmarking,” goes back to the arbitrage pricing theory of Ross (1977) and has been applied to trading strategy performance by Sharpe (1992) and Fung and Hsieh (1997), among others.
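A minimal sketch of the two tail measures, using the historical (empirical-quantile) variant rather than the parametric normal VaR the text mentions; the simulated return series is purely illustrative:

    import numpy as np

    def var_cvar(returns, confidence=0.95):
        # Historical VaR: loss at the (1 - confidence) quantile of returns.
        # CVaR / expected loss: average return within the cut-off tail.
        cutoff = np.quantile(returns, 1.0 - confidence)
        tail = returns[returns <= cutoff]
        return -cutoff, -tail.mean()

    daily = np.random.default_rng(0).normal(0.0005, 0.01, 1000)  # toy returns
    var95, cvar95 = var_cvar(daily)
    print(f"95% VaR: {var95:.4f}, 95% CVaR: {cvar95:.4f}")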

Methods for forecast comparisons include:
• Mean squared error (MSE)
• Mean absolute deviation (MAD)
• Mean absolute percentage error (MAPE)
• Distributional performance
• Cumulative accuracy profiling

If the value of a financial security is forecasted to be x_{F,t} at some future time t and the realized value of the same security at time t is x_{R,t}, the forecast error for the given forecast, ε_{F,t}, is computed as follows:

ε_{F,t} = x_{F,t} − x_{R,t}   (15.2)

The mean squared error (MSE) is then computed as the average of squared forecast errors over T estimation periods, analogously to volatility computation:

MSE = (1/T) Σ_{τ=1..T} ε_{F,τ}²   (15.3)

The mean absolute deviation (MAD) and the mean absolute percentage error (MAPE) also summarize properties of forecast errors:

MAD = (1/T) Σ_{τ=1..T} |ε_{F,τ}|   (15.4)

MAPE = (1/T) Σ_{τ=1..T} |ε_{F,τ} / x_{R,τ}|   (15.5)

Naturally, the lower each of the three metrics (MSE, MAD, and MAPE), the better the forecasting performance of the trading system. The distributional evaluation of forecast performance also examines forecast errors ε_{F,t} normalized by the realized value, x_{R,t}. Unlike the MSE, MAD, and MAPE metrics, however, the distributional performance metric seeks to establish whether the forecast errors are random. If the errors are indeed random, there exists no consistent bias in either direction of price movement, and the distribution of normalized errors ε_{F,t}/x_{R,t} should fall on the uniform [0, 1] distribution. If the errors are nonrandom, the forecast can be improved. One test that can be used to determine whether the errors are random is a comparison of errors with the uniform distribution using the Kolmogorov-Smirnov statistic.
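A direct Python transcription of equations 15.2–15.5 (the forecast and realized series are hypothetical):

    def forecast_error_metrics(forecasts, realized):
        # eps_{F,t} = x_{F,t} - x_{R,t}                                (15.2)
        errors = [f - r for f, r in zip(forecasts, realized)]
        T = len(errors)
        mse = sum(e ** 2 for e in errors) / T                          # (15.3)
        mad = sum(abs(e) for e in errors) / T                          # (15.4)
        mape = sum(abs(e / r) for e, r in zip(errors, realized)) / T   # (15.5)
        return mse, mad, mape

    forecasts = [101.2, 99.8, 103.5, 100.1]  # x_F,t
    realized = [100.0, 100.5, 102.0, 99.0]   # x_R,t
    print(forecast_error_metrics(forecasts, realized))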


pages: 351 words: 123,876

Beautiful Testing: Leading Professionals Reveal How They Improve Software (Theory in Practice) by Adam Goucher, Tim Riley


Albert Einstein, barriers to entry, Black Swan, call centre, continuous integration, Debian, Donald Knuth, en.wikipedia.org, Firefox, Grace Hopper, index card, Isaac Newton, natural language processing, p-value, performance metric, revision control, six sigma, software as a service, software patent, the scientific method, Therac-25, Valgrind, web application

The performance test cases, however, were renamed “Performance Testing Checkpoints” and included the following (abbreviated here):
• Collect baseline system performance metrics and verify that each functional task included in the system usage model achieves performance requirements under a user load of 1 for each performance testing build in which the functional task has been implemented. — [Functional tasks listed, one per line]
• Collect system performance metrics and verify that each functional task included in the system usage model achieves performance requirements under a user load of 10 for each performance testing build in which the functional task has been implemented. — [Functional tasks listed, one per line]
• Collect system performance metrics and verify that the system usage model achieves performance requirements under the following loads to the degree that the usage model has been implemented in each performance testing build. — [Increasing loads from 100 users to 3,000 users, listed one per line]
• Collect system performance metrics and verify that the system usage model achieves performance requirements for the duration of a 9-hour, 1,000-user stress test on performance testing builds that the lead developer, performance tester, and project manager deem appropriate.

Clearly frustrated, but calm, Harold told me that he’d been asked to establish the performance requirements that were going to appear in our contract to the client. Now understanding the intent, I suggested that Harold schedule a conference room for a few hours for us to discuss his task further. He agreed. As it turned out, it took more than one meeting for Harold to explain to me the client’s expectations, the story behind his task, and for me to explain to Harold why we didn’t want to be contractually obligated to performance metrics that were inherently ambiguous, what those ambiguities were, and what we could realistically measure that would be valuable. Finally, Harold and I took what were now several sheets of paper with the following bullets to Sandra, our project manager, to review:

“System Performance Testing Requirements:
• Performance testing will be conducted under a variety of loads and usage models, to be determined when system features and workflows are established

The beauty here was that what we created was clear, easy to build a strategy around, and mapped directly to information that the client eventually requested in the final report. An added bonus was that from that point forward in the project, whenever someone challenged our approach to performance testing, one or more of the folks who were involved in the creation of the checkpoints always came to my defense—frequently before I even found out about the challenge!


pages: 304 words: 80,965

What They Do With Your Money: How the Financial System Fails Us, and How to Fix It by Stephen Davis, Jon Lukomnik, David Pitt-Watson


activist fund / activist shareholder / activist investor, Admiral Zheng, banking crisis, Basel III, Bernie Madoff, Black Swan, centralized clearinghouse, clean water, computerized trading, corporate governance, correlation does not imply causation, credit crunch, Credit Default Swap, crowdsourcing, David Brooks, Dissolution of the Soviet Union, diversification, diversified portfolio, en.wikipedia.org, financial innovation, financial intermediation, fixed income, Flash crash, income inequality, index fund, information asymmetry, invisible hand, Kenneth Arrow, light touch regulation, London Whale, Long Term Capital Management, moral hazard, Myron Scholes, Northern Rock, passive investing, performance metric, Ponzi scheme, principal–agent problem, rent-seeking, Ronald Coase, shareholder value, Silicon Valley, South Sea Bubble, sovereign wealth fund, statistical model, Steve Jobs, the market place, The Wealth of Nations by Adam Smith, transaction costs, Upton Sinclair, value at risk, WikiLeaks

Elson commented, “Even the best corporate boards will fail to address executive compensation concerns unless they tackle the structural bias created by external peer group benchmarking metrics. … Boards should measure performance and determine compensation by focusing on internal metrics. For example, if customer satisfaction is deemed important to the company, then results of customer surveys should play into the compensation equation. Other internal performance metrics can include revenue growth, cash flow, and other measures of return.”58 In other words, boards should focus, as owners do, on what makes the business flourish.

USE THE RIGHT METRICS: As discussed earlier, 90 percent of large American companies measure the performance of their executive teams over a three-year period or less. About a quarter don’t have any long-term performance–based awards at all.59 Fewer than 25 percent incorporate the cost of capital into their executive compensation formulas, and only 13 percent consider innovation—such as new products, markets, or services, research and development, or intellectual property development—in determining compensation.60 You couldn’t design an incentive scheme better suited to keeping a CEO focused strictly on the short term if you tried.

., 254n2 BrightScope, 122 Brokers, fiduciary duty and, 256n23 Brooks, David, 167 Buffett, Warren, 45, 63, 64, 80, 150, 221 Business judgment rule, 78–79 Business school curriculum, 190–92 Buy and Hold Is Dead (Again) (Solow), 65 Buy and Hold Is Dead (Kee), 65 Buycott, 118 Cadbury, Adrian, 227 Call option, 93 CalPERS, 91, 110, 111–12, 208, 221, 241n37 CalSTRS, 208 Canada, pension funds in, 59, 111, 209 Capital Aberto (magazine), 117 Capital gains, taxation of, 92 Capital Institute, 59, 87 Capital losses, 92 Capitalism: agency, 33, 74–80 defined, 243n2 Eastern European countries’ transition to, 167 financial system and, 9 injecting ownership back into, 83–93 private ownership and, 62 reforming, 11–12 Carbon Disclosure Project, 89 Career paths, new economic thinking and, 189–90 CDC. See Collective pension plans CDFIs. See Community Development Financial Institutions (CDFIs) CDSs. See Credit default swaps (CDSs) CEM Benchmarking, 54 Central banks, 20, 213 Centre for Policy Studies, 105 CEOs: performance metrics, 68, 86–87 short-term mindset among, 67–68. See also Executive compensation Ceres, 120 CFA Institute, 121 Chabris, Christopher, 174 Charles Schwab, 29, 31 Cheating, regulations and, 144–45 Chinese Academy of Social Sciences, 167 Citadel, 29 Citicorp, 76 Citizen investors/savers, 19 charter for, 227–31 communication between funds and, 110–11 dependence on others to manage money, 5–6, 19, 20 goals of, 48, 49 government regulation to safeguard, 107–9 lack of accountability to, 5–7, 96, 99–106 technology and, 90–92 trading platforms that protect, 88–89 City Bank of Glasgow, 257n34 Civil society organizations (CSOs), 153 corporate accountability and, 119–23 scrutiny of funds by, 224 “Civil Stewardship Leagues,” 122 Clark, Gordon L., 101, 106 Classical economics, 159–61 Clegg, Nick, 9 Clinton, Bill, 68–69 Clinton, Hillary Rodham, 119 Coase, Ronald, 169–70, 243n2, 261n31 Cohen, Lauren, 102 Coles Myer, 82 Collective Defined Contribution (CDC), 266n28 Collective pension plans, 263n1, 266n28 duration of liabilities, 264n3 in Netherlands, 197, 199, 209, 264n6.

See also Retirement savings Pension Trustee Code of Conduct, 121 Pension trustees, 105–6, 108–9, 137–38, 140, 205, 207, 224–25, 229 People’s Pension, 202–11 cost of, 217 enrollment into, 208–9 feedback mechanisms, 207 fees, 204 governance and, 202–3, 205–6 investment interests of beneficiaries, 206–7 models for, 266n28 reform of financial institutions and, 226 transparency and, 203–4, 207–8 Performance: asset managers and, 48–50 defined, 149 encouraging through collective action, 57–58 executive compensation and, 68, 148–49 fees, 239n16 governance and, 100–104 institutional investors and incentives for, 112–13 investment management, 35–38 Performance metrics for executives, 68, 86–87 Perry Capital, 81 PFZW. See Stichting Pensioenfonds Zorg en Welzijn (PFZW) PGGM, 77, 111 Philippon, Thomas, 26–28, 220 Philosophy, Politics and Economics (PPE), 190 Pitman, Brian, 213 Pitt, William, 158 Pitt-Watson, David, 263n1, 264n4, 264–65n11, 266n28 Plender, John, 259n5 Political economy, 142, 152 Political institutions, 183–84 Portfolio management: ownership and, 246n36 pension fund, 208–9 PPE (Philosophy, Politics and Economics), 190 Premium, 22 Price of goods, 160 Principles for Responsible Investment.


pages: 719 words: 181,090

Site Reliability Engineering by Betsy Beyer, Chris Jones, Jennifer Petoff, Niall Richard Murphy

Air France Flight 447, anti-pattern, barriers to entry, business intelligence, business process, Checklist Manifesto, cloud computing, combinatorial explosion, continuous integration, correlation does not imply causation, crowdsourcing, database schema, defense in depth, DevOps, en.wikipedia.org, fault tolerance, Flash crash, George Santayana, Google Chrome, Google Earth, job automation, job satisfaction, linear programming, load shedding, loose coupling, meta-analysis, minimum viable product, MVC pattern, performance metric, platform as a service, revision control, risk tolerance, side project, six sigma, the scientific method, Toyota Production System, trickle-down economics, web application, zero day

A given set of production dependencies can be shared, possibly with different stipulations around intent.

Performance metrics: Demand for one service trickles down to result in demand for one or more other services. Understanding the chain of dependencies helps formulate the general scope of the bin packing problem, but we still need more information about expected resource usage. How many compute resources does service Foo need to serve N user queries? For every N queries of service Foo, how many Mbps of data do we expect for service Bar? Performance metrics are the glue between dependencies. They convert from one or more higher-level resource type(s) to one or more lower-level resource type(s). Deriving appropriate performance metrics for a service can involve load testing and resource usage monitoring.

Prioritization: Inevitably, resource constraints result in trade-offs and hard decisions: of the many requirements that all services have, which requirements should be sacrificed in the face of insufficient capacity?
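As an illustration of the metrics-as-glue idea above, here is a sketch (the service names Foo and Bar come from the passage; the per-unit rates are invented, standing in for numbers derived from load testing):

    # Per-unit conversion rates, hypothetically derived from load tests.
    FOO_CPU_CORES_PER_1K_QPS = 2.5   # compute Foo needs per 1,000 queries/sec
    BAR_MBPS_PER_1K_FOO_QPS = 40.0   # storage traffic Foo induces on Bar

    def derived_demand(foo_qps):
        # Convert high-level demand (user queries to Foo) into lower-level
        # resource demand: Foo's compute plus the bandwidth load on Bar.
        units = foo_qps / 1000.0
        return {
            "foo_cpu_cores": units * FOO_CPU_CORES_PER_1K_QPS,
            "bar_mbps": units * BAR_MBPS_PER_1K_FOO_QPS,
        }

    print(derived_demand(250_000))  # {'foo_cpu_cores': 625.0, 'bar_mbps': 10000.0}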

Doing so resolves the maximum amount of dependent uncertainty for the minimum number of iterations. Of course, when an area of uncertainty resolves into a fault, you need to select additional branch points.

Testing Scalable Tools: As pieces of software, SRE tools also need testing.10 SRE-developed tools might perform tasks such as the following:
• Retrieving and propagating database performance metrics
• Predicting usage metrics to plan for capacity risks
• Refactoring data within a service replica that isn’t user accessible
• Changing files on a server

SRE tools share two characteristics:
• Their side effects remain within the tested mainstream API
• They’re isolated from user-facing production by an existing validation and release barrier

Barrier Defenses Against Risky Software: Software that bypasses the usual heavily tested API (even if it does so for a good cause) could wreak havoc on a live service.

Ideally, all levels of intent should be supported together, with services benefiting the more they shift to specifying intent versus implementation. In Google’s experience, services tend to achieve the best wins as they cross to step 3: good degrees of flexibility are available, and the ramifications of this request are in higher-level and understandable terms. Particularly sophisticated services may aim for step 4.

Precursors to Intent: What information do we need in order to capture a service’s intent? Enter dependencies, performance metrics, and prioritization.

Dependencies: Services at Google depend on many other infrastructure and user-facing services, and these dependencies heavily influence where a service can be placed. For example, imagine user-facing service Foo, which depends upon Bar, an infrastructure storage service. Foo expresses a requirement that Bar must be located within 30 milliseconds of network latency of Foo.


pages: 353 words: 88,376

The Investopedia Guide to Wall Speak: The Terms You Need to Know to Talk Like Cramer, Think Like Soros, and Buy Like Buffett by Jack (edited By) Guinan


Albert Einstein, asset allocation, asset-backed security, Brownian motion, business process, capital asset pricing model, clean water, collateralized debt obligation, computerized markets, correlation coefficient, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, discounted cash flows, diversification, diversified portfolio, dividend-yielding stocks, equity premium, fixed income, implied volatility, index fund, intangible asset, interest rate swap, inventory management, London Interbank Offered Rate, margin call, market fundamentalism, money market fund, mortgage debt, Myron Scholes, passive investing, performance metric, risk tolerance, risk-adjusted returns, risk/return, shareholder value, Sharpe ratio, short selling, statistical model, time value of money, transaction costs, yield curve, zero-coupon bond

Related Terms: • American Depositary Receipt—ADR • Correlation • Exchange-Traded Fund—ETF • Global Depositary Receipt—GDR • Index

Multiple: What Does Multiple Mean? A term that measures a particular aspect of a company’s financial well-being, determined by dividing one metric by another metric. The metric in the numerator is typically larger than the one in the denominator, because the top metric usually is supposed to be many times larger than the bottom metric. It is calculated as follows:

Multiple = Performance Metric “A” / Performance Metric “B”

Investopedia explains Multiple: As an example, the term “multiple” can be used to show how much investors are willing to pay per dollar of earnings, as computed by the P/E ratio. Suppose one is analyzing a stock with $2 of earnings per share (EPS) that is trading at $20; this stock has a P/E of 10. This means that investors are willing to pay a multiple of 10 times earnings for the stock.

For example, a European investor purchasing shares of an American company on a foreign exchange (using American dollars to do so) would be exposed to exchange-rate risk while holding that stock. To hedge that risk, the investor could purchase currency futures to lock in a specified exchange rate for the future stock sale and conversion back into the foreign currency. Related Terms: • Credit Derivative • Hedge • Stock Option • Forward Contract • Option

Diluted Earnings per Share (Diluted EPS): What Does Diluted Earnings per Share (Diluted EPS) Mean? A performance metric used to gauge the quality of a company’s earnings per share (EPS) if all convertible securities were exercised. Convertible securities refer to all outstanding convertible preferred shares, convertible debentures, stock options (primarily employee-based), and warrants. Unless the company has no additional potential shares outstanding (a relatively rare circumstance), the diluted EPS will always be lower than the simple EPS.
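A sketch of the calculation implied here (the textbook formula with invented figures, not an example from the guide):

    def diluted_eps(net_income, preferred_dividends,
                    weighted_avg_shares, dilutive_shares):
        # Earnings spread across all shares that would exist if every
        # convertible security were exercised, so diluted EPS is never
        # higher than simple (basic) EPS.
        return ((net_income - preferred_dividends)
                / (weighted_avg_shares + dilutive_shares))

    # Hypothetical company: $50M income, $2M preferred dividends,
    # 20M shares outstanding, 3M potential shares from convertibles.
    print(diluted_eps(50_000_000, 2_000_000, 20_000_000, 3_000_000))  # ~2.09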


pages: 297 words: 91,141

Market Sense and Nonsense by Jack D. Schwager


3Com Palm IPO, asset allocation, Bernie Madoff, Brownian motion, collateralized debt obligation, commodity trading advisor, computerized trading, conceptual framework, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, diversification, diversified portfolio, fixed income, high net worth, implied volatility, index arbitrage, index fund, London Interbank Offered Rate, Long Term Capital Management, margin call, market bubble, market fundamentalism, merger arbitrage, negative equity, pattern recognition, performance metric, pets.com, Ponzi scheme, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, risk/return, Robert Shiller, selection bias, Sharpe ratio, short selling, statistical arbitrage, statistical model, survivorship bias, transaction costs, two-sided market, value at risk, yield curve

But the story does not end there.

[Figure 3.11: NAV Comparison: Three-Period Prior Best S&P Sector versus Prior Worst and Average. Data source: S&P Dow Jones Indices.]

So far, the analysis has only considered returns and has shown that choosing the best past sector would have yielded slightly lower returns than an equal-allocation approach (that is, the average). Return, however, is an incomplete performance metric. Any meaningful performance comparison must also consider risk (a concept we will elaborate on in Chapter 4). We use two measures of risk here:

1. Standard deviation. The standard deviation is a volatility measure that indicates how spread out the data is—in this case, how broadly the returns vary. Roughly speaking, we would expect approximately 95 percent of the data points to fall within two standard deviations of the mean.

Based on performance, it would be difficult to justify choosing Manager E over Manager F, even for the most risk-tolerant investor.

[Figure 8.12: 2DUC: Manager E versus Manager F]

Investment Misconceptions

Investment Misconception 23: The average annual return is probably the single most important performance statistic.
Reality: Return alone is a meaningless statistic because return can always be increased by increasing risk. The return/risk ratio should be the primary performance metric.

Investment Misconception 24: For a risk-seeking investor considering two investment alternatives, an investment with expected lower return/risk but higher return may often be preferable to an equivalent-quality investment with the reverse characteristics.
Reality: The higher return/risk alternative would still be preferable, even for risk-seeking investors, because by using leverage it can be translated into an equivalent return with lower risk (or higher return with equal risk).

However, pro forma results that only adjust for differences between current and past fees and commissions can be more representative than actual results. It is critical to differentiate between these two radically different applications of the same term: pro forma.

16. Return alone is a meaningless statistic because return can be increased by increasing risk. The return/risk ratio should be the primary performance metric.

17. Although the Sharpe ratio is by far the most widely used return/risk measure, return/risk measures based on downside risk come much closer to reflecting risk as it is perceived by most investors.

18. Conventional arithmetic-scale net asset value (NAV) charts provide a distorted picture, especially for longer-term track records that traverse a wide range of NAV levels. A log scale should be used for long-term NAV charts. 19.


pages: 49 words: 12,968

Industrial Internet by Jon Bruner


autonomous vehicles, barriers to entry, commoditize, computer vision, data acquisition, demand response, en.wikipedia.org, factory automation, Google X / Alphabet X, industrial robot, Internet of things, job automation, loose coupling, natural language processing, performance metric, Silicon Valley, slashdot, smart grid, smart meter, statistical model, web application

Newer wind turbines use software that acts in real time to squeeze a little more current out of each revolution, pitching the blades slightly as they rotate to compensate for the fact that gravity shortens them as they approach the top of their spin and lengthens them as they reach the bottom. Power producers use higher-level data analysis to inform longer-range capital strategies. The 150-foot-long blades on a wind turbine, for instance, chop at the air as they move through it, sending turbulence to the next row of turbines and reducing efficiency. By analyzing performance metrics from existing wind installations, planners can recommend new layouts that take into account common wind patterns and minimize interference.

Automotive: Google captured the public imagination when, in 2010, it announced that its autonomous cars had already driven 140,000 miles of winding California roads without incident. The idea of a car that drives itself was finally realized in a practical way by software that has strong links to the physical world around it: inbound, through computer vision software that takes in images and rangefinder data and builds an accurate model of the environment around the car; and outbound, through a full linkage to the car’s controls.


pages: 231 words: 71,248

Shipping Greatness by Chris Vander Mey


corporate raider, don't be evil, en.wikipedia.org, fudge factor, Google Chrome, Google Hangouts, Gordon Gekko, Jeff Bezos, Kickstarter, Lean Startup, minimum viable product, performance metric, recommendation engine, Skype, slashdot, sorting algorithm, source of truth, Steve Jobs, Superbowl ad, web application

Some rocket surgeon a while back came up with the notion that goals should be specific, measurable, attainable, reasonable, and time-based. This is a good, but not sufficiently specific, framework. I prefer the Great Delta Convention (described in Chapter 10). If you apply the Great Delta Convention to your goals, nobody will question them—they will almost be S.M.A.R.T. by definition (lacking only the “reasonable” part).

Business Performance: Business performance metrics tell you where your problems are and how you can improve your user’s experience. These metrics are frequently measured as ratios, such as conversion from when a user clicks the Buy button to when the checkout process is complete. Like goal metrics, it’s critical to measure the right aspects of your business. For example, if you want to build a great social product, you don’t need to measure friends—different segments of users have different numbers of friends.

Google Analytics provides A/B comparison tools that are incredibly powerful, but they’re just one kind of many tools you can use. Most major websites have testing frameworks that they use to roll out features incrementally and ensure that a new feature or experience has the intended effect. If it’s even remotely possible, try to build an experimentation framework in from the beginning (see Chapter 7’s discussion of launching for other benefits of experiments).

Systems Performance: Systems performance metrics measure the health of your product in real time. Metrics like these include 99.9% mean latency, total requests per second, simultaneous users, orders per second, and other time-based metrics. When these metrics go down substantially, something has gone wrong. A pager should go off. If you’re a very fancy person, you’ll want to look at your metrics through the lens of statistical process control (SPC).
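A sketch of how such real-time metrics might be computed from a window of request timings (assuming the "99.9%" figure refers to a 99.9th-percentile latency; the data is invented):

    import numpy as np

    def systems_metrics(latencies_ms, window_seconds):
        # One latency sample per request observed during the window.
        return {
            "p99.9_latency_ms": float(np.percentile(latencies_ms, 99.9)),
            "requests_per_second": len(latencies_ms) / window_seconds,
        }

    rng = np.random.default_rng(1)
    latencies = rng.lognormal(mean=3.0, sigma=0.5, size=50_000)  # toy data
    print(systems_metrics(latencies, window_seconds=60))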


pages: 291 words: 77,596

Total Recall: How the E-Memory Revolution Will Change Everything by C. Gordon Bell, Jim Gemmell


airport security, Albert Einstein, book scanning, cloud computing, conceptual framework, Douglas Engelbart, full text search, information retrieval, invention of writing, inventory management, Isaac Newton, John Markoff, lifelogging, Menlo Park, optical character recognition, pattern recognition, performance metric, RAND corporation, RFID, semantic web, Silicon Valley, Skype, social web, statistical model, Stephen Hawking, Steve Ballmer, Ted Nelson, telepresence, Turing test, Vannevar Bush, web application

“Recognizing soldier activities in the field.” Proceedings of International IEEE Workshop on Wearable and Implantable Body Sensor Networks (BSN), Aachen, Germany, March 2007. Schlenoff, Craig, et al. “Overview of the First Advanced Technology Evaluations for ASSIST.” Proceedings of Performance Metrics for Intelligent Systems (PerMIS) 2006, IEEE Press, Gaithersburg, Maryland, August 2006. Stevers, Michelle Potts. “Utility Assessments of Soldier-Worn Sensor Systems for ASSIST.” Proceedings of the Performance Metrics for Intelligent Systems Workshop, 2006. Starner, Thad. “The Virtual Patrol: Capturing and Accessing Information for the Soldier in the Field.” Proceedings of the 3rd ACM Workshop on Continuous Archival and Retrieval of Personal Experiences, Santa Barbara, California, 2006. Glass Box: Cowley, Paula, Jereme Haack, Rik Littlefield, and Ernest Hampson.


pages: 98 words: 25,753

Ethics of Big Data: Balancing Risk and Innovation by Kord Davis, Doug Patterson


4chan, business process, corporate social responsibility, crowdsourcing, en.wikipedia.org, Mahatma Gandhi, Mark Zuckerberg, Netflix Prize, Occupy movement, performance metric, Robert Bork, side project, smart grid, urban planning

The volume at which new data is being generated is staggering. We live in an age when the amount of data we expect to be generated in the world is measured in exabytes and zettabytes. By 2025, the forecast is that the Internet will exceed the brain capacity of everyone living on the entire planet. Additionally, the variety of sources and data types being generated expands as fast as new technology can be created. Performance metrics from in-car monitors, manufacturing floor yield measurements, all manner of healthcare devices, and the growing number of Smart Grid energy appliances all generate data. More importantly, they generate data at a rapid pace. The velocity of data generation, acquisition, processing, and output increases exponentially as the number of sources and increasingly wider variety of formats grows over time.


pages: 561 words: 114,843

Startup CEO: A Field Guide to Scaling Up Your Business, + Website by Matt Blumberg


activist fund / activist shareholder / activist investor, airport security, Albert Einstein, bank run, Broken windows theory, crowdsourcing, deskilling, fear of failure, high batting average, high net worth, hiring and firing, Inbox Zero, James Hargreaves, Jeff Bezos, job satisfaction, Kickstarter, knowledge economy, knowledge worker, Lean Startup, Mark Zuckerberg, minimum viable product, pattern recognition, performance metric, pets.com, rolodex, Rubik’s Cube, shareholder value, Silicon Valley, Skype

This was less surprising about some aspects than others. For example, I wasn’t surprised that there was a high degree of convergence in the way people thought about the organization’s values since we had a strong values-driven culture that people were living every day, even if those values hadn’t been well articulated in the past. But it was a little surprising that we could effectively crowdsource a strategy statement and key performance metrics at a time when the business was at a fork in the road. Given this degree of alignment, our task as an executive team became less about picking concepts and more about picking words. We worked together to come up with a solid draft that took the best of what was submitted to us. We worked with a copywriter to make the statements flow well. Then we shared the results with the company and opened the floor for comments.

How and when is this investment going to pay itself back? What is the capital required to get there and what are your financing requirements from where your balance sheet sits today? The costs are easier to forecast, especially if you carefully articulated your resource requirements. As everybody in the startup world knows, ROI is trickier. You’re not leading an enterprise that has extremely detailed historical performance metrics to rely on in its forecasting. When Schick or Gillette introduces a new razor into the marketplace, they can very accurately forecast how much it’s going to cost them and what their return will be. If you’re creating a new product in a new marketplace, that isn’t the case. While monthly burn and revenue projections will inevitably change, capital expenditures can be more predictable, though you need to make sure you understand the cash flow mechanics of capital expenditure.

Second, those criteria have to be things that will remain in the control of the acquired company for the length of the earn-out; asking an entrepreneur to agree to an earn-out based on sales, for example, when your sales force will be doing all of the selling, doesn’t make sense. Finally, an earn-out can’t be too high a percentage of the deal. The preponderance will have to be cash and stock. Otherwise, the process of judging performance should be shared by both parties. In one of our largest deals at Return Path, each side appointed representatives who met quarterly to agree on performance metrics, adjustments, and so on. We also designated a third representative in advance who was available to adjudicate any disagreements. We never had to use him. Whatever mechanism you put in place, trust plays a huge role here. If it’s not there, this acquisition might not be a good idea.

THE FLIP SIDE OF M&A: DIVESTITURE

When Return Path turned six years old in 2005, we had gone from being a startup focused on our initial ECOA business to the world’s smallest conglomerate, with five lines of business: in addition to change of address, we were market leaders in email delivery assurance (a market we created), email-based market research (a tiny market when we started), and email list management and list rental (both huge markets when we founded the company).


pages: 556 words: 46,885

The World's First Railway System: Enterprise, Competition, and Regulation on the Railway Network in Victorian Britain by Mark Casson


banking crisis, barriers to entry, Beeching cuts, British Empire, combinatorial explosion, Corn Laws, corporate social responsibility, David Ricardo: comparative advantage, intermodal, iterative process, joint-stock company, joint-stock limited liability company, knowledge economy, linear programming, Network effects, New Urbanism, performance metric, railway mania, rent-seeking, strikebreaker, the market place, transaction costs

To this end, the counterfactual has been constructed on very conservative assumptions, which are elaborated below. The engineering assumptions are very conservative relative to actual railway practice, while the use of detailed land surveys and large-scale maps means that major infringements of local parks and amenities have been avoided.

1.4. PERFORMANCE METRICS: DISTANCE AND TIME

Two main performance metrics are used in this study: journey distance and journey time. The most obvious metric by which to compare the actual and counterfactual systems is by the route mileages between pairs of towns. This metric is not quite so useful as it seems, however. For many types of traffic, including passengers, mail, troops, and perishable goods, it is the time taken by the journey that is important and not the distance per se.

In practice the counterfactual system, being smaller, would have been completed much earlier than the actual system, assuming that the pace of construction had been the same. Thus the average working life of the counterfactual system would have been longer—another advantage which is not formally included in the comparison.

3.4. CONSTRUCTION OF THE COUNTERFACTUAL: PERFORMANCE METRICS

To compare the performance of the actual and counterfactual systems a set of 250 representative journeys was examined. Ten different types of journey were distinguished, and sub-samples of 25 journeys of each type were generated. Performance was measured for each type of journey, and an overall measure of performance, based on an arithmetic average, was constructed.



pages: 444 words: 86,565

Investment Banking: Valuation, Leveraged Buyouts, and Mergers and Acquisitions by Joshua Rosenbaum, Joshua Pearl, Joseph R. Perella

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

asset allocation, asset-backed security, bank run, barriers to entry, capital asset pricing model, collateralized debt obligation, corporate governance, credit crunch, discounted cash flows, diversification, fixed income, intangible asset, London Interbank Offered Rate, performance metric, shareholder value, sovereign wealth fund, technology bubble, time value of money, transaction costs, yield curve

First, we benchmark the key financial statistics and ratios for the target and its comparables in order to establish relative positioning, with a focus on identifying the closest or “best” comparables and noting potential outliers. Second, we analyze and compare the trading multiples for the peer group, placing particular emphasis on the best comparables. Benchmark the Financial Statistics and Ratios The first stage of the benchmarking analysis involves a comparison of the target and comparables universe on the basis of key financial performance metrics. These metrics, as captured in the financial profile framework outlined in Steps I and III, include measures of size, profitability, growth, returns, and credit strength. They are core value drivers and typically translate directly into relative valuation. The results of the benchmarking exercise are displayed on spreadsheet output pages that present the data for each company in an easy-to-compare format (see Exhibits 1.53 and 1.54).

EXHIBIT 3.38 ValueCo Projected Taxes

Capex Projections We projected ValueCo’s capex as a percentage of sales in line with historical levels. As shown in Exhibit 3.39, this approach led us to hold capex constant throughout the projection period at 2% of sales. Based on this assumption, capex increases from $21.6 million in 2009E to $25.3 million in 2013E.

EXHIBIT 3.39 ValueCo Historical and Projected Capex

Change in Net Working Capital Projections As with ValueCo’s other financial performance metrics, historical working capital levels normally serve as reliable indicators of future performance. The direct prior year’s ratios are typically the most indicative provided they are consistent with historical levels. This was the case for ValueCo’s 2007 working capital ratios, which we held constant throughout the projection period (see Exhibit 3.40).

EXHIBIT 3.40 ValueCo Historical and Projected Net Working Capital

For A/R, inventory, and A/P, respectively, these ratios are DSO of 60.2, DIH of 76.0, and DPO of 45.6.
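To make the mechanics concrete, the sketch below (Python) projects capex at a constant 2% of sales and derives working-capital balances from the stated ratios. The end-point sales are backed out from the capex figures quoted above ($21.6m / 2% and $25.3m / 2%), the intermediate years are interpolated, and the 60% COGS margin and the convention that DIH and DPO are applied to COGS are assumptions for illustration, not figures from the book.

    # Sketch of the projection mechanics described above.
    CAPEX_PCT, DSO, DIH, DPO = 0.02, 60.2, 76.0, 45.6
    COGS_MARGIN = 0.60  # illustrative assumption, not from the book

    # Sales path implied by capex at 2% of sales; middle years interpolated.
    sales_path = [1080.0, 1123.0, 1168.0, 1215.0, 1265.0]  # 2009E-2013E, $m

    for year, sales in zip(range(2009, 2014), sales_path):
        cogs = sales * COGS_MARGIN
        capex = sales * CAPEX_PCT
        ar = DSO / 365.0 * sales        # accounts receivable
        inv = DIH / 365.0 * cogs        # inventory
        ap = DPO / 365.0 * cogs         # accounts payable
        nwc = ar + inv - ap             # ignoring other current items
        print(f"{year}E  capex={capex:5.1f}  NWC={nwc:6.1f}")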


pages: 484 words: 104,873

Rise of the Robots: Technology and the Threat of a Jobless Future by Martin Ford

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

3D printing, additive manufacturing, Affordable Care Act / Obamacare, AI winter, algorithmic trading, Amazon Mechanical Turk, artificial general intelligence, assortative mating, autonomous vehicles, banking crisis, basic income, Baxter: Rethink Robotics, Bernie Madoff, Bill Joy: nanobots, call centre, Capital in the Twenty-First Century by Thomas Piketty, Chris Urmson, Clayton Christensen, clean water, cloud computing, collateralized debt obligation, commoditize, computer age, creative destruction, debt deflation, deskilling, diversified portfolio, Erik Brynjolfsson, factory automation, financial innovation, Flash crash, Fractional reserve banking, Freestyle chess, full employment, Goldman Sachs: Vampire Squid, Gunnar Myrdal, High speed trading, income inequality, indoor plumbing, industrial robot, informal economy, iterative process, Jaron Lanier, job automation, John Markoff, John Maynard Keynes: technological unemployment, John von Neumann, Kenneth Arrow, Khan Academy, knowledge worker, labor-force participation, labour mobility, liquidity trap, low skilled workers, low-wage service sector, Lyft, manufacturing employment, Marc Andreessen, McJob, moral hazard, Narrative Science, Network effects, new economy, Nicholas Carr, Norbert Wiener, obamacare, optical character recognition, passive income, Paul Samuelson, performance metric, Peter Thiel, Plutocrats, plutocrats, post scarcity, precision agriculture, price mechanism, Ray Kurzweil, rent control, rent-seeking, reshoring, RFID, Richard Feynman, Richard Feynman, Rodney Brooks, secular stagnation, self-driving car, Silicon Valley, Silicon Valley startup, single-payer health, software is eating the world, sovereign wealth fund, speech recognition, Spread Networks laid a new fibre optics cable between New York and Chicago, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, Steven Pinker, strong AI, Stuxnet, technological singularity, telepresence, telepresence robot, The Bell Curve by Richard Herrnstein and Charles Murray, The Coming Technological Singularity, The Future of Employment, Thomas L Friedman, too big to fail, Tyler Cowen: Great Stagnation, union organizing, Vernor Vinge, very high income, Watson beat the top human players on Jeopardy!, women in the workforce

Police departments across the globe are turning to algorithmic analysis to predict the times and locations where crimes are most likely to occur and then deploying their forces accordingly. The City of Chicago’s data portal allows residents to see both historical trends and real-time data in a range of areas that capture the ebb and flow of life in a major city—including energy usage, crime, performance metrics for transportation, schools and health care, and even the number of potholes patched in a given period of time. Tools that provide new ways to visualize data collected from social media interactions as well as sensors built into doors, turnstiles, and escalators offer urban planners and city managers graphic representations of the way people move, work, and interact in urban environments, a development that may lead directly to more efficient and livable cities.

He received the green light from IBM management in 2007 and set out to build, in his words, “the most sophisticated intelligence architecture the world has ever seen.”18 To do this, he drew on resources from throughout the company and put together a team consisting of artificial intelligence experts from within IBM as well as at top universities, including MIT and Carnegie Mellon.19 Ferrucci’s team, which eventually grew to include about twenty researchers, began by building a massive collection of reference information that would form the basis for Watson’s responses. This amounted to about 200 million pages of information, including dictionaries and reference books, works of literature, newspaper archives, web pages, and nearly the entire content of Wikipedia. Next they collected historical data for the Jeopardy! quiz show. Over 180,000 clues from previously televised matches became fodder for Watson’s machine learning algorithms, while performance metrics from the best human competitors were used to refine the computer’s betting strategy.20 Watson’s development required thousands of separate algorithms, each geared toward a specific task—such as searching within text; comparing dates, times, and locations; analyzing the grammar in clues; and translating raw information into properly formatted candidate responses. Watson begins by pulling apart the clue, analyzing the words, and attempting to understand what exactly it should look for.


pages: 324 words: 92,805

The Impulse Society: America in the Age of Instant Gratification by Paul Roberts

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, accounting loophole / creative accounting, activist fund / activist shareholder / activist investor, Affordable Care Act / Obamacare, American Society of Civil Engineers: Report Card, asset allocation, business process, Cass Sunstein, centre right, choice architecture, collateralized debt obligation, collective bargaining, computerized trading, corporate governance, corporate raider, corporate social responsibility, creative destruction, crony capitalism, David Brooks, delayed gratification, double helix, factory automation, financial deregulation, financial innovation, fixed income, full employment, game design, greed is good, If something cannot go on forever, it will stop - Herbert Stein's Law, impulse control, income inequality, inflation targeting, invisible hand, job automation, John Markoff, Joseph Schumpeter, knowledge worker, late fees, Long Term Capital Management, loss aversion, low skilled workers, mass immigration, new economy, Nicholas Carr, obamacare, Occupy movement, oil shale / tar sands, performance metric, postindustrial economy, profit maximization, Report Card for America’s Infrastructure, reshoring, Richard Thaler, rising living standards, Robert Shiller, Robert Shiller, Rodney Brooks, Ronald Reagan, shareholder value, Silicon Valley, speech recognition, Steve Jobs, technoutopianism, the built environment, The Predators' Ball, the scientific method, The Wealth of Nations by Adam Smith, Thorstein Veblen, too big to fail, total factor productivity, Tyler Cowen: Great Stagnation, Walter Mischel, winner-take-all economy

On the downside, Autor told me, those jobs will always be low-wage “because the skills they use are generic and almost anyone can be productive at them within a couple of days.”34 And, in fact, there will likely be far more downsides to these jobs than upsides. For example, because Big Data will allow companies to more easily and accurately measure worker productivity, workers will be under constant pressure to meet specific performance metrics and will be subject to constant ratings, just as restaurants and online products are today. Companies will assess every data point that might affect performance, so that every aspect of employment, from applying for a job to the actual performance of duties, will become much more closely scrutinized and assessed. “If you’re a worker, there’ll be, like, credit scores,” Cowen told NPR.35 “There already are, to some extent.

There will be no middle class in the way we now understand the term: median income will be much lower than it is, and many of the poor will lack access to even basic public services, in part because the wealthy will resist tax increases. “Rather than balancing our budget with higher taxes or lower benefits,” Cowen says, “we will allow the real wages of many workers to fall, and thus we will allow the creation of a new underclass.” Certain critics have found such dystopic visions far too grim. And yet, the signs of such a future are everywhere. Already, companies are using Big Data performance metrics to determine whom to cut—meaning that to be laid off is to be branded unemployable. In the ultimate corruption of innovation, a technology that might be used to help workers upgrade their skills and become more secure is instead being used to harass them. To be sure, Big Data will be put to more beneficial uses. Digital technologies will certainly remake the way we deliver education, for example.


pages: 323 words: 90,868

The Wealth of Humans: Work, Power, and Status in the Twenty-First Century by Ryan Avent

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

3D printing, Airbnb, American energy revolution, assortative mating, autonomous vehicles, Bakken shale, barriers to entry, basic income, Bernie Sanders, BRICs, call centre, Capital in the Twenty-First Century by Thomas Piketty, Clayton Christensen, cloud computing, collective bargaining, computer age, creative destruction, dark matter, David Ricardo: comparative advantage, deindustrialization, dematerialisation, Deng Xiaoping, deskilling, Dissolution of the Soviet Union, Donald Trump, Downton Abbey, Edward Glaeser, Erik Brynjolfsson, eurozone crisis, everywhere but in the productivity statistics, falling living standards, first square of the chessboard, first square of the chessboard / second half of the chessboard, Ford paid five dollars a day, Francis Fukuyama: the end of history, future of work, gig economy, global supply chain, global value chain, hydraulic fracturing, income inequality, indoor plumbing, industrial robot, intangible asset, interchangeable parts, Internet of things, inventory management, invisible hand, Jacquard loom, James Watt: steam engine, Jeff Bezos, John Maynard Keynes: Economic Possibilities for our Grandchildren, Joseph-Marie Jacquard, knowledge economy, low skilled workers, lump of labour, Lyft, manufacturing employment, Marc Andreessen, mass immigration, means of production, new economy, performance metric, pets.com, price mechanism, quantitative easing, Ray Kurzweil, rent-seeking, reshoring, rising living standards, Robert Gordon, Ronald Coase, savings glut, Second Machine Age, secular stagnation, self-driving car, sharing economy, Silicon Valley, single-payer health, software is eating the world, supply-chain management, supply-chain management software, TaskRabbit, The Future of Employment, The Nature of the Firm, The Spirit Level, The Wealth of Nations by Adam Smith, Thomas Malthus, trade liberalization, transaction costs, Tyler Cowen: Great Stagnation, Uber and Lyft, Uber for X, very high income, working-age population

That knowledge is absorbed by newer employees over time, through long exposure to the old habits. What our firm is, is not so much a business that produces a weekly magazine, but a way of doing things consisting of an enormous set of processes. You run that programme, and you get a weekly magazine at the end of it. Employees want job security, to advance, to receive pay rises. Those desires are linked to tangible performance metrics; within The Economist, it matters that a writer delivers the expected stories with the expected frequency and with the expected quality. Yet that is not all that matters. Advancement is also about the extent to which a worker thrives within a culture. What constitutes thriving depends on the culture. In some firms, it may mean buttering up the bosses and working long hours. In others, it may mean the practice of Machiavellian office politics.

The information-processing role of the firm can help us to understand the phenomenon of ‘disruption’, in which older businesses struggle to adapt to powerful new technologies or market opportunities. The notion of a ‘disruptive’ technology was first described in detail by Clayton Christensen, a scholar at Harvard Business School.4 Disruption is one of the most important ideas in business and management to emerge over the last generation. A disruptive innovation, in Christensen’s sense, is one that is initially not very good, in the sense that it does badly on the performance metrics that industry leaders care about, but which then catches on rapidly, wrong-footing older firms and upending the industry. Christensen explained his idea through the disk-drive industry, which was once dominated by large, 8-inch disks that could hold lots of information and access it very quickly. Both disk-drive makers and their customers initially thought that smaller drives were of little practical use.


pages: 290 words: 87,549

The Airbnb Story: How Three Ordinary Guys Disrupted an Industry, Made Billions...and Created Plenty of Controversy by Leigh Gallagher

Airbnb, Amazon Web Services, barriers to entry, Bernie Sanders, cloud computing, crowdsourcing, don't be evil, Donald Trump, East Village, Elon Musk, housing crisis, iterative process, Jeff Bezos, Jony Ive, Justin.tv, Lyft, Marc Andreessen, Mark Zuckerberg, medical residency, Menlo Park, Network effects, Paul Buchheit, Paul Graham, performance metric, Peter Thiel, RFID, Sand Hill Road, Saturday Night Live, sharing economy, side project, Silicon Valley, Silicon Valley startup, South of Market, San Francisco, Startup school, Steve Jobs, TaskRabbit, the payments system, Tony Hsieh, Y Combinator, yield management

That ability could be used as a powerful reward mechanism to its hosts: those who provided positive experiences for guests and received good reviews would get vaulted to the top of search results, giving them greater exposure and increasing their chances of future bookings. But decline too many requests or respond too slowly or cancel too many reservations or simply appear inhospitable in reviews, and Airbnb can drop a powerful hammer: it can lower your listing in search results or even deactivate your account. Behave well, though, and Airbnb will shine its love upon you. If you hit a certain series of performance metrics—in the past year, if you have hosted at least ten trips, if you have maintained a 90 percent response rate or higher, if you have received a five-star review at least 80 percent of the time, and if you’ve canceled a reservation only rarely or in extenuating circumstances, you are automatically elevated to “Superhost” status. That means you get a special logo on your site, your listing will be bumped way up in the rankings, you’ll get access to a dedicated customer-support line, and you might even get the chance to preview new products and attend events.
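Read as a rule set, the Superhost criteria above reduce to a simple qualification check. The sketch below (Python) encodes the thresholds exactly as listed; the field names, and the treatment of “rarely or in extenuating circumstances” as a zero-count of ordinary cancellations, are assumptions for illustration, not Airbnb’s actual implementation.

    # Hypothetical encoding of the Superhost criteria as described; field
    # names and the cancellation rule are illustrative assumptions.
    def is_superhost(trips_hosted, response_rate, five_star_share,
                     ordinary_cancellations):
        """Trailing-12-month stats -> True if every threshold is met."""
        return (trips_hosted >= 10
                and response_rate >= 0.90
                and five_star_share >= 0.80
                and ordinary_cancellations == 0)

    print(is_superhost(14, 0.97, 0.85, 0))  # True
    print(is_superhost(14, 0.97, 0.85, 2))  # False: too many cancellations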



pages: 132 words: 31,976

Getting Real by Jason Fried, David Heinemeier Hansson, Matthew Linderman, 37 Signals

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

call centre, collaborative editing, David Heinemeier Hansson, iterative process, John Gruber, knowledge worker, Merlin Mann, Metcalfe's law, performance metric, premature optimization, Ruby on Rails, slashdot, Steve Jobs, web application

Complexity Does Not Scale Linearly With Size The most important rule of software engineering is also the least known: Complexity does not scale linearly with size...A 2000-line program requires more than twice as much development time as one half the size. —The Ganssle Group (from Keep It Small)

Optimize for Happiness Choose tools that keep your team excited and motivated A happy programmer is a productive programmer. That's why we optimize for happiness and you should too. Don't just pick tools and practices based on industry standards or performance metrics. Look at the intangibles: Is there passion, pride, and craftsmanship here? Would you truly be happy working in this environment eight hours a day? This is especially important for choosing a programming language. Despite public perception to the contrary, they are not created equal. While just about any language can create just about any application, the right one makes the effort not merely possible or bearable, but pleasant and invigorating.


pages: 128 words: 38,187

The New Prophets of Capital by Nicole Aschoff

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

3D printing, affirmative action, Affordable Care Act / Obamacare, Airbnb, American Legislative Exchange Council, basic income, Bretton Woods, clean water, collective bargaining, commoditize, crony capitalism, feminist movement, follow your passion, Food sovereignty, glass ceiling, global supply chain, global value chain, helicopter parent, hiring and firing, income inequality, Khan Academy, late capitalism, Lyft, Mark Zuckerberg, mass incarceration, means of production, performance metric, profit motive, rent-seeking, Ronald Reagan, Rosa Parks, school vouchers, shareholder value, sharing economy, Silicon Valley, Slavoj Žižek, structural adjustment programs, Thomas L Friedman, Tim Cook: Apple, urban renewal, women in the workforce, working poor, zero-sum game

But they are not, and feminist ideals cannot be achieved if they are pursued Sandberg-style. Women who channel their energies toward reaching the top of corporate America undermine the struggles of women trying to realize institutional change by organizing unions and implementing laws that protect women (and men) in the workplace. An anecdote shared by Sandberg illustrates this point: In 2010 Mark Zuckerberg pledged $100 million to improve the performance metrics of the Newark Public Schools. The money would be distributed through a new foundation called Startup: Education. Sandberg recommended Jen Holleran, a woman she knew “with deep knowledge and experience in school reform” to run the foundation. The only problem was that Jen was raising fourteen-month-old twins at the time, working part time, and not getting much help from her husband. Jen hesitated to accept the offer, fearful of “upsetting the current order” at home.


pages: 892 words: 91,000

Valuation: Measuring and Managing the Value of Companies by Tim Koller, McKinsey & Company Inc., Marc Goedhart, David Wessels, Barbara Schwimmer, Franziska Manoury

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

activist fund / activist shareholder / activist investor, air freight, barriers to entry, Basel III, BRICs, business climate, business process, capital asset pricing model, capital controls, Chuck Templeton: OpenTable, cloud computing, commoditize, compound rate of return, conceptual framework, corporate governance, corporate social responsibility, creative destruction, credit crunch, Credit Default Swap, discounted cash flows, distributed generation, diversified portfolio, energy security, equity premium, fixed income, index fund, intangible asset, iterative process, Long Term Capital Management, market bubble, market friction, meta analysis, meta-analysis, Myron Scholes, negative equity, new economy, p-value, performance metric, Ponzi scheme, price anchoring, purchasing power parity, quantitative easing, risk/return, Robert Shiller, Robert Shiller, shareholder value, six sigma, sovereign wealth fund, speech recognition, survivorship bias, technology bubble, time value of money, too big to fail, transaction costs, transfer pricing, value at risk, yield curve, zero-coupon bond

Equal attention is paid to the long-term value-creating intent behind short-term profit targets, and people across the company are in constant communication about the adjustments needed to stay in line with long-term performance goals. We approach performance management from both an analytical and an organizational perspective. The analytical perspective focuses first on ensuring that companies use the right metrics at the right level in the organization. Companies should not just rely on performance metrics for divisions or business units, but disaggregate performance to the level of individual business segments. In addition to historical performance measures, companies need to use diagnostic metrics that help them understand and manage their ability to create value over the longer term. Second, we analyze how to set appropriate targets, giving examples of analytically sound performance measurement in action.

At some point, expansion of market share and sales will require additional production capacity. Once that point is reached, the associated investments and operating costs need to be factored in for target setting in individual business segments. (For example, declining sales in one segment would imply increasing capital allocated to other segments even if their sales would be unchanged.)

The Right Metrics in Action

Choosing the right performance metrics can provide new insights into how a company might improve its performance in the future. For instance, Exhibit 26.8 illustrates the most important value drivers for a pharmaceutical company. The exhibit shows the key value drivers, the company’s current performance relative to best- and worst-in-class benchmarks, its aspirations for each driver, and the potential value impact from meeting its targets.

The greatest value creation would come from three areas: accelerating the rate of release of new products from 0.5 to 0.8 per year, reducing from six years to four the time it takes for a new drug to reach 80 percent of peak sales, and cutting the cost of goods sold from 26 percent to 23 percent of sales. Some of the value drivers (such as new-drug development) are long-term, whereas others (such as reducing cost of goods sold) have a shorter-term focus. Similarly, focusing on the right performance metrics can help reveal what may be driving underperformance. A consumer goods company we know illustrates the importance of having a tailored set of key value metrics. For several years, a business unit showed consistent double-digit growth in economic profit. Since the financial results were consistently strong—in fact, the strongest across all the business units—corporate managers were pleased and did not ask many questions of the business unit.
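A value-driver exhibit of the kind described above can be kept as simple as a ranked table. The sketch below (Python) lists the three drivers named in the text and sorts them by potential value impact; the dollar impact figures are invented placeholders, since the book describes but does not reproduce the exhibit’s numbers.

    # Illustrative ranking of value drivers by potential value impact.
    # The drivers mirror the text; the $ impacts are invented placeholders.
    drivers = [
        # (driver, current, target, value_impact_$m)
        ("new-product releases per year",     0.5, 0.8, 900),
        ("years to reach 80% of peak sales",  6,   4,   700),
        ("cost of goods sold (% of sales)",   26,  23,  500),
    ]

    for name, cur, tgt, impact in sorted(drivers, key=lambda d: -d[3]):
        print(f"{name:36s} {cur!s:>4} -> {tgt!s:<4} impact ~${impact}m")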


pages: 302 words: 82,233

Beautiful security by Andy Oram, John Viega

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Albert Einstein, Amazon Web Services, business intelligence, business process, call centre, cloud computing, corporate governance, credit crunch, crowdsourcing, defense in depth, Donald Davies, en.wikipedia.org, fault tolerance, Firefox, loose coupling, Marc Andreessen, market design, Monroe Doctrine, new economy, Nicholas Carr, Nick Leeson, Norbert Wiener, optical character recognition, packet switching, peer-to-peer, performance metric, pirate software, Robert Bork, Search for Extraterrestrial Intelligence, security theater, SETI@home, Silicon Valley, Skype, software as a service, statistical model, Steven Levy, The Wisdom of Crowds, Upton Sinclair, web application, web of trust, x509 certificate, zero day, Zimmermann PGP

FIGURE 10-3. Best practices dependencies: Performance and Capacity. (The figure traces performance work from the requirements phase, including operational profile definition, performance budgets and targets, and annotated use cases and user scenarios, through prototyping, performance estimates, benchmarks, code instrumentation, and automated performance and load testing, with project management tracking performance metrics through volume deployment.)

FIGURE 10-4 traces the parallel reliability flow: reliability budgets for failure and recovery rates, availability and reliability targets, fault/failure injection testing, failure data collection and prediction, fault detection, isolation, and repair, system auditing and sanity control, and field measurement of failures and recovery.

I initially dreaded this decision since it limited the leverage I had to encourage project leaders to identify and remediate security vulnerabilities. The results proved that this decision actually increased compliance with the security plan. With the requirement to pass the static analysis test still hanging over teams, they felt the need to remove defects earlier in the lifecycle so that they would avoid last-minute rejections. The second decision was the implementation of a detailed reporting framework in which key performance metrics (for instance, percentage of high-risk vulnerabilities per lines of code) were shared with team leaders, their managers, and the CIO on a monthly basis. The vulnerability information from the static code analyzer was summarized at the project, portfolio, and organization level and shared with all three sets of stakeholders. Over time, development leaders focused on the issues that were raising their risk score and essentially competed with each other to achieve better results.
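A minimal sketch (Python) of the kind of roll-up such a reporting framework implies follows: high-risk findings normalized by code size per project, then aggregated to portfolio level for the monthly report. The data layout and the reading of the metric as high-risk findings per thousand lines (KLOC) are assumptions; the book names the metric but not its exact formula.

    from collections import defaultdict

    # Hypothetical static-analysis results per project.
    projects = [
        # (portfolio, project, high_risk_findings, lines_of_code)
        ("payments", "gateway",    12, 150_000),
        ("payments", "ledger",      4,  80_000),
        ("web",      "storefront",  9, 200_000),
    ]

    totals = defaultdict(lambda: [0, 0])
    for portfolio, project, findings, loc in projects:
        print(f"{portfolio}/{project}: {findings / (loc / 1000):.3f} per KLOC")
        totals[portfolio][0] += findings
        totals[portfolio][1] += loc

    # The same metric rolled up to portfolio level.
    for portfolio, (findings, loc) in totals.items():
        print(f"{portfolio} portfolio: {findings / (loc / 1000):.3f} per KLOC")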


pages: 493 words: 139,845

Women Leaders at Work: Untold Tales of Women Achieving Their Ambitions by Elizabeth Ghaffari

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Albert Einstein, AltaVista, business process, cloud computing, Columbine, corporate governance, corporate social responsibility, dark matter, family office, Fellow of the Royal Society, financial independence, follow your passion, glass ceiling, Grace Hopper, high net worth, knowledge worker, Long Term Capital Management, performance metric, pink-collar, profit maximization, profit motive, recommendation engine, Ronald Reagan, shareholder value, Silicon Valley, Silicon Valley startup, Steve Ballmer, Steve Jobs, thinkpad, trickle-down economics, urban planning, women in the workforce, young professional

Trying to do the best for them. My whole academic and personal upbringing was working with physicians. So I don’t view physicians as the enemy. It just doesn’t make good business sense. Ghaffari: How many departments did you end up having under you? Luttgens: I had a total of ten professional services departments. Most of them were physician-led or physician-supported. Ghaffari: What was your performance metric that you did for them? Luttgens: Back in those days, the early eighties, we didn’t have quality management or outcomes as we do today. You needed to control expenses, enhance revenue, increase patient volume, and get along. I was well-known around the medical center for getting substantial capital funding for items in my capital budgets each year. Most of my departments were very capital-intensive.

It was a big change to run a nonprofit where a major part of your job is fundraising. That taught me that I was both good at, and enjoyed, fundraising because I understood the customer and believed in the product. Ghaffari: Was your primary responsibility there in an executive director role? What were some of your key accomplishments? Roden: Yes. Regarding accomplishments, we tracked several metrics. First of all, sponsorship was an important performance metric. When I started, SVASE was bringing in about $10,000 a year in sponsorship. When I left, it was $300,000 a year. Another key metric was the mailing list. When I started, we had about two thousand people on our e-mail list. When I left, it was about twenty thousand people. When I started, we had about twenty volunteers. When I left, we had about two hundred and fifty volunteers, meaning people actively engaged in running parts of the organization.


pages: 320 words: 33,385

Market Risk Analysis, Quantitative Methods in Finance by Carol Alexander

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

asset allocation, backtesting, barriers to entry, Brownian motion, capital asset pricing model, constrained optimization, credit crunch, Credit Default Swap, discounted cash flows, discrete time, diversification, diversified portfolio, en.wikipedia.org, fixed income, implied volatility, interest rate swap, market friction, market microstructure, p-value, performance metric, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, risk/return, Sharpe ratio, statistical arbitrage, statistical model, stochastic process, stochastic volatility, Thomas Bayes, transaction costs, value at risk, volatility smile, Wiener process, yield curve, zero-sum game

We describe some standard utility functions that display different risk aversion characteristics and show how an investor’s utility determines his optimal portfolio. Then we solve the portfolio allocation decision for a risk averse investor, following and then generalizing the classical problem of portfolio selection that was introduced by Markowitz (1959). This lays the foundation for our review of the theory of asset pricing, and our critique of the many risk adjusted performance metrics that are commonly used by asset managers. ABOUT THE CD-ROM My golden rule of teaching has always been to provide copious examples, and whenever possible to illustrate every formula by replicating it in an Excel spreadsheet. Virtually all the concepts in this book are illustrated using numerical and empirical examples, and the Excel workbooks for each chapter may be found on the accompanying CD-ROM.

Many risk adjusted performance measures that are commonly used today are either not linked to a utility function at all, or, if they are associated with a utility function, they assume the investor cares nothing at all about the gains he makes above a certain threshold. Kappa indices can be loosely tailored to the degree of risk aversion of the investor, but otherwise the rankings produced by a risk adjusted performance measure may not match the order of the investor’s preferences. The only universal risk adjusted performance metric, i.e. one that can rank investments having any returns distributions for investors having any type of utility function, is the certain equivalent. The certain equivalent of an uncertain investment is the amount of money, received for certain, that gives the same utility to the investor as the uncertain investment.
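Stated symbolically, the definition just given in words is the standard one below; the exponential-utility special case is added as a common closed-form instance for illustration, not taken from the book.

    u(\mathrm{CE}) = \mathbb{E}[u(X)]
    \quad\Longrightarrow\quad
    \mathrm{CE} = u^{-1}\!\left(\mathbb{E}[u(X)]\right)

    For exponential utility u(x) = -e^{-\lambda x} with X \sim N(\mu, \sigma^2):
    \mathrm{CE} = \mu - \frac{\lambda}{2}\,\sigma^2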

How I Became a Quant: Insights From 25 of Wall Street's Elite by Richard R. Lindsey, Barry Schachter

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Albert Einstein, algorithmic trading, Andrew Wiles, Antoine Gombaud: Chevalier de Méré, asset allocation, asset-backed security, backtesting, bank run, banking crisis, Black-Scholes formula, Bonfire of the Vanities, Bretton Woods, Brownian motion, business process, buy low sell high, capital asset pricing model, centre right, collateralized debt obligation, commoditize, computerized markets, corporate governance, correlation coefficient, creative destruction, Credit Default Swap, credit default swaps / collateralized debt obligations, currency manipulation / currency intervention, discounted cash flows, disintermediation, diversification, Donald Knuth, Edward Thorp, Emanuel Derman, en.wikipedia.org, Eugene Fama: efficient market hypothesis, financial innovation, fixed income, full employment, George Akerlof, Gordon Gekko, hiring and firing, implied volatility, index fund, interest rate derivative, interest rate swap, John von Neumann, linear programming, Loma Prieta earthquake, Long Term Capital Management, margin call, market friction, market microstructure, martingale, merger arbitrage, Myron Scholes, Nick Leeson, P = NP, pattern recognition, Paul Samuelson, pensions crisis, performance metric, prediction markets, profit maximization, purchasing power parity, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Richard Feynman, Richard Feynman, Richard Stallman, risk-adjusted returns, risk/return, shareholder value, Sharpe ratio, short selling, Silicon Valley, six sigma, sorting algorithm, statistical arbitrage, statistical model, stem cell, Steven Levy, stochastic process, systematic trading, technology bubble, The Great Moderation, the scientific method, too big to fail, trade route, transaction costs, transfer pricing, value at risk, volatility smile, Wiener process, yield curve, young professional

In the early 1990s, the entire banking industry was moving headlong toward Raroc as a pricing and performance measurement framework. However, as early as 1992, I recognized that the common Raroc measure based on own portfolio risk or VaR was at odds with equilibrium and arbitrage pricing theory (see Wilson (1992)). Using classical finance to make the point, I recast a simple CAPM model into a Raroc performance metric and showed that Raroc based on own portfolio risk without the recognition of funding was inherently biased. In the years since 1992, many other authors have followed a similar line of thought. What is the appropriate cost of capital, by line of business, if capital is allocated based on the standalone risk of each underlying business? And, what role does earnings volatility play in the valuation of a bank or insurance company?
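The excerpt names the recasting but does not reproduce it, so the following is only a hedged sketch of the generic ingredients involved: the standard textbook forms of a RAROC ratio and of the CAPM expected return it would be measured against, not Wilson’s specific derivation.

    \mathrm{RAROC}_i = \frac{\mathbb{E}[R_i] - \mathrm{EL}_i - \mathrm{FC}_i}{\mathrm{EC}_i},
    \qquad
    \mathbb{E}[R_i] = r_f + \beta_i\left(\mathbb{E}[R_m] - r_f\right)

On this reading, the bias the excerpt describes would arise because economic capital \mathrm{EC}_i set from standalone (own-portfolio) risk scales with total volatility, while the CAPM return compensates only systematic risk \beta_i, so positions with the same beta but different idiosyncratic risk receive different RAROCs; omitting the funding term \mathrm{FC}_i distorts the numerator in the same way.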



pages: 559 words: 155,372

Chaos Monkeys: Obscene Fortune and Random Failure in Silicon Valley by Antonio Garcia Martinez

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Airbnb, airport security, always be closing, Amazon Web Services, Burning Man, Celtic Tiger, centralized clearinghouse, cognitive dissonance, collective bargaining, corporate governance, Credit Default Swap, crowdsourcing, death of newspapers, drone strike, El Camino Real, Elon Musk, Emanuel Derman, financial independence, global supply chain, Goldman Sachs: Vampire Squid, hive mind, income inequality, information asymmetry, interest rate swap, intermodal, Jeff Bezos, Malcom McLean invented shipping containers, Marc Andreessen, Mark Zuckerberg, Maui Hawaii, means of production, Menlo Park, minimum viable product, move fast and break things, move fast and break things, Network effects, Paul Graham, performance metric, Peter Thiel, Ponzi scheme, pre–internet, Ralph Waldo Emerson, random walk, Ruby on Rails, Sand Hill Road, Scientific racism, second-price auction, self-driving car, Silicon Valley, Silicon Valley startup, Skype, Snapchat, social graph, social web, Socratic dialogue, source of truth, Steve Jobs, telemarketer, urban renewal, Y Combinator, zero-sum game, éminence grise

I even hung a real length of Spanish chorizo from my monitor, as a rallying symbol, and the targeting team got down to the serious business of monetizing every last user action on Facebook. Just as my first view of Facebook’s high-level revenue dashboard proved a dispiriting exercise, Chorizo’s final results, which took months to produce, were a similar tale of woe. No user data we had, if fed freely into the topics that Facebook’s savviest marketers used to target their ads, improved any performance metric we had access to. That meant that advertisers trying to find someone who, say, wanted to buy a car, benefited not at all from all the car chatter taking place on Facebook. It was as if we had fed a mile-long trainful of meat cows into a slaughterhouse, and had come out with one measly sausage to show for it. It was incomprehensible, and it tested my faith (which, believe it or not, I certainly had at that time) in Facebook’s claim to unique primacy in the realm of user data.

Immature advertising markets, the embryonic state of their e-commerce infrastructure, and their lower general wealth meant the impact of new optimization tricks or targeting data on those countries was minimal. And so the Ads team would slice off tranches of the FB user base in rich ads markets and dose them with different versions of the ads system to measure the effect of a new feature, as you would test subjects in a clinical drug trial.* The performance metrics of interest included clickthrough rates, which are a coarse measure of user interest. More convincing is the actual downstream monetization resulting from someone clicking through and buying something—assuming Facebook got the conversion data, which it often didn’t, given that Facebook didn’t have a conversion-tracking system. Also important, and not related to money at all, was overall usage.
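The tranche-and-dose procedure described above is a standard two-sample experiment. The sketch below (Python, with invented impression and click counts) compares clickthrough rates between a control tranche and a treated tranche using a two-proportion z-test; this is the generic technique, not Facebook’s internal tooling.

    # Generic A/B comparison of clickthrough rates between two user tranches;
    # the impression/click counts are invented for illustration.
    import math

    def ctr_z_test(clicks_a, imps_a, clicks_b, imps_b):
        """Two-proportion z-statistic for CTR(B) - CTR(A)."""
        p_a, p_b = clicks_a / imps_a, clicks_b / imps_b
        p = (clicks_a + clicks_b) / (imps_a + imps_b)   # pooled rate
        se = math.sqrt(p * (1 - p) * (1 / imps_a + 1 / imps_b))
        return (p_b - p_a) / se

    z = ctr_z_test(clicks_a=4_800, imps_a=1_000_000,
                   clicks_b=5_150, imps_b=1_000_000)
    print(f"z = {z:.2f}")  # |z| > 1.96 ~ significant at the 5% level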


pages: 410 words: 119,823

Radical Technologies: The Design of Everyday Life by Adam Greenfield

3D printing, Airbnb, augmented reality, autonomous vehicles, bank run, barriers to entry, basic income, bitcoin, blockchain, business intelligence, business process, call centre, cellular automata, centralized clearinghouse, centre right, Chuck Templeton: OpenTable, cloud computing, collective bargaining, combinatorial explosion, Computer Numeric Control, computer vision, Conway's Game of Life, cryptocurrency, David Graeber, dematerialisation, digital map, distributed ledger, drone strike, Elon Musk, ethereum blockchain, facts on the ground, fiat currency, global supply chain, global village, Google Glasses, IBM and the Holocaust, industrial robot, informal economy, information retrieval, Internet of things, James Watt: steam engine, Jane Jacobs, Jeff Bezos, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, joint-stock company, Kevin Kelly, Kickstarter, late capitalism, license plate recognition, lifelogging, M-Pesa, Mark Zuckerberg, means of production, megacity, megastructure, minimum viable product, money: store of value / unit of account / medium of exchange, natural language processing, Network effects, New Urbanism, Occupy movement, Oculus Rift, Pareto efficiency, pattern recognition, Pearl River Delta, performance metric, Peter Eisenman, Peter Thiel, planetary scale, Ponzi scheme, post scarcity, RAND corporation, recommendation engine, RFID, rolodex, Satoshi Nakamoto, self-driving car, sentiment analysis, shareholder value, sharing economy, Silicon Valley, smart cities, smart contracts, sorting algorithm, special economic zone, speech recognition, stakhanovite, statistical model, stem cell, technoutopianism, Tesla Model S, the built environment, The Death and Life of Great American Cities, The Future of Employment, transaction costs, Uber for X, universal basic income, urban planning, urban sprawl, Whole Earth Review, WikiLeaks, women in the workforce

This shrunken workforce will be asked to do more, for lower wages, at a yet higher pace. Amazon is again the leading indicator here.28 Its warehouse workers are hired on fixed, short-term contracts, through a deniable outsourcing agency, and precluded from raises, benefits, opportunities for advancement or the meaningful prospect of permanent employment. They work under conditions of “rationalized” oversight in the form of performance metrics that are calibrated in real time. Any degree of discretion or autonomy they might have retained is ruthlessly pared away by efficiency algorithm. The point couldn’t be made much more clearly: these facilities are places that no one sane would choose to be if they had any other option at all. And this is only the most obvious sort of technological intervention in the workplace. We barely have words for what happens when an algorithm breaks down jobs into tasks that are simple enough that they don’t call for any particular expertise—just about anybody will suffice to perform them—and outsources them to a global network of individuals made precarious and therefore willing to work for very little.

The company uses the accompanying analytic suite to “identify top performers” (and, by implication, those at the bottom as well), and plan schedules and distribute assignments in the store accordingly. Theatro’s devices are less elaborate than a Hitachi wearable called Business Microscope, which aims to capture, quantify and make inferences from several dimensions of employee behavior.33 As grim as call-center work is, a Hitachi press release brags about their ability to render it more dystopian yet via the use of this tool—improving performance metrics not by reducing employees’ workload, but by compelling them to be more physically active during their allotted break periods.34 Hitachi’s wearables, in turn, are less capable than the badges offered by Cambridge, MA, startup Sociometric Solutions, which are “equipped with two microphones, a location sensor and an accelerometer” and are capable of registering “tone of voice, posture and body language, as well as who spoke to whom for how long.”35 As with all of these devices, the aim is to continuously monitor (and eventually regulate) employee behavior.


pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application

That is why it is important to build objectives, measurements, and milestones that demonstrate the benefits of a team focused on Big Data analytics. Developing performance measurements is an important part of designing a business plan. With Big Data, those metrics can be assigned to the specific goal in mind. For example, if an organization is looking to bring efficiency to a warehouse, a performance metric may be measuring the amount of empty shelf space and what the cost of that empty shelf space means to the company. Analytics can be used to identify product movement, sales predictions, and so forth to move product into that shelf space to better service the needs of customers. It is a simple comparison of the percentage of space used before the analytics process and the percentage of space used after the analytics team has tackled the issue.
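As a worked instance of the warehouse example, the before/after comparison is a one-line computation. The shelf counts and the carrying cost in the sketch below (Python) are invented for illustration.

    # Invented figures illustrating the before/after shelf-space metric.
    def utilization(occupied_slots, total_slots):
        return occupied_slots / total_slots

    before = utilization(7_200, 10_000)   # 72% before the analytics work
    after = utilization(8_900, 10_000)    # 89% after
    cost_per_empty_slot = 42.0            # assumed monthly carrying cost, $
    saved = (after - before) * 10_000 * cost_per_empty_slot
    print(f"utilization {before:.0%} -> {after:.0%}, "
          f"~${saved:,.0f}/month recovered")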


pages: 204 words: 58,565

Keeping Up With the Quants: Your Guide to Understanding and Using Analytics by Thomas H. Davenport, Jinho Kim

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Black-Scholes formula, business intelligence, business process, call centre, computer age, correlation coefficient, correlation does not imply causation, Credit Default Swap, en.wikipedia.org, feminist movement, Florence Nightingale: pie chart, forensic accounting, global supply chain, Hans Rosling, hypertext link, invention of the telescope, inventory management, Jeff Bezos, margin call, Moneyball by Michael Lewis explains big data, Myron Scholes, Netflix Prize, p-value, performance metric, publish or perish, quantitative hedge fund, random walk, Renaissance Technologies, Robert Shiller, Robert Shiller, self-driving car, sentiment analysis, six sigma, Skype, statistical model, supply-chain management, text mining, the scientific method

MODELING (VARIABLE SELECTION). The variables in deciding whether to acquire Battier from the Grizzlies would be the cost of acquiring him (outright or in trade for other players), the amount that he would be paid going forward, various individual performance measures, and ideally some measure of team performance while Battier was on the court versus when he was not. DATA COLLECTION (MEASUREMENT). The individual performance metrics and financials were easy to gather. And there is a way to measure an individual player’s impact on team performance. The “plus/minus” statistic, adapted by Roland Beech of 82games.com from a similar statistic used in hockey, compares how a team performs with a particular player in the game versus its performance when he is on the bench. DATA ANALYSIS. Morey and his statisticians decided to use plus/ minus analysis to evaluate Battier.
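The plus/minus statistic lends itself to a short computation: split game segments by whether the player was on the court, and compare the team’s net scoring in each split. The sketch below (Python) uses invented stint data and follows the general idea described above, not 82games.com’s exact methodology.

    # Invented stint data: (minutes, team_pts, opp_pts, player_on_court).
    stints = [
        (6.0, 14, 10, True), (4.0, 8, 11, False), (7.5, 16, 12, True),
        (5.5, 9, 13, False), (8.0, 18, 15, True), (5.0, 12, 10, False),
    ]

    def net_per_48(rows):
        """Team point differential per 48 minutes over the given stints."""
        mins = sum(m for m, *_ in rows)
        diff = sum(tp - op for _, tp, op, _ in rows)
        return 48.0 * diff / mins

    on = [s for s in stints if s[3]]
    off = [s for s in stints if not s[3]]
    plus_minus = net_per_48(on) - net_per_48(off)
    print(f"on-court net/48: {net_per_48(on):+.1f}, "
          f"off-court: {net_per_48(off):+.1f}, plus/minus: {plus_minus:+.1f}")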


pages: 204 words: 54,395

Drive: The Surprising Truth About What Motivates Us by Daniel H. Pink

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

affirmative action, call centre, Daniel Kahneman / Amos Tversky, Dean Kamen, deliberate practice, Firefox, Frederick Winslow Taylor, functional fixedness, game design, George Akerlof, Isaac Newton, Jean Tirole, job satisfaction, knowledge worker, performance metric, profit maximization, profit motive, Results Only Work Environment, side project, the built environment, Tony Hsieh, transaction costs, zero-sum game

It's another way to allow people to focus on the work itself. Indeed, other economists have shown that providing an employee a high level of base pay does more to boost performance and organizational commitment than an attractive bonus structure. Of course, by the very nature of the exercise, paying above the average will work for only about half of you. So get going before your competitors do. 3. IF YOU USE PERFORMANCE METRICS, MAKE THEM WIDE-RANGING, RELEVANT, AND HARD TO GAME Imagine you're a product manager and your pay depends largely on reaching a particular sales goal for the next quarter. If you're smart, or if you've got a family to feed, you're going to try mightily to hit that number. You probably won't concern yourself much with the quarter after that or the health of the company or whether the firm is investing enough in research and development.


pages: 261 words: 16,734

Peopleware: Productive Projects and Teams by Tom Demarco, Timothy Lister

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

A Pattern Language, cognitive dissonance, interchangeable parts, job satisfaction, knowledge worker, Parkinson's law, performance metric, skunkworks, supply-chain management, women in the workforce

Three rules of thumb seem to apply whenever you measure variations in performance over a sample of individuals. • Count on the best people outperforming the worst by about 10:1. • Count on the best performer being about 2.5 times better than the median performer. • Count on the half that are better-than-median performers outdoing the other half by more than 2:1. These rules apply for virtually any performance metric you define. So, for instance, the better half of a sample will do a given job in less than half the time the others take; the more defect-prone half will put in more than two thirds of the defects, and so on. Results of the Coding War Games were very much in line with this profile. Take as an example Figure 8–2, which shows the performance spread of time to achieve the first milestone (clean compile, ready for test) in one year’s games.
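To see what those ratios look like in practice, here is a quick check against a hypothetical sample of task completion times (hours; lower is better):

    import statistics

    # Ten made-up completion times for the same task, sorted best to worst.
    times = [4, 5, 6, 7, 9, 11, 13, 16, 22, 40]

    best, worst = times[0], times[-1]
    median = statistics.median(times)
    better_half, worse_half = times[:5], times[5:]

    print(f"worst/best:             {worst / best:.1f}")   # ~10:1
    print(f"median/best:            {median / best:.1f}")  # ~2.5:1
    # The worse half takes more than twice as long in total as the better half.
    print(f"worse-half/better-half: {sum(worse_half) / sum(better_half):.1f}")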

Toast by Charles Stross

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

anthropic principle, Buckminster Fuller, cosmological principle, dark matter, double helix, Ernest Rutherford, Extropian, Francis Fukuyama: the end of history, glass ceiling, gravity well, Khyber Pass, Mars Rover, Mikhail Gorbachev, NP-complete, oil shale / tar sands, peak oil, performance metric, phenotype, Plutocrats, plutocrats, Ronald Reagan, Silicon Valley, slashdot, speech recognition, strong AI, traveling salesman, Turing test, urban renewal, Vernor Vinge, Whole Earth Review, Y2K

It was a woman I’d met somewhere—some conference or other—lanky blonde hair, pallid skin, and far too evangelical about formal methods. “Feel free.” She pulled a chair out and sat down and the steward poured her a cup of coffee immediately. I noticed that even on a cruise ship she was dressed in a business suit, although it looked somewhat the worse for wear. “Coffee, please,” I called after the retreating steward. “We met in Darmstadt, ’97,” she said. “You’re Marcus Jackman? I critiqued your paper on performance metrics for IEEE maintenance transactions.” The penny dropped. “Karla . . . Carrol?” I asked. She smiled. “Yes, I remember your review.” I did indeed, and nearly burned my tongue on the coffee trying not to let slip precisely how I remembered it. I’m not fit to be rude until after at least the third cup of the morning. “Most interesting. What brings you here?” “The usual risk contingency planning.


pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

23andMe, Affordable Care Act / Obamacare, airport security, AltaVista, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, IBM and the Holocaust, index card, informal economy, intangible asset, Internet of things, invention of the printing press, Jeff Bezos, lifelogging, Louis Pasteur, Mark Zuckerberg, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, performance metric, Peter Thiel, Post-materialism, post-materialism, random walk, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, speech recognition, Steve Jobs, Steven Levy, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Watson beat the top human players on Jeopardy!

Grigsby, Pamela Ann Nesbitt, and Lisa Anne Seacat. “Securing premises using surfaced-based computing technology,” U.S. Patent number: 8138882. Issue date: March 20, 2012. The quantified-self movement—“Counting Every Moment,” The Economist, March 3, 2012. Apple earbuds for bio-measurements—Jesse Lee Dorogusker, Anthony Fadell, Donald J. Novotney, and Nicholas R. Kalayjian, “Integrated Sensors for Tracking Performance Metrics,” U.S. Patent Application 20090287067. Assignee: Apple. Application Date: 2009-07-23. Publication Date: 2009-11-19. Derawi Biometrics, “Your Walk Is Your PIN-Code,” press release, February 21, 2011 (http://biometrics.derawi.com/?p=175). iTrem information—See the iTrem project page of the Landmarc Research Center at Georgia Tech (http://eosl.gtri.gatech.edu/Capabilities/LandmarcResearchCenter/LandmarcProjects/iTrem/tabid/798/Default.aspx) and email exchange.


pages: 294 words: 82,438

Simple Rules: How to Thrive in a Complex World by Donald Sull, Kathleen M. Eisenhardt

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Affordable Care Act / Obamacare, Airbnb, asset allocation, Atul Gawande, barriers to entry, Basel III, Berlin Wall, carbon footprint, Checklist Manifesto, complexity theory, Craig Reynolds: boids flock, Credit Default Swap, Daniel Kahneman / Amos Tversky, diversification, drone strike, en.wikipedia.org, European colonialism, Exxon Valdez, facts on the ground, Fall of the Berlin Wall, haute cuisine, invention of the printing press, Isaac Newton, Kickstarter, late fees, Lean Startup, Louis Pasteur, Lyft, Moneyball by Michael Lewis explains big data, Nate Silver, Network effects, obamacare, Paul Graham, performance metric, price anchoring, RAND corporation, risk/return, Saturday Night Live, sharing economy, Silicon Valley, Startup school, statistical model, Steve Jobs, TaskRabbit, The Signal and the Noise by Nate Silver, transportation-network company, two-sided market, Wall-E, web application, Y Combinator, Zipcar

You can also limit your rules to two or three, as we have seen elsewhere in the book, to increase the odds that you will remember and follow them. After crafting your preliminary rules, it is helpful to measure how well they are working. Measuring impact allows you to pinpoint what is and isn’t working, and evidence of success also provides more motivation to stick with the rules. The best performance metrics are tightly linked to what will move the needles for you—pounds lost for a dieter, or dollars invested if you are trying to save for retirement. Apps have made collecting data and tracking progress easier than at any other time in history. Imagine what the legendary self-improver Benjamin Franklin could have accomplished if he’d had an iPhone. To measure the impact of your simple rules, it helps to collect some data before you start using your rules.
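A minimal sketch of that before/after measurement, using hypothetical weekly savings toward a retirement goal as the metric:

    import statistics

    # Collect a baseline before adopting the rules, then compare. All figures made up.
    baseline_weekly_savings = [40, 25, 35, 30]    # dollars saved before the rules
    with_rules_weekly_savings = [60, 75, 55, 70]  # dollars saved after adopting them

    before = statistics.mean(baseline_weekly_savings)
    after = statistics.mean(with_rules_weekly_savings)
    print(f"before: ${before:.0f}/week, after: ${after:.0f}/week, lift: {after / before - 1:.0%}")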


pages: 256 words: 15,765

The New Elite: Inside the Minds of the Truly Wealthy by Dr. Jim Taylor

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

British Empire, call centre, dark matter, Donald Trump, estate planning, full employment, glass ceiling, income inequality, Jeff Bezos, Louis Pasteur, Maui Hawaii, McMansion, means of production, passive income, performance metric, Plutocrats, plutocrats, Plutonomy: Buying Luxury, Explaining Global Imbalances, Ronald Reagan, stealth mode startup, Steve Jobs, Thorstein Veblen, trickle-down economics, women in the workforce, zero-sum game

For any respondent who wanted it, we provided a coded identification number that enabled the individual to examine the results and reports for personal reasons. In some cases, we even let them examine their own data in comparison to others in the financial elite. For a generation of business men and women who believe in measurement, and who grew up with IQ tests, SAT scores, and other performance metrics, this quantitative capability was an often irresistible source of pleasure. This was particularly true because the individuals had been on a special journey, one their upbringings had left them largely unprepared for, and so understanding the journeys of others was a means for understanding their own trips and themselves. But there is a deeper, more telling reason the wealthy volunteered hours of their time for us.


The Fix: How Bankers Lied, Cheated and Colluded to Rig the World's Most Important Number (Bloomberg) by Liam Vaughan, Gavin Finch

asset allocation, asset-backed security, bank run, banking crisis, Bernie Sanders, Big bang: deregulation of the City of London, buy low sell high, call centre, central bank independence, collapse of Lehman Brothers, corporate governance, credit crunch, Credit Default Swap, eurozone crisis, fear of failure, financial deregulation, financial innovation, fixed income, interest rate derivative, interest rate swap, light touch regulation, London Interbank Offered Rate, London Whale, mortgage debt, Northern Rock, performance metric, Ponzi scheme, Ronald Reagan, sovereign wealth fund, urban sprawl

His voice sped up when he talked about heady days piling into positions, squeezing the best prices from brokers and playing traders off against each other. “The first thing you think is where’s the edge, where can I make a bit more money, how can I push, push the boundaries, maybe you know a bit of a gray area, push the edge of the envelope,” he said in one early interview. “But the point is, you are greedy, you want every little bit of money that you can possibly get because, like I say, that is how you are judged, that is your performance metric.” Paper coffee cups piled up as Hayes went over the minutiae of the case: how to hedge a forward rate agreement; the nuances of Libor and Tibor; why he and Darin hated each other so much. One of the interviews was conducted in the dark so Hayes could talk the investigators through his trading book, which was beamed onto a wall. At one stage, Hayes was asked about how he viewed his attempts to move Libor around.


pages: 253 words: 65,834

Mastering the VC Game: A Venture Capital Insider Reveals How to Get From Start-Up to IPO on Your Terms by Jeffrey Bussgang

business process, carried interest, digital map, discounted cash flows, hiring and firing, Jeff Bezos, Marc Andreessen, Mark Zuckerberg, Menlo Park, moveable type in China, pattern recognition, Paul Graham, performance metric, Peter Thiel, pets.com, risk tolerance, rolodex, Ronald Reagan, Sand Hill Road, selection bias, shareholder value, Silicon Valley, Skype, software as a service, sovereign wealth fund, Steve Jobs, technology bubble, The Wisdom of Crowds

Gail made a point of previewing news with each director right before the board meeting to allow them to have some reflection time before the meeting, and to alert her to any of their initial concerns. “I wanted them to know exactly what they were going to hear in the meeting. By the time we got into the board meeting, everybody was informed and we could really get into the meat of whatever the issue was.” The “no surprises” rule applies to changes in management as well as performance metrics. “The board would lose confidence in some team members at different times,” Gail told me. “So I was very clear about saying, ‘I see the same weaknesses. But here’s what they’re doing. And I’ll make the decision about this person at the right time.’ You can’t fool these guys. If you have an executive that has weaknesses and you try to deny it, it erodes the board’s confidence, makes them think you don’t have good judgment when it comes to people.


pages: 1,088 words: 228,743

Expected Returns: An Investor's Guide to Harvesting Market Rewards by Antti Ilmanen

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Andrei Shleifer, asset allocation, asset-backed security, availability heuristic, backtesting, balance sheet recession, bank run, banking crisis, barriers to entry, Bernie Madoff, Black Swan, Bretton Woods, buy low sell high, capital asset pricing model, capital controls, Carmen Reinhart, central bank independence, collateralized debt obligation, commoditize, commodity trading advisor, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, debt deflation, deglobalization, delta neutral, demand response, discounted cash flows, disintermediation, diversification, diversified portfolio, dividend-yielding stocks, equity premium, Eugene Fama: efficient market hypothesis, fiat currency, financial deregulation, financial innovation, financial intermediation, fixed income, Flash crash, framing effect, frictionless, frictionless market, George Akerlof, global reserve currency, Google Earth, high net worth, hindsight bias, Hyman Minsky, implied volatility, income inequality, incomplete markets, index fund, inflation targeting, information asymmetry, interest rate swap, invisible hand, Kenneth Rogoff, laissez-faire capitalism, law of one price, Long Term Capital Management, loss aversion, margin call, market bubble, market clearing, market friction, market fundamentalism, market microstructure, mental accounting, merger arbitrage, mittelstand, moral hazard, Myron Scholes, negative equity, New Journalism, oil shock, p-value, passive investing, Paul Samuelson, performance metric, Ponzi scheme, prediction markets, price anchoring, price stability, principal–agent problem, private sector deleveraging, purchasing power parity, quantitative easing, quantitative trading / quantitative finance, random walk, reserve currency, Richard Thaler, risk tolerance, risk-adjusted returns, risk/return, riskless arbitrage, Robert Shiller, Robert Shiller, savings glut, selection bias, Sharpe ratio, short selling, sovereign wealth fund, statistical arbitrage, statistical model, stochastic volatility, survivorship bias, systematic trading, The Great Moderation, The Myth of the Rational Market, too big to fail, transaction costs, tulip mania, value at risk, volatility arbitrage, volatility smile, working-age population, Y2K, yield curve, zero-coupon bond, zero-sum game

Most studies conclude that irrational mispricing contributes importantly to observed option market regularities. The rational camp responds that risk stories can explain a surprisingly large part of observed returns without resorting to irrationality—and that various market frictions can make exploiting any remaining opportunities difficult. Specifically, Broadie–Chernov–Johannes (2009) argue that options are often thought to be mispriced because the performance metrics that are used (Sharpe ratios and CAPM alphas) are ill-suited for option analysis, especially over short samples. After documenting the huge challenge for rational models—massively negative average returns for long index puts, losses of 30% per month, or worse, as noted earlier—they proceed to show that standard option-pricing models can largely explain these average returns. OTM puts are especially highly levered positions on the underlying index; during a period of high realized equity premium, OTM puts with large negative betas can be expected to have large negative returns.
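To illustrate why a Sharpe ratio can mislead over short samples for option selling (a toy example, not a reconstruction of the cited study): a short-put-style return stream collects a small premium most months and takes a rare large loss, so any sample that happens to miss the loss looks spectacular.

    import statistics

    def annualized_sharpe(monthly_excess_returns):
        """Annualized Sharpe ratio from monthly excess returns."""
        mean = statistics.mean(monthly_excess_returns)
        stdev = statistics.stdev(monthly_excess_returns)
        return (mean / stdev) * 12 ** 0.5

    # 36 quiet months of premium income, then one crash month (all made up).
    short_put = [0.02, 0.015, 0.025] * 12 + [-0.30]

    print(f"pre-crash SR:   {annualized_sharpe(short_put[:36]):.1f}")  # looks superb
    print(f"full-sample SR: {annualized_sharpe(short_put):.1f}")       # far less so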

Operational risks (errors and fraud) are a good example; the SR of Madoff’s track record was hard to beat but it came with huge operational risk. Conclusions The portfolio SR is a good starting point but it needs to be supplemented with other portfolio attributes. All of the desirable attributes discussed above may be worth some SR sacrifice. However, no single risk-adjusted return measure can capture them all, and many of these tradeoffs can only be assessed in a qualitative fashion. Multiple performance metrics are needed, given the multi-dimensional nature of the problem. 28.2.4 Smart risk taking and portfolio construction There now follow some intuitive rules of thumb for smart investing: a recipe for optimal diversification and the “fundamental law of active management”. First, here is a recipe for smart portfolio construction, which sums up mean-variance optimization in a nutshell: allocate equal volatility to each asset class (or return source) in a portfolio, unless some assets’ exceptional SRs or diversification abilities justify deviating from equal volatility weightings.
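A minimal sketch of the equal-volatility recipe: weight each asset class in inverse proportion to its volatility so every position contributes the same standalone volatility (correlations ignored; the volatilities are hypothetical):

    # Inverse-volatility weights: each asset's weight * vol comes out identical.
    vols = {"equities": 0.16, "bonds": 0.05, "commodities": 0.20}

    inverse = {asset: 1.0 / vol for asset, vol in vols.items()}
    total = sum(inverse.values())
    weights = {asset: value / total for asset, value in inverse.items()}

    for asset, weight in weights.items():
        print(f"{asset:12s} weight={weight:.1%}  weight*vol={weight * vols[asset]:.2%}")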


pages: 903 words: 235,753

The Stack: On Software and Sovereignty by Benjamin H. Bratton

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

1960s counterculture, 3D printing, 4chan, Ada Lovelace, additive manufacturing, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, basic income, Benevolent Dictator For Life (BDFL), Berlin Wall, bioinformatics, bitcoin, blockchain, Buckminster Fuller, Burning Man, call centre, carbon footprint, carbon-based life, Cass Sunstein, Celebration, Florida, charter city, clean water, cloud computing, connected car, corporate governance, crowdsourcing, cryptocurrency, dark matter, David Graeber, deglobalization, dematerialisation, disintermediation, distributed generation, don't be evil, Douglas Engelbart, Douglas Engelbart, Edward Snowden, Elon Musk, en.wikipedia.org, Eratosthenes, ethereum blockchain, facts on the ground, Flash crash, Frank Gehry, Frederick Winslow Taylor, future of work, Georg Cantor, gig economy, global supply chain, Google Earth, Google Glasses, Guggenheim Bilbao, High speed trading, Hyperloop, illegal immigration, industrial robot, information retrieval, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invisible hand, Jacob Appelbaum, Jaron Lanier, John Markoff, Jony Ive, Julian Assange, Khan Academy, liberal capitalism, lifelogging, linked data, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, McMansion, means of production, megacity, megastructure, Menlo Park, Minecraft, Monroe Doctrine, Network effects, new economy, offshore financial centre, oil shale / tar sands, packet switching, PageRank, pattern recognition, peak oil, peer-to-peer, performance metric, personalized medicine, Peter Eisenman, Peter Thiel, phenotype, Philip Mirowski, Pierre-Simon Laplace, place-making, planetary scale, RAND corporation, recommendation engine, reserve currency, RFID, Robert Bork, Sand Hill Road, self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley ideology, Slavoj Žižek, smart cities, smart grid, smart meter, social graph, software studies, South China Sea, sovereign wealth fund, special economic zone, spectrum auction, Startup school, statistical arbitrage, Steve Jobs, Steven Levy, Stewart Brand, Stuxnet, Superbowl ad, supply-chain management, supply-chain management software, TaskRabbit, the built environment, The Chicago School, the scientific method, Torches of Freedom, transaction costs, Turing complete, Turing machine, Turing test, universal basic income, urban planning, Vernor Vinge, Washington Consensus, web application, Westphalian system, WikiLeaks, working poor, Y Combinator

We see this play out with the absolute User's slide into an abyssal dissolution of the self when confronted with the potential totality of virtualized experiences. In response to the white noise of his infinitely refracted subjectivity, he reflects this entropy by sliding back into perceptual incoherency (or potentially stumbling toward secular hypermaterialism). It's true that the real purpose of QS is not to provide all possible information at once, but to reduce systemic complexity with summary diagrammatic accounts of one's inputs, states, and performance metrics. But adding more and more data sources to the mix and providing greater multivariate fidelity also produces other pathways of dissolution. By tracking external forces (e.g., environmental, microbial, economic) and their role in the formation of the User-subject's state and performance, the boundaries between internal and external systems are perforated and blurred. Those external variables not only act on you; in effect they are you as well, and so the profile reflecting back at the User is both more and less than a single figure (and as we'll see, sometimes those extrinsic forces live inside one's own body).

As discussed in the Interfaces chapter, the images of systemic interrelationality found in GUI and in dynamic visualizations not only diagram how platforms operate; they are the very instruments with which a User interacts with those platforms and with other Users in the first place. At stake for the redesign of the User is not only the subjective (QS) and objective (Exit) reflections of her inputs, states, and performance metrics within local/global and intrinsic/extrinsic variations, but also that the profiles of these traces are the medium through which those interactions are realized. The recursion is not only between scales of action; it is also between event and its mediation. Put differently, the composition with which (and into which) the tangled positions of Users draw their own maps (the sum of the parts that busily sum themselves) is always both more and less whole than the whole that sums their sums!


Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (O’Reilly, 2017)

active measures, Amazon Web Services, bitcoin, blockchain, business intelligence, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, database schema, DevOps, distributed ledger, Donald Knuth, Edward Snowden, ethereum blockchain, fault tolerance, finite state, Flash crash, full text search, general-purpose programming language, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, loose coupling, Marc Andreessen, natural language processing, Network effects, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, statistical model, web application, WebSocket, wikimedia commons

• Allow quick and easy recovery from human errors, to minimize the impact in the case of a failure. For example, make it fast to roll back configuration changes, roll out new code gradually (so that any unexpected bugs affect only a small subset of users), and provide tools to recompute data (in case it turns out that the old computation was incorrect). • Set up detailed and clear monitoring, such as performance metrics and error rates. In other engineering disciplines this is referred to as telemetry. (Once a rocket has left the ground, telemetry is essential for tracking what is happening, and for understanding failures [14].) Monitoring can show us early warning signals and allow us to check whether any assumptions or constraints are being violated. When a problem occurs, metrics can be invaluable in diagnosing the issue. • Implement good management practices and training—a complex and important aspect, and beyond the scope of this book.
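A minimal sketch of the kind of monitoring the passage describes: track latency percentiles and an error rate, and flag when an assumed alert threshold is crossed (class name, metrics, and thresholds are illustrative, not from the book):

    import statistics

    class Telemetry:
        """Toy request monitor: latencies plus an error rate."""

        def __init__(self):
            self.latencies_ms = []
            self.errors = 0
            self.requests = 0

        def record(self, latency_ms, ok):
            self.requests += 1
            self.latencies_ms.append(latency_ms)
            if not ok:
                self.errors += 1

        def report(self):
            p95 = statistics.quantiles(self.latencies_ms, n=20)[18]  # 95th percentile
            error_rate = self.errors / self.requests
            print(f"p95 latency: {p95:.1f} ms, error rate: {error_rate:.2%}")
            if error_rate > 0.01:  # assumed 1% alert threshold
                print("ALERT: error rate above threshold")

    telemetry = Telemetry()
    for i in range(200):
        telemetry.record(latency_ms=10 + (i % 7) * 3, ok=(i % 40 != 0))
    telemetry.report()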

164 logical logs, 160 logs (data structure), 71, 556 advantages of immutability, 460 compaction, 73, 79, 456, 460 for stream operator state, 479 creating using total order broadcast, 349 implementing uniqueness constraints, 522 log-based messaging, 446-451 comparison to traditional messaging, 448, 451 consumer offsets, 449 disk space usage, 450 replaying old messages, 451, 496, 498 slow consumers, 450 using logs for message storage, 447 log-structured storage, 71-79 log-structured merge tree (see LSMtrees) replication, 152, 158-161 change data capture, 454-457 (see also changelogs) coordination with snapshot, 156 logical (row-based) replication, 160 statement-based replication, 158 trigger-based replication, 161 write-ahead log (WAL) shipping, 159 scalability limits, 493 loose coupling, 396, 419, 502 lost updates (see updates) LSM-trees (indexes), 78-79 comparison to B-trees, 83-85 Lucene (storage engine), 79 building indexes in batch processes, 411 similarity search, 88 Luigi (workflow scheduler), 402 LWW (see last write wins) M machine learning ethical considerations, 534 (see also ethics) iterative processing, 424 models derived from training data, 505 statistical and numerical algorithms, 428 MADlib (machine learning toolkit), 428 magic scaling sauce, 18 Mahout (machine learning toolkit), 428 maintainability, 18-22, 489 defined, 23 design principles for software systems, 19 evolvability (see evolvability) operability, 19 simplicity and managing complexity, 20 many-to-many relationships in document model versus relational model, 39 modeling as graphs, 49 many-to-one and many-to-many relationships, 33-36 many-to-one relationships, 34 MapReduce (batch processing), 390, 399-400 accessing external services within job, 404, 412 comparison to distributed databases designing for frequent faults, 417 diversity of processing models, 416 diversity of storage, 415 Index | 575 comparison to stream processing, 464 comparison to Unix, 413-414 disadvantages and limitations of, 419 fault tolerance, 406, 414, 422 higher-level tools, 403, 426 implementation in Hadoop, 400-403 the shuffle, 402 implementation in MongoDB, 46-48 machine learning, 428 map-side processing, 408-410 broadcast hash joins, 409 merge joins, 410 partitioned hash joins, 409 mapper and reducer functions, 399 materialization of intermediate state, 419-423 output of batch workflows, 411-413 building search indexes, 411 key-value stores, 412 reduce-side processing, 403-408 analysis of user activity events (exam‐ ple), 404 grouping records by same key, 406 handling skew, 407 sort-merge joins, 405 workflows, 402 marshalling (see encoding) massively parallel processing (MPP), 216 comparison to composing storage technolo‐ gies, 502 comparison to Hadoop, 414-418, 428 master-master replication (see multi-leader replication) master-slave replication (see leader-based repli‐ cation) materialization, 556 aggregate values, 101 conflicts, 251 intermediate state (batch processing), 420-423 materialized views, 101 as derived data, 386, 499-504 maintaining, using stream processing, 467, 475 Maven (Java build tool), 428 Maxwell (change data capture), 455 mean, 14 media monitoring, 467 median, 14 576 | Index meeting room booking (example), 249, 259, 521 membership services, 372 Memcached (caching server), 4, 89 memory in-memory databases, 88 durability, 227 serial transaction execution, 253 in-memory representation of data, 112 random bit-flips in, 529 use by indexes, 72, 77 memory barrier (CPU instruction), 338 MemSQL (database) in-memory storage, 89 
read committed isolation, 236 memtable (in LSM-trees), 78 Mercurial (version control system), 463 merge joins, MapReduce map-side, 410 mergeable persistent data structures, 174 merging sorted files, 76, 402, 405 Merkle trees, 532 Mesos (cluster manager), 418, 506 message brokers (see messaging systems) message-passing, 136-139 advantages over direct RPC, 137 distributed actor frameworks, 138 evolvability, 138 MessagePack (encoding format), 116 messages exactly-once semantics, 360, 476 loss of, 442 using total order broadcast, 348 messaging systems, 440-451 (see also streams) backpressure, buffering, or dropping mes‐ sages, 441 brokerless messaging, 442 event logs, 446-451 comparison to traditional messaging, 448, 451 consumer offsets, 449 replaying old messages, 451, 496, 498 slow consumers, 450 message brokers, 443-446 acknowledgements and redelivery, 445 comparison to event logs, 448, 451 multiple consumers of same topic, 444 reliability, 442 uniqueness in log-based messaging, 522 Meteor (web framework), 456 microbatching, 477, 495 microservices, 132 (see also services) causal dependencies across services, 493 loose coupling, 502 relation to batch/stream processors, 389, 508 Microsoft Azure Service Bus (messaging), 444 Azure Storage, 155, 398 Azure Stream Analytics, 466 DCOM (Distributed Component Object Model), 134 MSDTC (transaction coordinator), 356 Orleans (see Orleans) SQL Server (see SQL Server) migrating (rewriting) data, 40, 130, 461, 497 modulus operator (%), 210 MongoDB (database) aggregation pipeline, 48 atomic operations, 243 BSON, 41 document data model, 31 hash partitioning (sharding), 203-204 key-range partitioning, 202 lack of join support, 34, 42 leader-based replication, 153 MapReduce support, 46, 400 oplog parsing, 455, 456 partition splitting, 212 request routing, 216 secondary indexes, 207 Mongoriver (change data capture), 455 monitoring, 10, 19 monotonic clocks, 288 monotonic reads, 164 MPP (see massively parallel processing) MSMQ (messaging), 361 multi-column indexes, 87 multi-leader replication, 168-177 (see also replication) handling write conflicts, 171 conflict avoidance, 172 converging toward a consistent state, 172 custom conflict resolution logic, 173 determining what is a conflict, 174 linearizability, lack of, 333 replication topologies, 175-177 use cases, 168 clients with offline operation, 170 collaborative editing, 170 multi-datacenter replication, 168, 335 multi-object transactions, 228 need for, 231 Multi-Paxos (total order broadcast), 367 multi-table index cluster tables (Oracle), 41 multi-tenancy, 284 multi-version concurrency control (MVCC), 239, 266 detecting stale MVCC reads, 263 indexes and snapshot isolation, 241 mutual exclusion, 261 (see also locks) MySQL (database) binlog coordinates, 156 binlog parsing for change data capture, 455 circular replication topology, 175 consistent snapshots, 156 distributed transaction support, 361 InnoDB storage engine (see InnoDB) JSON support, 30, 42 leader-based replication, 153 performance of XA transactions, 360 row-based replication, 160 schema changes in, 40 snapshot isolation support, 242 (see also InnoDB) statement-based replication, 159 Tungsten Replicator (multi-leader replica‐ tion), 170 conflict detection, 177 N nanomsg (messaging library), 442 Narayana (transaction coordinator), 356 NATS (messaging), 137 near-real-time (nearline) processing, 390 (see also stream processing) Neo4j (database) Cypher query language, 52 graph data model, 50 Nephele (dataflow engine), 421 netcat (Unix tool), 397 
Netflix Chaos Monkey, 7, 280 Network Attached Storage (NAS), 146, 398 network model, 36 Index | 577 graph databases versus, 60 imperative query APIs, 46 Network Time Protocol (see NTP) networks congestion and queueing, 282 datacenter network topologies, 276 faults (see faults) linearizability and network delays, 338 network partitions, 279, 337 timeouts and unbounded delays, 281 next-key locking, 260 nodes (in graphs) (see vertices) nodes (processes), 556 handling outages in leader-based replica‐ tion, 156 system models for failure, 307 noisy neighbors, 284 nonblocking atomic commit, 359 nondeterministic operations accidental nondeterminism, 423 partial failures in distributed systems, 275 nonfunctional requirements, 22 nonrepeatable reads, 238 (see also read skew) normalization (data representation), 33, 556 executing joins, 39, 42, 403 foreign key references, 231 in systems of record, 386 versus denormalization, 462 NoSQL, 29, 499 transactions and, 223 Notation3 (N3), 56 npm (package manager), 428 NTP (Network Time Protocol), 287 accuracy, 289, 293 adjustments to monotonic clocks, 289 multiple server addresses, 306 numbers, in XML and JSON encodings, 114 O object-relational mapping (ORM) frameworks, 30 error handling and aborted transactions, 232 unsafe read-modify-write cycle code, 244 object-relational mismatch, 29 observer pattern, 506 offline systems, 390 (see also batch processing) 578 | Index stateful, offline-capable clients, 170, 511 offline-first applications, 511 offsets consumer offsets in partitioned logs, 449 messages in partitioned logs, 447 OLAP (online analytic processing), 91, 556 data cubes, 102 OLTP (online transaction processing), 90, 556 analytics queries versus, 411 workload characteristics, 253 one-to-many relationships, 30 JSON representation, 32 online systems, 389 (see also services) Oozie (workflow scheduler), 402 OpenAPI (service definition format), 133 OpenStack Nova (cloud infrastructure) use of ZooKeeper, 370 Swift (object storage), 398 operability, 19 operating systems versus databases, 499 operation identifiers, 518, 522 operational transformation, 174 operators, 421 flow of data between, 424 in stream processing, 464 optimistic concurrency control, 261 Oracle (database) distributed transaction support, 361 GoldenGate (change data capture), 161, 170, 455 lack of serializability, 226 leader-based replication, 153 multi-table index cluster tables, 41 not preventing write skew, 248 partitioned indexes, 209 PL/SQL language, 255 preventing lost updates, 245 read committed isolation, 236 Real Application Clusters (RAC), 330 recursive query support, 54 snapshot isolation support, 239, 242 TimesTen (in-memory database), 89 WAL-based replication, 160 XML support, 30 ordering, 339-352 by sequence numbers, 343-348 causal ordering, 339-343 partial order, 341 limits of total ordering, 493 total order broadcast, 348-352 Orleans (actor framework), 139 outliers (response time), 14 Oz (programming language), 504 P package managers, 428, 505 packet switching, 285 packets corruption of, 306 sending via UDP, 442 PageRank (algorithm), 49, 424 paging (see virtual memory) ParAccel (database), 93 parallel databases (see massively parallel pro‐ cessing) parallel execution of graph analysis algorithms, 426 queries in MPP databases, 216 Parquet (data format), 96, 131 (see also column-oriented storage) use in Hadoop, 414 partial failures, 275, 310 limping, 311 partial order, 341 partitioning, 199-218, 556 and replication, 200 in batch processing, 429 multi-partition operations, 514 
enforcing constraints, 522 secondary index maintenance, 495 of key-value data, 201-205 by key range, 202 skew and hot spots, 205 rebalancing partitions, 209-214 automatic or manual rebalancing, 213 problems with hash mod N, 210 using dynamic partitioning, 212 using fixed number of partitions, 210 using N partitions per node, 212 replication and, 147 request routing, 214-216 secondary indexes, 206-209 document-based partitioning, 206 term-based partitioning, 208 serial execution of transactions and, 255 Paxos (consensus algorithm), 366 ballot number, 368 Multi-Paxos (total order broadcast), 367 percentiles, 14, 556 calculating efficiently, 16 importance of high percentiles, 16 use in service level agreements (SLAs), 15 Percona XtraBackup (MySQL tool), 156 performance describing, 13 of distributed transactions, 360 of in-memory databases, 89 of linearizability, 338 of multi-leader replication, 169 perpetual inconsistency, 525 pessimistic concurrency control, 261 phantoms (transaction isolation), 250 materializing conflicts, 251 preventing, in serializability, 259 physical clocks (see clocks) pickle (Python), 113 Pig (dataflow language), 419, 427 replicated joins, 409 skewed joins, 407 workflows, 403 Pinball (workflow scheduler), 402 pipelined execution, 423 in Unix, 394 point in time, 287 polyglot persistence, 29 polystores, 501 PostgreSQL (database) BDR (multi-leader replication), 170 causal ordering of writes, 177 Bottled Water (change data capture), 455 Bucardo (trigger-based replication), 161, 173 distributed transaction support, 361 foreign data wrappers, 501 full text search support, 490 leader-based replication, 153 log sequence number, 156 MVCC implementation, 239, 241 PL/pgSQL language, 255 PostGIS geospatial indexes, 87 preventing lost updates, 245 preventing write skew, 248, 261 read committed isolation, 236 recursive query support, 54 representing graphs, 51 Index | 579 serializable snapshot isolation (SSI), 261 snapshot isolation support, 239, 242 WAL-based replication, 160 XML and JSON support, 30, 42 pre-splitting, 212 Precision Time Protocol (PTP), 290 predicate locks, 259 predictive analytics, 533-536 amplifying bias, 534 ethics of (see ethics) feedback loops, 536 preemption of datacenter resources, 418 of threads, 298 Pregel processing model, 425 primary keys, 85, 556 compound primary key (Cassandra), 204 primary-secondary replication (see leaderbased replication) privacy, 536-543 consent and freedom of choice, 538 data as assets and power, 540 deleting data, 463 ethical considerations (see ethics) legislation and self-regulation, 542 meaning of, 539 surveillance, 537 tracking behavioral data, 536 probabilistic algorithms, 16, 466 process pauses, 295-299 processing time (of events), 469 producers (message streams), 440 programming languages dataflow languages, 504 for stored procedures, 255 functional reactive programming (FRP), 504 logic programming, 504 Prolog (language), 61 (see also Datalog) promises (asynchronous operations), 135 property graphs, 50 Cypher query language, 52 Protocol Buffers (data format), 117-121 field tags and schema evolution, 120 provenance of data, 531 publish/subscribe model, 441 publishers (message streams), 440 punch card tabulating machines, 390 580 | Index pure functions, 48 putting computation near data, 400 Q Qpid (messaging), 444 quality of service (QoS), 285 Quantcast File System (distributed filesystem), 398 query languages, 42-48 aggregation pipeline, 48 CSS and XSL, 44 Cypher, 52 Datalog, 60 Juttle, 504 MapReduce querying, 46-48 
recursive SQL queries, 53 relational algebra and SQL, 42 SPARQL, 59 query optimizers, 37, 427 queueing delays (networks), 282 head-of-line blocking, 15 latency and response time, 14 queues (messaging), 137 quorums, 179-182, 556 for leaderless replication, 179 in consensus algorithms, 368 limitations of consistency, 181-183, 334 making decisions in distributed systems, 301 monitoring staleness, 182 multi-datacenter replication, 184 relying on durability, 309 sloppy quorums and hinted handoff, 183 R R-trees (indexes), 87 RabbitMQ (messaging), 137, 444 leader-based replication, 153 race conditions, 225 (see also concurrency) avoiding with linearizability, 331 caused by dual writes, 452 dirty writes, 235 in counter increments, 235 lost updates, 242-246 preventing with event logs, 462, 507 preventing with serializable isolation, 252 write skew, 246-251 Raft (consensus algorithm), 366 sensitivity to network problems, 369 term number, 368 use in etcd, 353 RAID (Redundant Array of Independent Disks), 7, 398 railways, schema migration on, 496 RAMCloud (in-memory storage), 89 ranking algorithms, 424 RDF (Resource Description Framework), 57 querying with SPARQL, 59 RDMA (Remote Direct Memory Access), 276 read committed isolation level, 234-237 implementing, 236 multi-version concurrency control (MVCC), 239 no dirty reads, 234 no dirty writes, 235 read path (derived data), 509 read repair (leaderless replication), 178 for linearizability, 335 read replicas (see leader-based replication) read skew (transaction isolation), 238, 266 as violation of causality, 340 read-after-write consistency, 163, 524 cross-device, 164 read-modify-write cycle, 243 read-scaling architecture, 161 reads as events, 513 real-time collaborative editing, 170 near-real-time processing, 390 (see also stream processing) publish/subscribe dataflow, 513 response time guarantees, 298 time-of-day clocks, 288 rebalancing partitions, 209-214, 556 (see also partitioning) automatic or manual rebalancing, 213 dynamic partitioning, 212 fixed number of partitions, 210 fixed number of partitions per node, 212 problems with hash mod N, 210 recency guarantee, 324 recommendation engines batch process outputs, 412 batch workflows, 403, 420 iterative processing, 424 statistical and numerical algorithms, 428 records, 399 events in stream processing, 440 recursive common table expressions (SQL), 54 redelivery (messaging), 445 Redis (database) atomic operations, 243 durability, 89 Lua scripting, 255 single-threaded execution, 253 usage example, 4 redundancy hardware components, 7 of derived data, 386 (see also derived data) Reed–Solomon codes (error correction), 398 refactoring, 22 (see also evolvability) regions (partitioning), 199 register (data structure), 325 relational data model, 28-42 comparison to document model, 38-42 graph queries in SQL, 53 in-memory databases with, 89 many-to-one and many-to-many relation‐ ships, 33 multi-object transactions, need for, 231 NoSQL as alternative to, 29 object-relational mismatch, 29 relational algebra and SQL, 42 versus document model convergence of models, 41 data locality, 41 relational databases eventual consistency, 162 history, 28 leader-based replication, 153 logical logs, 160 philosophy compared to Unix, 499, 501 schema changes, 40, 111, 130 statement-based replication, 158 use of B-tree indexes, 80 relationships (see edges) reliability, 6-10, 489 building a reliable system from unreliable components, 276 defined, 6, 22 hardware faults, 7 human errors, 9 importance of, 10 of messaging systems, 442 
Index | 581 software errors, 8 Remote Method Invocation (Java RMI), 134 remote procedure calls (RPCs), 134-136 (see also services) based on futures, 135 data encoding and evolution, 136 issues with, 134 using Avro, 126, 135 using Thrift, 135 versus message brokers, 137 repeatable reads (transaction isolation), 242 replicas, 152 replication, 151-193, 556 and durability, 227 chain replication, 155 conflict resolution and, 246 consistency properties, 161-167 consistent prefix reads, 165 monotonic reads, 164 reading your own writes, 162 in distributed filesystems, 398 leaderless, 177-191 detecting concurrent writes, 184-191 limitations of quorum consistency, 181-183, 334 sloppy quorums and hinted handoff, 183 monitoring staleness, 182 multi-leader, 168-177 across multiple datacenters, 168, 335 handling write conflicts, 171-175 replication topologies, 175-177 partitioning and, 147, 200 reasons for using, 145, 151 single-leader, 152-161 failover, 157 implementation of replication logs, 158-161 relation to consensus, 367 setting up new followers, 155 synchronous versus asynchronous, 153-155 state machine replication, 349, 452 using erasure coding, 398 with heterogeneous data systems, 453 replication logs (see logs) reprocessing data, 496, 498 (see also evolvability) from log-based messaging, 451 request routing, 214-216 582 | Index approaches to, 214 parallel query execution, 216 resilient systems, 6 (see also fault tolerance) response time as performance metric for services, 13, 389 guarantees on, 298 latency versus, 14 mean and percentiles, 14 user experience, 15 responsibility and accountability, 535 REST (Representational State Transfer), 133 (see also services) RethinkDB (database) document data model, 31 dynamic partitioning, 212 join support, 34, 42 key-range partitioning, 202 leader-based replication, 153 subscribing to changes, 456 Riak (database) Bitcask storage engine, 72 CRDTs, 174, 191 dotted version vectors, 191 gossip protocol, 216 hash partitioning, 203-204, 211 last-write-wins conflict resolution, 186 leaderless replication, 177 LevelDB storage engine, 78 linearizability, lack of, 335 multi-datacenter support, 184 preventing lost updates across replicas, 246 rebalancing, 213 search feature, 209 secondary indexes, 207 siblings (concurrently written values), 190 sloppy quorums, 184 ring buffers, 450 Ripple (cryptocurrency), 532 rockets, 10, 36, 305 RocksDB (storage engine), 78 leveled compaction, 79 rollbacks (transactions), 222 rolling upgrades, 8, 112 routing (see request routing) row-oriented storage, 96 row-based replication, 160 rowhammer (memory corruption), 529 RPCs (see remote procedure calls) Rubygems (package manager), 428 rules (Datalog), 61 S safety and liveness properties, 308 in consensus algorithms, 366 in transactions, 222 sagas (see compensating transactions) Samza (stream processor), 466, 467 fault tolerance, 479 streaming SQL support, 466 sandboxes, 9 SAP HANA (database), 93 scalability, 10-18, 489 approaches for coping with load, 17 defined, 22 describing load, 11 describing performance, 13 partitioning and, 199 replication and, 161 scaling up versus scaling out, 146 scaling out, 17, 146 (see also shared-nothing architecture) scaling up, 17, 146 scatter/gather approach, querying partitioned databases, 207 SCD (slowly changing dimension), 476 schema-on-read, 39 comparison to evolvable schema, 128 in distributed filesystems, 415 schema-on-write, 39 schemaless databases (see schema-on-read) schemas, 557 Avro, 122-127 reader determining writer’s schema, 125 schema 
evolution, 123 dynamically generated, 126 evolution of, 496 affecting application code, 111 compatibility checking, 126 in databases, 129-131 in message-passing, 138 in service calls, 136 flexibility in document model, 39 for analytics, 93-95 for JSON and XML, 115 merits of, 127 schema migration on railways, 496 Thrift and Protocol Buffers, 117-121 schema evolution, 120 traditional approach to design, fallacy in, 462 searches building search indexes in batch processes, 411 k-nearest neighbors, 429 on streams, 467 partitioned secondary indexes, 206 secondaries (see leader-based replication) secondary indexes, 85, 557 partitioning, 206-209, 217 document-partitioned, 206 index maintenance, 495 term-partitioned, 208 problems with dual writes, 452, 491 updating, transaction isolation and, 231 secondary sorts, 405 sed (Unix tool), 392 self-describing files, 127 self-joins, 480 self-validating systems, 530 semantic web, 57 semi-synchronous replication, 154 sequence number ordering, 343-348 generators, 294, 344 insufficiency for enforcing constraints, 347 Lamport timestamps, 345 use of timestamps, 291, 295, 345 sequential consistency, 351 serializability, 225, 233, 251-266, 557 linearizability versus, 329 pessimistic versus optimistic concurrency control, 261 serial execution, 252-256 partitioning, 255 using stored procedures, 253, 349 serializable snapshot isolation (SSI), 261-266 detecting stale MVCC reads, 263 detecting writes that affect prior reads, 264 distributed execution, 265, 364 performance of SSI, 265 preventing write skew, 262-265 two-phase locking (2PL), 257-261 index-range locks, 260 performance, 258 Serializable (Java), 113 Index | 583 serialization, 113 (see also encoding) service discovery, 135, 214, 372 using DNS, 216, 372 service level agreements (SLAs), 15 service-oriented architecture (SOA), 132 (see also services) services, 131-136 microservices, 132 causal dependencies across services, 493 loose coupling, 502 relation to batch/stream processors, 389, 508 remote procedure calls (RPCs), 134-136 issues with, 134 similarity to databases, 132 web services, 132, 135 session windows (stream processing), 472 (see also windows) sessionization, 407 sharding (see partitioning) shared mode (locks), 258 shared-disk architecture, 146, 398 shared-memory architecture, 146 shared-nothing architecture, 17, 146-147, 557 (see also replication) distributed filesystems, 398 (see also distributed filesystems) partitioning, 199 use of network, 277 sharks biting undersea cables, 279 counting (example), 46-48 finding (example), 42 website about (example), 44 shredding (in relational model), 38 siblings (concurrent values), 190, 246 (see also conflicts) similarity search edit distance, 88 genome data, 63 k-nearest neighbors, 429 single-leader replication (see leader-based rep‐ lication) single-threaded execution, 243, 252 in batch processing, 406, 421, 426 in stream processing, 448, 463, 522 size-tiered compaction, 79 skew, 557 584 | Index clock skew, 291-294, 334 in transaction isolation read skew, 238, 266 write skew, 246-251, 262-265 (see also write skew) meanings of, 238 unbalanced workload, 201 compensating for, 205 due to celebrities, 205 for time-series data, 203 in batch processing, 407 slaves (see leader-based replication) sliding windows (stream processing), 472 (see also windows) sloppy quorums, 183 (see also quorums) lack of linearizability, 334 slowly changing dimension (data warehouses), 476 smearing (leap seconds adjustments), 290 snapshots (databases) causal consistency, 340 computing 
derived data, 500 in change data capture, 455 serializable snapshot isolation (SSI), 261-266, 329 setting up a new replica, 156 snapshot isolation and repeatable read, 237-242 implementing with MVCC, 239 indexes and MVCC, 241 visibility rules, 240 synchronized clocks for global snapshots, 294 snowflake schemas, 95 SOAP, 133 (see also services) evolvability, 136 software bugs, 8 maintaining integrity, 529 solid state drives (SSDs) access patterns, 84 detecting corruption, 519, 530 faults in, 227 sequential write throughput, 75 Solr (search server) building indexes in batch processes, 411 document-partitioned indexes, 207 request routing, 216 usage example, 4 use of Lucene, 79 sort (Unix tool), 392, 394, 395 sort-merge joins (MapReduce), 405 Sorted String Tables (see SSTables) sorting sort order in column storage, 99 source of truth (see systems of record) Spanner (database) data locality, 41 snapshot isolation using clocks, 295 TrueTime API, 294 Spark (processing framework), 421-423 bytecode generation, 428 dataflow APIs, 427 fault tolerance, 422 for data warehouses, 93 GraphX API (graph processing), 425 machine learning, 428 query optimizer, 427 Spark Streaming, 466 microbatching, 477 stream processing on top of batch process‐ ing, 495 SPARQL (query language), 59 spatial algorithms, 429 split brain, 158, 557 in consensus algorithms, 352, 367 preventing, 322, 333 using fencing tokens to avoid, 302-304 spreadsheets, dataflow programming capabili‐ ties, 504 SQL (Structured Query Language), 21, 28, 43 advantages and limitations of, 416 distributed query execution, 48 graph queries in, 53 isolation levels standard, issues with, 242 query execution on Hadoop, 416 résumé (example), 30 SQL injection vulnerability, 305 SQL on Hadoop, 93 statement-based replication, 158 stored procedures, 255 SQL Server (database) data warehousing support, 93 distributed transaction support, 361 leader-based replication, 153 preventing lost updates, 245 preventing write skew, 248, 257 read committed isolation, 236 recursive query support, 54 serializable isolation, 257 snapshot isolation support, 239 T-SQL language, 255 XML support, 30 SQLstream (stream analytics), 466 SSDs (see solid state drives) SSTables (storage format), 76-79 advantages over hash indexes, 76 concatenated index, 204 constructing and maintaining, 78 making LSM-Tree from, 78 staleness (old data), 162 cross-channel timing dependencies, 331 in leaderless databases, 178 in multi-version concurrency control, 263 monitoring for, 182 of client state, 512 versus linearizability, 324 versus timeliness, 524 standbys (see leader-based replication) star replication topologies, 175 star schemas, 93-95 similarity to event sourcing, 458 Star Wars analogy (event time versus process‐ ing time), 469 state derived from log of immutable events, 459 deriving current state from the event log, 458 interplay between state changes and appli‐ cation code, 507 maintaining derived state, 495 maintenance by stream processor in streamstream joins, 473 observing derived state, 509-515 rebuilding after stream processor failure, 478 separation of application code and, 505 state machine replication, 349, 452 statement-based replication, 158 statically typed languages analogy to schema-on-write, 40 code generation and, 127 statistical and numerical algorithms, 428 StatsD (metrics aggregator), 442 stdin, stdout, 395, 396 Stellar (cryptocurrency), 532 Index | 585 stock market feeds, 442 STONITH (Shoot The Other Node In The Head), 158 stop-the-world (see garbage collection) storage 
composing data storage technologies, 499-504 diversity of, in MapReduce, 415 Storage Area Network (SAN), 146, 398 storage engines, 69-104 column-oriented, 95-101 column compression, 97-99 defined, 96 distinction between column families and, 99 Parquet, 96, 131 sort order in, 99-100 writing to, 101 comparing requirements for transaction processing and analytics, 90-96 in-memory storage, 88 durability, 227 row-oriented, 70-90 B-trees, 79-83 comparing B-trees and LSM-trees, 83-85 defined, 96 log-structured, 72-79 stored procedures, 161, 253-255, 557 and total order broadcast, 349 pros and cons of, 255 similarity to stream processors, 505 Storm (stream processor), 466 distributed RPC, 468, 514 Trident state handling, 478 straggler events, 470, 498 stream processing, 464-481, 557 accessing external services within job, 474, 477, 478, 517 combining with batch processing lambda architecture, 497 unifying technologies, 498 comparison to batch processing, 464 complex event processing (CEP), 465 fault tolerance, 476-479 atomic commit, 477 idempotence, 478 microbatching and checkpointing, 477 rebuilding state after a failure, 478 for data integration, 494-498 586 | Index maintaining derived state, 495 maintenance of materialized views, 467 messaging systems (see messaging systems) reasoning about time, 468-472 event time versus processing time, 469, 477, 498 knowing when window is ready, 470 types of windows, 472 relation to databases (see streams) relation to services, 508 search on streams, 467 single-threaded execution, 448, 463 stream analytics, 466 stream joins, 472-476 stream-stream join, 473 stream-table join, 473 table-table join, 474 time-dependence of, 475 streams, 440-451 end-to-end, pushing events to clients, 512 messaging systems (see messaging systems) processing (see stream processing) relation to databases, 451-464 (see also changelogs) API support for change streams, 456 change data capture, 454-457 derivative of state by time, 460 event sourcing, 457-459 keeping systems in sync, 452-453 philosophy of immutable events, 459-464 topics, 440 strict serializability, 329 strong consistency (see linearizability) strong one-copy serializability, 329 subjects, predicates, and objects (in triplestores), 55 subscribers (message streams), 440 (see also consumers) supercomputers, 275 surveillance, 537 (see also privacy) Swagger (service definition format), 133 swapping to disk (see virtual memory) synchronous networks, 285, 557 comparison to asynchronous networks, 284 formal model, 307 synchronous replication, 154, 557 chain replication, 155 conflict detection, 172 system models, 300, 306-310 assumptions in, 528 correctness of algorithms, 308 mapping to the real world, 309 safety and liveness, 308 systems of record, 386, 557 change data capture, 454, 491 treating event log as, 460 systems thinking, 536 T t-digest (algorithm), 16 table-table joins, 474 Tableau (data visualization software), 416 tail (Unix tool), 447 tail vertex (property graphs), 51 Tajo (query engine), 93 Tandem NonStop SQL (database), 200 TCP (Transmission Control Protocol), 277 comparison to circuit switching, 285 comparison to UDP, 283 connection failures, 280 flow control, 282, 441 packet checksums, 306, 519, 529 reliability and duplicate suppression, 517 retransmission timeouts, 284 use for transaction sessions, 229 telemetry (see monitoring) Teradata (database), 93, 200 term-partitioned indexes, 208, 217 termination (consensus), 365 Terrapin (database), 413 Tez (dataflow engine), 421-423 fault tolerance, 422 support by 
higher-level tools, 427 thrashing (out of memory), 297 threads (concurrency) actor model, 138, 468 (see also message-passing) atomic operations, 223 background threads, 73, 85 execution pauses, 286, 296-298 memory barriers, 338 preemption, 298 single (see single-threaded execution) three-phase commit, 359 Thrift (data format), 117-121 BinaryProtocol, 118 CompactProtocol, 119 field tags and schema evolution, 120 throughput, 13, 390 TIBCO, 137 Enterprise Message Service, 444 StreamBase (stream analytics), 466 time concurrency and, 187 cross-channel timing dependencies, 331 in distributed systems, 287-299 (see also clocks) clock synchronization and accuracy, 289 relying on synchronized clocks, 291-295 process pauses, 295-299 reasoning about, in stream processors, 468-472 event time versus processing time, 469, 477, 498 knowing when window is ready, 470 timestamp of events, 471 types of windows, 472 system models for distributed systems, 307 time-dependence in stream joins, 475 time-of-day clocks, 288 timeliness, 524 coordination-avoiding data systems, 528 correctness of dataflow systems, 525 timeouts, 279, 557 dynamic configuration of, 284 for failover, 158 length of, 281 timestamps, 343 assigning to events in stream processing, 471 for read-after-write consistency, 163 for transaction ordering, 295 insufficiency for enforcing constraints, 347 key range partitioning by, 203 Lamport, 345 logical, 494 ordering events, 291, 345 Titan (database), 50 tombstones, 74, 191, 456 topics (messaging), 137, 440 total order, 341, 557 limits of, 493 sequence numbers or timestamps, 344 total order broadcast, 348-352, 493, 522 consensus algorithms and, 366-368 Index | 587 implementation in ZooKeeper and etcd, 370 implementing with linearizable storage, 351 using, 349 using to implement linearizable storage, 350 tracking behavioral data, 536 (see also privacy) transaction coordinator (see coordinator) transaction manager (see coordinator) transaction processing, 28, 90-95 comparison to analytics, 91 comparison to data warehousing, 93 transactions, 221-267, 558 ACID properties of, 223 atomicity, 223 consistency, 224 durability, 226 isolation, 225 compensating (see compensating transac‐ tions) concept of, 222 distributed transactions, 352-364 avoiding, 492, 502, 521-528 failure amplification, 364, 495 in doubt/uncertain status, 358, 362 two-phase commit, 354-359 use of, 360-361 XA transactions, 361-364 OLTP versus analytics queries, 411 purpose of, 222 serializability, 251-266 actual serial execution, 252-256 pessimistic versus optimistic concur‐ rency control, 261 serializable snapshot isolation (SSI), 261-266 two-phase locking (2PL), 257-261 single-object and multi-object, 228-232 handling errors and aborts, 231 need for multi-object transactions, 231 single-object writes, 230 snapshot isolation (see snapshots) weak isolation levels, 233-251 preventing lost updates, 242-246 read committed, 234-238 transitive closure (graph algorithm), 424 trie (data structure), 88 triggers (databases), 161, 441 implementing change data capture, 455 implementing replication, 161 588 | Index triple-stores, 55-59 SPARQL query language, 59 tumbling windows (stream processing), 472 (see also windows) in microbatching, 477 tuple spaces (programming model), 507 Turtle (RDF data format), 56 Twitter constructing home timelines (example), 11, 462, 474, 511 DistributedLog (event log), 448 Finagle (RPC framework), 135 Snowflake (sequence number generator), 294 Summingbird (processing library), 497 two-phase commit (2PC), 353, 355-359, 558 
confusion with two-phase locking, 356 coordinator failure, 358 coordinator recovery, 363 how it works, 357 issues in practice, 363 performance cost, 360 transactions holding locks, 362 two-phase locking (2PL), 257-261, 329, 558 confusion with two-phase commit, 356 index-range locks, 260 performance of, 258 type checking, dynamic versus static, 40 U UDP (User Datagram Protocol) comparison to TCP, 283 multicast, 442 unbounded datasets, 439, 558 (see also streams) unbounded delays, 558 in networks, 282 process pauses, 296 unbundling databases, 499-515 composing data storage technologies, 499-504 federation versus unbundling, 501 need for high-level language, 503 designing applications around dataflow, 504-509 observing derived state, 509-515 materialized views and caching, 510 multi-partition data processing, 514 pushing state changes to clients, 512 uncertain (transaction status) (see in doubt) uniform consensus, 365 (see also consensus) uniform interfaces, 395 union type (in Avro), 125 uniq (Unix tool), 392 uniqueness constraints asynchronously checked, 526 requiring consensus, 521 requiring linearizability, 330 uniqueness in log-based messaging, 522 Unix philosophy, 394-397 command-line batch processing, 391-394 Unix pipes versus dataflow engines, 423 comparison to Hadoop, 413-414 comparison to relational databases, 499, 501 comparison to stream processing, 464 composability and uniform interfaces, 395 loose coupling, 396 pipes, 394 relation to Hadoop, 499 UPDATE statement (SQL), 40 updates preventing lost updates, 242-246 atomic write operations, 243 automatically detecting lost updates, 245 compare-and-set operations, 245 conflict resolution and replication, 246 using explicit locking, 244 preventing write skew, 246-251 V validity (consensus), 365 vBuckets (partitioning), 199 vector clocks, 191 (see also version vectors) vectorized processing, 99, 428 verification, 528-533 avoiding blind trust, 530 culture of, 530 designing for auditability, 531 end-to-end integrity checks, 531 tools for auditable data systems, 532 version control systems, reliance on immutable data, 463 version vectors, 177, 191 capturing causal dependencies, 343 versus vector clocks, 191 Vertica (database), 93 handling writes, 101 replicas using different sort orders, 100 vertical scaling (see scaling up) vertices (in graphs), 49 property graph model, 50 Viewstamped Replication (consensus algo‐ rithm), 366 view number, 368 virtual machines, 146 (see also cloud computing) context switches, 297 network performance, 282 noisy neighbors, 284 reliability in cloud services, 8 virtualized clocks in, 290 virtual memory process pauses due to page faults, 14, 297 versus memory management by databases, 89 VisiCalc (spreadsheets), 504 vnodes (partitioning), 199 Voice over IP (VoIP), 283 Voldemort (database) building read-only stores in batch processes, 413 hash partitioning, 203-204, 211 leaderless replication, 177 multi-datacenter support, 184 rebalancing, 213 reliance on read repair, 179 sloppy quorums, 184 VoltDB (database) cross-partition serializability, 256 deterministic stored procedures, 255 in-memory storage, 89 output streams, 456 secondary indexes, 207 serial execution of transactions, 253 statement-based replication, 159, 479 transactions in stream processing, 477 W WAL (write-ahead log), 82 web services (see services) Web Services Description Language (WSDL), 133 webhooks, 443 webMethods (messaging), 137 WebSocket (protocol), 512 Index | 589 windows (stream processing), 466, 468-472 infinite windows for changelogs, 
467, 474 knowing when all events have arrived, 470 stream joins within a window, 473 types of windows, 472 winners (conflict resolution), 173 WITH RECURSIVE syntax (SQL), 54 workflows (MapReduce), 402 outputs, 411-414 key-value stores, 412 search indexes, 411 with map-side joins, 410 working set, 393 write amplification, 84 write path (derived data), 509 write skew (transaction isolation), 246-251 characterizing, 246-251, 262 examples of, 247, 249 materializing conflicts, 251 occurrence in practice, 529 phantoms, 250 preventing in snapshot isolation, 262-265 in two-phase locking, 259-261 options for, 248 write-ahead log (WAL), 82, 159 writes (database) atomic write operations, 243 detecting writes affecting prior reads, 264 preventing dirty writes with read commit‐ ted, 235 WS-* framework, 133 (see also services) WS-AtomicTransaction (2PC), 355 590 | Index X XA transactions, 355, 361-364 heuristic decisions, 363 limitations of, 363 xargs (Unix tool), 392, 396 XML binary variants, 115 encoding RDF data, 57 for application data, issues with, 114 in relational databases, 30, 41 XSL/XPath, 45 Y Yahoo!


pages: 772 words: 203,182

What Went Wrong: How the 1% Hijacked the American Middle Class . . . And What Other Countries Got Right by George R. Tyler

8-hour work day, active measures, activist fund / activist shareholder / activist investor, affirmative action, Affordable Care Act / Obamacare, bank run, banking crisis, Basel III, Black Swan, blood diamonds, blue-collar work, Bolshevik threat, bonus culture, British Empire, business process, capital controls, Carmen Reinhart, carried interest, cognitive dissonance, collateralized debt obligation, collective bargaining, commoditize, corporate governance, corporate personhood, corporate raider, corporate social responsibility, creative destruction, credit crunch, crony capitalism, crowdsourcing, currency manipulation / currency intervention, David Brooks, David Graeber, David Ricardo: comparative advantage, declining real wages, deindustrialization, Diane Coyle, Double Irish / Dutch Sandwich, eurozone crisis, financial deregulation, financial innovation, fixed income, Francis Fukuyama: the end of history, full employment, George Akerlof, George Gilder, Gini coefficient, Gordon Gekko, hiring and firing, income inequality, invisible hand, job satisfaction, John Markoff, joint-stock company, Joseph Schumpeter, Kenneth Rogoff, labor-force participation, labour market flexibility, laissez-faire capitalism, lake wobegon effect, light touch regulation, Long Term Capital Management, manufacturing employment, market clearing, market fundamentalism, Martin Wolf, minimum wage unemployment, mittelstand, moral hazard, Myron Scholes, Naomi Klein, Northern Rock, obamacare, offshore financial centre, Paul Samuelson, pension reform, performance metric, pirate software, Plutocrats, plutocrats, Ponzi scheme, precariat, price stability, profit maximization, profit motive, purchasing power parity, race to the bottom, Ralph Nader, rent-seeking, reshoring, Richard Thaler, rising living standards, road to serfdom, Robert Gordon, Robert Shiller, Ronald Reagan, Sand Hill Road, shareholder value, Silicon Valley, South Sea Bubble, sovereign wealth fund, Steve Ballmer, Steve Jobs, The Chicago School, The Spirit Level, The Wealth of Nations by Adam Smith, Thorstein Veblen, too big to fail, transcontinental railway, transfer pricing, trickle-down economics, tulip mania, Tyler Cowen: Great Stagnation, union organizing, Upton Sinclair, upwardly mobile, women in the workforce, working poor, zero-sum game

European and Asian executives, even those running multinational corporations, are paid a fraction of the salaries paid in the Anglosphere.”41 CEO Lemons: The Collapse of Pay-for-Performance in America Foreign scholars describe American firms as providing “pathological overcompensation of fair-weather captains.”42 They are correct: the rise in US executive compensation of recent decades is unjustified by any performance metric, vastly outstripping indices like sales, profits, or returns to shareholders. The Clinton administration’s Secretary of Labor, Robert Reich, unearthed the smoking gun evidence: “By 2006, CEOs were earning, on average, eight times as much per dollar of corporate profits as they did in the 1980s.”43 A vast disparity like this in trend lines is powerful evidence that executive pay suffers from market failure.
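Reich’s comparison is easiest to see as a ratio: divide CEO pay by company profit and compare the quotient across eras. The Python sketch below uses invented figures purely to illustrate the metric; none of the numbers are Reich’s data.

```python
# Reich's metric: dollars of CEO pay per dollar of corporate profit.
# All figures are invented for illustration; they are not Reich's data.

def pay_per_profit_dollar(ceo_pay: float, profit: float) -> float:
    return ceo_pay / profit

ratio_1980s = pay_per_profit_dollar(ceo_pay=1.2e6, profit=300e6)
ratio_2006 = pay_per_profit_dollar(ceo_pay=9.6e6, profit=300e6)

print(f"1980s: ${ratio_1980s:.4f} of pay per dollar of profit")
print(f"2006:  ${ratio_2006:.4f} of pay per dollar of profit")
print(f"growth: {ratio_2006 / ratio_1980s:.0f}x")  # 8x, the multiple Reich quotes
```

Because the denominator is profit, the ratio grows only when pay outruns performance, which is the substance of the market-failure charge.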

They are more common in Europe and even in the United Kingdom since the Financial Services Authority in London imposed clawback rules in 2009, targeted at bad apples. That is perhaps why Lloyds Banking Group reclaimed bonuses paid to senior executives who engineered a consumer scam.107 And it seems likely that the LIBOR scandal will eventually involve clawbacks at firms such as Barclays. The general framework just outlined, with modest bonuses featuring delayed vesting and dependent on long-term performance metrics, has been endorsed by academics, notably the Squam Lake Group of economists including Kenneth French of Dartmouth and Robert Shiller of Yale.108 And its principles are reflected in Germany’s VorstAG law enacted in July 2009, explicitly intended to lengthen executive time horizons, with incentive pay vesting only after four years.109 Moreover, risky decision-making is discouraged by its legal provisions precluding management profiting from extraordinary developments such as takeovers or other realization of hidden assets.


pages: 327 words: 103,336

Everything Is Obvious: *Once You Know the Answer by Duncan J. Watts

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

active measures, affirmative action, Albert Einstein, Amazon Mechanical Turk, Black Swan, butterfly effect, Carmen Reinhart, Cass Sunstein, clockwork universe, cognitive dissonance, collapse of Lehman Brothers, complexity theory, correlation does not imply causation, crowdsourcing, death of newspapers, discovery of DNA, East Village, easy for humans, difficult for computers, edge city, en.wikipedia.org, Erik Brynjolfsson, framing effect, Geoffrey West, Santa Fe Institute, George Santayana, happiness index / gross national happiness, high batting average, hindsight bias, illegal immigration, industrial cluster, interest rate swap, invention of the printing press, invention of the telescope, invisible hand, Isaac Newton, Jane Jacobs, Jeff Bezos, Joseph Schumpeter, Kenneth Rogoff, lake wobegon effect, Long Term Capital Management, loss aversion, medical malpractice, meta analysis, meta-analysis, Milgram experiment, natural language processing, Netflix Prize, Network effects, oil shock, packet switching, pattern recognition, performance metric, phenotype, Pierre-Simon Laplace, planetary scale, prediction markets, pre–internet, RAND corporation, random walk, RFID, school choice, Silicon Valley, statistical model, Steve Ballmer, Steve Jobs, Steve Wozniak, supply-chain management, The Death and Life of Great American Cities, the scientific method, The Wisdom of Crowds, too big to fail, Toyota Production System, ultimatum game, urban planning, Vincenzo Peruggia: Mona Lisa, Watson beat the top human players on Jeopardy!, X Prize

The problem is therefore not that planning of any kind is impossible, any more than prediction of any kind is impossible, but rather that certain kinds of plans can be made reliably and others can’t be, and that planners need to be able to tell the difference. 3. See Helft (2008) for a story about the Yahoo! home page overhaul. 4. See Kohavi et al. (2010) and Tang et al. (2010). 5. See Clifford (2009) for a story about startup companies using quantitative performance metrics to substitute for design instinct. 6. See Alterman (2008) for Peretti’s original description of the Mullet Strategy. See Dholakia and Vianello (2009) for a discussion of how the same approach can work for communities built around brands, and the associated tradeoff between control and insight. 7. See Howe (2008, 2006) for a general discussion of crowdsourcing. See Rice (2010) for examples of recent trends in online journalism. 8.
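The controlled experiments Kohavi et al. and Tang et al. describe boil down to one operation: split traffic between page variants, pick a performance metric such as click-through rate, and check whether the observed difference exceeds chance. Here is a minimal Python sketch of that comparison; the traffic counts and the two-variant setup are invented for illustration and are not taken from the cited studies.

```python
import math

def two_proportion_ztest(clicks_a, views_a, clicks_b, views_b):
    """Is variant B's click-through rate really different from A's?"""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)  # rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF, Phi(x) = (1 + erf(x/sqrt(2))) / 2
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a, p_b, z, p_value

# Hypothetical 50/50 traffic split between the current page (A) and a redesign (B)
p_a, p_b, z, p = two_proportion_ztest(3200, 100_000, 3390, 100_000)
print(f"CTR A={p_a:.4f}  CTR B={p_b:.4f}  z={z:.2f}  p={p:.4f}")
```

With these made-up counts the redesign’s lift is small but unlikely to be chance (p ≈ 0.02); picking winners this way is precisely the substitution of measurement for design instinct that Clifford describes.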


pages: 292 words: 81,699

More Joel on Software by Joel Spolsky

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

a long time ago in a galaxy far, far away, barriers to entry, Black Swan, Build a better mousetrap, business process, call centre, Danny Hillis, David Heinemeier Hansson, failed state, Firefox, fixed income, George Gilder, Larry Wall, low cost carrier, Mars Rover, Network effects, Paul Graham, performance metric, place-making, price discrimination, prisoner's dilemma, Ray Oldenburg, Ruby on Rails, Sand Hill Road, Silicon Valley, slashdot, social software, Steve Ballmer, Steve Jobs, Superbowl ad, The Great Good Place, type inference, unpaid internship, wage slave, web application, Y Combinator

Or the tester agrees to report the bug “informally” to the developer before writing it up in the bug tracking system. And now nobody uses the bug tracking system. The bug count goes way down, but the number of bugs stays the same. Developers are clever this way. Whatever you try to measure, they’ll find a way to maximize, and you’ll never quite get what you want. Robert D. Austin, in his book Measuring and Managing Performance in Organizations, says there are two phases when you introduce new performance metrics. At first, you actually get what you want, because nobody has figured out how to cheat. In the second phase, you actually get something worse, as everyone figures out the trick to maximizing the thing that you’re measuring, even at the cost of ruining the company. Worse, Econ 101 managers think that they can somehow avoid this situation just by tweaking the metrics. Dr. Austin’s conclusion is that you just can’t.
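Austin’s two phases are easy to caricature in code. The toy model below (all numbers invented, not from Austin’s book) rewards a low bug count starting in week 3: the tracked count duly falls while the true defect rate never moves, which is exactly the gap between the measured thing and the wanted thing.

```python
# Toy model of Austin's two phases; every number here is invented.
# Management starts rewarding a low bug count in week 3. The true defect
# rate never changes, but bugs increasingly bypass the tracker, so the
# measured count falls while the product stays exactly as buggy.

TRUE_DEFECTS_PER_WEEK = 20
METRIC_STARTS = 3  # week the "low bug count" incentive is introduced

def tracked_bug_count(week: int) -> int:
    if week < METRIC_STARTS:
        return TRUE_DEFECTS_PER_WEEK  # phase 1: the metric tracks reality
    # phase 2: more defects are reported "informally" each week
    suppressed = 5 * (week - METRIC_STARTS + 1)
    return max(TRUE_DEFECTS_PER_WEEK - suppressed, 1)

for week in range(7):
    print(f"week {week}: true defects={TRUE_DEFECTS_PER_WEEK}, "
          f"tracked bugs={tracked_bug_count(week)}")
```

Tweaking the metric only changes which week phase two begins.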


pages: 309 words: 91,581

The Great Divergence: America's Growing Inequality Crisis and What We Can Do About It by Timothy Noah

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

assortative mating, autonomous vehicles, blue-collar work, Bonfire of the Vanities, Branko Milanovic, call centre, collective bargaining, computer age, corporate governance, Credit Default Swap, David Ricardo: comparative advantage, Deng Xiaoping, Erik Brynjolfsson, feminist movement, Frank Levy and Richard Murnane: The New Division of Labor, Gini coefficient, Gunnar Myrdal, income inequality, industrial robot, invisible hand, job automation, Joseph Schumpeter, low skilled workers, lump of labour, manufacturing employment, moral hazard, oil shock, pattern recognition, Paul Samuelson, performance metric, positional goods, post-industrial society, postindustrial economy, Powell Memorandum, purchasing power parity, refrigerator car, rent control, Richard Feynman, Ronald Reagan, shareholder value, Silicon Valley, Simon Kuznets, Stephen Hawking, Steve Jobs, The Spirit Level, too big to fail, trickle-down economics, Tyler Cowen: Great Stagnation, union organizing, upwardly mobile, very high income, Vilfredo Pareto, War on Poverty, We are the 99%, women in the workforce, Works Progress Administration, Yom Kippur War

But the bill exempted performance-based bonuses and stock options, on the theory that these tied chief executives’ compensation to company profitability. Corporate compensation committees responded in three ways. First, “everybody got a raise to $1 million,” Nell Minow, a corporate governance critic, told me.16 Next, corporate compensation committees, which remained bent on showering chief executives indiscriminately with cash, started inventing make-believe performance metrics. For instance, AES Corp., a firm based in Arlington, Virginia, that operates power plants, made it one of chief executive Dennis Bakke’s performance goals to ensure that AES remained a “fun” place to work. (“To some, it’s soft,” the fun-loving Bakke told Businessweek. “To me, it’s a vision of the world.”) Third, and most important, corporations showered top executives with so many stock options that this form of compensation came to account, on average, for the majority of CEO pay.


pages: 368 words: 96,825

Bold: How to Go Big, Create Wealth and Impact the World by Peter H. Diamandis, Steven Kotler

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

3D printing, additive manufacturing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, cloud computing, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, dematerialisation, deskilling, Elon Musk, en.wikipedia.org, Exxon Valdez, fear of failure, Firefox, Galaxy Zoo, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, ImageNet competition, industrial robot, Internet of things, Jeff Bezos, John Harrison: Longitude, John Markoff, Jono Bacon, Just-in-time delivery, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, loss aversion, Louis Pasteur, Mahatma Gandhi, Marc Andreessen, Mark Zuckerberg, Mars Rover, meta analysis, meta-analysis, microbiome, minimum viable product, move fast and break things, Narrative Science, Netflix Prize, Network effects, Oculus Rift, optical character recognition, packet switching, PageRank, pattern recognition, performance metric, Peter H. Diamandis: Planetary Resources, Peter Thiel, pre–internet, Ray Kurzweil, recommendation engine, Richard Feynman, ride hailing / ride sharing, risk tolerance, rolodex, self-driving car, sentiment analysis, shareholder value, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart grid, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, telepresence, telepresence robot, Turing test, urban renewal, web application, X Prize, Y Combinator, zero-sum game

You’ve probably heard about hackathons—those mysterious tournaments where coders compete to see who can hack together the best piece of software in a weekend. Well, with TopCoder, now you can have over 600,000 developers, designers, and data scientists hacking away to create solutions just for you. In fields like software and algorithm development, where there are many ways to solve a problem, having multiple submissions lets you compare performance metrics and choose the best one. Or take Gigwalk, a crowdsourced information-gathering platform that pays a small denomination to incentivize the crowd (i.e., anyone who has the Gigwalk app) to perform a simple task at a particular place and time. “Crowdsourced platforms are being quickly adopted in the retail and consumer products industry,” says Marcus Shingles, a principal with Deloitte Consulting.


pages: 317 words: 100,414

Superforecasting: The Art and Science of Prediction by Philip Tetlock, Dan Gardner

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Affordable Care Act / Obamacare, Any sufficiently advanced technology is indistinguishable from magic, availability heuristic, Black Swan, butterfly effect, cloud computing, cuban missile crisis, Daniel Kahneman / Amos Tversky, desegregation, drone strike, Edward Lorenz: Chaos theory, forward guidance, Freestyle chess, fundamental attribution error, germ theory of disease, hindsight bias, index fund, Jane Jacobs, Jeff Bezos, Kenneth Arrow, Mikhail Gorbachev, Mohammed Bouazizi, Nash equilibrium, Nate Silver, obamacare, pattern recognition, performance metric, Pierre-Simon Laplace, place-making, placebo effect, prediction markets, quantitative easing, random walk, randomized controlled trial, Richard Feynman, Richard Thaler, Robert Shiller, Ronald Reagan, Saturday Night Live, Silicon Valley, Skype, statistical model, stem cell, Steve Ballmer, Steve Jobs, Steven Pinker, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Watson beat the top human players on Jeopardy!

Elisabeth Rosenthal, “The Hype over Hospital Rankings,” New York Times, July 27, 2013. Efforts to identify “supers”—superhospitals or superteachers or super–intelligence analysts—are easy to dismiss for two reasons: (1) excellence is multidimensional and we can only imperfectly capture some dimensions (patient longevity or test results or Brier scores); (2) as soon as we anoint an official performance metric, we create incentives to game the new system by rejecting very sick patients or ejecting troublesome students. But the solution is not to abandon metrics. It is to resist overinterpreting them. 16. Thomas Friedman, “Iraq Without Saddam,” New York Times, September 1, 2002. 17. Thomas Friedman, “Is Vacation Over?,” New York Times, December 23, 2014. 18. Caleb Melby, Laura Marcinek, and Danielle Burger, “Fed Critics Say ’10 Letter Warning Inflation Still Right,” Bloomberg, October 2, 2014, http://www.bloomberg.com/news/articles/2014-10-02/fed-critics-say-10-letter-warning-inflation-still-right. 19.
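The Brier score mentioned above is simple arithmetic: the mean squared difference between probability forecasts and what actually happened. A minimal sketch with made-up forecasts (the tournament scoring Tetlock describes sums over both outcome categories, giving a 0–2 range; the binary version below ranges 0–1):

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between probability forecasts and 0/1 outcomes.
    0.0 is perfect; 0.25 is what an unvarying 50% forecast earns."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Invented forecaster: said 90%, 70%, 20% for three events; the first two happened.
print(brier_score([0.9, 0.7, 0.2], [1, 1, 0]))  # ~0.047
```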


pages: 445 words: 105,255

Radical Abundance: How a Revolution in Nanotechnology Will Change Civilization by K. Eric Drexler

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

3D printing, additive manufacturing, agricultural Revolution, Bill Joy: nanobots, Brownian motion, carbon footprint, Cass Sunstein, conceptual framework, continuation of politics by other means, crowdsourcing, dark matter, double helix, failed state, global supply chain, industrial robot, iterative process, Mars Rover, means of production, Menlo Park, mutually assured destruction, New Journalism, performance metric, reversible computing, Richard Feynman, Silicon Valley, South China Sea, Thomas Malthus, V2 rocket, Vannevar Bush, zero-sum game

Participants in the ITRS can safely assume that silicon will rule for years to come, but the QISTR collaboration faced a range of fundamentally different competing approaches: quantum bits represented by the states of (pick one or more) trapped atoms in a vacuum, spin states of atoms embedded in silicon, nuclear spins in solution-phase molecules, or photons in purely photonic systems. These approaches differ radically in scalability and manufacturability as well as in the range of functions that each can implement. The QISTR document must rise to a higher level of abstraction than ITRS. Rather than focusing on performance metrics, it adopts the “DiVincenzo promise criteria” (including scalability, gate universality, decoherence times, and suitable means for input and output) and through these criteria for essential functional capabilities, QISTR then compares diverse approaches and their potential to serve as more than dead-end demos. QISTR shows how a community can explore fields that are rich in alternatives, identifying the technologies that have a genuine potential to serve a role in a functional system, setting others aside as unpromising.


pages: 347 words: 97,721

Only Humans Need Apply: Winners and Losers in the Age of Smart Machines by Thomas H. Davenport, Julia Kirby

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

AI winter, Andy Kessler, artificial general intelligence, asset allocation, Automated Insights, autonomous vehicles, basic income, Baxter: Rethink Robotics, business intelligence, business process, call centre, carbon-based life, Clayton Christensen, clockwork universe, commoditize, conceptual framework, dark matter, David Brooks, deliberate practice, deskilling, digital map, Douglas Engelbart, Edward Lloyd's coffeehouse, Elon Musk, Erik Brynjolfsson, estate planning, fixed income, follow your passion, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, game design, general-purpose programming language, Google Glasses, Hans Lippershey, haute cuisine, income inequality, index fund, industrial robot, information retrieval, intermodal, Internet of things, inventory management, Isaac Newton, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Khan Academy, knowledge worker, labor-force participation, lifelogging, loss aversion, Mark Zuckerberg, Narrative Science, natural language processing, Norbert Wiener, nuclear winter, pattern recognition, performance metric, Peter Thiel, precariat, quantitative trading / quantitative finance, Ray Kurzweil, Richard Feynman, risk tolerance, Robert Shiller, Rodney Brooks, Second Machine Age, self-driving car, Silicon Valley, six sigma, Skype, speech recognition, spinning jenny, statistical model, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, superintelligent machines, supply-chain management, transaction costs, Tyler Cowen: Great Stagnation, Watson beat the top human players on Jeopardy!, Works Progress Administration, Zipcar

The important thing, for the individual learner, is to adopt some framework like this that can bring discipline to the task of focusing on a strength and building it.16 Part of any conscious attempt to build a strength should be a defensible way of measuring progress. We suspect that one reason why “left brain” skills so dominate discussions of human intelligence is simply that they are so easily assessed and compared. The yardsticks we use to measure human achievement—our “performance metrics,” to use business parlance—always push us back to believing that more hard skills training is the answer. Yet that belief constrains us to a narrow track, and the same track we have designed computers to dominate. We are limiting ourselves to running a race we have already determined we cannot win. It might even be that our attempts to have humans keep pace with machines militate against the development of other human strengths.


pages: 338 words: 92,465

Reskilling America: Learning to Labor in the Twenty-First Century by Katherine S. Newman, Hella Winston

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

active measures, blue-collar work, collective bargaining, Computer Numeric Control, deindustrialization, desegregation, factory automation, interchangeable parts, invisible hand, job-hopping, knowledge economy, low skilled workers, performance metric, reshoring, Ronald Reagan, Silicon Valley, two tier labour market, union organizing, upwardly mobile, War on Poverty, Wolfgang Streeck, working poor

“The crossover between the two sides has been excellent.”4 Even though some students need to take the MCAS multiple times before they pass—vocational schools are particularly committed to offering help and remediation for students who fail—only three seniors did not receive diplomas in 2002. Moreover, Massachusetts vocational schools do far better than comprehensive high schools on crucial performance metrics.5 The statewide dropout rate at regular/comprehensive high schools averaged 2.8 percent in 2011 but was only 1.6 percent among the thirty-nine vocational technical schools and averaged 0.9 percent among regional vocational technical schools. (Massachusetts requires every school district to offer students a career vocational technical education option, either by providing it themselves—common among the larger districts—or as part of a regional career vocational technical high school system.)


pages: 831 words: 98,409

SUPERHUBS: How the Financial Elite and Their Networks Rule Our World by Sandra Navidi

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

activist fund / activist shareholder / activist investor, assortative mating, bank run, barriers to entry, Bernie Sanders, Black Swan, Bretton Woods, butterfly effect, Capital in the Twenty-First Century by Thomas Piketty, Carmen Reinhart, central bank independence, cognitive bias, collapse of Lehman Brothers, collateralized debt obligation, commoditize, conceptual framework, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, crony capitalism, diversification, East Village, Elon Musk, eurozone crisis, family office, financial repression, Gini coefficient, glass ceiling, Goldman Sachs: Vampire Squid, Google bus, Gordon Gekko, haute cuisine, high net worth, hindsight bias, income inequality, index fund, intangible asset, Jaron Lanier, John Meriwether, Kenneth Arrow, Kenneth Rogoff, knowledge economy, London Whale, Long Term Capital Management, Mark Zuckerberg, mass immigration, McMansion, mittelstand, money market fund, Myron Scholes, NetJets, Network effects, offshore financial centre, old-boy network, Parag Khanna, Paul Samuelson, peer-to-peer, performance metric, Peter Thiel, Plutocrats, plutocrats, Ponzi scheme, quantitative easing, Renaissance Technologies, rent-seeking, reserve currency, risk tolerance, Robert Gordon, Robert Shiller, rolodex, Satyajit Das, shareholder value, Silicon Valley, sovereign wealth fund, Stephen Hawking, Steve Jobs, The Future of Employment, The Predators' Ball, too big to fail, women in the workforce, young professional

However, in the complex and opaque world of finance, objective performance measurement is challenging. There are many unknown variables beyond executive control, such as the blowup of a previously hailed asset class, like energy, or the bursting of a bubble like the Internet. A systemic financial crisis may even reveal that all asset classes are in fact highly correlated. The application of performance metrics has been questioned in view of the recent billion-dollar losses and fines ranging in the hundreds of millions. Yet, CEOs still receive rising pay. Proponents argue that winner-takes-all compensation is simply the result of market forces and freely agreed contracts, and that competitive salaries are necessary to obtain and retain top talent. According to them, paying finance executives handsomely is less costly and disruptive than losing them.


pages: 360 words: 85,321

The Perfect Bet: How Science and Math Are Taking the Luck Out of Gambling by Adam Kucharski

Ada Lovelace, Albert Einstein, Antoine Gombaud: Chevalier de Méré, beat the dealer, Benoit Mandelbrot, butterfly effect, call centre, Chance favours the prepared mind, Claude Shannon: information theory, collateralized debt obligation, correlation does not imply causation, diversification, Edward Lorenz: Chaos theory, Edward Thorp, Everything should be made as simple as possible, Flash crash, Gerolamo Cardano, Henri Poincaré, Hibernia Atlantic: Project Express, if you build it, they will come, invention of the telegraph, Isaac Newton, John Nash: game theory, John von Neumann, locking in a profit, Louis Pasteur, Nash equilibrium, Norbert Wiener, p-value, performance metric, Pierre-Simon Laplace, probability theory / Blaise Pascal / Pierre de Fermat, quantitative trading / quantitative finance, random walk, Richard Feynman, Ronald Reagan, Rubik’s Cube, statistical model, The Design of Experiments, Watson beat the top human players on Jeopardy!, zero-sum game

1398870164. 205 hockey analyst Brian King suggested a way: Charron, Cam. “Analytics Mailbag: Save Percentages, PDO, and Repeatability.” TheLeafsNation.com. May 27, 2014. http://theleafsnation.com/2014/5/27/analytics-mailbag-save-percentages-pdo-and-repeatability. 205 The statistic, later dubbed PDO: Details on PDO and NHL statistics given in: Weissbock, Joshua, Herna Viktor, and Diana Inkpen. “Use of Performance Metrics to Forecast Success in the National Hockey League” (paper presented at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Prague, September 23–27, 2013). 205 England had the lowest PDO: Burn-Murdoch, John. “Were England the Unluckiest Team in the World Cup Group Stages?” FT Data Blog. June 29, 2014. http://blogs.ft.com/ftdata/2014/06/29/were-england-the-unluckiest-team-in-the-world-cup-group-stages/. 206 Cambridge college spent on wine: “In Vino Veritas, Redux.”
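PDO itself is just arithmetic on shot and goal counts: a team's shooting percentage plus its save percentage, usually scaled so the league as a whole averages 100, with extreme values read as luck because they regress. A minimal sketch with invented numbers:

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Shooting percentage plus save percentage, scaled so the league
    as a whole averages 100; sustained deviation is usually luck."""
    shooting_pct = goals_for / shots_for
    save_pct = 1 - goals_against / shots_against
    return 100 * (shooting_pct + save_pct)

# Invented season: 50 goals on 500 shots, 40 conceded on 520 shots faced.
print(round(pdo(50, 500, 40, 520), 1))  # 102.3
```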


pages: 470 words: 109,589

Apache Solr 3 Enterprise Search Server by Unknown

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

bioinformatics, continuous integration, database schema, en.wikipedia.org, fault tolerance, Firefox, full text search, information retrieval, Internet Archive, natural language processing, performance metric, platform as a service, Ruby on Rails, web application

Summary We briefly covered a wide variety of the issues that surround taking a Solr configuration that works in a development environment and getting it ready for the rigors of a production environment. Solr's modular nature and stripped-down focus on search allow it to be compatible with a broad variety of deployment platforms. Solr offers a wealth of monitoring options, from log files, to HTTP request logs, to JMX options. Nonetheless, for a really robust solution, you must define the key performance metrics that concern you, and then implement automated solutions for tracking them. Now that we have set up our Solr server, we need to take advantage of it to build better applications. In the next chapter, we'll look at how to easily integrate Solr search through various client libraries. Chapter 9. Integrating Solr As the saying goes, if a tree falls in the woods and no one hears it, did it make a sound?
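One automated solution of the kind the summary recommends is a recurring probe that tracks a key metric, such as query latency, against a budget. A minimal sketch, assuming a stock Solr instance answering on the standard ping handler; the URL and threshold are placeholders to adjust for a real deployment:

```python
import time
import urllib.request

SOLR_PING = "http://localhost:8983/solr/admin/ping"  # placeholder URL
LATENCY_BUDGET_MS = 200                              # placeholder threshold

def probe(url=SOLR_PING):
    """Issue one request; return (reachable, latency in milliseconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            ok = resp.status == 200
    except OSError:
        ok = False
    return ok, (time.monotonic() - start) * 1000

if __name__ == "__main__":
    ok, ms = probe()
    if not ok or ms > LATENCY_BUDGET_MS:
        print(f"ALERT: ok={ok} latency={ms:.0f}ms")  # wire this into paging/graphing
```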


pages: 476 words: 132,042

What Technology Wants by Kevin Kelly

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Albert Einstein, Alfred Russel Wallace, Buckminster Fuller, c2.com, carbon-based life, Cass Sunstein, charter city, Clayton Christensen, cloud computing, computer vision, Danny Hillis, dematerialisation, demographic transition, double entry bookkeeping, Douglas Engelbart, en.wikipedia.org, Exxon Valdez, George Gilder, gravity well, hive mind, Howard Rheingold, interchangeable parts, invention of air conditioning, invention of writing, Isaac Newton, Jaron Lanier, John Conway, John Markoff, John von Neumann, Kevin Kelly, knowledge economy, Lao Tzu, life extension, Louis Daguerre, Marshall McLuhan, megacity, meta analysis, meta-analysis, new economy, off grid, out of africa, performance metric, personalized medicine, phenotype, Picturephone, planetary scale, RAND corporation, random walk, Ray Kurzweil, recommendation engine, refrigerator car, Richard Florida, Rubik’s Cube, Silicon Valley, silicon-based life, Skype, speech recognition, Stephen Hawking, Steve Jobs, Stewart Brand, Ted Kaczynski, the built environment, the scientific method, Thomas Malthus, Vernor Vinge, wealth creators, Whole Earth Catalog, Y2K

As one exponential boom is subsumed into the next, an established technology relays its momentum to the next paradigm and carries forward an unrelenting growth. The exact unit of what is being measured can also morph from one subcurve to the next. We may start out counting pixel size, then shift to pixel density, then to pixel speed. The final performance trait may not be evident in the initial technologies and reveal itself only over the long term, perhaps as a macrotrend that continues indefinitely. In the case of computers, as the performance metric of chips is constantly recalibrated from one technological stage to the next, Moore’s Law—redefined—will never end. Compound S Curves. On this idealized chart, technological performance is measured on the vertical axis and time or engineering effort captured on the horizontal. A series of sub-S curves create an emergent larger-scale invariant slope. The slow demise of the more-transistors-per-chip trend is inevitable.


pages: 443 words: 51,804

Handbook of Modeling High-Frequency Data in Finance by Frederi G. Viens, Maria C. Mariani, Ionut Florescu

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

algorithmic trading, asset allocation, automated trading system, backtesting, Black-Scholes formula, Brownian motion, business process, continuous integration, corporate governance, discrete time, distributed generation, fixed income, Flash crash, housing crisis, implied volatility, incomplete markets, linear programming, mandelbrot fractal, market friction, market microstructure, martingale, Menlo Park, p-value, pattern recognition, performance metric, principal–agent problem, random walk, risk tolerance, risk/return, short selling, statistical model, stochastic process, stochastic volatility, transaction costs, value at risk, volatility smile, Wiener process

Mergers Acquis 1979. 74–82. Wuthrich B, Permunetilleke D, Leung S, Cho V, Zhang J, Lam W. Daily prediction of major stock indices from textual www data. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, New York, August 27–31, 1998. New York: AAAI Press; 1998. p 364–368. Youngblood A, Collins T. Addressing balanced scorecard trade-off issues between performance metrics using multi-attribute utility theory. Eng Manag J 2003;15:11–17. Zavgren C. The prediction of corporate failure: the state of the art. J Account Lit 1983;2:1–37. Chapter Four Impact of Correlation Fluctuations on Securitized Structures ERIC HILLEBRAND Department of Economics, Louisiana State University, Baton Rouge, LA AMBAR N. SENGUPTA Department of Mathematics, Louisiana State University, Baton Rouge, LA JUNYUE XU Department of Economics, Louisiana State University, Baton Rouge, LA 4.1 Introduction The financial crisis precipitated by the subprime mortgage fiasco has focused attention on the use of Gaussian copula methods in pricing and risk managing CDOs involving subprime mortgages.


pages: 429 words: 114,726

The Computer Boys Take Over: Computers, Programmers, and the Politics of Technical Expertise by Nathan L. Ensmenger

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

barriers to entry, business process, Claude Shannon: information theory, computer age, deskilling, Donald Knuth, Firefox, Frederick Winslow Taylor, future of work, Grace Hopper, informal economy, information retrieval, interchangeable parts, Isaac Newton, Jacquard loom, job satisfaction, John von Neumann, knowledge worker, loose coupling, new economy, Norbert Wiener, pattern recognition, performance metric, Philip Mirowski, post-industrial society, Productivity paradox, RAND corporation, Robert Gordon, Shoshana Zuboff, sorting algorithm, Steve Jobs, Steven Levy, the market place, Thomas Kuhn: the structure of scientific revolutions, Thorstein Veblen, Turing machine, Von Neumann architecture, Y2K

One guidebook from 1969 for managers captured the essence of this adversarial approach to programmer management by describing the successful computer manager as the “one whose grasp of the job is reflected in simple work units that are in the hand[s] of simple programmers; not one who, with control lost, is held in contempt by clever programmers dangerously maintaining control on his behalf.”32 An uncritical reading of this and other similar management perspectives on the process of software development, with their confident claims about the value and efficacy of various performance metrics, development methodologies, and programming languages, might suggest that Kraft and Greenbaum are correct in their assessments. In fact, many of these methodologies do indeed represent “elaborate efforts” that “are being made to develop ways of gradually eliminating programmers, or at least reduce their average skill levels, required training, experience, and so on.”33 Their authors would be the first to admit it.


pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies by Nick Bostrom

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, anti-communist, artificial general intelligence, autonomous vehicles, barriers to entry, Bayesian statistics, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, demographic transition, Donald Knuth, Douglas Hofstadter, Drosophila, Elon Musk, en.wikipedia.org, endogenous growth, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, Gödel, Escher, Bach, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John Markoff, John von Neumann, knowledge worker, Menlo Park, meta analysis, meta-analysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Norbert Wiener, NP-complete, nuclear winter, optical character recognition, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, strong AI, superintelligent machines, supervolcano, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, transaction costs, Turing machine, Vernor Vinge, Watson beat the top human players on Jeopardy!, World Values Survey, zero-sum game

It is difficult even to make a rough estimate—for aught we know, the efficiency savings could be five orders of magnitude, or ten, or twenty-five.15 * * * Figure 3 Supercomputer performance. In a narrow sense, “Moore’s law” refers to the observation that the number of transistors on integrated circuits has for several decades doubled approximately every two years. However, the term is often used to refer to the more general observation that many performance metrics in computing technology have followed a similarly fast exponential trend. Here we plot peak speed of the world’s fastest supercomputer as a function of time (on a logarithmic vertical scale). In recent years, growth in the serial speed of processors has stagnated, but increased use of parallelization has enabled the total number of computations performed to remain on the trend line.16 There is a further complication with these kinds of evolutionary considerations, one that makes it hard to derive from them even a very loose upper bound on the difficulty of evolving intelligence.
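The doubling rate behind such a trend line can be backed out from any two points on it: doubling time equals elapsed time times ln 2 divided by ln of the growth factor. A quick sketch with illustrative numbers, not figures from the book:

```python
from math import log

def doubling_time(t0, v0, t1, v1):
    """Implied doubling time for a quantity growing exponentially
    from v0 at time t0 to v1 at time t1 (same units as t)."""
    return (t1 - t0) * log(2) / log(v1 / v0)

# Illustrative numbers only: a 1,000x speedup over 15 years implies
# doubling roughly every 18 months.
print(round(doubling_time(1993, 1.0, 2008, 1000.0), 2))  # ~1.51 years
```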


pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

AI winter, artificial general intelligence, bioinformatics, brain emulation, combinatorial explosion, complexity theory, computer vision, conceptual framework, correlation coefficient, epigenetics, friendly AI, information retrieval, Isaac Newton, John Conway, Loebner Prize, Menlo Park, natural language processing, Occam's razor, p-value, pattern recognition, performance metric, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K

Evaluating intelligence: A computational semiotics perspective. In IEEE International Conference on Systems, Man and Cybernetics, pages 2080–2085, Nashville, Tennessee, USA, 2000. [30] J. Horst. A native intelligence metric for artificial systems. In Performance Metrics for Intelligent Systems Workshop, Gaithersburg, MD, USA, 2002. [31] D. Lenat and E. Feigenbaum. On the thresholds of knowledge. Artificial Intelligence, 47:185–250, 1991. [32] H. Masum, S. Christensen, and F. Oppacher. The Turing ratio: Metrics for open-ended tasks. In GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, pages 973–980, New York, 2002.


pages: 489 words: 148,885

Accelerando by Stross, Charles

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

call centre, carbon-based life, cellular automata, cognitive dissonance, commoditize, Conway's Game of Life, dark matter, dumpster diving, Extropian, finite state, Flynn Effect, glass ceiling, gravity well, John von Neumann, knapsack problem, Kuiper Belt, Magellanic Cloud, mandelbrot fractal, market bubble, means of production, packet switching, performance metric, phenotype, planetary scale, Pluto: dwarf planet, reversible computing, Richard Stallman, SETI@home, Silicon Valley, Singularitarianism, slashdot, South China Sea, stem cell, technological singularity, telepresence, The Chicago School, theory of mind, Turing complete, Turing machine, Turing test, upwardly mobile, Vernor Vinge, Von Neumann architecture, web of trust, Y2K, zero-sum game

He laughs, briefly. "I used to have an idea a second. Now it's maybe one a year. I'm just a melancholy old birdbrain, me." "Yes, but you know the old saying? The fox has many ideas – the hedgehog has only one, but it's a big idea." "So tell me, what is my big idea?" Manfred leans forward, one elbow on the table, one eye focused on inner space as a hot-burning thread of consciousness barks psephological performance metrics at him, analysing the game ahead. "Where do you think I'm going?" "I think –" Annette breaks off suddenly, staring past his shoulder. Privacy slips, and for a frozen moment Manfred glances round in mild horror and sees thirty or forty other guests in the crowded garden, elbows rubbing, voices raised above the background chatter: "Gianni!" She beams widely as she stands up. "What a surprise!


pages: 496 words: 154,363

I'm Feeling Lucky: The Confessions of Google Employee Number 59 by Douglas Edwards

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Albert Einstein, AltaVista, Any sufficiently advanced technology is indistinguishable from magic, barriers to entry, book scanning, Build a better mousetrap, Burning Man, business intelligence, call centre, commoditize, crowdsourcing, don't be evil, Elon Musk, fault tolerance, Googley, gravity well, invisible hand, Jeff Bezos, job-hopping, John Markoff, Marc Andreessen, Menlo Park, microcredit, music of the spheres, Network effects, P = NP, PageRank, performance metric, pets.com, Ralph Nader, risk tolerance, second-price auction, side project, Silicon Valley, Silicon Valley startup, slashdot, stem cell, Superbowl ad, Y2K

It would be indiscreet for me to go into the details of people's private lives beyond what the participants have acknowledged publicly—and it would also be largely irrelevant, since office relationships had little effect on the course of the company. Usually, anyway. I did detect the tidal force of one pairing tugging at my ability to get my job done. Larry and Sergey's insistence on seeing performance metrics for marketing redoubled with the addition of our ad buy on Yahoo. They began a drumbeat of demands for better measurement of our customer-acquisition techniques. What about the promotional text on our homepage? Which messages converted the most newbies to regular users? Testimonials? Promises? Comparisons? How many ads did they click? How many searches did they do? The only way to answer these questions was to generate the homepage dynamically—essentially to implement code that would give us the ability to deliver variant versions of the homepage to users who came to our site.
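The standard way to deliver variant versions consistently is to hash a stable user identifier into a bucket, so each returning visitor sees the same message and conversion rates can be compared per variant. A minimal sketch of that idea, not Google's actual implementation; the variant names are invented:

```python
import hashlib

VARIANTS = ["control", "testimonial", "promise", "comparison"]  # hypothetical messages

def assign_variant(user_id: str) -> str:
    """Deterministically map a stable user ID to one homepage variant,
    so a returning visitor always sees the same message."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return VARIANTS[int.from_bytes(digest[:4], "big") % len(VARIANTS)]

print(assign_variant("cookie-1234"))  # same input -> same variant, every run
```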


pages: 478 words: 126,416

Other People's Money: Masters of the Universe or Servants of the People? by John Kay

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Affordable Care Act / Obamacare, asset-backed security, bank run, banking crisis, Basel III, Bernie Madoff, Big bang: deregulation of the City of London, bitcoin, Black Swan, Bonfire of the Vanities, bonus culture, Bretton Woods, call centre, capital asset pricing model, Capital in the Twenty-First Century by Thomas Piketty, cognitive dissonance, corporate governance, Credit Default Swap, cross-subsidies, dematerialisation, diversification, diversified portfolio, Edward Lloyd's coffeehouse, Elon Musk, Eugene Fama: efficient market hypothesis, eurozone crisis, financial innovation, financial intermediation, financial thriller, fixed income, Flash crash, forward guidance, Fractional reserve banking, full employment, George Akerlof, German hyperinflation, Goldman Sachs: Vampire Squid, Growth in a Time of Debt, income inequality, index fund, inflation targeting, information asymmetry, intangible asset, interest rate derivative, interest rate swap, invention of the wheel, Irish property bubble, Isaac Newton, John Meriwether, light touch regulation, London Whale, Long Term Capital Management, loose coupling, low cost carrier, M-Pesa, market design, millennium bug, mittelstand, money market fund, moral hazard, mortgage debt, Myron Scholes, new economy, Nick Leeson, Northern Rock, obamacare, Occupy movement, offshore financial centre, oil shock, passive investing, Paul Samuelson, peer-to-peer lending, performance metric, Peter Thiel, Piper Alpha, Ponzi scheme, price mechanism, purchasing power parity, quantitative easing, quantitative trading / quantitative finance, railway mania, Ralph Waldo Emerson, random walk, regulatory arbitrage, Renaissance Technologies, rent control, Richard Feynman, risk tolerance, road to serfdom, Robert Shiller, Ronald Reagan, Schrödinger's Cat, shareholder value, Silicon Valley, Simon Kuznets, South Sea Bubble, sovereign wealth fund, Spread Networks laid a new fibre optics cable between New York and Chicago, Steve Jobs, Steve Wozniak, The Great Moderation, The Market for Lemons, the market place, The Myth of the Rational Market, the payments system, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Tobin tax, too big to fail, transaction costs, tulip mania, Upton Sinclair, Vanguard fund, Washington Consensus, We are the 99%, Yom Kippur War

Even as the thinly capitalised Deutsche Bank was benefiting from state guarantees of its liabilities, it was buying back its own shares to reduce its capital base. And whatever return on equity was claimed by the financial officers of Deutsche Bank, the shareholder returns told a different, and more enlightening, story: the average annual total return on its shares (in US dollars with dividends re-invested) over the period May 2002 to May 2012 (Ackermann’s tenure as chief executive of the bank) was around minus 2 per cent. RoE is an inappropriate performance metric for any company, but especially for a bank, and it is bizarre that its use should have been championed by people who profess particular expertise in financial and risk management. Banks still proclaim return on equity targets: less ambitious, but nevertheless fanciful. In recent discussions of the implications of imposing more extensive capital requirements on banks, a figure of 15 per cent has been proposed and endorsed as a measure of the cost of equity capital to conglomerate banks.28 If these companies were really likely to earn 15 per cent rates of return for the benefit of their shareholders, there would be long queues of investors seeking these attractive returns.
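The arithmetic behind a figure like "average annual total return of around minus 2 per cent" is a geometric average over the holding period, with dividends assumed reinvested. A sketch with made-up numbers, not Deutsche Bank's actual series:

```python
def annualized_total_return(start_value, end_value, years):
    """Geometric average annual return; dividends assumed reinvested,
    i.e. already reflected in end_value."""
    return (end_value / start_value) ** (1 / years) - 1

# Made-up numbers: 100 invested falling to roughly 81.7 over ten years,
# dividends reinvested, works out to about -2% per year.
print(f"{annualized_total_return(100, 81.7, 10):.2%}")  # ~-2.00%
```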


pages: 497 words: 130,817

Pedigree: How Elite Students Get Elite Jobs by Lauren A. Rivera

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

affirmative action, availability heuristic, barriers to entry, Donald Trump, fundamental attribution error, glass ceiling, income inequality, job satisfaction, knowledge economy, meta analysis, meta-analysis, new economy, performance metric, profit maximization, profit motive, school choice, Silicon Valley, Silicon Valley startup, The Wisdom of Crowds, unpaid internship, women in the workforce, young professional

Although cultural similarity can facilitate trust and communication, it often does so at the expense of group effectiveness and high-quality team decision making.39 Furthermore, the emphasis on super-elite schools and the lack of systematic structures in place to reduce the use of gender and race stereotypes in candidate evaluation push qualified women and minorities out of the pool in favor of males and whites. Such patterns could adversely affect organizational performance not only because of the relationship between demographic diversity and higher-quality decision making but also because gender and racial diversity have become key performance metrics that clients and future job candidates use to evaluate firm quality and status. Likewise, the subjective nature of the hiring process can leave employers open to costly gender and racial discrimination lawsuits. EPS firms have faced such suits in the past and continue to face them in the present. Finally, although screening on socioeconomic status may enhance a firm’s status and facilitate client comfort, it excludes individuals who have critical skills relevant for successful job performance.


pages: 382 words: 120,064

Bank 3.0: Why Banking Is No Longer Somewhere You Go but Something You Do by Brett King

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

3D printing, additive manufacturing, Airbus A320, Albert Einstein, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, asset-backed security, augmented reality, barriers to entry, bitcoin, bounce rate, business intelligence, business process, business process outsourcing, call centre, capital controls, citizen journalism, Clayton Christensen, cloud computing, credit crunch, crowdsourcing, disintermediation, en.wikipedia.org, fixed income, George Gilder, Google Glasses, high net worth, I think there is a world market for maybe five computers, Infrastructure as a Service, invention of the printing press, Jeff Bezos, jimmy wales, London Interbank Offered Rate, M-Pesa, Mark Zuckerberg, mass affluent, Metcalfe’s law, microcredit, mobile money, more computing power than Apollo, Northern Rock, Occupy movement, optical character recognition, peer-to-peer, performance metric, Pingit, platform as a service, QR code, QWERTY keyboard, Ray Kurzweil, recommendation engine, RFID, risk tolerance, Robert Metcalfe, self-driving car, Skype, speech recognition, stem cell, telepresence, Tim Cook: Apple, transaction costs, underbanked, US Airways Flight 1549, web application

There are, however, two sides of Big Data that are consistently discussed in the industry as having strong business benefit. The first is the ability to make better trading decisions, and the second, the ability to connect with customers in the retail environment. In a trading environment, the financial benefits of Big Data appear extremely compelling. The ability, for example, to understand trading cost analytics, capacity of a trade, performance metrics of traders, etc. could be massively profitable to a trading business. How do you create alpha opportunities to outperform, based on that data? The ability to create algorithms that forecast prices in the near term and then make trading decisions accordingly is what will likely drive the profits of banking and trading firms in the near term. Speed of execution is, of course, another key platform capability to leverage this learning and has spawned a raft of low-latency platform investments designed to capture the value of these so-called “alpha” data points.


pages: 413 words: 117,782

What Happened to Goldman Sachs: An Insider's Story of Organizational Drift and Its Unintended Consequences by Steven G. Mandis

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

activist fund / activist shareholder / activist investor, algorithmic trading, Berlin Wall, bonus culture, BRICs, business process, collapse of Lehman Brothers, collateralized debt obligation, commoditize, complexity theory, corporate governance, corporate raider, Credit Default Swap, credit default swaps / collateralized debt obligations, crony capitalism, disintermediation, diversification, Emanuel Derman, financial innovation, fixed income, friendly fire, Goldman Sachs: Vampire Squid, high net worth, housing crisis, London Whale, Long Term Capital Management, merger arbitrage, Myron Scholes, new economy, passive investing, performance metric, risk tolerance, Ronald Reagan, Saturday Night Live, Satyajit Das, shareholder value, short selling, sovereign wealth fund, The Nature of the Firm, too big to fail, value at risk

Although my new bosses were smart, sophisticated, and supportive, and as demanding as my investment banking bosses, there was an intense focus on measuring relatively short-term results because they were measurable. Our performance as investors was marked to market every day, meaning that the value of the trades we made was calculated every day, so there was total transparency about how much money we’d made or lost for the firm each and every day. This isn’t done in investment banking, although each year new performance metrics were being added by the time I left for FICC. Typically in banking, relationships take a long time to develop and pay off. A bad day in banking may mean that, after years of meetings and presentations performed for free, a client didn’t select you to execute a transaction. You could offer excuses: “The other bank offered to loan them money,” “They were willing to do it much cheaper,” and so on.
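Marking to market is mechanical: revalue every position at today's price and report the change. A minimal sketch with a hypothetical two-position book; the instrument names and prices are invented:

```python
def daily_mark_to_market(positions, prices_yesterday, prices_today):
    """Daily P&L from revaluing each position at today's closing price.
    positions maps instrument -> signed quantity; prices map instrument -> price."""
    return sum(qty * (prices_today[k] - prices_yesterday[k])
               for k, qty in positions.items())

# Hypothetical book: long 1,000 of one instrument, short 500 of another.
book = {"bond_A": 1000, "future_B": -500}
print(daily_mark_to_market(book,
                           {"bond_A": 99.5, "future_B": 101.0},
                           {"bond_A": 100.1, "future_B": 100.4}))  # ~900.0
```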


pages: 518 words: 147,036

The Fissured Workplace by David Weil

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

accounting loophole / creative accounting, affirmative action, Affordable Care Act / Obamacare, banking crisis, barriers to entry, business process, call centre, Carmen Reinhart, Cass Sunstein, Clayton Christensen, clean water, collective bargaining, commoditize, corporate governance, corporate raider, Corrections Corporation of America, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, declining real wages, employer provided health coverage, Frank Levy and Richard Murnane: The New Division of Labor, George Akerlof, global supply chain, global value chain, hiring and firing, income inequality, information asymmetry, intermodal, inventory management, Jane Jacobs, Kenneth Rogoff, law of one price, loss aversion, low skilled workers, minimum wage unemployment, moral hazard, Network effects, new economy, occupational segregation, Paul Samuelson, performance metric, pre–internet, price discrimination, principal–agent problem, Rana Plaza, Richard Florida, Richard Thaler, Ronald Coase, shareholder value, Silicon Valley, statistical model, Steve Jobs, supply-chain management, The Death and Life of Great American Cities, The Nature of the Firm, transaction costs, ultimatum game, union organizing, women in the workforce, Y2K, yield management

It also makes clear that the relationship between the two organizations is a principal/vendor one, where “PWV will, at all times, remain the sole and exclusive … employer of any personnel utilized in providing the Services and the Principal of any subcontractor it may elect to utilize.”10 This and other provisions regarding indemnification attempt to establish market-relation distance between the parties. However, other features of the agreement imply a fuzzier boundary between the responsibilities of the two companies. Section 2 describes in considerable detail the standards to which Schneider holds PWV and the mechanisms it will use to monitor compliance with them. Section 2.06, for example, describes a variety of audit-based performance metrics that PWV will periodically provide to Schneider (at no cost to the latter) regarding average number of cases loaded per hour; number of trailers loaded per week; trailer loading accuracy (a critical dimension for Walmart); and average cubic meters packed in trailers per week. These measures serve as the basis of compensation and for ongoing evaluation of PWV’s performance as a contractor.


pages: 515 words: 126,820

Blockchain Revolution: How the Technology Behind Bitcoin Is Changing Money, Business, and the World by Don Tapscott, Alex Tapscott

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Airbnb, altcoin, asset-backed security, autonomous vehicles, barriers to entry, bitcoin, blockchain, Bretton Woods, business process, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, clean water, cloud computing, cognitive dissonance, commoditize, corporate governance, corporate social responsibility, creative destruction, Credit Default Swap, crowdsourcing, cryptocurrency, disintermediation, distributed ledger, Donald Trump, double entry bookkeeping, Edward Snowden, Elon Musk, Erik Brynjolfsson, ethereum blockchain, failed state, fiat currency, financial innovation, Firefox, first square of the chessboard, first square of the chessboard / second half of the chessboard, future of work, Galaxy Zoo, George Gilder, glass ceiling, Google bus, Hernando de Soto, income inequality, informal economy, information asymmetry, intangible asset, interest rate swap, Internet of things, Jeff Bezos, jimmy wales, Kickstarter, knowledge worker, Kodak vs Instagram, Lean Startup, litecoin, Lyft, M-Pesa, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, means of production, microcredit, mobile money, money market fund, Network effects, new economy, Oculus Rift, off grid, pattern recognition, peer-to-peer, peer-to-peer lending, peer-to-peer model, performance metric, Peter Thiel, planetary scale, Ponzi scheme, prediction markets, price mechanism, Productivity paradox, QR code, quantitative easing, ransomware, Ray Kurzweil, renewable energy credits, rent-seeking, ride hailing / ride sharing, Ronald Coase, Ronald Reagan, Satoshi Nakamoto, Second Machine Age, seigniorage, self-driving car, sharing economy, Silicon Valley, Skype, smart contracts, smart grid, social graph, social software, Stephen Hawking, Steve Jobs, Steve Wozniak, Stewart Brand, supply-chain management, TaskRabbit, The Fortune at the Bottom of the Pyramid, The Nature of the Firm, The Wisdom of Crowds, transaction costs, Turing complete, Turing test, Uber and Lyft, unbanked and underbanked, underbanked, unorthodox policies, wealth creators, X Prize, Y2K, Zipcar

When they do the job as specified, they are instantly paid—perhaps not biweekly but daily, hourly, or in microseconds. As the entity wouldn’t necessarily have an anthropomorphic body, employees might not even know that algorithms are managing them. But they would know the rules and norms for good behavior. Given that the smart contract could encode the collective knowledge of management science and that their assignments and performance metrics would be transparent, people could love to work. Customers would provide feedback that the enterprise would apply dispassionately and instantly to correct course. Shareholders would receive dividends, perhaps frequently, as real-time accounting would obviate the need for year-end reports. The organization would perform all these activities under the guidance and incorruptible business rules that are as transparent as the open source software that its founders used to set it in motion.


pages: 461 words: 128,421

The Myth of the Rational Market: A History of Risk, Reward, and Delusion on Wall Street by Justin Fox

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

activist fund / activist shareholder / activist investor, Albert Einstein, Andrei Shleifer, asset allocation, asset-backed security, bank run, beat the dealer, Benoit Mandelbrot, Black-Scholes formula, Bretton Woods, Brownian motion, capital asset pricing model, card file, Cass Sunstein, collateralized debt obligation, complexity theory, corporate governance, corporate raider, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, discovery of the americas, diversification, diversified portfolio, Edward Glaeser, Edward Thorp, endowment effect, Eugene Fama: efficient market hypothesis, experimental economics, financial innovation, Financial Instability Hypothesis, fixed income, floating exchange rates, George Akerlof, Henri Poincaré, Hyman Minsky, implied volatility, impulse control, index arbitrage, index card, index fund, information asymmetry, invisible hand, Isaac Newton, John Meriwether, John Nash: game theory, John von Neumann, joint-stock company, Joseph Schumpeter, Kenneth Arrow, libertarian paternalism, linear programming, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, market bubble, market design, Myron Scholes, New Journalism, Nikolai Kondratiev, Paul Lévy, Paul Samuelson, pension reform, performance metric, Ponzi scheme, prediction markets, pushing on a string, quantitative trading / quantitative finance, Ralph Nader, RAND corporation, random walk, Richard Thaler, risk/return, road to serfdom, Robert Bork, Robert Shiller, rolodex, Ronald Reagan, shareholder value, Sharpe ratio, short selling, side project, Silicon Valley, South Sea Bubble, statistical model, The Chicago School, The Myth of the Rational Market, The Predators' Ball, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas L Friedman, Thorstein Veblen, Tobin tax, transaction costs, tulip mania, value at risk, Vanguard fund, Vilfredo Pareto, volatility smile, Yogi Berra

Gerd Gigerenzer, Zeno Swijtink, Theodore Porter, Lorraine Daston, John Beatty, Lorenz Krüger, The Empire of Chance: How Probability Changed Science and Everyday Life (Cambridge: Cambridge University Press, 1989), 3–4. 23. A crucial intermediate step between Markowitz and Treynor was James Tobin, “Liquidity Preference as Behavior Towards Risk,” Review of Economic Studies 25, no. 1 (1958): 65–86. 24. Jack L. Treynor, “Towards a Theory of Market Value of Risky Assets,” in Asset Pricing and Portfolio Performance: Models, Strategy and Performance Metrics, Robert A. Korajczyk, ed. (London: Risk Books, 1999). 25. William F. Sharpe, “A Simplified Model for Portfolio Analysis,” Management Science (Jan. 1963): 281. 26. William F. Sharpe, “Capital Asset Prices: A Theory of Market Equilibrium Under Conditions of Risk,” Journal of Finance (Sept. 1964): 425–42. 27. John Lintner, “The Valuation of Risk Assets and the Selection of Risky Investments in Stock Portfolios and Capital Budgets,” Review of Economics and Statistics (Feb. 1965): 13–37.


pages: 472 words: 117,093

Machine, Platform, Crowd: Harnessing Our Digital Future by Andrew McAfee, Erik Brynjolfsson

3D printing, additive manufacturing, AI winter, Airbnb, airline deregulation, airport security, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, artificial general intelligence, augmented reality, autonomous vehicles, backtesting, barriers to entry, bitcoin, blockchain, book scanning, British Empire, business process, carbon footprint, Cass Sunstein, centralized clearinghouse, Chris Urmson, cloud computing, cognitive bias, commoditize, complexity theory, computer age, creative destruction, crony capitalism, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, Dean Kamen, discovery of DNA, disintermediation, distributed ledger, double helix, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, ethereum blockchain, everywhere but in the productivity statistics, family office, fiat currency, financial innovation, George Akerlof, global supply chain, Hernando de Soto, hive mind, information asymmetry, Internet of things, inventory management, iterative process, Jean Tirole, Jeff Bezos, jimmy wales, John Markoff, joint-stock company, Joseph Schumpeter, Kickstarter, law of one price, Lyft, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Marc Andreessen, Mark Zuckerberg, meta analysis, meta-analysis, moral hazard, multi-sided market, Myron Scholes, natural language processing, Network effects, new economy, Norbert Wiener, Oculus Rift, PageRank, pattern recognition, peer-to-peer lending, performance metric, Plutocrats, plutocrats, precision agriculture, prediction markets, pre–internet, price stability, principal–agent problem, Ray Kurzweil, Renaissance Technologies, Richard Stallman, ride hailing / ride sharing, risk tolerance, Ronald Coase, Satoshi Nakamoto, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Skype, slashdot, smart contracts, Snapchat, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Pinker, supply-chain management, TaskRabbit, Ted Nelson, The Market for Lemons, The Nature of the Firm, Thomas L Friedman, too big to fail, transaction costs, transportation-network company, traveling salesman, two-sided market, Uber and Lyft, Uber for X, Watson beat the top human players on Jeopardy!, winner-take-all economy, yield management, zero day

US presidential elections are determined by the electoral college, not the national popular vote, and that calls for a more nuanced, state-by-state strategy. Similarly, it’s easy to measure page views or click-through generated by an online advertising campaign, but most companies care more about long-term sales, which are usually maximized by a different kind of campaign. Careful selection of the right data inputs and the right performance metrics, especially the overall evaluation criterion, is a key characteristic of successful data-driven decision makers. Algorithms Behaving Badly A real risk of turning over decisions to machines is that bias in algorithmic systems can perpetuate or even amplify some of the pernicious biases that exist in our society. For instance, Latanya Sweeney, a widely cited professor at Harvard, had a disturbing experience when she entered her own name into the Google search engine.
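The gap between an easy proxy and an overall evaluation criterion can flip a decision outright. A toy illustration with invented campaign numbers: ranked by click-through rate one campaign wins, ranked by sales per impression the other does:

```python
# Invented campaign data: impressions, clicks, and attributed 90-day sales.
campaigns = {
    "flashy_banner": {"impressions": 100_000, "clicks": 5_000, "sales": 2_000.0},
    "plain_text":    {"impressions": 100_000, "clicks": 1_500, "sales": 6_000.0},
}

def ctr(c):
    """Click-through rate: the easy-to-measure proxy."""
    return c["clicks"] / c["impressions"]

def oec(c):
    """An overall evaluation criterion closer to what the business wants."""
    return c["sales"] / c["impressions"]

print(max(campaigns, key=lambda k: ctr(campaigns[k])))  # flashy_banner
print(max(campaigns, key=lambda k: oec(campaigns[k])))  # plain_text
```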


pages: 496 words: 174,084

Masterminds of Programming: Conversations With the Creators of Major Programming Languages by Federico Biancuzzi, Shane Warden

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Benevolent Dictator For Life (BDFL), business intelligence, business process, cellular automata, cloud computing, commoditize, complexity theory, conceptual framework, continuous integration, data acquisition, domain-specific language, Douglas Hofstadter, Fellow of the Royal Society, finite state, Firefox, follow your passion, Frank Gehry, general-purpose programming language, Guido van Rossum, HyperCard, information retrieval, iterative process, John von Neumann, Larry Wall, linear programming, loose coupling, Mars Rover, millennium bug, NP-complete, Paul Graham, performance metric, Perl 6, QWERTY keyboard, RAND corporation, randomized controlled trial, Renaissance Technologies, Ruby on Rails, Sapir-Whorf hypothesis, Silicon Valley, slashdot, software as a service, software patent, sorting algorithm, Steve Jobs, traveling salesman, Turing complete, type inference, Valgrind, Von Neumann architecture, web application

Lots of discussions go on between individuals or between groups of the form of “I couldn’t do this work because you didn’t give me the requirements yet,” or “We need to have a group of people that goes out and gathers the requirements for this new system.” The term is simply too imprecise. You need to have more precise terms as an alternative. On a big project that I’ve been involved in we have imposed a requirements tax. If anybody uses the word “requirements” standalone, they have to add $2.00 to the entertainment fund. If they want to talk about use cases, or if they want to talk about story cards, or they want to talk about performance metrics, or they want to talk about business cases or business process models, those are all acceptable terms. They don’t incur a tax, because now if you say, “I need to have the use cases or the functional specification, or a mockup of the application that needs to be developed,” that’s a precise request. I see projects getting into trouble when they don’t get that part right. Writing the code doesn’t seem like the hard part anymore.


pages: 1,085 words: 219,144

Solr in Action by Trey Grainger, Timothy Potter

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

business intelligence, cloud computing, commoditize, conceptual framework, crowdsourcing, data acquisition, en.wikipedia.org, failed state, fault tolerance, finite state, full text search, glass ceiling, information retrieval, natural language processing, performance metric, premature optimization, recommendation engine, web application

If you’re not using the default Jetty configuration (java -jar start.jar), you’ll need to separately configure your Java servlet container or bootstrap settings to ensure that these extra JVM parameters are enabled. Most modern application performance monitoring tools can read JMX beans and provide long-term collection and graphing of metrics, often along with monitoring and alerting when the numbers deviate significantly from performance thresholds you set. In addition, several application performance monitoring tools—including cloud-based ones—now exist with direct support for and understanding of Solr’s internals. A simple web search for Solr application performance monitoring will turn up a long list of companies interested in helping you monitor the performance of your Solr cluster (see the sketch below).

12.9.5. Solr logs

As with most applications, logs provide the richest source of information about the state of your cluster at any time.
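For illustration, a minimal Java sketch of reading beans from a JMX-enabled JVM, which is essentially what the monitoring tools above do on a schedule. The port number is arbitrary, and the solr bean domain assumes Solr’s MBean reporting has been enabled (via the <jmx/> element in solrconfig.xml):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JmxProbe {
        public static void main(String[] args) throws Exception {
            // Assumes the Solr JVM was started with the standard JMX flags, e.g.:
            //   -Dcom.sun.management.jmxremote.port=18983      (port is illustrative)
            //   -Dcom.sun.management.jmxremote.authenticate=false
            //   -Dcom.sun.management.jmxremote.ssl=false
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:18983/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // JVM-level metrics are always registered under the java.lang domain
                Object heap = mbs.getAttribute(
                        new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
                System.out.println("HeapMemoryUsage: " + heap);
                // List whatever the application registered under a solr* domain
                mbs.queryNames(new ObjectName("solr*:*"), null)
                   .forEach(System.out::println);
            }
        }
    }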


The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise by Martin L. Abbott, Michael T. Fisher

always be closing, anti-pattern, barriers to entry, Bernie Madoff, business climate, business continuity plan, business intelligence, business process, call centre, cloud computing, combinatorial explosion, commoditize, Computer Numeric Control, conceptual framework, database schema, discounted cash flows, en.wikipedia.org, fault tolerance, finite state, friendly fire, hiring and firing, Infrastructure as a Service, inventory management, new economy, packet switching, performance metric, platform as a service, Ponzi scheme, RFC: Request For Comment, risk tolerance, Rubik’s Cube, Search for Extraterrestrial Intelligence, SETI@home, shareholder value, Silicon Valley, six sigma, software as a service, the scientific method, transaction costs, Vilfredo Pareto, web application, Y2K

Obviously, running a single instance of anything is not an effective way to scale, but it is common for a team to start on a single server and not test the job or program on multiple servers until they are needed. Migrating to a cloud and then realizing that the job’s processing is falling behind on the new virtual server might put you in panic mode as you scramble to test and validate that the job can run correctly on multiple hosts. The virtual hardware underperforms in some aspects by orders of magnitude. The standard performance metrics include memory speed, CPU, disk access, and so on. There is no standard degradation or equivalence among virtual hosts; in fact, performance often varies within a single cloud environment and certainly varies from one vendor to another. Most companies and applications either don’t notice this or don’t care, but if you are making a cost-benefit analysis about switching to a cloud computing vendor, you need to test this yourself with your application (see the sketch below).
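Since there is no standard equivalence among virtual hosts, a crude but useful approach is to run the same timed workload on every candidate host and compare the numbers. The sketch below is not a rigorous benchmark, and the work sizes are arbitrary; it simply times a fixed CPU loop and a synchronous 100 MB write so the disk number reflects the storage backend rather than the page cache:

    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.concurrent.ThreadLocalRandom;

    public class VirtualHostBenchmark {
        public static void main(String[] args) throws Exception {
            // CPU: time a fixed amount of arithmetic work
            long t0 = System.nanoTime();
            double acc = 0;
            for (int i = 1; i <= 50_000_000; i++) {
                acc += Math.sqrt(i);
            }
            long cpuMs = (System.nanoTime() - t0) / 1_000_000;

            // Disk: time writing 100 MB with synchronous writes
            byte[] block = new byte[1024 * 1024];
            ThreadLocalRandom.current().nextBytes(block);
            Path file = Files.createTempFile("io-bench", ".dat");
            t0 = System.nanoTime();
            try (OutputStream out = Files.newOutputStream(file,
                    StandardOpenOption.WRITE, StandardOpenOption.DSYNC)) {
                for (int i = 0; i < 100; i++) {
                    out.write(block);
                }
            }
            long diskMs = (System.nanoTime() - t0) / 1_000_000;
            Files.delete(file);

            // Run the same class on each candidate host and compare the output
            System.out.printf("cpu=%d ms, 100MB sync write=%d ms (ignore: %.0f)%n",
                    cpuMs, diskMs, acc);
        }
    }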


pages: 827 words: 239,762

The Golden Passport: Harvard Business School, the Limits of Capitalism, and the Moral Failure of the MBA Elite by Duff McDonald

activist fund / activist shareholder / activist investor, Affordable Care Act / Obamacare, Albert Einstein, barriers to entry, Bayesian statistics, Bernie Madoff, Bob Noyce, Bonfire of the Vanities, business process, butterfly effect, capital asset pricing model, Capital in the Twenty-First Century by Thomas Piketty, Clayton Christensen, cloud computing, collateralized debt obligation, collective bargaining, commoditize, corporate governance, corporate raider, corporate social responsibility, creative destruction, deskilling, discounted cash flows, disintermediation, Donald Trump, family office, financial innovation, Frederick Winslow Taylor, full employment, George Gilder, glass ceiling, Gordon Gekko, hiring and firing, income inequality, invisible hand, Jeff Bezos, job-hopping, John von Neumann, Joseph Schumpeter, Kenneth Arrow, London Whale, Long Term Capital Management, market fundamentalism, Menlo Park, new economy, obamacare, oil shock, pattern recognition, performance metric, Peter Thiel, Plutocrats, plutocrats, profit maximization, profit motive, pushing on a string, Ralph Nader, Ralph Waldo Emerson, RAND corporation, random walk, rent-seeking, Ronald Coase, Ronald Reagan, Sand Hill Road, Saturday Night Live, shareholder value, Silicon Valley, Skype, Steve Jobs, survivorship bias, The Nature of the Firm, the scientific method, Thorstein Veblen, union organizing, urban renewal, Vilfredo Pareto, War on Poverty, William Shockley: the traitorous eight, women in the workforce, Y Combinator

The article was well received, in no small part because it represented a sensible response to the absurdity of Michael Jensen’s insistence that there was one, and only one, measure that mattered: stock price. And whereas Jensen has always dismissed the Balanced Scorecard out of hand, Kaplan has been more diplomatic about their differences. “I obviously agree with Jensen that managers cannot be paid by a set of unweighted performance metrics,” he wrote in 2010. “Ultimately, if a company wants to set bonuses based on measured performance, it must reward based on a single measure (either a stock market or accounting-based metric) or provide a weighting among the multiple measures a manager has been instructed to improve. But linking performance to pay is only one component of a comprehensive management system.”2 Kaplan’s Balanced Scorecard has sometimes been viewed as interchangeable with the concept of “stakeholder theory,” in which companies are encouraged to define objectives for their various stakeholders—external ones (shareholders, customers, and communities) and internal ones (employees and suppliers)—and then develop a strategy thereafter.
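Kaplan’s weighting point is easy to see in a toy example. In the sketch below the weights and scores are made up, and the four perspectives are the classic Balanced Scorecard ones; the point is simply that multiple measures must be collapsed into the single number a bonus formula requires:

    import java.util.List;

    public class WeightedScorecard {
        record Measure(String perspective, double weight, double score) {}

        public static void main(String[] args) {
            // Hypothetical weights (summing to 1.0) and normalized 0-100 scores
            List<Measure> scorecard = List.of(
                    new Measure("financial",         0.40, 72),
                    new Measure("customer",          0.25, 85),
                    new Measure("internal process",  0.20, 64),
                    new Measure("learning & growth", 0.15, 90));

            double composite = scorecard.stream()
                    .mapToDouble(m -> m.weight() * m.score())
                    .sum();
            // 0.40*72 + 0.25*85 + 0.20*64 + 0.15*90 = 76.35
            System.out.printf("Composite score: %.2f of 100%n", composite);
        }
    }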

Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

■ Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (because most data warehouses store historic rather than up-to-date information), although many could be complex queries. Other features that distinguish between OLTP and OLAP systems include database size, frequency of operations, and performance metrics. These are summarized in Table 4.1.

Table 4.1 Comparison of OLTP and OLAP Systems
Note: Table is partially based on Chaudhuri and Dayal [CD97].

Feature                     OLTP                                 OLAP
Characteristic              operational processing               informational processing
Orientation                 transaction                          analysis
User                        clerk, DBA, database professional    knowledge worker (e.g., manager, executive, analyst)
Function                    day-to-day operations                long-term informational requirements, decision support
DB design                   ER-based, application-oriented       star/snowflake, subject-oriented
Data                        current, guaranteed up-to-date       historic, accuracy maintained over time
Summarization               primitive, highly detailed           summarized, consolidated
View                        detailed, flat relational            summarized, multidimensional
Unit of work                short, simple transaction            complex query
Access                      read/write                           mostly read
Focus                       data in                              information out
Operations                  index/hash on primary key            lots of scans
Number of records accessed  tens                                 millions
Number of users             thousands                            hundreds
DB size                     GB to high-order GB                  ≥ TB
Priority                    high performance, high availability  high flexibility, end-user autonomy
Metric                      transaction throughput               query throughput, response time