correlation coefficient

107 results


Statistics in a Nutshell by Sarah Boslaugh

Antoine Gombaud: Chevalier de Méré, Bayesian statistics, business climate, computer age, correlation coefficient, experimental subject, Florence Nightingale: pie chart, income per capita, iterative process, job satisfaction, labor-force participation, linear programming, longitudinal study, meta analysis, meta-analysis, p-value, pattern recognition, placebo effect, probability theory / Blaise Pascal / Pierre de Fermat, publication bias, purchasing power parity, randomized controlled trial, selection bias, six sigma, statistical model, The Design of Experiments, the scientific method, Thomas Bayes, Vilfredo Pareto

To put it another way, the 90% confidence interval includes less of the total probability than the 95% confidence interval, so it's not surprising that it is narrower.
Figure 6-29. The different t-tests and their uses
Chapter 7. The Pearson Correlation Coefficient
The Pearson correlation coefficient is a measure of linear association between two interval- or ratio-level variables. Although there are other types of correlation (several are discussed in Chapter 5, including the Spearman rank-order correlation coefficient), the Pearson correlation coefficient is the most common; the label "Pearson" is often dropped, and we simply speak of "correlation" or "the correlation coefficient." Unless otherwise specified in this book, "correlation" means the Pearson correlation coefficient. Correlations are often computed during the exploratory stage of a research project to see what kinds of relationships the different continuous variables have with each other, and scatterplots (discussed in Chapter 4) are often created to examine these relationships graphically.
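As a minimal sketch in Python (the function name `pearson_r` and the sample data are hypothetical, not taken from the book), the Pearson coefficient is the sum of cross-products of deviations from the means divided by the square root of the product of the two sums of squared deviations. The n − 1 factors in the covariance and in the two standard deviations cancel, so they are omitted.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y over the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations (the 1/(n-1) factors cancel)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Illustrative data only (hypothetical, not from the book)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, score), 3))
```

On Python 3.10 and later, the standard library's `statistics.correlation` computes the same Pearson coefficient and can serve as a cross-check.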

… Probability Tables for Common Distributions NNT (Number Needed to Treat), Attributable Risk, Attributable Risk Percentage, and Number Needed to Treat nominal data, Nominal Data–Nominal Data, Glossary of Statistical Terms about, Nominal Data–Nominal Data definition of, Glossary of Statistical Terms nonparametric statistics, Data Transformations, Nonparametric Statistics, Nonparametric Statistics, Glossary of Statistical Terms about, Nonparametric Statistics definition of, Glossary of Statistical Terms parametric statistics and, Data Transformations, Nonparametric Statistics nonprobability sampling, Nonprobability Sampling–Nonprobability Sampling, Glossary of Statistical Terms nonresponse bias, Bias in Sample Selection and Retention, Glossary of Statistical Terms norm group, Percentiles norm-referenced, Percentiles, Test Construction scoring, Percentiles tests, Test Construction normal distribution, The Normal Distribution–The Normal Distribution, The Histogram normal distribution, standard, The Standard Normal Distribution–The t-Distribution normal score, The Normal Distribution–The Normal Distribution, Percentiles normalized scores, The Normal Distribution–The Normal Distribution null hypothesis, Hypothesis Testing number line, Linear regression, Laws of Arithmetic Number Needed to Treat (NNT), Attributable Risk, Attributable Risk Percentage, and Number Needed to Treat numeric and string data, String and Numeric Data O observational studies, Observational Studies–Observational Studies observed score, Glossary of Statistical Terms observed values, The Chi-Square Test odds ratio, The Odds Ratio–The Odds Ratio odds, calculating, The Odds Ratio OLS (Ordinary Least Squares) regression equation, Independent and Dependent Variables omnibus F-test, Post Hoc Tests one-group pretest-posttest design, Quasi-Experimental Studies one-sample t-test, The One-Sample t-Test–Confidence Interval for the One-Sample t-Test one-way ANOVA, The t-Test, One-Way ANOVA–One-Way ANOVA
about, One-Way ANOVA–One-Way ANOVA t-test and, The t-Test online resources, Online Resources–Online Textbooks operationalization, Operationalization, Glossary of Statistical Terms opportunity loss table, Minimax, Maximax, and Maximin ordinal data, Ordinal Data–Ordinal Data, Categorical Data, The R×C Table–The R×C Table, Measures of Agreement–Measures of Agreement, The Wilcoxon Rank Sum Test, The Wilcoxon Rank Sum Test, Glossary of Statistical Terms about, Ordinal Data–Ordinal Data, Categorical Data definition of, Glossary of Statistical Terms mean rank, The Wilcoxon Rank Sum Test measures of agreement, Measures of Agreement–Measures of Agreement rank sum, The Wilcoxon Rank Sum Test R×C table, The R×C Table–The R×C Table ordinal variables, correlation statistics for, Ordinal Variables, Ordinal Variables–Ordinal Variables, Ordinal Variables, Ordinal Variables–Ordinal Variables, Ordinal Variables, Ordinal Variables, Ordinal Variables–Ordinal Variables gamma, Ordinal Variables–Ordinal Variables Kendall’s tau-a, Ordinal Variables, Ordinal Variables Kendall’s tau-b, Ordinal Variables–Ordinal Variables Kendall’s tau-c, Ordinal Variables Somers’s d, Ordinal Variables–Ordinal Variables Spearman’s rank-order coefficient, Ordinal Variables Ordinary Least Squares (OLS) regression equation, Independent and Dependent Variables orthogonality, in research design structure, Ingredients of a Good Design outliers, Outliers–Outliers overfitting, Overfitting–Overfitting P p-values, p-values–p-values, The Z-Statistic about, p-values–p-values of Z value, The Z-Statistic Paasche index, Index Numbers–Index Numbers Packel, Edward, The Mathematics of Games and Gambling, Closing Note: The Connection between Statistics and Gambling parallel-forms (multiple-forms) reliability, Reliability parameters, in descriptive statistics, Inferential Statistics, Populations and Samples parametric statistics, Data Transformations, Nonparametric Statistics, Glossary of Statistical Terms Pareto charts 
(diagrams), Pareto Charts–Pareto Charts Pareto, Vilfredo, Pareto Charts partial correlation, Methods for Building Regression Models PCA (Principal Components Analysis), Factor Analysis–Factor Analysis, Factor Analysis, Factor Analysis–Factor Analysis Pearson correlation coefficient, Correlation Statistics for Categorical Data, Binary Variables, The Pearson Correlation Coefficient, Association–Association, Scatterplots–Relationships Between Continuous Variables, Relationships Between Continuous Variables–Relationships Between Continuous Variables, The Pearson Correlation Coefficient–Testing Statistical Significance for the Pearson Correlation, Testing Statistical Significance for the Pearson Correlation–Testing Statistical Significance for the Pearson Correlation, The Coefficient of Determination about, Correlation Statistics for Categorical Data, Binary Variables, The Pearson Correlation Coefficient about correlation coefficient, The Pearson Correlation Coefficient–Testing Statistical Significance for the Pearson Correlation associations, Association–Association coefficient of determination, The Coefficient of Determination relationships between continuous variables, Relationships Between Continuous Variables–Relationships Between Continuous Variables scatterplots as visual tool, Scatterplots–Relationships Between Continuous Variables testing statistical significance for, Testing Statistical Significance for the Pearson Correlation–Testing Statistical Significance for the Pearson Correlation Pearson’s chi-square test, The Chi-Square Test (see chi-square test) peer review process, journal, The Peer Review Process–The Peer Review Process percent agreement measures, Measures of Agreement percentages, interpreting, Power for the Test of the Difference between Two Sample Means (Independent Samples t-Test) percentiles, Percentiles–Percentiles permutations, Factorials, Permutations, and Combinations–Factorials, Permutations, and Combinations permutations of elements, 
Permutations phi coefficient, Binary Variables–Binary Variables, Item Analysis physical vs. social sciences, definition of treatments, Specifying Treatment Levels pie charts, Pie Charts placebo, Glossary of Statistical Terms placebo effect, Blinding, Glossary of Statistical Terms playing cards, Dice, Coins, and Playing Cards point estimates, calculating, Confidence Intervals point-biserial correlation coefficient, The Point-Biserial Correlation Coefficient–The Point-Biserial Correlation Coefficient, Item Analysis polynomial regression, Polynomial Regression–Polynomial Regression populations, Inferential Statistics, Inferential Statistics, Populations and Samples–Probability Sampling, Descriptive Statistics and Graphic Displays–Populations and Samples, The Mean–The Mean, The Variance and Standard Deviation, The Variance and Standard Deviation, Population in descriptive statistics, Descriptive Statistics and Graphic Displays–Populations and Samples, The Mean–The Mean, The Variance and Standard Deviation, The Variance and Standard Deviation calculating variance, The Variance and Standard Deviation formula for standard deviation, The Variance and Standard Deviation mean, The Mean–The Mean samples and, Descriptive Statistics and Graphic Displays–Populations and Samples in inferential statistics, Inferential Statistics, Inferential Statistics, Populations and Samples–Probability Sampling mean, Inferential Statistics samples and, Populations and Samples–Probability Sampling variance, Inferential Statistics issues in research design with, Population positive discrimination, Item Analysis post hoc test, Post Hoc Tests–Post Hoc Tests, Glossary of Statistical Terms posttest only design, Quasi-Experimental Studies posttest-only non-equivalent groups design, Quasi-Experimental Studies power, Glossary of Statistical Terms power analysis, Power Analysis–Power Analysis, Ingredients of a Good Design power of coincidence, issues in research design with, The Power of Coincidence 
Practical Nonparametric Statistics (Conover), Nonparametric Statistics presidential elections, predictions of, Exercises pretest-posttest design with comparison group, Quasi-Experimental Studies prevalence, Prevalence and Incidence–Prevalence and Incidence, Prevalence and Incidence, Glossary of Statistical Terms primary data, Basic Vocabulary Principal Components Analysis (PCA), Factor Analysis–Factor Analysis probability, Defining Probability–Intersection of nonindependent events, Expressing the Probability of an Event–Expressing the Probability of an Event, Conditional Probabilities–Conditional Probabilities conditional, Conditional Probabilities–Conditional Probabilities definition of, Defining Probability–Intersection of nonindependent events of events, Expressing the Probability of an Event–Expressing the Probability of an Event probability distributions, in inferential statistics, Probability Distributions–The Binomial Distribution probability sampling, Probability Sampling–Probability Sampling, Glossary of Statistical Terms probability tables for distributions, Probability Tables for Common Distributions–The Chi-Square Distribution, The Standard Normal Distribution–The t-Distribution, The t-Distribution–The t-Distribution, The Binomial Distribution–The Binomial Distribution, The Chi-Square Distribution–The Chi-Square Distribution about, Probability Tables for Common Distributions–The Chi-Square Distribution binomial distribution, The Binomial Distribution–The Binomial Distribution chi-square distribution, The Chi-Square Distribution–The Chi-Square Distribution standard normal distribution, The Standard Normal Distribution–The t-Distribution t-distribution, The t-Distribution–The t-Distribution probability theory, Probability–Probability, About Formulas–About Formulas, About Formulas–Combinations, Defining Probability–Intersection of nonindependent events, Bayes’ Theorem–Bayes’ Theorem, Closing Note: The Connection between Statistics and Gambling–Closing 
Note: The Connection between Statistics and Gambling about, Probability–Probability Bayes’ theorem and, Bayes’ Theorem–Bayes’ Theorem defining probability, Defining Probability–Intersection of nonindependent events definitions in, About Formulas–Combinations formulas, About Formulas–About Formulas gambling and, Closing Note: The Connection between Statistics and Gambling–Closing Note: The Connection between Statistics and Gambling product-moment correlation coefficient, The Pearson Correlation Coefficient propensity score, Observational Studies properties of equality, Solving Equations proportion, Proportions: The Large Sample Case–Proportions: The Large Sample Case, Ratio, Proportion, and Rate, Ratio, Proportion, and Rate, Glossary of Statistical Terms about, Ratio, Proportion, and Rate definition of, Glossary of Statistical Terms formula for, Ratio, Proportion, and Rate large-sample Z tests for, Proportions: The Large Sample Case–Proportions: The Large Sample Case prospective cohort study, Basic Vocabulary prospective study, Basic Vocabulary, Glossary of Statistical Terms proxy measurement, Proxy Measurement–Proxy Measurement, Glossary of Statistical Terms pseudo-chance-level parameter, Item Response Theory psychological and educational statistics, Educational and Psychological Statistics–Educational and Psychological Statistics, Percentiles–Percentiles, Standardized Scores–Standardized Scores, Test Construction–Test Construction, Classical Test Theory: The True Score Model–Classical Test Theory: The True Score Model, Reliability of a Composite Test–Reliability of a Composite Test, Measures of Internal Consistency–Coefficient Alpha, Item Analysis–Item Analysis, Item Response Theory–Item Response Theory about, Educational and Psychological Statistics–Educational and Psychological Statistics classical test theory, Classical Test Theory: The True Score Model–Classical Test Theory: The True Score Model item analysis, Item Analysis–Item Analysis item response theory, 
Item Response Theory–Item Response Theory measures of internal consistency, Measures of Internal Consistency–Coefficient Alpha percentiles, Percentiles–Percentiles reliability of composite test, Reliability of a Composite Test–Reliability of a Composite Test standardized scores, Standardized Scores–Standardized Scores test construction, Test Construction–Test Construction psychometrics, Educational and Psychological Statistics publication bias, Quick Checklist Q quadratic regression model, Polynomial Regression–Polynomial Regression Quality Improvement (QI), Quality Improvement–Run Charts and Control Charts quasi-experimental, Basic Vocabulary–Basic Vocabulary, Quasi-Experimental Studies–Quasi-Experimental Studies research design type, Basic Vocabulary–Basic Vocabulary studies, Quasi-Experimental Studies–Quasi-Experimental Studies quota sampling, Nonprobability Sampling R R programming language, Graphic Methods, R–R random errors, Random and Systematic Error–Random and Systematic Error, Glossary of Statistical Terms definition of, Glossary of Statistical Terms vs. 
systematic errors, Random and Systematic Error–Random and Systematic Error random measurement error, Classical Test Theory: The True Score Model–Classical Test Theory: The True Score Model Random-Digit-Dialing (RDD) techniques, Bias in Sample Selection and Retention randomization, Confounding, Stratified Analysis, and the Mantel-Haenszel Common Odds Ratio randomized block design, Blocking and the Latin Square range, Glossary of Statistical Terms range and interquartile range, The Range and Interquartile Range–The Range and Interquartile Range rank sum, The Wilcoxon Rank Sum Test Rasch model, Item Response Theory Rasch, Georg, Item Response Theory rate, Ratio, Proportion, and Rate–Ratio, Proportion, and Rate, Crude, Category-Specific, and Standardized Rates–Crude, Category-Specific, and Standardized Rates, Glossary of Statistical Terms about, Ratio, Proportion, and Rate–Ratio, Proportion, and Rate crude rate as, Crude, Category-Specific, and Standardized Rates–Crude, Category-Specific, and Standardized Rates definition of, Glossary of Statistical Terms ratio, Ratio, Proportion, and Rate, Glossary of Statistical Terms about, Ratio, Proportion, and Rate definition of, Glossary of Statistical Terms ratio data, Ratio Data–Ratio Data, Glossary of Statistical Terms about, Ratio Data–Ratio Data definition of, Glossary of Statistical Terms raw time series, Time Series real numbers, properties of, Properties of Real Numbers recall bias, Information Bias, Glossary of Statistical Terms rectangular coordinates (Cartesian coordinates), Graphing Equations–Graphing Equations rectangular data file, storing data electronically in, Codebooks–The Rectangular Data File regression, Independent and Dependent Variables–Independent and Dependent Variables, Introduction to Regression and ANOVA, Linear Regression–Linear Regression, Assumptions–Assumptions, Calculating Simple Regression by Hand–Calculating Simple Regression by Hand, Multiple Regression Models–Multiple Regression Models, 
Multiple Regression Models–Multiple Regression Models, Multiple Regression Models, Multiple Regression Models–Multiple Regression Models, Multiple Regression Models–Multiple Regression Models, Multiple Regression Models–Multiple Regression Models, Multiple Regression Models, Multiple Regression Models–Multiple Regression Models, Multiple Regression Models–Multiple Regression Models, Dummy Variables–Dummy Variables, Methods for Building Regression Models–Backward removal, Logistic Regression–Converting Logits to Probabilities, Multinomial Logistic Regression–Multinomial Logistic Regression, Polynomial Regression–Polynomial Regression, Polynomial Regression–Polynomial Regression, Polynomial Regression–Polynomial Regression, Overfitting–Overfitting, Quasi-Experimental Studies about, Introduction to Regression and ANOVA arbitrary curve-fitting, Overfitting–Overfitting calculating by hand, Calculating Simple Regression by Hand–Calculating Simple Regression by Hand cubic regression model, Polynomial Regression–Polynomial Regression independent variables and dependent variables, Independent and Dependent Variables–Independent and Dependent Variables linear, Linear Regression–Linear Regression, Assumptions–Assumptions about, Linear Regression–Linear Regression assumptions, Assumptions–Assumptions logistic, Logistic Regression–Converting Logits to Probabilities modeling principles, Multiple Regression Models–Multiple Regression Models multinomial logistic, Multinomial Logistic Regression–Multinomial Logistic Regression multiple linear, Multiple Regression Models–Multiple Regression Models, Multiple Regression Models, Multiple Regression Models–Multiple Regression Models, Multiple Regression Models–Multiple Regression Models, Multiple Regression Models–Multiple Regression Models, Multiple Regression Models, Multiple Regression Models–Multiple Regression Models, Multiple Regression Models–Multiple Regression Models, Dummy Variables–Dummy Variables, Methods for Building 
Regression Models–Backward removal about, Multiple Regression Models–Multiple Regression Models adding interaction term, Multiple Regression Models–Multiple Regression Models assumptions, Multiple Regression Models creating a correlation matrix, Multiple Regression Models–Multiple Regression Models dummy variables, Dummy Variables–Dummy Variables methods for building regression models, Methods for Building Regression Models–Backward removal regression equation for data, Multiple Regression Models–Multiple Regression Models results for individual predictors, Multiple Regression Models–Multiple Regression Models standardized coefficients, Multiple Regression Models variables in model, Multiple Regression Models–Multiple Regression Models polynomial, Polynomial Regression–Polynomial Regression quadratic regression model, Polynomial Regression–Polynomial Regression to the mean, Quasi-Experimental Studies regression equations, independent variables and dependent variables in, Independent and Dependent Variables–Independent and Dependent Variables regression to the mean, Quasi-Experimental Studies related samples t-test, Repeated Measures t-Test–Confidence Interval for the Repeated Measures t-Test relational databases, for data management, Spreadsheets and Relational Databases–Spreadsheets and Relational Databases relative frequency, Frequency Tables, Bar Charts–Bar Charts relative risk, The Risk Ratio–The Risk Ratio reliability, Reliability and Validity–Triangulation, Reliability–Reliability, Glossary of Statistical Terms about, Reliability–Reliability definition of, Glossary of Statistical Terms validity and, Reliability and Validity–Triangulation reliability coefficient, Reliability of a Composite Test reliability index, Reliability of a Composite Test repeated measures (related samples) t-test, Repeated Measures t-Test–Confidence Interval for the Repeated Measures t-Test research articles, Writing the Article–Writing the Article, Common Problems–Common Problems, 
Quick Checklist–Quick Checklist, Issues in Research Design–The Power of Coincidence, Descriptive Statistics–Extrapolation and Trends, Extrapolation and Trends–Linear regression checklist for statistics based investigations, Quick Checklist–Quick Checklist common problems with, Common Problems–Common Problems critiquing descriptive statistics, Descriptive Statistics–Extrapolation and Trends incorrect use of tests in inferential statistics, Extrapolation and Trends–Linear regression issues in research design, Issues in Research Design–The Power of Coincidence writing, Writing the Article–Writing the Article research design, Research Design, Basic Vocabulary–Basic Vocabulary, Basic Vocabulary, Basic Vocabulary, Basic Vocabulary, Basic Vocabulary, Basic Vocabulary, Basic Vocabulary, Observational Studies–Observational Studies, Quasi-Experimental Studies–Quasi-Experimental Studies, Experimental Studies–Experimental Studies, Ingredients of a Good Design–Ingredients of a Good Design, Gathering Experimental Data–Blocking and the Latin Square, Specifying Treatment Levels, Specifying Response Variables, Blinding, Retrospective Adjustment, Blocking and the Latin Square, Example Experimental Design–Example Experimental Design, Communicating with Statistics–Writing for Your Workplace, Issues in Research Design–The Power of Coincidence about, Research Design blinding, Blinding blocking and Latin square, Blocking and the Latin Square classification of studies, Basic Vocabulary communicating with statistics, Communicating with Statistics–Writing for Your Workplace data types, Basic Vocabulary example of, Example Experimental Design–Example Experimental Design experimental studies, Experimental Studies–Experimental Studies factor in, Basic Vocabulary factorial design, Basic Vocabulary gathering experimental data, Gathering Experimental Data–Blocking and the Latin Square hypothesis testing vs. 
data mining, Specifying Response Variables ingredients of good design, Ingredients of a Good Design–Ingredients of a Good Design issues in, Issues in Research Design–The Power of Coincidence observational studies, Observational Studies–Observational Studies physical vs. social sciences definition of treatments, Specifying Treatment Levels quasi-experimental studies, Quasi-Experimental Studies–Quasi-Experimental Studies retrospective adjustment, Retrospective Adjustment style of notation, Basic Vocabulary types of, Basic Vocabulary–Basic Vocabulary unit of analysis in study, Basic Vocabulary response variables, specifying in experimental design, Specifying Response Variables–Specifying Response Variables responses, experimental, Experimental Studies restriction, Confounding, Stratified Analysis, and the Mantel-Haenszel Common Odds Ratio results section, Writing the Article, Evaluating the Whole Article critiquing in articles, Evaluating the Whole Article writing, Writing the Article retrospective adjustment, Retrospective Adjustment retrospective study, Basic Vocabulary, Glossary of Statistical Terms risk ratio, The Risk Ratio–Attributable Risk, Attributable Risk Percentage, and Number Needed to Treat Robinson, W.S., Basic Vocabulary rolling average, Time Series roots, properties of, Properties of Roots–Properties of Roots Rosenbaum, Paul, Observational Studies Rubin, Donald, Observational Studies Rubin, Roderick J.A., String and Numeric Data–Missing Data run charts and control charts, Run Charts and Control Charts R×C table (contingency table), The R×C Table–The R×C Table S Safari Books Online, Safari® Books Online sample size calculations, Sample Size Calculations–Power for the Test of the Difference between Two Sample Means (Independent Samples t-Test) sample space, definition of, Sample Space–Sample Space samples, Inferential Statistics, Inferential Statistics, Populations and Samples–Probability Sampling, Populations and Samples, Descriptive Statistics and 
Graphic Displays–Populations and Samples, The Variance and Standard Deviation, The Variance and Standard Deviation, The One-Sample t-Test–Confidence Interval for the One-Sample t-Test, The Independent Samples t-Test–Confidence Interval for the Independent Samples t-Test, Repeated Measures t-Test–Confidence Interval for the Repeated Measures t-Test in descriptive statistics, Descriptive Statistics and Graphic Displays–Populations and Samples, The Variance and Standard Deviation, The Variance and Standard Deviation calculating variance, The Variance and Standard Deviation formula for standard deviation, The Variance and Standard Deviation populations and, Descriptive Statistics and Graphic Displays–Populations and Samples in inferential statistics, Inferential Statistics, Inferential Statistics, Populations and Samples–Probability Sampling mean, Inferential Statistics populations and, Inferential Statistics, Populations and Samples–Probability Sampling one-sample t-test, The One-Sample t-Test–Confidence Interval for the One-Sample t-Test related samples t-test, Repeated Measures t-Test–Confidence Interval for the Repeated Measures t-Test two-sample t-test, The Independent Samples t-Test–Confidence Interval for the Independent Samples t-Test U.S.

… Basic Vocabulary, Quasi-Experimental Studies, Quasi-Experimental Studies correlation statistics for categorical data, Binary Variables–Ordinal Variables correlations, The Pearson Correlation Coefficient, Association–Association, Scatterplots–Relationships Between Continuous Variables, Relationships Between Continuous Variables–Relationships Between Continuous Variables, The Pearson Correlation Coefficient–Testing Statistical Significance for the Pearson Correlation, Testing Statistical Significance for the Pearson Correlation–Testing Statistical Significance for the Pearson Correlation, The Coefficient of Determination, Methods for Building Regression Models about, The Pearson Correlation Coefficient associations, Association–Association coefficient of determination, The Coefficient of Determination correlation coefficient, The Pearson Correlation Coefficient–Testing Statistical Significance for the Pearson Correlation partial, Methods for Building Regression Models relationships between continuous variables, Relationships Between Continuous Variables–Relationships Between Continuous Variables scatterplots as visual tool, Scatterplots–Relationships Between Continuous Variables testing statistical significance for, Testing Statistical Significance for the Pearson Correlation–Testing Statistical Significance for the Pearson Correlation CPI (Consumer Price Index), Index Numbers, Index Numbers Cramer’s V, Binary Variables–Binary Variables criterion for factor retention, Factor Analysis criterion validity, Glossary of Statistical Terms criterion-referenced tests, Test Construction critiquing presentations about statistics, Evaluating the Whole Article–Evaluating the Whole Article, The Misuse of Statistics–The Misuse of Statistics, Common Problems–Common Problems, Quick Checklist–Quick Checklist, Issues in Research Design–The Power of Coincidence, Descriptive Statistics–Extrapolation and Trends, Extrapolation and Trends–Linear regression checklist for statistics based
investigations, Quick Checklist–Quick Checklist common problems in presentations, Common Problems–Common Problems evaluating whole article, Evaluating the Whole Article–Evaluating the Whole Article incorrect use of tests in inferential statistics, Extrapolation and Trends–Linear regression interpretation of descriptive statistics, Descriptive Statistics–Extrapolation and Trends issues in research design, Issues in Research Design–The Power of Coincidence misusing statistics, The Misuse of Statistics–The Misuse of Statistics Cronbach’s alpha (coefficient alpha), Reliability cross-sectional design, Observational Studies–Observational Studies cross-sectional study, Glossary of Statistical Terms cross-tabulation, The Risk Ratio crude rate, Crude, Category-Specific, and Standardized Rates–Crude, Category-Specific, and Standardized Rates cubic regression model, Polynomial Regression–Polynomial Regression cumulative frequency, Frequency Tables Cumulative Incidence (CI), Prevalence and Incidence CV (Coefficient of Variation), The Variance and Standard Deviation–The Variance and Standard Deviation D data, Statistics in the Age of Information, Basic Concepts of Measurement, Basic Concepts of Measurement–Proxy Measurement, The Rectangular Data File, String and Numeric Data–Missing Data, Gathering Experimental Data–Blocking and the Latin Square, Evaluating the Whole Article converting information into, Basic Concepts of Measurement critiquing in articles, Evaluating the Whole Article gathering experimental data, Gathering Experimental Data–Blocking and the Latin Square meaning of, Statistics in the Age of Information missing data, String and Numeric Data–Missing Data types of, Basic Concepts of Measurement–Proxy Measurement unit of analysis, The Rectangular Data File data management, Data Management–Data Management, An Approach, Not a Set of Recipes–An Approach, Not a Set of Recipes, The Chain of Command, Codebooks–Codebooks, Codebooks–The Rectangular Data File, Spreadsheets 
and Relational Databases–Spreadsheets and Relational Databases, Spreadsheets and Relational Databases, Inspecting a New Data File–Inspecting a New Data File, Inspecting a New Data File, Inspecting a New Data File, String and Numeric Data, String and Numeric Data–Missing Data about, Data Management–Data Management approach to, An Approach, Not a Set of Recipes–An Approach, Not a Set of Recipes codebooks, Codebooks–Codebooks data entry software, Spreadsheets and Relational Databases in projects, The Chain of Command inspecting new data file, Inspecting a New Data File–Inspecting a New Data File missing data, String and Numeric Data–Missing Data spreadsheets and relational databases for, Spreadsheets and Relational Databases–Spreadsheets and Relational Databases storing data electronically in rectangular data file, Codebooks–The Rectangular Data File string and numeric data, String and Numeric Data unique identifier in, Inspecting a New Data File variable names in transfer process to software, Inspecting a New Data File data mining vs. hypothesis testing, Specifying Response Variables data transformations, Data Transformations–Data Transformations data types, Basic Vocabulary databases, for data management, Spreadsheets and Relational Databases–Spreadsheets and Relational Databases decision analysis, Decision Analysis–Decision Trees decision trees, Decision Trees decision-making, Decision Analysis, Decision Analysis, Decision Analysis under certainty, Decision Analysis under risk, Decision Analysis under uncertainty, Decision Analysis degrees of freedom, Glossary of Statistical Terms Deming, W.


Analysis of Financial Time Series by Ruey S. Tsay

Asian financial crisis, asset allocation, Bayesian statistics, Black-Scholes formula, Brownian motion, business cycle, capital asset pricing model, compound rate of return, correlation coefficient, data acquisition, discrete time, frictionless, frictionless market, implied volatility, index arbitrage, Long Term Capital Management, market microstructure, martingale, p-value, pattern recognition, random walk, risk tolerance, short selling, statistical model, stochastic process, stochastic volatility, telemarketer, transaction costs, value at risk, volatility smile, Wiener process, yield curve

A Lagrange multiplier statistic was proposed recently by Tse (2000) to test constant correlation coefficients in a multivariate GARCH model.

Figure 9.5. The sample correlation coefficient between monthly log returns of IBM stock and the S&P 500 index. The correlation is computed by a moving window of 120 observations. The sample period is from January 1926 to December 1999. (Plot of rho(t), roughly 0.4 to 0.8, against year, 1940 to 2000.)

A simple way to relax the constant-correlation constraint within the GARCH framework is to specify an exact equation for the conditional correlation coefficient. This can be done by two methods using the two reparameterizations of $\Sigma_t$ discussed in Section 9.1. First, we use the correlation coefficient directly. Because the correlation coefficient between the returns of IBM stock and S&P 500 index is positive and must be in the interval [0, 1], we employ the equation

$$\rho_{21,t} = \frac{\exp(q_t)}{1 + \exp(q_t)}, \qquad (9.23)$$

where

$$q_t = \varpi_0 + \varpi_1 \rho_{21,t-1} + \varpi_2 \frac{a_{1,t-1}\, a_{2,t-1}}{\sqrt{\sigma_{11,t-1}\, \sigma_{22,t-1}}},$$

and $\sigma_{ii,t-1}$ is the conditional variance of the shock $a_{i,t-1}$.
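The recursion in (9.23) can be iterated directly once the shocks and conditional variances are known. The sketch below is a minimal illustration under stated assumptions: the parameters `w0`, `w1`, `w2` stand in for the three coefficients of $q_t$, their values are hypothetical, and the shocks and variances are taken as given inputs (in practice they come from GARCH volatility equations estimated jointly with these parameters, which this sketch omits).

```python
import math

def logistic(q):
    """Map any real q into (0, 1): exp(q) / (1 + exp(q)), as in eq. (9.23)."""
    return math.exp(q) / (1.0 + math.exp(q))

def conditional_correlation(a1, a2, s11, s22, w0, w1, w2, rho0):
    """Iterate rho_t = logistic(q_t) with
    q_t = w0 + w1 * rho_{t-1} + w2 * a1_{t-1} * a2_{t-1} / sqrt(s11_{t-1} * s22_{t-1}).

    a1, a2   : shocks a_{1,t}, a_{2,t} (given inputs, hypothetical here)
    s11, s22 : conditional variances sigma_{11,t}, sigma_{22,t}
    w0..w2   : hypothetical stand-ins for the coefficients of q_t
    rho0     : starting value of the conditional correlation
    Returns [rho_0, rho_1, ..., rho_T].
    """
    rhos = [rho0]
    for t in range(1, len(a1) + 1):
        # standardized cross-product of the previous period's shocks
        cross = a1[t - 1] * a2[t - 1] / math.sqrt(s11[t - 1] * s22[t - 1])
        q = w0 + w1 * rhos[-1] + w2 * cross
        rhos.append(logistic(q))
    return rhos
```

The logistic transform is what keeps every fitted $\rho_{21,t}$ strictly between 0 and 1, whatever values the parameters take.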

. . . , k. In other words, $D = \operatorname{diag}\{\sqrt{\Gamma_{11}(0)}, \ldots, \sqrt{\Gamma_{kk}(0)}\}$. The concurrent, or lag-zero, cross-correlation matrix of $r_t$ is defined as

$$\rho_0 \equiv [\rho_{ij}(0)] = D^{-1} \Gamma_0 D^{-1}.$$

More specifically, the $(i, j)$th element of $\rho_0$ is

$$\rho_{ij}(0) = \frac{\operatorname{Cov}(r_{it}, r_{jt})}{\operatorname{std}(r_{it})\,\operatorname{std}(r_{jt})} = \frac{\Gamma_{ij}(0)}{\sqrt{\Gamma_{ii}(0)\,\Gamma_{jj}(0)}},$$

which is the correlation coefficient between $r_{it}$ and $r_{jt}$. In a time series analysis, such a correlation coefficient is referred to as a concurrent, or contemporaneous, correlation coefficient because it is the correlation of the two series at time t. It is easy to see that $\rho_{ij}(0) = \rho_{ji}(0)$, $-1 \le \rho_{ij}(0) \le 1$, and $\rho_{ii}(0) = 1$ for $1 \le i, j \le k$. Thus, $\rho_0$ is a symmetric matrix with unit diagonal elements. An important topic in multivariate time series analysis is the lead-lag relationships between component series.
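Because $D$ is diagonal, pre- and post-multiplying by $D^{-1}$ simply divides entry $(i, j)$ of $\Gamma_0$ by $\sqrt{\Gamma_{ii}(0)\Gamma_{jj}(0)}$. A minimal pure-Python sketch, assuming the covariance matrix is supplied as a list of lists (the function name is illustrative):

```python
import math

def cross_correlation_matrix(gamma0):
    """Return rho_0 = D^{-1} Gamma_0 D^{-1} for a k-by-k covariance matrix Gamma_0,
    where D = diag(sqrt(Gamma_11(0)), ..., sqrt(Gamma_kk(0)))."""
    k = len(gamma0)
    d = [math.sqrt(gamma0[i][i]) for i in range(k)]  # diagonal of D: standard deviations
    # Multiplying by the diagonal D^{-1} on both sides divides (i, j) by d_i * d_j.
    return [[gamma0[i][j] / (d[i] * d[j]) for j in range(k)] for i in range(k)]
```

For example, a covariance matrix with variances 4 and 9 and covariance 1.2 yields a cross-correlation of 1.2 / (2 · 3) = 0.2 off the diagonal and ones on it.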

From the definition,

$$\rho_{ij}(\ell) = \frac{\operatorname{Cov}(r_{it}, r_{j,t-\ell})}{\operatorname{std}(r_{it})\,\operatorname{std}(r_{jt})} = \frac{\Gamma_{ij}(\ell)}{\sqrt{\Gamma_{ii}(0)\,\Gamma_{jj}(0)}}, \qquad (8.4)$$

which is the correlation coefficient between $r_{it}$ and $r_{j,t-\ell}$. When $\ell > 0$, this correlation coefficient measures the linear dependence of $r_{it}$ on $r_{j,t-\ell}$, which occurred prior to time t. Consequently, if $\rho_{ij}(\ell) \neq 0$ and $\ell > 0$, we say that the series $r_{jt}$ leads the series $r_{it}$ at lag $\ell$. Similarly, $\rho_{ji}(\ell)$ measures the linear dependence of $r_{jt}$ and $r_{i,t-\ell}$, and we say that the series $r_{it}$ leads the series $r_{jt}$ at lag $\ell$ if $\rho_{ji}(\ell) \neq 0$ and $\ell > 0$. Equation (8.4) also shows that the diagonal element $\rho_{ii}(\ell)$ is simply the lag-$\ell$ autocorrelation coefficient of $r_{it}$. Based on this discussion, we obtain some important properties of the cross-correlations when $\ell > 0$. First, in general, $\rho_{ij}(\ell) \neq \rho_{ji}(\ell)$ for $i \neq j$ because the two correlation coefficients measure different linear relationships between $\{r_{it}\}$ and $\{r_{jt}\}$.
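In practice $\rho_{ij}(\ell)$ is estimated from data. The sketch below uses the common moment estimator (an assumption here, not necessarily the exact form used later in the source): the lagged cross-products and the variances are both divided by the full sample size, so that factor cancels. The function name is hypothetical.

```python
def lag_cross_correlation(ri, rj, lag):
    """Sample estimate of rho_ij(lag) = Corr(r_{i,t}, r_{j,t-lag}) for 0 <= lag < T.

    Both series must have the same length T. A clearly nonzero value at lag > 0
    suggests that series j leads series i at that lag.
    """
    T = len(ri)
    mi = sum(ri) / T
    mj = sum(rj) / T
    # cross-products of deviations, pairing r_i at time t with r_j at time t - lag
    num = sum((ri[t] - mi) * (rj[t - lag] - mj) for t in range(lag, T))
    den = (sum((x - mi) ** 2 for x in ri) * sum((y - mj) ** 2 for y in rj)) ** 0.5
    return num / den
```

If `ri` is constructed as a one-period-delayed copy of `rj`, the estimate is positive at lag 1 even when the lag-0 value is negative, which is exactly the "r_jt leads r_it" pattern described above.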


pages: 923 words: 163,556

Advanced Stochastic Models, Risk Assessment, and Portfolio Optimization: The Ideal Risk, Uncertainty, and Performance Measures by Frank J. Fabozzi

algorithmic trading, Benoit Mandelbrot, capital asset pricing model, collateralized debt obligation, correlation coefficient, distributed generation, diversified portfolio, fixed income, index fund, Louis Bachelier, Myron Scholes, p-value, quantitative trading / quantitative finance, random walk, risk-adjusted returns, short selling, stochastic volatility, Thomas Bayes, transaction costs, value at risk

This shortcoming of the covariance can be circumvented by dividing the joint variation as defined by equation (5.16) by the product of the respective variations of the component variables. The resulting measure is the Pearson correlation coefficient, or simply the correlation coefficient, defined by

$$r_{x,y} = \frac{\operatorname{cov}(x, y)}{s_x \cdot s_y}, \qquad (5.19)$$

where the covariance is divided by the product of the standard deviations of x and y. By definition, $r_{x,y} \in [-1, 1]$ for any bivariate quantitative data. Hence, we can compare different data with respect to the correlation coefficient (5.19). Generally, we make the following distinction to indicate the possible direction of joint behavior:

$r_{x,y} < 0$: negative correlation
$r_{x,y} = 0$: no correlation
$r_{x,y} > 0$: positive correlation

In contrast to the covariance, the correlation coefficient is invariant with respect to linear transformation with a positive slope; that is, it is said to be scaling invariant. For example, if we translate x to ax + b with a > 0, we still have

$$r_{ax+b,\,y} = \frac{\operatorname{cov}(ax + b, y)}{s_{ax+b} \cdot s_y} = \frac{a \operatorname{cov}(x, y)}{a s_x \cdot s_y} = r_{x,y}.$$

(For a < 0, $s_{ax+b} = |a| s_x$, so only the sign of the correlation flips.) For example, using the monthly bivariate return data from the S&P 500 and GE, we compute $s_{S\&P500} = \sqrt{\operatorname{Var}(r_{S\&P500})} = \sqrt{0.0025}$ and $s_{GE} = \sqrt{\operatorname{Var}(r_{GE})} = \sqrt{0.0096}$ such that, according to (5.19), we obtain as the correlation coefficient the value $r_{S\&P500,GE} = 0.0018/(0.0497 \cdot 0.0978) = 0.3657$.
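The invariance property is easy to check numerically. A minimal sketch of (5.19) using sample (n − 1) moments; the return series below are hypothetical and are not the S&P 500/GE data from the text.

```python
import math

def pearson_r(x, y):
    """Sample correlation r_{x,y} = cov(x, y) / (s_x * s_y), using n - 1 throughout."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

# Hypothetical monthly returns, for illustration only.
x = [0.01, -0.02, 0.03, 0.00, 0.015]
y = [0.02, -0.01, 0.05, -0.01, 0.01]

r = pearson_r(x, y)
r_scaled = pearson_r([3 * a + 0.5 for a in x], y)   # a = 3 > 0: r is unchanged
r_flipped = pearson_r([-2 * a + 1 for a in x], y)   # a = -2 < 0: only the sign flips
```

The covariance of the transformed series changes by the factor a, but the standard deviation in the denominator changes by |a|, which is why only the sign can differ.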

That is, the correlation coefficient of two random variables X and Y, denoted by $\rho_{X,Y}$, is defined as

$$\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)}\,\sqrt{\operatorname{Var}(Y)}}. \qquad (14.22)$$

We expressed the standard deviations as the square roots of the respective variances, $\sigma_X = \sqrt{\operatorname{Var}(X)}$ and $\sigma_Y = \sqrt{\operatorname{Var}(Y)}$. Note that the correlation coefficient is equal to one, that is, $\rho_{X,X} = 1$, for the correlation between the random variable X and itself. This can be seen from (14.22) by inserting $\operatorname{Cov}(X, X) = \operatorname{Var}(X)$ for the covariance in the numerator and having $\sigma_X \sigma_X = \operatorname{Var}(X)$ in the denominator. Moreover, the correlation coefficient is symmetric, $\rho_{X,Y} = \rho_{Y,X}$. This is due to definition (14.22) and the fact that the covariance is symmetric. The correlation coefficient given by (14.22) can take on real values between −1 and 1 only. When its value is negative, we say that the random variables X and Y are negatively correlated, while they are positively correlated in the case of a positive correlation coefficient. When the correlation is zero, due to a zero covariance, we refer to X and Y as uncorrelated.

We summarize this below:

$-1 \le \rho_{X,Y} \le 1$
$-1 \le \rho_{X,Y} < 0$: X and Y negatively correlated
$\rho_{X,Y} = 0$: X and Y uncorrelated
$0 < \rho_{X,Y} \le 1$: X and Y positively correlated

As with the covariances of a k-dimensional random vector, we list the correlation coefficients of all pairwise combinations of the k components in a k-by-k matrix

$$\Gamma = \begin{pmatrix} 1 & \rho_{X_1,X_2} & \cdots & \rho_{X_1,X_k} \\ \rho_{X_2,X_1} & 1 & \cdots & \rho_{X_2,X_k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{X_k,X_1} & \rho_{X_k,X_2} & \cdots & 1 \end{pmatrix}.$$

This matrix, referred to as the correlation coefficient matrix and denoted by Γ, is also symmetric since the correlation coefficients are symmetric. For example, suppose we have a portfolio consisting of two assets whose prices are denominated in different currencies, say asset A in U.S. dollars ($) and asset B in euros (€). Furthermore, suppose the exchange rate was constant at $1.30 per €1. Consequently, asset B always moves 1.3 times as much when translated into the equivalent amount of dollars than when measured in euros.
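The currency example is a case of the scaling property: translating B's price changes into dollars multiplies its covariance with A by 1.30, but leaves the correlation untouched. A minimal sketch, assuming the constant exchange rate from the text; the price changes themselves are hypothetical.

```python
from statistics import mean

def cov(x, y):
    """Sample covariance with n - 1 in the denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def corr(x, y):
    """Correlation as covariance divided by the product of standard deviations."""
    return cov(x, y) / (cov(x, x) * cov(y, y)) ** 0.5

RATE = 1.30  # constant exchange rate, dollars per euro (the example's assumption)

# Hypothetical price changes, for illustration only.
a_usd = [1.0, -0.5, 2.0, 0.3, -1.2]   # asset A, measured in dollars
b_eur = [0.8, -0.2, 1.5, 0.1, -0.9]   # asset B, measured in euros
b_usd = [RATE * v for v in b_eur]     # asset B translated into dollars
```

Here `cov(a_usd, b_usd)` is 1.3 times `cov(a_usd, b_eur)`, while `corr` returns the same value for both currency denominations.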


The Intelligent Asset Allocator: How to Build Your Portfolio to Maximize Returns and Minimize Risk by William J. Bernstein

asset allocation, backtesting, buy and hold, capital asset pricing model, commoditize, computer age, correlation coefficient, diversification, diversified portfolio, Eugene Fama: efficient market hypothesis, fixed income, index arbitrage, index fund, intangible asset, Long Term Capital Management, p-value, passive investing, prediction markets, random walk, Richard Thaler, risk tolerance, risk-adjusted returns, risk/return, South Sea Bubble, stocks for the long run, survivorship bias, the rule of 72, the scientific method, time value of money, transaction costs, Vanguard fund, Yogi Berra, zero-coupon bond

Most of the points lie on nearly a straight line; a poor return for one was invariably associated with a poor return for the other. The correlation coefficient of .777 for these two assets is quite high. This graph demonstrates that adding U.S. small stocks to a portfolio of U.S. large stocks does not diminish risk very much, as a poor return for one will be very likely associated with a poor return for the other. Figure 3-4 plots two loosely correlated assets: U.S. large stocks (S&P 500) and foreign large stocks (EAFE Index). Although there does appear to be a loose relation between the two, it is far from perfect. The correlation coefficient of this pair is .483. Lastly, Figure 3-5 plots two very poorly correlated assets (correlation coefficient of .068): Japanese small stocks and REITs. This plot is a “scattergram” with no discernible pattern.

(This is the same reason why big offices have messier politics than small ones. A three-person office has three interpersonal relationships; a 10-person office has 45 relationships.) Real assets are almost always imperfectly correlated. In other words, an above-average return in one is somewhat more likely to be associated with an above-average return in the other. The degree of correlation is expressed by a correlation coefficient. This value ranges from −1 to +1. Perfectly correlated assets have a correlation coefficient of +1, and uncorrelated assets have a coefficient of 0. Perfectly inversely (or negatively) correlated assets have a coefficient of −1. The easiest way to understand this is to plot the returns of two assets against each other for many periods, as is done in Figures 3-3, 3-4, and 3-5. Each figure plots the 288 monthly returns for each asset pair for the 24-year period from January 1975 to December 1998.
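The office analogy is the pair-counting formula n(n − 1)/2, which is also the number of distinct correlation coefficients needed to describe n assets. A trivial Python sketch (the function name is illustrative only):

```python
def pair_count(n):
    """Number of distinct pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2
```

`pair_count(3)` gives the three relationships of the three-person office, and `pair_count(10)` gives 45.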

A good or bad result for one of these assets tells us nothing about the result for the other. Why is this so important? As already discussed, the most diversification benefit is obtained from uncorrelated assets. The above analysis suggests that there is not much benefit from mixing domestic small and large stocks and that there is great benefit from mixing REITs and Japanese small stocks.

Math Details: How to Calculate a Correlation Coefficient

In this book’s previous versions, I included a section on the manual calculation of the correlation coefficient. In the personal computer age, this is an exercise in masochism. The easiest way to do this is with a spreadsheet. Let’s assume that you have 36 monthly returns for two assets, A and B. Enter the returns in columns A and B, next to each other, spanning rows 1 to 36 for each pair of values. In Excel, enter in a separate cell the formula =CORREL(A1:A36, B1:B36). In Quattro Pro, the formula would be @CORREL(A1..A36, B1..B36). Both of these packages also contain a tool that will calculate a “correlation grid” of all of the correlations of an array of data for more than two assets. Those of you who would like an explanation of the steps involved in calculating a correlation coefficient are referred to a standard statistics text.
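For readers working outside a spreadsheet, the same two calculations, a single CORREL-style coefficient and a correlation grid over several assets, can be sketched in a few lines of Python. This assumes returns are held as plain lists keyed by asset name; the names and numbers are hypothetical.

```python
import math

def correl(xs, ys):
    """Pearson correlation of two equal-length ranges (what CORREL computes)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_grid(columns):
    """Correlation of every pair of assets, like a spreadsheet correlation tool.
    columns: dict mapping an asset name to its list of returns."""
    names = list(columns)
    return {(a, b): correl(columns[a], columns[b]) for a in names for b in names}

# Hypothetical monthly returns for three assets, for illustration only.
returns = {
    "A": [0.01, -0.02, 0.03, 0.00],
    "B": [0.02, -0.01, 0.01, 0.01],
    "C": [-0.01, 0.02, -0.02, 0.00],
}
grid = correlation_grid(returns)
```

The grid is symmetric with ones on the diagonal, so only the n(n − 1)/2 off-diagonal entries carry information.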


Commodity Trading Advisors: Risk, Performance Analysis, and Selection by Greg N. Gregoriou, Vassilios Karavas, François-Serge Lhabitant, Fabrice Douglas Rouah

Asian financial crisis, asset allocation, backtesting, buy and hold, capital asset pricing model, collateralized debt obligation, commodity trading advisor, compound rate of return, constrained optimization, corporate governance, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, discrete time, distributed generation, diversification, diversified portfolio, dividend-yielding stocks, fixed income, high net worth, implied volatility, index arbitrage, index fund, interest rate swap, iterative process, linear programming, London Interbank Offered Rate, Long Term Capital Management, market fundamentalism, merger arbitrage, Mexican peso crisis / tequila crisis, p-value, Pareto efficiency, Ponzi scheme, quantitative trading / quantitative finance, random walk, risk-adjusted returns, risk/return, selection bias, Sharpe ratio, short selling, stochastic process, survivorship bias, systematic trading, technology bubble, transaction costs, value at risk, zero-sum game

These two indices display the highest skewness and kurtosis; the former is the only index to exhibit negative returns over the entire sample. Table 6.3 examines the correlation coefficients between the different CTA indices as well as between the CTA indices and the first two return moments of the Russell 3000 (Russell squared). The results for the entire sample as well as the subsamples confirm our earlier findings. The correlation coefficients among the CTA index, the Financial and Metal Traders Index, the Systematic Traders Index, and the Diversified Traders Index are positive and close to 1 for all the different periods. The Currency Trader Index and the Discretionary Index have the lowest correlation coefficients with the other CTA indices. The coefficients are still positive between all the indices and for all the subperiods, but they are much smaller. Over the entire period, all of the CTA indices have a small and negative correlation coefficient with the Russell 3000 index and a positive relation with the square of the Russell 3000 returns.

To match EGR’s assumption of homoskedasticity, data sets were generated with the standard deviation set at 2. Heteroskedasticity was created by letting the values of σ be 5, 10, 15, and 20, with one-fourth of the observations using each value. This allowed us to compare the Spearman correlation coefficient calculated for data sets with and without homoskedasticity. The funds were ranked in ascending order of returns for period one (first 12 months) and period two (last 12 months). From each 24-month period of generated returns, Spearman correlation coefficients were calculated for a fund’s rank in both periods. For the distribution of Spearman correlation coefficients to be suitably approximated by a normal distribution, at least 10 observations are needed. Because 120 pairs are used here, the normal approximation is used. Mean returns also were calculated for each fund in period one and period two, and then ranked.
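The ranking procedure can be sketched as follows: rank the funds by period-one and period-two returns, compute Spearman's coefficient from the rank differences, and use the large-sample normal approximation $z = r_s\sqrt{n - 1}$. This is a minimal sketch assuming no tied returns (the shortcut formula $1 - 6\sum d^2 / (n(n^2 - 1))$ is only exact without ties); the function names are hypothetical and this is not necessarily the authors' own code.

```python
import math

def ranks(values):
    """Ascending ranks (1 = lowest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(period_one, period_two):
    """Spearman rank correlation 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    n = len(period_one)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(period_one), ranks(period_two)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def spearman_z(rho, n):
    """Normal approximation under no association: z = rho * sqrt(n - 1)."""
    return rho * math.sqrt(n - 1)
```

With 120 fund pairs, a rank correlation of 0.2 would correspond to z = 0.2 · √119, roughly 2.18.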

1. Returns-protection diversifiers have relatively high correlations in both the up and down markets with a generic asset class (such as the S&P 500 Index). 2. Returns-enhancing diversifiers possess correlations with the same generic asset class in an up market but are relatively less correlated in a down market. 3. “Ineffective” diversifiers are assets that do not add value, even though they may possess significant correlation coefficients with the generic asset class. To illustrate, a hedge fund strategy that has a negative correlation coefficient in an up-market regime and a positive correlation coefficient in a down-market regime provides diversification with no incremental returns. We classify this in the third category, that is, as an ineffective diversifier. Indeed, a strategy with such a characteristic will have the opposite effect of a good diversifier as it weakens the returns on an uptrend and exaggerates the negative returns of the portfolio.
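Classifying a diversifier this way requires computing its correlation with the benchmark separately in each market regime. A minimal sketch, assuming (the source does not say how its regimes are defined) that an up market is any period with a positive benchmark return and a down market any other period; function names are hypothetical.

```python
import math

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def regime_correlations(strategy, benchmark):
    """Return (up-market correlation, down-market correlation), where up-market
    periods are those with benchmark return > 0 (an assumed regime rule)."""
    up = [(s, b) for s, b in zip(strategy, benchmark) if b > 0]
    down = [(s, b) for s, b in zip(strategy, benchmark) if b <= 0]
    corr_up = corr([s for s, _ in up], [b for _, b in up])
    corr_down = corr([s for s, _ in down], [b for _, b in down])
    return corr_up, corr_down
```

A strategy whose correlation is high in up markets and low or negative in down markets would fall in the returns-enhancing category; the reverse pattern is the "ineffective" case described above. Each regime needs at least two periods with some variation for the correlation to be defined.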


pages: 755 words: 121,290

Statistics hacks by Bruce Frey

Bayesian statistics, Berlin Wall, correlation coefficient, Daniel Kahneman / Amos Tversky, distributed generation, en.wikipedia.org, feminist movement, G4S, game design, Hacker Ethic, index card, Milgram experiment, p-value, place-making, reshoring, RFID, Search for Extraterrestrial Intelligence, SETI@home, Silicon Valley, statistical model, Thomas Bayes

The groups of people didn't score exactly the same on both scales, of course, and the rank order isn't even the same, but, relatively speaking, the position of each person relative to each of the other people when it comes to cheese attitude is about the same as when it comes to cheesecake attitude. The Association's marketer has support for her hypothesis.

Computing a Correlation Coefficient

Just eyeballing two columns of numbers from a sample, though, is usually not enough to really know whether there is a relationship between two things. The marketing specialist in our example wants to use a single number to more precisely describe whatever relationship is seen. The correlation coefficient takes into account all the information we used when we looked at our two columns of numbers in Table 2-1 and decided whether there was a relationship there. The correlation coefficient is produced through a formula that does the following things:

Looks at each score in a column
Sees how distant that score is from the mean of that column
Identifies the distance from the mean of its matching score in the other column
Multiplies the paired distances together
Averages the results of those multiplications

If this were a statistics textbook, I'd have to present a somewhat complicated formula for calculating the correlation coefficient.
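The five steps translate almost line for line into code. One detail the plain-language list leaves implicit: the average of the products equals the correlation coefficient only when each distance is measured in standard-deviation units (z-scores). The sketch below assumes that, and uses the population standard deviation so that averaging over n is exact; the function name is hypothetical.

```python
from statistics import mean, pstdev

def correlation_by_steps(xs, ys):
    """Follow the five steps, measuring distances in (population) SD units."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    products = []
    for x, y in zip(xs, ys):        # step 1: look at each score and its partner
        dx = (x - mx) / sx          # step 2: how far the score is from its column mean
        dy = (y - my) / sy          # step 3: how far the matching score is from its mean
        products.append(dx * dy)    # step 4: multiply the paired distances
    return mean(products)           # step 5: average the products
```

Without the division by the standard deviations, the same steps would produce the covariance instead, a number that depends on the units of each scale.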

This is very close to 1.0, which is the strongest a positive correlation can be, so the cheese-to-cheesecake correlation represents a very strong relationship.

Interpreting a Correlation Coefficient

Somewhat magically, the correlation formula produces a number, ranging in value from -1.00 to +1.00, that measures the strength of the relationship between two variables. A positive sign indicates that the relationship is in the same direction: as one value increases, the other value increases. A negative sign indicates that the relationship is in the opposite direction: as one value increases, the other value decreases. An important point to make is that the correlation coefficient provides a standardized measure of the strength of linear relationship between two variables [Hack #12]. The direction of a correlation (whether it is negative or positive) is the artificial result of the direction of the scale one chooses to use to measure the variables.

Let's imagine that a small college decides to use scores on the American College Test (ACT) as a predictor of college grade point average (GPA) at the end of students' first years. The admissions office goes back through a few years of records and gathers the ACT scores and freshman GPAs for a couple hundred students. They discover, to their delight, that there is a moderate relationship between these two variables: a correlation coefficient of .55. Correlation coefficients are a measure of the strength of linear relationships between two variables [Hack #11], and .55 indicates a fairly large relationship. This is good news because the existence of a relationship between the two makes ACT scores a good candidate as a predictor to guess GPA. Simple linear regression is the procedure that produces all the values we need to cook up the magic formula that will predict the future.
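The link between correlation and simple linear regression is direct: the least-squares slope equals the correlation times the ratio of standard deviations, $b = r \cdot s_y / s_x$, and the intercept is $\bar{y} - b\bar{x}$. A minimal sketch; the ACT/GPA pairs are hypothetical and are not the college's data.

```python
from statistics import mean, stdev

def pearson(xs, ys):
    """Sample correlation using n - 1 in both covariance and standard deviations."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return sxy / (stdev(xs) * stdev(ys))

def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x, expressed through r:
    slope = r * s_y / s_x, intercept = mean(y) - slope * mean(x)."""
    r = pearson(xs, ys)
    slope = r * stdev(ys) / stdev(xs)
    intercept = mean(ys) - slope * mean(xs)
    return intercept, slope

# Hypothetical (ACT score, first-year GPA) pairs, for illustration only.
act = [18, 21, 24, 27, 30, 33]
gpa = [2.4, 2.6, 3.1, 2.9, 3.5, 3.6]
intercept, slope = fit_line(act, gpa)
predicted_gpa = intercept + slope * 25   # predicted GPA for an ACT score of 25
```

Writing the slope through r makes the point of the passage concrete: the weaker the correlation, the flatter the prediction line and the closer every prediction sits to the average GPA.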


pages: 442 words: 94,734

The Art of Statistics: Learning From Data by David Spiegelhalter

Antoine Gombaud: Chevalier de Méré, Bayesian statistics, Carmen Reinhart, complexity theory, computer vision, correlation coefficient, correlation does not imply causation, dark matter, Edmond Halley, Estimating the Reproducibility of Psychological Science, Hans Rosling, Kenneth Rogoff, meta analysis, meta-analysis, Nate Silver, Netflix Prize, p-value, placebo effect, probability theory / Blaise Pascal / Pierre de Fermat, publication bias, randomized controlled trial, recommendation engine, replication crisis, self-driving car, speech recognition, statistical model, The Design of Experiments, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Malthus

This means it can be near 1 or −1 if the points are close to a line that steadily increases or decreases, even if this line is not straight; the Spearman’s rank correlation for the data in Figure 2.5(a) is 0.85, considerably higher than the Pearson correlation, since the points are closer to an increasing curve than a straight line.

Figure 2.6 Two sets of (fictitious) data-points for which the Pearson correlation coefficients are both 0. This clearly does not mean there is no relationship between the two variables being plotted. From Alberto Cairo’s wonderful Datasaurus Dozen.

The Pearson correlation is 0.17 for the 2012–2015 data in Figure 2.5(b), and the Spearman’s rank correlation is −0.03, suggesting that there is no longer any clear relationship between the number of cases and survival rates. However, with so few hospitals the correlation coefficient can be very sensitive to individual data-points – if we remove the smallest hospital, which has a high survival rate, the Pearson correlation jumps to 0.42. Correlation coefficients are simply summaries of association, and cannot be used to conclude that there is definitely an underlying relationship between volume and survival rates, let alone why one might exist. In many applications the x-axis represents a quantity known as the independent variable, and interest focuses on its influence on the dependent variable plotted on the y-axis.
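Both points in this passage can be reproduced with a few invented data-points (not the hospital data): a perfect but U-shaped relationship can have a Pearson correlation of exactly 0, while a steadily increasing curve gets a Spearman correlation of 1 even though its Pearson correlation is well below 1. The sketch computes Spearman's coefficient as the Pearson correlation of the ranks, assuming no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ascending ranks 1..n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def spearman(xs, ys):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

xs = [-3, -2, -1, 0, 1, 2, 3]
parabola = [x * x for x in xs]      # exact relationship, but not linear and not monotone
curve = [math.exp(x) for x in xs]   # steadily increasing, but not a straight line
```

Here `pearson(xs, parabola)` is 0 and `spearman(xs, curve)` is 1, while `pearson(xs, curve)` is only about 0.8.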

over-fitting: building a statistical model that is over-adapted to training data, so that its predictive ability starts to decline.

parameters: the unknown quantities in a statistical model, generally denoted with Greek letters.

Pearson correlation coefficient: for a set of n paired numbers, $(x_1, y_1), (x_2, y_2) \ldots (x_n, y_n)$, when $\bar{x}$, $s_x$ are the sample mean and standard deviation of the xs, and $\bar{y}$, $s_y$ are the sample mean and standard deviation of the ys, the Pearson correlation coefficient is given by

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}.$$

Suppose xs and ys have both been standardized to Z-scores given by us and vs respectively, so that $u_i = (x_i - \bar{x})/s_x$ and $v_i = (y_i - \bar{y})/s_y$. Then the Pearson correlation coefficient can be expressed as

$$r = \frac{1}{n - 1} \sum_{i=1}^{n} u_i v_i,$$

that is, the ‘cross-product’ of the Z-scores.

percentile (of a population): there is, for example, a 70% chance of drawing a random observation below the 70th percentile.

In other words, wealthy people with higher education are more likely to be diagnosed and get their tumour registered, an example of what is known as ascertainment bias in epidemiology.

‘Correlation Does Not Imply Causation’

We saw in the last chapter how Pearson’s correlation coefficient measures how close the points on a scatter-plot are to a straight line. When considering English hospitals conducting children’s heart surgery in the 1990s, and plotting the number of cases against their survival, the high correlation showed that bigger hospitals were associated with lower mortality. But we could not conclude that bigger hospitals caused the lower mortality. This cautious attitude has a long pedigree. When Karl Pearson’s newly developed correlation coefficient was being discussed in the journal Nature in 1900, a commentator warned that ‘correlation does not imply causation’. In the succeeding century this phrase has been a mantra repeatedly uttered by statisticians when confronted by claims based on simply observing that two things tend to vary together.


Stocks for the Long Run, 4th Edition: The Definitive Guide to Financial Market Returns & Long Term Investment Strategies by Jeremy J. Siegel

addicted to oil, asset allocation, backtesting, Black-Scholes formula, Bretton Woods, business cycle, buy and hold, buy low sell high, California gold rush, capital asset pricing model, cognitive dissonance, compound rate of return, correlation coefficient, Daniel Kahneman / Amos Tversky, diversification, diversified portfolio, dividend-yielding stocks, dogs of the Dow, equity premium, Eugene Fama: efficient market hypothesis, Everybody Ought to Be Rich, fixed income, German hyperinflation, implied volatility, index arbitrage, index fund, Isaac Newton, joint-stock company, Long Term Capital Management, loss aversion, market bubble, mental accounting, Myron Scholes, new economy, oil shock, passive investing, Paul Samuelson, popular capitalism, prediction markets, price anchoring, price stability, purchasing power parity, random walk, Richard Thaler, risk tolerance, risk/return, Robert Shiller, Ronald Reagan, shareholder value, short selling, South Sea Bubble, stocks for the long run, survivorship bias, technology bubble, The Great Moderation, The Wisdom of Crowds, transaction costs, tulip mania, Vanguard fund

This will be true if bond and stock returns are negatively correlated, which means that bond yields and stock prices move in opposite directions. The diversifying strength of an asset is measured by the correlation coefficient. The correlation coefficient, which theoretically ranges between –1 and +1, measures the correlation between an asset’s return and the return of the rest of the portfolio. The lower the correlation coefficient, the better the asset serves as a portfolio diversifier. Assets with negative correlations are particularly good diversifiers. As the correlation coefficient between the asset and portfolio returns increases, the diversifying quality of the asset declines. The correlation coefficient between annual stock and bond returns for six subperiods between 1926 and 2006 is shown in Figure 2-4. From 1926 through 1965 the correlation was only slightly positive, indicating that bonds were fairly good diversifiers for stocks.

An asset with a low correlation with the rest of the market provides better diversification than an asset with a high correlation. The correlation of returns between stocks or portfolios of stocks is measured by the correlation coefficient. A good case for investors is if there is no correlation between the stock returns of two countries, and the correlation coefficient is equal to zero. In this case, an investor who allocates his or her portfolio equally between each country can reduce his or her risk by almost one-third, compared to investing in a single country. As the correlation coefficient increases, the gains from diversification dwindle, and if there is perfect synchronization of returns, the correlation coefficient equals 1 and there is no gain (but no loss) from diversification. “Efficient” Portfolios: Formal Analysis How do you determine how much should be invested at home and abroad?
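The "almost one-third" figure follows from the two-asset portfolio variance, $\sigma_p^2 = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1w_2\rho\sigma_1\sigma_2$. A minimal Python sketch, assuming (as the text appears to) that both countries' markets have the same volatility; the 20% volatility is hypothetical and cancels out of the percentage reduction.

```python
import math

def portfolio_sd(w1, sd1, sd2, rho):
    """Standard deviation of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    var = (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2 * w1 * w2 * rho * sd1 * sd2
    return math.sqrt(var)

SD = 0.20  # hypothetical common volatility of each country's market
# Fractional risk reduction from an equal split, for several correlations.
reductions = {rho: 1 - portfolio_sd(0.5, SD, SD, rho) / SD for rho in (0.0, 0.5, 1.0)}
```

With zero correlation the equal split has volatility $\sigma/\sqrt{2}$, about 29% lower than a single country; at a correlation of 1 the reduction vanishes, matching the "no gain (but no loss)" case.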

From 1966 through 1989 the correlation coefficient jumped to +0.34, and from 1990 through 1997 the correlation increased further to +0.55. This means that the diversifying quality of bonds diminished markedly from 1926 to 1997. There are good reasons why the correlation became more positive during this period. Under the gold-based monetary standard of the 1920s and early 1930s, bad economic times were associated with falling commodity prices; when the real economy was sinking, stocks declined and the real value of government bonds rose. Under a paper money standard, bad economic times are more likely to be associated with inflation, not deflation, as the government attempts to offset economic downturns with expansionary monetary policy.

FIGURE 2–4 Correlation Coefficient between Monthly Stock and Bond Returns


pages: 519 words: 102,669

Programming Collective Intelligence by Toby Segaran

always be closing, correlation coefficient, Debian, en.wikipedia.org, Firefox, full text search, information retrieval, PageRank, prediction markets, recommendation engine, slashdot, Thomas Bayes, web application

Euclidean distance

A clear implementation of this formula is shown here:

    def euclidean(p, q):
        sumSq = 0.0
        # add up the squared differences
        for i in range(len(p)):
            sumSq += (p[i] - q[i]) ** 2
        # take the square root
        return sumSq ** 0.5

Euclidean distance is used in several places in this book to determine how similar two items are.

Pearson Correlation Coefficient

The Pearson correlation coefficient is a measure of how highly correlated two variables are. It is a value between 1 and −1, where 1 indicates that the variables are perfectly correlated, 0 indicates no correlation, and −1 means they are perfectly inversely correlated. Figure B-2 shows the Pearson correlation coefficient.

Figure B-2. Pearson correlation coefficient

This can be implemented with the following code:

    def pearson(x, y):
        n = len(x)
        vals = range(n)
        # Simple sums
        sumx = sum([float(x[i]) for i in vals])
        sumy = sum([float(y[i]) for i in vals])
        # Sum up the squares
        sumxSq = sum([x[i] ** 2.0 for i in vals])
        sumySq = sum([y[i] ** 2.0 for i in vals])
        # Sum up the products
        pSum = sum([x[i] * y[i] for i in vals])
        # Calculate Pearson score
        num = pSum - (sumx * sumy / n)
        den = ((sumxSq - pow(sumx, 2) / n) * (sumySq - pow(sumy, 2) / n)) ** .5
        # r is undefined if either variable has zero variance; report no correlation
        if den == 0:
            return 0
        r = num / den
        return r

We used the Pearson correlation in Chapter 2 to calculate the level of similarity between people's preferences.
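Figure B-2 itself is not reproduced in this excerpt. The computational ("sum of products") form that the `pearson` function evaluates, read off from the code rather than from the figure, is:

```latex
r = \frac{\sum_i x_i y_i - \frac{\left(\sum_i x_i\right)\left(\sum_i y_i\right)}{n}}
         {\sqrt{\left(\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}\right)
                \left(\sum_i y_i^2 - \frac{\left(\sum_i y_i\right)^2}{n}\right)}}
```

Algebraically this equals the deviation-from-the-mean definition; the sums-of-squares form needs only one pass over the data, at some cost in floating-point accuracy when values are large relative to their spread.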

, Learning from Clicks, Learning from Clicks, Setting Up the Database, Feeding Forward, Training with Backpropagation, Training Test, Training Test artificial, Learning from Clicks, Learning from Clicks, Setting Up the Database, Feeding Forward, Training with Backpropagation, Training Test, Training Test backpropagation, Training with Backpropagation connecting to search engine, Training Test designing click-training network, Learning from Clicks feeding forward, Feeding Forward setting up database, Setting Up the Database training test, Training Test neural network classifier, Exercises neural networks, Neural Networks, Neural Networks, Neural Networks, Neural Networks, Training a Neural Network, Training a Neural Network, Training a Neural Network, Strengths and Weaknesses, Strengths and Weaknesses backpropagation, and, Training a Neural Network black box method, Strengths and Weaknesses combinations of words, and, Neural Networks multilayer perceptron network, Neural Networks strengths and weaknesses, Strengths and Weaknesses synapses, and, Neural Networks training, Training a Neural Network using code, Training a Neural Network news sources, A Corpus of News newsfeatures.py, Selecting Sources, Downloading Sources, Downloading Sources, Downloading Sources, Converting to a Matrix, Using NumPy, The Algorithm, Displaying the Results, Displaying the Results, Displaying by Article, Displaying by Article getarticlewords function, Downloading Sources makematrix function, Converting to a Matrix separatewords function, Downloading Sources shape function, The Algorithm showarticles function, Displaying the Results, Displaying by Article showfeatures function, Displaying the Results, Displaying by Article stripHTML function, Downloading Sources transpose function, Using NumPy nn.py, Setting Up the Database, Setting Up the Database, Setting Up the Database, Setting Up the Database searchnet class, Setting Up the Database, Setting Up the Database, Setting Up the Database, 
Setting Up the Database generatehiddennode function, Setting Up the Database getstrength method, Setting Up the Database setstrength method, Setting Up the Database nnmf.py, The Algorithm difcost function, The Algorithm non-negative matrix factorization (NMF), Supervised versus Unsupervised Learning, Clustering, Non-Negative Matrix Factorization, Non-Negative Matrix Factorization, Non-Negative Matrix Factorization, Using Your NMF Code factorization, Supervised versus Unsupervised Learning goal of, Non-Negative Matrix Factorization update rules, Non-Negative Matrix Factorization using code, Using Your NMF Code normalization, Normalization Function numerical predictions, Building Price Models numpredict.py, Building a Sample Dataset, Building a Sample Dataset, Defining Similarity, Defining Similarity, Defining Similarity, Defining Similarity, Subtraction Function, Subtraction Function, Weighted kNN, Weighted kNN, Cross-Validation, Cross-Validation, Cross-Validation, Heterogeneous Variables, Scaling Dimensions, Optimizing the Scale, Optimizing the Scale, Uneven Distributions, Estimating the Probability Density, Graphing the Probabilities, Graphing the Probabilities, Graphing the Probabilities createcostfunction function, Optimizing the Scale createhiddendataset function, Uneven Distributions crossvalidate function, Cross-Validation, Optimizing the Scale cumulativegraph function, Graphing the Probabilities distance function, Defining Similarity dividedata function, Cross-Validation euclidian function, Defining Similarity gaussian function, Weighted kNN getdistances function, Defining Similarity inverseweight function, Subtraction Function knnestimate function, Defining Similarity probabilitygraph function, Graphing the Probabilities probguess function, Estimating the Probability Density, Graphing the Probabilities rescale function, Scaling Dimensions subtractweight function, Subtraction Function testalgorithm function, Cross-Validation weightedknn function, Weighted 
kNN wineprice function, Building a Sample Dataset wineset1 function, Building a Sample Dataset wineset2 function, Heterogeneous Variables NumPy, Using NumPy, Using NumPy, Simple Usage Example, NumPy, Installation on Other Platforms, Installation on Other Platforms installation on other platforms, Installation on Other Platforms installation on Windows, Simple Usage Example usage example, Installation on Other Platforms using, Using NumPy O online technique, Strengths and Weaknesses Open Web APIs, Open APIs optimization, Optimization, Group Travel, Representing Solutions, Representing Solutions, Representing Solutions, Representing Solutions, The Cost Function, The Cost Function, The Cost Function, Random Searching, Hill Climbing, Simulated Annealing, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Optimizing for Preferences, Optimizing for Preferences, The Cost Function, The Cost Function, Network Visualization, Network Visualization, Counting Crossed Lines, Drawing the Network, Exercises, Exercises, Exercises, Exercises, Exercises, Exercises, Exercises, Optimizing the Scale, Exercises, Optimization, Optimization annealing starting points, Exercises cost function, The Cost Function, Optimization exercises, Exercises genetic algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms crossover or breeding, Genetic Algorithms generation, Genetic Algorithms mutation, Genetic Algorithms population, Genetic Algorithms genetic optimization stopping criteria, Exercises group travel cost function, Exercises group travel planning, Group Travel, Representing Solutions, Representing Solutions, Representing Solutions, The Cost Function, The Cost Function car rental period, The Cost Function departure time, Representing Solutions price, Representing Solutions time, Representing Solutions waiting time, The Cost Function hill climbing, Hill Climbing line 
angle penalization, Exercises network visualization, Network Visualization, Counting Crossed Lines, Drawing the Network counting crossed lines, Counting Crossed Lines drawing networks, Drawing the Network layout problem, Network Visualization network vizualization, Network Visualization pairing students, Exercises preferences, Optimizing for Preferences, Optimizing for Preferences, The Cost Function, The Cost Function cost function, The Cost Function running, The Cost Function student dorm, Optimizing for Preferences random searching, Random Searching representing solutions, Representing Solutions round-trip pricing, Exercises simulated annealing, Simulated Annealing where it may not work, Genetic Algorithms optimization.py, Group Travel, Representing Solutions, Representing Solutions, The Cost Function, Random Searching, Hill Climbing, Simulated Annealing, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Optimizing the Scale annealingoptimize function, Simulated Annealing geneticoptimize function, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms, Genetic Algorithms elite, Genetic Algorithms maxiter, Genetic Algorithms mutprob, Genetic Algorithms popsize, Genetic Algorithms getminutes function, Representing Solutions hillclimb function, Hill Climbing printschedule function, Representing Solutions randomoptimize function, Random Searching schedulecost function, The Cost Function P PageRank algorithm, Real-Life Examples, The PageRank Algorithm pairing students, Exercises Pandora, Real-Life Examples parse tree, Programs As Trees Pearson correlation, Hierarchical Clustering, Viewing Data in Two Dimensions hierarchical clustering, Hierarchical Clustering multidimensional scaling, Viewing Data in Two Dimensions Pearson correlation coefficient, Pearson Correlation Score, Pearson Correlation Coefficient, Pearson Correlation Coefficient code, Pearson Correlation Coefficient Pilgrim, Mark, 
Universal Feed Parser polynomial transformation, The Kernel Trick poplib, Exercises population, Genetic Algorithms, What Is Genetic Programming?, Creating the Initial Population, Genetic Algorithms diversity and, Creating the Initial Population Porter Stemmer, Finding the Words on a Page Pr(Document), Exercises prediction markets, Real-Life Examples price models, Building a Sample Dataset, Building a Sample Dataset, k-Nearest Neighbors, Exercises, Exercises, Exercises, Exercises, Exercises, Exercises, Exercises, Exercises building sample dataset, Building a Sample Dataset eliminating variables, Exercises exercises, Exercises item types, Exercises k-nearest neighbors (kNN), k-Nearest Neighbors laptop dataset, Exercises leave-one-out cross-validation, Exercises optimizing number of neighbors, Exercises search attributes, Exercises varying ss for graphing probability, Exercises probabilities, Calculating Probabilities, Starting with a Reasonable Guess, Probability of a Whole Document, A Quick Introduction to Bayes' Theorem, Combining the Probabilities, Graphing the Probabilities, Conditional Probability assumed probability, Starting with a Reasonable Guess Bayes' Theorem, A Quick Introduction to Bayes' Theorem combining, Combining the Probabilities conditional probability, Calculating Probabilities graphing, Graphing the Probabilities of entire document given classification, Probability of a Whole Document product marketing, Other Uses for Learning Algorithms public message boards, Filtering Spam pydelicious, Simple Usage Example, Simple Usage Example, pydelicious installation, Simple Usage Example usage example, Simple Usage Example pysqlite, Building the Index, Persisting the Trained Classifiers, Installation on All Platforms, Installation on All Platforms, pysqlite, Simple Usage Example importing, Persisting the Trained Classifiers installation on other platforms, Installation on All Platforms installation on Windows, Installation on All Platforms usage example, 
Simple Usage Example Python, Style of Examples, Python Tips advantages of, Style of Examples tips, Python Tips Python Imaging Library (PIL), Drawing the Dendrogram, Python Imaging Library, Installation on Windows, Installation on Windows, Installation on Windows installation on other platforms, Installation on Windows usage example, Installation on Windows Windows installation, Installation on Windows Python, genetic programming and, Programs As Trees, Programs As Trees, Representing Trees in Python, Building and Evaluating Trees, Displaying the Program building and evaluating trees, Building and Evaluating Trees displaying program, Displaying the Program representing trees, Representing Trees in Python traversing complete tree, Programs As Trees Q query layer, Design of a Click-Tracking Network querying, Querying, Querying query function, Querying R radial-basis function, The Kernel Trick random searching, Random Searching random-restart hill climbing, Hill Climbing ranking, What's in a Search Engine?

, Exercises marketing, Other Uses for Learning Algorithms mass-and-spring algorithm, The Layout Problem matchmaker dataset, Matchmaker Dataset, Difficulties with the Data, Decision Tree Classifier, Categorical Features, Creating the New Dataset, Creating the New Dataset, Applying SVM to the Matchmaker Dataset categorical features, Categorical Features creating new, Creating the New Dataset decision tree algorithm, Decision Tree Classifier difficulties with data, Difficulties with the Data LIBSVM, applying to, Applying SVM to the Matchmaker Dataset scaling data, Creating the New Dataset matchmaker.csv file, Matchmaker Dataset mathematical formulas, Euclidean Distance, Euclidean Distance, Pearson Correlation Coefficient, Weighted Mean, Tanimoto Coefficient, Conditional Probability, Gini Impurity, Entropy, Variance, Gaussian Function, Dot-Products conditional probability, Conditional Probability dot-product, Dot-Products entropy, Entropy Euclidean distance, Euclidean Distance Gaussian function, Gaussian Function Gini impurity, Gini Impurity Pearson correlation coefficient, Pearson Correlation Coefficient Tanimoto coefficient, Tanimoto Coefficient variance, Variance weighted mean, Weighted Mean matplotlib, Graphing the Probabilities, matplotlib, Installation, Simple Usage Example installation, Installation usage example, Simple Usage Example matrix math, Clustering, A Quick Introduction to Matrix Math, A Quick Introduction to Matrix Math, What Does This Have to Do with the Articles Matrix?


pages: 321

Finding Alphas: A Quantitative Approach to Building Trading Strategies by Igor Tulchinsky

algorithmic trading, asset allocation, automated trading system, backtesting, barriers to entry, business cycle, buy and hold, capital asset pricing model, constrained optimization, corporate governance, correlation coefficient, credit crunch, Credit Default Swap, discounted cash flows, discrete time, diversification, diversified portfolio, Eugene Fama: efficient market hypothesis, financial intermediation, Flash crash, implied volatility, index arbitrage, index fund, intangible asset, iterative process, Long Term Capital Management, loss aversion, market design, market microstructure, merger arbitrage, natural language processing, passive investing, pattern recognition, performance metric, popular capitalism, prediction markets, price discovery process, profit motive, quantitative trading / quantitative finance, random walk, Renaissance Technologies, risk tolerance, risk-adjusted returns, risk/return, selection bias, sentiment analysis, shareholder value, Sharpe ratio, short selling, Silicon Valley, speech recognition, statistical arbitrage, statistical model, stochastic process, survivorship bias, systematic trading, text mining, transaction costs, Vanguard fund, yield curve

Note: tests usually compute correlation over a window of two or four years rather than the full history, to save computational resources.

Pearson Correlation Coefficient

The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient, has no units and can take values from −1 to 1. The mathematical formula was first developed by Karl Pearson in 1895:

$$r(P_i, P_j) = \frac{\operatorname{cov}(P_i, P_j)}{\sigma_{P_i}\,\sigma_{P_j}} \tag{2}$$

where $\operatorname{cov}(P_i, P_j) = E\left[(P_i - \mu_{P_i})(P_j - \mu_{P_j})\right]$ is the covariance and $\sigma_{P_i}$ and $\sigma_{P_j}$ are the standard deviations of $P_i$ and $P_j$, respectively. For two vectors of PnLs, the coefficient is computed by using the sample covariance and variances. In particular,

$$r = \frac{\sum_{t=1}^{n} (P_{it} - \bar{P}_i)(P_{jt} - \bar{P}_j)}{\sqrt{\sum_{t=1}^{n} (P_{it} - \bar{P}_i)^2}\,\sqrt{\sum_{t=1}^{n} (P_{jt} - \bar{P}_j)^2}}. \tag{3}$$

The coefficient is invariant to linear transformations of either variable that use a positive scale factor; a negative scale factor flips its sign. If the sign of the correlation coefficient is positive, it means that the PnLs of the two alphas tend to move in the same direction.
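A minimal Python sketch of Equation 3, assuming each alpha's PnL is a plain list of daily values. The function name `pearson` and the sample PnLs are hypothetical, chosen only to illustrate the computation and how the sign behaves under linear transformations.

```python
import math


def pearson(p_i, p_j):
    """Sample Pearson correlation of two equal-length PnL series (Equation 3)."""
    n = len(p_i)
    if n != len(p_j) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    mean_i = sum(p_i) / n
    mean_j = sum(p_j) / n
    dev_i = [x - mean_i for x in p_i]
    dev_j = [y - mean_j for y in p_j]
    numerator = sum(a * b for a, b in zip(dev_i, dev_j))
    denominator = math.sqrt(sum(a * a for a in dev_i)) * math.sqrt(sum(b * b for b in dev_j))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant series")
    return numerator / denominator


# Hypothetical daily PnLs for two alphas (illustrative numbers only)
pnl_a = [1.2, -0.5, 0.8, 2.1, -1.0, 0.3]
pnl_b = [0.9, -0.2, 0.5, 1.7, -1.3, 0.1]

r = pearson(pnl_a, pnl_b)
# Rescaling by a positive factor and shifting leaves r unchanged...
r_rescaled = pearson([3.0 * x + 5.0 for x in pnl_a], pnl_b)
# ...while a negative factor flips the sign.
r_flipped = pearson([-2.0 * x for x in pnl_a], pnl_b)
print(r, r_rescaled, r_flipped)
```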

These metrics are derived mainly from the alpha’s profit and loss (PnL). For example, the information ratio is simply the average return divided by the standard deviation of returns. Another key quality of an alpha is its uniqueness, which is evaluated by the correlation coefficient between a given alpha and other existing alphas. An alpha with a lower correlation coefficient is normally considered to add more value to the pool of existing alphas. If the number of alphas in the pool is small, correlation matters relatively little. As the number of alphas increases, however, the techniques used to measure the correlation among them become more important in helping the investor diversify his or her portfolio. Portfolio managers will want to include relatively uncorrelated alphas in their portfolios because a diversified portfolio helps to reduce risk.
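The two metrics in this paragraph can be sketched in a few lines of Python, assuming daily returns stored as lists and an existing pool held in a dictionary. The names `information_ratio` and `max_correlation_to_pool`, and all return figures, are hypothetical. The information ratio follows the text (mean return over standard deviation, here the sample standard deviation); uniqueness is scored as the highest correlation with any alpha already in the pool.

```python
import math
import statistics


def information_ratio(returns):
    """Average return divided by the (sample) standard deviation of returns."""
    return statistics.mean(returns) / statistics.stdev(returns)


def pearson(x, y):
    """Sample Pearson correlation of two equal-length return series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def max_correlation_to_pool(candidate, pool):
    """Highest correlation between a candidate alpha and any alpha already in the pool.
    Lower values suggest the candidate adds more diversification."""
    return max(pearson(candidate, existing) for existing in pool.values())


# Hypothetical daily returns (illustrative only)
pool = {
    "alpha_1": [0.4, -0.1, 0.3, 0.2, -0.2, 0.5],
    "alpha_2": [0.1, 0.2, -0.3, 0.4, 0.0, -0.1],
}
candidates = {
    "new_a": [0.5, -0.2, 0.4, 0.3, -0.1, 0.6],  # tracks alpha_1 closely
    "new_b": [-0.1, 0.3, 0.2, -0.2, 0.4, 0.1],
}
for name, rets in candidates.items():
    print(name, round(information_ratio(rets), 3), round(max_correlation_to_pool(rets, pool), 3))
```

Using the maximum signed correlation is only one simple choice; one could equally look at absolute correlations or an average over the pool.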

The formula transforms input pairs of vectors $(P_i, P_j)$ into time-scaled vectors and then computes the angle between the two scaled vectors:

$$P'_i = \left(w_1 P_{i1}, w_2 P_{i2}, \ldots, w_n P_{in}\right)^T \in \mathbb{R}^n, \qquad P'_j = \left(w_1 P_{j1}, w_2 P_{j2}, \ldots, w_n P_{jn}\right)^T \in \mathbb{R}^n. \tag{7}$$

As a result, the temporal-based correlation still preserves many desirable properties of the traditional dot product, such as commutativity, distributivity, and bilinearity. The Pearson correlation coefficient can also be computed for the two scaled vectors in Equation 7. Note that being uncorrelated, in the sense of the Pearson correlation coefficient, is a property of the centered variables (i.e. the mean of each vector is subtracted from the elements of that vector), while orthogonality is a property of the raw variables. Zero correlation implies that the two demeaned vectors are orthogonal. Demeaning often changes the direction of each vector and hence the angle between the two vectors.
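The distinction between orthogonality (raw vectors) and zero correlation (demeaned vectors) can be checked numerically. This Python sketch assumes the weights w_t are supplied by the caller; the helper names, the weights and the example vectors are hypothetical. `cosine` gives the cosine of the angle between the vectors once they are scaled as in Equation 7, and `pearson` is the same cosine applied after demeaning.

```python
import math


def scale(p, w):
    """Time-scale a PnL vector: (w1*P1, ..., wn*Pn), as in Equation 7."""
    return [wt * pt for wt, pt in zip(w, p)]


def cosine(u, v):
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def demean(u):
    """Subtract the vector's mean from each element."""
    m = sum(u) / len(u)
    return [a - m for a in u]


def pearson(u, v):
    """Pearson correlation: the cosine of the angle between the demeaned vectors."""
    return cosine(demean(u), demean(v))


# Hypothetical weights that emphasise recent days
w = [0.5, 0.75, 1.0]
p_i, p_j = [1.0, 2.0, 3.0], [1.0, 0.0, 1.0]
temporal = cosine(scale(p_i, w), scale(p_j, w))
print(temporal, pearson(scale(p_i, w), scale(p_j, w)))

# Orthogonal raw vectors can still be correlated...
print(cosine([1, 2, 3], [3, 0, -1]), pearson([1, 2, 3], [3, 0, -1]))
# ...and uncorrelated vectors need not be orthogonal.
print(cosine([1, 2, 3], [1, 0, 1]), pearson([1, 2, 3], [1, 0, 1]))
```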


pages: 312 words: 35,664

The Mathematics of Banking and Finance by Dennis W. Cox, Michael A. A. Cox

barriers to entry, Brownian motion, call centre, correlation coefficient, fixed income, G4S, inventory management, iterative process, linear programming, meta analysis, meta-analysis, pattern recognition, random walk, traveling salesman, value at risk

So, from above we know that the gradient of the line is estimated by

$$\hat{a} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}$$

This may be written in a more compact form since the variance (section 5.6) is

$$\operatorname{var}(x) = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1}$$

The covariance may be similarly written as:

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n-1} \tag{*}$$

We can then write the equation for the estimated gradient in a more compact form using this new notation:

$$\hat{a} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}$$

13.3 CORRELATION COEFFICIENT

A closely related term to the covariance is the correlation coefficient, $r(x, y)$, which is simply

$$r(x, y) = \frac{\operatorname{cov}(x, y)}{\operatorname{std}(x)\,\operatorname{std}(y)}$$

where the standard deviation of x (section 5.6) is given by

$$\operatorname{std}(x) = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n-1}}$$

There would be a similar expression for y. The correlation coefficient is a measure of the interdependence of two variables. The coefficient ranges in value from −1 to +1, indicating perfect negative correlation at −1, absence of correlation at zero, and perfect positive correlation at +1.
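The computational formulas above translate directly into code. This Python sketch (the function name and test data are hypothetical) accumulates the sums Σx_i, Σy_i, Σx_iy_i, Σx_i² and Σy_i², forms var(x), var(y) and cov(x, y) with the n − 1 divisor, and returns â = cov(x, y)/var(x) together with r(x, y). The intercept uses the standard least-squares result that the fitted line passes through (x̄, ȳ) and is included only for completeness.

```python
import math


def gradient_and_correlation(x, y):
    """Least-squares gradient a_hat, intercept b_hat and correlation r from raw sums."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need paired samples with n >= 2")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)

    var_x = (sum_xx - sum_x ** 2 / n) / (n - 1)
    var_y = (sum_yy - sum_y ** 2 / n) / (n - 1)
    cov_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)  # equation (*)

    a_hat = cov_xy / var_x                           # gradient = cov(x, y) / var(x)
    b_hat = sum_y / n - a_hat * sum_x / n            # line passes through (x-bar, y-bar)
    r = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))
    return a_hat, b_hat, r


# Hypothetical data lying exactly on y = 2x + 1
print(gradient_and_correlation([1, 2, 3, 4], [3, 5, 7, 9]))
```

The raw-sums form mirrors the book's formulas, but it can lose precision when the values are large relative to their spread; subtracting the means first is the numerically safer variant.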

Two variables are positively correlated if the correlation coefficient is greater than zero and the line that is drawn to show a relationship between the items sampled has a positive gradient. If the correlation coefficient is negative, the variables are negatively correlated and the line drawn will have a negative gradient. The final option is that the correlation coefficient vanishes, in which case the gradient also vanishes (â = 0) and the variables are completely uncorrelated. This means that there is no linear relationship between the two variables. An example of this might be the time taken to process a batch of transactions and the movements in interest rates. As a rough guide, 100r² is the percentage of the total variation of the y population that is accounted for by its linear relationship with x. More precisely, a correlation is generally considered significant if the correlation coefficient, or r-value, exceeds in absolute value the critical value (r_crit) at n − 2 degrees of freedom.
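The significance rule can be sketched in Python as follows. It relies on the standard relationship between r and the t statistic, t = r√(n − 2)/√(1 − r²), which rearranges to r_crit = t_crit/√(t_crit² + n − 2). The function names are hypothetical, and the caller must supply the two-tailed t critical value (2.306 is the tabulated 5% value for 8 degrees of freedom); the sketch does not look it up.

```python
import math


def explained_percentage(r):
    """Rough guide from the text: 100 * r**2 percent of the variation in y."""
    return 100.0 * r ** 2


def r_critical(t_crit, n):
    """Critical |r| at n - 2 degrees of freedom, from the matching two-tailed t critical value."""
    df = n - 2
    if df < 1:
        raise ValueError("need at least 3 observations")
    return t_crit / math.sqrt(t_crit ** 2 + df)


def is_significant(r, n, t_crit):
    """Significant if |r| exceeds the critical value."""
    return abs(r) > r_critical(t_crit, n)


# Example: n = 10 pairs (8 degrees of freedom), 5% two-tailed test
n, t_crit = 10, 2.306
r = 0.70
print(explained_percentage(r), r_critical(t_crit, n), is_significant(r, n, t_crit))
```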

Index a notation 103–4, 107–20, 135–47 linear regression 103–4, 107–20 slope significance test 112–20 variance 112 abscissa see horizontal axis absolute value, notation 282–4 accuracy and reliability, data 17, 47 adaptive resonance theory 275 addition, mathematical notation 279 addition of normal variables, normal distribution 70 addition rule, probability theory 24–5 additional variables, linear programming 167–70 adjusted cash flows, concepts 228–9 adjusted discount rates, concepts 228–9 Advanced Measurement Approach (AMA) 271 advertising allocation, linear programming 154–7 air-conditioning units 182–5 algorithms, neural networks 275–6 alternatives, decisions 191–4 AMA see Advanced Measurement Approach analysis data 47–52, 129–47, 271–4 Latin squares 131–2, 143–7 linear regression 110–20 projects 190–2, 219–25, 228–34 randomised block design 129–35 sampling 47–52, 129–47 scenario analysis 40, 193–4, 271–4 trends 235–47 two-way classification 135–47 variance 110–20, 121–7 anonimised databases, scenario analysis 273–4 ANOVA (analysis of variance) concepts 110–20, 121–7, 134–47 examples 110–11, 123–7, 134–40 formal background 121–2 linear regression 110–20 randomised block design 134–5, 141–3 tables 110–11, 121–3, 134–47 two-way classification 136–7 appendix 279–84 arithmetic mean, concepts 37–45, 59–60, 65–6, 67–74, 75–81 assets classes 149–57 reliability 17, 47, 215–18, 249–60 replacement of assets 215–18, 249–60 asymptotic distributions 262 ATMs 60 averages see also mean; median; mode concepts 37–9 b notation 103–4, 107–20, 132–5 linear regression 103–4, 107–20 variance 112 back propagation, neural networks 275–7 backwards recursion 179–87 balance sheets, stock 195 bank cashier problem, Monte Carlo simulation 209–12 Bank for International Settlements (BIS) 267–9, 271 banks Basel Accord 262, 267–9, 271 failures 58 loss data 267–9, 271–4 modelling 75–81, 85, 97, 267–9, 271–4 profitable loans 159–66 bar charts comparative data 10–12 concepts 7–12, 54, 56, 59, 
205–6, 232–3 discrete data 7–12 examples 9–12, 205–6, 232–3 bar charts (cont.) narrative explanations 10 relative frequencies 8–12 rules 8–9 uses 7–12, 205–6, 232–3 base rates, trends 240 Basel Accord 262, 267–9, 271 bathtub curves, reliability concepts 249–51 Bayes’ theorem, probability theory 27–30, 31 bell-shaped normal distribution see normal distribution bi-directional associative memory 275 bias 1, 17, 47–50, 51–2, 97, 129–35 randomised block design 129–35 sampling 17, 47–50, 51–2, 97, 129–35 skewness 41–5 binomial distribution concepts 55–8, 61–5, 71–2, 98–9, 231–2 examples 56–8, 61–5, 71–2, 98–9 net present value (NPV) 231–2 normal distribution 71–2 Pascal’s triangle 56–7 uses 55, 57, 61–5, 71–2, 98–9, 231–2 BIS see Bank for International Settlements boards of directors 240–1 break-even analysis, concepts 229–30 Brownian motion 22 see also random walks budgets 149–57 calculators, log functions 20, 61 capital Basel Accord 262, 267–9, 271 cost of capital 219–25, 229–30 cash flows adjusted cash flows 228–9 future cash flows 219–25, 227–34, 240–1 net present value (NPV) 219–22, 228–9, 231–2 standard deviation 232–4 central limit theorem concepts 70, 75 examples 70 chi-squared test concepts 83–4, 85, 89, 91–5 contingency tables 92–5 examples 83–4, 85, 89, 91–2 goodness of fit test 91–5 multi-way tables 94–5 tables 84, 91 Chu Shi-Chieh’s Ssu Yuan Yü Chien 56 circles, tree diagrams 30–5 class intervals concepts 13–20, 44–5, 63–4, 241–7 histograms 13–20, 44–5 mean calculations 44–5 mid-points 44–5, 241–7 notation 13–14, 20 Sturges’s formula 20 variance calculations 44–5 classical approach, probability theory 22, 27 cluster sampling 50 coin-tossing examples, probability theory 21–3, 53–4 collection techniques, data 17, 47–52, 129–47 colours, graphical presentational approaches 9 combination, probability distribution (density) functions 54–8 common logarithm (base 10) 20 communications, decisions 189–90 comparative data, bar charts 10–12 comparative
histograms see also histograms examples 14–19 completed goods 195 see also stock . . . conditional probability, concepts 25–7, 35 confidence intervals, concepts 71, 75–81, 105, 109, 116–20, 190, 262–5 constraining equations, linear programming 159–70 contingency tables, concepts 92–5 continuous approximation, stock control 200–1 continuous case, failures 251 continuous data concepts 7, 13–14, 44–5, 65–6, 251 histograms 7, 13–14 continuous uniform distribution, concepts 64–6 correlation coefficient concepts 104–20, 261–5, 268–9 critical value 105–6, 113–20 equations 104–5 examples 105–8, 115–20 costs capital 219–25, 229–30 dynamic programming 180–82 ghost costs 172–7 holding costs 182–5, 197–201, 204–8 linear programming 167–70, 171–7 sampling 47 stock control 182–5, 195–201 transport problems 171–7 trend analysis 236–47 types 167–8, 182 counting techniques, probability distribution (density) functions 54 covariance see also correlation coefficient concepts 104–20, 263–5 credit cards 159–66, 267–9 credit derivatives 97 see also derivatives credit risk, modelling 75, 149, 261–5 critical value, correlation coefficient 105–6, 113–20 cumulative frequency polygons concepts 13–20, 39–40, 203 examples 14–20 uses 13–14 current costs, linear programming 167–70 cyclical variations, trends 238–47 data analysis methods 47–52, 129–47, 271–4 collection techniques 17, 47–52, 129–47 continuous/discrete types 7–12, 13–14, 44–5, 53–5, 65–6, 72, 251 design/approach to analysis 129–47 errors 129–47 graphical presentational approaches 1–20, 149–57 identification 2–5, 261–5 Latin squares 131–2, 143–7 loss data 267–9, 271–4 neural networks 275–7 qualities 17, 47 randomised block design 129–35 reliability and accuracy 17, 47 sampling 17, 47–52 time series 235–47 trends 5, 10, 235–47 two-way classification analysis 135–47 data points, scatter plots 2–5 databases, loss databases 272–4 debentures 149–57 decisions alternatives 191–4 Bayes’ theorem 27–30, 31 communications 189–90 concepts
21–35, 189–94, 215–25, 228–34, 249–60 courses of action 191–2 definition 21 delegation 189–90 empowerment 189–90 guesswork 191 lethargy pitfalls 189 minimax regret rule 192–4 modelling problems 189–91 Monty Hall problem 34–5, 212–13 pitfalls 189–94 probability theory 21–35, 53–66, 189–94, 215–18 problem definition 129, 190–2 project analysis guidelines 190–2, 219–25, 228–34 replacement of assets 215–18, 249–60 staff 189–90 steps 21 stock control 195–201, 203–8 theory 189–94 degrees of freedom 70–1, 75–89, 91–5, 110–20, 136–7 ANOVA (analysis of variance) 110–20, 121–7, 136–7 concepts 70–1, 75–89, 91–5, 110–20, 136–7 delegation, decisions 189–90 density functions see also probability distribution (density) functions concepts 65–6, 67, 83–4 dependent variables, concepts 2–5, 103–20, 235 derivatives 58, 97–8, 272 see also credit . . . ; options design/approach to analysis, data 129–47 dice-rolling examples, probability theory 21–3, 53–5 differentiation 251 discount factors adjusted discount rates 228–9 net present value (NPV) 220–1, 228–9, 231–2 discrete data bar charts 7–12, 13 concepts 7–12, 13, 44–5, 53–5, 72 discrete uniform distribution, concepts 53–5 displays see also presentational approaches data 1–5 Disraeli, Benjamin 1 division notation 280, 282 dynamic programming complex examples 184–7 concepts 179–87 costs 180–82 examples 180–87 principle of optimality 179–87 returns 179–80 schematic 179–80 ‘travelling salesman’ problem 185–7 e-mail surveys 50–1 economic order quantity see also stock control concepts 195–201 examples 196–9 empowerment, staff 189–90 error sum of the squares (SSE), concepts 122–5, 133–47 errors, data analysis 129–47 estimates mean 76–81 probability theory 22, 25–6, 31–5, 75–81 Euler, L.
131 events independent events 22–4, 35, 58, 60, 92–5 mutually exclusive events 22–4, 58 probability theory 21–35, 58–66, 92–5 scenario analysis 40, 193–4, 271–4 tree diagrams 30–5 Excel 68, 206–7 exclusive events see mutually exclusive events expected errors, sensitivity analysis 268–9 expected value, net present value (NPV) 231–2 expert systems 275 exponent notation 282–4 exponential distribution, concepts 65–6, 209–10, 252–5 external fraud 272–4 extrapolation 119 extreme value distributions, VaR 262–4 F distribution ANOVA (analysis of variance) 110–20, 127, 134–7 concepts 85–9, 110–20, 127, 134–7 examples 85–9, 110–20, 127, 137 tables 85–8 f notation 8–9, 13–20, 26, 38–9, 44–5, 65–6, 85 factorial notation 53–5, 283–4 failure probabilities see also reliability replacement of assets 215–18, 249–60 feasibility polygons 152–7, 163–4 finance selection, linear programming 164–6 fire extinguishers, ANOVA (analysis of variance) 123–7 focus groups 51 forward recursion 179–87 four by four tables 94–5 fraud 272–4, 276 Fréchet distribution 262 frequency concepts 8–9, 13–20, 37–45 cumulative frequency polygons 13–20, 39–40, 203 graphical presentational approaches 8–9, 13–20 frequentist approach, probability theory 22, 25–6 future cash flows 219–25, 227–34, 240–1 fuzzy logic 276 Garbage In, Garbage Out (GIGO) 261–2 general rules, linear programming 167–70 genetic algorithms 276 ghost costs, transport problems 172–7 goodness of fit test, chi-squared test 91–5 gradient (a notation), linear regression 103–4, 107–20 graphical method, linear programming 149–57, 163–4 graphical presentational approaches concepts 1–20, 149–57, 235–47 rules 8–9 greater-than notation 280–4 Greek alphabet 283 guesswork, modelling 191 histograms 2, 7, 13–20, 41, 73 class intervals 13–20, 44–5 comparative histograms 14–19 concepts 7, 13–20, 41, 73 continuous data 7, 13–14 examples 13–20, 73 skewness 41 uses 7, 13–20 holding costs 182–5, 197–201, 204–8 home insurance 10–12 Hopfield 275 horizontal
axis bar charts 8–9 histograms 14–20 linear regression 103–4, 107–20 scatter plots 2–5, 103 hypothesis testing concepts 77–81, 85–95, 110–27 examples 78–80, 85 type I and type II errors 80–1 i notation 8–9, 13–20, 28–30, 37–8, 103–20 identification data 2–5, 261–5 trends 241–7 identity rule 282 impact assessments 21, 271–4 independent events, probability theory 22–4, 35, 58, 60, 92–5 independent variables, concepts 2–5, 70, 103–20, 235 infinity, normal distribution 67–72 information, quality needs 190–4 initial solution, linear programming 167–70 insurance industry 10–12, 29–30 integers 280–4 integration 65–6, 251 intercept (b notation), linear regression 103–4, 107–20 interest rates base rates 240 daily movements 40, 261 project evaluation 219–25, 228–9 internal rate of return (IRR) concepts 220–2, 223–5 examples 220–2 interpolation, IRR 221–2 interviews, uses 48, 51–2 inventory control see stock control investment strategies 149–57, 164–6, 262–5 IRR see internal rate of return iterative processes, linear programming 170 j notation 28–30, 37, 104–20, 121–2 JP Morgan 263 k notation 20, 121–7 ‘know your customer’ 272 Kohonen self-organising maps 275 Latin squares concepts 131–2, 143–7 examples 143–7 lead times, stock control 195–201 learning strategies, neural networks 275–6 less-than notation 281–4 lethargy pitfalls, decisions 189 likelihood considerations, scenario analysis 272–3 linear programming additional variables 167–70 concepts 149–70 concerns 170 constraining equations 159–70 costs 167–70, 171–7 critique 170 examples 149–57, 159–70 finance selection 164–6 general rules 167–70 graphical method 149–57, 163–4 initial solution 167–70 iterative processes 170 manual preparation 170 most profitable loans 159–66 optimal advertising allocation 154–7 optimal investment strategies 149–57, 164–6 returns 149–57, 164–6 simplex method 159–70, 171–2 standardisation 167–70 time constraints 167–70 transport problems 171–7 linear regression analysis 110–20 ANOVA
(analysis of variance) 110–20 concepts 3, 103–20 equation 103–4 examples 107–20 gradient (a notation) 103–4, 107–20 intercept (b notation) 103–4, 107–20 interpretation 110–20 notation 103–4 residual sum of the squares 109–20 slope significance test 112–20 uncertainties 108–20 literature searches, surveys 48 loans finance selection 164–6 linear programming 159–66 risk assessments 159–60 log-normal distribution, concepts 257–8 logarithms (logs), types 20, 61 losses, banks 267–9, 271–4 lotteries 22 lower/upper quartiles, concepts 39–41 m notation 55–8 mail surveys 48, 50–1 management information, graphical presentational approaches 1–20 Mann–Whitney test see U test manual preparation, linear programming 170 margin of error, project evaluation 229–30 market prices, VaR 264–5 marketing brochures 184–7 mathematics 1, 7–8, 196–9, 219–20, 222–5, 234, 240–1, 251, 279–84 matrix plots, concepts 2, 4–5 matrix-based approach, transport problems 171–7 maximum and minimum, concepts 37–9, 40, 254–5 mean comparison of two sample means 79–81 comparisons 75–81 concepts 37–45, 59–60, 65–6, 67–74, 75–81, 97–8, 100–2, 104–27, 134–5 confidence intervals 71, 75–81, 105, 109, 116–20, 190, 262–5 continuous data 44–5, 65–6 estimates 76–81 hypothesis testing 77–81 linear regression 104–20 normal distribution 67–74, 75–81, 97–8 sampling 75–81 mean square causes (MSC), concepts 122–7, 134–47 mean square errors (MSE), ANOVA (analysis of variance) 110–20, 121–7, 134–7 median, concepts 37, 38–42, 83, 98–9 mid-points class intervals 44–5, 241–7 moving averages 241–7 minimax regret rule, concepts 192–4 minimum and maximum, concepts 37–9, 40 mode, concepts 37, 39, 41 modelling banks 75–81, 85, 97, 267–9, 271–4 concepts 75–81, 83, 91–2, 189–90, 195–201, 215–18, 261–5 decision-making pitfalls 189–91 economic order quantity 195–201 modelling (cont.)
guesswork 191 neural networks 275–7 operational risk 75, 262–5, 267–9, 271–4 output reviews 191–2 replacement of assets 215–18, 249–60 VaR 261–5 moments, density functions 65–6, 83–4 money laundering 272–4 Monte Carlo simulation bank cashier problem 209–12 concepts 203–14, 234 examples 203–8 Monty Hall problem 212–13 queuing problems 208–10 random numbers 207–8 stock control 203–8 uses 203, 234 Monty Hall problem 34–5, 212–13 moving averages concepts 241–7 even numbers/observations 244–5 moving totals 245–7 MQMQM plot, concepts 40 MSC see mean square causes MSE see mean square errors multi-way tables, concepts 94–5 multiplication notation 279–80, 282 multiplication rule, probability theory 26–7 multistage sampling 50 mutually exclusive events, probability theory 22–4, 58 n notation 7, 20, 28–30, 37–45, 54–8, 103–20, 121–7, 132–47, 232–4 n!


pages: 447 words: 104,258

Mathematics of the Financial Markets: Financial Instruments and Derivatives Modelling, Valuation and Risk Issues by Alain Ruttiens

algorithmic trading, asset allocation, asset-backed security, backtesting, banking crisis, Black Swan, Black-Scholes formula, Brownian motion, capital asset pricing model, collateralized debt obligation, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, delta neutral, discounted cash flows, discrete time, diversification, fixed income, implied volatility, interest rate derivative, interest rate swap, margin call, market microstructure, martingale, p-value, passive investing, quantitative trading / quantitative finance, random walk, risk/return, Satyajit Das, Sharpe ratio, short selling, statistical model, stochastic process, stochastic volatility, time value of money, transaction costs, value at risk, volatility smile, Wiener process, yield curve, zero-coupon bond

A famous example is when (x, y) can take only the values (0, 1), (0, −1), (1, 0) or (−1, 0), with equal probability. The linear correlation of x and y is 0, although they are clearly dependent: if x = 0, y can only be 1 or −1, and if x ≠ 0, y = 0. Coming back to the general case where φx(x) and φy(y) are not Gaussian, this inference cannot be made. Typically, the classic “rank correlation” coefficient of Spearman shows the way around the problem: this rank correlation is a linear correlation coefficient of the variates, now transformed in a non-linear way by a probability transformation, that is, by their respective cumulative marginal distributions Φx(x) and Φy(y). The Spearman correlation is a correlation measure that can be computed from these relationships and from the general formula for ρx,y above, but, as a step further, we can link the above Φx(x), Φy(y) and Φ(x, y) relationships in a more general way that defines C, named a copula of two variables x and y, as a cumulative probability function of the marginal cumulative probabilities Φx(x), Φy(y) of x and y. A copula is thus a general measure of co-dependence between two variates, which is independent of their individual marginal distributions – see Figure 13.5.
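The four-point example and the rank-based workaround can both be checked in a short Python sketch; the helper names are hypothetical. In a sample, a value's rank divided by n is its empirical cumulative probability, so the linear (Pearson) correlation of the ranks is the sample counterpart of correlating Φx(x) and Φy(y). Tied values receive their average rank.

```python
import math


def pearson(x, y):
    """Linear (Pearson) correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def ranks(values):
    """Ranks 1..n; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the linear correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# The four equally likely points from the text
x = [0, 0, 1, -1]
y = [1, -1, 0, 0]
print(pearson(x, y))                                    # linear correlation is 0
print(pearson([a * a for a in x], [b * b for b in y]))  # but x^2 + y^2 = 1 always
```

For a monotone but non-linear relationship such as y = x³, the rank correlation is exactly 1 while the linear correlation stays below 1.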

It is indeed based on several restrictive hypotheses: Hypotheses related to financial assets: Asset returns r are modeled by a random variable with a Gaussian probability distribution, fully determined by its first two moments, namely its expected value E and its variance V, although instead of V, the theory uses the corresponding standard deviation STD (STD = √V). Returns of different financial assets i and j are correlated through the linear correlation coefficient ρij. Markets are efficient – practically speaking, we observe that the more liquid a market, the more efficient it is. The theory is built on mid prices (the average of the market's quoted bid and offer (or ask) prices): the market bid–offer spread is thus not considered here. Various costs such as brokerage fees, taxes, and so on are not taken into account (they depend too heavily on local circumstances, market features, and the investor's situation).

For example, in 2006, based on successive daily close prices, the return and risk of L'Oreal were 20% and 19%, respectively.
Figure 4.3 Example of a stock shown in an (r, σ) graph
4.3.3 The Markowitz model
Markowitz's goal was to optimize the budget allocation to a portfolio P of n stocks Si(ri, σi), weighted by wi, with 0 ≤ wi ≤ 1 and ∑wi = 1, so that for P:
$$r_P = \sum_{i=1}^{n} w_i r_i, \qquad \sigma_P = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j \rho_{ij} \sigma_i \sigma_j} \qquad (4.1)$$
where the ρij correlation coefficients are computed by
$$\rho_{ij} = \frac{\operatorname{cov}(r_i, r_j)}{\sigma_i \sigma_j}.$$
In an (r, σ) chart, it is possible, for a given past period of data, to locate by a point any Si(ri, σi), but also any possible weighted combination of up to n stocks, defining points that represent portfolios, among which the optimal ones have to be identified. Plotting this graph shows that there is a (non-linear) “frontier” of possible portfolios presenting the highest return for different levels of risk.
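A minimal sketch of how a portfolio's (r, σ) point can be computed from per-stock returns, volatilities and correlations, assuming the standard Markowitz expressions: the portfolio return is the weighted sum of the w_i r_i, and the portfolio variance is the double sum of w_i w_j ρij σi σj. The function name and the two-stock inputs are hypothetical and chosen only for illustration.

```python
import math


def portfolio_return_and_risk(weights, returns, vols, corr):
    """Return (r_P, sigma_P), with r_P = sum w_i r_i and
    sigma_P^2 = sum_i sum_j w_i w_j rho_ij sigma_i sigma_j."""
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0 or w > 1 for w in weights):
        raise ValueError("weights must lie in [0, 1] and sum to 1")
    n = len(weights)
    r_p = sum(w * r for w, r in zip(weights, returns))
    var_p = sum(
        weights[i] * weights[j] * corr[i][j] * vols[i] * vols[j]
        for i in range(n)
        for j in range(n)
    )
    return r_p, math.sqrt(max(var_p, 0.0))  # clamp tiny negative rounding error


# Hypothetical two-stock inputs (illustrative only, not from the text).
returns = [0.20, 0.10]
vols = [0.19, 0.25]
corr = [[1.0, 0.3], [0.3, 1.0]]

# Trace a few points of the (r, sigma) curve by varying the weight of stock 1.
for w1 in (0.0, 0.25, 0.5, 0.75, 1.0):
    r_p, sigma_p = portfolio_return_and_risk([w1, 1.0 - w1], returns, vols, corr)
    print(f"w1={w1:.2f}  r_P={r_p:.4f}  sigma_P={sigma_p:.4f}")
```

Because ρ = 0.3 < 1, the 50/50 mix has a σ below the weighted average of the two stock volatilities; sweeping many weight vectors this way is what traces out the frontier.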


pages: 297 words: 91,141

Market Sense and Nonsense by Jack D. Schwager

3Com Palm IPO, asset allocation, Bernie Madoff, Brownian motion, buy and hold, collateralized debt obligation, commodity trading advisor, computerized trading, conceptual framework, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, diversified portfolio, fixed income, high net worth, implied volatility, index arbitrage, index fund, London Interbank Offered Rate, Long Term Capital Management, margin call, market bubble, market fundamentalism, merger arbitrage, negative equity, pattern recognition, performance metric, pets.com, Ponzi scheme, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, risk/return, Robert Shiller, selection bias, Sharpe ratio, short selling, statistical arbitrage, statistical model, survivorship bias, transaction costs, two-sided market, value at risk, yield curve

In this chapter, we take a closer look at correlation and some of the ways it is often misinterpreted.
Correlation Defined
The correlation coefficient, typically denoted by the letter r, measures the degree of linear relationship between two variables. The correlation coefficient ranges from −1.0 to +1.0. The closer the correlation coefficient is to +1.0, the stronger the positive linear relationship between the two variables. A perfect correlation of 1.0 would occur only in artificial situations. For example, the heights of a group of people measured in inches and the heights of the same group of people measured in feet would be perfectly correlated. The closer the correlation coefficient is to −1.0, the stronger the inverse correlation between the two variables.

For example, average winter temperatures in the U.S. Northeast and heating oil usage in that region would be inversely related variables (variables with a negative correlation coefficient). If two variables have a correlation coefficient near zero, there is little or no linear relationship between them. It is important to understand that the correlation coefficient only indicates the degree of correlation between two variables and does not imply anything about cause and effect.
Correlation Shows Linear Relationships
Correlation reflects only linear relationships. For example, Figure 9.1 illustrates the returns of a hypothetical stock index option selling strategy (selling out-of-the-money calls and puts) versus Standard & Poor’s (S&P) returns. Calls that expire below the strike price and puts that expire above the strike price generate profits equal to the premium collected.
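The option-selling strategy can be mimicked with a simple payoff rule to show how a strong but symmetric relationship yields a correlation of about zero. This is a rough sketch, assuming a hypothetical combined premium of 2 points and strikes 5 points either side of the starting index level, with P&L measured at expiry and no costs or margin; the function names are hypothetical.

```python
def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def short_strangle_pnl(index_return, premium=2.0, strike_distance=5.0):
    """Hypothetical expiry P&L, in percentage points, of selling an
    out-of-the-money call and put struck `strike_distance` points above and
    below the starting index level for a combined `premium`."""
    call_loss = max(index_return - strike_distance, 0.0)
    put_loss = max(-strike_distance - index_return, 0.0)
    return premium - call_loss - put_loss


# Symmetric grid of index returns from -10% to +10% in 0.5-point steps.
index_returns = [i / 2 for i in range(-20, 21)]
pnl = [short_strangle_pnl(x) for x in index_returns]

r_with_return = pearson_r(index_returns, pnl)
r_with_abs_return = pearson_r([abs(x) for x in index_returns], pnl)
print(f"corr(P&L, index return)   = {r_with_return:.6f}")
print(f"corr(P&L, |index return|) = {r_with_abs_return:.3f}")
```

The P&L depends only on the size of the index move, not its direction, so its linear correlation with the signed return vanishes while its correlation with the absolute return is strongly negative.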

Although Figure 9.1 clearly reflects a strong relationship between the strategy and S&P returns, the correlation between the two is actually zero! Why? Because correlation reflects only linear relationships, and there is no linear relationship between the two variables.
Figure 9.1 Strategy Returns versus S&P Returns
The Coefficient of Determination (r²)
The square of the correlation coefficient, which is called the coefficient of determination and is denoted as r², has a very specific interpretation: it represents the proportion (usually quoted as a percentage) of the variability of one variable that is explained by the other. For example, if the correlation coefficient (r) of a fund versus the S&P is 0.7, it implies that nearly half the variability of the fund’s returns is explained by the S&P returns (r² = 0.49). For a mutual fund that is a so-called closet benchmarker (a fund that maintains a portfolio very similar to the S&P index with only minor differences), the r² would tend to be very high (e.g., above 0.9).
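The "variability explained" reading of r² can be verified directly: fit a least-squares line of fund returns on index returns and compare 1 − SS_res/SS_tot with the squared correlation. A minimal sketch, with hypothetical return figures and a hypothetical helper name:

```python
def r_and_explained_share(x, y):
    """Return Pearson r and 1 - SS_res / SS_tot from an OLS fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return r, 1.0 - ss_res / syy


# Hypothetical monthly returns (%) for an index and a fund.
index = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5, -0.5]
fund = [0.8, -1.0, 2.5, 1.2, -2.0, 1.0, 0.3]
r, explained = r_and_explained_share(index, fund)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, share of variance explained = {explained:.3f}")
```

For a straight-line fit with an intercept, the two numbers agree exactly (up to rounding), which is why r = 0.7 translates into roughly half the variance explained.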


pages: 321 words: 97,661

How to Read a Paper: The Basics of Evidence-Based Medicine by Trisha Greenhalgh

call centre, complexity theory, conceptual framework, correlation coefficient, correlation does not imply causation, deskilling, knowledge worker, longitudinal study, meta analysis, meta-analysis, microbiome, New Journalism, p-value, personalized medicine, placebo effect, publication bias, randomized controlled trial, selection bias, the scientific method

If you do, you might be stuck with non-parametric tests, which aren't as much fun (see section ‘What sort of data have they got, and have they used appropriate statistical tests?’).
4. Ignore all withdrawals (‘drop outs’) and non-responders, so the analysis only concerns subjects who fully complied with treatment (see section ‘Were preliminary statistical questions addressed?’).
5. Always assume that you can plot one set of data against another and calculate an ‘r-value’ (Pearson correlation coefficient) (see section ‘Has correlation been distinguished from regression, and has the correlation coefficient (‘r-value’) been calculated and interpreted correctly?’), and that a ‘significant’ r-value proves causation (see section ‘Have assumptions been made about the nature and direction of causality?’).
6. If outliers (points that lie a long way from the others on your graph) are messing up your calculations, just rub them out. But if outliers are helping your case, even if they appear to be spurious results, leave them in (see section ‘Were ‘outliers’ analysed with both common sense and appropriate statistical adjustments?’).

Correlation, regression and causation Has correlation been distinguished from regression, and has the correlation coefficient (‘r-value’) been calculated and interpreted correctly? For many non-statisticians, the terms correlation and regression are synonymous, and refer vaguely to a mental image of a scatter graph with dots sprinkled messily along a diagonal line sprouting from the intercept of the axes. You would be right in assuming that if two things are not correlated, it will be meaningless to attempt a regression. But regression and correlation are both precise statistical terms that serve different functions [2]. The r-value (or to give it its official name, ‘Pearson’s product–moment correlation coefficient') is among the most overused statistical instruments in the book. Strictly speaking, the r-value is not valid unless certain criteria, as given here, are fulfilled. 1.

Every r-value should be accompanied by a p-value, which expresses how likely an association of this strength would be to have arisen by chance (see section ‘Have ‘p-values’ been calculated and interpreted appropriately?’), or a confidence interval, which expresses the range within which the ‘true’ R-value is likely to lie (see section ‘Have confidence intervals been calculated, and do the authors' conclusions reflect them?’). (Note that lower case ‘r’ represents the correlation coefficient of the sample, whereas upper case ‘R’ represents the correlation coefficient of the entire population.) Remember, too, that even if the r-value is an appropriate value to calculate from a set of data, it does not tell you whether the relationship, however strong, is causal (see subsequent text). The term regression refers to a mathematical equation that allows one variable (the target variable) to be predicted from another (the independent variable).
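One common way to attach a confidence interval (and an approximate p-value) to a sample r is Fisher's z-transformation. This sketch uses that technique, which is an assumption here rather than the method the text prescribes: z = atanh(r) is treated as roughly normal with standard error 1/√(n − 3). The exact test usually quoted for r uses a t-distribution with n − 2 degrees of freedom, so the p-value below is only an approximation. The r = 0.5, n = 30 inputs are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_interval(r, n, level=0.95):
    """Approximate confidence interval for the population correlation R,
    using Fisher's z = atanh(r) with standard error 1 / sqrt(n - 3)."""
    if not -1.0 < r < 1.0 or n <= 3:
        raise ValueError("need -1 < r < 1 and n > 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for the null hypothesis R = 0."""
    z_stat = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))


# Hypothetical example: r = 0.5 observed in a sample of 30 pairs.
low, high = fisher_z_interval(0.5, 30)
p = fisher_z_p_value(0.5, 30)
print(f"95% CI for R: ({low:.3f}, {high:.3f}); approximate p = {p:.4f}")
```

Note that the interval is asymmetric around r on the original scale; it is symmetric only after the z-transformation. As the text stresses, neither the interval nor the p-value says anything about causation.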


pages: 119 words: 10,356

Topics in Market Microstructure by Ilija I. Zovko

Brownian motion, computerized trading, continuous double auction, correlation coefficient, financial intermediation, Gini coefficient, information asymmetry, market design, market friction, market microstructure, Murray Gell-Mann, p-value, quantitative trading / quantitative finance, random walk, stochastic process, stochastic volatility, transaction costs

it assumes normally distributed disturbances, whereas we have discrete ternary values. Later in the text we use a bootstrap approach to test the significance. Now, however, we test the significance of the correlation coefficients using a standard algorithm as in ref. (Best and Roberts, 1975). The algorithm calculates the approximate tail probabilities for Spearman’s correlation coefficient ρ. Its precision unfortunately degrades when there are ties in the data, which is the case here. With this caveat in mind, as a preliminary test, we find that, for example, for on-book trading in Vodafone for the month of May 2000, 10.3% of all correlation coefficients are significant at the 5% level. Averaging over all stocks and months, the average percentage of significant coefficients for on-book trading is 10.5% ± 0.4%, while for off-book trading it is 20.7% ± 1.7%.
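Spearman's ρ is the Pearson correlation of ranks, and ternary data produce many ties. A minimal sketch, assuming the common convention of giving tied values the average of the ranks they span; it computes only the coefficient, not the tail probabilities of the Best and Roberts algorithm cited above. The series and their buy/inactive/sell coding are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values all receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end hold ranks start+1..end+1; use their mean.
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical ternary series (+1 buy, 0 inactive, -1 sell) for two members.
member_a = [1, 0, -1, 1, 0, 0, -1, 1]
member_b = [1, 0, -1, 0, 0, 1, -1, 1]
print(spearman_rho(member_a, member_b))
```

With only three distinct values most observations are tied, which is why rank-based tail-probability approximations lose precision on data like these.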

Both of these averages are substantially higher than the 5% we would expect by chance at the 5% significance level of the test.
4.2 Significance and structure in the correlation matrices
The preliminary result of the previous section that some correlation coefficients are non-random is further corroborated by testing for non-random structure in the correlation matrices. The hypothesis that there is structure in the correlation matrices contains the weaker hypothesis that some coefficients are statistically significant. The test for structure in the matrices would involve multiple joint tests for the significance of the coefficients. An alternative method, however, is to examine the eigenvalue spectrum of the correlation matrices. Intuitively, one can understand the relation between the two tests by remembering that eigenvalues λ are roots of the characteristic equation det(A − λI) = 0, and that the determinant is a sum …

However, being stronger, they are perhaps of a simpler nature: the second largest eigenvalue is almost never significant for off-book trading, while on the on-book market it is quite often significant.
4.2.3 Clustering of trading behaviour
The existence of significant eigenvalues allows us to use the correlation matrix as a distance measure in the attempt to classify institutions into groups of similar or dissimilar trading patterns. We apply clustering techniques using a metric chosen so that two strongly correlated institutions are ‘close’ and anti-correlated institutions are ‘far away’. A functional form fulfilling this requirement and satisfying the properties of being a metric is (Bonanno et al., 2000)
$$d_{i,j} = \sqrt{2\,(1 - \rho_{i,j})}, \qquad (4.2)$$
where ρi,j is the correlation coefficient between strategies i and j. We have tried several reasonable modifications to this form but without obvious differences in the results. Ultimately the choice of this metric is influenced by the fact that it has been successfully used in other studies (Bonanno et al., 2000). We use complete linkage clustering, in which the distance between two clusters is calculated as the maximum distance between their members.
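A small sketch of the procedure described above: convert correlations into distances with the Bonanno et al. metric d = √(2(1 − ρ)), then merge clusters greedily using complete linkage (cluster distance = largest pairwise member distance). This is a naive O(n³) illustration, not the authors' code, and the correlation matrix for four institutions is made up.

```python
import math


def corr_distance(rho):
    """Metric d = sqrt(2 * (1 - rho)): 0 for rho = 1, 2 for rho = -1.
    Clamped at 0 so that a correlation a hair above 1 from rounding is safe."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))


def complete_linkage(corr, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    complete-linkage distance (largest pairwise member distance) is smallest."""
    n = len(corr)
    dist = [[corr_distance(corr[i][j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return clusters


# Hypothetical correlation matrix for four institutions (made-up values).
corr = [
    [1.0, 0.8, -0.2, -0.1],
    [0.8, 1.0, -0.3, -0.2],
    [-0.2, -0.3, 1.0, 0.7],
    [-0.1, -0.2, 0.7, 1.0],
]
print(complete_linkage(corr, 2))  # [[0, 1], [2, 3]]
```

The two strongly correlated pairs end up in separate groups because anti-correlated institutions sit near the maximum distance of 2.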


Hands-On Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurelien Geron

Amazon Mechanical Turk, Bayesian statistics, centre right, combinatorial explosion, constrained optimization, correlation coefficient, crowdsourcing, en.wikipedia.org, iterative process, Netflix Prize, NP-complete, optical character recognition, P = NP, p-value, pattern recognition, performance metric, recommendation engine, self-driving car, SpamAssassin, speech recognition, statistical model

Finally, coefficients close to zero mean that there is no linear correlation. Figure 2-14 shows various plots along with the correlation coefficient between their horizontal and vertical axes. Figure 2-14. Standard correlation coefficient of various datasets (source: Wikipedia; public domain image) Warning The correlation coefficient only measures linear correlations (“if x goes up, then y generally goes up/down”). It may completely miss out on nonlinear relationships (e.g., “if x is close to zero then y generally goes up”). Note how all the plots of the bottom row have a correlation coefficient equal to zero despite the fact that their axes are clearly not independent: these are examples of nonlinear relationships. Also, the second row shows examples where the correlation coefficient is equal to 1 or –1; notice that this has nothing to do with the slope.
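The point that a correlation of 1 or –1 "has nothing to do with the slope" can be checked quickly: every exact straight-line relationship gives |r| = 1, whatever its steepness, and only the sign of the slope matters. A minimal sketch with hypothetical data and a hand-written `pearson_r` helper:

```python
def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values

# Exact straight lines with very different slopes (and a constant offset).
results = {}
for slope in (0.01, 1.0, 100.0, -3.0):
    y = [slope * v + 7.0 for v in x]
    results[slope] = pearson_r(x, y)
    print(f"slope={slope:>7}: r = {results[slope]:.6f}")
```

Rescaling one variable (inches to feet, or to nanometers) is exactly this kind of change, so it leaves r unchanged.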

The ocean proximity attribute may be useful as well, although in Northern California the housing prices in coastal districts are not too high, so it is not a simple rule.
Looking for Correlations
Since the dataset is not too large, you can easily compute the standard correlation coefficient (also called Pearson’s r) between every pair of attributes using the corr() method. The numeric_only=True argument skips the text column ocean_proximity, which pandas 2.0 and later would otherwise reject with an error:
corr_matrix = housing.corr(numeric_only=True)
Now let’s look at how much each attribute correlates with the median house value:
>>> corr_matrix["median_house_value"].sort_values(ascending=False)
median_house_value    1.000000
median_income         0.687170
total_rooms           0.135231
housing_median_age    0.114220
households            0.064702
total_bedrooms        0.047865
population           -0.026699
longitude            -0.047279
latitude             -0.142826
Name: median_house_value, dtype: float64
The correlation coefficient ranges from –1 to 1. When it is close to 1, it means that there is a strong positive correlation; for example, the median house value tends to go up when the median income goes up.

Also, the second row shows examples where the correlation coefficient is equal to 1 or –1; notice that this has nothing to do with the slope. For example, your height in inches has a correlation coefficient of 1 with your height in feet or in nanometers. Another way to check for correlation between attributes is to use Pandas’ scatter_matrix function, which plots every numerical attribute against every other numerical attribute. Since there are now 11 numerical attributes, you would get 11² = 121 plots, which would not fit on a page, so let’s just focus on a few promising attributes that seem most correlated with the median housing value (Figure 2-15):
from pandas.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
Figure 2-15. Scatter matrix
The main diagonal (top left to bottom right) would be full of straight lines if Pandas plotted each variable against itself, which would not be very useful.


pages: 436 words: 140,256

The Rise and Fall of the Third Chimpanzee by Jared Diamond

agricultural Revolution, assortative mating, Atahualpa, Columbian Exchange, correlation coefficient, double helix, Drosophila, European colonialism, invention of gunpowder, invention of the wheel, invention of writing, longitudinal study, out of africa, phenotype, Scientific racism, Search for Extraterrestrial Intelligence, the scientific method, trade route

Psychologists have tackled this question by examining many married couples, measuring everything conceivable about their physical appearance and other characteristics, and then trying to make sense out of who married whom. A simple numerical way of describing the result is by means of a statistical index called the correlation coefficient. If you line up 100 husbands in order of their ranking for some characteristic (say, their height), and if you also line up their 100 wives with respect to the same characteristic, the correlation coefficient describes whether a man tends to be at the same position in the husbands' line-up as his wife is in the line-up of wives. A correlation coefficient of plus one would mean perfect correspondence: the tallest man marries the tallest woman, the thirty-seventh tallest man marries the thirty-seventh tallest woman, and so on. A correlation coefficient of minus one would mean perfect matching by opposites: the tallest man marries the shortest woman, the thirty-seventh tallest man marries the thirty-seventh shortest woman, and so on.

Finally, a correlation coefficient of zero would mean that husbands and wives assort completely randomly by height: a tall man is as likely to marry a short woman as a tall woman. These examples are for height, but correlation coefficients can also be calculated for anything else, such as income and IQ. If you measure enough things about enough couples, here is what you will find. Not surprisingly, the highest correlation coefficients—typically around +0.9—are for religion, ethnic background, race, socioeconomic status, age, and political views. That is, most husbands and wives prove to be of the same religion, ethnic background, and so on. Perhaps you also will not be surprised that the next highest correlation coefficients, usually around +0.4, are for measures of personality and intelligence, such as extroversion, neatness, and IQ.

Those other traits include ones as diverse as breadth of nose, length of ear lobe or middle finger, circumference of wrist, distance between eyes, and lung volume! Experimenters have made this finding for people as diverse as Poles in Poland, Americans in Michigan, and Africans in Chad. If you do not believe it, try noting eye colours (or measuring ear lobes) the next time you are at a dinner party with many couples, and then get your pocket calculator to give you the correlation coefficient. Coefficients for physical traits are on the average +0.2, not so high as for personality traits (+0.4) or religion (+0.9), but still significantly higher than zero. For a few physical traits the correlation is even higher than 0.2; for instance, an astonishing 0.61 for length of middle finger. At least unconsciously, people care more about their spouse's middle finger length than about his or her hair colour and intelligence!


pages: 517 words: 139,477

Stocks for the Long Run 5/E: the Definitive Guide to Financial Market Returns & Long-Term Investment Strategies by Jeremy Siegel

Asian financial crisis, asset allocation, backtesting, banking crisis, Black-Scholes formula, break the buck, Bretton Woods, business cycle, buy and hold, buy low sell high, California gold rush, capital asset pricing model, carried interest, central bank independence, cognitive dissonance, compound rate of return, computer age, computerized trading, corporate governance, correlation coefficient, Credit Default Swap, Daniel Kahneman / Amos Tversky, Deng Xiaoping, discounted cash flows, diversification, diversified portfolio, dividend-yielding stocks, dogs of the Dow, equity premium, Eugene Fama: efficient market hypothesis, eurozone crisis, Everybody Ought to Be Rich, Financial Instability Hypothesis, fixed income, Flash crash, forward guidance, fundamental attribution error, housing crisis, Hyman Minsky, implied volatility, income inequality, index arbitrage, index fund, indoor plumbing, inflation targeting, invention of the printing press, Isaac Newton, joint-stock company, London Interbank Offered Rate, Long Term Capital Management, loss aversion, market bubble, mental accounting, money market fund, mortgage debt, Myron Scholes, new economy, Northern Rock, oil shock, passive investing, Paul Samuelson, Peter Thiel, Ponzi scheme, prediction markets, price anchoring, price stability, purchasing power parity, quantitative easing, random walk, Richard Thaler, risk tolerance, risk/return, Robert Gordon, Robert Shiller, Ronald Reagan, shareholder value, short selling, Silicon Valley, South Sea Bubble, sovereign wealth fund, stocks for the long run, survivorship bias, technology bubble, The Great Moderation, the payments system, The Wisdom of Crowds, transaction costs, tulip mania, Tyler Cowen: Great Stagnation, Vanguard fund

This will be particularly true if bond and stock returns are negatively correlated, which would happen if bond and stock prices move in opposite directions.⁴ The diversifying strength of an asset is measured by the correlation coefficient. The correlation coefficient ranges between -1 and +1 and measures the co-movement between an asset’s return and the return of the rest of the portfolio. The lower the correlation coefficient, the better the asset serves as a portfolio diversifier. Assets with near-zero or especially negative correlations are particularly good diversifiers. As the correlation coefficient between the asset and portfolio returns increases, the diversifying quality of the asset declines. In Chapter 3 we examined the changing correlation coefficient between the return on 10-year Treasury bonds and stocks, represented by the S&P 500 Index. Figure 6-3 displays the correlation coefficient between annual stock and bond returns for three subperiods between 1926 and 2012.

Malkiel, A Random Walk Down Wall Street: The Time-Tested Strategy for Successful Investing, 5th ed., New York: Norton, 1990, p. 362.
5. The standard deviation of the Magellan Fund over Lynch’s period was 21.38 percent, compared with 13.88 percent for the Wilshire 5000, while its correlation coefficient with the Wilshire was .86.
6. “The Superinvestors of Graham-and-Doddsville,” Hermes, the Columbia Business School Magazine, 1984 (reprinted 2004).
7. Money managers are assumed to expose their clients to the same risk as would the market, and the money managers have a correlation coefficient of .88 with market returns, which has been typical of equity mutual funds since 1971.
8. Darryll Hendricks, Jayendu Patel, and Richard Zeckhauser, “Hot Hands in Mutual Funds: Short-Run Persistence of Relative Performance, 1974-1988,” Journal of Finance, vol. 48, no. 1 (March 1993), pp. 93-130.

Similarly it is not a good strategy to buy the stocks only in your own country, especially when developed economies are becoming an ever smaller part of the world’s market. International diversification reduces risk because the stock prices of different countries do not rise and fall in tandem, and this asynchronous movement of returns dampens the volatility of the portfolio. As long as two assets are not perfectly correlated, i.e., their correlation coefficient is less than 1, then combining these assets will lower the risk of your portfolio for a given return or, alternatively, raise the return for a given risk. International Stock Returns Table 13-1 displays the historical risk and returns for dollar-based investors in the international markets from 1970 to the present (1988 for emerging market data). Over the entire period, the dollar returns among different regions do not differ greatly.
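The claim that combining imperfectly correlated assets lowers risk for a given return can be illustrated with the two-asset portfolio variance formula. This sketch assumes two hypothetical assets with the same 20% volatility and the same expected return, held 50/50, so the portfolio's expected return stays fixed while its volatility depends only on the correlation ρ.

```python
import math


def two_asset_volatility(w1, vol1, vol2, rho):
    """Portfolio standard deviation for weights (w1, 1 - w1):
    sqrt(w1^2 vol1^2 + w2^2 vol2^2 + 2 w1 w2 rho vol1 vol2)."""
    w2 = 1.0 - w1
    var = (w1 * vol1) ** 2 + (w2 * vol2) ** 2 + 2 * w1 * w2 * rho * vol1 * vol2
    return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


# Hypothetical assets: both 20% volatility, equal expected returns, held 50/50.
vols_by_rho = {}
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    vols_by_rho[rho] = two_asset_volatility(0.5, 0.20, 0.20, rho)
    print(f"rho={rho:+.1f}: portfolio volatility = {vols_by_rho[rho]:.4f}")
```

Only at ρ = 1 does the portfolio keep the full 20% volatility; any correlation below 1 reduces it, and at ρ = −1 the risk cancels entirely in this idealized case.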


pages: 239 words: 77,436

Pure, White and Deadly: How Sugar Is Killing Us and What We Can Do to Stop It by John Yudkin

correlation coefficient, correlation does not imply causation, discovery of penicillin

I calculated what are called the ‘correlation coefficients’ between these cancers and sugar consumption in all the countries for which statistics were then available. Let me explain first what correlation coefficients are, and let me take as an example the relation between people’s height and weight. On the whole, the taller people are, the more they weigh. But it is all very well to say that there is ‘on the whole’ this association between height and weight; it would be better if we could say how close this association is. Supposing that it was a precise and exact association, so that the person who was only a little taller than another would inevitably be heavier, and one still taller would be still heavier. If this were so, you would say that the correlation coefficient was 1·0. Supposing on the other hand – and this is even more unlikely – that there was no relationship whatever between height and weight, so that it would be just as likely for a man weighing 150 pounds to be five feet tall or six feet tall.

In this case the correlation coefficient would be 0. In fact, there is a relationship, but not a precise one; tall people tend to be heavier. If you work it out exactly, for adult men the correlation coefficient between height and weight comes to about 0·6. The correlation coefficients I have found so far for cancer and sugar consumption in different countries are as follows:
Cancer of the large intestine in men: 0·60
Cancer of the large intestine in women: 0·50
Cancer of the breast: 0·63
However, such international statistics, as I have stressed repeatedly, can do no more than give a clue as to the possible role of sugar or fat in producing disease.


pages: 589 words: 69,193

Mastering Pandas by Femi Anthony

Amazon Web Services, Bayesian statistics, correlation coefficient, correlation does not imply causation, Debian, en.wikipedia.org, Internet of things, natural language processing, p-value, random walk, side project, statistical model, Thomas Bayes

For more information on correlation and dependency, refer to http://en.wikipedia.org/wiki/Correlation_and_dependence. The correlation measure, known as the correlation coefficient, is a number that captures the size and direction of the relationship between two variables. It ranges from -1 to +1: its sign gives the direction and its absolute value, from 0 to 1, gives the magnitude. A + sign expresses positive correlation and a - sign negative correlation. The higher the magnitude, the stronger the correlation, with a magnitude of one termed a perfect correlation. The most popular and widely used correlation coefficient is the Pearson product-moment correlation coefficient, known as r. It measures the linear correlation or dependence between two variables x and y and takes values between -1 and +1. The sample correlation coefficient r is defined as follows:
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}}$$
This can also be written as follows:
$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$
Here, we have omitted the summation limits.
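The two standard algebraic forms of the sample correlation coefficient, the deviation form and the raw-sums form, are identical in exact arithmetic. This minimal sketch, with hypothetical data, computes both and checks that they agree.

```python
import math


def r_deviation_form(x, y):
    """r from sums of deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


def r_raw_sums_form(x, y):
    """r from raw sums: (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )


# Hypothetical paired observations.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]
r1 = r_deviation_form(x, y)
r2 = r_raw_sums_form(x, y)
print(r1, r2)
```

The raw-sums form needs only one pass over the data, but it subtracts large, nearly equal quantities, so for values far from zero the deviation form is numerically safer.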

However, note that the intercept value is not really meaningful as it is outside the bounds of the data. We should also only make predictions for values within the bounds of the data. For example, we cannot predict what the chirpFrequency is at 32 degrees Fahrenheit as it is outside the bounds of the data; moreover, at 32 degrees Fahrenheit, the crickets would have frozen to death. The value of R, the correlation coefficient, is given as follows:
In [38]: R=np.sqrt(result.rsquared)
         R
Out[38]: 0.83514378678237422
Thus, our correlation coefficient is R = 0.835. Since R² (result.rsquared) is about 0.697, this indicates that about 70 percent of the variation in chirp frequency can be explained by the changes in temperature. Note that taking the square root of R² recovers only the magnitude of R; its sign must be read from the slope of the fitted line. Source of this information: The Song of Insects http://www.hup.harvard.edu/catalog.php?isbn=9780674420663 The data is sourced from http://bit.ly/1MrlJqR. For a more in-depth treatment of single and multi-variable regression, refer to the following websites: Regression (Part I): http://bit.ly/1Eq5kSx Regression (Part II): http://bit.ly/1OmuFTV
Summary
In this chapter, we took a brief tour of the classical or frequentist approach to statistics and showed you how to combine pandas along with the stats packages (scipy.stats and statsmodels) to calculate, interpret, and make inferences from statistical data.
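Recovering R as the square root of R² silently drops the sign. This sketch uses a hand-written simple least-squares fit rather than statsmodels (the data are hypothetical and slope downward). It shows that the square root is always non-negative, and that copying the sign of the slope gives back the true Pearson coefficient for a one-predictor regression.

```python
import math


def fit_line(x, y):
    """Simple OLS of y on x: returns slope, intercept, R^2, and Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, intercept, 1.0 - ss_res / syy, sxy / math.sqrt(sxx * syy)


# Hypothetical data with a downward trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [9.5, 8.1, 7.9, 6.0, 5.2, 3.9]
slope, intercept, r_squared, r_direct = fit_line(x, y)

r_unsigned = math.sqrt(r_squared)            # never negative, like sqrt(rsquared)
r_signed = math.copysign(r_unsigned, slope)  # restore the sign from the slope
print(f"slope={slope:.3f}  sqrt(R^2)={r_unsigned:.3f}  signed r={r_signed:.3f}  direct r={r_direct:.3f}")
```

This sign recovery applies only to regressions with a single predictor; with several predictors, the square root of R² is the multiple correlation, which has no sign.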


pages: 147 words: 39,910

The Great Mental Models: General Thinking Concepts by Shane Parrish

Albert Einstein, Atul Gawande, Barry Marshall: ulcers, bitcoin, Black Swan, colonial rule, correlation coefficient, correlation does not imply causation, cuban missile crisis, Daniel Kahneman / Amos Tversky, dark matter, delayed gratification, feminist movement, index fund, Isaac Newton, Jane Jacobs, mandelbrot fractal, Pierre-Simon Laplace, Ponzi scheme, Richard Feynman, statistical model, stem cell, The Death and Life of Great American Cities, the map is not the territory, the scientific method, Thomas Bayes, Torches of Freedom

We then often act upon that erroneous conclusion, making decisions that can have immense influence across our lives. The problem is, without a good understanding of what is meant by these terms, these decisions fail to capitalize on real dynamics in the world and instead are successful only by luck.
No Correlation
The correlation coefficient between two measures, which varies between -1 and 1, is a measure of the relative weight of the factors they share. For example, two phenomena with few factors shared, such as bottled water consumption versus suicide rate, should have a correlation coefficient close to 0. That is to say, if we looked at all countries in the world and plotted suicide rates of a specific year against per capita consumption of bottled water, the plot would show no pattern at all.
Perfect Correlation
On the contrary, there are measures which are solely dependent on the same factor.

A good example of this is temperature. The only factor governing temperature, the average kinetic energy of the molecules, is shared by all scales. Thus each degree in Celsius will have exactly one corresponding value in Fahrenheit. Because the conversion between the two scales is also linear (°F = 1.8 × °C + 32), temperature in Celsius and Fahrenheit will have a correlation coefficient of 1 and the plot will be a straight line.
Weak to Moderate Correlation
There are few phenomena in human sciences that have a correlation coefficient of 1. There are, however, plenty where the association is weak to moderate and there is some explanatory power between the two phenomena. Consider the correlation between height and weight, which would land somewhere between 0 and 1. While virtually every three-year-old will be lighter and shorter than every grown man, not all grown men or three-year-olds of the same height will weigh the same.


Risk Management in Trading by Davis Edwards

asset allocation, asset-backed security, backtesting, Black-Scholes formula, Brownian motion, business cycle, computerized trading, correlation coefficient, Credit Default Swap, discrete time, diversified portfolio, fixed income, implied volatility, intangible asset, interest rate swap, iterative process, John Meriwether, London Whale, Long Term Capital Management, margin call, Myron Scholes, Nick Leeson, p-value, paper trading, pattern recognition, random walk, risk tolerance, risk/return, selection bias, shareholder value, Sharpe ratio, short selling, statistical arbitrage, statistical model, stochastic process, systematic trading, time value of money, transaction costs, value at risk, Wiener process, zero-coupon bond

The most common way to measure the relationship between two assets is to calculate the correlation coefficient of their price changes. The correlation coefficient is a number between −1 and +1 that indicates the strength of the relationship between the two data series. (See Figure 3.10, Positive and Negative Correlation.)
KEY CONCEPT: CORRELATION
In the financial markets, the statement that “two assets are correlated” means “the price changes in the two assets are correlated” rather than “the prices are correlated.” This distinction is very important because it is changes in value that determine the risk, profit, and loss of investments.
Some features of correlation are:
Positive Correlation. A correlation coefficient equal to +1 means that the two series have moved in perfect lockstep over the testing period (each is an exact, upward-sloping linear function of the other).

■ Negative Correlation. A correlation coefficient of −1 indicates that the series have moved in exactly opposite directions during the testing period (a perfect negative linear relationship). In other words, when one price rises, the other price falls.
■ Zero Correlation. A correlation coefficient of zero indicates no linear relationship between the two values.
The calculation of the correlation coefficient, ρ, is mathematically defined. (See Equation 3.10, Correlation.)
Figure 3.10 Positive and Negative Correlation (panels: Positive Correlation, Negative Correlation, Zero (Low) Correlation)
\rho = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\,\sigma_x \sigma_y}
where
x Data Set. The first set of data
x̄ Mean. The average of the first data set
σx Standard Deviation. The standard deviation of the first data set
y Data Set. The second set of data
ȳ Mean.
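The formula and the key-concept point about prices versus price changes can be checked together in a short sketch. The price paths are hypothetical, and correlation and changes are illustrative helper names. Both assets trend upward, so their price levels look strongly correlated, but their period-to-period moves alternate and are perfectly negatively correlated. The sample standard deviation (statistics.stdev, with an n − 1 denominator) pairs with the (n − 1) factor in the formula.

```python
import statistics


def correlation(x, y):
    """rho = sum((x - x_bar) * (y - y_bar)) / ((n - 1) * sigma_x * sigma_y).

    sigma_x and sigma_y are sample standard deviations (n - 1 denominator),
    matching the (n - 1) factor in the formula.
    """
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * statistics.stdev(x) * statistics.stdev(y))


def changes(prices):
    """Period-to-period price changes."""
    return [later - earlier for earlier, later in zip(prices, prices[1:])]


# Hypothetical prices: both assets drift upward, but their moves alternate.
asset_a = [100, 101, 103, 104, 106, 107]
asset_b = [50, 52, 53, 55, 56, 58]

print(round(correlation(asset_a, asset_b), 3))                    # prices: 0.983
print(round(correlation(changes(asset_a), changes(asset_b)), 3))  # changes: -1.0
```

Judged by price levels, these two assets would look like near-duplicates. Judged by price changes, which is what drives profit and loss, one offsets the other.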

As long as the hedge and hedged item have similar changes in value, the hedge is effective. The relative size of the values assigned to the hedge and the hedged item does not affect hedge effectiveness. A regression analysis produces a number of summary statistics, and these are commonly used to evaluate the effectiveness of the hedge. The two major statistics used for this purpose are the slope (abbreviated b above) and the correlation coefficient (usually squared and abbreviated as R² or R-squared). Secondary statistics include checking that there are enough observations to conduct a valid test, and that the slope and R² tests are sufficiently stable to trust the results. For example, a hedge-accounting memo might define five tests to determine a highly effective hedge. Highly effective is commonly interpreted to mean that:
■ Test 1 (Slope).
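The two headline statistics can be computed in a few lines. This is a minimal sketch under stated assumptions: the monthly value changes are hypothetical, the hedge's changes are regressed on the hedged item's changes, and regression_stats is an illustrative helper. The pass/fail thresholds belong to the hedge-accounting memo and are not reproduced here.

```python
import math


def regression_stats(x, y):
    """Ordinary least squares of y on x: returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r * r


# Hypothetical monthly changes in value, in $ thousands
hedged_item = [12.0, -8.0, 5.0, -15.0, 9.0, -3.0, 7.0, -11.0]
hedge = [-11.5, 8.4, -4.6, 14.2, -9.3, 3.5, -6.4, 10.8]

slope, intercept, r_squared = regression_stats(hedged_item, hedge)
print(f"slope = {slope:.3f}, R-squared = {r_squared:.3f}")
```

With these made-up numbers, the slope is close to −1 (about −0.97) because the hedge gains roughly what the hedged item loses. R-squared is close to 1 because that offset is very consistent from month to month.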


pages: 482 words: 121,672

A Random Walk Down Wall Street: The Time-Tested Strategy for Successful Investing (Eleventh Edition) by Burton G. Malkiel

accounting loophole / creative accounting, Albert Einstein, asset allocation, asset-backed security, beat the dealer, Bernie Madoff, bitcoin, butter production in bangladesh, buttonwood tree, buy and hold, capital asset pricing model, compound rate of return, correlation coefficient, Credit Default Swap, Daniel Kahneman / Amos Tversky, Detroit bankruptcy, diversification, diversified portfolio, dogs of the Dow, Edward Thorp, Elliott wave, Eugene Fama: efficient market hypothesis, experimental subject, feminist movement, financial innovation, financial repression, fixed income, framing effect, George Santayana, hindsight bias, Home mortgage interest deduction, index fund, invisible hand, Isaac Newton, Long Term Capital Management, loss aversion, margin call, market bubble, money market fund, mortgage tax deduction, new economy, Own Your Own Home, passive investing, Paul Samuelson, pets.com, Ponzi scheme, price stability, profit maximization, publish or perish, purchasing power parity, RAND corporation, random walk, Richard Thaler, risk tolerance, risk-adjusted returns, risk/return, Robert Shiller, Robert Shiller, short selling, Silicon Valley, South Sea Bubble, stocks for the long run, survivorship bias, the rule of 72, The Wisdom of Crowds, transaction costs, Vanguard fund, zero-coupon bond, zero-sum game

Still, at least at certain times, some stocks and some classes of assets do move against the market; that is, they have negative covariance or (and this is the same thing) they are negatively correlated with each other.
THE CORRELATION COEFFICIENT AND THE ABILITY OF DIVERSIFICATION TO REDUCE RISK
Correlation Coefficient | Effect of Diversification on Risk
+1.0 | No risk reduction is possible.
+0.5 | Moderate risk reduction is possible.
0 | Considerable risk reduction is possible.
–0.5 | Most risk can be eliminated.
–1.0 | All risk can be eliminated.
Now comes the real kicker: negative correlation is not necessary to achieve the risk-reduction benefits of diversification. Markowitz’s great contribution to investors’ wallets was his demonstration that anything less than perfect positive correlation can potentially reduce risk. His research led to the results presented in the preceding table, which demonstrates the crucial role of the correlation coefficient in determining whether adding a security or an asset class can reduce risk.
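The pattern in the table follows from the two-asset portfolio variance formula. The sketch below assumes two hypothetical assets that each have 20% volatility and are held 50/50. two_asset_volatility is an illustrative helper, and the equal-weight, equal-volatility setup is chosen only so the effect of the correlation coefficient stands out.

```python
import math


def two_asset_volatility(w1, s1, w2, s2, rho):
    """Standard deviation of a two-asset portfolio with weights w1, w2."""
    variance = (w1 * s1) ** 2 + (w2 * s2) ** 2 + 2 * w1 * w2 * rho * s1 * s2
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding error


# Hypothetical assets: each has 20% volatility; the portfolio is split 50/50.
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    vol = two_asset_volatility(0.5, 0.20, 0.5, 0.20, rho)
    print(f"rho = {rho:+.1f}  portfolio volatility = {vol:.1%}")
```

Under these assumptions the loop prints portfolio volatilities of 20.0%, 17.3%, 14.1%, 10.0% and 0.0%. Risk falls below that of a single asset as soon as ρ drops below +1, which is the Markowitz point made above.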

When higher returns can be achieved with lower risk by adding international stocks, no investor should fail to take notice. Some portfolio managers have argued that diversification has not continued to give the same degree of benefit as was previously the case. Globalization led to an increase in the correlation coefficients between the U.S. and foreign markets as well as between stocks and commodities. The following three charts indicate how correlation coefficients have risen over the first decade of the 2000s. The charts show the correlation coefficients calculated over every twenty-four-month period between U.S. stocks (as measured by the S&P 500-Stock Index) and the EAFE index of developed foreign stocks, between U.S. stocks and the broad (MSCI) index of emerging-market stocks, and between U.S. stocks and the Goldman Sachs (GSCI) index of a basket of commodities such as oil, metals, and the like.

The following graph shows that an investment in the S&P 500 did not make any money during the first decade of the 2000s. But investment in a broad emerging-market index produced quite satisfactory returns. Broad international diversification would have been of enormous benefit to U.S. investors, even during “the lost decade.”
Source: Vanguard, Datastream, Morningstar.
Moreover, safe bonds proved their worth as a risk reducer. The graph on page 208 shows how correlation coefficients between U.S. Treasury bonds and large-capitalization U.S. equities fell during the 2008–09 financial crisis. Even during the horrible stock market of 2008, a broadly diversified portfolio of bonds invested in the Barclays Capital broad bond index returned 5.2 percent. There was a place to hide during the financial crisis. Bonds (and bond-like securities to be covered in Part Four) have proved their worth as an effective diversifier.


pages: 1,331 words: 163,200

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

Amazon Mechanical Turk, Anton Chekhov, combinatorial explosion, computer vision, constrained optimization, correlation coefficient, crowdsourcing, don't repeat yourself, Elon Musk, en.wikipedia.org, friendly AI, ImageNet competition, information retrieval, iterative process, John von Neumann, Kickstarter, natural language processing, Netflix Prize, NP-complete, optical character recognition, P = NP, p-value, pattern recognition, pull request, recommendation engine, self-driving car, sentiment analysis, SpamAssassin, speech recognition, stochastic process

Finally, coefficients close to zero mean that there is no linear correlation. Figure 2-14 shows various plots along with the correlation coefficient between their horizontal and vertical axes.
Figure 2-14. Standard correlation coefficient of various datasets (source: Wikipedia; public domain image)
Warning: The correlation coefficient only measures linear correlations (“if x goes up, then y generally goes up/down”). It may completely miss out on nonlinear relationships (e.g., “if x is close to zero then y generally goes up”). Note how all the plots of the bottom row have a correlation coefficient equal to zero despite the fact that their axes are clearly not independent: these are examples of nonlinear relationships. Also, the second row shows examples where the correlation coefficient is equal to 1 or –1; notice that this has nothing to do with the slope.
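Both parts of the warning can be reproduced with a few lines of plain Python. The data points are hypothetical, and pearson_r is an illustrative helper rather than a Pandas function. A parabola that depends entirely on x still gets r = 0, while straight lines get r = +1 or −1 whether they are steep or nearly flat.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]

parabola = [x * x for x in xs]   # fully determined by x, but not linear
steep = [50 * x + 7 for x in xs]  # linear, large positive slope
shallow = [-0.01 * x for x in xs]  # linear, tiny negative slope

print(round(pearson_r(xs, parabola), 6))  # 0.0
print(round(pearson_r(xs, steep), 6))     # 1.0
print(round(pearson_r(xs, shallow), 6))   # -1.0
```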

The ocean proximity attribute may be useful as well, although in Northern California the housing prices in coastal districts are not too high, so it is not a simple rule.
Looking for Correlations
Since the dataset is not too large, you can easily compute the standard correlation coefficient (also called Pearson’s r) between every pair of attributes using the corr() method. Passing numeric_only=True skips the text-valued ocean_proximity column; recent pandas versions raise an error on non-numeric columns without it:
corr_matrix = housing.corr(numeric_only=True)
Now let’s look at how much each attribute correlates with the median house value:
>>> corr_matrix["median_house_value"].sort_values(ascending=False)
median_house_value    1.000000
median_income         0.687170
total_rooms           0.135231
housing_median_age    0.114220
households            0.064702
total_bedrooms        0.047865
population           -0.026699
longitude            -0.047279
latitude             -0.142826
Name: median_house_value, dtype: float64
The correlation coefficient ranges from –1 to 1. When it is close to 1, it means that there is a strong positive correlation; for example, the median house value tends to go up when the median income goes up.

Because the correlation coefficient has nothing to do with the slope, your height in inches has a correlation coefficient of 1 with your height in feet or in nanometers. Another way to check for correlation between attributes is to use Pandas’ scatter_matrix function, which plots every numerical attribute against every other numerical attribute. Since there are now 11 numerical attributes, you would get 11² = 121 plots, which would not fit on a page, so let’s just focus on a few promising attributes that seem most correlated with the median housing value (Figure 2-15):
from pandas.plotting import scatter_matrix

attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
Figure 2-15. Scatter matrix
The main diagonal (top left to bottom right) would be full of straight lines if Pandas plotted each variable against itself, which would not be very useful.


pages: 741 words: 199,502

Human Diversity: The Biology of Gender, Race, and Class by Charles Murray

23andMe, affirmative action, Albert Einstein, Alfred Russel Wallace, Asperger Syndrome, assortative mating, basic income, bioinformatics, Cass Sunstein, correlation coefficient, Daniel Kahneman / Amos Tversky, double helix, Drosophila, epigenetics, equal pay for equal work, European colonialism, feminist movement, glass ceiling, Gunnar Myrdal, income inequality, Kenneth Arrow, labor-force participation, longitudinal study, meta analysis, meta-analysis, out of africa, p-value, phenotype, publication bias, quantitative hedge fund, randomized controlled trial, replication crisis, Richard Thaler, risk tolerance, school vouchers, Scientific racism, selective serotonin reuptake inhibitor (SSRI), Silicon Valley, social intelligence, statistical model, Steven Pinker, The Bell Curve by Richard Herrnstein and Charles Murray, the scientific method, The Wealth of Nations by Adam Smith, theory of mind, Thomas Kuhn: the structure of scientific revolutions, twin studies, universal basic income, working-age population

Now continue to read and see how well you have intuitively produced the basis for a correlation coefficient and a regression coefficient. The Correlation Coefficient Modern statistics provide more than one method for measuring correlation, but we confine ourselves to the one that is most important in both use and generality: the Pearson product-moment correlation coefficient (named after Karl Pearson, the English mathematician and biometrician). To get at this coefficient, let us first replot the graph of the class, replacing inches and pounds with standard scores. The variables are now expressed in general terms. Remember: Any set of measurements can be transformed similarly. The next step on our way to the correlation coefficient is to apply a formula that finds the best possible straight line passing through the cloud of points—the mathematically “best” version of the line you just drew by intuition.

Note that while the line in the graph above goes uphill to the right, it would go downhill for pairs of variables that are negatively correlated. We focus on the slope of the best-fitting line because it is the correlation coefficient—in this case, equal to .50, which is quite large by the standards of variables used by social scientists. The closer it gets to ±1.0, the stronger is the linear relationship between the standardized variables (the variables expressed as standard scores). When the two variables are mutually independent, the best-fitting line is horizontal; hence its slope is 0. Anything other than 0 signifies a relationship, albeit possibly a very weak one. Whatever the correlation coefficient of a pair of variables is, squaring it yields another notable number. Squaring .50, for example, gives .25. The significance of the squared correlation is that it tells us how much the variation in weight would decrease if we could make everyone the same height, or vice versa.
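The two claims here, that the slope of the best-fitting line through standardized points is r, and that r² is the share of variation removed, can be checked numerically. The height and weight figures below are hypothetical and are not the class data from the text. The helper names standardize and ols_slope are illustrative.

```python
import math


def standardize(values):
    """Standard scores (z-scores) using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical class data: heights in inches, weights in pounds
heights = [60, 62, 64, 65, 66, 68, 70, 72]
weights = [115, 140, 120, 150, 130, 165, 145, 170]

z_h, z_w = standardize(heights), standardize(weights)
r = ols_slope(z_h, z_w)  # slope of the best-fitting line through the standardized points

# Variance of weight (in standard units) left over once height's linear effect is removed
residual_var = sum((w - r * h) ** 2 for h, w in zip(z_h, z_w)) / len(z_w)

print(f"r = {r:.3f}   r^2 = {r * r:.3f}   remaining variance = {residual_var:.3f}")
```

Standardized weight has a variance of 1, so the remaining variance equals 1 − r², which means a fraction r² of the variation is removed. Swapping the roles of height and weight gives the same slope ("or vice versa").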

For example, a study that tracked two million financial transactions found that the correlation between a person’s score on a measure of extraversion and the amount spent on holiday shopping is just +.09. “Multiply the effect identified with this correlation by the number of people in a department store the week before Christmas,” the authors wrote, “and it becomes obvious why merchandisers should care deeply about the personalities of their customers.”9 They offered a new set of guidelines based on the correlation coefficient (r). In the summary that follows, I have replaced the value of r with the equivalent value of Cohen’s d. The authors argued that an effect size of .10 “is ‘very small’ for the explanations of single events but potentially consequential in the not-very long run,” while an effect size of .20 “is still ‘small’ at the level of single events but potentially more ultimately consequential.”10 Other scholars have advocated similar guidelines for interpreting small values of d.11 But their treatment of “small” collides with the position taken by the most influential work arguing for small sex differences in cognitive repertoires: the “gender similarities hypothesis” originated by psychologist Janet Shibley Hyde in the September 2005 issue of American Psychologist, the flagship journal of the American Psychological Association.
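The passage converts r to Cohen's d without showing how. A commonly used conversion, which assumes two equal-sized groups, is d = 2r / √(1 − r²), and its inverse is r = d / √(d² + 4). Whether this is exactly the conversion used in the book is an assumption. r_to_d and d_to_r are illustrative helper names.

```python
import math


def r_to_d(r):
    """Convert a correlation r to Cohen's d (equal-sized groups assumed)."""
    return 2 * r / math.sqrt(1 - r * r)


def d_to_r(d):
    """Inverse conversion: Cohen's d to r (equal-sized groups assumed)."""
    return d / math.sqrt(d * d + 4)


for r in (0.05, 0.09, 0.10):
    print(f"r = {r:.2f}  ->  d = {r_to_d(r):.3f}")
```

Under this conversion, r values of .05, .09 and .10 correspond to d values of about .10, .18 and .20. So the shopping correlation of +.09 sits between the "very small" and "small" benchmarks quoted above.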


pages: 408 words: 85,118

Python for Finance by Yuxing Yan

asset-backed security, business cycle, business intelligence, capital asset pricing model, constrained optimization, correlation coefficient, distributed generation, diversified portfolio, implied volatility, market microstructure, P = NP, p-value, quantitative trading / quantitative finance, Sharpe ratio, time value of money, value at risk, volatility smile, zero-sum game

Find out the meaning of zscore() included in the stats submodule (SciPy), and offer a simple example of using this function.
18. What is the market risk (beta) for IBM in 2010? (Hint: the source of data could be from Yahoo! Finance.)
19. What is wrong with the following lines of code?
>>>c=20
>>>npv=np.npv(0.1,c)
20. The correlation coefficient function from NumPy is np.corrcoef(). Find more about this function. Estimate the correlation coefficients among IBM, DELL, and W-Mart.
21. Why is it claimed that the sp.npv() function from SciPy is really a Present Value (PV) function?
22. Design a true NPV function using all cash flows, including today's cash flow.
23. The Sharpe ratio is used to measure the trade-off between risk and return:
\text{Sharpe} = \frac{R - R_f}{\sigma}
Here, R is the expected return for an individual security, and R_f is the expected risk-free rate. σ is the volatility, that is, the standard deviation of the return on the underlying security.
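Before tackling the zscore() and np.corrcoef() exercises with real data, it may help to see how the two ideas connect. Once each series is converted to population z-scores (ddof=0, which is also the default of scipy.stats.zscore()), the Pearson correlation is simply the mean of the products of the z-scores. The sketch below uses only the standard library. The three return series are made-up numbers, not IBM, DELL or Walmart data, and zscore and corr_matrix are illustrative helpers. As with np.corrcoef() by default, each row is treated as one variable.

```python
import math


def zscore(values):
    """Standard scores using the population standard deviation (ddof=0)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def corr_matrix(rows):
    """Pairwise Pearson correlations; each row is one variable.

    With population z-scores, r is the mean of the products z_x * z_y.
    """
    z = [zscore(row) for row in rows]
    n = len(rows[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


# Hypothetical monthly returns for three stocks (made-up numbers, not market data)
returns = [
    [0.021, -0.013, 0.034, 0.005, -0.027, 0.018],
    [0.030, -0.022, 0.041, -0.004, -0.035, 0.025],
    [0.004, 0.011, -0.006, 0.009, 0.002, -0.003],
]

for row in corr_matrix(returns):
    print("  ".join(f"{v:+.3f}" for v in row))
```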

The color of the arrow is black. For more detail about the function, just type help(plt.annotate) after issuing import matplotlib.pyplot as plt. From the preceding graph, we see that the fluctuation, uncertainty, or risk of our equal-weighted portfolio is much smaller than that of the individual stocks in it. We can also estimate their means, standard deviations, and correlation coefficient. The correlation coefficient between those two stocks is -0.75, and this is why we could diversify away firm-specific risk by forming an equal-weighted portfolio, as shown in the following code (NumPy's np.corrcoef() is used here; the SciPy-level alias sp.corrcoef() is deprecated in recent SciPy releases):
>>>import numpy as np
>>>np.corrcoef(A,B)
array([[ 1.        , -0.74583429],
       [-0.74583429,  1.        ]])
In the preceding example, we use hypothetical numbers (returns) for two stocks. How about IBM and W-Mart?

First, let us look at a hypothetical case by assuming that we have 5 years' annual returns of two stocks as follows:
Year | Stock A | Stock B
2009 | 0.102 | 0.1062
2010 | -0.02 | 0.23
2011 | 0.213 | 0.045
2012 | 0.12 | 0.234
2013 | 0.13 | 0.113
We form an equal-weighted portfolio using those two stocks. Using the mean() and std() functions contained in NumPy, we can estimate their means, standard deviations, and correlation coefficients as follows:
>>>import numpy as np
>>>A=[0.102,-0.02, 0.213,0.12,0.13]
>>>B=[0.1062,0.23, 0.045,0.234,0.113]
>>>port_EW=(np.array(A)+np.array(B))/2.
>>>round(np.mean(A),3),round(np.mean(B),3),round(np.mean(port_EW),3)
(0.109, 0.146, 0.127)
>>>round(np.std(A),3),round(np.std(B),3),round(np.std(port_EW),3)
(0.075, 0.074, 0.027)
In the preceding code, we estimate the mean returns and standard deviations of the two individual stocks and of the equal-weighted portfolio.
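To connect the correlation of −0.75 to the portfolio's small standard deviation, the sketch below recomputes the example with the standard library. It uses population standard deviations (statistics.pstdev), matching np.std's default of ddof=0. It then checks that the two-asset variance formula, w²σA² + w²σB² + 2w²ρσAσB, reproduces the directly measured portfolio standard deviation.

```python
import math
import statistics

# Annual returns for the two hypothetical stocks in the example above
A = [0.102, -0.02, 0.213, 0.12, 0.13]
B = [0.1062, 0.23, 0.045, 0.234, 0.113]
port_ew = [(a + b) / 2 for a, b in zip(A, B)]  # equal-weighted portfolio

# Population statistics, matching np.std's default (ddof=0)
sd_a, sd_b = statistics.pstdev(A), statistics.pstdev(B)
mean_a, mean_b = statistics.mean(A), statistics.mean(B)
cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(A, B)) / len(A)
rho = cov_ab / (sd_a * sd_b)

# Two-asset formula: var = w^2 sd_a^2 + w^2 sd_b^2 + 2 w^2 rho sd_a sd_b
w = 0.5
sd_formula = math.sqrt(w**2 * sd_a**2 + w**2 * sd_b**2 + 2 * w * w * rho * sd_a * sd_b)

print(f"rho = {rho:.4f}")
print(f"portfolio sd (direct)  = {statistics.pstdev(port_ew):.4f}")
print(f"portfolio sd (formula) = {sd_formula:.4f}")
```

ρ prints as about −0.7458, in line with the −0.74583429 from np.corrcoef(). Both portfolio lines print the same value (about 0.0266, the 0.027 shown above). The negative correlation term is what pulls it far below either stock's 0.074 to 0.075.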


pages: 416 words: 118,592

A Random Walk Down Wall Street: The Time-Tested Strategy for Successful Investing by Burton G. Malkiel

accounting loophole / creative accounting, Albert Einstein, asset allocation, asset-backed security, backtesting, beat the dealer, Bernie Madoff, BRICs, butter production in bangladesh, buy and hold, capital asset pricing model, compound rate of return, correlation coefficient, Credit Default Swap, Daniel Kahneman / Amos Tversky, diversification, diversified portfolio, dogs of the Dow, Edward Thorp, Elliott wave, Eugene Fama: efficient market hypothesis, experimental subject, feminist movement, financial innovation, fixed income, framing effect, hindsight bias, Home mortgage interest deduction, index fund, invisible hand, Isaac Newton, Long Term Capital Management, loss aversion, margin call, market bubble, money market fund, mortgage tax deduction, new economy, Own Your Own Home, passive investing, Paul Samuelson, pets.com, Ponzi scheme, price stability, profit maximization, publish or perish, purchasing power parity, RAND corporation, random walk, Richard Thaler, risk tolerance, risk-adjusted returns, risk/return, Robert Shiller, Robert Shiller, short selling, Silicon Valley, South Sea Bubble, stocks for the long run, survivorship bias, The Myth of the Rational Market, the rule of 72, The Wisdom of Crowds, transaction costs, Vanguard fund, zero-coupon bond

Still, at least at certain times, some stocks and some classes of assets do move against the market; that is, they have negative covariance or (and this is the same thing) they are negatively correlated with each other.
THE CORRELATION COEFFICIENT AND THE ABILITY OF DIVERSIFICATION TO REDUCE RISK
Correlation Coefficient | Effect of Diversification on Risk
+1.0 | No risk reduction is possible.
+0.5 | Moderate risk reduction is possible.
0 | Considerable risk reduction is possible.
–0.5 | Most risk can be eliminated.
–1.0 | All risk can be eliminated.
Now comes the real kicker: negative correlation is not necessary to achieve the risk-reduction benefits of diversification. Markowitz’s great contribution to investors’ wallets was his demonstration that anything less than perfect positive correlation can potentially reduce risk. His research led to the results presented in the preceding table, which demonstrates the crucial role of the correlation coefficient in determining whether adding a security or an asset class can reduce risk.

When higher returns can be achieved with lower risk by adding international stocks, no investor should fail to take notice. Some portfolio managers have argued that diversification has not continued to give the same degree of benefit as was previously the case. Globalization led to an increase in the correlation coefficients between the U.S. and foreign markets as well as between stocks and commodities. The following three charts indicate how correlation coefficients have risen over the first decade of the 2000s. The charts show the correlation coefficients calculated over every twenty-four-month period between U.S. stocks (as measured by the S&P 500-Stock Index) and the EAFE index of developed foreign stocks, between U.S. stocks and the broad (MSCI) index of emerging-market stocks, and between U.S. stocks and the Goldman Sachs (GSCI) index of a basket of commodities such as oil, metals, and the like.

Broad international diversification would have been of enormous benefit to U.S. investors, even during “the lost decade.”
DIVERSIFICATION INTO EMERGING MARKETS HELPED DURING “THE LOST DECADE”: CUMULATIVE RETURNS FROM ALTERNATIVE MARKETS
Source: Vanguard, Datastream, Morningstar.
Moreover, safe bonds proved their worth as a risk reducer. The graph “Time Varying Stock–Bond Correlation” shows how correlation coefficients between U.S. Treasury bonds and large-capitalization U.S. equities fell during the 2008–09 financial crisis. Even during the horrible stock market of 2008, a broadly diversified portfolio of bonds invested in the Barclays Capital broad bond index returned 5.2 percent. There was a place to hide during the financial crisis. Bonds have proved their worth as an effective diversifier.
TIME VARYING STOCK–BOND CORRELATION
Data: 10Y Treasury return is calculated from 10Y Treasury yields.


pages: 369 words: 128,349

Beyond the Random Walk: A Guide to Stock Market Anomalies and Low Risk Investing by Vijay Singal

3Com Palm IPO, Andrei Shleifer, asset allocation, buy and hold, capital asset pricing model, correlation coefficient, cross-subsidies, Daniel Kahneman / Amos Tversky, diversified portfolio, endowment effect, fixed income, index arbitrage, index fund, information asymmetry, liberal capitalism, locking in a profit, Long Term Capital Management, loss aversion, margin call, market friction, market microstructure, mental accounting, merger arbitrage, Myron Scholes, new economy, prediction markets, price stability, profit motive, random walk, Richard Thaler, risk-adjusted returns, risk/return, selection bias, Sharpe ratio, short selling, survivorship bias, transaction costs, Vanguard fund

Once you choose to invest in both stocks, the return becomes riskless because one of the two stocks does well and the other does poorly regardless of the kind of weather. The key in diversification of risk is correlation. Notice that the returns from beachwear and video rental always go in the opposite direction. If one of them does well, the other does not. Therefore, adding stocks that do not behave like other stocks in your portfolio is good and can reduce risk. The correlation is measured by what is called a correlation coefficient. The correlation coefficient varies between –1 and +1. The two stocks in the above example have a correlation of –1. Unfortunately, most stocks have a positive correlation, and many of them have a correlation with the market portfolio that is close to +1. The challenge in diversifying risk is to find stocks that have a correlation of less than +1. However, if you own only one stock, such as the stock of the company you work for, it is easy to find other stocks that are not well correlated with that stock.
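The beachwear/video-rental example can be made concrete with made-up numbers. In the sketch below, returns for three hypothetical weather states are chosen so that video rental always does exactly the opposite of beachwear. pearson_r is an illustrative helper. With half the money in each stock, the portfolio earns the same return whatever the weather.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical returns in three weather states: sunny, cloudy, rainy
beachwear = [0.20, 0.05, -0.10]
video_rental = [-0.10, 0.05, 0.20]

# Half of the money in each stock
portfolio = [0.5 * b + 0.5 * v for b, v in zip(beachwear, video_rental)]

print(f"correlation = {pearson_r(beachwear, video_rental):+.2f}")
print(f"portfolio returns = {portfolio}")
print(f"portfolio std dev = {statistics.pstdev(portfolio):.6f}")
```

The script reports a correlation of −1.00 and a 5% portfolio return in every state, so the portfolio's standard deviation is zero. That is the "riskless" outcome described above.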

This important point underscores the trade-off between risk and return: investors are happy to give up some return if the reduction in risk is sufficient. Therefore, it is not always necessary to ensure that the return is preserved. A general rule to evaluate whether a new asset should be included in an existing portfolio is based on the risk-return trade-off relationship:
E(R_n) = R_f + \frac{\sigma_n \rho_{n,p}}{\sigma_p}\left[E(R_p) - R_f\right]
where E(R) is the expected return from an asset, σ is the standard deviation, ρ is the correlation coefficient, and the subscripts n and p refer to the new stock and the existing portfolio. R_f is the return on the risk-free asset. If the new asset’s expected return is greater than the right-hand side of the above equation, then the asset should be included in the existing portfolio; otherwise, it should not. That condition can be rewritten as:
\frac{E(R_n) - R_f}{\sigma_n} > \frac{E(R_p) - R_f}{\sigma_p} \times \rho_{n,p}
Evidence
Before looking at the evidence, consider the potential benefits from international investing and the source of those benefits.
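The inclusion rule translates directly into code. This is a minimal sketch with illustrative function names and hypothetical inputs: a portfolio and candidate that both offer 12% with 18% risk, and a 4% risk-free rate chosen only for the example. The hurdle the new asset must beat rises with its correlation to the existing portfolio.

```python
def required_return(risk_free, sd_new, rho, sd_port, exp_port):
    """Right-hand side of the rule: R_f + (sigma_n * rho / sigma_p) * (E(R_p) - R_f)."""
    return risk_free + sd_new * rho / sd_port * (exp_port - risk_free)


def should_add(exp_new, sd_new, exp_port, sd_port, rho, risk_free):
    """Include the new asset if its expected return exceeds the required return."""
    return exp_new > required_return(risk_free, sd_new, rho, sd_port, exp_port)


# Hypothetical inputs: portfolio 12% return / 18% risk; risk-free rate 4%
print(f"hurdle at rho = 0.60: {required_return(0.04, 0.18, 0.60, 0.18, 0.12):.3f}")
print(should_add(0.12, 0.18, 0.12, 0.18, 0.60, 0.04))  # True: 12% beats an 8.8% hurdle
print(should_add(0.11, 0.18, 0.12, 0.18, 0.95, 0.04))  # False: 11% misses an 11.6% hurdle
```

An asset with the same return and risk as the current portfolio still clears the bar when its correlation is well below 1, because it adds diversification. A highly correlated asset must earn nearly the portfolio's full risk premium to justify its place.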

Assume that the dollar return on U.S. stocks is 12 percent with a standard deviation of 18 percent, and the dollar return on non-U.S. stocks is also 12 percent with a standard deviation of 18 percent. Since the U.S. markets and foreign markets are not well correlated, let the correlation coefficient be 0.60. Putting the U.S. stocks and the non-U.S. stocks in a 50-50 combination would generate a new world portfolio with the following characteristics:
R_w = w_1 R_{US} + w_2 R_{non\text{-}US} = 0.50 \times 12\% + 0.50 \times 12\% = 12\%
\sigma_w = \sqrt{w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \rho \sigma_1 \sigma_2} = \sqrt{0.50^2 \times 0.18^2 + 0.50^2 \times 0.18^2 + 2 \times 0.50 \times 0.50 \times 0.60 \times 0.18 \times 0.18} = 0.16
The new world portfolio has a return of 12 percent and a risk of 16 percent.
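The world-portfolio arithmetic can be checked in a few lines, using exactly the inputs stated in the text. The variable names are illustrative.

```python
import math

w1, w2 = 0.50, 0.50          # 50-50 split between U.S. and non-U.S. stocks
r_us, r_non_us = 0.12, 0.12  # expected dollar returns
s1, s2 = 0.18, 0.18          # standard deviations
rho = 0.60                   # correlation between the two markets

r_world = w1 * r_us + w2 * r_non_us
sd_world = math.sqrt(w1**2 * s1**2 + w2**2 * s2**2 + 2 * w1 * w2 * rho * s1 * s2)

print(f"world portfolio return = {r_world:.1%}, risk = {sd_world:.1%}")
```

This prints a return of 12.0% and a risk of about 16.1%, which the text rounds to 16 percent. That is two percentage points less risk than either market alone, at no cost in expected return.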


Trading Risk: Enhanced Profitability Through Risk Control by Kenneth L. Grant

backtesting, business cycle, buy and hold, commodity trading advisor, correlation coefficient, correlation does not imply causation, delta neutral, diversification, diversified portfolio, fixed income, frictionless, frictionless market, George Santayana, implied volatility, interest rate swap, invisible hand, Isaac Newton, John Meriwether, Long Term Capital Management, market design, Myron Scholes, performance metric, price mechanism, price stability, risk tolerance, risk-adjusted returns, Sharpe ratio, short selling, South Sea Bubble, Stephen Hawking, the scientific method, The Wealth of Nations by Adam Smith, transaction costs, two-sided market, value at risk, volatility arbitrage, yield curve, zero-coupon bond

For example, there’s no reason to believe that there are any statistical commonalities between, say, the Swedish rate of inflation and the price of silkworms in Malaysia; and over time we would expect a correlation between these two variables to be roughly zero. In terms of magnitudes, the correlation coefficient has a maximum value of 1.0, or 100%, indicating perfect correlation (e.g., the temperature in Toronto as measured in Fahrenheit and Celsius), and a minimum value of −1.0, or −100%, indicating perfect negative correlation (e.g., the price of a zero-coupon bond and its yield). All values in between are valid, and the process lends itself to all the subjectivity that the human mind can muster. However, you may find the following (admittedly simplistic) rules of thumb to be useful:
Value of Correlation Coefficient | Interpretation
Less than −50% | High negative correlation; merits full investigation.
Between −50% and −10% |
Between −10% and 10% |
Between 10% and 50% |
Greater than 50% |
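The zero-coupon example deserves a quick check, because a zero-coupon bond's price is a convex (curved) function of its yield rather than a straight line. The sketch below prices a hypothetical 10-year bond with a face value of 100 at yields from 1% to 10%, using an illustrative pearson_r helper. Price and yield move in perfectly opposite order, yet the Pearson coefficient of the levels comes out close to, but not exactly, −1 (roughly −0.99).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


face_value, years = 100.0, 10
yields = [k / 100 for k in range(1, 11)]                  # 1% to 10%
prices = [face_value / (1 + y) ** years for y in yields]  # zero-coupon bond prices

print(f"r(yield, price) = {pearson_r(yields, prices):.4f}")
```

For practical purposes this is "perfect negative correlation." The small gap from −1 is the curvature (convexity) that a linear measure cannot capture.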

Moreover, if you use drawdown as an “inverse barometer” of the amount of exposure acceptable for your account (reducing risk when significant drawdowns occur and increasing your exposure only when they are substantially erased), you stand to retain much more explicit control over your trading fortunes than you would if you operated in a vacuum with respect to this critical information metric.
Correlations
The final core element of our introductory statistical tool kit is correlation analysis. You ought to be at least nominally familiar with this concept, which involves identifying the extent to which two or more data series dynamically exhibit similar characteristics, most notably, for our purposes, across time. Correlation coefficients can range from −100% to 100% but (unless the data series are simply disguised representations of a single concept, for example, the yield on a given bond and its price) typically fall somewhere in between. By performing correlation analysis on the time series of portfolio returns, traders stand to gain unique and specific insights into underlying portfolio economics. For example, you may find yourself highly correlated to some benchmark stock index such as the Standard & Poor’s (S&P) 500, the Dow Jones Industrial Average (the Dow, DJIA), or the Nasdaq Composite.

However, this is merely one type of correlation analysis that can be applied to great effect to your P/L time series. Following is a summary of some of the standard categories of correlation analysis that you may find useful in identifying the drivers of relative performance for your portfolio. Correlation against Market Benchmarks. This is the general case associated with the example provided prior, under which you might calculate “correlation coefficients” between your returns and the performance of various market indexes. Here, I recommend that you begin the process by simply identifying, in an anecdotal sense, the market indexes that might best capture the essence of your trading and then running some introductory correlations there. For example, if you are trading U.S. equities, you might begin with the S&P, the Dow, or the Nasdaq Composite.
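A first pass at "correlation against market benchmarks" can be as simple as correlating a daily P/L series with each candidate index and ranking the results. All numbers below are hypothetical. The index names are only labels, and pearson_r is an illustrative helper. In practice, the P/L and index returns would need to be aligned on the same dates.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily P/L (in $ thousands) and same-day benchmark returns (in %)
pnl = [1.2, -0.8, 0.5, 2.1, -1.5, 0.3, -0.4, 1.0]
benchmarks = {
    "S&P 500": [0.9, -0.6, 0.2, 1.5, -1.1, 0.4, -0.2, 0.7],
    "Nasdaq Composite": [1.4, -1.0, 0.6, 2.0, -1.8, 0.1, -0.7, 1.1],
    "DJIA": [0.5, -0.3, 0.1, 0.6, -0.2, 0.5, 0.1, 0.2],
}

# Rank benchmarks by the strength (absolute value) of their correlation with P/L
ranked = sorted(
    ((name, pearson_r(pnl, series)) for name, series in benchmarks.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name:<18} r = {r:+.2f}")
```

Ranking by absolute value keeps strong negative correlations near the top, since those are just as informative about what drives the portfolio.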


pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzić

Albert Einstein, bioinformatics, business cycle, business intelligence, business process, butter production in bangladesh, combinatorial explosion, computer vision, conceptual framework, correlation coefficient, correlation does not imply causation, data acquisition, discrete time, El Camino Real, fault tolerance, finite state, Gini coefficient, information retrieval, Internet Archive, inventory management, iterative process, knowledge worker, linked data, loose coupling, Menlo Park, natural language processing, Netflix Prize, NP-complete, PageRank, pattern recognition, peer-to-peer, phenotype, random walk, RFID, semantic web, speech recognition, statistical model, Telecommunications Act of 1996, telemarketer, text mining, traveling salesman, web application

One parameter, which shows this strength of linear association between two variables by means of a single number, is called a correlation coefficient r. Its computation requires some intermediate results in a regression analysis:
r = \frac{S_{AB}}{\sqrt{S_{AA} S_{BB}}}
where S_{AA} = \sum_i (A_i - \bar{A})^2, S_{BB} = \sum_i (B_i - \bar{B})^2, and S_{AB} = \sum_i (A_i - \bar{A})(B_i - \bar{B}).
The value of r is between −1 and 1. Negative values of r correspond to regression lines with negative slopes, and a positive r corresponds to a positive slope. We must be very careful in interpreting the r value. For example, values of r equal to 0.3 and 0.6 only mean that we have two positive correlations, the second somewhat stronger than the first. It is wrong to conclude that r = 0.6 indicates a linear relationship twice as strong as that indicated by the value r = 0.3. For our simple example of linear regression given at the beginning of this section, the model obtained was B = 0.8 + 0.92A. We may estimate the quality of the model using the correlation coefficient r as a measure.
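The sketch below computes r from the same sums of squares a regression uses, r = S_AB / √(S_AA · S_BB). It shows why the sign of r always matches the sign of the regression slope S_AB / S_AA, since both share the numerator S_AB. The data are hypothetical, and slope_and_r is an illustrative helper. The last lines show one way to see why r = 0.6 is not "twice" r = 0.3: the shares of variance explained, r², are 0.36 versus 0.09.

```python
import math


def slope_and_r(a, b):
    """Regression slope of B on A and the correlation r, from the same sums.

    S_AA, S_BB and S_AB are the sums of squares and cross-products of the
    deviations from the means; slope = S_AB / S_AA and r = S_AB / sqrt(S_AA * S_BB).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return s_ab / s_aa, s_ab / math.sqrt(s_aa * s_bb)


# Hypothetical data: B tends to fall as A rises
A = [1, 2, 4, 5, 7]
B = [6.8, 6.1, 4.9, 4.4, 2.2]
slope, r = slope_and_r(A, B)
print(f"slope = {slope:.3f}, r = {r:.3f}")  # both negative

# r is not a ratio scale: r^2 (share of variance explained) is 0.09 vs 0.36
for value in (0.3, 0.6):
    print(f"r = {value}: r^2 = {value ** 2:.2f}")
```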

We may estimate the quality of the model using the correlation coefficient r as a measure. Based on the available data in Figure 4.3, we obtained the intermediate results and the final correlation coefficient. A correlation coefficient r = 0.85 indicates a good linear relationship between the two variables. Additional interpretation is possible. Because r² = 0.72, we can say that approximately 72% of the variation in the values of B is accounted for by a linear relationship with A.
5.5 ANOVA
Often the problem of analyzing the quality of the estimated regression line and the influence of the independent variables on the final regression is handled through an ANOVA approach. This is a procedure where the total variation in the dependent variable is subdivided into meaningful components that are then observed and treated in a systematic fashion. ANOVA is a powerful tool that is used in many data-mining applications.

For the training set given in Table 5.1, predict the classification of the following samples using a simple Bayesian classifier. (a) {2, 1, 1} (b) {0, 1, 1}

4. Given a data set with two dimensions X and Y:

X     Y
1     5
4     2.75
3     3
5     2.5

(a) Use a linear regression method to calculate the parameters α and β where y = α + βx. (b) Estimate the quality of the model obtained in (a) using the correlation coefficient r. (c) Use an appropriate nonlinear transformation (one of those represented in Table 5.3) to improve regression results. What is the equation for a new, improved, and nonlinear model? Discuss a reduction of the correlation coefficient value.

5. A logit function, obtained through logistic regression, has the form: Find the probability of output values 0 and 1 for the following samples: (a) { 1, −1, −1 } (b) { −1, 1, 0 } (c) { 0, 0, 0 }

6. Analyze the dependency between categorical attributes X and Y if the data set is summarized in a 2 × 3 contingency table: 7.


pages: 130 words: 11,880

Optimization Methods in Finance by Gerard Cornuejols, Reha Tutuncu

asset allocation, call centre, constrained optimization, correlation coefficient, diversification, finite state, fixed income, frictionless, frictionless market, index fund, linear programming, Long Term Capital Management, passive investing, Sharpe ratio, transaction costs, value at risk

We will discuss his model in more detail later. Here we give a brief description of the model and relate it to QPs. Consider an investor who has a certain amount of money to be invested in a number of different securities (stocks, bonds, etc.) with random returns. For each security i, i = 1, …, n, estimates of its expected return, $\mu_i$, and variance, $\sigma_i^2$, are given. Furthermore, for any two securities i and j, their correlation coefficient $\rho_{ij}$ is also assumed to be known. If we represent the proportion of the total funds invested in security i by $x_i$, one can compute the expected return and the variance of the resulting portfolio $x = (x_1, \dots, x_n)$ as follows: $E[x] = x_1\mu_1 + \dots + x_n\mu_n = \mu^T x$, and $\mathrm{Var}[x] = \sum_{i,j} \rho_{ij}\sigma_i\sigma_j x_i x_j = x^T Q x$, where $\rho_{ii} \equiv 1$, $Q_{ij} = \rho_{ij}\sigma_i\sigma_j$ for $i \neq j$, $Q_{ii} = \sigma_i^2$, and $\mu = (\mu_1, \dots, \mu_n)$.
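The sketch below implements the two portfolio formulas above in plain Python. It assumes that weights, expected returns, and standard deviations are given as lists, and that correlations are given as a nested list. The function `portfolio_mean_var` and the two-security numbers are hypothetical. The function builds each entry $Q_{ij} = \rho_{ij}\sigma_i\sigma_j$ on the fly, with $\rho_{ii} = 1$. It then sums $x_i x_j Q_{ij}$ over all pairs, which equals $x^T Q x$.

```python
def portfolio_mean_var(x, mu, sigma, rho):
    """Return (E[x], Var[x]) for weights x, using Q_ij = rho_ij * sigma_i * sigma_j."""
    n = len(x)
    mean = sum(x[i] * mu[i] for i in range(n))
    var = 0.0
    for i in range(n):
        for j in range(n):
            # rho_ii is taken to be 1, so the diagonal term is sigma_i ** 2
            r = 1.0 if i == j else rho[i][j]
            var += x[i] * x[j] * r * sigma[i] * sigma[j]
    return mean, var

# Hypothetical inputs: two securities, half the funds in each
m, v = portfolio_mean_var(
    x=[0.5, 0.5],
    mu=[0.10, 0.06],
    sigma=[0.2, 0.1],
    rho=[[1.0, 0.3], [0.3, 1.0]],
)
```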

Chapter 5 QP Models and Tools in Finance

5.1 Mean-Variance Optimization

In the introductory chapter, we have discussed Markowitz’ theory of mean-variance optimization (MVO) for the selection of portfolios of securities (or asset classes) in a manner that trades off the expected returns and the perceived risk of potential portfolios. Consider assets $S_1, S_2, \dots, S_n$ (n ≥ 2) with random returns. Let $\mu_i$ and $\sigma_i$ denote the expected return and the standard deviation of the return of asset $S_i$. For $i \neq j$, $\rho_{ij}$ denotes the correlation coefficient of the returns of assets $S_i$ and $S_j$. Let $\mu = [\mu_1, \dots, \mu_n]^T$, and let Q be the n × n symmetric covariance matrix with $Q_{ii} = \sigma_i^2$ and $Q_{ij} = \rho_{ij}\sigma_i\sigma_j$ for $i \neq j$. Denoting the proportion of the total funds invested in security i by $x_i$, one can represent the expected return and the variance of the resulting portfolio $x = (x_1, \dots, x_n)$ as follows: $E[x] = x_1\mu_1 + \dots + x_n\mu_n = \mu^T x$, and $\mathrm{Var}[x] = \sum_{i,j} \rho_{ij}\sigma_i\sigma_j x_i x_j = x^T Q x$, where $\rho_{ii} \equiv 1$.

The variance of a random variable X is defined by $\mathrm{Var}[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2$. The standard deviation of a random variable is the square root of its variance. For two jointly distributed random variables $X_1$ and $X_2$, their covariance is defined to be $\mathrm{Cov}(X_1, X_2) = E[(X_1 - E[X_1])(X_2 - E[X_2])] = E[X_1 X_2] - E[X_1]E[X_2]$. The correlation coefficient of two random variables is the ratio of their covariance to the product of their standard deviations. For a collection of random variables $X_1, \dots, X_n$, the expected value of the sum of these random variables is equal to the sum of their expected values: $E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i]$. The formula for the variance of the sum of the random variables $X_1, \dots, X_n$ is a bit more complicated: $\mathrm{Var}\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n \mathrm{Var}[X_i] + 2\sum_{1 \le i < j \le n} \mathrm{Cov}(X_i, X_j)$.
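The sketch below checks the variance-of-a-sum formula numerically. It treats each column of a small hypothetical table as one equally likely outcome, so that expectations are simple averages. The helper names are illustrative. Because the identity holds exactly for any joint distribution, the two sides agree up to floating-point rounding.

```python
import math
from itertools import combinations

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Cov(U, V) = E[(U - E[U])(V - E[V])] with equally likely outcomes."""
    mu, mv = mean(u), mean(v)
    return mean([(a - mu) * (b - mv) for a, b in zip(u, v)])

def var(u):
    return cov(u, u)

def corr(u, v):
    """Covariance divided by the product of the standard deviations."""
    return cov(u, v) / math.sqrt(var(u) * var(v))

# Hypothetical joint distribution: row i lists X_i's value in each of four
# equally likely outcomes (column k is outcome k)
X = [
    [1.0, 2.0, 0.0, 3.0],
    [2.0, 1.0, 1.0, 4.0],
    [0.5, 0.0, 2.0, 1.5],
]
total = [sum(outcome) for outcome in zip(*X)]  # X_1 + X_2 + X_3 in each outcome
lhs = var(total)
rhs = sum(var(xi) for xi in X) + 2 * sum(cov(a, b) for a, b in combinations(X, 2))
print(lhs, rhs)
```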


The Art of Computer Programming by Donald Ervin Knuth

Brownian motion, complexity theory, correlation coefficient, Donald Knuth, Eratosthenes, G4S, Georg Cantor, information retrieval, Isaac Newton, iterative process, John von Neumann, Louis Pasteur, mandelbrot fractal, Menlo Park, NP-complete, P = NP, Paul Erdős, probability theory / Blaise Pascal / Pierre de Fermat, RAND corporation, random walk, sorting algorithm, Turing machine, Y2K

Similar remarks apply to the subtract-with-borrow and add-with-carry generators of exercise 3.2.1.1-14. K. Serial correlation test. We may also compute the following statistic:

$C = \dfrac{n(U_0U_1 + U_1U_2 + \cdots + U_{n-2}U_{n-1} + U_{n-1}U_0) - (U_0 + U_1 + \cdots + U_{n-1})^2}{n(U_0^2 + U_1^2 + \cdots + U_{n-1}^2) - (U_0 + U_1 + \cdots + U_{n-1})^2}$. (23)

This is the "serial correlation coefficient," a measure of the extent to which $U_{j+1}$ depends on $U_j$. Correlation coefficients appear frequently in statistical work. If we have n quantities $U_0, U_1, \dots, U_{n-1}$ and n others $V_0, V_1, \dots, V_{n-1}$, the correlation coefficient between them is defined to be

$C = \dfrac{n\sum(U_jV_j) - (\sum U_j)(\sum V_j)}{\sqrt{\left(n\sum U_j^2 - (\sum U_j)^2\right)\left(n\sum V_j^2 - (\sum V_j)^2\right)}}$. (24)

All summations in this formula are to be taken over the range $0 \le j < n$; Eq. (23) is the special case $V_j = U_{(j+1) \bmod n}$. The denominator of (24) is zero when $U_0 = U_1 = \cdots = U_{n-1}$ or $V_0 = V_1 = \cdots = V_{n-1}$; we exclude that case from discussion. A correlation coefficient always lies between −1 and +1. When it is zero or very small, it indicates that the quantities $U_j$ and $V_j$ are (relatively speaking) independent of each other, whereas a value of ±1 indicates total linear dependence.
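The sketch below transcribes Eqs. (23) and (24) directly into Python. The function names are hypothetical, and the inputs are assumed to be floats, such as generator output in [0, 1). The serial version passes the cyclic successor sequence $V_j = U_{(j+1) \bmod n}$ to Eq. (24). For that sequence, the square root in the denominator reduces to the denominator of Eq. (23).

```python
import math

def correlation(us, vs):
    """Eq. (24): C = (n*sum(UV) - sum(U)*sum(V)) / sqrt(...)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    num = n * sum(u * v for u, v in zip(us, vs)) - su * sv
    den_sq = (n * sum(u * u for u in us) - su * su) * (n * sum(v * v for v in vs) - sv * sv)
    if den_sq <= 0:
        raise ValueError("undefined when all U_j or all V_j are equal")
    return num / math.sqrt(den_sq)

def serial_correlation(us):
    """Eq. (23): the special case V_j = U_{(j+1) mod n}."""
    n = len(us)
    return correlation(us, [us[(j + 1) % n] for j in range(n)])
```

The small cases match exercise 18: any two distinct values give −1, and any three values that are not all equal give −1/2.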

…, $V_{n-1}$, let their mean values be $\bar{u} = \frac{1}{n}\sum_{0 \le k < n} U_k$, $\bar{v} = \frac{1}{n}\sum_{0 \le k < n} V_k$. a) Let $U'_k = U_k - \bar{u}$, $V'_k = V_k - \bar{v}$. Show that the correlation coefficient C given in Eq. (24) is equal to $\sum_{0 \le k < n} U'_k V'_k \Big/ \sqrt{\sum_{0 \le k < n} U'^2_k \sum_{0 \le k < n} V'^2_k}$. b) Let C = N/D, where N and D denote the numerator and denominator of the expression in part (a). Show that $N^2 \le D^2$, hence $-1 \le C \le 1$; and obtain a formula for the difference $D^2 - N^2$. [Hint: See exercise 1.2.3-30.] c) If C = ±1, show that $\alpha U_k + \beta V_k = \tau$, $0 \le k < n$, for some constants α, β, and τ, not all zero. 18. [M20] (a) Show that if n = 2, the serial correlation coefficient (23) is always equal to −1 (unless the denominator is zero). (b) Similarly, show that when n = 3, the serial correlation coefficient always equals $-\frac{1}{2}$. (c) Show that the denominator in (23) is zero if and only if $U_0 = U_1 = \cdots = U_{n-1}$. 19. [M30] (J.

Therefore it is desirable to have C in Eq. (23) close to zero. In actual fact, since $U_0U_1$ is not completely independent of $U_1U_2$, the serial correlation coefficient is not expected to be exactly zero. (See exercise 18.) A "good" value of C will be between $\mu_n - 2\sigma_n$ and $\mu_n + 2\sigma_n$, where

$\mu_n = \dfrac{-1}{n-1}, \qquad \sigma_n^2 = \dfrac{n^2}{(n-1)^2(n-2)}, \qquad n > 2.$ (25)

We expect C to be between these limits about 95 percent of the time. The formula for $\sigma_n^2$ in (25) is an upper bound, valid for serial correlations between independent random variables from an arbitrary distribution. When the U's are uniformly distributed, the true variance is obtained by subtracting a term proportional to $n^{-2}$, plus $O(n^{-7/3}\log n)$. (See exercise 20.) Instead of simply computing the correlation coefficient between the observations $(U_0, U_1, \dots, U_{n-1})$ and their immediate successors $(U_1, \dots, U_{n-1}, U_0)$, we can also compute it between $(U_0, U_1, \dots, U_{n-1})$ and any cyclically shifted sequence $(U_q, \dots$
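The sketch below illustrates two ideas from this passage: the acceptance band of Eq. (25), and the generalization to cyclic shifts. It assumes independent uniform inputs. The names `serial_correlation_lag` and `acceptance_band`, and the seed, are illustrative choices. The band in (25) is stated for lag 1, so comparing other lags against it is an assumption of this sketch rather than a claim from the text.

```python
import math
import random

def serial_correlation_lag(us, q):
    """Correlation between U_j and the cyclic shift U_{(j+q) mod n}.

    A cyclic shift has the same sum and sum of squares as us, so the
    denominator matches Eq. (23)."""
    n = len(us)
    su = sum(us)
    s2 = sum(u * u for u in us)
    num = n * sum(us[j] * us[(j + q) % n] for j in range(n)) - su * su
    return num / (n * s2 - su * su)

def acceptance_band(n):
    """(mu_n - 2*sigma_n, mu_n + 2*sigma_n) from Eq. (25); requires n > 2."""
    mu = -1 / (n - 1)
    sigma = math.sqrt(n * n / ((n - 1) ** 2 * (n - 2)))
    return mu - 2 * sigma, mu + 2 * sigma

rng = random.Random(12345)
us = [rng.random() for _ in range(1000)]
lo, hi = acceptance_band(len(us))
for q in (1, 2, 3):
    c = serial_correlation_lag(us, q)
    # Reusing the lag-1 band for q > 1 is an assumption of this sketch
    print(q, round(c, 4), lo <= c <= hi)
```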


Triumph of the Optimists: 101 Years of Global Investment Returns by Elroy Dimson, Paul Marsh, Mike Staunton

asset allocation, banking crisis, Berlin Wall, Bretton Woods, British Empire, buy and hold, capital asset pricing model, capital controls, central bank independence, colonial rule, corporate governance, correlation coefficient, cuban missile crisis, discounted cash flows, diversification, diversified portfolio, dividend-yielding stocks, equity premium, Eugene Fama: efficient market hypothesis, European colonialism, fixed income, floating exchange rates, German hyperinflation, index fund, information asymmetry, joint-stock company, negative equity, new economy, oil shock, passive investing, purchasing power parity, random walk, risk tolerance, risk/return, selection bias, shareholder value, Sharpe ratio, stocks for the long run, survivorship bias, technology bubble, transaction costs, yield curve

The top panel shows that when the full set of pairwise correlation coefficients between equity markets are estimated separately for the first and second halves of the twentieth century, there was no discernible relationship between the two. It would not have been possible to predict correlations for 1950–2000 from those estimated from annual data over the first half-century. The slope coefficient was insignificantly different from zero and the adjusted R² was negative.

Table 8-4: Regression of correlations between equity markets on earlier historical correlations

Historical correlations | Predicted correlations | Slope | t-value | Adjusted R²
Annual correlation coefficients (all 101 years): 1900–49 (50 years) | 1950–2000 (51 years) | .07 | 1.0 | −.001
Monthly correlations (post-Bretton Woods): 1971–85 (180 months) | 1986–2000 (192 months) | .08 | 6.9 | .342
Monthly correlations (recent data): 1991–95 (60 months) | 1996–2000 (60 months) | .07 | 7.1 | .296

Goetzmann, Li, and Rouwenhorst (2001) show how correlations between equity markets changed between 1872 and 2000 over seven successive sub-periods representing distinct economic and political conditions.
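The statistics reported in Table 8-4 (slope, t-value, and adjusted R²) come from a simple regression of later correlations on earlier ones. The sketch below computes those three numbers in plain Python, assuming ordinary least squares with an intercept. The function name is hypothetical, and the data are illustrative, not the book's correlations. Adjusted R² can fall below zero when the fit explains almost nothing, as in the first row of the table.

```python
import math

def simple_regression_stats(xs, ys):
    """OLS of ys on xs with an intercept: returns (slope, t_value, adjusted_r2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    r2 = 1 - sse / sst
    # Standard error of the slope, using n - 2 residual degrees of freedom
    se_slope = math.sqrt(sse / (n - 2) / s_xx)
    t_value = slope / se_slope
    # Adjusted R^2 penalizes the single regressor and can be negative
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return slope, t_value, adj_r2

# Illustrative "earlier" and "later" correlation values, not from the book
slope, t_value, adj_r2 = simple_regression_stats([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```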

But while there were some similarities between the “early integration” and Bretton Woods periods, the correlation structures otherwise differed a great deal. The inter-war period, with its post-war boom, hyperinflation in Germany, the Wall Street Crash, and the Great Depression, was unique. Correlations were quite high due to common factors such as the crash and Depression, but the correlation structure differed from all other periods.

Figure 8-6: Correlation coefficients between four core countries over seven successive sub-periods (pairs US:UK, US:Fra, UK:Fra, US:Ger, UK:Ger, Ger:Fra, and their average; sub-periods 1872–1889, 1889–1914, 1915–1918, 1919–1939, 1940–1945, 1946–1971, 1972–2000). Source: Goetzmann, Li, and Rouwenhorst, 2001.

Longin and Solnik (1995) provide further evidence of high correlations during periods of poor performance.

France’s highest correlations were with Belgium, The Netherlands, Italy, Ireland, Spain, and Switzerland; Italy’s were with France and Switzerland; The Netherlands was most highly correlated with Belgium, followed by France, Denmark, and Switzerland; and Sweden was highly correlated with Denmark, Canada (natural resources), and Switzerland (neutral countries). Australia’s highest correlations were with the United Kingdom and Ireland (historical and trade links), and Canada and South Africa (gold, mining, and the British Empire).

Table 8-3: Correlation coefficients between world equity markets* (World, US, UK, Switzerland, Sweden, Spain, South Africa, Netherlands, Japan, Italy, Ireland, Germany, France, Denmark, Canada, Belgium, Australia)

* Correlations in bold (lower left-hand triangle) are based on 101 years of real dollar returns, 1900–2000.


Principles of Corporate Finance by Richard A. Brealey, Stewart C. Myers, Franklin Allen

3Com Palm IPO, accounting loophole / creative accounting, Airbus A320, Asian financial crisis, asset allocation, asset-backed security, banking crisis, Bernie Madoff, big-box store, Black-Scholes formula, break the buck, Brownian motion, business cycle, buy and hold, buy low sell high, capital asset pricing model, capital controls, Carmen Reinhart, carried interest, collateralized debt obligation, compound rate of return, computerized trading, conceptual framework, corporate governance, correlation coefficient, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, cross-subsidies, discounted cash flows, disintermediation, diversified portfolio, equity premium, eurozone crisis, financial innovation, financial intermediation, fixed income, frictionless, fudge factor, German hyperinflation, implied volatility, index fund, information asymmetry, intangible asset, interest rate swap, inventory management, Iridium satellite, Kenneth Rogoff, law of one price, linear programming, Livingstone, I presume, London Interbank Offered Rate, Long Term Capital Management, loss aversion, Louis Bachelier, market bubble, market friction, money market fund, moral hazard, Myron Scholes, new economy, Nick Leeson, Northern Rock, offshore financial centre, Ponzi scheme, prediction markets, price discrimination, principal–agent problem, profit maximization, purchasing power parity, QR code, quantitative trading / quantitative finance, random walk, Real Time Gross Settlement, risk tolerance, risk/return, Robert Shiller, shareholder value, Sharpe ratio, short selling, Silicon Valley, Skype, Steve Jobs, The Nature of the Firm, the payments system, the rule of 72, time value of money, too big to fail, transaction costs, University of East Anglia, urban renewal, VA Linux, value at risk, Vanguard fund, yield curve, zero-coupon bond, zero-sum game, Zipcar

The entries in these diagonal boxes depend on the variances of stocks 1 and 2; the entries in the other two boxes depend on their covariance. As you might guess, the covariance is a measure of the degree to which the two stocks “covary.” The covariance can be expressed as the product of the correlation coefficient ρ12 and the two standard deviations: $\sigma_{12} = \rho_{12}\sigma_1\sigma_2$. For the most part stocks tend to move together. In this case the correlation coefficient ρ12 is positive, and therefore the covariance σ12 is also positive. If the prospects of the stocks were wholly unrelated, both the correlation coefficient and the covariance would be zero; and if the stocks tended to move in opposite directions, the correlation coefficient and the covariance would be negative. Just as you weighted the variances by the square of the proportion invested, so you must weight the covariance by the product of the two proportionate holdings x1 and x2.
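The sketch below implements the two-stock calculation described here. The variances are weighted by the squared proportions. The covariance is weighted by $x_1x_2$ and counted twice, once for each off-diagonal box. The function name and the numbers are hypothetical. Only the formula $\sigma_{12} = \rho_{12}\sigma_1\sigma_2$ and the weighting rule come from the text.

```python
import math

def two_asset_risk(x1, x2, sigma1, sigma2, rho12):
    """Portfolio variance and standard deviation for holdings x1, x2."""
    sigma12 = rho12 * sigma1 * sigma2  # covariance from the correlation coefficient
    variance = x1**2 * sigma1**2 + x2**2 * sigma2**2 + 2 * x1 * x2 * sigma12
    return variance, math.sqrt(variance)

# Hypothetical 50/50 portfolio, swept over several correlations
for rho in (1.0, 0.25, 0.0, -1.0):
    var, sd = two_asset_risk(0.5, 0.5, 0.2, 0.3, rho)
    print(rho, round(sd, 4))
```

With perfect positive correlation, the portfolio standard deviation is just the weighted average of the two standard deviations (0.25 here). Any lower correlation reduces the portfolio standard deviation below that average.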

Portfolio risk and return Look back at the calculation for Heinz and Exxon in Section 8-1. Recalculate the expected portfolio return and standard deviation for different values of x1 and x2, assuming the correlation coefficient ρ12 = 0. Plot the range of possible combinations of expected return and standard deviation as in Figure 8.3. Repeat the problem for ρ12 = +.25. 11. Portfolio risk and return Mark Harrywitz proposes to invest in two shares, X and Y. He expects a return of 12% from X and 8% from Y. The standard deviation of returns is 8% for X and 5% for Y. The correlation coefficient between the returns is .2. a. Compute the expected return and standard deviation of the following portfolios: b. Sketch the set of portfolios composed of X and Y. c. Suppose that Mr. Harrywitz can also borrow or lend at an interest rate of 5%.

Calculate the variance and standard deviation of the returns on a portfolio that has equal investments in 2 shares, 3 shares, and so on, up to 10 shares. b. Use your estimates to draw a graph like Figure 7.11. How large is the underlying market risk that cannot be diversified away? c. Now repeat the problem, assuming that the correlation between each pair of stocks is zero. 17. Portfolio risk Table 7.9 shows standard deviations and correlation coefficients for eight stocks from different countries. Calculate the variance of a portfolio with equal investments in each stock. 18. Portfolio risk Your eccentric Aunt Claudia has left you $50,000 in BP shares plus $50,000 cash. Unfortunately her will requires that the BP stock not be sold for one year and the $50,000 cash must be entirely invested in one of the stocks shown in Table 7.9. What is the safest attainable portfolio under these restrictions?
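A standard identity helps with the equal-investment portfolios in this exercise. With n stocks each weighted 1/n, the portfolio variance is (average variance)/n + (1 − 1/n) × (average covariance). The sketch below computes the variance in two ways: from that identity, and directly from a full covariance matrix. The function names and inputs are hypothetical. As n grows, the first term vanishes, and the average covariance is what remains: the market risk that cannot be diversified away.

```python
def equal_weight_variance(n, avg_var, avg_cov):
    """Variance of an equally weighted portfolio of n stocks.

    n diagonal terms and n*(n-1) covariance terms, each weighted (1/n)**2."""
    return avg_var / n + (1 - 1 / n) * avg_cov

def equal_weight_variance_from_matrix(cov):
    """Same quantity computed directly from an n x n covariance matrix."""
    n = len(cov)
    return sum(cov[i][j] for i in range(n) for j in range(n)) / n**2

# Hypothetical: every stock has variance 0.04, every pair has covariance 0.01
cov4 = [[0.04 if i == j else 0.01 for j in range(4)] for i in range(4)]
for n in (2, 3, 10, 100):
    print(n, equal_weight_variance(n, 0.04, 0.01))
```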


pages: 315 words: 93,628

Is God a Mathematician? by Mario Livio

Albert Einstein, Antoine Gombaud: Chevalier de Méré, Brownian motion, cellular automata, correlation coefficient, correlation does not imply causation, cosmological constant, Dava Sobel, double helix, Edmond Halley, Eratosthenes, Georg Cantor, Gerolamo Cardano, Gödel, Escher, Bach, Henri Poincaré, Isaac Newton, Johannes Kepler, John von Neumann, music of the spheres, Myron Scholes, probability theory / Blaise Pascal / Pierre de Fermat, Russell's paradox, Thales of Miletus, The Design of Experiments, the scientific method, traveling salesman

For a given value of the temperature, one cannot predict precisely the number of forest fires that will break out, since the latter depends on other variables such as the humidity and the number of fires started by people. In other words, for any value of the temperature, there could be many corresponding numbers of forest fires and vice versa. Still, the mathematical concept known as the correlation coefficient allows us to measure quantitatively the strength of the relationship between two such variables. The person who first introduced the tool of the correlation coefficient was the Victorian geographer, meteorologist, anthropologist, and statistician Sir Francis Galton (1822–1911). Galton—who was, by the way, the half-cousin of Charles Darwin—was not a professional mathematician. Being an extraordinarily practical man, he usually left the mathematical refinements of his innovative concepts to other mathematicians, in particular to the statistician Karl Pearson (1857–1936).

If the correlation between them is very close, a very long cubit would usually imply a very tall stature, but if it were not very close, a very long cubit would be on the average associated with only a tall stature, and not a very tall one; while, if it were nil, a very long cubit would be associated with no especial stature, and therefore, on the average, with mediocrity. Pearson eventually gave a precise mathematical definition of the correlation coefficient. The coefficient is defined in such a way that when the correlation is very high—that is, when one variable closely follows the up-and-down trends of the other—the coefficient takes the value of 1. When two quantities are anticorrelated, meaning that when one increases the other decreases and vice versa, the coefficient is equal to −1. Two variables that each behave as if the other didn’t even exist have a correlation coefficient of 0. (For instance, the behavior of some governments unfortunately shows almost zero correlation with the wishes of the people whom they supposedly represent.) Modern medical research and economic forecasting depend crucially on identifying and calculating correlations.


Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

For 1 degree of freedom, the χ² value needed to reject the hypothesis at the 0.001 significance level is 10.828 (taken from the table of upper percentage points of the χ² distribution, typically available from any textbook on statistics). Since our computed value is above this, we can reject the hypothesis that gender and preferred_reading are independent and conclude that the two attributes are (strongly) correlated for the given group of people.

Correlation Coefficient for Numeric Data

For numeric attributes, we can evaluate the correlation between two attributes, A and B, by computing the correlation coefficient (also known as Pearson's product moment coefficient, named after its inventor, Karl Pearson). This is

$r_{A,B} = \dfrac{\sum_{i=1}^{n}(a_i - \bar{A})(b_i - \bar{B})}{n\sigma_A\sigma_B} = \dfrac{\sum_{i=1}^{n}(a_ib_i) - n\bar{A}\bar{B}}{n\sigma_A\sigma_B}$, (3.3)

where n is the number of tuples, $a_i$ and $b_i$ are the respective values of A and B in tuple i, $\bar{A}$ and $\bar{B}$ are the respective mean values of A and B, $\sigma_A$ and $\sigma_B$ are the respective standard deviations of A and B (as defined in Section 2.2.2), and $\Sigma(a_ib_i)$ is the sum of the AB cross-product (i.e., for each tuple, the value for A is multiplied by the value for B in that tuple).

An attribute (such as annual revenue, for instance) may be redundant if it can be “derived” from another attribute or set of attributes. Inconsistencies in attribute or dimension naming can also cause redundancies in the resulting data set. Some redundancies can be detected by correlation analysis. Given two attributes, such analysis can measure how strongly one attribute implies the other, based on the available data. For nominal data, we use the χ² (chi-square) test. For numeric attributes, we can use the correlation coefficient and covariance, both of which assess how one attribute's values vary from those of another.

χ² Correlation Test for Nominal Data

For nominal data, a correlation relationship between two attributes, A and B, can be discovered by a χ² (chi-square) test. Suppose A has c distinct values, namely $a_1, a_2, \dots, a_c$. B has r distinct values, namely $b_1, b_2, \dots, b_r$. The data tuples described by A and B can be shown as a contingency table, with the c values of A making up the columns and the r values of B making up the rows.
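The χ² test described here can be sketched in a few lines. Under independence, the expected count in each cell is (row total × column total)/n. The statistic sums (observed − expected)²/expected over all r × c cells, with (r − 1)(c − 1) degrees of freedom. The function name is hypothetical, and the 2 × 2 counts below are illustrative values, not claimed to be the book's table.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    n = sum(sum(row) for row in observed)
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (o - e) ** 2 / e
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, dof

# Illustrative 2 x 2 counts (rows: two values of B, columns: two values of A)
stat, dof = chi_square([[250, 200], [50, 1000]])
print(round(stat, 2), dof)
```

For these counts, the statistic is about 507.9 with 1 degree of freedom. That is far above the 10.828 threshold quoted for the 0.001 level.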

Covariance of Numeric Data

In probability theory and statistics, correlation and covariance are two similar measures for assessing how much two attributes change together. Consider two numeric attributes A and B, and a set of n observations $\{(a_1, b_1), \dots, (a_n, b_n)\}$. The mean values of A and B, respectively, are also known as the expected values on A and B, that is, $E(A) = \bar{A} = \frac{\sum_{i=1}^n a_i}{n}$ and $E(B) = \bar{B} = \frac{\sum_{i=1}^n b_i}{n}$. The covariance between A and B is defined as

$\mathrm{Cov}(A, B) = E((A - \bar{A})(B - \bar{B})) = \dfrac{\sum_{i=1}^n (a_i - \bar{A})(b_i - \bar{B})}{n}$. (3.4)

If we compare Eq. (3.3) for $r_{A,B}$ (correlation coefficient) with Eq. (3.4) for covariance, we see that

$r_{A,B} = \dfrac{\mathrm{Cov}(A, B)}{\sigma_A\sigma_B}$, (3.5)

where $\sigma_A$ and $\sigma_B$ are the standard deviations of A and B, respectively. It can also be shown that

$\mathrm{Cov}(A, B) = E(A \cdot B) - \bar{A}\bar{B}$. (3.6)

This equation may simplify calculations. For two attributes A and B that tend to change together, if A is larger than $\bar{A}$ (the expected value of A), then B is likely to be larger than $\bar{B}$ (the expected value of B). Therefore, the covariance between A and B is positive.
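A short sketch can check Eqs. (3.4) to (3.6). It assumes population (divide-by-n) moments, as in the text. The function names and the five observation pairs are illustrative. The two covariance functions implement the definition and the shortcut form, respectively. The correlation function divides the covariance by the product of the standard deviations.

```python
import math

def covariance(a, b):
    """Eq. (3.4): mean of the cross-products of deviations from the means."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

def covariance_shortcut(a, b):
    """Eq. (3.6): E(A*B) - mean(A) * mean(B)."""
    n = len(a)
    return sum(x * y for x, y in zip(a, b)) / n - (sum(a) / n) * (sum(b) / n)

def correlation(a, b):
    """Eq. (3.5): r = Cov(A, B) / (sigma_A * sigma_B), population sigmas."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

a = [6, 5, 4, 3, 2]       # illustrative values of attribute A
b = [20, 10, 14, 5, 5]    # illustrative values of attribute B
print(covariance(a, b), covariance_shortcut(a, b), round(correlation(a, b), 4))
```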


The Concepts and Practice of Mathematical Finance by Mark S. Joshi

Black-Scholes formula, Brownian motion, correlation coefficient, Credit Default Swap, delta neutral, discrete time, Emanuel Derman, fixed income, implied volatility, incomplete markets, interest rate derivative, interest rate swap, London Interbank Offered Rate, martingale, millennium bug, quantitative trading / quantitative finance, short selling, stochastic process, stochastic volatility, the market place, time value of money, transaction costs, value at risk, volatility smile, yield curve, zero-coupon bond

If we now compare $Y_t$ with $X_t^{(1)}$, the fact that $X^{(1)}$ was used to construct $Y_t$ means that their movements are correlated. In particular, we have that

$E\big((Y_t - Y_s)(X_t^{(1)} - X_s^{(1)})\big) = \rho\,E\big((X_t^{(1)} - X_s^{(1)})^2\big) + \sqrt{1-\rho^2}\,E\big((X_t^{(2)} - X_s^{(2)})(X_t^{(1)} - X_s^{(1)})\big)$. (11.4)

Since $X^{(1)}$ and $X^{(2)}$ are independent, the second expectation is zero and so

$E\big((Y_t - Y_s)(X_t^{(1)} - X_s^{(1)})\big) = \rho(t - s)$. (11.5)

As $Y_t - Y_s$ and $X_t^{(1)} - X_s^{(1)}$ both have variance t − s, this means that the correlation coefficient is ρ. Thus we have constructed a Brownian motion whose increments are correlated to those of $X^{(1)}$ with correlation ρ. More generally, we could construct a Brownian motion from any vector $a = (a_1, \dots, a_k)$ with $\sum_j a_j^2 = 1$, by taking $\sum_{j=1}^k a_j X^{(j)}$. The existence of such correlated Brownian motions will be crucial in pricing multi-asset options. In general, we may want a whole vector of Brownian motions with a specified correlation matrix.
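The construction above, $Y = \rho X^{(1)} + \sqrt{1-\rho^2}\,X^{(2)}$, can be simulated on a time grid. This sketch uses hypothetical names and an arbitrary seed. It draws independent Gaussian increments with variance dt for $X^{(1)}$ and $X^{(2)}$ and forms the corresponding increments of Y. It then estimates the sample correlation between the increments of Y and those of $X^{(1)}$, which should be close to ρ.

```python
import math
import random

def correlated_increments(rho, dt, steps, seed=0):
    """Increments of X1 and of Y = rho*X1 + sqrt(1 - rho^2)*X2, with X1, X2 independent."""
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    dx1 = [rng.gauss(0.0, sd) for _ in range(steps)]
    dx2 = [rng.gauss(0.0, sd) for _ in range(steps)]
    dy = [rho * a + math.sqrt(1 - rho * rho) * b for a, b in zip(dx1, dx2)]
    return dx1, dy

def sample_corr(u, v):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

dx1, dy = correlated_increments(rho=0.6, dt=0.01, steps=20000, seed=42)
print(round(sample_corr(dx1, dy), 3))  # should be close to 0.6
```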

This means that

$(W_{t+\Delta t}^{(j)} - W_t^{(j)})(W_{t+\Delta t}^{(k)} - W_t^{(k)}) = \Delta t\,\rho_{jk} Z_k^2 + \Delta t\sqrt{1-\rho_{jk}^2}\,\varepsilon_{jk} Z_k$. (11.12)

The second term has mean zero and variance of order $\Delta t^2$, so we can discard it as small, whereas the first term has mean $\rho_{jk}\Delta t$ and variance of order $\Delta t^2$ and therefore contributes. This gives us a new rule for the multi-dimensional Ito calculus: $dW_t^{(j)}dW_t^{(k)} = \rho_{jk}\,dt$. To summarize, we have

Theorem 11.1 (Multi-dimensional Ito lemma) Let $W_t^{(j)}$ be correlated Brownian motions with correlation coefficient $\rho_{jk}$ between the Brownian motions $W^{(j)}$ and $W^{(k)}$. Let $X_j$ be an Ito process with respect to the Brownian motions $W_t^{(j)}$. Let f be a smooth function; we then have that

$df = \frac{\partial f}{\partial t}(t, X_1, \dots, X_n)\,dt + \sum_{j=1}^n \frac{\partial f}{\partial X_j}(t, X_1, \dots, X_n)\,dX_j + \frac{1}{2}\sum_{j,k=1}^n \frac{\partial^2 f}{\partial X_j\,\partial X_k}(t, X_1, \dots, X_n)\,dX_j\,dX_k$, (11.14)

with $dW_t^{(j)}dW_t^{(k)} = \rho_{jk}\,dt$. When collecting terms, the final double sum will be absorbed into the dt term. We still need to think a little about what a process of the form

$dY_t = \mu_t\,dt + \sum_{j=1}^n \sigma_j\,dW_t^{(j)}$ (11.16)

means.
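The rule $dW^{(j)}dW^{(k)} = \rho_{jk}\,dt$ implies that, summed over a fine grid on [0, T], the products of increments should approach $\rho_{jk}T$. The fluctuating term in Eq. (11.12) has variance of order $\Delta t^2$ per step and washes out in the sum. Under that reading, the sketch below runs a Monte Carlo check, using hypothetical names and an arbitrary seed. It builds the correlated normal pair the same way as Eq. (11.12).

```python
import math
import random

def quadratic_covariation(rho, T=1.0, steps=100_000, seed=1):
    """Sum, over a grid on [0, T], of the products of the increments of two
    Brownian motions whose increments have correlation rho."""
    rng = random.Random(seed)
    dt = T / steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(steps):
        z_k = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, 1.0)
        # Z_j = rho * Z_k + sqrt(1 - rho^2) * eps has correlation rho with Z_k
        z_j = rho * z_k + math.sqrt(1 - rho * rho) * eps
        total += (sqrt_dt * z_j) * (sqrt_dt * z_k)
    return total

print(round(quadratic_covariation(0.5), 3))  # should be near 0.5 * T
```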

Perfect correlation means the vectors point the same way, perfect negative correlation means they point the opposite way, and zero correlation means they are orthogonal. In (11.20), the first vector has length $\sigma_1$, and the second length $\sigma_2$. When we add two vectors, $v_1, v_2$, the square of the length of the resultant vector is $\|v_1\|^2 + 2\cos(\theta)\|v_1\|\|v_2\| + \|v_2\|^2$, where θ is the angle between the vectors. If we interpret the correlation coefficient as being the cosine of the angle between the two Brownian motions, then this means that the new volatility is just the length of the vector obtained by summing the vectors for each Brownian motion. More generally, we could construct a Brownian motion from any vector $a = (a_1, \dots, a_k)$ with $\sum_j a_j^2 = 1$, by taking $\sum_{j=1}^k a_j X^{(j)}$. When we have a process driven by k > 2 Brownian motions, we obtain a similar expression to (11.20).
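The geometric reading above reduces to the law of cosines. With $\cos\theta = \rho$, the combined volatility is $\sqrt{\sigma_1^2 + 2\rho\sigma_1\sigma_2 + \sigma_2^2}$. The tiny sketch below assumes, as the passage describes, that the process in (11.20) carries one volatility term per Brownian motion. The function name is hypothetical.

```python
import math

def combined_volatility(sigma1, sigma2, rho):
    """Length of the sum of two vectors with lengths sigma1 and sigma2
    at an angle theta where cos(theta) = rho (law of cosines)."""
    # max() guards against a tiny negative value from rounding when rho = -1
    return math.sqrt(max(0.0, sigma1**2 + 2 * rho * sigma1 * sigma2 + sigma2**2))

print(combined_volatility(0.2, 0.3, 1.0))   # vectors aligned: lengths add
print(combined_volatility(0.3, 0.4, 0.0))   # orthogonal: Pythagoras
print(combined_volatility(0.2, 0.3, -1.0))  # opposite: lengths subtract
```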


The Book of Why: The New Science of Cause and Effect by Judea Pearl, Dana Mackenzie

affirmative action, Albert Einstein, Asilomar, Bayesian statistics, computer age, computer vision, correlation coefficient, correlation does not imply causation, Daniel Kahneman / Amos Tversky, Edmond Halley, Elon Musk, en.wikipedia.org, experimental subject, Isaac Newton, iterative process, John Snow's cholera map, Loebner Prize, loose coupling, Louis Pasteur, Menlo Park, pattern recognition, Paul Erdős, personalized medicine, Pierre-Simon Laplace, placebo effect, prisoner's dilemma, probability theory / Blaise Pascal / Pierre de Fermat, randomized controlled trial, selection bias, self-driving car, Silicon Valley, speech recognition, statistical model, Stephen Hawking, Steve Jobs, strong AI, The Design of Experiments, the scientific method, Thomas Bayes, Turing test

The correlation will always reflect the degree of cross predictability between the two variables. Galton’s disciple Karl Pearson later derived a formula for the slope of the (properly rescaled) regression line and called it the correlation coefficient. This is still the first number that statisticians all over the world compute when they want to know how strongly two different variables in a data set are related. Galton and Pearson must have been thrilled to find such a universal way of describing the relationships between random variables. For Pearson, especially, the slippery old concepts of cause and effect seemed outdated and unscientific, compared to the mathematically clear and precise concept of a correlation coefficient.

GALTON AND THE ABANDONED QUEST

It is an irony of history that Galton started out in search of causation and ended up discovering correlation, a relationship that is oblivious of causation.
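The remark that the correlation coefficient is the slope of the "properly rescaled" regression line can be checked directly. Suppose both variables are converted to standard units (mean 0, standard deviation 1). Then the least-squares slope of one on the other equals r, in either direction. The small sketch below uses illustrative data and hypothetical helper names.

```python
import math

def regression_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def standardize(vs):
    """Convert to standard units using the population standard deviation."""
    n = len(vs)
    m = sum(vs) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in vs) / n)
    return [(v - m) / sd for v in vs]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r = regression_slope(standardize(xs), standardize(ys))
print(round(r, 4))  # equals the correlation coefficient of xs and ys
```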

“I interpreted… Galton to mean that there was a category broader than causation, namely correlation, of which causation was only the limit, and that this new conception of correlation brought psychology, anthropology, medicine and sociology in large part into the field of mathematical treatment. It was Galton who first freed me from the prejudice that sound mathematics could only be applied to natural phenomena under the category of causation.” In Pearson’s eyes, Galton had enlarged the vocabulary of science. Causation was reduced to nothing more than a special case of correlation (namely, the case where the correlation coefficient is 1 or –1 and the relationship between x and y is deterministic). He expresses his view of causation with great clarity in The Grammar of Science (1892): “That a certain sequence has occurred and reoccurred in the past is a matter of experience to which we give expression in the concept causation.… Science in no case can demonstrate any inherent necessity in a sequence, nor prove with absolute certainty that it must be repeated.”

In the case of one treatment variable (X) and one outcome variable (Y), the equation of the regression line will look like this: Y = aX + b. The parameter a (often denoted by rYX, the regression coefficient of Y on X) tells us the average observed trend: a one-unit increase of X will, on average, produce an a-unit increase in Y. If there are no confounders of Y and X, then we can use this as our estimate of an intervention to increase X by one unit. But what if there is a confounder, Z? In this case, the correlation coefficient rYX will not give us the average causal effect; it only gives us the average observed trend. That was the case in Wright’s problem of the guinea pig birth weights, discussed in Chapter 2, where the apparent benefit (5.66 grams) of an extra day’s gestation was biased because it was confounded with the effect of a smaller litter size. But there is still a way out: by plotting all three variables together, with each value of (X, Y, Z) describing one point in space.
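The sketch below simulates the confounding problem under an assumed linear model that is not from the book. Z drives both X and Y, and the true effect of X on Y is 2. Regressing Y on X alone recovers only the observed trend, about 0.5 here, because Z's influence leaks into the slope. Fitting Y on X and Z together, in the spirit of plotting all three variables, recovers a value close to 2. Names, coefficients, and the seed are hypothetical.

```python
import random

def centered_sum(u, v):
    """Sum of products of deviations from the means."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def slope_on_x(x, y):
    """Simple regression coefficient of y on x (no adjustment)."""
    return centered_sum(x, y) / centered_sum(x, x)

def slope_on_x_given_z(x, y, z):
    """Coefficient on x in the least-squares fit of y on both x and z."""
    sxx, szz, sxz = centered_sum(x, x), centered_sum(z, z), centered_sum(x, z)
    sxy, szy = centered_sum(x, y), centered_sum(z, y)
    # Solve the 2 x 2 normal equations for the x coefficient
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)

# Hypothetical model: Z confounds X and Y; the causal effect of X on Y is 2
rng = random.Random(7)
z = [rng.gauss(0, 1) for _ in range(5000)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * xi - 3 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
print(round(slope_on_x(x, y), 2))             # biased observed trend
print(round(slope_on_x_given_z(x, y, z), 2))  # close to 2
```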


pages: 206 words: 70,924

The Rise of the Quants: Marschak, Sharpe, Black, Scholes and Merton by Colin Read

"Robert Solow", Albert Einstein, Bayesian statistics, Black-Scholes formula, Bretton Woods, Brownian motion, business cycle, capital asset pricing model, collateralized debt obligation, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, David Ricardo: comparative advantage, discovery of penicillin, discrete time, Emanuel Derman, en.wikipedia.org, Eugene Fama: efficient market hypothesis, financial innovation, fixed income, floating exchange rates, full employment, Henri Poincaré, implied volatility, index fund, Isaac Newton, John Meriwether, John von Neumann, Joseph Schumpeter, Kenneth Arrow, Long Term Capital Management, Louis Bachelier, margin call, market clearing, martingale, means of production, moral hazard, Myron Scholes, Paul Samuelson, price stability, principal–agent problem, quantitative trading / quantitative finance, RAND corporation, random walk, risk tolerance, risk/return, Ronald Reagan, shareholder value, Sharpe ratio, short selling, stochastic process, Thales and the olive presses, Thales of Miletus, The Chicago School, the scientific method, too big to fail, transaction costs, tulip mania, Works Progress Administration, yield curve

These assumptions allow us to calculate the expected return of the portfolio R_p by summing across all securities: $E(R_p) = \sum_i w_i E(R_i)$ and the portfolio variance: $\sigma_p^2 = \sum_i w_i^2 \sigma_i^2 + \sum_i \sum_{j \neq i} w_i w_j \sigma_i \sigma_j \rho_{ij}$. Notice that all of the coefficients on the right-hand side of the portfolio variance expression are necessarily positive, except for the correlation coefficient. We can readily see that portfolio variance is minimized if the correlation coefficient $\rho_{ij} = -1$. We can generalize this risk minimization procedure through the matrix algebra for which Markowitz developed efficient solution algorithms that were more easily computable. This matrix algebra approach, which minimizes variance for a given return r and wealth W, becomes: $\min_X \sigma^2 = \min_X X V X^T$ subject to $r = (W - X \cdot \mathbf{1}) r_f + X R$, where the wealth constraint $r = (W - X \cdot \mathbf{1}) r_f + X R$ affirms that wealth is invested in a risky portfolio X whose assets return R and in a risk-free asset that returns r_f.
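The variance formula is easy to check numerically. Below is a small Python sketch that uses plain lists and no matrix library; the function names are illustrative. It assumes two assets held in equal weights, each with a volatility of 20%, and lets ρ vary between +1 and −1.

```python
def portfolio_variance(w, sigma, rho):
    """sigma_p^2 = sum_i w_i^2 sigma_i^2 + sum_i sum_{j != i} w_i w_j sigma_i sigma_j rho_ij."""
    n = len(w)
    var = sum(w[i] ** 2 * sigma[i] ** 2 for i in range(n))
    for i in range(n):
        for j in range(n):
            if j != i:
                var += w[i] * w[j] * sigma[i] * sigma[j] * rho[i][j]
    return var


def two_asset_corr(r):
    """2x2 correlation matrix with off-diagonal correlation r."""
    return [[1.0, r], [r, 1.0]]


weights = [0.5, 0.5]
sigmas = [0.20, 0.20]
for r in (1.0, 0.0, -1.0):
    v = portfolio_variance(weights, sigmas, two_asset_corr(r))
    print(f"rho = {r:+.0f}: portfolio variance = {v:.4f}")
```

The variance falls from 0.04 at ρ = +1 to 0.02 at ρ = 0, and reaches zero at ρ = −1. That is the minimum the passage describes.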

Marschak had framed the problem and indicated the direction for its solution. Most significantly for financial pricing theory, he went on: [W]e reinterpret [the decision variables] to mean not future yields but parameters [e.g., moments and joint moments] of the joint frequency distribution of future yields. Thus, x may be interpreted as the mathematical expectation of first year’s meat consumption, y may be its standard deviation, z may be the correlation coefficient between meat and salt consumption … etc. … It is sufficiently realistic, however, to confine ourselves, for each [return] to two parameters only: the mathematical expectation … and the coefficient of variation [“risk”].8 Marschak proposed a simple approach to the interplay between return and risk: he confined its description to the first moments of returns (means) and the second moments (variances and covariances).
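Marschak's two-parameter summary can be sketched as follows. The sketch assumes that a sample of yields stands in for the frequency distribution and uses the population standard deviation. The function name and the three sample yields are hypothetical.

```python
import statistics


def two_parameter_summary(yields):
    """Return (mathematical expectation, coefficient of variation) of a sample of yields.

    Uses the population standard deviation; the coefficient of variation is
    standard deviation divided by the mean, i.e. "risk" per unit of expected yield.
    """
    mean = statistics.fmean(yields)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return mean, statistics.pstdev(yields) / mean


expected, cv = two_parameter_summary([0.04, 0.06, 0.08])  # hypothetical yields
print(f"expectation = {expected:.3f}, coefficient of variation = {cv:.3f}")
```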


pages: 757 words: 193,541

The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2 by Thomas A. Limoncelli, Strata R. Chalup, Christina J. Hogan

active measures, Amazon Web Services, anti-pattern, barriers to entry, business process, cloud computing, commoditize, continuous integration, correlation coefficient, database schema, Debian, defense in depth, delayed gratification, DevOps, domain-specific language, en.wikipedia.org, fault tolerance, finite state, Firefox, Google Glasses, information asymmetry, Infrastructure as a Service, intermodal, Internet of things, job automation, job satisfaction, Kickstarter, load shedding, longitudinal study, loose coupling, Malcom McLean invented shipping containers, Marc Andreessen, place-making, platform as a service, premature optimization, recommendation engine, revision control, risk tolerance, side project, Silicon Valley, software as a service, sorting algorithm, standardized shipping container, statistical model, Steven Levy, supply-chain management, Toyota Production System, web application, Yogi Berra

Are there any particular events in the coming year, such as the Olympics or an election, that are expected to cause a usage spike? How much spare capacity do you need to handle these spikes gracefully? Headroom is usually specified as a percentage of current capacity.
• Timetable: For each component, what is the lead time from ordering to delivery, and from delivery until it is in service? Are there specific constraints for bringing new capacity into service, such as change windows?

* * *

Math Terms

Correlation Coefficient: Describes how strongly measurements for different data sources resemble each other.
Moving Average: A series of averages, each of which is taken across a short time interval (window), rather than across the whole data set.
Regression Analysis: A statistical method for analyzing relationships between different data sources to determine how well they correlate, and to predict changes in one based on changes in another.
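The moving-average definition can be made concrete with a short Python sketch of a trailing moving average. The function name and sample series are illustrative. The code keeps a running total, so each step adds the newest value and drops the oldest.

```python
def moving_average(values, window):
    """Trailing moving average: the mean of each run of `window` consecutive values."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the newest value, drop the oldest.
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages


print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```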

To perform a regression analysis on time-series data, you first need to define a time interval, such as 1 day or 4 weeks. The number of data samples in that time period is n. If your core driver metric is x and your primary resource metric is y, you first calculate the sum of the last n values for x, x², y, y², and x times y, giving Σx, Σx², Σy, Σy², and Σxy. Then calculate SSxy, SSxx, SSyy, and R as follows:

$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}, \quad SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \quad SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}, \quad R = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$

Regression analysis results in a correlation coefficient R, which is a number between –1 and 1. Squaring this number and then multiplying by 100 gives the percentage match between the two data sources, that is, the share of the variation in y accounted for by a linear relationship with x. For example, for the MAU and network utilization figures shown in Figure 18.2, this calculation gives a very high correlation, between 96 percent and 100 percent, as shown in Figure 18.3, where R² is graphed.
Figure 18.2: The number of users correlates well with network traffic.
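The summation steps translate directly into code. This is a minimal Python sketch. It assumes plain lists hold the last n samples of the core driver (xs) and the primary resource (ys). The function name and the intercept a are our own additions, and b is taken to be the slope SSxy/SSxx.

```python
import math


def correlation_from_sums(xs, ys):
    """Return (R, percentage match, intercept a, slope b) from the last n samples."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    if ss_xx == 0 or ss_yy == 0:
        raise ValueError("a constant series has no defined correlation")

    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    b = ss_xy / ss_xx             # slope: resource units per unit of core driver
    a = (sum_y - b * sum_x) / n   # intercept
    return r, 100 * r * r, a, b


# Hypothetical driver (x) and resource (y) samples lying on y = 2x + 1.
print(correlation_from_sums([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 100.0, 1.0, 2.0)
```

The raw-sums form suits a streaming setup: only the five running sums and n need to be kept per window. With very large values, though, the subtractions can lose floating-point precision.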

Notice that after the upgrade b changes significantly during the time period chosen for the correlation analysis and then becomes stable again but at a higher value. The large fluctuations in b for the length of the correlation window are due to significant changes in the moving averages from day to day, as the moving average has both pre- and post-upgrade data. When sufficient time has passed so that only post-upgrade data is used in the moving average, b becomes stable and the correlation coefficient returns to its previous high levels. The value of b corresponds to the slope of the line, or the multiplier in the equation linking the core driver and the usage of the primary resource. When correlation returns to normal, b is at a higher level. This result indicates that the primary resource will be consumed more rapidly with this software release than with the previous one. Any marked change in correlation should trigger a reevaluation of the multiplier b and corresponding resource usage predictions.
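Why does b swing while the window straddles an upgrade? The sketch below recomputes the slope over a sliding window to show it. It simplifies the book's procedure: it fits a plain least-squares line over a sliding window rather than working from moving averages of the sums. The data are hypothetical: each unit of the core driver costs 2 units of the resource before an upgrade at sample 6 and 3 units after it.

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def rolling_slope(xs, ys, window):
    """Slope b for every window of `window` consecutive samples."""
    return [slope(xs[i:i + window], ys[i:i + window])
            for i in range(len(xs) - window + 1)]


# Hypothetical data: 2 resource units per driver unit before an upgrade at
# sample 6, 3 units per driver unit afterwards.
driver = [10, 12, 11, 13, 14, 12, 15, 16, 14, 17, 18, 16]
usage = [(2 if i < 6 else 3) * d for i, d in enumerate(driver)]

bs = rolling_slope(driver, usage, window=4)
print([round(b, 2) for b in bs])
```

Windows that lie entirely before the upgrade give b = 2, and windows entirely after it give b = 3. The windows that mix both regimes give unstable values; one of them is 6.5 with these numbers.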


pages: 586 words: 159,901

Wall Street: How It Works And for Whom by Doug Henwood

accounting loophole / creative accounting, activist fund / activist shareholder / activist investor, affirmative action, Andrei Shleifer, asset allocation, asset-backed security, bank run, banking crisis, barriers to entry, borderless world, Bretton Woods, British Empire, business cycle, capital asset pricing model, capital controls, central bank independence, computerized trading, corporate governance, corporate raider, correlation coefficient, correlation does not imply causation, credit crunch, currency manipulation / currency intervention, David Ricardo: comparative advantage, debt deflation, declining real wages, deindustrialization, dematerialisation, diversification, diversified portfolio, Donald Trump, equity premium, Eugene Fama: efficient market hypothesis, experimental subject, facts on the ground, financial deregulation, financial innovation, Financial Instability Hypothesis, floating exchange rates, full employment, George Akerlof, George Gilder, hiring and firing, Hyman Minsky, implied volatility, index arbitrage, index fund, information asymmetry, interest rate swap, Internet Archive, invisible hand, Irwin Jacobs, Isaac Newton, joint-stock company, Joseph Schumpeter, kremlinology, labor-force participation, late capitalism, law of one price, liberal capitalism, liquidationism / Banker’s doctrine / the Treasury view, London Interbank Offered Rate, Louis Bachelier, market bubble, Mexican peso crisis / tequila crisis, microcredit, minimum wage unemployment, money market fund, moral hazard, mortgage debt, mortgage tax deduction, Myron Scholes, oil shock, Paul Samuelson, payday loans, pension reform, plutocrats, Plutocrats, price mechanism, price stability, prisoner's dilemma, profit maximization, publication bias, Ralph Nader, random walk, reserve currency, Richard Thaler, risk tolerance, Robert Gordon, Robert Shiller, Robert Shiller, selection bias, shareholder value, short selling, Slavoj Žižek, South Sea Bubble, The inhabitant of London could order by telephone, sipping his morning tea in bed, the various products of the whole earth, The Market for Lemons, The Nature of the Firm, The Predators' Ball, The Wealth of Nations by Adam Smith, transaction costs, transcontinental railway, women in the workforce, yield curve, zero-coupon bond

Sometimes it doesn't matter whether the bad news is true; if the short can take a position and undertake a successful disinformation campaign, he or she can profitably cover the short.
35. For the "real" sector, however, borders still matter, and the "global assembly line" is a bit of an exaggeration.
36. The correlation coefficient is a measure of how tightly two sets of numbers are related to each other, ranging from -1 (a perfect mirror image) through 0 (no relation at all) to +1 (perfect lockstep). A correlation coefficient under 0.2 marks a fairly cacophonous relation, but figures over 0.9 signify great intimacy.
37. In fact, many foreign investments made in the U.S. during the 1980s have had apparently dismal rates of return. The dollar's decline has savaged financial investments, and real investments haven't done much better.

In dollar terms, 70% of all foreign debt issued in the U.S. between 1926 and 1929 (excluding Canada) went bad — compared with a default rate of "only" 30% on corporate debt issued in the late 1920s. Most of the sovereign defaulters, by the way, had good ratings from Moody's (Cantor and Packer 1995). But now those defaults are a distant memory, and today's capital markets look seamless. Statistics confirm the decreasing importance of borders for the financial markets. In the 1970s, the correlation coefficient between interest rates on 10-year U.S. government bonds and German bonds of similar maturity was 0.191, but from 1990 to 1994, it was 0.934; Japan and the U.S., 0.182 and 0.965, respectively; and the U.S. and the U.K., 0.590 and 0.949 (Bank of England data, reported in Goldstein et al. 1994, p. 5). While it would be an exaggeration to say that there's now a single global credit market, we're definitely moving in that direction.

One doesn't want to get too carried away naturalizing temperament and values, but the model seems particularly to drive away women and nonwhites, at least in America, because of its chilly irreality. It may just be that sex and race are simply convenient markers for hierarchy, that economics is an ideology of privilege, and the already privileged, or those who wish to become apologists for the privileged, are drawn to its study.
8. Correlation coefficients for the various versions of q suggested in the text are all well over .92. The correlation for the simple equity q (market value of stock divided by tangible assets, as shown in the charts and used in the text) and the values for 1960-74 reported in Tobin and Brainard (1977) is .97.
9. It's interesting that investment rose during what are usually considered the bad years of the 1970s.


pages: 467 words: 154,960

Trend Following: How Great Traders Make Millions in Up or Down Markets by Michael W. Covel

Albert Einstein, Atul Gawande, backtesting, beat the dealer, Bernie Madoff, Black Swan, buy and hold, buy low sell high, capital asset pricing model, Clayton Christensen, commodity trading advisor, computerized trading, correlation coefficient, Daniel Kahneman / Amos Tversky, delayed gratification, deliberate practice, diversification, diversified portfolio, Edward Thorp, Elliott wave, Emanuel Derman, Eugene Fama: efficient market hypothesis, Everything should be made as simple as possible, fiat currency, fixed income, game design, hindsight bias, housing crisis, index fund, Isaac Newton, John Meriwether, John Nash: game theory, linear programming, Long Term Capital Management, mandelbrot fractal, margin call, market bubble, market fundamentalism, market microstructure, mental accounting, money market fund, Myron Scholes, Nash equilibrium, new economy, Nick Leeson, Ponzi scheme, prediction markets, random walk, Renaissance Technologies, Richard Feynman, risk tolerance, risk-adjusted returns, risk/return, Robert Shiller, Robert Shiller, shareholder value, Sharpe ratio, short selling, South Sea Bubble, Stephen Hawking, survivorship bias, systematic trading, the scientific method, Thomas L Friedman, too big to fail, transaction costs, upwardly mobile, value at risk, Vanguard fund, William of Occam, zero-sum game

It is the historical tendency of one thing to move in tandem with another.” The correlation coefficient is a number from –1 to +1, with –1 being the perfectly opposite behavior of two investments (for example, up 5 percent every time the other is down 5 percent). The +1 reflects identical investment results (up or down the same amount each period). The further away from +1 one gets (and thus closer to –1), the better a diversifier one investment is for the other. But because his firm is keenly aware of keeping things simple, it also provides another description of correlation: the tendency for one investment to “zig” while another “zags.”27 I took the monthly performance numbers of trend followers and computed their correlation coefficients. Comparing correlations provided evidence that trend followers typically trade the same markets in the same way at the same time.

He explained the reason—most investors pulled out at the wrong time. They traded with their gut and treated drawdowns as a cancer, rather than the natural ebb and flow of trading.” Interestingly, there is another perspective on drawdowns that few people consider. When you look at trend following performance data—for example, Dunn’s track record—you can’t help but notice that certain times are better than others to invest with Dunn. Correlation coefficient: A statistical measure of the interdependence of two or more random variables. Fundamentally, the value indicates how much of a change in one variable is explained by a change in another.25 Smart clients of Dunn look at his performance chart and buy in when his fund is experiencing a drawdown. Why? Because if he is down 30 percent, and you know from analysis of past performance data that his recovery from drawdowns is typically quick, why not “buy” Dunn while he is on sale?

We’ve all evolved and developed systems that are very different from those we were taught, and that independent evolution suggests that the dissimilarities to trading between turtles are always increasing.”29 A Turtle correlation chart paints a clear picture. The relationship is solid. The data (Chart 3.6) is the judge:

CHART 3.6: Correlation Among Turtle Traders

              Chesapeake  Eckhardt  Hawksbill  JPD    Rabar
Chesapeake    1           0.53      0.62       0.75   0.75
Eckhardt      0.53        1         0.7        0.7    0.71
Hawksbill     0.62        0.7       1          0.73   0.76
JPD           0.75        0.7       0.73       1      0.87
Rabar         0.75        0.71      0.76       0.87   1

Correlation coefficients gauge how closely one advisor’s performance resembles another advisor’s. Values exceeding 0.66 might be viewed as having significant positive performance correlation. Likewise, values below –0.66 might be viewed as having significant negative performance correlation.

Chesapeake: Chesapeake Capital Corporation; Eckhardt: Eckhardt Trading Co.; Hawksbill: Hawksbill Capital Management; JPD: JPD Enterprises Inc.; Rabar: Rabar Market Research

Of course, there is more to the story than just correlation.
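The 0.66 rule of thumb can be applied mechanically to the chart. The sketch below copies the Chart 3.6 values into a list of lists and visits each distinct pair once. It flags any pair whose correlation magnitude exceeds the threshold. The function name is illustrative.

```python
from itertools import combinations

names = ["Chesapeake", "Eckhardt", "Hawksbill", "JPD", "Rabar"]
# Correlation matrix from Chart 3.6 (rows and columns in the order of `names`).
chart = [
    [1.00, 0.53, 0.62, 0.75, 0.75],
    [0.53, 1.00, 0.70, 0.70, 0.71],
    [0.62, 0.70, 1.00, 0.73, 0.76],
    [0.75, 0.70, 0.73, 1.00, 0.87],
    [0.75, 0.71, 0.76, 0.87, 1.00],
]


def flag_pairs(names, matrix, threshold=0.66):
    """Return (a, b, r) for each distinct pair whose |r| exceeds the threshold."""
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        r = matrix[i][j]
        if abs(r) > threshold:
            flagged.append((names[i], names[j], r))
    return flagged


for a, b, r in flag_pairs(names, chart):
    print(f"{a}-{b}: {r:+.2f}")
```

Eight of the ten pairs clear the threshold. Only Chesapeake–Eckhardt (0.53) and Chesapeake–Hawksbill (0.62) fall below it.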


The Handbook of Personal Wealth Management by Reuvid, Jonathan.

asset allocation, banking crisis, BRICs, business cycle, buy and hold, collapse of Lehman Brothers, correlation coefficient, credit crunch, cross-subsidies, diversification, diversified portfolio, estate planning, financial deregulation, fixed income, high net worth, income per capita, index fund, interest rate swap, laissez-faire capitalism, land tenure, market bubble, merger arbitrage, negative equity, new economy, Northern Rock, pattern recognition, Ponzi scheme, prediction markets, Right to Buy, risk tolerance, risk-adjusted returns, risk/return, short selling, side project, sovereign wealth fund, statistical arbitrage, systematic trading, transaction costs, yield curve

As a result, forestry is considered to have strong diversification potential and the capability of reducing an investment portfolio’s overall risk. Forestry in the United States has been repeatedly shown to have a negative correlation coefficient with, among other financial assets, common stocks, corporate and government bonds, and the S&P 500 (see Table 2.3.1), and in certain studies reduced real portfolio risk by an average of 5 per cent.1 We observe that forestry generally forms a minor element of an overall investment portfolio, perhaps no more than 5–10 per cent as a maximum.

Table 2.3.1 Timberland correlation coefficients, 1959–78

Investment                      Correlation coefficient
Timberland                       1.0000
Residential housing             –0.0905
Farm real estate                 0.5612
S&P 500 index                   –0.4889
OTC stocks                      –0.4917
Preferred stock average         –0.3533
No-load mutual fund average     –0.6351
Municipal bonds                 –0.0900
Treasury Bills                   0.3118
Long-term corporate bonds       –0.2704
Commodity futures average        0.8988

Source: Zinkhan, FC, Sizemore, WR, Mason, GH and Ebner, TJ (1992) Timberland Investments, Portland, Oregon, Timber Press.

Liquidity

Certain forestry investments such as COEIC funds and exchange-traded funds (see below) are traded daily on markets such as the London Stock Exchange and Alternative Investment Market (AIM).


pages: 266 words: 86,324

The Drunkard's Walk: How Randomness Rules Our Lives by Leonard Mlodinow

Albert Einstein, Alfred Russel Wallace, Antoine Gombaud: Chevalier de Méré, Atul Gawande, Brownian motion, butterfly effect, correlation coefficient, Daniel Kahneman / Amos Tversky, Donald Trump, feminist movement, forensic accounting, Gerolamo Cardano, Henri Poincaré, index fund, Isaac Newton, law of one price, pattern recognition, Paul Erdős, Pepto Bismol, probability theory / Blaise Pascal / Pierre de Fermat, RAND corporation, random walk, Richard Feynman, Ronald Reagan, Stephen Hawking, Steve Jobs, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thomas Bayes, V2 rocket, Watson beat the top human players on Jeopardy!

The coefficient of correlation is a number between -1 and 1; if it is near ±1, it indicates that two variables are linearly related; a coefficient of 0 means there is no linear relation. For example, if data revealed that by eating the latest McDonald’s 1,000-calorie meal once a week, people gained 10 pounds a year and by eating it twice a week they gained 20 pounds, and so on, the correlation coefficient would be 1. If for some reason everyone were to instead lose those amounts of weight, the correlation coefficient would be -1. And if the weight gain and loss were all over the map and didn’t depend on meal consumption, the coefficient would be 0. Today correlation coefficients are among the most widely employed concepts in statistics. They are used to assess such relationships as those between the number of cigarettes smoked and the incidence of cancer, the distance of stars from Earth and the speed with which they are moving away from our planet, and the scores students achieve on standardized tests and the income of the students’ families.


pages: 363 words: 28,546

Portfolio Design: A Modern Approach to Asset Allocation by R. Marston

asset allocation, Bretton Woods, business cycle, capital asset pricing model, capital controls, carried interest, commodity trading advisor, correlation coefficient, diversification, diversified portfolio, equity premium, Eugene Fama: efficient market hypothesis, family office, financial innovation, fixed income, German hyperinflation, high net worth, hiring and firing, housing crisis, income per capita, index fund, inventory management, Long Term Capital Management, mortgage debt, passive investing, purchasing power parity, risk-adjusted returns, Robert Shiller, Robert Shiller, Ronald Reagan, Sharpe ratio, Silicon Valley, stocks for the long run, superstar cities, survivorship bias, transaction costs, Vanguard fund

[Figure 5.9: Correlations between S&P 500 and EAFE Measured over Five and Ten Year Periods, 1970–2009. Correlation coefficients for 5-year and 10-year windows, Dec-74 to Dec-09. Data Sources: MSCI, © Morningstar, and S&P.]
and MSCI Pacific has an even lower correlation of 0.43. But for the last 10 years alone ending in 2009, the correlation between EAFE and the S&P rises to 0.87. There are correspondingly large increases in correlations between the S&P and the regional MSCI indexes. When did this increase in correlations occur? Consider Figure 5.9, which shows five- and 10-year correlation coefficients between the EAFE and S&P 500 indexes. Since the EAFE index starts only in 1970, the graph begins in 1975 for the five-year correlation and in 1980 for the 10-year correlation. The figure is noteworthy in several respects. First, the correlations vary widely over time whether they are measured over five- or ten-year intervals. The five-year correlation begins above 60 percent and at times falls below 30 percent.

Data Sources: Barclays Capital and Russell

with the lowest allocation of 10 percent in stocks to the portfolio that is invested wholly in stocks.7 Large-cap growth stocks, as represented by the Russell 1000 Growth Index, do not appear in any of the 10 portfolios. The Russell 1000 Growth Index is dominated by the Russell 1000 Value Index. This result should not be surprising given the analysis in Chapter 4. Russell 1000 Value has a higher return and a lower standard deviation than Russell 1000 Growth. What’s more, the two indexes are highly correlated with a correlation coefficient of 0.82. The optimizer finds that one series is totally dominated by the other. So the optimizer rejects one whole asset class. The optimizer is also not fond of small-cap stocks. The Russell 2000 Index has a small weighting in the lowest risk portfolios, and its role disappears in portfolios with larger allocations to stocks. The point of this experiment is not to show the inferiority of large-cap growth stocks or small-cap stocks.

The four databases together have 3,924 live funds at the end of December 2002. The overlap among the three largest databases was analyzed after eliminating the funds that only appeared in MSCI. The percentages were rounded to the nearest decimal.
12. As explained in Chapter 5, the underperformance of EAFE relative to U.S. stocks is almost entirely due to Japan.
13. Recall that beta is equal to the correlation coefficient times the ratio of the standard deviation of the asset to the standard deviation of the benchmark. The beta is 0.36 = 0.77 × (0.071/0.152).
14. Since the average return on the risk-free Treasury bill is 3.8 percent and the average return on the Russell 3000 is 9.2 percent, the alpha = 11.8 percent – [3.8% + 0.36 × (9.2% – 3.8%)] = 6.1%.
15. Over the same period, the correlation between the S&P 500 index and the Russell 1000 large-cap index is 1.00 and the correlation between the S&P 500 and the Russell 3000 all-cap index is 0.99.
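The arithmetic in notes 13 and 14 can be rechecked in a few lines. The function names are ours. The alpha function mirrors the note's formula: average return minus the sum of the risk-free rate and beta times the benchmark's excess return.

```python
def beta(corr, sd_asset, sd_benchmark):
    """Beta = correlation x (asset standard deviation / benchmark standard deviation)."""
    return corr * sd_asset / sd_benchmark


def jensen_alpha(avg_return, risk_free, beta_value, benchmark_return):
    """Average return minus the beta-implied return (all in percent)."""
    return avg_return - (risk_free + beta_value * (benchmark_return - risk_free))


b = beta(0.77, 0.071, 0.152)
alpha = jensen_alpha(11.8, 3.8, b, 9.2)
print(f"beta = {b:.2f}, alpha = {alpha:.1f}%")  # beta = 0.36, alpha = 6.1%
```

Carrying the unrounded beta (about 0.3597) through the calculation still gives 6.1% after rounding.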


pages: 545 words: 137,789

How Markets Fail: The Logic of Economic Calamities by John Cassidy

"Robert Solow", Albert Einstein, Andrei Shleifer, anti-communist, asset allocation, asset-backed security, availability heuristic, bank run, banking crisis, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, Black-Scholes formula, Blythe Masters, Bretton Woods, British Empire, business cycle, capital asset pricing model, centralized clearinghouse, collateralized debt obligation, Columbine, conceptual framework, Corn Laws, corporate raider, correlation coefficient, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, crony capitalism, Daniel Kahneman / Amos Tversky, debt deflation, different worldview, diversification, Elliott wave, Eugene Fama: efficient market hypothesis, financial deregulation, financial innovation, Financial Instability Hypothesis, financial intermediation, full employment, George Akerlof, global supply chain, Gunnar Myrdal, Haight Ashbury, hiring and firing, Hyman Minsky, income per capita, incomplete markets, index fund, information asymmetry, Intergovernmental Panel on Climate Change (IPCC), invisible hand, John Nash: game theory, John von Neumann, Joseph Schumpeter, Kenneth Arrow, Kickstarter, laissez-faire capitalism, Landlord’s Game, liquidity trap, London Interbank Offered Rate, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, margin call, market bubble, market clearing, mental accounting, Mikhail Gorbachev, money market fund, Mont Pelerin Society, moral hazard, mortgage debt, Myron Scholes, Naomi Klein, negative equity, Network effects, Nick Leeson, Northern Rock, paradox of thrift, Pareto efficiency, Paul Samuelson, Ponzi scheme, price discrimination, price stability, principal–agent problem, profit maximization, quantitative trading / quantitative finance, race to the bottom, Ralph Nader, RAND corporation, random walk, Renaissance Technologies, rent control, Richard Thaler, risk tolerance, risk-adjusted returns, road to serfdom, Robert Shiller, Robert Shiller, Ronald Coase, Ronald 
Reagan, shareholder value, short selling, Silicon Valley, South Sea Bubble, sovereign wealth fund, statistical model, technology bubble, The Chicago School, The Great Moderation, The Market for Lemons, The Wealth of Nations by Adam Smith, too big to fail, transaction costs, unorthodox policies, value at risk, Vanguard fund, Vilfredo Pareto, wealth creators, zero-sum game

Unfortunately, during periods of great stress, the relationships between different asset classes tend to change dramatically. As the big hedge fund Long-Term Capital Management discovered to its cost during the international financial crisis of 1998, many assets that seem to have little or nothing in common suddenly move in the same direction. Prior to the blowup, for example, the correlation coefficient between certain bonds issued by the governments of the Philippines and Bulgaria was just 0.04: as the crisis unfolded, their correlation coefficient rose to 0.84. (A correlation coefficient of zero means two assets have no relationship; a coefficient of one means they move in perfect unison.) During a period of market upheaval, as a Wall Street saying has it, “all correlations go to one.” Investors panic and sell many different types of assets at the same time. When this happens, even a bank or financial institution that appears to be well diversified can suffer losses much bigger than a VAR model would have predicted, especially if it is highly leveraged (as Long-Term Capital was).

.”: Quoted in Jenny Anderson, “Merrill Painfully Learns the Risks of Managing Risk,” New York Times, October 12, 2007.
274 In its 1994 . . . : Philippe Jorion, Value at Risk: The New Benchmark for Managing Financial Risk, 2nd ed. (New York: McGraw-Hill, 2000), 107.
274 “In contrast with traditional . . .”: Ibid., xxii.
275 “It helps you understand . . .”: Quoted in Joe Nocera, “Risk Mismanagement,” New York Times Magazine, January 2, 2009.
277 the correlation coefficient . . . : Linda Allen, Jacob Boudoukh, and Anthony Saunders, Understanding Market, Credit, and Operational Risk: The Value at Risk Approach (Hoboken, N.J.: Wiley-Blackwell, 2004), 103.
278 “We remind our readers . . .”: “CreditMetrics Technical Document,” RiskMetrics, April 1997, available at www.riskmetrics.com/publications/techdocs/cmtdovv.html.
278 “The relative prevalence of . . .”: Allen et al., Understanding Market, 35.
278 “I believe that . . .”: “Against Value at Risk: Nassim Taleb Replies to Philippe Jorion,” 1997, available at www.fooledbyrandomness.com/jorion.html.
279 “business planning relied on . . .”: UBS, “Shareholder Report on UBS’s Write-Downs,” 34.
279 “even though delinquency . . .”: Ibid., 38–39.
280 “would overturn . . .”: Gillian Tett, Fool’s Gold: How the Bold Dream of a Small Tribe at J.P.


pages: 335 words: 94,657

The Bogleheads' Guide to Investing by Taylor Larimore, Michael Leboeuf, Mel Lindauer

asset allocation, buy and hold, buy low sell high, corporate governance, correlation coefficient, Daniel Kahneman / Amos Tversky, diversification, diversified portfolio, Donald Trump, endowment effect, estate planning, financial independence, financial innovation, high net worth, index fund, late fees, Long Term Capital Management, loss aversion, Louis Bachelier, margin call, market bubble, mental accounting, money market fund, passive investing, Paul Samuelson, random walk, risk tolerance, risk/return, Sharpe ratio, statistical model, stocks for the long run, survivorship bias, the rule of 72, transaction costs, Vanguard fund, yield curve, zero-sum game

Some bond funds invest in government bonds, some in corporate bonds, and others in municipal bonds. While some bond funds invest in highly rated investment-grade bonds, still others invest in lower-rated junk bonds. For more information on the various types of bonds, see Chapter 3. When investments (like stocks and bonds) don't always move together, they're said to have a low correlation coefficient. Understanding the correlation coefficient principle isn't really that difficult. The correlation numbers for any two investments can range from +1.0 (perfect correlation) to -1.0 (perfect negative correlation). Basically, if two stocks (or funds) normally move together at the same rate, they're said to be highly correlated, and when two investments move in the opposite directions, they're said to be negatively correlated.

When two investments each randomly go their separate ways, independent of the movement of the other one, there is said to be no correlation between them, and their correlation figure would be shown as 0. Finally, when two investments always move in the opposite direction, they would have a negative correlation, which would be represented by a rating of -1.0. In actual practice, you'll find that most investment choices available to you will have a correlation coefficient somewhere between 1.0 (perfect correlation) and 0 (noncorrelated). It's very difficult to find negatively correlated asset classes that have similar expected returns. The closer the number is to 1.0, the higher the correlation between the two assets, and the lower the number, the less correlation there is between the two investments. So, a correlation figure of 0.71 would mean the two assets are not perfectly correlated, but a fund with a correlation figure of 0.52 would offer still more diversification, since it has an even lower number.
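The claim that a lower correlation figure buys more diversification can be checked with the standard two-asset volatility formula. The sketch assumes two hypothetical funds, each with 15% annual volatility and held 50/50. Only the correlation changes between runs.

```python
import math


def two_fund_volatility(sd_a, sd_b, corr, w_a=0.5):
    """Standard deviation of a two-fund portfolio with weight w_a in fund A."""
    w_b = 1.0 - w_a
    variance = ((w_a * sd_a) ** 2 + (w_b * sd_b) ** 2
                + 2 * w_a * w_b * sd_a * sd_b * corr)
    return math.sqrt(variance)


# Hypothetical: two funds, each with 15% annual volatility, held 50/50.
for corr in (1.0, 0.71, 0.52, 0.0):
    print(f"correlation {corr:.2f}: portfolio volatility {two_fund_volatility(15, 15, corr):.2f}%")
```

With these inputs, moving from a correlation of 0.71 to 0.52 lowers the blended volatility from about 13.9% to about 13.1%. At a correlation of 1.0 there is no reduction at all.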


pages: 654 words: 191,864

Thinking, Fast and Slow by Daniel Kahneman

Albert Einstein, Atul Gawande, availability heuristic, Bayesian statistics, Black Swan, Cass Sunstein, Checklist Manifesto, choice architecture, cognitive bias, complexity theory, correlation coefficient, correlation does not imply causation, Daniel Kahneman / Amos Tversky, delayed gratification, demand response, endowment effect, experimental economics, experimental subject, Exxon Valdez, feminist movement, framing effect, hedonic treadmill, hindsight bias, index card, information asymmetry, job satisfaction, John von Neumann, Kenneth Arrow, libertarian paternalism, loss aversion, medical residency, mental accounting, meta analysis, meta-analysis, nudge unit, pattern recognition, Paul Samuelson, pre–internet, price anchoring, quantitative trading / quantitative finance, random walk, Richard Thaler, risk tolerance, Robert Metcalfe, Ronald Reagan, Shai Danziger, Supply of New York City Cabdrivers, The Chicago School, The Wisdom of Crowds, Thomas Bayes, transaction costs, union organizing, Walter Mischel, Yom Kippur War

Each adviser’s score for each year was his (most of them were men) main determinant of his year-end bonus. It was a simple matter to rank the advisers by their performance in each year and to determine whether there were persistent differences in skill among them and whether the same advisers consistently achieved better returns for their clients year after year. To answer the question, I computed correlation coefficients between the rankings in each pair of years: year 1 with year 2, year 1 with year 3, and so on up through year 7 with year 8. That yielded 28 correlation coefficients, one for each pair of years. I knew the theory and was prepared to find weak evidence of persistence of skill. Still, I was surprised to find that the average of the 28 correlations was .01. In other words, zero. The consistent correlations that would indicate differences in skill were not to be found. The results resembled what you would expect from a dice-rolling contest, not a game of skill.

If all you know about Tom is that he ranks twelfth in weight (well above average), you can infer (statistically) that he is probably older than average and also that he probably consumes more ice cream than other children. If all you know about Barbara is that she is eighty-fifth in piano (far below the average of the group), you can infer that she is likely to be young and that she is likely to practice less than most other children. The correlation coefficient between two measures, which varies between 0 and 1, is a measure of the relative weight of the factors they share. For example, we all share half our genes with each of our parents, and for traits in which environmental factors have relatively little influence, such as height, the correlation between parent and child is not far from .50. To appreciate the meaning of the correlation measure, the following are some examples of coefficients: The correlation between the size of objects measured with precision in English or in metric units is 1.
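Two of the claims above can be checked numerically with simulated data (all values hypothetical). A change of units is an exact linear transformation, so the correlation stays at 1. And when two measures are each the sum of one shared factor and an independent factor of equal variance, their correlation equals the shared factor's share of the variance, 1/(1 + 1) = 0.5. The equal-variance setup is an assumption chosen for illustration, not a model of parent and child heights.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical object sizes measured precisely in inches, then converted to cm
sizes_in = rng.uniform(1, 100, size=200)
sizes_cm = sizes_in * 2.54
r_units = np.corrcoef(sizes_in, sizes_cm)[0, 1]

# Two measures built from one shared factor plus independent factors,
# all with variance 1 (an illustrative assumption)
shared = rng.standard_normal(100_000)
measure_x = shared + rng.standard_normal(100_000)
measure_y = shared + rng.standard_normal(100_000)
r_shared = np.corrcoef(measure_x, measure_y)[0, 1]

print(r_units)   # 1.0 up to floating-point rounding
print(r_shared)  # close to 0.5: the shared factor is half of each measure's variance
```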

Of course they do, and the effects have been confirmed by systematic research that objectively assessed the characteristics of CEOs and their decisions, and related them to subsequent outcomes of the firm. In one study, the CEOs were characterized by the strategy of the companies they had led before their current appointment, as well as by management rules and procedures adopted after their appointment. CEOs do influence performance, but the effects are much smaller than a reading of the business press suggests. Researchers measure the strength of relationships by a correlation coefficient, which varies between 0 and 1. The coefficient was defined earlier (in relation to regression to the mean) by the extent to which two measures are determined by shared factors. A very generous estimate of the correlation between the success of the firm and the quality of its CEO might be as high as .30, indicating 30% overlap. To appreciate the significance of this number, consider the following question: Suppose you consider many pairs of firms.
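One way to make a correlation of .30 concrete is to ask how often, in a randomly chosen pair of firms, the firm with the stronger CEO is also the more successful one. Assuming (our assumption, not the author's) that CEO quality and firm success are jointly normal with correlation ρ, that probability is 1/2 + arcsin(ρ)/π. The sketch below evaluates the formula for ρ = 0.30 and checks it with a simulation; the function name is illustrative.

```python
import math

import numpy as np


def concordance_probability(rho):
    """For a bivariate normal pair with correlation rho: probability that,
    of two independent draws, the one higher on X is also higher on Y."""
    return 0.5 + math.asin(rho) / math.pi


rho = 0.30
print(round(concordance_probability(rho), 3))  # 0.597

# Monte Carlo check: columns are [CEO quality, firm success]
rng = np.random.default_rng(7)
cov = [[1.0, rho], [rho, 1.0]]
firm_a = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
firm_b = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
better_ceo_a = firm_a[:, 0] > firm_b[:, 0]
more_success_a = firm_a[:, 1] > firm_b[:, 1]
simulated = float(np.mean(better_ceo_a == more_success_a))
print(simulated)  # close to the formula's value
```

With ρ = 0.30 the stronger CEO leads the more successful firm only about 60% of the time, compared with 50% for a coin flip.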


pages: 193 words: 47,808

The Flat White Economy by Douglas McWilliams

"Robert Solow", access to a mobile phone, banking crisis, Big bang: deregulation of the City of London, bonus culture, Boris Johnson, Chuck Templeton: OpenTable:, cleantech, cloud computing, computer age, correlation coefficient, Edward Glaeser, en.wikipedia.org, Erik Brynjolfsson, eurozone crisis, George Gilder, hiring and firing, income inequality, informal economy, Kickstarter, knowledge economy, loadsamoney, low skilled workers, mass immigration, Metcalfe’s law, Network effects, new economy, offshore financial centre, Pareto efficiency, Peter Thiel, Productivity paradox, Robert Metcalfe, Silicon Valley, smart cities, special economic zone, Steve Jobs, working-age population, zero-sum game

b) Growing Together: London and the UK Economy 2005

This report looks at a range of links between the London economy and those of the rest of the UK (note 18). It concludes that London’s growth is not at the expense of the rest of the UK, but that London and other UK regions and countries are interdependent.

Table 6.1: Correlation between economic growth in London and the rest of the UK, 1983–2004

Region or country of Great Britain    Correlation coefficient
South East                            0.80
East England                          0.81
South West                            0.64
East Midlands                         0.45
West Midlands                         0.73
North West                            0.73
Yorkshire and Humberside              0.56
North East                            0.22
Wales                                 0.55
Scotland                              0.27
Northern Ireland                      0.36

Table 6.2: Percentage change in employment, 1989–2001

Region or country of Great Britain    % change
South East                            23.7
South West                            21.2
East of England                       18.8
Scotland                              17.4
London                                15.3
East Midlands                         12.5
Wales                                 11.7
West Midlands                         10.8
Yorkshire and Humberside              10.2
North West                            9.9
North East                            7.2

A unique feature of this analysis is research into the correlations between GVA growth in London and other regions and countries.

It is also interesting that the three emerging economies for which we have data in this sample – China, Mexico and South Korea – have much lower labour shares of income than the advanced economies (which might be a bit of evidence to support Marx’s contention that capitalism in the long term would ultimately bid profits down to a level that is too low to permit economic growth). However, these emerging economies currently have much faster rates of economic growth. What is the evidence that higher profits boost economic growth? A crude statistical analysis for the OECD economies for which data is available shows a negative correlation coefficient of -0.31 between the average labour income share from 2000 to 2006 and the rate of economic growth from 2001 to 2008. What this says is that there is a statistically significant negative correlation between the labour income share and GDP growth. In other words, the higher the share of profits, the faster the rate of growth. Although the capital theorists differ, two undoubted heavyweights, Karl Marx and John Maynard Keynes, definitely believed that the higher the profits (and hence investment), the faster the rate of economic growth.
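Whether a correlation of -0.31 is statistically significant depends on how many countries entered the calculation, which the excerpt does not state. The usual test of a Pearson correlation against zero uses t = r√(n − 2)/√(1 − r²), compared with the t-distribution on n − 2 degrees of freedom (the two-tailed 5% critical value is about 2.05 for 28 degrees of freedom). A minimal sketch; the sample sizes below are purely hypothetical.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least three observations")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


for n in (20, 30, 40):  # hypothetical numbers of countries
    print(n, round(correlation_t_statistic(-0.31, n), 3))
```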


All About Asset Allocation, Second Edition by Richard Ferri

activist fund / activist shareholder / activist investor, asset allocation, asset-backed security, barriers to entry, Bernie Madoff, buy and hold, capital controls, commoditize, commodity trading advisor, correlation coefficient, Daniel Kahneman / Amos Tversky, diversification, diversified portfolio, equity premium, estate planning, financial independence, fixed income, full employment, high net worth, Home mortgage interest deduction, implied volatility, index fund, intangible asset, Long Term Capital Management, Mason jar, money market fund, mortgage tax deduction, passive income, pattern recognition, random walk, Richard Thaler, risk tolerance, risk-adjusted returns, risk/return, Robert Shiller, Robert Shiller, selection bias, Sharpe ratio, stocks for the long run, survivorship bias, too big to fail, transaction costs, Vanguard fund, yield curve

That said, I will also warn you that there are no two asset classes that relate the same way to each other all the time. These relationships are dynamic, and they can and do change without warning. Selecting investments that do not go up and down at the same time (or most of the time) can be made easier with correlation analysis. This is a mathematical measure of the tendency of one investment to move in relation to another. The correlation coefficient is a mathematically derived number that measures this tendency toward comovement relative to the investments’ average returns. If two investments tend to be above their average returns at the same time, and below them at the same time, they have a positive correlation. If one tends to be above its average return when the other is below its average, they have a negative correlation. If the movement of one investment relative to its average return is independent of the other, the two investments are noncorrelated.
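Measuring comovement relative to each investment's average return is exactly how the Pearson correlation coefficient is built: take each return's deviation from its own mean, multiply the paired deviations, and scale the sum by the size of the deviations. A minimal pure-Python sketch; the return series and function name are hypothetical.

```python
from math import sqrt


def pearson_from_deviations(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    # Positive products: both above (or both below) their averages at the same time
    comovement = sum(a * b for a, b in zip(dev_x, dev_y))
    return comovement / sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))


fund_a = [0.05, -0.02, 0.08, 0.01, 0.03]   # hypothetical annual returns
fund_b = [0.04, -0.01, 0.06, 0.02, 0.02]   # tends to be above average when fund_a is
fund_c = [-0.03, 0.04, -0.06, 0.02, 0.00]  # tends to be below average when fund_a is above

print(round(pearson_from_deviations(fund_a, fund_b), 3))  # strongly positive
print(round(pearson_from_deviations(fund_a, fund_c), 3))  # strongly negative
```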

Negative correlation is theoretically ideal when selecting investments for a portfolio, but you are not going to find it in the real world. These pairs of investments just do not exist. Correlation is measured using a range between +1 and -1. Two investments that have a correlation of +0.3 or greater are considered positively correlated. When two investments have a correlation of -0.3 or less, this is considered negative correlation. A correlation coefficient between -0.3 and +0.3 is considered noncorrelated. When two investments are noncorrelated, either the movement of one does not track the movement of the other or the tracking is inconsistent and shifts between positive and negative. Figure 3-4 represents two investments that are noncorrelated; sometimes they move together, and sometimes they do not. There is a diversification benefit from investing in noncorrelated assets.
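The rule of thumb above maps directly onto a small function. This is only a sketch of the ±0.3 convention stated in the text (other authors draw the lines elsewhere), and the function name is hypothetical.

```python
def classify_correlation(r):
    """Label a correlation coefficient using the +/-0.3 cut-offs described in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r >= 0.3:
        return "positively correlated"
    if r <= -0.3:
        return "negatively correlated"
    return "noncorrelated"


for r in (0.68, 0.05, -0.45):
    print(r, classify_correlation(r))
```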

Chartered Financial Analyst (CFA) An investment professional who has met competency standards in economics, securities, portfolio management, and financial accounting as determined by the Institute of Chartered Financial Analysts. Closed-End Fund A mutual fund that has a fixed number of shares, usually listed on a major stock exchange. Commodities Unprocessed goods, such as grains, metals, and minerals, traded in large amounts on a commodities exchange. Consumer Price Index (CPI) A measure of the price change in consumer goods and services. The CPI is used to track the pace of inflation. Correlation Coefficient A number between -1 and 1 that measures the degree to which two variables are linearly related. Cost Basis The original cost of an investment. For tax purposes, the cost basis is subtracted from the sale price to determine any capital gain or loss. Country Risk The possibility that political events (e.g., a war, national elections), financial problems (e.g., rising inflation, government default), or natural disasters (e.g., an earthquake, a poor harvest) will weaken a country’s economy and cause investments in that country to decline.


pages: 764 words: 261,694

The Elements of Statistical Learning (Springer Series in Statistics) by Trevor Hastie, Robert Tibshirani, Jerome Friedman

Bayesian statistics, bioinformatics, computer age, conceptual framework, correlation coefficient, G4S, greed is good, linear programming, p-value, pattern recognition, random walk, selection bias, speech recognition, statistical model, stochastic process, The Wisdom of Crowds

Consider the partial covariance matrix $\Sigma_{a.b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$ between the two subsets of variables $X_a = (X_1, X_2)$, consisting of the first two, and $X_b$, the rest. This is the covariance matrix between these two variables, after linear adjustment for all the rest. In the Gaussian distribution, this is the covariance matrix of the conditional distribution of $X_a \mid X_b$. The partial correlation coefficient $\rho_{jk|\text{rest}}$ between the pair $X_a$ conditional on the rest $X_b$ is simply computed from this partial covariance. Define $\Theta = \Sigma^{-1}$.
1. Show that $\Sigma_{a.b} = (\Theta_{aa})^{-1}$.
2. Show that if any off-diagonal element of $\Theta$ is zero, then the partial correlation coefficient between the corresponding variables is zero.
3. Show that if we treat $\Theta$ as if it were a covariance matrix, and compute the corresponding “correlation” matrix $R = \operatorname{diag}(\Theta)^{-1/2} \cdot \Theta \cdot \operatorname{diag}(\Theta)^{-1/2}$, then $r_{jk} = -\rho_{jk|\text{rest}}$.
Ex. 17.4 Denote by $f(X_1 \mid X_2, X_3, \ldots, X_p)$ the conditional density of $X_1$ given $X_2, \ldots, X_p$.
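Parts 1 and 3 of the exercise can be checked numerically before proving them. The sketch below is an illustration, not a proof: it draws a random positive-definite $\Sigma$ (the dimension p = 5 and the seed are arbitrary choices), computes $\Sigma_{a.b}$ and $\rho_{12|\text{rest}}$ directly, and compares them with the quantities obtained from $\Theta = \Sigma^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)  # random symmetric positive-definite covariance

a = [0, 1]             # X_a = (X_1, X_2)
b = list(range(2, p))  # X_b = the remaining variables

S_aa = Sigma[np.ix_(a, a)]
S_ab = Sigma[np.ix_(a, b)]
S_bb = Sigma[np.ix_(b, b)]

# Partial covariance Sigma_{a.b} = S_aa - S_ab S_bb^{-1} S_ba (S_ba = S_ab^T by symmetry)
Sigma_a_given_b = S_aa - S_ab @ np.linalg.solve(S_bb, S_ab.T)
rho_partial = Sigma_a_given_b[0, 1] / np.sqrt(Sigma_a_given_b[0, 0] * Sigma_a_given_b[1, 1])

# The same information from the precision matrix Theta = Sigma^{-1}
Theta = np.linalg.inv(Sigma)
scale = np.diag(1.0 / np.sqrt(np.diag(Theta)))
R = scale @ Theta @ scale  # diag(Theta)^{-1/2} Theta diag(Theta)^{-1/2}

print(np.allclose(Sigma_a_given_b, np.linalg.inv(Theta[np.ix_(a, a)])))  # part 1: True
print(np.isclose(R[0, 1], -rho_partial))                                 # part 3: True
```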

Breiman and Friedman (1997) explored with some success shrinkage of the canonical variates between X and Y, a smooth version of reduced rank regression. Their proposal has the form (compare (3.69))

$$\hat{B}^{c+w} = \hat{B} U \Lambda U^{-1}, \qquad (3.72)$$

where $\Lambda$ is a diagonal shrinkage matrix (the “c+w” stands for “Curds and Whey,” the name they gave to their procedure). Based on optimal prediction in the population setting, they show that $\Lambda$ has diagonal entries

$$\lambda_m = \frac{c_m^2}{c_m^2 + \frac{p}{N}\left(1 - c_m^2\right)}, \quad m = 1, \ldots, M, \qquad (3.73)$$

where $c_m$ is the $m$th canonical correlation coefficient. Note that as the ratio of the number of input variables to sample size $p/N$ gets small, the shrinkage factors approach 1. Breiman and Friedman (1997) proposed modified versions of $\Lambda$ based on training data and cross-validation, but the general form is the same. Here the fitted response has the form

$$\hat{Y}^{c+w} = H Y S^{c+w}, \qquad (3.74)$$

where $S^{c+w} = U \Lambda U^{-1}$ is the response shrinkage operator.
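The shrinkage factors in (3.73) are easy to tabulate. A minimal sketch; the canonical correlations and the values of p and N are hypothetical, and the function name is our own.

```python
import numpy as np


def curds_whey_factors(canonical_corrs, p, N):
    """Diagonal entries lambda_m = c_m^2 / (c_m^2 + (p/N) * (1 - c_m^2)) of (3.73)."""
    c2 = np.asarray(canonical_corrs, dtype=float) ** 2
    return c2 / (c2 + (p / N) * (1.0 - c2))


c = [0.9, 0.5, 0.1]  # hypothetical canonical correlations
print(curds_whey_factors(c, p=10, N=100))        # weak canonical directions shrink the most
print(curds_whey_factors(c, p=10, N=1_000_000))  # p/N near 0: every factor near 1
```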

Thus, the choice of a particular value of M is not critical, as long as it is not too small. This tends to be the case in many applications. The shrinkage strategy (10.41) tends to eliminate the problem of overfitting, especially for larger data sets. The value of AAE after 800 iterations is 0.31. This can be compared to that of the optimal constant predictor $\operatorname{median}\{y_i\}$, which is 0.89. In terms of more familiar quantities, the squared multiple correlation coefficient of this model is $R^2 = 0.84$. Pace and Barry (1997) use a sophisticated spatial autoregression procedure, where prediction for each neighborhood is based on median house values in nearby neighborhoods, using the other predictors as covariates. Experimenting with transformations they achieved $R^2 = 0.85$, predicting $\log Y$. Using $\log Y$ as the response, the corresponding value for gradient boosting was $R^2 = 0.86$.
Footnote 2: http://lib.stat.cmu.edu.
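The two summaries quoted above, average absolute error (AAE) and the squared multiple correlation $R^2$, can be computed as follows. A sketch with made-up numbers: $R^2$ is taken here as the usual 1 − RSS/TSS, the baseline is the sample median (the best constant predictor under absolute error), and the helper names are hypothetical.

```python
import numpy as np


def average_absolute_error(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))


def r_squared(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    rss = np.sum((y - y_hat) ** 2)     # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)  # total sum of squares about the mean
    return float(1.0 - rss / tss)


y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])        # hypothetical responses
y_model = np.array([1.1, 2.0, 2.9, 4.2, 9.5])   # hypothetical model predictions
y_const = np.full_like(y, np.median(y))         # constant median predictor

print(average_absolute_error(y, y_const))  # baseline AAE
print(average_absolute_error(y, y_model))
print(r_squared(y, y_model))
```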


pages: 295 words: 66,824

A Mathematician Plays the Stock Market by John Allen Paulos

Benoit Mandelbrot, Black-Scholes formula, Brownian motion, business climate, business cycle, butter production in bangladesh, butterfly effect, capital asset pricing model, correlation coefficient, correlation does not imply causation, Daniel Kahneman / Amos Tversky, diversified portfolio, dogs of the Dow, Donald Trump, double entry bookkeeping, Elliott wave, endowment effect, Erdős number, Eugene Fama: efficient market hypothesis, four colour theorem, George Gilder, global village, greed is good, index fund, intangible asset, invisible hand, Isaac Newton, John Nash: game theory, Long Term Capital Management, loss aversion, Louis Bachelier, mandelbrot fractal, margin call, mental accounting, Myron Scholes, Nash equilibrium, Network effects, passive investing, Paul Erdős, Paul Samuelson, Ponzi scheme, price anchoring, Ralph Nelson Elliott, random walk, Richard Thaler, Robert Shiller, Robert Shiller, short selling, six sigma, Stephen Hawking, stocks for the long run, survivorship bias, transaction costs, ultimatum game, Vanguard fund, Yogi Berra

Even a portfolio of stocks from the same sector will be less volatile than the individual stocks in it, while a portfolio consisting of Wal-Mart, Pfizer, General Electric, Exxon, and Citigroup, the biggest stocks in their respective sectors, will provide considerably more protection against volatility. To find the volatility of a portfolio in general, we need what is called the “covariance” (closely related to the correlation coefficient) between any pair of stocks X and Y in the portfolio. The covariance between two stocks is roughly the degree to which they vary together—the degree, that is, to which a change in one is proportional to a change in the other. Note that unlike many other contexts in which the distinction between covariance (or, more familiarly, correlation) and causation is underlined, the market generally doesn’t care much about it.
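The role of covariance in portfolio volatility can be written as $\sigma_p^2 = w^\top \Sigma w$, where $\Sigma_{ij} = \rho_{ij}\sigma_i\sigma_j$ is the covariance between stocks i and j. A minimal sketch for two stocks with hypothetical 30% volatilities and equal weights shows the portfolio's volatility falling as the correlation between them falls; the function name is our own.

```python
import numpy as np


def portfolio_volatility(weights, vols, corr):
    """Standard deviation of portfolio return from weights, volatilities, and a correlation matrix."""
    w = np.asarray(weights, dtype=float)
    s = np.asarray(vols, dtype=float)
    cov = np.outer(s, s) * np.asarray(corr, dtype=float)  # Cov_ij = rho_ij * s_i * s_j
    return float(np.sqrt(w @ cov @ w))


weights = [0.5, 0.5]
vols = [0.30, 0.30]
for rho in (1.0, 0.5, 0.0):
    corr = [[1.0, rho], [rho, 1.0]]
    print(rho, round(portfolio_volatility(weights, vols, corr), 4))
```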

Brian auditors Aumann, Robert availability error average values compared with distribution of incomes risk as variance from averages average return compared with median return average value compared with distribution of incomes buy-sell rules and outguessing average guess risk as variance from average value averaging down Bachelier, Louis Bak, Per Barabasi, Albert-Lazló Bartiromo, Maria bear markets investor self-descriptions and shorting and distorting strategy in Benford, Frank Benford’s Law applying to corporate fraud background of frequent occurrence of numbers governed by Bernoulli, Daniel Beta (B) values causes of variations in comparing market against individual stocks or funds strengths and weaknesses of technique for finding volatility and Big Bang billiards, as example of nonlinear system binary system biorhythm theory Black, Fischer Black-Scholes option formula blackjack strategies Blackledge, Todd “blow up,” investor blue chip companies, P/E ratio of Bogle, John bonds Greenspan’s impact on bond market history of stocks outperforming will not necessarily continue to be outperformed by stocks Bonds, Barry bookkeeping. see accounting practices bottom-line investing Brock, William brokers. see stock brokers Buffett, Warren bull markets investor self-descriptions and pump and dump strategy in Butterfly Economics (Ormerod) “butterfly effect,” of nonlinear systems buy-sell rules buying on the margin. see also margin investments calendar effects call options. see also stock options covering how they work selling strategies valuation tools campaign contributions Capital Asset Pricing Model capital gains vs. dividends Central Limit Theorem CEOs arrogance of benefits in manipulating stock prices remuneration compared with that of average employee volatility due to malfeasance of chain letters Chaitin, Gregory chance. see also whim trading strategies and as undeniable factor in market chaos theory. 
see also nonlinear systems charity Clayman, Michelle cognitive illusions availability error confirmation bias heuristics rules of thumb for saving time mental accounts status quo bias Cohen, Abby Joseph coin flipping common knowledge accounting scandals and definition and importance to investors dynamic with private knowledge insider trading and parable illustrating private information becoming companies/corporations adjusting results to meet expectations applying Benford’s Law to corporate fraud comparing corporate and personal accounting financial health and P/E ratio of blue chips competition vs. cooperation, prisoner’s dilemma complexity changing over time horizon of sequences (mathematics) of trading strategies compound interest as basis of wealth doubling time and formulas for future value and present value and confirmation bias definition of investments reflecting stock-picking and connectedness. see also networks European market causing reaction on Wall Street interactions based on whim interactions between technical traders and value traders irrational interactions between traders Wolfram model of interactions between traders Consumer Confidence Index (CCI) contrarian investing dogs of the Dow measures of excellence and rate of return and cooperation vs. competition, prisoner’s dilemma correlation coefficient. 
see also statistical correlations counter-intuitive investment counterproductive behavior, psychology of covariance calculation of portfolio diversification based on portfolio volatility and stock selection and Cramer, James crowd following or not herd-like nature of price movements dart throwing, stock-picking contest in the Wall Street Journal data mining illustrated by online chatrooms moving averages and survivorship bias and trading strategies and DeBondt, Werner Deciding What’s News (Gans) decimalization reforms decision making minimizing regret selling WCOM depression of derivatives trading, Enron despair and guilt over market losses deviation from the mean. see also mean value covariance standard deviation (d) variance dice, probability and Digex discounting process, present value of future money distribution of incomes distribution of wealth dynamic of concentration UN report on diversified portfolios. see stock portfolios, diversifying dividends earnings and proposals benefitting returns from Dodd, David dogs of the Dow strategy “dominance” principle, game theory dot com IPOs, as a pyramid scheme double-bottom trend reversal “double-dip” recession double entry bookkeeping doubling time, compound interest and Dow dogs of the Dow strategy percentages of gains and losses e (exponential growth) compound interest and higher mathematics and earnings anchoring effect and complications with determination of inflating (WCOM) P/E ratio and stock valuation and East, Steven H.


pages: 217 words: 152

Why Airplanes Crash: Aviation Safety in a Changing World by Clinton V. Oster, John S. Strong, C. Kurt Zorn

airline deregulation, airport security, correlation coefficient, Tenerife airport disaster, trickle-down economics

Source: Data provided by Battelle Aviation Safety Reporting System Office. the risk that appears to be indicated by the incidents and the appropriate type of accident. Air traffic control operational errors, pilot deviations, and near midair collisions all appear to be indicators of risk of midair collision. Operational errors do not seem closely correlated with midair collisions. Terminal airspace operational errors have essentially no correlation (a correlation coefficient of -0.03) based on the eight years of available data. ARTCC operational errors have been influenced by the introduction of the snitch patch and the controllers' adjustment to it. In the four years of the post-snitch patch era, the correlation with midair collisions is only 0.20. Over the same period, operational errors are actually negatively correlated with the FAA's count of total near midair collisions (-0.84) and critical near midair collisions (-0.83).

Pilot deviations resulting in a loss of separation are highly correlated over the period, but total pilot deviations, deviations that resulted in violation of restricted airspace, and deviations by general aviation pilots are negatively correlated. Five years of data are simply too little upon which to base a conclusion. Near midair collisions are also not strongly correlated with midair collisions. Indeed, near midair collisions as reported to the ASRS show no correlation (-0.08) over the seven years of available data. Near midair collisions reported to the FAA have a correlation coefficient of only 0.51, although critical NMACs are somewhat higher at 0.76. The FAA correlation is based on only seven years of data. The poor correlation between potential midair collision incidents and accidents is disappointing to those seeking nonaccident leading indicators of aviation safety, but may not be surprising in light of the characteristics of the incident data. First, of course, the data have not been collected long enough to find a relationship even if one existed.
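The caution about five to eight years of data can be quantified. The sketch below is a simulation under the assumption of two independent, normally distributed annual series (the seed and trial count are arbitrary); it estimates how often a sample correlation of magnitude 0.5 or more appears by chance alone when there is no underlying relationship.

```python
import numpy as np


def chance_of_large_r(n_years, threshold=0.5, trials=20_000, seed=0):
    """Fraction of simulated pairs of independent series whose |r| >= threshold."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((trials, n_years))
    y = rng.standard_normal((trials, n_years))
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    return float(np.mean(np.abs(r) >= threshold))


for n in (5, 7, 8, 30):
    print(n, chance_of_large_r(n))
```

With only a handful of years, sizable correlations arise by chance far more often than with a few decades of data.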


pages: 681 words: 64,159

Numpy Beginner's Guide - Third Edition by Ivan Idris

algorithmic trading, business intelligence, Conway's Game of Life, correlation coefficient, Debian, discrete time, en.wikipedia.org, general-purpose programming language, Khan Academy, p-value, random walk, reversible computing, time value of money

We will measure the correlation of our pair with the correlation coefficient. The correlation coefficient takes values between -1 and 1. The correlation of a set of values with itself is 1 by definition. This would be the ideal value; however, we will also be happy with a slightly lower value. Calculate the correlation coefficient (or, more accurately, the correlation matrix) with the corrcoef() function:
print("Correlation coefficient", np.corrcoef(bhp_returns, vale_returns))
The coefficients are as follows:
[[ 1.          0.67841747]
 [ 0.67841747  1.        ]]
The values on the diagonal are just the correlations of BHP and VALE with themselves and are, therefore, equal to 1. In all likelihood, no real calculation takes place. The other two values are equal to each other since correlation is symmetrical, meaning that the correlation of BHP with VALE is equal to the correlation of VALE with BHP.

For the source code, see the correlation.py file in this book's code bundle:

from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt

# Load column 6 (zero-based) of each CSV as the price series
bhp = np.loadtxt('BHP.csv', delimiter=',', usecols=(6,), unpack=True)
bhp_returns = np.diff(bhp) / bhp[:-1]

vale = np.loadtxt('VALE.csv', delimiter=',', usecols=(6,), unpack=True)
vale_returns = np.diff(vale) / vale[:-1]

covariance = np.cov(bhp_returns, vale_returns)
print("Covariance", covariance)
print("Covariance diagonal", covariance.diagonal())
print("Covariance trace", covariance.trace())

# np.cov normalises by n - 1 (ddof=1), so the standard deviations must use ddof=1 too;
# the off-diagonal entries of this matrix are then the correlation coefficient
print(covariance / (bhp_returns.std(ddof=1) * vale_returns.std(ddof=1)))
print("Correlation coefficient", np.corrcoef(bhp_returns, vale_returns))

# Flag the pair as out of sync when the latest price difference lies more than
# two standard deviations from the mean difference
difference = bhp - vale
avg = np.mean(difference)
dev = np.std(difference)
print("Out of sync", np.abs(difference[-1] - avg) > 2 * dev)

t = np.arange(len(bhp_returns))
plt.plot(t, bhp_returns, lw=1, label='BHP returns')
plt.plot(t, vale_returns, '--', lw=2, label='VALE returns')
plt.title('Correlating arrays')
plt.xlabel('Days')
plt.ylabel('Returns')
plt.grid()
plt.legend(loc='best')
plt.show()
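Because the BHP and VALE files are not reproduced here, the sketch below uses simulated prices (all values hypothetical) to confirm that dividing the covariance by the product of the standard deviations reproduces `np.corrcoef`, provided both use the same `ddof`. Mixing `np.cov`'s default `ddof=1` with `ndarray.std`'s default `ddof=0` inflates the result by a factor of n/(n − 1), where n is the number of returns.

```python
import numpy as np

rng = np.random.default_rng(3)
n_days = 250
common = rng.normal(0.0, 0.01, n_days)  # shared daily driver (hypothetical)
price_a = 100 * np.cumprod(1 + common + rng.normal(0.0, 0.005, n_days))
price_b = 50 * np.cumprod(1 + common + rng.normal(0.0, 0.005, n_days))

returns_a = np.diff(price_a) / price_a[:-1]
returns_b = np.diff(price_b) / price_b[:-1]

cov = np.cov(returns_a, returns_b)  # normalised by n - 1 (ddof=1) by default
manual = cov[0, 1] / (returns_a.std(ddof=1) * returns_b.std(ddof=1))
builtin = np.corrcoef(returns_a, returns_b)[0, 1]
mismatched = cov[0, 1] / (returns_a.std() * returns_b.std())  # std defaults to ddof=0

print(manual, builtin)       # identical up to rounding
print(mismatched / builtin)  # n / (n - 1), with n = number of returns
```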


pages: 220 words: 73,451

Democratizing innovation by Eric von Hippel

additive manufacturing, correlation coefficient, Debian, disruptive innovation, hacker house, informal economy, information asymmetry, inventory management, iterative process, James Watt: steam engine, knowledge economy, longitudinal study, meta analysis, meta-analysis, Network effects, placebo effect, principal–agent problem, Richard Stallman, software patent, transaction costs, Vickrey auction

Ogawa determined how much of the design for each was done by the user firm and how much by the manufacturer firm. Controlling for profit expectations, he found that increases in the stickiness of user information were associated with a significant increase in the amount of need-related design undertaken by the user (Kendall correlation coefficient = 0.5784, P < 0.01). Conversely, he found that increased stickiness of technology-related information was associated with a significant reduction in the amount of technology design done by the user (Kendall correlation coefficient = 0.4789, P < 0.05). In other words, need-intensive tasks within product-development projects will tend to be done by users, while solution-intensive ones will tend to be done by manufacturers.
Low-Cost Innovation Niches
Just as there are information asymmetries between users and manufacturers as classes, there are also information asymmetries among individual user firms and individuals, and among individual manufacturers as well.
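Kendall's coefficient compares every pair of observations and counts whether the two variables move in the same direction (concordant) or in opposite directions (discordant). A minimal pure-Python sketch of the tau-b variant, which adjusts for ties; the excerpt does not say which variant Ogawa used, so this choice is an assumption, and the sample data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) / sqrt((n0 - ties_x) * (n0 - ties_y))."""
    concordant = discordant = 0
    tied_x = tied_y = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0 and dy == 0:  # tied on both variables
            tied_x += 1
            tied_y += 1
        elif dx == 0:
            tied_x += 1
        elif dy == 0:
            tied_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2  # total number of pairs
    return (concordant - discordant) / sqrt((n0 - tied_x) * (n0 - tied_y))


stickiness = [1, 2, 3, 4, 5]   # hypothetical ordinal ratings
user_design = [3, 4, 1, 2, 5]  # hypothetical share of design done by the user
print(kendall_tau_b(stickiness, user_design))  # 6 concordant, 4 discordant pairs
```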


pages: 263 words: 75,455

Quantitative Value: A Practitioner's Guide to Automating Intelligent Investment and Eliminating Behavioral Errors by Wesley R. Gray, Tobias E. Carlisle

activist fund / activist shareholder / activist investor, Albert Einstein, Andrei Shleifer, asset allocation, Atul Gawande, backtesting, beat the dealer, Black Swan, business cycle, butter production in bangladesh, buy and hold, capital asset pricing model, Checklist Manifesto, cognitive bias, compound rate of return, corporate governance, correlation coefficient, credit crunch, Daniel Kahneman / Amos Tversky, discounted cash flows, Edward Thorp, Eugene Fama: efficient market hypothesis, forensic accounting, hindsight bias, intangible asset, Louis Bachelier, p-value, passive investing, performance metric, quantitative hedge fund, random walk, Richard Thaler, risk-adjusted returns, Robert Shiller, Robert Shiller, shareholder value, Sharpe ratio, short selling, statistical model, survivorship bias, systematic trading, The Myth of the Rational Market, time value of money, transaction costs

Profitable Months: Proportion of monthly performances that have a positive return.
Rolling 5-Year Wins: Proportion of rolling 5-year periods that a designated strategy beats the identified benchmarks.
Rolling 10-Year Wins: Proportion of rolling 10-year periods that a designated strategy beats identified benchmarks.
Cumulative Drawdown: Sum of the rolling 5-year period worst drawdowns for the designated strategy.
Correlation: Correlation coefficient for a designated strategy and the identified benchmarks, which demonstrates the extent to which a designated strategy and the identified benchmarks move together.
RISK AND RETURN
Table 12.2 sets out the standard statistical analyses of the Quantitative Value strategy's performance and risk profile, comparing it to the Magic Formula, the Standard & Poor's (S&P) 500 and the MW Index, the market capitalization–weighted index of the universe from which we draw the stocks in the model portfolios.
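Several of the metrics defined above are straightforward to compute from monthly returns. A sketch under stated assumptions: returns are monthly simple returns, a window counts as a win when the strategy's compounded return exceeds the benchmark's, and the data and function names are hypothetical rather than the authors' own code.

```python
import numpy as np


def profitable_months(returns):
    """Proportion of months with a positive return."""
    return float(np.mean(np.asarray(returns) > 0))


def rolling_wins(strategy, benchmark, window_months):
    """Proportion of rolling windows in which the strategy's compounded return beats the benchmark's."""
    s = np.asarray(strategy, dtype=float)
    b = np.asarray(benchmark, dtype=float)
    n_windows = len(s) - window_months + 1
    wins = 0
    for start in range(n_windows):
        end = start + window_months
        if np.prod(1 + s[start:end]) > np.prod(1 + b[start:end]):
            wins += 1
    return wins / n_windows


rng = np.random.default_rng(5)
benchmark = rng.normal(0.008, 0.04, 240)             # 20 years of hypothetical monthly returns
strategy = benchmark + rng.normal(0.002, 0.02, 240)  # hypothetical strategy returns

print(profitable_months(strategy))
print(rolling_wins(strategy, benchmark, 60))   # Rolling 5-Year Wins
print(rolling_wins(strategy, benchmark, 120))  # Rolling 10-Year Wins
print(np.corrcoef(strategy, benchmark)[0, 1])  # Correlation
```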

Profitable Months: Proportion of monthly performances that have a positive return.
Rolling 5-Year Wins: Proportion of rolling 5-year periods that a designated strategy beats the identified benchmarks.
Rolling 10-Year Wins: Proportion of rolling 10-year periods that a designated strategy beats identified benchmarks.
Cumulative Drawdown: Sum of the rolling 5-year period worst drawdowns for the designated strategy.
Correlation: Correlation coefficient for a designated strategy and the identified benchmarks, which demonstrates the extent to which a designated strategy and the identified benchmarks move together.
About the Authors
Wesley R. Gray, PhD, is the founder and executive managing member of Empiritrage, LLC, an SEC-Registered Investment Advisor, and Turnkey Analyst, LLC, a firm dedicated to educating and sharing quantitative investment techniques to the general public.


pages: 431 words: 132,416

No One Would Listen: A True Financial Thriller by Harry Markopolos

backtesting, barriers to entry, Bernie Madoff, buy and hold, call centre, centralized clearinghouse, correlation coefficient, diversified portfolio, Edward Thorp, Emanuel Derman, Eugene Fama: efficient market hypothesis, family office, financial thriller, fixed income, forensic accounting, high net worth, index card, Long Term Capital Management, Louis Bachelier, offshore financial centre, Ponzi scheme, price mechanism, quantitative trading / quantitative finance, regulatory arbitrage, Renaissance Technologies, risk-adjusted returns, risk/return, rolodex, Sharpe ratio, statistical arbitrage, too big to fail, transaction costs, your tax dollars at work

That meant that he knew from his order flow what stocks were going to go up, which obviously would have been extremely beneficial when he was picking stocks for his basket. We found out later that several hedge funds believed he was doing this. I created hypothetical baskets using the best-performing stocks and followed his split-strike strategy, selling the call option to generate income and buying the put option for protection. The following week I’d pick another basket. I expected the correlation coefficient—the relationship between Bernie’s returns and the movement of the entire S&P 100—legitimately to be around 50 percent, but it could have been anywhere between 30 percent and 80 percent and I would have accepted it naively. Instead Madoff was coming in at about 6 percent. Six percent! That was impossible. That number was much too low. It meant there was almost no relationship between those stocks and the entire index.

After going through my work, Dan told us that whatever Madoof, as he referred to him, was doing, he was not getting his results from the market. Pointing to the 6 percent correlation and the 45-degree return line, he said, “That doesn’t look like it came from a finance distribution. We don’t have those kinds of charts in finance.” I was right, he agreed. Madoof’s strategy description claimed his returns were market-driven, yet his correlation coefficient was only 6 percent to the market and his performance line certainly wasn’t coming from the stock market. Volatility is a natural part of the market. It moves up and down—and does it every day. Any graphic representation of the market has to reflect that. Yet Madoff’s 45-degree rise represented a market without that volatility. It wasn’t possible. Bernie Madoff was a fraud. And whatever he was actually doing, it was enough to put him in prison.

Broyhill meets Scott Franzblau and mobsters on Ponzi scheme on Ponzi scheme vs. front-running post Bernie Madoff arrest public acknowledgment of role and Rene-Thierry de la Villehuchet on reporting to SEC on reverse engineering role of sailing disaster on SEC failure Wall Street Journal warns individual investors Casey, Judy Cattle trading scam Charles, Prince Chelo, Neil: business education of careers at Rampart continued activities of early career of impact of Bernie Madoff case on information gathering leaves Rampart OPRA tapes on payment for order flow on Ponzi scheme vs. front-running post Bernie Madoff arrest public acknowledgment of role on quants on reporting to SEC reviews strategy analysis role of talks to Amit Vijayvergiya Wall Street Journal warns individual investors Cheung, Meaghan Chicago Art Museum Chicago Board of Options Exchange (CBOE) Chinese vitamin suppliers scandal Citigroup Client redemptions Clinton, Hillary Cohen, Steve Collars Commodities straddle Commodity Futures Trading Commission Congress: Chuck Schumer call to SEC Harry Markopolos testimony investigation by SEC established by SEC in hearings by Congressional Record Contacts and relationships Cook, Boyd Correlation coefficient Corruption: as business as usual drug cartels incompetence vs. municipal bonds organized crime regulatory reporting on Russian mafia vs. 
stupidity Wall Street crimes See also Taxpayers Against Fraud; whistleblowers Court, Andy Covered call writing program Cox, Christopher Criminal investigation CSPAN3 Cuomo, Andrew Danger, concerns about Darien Capital Management Data analysis Data collection DeBello, Nicole de la Villehuchet, Bertrand de la Villehuchet, Claudine de la Villehuchet, Rene-Thierry: on Bernie Madoff and Frank Casey and Harry Markopolos meets Frank Casey suicide Department of Justice Derivative experts Devoe, George diBartolomeo, Dan Dickens, Charles Direct accounts Discrepancies Documentation Documentation and literature Dominelli, David Donnelly, Joe Drosos, Elaine Drug cartels Due diligence Dumb equity Ebbers, Bernie Efficient markets hypothesis Electronic security Electronic trading European banks European investors Excuses for investing with Bernie Madoff Fairfield Emerald Fairfield Greenwich Group Fairfield Greenwich Sentry Fund False Claims Act cases Fax machines Federal Bureau of Investigation (FBI) Feeder funds Fielder, David Financial frauds.


pages: 394 words: 85,734

The Global Minotaur by Yanis Varoufakis, Paul Mason

active measures, banking crisis, Berlin Wall, Big bang: deregulation of the City of London, Bretton Woods, business climate, business cycle, capital controls, Carmen Reinhart, central bank independence, collapse of Lehman Brothers, collateralized debt obligation, colonial rule, corporate governance, correlation coefficient, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, debt deflation, declining real wages, deindustrialization, endogenous growth, eurozone crisis, financial innovation, first-past-the-post, full employment, Hyman Minsky, industrial robot, Joseph Schumpeter, Kenneth Rogoff, Kickstarter, labour market flexibility, light touch regulation, liquidity trap, London Interbank Offered Rate, Long Term Capital Management, market fundamentalism, Mexican peso crisis / tequila crisis, money market fund, mortgage debt, Myron Scholes, negative equity, new economy, Northern Rock, paper trading, Paul Samuelson, planetary scale, post-oil, price stability, quantitative easing, reserve currency, rising living standards, Ronald Reagan, special economic zone, Steve Jobs, structural adjustment programs, systematic trading, too big to fail, trickle-down economics, urban renewal, War on Poverty, WikiLeaks, Yom Kippur War

This book is not the place to enter into the proof in any detail. If interested, please consult Y. Varoufakis, J. Halevi and N. Theocarakis (2011) Modern Political Economics: Making sense of the post-2008 world, London and New York: Routledge. 11. In more technical language, the formulae used to assemble the CDOs assumed that the correlation coefficient between the probability of default across a CDO’s different tranches or slices was constant, small and knowable. 12. Doubt about the constancy of the correlation coefficient (see previous footnote) would have cost them their jobs, particularly as their supervisors did not really understand the formula but were receiving huge bonuses while it was being used. 13. See George Soros (2009) The Crash of 2008 and What It Means: The new paradigm for financial markets, New York: Public Affairs.


Beginning R: The Statistical Programming Language by Mark Gardener

correlation coefficient, distributed generation, natural language processing, New Urbanism, p-value, statistical model

subset = group %in% "sample" If the data includes a grouping variable, the subset instruction can be used to select one or more samples from this grouping. The commands summarized in Table 6-3 enable you to carry out a range of correlation tasks. In the following sections you see a few of these options illustrated, and you can then try some correlations yourself in the activity that follows. Simple Correlation Simple correlations are between two continuous variables and you can use the cor() command to obtain a correlation coefficient like so: > count = c(9, 25, 15, 2, 14, 25, 24, 47) > speed = c(2, 3, 5, 9, 14, 24, 29, 34) > cor(count, speed) [1] 0.7237206 The default for R is to carry out the Pearson product-moment correlation, but you can specify other correlations using the method = instruction, like so: > cor(count, speed, method = 'spearman') [1] 0.5269556 This example used the Spearman rho correlation, but you can also apply Kendall’s tau by specifying method = "kendall".
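As a cross-check outside R, here is a minimal Python sketch of the two calculations in the excerpt above. Pearson's r is computed from its definition (the sum of cross-products of deviations divided by the square root of the product of the two sums of squares), and Spearman's rho is computed as Pearson's r on the ranks, with tied values given their average rank. The function names are illustrative and do not come from any library. On the same count and speed vectors the sketch should reproduce R's 0.7237206 and 0.5269556.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

count = [9, 25, 15, 2, 14, 25, 24, 47]
speed = [2, 3, 5, 9, 14, 24, 29, 34]
print(round(pearson(count, speed), 7))   # 0.7237206
print(round(spearman(count, speed), 7))  # 0.5269556
```

The two 25s in count share rank 6.5, which is why the Spearman value differs from a naive 1..n ranking.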

. $ weight: num 115 117 120 123 126 129 132 135 139 142 ... You need to use attach() or with() commands to allow R to “read inside” the data frame and access the variables within. You could also use the $ syntax so that the command can access the variables as the following example shows: > cor(women$height, women$weight) [1] 0.9954948 In this example the cor() command has calculated the Pearson correlation coefficient between the height and weight variables contained in the women data frame. You can also use the cor() command directly on a data frame (or matrix). If you use the data frame women that you just looked at, for example, you get the following: > cor(women) height weight height 1.0000000 0.9954948 weight 0.9954948 1.0000000 Now you have a correlation matrix that shows you all combinations of the variables in the data frame.
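The same idea can be sketched in Python by applying a pairwise Pearson function to every pair of columns, which gives a matrix like the one cor(women) prints. The diagonal is 1 and the matrix is symmetric. The values below are typed in from R's built-in women dataset (heights 58 to 72 inches, weights in pounds). The dictionary layout and the function names are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cor_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Values typed in from R's built-in `women` data frame (15 rows).
women = {
    "height": list(range(58, 73)),  # inches
    "weight": [115, 117, 120, 123, 126, 129, 132, 135,
               139, 142, 146, 150, 154, 159, 164],  # pounds
}

m = cor_matrix(women)
for row in m:
    print(row, [round(m[row][col], 7) for col in m])
# height [1.0, 0.9954948]
# weight [0.9954948, 1.0]
```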

In the Pearson correlation you are assuming that the data are normally distributed and are looking to see how close the relationship is between the variables. In regression you are taking the analysis further and assuming a mathematical, and therefore predictable, relationship between the variables. The results of regression analysis show the slope and intercept values that describe this relationship. The R squared value that you obtain from the regression is the square of the correlation coefficient from the Pearson correlation, which demonstrates the similarities between the methods. The result shows you the coefficients for the regression, that is, the intercept and the slope. To see more details you should save your regression as a named object; then you can use the summary() command like so: > fw.lm = lm(count ~ speed, data = fw) > summary(fw.lm) Call: lm(formula = count ~ speed, data = fw) Residuals: Min 1Q Median 3Q Max -13.377 -5.801 -1.542 5.051 14.371 Coefficients: Estimate Std.
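The claim that R squared equals the square of Pearson's r can be checked with a short Python sketch that fits count on speed by least squares. This sketch assumes the fw data frame holds the same count and speed vectors used in the cor() example earlier in the excerpt. The residual minimum and maximum it produces (about −13.377 and 14.371) match the summary(fw.lm) output quoted above, which supports that assumption. The equality holds for a simple regression with one predictor and an intercept. With several predictors, R squared is instead the squared correlation between the observed and fitted values.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_ols(x, y):
    """Least-squares fit y = a + b*x; returns (intercept, slope, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals

def r_squared(y, residuals):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    my = sum(y) / len(y)
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

count = [9, 25, 15, 2, 14, 25, 24, 47]
speed = [2, 3, 5, 9, 14, 24, 29, 34]

a, b, res = simple_ols(speed, count)  # like lm(count ~ speed)
print(round(a, 4), round(b, 4))                 # 8.2546 0.7914
print(round(min(res), 3), round(max(res), 3))   # -13.377 14.371
print(r_squared(count, res), pearson(count, speed) ** 2)  # the same value twice
```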


pages: 459 words: 144,009

Upheaval: Turning Points for Nations in Crisis by Jared Diamond

anti-communist, Asian financial crisis, Berlin Wall, British Empire, California gold rush, clean water, correlation coefficient, cuban missile crisis, Dissolution of the Soviet Union, Gini coefficient, illegal immigration, interchangeable parts, invention of writing, Jeff Bezos, medical malpractice, mutually assured destruction, Nelson Mandela, nuclear winter, oil shale / tar sands, peak oil, post-work, purchasing power parity, rising living standards, risk tolerance, Ronald Reagan, The Spirit Level, traffic fines, transcontinental railway, women in the workforce, World Values Survey

Is this central belief of ours true? One method by which social scientists have tested this belief is to compare, among different countries, the correlation coefficients between incomes (or income ranks within people of their generation) of adults and the incomes of their parents. A correlation coefficient of 1.0 would mean that relative incomes of parents and of their adult children are perfectly correlated: all high-income people are children of high-income parents, all low-income people are children of low-income parents, kids from low-income families have zero chance of achieving high incomes, and socio-economic mobility is zero. At the opposite extreme, if the correlation coefficient were zero, it would mean that children of low-income parents have as good a chance of achieving high incomes as do children of high-income parents, and socio-economic mobility is high.


pages: 268 words: 89,761

Unhealthy societies: the afflictions of inequality by Richard G. Wilkinson

attribution theory, business cycle, clean water, correlation coefficient, experimental subject, full employment, fundamental attribution error, Gini coefficient, income inequality, income per capita, Indoor air pollution, invisible hand, land reform, longitudinal study, means of production, purchasing power parity, rising living standards, twin studies, upwardly mobile

If increases in GNPpc over time were simply understated, it might be thought that this would not mask a statistical relationship between health and GNPpc: the extent to which societies benefited from qualitative changes in output would be a constant function of their growth rates. If this were so, then understated growth would change the units rather than weaken the correlation between the two. It would tend to make any given increase in income appear more health effective. In technical terms: rather than weakening the correlation coefficient it would increase the size of the regression coefficient. However, it could be argued that the spread of better products does not depend simply on the expenditure which results from the few per cent of income growth. Much nearer to the truth is that, as earlier forms of goods are made obsolete and replaced in the shops by new models and lines, the whole flow of expenditure is applied to the current range of goods, including new goods and ones in which the quality has changed.
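Wilkinson's technical point can be checked directly: a change of units alters the regression coefficient but not the correlation coefficient. The figures in the Python sketch below are made up purely for illustration and are not Wilkinson's data. Halving every growth value, as if growth were systematically understated, leaves r unchanged and doubles the fitted slope. In other words, any given increase in income appears "more health effective" without the correlation becoming any weaker.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ols_slope(x, y):
    """Least-squares regression coefficient of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Made-up illustrative figures: income growth (%) and a health gain.
growth = [1.0, 2.0, 2.5, 3.5, 4.0]
health = [0.5, 1.1, 1.0, 1.9, 2.2]
understated = [g / 2 for g in growth]  # same growth, recorded at half its size

print(round(pearson(growth, health), 4), round(pearson(understated, health), 4))  # equal
print(round(ols_slope(growth, health), 4), round(ols_slope(understated, health), 4))  # slope doubles
```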

Looking at data from nine industrialised countries, Kunst and Mackenbach (1994) found that, ‘The rank order of countries in terms of income inequalities strongly corresponds to their rank order in terms of inequalities in mortality.’ Since then van Doorslaer et al. have reported that differences in self-reported illness were greatest in countries whose income differences were greatest (van Doorslaer et al. 1996). The relationship between measures of inequality in income and in illness was very close: across the USA and the eight European countries for which they had data, the correlation coefficient was 0.87. The methods used in each study were quite different. Kunst and Mackenbach classified people according to occupation in one of their studies and by education in another, and they concluded that occupational and educational differences in mortality were greater in countries where income differences were greater. In contrast, van Doorslaer et al. used data from surveys giving details of incomes and self-reported health for the same individuals.


pages: 287 words: 44,739

Guide to business modelling by John Tennent, Graham Friend, Economist Group

business cycle, correlation coefficient, discounted cash flows, double entry bookkeeping, G4S, intangible asset, iterative process, purchasing power parity, RAND corporation, shareholder value, the market place, time value of money

The coefficient of determination R² (which is calculated automatically by most spreadsheet packages) indicates how much of the variation in Y is explained by the explanatory variables. The greater the value of R², the more the variation in the dependent variable is explained by the selected independent variables. The square root of the coefficient of determination is the product moment correlation coefficient in the case of linear regression of a straight line. The product moment correlation is a number between 1 and −1. If r = 1 then there is a perfect, positive relationship between the dependent and explanatory variable. A perfect relation implies that every data point lies on a straight line. If r = −1 then a perfect negative relationship exists, and if r = 0 there is no relationship. Estimating the coefficients To demonstrate a number of linear regression estimation techniques, it is necessary to develop a forecast for gross connections based on the historical data set out in Chart 10.12.

The resulting graph, the regression line and the regression equation and R² value are shown in Chart 10.14. Chart 10.14 Regression equation for monthly gross connections against time The R² value is very low at 0.315. This implies that only 31.5% of the variation in gross connections is explained by time, so any forecast based on this regression equation will be liable to considerable error. The correlation coefficient is the square root of R²: √0.315 = 0.561. Although this value is low it is possible to show, using significance testing, that time is still a significant determinant of gross connections. This procedure is quick and simple to use. However, to develop a forecast it is necessary to use the equation of the straight line. The use of the graphical procedure does not allow this unless the formula is reproduced manually by extrapolating the coefficients by hand and entering them in a spreadsheet.
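Here is a small Python sketch of the two calculations mentioned in this excerpt. Taking the square root of R² discards the sign, so r has to take its sign from the fitted slope. The usual test of whether r differs from zero uses t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The excerpt does not give the number of months in Chart 10.12, so the n below is a hypothetical placeholder. Turning t into a p-value would still require a t table or a statistics library. Note also that r = 0 rules out only a straight-line relationship, not every kind of relationship. The function names are illustrative.

```python
import math

def r_from_r_squared(r_squared, slope):
    """Recover Pearson's r from the R^2 of a one-predictor regression.

    The square root loses the sign, so the sign is taken from the fitted slope.
    """
    return math.copysign(math.sqrt(r_squared), slope)

def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r = r_from_r_squared(0.315, slope=1.0)  # slope assumed positive (connections rising over time)
print(round(r, 3))  # 0.561

n = 36  # hypothetical number of monthly observations, not taken from the book
print(round(t_statistic(r, n), 2))
```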


pages: 353 words: 88,376

The Investopedia Guide to Wall Speak: The Terms You Need to Know to Talk Like Cramer, Think Like Soros, and Buy Like Buffett by Jack (edited By) Guinan

Albert Einstein, asset allocation, asset-backed security, Brownian motion, business cycle, business process, buy and hold, capital asset pricing model, clean water, collateralized debt obligation, computerized markets, correlation coefficient, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, discounted cash flows, diversification, diversified portfolio, dividend-yielding stocks, dogs of the Dow, equity premium, fixed income, implied volatility, index fund, intangible asset, interest rate swap, inventory management, London Interbank Offered Rate, margin call, money market fund, mortgage debt, Myron Scholes, passive investing, performance metric, risk tolerance, risk-adjusted returns, risk/return, shareholder value, Sharpe ratio, short selling, statistical model, time value of money, transaction costs, yield curve, zero-coupon bond

Related Terms: • Bond • Municipal Bond • Yield to maturity—YTM • Debt Financing • Yield Correlation What Does Correlation Mean? In the investment world, correlation is a statistical measure of how two securities move in relation to each other. Correlations are used in advanced portfolio management. Investopedia explains Correlation Correlation is expressed as the correlation coefficient, which ranges between –1 and +1. Perfect positive correlation (a correlation coefficient of +1) means that as one security moves up or down, the other security will move lockstep in the same direction. Perfect negative correlation means that when one security moves in one direction, the other security will move by an equal amount in the opposite direction. If the correlation is 0, the movements of the securities are said to have no correlation; they are completely random.
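Correlations between securities are normally computed on returns rather than on price levels. The following minimal Python sketch does that, using hypothetical price series built from chosen return sequences. It also shows that a correlation of +1 requires only that the returns move in fixed proportion, not by equal amounts: here the second security's returns are exactly twice the first's. A correlation near 0 means there is no linear co-movement, which is a weaker statement than saying the movements are completely random. The helper names are illustrative.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_returns(prices):
    """Period-to-period returns: p[t] / p[t-1] - 1."""
    return [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]

def prices_from_returns(returns, start=100.0):
    """Hypothetical helper: compound a starting price through a list of returns."""
    prices = [start]
    for r in returns:
        prices.append(prices[-1] * (1 + r))
    return prices

base = [0.02, -0.01, 0.03, -0.02, 0.01]
stock_a = prices_from_returns(base)
stock_b = prices_from_returns([2 * r for r in base])  # moves twice as far, same direction
stock_c = prices_from_returns([-r for r in base])     # same size moves, opposite direction

ra, rb, rc = (simple_returns(p) for p in (stock_a, stock_b, stock_c))
print(round(pearson(ra, rb), 6), round(pearson(ra, rc), 6))  # 1.0 -1.0
```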


The Rise and Decline of Nations: Economic Growth, Stagflation, and Social Rigidities by Mancur Olson

"Robert Solow", barriers to entry, British Empire, business cycle, California gold rush, collective bargaining, correlation coefficient, David Ricardo: comparative advantage, full employment, income per capita, Kenneth Arrow, market clearing, Norman Macrae, Pareto efficiency, price discrimination, profit maximization, rent-seeking, Sam Peltzman, selection bias, Simon Kuznets, The Wealth of Nations by Adam Smith, trade liberalization, transaction costs, urban decay, working poor

.: Prentice-Hall, 1980), especially chapter 3. 33. Kwang Choi, "A Study of Comparative Rates of Economic Growth" (forthcoming, Iowa State University Press) and Kwang Choi, "A Statistical Test of the Political Economy of Comparative Growth Rates Model," in Mueller, The Political Economy of Growth. 34. Spearman rank correlation coefficients between years since statehood and LPI, PN, and per capita LPI, PN were respectively -.52, -.67, -.52, and -.52, and the correlation coefficients were in every case significant. 35. Farm organization membership need not be correlated with union membership, but farm groups focus almost exclusively on the farm policies of the federal government, and any losses in output due to them must fall mainly on consumers throughout the United States, rather than in the state in which the farmers are organized, so farm organization membership probably should not be included in tests on the forty-eight contiguous states.


pages: 339 words: 112,979

Unweaving the Rainbow by Richard Dawkins

Any sufficiently advanced technology is indistinguishable from magic, Arthur Eddington, complexity theory, correlation coefficient, David Attenborough, discovery of DNA, double helix, Douglas Engelbart, I think there is a world market for maybe five computers, Isaac Newton, Jaron Lanier, Mahatma Gandhi, music of the spheres, Necker cube, p-value, phenotype, Ralph Waldo Emerson, Richard Feynman, Ronald Reagan, Solar eclipse in 1919, Steven Pinker, Zipf's Law

Different astrologers, after all, presumably have access to the same books. Even if their verdicts are wrong, you'd think their methods would be systematic enough at least to agree in producing the same wrong verdicts! Alas, as shown in a study by G. Dean and colleagues, they don't even achieve this minimal and easy benchmark. For comparison, when different assessors judged people on their performance in structured interviews, the correlation coefficient was greater than 0.8 (a correlation coefficient of 1.0 would represent perfect agreement, –1.0 would represent perfect disagreement, 0.0 would represent complete randomness or lack of association; 0.8 is pretty good). Against this, in the same study, the reliability coefficient for astrology was a pitiable 0.1, comparable to the figure for palmistry (0.11), and indicating near total randomness. However wrong astrologers may be, you'd think that they would have got their act together to the extent of at least being consistent Apparently not.


pages: 397 words: 109,631

Mindware: Tools for Smart Thinking by Richard E. Nisbett

affirmative action, Albert Einstein, availability heuristic, big-box store, Cass Sunstein, choice architecture, cognitive dissonance, correlation coefficient, correlation does not imply causation, cosmological constant, Daniel Kahneman / Amos Tversky, dark matter, endowment effect, experimental subject, feminist movement, fixed income, fundamental attribution error, glass ceiling, Henri Poincaré, Intergovernmental Panel on Climate Change (IPCC), Isaac Newton, job satisfaction, Kickstarter, lake wobegon effect, libertarian paternalism, longitudinal study, loss aversion, low skilled workers, Menlo Park, meta analysis, meta-analysis, quantitative easing, Richard Thaler, Ronald Reagan, selection bias, Shai Danziger, Socratic dialogue, Steve Jobs, Steven Levy, the scientific method, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, William of Occam, Zipcar

A correlation of .7 corresponds to the association between height and weight—substantial but still not perfect. A correlation of .8 corresponds to the degree of association you find between scores on the math portion of the Scholastic Aptitude Test (SAT) at one testing and scores on that test a year later—quite high but still plenty of room for difference between the two scores on average. Correlation Does Not Establish Causality Correlation coefficients are one step in assessing causal relations. If there is no correlation between variable A and variable B, there (probably) is no causal relation between A and B. (An exception would be when there is a third variable C that masks the correlation between A and B when there is in fact a causal relation between A and B.) If there is a correlation between variable A and variable B, this doesn’t establish that variation in A causes variation in B.
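Nisbett's parenthetical exception is easy to simulate: a third variable C can hide a real causal effect of A on B. The Python sketch below uses a made-up model for illustration that does not come from the book. In it, C raises A, A raises B, and C also lowers B directly, with strengths chosen so that the two paths cancel. The raw correlation between A and B comes out near zero. The partial correlation controlling for C, r_AB·C = (r_AB − r_AC·r_BC)/√((1 − r_AC²)(1 − r_BC²)), recovers the relationship, at about 0.71 in this setup.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly removing c from both."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 20_000
C = [rng.gauss(0, 1) for _ in range(n)]
A = [c + rng.gauss(0, 1) for c in C]                     # C pushes A up
B = [a - 2 * c + rng.gauss(0, 1) for a, c in zip(A, C)]  # A raises B; C lowers B directly

print(round(pearson(A, B), 3))          # close to 0
print(round(partial_corr(A, B, C), 3))  # close to 0.707
```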

Coding Is the Key to Thinking Statistically I’m going to ask you some questions concerning your beliefs about what you think the correlation is between a number of pairs of variables. The way I’ll do that is to ask you how likely it is that A would be greater than B on one occasion given that A was greater than B on another occasion. Your answers in probability terms can be converted to correlation coefficients by a mathematical formula. Note that if you say “50 percent” for a question below, you’re saying that you think there’s no relationship between behavior on one occasion and behavior on another. If you say “90 percent,” you’re saying that there is an extremely strong relationship between behavior on one occasion and behavior on another. For the first question below about spelling ability, if you think that there is no consistency between spelling performance on one occasion and spelling performance on another occasion, you would say “50 percent.”
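The excerpt does not say which formula is used to turn these probabilities into correlations. One standard conversion, offered here as an assumption rather than as Nisbett's own method, treats the A − B differences on the two occasions as bivariate normal with mean zero. Under that model, the chance that A beats B again, given that it did the first time, is P = 1/2 + arcsin(r)/π (from the bivariate normal orthant probability, sometimes credited to Sheppard), so r = sin(π(P − 1/2)). This conversion gives r = 0 for 50 percent, matching the text, and about 0.95 for 90 percent. The function names are illustrative.

```python
import math

def prob_to_corr(p):
    """Correlation implied by P(same ordering again) under a zero-mean bivariate normal model."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return math.sin(math.pi * (p - 0.5))

def corr_to_prob(r):
    """Inverse: probability that the same ordering repeats, given correlation r."""
    return 0.5 + math.asin(r) / math.pi

for p in (0.5, 0.6, 0.75, 0.9):
    print(p, round(prob_to_corr(p), 2))
# 0.5 0.0
# 0.6 0.31
# 0.75 0.71
# 0.9 0.95
```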


pages: 416 words: 106,532

Cryptoassets: The Innovative Investor's Guide to Bitcoin and Beyond by Chris Burniske, Jack Tatar

Airbnb, altcoin, asset allocation, asset-backed security, autonomous vehicles, bitcoin, blockchain, Blythe Masters, business cycle, business process, buy and hold, capital controls, Carmen Reinhart, Clayton Christensen, clean water, cloud computing, collateralized debt obligation, commoditize, correlation coefficient, creative destruction, Credit Default Swap, credit default swaps / collateralized debt obligations, cryptocurrency, disintermediation, distributed ledger, diversification, diversified portfolio, Donald Trump, Elon Musk, en.wikipedia.org, Ethereum, ethereum blockchain, fiat currency, financial innovation, fixed income, George Gilder, Google Hangouts, high net worth, Jeff Bezos, Kenneth Rogoff, Kickstarter, Leonard Kleinrock, litecoin, Marc Andreessen, Mark Zuckerberg, market bubble, money market fund, money: store of value / unit of account / medium of exchange, moral hazard, Network effects, packet switching, passive investing, peer-to-peer, peer-to-peer lending, Peter Thiel, pets.com, Ponzi scheme, prediction markets, quantitative easing, RAND corporation, random walk, Renaissance Technologies, risk tolerance, risk-adjusted returns, Robert Shiller, Ross Ulbricht, Satoshi Nakamoto, Sharpe ratio, Silicon Valley, Simon Singh, Skype, smart contracts, social web, South Sea Bubble, Steve Jobs, transaction costs, tulip mania, Turing complete, Uber for X, Vanguard fund, WikiLeaks, Y2K

A group of stocks is inherently more diversified than a single stock, and therefore the volatility should be lower. Cryptoassets have near-zero correlation to other capital market assets. The best explanation for this is that cryptoassets are so new that many capital market investors don’t play in the same asset pools. Therefore, cryptoassets aren’t dancing to the same rhythm of information as traditional capital market assets, at least not yet. Figure 7.19 The correlation coefficient and effects of diversification on risk Source: A Random Walk Down Wall Street, Burton G. Malkiel, 2015 Figure 7.19 clearly shows that if an asset is zero correlated to other assets in a portfolio, then “considerable risk reduction is possible.” In quantitative terms, reducing risk can be seen by a decrease in the volatility of the portfolio. If an asset merely reduces the risk of the overall portfolio by being lowly to negatively correlated with other assets, then it doesn’t have to provide superior absolute returns to improve the risk-reward ratio of the overall portfolio.
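The risk reduction described for Figure 7.19 comes from the correlation term in the variance of a portfolio. For two assets with weights w₁ and w₂, volatilities σ₁ and σ₂, and correlation ρ, the portfolio volatility is σₚ = √(w₁²σ₁² + w₂²σ₂² + 2w₁w₂ρσ₁σ₂). The Python sketch below uses hypothetical numbers (two assets, each with 20% volatility, held 50/50) to show how σₚ falls as ρ falls. It applies the textbook two-asset formula and is not the calculation behind the book's figure. The function name is illustrative.

```python
import math

def two_asset_volatility(sigma1, sigma2, w1, rho):
    """Volatility of a fully invested two-asset portfolio (w2 = 1 - w1)."""
    w2 = 1.0 - w1
    variance = ((w1 * sigma1) ** 2 + (w2 * sigma2) ** 2
                + 2 * rho * w1 * sigma1 * w2 * sigma2)
    return math.sqrt(max(variance, 0.0))  # guard against a tiny negative from rounding

for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    vol = two_asset_volatility(0.20, 0.20, 0.5, rho)
    print(f"rho = {rho:+.1f}  portfolio volatility = {vol:.1%}")
# 20.0%, 17.3%, 14.1%, 10.0%, 0.0%
```

With zero correlation the portfolio's volatility drops from 20% to about 14% even though neither asset became less volatile, which is the "considerable risk reduction" the excerpt describes.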

See Bitcoin Tracker One Cold storage, 221–222 Collaboration, 111 community and, 56 platforms for, 159 Collateralized mortgage obligations (CMOs), 4–5 Colored coins, 53 Commodities, 80, 172, 276–277 Commodities Futures Trading Commission (CFTC), 107, 112, 224, 276 Communication, 14 Communication Nets, xxiii Community, 57, 62 collaboration and, 56 of computers, 18 developers and, 182 Companies, 28, 63, 118 as incumbents, 264–273 interface services by, 113 OTC by, 216 as peer-to-peer, 13 perspective of, 249–250 risk and, 75 support and, 198–200 technology and, 264–265 value of, 152 venture capitalism for, 248 Competition, 16, 214 Compound annual growth rate (CAGR), 118–119 Compound annual returns, 87, 88, 103–104 Computer scientists, 60 Computers blockchain technology and, 26, 186 community of, 18 as miners, 16 for mining, 212 private keys on, 226 supercomputers as, 59 Consortium, 272–273 Consumable/Transformable (C/T) Assets, 109–110 Content, 174 Corbin, Abel, 164–165 Cornering, 163–166 cryptoassets and, 166–168 Correlation coefficient, 101 Correlation of returns, 74–76 Correlations, 122 assets and, 74 Bitcoin and, 133 cryptoassets and, 101–102 market behavior and, 132–135 Counterparty, 53–54 CPUs. See Central processing units Credit, 153 assets and, 143 issuers quality of, 239 Credulity, 141 The Crowd: A Study of the Popular Mind (Le Bon), 140 Crowd theory, 141 Crowdfunding, 60 Internet and, 250–254, 256 for investors, 250–252 for projects, 254 regulations and, 250 Crowds, 137–153 Crowdsale, 257 Cryptoassets.


pages: 743 words: 189,512

The Big Fat Surprise: Why Butter, Meat and Cheese Belong in a Healthy Diet by Nina Teicholz

Albert Einstein, correlation coefficient, correlation does not imply causation, Gary Taubes, Indoor air pollution, meta analysis, meta-analysis, phenotype, placebo effect, randomized controlled trial, Robert Gordon, selection bias, the scientific method, Upton Sinclair

In 1999, when the Seven Countries study’s lead Italian researcher, Alessandro Menotti, went back twenty-five years later and looked at data from the study’s 12,770 subjects, he noticed an interesting fact: the category of foods that best correlated with coronary mortality was sweets. By “sweets,” he meant sugar products and pastries, which had a correlation coefficient with coronary mortality of 0.821 (a perfect correlation is 1.0). Possibly this number would have been higher had Menotti included chocolate, ice cream, and soft drinks in his “sweets” category, but those fell under a different category and, he explained, would have been “too troublesome” to recode. By contrast, “animal food” (butter, meat, eggs, margarine, lard, milk, and cheese) had a correlation coefficient of 0.798, and this number likely would have been lower had Menotti excluded margarine. (Margarine is usually made from vegetable fats, but researchers at the time tended to lump it in with animal foods because it looked so much like butter.)

“a remarkable and troublesome omission”: Katerina Sarri and Anthony Kafatos, letter to the editor, “The Seven Countries Study in Crete: Olive Oil, Mediterranean Diet or Fasting?” Public Health Nutrition 8, no. 6 (2005): 666. “we should not” . . . “the ideal thing all the time”: Daan Kromhout, interview with author, October 4, 2007. he knew it would go unnoticed: Keys, Aravanis, and Sdrin, “Diets of Middle-Aged Men in Two Rural Areas of Greece,” 577. category of foods . . . which had a correlation coefficient: Alessandro Menotti et al., “Food Intake Patterns and 25-Year Mortality from Coronary Heart Disease: Cross-Cultural Correlations in the Seven Countries Study,” European Journal of Epidemiology 15, no. 6 (1999): 507–515. “too troublesome” to recode: Alessandro Menotti, interview with author, July 24, 2008. “Keys was very opposed to the sugar idea”: Kromhout, interview. “He was so convinced that fatty acids” . . .


pages: 823 words: 220,581

Debunking Economics - Revised, Expanded and Integrated Edition: The Naked Emperor Dethroned? by Steve Keen

"Robert Solow", accounting loophole / creative accounting, banking crisis, banks create money, barriers to entry, Benoit Mandelbrot, Big bang: deregulation of the City of London, Black Swan, Bonfire of the Vanities, business cycle, butterfly effect, capital asset pricing model, cellular automata, central bank independence, citizen journalism, clockwork universe, collective bargaining, complexity theory, correlation coefficient, creative destruction, credit crunch, David Ricardo: comparative advantage, debt deflation, diversification, double entry bookkeeping, en.wikipedia.org, Eugene Fama: efficient market hypothesis, experimental subject, Financial Instability Hypothesis, fixed income, Fractional reserve banking, full employment, Henri Poincaré, housing crisis, Hyman Minsky, income inequality, information asymmetry, invisible hand, iterative process, John von Neumann, Kickstarter, laissez-faire capitalism, liquidity trap, Long Term Capital Management, mandelbrot fractal, margin call, market bubble, market clearing, market microstructure, means of production, minimum wage unemployment, money market fund, open economy, Pareto efficiency, Paul Samuelson, place-making, Ponzi scheme, profit maximization, quantitative easing, RAND corporation, random walk, risk tolerance, risk/return, Robert Shiller, Robert Shiller, Ronald Coase, Schrödinger's Cat, scientific mainstream, seigniorage, six sigma, South Sea Bubble, stochastic process, The Great Moderation, The Wealth of Nations by Adam Smith, Thorstein Veblen, time value of money, total factor productivity, tulip mania, wage slave, zero-sum game

Just as significantly, the unemployment rate stabilized when the decline in debt-financed demand turned around. Though the huge fiscal and monetary stimulus packages also played a role, changes in debt-financed demand dominate economic performance. One statistical indicator of the importance of debt dynamics in causing both the Great Depression and the Great Recession and the booms that preceded them is the correlation coefficient between changes in debt and the level of unemployment. Over the whole period from 1921 till 1940, the correlation coefficient was minus 0.83, while over the period from 1990 till 2011, it was minus 0.91 (versus the maximum value it could have taken of minus one). A correlation of that scale, over time periods of that length, when economic circumstances varied from bust to boom and back again, is staggering. 13.31 Debt-financed demand and unemployment, 1990–2011 The Credit Impulse confirms the dominant role of private debt.

In Sharpe’s words: In order to derive conditions for equilibrium in the capital market we invoke two assumptions. First, we assume a common pure rate of interest, with all investors able to borrow or lend funds on equal terms. Second, we assume homogeneity of investor expectations: investors are assumed to agree on the prospects of various investments – the expected values, standard deviations and correlation coefficients described in Part II. Needless to say, these are highly restrictive and undoubtedly unrealistic assumptions. However, since the proper test of a theory is not the realism of its assumptions but the acceptability of its implications, and since these assumptions imply equilibrium conditions which form a major part of classical financial doctrine, it is far from clear that this formulation should be rejected – especially in view of the dearth of alternative models leading to similar results.


The Age of Turbulence: Adventures in a New World (Hardback) - Common by Alan Greenspan

"Robert Solow", addicted to oil, air freight, airline deregulation, Albert Einstein, asset-backed security, bank run, Berlin Wall, Bretton Woods, business cycle, business process, buy and hold, call centre, capital controls, central bank independence, collateralized debt obligation, collective bargaining, conceptual framework, Corn Laws, corporate governance, corporate raider, correlation coefficient, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, crony capitalism, cuban missile crisis, currency peg, Deng Xiaoping, Dissolution of the Soviet Union, Doha Development Round, double entry bookkeeping, equity premium, everywhere but in the productivity statistics, Fall of the Berlin Wall, fiat currency, financial innovation, financial intermediation, full employment, Gini coefficient, Hernando de Soto, income inequality, income per capita, invisible hand, Joseph Schumpeter, labor-force participation, laissez-faire capitalism, land reform, Long Term Capital Management, Mahatma Gandhi, manufacturing employment, market bubble, means of production, Mikhail Gorbachev, moral hazard, mortgage debt, Myron Scholes, Nelson Mandela, new economy, North Sea oil, oil shock, open economy, Pearl River Delta, pets.com, Potemkin village, price mechanism, price stability, Productivity paradox, profit maximization, purchasing power parity, random walk, reserve currency, Right to Buy, risk tolerance, Ronald Reagan, shareholder value, short selling, Silicon Valley, special economic zone, stocks for the long run, the payments system, The Wealth of Nations by Adam Smith, Thorstein Veblen, too big to fail, total factor productivity, trade liberalization, trade route, transaction costs, transcontinental railway, urban renewal, working-age population, Y2K, zero-sum game

At the same time, Germany ranks among the highest in terms of the freedom of its people to open and close businesses, property-rights protection, and the overall rule of law. France (number forty-five) and Italy (number sixty) have profiles that are similarly mixed. The ultimate test of the usefulness of such a scoring process is whether it correlates with economic performance. And it does. The correlation coefficient across 157 countries between their "Economic Freedom Score" and the log of their per capita incomes is 0.65, impressive for such a motley body of data.* Thus, we are left with a critical question: Granted that open competitive markets foster economic growth, is there an optimum trade-off between economic performance and the competitive stress it imposes on the one hand, and the civility that, for example, the continental Europeans and many others espouse on the other?

Accordingly the weighted correlation between national saving rates and domestic investment rates for countries or regions representing virtually all of the world's gross domestic product, a measure of the degree of home bias, declined from a coefficient of around 0.95 in 1992, where it had hovered since 1970, to an estimated 0.74 in 2005. (If in every country saving equaled investment—that is, if there were 100 percent home bias—the correlation coefficient would be 1.0. On the other hand, if there were no home bias, and the amount of domestic saving bore no relationship to the amount and location of investments, the coefficient would be 0.)* Only in the past decade has expanding trade been associated with the emergence of ever-larger U.S. trade and current account deficits, matched by a corresponding widening of the aggregate external surpluses of many of our trading partners, most recently including China.
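The excerpt does not say how the saving/investment correlation was weighted. A minimal sketch, assuming weights proportional to each country's share of world GDP, is a weighted Pearson correlation; the function `weighted_corr` and all the numbers are hypothetical, not Greenspan's data. It reproduces the limiting case in the text: if every country invests exactly what it saves, the coefficient is 1.

```python
import math

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w.

    Weights are normalized internally, so GDP levels or GDP shares both work.
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(var_x * var_y)

# Hypothetical saving and investment rates (% of GDP) and GDP shares.
saving = [18.0, 25.0, 30.0, 14.0]
gdp_share = [0.30, 0.20, 0.10, 0.40]

full_home_bias = list(saving)               # every country invests exactly what it saves
partial_home_bias = [20.0, 23.0, 26.0, 19.0]

print(weighted_corr(saving, full_home_bias, gdp_share))     # 1.0, up to rounding
print(weighted_corr(saving, partial_home_bias, gdp_share))  # positive but below 1
```

Weighting by GDP keeps small economies from dominating the measure; with equal weights the function reduces to the ordinary Pearson coefficient.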

The piling up of dollar claims against U.S. residents is already leading to concerns about "concentration risk," the too-many-eggs-in-one-basket worry that could prompt foreign holders to exchange dollars for other currencies, even when the dollar investments yield more. Although foreign investors have not yet significantly slowed their financing of U.S. capital investments, since early 2002 the value of the dollar relative to other currencies has declined, as has the share of dollar assets in some measures of global cross-border portfolios.* If the current disturbing drift toward protectionism is contained and markets remain sufficiently flexible, changing terms of trade, interest rates, asset prices, and exchange rates should cause U.S. saving to rise relative to domestic investment.

*The persistent divergence subsequent to the creation of the euro of many prices of identical goods among member countries of the euro area is analyzed in John H. Rogers (2002). For the case of U.S. and Canadian prices, see Charles Engel and John H. Rogers (1996).
†The correlation coefficient measures of home bias have flattened out since 2000. So have the measures of dispersion. This is consistent with the United States' accounting for a rising share of deficits.


Mastering Machine Learning With Scikit-Learn by Gavin Hackeling

computer vision, constrained optimization, correlation coefficient, Debian, distributed generation, iterative process, natural language processing, Occam's razor, optical character recognition, performance metric, recommendation engine

An r-squared score of one indicates that the response variable can be predicted without any error using the model. An r-squared score of one half indicates that half of the variance in the response variable can be predicted using the model. There are several methods to calculate r-squared. In the case of simple linear regression, r-squared is equal to the square of the Pearson product moment correlation coefficient, or Pearson's r. Using this method, r-squared must be a number between zero and one. This method is intuitive; if r-squared describes the proportion of variance in the response variable explained by the model, it cannot be greater than one or less than zero. Other methods, including the method used by scikit-learn, do not calculate r-squared as the square of Pearson's r, and can return a negative r-squared if the model performs extremely poorly.
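The two definitions can be compared in a short sketch. It uses only standard formulas: Pearson's r, ordinary least squares with one predictor, and R² = 1 − SS_res/SS_tot, the definition documented for scikit-learn's `sklearn.metrics.r2_score`. To stay dependency-free, the code reimplements that formula (as the hypothetical helper `r2_from_residuals`) instead of importing scikit-learn; the data are made up.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r2_from_residuals(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; negative when the model is worse than the mean."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def ols_line(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y)) / sum((p - mean_x) ** 2 for p in x)
    return mean_y - b * mean_x, b

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up data

a, b = ols_line(x, y)
fitted = [a + b * xi for xi in x]
print(pearson_r(x, y) ** 2, r2_from_residuals(y, fitted))  # the two agree for an OLS fit

poor = [10.0 - 2.0 * xi for xi in x]   # a deliberately bad model
print(r2_from_residuals(y, poor))      # negative (about -3.53)
```

For a least-squares line with an intercept the two numbers coincide; the residual-based formula only goes below zero when the predictions are worse than simply predicting the mean of y.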


pages: 433 words: 53,078

Be Your Own Financial Adviser: The Comprehensive Guide to Wealth and Financial Planning by Jonquil Lowe

AltaVista, asset allocation, banking crisis, BRICs, buy and hold, correlation coefficient, cross-subsidies, diversification, diversified portfolio, estate planning, fixed income, high net worth, money market fund, mortgage debt, mortgage tax deduction, negative equity, offshore financial centre, Own Your Own Home, passive investing, place-making, Right to Buy, risk/return, short selling, zero-coupon bond

The extent to which different investments or assets are correlated can be measured and represented by a statistic called a ‘correlation coefficient’. A coefficient of 1 would mean that two asset classes moved perfectly in step (so there would not be any point combining the assets). A coefficient of zero would mean the asset classes were completely uncorrelated. Most coefficients lie between these two extremes. A negative coefficient means that positive performance for one asset class is associated with negative performance from the other. (A coefficient of −1 would mean the assets were perfectly negatively correlated, with the risk of losses on one asset being completely offset by the chance of gains on the other, so that, with suitable weights, risk could be eliminated altogether for a portfolio made up of these two assets.) Table 10.1 on p. 301 shows the correlation coefficients for different pairs of asset classes using data for the period from 1997 to 2009.
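The diversification effect described here follows from the two-asset portfolio variance, σp² = w1²σ1² + w2²σ2² + 2w1w2ρσ1σ2. A minimal sketch with hypothetical volatilities of 20% and 10% and equal weights shows portfolio volatility falling as the coefficient falls. It also shows that a coefficient of −1 removes risk entirely only at one particular weighting, w1 = σ2/(σ1 + σ2); the function name is illustrative.

```python
import math

def portfolio_volatility(w1, sigma1, sigma2, rho):
    """Standard deviation of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 + 2 * w1 * w2 * rho * sigma1 * sigma2
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding error

sigma_a, sigma_b = 0.20, 0.10   # hypothetical annual volatilities
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    print(rho, round(portfolio_volatility(0.5, sigma_a, sigma_b, rho), 4))

# With rho = -1, risk vanishes only at the weight w1 = sigma_b / (sigma_a + sigma_b).
w_star = sigma_b / (sigma_a + sigma_b)
print(round(portfolio_volatility(w_star, sigma_a, sigma_b, -1.0), 6))  # 0.0
```

With ρ = 1 the portfolio volatility is just the weighted average of the two volatilities (0.15 here), which is why perfectly correlated assets offer no diversification benefit.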


Trend Commandments: Trading for Exceptional Returns by Michael W. Covel

Albert Einstein, Bernie Madoff, Black Swan, business cycle, buy and hold, commodity trading advisor, correlation coefficient, delayed gratification, diversified portfolio, en.wikipedia.org, Eugene Fama: efficient market hypothesis, family office, full employment, Lao Tzu, Long Term Capital Management, market bubble, market microstructure, Mikhail Gorbachev, moral hazard, Myron Scholes, Nick Leeson, oil shock, Ponzi scheme, prediction markets, quantitative trading / quantitative finance, random walk, Sharpe ratio, systematic trading, the scientific method, transaction costs, tulip mania, upwardly mobile, Y2K, zero-sum game

Consider this nearly 40-year track record of trend trading wealth building:

Chart 1: Bill Dunn. DUNN Composite Performance, October 1974 through January 2011 (unit value, log scale). Compound annual rate of return: DUNN Composite 19.09%, S&P 500 (total return) 11.85%. Correlation coefficient: −0.05. Ten drawdowns greater than −25%; average major drawdown 38%. Includes notional and proprietary funds, all net of pro forma fees and expenses. Past performance is not necessarily indicative of future results.

That picture is worth a thousand words.


pages: 204 words: 58,565

Keeping Up With the Quants: Your Guide to Understanding and Using Analytics by Thomas H. Davenport, Jinho Kim

Black-Scholes formula, business intelligence, business process, call centre, computer age, correlation coefficient, correlation does not imply causation, Credit Default Swap, en.wikipedia.org, feminist movement, Florence Nightingale: pie chart, forensic accounting, global supply chain, Hans Rosling, hypertext link, invention of the telescope, inventory management, Jeff Bezos, Johannes Kepler, longitudinal study, margin call, Moneyball by Michael Lewis explains big data, Myron Scholes, Netflix Prize, p-value, performance metric, publish or perish, quantitative hedge fund, random walk, Renaissance Technologies, Robert Shiller, Robert Shiller, self-driving car, sentiment analysis, six sigma, Skype, statistical model, supply-chain management, text mining, the scientific method, Thomas Davenport

…records in a database) into groups (called clusters) so that objects within clusters are similar in some manner while objects across clusters are dissimilar to each other. Clustering is a main task of exploratory data mining, and a common technique for statistical data analysis used in many fields.

Correlation: The extent to which two or more variables are related to one another. The degree of relatedness is expressed as a correlation coefficient, which ranges from −1.0 to +1.0.
Correlation = +1 (Perfect positive correlation, meaning that both variables always move in the same direction together)
Correlation = 0 (No linear relationship between the variables)
Correlation = −1 (Perfect negative correlation, meaning that as one variable goes up, the other always goes down)
Correlation does not imply causation. Correlation is a necessary but insufficient condition for causal conclusions.
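The sample Pearson coefficient behind this glossary entry is r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below, with made-up data, reproduces the three reference values. The last case shows why r = 0 means no linear relationship rather than no relationship at all: y = x² depends completely on x, yet r is 0 for inputs symmetric around zero.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [3 * v + 1 for v in x]))   # 1.0: perfect positive linear relationship
print(pearson_r(x, [-0.5 * v for v in x]))    # -1.0: perfect negative linear relationship
print(pearson_r(x, [v * v for v in x]))       # 0.0: y depends on x, but not linearly
```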


pages: 935 words: 267,358

Capital in the Twenty-First Century by Thomas Piketty

"Robert Solow", accounting loophole / creative accounting, Asian financial crisis, banking crisis, banks create money, Berlin Wall, Branko Milanovic, British Empire, business cycle, capital controls, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, central bank independence, centre right, circulation of elites, collapse of Lehman Brothers, conceptual framework, corporate governance, correlation coefficient, David Ricardo: comparative advantage, demographic transition, distributed generation, diversification, diversified portfolio, European colonialism, eurozone crisis, Fall of the Berlin Wall, financial intermediation, full employment, German hyperinflation, Gini coefficient, high net worth, Honoré de Balzac, immigration reform, income inequality, income per capita, index card, inflation targeting, informal economy, invention of the steam engine, invisible hand, joint-stock company, Joseph Schumpeter, Kenneth Arrow, market bubble, means of production, mortgage debt, mortgage tax deduction, new economy, New Urbanism, offshore financial centre, open economy, Paul Samuelson, pension reform, purchasing power parity, race to the bottom, randomized controlled trial, refrigerator car, regulatory arbitrage, rent control, rent-seeking, Robert Gordon, Ronald Reagan, Simon Kuznets, sovereign wealth fund, Steve Jobs, The Nature of the Firm, the payments system, The Wealth of Nations by Adam Smith, Thomas Malthus, Thorstein Veblen, trade liberalization, twin studies, very high income, Vilfredo Pareto, We are the 99%, zero-sum game

But this is a different issue from skill and earned income mobility, which is what is of interest here and is the focal point of these measurements of intergenerational mobility. The data used in these works do not allow us to isolate mobility of capital income. 28. The correlation coefficient ranges from 0.2–0.3 in Sweden and Finland to 0.5–0.6 in the United States. Britain (0.4–0.5) is closer to the United States but not so far from Germany or France (0.4). Concerning international comparisons of intergenerational correlation coefficients of earned income (which are also confirmed by twin studies), see the work of Markus Jantti. See the online technical appendix. 29. The cost of an undergraduate year at Harvard in 2012–2013 was $54,000, including room and board and various other fees (tuition in the strict sense was $38,000).

According to the available data, the answer seems to be no: the intergenerational correlation of education and earned incomes, which measures the reproduction of the skill hierarchy over time, shows no trend toward greater mobility over the long run, and in recent years mobility may even have decreased.26 Note, however, that it is much more difficult to measure mobility across generations than it is to measure inequality at a given point in time, and the sources available for estimating the historical evolution of mobility are highly imperfect.27 The most firmly established result in this area of research is that intergenerational reproduction is lowest in the Nordic countries and highest in the United States (with a correlation coefficient two-thirds higher than in Sweden). France, Germany, and Britain occupy a middle ground, less mobile than northern Europe but more mobile than the United States.28 These findings stand in sharp contrast to the belief in “American exceptionalism” that once dominated US sociology, according to which social mobility in the United States was exceptionally high compared with the class-bound societies of Europe.


Once the American Dream: Inner-Ring Suburbs of the Metropolitan United States by Bernadette Hanlon

big-box store, correlation coefficient, deindustrialization, desegregation, edge city, feminist movement, housing crisis, illegal immigration, informal economy, longitudinal study, low skilled workers, low-wage service sector, manufacturing employment, McMansion, New Urbanism, Silicon Valley, statistical model, The Chicago School, transit-oriented development, urban sprawl, white flight, working-age population, zero-sum game

Table A.7 shows a Pearson’s correlation between these two variables of −0.801.

TABLE A.7 RESULTS OF PEARSON CORRELATION BETWEEN INDEX SCORE AND CHANGE IN THE MEDIAN HOUSEHOLD INCOME RATIO FROM 1980 TO 2000

Variables | Index score | Change in median household income ratio from 1980 to 2000
Index score | 1 | −0.801a
Change in median household income ratio from 1980 to 2000 | −0.801a | 1
N | 3,428 | 3,428

a Correlation is significant at the 0.01 level (2-tailed).

Pearson’s correlation is a measure of linear association between two variables, that is, a measure of the tendency of variables to increase or decrease together. Squaring the correlation coefficient, (−0.801)² ≈ 0.64, indicates that about 64 percent of the variance in the change in the median household income ratio is shared with variance in index score. The index score and the change in median household income ratio are highly negatively correlated. As the index score increases, the median household income ratio increases less over time. In other words, as the index score increases (i.e., indicating decline), the suburb becomes less affluent over time.
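A short check of the arithmetic, using only the two figures reported in Table A.7. Squaring r gives the proportion of variance the two variables share, and the standard t statistic for testing a zero population correlation, t = r√(n − 2)/√(1 − r²), shows why the result is significant at the 0.01 level. The 2.58 cutoff mentioned in the comment is the large-sample two-tailed critical value and is an approximation.

```python
import math

r = -0.801   # Pearson correlation reported in Table A.7
n = 3428     # number of observations (N) reported in Table A.7

r_squared = r ** 2                                        # proportion of shared variance
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)  # t statistic, n - 2 degrees of freedom

print(f"r^2 = {r_squared:.3f}")              # r^2 = 0.642
print(f"t = {t_stat:.1f} with {n - 2} df")   # about -78, far beyond 2.58 in absolute value
```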


pages: 306 words: 78,893

After the New Economy: The Binge . . . And the Hangover That Won't Go Away by Doug Henwood

"Robert Solow", accounting loophole / creative accounting, affirmative action, Asian financial crisis, barriers to entry, borderless world, Branko Milanovic, Bretton Woods, business cycle, capital controls, corporate governance, corporate raider, correlation coefficient, credit crunch, deindustrialization, dematerialisation, deskilling, ending welfare as we know it, feminist movement, full employment, gender pay gap, George Gilder, glass ceiling, Gordon Gekko, greed is good, half of the world's population has never made a phone call, income inequality, indoor plumbing, intangible asset, Internet Archive, job satisfaction, joint-stock company, Kevin Kelly, labor-force participation, liquidationism / Banker’s doctrine / the Treasury view, manufacturing employment, means of production, minimum wage unemployment, Naomi Klein, new economy, occupational segregation, pets.com, post-work, profit maximization, purchasing power parity, race to the bottom, Ralph Nader, Robert Gordon, Robert Shiller, Robert Shiller, Ronald Reagan, shareholder value, Silicon Valley, Simon Kuznets, statistical model, structural adjustment programs, Telecommunications Act of 1996, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, The Wealth of Nations by Adam Smith, total factor productivity, union organizing, War on Poverty, women in the workforce, working poor, zero-sum game

That estimate was arrived at by dividing the Fed's X4HTK2 output index, part of the industrial production series, by an estimate of hours worked. Hours worked was estimated by multiplying the BLS figure for total employment by average weekly hours in each of the component industries, adding them together, and subtracting the result from a similar estimate of total manufacturing hours worked. While not exact, the approximation is pretty good; an estimate of total manufacturing productivity using this technique had a correlation coefficient of .86 with the official index. See text for discussion. …the way we live and work, sometimes to the good, sometimes not. Do they make a 28% annual contribution to the growth of human happiness? Closely related to the productivity argument is a claim about innovation: that we live in a time of new product development without any historical precedent. This is a remarkably amnesiac claim.


pages: 209 words: 13,138

Empirical Market Microstructure: The Institutions, Economics and Econometrics of Securities Trading by Joel Hasbrouck

Alvin Roth, barriers to entry, business cycle, conceptual framework, correlation coefficient, discrete time, disintermediation, distributed generation, experimental economics, financial intermediation, index arbitrage, information asymmetry, interest rate swap, inventory management, market clearing, market design, market friction, market microstructure, martingale, price discovery process, price discrimination, quantitative trading / quantitative finance, random walk, Richard Thaler, second-price auction, selection bias, short selling, statistical model, stochastic process, stochastic volatility, transaction costs, two-sided market, ultimatum game, zero-sum game

Equating the certainty equivalents of being hit and not being hit gives:

$$B_1 = \mu_1 - \frac{\alpha\sigma_1}{2}\left[(1 + 2n_1)\,\sigma_1 + 2\rho\, n_2\, \sigma_2\right] \qquad (11.3)$$

where $\rho = \mathrm{Corr}(X_1, X_2)$. Thus, the dealer will bid less aggressively if the securities are positively correlated. This conforms to the usual intuition that positive correlation aggravates total portfolio risk. On the other hand, if we assume (as before) that the dealer is starting at his optimum, then $B_1 = P_1 - \alpha\sigma_1^2/2$. Surprisingly, this is the same result as in the one-security case. In particular, the correlation coefficient drops out. This is a consequence of offsetting effects. The optimal $n_1$ and $n_2$ in equation (11.3) depend negatively on $\rho$, leaving the bracketed term invariant to changes in $\rho$. (Although this offset is a general feature of the problem, the complete disappearance of $\rho$ in the final expression for the bid is a consequence of CARA utility.)

11.3 Empirical Analysis of Dealer Inventories

11.3.1 A First Look at the Data

Changes in the dealer’s position reveal the dealer’s trades, which may disclose strategy and profitability.
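The offsetting effect can be checked numerically. This is a minimal sketch assuming, as is standard for CARA utility with normally distributed payoffs but not spelled out in this excerpt, that the dealer's optimal inventories satisfy μ − αΣn = P, where Σ is the payoff covariance matrix. All parameter values and the function names `bid_11_3` and `optimal_holdings` are hypothetical. With inventories held fixed and n2 > 0, raising ρ lowers the bid; with inventories at their optimum, equation (11.3) returns P1 − ασ1²/2 for every ρ.

```python
def bid_11_3(mu1, alpha, s1, s2, rho, n1, n2):
    """Bid for security 1 from equation (11.3), given current inventories n1, n2."""
    return mu1 - alpha * s1 / 2 * ((1 + 2 * n1) * s1 + 2 * rho * n2 * s2)

def optimal_holdings(mu, p, alpha, s1, s2, rho):
    """Positions solving mu - alpha * Sigma @ n = P for two securities
    (assumed mean-variance first-order condition under CARA utility)."""
    c = rho * s1 * s2                    # covariance of X1 and X2
    det = (s1 * s2) ** 2 - c * c         # determinant of Sigma; > 0 when |rho| < 1
    d1, d2 = mu[0] - p[0], mu[1] - p[1]
    n1 = (s2 * s2 * d1 - c * d2) / (alpha * det)
    n2 = (s1 * s1 * d2 - c * d1) / (alpha * det)
    return n1, n2

# Hypothetical parameters: expected payoffs, prices, risk aversion, volatilities.
mu, p = (101.0, 52.0), (100.0, 51.5)
alpha, s1, s2 = 0.5, 2.0, 1.5

# Fixed inventories (n1 = 1, n2 = 2): a higher rho lowers the bid.
print([bid_11_3(mu[0], alpha, s1, s2, rho, 1.0, 2.0) for rho in (0.0, 0.5)])  # [98.0, 96.5]

# Inventories at their optimum: the bid equals P1 - alpha * s1**2 / 2 whatever rho is.
for rho in (-0.5, 0.0, 0.5, 0.9):
    n1, n2 = optimal_holdings(mu, p, alpha, s1, s2, rho)
    print(rho, round(bid_11_3(mu[0], alpha, s1, s2, rho, n1, n2), 9))  # 99.0 each time
```

Algebraically, the bracketed term equals σ1 + 2(Σn)1/σ1, and at the optimum α(Σn)1 = μ1 − P1, which is why ρ cancels.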


pages: 325 words: 73,035

Who's Your City?: How the Creative Economy Is Making Where to Live the Most Important Decision of Your Life by Richard Florida

active measures, assortative mating, barriers to entry, big-box store, blue-collar work, borderless world, BRICs, business climate, Celebration, Florida, correlation coefficient, creative destruction, dark matter, David Brooks, David Ricardo: comparative advantage, deindustrialization, demographic transition, edge city, Edward Glaeser, epigenetics, extreme commuting, Geoffrey West, Santa Fe Institute, happiness index / gross national happiness, high net worth, income inequality, industrial cluster, invention of the telegraph, Jane Jacobs, job satisfaction, Joseph Schumpeter, knowledge economy, knowledge worker, low skilled workers, megacity, new economy, New Urbanism, Peter Calthorpe, place-making, post-work, Richard Florida, risk tolerance, Robert Gordon, Robert Shiller, Robert Shiller, Seaside, Florida, Silicon Valley, Silicon Valley startup, superstar cities, The Death and Life of Great American Cities, The Wealth of Nations by Adam Smith, Thomas L Friedman, urban planning, World Values Survey, young professional

Seligman, “Beyond Money: Toward an Economy of Well-Being,” Psychological Science in the Public Interest 5, 1, 2004, pp. 1-31. Betsy Stevenson and Justin Wolfers, “Economic Growth and Subjective Wellbeing: Reassessing the Easterlin Paradox,” Wharton School, University of Pennsylvania, May 9, 2008, http://bpp.wharton.upenn.edu/jwolfers/Papers/EasterlinParadox.pdf. 3 Also see Angus Deaton, “Income, Aging, Health, and Wellbeing Around the World: Evidence from the Gallup World Poll,” Center for Health and Wellbeing, Research Program in Development Studies, Princeton University, August 2007. 4 Nick Paumgarten, “There and Back Again,” New Yorker, April 16, 2007. 5 Robert Manchin, “The Emotional Capital and Desirability of European Cities,” Gallup Europe, presented at the European Week of Cities and Regions, Brussels, October 2007. 6 The correlation coefficients between overall happiness and various factors are as follows: financial satisfaction (.369), job satisfaction (.367), place satisfaction (.303). Compare with income (.153), homeownership (.126), and age (.06). The regression coefficients (from an ordered probit regression) are as follows: financial satisfaction (.342), place satisfaction (.254), job satisfaction (.254). Compare with income (.039), age (-.06), and education (-.09). 7 The overall correlation between income and community satisfaction is relatively weak (.15). 8 Veolia Observatory of Urban Lifestyles, Life in the City (Paris), http://www.observatoire.veolia.com/en, 2008. 9 Mihaly Csikszentmihalyi, Flow: The Psychology of Optimal Experience, HarperCollins, 1990; and Csikszentmihalyi, Finding Flow: The Psychology of Engagement with Everyday Life, Basic Books, 1997. 10 See Teresa Amabile et al., “Affect and Creativity at Work,” Administrative Science Quarterly 50, March 2005, pp. 367-403.


pages: 249 words: 77,342

The Behavioral Investor by Daniel Crosby

affirmative action, Asian financial crisis, asset allocation, availability heuristic, backtesting, bank run, Black Swan, buy and hold, cognitive dissonance, colonial rule, compound rate of return, correlation coefficient, correlation does not imply causation, Daniel Kahneman / Amos Tversky, diversification, diversified portfolio, Donald Trump, endowment effect, feminist movement, Flash crash, haute cuisine, hedonic treadmill, housing crisis, IKEA effect, impulse control, index fund, Isaac Newton, job automation, longitudinal study, loss aversion, market bubble, market fundamentalism, mental accounting, meta analysis, meta-analysis, Milgram experiment, moral panic, Murray Gell-Mann, Nate Silver, neurotypical, passive investing, pattern recognition, Ponzi scheme, prediction markets, random walk, Richard Feynman, Richard Thaler, risk tolerance, Robert Shiller, Robert Shiller, science of happiness, Shai Danziger, short selling, South Sea Bubble, Stanford prison experiment, Stephen Hawking, Steve Jobs, stocks for the long run, Thales of Miletus, The Signal and the Noise by Nate Silver, tulip mania, Vanguard fund

The authors of ‘Positive Illusions and Forecasting Errors in Mutual Fund Investment Decisions’ discovered that most participants had consistently overestimated both the future and past performance of their investments.70 One-third of those who believed that they had outperformed the market had actually lagged it by at least 5%, and another quarter lagged it by 15% or more. Even more damning is the research of Glaser and Weber, who discovered that “Investors are unable to give a correct estimate of their own past portfolio performance. The correlation coefficient between return estimates and realized returns was not distinguishable from zero.”71 The finding that investors would misstate their returns is not entirely surprising, but the size and scope of the problem are. Only 30% of those surveyed considered themselves to be “average” investors, and the average overestimation of returns was 11.5% per year! More shocking still, portfolio performance was negatively tied to the difference between estimates and actual returns; the lower the returns, the worse investors were at remembering their realized returns.


pages: 270 words: 73,485

Hubris: Why Economists Failed to Predict the Crisis and How to Avoid the Next One by Meghnad Desai

"Robert Solow", 3D printing, bank run, banking crisis, Berlin Wall, Big bang: deregulation of the City of London, Bretton Woods, BRICs, British Empire, business cycle, Capital in the Twenty-First Century by Thomas Piketty, Carmen Reinhart, central bank independence, collapse of Lehman Brothers, collateralized debt obligation, correlation coefficient, correlation does not imply causation, creative destruction, Credit Default Swap, credit default swaps / collateralized debt obligations, David Ricardo: comparative advantage, deindustrialization, demographic dividend, Eugene Fama: efficient market hypothesis, eurozone crisis, experimental economics, Fall of the Berlin Wall, financial innovation, Financial Instability Hypothesis, floating exchange rates, full employment, German hyperinflation, Gunnar Myrdal, Home mortgage interest deduction, imperial preference, income inequality, inflation targeting, invisible hand, Isaac Newton, Joseph Schumpeter, Kenneth Arrow, Kenneth Rogoff, laissez-faire capitalism, liquidity trap, Long Term Capital Management, market bubble, market clearing, means of production, Mexican peso crisis / tequila crisis, mortgage debt, Myron Scholes, negative equity, Northern Rock, oil shale / tar sands, oil shock, open economy, Paul Samuelson, price stability, purchasing power parity, pushing on a string, quantitative easing, reserve currency, rising living standards, risk/return, Robert Shiller, Robert Shiller, Ronald Reagan, savings glut, secular stagnation, seigniorage, Silicon Valley, Simon Kuznets, The Chicago School, The Great Moderation, The inhabitant of London could order by telephone, sipping his morning tea in bed, the various products of the whole earth, The Wealth of Nations by Adam Smith, Tobin tax, too big to fail, women in the workforce

To convey this, a random (or stochastic) error (or shock) term is added to the equation (y = a − bx + u). This is to allow for the basic uncertainty of all economic events, as well as to allow for many other variables which have to be omitted to keep the relationship simple. If the basic equation is sound, then it will explain a large part of the variation in the variable we are interested in, in our case y, the amount bought of a commodity. A measure of the “goodness of fit” is the correlation coefficient r or its square R² (R squared). Many equations together constitute a model and there are more sophisticated measures of the explanatory powers of a model. The use of econometric techniques is widespread now in public and private sector decision-making. Increasingly, numbers have become an indispensable part of the toolkit of economists. The Econometric Society was born at a time when economics itself was about to become more mathematically oriented and would require the services of experts who could translate policy advice into specific numbers.
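The role of the error term u can be illustrated with simulated data. This minimal sketch assumes a hypothetical demand curve y = 50 − 3x, prices drawn uniformly between 1 and 10, and normally distributed shocks; `fit_demand` is an illustrative name. Least squares approximately recovers a and b in both cases, but the larger the shocks, the smaller the share of variation in y that the equation explains, and the lower R².

```python
import random

def fit_demand(prices, quantities):
    """OLS estimates of a and b in y = a - b*x + u, plus R^2."""
    n = len(prices)
    mx = sum(prices) / n
    my = sum(quantities) / n
    sxx = sum((x - mx) ** 2 for x in prices)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prices, quantities))
    slope = sxy / sxx                     # estimates -b
    a = my - slope * mx
    ss_res = sum((y - (a + slope * x)) ** 2 for x, y in zip(prices, quantities))
    ss_tot = sum((y - my) ** 2 for y in quantities)
    return a, -slope, 1 - ss_res / ss_tot

rng = random.Random(0)                    # fixed seed for repeatability
prices = [rng.uniform(1.0, 10.0) for _ in range(200)]
true_a, true_b = 50.0, 3.0                # hypothetical demand curve

results = {}
for noise_sd in (1.0, 10.0):
    quantities = [true_a - true_b * x + rng.gauss(0.0, noise_sd) for x in prices]
    results[noise_sd] = fit_demand(prices, quantities)
    a_hat, b_hat, r2 = results[noise_sd]
    print(noise_sd, round(a_hat, 2), round(b_hat, 2), round(r2, 3))
```

With one explanatory variable, the R² reported here is the square of the correlation coefficient r between x and y.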


pages: 266 words: 76,299

Ever Since Darwin: Reflections in Natural History by Stephen Jay Gould

Alfred Russel Wallace, British Empire, correlation coefficient, Drosophila, European colonialism, invisible hand, Isaac Newton, Monroe Doctrine, Paul Samuelson, Scientific racism, sexual politics, the scientific method, twin studies

Kamin has done the dog-work of meticulously checking through details of the twin studies that form the basis of this estimate. He has found an astonishing number of inconsistencies and downright inaccuracies. For example, the late Sir Cyril Burt, who generated the largest body of data on identical twins reared apart, pursued his studies of intelligence for more than forty years. Although he increased his sample sizes in a variety of “improved” versions, some of his correlation coefficients remain unchanged to the third decimal place—a statistically impossible situation.5 IQ depends in part upon sex and age; and other studies did not standardize properly for them. An improper correction may produce higher values between twins not because they hold genes for intelligence in common, but simply because they share the same sex and age. The data are so flawed that no valid estimate for the heritability of IQ can be drawn at all.


pages: 701 words: 199,010

The Crisis of Crowding: Quant Copycats, Ugly Models, and the New Crash Normal by Ludwig B. Chincarini

affirmative action, asset-backed security, automated trading system, bank run, banking crisis, Basel III, Bernie Madoff, Black-Scholes formula, business cycle, buttonwood tree, Carmen Reinhart, central bank independence, collapse of Lehman Brothers, collateralized debt obligation, collective bargaining, corporate governance, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, delta neutral, discounted cash flows, diversification, diversified portfolio, family office, financial innovation, financial intermediation, fixed income, Flash crash, full employment, Gini coefficient, high net worth, hindsight bias, housing crisis, implied volatility, income inequality, interest rate derivative, interest rate swap, John Meriwether, Kickstarter, liquidity trap, London Interbank Offered Rate, Long Term Capital Management, low skilled workers, margin call, market design, market fundamentalism, merger arbitrage, Mexican peso crisis / tequila crisis, Mitch Kapor, money market fund, moral hazard, mortgage debt, Myron Scholes, negative equity, Northern Rock, Occupy movement, oil shock, price stability, quantitative easing, quantitative hedge fund, quantitative trading / quantitative finance, Ralph Waldo Emerson, regulatory arbitrage, Renaissance Technologies, risk tolerance, risk-adjusted returns, Robert Shiller, Robert Shiller, Ronald Reagan, Sam Peltzman, Sharpe ratio, short selling, sovereign wealth fund, speech recognition, statistical arbitrage, statistical model, survivorship bias, systematic trading, The Great Moderation, too big to fail, transaction costs, value at risk, yield curve, zero-coupon bond

… and you’ll have a riskless portfolio with a positive return. Unfortunately, it is impossible to have an average correlation of −1 with more than two strategies in a portfolio. As you add more strategies, however, the conditions necessary for a very low-risk portfolio grow less stringent. Consider a portfolio of n trading strategies. Assume that each strategy has a similar variance, $\sigma^2$, and that any pair of strategies has the same correlation coefficient, $\rho$. If all strategies are equally weighted (that is, $w_i = 1/n$) and individual strategy returns are positive, the portfolio variance is:

$$\sigma_p^2 = \frac{\sigma^2}{n} + \frac{n-1}{n}\,\rho\,\sigma^2 \qquad \text{(A.4)}$$

Construct a portfolio with a large number of positions that have an average correlation of zero, and the portfolio risk decreases toward zero. For the purposes of this analysis, assume that this was LTCM’s driving concept. It may seem that this is too simple an explanation of how LTCM operated.
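The variance of an equally weighted, equally correlated portfolio can be explored numerically in a minimal sketch (function names are hypothetical). With zero correlation the variance shrinks like 1/n; with a common positive correlation it levels off at ρσ² no matter how many strategies are added. The second function shows why an average correlation of −1 is impossible beyond two strategies: for n standardized series, the variance of their sum is n + n(n − 1)ρ̄, which must be non-negative, so the average pairwise correlation ρ̄ cannot fall below −1/(n − 1).

```python
def equal_weight_variance(n, sigma, rho):
    """Variance of n equally weighted strategies, each with standard deviation
    sigma and the same pairwise correlation rho: sigma^2/n + (n-1)/n * rho * sigma^2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma ** 2 / n + (n - 1) / n * rho * sigma ** 2

def min_average_correlation(n):
    """Lowest attainable average pairwise correlation among n >= 2 return series."""
    if n < 2:
        raise ValueError("need at least two series")
    return -1.0 / (n - 1)

sigma = 1.0  # hypothetical per-strategy standard deviation
for n in (2, 10, 100, 1000):
    # rho = 0: variance shrinks like 1/n; rho = 0.2: it levels off near 0.2 * sigma^2
    print(n, equal_weight_variance(n, sigma, 0.0), round(equal_weight_variance(n, sigma, 0.2), 4))

print(min_average_correlation(2), min_average_correlation(3))  # -1.0 -0.5
```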

A simple value-at-risk (VaR) formula for the above structure is:

$$\mathrm{VaR} = V_t\,(k\,\sigma_p - \mu_p) \qquad \text{(A.9)}$$

where $\mu_p$ represents the expected return of the levered portfolio, $\sigma_p$ represents the standard deviation of the levered portfolio, $V_t$ represents the initial portfolio value, and $k$ represents the critical value for the chosen confidence level, assuming a normal distribution (e.g., $k = 1.96$ for a one-tailed 97.5% confidence level).9 Table A.1 presents the potential VaR calculations at a 99% confidence level for a normal distribution ($k = 2.33$) and a capital base of $4.8B (the amount that LTCM had at the beginning of 1998). The VaR numbers are presented as monthly numbers. Given the correlation coefficient, this represents what might have been expected to occur in any given month at LTCM.

TABLE A.1 Sensitivity of VaR to Strategy Correlations

Table A.1 shows that an unlevered fund’s standard deviation was 0.0951% per month with a correlation of 0 and 0.6723% per month with a correlation of 1. The equivalent annualized volatilities were 0.3294% and 2.3290%, respectively. Generally this illustrates the portfolio that LTCM sought: a high Sharpe ratio and very low unlevered risk.
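A minimal sketch of the parametric VaR calculation, assuming the common form VaR = V_t(kσ − μ) reported as a positive loss. The function names and the levered moments are hypothetical; only the $4.8B capital base, k = 2.33, and the monthly standard deviations come from the text. Annualizing multiplies a monthly standard deviation by √12, assuming independent months. This reproduces the annualized figures quoted above, the second differing from the text's 2.3290 only in the last digit, presumably because the monthly figure is itself rounded.

```python
import math

def parametric_var(value, mu, sigma, k):
    """Normal (parametric) VaR, reported as a positive loss: value * (k*sigma - mu).

    mu and sigma are per-period returns as decimals; k is the critical value
    (e.g. 2.33 for a one-tailed 99% level).
    """
    return value * (k * sigma - mu)

def annualize_monthly_vol(sigma_monthly):
    """Scale a monthly standard deviation to annual, assuming independent months."""
    return sigma_monthly * math.sqrt(12)

# Monthly unlevered standard deviations quoted in the text (percent per month).
for s in (0.0951, 0.6723):
    print(s, round(annualize_monthly_vol(s), 4))   # 0.3294 and 2.3289

# Hypothetical levered moments: 0.5% expected return and 2% standard deviation per month.
capital = 4.8e9
print(parametric_var(capital, mu=0.005, sigma=0.02, k=2.33))  # roughly $200 million
```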


pages: 297 words: 84,009

Big Business: A Love Letter to an American Anti-Hero by Tyler Cowen

23andMe, Affordable Care Act / Obamacare, augmented reality, barriers to entry, Bernie Sanders, bitcoin, blockchain, Bretton Woods, cloud computing, cognitive dissonance, corporate governance, corporate social responsibility, correlation coefficient, creative destruction, crony capitalism, cryptocurrency, dark matter, David Brooks, David Graeber, don't be evil, Donald Trump, Elon Musk, employer provided health coverage, experimental economics, Filter Bubble, financial innovation, financial intermediation, global reserve currency, global supply chain, Google Glasses, income inequality, Internet of things, invisible hand, Jeff Bezos, late fees, Mark Zuckerberg, mobile money, money market fund, mortgage debt, Network effects, new economy, Nicholas Carr, obamacare, offshore financial centre, passive investing, payday loans, peer-to-peer lending, Peter Thiel, pre–internet, price discrimination, profit maximization, profit motive, RAND corporation, rent-seeking, reserve currency, ride hailing / ride sharing, risk tolerance, Ronald Coase, shareholder value, Silicon Valley, Silicon Valley startup, Skype, Snapchat, Social Responsibility of Business Is to Increase Its Profits, Steve Jobs, The Nature of the Firm, Tim Cook: Apple, too big to fail, transaction costs, Tyler Cowen: Great Stagnation, ultimatum game, WikiLeaks, women in the workforce, World Values Survey, Y Combinator

Here too we should be cautious about how grand a conclusion we draw from a single study, but this is suggestive evidence that the workplace often serves a significant protective and equalizing function when it comes to personal stress. Furthermore, the Kahneman and Krueger research generates a broadly similar result. The positive affect associated with the workday is not closely related to the features we usually associate with a “good” job. (For instance, the correlation coefficient of positive affect in the workplace with “excellent benefits” is only about 0.10.) People with lower-quality jobs still get a lot of the benefits from the positive affect associated with work. Here’s a simple and probably familiar story from Elizabeth Bernstein, writing in the Wall Street Journal. This narrative reflects how important work can be as a refuge and a hiding place: Tara Kennedy-Kline, a family advocate and owner of a toy-distribution company, says on an evening or weekend she has been known to go to her warehouse and rearrange 1,500 boxes in a shipping container just to get away from her family’s requests of “What’s for dinner?”


pages: 345 words: 87,745

The Power of Passive Investing: More Wealth With Less Work by Richard A. Ferri

asset allocation, backtesting, Bernie Madoff, buy and hold, capital asset pricing model, cognitive dissonance, correlation coefficient, Daniel Kahneman / Amos Tversky, diversification, diversified portfolio, endowment effect, estate planning, Eugene Fama: efficient market hypothesis, fixed income, implied volatility, index fund, intangible asset, Long Term Capital Management, money market fund, passive investing, Paul Samuelson, Ponzi scheme, prediction markets, random walk, Richard Thaler, risk tolerance, risk-adjusted returns, risk/return, Sharpe ratio, survivorship bias, too big to fail, transaction costs, Vanguard fund, yield curve, zero-sum game

chartered financial analyst (CFA) An investment professional who has met competency standards in economics, securities, portfolio management, and financial accounting as determined by the Institute of Chartered Financial Analysts. closed-end fund A mutual fund that has a fixed number of shares, usually listed on a major stock exchange. commodities Unprocessed goods, such as grains, metals, and minerals, traded in large amounts on a commodities exchange. consumer price index (CPI) A measure of the price change in consumer goods and services. The CPI is used to track the pace of inflation. correlation coefficient A number between −1 and 1 that measures the degree to which two variables are linearly related. cost basis The original cost of an investment. For tax purposes, the cost basis is subtracted from the sale price to determine any capital gain or loss. country risk The possibility that political events (e.g., a war, national elections); financial problems (e.g., rising inflation, government default); or natural disasters (e.g., an earthquake, a poor harvest) will weaken a country’s economy and cause investments in that country to decline.
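The glossary definition can be made concrete with a short sketch of the sample Pearson correlation in plain Python. The name `pearson_r` is illustrative and does not refer to any particular library. The last example shows why the definition says *linearly*: y = x² on a symmetric range is completely determined by x, yet r is 0.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))          # 1.0: perfect positive line
    print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))          # -1.0: perfect negative line
    print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0: dependent, but not linear
```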


pages: 901 words: 234,905

The Blank Slate: The Modern Denial of Human Nature by Steven Pinker

affirmative action, Albert Einstein, Alfred Russel Wallace, anti-communist, British Empire, clean water, cognitive dissonance, Columbine, conceptual framework, correlation coefficient, correlation does not imply causation, cuban missile crisis, Daniel Kahneman / Amos Tversky, Defenestration of Prague, desegregation, epigenetics, Exxon Valdez, George Akerlof, germ theory of disease, ghettoisation, glass ceiling, Hobbesian trap, income inequality, invention of agriculture, invisible hand, Joan Didion, long peace, meta analysis, meta-analysis, More Guns, Less Crime, Murray Gell-Mann, mutually assured destruction, Norman Mailer, Peter Singer: altruism, phenotype, plutocrats, Plutocrats, Potemkin village, prisoner's dilemma, profit motive, QWERTY keyboard, Richard Feynman, Richard Thaler, risk tolerance, Robert Bork, Rodney Brooks, Saturday Night Live, social intelligence, speech recognition, Stanford prison experiment, stem cell, Steven Pinker, The Bell Curve by Richard Herrnstein and Charles Murray, the new new thing, theory of mind, Thomas Malthus, Thorstein Veblen, twin studies, ultimatum game, urban renewal, War on Poverty, women in the workforce, Yogi Berra, zero-sum game

Chapter 16 Politics

I often think it’s comical
How nature always does contrive
That every boy and every gal,
That’s born into the world alive,
Is either a little Liberal,
Or else a little Conservative!1

Gilbert and Sullivan got it mostly right in 1882: liberal and conservative political attitudes are largely, though far from completely, heritable. When identical twins who were separated at birth are tested in adulthood, their political attitudes turn out to be similar, with a correlation coefficient of .62 (on a scale from −1 to +1).2 Liberal and conservative attitudes are heritable not, of course, because attitudes are synthesized directly from DNA but because they come naturally to people with different temperaments. Conservatives, for example, tend to be more authoritarian, conscientious, traditional, and rule-bound. But whatever its immediate source, the heritability of political attitudes can explain some of the sparks that fly when liberals and conservatives meet.

For example, the variance in weight in a sample of Labrador retrievers will be smaller than the variance in weight in a sample that contains dogs of different breeds. Variance can be carved into pieces. It is mathematically meaningful to say that a certain percentage of the variance in a group overlaps with one factor (perhaps, though not necessarily, its cause), another percentage overlaps with a second factor, and so on, the percentages adding up to 100. The degree of overlap may be measured as a correlation coefficient, a number between −1 and +1 that captures the degree to which people who are high on one measurement are also high on another measurement. It is used in behavioral genetic research as an estimate of the proportion of variance accounted for by some factor.3 Heritability is the proportion of variance in a trait that correlates with genetic differences. It can be measured in several ways.4 The simplest is to take the correlation between identical twins who were separated at birth and reared apart.
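A small simulation shows why the correlation between identical twins reared apart can stand in for heritability. It assumes a simple additive model: each twin's trait is a shared genetic component with variance h² plus an independent environmental component with variance 1 − h². There is no shared environment and no gene-environment interaction. Under those assumptions, the twin correlation equals h². The function names are hypothetical.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation computed from centered sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate_twins_reared_apart(h2: float, n_pairs: int, seed: int = 0) -> float:
    """Correlation between twin pairs who share genes (variance h2) but not
    environments (variance 1 - h2), under a purely additive model."""
    rng = random.Random(seed)
    g_sd = math.sqrt(h2)
    e_sd = math.sqrt(1.0 - h2)
    twin_a, twin_b = [], []
    for _ in range(n_pairs):
        g = rng.gauss(0.0, g_sd)                  # genetic part, identical in both twins
        twin_a.append(g + rng.gauss(0.0, e_sd))  # each twin gets its own environment
        twin_b.append(g + rng.gauss(0.0, e_sd))
    return correlation(twin_a, twin_b)


if __name__ == "__main__":
    for h2 in (0.2, 0.5, 0.8):
        print(h2, simulate_twins_reared_apart(h2, 50_000))
```

Real twin studies have to contend with effects this toy model leaves out, such as similar placement of adoptees and non-additive genetic effects.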


pages: 317 words: 106,130

The New Science of Asset Allocation: Risk Management in a Multi-Asset World by Thomas Schneeweis, Garry B. Crowder, Hossein Kazemi

asset allocation, backtesting, Bernie Madoff, Black Swan, business cycle, buy and hold, capital asset pricing model, collateralized debt obligation, commodity trading advisor, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, diversified portfolio, fixed income, high net worth, implied volatility, index fund, interest rate swap, invisible hand, market microstructure, merger arbitrage, moral hazard, Myron Scholes, passive investing, Richard Feynman, Richard Feynman: Challenger O-ring, risk tolerance, risk-adjusted returns, risk/return, selection bias, Sharpe ratio, short selling, statistical model, stocks for the long run, survivorship bias, systematic trading, technology bubble, the market place, Thomas Kuhn: the structure of scientific revolutions, transaction costs, value at risk, yield curve, zero-sum game

The break-even rate of return, \(E(R_c)\), and the excess break-even rate of return (EBK) are often computed as follows:

\[ E(R_c) = \left(\frac{E(R_p) - R_f}{\sigma_p}\right)\rho_{cp}\,\sigma_c + R_f \]

\[ \mathrm{EBK} = R_c - \left[\left(\frac{E(R_p) - R_f}{\sigma_p}\right)\rho_{cp}\,\sigma_c + R_f\right] \]

where \(E(R_c)\) = break-even rate of return required for the asset to improve the Sharpe ratio of alternative index p; \(R_c\) = rate of return on asset c; \(R_f\) = riskless rate of return; \(E(R_p)\) = expected rate of return on alternative index p; \(\rho_{cp}\) = correlation coefficient between asset c and alternative benchmark p; \(\sigma_c\) = standard deviation of asset c; \(\sigma_p\) = standard deviation of alternative index p. First, it is important to realize that the above expression is based on the assumption that only mean and variance matter in evaluating the risk-return profile of a portfolio. Second, one must be familiar with the potential problems that can arise in using this expression.
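Here is a minimal sketch of the break-even and excess break-even formulas, using hypothetical parameter values. The helper `sharpe_of_mix` does not come from the source. It checks numerically that shifting a small weight into asset c raises the portfolio's Sharpe ratio when \(R_c\) is above the break-even rate and lowers it when \(R_c\) is below. That is the mean-variance reasoning behind the formula.

```python
import math


def break_even_return(exp_rp, rf, sigma_p, sigma_c, rho_cp):
    """E(R_c) = ((E(R_p) - R_f) / sigma_p) * rho_cp * sigma_c + R_f."""
    sharpe_p = (exp_rp - rf) / sigma_p
    return sharpe_p * rho_cp * sigma_c + rf


def excess_break_even(rc, exp_rp, rf, sigma_p, sigma_c, rho_cp):
    """EBK = R_c minus the break-even rate of return."""
    return rc - break_even_return(exp_rp, rf, sigma_p, sigma_c, rho_cp)


def sharpe_of_mix(w, rc, exp_rp, rf, sigma_p, sigma_c, rho_cp):
    """Sharpe ratio after putting weight w in asset c and 1 - w in index p."""
    mean = (1 - w) * exp_rp + w * rc
    var = ((1 - w) * sigma_p) ** 2 + (w * sigma_c) ** 2 \
        + 2 * w * (1 - w) * rho_cp * sigma_p * sigma_c
    return (mean - rf) / math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical inputs: index Sharpe ratio of 0.6, asset c twice as volatile.
    params = dict(exp_rp=0.08, rf=0.02, sigma_p=0.10, sigma_c=0.20, rho_cp=0.5)
    print("break-even:", break_even_return(**params))
    print("EBK at R_c = 10%:", excess_break_even(0.10, **params))
```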


pages: 417 words: 103,458

The Intelligence Trap: Revolutionise Your Thinking and Make Wiser Decisions by David Robson

active measures, Affordable Care Act / Obamacare, Albert Einstein, Alfred Russel Wallace, Atul Gawande, availability heuristic, cognitive bias, corporate governance, correlation coefficient, cuban missile crisis, Daniel Kahneman / Amos Tversky, dark matter, deliberate practice, dematerialisation, Donald Trump, Flynn Effect, framing effect, fundamental attribution error, illegal immigration, Isaac Newton, job satisfaction, knowledge economy, lone genius, meta analysis, meta-analysis, Nelson Mandela, obamacare, pattern recognition, price anchoring, Richard Feynman, risk tolerance, Silicon Valley, social intelligence, Steve Jobs, the scientific method, theory of mind, traveling salesman, ultimatum game, Y2K, Yom Kippur War

One study of the high-IQ society Mensa, for example, showed that 44 per cent of its members believed in astrology, and 56 per cent believed that the Earth had been visited by extra-terrestrials.10 But rigorous experiments, specifically exploring the link between intelligence and rationality, were lacking. Stanovich has now spent more than two decades building on those foundations with a series of carefully controlled experiments. To understand his results, we need some basic statistical theory. In psychology and other sciences, the relationship between two variables is usually expressed as a correlation coefficient whose magnitude runs from 0 to 1 (a negative sign simply indicates an inverse relationship). A perfect correlation would have a value of 1 – the two parameters would essentially be measuring the same thing; this is unrealistic for most studies of human health and behaviour (which are determined by so many variables), but many scientists would consider a ‘moderate’ correlation to lie between 0.4 and 0.59.11 Using these measures, Stanovich found that the relationships between rationality and intelligence were generally very weak.


pages: 571 words: 105,054

Advances in Financial Machine Learning by Marcos Lopez de Prado

algorithmic trading, Amazon Web Services, asset allocation, backtesting, bioinformatics, Brownian motion, business process, Claude Shannon: information theory, cloud computing, complexity theory, correlation coefficient, correlation does not imply causation, diversification, diversified portfolio, en.wikipedia.org, fixed income, Flash crash, G4S, implied volatility, information asymmetry, latency arbitrage, margin call, market fragmentation, market microstructure, martingale, NP-complete, P = NP, p-value, paper trading, pattern recognition, performance metric, profit maximization, quantitative trading / quantitative finance, RAND corporation, random walk, risk-adjusted returns, risk/return, selection bias, Sharpe ratio, short selling, Silicon Valley, smart cities, smart meter, statistical arbitrage, statistical model, stochastic process, survivorship bias, transaction costs, traveling salesman

Bruss, F. (1984): “A unified approach to a class of best choice problems with an unknown number of options.” Annals of Probability, Vol. 12, No. 3, pp. 882–891. Dmitrienko, A., A.C. Tamhane, and F. Bretz (2010): Multiple Testing Problems in Pharmaceutical Statistics, 1st ed. CRC Press. Dudoit, S. and M.J. van der Laan (2008): Multiple Testing Procedures with Applications to Genomics, 1st ed. Springer. Fisher, R.A. (1915): “Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population.” Biometrika (Biometrika Trust), Vol. 10, No. 4, pp. 507–521. Hand, D. J. (2014): The Improbability Principle, 1st ed. Scientific American/Farrar, Straus and Giroux. Harvey, C., Y. Liu, and H. Zhu (2013): “. . . And the cross-section of expected returns.” Working paper, Duke University. Available at http://ssrn.com/abstract=2249314. Harvey, C. and Y.


pages: 389 words: 109,207

Fortune's Formula: The Untold Story of the Scientific Betting System That Beat the Casinos and Wall Street by William Poundstone

Albert Einstein, anti-communist, asset allocation, beat the dealer, Benoit Mandelbrot, Black-Scholes formula, Brownian motion, buy and hold, buy low sell high, capital asset pricing model, Claude Shannon: information theory, computer age, correlation coefficient, diversified portfolio, Edward Thorp, en.wikipedia.org, Eugene Fama: efficient market hypothesis, high net worth, index fund, interest rate swap, Isaac Newton, Johann Wolfgang von Goethe, John Meriwether, John von Neumann, Kenneth Arrow, Long Term Capital Management, Louis Bachelier, margin call, market bubble, market fundamentalism, Marshall McLuhan, Myron Scholes, New Journalism, Norbert Wiener, offshore financial centre, Paul Samuelson, publish or perish, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, Robert Shiller, Robert Shiller, Ronald Reagan, Rubik’s Cube, short selling, speech recognition, statistical arbitrage, The Predators' Ball, The Wealth of Nations by Adam Smith, transaction costs, traveling salesman, value at risk, zero-coupon bond, zero-sum game

For reasons mathematical, psychological, and sociological, it is a good idea to use a money management system that is relatively forgiving of estimation errors. Fat Tails and Leverage Suppose you’re betting on a simultaneous toss of coins believed to have a 55 percent chance of coming up heads, as depicted on the previous page. But on this toss, only 45 percent of the coins are heads. Call it a “fat tail” event, or a failure of correlation coefficients, or a big dumb mistake in somebody’s computer model. What then? The Kelly bettor cannot be ruined in a single toss. (He is prepared to survive the worst-case scenario, of zero heads.) In this situation, with many coins, the Kelly bettor will stake just short of his full bankroll. He wins only 45 percent of the wagers, doubling the amount bet on each coin that comes up heads. The Kelly bettor therefore preserves at least 90 percent of his bankroll.
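The arithmetic behind "at least 90 percent" fits in a few lines. The sketch assumes even-money bets with the staked amount split equally across many coins. It models the Kelly bettor's "just short of his full bankroll" with a hypothetical fraction such as 0.99. Each winning coin returns double its stake and each losing coin returns nothing.

```python
def bankroll_after_toss(fraction_staked: float, fraction_heads: float) -> float:
    """Bankroll (as a fraction of the starting bankroll) after one simultaneous
    toss, with fraction_staked spread evenly over many even-money coin bets
    and fraction_heads of the coins landing heads."""
    if not (0.0 <= fraction_staked <= 1.0 and 0.0 <= fraction_heads <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    unstaked = 1.0 - fraction_staked
    # Heads coins pay back twice their stake; tails coins lose their stake.
    returned = fraction_staked * 2.0 * fraction_heads
    return unstaked + returned


if __name__ == "__main__":
    for staked in (0.90, 0.99, 1.00):
        print(staked, bankroll_after_toss(staked, 0.45))
```

Staking slightly less than everything leaves the result at 1 − 0.1 × (fraction staked), so it never falls below 0.9 when 45 percent of the coins come up heads. Even with zero heads, the unstaked remainder keeps the bettor solvent.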


pages: 421 words: 110,272

Deaths of Despair and the Future of Capitalism by Anne Case, Angus Deaton

Affordable Care Act / Obamacare, basic income, Bertrand Russell: In Praise of Idleness, business cycle, call centre, collapse of Lehman Brothers, collective bargaining, Corn Laws, corporate governance, correlation coefficient, crack epidemic, creative destruction, crony capitalism, declining real wages, deindustrialization, demographic transition, Dissolution of the Soviet Union, Donald Trump, Downton Abbey, Edward Glaeser, Elon Musk, falling living standards, Fellow of the Royal Society, germ theory of disease, income inequality, Jeff Bezos, Joseph Schumpeter, Kenneth Arrow, labor-force participation, low skilled workers, Martin Wolf, Mikhail Gorbachev, obamacare, pensions crisis, randomized controlled trial, refrigerator car, rent-seeking, risk tolerance, shareholder value, Silicon Valley, The Spirit Level, The Wealth of Nations by Adam Smith, Tim Cook: Apple, trade liberalization, universal basic income, working-age population, zero-sum game

.,” Fact Tank, Pew Research Center, December 27, https://www.pewresearch.org/fact-tank/2018/12/27/facts-about-guns-in-united-states/. 11. National Research Council, 2005, “Firearms and suicide,” in Firearms and violence: A critical review, National Academies Press, 152–200. 12. Robert D. Putnam, 2000, Bowling alone: The collapse and revival of American community, Simon and Schuster. 13. CDC Wonder, average suicide rates over the period 2008–17. 14. Across the fifty US states, the correlation coefficient is .4. 15. Anne Case and Angus Deaton, 2017, “Suicide, age, and well-being: An empirical investigation,” in David A. Wise, ed., Insights in the economics of aging, National Bureau of Economic Research Conference Report, University of Chicago Press for NBER, 307–34. 16. The fractions of the birth cohorts of 1945 and 1970 who finish a four-year degree are not very different, so these results are unlikely to be attributable to changing compositions of those with and without a bachelor’s degree between the cohorts. 17.


pages: 370 words: 107,983

Rage Inside the Machine: The Prejudice of Algorithms, and How to Stop the Internet Making Bigots of Us All by Robert Elliott Smith

Ada Lovelace, affirmative action, AI winter, Alfred Russel Wallace, Amazon Mechanical Turk, animal electricity, autonomous vehicles, Black Swan, British Empire, cellular automata, citizen journalism, Claude Shannon: information theory, combinatorial explosion, corporate personhood, correlation coefficient, crowdsourcing, Daniel Kahneman / Amos Tversky, desegregation, discovery of DNA, Douglas Hofstadter, Elon Musk, Fellow of the Royal Society, feminist movement, Filter Bubble, Flash crash, Gerolamo Cardano, gig economy, Gödel, Escher, Bach, invention of the wheel, invisible hand, Jacquard loom, Jacques de Vaucanson, John Harrison: Longitude, John von Neumann, Kenneth Arrow, low skilled workers, Mark Zuckerberg, mass immigration, meta analysis, meta-analysis, mutually assured destruction, natural language processing, new economy, On the Economy of Machinery and Manufactures, p-value, pattern recognition, Paul Samuelson, performance metric, Pierre-Simon Laplace, precariat, profit maximization, profit motive, Silicon Valley, social intelligence, statistical model, Stephen Hawking, stochastic process, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, The Future of Employment, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, Thomas Bayes, Thomas Malthus, traveling salesman, Turing machine, Turing test, twin studies, Vilfredo Pareto, Von Neumann architecture, women in the workforce

When he took over the UK Eugenics Records Office from Galton in 1907, Pearson renamed it the Eugenics Laboratory, which had a more scientific ring, and reflected how the facility’s work moved from merely gathering data to creating a new science around data analysis, via statistics. Processing the lab’s big data required statistical mathematics, so in 1911 Pearson (who already held a chair in Applied Mathematics at UCL) merged the biometric and eugenics laboratories to form the Department of Applied Statistics, the first university statistics department in the world. Pearson went on to create the Pearson correlation coefficient, one of the most fundamental calculations in statistics. In fact, his work is so foundational to statistics that he was offered a knighthood (which he declined based on his personal commitment to socialism). The UCL building which once housed the Department of Statistics bears his name. Pearson also founded The Annals of Eugenics journal (which now exists as the prominent Annals of Human Genetics), the masthead of which originally included the famous (mis)quote from Charles Darwin.


pages: 387 words: 119,409

Work Rules!: Insights From Inside Google That Will Transform How You Live and Lead by Laszlo Bock

Airbnb, Albert Einstein, AltaVista, Atul Gawande, Black Swan, book scanning, Burning Man, call centre, Cass Sunstein, Checklist Manifesto, choice architecture, citizen journalism, clean water, correlation coefficient, crowdsourcing, Daniel Kahneman / Amos Tversky, deliberate practice, en.wikipedia.org, experimental subject, Frederick Winslow Taylor, future of work, Google Earth, Google Glasses, Google Hangouts, Google X / Alphabet X, Googley, helicopter parent, immigration reform, Internet Archive, longitudinal study, Menlo Park, mental accounting, meta analysis, meta-analysis, Moneyball by Michael Lewis explains big data, nudge unit, PageRank, Paul Buchheit, Ralph Waldo Emerson, Rana Plaza, random walk, Richard Thaler, Rubik’s Cube, self-driving car, shareholder value, side project, Silicon Valley, six sigma, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, Steven Pinker, survivorship bias, TaskRabbit, The Wisdom of Crowds, Tony Hsieh, Turing machine, winner-take-all economy, Y2K

Murphy, “Differentiating Insight from Non-Insight Problems,” Thinking & Reasoning 11, no. 3 (2005): 279–302. 85. Frank L. Schmidt and John E. Hunter, “The Validity and Utility of Selection Methods in Personnel Psychology: Practical and Theoretical Implications of 85 Years of Research Findings,” Psychological Bulletin 124, no. 2 (1998): 262–274. The r² values presented in this chapter are calculated based on the reported corrected correlation coefficients (r). 86. Phyllis Rosser, The SAT Gender Gap: Identifying the Causes (Washington, DC: Center for Women Policy Studies, 1989). 87. Subsequent studies have validated the gender gap on the SAT and demonstrated racial bias as well. See, for example, Christianne Corbett, Catherine Hill, and Andresse St. Rose, “Where the Girls Are: The Facts About Gender Equity in Education,” American Association of University Women (2008).
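The conversion mentioned in note 85 is just squaring: r² is the share of variance in one measure that is linearly accounted for by the other. The function name and the example correlations below are hypothetical and serve only to show the scale of the change.

```python
def variance_explained(r: float) -> float:
    """Share of variance in one variable linearly accounted for by the other (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r * r


if __name__ == "__main__":
    for r in (0.10, 0.30, 0.50):
        print(f"r = {r:.2f} -> r^2 = {variance_explained(r):.2%}")
```

Squaring shrinks modest correlations sharply. An r of 0.5 accounts for only a quarter of the variance.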


pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang

AI winter, artificial general intelligence, bioinformatics, brain emulation, combinatorial explosion, complexity theory, computer vision, conceptual framework, correlation coefficient, epigenetics, friendly AI, G4S, information retrieval, Isaac Newton, John Conway, Loebner Prize, Menlo Park, natural language processing, Occam's razor, p-value, pattern recognition, performance metric, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K

Analysis by Comparison with Psychometric Data

In order to further validate our findings, we compared our principal dimensions found in the English core against the dimensions of the ANEW dataset: pleasure⁵, arousal and dominance. The ANEW list contains 1,034 words, 479 of which were found in the English core. The scatter plot of our PC #1 versus the first dimension of ANEW, which is the mean value of pleasure, is represented in Figure 7. The plot shows strong correlation, with similar bimodal distributions in both PC #1 and the ANEW-pleasure dimensions. Pearson correlation coefficient: r = 0.70.

Figure 7. Scatter plot demonstrating strong correlation of PC #1 with the first dimension of ANEW: pleasure. The dashed line is a linear fit. The two clusters (“positive” and “negative”) are separated in each dimension.

How can we match PCs with ANEW dimensions? Our correlation analysis shows that PC #1 is the best match (i.e., most highly correlated among all PCs) for ANEW-pleasure, and vice versa (r = 0.70, p = 10⁻⁷⁰).
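This rough sketch shows where a p-value like this comes from. It assumes the correlation was computed over the 479 overlapping words, which the text implies but does not state. It uses two textbook tools: the t statistic for testing r = 0, and Fisher's z-transformation with a normal approximation for a two-sided p-value. The authors do not name their test, so the approximate p-value here need not match the reported figure. It only confirms that r = 0.70 on several hundred items is overwhelmingly significant.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def fisher_z_p_value(r: float, n: int) -> float:
    """Two-sided p-value from Fisher's z-transformation, normal approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    r, n = 0.70, 479  # n assumed to be the number of words in both lists
    print("t =", t_statistic(r, n))
    print("approximate p =", fisher_z_p_value(r, n))
```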


How I Became a Quant: Insights From 25 of Wall Street's Elite by Richard R. Lindsey, Barry Schachter

Albert Einstein, algorithmic trading, Andrew Wiles, Antoine Gombaud: Chevalier de Méré, asset allocation, asset-backed security, backtesting, bank run, banking crisis, Black-Scholes formula, Bonfire of the Vanities, Bretton Woods, Brownian motion, business cycle, business process, butter production in bangladesh, buy and hold, buy low sell high, capital asset pricing model, centre right, collateralized debt obligation, commoditize, computerized markets, corporate governance, correlation coefficient, creative destruction, Credit Default Swap, credit default swaps / collateralized debt obligations, currency manipulation / currency intervention, discounted cash flows, disintermediation, diversification, Donald Knuth, Edward Thorp, Emanuel Derman, en.wikipedia.org, Eugene Fama: efficient market hypothesis, financial innovation, fixed income, full employment, George Akerlof, Gordon Gekko, hiring and firing, implied volatility, index fund, interest rate derivative, interest rate swap, John von Neumann, linear programming, Loma Prieta earthquake, Long Term Capital Management, margin call, market friction, market microstructure, martingale, merger arbitrage, Myron Scholes, Nick Leeson, P = NP, pattern recognition, Paul Samuelson, pensions crisis, performance metric, prediction markets, profit maximization, purchasing power parity, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Richard Feynman, Richard Stallman, risk-adjusted returns, risk/return, shareholder value, Sharpe ratio, short selling, Silicon Valley, six sigma, sorting algorithm, statistical arbitrage, statistical model, stem cell, Steven Levy, stochastic process, systematic trading, technology bubble, The Great Moderation, the scientific method, too big to fail, trade route, transaction costs, transfer pricing, value at risk, volatility smile, Wiener process, yield curve, young professional

They termed this effect money illusion. Our paper examined a corollary of their result: In the presence of money illusion, the correlation between stock and bond returns will be abnormally high during periods of high inflation. For the United States, it was shown that inflation had exactly this effect on stock/bond correlations during the postwar era. As a result, asset allocation strategies that are based on the high correlation coefficients calculated using data from the 1970s and early 1980s can be expected to generate inefficient portfolios in regimes of low inflation. Ray LeClair and I wrote a paper, “Revenue Recognition Certificates: A New Security,” in which we explored the concept and potential benefits of a new type of security.14 This security provides returns as a specified function of a firm’s sales or gross revenues over a defined period of time, say 10 years, and then expires worthless.


pages: 385 words: 128,358

Inside the House of Money: Top Hedge Fund Traders on Profiting in a Global Market by Steven Drobny

Albert Einstein, asset allocation, Berlin Wall, Bonfire of the Vanities, Bretton Woods, business cycle, buy and hold, buy low sell high, capital controls, central bank independence, commoditize, commodity trading advisor, corporate governance, correlation coefficient, Credit Default Swap, diversification, diversified portfolio, family office, fixed income, glass ceiling, high batting average, implied volatility, index fund, inflation targeting, interest rate derivative, inventory management, John Meriwether, Long Term Capital Management, margin call, market bubble, Maui Hawaii, Mexican peso crisis / tequila crisis, moral hazard, Myron Scholes, new economy, Nick Leeson, oil shale / tar sands, oil shock, out of africa, paper trading, Paul Samuelson, Peter Thiel, price anchoring, purchasing power parity, reserve currency, risk tolerance, risk-adjusted returns, risk/return, rolodex, Sharpe ratio, short selling, Silicon Valley, The Wisdom of Crowds, too big to fail, transaction costs, value at risk, yield curve, zero-coupon bond, zero-sum game

LTCM was at the forefront of investing at the time and offers insight into some of the failings of risk management systems. Risk management systems based on historical prices are one way to look at risk but are in no way faultless. Financial market history is filled with theoretically low-probability, or fat-tail, events. In LTCM’s case, its risk systems calculated roughly a 1-in-6-billion chance of a major blowup. Ironically, however, one correlation the brilliant minds of LTCM neglected to consider was the correlation coefficient of positions that were linked for no other reason than the fact

[Figure 2.13: Corporate Spreads to Treasuries, 1994–1999 (AAA and BAA spreads, yield in %, annotated "Spreads Blow Out"). Source: Bloomberg.]

GREENSPAN ON LTCM How much dependence should be placed on financial modeling, which, for all its sophistication, can get too far ahead of human judgment?


Mastering Private Equity by Zeisberger, Claudia,Prahl, Michael,White, Bowen, Michael Prahl, Bowen White

asset allocation, backtesting, barriers to entry, Basel III, business process, buy low sell high, capital controls, carried interest, commoditize, corporate governance, corporate raider, correlation coefficient, creative destruction, discounted cash flows, disintermediation, disruptive innovation, distributed generation, diversification, diversified portfolio, family office, fixed income, high net worth, information asymmetry, intangible asset, Lean Startup, market clearing, passive investing, pattern recognition, performance metric, price mechanism, profit maximization, risk tolerance, risk-adjusted returns, risk/return, shareholder value, Sharpe ratio, Silicon Valley, sovereign wealth fund, statistical arbitrage, time value of money, transaction costs

Most pension consultants did not follow or cover the asset class. We spent a lot of time doing educational presentations for trustees and their consultants at offsite retreats, board meetings and pension conferences. During the 1980s, our hard work finally began to pay off. As we had actual data going back to 1972, we became pension funds’ source of information on expected returns, standard deviations and correlation coefficients for the private equity “asset class.” The new term “asset class” implied a transition from a niche activity to something that was becoming institutional. We took the lead in establishing the first industry performance benchmarks, chaired the committee that established the private equity valuation guidelines, and worked with the CFA Institute to establish the guidelines for private equity performance reporting.


Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran, Jeff Hammerbacher

23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, database schema, double helix, en.wikipedia.org, epigenetics, fault tolerance, Firefox, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, longitudinal study, Mars Rover, natural language processing, openstreetmap, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social graph, SPARQL, speech recognition, statistical model, supply-chain management, text mining, Vernor Vinge, web application

(This relationship holds even if one controls for other predictors of roll call voting, such as nominee quality and ideological distance between the senator and the nominee.) The beauty of this graph is that it combines raw data with a simple inferential model in a single plot. Typically, bivariate relationships are presented in tabular form; in this example, doing so would require either nine correlation coefficients or regression coefficients and standard errors from nine regression models. Such a table would be ungainly, would make it difficult to visualize the relationship between opinion and voting for each nominee, and would complicate comparisons across nominees. The only actual numbers we include …

[Figure 19-4: Pr(Voting Yes) plotted against State Support for Nominee, in panels for Bork, Rehnquist, Thomas, Souter, Ginsburg, Breyer, O'Connor, Roberts, Alito, and All Nominees.]


pages: 386

Good Money: Birmingham Button Makers, the Royal Mint, and the Beginnings of Modern Coinage, 1775-1821 by George Anthony Selgin

British Empire, correlation coefficient, George Gilder, invention of the steam engine, Isaac Newton, James Watt: steam engine, large denomination, lone genius, profit motive, RAND corporation, school choice, seigniorage, The Wealth of Nations by Adam Smith

Manufactured Copper Prices and Halfpenny Token Weights, 1787-1800

Year     Price of copper (d/lb)    Average weight   Average "intrinsic value" (pence)
         Grenfell    Tooke         (ounces)         Tooke      Grenfell
1787     11           9.480        0.499            0.296      0.343
1788     11           9.600        0.499            0.299      0.343
1789     11           9.600        0.452            0.271      0.311
1790     11          10.080        0.448            0.282      0.308
1791     11          10.400        0.450            0.293      0.309
1792     12          11.460        0.419            0.300      0.314
1793     13          13.230        0.372            0.307      0.302
1794     13          13.152        0.346            0.285      0.281
1795     13          13.152        0.339            0.279      0.275
1796     13          13.776        0.334            0.287      0.271
1797     14          14.400        0.351            0.316      0.307
1798     14          14.400        0.370            0.333      0.324
1799     15          15.600        0.361            0.352      0.338
1800     17          18.000        0.275            0.309      0.292
Average  12.786      12.595        0.394            0.301      0.309

Correlation coefficients (copper price vs. average weight): -0.879 (Grenfell), -0.928 (Tooke)

Sources: Token weights: Elks 2005. Copper prices: Thomas Tooke 1838, 400 (average of reported quarterly prices); Grenfell 1814.

… well) have devoted so much effort and time to distinguishing specious tokens and mules from authentic issues and to documenting variants of authentic issues. But it does not follow from this that the many varieties of tokens proved a "nuisance" to members of the general public; and the general public's perspective must be taken in reaching an economic verdict concerning commercial coinage.
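The two correlation coefficients printed under the table can be reproduced from its columns. This is an illustration, not the author's own calculation; it assumes the coefficients measure the correlation between each copper-price series and the average token weight, and it reads the 1797 Grenfell price (printed as "1.4" in the scan) as 14, which is what the intrinsic-value column implies.

```python
import numpy as np

# Values transcribed from "Manufactured Copper Prices and Halfpenny Token
# Weights, 1787-1800", one entry per year from 1787 through 1800
price_grenfell = np.array([11, 11, 11, 11, 11, 12, 13, 13, 13, 13, 14, 14, 15, 17], dtype=float)
price_tooke = np.array([9.480, 9.600, 9.600, 10.080, 10.400, 11.460, 13.230,
                        13.152, 13.152, 13.776, 14.400, 14.400, 15.600, 18.000])
avg_weight_oz = np.array([0.499, 0.499, 0.452, 0.448, 0.450, 0.419, 0.372,
                          0.346, 0.339, 0.334, 0.351, 0.370, 0.361, 0.275])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r_grenfell = np.corrcoef(price_grenfell, avg_weight_oz)[0, 1]
r_tooke = np.corrcoef(price_tooke, avg_weight_oz)[0, 1]
print(f"Grenfell: {r_grenfell:.3f}, Tooke: {r_tooke:.3f}")
```

The results agree with the printed -0.879 and -0.928 to within about 0.001, a gap consistent with rounding in the published weights.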


Data Wrangling With Python: Tips and Tools to Make Your Life Easier by Jacqueline Kazil

Amazon Web Services, bash_history, cloud computing, correlation coefficient, crowdsourcing, data acquisition, database schema, Debian, en.wikipedia.org, Firefox, Google Chrome, job automation, Nate Silver, natural language processing, pull request, Ronald Reagan, Ruby on Rails, selection bias, social web, statistical model, web application, WikiLeaks

This is a good first toolset: you can often start with the agate library tools and then move on to more advanced statistical libraries, including pandas, numpy, and scipy, as needed. We want to determine whether perceived government corruption and child labor rates are related. The first tool we’ll use is a simple Pearson’s correlation. At the time of writing, agate’s developers were working on adding this correlation to the agate-stats library; until then, you can correlate using numpy. Correlation coefficients (like Pearson’s) tell us whether two variables are linearly related and how strongly; on their own, they cannot tell us whether one variable has any effect on the other. If you haven’t already installed numpy, you can do so by running pip install numpy. Then, calculate the correlation between child labor rates and perceived government corruption by running the following code:

import numpy
numpy.corrcoef(cpi_and_cl.columns['Total (%)'].values(), cpi_and_cl.columns['CPI 2013 Score'].values())[0, 1]

We first get an error which looks similar to the CastError we saw before.
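A minimal sketch of the same Pearson calculation outside agate, assuming the column values arrive as `decimal.Decimal` objects (agate's numeric type) with no missing entries. Converting to `float` first gives `numpy.corrcoef` an ordinary float array. The helper name `pearson_r` and the sample values are hypothetical, not the book's dataset.

```python
from decimal import Decimal

import numpy as np


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers.

    Values are converted to float so numpy works on a float array rather
    than an object array of Decimal instances.
    """
    x = np.array([float(v) for v in xs])
    y = np.array([float(v) for v in ys])
    # corrcoef returns the 2x2 correlation matrix; [0, 1] is r(x, y)
    return float(np.corrcoef(x, y)[0, 1])


# Made-up illustrative values, not real child labor or CPI figures
child_labor = [Decimal("10.5"), Decimal("24.1"), Decimal("3.2"), Decimal("15.0")]
cpi_score = [Decimal("40"), Decimal("28"), Decimal("71"), Decimal("35")]
print(round(pearson_r(child_labor, cpi_score), 3))
```

A strongly negative r here would only say the two columns move in opposite directions; it says nothing about which, if either, drives the other.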


The Origins of the Urban Crisis by Sugrue, Thomas J.

affirmative action, business climate, collective bargaining, correlation coefficient, creative destruction, Credit Default Swap, deindustrialization, desegregation, Detroit bankruptcy, Ford paid five dollars a day, George Gilder, ghettoisation, Gunnar Myrdal, hiring and firing, housing crisis, income inequality, indoor plumbing, informal economy, invisible hand, job automation, jobless men, Joseph Schumpeter, labor-force participation, low-wage service sector, manufacturing employment, mass incarceration, New Urbanism, oil shock, pink-collar, postindustrial economy, rent control, Richard Florida, Ronald Reagan, side project, Silicon Valley, strikebreaker, The Bell Curve by Richard Herrnstein and Charles Murray, The Chicago School, union organizing, upwardly mobile, urban planning, urban renewal, War on Poverty, white flight, working-age population, Works Progress Administration

“Ghetto” tracts, in contrast, were poorer: only five out of twenty-six tracts with a majority-black population in 1940 had incomes above the average for all blacks. “Infill” tracts (containing second-wave black newcomers) were split evenly between above- and below-average income. To offer a more precise statistical measure of impressionistic evidence about black residential stratification, Pearson correlation coefficients were calculated across tracts between income in 1950 and two measures: the percentage black population in 1940 and the percentage change in black population. The results demonstrate a clear negative correlation both between income and percentage black in 1940 and between income and increase in black population. Both correlations underscore the fact that transitional tracts (those that had smaller black populations in 1940 than in 1950, and those that gained a large number of blacks between 1940 and 1950) were those tracts that had the highest median incomes. 36.


pages: 612 words: 187,431

The Art of UNIX Programming by Eric S. Raymond

A Pattern Language, Albert Einstein, barriers to entry, bioinformatics, Clayton Christensen, combinatorial explosion, commoditize, correlation coefficient, David Brooks, Debian, domain-specific language, don't repeat yourself, Donald Knuth, Everything should be made as simple as possible, facts on the ground, finite state, general-purpose programming language, George Santayana, Innovator's Dilemma, job automation, Larry Wall, MVC pattern, pattern recognition, Paul Graham, peer-to-peer, premature optimization, pre–internet, publish or perish, revision control, RFC: Request For Comment, Richard Stallman, Robert Metcalfe, Steven Levy, transaction costs, Turing complete, Valgrind, wage slave, web application

In his paper, Graham noted accurately that computer programmers like the idea of pattern-matching filters, and sometimes have difficulty seeing past that approach, because it offers them so many opportunities to be clever. Statistical spam filters, on the other hand, work by collecting feedback about what the user judges to be spam versus nonspam. That feedback is processed into databases of statistical correlation coefficients or weights connecting words or phrases to the user's spam/nonspam classification. The most popular algorithms use minor variants of Bayes's Theorem on conditional probabilities, but other techniques (including various sorts of polynomial hashing) are also employed. In all these programs, the correlation check is a relatively trivial mathematical formula. The weights fed into the formula along with the message being checked serve as implicit control structure for the filtering algorithm.
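The statistical approach described above can be sketched as a toy naive Bayes scorer. This is not Graham's published formula or any particular filter's code; it assumes whitespace tokenization, add-one smoothing, equal prior odds of spam and nonspam, and independence between words. All function names are hypothetical.

```python
import math
from collections import Counter


def train(spam_msgs, ham_msgs):
    """Count, per word, how many spam and nonspam messages contain it."""
    spam_counts = Counter(w for m in spam_msgs for w in set(m.lower().split()))
    ham_counts = Counter(w for m in ham_msgs for w in set(m.lower().split()))
    return spam_counts, ham_counts, len(spam_msgs), len(ham_msgs)


def word_spam_prob(word, model, alpha=1.0):
    """P(spam | word) with add-alpha smoothing and equal class priors (assumed)."""
    spam_counts, ham_counts, n_spam, n_ham = model
    p_w_spam = (spam_counts[word] + alpha) / (n_spam + 2 * alpha)
    p_w_ham = (ham_counts[word] + alpha) / (n_ham + 2 * alpha)
    return p_w_spam / (p_w_spam + p_w_ham)


def spam_score(message, model):
    """Combine per-word probabilities with Bayes's theorem, assuming independence.

    Only words present in the message contribute. Under equal priors the
    posterior log-odds is the sum of log(p / (1 - p)) over those words.
    """
    log_odds = 0.0
    for word in set(message.lower().split()):
        p = word_spam_prob(word, model)
        log_odds += math.log(p / (1 - p))
    # Numerically stable logistic function to turn log-odds into a probability
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)


model = train(["cheap pills buy now", "buy cheap watches now"],
              ["meeting notes for monday", "lunch on monday"])
print(spam_score("buy cheap pills", model))  # well above 0.5
print(spam_score("monday meeting", model))   # well below 0.5
```

Summing log-odds avoids underflow on long messages. Practical filters also restrict which tokens are scored (for example, only the most strongly indicative ones), which this sketch omits.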


pages: 819 words: 181,185

Derivatives Markets by David Goldenberg

Black-Scholes formula, Brownian motion, capital asset pricing model, commodity trading advisor, compound rate of return, conceptual framework, correlation coefficient, Credit Default Swap, discounted cash flows, discrete time, diversification, diversified portfolio, en.wikipedia.org, financial innovation, fudge factor, implied volatility, incomplete markets, interest rate derivative, interest rate swap, law of one price, locking in a profit, London Interbank Offered Rate, Louis Bachelier, margin call, market microstructure, martingale, Myron Scholes, Norbert Wiener, Paul Samuelson, price mechanism, random walk, reserve currency, risk/return, riskless arbitrage, Sharpe ratio, short selling, stochastic process, stochastic volatility, time value of money, transaction costs, volatility smile, Wiener process, yield curve, zero-coupon bond, zero-sum game

We are assured by replication (and no-arbitrage, of course) that $H_1 = C_1 - \Delta S_1 = rB$ and therefore $H_0 = C_0 - \Delta S_0 = B$.

17.4.1 Volatility of the Hedge Portfolio

We now want to look at the volatility of the hedged portfolio, $H$, in another way: in terms of its components. We write it generically as $H = C - \Delta S$ and calculate its variance using our rule from portfolio analysis that says

$$\sigma^2_{aX+bY} = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\rho_{X,Y}\,\sigma_X\sigma_Y,$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, and $\rho_{X,Y}$ is the correlation coefficient between $X$ and $Y$, defined by $\rho_{X,Y} \equiv \mathrm{Cov}(X,Y)/(\sigma_X\sigma_Y)$. Applying this rule to our hedge portfolio $H$ (with $a = 1$ and $b = -\Delta$) we obtain

$$\sigma_H^2 = \sigma_C^2 + \Delta^2\sigma_S^2 - 2\Delta\,\rho_{C,S}\,\sigma_C\sigma_S.$$

The interpretation of $\sigma_C^2$ is the variance of the dollar returns on the option. Similarly, $\sigma_S^2$ is the variance of the dollar returns on the underlying stock. Why dollar returns? We will demonstrate this shortly and also formulate the analysis in terms of percentage returns to the option, the stock, and the hedge.
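The portfolio variance rule can be checked numerically. The sketch below evaluates $\sigma_H^2 = \sigma_C^2 + \Delta^2\sigma_S^2 - 2\Delta\rho_{C,S}\sigma_C\sigma_S$ and compares it with the variance of $C - \Delta S$ computed directly. The option and stock dollar changes are simulated with made-up parameters, and `hedge_variance` is a hypothetical helper name.

```python
import numpy as np


def hedge_variance(var_c, var_s, rho_cs, delta):
    """Variance of H = C - delta * S from the two-asset portfolio rule.

    Var(aX + bY) = a^2 var_X + b^2 var_Y + 2ab rho sigma_X sigma_Y,
    applied with a = 1 and b = -delta.
    """
    sd_c, sd_s = np.sqrt(var_c), np.sqrt(var_s)
    return var_c + delta**2 * var_s - 2 * delta * rho_cs * sd_c * sd_s


# Simulated dollar changes for a stock and an option on it (illustrative only)
rng = np.random.default_rng(0)
dS = rng.normal(0.0, 2.0, size=10_000)
dC = 0.6 * dS + rng.normal(0.0, 0.3, size=10_000)

delta = 0.6
var_c, var_s = dC.var(), dS.var()
rho = np.corrcoef(dC, dS)[0, 1]

formula = hedge_variance(var_c, var_s, rho, delta)
direct = (dC - delta * dS).var()
print(formula, direct)  # agree up to floating-point error
```

With $\rho_{C,S} = 1$ and $\Delta = \sigma_C/\sigma_S$ the formula reduces to $(\sigma_C - \Delta\sigma_S)^2 = 0$, the riskless case the replication argument relies on.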


The Impact of Early Life Trauma on Health and Disease by Lanius, Ruth A.; Vermetten, Eric; Pain, Clare

conceptual framework, correlation coefficient, delayed gratification, epigenetics, false memory syndrome, impulse control, intermodal, longitudinal study, meta analysis, meta-analysis, Nelson Mandela, p-value, phenotype, randomized controlled trial, selective serotonin reuptake inhibitor (SSRI), social intelligence, Socratic dialogue, theory of mind, twin studies, yellow journalism

The latter vague statement understandably worried many academics studying controversial topics. Trauma and FM-affiliated scientists responded quite differently to the Rind study. Trauma researchers criticized the methods and conclusions of the work in multiple ways, reminding readers that estimates of psychopathology based on college samples are likely to be skewed, criticizing the use and interpretation of the correlation coefficient as the measure of effect size, objecting to biases in sampling that they believed were present, and criticizing the conclusion of “no harm” when only specific harms were assessed [30–33]. Further, they disagreed with the suggestion by Rind and colleagues that the label “child sexual abuse” should be reserved for those children who were showing present symptoms and who did not “consent,” typically arguing that it is not meaningful to speak of a “willing” 5-year-old child in the context of sexual activity or to attempt “value-neutral” discussion of child abuse sexuality [31,32].


pages: 1,164 words: 309,327

Trading and Exchanges: Market Microstructure for Practitioners by Larry Harris

active measures, Andrei Shleifer, asset allocation, automated trading system, barriers to entry, Bernie Madoff, business cycle, buttonwood tree, buy and hold, compound rate of return, computerized trading, corporate governance, correlation coefficient, data acquisition, diversified portfolio, fault tolerance, financial innovation, financial intermediation, fixed income, floating exchange rates, High speed trading, index arbitrage, index fund, information asymmetry, information retrieval, interest rate swap, invention of the telegraph, job automation, law of one price, London Interbank Offered Rate, Long Term Capital Management, margin call, market bubble, market clearing, market design, market fragmentation, market friction, market microstructure, money market fund, Myron Scholes, Nick Leeson, open economy, passive investing, pattern recognition, Ponzi scheme, post-materialism, price discovery process, price discrimination, principal–agent problem, profit motive, race to the bottom, random walk, rent-seeking, risk tolerance, risk-adjusted returns, selection bias, shareholder value, short selling, Small Order Execution System, speech recognition, statistical arbitrage, statistical model, survivorship bias, the market place, transaction costs, two-sided market, winner-take-all economy, yield curve, zero-coupon bond, zero-sum game

The fraction of total variation that a statistical model explains is called the R² of the model. Factor models typically have R² values below 90 percent in annual data. For comparison, the R² of the simple market-adjusted return model is 81 percent when the portfolio standard deviation is 16 percent and the market-adjusted return standard deviation is 7.0 percent: $0.81 = (0.9)^2 \approx (16^2 - 7^2)/16^2$. (In the simple market-adjusted return model, the R² is equal to the square of the correlation coefficient of the portfolio returns with the market returns.) In principle, analysts could construct stronger tests if they knew more about a manager’s presumed skill. For example, suppose an analyst believes that a manager may be skilled only in rising markets but not in falling markets. This information would allow the analyst to construct a stronger test of whether the manager is skilled. In particular, the analyst would examine returns only in rising markets.
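The R² arithmetic can be reproduced, together with a simulated check that R² equals the squared correlation. The simulation assumes, as the simple model does, that the portfolio return is the market return plus noise independent of the market; the helper name and the simulated data are illustrative, not taken from the book.

```python
import numpy as np


def r_squared_market_adjusted(sd_portfolio, sd_adjusted):
    """R^2 = 1 - (variance left after subtracting the market) / (total variance)."""
    return 1.0 - (sd_adjusted / sd_portfolio) ** 2


# Figures quoted in the text: 16% portfolio SD, 7% market-adjusted SD
print(round(r_squared_market_adjusted(16.0, 7.0), 4))  # 0.8086, about 81 percent

# Simulated check under the model's assumption: portfolio = market + independent noise
rng = np.random.default_rng(1)
n = 200_000
market = rng.normal(0.0, np.sqrt(16.0**2 - 7.0**2), size=n)  # chosen so portfolio SD is about 16
noise = rng.normal(0.0, 7.0, size=n)
portfolio = market + noise

r = np.corrcoef(portfolio, market)[0, 1]
r2_from_sds = r_squared_market_adjusted(portfolio.std(), (portfolio - market).std())
print(r**2, r2_from_sds)  # both close to 0.81
```

If the noise were correlated with the market, or the manager's beta differed from 1, the two quantities would no longer coincide.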


The Art of Computer Programming: Sorting and Searching by Donald Ervin Knuth

card file, Claude Shannon: information theory, complexity theory, correlation coefficient, Donald Knuth, double entry bookkeeping, Eratosthenes, Fermat's Last Theorem, G4S, information retrieval, iterative process, John von Neumann, linked data, locality of reference, Menlo Park, Norbert Wiener, NP-complete, p-value, Paul Erdős, RAND corporation, refrigerator car, sorting algorithm, Vilfredo Pareto, Yogi Berra, Zipf's Law

…, $x_k$ are all the elements $> a_n$; the other elements appear in (possibly empty) strings $\alpha_1, \ldots, \alpha_k$. Compare the number of inversions of $h(\alpha) = \alpha_1 x_1 \alpha_2 x_2 \ldots \alpha_k x_k$ to $\mathrm{inv}(\alpha)$; in this construction the number $a_n$ does not appear in $h(\alpha)$.] b) Use $f$ to define another one-to-one correspondence $g$ having the following two properties: (i) $\mathrm{ind}(g(\alpha)) = \mathrm{inv}(\alpha)$; (ii) $\mathrm{inv}(g(\alpha)) = \mathrm{ind}(\alpha)$. [Hint: Consider inverse permutations.] 26. [M25] What is the statistical correlation coefficient between the number of inversions and the index of a random permutation? (See Eq. 3.3.2-(24).) 27. [M37] Prove that, in addition to (15), there is a simple relationship between $\mathrm{inv}(a_1 a_2 \ldots a_n)$ and the n-tuple $(q_1, q_2, \ldots, q_n)$. Use this fact to generalize the derivation of (17), obtaining an algebraic characterization of the bivariate generating function $H_n(w,z) = \sum w^{\mathrm{inv}(a_1 a_2 \ldots a_n)} z^{\mathrm{ind}(a_1 a_2 \ldots a_n)}$, where the sum is over all n!
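Exercise 26 asks for an analytic answer; the sketch below is only a brute-force numerical check for small n, which can help test a conjectured formula. It assumes the definitions as we understand them from this section: inv counts pairs out of order, and ind is the sum of the positions j with $a_j > a_{j+1}$. The function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import permutations


def inv(p):
    """Number of inversions: pairs i < j with p[i] > p[j]."""
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])


def ind(p):
    """Index: sum of the 1-based positions j with p[j] > p[j+1]."""
    return sum(j + 1 for j in range(len(p) - 1) if p[j] > p[j + 1])


def inv_ind_correlation(n):
    """Exact Pearson correlation of inv and ind over all n! permutations of 1..n."""
    perms = list(permutations(range(1, n + 1)))
    xs = [inv(p) for p in perms]
    ys = [ind(p) for p in perms]
    m = len(perms)
    # Exact rational moments; each permutation is equally likely
    mx, my = Fraction(sum(xs), m), Fraction(sum(ys), m)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / m
    var_x = sum((x - mx) ** 2 for x in xs) / m
    var_y = sum((y - my) ** 2 for y in ys) / m
    return float(cov) / math.sqrt(float(var_x) * float(var_y))


for n in range(2, 7):
    print(n, inv_ind_correlation(n))
```

Enumeration grows as n!, so this is practical only for small n; it checks an answer rather than deriving one.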


pages: 1,351 words: 385,579

The Better Angels of Our Nature: Why Violence Has Declined by Steven Pinker

1960s counterculture, affirmative action, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, availability heuristic, Berlin Wall, Bonfire of the Vanities, British Empire, Broken windows theory, business cycle, California gold rush, Cass Sunstein, citation needed, clean water, cognitive dissonance, colonial rule, Columbine, computer age, conceptual framework, correlation coefficient, correlation does not imply causation, crack epidemic, cuban missile crisis, Daniel Kahneman / Amos Tversky, David Brooks, delayed gratification, demographic transition, desegregation, Doomsday Clock, Douglas Hofstadter, Edward Glaeser, en.wikipedia.org, European colonialism, experimental subject, facts on the ground, failed state, first-past-the-post, Flynn Effect, food miles, Francis Fukuyama: the end of history, fudge factor, full employment, George Santayana, ghettoisation, Gini coefficient, global village, Henri Poincaré, Hobbesian trap, humanitarian revolution, impulse control, income inequality, informal economy, Intergovernmental Panel on Climate Change (IPCC), invention of the printing press, Isaac Newton, lake wobegon effect, libertarian paternalism, long peace, longitudinal study, loss aversion, Marshall McLuhan, mass incarceration, McMansion, means of production, mental accounting, meta analysis, meta-analysis, Mikhail Gorbachev, moral panic, mutually assured destruction, Nelson Mandela, open economy, Peace of Westphalia, Peter Singer: altruism, QWERTY keyboard, race to the bottom, Ralph Waldo Emerson, random walk, Republic of Letters, Richard Thaler, Ronald Reagan, Rosa Parks, Saturday Night Live, security theater, Skype, Slavoj Žižek, South China Sea, Stanford marshmallow experiment, Stanford prison experiment, statistical model, stem cell, Steven Levy, Steven Pinker, The Bell Curve by Richard Herrnstein and Charles Murray, The Wealth of Nations by Adam Smith, theory of mind, transatlantic slave trade, Turing machine, twin studies, ultimatum game, uranium enrichment, Vilfredo Pareto, Walter Mischel, WikiLeaks, women in the workforce, zero-sum game

The graph also shows the trend for Canada since 1961. Canadians kill at less than a third of the rate of Americans, partly because in the 19th century the Mounties got to the western frontier before the settlers and spared them from having to cultivate a violent code of honor. Despite this difference, the ups and downs of the Canadian homicide rate parallel those of their neighbor to the south (with a correlation coefficient between 1961 and 2009 of 0.85), and it sank almost as much in the 1990s: 35 percent, compared to the American decline of 42 percent.132 The parallel trajectory of Canada and the United States is one of many surprises in the great crime decline of the 1990s. The two countries differed in their economic trends and in their policies of criminal justice, yet they enjoyed similar drops in violence.