Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei
bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application
By mining in the gene dimension, we may find patterns shared by multiple genes, or cluster genes into groups. For example, we may find a group of genes that express themselves similarly, which is highly interesting in bioinformatics, such as in finding pathways. When analyzing in the sample/condition dimension, we treat each sample/condition as an object and treat the genes as attributes. In this way, we may find patterns of samples/conditions, or cluster samples/conditions into groups. For example, we may find the differences in gene expression by comparing a group of tumor samples and nontumor samples. Gene expression matrices are popular in bioinformatics research and development. For example, an important task is to classify a new gene using the expression data of the gene and that of other genes in known classes. Symmetrically, we may classify a new sample (e.g., a new patient) using the expression data of the sample and that of samples in known classes (e.g., tumor and nontumor).
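The two views of a gene expression matrix described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration rather than the book's method: the matrix values, the gene and sample names, and the helper names (`transpose`, `pearson`, `most_similar_pair`) are all hypothetical, and Pearson correlation is only one assumed choice of similarity measure.

```python
from math import sqrt

# Hypothetical toy expression matrix: rows are genes, columns are samples.
GENES = ["g1", "g2", "g3"]
SAMPLES = ["tumor_a", "tumor_b", "normal_a", "normal_b"]
EXPR = [
    [8.0, 9.0, 2.0, 1.0],  # g1: high in the tumor samples
    [7.5, 8.5, 1.5, 0.5],  # g2: same profile as g1, shifted down by 0.5
    [1.0, 2.0, 1.0, 2.0],  # g3: unrelated to tumor status
]


def transpose(matrix):
    """Swap the roles of objects (rows) and attributes (columns)."""
    return [list(column) for column in zip(*matrix)]


def pearson(x, y):
    """Pearson correlation between two equally long numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def most_similar_pair(objects, names):
    """Return (correlation, name_i, name_j) for the most correlated pair."""
    best = None
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            r = pearson(objects[i], objects[j])
            if best is None or r > best[0]:
                best = (r, names[i], names[j])
    return best


# Gene dimension: each gene is an object, the samples are its attributes.
gene_pair = most_similar_pair(EXPR, GENES)

# Sample dimension: transpose so each sample is an object, genes are attributes.
sample_pair = most_similar_pair(transpose(EXPR), SAMPLES)
```

The same similarity routine serves both analyses; only the orientation of the matrix changes, which is the symmetry the excerpt points out.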
Every enterprise benefits from collecting and analyzing its data: Hospitals can spot trends and anomalies in their patient records, search engines can do better ranking and ad placement, and environmental and public health agencies can spot patterns and abnormalities in their data. The list continues, with cybersecurity and computer network intrusion detection; monitoring of the energy consumption of household appliances; pattern analysis in bioinformatics and pharmaceutical data; financial and business intelligence data; spotting trends in blogs, Twitter, and many more. Storage is inexpensive and getting even less so, as are data sensors. Thus, collecting and storing data is easier than ever before. The problem then becomes how to analyze the data. This is exactly the focus of this Third Edition of the book. Jiawei, Micheline, and Jian give encyclopedic coverage of all the related methods, from the classic topics of clustering and classification, to database methods (e.g., association rules, data cubes) to more recent and advanced topics (e.g., SVD/PCA, wavelets, support vector machines).
Web mining can help us learn about the distribution of information on the WWW in general, characterize and classify web pages, and uncover web dynamics and the association and other relationships among different web pages, users, communities, and web-based activities. It is important to keep in mind that, in many applications, multiple types of data are present. For example, in web mining, there often exist text data and multimedia data (e.g., pictures and videos) on web pages, graph data like web graphs, and map data on some web sites. In bioinformatics, genomic sequences, biological networks, and 3-D spatial structures of genomes may coexist for certain biological objects. Mining multiple data sources of complex data often leads to fruitful findings due to the mutual enhancement and consolidation of such multiple sources. On the other hand, it is also challenging because of the difficulties in data cleaning and data integration, as well as the complex interactions among the multiple sources of such data.
Forty Signs of Rain by Kim Stanley Robinson
bioinformatics, business intelligence, double helix, experimental subject, Intergovernmental Panel on Climate Change (IPCC), phenotype, prisoner's dilemma, Ronald Reagan, social intelligence, stem cell, the scientific method, zero-sum game
It was not a matter of her being warm and fuzzy, as you might expect from the usual characterizations of feminine thought—on the contrary, Anna’s scientific work (she still often coauthored papers in statistics, despite her bureaucratic load) often displayed a finicky perfectionism that made her a very meticulous scientist, a first-rate statistician—smart, quick, competent in a range of fields and really excellent in more than one. As good a scientist as one could find for the rather odd job of running the Bioinformatics Division at NSF, good almost to the point of exaggeration—too precise, too interrogatory—it kept her from pursuing a course of action with drive. Then again, at NSF maybe that was an advantage. In any case she was so intense about it. A kind of Puritan of science, rational to an extreme. And yet of course at the same time that was all such a front, as with the early Puritans; the hyperrational coexisted in her with all the emotional openness, intensity, and variability that was the American female interactional paradigm and social role.
This was a major manifestation of the peer-review process, a process Frank thoroughly approved of—in principle. But a year of it was enough. Anna had been watching him, and now she said, “I suppose it is a bit of a rat race.” “Well, no more than anywhere else. In fact if I were home it’d probably be worse.” They laughed. “And you have your journal work too.” “That’s right.” Frank waved at the piles of typescripts: three stacks for Review of Bioinformatics, two for The Journal of Sociobiology. “Always behind. Luckily the other editors are better at keeping up.” Anna nodded. Editing a journal was a privilege and an honor, even though usually unpaid—indeed, one often had to continue to subscribe to a journal just to get copies of what one had edited. It was another of science’s many noncompensated activities, part of its extensive economy of social credit.
A key to any part of the mystery could be very valuable. Frank scrolled down the pages of the application with practiced speed. Yann Pierzinski, Ph.D. in biomath, Caltech. Still doing postdoc work with his thesis advisor there, a man Frank had come to consider a bit of a credit hog, if not worse. It was interesting, then, that Pierzinski had gone down to Torrey Pines to work on a temporary contract, for a bioinformatics researcher whom Frank didn’t know. Perhaps that had been a bid to escape the advisor. But now he was back. Frank dug into the substantive part of the proposal. The algorithm set was one Pierzinski had been working on even back in his dissertation. Chemical mechanics of protein creation as a sort of natural algorithm, in effect. Frank considered the idea, operation by operation. This was his real expertise; this was what had interested him from childhood, when the puzzles solved had been simple ciphers.
As the Future Catches You: How Genomics & Other Forces Are Changing Your Work, Health & Wealth by Juan Enriquez
Albert Einstein, Berlin Wall, bioinformatics, borderless world, British Empire, Buckminster Fuller, business cycle, creative destruction, double helix, global village, half of the world's population has never made a phone call, Howard Rheingold, Jeff Bezos, Joseph Schumpeter, Kevin Kelly, knowledge economy, more computing power than Apollo, new economy, personalized medicine, purchasing power parity, Ray Kurzweil, Richard Feynman, Robert Metcalfe, Search for Extraterrestrial Intelligence, SETI@home, Silicon Valley, spice trade, stem cell, the new new thing
The machines and technology coming out of the digital and genetic revolutions may allow people to leverage their mental capacity a thousand … A million … Or a trillionfold. Biology is now driven by applied math … statistics … computer science … robotics … The world’s best programmers are increasingly gravitating toward biology … You will be hearing a lot about two new fields in the coming months … Bioinformatics and Biocomputing. You rarely see bioinformaticians … They are too valuable to companies and universities. Things are moving too fast … And they are too passionate about what they do … To spend a lot of time giving speeches and interviews. But if you go into the bowels of Harvard Medical School … And are able to find the genetics department inside the Warren Alpert Building … (A significant test of intelligence in and of itself … Start by finding the staircase inspired by the double helix … and go past the bathrooms marked XX and XY …) There you can find a small den where George Church hangs out, surrounded by computers.
This is ground zero for a wonderful commune of engineers, physicists, molecular biologists, and physicians …3 And some of the world’s smartest graduate students … Who are trying to make sense of the 100 terabytes of data that come out of gene labs yearly … A task equivalent to trying to sort and use a million new encyclopedias … every year.4 You can’t build enough “wet” labs (labs full of beakers, cells, chemicals, refrigerators) to process and investigate all the opportunities this scale of data generates. The only way for Church & Co. to succeed … Is to force biology to divide … Into theoretical and applied disciplines. Which is why he is one of the founders of bioinformatics … A new discipline that attempts to predict what biologists will find … When they carry out wet-lab experiments in a few months, years, or decades. In a sense, this mirrors Craig Venter’s efforts at The Institute for Genomic Research and Celera. Celera and Church’s labs are information centers … not traditional labs … And a few smart people are going to be able to do … A lot of biology … Very quickly.
THE RULES ARE DIFFERENT IN A KNOWLEDGE ECONOMY … IT’S A SCARY TIME FOR THE ESTABLISHMENT. Countries, regions, governments, and companies that assume they are … And will remain … Dominant … Soon lose their competitive edge. (Particularly those whose leadership ignores or disparages emerging technologies … Remember those old saws: The sun never sets on the British Empire … Vive La France! … All roads lead to Rome … China, the Middle Kingdom.) Which is one of the reasons bioinformatics is so important … And why you should pay attention. What we are seeing is just the beginning of the digital-genomics convergence. When you think of a DNA molecule and its ability to … Carry our complete life code within each of our cells … Accurately copy the code … Billions of times per day … Read and execute life’s functions … Transmit this information across generations … It becomes clear that … The world’s most powerful and compact coding and information-processing system … is a genome.
Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst
algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application
Much of the disruption is fed by improved instrument and sensor technology; for instance, the Large Synoptic Survey Telescope has a 3.2-gigapixel camera and generates over 6 petabytes of image data per year. It is the platform of Big Data that is making such lofty goals attainable. The validation of Big Data analytics can be illustrated by advances in science. The biomedical corporation Bioinformatics recently announced that it has reduced the time it takes to sequence a genome from years to days, and it has also reduced the cost, so it will be feasible to sequence an individual’s genome for $1,000, paving the way for improved diagnostics and personalized medicine. The financial sector has seen how Big Data and its associated analytics can have a disruptive impact on business. Financial services firms are seeing larger volumes through smaller trading sizes, increased market volatility, and technological improvements in automated and algorithmic trading.
Big Data has transformed astronomy from a field in which taking pictures of the sky was a large part of the job to one in which the pictures are all in a database already and the astronomer’s task is to find interesting objects and phenomena in the database. Transformation is taking place in the biological arena as well. There is now a well-established tradition of depositing scientific data into a public repository and of creating public databases for use by other scientists. In fact, there is an entire discipline of bioinformatics that is largely devoted to the maintenance and analysis of such data. As technology advances, particularly with the advent of next-generation sequencing, the size and number of available experimental data sets are increasing exponentially. Big Data has the potential to revolutionize more than just research; the analytics process has started to transform education as well. A recent detailed quantitative comparison of different approaches taken by 35 charter schools in New York City has found that one of the top five policies correlated with measurable academic effectiveness was the use of data to guide instruction.
It may take a significant amount of work to achieve automated error-free difference resolution. The data preparation challenge even extends to analysis that uses only a single data set. Here there is still the issue of suitable database design, further complicated by the many alternative ways in which to store the information. Particular database designs may have certain advantages over others for analytical purposes. A case in point is the variety in the structure of bioinformatics databases, in which information on substantially similar entities, such as genes, is inherently different but is represented with the same data elements. Examples like these clearly indicate that database design is an artistic endeavor that has to be carefully executed in the enterprise context by professionals. When creating effective database designs, professionals such as data scientists must have the tools to assist them in the design process, and more important, they must develop techniques so that databases can be used effectively in the absence of intelligent database design.
The Zero Marginal Cost Society: The Internet of Things, the Collaborative Commons, and the Eclipse of Capitalism by Jeremy Rifkin
"Robert Solow", 3D printing, active measures, additive manufacturing, Airbnb, autonomous vehicles, back-to-the-land, big-box store, bioinformatics, bitcoin, business process, Chris Urmson, clean water, cleantech, cloud computing, collaborative consumption, collaborative economy, Community Supported Agriculture, Computer Numeric Control, computer vision, crowdsourcing, demographic transition, distributed generation, en.wikipedia.org, Frederick Winslow Taylor, global supply chain, global village, Hacker Ethic, industrial robot, informal economy, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invisible hand, Isaac Newton, James Watt: steam engine, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Julian Assange, Kickstarter, knowledge worker, longitudinal study, Mahatma Gandhi, manufacturing employment, Mark Zuckerberg, market design, mass immigration, means of production, meta analysis, meta-analysis, natural language processing, new economy, New Urbanism, nuclear winter, Occupy movement, off grid, oil shale / tar sands, pattern recognition, peer-to-peer, peer-to-peer lending, personalized medicine, phenotype, planetary scale, price discrimination, profit motive, QR code, RAND corporation, randomized controlled trial, Ray Kurzweil, RFID, Richard Stallman, risk/return, Ronald Coase, search inside the book, self-driving car, shareholder value, sharing economy, Silicon Valley, Skype, smart cities, smart grid, smart meter, social web, software as a service, spectrum auction, Steve Jobs, Stewart Brand, the built environment, The Nature of the Firm, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas L Friedman, too big to fail, transaction costs, urban planning, Watson beat the top human players on Jeopardy!, web application, 
Whole Earth Catalog, Whole Earth Review, WikiLeaks, working poor, zero-sum game, Zipcar
Reducing the cost of electricity in the management of data centers goes hand in hand with cutting the cost of storing data, an ever larger part of the data-management process. And the sheer volume of data is mushrooming faster than the capacity of hard drives to save it. Researchers are just beginning to experiment with a new way of storing data that could eventually drop the marginal cost to near zero. In January 2013 scientists at the European Bioinformatics Institute in Cambridge, England, announced a revolutionary new method of storing massive electronic data by embedding it in synthetic DNA. Two researchers, Nick Goldman and Ewan Birney, converted text from five computer files—which included an MP3 recording of Martin Luther King Jr.’s “I Have a Dream” speech, a paper by James Watson and Francis Crick describing the structure of DNA, and all of Shakespeare’s sonnets and plays—and converted the ones and zeros of digital information into the letters that make up the alphabet of the DNA code.
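The idea of turning ones and zeros into DNA letters can be illustrated with a deliberately simplified sketch. The mapping below (two bits per base) is an assumption chosen for clarity; it is not the encoding the researchers in the excerpt used, and the names `BASES`, `encode`, and `decode` are hypothetical.

```python
# Simplified assumption: each pair of bits maps to one base.
# Index 0..3 corresponds to bit pairs 00, 01, 10, 11.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn bytes into a string of DNA letters, four letters per byte."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):  # most significant bit pair first
            letters.append(BASES[(byte >> shift) & 0b11])
    return "".join(letters)


def decode(strand: str) -> bytes:
    """Invert encode(); the strand length must be a multiple of four."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for letter in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(letter)
        out.append(byte)
    return bytes(out)


message = b"I Have a Dream"
strand = encode(message)
```

A practical scheme would also have to cope with synthesis and sequencing errors, which this sketch ignores entirely.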
Harvard researcher George Church notes that the information currently stored in all the disk drives in the world could fit in a tiny bit of DNA the size of the palm of one’s hand. Researchers add that DNA information can be preserved for centuries, as long as it is kept in a dark, cool environment.65 At this early stage of development, the cost of reading the code is high and the time it takes to decode information is substantial. Researchers, however, are reasonably confident that an exponential rate of change in bioinformatics will drive the marginal cost to near zero over the next several decades. A near zero marginal cost communication/energy infrastructure for the Collaborative Age is now within sight. The technology needed to make it happen is already being deployed. At present, it’s all about scaling up and building out. When we compare the increasing expenses of maintaining an old Second Industrial Revolution communication/energy matrix of centralized telecommunications and centralized fossil fuel energy generation, whose costs are rising with each passing day, with a Third Industrial Revolution communication/energy matrix whose costs are dramatically shrinking, it’s clear that the future lies with the latter.
Its network of thousands of scientists and plant breeders is continually searching for heirloom and wild seeds, growing them out to increase seed stock, and ferrying samples to the vault for long-term storage.32 In 2010, the trust launched a global program to locate, catalog, and preserve the wild relatives of the 22 major food crops humanity relies on for survival. The intensification of genetic-Commons advocacy comes at a time when new IT and computing technology is speeding up genetic research. The new field of bioinformatics has fundamentally altered the nature of biological research just as IT, computing, and Internet technology did in the fields of renewable-energy generation and 3D printing. According to research compiled by the National Human Genome Research Institute, gene-sequencing costs are plummeting at a rate that exceeds the exponential curves of Moore’s Law in computing power.33 Dr. David Altshuler, deputy director of the Broad Institute of Harvard University and the Massachusetts Institute of Technology, observes that in just the past several years, the price of genetic sequencing has dropped a million fold.34 Consider that the cost of reading one million base pairs of DNA—the human genome contains around three billion pairs—has plunged from $100,000 to just six cents.35 This suggests that the marginal cost of some genetic research will approach zero in the not-too-distant future, making valuable biological data available for free, just like information on the Internet.
Protocol: how control exists after decentralization by Alexander R. Galloway
Ada Lovelace, airport security, Berlin Wall, bioinformatics, Bretton Woods, computer age, Craig Reynolds: boids flock, discovery of DNA, Donald Davies, double helix, Douglas Engelbart, easy for humans, difficult for computers, Fall of the Berlin Wall, Grace Hopper, Hacker Ethic, informal economy, John Conway, John Markoff, Kevin Kelly, Kickstarter, late capitalism, linear programming, Marshall McLuhan, means of production, Menlo Park, moral panic, mutually assured destruction, Norbert Wiener, old-boy network, packet switching, Panopticon Jeremy Bentham, phenotype, post-industrial society, profit motive, QWERTY keyboard, RAND corporation, Ray Kurzweil, RFC: Request For Comment, Richard Stallman, semantic web, SETI@home, stem cell, Steve Crocker, Steven Levy, Stewart Brand, Ted Nelson, telerobotics, the market place, theory of mind, urban planning, Vannevar Bush, Whole Earth Review, working poor
This dual property (regulated flow) is central to Protocol’s analysis of the Internet as a political technology. Isomorphic Biopolitics As a final comment, it is worthwhile to note that the concept of “protocol” is related to a biopolitical production, a production of the possibility for experience in control societies. It is in this sense that Protocol is doubly materialist—in the sense of networked bodies inscribed by informatics, and in the sense of this bio-informatic network producing the conditions of experience. The biopolitical dimension of protocol is one of the parts of this book that opens onto future challenges. As the biological and life sciences become more and more integrated with computer and networking technology, the familiar line between the body and technology, between biologies and machines, begins to undergo a set of transformations. “Populations” defined nationally or ethnically are also defined informatically.
(Witness the growing business of population genomics.) Individual subjects are not only civil subjects, but also medical subjects for a medicine increasingly influenced by genetic science. The ongoing research and clinical trials in gene therapy, regenerative medicine, and genetic diagnostics reiterate the notion of the biomedical subject as being in some way amenable to a database. In addition to this bio-informatic encapsulation of individual and collective bodies, the transactions and economies between bodies are also being affected. Research into stem cells has ushered in a new era of molecular bodies that not only are self-generating like a reservoir (a new type of tissue banking), but that also create a tissue economy of potential biologies (lab-grown tissues and organs). Such biotechnologies often seem more science fiction than science, and indeed health care systems are far from fully integrating such emerging research into routine medical practice.
If layering is dependent upon portability, then portability is in turn enabled by the existence of ontology standards. These are some of the sites that Protocol opens up concerning the possible relations between information and biological networks. While the concept of biopolitics is often used at its most general level, Protocol asks us to respecify biopolitics in the age of biotechnology and bioinformatics. Thus one site of future engagement is in the zones where info-tech and bio-tech intersect. The “wet” biological body has not simply been superseded by “dry” computer code, just as the wet body no longer accounts for the virtual body. Biotechnologies of all sorts demonstrate this to us—in vivo tissue engineering, ethnic genome projects, gene-finding software, unregulated genetically modified foods, portable DNA diagnostics kits, and distributed proteomic computing.
Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzić
Albert Einstein, bioinformatics, business cycle, business intelligence, business process, butter production in bangladesh, combinatorial explosion, computer vision, conceptual framework, correlation coefficient, correlation does not imply causation, data acquisition, discrete time, El Camino Real, fault tolerance, finite state, Gini coefficient, information retrieval, Internet Archive, inventory management, iterative process, knowledge worker, linked data, loose coupling, Menlo Park, natural language processing, Netflix Prize, NP-complete, PageRank, pattern recognition, peer-to-peer, phenotype, random walk, RFID, semantic web, speech recognition, statistical model, Telecommunications Act of 1996, telemarketer, text mining, traveling salesman, web application
In contrast to the (global) model structure, a temporal pattern is a local model that makes a specific statement about a few data samples in time. Spikes, for example, are patterns in a real-valued time series that may be of interest. Similarly, in symbolic sequences, regular expressions represent well-defined patterns. In bioinformatics, genes are known to appear as local patterns interspersed between chunks of noncoding DNA. Matching and discovery of such patterns are very useful in many applications, not only in bioinformatics. Due to their readily interpretable structure, patterns play a particularly dominant role in data mining. There have been many techniques used to model global or local temporal events. We will introduce only some of the most popular modeling techniques. Finite State Machine (FSM) has a set of states and a set of transitions.
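Both ideas in this excerpt, a regular expression as a local pattern in a symbolic sequence and a finite state machine as a set of states and transitions, can be sketched in Python. The pattern is a toy and not a real gene finder: it assumes a "gene" is a start codon, some in-frame codons, and the first in-frame stop codon. The names `ORF_PATTERN`, `find_orfs`, `MOTIF_FSM`, and `contains_atg` are hypothetical.

```python
import re

# Toy local pattern: start codon ATG, any number of in-frame codons,
# then the first in-frame stop codon (TAA, TAG, or TGA).
ORF_PATTERN = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")


def find_orfs(sequence):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in ORF_PATTERN.finditer(sequence)]


# A finite state machine as a transition table: (state, symbol) -> next state.
# Any pair not listed falls back to "start". It detects the motif ATG.
MOTIF_FSM = {
    ("start", "A"): "seen_A",
    ("seen_A", "A"): "seen_A",
    ("seen_A", "T"): "seen_AT",
    ("seen_AT", "A"): "seen_A",
    ("seen_AT", "G"): "found",
}


def contains_atg(sequence):
    """Run the FSM over the sequence; True once the 'found' state is reached."""
    state = "start"
    for symbol in sequence:
        if state == "found":
            break  # accepting state is absorbing
        state = MOTIF_FSM.get((state, symbol), "start")
    return state == "found"


orfs = find_orfs("CCATGAAATTTTAGGG")
```

The regular expression states the pattern declaratively, while the transition table makes the states explicit; for a fixed motif such as ATG the two describe the same kind of local structure.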
The book presents excellent surveys, practical guidance, and comprehensive tutorials from leading experts. It paints a picture of the state-of-the-art techniques that can boost the capabilities of many existing data-mining tools and gives the novel developments of feature selection that have emerged in recent years, including causal feature selection and Relief. The book contains real-world case studies from a variety of areas, including text classification, web mining, and bioinformatics. Saul, L. K., et al., Spectral Methods for Dimensionality Reduction, in Semisupervised Learning, B. Schölkopf, O. Chapelle and A. Zien eds., MIT Press, Cambridge, MA, 2005. Spectral methods have recently emerged as a powerful tool for nonlinear dimensionality reduction and manifold learning. These methods are able to reveal low-dimensional structure in high-dimensional data from the top or bottom eigenvectors of specially constructed matrices.
However, there are some situations where the RBF kernel is not suitable, and one may just use the linear kernel with extremely good results. The question is when to use the linear kernel as a first choice. If the number of features is large, one may not need to map data to a higher dimensional space. Experiments showed that the nonlinear mapping does not improve the SVM performance. Using the linear kernel is good enough, and C is the only tuning parameter. Many microarray data sets in bioinformatics and collections of electronic documents for classification are examples of this data set type. When the number of features is smaller and the number of samples is larger, SVM successfully maps data to higher dimensional spaces using nonlinear kernels. One of the methods for finding optimal parameter values for an SVM is a grid search. The algorithm tries values of each parameter across the specified search range using geometric steps.
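The grid search with geometric steps mentioned at the end of this excerpt can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_score` is a made-up stand-in for the cross-validated accuracy of an SVM, the search ranges are arbitrary, and the function names are hypothetical. In practice the scoring function would train and validate a model for each parameter pair.

```python
from itertools import product
from math import log2


def geometric_grid(start, stop, factor):
    """Values start, start*factor, start*factor**2, ... up to and including stop."""
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= factor
    return values


def grid_search(score, c_values, gamma_values):
    """Try every (C, gamma) pair and return the pair with the highest score."""
    return max(product(c_values, gamma_values), key=lambda pair: score(*pair))


def toy_score(c, gamma):
    """Stand-in for cross-validated accuracy (assumption): peaks at C=2, gamma=0.5."""
    return -((log2(c) - 1) ** 2 + (log2(gamma) + 1) ** 2)


# Each step multiplies the parameter by 2 (a geometric step).
c_grid = geometric_grid(0.25, 4.0, 2.0)
gamma_grid = geometric_grid(0.125, 2.0, 2.0)
best_c, best_gamma = grid_search(toy_score, c_grid, gamma_grid)
```

Geometric steps cover several orders of magnitude with few evaluations, which suits parameters such as C whose useful values can differ by large factors. With a linear kernel, only the C grid would be needed.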
100 Plus: How the Coming Age of Longevity Will Change Everything, From Careers and Relationships to Family And by Sonia Arrison
23andMe, 8-hour work day, Albert Einstein, Anne Wojcicki, artificial general intelligence, attribution theory, Bill Joy: nanobots, bioinformatics, Clayton Christensen, dark matter, disruptive innovation, East Village, en.wikipedia.org, epigenetics, Frank Gehry, Googley, income per capita, indoor plumbing, Jeff Bezos, Johann Wolfgang von Goethe, Kickstarter, Law of Accelerating Returns, life extension, personalized medicine, Peter Thiel, placebo effect, post scarcity, Ray Kurzweil, rolodex, Silicon Valley, Simon Kuznets, Singularitarianism, smart grid, speech recognition, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Levy, Thomas Malthus, upwardly mobile, World Values Survey, X Prize
SU’s mission is practical: “to assemble, educate and inspire leaders who strive to understand and facilitate the development of exponentially advancing technologies in order to address humanity’s grand challenges.”20 The academic tracks are geared toward understanding how fast-moving technologies can work together, and more than half of them have a direct impact on the field of longevity research. These tracks include AI and robotics; nanotechnology, networks, and computing systems; biotechnology and bioinformatics; medicine and neuroscience; and futures studies and forecasting.21 SU is a place where mavens speak to those who are superfocused on changing the world for the better. It is no surprise, then, that it also functions as an institutional “connector”—the third component needed to successfully spread a game-changing meme. CONNECT ME Peter Diamandis always seems to be on the phone or leaving a meeting to get on a phone call.
Craig Venter, and the Human Genome Project, an international public consortium backed with around $3 billion U.S. tax dollars.54 Both President Bill Clinton and Prime Minister of Britain Tony Blair presided over the press conference announcing that humanity now possessed “the genetic blueprint for human beings.”55 President Clinton proudly told the world that the capacity to sequence human genomes “will revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases.”56 This new ability to look at the “source code” of humans particularly resonated with computer experts in Silicon Valley and around the world who spend much of their time designing code for computers. If the source code of humans can be identified, then it is not that much of a leap to think about re-engineering it. Suddenly, biology became a field that computer geeks could attempt to tackle, which not only resulted in smart biohackers forming do-it-yourself biology clubs, but also increased the pace of advances in biology. Bioinformatics are moving at the speed of Moore’s Law and sometimes faster. To the extent that wealthy technology moguls influence public opinion and hackers seem cool, the context for the longevity meme is sizzling hot. In a Wired magazine interview in April 2010, Bill Gates, America’s richest man, told reporter Steven Levy that if he were a teenager today, “he’d be hacking biology.”57 Gates elaborated, saying, “Creating artificial life with DNA synthesis, that’s sort of the equivalent of machine-language programming.”
Policy makers, activists, journalists, educators, investors, philanthropists, analysts, entrepreneurs, and a whole host of others need to come together to fight for their lives. We now know that aging is plastic and that humanity’s time horizons are not set in stone. Larry Ellison, Bill Gates, Peter Thiel, Jeff Bezos, Larry Page, Sergey Brin, and Paul Allen have all recognized the wealth of opportunity in the bioinformatics revolution, but this is not enough. Other heroes must come forward—perhaps there is even one reading this sentence right now. The goal is more healthy time, which, as we have seen throughout this book, will lead to greater wealth and prospects for happiness. A longer health span means more time to enjoy the wonders of life, including relationships with family and friends, career building, knowledge seeking, adventure, and exploration.
Our Posthuman Future: Consequences of the Biotechnology Revolution by Francis Fukuyama
Albert Einstein, Asilomar, assortative mating, Berlin Wall, bioinformatics, Columbine, demographic transition, Fall of the Berlin Wall, Flynn Effect, Francis Fukuyama: the end of history, impulse control, life extension, Menlo Park, meta analysis, meta-analysis, out of africa, Peter Singer: altruism, phenotype, presumed consent, Ray Kurzweil, Scientific racism, selective serotonin reuptake inhibitor (SSRI), sexual politics, stem cell, Steven Pinker, The Bell Curve by Richard Herrnstein and Charles Murray, Turing test, twin studies
Beyond genomics lies the burgeoning field of proteomics, which seeks to understand how genes code for proteins and how the proteins themselves fold into the exquisitely complex shapes required by cells.2 And beyond proteomics there lies the unbelievably complex task of understanding how these molecules develop into tissues, organs, and complete human beings. The Human Genome Project would not have been possible without parallel advances in the information technology required to record, catalog, search, and analyze the billions of bases making up human DNA. The merger of biology and information technology has led to the emergence of a new field, known as bioinformatics.3 What will be possible in the future will depend heavily on the ability of computers to interpret the mind-boggling amounts of data generated by genomics and proteomics and to build reliable models of phenomena such as protein folding. The simple identification of genes in the genome does not mean that anyone knows what it is they do. A great deal of progress has been made in the past two decades in finding the genes connected to cystic fibrosis, sickle-cell anemia, Huntington’s chorea, Tay-Sachs disease, and the like.
Schlesinger, Jr.’s, Cycles of American History (Boston: Houghton Mifflin, 1986); see also William Strauss and Neil Howe, The Fourth Turning: An American Prophecy (New York: Broadway Books, 1997). 22 Kirkwood (1999), pp. 131–132. 23 Michael Norman, “Living Too Long,” The New York Times Magazine, January 14, 1996, pp. 36–38. 24 Kirkwood (1999), p. 238. 25 On the evolution of human sexuality, see Donald Symons, The Evolution of Human Sexuality (Oxford: Oxford University Press, 1979). CHAPTER 5: GENETIC ENGINEERING 1 On the history of the Human Genome Project, see Robert Cook-Deegan, The Gene Wars: Science, Politics, and the Human Genome (New York: W. W. Norton, 1994); Kathryn Brown, “The Human Genome Business Today,” Scientific American 283 (July 2000): 50–55; and Kevin Davies, Cracking the Genome: Inside the Race to Unlock Human DNA (New York: Free Press, 2001). 2 Carol Ezzell, “Beyond the Human Genome,” Scientific American 283, no. 1 (July 2000): 64–69. 3 Ken Howard, “The Bioinformatics Gold Rush,” Scientific American 283, no. 1 (July 2000): 58–63. 4 Interview with Stuart A. Kauffman, “Forget In Vitro—Now It’s ‘In Silico,’” Scientific American 283, no. 1 (July 2000): 62–63. 5 Gina Kolata, “Genetic Defects Detected in Embryos Just Days Old,” The New York Times, September 24, 1992, p. A1. 6 Lee M. Silver, Remaking Eden: Cloning and Beyond in a Brave New World (New York: Avon, 1998), pp. 233–247. 7 Ezzell (2000). 8 For Wilmut’s own account of this accomplishment, see Ian Wilmut, Keith Campbell, and Colin Tudge, The Second Creation: Dolly and the Age of Biological Control (New York: Farrar, Straus and Giroux, 2000). 9 National Bioethics Advisory Commission, Cloning Human Beings (Rockville, Md.: National Bioethics Advisory Commission, 1997). 10 Margaret Talbot, “A Desire to Duplicate,” The New York Times Magazine, February 4, 2001, pp. 40–68; Brian Alexander, “(You)2,” Wired, February 2001, 122–135.
11 Glenn McGee, The Perfect Baby: A Pragmatic Approach to Genetics (Lanham, Md.: Rowman and Littlefield, 1997). 12 For an overview of the present state of human germ-line engineering, see Gregory Stock and John Campbell, eds., Engineering the Human Germline: An Exploration of the Science and Ethics of Altering the Genes We Pass to Our Children (New York: Oxford University Press, 2000); Marc Lappé, “Ethical Issues in Manipulating the Human Germ Line,” in Peter Singer and Helga Kuhse, eds., Bioethics: An Anthology (Oxford: Blackwell, 1999), p. 156; and Mark S.
Heidegger, Martin. Basic Writings. New York: Harper and Row, 1957. High, Jack, and Clayton A. Coppin. The Politics of Purity: Harvey Washington Wiley and the Origins of Federal Food Policy. Ann Arbor, Mich.: University of Michigan Press, 1999. Hirschi, Travis, and Michael Gottfredson. A General Theory of Crime. Stanford, Calif.: Stanford University Press, 1990. Howard, Ken. “The Bioinformatics Gold Rush.” Scientific American 283, no. 1 (July 2000): 58–63. Hrdy, Sarah B., and Glenn Hausfater. Infanticide: Comparative and Evolutionary Perspectives. New York: Aldine Publishing, 1984. Hubbard, Ruth. The Politics of Women’s Biology. New Brunswick, N.J.: Rutgers University Press, 1990. Huber, Peter. Orwell’s Revenge: The 1984 Palimpsest. New York: Free Press, 1994. Hull, Terence H.
Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran, Jeff Hammerbacher
23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, database schema, double helix, en.wikipedia.org, epigenetics, fault tolerance, Firefox, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, longitudinal study, Mars Rover, natural language processing, openstreetmap, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social graph, SPARQL, speech recognition, statistical model, supply-chain management, text mining, Vernor Vinge, web application
To capture the skill set required to perform this multitude of tasks, we created the role of “Data Scientist.” In the financial services domain, large data stores of past market activity are built to serve as the proving ground for complex new models developed by the Data Scientists of their domain, known as Quants. Outside of industry, I’ve found that grad students in many scientific domains are playing the role of the Data Scientist. One of our hires for the Facebook Data team came from a bioinformatics lab where he was building data pipelines and performing offline data analysis of a similar kind. The well-known Large Hadron Collider at CERN generates reams of data that are collected and pored over by graduate students looking for breakthroughs. Recent books such as Davenport and Harris’s Competing on Analytics (Harvard Business School Press, 2007), Baker’s The Numerati (Houghton Mifflin Harcourt, 2008), and Ayres’s Super Crunchers (Bantam, 2008) have emphasized the critical role of the Data Scientist across industries in enabling an organization to improve over time based on the information it collects.
An individual’s genome could then be represented by a traversal of the reference graph. DNA As a Data Source To a programming language, DNA is simply a string: char(3*10^9) human_genome; The full genomic information for man consists of 3 billion characters and is easily handled in memory by even the most inefficient home-brewed language. However, the process of determining the exact order of these 3 billion bases requires a significant effort spanning chemistry, bioinformatics, laboratory procedures, and a lot of spinning disks. The Human Genome Project aimed, for the first time, to sequence every one of these characters. A number of large, high-throughput institutes from around the world put academic competition aside and set about a task that would last 13 years and consume billions of dollars. Their aim was to produce a robust, accurate map of the human genome, available to all, for free.
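The "DNA is simply a string" view in the excerpt above can be made concrete with a small sketch. It compares one byte per base against a 2-bit packing. The function names and the 2-bit scheme are illustrative assumptions, not something the excerpt describes, and the sketch handles only the four letters A, C, G and T.

```python
# Hypothetical sketch: names and the 2-bit packing are assumptions for illustration.
GENOME_LENGTH = 3 * 10**9  # roughly 3 billion bases, as stated in the excerpt

BASES = "ACGT"
CODES = {base: code for code, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def bytes_as_text(n_bases: int) -> int:
    """Storage needed at one byte (one ASCII character) per base."""
    return n_bases


def bytes_packed(n_bases: int) -> int:
    """Storage needed at two bits per base, rounded up to whole bytes."""
    return (n_bases * 2 + 7) // 8


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray(bytes_packed(len(seq)))
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Recover the first n_bases bases from packed bytes."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(n_bases)
    )
```

Under these assumptions the whole genome needs about 3 GB as plain text and about 750 MB packed. Real sequence data also contains unknown bases (`N`), which this packing cannot represent.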
While the raw data is not backed up (restoring from tape would take three months), each raw image from the sequencing run is scaled down to low-quality JPEG files, and stored in a database. Although unsuitable for analysis, this data is useful should any run require a manual review to identify imaging problems or artifacts (oil, poor DNA clustering, and even fingerprints aren’t uncommon). Once the sequencing data is available, it is stored in two formats in a high-performance Oracle database. While production systems make good use of databases, bioinformatics tools tend to continue to work against flat files on a physical filesystem. To be sure that we cater to all tastes, the vast swaths of sequence information available in this sequence archive are also presented to Sanger’s internal compute farms via a Fuse user-space filesystem. This approach scales surprisingly well. The sequence data is then passed through a series of quality control steps, which again run on the sequencing analysis cluster, and check for low sequencing yield, high levels of unknown bases, or low complexity sequence, all of which are telltale signs for sequencing errors.
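The three quality-control checks named in the excerpt above (low yield, many unknown bases, low-complexity sequence) could be sketched roughly as follows. This is not the pipeline the excerpt describes: the function names, the thresholds, and the use of Shannon entropy as a complexity measure are all assumptions made for illustration.

```python
from collections import Counter
from math import log2

# Hypothetical sketch: names, thresholds and the entropy measure are assumptions.


def unknown_fraction(read: str) -> float:
    """Fraction of bases in a read that are the unknown symbol 'N'."""
    return read.upper().count("N") / len(read) if read else 1.0


def shannon_entropy(read: str) -> float:
    """Per-base Shannon entropy in bits; low values suggest low complexity."""
    counts = Counter(read.upper())
    total = len(read)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def qc_flags(reads, min_reads=1000, max_unknown=0.10, min_entropy=1.0):
    """Summarise one run: yield, unknown-base level, low-complexity read count."""
    n = len(reads)
    mean_unknown = sum(unknown_fraction(r) for r in reads) / n if n else 1.0
    low_complexity = sum(1 for r in reads if shannon_entropy(r) < min_entropy)
    return {
        "low_yield": n < min_reads,
        "high_unknown": mean_unknown > max_unknown,
        "low_complexity_reads": low_complexity,
    }
```

A run would be flagged for manual review when any of the three values looks abnormal. Suitable thresholds depend on the sequencing platform and are not given in the excerpt.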
Life at the Speed of Light: From the Double Helix to the Dawn of Digital Life by J. Craig Venter
Albert Einstein, Alfred Russel Wallace, Asilomar, Barry Marshall: ulcers, bioinformatics, borderless world, Brownian motion, clean water, discovery of DNA, double helix, epigenetics, experimental subject, global pandemic, Isaac Newton, Islamic Golden Age, John von Neumann, Louis Pasteur, Mars Rover, Mikhail Gorbachev, phenotype, Richard Feynman, stem cell, the scientific method, Thomas Kuhn: the structure of scientific revolutions, Turing machine
They believed that the then-unprecedented amount of molecular information available for a wide range of model organisms would yield vivid new insights into intracellular molecular processes that could, if simulated in a computer, enable them to predict the dynamic behavior of living cells. Within a computer it would be possible to explore the functions of proteins, protein–protein interactions, protein–DNA interactions, regulation of gene expression, and other features of cellular metabolism. In other words, a virtual cell could provide a new perspective on both the software and hardware of life. In the spring of 1996 Tomita and his students at the Laboratory for Bioinformatics at Keio started investigating the molecular biology of Mycoplasma genitalium (which we had sequenced in 1995) and by the end of that year had established the E-Cell Project. The Japanese team had constructed a model of a hypothetical cell with only 127 genes, which were sufficient for transcription, translation, and energy production. Most of the genes that they used were taken from Mycoplasma genitalium.
Currently Novartis and other vaccine companies rely on the World Health Organization to identify and distribute the seed viruses. To speed up the process we are using a method called “reverse vaccinology,” which was first applied to the development of a meningococcal vaccine by Rino Rappuoli, now at Novartis. The basic idea is that the entire pathogenic genome of an influenza virus can be screened using bioinformatic approaches to identify and analyze its genes. Next, particular genes are selected for attributes that would make good vaccine targets, such as outer-membrane proteins. Those proteins then undergo normal testing for immune responses. My team has sequenced genes representing the diversity of influenza viruses that have been encountered since 2005. We have sequenced the complete genomes of a large collection of human influenza isolates, as well as a select number of avian and other non-human influenza strains relevant to the evolution of viruses with pandemic potential, and made the information publicly available.
Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-physikalische Klasse, Fachgruppe VI, Biologie, Neue Folge 1, no. 13 (1935): pp. 189–245. 5. Richard Dawkins. River Out of Eden (New York: Basic Books, 1995). 6. Motoo Kimura. “Natural selection as the process of accumulating genetic information in adaptive evolution.” Genetical Research 2 (1961): pp. 127–40. 7. Sydney Brenner. “Life’s code script.” Nature 482 (February 23, 2012): p. 461. 8. W. J. Kress and D. L. Erickson. “DNA barcodes: Genes, genomics, and bioinformatics.” Proceedings of the National Academy of Sciences 105, no. 8 (2008): pp. 2761–62. 9. Lulu Qian and Erik Winfree. “Scaling up digital circuit computation with DNA strand displacement cascades.” Science 332, no. 6034 (June 3, 2011): pp. 1196–201. 10. George M. Church, Yuan Gao, and Sriram Kosuri. “Next-generation digital information storage in DNA.” Science 337, no. 6102 (September 28, 2012): p. 1628. 11.
In Our Own Image: Savior or Destroyer? The History and Future of Artificial Intelligence by George Zarkadakis
3D printing, Ada Lovelace, agricultural Revolution, Airbnb, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, animal electricity, anthropic principle, Asperger Syndrome, autonomous vehicles, barriers to entry, battle of ideas, Berlin Wall, bioinformatics, British Empire, business process, carbon-based life, cellular automata, Claude Shannon: information theory, combinatorial explosion, complexity theory, continuous integration, Conway's Game of Life, cosmological principle, dark matter, dematerialisation, double helix, Douglas Hofstadter, Edward Snowden, epigenetics, Flash crash, Google Glasses, Gödel, Escher, Bach, income inequality, index card, industrial robot, Internet of things, invention of agriculture, invention of the steam engine, invisible hand, Isaac Newton, Jacquard loom, Jacques de Vaucanson, James Watt: steam engine, job automation, John von Neumann, Joseph-Marie Jacquard, Kickstarter, liberal capitalism, lifelogging, millennium bug, Moravec's paradox, natural language processing, Norbert Wiener, off grid, On the Economy of Machinery and Manufactures, packet switching, pattern recognition, Paul Erdős, post-industrial society, prediction markets, Ray Kurzweil, Rodney Brooks, Second Machine Age, self-driving car, Silicon Valley, social intelligence, speech recognition, stem cell, Stephen Hawking, Steven Pinker, strong AI, technological singularity, The Coming Technological Singularity, The Future of Employment, the scientific method, theory of mind, Turing complete, Turing machine, Turing test, Tyler Cowen: Great Stagnation, Vernor Vinge, Von Neumann architecture, Watson beat the top human players on Jeopardy!, Y2K
At the same time, the computer metaphor frames our way of thinking, and how we communicate the fundamental ideas of our time. We speak of the brain as the ‘hardware’ and of the mind as the ‘software’. This dualistic software–hardware paradigm is applied across many fields, including life itself. Cells are the ‘computers’ that run a ‘program’ called the genetic code, or genome. The ‘code’ is written on the DNA. Cutting-edge research in biology does not take place in vitro in a wet lab, but in silico in a computer. Bioinformatics – the accumulation, tagging, storing, manipulation and mining of digital biological data – is the present, and future, of biology research. The computer metaphor for life is reinforced by its apparently successful application to real problems. Many disruptive new technologies in molecular biology – for instance ‘DNA printing’ – function on the basis of digital information. This is how they do it: DNA is a molecule formed by two sets of base pairs: adenine-thymine (A-T) and guanine-cytosine (G-C).
Thanks to digital data and ever-accelerating computer power we are at the cusp of an era in which we can gain unprecedented insights into natural phenomena, the human body, markets, Earth’s climate, ecosystems, energy grids, and just about everything in between. Norbert Wiener’s cybernetic dream is slowly becoming a reality: the more information we have about systems, the more control we can exercise over them with the help of our computers. Big data are our newfound economic bounty. The big data economy In 2010, I took a contract as External Relations Officer at the European Bioinformatics Institute (EBI) at Hinxton, Cambridge. The Institute is part of the intergovernmental European Molecular Biology Laboratory, and its core mission is to provide an infrastructure for the storage and manipulation of biological data. This is the data that researchers in the life sciences produce every day, including information about the genes of humans and of other species, chemical molecules that might provide the basis for new therapies, proteins, and also about research findings in general.
At the time that I worked for them, EBI’s challenge was to increase the capacity of its infrastructure in order to accommodate this ‘data deluge’. As someone who facilitated communications between the Institute and potential government funders across Europe, I had first-hand experience of the importance that governments placed on biological data. Almost everyone understood the potential for driving innovation through this data, and was ready to support the expansion of Europe’s bioinformatics infrastructure, even as Europe was going through the Great Recession. The message was simple and clear: whoever owned the data owned the future. Governments and scientists are not the only ones to have jumped on the bandwagon of big data. The advent of social media and Google Search has transformed the marketing operations of almost every business in the world, big and small. Tools have been developed to ‘mine’ the text written by billions of people on Facebook and Twitter, in order to measure sentiment and target consumers with, hopefully, the right products.
Dinosaurs Rediscovered by Michael J. Benton
All science is either physics or stamp collecting, Bayesian statistics, biofilm, bioinformatics, David Attenborough, Ernest Rutherford, germ theory of disease, Isaac Newton, lateral thinking, North Sea oil, nuclear winter
Through my entire research career, palaeontologists have squabbled strenuously over the classification of their organisms of choice, whether it be dinosaurs, trilobites, or fossil plants. These fights might seem inconsequential, but we are considering the fundamentals of how to document the wonders of biodiversity, and we are also addressing origins. Documenting biodiversity and origins is big science now – indeed, it forms part of the modern techniques termed, rather forbiddingly, phylogenomics and bioinformatics. Phylogenomics is the new discipline of establishing evolutionary trees from molecular data. Bioinformatics is the field of managing large data sets in the life sciences and number-crunching those data to produce information on the genetic basis of disease, adaptations, and cell function, and has applications fundamental to medicine and agriculture. Practitioners of these methods block their university’s supercomputers for weeks while they run billions of repeat calculations to get their answers.
McNeill 215, 216, 218, 228–29, 234, 252 Allen, Percy 73 alligators 118, 164–65, 194 Allosaurus 49, 121, 188 animated skin of 250 diet 206 fact file 188–89 feeding mechanisms 186–88, 190–91, 193, 193 medullary bone 145 Morrison Formation 69, 71 movement 248 skulls 17–18, X teeth and bite force 188, 189, 192, 196 Alvarez, Luis 259–62, 260, 264, 267, 285, 286 Alvarez, Walter 259, 260, 261–62, 264 amber dinosaurs preserved in 131–32, VI extracting DNA from fossils in 136, 137 American Museum of Natural History (AMNH) 54, 156, 166, 243 American National Science Foundation 52 Amherst College Museum, Connecticut 223, 224–25, 227 Amphicoelias 206 analogues, modern 16 Anatosaurus 221, 221 Anchiornis 68–69, 70, V fact file 70 feathers 125, 126 flight 245 footprints 224–25, 225 angiosperms 78–79 animation 249–52, 251 Ankylosaurus 65, 79, 272 extinction 276 fact file 272–73 Hell Creek Formation 270 use of arms and legs 236 Anning, Mary 195 apatite 142 Apatosaurus 206 Archaeopteryx 110, 112, IV as ‘missing link’ fossil 114, 121 fact file 112–13 flight 114, 124, 247 Richard Owen and 111, 114 skeleton found at Solnhofen 111, 277 archosauromorphs 35–36, 37 archosaurs 16, 21–22, 35, 39, 56 Armadillosuchus 201 Asaro, Frank 259 Asilisaurus 32–33 asteroid impact 254–69, 275–76, 280, 281, 286–87, XIX Attenborough, David 98, 213 B Bakker, Bob 109–10, 115, 126 asteroid impact and extinction 262 Deinonychus 110, 111, 221, 244–45 dinosaurs as warm-blooded creatures 109, 116, 117 modern birds as dinosaurs 110 speed of dinosaurs 230 validity of Owen’s Dinosauria 57, 59 Baron, Matt 80–83 Barosaurus 206 Barreirosuchus 201 Barrett, Paul 80–83 Baryonyx 193 Bates, Karl 192 Bayesian statistical methods 273, 275 BBC Horizon 229, 264–65 Walking with Dinosaurs 249–52, 251 beetles 78, 139, 204 Beloc, Haiti 265–66, 265 Bernard Price Palaeontological Institute 160, 163 Bernardi, Massimo 43, 46 biodiversity, documenting 52 bioinformatics 52 bipedal dinosaurs arms and legs 235–40 early images of 219–21 
movement and posture 221–22, 222, 249 speed 228 Bird, Roland T. 242–43 birds 145 brains 129 breathing 118 eggs 155, 158, 159, 166 evolution of 277, 278–79, 279–81, 280 feathers 125–26, 127 flight 244, 247, 248 gastroliths 194 growth 174 identifying ancestral genetic sequences 151–52 intelligence 128 as living dinosaurs 110–15, 118, 120–21, 124, 132 and the mass extinction 277–81 medullary bone 143, 145 Mesozoic birds from China 118–24 movement 234 sexual selection 126 using feet to hold prey down 235, 235 bite force 191–94 blood, identifying dinosaur 141–43 Bonaparte, José 239 bones 99 age of 155 bone histology 116–18, 119 bone remodelling 116–17 casting 100 composition 142 excavating from rock 87–99, 105 extracting blood from 141–42 first found 65 first illustrated 65 growth lines 116, 117, 154–55, 170, 172–73, 184 how dinosaurs’ jaws worked 186 mapping 93–94 reconstructing 99–101 structures 170, XIII Brachiosaurus 49, 69, 178–79 diet 206, 207–8 fact file 178–79 Morrison Formation 69 size 175 bracketing 15–17 brain size 128–30, XI, XII breakpoint analysis 42, 43 breathing 118 Bristol City Museum 104 Bristol Dinosaur Project 101–4 British Museum, London 111, 114 Brontosaurus 69, 225 Brookes, Richard 65 Brown, Barnum 273 Brusatte, Steve 32, 36–37, 39 bubble plots 42, 43 Buckland, William 67, 195 Buckley, Michael 142 Burroughs, Edgar Rice, The Land that Time Forgot 134 Butler, Richard 32 Button, David 208, 213 C Camarasaurus 175, 206, 208–9, 209, 213, IX Cano, Raúl 136 Carcharodontosaurus 196 Carnegie, Andrew 211 Carnian Pluvial Episode 40, 42, 43, 45, 46, 50 carnivores 201 see also individual dinosaurs Carnotaurus 201, 238, 239, 240 fact file 239 carotenoids 124 cartilage 142 Caudipteryx 121, 123 fact file 123 Centrosaurus 87, 88 fact file 88–89 ceratopsians 79, 143, 156 diversity of 272, 275 use of arms and legs 236 Ceratosaurus 69, 71, 187, 206 Cetiosaurus 57, 66 Chapman Andrews, Roy 156, 166 Charig, Alan 22–23, 34, 39 Chasmosaurus 87 Chen, Pei-ji 121 Chicxulub 
crater, Mexico 264–68, 267, 285, 286 Chin, Karen 195, 204 China Jurassic dinosaurs 68–71 Mesozoic birds from 118–24 Chinsamy-Turan, Anusuya 145 chitin 139 chromosomes 151–52 Chukar partridges 248 clades 55, 82, 110 cladistics 53–55, 82–83 cladograms 55, 56 Clashach, Scotland 85, 86 classic model 21, 21 classification, evolutionary trees 52–84, 60–61 climate climate change 22, 40, 41, 43 Cretaceous 269 identifying ancient 46–47 Late Triassic 40, 41, 43, 49 Triassic Period 48, 49 cloning 134–35, 137, 148–51, 150 Coelophysis 193, 236, I, X Colbert, Ned 22, 23, 34 Romer-Colbert ecological relay model 22, 35, 36, 39–40 size and core temperature 118 cold-blooded animals 116 collagen 142, 143 colour of dinosaurs 124–25 of feathers 8–10, 17, 139, V computational methods 35–39 Conan Doyle, Sir Arthur, The Lost World 133–34, 133, 135 Confuciusornis 144, 145, 147, XIII fact file 146–47 conifers 22, 131, 197, III Connecticut Valley 223–26, 224–25, 227, 243 contamination of DNA 138 continental plates 47 Cope, Edward 208 coprolites 195, 195, 197, 204 coprophagy 204 crests 126, 128, 143 Cretaceous 50, 71–75 birds 277–78 climate 269 decline of dinosaurs 274, 275 dinosaur evolution rates 77 ecosystems 205 in North America 240–42 ornithopods 71 sauropods 71 see also Early Cretaceous; Late Cretaceous Cretaceous–Palaeogene boundary 260, 261–62, 265–66, 269 evolution of birds 276, 277, 278–79 Cretaceous Terrestrial Revolution 77–80, 131 Crichton, Michael, Jurassic Park 134–35, 136 criticism and scientific method 287–88 crocodiles 218 Adamantina Formation food web 201–3 eggs and babies 155, 159, 164, 165 feeding methods 194 function of the snout 193 crurotarsans 39 CT (computerized tomographic) scanning 97, 99 dinosaur embryos 160, 162 dinosaur skulls 163, 191 Currie, Phil 86, 91, 121 Cuvier, Georges 257 D Dal Corso, Jacopo 40 Daohugou Bed, China 68 Darwin, Charles 23, 107, 114, 132, 287 Daspletosaurus 170, 171 dating dinosaurian diversification 44–46 de-extinction science 149, 151 
death of dinosaurs see extinction Deccan Traps 268, 285, 287 Deinonychus 112, 114, 121 fact file 112–13 John Ostrom’s monograph on 110, 111, 113, 116, 244–45 movement 221 dentine 196, 197 Dial, Ken 248 diet collapsing food webs 204–5 dinosaur food webs 201–4 fossil evidence for 194–95 microwear on teeth and diet 199–201 niche division and specialization in 205–13 digital models 17, 18, 19, 191–94, 231–34, 249, 252 dimorphism, sexual 126, 143 dinomania 107 Dinosaur Park Formation, Drumheller 86, 91–99, 100 Dinosaur Provincial Park, Alberta 86, 87, 91–92, 91 Dinosaur Ridge, Colorado 240 Dinosauria 33, 55, 82, 107 discovery of the clade 57–59 Diplodocus 175, 210–11, II diet 207, 208–9, 213 fact file 210–11 Morrison Formation 69 skulls IX teeth and bite force 209, 213 diversification of dinosaurs 29, 44–46 DNA (deoxyribonucleic acid) 134–35 cloning 148–51 dinosaurian genome 151–52 extracting from fossils in amber 136 extracting from museum skins and skeletons 138 identifying dinosaur 136–37 survival of in fossils 138–39, 141 Doda, Bajazid 180 Dolly the sheep 148, 149 Dromaeosaurus 87, 121 duck-billed dinosaurs see hadrosaurs dung beetles 204 dwarf dinosaurs 180–84 Dysalotosaurus 145 Dzik, Jerzy 29, 31 E Early Cretaceous diversity of species on land and in sea 78 Jehol Beds 124 Wealden 72–74, 74, 75, 78 ecological relay model 21, 22, 35, 36, 39 ecology, and the origin of dinosaurs 23–25 education, using dinosaurs in 101–4 eggs, birds 155, 158, 159, 166 eggs, dinosaur 154, 155–56 dinosaur embryos 160–63 nests and parental care 163–67 size of 158–59 El Kef, Tunisia 276 Elgin, Scotland 25–26, 26, 34, 85–86 embryos, dinosaur 154, 160–63 enamel, tooth 196, 197 enantiornithines 277–78 encephalization quotient (EQ) 130 engineering models 17–18 Eoraptor 29 Erickson, Greg 154–55, 170, 172–73, 184–85, 197 eumelanin 124 eumelanosomes V Euoplocephalus 87, 88 fact file 88–89 Europasaurus 117 European Synchrotron Radiation Facility (ESRF) 162 evolution 13, 23, 40 evolutionary trees 
52–84, 60–61, 281 Richard Owen’s views on 106–7, 114 size and 181, 184 Evolution (journal) 109 excavations 87–99 Dinosaur Park Formation 86, 91–99, 100 recording 92–97 extant phylogenetic bracket 16, 217 external fundamental system (EFS) 170 extinction Carnian Pluvial Episode 40, 42, 43, 45, 46, 50 end-Triassic event 64 mass extinction 254–85 Permian–Triassic mass extinction 14, 33–34, 46, 222 sudden or gradual 270–75 eyes 100 F faeces, fossil 194, 195, 197, 204 Falkingham, Peter 192, 226 feathers 99, 245 in amber 131, VI bird feathers 125–26, 127 colour of 8–10, 17, 139, V as insulation 126 melanosomes 8–10, 8, 17, 124–25, 132, V sexual signalling 126, 128, 143 Sinosauropteryx 8–9, 8, 10, 17, 119, 120–21, 125, 126 Field, Dan 279, 281 films, dinosaurs in 249–52 Jurassic Park 134–35, 136, 217, 252 finding dinosaurs 87–105 finite element analysis (FEA) 18, 190–91, 199, 208 fishes 128, 159, 163–64, 196 flight 244–49 flowering plants 78–79, III food webs 71–75, 201–4 Adamantina Formation 201–4, 202–3 collapsing 204–5 Wealden 74, 75 footprints 223–27, 240 megatracksites 242 photogrammetry 94 swimming tracks 242, 243 fossils casting 100 extracting skeletons from 94–99, 105 plants 269 reconstructing 99–101 scanning 97, 99 survival of organic molecules in 138–39, 141 Framestore 249–50 Froude, William 228–29 G Galton, Peter 58, 59, 110, 115, 221, 221 Garcia, Mariano 232, 234 gastroliths 194 Gatesy, Stephen 226, 231 gaur 148–49 Gauthier, Jacques 53, 59, 245 genetic engineering, bringing dinosaurs back to life with 148–51 genome, dinosaurian 151–52 geological time scale 6–7, 44–45 gharials 193, 194 gigantothermy 117, 118 Gill, Pam 199 glasses, impact 265–66, 269 gliding 245, 247, 248 Gorgosaurus 87, 170, 171 Granger, Walter 157 Great Exhibition (1851) 107, 108 Gregory, William 157 Grimaldi, David 131 growth dwarf dinosaurs 180–84 growth rates 154, 170–74, 184 growth rings 116, 117, 154–55, 170, 172–73, 184 growth spurts 145 how dinosaurs could be so huge 175–79 Gryposaurus 87 
Gubbio, Italy 260, 261–62, 265, 266, 286 H hadrosaurs 79, 143 Dinosaur Park Formation 91–99, 100 diversity of 272, 275 first skeleton 218–19, 220 teeth 196–97, 198, 201, XVIII use of arms and legs 236 Hadrosaurus foulkii 220 Haiti 265–66, 265 Haldane, J.
Pearls of Functional Algorithm Design by Richard Bird
Final remarks The origins of the maximum segment sum problem go back to about 1975, and its history is described in one of Bentley’s (1987) programming pearls. For a derivation using invariant assertions, see Gries (1990); for an algebraic approach, see Bird (1989). The problem refuses to go away, and variations are still an active topic for algorithm designers because of potential applications in data-mining and bioinformatics; see Mu (2008) for recent results. The interest in the non-segment problem is what it tells us about any maximum marking problem in which the marking criterion can be formulated as a regular expression. For instance, it is immediate that there is an O(nk) algorithm for computing the maximum at-least-length-k segment problem because F* T^n F* (n ≥ k) can be recognised by a k-state automaton.
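For readers unfamiliar with the maximum segment sum problem mentioned above, a minimal sketch follows. It uses the widely known linear-time scan (often attributed to Kadane) rather than the derivation in the book, and it is written in Python rather than the book's Haskell. It assumes the empty segment is allowed, so the result is never negative. The function names are illustrative.

```python
# Hypothetical sketch: not the book's derivation; assumes the empty segment counts.


def max_segment_sum_spec(xs):
    """Specification: the best sum over every contiguous segment, including the empty one."""
    n = len(xs)
    return max(sum(xs[i:j]) for i in range(n + 1) for j in range(i, n + 1))


def max_segment_sum(xs):
    """Linear-time scan: track the best sum of a segment ending at each position."""
    best = current = 0
    for x in xs:
        current = max(0, current + x)  # either extend the segment or restart empty
        best = max(best, current)
    return best
```

The specification costs cubic time and serves only as a reference. The scan visits each element once.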
In particular, the function sorttails that returns the unique permutation that sorts the tails of a list can be obtained from the final program for ranktails simply by replacing resort · concat · label in the first line of ranktails by concat. The function sorttails is needed as a preliminary step in the Burrows–Wheeler algorithm for data compression, a problem we will take up in the following pearl. The problem of sorting the suffixes of a string has been treated extensively in the literature because it has other applications in string matching and bioinformatics; a good source is Gusfield (1997). This pearl was rewritten a number of times. Initially we started out with the idea of computing perm, a permutation that sorts a list. But perm is too specific in the way it treats duplicates: there is more than one permutation that sorts a list containing duplicate elements. One cannot get very far with perm unless one generalises to either rank or partition.
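What sorttails and ranktails compute can be illustrated with a deliberately naive sketch. This is not the pearl's efficient program: it compares whole suffixes directly, and the Python names `sort_tails` and `rank_tails` are assumptions chosen to mirror the functions named in the excerpt.

```python
# Hypothetical sketch: naive comparison of whole tails, not the pearl's algorithm.


def sort_tails(s):
    """Return the permutation of start indices that puts the nonempty tails in order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def rank_tails(s):
    """Return, for each start index, the position of its tail in sorted order."""
    ranks = [0] * len(s)
    for position, start in enumerate(sort_tails(s)):
        ranks[start] = position
    return ranks
```

Because the nonempty tails of a sequence all have different lengths, no two are equal, which is why the sorting permutation is unique here even though a general list with duplicates admits several. The two functions return inverse permutations of each other.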
– array index, 25, 29, 87, 100 – prefix, 103, 119, 127 accumArray, 2, 5, 82, 123 applyUntil, 82 array, 29, 85 bounds, 25 break, 154, 164, 182 compare, 29 concatMap, 42 elems, 85 foldrn – fold over nonempty lists, 42 fork, 35, 83, 94, 118 inits, 66, 67, 117 listArray, 25, 100 minors, 172 nodups, 149 nub, 64 partition, 4 partitions, 38 reverse, 119, 244 scanl, 118, 238 scanr, 70 sort, 28, 95 sortBy, 29, 94 span, 67 subseqs, 57, 65, 157, 163 tails, 7, 79, 100, 102 transpose, 98, 150, 193 unfoldr, 202, 243 zip, 35, 83 zipWith, 83 Abelian group, 27 abides property, 3, 22 abstraction function, 129, 211, 226 accumulating function, 2 accumulating parameter, 131, 138, 140, 177, 253 adaptive encoding, 200 amortised time, 5, 118, 131, 133 annotating a tree, 170 arithmetic decoding, 201 arithmetic expressions, 37, 156 array update operation, 3, 6 arrays, 1, 2, 21, 29, 85, 99 association list, 29, 238 asymptotic complexity, 27 bags, 25, 50, 51 balanced trees, 21, 54, 234 Bareiss algorithm, 186 bijection, 129 binary search, 7, 10, 14, 15, 19, 54 binomial trees, 178 bioinformatics, 77, 90 Boolean satisfiability, 155 borders of a list, 103 bottom-up algorithm, 41 boustrophedon product, 245, 251, 260 breadth-first search, 136, 137, 178 Bulldozer algorithm, 196 bzip2, 101 call-tree, 168 Cartesian coordinates, 141, 155 Cartesian product, 149 celebrity clique, 56 Chió’s identity, 182 clique, 56 combinatorial patterns, 242 comparison-based sorting, 10, 16, 27 computational geometry, 188 conjugate, 263 constraint satisfaction, 155 continuations, 273 coroutines, 273 cost function, 41, 48, 52 cyclic structures, 133, 179 data compression, 91, 198 data mining, 77 data refinement, 5, 48, 108, 114, 129, 210 deforestation, 168 depth-first search, 137, 221, 222 destreaming, 214 destreaming theorem, 214 Dilworth’s theorem, 54 divide and conquer, 1, 3, 5, 7, 8, 15, 21–23, 27, 29, 30, 65, 81, 171 dot product, 185 dynamic programming, 168 EOF (end-of-file symbol), 203 exhaustive search, 12, 33, 39, 57, 148, 156
facets, 190 failure function, 133 fictitious values, 14, 77 finite automaton, 74, 136 fission law of foldl, 130 fixpoint induction, 205 forests, 42, 174 fringe of a tree, 41 frontier, 137 fully strict composition, 243 fusion law of foldl, 76, 130, 195 fusion law of foldr, 34, 51, 52, 61, 247, 260, 261, 265 fusion law of foldrn, 43 fusion law of fork, 35 fusion law of unfoldr, 206, 212 Galil’s algorithm, 122 garbage collection, 165, 166 Garsia–Wachs algorithm, 49 Gaussian elimination, 180 graph traversal, 178, 221 Gray path order, 258 greedy algorithms, 41, 48, 50, 140 Gusfield’s Z algorithm, 116 Hu–Tucker algorithm, 49 Huffman coding, 91, 198, 201 immutable arrays, 25 incremental algorithm, 188, 191, 204 incremental decoding, 216 incremental encoding, 203, 209 indexitis, 150 inductive algorithm, 42, 93, 102 integer arithmetic, 182, 198, 208 integer division, 182 intermediate data structure, 168 interval expansion, 209, 210 inversion table, 10 inverting a function, 12, 93 involution, 150 iterative algorithm, 10, 82, 109, 113 Knuth and Ruskey algorithm, 258 Knuth’s spider spinning algorithm, 242 Koda–Ruskey algorithm, 242 law of iterate, 99 laws of filter, 118, 152 laws of fork, 35 lazy evaluation, 33, 147, 185, 243 leaf-labelled trees, 41, 165, 168 left spines, 43, 45, 177 left-inverse, 129 Leibniz formula, 180 lexicographic ordering, 45, 52, 64, 102, 104 linear ordering, 43 linked list, 225 longest common prefix, 103, 112, 120 longest decreasing subsequence, 54 loop invariants, 62, 111 lower bounds, 16, 27, 28, 64 Mahajan and Vinay’s algorithm, 186 majority voting problem, 62 matrices, 147, 181 matrix Cartesian product, 149 maximum marking problems, 77 maximum non-segment sum, 73 maximum segment sum, 73 maximum surpasser count, 7 McCarthy S-expression, 221 memo table, 163 memoisation, 162 merge, 26, 142, 158 mergesort, 29, 89, 171, 173 minimal element, 53 minimum cost tree, 44 minimum element, 53 minors, 181 model checking, 155 monads, 3, 114, 155
monotonicity condition, 48, 53 move-to-front encoding, 91 multisets, 25 narrowing, 199 nondeterministic functions, 43, 51 normal form, 160 online list labelling, 241 Open Problems Project, 31 optimal bracketing, 176 optimisation problems, 48, 176 order-maintenance problem, 241 overflow, 214 parametricity, 62 partial evaluation, 134 partial ordering, 53 partial preorder, 52 partition sort, 85 partition sorting, 87 perfect binary trees, 171 permutations, 79, 90, 91, 96, 97, 180, 189, 242, 251 planning algorithm, 136, 138 plumbing combinators, 36 prefix, 66 prefix ordering, 103, 105, 119 preorder traversal, 245, 270 principal submatrices, 185 program transformation, 221 PSPACE completeness, 136 queues, 109, 137, 248, 249 Quicksort, 5, 85, 89 radix sort, 95, 101 ranking a list, 79 rational arithmetic, 180, 188, 198 rational division, 181 recurrence relations, 15, 31, 88 refinement, 44, 48, 51–53, 80 regular cost function, 49 regular expression, 74 relations, 48, 167, 229 representation function, 129, 211 right spines, 177 Rose trees, 164, 245 rotations of a list, 91 rule of floors, 215 run-length encoding, 91 saddleback search, 14 safe replacement, 222 scan lemma, 118, 125 segments, 73, 171 Shannon–Fano coding, 198 sharing, 168, 173 shortest upravel, 50 simplex, 188 skeleton trees, 165 sliding-block puzzle, 136 smart constructors, 48, 170, 177 smooth algorithms, 241 solving a recursion, 98 sorting, 9, 10, 16, 91, 149 sorting numbers, 1, 3 sorting permutation, 10 space/time trade-offs, 156 spanning tree, 178 stable sorting algorithm, 86, 95 stacks, 137, 221, 222 streaming, 203, 214 streaming theorem, 204 string matching, 112, 117, 127 stringology, 103 subsequences, 50, 64, 74, 162, 177, 242 suffix tree, 101 suffixes, 79, 100 Sylvester’s identity, 186 thinning algorithm, 161 top-down algorithm, 41 totally acyclic digraph, 258 transitions, 242 trees, 130, 165, 248 tries, 163 tupling law of foldl, 118, 125 tupling law of foldr, 247 unfolds, 168 unmerges, 158, 159, 165
unravel, 50 upper triangular matrix, 185 Vandermonde’s convolution, 17 well-founded recursion, 4, 30 while loop, 111, 113 wholemeal programming, 150 windows of a text, 120 Young tableau, 28
ucd-csi-2011-02 by Unknown
The main contribution of this work is to present the notion of bipolarity that captures the level of conflict between the contributors to a page. Thus the work is more directed at the problem of Wikipedia vandalism than the issue of authoritativeness that is the subject of this paper. 3 Extracting and Comparing Network Motif Profiles The idea of characterizing networks in terms of network motif profiles is well established and has had a considerable impact in bioinformatics. Our objective is to characterize Wikipedia pages in terms of network motif profiles and then examine whether or not different pages have characteristic network motif profiles. The datasets we considered were entries in the English language Wikipedia on famous sociologists and footballers in the English Premiership (see Table 1). The first step in the analysis is to identify a set of network motifs to use. 3.1 Wikipedia Network Motifs Our Wikipedia network motifs comprise author and page nodes and author-page (AP) and page-page (PP) edges (see Figures 3 and 4).
The Patient Will See You Now: The Future of Medicine Is in Your Hands by Eric Topol
23andMe, 3D printing, Affordable Care Act / Obamacare, Anne Wojcicki, Atul Gawande, augmented reality, bioinformatics, call centre, Clayton Christensen, clean water, cloud computing, commoditize, computer vision, conceptual framework, connected car, correlation does not imply causation, creative destruction, crowdsourcing, dark matter, data acquisition, disintermediation, disruptive innovation, don't be evil, Edward Snowden, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Firefox, global village, Google Glasses, Google X / Alphabet X, Ignaz Semmelweis: hand washing, information asymmetry, interchangeable parts, Internet of things, Isaac Newton, job automation, Julian Assange, Kevin Kelly, license plate recognition, lifelogging, Lyft, Mark Zuckerberg, Marshall McLuhan, meta analysis, meta-analysis, microbiome, Nate Silver, natural language processing, Network effects, Nicholas Carr, obamacare, pattern recognition, personalized medicine, phenotype, placebo effect, RAND corporation, randomized controlled trial, Second Machine Age, self-driving car, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, Snapchat, social graph, speech recognition, stealth mode startup, Steve Jobs, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Uber for X, uber lyft, Watson beat the top human players on Jeopardy!, WikiLeaks, X Prize
Indeed, the state of California, which has the largest prenatal screening program in the world, with more than four hundred thousand expectant mothers assessed annually, already provides these tests to all pregnant women who have increased risk.26 Of course, we could also sequence the fetus’s entire genome instead of just doing the simpler screens. While that is not a commercially available test, and there are substantial bioinformatic challenges that lie ahead before it could be scalable, the anticipatory bioethical issues that this engenders are considerable.27 We are a long way off for determining what would constitute acceptable genomic criteria for early termination of pregnancy, since this not only relies on accurately determining a key genomic variant linked to a serious illness, but also understanding whether this condition would actually manifest.
Now it is possible to use sequencing to unravel the molecular diagnosis of an unknown condition, and the chances for success are enhanced when there is DNA from the mother and father, or other relatives, to use for anchoring and comparative sequencing analysis. At several centers around the country, the success rate for making the diagnosis ranges between 25 percent and 50 percent. It requires considerable genome bioinformatic expertise, for a trio of individuals will generate around 750 billion data points (six billion letters per sequence, three people, each done forty times to assure accuracy). Of course, just making the diagnosis is not the same as coming up with an effective treatment or a cure. But there have been some striking anecdotal examples of children whose lives were saved or had dramatic improvement.
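The figure quoted for a trio can be checked with a few lines of arithmetic. The variable names below are illustrative only; the inputs are the ones stated in the text, and their product is 720 billion, which is consistent with the rounded "around 750 billion".

```python
# Inputs as stated in the excerpt; names are illustrative, not from the book.
letters_per_sequence = 6_000_000_000  # six billion letters per sequence
people = 3                            # the trio: child, mother, father
coverage = 40                         # each sequence read forty times

total = letters_per_sequence * people * coverage
print(f"{total:,} data points")       # prints "720,000,000,000 data points"
```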
The most far-reaching component of the molecular stethoscope appears to be cell-free RNA, which can potentially be used to monitor any organ of the body.82 Previously that was unthinkable in a healthy person. How could one possibly conceive of doing a brain or liver biopsy in someone as part of a normal checkup? Using high-throughput sequencing of cell-free RNA in the blood, and sophisticated bioinformatic methods to analyze this data, Stephen Quake and his colleagues at Stanford were able to show it is possible to follow the gene expression from each of the body’s organs from a simple blood sample. And that is changing all the time in each of us. This is an ideal case for deep learning to determine what these dynamic genomic signatures mean, to determine what can be done to change the natural history of a disease in the making, and to develop the path for prevention.
The Architecture of Open Source Applications by Amy Brown, Greg Wilson
8-hour work day, anti-pattern, bioinformatics, c2.com, cloud computing, collaborative editing, combinatorial explosion, computer vision, continuous integration, create, read, update, delete, David Heinemeier Hansson, Debian, domain-specific language, Donald Knuth, en.wikipedia.org, fault tolerance, finite state, Firefox, friendly fire, Guido van Rossum, linked data, load shedding, locality of reference, loose coupling, Mars Rover, MITM: man-in-the-middle, MVC pattern, peer-to-peer, Perl 6, premature optimization, recommendation engine, revision control, Ruby on Rails, side project, Skype, slashdot, social web, speech recognition, the scientific method, The Wisdom of Crowds, web application, WebSocket
Amy Brown (editorial): Amy has a bachelor's degree in Mathematics from the University of Waterloo, and worked in the software industry for ten years. She now writes and edits books, sometimes about software. She lives in Toronto and has two children and a very old cat. C. Titus Brown (Continuous Integration): Titus has worked in evolutionary modeling, physical meteorology, developmental biology, genomics, and bioinformatics. He is now an Assistant Professor at Michigan State University, where he has expanded his interests into several new areas, including reproducibility and maintainability of scientific software. He is also a member of the Python Software Foundation, and blogs at http://ivory.idyll.org. Roy Bryant (Snowflock): In 20 years as a software architect and CTO, Roy designed systems including Electronics Workbench (now National Instruments' Multisim) and the Linkwalker Data Pipeline, which won Microsoft's worldwide Winning Customer Award for High-Performance Computing in 2006.
He has since contributed to almost all areas of Asterisk development, from project management to core architectural design and development. He blogs at http://www.russellbryant.net. Rosangela Canino-Koning (Continuous Integration): After 13 years of slogging in the software industry trenches, Rosangela returned to university to pursue a Ph.D. in Computer Science and Evolutionary Biology at Michigan State University. In her copious spare time, she likes to read, hike, travel, and hack on open source bioinformatics software. She blogs at http://www.voidptr.net. Francesco Cesarini (Riak): Francesco Cesarini has used Erlang on a daily basis since 1995, having worked in various turnkey projects at Ericsson, including the OTP R1 release. He is the founder of Erlang Solutions and co-author of O'Reilly's Erlang Programming. He currently works as Technical Director at Erlang Solutions, but still finds the time to teach graduates and undergraduates alike at Oxford University in the UK and the IT University of Gothenburg in Sweden.
After graduate studies in distributed systems at Carnegie-Mellon University, he worked on compilers (Tartan Labs), printing and imaging systems (Adobe Systems), electronic commerce (Adobe Systems, Impresse), and storage area network management (SanNavigator, McDATA). Returning to distributed systems and HDFS, Rob found many familiar problems, but all of the numbers had two or three more zeros. James Crook (Audacity): James is a contract software developer based in Dublin, Ireland. Currently he is working on tools for electronics design, though in a previous life he developed bioinformatics software. He has many audacious plans for Audacity, and he hopes some, at least, will see the light of day. Chris Davis (Graphite): Chris is a software consultant and Google engineer who has been designing and building scalable monitoring and automation tools for over 12 years. Chris originally wrote Graphite in 2006 and has led the open source project ever since. When he's not writing code he enjoys cooking, making music, and doing research.
Reinventing Discovery: The New Era of Networked Science by Michael Nielsen
Albert Einstein, augmented reality, barriers to entry, bioinformatics, Cass Sunstein, Climategate, Climatic Research Unit, conceptual framework, dark matter, discovery of DNA, Donald Knuth, double helix, Douglas Engelbart, Douglas Engelbart, en.wikipedia.org, Erik Brynjolfsson, fault tolerance, Fellow of the Royal Society, Firefox, Freestyle chess, Galaxy Zoo, Internet Archive, invisible hand, Jane Jacobs, Jaron Lanier, Johannes Kepler, Kevin Kelly, Magellanic Cloud, means of production, medical residency, Nicholas Carr, P = NP, publish or perish, Richard Feynman, Richard Stallman, selection bias, semantic web, Silicon Valley, Silicon Valley startup, Simon Singh, Skype, slashdot, social intelligence, social web, statistical model, Stephen Hawking, Stewart Brand, Ted Nelson, The Death and Life of Great American Cities, The Nature of the Firm, The Wisdom of Crowds, University of East Anglia, Vannevar Bush, Vernor Vinge
p 106: Mapping the brain is far too large a subject for me to give a comprehensive list of references. An overview of work on the Allen Brain Atlas may be found in Jonah Lehrer’s excellent article . Most of the facts I relate are from that article. The paper announcing the atlas of gene expression in the mouse brain is . Overviews of some of the progress and challenges in mapping the human connectome may be found in  and . p 108: Bioinformatics and cheminformatics are now well-established fields, with a significant literature, and I won’t attempt to single out any particular reference for special mention. Astroinformatics has emerged more recently. See especially  for a manifesto on the need for astroinformatics. p 113: A report on the 2005 Playchess.com freestyle chess tournament may be found at , with follow-up commentary on the winners at .
See architecture of attention; restructuring expert attention augmented reality, 41, 87 autism-vaccine controversy, 156 Avatar (film), 34 Axelrod, Robert, 219 Baker, David, 146 basic research: economic scale of, 203 secrecy in, 87, 184–86 Bayh-Dole Act, 184–85 Benkler, Yochai, 218, 224 Bennett, John Caister, 149 Berges, Aida, 155 Bermuda Agreement, 7, 108, 190, 192, 222 Berners-Lee, Tim, 218 bioinformatics, 108 biology: data-driven intelligence in, 116–19 data web for, 121–22 open source, 48. See also genetics birdwatchers, 150 black holes, orbiting pair of, 96, 100–101, 103, 112, 114 Blair, Tony, 7, 156 Block, Peter, 218 blogs: architecture of attention and, 42, 56 as basis of Polymath Project, 1–2, 42 invention of, 20 in quantum computing, 187 rumors on, 201–2 scientific, 6, 165–69, 203–4 Borgman, Christine, 218 Boroson, Todd, 100–101, 103, 114 Borucki, William, 201 botany, 107 Brahe, Tycho, 104 brain atlases, 106, 108 British Chiropractic Association, 165–66 Brown, Zacary, 23–24, 27, 35, 41, 223 Burkina Faso, open architecture project in, 46–48 Bush, Vannevar, 217, 218 business: data-driven intelligence for, 112 data sharing methods in, 120.
See also amplifying collective intelligence Colwell, Robert, 218 combinatorial line, 211 comet hunters, 148–49 comment sites: successful examples of, 234 user-contributed, 179–81 commercialization of science, 87, 184–86 Company of Strangers, The (Seabright), 37 comparative advantage: architecture of attention and, 32, 33, 43, 56 examples from the sciences, 82, 83, 84, 85 for InnoCentive Challenges, 24, 43 modularity and, 56 technical meaning of, 223 competition: data sharing and, 103–4 as obstacle to collaboration, 86 in protein structure prediction, 147–48 for scientific jobs, 8, 9, 178, 186 Complexity Zoo, 233 computer code: in bioinformatics, 108 centralized development of new tools, 236 citation of, 196, 204–5 for complex experiments, 203 information commons in, 57–59 sharing, 87, 183, 193, 204–5. See also Firefox; Linux; MathWorks competition; open source software computer games: addictive quality of, 146, 147 for folding proteins (see Foldit) connectome, human, 106, 121 conversation, offline small-group, 39–43 conversational critical mass, 30, 31, 33, 42 Cornell University Laboratory of Ornithology, 150 Cox, Alan, 57 Creative Commons, 219, 220 creative problem solving, 24, 30, 34, 35, 36, 38.
The New Harvest: Agricultural Innovation in Africa by Calestous Juma
agricultural Revolution, Albert Einstein, barriers to entry, bioinformatics, business climate, carbon footprint, clean water, colonial rule, conceptual framework, creative destruction, double helix, energy security, energy transition, global value chain, income per capita, industrial cluster, informal economy, Intergovernmental Panel on Climate Change (IPCC), Joseph Schumpeter, knowledge economy, land tenure, M-Pesa, microcredit, mobile money, non-tariff barriers, off grid, out of africa, precision agriculture, Second Machine Age, self-driving car, Silicon Valley, sovereign wealth fund, structural adjustment programs, supply-chain management, total factor productivity, undersea cable
New machines can now sequence a human genome for just $1,000. Dozens of genomes of agricultural, medical, and environmental importance to Africa have already been sequenced. These include rice, corn, mosquito, chicken, cattle, and dozens of plant, animal, and human pathogens. The challenge facing Africa is building capacity in bioinformatics to understand the location and functions of genes. It is through the annotation of genomes that scientists can understand the role of genes and their potential contributions to agriculture, medicine, environmental management, and other fields. Bioinformatics could do for Africa what computer software did for India. The field would also give African science a new purpose and help to integrate the region into the global knowledge ecology. This opportunity offers Africa another opportunity for technological leapfrogging.
See African Union Australia, 63–64, 67, 131 Awuah, Patrick, 241 Babban Gona agricultural franchise (Nigeria), 214–16 “Back Home” projects (Uganda Rural Development and Training Program), 153–54 bananas: diseases affecting, 70–71; EARTH University production of, 171; “Golden Banana” variety and, 72–73; transgenic varieties of, 66, 70–73 Bangladesh, 71–72, 75, 202 Bangladesh Agricultural Research Institute, 71–72 banks: agricultural sector financing and, 5–6, 93–94, 100–101, 107, 143, 185; clusters and, 107; educational partnerships and, 176; infrastructure and, 143; state-owned, 107; technology and, 49, 52 Banque Régionale de Solidarité (BRS), 100–101 beans: entrepreneurship and, 164; infrastructure and, 120, 122; innovation and, 92–93 Benin: educational videos on agriculture in, 202–3; gender inequality in, 149; rice cluster in, 99–102; solar-powered irrigation in, 129 Bhoomi Project, 52 biodiversity, 73, 77–78, 255–56, 259 bioinformatics, 82 biopolymers, 39, 56–58 biosafety, 79–80, 82 biotechnology: African Panel on Modern Biotechnology and, 251; benefits of, 68–76; biodiversity, 73, 77; debates regarding safety of, 76–80, 82; food security and, 64; frontiers of, 61–63; genomes and, xxi, 23, 62, 81–82; GM crops and, xxi, 62, 249; incomes and, 68, 79; innovation and, xviii, 23, 41, 63–70, 190, 239, 242–43, 251; land-saving aspects of, 74; “leapfrogging” and, 64–65, 68; regulation and, xxi, 61, 63, 72, 76–81; research and, 87, 111, 190; transgenic crops and, 62–81; trends in, 63–67 Black Sigatoka fungus, 71 Blue Skies Agro-processing Company, Ltd., 197 Boston (Massachusetts), 243 Brazil: Agricultural Research Corporation in, 30, 113–14; drought-resistant crops in, 74; entrepreneurship and education in, 165–66; flash drying in, 90; fruit exports from, 197; infrastructure in, 114; innovation and, 113–14; National System for Agriculture Research and Innovation (SNPA) in, 114; technology and, 242–44 Brazilian Agricultural Research Corporation (EMBRAPA), 30, 113–14, 243
Brazilian Development Cooperation Agency, 245 breadfruit, 211–13 Breadfruit Institute, 213–14 brinjal crops, 71–72 BRS (Banque Régionale de Solidarité), 100–101 BSS-Société Industrielle pour la Production du Riz (BSS-SIPRi), 100–101 Burkina Faso: aquaculture in, 24; CAADP and, 27–28; cereal cultivation in, 36; service sector in, 22; transgenic crops in, 65, 71 Burundi, 174, 205 businesses.
Advances in Financial Machine Learning by Marcos Lopez de Prado
algorithmic trading, Amazon Web Services, asset allocation, backtesting, bioinformatics, Brownian motion, business process, Claude Shannon: information theory, cloud computing, complexity theory, correlation coefficient, correlation does not imply causation, diversification, diversified portfolio, en.wikipedia.org, fixed income, Flash crash, G4S, implied volatility, information asymmetry, latency arbitrage, margin call, market fragmentation, market microstructure, martingale, NP-complete, P = NP, p-value, paper trading, pattern recognition, performance metric, profit maximization, quantitative trading / quantitative finance, RAND corporation, random walk, risk-adjusted returns, risk/return, selection bias, Sharpe ratio, short selling, Silicon Valley, smart cities, smart meter, statistical arbitrage, statistical model, stochastic process, survivorship bias, transaction costs, traveling salesman
Blackwell, pp. 256–278. Louppe, G., L. Wehenkel, A. Sutera, and P. Geurts (2013): “Understanding variable importances in forests of randomized trees.” Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 431–439. Strobl, C., A. Boulesteix, A. Zeileis, and T. Hothorn (2007): “Bias in random forest variable importance measures: Illustrations, sources and a solution.” BMC Bioinformatics, Vol. 8, No. 25, pp. 1–11. White, A. and W. Liu (1994): “Technical note: Bias in information-based measures in decision tree induction.” Machine Learning, Vol. 15, No. 3, pp. 321–329. Note 1 http://blog.datadive.net/selecting-good-features-part-iii-random-forests/. CHAPTER 9 Hyper-Parameter Tuning with Cross-Validation 9.1 Motivation Hyper-parameter tuning is an essential step in fitting an ML algorithm.
Beyond the basic library for organizing user data into files, the HDF Group also provides a suite of tools and specializations of HDF5 for different applications. For example, HDF5 includes a performance profiling tool. NASA has a specialization of HDF5, named HDF5-EOS, for data from their Earth-Observing System (EOS); and the next-generation DNA sequence community has produced a specialization named BioHDF for their bioinformatics data. HDF5 provides an efficient way of accessing the storage systems on HPC platforms. In tests, we have demonstrated that using HDF5 to store stock market data significantly speeds up the analysis operations. This is largely due to its efficient compression/decompression algorithms that minimize network traffic and I/O operations, which brings us to our next point. 22.5.3 In Situ Processing Over the last few decades, CPU performance has roughly doubled every 18 months (Moore's law), while disk performance has been increasing less than 5% a year.
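To make the gap between the two growth rates concrete, the short sketch below compounds them over a fixed period. It assumes an exact doubling every 18 months for CPUs and treats 5% a year as an upper bound for disks; the ten-year horizon and the function names are choices made for this illustration, not figures from the text.

```python
def growth_from_doubling(years, doubling_period_years=1.5):
    """Growth factor after `years` if performance doubles every period."""
    return 2 ** (years / doubling_period_years)


def growth_from_annual_rate(years, annual_rate=0.05):
    """Growth factor after `years` of compounding at `annual_rate` per year."""
    return (1 + annual_rate) ** years


years = 10  # illustrative horizon, not taken from the text
cpu = growth_from_doubling(years)
disk = growth_from_annual_rate(years)
print(f"CPU: about {cpu:.0f}x; disk: at most about {disk:.2f}x")
```

Under these assumptions the CPU figure grows roughly a hundredfold in a decade while the disk figure does not even double, which is the imbalance that motivates processing data in situ rather than moving it through storage.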
One of the motivations of the CIFT project is to seek a way to transfer the above tools to the computing environments of the future. 22.6 Use Cases Data processing is such an important part of modern scientific research that some researchers are calling it the fourth paradigm of science (Hey, Tansley, and Tolle). In economics, the same data-driven research activities have led to the wildly popular behavioral economics (Camerer and Loewenstein). Many of the recent advances in data-driven research are based on machine learning applications (Qiu et al., Rudin and Wagstaff). Their successes in a wide variety of fields, such as planetary science and bioinformatics, have generated considerable interest among researchers from diverse domains. In the rest of this section, we describe a few examples applying advanced data analysis techniques to various fields, where many of these use cases originated in the CIFT project. 22.6.1 Supernova Hunting In astronomy, the determination of many important facts, such as the expansion speed of the universe, is performed by measuring the light from exploding type Ia supernovae (Bloom et al.).
Physics of the Future: How Science Will Shape Human Destiny and Our Daily Lives by the Year 2100 by Michio Kaku
agricultural Revolution, AI winter, Albert Einstein, Asilomar, augmented reality, Bill Joy: nanobots, bioinformatics, blue-collar work, British Empire, Brownian motion, cloud computing, Colonization of Mars, DARPA: Urban Challenge, delayed gratification, double helix, Douglas Hofstadter, en.wikipedia.org, friendly AI, Gödel, Escher, Bach, hydrogen economy, I think there is a world market for maybe five computers, industrial robot, Intergovernmental Panel on Climate Change (IPCC), invention of movable type, invention of the telescope, Isaac Newton, John Markoff, John von Neumann, life extension, Louis Pasteur, Mahatma Gandhi, Mars Rover, mass immigration, megacity, Mitch Kapor, Murray Gell-Mann, new economy, oil shale / tar sands, optical character recognition, pattern recognition, planetary scale, postindustrial economy, Ray Kurzweil, refrigerator car, Richard Feynman, Rodney Brooks, Ronald Reagan, Search for Extraterrestrial Intelligence, Silicon Valley, Simon Singh, social intelligence, speech recognition, stem cell, Stephen Hawking, Steve Jobs, telepresence, The Wealth of Nations by Adam Smith, Thomas L Friedman, Thomas Malthus, trade route, Turing machine, uranium enrichment, Vernor Vinge, Wall-E, Walter Mischel, Whole Earth Review, X Prize
I imagine in the near future, many people will have the same strange feeling I did, holding the blueprint of their bodies in their hands and reading the intimate secrets, including dangerous diseases, lurking in the genome and the ancient migration patterns of their ancestors. But for scientists, this is opening an entirely new branch of science, called bioinformatics, or using computers to rapidly scan and analyze the genome of thousands of organisms. For example, by inserting the genomes of several hundred individuals suffering from a certain disease into a computer, one might be able to calculate the precise location of the damaged DNA. In fact, some of the world’s most powerful computers are involved in bioinformatics, analyzing millions of genes found in plants and animals for certain key genes. This could even revolutionize TV detective shows like CSI. Given tiny scraps of DNA (found in hair follicles, saliva, or bloodstains), one might be able to determine not just the person’s hair color, eye color, ethnicity, height, and medical history, but perhaps also his face.
See Robotics/AI Artificial vision Artsutanov, Yuri ASIMO robot, 2.1, 2.2, 2.3 Asimov, Isaac, 2.1, 6.1, 8.1 ASPM gene Asteroid landing Atala, Anthony Atomic force microscope Augmented reality Augustine Commission report, 6.1, 6.2 Avatar (movie), 1.1, 2.1, 6.1, 7.1 Avatars Backscatter X-rays Back to the Future movies, 5.1, 5.2 Badylak, Stephen Baldwin, David E. Baltimore, David, 1.1, 3.1, 3.2, 3.3 Benford, Gregory Big bang research Binnig, Gerd Bioinformatics Biotechnology. See Medicine/biotechnology Birbaumer, Niels Birth control Bismarck, Otto von Blade Runner (movie) Blue Gene computer Blümich, Bernhard, 1.1, 1.2 Boeing Corporation Booster-rocket technologies Bova, Ben, 5.1, 5.2 Boys from Brazil, The (movie) Brain artificial body parts, adaptation to basic structure of emotions and growing a human brain Internet contact lenses and locating every neuron in as neural network parallel processing in reverse engineering of simulations of “Brain drain” to the United States BrainGate device Brain injuries, treatment for Branson, Richard Brave New World (Huxley) Breast cancer Breazeal, Cynthia Brenner, Sydney Brooks, Rodney, 2.1, 2.2, 4.1 Brown, Dan Brown, Lester Buckley, William F.
See also Intellectual capitalism Carbon nanotubes, 4.1, 6.1 Carbon sequestration Cars driverless electric maglev, 5.1, 9.1 Cascio, Jamais Catoms Cave Man Principle biotechnology and computer animations and predicting the future and replicators and, 4.1, 4.2 robotics/AI and, 2.1, 2.2 sports and Cerf, Vint, 4.1, 6.1 Chalmers, David Charles, Prince of Wales Chemotherapy Chernobyl nuclear accident Chevy Volt Chinese Empire, 7.1, 7.2 Church, George Churchill, Winston, itr.1, 8.1 Cipriani, Christian Civilizations alien civilizations characteristics of various Types entropy and information processing and resistance to Type I civilization rise and fall of great empires rise of civilization on Earth science and wisdom, importance of transition from Type 0 to Type I, itr.1, 8.1, 8.2 Type II civilizations, 8.1, 8.2, 8.3 Type III civilizations, 8.1, 8.2 waste heat and Clarke, Arthur C. Clausewitz, Carl von Cloning, 3.1, 3.2 Cloud computing, 1.1, 7.1 Cochlear implants Code breaking Collins, Francis Comets Common sense, 2.1, 2.2, 2.3, 7.1, 7.2 Computers animations created by augmented reality bioinformatics brain simulations carbon nanotubes and cloud computing, 1.1, 7.1 digital divide DNA computers driverless cars exponential growth of computer power (Moore’s law), 1.1, 1.2, 1.3, 4.1 fairy tale life and far future (2070) four stages of technology and Internet glasses and contact lenses, 1.1, 1.2 medicine and midcentury (2030) mind control of molecular and atomic transistors nanotechnology and near future (present to 2030) optical computers parallel processing physics of computer revolution quantum computers quantum dot computers quantum theory and, 1.1, 4.1, 4.2, 4.3 scrap computers self-assembly and silicon chips, limitations of, 1.1, 1.2, 4.1 telekinesis with 3-D technology universal translators virtual reality wall screens See also Mind reading; Robotics/AI Condorcet, Marquis de Conscious robots, 2.1, 2.2 Constellation Program COROT satellite, 6.1, 8.1 Crick, Francis 
Criminology Crutzen, Paul Culture in Type I civilization Customization of products Cybertourism, itr.1, itr.2 CYC project Damasio, Antonio Dating in 2100, 9.1, 9.2, 9.3, 9.4 Davies, Stephen Da Vinci robotic system Dawkins, Richard, 3.1, 3.2, 3.3 Dawn computer Dean, Thomas Decoherence problem Deep Blue computer, 2.1, 2.2, 2.3 Delayed gratification DEMO fusion reactor Depression treatments Designer children, 3.1, 3.2, 3.3 Developing nations, 7.1, 7.2 Diamandis, Peter Dictatorships Digital divide Dinosaur resurrection Disease, elimination of, 3.1, 8.1 DNA chips DNA computers Dog breeds Donoghue, John, 1.1, 1.2 Dreams, photographing of Drexler, Eric Driverless cars Duell, Charles H.
Fifty Degrees Below by Kim Stanley Robinson
airport security, bioinformatics, Burning Man, clean water, Donner party, full employment, Intergovernmental Panel on Climate Change (IPCC), invisible hand, iterative process, means of production, minimum wage unemployment, North Sea oil, Ralph Waldo Emerson, Richard Feynman, statistical model, Stephen Hawking, the scientific method
He wanted to talk to everyone implicated in this: Yann Pierzinski—meaning Marta too, which would be hard, terrible in fact, but Marta had moved to Atlanta with Yann and they lived together there, so there would be no avoiding her. And then Francesca Taolini, who had arranged for Yann’s hire by a company she consulted for, in the same way Frank had hoped to. Did she suspect that Frank had been after Yann? Did she know how powerful Yann’s algorithm might be? He googled her. Turned out, among many interesting things, that she was helping to chair a conference at MIT coming soon, on bioinformatics and the environment. Just the kind of event Frank might attend. NSF even had a group going already, he saw, to talk about the new federal institutes. Meet with her first, then go to Atlanta to meet with Yann—would that make his stock in the virtual market rise, triggering more intense surveillance? An unpleasant thought; he grimaced. He couldn’t evade most of this surveillance. He had to continue to behave as if it wasn’t happening.
What the hell was that, after all? And how would you measure it? So at work Anna spent her time trying to concentrate, over a persistent underlying turmoil of worry about her younger son. Work was absorbing, as always, and there was more to do than there was time to do it in, as always. And so it provided its partial refuge. But it was harder to dive in, harder to stay under the surface in the deep sea of bioinformatics. Even the content of the work reminded her, on some subliminal level, that health was a state of dynamic balance almost inconceivably complex, a matter of juggling a thousand balls while unicycling on a tightrope over the abyss—in a gale—at night—such that any life was an astonishing miracle, brief and tenuous. But enough of that kind of thinking! Bear down on the fact, on the moment and the problem of the moment!
Take a problem, break it down into parts (analyze), quantify whatever parts you could, see if what you learned suggested anything about causes and effects; then see if this suggested anything about long-term plans, and tangible things to do. She did not believe in revolution of any kind, and only trusted the mass application of the scientific method to get any real-world results. “One step at a time,” she would say to her team in bioinformatics, or Nick’s math group at school, or the National Science Board; and she hoped that as long as chaos did not erupt worldwide, one step at a time would eventually get them to some tolerable state. Of course there were all the hysterical operatics of “history” to distract people from this method and its incremental successes. The wars and politicians, the police state regimes and terrorist insurgencies, the gross injustices and cruelties, the unnecessarily ongoing plagues and famines—in short, all the mass violence and rank intimidation that characterized most of what filled the history books; all that was real enough, indeed all too real, undeniable—and yet it was not the whole story.
Food Allergy: Adverse Reactions to Foods and Food Additives by Dean D. Metcalfe
active measures, Albert Einstein, bioinformatics, epigenetics, hygiene hypothesis, impulse control, life extension, longitudinal study, meta analysis, meta-analysis, mouse model, pattern recognition, phenotype, placebo effect, randomized controlled trial, selection bias, statistical model, stem cell, twin studies
J Allergy Clin Immunol 2000;106:228–38. 73 Thomas K, Bannon G, Hefle S, et al. In silico methods for evaluating human allergenicity to novel proteins. Bioinformatics Workshop Meeting Report, February 23–24, 2005. Toxicol Sci 2005;88:307–10. 74 Ladics GS, Bannon GA, Silvanovich A, Cressman, RF. Comparison of conventional FASTA identity searches with the 80 amino acid sliding window FASTA search for the elucidation of potential identities to known allergens. Mol Nutr Food Res 2007;51:985–998. 75 Bannon G, Ogawa T. Evaluation of available IgE-binding epitope data and its utility in bioinformatics. Mol Nutr Food Res 2006;50:638–44. 76 Hileman RE, Silvanovich A, Goodman RE, et al. Bioinformatic methods for allergenicity assessment using a comprehensive allergen database. Int Archives Allergy Immunol 2002;128:280–91. 77 Silvanovich A, Nemeth MA, Song P, et al.
The most important food allergen families will be discussed in this chapter. Food allergen protein families Based on their shared amino acid sequences and conserved three-dimensional structures, proteins can be classified into families using various bioinformatics tools which form the basis of several protein family databases, one of which is Pfam. Over the past 10 years or so there has been an explosion in the numbers of well characterized allergens, which have been sequenced and are being collected into a number of databases to facilitate bioinformatic analysis. We have undertaken this analysis for both plant and animal food allergens along with pollen allergens. They show similar distributions with the majority of allergens in each group falling into just 3–12 families with a tail of between 14 and 23 families comprising between 1 and 3 allergens each.
For example, the Codex Alimentarius (www.codexalimentarius.net/web/index_en.jsp) recommended a percentage identity score of at least 35% matched amino acid residues over a window of at least 80 residues as being the lowest identity criterion for proteins derived from biotechnology that could suggest IgE cross-reactivity with a known allergen. However, Aalberse has noted that proteins sharing less than 50% identity across the full length of the protein sequence are unlikely to be cross-reactive, and immunological cross-reactivity may not occur unless the proteins share at least 70% identity. Recent published work has led to the harmonization of the methods used for bioinformatic searches and a better understanding of the data generated [73,74] from such studies. An additional bioinformatics approach can be taken by searching for 100% identity matches along short sequences contained in the query sequence as they are compared to sequences in a database. These regions of short amino acid sequence homologies are intended to represent the smallest sequence that could function as an IgE-binding epitope. If any exact matches between a known allergen and a transgenic sequence were found using this strategy, it could represent the most conservative approach to predicting potential for a peptide fragment to act as an allergen.
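The short exact-match search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the cited studies: the window length of 8 residues, the function names `kmers` and `exact_window_matches`, and the toy sequences are all hypothetical, since the excerpt does not give a window length or any data.

```python
def kmers(sequence, k):
    """Return the set of all contiguous length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def exact_window_matches(query, allergens, k=8):
    """Map each allergen name to the sorted k-residue windows it shares with the query.

    k=8 is an assumed window length; the source text does not state one.
    """
    query_windows = kmers(query, k)
    hits = {}
    for name, sequence in allergens.items():
        shared = query_windows & kmers(sequence, k)
        if shared:
            hits[name] = sorted(shared)
    return hits


# Toy, made-up sequences (not real allergens).
toy_allergens = {
    "toy_allergen_A": "MKTLLVAGGHWQERTY",
    "toy_allergen_B": "PPPPPPPPPPPP",
}
toy_query = "AAAGGHWQERTYCCC"
print(exact_window_matches(toy_query, toy_allergens))
```

The 35%-identity-over-80-residues criterion is a different kind of check: it relies on a sequence alignment tool such as FASTA and is not reproduced by this exact-match sketch.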
The Wealth of Networks: How Social Production Transforms Markets and Freedom by Yochai Benkler
affirmative action, barriers to entry, bioinformatics, Brownian motion, call centre, Cass Sunstein, centre right, clean water, commoditize, dark matter, desegregation, East Village, fear of failure, Firefox, game design, George Gilder, hiring and firing, Howard Rheingold, informal economy, information asymmetry, invention of radio, Isaac Newton, iterative process, Jean Tirole, jimmy wales, John Markoff, Kenneth Arrow, longitudinal study, market bubble, market clearing, Marshall McLuhan, Mitch Kapor, New Journalism, optical character recognition, pattern recognition, peer-to-peer, pre–internet, price discrimination, profit maximization, profit motive, random walk, recommendation engine, regulatory arbitrage, rent-seeking, RFID, Richard Stallman, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, shareholder value, Silicon Valley, Skype, slashdot, social software, software patent, spectrum auction, technoutopianism, The Fortune at the Bottom of the Pyramid, The Nature of the Firm, transaction costs, Vilfredo Pareto
As more of the process of drug discovery of potential leads can be done by modeling and computational analysis, more can be organized for peer production. The relevant model here is open bioinformatics. Bioinformatics generally is the practice of pursuing solutions to biological questions using mathematics and information technology. Open bioinformatics is a movement within bioinformatics aimed at developing the tools in an open-source model, and in providing access to the tools and the outputs on a free and open basis. Projects like these include the Ensembl Genome Browser, operated by the European Bioinformatics Institute and the Sanger Centre, or the National Center for Biotechnology Information (NCBI), both of which use computer databases to provide access to data and to run various searches on combinations, patterns, and so forth, in the data.
Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again by Eric Topol
23andMe, Affordable Care Act / Obamacare, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, artificial general intelligence, augmented reality, autonomous vehicles, bioinformatics, blockchain, cloud computing, cognitive bias, Colonization of Mars, computer age, computer vision, conceptual framework, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, dark matter, David Brooks, digital twin, Elon Musk, en.wikipedia.org, epigenetics, Erik Brynjolfsson, fault tolerance, George Santayana, Google Glasses, ImageNet competition, Jeff Bezos, job automation, job satisfaction, Joi Ito, Mark Zuckerberg, medical residency, meta analysis, meta-analysis, microbiome, natural language processing, new economy, Nicholas Carr, nudge unit, pattern recognition, performance metric, personalized medicine, phenotype, placebo effect, randomized controlled trial, recommendation engine, Rubik’s Cube, Sam Altman, self-driving car, Silicon Valley, speech recognition, Stephen Hawking, text mining, the scientific method, Tim Cook: Apple, War on Poverty, Watson beat the top human players on Jeopardy!, working-age population
In many leading medical schools throughout the country, there’s an “arms race” for Adam 1s and academic achievement, as Jonathan Stock at Yale University School of Medicine aptly points out.61 We need to be nurturing the Adam 2s, which is something that is all too often an area of neglect in medical education. There are many other critical elements that need to be part of the medical school curriculum. Future doctors need a far better understanding of data science, including bioinformatics, biocomputing, probabilistic thinking, and the guts of deep learning neural networks. Much of their efforts in patient care will be supported by algorithms, and they need to understand all the liabilities, to recognize bias, errors, false output, and dissociation from common sense. Likewise, the importance of putting the patient’s values and preferences first in any human-machine collaboration cannot be emphasized enough.
., “Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women with Breast Cancer.” JAMA, 2017. 318(22): pp. 2199–2210. 52. Golden, J. A., “Deep Learning Algorithms for Detection of Lymph Node Metastases from Breast Cancer: Helping Artificial Intelligence Be Seen.” JAMA, 2017. 318(22): pp. 2184–2186. 53. Yang, S. J., et al., “Assessing Microscope Image Focus Quality with Deep Learning.” BMC Bioinformatics, 2018. 19(1): p. 77. 54. Wang et al., Deep Learning for Identifying Metastatic Breast Cancer. 55. Wong, D., and S. Yip, “Machine Learning Classifies Cancer.” Nature, 2018. 555(7697): pp. 446–447; Capper, D., et al., “DNA Methylation-Based Classification of Central Nervous System Tumours.” Nature, 2018. 555(7697): pp. 469–474. 56. Coudray, N., et al., “Classification and Mutation Prediction from Non–Small Cell Lung Cancer Histopathology Images Using Deep Learning.”
Nat Biotechnol, 2018. 36(9): pp. 820–828. 70. Ota, S., et al., “Ghost Cytometry.” Science, 2018. 360(6394): pp. 1246–1251. 71. Nitta, N., et al., “Intelligent Image-Activated Cell Sorting.” Cell, 2018. 175(1): pp. 266–276 e13. 72. Weigert, M., et al., Content-Aware Image Restoration: Pushing the Limits of Fluorescence Microscopy, bioRxiv. 2017; Yang, S. J., et al., “Assessing Microscope Image Focus Quality with Deep Learning.” BMC Bioinformatics, 2018. 19(1): p. 77. 73. Ouyang, W., et al., “Deep Learning Massively Accelerates Super-Resolution Localization Microscopy.” Nat Biotechnol, 2018. 36(5): pp. 460–468. 74. Stumpe, M., “An Augmented Reality Microscope for Realtime Automated Detection of Cancer,” Google AI Blog. 2018. 75. Wise, J., “These Robots Are Learning to Conduct Their Own Science Experiments,” Bloomberg. 2018. 76. Bohannon, J., “A New Breed of Scientist, with Brains of Silicon,” Science Magazine. 2017. 77.
The Data Journalism Handbook by Jonathan Gray, Lucy Chambers, Liliana Bounegru
Amazon Web Services, barriers to entry, bioinformatics, business intelligence, carbon footprint, citizen journalism, correlation does not imply causation, crowdsourcing, David Heinemeier Hansson, eurozone crisis, Firefox, Florence Nightingale: pie chart, game design, Google Earth, Hans Rosling, information asymmetry, Internet Archive, John Snow's cholera map, Julian Assange, linked data, moral hazard, MVC pattern, New Journalism, openstreetmap, Ronald Reagan, Ruby on Rails, Silicon Valley, social graph, SPARQL, text mining, web application, WikiLeaks
The Vaalirahoitus.fi website will provide the public and the press with information on campaign funding for every election from now on. Figure 3-12. Election financing (Helsingin Sanomat) 2. Brainstorm for ideas The participants of HS Open 2 came up with twenty different prototypes about what to do with the data. You can find all the prototypes on our website (text in Finnish). A bioinformatics researcher called Janne Peltola noted that campaign funding data looked like the gene data they research, in terms of containing many interdependencies. In bioinformatics there is an open source tool called Cytoscape that is used to map these interdependencies. So we ran the data through Cytoscape, and got a very interesting prototype. 3. Implement the idea on paper and on the Web The law on campaign funding states that elected members of parliament must declare their funding two months after the elections.
Exploring Everyday Things with R and Ruby by Sau Sheong Chang
Alfred Russel Wallace, bioinformatics, business process, butterfly effect, cloud computing, Craig Reynolds: boids flock, Debian, Edward Lorenz: Chaos theory, Gini coefficient, income inequality, invisible hand, p-value, price stability, Ruby on Rails, Skype, statistical model, stem cell, Stephen Hawking, text mining, The Wealth of Nations by Adam Smith, We are the 99%, web application, wikimedia commons
The largest is CRAN (Comprehensive R Archive Network; http://cran.r-project.org). CRAN is hosted by the R Foundation (the same organization that is developing R) and contains 3,646 packages as of this writing. CRAN is also mirrored in many sites worldwide. Another public repository is Bioconductor (http://www.bioconductor.org), an open source project that provides tools for bioinformatics and is primarily R-based. While the packages in Bioconductor are focused on bioinformatics, it doesn’t mean that they can’t be used for other domains. As of this writing, there are 516 packages in Bioconductor. Finally, there is R-Forge (http://r-forge.r-project.org), a collaborative software development application for R. It is based on FusionForge, a fork from GForge (on which RubyForge was based), which in turn was forked from the original software that was used to build SourceForge.
Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data by Leslie Sikos
AGPL, Amazon Web Services, bioinformatics, business process, cloud computing, create, read, update, delete, Debian, en.wikipedia.org, fault tolerance, Firefox, Google Chrome, Google Earth, information retrieval, Infrastructure as a Service, Internet of things, linked data, natural language processing, openstreetmap, optical character recognition, platform as a service, search engine result page, semantic web, Silicon Valley, social graph, software as a service, SPARQL, text mining, Watson beat the top human players on Jeopardy!, web application, wikimedia commons
TopQuadrant (2015) TopBraid Composer Standard Edition. www.topquadrant.com/tools/modeling-topbraid-composer-standard-edition/. Accessed 31 March 2015. 12. TopQuadrant (2015) TopBraid Composer Maestro Edition. www.topquadrant.com/tools/ide-topbraid-composer-maestro-edition/. Accessed 31 March 2015. 13. The Apache Software Foundation (2015) Apache Stanbol. http://stanbol.apache.org. Accessed 31 March 2015. 14. Fluent Editor. www.cognitum.eu/semantics/FluentEditor/. Accessed 15 April 2015. 15. The European Bioinformatics Institute (2015) ZOOMA. www.ebi.ac.uk/fgpt/zooma/. Accessed 31 March 2015. 16. Harispe, S. (2014) Semantic Measures Library & ToolKit. www.semantic-measures-library.org. Accessed 29 March 2015. 17. Motik, B., Shearer, R., Glimm, B., Stoilos, G., Horrocks, I. (2013) HermiT OWL Reasoner. http://hermit-reasoner.com. Accessed 31 March 2015. 18. Clark & Parsia (2015) Pellet: OWL 2 Reasoner for Java. http://clarkparsia.com/pellet/.
The Toolkit features an AML text editor and a visual editor, an AML validator, and provides mapping and testing view for AML. Semantic Automated Discovery and Integration (SADI) Semantic Automated Discovery and Integration (SADI) is a lightweight set of Semantic Web Service design patterns (https://code.google.com/p/sadi/). It was primarily designed for scientific service publication and is especially useful in bioinformatics. Powered by web standards, SADI implements Semantic Web technologies to consume and produce RDF instances of OWL-DL classes, where input and output class URIs resolve to an OWL document through HTTP GET. SADI supports RDF/XML and Notation3 serializations. The SADI design patterns provide automatic discovery of appropriate services, based on user needs, and can automatically chain these services into complex analytical workflows.
Exponential Organizations: Why New Organizations Are Ten Times Better, Faster, and Cheaper Than Yours (And What to Do About It) by Salim Ismail, Yuri van Geest
23andMe, 3D printing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, Ben Horowitz, bioinformatics, bitcoin, Black Swan, blockchain, Burning Man, business intelligence, business process, call centre, chief data officer, Chris Wanstrath, Clayton Christensen, clean water, cloud computing, cognitive bias, collaborative consumption, collaborative economy, commoditize, corporate social responsibility, cross-subsidies, crowdsourcing, cryptocurrency, dark matter, Dean Kamen, dematerialisation, discounted cash flows, disruptive innovation, distributed ledger, Edward Snowden, Elon Musk, en.wikipedia.org, Ethereum, ethereum blockchain, game design, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, hiring and firing, Hyperloop, industrial robot, Innovator's Dilemma, intangible asset, Internet of things, Iridium satellite, Isaac Newton, Jeff Bezos, Joi Ito, Kevin Kelly, Kickstarter, knowledge worker, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, lifelogging, loose coupling, loss aversion, low earth orbit, Lyft, Marc Andreessen, Mark Zuckerberg, market design, means of production, minimum viable product, natural language processing, Netflix Prize, NetJets, Network effects, new economy, Oculus Rift, offshore financial centre, PageRank, pattern recognition, Paul Graham, paypal mafia, peer-to-peer, peer-to-peer model, Peter H. Diamandis: Planetary Resources, Peter Thiel, prediction markets, profit motive, publish or perish, Ray Kurzweil, recommendation engine, RFID, ride hailing / ride sharing, risk tolerance, Ronald Coase, Second Machine Age, self-driving car, sharing economy, Silicon Valley, skunkworks, Skype, smart contracts, Snapchat, social software, software is eating the world, speech recognition, stealth mode startup, Stephen Hawking, Steve Jobs, subscription business, supply-chain management, TaskRabbit, telepresence, telepresence robot, Tony Hsieh, transaction costs, Travis Kalanick, Tyler Cowen: Great Stagnation, uber lyft, urban planning, WikiLeaks, winner-take-all economy, X Prize, Y Combinator, zero-sum game
Once any domain, discipline, technology or industry becomes information-enabled and powered by information flows, its price/performance begins doubling approximately annually. Third, once that doubling pattern starts, it doesn’t stop. We use current computers to design faster computers, which then build faster computers, and so on. Finally, several key technologies today are now information-enabled and following the same trajectory. Those technologies include artificial intelligence (AI), robotics, biotech and bioinformatics, medicine, neuroscience, data science, 3D printing, nanotechnology and even aspects of energy. Never in human history have we seen so many technologies moving at such a pace. And now that we are information-enabling everything around us, the effects of Kurzweil’s Law of Accelerating Returns are sure to be profound. What’s more, as these technologies intersect (e.g., using deep-learning AI algorithms to analyze cancer trials), the pace of innovation accelerates even further.
Of the 155 teams competing, three were awarded a total of $100,000 in prize money. What was particularly interesting was the fact that none of the winners had prior experience with natural language processing (NLP). Nonetheless, they beat the experts, many of them with decades of experience in NLP under their belts. This can’t help but impact the current status quo. Raymond McCauley, Biotechnology & Bioinformatics Chair at Singularity University, has noticed that “When people want a biotech job in Silicon Valley, they hide their PhDs to avoid being seen as a narrow specialist.” So, if experts are suspect, where should we turn instead? As we’ve already noted, everything is measurable. And the newest profession making those measurements is the data scientist. Andrew McAfee calls this new breed of data experts “geeks.”
Life's Greatest Secret: The Race to Crack the Genetic Code by Matthew Cobb
a long time ago in a galaxy far, far away, anti-communist, Asilomar, Asilomar Conference on Recombinant DNA, Benoit Mandelbrot, Berlin Wall, bioinformatics, Claude Shannon: information theory, conceptual framework, Copley Medal, dark matter, discovery of DNA, double helix, Drosophila, epigenetics, factory automation, From Mathematics to the Technologies of Life and Death, James Watt: steam engine, John von Neumann, Kickstarter, New Journalism, Norbert Wiener, phenotype, post-materialism, Stephen Hawking
One of the main tasks when a genome has been completed is to annotate it, identifying genes and their exons and introns, and above all finding genes that have equivalents in other organisms, preferably with some kind of known function. Often the only basis for identifying the function of a gene is because its DNA sequence is similar to a gene in a different organism where a function has been demonstrated. This has led to a new discipline called genomics, which involves obtaining genomes and understanding their nature and evolution. It includes a new set of techniques, collectively called bioinformatics, which combine computing and population genetics to make inferences about the patterns of evolution and enable us to determine which genes have a common origin or function. Training biologists in the techniques of computer science will be an important part of twenty-first-century scientific education. One of the most far-reaching scientific consequences of sequencing came with the work of Carl Woese, who realised in the 1960s that he could use the RNA found in ribosomes (rRNA), which is common to every organism on the planet, to study patterns of evolution.
., ‘Prematurity and uniqueness in scientific discovery’, Scientific American, vol. 227 (12), 1972, pp. 84–93. Stergachis, A. B., Haugen, E., Shafer, A. et al., ‘Exonic transcription factor binding directs codon choice and affects protein evolution’, Science, vol. 342, 2013, pp. 1367–72. Stern, K. G., ‘Nucleoproteins and gene structure’, Yale Journal of Biology and Medicine, vol. 19, 1947, pp. 937–49. Stevens, H., Life Out of Sequence: A Data-Driven History of Bioinformatics, London, University of Chicago Press, 2013. Strasser, B. J., ‘A world in one dimension: Linus Pauling, Francis Crick and the Central Dogma of molecular biology’, History and Philosophy of the Life Sciences, vol. 28, 2006, pp. 491–512. Stretton, A. O. W., ‘The first sequence: Fred Sanger and insulin’, Genetics, vol. 162, 2002, pp. 527–32. Sturtevant, A. H., A History of Genetics, London, Harper, 1965.
awards 38, 50 Francis Crick on 132, 136, 216 health 38 on nucleic acids as the transforming principle 43–53 reactions to his ideas 55–9, 62–4, 68–70 transformation in pneumococci 34–41 Avery, Roy (brother of Oswald) 44–5, 59, 63 B Bacillus thuringiensis 270 bacteria based on ‘synthetic’ DNA 267 capsule formation and virulence 36–7 DNA sequences online 235 enzymatic adaptation 152 generality of transformation in 59 negative feedback in biosynthesis by 153–5 sexual reproduction 51 transformation in E. coli 51–2, 56, 61, 63 transformation in pneumococci 36–9, 63 bacteriophages see phages Bakewell, Robert 1–2 Baltimore, David 251–2 Bar-Hillel, Yehoshuua 144 Barnett, Leslie 193 base pairing complementary base pairing 106, 109 frequency in different genomes 295 κ and Π base pairs 278 spontaneous 102 unnatural base pairs 277–8, 285 Z and P base pairs 278 base sequence as the genetic code 111 relation to amino acid sequence 117, 124–6, 133 variability 54, 62, 70 bases, DNA hydrogen bonding between 58, 92, 101, 106 ratio of pyrimidines to purines 42, 91, 102, 106, 109 sequence variation and specificity 57–8 bases, nucleic acid defined 316 investigations of DNA and RNA 198 orientation 42 proportions within and between species 62, 90 tetranucleotide hypothesis 7, 42, 51, 54, 62, 90 see also purines; pyrimidines Bateson, Gregory 22 Baulcombe, David 259 Beadle, George at Chemical Basis of Heredity symposium 132 comments on Benzer’s work 162 Nobel Prize 215 one-gene-one-enzyme hypothesis 9–11, 204, 243–4 at the Washington Physics conference 33 behaviour, genetic effects 304–5 Beighton, Elwyn 102 Beljanski, Mirko 189–90 Bell, Florence 91, 93, 104 Benner, Steven 277–8 Benzer, Seymour 161–3, 165, 187n, 203, 215, 302 Berg, Paul 279, 281, 285 Bergmann, Max 46 β-galactosidase 152–3, 156, 158, 160, 165 ‘Big Science’ 311–12 Bigelow, Julian 22–4, 27 Biochemical and Biophysical Research Communications 180 bioinformatics 238 The Biological Replication of Macromolecules symposium 130 
‘Biological units endowed with genetic continuity’ meeting 53, 59–60 biosecurity 280–1, 285 biotechnology DNA fingerprinting as 231 fermentation as 268 genetically modified organisms 269–71, 284 regulation of 284–5 synthetic biology 277 Birney, Ewan 242, 247, 271 bits (binary digits) 27, 78 Blair, Tony 233 ‘blender experiments’ 68 Boivin, André on DNA leading to RNA 71, 140, 214 Mirsky and 56–7, 59 transformation in E. coli 51–2, 56 on varying DNA quantities 60–1 Botstein, David 231 Boveri, Theodor 3 Brachet, Jean 58, 71–2, 116 Bragg, Sir Laurence 94–5, 100, 105, 108 BRCA1 gene 234 Brenner, Sydney adaptor hypothesis 121, 135, 209 on cell-free systems 182 on the coding problem 172 coinage of ‘codon’ 203 collaboration with Crick 121, 125, 165–6, 189, 192–3 developmental biology interest 216 disproves overlapping code idea 123–4, 200 messenger RNA idea 165–7, 172, 178, 182, 190 Nobel Prize 215 nonsense codons 213 on using polynucleotides 189 work with viruses 174, 192, 200, 213 Bridges, Calvin 4 Brillouin, Léon 76, 202 Britten, Roy 243 Brookhaven Laboratory 174 BSE (bovine spongiform encephalopathy) 253–4 Burnet, Macfarlane Enzyme, Antigen and Virus: A Study of Macromolecular Pattern in Action 134–5, 139, 141, 146–7 on information flows 139–41, 146–7 meeting with Avery 34–5 on non-coding DNA 141, 222 Bush, Vannevar 20–1, 26 C ‘C-value paradox’ 246 caddis-fly 175 Caenorhabditis elegans 231–2, 258, 277 Cairns, John 218 Caldwell, P.
Algorithms Unlocked by Thomas H. Cormen
A clique in an undirected graph G is a subset S of vertices such that the graph has an edge between every pair of vertices in S. The size of a clique is the number of vertices it contains. As you might imagine, cliques play a role in social network theory. Modeling each individual as a vertex and relationships between individuals as undirected edges, a clique represents a group of individuals all of whom have relationships with each other. Cliques also have applications in bioinformatics, engineering, and chemistry. The clique problem takes two inputs, a graph G and a positive integer k, and asks whether G has a clique of size k. For example, the graph on the next page has a clique of size 4, shown with heavily shaded vertices, and no other clique of size 4 or greater. Verifying a certificate is easy. The certificate is the k vertices claimed to form a clique, and we just have to check that each of the k vertices has an edge to the other k − 1.
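The certificate check described above can be sketched as follows. This is a minimal illustration, not code from the book: the graph representation (a list of vertex pairs), the function names, and the small example graph are assumptions made for the sketch.

```python
from itertools import combinations


def is_clique(edges, candidate):
    """Check that every pair of vertices in `candidate` is joined by an edge."""
    # Store each undirected edge as a frozenset so (u, v) and (v, u) match.
    edge_set = {frozenset(edge) for edge in edges}
    return all(frozenset(pair) in edge_set for pair in combinations(candidate, 2))


def verify_clique_certificate(edges, certificate, k):
    """Accept the certificate only if it names k distinct vertices forming a clique."""
    vertices = set(certificate)
    return len(certificate) == k and len(vertices) == k and is_clique(edges, vertices)


# Hypothetical example graph: vertices 1-4 are mutually connected, 5 hangs off 4.
example_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
print(verify_clique_certificate(example_edges, [1, 2, 3, 4], 4))
print(verify_clique_certificate(example_edges, [1, 2, 3, 5], 4))
```

The check looks at each of the k(k − 1)/2 vertex pairs once, which is why verification is cheap even though finding a clique of size k may not be.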
Vertex cover A vertex cover in an undirected graph G is a subset S of the vertices such that every edge in G is incident on at least one vertex in S. We say that each vertex in S “covers” its incident edges. The size of a vertex cover is the number of vertices it contains. As in the clique problem, the vertex-cover problem takes as input an undirected graph G and a positive integer m. It asks whether G has a vertex cover of size m. Like the clique problem, the vertex-cover problem has applications in bioinformatics. In another application, you have a building with hallways and cameras that can scan up to 360 degrees located at the intersections of hallways, and you want to know whether m cameras will allow you to see all the hallways. Here, edges model hallways and vertices model intersections. In yet another application, finding vertex covers helps in designing strategies to foil worm attacks on computer networks.
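A certificate for the vertex-cover problem can be checked in the same spirit as one for the clique problem. The sketch below uses the hallway-and-camera framing from the text; the function names, the edge-list representation, and the four-intersection floor plan are hypothetical choices made for illustration.

```python
def is_vertex_cover(edges, candidate):
    """Check that every edge has at least one endpoint in `candidate`."""
    cover = set(candidate)
    return all(u in cover or v in cover for u, v in edges)


def verify_vertex_cover_certificate(edges, certificate, m):
    """Accept the certificate only if it names m distinct vertices covering all edges."""
    return len(set(certificate)) == m and is_vertex_cover(edges, certificate)


# Hypothetical floor plan: intersections A-D (vertices), hallways between them (edges).
hallways = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]
print(verify_vertex_cover_certificate(hallways, ["B", "C"], 2))
print(verify_vertex_cover_certificate(hallways, ["A", "D"], 2))
```

Here cameras at intersections B and C see every hallway, while cameras at A and D miss the hallway between B and C.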
The Simulation Hypothesis by Rizwan Virk
3D printing, Albert Einstein, Apple II, artificial general intelligence, augmented reality, Benoit Mandelbrot, bioinformatics, butterfly effect, discovery of DNA, Dmitri Mendeleev, Elon Musk, en.wikipedia.org, Ernest Rutherford, game design, Google Glasses, Isaac Newton, John von Neumann, Kickstarter, mandelbrot fractal, Marc Andreessen, Minecraft, natural language processing, Pierre-Simon Laplace, Ralph Waldo Emerson, Ray Kurzweil, Richard Feynman, Schrödinger's Cat, Search for Extraterrestrial Intelligence, Silicon Valley, Stephen Hawking, Steve Jobs, Steve Wozniak, technological singularity, Turing test, Vernor Vinge, Zeno's paradox
Video games wouldn’t be possible without computer graphics, and it is the development of this relatively new field of science that has brought the simulation hypothesis out of science fiction and into serious consideration. Within computer science, video games and entertainment have played a unique role in driving the development of both hardware and software. Examples include the development of GPUs (graphics processing units) for optimized rendering, CGI (computer-generated imagery), and CAD (computer-aided design), as well as artificial intelligence and bioinformatics. The most recent incarnation of fully immersive entertainment technology is virtual reality (VR). Despite wondering about the simulation hypothesis for many years, it wasn’t until VR and AI reached their current level of sophistication that I could see a clear path to how we might develop all-encompassing simulations like the one depicted in The Matrix, which led me to write this book. In 2016, I had a chance to play a VR table tennis game, which I played using VR goggles and motion controllers.
The biological world, it turns out, is also based on information, though, of course, of a different type—one which creates cells based on instructions in DNA through various biological processes. Within computer science and AI, biological processes have shown that they can be utilized to get much smarter and more unique results—most of today’s machine learning is based on the conditioning of neural networks, which are based on biological algorithms. While there is still some way to go, the burgeoning field of bioinformatics and modeling of biological processes has made information and computation an integral part of the organic world! Most importantly, the physical world, which was thought of in classical physics as a set of physical objects moving in continuous paths around the heavens, has been updated. As quantum physics reveals that there is no such thing as a physical object, that most objects consist of empty space and electrons, we start to get into metaphysical questions about what is real in the world.
Running Money by Andy Kessler
Andy Kessler, Apple II, bioinformatics, Bob Noyce, British Empire, business intelligence, buy and hold, buy low sell high, call centre, Corn Laws, Douglas Engelbart, family office, full employment, George Gilder, happiness index / gross national happiness, interest rate swap, invisible hand, James Hargreaves, James Watt: steam engine, joint-stock company, joint-stock limited liability company, knowledge worker, Leonard Kleinrock, Long Term Capital Management, mail merge, Marc Andreessen, margin call, market bubble, Maui Hawaii, Menlo Park, Metcalfe’s law, Mitch Kapor, Network effects, packet switching, pattern recognition, pets.com, railway mania, risk tolerance, Robert Metcalfe, Sand Hill Road, Silicon Valley, South China Sea, spinning jenny, Steve Jobs, Steve Wozniak, Toyota Production System, zero-sum game
They analyzed central banks and politicians and figured out the direction of currencies. In an era of relatively stable currencies, the modern-day investor has to dig, early and often and everywhere. I’d still rather dig than get whacked by a runaway yen-carry trade. Another cycle is coming. The drivers of it are still unclear. Likely suspects are things like wireless data, on-command computing, nanotechnology, bioinformatics, genomic sorting; who the hell knows what it will be. But this is what I do. Looking for the next barrier, the next piece of technology, the next waterfall and the next great, long-term investment. Sounds quaint. I’ve come a long way from tripping across Homer Simpson dolls trying to raise money in Hong Kong. Or getting sweated on by desperate Koreans. Or driving around all day with Fred. Or getting thrown out of deals.
See AOL Andreessen, Marc, 197, 199 animation, 134–35 AOL (America Online), 69–73, 207, 208, 223, 290 Cisco routers and, 199 Inktomi cache software and, 143 Netscape Navigator purchase, 201, 225 Telesave deal, 72–73 TimeWarner deal, 223, 229 as top market cap company, 111 Apache Web server, 247 Apple Computer, 45, 127, 128 Apple II, 183 Applied Materials, 245 Archimedes (propeller ship), 94 Arkwright, Richard, 65 ARPANET, 186, 187, 189, 191 Arthur Andersen, 290 Artists and Repertoire (A&R), 212, 216 Asian debt crisis, 3, 150, 151, 229, 260 yen and, 162–65, 168, 292 @ (at sign), 187 AT&T, 61, 185–86, 189 August Capital, 2, 4 auto industry, 267–68 Aziz, Tariq, 26 Babbage, Charles, 93 Baker, James, 26 Balkanski, Alex, 44, 249 bandwidth, 60, 111, 121, 140, 180, 188–89 Baran, Paul, 184, 185 Barbados, 251, 254 Barksdale, Jim, 198, 199–201 Barksdale Group, 201 BASE, 249 BASIC computer language, 126, 127 BBN. See Bolt, Beranek and Newman Bechtolsheim, Andy, 191 Bedard, Kipp, 19–20 Bell, Dave, 127 Bell Labs, 103, 110 Berry, Hank, 205–6 Beyond.com, 208 Bezos, Jeff, 228 Biggs, Barton, 163 big-time trends. See waterfalls bioinformatics, 296 biotech industry, 237 Black, Joseph, 54 Blutcher (steam locomotive), 92 Boggs, David, 189, 190 Bolt, Beranek and Newman, 184, 187 bonds, 11, 30–31, 164 Bonsal, Frank, 144–49 Borislow, Daniel, 72–73 Bosack, Len, 191 Boulton, Matthew, 55–58, 65, 66, 89 Boulton & Watt Company, 56–58, 64, 65, 89, 246, 247, 272 Bowman, Larry, 291–92 Bowman Capital, 291 Brady bonds, 164 Britain, 42, 50–59, 258 industrial economy, 42, 64–68, 91–95, 272 patent law, 55 textile manufacture, 64–68 wealth creation, 257, 271–72 broadband, 164, 225 browsers, 196–201 Brunel, I.
The Deep Learning Revolution (The MIT Press) by Terrence J. Sejnowski
AI winter, Albert Einstein, algorithmic trading, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, bioinformatics, cellular automata, Claude Shannon: information theory, cloud computing, complexity theory, computer vision, conceptual framework, constrained optimization, Conway's Game of Life, correlation does not imply causation, crowdsourcing, Danny Hillis, delayed gratification, discovery of DNA, Donald Trump, Douglas Engelbart, Drosophila, Elon Musk, en.wikipedia.org, epigenetics, Flynn Effect, Frank Gehry, future of work, Google Glasses, Google X / Alphabet X, Guggenheim Bilbao, Gödel, Escher, Bach, haute couture, Henri Poincaré, I think there is a world market for maybe five computers, industrial robot, informal economy, Internet of things, Isaac Newton, John Conway, John Markoff, John von Neumann, Mark Zuckerberg, Minecraft, natural language processing, Netflix Prize, Norbert Wiener, orbital mechanics / astrodynamics, PageRank, pattern recognition, prediction markets, randomized controlled trial, recommendation engine, Renaissance Technologies, Rodney Brooks, self-driving car, Silicon Valley, Silicon Valley startup, Socratic dialogue, speech recognition, statistical model, Stephen Hawking, theory of mind, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, traveling salesman, Turing machine, Von Neumann architecture, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra
Instead of a string of letters, the input was a string of amino acids, and instead of predicting phonemes, the network predicted the secondary structure. The training set was 3D structures determined by x-ray crystallography. To our surprise, the secondary structure predictions for new proteins were far better than the best methods based on biophysics. This landmark study was the first application of machine learning to molecular sequences, a field that is now called bioinformatics. Another network that learned how to form the past tense of English verbs became a cause célèbre in the world of cognitive psychology as the rule-based old guard battled it out with the avant-garde PDP Group. The regular way to form the past tense of an English verb is to add the suffix “ed,” as in forming “trained” from “train.” But there are irregular exceptions, such as “ran” from “run.”
., 82 infomax ICA algorithm, 81, 83, 83f, 84, 86 neural nets and, 82, 90, 296n15 photograph, 83f on water structure, 296n15 writings, 79, 85f, 295n2, 295n4, 295n6, 296n9, 306n24 Bellman, Richard, 145, 304n4 Bellman equation. See Dynamic programming, algorithm for Benasich, April A., 184, 308n22 Bengio, Yoshua, 135, 139f, 141, 141f, 302nn4–5, 303n20, 304n25, 304n28 Berg, Howard C., 319n12 Berger, Hans, 86 Berra, Yogi, x Berry, Halle, 235, 236f, 237 Bi, Guoqiang Q., 216f Big data, 10, 164, 229 Bioinformatics, 116 Biophysics, 116. See also under Johns Hopkins University “Biophysics of Computation, The” (course), 104 Birds consulting with each other, 29f Birdsong, 155–156, 157f Bishop, Christopher M., 279 Black boxes the case against, 253–255 neural network as a black box, 123 Blakeslee, Sandra, 316n14 Blandford, Roger, 312n1 Blind source separation problem, 81, 82f, 83f Blocks World, 27 Boahen, Kwabena A., 313n14 Boltzmann, Ludwig, 99 Boltzmann learning, unsupervised, 106 Boltzmann machine backpropagation of errors contrasted with, 112 Charles Rosenberg on, 112 criticisms of, 106 diagram, 98b at equilibrium, 99 Geoffrey Hinton and, 49, 79, 104, 105f, 106, 110, 112, 127 hidden units, 98b, 101, 102, 104, 106, 109 learning mirror symmetries, 102, 104 limitations, 107 multilayer, 49, 104, 105f, 106, 109 for handwritten digit recognition and generation, 104, 105f, 106 overview, 97, 98b, 99, 101, 135 perceptron contrasted with, 99, 101, 102, 106, 109 restricted, 106 separating figure from ground with, 97, 100f supervised and unsupervised versions, 106 Boltzmann machine learning algorithm, 99, 101, 109, 133, 158 goal of, 99 history in neuroscience, 101 “wake” and “sleep” phases, 98b, 101–102 Boole, George, 54, 55f Boolean logic, 54 Border-ownerships cells, 99 Botvinick, Matthew, 317n15 Brain.
Solr 1.4 Enterprise Search Server by David Smiley, Eric Pugh
Amazon Web Services, bioinformatics, cloud computing, continuous integration, database schema, domain-specific language, en.wikipedia.org, fault tolerance, Firefox, information retrieval, Ruby on Rails, web application, Y Combinator
., which received angel funding from the Y Combinator fund, and he relocated to San Francisco. WebMynd is one of the largest installations of Solr, indexing up to two million HTML documents per day, and making heavy use of Solr's multicore features to enable a partially active index. Jerome Eteve holds a BSc in physics, maths and computing and an MSc in IT and bioinformatics from the University of Lille (France). After starting his career in the field of bioinformatics, where he worked as a biological data management and analysis consultant, he's now a senior web developer with interests ranging from database-level issues to user experience online. He's passionate about open source technologies, search engines, and web application architecture. He has worked since 2006 for Careerjet Ltd, a worldwide job search engine.
The Elements of Statistical Learning (Springer Series in Statistics) by Trevor Hastie, Robert Tibshirani, Jerome Friedman
Bayesian statistics, bioinformatics, computer age, conceptual framework, correlation coefficient, G4S, greed is good, linear programming, p-value, pattern recognition, random walk, selection bias, speech recognition, statistical model, stochastic process, The Wisdom of Crowds
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry.
In the early days, these problems often came from agricultural and industrial experiments and were relatively small in scope. With the advent of computers and the information age, statistical problems have exploded both in size and complexity. Challenges in the areas of data storage, organization and searching have led to the new field of “data mining”; statistical and computational problems in biology and medicine have created “bioinformatics.” Vast amounts of data are being generated in many fields, and the statistician’s job is to make sense of it all: to extract important patterns and trends, and understand “what the data says.” We call this learning from data. The challenges in learning from data have led to a revolution in the statistical sciences. Since computation plays such a key role, it is not surprising that much of this new development has been done by researchers in other fields such as computer science and engineering.
Third International Conference on Document Analysis and Recognition, Vol. 1, IEEE Computer Society Press, New York, pp. 278–282. Hoefling, H. and Tibshirani, R. (2008). Estimation of sparse Markov networks using modified logistic regression and the lasso, submitted. Hoerl, A. E. and Kennard, R. (1970). Ridge regression: biased estimation for nonorthogonal problems, Technometrics 12: 55–67. Hothorn, T. and Bühlmann, P. (2006). Model-based boosting in high dimensions, Bioinformatics 22(22): 2828–2829. Huber, P. (1964). Robust estimation of a location parameter, Annals of Mathematical Statistics 53: 73–101. Huber, P. (1985). Projection pursuit, Annals of Statistics 13: 435–475. Hunter, D. and Lange, K. (2004). A tutorial on MM algorithms, The American Statistician 58(1): 30–37. Hyvärinen, A. and Oja, E. (2000). Independent component analysis: algorithms and applications, Neural Networks 13: 411–430.
Information: A Very Short Introduction by Luciano Floridi
agricultural Revolution, Albert Einstein, bioinformatics, carbon footprint, Claude Shannon: information theory, conceptual framework, double helix, Douglas Engelbart, George Akerlof, Gordon Gekko, industrial robot, information asymmetry, intangible asset, Internet of things, invention of writing, John Nash: game theory, John von Neumann, Laplace demon, moral hazard, Nash equilibrium, Nelson Mandela, Norbert Wiener, Pareto efficiency, phenotype, Pierre-Simon Laplace, prisoner's dilemma, RAND corporation, RFID, Thomas Bayes, Turing machine, Vilfredo Pareto
Consider the following examples: medical information is information about medical facts (attributive use), not information that has curative properties; digital information is not information about something digital, but information that is in itself of digital nature (predicative use); and military information can be both information about something military (attributive) and of military nature in itself (predicative). When talking about biological or genetic information, the attributive sense is common and uncontroversial. In bioinformatics, for example, a database may contain medical records and genealogical or genetic data about a whole population. Nobody disagrees about the existence of this kind of biological or genetic information. It is the predicative sense that is more contentious. Are biological or genetic processes or elements intrinsically informational in themselves? If biological or genetic phenomena count as informational predicatively, is this just a matter of modelling, that is, may be seen as being informational?
Scikit-Learn Cookbook by Trent Hauck
In this recipe, we'll fit a regression model with 10,000 features, but only 1,000 points. We'll walk through the various univariate feature selection methods:
>>> from sklearn import datasets
>>> X, y = datasets.make_regression(1000, 10000)
Now that we have the data, we will compare the features that are included with the various methods. This is actually a very common situation when you're dealing with text analysis or some areas of bioinformatics.
How to do it... First, we need to import the feature_selection module:
>>> from sklearn import feature_selection
>>> f, p = feature_selection.f_regression(X, y)
Here, f is the F score associated with each linear model fit with just one of the features, and p is the p-value associated with that F score. We can then compare these scores and, based on this comparison, cull features.
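As a rough illustration of what a univariate F score measures, the sketch below recomputes it in plain Python rather than calling scikit-learn. It assumes the score for each feature is the F statistic of a one-feature linear fit, F = r^2 / (1 - r^2) * (n - 2), where r is the Pearson correlation between that feature and the target. The helper name univariate_f_scores, the toy data sizes, and the choice of feature 3 as the only informative column are all hypothetical, chosen just for this example.

```python
import math
import random


def univariate_f_scores(X, y):
    """Return one F score per column of X (a list of rows) against target y.

    Hypothetical helper: each score is the F statistic of a linear fit
    that uses only that single feature.
    """
    n = len(y)
    y_mean = sum(y) / n
    y_dev = [v - y_mean for v in y]
    y_ss = sum(d * d for d in y_dev)

    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        c_dev = [v - c_mean for v in col]
        c_ss = sum(d * d for d in c_dev)
        # Pearson correlation between feature j and the target
        r = sum(a * b for a, b in zip(c_dev, y_dev)) / math.sqrt(c_ss * y_ss)
        # F statistic with 1 and n - 2 degrees of freedom
        scores.append(r * r / (1.0 - r * r) * (n - 2))
    return scores


random.seed(0)
n_samples, n_features = 50, 200  # assumed toy sizes: more features than points
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
# Assumed toy target: depends on feature 3 only, plus a little noise
y = [2.0 * row[3] + random.gauss(0, 0.5) for row in X]

f = univariate_f_scores(X, y)
best = max(range(n_features), key=lambda j: f[j])
print("highest-scoring feature:", best)
```

Ranking features by these scores and keeping only the top few is one simple way to cull. On toy data like this, the informative column should stand well above the noise columns.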
The Invisible Web: Uncovering Information Sources Search Engines Can't See by Gary Price, Chris Sherman, Danny Sullivan
AltaVista, American Society of Civil Engineers: Report Card, bioinformatics, Brewster Kahle, business intelligence, dark matter, Donald Davies, Douglas Engelbart, full text search, HyperCard, hypertext link, information retrieval, Internet Archive, joint-stock company, knowledge worker, natural language processing, pre–internet, profit motive, publish or perish, search engine result page, side project, Silicon Valley, speech recognition, stealth mode startup, Ted Nelson, Vannevar Bush, web application
FishBase is a relational database with fish information to cater to different professionals such as research scientists, fisheries managers, zoologists, and many more. FishBase on the Web contains practically all fish species known to science.” Search Form URL: http://www.fishbase.org/search.cfm GeneCards http://bioinformatics.weizmann.ac.il “GeneCards is a database of human genes, their products, and their involvement in diseases. It offers concise information about the functions of all human genes that have an approved symbol, as well as selected others [gene listing].” Search Form URL: http://bioinformatics.weizmann.ac.il/cards/ Integrated Taxonomic Information System (Biological Names) http://www.itis.usda.gov/plantproj/itis/index.html “The Integrated Taxonomic Information System (ITIS) is a partnership of U.S., Canadian, and Mexican agencies, other organizations, and taxonomic specialists cooperating on the development of an online, scientifically credible, list of biological names focusing on the biota of North America.”
Gnuplot Cookbook by Lee Phillips
Phillips is now the Chief Scientist of the Alogus Research Corporation, which conducts research in the physical sciences and provides technology assessment for investors. I am grateful to the users of my gnuplot web pages for their interest, questions, and suggestions over the years, and to my family for their patience and support. About the Reviewers Andreas Bernauer is a Software Engineer at Active Group in Germany. He graduated from Eberhard Karls Universität Tübingen, Germany, with a Degree in Bioinformatics and received a Master of Science degree in Genetics from the University of Connecticut, USA. In 2011, he earned a doctorate in Computer Engineering from Eberhard Karls Universität Tübingen. Andreas has more than 10 years of professional experience in software engineering. He implemented the server-side scripting engine in the Scheme-based SUnet web server and hosted the Learning-Classifier-System workshops in Tübingen.
Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang
AI winter, artificial general intelligence, bioinformatics, brain emulation, combinatorial explosion, complexity theory, computer vision, conceptual framework, correlation coefficient, epigenetics, friendly AI, G4S, information retrieval, Isaac Newton, John Conway, Loebner Prize, Menlo Park, natural language processing, Occam's razor, p-value, pattern recognition, performance metric, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K
We do not buy the argument that “Since X plays an important role in intelligence, studying X contributes to the study of intelligence in general”, where X can be replaced by reasoning, learning, planning, perceiving, acting, etc. On the contrary, we believe that most of the current AI research works make little direct contribution to AGI, though these works have value for many other reasons. Previously we have mentioned “machine learning” as an example. One of us (Goertzel) has published extensively about applications of machine learning algorithms to bioinformatics. This is a valid, and highly important sort of research – but it doesn’t have much to do with achieving general intelligence. There is no reason to believe that “intelligence” is simply a toolbox, containing mostly unconnected tools. Since the current AI “tools” have been built according to very different theoretical considerations, to implement them as modules in a big system will not necessarily make them work together, correctly and efficiently.
Unlike most contemporary AI projects, it is specifically oriented towards artificial general intelligence (AGI), rather than being restricted by design to one narrow domain or range of cognitive functions. The NAIE integrates aspects of prior AI projects and approaches, including symbolic, neural-network, evolutionary programming and reinforcement learning. The existing codebase is being applied in bioinformatics, NLP and other domains. To save space, some of the discussion in this paper will assume a basic familiarity with NAIE structures such as Atoms, Nodes, Links, ImplicationLinks and so forth, all of which are described in previous references and in other papers in this volume.
1.2. Cognitive Development in Simulated Androids
Jean Piaget, in his classic studies of developmental psychology, conceived of child development as falling into four stages, each roughly identified with an age group: infantile, preoperational, concrete operational, and formal.
The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil
additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business cycle, business intelligence, c2.com, call centre, carbon-based life, cellular automata, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, coronavirus, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, disintermediation, double helix, Douglas Hofstadter, en.wikipedia.org, epigenetics, factory automation, friendly AI, George Gilder, Gödel, Escher, Bach, informal economy, information retrieval, invention of the telephone, invention of the telescope, invention of writing, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Marshall McLuhan, Mikhail Gorbachev, Mitch Kapor, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Norbert Wiener, oil shale / tar sands, optical character recognition, pattern recognition, phenotype, premature optimization, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Robert Metcalfe, Rodney Brooks, scientific worldview, Search for Extraterrestrial Intelligence, selection bias, semantic web, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Y2K, Yogi Berra
Kurzweil Technologies is working with UT to develop pattern recognition-based analysis from either "Holter" monitoring (twenty-four-hour recordings) or "Event" monitoring (thirty days or more). 190. Kristen Philipkoski, "A Map That Maps Gene Functions," Wired News, May 28, 2002, http://www.wired.com/news/medtech/0,1286,52723,00.html. 191. Jennifer Ouellette, "Bioinformatics Moves into the Mainstream," The Industrial Physicist (October–November 2003), http://www.sciencemasters.com/bioinformatics.pdf. 192. Port, Arndt, and Carey, "Smart Tools." 193. "Protein Patterns in Blood May Predict Prostate Cancer Diagnosis," National Cancer Institute, October 15, 2002, http://www.nci.nih.gov/newscenter/ProstateProteomics, reporting on Emanuel F. Petricoin et al., "Serum Proteomic Patterns for Detection of Prostate Cancer," Journal of the National Cancer Institute 94 (2002): 1576–78. 194.
DARPA's Information Processing Technology Office's project in this vein is called LifeLog, http://www.darpa.mil/ipto/Programs/lifelog; see also Noah Shachtman, "A Spy Machine of DARPA's Dreams," Wired News, May 20, 2003, http://www.wired.com/news/business/0,1367,58909,00.html; Gordon Bell's project (for Microsoft) is MyLifeBits, http://research.microsoft.com/research/barc/MediaPresence/MyLifeBits.aspx; for the Long Now Foundation, see http://longnow.org. 44. Bergeron is assistant professor of anesthesiology at Harvard Medical School and the author of such books as Bioinformatics Computing, Biotech Industry: A Global, Economic, and Financing Overview, and The Wireless Web and Healthcare. 45. The Long Now Foundation is developing one possible solution: the Rosetta Disk, which will contain extensive archives of text in languages that may be lost in the far future. They plan to use a unique storage technology based on a two-inch nickel disk that can store up to 350,000 pages per disk, with an estimated life expectancy of 2,000 to 10,000 years.
Blockchain: Blueprint for a New Economy by Melanie Swan
23andMe, Airbnb, altcoin, Amazon Web Services, asset allocation, banking crisis, basic income, bioinformatics, bitcoin, blockchain, capital controls, cellular automata, central bank independence, clean water, cloud computing, collaborative editing, Conway's Game of Life, crowdsourcing, cryptocurrency, disintermediation, Edward Snowden, en.wikipedia.org, Ethereum, ethereum blockchain, fault tolerance, fiat currency, financial innovation, Firefox, friendly AI, Hernando de Soto, intangible asset, Internet Archive, Internet of things, Khan Academy, Kickstarter, lifelogging, litecoin, Lyft, M-Pesa, microbiome, Network effects, new economy, peer-to-peer, peer-to-peer lending, peer-to-peer model, personalized medicine, post scarcity, prediction markets, QR code, ride hailing / ride sharing, Satoshi Nakamoto, Search for Extraterrestrial Intelligence, SETI@home, sharing economy, Skype, smart cities, smart contracts, smart grid, software as a service, technological singularity, Turing complete, uber lyft, unbanked and underbanked, underbanked, web application, WikiLeaks
Bitcoin Magazine, May 22, 2014. http://bitcoinmagazine.com/13187/putting-the-blockchain-to-work-for-science-gridcoin/. 126 Buterin, V. “Primecoin: The Cryptocurrency Whose Mining Is Actually Useful.” Bitcoin Magazine, July 8, 2013. http://bitcoinmagazine.com/5635/primecoin-the-cryptocurrency-whose-mining-is-actually-useful/. 127 Myers, D.S., A.L. Bazinet, and M.P. Cummings. “Expanding the Reach of Grid Computing: Combining Globus- and BOINC-Based Systems.” Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland, February 6, 2007 (Draft). http://lattice.umiacs.umd.edu/latticefiles/publications/lattice/myers_bazinet_cummings.pdf. 128 Clenfield, J. and P. Alpeyev. “The Other Bitcoin Power Struggle.” Bloomberg Businessweek, April 24, 2014. http://www.businessweek.com/articles/2014-04-24/bitcoin-miners-seek-cheap-electricity-to-eke-out-a-profit. 129 Gimein, M.
From Bacteria to Bach and Back: The Evolution of Minds by Daniel C. Dennett
Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Andrew Wiles, Bayesian statistics, bioinformatics, bitcoin, Build a better mousetrap, Claude Shannon: information theory, computer age, computer vision, double entry bookkeeping, double helix, Douglas Hofstadter, Elon Musk, epigenetics, experimental subject, Fermat's Last Theorem, Gödel, Escher, Bach, information asymmetry, information retrieval, invention of writing, Isaac Newton, iterative process, John von Neumann, Menlo Park, Murray Gell-Mann, Necker cube, Norbert Wiener, pattern recognition, phenotype, Richard Feynman, Rodney Brooks, self-driving car, social intelligence, sorting algorithm, speech recognition, Stephen Hawking, Steven Pinker, strong AI, The Wealth of Nations by Adam Smith, theory of mind, Thomas Bayes, trickle-down economics, Turing machine, Turing test, Watson beat the top human players on Jeopardy!, Y2K
Empirical work in both areas has made enough progress in recent decades to encourage further inquiry, taking on board the default (and tentative) assumption that the “trees” of existing lineages we can trace back eventually have single trunks. Phylogenetic diagrams, or cladograms, such as the Great Tree of Life (which appears as figure 9.1) showing all the species, or more limited trees of descent in particular lineages, are getting clearer and clearer as bio-informatics research on the accumulation of differences in DNA sequences plugs the gaps and corrects the mistakes of earlier anatomical and physiological sleuthing. Glossogenetic trees, lineages of languages (figure 9.2), are also popular thinking tools, laying out the relations of descent among language families (and individual words) over many centuries. Biologists drawing phylogenetic trees run into difficulties when they must represent anastomosis, the joining together of what had been distinct lineages, a phenomenon now understood to have been prevalent in the early days of life (witness the endosymbiotic origin of eukaryotes).
The idea that languages evolve, that words today are the descendants in some fashion of words in the past, is actually older than Darwin’s theory of evolution of species. Texts of Homer’s Iliad and Odyssey, for instance, were known to descend by copying from texts descended from texts descended from texts going back to their oral ancestors in Homeric times. Philologists and paleographers had been reconstructing lineages of languages and manuscripts (e.g., the various extant copies of Plato’s Dialogues) since the Renaissance, and some of the latest bio-informatic techniques used today to determine relationships between genomes are themselves refined descendants of techniques developed to trace patterns of errors (mutations) in ancient texts. As Darwin noted, “The formation of different languages and of distinct species, and the proofs that both have been developed through a gradual process, are curiously the same” (1871, p. 59). We will follow Darwin’s example and begin in the middle, postponing the primordial origin of language until later in the chapter, and looking at what can be confidently said about the ongoing evolution of language, and in particular, words.
Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem
Amazon Web Services, anti-pattern, bioinformatics, commoditize, corporate governance, create, read, update, delete, data acquisition, en.wikipedia.org, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, web application
Graphs, on the other hand, use index-free adjacency to ensure that traversing connected data is extremely rapid. The social network example helps illustrate how different technologies deal with connected data, but is it a valid use case? Do we really need to find such remote “friends?” But substitute social networks for any other domain, and you’ll see we experience similar performance, modeling and maintenance benefits. Whether music or data center management, bio-informatics or football statistics, network sensors or time-series of trades, graphs provide powerful insight into our data. Let’s look, then, at another contemporary application of graphs: recommending products based on a user’s purchase history and the histories of their friends, neighbours, and other people like them. With this example, we’ll bring together several independent facets of a user’s lifestyle to make accurate and profitable recommendations.
MacroWikinomics: Rebooting Business and the World by Don Tapscott, Anthony D. Williams
accounting loophole / creative accounting, airport security, Andrew Keen, augmented reality, Ayatollah Khomeini, barriers to entry, Ben Horowitz, bioinformatics, Bretton Woods, business climate, business process, buy and hold, car-free, carbon footprint, Charles Lindbergh, citizen journalism, Clayton Christensen, clean water, Climategate, Climatic Research Unit, cloud computing, collaborative editing, collapse of Lehman Brothers, collateralized debt obligation, colonial rule, commoditize, corporate governance, corporate social responsibility, creative destruction, crowdsourcing, death of newspapers, demographic transition, disruptive innovation, distributed generation, don't be evil, en.wikipedia.org, energy security, energy transition, Exxon Valdez, failed state, fault tolerance, financial innovation, Galaxy Zoo, game design, global village, Google Earth, Hans Rosling, hive mind, Home mortgage interest deduction, information asymmetry, interchangeable parts, Internet of things, invention of movable type, Isaac Newton, James Watt: steam engine, Jaron Lanier, jimmy wales, Joseph Schumpeter, Julian Assange, Kevin Kelly, Kickstarter, knowledge economy, knowledge worker, Marc Andreessen, Marshall McLuhan, mass immigration, medical bankruptcy, megacity, mortgage tax deduction, Netflix Prize, new economy, Nicholas Carr, oil shock, old-boy network, online collectivism, open borders, open economy, pattern recognition, peer-to-peer lending, personalized medicine, Ray Kurzweil, RFID, ride hailing / ride sharing, Ronald Reagan, Rubik’s Cube, scientific mainstream, shareholder value, Silicon Valley, Skype, smart grid, smart meter, social graph, social web, software patent, Steve Jobs, text mining, the scientific method, The Wisdom of Crowds, transaction costs, transfer pricing, University of East Anglia, urban sprawl, value at risk, WikiLeaks, X Prize, young professional, Zipcar
Wikis provide a shared space for group learning, discussion, and collaboration, while a Facebook-like social networking application helps connect researchers working on similar problems. Meanwhile, over at the European Bioinformatics Institute, scientists are using Web services to revolutionize the way they extract and interpret data from different sources, and to create entirely new data services. Imagine, for example, you wanted to find out everything there is to know about a species, from its taxonomy and genetic sequence to its geographical distribution. Now imagine you had the power to weave together all the latest data on that species from all of the world’s biological databases with just one click. It’s not far-fetched. That power is here, today. Projects like these have inspired researchers in many fields to emulate the changes that are already sweeping disciplines such as bioinformatics and high-energy physics. Having said that, there will be some difficult adjustments and issues such as privacy and national security to confront along the way.
Data-Ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else by Steve Lohr
"Robert Solow", 23andMe, Affordable Care Act / Obamacare, Albert Einstein, big data - Walmart - Pop Tarts, bioinformatics, business cycle, business intelligence, call centre, cloud computing, computer age, conceptual framework, Credit Default Swap, crowdsourcing, Daniel Kahneman / Amos Tversky, Danny Hillis, data is the new oil, David Brooks, East Village, Edward Snowden, Emanuel Derman, Erik Brynjolfsson, everywhere but in the productivity statistics, Frederick Winslow Taylor, Google Glasses, impulse control, income inequality, indoor plumbing, industrial robot, informal economy, Internet of things, invention of writing, Johannes Kepler, John Markoff, John von Neumann, lifelogging, Mark Zuckerberg, market bubble, meta analysis, meta-analysis, money market fund, natural language processing, obamacare, pattern recognition, payday loans, personalized medicine, precision agriculture, pre–internet, Productivity paradox, RAND corporation, rising living standards, Robert Gordon, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, speech recognition, statistical model, Steve Jobs, Steven Levy, The Design of Experiments, the scientific method, Thomas Kuhn: the structure of scientific revolutions, unbanked and underbanked, underbanked, Von Neumann architecture, Watson beat the top human players on Jeopardy!
He does it selectively, but one speaking engagement in 2010 focused his interest and steered his career in a new direction. He had agreed to give a talk in Seattle at a conference hosted by Sage Bionetworks, a nonprofit organization dedicated to accelerating the sharing of data for biological research. Hammerbacher knew the two medical researchers who had founded the nonprofit, Stephen Friend and Eric Schadt. He had talked to them about how they might use big-data software to cope with the data explosion in bioinformatics and genomics. But the preparation for the speech forced him to really think about biology and technology, reading up and talking to people. The more Hammerbacher looked into it, the more intriguing the subject looked. Biological research, he says, could go the way of finance with its closed, proprietary systems and data being hoarded rather than shared. Or, he says, it could “go the way of the Web,” that is, toward openness.
Robots Will Steal Your Job, But That's OK: How to Survive the Economic Collapse and Be Happy by Pistono, Federico
3D printing, Albert Einstein, autonomous vehicles, bioinformatics, Buckminster Fuller, cloud computing, computer vision, correlation does not imply causation, en.wikipedia.org, epigenetics, Erik Brynjolfsson, Firefox, future of work, George Santayana, global village, Google Chrome, happiness index / gross national happiness, hedonic treadmill, illegal immigration, income inequality, information retrieval, Internet of things, invention of the printing press, jimmy wales, job automation, John Markoff, Kevin Kelly, Khan Academy, Kickstarter, knowledge worker, labor-force participation, Lao Tzu, Law of Accelerating Returns, life extension, Loebner Prize, longitudinal study, means of production, Narrative Science, natural language processing, new economy, Occupy movement, patent troll, pattern recognition, peak oil, post scarcity, QR code, race to the bottom, Ray Kurzweil, recommendation engine, RFID, Rodney Brooks, selection bias, self-driving car, slashdot, smart cities, software as a service, software is eating the world, speech recognition, Steven Pinker, strong AI, technological singularity, Turing test, Vernor Vinge, women in the workforce
And even those overseas jobs are now threatened by the rapid advances in automation and robotics. The more companies automate, because of the need to increase their productivity, the more jobs will be lost, forever. The future of work and innovation is not in the past that we know, but in the unfamiliar territory of a future that is yet to come. New and exciting fields are emerging every day. Synthetic biology, neurocomputation, 3D printing, contour crafting, molecular engineering, bioinformatics, life extension, robotics, quantum computing, artificial intelligence, machine learning: these new frontiers are rapidly evolving and are just the beginning of a new, amazing era of our species that will bring about the greatest transformation of all time. A transformation that will make the industrial revolution look like an event of minor importance. This new era will create new opportunities, new frontiers for research and innovation that we cannot even begin to comprehend now.
Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport
Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, commoditize, data acquisition, disruptive innovation, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, lifelogging, Mark Zuckerberg, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, self-driving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining, Thomas Davenport
Netflix created the Netflix Prize for the data science team that could optimize the company’s movie recommendations for customers and, as I noted in chapter 2, is now using big data to help in the creation of proprietary content. The testing firm Kaplan uses its big data to begin advising customers on effective learning and test-preparation strategies. Novartis focuses on big data (the health-care industry calls it informatics) to develop new drugs. Its CEO, Joe Jimenez, commented in an interview, “If you think about the amounts of data that are now available, bioinformatics capability is becoming very important, as is the ability to mine that data and really understand, for example, the specific mutations that are leading to certain types of cancers.” These companies’ big data efforts are directly focused on products, services, and customers. This has important implications, of course, for the organizational locus of big data and the processes and pace of new product development.
HBase: The Definitive Guide by Lars George
Amazon Web Services, bioinformatics, create, read, update, delete, Debian, distributed revision control, domain-specific language, en.wikipedia.org, fault tolerance, Firefox, Google Earth, Kickstarter, place-making, revision control, smart grid, web application
If we were to take 140 bytes per message, as used by Twitter, it would total more than 17 TB every month. Even before the transition to HBase, the existing system had to handle more than 25 TB a month. In addition, less web-oriented companies from across all major industries are collecting an ever-increasing amount of data. For example: Financial, such as data generated by stock tickers; Bioinformatics, such as the Global Biodiversity Information Facility (http://www.gbif.org/); Smart grid, such as the OpenPDC (http://openpdc.codeplex.com/) project; Sales, such as the data generated by point-of-sale (POS) or stock/inventory systems; Genomics, such as the Crossbow (http://bowtie-bio.sourceforge.net/crossbow/index.shtml) project; and Cellular services, military, environmental, which all collect a tremendous amount of data as well. Storing petabytes of data efficiently so that updates and retrieval are still performed well is no easy feat.
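The excerpt above gives a message size (140 bytes) and a monthly total (more than 17 TB) but not the message count that links them. As a back-of-the-envelope sketch, the count can be worked backward from those two figures. Whether "TB" means 10^12 or 2^40 bytes is not stated in the excerpt, so both readings are computed; the function name `implied_messages` is hypothetical.

```python
BYTES_PER_MESSAGE = 140   # figure quoted in the excerpt
MONTHLY_VOLUME_TB = 17    # lower bound quoted in the excerpt

DECIMAL_TB = 10**12       # assumption: 1 TB = 10^12 bytes
BINARY_TB = 2**40         # assumption: 1 TB = 2^40 bytes (a tebibyte)


def implied_messages(volume_tb, bytes_per_message, bytes_per_tb):
    """Whole number of messages that fit into the given monthly volume."""
    return (volume_tb * bytes_per_tb) // bytes_per_message


decimal_estimate = implied_messages(MONTHLY_VOLUME_TB, BYTES_PER_MESSAGE, DECIMAL_TB)
binary_estimate = implied_messages(MONTHLY_VOLUME_TB, BYTES_PER_MESSAGE, BINARY_TB)

print(f"decimal TB: about {decimal_estimate / 1e9:.0f} billion messages per month")
print(f"binary TB:  about {binary_estimate / 1e9:.0f} billion messages per month")
```

Under either reading the implied volume is on the order of a hundred billion messages a month, which is a lower bound because the excerpt says "more than 17 TB".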
A abort() method, HBaseAdmin class, Basic Operations Abortable interface, Basic Operations Accept header, switching REST formats, Supported formats, JSON (application/json), Protocol Buffer (application/x-protobuf) access control, Introduction to Coprocessors, HBase Versus Bigtable Bigtable column families for, HBase Versus Bigtable coprocessors for, Introduction to Coprocessors ACID properties, The Problem with Relational Database Systems add() method, Bytes class, The Bytes Class add() method, Put class, Single Puts addColumn() method, Get class, Single Gets addColumn() method, HBaseAdmin class, Schema Operations addColumn() method, Increment class, Multiple Counters addColumn() method, Scan class, Introduction addFamily() method, Get class, Single Gets addFamily() method, HTableDescriptor class, Table Properties addFamily() method, Scan class, Introduction, Client API: Best Practices add_peer command, HBase Shell, Replication alter command, HBase Shell, Data definition Amazon, The Dawn of Big Data, S3, S3 data requirements of, The Dawn of Big Data S3 (Simple Storage Service), S3, S3 Apache Avro, Introduction to REST, Thrift, and Avro (see Avro) Apache binary release for HBase, Apache Binary Release, Apache Binary Release Apache HBase, Quick-Start Guide (see HBase) Apache Hive, Hive (see Hive) Apache Lucene, Search Integration, Search Integration Apache Maven, Building the Examples (see Maven) Apache Pig, Pig (see Pig) Apache Solr, Search Integration Apache Whirr, deployment using, Apache Whirr, Apache Whirr Apache ZooKeeper, Implementation (see ZooKeeper) API, Native Java (see client API) append feature, for durability, Durability append() method, HLog class, HLog Class architecture, storage, Storage (see storage architecture) assign command, HBase Shell, Tools assign() method, HBaseAdmin class, Cluster Operations AssignmentManager class, The Region Life Cycle AsyncHBase client, Other Clients atomic read-modify-write, Dimensions, Tables, Rows, Columns, and 
Cells, Storage API, General Notes, Atomic compare-and-set, Atomic compare-and-set, Atomic compare-and-delete, Atomic compare-and-delete, Row Locks, WALEdit Class compare-and-delete operations, Atomic compare-and-delete, Atomic compare-and-delete compare-and-set, for put operations, Atomic compare-and-set, Atomic compare-and-set per-row basis for, Tables, Rows, Columns, and Cells, Storage API, General Notes row locks for, Row Locks for WAL edits, WALEdit Class auto-sharding, Auto-Sharding, Auto-Sharding Avro, Introduction to REST, Thrift, and Avro, Introduction to REST, Thrift, and Avro, Avro, Avro, Operation, Installation, Operation, Operation, Operation, Operation, Advanced Schemas documentation for, Operation installing, Installation port used by, Operation schema compilers for, Avro schema used by, Advanced Schemas starting server for, Operation stopping, Operation B B+ trees, B+ Trees, B+ Trees backup masters, adding, Adding a local backup master, Adding a backup master, Adding a backup master balancer, Load Balancing, Load Balancing, Node Decommissioning balancer command, HBase Shell, Tools, Load Balancing balancer() method, HBaseAdmin class, Cluster Operations, Load Balancing balanceSwitch() method, HBaseAdmin class, Cluster Operations, Load Balancing balance_switch command, HBase Shell, Tools, Load Balancing, Node Decommissioning base64 command, XML (text/xml) Base64 encoding, with REST, XML (text/xml), JSON (application/json) BaseEndpointCoprocessor class, The BaseEndpointCoprocessor class, The BaseEndpointCoprocessor class BaseMasterObserver class, The BaseMasterObserver class, The BaseMasterObserver class BaseRegionObserver class, The BaseRegionObserver class, The BaseRegionObserver class Batch class, The CoprocessorProtocol interface, The BaseEndpointCoprocessor class batch clients, Batch Clients batch operations, Batch Operations, Batch Operations, Caching Versus Batching, Caching Versus Batching, Custom Filters for scans, Caching Versus Batching, 
Caching Versus Batching, Custom Filters on tables, Batch Operations, Batch Operations batch() method, HTable class, Batch Operations, Batch Operations, Introduction to Counters Bigtable storage architecture, Backdrop, Summary, Nomenclature, HBase Versus Bigtable, HBase Versus Bigtable “Bigtable: A Distributed Storage System for Structured Data” (paper, by Google), Preface, Backdrop bin directory, Apache Binary Release BinaryComparator class, Comparators BinaryPrefixComparator class, Comparators binarySearch() method, Bytes class, The Bytes Class bioinformatics, data requirements of, The Dawn of Big Data BitComparator class, Comparators block cache, Single Gets, Introduction, Column Families, Column Families, Bloom Filters, Region Server Metrics, Client API: Best Practices, Configuration Bloom filters affecting, Bloom Filters controlling use of, Single Gets, Introduction, Client API: Best Practices enabling and disabling, Column Families metrics for, Region Server Metrics settings for, Configuration block replication, MapReduce Locality, MapReduce Locality blocks, Column Families, HFile Format, HFile Format, HFile Format, HFile Format compressing, HFile Format size of, Column Families, HFile Format Bloom filters, Column Families, Bloom Filters, Bloom Filters bypass() method, ObserverContext class, The ObserverContext class Bytes class, Single Puts, Single Gets, The Bytes Class, The Bytes Class C caching, Caching Versus Batching, Caching Versus Batching, Caching Versus Batching, The HTable Utility Methods, Client API: Best Practices, HBase Configuration Properties (see also block cache; Memcached) regions, The HTable Utility Methods for scan operations, Caching Versus Batching, Caching Versus Batching, Client API: Best Practices, HBase Configuration Properties Cacti server, JMXToolkit on, JMX Remote API call() method, Batch class, The CoprocessorProtocol interface CAP (consistency, availability, and partition tolerance) theorem, Nonrelational Database Systems, 
Not-Only SQL or NoSQL?
The Half-Life of Facts: Why Everything We Know Has an Expiration Date by Samuel Arbesman
Albert Einstein, Alfred Russel Wallace, Amazon Mechanical Turk, Andrew Wiles, bioinformatics, British Empire, Cesare Marchetti: Marchetti’s constant, Chelsea Manning, Clayton Christensen, cognitive bias, cognitive dissonance, conceptual framework, David Brooks, demographic transition, double entry bookkeeping, double helix, Galaxy Zoo, guest worker program, Gödel, Escher, Bach, Ignaz Semmelweis: hand washing, index fund, invention of movable type, Isaac Newton, John Harrison: Longitude, Kevin Kelly, life extension, Marc Andreessen, meta analysis, meta-analysis, Milgram experiment, Nicholas Carr, P = NP, p-value, Paul Erdős, Pluto: dwarf planet, publication bias, randomized controlled trial, Richard Feynman, Rodney Brooks, scientific worldview, social graph, social web, text mining, the scientific method, Thomas Kuhn: the structure of scientific revolutions, Thomas Malthus, Tyler Cowen: Great Stagnation
“Sildenafil: from angina to erectile dysfunction to pulmonary hypertension and beyond.” Nature Reviews Drug Discovery 5, no. 8 (August 2006): 689–702. 112 software designed to find undiscovered patterns: See TRIZ, a method of invention and discovery. For example, here: www.aitriz.org. 112 computerized systems devoted to drug repurposing: Sanseau, Philippe, and Jacob Koehler. “Editorial: Computational Methods for Drug Repurposing.” Briefings in Bioinformatics 12, no. 4 (July 1, 2011): 301–2. 112 can generate new and interesting: Darden, Lindley. “Recent Work in Computational Scientific Discovery.” In Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society (1997) 161–66. 113 names a novel, computationally created: See TheoryMine: http://theorymine.co.uk. 116 A Cornell professor of earth and atmospheric sciences: Cisne, John L.
Human Diversity: The Biology of Gender, Race, and Class by Charles Murray
23andMe, affirmative action, Albert Einstein, Alfred Russel Wallace, Asperger Syndrome, assortative mating, basic income, bioinformatics, Cass Sunstein, correlation coefficient, Daniel Kahneman / Amos Tversky, double helix, Drosophila, epigenetics, equal pay for equal work, European colonialism, feminist movement, glass ceiling, Gunnar Myrdal, income inequality, Kenneth Arrow, labor-force participation, longitudinal study, meta analysis, meta-analysis, out of africa, p-value, phenotype, publication bias, quantitative hedge fund, randomized controlled trial, replication crisis, Richard Thaler, risk tolerance, school vouchers, Scientific racism, selective serotonin reuptake inhibitor (SSRI), Silicon Valley, social intelligence, statistical model, Steven Pinker, The Bell Curve by Richard Herrnstein and Charles Murray, the scientific method, The Wealth of Nations by Adam Smith, theory of mind, Thomas Kuhn: the structure of scientific revolutions, twin studies, universal basic income, working-age population
For this figure, the unit of analysis was the population, and the cell entry was the proportion of genetic regions under selection shared by the two populations for that cell. For the visually similar figure in chapter 7, the unit of analysis was the individual and the cell entries were measures of genetic distance: Wright’s fixation index, FST. 9: The Landscape of Ancestral Population Differences 1. Responsibility for the GWAS Catalog was subsequently shared with the European Bioinformatics Institute (EBI). The GWAS Catalog is downloadable free of charge at its website, ebi.ac.uk/gwas. The level of statistical significance required for entry in the GWAS Catalog is p < 1.0×10⁻⁵, which is more inclusive than the standard for statistical significance in the published literature (p < 1.0×10⁻⁸). To be eligible for the database, the study must meet certain technical criteria and have been published in an English-language journal. 2.
“Human DNA Sequences: More Variation and Less Race.” American Journal of Physical Anthropology 139 (1): 23–34. LoParo, Devon, and Irwin Waldman. 2014. “Twins’ Rearing Environment Similarity and Childhood Externalizing Disorders: A Test of the Equal Environments Assumption.” Behavior Genetics 44 (6): 606–13. Lopez, Saioa, Lucy van Dorp, and Garrett Hellenthal. 2016. “Human Dispersal out of Africa: A Lasting Debate.” Evolutionary Bioinformatics 11 (S2): 57–68. Low, Bobbi S. 2015. Why Sex Matters: A Darwinian Look at Human Behavior. Princeton, NJ: Princeton University Press. Lubinski, David, and Camilla P. Benbow. 2006. “Study of Mathematically Precocious Youth After 35 Years: Uncovering Antecedents for the Development of Math-Science Expertise.” Psychological Science 1 (4): 316–45. Lubinski, David, Camilla P. Benbow, and Harrison J.
Business Metadata: Capturing Enterprise Knowledge by William H. Inmon, Bonnie K. O'Neil, Lowell Fryman
affirmative action, bioinformatics, business cycle, business intelligence, business process, call centre, carbon-based life, continuous integration, corporate governance, create, read, update, delete, database schema, en.wikipedia.org, informal economy, knowledge economy, knowledge worker, semantic web, The Wisdom of Crowds, web application
However, the NCI Thesaurus is not “just” a thesaurus; it uses OWL and is description logic based, also using a concept hierarchy organized into trees. The terms were stored in an 11179 registry, and the registry metadata was mapped to UML structures from the Class Diagram. The solution includes three main layers: ✦ Layer 1: Enterprise Vocabulary Services: DL (description logics) and ontology, thesaurus ✦ Layer 2: CADSR: Metadata Registry, consisting of Common Data Elements ✦ Layer 3: Cancer Bioinformatics Objects, using UML Domain Models The NCI Thesaurus contains over 48,000 concepts. Although its emphasis is on machine understandability, NCI has managed to translate description logic somewhat into English. Linking concepts together is accomplished through roles, which are also concepts themselves. Here’s an example: Concept: Disease: ALK Positive Anaplastic Large Cell Lymphoma Role: Disease_Has_Molecular_Abnormality Concept: Molecular Abnormality: Rearrangement of 2p23 (Warzel, 2006, p.18) NCI’s toolkit is called caCORE, and it includes objects that developers can use in their applications.
We-Think: Mass Innovation, Not Mass Production by Charles Leadbeater
1960s counterculture, Andrew Keen, barriers to entry, bioinformatics, c2.com, call centre, citizen journalism, clean water, cloud computing, complexity theory, congestion charging, death of newspapers, Debian, digital Maoism, disruptive innovation, double helix, Douglas Engelbart, Edward Lloyd's coffeehouse, frictionless, frictionless market, future of work, game design, Google Earth, Google X / Alphabet X, Hacker Ethic, Hernando de Soto, hive mind, Howard Rheingold, interchangeable parts, Isaac Newton, James Watt: steam engine, Jane Jacobs, Jaron Lanier, Jean Tirole, jimmy wales, Johannes Kepler, John Markoff, John von Neumann, Joi Ito, Kevin Kelly, knowledge economy, knowledge worker, lateral thinking, lone genius, M-Pesa, Mark Shuttleworth, Mark Zuckerberg, Marshall McLuhan, Menlo Park, microcredit, Mitch Kapor, new economy, Nicholas Carr, online collectivism, planetary scale, post scarcity, Richard Stallman, Shoshana Zuboff, Silicon Valley, slashdot, social web, software patent, Steven Levy, Stewart Brand, supply-chain management, The Death and Life of Great American Cities, the market place, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Whole Earth Catalog, Zipcar
Yet even more traditional sectors will feel the pull of the pebbles in time, not least because the consumers and workforce of the near future will have grown up using the social web to search for and share ideas with one another. They will bring with them the web’s culture of lateral, semi-structured free association. This new organisational landscape is taking shape all around us. Scientific research is becoming ever more a question of organising a vast number of pebbles. Young scientists, especially in emerging fields like bioinformatics, draw on hundreds of data banks; use electronic lab notebooks to record and then share their results daily, often through blogs and wikis; work in multi-disciplinary teams threaded around the world organised by social networks; and publish their results, including open source versions of the software used in their experiments and their raw data, in open access online journals. Schools and universities are boulders that are increasingly dealing with students who want to be in the pebble business, drawing information from a variety of sources, sharing with their peers, learning from one another.
Richard Dawkins: How a Scientist Changed the Way We Think by Alan Grafen; Mark Ridley
Alfred Russel Wallace, Arthur Eddington, bioinformatics, cognitive bias, computer age, conceptual framework, Dava Sobel, double helix, Douglas Hofstadter, epigenetics, Fellow of the Royal Society, Haight Ashbury, interchangeable parts, Isaac Newton, Johann Wolfgang von Goethe, John von Neumann, loose coupling, Murray Gell-Mann, Necker cube, phenotype, profit maximization, Ronald Reagan, Stephen Hawking, Steven Pinker, the scientific method, theory of mind, Thomas Kuhn: the structure of scientific revolutions, Yogi Berra, zero-sum game
The invention of an algorithmic biology Seth Bullock BIOLOGY and computing might not seem the most comfortable of bedfellows. It is easy to imagine nature and technology clashing as the green-welly brigade rub up awkwardly against the back-room boffins. But collaboration between the two fields has exploded in recent years, driven primarily by massive investment in the emerging field of bioinformatics charged with mapping the human genome. New algorithms and computational infrastructures have enabled research groups to collaborate effectively on a worldwide scale in building huge, exponentially growing genomic databases, to ‘mine’ these mountains of data for useful information, and to construct and manipulate innovative computational models of the genes and proteins that have been identified.
The Future of the Brain: Essays by the World's Leading Neuroscientists by Gary Marcus, Jeremy Freeman
23andMe, Albert Einstein, bioinformatics, bitcoin, brain emulation, cloud computing, complexity theory, computer age, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data acquisition, Drosophila, epigenetics, global pandemic, Google Glasses, iterative process, linked data, mouse model, optical character recognition, pattern recognition, personalized medicine, phenotype, race to the bottom, Richard Feynman, Ronald Reagan, semantic web, speech recognition, stem cell, Steven Pinker, supply-chain management, Turing machine, twin studies, web application
(Although, of course, there is potential to miss the true culprit if it lies outside the exome.) When geneticists began exome sequencing in earnest, they encountered an unexpected complication. It turns out that each human individual carries a surprisingly high number of potentially deleterious mutations, typically more than one hundred. These are mutations that alter or disturb protein sequences in a way that is predicted to have a damaging effect on protein function, based on bioinformatic (computer-based) analyses. Each mutation might be extremely rare in the population, or even unique to the person or family in which it is found. How do we sift out the true causal mutations, the ones that are functionally implicated in the disorder or trait we are studying, against a broader background of irrelevant genomic change? Sometimes we can rely on a lucky convergence of findings, for example, where distinct mutations in the same gene pop up in multiple different affected families or cases.
The Willpower Instinct: How Self-Control Works, Why It Matters, and What You Can Do to Get More of It by Kelly McGonigal
banking crisis, bioinformatics, Cass Sunstein, choice architecture, cognitive bias, delayed gratification, game design, impulse control, lifelogging, loss aversion, meta analysis, meta-analysis, phenotype, Richard Thaler, Stanford marshmallow experiment, Walter Mischel
See also Witkiewitz, K., and S. Bowen. “Depression, Craving, and Substance Use Following a Randomized Trial of Mindfulness-Based Relapse Prevention.” Journal of Consulting and Clinical Psychology 78 (2010): 362–74. Chapter 10: Final Thoughts Page 237—“Only reasonable conclusion to a book about scientific ideas is: Draw your own conclusions”: Credit for this suggestion goes to Brian Kidd, Senior Bioinformatics Research Specialist, Institute for Infection Immunity and Transplantation, Stanford University. INDEX acceptance inner power of Adams, Claire addiction addict loses his cravings candy addict conquers sweet tooth chocoholic takes inspiration from Hershey’s Kisses dopamine’s role in drinking drug e-mail Facebook shopping smoker under social influence smoking Advisor-Teller Money Manager Intervention (ATM) Ainslie, George Air Force Academy, U.S.
Here Comes Everybody: The Power of Organizing Without Organizations by Clay Shirky
Andrew Keen, Berlin Wall, bioinformatics, Brewster Kahle, c2.com, Charles Lindbergh, crowdsourcing, en.wikipedia.org, hiring and firing, hive mind, Howard Rheingold, Internet Archive, invention of agriculture, invention of movable type, invention of the printing press, invention of the telegraph, jimmy wales, Joi Ito, Kuiper Belt, liberation theology, Mahatma Gandhi, means of production, Merlin Mann, Metcalfe’s law, Nash equilibrium, Network effects, Nicholas Carr, Picturephone, place-making, Pluto: dwarf planet, prediction markets, price mechanism, prisoner's dilemma, profit motive, Richard Stallman, Robert Metcalfe, Ronald Coase, Silicon Valley, slashdot, social software, Stewart Brand, supply-chain management, The Nature of the Firm, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, transaction costs, ultimatum game, Vilfredo Pareto, Yogi Berra
The Chinese had the best chance of sequencing the virus; the threat of SARS was most significant in Asia, and especially in China, which had most of the world’s confirmed cases, and China is home to brilliant biologists, with significant expertise in distributed computing. Despite these resources and incentives, however, the solution didn’t come from China. On April 12, Genome Sciences Centre (GSC), a small Canadian lab specializing in the genetics of pathogens, published the genetic sequence of SARS. On the way, they had participated in not just one open network, but several. Almost the entire computational installation of GSC is open source; bioinformatics tools with names like BLAST, Phrap, Phred, and Consed, all running on Linux. GSC checked their work against Genbank, a public database of genetic sequences. They published their findings on their own site (run, naturally, using open source tools) and published the finished sequence to Genbank, for everyone to see. The story is shot through with involvement in various participatory networks.
Managing Projects With GNU Make by Robert Mecklenburg, Andrew Oram
? (question mark), Wildcards calling functions and, Wildcards character classes, Wildcards expanding, Wildcards misuse, Wildcards pattern rules and, Rules ~ (tilde), Wildcards Windows filesystem, Cygwin and, Filesystem wordlist function, String Functions words function, String Functions X XML, Ant, XML Preprocessing build files, Ant preprocessing book makefile, XML Preprocessing About the Author Robert Mecklenburg began using Unix as a student in 1977 and has been programming professionally for 23 years. His make experience started in 1982 at NASA with Unix version 7. Robert received his Ph.D. in Computer Science from the University of Utah in 1991. Since then, he has worked in many fields ranging from mechanical CAD to bioinformatics, and he brings his extensive experience in C++, Java, and Lisp to bear on the problems of project management with make. Colophon Our look is the result of reader comments, our own experimentation, and feedback from distribution channels. Distinctive covers complement our distinctive approach to technical topics, breathing personality and life into potentially dry subjects. The animal on the cover of Managing Projects with GNU Make, Third Edition is a potto, a member of the loris family.
My Life as a Quant: Reflections on Physics and Finance by Emanuel Derman
Berlin Wall, bioinformatics, Black-Scholes formula, Brownian motion, buy and hold, capital asset pricing model, Claude Shannon: information theory, Donald Knuth, Emanuel Derman, fixed income, Gödel, Escher, Bach, haute couture, hiring and firing, implied volatility, interest rate derivative, Jeff Bezos, John Meriwether, John von Neumann, law of one price, linked data, Long Term Capital Management, moral hazard, Murray Gell-Mann, Myron Scholes, Paul Samuelson, pre–internet, publish or perish, quantitative trading / quantitative finance, Sharpe ratio, statistical arbitrage, statistical model, Stephen Hawking, Steve Jobs, stochastic volatility, technology bubble, the new new thing, transaction costs, volatility smile, Y2K, yield curve, zero-coupon bond, zero-sum game
When I asked if he had known David, he told me that O'Connor had been intent on shutting down David's enterprise. With their deep pockets, he said "they had guys spending all their time running diff RMSs files and the O'Connor code." (Diff is one of the great suite of UNIX tools that make a programmer's life easier. It compares two different files of text and finds any common strings of words in them, a simpler version of current bioinformatics programs that search for common strings of DNA in the mouse and human genome.) I have no idea whether there were in fact commonalities, but even independent people coding the same well-known algorithm might end up writing vaguely similar chunks of code. O'Connor eventually disappeared, too, absorbed into Swiss Bank, which itself subsequently merged with UBS. Starting in 1990 David disappeared into some alternate nonfinancial New York; none of his old friends saw him anymore.
Tools for Computational Finance by Rüdiger Seydel
bioinformatics, Black-Scholes formula, Brownian motion, commoditize, continuous integration, discrete time, implied volatility, incomplete markets, interest rate swap, linear programming, London Interbank Offered Rate, mandelbrot fractal, martingale, random walk, stochastic process, stochastic volatility, transaction costs, value at risk, volatility smile, Wiener process, zero-coupon bond
.: Second Course in Ordinary Differential Equations for Scientists and Engineers Franke, J.; Härdle, W.; Hafner, C. M.: Statistics of Financial Markets: An Introduction Hurwitz, A.; Kritikos, N.: Lectures on Number Theory Frauenthal, J. C.: Mathematical Modeling in Epidemiology Huybrechts, D.: Complex Geometry: An Introduction Freitag, E.; Busam, R.: Complex Analysis Isaev, A.: Introduction to Mathematical Methods in Bioinformatics Friedman, R.: Algebraic Surfaces and Holomorphic Vector Bundles Fuks, D. B.; Rokhlin, V. A.: Beginner’s Course in Topology Fuhrmann, P. A.: A Polynomial Approach to Linear Algebra Gallot, S.; Hulin, D.; Lafontaine, J.: Riemannian Geometry Istas, J.: Mathematical Modeling for the Life Sciences Iversen, B.: Cohomology of Sheaves Jacod, J.; Protter, P.: Probability Essentials Jennings, G. A.: Modern Geometry with Applications Gardiner, C.
The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity by Amy Webb
Ada Lovelace, AI winter, Airbnb, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, artificial general intelligence, Asilomar, autonomous vehicles, Bayesian statistics, Bernie Sanders, bioinformatics, blockchain, Bretton Woods, business intelligence, Cass Sunstein, Claude Shannon: information theory, cloud computing, cognitive bias, complexity theory, computer vision, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, Deng Xiaoping, distributed ledger, don't be evil, Donald Trump, Elon Musk, Filter Bubble, Flynn Effect, gig economy, Google Glasses, Grace Hopper, Gödel, Escher, Bach, Inbox Zero, Internet of things, Jacques de Vaucanson, Jeff Bezos, Joan Didion, job automation, John von Neumann, knowledge worker, Lyft, Mark Zuckerberg, Menlo Park, move fast and break things, natural language processing, New Urbanism, one-China policy, optical character recognition, packet switching, pattern recognition, personalized medicine, RAND corporation, Ray Kurzweil, ride hailing / ride sharing, Rodney Brooks, Rubik’s Cube, Sand Hill Road, Second Machine Age, self-driving car, SETI@home, side project, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart cities, South China Sea, sovereign wealth fund, speech recognition, Stephen Hawking, strong AI, superintelligent machines, technological singularity, The Coming Technological Singularity, theory of mind, Tim Cook: Apple, trade route, Turing machine, Turing test, uber lyft, Von Neumann architecture, Watson beat the top human players on Jeopardy!, zero day
Now, when you’re not feeling well, an AGI diagnostic test helps determine what, exactly, is making you sick so that a treatment—one that maps to your PDR—can be prescribed. Over-the-counter medications are mostly gone, too, but compounding pharmacies have seen a resurgence. That’s because AGI helped accelerate critical developments in genetic editing and precision medicine. You now consult a computational pharmacist: specially trained pharmacists who have backgrounds in bioinformatics, medicine, and pharmacology. Computational pharmacy is a medical specialty, one that works closely with a new breed of AI-GPs: general practitioners who are trained in both medicine and technology. While AGI has obviated certain medical specialists—radiologists, immunologists, allergists, cardiologists, dermatologists, endocrinologists, anesthesiologists, neurologists, and others—doctors working in those fields had plenty of time to repurpose their skills for adjacent fields.
RDF Database Systems: Triples Storage and SPARQL Query Processing by Olivier Cure, Guillaume Blin
Amazon Web Services, bioinformatics, business intelligence, cloud computing, database schema, fault tolerance, full text search, information retrieval, Internet Archive, Internet of things, linked data, NP-complete, peer-to-peer, performance metric, random walk, recommendation engine, RFID, semantic web, Silicon Valley, social intelligence, software as a service, SPARQL, web application
CHAPTER NINE Conclusion The need to manage enormous quantities of data has never been greater. This fact is mainly due to the expansion of the Web and the load of information that can be harvested from our interactions with it, such as via personal computers, laptops, smartphones, and tablet devices. This data can be represented using various models, and in the context of use cases thriving on the Web (social, geographical, recommendation, bioinformatics, network management, and fraud detection, to name a few), the graph data model is a particularly relevant choice. RDF, with its W3C recommendation status and its set of companions like SPARQL, SKOS, RDFS, and OWL, plays a primordial role in the graph data model ecosystem. The quantity and quality of tools, such as parsers, editors, and APIs, implemented to ease the use of RDF data attest to the strong enthusiasm surrounding this standard, as well as the importance of managing this data appropriately. The number of academic, open-source, and commercial RDF stores presented in this book emphasizes the importance of this tool category, the diversity of possible approaches, and the complexity of designing efficient systems.
Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs
Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, game design, information retrieval, iterative process, natural language processing, pattern recognition, performance metric, sentiment analysis, social web, speech recognition, statistical model, text mining
Carlson, Andrew, Justin Betteridge, Richard C. Wang, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. “Coupled Semi-Supervised Learning for Information Extraction.” In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (WSDM). Chomsky, Noam. 1957. Syntactic Structures. Paris: Mouton. Chuzhanova, N.A., A.J. Jones, and S. Margetts. 1998. “Feature selection for genetic sequence classification.” Bioinformatics 14(2):139–143. Culotta, Aron, Michael Wick, Robert Hall, and Andrew McCallum. 2007. “First-Order Probabilistic Models for Coreference Resolution.” In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (HLT/NAACL). Derczynski, Leon, and Robert Gaizauskas. 2010. “USFD2: Annotating Temporal Expressions and TLINKs for TempEval-2.”
The Industries of the Future by Alec Ross
23andMe, 3D printing, Airbnb, algorithmic trading, AltaVista, Anne Wojcicki, autonomous vehicles, banking crisis, barriers to entry, Bernie Madoff, bioinformatics, bitcoin, blockchain, Brian Krebs, British Empire, business intelligence, call centre, carbon footprint, cloud computing, collaborative consumption, connected car, corporate governance, Credit Default Swap, cryptocurrency, David Brooks, disintermediation, Dissolution of the Soviet Union, distributed ledger, Edward Glaeser, Edward Snowden, en.wikipedia.org, Erik Brynjolfsson, fiat currency, future of work, global supply chain, Google X / Alphabet X, industrial robot, Internet of things, invention of the printing press, Jaron Lanier, Jeff Bezos, job automation, John Markoff, Joi Ito, Kickstarter, knowledge economy, knowledge worker, lifelogging, litecoin, M-Pesa, Marc Andreessen, Mark Zuckerberg, Mikhail Gorbachev, mobile money, money: store of value / unit of account / medium of exchange, Nelson Mandela, new economy, offshore financial centre, open economy, Parag Khanna, paypal mafia, peer-to-peer, peer-to-peer lending, personalized medicine, Peter Thiel, precision agriculture, pre–internet, RAND corporation, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, Rubik’s Cube, Satoshi Nakamoto, selective serotonin reuptake inhibitor (SSRI), self-driving car, sharing economy, Silicon Valley, Silicon Valley startup, Skype, smart cities, social graph, software as a service, special economic zone, supply-chain management, supply-chain management software, technoutopianism, The Future of Employment, Travis Kalanick, underbanked, Vernor Vinge, Watson beat the top human players on Jeopardy!, women in the workforce, Y Combinator, young professional
While Seltzer makes the case that virtually every bit of our personal information is now available to those who want it, I do think there are parts of our lives that remain private and that we must fight to keep private. And I think the best way to do that is by focusing on defining rules for data retention and proper use. Most of our health information remains private, and the need for privacy will grow with the rise of genomics. John Quackenbush, a professor of computational biology and bioinformatics at Harvard, explained that “as soon as you touch genomic data, that information is fundamentally identifiable. I can erase your address and Social Security number and every other identifier, but I can’t anonymize your genome without wiping out the information that I need to analyze.” The danger of genomic information being widely available is difficult to overstate. All of the most intimate details of who and what we are genetically could be used by governments or corporations for reasons going beyond trying to develop precision medicines.
SQL Hacks by Andrew Cumming, Gordon Russell
Mimer is also taking active part in the standardization of SQL as a member of the ISO SQL-standardization committee ISO/IEC JTC1/SC32, WorkGroup 3, Database Languages. You can download free development versions of Mimer SQL from http://www.mimer.com. Troels Arvin lives with his wife and son in Copenhagen, Denmark. He went half-way through medical school before realizing that computer science was the thing to do. He has since worked in the web, bioinformatics, and telecommunications businesses. Troels is keen on database technology and maintains a slowly growing web page on how databases implement the SQL standard: http://troels.arvin.dk/db/rdbms. Acknowledgments We would like to thank our editor, Brian Jepson, for his hard work and exceptional skill; his ability to separate the wheat from the chaff was invaluable. We are grateful to Alan Beaulieu, author of Learning SQL and Mastering Oracle SQL (both from O'Reilly), for his time, energy, and technical insight.
Origins: How Earth's History Shaped Human History by Lewis Dartnell
agricultural Revolution, back-to-the-land, bioinformatics, clean water, Columbian Exchange, decarbonisation, discovery of the americas, Donald Trump, Eratosthenes, financial innovation, Google Earth, Khyber Pass, Malacca Straits, megacity, meta analysis, meta-analysis, oil shale / tar sands, out of africa, Pax Mongolica, peak oil, phenotype, Rosa Parks, Silicon Valley, South China Sea, spice trade, supervolcano, trade route, transatlantic slave trade
Chiang, K. H. Hung, T. Y. Chiang and B. A. Schaal (2006). ‘Phylogeography of Asian wild rice, Oryza rufipogon, reveals multiple independent domestications of cultivated rice, Oryza sativa’, Proceedings of the National Academy of Sciences of the United States of America 103(25): 9578–83. López, S., L. van Dorp and G. Hellenthal (2015). ‘Human Dispersal Out of Africa: A Lasting Debate’, Evolutionary Bioinformatics Online 11(Suppl 2): 57–68. Lutgens, F. K. and E. J. Tarbuck (2000). The Atmosphere: An Introduction to Meteorology, 8th edition, Prentice Hall. Lyons, T. W., C. T. Reinhard and N. J. Planavsky (2014). ‘The rise of oxygen in Earth’s early ocean and atmosphere’, Nature 506: 307–15. Macalister, T. (2015). ‘Kellingley colliery closure: “shabby end” for a once mighty industry’, Guardian, https://www.theguardian.com/environment/2015/dec/18/kellingley-colliery-shabby-end-for-an-industry.
Bad Blood: Secrets and Lies in a Silicon Valley Startup by John Carreyrou
Affordable Care Act / Obamacare, bioinformatics, corporate governance, Donald Trump, El Camino Real, Elon Musk, Google Chrome, John Markoff, Jony Ive, Kickstarter, Marc Andreessen, Mark Zuckerberg, Mars Rover, medical malpractice, Menlo Park, obamacare, Ponzi scheme, ride hailing / ride sharing, Right to Buy, Sand Hill Road, side project, Silicon Valley, Silicon Valley startup, stealth mode startup, Steve Jobs, supply-chain management, Travis Kalanick, ubercab
In the process of writing this book, I reached out to all of the key figures in the Theranos saga and offered them the opportunity to comment on any allegations concerning them. Elizabeth Holmes, as is her right, declined my interview requests and chose not to cooperate with this account. Prologue November 17, 2006 Tim Kemp had good news for his team. The former IBM executive was in charge of bioinformatics at Theranos, a startup with a cutting-edge blood-testing system. The company had just completed its first big live demonstration for a pharmaceutical company. Elizabeth Holmes, Theranos’s twenty-two-year-old founder, had flown to Switzerland and shown off the system’s capabilities to executives at Novartis, the European drug giant. “Elizabeth called me this morning,” Kemp wrote in an email to his fifteen-person team.
A Brief History of Everyone Who Ever Lived by Adam Rutherford
23andMe, agricultural Revolution, Albert Einstein, Alfred Russel Wallace, bioinformatics, British Empire, colonial rule, dark matter, delayed gratification, demographic transition, double helix, Drosophila, epigenetics, Google Earth, Isaac Newton, Kickstarter, longitudinal study, meta analysis, meta-analysis, out of africa, phenotype, sceptred isle, theory of mind, Thomas Malthus, twin studies
Back in the Cold Spring Harbor bar in 2000, a young British geneticist called Ewan Birney was pratting around, but inadvertently doing something quite profound at the same time. Nowadays it has become a tiresome cliché to say that a person’s passion or quintessential characteristic is ‘in their DNA’. The satirical magazine Private Eye has a whole column dedicated to this phrase flopping out of journalists’ and celebrities’ mouths. Well, Ewan Birney is a man with DNA in his DNA. These days he heads the European Bioinformatics Institute in Hinxton, just outside Cambridge, one of the great global genome powerhouses. While our contemporaries went off to Koh Samui or Goa to find themselves on their year off before going up to university, Ewan had won a place in the lab of James Watson, at Cold Spring Harbor, just at the birth of genomics, the biological science that would come to dominate all others. Maybe it was his familiarity with that bar – or maybe it was just the beer – that led him to do something quite silly, fairly trivial, but something in fact that is one of the great comments on the nature of science.
Apache Solr 3 Enterprise Search Server by Unknown
bioinformatics, continuous integration, database schema, en.wikipedia.org, fault tolerance, Firefox, full text search, information retrieval, natural language processing, performance metric, platform as a service, Ruby on Rails, web application
Lastly, I want to thank all the adopters of Solr and Lucene! Without you, I wouldn't have this wonderful open source project to be so incredibly proud to be a part of! I look forward to meeting more of you at the next LuceneRevolution or Euro Lucene conference. About the Reviewers Jerome Eteve holds an MSc in IT and Sciences from the University of Lille (France). After starting his career in the field of bioinformatics, where he worked as a Biological Data Management and Analysis Consultant, he's now a Senior Application Developer with interests ranging from architecture to delivering a great user experience online. He's passionate about open source technologies, search engines, and web application architecture. He now works for WCN Plc, a leading provider of recruitment software solutions. He has worked on Packt's Enterprise Solr published in 2009.
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos
Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, basic income, Bayesian statistics, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, creative destruction, crowdsourcing, Danny Hillis, data is the new oil, double helix, Douglas Hofstadter, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, global village, Google Glasses, Gödel, Escher, Bach, information retrieval, job automation, John Markoff, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, lone genius, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, NP-complete, off grid, P = NP, PageRank, pattern recognition, phenotype, planetary scale, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, scientific worldview, Second Machine Age, self-driving car, Silicon Valley, social intelligence, speech recognition, Stanford marshmallow experiment, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight, zero-sum game
Statistical Language Learning,* by Eugene Charniak (MIT Press, 1996), explains how hidden Markov models work. Statistical Methods for Speech Recognition,* by Fred Jelinek (MIT Press, 1997), describes their application to speech recognition. The story of HMM-style inference in communication is told in “The Viterbi algorithm: A personal history,” by David Forney (unpublished; online at arxiv.org/pdf/cs/0504020v2.pdf). Bioinformatics: The Machine Learning Approach,* by Pierre Baldi and Søren Brunak (2nd ed., MIT Press, 2001), is an introduction to the use of machine learning in biology, including HMMs. “Engineers look to Kalman filtering for guidance,” by Barry Cipra (SIAM News, 1993), is a brief introduction to Kalman filters, their history, and their applications. Judea Pearl’s pioneering work on Bayesian networks appears in his book Probabilistic Reasoning in Intelligent Systems* (Morgan Kaufmann, 1988).
Complexity: A Guided Tour by Melanie Mitchell
Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Albert Michelson, Alfred Russel Wallace, anti-communist, Arthur Eddington, Benoit Mandelbrot, bioinformatics, cellular automata, Claude Shannon: information theory, clockwork universe, complexity theory, computer age, conceptual framework, Conway's Game of Life, dark matter, discrete time, double helix, Douglas Hofstadter, en.wikipedia.org, epigenetics, From Mathematics to the Technologies of Life and Death, Geoffrey West, Santa Fe Institute, Gödel, Escher, Bach, Henri Poincaré, invisible hand, Isaac Newton, John Conway, John von Neumann, Long Term Capital Management, mandelbrot fractal, market bubble, Menlo Park, Murray Gell-Mann, Network effects, Norbert Wiener, Norman Macrae, Paul Erdős, peer-to-peer, phenotype, Pierre-Simon Laplace, Ray Kurzweil, reversible computing, scientific worldview, stem cell, The Wealth of Nations by Adam Smith, Thomas Malthus, Turing machine
The best-known applications are in the field of coding theory, which deals with both data compression and the way codes need to be structured to be reliably transmitted. Coding theory affects nearly all of our electronic communications; cell phones, computer networks, and the worldwide global positioning system are a few examples. Information theory is also central in cryptography and in the relatively new field of bioinformatics, in which entropy and other information theory measures are used to analyze patterns in gene sequences. It has also been applied to analysis of language and music and in psychology, statistical inference, and artificial intelligence, among many other fields. Although information theory was inspired by notions of entropy in thermodynamics and statistical mechanics, it is controversial whether or not information theory has had much of a reverse impact on those and other fields of physics.
Age of Discovery: Navigating the Risks and Rewards of Our New Renaissance by Ian Goldin, Chris Kutarna
2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, Airbnb, Albert Einstein, AltaVista, Asian financial crisis, asset-backed security, autonomous vehicles, banking crisis, barriers to entry, battle of ideas, Berlin Wall, bioinformatics, bitcoin, Bonfire of the Vanities, clean water, collective bargaining, Colonization of Mars, Credit Default Swap, crowdsourcing, cryptocurrency, Dava Sobel, demographic dividend, Deng Xiaoping, Doha Development Round, double helix, Edward Snowden, Elon Musk, en.wikipedia.org, epigenetics, experimental economics, failed state, Fall of the Berlin Wall, financial innovation, full employment, Galaxy Zoo, global pandemic, global supply chain, Hyperloop, immigration reform, income inequality, indoor plumbing, industrial cluster, industrial robot, information retrieval, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invention of the printing press, Isaac Newton, Islamic Golden Age, Johannes Kepler, Khan Academy, Kickstarter, low cost airline, low cost carrier, low skilled workers, Lyft, Malacca Straits, mass immigration, megacity, Mikhail Gorbachev, moral hazard, Nelson Mandela, Network effects, New Urbanism, non-tariff barriers, Occupy movement, On the Revolutions of the Heavenly Spheres, open economy, Panamax, Pearl River Delta, personalized medicine, Peter Thiel, post-Panamax, profit motive, rent-seeking, reshoring, Robert Gordon, Robert Metcalfe, Search for Extraterrestrial Intelligence, Second Machine Age, self-driving car, Shenzhen was a fishing village, Silicon Valley, Silicon Valley startup, Skype, smart grid, Snapchat, special economic zone, spice trade, statistical model, Stephen Hawking, Steve Jobs, Stuxnet, The Future of Employment, too big to fail, trade liberalization, trade route, transaction costs, transatlantic slave trade, uber lyft, undersea cable, uranium enrichment, We are the 99%, We wanted flying cars, instead we got 140 characters, working poor, working-age population, zero day
Costandi, Moheb (2012, June 19). “Surgery on Ice.” Nature Middle East. Retrieved from www.natureasia.com. 8. Dwyer, Terence, PhD. (2015, October 1). “The Present State of Medical Science.” Interviewed by C. Kutarna, University of Oxford. 9. National Human Genome Research Institute (1998). “Twenty Questions about DNA Sequencing (and the Answers).” NHGRI. Retrieved from community.dur.ac.uk/biosci.bizhub/Bioinformatics/twenty_questions_about_DNA.htm. 10. Rincon, Paul (2014, January 15). “Science Enters $1,000 Genome Era.” BBC News. Retrieved from www.bbc.co.uk. 11. Regalado, Antonio (2014, September 24). “Emtech: Illumina Says 228,000 Human Genomes Will Be Sequenced This Year.” MIT Technology Review. Retrieved from www.technologyreview.com/news. 12. GENCODE (2015, July 15). “Statistics about the Current Human Gencode Release.”
Doing Data Science: Straight Talk From the Frontline by Cathy O'Neil, Rachel Schutt
Amazon Mechanical Turk, augmented reality, Augustin-Louis Cauchy, barriers to entry, Bayesian statistics, bioinformatics, computer vision, correlation does not imply causation, crowdsourcing, distributed generation, Edward Snowden, Emanuel Derman, fault tolerance, Filter Bubble, finite state, Firefox, game design, Google Glasses, index card, information retrieval, iterative process, John Harrison: Longitude, Khan Academy, Kickstarter, Mars Rover, Nate Silver, natural language processing, Netflix Prize, p-value, pattern recognition, performance metric, personalized medicine, pull request, recommendation engine, rent-seeking, selection bias, Silicon Valley, speech recognition, statistical model, stochastic process, text mining, the scientific method, The Wisdom of Crowds, Watson beat the top human players on Jeopardy!, X Prize
Shopping, communicating, reading news, listening to music, searching for information, expressing our opinions—all this is being tracked online, as most people know. What people might not know is that the “datafication” of our offline behavior has started as well, mirroring the online data collection revolution (more on this later). Put the two together, and there’s a lot to learn about our behavior and, by extension, who we are as a species. It’s not just Internet data, though—it’s finance, the medical industry, pharmaceuticals, bioinformatics, social welfare, government, education, retail, and the list goes on. There is a growing influence of data in most sectors and most industries. In some cases, the amount of data collected might be enough to be considered “big” (more on this in the next chapter); in other cases, it’s not. But it’s not only the massiveness that makes all this new data interesting (or poses challenges). It’s that the data itself, often in real time, becomes the building blocks of data products.
The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy by Sharon Bertsch McGrayne
Bayesian statistics, bioinformatics, British Empire, Claude Shannon: information theory, Daniel Kahneman / Amos Tversky, double helix, Edmond Halley, Fellow of the Royal Society, full text search, Henri Poincaré, Isaac Newton, Johannes Kepler, John Markoff, John Nash: game theory, John von Neumann, linear programming, longitudinal study, meta analysis, meta-analysis, Nate Silver, p-value, Pierre-Simon Laplace, placebo effect, prediction markets, RAND corporation, recommendation engine, Renaissance Technologies, Richard Feynman, Richard Feynman: Challenger O-ring, Robert Mercer, Ronald Reagan, speech recognition, statistical model, stochastic process, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, traveling salesman, Turing machine, Turing test, uranium enrichment, Yom Kippur War
Ron Howard, who had become interested in Bayes while at Harvard, was working on Bayesian networks in Stanford’s economic engineering department. A medical student, David E. Heckerman, became interested too and for his Ph.D. dissertation wrote a program to help pathologists diagnose lymph node diseases. Computerized diagnostics had been tried but abandoned decades earlier. Heckerman’s Ph.D. in bioinformatics concerned medicine, but his software won a prestigious national award in 1990 from the Association for Computing Machinery, the professional organization for computing. Two years later, Heckerman went to Microsoft to work on Bayesian networks. The Food and Drug Administration (FDA) allows the manufacturers of medical devices to use Bayes in their final applications for FDA approval. Devices include almost any medical item that is not a drug or biological product, items such as latex gloves, intraocular lenses, breast implants, thermometers, home AIDS kits, and artificial hips and hearts.
Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots by John Markoff
"Robert Solow", A Declaration of the Independence of Cyberspace, AI winter, airport security, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, basic income, Baxter: Rethink Robotics, Bill Duvall, bioinformatics, Brewster Kahle, Burning Man, call centre, cellular automata, Chris Urmson, Claude Shannon: information theory, Clayton Christensen, clean water, cloud computing, collective bargaining, computer age, computer vision, crowdsourcing, Danny Hillis, DARPA: Urban Challenge, data acquisition, Dean Kamen, deskilling, don't be evil, Douglas Engelbart, Douglas Hofstadter, Dynabook, Edward Snowden, Elon Musk, Erik Brynjolfsson, factory automation, From Mathematics to the Technologies of Life and Death, future of work, Galaxy Zoo, Google Glasses, Google X / Alphabet X, Grace Hopper, Gunnar Myrdal, Gödel, Escher, Bach, Hacker Ethic, haute couture, hive mind, hypertext link, indoor plumbing, industrial robot, information retrieval, Internet Archive, Internet of things, invention of the wheel, Jacques de Vaucanson, Jaron Lanier, Jeff Bezos, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, knowledge worker, Kodak vs Instagram, labor-force participation, loose coupling, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, medical residency, Menlo Park, Mitch Kapor, Mother of all demos, natural language processing, new economy, Norbert Wiener, PageRank, pattern recognition, pre–internet, RAND corporation, Ray Kurzweil, Richard Stallman, Robert Gordon, Rodney Brooks, Sand Hill Road, Second Machine Age, self-driving car, semantic web, shareholder value, side project, Silicon Valley, Silicon Valley startup, Singularitarianism, skunkworks, Skype, social software, speech recognition, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Nelson, telemarketer, telepresence, telepresence robot, Tenerife airport disaster, The Coming Technological Singularity, the medium is the message, Thorstein Veblen, Turing test, Vannevar Bush, Vernor Vinge, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, William Shockley: the traitorous eight, zero-sum game
Seated in his office at the company’s Mountain View headquarters, he read a message that warned him an alien attack was under way. Immediately after he read the message, two large men burst into his office and instructed him that it was essential he immediately accompany them to an undisclosed location in Woodside, the elite community populated by Silicon Valley’s technology executives and venture capitalists. This was Page’s surprise fortieth birthday party, orchestrated by his wife, Lucy Southworth, a Stanford bioinformatics Ph.D. A crowd of 150 people in appropriate alien-themed costumes had gathered, including Google cofounder Sergey Brin, who wore a dress. In the basement of the sprawling mansion where the party was held, a robot arm grabbed small boxes one at a time and gaily tossed the souvenirs to an appreciative crowd. The robot itself consisted of a standard Japanese-made industrial robot arm outfitted with a suction gripper hand driven by a noisy air compressor.
Nexus by Ramez Naam
artificial general intelligence, bioinformatics, Brownian motion, crowdsourcing, Golden Gate Park, hive mind, low earth orbit, mandatory minimum, Menlo Park, pattern recognition, the scientific method, upwardly mobile
He understood that the boy was leaving Thailand in a few days. Yes, he was in Bangkok. He was occupied at the moment, but would come by in a few hours. Niran hung up the phone, smiled to himself. It would be wonderful to see Thanom again. 35 ROOTS "I wasn't born Samantha Cataranes. I was born Sarita Catalan. I grew up in southern California, in a little town near San Diego. My parents were Roberto and Anita. They both worked in bioinformatics, had met on the job. I had a sister, Ana." Sorrow welled up from her. Tears began to flow again, silently running down the side of her face. Kade felt troubled, concerned, empathic. He stroked her hair, sent kindness. "My parents were hippies. The kind of hippies who worked in tech but went camping with the family, had singalongs with friends. There were always a lot of friends around the first few years.
Life on the Edge: The Coming of Age of Quantum Biology by Johnjoe McFadden, Jim Al-Khalili
agricultural Revolution, Albert Einstein, Alfred Russel Wallace, bioinformatics, complexity theory, dematerialisation, double helix, Douglas Hofstadter, Drosophila, Ernest Rutherford, Gödel, Escher, Bach, invention of the printing press, Isaac Newton, James Watt: steam engine, Louis Pasteur, New Journalism, phenotype, Richard Feynman, Schrödinger's Cat, theory of mind, traveling salesman, uranium enrichment, Zeno's paradox
Carlson, V. Gray-Schopfer, M. Dessing and C. Olsson, “Increased transcription levels induce higher mutation rates in a hypermutating cell line,” Journal of Immunology, vol. 166: 8 (2001), pp. 5051–7. 8 P. Cui, F. Ding, Q. Lin, L. Zhang, A. Li, Z. Zhang, S. Hu and J. Yu, “Distinct contributions of replication and transcription to mutation rate variation of human genomes,” Genomics, Proteomics and Bioinformatics, vol. 10: 1 (2012), pp. 4–10. 9 J. Cairns, J. Overbaugh and S. Millar, “The origin of mutants,” Nature, vol. 335 (1988), pp. 142–5. 10 John Cairns on Jim Watson, Cold Spring Harbor Oral History Collection. Interview available at: http://library.cshl.edu/oralhistory/interview/james-d-watson/meeting-jim-watson/watson/. 11 J. Gribbin, In Search of Schrödinger’s Cat (London: Wildwood House, 1984; repr.
Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data by Dipanjan Sarkar
bioinformatics, business intelligence, computer vision, continuous integration, en.wikipedia.org, general-purpose programming language, Guido van Rossum, information retrieval, Internet of things, invention of the printing press, iterative process, natural language processing, out of africa, performance metric, premature optimization, recommendation engine, self-driving car, semantic web, sentiment analysis, speech recognition, statistical model, text mining, Turing test, web application
Note here the emphasis on a corpus of documents, because the more diverse the set of documents you have, the more topics or concepts you can generate, unlike with a single document, where you will not get many topics or concepts if it talks about a single concept. Topic models are also often known as probabilistic statistical models, which use specific statistical techniques including singular value decomposition and latent Dirichlet allocation to discover connected latent semantic structures in text data that yield topics and concepts. They are used extensively in text analytics and even bioinformatics. Automated document summarization is the process of using a computer program or algorithm based on statistical and ML techniques to summarize a document or corpus of documents such that we obtain a short summary that captures all the essential concepts and themes of the original document or corpus. A wide variety of techniques for building automated document summarizers exist, including various extraction- and abstraction-based techniques.
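As a rough illustration of the singular value decomposition route mentioned in this excerpt, the following Python sketch finds the leading latent direction of a tiny document-term matrix by power iteration. The vocabulary, the counts, and the function name `top_singular_direction` are made up for this example, and only the first right singular vector is computed; a real topic model would more likely use a full SVD or latent Dirichlet allocation from an established library.

```python
import math

def top_singular_direction(matrix, iterations=200):
    """Approximate the leading right singular vector of `matrix`
    (rows = documents, columns = terms) by power iteration on A^T A."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols  # start from a vector with weight on every term
    for _ in range(iterations):
        # u = A v: how strongly each document matches the current direction
        u = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        # w = A^T u: push the document scores back onto the terms
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalise to unit length
    return v

# Hypothetical vocabulary and term counts (three biology documents, one finance document).
terms = ["gene", "protein", "cell", "stock", "market"]
doc_term = [
    [2, 1, 1, 0, 0],
    [1, 2, 0, 0, 0],
    [1, 1, 2, 0, 0],
    [0, 0, 0, 2, 1],
]

direction = top_singular_direction(doc_term)
weights = dict(zip(terms, direction))
# The three terms with the largest weight describe the dominant "topic".
top_terms = sorted(terms, key=lambda t: weights[t], reverse=True)[:3]
print(top_terms)
```

With these made-up counts the dominant direction loads on the biology terms, because that group of documents accounts for most of the variation in the matrix.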
Googled: The End of the World as We Know It by Ken Auletta
23andMe, AltaVista, Anne Wojcicki, Apple's 1984 Super Bowl advert, Ben Horowitz, bioinformatics, Burning Man, carbon footprint, citizen journalism, Clayton Christensen, cloud computing, Colonization of Mars, commoditize, corporate social responsibility, creative destruction, death of newspapers, disintermediation, don't be evil, facts on the ground, Firefox, Frank Gehry, Google Earth, hypertext link, Innovator's Dilemma, Internet Archive, invention of the telephone, Jeff Bezos, jimmy wales, John Markoff, Kevin Kelly, knowledge worker, Long Term Capital Management, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Network effects, new economy, Nicholas Carr, PageRank, Paul Buchheit, Peter Thiel, Ralph Waldo Emerson, Richard Feynman, Sand Hill Road, Saturday Night Live, semantic web, sharing economy, Silicon Valley, Skype, slashdot, social graph, spectrum auction, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, strikebreaker, telemarketer, the scientific method, The Wisdom of Crowds, Upton Sinclair, X Prize, yield management, zero-sum game
Measured by growth, it was Google’s best year, with revenues soaring 60 percent to $16.6 billion, with international revenues contributing nearly half the total, and with profits climbing to $4.2 billion. Google ended the year with 16,805 full-time employees, offices in twenty countries, and the search engine available in 117 languages. And the year had been a personally happy one for Page and Brin. Page married Lucy Southworth, a former model who earned her Ph.D. in bioinformatics in January 2009 from Stanford; they married seven months after Brin wed Anne Wojcicki. But Sheryl Sandberg was worried. She had held a ranking job in the Clinton administration before joining Google in 2001, where she supervised all online sales for AdWords and AdSense, and was regularly hailed by Fortune magazine as one of the fifty most powerful female executives in America. Sandberg came to believe Google’s vice was the flip side of its virtue.
Gnuplot in Action: Understanding Data With Graphs by Philipp Janert
bioinformatics, business intelligence, Debian, general-purpose programming language, iterative process, mandelbrot fractal, pattern recognition, random walk, Richard Stallman, six sigma, survivorship bias
For a project to be listed here, first of all I had to be aware of it. Then, the project had to be
■ Free and open source
■ Available for the Linux platform
■ Active and mature
■ Available as a standalone product and allowing interactive use (this requirement eliminates libraries and graphics command languages)
■ Reasonably general purpose (this eliminates specialized tools for molecular modeling, bio-informatics, high-energy physics, and so on)
■ Comparable to or going beyond gnuplot in at least some respects
C.3.1 Math and statistics programming environments
R
The R language and environment (www.r-project.org) are in many ways the de facto standard for statistical computing and graphics using open source tools. R shares with gnuplot an emphasis on iterative work in an interactive environment. It’s extensible, and many user-contributed packages are available from the R website and its mirrors.
Inventors at Work: The Minds and Motivation Behind Modern Inventions by Brett Stern
Apple II, augmented reality, autonomous vehicles, bioinformatics, Build a better mousetrap, business process, cloud computing, computer vision, cyber-physical system, distributed generation, game design, Grace Hopper, Richard Feynman, Silicon Valley, skunkworks, Skype, smart transportation, speech recognition, statistical model, stealth mode startup, Steve Jobs, Steve Wozniak, the market place, Yogi Berra
Dougherty: Oftentimes, inventors who are prosecuting their application pro se are unaware that they may ask the examiner for assistance in drafting allowable claims if there is allowable subject matter in the written disclosure. The examiner’s function is to allow valid patents. So, they will help the inventor come to an allowable subject matter if it exists in the application. Stern: Which technologies or fields exhibit high-growth trends in terms of patents? Calvert: One area that is going to be big is bioinformatics, which is biology and computer software working together. Dougherty: Medical device art is a high-growth area, too. People are living longer and they’re seeking to reduce costs for an enhanced life. Devices are getting smaller. Nanotechnology is already enabling medical devices, for example, that can travel through your bloodstream, collecting and reporting medical data in real time. Calvert: Another area that’s booming is electronic games and betting devices in the gambling industry.
CTOs at Work by Scott Donaldson, Stanley Siegel, Gary Donaldson
Amazon Web Services, bioinformatics, business intelligence, business process, call centre, centre right, cloud computing, computer vision, connected car, crowdsourcing, data acquisition, distributed generation, domain-specific language, glass ceiling, orbital mechanics / astrodynamics, pattern recognition, Pluto: dwarf planet, QR code, Richard Feynman, Ruby on Rails, shareholder value, Silicon Valley, Skype, smart grid, smart meter, software patent, thinkpad, web application, zero day, zero-sum game
For example, in the ISR (intelligence, surveillance and reconnaissance) domain, we produce sensors that generate the bits, transfer those bits through networks, wireless or wired, convert the bits into data, into knowledge, and into decisions through the processing, exploitation, and dissemination chain. With a teammate we developed a brand-new type of biological sensor that we called “TIGER” (Threat ID through Genetic Evaluation of Risk). That technology won The Wall Street Journal “gold” Technology Innovation Award in 2009 for the best invention of the year. It relies on a combination of advanced biotech hardware with groundbreaking bio-informatics techniques that were based on our radar signal processing expertise. Information from a sensor like that can feed into our epidemiology and disease tracking work. That's an example of a sensor at the front end through information flow at the back end. In the cyber security domain, our subsidiary, CloudShield, has a very special piece of hardware that enables real-time, deep packet inspection of network traffic at network line speeds, and that allows you to find cyber threats embedded in the traffic.
The Speed of Dark by Elizabeth Moon
I stare at him and almost forget to stand up and say the words of the Nicene Creed, which is what comes next. I believe in God the Father, maker of heaven and earth and of all things seen and unseen. I believe God is important and does not make mistakes. My mother used to joke about God making mistakes, but I do not think if He is God He makes mistakes. So it is not a silly question. Do I want to be healed? And of what? The only self I know is this self, the person I am now, the autistic bioinformatics specialist fencer lover of Marjory. And I believe in his only begotten son, Jesus Christ, who actually in the flesh asked that question of the man by the pool. The man who perhaps (the story does not say) had gone there because people were tired of him being sick and disabled, who perhaps had been content to lie down all day, but he got in the way. What would Jesus have done if the man had said, “No, I don’t want to be healed; I am quite content as I am”?
From Counterculture to Cyberculture: Stewart Brand, the Whole Earth Network, and the Rise of Digital Utopianism by Fred Turner
1960s counterculture, A Declaration of the Independence of Cyberspace, Apple's 1984 Super Bowl advert, back-to-the-land, bioinformatics, Buckminster Fuller, business cycle, Claude Shannon: information theory, complexity theory, computer age, conceptual framework, Danny Hillis, dematerialisation, distributed generation, Douglas Engelbart, Douglas Engelbart, Dynabook, Electric Kool-Aid Acid Test, From Mathematics to the Technologies of Life and Death, future of work, game design, George Gilder, global village, Golden Gate Park, Hacker Ethic, Haight Ashbury, hive mind, Howard Rheingold, informal economy, invisible hand, Jaron Lanier, John Markoff, John von Neumann, Kevin Kelly, knowledge economy, knowledge worker, market bubble, Marshall McLuhan, mass immigration, means of production, Menlo Park, Mitch Kapor, Mother of all demos, new economy, Norbert Wiener, peer-to-peer, post-industrial society, postindustrial economy, Productivity paradox, QWERTY keyboard, Ralph Waldo Emerson, RAND corporation, Richard Stallman, Robert Shiller, Robert Shiller, Ronald Reagan, Shoshana Zuboff, Silicon Valley, Silicon Valley ideology, South of Market, San Francisco, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, technoutopianism, Ted Nelson, Telecommunications Act of 1996, The Hackers Conference, theory of mind, urban renewal, Vannevar Bush, Whole Earth Catalog, Whole Earth Review, Yom Kippur War
Hosted at Los Alamos by Christopher Langton, then a postdoctoral researcher at the laboratory, the conference brought together 160 biologists, physicists, anthropologists, and computer scientists. Like the scientists and technicians of the Rad Lab and Los Alamos in World War II, the contributors to the first Artificial Life Conference quickly established an intellectual trading zone. Specialists in robotics presented papers on questions of cultural evolution; computer scientists used new algorithms to model seemingly biological patterns of growth; bioinformatics specialists applied what they believed to be principles of natural ecologies to the development of social structures. For these scientists, as formerly for members of the Rad Lab and the cold war research institutes that followed it, systems theory served as a contact language and computers served as key supports for a systems orientation toward interdisciplinary work. Furthermore, computers granted participants in the workshop a familiar God’s-eye point of view.
Cooked: A Natural History of Transformation by Michael Pollan
biofilm, bioinformatics, Columbian Exchange, correlation does not imply causation, creative destruction, dematerialisation, Drosophila, energy security, Gary Taubes, Hernando de Soto, hygiene hypothesis, Kickstarter, Louis Pasteur, Mason jar, microbiome, peak oil, Ralph Waldo Emerson, Steven Pinker, women in the workforce
Blaser, Martin J. “Who Are We? Indigenous Microbes and the Ecology of Human Disease.” European Molecular Biology Organization, Vol. 7, No. 10, 2006. Bravo, Javier A., et al. “Ingestion of Lactobacillus Strain Regulates Emotional Behavior and Central GABA Receptor Expression in a Mouse Via the Vagus Nerve.” www.pnas.org/cgi/doi/10.1073/pnas.1102999108. Desiere, Frank, et al. “Bioinformatics and Data Knowledge: The New Frontiers for Nutrition and Food.” Trends in Food Science & Technology 12 (2002): 215–29. Douwes, J., et al. “Farm Exposure in Utero May Protect Against Asthma.” European Respiratory Journal 32 (2008): 603–11. Ege, M.J., et al. Parsifal study team. “Prenatal Farm Exposure Is Related to the Expression of Receptors of the Innate Immunity and to Atopic Sensitization in School-Age Children.”
Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking by Foster Provost, Tom Fawcett
Albert Einstein, Amazon Mechanical Turk, big data - Walmart - Pop Tarts, bioinformatics, business process, call centre, chief data officer, Claude Shannon: information theory, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, data acquisition, David Brooks, en.wikipedia.org, Erik Brynjolfsson, Gini coefficient, information retrieval, intangible asset, iterative process, Johann Wolfgang von Goethe, Louis Pasteur, Menlo Park, Nate Silver, Netflix Prize, new economy, p-value, pattern recognition, placebo effect, price discrimination, recommendation engine, Ronald Coase, selection bias, Silicon Valley, Skype, speech recognition, Steve Jobs, supply-chain management, text mining, The Signal and the Noise by Nate Silver, Thomas Bayes, transaction costs, WikiLeaks
A classification of pure malt Scotch whiskies. Applied Statistics, 43 (1), 237–257. Leigh, D. (1995). Neural networks for credit scoring. In Goonatilake, S., & Treleaven, P. (Eds.), Intelligent Systems for Finance and Business, pp. 61–69. John Wiley and Sons Ltd., West Sussex, England. Letunic, & Bork (2006). Interactive tree of life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics, 23 (1). Lin, J.-H., & Vitter, J. S. (1994). A theory for memory-based learning. Machine Learning, 17, 143–167. Lloyd, S. P. (1982). Least square quantization in PCM. IEEE Transactions on Information Theory, 28 (2), 129–137. MacKay, D. (2003). Information Theory, Inference and Learning Algorithms, Chapter 20. An Example Inference Task: Clustering. Cambridge University Press. MacQueen, J.
Natural language processing with Python by Steven Bird, Ewan Klein, Edward Loper
bioinformatics, business intelligence, conceptual framework, Donald Knuth, elephant in my pajamas, en.wikipedia.org, finite state, Firefox, Guido van Rossum, information retrieval, Menlo Park, natural language processing, P = NP, search inside the book, speech recognition, statistical model, text mining, Turing test
In Proceedings of the 14th Conference on Computational Linguistics (COLING), pages 539–545, 1992. [Heim and Kratzer, 1998] Irene Heim and Angelika Kratzer. Semantics in Generative Grammar. Blackwell, 1998. [Hirschman et al., 2005] Lynette Hirschman, Alexander Yeh, Christian Blaschke, and Alfonso Valencia. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics, 6, May 2005. Supplement 1. [Hodges, 1977] Wilfred Hodges. Logic. Penguin Books, Harmondsworth, 1977. [Huddleston and Pullum, 2002] Rodney D. Huddleston and Geoffrey K. Pullum. The Cambridge Grammar of the English Language. Cambridge University Press, 2002. [Hunt and Thomas, 2000] Andrew Hunt and David Thomas. The Pragmatic Programmer: From Journeyman to Master. Addison Wesley, 2000. [Indurkhya and Damerau, 2010] Nitin Indurkhya and Fred Damerau, editors.
Superintelligence: Paths, Dangers, Strategies by Nick Bostrom
agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, anti-communist, artificial general intelligence, autonomous vehicles, barriers to entry, Bayesian statistics, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, demographic transition, different worldview, Donald Knuth, Douglas Hofstadter, Drosophila, Elon Musk, en.wikipedia.org, endogenous growth, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, Gödel, Escher, Bach, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John Markoff, John von Neumann, knowledge worker, longitudinal study, Menlo Park, meta analysis, meta-analysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Norbert Wiener, NP-complete, nuclear winter, optical character recognition, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, strong AI, superintelligent machines, supervolcano, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, transaction costs, Turing machine, Vernor Vinge, Watson beat the top human players on Jeopardy!, World Values Survey, zero-sum game
They also provide important insight into the concept of causality.28 One advantage of relating learning problems from specific domains to the general problem of Bayesian inference is that new algorithms that make Bayesian inference more efficient will then yield immediate improvements across many different areas. Advances in Monte Carlo approximation techniques, for example, are directly applied in computer vision, robotics, and computational genetics. Another advantage is that it lets researchers from different disciplines more easily pool their findings. Graphical models and Bayesian statistics have become a shared focus of research in many fields, including machine learning, statistical physics, bioinformatics, combinatorial optimization, and communication theory.35 A fair amount of the recent progress in machine learning has resulted from incorporating formal results originally derived in other academic fields. (Machine learning applications have also benefitted enormously from faster computers and greater availability of large data sets.) * * * Box 1 An optimal Bayesian agent An ideal Bayesian agent starts out with a “prior probability distribution,” a function that assigns probabilities to each “possible world” (i.e. to each maximally specific way the world could turn out to be).29 This prior incorporates an inductive bias such that simpler possible worlds are assigned higher probabilities.
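The ideal Bayesian agent described in this excerpt can be sketched in a few lines of Python. This is a toy under stated assumptions, not the formal construction from the book: the three "possible worlds", their complexity scores, the `2 ** -complexity` weighting, and the likelihood values are all invented for illustration. The only parts taken from the text are the idea of a prior that favours simpler worlds and the use of Bayesian inference to revise it.

```python
from fractions import Fraction

def simplicity_prior(complexities):
    """Give each world a weight of 2**-complexity, then normalise to sum to 1.
    The exponential weighting is an assumption made for this sketch."""
    weights = {w: Fraction(1, 2 ** c) for w, c in complexities.items()}
    total = sum(weights.values())
    return {w: weight / total for w, weight in weights.items()}

def bayes_update(prior, likelihood):
    """Bayes' rule: posterior(w) is proportional to prior(w) * P(observation | w)."""
    unnormalised = {w: prior[w] * likelihood[w] for w in prior}
    evidence = sum(unnormalised.values())
    if evidence == 0:
        raise ValueError("observation has zero probability under every world")
    return {w: p / evidence for w, p in unnormalised.items()}

# Hypothetical worlds with made-up complexity scores (lower means simpler).
complexities = {"simple": 1, "medium": 2, "complex": 3}
prior = simplicity_prior(complexities)

# Hypothetical probabilities of one observation under each world.
likelihood = {
    "simple": Fraction(1, 10),
    "medium": Fraction(1, 2),
    "complex": Fraction(9, 10),
}
posterior = bayes_update(prior, likelihood)

for world in complexities:
    print(world, prior[world], posterior[world])
```

In this toy run the simplest world starts with the highest prior, but an observation it predicts poorly shifts most of the probability to the other two worlds.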
The Information: A History, a Theory, a Flood by James Gleick
Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, AltaVista, bank run, bioinformatics, Brownian motion, butterfly effect, citation needed, Claude Shannon: information theory, clockwork universe, computer age, conceptual framework, crowdsourcing, death of newspapers, discovery of DNA, Donald Knuth, double helix, Douglas Hofstadter, en.wikipedia.org, Eratosthenes, Fellow of the Royal Society, Gödel, Escher, Bach, Henri Poincaré, Honoré de Balzac, index card, informal economy, information retrieval, invention of the printing press, invention of writing, Isaac Newton, Jacquard loom, Jaron Lanier, jimmy wales, Johannes Kepler, John von Neumann, Joseph-Marie Jacquard, lifelogging, Louis Daguerre, Marshall McLuhan, Menlo Park, microbiome, Milgram experiment, Network effects, New Journalism, Norbert Wiener, Norman Macrae, On the Economy of Machinery and Manufactures, PageRank, pattern recognition, phenotype, Pierre-Simon Laplace, pre–internet, Ralph Waldo Emerson, RAND corporation, reversible computing, Richard Feynman, Rubik’s Cube, Simon Singh, Socratic dialogue, Stephen Hawking, Steven Pinker, stochastic process, talking drums, the High Line, The Wisdom of Crowds, transcontinental railway, Turing machine, Turing test, women in the workforce
.… But now the damn thing is everywhere.”) Like any good meme, it spawned mutations. The “jumping the shark” entry in Wikipedia advised in 2009, “See also: jumping the couch; nuking the fridge.” Is this science? In his 1983 column, Hofstadter proposed the obvious memetic label for such a discipline: memetics. The study of memes has attracted researchers from fields as far apart as computer science and microbiology. In bioinformatics, chain letters are an object of study. They are memes; they have evolutionary histories. The very purpose of a chain letter is replication; whatever else a chain letter may say, it embodies one message: Copy me. One student of chain-letter evolution, Daniel W. VanArsdale, listed many variants, in chain letters and even earlier texts: “Make seven copies of it exactly as it is written”; “Copy this in full and send to nine friends”; “And if any man shall take away from the words of the book of this prophecy, God shall take away his part out of the book of life” [Revelation 22:19]. Chain letters flourished with the help of a new nineteenth-century technology: “carbonic paper,” sandwiched between sheets of writing paper in stacks.
The Rust Programming Language by Steve Klabnik, Carol Nichols
Through efforts such as this book, the Rust teams want to make systems concepts more accessible to more people, especially those new to programming. Companies Hundreds of companies, large and small, use Rust in production for a variety of tasks. Those tasks include command line tools, web services, DevOps tooling, embedded devices, audio and video analysis and transcoding, cryptocurrencies, bioinformatics, search engines, Internet of Things applications, machine learning, and even major parts of the Firefox web browser. Open Source Developers Rust is for people who want to build the Rust programming language, community, developer tools, and libraries. We’d love to have you contribute to the Rust language. People Who Value Speed and Stability Rust is for people who crave speed and stability in a language.
Programming Rust: Fast, Safe Systems Development by Jim Blandy, Jason Orendorff
bioinformatics, bitcoin, Donald Knuth, Elon Musk, Firefox, mandelbrot fractal, MVC pattern, natural language processing, side project, sorting algorithm, speech recognition, Turing test, type inference, WebSocket
We’ll also cover a wide range of topics that come up naturally as your project grows, including how to document and test Rust code, how to silence unwanted compiler warnings, how to use Cargo to manage project dependencies and versioning, how to publish open source libraries on crates.io, and more.
Crates
Rust programs are made of crates. Each crate is a Rust project: all the source code for a single library or executable, plus any associated tests, examples, tools, configuration, and other junk. For your fern simulator, you might use third-party libraries for 3D graphics, bioinformatics, parallel computation, and so on. These libraries are distributed as crates (see Figure 8-1).
Figure 8-1. A crate and its dependencies
The easiest way to see what crates are and how they work together is to use cargo build with the --verbose flag to build an existing project that has some dependencies. We did this, using “A Concurrent Mandelbrot Program” as our example. The results are shown here:
$ cd mandelbrot
$ cargo clean # delete previously compiled code
$ cargo build --verbose
Updating registry `https://github.com/rust-lang/crates.io-index`
Downloading image v0.6.1
Downloading crossbeam v0.2.9
Downloading gif v0.7.0
Downloading png v0.4.2
...
The Art of UNIX Programming by Eric S. Raymond
A Pattern Language, Albert Einstein, barriers to entry, bioinformatics, Clayton Christensen, combinatorial explosion, commoditize, correlation coefficient, David Brooks, Debian, domain-specific language, don't repeat yourself, Donald Knuth, Everything should be made as simple as possible, facts on the ground, finite state, general-purpose programming language, George Santayana, Innovator's Dilemma, job automation, Larry Wall, MVC pattern, pattern recognition, Paul Graham, peer-to-peer, premature optimization, pre–internet, publish or perish, revision control, RFC: Request For Comment, Richard Stallman, Robert Metcalfe, Steven Levy, transaction costs, Turing complete, Valgrind, wage slave, web application
XHTML, the latest version of HTML, is also an XML application described by a DTD, which explains the family resemblance between XHTML and DocBook tags. The XHTML toolchain consists of Web browsers that can format HTML as flat ASCII, together with any of a number of ad-hoc HTML-to-print utilities. Many other XML DTDs are maintained to help people exchange structured information in fields as diverse as bioinformatics and banking. You can look at a list of repositories to get some idea of the variety available.
The DocBook Toolchain
Normally, what you'll do to make XHTML from your DocBook sources is use the xmlto(1) front end. Your commands will look like this:
bash$ xmlto xhtml foo.xml
bash$ ls *.html
ar01s02.html ar01s03.html ar01s04.html index.html
In this example, you converted an XML-DocBook document named foo.xml with three top-level sections into an index page and two parts.
Hadoop: The Definitive Guide by Tom White
Amazon Web Services, bioinformatics, business intelligence, combinatorial explosion, database schema, Debian, domain-specific language, en.wikipedia.org, fault tolerance, full text search, Grace Hopper, information retrieval, Internet Archive, Kickstarter, linked data, loose coupling, openstreetmap, recommendation engine, RFID, SETI@home, social graph, web application
This would involve sampling page view logs (because the total page view data for a popular website is huge), grouping it by time and then finding the number of new users at different time points via a custom reduce script. This is a good example where both SQL and MapReduce are required for solving the end user problem and something that is possible to achieve easily with Hive. Data analysis Hive and Hadoop can be easily used for training and scoring for data analysis applications. These data analysis applications can span multiple domains such as popular websites, bioinformatics companies, and oil exploration companies. A typical example of such an application in the online ad network industry would be the prediction of what features of an ad make it more likely to be noticed by the user. The training phase typically would involve identifying the response metric and the predictive features. In this case, a good metric to measure the effectiveness of an ad could be its click-through rate.
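To make the response metric in this excerpt concrete, here is a small Python sketch that computes click-through rate (clicks divided by impressions) grouped by a candidate predictive feature. It is a plain in-memory stand-in, not Hive or Hadoop code; the log records, the field names `position`, `has_image`, and `clicked`, and the helper `click_through_rate` are all hypothetical.

```python
from collections import defaultdict

def click_through_rate(impressions, feature):
    """Return {feature value: clicks / impressions} for one ad feature."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for record in impressions:
        key = record[feature]
        shown[key] += 1                # every record is one impression
        if record["clicked"]:
            clicked[key] += 1          # count only the impressions that were clicked
    return {key: clicked[key] / shown[key] for key in shown}

# Made-up impression log: one dictionary per time an ad was displayed.
log = [
    {"position": "top", "has_image": True, "clicked": True},
    {"position": "top", "has_image": False, "clicked": False},
    {"position": "side", "has_image": True, "clicked": False},
    {"position": "side", "has_image": False, "clicked": False},
    {"position": "top", "has_image": True, "clicked": True},
    {"position": "side", "has_image": True, "clicked": True},
]

ctr_by_position = click_through_rate(log, "position")
ctr_by_image = click_through_rate(log, "has_image")
print(ctr_by_position)
print(ctr_by_image)
```

Comparing the rates across values of a feature gives a first hint of which features may be predictive; at the scale the excerpt describes, the same grouping and counting would presumably be expressed as a grouped aggregation over the full log rather than a Python loop.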
The Transhumanist Reader by Max More, Natasha Vita-More
23andMe, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, augmented reality, Bill Joy: nanobots, bioinformatics, brain emulation, Buckminster Fuller, cellular automata, clean water, cloud computing, cognitive bias, cognitive dissonance, combinatorial explosion, conceptual framework, Conway's Game of Life, cosmological principle, data acquisition, discovery of DNA, Douglas Engelbart, Drosophila, en.wikipedia.org, endogenous growth, experimental subject, Extropian, fault tolerance, Flynn Effect, Francis Fukuyama: the end of history, Frank Gehry, friendly AI, game design, germ theory of disease, hypertext link, impulse control, index fund, John von Neumann, joint-stock company, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, Louis Pasteur, Menlo Park, meta analysis, meta-analysis, moral hazard, Network effects, Norbert Wiener, pattern recognition, Pepto Bismol, phenotype, positional goods, prediction markets, presumed consent, Ray Kurzweil, reversible computing, RFID, Ronald Reagan, scientific worldview, silicon-based life, Singularitarianism, social intelligence, stem cell, stochastic process, superintelligent machines, supply-chain management, supply-chain management software, technological singularity, Ted Nelson, telepresence, telepresence robot, telerobotics, the built environment, The Coming Technological Singularity, the scientific method, The Wisdom of Crowds, transaction costs, Turing machine, Turing test, Upton Sinclair, Vernor Vinge, Von Neumann architecture, Whole Earth Review, women in the workforce, zero-sum game
The assertion is that genetic enhancement necessarily implies experimentation without consent and this violates bedrock bioethical principles requiring the protection of human subjects. Consequently, there is an unbridgeable gap which would-be enhancers cannot ethically cross. This view incorporates a rather static view of what it will be possible for future genetic enhancers to know and test beforehand. Any genetic enhancement techniques will first be extensively tested and perfected in animal models. Second, a vastly expanded bioinformatics enterprise will become crucial to understanding the ramifications of proposed genetic interventions (National Resource Center for Cell Analysis). As scientific understanding improves, the risk versus benefit calculations of various prospective genetic enhancements of embryos will shift. The arc of scientific discovery and technological progress strongly suggests that it will happen in the next few decades.
Coders at Work by Peter Seibel
Ada Lovelace, bioinformatics, cloud computing, Conway's Game of Life, domain-specific language, don't repeat yourself, Donald Knuth, fault tolerance, Fermat's Last Theorem, Firefox, George Gilder, glass ceiling, Guido van Rossum, HyperCard, information retrieval, Larry Wall, loose coupling, Marc Andreessen, Menlo Park, Metcalfe's law, Perl 6, premature optimization, publish or perish, random walk, revision control, Richard Stallman, rolodex, Ruby on Rails, Saturday Night Live, side project, slashdot, speech recognition, the scientific method, Therac-25, Turing complete, Turing machine, Turing test, type inference, Valgrind, web application
But we have to be willing to try and take advantage of that, but also take advantage of the integration of systems and the fact that data's coming from everywhere. It's no longer encapsulated with the program, the code. We're seeing now, I think, vast amounts of data, which is accessible. And it's numeric data as well as the informational kinds of data, and will be stored all over the globe, especially if you're working in some of the bioinformatics kind of stuff. And we have to be able to create a platform, probably composed of a lot of parts, which is going to enable those things to come together—computational capability that is probably quite different than we have now. And we also need to, sooner or later, address usability and integrity of these systems. Seibel: Usability from the point of the programmer, or usability for the end users of these systems?
The Stack: On Software and Sovereignty by Benjamin H. Bratton
1960s counterculture, 3D printing, 4chan, Ada Lovelace, additive manufacturing, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, basic income, Benevolent Dictator For Life (BDFL), Berlin Wall, bioinformatics, bitcoin, blockchain, Buckminster Fuller, Burning Man, call centre, carbon footprint, carbon-based life, Cass Sunstein, Celebration, Florida, charter city, clean water, cloud computing, connected car, corporate governance, crowdsourcing, cryptocurrency, dark matter, David Graeber, deglobalization, dematerialisation, disintermediation, distributed generation, don't be evil, Douglas Engelbart, Edward Snowden, Elon Musk, en.wikipedia.org, Eratosthenes, Ethereum, ethereum blockchain, facts on the ground, Flash crash, Frank Gehry, Frederick Winslow Taylor, future of work, Georg Cantor, gig economy, global supply chain, Google Earth, Google Glasses, Guggenheim Bilbao, High speed trading, Hyperloop, illegal immigration, industrial robot, information retrieval, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invisible hand, Jacob Appelbaum, Jaron Lanier, Joan Didion, John Markoff, Joi Ito, Jony Ive, Julian Assange, Khan Academy, liberal capitalism, lifelogging, linked data, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, McMansion, means of production, megacity, megastructure, Menlo Park, Minecraft, MITM: man-in-the-middle, Monroe Doctrine, Network effects, new economy, offshore financial centre, oil shale / tar sands, packet switching, PageRank, pattern recognition, peak oil, peer-to-peer, performance metric, personalized medicine, Peter Eisenman, Peter Thiel, phenotype, Philip Mirowski, Pierre-Simon Laplace, place-making, planetary scale, RAND corporation, recommendation engine, reserve currency, RFID, Robert Bork, Sand Hill Road,
self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley ideology, Slavoj Žižek, smart cities, smart grid, smart meter, social graph, software studies, South China Sea, sovereign wealth fund, special economic zone, spectrum auction, Startup school, statistical arbitrage, Steve Jobs, Steven Levy, Stewart Brand, Stuxnet, Superbowl ad, supply-chain management, supply-chain management software, TaskRabbit, the built environment, The Chicago School, the scientific method, Torches of Freedom, transaction costs, Turing complete, Turing machine, Turing test, undersea cable, universal basic income, urban planning, Vernor Vinge, Washington Consensus, web application, Westphalian system, WikiLeaks, working poor, Y Combinator
This also relates to what Heidegger once called our “confrontation with planetary technology” (an encounter that he never managed to actually make and which most Heideggerians manage to endlessly defer, or “differ”).15 That encounter should be motivated by an invested interest in several “planetary technologies” working at various scales of matter, and based on, in many respects, what cheap supercomputing, broadband networking, and isomorphic data management methodologies make possible to research and application. These include, but are by no means limited to, geology (e.g., geochemistry, geophysics, oceanography, glaciology), earth sciences (e.g., focusing on the atmosphere, lithosphere, biosphere, hydrosphere), as well as the various programs of biotechnology (e.g., bioinformatics, synthetic biology, cell therapy), of nanotechnology (e.g., materials, machines, medicines), of economics (e.g., modeling price, output cycles, disincentivized externalities), of neuroscience (e.g., behavioral, cognitive, clinical), and of astronomy (e.g., astrobiology, extragalactic imaging, cosmology). In that all of these are methodologically and even epistemologically informed by computer science (e.g., algorithmic modeling, macrosensors and microsensors, data structure optimization, information theory, data visualization, cryptography, networked collaboration), then all of these planetary technologies are also planetary computational technologies.
The Quest: Energy, Security, and the Remaking of the Modern World by Daniel Yergin
"Robert Solow", addicted to oil, Albert Einstein, Asian financial crisis, Ayatollah Khomeini, banking crisis, Berlin Wall, bioinformatics, borderless world, BRICs, business climate, carbon footprint, Carmen Reinhart, cleantech, Climategate, Climatic Research Unit, colonial rule, Colonization of Mars, corporate governance, cuban missile crisis, data acquisition, decarbonisation, Deng Xiaoping, Dissolution of the Soviet Union, diversification, diversified portfolio, Elon Musk, energy security, energy transition, Exxon Valdez, facts on the ground, Fall of the Berlin Wall, fear of failure, financial innovation, flex fuel, global supply chain, global village, high net worth, hydraulic fracturing, income inequality, index fund, informal economy, interchangeable parts, Intergovernmental Panel on Climate Change (IPCC), James Watt: steam engine, John von Neumann, Kenneth Rogoff, life extension, Long Term Capital Management, Malacca Straits, market design, means of production, megacity, Menlo Park, Mikhail Gorbachev, Mohammed Bouazizi, mutually assured destruction, new economy, Norman Macrae, North Sea oil, nuclear winter, off grid, oil rush, oil shale / tar sands, oil shock, Paul Samuelson, peak oil, Piper Alpha, price mechanism, purchasing power parity, rent-seeking, rising living standards, Robert Metcalfe, Robert Shiller, Ronald Coase, Ronald Reagan, Sand Hill Road, shareholder value, Silicon Valley, Silicon Valley startup, smart grid, smart meter, South China Sea, sovereign wealth fund, special economic zone, Stuxnet, technology bubble, the built environment, The Nature of the Firm, the new new thing, trade route, transaction costs, unemployed young men, University of East Anglia, uranium enrichment, William Langewiesche, Yom Kippur War
“This is how big the first ears of corn were. We have had agriculture for 10,000 years. We did not know that DNA was the genetic material until 1946. The Green Revolution in the late 1960s was an example of beginning to apply modern biology to plant improvement.”19 Many of the people working in this field are applying the know-how that emerged from the sequencing of the human genome. Calling on the new fields of bioinformatics and computational biology, and using what is called high-throughput experimentation, they seek to identify specific genes and their functions. The aim is to speed up the process of evolution, selecting for characteristics that will make such tall grasses as miscanthus and switchgrass effective energy crops that can grow in marginal lands that would not be cultivated for food. That means selecting for such objectives as speedy growth, accessibility of the sugars, resistance to drought, and lower requirements for fertilizer.
Golden Holocaust: Origins of the Cigarette Catastrophe and the Case for Abolition by Robert N. Proctor
bioinformatics, carbon footprint, clean water, corporate social responsibility, Deng Xiaoping, desegregation, facts on the ground, friendly fire, germ theory of disease, global pandemic, index card, Indoor air pollution, information retrieval, invention of gunpowder, John Snow's cholera map, language of flowers, life extension, New Journalism, optical character recognition, pink-collar, Ponzi scheme, Potemkin village, publication bias, Ralph Nader, Ronald Reagan, selection bias, speech recognition, stem cell, telemarketer, Thomas Kuhn: the structure of scientific revolutions, Triangle Shirtwaist Factory, Upton Sinclair, Yogi Berra
MCV faculty also helped undermine public health advocacy: in 1990 James Kilpatrick from biostatistics, working also as a consultant for the Tobacco Institute, wrote to the editor of the New York Times criticizing Stanton Glantz and William Parmley’s demonstration of thirty-five thousand U.S. cardiovascular deaths per annum from exposure to secondhand smoke.49 Glantz by this time was commonly ridiculed by the industry, which even organized skits (to practice courtroom scenarios) in which health advocates were given thinly disguised names: Glantz was “Ata Glance” or “Stanton Glass, professional anti-smoker”; Alan Blum was “Alan Glum” representing “Doctors Ought to Kvetch” or “Doctors Opposed to People Exhaling Smoke” (DOPES); Richard Daynard was “Richard Blowhard” from the “Product Liability Education Alliance,” and so forth.50 VCU continues even today to have close research relationships with Philip Morris, covering topics as diverse as pharmacogenomics, bioinformatics, and behavioral genetics.51 SYMBIOSIS It would be a mistake to characterize this interpenetration of tobacco and academia as merely a “conflict of interest”; the relationship has been far more symbiotic. We are really talking about a confluence of interests, and sometimes even a virtual identity of interests. The Medical College of Virginia was “sold American” by the early 1940s and remained one of the tobacco industry’s staunchest allies for seven decades.