69 results

Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei


bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, web application

By mining in the gene dimension, we may find patterns shared by multiple genes, or cluster genes into groups. For example, we may find a group of genes that express themselves similarly, which is highly interesting in bioinformatics, such as in finding pathways. When analyzing in the sample/condition dimension, we treat each sample/condition as an object and treat the genes as attributes. In this way, we may find patterns of samples/conditions, or cluster samples/conditions into groups. For example, we may find the differences in gene expression by comparing a group of tumor samples and nontumor samples. Gene expression matrices are popular in bioinformatics research and development. For example, an important task is to classify a new gene using the expression data of the gene and that of other genes in known classes. Symmetrically, we may classify a new sample (e.g., a new patient) using the expression data of the sample and that of samples in known classes (e.g., tumor and nontumor).
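The two viewing directions described in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the matrix values, the gene and sample names, and the helper functions `pearson` and `group_mean` are all invented here and do not come from the book. Pearson correlation and a difference of group means stand in for whatever pattern-mining or clustering method one would actually use.

```python
from math import sqrt

# Hypothetical toy expression matrix (values invented for illustration):
# rows are genes, columns are samples/conditions.
genes = ["g1", "g2", "g3"]
samples = ["tumor_1", "tumor_2", "normal_1", "normal_2"]
expression = [
    [9.0, 8.5, 2.0, 2.5],  # g1: high in the tumor samples
    [8.0, 9.0, 1.5, 2.0],  # g2: profile similar to g1
    [3.0, 3.2, 3.1, 2.9],  # g3: roughly flat across samples
]


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Gene dimension: each gene is an object, the samples are its attributes.
gene_similarity = pearson(expression[0], expression[1])

# Sample dimension: transpose so that each sample is an object and the
# genes are its attributes.
by_sample = [list(column) for column in zip(*expression)]
tumor = [row for name, row in zip(samples, by_sample) if name.startswith("tumor")]
normal = [row for name, row in zip(samples, by_sample) if name.startswith("normal")]


def group_mean(rows, gene_index):
    """Mean expression of one gene over a group of samples."""
    return sum(row[gene_index] for row in rows) / len(rows)


# Per-gene difference in mean expression between the two sample groups.
mean_difference = {
    gene: group_mean(tumor, i) - group_mean(normal, i)
    for i, gene in enumerate(genes)
}
```

In this toy data, g1 and g2 are strongly correlated in the gene dimension, while the sample dimension shows that both are expressed far more in the tumor group than in the nontumor group.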

Every enterprise benefits from collecting and analyzing its data: Hospitals can spot trends and anomalies in their patient records, search engines can do better ranking and ad placement, and environmental and public health agencies can spot patterns and abnormalities in their data. The list continues, with cybersecurity and computer network intrusion detection; monitoring of the energy consumption of household appliances; pattern analysis in bioinformatics and pharmaceutical data; financial and business intelligence data; spotting trends in blogs, Twitter, and many more. Storage is inexpensive and getting even less so, as are data sensors. Thus, collecting and storing data is easier than ever before. The problem then becomes how to analyze the data. This is exactly the focus of this Third Edition of the book. Jiawei, Micheline, and Jian give encyclopedic coverage of all the related methods, from the classic topics of clustering and classification, to database methods (e.g., association rules, data cubes) to more recent and advanced topics (e.g., SVD/PCA, wavelets, support vector machines).

Web mining can help us learn about the distribution of information on the WWW in general, characterize and classify web pages, and uncover web dynamics and the association and other relationships among different web pages, users, communities, and web-based activities. It is important to keep in mind that, in many applications, multiple types of data are present. For example, in web mining, there often exist text data and multimedia data (e.g., pictures and videos) on web pages, graph data like web graphs, and map data on some web sites. In bioinformatics, genomic sequences, biological networks, and 3-D spatial structures of genomes may coexist for certain biological objects. Mining multiple data sources of complex data often leads to fruitful findings due to the mutual enhancement and consolidation of such multiple sources. On the other hand, it is also challenging because of the difficulties in data cleaning and data integration, as well as the complex interactions among the multiple sources of such data.


pages: 362 words: 104,308

Forty Signs of Rain by Kim Stanley Robinson


bioinformatics, business intelligence, double helix, experimental subject, phenotype, prisoner's dilemma, Ronald Reagan, stem cell, the scientific method

It was not a matter of her being warm and fuzzy, as you might expect from the usual characterizations of feminine thought—on the contrary, Anna’s scientific work (she still often coauthored papers in statistics, despite her bureaucratic load) often displayed a finicky perfectionism that made her a very meticulous scientist, a first-rate statistician—smart, quick, competent in a range of fields and really excellent in more than one. As good a scientist as one could find for the rather odd job of running the Bioinformatics Division at NSF, good almost to the point of exaggeration—too precise, too interrogatory—it kept her from pursuing a course of action with drive. Then again, at NSF maybe that was an advantage. In any case she was so intense about it. A kind of Puritan of science, rational to an extreme. And yet of course at the same time that was all such a front, as with the early Puritans; the hyperrational coexisted in her with all the emotional openness, intensity, and variability that was the American female interactional paradigm and social role.

This was a major manifestation of the peer-review process, a process Frank thoroughly approved of—in principle. But a year of it was enough. Anna had been watching him, and now she said, “I suppose it is a bit of a rat race.” “Well, no more than anywhere else. In fact if I were home it’d probably be worse.” They laughed. “And you have your journal work too.” “That’s right.” Frank waved at the piles of typescripts: three stacks for Review of Bioinformatics, two for The Journal of Sociobiology. “Always behind. Luckily the other editors are better at keeping up.” Anna nodded. Editing a journal was a privilege and an honor, even though usually unpaid—indeed, one often had to continue to subscribe to a journal just to get copies of what one had edited. It was another of science’s many noncompensated activities, part of its extensive economy of social credit.

A key to any part of the mystery could be very valuable. Frank scrolled down the pages of the application with practiced speed. Yann Pierzinski, Ph.D. in biomath, Caltech. Still doing postdoc work with his thesis advisor there, a man Frank had come to consider a bit of a credit hog, if not worse. It was interesting, then, that Pierzinski had gone down to Torrey Pines to work on a temporary contract, for a bioinformatics researcher whom Frank didn’t know. Perhaps that had been a bid to escape the advisor. But now he was back. Frank dug into the substantive part of the proposal. The algorithm set was one Pierzinski had been working on even back in his dissertation. Chemical mechanics of protein creation as a sort of natural algorithm, in effect. Frank considered the idea, operation by operation. This was his real expertise; this was what had interested him from childhood, when the puzzles solved had been simple ciphers.


pages: 565 words: 151,129

The Zero Marginal Cost Society: The Internet of Things, the Collaborative Commons, and the Eclipse of Capitalism by Jeremy Rifkin


3D printing, additive manufacturing, Airbnb, autonomous vehicles, back-to-the-land, big-box store, bioinformatics, bitcoin, business process, Chris Urmson, clean water, cleantech, cloud computing, collaborative consumption, collaborative economy, Community Supported Agriculture, computer vision, crowdsourcing, demographic transition, distributed generation, Frederick Winslow Taylor, global supply chain, global village, Hacker Ethic, industrial robot, informal economy, intermodal, Internet of things, invisible hand, Isaac Newton, James Watt: steam engine, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Julian Assange, Kickstarter, knowledge worker, labour mobility, Mahatma Gandhi, manufacturing employment, Mark Zuckerberg, market design, means of production, meta analysis, meta-analysis, natural language processing, new economy, New Urbanism, nuclear winter, Occupy movement, oil shale / tar sands, pattern recognition, peer-to-peer lending, personalized medicine, phenotype, planetary scale, price discrimination, profit motive, RAND corporation, randomized controlled trial, Ray Kurzweil, RFID, Richard Stallman, risk/return, Ronald Coase, search inside the book, self-driving car, shareholder value, sharing economy, Silicon Valley, Skype, smart cities, smart grid, smart meter, social web, software as a service, spectrum auction, Steve Jobs, Stewart Brand, the built environment, The Nature of the Firm, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas L Friedman, too big to fail, transaction costs, urban planning, Watson beat the top human players on Jeopardy!, web application, Whole Earth Catalog, Whole Earth Review, WikiLeaks, working poor, Zipcar

Reducing the cost of electricity in the management of data centers goes hand in hand with cutting the cost of storing data, an ever larger part of the data-management process. And the sheer volume of data is mushrooming faster than the capacity of hard drives to save it. Researchers are just beginning to experiment with a new way of storing data that could eventually drop the marginal cost to near zero. In January 2013 scientists at the European Bioinformatics Institute in Cambridge, England, announced a revolutionary new method of storing massive electronic data by embedding it in synthetic DNA. Two researchers, Nick Goldman and Ewan Birney, converted text from five computer files—which included an MP3 recording of Martin Luther King Jr.’s “I Have a Dream” speech, a paper by James Watson and Francis Crick describing the structure of DNA, and all of Shakespeare’s sonnets and plays—and converted the ones and zeros of digital information into the letters that make up the alphabet of the DNA code.
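The idea of turning "the ones and zeros of digital information into the letters that make up the alphabet of the DNA code" can be illustrated with a deliberately simplified mapping of two bits per base. This is an assumption made for illustration only: the excerpt does not describe the encoding the researchers used, and the table `BITS_TO_BASE` and the functions `encode` and `decode` below are hypothetical names, not their method.

```python
# Hypothetical two-bits-per-base table; not the scheme used by the
# researchers named in the excerpt.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of DNA letters, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Invert encode(): map each letter back to two bits, then to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"I Have a Dream"
strand = encode(message)
assert decode(strand) == message  # the round trip is lossless
```

Each byte becomes four letters, so a file of n bytes maps to a strand of 4n bases in this sketch. A practical scheme would also have to deal with synthesis and sequencing errors, which this example ignores.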

Harvard researcher George Church notes that the information currently stored in all the disk drives in the world could fit in a tiny bit of DNA the size of the palm of one’s hand. Researchers add that DNA information can be preserved for centuries, as long as it is kept in a dark, cool environment.65 At this early stage of development, the cost of reading the code is high and the time it takes to decode information is substantial. Researchers, however, are reasonably confident that an exponential rate of change in bioinformatics will drive the marginal cost to near zero over the next several decades. A near zero marginal cost communication/energy infrastructure for the Collaborative Age is now within sight. The technology needed to make it happen is already being deployed. At present, it’s all about scaling up and building out. When we compare the increasing expenses of maintaining an old Second Industrial Revolution communication/energy matrix of centralized telecommunications and centralized fossil fuel energy generation, whose costs are rising with each passing day, with a Third Industrial Revolution communication/energy matrix whose costs are dramatically shrinking, it’s clear that the future lies with the latter.

Its network of thousands of scientists and plant breeders is continually searching for heirloom and wild seeds, growing them out to increase seed stock, and ferrying samples to the vault for long-term storage.32 In 2010, the trust launched a global program to locate, catalog, and preserve the wild relatives of the 22 major food crops humanity relies on for survival. The intensification of genetic-Commons advocacy comes at a time when new IT and computing technology is speeding up genetic research. The new field of bioinformatics has fundamentally altered the nature of biological research just as IT, computing, and Internet technology did in the fields of renewable-energy generation and 3D printing. According to research compiled by the National Human Genome Research Institute, gene-sequencing costs are plummeting at a rate that exceeds the exponential curves of Moore’s Law in computing power.33 Dr. David Altshuler, deputy director of the Broad Institute of Harvard University and the Massachusetts Institute of Technology, observes that in just the past several years, the price of genetic sequencing has dropped a million fold.34 Consider that the cost of reading one million base pairs of DNA—the human genome contains around three billion pairs—has plunged from $100,000 to just six cents.35 This suggests that the marginal cost of some genetic research will approach zero in the not-too-distant future, making valuable biological data available for free, just like information on the Internet.
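The cost figures quoted in this excerpt can be checked with a little arithmetic. The sketch below uses only the numbers given in the passage ($100,000 and six cents per million base pairs, and a genome of about three billion pairs). The variable names are invented for illustration, and the whole-genome figures are a naive scaling of the per-million price, not a claim about what sequencing a genome actually costs.

```python
# Figures taken from the excerpt above.
OLD_COST_PER_MILLION_BP = 100_000.00  # dollars
NEW_COST_PER_MILLION_BP = 0.06        # dollars ("six cents")
GENOME_BASE_PAIRS = 3_000_000_000     # "around three billion pairs"

# How many times cheaper reading one million base pairs has become.
fold_reduction = OLD_COST_PER_MILLION_BP / NEW_COST_PER_MILLION_BP

# Naive scaling of the per-million price to a whole genome.
millions_of_bp = GENOME_BASE_PAIRS / 1_000_000
old_genome_cost = millions_of_bp * OLD_COST_PER_MILLION_BP
new_genome_cost = millions_of_bp * NEW_COST_PER_MILLION_BP

print(f"{fold_reduction:,.0f}-fold reduction")    # 1,666,667-fold reduction
print(f"${old_genome_cost:,.0f} at the old price")  # $300,000,000 at the old price
print(f"${new_genome_cost:,.2f} at the new price")  # $180.00 at the new price
```

The ratio comes out at roughly 1.7 million, which is consistent with the "million fold" drop cited in the same passage.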


pages: 239 words: 45,926

As the Future Catches You: How Genomics & Other Forces Are Changing Your Work, Health & Wealth by Juan Enriquez


Albert Einstein, Berlin Wall, bioinformatics, borderless world, British Empire, Buckminster Fuller, double helix, global village, half of the world's population has never made a phone call, Howard Rheingold, Jeff Bezos, Joseph Schumpeter, Kevin Kelly, knowledge economy, more computing power than Apollo, new economy, personalized medicine, purchasing power parity, Ray Kurzweil, Richard Feynman, Search for Extraterrestrial Intelligence, SETI@home, Silicon Valley, spice trade, stem cell

The machines and technology coming out of the digital and genetic revolutions may allow people to leverage their mental capacity a thousand … A million … Or a trillionfold. Biology is now driven by applied math … statistics … computer science … robotics … The world’s best programmers are increasingly gravitating toward biology … You will be hearing a lot about two new fields in the coming months … Bioinformatics and Biocomputing. You rarely see bioinformaticians … They are too valuable to companies and universities. Things are moving too fast … And they are too passionate about what they do … To spend a lot of time giving speeches and interviews. But if you go into the bowels of Harvard Medical School … And are able to find the genetics department inside the Warren Alpert Building … (A significant test of intelligence in and of itself … Start by finding the staircase inspired by the double helix … and go past the bathrooms marked XX and XY …) There you can find a small den where George Church hangs out, surrounded by computers.

This is ground zero for a wonderful commune of engineers, physicists, molecular biologists, and physicians …3 And some of the world’s smartest graduate students … Who are trying to make sense of the 100 terabytes of data that come out of gene labs yearly … A task equivalent to trying to sort and use a million new encyclopedias … every year.4 You can’t build enough “wet” labs (labs full of beakers, cells, chemicals, refrigerators) to process and investigate all the opportunities this scale of data generates. The only way for Church & Co. to succeed … Is to force biology to divide … Into theoretical and applied disciplines. Which is why he is one of the founders of bioinformatics … A new discipline that attempts to predict what biologists will find … When they carry out wet-lab experiments in a few months, years, or decades. In a sense, this mirrors Craig Venter’s efforts at The Institute for Genomic Research and Celera. Celera and Church’s labs are information centers … not traditional labs … And a few smart people are going to be able to do … A lot of biology … Very quickly.

THE RULES ARE DIFFERENT IN A KNOWLEDGE ECONOMY … IT’S A SCARY TIME FOR THE ESTABLISHMENT. Countries, regions, governments, and companies that assume they are … And will remain … Dominant … Soon lose their competitive edge. (Particularly those whose leadership ignores or disparages emerging technologies … Remember those old saws: The sun never sets on the British Empire … Vive La France! … All roads lead to Rome … China, the Middle Kingdom.) Which is one of the reasons bioinformatics is so important … And why you should pay attention. What we are seeing is just the beginning of the digital-genomics convergence. When you think of a DNA molecule and its ability to … Carry our complete life code within each of our cells … Accurately copy the code … Billions of times per day … Read and execute life’s functions … Transmit this information across generations … It becomes clear that … The world’s most powerful and compact coding and information-processing system … is a genome.


pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst


algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application

Much of the disruption is fed by improved instrument and sensor technology; for instance, the Large Synoptic Survey Telescope has a 3.2-gigapixel camera and generates over 6 petabytes of image data per year. It is the platform of Big Data that is making such lofty goals attainable. The validation of Big Data analytics can be illustrated by advances in science. The biomedical corporation Bioinformatics recently announced that it has reduced the time it takes to sequence a genome from years to days, and it has also reduced the cost, so it will be feasible to sequence an individual’s genome for $1,000, paving the way for improved diagnostics and personalized medicine. The financial sector has seen how Big Data and its associated analytics can have a disruptive impact on business. Financial services firms are seeing larger volumes through smaller trading sizes, increased market volatility, and technological improvements in automated and algorithmic trading.

Big Data has transformed astronomy from a field in which taking pictures of the sky was a large part of the job to one in which the pictures are all in a database already and the astronomer’s task is to find interesting objects and phenomena in the database. Transformation is taking place in the biological arena as well. There is now a well-established tradition of depositing scientific data into a public repository and of creating public databases for use by other scientists. In fact, there is an entire discipline of bioinformatics that is largely devoted to the maintenance and analysis of such data. As technology advances, particularly with the advent of next-generation sequencing, the size and number of available experimental data sets are increasing exponentially. Big Data has the potential to revolutionize more than just research; the analytics process has started to transform education as well. A recent detailed quantitative comparison of different approaches taken by 35 charter schools in New York City has found that one of the top five policies correlated with measurable academic effectiveness was the use of data to guide instruction.

It may take a significant amount of work to achieve automated error-free difference resolution. The data preparation challenge even extends to analysis that uses only a single data set. Here there is still the issue of suitable database design, further complicated by the many alternative ways in which to store the information. Particular database designs may have certain advantages over others for analytical purposes. A case in point is the variety in the structure of bioinformatics databases, in which information on substantially similar entities, such as genes, is inherently different but is represented with the same data elements. Examples like these clearly indicate that database design is an artistic endeavor that has to be carefully executed in the enterprise context by professionals. When creating effective database designs, professionals such as data scientists must have the tools to assist them in the design process, and more important, they must develop techniques so that databases can be used effectively in the absence of intelligent database design.


pages: 348 words: 39,850

Data Scientists at Work by Sebastian Gutierrez


Albert Einstein, algorithmic trading, bioinformatics, bitcoin, business intelligence, chief data officer, clean water, cloud computing, computer vision, continuous integration, correlation does not imply causation, crowdsourcing, data is the new oil, DevOps, domain-specific language, follow your passion, full text search, informal economy, information retrieval, Infrastructure as a Service, inventory management, iterative process, linked data, Mark Zuckerberg, microbiome, Moneyball by Michael Lewis explains big data, move fast and break things, natural language processing, Network effects, nuclear winter, optical character recognition, pattern recognition, Paul Graham, personalized medicine, Peter Thiel, pre–internet, quantitative hedge fund, quantitative trading / quantitative finance, recommendation engine, Renaissance Technologies, Richard Feynman, self-driving car, side project, Silicon Valley, Skype, software as a service, speech recognition, statistical model, Steve Jobs, stochastic process, technology bubble, text mining, the scientific method, web application

This was what started my fascination with computer science. In addition to computers, as a kid I was also really excited about bioinformatics. I was really interested in the fact that you could take all genetic data, actually fit it into computers, and solve many problems that look unsolvable before that, and reach medical discoveries. You could potentially build a combination of a human and a computer together. I took part in the Technion External Studies program when I was 15 years old, which allowed me to start taking college level classes while still in high school. And once I started studying at the Technion University, this was what I wanted to do—study bioinformatics. Before my studies started, at the age of 14, I went to a research camp. At the camp, each one of us selected research he or she wanted to lead—I chose to perform a research on how natural compounds affect the proliferation of cancer cells, specifically prostate cancer cells.

So Konrad Kording1, a scientist at Northwestern, and I started trying to build models to discover the structure and patterns in this connectomic state. We have a paper that was just sent out for review on exactly this idea of how—given this high-throughput, ambiguous, noisy, sometimes error-filled data—you actually extract out scientific meaning. The analogy here to bioinformatics is really strong. It used to be that a biologist was a biologist. And then we had the rise of genomics as a field, and now you have computational genomics as a field. The entire field of bioinformatics is actually a field where people who are biologists just sit at a computer. They don’t actually touch a wet lab. It became a real independent field partially because of this transition toward the availability of high-quality, high-throughput data. I think neuroscience is going to see a similar transition.

In combination with,, and brick-and-mortar stores, these channels result in a rich data ecosystem that the Data Lab uses to inform business decisions and enhance the customer experience. Shellman’s data science career began with an internship at the National Institutes of Health in the Division of Computational Biosciences. It was here that she initially learned and applied machine learning to uncover patterns in genomic evolution. Following her internship, she completed a Master of Science degree in biostatistics and a doctoral degree in bioinformatics both from the University of Michigan in Ann Arbor. While at the University of Michigan, Shellman collaborated frequently and analyzed many types of heterogeneous biological data including gene expression microarrays, metabolomics, network graphs, and clinical time-series. A frequent speaker and teacher, Shellman has presented at conferences such as Strata and the Big Data Congress II, and also speaks regularly at meet-ups and gatherings in the Seattle technology community.


pages: 287 words: 86,919

Protocol: how control exists after decentralization by Alexander R. Galloway


Ada Lovelace, airport security, Berlin Wall, bioinformatics, Bretton Woods, computer age, Craig Reynolds: boids flock, discovery of DNA, double helix, Douglas Engelbart, easy for humans, difficult for computers, Fall of the Berlin Wall, Grace Hopper, Hacker Ethic, informal economy, John Conway, Kevin Kelly, late capitalism, linear programming, Marshall McLuhan, means of production, Menlo Park, mutually assured destruction, Norbert Wiener, packet switching, phenotype, post-industrial society, profit motive, QWERTY keyboard, RAND corporation, Ray Kurzweil, RFC: Request For Comment, Richard Stallman, semantic web, SETI@home, stem cell, Steve Crocker, Steven Levy, Stewart Brand, Ted Nelson, telerobotics, the market place, theory of mind, urban planning, Vannevar Bush, Whole Earth Review, working poor

This dual property (regulated flow) is central to Protocol’s analysis of the Internet as a political technology. Isomorphic Biopolitics As a final comment, it is worthwhile to note that the concept of “protocol” is related to a biopolitical production, a production of the possibility for experience in control societies. It is in this sense that Protocol is doubly materialist—in the sense of networked bodies inscribed by informatics, and in the sense of this bio-informatic network producing the conditions of experience. The biopolitical dimension of protocol is one of the parts of this book that opens onto future challenges. As the biological and life sciences become more and more integrated with computer and networking technology, the familiar line between the body and technology, between biologies and machines, begins to undergo a set of transformations. “Populations” defined nationally or ethnically are also defined informatically.

(Witness the growing business of population genomics.) Individual subjects are not only civil subjects, but also medical subjects for a medicine increasingly influenced by genetic science. The ongoing research and clinical trials in gene therapy, regenerative medicine, and genetic diagnostics reiterate the notion of the biomedical subject as being in some way amenable to a database. In addition to this bio-informatic encapsulation of individual and collective bodies, the transactions and economies between bodies are also being affected. Research into stem cells has ushered in a new era of molecular bodies that not only are self-generating like a reservoir (a new type of tissue banking), but that also create a tissue economy of potential biologies (lab-grown tissues and organs). Such biotechnologies often seem more science fiction than science, and indeed health care systems are far from fully integrating such emerging research into routine medical practice.

If layering is dependent upon portability, then portability is in turn enabled by the existence of ontology standards. These are some of the sites that Protocol opens up concerning the possible relations between information and biological networks. While the concept of biopolitics is often used at its most general level, Protocol asks us to respecify biopolitics in the age of biotechnology and bioinformatics. Thus one site of future engagement is in the zones where info-tech and bio-tech intersect. The “wet” biological body has not simply been superseded by “dry” computer code, just as the wet body no longer accounts for the virtual body. Biotechnologies of all sorts demonstrate this to us—in vivo tissue engineering, ethnic genome projects, gene-finding software, unregulated genetically modified foods, portable DNA diagnostics kits, and distributed proteomic computing.


pages: 381 words: 78,467

100 Plus: How the Coming Age of Longevity Will Change Everything, From Careers and Relationships to Family And by Sonia Arrison


23andMe, 8-hour work day, Albert Einstein, Anne Wojcicki, artificial general intelligence, attribution theory, Bill Joy: nanobots, bioinformatics, Clayton Christensen, dark matter, East Village, epigenetics, Frank Gehry, Googley, income per capita, indoor plumbing, Jeff Bezos, Johann Wolfgang von Goethe, Law of Accelerating Returns, life extension, personalized medicine, Peter Thiel, placebo effect, post scarcity, Ray Kurzweil, rolodex, Silicon Valley, Simon Kuznets, Singularitarianism, smart grid, speech recognition, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Levy, Thomas Malthus, upwardly mobile, World Values Survey, X Prize

SU’s mission is practical: “to assemble, educate and inspire leaders who strive to understand and facilitate the development of exponentially advancing technologies in order to address humanity’s grand challenges.”20 The academic tracks are geared toward understanding how fast-moving technologies can work together, and more than half of them have a direct impact on the field of longevity research. These tracks include AI and robotics; nanotechnology, networks, and computing systems; biotechnology and bioinformatics; medicine and neuroscience; and futures studies and forecasting.21 SU is a place where mavens speak to those who are superfocused on changing the world for the better. It is no surprise, then, that it also functions as an institutional “connector”—the third component needed to successfully spread a game-changing meme. CONNECT ME Peter Diamandis always seems to be on the phone or leaving a meeting to get on a phone call.

Craig Venter, and the Human Genome Project, an international public consortium backed with around $3 billion U.S. tax dollars.54 Both President Bill Clinton and Prime Minister of Britain Tony Blair presided over the press conference announcing that humanity now possessed “the genetic blueprint for human beings.”55 President Clinton proudly told the world that the capacity to sequence human genomes “will revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases.”56 This new ability to look at the “source code” of humans particularly resonated with computer experts in Silicon Valley and around the world who spend much of their time designing code for computers. If the source code of humans can be identified, then it is not that much of a leap to think about re-engineering it. Suddenly, biology became a field that computer geeks could attempt to tackle, which not only resulted in smart biohackers forming do-it-yourself biology clubs, but also increased the pace of advances in biology. Bioinformatics are moving at the speed of Moore’s Law and sometimes faster. To the extent that wealthy technology moguls influence public opinion and hackers seem cool, the context for the longevity meme is sizzling hot. In a Wired magazine interview in April 2010, Bill Gates, America’s richest man, told reporter Steven Levy that if he were a teenager today, “he’d be hacking biology.”57 Gates elaborated, saying, “Creating artificial life with DNA synthesis, that’s sort of the equivalent of machine-language programming.”

Policy makers, activists, journalists, educators, investors, philanthropists, analysts, entrepreneurs, and a whole host of others need to come together to fight for their lives. We now know that aging is plastic and that humanity’s time horizons are not set in stone. Larry Ellison, Bill Gates, Peter Thiel, Jeff Bezos, Larry Page, Sergey Brin, and Paul Allen have all recognized the wealth of opportunity in the bioinformatics revolution, but this is not enough. Other heroes must come forward—perhaps there is even one reading this sentence right now. The goal is more healthy time, which, as we have seen throughout this book, will lead to greater wealth and prospects for happiness. A longer health span means more time to enjoy the wonders of life, including relationships with family and friends, career building, knowledge seeking, adventure, and exploration.


pages: 350 words: 96,803

Our Posthuman Future: Consequences of the Biotechnology Revolution by Francis Fukuyama


Albert Einstein, Berlin Wall, bioinformatics, Columbine, demographic transition, Fall of the Berlin Wall, Flynn Effect, Francis Fukuyama: the end of history, impulse control, life extension, Menlo Park, meta analysis, meta-analysis, out of africa, Peter Singer: altruism, phenotype, presumed consent, Ray Kurzweil, Scientific racism, stem cell, Steven Pinker, The Bell Curve by Richard Herrnstein and Charles Murray, Turing test

Beyond genomics lies the burgeoning field of proteomics, which seeks to understand how genes code for proteins and how the proteins themselves fold into the exquisitely complex shapes required by cells.2 And beyond proteomics there lies the unbelievably complex task of understanding how these molecules develop into tissues, organs, and complete human beings. The Human Genome Project would not have been possible without parallel advances in the information technology required to record, catalog, search, and analyze the billions of bases making up human DNA. The merger of biology and information technology has led to the emergence of a new field, known as bioinformatics.3 What will be possible in the future will depend heavily on the ability of computers to interpret the mind-boggling amounts of data generated by genomics and proteomics and to build reliable models of phenomena such as protein folding. The simple identification of genes in the genome does not mean that anyone knows what it is they do. A great deal of progress has been made in the past two decades in finding the genes connected to cystic fibrosis, sickle-cell anemia, Huntington’s chorea, Tay-Sachs disease, and the like.

Schlesinger, Jr.’s, Cycles of American History (Boston: Houghton Mifflin, 1986); see also William Strauss and Neil Howe, The Fourth Turning: An American Prophecy (New York: Broadway Books, 1997). 22 Kirkwood (1999), pp. 131–132. 23 Michael Norman, “Living Too Long,” The New York Times Magazine, January 14, 1996, pp. 36–38. 24 Kirkwood (1999), p. 238. 25 On the evolution of human sexuality, see Donald Symons, The Evolution of Human Sexuality (Oxford: Oxford University Press, 1979). CHAPTER 5: GENETIC ENGINEERING 1 On the history of the Human Genome Project, see Robert Cook-Deegan, The Gene Wars: Science, Politics, and the Human Genome (New York: W. W. Norton, 1994); Kathryn Brown, “The Human Genome Business Today,” Scientific American 283 (July 2000): 50–55; and Kevin Davies, Cracking the Genome: Inside the Race to Unlock Human DNA (New York: Free Press, 2001). 2 Carol Ezzell, “Beyond the Human Genome,” Scientific American 283, no. 1 (July 2000): 64–69. 3 Ken Howard, “The Bioinformatics Gold Rush,” Scientific American 283, no. 1 (July 2000): 58–63. 4 Interview with Stuart A. Kauffman, “Forget In Vitro—Now It’s ‘In Silico,’” Scientific American 283, no. 1 (July 2000): 62–63. 5 Gina Kolata, “Genetic Defects Detected in Embryos Just Days Old,” The New York Times, September 24, 1992, p. A1. 6 Lee M. Silver, Remaking Eden: Cloning and Beyond in a Brave New World (New York: Avon, 1998), pp. 233–247. 7 Ezzell (2000). 8 For Wilmut’s own account of this accomplishment, see Ian Wilmut, Keith Campbell, and Colin Tudge, The Second Creation: Dolly and the Age of Biological Control (New York: Farrar, Straus and Giroux, 2000). 9 National Bioethics Advisory Commission, Cloning Human Beings (Rockville, Md.: National Bioethics Advisory Commission, 1997). 10 Margaret Talbot, “A Desire to Duplicate,” The New York Times Magazine, February 4, 2001, pp. 40–68; Brian Alexander, “(You)2,” Wired, February 2001, 122–135. 
11 Glenn McGee, The Perfect Baby: A Pragmatic Approach to Genetics (Lanham, Md.: Rowman and Littlefield, 1997). 12 For an overview of the present state of human germ-line engineering, see Gregory Stock and John Campbell, eds., Engineering the Human Germline: An Exploration of the Science and Ethics of Altering the Genes We Pass to Our Children (New York: Oxford University Press, 2000); Marc Lappé, “Ethical Issues in Manipulating the Human Germ Line,” in Peter Singer and Helga Kuhse, eds., Bioethics: An Anthology (Oxford: Blackwell, 1999), p. 156; and Mark S.

Heidegger, Martin. Basic Writings. New York: Harper and Row, 1957. High, Jack, and Clayton A. Coppin. The Politics of Purity: Harvey Washington Wiley and the Origins of Federal Food Policy. Ann Arbor, Mich.: University of Michigan Press, 1999. Hirschi, Travis, and Michael Gottfredson. A General Theory of Crime. Stanford, Calif.: Stanford University Press, 1990. Howard, Ken. “The Bioinformatics Gold Rush.” Scientific American 283, no. 1 (July 2000): 58–63. Hrdy, Sarah B., and Glenn Hausfater. Infanticide: Comparative and Evolutionary Perspectives. New York: Aldine Publishing, 1984. Hubbard, Ruth. The Politics of Women’s Biology. New Brunswick, N.J.: Rutgers University Press, 1990. Huber, Peter. Orwell’s Revenge: The 1984 Palimpsest. New York: Free Press, 1994. Hull, Terence H.


pages: 285 words: 78,180

Life at the Speed of Light: From the Double Helix to the Dawn of Digital Life by J. Craig Venter


Albert Einstein, Alfred Russel Wallace, Barry Marshall: ulcers, bioinformatics, borderless world, Brownian motion, clean water, discovery of DNA, double helix, epigenetics, experimental subject, Isaac Newton, Islamic Golden Age, John von Neumann, Louis Pasteur, Mars Rover, Mikhail Gorbachev, phenotype, Richard Feynman, stem cell, the scientific method, Thomas Kuhn: the structure of scientific revolutions, Turing machine

They believed that the then-unprecedented amount of molecular information available for a wide range of model organisms would yield vivid new insights into intracellular molecular processes that could, if simulated in a computer, enable them to predict the dynamic behavior of living cells. Within a computer it would be possible to explore the functions of proteins, protein–protein interactions, protein–DNA interactions, regulation of gene expression, and other features of cellular metabolism. In other words, a virtual cell could provide a new perspective on both the software and hardware of life. In the spring of 1996 Tomita and his students at the Laboratory for Bioinformatics at Keio started investigating the molecular biology of Mycoplasma genitalium (which we had sequenced in 1995) and by the end of that year had established the E-Cell Project. The Japanese team had constructed a model of a hypothetical cell with only 127 genes, which were sufficient for transcription, translation, and energy production. Most of the genes that they used were taken from Mycoplasma genitalium.

Currently Novartis and other vaccine companies rely on the World Health Organization to identify and distribute the seed viruses. To speed up the process we are using a method called “reverse vaccinology,” which was first applied to the development of a meningococcal vaccine by Rino Rappuoli, now at Novartis. The basic idea is that the entire pathogenic genome of an influenza virus can be screened using bioinformatic approaches to identify and analyze its genes. Next, particular genes are selected for attributes that would make good vaccine targets, such as outer-membrane proteins. Those proteins then undergo normal testing for immune responses. My team has sequenced genes representing the diversity of influenza viruses that have been encountered since 2005. We have sequenced the complete genomes of a large collection of human influenza isolates, as well as a select number of avian and other non-human influenza strains relevant to the evolution of viruses with pandemic potential, and made the information publicly available.

Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-physikalische Klasse, Fachgruppe VI, Biologie, Neue Folge 1, no. 13 (1935): pp. 189–245. 5. Richard Dawkins. River Out of Eden (New York: Basic Books, 1995). 6. Motoo Kimura. “Natural selection as the process of accumulating genetic information in adaptive evolution.” Genetical Research 2 (1961): pp. 127–40. 7. Sydney Brenner. “Life’s code script.” Nature 482 (February 23, 2012): p. 461. 8. W. J. Kress and D. L. Erickson. “DNA barcodes: Genes, genomics, and bioinformatics.” Proceedings of the National Academy of Sciences 105, no. 8 (2008): pp. 2761–62. 9. Lulu Qian and Erik Winfree. “Scaling up digital circuit computation with DNA strand displacement cascades.” Science 332, no. 6034 (June 3, 2011): pp. 1196–201. 10. George M. Church, Yuan Gao, and Sriram Kosuri. “Next-generation digital information storage in DNA.” Science 337, no. 6102 (September 28, 2012): p. 1628. 11.


ucd-csi-2011-02 by Unknown


bioinformatics, pattern recognition, The Wisdom of Crowds

The main contribution of this work is to present the notion of bipolarity that captures the level of conflict between the contributors to a page. Thus the work is more directed at the problem of Wikipedia vandalism than the issue of authoritativeness that is the subject of this paper. 3 Extracting and Comparing Network Motif Profiles The idea of characterizing networks in terms of network motif profiles is well established and has had a considerable impact in bioinformatics [10]. Our objective is to characterize Wikipedia pages in terms of network motif profiles and then examine whether or not different pages have characteristic network motif profiles. The datasets we considered were entries in the English language Wikipedia 2 on famous sociologists and footballers in the English Premiership 4 (see Table 1). The first step in the analysis is to identify a set of network motifs to use. 3.1 Wikipedia Network Motifs Our Wikipedia network motifs comprise author and page nodes and author-page (AP) and page-page (PP) edges (see Figures 3 and 4).


pages: 523 words: 148,929

Physics of the Future: How Science Will Shape Human Destiny and Our Daily Lives by the Year 2100 by Michio Kaku


agricultural Revolution, AI winter, Albert Einstein, augmented reality, Bill Joy: nanobots, bioinformatics, blue-collar work, British Empire, Brownian motion, cloud computing, Colonization of Mars, DARPA: Urban Challenge, delayed gratification, double helix, Douglas Hofstadter, friendly AI, Gödel, Escher, Bach, hydrogen economy, I think there is a world market for maybe five computers, industrial robot, invention of movable type, invention of the telescope, Isaac Newton, John von Neumann, life extension, Louis Pasteur, Mahatma Gandhi, Mars Rover, megacity, Murray Gell-Mann, new economy, oil shale / tar sands, optical character recognition, pattern recognition, planetary scale, postindustrial economy, Ray Kurzweil, refrigerator car, Richard Feynman, Rodney Brooks, Ronald Reagan, Search for Extraterrestrial Intelligence, Silicon Valley, Simon Singh, speech recognition, stem cell, Stephen Hawking, Steve Jobs, telepresence, The Wealth of Nations by Adam Smith, Thomas L Friedman, Thomas Malthus, trade route, Turing machine, uranium enrichment, Vernor Vinge, Wall-E, Walter Mischel, Whole Earth Review, X Prize

I imagine in the near future, many people will have the same strange feeling I did, holding the blueprint of their bodies in their hands and reading the intimate secrets, including dangerous diseases, lurking in the genome and the ancient migration patterns of their ancestors. But for scientists, this is opening an entirely new branch of science, called bioinformatics, or using computers to rapidly scan and analyze the genome of thousands of organisms. For example, by inserting the genomes of several hundred individuals suffering from a certain disease into a computer, one might be able to calculate the precise location of the damaged DNA. In fact, some of the world’s most powerful computers are involved in bioinformatics, analyzing millions of genes found in plants and animals for certain key genes. This could even revolutionize TV detective shows like CSI. Given tiny scraps of DNA (found in hair follicles, saliva, or bloodstains), one might be able to determine not just the person’s hair color, eye color, ethnicity, height, and medical history, but perhaps also his face.

See Robotics/­AI Artificial vision Artsutanov, Yuri ASIMO robot, 2.­1, 2.­2, 2.­3 Asimov, Isaac, 2.­1, 6.­1, 8.­1 ASPM gene Asteroid landing Atala, Anthony Atomic force microscope Augmented reality Augustine Commission report, 6.­1, 6.­2 Avatar (movie), 1.­1, 2.­1, 6.­1, 7.­1 Avatars Backscatter X-­rays Back to the Future movies, 5.­1, 5.­2 Badylak, Stephen Baldwin, David E.­ Baltimore, David, 1.­1, 3.­1, 3.­2, 3.­3 Benford, Gregory Big bang research Binnig, Gerd Bioinformatics Biotechnology. See Medicine/­biotechnology Birbaumer, Niels Birth control Bismarck, Otto von Blade Runner (movie) Blue Gene computer Blümich, Bernhard, 1.­1, 1.­2 Boeing Corporation Booster-­rocket technologies Bova, Ben, 5.­1, 5.­2 Boys from Brazil, The (movie) Brain artificial body parts, adaptation to basic structure of emotions and growing a human brain Internet contact lenses and locating every neuron in as neural network parallel processing in reverse engineering of simulations of “­Brain drain”­ to the United States BrainGate device Brain injuries, treatment for Branson, Richard Brave New World (Huxley) Breast cancer Breazeal, Cynthia Brenner, Sydney Brooks, Rodney, 2.­1, 2.­2, 4.­1 Brown, Dan Brown, Lester Buckley, William F.­

See also Intellectual capitalism Carbon nanotubes, 4.­1, 6.­1 Carbon sequestration Cars driverless electric maglev, 5.­1, 9.­1 Cascio, Jamais Catoms Cave Man Principle biotechnology and computer animations and predicting the future and replicators and, 4.­1, 4.­2 robotics/AI and, 2.­1, 2.­2 sports and Cerf, Vint, 4.­1, 6.­1 Chalmers, David Charles, Prince of Wales Chemotherapy Chernobyl nuclear accident Chevy Volt Chinese Empire, 7.­1, 7.­2 Church, George Churchill, Winston, itr.­1, 8.­1 Cipriani, Christian Civilizations alien civilizations characteristics of various Types entropy and information processing and resistance to Type I civilization rise and fall of great empires rise of civilization on Earth science and wisdom, importance of transition from Type 0 to Type I, itr.­1, 8.­1, 8.­2 Type II civilizations, 8.­1, 8.­2, 8.­3 Type III civilizations, 8.­1, 8.­2 waste heat and Clarke, Arthur C.­ Clausewitz, Carl von Cloning, 3.­1, 3.­2 Cloud computing, 1.­1, 7.­1 Cochlear implants Code breaking Collins, Francis Comets Common sense, 2.­1, 2.­2, 2.­3, 7.­1, 7.­2 Computers animations created by augmented reality bioinformatics brain simulations carbon nanotubes and cloud computing, 1.­1, 7.­1 digital divide DNA computers driverless cars exponential growth of computer power (Moore’s law), 1.­1, 1.­2, 1.­3, 4.­1 fairy tale life and far future (2070) four stages of technology and Internet glasses and contact lenses, 1.­1, 1.­2 medicine and midcentury (2030) mind control of molecular and atomic transistors nanotechnology and near future (present to 2030) optical computers parallel processing physics of computer revolution quantum computers quantum dot computers quantum theory and, 1.­1, 4.­1, 4.­2, 4.­3 scrap computers self-­assembly and silicon chips, limitations of, 1.­1, 1.­2, 4.­1 telekinesis with 3-­D technology universal translators virtual reality wall screens See also Mind reading; Robotics/­AI Condorcet, Marquis de Conscious robots, 2.­1, 2.­2 Constellation 
Program COROT satellite, 6.­1, 8.­1 Crick, Francis Criminology Crutzen, Paul Culture in Type I civilization Customization of products Cybertourism, itr.­1, itr.­2 CYC project Damasio, Antonio Dating in 2100, 9.­1, 9.­2, 9.­3, 9.­4 Davies, Stephen Da Vinci robotic system Dawkins, Richard, 3.­1, 3.­2, 3.­3 Dawn computer Dean, Thomas Decoherence problem Deep Blue computer, 2.­1, 2.­2, 2.­3 Delayed gratification DEMO fusion reactor Depression treatments Designer children, 3.­1, 3.­2, 3.­3 Developing nations, 7.­1, 7.­2 Diamandis, Peter Dictatorships Digital divide Dinosaur resurrection Disease, elimination of, 3.­1, 8.­1 DNA chips DNA computers Dog breeds Donoghue, John, 1.­1, 1.­2 Dreams, photographing of Drexler, Eric Driverless cars Duell, Charles H.­


pages: 560 words: 158,238

Fifty Degrees Below by Kim Stanley Robinson


airport security, bioinformatics, Burning Man, clean water, Donner party, full employment, invisible hand, iterative process, means of production, minimum wage unemployment, North Sea oil, Ralph Waldo Emerson, Richard Feynman, statistical model, Stephen Hawking, the scientific method

He wanted to talk to everyone implicated in this: Yann Pierzinski—meaning Marta too, which would be hard, terrible in fact, but Marta had moved to Atlanta with Yann and they lived together there, so there would be no avoiding her. And then Francesca Taolini, who had arranged for Yann’s hire by a company she consulted for, in the same way Frank had hoped to. Did she suspect that Frank had been after Yann? Did she know how powerful Yann’s algorithm might be? He googled her. Turned out, among many interesting things, that she was helping to chair a conference at MIT coming soon, on bioinformatics and the environment. Just the kind of event Frank might attend. NSF even had a group going already, he saw, to talk about the new federal institutes. Meet with her first, then go to Atlanta to meet with Yann—would that make his stock in the virtual market rise, triggering more intense surveillance? An unpleasant thought; he grimaced. He couldn’t evade most of this surveillance. He had to continue to behave as if it wasn’t happening.

What the hell was that, after all? And how would you measure it? So at work Anna spent her time trying to concentrate, over a persistent underlying turmoil of worry about her younger son. Work was absorbing, as always, and there was more to do than there was time to do it in, as always. And so it provided its partial refuge. But it was harder to dive in, harder to stay under the surface in the deep sea of bioinformatics. Even the content of the work reminded her, on some subliminal level, that health was a state of dynamic balance almost inconceivably complex, a matter of juggling a thousand balls while unicycling on a tightrope over the abyss—in a gale—at night—such that any life was an astonishing miracle, brief and tenuous. But enough of that kind of thinking! Bear down on the fact, on the moment and the problem of the moment!

Take a problem, break it down into parts (analyze), quantify whatever parts you could, see if what you learned suggested anything about causes and effects; then see if this suggested anything about long-term plans, and tangible things to do. She did not believe in revolution of any kind, and only trusted the mass application of the scientific method to get any real-world results. “One step at a time,” she would say to her team in bioinformatics, or Nick’s math group at school, or the National Science Board; and she hoped that as long as chaos did not erupt worldwide, one step at a time would eventually get them to some tolerable state. Of course there were all the hysterical operatics of “history” to distract people from this method and its incremental successes. The wars and politicians, the police state regimes and terrorist insurgencies, the gross injustices and cruelties, the unnecessarily ongoing plagues and famines—in short, all the mass violence and rank intimidation that characterized most of what filled the history books; all that was real enough, indeed all too real, undeniable—and yet it was not the whole story.


pages: 588 words: 131,025

The Patient Will See You Now: The Future of Medicine Is in Your Hands by Eric Topol


23andMe, 3D printing, Affordable Care Act / Obamacare, Anne Wojcicki, Atul Gawande, augmented reality, bioinformatics, call centre, Clayton Christensen, clean water, cloud computing, computer vision, conceptual framework, connected car, correlation does not imply causation, crowdsourcing, dark matter, data acquisition, disintermediation, don't be evil, Edward Snowden, Elon Musk, Erik Brynjolfsson, Firefox, global village, Google Glasses, Google X / Alphabet X, Ignaz Semmelweis: hand washing, interchangeable parts, Internet of things, Isaac Newton, job automation, Joseph Schumpeter, Julian Assange, Kevin Kelly, license plate recognition, Lyft, Mark Zuckerberg, Marshall McLuhan, meta analysis, meta-analysis, microbiome, Nate Silver, natural language processing, Network effects, Nicholas Carr, obamacare, pattern recognition, personalized medicine, phenotype, placebo effect, RAND corporation, randomized controlled trial, Second Machine Age, self-driving car, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, Snapchat, social graph, speech recognition, stealth mode startup, Steve Jobs, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Uber for X, Watson beat the top human players on Jeopardy!, X Prize

Indeed, the state of California, which has the largest prenatal screening program in the world, with more than four hundred thousand expectant mothers assessed annually, already provides these tests to all pregnant women who have increased risk.26 Of course, we could also sequence the fetus’s entire genome instead of just doing the simpler screens. While that is not a commercially available test, and there are substantial bioinformatic challenges that lie ahead before it could be scalable, the anticipatory bioethical issues that this engenders are considerable.27 We are a long way off for determining what would constitute acceptable genomic criteria for early termination of pregnancy, since this not only relies on accurately determining a key genomic variant linked to a serious illness, but also understanding whether this condition would actually manifest.

Now it is possible to use sequencing to unravel the molecular diagnosis of an unknown condition, and the chances for success are enhanced when there is DNA from the mother and father, or other relatives, to use for anchoring and comparative sequencing analysis. At several centers around the country, the success rate for making the diagnosis ranges between 25 percent and 50 percent. It requires considerable genome bioinformatic expertise, for a trio of individuals will generate around 750 billion data points (six billion letters per sequence, three people, each done forty times to assure accuracy). Of course, just making the diagnosis is not the same as coming up with an effective treatment or a cure. But there have been some striking anecdotal examples of children whose lives were saved or had dramatic improvement.
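The parenthetical figures in this passage can be multiplied out as a quick check. This is a minimal sketch that takes the three numbers exactly as quoted; the variable names are illustrative and not from the book.

```python
# Back-of-the-envelope check of the trio sequencing figure quoted above.
letters_per_genome = 6_000_000_000  # "six billion letters per sequence"
people_in_trio = 3                  # the child plus two relatives, e.g. both parents
coverage = 40                       # "each done forty times to assure accuracy"

data_points = letters_per_genome * people_in_trio * coverage
print(f"{data_points:,} data points")  # 720,000,000,000 data points
```

The product is 720 billion, which is in line with the passage's rounded "around 750 billion".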

The most far-reaching component of the molecular stethoscope appears to be cell-free RNA, which can potentially be used to monitor any organ of the body.82 Previously that was unthinkable in a healthy person. How could one possibly conceive of doing a brain or liver biopsy in someone as part of a normal checkup? Using high-throughput sequencing of cell-free RNA in the blood, and sophisticated bioinformatic methods to analyze this data, Stephen Quake and his colleagues at Stanford were able to show it is possible to follow the gene expression from each of the body’s organs from a simple blood sample. And that is changing all the time in each of us. This is an ideal case for deep learning to determine what these dynamic genomic signatures mean, to determine what can be done to change the natural history of a disease in the making, and to develop the path for prevention.


pages: 405 words: 117,219

In Our Own Image: Savior or Destroyer? The History and Future of Artificial Intelligence by George Zarkadakis


3D printing, Ada Lovelace, agricultural Revolution, Airbnb, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, anthropic principle, Asperger Syndrome, autonomous vehicles, barriers to entry, battle of ideas, Berlin Wall, bioinformatics, British Empire, business process, carbon-based life, cellular automata, Claude Shannon: information theory, combinatorial explosion, complexity theory, continuous integration, Conway's Game of Life, cosmological principle, dark matter, dematerialisation, double helix, Douglas Hofstadter, Edward Snowden, epigenetics, Flash crash, Google Glasses, Gödel, Escher, Bach, income inequality, index card, industrial robot, Internet of things, invention of agriculture, invention of the steam engine, invisible hand, Isaac Newton, Jacquard loom, Jacques de Vaucanson, James Watt: steam engine, job automation, John von Neumann, Joseph-Marie Jacquard, millennium bug, natural language processing, Norbert Wiener, On the Economy of Machinery and Manufactures, packet switching, pattern recognition, Paul Erdős, post-industrial society, prediction markets, Ray Kurzweil, Rodney Brooks, Second Machine Age, self-driving car, Silicon Valley, speech recognition, stem cell, Stephen Hawking, Steven Pinker, strong AI, technological singularity, The Coming Technological Singularity, the scientific method, theory of mind, Turing complete, Turing machine, Turing test, Tyler Cowen: Great Stagnation, Vernor Vinge, Von Neumann architecture, Watson beat the top human players on Jeopardy!, Y2K

At the same time, the computer metaphor frames our way of thinking, and how we communicate the fundamental ideas of our time. We speak of the brain as the ‘hardware’ and of the mind as the ‘software’. This dualistic software–hardware paradigm is applied across many fields, including life itself. Cells are the ‘computers’ that run a ‘program’ called the genetic code, or genome. The ‘code’ is written on the DNA. Cutting-edge research in biology does not take place in vitro in a wet lab, but in silico in a computer. Bioinformatics – the accumulation, tagging, storing, manipulation and mining of digital biological data – is the present, and future, of biology research. The computer metaphor for life is reinforced by its apparently successful application to real problems. Many disruptive new technologies in molecular biology – for instance ‘DNA printing’ – function on the basis of digital information. This is how they do it: DNA is a molecule formed by two sets of base pairs: adenine-thymine (A-T) and guanine-cytosine (G-C).
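The A-T and G-C pairing rule mentioned at the end of this passage is what lets a DNA strand be treated as digital text: given one strand, the other is fully determined. The sketch below is only an illustration of that rule; the function names and the sample string are assumptions, not taken from the book.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]


print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
```

A character outside A, C, G, T raises `KeyError` here; a real pipeline would have to decide how to treat ambiguity codes such as N.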

Thanks to digital data and ever-accelerating computer power we are at the cusp of an era in which we can gain unprecedented insights into natural phenomena, the human body, markets, Earth’s climate, ecosystems, energy grids, and just about everything in between. Norbert Wiener’s cybernetic dream is slowly becoming a reality: the more information we have about systems, the more control we can exercise over them with the help of our computers. Big data are our newfound economic bounty. The big data economy In 2010, I took a contract as External Relations Officer at the European Bioinformatics Institute (EBI) at Hinxton, Cambridge. The Institute is part of the intergovernmental European Molecular Biology Laboratory, and its core mission is to provide an infrastructure for the storage and manipulation of biological data. This is the data that researchers in the life sciences produce every day, including information about the genes of humans and of other species, chemical molecules that might provide the basis for new therapies, proteins, and also about research findings in general.

At the time that I worked for them, EBI’s challenge was to increase the capacity of its infrastructure in order to accommodate this ‘data deluge’. As someone who facilitated communications between the Institute and potential government funders across Europe, I had first-hand experience of the importance that governments placed on biological data. Almost everyone understood the potential for driving innovation through this data, and was ready to support the expansion of Europe’s bioinformatics infrastructure, even as Europe was going through the Great Recession. The message was simple and clear: whoever owned the data owned the future. Governments and scientists are not the only ones to have jumped on the bandwagon of big data. The advent of social media and Google Search has transformed the marketing operations of almost every business in the world, big and small. Tools have been developed to ‘mine’ the text written by billions of people on Facebook and Twitter, in order to measure sentiment and target consumers with, hopefully, the right products.


pages: 400 words: 94,847

Reinventing Discovery: The New Era of Networked Science by Michael Nielsen


Albert Einstein, augmented reality, barriers to entry, bioinformatics, Cass Sunstein, Climategate, Climatic Research Unit, conceptual framework, dark matter, discovery of DNA, double helix, Douglas Engelbart, Erik Brynjolfsson, fault tolerance, Fellow of the Royal Society, Firefox, Freestyle chess, Galaxy Zoo, Internet Archive, invisible hand, Jane Jacobs, Jaron Lanier, Kevin Kelly, Magellanic Cloud, means of production, medical residency, Nicholas Carr, publish or perish, Richard Feynman, Richard Stallman, semantic web, Silicon Valley, Silicon Valley startup, Simon Singh, Skype, slashdot, social web, statistical model, Stephen Hawking, Stewart Brand, Ted Nelson, The Death and Life of Great American Cities, The Nature of the Firm, The Wisdom of Crowds, University of East Anglia, Vannevar Bush, Vernor Vinge

p 106: Mapping the brain is far too large a subject for me to give a comprehensive list of references. An overview of work on the Allen Brain Atlas may be found in Jonah Lehrer’s excellent article [120]. Most of the facts I relate are from that article. The paper announcing the atlas of gene expression in the mouse brain is [121]. Overviews of some of the progress and challenges in mapping the human connectome may be found in [119] and [125]. p 108: Bioinformatics and cheminformatics are now well-established fields, with a significant literature, and I won’t attempt to single out any particular reference for special mention. Astroinformatics has emerged more recently. See especially [24] for a manifesto on the need for astroinformatics. p 113: A report on the 2005 freestyle chess tournament may be found at [37], with follow-up commentary on the winners at [39].

See architecture of attention; restructuring expert attention augmented reality, 41, 87 autism-vaccine controversy, 156 Avatar (film), 34 Axelrod, Robert, 219 Baker, David, 146 basic research: economic scale of, 203 secrecy in, 87, 184–86 Bayh-Dole Act, 184–85 Benkler, Yochai, 218, 224 Bennett, John Caister, 149 Berges, Aida, 155 Bermuda Agreement, 7, 108, 190, 192, 222 Berners-Lee, Tim, 218 bioinformatics, 108 biology: data-driven intelligence in, 116–19 data web for, 121–22 open source, 48. See also genetics birdwatchers, 150 black holes, orbiting pair of, 96, 100–101, 103, 112, 114 Blair, Tony, 7, 156 Block, Peter, 218 blogs: architecture of attention and, 42, 56 as basis of Polymath Project, 1–2, 42 invention of, 20 in quantum computing, 187 rumors on, 201–2 scientific, 6, 165–69, 203–4 Borgman, Christine, 218 Boroson, Todd, 100–101, 103, 114 Borucki, William, 201 botany, 107 Brahe, Tycho, 104 brain atlases, 106, 108 British Chiropractic Association, 165–66 Brown, Zacary, 23–24, 27, 35, 41, 223 Burkina Faso, open architecture project in, 46–48 Bush, Vannevar, 217, 218 business: data-driven intelligence for, 112 data sharing methods in, 120.

See also amplifying collective intelligence Colwell, Robert, 218 combinatorial line, 211 comet hunters, 148–49 comment sites: successful examples of, 234 user-contributed, 179–81 commercialization of science, 87, 184–86 Company of Strangers, The (Seabright), 37 comparative advantage: architecture of attention and, 32, 33, 43, 56 examples from the sciences, 82, 83, 84, 85 for InnoCentive Challenges, 24, 43 modularity and, 56 technical meaning of, 223 competition: data sharing and, 103–4 as obstacle to collaboration, 86 in protein structure prediction, 147–48 for scientific jobs, 8, 9, 178, 186 Complexity Zoo, 233 computer code: in bioinformatics, 108 centralized development of new tools, 236 citation of, 196, 204–5 for complex experiments, 203 information commons in, 57–59 sharing, 87, 183, 193, 204–5. See also Firefox; Linux; MathWorks competition; open source software computer games: addictive quality of, 146, 147 for folding proteins (see Foldit) connectome, human, 106, 121 conversation, offline small-group, 39–43 conversational critical mass, 30, 31, 33, 42 Cornell University Laboratory of Ornithology, 150 Cox, Alan, 57 Creative Commons, 219, 220 creative problem solving, 24, 30, 34, 35, 36, 38.


Pearls of Functional Algorithm Design by Richard Bird


bioinformatics, Menlo Park, sorting algorithm

Final remarks The origins of the maximum segment sum problem go back to about 1975, and its history is described in one of Bentley’s (1987) programming pearls. For a derivation using invariant assertions, see Gries (1990); for an algebraic approach, see Bird (1989). The problem refuses to go away, and variations are still an active topic for algorithm designers because of potential applications in data-mining and bioinformatics; see Mu (2008) for recent results. The interest in the non-segment problem is what it tells us about any maximum marking problem in which the marking criterion can be formulated as a regular expression. For instance, it is immediate that there is an O(nk) algorithm for computing the maximum at-least-length-k segment problem because F*T^nF* (n ≥ k) can be recognised by a k-state automaton.
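As an illustration of the two problems named in this excerpt, here is a minimal Python sketch. It is not the derivation from the book: `max_segment_sum` uses the familiar linear-time running-sum scan, and `max_segment_sum_at_least` uses a prefix-sum technique that is swapped in for the automaton-based O(nk) approach the excerpt mentions. Both function names are hypothetical, and the first function assumes the empty segment (sum 0) is allowed.

```python
from typing import Optional, Sequence


def max_segment_sum(xs: Sequence[int]) -> int:
    """Largest sum of any contiguous segment; the empty segment counts as 0."""
    best = current = 0
    for x in xs:
        # Either extend the running segment or restart with the empty one.
        current = max(0, current + x)
        best = max(best, current)
    return best


def max_segment_sum_at_least(xs: Sequence[int], k: int) -> Optional[int]:
    """Largest sum of any contiguous segment of length >= k.

    Returns None when the input is shorter than k. Uses prefix sums,
    not the k-state automaton described in the excerpt.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(xs) < k:
        return None
    prefix = [0]
    for x in xs:
        prefix.append(prefix[-1] + x)
    best = None
    min_prefix = prefix[0]
    for end in range(k, len(xs) + 1):
        # Smallest prefix sum among start positions that leave >= k elements.
        min_prefix = min(min_prefix, prefix[end - k])
        candidate = prefix[end] - min_prefix
        if best is None or candidate > best:
            best = candidate
    return best
```

The second function runs in linear time regardless of k, which is a different trade-off from the regular-expression view: the automaton formulation generalises to other marking criteria, while the prefix-sum trick is specific to length constraints.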

In particular, the function sorttails that returns the unique permutation that sorts the tails of a list can be obtained from the final program for ranktails simply by replacing resort ·concat ·label in the first line of ranktails by concat. The function sorttails is needed as a preliminary step in the Burrows–Wheeler algorithm for data compression, a problem we will take up in the following pearl. The problem of sorting the suffixes of a string has been treated extensively in the literature because it has other applications in string matching and bioinformatics; a good source is Gusfield (1997). This pearl was rewritten a number of times. Initially we started out with the idea of computing perm, a permutation that sorts a list. But perm is too specific in the way it treats duplicates: there is more than one permutation that sorts a list containing duplicate elements. One cannot get very far with perm unless one generalises to either rank or partition.
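To make the idea of a permutation that sorts the tails concrete, the following Python sketch sorts the start positions of all nonempty suffixes by comparing the suffixes directly. This is a deliberately naive stand-in, not the ranktails-based program the excerpt refers to; the name `sort_tails` is hypothetical, and the cost is far worse than a purpose-built suffix-sorting algorithm.

```python
from typing import List, Sequence


def sort_tails(xs: Sequence) -> List[int]:
    """Return the start indices of the nonempty tails of xs in sorted order.

    Distinct tails have distinct lengths, so no two tails compare equal
    and the resulting permutation is unique.
    """
    # Naive approach: each comparison may scan a whole suffix.
    return sorted(range(len(xs)), key=lambda i: xs[i:])


if __name__ == "__main__":
    print(sort_tails("banana"))
```

Because every tail is compared as a whole slice, this is only suitable for short inputs; it is meant to show what the result looks like, not how to compute it efficiently.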

– array index, 25, 29, 87, 100 – prefix, 103, 119, 127 accumArray, 2, 5, 82, 123 applyUntil, 82 array, 29, 85 bounds, 25 break, 154, 164, 182 compare, 29 concatMap, 42 elems, 85 foldrn – fold over nonempty lists, 42 fork, 35, 83, 94, 118 inits, 66, 67, 117 listArray, 25, 100 minors, 172 nodups, 149 nub, 64 partition, 4 partitions, 38 reverse, 119, 244 scanl, 118, 238 scanr, 70 sort, 28, 95 sortBy, 29, 94 span, 67 subseqs, 57, 65, 157, 163 tails, 7, 79, 100, 102 transpose, 98, 150, 193 unfoldr, 202, 243 zip, 35, 83 zipWith, 83 Abelian group, 27 abides property, 3, 22 abstraction function, 129, 211, 226 accumulating function, 2 accumulating parameter, 131, 138, 140, 177, 253 adaptive encoding, 200 amortised time, 5, 118, 131, 133 annotating a tree, 170 arithmetic decoding, 201 arithmetic expressions, 37, 156 array update operation, 3, 6 arrays, 1, 2, 21, 29, 85, 99 association list, 29, 238 asymptotic complexity, 27 bags, 25, 50, 51 balanced trees, 21, 54, 234 Bareiss algorithm, 186 bijection, 129 binary search, 7, 10, 14, 15, 19, 54 binomial trees, 178 bioinformatics, 77, 90 Boolean satisfiability, 155 borders of a list, 103 bottom-up algorithm, 41 boustrophedon product, 245, 251, 260 breadth-first search, 136, 137, 178 Bulldozer algorithm, 196 bzip2, 101 call-tree, 168 Cartesian coordinates, 141, 155 Cartesian product, 149 celebrity clique, 56 Chió’s identity, 182 clique, 56 combinatorial patterns, 242 comparison-based sorting, 10, 16, 27 computational geometry, 188 conjugate, 263 constraint satisfaction, 155 continuations, 273 coroutines, 273 cost function, 41, 48, 52 cyclic structures, 133, 179 data compression, 91, 198 data mining, 77 data refinement, 5, 48, 108, 114, 129, 210 deforestation, 168 depth-first search, 137, 221, 222 destreaming, 214 destreaming theorem, 214 Dilworth’s theorem, 54 divide and conquer, 1, 3, 5, 7, 8, 15, 21–23, 27, 29, 30, 65, 81, 171 dot product, 185 dynamic programming, 168 EOF (end-of-file symbol), 203 exhaustive search, 
12, 33, 39, 57, 148, 156 facets, 190 failure function, 133 fictitious values, 14, 77 finite automaton, 74, 136 fission law of foldl, 130 fixpoint induction, 205 forests, 42, 174 fringe of a tree, 41 frontier, 137 fully strict composition, 243 fusion law of foldl, 76, 130, 195 fusion law of foldr, 34, 51, 52, 61, 247, 260, 261, 265 fusion law of foldrn, 43 fusion law of fork, 35 fusion law of unfoldr, 206, 212 Galil’s algorithm, 122 garbage collection, 165, 166 Garsia–Wachs algorithm, 49 Gaussian elimination, 180 graph traversal, 178, 221 Gray path order, 258 greedy algorithms, 41, 48, 50, 140 Gusfield’s Z algorithm, 116 Hu–Tucker algorithm, 49 Huffman coding, 91, 198, 201 immutable arrays, 25 incremental algorithm, 188, 191, 204 incremental decoding, 216 incremental encoding, 203, 209 indexitis, 150 inductive algorithm, 42, 93, 102 integer arithmetic, 182, 198, 208 integer division, 182 intermediate data structure, 168 interval expansion, 209, 210 inversion table, 10 inverting a function, 12, 93 involution, 150 iterative algorithm, 10, 82, 109, 113 Knuth and Ruskey algorithm, 258 Knuth’s spider spinning algorithm, 242 Koda–Ruskey algorithm, 242 law of iterate, 99 laws of filter, 118, 152 laws of fork, 35 lazy evaluation, 33, 147, 185, 243 leaf-labelled trees, 41, 165, 168 left spines, 43, 45, 177 left-inverse, 129 Leibniz formula, 180 lexicographic ordering, 45, 52, 64, 102, 104 linear ordering, 43 linked list, 225 longest common prefix, 103, 112, 120 longest decreasing subsequence, 54 loop invariants, 62, 111 lower bounds, 16, 27, 28, 64 Mahajan and Vinay’s algorithm, 186 majority voting problem, 62 matrices, 147, 181 matrix Cartesian product, 149 maximum marking problems, 77 maximum non-segment sum, 73 maximum segment sum, 73 maximum surpasser count, 7 McCarthy S-expression, 221 memo table, 163 memoisation, 162 merge, 26, 142, 158 mergesort, 29, 89, 171, 173 minimal element, 53 minimum cost tree, 44 minimum element, 53 minors, 181 model checking, 155 
monads, 3, 114, 155 monotonicity condition, 48, 53 move-to-front encoding, 91 multisets, 25 narrowing, 199 nondeterministic functions, 43, 51 normal form, 160 online list labelling, 241 Open Problems Project, 31 optimal bracketing, 176 optimisation problems, 48, 176 order-maintenance problem, 241 overflow, 214 parametricity, 62 partial evaluation, 134 partial ordering, 53 partial preorder, 52 partition sort, 85 partition sorting, 87 perfect binary trees, 171 permutations, 79, 90, 91, 96, 97, 180, 189, 242, 251 planning algorithm, 136, 138 plumbing combinators, 36 prefix, 66 prefix ordering, 103, 105, 119 preorder traversal, 245, 270 principal submatrices, 185 program transformation, 221 PSPACE completeness, 136 queues, 109, 137, 248, 249 Quicksort, 5, 85, 89 radix sort, 95, 101 ranking a list, 79 rational arithmetic, 180, 188, 198 rational division, 181 recurrence relations, 15, 31, 88 refinement, 44, 48, 51–53, 80 regular cost function, 49 regular expression, 74 relations, 48, 167, 229 representation function, 129, 211 right spines, 177 Rose trees, 164, 245 rotations of a list, 91 rule of floors, 215 run-length encoding, 91 saddleback search, 14 safe replacement, 222 scan lemma, 118, 125 segments, 73, 171 Shannon–Fano coding, 198 sharing, 168, 173 shortest upravel, 50 simplex, 188 skeleton trees, 165 sliding-block puzzle, 136 smart constructors, 48, 170, 177 smooth algorithms, 241 solving a recursion, 98 sorting, 9, 10, 16, 91, 149 sorting numbers, 1, 3 sorting permutation, 10 space/time trade-offs, 156 spanning tree, 178 stable sorting algorithm, 86, 95 stacks, 137, 221, 222 streaming, 203, 214 streaming theorem, 204 string matching, 112, 117, 127 stringology, 103 subsequences, 50, 64, 74, 162, 177, 242 suffix tree, 101 suffixes, 79, 100 Sylvester’s identity, 186 thinning algorithm, 161 top-down algorithm, 41 totally acyclic digraph, 258 transitions, 242 trees, 130, 165, 248 tries, 163 tupling law of foldl, 118, 125 tupling law of foldr, 247 unfolds, 168 
unmerges, 158, 159, 165 unravel, 50 upper triangular matrix, 185 Vandermonde’s convolution, 17 well-founded recursion, 4, 30 while loop, 111, 113 wholemeal programming, 150 windows of a text, 120 Young tableau, 28


pages: 678 words: 216,204

The Wealth of Networks: How Social Production Transforms Markets and Freedom by Yochai Benkler


affirmative action, barriers to entry, bioinformatics, Brownian motion, call centre, Cass Sunstein, centre right, clean water, dark matter, desegregation, East Village, fear of failure, Firefox, game design, George Gilder, hiring and firing, Howard Rheingold, informal economy, invention of radio, Isaac Newton, iterative process, Jean Tirole, jimmy wales, market bubble, market clearing, Marshall McLuhan, New Journalism, optical character recognition, pattern recognition, pre–internet, price discrimination, profit maximization, profit motive, random walk, recommendation engine, regulatory arbitrage, rent-seeking, RFID, Richard Stallman, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, shareholder value, Silicon Valley, Skype, slashdot, social software, software patent, spectrum auction, technoutopianism, The Fortune at the Bottom of the Pyramid, The Nature of the Firm, transaction costs

As more of the process of drug discovery of potential leads can be done by modeling and computational analysis, more can be organized for peer production. The relevant model here is open bioinformatics. Bioinformatics generally is the practice of pursuing solutions to biological questions using mathematics and information technology. Open bioinformatics is a movement within bioinformatics aimed at developing the tools in an open-source model, and in providing access to the tools and the outputs on a free and open basis. Projects like these include the Ensembl Genome Browser, operated by the European Bioinformatics Institute and the Sanger Centre, or the National Center for Biotechnology Information (NCBI), both of which use computer databases to provide access to data and to run various searches on combinations, patterns, and so forth, in the data.


pages: 834 words: 180,700

The Architecture of Open Source Applications by Amy Brown, Greg Wilson


8-hour work day, anti-pattern, bioinformatics, cloud computing, collaborative editing, combinatorial explosion, computer vision, continuous integration, create, read, update, delete, Debian, domain-specific language, fault tolerance, finite state, Firefox, friendly fire, linked data, load shedding, locality of reference, loose coupling, Mars Rover, MVC pattern, premature optimization, recommendation engine, revision control, side project, Skype, slashdot, social web, speech recognition, the scientific method, The Wisdom of Crowds, web application, WebSocket

Amy Brown (editorial): Amy has a bachelor's degree in Mathematics from the University of Waterloo, and worked in the software industry for ten years. She now writes and edits books, sometimes about software. She lives in Toronto and has two children and a very old cat. C. Titus Brown (Continuous Integration): Titus has worked in evolutionary modeling, physical meteorology, developmental biology, genomics, and bioinformatics. He is now an Assistant Professor at Michigan State University, where he has expanded his interests into several new areas, including reproducibility and maintainability of scientific software. He is also a member of the Python Software Foundation, and blogs at Roy Bryant (Snowflock): In 20 years as a software architect and CTO, Roy designed systems including Electronics Workbench (now National Instruments' Multisim) and the Linkwalker Data Pipeline, which won Microsoft's worldwide Winning Customer Award for High-Performance Computing in 2006.

He has since contributed to almost all areas of Asterisk development, from project management to core architectural design and development. He blogs at Rosangela Canino-Koning (Continuous Integration): After 13 years of slogging in the software industry trenches, Rosangela returned to university to pursue a Ph.D. in Computer Science and Evolutionary Biology at Michigan State University. In her copious spare time, she likes to read, hike, travel, and hack on open source bioinformatics software. She blogs at Francesco Cesarini (Riak): Francesco Cesarini has used Erlang on a daily basis since 1995, having worked in various turnkey projects at Ericsson, including the OTP R1 release. He is the founder of Erlang Solutions and co-author of O'Reilly's Erlang Programming. He currently works as Technical Director at Erlang Solutions, but still finds the time to teach graduates and undergraduates alike at Oxford University in the UK and the IT University of Gothenburg in Sweden.

After graduate studies in distributed systems at Carnegie-Mellon University, he worked on compilers (Tartan Labs), printing and imaging systems (Adobe Systems), electronic commerce (Adobe Systems, Impresse), and storage area network management (SanNavigator, McDATA). Returning to distributed systems and HDFS, Rob found many familiar problems, but all of the numbers had two or three more zeros. James Crook (Audacity): James is a contract software developer based in Dublin, Ireland. Currently he is working on tools for electronics design, though in a previous life he developed bioinformatics software. He has many audacious plans for Audacity, and he hopes some, at least, will see the light of day. Chris Davis (Graphite): Chris is a software consultant and Google engineer who has been designing and building scalable monitoring and automation tools for over 12 years. Chris originally wrote Graphite in 2006 and has led the open source project ever since. When he's not writing code he enjoys cooking, making music, and doing research.


Exploring Everyday Things with R and Ruby by Sau Sheong Chang


Alfred Russel Wallace, bioinformatics, business process, butterfly effect, cloud computing, Craig Reynolds: boids flock, Debian, Edward Lorenz: Chaos theory, Gini coefficient, income inequality, invisible hand, p-value, price stability, Skype, statistical model, stem cell, Stephen Hawking, text mining, The Wealth of Nations by Adam Smith, We are the 99%, web application, wikimedia commons

The largest is CRAN (Comprehensive R Archive Network). CRAN is hosted by the R Foundation (the same organization that is developing R) and contains 3,646 packages as of this writing. CRAN is also mirrored in many sites worldwide. Another public repository is Bioconductor, an open source project that provides tools for bioinformatics and is primarily R-based. While the packages in Bioconductor are focused on bioinformatics, it doesn’t mean that they can’t be used for other domains. As of this writing, there are 516 packages in Bioconductor. Finally, there is R-Forge, a collaborative software development application for R. It is based on FusionForge, a fork from GForge (on which RubyForge was based), which in turn was forked from the original software that was used to build SourceForge.


pages: 623 words: 448,848

Food Allergy: Adverse Reactions to Foods and Food Additives by Dean D. Metcalfe


Albert Einstein, bioinformatics, epigenetics, impulse control, life extension, meta analysis, meta-analysis, mouse model, pattern recognition, phenotype, placebo effect, randomized controlled trial, statistical model, stem cell

J Allergy Clin Immunol 2000;106:228–38.
73 Thomas K, Bannon G, Hefle S, et al. In silico methods for evaluating human allergenicity to novel proteins. Bioinformatics Workshop Meeting Report, February 23–24, 2005. Toxicol Sci 2005;88:307–10.
74 Ladics GS, Bannon GA, Silvanovich A, Cressman RF. Comparison of conventional FASTA identity searches with the 80 amino acid sliding window FASTA search for the elucidation of potential identities to known allergens. Mol Nutr Food Res 2007;51:985–998.
75 Bannon G, Ogawa T. Evaluation of available IgE-binding epitope data and its utility in bioinformatics. Mol Nutr Food Res 2006;50:638–44.
76 Hileman RE, Silvanovich A, Goodman RE, et al. Bioinformatic methods for allergenicity assessment using a comprehensive allergen database. Int Archives Allergy Immunol 2002;128:280–91.
77 Silvanovich A, Nemeth MA, Song P, et al.

The most important food allergen families will be discussed in this chapter. Food allergen protein families Based on their shared amino acid sequences and conserved three-dimensional structures, proteins can be classified into families using various bioinformatics tools which form the basis of several protein family databases, one of which is Pfam [8]. Over the past 10 years or so there has been an explosion in the numbers of well characterized allergens, which have been sequenced and are being collected into a number of databases to facilitate bioinformatic analysis [9]. We have undertaken this analysis for both plant [1] and animal food allergens [10] along with pollen allergens [2]. They show similar distributions with the majority of allergens in each group falling into just 3–12 families with a tail of between 14 and 23 families comprising between 1 and 3 allergens each.

For example, the Codex Alimentarius (www.codexalimentarius.net/web/index_en.jsp) recommended a percentage identity score of at least 35% matched amino acid residues of at least 80 residues as being the lowest identity criteria for proteins derived from biotechnology that could suggest IgE cross-reactivity with a known allergen. However, Aalberse [72] has noted that proteins sharing less than 50% identity across the full length of the protein sequence are unlikely to be cross-reactive, and immunological cross-reactivity may not occur unless the proteins share at least 70% identity. Recent published work has led to the harmonization of the methods used for bioinformatic searches and a better understanding of the data generated [73,74] from such studies. An additional bioinformatics approach can be taken by searching for 100% identity matches along short sequences contained in the query sequence as they are compared to sequences in a database. These regions of short amino acid sequence homologies are intended to represent the smallest sequence that could function as an IgE-binding epitope [75]. If any exact matches between a known allergen and a transgenic sequence were found using this strategy, it could represent the most conservative approach to predicting potential for a peptide fragment to act as an allergen.
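The exact short-match search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the excerpt does not give the window length, so `window` is a free parameter with an arbitrary default, the function name `shared_exact_windows` is hypothetical, and plain string comparison stands in for whatever database search tool the cited studies used.

```python
from typing import List, Tuple


def shared_exact_windows(query: str, allergen: str, window: int = 8) -> List[Tuple[int, str]]:
    """List (position, peptide) pairs where a window of the query sequence
    occurs with 100% identity somewhere in the allergen sequence.

    The default window length is an assumption, not a value from the text.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    # Every contiguous peptide of the given length found in the allergen.
    allergen_windows = {
        allergen[i:i + window] for i in range(len(allergen) - window + 1)
    }
    hits = []
    for i in range(len(query) - window + 1):
        peptide = query[i:i + window]
        if peptide in allergen_windows:
            hits.append((i, peptide))
    return hits


if __name__ == "__main__":
    # Made-up sequences, used purely to exercise the function.
    print(shared_exact_windows("MKTAYIAKQR", "GGTAYIAKQLL", window=6))
```

An empty result means no shared peptide of that length was found; it says nothing about the 35%-over-80-residues criterion, which needs an alignment-based identity score rather than exact matching.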


pages: 292 words: 85,151

Exponential Organizations: Why New Organizations Are Ten Times Better, Faster, and Cheaper Than Yours (And What to Do About It) by Salim Ismail, Yuri van Geest


23andMe, 3D printing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, bioinformatics, bitcoin, Black Swan, blockchain, Burning Man, business intelligence, business process, call centre, chief data officer, Clayton Christensen, clean water, cloud computing, cognitive bias, collaborative consumption, collaborative economy, corporate social responsibility, cross-subsidies, crowdsourcing, cryptocurrency, dark matter, Dean Kamen, dematerialisation, discounted cash flows, distributed ledger, Edward Snowden, Elon Musk, ethereum blockchain, Galaxy Zoo, game design, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, hiring and firing, Hyperloop, industrial robot, Innovator's Dilemma, Internet of things, Iridium satellite, Isaac Newton, Jeff Bezos, Kevin Kelly, Kickstarter, knowledge worker, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, loose coupling, loss aversion, Lyft, Mark Zuckerberg, market design, means of production, minimum viable product, natural language processing, Netflix Prize, Network effects, new economy, Oculus Rift, offshore financial centre, p-value, PageRank, pattern recognition, Paul Graham, Peter H. Diamandis: Planetary Resources, Peter Thiel, prediction markets, profit motive, publish or perish, Ray Kurzweil, recommendation engine, RFID, ride hailing / ride sharing, risk tolerance, Ronald Coase, Second Machine Age, self-driving car, sharing economy, Silicon Valley, skunkworks, Skype, smart contracts, Snapchat, social software, software is eating the world, speech recognition, stealth mode startup, Stephen Hawking, Steve Jobs, subscription business, supply-chain management, TaskRabbit, telepresence, telepresence robot, Tony Hsieh, transaction costs, Tyler Cowen: Great Stagnation, urban planning, WikiLeaks, winner-take-all economy, X Prize, Y Combinator

Once any domain, discipline, technology or industry becomes information-enabled and powered by information flows, its price/performance begins doubling approximately annually. Third, once that doubling pattern starts, it doesn’t stop. We use current computers to design faster computers, which then build faster computers, and so on. Finally, several key technologies today are now information-enabled and following the same trajectory. Those technologies include artificial intelligence (AI), robotics, biotech and bioinformatics, medicine, neuroscience, data science, 3D printing, nanotechnology and even aspects of energy. Never in human history have we seen so many technologies moving at such a pace. And now that we are information-enabling everything around us, the effects of Kurzweil’s Law of Accelerating Returns are sure to be profound. What’s more, as these technologies intersect (e.g., using deep-learning AI algorithms to analyze cancer trials), the pace of innovation accelerates even further.

Of the 155 teams competing, three were awarded a total of $100,000 in prize money. What was particularly interesting was the fact that none of the winners had prior experience with natural language processing (NLP). Nonetheless, they beat the experts, many of them with decades of experience in NLP under their belts. This can’t help but impact the current status quo. Raymond McCauley, Biotechnology & Bioinformatics Chair at Singularity University, has noticed that “When people want a biotech job in Silicon Valley, they hide their PhDs to avoid being seen as a narrow specialist.” So, if experts are suspect, where should we turn instead? As we’ve already noted, everything is measurable. And the newest profession making those measurements is the data scientist. Andrew McAfee calls this new breed of data experts “geeks.”


pages: 323 words: 92,135

Running Money by Andy Kessler


Andy Kessler, Apple II, bioinformatics, British Empire, business intelligence, buy low sell high, call centre, Corn Laws, family office, full employment, George Gilder, happiness index / gross national happiness, interest rate swap, invisible hand, James Hargreaves, James Watt: steam engine, joint-stock company, joint-stock limited liability company, knowledge worker, Long Term Capital Management, mail merge, margin call, market bubble, Maui Hawaii, Menlo Park, Network effects, packet switching, pattern recognition, railway mania, risk tolerance, Sand Hill Road, Silicon Valley, South China Sea, spinning jenny, Steve Jobs, Steve Wozniak, Toyota Production System

They analyzed central banks and politicians and figured out the direction of currencies. In an era of relatively stable currencies, the modern-day investor has to dig, early and often and everywhere. I’d still rather dig than get whacked by a runaway yen-carry trade. Another cycle is coming. The drivers of it are still unclear. Likely suspects are things like wireless data, on-command computing, nanotechnology, bioinformatics, genomic sorting—who the hell knows what it will be. But this is what I do. Looking for the next barrier, the next piece of technology, the next waterfall and the next great, long-term investment. Sounds quaint. I’ve come a long way from tripping across Homa Simpson dolls trying to raise money in Hong Kong. Or getting sweated on by desperate Koreans. Or driving around all day with Fred. Or getting thrown out of deals.

See AOL Andreessen, Marc, 197, 199 animation, 134–35 AOL (America Online), 69–73, 207, 208, 223, 290 Cisco routers and, 199 Inktomic cache software and, 143 Netscape Navigator purchase, 201, 225 Telesave deal, 72–73 TimeWarner deal, 223, 229 as top market cap company, 111 Apache Web server, 247 Apple Computer, 45, 127, 128 Apple II, 183 Applied Materials, 245 Archimedes (propeller ship), 94 Arkwright, Richard, 65 ARPANET, 186, 187, 189, 191 Arthur Andersen, 290 Artists and Repertoire (A&R), 212, 216 Asian debt crisis, 3, 150, 151, 229, 260 yen and, 162–65, 168, 292 @ (at sign), 187 AT&T, 61, 185–86, 189 August Capital, 2, 4 auto industry, 267–68 Aziz, Tariq, 26 Babbage, Charles, 93 Baker, James, 26 Balkanski, Alex, 44, 249 bandwidth, 60, 111, 121, 140, 180, 188–89 Baran, Paul, 184, 185 Barbados, 251, 254 Barksdale, Jim, 198, 199–201 Barksdale Group, 201 BASE, 249 BASIC computer language, 126, 127 BBN. See Bolt, Baranek and Newman Bechtolsheim, Andy, 191 Bedard, Kipp, 19–20 Bell, Dave, 127 Bell Labs, 103, 110 Berry, Hank, 205–6, 208 Bezos, Jeff, 228 Biggs, Barton, 163 big-time trends. See waterfalls bioinformatics, 296 biotech industry, 237 Black, Joseph, 54 Blutcher (steam locomotive), 92 Boggs, David, 189, 190 Bolt, Baranek and Newman, 184, 187 bonds, 11, 30–31, 164 Bonsal, Frank, 144–49 Borislow, Daniel, 72–73 Bosack, Len, 191 Boulton, Matthew, 55–58, 65, 66, 89 Boulton & Watt Company, 56–58, 64, 65, 89, 246, 247, 272 Bowman, Larry, 291–92 Bowman Capital, 291 Brady bonds, 164 Britain, 42, 50–59, 258 industrial economy, 42, 64–68, 91–95, 272 patent law, 55 textile manufacture, 64–68 wealth creation, 257, 271–72 broadband, 164, 225 browsers, 196–201 Brunel, I.


Algorithms Unlocked by Thomas H. Cormen


bioinformatics, knapsack problem, NP-complete, optical character recognition, Silicon Valley, sorting algorithm, traveling salesman

A clique in an undirected graph G is a subset S of vertices such that the graph has an edge between every pair of vertices in S. The size of a clique is the number of vertices it contains. As you might imagine, cliques play a role in social network theory. Modeling each individual as a vertex and relationships between individuals as undirected edges, a clique represents a group of individuals all of whom have relationships with each other. Cliques also have applications in bioinformatics, engineering, and chemistry. The clique problem takes two inputs, a graph G and a positive integer k, and asks whether G has a clique of size k. For example, the graph on the next page has a clique of size 4, shown with heavily shaded vertices, and no other clique of size 4 or greater. Verifying a certificate is easy. The certificate is the k vertices claimed to form a clique, and we just have to check that each of the k vertices has an edge to the other k − 1.
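The certificate check described above can be sketched in Python as follows. The graph representation (a list of vertex pairs) and the function name `verify_clique_certificate` are assumptions made for illustration, not taken from the book.

```python
from itertools import combinations
from typing import Hashable, Iterable, Tuple


def verify_clique_certificate(
    edges: Iterable[Tuple[Hashable, Hashable]],
    certificate: Iterable[Hashable],
    k: int,
) -> bool:
    """Check that the certificate names k distinct vertices, every pair of
    which is joined by an undirected edge."""
    # Store each edge as an unordered pair so (u, v) and (v, u) are the same.
    edge_set = {frozenset(edge) for edge in edges}
    vertices = set(certificate)
    if len(vertices) != k:
        return False
    return all(frozenset(pair) in edge_set for pair in combinations(vertices, 2))


if __name__ == "__main__":
    graph = [(1, 2), (1, 3), (2, 3), (3, 4)]
    print(verify_clique_certificate(graph, [1, 2, 3], 3))
    print(verify_clique_certificate(graph, [2, 3, 4], 3))
```

The check inspects k(k − 1)/2 pairs, so it runs in time polynomial in the size of the input, which is what makes the certificate easy to verify even though finding a clique may be hard.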

Vertex cover A vertex cover in an undirected graph G is a subset S of the vertices such that every edge in G is incident on at least one vertex in S. We say that each vertex in S “covers” its incident edges. The size of a vertex cover is the number of vertices it contains. As in the clique problem, the vertex-cover problem takes as input an undirected graph G and a positive integer m. It asks whether G has a vertex cover of size m. Like the clique problem, the vertex-cover problem has applications in bioinformatics. In another application, you have a building with hallways and cameras that can scan up to 360 degrees located at the intersections of hallways, and you want to know whether m cameras will allow you to see all the hallways. Here, edges model hallways and vertices model intersections. In yet another application, finding vertex covers helps in designing strategies to foil worm attacks on computer networks.
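A vertex-cover certificate can be checked in the same spirit. In this Python sketch the hallway-and-camera example is encoded with made-up intersection labels, and the function name `verify_vertex_cover` is hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def verify_vertex_cover(
    edges: Iterable[Tuple[Hashable, Hashable]],
    certificate: Iterable[Hashable],
    m: int,
) -> bool:
    """Check that the certificate names m distinct vertices and that every
    edge is incident on at least one of them."""
    cover = set(certificate)
    if len(cover) != m:
        return False
    return all(u in cover or v in cover for u, v in edges)


if __name__ == "__main__":
    # Hallways are edges, intersections are vertices (labels are invented).
    hallways = [("A", "B"), ("B", "C"), ("C", "D")]
    print(verify_vertex_cover(hallways, ["B", "C"], 2))
    print(verify_vertex_cover(hallways, ["A", "D"], 2))
```

Cameras at B and C see all three hallways, while cameras at A and D miss the hallway between B and C.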


pages: 137 words: 36,231

Information: A Very Short Introduction by Luciano Floridi


agricultural Revolution, Albert Einstein, bioinformatics, carbon footprint, Claude Shannon: information theory, conceptual framework, double helix, Douglas Engelbart, George Akerlof, Gordon Gekko, industrial robot, Internet of things, invention of writing, John Nash: game theory, John von Neumann, moral hazard, Nash equilibrium, Norbert Wiener, phenotype, prisoner's dilemma, RAND corporation, RFID, Turing machine

Consider the following examples: medical information is information about medical facts (attributive use), not information that has curative properties; digital information is not information about something digital, but information that is in itself of digital nature (predicative use); and military information can be both information about something military (attributive) and of military nature in itself (predicative). When talking about biological or genetic information, the attributive sense is common and uncontroversial. In bioinformatics, for example, a database may contain medical records and genealogical or genetic data about a whole population. Nobody disagrees about the existence of this kind of biological or genetic information. It is the predicative sense that is more contentious. Are biological or genetic processes or elements intrinsically informational in themselves? If biological or genetic phenomena count as informational predicatively, is this just a matter of modelling, that is, may be seen as being informational?


The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil


additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, augmented reality, autonomous vehicles, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business intelligence, call centre, carbon-based life, cellular automata, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, disintermediation, double helix, Douglas Hofstadter, epigenetics, factory automation, friendly AI, George Gilder, Gödel, Escher, Bach, informal economy, information retrieval, invention of the telephone, invention of the telescope, invention of writing, Isaac Newton, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Mikhail Gorbachev, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Norbert Wiener, oil shale / tar sands, optical character recognition, pattern recognition, phenotype, premature optimization, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Rodney Brooks, Search for Extraterrestrial Intelligence, semantic web, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, transaction costs, Turing machine, Turing test, Vernor Vinge, Y2K, Yogi Berra

Kurzweil Technologies is working with UT to develop pattern recognition-based analysis from either "Holter" monitoring (twenty-four-hour recordings) or "Event" monitoring (thirty days or more). 190. Kristen Philipkoski, "A Map That Maps Gene Functions," Wired News, May 28, 2002,,1286,52723,00.html. 191. Jennifer Ouellette, "Bioinformatics Moves into the Mainstream," The Industrial Physicist (October–November 2003), 192. Port, Arndt, and Carey, "Smart Tools." 193. "Protein Patterns in Blood May Predict Prostate Cancer Diagnosis," National Cancer Institute, October 15, 2002,, reporting on Emanuel F. Petricoin et al., "Serum Proteomic Patterns for Detection of Prostate Cancer," Journal of the National Cancer Institute 94 (2002): 1576–78. 194.

DARPA's Information Processing Technology Office's project in this vein is called LifeLog,; see also Noah Shachtman, "A Spy Machine of DARPA's Dreams," Wired News, May 20, 2003,,1367,58909,00.html; Gordon Bell's project (for Microsoft) is MyLifeBits,; for the Long Now Foundation, see 44. Bergeron is assistant professor of anesthesiology at Harvard Medical School and the author of such books as Bioinformatics Computing, Biotech Industry: A Global, Economic, and Financing Overview, and The Wireless Web and Healthcare. 45. The Long Now Foundation is developing one possible solution: the Rosetta Disk, which will contain extensive archives of text in languages that may be lost in the far future. They plan to use a unique storage technology based on a two-inch nickel disk that can store up to 350,000 pages per disk, with an estimated life expectancy of 2,000 to 10,000 years.


pages: 648 words: 108,814

Solr 1.4 Enterprise Search Server by David Smiley, Eric Pugh


Amazon Web Services, bioinformatics, cloud computing, continuous integration, database schema, domain-specific language, fault tolerance, Firefox, information retrieval, Internet Archive, web application, Y Combinator

., which received angel funding from the Y Combinator fund, and he relocated to San Francisco. WebMynd is one of the largest installations of Solr, indexing up to two million HTML documents per day, and making heavy use of Solr's multicore features to enable a partially active index. Jerome Eteve holds a BSc in physics, maths and computing and an MSc in IT and bioinformatics from the University of Lille (France). After starting his career in the field of bioinformatics, where he worked as a biological data management and analysis consultant, he's now a senior web developer with interests ranging from database level issues to user experience online. He's passionate about open source technologies, search engines, and web application architecture. He has worked since 2006 for Careerjet Ltd, a worldwide job search engine.


pages: 481 words: 121,669

The Invisible Web: Uncovering Information Sources Search Engines Can't See by Gary Price, Chris Sherman, Danny Sullivan


AltaVista, American Society of Civil Engineers: Report Card, bioinformatics, Brewster Kahle, business intelligence, dark matter, Douglas Engelbart, full text search, HyperCard, hypertext link, information retrieval, Internet Archive, joint-stock company, knowledge worker, natural language processing, pre–internet, profit motive, publish or perish, search engine result page, side project, Silicon Valley, speech recognition, stealth mode startup, Ted Nelson, Vannevar Bush, web application

FishBase is a relational database with fish information to cater to different professionals such as research scientists, fisheries managers, zoologists, and many more. FishBase on the Web contains practically all fish species known to science.” Search Form URL: GeneCards “GeneCards is a database of human genes, their products, and their involvement in diseases. It offers concise information about the functions of all human genes that have an approved symbol, as well as selected others [gene listing].” Search Form URL: Integrated Taxonomic Information System (Biological Names) “The Integrated Taxonomic Information System (ITIS) is a partnership of U.S., Canadian, and Mexican agencies, other organizations, and taxonomic specialists cooperating on the development of an online, scientifically credible, list of biological names focusing on the biota of North America.”


pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang


AI winter, artificial general intelligence, bioinformatics, brain emulation, combinatorial explosion, complexity theory, computer vision, conceptual framework, correlation coefficient, epigenetics, friendly AI, information retrieval, Isaac Newton, John Conway, Loebner Prize, Menlo Park, natural language processing, Occam's razor, p-value, pattern recognition, performance metric, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K

We do not buy the argument that “Since X plays an important role in intelligence, studying X contributes to the study of intelligence in general”, where X can be replaced by reasoning, learning, planning, perceiving, acting, etc. On the contrary, we believe that most of the current AI research works make little direct contribution to AGI, though these works have value for many other reasons. Previously we have mentioned “machine learning” as an example. One of us (Goertzel) has published extensively about applications of machine learning algorithms to bioinformatics. This is a valid, and highly important sort of research – but it doesn’t have much to do with achieving general intelligence. There is no reason to believe that “intelligence” is simply a toolbox, containing mostly unconnected tools. Since the current AI “tools” have been built according to very different theoretical considerations, to implement them as modules in a big system will not necessarily make them work together, correctly and efficiently.

Unlike most contemporary AI projects, it is specifically oriented towards artificial general intelligence (AGI), rather than being restricted by design to one narrow domain or range of cognitive functions. The NAIE integrates aspects of prior AI projects and approaches, including symbolic, neural-network, evolutionary programming and reinforcement learning. The existing codebase is being applied in bioinformatics, NLP and other domains. To save space, some of the discussion in this paper will assume a basic familiarity with NAIE structures such as Atoms, Nodes, Links, ImplicationLinks and so forth, all of which are described in previous references and in other papers in this volume. 1.2. Cognitive Development in Simulated Androids Jean Piaget, in his classic studies of developmental psychology [8] conceived of child development as falling into four stages, each roughly identified with an age group: infantile, preoperational, concrete operational, and formal.


pages: 199 words: 47,154

Gnuplot Cookbook by Lee Phillips


bioinformatics, computer vision, general-purpose programming language, pattern recognition, statistical model, web application

Phillips is now the Chief Scientist of the Alogus Research Corporation, which conducts research in the physical sciences and provides technology assessment for investors. I am grateful to the users of my gnuplot web pages for their interest, questions, and suggestions over the years, and to my family for their patience and support. About the Reviewers Andreas Bernauer is a Software Engineer at Active Group in Germany. He graduated from Eberhard Karls Universität Tübingen, Germany, with a Degree in Bioinformatics and received a Master of Science degree in Genetics from the University of Connecticut, USA. In 2011, he earned a doctorate in Computer Engineering from Eberhard Karls Universität Tübingen. Andreas has more than 10 years of professional experience in software engineering. He implemented the server-side scripting engine in the scheme-based SUnet web server and hosted the Learning-Classifier-System workshops in Tübingen.


pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem


Amazon Web Services, anti-pattern, bioinformatics, corporate governance, create, read, update, delete, data acquisition, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, web application

Graphs, on the other hand, use index-free adjacency to ensure that traversing connected data is extremely rapid. The social network example helps illustrate how different technologies deal with connected data, but is it a valid use case? Do we really need to find such remote “friends?” But substitute social networks for any other domain, and you’ll see we experience similar performance, modeling and maintenance benefits. Whether music or data center management, bio-informatics or football statistics, network sensors or time-series of trades, graphs provide powerful insight into our data. Let’s look, then, at another contemporary application of graphs: recommending products based on a user’s purchase history and the histories of their friends, neighbours, and other people like them. With this example, we’ll bring together several independent facets of a user’s lifestyle to make accurate and profitable recommendations.


pages: 271 words: 52,814

Blockchain: Blueprint for a New Economy by Melanie Swan


23andMe, Airbnb, altcoin, Amazon Web Services, asset allocation, banking crisis, bioinformatics, bitcoin, blockchain, capital controls, cellular automata, central bank independence, clean water, cloud computing, collaborative editing, Conway's Game of Life, crowdsourcing, cryptocurrency, disintermediation, Edward Snowden, ethereum blockchain, fault tolerance, fiat currency, financial innovation, Firefox, friendly AI, Hernando de Soto, Internet Archive, Internet of things, Khan Academy, Kickstarter, litecoin, Lyft, M-Pesa, microbiome, Network effects, new economy, peer-to-peer lending, personalized medicine, post scarcity, prediction markets, ride hailing / ride sharing, Satoshi Nakamoto, Search for Extraterrestrial Intelligence, SETI@home, sharing economy, Skype, smart cities, smart contracts, smart grid, software as a service, technological singularity, Turing complete, unbanked and underbanked, underbanked, web application, WikiLeaks

Bitcoin Magazine, May 22, 2014. 126 Buterin, V. “Primecoin: The Cryptocurrency Whose Mining Is Actually Useful.” Bitcoin Magazine, July 8, 2013. 127 Myers, D.S., A.L. Bazinet, and M.P. Cummings. “Expanding the Reach of Grid Computing: Combining Globus-and BOINC-Based Systems.” Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland, February 6, 2007 (Draft). 128 Clenfield, J. and P. Alpeyev. “The Other Bitcoin Power Struggle.” Bloomberg Businessweek, April 24, 2014. 129 Gimein, M.


pages: 239 words: 70,206

Data-Ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else by Steve Lohr


23andMe, Affordable Care Act / Obamacare, Albert Einstein, big data - Walmart - Pop Tarts, bioinformatics, business intelligence, call centre, cloud computing, computer age, conceptual framework, Credit Default Swap, crowdsourcing, Daniel Kahneman / Amos Tversky, Danny Hillis, data is the new oil, David Brooks, East Village, Edward Snowden, Emanuel Derman, Erik Brynjolfsson, everywhere but in the productivity statistics, Frederick Winslow Taylor, Google Glasses, impulse control, income inequality, indoor plumbing, industrial robot, informal economy, Internet of things, invention of writing, John von Neumann, Mark Zuckerberg, market bubble, meta analysis, meta-analysis, natural language processing, obamacare, pattern recognition, payday loans, personalized medicine, precision agriculture, pre–internet, Productivity paradox, RAND corporation, rising living standards, Robert Gordon, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, speech recognition, statistical model, Steve Jobs, Steven Levy, The Design of Experiments, the scientific method, Thomas Kuhn: the structure of scientific revolutions, unbanked and underbanked, underbanked, Von Neumann architecture, Watson beat the top human players on Jeopardy!

He does it selectively, but one speaking engagement in 2010 focused his interest and steered his career in a new direction. He had agreed to give a talk in Seattle at a conference hosted by Sage Bionetworks, a nonprofit organization dedicated to accelerating the sharing of data for biological research. Hammerbacher knew the two medical researchers who had founded the nonprofit, Stephen Friend and Eric Schadt. He had talked to them about how they might use big-data software to cope with the data explosion in bioinformatics and genomics. But the preparation for the speech forced him to really think about biology and technology, reading up and talking to people. The more Hammerbacher looked into it, the more intriguing the subject looked. Biological research, he says, could go the way of finance with its closed, proprietary systems and data being hoarded rather than shared. Or, he says, it could “go the way of the Web”—that is, toward openness.


pages: 284 words: 79,265

The Half-Life of Facts: Why Everything We Know Has an Expiration Date by Samuel Arbesman


Albert Einstein, Alfred Russel Wallace, Amazon Mechanical Turk, Andrew Wiles, bioinformatics, British Empire, Chelsea Manning, Clayton Christensen, cognitive bias, cognitive dissonance, conceptual framework, David Brooks, demographic transition, double entry bookkeeping, double helix, Galaxy Zoo, guest worker program, Gödel, Escher, Bach, Ignaz Semmelweis: hand washing, index fund, invention of movable type, Isaac Newton, John Harrison: Longitude, Kevin Kelly, life extension, meta analysis, meta-analysis, Milgram experiment, Nicholas Carr, p-value, Paul Erdős, Pluto: dwarf planet, randomized controlled trial, Richard Feynman, Rodney Brooks, social graph, social web, text mining, the scientific method, Thomas Kuhn: the structure of scientific revolutions, Thomas Malthus, Tyler Cowen: Great Stagnation

“Sildenafil: from angina to erectile dysfunction to pulmonary hypertension and beyond.” Nature Reviews Drug Discovery 5, no. 8 (August 2006): 689–702. 112 software designed to find undiscovered patterns: See TRIZ, a method of invention and discovery. For example, here: 112 computerized systems devoted to drug repurposing: Sanseau, Philippe, and Jacob Koehler. “Editorial: Computational Methods for Drug Repurposing.” Briefings in Bioinformatics 12, no. 4 (July 1, 2011): 301–2. 112 can generate new and interesting: Darden, Lindley. “Recent Work in Computational Scientific Discovery.” In Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society (1997) 161–66. 113 names a novel, computationally created: See TheoryMine: 116 A Cornell professor of earth and atmospheric sciences: Cisne, John L.


Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport


Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, data acquisition, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, Mark Zuckerberg, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, self-driving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining

Netflix created the Netflix Prize for the data science team that could optimize the company’s movie recommendations for customers and, as I noted in chapter 2, is now using big data to help in the creation of proprietary content. The testing firm Kaplan uses its big data to begin advising customers on effective learning and test-preparation strategies. Novartis focuses on big data—the health-care industry calls it informatics—to develop new drugs. Its CEO, Joe Jimenez, commented in an interview, “If you think about the amounts of data that are now available, bioinformatics capability is becoming very important, as is the ability to mine that data and really understand, for example, the specific mutations that are leading to certain types of cancers.”7 These companies’ big data efforts are directly focused on products, services, and customers. This has important implications, of course, for the organizational locus of big data and the processes and pace of new product development.


pages: 552 words: 168,518

MacroWikinomics: Rebooting Business and the World by Don Tapscott, Anthony D. Williams


accounting loophole / creative accounting, airport security, Andrew Keen, augmented reality, Ayatollah Khomeini, barriers to entry, bioinformatics, Bretton Woods, business climate, business process, car-free, carbon footprint, citizen journalism, Clayton Christensen, clean water, Climategate, Climatic Research Unit, cloud computing, collaborative editing, collapse of Lehman Brothers, collateralized debt obligation, colonial rule, corporate governance, corporate social responsibility, crowdsourcing, death of newspapers, demographic transition, distributed generation, don't be evil, energy security, energy transition, Exxon Valdez, failed state, fault tolerance, financial innovation, Galaxy Zoo, game design, global village, Google Earth, Hans Rosling, hive mind, Home mortgage interest deduction, interchangeable parts, Internet of things, invention of movable type, Isaac Newton, James Watt: steam engine, Jaron Lanier, jimmy wales, Joseph Schumpeter, Julian Assange, Kevin Kelly, knowledge economy, knowledge worker, Marshall McLuhan, medical bankruptcy, megacity, mortgage tax deduction, Netflix Prize, new economy, Nicholas Carr, oil shock, online collectivism, open borders, open economy, pattern recognition, peer-to-peer lending, personalized medicine, Ray Kurzweil, RFID, ride hailing / ride sharing, Ronald Reagan, scientific mainstream, shareholder value, Silicon Valley, Skype, smart grid, smart meter, social graph, social web, software patent, Steve Jobs, text mining, the scientific method, The Wisdom of Crowds, transaction costs, transfer pricing, University of East Anglia, urban sprawl, value at risk, WikiLeaks, X Prize, young professional, Zipcar

Wikis provide a shared space for group learning, discussion, and collaboration, while a Facebook-like social networking application helps connect researchers working on similar problems. Meanwhile, over at the European Bioinformatics Institute, scientists are using Web services to revolutionize the way they extract and interpret data from different sources, and to create entirely new data services. Imagine, for example, you wanted to find out everything there is to know about a species, from its taxonomy and genetic sequence to its geographical distribution. Now imagine you had the power to weave together all the latest data on that species from all of the world’s biological databases with just one click. It’s not far-fetched. That power is here, today. Projects like these have inspired researchers in many fields to emulate the changes that are already sweeping disciplines such as bioinformatics and high-energy physics. Having said that, there will be some difficult adjustments and issues such as privacy and national security to confront along the way.


HBase: The Definitive Guide by Lars George


Amazon Web Services, bioinformatics, create, read, update, delete, Debian, distributed revision control, domain-specific language, fault tolerance, Firefox, Google Earth, place-making, revision control, smart grid, web application

If we were to take 140 bytes per message, as used by Twitter, it would total more than 17 TB every month. Even before the transition to HBase, the existing system had to handle more than 25 TB a month.[12] In addition, less web-oriented companies from across all major industries are collecting an ever-increasing amount of data. For example:
Financial: Such as data generated by stock tickers
Bioinformatics: Such as the Global Biodiversity Information Facility
Smart grid: Such as the OpenPDC project
Sales: Such as the data generated by point-of-sale (POS) or stock/inventory systems
Genomics: Such as the Crossbow project
Cellular services, military, environmental: Which all collect a tremendous amount of data as well
Storing petabytes of data efficiently so that updates and retrieval are still performed well is no easy feat.
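As a rough check on the storage figure in this excerpt: the text gives the per-message size (140 bytes) and the monthly total (more than 17 TB) but not the number of messages, so the sketch below only back-solves the message count those two numbers imply. The helper name `implied_messages` and both unit conventions (decimal TB versus binary TiB) are assumptions made for illustration, not figures from the book.

```python
BYTES_PER_MESSAGE = 140   # per-message size quoted in the excerpt
TB_DECIMAL = 10**12       # assumption: 1 TB read as 10^12 bytes
TIB_BINARY = 2**40        # alternative reading: 1 TiB = 2^40 bytes


def implied_messages(total_bytes: int, bytes_per_message: int = BYTES_PER_MESSAGE) -> int:
    """Return how many fixed-size messages fit in total_bytes, rounded down."""
    return total_bytes // bytes_per_message


if __name__ == "__main__":
    # Back-solve the monthly message count implied by "17 TB at 140 bytes each".
    for label, unit in (("decimal TB", TB_DECIMAL), ("binary TiB", TIB_BINARY)):
        count = implied_messages(17 * unit)
        print(f"17 {label} at 140 B each: about {count / 1e9:.0f} billion messages")
```

Under either reading the implied volume is on the order of a hundred billion messages per month, which is the scale the excerpt is pointing at.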

A abort() method, HBaseAdmin class, Basic Operations Abortable interface, Basic Operations Accept header, switching REST formats, Supported formats, JSON (application/json), Protocol Buffer (application/x-protobuf) access control, Introduction to Coprocessors, HBase Versus Bigtable Bigtable column families for, HBase Versus Bigtable coprocessors for, Introduction to Coprocessors ACID properties, The Problem with Relational Database Systems add() method, Bytes class, The Bytes Class add() method, Put class, Single Puts addColumn() method, Get class, Single Gets addColumn() method, HBaseAdmin class, Schema Operations addColumn() method, Increment class, Multiple Counters addColumn() method, Scan class, Introduction addFamily() method, Get class, Single Gets addFamily() method, HTableDescriptor class, Table Properties addFamily() method, Scan class, Introduction, Client API: Best Practices add_peer command, HBase Shell, Replication alter command, HBase Shell, Data definition Amazon, The Dawn of Big Data, S3, S3 data requirements of, The Dawn of Big Data S3 (Simple Storage Service), S3, S3 Apache Avro, Introduction to REST, Thrift, and Avro (see Avro) Apache binary release for HBase, Apache Binary Release, Apache Binary Release Apache HBase, Quick-Start Guide (see HBase) Apache Hive, Hive (see Hive) Apache Lucene, Search Integration, Search Integration Apache Maven, Building the Examples (see Maven) Apache Pig, Pig (see Pig) Apache Solr, Search Integration Apache Whirr, deployment using, Apache Whirr, Apache Whirr Apache ZooKeeper, Implementation (see ZooKeeper) API, Native Java (see client API) append feature, for durability, Durability append() method, HLog class, HLog Class architecture, storage, Storage (see storage architecture) assign command, HBase Shell, Tools assign() method, HBaseAdmin class, Cluster Operations AssignmentManager class, The Region Life Cycle AsyncHBase client, Other Clients atomic read-modify-write, Dimensions, Tables, Rows, Columns, and 
Cells, Storage API, General Notes, Atomic compare-and-set, Atomic compare-and-set, Atomic compare-and-delete, Atomic compare-and-delete, Row Locks, WALEdit Class compare-and-delete operations, Atomic compare-and-delete, Atomic compare-and-delete compare-and-set, for put operations, Atomic compare-and-set, Atomic compare-and-set per-row basis for, Tables, Rows, Columns, and Cells, Storage API, General Notes row locks for, Row Locks for WAL edits, WALEdit Class auto-sharding, Auto-Sharding, Auto-Sharding Avro, Introduction to REST, Thrift, and Avro, Introduction to REST, Thrift, and Avro, Avro, Avro, Operation, Installation, Operation, Operation, Operation, Operation, Advanced Schemas documentation for, Operation installing, Installation port used by, Operation schema compilers for, Avro schema used by, Advanced Schemas starting server for, Operation stopping, Operation B B+ trees, B+ Trees, B+ Trees backup masters, adding, Adding a local backup master, Adding a backup master, Adding a backup master balancer, Load Balancing, Load Balancing, Node Decommissioning balancer command, HBase Shell, Tools, Load Balancing balancer() method, HBaseAdmin class, Cluster Operations, Load Balancing balanceSwitch() method, HBaseAdmin class, Cluster Operations, Load Balancing balance_switch command, HBase Shell, Tools, Load Balancing, Node Decommissioning base64 command, XML (text/xml) Base64 encoding, with REST, XML (text/xml), JSON (application/json) BaseEndpointCoprocessor class, The BaseEndpointCoprocessor class, The BaseEndpointCoprocessor class BaseMasterObserver class, The BaseMasterObserver class, The BaseMasterObserver class BaseRegionObserver class, The BaseRegionObserver class, The BaseRegionObserver class Batch class, The CoprocessorProtocol interface, The BaseEndpointCoprocessor class batch clients, Batch Clients batch operations, Batch Operations, Batch Operations, Caching Versus Batching, Caching Versus Batching, Custom Filters for scans, Caching Versus Batching, 
Caching Versus Batching, Custom Filters on tables, Batch Operations, Batch Operations batch() method, HTable class, Batch Operations, Batch Operations, Introduction to Counters Bigtable storage architecture, Backdrop, Summary, Nomenclature, HBase Versus Bigtable, HBase Versus Bigtable “Bigtable: A Distributed Storage System for Structured Data” (paper, by Google), Preface, Backdrop bin directory, Apache Binary Release BinaryComparator class, Comparators BinaryPrefixComparator class, Comparators binarySearch() method, Bytes class, The Bytes Class bioinformatics, data requirements of, The Dawn of Big Data BitComparator class, Comparators block cache, Single Gets, Introduction, Column Families, Column Families, Bloom Filters, Region Server Metrics, Client API: Best Practices, Configuration Bloom filters affecting, Bloom Filters controlling use of, Single Gets, Introduction, Client API: Best Practices enabling and disabling, Column Families metrics for, Region Server Metrics settings for, Configuration block replication, MapReduce Locality, MapReduce Locality blocks, Column Families, HFile Format, HFile Format, HFile Format, HFile Format compressing, HFile Format size of, Column Families, HFile Format Bloom filters, Column Families, Bloom Filters, Bloom Filters bypass() method, ObserverContext class, The ObserverContext class Bytes class, Single Puts, Single Gets, The Bytes Class, The Bytes Class C caching, Caching Versus Batching, Caching Versus Batching, Caching Versus Batching, The HTable Utility Methods, Client API: Best Practices, HBase Configuration Properties (see also block cache; Memcached) regions, The HTable Utility Methods for scan operations, Caching Versus Batching, Caching Versus Batching, Client API: Best Practices, HBase Configuration Properties Cacti server, JMXToolkit on, JMX Remote API call() method, Batch class, The CoprocessorProtocol interface CAP (consistency, availability, and partition tolerance) theorem, Nonrelational Database Systems, 
Not-Only SQL or NoSQL?


pages: 286 words: 90,530

Richard Dawkins: How a Scientist Changed the Way We Think by Alan Grafen; Mark Ridley


Alfred Russel Wallace, Arthur Eddington, bioinformatics, cognitive bias, computer age, conceptual framework, Dava Sobel, double helix, Douglas Hofstadter, epigenetics, Fellow of the Royal Society, Haight Ashbury, interchangeable parts, Isaac Newton, Johann Wolfgang von Goethe, John von Neumann, loose coupling, Murray Gell-Mann, Necker cube, phenotype, profit maximization, Ronald Reagan, Stephen Hawking, Steven Pinker, the scientific method, theory of mind, Thomas Kuhn: the structure of scientific revolutions, Yogi Berra

The invention of an algorithmic biology Seth Bullock BIOLOGY and computing might not seem the most comfortable of bedfellows. It is easy to imagine nature and technology clashing as the green-welly brigade rub up awkwardly against the back-room boffins. But collaboration between the two fields has exploded in recent years, driven primarily by massive investment in the emerging field of bioinformatics charged with mapping the human genome. New algorithms and computational infrastructures have enabled research groups to collaborate effectively on a worldwide scale in building huge, exponentially growing genomic databases, to ‘mine’ these mountains of data for useful information, and to construct and manipulate innovative computational models of the genes and proteins that have been identified.


pages: 471 words: 94,519

Managing Projects With GNU Make by Robert Mecklenburg, Andrew Oram


bioinformatics, general-purpose programming language, Richard Stallman

(question mark), Wildcards calling functions and, Wildcards character classes, Wildcards expanding, Wildcards misuse, Wildcards pattern rules and, Rules ~ (tilde), Wildcards Windows filesystem, Cygwin and, Filesystem wordlist function, String Functions words function, String Functions X XML, Ant, XML Preprocessing build files, Ant preprocessing book makefile, XML Preprocessing About the Author Robert Mecklenburg began using Unix as a student in 1977 and has been programming professionally for 23 years. His make experience started in 1982 at NASA with Unix version 7. Robert received his Ph.D. in Computer Science from the University of Utah in 1991. Since then, he has worked in many fields ranging from mechanical CAD to bioinformatics, and he brings his extensive experience in C++, Java, and Lisp to bear on the problems of project management with make. Colophon Our look is the result of reader comments, our own experimentation, and feedback from distribution channels. Distinctive covers complement our distinctive approach to technical topics, breathing personality and life into potentially dry subjects. The animal on the cover of Managing Projects with GNU Make, Third Edition is a potto, a member of the loris family.


pages: 502 words: 107,510

Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs


Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, game design, information retrieval, iterative process, natural language processing, pattern recognition, performance metric, sentiment analysis, social web, speech recognition, statistical model, text mining

Carlson, Andrew, Justin Betteridge, Richard C. Wang, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. “Coupled Semi-Supervised Learning for Information Extraction.” In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (WSDM). Chomsky, Noam. 1957. Syntactic Structures. Paris: Mouton. Chuzhanova, N.A., A.J. Jones, and S. Margetts. 1998. “Feature selection for genetic sequence classification.” Bioinformatics 14(2):139–143. Culotta, Aron, Michael Wick, Robert Hall, and Andrew McCallum. 2007. “First-Order Probabilistic Models for Coreference Resolution.” In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (HLT/NAACL). Derczynski, Leon, and Robert Gaizauskas. 2010. “USFD2: Annotating Temporal Expressions and TLINKs for TempEval-2.”


pages: 314 words: 94,600

Business Metadata: Capturing Enterprise Knowledge by William H. Inmon, Bonnie K. O'Neil, Lowell Fryman


affirmative action, bioinformatics, business intelligence, business process, call centre, carbon-based life, continuous integration, corporate governance, create, read, update, delete, database schema, informal economy, knowledge economy, knowledge worker, semantic web, The Wisdom of Crowds, web application

However, the NCI Thesaurus is not “just” a thesaurus; it uses OWL and is description logic based, also using a concept hierarchy organized into trees. The terms were stored in a 11179 registry, and the registry metadata was mapped to UML structures from the Class Diagram. The solution includes three main layers:
✦ Layer 1: Enterprise Vocabulary Services: DL (description logics) and ontology, thesaurus
✦ Layer 2: CADSR: Metadata Registry, consisting of Common Data Elements
✦ Layer 3: Cancer Bioinformatics Objects, using UML Domain Models
The NCI Thesaurus contains over 48,000 concepts. Although its emphasis is on machine understandability, NCI has managed to translate description logic somewhat into English. Linking concepts together is accomplished through roles, which are also concepts themselves. Here’s an example:
Concept: Disease: ALD Positive Anaplastic Large Cell Lymphoma
Role: Disease_Has_Molecular_Abnormality
Concept: Molecular Abnormality: Rearrangement of 2p23 (Warzel, 2006, p.18)
NCI’s toolkit is called caCORE, and it includes objects that developers can use in their applications.
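The concept/role/concept example in this excerpt can be pictured as a small data structure. The sketch below is a toy model only: the class names `Concept` and `Link` and the helper `targets_of` are hypothetical and are not part of caCORE or any NCI API. It simply shows a role stored as a concept in its own right, linking two other concepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    kind: str  # e.g. "Disease", "Molecular Abnormality", or "Role"
    name: str


@dataclass(frozen=True)
class Link:
    subject: Concept
    role: Concept    # the role is itself a Concept, as the excerpt describes
    target: Concept


def targets_of(links, subject, role_name):
    """Return every concept that `subject` points to through the named role."""
    return [link.target for link in links
            if link.subject == subject and link.role.name == role_name]


if __name__ == "__main__":
    # Values copied from the excerpt's example.
    disease = Concept("Disease", "ALD Positive Anaplastic Large Cell Lymphoma")
    role = Concept("Role", "Disease_Has_Molecular_Abnormality")
    abnormality = Concept("Molecular Abnormality", "Rearrangement of 2p23")
    links = [Link(disease, role, abnormality)]
    for concept in targets_of(links, disease, role.name):
        print(f"{disease.name} -> {role.name} -> {concept.name}")
```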


pages: 313 words: 34,042

Tools for Computational Finance by Rüdiger Seydel


bioinformatics, Black-Scholes formula, Brownian motion, continuous integration, discrete time, implied volatility, incomplete markets, interest rate swap, linear programming, London Interbank Offered Rate, mandelbrot fractal, martingale, random walk, stochastic process, stochastic volatility, transaction costs, value at risk, volatility smile, Wiener process, zero-coupon bond

.: Second Course in Ordinary Differential Equations for Scientists and Engineers Franke, J.; Härdle, W.; Hafner, C. M.: Statistics of Financial Markets: An Introduction Hurwitz, A.; Kritikos, N.: Lectures on Number Theory Frauenthal, J. C.: Mathematical Modeling in Epidemiology Huybrechts, D.: Complex Geometry: An Introduction Freitag, E.; Busam, R.: Complex Analysis Isaev, A.: Introduction to Mathematical Methods in Bioinformatics Friedman, R.: Algebraic Surfaces and Holomorphic Vector Bundles Fuks, D. B.; Rokhlin, V. A.: Beginner’s Course in Topology Fuhrmann, P. A.: A Polynomial Approach to Linear Algebra Gallot, S.; Hulin, D.; Lafontaine, J.: Riemannian Geometry Istas, J.: Mathematical Modeling for the Life Sciences Iversen, B.: Cohomology of Sheaves Jacod, J.; Protter, P.: Probability Essentials Jennings, G. A.: Modern Geometry with Applications Gardiner, C.


pages: 313 words: 101,403

My Life as a Quant: Reflections on Physics and Finance by Emanuel Derman


Berlin Wall, bioinformatics, Black-Scholes formula, Brownian motion, capital asset pricing model, Claude Shannon: information theory, Emanuel Derman, fixed income, Gödel, Escher, Bach, haute couture, hiring and firing, implied volatility, interest rate derivative, Jeff Bezos, John von Neumann, law of one price, linked data, Long Term Capital Management, moral hazard, Murray Gell-Mann, pre–internet, publish or perish, quantitative trading / quantitative finance, Richard Feynman, Sharpe ratio, statistical arbitrage, statistical model, Stephen Hawking, Steve Jobs, stochastic volatility, technology bubble, transaction costs, value at risk, volatility smile, Y2K, yield curve, zero-coupon bond

When I asked if he had known David, he told me that O'Connor had been intent on shutting down David's enterprise. With their deep pockets, he said "they had guys spending all their time running diff RMSs files and the O'Connor code" (Diff is one of the great suite of UNIX tools that make a programmer's life easier. It compares two different files of text and finds any common strings of words in them, a simpler version of current bio-informatics programs that search for common strings of DNA in the mouse and human genome.) I have no idea whether there were in fact commonalities, but even independent people coding the same well-known algorithm might end up writing vaguely similar chunks of code. O'Connor eventually disappeared, too, absorbed into Swiss Bank, which itself subsequently merged with UBS. Starting in 1990 David disappeared into some alternate nonfinancial New York; none of his old friends saw him anymore.


pages: 313 words: 84,312

We-Think: Mass Innovation, Not Mass Production by Charles Leadbeater


1960s counterculture, Andrew Keen, barriers to entry, bioinformatics, call centre, citizen journalism, clean water, cloud computing, complexity theory, congestion charging, death of newspapers, Debian, digital Maoism, double helix, Edward Lloyd's coffeehouse, frictionless, frictionless market, future of work, game design, Google Earth, Google X / Alphabet X, Hacker Ethic, Hernando de Soto, hive mind, Howard Rheingold, interchangeable parts, Isaac Newton, James Watt: steam engine, Jane Jacobs, Jaron Lanier, Jean Tirole, jimmy wales, John von Neumann, Kevin Kelly, knowledge economy, knowledge worker, lone genius, M-Pesa, Mark Zuckerberg, Marshall McLuhan, Menlo Park, microcredit, new economy, Nicholas Carr, online collectivism, planetary scale, post scarcity, Richard Stallman, Silicon Valley, slashdot, social web, software patent, Steven Levy, Stewart Brand, supply-chain management, The Death and Life of Great American Cities, the market place, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Whole Earth Catalog, Zipcar

Yet even more traditional sectors will feel the pull of the pebbles in time, not least because the consumers and workforce of the near future will have grown up using the social web to search for and share ideas with one another. They will bring with them the web’s culture of lateral, semi-structured free association. This new organisational landscape is taking shape all around us. Scientific research is becoming ever more a question of organising a vast number of pebbles. Young scientists especially in emerging fields like bioinformatics draw on hundreds of data banks; use electronic lab notebooks to record and then share their results daily, often through blogs and wikis; work in multi-disciplinary teams threaded around the world organised by social networks; they publish their results, including open source versions of the software used in their experiments and their raw data, in open access online journals. Schools and universities are boulders, that are increasingly dealing with students who want to be in the pebble business, drawing information from a variety of sources, sharing with their peers, learning from one another.


pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists by Gary Marcus, Jeremy Freeman


23andMe, Albert Einstein, bioinformatics, bitcoin, brain emulation, cloud computing, complexity theory, computer age, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data acquisition, Drosophila, epigenetics, Google Glasses, iterative process, linked data, mouse model, optical character recognition, pattern recognition, personalized medicine, phenotype, race to the bottom, Richard Feynman, Ronald Reagan, semantic web, speech recognition, stem cell, Steven Pinker, supply-chain management, Turing machine, web application

(Although, of course, there is potential to miss the true culprit if it lies outside the exome.) When geneticists began exome sequencing in earnest, they encountered an unexpected complication. It turns out that each human individual carries a surprisingly high number of potentially deleterious mutations, typically more than one hundred. These are mutations that alter or disturb protein sequences in a way that is predicted to have a damaging effect on protein function, based on bioinformatic (computer-based) analyses. Each mutation might be extremely rare in the population, or even unique to the person or family in which it is found. How do we sift out the true causal mutations, the ones that are functionally implicated in the disorder or trait we are studying, against a broader background of irrelevant genomic change? Sometimes we can rely on a lucky convergence of findings, for example, where distinct mutations in the same gene pop up in multiple different affected families or cases.


pages: 313 words: 95,077

Here Comes Everybody: The Power of Organizing Without Organizations by Clay Shirky


Andrew Keen, Berlin Wall, bioinformatics, Brewster Kahle, crowdsourcing, hiring and firing, hive mind, Howard Rheingold, Internet Archive, invention of agriculture, invention of movable type, invention of the printing press, invention of the telegraph, jimmy wales, Kuiper Belt, lump of labour, Mahatma Gandhi, means of production, Merlin Mann, Nash equilibrium, Network effects, Nicholas Carr, Picturephone, place-making, Pluto: dwarf planet, prediction markets, price mechanism, prisoner's dilemma, profit motive, Richard Stallman, Ronald Coase, Silicon Valley, slashdot, social software, Stewart Brand, supply-chain management, The Nature of the Firm, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, transaction costs, ultimatum game, Yogi Berra

The Chinese had the best chance of sequencing the virus; the threat of SARS was most significant in Asia, and especially in China, which had most of the world’s confirmed cases, and China is home to brilliant biologists, with significant expertise in distributed computing. Despite these resources and incentives, however, the solution didn’t come from China. On April 12, Genome Sciences Centre (GSC), a small Canadian lab specializing in the genetics of pathogens, published the genetic sequence of SARS. On the way, they had participated in not just one open network, but several. Almost the entire computational installation of GSC is open source; bioinformatics tools with names like BLAST, Phrap, Phred, and Consed, all running on Linux. GSC checked their work against Genbank, a public database of genetic sequences. They published their findings on their own site (run, naturally, using open source tools) and published the finished sequence to Genbank, for everyone to see. The story is shot through with involvement in various participatory networks.


pages: 364 words: 99,897

The Industries of the Future by Alec Ross


23andMe, 3D printing, Airbnb, algorithmic trading, AltaVista, Anne Wojcicki, autonomous vehicles, banking crisis, barriers to entry, Bernie Madoff, bioinformatics, bitcoin, blockchain, Brian Krebs, British Empire, business intelligence, call centre, carbon footprint, cloud computing, collaborative consumption, connected car, corporate governance, Credit Default Swap, cryptocurrency, David Brooks, disintermediation, Dissolution of the Soviet Union, distributed ledger, Edward Glaeser, Edward Snowden, Erik Brynjolfsson, fiat currency, future of work, global supply chain, Google X / Alphabet X, industrial robot, Internet of things, invention of the printing press, Jaron Lanier, Jeff Bezos, job automation, knowledge economy, knowledge worker, litecoin, M-Pesa, Mark Zuckerberg, Mikhail Gorbachev, mobile money, money: store of value / unit of account / medium of exchange, new economy, offshore financial centre, open economy, peer-to-peer lending, personalized medicine, Peter Thiel, precision agriculture, pre–internet, RAND corporation, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, Satoshi Nakamoto, self-driving car, sharing economy, Silicon Valley, Silicon Valley startup, Skype, smart cities, social graph, software as a service, special economic zone, supply-chain management, supply-chain management software, technoutopianism, underbanked, Vernor Vinge, Watson beat the top human players on Jeopardy!, women in the workforce, Y Combinator, young professional

While Seltzer makes the case that virtually every bit of our personal information is now available to those who want it, I do think there are parts of our lives that remain private and that we must fight to keep private. And I think the best way to do that is by focusing on defining rules for data retention and proper use. Most of our health information remains private, and the need for privacy will grow with the rise of genomics. John Quackenbush, a professor of computational biology and bioinformatics at Harvard, explained that “as soon as you touch genomic data, that information is fundamentally identifiable. I can erase your address and Social Security number and every other identifier, but I can’t anonymize your genome without wiping out the information that I need to analyze.” The danger of genomic information being widely available is difficult to overstate. All of the most intimate details of who and what we are genetically could be used by governments or corporations for reasons going beyond trying to develop precision medicines.


pages: 354 words: 91,875

The Willpower Instinct: How Self-Control Works, Why It Matters, and What You Can Do to Get More of It by Kelly McGonigal


banking crisis, bioinformatics, Cass Sunstein, choice architecture, cognitive bias, delayed gratification, game design, impulse control, loss aversion, meta analysis, meta-analysis, phenotype, Richard Thaler, Wall-E, Walter Mischel

See also Witkiewitz, K., and S. Bowen. “Depression, Craving, and Substance Use Following a Randomized Trial of Mindfulness-Based Relapse Prevention.” Journal of Consulting and Clinical Psychology 78 (2010): 362–74. Chapter 10: Final Thoughts Page 237—“Only reasonable conclusion to a book about scientific ideas is: Draw your own conclusions”: Credit for this suggestion goes to Brian Kidd, Senior Bioinformatics Research Specialist, Institute for Infection Immunity and Transplantation, Stanford University. INDEX acceptance inner power of Adams, Claire addiction addict loses his cravings candy addict conquers sweet tooth chocoholic takes inspiration from Hershey’s Kisses dopamine’s role in drinking drug e-mail Facebook shopping smoker under social influence smoking Advisor-Teller Money Manager Intervention (ATM) Ainslie, George Air Force Academy, U.S.


pages: 532 words: 139,706

Googled: The End of the World as We Know It by Ken Auletta


23andMe, AltaVista, Anne Wojcicki, Apple's 1984 Super Bowl advert, bioinformatics, Burning Man, carbon footprint, citizen journalism, Clayton Christensen, cloud computing, Colonization of Mars, corporate social responsibility, death of newspapers, disintermediation, don't be evil, facts on the ground, Firefox, Frank Gehry, Google Earth, hypertext link, Innovator's Dilemma, Internet Archive, invention of the telephone, Jeff Bezos, jimmy wales, Kevin Kelly, knowledge worker, Long Term Capital Management, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Network effects, new economy, Nicholas Carr, PageRank, Paul Buchheit, Peter Thiel, Ralph Waldo Emerson, Richard Feynman, Sand Hill Road, Saturday Night Live, semantic web, sharing economy, Silicon Valley, Skype, slashdot, social graph, spectrum auction, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, strikebreaker, telemarketer, the scientific method, The Wisdom of Crowds, Upton Sinclair, X Prize, yield management

Measured by growth, it was Google’s best year, with revenues soaring 60 percent to $16.6 billion, with international revenues contributing nearly half the total, and with profits climbing to $4.2 billion. Google ended the year with 16,805 full-time employees, offices in twenty countries, and the search engine available in 117 languages. And the year had been a personally happy one for Page and Brin. Page married Lucy Southworth, a former model who earned her Ph.D. in bioinformatics in January 2009 from Stanford; they married seven months after Brin wed Anne Wojcicki. But Sheryl Sandberg was worried. She had held a ranking job in the Clinton administration before joining Google in 2001, where she supervised all online sales for AdWords and AdSense, and was regularly hailed by Fortune magazine as one of the fifty most powerful female executives in America. Sandberg came to believe Google’s vice was the flip side of its virtue.


pages: 470 words: 109,589

Apache Solr 3 Enterprise Search Server by Unknown


bioinformatics, continuous integration, database schema, fault tolerance, Firefox, full text search, information retrieval, Internet Archive, natural language processing, performance metric, platform as a service, web application

Lastly I want to thank all the adopters of Solr and Lucene! Without you, I wouldn't have this wonderful open source project to be so incredibly proud to be a part of! I look forward to meeting more of you at the next LuceneRevolution or Euro Lucene conference. About the Reviewers Jerome Eteve holds a MSc in IT and Sciences from the University of Lille (France). After starting his career in the field of bioinformatics where he worked as a Biological Data Management and Analysis Consultant, he's now a Senior Application Developer with interests ranging from architecture to delivering a great user experience online. He's passionate about open source technologies, search engines, and web application architecture. He now works for WCN Plc, a leading provider of recruitment software solutions. He has worked on Packt's Enterprise Solr published in 2009.


pages: 339 words: 57,031

From Counterculture to Cyberculture: Stewart Brand, the Whole Earth Network, and the Rise of Digital Utopianism by Fred Turner


1960s counterculture, A Declaration of the Independence of Cyberspace, Apple's 1984 Super Bowl advert, back-to-the-land, bioinformatics, Buckminster Fuller, Claude Shannon: information theory, complexity theory, computer age, conceptual framework, Danny Hillis, dematerialisation, distributed generation, Douglas Engelbart, Dynabook, From Mathematics to the Technologies of Life and Death, future of work, game design, George Gilder, global village, Golden Gate Park, Hacker Ethic, Haight Ashbury, hive mind, Howard Rheingold, informal economy, invisible hand, Jaron Lanier, John von Neumann, Kevin Kelly, knowledge economy, knowledge worker, market bubble, Marshall McLuhan, means of production, Menlo Park, Mother of all demos, new economy, Norbert Wiener, post-industrial society, postindustrial economy, Productivity paradox, QWERTY keyboard, Ralph Waldo Emerson, RAND corporation, Richard Stallman, Robert Shiller, Ronald Reagan, Silicon Valley, Silicon Valley ideology, South of Market, San Francisco, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, technoutopianism, Ted Nelson, Telecommunications Act of 1996, theory of mind, urban renewal, Vannevar Bush, Whole Earth Catalog, Whole Earth Review, Yom Kippur War

Hosted at Los Alamos by Christopher Langton, then a postdoctoral researcher at the laboratory, the conference brought together 160 biologists, physicists, anthropologists, and computer scientists. Like the scientists and technicians of the Rad Lab and Los Alamos in World War II, the contributors to the first Artificial Life Conference quickly established an intellectual trading zone. Specialists in robotics presented papers on questions of cultural evolution; computer scientists used new algorithms to model seemingly biological patterns of growth; bioinformatics specialists applied what they believed to be principles of natural ecologies to the development of social structures. For these scientists, as formerly for members of the Rad Lab and the cold war research institutes that followed it, systems theory served as a contact language and computers served as key supports for a systems orientation toward interdisciplinary work. Furthermore, computers granted participants in the workshop a familiar God’s-eye point of view.


pages: 561 words: 120,899

The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy by Sharon Bertsch McGrayne


bioinformatics, British Empire, Claude Shannon: information theory, Daniel Kahneman / Amos Tversky, double helix, Edmond Halley, Fellow of the Royal Society, full text search, Henri Poincaré, Isaac Newton, John Nash: game theory, John von Neumann, linear programming, meta analysis, meta-analysis, Nate Silver, p-value, placebo effect, prediction markets, RAND corporation, recommendation engine, Renaissance Technologies, Richard Feynman, Richard Feynman: Challenger O-ring, Ronald Reagan, speech recognition, statistical model, stochastic process, Thomas Kuhn: the structure of scientific revolutions, traveling salesman, Turing machine, Turing test, uranium enrichment, Yom Kippur War

Ron Howard, who had become interested in Bayes while at Harvard, was working on Bayesian networks in Stanford’s economic engineering department. A medical student, David E. Heckerman, became interested too and for his Ph.D. dissertation wrote a program to help pathologists diagnose lymph node diseases. Computerized diagnostics had been tried but abandoned decades earlier. Heckerman’s Ph.D. in bioinformatics concerned medicine, but his software won a prestigious national award in 1990 from the Association for Computing Machinery, the professional organization for computing. Two years later, Heckerman went to Microsoft to work on Bayesian networks. The Food and Drug Administration (FDA) allows the manufacturers of medical devices to use Bayes in their final applications for FDA approval. Devices include almost any medical item that is not a drug or biological product, items such as latex gloves, intraocular lenses, breast implants, thermometers, home AIDS kits, and artificial hips and hearts.


pages: 486 words: 132,784

Inventors at Work: The Minds and Motivation Behind Modern Inventions by Brett Stern


Apple II, augmented reality, autonomous vehicles, bioinformatics, Build a better mousetrap, business process, cloud computing, computer vision, cyber-physical system, distributed generation, game design, Grace Hopper, Richard Feynman, Silicon Valley, skunkworks, Skype, smart transportation, speech recognition, statistical model, stealth mode startup, Steve Jobs, Steve Wozniak, the market place, Yogi Berra

Dougherty: Oftentimes, inventors who are prosecuting their application pro se are unaware that they may ask the examiner for assistance in drafting allowable claims if there is allowable subject matter in the written disclosure. The examiner’s function is to allow valid patents. So, they will help the inventor come to an allowable subject matter if it exists in the application. Stern: Which technologies or fields exhibit high-growth trends in terms of patents? Calvert: One area that is going to be big is bioinformatics, which is biology and computer software working together. Dougherty: Medical device art is a high-growth area, too. People are living longer and they’re seeking to reduce costs for an enhanced life. Devices are getting smaller. Nanotechnology is already enabling medical devices, for example, that can travel through your bloodstream, collecting and reporting medical data in real time. Calvert: Another area that’s booming is electronic games and betting devices in the gambling industry.


pages: 458 words: 135,206

CTOs at Work by Scott Donaldson, Stanley Siegel, Gary Donaldson


Amazon Web Services, bioinformatics, business intelligence, business process, call centre, centre right, cloud computing, computer vision, connected car, crowdsourcing, data acquisition, distributed generation, domain-specific language, glass ceiling, pattern recognition, Pluto: dwarf planet, Richard Feynman, shareholder value, Silicon Valley, Skype, smart grid, smart meter, software patent, thinkpad, web application, zero day

For example, in the ISR (intelligence, surveillance and reconnaissance) domain, we produce sensors that generate the bits, transfer those bits through networks, wireless or wired, convert the bits into data, into knowledge, and into decisions through the processing, exploitation, and dissemination chain. With a teammate we developed a brand-new type of biological sensor that we called “TIGER” (Threat ID through Genetic Evaluation of Risk). That technology won The Wall Street Journal “gold” Technology Innovation Award in 2009 for the best invention of the year. It relies on a combination of advanced biotech hardware with groundbreaking bio-informatics techniques that were based on our radar signal processing expertise. Information from a sensor like that can feed into our epidemiology and disease tracking work. That's an example of a sensor at the front end through information flow at the back end. In the cyber security domain, our subsidiary, CloudShield, has a very special piece of hardware that enables real-time, deep packet inspection of network traffic at network line speeds, and that allows you to find cyber threats embedded in the traffic.


pages: 398 words: 31,161

Gnuplot in Action: Understanding Data With Graphs by Philipp Janert


bioinformatics, business intelligence, centre right, Debian, general-purpose programming language, iterative process, mandelbrot fractal, pattern recognition, random walk, Richard Stallman, six sigma

For a project to be listed here, first of all I had to be aware of it. Then, the project had to be ■ Free and open source ■ Available for the Linux platform ■ Active and mature ■ Available as a standalone product and allowing interactive use (this requirement eliminates libraries and graphics command languages) ■ Reasonably general purpose (this eliminates specialized tools for molecular modeling, bio-informatics, high-energy physics, and so on) ■ Comparable to or going beyond gnuplot in at least some respects C.3.1 Math and statistics programming environments R The R language and environment ( are in many ways the de facto standard for statistical computing and graphics using open source tools. R shares with gnuplot an emphasis on iterative work in an interactive environment. It’s extensible, and many user-contributed packages are available from the R website and its mirrors.


pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies by Nick Bostrom


agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, anti-communist, artificial general intelligence, autonomous vehicles, barriers to entry, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, demographic transition, Douglas Hofstadter, Drosophila, Elon Musk, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, Gödel, Escher, Bach, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John von Neumann, knowledge worker, Menlo Park, meta analysis, meta-analysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Norbert Wiener, NP-complete, nuclear winter, optical character recognition, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, strong AI, superintelligent machines, supervolcano, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, transaction costs, Turing machine, Vernor Vinge, Watson beat the top human players on Jeopardy!, World Values Survey

They also provide important insight into the concept of causality.28 One advantage of relating learning problems from specific domains to the general problem of Bayesian inference is that new algorithms that make Bayesian inference more efficient will then yield immediate improvements across many different areas. Advances in Monte Carlo approximation techniques, for example, are directly applied in computer vision, robotics, and computational genetics. Another advantage is that it lets researchers from different disciplines more easily pool their findings. Graphical models and Bayesian statistics have become a shared focus of research in many fields, including machine learning, statistical physics, bioinformatics, combinatorial optimization, and communication theory.35 A fair amount of the recent progress in machine learning has resulted from incorporating formal results originally derived in other academic fields. (Machine learning applications have also benefitted enormously from faster computers and greater availability of large data sets.) * * * Box 1 An optimal Bayesian agent An ideal Bayesian agent starts out with a “prior probability distribution,” a function that assigns probabilities to each “possible world” (i.e. to each maximally specific way the world could turn out to be).29 This prior incorporates an inductive bias such that simpler possible worlds are assigned higher probabilities.


pages: 413 words: 119,587

Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots by John Markoff


A Declaration of the Independence of Cyberspace, AI winter, airport security, Apple II, artificial general intelligence, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, Bill Duvall, bioinformatics, Brewster Kahle, Burning Man, call centre, cellular automata, Chris Urmson, Claude Shannon: information theory, Clayton Christensen, clean water, cloud computing, collective bargaining, computer age, computer vision, crowdsourcing, Danny Hillis, DARPA: Urban Challenge, data acquisition, Dean Kamen, deskilling, don't be evil, Douglas Engelbart, Douglas Hofstadter, Dynabook, Edward Snowden, Elon Musk, Erik Brynjolfsson, factory automation, From Mathematics to the Technologies of Life and Death, future of work, Galaxy Zoo, Google Glasses, Google X / Alphabet X, Grace Hopper, Gödel, Escher, Bach, Hacker Ethic, haute couture, hive mind, hypertext link, indoor plumbing, industrial robot, information retrieval, Internet Archive, Internet of things, invention of the wheel, Jacques de Vaucanson, Jaron Lanier, Jeff Bezos, job automation, John Conway, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, knowledge worker, Kodak vs Instagram, labor-force participation, loose coupling, Mark Zuckerberg, Marshall McLuhan, medical residency, Menlo Park, Mother of all demos, natural language processing, new economy, Norbert Wiener, PageRank, pattern recognition, pre–internet, RAND corporation, Ray Kurzweil, Richard Stallman, Robert Gordon, Rodney Brooks, Sand Hill Road, Second Machine Age, self-driving car, semantic web, shareholder value, side project, Silicon Valley, Silicon Valley startup, Singularitarianism, skunkworks, Skype, social software, speech recognition, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Nelson, telemarketer, telepresence, telepresence robot, Tenerife airport disaster, The Coming Technological Singularity, the medium is the message, Thorstein Veblen, Turing test, Vannevar Bush, Vernor Vinge, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, William Shockley: the traitorous eight

Seated in his office at the company’s Mountain View headquarters, he read a message that warned him an alien attack was under way. Immediately after he read the message, two large men burst into his office and instructed him that it was essential he immediately accompany them to an undisclosed location in Woodside, the elite community populated by Silicon Valley’s technology executives and venture capitalists. This was Page’s surprise fortieth birthday party, orchestrated by his wife, Lucy Southworth, a Stanford bioinformatics Ph.D. A crowd of 150 people in appropriate alien-themed costumes had gathered, including Google cofounder Sergey Brin, who wore a dress. In the basement of the sprawling mansion where the party was held, a robot arm grabbed small boxes one at a time and gaily tossed the souvenirs to an appreciative crowd. The robot itself consisted of a standard Japanese-made industrial robot arm outfitted with a suction gripper hand driven by a noisy air compressor.


pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos


3D printing, Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, crowdsourcing, Danny Hillis, data is the new oil, double helix, Douglas Hofstadter, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, global village, Google Glasses, Gödel, Escher, Bach, information retrieval, job automation, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, lone genius, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, NP-complete, P = NP, PageRank, pattern recognition, phenotype, planetary scale, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, Second Machine Age, self-driving car, Silicon Valley, speech recognition, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight

Statistical Language Learning,* by Eugene Charniak (MIT Press, 1996), explains how hidden Markov models work. Statistical Methods for Speech Recognition,* by Fred Jelinek (MIT Press, 1997), describes their application to speech recognition. The story of HMM-style inference in communication is told in “The Viterbi algorithm: A personal history,” by David Forney (unpublished; online). Bioinformatics: The Machine Learning Approach,* by Pierre Baldi and Søren Brunak (2nd ed., MIT Press, 2001), is an introduction to the use of machine learning in biology, including HMMs. “Engineers look to Kalman filtering for guidance,” by Barry Cipra (SIAM News, 1993), is a brief introduction to Kalman filters, their history, and their applications. Judea Pearl’s pioneering work on Bayesian networks appears in his book Probabilistic Reasoning in Intelligent Systems* (Morgan Kaufmann, 1988).


pages: 445 words: 129,068

The Speed of Dark by Elizabeth Moon


bioinformatics, gravity well, hiring and firing, industrial robot, life extension, theory of mind

I stare at him and almost forget to stand up and say the words of the Nicene Creed, which is what comes next. I believe in God the Father, maker of heaven and earth and of all things seen and unseen. I believe God is important and does not make mistakes. My mother used to joke about God making mistakes, but I do not think if He is God He makes mistakes. So it is not a silly question. Do I want to be healed? And of what? The only self I know is this self, the person I am now, the autistic bioinformatics specialist fencer lover of Marjory. And I believe in his only begotten son, Jesus Christ, who actually in the flesh asked that question of the man by the pool. The man who perhaps—the story does not say—had gone there because people were tired of him being sick and disabled, who perhaps had been content to lie down all day, but he got in the way. What would Jesus have done if the man had said, “No, I don’t want to be healed; I am quite content as I am”?


pages: 476 words: 120,892

Life on the Edge: The Coming of Age of Quantum Biology by Johnjoe McFadden, Jim Al-Khalili


agricultural Revolution, Albert Einstein, Alfred Russel Wallace, bioinformatics, complexity theory, dematerialisation, double helix, Douglas Hofstadter, Drosophila, Ernest Rutherford, Gödel, Escher, Bach, invention of the printing press, Isaac Newton, James Watt: steam engine, Louis Pasteur, New Journalism, phenotype, Richard Feynman, Schrödinger's Cat, theory of mind, traveling salesman, uranium enrichment, Zeno's paradox

Carlson, V. Gray-Schopfer, M. Dessing and C. Olsson, “Increased transcription levels induce higher mutation rates in a hypermutating cell line,” Journal of Immunology, vol. 166: 8 (2001), pp. 5051–7. 8 P. Cui, F. Ding, Q. Lin, L. Zhang, A. Li, Z. Zhang, S. Hu and J. Yu, “Distinct contributions of replication and transcription to mutation rate variation of human genomes,” Genomics, Proteomics and Bioinformatics, vol. 10: 1 (2012), pp. 4–10. 9 J. Cairns, J. Overbaugh and S. Millar, “The origin of mutants,” Nature, vol. 335 (1988), pp. 142–5. 10 John Cairns on Jim Watson, Cold Spring Harbor Oral History Collection. Interview available at: 11 J. Gribbin, In Search of Schrödinger’s Cat (London: Wildwood House, 1984; repr.


pages: 476 words: 148,895

Cooked: A Natural History of Transformation by Michael Pollan


biofilm, bioinformatics, Columbian Exchange, correlation does not imply causation, dematerialisation, Drosophila, energy security, Gary Taubes, Hernando de Soto, Louis Pasteur, Mason jar, microbiome, peak oil, Ralph Waldo Emerson, Steven Pinker, women in the workforce

Blaser, Martin J. “Who Are We? Indigenous Microbes and the Ecology of Human Disease.” European Molecular Biology Organization, Vol. 7, No. 10, 2006. Bravo, Javier A., et al. “Ingestion of Lactobacillus Strain Regulates Emotional Behavior and Central GABA Receptor Expression in a Mouse Via the Vagus Nerve.” Desiere, Frank, et al. “Bioinformatics and Data Knowledge: The New Frontiers for Nutrition and Food.” Trends in Food Science & Technology 12 (2002): 215–29. Douwes, J., et al. “Farm Exposure in Utero May Protect Against Asthma.” European Respiratory Journal 32 (2008): 603–11. Ege, M.J., et al. Parsifal study team. “Prenatal Farm Exposure Is Related to the Expression of Receptors of the Innate Immunity and to Atopic Sensitization in School-Age Children.”


pages: 437 words: 113,173

Age of Discovery: Navigating the Risks and Rewards of Our New Renaissance by Ian Goldin, Chris Kutarna


2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, Airbnb, Albert Einstein, AltaVista, Asian financial crisis, asset-backed security, autonomous vehicles, banking crisis, barriers to entry, battle of ideas, Berlin Wall, bioinformatics, bitcoin, Bonfire of the Vanities, clean water, collective bargaining, Colonization of Mars, Credit Default Swap, crowdsourcing, cryptocurrency, Dava Sobel, demographic dividend, Deng Xiaoping, Doha Development Round, double helix, Edward Snowden, Elon Musk, epigenetics, experimental economics, failed state, Fall of the Berlin Wall, financial innovation, full employment, Galaxy Zoo, global supply chain, Hyperloop, immigration reform, income inequality, indoor plumbing, industrial robot, information retrieval, intermodal, Internet of things, invention of the printing press, Isaac Newton, Islamic Golden Age, Khan Academy, Kickstarter, labour market flexibility, low cost carrier, low skilled workers, Lyft, Malacca Straits, megacity, Mikhail Gorbachev, moral hazard, Network effects, New Urbanism, non-tariff barriers, Occupy movement, On the Revolutions of the Heavenly Spheres, open economy, Panamax, personalized medicine, Peter Thiel, post-Panamax, profit motive, rent-seeking, reshoring, Robert Gordon, Search for Extraterrestrial Intelligence, Second Machine Age, self-driving car, Shenzhen was a fishing village, Silicon Valley, Silicon Valley startup, Skype, smart grid, Snapchat, special economic zone, spice trade, statistical model, Stephen Hawking, Steve Jobs, Stuxnet, TaskRabbit, too big to fail, trade liberalization, trade route, transaction costs, transatlantic slave trade, uranium enrichment, We are the 99%, We wanted flying cars, instead we got 140 characters, working poor, working-age population, zero day

Costandi, Moheb (2012, June 19). “Surgery on Ice.” Nature Middle East. Retrieved from 8. Dwyer, Terence, PhD. (2015, October 1). “The Present State of Medical Science.” Interviewed by C. Kutarna, University of Oxford. 9. National Human Genome Research Institute (1998). “Twenty Questions about DNA Sequencing (and the Answers).” NHGRI. Retrieved from 10. Rincon, Paul (2014, January 15). “Science Enters $1,000 Genome Era.” BBC News. Retrieved from 11. Regalado, Antonio (2014, September 24). “Emtech: Illumina Says 228,000 Human Genomes Will Be Sequenced This Year.” MIT Technology Review. Retrieved from 12. GENCODE (2015, July 15). “Statistics about the Current Human Gencode Release.”


pages: 504 words: 89,238

Natural language processing with Python by Steven Bird, Ewan Klein, Edward Loper


bioinformatics, business intelligence, conceptual framework, elephant in my pajamas, finite state, Firefox, information retrieval, Menlo Park, natural language processing, P = NP, search inside the book, speech recognition, statistical model, text mining, Turing test

In Proceedings of the 14th Conference on Computational Linguistics (COLING), pages 539–545, 1992. [Heim and Kratzer, 1998] Irene Heim and Angelika Kratzer. Semantics in Generative Grammar. Blackwell, 1998. [Hirschman et al., 2005] Lynette Hirschman, Alexander Yeh, Christian Blaschke, and Alfonso Valencia. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics, 6, May 2005. Supplement 1. [Hodges, 1977] Wilfred Hodges. Logic. Penguin Books, Harmondsworth, 1977. [Huddleston and Pullum, 2002] Rodney D. Huddleston and Geoffrey K. Pullum. The Cambridge Grammar of the English Language. Cambridge University Press, 2002. [Hunt and Thomas, 2000] Andrew Hunt and David Thomas. The Pragmatic Programmer: From Journeyman to Master. Addison Wesley, 2000. [Indurkhya and Damerau, 2010] Nitin Indurkhya and Fred Damerau, editors.


pages: 1,201 words: 233,519

Coders at Work by Peter Seibel


Ada Lovelace, bioinformatics, cloud computing, Conway's Game of Life, domain-specific language, fault tolerance, Fermat's Last Theorem, Firefox, George Gilder, glass ceiling, HyperCard, information retrieval, loose coupling, Menlo Park, Metcalfe's law, premature optimization, publish or perish, random walk, revision control, Richard Stallman, rolodex, Saturday Night Live, side project, slashdot, speech recognition, the scientific method, Therac-25, Turing complete, Turing machine, Turing test, type inference, Valgrind, web application

But we have to be willing to try and take advantage of that, but also take advantage of the integration of systems and the fact that data's coming from everywhere. It's no longer encapsulated with the program, the code. We're seeing now, I think, vast amounts of data, which is accessible. And it's numeric data as well as the informational kinds of data, and will be stored all over the globe, especially if you're working in some of the bioinformatics kind of stuff. And we have to be able to create a platform, probably composed of a lot of parts, which is going to enable those things to come together—computational capability that is probably quite different than we have now. And we also need to, sooner or later, address usability and integrity of these systems. Seibel: Usability from the point of the programmer, or usability for the end users of these systems?


pages: 933 words: 205,691

Hadoop: The Definitive Guide by Tom White


Amazon Web Services, bioinformatics, business intelligence, combinatorial explosion, database schema, Debian, domain-specific language, fault tolerance, full text search, Grace Hopper, information retrieval, Internet Archive, linked data, loose coupling, openstreetmap, recommendation engine, RFID, SETI@home, social graph, web application

This would involve sampling page view logs (because the total page view data for a popular website is huge), grouping it by time and then finding the number of new users at different time points via a custom reduce script. This is a good example where both SQL and MapReduce are required for solving the end user problem and something that is possible to achieve easily with Hive. Data analysis: Hive and Hadoop can be easily used for training and scoring for data analysis applications. These data analysis applications can span multiple domains such as popular websites, bioinformatics companies, and oil exploration companies. A typical example of such an application in the online ad network industry would be the prediction of what features of an ad make it more likely to be noticed by the user. The training phase typically would involve identifying the response metric and the predictive features. In this case, a good metric to measure the effectiveness of an ad could be its click-through rate.


pages: 798 words: 240,182

The Transhumanist Reader by Max More, Natasha Vita-More


23andMe, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, augmented reality, Bill Joy: nanobots, bioinformatics, brain emulation, Buckminster Fuller, cellular automata, clean water, cloud computing, cognitive bias, cognitive dissonance, combinatorial explosion, conceptual framework, Conway's Game of Life, cosmological principle, data acquisition, discovery of DNA, Drosophila, experimental subject, Extropian, fault tolerance, Flynn Effect, Francis Fukuyama: the end of history, Frank Gehry, friendly AI, game design, germ theory of disease, hypertext link, impulse control, index fund, John von Neumann, joint-stock company, Kevin Kelly, Law of Accelerating Returns, life extension, Louis Pasteur, Menlo Park, meta analysis, meta-analysis, moral hazard, Network effects, Norbert Wiener, P = NP, pattern recognition, phenotype, positional goods, prediction markets, presumed consent, Ray Kurzweil, reversible computing, RFID, Richard Feynman, Ronald Reagan, silicon-based life, Singularitarianism, stem cell, stochastic process, superintelligent machines, supply-chain management, supply-chain management software, technological singularity, Ted Nelson, telepresence, telepresence robot, telerobotics, the built environment, The Coming Technological Singularity, the scientific method, The Wisdom of Crowds, transaction costs, Turing machine, Turing test, Upton Sinclair, Vernor Vinge, Von Neumann architecture, Whole Earth Review, women in the workforce

The assertion is that genetic enhancement necessarily implies experimentation without consent and this violates bedrock bioethical principles requiring the protection of human subjects. Consequently, there is an unbridgeable gap which would-be enhancers cannot ethically cross. This view incorporates a rather static view of what it will be possible for future genetic enhancers to know and test beforehand. Any genetic enhancement techniques will first be extensively tested and perfected in animal models. Second, a vastly expanded bioinformatics enterprise will become crucial to understanding the ramifications of proposed genetic interventions (National Resource Center for Cell Analysis). As scientific understanding improves, the risk versus benefit calculations of various prospective genetic enhancements of embryos will shift. The arc of scientific discovery and technological progress strongly suggests that it will happen in the next few decades.


pages: 903 words: 235,753

The Stack: On Software and Sovereignty by Benjamin H. Bratton


1960s counterculture, 3D printing, 4chan, Ada Lovelace, additive manufacturing, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, Berlin Wall, bioinformatics, bitcoin, blockchain, Buckminster Fuller, Burning Man, call centre, carbon footprint, carbon-based life, Cass Sunstein, Celebration, Florida, charter city, clean water, cloud computing, connected car, corporate governance, crowdsourcing, cryptocurrency, dark matter, David Graeber, deglobalization, dematerialisation, disintermediation, distributed generation, don't be evil, Douglas Engelbart, Edward Snowden, Elon Musk, Eratosthenes, ethereum blockchain, facts on the ground, Flash crash, Frank Gehry, Frederick Winslow Taylor, future of work, Georg Cantor, gig economy, global supply chain, Google Earth, Google Glasses, Guggenheim Bilbao, High speed trading, Hyperloop, illegal immigration, industrial robot, information retrieval, intermodal, Internet of things, invisible hand, Jacob Appelbaum, Jaron Lanier, Jony Ive, Julian Assange, Khan Academy, linked data, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, McMansion, means of production, megacity, megastructure, Menlo Park, Minecraft, Monroe Doctrine, Network effects, new economy, offshore financial centre, oil shale / tar sands, packet switching, PageRank, pattern recognition, peak oil, performance metric, personalized medicine, Peter Thiel, phenotype, place-making, planetary scale, RAND corporation, recommendation engine, reserve currency, RFID, Sand Hill Road, self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley ideology, Slavoj Žižek, smart cities, smart grid, smart meter, social graph, software studies, South China Sea, sovereign wealth fund, special economic zone, spectrum auction, Startup school, statistical arbitrage, Steve Jobs, Steven Levy, Stewart Brand, Stuxnet, Superbowl ad, supply-chain management, supply-chain management software, TaskRabbit, the built environment, The Chicago School, the scientific method, Torches of Freedom, transaction costs, Turing complete, Turing machine, Turing test, universal basic income, urban planning, Vernor Vinge, Washington Consensus, web application, WikiLeaks, working poor, Y Combinator

This also relates to what Heidegger once called our “confrontation with planetary technology” (an encounter that he never managed to actually make and which most Heideggerians manage to endlessly defer, or “differ”).15 That encounter should be motivated by an invested interest in several “planetary technologies” working at various scales of matter, and based on, in many respects, what cheap supercomputing, broadband networking, and isomorphic data management methodologies make possible to research and application. These include—but are by no means limited to—geology (e.g., geochemistry, geophysics, oceanography, glaciology), earth sciences (e.g., focusing on the atmosphere, lithosphere, biosphere, hydrosphere), as well as the various programs of biotechnology (e.g., bioinformatics, synthetic biology, cell therapy), of nanotechnology (e.g., materials, machines, medicines), of economics (e.g., modeling price, output cycles, disincentivized externalities), of neuroscience (e.g., behavioral, cognitive, clinical), and of astronomy (e.g., astrobiology, extragalactic imaging, cosmology). In that all of these are methodologically and even epistemologically informed by computer science (e.g., algorithmic modeling, macrosensors and microsensors, data structure optimization, information theory, data visualization, cryptography, networked collaboration), then all of these planetary technologies are also planetary computational technologies.


pages: 855 words: 178,507

The Information: A History, a Theory, a Flood by James Gleick


Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, AltaVista, bank run, bioinformatics, Brownian motion, butterfly effect, citation needed, Claude Shannon: information theory, clockwork universe, computer age, conceptual framework, crowdsourcing, death of newspapers, discovery of DNA, double helix, Douglas Hofstadter, Eratosthenes, Fellow of the Royal Society, Gödel, Escher, Bach, Henri Poincaré, Honoré de Balzac, index card, informal economy, information retrieval, invention of the printing press, invention of writing, Isaac Newton, Jacquard loom, Jaron Lanier, jimmy wales, John von Neumann, Joseph-Marie Jacquard, Louis Daguerre, Marshall McLuhan, Menlo Park, microbiome, Milgram experiment, Network effects, New Journalism, Norbert Wiener, On the Economy of Machinery and Manufactures, PageRank, pattern recognition, phenotype, pre–internet, Ralph Waldo Emerson, RAND corporation, reversible computing, Richard Feynman, Simon Singh, Socratic dialogue, Stephen Hawking, Steven Pinker, stochastic process, talking drums, the High Line, The Wisdom of Crowds, transcontinental railway, Turing machine, Turing test, women in the workforce

.… But now the damn thing is everywhere.”) Like any good meme, it spawned mutations. The “jumping the shark” entry in Wikipedia advised in 2009, “See also: jumping the couch; nuking the fridge.” Is this science? In his 1983 column, Hofstadter proposed the obvious memetic label for such a discipline: memetics. The study of memes has attracted researchers from fields as far apart as computer science and microbiology. In bioinformatics, chain letters are an object of study. They are memes; they have evolutionary histories. The very purpose of a chain letter is replication; whatever else a chain letter may say, it embodies one message: Copy me. One student of chain-letter evolution, Daniel W. VanArsdale, listed many variants, in chain letters and even earlier texts: “Make seven copies of it exactly as it is written” [1902]; “Copy this in full and send to nine friends” [1923]; “And if any man shall take away from the words of the book of this prophecy, God shall take away his part out of the book of life” [Revelation 22:19]. Chain letters flourished with the help of a new nineteenth-century technology: “carbonic paper,” sandwiched between sheets of writing paper in stacks.


pages: 1,199 words: 332,563

Golden Holocaust: Origins of the Cigarette Catastrophe and the Case for Abolition by Robert N. Proctor


bioinformatics, carbon footprint, clean water, corporate social responsibility, Deng Xiaoping, desegregation, facts on the ground, friendly fire, germ theory of disease, index card, Indoor air pollution, information retrieval, invention of gunpowder, John Snow's cholera map, language of flowers, life extension, New Journalism, optical character recognition, pink-collar, Ponzi scheme, Potemkin village, Ralph Nader, Ronald Reagan, speech recognition, stem cell, telemarketer, Thomas Kuhn: the structure of scientific revolutions, Triangle Shirtwaist Factory, Upton Sinclair, Yogi Berra

MCV faculty also helped undermine public health advocacy: in 1990 James Kilpatrick from biostatistics, working also as a consultant for the Tobacco Institute, wrote to the editor of the New York Times criticizing Stanton Glantz and William Parmley’s demonstration of thirty-five thousand U.S. cardiovascular deaths per annum from exposure to secondhand smoke.49 Glantz by this time was commonly ridiculed by the industry, which even organized skits (to practice courtroom scenarios) in which health advocates were given thinly disguised names: Glantz was “Ata Glance” or “Stanton Glass, professional anti-smoker”; Alan Blum was “Alan Glum” representing “Doctors Ought to Kvetch” or “Doctors Opposed to People Exhaling Smoke” (DOPES); Richard Daynard was “Richard Blowhard” from the “Product Liability Education Alliance,” and so forth.50 VCU continues even today to have close research relationships with Philip Morris, covering topics as diverse as pharmacogenomics, bioinformatics, and behavioral genetics.51 SYMBIOSIS It would be a mistake to characterize this interpenetration of tobacco and academia as merely a “conflict of interest”; the relationship has been far more symbiotic. We are really talking about a confluence of interests, and sometimes even a virtual identity of interests. The Medical College of Virginia was “sold American” by the early 1940s and remained one of the tobacco industry’s staunchest allies for seven decades.