optical character recognition


pages: 291 words: 77,596

Total Recall: How the E-Memory Revolution Will Change Everything by C. Gordon Bell, Jim Gemmell

airport security, Albert Einstein, book scanning, cloud computing, conceptual framework, full text search, information retrieval, invention of writing, inventory management, Isaac Newton, Menlo Park, optical character recognition, pattern recognition, performance metric, RAND corporation, RFID, semantic web, Silicon Valley, Skype, social web, statistical model, Stephen Hawking, Steve Ballmer, Ted Nelson, telepresence, Turing test, Vannevar Bush, web application

This was not a project to store my life bits; it was about how to get them back! Scanned documents are image files, not text files, and as such, they’re invisible to keyword searches. But with thousands upon thousands of documents in my e-memory, keyword searching would be the only way to re-locate an old file that I could only recollect one or two fragments of, such as a name, a dollar amount, or a dateline. So I ran all the scanned documents through optical character recognition (OCR) software, which is able to recognize written letters and numbers in an image and reconstruct them in a text file. What I ended up with were thousands upon thousands of text files that were neatly interleaved among the scanned files. Now I just needed desktop search software, that is, software that would allow me to search through my thousands of files for some desired text, just like you search for Web pages now using Yahoo or Google.
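The workflow in this excerpt (each scan gets a sibling OCR text file, and a keyword search over the text leads back to the scan) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_scans`, the `.txt` and `.tif` suffixes, and the one-text-file-per-scan layout are hypothetical and are not taken from the book.

```python
from pathlib import Path


def find_scans(folder, keyword, image_suffix=".tif"):
    """Return the scan paths whose sibling OCR text file mentions keyword.

    Assumes (hypothetically) that every scan such as invoice.tif has an
    OCR output file named invoice.txt stored next to it.
    """
    needle = keyword.lower()
    hits = []
    for text_file in sorted(Path(folder).rglob("*.txt")):
        text = text_file.read_text(encoding="utf-8", errors="ignore")
        # Case-insensitive substring match on the recognized text.
        if needle in text.lower():
            hits.append(text_file.with_suffix(image_suffix))
    return hits
```

Calling `find_scans("archive", "acme")` would list the scans whose recognized text contains "acme". A real desktop search tool would build an index once instead of rereading every file on each query.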

But we aren’t there yet. A flatbed scanner is great for mementos, like medals, plaques, and so on. Unlike a digital camera, it always gets the lighting right. Some stuff just won’t fit in any scanner, though, so sometimes you will have to use a camera; if you can get it outdoors on a cloudy day, you can often get it nicely lit without reflections. Finally, make sure your scanner software is performing optical character recognition (OCR) on the scanned pages so that later the computer will be able to search for the text inside them.

DEALING WITH WHAT YOU ALREADY HAVE

Properly equipped, you are now ready to convert your old analog life’s worth of papers and memorabilia to digital form. Set a goal of being paperless within a year. Besides scanning the paper you already have, you should also arrange to receive more born-digital communications in the future, to reduce the flow of paper that you need to scan.

See also video and video cameras multitasking music MyCyberTwin Myhrvold, Nathan MyLifeBits and CARPE current status of and data storage described and e-memory and e-textbooks and fact checking and health data and lifelong learning and location tracking and memex origins of and painful memories and personal relationships and storytelling and summarization of data and total data collection and user interfaces and video recordings MyLifeDisk N naming conventions nanny cams national defense National Hockey League (NHL) National Library of Medicine National Public Radio National Security Agency (NSA) National Treasure 2 (film) natural user interface (NUI) Nelson, Ted networking The New York Times newspapers Newton, Isaac Niedermeier, Jerome J. Nike 1984 (Orwell) Nintendo Nixon, Richard Nokia Norman, Donald Northeastern University Northrup, Christiane note taking notebook computers . See also laptops O The Observers (Williamson) Office of Scientific Research and Development OneNote open systems operating systems optical character recognition (OCR) oral histories organic light-emitting polymer technology organization of data. See also files-and-folders organization automatic summarization categorization schemes and clutter and data analysis and DSpace and electronic memory and implementation of Total Recall and indexing and lifelong learning and lifetime periods and scanned documents Ornish, Dean Orwell, George OS X, Otlet, Paul Outlook ownership of data .


pages: 189 words: 57,632

Content: Selected Essays on Technology, Creativity, Copyright, and the Future of the Future by Cory Doctorow

book scanning, Brewster Kahle, Burning Man, en.wikipedia.org, informal economy, information retrieval, Internet Archive, invention of movable type, Jeff Bezos, Law of Accelerating Returns, Metcalfe's law, mutually assured destruction, new economy, optical character recognition, patent troll, pattern recognition, Ponzi scheme, post scarcity, QWERTY keyboard, Ray Kurzweil, RFID, Sand Hill Road, Skype, slashdot, social software, speech recognition, Steve Jobs, Turing test, Vernor Vinge

The important question is: will it let more people participate in cultural production? Will it further decentralize decision-making for artists? And for SF writers and fans, the further question is, "Will it be any good to our chosen medium?" Like I said, science fiction is the only literature people care enough about to steal on the Internet. It's the only literature that regularly shows up, scanned and run through optical character recognition software and lovingly hand-edited on darknet newsgroups, Russian websites, IRC channels and elsewhere (yes, there's also a brisk trade in comics and technical books, but I'm talking about prose fiction here — though this is clearly a sign of hope for our friends in tech publishing and funnybooks). Some writers are using the Internet's affinity for SF to great effect. I've released every one of my novels under Creative Commons licenses that encourage fans to share them freely and widely — even, in some cases, to remix them and to make new editions of them for use in the developing world.

One meaning for that word is "legitimate" ebook ventures, that is to say, rightsholder-authorized editions of the texts of books, released in a proprietary, use-restricted format, sometimes for use on a general-purpose PC and sometimes for use on a special-purpose hardware device like the nuvoMedia Rocketbook [ROCKETBOOK]. The other meaning for ebook is a "pirate" or unauthorized electronic edition of a book, usually made by cutting the binding off of a book and scanning it a page at a time, then running the resulting bitmaps through an optical character recognition app to convert them into ASCII text, to be cleaned up by hand. These books are pretty buggy, full of errors introduced by the OCR. A lot of my colleagues worry that these books also have deliberate errors, created by mischievous book-rippers who cut, add or change text in order to "improve" the work. Frankly, I have never seen any evidence that any book-ripper is interested in doing this, and until I do, I think that this is the last thing anyone should be worrying about.

More importantly, the free e-book skeptics have no evidence to offer in support of their position — just hand-waving and dark muttering about a mythological future when book-lovers give up their printed books for electronic book-readers (as opposed to the much more plausible future where book lovers go on buying their fetish objects and carry books around on their electronic devices). I started giving away e-books after I witnessed the early days of the "bookwarez" scene, wherein fans cut the binding off their favorite books, scanned them, ran them through optical character recognition software, and manually proofread them to eliminate the digitization errors. These fans were easily spending 80 hours to rip their favorite books, and they were only ripping their favorite books, books they loved and wanted to share. (The 80-hour figure comes from my own attempt to do this — I'm sure that rippers get faster with practice.) I thought to myself that 80 hours' free promotional effort would be a good thing to have at my disposal when my books entered the market.


pages: 255 words: 78,207

Web Scraping With Python: Collecting Data From the Modern Web by Ryan Mitchell

AltaVista, Amazon Web Services, cloud computing, en.wikipedia.org, Firefox, meta analysis, meta-analysis, natural language processing, optical character recognition, random walk, self-driving car, Turing test, web application

Hamidi, 227 intellectual property, 217-219 234 internal links crawling an entire site, 35-40 crawling with Scrapy, 45-48 traversing a single domain, 31-35 Internet about, 213-216 cautions downloading files from, 74 crawling across, 40-45 moving forward, 206 IP address blocking, avoiding, 199-200 ISO character sets, 96-98 is_displayed function, 186 Item object, 46, 48 items.py file, 46 | Index lambda expressions, 28, 74 legalities of web scraping, 217-230 lexicographical analysis with NLTK, 132-136 libraries bundling with projects, 7 OCR support, 161-164 logging with Scrapy, 48 logins about, 137 handling, 142-143 troubleshooting, 187 lxml library, 29 M machine learning, 135, 180 machine training, 135, 171-174 Markov text generators, 123-129 media files, storing, 71-74 Mersenne Twister algorithm, 34 methods (HTTP), 51 Microsoft SQL Server, 76 Microsoft Word, 102-105 MIME (Multipurpose Internet Mail Exten‐ sions) protocol, 90 MIMEText object, 90 MySQL about, 76 basic commands, 79-82 database techniques, 85-87 installing, 77-79 integrating with Python, 82-85 Wikipedia example, 87-89 N name attribute, 140 natural language processing about, 119 additional resources, 136 Markov models, 123-129 Natural Language Toolkit, 129-136 summarizing data, 120-123 Natural Language Toolkit (NLTK) about, 129 installation and setup, 129 lexicographical analysis, 132-136 statistical analysis, 130-132 NavigableString object, 18 navigating trees, 18-22 network connections about, 3-5 connecting reliably, 9-11 security considerations, 181 next_siblings() function, 21 ngrams module, 132 n-grams, 109-112, 120 NLTK (Natural Language Toolkit) about, 129 installation and setup, 129 lexicographical analysis, 132-136 statistical analysis, 130-132 NLTK Downloader interface, 130 NLTK module, 129 None object, 10 normalizing data, 112-113 NumPy library, 164 O OAuth authentication, 57 OCR (optical character recognition) about, 161 library support, 162-164 OpenRefine Expression Language (GREL), 116 
OpenRefine tool about, 114 cleaning data, 116-118 filtering data, 115-116 installing, 114 usage considerations, 114 optical character recognition (OCR) about, 161 library support, 162-164 Oracle DBMS, 76 OrderedDict object, 112 os module, 74 P page load times, 154, 182 parentheses (), 25 parents (tags), 20, 22 parsing HTML pages (see HTML parsing) parsing JSON, 63 patents, 217 pay-per-hour computing instances, 205 PDF files, 100-102 PDFMiner3K library, 101 Penn Treebank Project, 133 period (.), 25 Peters, Tim, 211 PhantomJS tool, 152-155, 203 PIL (Python Imaging Library), 162 Pillow library about, 162 processing well-formatted text, 165-169 pipe (|), 25 plus sign (+), 25 POST method (HTTP) about, 51 tracking requests, 140 troubleshooting, 186 variable names and, 138 viewing form parameters, 140 Index | 235 previous_siblings() function, 21 primary keys in tables, 85 programming languages, regular expressions and, 27 projects, bundling with libraries, 7 pseudorandom number generators, 34 PUT method (HTTP), 51 PyMySQL library, 82-85 PySocks module, 202 Python Imaging Library (PIL), 162 Python language, installing, 209-211 Q query time versus database size, 86 quotation marks ("), 17 R random number generators, 34 random seeds, 34 rate limits about, 52 Google APIs, 60 Twitter API, 55 reading documents document encoding, 93 Microsoft Word, 102-105 PDF files, 100 text files, 94-98 recursion limit, 38, 89 redirects, 44, 158 Referrer header, 179 RegexPal website, 24 regular expressions about, 22-27 BeautifulSoup example, 27 commonly used symbols, 25 programming languages and, 27 relational data, 77 remote hosting running from a website hosting account, 203 running from the cloud, 204 remote servers avoiding IP address blocking, 199-200 extensibility and, 200 portability and, 200 PySocks and, 202 Tor and, 201-202 Requests library 236 | Index about, 137 auth module, 144 installing, 138, 179 submitting forms, 138 tracking cookies, 142-143 requests module, 179-181 responses, 
API calls and, 52 Robots Exclusion Standard, 223 robots.txt file, 138, 167, 222-225, 229 S safe harbor protection, 219, 230 Scrapy library, 45-48 screenshots, 197 script tag, 147 search engine optimization (SEO), 222 searching text data, 135 security considerations copyright law and, 219 forms and, 183-186 handling cookies, 181 SELECT statement, 79, 81 Selenium library about, 143 elements and, 153, 194 executing JavaScript, 152-156 handling redirects, 158 security considerations, 185 testing example, 193-198 Tor support, 203 semicolon (;), 210 SEO (search engine optimization), 222 server-side processing handling redirects, 44, 158 scripting languages and, 147 sets, 67 siblings (tags), 21 Simple Mail Transfer Protocol (SMTP), 90 site maps, 36 Six Degrees of Wikipedia, 31-35 SMTP (Simple Mail Transfer Protocol), 90 smtplib package, 90 sorted function, 112 span tag, 15 Spitler, Daniel, 227 SQL Server (Microsoft), 76 square brackets [], 25 src attribute, 28, 72, 74 StaleElementReferenceException, 158 statistical analysis with NLTK, 130-132 storing data (see data management) StringIO object, 99 strings, regular expressions and, 22-28 stylesheets about, 14, 216 dynamic HTML and, 151 hidden fields and, 184 Surface Web, 36 trademarks, 218 traversing the Web (see web crawlers) tree navigation, 18-22 trespass to chattels, 219-220, 226 trigrams module, 132 try...finally statement, 85 Twitov app, 123 Twitter API, 55-59 T underscore (_), 17 undirected graph problems, 127 Unicode standard, 83, 95-98, 110 unit tests, 190, 197 United States v.

Even in this day and age, many documents are simply scanned from hard copies and put on the Web, making these documents inaccessible as far as much of the Internet is concerned, although they are “hiding in plain sight.” Without image-to-text capabilities, the only way to make these documents accessible is for a human to type them up by hand—and nobody has time for that. Translating images into text is called optical character recognition, or OCR. There are a few major libraries that are able to perform OCR, and many other libraries that support them or are built on top of them. This system of libraries can get fairly complicated at times, so I recommend you read the next section before attempting any of the exercises in this chapter.

Overview of Libraries

Python is a fantastic language for image processing and reading, image-based machine-learning, and even image creation.

In addition, you will need to understand not just how to use the tools presented in this book in isolation, but how they can work together to solve a larger problem. Sometimes the data is easily available and well formatted, allowing a simple scraper to do the trick. Other times you have to put some thought into it. In Chapter 10, for example, I combined the Selenium library to identify Ajax-loaded images on Amazon, and Tesseract to use optical character recognition to read them. In the “Six Degrees of Wikipedia” problem, I used regular expressions to write a crawler that stored link information in a database, and then used a graph-solving algorithm in order to answer the question, “What is the shortest path of links between Kevin Bacon and Eric Idle?” There is rarely an unsolvable problem when it comes to automated data collection on the Internet.


pages: 138 words: 27,404

OpenCV Computer Vision With Python by Joseph Howse

augmented reality, computer vision, Debian, optical character recognition, pattern recognition

He contributed to the source code of Blender, an open source 3D-software project, and worked on his first commercial movie, Plumiferos—Aventuras voladoras, as a Computer Graphics Software Developer. David now has more than 10 years of experience in IT, with more than seven years of experience in computer vision, computer graphics, and pattern recognition, working on different projects and startups and applying his knowledge of computer vision, optical character recognition, and augmented reality. He is the author of the DamilesBlog (http://blog.damiles.com), where he publishes research articles and tutorials about OpenCV, computer vision in general, and optical character recognition algorithms. He is the co-author of Mastering OpenCV with Practical Computer Vision Projects (Daniel Lélis Baggio, Shervin Emami, David Millán Escrivá, Khvedchenia Ievgen, Naureen Mahmood, Jason Saragih, and Roy Shilkrot; Packt Publishing). He is also a reviewer of GnuPlot Cookbook (Lee Phillips; Packt Publishing).


pages: 696 words: 143,736

The Age of Spiritual Machines: When Computers Exceed Human Intelligence by Ray Kurzweil

Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Any sufficiently advanced technology is indistinguishable from magic, Buckminster Fuller, call centre, cellular automata, combinatorial explosion, complexity theory, computer age, computer vision, cosmological constant, cosmological principle, Danny Hillis, double helix, Douglas Hofstadter, first square of the chessboard / second half of the chessboard, fudge factor, George Gilder, Gödel, Escher, Bach, I think there is a world market for maybe five computers, information retrieval, invention of movable type, Isaac Newton, iterative process, Jacquard loom, Jacquard loom, John von Neumann, Lao Tzu, Law of Accelerating Returns, mandelbrot fractal, Marshall McLuhan, Menlo Park, natural language processing, Norbert Wiener, optical character recognition, pattern recognition, phenotype, Ralph Waldo Emerson, Ray Kurzweil, Richard Feynman, Richard Feynman, Schrödinger's Cat, Search for Extraterrestrial Intelligence, self-driving car, Silicon Valley, speech recognition, Steven Pinker, Stewart Brand, stochastic process, technological singularity, Ted Kaczynski, telepresence, the medium is the message, traveling salesman, Turing machine, Turing test, Whole Earth Review, Y2K

There are an estimated 100 billion neurons in the human brain.
Noise: A random sequence of data. Because the sequence is random and without meaning, noise carries no information. Contrasted with information.
Objective experience: The experience of an entity as observed by another entity, or measurement apparatus.
OCR: See Optical character recognition.
Operating system: A software program that manages and provides a variety of services to application programs, including user interface facilities and management of input-output and memory devices.
Optical character recognition (OCR): A process in which a machine scans, recognizes, and encodes printed (and possibly handwritten) characters into digital form.
Optical computer: A computer that processes information encoded in patterns of light beams; different from today’s conventional computers, in which information is represented in electronic circuitry or encoded on magnetic surfaces.

We received a lot of letters from kids who were delighted with the college that our program had suggested. A few parents, on the other hand, were furious that we had failed to recommend Harvard. It was my first experience with the ability of computers to affect people’s lives. I sold that company to Harcourt, Brace & World, a New York publisher, and moved on to other ideas. In 1974, computer programs that could recognize printed letters, called optical character recognition (OCR), were capable of handling only one or two specialized type styles. I founded Kurzweil Computer Products that year to develop the first OCR program that could recognize any style of print, which we succeeded in doing later that year. So the question then became, What is it good for? Like a lot of clever computer software, it was a solution in search of a problem. I happened to sit next to a blind gentleman on a plane flight, and he explained to me that the only real handicap that he experienced was his inability to read ordinary printed material.

(creator of Ray Kurzweil’s Cybernetic Poet and other software projects): <http://www.kurzweiltech.com>
The dictation division of Lernout & Hauspie Speech Products (formerly Kurzweil Applied Intelligence, Inc.), creator of speech recognition and natural language software systems: <http://www.lhs.com/dictation/>
The overall Lernout & Hauspie web site: <http://www.lhs.com/>
Kurzweil Music Systems, Inc., creator of computer-based music synthesizers, sold to Young Chang in 1990: <http://www.youngchang.com/kurzweil/index.html>
TextBridge Optical Character Recognition (OCR). Formerly Kurzweil OCR from Kurzweil Computer Products, Inc. (sold to Xerox Corp. in 1980): <http://www.xerox.com/scansoft/textbridge/>

ARTIFICIAL LIFE AND ARTIFICIAL INTELLIGENCE RESEARCH

The Artificial Intelligence Laboratory at Massachusetts Institute of Technology (MIT): <http://www.ai.mit.edu/>
Artificial Life Online: <http://alife.santafe.edu>
Contemporary Philosophy of Mind: An Annotated Bibliography: <http://ling.ucsc.edu/~chalmers/biblio.html>
Machine Learning Laboratory, the University of Massachusetts, Amherst: <http://www-ml.cs.umass.edu/>
The MIT Media Lab: <http://www.media.mit.edu/>
SSIE 580B: Evolutionary Systems and Artificial Life, by Luis M.

Paper Knowledge: Toward a Media History of Documents by Lisa Gitelman

Andrew Keen, computer age, corporate governance, deskilling, Douglas Engelbart, East Village, en.wikipedia.org, information retrieval, Internet Archive, invention of movable type, Jaron Lanier, knowledge economy, Marshall McLuhan, Mikhail Gorbachev, national security letter, On the Economy of Machinery and Manufactures, optical character recognition, profit motive, RAND corporation, RFC: Request For Comment, Silicon Valley, Steve Jobs, The Structural Transformation of the Public Sphere, Turing test, Works Progress Administration

Search Google Images or Flickr all you like: you are effectively searching associated tags—textual metadata—rather than actual images. pdf page images inhabit the text-image distinction as texts, not as images, because all pdfs are potentially searchable. That said, there are plenty of pdfs—called “image-only”—that cannot be searched within a pdf-reader application until or unless they have been manipulated computationally to identify the alphanumeric characters they contain through optical character recognition (ocr), which produces machine-encoded text. Before being scanned, these image-only pdfs do function as images, and very “poor” ones at that.87 “To ocr” a document has become a verb at least as handy in some situations as “to pdf” one. Optical character recognition points precisely to the line that separates electronic texts from images. It is a line that disappears at the level of the alphanumeric character since “the algorithmic eyes of” scanning technology effectively identify the shapes of characters, “seeing” them as patterns of yes/no variables that can together be “recognized” (that is, processed) as alphanumeric characters.
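The idea of a character as a pattern of yes/no variables can be made concrete with a toy template matcher. The sketch below is an illustration only, not how any production OCR engine works: the 3x5 bitmaps in `TEMPLATES`, and the names `hamming` and `recognize`, are hypothetical. Each glyph is a grid of "1" (ink) and "0" (no ink), and recognition picks the stored template that disagrees with the input in the fewest cells.

```python
# Hypothetical 3x5 bitmaps: "1" means ink, "0" means blank paper.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def hamming(a, b):
    """Count the cells in which two equally sized bitmaps disagree."""
    return sum(
        cell_a != cell_b
        for row_a, row_b in zip(a, b)
        for cell_a, cell_b in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template character closest to the scanned glyph."""
    return min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))
```

A scanned "T" that lost the bottom pixel of its stem, `("111", "010", "010", "010", "000")`, differs from the "T" template in one cell and from the others in more, so it is still read as "T". This also shows why degraded scans produce errors: enough flipped cells and a different template becomes the nearest match.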

., 19 208   INDEX mla Handbook for Writers of Research Papers, 9 Morgan, Pierpont, 63 Moskowitz, Sam, 146, 148 Moxon, John, 49 Mumford, Lewis, 61 The Myth of the Paperless Office (Sellen and Harper), 4, 111, 128, 130 National Association of Book Publishers, 73 National Library of Medicine, 107 Neilsen, Jakob, 132 New Deal, 14, 62, 167n15 New York Public Library, 23, 55, 66–67, 73 New York Times, 84–88, 92, 94, 120–21, 129 newspapers, 2, 4, 25, 31, 33, 36, 40, 43, 45, 46, 52, 58, 72, 75, 77, 78, 85, 88, 111, 124, 138, 139, 141, 148, 149 Nixon, Richard M., 86–87, 95–96 novels, 3, 36, 40, 115 Novelty Job Printing Press, 138–40, 184n6 Nunberg, Geoffrey, 4 Obama, Barack, 95, 97, 116 The Office (television series), 106 Ohmann, Richard, 145 Oliver Optic’s Magazine, 141–42 optical character recognition (ocr), 134 Our Young Folks, 142 Oushakine, Serguei Alex., 174n41 Owen, Robert Dale, 34 paper, 3–4, 33, 46, 89, 123, 128, 147; format, 12, 41, 62; perishability of, 54, 66–67; shredding, 96 passport, 1, 10 Patriot Act of 2001, 97 Pentagon Papers, 16, 85–88, 90, 94–96, 107, 116–17 Phillips, John L., 38, 40–42; “The Art of Preservative”: 100 Fancy Specimens of Job Printing, 11, 38, 40, 42, 44, 140 photocopy.


pages: 1,199 words: 332,563

Golden Holocaust: Origins of the Cigarette Catastrophe and the Case for Abolition by Robert N. Proctor

bioinformatics, carbon footprint, clean water, corporate social responsibility, Deng Xiaoping, desegregation, facts on the ground, friendly fire, germ theory of disease, index card, Indoor air pollution, information retrieval, invention of gunpowder, John Snow's cholera map, language of flowers, life extension, New Journalism, optical character recognition, pink-collar, Ponzi scheme, Potemkin village, Ralph Nader, Ronald Reagan, speech recognition, stem cell, telemarketer, Thomas Kuhn: the structure of scientific revolutions, Triangle Shirtwaist Factory, Upton Sinclair, Yogi Berra

The present text is different in taking more of a global view (even if America remains the centerpiece) but also by virtue of being almost entirely based on the industry’s formerly secret archives, now (and only recently) available online in full-text searchable form. In this sense the book represents a new kind of historiography: history based on optical character recognition, allowing a rapid “combing” of the archives for historical gems (and fleas). Searching by optical character recognition works like a powerful magnet, allowing anyone with an Internet connection to pull out rhetorical needles from large and formidable document haystacks. (Try it—you need only go to http://legacy.library.ucsf.edu, and enter whatever search term you might fancy.) The Internet posting of documents in this form presents us with research opportunities that are largely unprobed.

Prior to computerization, it would have taken many lifetimes to go through such a large body of documents and gather up all usages of words such as “alleged,” “castoreum,” or “propaganda.” With full text searchability online, however—thanks to optical character recognition—this can now be done in a matter of seconds, and by anyone with an Internet connection. We can only search what has been turned over by the companies, of course—and that limitation is profound—but the archives do make it harder for ideas once captured to be lost. And optical character recognition works like an enormous magnet, allowing the tiniest of rhetorical needles to be found even in large archival haystacks. History is rendered transparent in ways not previously possible. Many of my colleagues used to labor in secret for the industry, for example, and some presumably still do.

The industry was forced to pay for the establishment and maintenance of a website posting these documents, which by the year 2000 consisted of about 44 million pages—and today consists of over 70 million pages, following addition of documents from BAT’s Guildford depository in the United Kingdom. Now accessible at http://legacy.library.ucsf.edu, the Legacy Tobacco Documents Library is the largest business archive in the world. Most documents are full-text searchable, and searches for terms like “cancer” or “nicotine” turn up hundreds of thousands of documents. Searches for terms like “baseball” or “sports” yield many thousands of hits. Optical character recognition was introduced in 2007, which means you can now search for expressions like “please destroy” or “subjects to be avoided,” with options to order the documents by date or by size; one can limit one’s search to documents from a particular company or a particular year or author or a particular document type (consumer letters, for example). Full-text searchability means you can probe the rhetorical microstructure of the archives; the expression “need more research,” for example, yields 666 documents, and there are hits for terms like “Nazis” and “Negroes” and “zealot.”


pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier

23andMe, Affordable Care Act / Obamacare, airport security, AltaVista, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, IBM and the Holocaust, index card, informal economy, Internet of things, invention of the printing press, Jeff Bezos, Louis Pasteur, Mark Zuckerberg, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, performance metric, Peter Thiel, Post-materialism, post-materialism, random walk, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, speech recognition, Steve Jobs, Steven Levy, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Watson beat the top human players on Jeopardy!

One could not have searched the text for particular words, or analyzed it, because the text hadn’t been datafied. All that Google had were images that only humans could transform into useful information—by reading. While this would still have been a great tool—a modern, digital Library of Alexandria, more comprehensive than any library in history—Google wanted more. The company understood that information has stored value that can only be released once it is datafied. And so Google used optical character-recognition software that could take a digital image and recognize the letters, words, sentences, and paragraphs on it. The result was datafied text rather than a digitized picture of a page. Now the information on the page was usable not just for human readers, but also for computers to process and algorithms to analyze. Datafication made text indexable and thus searchable. And it permitted an endless stream of textual analysis.
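What "indexable and thus searchable" means in practice can be illustrated with a small inverted index, which maps each word to the pages that contain it. This is a generic sketch and makes no claim about how Google's systems work; the names `build_index` and `search`, and the simple word-splitting rule, are assumptions made for the example.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(pages):
    """Map each word to the set of page ids whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in WORD.findall(text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

With page images alone, none of this is possible: there are no words to split and nothing to look up. Once the text exists, the same index also supports counting and other textual analysis.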

But when he realized that he was responsible for millions of people wasting lots of time each day typing in annoying, squiggly letters—vast amounts of information that was simply discarded afterwards—he didn’t feel so smart. Looking for ways to put all that human computational power to more productive use, he came up with a successor, fittingly named ReCaptcha. Instead of typing in random letters, people type two words from text-scanning projects that a computer’s optical character-recognition program couldn’t understand. One word is meant to confirm what other users have typed and thus is a signal that the person is a human; the other is a new word in need of disambiguation. To ensure accuracy, the system presents the same fuzzy word to an average of five different people to type in correctly before it trusts it’s right. The data had a primary use—to prove the user was human—but it also had a secondary purpose: to decipher unclear words in digitized texts.
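The agreement step described here, where a word is trusted only after several people have typed it, can be sketched as a simple vote. The excerpt gives only the figure of about five people per word; the `min_votes` and `min_share` thresholds and the function name `consensus` below are assumptions for illustration and do not describe ReCaptcha's actual rules.

```python
from collections import Counter


def consensus(transcriptions, min_votes=5, min_share=0.6):
    """Return the agreed reading of a fuzzy word, or None if undecided.

    min_votes and min_share are hypothetical thresholds: wait for at least
    min_votes answers, then accept the most common one only if that share
    of the typists agree on it.
    """
    cleaned = [t.strip().lower() for t in transcriptions if t.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough people have seen the word yet
    word, count = Counter(cleaned).most_common(1)[0]
    return word if count / len(cleaned) >= min_share else None
```

For example, `consensus(["upon", "upon", "Upon ", "up0n", "upon"])` returns `"upon"`, because four of the five answers agree after normalization, while two answers alone return `None`.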

Raw Data Is an Oxymoron by Lisa Gitelman

collateralized debt obligation, computer age, continuous integration, crowdsourcing, Drosophila, Edmond Halley, Filter Bubble, Firefox, Google Earth, Howard Rheingold, index card, informal economy, Isaac Newton, Johann Wolfgang von Goethe, knowledge worker, Louis Daguerre, Menlo Park, optical character recognition, RFID, Richard Thaler, Silicon Valley, social graph, software studies, statistical model, Stephen Hawking, Steven Pinker, text mining, time value of money, trade route, Turing machine, urban renewal, Vannevar Bush

And while users can search for words under the page images, they cannot reveal what the computer sees; they cannot see the characters that the computer recognizes in the page image. Ironically, over time ECCO’s publisher has loosened its rules on downloading page images. So, for database subscribers, it has become easy and quick to download page images of full books from ECCO. Yet regular users cannot even download a single page of text as interpreted by ECCO’s optical character recognition (OCR) software, which suggests that over time Gale determined there is no percentage in books, not even in digitized images of books, unless the books are already packaged as data.23 The future is in data. Using ECCO, I began trying to understand the sense of “data” in Priestley. Happily, my first searches turned out to be promising. On the one hand, the ECCO results are consistent with those of Google.

., 83–84 National Data Center, 126 National Security Agency, 2 Nature, 151 Newcomb, Simon, 79–81, 85 New York Times, 1, 24 Newton, Isaac, 16, 21, 169 Newton, Robert Russell, 81–84, 85 Nissenbaum, Helen, 130 Noelle-Neumann, Elisabeth, 105 Number, 6, 8, 20, 36, 43–44, 50, 61, 69, 71, 84, 106, 124, 162 Nunberg, Geoffrey, 26, 91 Objectivity, 3–6, 7, 11, 50, 148, 164 Observation, 80, 82–84 Ohm, Paul, 128 Optical Character Recognition (OCR), 28, 31 Orwell, George, 124 Oxford English Dictionary (OED), 18, 20, 34, 35, 126 Paper, 9–10, 157, 165 Paper machine, 10, 105, 108–109 Pattern, 6, 16–17, 95, 123, 129, 159 Petacenter, 151–152, 164 Phenomenology of Spirit, 108, 112 Picciotto, Joanna, 4–5 Pinker, Steven, 19 Playfair, William, 17 Poindexter, John, 131 Pollan, Michael, 148–149 Poovey, Mary, 7, 17 Porter, Theodore, 14n20, 17 Poster, Mark, 126, 129 Preemptive Media Collective, 135–136 Press, the, 10, 89, 96 Priestley, Joseph, 15–17, 28 Privacy, 2, 128, 132, 136 Procrustean Marxism, 50 Program, 170 Project Gutenberg, 22, 35 Property, 131 Protocol, 1, 31, 131, 138, 161 Proximity searching, 35 Ptolemy, 82, 83 Pynchon, Thomas, 129 Stephenson, F.


pages: 527 words: 147,690

Terms of Service: Social Media and the Price of Constant Connection by Jacob Silverman


23andMe, 4chan, A Declaration of the Independence of Cyberspace, Airbnb, airport security, Amazon Mechanical Turk, augmented reality, Brian Krebs, California gold rush, call centre, cloud computing, cognitive dissonance, correlation does not imply causation, Credit Default Swap, crowdsourcing, don't be evil, Edward Snowden, feminist movement, Filter Bubble, Firefox, Flash crash, game design, global village, Google Chrome, Google Glasses, hive mind, income inequality, informal economy, information retrieval, Internet of things, Jaron Lanier, jimmy wales, Kevin Kelly, Kickstarter, knowledge economy, knowledge worker, late capitalism, license plate recognition, life extension, Lyft, Mark Zuckerberg, Mars Rover, Marshall McLuhan, meta analysis, meta-analysis, Minecraft, move fast and break things, national security letter, Network effects, new economy, Nicholas Carr, Occupy movement, optical character recognition, payday loans, Peter Thiel, postindustrial economy, prediction markets, pre–internet, price discrimination, price stability, profit motive, quantitative hedge fund, race to the bottom, Ray Kurzweil, recommendation engine, rent control, RFID, ride hailing / ride sharing, self-driving car, sentiment analysis, shareholder value, sharing economy, Silicon Valley, Silicon Valley ideology, Snapchat, social graph, social web, sorting algorithm, Steve Ballmer, Steve Jobs, Steven Levy, TaskRabbit, technoutopianism, telemarketer, transportation-network company, Turing test, Uber and Lyft, Uber for X, universal basic income, unpaid internship, women in the workforce, Y Combinator, Zipcar

One could imagine a movement forming in which labor rights advocates say that micro-work is so unsustainable and dehumanizing that it must be automated. Add RFID chips to all packaged food and grocery products and you can track their movement through supply chains and stores without human assistance. Perhaps companies can partner with stores to help utilize their surveillance systems to monitor the placement of goods. Firm up sentiment analysis, trending-topic algorithms, and optical-character-recognition scanning so that humans aren’t forced to do such drudgery. To save content moderators from their on-the-job stress, we have to put them out of work again. Just as Facebook or Pinterest retains control of your data, online labor markets keep workers wedded to the platform. You can’t take your profile elsewhere, unless two labor markets form a partnership or decide to create an open protocol that other markets can take advantage of.

As Trebor Scholz says, “This digital labor is much akin to those less visible, unsung forms of traditional women’s labor such as child care, housework, and surrogacy.” As with online labor markets, the digital labor of social media is highly mediated, disguised. It’s made to look like play or a normal part of Web browsing. For example, CAPTCHA tests—those forms that require you to read a blurry sequence of words/letters/numbers/shapes and enter them to prove you are a human being—often double as ways of improving optical character recognition (OCR) programs. These words were scanned from books, newspaper archives, or other media, but existing OCR software can’t read them. Like a Mechanical Turk worker, you provide the final bit of cognitive labor, deciphering the word for the computer. That word then goes back to whichever company paid the CAPTCHA service to help digitize their material. Andrew Ross calls these and similar methods “the micro-division of labor into puzzles, stints, chores, and bits.”

See also sentiment analysis Moran, Robert, 191 Morozov, Evgeny, 4–5, 84, 322 Moves fitness app, 305–6 mugshot Web sites, 207–9, 210–11, 213–14, 217 multitasking, 51–52 Mun, Sang, 358 MyEx.com, 210 Myspace, 9 Nambikwara tribe, Brazil, 167–68, 356 narcissism of the social media experience, 61–62 National Reconnaissance Office spy satellite, 314 National Science Foundation (NSF), 279 National Security Agency (NSA), 129–32, 312, 314 National Security Letters (NSLs), 130 NEC, 299, 301 negative sentiments and sharing, 24, 31, 203–4, 305 Negri, Toni, 264 networked privacy model, 291–92 network effects, 13–14, 47, 272–73, 275–76, 295, 327 news consumers’ culpability, 109 news organizations algorithms rating news outlet importance, 84–85 and audience metrics, 101–2, 103 and embeddable media, 259–60 firehose approach to news, 112 as invasion of privacy, 288 memes from local newscasts, 69–72 presidential press conferences, 105 pushing articles selectively, 98 social media/viral editor, 122–23 trawling social media, 113 trending articles as premium journalism, 101 See also BuzzFeed; journalism New Times newspaper, 67–68 New York City and Uber, 237 New York Comic Con 2013, 34 New York Post, 113 New York Times Magazine, 75 Nike, 139 Niquille, Simone C., 356–57 Nissenbaum, Helen, 284, 297 notifications and alerts, 50–53, 214 NSA (National Security Agency), 129–32, 312, 314 NSF (National Science Foundation), 279 NSLs (National Security Letters), 130 Obama, Barack, 134, 169, 194 “Obama Is Wrong” (Hayes), 105–6 ObscuraCam, 357 Occupy movement, 136–37 OCR (optical character recognition) software, 260, 358 O’Donnell, Robert, 152 Office Max, 279–80 OkCupid, 204 Old Spice advertising campaign, 93–94 Omidyar, Pierre, 239 online persona, 344–45 online recommendations, 201–2 online reputation. 
See reputation On the Media radio program, 109 Open Graph, 11–12 opting out of advertising-based social networks, 275–77 cost of, 295 difficulty finding option for, 32, 33 of friends adding you to a group, 92 of Google Shared Endorsements, 33 of including your location in messages, 177 of Klout, 195 opt-in vs., 7–8 of social media, 272, 340–41, 342, 346, 347 oral storytelling, 62, 63 Oremus, Will, 106–7, 265 outing students via privacy faux pas, 286 ownership of your identity, 256–57, 273–74, 275–77, 311, 360 Page, Larry, 250 page views overview, 95–96, 98 and advertising dollars, 71, 93, 97 Facebook-ready content for generating, 115 and invented controversy, 107 meme-related, 84, 103–4, 105 new outlets’ boosting of, 122–23 Palihapitiya, Chamath, 249 Pandora, 303 paparazzos, 211–12 parents, scrapbooking about their children, 46, 55–60 Pariser, Eli, 122 Paris, France, 267, 268 Patriot Act, 130 pay-per-gaze advertising, 302 Peers, 238–39, 244 peer-to-peer social networks, 311 Peretti, Jonah, 114–15 personal care, 224 personal endorsements, 31–35 personal graph, 18–19 Persson, Markus, 164–65 Pezold, John, 187 PGP, 368–69 PHD, 304 PhoneID Score, TeleSign, 40 phones.


pages: 398 words: 86,023

The Wikipedia Revolution: How a Bunch of Nobodies Created the World's Greatest Encyclopedia by Andrew Lih


Albert Einstein, AltaVista, barriers to entry, Benjamin Mako Hill, c2.com, Cass Sunstein, citation needed, crowdsourcing, Debian, en.wikipedia.org, Firefox, Hacker Ethic, HyperCard, index card, Jane Jacobs, Jason Scott: textfiles.com, jimmy wales, Marshall McLuhan, Network effects, optical character recognition, Ralph Waldo Emerson, Richard Stallman, side project, Silicon Valley, Skype, slashdot, social software, Steve Jobs, The Death and Life of Great American Cities, The Wisdom of Crowds, urban planning, urban renewal, Vannevar Bush, wikimedia commons, Y2K

In the meantime, other projects that involved harnessing “crowds” would take shape on the Internet. One was related to Project Gutenberg, a movement to have public domain print works available for free on the Internet. Project Gutenberg actually started in 1971 on mainframe computers; now it is one of the oldest online text repositories. The problem it faced was that starting in 1989 it digitized books using optical character recognition systems to automatically turn images of book pages into computer text. The problem was that OCR was imperfect, and there were small, but numerous, errors because of smudges, bad image quality, or dust. That gave Charles Franks the idea to start Distributed Proofreaders in 2000, where people from anywhere on the Internet could help proofread these imperfect OCR texts and fix the problems.

., 169 211 Merel, Peter, 62 editorial process of, 37–41, 43, 63, 64 meta-moderation, 68–69 GNUpedia and, 79 metaphors, 46–47 rules of, 36–37 242_Index Nupedia ( continued ) radio, amateur ham, 45–46 structure of, 37–38 Ramsey, Derek (Ram-man), 99–104, 108, Wikipedia and, 64–65, 88, 136, 109, 111, 177 171, 172 Rand, Ayn, 32 wiki software and, 61–65 Raul654 (Mark Pellegrini), 180–81 Nupedia Advisory Board, 37, 38, 64 Raymond, Eric S., 43, 85, 172–73, 175 Nupedia-L, 63 Reagle, Joseph, 82, 96, 112 Nupedia Open Content License, 35, 72 Rec.food.chocolate, 84–85 RickK, 120, 185–88 rings, Web site, 23, 31 objectivism, 32, 36–37 robots, software, 88, 99–106, 145, 147, OCR (optical character recognition), 35 177, 179 Open Directory Project (ODP), 30–31, Rosenfeld, Jeremy, 45 33, 35 Rousseau, Jean-Jacques, 15 Ota, Takashi, 146 Russell, Bertrand, 13, 81 Oxford English Dictionary (OED), 70–72 Russian language, 152 peer production, 108–9 Sandbox, 97, 115 Pellegrini, Mark (Raul654), 180–81 Sanger, Larry, 6–7, 32–34, 36–38, Perl, 56, 67, 101, 140 40–41, 43–45, 61–65, 67, 88, 89, Peul language, 158 115, 184, 202, 210–11 phantom authority, 175–76 boldness directive and, 91, 113 Philological Society, 70 Citizendium project of, 190, 211–12 PHP, 74, 101 Essjay and, 197 Pike, Rob, 144 memoir of, 174, 190, 225 piranha effect, 83, 106, 109, 113, 120 resignation from Wikipedia, 174–75, Plautus Satire, 181 210 Pliny the Elder, 15 on rules, 76, 112 Poe, Marshall, 171 Spanish Wikipedia and, 9, 136–38 Polish Wikipedia, 146, 147 trolls and, 170–75, 189–90 Popular Science, 126 Wikipedia license and, 72 Portland Pattern Repository, 59 Y2K bug and, 32–33 Portuguese language, 136 San Jose Mercury News, 126 PostScript, 52 Schechter, Danny, 8–9 “Potato chip” article, 136 Schiff, Stacy, 196 Professor and the Madman, The Schlossberg, Edwin, 46 (Winchester), 70, 71 schools, 177–78 Project Gutenberg, 35 Scott, Jason, 131, 189 public domain content, 26, 111 search engines, 11, 22, 34 Pupek, Dan, 58 
Google, see Google Seigenthaler, John, 9–10, 191–94, 200, 220 Quickpolls, 126–27 Senegal, 158 Quiz Show, 13 Serbian Wikipedia, 155–56 Index_243 servers, 77–79, 191 Tagalog language, 160 Sethilys (Seth Anthony), 106–11 Taiwan, 150, 151, 154 Shah, Sunir, 59–60, 64 “Talossan language” article, 120 Shaw, George Bernard, 135 Tamil language, 160 Shell, Tim, 21–22, 32, 36, 66, 174, Tawker, 177, 179, 186 184 Tektronix, 46, 47, 50, 55, 56 sidewalks, 96–97 termites, 82 Sieradski, Daniel, 204 Thompson, Ken, 143–44 Signpost, 200 Time, 9, 13 Silsor, 186 Torvalds, Linus, 28–29, 30, 173, 175 Sinitic languages, 159 Tower of Babel, 133–34 see also China tragedy of the commons, 223 Skrenta, Rich, 23, 30 Trench, Chenevix, 70 Slashdot, 67–69, 73, 76, 88, 205, trolls, 170–76, 179, 186, 187, 189–90 207, 216 Truel, Bob, 23, 30 Sanger’s memoir for, 174, 190, 225 2channel, 145 Sneakernet, 50 Snow, Michael, 206–7 Socialtext, 207 “U,” article on, 64 sock puppets, 128, 178–79 Unicode, 142, 144 software, open-source, 5, 23–28, 30, 35, UTF-8, 144–45 62, 67, 79, 216 UTF-32, 142, 143 design patterns and, 55, 59 UNIX, 27, 30–31, 54, 56, 143 Linux, 28–30, 56, 108, 140, 143, 173, Unregistered Words Committee, 70 216, 228 urban planning, 96–97 software robots, 88, 99–106, 145, 147, URL (Uniform Resource Locator), 53, 54 177, 179 USA Today, 9, 191, 220 Souren, Kasper, 158 UseModWiki, 61–63, 66, 73–74, 140–41 South Africa, 157–58 Usenet, 35, 83–88, 114, 170, 190, 223 spam, 11, 87, 220 Usenet Moderation Project (Usemod), 62 Spanish Wikipedia, 9, 136–39, 175, 183, USWeb, 211 215, 226 squid servers, 77–79 Stallman, Richard, 23–32, 74, 86, 217 vandalism: GNU Free Documentation License of, on LA Times Wikitorial, 207–8 72–73, 211–12 on Wikipedia, 6, 93, 95, 125, 128, GNU General Public License of, 27, 72 176–79, 181, 184–88, 194, 195, GNU Manifesto of, 26 202, 220, 227 GNUpedia of, 79 Van Doren, Charles, 13–14 Steele, Guy, 86 verein, 147 Stevertigo, 184 VeryVerily, 128 stigmergy, 82, 89, 92, 109 Vibber, 
Brion, 76 Sun Microsystems, 23, 27, 29–30, 56 Viola, 54 Sun Tzu, 169 ViolaWWW, 54–55 Swedish language, 140, 152 Voltaire, 15 244_Index WAIS, 34, 53 Wik, 123–25, 170, 180 Wales, Christine, 20–21, 22, 139 Wikia, 196, 197 Wales, Doris, 18, 19 Wiki Base, 62 Wales, Jimmy, 1, 8, 9, 18–22, 44, 76, Wikibooks, 216 88, 115, 131, 184, 196, 213, 215, Wikimania, 1–3, 8, 146, 147–48 220 WikiMarkup, 90 administrators and, 94, 185 Wikimedia Commons, 216 background of, 18–19 Wikimedia Foundation, 146, 157, 183–84, at Chicago Options Associates, 20, 196, 199, 213–15, 225–26, 227 21, 22 Wikipedia: Cunctator essay and, 172 administrators of, 67, 93–96, 119, 121, and deletion of articles, 120 125, 127, 148, 178, 185–86, dispute resolution and, 179–80, 181, 195–96, 224–25 223 advertising and, 9, 11, 136–38, 215, Essjay and, 197, 199 226 languages and, 139, 140, 157–58 amateurs and professionals in, 225 neutrality policy and, 6, 7, 113 Arbitration Committee of, 180–81, 184, objectivism and, 32, 36–37 197, 223 Nupedia and, 32–35, 41, 43–45, “assume good faith” policy in, 114, 187, 61–63 195, 200 on piranha effect, 83 blocking of people from, 93 role of, in Wikipedia community, 174–76, boldness directive in, 8, 91, 102, 179–80, 223 113–14, 115, 122, 221 Seigenthaler incident and, 192, 194 categories in, 97–98, 221 Spanish Wikipedia and, 137, 175 “checkuser” privilege in, 179, 196, 199 Stallman and, 30–32 database for, 73–74, 77, 78, 94 three revert rule and, 127–28 discussions in, 7–8, 65–66, 75–76, 89, Wikimania and, 146 121–22 Wikipedia license and, 72 DMOZ as inspiration for, 23 Wikitorials and, 206–7 five pillars of, 113, 216 Wales, Jimmy, Sr., 18 future of, 213–17, 219–29 Wall Street Journal, 126 growth of, 4, 9, 10, 77, 88–89, 95–97, “War and Consequences” Wikitorial, 99–100, 126, 184, 215, 219, 220 206–7 how it works, 90–96 wasps, 82 influence of, 201–212 Weatherly, Keith, 106 launch of, 64, 69, 139, 171 Web browsers, 51–55 legal issues and, 94, 111, 186, 191–92, Weblogs Inc., 215 
227; see also copyright; libel WebShare, 209 linking in, 66–67, 73 Webster, Noah, 70, 133 mailing list for, 89, 95 Web 2.0, 68, 111, 114, 201 main community namespace in, 76 Wei, Pei-Yan, 54–55 main page of, 95 Weinstock, Steven, 202–3 MeatballWiki and, 60, 114, 119, 187–88 “Why Wikipedia Must Jettison Its mediation of disputes in, 180, 181, 195 Anti-Elitism” (Sanger), 189–90 meta pages in, 91 Index_245 name of, 45 “diff” function and, 74, 75, 93, 99 namespaces in, 75–76 edit histories of, 64–65, 71, 82, 91–93 number of editors in, 95–96 editing of, 3–4, 6, 38, 64–66, 69, 73, Nupedia and, 64–65, 88, 136, 171, 172 88, 114–15, 131, 194 openness of, 5–6, 9 edit wars and, 95, 122–31, 136, 146 origins of, 43–60 eventualism and, 120–21, 129, 159 policies and rules of, 76, 112–14, first written, 64 127–28, 170, 171, 192, 221, flagged revisions of, 148–49, 215–16, 224–25 227 popularity of, 4 inclusion of, 115–21 Quickpolls in, 126–27 inverted pyramid formula for, 90 Recent Changes page in, 64–65, 82, license covering content of, 72–73, 98, 104, 109, 176–77 211–12 schools and, 177–78 locking of, 95 servers for, 77–79, 191 maps in, 107, 109–11 Slashdot and, 69, 73, 76, 88 neutral point of view in, 6–7, 82, 89, 111, sock puppets and, 128, 178–79 112–13, 117, 140, 174, 203–4, 217, SOFIXIT directive in, 114–15, 122, 221 228 software robots and, 88, 99–106, 145, news and, 7 147, 177, 179 original research and, 112–13, 117, 174 spam and self-promotion on, 11, 220 protection and semi-protection of, 194, talk pages in, 75–76, 89, 93, 98 216 templates in, 97–98, 113, 221 reverts and, 125, 127–28 trolls and, 170–76, 179, 186, 187, single versions of, 6 189–90 spelling mistakes in, 104–5 user pages in, 76, 89 stability of, 227–28 vandalism and, 6, 93, 95, 125, 128, stub, 92, 97, 101, 104, 148 176–79, 181, 184–88, 194, 195, talk pages for, 75–76, 89, 93, 98 202, 220, 227 test edits of, 176 watchlists in, 74, 82, 98–99, 109 “undo” function and, 93 wiki markup language for, 221–22 uneven 
development of, 220 wiki software for, 64–67, 73, 77, 90, 93, unusual, 92, 117–18 140–41, 216 verifiability and, 112–13, 117 Wikipedia articles: watchlists for, 74, 82, 98–99, 109 accuracy of, 10, 72, 188–89, 194, 208 Wikipedia community, 7–8, 81–132, 174, attempts to influence, 11–12 175, 183–200, 215–17, 222–23 biographies of living persons, 192, Essjay controversy and, 194–200 220–21 Missing Wikipedians page and, 184–85, census data in, 100–104, 106 188 citations in, 113 partitioning of, 223 consensus and, 7, 94, 95, 119–20, 122, Seigenthaler incident and, 9–10, 222–23 191–94, 220 consistency among, 213 stress in, 184 creation of, 90–93, 130–31, 188–89 trolls and, 170 deletion of, 93–94, 96, 119–21, 174 Wales’s role in, 174–76, 179–80, 223 246_Index Wikipedia international editions, 12, 77, Wikitorials, 205–8 100, 131–32, 133–67 Wikiversity, 216 African, 157–58 WikiWikiWeb, 44–45, 58–60, 61, 62 Chinese, 10, 141–44, 146, 150–55 Willy on Wheels (WoW), 178–79 encoding languages for, 140–45 Winchester, Simon, 70, 71 French, 83, 139, 146, 147 Wizards of OS conference, 211 German, 11, 139, 140, 147–49, 215, Wolof language, 158 220, 227 Wool, Danny, 3, 158, 199 Japanese, 139, 140, 141–42, 144, World Book, 16–19 145–47 World Is Flat, The (Friedman), 11 Kazakh, 155–57 World Wide Web, 34, 35, 47, 51–55 links to, 134–35, 140 Web 2.0, 68, 111, 114, 201 list of languages by size, 160–67 WYSIWYG, 222 Serbian, 155–56 Spanish, 9, 136–39, 175, 183, 215, 226 Yahoo, 4, 22, 23, 30, 191, 214 Wikipedia Watch, 192 “Year zero” article, 117 Wikipedia Weekly, 225 Yeats, William Butler, 183 wikis, 44, 51 Yongle encyclopedia, 15 Cunningham’s creation of, 2, 4, 56–60, “You have two cows” article, 118 62, 65–66, 90 YouTube, 58 MeatballWiki, 59–60, 114, 119, 175, Y2K bug, 32–33 187–88 Nupedia and, 61–65 Wikisource, 216 ZhengZhu, 152–57 About the Author Andrew Lih was an academic for ten years at Columbia University and Hong Kong University in new media and journalism.


pages: 372 words: 101,174

How to Create a Mind: The Secret of Human Thought Revealed by Ray Kurzweil


Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Albert Michelson, anesthesia awareness, anthropic principle, brain emulation, cellular automata, Claude Shannon: information theory, cloud computing, computer age, Dean Kamen, discovery of DNA, double helix, en.wikipedia.org, epigenetics, George Gilder, Google Earth, Isaac Newton, iterative process, Jacquard loom, John von Neumann, Law of Accelerating Returns, linear programming, Loebner Prize, mandelbrot fractal, Norbert Wiener, optical character recognition, pattern recognition, Peter Thiel, Ralph Waldo Emerson, random walk, Ray Kurzweil, reversible computing, self-driving car, speech recognition, Steven Pinker, strong AI, the scientific method, theory of mind, Turing complete, Turing machine, Turing test, Wall-E, Watson beat the top human players on Jeopardy!, X Prize

We are now in a position to speed up the learning process by a factor of thousands or millions once again by migrating from biological to nonbiological intelligence. Once a digital neocortex learns a skill, it can transfer that know-how in minutes or even seconds. As one of many examples, at my first company, Kurzweil Computer Products (now Nuance Speech Technologies), which I founded in 1973, we spent years training a set of research computers to recognize printed letters from scanned documents, a technology called omni-font (any type font) optical character recognition (OCR). This particular technology has now been in continual development for almost forty years, with the current product called OmniPage from Nuance. If you want your computer to recognize printed letters, you don’t need to spend years training it to do so, as we did—you can simply download the evolved patterns already learned by the research computers in the form of software. In the 1980s we began on speech recognition, and that technology, which has also been in continuous development now for several decades, is part of Siri.

., 28–29, 206–7, 217 dimming of, 29, 59 hippocampus and, 101–2 as ordered sequences of patterns, 27–29, 54 redundancy of, 59 unexpected recall of, 31–32, 54, 68–69 working, 101 Menabrea, Luigi, 190 metacognition, 200, 201 metaphors, 14–15, 113–17, 176–77 Michelson, Albert, 18, 19, 36, 114 Michelson-Morley experiment, 19, 36, 114 microtubules, 206, 207, 208, 274 Miescher, Friedrich, 16 mind, 11 pattern recognition theory of (PRTM), 5–6, 8, 11, 34–74, 79, 80, 86, 92, 111, 172, 217 thought experiments on, 199–247 mind-body problem, 221 Minsky, Marvin, 62, 133–35, 134, 199, 228 MIT Artificial Intelligence Laboratory, 134 MIT Picower Institute for Learning and Memory, 101 MobilEye, 159 modeling, complexity and, 37–38 Modha, Dharmendra, 128, 195, 271–72 momentum, 20–21 conservation of, 21–22 Money, John William, 118, 119 montane vole, 119 mood, regulation of, 106 Moore, Gordon, 251 Moore’s law, 251, 255, 268 moral intelligence, 201 moral systems, consciousness as basis of, 212–13 Moravec, Hans, 196 Morley, Edward, 18, 19, 36, 114 Moskovitz, Dustin, 156 motor cortex, 36, 99 motor nerves, 99 Mountcastle, Vernon, 36, 37, 94 Mozart, Leopold, 111 Mozart, Wolfgang Amadeus, 111, 112 MRI (magnetic resonance imaging), 129 spatial resolution of, 262–65, 263, 309n MT (V5) visual cortex region, 83, 95 Muckli, Lars, 225 music, as universal to human culture, 62 mutations, simulated, 148 names, recalling, 32 National Institutes of Health, 129 natural selection, 76 geologic process as metaphor for, 14–15, 114, 177 see also evolution Nature, 94 nematode nervous system, simulation of, 124 neocortex, 3, 7, 77, 78 AI reverse-engineering of, see neocortex, digital bidirectional flow of information in, 85–86, 91 evolution of, 35–36 expansion of, through AI, 172, 266–72, 276 expansion of, through collaboration, 116 hierarchical order of, 41–53 learning process of, see learning linear organization of, 250 as metaphor machine, 113 neural leakage in, 150–51 old brain as modulated by, 93–94, 105, 
108 one-dimensional representations of multidimensional data in, 53, 66, 91, 141–42 pattern recognition in, see pattern recognition pattern recognizers in, see pattern recognition modules plasticity of, see brain plasticity prediction by, 50–51, 52, 58, 60, 66–67, 250 PRTM as basic algorithm of, 6 pruning of unused connections in, 83, 90, 143, 174 redundancy in, 9, 224 regular grid structure of, 82–83, 84, 85, 129, 262 sensory input in, 58, 60 simultaneous processing of information in, 193 specific types of patterns associated with regions of, 86–87, 89–90, 91, 111, 152 structural simplicity of, 11 structural uniformity of, 36–37 structure of, 35–37, 38, 75–92 as survival mechanism, 79, 250 thalamus as gateway to, 100–101 total capacity of, 40, 280 total number of neurons in, 230 unconscious activity in, 228, 231, 233 unified model of, 24, 34–74 as unique to mammalian brain, 93, 286n universal processing algorithm of, 86, 88, 90–91, 152, 272 see also cerebral cortex neocortex, digital, 6–8, 41, 116–17, 121–78, 195 benefits of, 123–24, 247 bidirectional flow of information in, 173 as capable of being copied, 247 critical thinking module for, 176, 197 as extension of human brain, 172, 276 HHMMs in, 174–75 hierarchical structure of, 173 knowledge bases of, 177 learning in, 127–28, 175–76 metaphor search module in, 176–77 moral education of, 177–78 pattern redundancy in, 175 simultaneous searching in, 177 structure of, 172–78 virtual neural connections in, 173–74 neocortical columns, 36–37, 38, 90, 124–25 nervous systems, 2 neural circuits, unreliability of, 185 neural implants, 243, 245 neural nets, 131–35, 144, 155 algorithm for, 291n–97n feedforward, 134, 135 learning in, 132–33 neural processing: digital emulation of, 195–97 massive parallelism of, 192, 193, 195 speed of, 192, 195 neuromorphic chips, 194–95, 196 neuromuscular junction, 99 neurons, 2, 36, 38, 43, 80, 172 neurotransmitters, 105–7 new brain, see neocortex Newell, Allen, 181 New Kind of Science, A 
(Wolfram), 236, 239 Newton, Isaac, 94 Nietzsche, Friedrich, 117 nonbiological systems, as capable of being copied, 247 nondestructive imaging techniques, 127, 129, 264, 312n–13n nonmammals, reasoning by, 286n noradrenaline, 107 norepinephrine, 118 Notes from Underground (Dostoevsky), 199 Nuance Speech Technologies, 6–7, 108, 122, 152, 161, 162, 168 nucleus accumbens, 77, 105 Numenta, 156 NuPIC, 156 obsessive-compulsive disorder, 118 occipital lobe, 36 old brain, 63, 71, 90, 93–108 neocortex as modulator of, 93–94, 105, 108 sensory pathway in, 94–98 olfactory system, 100 Oluseun, Oluseyi, 204 OmniPage, 122 One Hundred Years of Solitude (García Márquez), 283n–85n On Intelligence (Hawkins and Blakeslee), 73, 156 On the Origin of Species (Darwin), 15–16 optical character recognition (OCR), 122 optic nerve, 95, 100 channels of, 94–95, 96 organisms, simulated, evolution of, 147–53 overfitting problem, 150 oxytocin, 119 pancreas, 37 panprotopsychism, 203, 213 Papert, Seymour, 134–35, 134 parameters, in pattern recognition: “God,” 147 importance, 42, 48–49, 60, 66, 67 size, 42, 49–50, 60, 61, 66, 67, 73–74, 91–92, 173 size variability, 42, 49–50, 67, 73–74, 91–92 Parker, Sean, 156 Parkinson’s disease, 243, 245 particle physics, see quantum mechanics Pascal, Blaise, 117 patch-clamp robotics, 125–26, 126 pattern recognition, 195 of abstract concepts, 58–59 as based on experience, 50, 90, 273–74 as basic unit of learning, 80–81 bidirectional flow of information in, 52, 58, 68 distortions and, 30 eye movement and, 73 as hierarchical, 33, 90, 138, 142 of images, 48 invariance and, see invariance, in pattern recognition learning as simultaneous with, 63 list combining in, 60–61 in neocortex, see pattern recognition modules redundancy in, 39–40, 57, 60, 64, 185 pattern recognition modules, 35–41, 42, 90, 198 autoassociation in, 60–61 axons of, 42, 43, 66, 67, 113, 173 bidirectional flow of information to and from thalamus, 100–101 dendrites of, 42, 43, 66, 67 digital, 172–73, 
175, 195 expectation (excitatory) signals in, 42, 52, 54, 60, 67, 73, 85, 91, 100, 112, 173, 175, 196–97 genetically determined structure of, 80 “God parameter” in, 147 importance parameters in, 42, 48–49, 60, 66, 67 inhibitory signals in, 42, 52–53, 67, 85, 91, 100, 173 input in, 41–42, 42, 53–59 love and, 119–20 neural connections between, 90 as neuronal assemblies, 80–81 one-dimensional representation of multidimensional data in, 53, 66, 91, 141–42 prediction by, 50–51, 52, 58, 60, 66–67 redundancy of, 42, 43, 48, 91 sequential processing of information by, 266 simultaneous firings of, 57–58, 57, 146 size parameters in, 42, 49–50, 60, 61, 66, 67, 73–74, 91–92, 173 size variability parameters in, 42, 67, 73–74, 91–92, 173 of sounds, 48 thresholds of, 48, 52–53, 60, 66, 67, 111–12, 173 total number of, 38, 40, 41, 113, 123, 280 universal algorithm of, 111, 275 pattern recognition theory of mind (PRTM), 5–6, 8, 11, 34–74, 79, 80, 86, 92, 111, 172, 217 patterns: hierarchical ordering of, 41–53 higher-level patterns attached to, 43, 45, 66, 67 input in, 41, 42, 44, 66, 67 learning of, 63–64, 90 name of, 42–43 output of, 42, 44, 66, 67 redundancy and, 64 specific areas of neocortex associated with, 86–87, 89–90, 91, 111, 152 storing of, 64–65 structure of, 41–53 Patterns, Inc., 156 Pavlov, Ivan Petrovich, 216 Penrose, Roger, 207–8, 274 perceptions, as influenced by expectations and interpretations, 31 perceptrons, 131–35 Perceptrons (Minsky and Papert), 134–35, 134 phenylethylamine, 118 Philosophical Investigations (Wittgenstein), 221 phonemes, 61, 135, 137, 146, 152 photons, 20–21 physics, 37 computational capacity and, 281, 316n–19n laws of, 37, 267 standard model of, 2 see also quantum mechanics Pinker, Steven, 76–77, 278 pituitary gland, 77 Plato, 212, 221, 231 pleasure, in old and new brains, 104–8 Poggio, Tomaso, 85, 159 posterior ventromedial nucleus (VMpo), 99–100, 99 prairie vole, 119 predictable outcomes, determined outcomes vs., 26, 239 President’s Council 
of Advisors on Science and Technology, 269 price/performance, of computation, 4–5, 250–51, 257, 257, 267–68, 301n–3n Principia Mathematica (Russell and Whitehead), 181 probability fields, 218–19, 235–36 professional knowledge, 39–40 proteins, reverse-engineering of, 4–5 qualia, 203–5, 210, 211 quality of life, perception of, 277–78 quantum computing, 207–9, 274 quantum mechanics, 218–19 observation in, 218–19, 235–36 randomness vs. determinism in, 236 Quinlan, Karen Ann, 101 Ramachandran, Vilayanur Subramanian “Rama,” 230 random access memory: growth in, 259, 260, 301n–3n, 306n–7n three-dimensional, 268 randomness, determinism and, 236 rationalization, see confabulation reality, hierarchical nature of, 4, 56, 90, 94, 172 recursion, 3, 7–8, 56, 65, 91, 153, 156, 177, 188 “Red” (Oluseum), 204 redundancy, 9, 39–40, 64, 184, 185, 197, 224 in genome, 271, 314n, 315n of memories, 59 of pattern recognition modules, 42, 43, 48, 91 thinking and, 57 religious ecstacy, 118 “Report to the President and Congress, Designing a Digital Future” (President’s Council of Advisors on Science and Technology), 269 retina, 95 reverse-engineering: of biological systems, 4–5 of human brain, see brain, human, computer emulation of; neocortex, digital Rosenblatt, Frank, 131, 133, 134, 135, 191 Roska, Boton, 94 Rothblatt, Martine, 278 routine tasks, as series of hierarchical steps, 32–33 Rowling, J.


pages: 117 words: 30,654

Kindle Formatting: The Complete Guide to Formatting Books for the Amazon Kindle by Joshua Tallent

book scanning, job automation, optical character recognition

No Digital File There are times when an author or publisher only has a physical copy of the book they want to publish on the Kindle. This is most common with out-of-print books, but it can also happen when the rights to the book revert back to the author and the publisher, for whatever reason, does not have a copy of the book in a PDF or other digital format. The easiest way to get the book back into a digital format is to scan it and run it through an Optical Character Recognition (OCR) software program. There are a variety of options available to the do-it-yourself person or to the pay-someone-else person. The main benefit to doing the process yourself is saving money, but you may find that having some help in the process is easier and faster. The first step in the OCR process is to have your book scanned. This is a process where each page of your book is turned into an image that can be loaded into the OCR program.


pages: 118 words: 35,663

Smart Machines: IBM's Watson and the Era of Cognitive Computing (Columbia Business School Publishing) by John E. Kelly III

AI winter, call centre, carbon footprint, crowdsourcing, demand response, discovery of DNA, Erik Brynjolfsson, future of work, Geoffrey West, Santa Fe Institute, global supply chain, Internet of things, John von Neumann, Mars Rover, natural language processing, optical character recognition, pattern recognition, planetary scale, RAND corporation, RFID, Richard Feynman, smart grid, smart meter, speech recognition, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!

But there was no way that the scientists could anticipate every dilemma that Watson might be confronted with and write a rule to deal with it, so, as they developed Watson, they realized they would need to invent a system that would enable Watson to learn on its own. Their invention represented a major advance in the science of machine learning, a branch of artificial intelligence that focuses on building systems that learn from data. The field was first defined in 1959 by IBM scientist Arthur Samuel and has found plenty of uses over the years, including common applications such as optical character recognition and e-mail spam filters. Such systems are trained to recognize repeated patterns in words or shapes and to react in a certain way when they encounter them again. Watson takes machine learning to a new level. In creating the technology for Watson, called DeepQA, which includes the learning capability, the developers provided the machine with a large corpus of unstructured information and the algorithms to extract knowledge from it.


pages: 666 words: 181,495

In the Plex: How Google Thinks, Works, and Shapes Our Lives by Steven Levy

23andMe, AltaVista, Anne Wojcicki, Apple's 1984 Super Bowl advert, autonomous vehicles, book scanning, Brewster Kahle, Burning Man, business process, clean water, cloud computing, crowdsourcing, Dean Kamen, discounted cash flows, don't be evil, Douglas Engelbart, El Camino Real, fault tolerance, Firefox, Gerard Salton, Google bus, Google Chrome, Google Earth, Googley, HyperCard, hypertext link, IBM and the Holocaust, informal economy, information retrieval, Internet Archive, Jeff Bezos, Kevin Kelly, Mark Zuckerberg, Menlo Park, optical character recognition, PageRank, Paul Buchheit, Potemkin village, prediction markets, recommendation engine, risk tolerance, Sand Hill Road, Saturday Night Live, search inside the book, second-price auction, Silicon Valley, skunkworks, Skype, slashdot, social graph, social software, social web, spectrum auction, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, Ted Nelson, telemarketer, trade route, traveling salesman, Vannevar Bush, web application, WikiLeaks, Y Combinator

If a web page required users to fill out a form to see certain content, Google had probably taught its spiders how to fill out the form. Sometimes content was locked inside programs that ran when users visit a page—applications running in the JavaScript language or a media program like Adobe’s Flash. Google knew how to look inside those programs and suck out the content for its indexes. Google even used optical character recognition to figure out if an image on the website had text on it. The accumulation of all those improvements lengthened Google’s lead over its competitors, and the circle of early adopters who first discovered Google was eventually joined by the masses, building a dominant market share. Even Google’s toughest competitors had to admit that Brin and Page had built something special. “In the search engine business, Google blew away the early innovators, just blew them away,” says Bill Gates.

Larry would say, “Don’t go too fast … don’t go too slow.” It had to be a rate that someone could maintain for a long time—this was going to scale, remember, to every book ever written. They finally used a metronome to synchronize their actions. After some practice, they found that they could capture a 300-page book such as Startup in about forty-two minutes, faster than they expected. Then they ran optical character recognition (OCR) software on the images and began searching inside the book. Page would open the book to a random page and say, “This word—can you find it?” Mayer would do a search to see if she could. It worked. Presumably, a dedicated machine could work faster, and that would make it possible to capture millions of books. How many books were ever printed? Around 30 million? Even if the cost was $10 a book, the price tag would only be $300 million.
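
The back-of-envelope figures in this passage can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted in the excerpt (300 pages in about forty-two minutes, roughly 30 million books, $10 per book); the variable names are illustrative and do not come from the book.

```python
# Figures quoted in the excerpt; the names are illustrative only.
pages_per_book = 300
minutes_per_book = 42
total_books = 30_000_000     # "Around 30 million?"
cost_per_book_usd = 10       # "Even if the cost was $10 a book"

# Manual scanning rate achieved with the metronome.
pages_per_minute = pages_per_book / minutes_per_book

# Total price tag for scanning every book at the assumed unit cost.
total_cost_usd = total_books * cost_per_book_usd

print(f"{pages_per_minute:.1f} pages per minute")
print(f"${total_cost_usd:,} total")
```

The total matches the $300 million quoted in the passage; the scanning rate works out to a little over seven pages a minute.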

., 236, 331, 344–47, 364, 365–66 Kahle, Brewster, 362, 365 Kamangar, Salar, 71–72, 74, 233, 235 and advertising, 86, 89, 91–92, 109, 113 and business plan, 72, 75, 201 and Google motto, 143–44 and YouTube, 248, 260–65 Karen, Alana, 97–98 Karim, Jawed, 243, 247, 250 Kay, Erik, 207 Keyhole, 239–40, 340 Keyword Pricing Index, 118 Khosla, Vinod, 28, 29 Kim, Jini, 166 Klau, Rick, 312, 318 Kleinberg, Jon, 24–26, 34, 38, 292 Knol, 240 Knuth, Donald, 14 Kohl, Herb, 332 Koogle, Timothy, 44 Kordestani, Omid, 75–76, 78, 81, 96, 97, 130, 155, 242 Krane, David, 69–70, 143, 144–45, 150, 156 Kraus, Joe, 28, 136, 201, 374–75 Kundra, Vivek, 322, 326 Kurzweil, Raymond, 66 language, translations, 55, 62–65 Lantos, Tom, 285–87 Larson, Mark, 208 Leach, Jim, 286 Lee, Kai-Fu: and China office, 4, 281–83, 289–90, 291, 292, 293, 294, 296, 298, 302, 303, 305, 307–8, 313 departure of, 307–8, 312 Lee, Steve, 338–39 Lenat, Douglas, 47 Leonard, Herman, 117 Lessig, Lawrence, 359, 360, 363 Levick, Jeff, 96, 110–11, 112–13 Levinson, Arthur, 218, 237 Li, Mark, 293, 298–99 Li, Robin (Yanhong), 26–27, 278, 292, 293, 298 Library of Congress, 352, 361 Liebman, Jason, 103–5 LinkAds, 102–3 Linux, 78, 182, 210 Litvack, Sanford “Sandy,” 345, 347 Liu, John, 296 Liu, Jun, 294, 303–4 long-tail businesses, 85, 105, 107, 118, 243, 334 Lu, Qi, 380 Lucovsky, Mark, 283 Luk, Ben, 290, 302 Maarek, Yoelle, 272 MacDonald, Brian, 380 Macgillivray, Alex “AMac,” 353–55 machine learning, 64–66, 100–101, 385 Malone, Kim (Scott), 107–8, 135 Manber, Udi, 44, 45, 57–58, 68, 240, 355, 380 MapReduce, 199–200 Marconi International Fellowship Award, 278 Markoff, John, 193 Matias, Yossi, 272 Mayer, Marissa, 36, 41, 381 and advertising, 78–79 and APM program, 1, 4, 5, 161–62, 259 and books, 348–50, 358, 365 and Gmail, 170–71 and Google culture, 121, 122, 126–27, 141, 142, 163, 164, 365 and Google motto, 143–44 and Google’s look, 206–7 and management structure, 160, 235 and social networking, 371–73, 375 and stock price, 155, 156–57 
McCaffrey, Cindy, 3, 76, 77, 145, 150, 153, 164 McCarthy, John, 127 McLaughlin, Andrew: and China, 276–79, 283–84, 303, 304 and Obama administration, 316, 321, 322–23, 325–26, 327 and privacy, 176–78, 379 memex, 15, 44 Merrill, Douglas, 183 Mi, James, 276 Microsoft: and antitrust issues, 331–32, 344–45 and aQuantive, 331 Bing search engine, 186, 380–81 and books, 361, 363 and browser wars, 206, 283 and China, 281, 282, 283, 284, 285, 304 and competition, 70, 191, 197, 200–212, 218, 220, 266, 282–83, 331, 344–47, 363, 380–81 and Danger, 214 data centers of, 190 and disclosure, 108 and email, 168, 169, 179–80 Excel, 200 and Facebook, 370 Hotmail, 30, 168, 172, 180, 209 IE 7, 209 Internet Explorer, 204–7 and mapping, 342 monopolies of, 200, 331–32, 364 Office, 200, 202, 203 Outlook, 169 PowerPoint, 200, 203 and user data, 335 and values, 144 WebTV, 217 Windows, 200, 210, 212, 219, 331 Windows Mobile, 220 Word, 200 and Yahoo, 343–44, 346, 380 of yesterday, 369 MIDAS (Mining Data at Stanford), 16 Milgrom, Paul, 90 Miner, Rich, 215, 216 Mobile Accord, 325 mobile phones, 214–17, 219–22, 251 Moderator, 323–24 Mohan, Neal, 332, 336 Monier, Louis, 19, 20, 37 Montessori, Maria, 121, 124, 166 Montessori schools, 121–25, 129, 138, 149 Moonves, Leslie, 246 Moore’s Law, 169, 180, 261 Morgan Stanley, 149, 157 Moritz, Mike, 32, 73–74, 80, 147, 247–48, 249 Morozov, Evgeny, 379 Morris, Doug, 261 Mossberg, Walt, 94 Mowani, Rajeev, 38 Mozilla Firefox, 204, 206, 207–8, 209 Murdoch, Rupert, 249, 370 MySpace, 243, 375 name detection system, 50–52 Napier’s constant, 149 National Federation of the Blind, 365–66 National Institute of Standards and Technology (NIST), 65 National Science Digital Library, 347 National Security Agency (NSA), 310 Native Client, 212 navigation, 229, 232, 338 Nelson, Ted, 15 net neutrality, 222, 326–27, 330, 383–84 Netscape, 30, 75, 78, 147, 204, 206 Nevill-Manning, Craig, 129 Newsweek, 2, 3, 20, 179 New York Public Library, 354, 357 Nexus One, 230, 231–32 95th 
Percentile Rule, 187 Nokia, 341, 374 Norman, Donald, 12, 106 Norvig, Peter, 47, 62, 63, 138, 142, 316 Novell, 70 Obama, Barack, 315–21, 322, 323–24, 329, 346 Obama administration, 320–28 Ocean, 350–55 Och, Franz, 63–65 Oh, Sunny, 283, 297, 298 OKR (Objectives and Key Results), 163–64, 165, 186, 209, 325 Open Book Alliance, 362 Open Handset Alliance, 221–22 OpenSocial, 375–76 operating systems, 210–12 optical character recognition (OCR), 53, 349–50 Oracle, 220 Orkut, 371–73, 375 Otellini, Paul, 218 Overture, 89, 90, 91, 95, 96, 98–99, 103, 150 Oxford University Press, 354, 357 Page, Larry, 3, 5 achievements of, 53, 383 and advertising, 84, 86–87, 90, 92, 94, 95–97, 114, 334, 336–37 ambition of, 12, 39, 73, 127–28, 139, 198, 215, 238, 362, 386–87 and applications, 205, 206, 207, 208, 210, 240–42, 340 and artificial intelligence, 62, 100, 246, 385–86 and BackRub/PageRank, 17, 18, 21–24, 26 and birth of Google, 31–34 and Book Search, 11, 347–52, 355, 357, 359, 361, 362, 364 on capturing all the web, 22–24, 52, 58, 63 on changing the world, 6, 13, 33, 39, 97, 120, 125, 146, 173, 232, 279, 316, 327, 361, 384–85 childhood and early years of, 11–13 and China, 267, 276, 277–78, 279–80, 283, 284, 305, 311 and data centers, 182–83 and eco-activism, 241 and email, 169–72, 174, 179 and Excite, 28–29 and funding, 32, 33–34, 73–75 and hiring practices, 139–40, 142, 182, 271, 386 imagination of, 14, 271 and IPO, 146–47, 149–54, 157 and machine learning, 66, 67 and management, 74, 75–77, 79–82, 110, 143, 158–60, 162–66, 228, 231, 235, 252–53, 254, 255, 260, 272, 273, 386–87 marriage of, 254 as Montessori kid, 121–25, 127–28, 149, 331, 387 and Obama, 315–16 and philanthropy, 257–58 and privacy, 174, 176–77, 337 and robots, 246, 385 and secrecy, 31–32, 70, 72–73, 106, 218 and smart phones, 214–16, 224, 225, 226–30, 234 and social networking, 372 and speed, 184–85, 207 and Stanford, 12–13, 14, 16–17, 28, 29, 34 and trust, 221, 237 values of, 127–28, 130, 132, 135, 139–40, 146, 196, 
361, 364 and wealth, 157 and web links, 51 and YouTube, 248 PageRank, 3, 17, 18, 21–24, 27, 34, 38, 48–49, 53, 55, 56, 294 Palm, 216, 221 Park, Lori, 235, 258 Pashupathy, Kannan, 270–72, 277, 282 Passion Device, 230 Patel, Amit, 45–46, 82 and Google motto, 143–44, 146 patents, 27, 39, 89, 102, 221, 235, 237, 350 PayPal, 242, 243 peer-to-peer protocols, 234–35 Peters, Marybeth, 352 Phil, 99–103 Philip, Prince, 122 Picasa, 185–86, 187, 239 Pichai, Sundar, 205–6, 207–8, 209–12 Pichette, Patrick, 120, 150, 254–56 Pike, Rob, 241 Pittman, R.


pages: 685 words: 203,949

The Organized Mind: Thinking Straight in the Age of Information Overload by Daniel J. Levitin

airport security, Albert Einstein, Amazon Mechanical Turk, Anton Chekhov, big-box store, business process, call centre, Claude Shannon: information theory, cloud computing, cognitive bias, complexity theory, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, cuban missile crisis, Daniel Kahneman / Amos Tversky, delayed gratification, Donald Trump, en.wikipedia.org, epigenetics, Eratosthenes, Exxon Valdez, framing effect, friendly fire, fundamental attribution error, Golden Gate Park, Google Glasses, haute cuisine, impulse control, index card, indoor plumbing, information retrieval, invention of writing, iterative process, jimmy wales, job satisfaction, Kickstarter, life extension, meta analysis, meta-analysis, more computing power than Apollo, Network effects, new economy, Nicholas Carr, optical character recognition, pattern recognition, phenotype, placebo effect, pre–internet, profit motive, randomized controlled trial, Skype, Snapchat, statistical model, Steve Jobs, supply-chain management, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, Turing test, ultimatum game

Microsoft engineer Malcolm Slaney (formerly of Yahoo!, IBM, and Apple) advocates scanning everything into PDFs and keeping them on your computer. Home scanners are relatively inexpensive, and there are strikingly good scanning apps available on cell phones. If it’s something you want to keep, Malcolm says, scan it and save it under a filename and folder that will help you find it later. Use OCR (optical character recognition) mode so that the PDF is readable as text characters rather than simply a photograph of the file, to allow your computer’s own search function to find specific keywords you’re looking for. The advantage of digital filing is that it takes up virtually no space, is environmentally friendly, and is electronically searchable. Moreover, if you need to share the document with someone (your accountant, a colleague) it’s already in a digital format and so you can simply attach it to an e-mail.

Individually, each reCAPTCHA takes only about ten seconds to solve, but with more than 200 million of them being solved every day, this amounts to over 500,000 hours of work being done in one day. Why not turn all this time into something productive? The technology for automatically scanning written materials and turning them into searchable text is not perfect. Many words that a human being can discern are misread by computers. Consider the following example from an actual book being scanned by Google: After the text is scanned, two different OCR (for optical character recognition) programs attempt to map these blotches on the page to known words. If the programs disagree, the word is deemed unsolved, and then reCAPTCHA uses it as a challenge for users to solve. How does the system know if you guessed an unknown word correctly? It doesn’t! But reCAPTCHAs pair the unknown words with known words; they assume that if you solve the known word, you’re a human, and that your guess on the unknown word is reasonable.
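
The logic described here (two OCR programs vote, disagreements become challenges, and a known control word vouches for the human) can be sketched roughly as follows. This is a hedged illustration of the idea as the excerpt describes it, not reCAPTCHA's actual implementation; the function names and the sample words are hypothetical.

```python
def is_unsolved(ocr_a: str, ocr_b: str) -> bool:
    """A word is deemed unsolved when the two OCR readings disagree."""
    return ocr_a != ocr_b


def accept_guess(known_word: str, known_guess: str, unknown_guess: str):
    """Return the user's guess for the unknown word if the control word
    was typed correctly, otherwise None (the guess is discarded)."""
    if known_guess.strip().lower() != known_word.lower():
        return None
    return unknown_guess.strip()


# Hypothetical OCR readings of the same blotchy word.
print(is_unsolved("morning", "rnorning"))                  # readings differ
print(accept_guess("overlooks", "overlooks", "morning"))   # control word passed
print(accept_guess("overlooks", "overlocks", "morning"))   # control word failed

# The excerpt's arithmetic: 200 million puzzles at about ten seconds each.
hours_per_day = 200_000_000 * 10 / 3600
print(f"{hours_per_day:,.0f} hours per day")
```

The last line reproduces the "over 500,000 hours" figure from the passage. A real system would presumably also compare guesses from several users before trusting a reading; the excerpt does not say, so that step is left out.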

See brain physiology news media, 338–40 Newton, Isaac, 162 New Yorker, 120, 336 New York Times, 6, 339, 365 Nietzsche, Friedrich, 375 Nixon, Richard, 201 NMDA receptor, 167 nonlinear thinking and perception, 38, 215, 217–18, 262, 380 Norman, Don, 35 number needed to treat metric, 236, 240, 247, 264, 264 Obama, Barack, 219, 303 object permanence, 24 Office of Presidential Correspondence, 303 Olds, James, 101 Old Testament, 151 O’Neal, Shaquille, 352–53 One Hundred Names for Love (Ackerman), 364–65 online dating, 130–34, 422n130, 423n132 optical character recognition (OCR), 93, 119, 119 optimal information, 308–10 orders of magnitude, 354–55, 358–59, 361, 363, 400n7 organizational structure, 271–76, 315–18, 470n315, 471n317 Otellini, Paul, 380–81 Overbye, Dennis, 6, 19 Oxford English Dictionary, 114 Oxford Filing Supply Company, 93–94 Page, Jimmy, 174 pair-bonding, 128, 142 paperwork, 293–306 Pareto optimality, 269 parking tickets, 237, 451n237 Parkinson’s disease, 167–68 passwords, xx, 103–5 Patel, Shreena, 258 paternalism, medical, 245, 257 pattern recognition, 28, 249 Patton, George S., 73–74 peak performance, 167, 189, 191–92, 203, 206 Peer Instruction (Mazur), 367 perfectionism, 174, 199–200 periodic table of elements, 372–73, 373, 480n372 Perry, Bruce, 56 Peterson, Jennifer, 368 pharmaceuticals, 256–57, 343, 345–46 Picasso, Pablo, 283 Pierce, John R., 73 Pirsig, Robert, 69–73, 89, 295–97, 300 placebo effect, 253, 255 place memory, 82–83, 106, 293–94 planning, 43, 161, 174–75, 319–26 Plato, 14, 58, 65–66 plausibility, 350, 352, 478n352 Plimpton, George, 200 Plutarch, 340 Poldrack, Russ, 97 Polya, George, 357 Ponzo illusion, 21, 22 positron emission tomography (PET), 40 prediction, 344–45 prefrontal cortex, 161 Area 47, 287 and attention, 16–17, 43, 45–46 and changing behaviors, 176 and children’s television, 368 and creative time, 202, 210 and decision-making, 277, 282 and flow state, 203, 207 and information overload, 8 and literary fiction, 367 and 
manager/worker distinction, 176 and multitasking, 96, 98, 307 and procrastination, 197, 198, 200–201 and sleep, 187 and task switching, 171–72 and time organization, 161, 165–66, 174, 180 See also brain physiology preselection effect, 331, 343 Presidential Committee on Information Literacy, 365 primacy effect, 55, 408n56 primates, 17–18, 125–26, 135 Prince, 174 Princeton Theological Seminary, 145–46 prior distributions, 249 prioritization, 5–7, 33–35, 379–80 probability.


pages: 566 words: 122,184

Code: The Hidden Language of Computer Hardware and Software by Charles Petzold

Bill Gates: Altair 8800, Claude Shannon: information theory, computer age, Douglas Engelbart, Dynabook, Eratosthenes, Grace Hopper, invention of the telegraph, Isaac Newton, Jacquard loom, James Watt: steam engine, John von Neumann, Joseph-Marie Jacquard, Louis Daguerre, millennium bug, Norbert Wiener, optical character recognition, popular electronics, Richard Feynman, Richard Stallman, Silicon Valley, Steve Jobs, Turing machine, Turing test, Vannevar Bush, Von Neumann architecture

Similarly, the wider gaps between the bars are two, three, and four times the width of the thinnest gap. But another way to look at the UPC is as a series of bits. Keep in mind that the whole bar code symbol isn't exactly what the scanning wand "sees" at the checkout counter. The wand doesn't try to interpret the numbers at the bottom, for example, because that would require a more sophisticated computing technique known as optical character recognition, or OCR. Instead, the scanner sees just a thin slice of this whole block. The UPC is as large as it is to give the checkout person something to aim the scanner at. The slice that the scanner sees can be represented like this: This looks almost like Morse code, doesn't it? As the computer scans this information from left to right, it assigns a 1 bit to the first black bar it encounters, a 0 bit to the next white gap.
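
The bar-to-bit reading described above can be sketched in a few lines. This is a minimal illustration, assuming (as the passage implies, though the excerpt is cut off before it says so) that a bar or gap two, three, or four times the thinnest width stands for a run of two, three, or four identical bits. The function name and the sample widths are hypothetical.

```python
def widths_to_bits(widths, starts_with_bar=True):
    """Turn alternating bar/gap widths (in units of the thinnest bar)
    into a bit string: black bars become 1s, white gaps become 0s."""
    bits = []
    is_bar = starts_with_bar
    for width in widths:
        bits.append(("1" if is_bar else "0") * width)
        is_bar = not is_bar   # bars and gaps alternate across the slice
    return "".join(bits)


# A made-up slice scanned left to right: bar, gap, bar, gap, bar, gap, bar.
print(widths_to_bits([1, 1, 1, 3, 2, 1, 1]))  # -> 1010001101
```

Working in multiples of the thinnest width means the scanner only has to measure relative widths, so the same code can be read regardless of how large the symbol is printed.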

It's fairly straightforward to convert a metafile to a bitmap. Because video display memory and bitmaps are conceptually identical, if a program knows how to draw a metafile in video display memory, it knows how to draw a metafile on a bitmap. But converting a bitmap to a metafile isn't so easy, and for some complex images might well be impossible. One technique related to this job is optical character recognition, or OCR. OCR is used when you have a bitmap of some text (from a fax machine, perhaps, or scanned from typed pages) and need to convert it to ASCII character codes. The OCR software needs to analyze the patterns of bits and determine what characters they represent. Due to the algorithmic complexity of this job, OCR software is usually not 100 percent accurate. Even less accurate is software that attempts to convert handwriting to ASCII text.
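
To make "analyze the patterns of bits and determine what characters they represent" concrete, here is a toy nearest-template matcher. It is a deliberately simplified stand-in, not the technique the book or any real OCR product uses: the 3x3 glyphs, the `TEMPLATES` table, and the function names are all invented for illustration.

```python
# Hypothetical 3x3 reference bitmaps ("1" = ink, "0" = blank).
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def mismatches(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def recognize(bitmap):
    """Return the template character with the fewest differing pixels."""
    return min(TEMPLATES, key=lambda ch: mismatches(bitmap, TEMPLATES[ch]))


# A "T" with one stray pixel, as might come from a fax or a poor scan.
noisy_t = ("111", "010", "011")
char = recognize(noisy_t)
print(char, ord(char))  # -> T 84
```

Even this toy shows why accuracy falls short of 100 percent: once noise flips enough pixels, the closest template is no longer the right one.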


pages: 382 words: 120,064

Bank 3.0: Why Banking Is No Longer Somewhere You Go but Something You Do by Brett King

3D printing, additive manufacturing, Albert Einstein, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, asset-backed security, augmented reality, barriers to entry, bitcoin, bounce rate, business intelligence, business process, business process outsourcing, call centre, capital controls, citizen journalism, Clayton Christensen, cloud computing, credit crunch, crowdsourcing, disintermediation, en.wikipedia.org, George Gilder, Google Glasses, high net worth, I think there is a world market for maybe five computers, Infrastructure as a Service, invention of the printing press, Jeff Bezos, jimmy wales, London Interbank Offered Rate, M-Pesa, Mark Zuckerberg, mass affluent, microcredit, mobile money, more computing power than Apollo, Northern Rock, Occupy movement, optical character recognition, performance metric, platform as a service, QWERTY keyboard, Ray Kurzweil, recommendation engine, RFID, risk tolerance, self-driving car, Skype, speech recognition, stem cell, telepresence, Tim Cook: Apple, transaction costs, underbanked, web application

However, increasingly banks will be deploying solutions that overlay key data on the natural environment so that they can change consumer financial behaviour or influence decisions in real time. Augmenting our environment with the application of smart data will be an intriguing and highly profitable business over the next decade. Augmented reality Something that is a little bit out there, but interesting to think about, is the emerging technology around image recognition and data overlays in the real world. We’ve had OCR or Optical Character Recognition for many years now, but there have been recent improvements in image processing and matching. Recently Google has developed search engine technology called “Google Goggles” that allows users to search based on images taken by their camera phones. It is currently in beta with some reasonable search support for books, DVDs, landmarks, logos, contact info, artwork, businesses, products, barcodes, and text.

Mobile Wallet: An electronic account, denominated in a currency, held on a mobile phone that can be used to store and transfer value.
Moore’s Law: Named after Gordon Moore, this law basically states that the number of transistors on a chip doubles every 24 months.
NFC: Near Field Communication, a short-range high-frequency wireless communication technology which enables the exchange of data between devices over about a 10-centimetre distance.
OCR: Optical Character Recognition
OpEx: Operating Expense
OLED: Organic Light-Emitting Diode (also Organic Electro-luminescent Device, OELD), an LED whose electroluminescent layer is composed of a film of organic compounds.
OTC: Over the Counter, which refers to physical transactions or trades done on behalf of a customer by a trader or customer representative who has access to a specific closed financial system or network.


pages: 528 words: 146,459

Computer: A History of the Information Machine by Martin Campbell-Kelly, William Aspray, Nathan L. Ensmenger, Jeffrey R. Yost

Ada Lovelace, air freight, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Apple's 1984 Super Bowl advert, barriers to entry, Bill Gates: Altair 8800, borderless world, Buckminster Fuller, Build a better mousetrap, Byte Shop, card file, cashless society, cloud computing, combinatorial explosion, computer age, deskilling, don't be evil, Douglas Engelbart, Dynabook, fault tolerance, Fellow of the Royal Society, financial independence, Frederick Winslow Taylor, game design, garden city movement, Grace Hopper, informal economy, interchangeable parts, invention of the wheel, Jacquard loom, Jeff Bezos, jimmy wales, John von Neumann, linked data, Mark Zuckerberg, Marshall McLuhan, Menlo Park, natural language processing, Network effects, New Journalism, Norbert Wiener, Occupy movement, optical character recognition, packet switching, PageRank, pattern recognition, pirate software, popular electronics, prediction markets, pre–internet, QWERTY keyboard, RAND corporation, Robert X Cringely, Silicon Valley, Silicon Valley startup, Steve Jobs, Steven Levy, Stewart Brand, Ted Nelson, the market place, Turing machine, Vannevar Bush, Von Neumann architecture, Whole Earth Catalog, William Shockley: the traitorous eight, women in the workforce, young professional

Applying the code to products had to be inexpensive in order not to disadvantage small manufacturers; and the expense of product coding could not add significantly to the cost of goods, as this would disadvantage retailers who were unable to participate in the system because they would be forced to bear the increased cost of the system without deriving any benefit. The checkout equipment also had to be relatively inexpensive because millions of barcode readers would eventually be needed. These conditions ruled out the use of the expensive magnetic ink and optical character-recognition systems then in use by banks and the Federal Reserve. Various experimental systems were tested at a cost of several million dollars. By the end of 1971 there was a much better awareness of the complex trade-offs that had to be made—for example, between printing costs and scanner costs. In the spring of 1973 the barcode system with which we are now all familiar was formally adopted. This system was largely developed by IBM, and it was selected simply for being the cheapest and the most trouble-free option.

See Business machine industry Office of Naval Research (ONR), 147–148, 150 Office of Scientific Research and Development (OSRD), 49, 65–66, 74 Office systematizers, 19, 134 Olivetti, 197, 251 Olsen, Kenneth, 217–218 Omidyar, Pierre, 295 “On Computable Numbers with an Application to the Entscheidungsproblem” (Turing), 60 Opel, John, 246 Open-source software, 215, 288, 296 Operating systems for mainframe computers, 179–182, 205, 206, 210, 212–215 for mobile devices, 297, 298 for personal computers, 242–243, 246–247, 253–254, 257–258, 264–267 See also specific operating systems Optical character recognition, 164 OS/2 operating system, 265, 266 OS/360 operating system, 179–182, 183, 212 Osborne 1 computer, 198 (photo), 296 Outsourcing of components and software, 245–246, 247 Oxford English Dictionary, 3 Packaged software programs, 186–188, 254 Packard, David, 249 Packet-switching technology, 281–282 Page, Larry, 294 Palm, Inc., 297, 298 Palo Alto Research Center (PARC), 260, 261, 280, 296 Papian, Bill, 150 Parker, Sean, 301 Pascal programming language, 185 Passages from the Life of a Philosopher (C.

The Future of Technology by Tom Standage

air freight, barriers to entry, business process, business process outsourcing, call centre, Clayton Christensen, computer vision, connected car, corporate governance, disintermediation, distributed generation, double helix, experimental economics, full employment, hydrogen economy, industrial robot, informal economy, interchangeable parts, job satisfaction, labour market flexibility, market design, Menlo Park, millennium bug, moral hazard, natural language processing, Network effects, new economy, Nicholas Carr, optical character recognition, railway mania, rent-seeking, RFID, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, six sigma, Skype, smart grid, software as a service, spectrum auction, speech recognition, stem cell, Steve Ballmer, technology bubble, telemarketer, transcontinental railway, Y2K

This work involved 100 people, each of whom cost the firm $6,000 in software-licence fees. The American company had been trying to write software to automate some of this work and reduce its licence-fee payments. Wipro scrapped the software project, hired 110 Indians and still did the work more cheaply. Once work has moved abroad, however, it joins the same cycle of automation and innovation that pushes technology forward everywhere. Optical-character-recognition software is automating the work of Indian data-entry workers. Electronic airline tickets are eliminating some of the ticket-reconciliation work airlines carry out in India. Eventually, natural-language speech recognition is likely to automate some of the call-centre work that is currently going to India, says Steve Rolls of Convergys, the world’s largest call-centre operator. All this helps to promote outsourcing and the building of production platforms in India. GE is selling Gecis, its Indian financial-services administrator, and Citibank, Deutsche Bank and others have disposed of some of their Indian IT operations.

Multiband OFDM Alliance (MBOA) 215–17 Murray, Bill 52, 63 Murthy, Narayana 125, 130, 142 music ix, 95, 99–101, 102, 165–7, 168, 170–1, 172, 202–8, 212–13, 219–29 music industry internet threat 222–9 quality concerns 224–5 music players 204, 207–8, 219–29 see also iPod... hard disks 204, 207–8, 219–20 social issues 220–1 351 THE FUTURE OF TECHNOLOGY MVNOs see mobile virtual network operators Myriad 243 N N-Gage handset 161, 171 nanobots 316 Nanogen 323 nanometre, definition ix Nanometrics 323 Nanosolar 315 NanoSonic 308 Nanosys 321–2, 326 nanotechnology ix–x, 233, 263–4, 306–29 applications 308–15 chemistry 310–11 companies 321–6 computer chips 313–14, 325–6 concepts 306–29 definition 306 developing world 319–20 energy 314–15 fears 316–20, 327–9 funding 308–9 future prospects 309–15, 321–6 “grey goo” 309, 316 patents 321–6, 329 problems 316–20, 327–9 profit expectations 321–6 revenue streams 321–6 safeguards 327–9 toxicity issues 316–17, 319, 328–9 warfare 319 Napster 229 Narayanan, Lakshmi 125, 131–2 NASA 311, 315, 333 National Football League (NFL), America 194 natural-language search software, AI 339–40 NBIC 327 NCR Corporation 210 NEC 171, 203, 311 .NET 86 Netscape 8, 54 Network Associates 68 network computers 102 network effect 91 networks see also internet complexity problems 85–7 concepts 6–7, 13–16, 24–7, 42–8, 85–7, 338 costs 14–15 digital homes ix, 94–7, 147, 200, 202–32 open standards 24–7, 31, 43, 85–7, 115, 152 security issues 42–8, 49–65, 66–9 wireless technology ix, 11, 34–5, 39, 66–7, 93, 95–7, 109–10, 147, 150–3, 167, 168–9, 171–3, 203, 209–13, 334 neural networks 338 352 new inventions see innovations New York Power Authority 287–8 New Zealand 168, 301 Newcomer, Eric 26 Newell, Alan 336 news media, camera phones 182 Nexia Biotechnologies 263, 269 NFL see National Football League nickel-cadmium batteries 280 nickel-metal-hydride batteries 280 Nilekani, Nandan 131 Nimda virus 45, 50, 55 Nintendo 191–3 Nokia 120, 130, 150, 152–3, 154–61, 164–6, 170–4, 
176, 208, 211, 217, 280 Nordan, Matthew 322, 325 Nordhaus, William 136–7 Norman, Donald 78, 82–3, 101–2 Novartis 240 Novell 9, 69 Novozymes 258 NTF messages 87 nucleotides 236, 241–8 Nuovo, Frank 173, 176 NVIDIA 202 O Oblix 68–9 obsolescence issues, built-in obsolescence 8–9, 29 ODMs see original design manufacturers OFDM see orthogonal frequency-division multiplexing on-demand computing 22, 88 see also services... O’Neil, David 73 O’Neil, John 28, 30 OneSaf 197 online banks 37 online shopping viii, 37 open standards 7, 10, 22–7, 31, 38, 43, 85–7, 115, 118–19, 152 operating systems 9, 10, 23–5, 31, 38, 85, 101, 109 operators, mobile phones 157–61, 162–9 Opsware 8, 15 optical-character recognition 121 Oracle 5, 20–2, 33, 38, 39–40, 46, 56, 62, 86, 243 Orange 157–8 organic IT 13–16, 88 original design manufacturers (ODMs), mobile phones 156–7 O’Roarke, Brian 192 O’Roarke, John 96 Orr, Scott 187 orthogonal frequency-division multiplexing (OFDM) 212–13, 215–17 INDEX Otellini, Paul 11, 95 outshored developments, software 38, 115, 138–9 outsourcing viii, 9, 19–20, 22, 38, 68–9, 71, 72, 88–92, 112–46, 158–60 see also globalisation barriers 121–2, 143 concepts 112–46 costs 112–24, 131–5, 140–3 cultural issues 122, 142 Europe 140–6 historical background 119–20, 125–6, 133 India 38, 109, 112–15, 119–22, 125–35, 137–8, 140–6 legal agreements 121–4 mobile phones 155–6, 158–60 opportunities 144–6 protectionists 140–6 reasons 123–4, 143 services 113–30 social outsourcing 143 “overshoot” stage, industries 9, 10–11, 109 overview vii–x, 6–7 Ovi, Alessandro 275–6 Oxford GlycoSciences 243 P Pacific Cycle 140 Page, Larry 9 Pait, Rob 207 Palladium 74, 76 Palm Pilot 150 Palmisano, Samuel 22 Paltrow, Gwyneth 173 Panasonic 156 Papadopoulos, Greg 14, 78–9, 83–4, 91 Papadopoulos, Stelios 237 Parker, Andrew 143 Parks Associates 96, 203 Parr, Doug 319 particulate filters 296–7 passwords 53, 58–61, 67, 96–7 patents, nanotechnology 321–6, 329 Patriot Act, America 35 PCs 9–16, 78–81, 82–110, 
151, 171–3, 202–18 see also digital homes; hardware commoditisation issues 9–16, 132–5, 203 complexity issues 78–81, 82–110 screen sizes 100–1 UWB 214–18 Wi-Fi 209–18 PDAs see personal digital assistants Peck, Art 203 PentaSafe Security 60 Pentium chips 199–200 PeopleSoft 39, 86, 119, 126, 132 Perez, Carlota 5–6, 134 performance issues see also processing power; returns cars 291–8 Cell chips 198–200 cost links 29–30 Perlegen 244 personal digital assistants (PDAs) 151, 277, 279 see also handheld computers personal video recorders (PVRs) 203, 205–6 perverse incentives, security issues 61–2 Pescatore, John 55 Pfizer 69, 240, 247, 312, 315 pharmaceutical companies 239–40, 241–50, 312 PHAs 260 Philippines 130 Philips 120, 217 “phishing” 76, 89 phonograph 82, 84 photo-voltaic cells 280 photos ix, 78, 95, 101, 179–83 Physiome 248 Picardi, Tony 79 Pick, Adam 156 Pink Floyd 225 Piper, H. 292 Pittsburgh convention centre 304 Pivotal 187 plasma screens 230–2 plastics 238–9, 259–64 PlayStation 191–2, 199–200, 206–7 plug-and-play devices 78 plug-in hybrid cars 295–6 Poland 120 police involvement, security breaches 72 polio 265 politics 32–5 see also governments Pollard, John 157 pollution 275, 296–7, 299–304, 319 Pop Idol (TV show) 225 Pope, Alexander 267 Porsche 292 “post-technology” period, IT industry vii, 5–7 Powell, Michael 98, 206 power grids 233, 285–90 PowerPoint presentations 4–5, 107 Predictive Networks 337 Presley, Elvis 225 prices, downward trends viii, 4–7 PricewaterhouseCoopers 38 printers 78, 96 privacy issues 27, 34, 42–8, 179–83 see also security... 
mobile phones 179–83 processing power see also computer chips 353 THE FUTURE OF TECHNOLOGY exponential growth 4–7, 8–14 Proctor, Donald 106 Prodi, Romano 274–5 profits, future prospects 7, 17–18, 37–40 proprietary technology 24, 26, 80, 86 protectionists, outsourcing 140–6 proteins, biotechnology 241–64 protocols, complexity issues 86 Proxim 210 Prozac 315 PSA Peugeot Citroën 293, 296–7 PSP, Sony 191–3 public accounts 44 Pullin, Graham 177–8 PVRs see personal video recorders Q Qualcomm 164 quantum dots 312, 317, 322, 325 R radiation fears, mobile phones 176 radio 34–5, 36, 39, 94–5, 108, 155–61, 164, 209–18, 223 see also wireless... chips 155–61, 164 “garbage bands” 209–10, 215 music industry 223 spectrum 34–5, 94–5, 209–18 UWB 96–7, 214–19 Radjou, Navi 333–4 railway age vii, 5, 7, 23, 36, 39, 134 Raleigh, Greg 211 RAND 195 rationalisation exercises 31 RCA 108–9, 206, 208, 220, 315 real-world skills, gaming comparisons 194–7 RealNetworks 203 rechargeable batteries 280–4 Recourse Technologies 62–3 Reed, Philip 177 regulations 35, 44, 209–10, 326–9 see also legal issues relational databases 101–2 reliability needs viii, 42–8 religion 19 renewable energy 275–6, 286, 289, 300, 310, 315 ReplayTV 205 Research in Motion (RIM) 152–3 resistance problems, employees 31 return on investment (ROI) 30–1 returns 20, 29–31, 329 see also performance issues risk 20, 30, 329 revenue streams biotechnology 237–8, 241–2 354 gaming 189–90, 191 GM 251–2 mobile phones 151, 154–5, 157, 162–3, 165–6, 174 nanotechnology 321–6 revolutionary ideas vii–viii, 5–7, 13–14, 36–40, 80–4, 107–10, 116, 134, 151–3, 198–200, 236–40, 326–9 RFID radio tags 39, 94–5 Rhapsody 203 Ricardo 296–7 Riley, James, Lieutenant-Colonel 195–7 RIM see Research in Motion ringtones 165–6 RISC chips 200 risk assessments 70–4, 76 attitudes 18 handling methods 71 insurance policies 71–3 management 70–4 mitigation 71–3 outsourced risk 71, 72, 88–92 returns 20, 30, 329 security issues 42–8, 49–69, 70–4 RNA molecules 241–2, 
249–50, 265 Robinson, Shane 15–16 robotics x, 233, 316, 332–5 Roco, Mihail 309 Rodgers, T.J. 32 Rofheart, Martin 216–17 Rogers, Richard 300 ROI see return on investment Rolls, Steve 121 Romm, Joseph 298 Roomba 332, 334–5 “root kit” software 51 Rose, John 226 Roslin Institute 256 Roy, Raman 125–8 Russia 115, 130, 140, 142, 145, 319 Ryan, John 312 S S700 mobile phone 171 Saffo, Paul 83–4, 103, 182 Salesforce.com 19, 20, 84, 91–2, 109 Samsung 158–60, 181, 208, 217, 231, 277 Santa Fe Institute 39 SAP 22, 38, 86, 119, 126, 132 satellite television 205 Saudi Arabia 180 scandals 28 scanning tunnelling microscope (STM) 306 SCC see Sustainable Computing Consortium Schadler, Ted 95, 97 Schainker, Robert 285, 289 INDEX Scherf, Kurt 96–7 Schmelzer, Robert 91 Schmidt, Eric 9, 35, 36–8 Schmidt, Nathan 66 Schneider National 29–31 Schneier, Bruce 43, 58, 61–2, 65, 70, 73–4 schools, surveillance technology 181 Schwartz, John 46 Schwinn 140, 143 Scott, Tony 43, 68–9 screen sizes 100–1 screws 23–4 Seagate Technology 207 seamless computing 96–7 Sears, Roebuck & Co 36 Securities and Exchange Commission 321 security issues viii, 25–7, 32–5, 42–8, 49–74, 86–7 see also privacy... 
airport approach 68–9 anti-virus software 50–1, 60, 67–8 biometric systems 60, 64–5, 71, 74 breaches 43–4, 46, 49–52, 62, 72–3 civil liberties 74 concepts 42–74, 86–7 costs 45–6, 50–1, 62, 70–4 employees 58–63, 69 encryption 53–4 firewalls 51–3, 58, 60, 62, 66–8, 71, 86–7 hackers 4, 43, 47, 49, 51–3, 58–63 handheld computers 67–8 honeypot decoys 62–3 human factors 57–63, 69 identity management 69 IDSs 51, 53–4, 62, 87 impact assessments 70–1, 76 insider attacks 62–3 insurance policies 71–3 internet 35, 42–8, 49–57, 61–2, 66, 66–7, 71, 73–6, 179–83 job vacancies 46 joint ventures 67 major threats 35, 42, 43, 47, 49–63, 66–9 management approaches 60–3, 69 Microsoft 54–6, 72, 74, 76 misconceptions 46–8 networks 42–8, 49–65, 66–9 passwords 53, 58–61, 67, 96–7 patches 56–7, 76 perverse incentives 61–2 police involvement 72 risk assessments 70–4, 76 standards 71–3 terrorism 35, 42, 43, 50, 65, 74, 75–6, 265–6 tools 49–63, 86–7 viruses 45, 47, 49–56, 59–60, 67–8, 74, 86, 89 Wi-Fi 66–7, 93 sedimentation factors 8–9, 84 segmentation issues, mobile phones 167–9 self-configuration concepts 88–9 Sellers, William 23 Seminis 254 Sendo 160 Senegal 182 September 11th 2001 terrorist attacks 35, 42, 43, 50, 65, 75 servers 9–16, 37–8, 62–3, 85–7, 132–3, 203 services industry 14, 17–22, 25–7, 31, 36–40, 80, 88–92, 109, 113–35, 203 see also web services outsourcing 113–46 session initiation protocol (SIP) 104–6 sewing machines 82, 84 SG Cowen 237 shapes, mobile phones 170–6 Shapiro, Carl 24 Sharp 156, 231, 326 shelfware phenomenon 20 Shelley, Mary 267, 269 shipping costs 121 sick building syndrome 302 Siebel 86 Siemens 120, 130, 142, 156, 159, 170, 172, 174 SightSpeed 84, 98, 103 SilentRunner 62 Silicon Valley 9, 32–40, 45–6, 54, 69, 79, 96, 98, 101, 103, 152, 313–14, 321 silk 263, 269 Simon, Herbert 336 simplicity needs 78–81, 84, 87, 88–92, 98–110 SIP see session initiation protocol Sircam virus 45, 49 Sirkin, Hal 120, 140 “six sigma” methods 128 SK 169 Skidmore, Owings & Merrill 302 
Sky 205 Skype 103–4, 110 Sloan School of Management, MIT 30 Slovakia 120 small screens 100 Smalley, Richard 311 smallpox 265–6 smart power grids 233, 285–90 smartcards 64, 69 smartphones 150–3, 157–61 see also mobile phones SMES devices 289 Smith Barney 37 Smith, George 307–8 Smith, Lamar 75 Smith, Vernon 17 SNP 243–4 SOAP 25–7 355 THE FUTURE OF TECHNOLOGY social issues mobile phones 177–8, 182–3 music players 220–1 social outsourcing 143 software see also information technology ASPs 19–20, 91–2, 109 bugs 20–1, 54–6 Cell chips 198–200 commoditisation issues 10–16, 25, 132–5, 159, 203 complexity issues 14–15, 78–81, 82–110, 117–22 firewalls 52–3, 58, 86–7 hackers 51–3, 58–63 Java programming language 21–2, 25, 86 management software 13–16, 21–2, 88, 117–18 mobile phones 158–9 natural-language search software 339–40 operating systems 9, 10, 23–5, 31, 38, 85, 101, 109 outsourcing 38, 115, 138–9 patches 56–7, 76 premature releases 20–1 shelfware phenomenon 20 viruses 45, 47, 49–56, 59–60, 67, 74, 89 solar power 275–6, 286, 289, 301–2, 310, 315, 325 Solectron 112–13, 119 solid-state storage media 204, 207, 219 SOMO... 
project, mobile phones 177–8 Sony 95, 108, 156, 191–3, 198–200, 203, 206–7, 217, 228, 231, 282–4, 332, 334, 338 Sony Ericsson 156, 158, 159–60, 171 Sony/BMG 222–3, 227, 229 Sood, Rahul 38 Sorrent 187 South Africa 309, 319, 334 South Korea 156, 158, 163–5, 167–9, 170–1, 181, 319 soyabean crops 252–4 spam 76, 89, 118 Spar, Debora 32–3 speculation vii speech recognition 102, 121, 336 SPH-V5400 mobile phone 208 Spider-Man 189–90 Spinks, David 60–1, 63 Spitzer, Eliot 223 Sprint 167–8, 180–1 SQL 53 @Stake 54 Standage, Ella 316 standards green buildings 300–4 open standards 7, 10, 22–7, 31, 38, 43, 85–7, 115, 118–19, 152 356 security issues 71–3 W-CDMA standard 163–4, 168 web services 90–1 Wi-Fi 210–13 Stanford University 82, 137 Star Wars (movie) 186 steam power ix, 5, 134 steel industry 134 steering committees 31 stem cells 268–9 Steven Winter Associates 302 Stewart, Martha 249 STM see scanning tunnelling microscope stop-start hybrid cars 293–4 storage problems, electricity 275–6, 289–90 StorageTek 85 strategy 30 stress-resistance, biotechnology 254 Studio Daniel Libeskind 302 Sturiale, Nick 45 Sun Microsystems 9, 13–15, 21–2, 25, 27, 37–8, 43, 56, 58, 78–9, 83, 85, 87, 91, 102 supercomputers 199–200 Superdome machines 21 supply chains 8, 37–40, 155 surveillance technology 35, 74, 179–83, 309 Sussex University 5, 220, 310 Sustainable Computing Consortium (SCC) 27 Sweden 109 Swiss Army-knife design, mobile phones 171–2 Swiss Re Tower, 30 St Mary Axe 299, 301–2, 304 swivel design, mobile phones 171 Symantec 39, 46, 50, 62–3, 67 Symbian 158 Symbol 210 synthetic materials 258–64, 317 systems analysts 137 T T-Mobile 167–8 Taiwan 156–7, 160 Talwar, Vikram 144 Taylor, Andy 226 Taylor, Carson 287 TCP/IP 25 TCS 132–5, 145–6 Teague, Clayton 314 TechNet 33 techniques, technology 17–18 techno-jewellery design, mobile phones 172–4 technology see also individual technologies concepts vii–x, 4–7, 17–18, 23–7, 32–3, 82–4, 134, 326–9 cultural issues 93–4, 142 INDEX geekiness problems 
83–4 government links 7, 18, 27, 31–5, 43–8, 123–4, 179–83, 209–10 Luddites 327 surveillance technology 35, 74, 179–83, 309 Tehrani, Rich 105 telecommunications viii, 23, 26, 103–6, 134, 164–5 telegraph 32–3, 108 telephone systems 84, 103–6, 109–10, 212–13, 214 Telia 109 terrorism 35, 42, 43, 50, 65, 74, 75–6, 265–6 Tesco 168 Tetris 12 Texas Energy Centre 287 Texas Instruments 125–6, 217 text-messaging facilities 165, 167 Thelands, Mike 164 therapeutic antibodies 249–50, 256–7 Thiercy, Max 339–40 thin clients 102 third-generation mobile phone networks (3G) 151, 162–9, 212 Thomas, Jim 318 Thomson, Ken 59 Thornley, Tony 164 3G networks see third-generation mobile phone networks TIA see Total Information Awareness TiVo 203, 205–6 Tomb Raider (game/movie) 187–8 Toshiba 156, 198–200, 203 Total Information Awareness (TIA) 35 toxicity issues, nanotechnology 316–17, 319, 328–9 Toyota 291–5, 297, 300–1, 334 toys see also gaming robotics 334 transatlantic cable 36, 39 transistors 4–7, 8–12, 85–7, 109 see also computer chips Transmeta 313 Treat, Brad 84, 98 Tredennick, Nick 10–11 Treo 150, 153 “Trojan horse” software 51–2 True Crime (game) 187 TruSecure 52, 60, 63 TTPCom 155–6 Tuch, Bruce 210 TVs see also video recorders flat-panel displays ix, 94, 147, 202–3, 230–2, 311 hard disks 204–8 screens 202–3, 230–2 set-top boxes 203, 205–6 UWB 214–18 Wi-Fi 212–18 U UBS Warburg 31, 45, 80–1, 89, 170, 174 UDDI 25–7 ultrawideband (UWB) 96–7, 214–19 UMTS see W-CDMA standard “undershoot” stage, industries 9, 109 UNECE 332–4 Ungerman, Jerry 52 Unimate 332–3 United Airlines 27 Universal Music 222–3, 226–7 Unix 9, 25, 85, 108 USB ports 78 usernames 59 USGBC 300–2 utility companies, cyber-terrorism threats 75–6 utility factors 7, 16, 17, 19–22, 42–8 UWB see ultrawideband V V500 mobile phone 157 vaccines 265–6 Vadasz, Les 33 Vail, Tim 290 value added 5–7, 37–40, 133, 138–9 value transistors 11 van Nee, Richard 211 Varian, Hal 24 VC see venture capital Veeco Instruments 324 vendors complexity 
issues 84–110 consumer needs 94–7 Venter, Craig 262–3, 271 venture capital (VC) 12, 31, 45, 79, 92, 107, 126–7, 238, 308, 321–6 Verdia 254–5, 261 Veritas 39, 85 Vertex 247 vertical integration, mobile phones 156–61 Vertu brand 173–4 Viacom 224 video phone calls 84, 103–6, 164–5, 167–8 video recorders see also TVs DVRs 205–6 handheld video players 206 hard disks 204–8 PVRs 203, 205–6 Wi-Fi 212–13 video searches, Google 11 357 THE FUTURE OF TECHNOLOGY Video Voyeurism Prevention Act, America 180 video-game consoles see gaming Virgin 95, 160, 167–8 Virgin Mobile 160, 167–8 virtual private networks (VPNs) 54, 68, 86–7 virtual tissue, biotechnology 248 virtualisation concepts 15–16, 88–92 viruses 45, 47, 49–56, 59–60, 67–8, 74, 86, 89 anti-virus software 50–1, 60, 67–8 concepts 49–56, 59–60, 74 costs 50–1 double-clicking dangers 59–60 Vista Research 46, 62, 67 Vodafone 164–5 voice conversations internet 103–6 mobile phones 165–9, 171 voice mail 104–6 voice-over-internet protocol (VOIP) 103–6, 167 Vonage 104, 110 VPNs see virtual private networks W W3C see World Wide Web Consortium W-CDMA standard 163–4, 168 Waksal, Sam 249 Wal-Mart 95, 114–15, 131–2, 140, 224, 228 Walkman 192 warfare AI 338 biotechnology 265–6 gaming comparisons 195–7, 339 nanotechnology 319 Warner Music 222–3, 226–7 Watson, James 236, 247, 271 web services 21–2, 25–7, 31, 80, 88–92, 109, 203 see also internet; services... 
complexity issues 88–92, 109 standards 90–1 Webster, Mark 211 WECA see Wireless Ethernet Compatibility Alliance Weill, Peter 30 Welland, Mark 318 Western Union 33, 108 Westinghouse Electric 332 wheat 253 white page 99–100 Wi-Fi 34–5, 66–7, 93, 95–7, 153, 203, 209–18 concepts 209–18 forecasts 209, 212–13 historical background 209–13 hotspots 211–12 mobile phones 212 standards 210–13 threats 212–13 UWB 214–18 Wilkerson, John 237 Williams, Robbie 222, 226 Wilsdon, James 318 WiMax 212–13 WiMedia 213 Wimmer, Eckard 265 wind power 275–6, 286, 289–90, 302 Windows 15, 24–5, 55–6, 96, 101, 108, 152, 203 Windows Media Center 203 WinFS 101 Wipro 112, 115, 120–1, 125–9, 131–5, 138, 145–6 Wireless Ethernet Compatibility Alliance (WECA) 211 wireless technology ix, 11, 34–5, 39, 66–7, 93, 95–7, 109–10, 147, 150–3, 167, 168–9, 171–3, 203, 209–13, 334 see also Wi-Fi Bluetooth wireless links 171–2, 173, 214–15, 218 concepts 209–13, 334 historical background 209–13 Wladawsky-Berger, Irving vii, 5, 19, 22, 25, 38–9 Wolfe, Josh 323 Wong, Leonard 195 Wood, Ben 156–7, 160, 174 Woodcock, Steven 338–9 Word 84, 107 work-life balance 80–1, 94 see also employees World Wide Web Consortium (W3C) 25 worm viruses 49–50, 59, 86, 89 Wright, Myles 118 “ws splat” 90–1 WSDL 25–7 X x-ray crystallography 247–8 Xbox 189, 206–7 Xelibri mobile phones 170, 172, 174 Xerox 108–9 XML see extensible markup language XtremeSpectrum 216 Y Y2K crisis 76, 126, 128 Yagan, Sam 229 Yanagi, Soetsu 84 Yurek, Greg 288 Z ZapThink 91


pages: 144 words: 55,142

Interlibrary Loan Practices Handbook by Cherie L. Weible, Karen L. Janke

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

Firefox, information retrieval, Internet Archive, late fees, optical character recognition, pull request, transaction costs, Works Progress Administration

In June 2002, Texas A&M significantly expanded its services to digitize or reformat its print collections on demand by implementing free electronic document delivery of articles for a campus of 48,000 students. Increasingly, ILL personnel will change ILL services and systems to include more document delivery, digital library production, digitization on demand that supports print-on-demand or reprinting services, and other services such as Optical Character Recognition (OCR). The need for format conversion parallels users’ expectations that the content we deliver take the form of their preferred technology and intended use.

Change factors for tomorrow’s hybrid services and context-sensitive workflow

Today’s workflow must evolve new service models because of several key factors in our environment:

• Increased full-text sources will reduce the need for scanning of print.


pages: 188 words: 9,226

Collaborative Futures by Mike Linksvayer, Michael Mandiberg, Mushon Zer-Aviv

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

4chan, Benjamin Mako Hill, British Empire, citizen journalism, cloud computing, collaborative economy, corporate governance, crowdsourcing, Debian, en.wikipedia.org, Firefox, informal economy, jimmy wales, Kickstarter, late capitalism, loose coupling, Marshall McLuhan, means of production, Naomi Klein, Network effects, optical character recognition, packet switching, postnationalism / post nation state, prediction markets, Richard Stallman, semantic web, Silicon Valley, slashdot, Slavoj Žižek, stealth mode startup, technoutopianism, the medium is the message, The Wisdom of Crowds, web application

The Weakest Link… (1) Numerous technological frameworks gather information during use and feed the results back into the apparatus. The most evident example is Google, whose PageRank algorithm uses a survey of links between sites to classify their relevance to a user’s query. Likewise, ReCaptcha uses a commonplace authentication in a two-part implementation, firstly to exclude automated spam, and then to digitize words from books that were not recognizable by optical character recognition. Contributions are extracted from participants unconscious of the recycling of their activity into the finessing of the value-chain. Web site operators who integrate ReCaptcha, however, know precisely what they're doing, and choose to transform a necessary defense mechanism for their site into a productive channel of contributions to what they regard as a useful task. (2) Aggregation services such as delicious and photographic archives like flickr, ordered by tags and geographic information, leverage users’ self-interests in categorizing their own materials to enhance usability.

The Orbital Perspective: Lessons in Seeing the Big Picture From a Journey of 71 Million Miles by Astronaut Ron Garan, Muhammad Yunus

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

Airbnb, barriers to entry, book scanning, Buckminster Fuller, clean water, corporate social responsibility, crowdsourcing, global village, Google Earth, Indoor air pollution, jimmy wales, optical character recognition, ride hailing / ride sharing, shareholder value, Silicon Valley, Skype, smart transportation, Stephen Hawking, transaction costs, Turing test, Uber for X, web of trust

ReCAPTCHA is an offshoot of this project, stemming from the realization that humans type about two hundred million CAPTCHAs into Internet pages every day—totaling more than 500,000 hours, if typing a single CAPTCHA takes ten seconds. But in typing that CAPTCHA, the human brain is doing something a machine can’t. And that human capability is now being used to help digitize books. Digitizing old books usually involves scanning the books and then converting the images into text using optical character recognition algorithms. Unfortunately, computers can’t recognize all of the words, and generally the older the book is, the harder it is for computers to decipher the words. ReCAPTCHA takes the images computers can’t recognize and uses those words as CAPTCHAs, using humans to identify the words the computer couldn’t. If this unknown word is presented to a number of humans, and all identify it as the same word, the word is now accurately identified.
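
The agreement rule in the excerpt above can be sketched in a few lines of Python. This is a minimal sketch, not reCAPTCHA's actual implementation: the function name `consensus_word`, the `min_votes` threshold, and the case and whitespace normalisation are all assumptions made for illustration. The excerpt states only that a word counts as identified when a number of humans all give the same answer.

```python
from typing import Optional, Sequence


def consensus_word(answers: Sequence[str], min_votes: int = 3) -> Optional[str]:
    """Return the agreed transcription, or None if there is no consensus yet.

    Hypothetical helper: `min_votes` is an assumed threshold, not a figure
    taken from the excerpt.
    """
    # Normalise answers so that case and stray spaces do not block agreement.
    normalised = [a.strip().lower() for a in answers if a.strip()]
    if len(normalised) < min_votes:
        return None  # too few humans have seen the word so far
    # Accept the word only when every answer is identical, as the excerpt describes.
    if len(set(normalised)) == 1:
        return normalised[0]
    return None


print(consensus_word(["Morning", "morning ", "morning"]))   # unanimous
print(consensus_word(["morning", "mourning", "morning"]))   # disagreement
print(consensus_word(["morning"]))                          # too few answers
```

A real system might accept a strong majority instead of requiring unanimity; the strict rule is used here only because it is the one the excerpt states.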


pages: 170 words: 51,205

Information Doesn't Want to Be Free: Laws for the Internet Age by Cory Doctorow, Amanda Palmer, Neil Gaiman

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

Airbnb, barriers to entry, Brewster Kahle, cloud computing, Dean Kamen, Edward Snowden, game design, Internet Archive, John von Neumann, Kickstarter, optical character recognition, Plutocrats, pre–internet, profit maximization, recommendation engine, rent-seeking, Saturday Night Live, Skype, Steve Jobs, Steve Wozniak, Stewart Brand, transfer pricing, Whole Earth Catalog, winner-take-all economy

Start with the e-book, then—all you need to do is download a free screen-capture program, one that is capable of capturing a predetermined region of your screen at the click of a button. Pair it up with your e-book-reading app (Amazon’s Kindle app, say), click the button that takes you to the first page, and then click the button that captures and saves the rectangle of screen where the page is. Do this once for every page in the book—call it one page per second—and you’ll end up with a folder full of pages. Now upload those pages to Google’s free optical character-recognition software (which converts pictures of words back into plain text), download the results, and call it a day. There are analogs to these processes for practically all locked media. You can play locked audio out the headphone jack of one device and into the mic jack of another, recapturing the audio. You can plug the high-definition analog outputs from your media player into the high-definition analog inputs on your computer and recapture video.


pages: 224 words: 12,941

From Gutenberg to Google: electronic representations of literary texts by Peter L. Shillingsburg

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

British Empire, computer age, double helix, HyperCard, hypertext link, interchangeable parts, invention of the telephone, means of production, optical character recognition, pattern recognition, Saturday Night Live, Socratic dialogue

This unsought notion of a dank cellar of electronic texts initiated a train of thoughts – the first being that even this early in the electronic revolution the world is overwhelmed by texts of unknown provenance, with unknown corruptions, representing unidentified or misidentified versions. These texts frequently result from enthusiasm for computers and the Internet in particular. Texts are easily scanned, either as images or by optical character recognition (OCR) software and posted on the World Wide Web; thus, almost anyone can easily become an editor, producer, and publisher. From comments at conferences and advice given on the Internet, I conclude that the big worry is not authenticity, verification, or attribution. It is to avoid posting texts of works still in copyright. Where new scholarly editions of works exist, this warning means that the ersatz editor cum publisher of an electronic text will pick a handy older edition, frequently a cheap reprint, as a source.


pages: 482 words: 106,041

The World Without Us by Alan Weisman

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

British Empire, carbon-based life, conceptual framework, invention of radio, nuclear winter, optical character recognition, out of africa, Ray Kurzweil, the High Line, trade route, uranium enrichment

Via the self-accruing wizardry of computers, an abundance of silicon, and vast opportunities afforded by modular memory and mechanical appendages, human extinction would become merely a jettisoning of the limited and not very durable vessels that our technological minds have finally outgrown. Prominent in the transhumanist (sometimes called posthuman) movement are Oxford philosopher Nick Bostrom; heralded inventor Ray Kurzweil, originator of optical character recognition, flat-bed scanners, and print-to-speech reading machines for the blind; and Trinity College bioethicist James Hughes, author of Citizen Cyborg: Why Democratic Societies Must Respond to the Redesigned Human of the Future. However Faustian, their discussion is compelling in its lure of immortality and preternatural power—and almost touching in its Utopian faith that a machine could be made so perfect that it would transcend entropy.


pages: 368 words: 96,825

Bold: How to Go Big, Create Wealth and Impact the World by Peter H. Diamandis, Steven Kotler

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

3D printing, additive manufacturing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, cloud computing, crowdsourcing, Daniel Kahneman / Amos Tversky, dematerialisation, deskilling, Elon Musk, en.wikipedia.org, Exxon Valdez, fear of failure, Firefox, Galaxy Zoo, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, industrial robot, Internet of things, Jeff Bezos, John Harrison: Longitude, Jono Bacon, Just-in-time delivery, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, loss aversion, Louis Pasteur, Mahatma Gandhi, Mark Zuckerberg, Mars Rover, meta-analysis, microbiome, minimum viable product, move fast and break things, Narrative Science, Netflix Prize, Network effects, Oculus Rift, optical character recognition, packet switching, PageRank, pattern recognition, performance metric, Peter H. Diamandis: Planetary Resources, Peter Thiel, pre–internet, Ray Kurzweil, recommendation engine, Richard Feynman, ride hailing / ride sharing, risk tolerance, rolodex, self-driving car, sentiment analysis, shareholder value, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart grid, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, telepresence, telepresence robot, Turing test, urban renewal, web application, X Prize, Y Combinator

So Ahn started wondering if there was a better way to make use of all this time and energy, a way to turn those ten seconds of waste into actual work. “What if,” says Ahn, “there was some giant task that humans could do that computers could not that can be broken down into ten-second chunks?” This was the birth of reCAPTCHA, a website that serves a dual purpose, both helping to distinguish bots from humans while simultaneously helping to digitize books. Normally, we digitize books by scanning pages into a computer; next, an optical character recognition program runs through this text, attempting to turn images into actual words. Sometimes this works great; other times, not so well. The big problem is with old books, especially ones whose pages have yellowed. On average, for books written more than fifty years ago, computers can make out only about 70 percent of the text. That remaining 30 percent—that’s where reCAPTCHA comes in. When the computer can’t recognize a word, it sends it out as a CAPTCHA—meaning the next time you’re typing in drunken letters into your computer, know that you’re actually helping digitize the world’s libraries.


pages: 294 words: 81,292

Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

3D printing, AI winter, Amazon Web Services, artificial general intelligence, Automated Insights, Bernie Madoff, Bill Joy: nanobots, brain emulation, cellular automata, cloud computing, cognitive bias, computer vision, cuban missile crisis, Daniel Kahneman / Amos Tversky, Danny Hillis, data acquisition, don't be evil, Extropian, finite state, Flash crash, friendly AI, friendly fire, Google Glasses, Google X / Alphabet X, Isaac Newton, Jaron Lanier, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, Loebner Prize, lone genius, mutually assured destruction, natural language processing, Nicholas Carr, optical character recognition, PageRank, pattern recognition, Peter Thiel, prisoner's dilemma, Ray Kurzweil, Rodney Brooks, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Silicon Valley, Singularitarianism, Skype, smart grid, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, Stuxnet, superintelligent machines, technological singularity, The Coming Technological Singularity, traveling salesman, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, zero day

After training, when French text is input, the ANN will refer to the probabilistic rules it derived during its training and output its best translation. In essence, the ANN is recognizing patterns in the data. Today, finding patterns in vast amounts of unstructured data is one of AI’s most lucrative jobs. Besides language translation and data mining, ANNs are at work today in computer game AI, analyzing the stock market, and identifying objects in images. They’re in Optical Character Recognition programs that read the printed word, and in computer chips that steer guided missiles. ANNs put the “smart” in smart bombs. They’ll be critical to most AGI architectures as well. And there’s something important to remember from chapter 7 about these ubiquitous neural nets. Like genetic algorithms, ANNs are “black box” systems. That is, the input, French language in our example, is transparent.


pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists by Gary Marcus, Jeremy Freeman

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

23andMe, Albert Einstein, bioinformatics, bitcoin, brain emulation, cloud computing, complexity theory, computer age, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data acquisition, Drosophila, epigenetics, Google Glasses, iterative process, linked data, mouse model, optical character recognition, pattern recognition, personalized medicine, phenotype, race to the bottom, Richard Feynman, Ronald Reagan, semantic web, speech recognition, stem cell, Steven Pinker, supply-chain management, Turing machine, web application

The computation was stored in the connection matrix, and programming was replaced by learning algorithms such as Paul Werbos’s backpropagation. The PDP approach promised to solve problems that classic AI could not. Although neural networks and machine learning have proven to be very powerful at performing certain kinds of tasks, they have not bridged the gap between biological and artificial intelligence, except in very narrow domains, such as optical character recognition. What is missing? One possibility is that even neural networks are not “biological” enough. For example, in my PhD thesis I explored the possibility that endowing the simple summation nodes of neural networks with greater complexity, such as that provided by the elaborate dendritic trees of real neurons, would qualitatively enhance the power of these networks to compute. But the advantages turn out to be quantitative only; adding that particular sort of biological fidelity scarcely allowed us to span the biological-computational gap as we had hoped.

Remix: Making Art and Commerce Thrive in the Hybrid Economy by Lawrence Lessig

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

Amazon Web Services, Andrew Keen, Benjamin Mako Hill, Berlin Wall, Bernie Sanders, Brewster Kahle, Cass Sunstein, collaborative editing, disintermediation, don't be evil, Erik Brynjolfsson, Internet Archive, invisible hand, Jeff Bezos, jimmy wales, Kevin Kelly, late fees, Netflix Prize, Network effects, new economy, optical character recognition, PageRank, recommendation engine, revision control, Richard Stallman, Ronald Coase, Saturday Night Live, SETI@home, sharing economy, Silicon Valley, Skype, slashdot, Steve Jobs, The Nature of the Firm, thinkpad, transaction costs, VA Linux

Price, or money, doesn’t police access. Voluntary contributions are all the supporters can rely upon to keep the work alive.

• Distributed Proofreaders is a sharing economy. Inspired by Michael Hart’s Project Gutenberg, and launched in 2000 by Charles Franks, the Distributed Proofreaders project was conceived to help proofread for free the books that Hart made available for free. To compensate for the errors of optical character recognition (OCR) technology, the Distributed Proofreaders project takes individual pages from scanned books and presents them to individuals, along with the original text. Volunteers then correct the text through a kind of distributed-computing project. (See the next item for more on distributed computing.) Distributed Proofreaders has contributed to more than ten thousand books on Project Gutenberg.


pages: 484 words: 104,873

Rise of the Robots: Technology and the Threat of a Jobless Future by Martin Ford

3D printing, additive manufacturing, Affordable Care Act / Obamacare, AI winter, algorithmic trading, Amazon Mechanical Turk, artificial general intelligence, autonomous vehicles, banking crisis, Baxter: Rethink Robotics, Bernie Madoff, Bill Joy: nanobots, call centre, Capital in the Twenty-First Century by Thomas Piketty, Chris Urmson, Clayton Christensen, clean water, cloud computing, collateralized debt obligation, computer age, debt deflation, deskilling, diversified portfolio, Erik Brynjolfsson, factory automation, financial innovation, Flash crash, Fractional reserve banking, Freestyle chess, full employment, Goldman Sachs: Vampire Squid, High speed trading, income inequality, indoor plumbing, industrial robot, informal economy, iterative process, Jaron Lanier, job automation, John Maynard Keynes: technological unemployment, John von Neumann, Khan Academy, knowledge worker, labor-force participation, labour mobility, liquidity trap, low skilled workers, low-wage service sector, Lyft, manufacturing employment, McJob, moral hazard, Narrative Science, Network effects, new economy, Nicholas Carr, Norbert Wiener, obamacare, optical character recognition, passive income, performance metric, Peter Thiel, Plutocrats, post scarcity, precision agriculture, price mechanism, Ray Kurzweil, rent control, rent-seeking, reshoring, RFID, Richard Feynman, Rodney Brooks, secular stagnation, self-driving car, Silicon Valley, Silicon Valley startup, single-payer health, software is eating the world, sovereign wealth fund, speech recognition, Spread Networks laid a new fibre optics cable between New York and Chicago, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, Steven Pinker, strong AI, Stuxnet, technological singularity, telepresence, telepresence robot, The Bell Curve by Richard Herrnstein and Charles Murray, The Coming Technological Singularity, Thomas L Friedman, too big to fail, Tyler Cowen: Great Stagnation, union organizing, Vernor Vinge, very high income, Watson beat the top human players on Jeopardy!, women in the workforce

Unlike Vinge, Kurzweil, who has become the Singularity’s primary evangelist, has no qualms about attempting to peer beyond the event horizon and give us a remarkably detailed account of what the future will look like. The first truly intelligent machine, he tells us, will be built by the late 2020s. The Singularity itself will occur some time around 2045. Kurzweil is by all accounts a brilliant inventor and engineer. He has founded a series of successful companies to market his inventions in areas like optical character recognition, computer-generated speech, and music synthesis. He’s been awarded twenty honorary doctorate degrees as well as the National Medal of Technology and was inducted into the US Patent Office’s Hall of Fame. Inc. magazine once referred to him as the “rightful heir” to Thomas Edison. His work on the Singularity, however, is an odd mixture composed of a well-grounded and coherent narrative about technological acceleration, together with ideas that seem so speculative as to border on the absurd—including, for example, a heartfelt desire to resurrect his late father by gathering DNA from the gravesite and then regenerating his body using futuristic nanotechnology.


pages: 372 words: 109,536

The Panama Papers: Breaking the Story of How the Rich and Powerful Hide Their Money by Frederik Obermaier

banking crisis, blood diamonds, credit crunch, crony capitalism, Deng Xiaoping, Edward Snowden, family office, high net worth, income inequality, liquidationism / Banker’s doctrine / the Treasury view, Mikhail Gorbachev, mortgage debt, offshore financial centre, optical character recognition, out of africa, race to the bottom, We are the 99%, WikiLeaks

Nuix’s basic principle is simple: the files you want to search are uploaded into the program as ‘evidence’ and automatically tagged – or as the pros say, indexed. This is relatively easy with Word documents and emails, but harder with PDFs and photo files – and there are already hundreds of thousands of those in our data by this point. The Nuix program must therefore first be able to identify if there is any text in the pictures. This is done by text recognition software called optical character recognition or OCR. Only when every document has undergone OCR is a negative search result truly a negative search result. Only then can you be relatively certain that Angela Merkel is not hiding in the data after the search for ‘Angela Merkel’ has produced zero hits. That’s as long as the name is not in a fax that has been printed out and later scanned or has been written using an old typewriter; if that’s the case, OCR will not produce any hits.
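The point about negative results can be illustrated with a small Python sketch. It is not how Nuix works: the `Document` and `SearchIndex` names are hypothetical, and a document whose `text` is `None` stands in for a scan that has not yet been through OCR. The search reports both its hits and the documents it could not look inside, so zero hits only counts as a true negative when that second list is empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Document:
    name: str
    # None means no extractable text yet (for example, a scan awaiting OCR).
    text: Optional[str] = None


@dataclass
class SearchIndex:
    docs: List[Document] = field(default_factory=list)

    def add(self, doc: Document) -> None:
        self.docs.append(doc)

    def search(self, phrase: str) -> Tuple[List[str], List[str]]:
        """Return (documents containing the phrase, documents with no text to search)."""
        needle = phrase.casefold()
        hits = [
            d.name
            for d in self.docs
            if d.text is not None and needle in d.text.casefold()
        ]
        unsearchable = [d.name for d in self.docs if d.text is None]
        return hits, unsearchable


index = SearchIndex()
index.add(Document("memo.docx", "Minutes of the board meeting"))
index.add(Document("scan_0001.pdf"))  # image only, not yet OCR'd

hits, unsearchable = index.search("Angela Merkel")
# Zero hits is only a trustworthy negative if nothing was skipped.
is_true_negative = not hits and not unsearchable
```

As the excerpt notes, even a fully OCR'd collection leaves some doubt, because recognition can fail on faxes and typewritten pages; the sketch only covers the case where OCR was never run.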


pages: 330 words: 91,805

Peers Inc: How People and Platforms Are Inventing the Collaborative Economy and Reinventing Capitalism by Robin Chase

3D printing, Airbnb, Amazon Web Services, Andy Kessler, banking crisis, barriers to entry, bitcoin, blockchain, Burning Man, business climate, call centre, car-free, cloud computing, collaborative consumption, collaborative economy, collective bargaining, congestion charging, crowdsourcing, cryptocurrency, decarbonisation, don't be evil, Elon Musk, en.wikipedia.org, ethereum blockchain, Ferguson, Missouri, Firefox, frictionless, Gini coefficient, hive mind, income inequality, index fund, informal economy, Internet of things, Jane Jacobs, Jeff Bezos, jimmy wales, job satisfaction, Kickstarter, Lean Startup, Lyft, means of production, megacity, Minecraft, minimum viable product, Network effects, new economy, Oculus Rift, openstreetmap, optical character recognition, pattern recognition, peer-to-peer lending, Richard Stallman, ride hailing / ride sharing, Ronald Coase, Ronald Reagan, Satoshi Nakamoto, Search for Extraterrestrial Intelligence, self-driving car, shareholder value, sharing economy, Silicon Valley, six sigma, Skype, smart cities, smart grid, Snapchat, sovereign wealth fund, Steve Crocker, Steve Jobs, Steven Levy, TaskRabbit, The Death and Life of Great American Cities, The Nature of the Firm, transaction costs, Turing test, Uber and Lyft, Zipcar

And so reCAPTCHA was born in 2007. reCAPTCHA takes the effort of typing the characters in a CAPTCHA and repurposes it to solve an entirely different problem. In order to make old newspapers or books useful online, they have to be scanned and the resulting images turned into machine-readable text to be usefully searchable. Sometimes the scanned or photographed image results in words that can’t be decoded using optical character recognition (OCR). This is a problem. When the CAPTCHAs are constructed using words tagged by OCR programs as unreadable, we smart humans do what computers can’t: We easily decode them! Tests have shown that reCAPTCHA text images are deciphered and transcribed with 99.1 percent accuracy, a rate comparable to the best human professional transcription services. Today, 100 million reCAPTCHAs are seen by computer users every day.

Algorithms Unlocked by Thomas H. Cormen

bioinformatics, knapsack problem, NP-complete, optical character recognition, Silicon Valley, sorting algorithm, traveling salesman

If the road is congested, however, the GPS might give you bad advice if you’re looking for the fastest route. We can still say that the routing algorithm that the GPS runs is correct, however, even if the input to the algorithm is not; for the input given to the routing algorithm, the algorithm produces the fastest route. Now, for some problems, it might be difficult or even impossible to say whether an algorithm produces a correct solution. Take optical character recognition for example. Is this 11 × 6 pixel image a 5 or an S? Some people might call it a 5, whereas others might call it an S, so how could we declare that a computer’s decision is either correct or incorrect? We won’t. In this book, we will focus on computer algorithms that have knowable solutions. Sometimes, however, we can accept that a computer algorithm might produce an incorrect answer, as long as we can control how often it does so.

Multitool Linux: Practical Uses for Open Source Software by Michael Schwarz, Jeremy Anderson, Peter Curtis

business process, Debian, defense in depth, GnuPG, index card, indoor plumbing, optical character recognition, publish or perish, RFC: Request For Comment, Richard Stallman, SETI@home, slashdot, web application, x509 certificate

Considering the ubiquity and importance of image files, Web site designers should take care to understand how digital images work, how they are stored, and how they can be manipulated. Linux provides a number of powerful tools that make it an excellent platform for image processing. And this means more than just making images for a Web site: It can include document archiving and preservation, scene rendering and creation of textures for computer games, artistic endeavors, and optical character recognition, to name a few. In this chapter, we'll discuss some of the popular types of image formats, their strengths and weaknesses, and ways to convert between them. We'll then discuss various types of image retouching, primarily oriented toward Web presentation. And we'll learn how to use various Linux tools along the way.

Types of Image Formats

Raster vs. Vector

One of the most fundamental distinctions between image formats is whether the image information is represented in raster form or vector form (Figure 22-1).


pages: 302 words: 82,233

Beautiful security by Andy Oram, John Viega

Albert Einstein, Amazon Web Services, business intelligence, business process, call centre, cloud computing, corporate governance, credit crunch, crowdsourcing, defense in depth, en.wikipedia.org, fault tolerance, Firefox, loose coupling, market design, Monroe Doctrine, new economy, Nicholas Carr, Nick Leeson, Norbert Wiener, optical character recognition, packet switching, performance metric, pirate software, Search for Extraterrestrial Intelligence, security theater, SETI@home, Silicon Valley, Skype, software as a service, statistical model, Steven Levy, The Wisdom of Crowds, Upton Sinclair, web application, web of trust, x509 certificate, zero day, Zimmermann PGP

Part of the struggle to end the U.S. export controls on crypto involved the publication of PGP source code in its entirety, in printed book form. Printed books were and are exempt from the export controls. This happened first in 1995 with the publication of PGP Source Code and Internals (MIT Press). It happened again later when Pretty Good Privacy, Inc., published the source code of PGP in a more sophisticated set of books with specialized software tools that were optimized for easy optical character recognition (OCR) scanning of C source code. This made it easy to export unlimited quantities of cryptographic source code, rendering the export controls moot and undermining the political will to continue imposing the export controls. Today, there has been nearly an about-face in government attitude about cryptography. National and international laws, regulations, and expectations about privacy, data governance, and corporate governance either imply or require the widespread use of strong cryptography.


pages: 423 words: 126,096

Our Own Devices: How Technology Remakes Humanity by Edward Tenner

Bonfire of the Vanities, card file, Douglas Engelbart, Frederick Winslow Taylor, future of work, indoor plumbing, informal economy, invention of the telephone, invisible hand, Jacquard loom, Joseph-Marie Jacquard, Network effects, optical character recognition, QWERTY keyboard, Stewart Brand, women in the workforce

And PDAs without plug-in keyboards are routinely connected to conventional computers for keyboard-entered data.44 Voice control was as exciting in the late 1990s as handwriting recognition had been ten years earlier, and with similar results: a wave of marketing and financial troubles that has failed to shake underlying optimism about the technology’s future. While voice recognition software can promote its own overuse injuries, it can now work with natural phrasing and infer spelling from context. But it is unlikely to eliminate the keyboard, because it will still make errors (or users will still fail to enunciate properly), and editing copy orally is even slower and more tedious than correcting it with a keyboard. Optical character recognition data also need checking and editing. Typing will probably be further reduced in familiar applications in the future, but it will also be extended to new tasks. A new global keyboard order is emerging. Intensive “production” typing is less necessary because more data arrive already digitized and need only formatting and correction. The heavy typing that remains can, like the production of keyboards themselves, be outsourced to low-wage countries.


pages: 523 words: 148,929

Physics of the Future: How Science Will Shape Human Destiny and Our Daily Lives by the Year 2100 by Michio Kaku

agricultural Revolution, AI winter, Albert Einstein, augmented reality, Bill Joy: nanobots, bioinformatics, blue-collar work, British Empire, Brownian motion, cloud computing, Colonization of Mars, DARPA: Urban Challenge, delayed gratification, double helix, Douglas Hofstadter, en.wikipedia.org, friendly AI, Gödel, Escher, Bach, hydrogen economy, I think there is a world market for maybe five computers, industrial robot, invention of movable type, invention of the telescope, Isaac Newton, John von Neumann, life extension, Louis Pasteur, Mahatma Gandhi, Mars Rover, megacity, Murray Gell-Mann, new economy, oil shale / tar sands, optical character recognition, pattern recognition, planetary scale, postindustrial economy, Ray Kurzweil, refrigerator car, Richard Feynman, Rodney Brooks, Ronald Reagan, Search for Extraterrestrial Intelligence, Silicon Valley, Simon Singh, speech recognition, stem cell, Stephen Hawking, Steve Jobs, telepresence, The Wealth of Nations by Adam Smith, Thomas L Friedman, Thomas Malthus, trade route, Turing machine, uranium enrichment, Vernor Vinge, Wall-E, Walter Mischel, Whole Earth Review, X Prize

With the ability to devour or rearrange whole star systems, there should be some footprint left behind by this rapidly expanding singularity. (His detractors say that he is whipping up a near-religious fervor around the singularity. However, his supporters say that he has an uncanny ability to correctly see into the future, judging by his track record.) Kurzweil cut his teeth on the computer revolution by starting up companies in diverse fields involving pattern recognition, such as speech recognition technology, optical character recognition, and electronic keyboard instruments. In 1999, he wrote a best seller, The Age of Spiritual Machines: When Computers Exceed Human Intelligence, which predicted when robots will surpass us in intelligence. In 2005, he wrote The Singularity Is Near and elaborated on those predictions. The fateful day when computers surpass human intelligence will come in stages. By 2019, he predicts, a $1,000 personal computer will have as much raw power as a human brain.


pages: 402 words: 110,972

Nerds on Wall Street: Math, Machines and Wired Markets by David J. Leinweber

AI winter, algorithmic trading, asset allocation, banking crisis, barriers to entry, Big bang: deregulation of the City of London, butterfly effect, buttonwood tree, buy low sell high, capital asset pricing model, citizen journalism, collateralized debt obligation, corporate governance, Craig Reynolds: boids flock, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Danny Hillis, demand response, disintermediation, distributed generation, diversification, diversified portfolio, Emanuel Derman, en.wikipedia.org, experimental economics, financial innovation, Gordon Gekko, implied volatility, index arbitrage, index fund, information retrieval, Internet Archive, John Nash: game theory, Khan Academy, load shedding, Long Term Capital Management, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, market fragmentation, market microstructure, Mars Rover, moral hazard, mutually assured destruction, natural language processing, Network effects, optical character recognition, paper trading, passive investing, pez dispenser, phenotype, prediction markets, quantitative hedge fund, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Renaissance Technologies, Richard Stallman, risk tolerance, risk-adjusted returns, risk/return, Ronald Reagan, semantic web, Sharpe ratio, short selling, Silicon Valley, Small Order Execution System, smart grid, smart meter, social web, South Sea Bubble, statistical arbitrage, statistical model, Steve Jobs, Steven Levy, Tacoma Narrows Bridge, the scientific method, The Wisdom of Crowds, time value of money, too big to fail, transaction costs, Turing machine, Upton Sinclair, value at risk, Vernor Vinge, yield curve, Yogi Berra

Clouds don’t lie. I hope that after reading this book you’ll have a better sense of how technology shapes markets, and how to be a nimble participant in the future of electronic finance.

Web Site

This book includes many URLs, which would tire the fingers of even the most dedicated nerds. Someday soon you’ll point your handheld’s camera at the book and it will use OCR (optical character recognition) to find (or offer to sell you) the material you’re looking for. Absent that fancy gadget, try the web site NerdsonWallStreet.com. It has links to all of these references, plus color and animated versions of the black & white screen grabs found in the book. The site will be updated often with new and topical items.

Notes

1. A term of respect popularized by Michael Lewis in his 1989 book, Liar’s Poker (W.W.


pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies by Nick Bostrom

agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, anti-communist, artificial general intelligence, autonomous vehicles, barriers to entry, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, demographic transition, Douglas Hofstadter, Drosophila, Elon Musk, en.wikipedia.org, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, Gödel, Escher, Bach, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John von Neumann, knowledge worker, Menlo Park, meta analysis, meta-analysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Norbert Wiener, NP-complete, nuclear winter, optical character recognition, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, strong AI, superintelligent machines, supervolcano, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, transaction costs, Turing machine, Vernor Vinge, Watson beat the top human players on Jeopardy!, World Values Survey

There are robotic pets and cleaning robots, lawn-mowing robots, rescue robots, surgical robots, and over a million industrial robots.64 The world population of robots exceeds 10 million.65 Modern speech recognition, based on statistical techniques such as hidden Markov models, has become sufficiently accurate for practical use (some fragments of this book were drafted with the help of a speech recognition program). Personal digital assistants, such as Apple’s Siri, respond to spoken commands and can answer simple questions and execute commands. Optical character recognition of handwritten and typewritten text is routinely used in applications such as mail sorting and digitization of old documents.66 Machine translation remains imperfect but is good enough for many applications. Early systems used the GOFAI approach of hand-coded grammars that had to be developed by skilled linguists from the ground up for each language. Newer systems use statistical machine learning techniques that automatically build statistical models from observed usage patterns.


pages: 380 words: 118,675

The Everything Store: Jeff Bezos and the Age of Amazon by Brad Stone

3D printing, airport security, AltaVista, Amazon Mechanical Turk, Amazon Web Services, bank run, Bernie Madoff, big-box store, Black Swan, book scanning, Brewster Kahle, call centre, centre right, Clayton Christensen, cloud computing, collapse of Lehman Brothers, crowdsourcing, cuban missile crisis, Danny Hillis, Douglas Hofstadter, Elon Musk, facts on the ground, game design, housing crisis, invention of movable type, inventory management, James Dyson, Jeff Bezos, Kevin Kelly, Kodak vs Instagram, late fees, loose coupling, low skilled workers, Maui Hawaii, Menlo Park, Network effects, new economy, optical character recognition, pets.com, Ponzi scheme, quantitative hedge fund, recommendation engine, Renaissance Technologies, RFID, Rodney Brooks, search inside the book, shareholder value, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, Skype, statistical arbitrage, Steve Ballmer, Steve Jobs, Steven Levy, Stewart Brand, Thomas L Friedman, Tony Hsieh, Whole Earth Catalog, why are manhole covers round?

“Think of two bookstores, one where all the books are shrink-wrapped and one where you can sit as long as you want and read any book you want. Which one do you think will sell more books?” Publishers were concerned that Search Inside the Book might open up the floodgates of online piracy. Most, however, agreed to try it out and gave Amazon physical copies of their titles, which were shipped to a contractor in the Philippines to be scanned. Then Manber’s team ran optical character-recognition software over the book files to convert the scanned images into text that Amazon’s search algorithms could navigate and index. To reduce the chance that customers would read the books for free, Amazon served up only snippets of content—one or two pages before and after the search term, for example, and only to customers who had credit cards on file. It also dropped a small piece of code, called a cookie, in each customer’s computer to ensure he didn’t keep coming back to read additional pages without paying.
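The snippet rule described in this passage can be sketched in a few lines of Python. This is an illustration only, not Amazon's code: the function names, the two-page window, and the `has_card_on_file` flag are assumptions drawn from the wording of the excerpt.

```python
def snippet_pages(hit_page: int, total_pages: int, window: int = 2) -> range:
    """Pages a reader may see around a search hit, clamped to the book's bounds.

    `window` is the number of pages shown before and after the hit
    (the passage says "one or two"; 2 is an assumed default).
    """
    first = max(1, hit_page - window)
    last = min(total_pages, hit_page + window)
    return range(first, last + 1)


def can_view(page: int, hit_page: int, total_pages: int,
             has_card_on_file: bool) -> bool:
    """Allow a page only for customers with a card on file, and only near the hit."""
    if not has_card_on_file:
        return False
    return page in snippet_pages(hit_page, total_pages)


# A hit on page 150 of a 300-page book exposes pages 148 through 152.
visible = list(snippet_pages(150, 300))
```

The cookie mentioned in the passage would sit on top of this check, limiting how many such windows one customer can request; that part is not modeled here.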


pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts by Richard Susskind, Daniel Susskind

23andMe, 3D printing, additive manufacturing, AI winter, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, Andrew Keen, Atul Gawande, Automated Insights, autonomous vehicles, Big bang: deregulation of the City of London, big data - Walmart - Pop Tarts, Bill Joy: nanobots, business process, business process outsourcing, Cass Sunstein, Checklist Manifesto, Clapham omnibus, Clayton Christensen, clean water, cloud computing, computer age, computer vision, conceptual framework, corporate governance, crowdsourcing, Daniel Kahneman / Amos Tversky, death of newspapers, disintermediation, Douglas Hofstadter, en.wikipedia.org, Erik Brynjolfsson, Filter Bubble, Frank Levy and Richard Murnane: The New Division of Labor, full employment, future of work, Google Glasses, Google X / Alphabet X, Hacker Ethic, industrial robot, informal economy, information retrieval, interchangeable parts, Internet of things, Isaac Newton, James Hargreaves, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Khan Academy, knowledge economy, lump of labour, Marshall McLuhan, Narrative Science, natural language processing, Network effects, optical character recognition, personalized medicine, pre–internet, Ray Kurzweil, Richard Feynman, Second Machine Age, self-driving car, semantic web, Skype, social web, speech recognition, spinning jenny, strong AI, supply-chain management, telepresence, the market place, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, transaction costs, Turing test, Watson beat the top human players on Jeopardy!, young professional

At Deloitte, in the United Kingdom, for instance, the collective expertise of around 250 of their tax specialists was distilled into a system to help major clients directly prepare and submit their corporate tax returns. This system was being used by more than 70 per cent of FTSE 100 companies when it was sold in 2009 to Thomson Reuters, the global information provider. Also at Deloitte, the task of recovering foreign VAT payments is no longer done by human experts, but by a system, Revatic Smart. This scans clients’ documents using optical character-recognition software, and automatically files the correct forms, with little human input.260 In most cases, these tax platforms computerize tasks that would have once been done manually by a human being. But the firm also employs more than 10,000 people in India to undertake routine tax work. With regard to national tax authorities and their operations, many still rely on taxpayers, from individuals to the largest multinationals, to self-assess.

The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil

additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, augmented reality, autonomous vehicles, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business intelligence, c2.com, call centre, carbon-based life, cellular automata, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, disintermediation, double helix, Douglas Hofstadter, en.wikipedia.org, epigenetics, factory automation, friendly AI, George Gilder, Gödel, Escher, Bach, informal economy, information retrieval, invention of the telephone, invention of the telescope, invention of writing, Isaac Newton, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Mikhail Gorbachev, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Norbert Wiener, oil shale / tar sands, optical character recognition, pattern recognition, phenotype, premature optimization, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Rodney Brooks, Search for Extraterrestrial Intelligence, semantic web, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, transaction costs, Turing machine, Turing test, Vernor Vinge, Y2K, Yogi Berra

Machines that can more precisely carry out their missions have increased value, which explains why they are being built. There are tens of thousands of projects that are advancing the various aspects of the law of accelerating returns in diverse incremental ways. Regardless of near-term business cycles, support for "high tech" in the business community, and in particular for software development, has grown enormously. When I started my optical character recognition (OCR) and speech-synthesis company (Kurzweil Computer Products) in 1974, high-tech venture deals in the United States totaled less than thirty million dollars (in 1974 dollars). Even during the recent high-tech recession (2000–2003), the figure was almost one hundred times greater.79 We would have to repeal capitalism and every vestige of economic competition to stop this progression.


pages: 677 words: 206,548

Future Crimes: Everything Is Connected, Everyone Is Vulnerable and What We Can Do About It by Marc Goodman

23andMe, 3D printing, additive manufacturing, Affordable Care Act / Obamacare, Airbnb, airport security, Albert Einstein, algorithmic trading, artificial general intelligence, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, Bill Joy: nanobots, bitcoin, Black Swan, blockchain, borderless world, Brian Krebs, business process, butterfly effect, call centre, Chelsea Manning, cloud computing, cognitive dissonance, computer vision, connected car, corporate governance, crowdsourcing, cryptocurrency, data acquisition, data is the new oil, Dean Kamen, disintermediation, don't be evil, double helix, Downton Abbey, Edward Snowden, Elon Musk, Erik Brynjolfsson, Filter Bubble, Firefox, Flash crash, future of work, game design, Google Chrome, Google Earth, Google Glasses, Gordon Gekko, high net worth, High speed trading, hive mind, Howard Rheingold, hypertext link, illegal immigration, impulse control, industrial robot, Internet of things, Jaron Lanier, Jeff Bezos, job automation, John Harrison: Longitude, Jony Ive, Julian Assange, Kevin Kelly, Khan Academy, Kickstarter, knowledge worker, Kuwabatake Sanjuro: assassination market, Law of Accelerating Returns, Lean Startup, license plate recognition, litecoin, M-Pesa, Mark Zuckerberg, Marshall McLuhan, Menlo Park, mobile money, more computing power than Apollo, move fast and break things, Nate Silver, national security letter, natural language processing, obamacare, Occupy movement, Oculus Rift, offshore financial centre, optical character recognition, pattern recognition, personalized medicine, Peter H. Diamandis: Planetary Resources, Peter Thiel, pre–internet, RAND corporation, ransomware, Ray Kurzweil, refrigerator car, RFID, ride hailing / ride sharing, Rodney Brooks, Satoshi Nakamoto, Second Machine Age, security theater, self-driving car, shareholder value, Silicon Valley, Silicon Valley startup, Skype, smart cities, smart grid, smart meter, Snapchat, social graph, software as a service, speech recognition, stealth mode startup, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, Stuxnet, supply-chain management, technological singularity, telepresence, telepresence robot, Tesla Model S, The Wisdom of Crowds, Tim Cook: Apple, trade route, uranium enrichment, Wall-E, Watson beat the top human players on Jeopardy!, Wave and Pay, We are Anonymous. We are Legion, web application, WikiLeaks, Y Combinator, zero day

Previously, such high-tech gear would only have resided in a spy agency or with the FBI, but now, given the exponential drop in pricing of these technologies, even a neighborhood mom can spy on her kids or potentially cheating spouse. In the world of big data, we can even leak our physical location without a bugged mobile phone or GPS tracker hidden in our car. A new technology, known as an automatic license plate reader (ALPR), allows both governments and individuals to use video cameras and optical character recognition to record the locations of cars as they pass from one camera point to another, revealing the real-time movement of any vehicle throughout a city or country with great detail. From Minnesota to New Jersey, and from Ankara to Sydney, hundreds of millions of individual license plate records have been stored. As a result, a query can be applied against these massive databases to determine the position of any vehicle over time.


pages: 678 words: 216,204

The Wealth of Networks: How Social Production Transforms Markets and Freedom by Yochai Benkler

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

affirmative action, barriers to entry, bioinformatics, Brownian motion, call centre, Cass Sunstein, centre right, clean water, dark matter, desegregation, East Village, fear of failure, Firefox, game design, George Gilder, hiring and firing, Howard Rheingold, informal economy, invention of radio, Isaac Newton, iterative process, Jean Tirole, jimmy wales, market bubble, market clearing, Marshall McLuhan, New Journalism, optical character recognition, pattern recognition, pre–internet, price discrimination, profit maximization, profit motive, random walk, recommendation engine, regulatory arbitrage, rent-seeking, RFID, Richard Stallman, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, shareholder value, Silicon Valley, Skype, slashdot, social software, software patent, spectrum auction, technoutopianism, The Fortune at the Bottom of the Pyramid, The Nature of the Firm, transaction costs

Project Gutenberg volunteers can select any book that is in the public domain to transform into an e-text. The volunteer submits a copy of the title page of the book to Michael Hart--who founded the project--for copyright research. The volunteer is notified to proceed if the book passes the copyright clearance. The decision on which book to convert to e-text is left up to the volunteer, subject to copyright limitations. Typically, a volunteer converts a book to ASCII format using OCR (optical character recognition) and proofreads it one time in order to screen it for major errors. He or she then passes the ASCII file to a volunteer proofreader. This exchange is orchestrated with very little supervision. The volunteers use a Listserv mailing list and a bulletin board to initiate and supervise the exchange. In addition, books are labeled with a version number indicating how many times they have been proofed.


pages: 348 words: 39,850

Data Scientists at Work by Sebastian Gutierrez

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

Albert Einstein, algorithmic trading, bioinformatics, bitcoin, business intelligence, chief data officer, clean water, cloud computing, computer vision, continuous integration, correlation does not imply causation, crowdsourcing, data is the new oil, DevOps, domain-specific language, follow your passion, full text search, informal economy, information retrieval, Infrastructure as a Service, inventory management, iterative process, linked data, Mark Zuckerberg, microbiome, Moneyball by Michael Lewis explains big data, move fast and break things, natural language processing, Network effects, nuclear winter, optical character recognition, pattern recognition, Paul Graham, personalized medicine, Peter Thiel, pre–internet, quantitative hedge fund, quantitative trading / quantitative finance, recommendation engine, Renaissance Technologies, Richard Feynman, self-driving car, side project, Silicon Valley, Skype, software as a service, speech recognition, statistical model, Steve Jobs, stochastic process, technology bubble, text mining, the scientific method, web application

All of a sudden Viagra has an @ symbol in it, a one numeral instead of an “I”, or similar things like that. The spammers were trying to get around matching tokens, naive Bayes filters, and similar spam detection techniques. This kept escalating and escalating. Then, rather than using words, spammers started using images with words, which meant that the people trying to stop spammers had to start doing OCR [Optical Character Recognition] on images to get out the tokens to put them through their models in order to identify spam. This continued until it got to the point where there needed to be a new approach. It became very clear to us that it’s not good enough to just look at the content spammers are sending. We’re an international company, so the content we are analyzing could be in many, many languages. It could also be all images.
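The obfuscation described above defeats exact token matching, and one common countermeasure is to normalize character substitutions before comparing tokens. The sketch below is a minimal illustration of that idea only. The substitution table, the blocklist, and the function names are assumptions made for the example, not the interviewee's actual system.

```python
# Hypothetical substitution table; real spam uses many more tricks than these.
SUBSTITUTIONS = str.maketrans({"@": "a", "1": "i", "0": "o", "$": "s"})

# Illustrative blocklist of already-normalized tokens.
BLOCKED_TOKENS = {"viagra"}


def normalize(token):
    """Lowercase a token and undo the simple character swaps listed above."""
    return token.lower().translate(SUBSTITUTIONS)


def flagged_tokens(message):
    """Return the original tokens whose normalized form is on the blocklist."""
    tokens = message.split()
    return [t for t in tokens if normalize(t.strip(".,!?")) in BLOCKED_TOKENS]


print(flagged_tokens("Buy V1@gra now!"))
```

Mapping every "1" to "i" also mangles legitimate numbers, so this sketch only shows why plain token matching kept escalating into an arms race. As the excerpt notes, text hidden in images needs OCR before any token-level check like this can run.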


pages: 669 words: 210,153

Tools of Titans: The Tactics, Routines, and Habits of Billionaires, Icons, and World-Class Performers by Timothy Ferriss

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

Airbnb, artificial general intelligence, asset allocation, Atul Gawande, augmented reality, back-to-the-land, Bernie Madoff, Bertrand Russell: In Praise of Idleness, Black Swan, blue-collar work, Buckminster Fuller, business process, Cal Newport, call centre, Checklist Manifesto, cognitive bias, cognitive dissonance, Colonization of Mars, Columbine, correlation does not imply causation, David Brooks, David Graeber, diversification, diversified portfolio, Donald Trump, effective altruism, Elon Musk, fault tolerance, fear of failure, Firefox, follow your passion, future of work, Google X / Alphabet X, Howard Zinn, Hugh Fearnley-Whittingstall, Jeff Bezos, job satisfaction, Johann Wolfgang von Goethe, Kevin Kelly, Kickstarter, Lao Tzu, life extension, Mahatma Gandhi, Mark Zuckerberg, Mason jar, Menlo Park, Mikhail Gorbachev, Nicholas Carr, optical character recognition, PageRank, passive income, pattern recognition, Paul Graham, Peter H. Diamandis: Planetary Resources, Peter Singer: altruism, Peter Thiel, phenotype, post scarcity, premature optimization, QWERTY keyboard, Ralph Waldo Emerson, Ray Kurzweil, recommendation engine, rent-seeking, Richard Feynman, risk tolerance, Ronald Reagan, sharing economy, side project, Silicon Valley, skunkworks, Skype, Snapchat, social graph, software as a service, software is eating the world, stem cell, Stephen Hawking, Steve Jobs, Stewart Brand, superintelligent machines, Tesla Model S, The Wisdom of Crowds, Thomas L Friedman, Wall-E, Washington Consensus, Whole Earth Catalog, Y Combinator

Note-Taking—Distilling the Gems Maria and I have a nearly identical note-taking process for books: “I highlight in the Kindle app in the iPad, and then Amazon has this function where you can, basically, see your Kindle notes and highlights on the desktop of your computer. I copy them from that page and paste them into an Evernote file to have all of my notes on a specific book in one place. I also take a screen grab of a specific iPad Kindle page with my highlighted passage, and then email that screen grab into my Evernote email because Evernote has, as you know, optical character recognition. So, when I search within it, it’s also going to search the text in that image. I don’t have to wait until I finish the book to explore all my notes. . . . I love Evernote. I’ve been using it for many years, and I could probably not get through my day without it.” If Maria is reading a paper book and adding her notes in the margins (what she calls “marginalia”), she’ll sometimes add “BL” to indicate “beautiful language.”


pages: 889 words: 433,897

The Best of 2600: A Hacker Odyssey by Emmanuel Goldstein

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

affirmative action, Apple II, call centre, don't be evil, Firefox, game design, Hacker Ethic, hiring and firing, information retrieval, late fees, license plate recognition, optical character recognition, packet switching, pirate software, place-making, profit motive, QWERTY keyboard, RFID, Robert Hanssen: Double agent, rolodex, Ronald Reagan, Silicon Valley, Skype, spectrum auction, statistical model, Steve Jobs, Steve Wozniak, Steven Levy, Telecommunications Act of 1996, telemarketer, Y2K

With a line drawn for height detection and a side-mounted camera, over-height vehicles, usually trucks, can be detected and someone alerted to stop them. If there are different speed limits for trucks and cars, this is how they can be differentiated. The resultant freeze frame will be automatically processed to produce a printed picture of your vehicle from the rear, showing your license plate, and then imprint the image with your vehicle’s speed, the date, and time. AT&T is above 95 percent accuracy in doing optical character recognition on your license plate and automatically entering the plate number into the computer system. Imagine how easy those European license plates must be for OCR. Now if we could just standardize the print and colors used on U.S. plates.… Not uncommonly, a second camera will simultaneously take a photo of the driver. Look around when you see one camera and see if you can find the second one.