book scanning

21 results

pages: 117 words: 30,654

Kindle Formatting: The Complete Guide to Formatting Books for the Amazon Kindle by Joshua Tallent


book scanning, job automation, optical character recognition

The easiest way to get the book back into a digital format is to scan it and run it through an Optical Character Recognition (OCR) software program. There are a variety of options available to the do-it-yourself person or to the pay-someone-else person. The main benefit to doing the process yourself is saving money, but you may find that having some help in the process is easier and faster. The first step in the OCR process is to have your book scanned. This is a process where each page of your book is turned into an image that can be loaded into the OCR program. There are a variety of places that will do scanning for you, or you can tackle the process yourself. Some copy and print stores (like FedEx/Kinko’s) offer scanning services, but you will often find the best prices at companies that specialize in scanning documents onto microfiche.

Some of these companies even have machines that can automate the scanning process by automatically turning the pages of the book. Be aware that the easiest way to scan a book on regular consumer scanners is to cut off the binding, which will effectively ruin the book. If your book is rare and you want to keep it intact, you should make sure the scanning company knows to handle it gently and to not cut off the binding. There is one consumer scanner called the OpticBook 3600 that is specifically designed for book scanning. That device is built in a way that allows a good scan of the pages without cutting the binding off or breaking the binding by forcing the book into unnatural positions on a flat surface. If you decide to scan the book yourself, you will need a flatbed or feed scanner. These devices are available at most electronics and computer stores and at various retailers online. They can be inexpensive or very expensive, depending on the options included and the quality of the scanner, and you may find that the available options are overwhelming.


pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier


23andMe, Affordable Care Act / Obamacare, airport security, AltaVista, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, IBM and the Holocaust, index card, informal economy, Internet of things, invention of the printing press, Jeff Bezos, Louis Pasteur, Mark Zuckerberg, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, performance metric, Peter Thiel, Post-materialism, random walk, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, speech recognition, Steve Jobs, Steven Levy, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Watson beat the top human players on Jeopardy!

Instead of nicely translated pages of text in two languages, Google availed itself of a larger but also much messier dataset: the entire global Internet and more. Its system sucked in every translation it could find, in order to train the computer. In went corporate websites in multiple languages, identical translations of official documents, and reports from intergovernmental bodies like the United Nations and the European Union. Even translations of books from Google’s book-scanning project were included. Where Candide had used three million carefully translated sentences, Google’s system harnessed billions of pages of translations of widely varying quality, according to the head of Google Translate, Franz Josef Och, one of the foremost authorities in the field. Its trillion-word corpus amounted to 95 billion English sentences, albeit of dubious quality. Despite the messiness of the input, Google’s service works the best.

Nevertheless culturomics has given us an entirely new lens with which to understand ourselves. Transforming words into data unleashes numerous uses. Yes, the data can be used by humans for reading and by machines for analysis. But as the paragon of a big-data company, Google knows that information has multiple potential purposes that can justify its collection and datafication. So Google cleverly used the datafied text from its book-scanning project to improve its machine-translation service. As explained in Chapter Three, the system would take books that are translations and analyze what words and phrases the translators used as alternatives from one language to another. Knowing this, it could then treat translation as a giant math problem, with the computer figuring out probabilities to determine what word best substitutes for another between languages.
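
The idea of treating translation as a probability problem can be illustrated with a toy sketch. The Python below is not Google's system. It assumes a tiny hand-made set of aligned sentence pairs (the `pairs` list and the function name `translation_probabilities` are hypothetical), counts which target-language words appear alongside each source word, and normalizes those counts into probabilities.

```python
from collections import Counter, defaultdict

def translation_probabilities(sentence_pairs):
    """Estimate P(target word | source word) from simple co-occurrence counts."""
    cooccurrence = defaultdict(Counter)
    for source_sentence, target_sentence in sentence_pairs:
        target_words = target_sentence.lower().split()
        for source_word in source_sentence.lower().split():
            # Every target word in the aligned sentence is a candidate translation.
            cooccurrence[source_word].update(target_words)

    probabilities = {}
    for source_word, counts in cooccurrence.items():
        total = sum(counts.values())
        probabilities[source_word] = {w: c / total for w, c in counts.items()}
    return probabilities

# Hypothetical aligned English/French sentence pairs, for illustration only.
pairs = [
    ("the house", "la maison"),
    ("the book", "le livre"),
    ("a house", "une maison"),
]

probs = translation_probabilities(pairs)
best = max(probs["house"], key=probs["house"].get)
print(best, probs["house"][best])
```

With only three pairs the estimate is crude, but the candidate that co-occurs most often with "house" already stands out. Real systems work from vastly more text and far more careful alignment models.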

[>] Quantifying the world—Much of the authors’ thinking on the history of datafication has been inspired by Crosby, The Measure of Reality. [>] Europeans were never exposed to abacuses—Ibid., 112. Calculating faster using Arabic numerals—Alexander Murray, Reason and Society in the Middle Ages (Oxford University Press, 1978), p. 166. [>] Total number of books published and Harvard study on Google book-scanning project—Jean-Baptiste Michel et al., “Quantitative Analysis of Culture Using Millions of Digitized Books,” Science 331 (January 14, 2011), pp. 176–182. For a video lecture on the paper, see Erez Lieberman Aiden and Jean-Baptiste Michel, “What We Learned from 5 Million Books,” TEDx, Cambridge, MA, 2011. [>] On wireless modules in cars and insurance—See Cukier, “Data, Data Everywhere.”


pages: 666 words: 181,495

In the Plex: How Google Thinks, Works, and Shapes Our Lives by Steven Levy


23andMe, AltaVista, Anne Wojcicki, Apple's 1984 Super Bowl advert, autonomous vehicles, book scanning, Brewster Kahle, Burning Man, business process, clean water, cloud computing, crowdsourcing, Dean Kamen, discounted cash flows, don't be evil, Douglas Engelbart, El Camino Real, fault tolerance, Firefox, Gerard Salton, Google bus, Google Chrome, Google Earth, Googley, HyperCard, hypertext link, IBM and the Holocaust, informal economy, information retrieval, Internet Archive, Jeff Bezos, Kevin Kelly, Mark Zuckerberg, Menlo Park, optical character recognition, PageRank, Paul Buchheit, Potemkin village, prediction markets, recommendation engine, risk tolerance, Sand Hill Road, Saturday Night Live, search inside the book, second-price auction, Silicon Valley, skunkworks, Skype, slashdot, social graph, social software, social web, spectrum auction, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, Ted Nelson, telemarketer, trade route, traveling salesman, Vannevar Bush, web application, WikiLeaks, Y Combinator

It would require more care when handling the books, but it seemed more economical. For one thing, the books could be sold afterward. Or they could simply be borrowed in the first place. “We came up with all these numbers,” says Mayer. “We were emailing them around, the right cost per hour, the right number of pages per hour—debate, debate, debate. After one thread hinged on how many pages an hour we could do, we decided we should just scan one.” They set up a makeshift book scanning device. They tried several sizes of books, the first one, appropriately enough, being The Google Book, an illustrated children’s story by V. C. Vickers. (The “Google” in the title was an odd creature with aspects of mammal, reptile, and fish.) They then tested a photo book, Ancient Forests by David Middleton; a dense text, Algorithms in C by Robert Sedgewick; and a general-interest book, Startup, by Jerry Kaplan.

So it commissioned some of its best wizards to build a machine that, presumably, would work much more accurately and at a somewhat brisker rate than Marissa Mayer turning pages one by one. Though Google wasn’t known for actually building machines, its data center needs had generated a lot of engineering expertise in that area: remember, it was the world’s biggest manufacturer of computer servers. One of the difficulties in book scanning rested in producing high-quality images from the printed page, so that OCR software could accurately translate the shapes of the letters on the page to computer-readable text. The problem was that, on their own, books did not sit flat on the platform: they presented a 3-D problem requiring a 2-D solution. The usual workarounds—flattening the book by pressing it on the glass or removing the binding—would not work since they were time-consuming and damaged the books.

In other areas, Google had put its investments into the public domain, like the open-source Android and Chrome operating systems. And as far as user information was concerned, Google made it easy for people not to become locked into using its products. It even had an initiative called the Data Liberation Front to make sure that users could easily move information they created with Google documents off Google’s servers. It would seem that book scanning was a good candidate for similar transparency. If Google had a more efficient way to scan books, sharing the improved techniques could benefit the company in the long run—inevitably, much of the output would find its way onto the web, bolstering Google’s indexes. But in this case, paranoia and a focus on short-term gain kept the machines under wraps. “We’ve done a ton of work to try to make those machines an order of magnitude better,” AMac said.


pages: 465 words: 109,653

Free Ride by Robert Levine


A Declaration of the Independence of Cyberspace, Anne Wojcicki, book scanning, borderless world, Buckminster Fuller, citizen journalism, correlation does not imply causation, crowdsourcing, death of newspapers, Edward Lloyd's coffeehouse, Firefox, future of journalism, Googley, Hacker Ethic, informal economy, Jaron Lanier, Julian Assange, Kevin Kelly, linear programming, offshore financial centre, publish or perish, race to the bottom, Saturday Night Live, Silicon Valley, Silicon Valley startup, Skype, spectrum auction, Steve Jobs, Steven Levy, Stewart Brand, subscription business, Telecommunications Act of 1996, Whole Earth Catalog, WikiLeaks

Had the authors and publishers won, they would have received substantial damages but no way to sell out-of-print works. Perhaps most important, the settlement would have set an informal precedent that scanning books requires an agreement with publishers or authors. “The alternative was to take our chances on winning the lawsuit, and we probably would have,” Aiken says. “But if we didn’t, it would have been a catastrophe because [Google would have] millions of books scanned that authors and publishers would have no legal control over.” Like Amazon and Apple, Google sees books as a means to an end—in this case giving its search engine access to more information. “Probably the highest-quality knowledge is captured in books,” Sergey Brin said.16 Like record labels, publishers have become arms suppliers in a cold war between technology companies. By bringing Google into the business of selling books—and giving it enough of a selection to make it a legitimate competitor to Amazon and Apple—the proposed settlement could have given publishers more leverage.

Sergey Brin, “A Library to Last Forever,” New York Times, October 8, 2009. 11. Roy MacLeod, The Library of Alexandria: Centre of Learning in the Ancient World (New York: I. B. Tauris, 2000), p. 5. According to MacLeod, customs officials confiscated texts from passing ships, as well as visitors. They took originals for the library and returned copies to the owners. 12. There are two common views of whether Google’s book-scanning project qualifies as fair use. One, held by copyright reform activists, is that scanning books in order to create an index is no different from a card catalog, so it obviously falls under fair use. The other is that such a big project by a private company couldn’t possibly qualify. A court would probably find the issue less obvious than either side makes it out to be. On the one hand, Google’s use would further the aim of copyright law, and it could raise the value of the books in question by making them easier to find.


pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel


Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, call centre, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, Erik Brynjolfsson, experimental subject, Google Glasses, happiness index / gross national happiness, job satisfaction, Johann Wolfgang von Goethe, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, risk-adjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, software as a service, speech recognition, statistical model, Steven Levy, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra

questions could be answered with a database lookup. The demands of open question answering reach far beyond the computer’s traditional arena of storing and accessing data for flight reservations and bank records. We’re going to need a smarter robot. The Ultimate Knowledge Source We are not scanning all those books to be read by people. We are scanning them to be read by an AI. —A Google employee regarding Google’s book scanning, as quoted by George Dyson in Turing’s Cathedral: The Origins of the Digital Universe A bit of good news: IBM didn’t need to create comprehensive databases for the Jeopardy! challenge because the ultimate knowledge source already exists: the written word. I am pleased to report that people like to report; we write down what we know in books, web pages, Wikipedia entries, blogs, and newspaper articles.

McKeown, “Learning Methods to Combine Linguistic Indicators: Improving Aspectual Classification and Revealing Linguistic Insights,” Computational Linguistics 26, issue 4 (December 2000). doi:10.1162/089120100750105957. Googling only 30 percent of the Jeopardy! questions right: Stephen Baker, Final Jeopardy: Man vs. Machine and the Quest to Know Everything (Houghton Mifflin Harcourt, 2011), 212–224. Quote about Google’s book scanning project: George Dyson, Turing’s Cathedral: The Origins of the Digital Universe (Pantheon Books, 2012). Natural language processing: Dursun Delen, Andrew Fast, Thomas Hill, Robert Nisbet, John Elder, and Gary Miner, Practical Text Mining and Statistical Analysis for Non-Structured Text Data Applications (Academic Press, 2012). James Allen, Natural Language Understanding, 2nd ed. (Addison-Wesley, 1994).


pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future by Kevin Kelly


3D printing, A Declaration of the Independence of Cyberspace, AI winter, Airbnb, Albert Einstein, Amazon Web Services, augmented reality, bank run, barriers to entry, Baxter: Rethink Robotics, bitcoin, blockchain, book scanning, Brewster Kahle, Burning Man, cloud computing, computer age, connected car, crowdsourcing, dark matter, dematerialisation, Downton Abbey, Edward Snowden, Elon Musk, Filter Bubble, Freestyle chess, game design, Google Glasses, hive mind, Howard Rheingold, index card, indoor plumbing, industrial robot, Internet Archive, Internet of things, invention of movable type, invisible hand, Jaron Lanier, Jeff Bezos, job automation, Kevin Kelly, Kickstarter, linked data, Lyft, M-Pesa, Marshall McLuhan, means of production, megacity, Minecraft, multi-sided market, natural language processing, Netflix Prize, Network effects, new economy, Nicholas Carr, peer-to-peer lending, personalized medicine, placebo effect, planetary scale, postindustrial economy, recommendation engine, RFID, ride hailing / ride sharing, Rodney Brooks, self-driving car, sharing economy, Silicon Valley, slashdot, Snapchat, social graph, social web, software is eating the world, speech recognition, Stephen Hawking, Steven Levy, Ted Nelson, the scientific method, transport as a service, two-sided market, Uber for X, Watson beat the top human players on Jeopardy!, Whole Earth Review

Legal tussles over the right to sample—to remix—snippets of music, particularly when either the sampled song or the borrowing song make a lot of money, are ongoing. The appropriateness of remixing, reusing material from one news source for another is a major restraint for new journalistic media. Legal uncertainty about Google’s reuse of snippets from the books it scanned was a major reason it closed down its ambitious book scanning program (although the court belatedly ruled in Google’s favor in late 2015). Intellectual property is a slippery realm. There are many aspects of contemporary intellectual property laws that are out of whack with the reality of how the underlying technology works. For instance, U.S. copyright law gives a temporary monopoly to a creator for his or her creation in order to encourage further creation, but the monopoly has been extended for at least 70 years after the death of the creator, long after a creator’s dead body can be motivated by anything.

., 62 extraordinary events, 277–79 eye tracking, 219–20 Facebook and aggregated information, 147 and artificial intelligence, 32, 39, 40 and “click-dreaming,” 280 cloud of, 128, 129 and collaboration, 273 and consumer attention system, 179, 184 and creative remixing, 199, 203 face recognition of, 39, 254 and filtering systems, 170, 171 flows of posts through, 63 and future searchability, 24 and interactivity, 235 and intermediation of content, 150 and lifestreaming, 246 and likes, 140 nonhierarchical infrastructure of, 152 number of users, 143, 144 as platform ecosystem, 123 and sharing economy, 139, 144, 145 and tracking technology, 239–40 and user-generated content, 21–22, 109, 138 facial recognition, 39, 40, 43, 220, 254 fan fiction, 194, 210 fear of technology, 191 Felton, Nicholas, 239–40 Fifield, William, 288 films and film industry, 196–99, 201–2 filtering, 165–91 and advertising, 179–89 differing approaches to, 168–75 filter bubble, 170 and storage capacity, 165–67 and superabundance of choices, 167–68 and value of attention, 175–79 findability of information, 203–7 firewalls, 294 first-in-line access, 68 first-person view (FPV), 227 fitness tracking, 238, 246, 255 fixity, 78–81 Flickr, 139, 199 Flows and flowing, 61–83 and engagement of users, 81–82 and free/ubiquitous copies, 61–62, 66–68 and generative values, 68–73 move from fixity to, 78–81 in real time, 64–65 and screen culture, 88 and sharing, 8 stages of, 80–81 streaming, 66, 74–75, 82 and users’ creations, 73–74, 75–78 fluidity, 66, 79, 282 food as service (FaS), 113–14 footnotes, 201 411 information service, 285 Foursquare, 139, 246 fraud, 184 freelancers (prosumers), 113, 115, 116–17, 148, 149 Freeman, Eric, 244–45 fungibility of digital data, 195 future, blindness to, 14–22 Galaxy phones, 219 gatekeepers, 167 Gates, Bill, 135, 136 gaze tracking, 219–20 Gelernter, David, 244–46 General Electric, 160 generatives, 68–73 genetics, 69, 238, 284 Gibson, William, 214 gifs, 195 global connectivity, 275, 
276, 292 gluten, 241 GM, 185 goods, fixed, 62, 65 Google AdSense ads, 179–81 and artificial intelligence, 32, 36–37, 40 book scanning projects, 208 cloud of, 128, 129 and consumer attention system, 179, 184 and coveillance, 262 and facial recognition technology, 254 and filtering systems, 172, 188 and future searchability, 24 Google Drive, 126 Google Glass, 217, 224, 247, 250 Google Now, 287 Google Photo, 43 and intellectual property law, 208–9 and lifelogging, 250–51, 254 and lifestreaming, 247–48 and photo captioning, 51 quantity of searches, 285–86 and smart technology, 223–25 translator apps of, 51 and users’ usage patterns, 21, 146–47 and virtual reality technology, 215, 216–17 and visual intelligence, 203 government, 167, 175–76, 252, 255, 261–64 GPS technology, 226, 274 graphics processing units (GPU), 38–39, 40 Greene, Alan, 31–32, 238 grocery shopping, 62, 253 Guinness Book of World Records, 278 hackers, 252 Hall, Storrs, 264–65 Halo, 227 Hammerbacher, Jeff, 280 hand motion tracking, 222 haptic feedback, 233–34 harassment, online, 264 hard singularity, 296 Harry Potter series, 204, 209–10 Hartsell, Camille, 252 hashtags, 140 Hawking, Stephen, 44 health-related websites, 179–81 health tracking, 173, 238–40, 250 heat detection, 226 hierarchies, 148–54, 289 High Fidelity, 219 Hinton, Geoff, 40 historical documents, 101 hive mind, 153, 154, 272, 281 Hockney, David, 155 Hollywood films, 196–99 holodeck simulations, 211–12 HoloLens, 216 the “holos,” 292–97 home surveillance, 253 HotWired, 18, 149, 150 humanity, defining, 48–49 hyperlinking antifacts highlighted by, 279 of books, 95, 99 of cloud data, 125–26 and creative remixing, 201–2 early theories on, 18–19, 21 and Google search engines, 146–47 IBM, 30–31, 40, 41, 128, 287 identity passwords, 220, 235 IMAX technology, 211, 217 implantable technology, 225 indexing data, 258 individualism, 271 industrialization, 49–50, 57 industrial revolution, 189 industrial robots, 52–53 information production, 257–64.


pages: 629 words: 142,393

The Future of the Internet: And How to Stop It by Jonathan Zittrain


A Declaration of the Independence of Cyberspace, Amazon Mechanical Turk, Andy Kessler, barriers to entry, book scanning, Brewster Kahle, Burning Man, call centre, Cass Sunstein, citizen journalism, Clayton Christensen, clean water, corporate governance, Daniel Kahneman / Amos Tversky, distributed generation, Firefox, game design, Hacker Ethic, Howard Rheingold, Hush-A-Phone, illegal immigration, index card, informal economy, Internet Archive, jimmy wales, license plate recognition, loose coupling, mail merge, national security letter, packet switching, Post-materialism, pre–internet, price discrimination, profit maximization, Ralph Nader, RFC: Request For Comment, RFID, Richard Stallman, Richard Thaler, risk tolerance, Robert X Cringely, SETI@home, Silicon Valley, Skype, slashdot, software patent, Steve Ballmer, Steve Jobs, Ted Nelson, Telecommunications Act of 1996, The Nature of the Firm, The Wisdom of Crowds, web application, wikimedia commons

Digital Millennium Copyright Act of 1998 give some protection to search engines that point customers to material that infringes copyright,113 but they do not shield the actions required to create the search database in the first place. The act of creating a search engine, like the act of surfing itself, is something so commonplace that it would be difficult to imagine deeming it illegal—but this is not to say that search engines rest on any stronger of a legal basis than the practice of using robots.txt to determine when it is and is not appropriate to copy and archive a Web site.114 Only recently, with Google’s book scanning project, have copyright holders really begun to test this kind of question.115 That challenge has arisen over the scanning of paper books, not Web sites, as Google prepares to make them searchable in the same way Google has indexed the Web.116 The long-standing practice of Web site copying, guided by robots.txt, made that kind of indexing uncontroversial even as it is, in theory, legally cloudy.
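
For readers unfamiliar with robots.txt, the convention can be sketched with Python's standard `urllib.robotparser` module. The rules, the crawler name `ExampleArchiver`, the helper `may_copy`, and the example.com URLs are all made up for illustration. The sketch shows only how a well-behaved crawler might consult the file before copying a page, not how any particular search engine or archive does it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an imaginary site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_copy(url, crawler="ExampleArchiver"):
    """Return True if the parsed rules allow this crawler to fetch the URL."""
    return parser.can_fetch(crawler, url)

allowed = may_copy("https://example.com/articles/index.html")
blocked = may_copy("https://example.com/private/notes.html")
print(allowed, blocked)
```

Nothing in the file enforces anything: the crawler chooses to ask and chooses to obey, which is why the excerpt describes the practice as a norm and not a settled legal rule.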

., 188–92; and procrastination principle, 152, 164, 180, 242, 245; security in, 166; stability of, 153–74; use of term, 74; as what we make them, 242, 244–46; as works in progress, 152 generative technology: accessibility of, 72–73, 93; and accountability, 162–63; adaptability of, 71–72, 93, 125; affordance theory, 78; Apple II, 2; benefits of, 64, 79–80, 84–85; blending of models for innovation, 86–90; control vs. anarchy in, 98, 150, 157–62; design features of, 43; ease of mastery, 72; end-to-end neutrality of, 165; expansion of, 34; features of, 71–73; freedom vs. security in, 3–5, 40–43, 151; free software philosophy, 77; and generative content, 245; group creativity, 94, 95; hourglass architecture, 67–71, 99; innovation as output of, 80–84, 90; input/participation in, 90–94; leverage in, 71, 92–93; non-generative generative technology (continued) compared to, 73–76; openness of, 19, 150, 156–57, 178; pattern of, 64, 67, 96–100; as platform, 2, 3; recursive, 95–96; success of, 42–43; theories of the commons, 78–79; transferability of, 73; vulnerability of, 37–51, 54–57, 60–61, 64–65 generative tools, 74–76 generativity: extra-legal solutions for, 168–73; Libertarian model of, 131; and network neutrality, 178–81; paradox of, 99; recursive, 94; reducing, and increasing security, 97, 102, 165, 167, 245; repurposing via, 212; use of term, 70; and Web 2.0, 123–26, 119, 189 GNE, 132–33, 134, 135 GNU/Linux, 64, 77, 89, 114, 190, 192 GNUpedia, 132 goldfish bowl cams, 158 “good neighbors” system, 160 Google: and advertising, 56; book scanning project of, 224–25, 242; Chinese censorship of, 113, 147; clarification available on, 230; data gathering by, 160, 221; death penalty of, 218, 220; image search on, 214–15; innovation in, 84; map service of, 124, 184, 185; privacy policy on, 306n47; and procrastination principle, 242; as search engine, 223, 226; and security, 52, 171; and spam, 170–73 Google Desktop, 185 Google News, 242 Google Pagerank, 160 Google Video, 124 
governments: abuse of power by, 117–19, 187; oppressive, monitoring by, 33; PCs investigated by, 186–88; research funding from, 27, 28 GPS (Global Positioning Systems), 109, 214 graffiti, 45 Griffith, Virgil, and Wikiscanner, 151 Gulf Shipbuilding Corporation, 172 gun control legislation, 117 hackers: ethos of, 43, 45, 53; increasing skills of, 245 Harvard University, Berkman Center, 159, 170 HD-DVDs, 123 Health, Education, and Welfare (HEW) Department, U.S., privacy report of (1973), 201–5, 222, 233–34 Herdict, 160, 163, 167–68, 173, 241 Hippel, Eric von, 86–87, 98, 146 Hollerith, Herman, 11–12, 13; business model of, 17, 20, 24 Hollerith Tabulating Machine Company, 11–12 home boxes, 180–81 honor codes, 128–29 Horsley, Neal, 215 Hotmail, 169 “How’s My Driving” programs, 219, 229 HTML (hypertext markup language), 95 Hunt, Robert, 190 Hush-A-Phone, 21–22, 81, 82, 121 hyperlinks, 56, 89 hypertext, coining of term, 226 IBM (International Business Machines): antitrust suit against, 12; business model of, 12, 23, 30, 161; competitors of, 12–13; and generative technology, 64; Internet Security Systems, 47–48; mainframe computers, 12, 57; OS/2, 88; and risk aversion, 17, 57; System 360, 174 identity tokens, unsheddable, 228 image recognition, 215–16 immigration, illegal, 209 information appliances: accessibility of, 29, 232; code thickets, 188–92; content thickets, 192–93; and data portability, 176–78; generative systems compared to, 73–76; limitations of, 177; and network neutrality, 178–85; PCs as, 4, 59–61, 102, 185–88; PCs vs., 18, 29, 57–59; and perfect enforcement, 161; and privacy, 185–88; regulatory interventions in, 103–7, 125, 197; remote control of, 161; remote updates of, 106–7, 176; security dilemma of, 42, 106–7, 123–24, 150, 176–88; specific injunction, 108–9; variety of designs for, 20; Web 2.0 and, 102; See also specific information appliances information overload, 230 information services, early forms, 9 InnoCentive, 246 innovation: blending models for, 
86–90; generativity as parent of, 80–84, 90; group, 94; and idiosyncrasy, 90–91; inertia vs., 83–84; “sustaining” vs.


pages: 189 words: 57,632

Content: Selected Essays on Technology, Creativity, Copyright, and the Future of the Future by Cory Doctorow


book scanning, Brewster Kahle, Burning Man, informal economy, information retrieval, Internet Archive, invention of movable type, Jeff Bezos, Law of Accelerating Returns, Metcalfe's law, mutually assured destruction, new economy, optical character recognition, patent troll, pattern recognition, Ponzi scheme, post scarcity, QWERTY keyboard, Ray Kurzweil, RFID, Sand Hill Road, Skype, slashdot, social software, speech recognition, Steve Jobs, Turing test, Vernor Vinge

More importantly, the free e-book skeptics have no evidence to offer in support of their position — just hand-waving and dark muttering about a mythological future when book-lovers give up their printed books for electronic book-readers (as opposed to the much more plausible future where book lovers go on buying their fetish objects and carry books around on their electronic devices). I started giving away e-books after I witnessed the early days of the "bookwarez" scene, wherein fans cut the binding off their favorite books, scanned them, ran them through optical character recognition software, and manually proofread them to eliminate the digitization errors. These fans were easily spending 80 hours to rip their favorite books, and they were only ripping their favorite books, books they loved and wanted to share. (The 80-hour figure comes from my own attempt to do this — I'm sure that rippers get faster with practice.) I thought to myself that 80 hours' free promotional effort would be a good thing to have at my disposal when my books entered the market.


pages: 173 words: 14,313

Peers, Pirates, and Persuasion: Rhetoric in the Peer-To-Peer Debates by John Logie


1960s counterculture, Berlin Wall, book scanning, cuban missile crisis, Fall of the Berlin Wall, Hacker Ethic, Isaac Newton, Marshall McLuhan, mutually assured destruction, Plutocrats, pre–internet, Richard Stallman, search inside the book, SETI@home, Silicon Valley, slashdot, Steve Jobs, Steven Levy, Stewart Brand, Whole Earth Catalog

In late 2004, the Internet megagiant Google announced, with great fanfare, its plan to digitize the library holdings of five major universities. Google intended to display small portions of the books, limiting users to reviewing a page at a time, and blocking printing. Less than a year later some members of the American Association of University Presses were petitioning the courts, demanding the right to opt out of having their authors’ books scanned. Other publishers are now demanding that Google request and receive permissions for each book it scans. And, for good measure, free speech advocates are encouraging Google to refuse to honor the publishers’ wishes and publish everything based on a hard-line fair use claim. Once again, U.S. Copyright Law has magically transformed an attempt to build Borges’s Library of Babel into the Tower of Babel, wherein the participants are unable to communicate with one another, and progress toward lofty goals is impossible.


The Orbital Perspective: Lessons in Seeing the Big Picture From a Journey of 71 Million Miles by Astronaut Ron Garan, Muhammad Yunus


Airbnb, barriers to entry, book scanning, Buckminster Fuller, clean water, corporate social responsibility, crowdsourcing, global village, Google Earth, Indoor air pollution, jimmy wales, optical character recognition, ride hailing / ride sharing, shareholder value, Silicon Valley, Skype, smart transportation, Stephen Hawking, transaction costs, Turing test, Uber for X, web of trust

When many people review and comment on a particular room for rent or an Uber driver, those evaluations start to become statistically accurate. The driver or homeowner has demonstrated a track record of living up to agreements, and the collective wisdom of the crowd can point to a high level of dependability. This is similar to Duolingo’s use of beginning language students to provide translations or ReCAPTCHA’s ability to crowdsource the accuracy of book scans. Community-Based Trust These examples relate to personal trust, but there are countless similar examples of communities that form online for a specific purpose and operate in a coordinated way for the greater good. Wikipedia, for instance, was built on the premise that people enjoy interacting within a community, which in the case of Wikipedia, is a global village documenting human knowledge.


pages: 291 words: 77,596

Total Recall: How the E-Memory Revolution Will Change Everything by C. Gordon Bell, Jim Gemmell


airport security, Albert Einstein, book scanning, cloud computing, conceptual framework, full text search, information retrieval, invention of writing, inventory management, Isaac Newton, Menlo Park, optical character recognition, pattern recognition, performance metric, RAND corporation, RFID, semantic web, Silicon Valley, Skype, social web, statistical model, Stephen Hawking, Steve Ballmer, Ted Nelson, telepresence, Turing test, Vannevar Bush, web application

My publications papers and reports
e. My talks and presentations
f. Other publications papers and reports
g. People, references, recommendations, vitae
h. Archived company and organizational folders (X)
   i) Digital Equipment Corp. . . .
   ii) NSF
i. Archived calendars and correspondence (t)
j. Archived files (e.g., DEC WPS, e-mail)
3. My Books books authored, books scanned
4. My Voice Conversations and Notes (telephone conversations are held in MyLifeBits database)
5. My Media, i.e., song collections from ripped CDs
6. My Videos including c. 1950s 8mm movies and lectures

Psychologists have identified “lifetime periods” as an important way that autobiographical memories work. Lifetime periods are thematic and include work or jobs, educational institutions, and relationships that exist over an extended period of time.


pages: 236 words: 77,098

I Live in the Future & Here's How It Works: Why Your World, Work, and Brain Are Being Creatively Disrupted by Nick Bilton


3D printing, 4chan, Albert Einstein, augmented reality, barriers to entry, book scanning, Cass Sunstein, death of newspapers, Internet of things, John Gruber, Marshall McLuhan, Nicholas Carr, recommendation engine, RFID, Saturday Night Live, Steve Jobs, Steven Pinker, Stewart Brand

Sixteen Postures by Pietro Aretino was a series of engravings of sexual positions, and Gargantua and Pantagruel by François Rabelais in the sixteenth century included stories and etchings of sexual encounters that were widely distributed throughout Europe. Rabelais, a famous French writer, boasted that more of his sexually explicit books were sold in two months than copies of the Bible were sold in years—although since BookScan, the database that tracks book sales, wasn’t developed until the twenty-first century, official figures aren’t available to prove it. He did, however, offer prescient advice to those in the media business: Sex sells. Centuries later, the roots of and eventual birth of early movie theaters grew out of early movie arcades, where a person could insert a coin and see a short, fuzzy clip of a woman undressing.


pages: 369 words: 80,355

Too Big to Know: Rethinking Knowledge Now That the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room by David Weinberger


airport security, Alfred Russel Wallace, Amazon Mechanical Turk, Berlin Wall, Black Swan, book scanning, Cass Sunstein, corporate social responsibility, crowdsourcing, Danny Hillis, David Brooks, Debian, double entry bookkeeping, double helix, Exxon Valdez, Fall of the Berlin Wall, future of journalism, Galaxy Zoo, Hacker Ethic, Haight Ashbury, hive mind, Howard Rheingold, invention of the telegraph, jimmy wales, John Harrison: Longitude, Kevin Kelly, linked data, Netflix Prize, New Journalism, Nicholas Carr, Norbert Wiener, openstreetmap, P = NP, Pluto: dwarf planet, profit motive, Ralph Waldo Emerson, RAND corporation, Ray Kurzweil, Republic of Letters, RFID, Richard Feynman, Ronald Reagan, semantic web, slashdot, social graph, Steven Pinker, Stewart Brand, technological singularity, Ted Nelson, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas Malthus, Whole Earth Catalog, X Prize

Even before books, the hundreds of thousands of scrolls in the Library of Alexandria were more than could be carried out to safety from the great fire, much less be read in a lifetime. Only about 2 percent of the Harvard University library system’s physical holdings circulate every year, and most of those are the same works that circulated the previous year.1 The new abundance makes the old abundance look like scarcity. The Google book-scanning project alone has over 15 million scanned books, which you can search through more easily than you can look up an item in the index of the book on your night table.2 Harvard’s Robert Darnton, whom we met in Chapter 6, is among those proposing a Digital Public Library of America,3 a call that has excited interest among public and research librarians, the government, and some large Internet projects.


Not That Kind of Girl: A Young Woman Tells You What She's "Learned" by Lena Dunham


book scanning, Mason jar, Saturday Night Live

Her sister, another imp with impossibly well-thought-out hair, has a funny phlegmy laugh. I know I shouldn’t drink anymore, or should at least temper it with a few handfuls of the crisps they are passing around. No one can explain how they came to live here. Nellie hops up, discarding her coat while announcing that it’s freezing. “Let me show you round,” she says. I take in every detail of the house like I’m six again and reading a picture book, scanning the illustrations carefully. Next to a marble fireplace lies an issue of Elle, a torn thigh-high stocking, an empty pack of Marlboros, a half-eaten pudding cup. And each room leads to another, like one of those New York real-estate dreams where you open a hidden door and discover massive rooms you didn’t even know you had. I spill some of my wine down the front of my dress. Nellie’s bedroom contains a freestanding claw-foot tub, and I eye all her books and clippings with a pathetic level of interest.


pages: 240 words: 109,474

Masters of Doom: How Two Guys Created an Empire and Transformed Pop Culture by David Kushner


Apple's 1984 Super Bowl advert, book scanning, Columbine, corporate governance, game design, glass ceiling, Hacker Ethic, informal economy, market design, Marshall McLuhan, Saturday Night Live, side project, Silicon Valley, slashdot, software patent, Steve Jobs, Steven Levy, X Prize

Also to speed things up, characters and objects in the game would not be in true 3-D, they would be sprites, flat images that, if encountered in real life, would look like cardboard cutouts. Romero, in pure Melvin mode, imagined all the crazy stuff they could do in a game where the object was, as he said, “to mow down Nazis.” He wanted to have the suspense of an Apple II game pumped up with the shock and horror of storming a Nazi bunker. There would be SS soldiers and Hitler. Adrian hit the history books, scanning images of the German leader to include throughout the game. But that wasn’t enough. “How about,” Romero suggested, “we throw in guard dogs? Dogs that you can shoot! Fucking German shepherds!” Adrian cracked up, sketching out a dog that, in a death animation, could yelp back. “And there should be blood,” Romero said, “lots of blood, blood like you never see in games. And the weapons should be lethal but simple: a knife, a pistol, maybe a Gatling gun too.”


pages: 391 words: 105,382

Utopia Is Creepy: And Other Provocations by Nicholas Carr


Air France Flight 447, Airbnb, AltaVista, Amazon Mechanical Turk, augmented reality, autonomous vehicles, Bernie Sanders, book scanning, Brewster Kahle, Buckminster Fuller, Burning Man, Captain Sullenberger Hudson, centralized clearinghouse, cloud computing, cognitive bias, collaborative consumption, computer age, corporate governance, crowdsourcing, Danny Hillis, deskilling, Donald Trump, Elon Musk, factory automation, failed state, feminist movement, Frederick Winslow Taylor, friendly fire, game design, global village, Google bus, Google Glasses, Google X / Alphabet X, Googley, hive mind, impulse control, indoor plumbing, interchangeable parts, Internet Archive, invention of movable type, invention of the steam engine, invisible hand, Isaac Newton, Jeff Bezos, jimmy wales, job automation, Kevin Kelly, low skilled workers, Mark Zuckerberg, Marshall McLuhan, means of production, Menlo Park, mental accounting, natural language processing, Network effects, new economy, Nicholas Carr, oil shale / tar sands, Peter Thiel, Plutocrats, plutocrats, profit motive, Ralph Waldo Emerson, Ray Kurzweil, recommendation engine, Republic of Letters, robot derives from the Czech word robota Czech, meaning slave, Ronald Reagan, self-driving car, SETI@home, side project, Silicon Valley, Silicon Valley ideology, Singularitarianism, Snapchat, social graph, social web, speech recognition, Startup school, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, technoutopianism, the medium is the message, theory of mind, Turing test, Whole Earth Catalog, Y Combinator

Internet or not, the world may still not be ready for the library of utopia. LARRY PAGE isn’t known for his literary sensibility, but he does like to think big. In 2002, the Google cofounder decided that it was time for his young company to scan all the world’s books into its database. If printed texts weren’t brought online, he feared, Google would never fulfill its mission of making the world’s information “universally accessible and useful.” After doing some book-scanning tests in his office—he manned the camera while Marissa Mayer, then a product manager, turned pages to the beat of a metronome—he concluded that Google had the smarts and the money to get the job done. He set a team of engineers and programmers to work. In a matter of months, they had invented an ingenious scanning device that used a stereoscopic infrared camera to correct for the bowing of pages that occurs when a book is opened.


pages: 380 words: 118,675

The Everything Store: Jeff Bezos and the Age of Amazon by Brad Stone


3D printing, airport security, AltaVista, Amazon Mechanical Turk, Amazon Web Services, bank run, Bernie Madoff, big-box store, Black Swan, book scanning, Brewster Kahle, call centre, centre right, Clayton Christensen, cloud computing, collapse of Lehman Brothers, crowdsourcing, cuban missile crisis, Danny Hillis, Douglas Hofstadter, Elon Musk, facts on the ground, game design, housing crisis, invention of movable type, inventory management, James Dyson, Jeff Bezos, Kevin Kelly, Kodak vs Instagram, late fees, loose coupling, low skilled workers, Maui Hawaii, Menlo Park, Network effects, new economy, optical character recognition, Ponzi scheme, quantitative hedge fund, recommendation engine, Renaissance Technologies, RFID, Rodney Brooks, search inside the book, shareholder value, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, Skype, statistical arbitrage, Steve Ballmer, Steve Jobs, Steven Levy, Stewart Brand, Thomas L Friedman, Tony Hsieh, Whole Earth Catalog, why are manhole covers round?

About two dozen Amazon employees worked on the service from January 2004 to November 2005. It was considered a Jeff project, which meant that the product manager met with Bezos every few weeks and received a constant stream of e-mail from the CEO, usually containing extraordinarily detailed recommendations and frequently arriving late at night. Amazon started using Mechanical Turk internally in 2005 to have humans do things like review Search Inside the Book scans and check product images uploaded to Amazon by customers to ensure they were not pornographic. The company also used Mechanical Turk to match the images with the corresponding commercial establishments in A9’s fledgling Block View tool. Bezos himself became consumed with this task and used it as a way to demonstrate the service. As the company prepared to introduce Mechanical Turk to the public, Amazon’s PR team and a few employees complained they were uncomfortable with the system’s reference to the Turkish people.


pages: 496 words: 154,363

I'm Feeling Lucky: The Confessions of Google Employee Number 59 by Douglas Edwards


Albert Einstein, AltaVista, Any sufficiently advanced technology is indistinguishable from magic, barriers to entry, book scanning, Build a better mousetrap, Burning Man, business intelligence, call centre, crowdsourcing, don't be evil, Elon Musk, fault tolerance, Googley, gravity well, invisible hand, Jeff Bezos, job-hopping, Menlo Park, microcredit, music of the spheres, Network effects, P = NP, PageRank, performance metric, Ralph Nader, risk tolerance, second-price auction, side project, Silicon Valley, Silicon Valley startup, slashdot, stem cell, Superbowl ad, Y2K

Larry decreed a meeting be established at which views would be heard from all corners of the Plex, disagreements would be aired, and edicts would be issued. He dubbed it "product review." Google had birthed a process. Product review met in Larry and Sergey's office. I arrived early to get a seat on the black pleather couch. Otherwise, I'd have had to balance my laptop while sitting on a three-foot rubber ball. A large metal exoskeleton, the prototype for Larry's book-scanning project, held a camera and an array of lights pointing down at the coffee table in front of me. Karen White, Marissa Mayer, Jen McGrath from the front-end team, and Craig Silverstein worked around it, connecting cables to a projector so we could display mockups against the office wall. Sergey leaned back in his desk chair across from us, reading and eating a sandwich. It was hard to tell if he was paying attention.


pages: 553 words: 151,139

The Teeth of the Tiger by Tom Clancy


airport security, book scanning, centralized clearinghouse, complexity theory, forensic accounting, illegal immigration, Occam's razor, sensible shoes

He could have attached it to his own laptop and gone exploring, but, no, that was a job for a real computer geek. They'd killed four people who had struck out at America, and now America had struck back on their turf and by their rules. The good part was that the enemy could not possibly know what kind of cat was in the jungle. They'd hardly met the teeth. Next, they'd meet the brain. * * * Scan Notes: [13 sep 2003-1.0-eBook Scanned, Proofed and Formatted by BookWurm] [23 sep 2003-2.0-re-proofed and converted from evil Word2000 HTML for #bookz] [07 oct 2003-2.1-re-re-proofed in rtf for #bookz] [05 dec 2003-2.2-re-re-proofed in rtf for #bookz by The_Ghiti - including restoring missing parts of pages 148-9] [23 dec 2003-3.0-re-re-re-proofed in rtf for #bookz by The_Ghiti - A few minor errors, but mostly restoring section breaks, which had all been lost during conversions at sometime or another.


pages: 510 words: 120,048

Who Owns the Future? by Jaron Lanier


3D printing, 4chan, Affordable Care Act / Obamacare, Airbnb, augmented reality, automated trading system, barriers to entry, bitcoin, book scanning, Burning Man, call centre, carbon footprint, cloud computing, computer age, crowdsourcing, David Brooks, David Graeber, delayed gratification, digital Maoism, facts on the ground, Filter Bubble, financial deregulation, Fractional reserve banking, Francis Fukuyama: the end of history, George Akerlof, global supply chain, global village, Haight Ashbury, hive mind, if you build it, they will come, income inequality, informal economy, invisible hand, Jacquard loom, Jaron Lanier, Jeff Bezos, job automation, Kevin Kelly, Khan Academy, Kickstarter, Kodak vs Instagram, life extension, Long Term Capital Management, Mark Zuckerberg, meta analysis, meta-analysis, moral hazard, mutually assured destruction, Network effects, new economy, Norbert Wiener, obamacare, packet switching, Peter Thiel, place-making, Plutocrats, plutocrats, Ponzi scheme, post-oil, pre–internet, race to the bottom, Ray Kurzweil, rent-seeking, reversible computing, Richard Feynman, Ronald Reagan, self-driving car, side project, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, Skype, smart meter, stem cell, Steve Jobs, Steve Wozniak, Stewart Brand, Ted Nelson, The Market for Lemons, Thomas Malthus, too big to fail, trickle-down economics, Turing test, Vannevar Bush, WikiLeaks

The real people from whom the initial answers were gathered deserve to be paid for each new answer given by the machine. Consider too the act of scanning a book into digital form. The historian George Dyson has written that a Google engineer once said to him: “We are not scanning all those books to be read by people. We are scanning them to be read by an AI.” While we have yet to see how Google’s book scanning will play out, a machine-centric vision of the project might encourage software that treats books as grist for the mill, decontextualized snippets in one big database, rather than separate expressions from individual writers. In this approach, the contents of books would be atomized into bits of information to be aggregated, and the authors themselves, the feeling of their voices, their differing perspectives, would be lost.


pages: 645 words: 184,311

American Gods by Neil Gaiman


airport security, book scanning, Brownian motion, Golden Gate Park, Lao Tzu

Hinzelmann, originally of Hildemuhlen in Bavaria, was in charge of the lake-building project, and that the city council had granted him the sum of $370 toward the project, any shortfall to be made up by public subscription. Shadow tore off a strip of a paper towel and placed it into the book as a bookmark. He could imagine Hinzelmann's pleasure in seeing the reference to his grandfather. He wondered if the old man knew that his family had been instrumental in building the lake. Shadow flipped forward through the book, scanning for more references to the lake-building project. They had dedicated the lake in a ceremony in the spring of 1876, as a precursor to the town's centennial celebrations. A vote of thanks to Mr. Hinzelmann was taken by the council. Shadow checked his watch. It was five-thirty. He went into the bathroom, shaved, combed his hair. He changed his clothes. Somehow the final fifteen minutes passed.