Open Library

Import

Merging needs four indexes: three book identifiers (ISBN, OCLC and LCCN) plus a normalised title, truncated to 25 characters, as a fallback. These indexes are implemented using dbmhash. The problem with this is that only one process can read and write the indexes at a time. The solution is to build an import server that handles all reading and writing of the indexes. The most obvious way to implement interprocess communication is HTTP, so the import server will be written using web.py.
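A minimal sketch of how the title fallback key could be built. Only the 25-character truncation comes from the notes above; the function name `title_key` and the normalisation steps (strip accents, lowercase, replace punctuation with spaces, collapse whitespace) are assumptions for illustration and may differ from what the real import code does.

```python
import re
import unicodedata

TITLE_KEY_LENGTH = 25  # truncation length stated in the notes


def title_key(title):
    """Return a normalised, truncated title for use as a fallback index key.

    The normalisation rules here are assumed, not taken from the import code.
    """
    # Split accented characters into base character plus combining mark,
    # then drop the combining marks.
    decomposed = unicodedata.normalize('NFKD', title)
    without_accents = ''.join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase, turn every run of non-alphanumerics into one space,
    # and trim leading and trailing whitespace via split/join.
    words = re.sub(r'[^a-z0-9]+', ' ', without_accents.lower()).split()
    return ' '.join(words)[:TITLE_KEY_LENGTH]


print(title_key('Phenomenology of Perception'))
```

With these assumed rules, 'Phenomenology of Perception' becomes 'phenomenology of percepti', the title key used in the example request further down.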

The HTTP method GET is used for searching the indexes, and POST for adding a new record to the database and updating the indexes.

The import server handles updating the database to avoid conflicts when generating new keys for authors and editions.
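The single-owner idea can be sketched as below. This is not the real server: the class name `ImportIndexes`, its methods and the integer ID counter are hypothetical, plain in-memory dictionaries stand in for the dbmhash files, and the web.py wiring is left out. The point is only that searching (the GET path) and adding (the POST path) go through one object, so new IDs are handed out in one place.

```python
from collections import defaultdict

INDEX_NAMES = ('isbn', 'oclc', 'lccn', 'title')


class ImportIndexes:
    """Hypothetical single owner of the merge indexes and of new-ID allocation."""

    def __init__(self, first_id=1):
        # One mapping per index: key value -> list of database IDs.
        # In-memory stand-in for the dbmhash files (assumption).
        self.indexes = {name: defaultdict(list) for name in INDEX_NAMES}
        self.next_id = first_id

    def search(self, fields):
        """GET path: return, per index, the sorted IDs filed under the given values."""
        pool = {}
        for name, values in fields.items():
            found = set()
            for value in values:
                found.update(self.indexes[name].get(value, []))
            if found:
                pool[name] = sorted(found)
        return pool

    def add(self, fields):
        """POST path: allocate the next ID and file it under every given value."""
        new_id = self.next_id
        self.next_id += 1
        for name, values in fields.items():
            for value in values:
                self.indexes[name][value].append(new_id)
        return new_id


indexes = ImportIndexes()
indexes.add({'isbn': ['0415045568'], 'title': ['phenomenology of percepti']})
indexes.add({'title': ['phenomenology of percepti']})
print(indexes.search({'isbn': ['0415045568', '0391025511'],
                      'title': ['phenomenology of percepti']}))
```

Because only this one object ever increments `next_id`, two importers cannot be given the same new key, which is the conflict the import server is meant to avoid.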

Index fields are passed as GET parameters, with each list of values joined by '_'. The response is in JSON. For example:

For the fields: {'isbn': ['0415045568', '0391025511'], 'title': ['phenomenology of percepti']}

The URL is: http://wiki-beta.us.archive.org:9020/?isbn=0415045568_0391025511&title=phenomenology+of+percepti

And the response is:

{"fields": {"isbn": ["0415045568"], "title": ["phenomenology of percepti"]}, "pool": {"isbn": [1366447, 10187591], "title": [1366447, 8071041, 10146455, 10187591, 10198188, 10198270, 10568028, 13557619, 13620735, 17343673]}}

The numbers in the response are database IDs.
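A client-side sketch of the exchange above, using only the Python standard library. The function names `build_search_url` and `rank_pool` are made up for this example. Ranking IDs by how many index pools they appear in is an assumption about how a caller might use the pool, not something these notes specify. No request is sent; the sample response is the one quoted above.

```python
import json
from collections import Counter
from urllib.parse import urlencode

BASE_URL = 'http://wiki-beta.us.archive.org:9020/'  # host and port from the example

SAMPLE_RESPONSE = (
    '{"fields": {"isbn": ["0415045568"], "title": ["phenomenology of percepti"]}, '
    '"pool": {"isbn": [1366447, 10187591], "title": [1366447, 8071041, 10146455, '
    '10187591, 10198188, 10198270, 10568028, 13557619, 13620735, 17343673]}}'
)


def build_search_url(fields, base_url=BASE_URL):
    """Join each list of values with '_' and encode the result as GET parameters."""
    params = {name: '_'.join(values) for name, values in fields.items()}
    return base_url + '?' + urlencode(params)


def rank_pool(response_text):
    """Return (database ID, number of pools it appears in), most pools first.

    Assumed usage: an ID found under several indexes is a stronger merge candidate.
    """
    pool = json.loads(response_text)['pool']
    counts = Counter(db_id for ids in pool.values() for db_id in set(ids))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


fields = {'isbn': ['0415045568', '0391025511'], 'title': ['phenomenology of percepti']}
print(build_search_url(fields))
print(rank_pool(SAMPLE_RESPONSE)[:2])
```

For the sample response, the two IDs that appear in both the isbn and title pools (1366447 and 10187591) sort to the front.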

Robert's wishlist

Lending waiting list
Multiple copies to loan, pull books from shelves in libraries
Search inside filtered by collection
All archive.org books on Open Library

Bugs

MARC source should say 'Humboldt State University'
- http://openlibrary.org/books/OL24995495M/Pacifica
- http://openlibrary.org/books/OL24995494M/A_design_for_Native_American_education
Get rid of ia:ic in source_records: bug report from Dan

Todo

Fix ol-tasks so it has FTP iptables rules:

iptables -I INPUT 8 -p tcp --dport ftp-data -j ACCEPT
Load data from http://librisbloggen.kb.se/2011/09/21/swedish-national-bibliography-and-authority-data-released-with-open-license/ (see e-mail)
Browse http://www.smalldemons.com/

Notes

Hank implemented range requests on archive.org: http://www.archive.org/download/marc_records_scriblio_net/part05.dat?range=18095778:686
Available circulation data (e-mail from Karen): http://www.oclc.org/research/activities/ohiolink/circulation.htm
S3: http://www.archive.org/help/abouts3.txt
contrib_submit: http://www.archive.org/contrib_submit.php?help=1

clear out ivm29

onix_wiley_crawl 
elsevier_covers_crawl
onix_princeton