It would be nice to have a machine-readable copy of the UK National Rail timetable. Network Rail supply the timetable in PDF format, and it is possible to extract machine-readable timetable data from this PDF.
Requirements
pdftops: Portable Document Format (PDF) to PostScript converter. Ships as part of Poppler (GPL 2)
To install on Debian or Ubuntu run:
apt-get install poppler-utils
CompleteTimetable.pdf: Download the timetable PDF from Network Rail (about 60 MB).
Perl: The modules Data::Dump and List::MoreUtils are also needed.
On Debian or Ubuntu run:
apt-get install libdata-dump-perl liblist-moreutils-perl
Rail timetable parser: Download parser
Usage
Put CompleteTimetable.pdf in the same directory as parse, then run parse:
perl parse
The first time it is run, it calls pdftops to convert the PDF, a binary format, into PostScript, a text format that is easier to work with.
Then it will print lots of debugging output about pages, timetables and trains.
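The convert-once behaviour described above can be sketched as follows. This is only an illustration in Python, not the parser's actual Perl code. The output file name CompleteTimetable.ps and the helper name ensure_postscript are assumptions made for the example; the only external detail relied on is that pdftops accepts an input PDF followed by an output PostScript file.

```python
import subprocess
from pathlib import Path


def ensure_postscript(pdf_path, ps_path, run=subprocess.run):
    """Convert pdf_path to ps_path with pdftops unless ps_path already exists.

    Returns True if the conversion was run, False if an existing
    PostScript file was reused. `run` is injectable so the logic can be
    exercised without pdftops installed (hypothetical helper, not part
    of the parser).
    """
    pdf, ps = Path(pdf_path), Path(ps_path)
    if ps.exists():
        # A previous run already produced the PostScript file.
        return False
    # pdftops takes the input PDF followed by the output PostScript file.
    run(["pdftops", str(pdf), str(ps)], check=True)
    return True


if __name__ == "__main__":
    # Assumed file names, for illustration only.
    ensure_postscript("CompleteTimetable.pdf", "CompleteTimetable.ps")
```

Skipping the conversion when the PostScript file is already present means that only the first run pays the cost of converting the large PDF.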
Output
Here is a sample of the output in a range of formats:
Further work
This code is unfinished; many cases are not handled yet. Specifically:
- Base notes
- Head notes
- Date ranges
- Train flags
- Trains that join or split
- Repeat trains: "and at the same minutes past each hour until"
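As an illustration of the last item, a "same minutes past each hour until" note could be expanded along the following lines. This is a hedged Python sketch of one possible interpretation, not something the parser does. The function name expand_hourly_repeat, the "HH:MM" string format and the inclusive end time are all assumptions, and services that run past midnight are not handled.

```python
from datetime import datetime, timedelta


def expand_hourly_repeat(first_departure, until):
    """Repeat an "HH:MM" departure every hour, up to and including `until`.

    Hypothetical helper: assumes both times fall on the same day.
    """
    fmt = "%H:%M"
    current = datetime.strptime(first_departure, fmt)
    end = datetime.strptime(until, fmt)
    times = []
    while current <= end:
        times.append(current.strftime(fmt))
        current += timedelta(hours=1)
    return times


if __name__ == "__main__":
    print(expand_hourly_repeat("09:15", "12:15"))
    # prints ['09:15', '10:15', '11:15', '12:15']
```

A full implementation would need to apply this to every calling point of the repeated train, not just a single departure time.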