ATI PDF parser

by Edward Betts


Can the PDFs at be converted into a more usable format?


Convert from PDF to PostScript, because text formats are easier to work with:

pdftops -noembcidps -noembcidtt -noembtt -noembt1 42589_rv630_rrg_1.01o.pdf

This creates a PostScript file named ''. We don't need any fonts, which is why the flags turn off font embedding.
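
As a minimal sketch of the conversion step, the shell functions below run pdftops with the same flags but pass the output name explicitly, then check that the result starts with the "%!" PostScript signature. The helper names is_postscript and pdf_to_ps, and the explicit output path, are illustrative assumptions and not part of the original scripts.

```shell
#!/bin/sh
# Sketch only: helper names and the explicit output path are assumptions.

# Succeeds if the file starts with the "%!" PostScript signature.
is_postscript() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "%!" ]
}

# Convert a PDF to PostScript without embedding any fonts,
# then confirm the output looks like PostScript.
# Usage: pdf_to_ps input.pdf output.ps
pdf_to_ps() {
    pdftops -noembcidps -noembcidtt -noembtt -noembt1 "$1" "$2" || return 1
    is_postscript "$2"
}
```

For example, pdf_to_ps 42589_rv630_rrg_1.01o.pdf out.ps would write to out.ps (an arbitrary name chosen here) and return non-zero if the conversion fails.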

Download the two Perl scripts I wrote: parse_ati_pdf and ati_pdf_to_html. To run parse_ati_pdf you'll need these Perl modules: Data::Dump, List::Util, and List::MoreUtils.
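
One way to check that the required modules are installed before running the parser is to ask perl to load each one. This is only a sketch; the helper names have_perl_module and missing_modules are hypothetical.

```shell
#!/bin/sh
# Sketch only: have_perl_module and missing_modules are illustrative names.

# Succeeds if perl can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Print the name of each required module that fails to load.
missing_modules() {
    for module in Data::Dump List::Util List::MoreUtils; do
        have_perl_module "$module" || echo "$module"
    done
}
```

Calling missing_modules prints nothing when all three modules load. Any module it does print would need to be installed, for instance from CPAN.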

Run like this:

perl parse_ati_pdf > ati_pdf_data
perl ati_pdf_to_html ati_pdf_data > output.html


Further work

It is left as an exercise for the reader to modify the parser to work with the other PDF.