ATI PDF parser

by Edward Betts

Problem

Can the PDFs at http://www.x.org/docs/AMD/ be converted into a more usable format?

Solution

First convert the PDF to PostScript, because a text format is easier to parse than PDF:

pdftops -noembcidps -noembcidtt -noembtt -noembt1 42589_rv630_rrg_1.01o.pdf

This creates a PostScript file named '42589_rv630_rrg_1.01o.ps'. The flags tell pdftops not to embed any fonts, because we don't need them.
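If you want to convert several PDFs in one go, a small shell wrapper can help. This is only a sketch: the function names ps_name and convert_all and the PDFTOPS variable are made up for illustration, and it assumes the PDFs sit together in one directory. The flags are the same ones used above.

```shell
#!/bin/sh
# Sketch: convert every PDF in a directory to PostScript without embedded fonts.
# PDFTOPS, ps_name and convert_all are hypothetical names, not part of pdftops.

# Allow the converter to be overridden; defaults to the pdftops on your PATH.
PDFTOPS=${PDFTOPS:-pdftops}

# Derive the output name by replacing a trailing .pdf with .ps,
# which matches the name pdftops picks by default.
ps_name() {
    printf '%s\n' "${1%.pdf}.ps"
}

# Convert each *.pdf in the given directory (default: current directory).
convert_all() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue   # the glob matched nothing
        "$PDFTOPS" -noembcidps -noembcidtt -noembtt -noembt1 \
            "$pdf" "$(ps_name "$pdf")"
    done
}
```

After sourcing this, 'convert_all .' would convert every PDF in the current directory and write each .ps file next to its PDF.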

Download the two Perl scripts I wrote: parse_ati_pdf and ati_pdf_to_html. To run parse_ati_pdf you'll need these Perl modules: Data::Dump, List::Util and List::MoreUtils.
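Before running the scripts, it may be worth checking that the three modules can be loaded. The following is a minimal sketch, assuming perl is on your PATH; the function name check_modules is made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether each Perl module needed by parse_ati_pdf can be loaded.
# check_modules is a hypothetical helper name.

check_modules() {
    for mod in Data::Dump List::Util List::MoreUtils; do
        # -M loads the module; -e 1 runs a trivial program and exits.
        if perl -M"$mod" -e 1 2>/dev/null; then
            echo "$mod: ok"
        else
            echo "$mod: missing"
        fi
    done
}

check_modules
```

List::Util normally ships with Perl itself. Anything reported as missing can usually be installed from CPAN, for example with 'cpan Data::Dump List::MoreUtils', or from your distribution's packages.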

Run them like this:

perl parse_ati_pdf > ati_pdf_data
perl ati_pdf_to_html ati_pdf_data > output.html

Result

Further work

It is left as an exercise for the reader to modify the parser to work with the other PDF.