[TriLUG] Where do I get PDF internals information?

Steve Litt slitt at troubleshooters.com
Thu Mar 15 10:12:38 EDT 2007


Hi all,

Where do I get PDF internals information? I've been writing a C program to 
tweak my Ebooks, and have suddenly come to the realization that PDF files are 
a lot more than a list of objects -- there's a hierarchy, there are linked 
lists of objects, there are keywords (beyond obj and endobj, stream and 
endstream), all horizontal and hierarchical links point bidirectionally, and 
there's lots of redundancy. Any modification that changes an object's length 
shifts the byte offset of every object after it, so the byte address table in 
the xref section must be updated to account for such changes.
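
To make the offset bookkeeping concrete, here is a minimal C sketch. It assumes
the classic plain-text xref table (not a cross-reference stream), where each
in-use entry is a fixed 20-byte line: a 10-digit byte offset, a 5-digit
generation number, the letter n, and a two-byte line ending. The names
format_xref_entry and shift_offsets are made up for this illustration and do
not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write one in-use xref entry into out.
 * Layout: 10-digit offset, space, 5-digit generation, space, 'n',
 * then a 2-byte line ending (space + LF here), 20 bytes in total. */
static void format_xref_entry(char out[21], long offset, int gen)
{
    int n;

    assert(offset >= 0 && gen >= 0 && gen <= 65535);
    n = snprintf(out, 21, "%010ld %05d n \n", offset, gen);
    assert(n == 20);   /* every entry must be exactly 20 bytes */
    (void)n;           /* silence unused warning when NDEBUG is set */
}

/* Hypothetical helper: after the object starting at edited_start grew
 * (delta > 0) or shrank (delta < 0), move every later object's offset.
 * Objects at or before edited_start keep their offsets. */
static void shift_offsets(long *offsets, size_t count,
                          long edited_start, long delta)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (offsets[i] > edited_start)
            offsets[i] += delta;
    }
}
```

In use, one would collect each object's starting offset, call shift_offsets
with the start of the edited object and the size change, then rewrite each
table line with format_xref_entry. If the xref table itself sits after the
edit, the number following the startxref keyword near the end of the file
needs the same adjustment.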

Perhaps worst, different PDF files use different keywords, so following the 
hierarchy to a page, a font, or content is not easy.

And it's even more complex than what I just described, but I don't understand 
it.

Anyone know where I can learn more about PDF internals? Reverse engineering it 
with Vim can only get me so far, and I'm there :-).

Thanks

SteveT

Steve Litt
Author: Universal Troubleshooting Process books and courseware
http://www.troubleshooters.com/


