[TriLUG] To The Oracle:

Alan Porter porter at trilug.org
Wed Apr 30 19:52:34 EDT 2014


> Our takeaway was that PDF is a print-ready document format that makes
> no attempt to preserve the human-readable information that it contains
> in a consistent, extractable way.  We gave up and found other ways to
> get what we were after.

You may have better luck rendering the document to a bitmap and then
running OCR on that.
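
A rough sketch of that render-then-OCR idea, assuming the Poppler
"pdftoppm" and Tesseract "tesseract" command-line tools are installed
and on the PATH. Neither tool is named in this thread, so treat them as
one possible choice. The function names, the "page" file prefix, and
the 300 dpi default are all made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def render_command(pdf, prefix, dpi=300):
    """Build a pdftoppm command that renders each page to <prefix>-N.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_command(image, out_base):
    """Build a tesseract command; tesseract writes its text to <out_base>.txt."""
    return ["tesseract", str(image), str(out_base)]


def pdf_to_text(pdf, work_dir, dpi=300):
    """Render every page of the PDF to a PNG, OCR each page, return the text."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    prefix = work_dir / "page"  # hypothetical prefix for the page images

    subprocess.run(render_command(pdf, prefix, dpi), check=True)

    texts = []
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    for image in sorted(work_dir.glob("page-*.png")):
        out_base = image.with_suffix("")
        subprocess.run(ocr_command(image, out_base), check=True)
        texts.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python pdf_ocr.py input.pdf scratch_dir
    print(pdf_to_text(sys.argv[1], sys.argv[2]))
```

A higher dpi usually gives the OCR engine more to work with, at the
cost of larger images and slower runs. The result will still need
proofreading, since OCR output is rarely clean.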

Alan





