Thread: scraping PDFs
Posted to microsoft.public.excel.programming
Jeff[_66_]

Hello,

I have a stack of PDFs (created electronically, thankfully) that I need to parse a bit of text from. I've been looking through the forum and PlanetPDF for solutions, but most posts are about working with Distiller the other way 'round (creating PDFs rather than reading them), or are outdated.

My current solution, which 'works' in a grim fashion, is to duct-tape the handy pdftohtml (http://pdftohtml.sourceforge.net/) to a VBA call, then parse one of the resulting HTML frames.
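For comparison, here is a minimal sketch of that same shell-out-and-parse pipeline written in Python rather than VBA. It assumes pdftohtml is on the PATH and accepts the `-q`, `-i`, `-noframes`, `-stdout` and `-enc` flags (present in the poppler build; check `pdftohtml -h` for yours). Asking for a single page on stdout avoids picking a file out of the frameset. The `value_after_label` helper is a hypothetical rule for "a bit of text": it returns whatever follows a known label.

```python
import subprocess
from html.parser import HTMLParser
from typing import List, Optional


class _TextCollector(HTMLParser):
    """Collects the visible text chunks from pdftohtml's HTML output."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so &nbsp; etc. arrive decoded
        self.chunks: List[str] = []
        self._skip = 0  # depth inside <script>/<style>/<title>, whose text we ignore

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style", "title"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style", "title") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace (including non-breaking spaces).
        text = " ".join(data.split())
        if text and not self._skip:
            self.chunks.append(text)


def html_to_chunks(html: str) -> List[str]:
    """Return the non-empty text fragments found between tags, in order."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def pdf_to_html(pdf_path: str) -> str:
    """Run pdftohtml and return its HTML (assumes the flags listed above exist)."""
    result = subprocess.run(
        # -q quiet, -i skip images, -noframes single page, -stdout print the HTML
        ["pdftohtml", "-q", "-i", "-noframes", "-stdout", "-enc", "UTF-8", pdf_path],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,  # raise CalledProcessError if pdftohtml fails
    )
    return result.stdout


def value_after_label(chunks: List[str], label: str) -> Optional[str]:
    """Hypothetical rule: the wanted value follows a label, either in the
    same chunk ("Total: 9.99") or in the next one ("<b>Invoice No:</b> 123")."""
    for i, chunk in enumerate(chunks):
        if chunk.startswith(label):
            rest = chunk[len(label):].strip()
            if rest:
                return rest
            if i + 1 < len(chunks):
                return chunks[i + 1]
    return None
```

Typical use would be `value_after_label(html_to_chunks(pdf_to_html("report.pdf")), "Invoice No:")`, where the file name and label are placeholders. Keeping the parsing separate from the subprocess call means the parsing rules can be tested against saved HTML without re-running pdftohtml.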

It ain't pretty, so I'm wondering how others have approached this.

Thanks for your insights.