View Single Post
  #6   Report Post  
Posted to microsoft.public.excel.programming
David Hart David Hart is offline
external usenet poster
 
Posts: 1
Default use vba to open pdf then save as xml

Peter,

Have you had any luck with this? I have a hobby of collecting acronyms for a new job I have taken have created an excel spreadsheet that will alphabetize and show the definition of an acronym (if I already have it) or I can post a new acronym or jump to any particular spot in my 'database'. But I am looking to make a leap to being able to open acronym containing files (pdfs, csvs, the like), extract all the new info into my database.

Sincerely,
Dave

On Friday, December 02, 2011 1:34 AM Peter Jamieson wrote:


I can open manually my pdf file then choose: File- Save As- Tables in
Excel Spreadsheet (*xml) from the drop-down list.
After this I can open this file in Excel and use some VBA to extract what I
need.
The above works manually and as I now receive a bunch of pdf's every couple
of days it would
be nice to automate in VBA the creation of the xml files from the pdf's if
possible

So far I have the following code that runs in VBA and opens the pdf file but
I am
unable to figure out the next step of how I can now save it as the xml file.
Any help appreciated!
Cheers, Peter

Option Explicit
Declare Function ShellExecute Lib "shell32dll" Alias "ShellExecuteA" ( _
ByVal hwnd As Long, _
ByVal lpOperation as String, _
ByVal lpFile as String, _
ByVal lpParameters as String, _
ByVal lpDirectory as String, _
ByVal nShowCmd as Long) As Long

Const SW_HIDE As Long = 0
Const SW_SHOWNORMAL As Long = 1
Const SW_SHOWMAXIMIZED As Long = 3
Const SW_SHOWMINIMIZED As Long = 2

sub pdf_to_xml()
ShellExecute Application.hwnd "open", "C:\temp\Test.pdf", vbNullString,
"C:\", SW_SHOWNORMAL
End Sub



On Friday, December 02, 2011 8:22 AM Auric__ wrote:


Peter Jamieson wrote:


Jesus. You need to take Outlook Express out back and shoot it.


Do you know anything about programming? The above looks like it was copy-
pasted from elsewhere with little understanding of its purpose. Still...


(Caveat: I am using Excel 2000, which does not support PDF or XML. Some of my
info I had to get off MSDN.)

The easiest way to see how to automate this would be to find the "record
macro" menu item (in 2000, it is Tools - Macro - Record New Macro...; no
idea about modern versions but the web says "Developer tab, in the Code
group, click Macros") and then run through the process once. Stop recording
after you have saved the XML file, then open the VBA editor and look through
your recorded code.

Doing that for saving a normal workbook as text gives me this (after I
removed unnecessary comments, named arguments, and unneeded args):
Sub Macro1()
Workbooks.Open "E:\MS download links.xls"
ActiveWorkbook.SaveAs "E:\MS download links.txt", xlText
End Sub

You should be able to do something similar for PDF-XML.

To do all PDFs in a given directory, you could do something like this:
Sub convertPdfToXml()
cd "C:\path\to\the\PDF files"
fPdf = Dir("*.pdf")
While Len(fPdf)
Workbooks.Open fPdf
fXml = Replace(fPdf, ".pdf", ".xlsx", Compa=vbTextCompare)
ActiveWorkbook.SaveAs fXml, FileFormat:=xlOpenXMLWorkbook
ActiveWorkbook.Close
fPdf = Dir
Wend
End Sub

This *might* work for you IF you do the following:
- fix the path containing the PDF files ("C:\path\to\the\PDF files")
- confirm the "Workbooks.Open" line works as-is
- confirm the extension of the XML files (I used ".xlsx"; change as needed)
- check the "ActiveWorkbook.SaveAs" FileFormat (maybe xlXMLSpreadsheet?)

Since I cannot test this, it is up to you... but this is at least a start in
the right direction.

--
Things do not get better as they get older.
Look at your truck.
Look at your roof.
Look in the MIRROR!



On Friday, December 02, 2011 12:00 PM Peter Jamieson wrote:


Hi Auric__,
Thx for your suggestions.
Unfortunately the: Workbooks.Open fPdf only gets what looks like some
formatting
syntax from the pdf file, none of the text.
I tried this suggestion of yours in Excel 2007 and 2003.

Your other idea to record a macro did not work: the code got to the
directory OK
but could not open the pdf so no progress.
Appreciate you help, Cheers, Peter



On Friday, December 02, 2011 4:06 PM GS wrote:


Peter Jamieson presented the following explanation :

Why do you think Excel can open a PDF file? I'd think you need a
software that is able to convert a PDF to your chosen format if capable
of doing so. The structure of a PDF is not something Excel or VBA can
handle same as it would a text file!

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc



On Friday, December 02, 2011 7:45 PM Peter Jamieson wrote:


Thx Garry,
Yes I am aware of that and have used various pdf to xl converters but
none of them were able to perform well enough for what I need.
I am not trying to open the pdf in Excel directly, I am trying to use VBA to
manipulate the existing manual conversion process.
When I manually convert the pdf with "Save As Tables in Excel Spreadsheet"
(within the adobe exec) the resultant .xml files are then easily parsed by
my
VBA code, far more accurately than any of the bought or free pdf to excel
converters I have tried.
An alternative is maybe Perl or AutoIt but I was hoping to keep everything
within the Excel/VBA project.
Anyway, thanks for your interest, cheers, Peter