View Single Post
  #2   Report Post  
Posted to microsoft.public.excel.programming
Auric__ Auric__ is offline
external usenet poster
 
Posts: 538
Default use vba to open pdf then save as xml

Peter Jamieson wrote:

I can open manually my pdf file then choose: File- Save As- Tables in
Excel Spreadsheet (*xml) from the drop-down list.
After this I can open this file in Excel and use some VBA to extract
what I need.
The above works manually and as I now receive a bunch of pdf's every
couple of days it would
be nice to automate in VBA the creation of the xml files from the pdf's
if possible

So far I have the following code that runs in VBA and opens the pdf file
but I am
unable to figure out the next step of how I can now save it as the xml
file. Any help appreciated!


Jesus. You need to take Outlook Express out back and shoot it.

Option Explicit
Declare Function ShellExecute Lib "shell32dll" Alias "ShellExecuteA" ( _
ByVal hwnd As Long, _
ByVal lpOperation as String, _
ByVal lpFile as String, _
ByVal lpParameters as String, _
ByVal lpDirectory as String, _
ByVal nShowCmd as Long) As Long

Const SW_HIDE As Long = 0
Const SW_SHOWNORMAL As Long = 1
Const SW_SHOWMAXIMIZED As Long = 3
Const SW_SHOWMINIMIZED As Long = 2

sub pdf_to_xml()
ShellExecute Application.hwnd "open", "C:\temp\Test.pdf",
vbNullString,
"C:\", SW_SHOWNORMAL
End Sub


Do you know anything about programming? The above looks like it was copy-
pasted from elsewhere with little understanding of its purpose. Still...


(Caveat: I'm using Excel 2000, which doesn't support PDF or XML. Some of my
info I had to get off MSDN.)

The easiest way to see how to automate this would be to find the "record
macro" menu item (in 2000, it's Tools - Macro - Record New Macro...; no
idea about modern versions but the web says "Developer tab, in the Code
group, click Macros") and then run through the process once. Stop recording
after you've saved the XML file, then open the VBA editor and look through
your recorded code.

Doing that for saving a normal workbook as text gives me this (after I
removed unnecessary comments, named arguments, and unneeded args):
Sub Macro1()
Workbooks.Open "E:\MS download links.xls"
ActiveWorkbook.SaveAs "E:\MS download links.txt", xlText
End Sub

You should be able to do something similar for PDF-XML.

To do all PDFs in a given directory, you could do something like this:
Sub convertPdfToXml()
cd "C:\path\to\the\PDF files"
fPdf = Dir("*.pdf")
While Len(fPdf)
Workbooks.Open fPdf
fXml = Replace(fPdf, ".pdf", ".xlsx", Compa=vbTextCompare)
ActiveWorkbook.SaveAs fXml, FileFormat:=xlOpenXMLWorkbook
ActiveWorkbook.Close
fPdf = Dir
Wend
End Sub

This *might* work for you IF you do the following:
- fix the path containing the PDF files ("C:\path\to\the\PDF files")
- confirm the "Workbooks.Open" line works as-is
- confirm the extension of the XML files (I used ".xlsx"; change as needed)
- check the "ActiveWorkbook.SaveAs" FileFormat (maybe xlXMLSpreadsheet?)

Since I can't test this, it's up to you... but this is at least a start in
the right direction.

--
Things don't get better as they get older.
Look at your truck.
Look at your roof.
Look in the MIRROR!