Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 9
Default use vba to open pdf then save as xml

I can open manually my pdf file then choose: File- Save As- Tables in
Excel Spreadsheet (*xml) from the drop-down list.
After this I can open this file in Excel and use some VBA to extract what I
need.
The above works manually and as I now receive a bunch of pdf's every couple
of days it would
be nice to automate in VBA the creation of the xml files from the pdf's if
possible

So far I have the following code that runs in VBA and opens the pdf file but
I am
unable to figure out the next step of how I can now save it as the xml file.
Any help appreciated!
Cheers, Peter

Option Explicit
Declare Function ShellExecute Lib "shell32dll" Alias "ShellExecuteA" ( _
ByVal hwnd As Long, _
ByVal lpOperation as String, _
ByVal lpFile as String, _
ByVal lpParameters as String, _
ByVal lpDirectory as String, _
ByVal nShowCmd as Long) As Long

Const SW_HIDE As Long = 0
Const SW_SHOWNORMAL As Long = 1
Const SW_SHOWMAXIMIZED As Long = 3
Const SW_SHOWMINIMIZED As Long = 2

sub pdf_to_xml()
ShellExecute Application.hwnd "open", "C:\temp\Test.pdf", vbNullString,
"C:\", SW_SHOWNORMAL
End Sub


  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 538
Default use vba to open pdf then save as xml

Peter Jamieson wrote:

I can open manually my pdf file then choose: File- Save As- Tables in
Excel Spreadsheet (*xml) from the drop-down list.
After this I can open this file in Excel and use some VBA to extract
what I need.
The above works manually and as I now receive a bunch of pdf's every
couple of days it would
be nice to automate in VBA the creation of the xml files from the pdf's
if possible

So far I have the following code that runs in VBA and opens the pdf file
but I am
unable to figure out the next step of how I can now save it as the xml
file. Any help appreciated!


Jesus. You need to take Outlook Express out back and shoot it.

Option Explicit
Declare Function ShellExecute Lib "shell32dll" Alias "ShellExecuteA" ( _
ByVal hwnd As Long, _
ByVal lpOperation as String, _
ByVal lpFile as String, _
ByVal lpParameters as String, _
ByVal lpDirectory as String, _
ByVal nShowCmd as Long) As Long

Const SW_HIDE As Long = 0
Const SW_SHOWNORMAL As Long = 1
Const SW_SHOWMAXIMIZED As Long = 3
Const SW_SHOWMINIMIZED As Long = 2

sub pdf_to_xml()
ShellExecute Application.hwnd "open", "C:\temp\Test.pdf",
vbNullString,
"C:\", SW_SHOWNORMAL
End Sub


Do you know anything about programming? The above looks like it was copy-
pasted from elsewhere with little understanding of its purpose. Still...


(Caveat: I'm using Excel 2000, which doesn't support PDF or XML. Some of my
info I had to get off MSDN.)

The easiest way to see how to automate this would be to find the "record
macro" menu item (in 2000, it's Tools - Macro - Record New Macro...; no
idea about modern versions but the web says "Developer tab, in the Code
group, click Macros") and then run through the process once. Stop recording
after you've saved the XML file, then open the VBA editor and look through
your recorded code.

Doing that for saving a normal workbook as text gives me this (after I
removed unnecessary comments, named arguments, and unneeded args):
Sub Macro1()
Workbooks.Open "E:\MS download links.xls"
ActiveWorkbook.SaveAs "E:\MS download links.txt", xlText
End Sub

You should be able to do something similar for PDF-XML.

To do all PDFs in a given directory, you could do something like this:
Sub convertPdfToXml()
cd "C:\path\to\the\PDF files"
fPdf = Dir("*.pdf")
While Len(fPdf)
Workbooks.Open fPdf
fXml = Replace(fPdf, ".pdf", ".xlsx", Compa=vbTextCompare)
ActiveWorkbook.SaveAs fXml, FileFormat:=xlOpenXMLWorkbook
ActiveWorkbook.Close
fPdf = Dir
Wend
End Sub

This *might* work for you IF you do the following:
- fix the path containing the PDF files ("C:\path\to\the\PDF files")
- confirm the "Workbooks.Open" line works as-is
- confirm the extension of the XML files (I used ".xlsx"; change as needed)
- check the "ActiveWorkbook.SaveAs" FileFormat (maybe xlXMLSpreadsheet?)

Since I can't test this, it's up to you... but this is at least a start in
the right direction.

--
Things don't get better as they get older.
Look at your truck.
Look at your roof.
Look in the MIRROR!
  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 9
Default use vba to open pdf then save as xml

Hi Auric__,
Thx for your suggestions.
Unfortunately the: Workbooks.Open fPdf only gets what looks like some
formatting
syntax from the pdf file, none of the text.
I tried this suggestion of yours in Excel 2007 and 2003.

Your other idea to record a macro did not work: the code got to the
directory OK
but could not open the pdf so no progress.
Appreciate you help, Cheers, Peter



"Auric__" wrote in message
44.100...
Peter Jamieson wrote:

I can open manually my pdf file then choose: File- Save As- Tables in
Excel Spreadsheet (*xml) from the drop-down list.
After this I can open this file in Excel and use some VBA to extract
what I need.
The above works manually and as I now receive a bunch of pdf's every
couple of days it would
be nice to automate in VBA the creation of the xml files from the pdf's
if possible

So far I have the following code that runs in VBA and opens the pdf file
but I am
unable to figure out the next step of how I can now save it as the xml
file. Any help appreciated!


Jesus. You need to take Outlook Express out back and shoot it.

Option Explicit
Declare Function ShellExecute Lib "shell32dll" Alias "ShellExecuteA" ( _
ByVal hwnd As Long, _
ByVal lpOperation as String, _
ByVal lpFile as String, _
ByVal lpParameters as String, _
ByVal lpDirectory as String, _
ByVal nShowCmd as Long) As Long

Const SW_HIDE As Long = 0
Const SW_SHOWNORMAL As Long = 1
Const SW_SHOWMAXIMIZED As Long = 3
Const SW_SHOWMINIMIZED As Long = 2

sub pdf_to_xml()
ShellExecute Application.hwnd "open", "C:\temp\Test.pdf",
vbNullString,
"C:\", SW_SHOWNORMAL
End Sub


Do you know anything about programming? The above looks like it was copy-
pasted from elsewhere with little understanding of its purpose. Still...


(Caveat: I'm using Excel 2000, which doesn't support PDF or XML. Some of
my
info I had to get off MSDN.)

The easiest way to see how to automate this would be to find the "record
macro" menu item (in 2000, it's Tools - Macro - Record New Macro...; no
idea about modern versions but the web says "Developer tab, in the Code
group, click Macros") and then run through the process once. Stop
recording
after you've saved the XML file, then open the VBA editor and look through
your recorded code.

Doing that for saving a normal workbook as text gives me this (after I
removed unnecessary comments, named arguments, and unneeded args):
Sub Macro1()
Workbooks.Open "E:\MS download links.xls"
ActiveWorkbook.SaveAs "E:\MS download links.txt", xlText
End Sub

You should be able to do something similar for PDF-XML.

To do all PDFs in a given directory, you could do something like this:
Sub convertPdfToXml()
cd "C:\path\to\the\PDF files"
fPdf = Dir("*.pdf")
While Len(fPdf)
Workbooks.Open fPdf
fXml = Replace(fPdf, ".pdf", ".xlsx", Compa=vbTextCompare)
ActiveWorkbook.SaveAs fXml, FileFormat:=xlOpenXMLWorkbook
ActiveWorkbook.Close
fPdf = Dir
Wend
End Sub

This *might* work for you IF you do the following:
- fix the path containing the PDF files ("C:\path\to\the\PDF files")
- confirm the "Workbooks.Open" line works as-is
- confirm the extension of the XML files (I used ".xlsx"; change as
needed)
- check the "ActiveWorkbook.SaveAs" FileFormat (maybe xlXMLSpreadsheet?)

Since I can't test this, it's up to you... but this is at least a start in
the right direction.

--
Things don't get better as they get older.
Look at your truck.
Look at your roof.
Look in the MIRROR!



  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default use vba to open pdf then save as xml

Peter Jamieson presented the following explanation :
I can open manually my pdf file then choose: File- Save As- Tables in Excel
Spreadsheet (*xml) from the drop-down list.
After this I can open this file in Excel and use some VBA to extract what I
need.
The above works manually and as I now receive a bunch of pdf's every couple
of days it would
be nice to automate in VBA the creation of the xml files from the pdf's if
possible

So far I have the following code that runs in VBA and opens the pdf file but
I am
unable to figure out the next step of how I can now save it as the xml file.
Any help appreciated!
Cheers, Peter

Option Explicit
Declare Function ShellExecute Lib "shell32dll" Alias "ShellExecuteA" ( _
ByVal hwnd As Long, _
ByVal lpOperation as String, _
ByVal lpFile as String, _
ByVal lpParameters as String, _
ByVal lpDirectory as String, _
ByVal nShowCmd as Long) As Long

Const SW_HIDE As Long = 0
Const SW_SHOWNORMAL As Long = 1
Const SW_SHOWMAXIMIZED As Long = 3
Const SW_SHOWMINIMIZED As Long = 2

sub pdf_to_xml()
ShellExecute Application.hwnd "open", "C:\temp\Test.pdf", vbNullString,
"C:\", SW_SHOWNORMAL
End Sub


Why do you think Excel can open a PDF file? I'd think you need a
software that is able to convert a PDF to your chosen format if capable
of doing so. The structure of a PDF isn't something Excel or VBA can
handle same as it would a text file!

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 9
Default use vba to open pdf then save as xml

Thx Garry,
Yes I am aware of that and have used various pdf to xl converters but
none of them were able to perform well enough for what I need.
I am not trying to open the pdf in Excel directly, I am trying to use VBA to
manipulate the existing manual conversion process.
When I manually convert the pdf with "Save As Tables in Excel Spreadsheet"
(within the adobe exec) the resultant .xml files are then easily parsed by
my
VBA code, far more accurately than any of the bought or free pdf to excel
converters I have tried.
An alternative is maybe Perl or AutoIt but I was hoping to keep everything
within the Excel/VBA project.
Anyway, thanks for your interest, cheers, Peter


"GS" wrote in message ...
Peter Jamieson presented the following explanation :
I can open manually my pdf file then choose: File- Save As- Tables in
Excel Spreadsheet (*xml) from the drop-down list.
After this I can open this file in Excel and use some VBA to extract what
I need.
The above works manually and as I now receive a bunch of pdf's every
couple of days it would
be nice to automate in VBA the creation of the xml files from the pdf's
if possible

So far I have the following code that runs in VBA and opens the pdf file
but I am
unable to figure out the next step of how I can now save it as the xml
file.
Any help appreciated!
Cheers, Peter

Option Explicit
Declare Function ShellExecute Lib "shell32dll" Alias "ShellExecuteA" ( _
ByVal hwnd As Long, _
ByVal lpOperation as String, _
ByVal lpFile as String, _
ByVal lpParameters as String, _
ByVal lpDirectory as String, _
ByVal nShowCmd as Long) As Long

Const SW_HIDE As Long = 0
Const SW_SHOWNORMAL As Long = 1
Const SW_SHOWMAXIMIZED As Long = 3
Const SW_SHOWMINIMIZED As Long = 2

sub pdf_to_xml()
ShellExecute Application.hwnd "open", "C:\temp\Test.pdf", vbNullString,
"C:\", SW_SHOWNORMAL
End Sub


Why do you think Excel can open a PDF file? I'd think you need a software
that is able to convert a PDF to your chosen format if capable of doing
so. The structure of a PDF isn't something Excel or VBA can handle same as
it would a text file!

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc






  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1
Default use vba to open pdf then save as xml

Peter,

Have you had any luck with this? I have a hobby of collecting acronyms for a new job I have taken have created an excel spreadsheet that will alphabetize and show the definition of an acronym (if I already have it) or I can post a new acronym or jump to any particular spot in my 'database'. But I am looking to make a leap to being able to open acronym containing files (pdfs, csvs, the like), extract all the new info into my database.

Sincerely,
Dave

On Friday, December 02, 2011 1:34 AM Peter Jamieson wrote:


I can open manually my pdf file then choose: File- Save As- Tables in
Excel Spreadsheet (*xml) from the drop-down list.
After this I can open this file in Excel and use some VBA to extract what I
need.
The above works manually and as I now receive a bunch of pdf's every couple
of days it would
be nice to automate in VBA the creation of the xml files from the pdf's if
possible

So far I have the following code that runs in VBA and opens the pdf file but
I am
unable to figure out the next step of how I can now save it as the xml file.
Any help appreciated!
Cheers, Peter

Option Explicit
Declare Function ShellExecute Lib "shell32dll" Alias "ShellExecuteA" ( _
ByVal hwnd As Long, _
ByVal lpOperation as String, _
ByVal lpFile as String, _
ByVal lpParameters as String, _
ByVal lpDirectory as String, _
ByVal nShowCmd as Long) As Long

Const SW_HIDE As Long = 0
Const SW_SHOWNORMAL As Long = 1
Const SW_SHOWMAXIMIZED As Long = 3
Const SW_SHOWMINIMIZED As Long = 2

sub pdf_to_xml()
ShellExecute Application.hwnd "open", "C:\temp\Test.pdf", vbNullString,
"C:\", SW_SHOWNORMAL
End Sub



On Friday, December 02, 2011 8:22 AM Auric__ wrote:


Peter Jamieson wrote:


Jesus. You need to take Outlook Express out back and shoot it.


Do you know anything about programming? The above looks like it was copy-
pasted from elsewhere with little understanding of its purpose. Still...


(Caveat: I am using Excel 2000, which does not support PDF or XML. Some of my
info I had to get off MSDN.)

The easiest way to see how to automate this would be to find the "record
macro" menu item (in 2000, it is Tools - Macro - Record New Macro...; no
idea about modern versions but the web says "Developer tab, in the Code
group, click Macros") and then run through the process once. Stop recording
after you have saved the XML file, then open the VBA editor and look through
your recorded code.

Doing that for saving a normal workbook as text gives me this (after I
removed unnecessary comments, named arguments, and unneeded args):
Sub Macro1()
Workbooks.Open "E:\MS download links.xls"
ActiveWorkbook.SaveAs "E:\MS download links.txt", xlText
End Sub

You should be able to do something similar for PDF-XML.

To do all PDFs in a given directory, you could do something like this:
Sub convertPdfToXml()
cd "C:\path\to\the\PDF files"
fPdf = Dir("*.pdf")
While Len(fPdf)
Workbooks.Open fPdf
fXml = Replace(fPdf, ".pdf", ".xlsx", Compa=vbTextCompare)
ActiveWorkbook.SaveAs fXml, FileFormat:=xlOpenXMLWorkbook
ActiveWorkbook.Close
fPdf = Dir
Wend
End Sub

This *might* work for you IF you do the following:
- fix the path containing the PDF files ("C:\path\to\the\PDF files")
- confirm the "Workbooks.Open" line works as-is
- confirm the extension of the XML files (I used ".xlsx"; change as needed)
- check the "ActiveWorkbook.SaveAs" FileFormat (maybe xlXMLSpreadsheet?)

Since I cannot test this, it is up to you... but this is at least a start in
the right direction.

--
Things do not get better as they get older.
Look at your truck.
Look at your roof.
Look in the MIRROR!



On Friday, December 02, 2011 12:00 PM Peter Jamieson wrote:


Hi Auric__,
Thx for your suggestions.
Unfortunately the: Workbooks.Open fPdf only gets what looks like some
formatting
syntax from the pdf file, none of the text.
I tried this suggestion of yours in Excel 2007 and 2003.

Your other idea to record a macro did not work: the code got to the
directory OK
but could not open the pdf so no progress.
Appreciate you help, Cheers, Peter



On Friday, December 02, 2011 4:06 PM GS wrote:


Peter Jamieson presented the following explanation :

Why do you think Excel can open a PDF file? I'd think you need a
software that is able to convert a PDF to your chosen format if capable
of doing so. The structure of a PDF is not something Excel or VBA can
handle same as it would a text file!

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc



On Friday, December 02, 2011 7:45 PM Peter Jamieson wrote:


Thx Garry,
Yes I am aware of that and have used various pdf to xl converters but
none of them were able to perform well enough for what I need.
I am not trying to open the pdf in Excel directly, I am trying to use VBA to
manipulate the existing manual conversion process.
When I manually convert the pdf with "Save As Tables in Excel Spreadsheet"
(within the adobe exec) the resultant .xml files are then easily parsed by
my
VBA code, far more accurately than any of the bought or free pdf to excel
converters I have tried.
An alternative is maybe Perl or AutoIt but I was hoping to keep everything
within the Excel/VBA project.
Anyway, thanks for your interest, cheers, Peter




  #7   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1
Default use vba to open pdf then save as xml

Peter Jamieson wrote on 12/02/2011 01:34 ET :
I can open manually my pdf file then choose: File- Save As- Tables in
Excel Spreadsheet (*xml) from the drop-down list.
After this I can open this file in Excel and use some VBA to extract what I
need.
The above works manually and as I now receive a bunch of pdf's every couple
of days it would
be nice to automate in VBA the creation of the xml files from the pdf's if
possible

So far I have the following code that runs in VBA and opens the pdf file but
I am
unable to figure out the next step of how I can now save it as the xml file.
Any help appreciated!
Cheers, Peter

Option Explicit
Declare Function ShellExecute Lib "shell32dll" Alias
"ShellExecuteA" ( _
ByVal hwnd As Long, _
ByVal lpOperation as String, _
ByVal lpFile as String, _
ByVal lpParameters as String, _
ByVal lpDirectory as String, _
ByVal nShowCmd as Long) As Long

Const SW_HIDE As Long = 0
Const SW_SHOWNORMAL As Long = 1
Const SW_SHOWMAXIMIZED As Long = 3
Const SW_SHOWMINIMIZED As Long = 2

sub pdf_to_xml()
ShellExecute Application.hwnd "open", "C:tempTest.pdf",
vbNullString,
"C:", SW_SHOWNORMAL
End Sub

did you find the solution,

I am too trying to do the same.....inserting a pdf than I need the output as a
single xml file.

Regards..
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Looping Open and Save as CSV Brad[_26_] Excel Programming 9 May 27th 10 01:27 AM
whenever open or save it is very slow to open Nav J Excel Discussion (Misc queries) 1 August 13th 09 02:32 PM
Open-Save Strategy Nigel[_2_] Excel Programming 2 October 10th 07 03:54 PM
Open and Save XML Eric Excel Discussion (Misc queries) 0 May 14th 05 12:28 AM
open & save old xls file quince Excel Programming 1 August 31st 04 12:36 AM


All times are GMT +1. The time now is 02:59 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"