LinkBack Thread Tools Search this Thread Display Modes
Prev Previous Post   Next Post Next
  #10   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default VBA Regular Expressions & URL Extraction

On Sat, 7 Feb 2009 07:34:41 -0800 (PST), Akrobrat
wrote:

Ron,

Sorry, by error I meant that there wasn't a match for either of the
examples. In fact, the last Pattern suggestion stalled Excel / VBA.

I do have a reference set to MS VBScript Regex 5.5, and I don't know
if this matters much, but I use a Windows XP SP2 machine.
Instead of the posting the contents of strPageContent (very long),
here is the URL to the webpage that it contains:
http://www.bestbuy.com/site/olspage....egories&ks=960

I'll be AWOL for the next 36 hours but I appreciate your help!


I don't believe the problem is in your regex pattern.

I obtained the data by going to the above URL, executing the View Source
command in FireFox; copying that data, and then processing it within VBA. The
expected URL was returned.

(I had to do the extraction entirely within VBA as the page source is too long
to fit into an Excel cell.

The following worked fine, after copying the data to the clipboard and also
setting a reference to Microsoft Forms 2.0 Object Library:

=============================
Option Explicit
Sub GetClipboardData()
Dim PageSource As String
Dim myData As DataObject
Set myData = New DataObject
myData.GetFromClipboard
PageSource = myData.GetText
Debug.Print MovieURL(PageSource)
End Sub
'---------------------------------------
Function MovieURL(str As String) As String
Dim myRegExp, myMatches, ResultString
Set myRegExp = New RegExp
myRegExp.MultiLine = True
myRegExp.IgnoreCase = True
myRegExp.Global = False
myRegExp.Pattern = "<a\s*href=""([^""]+)""\s*class=""prodlink"""
Set myMatches = myRegExp.Execute(str)
If myMatches.Count = 1 Then
MovieURL = myMatches(0).SubMatches(0)
End If
End Function
==============================

Of interest, in the particular page you sent me to, there are 11 different URL
snippets that meet your criteria of ending with classid="prodlink"

However, unlike your example, all of them include this string in the middle:

;jsessionid=BGJY2UTT13ORTKC4D3IFAGA

Also, they differ by the skuid.

I'm guessing you don't want the jsessionid information.

A slight change will exclude that from the result:

==============================================
Option Explicit
Sub GetClipboardData()
Dim PageSource As String
Dim myData As DataObject
Set myData = New DataObject
myData.GetFromClipboard

PageSource = myData.GetText

Debug.Print MovieURL(PageSource)
End Sub
'-----------------------------------
Function MovieURL(str As String) As String
Dim myRegExp As RegExp
Dim myMatches As MatchCollection

Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Pattern = _
"""([^""]+)(;jsessionid[^""]+)(\?[^""]+)(?=""\s*class=""prodlink"")"

If myRegExp.Test(str) = True Then
Set myMatches = myRegExp.Execute(str)
MovieURL = myMatches(0).SubMatches(0) & _
myMatches(0).SubMatches(2)
End If
End Function
====================================
--ron


 
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Get rid of with regular expressions Howdy Excel Discussion (Misc queries) 1 January 18th 10 07:42 PM
Regular expressions Dave Excel Programming 5 September 20th 07 02:46 PM
Using Regular Expressions with VBA Andrew Hall NZ Excel Programming 5 November 21st 06 09:30 PM
Regular expressions JeffMelton Excel Programming 2 March 1st 06 12:52 AM
VBA and Regular expressions Friedrich Muecke Excel Programming 3 October 3rd 03 01:46 AM


All times are GMT +1. The time now is 07:35 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright ©2004-2025 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"