Home |
Search |
Today's Posts |
#10
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
On Sat, 7 Feb 2009 07:34:41 -0800 (PST), Akrobrat
wrote: Ron, Sorry, by error I meant that there wasn't a match for either of the examples. In fact, the last Pattern suggestion stalled Excel / VBA. I do have a reference set to MS VBScript Regex 5.5, and I don't know if this matters much, but I use a Windows XP SP2 machine. Instead of the posting the contents of strPageContent (very long), here is the URL to the webpage that it contains: http://www.bestbuy.com/site/olspage....egories&ks=960 I'll be AWOL for the next 36 hours but I appreciate your help! I don't believe the problem is in your regex pattern. I obtained the data by going to the above URL, executing the View Source command in FireFox; copying that data, and then processing it within VBA. The expected URL was returned. (I had to do the extraction entirely within VBA as the page source is too long to fit into an Excel cell. The following worked fine, after copying the data to the clipboard and also setting a reference to Microsoft Forms 2.0 Object Library: ============================= Option Explicit Sub GetClipboardData() Dim PageSource As String Dim myData As DataObject Set myData = New DataObject myData.GetFromClipboard PageSource = myData.GetText Debug.Print MovieURL(PageSource) End Sub '--------------------------------------- Function MovieURL(str As String) As String Dim myRegExp, myMatches, ResultString Set myRegExp = New RegExp myRegExp.MultiLine = True myRegExp.IgnoreCase = True myRegExp.Global = False myRegExp.Pattern = "<a\s*href=""([^""]+)""\s*class=""prodlink""" Set myMatches = myRegExp.Execute(str) If myMatches.Count = 1 Then MovieURL = myMatches(0).SubMatches(0) End If End Function ============================== Of interest, in the particular page you sent me to, there are 11 different URL snippets that meet your criteria of ending with classid="prodlink" However, unlike your example, all of them include this string in the middle: ;jsessionid=BGJY2UTT13ORTKC4D3IFAGA Also, they differ by the skuid. I'm guessing you don't want the jsessionid information. A slight change will exclude that from the result: ============================================== Option Explicit Sub GetClipboardData() Dim PageSource As String Dim myData As DataObject Set myData = New DataObject myData.GetFromClipboard PageSource = myData.GetText Debug.Print MovieURL(PageSource) End Sub '----------------------------------- Function MovieURL(str As String) As String Dim myRegExp As RegExp Dim myMatches As MatchCollection Set myRegExp = New RegExp myRegExp.IgnoreCase = True myRegExp.Pattern = _ """([^""]+)(;jsessionid[^""]+)(\?[^""]+)(?=""\s*class=""prodlink"")" If myRegExp.Test(str) = True Then Set myMatches = myRegExp.Execute(str) MovieURL = myMatches(0).SubMatches(0) & _ myMatches(0).SubMatches(2) End If End Function ==================================== --ron |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
Get rid of with regular expressions | Excel Discussion (Misc queries) | |||
Regular expressions | Excel Programming | |||
Using Regular Expressions with VBA | Excel Programming | |||
Regular expressions | Excel Programming | |||
VBA and Regular expressions | Excel Programming |