Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 10
Default VBA Regular Expressions & URL Extraction

Greetings all,

I am trying to extract the URLs of a set of animated movies off
various sites using regular expressions and then dump those URLs into
an Excel document (via VBA). I have a decent grasp of regex but I
have hit a brick wall lately with a particular site. I have
experimented with a number of patterns but cannot yet get the correct
result.

The expected result is:
/site/olspage.jsp?skuId=8936896&st=Transformers+Wide screen&type=product&id=1754542


However, if I do get a non-null result back, it is usually:
http://www.bestbuy.com/site/olspage....ry&id=cat00000



---------------------- Sample Patterns Tested:
----------------------
..Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?prodlink\W? "
..Pattern = "\<a\s+href=""([A-Za-z0-9/;&\.\?\+-=]+)""\s+class"
..Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?\w\W?"



---------------------- Partial Source Data (from website):
----------------------

<div class="logo"
<a href="http://www.bestbuy.com/site/olspage.jsp?
type=category&id=cat00000" name="&lid=hdr_logo"<img src="http://
images.bestbuy.com:80/BestBuy_US/en_US/images/global/header/logo.gif"
alt="Best Buy Logo"/</a
</div

<td class="skucontent"

<a href="/site/olspage.jsp?skuId=8936896&amp;st=Transformers
+Widescreen&amp;type=product&amp;id=1754542" class="prodlink"
Transformers - Widescreen Dubbed Subtitle AC3</a<br/

---------------------- ---------------------- ----------------------

I'm most interested in utilizing the [class="prodlink"] string as this
is the tag that labels a movie URL. I know that regex in VBA can be a
bit tricky owing to the use of double quotes and other non-alpha
characters, but can any of you guys spot what I'm doing wrong? Thanks
for your help!

  #2   Report Post  
Posted to microsoft.public.excel.programming
jaf jaf is offline
external usenet poster
 
Posts: 300
Default VBA Regular Expressions & URL Extraction

Hi,
If.
Let me stress that.
IF this were a hyperlink on a webpage "/site/olspage.jsp" would be pointing to the subfolder "/site" and the file "olspage.jsp" on
the WEBSERVER and would function.

As a hyperlink in Excel this will not work without the http://www. unless you have a folder on your hard drive named "/site" and
Excel can find it without the rest of the path.

John


"Akrobrat" wrote in message ...
Greetings all,

I am trying to extract the URLs of a set of animated movies off
various sites using regular expressions and then dump those URLs into
an Excel document (via VBA). I have a decent grasp of regex but I
have hit a brick wall lately with a particular site. I have
experimented with a number of patterns but cannot yet get the correct
result.

The expected result is:
/site/olspage.jsp?skuId=8936896&amp;st=Transformers+Wide screen&amp;type=product&amp;id=1754542


However, if I do get a non-null result back, it is usually:
http://www.bestbuy.com/site/olspage....ry&id=cat00000



---------------------- Sample Patterns Tested:
----------------------
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?prodlink\W? "
.Pattern = "\<a\s+href=""([A-Za-z0-9/;&\.\?\+-=]+)""\s+class"
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?\w\W?"



---------------------- Partial Source Data (from website):
----------------------

<div class="logo"
<a href="http://www.bestbuy.com/site/olspage.jsp?
type=category&id=cat00000" name="&lid=hdr_logo"<img src="http://
images.bestbuy.com:80/BestBuy_US/en_US/images/global/header/logo.gif"
alt="Best Buy Logo"/</a
</div

<td class="skucontent"

<a href="/site/olspage.jsp?skuId=8936896&amp;st=Transformers
+Widescreen&amp;type=product&amp;id=1754542" class="prodlink"
Transformers - Widescreen Dubbed Subtitle AC3</a<br/

---------------------- ---------------------- ----------------------

I'm most interested in utilizing the [class="prodlink"] string as this
is the tag that labels a movie URL. I know that regex in VBA can be a
bit tricky owing to the use of double quotes and other non-alpha
characters, but can any of you guys spot what I'm doing wrong? Thanks
for your help!


  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 10
Default VBA Regular Expressions & URL Extraction

Jaf,

Good point. I do indeed concatenate "http://..." with the partial URL
and that works provided that I extract the correct URL. Thanks.

- Dilan
  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default VBA Regular Expressions & URL Extraction

On Fri, 6 Feb 2009 13:20:31 -0800 (PST), Akrobrat
wrote:

Greetings all,

I am trying to extract the URLs of a set of animated movies off
various sites using regular expressions and then dump those URLs into
an Excel document (via VBA). I have a decent grasp of regex but I
have hit a brick wall lately with a particular site. I have
experimented with a number of patterns but cannot yet get the correct
result.

The expected result is:
/site/olspage.jsp?skuId=8936896&amp;st=Transformers+Wide screen&amp;type=product&amp;id=1754542


However, if I do get a non-null result back, it is usually:
http://www.bestbuy.com/site/olspage....ry&id=cat00000



---------------------- Sample Patterns Tested:
----------------------
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?prodlink\W? "
.Pattern = "\<a\s+href=""([A-Za-z0-9/;&\.\?\+-=]+)""\s+class"
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?\w\W?"



---------------------- Partial Source Data (from website):
----------------------

<div class="logo"
<a href="http://www.bestbuy.com/site/olspage.jsp?
type=category&id=cat00000" name="&lid=hdr_logo"<img src="http://
images.bestbuy.com:80/BestBuy_US/en_US/images/global/header/logo.gif"
alt="Best Buy Logo"/</a
</div

<td class="skucontent"

<a href="/site/olspage.jsp?skuId=8936896&amp;st=Transformers
+Widescreen&amp;type=product&amp;id=1754542" class="prodlink"
Transformers - Widescreen Dubbed Subtitle AC3</a<br/

---------------------- ---------------------- ----------------------

I'm most interested in utilizing the [class="prodlink"] string as this
is the tag that labels a movie URL. I know that regex in VBA can be a
bit tricky owing to the use of double quotes and other non-alpha
characters, but can any of you guys spot what I'm doing wrong? Thanks
for your help!



Try this:

===================
Function MovieURL(str As String) As String
Dim myRegExp, myMatches, ResultString
Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Pattern = "href=""([^""]+)""\s*class=""prodlink"""
Set myMatches = myRegExp.Execute(str)
If myMatches.Count = 1 Then
MovieURL = myMatches(0).SubMatches(0)
End If
End Function
=====================
--ron
  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default VBA Regular Expressions & URL Extraction

On Fri, 06 Feb 2009 20:55:05 -0500, Ron Rosenfeld
wrote:

===================
Function MovieURL(str As String) As String
Dim myRegExp, myMatches, ResultString
Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Pattern = "href=""([^""]+)""\s*class=""prodlink"""
Set myMatches = myRegExp.Execute(str)
If myMatches.Count = 1 Then
MovieURL = myMatches(0).SubMatches(0)
End If
End Function
=====================


A bit simpler, but works on your example:

===================
Option Explicit
Function MovieURL(str As String) As String
Dim myRegExp As Object, myMatches As Object
Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.Pattern = "[^""]+(?=""\s*class=""prodlink"")"
If myRegExp.Test(str) = True Then
Set myMatches = myRegExp.Execute(str)
MovieURL = myMatches(0).Value
End If
End Function
=====================

By the way, both of my variants depend on there not being quotes (") within the
extracted portion.
--ron


  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default VBA Regular Expressions & URL Extraction

On Fri, 6 Feb 2009 13:20:31 -0800 (PST), Akrobrat
wrote:

---------------------- Sample Patterns Tested:
----------------------
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?prodlink\W? "
.Pattern = "\<a\s+href=""([A-Za-z0-9/;&\.\?\+-=]+)""\s+class"
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?\w\W?"


I did not tease this apart, but:

To include " in VBA, you need to "double" them.

The angle bracket is not a meta-character and the initial "\" should be
removed. (Unless you are trying to match a beginning of word boundary, in
which case you could use "\b(?=\w)" )

Dot will not match newline in VBScript or JavaScript flavors. Since your data
extends over several lines, it is likely that your "." is not doing what you
expect.
--ron
  #7   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 10
Default VBA Regular Expressions & URL Extraction

Ron,

Thanks for the info. As you suggested, I tried:
..Pattern = "href=""([^""]+)""\s*class=""prodlink"""
.... but it didn't work. I too suspected the dot might be a source of
trouble, but I've implemented it successfully on other movie sites.

So that I'm not providing false assumptions, here is my code in
context. Note that I typically ignore the case and avoid a global
search because I only want the first accurate match - which is my core
problem. How do I get the regex pattern to match the HREF tag when it
contains the CLASS="prodlink" tag? I realize the code could be neater
(and I appreciate any advice), but all I want to do right now is
develop an accurate pattern.


--------------------------------------

Private Sub getBestBuy(strItem As String, lngRow As Long, IE As
InternetExplorer)

Dim RegexURL As RegExp
Dim RegexMatch As MatchCollection
Dim strURL As String

Set RegexURL = New RegExp

With RegexURL
.MultiLine = True
.IgnoreCase = True
.Global = False
'The following pattern was recommended by Ron from Google
Groups
.Pattern = "<a\s*href=""([^""]+)""\s*class=""prodlink"""
End With

With IE
.Visible = True
'Replace spaces with plus signs in URL as per BB's approach
.Navigate "http://www.bestbuy.com/site/olspage.jsp?
_dyncharset=ISO-8859-1&id=pcat17071&type=page&st=" & Replace(strItem,
" ", "+") & "&sc=Global&cp=1&nrp=15&sp=&qp=&list=n&iht=y&usc=A ll
+Categories&ks=960"
End With

'Load the page completely
Do Until IE.ReadyState = READYSTATE_COMPLETE And IE.Busy = False
Loop

'Get the URL of the first listed item, which is usually the
most accurate based on the keyword search
strPageContent = IE.Document.body.innerHTML

If RegexURL.Test(strPageContent) Then
Set RegexMatch = RegexURL.Execute(strPageContent)
Range("K" & lngRow).Value = "http://www.bestbuy.com" &
RegexMatch(0).SubMatches(0)
End If

'//--------------- Rest of code continues here ---------------//

End Sub

  #8   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default VBA Regular Expressions & URL Extraction

On Sat, 7 Feb 2009 06:30:19 -0800 (PST), Akrobrat
wrote:

Thanks for the info. As you suggested, I tried:
.Pattern = "href=""([^""]+)""\s*class=""prodlink"""
... but it didn't work. I too suspected the dot might be a source of
trouble, but I've implemented it successfully on other movie sites.


Hmmm.

What does "didn't work" mean? No match or an error message?

That pattern works fine here using the data you provided, as well as a minor
variation I included in the data.

I assume you have a reference set to "Microsoft VBScript Regular Expressions"
5.5 or else you'd be getting some error message from VBA.

In this context, setting the Multiline and Global parameters is irrelevant, but
harmless.

Perhaps if you posted the full contents of strPageContent I could make some
further suggestions.

Or perhaps you could try the Replace method I posted in a different context.
--ron
  #9   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 10
Default VBA Regular Expressions & URL Extraction


Hmmm.

What does "didn't work" mean? *No match or an error message?

That pattern works fine here using the data you provided, as well as a minor
variation I included in the data.

I assume you have a reference set to "Microsoft VBScriptRegularExpressions"
5.5 or else you'd be getting some error message fromVBA.

In this context, setting the Multiline and Global parameters is irrelevant, but
harmless.

Perhaps if you posted the full contents of strPageContent I could make some
further suggestions.

Or perhaps you could try the Replace method I posted in a different context.

  #10   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default VBA Regular Expressions & URL Extraction

On Sat, 7 Feb 2009 07:34:41 -0800 (PST), Akrobrat
wrote:

Ron,

Sorry, by error I meant that there wasn't a match for either of the
examples. In fact, the last Pattern suggestion stalled Excel / VBA.

I do have a reference set to MS VBScript Regex 5.5, and I don't know
if this matters much, but I use a Windows XP SP2 machine.
Instead of the posting the contents of strPageContent (very long),
here is the URL to the webpage that it contains:
http://www.bestbuy.com/site/olspage....egories&ks=960

I'll be AWOL for the next 36 hours but I appreciate your help!


I don't believe the problem is in your regex pattern.

I obtained the data by going to the above URL, executing the View Source
command in FireFox; copying that data, and then processing it within VBA. The
expected URL was returned.

(I had to do the extraction entirely within VBA as the page source is too long
to fit into an Excel cell.

The following worked fine, after copying the data to the clipboard and also
setting a reference to Microsoft Forms 2.0 Object Library:

=============================
Option Explicit
Sub GetClipboardData()
Dim PageSource As String
Dim myData As DataObject
Set myData = New DataObject
myData.GetFromClipboard
PageSource = myData.GetText
Debug.Print MovieURL(PageSource)
End Sub
'---------------------------------------
Function MovieURL(str As String) As String
Dim myRegExp, myMatches, ResultString
Set myRegExp = New RegExp
myRegExp.MultiLine = True
myRegExp.IgnoreCase = True
myRegExp.Global = False
myRegExp.Pattern = "<a\s*href=""([^""]+)""\s*class=""prodlink"""
Set myMatches = myRegExp.Execute(str)
If myMatches.Count = 1 Then
MovieURL = myMatches(0).SubMatches(0)
End If
End Function
==============================

Of interest, in the particular page you sent me to, there are 11 different URL
snippets that meet your criteria of ending with classid="prodlink"

However, unlike your example, all of them include this string in the middle:

;jsessionid=BGJY2UTT13ORTKC4D3IFAGA

Also, they differ by the skuid.

I'm guessing you don't want the jsessionid information.

A slight change will exclude that from the result:

==============================================
Option Explicit
Sub GetClipboardData()
Dim PageSource As String
Dim myData As DataObject
Set myData = New DataObject
myData.GetFromClipboard

PageSource = myData.GetText

Debug.Print MovieURL(PageSource)
End Sub
'-----------------------------------
Function MovieURL(str As String) As String
Dim myRegExp As RegExp
Dim myMatches As MatchCollection

Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Pattern = _
"""([^""]+)(;jsessionid[^""]+)(\?[^""]+)(?=""\s*class=""prodlink"")"

If myRegExp.Test(str) = True Then
Set myMatches = myRegExp.Execute(str)
MovieURL = myMatches(0).SubMatches(0) & _
myMatches(0).SubMatches(2)
End If
End Function
====================================
--ron


  #11   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default VBA Regular Expressions & URL Extraction

On Fri, 6 Feb 2009 13:20:31 -0800 (PST), Akrobrat
wrote:

Greetings all,

I am trying to extract the URLs of a set of animated movies off
various sites using regular expressions and then dump those URLs into
an Excel document (via VBA). I have a decent grasp of regex but I
have hit a brick wall lately with a particular site. I have
experimented with a number of patterns but cannot yet get the correct
result.

The expected result is:
/site/olspage.jsp?skuId=8936896&amp;st=Transformers+Wide screen&amp;type=product&amp;id=1754542


However, if I do get a non-null result back, it is usually:
http://www.bestbuy.com/site/olspage....ry&id=cat00000



---------------------- Sample Patterns Tested:
----------------------
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?prodlink\W? "
.Pattern = "\<a\s+href=""([A-Za-z0-9/;&\.\?\+-=]+)""\s+class"
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?\w\W?"



---------------------- Partial Source Data (from website):
----------------------

<div class="logo"
<a href="http://www.bestbuy.com/site/olspage.jsp?
type=category&id=cat00000" name="&lid=hdr_logo"<img src="http://
images.bestbuy.com:80/BestBuy_US/en_US/images/global/header/logo.gif"
alt="Best Buy Logo"/</a
</div

<td class="skucontent"

<a href="/site/olspage.jsp?skuId=8936896&amp;st=Transformers
+Widescreen&amp;type=product&amp;id=1754542" class="prodlink"
Transformers - Widescreen Dubbed Subtitle AC3</a<br/

---------------------- ---------------------- ----------------------

I'm most interested in utilizing the [class="prodlink"] string as this
is the tag that labels a movie URL. I know that regex in VBA can be a
bit tricky owing to the use of double quotes and other non-alpha
characters, but can any of you guys spot what I'm doing wrong? Thanks
for your help!



And here's another version that might work a bit better, depending on your
specific requirements. It has no problem with embedded quotes in the URL. This
uses the Replace method to get rid of everything else.

==============================
Option Explicit
Function MovieURL(str As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.IgnoreCase = True
re.Pattern = _
"[\s\S]*<a\shref=""([\s\S]+)""\s*class=""prodlink""[\s\S]*"
MovieURL = re.Replace(str, "$1")
End Function
==============================
--ron
  #12   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default VBA Regular Expressions & URL Extraction

On Sat, 07 Feb 2009 09:48:29 -0500, Ron Rosenfeld
wrote:

On Fri, 6 Feb 2009 13:20:31 -0800 (PST), Akrobrat
wrote:

Greetings all,

I am trying to extract the URLs of a set of animated movies off
various sites using regular expressions and then dump those URLs into
an Excel document (via VBA). I have a decent grasp of regex but I
have hit a brick wall lately with a particular site. I have
experimented with a number of patterns but cannot yet get the correct
result.

The expected result is:
/site/olspage.jsp?skuId=8936896&amp;st=Transformers+Wide screen&amp;type=product&amp;id=1754542


However, if I do get a non-null result back, it is usually:
http://www.bestbuy.com/site/olspage....ry&id=cat00000



---------------------- Sample Patterns Tested:
----------------------
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?prodlink\W? "
.Pattern = "\<a\s+href=""([A-Za-z0-9/;&\.\?\+-=]+)""\s+class"
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?\w\W?"



---------------------- Partial Source Data (from website):
----------------------

<div class="logo"
<a href="http://www.bestbuy.com/site/olspage.jsp?
type=category&id=cat00000" name="&lid=hdr_logo"<img src="http://
images.bestbuy.com:80/BestBuy_US/en_US/images/global/header/logo.gif"
alt="Best Buy Logo"/</a
</div

<td class="skucontent"

<a href="/site/olspage.jsp?skuId=8936896&amp;st=Transformers
+Widescreen&amp;type=product&amp;id=1754542" class="prodlink"
Transformers - Widescreen Dubbed Subtitle AC3</a<br/

---------------------- ---------------------- ----------------------

I'm most interested in utilizing the [class="prodlink"] string as this
is the tag that labels a movie URL. I know that regex in VBA can be a
bit tricky owing to the use of double quotes and other non-alpha
characters, but can any of you guys spot what I'm doing wrong? Thanks
for your help!



And here's another version that might work a bit better, depending on your
specific requirements. It has no problem with embedded quotes in the URL. This
uses the Replace method to get rid of everything else.

==============================
Option Explicit
Function MovieURL(str As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.IgnoreCase = True
re.Pattern = _
"[\s\S]*<a\shref=""([\s\S]+)""\s*class=""prodlink""[\s\S]*"
MovieURL = re.Replace(str, "$1")
End Function
==============================
--ron


Well, given your data and noting that there are multiple matches, I probably
wouldn't use this method.
--ron
  #13   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,934
Default VBA Regular Expressions & URL Extraction

The expected result is:
/site/olspage.jsp?skuId=8936896&amp;st=Transformers+Wide screen&amp;type=product&amp;id=1754542


Does it have to be a Regular Expression solution? This function will return
the first prodlink site (your expected result above from the Partial Source
Data text you posted)...

Function GetURL(S As String) As String
GetURL = Split(S, "class=""prodlink""", , vbTextCompare)(0)
GetURL = Mid(GetURL, InStrRev(GetURL, "href=") + 6)
GetURL = Replace(Replace(GetURL, vbLf, ""), vbCr, "")
GetURL = Trim(Replace(GetURL, """", ""))
End Function

--
Rick (MVP - Excel)


"Akrobrat" wrote in message
...
Greetings all,

I am trying to extract the URLs of a set of animated movies off
various sites using regular expressions and then dump those URLs into
an Excel document (via VBA). I have a decent grasp of regex but I
have hit a brick wall lately with a particular site. I have
experimented with a number of patterns but cannot yet get the correct
result.

The expected result is:
/site/olspage.jsp?skuId=8936896&amp;st=Transformers+Wide screen&amp;type=product&amp;id=1754542


However, if I do get a non-null result back, it is usually:
http://www.bestbuy.com/site/olspage....ry&id=cat00000



---------------------- Sample Patterns Tested:
----------------------
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?prodlink\W? "
.Pattern = "\<a\s+href=""([A-Za-z0-9/;&\.\?\+-=]+)""\s+class"
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?\w\W?"



---------------------- Partial Source Data (from website):
----------------------

<div class="logo"
<a href="http://www.bestbuy.com/site/olspage.jsp?
type=category&id=cat00000" name="&lid=hdr_logo"<img src="http://
images.bestbuy.com:80/BestBuy_US/en_US/images/global/header/logo.gif"
alt="Best Buy Logo"/</a
</div

<td class="skucontent"

<a href="/site/olspage.jsp?skuId=8936896&amp;st=Transformers
+Widescreen&amp;type=product&amp;id=1754542" class="prodlink"
Transformers - Widescreen Dubbed Subtitle AC3</a<br/

---------------------- ---------------------- ----------------------

I'm most interested in utilizing the [class="prodlink"] string as this
is the tag that labels a movie URL. I know that regex in VBA can be a
bit tricky owing to the use of double quotes and other non-alpha
characters, but can any of you guys spot what I'm doing wrong? Thanks
for your help!


  #14   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 10
Default VBA Regular Expressions & URL Extraction

Ron, my approach with Regular Expression was to capture the *first*
occurence of the HREF tag with the CLASS="prodlink" tag. However, you
might be right in that there's more going on here than expected - no
matter how much I tweak the regex, the returned value is either
incorrect or null. Your pattern suggestions will be useful for other
sites I hope to cover soon.

Rick, I don't have to use regex for this task, but I found that it was
a good solution early on in my project. Your function (with a couple
of modifications to handle the full HTML source) helped me solve the
problem.

Thank you immensely to both of you for helping with this challenge!
  #15   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default VBA Regular Expressions & URL Extraction

On Mon, 9 Feb 2009 08:04:33 -0800 (PST), Akrobrat
wrote:

Ron, my approach with Regular Expression was to capture the *first*
occurence of the HREF tag with the CLASS="prodlink" tag. However, you
might be right in that there's more going on here than expected - no
matter how much I tweak the regex, the returned value is either
incorrect or null. Your pattern suggestions will be useful for other
sites I hope to cover soon.

Rick, I don't have to use regex for this task, but I found that it was
a good solution early on in my project. Your function (with a couple
of modifications to handle the full HTML source) helped me solve the
problem.

Thank you immensely to both of you for helping with this challenge!


Glad you got it working. But it would be interesting to figure out why the
regex approach isn't working, especially since I was using the same data
source.
--ron


  #16   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 10
Default VBA Regular Expressions & URL Extraction

Glad you got it working. *But it would be interesting to figure out why the
regex approach isn't working, especially since I was using the same data
source.
--ron


Ron, I still don't know why the regex worked for you and not for me.
I sit behind a large corporation's firewall and use IE 6.0 that's been
slightly customized for the company's purposes. Could that have
something to do with it? Outside of that, I can't imagine what could
be wrong. Everything else is pretty standard - I use Win XP, Excel
2003 and the Microsoft VBScript Regular Expressions 5.5 library.

Thanks again.
  #17   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,651
Default VBA Regular Expressions & URL Extraction

On Thu, 12 Feb 2009 07:07:07 -0800 (PST), Akrobrat
wrote:

Glad you got it working. *But it would be interesting to figure out why the
regex approach isn't working, especially since I was using the same data
source.
--ron


Ron, I still don't know why the regex worked for you and not for me.
I sit behind a large corporation's firewall and use IE 6.0 that's been
slightly customized for the company's purposes. Could that have
something to do with it? Outside of that, I can't imagine what could
be wrong. Everything else is pretty standard - I use Win XP, Excel
2003 and the Microsoft VBScript Regular Expressions 5.5 library.

Thanks again.


I suppose there could be something about the data, but don't know what it is.

I tried again, using mostly the code you had previously posted, with some
variation to use data that had been copied to the clipboard, instead of
obtained via IE, and also to print the result to the immediate window, instead
of the worksheet, and it continues to work OK here.
--ron
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Get rid of with regular expressions Howdy Excel Discussion (Misc queries) 1 January 18th 10 07:42 PM
Regular expressions Dave Excel Programming 5 September 20th 07 02:46 PM
Using Regular Expressions with VBA Andrew Hall NZ Excel Programming 5 November 21st 06 09:30 PM
Regular expressions JeffMelton Excel Programming 2 March 1st 06 12:52 AM
VBA and Regular expressions Friedrich Muecke Excel Programming 3 October 3rd 03 01:46 AM


All times are GMT +1. The time now is 10:29 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright ©2004-2025 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"