Greetings all,
I am trying to extract the URLs of a set of animated movies from
various sites using regular expressions and then write those URLs to
an Excel workbook (via VBA). I have a decent grasp of regex, but I
have hit a brick wall with one particular site. I have tried a number
of patterns and none of them returns the correct result.
The expected result is:
/site/olspage.jsp?skuId=8936896&st=Transformers+Widescreen&type=product&id=1754542
However, if I do get a non-null result back, it is usually:
http://www.bestbuy.com/site/olspage....ry&id=cat00000
---------------------- Sample Patterns Tested: ----------------------
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?prodlink\W? "
.Pattern = "\<a\s+href=""([A-Za-z0-9/;&\.\?\+-=]+)""\s+class"
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?\w\W?"
---------------------- Partial Source Data (from website): ----------------------
<div class="logo">
<a href="http://www.bestbuy.com/site/olspage.jsp?type=category&id=cat00000" name="&lid=hdr_logo"><img src="http://images.bestbuy.com:80/BestBuy_US/en_US/images/global/header/logo.gif" alt="Best Buy Logo"/></a>
</div>
<td class="skucontent">
<a href="/site/olspage.jsp?skuId=8936896&st=Transformers+Widescreen&type=product&id=1754542" class="prodlink">Transformers - Widescreen Dubbed Subtitle AC3</a><br/>
---------------------- ---------------------- ----------------------
I would most like to anchor the match on the class="prodlink"
attribute, since that is what marks a movie URL. I know that regex in
VBA can be a bit tricky because double quotes have to be escaped
inside string literals and many non-alphanumeric characters have
special meaning, but can any of you guys spot what I'm doing wrong?
Thanks for your help!
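
P.S. A possible lead, in case it helps narrow things down. Below is a minimal sketch in Python (not VBA, simply because it is quicker for me to test outside Excel) that runs two patterns against a condensed copy of the sample markup above. It assumes each href sits on a single line in the live page, and the names html, loose and tight are my own. As far as I can tell, the lazy (.*?) version starts matching at the first <a href on the page, which is the logo link, and keeps expanding until it reaches class="prodlink", whereas a negated class [^"]* cannot cross the closing quote of the href.

```python
import re

# Condensed copy of the sample markup from the post.
# Assumption: in the live page each href value sits on one line.
html = (
    '<div class="logo">'
    '<a href="http://www.bestbuy.com/site/olspage.jsp?type=category&id=cat00000" '
    'name="&lid=hdr_logo"><img src="http://images.bestbuy.com:80/BestBuy_US/'
    'en_US/images/global/header/logo.gif" alt="Best Buy Logo"/></a>'
    '</div>'
    '<td class="skucontent">'
    '<a href="/site/olspage.jsp?skuId=8936896&st=Transformers+Widescreen'
    '&type=product&id=1754542" class="prodlink">'
    'Transformers - Widescreen Dubbed Subtitle AC3</a><br/>'
)

# First pattern from the post (without the trailing space).
# The search begins at the first "<a href" it finds, and the lazy
# group grows until class="prodlink" finally appears further along.
loose = re.compile(r'<a\s+href=\W?(.*?)\W?\s?class=\W?prodlink\W?')

# Candidate replacement: [^"]* cannot run past the closing quote of
# the href, so an anchor without class="prodlink" is rejected and the
# search moves on to the next anchor.
# VBA string-literal form, with each inner quote doubled:
#   .Pattern = "<a\s+href=""([^""]*)""\s+class=""prodlink"""
tight = re.compile(r'<a\s+href="([^"]*)"\s+class="prodlink"')

loose_url = loose.search(html).group(1)
tight_url = tight.search(html).group(1)

print("loose capture starts with:", loose_url[:40])
print("tight capture:", tight_url)
```

If this reasoning holds, the VBA version would need every double quote inside the pattern doubled, as in the comment above. I have not yet confirmed that it behaves the same way under VBScript.RegExp.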