Posted to microsoft.public.excel.programming
On Fri, 6 Feb 2009 13:20:31 -0800 (PST), Akrobrat wrote:

Greetings all,

I am trying to extract the URLs of a set of animated movies off various sites using regular expressions and then dump those URLs into an Excel document (via VBA). I have a decent grasp of regex but I have hit a brick wall lately with a particular site. I have experimented with a number of patterns but cannot yet get the correct result.

The expected result is:
/site/olspage.jsp?skuId=8936896&st=Transformers+Widescreen&type=product&id=1754542

However, if I do get a non-null result back, it is usually:
http://www.bestbuy.com/site/olspage....ry&id=cat00000

----------------------
Sample Patterns Tested:
----------------------
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?prodlink\W? "
.Pattern = "\<a\s+href=""([A-Za-z0-9/;&\.\?\+-=]+)""\s+class"
.Pattern = "\<a\s+href=\W?(.*?)\W?\s?class=\W?\w\W?"

----------------------
Partial Source Data (from website):
----------------------
<div class="logo">
<a href="http://www.bestbuy.com/site/olspage.jsp?type=category&id=cat00000" name="&lid=hdr_logo"><img src="http://images.bestbuy.com:80/BestBuy_US/en_US/images/global/header/logo.gif" alt="Best Buy Logo"/></a>
</div>
<td class="skucontent">
<a href="/site/olspage.jsp?skuId=8936896&st=Transformers+Widescreen&type=product&id=1754542" class="prodlink">Transformers - Widescreen Dubbed Subtitle AC3</a><br/>
----------------------

I'm most interested in utilizing the [class="prodlink"] string as this is the tag that labels a movie URL. I know that regex in VBA can be a bit tricky owing to the use of double quotes and other non-alpha characters, but can any of you guys spot what I'm doing wrong?

Thanks for your help!

And here's another version that might work a bit better, depending on your specific requirements. It has no problem with embedded quotes in the URL. This uses the Replace method to get rid of everything else.
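==============================
Option Explicit

Function MovieURL(str As String) As String
    Dim re As Object
    Set re = CreateObject("vbscript.regexp")
    re.Global = True
    re.IgnoreCase = True
    ' The pattern spans the whole input; group 1 captures the href value
    ' that sits directly before class="prodlink".
    re.Pattern = _
        "[\s\S]*<a\shref=""([\s\S]+)""\s*class=""prodlink""[\s\S]*"
    ' Replacing the full match with $1 leaves only the captured URL.
    ' If nothing matches, Replace returns str unchanged.
    MovieURL = re.Replace(str, "$1")
End Function
==============================
--ron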
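
Because the Replace approach above collapses the whole input into a single string, it yields at most one URL per call. The original question asks for the URLs of a set of movies. If a page holds several class="prodlink" anchors, one possible variant is to collect every match with Execute instead. This is only a sketch under assumptions: the names AllMovieURLs and DemoAllMovieURLs are hypothetical and not from the thread. It also assumes that class="prodlink" directly follows the href attribute, as in the sample data, and that the URL itself contains no "<" or ">" characters.

```vba
Option Explicit

' Hypothetical helper (not from the thread): returns the href of every
' anchor whose href attribute is directly followed by class="prodlink".
Function AllMovieURLs(ByVal html As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True        ' find every match, not just the first
    re.IgnoreCase = True
    ' [^<>]+? cannot cross a tag boundary, so a match cannot start in one
    ' anchor (such as the logo link) and end in a later one. Embedded
    ' quotes inside the URL are still allowed.
    re.Pattern = "<a\s+href=""([^<>]+?)""\s*class=""prodlink"""

    For Each m In re.Execute(html)
        result.Add m.SubMatches(0)   ' SubMatches(0) is capture group 1
    Next m

    Set AllMovieURLs = result
End Function

' Hypothetical demo using a shortened form of the sample data from the question.
Sub DemoAllMovieURLs()
    Dim html As String
    Dim url As Variant

    html = "<a href=""http://www.bestbuy.com/site/olspage.jsp?type=category&id=cat00000"" name=""&lid=hdr_logo"">" & _
           "<a href=""/site/olspage.jsp?skuId=8936896&st=Transformers+Widescreen&type=product&id=1754542"" class=""prodlink"">"

    For Each url In AllMovieURLs(html)
        Debug.Print url
    Next url
End Sub
```

With the shortened sample in the demo, the loop should print only the product URL and skip the logo link, since the logo anchor has no class="prodlink" inside its own tag. Each item in the returned Collection can then be written to a worksheet cell in the same loop.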