Posted to microsoft.public.excel.programming
Regex syntax request for help

I'm parsing an HTML file. Originally I thought I only needed to capture all the links, and the following pattern worked well in my particular application (a sample HTML snippet is pasted at the bottom of this post):
^<A HREF=.*>
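As an illustration outside Excel, here is a minimal Python sketch of what that first pattern does, written with its closing `>`. It assumes the match was case-insensitive (the sample uses `<A Href`, not `<A HREF`) and that multiline mode was on so `^` matches at the start of every line; in VBScript's RegExp object those correspond to the `IgnoreCase` and `MultiLine` properties. The sample string is a hypothetical, abbreviated excerpt of the HTML at the bottom of the post.

```python
import re

# Hypothetical excerpt of the sample HTML, with the closing ">" characters restored (assumption).
SAMPLE = """<TD>
<A Href=javascript:openDocument('0900043d803a1ce4');>
<img src=/OurDir/images/formats/f_msw8_16.gif border=0 align=left width=16>
</a>
</td>
<td class='classtd'>
Land Spread Vector
</td>
"""

# "^" anchors at every line start (MULTILINE); IGNORECASE lets "HREF" match "Href".
# "." never crosses a line break, so each match is confined to the link's own line.
LINK_LINE = re.compile(r"^<A HREF=.*>", re.IGNORECASE | re.MULTILINE)

links = LINK_LINE.findall(SAMPLE)
print(links)
```

Only the `<A Href=...>` line is returned; the `<img>`, `</a>`, and `<td>` lines do not start with `<A HREF=`.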

However, I've now found that I only need to capture and process certain
links. The information that determines whether a link needs processing sits
between that link and the next link (or EOF), so I need to capture a larger,
multiline section of text for each link and test whether it contains my
identifier. It appears I can safely rely on the </TR> tag, which always comes
after the identifier and before the next link (or EOF). So I'm trying to
change my regex to grab that larger multiline section; then, if the section
contains the right identifier, I'll use my first regex (or a slightly
modified version) to grab just the URL from within the match.
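The plan above can be sketched in Python as three small steps, each with its own simple pattern. This is a sketch under assumptions, not a tested VBA answer: it assumes `</TR>` really does close every row, it uses `[\s\S]*?` (any character including line breaks, matched lazily) so a block can span several lines, and the `links_for` helper and the sample string are hypothetical.

```python
import re

# Hypothetical two-row sample, abbreviated from the post's HTML with ">" restored (assumption).
SAMPLE = """<A Href=javascript:openDocument('0900043d802b3528');>
</a>
</td>
<td class='classtd'>
Green-tipped Martin
</td>
</TR>
<TR>
<TD></TD>
<TD>
<A Href=javascript:openDocument('0900043d803a1ce4');>
</a>
</td>
<td class='classtd'>
Land Spread Vector
</td>
</TR>
"""

FLAGS = re.IGNORECASE | re.MULTILINE
# Step 1: one block per link, running lazily up to the first </TR> after it.
BLOCK = re.compile(r"^<A HREF=[\s\S]*?</TR>", FLAGS)
# Step 3: the original single-line link pattern, reused inside a kept block.
LINK_LINE = re.compile(r"^<A HREF=.*>", FLAGS)


def links_for(html: str, identifier: str) -> list[str]:
    """Return the link line of every block that contains `identifier` (hypothetical helper)."""
    links = []
    for block in BLOCK.findall(html):
        # Step 2: a plain substring test is enough to check for the identifier.
        if identifier in block:
            first = LINK_LINE.search(block)
            if first:
                links.append(first.group(0))
    return links


print(links_for(SAMPLE, "Land Spread Vector"))
```

With this sample only the second link comes back, because only its block contains the identifier; each step stays a short pattern that can be checked on its own.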

I've been using http://www.aivosto.com/vbtips/regex.html as a reference on
regular expressions, but when I test the new pattern on
http://regexlib.com/RETester.aspx I get no matches (my first pattern worked
fine there). Any assistance would be greatly appreciated. I think I'm pretty
close, but the following isn't working:
^<A HREF=.*/TR>

Any advice? The only difference from the working pattern is that the single
'>' at the end has been replaced with '/TR>'. I suspect the problem has to do
with spaces or line breaks, but I don't know for certain.
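The line-break suspicion is very likely the cause. In most regex flavors, including .NET with default options and, as far as I know, VBScript's RegExp, `.` does not match a line break, so `.*` can never run from the link line down to a `</TR>` several lines later. This Python sketch (hypothetical, abbreviated sample) compares three variants: the original `.`, a greedy `[\s\S]*`, and a lazy `[\s\S]*?`.

```python
import re

# Two rows, abbreviated from the post's sample HTML with ">" restored (assumption).
SAMPLE = """<A Href=javascript:openDocument('0900043d802b3528');>
Green-tipped Martin
</TR>
<TR>
<A Href=javascript:openDocument('0900043d803a1ce4');>
Land Spread Vector
</TR>
"""

FLAGS = re.IGNORECASE | re.MULTILINE

# "." stops at a line break, so the pattern can never reach a later /TR>.
dot = re.findall(r"^<A HREF=.*/TR>", SAMPLE, FLAGS)
# [\s\S] crosses line breaks, but a greedy "*" runs on to the LAST /TR>.
greedy = re.findall(r"^<A HREF=[\s\S]*/TR>", SAMPLE, FLAGS)
# A lazy "*?" stops at the FIRST /TR> after each link: one block per row.
lazy = re.findall(r"^<A HREF=[\s\S]*?/TR>", SAMPLE, FLAGS)

print(len(dot), len(greedy), len(lazy))  # 0 1 2
```

The greedy version does match, but as one large block ending at the last `</TR>` in the text; the lazy version gives the one-block-per-link behavior the post is after.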

Below is a sample from my much larger HTML file. I'm trying to capture the
^<A HREF=.*> link match only for rows where a 'classtd' cell contains
"Land Spread Vector".
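For the final step, once a row block is known to contain "Land Spread Vector", a capture group is one way to pull just the document ID out of the link line. This Python sketch assumes the link always has the form `openDocument('<id>')` as in the sample; `document_id`, `DOC_ID`, and the one-row block are hypothetical names and data.

```python
import re

# One hypothetical row block, abbreviated from the post's sample HTML with ">" restored.
BLOCK = """<A Href=javascript:openDocument('0900043d803a1ce4');>
&nbsp;101998 - APRRE - Assert.doc
</a>
</td>
<td class='classtd'>
Land Spread Vector
</td>
</TR>"""

IDENTIFIER = "Land Spread Vector"
# Assumed link shape: the ID sits inside openDocument('...'); group 1 captures it.
DOC_ID = re.compile(r"^<A HREF=javascript:openDocument\('([^']+)'\)", re.IGNORECASE | re.MULTILINE)


def document_id(block: str, identifier: str = IDENTIFIER):
    """Return the openDocument ID if the block mentions `identifier`, else None."""
    if identifier not in block:
        return None
    match = DOC_ID.search(block)
    return match.group(1) if match else None


print(document_id(BLOCK))
```

Keeping the identifier test and the ID extraction as separate, simple steps matches the preference for several small patterns over one large one.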

I'd prefer several simple regular expressions over one donated expression
that does it all, so that I can understand my own code and at least attempt
to troubleshoot it if I need to change anything.


Thanks!
Keith


<A Href=javascript:openDocument('0900043d802b3528');>
<img src=/OurDir/images/formats/f_msw8_16.gif border=0 align=left width=16>
&nbsp;101998
</a>
</td>
<td class='classtd'>
Green-tipped Martin
</td>
<td class='classtd'>
CURRENT,3.2
</td>
</TR>

<TR>
<TD></TD>
<TD>
<A Href=javascript:openDocument('0900043d803a1ce4');>
<img src=/OurDir/images/formats/f_msw8_16.gif border=0 align=left width=16>
&nbsp;101998 - APRRE - Assert.doc
</a>
</td>
<td class='classtd'>
Land Spread Vector
</td>
<td class='classtd'>
CURRENT,3.0
</td>
</TR>

<TR>
<TD></TD>
<TD>
<A Href=javascript:openDocument('0900043d802b635e');>
<img src=/OurDir/images/formats/f_msw8_16.gif border=0 align=left width=16>
&nbsp;101998-R
</a>
</td>
<td class='classtd'>
Reevaluation
</td>
<td class='classtd'>
CURRENT,1.0
</td>
</TR>
</TD></TR></TABLE><BR><BR>
<CENTER>
<A Href='javascript:history.back();'><img
src='/OurDir/images/back_down.jpg' border=0 align='center'
alt='Back'></A>&nbsp;
<A Href='javascript:goHome();'><img
src='/OurDir/images/home_down.jpg' border=0 align='center' alt='Home'></A>
</CENTER>
</BODY>
</HTML>



 

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
application.match with multi-dimensional arrays (syntax request) | Keith R | Excel Programming | 4 | June 28th 07 09:37 PM
Help with a Regex Pattern | [email protected] | Excel Programming | 11 | April 30th 07 01:49 AM
Regex techniques | Dave Runyan | Excel Programming | 5 | April 28th 07 12:17 AM
RegEx to parse something like this... | R Avery | Excel Programming | 2 | March 7th 05 06:41 PM
Regex Question | William Barnes | Excel Programming | 5 | January 2nd 04 11:57 AM

