View Single Post
  #1   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Ron Rosenfeld Ron Rosenfeld is offline
external usenet poster
 
Posts: 5,651
Default Macro to apply parsing rules for strings and list the substrings

On Wed, 15 Apr 2009 10:49:37 -0700 (PDT), Luciano Paulino da Silva
wrote:

Dear All,
I'm looking for detect and list the substrings (bellow cell "A3") for
a given string (on cell "A1") that must be generated for one or more
general rule(s). I have almost 100 different rules that can be applied
alone or together one to the other. Do you have any idea about how
could I do that? The rules are listed bellow and I put an example.

Rules:
According to these rules, letters in a string undergoing parse are
designated in the left or right direction from the parsed letter.
There are some exceptions related to the presence of one or more
letters sorrounding some specific point of parse.

Parsing rules:

Rule Parse where? Exceptions
1 Right side of K or R if P is Right to K or
R
2 Right side of K or R
3 Right side of K or R if P is Right to
K or R; after K in CKY, DKD, CKH, CKD, KKR; after R in RRH, RRR, CRK,
DRD, RRF, KRR
4 Right side of K
5 Left side of K
6 Right side of M
7 Right side of R if P is Right to R
8 Left side of D
9 Left side of D, Right side of K
10 Left side of D or E
11 Right side of E if P is Right to E, or if
E is Right to E
12 Right side of D or E if P is Right to D or
E, or if E is Right to D or E
13 Right side of D, E and K if P is Right to D or E,
or if E is Right to D or E
14 Right side of F, L, M, W, Y if P is Right to F,
L, M, W, Y, if P is Left to Y
15 Right side of F, Y, W if P is Right to F,
Y, W, if P is Left to Y
16 Right side of K, R, F, Y, W if P is Right to K, R,
F, Y, W, if P is Left to Y
17 Right side of F, L
18 Right side of F, L, W, Y, A, E, Q
19 Right side of A, F, Y, W, L, I, V
20 Left side of A, F, I, L, M, V if D or E is Left to
A, F, I, L, M, V


Examples:
String:

AGFSAFSAHASGASHSGHHSRASAKSASFDDAKPASASAFDAGSRPASS DADASAPSASDASDASSRADSKADSKK

Using Rule 1:

AGFSAFSAHASGASHSGHHSR
ASAK
SASFDDAKPASASAFDAGSRPASSDADASAPSASDASDASSR
ADSK
ADSK
K

Using Rule 8:
AGFSAFSAHASGASHSGHHSRASAKSASF
D
DAKPASASAF
DAGSRPASS
DA
DASAPSAS
DAS
DASSRA
DSKA
DSKK

Using Rules 1 and 17
AGF
SAF
SAHASGASHSGHHSR
ASAK
SASF
DDAKPASASAF
DAGSRPASSDADASAPSASDASDASSR
ADSK
ADSK
K


You could implement each rule as a Regular Expression, and then select the
rule(s) you wish to apply.

For example, a routine with the Regular Expressions for rules 1,2,8,17 already
figured out for you, and which allows the application of multiple rules on a
given string, might look like the code below.

You select a cell which contains your string and then execute the macro. You
can enter one or more rules -- enter them by number <space separated.

The macro will then apply each rule in order, and output the results in the
rows below your selected cell, skipping the row immediately below (so that if
your data is in A1, the strings will start in A3.

There are other methods of implementing your rules than Regular Expressions,
but with so many rules, this would be the simplest for me.

=================================
Option Explicit
Sub ParseSpecial()
Dim c As Range
Dim i As Long, j As Long
Dim vRule As Variant
Dim aRule(1 To 100) As String
Dim aResRule1() As String
Dim aResRule2() As String
Dim re As Object, mc As Object, m As Object

'Rules
'construct so that all strings will have at least
'one match
'store each rule number in the corresponding element of
'the aRule array

aRule(1) = "([^KR]|[KR]P)+[KR]?|[KR]"
aRule(2) = "[^KR]+[KR]?|[KR]"
aRule(8) = "D?[^D]+|D"
aRule(17) = "[^FL]+[FL]?|[FL]"

vRule = _
Split(InputBox("Rule Number (for multiple rules, separate with space): "))

Set c = Selection 'or whatever

If c.Count < 1 Then
MsgBox ("Can only select one cell")
'but could add code to iterate through a
' bunch of cells
Exit Sub
End If

'add more code to ensure validity of rule number
' --------------------------
ReDim aResRule1(0)
aResRule1(0) = c.Value
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = False
re.Global = True

For j = 0 To UBound(vRule)
re.Pattern = aRule(vRule(j))
ReDim aResRule2(UBound(aResRule1))
'move current results to aResRule2
For i = 0 To UBound(aResRule1)
aResRule2(i) = aResRule1(i)
Next i
'clear out aResRule1
ReDim aResRule1(0)
For i = 0 To UBound(aResRule2)
Set mc = re.Execute(aResRule2(i))
For Each m In mc
If Len(aResRule1(0)) 0 Then
ReDim Preserve aResRule1(UBound(aResRule1) + 1)
End If
aResRule1(UBound(aResRule1)) = m
Next m
Next i
Next j
'clear and write results below
WriteResults aResRule1, c.Offset(2, 0)
End Sub
Sub WriteResults(res, rDest As Range)
'clear out range for results
Range(rDest, rDest.End(xlDown)).Clear
Dim i As Long
For i = 0 To UBound(res)
rDest(i + 1, 1).Value = res(i)
Next i
With rDest(i + 1, 1)
.Value = "End of List of Strings"
.Font.Italic = True
.Font.Bold = True
.Font.Color = vbRed
End With
End Sub
=============================
--ron