ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Programming (https://www.excelbanter.com/excel-programming/)
-   -   Regex Pattern (https://www.excelbanter.com/excel-programming/429573-regex-pattern.html)

Geoff K

Regex Pattern
 
Hi
The aim is to identify anomalous strings in user data using a Regex pattern
but as a complete newbie I have got bogged down.
My test code is below but as you will see there are some "bad" results.
The criteria - a string can be:
text only = Head, Dept Head
text plus numerics = Class 8-10, Class 8, 6th form, Head KS6
a single dot
If numerics are included then:
a single or double digit must be followed by a hyphen or an ordinal =
1st, 10-12, 15th
there must be no lead zeros = 0 or 01270
ordinals must not be hyphenated = 2nd-3rd, 2nd-4

My Regex doesn't yet include a test for text only and wondered if that had
to be a separate process?

I would be grateful of any help to complete the pattern.

T.I.A.

Geoff
Sub TestPattern()

Dim objRegExp As Object
Dim i As Long

'''some test and result strings
Dim arrString() As Variant
Dim arrResult() As Variant
arrString = Array("String", "0", "class 1-", "8-", "8-10", _
"12", "12-15", "165", "12-165", "0 12", "Class 8", "1st class", _
"2nd class", "3rd class", "6th form", "40th class", "400th class", _
"Head", "Head KS6", ".")
arrResult = Array("Result", "ok", "ok", "ok", "ok", "ok", "ok", _
"ok", "bad", "bad", "bad", "ok", "ok", "ok", "ok", "ok", "ok", _
"bad", "bad", "bad")

'''set pattern
Set objRegExp = CreateObject("Vbscript.RegExp")
With objRegExp
.Global = True
.IgnoreCase = True
.MultiLine = True
.Pattern = "\b[1-9](-|st|nd|rd|th)\b|\b[1-9][0-9](|-|st|nd|rd|th)\b"
End With

With Sheets(1)

'''setup test strings
.Range("a:c").ClearContents
.Range("a:c").NumberFormat = "@"
.Cells(1, "B") = "Regex Test"
For i = LBound(arrString) To UBound(arrString)
.Cells(i + 1, "A") = arrString(i)
.Cells(i + 1, "C") = arrResult(i)
Next

'''run test pattern
For i = LBound(arrString) + 1 To UBound(arrString)
If objRegExp.Test(.Cells(i + 1, 1)) Then
.Cells(i + 1, "B") = "valid"
Else
.Cells(i + 1, "B") = "invalid"
End If
Next

End With

End Sub

r

Regex Pattern
 
you must use in excel application?

Your example is not clear to me

you can make a list of cases ok and a
list of cases bad with poor reasoning

in the list example set all your text between ""
I'm sorry for my english

regards
r

Il mio ultimo lavoro ...
http://excelvba.altervista.org/blog/...ternative.html


"Geoff K" wrote:

Hi
The aim is to identify anomalous strings in user data using a Regex pattern
but as a complete newbie I have got bogged down.
My test code is below but as you will see there are some "bad" results.
The criteria - a string can be:
text only = Head, Dept Head
text plus numerics = Class 8-10, Class 8, 6th form, Head KS6
a single dot
If numerics are included then:
a single or double digit must be followed by a hyphen or an ordinal =
1st, 10-12, 15th
there must be no lead zeros = 0 or 01270
ordinals must not be hyphenated = 2nd-3rd, 2nd-4

My Regex doesn't yet include a test for text only and wondered if that had
to be a separate process?

I would be grateful of any help to complete the pattern.

T.I.A.

Geoff
Sub TestPattern()

Dim objRegExp As Object
Dim i As Long

'''some test and result strings
Dim arrString() As Variant
Dim arrResult() As Variant
arrString = Array("String", "0", "class 1-", "8-", "8-10", _
"12", "12-15", "165", "12-165", "0 12", "Class 8", "1st class", _
"2nd class", "3rd class", "6th form", "40th class", "400th class", _
"Head", "Head KS6", ".")
arrResult = Array("Result", "ok", "ok", "ok", "ok", "ok", "ok", _
"ok", "bad", "bad", "bad", "ok", "ok", "ok", "ok", "ok", "ok", _
"bad", "bad", "bad")

'''set pattern
Set objRegExp = CreateObject("Vbscript.RegExp")
With objRegExp
.Global = True
.IgnoreCase = True
.MultiLine = True
.Pattern = "\b[1-9](-|st|nd|rd|th)\b|\b[1-9][0-9](|-|st|nd|rd|th)\b"
End With

With Sheets(1)

'''setup test strings
.Range("a:c").ClearContents
.Range("a:c").NumberFormat = "@"
.Cells(1, "B") = "Regex Test"
For i = LBound(arrString) To UBound(arrString)
.Cells(i + 1, "A") = arrString(i)
.Cells(i + 1, "C") = arrResult(i)
Next

'''run test pattern
For i = LBound(arrString) + 1 To UBound(arrString)
If objRegExp.Test(.Cells(i + 1, 1)) Then
.Cells(i + 1, "B") = "valid"
Else
.Cells(i + 1, "B") = "invalid"
End If
Next

End With

End Sub


Ron Rosenfeld

Regex Pattern
 
On Tue, 9 Jun 2009 06:24:01 -0700, Geoff K
wrote:

Hi
The aim is to identify anomalous strings in user data using a Regex pattern
but as a complete newbie I have got bogged down.
My test code is below but as you will see there are some "bad" results.
The criteria - a string can be:
text only = Head, Dept Head
text plus numerics = Class 8-10, Class 8, 6th form, Head KS6
a single dot
If numerics are included then:
a single or double digit must be followed by a hyphen or an ordinal =
1st, 10-12, 15th
there must be no lead zeros = 0 or 01270
ordinals must not be hyphenated = 2nd-3rd, 2nd-4

My Regex doesn't yet include a test for text only and wondered if that had
to be a separate process?

I would be grateful of any help to complete the pattern.

T.I.A.

Geoff



It shouldn't be that hard to do, but your criteria are not congruent with your
examples.

For example, you give as an example in the text plus numerics category
"Class 8" and "Head KS6". But your criteria state that if numerics are
included, a single or double digit must be followed by a hyphen or an ordinal.
That is not present in these examples.

You also give "Class 8-10" as an example. But the 10 is not followed by a
hyphen or an ordinal.

In addition, are these single entries within a cell; or are there multiple
entries within a cell as shown on the various lines you have provided?





--ron

Ron Rosenfeld

Regex Pattern
 
On Wed, 10 Jun 2009 03:51:01 -0700, Geoff K
wrote:

Ron
In running through the databases with the new pattern I now realise the
significance of KS3, KS4, KS5 etc.
KS is an acroynym for Key Stage and this appears quite frequently in the
lists. It would be very useful to "allow" this through and I wondered if you
would be kind enough to amend the pattern accordingly.
It can take the form of KS3 Manager, Manager KS4 or expanded to Key Stage 5
Manager or Manager Key Stage 6.

Is this possible?

Re speed, I don't have a problem.

Thank you so far.

Geoff


The entries where the string ended with a digit should have been being flagged
as OK.

I don't have time to work on this for the next few days, but if your valid
phrases are all as above, you could just add, via alternation, to the existing
regex.

e.g. (for one digit KS entries):

(^|[^-])[1-9]\d?(st|nd|rd|th)(?!-)|((^|\D)[1-9]\d?-[1-9]\d?(?=\D|$))|(^(\D+|\.)$)|([a-z]\s?)[1-9]\d?$|KS[1-9]|Stage\s+[1-9]

(for two digit KS entries, just add the \d? after each [1-9] as in

KS[1-9]\d?
--ron

Geoff K

Regex Pattern
 
Ron

Great.

Thank you for the help and example. That is excellent advice. I can
progress from there - i hope <g

Geoff

Ron Rosenfeld

Regex Pattern
 
On Wed, 10 Jun 2009 07:13:04 -0700, Geoff K
wrote:

Ron

Great.

Thank you for the help and example. That is excellent advice. I can
progress from there - i hope <g

Geoff


Glad to help. Thanks for the feedback.
--ron


All times are GMT +1. The time now is 04:40 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com